file command apparently returning wrong MIME type
Why doesn’t the following return a CSV MIME type?
$ echo 'foo,bar\nbaz,quux' > temp.csv; file -b --mime temp.csv
I used this example for extra clarity but I’m also experiencing the problem with other CSV files.
$ file -b --mime '/Users/jasonswett/projects/client_work/gd/spec/test_files/wtf.csv'
Why doesn’t it think the CSV is a CSV? Is there anything I can do to the CSV to make
file return the “right” thing?
Unfortunately, there is probably nothing you can do to make file produce the correct output.
The file command tests the first few bytes of a file against a database of magic numbers. That is easy to do for binary files (such as images or executables), which have specific identifiers at the beginning of the file.
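To make the idea concrete, here is a minimal sketch of a magic-number check in Python. It is not how file is implemented; the SIGNATURES table and the sniff_magic name are made up for illustration, and the real magic database is far larger and can match at offsets other than zero.

```python
# Hypothetical, hand-picked signature table (assumption for illustration only).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_magic(data: bytes):
    """Return a MIME type if the leading bytes match a known signature, else None."""
    for prefix, mime in SIGNATURES:
        if data.startswith(prefix):
            return mime
    return None


print(sniff_magic(b"%PDF-1.7\n"))           # application/pdf
print(sniff_magic(b"foo,bar\nbaz,quux\n"))  # None
```

The CSV sample falls through because nothing at the start of the file marks it as CSV; it is just ordinary text that happens to contain commas.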
If the file is not a binary file, file checks the encoding and looks for certain keywords in the content to determine the type, but it does so only for a limited number of file types (most of which are programming languages).
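The text fallback can be sketched in a similar way. This is a rough approximation and not file's actual algorithm: the classify_text function is hypothetical, it only separates ASCII, UTF-8 and "other 8-bit" data, and it treats anything containing a NUL byte as binary (so UTF-16 text would be misreported here).

```python
def classify_text(data: bytes) -> str:
    """Rough guess at the character set of a byte string (illustrative only)."""
    if b"\x00" in data:
        # NUL bytes rarely appear in plain 8-bit text.
        return "binary"
    try:
        data.decode("ascii")
        return "us-ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown-8bit"


print(classify_text(b"foo,bar\nbaz,quux\n"))        # us-ascii
print(classify_text("café,1\n".encode("utf-8")))    # utf-8
print(classify_text(b"caf\xe9,1\n"))                # unknown-8bit
```

Note that a check like this only looks at byte ranges, so the commas in a CSV file carry no special meaning for it.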
The MIME types are determined by what the Unix man pages call ‘magic numbers’. Many file formats include a magic number that identifies the file type and file format. The extract below is from the man page of the file command:
The magic number tests are used to check for files with data in partic-
ular fixed formats. The canonical example of this is a binary exe-
cutable (compiled program) a.out file, whose format is defined in
a.out.h and possibly exec.h in the standard include directory. These
files have a 'magic number' stored in a particular place near the
beginning of the file that tells the UNIX operating system that the
file is a binary executable, and which of several types thereof. The
concept of 'magic number' has been applied by extension to data files.
Any file with some invariant identifier at a small fixed offset into
the file can usually be described in this way. The information identi-
fying these files is read from the compiled magic file
/usr/share/file/magic.mgc , or /usr/share/file/magic if the compile
file does not exist. In addition file will look in $HOME/.magic.mgc ,
or $HOME/.magic for magic entries.
The man page also says that if a file does not match any magic number, it is examined to see whether it is a text file, and the best-fitting character set (ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII, and so on) is reported:
If a file does not match any of the entries in the magic file, it is
examined to see if it seems to be a text file. ASCII, ISO-8859-x, non-
ISO 8-bit extended-ASCII character sets (such as those used on Macin-
tosh and IBM PC systems), UTF-8-encoded Unicode, UTF-16-encoded Uni-
code, and EBCDIC character sets can be distinguished by the different
ranges and sequences of bytes that constitute printable text in each
set. If a file passes any of these tests, its character set is
reported. ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are iden-
tified as ''text'' because they will be mostly readable on nearly any
As an alternative, you can try the mimetype command instead of the file command.
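If you control the calling code, another option is to guess the type from the file name rather than the content. The sketch below uses Python's standard mimetypes module; it is a different technique from what file does, the guess_by_name wrapper is hypothetical, and the exact result depends on the MIME tables available on your system.

```python
import mimetypes


def guess_by_name(path: str):
    """Guess a MIME type from the file extension only; the contents are never read."""
    mime, _encoding = mimetypes.guess_type(path)
    return mime


print(guess_by_name("temp.csv"))
print(guess_by_name("no_extension"))
```

Because this trusts the extension, a misnamed file will be misreported, so it suits cases where you already know how the files were produced.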