[wellylug] utility to determine text file encoding?
Daniel Pittman
daniel at rimspace.net
Mon Mar 2 17:33:13 NZDT 2009
Joe Mahoney <joe at cheerschopper.com> writes:
> Is there a nice little command line app that, given a text file, will
> tell me the encoding/charset of the file.
No, because this is an impossible task. Unless the file carries
embedded or external metadata, you can /guess/ the encoding, but not
actually know it.
Specifically, it isn't generally possible to distinguish UTF-8 from the
ISO-8859-* encodings, and hard to distinguish those from the Shift-JIS
style encodings.
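To make the ambiguity concrete, here is a minimal Python sketch. The sample bytes and the helper name decodes_as are mine, chosen purely for illustration. It shows one byte sequence that decodes cleanly as both UTF-8 and ISO-8859-1, just to different text:

```python
# Two bytes that are the UTF-8 encoding of "é" (illustrative sample).
data = "é".encode("utf-8")  # b'\xc3\xa9'


def decodes_as(raw, encoding):
    """Return the decoded text, or None if raw is invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


as_utf8 = decodes_as(data, "utf-8")         # one character: é
as_latin1 = decodes_as(data, "iso-8859-1")  # two characters: Ã©

# Neither decode fails, so the bytes alone cannot say which was intended.
print(repr(as_utf8), repr(as_latin1))
```

Both results are "valid"; only a human reader, or metadata, can say which one the author meant.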
It is also not possible, generally, to distinguish UTF-16 and
equivalents from random binary data, since most random binary data is
also valid UTF-16.
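As a rough illustration, the Python sketch below feeds the first eight bytes of a PNG file (the standard PNG signature, picked by me as an arbitrary example of binary data) to the UTF-16 decoder. It decodes without complaint. Only a few sequences, such as unpaired surrogates, would be rejected, which is why "most" rather than "all" binary data passes:

```python
# The 8-byte PNG file signature: binary data, definitely not text.
png_signature = b"\x89PNG\r\n\x1a\n"

# Decoding as little-endian UTF-16 succeeds: every pair of bytes forms
# a 16-bit code unit, and none of these happens to be a surrogate.
text = png_signature.decode("utf-16-le")

# Four "characters" of nonsense, but no decoding error.
print(len(text), [hex(ord(ch)) for ch in text])
```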
Some languages include code to take a stab at the likely encoding, and
you can probably rule out some values (such as UTF-8 with invalid
bytes), but your best guess is just going to be a guess...
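Ruling out candidates can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name plausible_encodings, the candidate list, and the sample bytes are all hypothetical, and real detection libraries use statistical heuristics rather than this simple test.

```python
def plausible_encodings(data, candidates):
    """Return the candidate encodings under which data decodes without error."""
    survivors = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # ruled out: these bytes are invalid in this encoding
        survivors.append(name)
    return survivors


# "café" with the é stored as the single byte 0xE9 (illustrative sample).
sample = b"caf\xe9"
result = plausible_encodings(
    sample, ["ascii", "utf-8", "iso-8859-1", "iso-8859-15"]
)
print(result)
```

Here ASCII and UTF-8 are eliminated because 0xE9 is not valid in either, but both ISO-8859 variants survive, so what is left is still a guess.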
What are you trying to achieve? There may be a better way to do it.
Regards,
Daniel