[wellylug] utility to determine text file encoding?

Daniel Pittman daniel at rimspace.net
Mon Mar 2 18:26:32 NZDT 2009


Joe Mahoney <joe at cheerschopper.com> writes:
> On Mon, Mar 2, 2009 at 5:33 PM, Daniel Pittman <daniel at rimspace.net> wrote:
>
>> No, because this is an impossible task.  Unless the file contains
>> embedded or external metadata you can /guess/, but not actually know.
>>
> Yeah, I knew it was black magic, I just wondered if anyone had had a
> crack at a best guess app.

Well, the ICU classes used to implement encoding conversion include a
statistical and algorithmic model for guessing the charset.  I don't know
of anything that wraps that in a command-line tool, but the approach is
described here:

http://icu-project.org/docs/papers/Automatic_Charset_Recognition_IUC29.ppt
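
For a sense of what a "best guess" looks like in practice, here is a rough
Python sketch.  It is not the ICU model, which is statistical; it only
sniffs for a byte-order mark and then tries strict decoding.  The function
name guess_encoding and the Latin-1 fallback are assumptions made for
illustration, and the result is a guess, never a certainty.

```python
import codecs

# Byte-order marks, with the UTF-32 ones first: the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes) -> str:
    """Return a best-guess codec name for data (hypothetical helper)."""
    # 1. A BOM is embedded metadata, so it is the strongest hint we have.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # 2. Pure 7-bit data is valid in ASCII (and in most supersets of it).
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass

    # 3. Non-ASCII bytes that still decode as strict UTF-8 are very
    #    likely UTF-8, because random 8-bit text rarely does.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # 4. Assumed fallback: Latin-1 accepts every byte sequence, so this
    #    answer carries no evidence at all.
    return "latin-1"
```

Calling guess_encoding(open(path, "rb").read()) gives a codec name to try.
Step 4 is where the black magic would have to start: telling one 8-bit
charset from another needs the kind of statistical model ICU uses.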

Regards,
        Daniel
