[wellylug] utility to determine text file encoding?

Peter Lambrechtsen plambrechtsen at gmail.com
Tue Mar 3 08:38:53 NZDT 2009


Or were you more interested in finding out whether a file is XML, HTML,
PostScript or something else, because it is missing a file extension and
you want to work out what it contains? That is a different question from
the code page of the text in the document, which is what that
SourceForge app detects.
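
To make the difference concrete, here is a rough Python sketch of the
two questions. It is only an illustration under my own assumptions: the
function names guess_encoding and sniff_kind are made up, it is not how
cpdetector or ICU work, and the encoding result is a best guess at most,
never a certainty.

```python
import codecs

# Byte order marks, longest first so UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes) -> str:
    """Guess a text encoding: BOM first, then trial decoding (hypothetical helper)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    for candidate in ("ascii", "utf-8"):
        try:
            data.decode(candidate)
            return candidate
        except UnicodeDecodeError:
            continue
    # Every byte sequence decodes as Latin-1, so this is a fallback, not a detection.
    return "latin-1"


def sniff_kind(data: bytes) -> str:
    """Guess the document type from its leading bytes (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    head = data[:512].lstrip().lower()
    if head.startswith(b"%!ps"):
        return "postscript"
    if head.startswith(b"<?xml"):
        return "xml"
    if b"<!doctype html" in head or b"<html" in head:
        return "html"
    return "unknown"
```

The first function answers "which code page?", the second answers "what
kind of file is this?". Neither can tell, say, Latin-1 from another
single-byte code page; that needs the statistical approach Daniel
mentioned.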



On 2/03/2009, at 10:31 PM, Joe Mahoney <joe at cheerschopper.com> wrote:

> I'll give it a crack. Thanks!
>
> Joe
>
> On Mon, Mar 2, 2009 at 6:50 PM, Peter Lambrechtsen
> <plambrechtsen at gmail.com> wrote:
>> From a bit of Googling: would cpdetector on SourceForge do what you want?
>>
>>
>>
>> On 2/03/2009, at 6:26 PM, Daniel Pittman <daniel at rimspace.net> wrote:
>>
>>> Joe Mahoney <joe at cheerschopper.com> writes:
>>>> On Mon, Mar 2, 2009 at 5:33 PM, Daniel Pittman
>>>> <daniel at rimspace.net> wrote:
>>>>
>>>>> No, because this is an impossible task.  Unless the file contains
>>>>> embedded or external metadata you can /guess/, but not actually
>>>>> know.
>>>>>
>>>> Yeah, I knew it was black magic, I just wondered if anyone had had a
>>>> crack at a best guess app.
>>>
>>> Well, the ICU classes used to implement coding conversion include a
>>> statistical and algorithmic model.  I don't know of anything that has
>>> implemented that in a command-line wrapper:
>>>
>>> http://icu-project.org/docs/papers/Automatic_Charset_Recognition_IUC29.ppt
>>>
>>> Regards,
>>>        Daniel
>>>
>>>
>>> --
>>> Wellington Linux Users Group Mailing List: wellylug at lists.wellylug.org.nz
>>> To Leave:  http://lists.wellylug.org.nz/mailman/listinfo/wellylug
>>
>>
>>
>
>


