[wellylug] utility to determine text file encoding?

Stephen Judd stephen at vital.org.nz
Tue Mar 3 07:24:00 NZDT 2009


On Mon, 2009-03-02 at 17:21 +1300, Joe Mahoney wrote:
> Hi All
> 
> Is there a nice little command line app that, given a text file, will
> tell me the encoding/charset of the file?

There is a Python module called "chardet" which guesses the encoding
correctly a lot of the time:

http://chardet.feedparser.org/

stephen@lung:~$ python

>>> import urllib
>>> urlread = lambda url: urllib.urlopen(url).read()
>>> import chardet
>>> chardet.detect(urlread("http://google.cn/"))
{'encoding': 'GB2312', 'confidence': 0.99}
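
Since the question was about a command line tool for files, here is a
rough sketch of how the same chardet.detect() call might be wrapped in a
small script. It assumes Python 3 (the session above uses Python 2's
urllib.urlopen) and that chardet is installed. The names detect_file,
main and sample_size are made up for illustration and are not part of
chardet; the only chardet call used is detect(), which takes raw bytes
and returns a dict with 'encoding' and 'confidence' keys as shown above.

```python
import sys


def detect_file(path, detect=None, sample_size=65536):
    """Guess the encoding of the file at `path`.

    `detect` is any callable taking bytes and returning a dict with
    'encoding' and 'confidence' keys. It defaults to chardet.detect.
    `sample_size` (an arbitrary choice) caps how many bytes are read.
    """
    if detect is None:
        import chardet  # third-party module, assumed to be installed
        detect = chardet.detect
    # Read raw bytes: decoding must not happen before detection.
    with open(path, "rb") as fh:
        raw = fh.read(sample_size)
    return detect(raw)


def main(argv, detect=None):
    """Print one 'path: encoding (confidence N)' line per file."""
    for path in argv:
        result = detect_file(path, detect=detect)
        print("%s: %s (confidence %.2f)"
              % (path, result["encoding"], result["confidence"]))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a hypothetical name such as guess_encoding.py, it could be
run as "python3 guess_encoding.py somefile.txt". The result is still
only a guess, so a low confidence value is worth treating with
suspicion.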
