[wellylug] Identifying Duplicate Files

David Antliff dave.antliff at paradise.net.nz
Mon Dec 20 17:04:42 NZDT 2004


On Sat, 18 Dec 2004, Jamie Dobbs wrote:

> I have a directory full of sound bites and clip art (approx. 45,000 files)
> that I have collected over many years.
> I want to search the entire directory (and subdirectories) and find any
> duplicate files, by content rather than by filename or file size. Can anyone
> tell me of any command line programs for Linux that might allow me to do
> this, then give me the option of deleting (or moving) the duplicate files?

Here we go - the basis of something along those lines at least:

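# hash regular files only, and skip the output file itself
$ find . -type f ! -path ./list -exec md5sum {} + > list
$ sort list > list.sorted
# compare only the 32-character hash; print one line per repeated hash
$ uniq -w 32 -d list.sorted > duplicates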
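
Note that uniq -d prints only one line for each group of identical hashes, so "duplicates" names one representative per group rather than every copy. As a rough sketch of a possible next step, the function below prints every file except the first in each group, which is the set that could be deleted or moved. The name list_extra_copies is made up for illustration. It assumes GNU md5sum output (32 hex digits, two separator characters, then the path) and file names without newlines or backslashes:

```shell
# list_extra_copies DIR (hypothetical helper, not an existing tool):
# print every file under DIR whose content hash matches an earlier file,
# i.e. everything except the first copy in each group of identical files.
list_extra_copies() {
    find "$1" -type f -exec md5sum {} + |
        sort |
        # $1 is the hash; the path starts at column 35 of md5sum's output.
        # The first line seen for a hash is skipped, later ones are printed.
        awk 'seen[$1]++ { print substr($0, 35) }'
}
```

Something like "list_extra_copies . > /tmp/extras" gives a list to look over before acting on it. Which copy counts as "first" is simply whichever path sorts first, so check the list by hand if it matters which one survives. Once it looks right, a loop such as: while IFS= read -r f; do rm -- "$f"; done < /tmp/extras would remove the extra copies.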


-- 
David.



