[wellylug] Identifying Duplicate Files

David Antliff dave.antliff at paradise.net.nz
Mon Dec 20 17:02:12 NZDT 2004


On Sat, 18 Dec 2004, Jamie Dobbs wrote:

> I have a directory full of sound bytes and clip art (approx. 45,000 files) 
> that I have collected over many years.
> I want to search the entire directory (and subdirectories) and find any 
> duplicate files, by content rather than by filename or filesize. Can anyone 
> tell me of any command line programs for Linux that might allow me to do 
> this, then give me the option of deleting (or moving) the duplicate files.

By content but not filesize... you want to discriminate between files that 
are the same size but have different contents, right?

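How about something like this (assumes GNU find and coreutils):

    find . -type f -exec md5sum {} + | sort | uniq -w 32 --all-repeated=separate

uniq -w 32 compares only the first 32 characters of each line, i.e. the MD5 
digest. Plain uniq -d would print just one file from each set of 
duplicates; --all-repeated=separate prints every file in the set, with a 
blank line between sets.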

This gives you a list of files whose contents (by MD5 checksum) match 
another file's, which you can then pipe through something else to clean 
them up.

?
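
Since you also asked for the option of deleting or moving the duplicates, 
here is a rough sketch of that clean-up step. It's only a sketch under some 
assumptions: dedupe_move and the DRY_RUN variable are names I've made up, it 
needs GNU coreutils (uniq --all-repeated, mv --backup), and it assumes no 
file names contain newlines or backslashes (md5sum escapes those). In each 
set of identical files it keeps the first path in sorted order and moves 
the rest into a separate directory rather than deleting them.

```shell
#!/bin/sh
# dedupe_move SRC DEST -- hypothetical helper name; a sketch, not a finished tool.
# For each group of files under SRC with matching MD5 checksums, keep the
# first path in sorted order and move the others into DEST.
# Assumes GNU coreutils (md5sum, uniq -w/--all-repeated, mv --backup) and
# file names with no newlines or backslashes (md5sum escapes those).
# DEST should lie outside SRC. Set DRY_RUN=1 to print the plan without moving.
dedupe_move() {
  src=$1
  dest=$2
  mkdir -p "$dest"
  prev=""
  find "$src" -type f -exec md5sum {} + \
    | LC_ALL=C sort \
    | uniq -w 32 --all-repeated=separate \
    | while IFS= read -r line; do
        [ -n "$line" ] || continue      # blank lines separate the groups
        sum=${line%% *}                 # text before the first space: the digest
        file=${line#"$sum"}             # drop the digest...
        file=${file#??}                 # ...and md5sum's two-character separator
        if [ "$sum" != "$prev" ]; then
          prev=$sum                     # first file of a new group: keep it
          printf 'keep: %s\n' "$file"
        else
          printf 'move: %s\n' "$file"
          if [ "${DRY_RUN:-0}" != 1 ]; then
            # --backup=numbered stops same-named duplicates overwriting each other
            mv --backup=numbered -- "$file" "$dest/"
          fi
        fi
      done
}
```

You'd run something like DRY_RUN=1 dedupe_move ~/clipart ~/clipart-dups 
first to see what it would do, then again without DRY_RUN. Moving rather 
than deleting means you can look over the duplicates directory before 
throwing anything away.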

-- 
David.