[wellylug] Identifying Duplicate Files

Andrej andrej at paradise.net.nz
Mon Dec 20 17:25:18 NZDT 2004


> --- Jamie Dobbs <jamie.dobbs at orcon.net.nz> wrote:
> I have a directory full of sound bytes and clip art (approx.
> 45,000 files) that I have collected over many years.
> I want to search the entire directory (and sub directories)
> and find any duplicate files, by content rather than by
> filename or filesize. Can anyone tell me of any command line
> programs for Linux that might allow me to do this, then give
> me the option of deleting (or moving) the duplicate files.
The thing is that diff or other common tools can't do anything
like pattern matching: if the sizes don't match, the contents
won't either. There are tools out there that match graphics
files against each other by content, but I don't know how
reliably they work.

I'd go with Brent and first of all compare file names and
sizes. As for MP3s, you can probably try some tools that
extract ID3 tags and run comparisons on those ...
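
A rough sketch of the size-first comparison, in Python. This is an
illustration under assumptions, not a tested tool: the function names
file_digest and find_duplicates are made up for this example, and
hashing same-sized files with SHA-256 is one possible way to confirm
identical content (a byte-by-byte comparison would also work). File
names are ignored here, since the original question asks for matches
by content.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are identical."""
    # Pass 1: bucket regular files by size (cheap, no file reads).
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share a size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling find_duplicates on the top-level directory returns one list
per set of identical files. The sketch only reports them; deleting or
moving is left out on purpose so the groups can be checked by hand
first.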

 



