[wellylug] Identifying Duplicate Files
Grant McLean
grant at mclean.net.nz
Mon Dec 20 17:01:50 NZDT 2004
On Sat, 2004-12-18 at 17:32, Jamie Dobbs wrote:
> I want to search the entire directory (and sub directories) and find any
> duplicate files, by content rather than by filename or filesize.
You could use md5sum, which will give a hex 'hash' of the contents of
each file, e.g.:
ef9630b73a7193029f35f229fcca0f48 cavalry.gif
4a50e5b47f84496a98e2492efd02e344 cavalry.jpg
8ee6a4bcb0bb776d6aa05b01cc8e6a7d compass.gif
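As a quick sanity check that this matches on content rather than
filename, a copy of a file gets exactly the same hash (copy.gif below
is just a hypothetical copy):

$ cp cavalry.gif copy.gif
$ md5sum cavalry.gif copy.gif
ef9630b73a7193029f35f229fcca0f48  cavalry.gif
ef9630b73a7193029f35f229fcca0f48  copy.gif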
You could then identify all files with the same hash. Here's an
incredibly ugly Perl one-liner to do that:

find . -type f -print0 | xargs -0 md5sum | perl -ne \
  '($m, $f) = split " ", $_, 2; chomp $f; push @{$d{$m}}, $f;
   END { foreach (values %d) { next unless @$_ > 1;
     print join "\n", @$_, "", "" } }'
My mailer may wrap that, but the line breaks shown above are safe to
paste as-is.
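If the one-liner is too cryptic, the same logic works as a short
standalone script (a sketch; the name dupes.pl is just hypothetical):

#!/usr/bin/perl
use strict;
use warnings;

# Read "hash  filename" lines from md5sum on STDIN and group the
# filenames by hash, e.g.:
#   find . -type f -print0 | xargs -0 md5sum | perl dupes.pl
my %dupes;
while (<STDIN>) {
    chomp;
    my ($md5, $file) = split ' ', $_, 2;
    push @{ $dupes{$md5} }, $file;
}

# Print each group of two or more files sharing a hash, with a
# blank line between groups.
foreach my $files (values %dupes) {
    next unless @$files > 1;
    print join("\n", @$files), "\n\n";
}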
Cheers
Grant