Organizing the virtual sock drawer
I spent yesterday on the digital equivalent of reorganizing my sock drawer: reconciling two (legal) 10 gig mp3 collections, one on my laptop and one on my linux machine, into a single 16 gig collection.
The tracks had been ripped with at least two programs (iTunes and cdparanoia) and four different naming conventions. Each platform had songs ripped to filenames that the other couldn’t understand, because of either length or character set issues. Some of the albums had been copied from one machine to the other and had their metadata corrected on the new platform. Those were generally multidisk sets, just to helpfully use more space. And to top it off, I didn’t have enough spare space on any single drive to hold the entire possible 20 gig combination of the two, and I didn’t want to touch up filenames or merge by hand.
So to sum it up, I had duplicate files with different names and (slightly) different contents, and the ones most likely to be like that belonged to the collections that took up the most space. And I didn’t want to do any of it by hand.
So after playing with perl to make a bunch of consistent, safely named soft links on each platform, md5summing both archives, and transferring the files that didn’t have matching checksums, I wound up within 200 megs of filling the target, while only transferring an extra gig or so. I also realized that checking for duplicates manually might have been faster.
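For anyone wanting to try the same approach, here is a rough Python sketch of the idea, not the perl and md5sum I actually used. The function names (safe_name, md5_of, checksum_index, files_to_transfer) are made up for illustration, it assumes both collections are reachable as local directories, and it leaves out the soft link creation entirely. Note that a retagged copy of a track has a different checksum, so it still counts as a file to transfer.

```python
import hashlib
import re
from pathlib import Path


def safe_name(name: str) -> str:
    """Reduce a filename to lowercase ASCII letters, digits, '.', '_' and '-'."""
    # Any run of other characters (spaces, punctuation, accents) becomes one '_'.
    cleaned = re.sub(r"[^a-z0-9._-]+", "_", name.lower())
    return cleaned.strip("_") or "unnamed"


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 of a file, reading it in chunks to keep memory flat."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_index(root: Path) -> dict[str, Path]:
    """Map each md5 checksum to one .mp3 file under root that carries it."""
    index: dict[str, Path] = {}
    for path in sorted(root.rglob("*.mp3")):
        index.setdefault(md5_of(path), path)
    return index


def files_to_transfer(source: Path, target: Path) -> list[Path]:
    """List source files whose contents do not already exist on the target."""
    target_sums = checksum_index(target)
    return [
        path
        for digest, path in checksum_index(source).items()
        if digest not in target_sums
    ]
```

Calling files_to_transfer(Path("laptop_music"), Path("linux_music")) with two placeholder directories would list only the laptop files whose bytes are not already on the linux side, whatever they happen to be named.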
I think there’s one other learning point here: automated processes are no match for bad metadata.