Checksum mp3 audio frames (the data and not the headers)

DJ 1E and I have often lamented the difficulties of version control with our audio collections. Particularly with our usage of MusicBrainz, there are often pre- and post-processed files that have the same audio data but different headers. Traditional full-file checksums (e.g. md5sum) produce different sums for these files because of the header variations. The ideal would be a tag program that can checksum the mp3/ogg/m4a/aac audio stream and store the generated hash as a tag, similar to how flac handles its crc sums.

I did a bit more digging into this matter today and it seems that some code is moving in this direction. I found a good but dated thread over at Hydrogen Audio asking a similar question: "Is there a Tool for MP3 checksum generation?, Audio part only". This thread introduces LAMEtag, a Windows command line exe with a GUI component as well. Unfortunately, this otherwise handy utility only works on LAME-encoded mp3s (by design, obviously). It is good for verifying CRC checksums if they were encoded into the frames with LAME. While some of my collection fits this bill, I need a more general solution.

The same thread goes on to mention MP3tag, another Windows utility (GUI) that offers a wealth of features, including md5 checksums of audio only. It took me a bit of trial and error to realize that generating and validating these sums has to be done through the program's export functionality, using an export script like this one:

    $filename(txt)$loop(%_filename_ext%)%_filename_ext% %_md5audio% $loopend()

I eyeballed the results and sure enough, they matched. Unfortunately, my archives live in Linux and having to transfer two similar files every time I want to compare is inconvenient. I need command line!

SebastianG offers a java .jar called mp3d5.jar that would apparently do the trick, but the only links I can find to that bit of magical code turn up 404.
Fortunately, this post at StackOverflow is on the same path and provided me with the silver (plated at least) bullet for this issue: "Calculate checksum of audio files without considering the header".
The author wants the same thing I am searching for: the ability to generate a checksum of the audio stream and store it in the file header as a tag. Furthermore, he mentions his use of mp3cat! I pulled down a copy of mp3cat and compiled it on my archive box. Then the fun began.

File1 and File2 are two mp3s with the same header information and content, but differing file sizes. File3 is a known "non-matching" mp3.

    % ./mp3cat - - < file1.mp3 | md5sum
    e9f10503ea7afd9adf676e6f20370e45  -
    % ./mp3cat - - < file2.mp3 | md5sum
    e9f10503ea7afd9adf676e6f20370e45  -
    % ./mp3cat - - < file3.mp3 | md5sum
    793774c7956a5fcacb7062b58b8c4677  -
    % eyeD3 file1.mp3 > file1.id3
    % eyeD3 file2.mp3 > file2.id3
    % diff file1.id3 file2.id3
    2c2
    < file1.mp3 [ 5.20 MB ]
    ---
    > file2.mp3 [ 5.20 MB ]
    6c6
    < ID3 v2.4:
    ---
    > ID3 v2.3:
    % ls -l *.mp3
    -rwx------ 1 tmo tmo 5447834 2009-01-12 01:19 file1.mp3
    -rwxr-xr-x 1 tmo tmo 5448730 2009-07-19 18:31 file2.mp3

See the diff? Conflicting ID3v2 tag versions. One file is using 2.3 and the other is using 2.4, which would likely explain the negligible file size differences. Firing up MusicBrainz Picard and looking at my configuration, I see that I'd checked the "Write ID3v2.3 tags" option (instead of the 2.4 default).
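The version mismatch that eyeD3 reported can also be read straight from the first ten bytes of each file, since an ID3v2 tag starts with a fixed-size header. The following is a minimal Python sketch of that check, not part of mp3cat or eyeD3; the function name `read_id3v2_header` is made up for illustration, and it assumes the tag sits at the very start of the file.

```python
import sys


def read_id3v2_header(data: bytes):
    """Return (major, revision, total_tag_bytes), or None if no ID3v2 tag.

    Hypothetical helper for illustration. `data` must hold at least the
    first 10 bytes of the file.
    """
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    major, revision, flags = data[3], data[4], data[5]
    size_bytes = data[6:10]
    # The tag size is "synchsafe": only the low 7 bits of each byte count.
    if any(b & 0x80 for b in size_bytes):
        return None
    size = (size_bytes[0] << 21) | (size_bytes[1] << 14) \
        | (size_bytes[2] << 7) | size_bytes[3]
    total = 10 + size  # 10-byte header plus the tag body
    # ID3v2.4 may append a 10-byte footer, signalled by flag bit 0x10.
    if major == 4 and flags & 0x10:
        total += 10
    return major, revision, total


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            info = read_id3v2_header(f.read(10))
        if info is None:
            print(f"{path}: no ID3v2 tag at start of file")
        else:
            major, revision, total = info
            print(f"{path}: ID3 v2.{major}.{revision}, {total} tag bytes")
```

Run against file1.mp3 and file2.mp3, this should show the 2.4 versus 2.3 split along with how many bytes each tag occupies, which is a quick way to see whether the tags alone account for the size difference.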
I unchecked this box but still had questions. What’s the difference between 2.3 and 2.4? What do the different character encodings afford? A glance at the help file for MusicBrainz reveals the following
That explains a good chunk of the difference. Going forward I will use the 2.4 tags, though I have some concerns as to whether the newer tags will confuse some of my older hardware players (my 500GB iRiver, for instance). This also introduces another variable or two that I hadn't considered: ID3 tag versioning and the character encoding of said tags. Ideally, I'd want backward compatibility with ID3v1 for ancient devices. But for the sake of functionality and depth of options, I'd want to eventually migrate my tags (across ogg, mp3, aac, m4a) to v2.4 with either UTF-16 or UTF-8. I am still confused about the character encoding. The always informative Hydrogen Audio forums covered the topic in this thread: "UTF-8 vs UTF-16 in ID3 Tags?" The takeaway for me being:
mp3cat seems to be the solution for now, and I will benefit from creating a script that uses it to verify files in my collection.
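As a starting point for that kind of verification script, here is a minimal Python sketch. Note that it swaps in a simpler technique than mp3cat: instead of walking the mp3 frames, it just cuts off a leading ID3v2 tag and a trailing ID3v1 tag and hashes whatever is left. The names `strip_id3` and `audio_md5` are made up for illustration. Because it ignores APE and Lyrics3 tags, padding, and stray junk between frames, its sums should not be expected to match the output of `mp3cat - - | md5sum` or MP3tag's %_md5audio%; they are only comparable with each other.

```python
import hashlib
import sys


def strip_id3(data: bytes) -> bytes:
    """Return `data` without a leading ID3v2 tag or a trailing ID3v1 tag.

    Hypothetical helper: assumes at most one ID3v2 tag at the start and
    one 128-byte ID3v1 tag at the end, and knows nothing about other tags.
    """
    start, end = 0, len(data)
    if len(data) >= 10 and data[:3] == b"ID3":
        # Synchsafe size: 4 bytes, 7 significant bits each.
        size = ((data[6] & 0x7F) << 21) | ((data[7] & 0x7F) << 14) \
            | ((data[8] & 0x7F) << 7) | (data[9] & 0x7F)
        start = 10 + size
        # ID3v2.4 footer flag adds another 10 bytes after the tag body.
        if data[3] == 4 and data[5] & 0x10:
            start += 10
        start = min(start, end)
    # ID3v1 is a fixed 128-byte block at the end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return data[start:end]


def audio_md5(path: str) -> str:
    """MD5 hex digest of the file contents with ID3 tags stripped."""
    with open(path, "rb") as f:
        return hashlib.md5(strip_id3(f.read())).hexdigest()


if __name__ == "__main__":
    # Print md5sum-style lines so the output can be sorted and diffed.
    for path in sys.argv[1:]:
        print(f"{audio_md5(path)}  {path}")
```

Saved under a name of your choosing and run over a directory of mp3s, the output can be sorted so that files sharing a sum end up on adjacent lines. Reading each file fully into memory keeps the sketch short and is fine for files of a few megabytes.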
Copyright © 2012 Tim's Mind Organized - All Rights Reserved