by Paul Trilsbeek
Field linguists often ask me whether they shouldn’t be recording audio in high definition, 24 bit 96 kHz format, because their recorder has this option and the higher the quality the better, right? Well, not really. I’ll try to explain why it doesn’t make much sense to do so and why we even convert all audio recordings that we receive at The Language Archive to 16 bit 44.1 or 48 kHz.
When the digital audio CD standard was developed, it was argued that a digital representation of the audio signal using 16 bits and a sampling frequency of 44.1 kHz was sufficient to capture all the details a human being would be able to hear in a musical recording. For most types of music that is indeed the case; only some highly dynamic music, with both very loud and very quiet passages, might not fit within the 96 dB of dynamic range that 16 bits of audio resolution offer. Nonetheless, audio equipment companies such as Philips and Sony saw the need to introduce newer formats such as the Super Audio CD and DVD-Audio at the end of the nineties, quite possibly driven by the idea of having consumers replace their perfectly fine CD players with the latest state of the art. Both turned out to be commercial failures. Still, high-definition audio has gained some ground in the recording industry and, in recent years, also in “prosumer” audio recording equipment.
Before I go into the question of whether humans can actually hear a difference between HD and regular CD-quality audio, let me give some technical arguments for why it makes little to no sense for field linguists to record in 24 bit 96 kHz or higher.
Many inexpensive portable audio recorders these days offer the option to record in 24 bit at 96 kHz. A sampling frequency of 96 kHz means that in theory you can record frequencies up to 48 kHz (half the sampling frequency), more than double the highest frequency that (young) human beings can hear, which is about 20 kHz, and far beyond the highest frequency components present in a speech signal (about 7 kHz). The built-in microphones in these types of recorders, however, do not capture anything above 16 kHz at most, so recording higher frequencies requires an external microphone. There are microphones on the market that record frequencies up to 40 or 50 kHz, but these are not the kind of microphones a linguist would typically take into the field, even if they were within their budget (>3000 € a piece).
The same is true for dynamic range. 16 bit recordings have a theoretical dynamic range of 96 dB; 24 bit recordings have a theoretical dynamic range of 144 dB. The background noise in a very quiet room has a sound pressure level of about 20-30 dB, while the human pain threshold lies around 130 dB. Human speech has a dynamic range of about 40 dB. Very good microphones have a dynamic range of about 120 dB, but the type of microphone a linguist is likely to use in the field does not exceed about 75 dB. From a technical point of view, recording high-definition audio only makes sense with top-quality recording equipment, for example in a recording studio or a high-end digitization facility.
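The arithmetic behind these figures is simple enough to check. The sketch below computes the Nyquist frequency (half the sampling rate) and the theoretical dynamic range of linear PCM as 20·log10(2^bits), about 6.02 dB per bit, which reproduces the 96 dB and 144 dB quoted above. The constant names are illustrative, and the values are the rough figures from the text rather than measurements.

```python
import math

def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sampling rate can represent (half the rate)."""
    return sample_rate_hz / 2

def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of linear PCM: full scale vs. one
    quantization step, 20*log10(2**bits), i.e. about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)

# Rough figures quoted in the text (approximate, not measurements).
SPEECH_MAX_HZ = 7_000
BUILT_IN_MIC_MAX_HZ = 16_000
FIELD_MIC_RANGE_DB = 75

for rate in (44_100, 48_000, 96_000):
    print(f"{rate} Hz sampling -> content up to {nyquist_hz(rate):.0f} Hz")

for bits in (16, 24):
    print(f"{bits} bit -> {dynamic_range_db(bits):.1f} dB")

# Even plain CD format exceeds what a typical field recording chain delivers.
print(nyquist_hz(44_100) > BUILT_IN_MIC_MAX_HZ > SPEECH_MAX_HZ)  # True
print(dynamic_range_db(16) > FIELD_MIC_RANGE_DB)                # True
```

Note that the common "6.02·N + 1.76 dB" figure describes the signal-to-noise ratio for a full-scale sine wave; the simpler ratio used here matches the rounded numbers in the text.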
Some argue that recording in 24 bit leaves more “headroom” for unexpected peaks when setting the recording level. This only holds, though, for the level of the analog line-level signal that goes into the recorder’s analog-to-digital converter. Most portable audio recorders only let one adjust the input gain of the microphone preamplifier, which has to be set properly anyway to achieve a good signal-to-noise ratio, regardless of whether one records in 16 or 24 bit.
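To put rough numbers on the headroom argument: even a generous safety margin below full scale still leaves a 16 bit recording with more range than a typical field microphone delivers, so the microphone and preamplifier, not the bit depth, remain the limiting factor. The 12 dB headroom below is a hypothetical value chosen for illustration, and the 75 dB microphone figure is the rough estimate from the text.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of linear PCM, about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)

def usable_range_db(bits: int, headroom_db: float) -> float:
    """Range left between the quantization floor and the highest level aimed
    for, after reserving `headroom_db` below full scale for unexpected peaks."""
    return dynamic_range_db(bits) - headroom_db

FIELD_MIC_RANGE_DB = 75  # rough figure for a typical field microphone (from the text)
HEADROOM_DB = 12         # hypothetical safety margin, for illustration only

for bits in (16, 24):
    left = usable_range_db(bits, HEADROOM_DB)
    print(f"{bits} bit, {HEADROOM_DB} dB headroom -> {left:.1f} dB left; "
          f"exceeds microphone range: {left > FIELD_MIC_RANGE_DB}")
```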
Some analogue carriers can actually reproduce sound beyond the limits of the digital audio CD specification. For example, 1/4 inch open-reel audio tape recorded and played back on a studio recorder with Dolby SR noise reduction can achieve a dynamic range of over 100 dB, and commercially produced vinyl records can in some cases contain frequencies of up to 50 kHz. For archives dealing with these kinds of materials, it makes sense to digitize them in high-definition formats in order to capture the originals faithfully.
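Working backwards from a source’s dynamic range and highest frequency shows which format is actually needed. The sketch below picks the smallest bit depth and sampling rate from a list of common PCM values (the lists are an assumption for illustration) that cover a given source, using the figures mentioned above for tape and vinyl.

```python
import math

# Common PCM bit depths and sampling rates, assumed here for illustration.
STANDARD_BIT_DEPTHS = (16, 24, 32)
STANDARD_RATES_HZ = (44_100, 48_000, 88_200, 96_000, 176_400, 192_000)

def min_bit_depth(dynamic_range_db: float) -> int:
    """Smallest listed bit depth whose theoretical range
    (20*log10(2**bits), about 6.02 dB per bit) covers the source."""
    for bits in STANDARD_BIT_DEPTHS:
        if 20 * math.log10(2 ** bits) >= dynamic_range_db:
            return bits
    raise ValueError("no listed bit depth is large enough")

def min_sample_rate(max_freq_hz: float) -> int:
    """Smallest listed sampling rate whose Nyquist frequency (rate / 2)
    lies above the highest frequency in the source."""
    for rate in STANDARD_RATES_HZ:
        if rate / 2 > max_freq_hz:
            return rate
    raise ValueError("no listed sampling rate is high enough")

print(min_bit_depth(100))       # 24 (Dolby SR tape, >100 dB)
print(min_sample_rate(50_000))  # 176400 (vinyl with content up to 50 kHz)
print(min_sample_rate(7_000))   # 44100 (speech)
```

The strict inequality is only the theoretical minimum; real converters need some margin above the highest frequency for their anti-aliasing filters.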
It is still debated whether humans can actually hear the difference between CD-quality and high-definition audio. Audiophiles claim that the presence of frequencies above the limit of human hearing does influence the frequencies that we do hear. Blind listening tests, however, have shown that even expert listeners performed at chance level when asked to judge whether a recording was high definition or not (Meyer and Moran, 2007). To rule out differences between the recordings themselves, the same high-definition recordings were played both with and without a device in the chain that reduced them to regular CD quality, while the rest of the playback setup (loudspeakers, amplifiers, cables, etc.) was left identical.
The main disadvantages of recording with high sampling frequencies and bit depths are that the recordings take up more storage space and that they are less compatible with audio software and hardware. Recordings made in 24 bit/96 kHz take up roughly three times as much storage space as CD-quality recordings, and even though flash memory cards are getting cheaper every month, this is still a drastic reduction in recording capacity for no real-world benefit in terms of quality. Recording in 24 bit at normal sampling frequencies (44.1 kHz/48 kHz) creates files that are 50% larger than 16 bit files, which isn’t too dramatic and could be justified when using very high-grade microphones and recording equipment. The fact that not all audio software and hardware can play back high-definition formats may cause problems when working with the files on a computer.
As an archive, we would therefore need to create additional copies in standard CD quality so that everyone can use the files. Instead of creating duplicate files in different qualities, we have chosen to normalize high-definition audio and convert it to regular 16 bit at 44.1 or 48 kHz. The normalization step before the conversion ensures that we use the full 96 dB of dynamic range that 16 bits offer, which is more than enough to retain the full quality of the recordings we receive.
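Both points can be sketched in a few lines. The first part computes uncompressed PCM file sizes, showing the exact ratios behind the storage comparison. The second part is a minimal illustration of peak normalization followed by conversion to 16 bit integers; it is not the archive’s actual tool, the function names are hypothetical, and it omits the dither usually added when reducing bit depth as well as the low-pass filtering needed when resampling from 96 kHz.

```python
def bytes_per_minute(sample_rate_hz: int, bits: int, channels: int = 2) -> int:
    """Size of uncompressed PCM audio per minute (ignoring file headers)."""
    return sample_rate_hz * (bits // 8) * channels * 60

hd = bytes_per_minute(96_000, 24)
print(f"{hd / bytes_per_minute(44_100, 16):.2f}x")  # 3.27x vs 16 bit/44.1 kHz
print(f"{hd / bytes_per_minute(48_000, 16):.2f}x")  # 3.00x vs 16 bit/48 kHz
print(f"{bytes_per_minute(48_000, 24) / bytes_per_minute(48_000, 16):.2f}x")  # 1.50x

def normalize_to_int16(samples: list[float]) -> list[int]:
    """Peak-normalize floating-point samples (nominally in [-1.0, 1.0]) and
    quantize them to 16 bit integers so the loudest peak reaches full scale.
    Plain rounding only: no dither, no resampling."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return [0] * len(samples)  # silence stays silence
    scale = 32767 / peak           # map the peak to the largest positive int16
    return [round(s * scale) for s in samples]

print(max(abs(x) for x in normalize_to_int16([0.0, 0.25, -0.5])))  # 32767
```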
E. Brad Meyer and David R. Moran (2007). “Audibility of a CD-Standard A/D/A Loop Inserted into High-Resolution Audio Playback”, Journal of the Audio Engineering Society, vol. 55, no. 9, pp. 775-779.