Posted on 30th October 2018 by Callum
In this post, we’ll be talking about compression. This isn’t audio compression as you may find as a plugin in your DAW or on your pedalboard. We are talking about compression of file sizes! Whether you prefer listening to digital or analogue audio, most music consumed today is digital. That may be downloaded onto your mobile or streamed directly from services such as Spotify, YouTube or Apple Music. It’s all digital audio. Because of physical limitations such as download speeds and hard drive space, it’s important to get these digital files as small as possible, whilst still preserving audio quality. So how is it done? What’s a WAV and what’s an mp3?
There are two types of compressed digital audio formats: “Lossless” and “Lossy”. Lossless compression is so named because no audio information is lost in the compression process. Lossy compression reduces the file size by discarding some of the audio information, and therefore loses some quality. The trade-off is between file size and audio quality: lossy compression can shrink a file far more than lossless compression can, but there is an inherent reduction in quality.
As an example, one minute of CD-quality stereo WAV is around 10 MB, whereas an MP3 at 320 kbps is around 2.4 MB.
Codecs are devices, computer programs or systems used for encoding and decoding digital data. The format a codec produces can usually be identified from the file extension. Examples of audio codecs and formats include MP3, WAV and AAC. Non-audio file formats follow the same idea, for example .doc (Microsoft Word document) or .html (web page). They are effectively the “language” used to pack and unpack the data inside a particular file.
When we discuss audio quality, an important consideration is “Bit Rate”, measured in kbps (kilobits per second). For uncompressed audio, this value comes from multiplying the Sample Rate by the Bit Depth and by the number of channels (two for stereo). Sampling means taking an instantaneous “snapshot” of an audio signal in order to build a digital file. Taking repeated snapshots and putting them in chronological order gives you an approximation of the audio signal. (Check out the audio interfaces post for a deeper explanation).
Sample Rate (Hz): The number of times per second that audio is sampled.
Bit Depth (bit): The number of bits used to store each sample, which sets the resolution of its amplitude.
So the more samples per second and the higher the bit depth, the more detail in the audio.
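To make the relationship concrete, here is a minimal Python sketch of the arithmetic. It assumes plain uncompressed (PCM) audio, where bit rate is sample rate × bit depth × number of channels. The function names are made up for illustration, and a megabyte is counted as 1,000,000 bytes.

```python
def bit_rate_kbps(sample_rate_hz, bit_depth, channels=2):
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


def size_mb(rate_kbps, seconds):
    """File size in megabytes (1 MB = 1,000,000 bytes), ignoring headers."""
    bits = rate_kbps * 1000 * seconds
    return bits / 8 / 1_000_000


# CD quality: 44.1 kHz, 16-bit, stereo
cd_kbps = bit_rate_kbps(44_100, 16, channels=2)
cd_minute_mb = size_mb(cd_kbps, 60)

# A 320 kbps MP3 for comparison (its bit rate is chosen by the encoder)
mp3_minute_mb = size_mb(320, 60)

print(f"CD bit rate: {cd_kbps:.1f} kbps")
print(f"One minute of CD audio: {cd_minute_mb:.1f} MB")
print(f"One minute of 320 kbps MP3: {mp3_minute_mb:.1f} MB")
```

Note that the channel count matters: a mono recording at the same sample rate and bit depth would be half the size.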
Whether you like it or not, most media ends up being lossy by necessity.
Lossy compression codecs use algorithms (which we will go into later) to try to maintain the highest audio quality whilst still cutting the file size considerably. At higher bit rates you may not actually be able to tell the difference between a WAV and an MP3. There is always a compromise between what is practical and keeping the quality high.
WAV files normally hold uncompressed PCM (pulse-code modulation) audio. This gives you the maximum quality audio and ‘fullest’ sound. It essentially means that every sampled point in time has an amplitude assigned to it. So if you are sampling at 44.1 kHz you’ll have 44,100 samples for every second of your audio, per channel. The compromise of this level of quality is that you end up with a big file.
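As a rough illustration of “one amplitude per sample”, the sketch below uses Python’s standard `wave` module to write one second of a test tone as 16-bit mono PCM into an in-memory WAV file. The 440 Hz tone, the half-scale amplitude and the helper names are arbitrary choices for the example, not anything specific to a particular DAW or interface.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100  # samples per second (CD quality)
BIT_DEPTH = 16        # bits per sample
CHANNELS = 1          # mono, to keep the example small


def sine_pcm(freq_hz, seconds):
    """Return 16-bit little-endian PCM bytes for a sine tone."""
    count = int(seconds * SAMPLE_RATE)
    amplitude = 0.5 * 32767  # half of full scale
    samples = (
        int(amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(count)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


def write_wav(pcm_bytes):
    """Wrap raw PCM bytes in a WAV container held in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(BIT_DEPTH // 8)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm_bytes)
    return buffer.getvalue()


pcm = sine_pcm(440, 1)
wav_bytes = write_wav(pcm)

# Read it back to confirm how many samples were stored.
with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
    frame_count = wav.getnframes()

print(f"Samples stored: {frame_count}")
print(f"Audio data: {len(pcm)} bytes, whole file: {len(wav_bytes)} bytes")
```

The file is almost entirely sample data: 44,100 samples at 2 bytes each, plus a small header describing the sample rate, bit depth and channel count.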
Maximum resolution WAV is the quality used in a recording studio environment. This is the closest to analogue audio we can currently get in the digital world. Studios often use very high bit rate audio, such as 192 kHz at 32-bit. This is because every time you convert or process the file, it can lose some quality. So, the idea is to start at the highest possible quality, in order to lose as little as possible by the time you are down to CD quality at 44.1 kHz and 16-bit.
FLAC is a form of lossless compression, which maintains the same uncompressed quality yet typically produces a file that is around 50-70% of the original size, depending on the material. Once you’re working with, say, your whole music library, the spare gigabytes will add up! How does it do it then? FLAC files contain the exact same information that’s in a WAV, but it is repacked in a more efficient way with respect to file size. This means you retain the maximum uncompressed quality but shave a good chunk off the file size.
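The defining property of lossless compression is that unpacking gives back exactly the bytes you started with. The sketch below demonstrates that round trip using Python’s built-in `zlib` as a stand-in. This is an assumption made to keep the example self-contained: zlib is a general-purpose compressor, not FLAC, and FLAC uses audio-specific techniques instead. The test tone is also deliberately repetitive, so it shrinks far more than real music would.

```python
import math
import struct
import zlib

# At 44.1 kHz, one period of a 441 Hz tone is exactly 100 samples,
# so one second of audio is that period repeated 441 times.
one_period = [int(16_000 * math.sin(2 * math.pi * i / 100)) for i in range(100)]
samples = one_period * 441

# Pack as 16-bit little-endian PCM, the same layout a WAV file uses.
raw = struct.pack(f"<{len(samples)}h", *samples)

packed = zlib.compress(raw, 9)      # stand-in for a lossless audio encoder
restored = zlib.decompress(packed)  # stand-in for the decoder

print(f"Original: {len(raw)} bytes, packed: {len(packed)} bytes")
print(f"Bit-for-bit identical after unpacking: {restored == raw}")
```

A lossy codec such as MP3 would fail the final check by design, because the decoded samples are only an approximation of the originals.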
But when we need to compress audio and make some significant space savings for online use, there’s currently no other option than to use lossy compression formats. There is a range of these, such as MP3, MP4, Ogg Vorbis, WMA and AAC.
Uncompressed CD audio (WAV) is 1,411.2 kbit/s, so the bitrates 128, 160, 192, 320 kbit/s (MP3) represent compression ratios of approximately 11:1, 9:1, 7:1 and 4:1 respectively.
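Those ratios are just the CD bit rate divided by each MP3 bit rate. A short Python check, using only the figures quoted above:

```python
CD_KBPS = 1411.2                 # uncompressed CD audio
MP3_KBPS = (128, 160, 192, 320)  # common MP3 bit rates

# Compression ratio = uncompressed bit rate / compressed bit rate
ratios = {rate: CD_KBPS / rate for rate in MP3_KBPS}

for rate, ratio in ratios.items():
    print(f"{rate} kbit/s MP3 is roughly {ratio:.1f}:1")
```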
So how do these formats make such a big saving on space? They use a variety of algorithms which take advantage of psychoacoustic phenomena, or how your brain actually perceives sound.
Adult Hearing Loss – Although the upper limit of human hearing is around 20 kHz, adults often lose the ability to hear above 16-18 kHz. This means the encoder can remove anything above this frequency with little perceptible effect.
De-Emphasize Quiet – Your brain and ears do something called “Simultaneous Masking”. This allows you to focus on a loud sound over the top of a background of quieter sounds. Think about having a conversation in a busy bar or restaurant. You can still chat over the background noise. MP3s remove some of the quieter sounds in the mix. These sounds don’t need to be encoded in such detail, as your brain largely ignores them anyway.
Temporal Masking – Your brain does a similar thing with sounds that occur a few milliseconds apart, and you only ‘hear’ the louder of the two. Your hearing works on a kind of latency, running around 30 ms behind reality. This is like the processing time for your brain, and means that you focus on the most important sounds from the last 30 ms. Doing this in an MP3 means removing some of the much quieter sounds around a louder one.
Minimum Audition Threshold – Quiet sounds are just that – very quiet. This means the encoder can save much less of the data for a quiet sound as we aren’t going to hear much of it anyway!
Bit Rate Management – This is where the big savings are made! If you go from a 24-bit recording down to 16-bit, you’ve straight away reduced the size by a third, with very little noticeable drop in quality. Sample rate reduction is another way to make a huge saving. Going from 96 kHz to 44.1 kHz (CD quality) represents a file-size saving of over 50%.
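The size of uncompressed audio scales in direct proportion to both bit depth and sample rate, so the savings from bit rate management can be worked out directly. A minimal Python sketch (the `saving_percent` helper is just for illustration):

```python
def saving_percent(before, after):
    """Percentage reduction when a size-determining value drops from before to after."""
    return (1 - after / before) * 100


# Bit depth: 24-bit down to 16-bit
bit_depth_saving = saving_percent(24, 16)

# Sample rate: 96 kHz down to 44.1 kHz
sample_rate_saving = saving_percent(96_000, 44_100)

# Both together: bits per second per channel, before and after
combined_saving = saving_percent(24 * 96_000, 16 * 44_100)

print(f"24-bit to 16-bit: {bit_depth_saving:.1f}% smaller")
print(f"96 kHz to 44.1 kHz: {sample_rate_saving:.1f}% smaller")
print(f"Both together: {combined_saving:.1f}% smaller")
```

So dropping from 24-bit to 16-bit saves about a third, and the sample rate change alone saves a little over half.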
The image above shows the difference in frequency content between a compressed MP3 and the uncompressed audio. See how the uncompressed audio contains frequencies up to 45 kHz, whereas the MP3 has cut off most of the content at just over 20 kHz.
Another explanation can be found here.
There’s our rundown of file formats and file compression! Next time someone asks you to send over your latest track, you’ll have a bit more information on what is going to be the best format! And hopefully you’ll have a deeper understanding of what sampling is, and what it can do to your music.
All of this combines when you’re using your studio gear at home – it’s our philosophy that knowledge will help you get the most out of your gear! Have a look at some of the audio interfaces available, and see how you can boost your home studio setup.