Encoding Audio with Andrew the Audio Scientist

Today, we’re debuting a new blog feature from the ACX Audio Team. Our resident audio scientist will be stopping by occasionally to tackle a different technical aspect of audiobook production. For his inaugural post, Andrew takes a look at the process of encoding your audio and introduces a new resource to aid in your productions.

Decoding Encoding

Today I want to talk about one of the nerdiest aspects of audiobook production: your audio files. I’m addressing this now because we’ve recently added a helpful audio encoding guide to ACX, and I want to make sure producers understand the concept and can use it to create audio that will meet our encoding requirements.

As the last step in the audiobook production process, audio encoding tends to get overlooked, and it’s easy to understand why. After spending hours producing an audiobook – from recording, to proofing, editing, and mastering – it can be easy to forget to tick the right check boxes and configure the necessary settings in your encoding software. But overlooking this step could block the file from successfully uploading, or cause the files to be rejected during our Quality Assurance check.

What is encoding? It’s the process of converting your uncompressed audio files into a format more suitable for certain applications. For example, most digital audio workstations (DAWs) will output WAV files by default, and each of your audiobook’s WAV files may end up being several hundred megabytes in size. This is fine for audio production environments, but it’s not an ideal format for uploading your files to an external location (like ACX), so we require users to encode their audio with the MP3 audio codec. This process compresses the data in your file, reducing size and allowing for faster uploads without severely degrading the sound quality.
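To put rough numbers on that size difference, here is a minimal Python sketch. It assumes a hypothetical one-hour chapter recorded as 16-bit, 44.1 kHz mono WAV and encoded as a 192 kbps constant-bit-rate MP3. The bit depth, channel count, and chapter length are illustrative assumptions rather than ACX figures, and the function names are made up for this example.

```python
def wav_size_bytes(duration_s, sample_rate=44100, bit_depth=16, channels=1):
    """Approximate size of uncompressed PCM audio (ignores the small WAV header)."""
    return duration_s * sample_rate * (bit_depth // 8) * channels


def mp3_cbr_size_bytes(duration_s, bitrate_kbps=192):
    """Approximate size of a constant-bit-rate MP3 (ignores tags and padding)."""
    return duration_s * bitrate_kbps * 1000 // 8


chapter_seconds = 60 * 60  # hypothetical one-hour chapter

wav = wav_size_bytes(chapter_seconds)
mp3 = mp3_cbr_size_bytes(chapter_seconds)

print(f"WAV: {wav / 1e6:.1f} MB")
print(f"MP3: {mp3 / 1e6:.1f} MB")
```

Under these assumptions the WAV comes out at a few hundred megabytes and the MP3 at well under a third of that, which is the gap described above.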

The ACX Audio Encoding Guide

We want Audible’s customers to have the best possible listening experience, and we don’t want any ACX title to be held up because it contains files that don’t meet our requirements. This brings us back to our new audio encoding guide, which should help you navigate these tricky waters. The techniques it describes work on both Windows and Macintosh platforms and, if followed correctly, will encode your audio to standards that meet the ACX Audio Submission Requirements.

I’d like to end my premiere blog post with one final note: at the end of the day, all digital audio is data. It’s made up of the same zeroes and ones that comprise an eBook’s manuscript, the ACX website, and everything else in the digital world. The integrity of this data is critical to your audiobook’s success. Keeping this thought in the back of your mind while producing your next audiobook may very well lead to an even better final production.


Andrew, the ACX Audio Scientist.

21 responses to “Encoding Audio with Andrew the Audio Scientist”

  1. Is this new or just a reminder? I convert .wav files to .mp3 using TwistedWave (on a Mac), with all the specs discussed above. Do we now need to do this other process too?

    • Hi Jack,
      Thanks for the question! This encoding guide was designed to help ACX producers who may like assistance in ensuring their audio is encoded to ACX’s specifications. So, if you have been successfully encoding your MP3s to ACX standards in the past then never fear. Nothing about our Audio Submission Requirements has changed.
      Andrew Grathwohl

  2. I keep coming up blank with an answer from ACX to this question: HOW exactly, and by using WHAT specific converter, can we studio pros make “INTERLEAVED” stereo? Your otherwise excellent guide to fre:ac does NOT address stereo, only mono, and my stereo files made by fre:ac have not been accepted. It seems impossible to find anything on the subject of interleaved stereo on the internet, so I look to ACX for an answer. And please, do NOT again repeat “read our guidelines.” They don’t address this issue at all. Nor have I found a single converter online stating “This makes interleaved stereo files.” I am therefore at your mercy. Respectfully, Chuck McKibben

    • Chuck,
      In case you don’t see my post below, here’s one solution to providing interleaved stereo. I am forced to deliver stereo files to ACX because the software I use can’t create a 192kbps mono file.

      Using Sony Sound Forge:
      Load .wav file into Sound Forge
      Select File>Save As…
      Select MP3 Audio (*.mp3) in Save as type: dropdown
      Click Custom… button
      Select bit rate of 192 Kbps, 44,100 Hz from dropdown
      For Stereo Encoding, select the Stereo radio button
      Click OK
      Click Save

      Vegas is similar, but you start with File>Render As…

  3. Chuck, does ACX distribute audio in stereo? I wasn’t aware of that. If they don’t, why would you deliver in stereo? TwistedWave uses the LAME encoder engine, and when set to 0=Best quality it sounds amazing. And no client of mine who’s used this method has yet been rejected to my knowledge, so never fear, JackdG!

    • Thanks for the comments, George. Yes, ACX continues to say that they distribute in INTERLEAVED stereo, and as the producer for a young female narrator who wants her recordings in stereo due to the presence of music and sound effects, I’ve tried to give her what she wants. I use the original Cool Edit Pro, still my fav of all recording programs, which can make mp3 in stereo at 192 kbps. Unfortunately, ACX rejected that because it wasn’t interleaved. My frustration is that no one seems to be able to tell me, with absolute certainty, which program makes interleaved stereo. Like you, no one has ever rejected my files of any type before…until now. The ACX-recommended fre:ac converter uses LAME, too, like Twisted Wave, but NG as far as ACX is concerned. Anyway, I would love to chat with you some day and pick your brain. I’m a VO coach who was studio manager for Mel Blanc in the ’70s, a producer/director of commercials in NYC for 30 years, and now helping students get their audiobooks on ACX, the most exciting development for voice artists since the dawning of the internet. Again, thanks for responding. Best regards, Chuck Mckibben, Philadelphia.

      • This is an interesting discussion. However, if this article is about conforming to ACX standards then that’s what is done. They’ve obviously done the math about file size vs. quality and have arrived there. I believe that your question involves how to deliver a stereo interleaved file. There may also be a question about how ACX decided on the format.

        I work primarily in Pro Tools 10 and it will give me the option to encode an interleaved .mp3 at CBR: 192 Kbit/s. I quickly looked at Logic 9, which records interleaved stereo; however, I was surprised to find an interleaved .mp3 was not an option. The same goes for iTunes, whose output choices for .mp3 are only normal and “Joint Stereo”.

        So I think answers will come from determining which algorithms each software encoder employs. In the ACX guideline .pdf for fre:ac, there is a caveat “ensure joint stereo is not checked” probably because, though smaller, the file is distinctly 2 channels.

      • There’s no such thing as interleaved stereo in mp3. There’s only interleaved stereo in .wav files and other simple file formats. You have mono, dual channel (meaning left and right are totally independent – used for a second channel containing another language, for example), stereo (with limited redundancy reduction), and joint stereo, which takes advantage of the redundancy in the stereo signal. Enabling mid/side and Intensity features of joint stereo allows the codec to switch between two different redundancy algorithms. The intensity stereo algorithm is a simple frequency based scheme and mid/side is the standard M/S (sum and differences) method, and the difference channel is allocated less quality than the mid channel. I believe that what is being requested is to not use joint stereo due to some limitations in the software on ACX’s servers. I’m not sure why they ask for 192Kbit/sec because the highest quality output on Audible is only 64Kbit/sec mp3, and the lower bit rate encodings are not mp3 but use a speech codec. Also we are requested to NOT include ID3 tags, which should have been mentioned.
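        The sum-and-difference idea behind mid/side can be sketched in a few lines of Python. This is only an illustration of the arithmetic on raw sample values: a real MP3 encoder applies it per frame to frequency-domain data and uses its own scaling, and the function names and sample values here are made up for the example.

```python
def ms_encode(left, right):
    """Convert left/right samples to mid (sum) and side (difference) samples."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right samples from mid/side samples."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Hypothetical samples: a mostly centered voice with one slightly panned sample.
left = [0.5, 0.25, -0.5, 0.0]
right = [0.5, 0.25, -0.25, 0.0]

mid, side = ms_encode(left, right)
print("mid: ", mid)
print("side:", side)
print("round trip OK:", ms_decode(mid, side) == (left, right))
```

        When left and right are nearly identical, as with a centered narrator, the side channel is close to silence, which is why an encoder can spend fewer bits on it.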

      • Chuck, sounds like you’d be a really cool guest for our show East West Audio Body Shop, ewabs.com! Would you join us? Drop us a note at ewabshop@gmail.com

  4. Aren’t there patent issues with using the LAME encoder? From http://www.mp3-tech.org:
    “They now control a portfolio of 18 patents related to MP3, and one for Mp3Pro. This portfolio is very extensive, and cover various aspects of MP3 encoding. Some of those patents could be avoided in an MP3 implementation by either not using several features of the standard, or by using different algorithms than those specified. But what is important is that there are some of those patents that you will not be able to avoid, so practically it means that you can not use any MP3 implementation without using some parts of those patents.”

    In other words, there is no such thing as a free MP3 encoder, and ACX should probably not be encouraging people to use something that claims to be free without pointing out that the user is responsible for obtaining the proper licenses. LAME is billed by its creators as an educational tool.

    Me, I use Sony software (Vegas Pro/Sound Forge) so the license is included, but here’s the thing: It does not allow 192kbps mono encoding, so I’m stuck encoding in stereo.

  5. I’m using Amadeus which has a LAME encoding option. I just need to type in “additional command line arguments”. Do you know what to type here?

  6. Hi Chuck, some software will export interleaved stereo including Pro Tools. Guess that doesn’t help you with Cool Edit Pro 😦

  7. Pingback: Hdmi Leads - HDMI Cables : HDMI Cables

  8. What is the difference between interleaved stereo and joint stereo? I’ve not come across someone delivering product to ACX in stereo, so this is a very helpful tidbit for me to know for the future.

  9. Hi George,

    Great to hear from you! Interleaved stereo audio files are simply audio files which join left and right channel information into one block of data. When errors occur during the decoding of stereo information, the most common type of error is a “burst error,” whereby many bits in a row will be damaged.

    For instance, consider the sentence:

    “ACX loves producers and rights holders”

    if this message contained a burst error, it may end up looking like this:

    “ACX loves prod_____nd rights holders”

    The strength offered by interleaving your audio files is noteworthy: it spreads the signal damage addressed by forward error correction across the entire data word, allowing for a more coherent end result:

    “ACX l_ves pr_ducers and ri_hts hold_rs”
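    The burst-spreading idea in that example can be sketched with a simple block interleaver in Python. This is a generic illustration of interleaving for error resilience, using the sentence above as the data; it is not a statement about how any audio file format lays out its channels, and the function names, row count, and burst position are arbitrary choices for the example.

```python
def pad(text, rows):
    """Pad text with spaces so its length is a multiple of `rows`."""
    return text + " " * (-len(text) % rows)


def interleave(text, rows):
    """Write text row by row into a grid, then read it out column by column."""
    cols = len(text) // rows
    return "".join(text[r * cols + c] for c in range(cols) for r in range(rows))


def deinterleave(text, rows):
    """Undo interleave(): restore the original row-by-row order."""
    cols = len(text) // rows
    return "".join(text[c * rows + r] for r in range(rows) for c in range(cols))


ROWS = 5
message = pad("ACX loves producers and rights holders", ROWS)

stream = interleave(message, ROWS)
# Simulate a burst error: five consecutive characters lost in transit.
damaged = stream[:10] + "_" * 5 + stream[15:]
recovered = deinterleave(damaged, ROWS)

print(recovered)
```

    The five damaged characters land in different words after de-interleaving instead of wiping out one contiguous stretch.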

    Because the majority of Audible products are mono, we must ensure that any stereo audio we receive will not be damaged by the time it is encoded and reaches the listener’s ears. Interleaving your stereo audio before encoding to MP3 is the best way to ensure that the separate L and R channel information is not damaged when summed to mono and streamed to the listener’s device. Otherwise, your stereo audio can risk degraded panning, which can cause otherwise great audio to possess a sub-optimal sound quality.

    Joint stereo is a stereo encoding option for MP3. It will encode the stereo audio by analyzing the frequency content and panning of the source audio file. In doing so, it will make a number of stereo processing decisions on a frame-by-frame basis. This is done in an effort to save disk space, and for music and other audio with a wide frequency range and rich dynamic content, it is a perfectly acceptable format to use for encoding to MP3. However, since audiobooks are considerably less wide-ranged and tend to have a more consistent dynamic level, joint stereo’s risks outweigh its benefits.

    Thank you for your inquiry,

    Andrew Grathwohl

  10. Pingback: File Managment with Andrew The Audio Scientist | Audiobook Creation Exchange Blog (ACX)

  11. Pingback: How to Succeed at Audiobook Production: Part 4 | Audiobook Creation Exchange Blog (ACX)

  12. What is the best setting for “Normalize Peak Level – Avg. RMS Loudness”? I use WavePad, and for the first book I sent to ACX, I set it at -18dB. I thought I saw somewhere in this blog to set it at -6. What is best?

  13. Andrew I’m going to ask a really basic question – I work in pro-tools and I have a very controlled environment. However I’m a basic user. I can’t seem to find the place to confirm the very last steps before I bounce my work to an MP3. It’s this – Thus far I’ve managed peaks etc by looking at my waves and listening. How do I reveal in numbers (within Protools) or Set my -23db and -18db and 3db Peak values? I’ve got my CBR and 44.1Khz when I bounce but how do I confirm the 192Kbps? I’ve been through my pro-tools help manual and search and find nothing and I KNOW it’s basic and I’m embarrassed to ask but…I have a short book due on Monday and I want it to be right for my client and you guys. I’m pretty sure I have everything else under control (recording, editing etc) and I’ve booked 3 gigs with my samples and auditions but…these couple basics are tripping me… Thanks (blushing at the basics)

  14. Oh nope – I even have the 192 or higher. Pro-tools has been doing that for me when I bounce….It’s trying to get the db numbers to show – not just my green guide when I speak – for voice and room tone (floor) Thanks! 🙂
