Anyone who has ever listened to an audiobook knows two things right away: audiobooks are not cheap, and the quality of the narration makes or breaks a book. The best book on earth can be rendered unlistenable by a bad narrator.
Prior to AI, your options for an audiobook were limited to hiring a production company or doing a cost share on Audible, Amazon’s audiobook service. The process of using a production company makes it clear why audiobooks are expensive.
The price you pay to make an audiobook depends on the contract and the company. You can pay 100% of the costs yourself and keep the rights to 100% of the sales. Depending on the length of your book, this option can cost $5k to $7k. Ouch.
Some companies offer a cost share option: you pay 50% of the costs, and in return you get 50% of the profits from sales. Some will go as far as covering 100% of the costs and giving you 10-20% of the profits until they recoup their costs, after which you get bumped up to 50%. Under these cost share options you never get 100% of the profits. Participation in a cost share also depends on your book reaching a certain sales threshold.
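To make the trade-offs concrete, here is a rough Python sketch of what the author keeps under each arrangement. It is a simplified model under stated assumptions: "profit" means the royalties the audiobook earns, the cost and sales figures are hypothetical, and the recoup deal is modeled as the company keeping everything except your early share (15%, the middle of the 10-20% range) until its share covers the production cost. Real contracts vary.

```python
def full_ownership(profit: float, cost: float) -> float:
    """You pay the whole production cost and keep 100% of the sales."""
    return profit - cost


def fifty_fifty(profit: float, cost: float) -> float:
    """You pay half the cost and keep half the profit."""
    return 0.5 * profit - 0.5 * cost


def recoup_then_split(profit: float, cost: float,
                      early_rate: float = 0.15, later_rate: float = 0.5) -> float:
    """Company pays the whole cost and keeps (1 - early_rate) of the profit
    until its share covers the cost; after that you get later_rate."""
    # Total profit at which the company's share equals the production cost.
    recoup_point = cost / (1 - early_rate)
    if profit <= recoup_point:
        return early_rate * profit
    return early_rate * recoup_point + later_rate * (profit - recoup_point)


if __name__ == "__main__":
    cost = 6_000  # hypothetical, inside the $5-7k range above
    for profit in (5_000, 20_000, 50_000):  # hypothetical lifetime profits
        print(profit,
              full_ownership(profit, cost),
              fifty_fifty(profit, cost),
              round(recoup_then_split(profit, cost), 2))
```

With these made-up numbers, paying the full cost yourself only pays off once sales are well above the production cost; at low sales, the deal where the company covers the cost is the only one that leaves you out of pocket by nothing.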
The upside to using a cost share option is that the production company has a financial incentive to help advertise your book. Having someone else market your product for you is always a good thing.
If you use Amazon’s cost share option, the narrators take a lot of the risk upon themselves, so it’s not a very popular option with them. A narrator can reach the end of the narration process, a large time commitment, only to have the author pull out. It’s still a risk for authors as well, since narrators can go radio silent and never produce your book. The only way either of you makes money is for you to put the finished book on ACX and split the royalties.
Because of everything listed above, entry into the audiobook space has been limited to those with high sales or money to spare, which is often the same thing. The cost of producing an audiobook has been a barrier to entry for newer, independent authors.
Enter AI
Note: I have only used ElevenLabs for audio book creation. This is not an endorsement of them over any other software, simply the limit of my experience.
https://twitter.com/Wifi_Pioneers/status/1652336989662748672?s=20
ElevenLabs is a game changer in audiobook production. For $99 you can produce a full-length book in 15-20 hours. (You can half-ass it faster, but you should do it right.)
The Technical Details
ElevenLabs has several monthly plans, priced by how many characters you want. A character is a letter, a punctuation mark, or a space. To convert a full-length novel (approximately 80-100k words), you would need around 500k characters. Their $99-a-month plan happens to be 500k characters per month. Pretty convenient.
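As a rough check on that arithmetic, here is a small Python sketch. It borrows the ratio from the next paragraph (5000 characters is about 900 words, roughly 5.6 characters per word once spaces and punctuation are counted) and assumes it holds for your manuscript; the 500,000-character quota is the $99 plan described above. Real prose varies, and re-recordings also eat into the quota.

```python
import math

CHARS_PER_CHUNK = 5_000   # per-paste limit described in this post
WORDS_PER_CHUNK = 900     # the post's rough word equivalent of one paste
MONTHLY_QUOTA = 500_000   # characters on the $99-a-month plan


def estimate_characters(word_count: int) -> int:
    """Estimate billable characters (letters, punctuation, spaces) from a word count."""
    return round(word_count * CHARS_PER_CHUNK / WORDS_PER_CHUNK)


def months_needed(characters: int, quota: int = MONTHLY_QUOTA) -> int:
    """Whole months of subscription needed to cover the characters."""
    return math.ceil(characters / quota)


def chunks_needed(characters: int) -> int:
    """Number of copy-and-paste generations at the per-paste limit."""
    return math.ceil(characters / CHARS_PER_CHUNK)


for words in (80_000, 90_000, 100_000):
    chars = estimate_characters(words)
    print(f"{words:,} words ~ {chars:,} characters, "
          f"{chunks_needed(chars)} pastes, {months_needed(chars)} month(s)")
```

Under this assumption a 90k-word novel lands right at 500k characters, so a book at the top of the range, or one with many re-recordings, may spill into a second month.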
The process of turning your book into audio is easy, too. You can copy and paste up to 5000 characters at a time, which equates to about 900 words. You then download each recording as an MP3 and, when you’re done, splice them all together. You will need audio editing software for this, but free software is more than sufficient.
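Copying 5000-character pieces by hand is where mistakes creep in: a chunk a few characters over the limit, or a sentence cut in half. The Python sketch below is one way to pre-split a plain-text manuscript. It assumes paragraphs are separated by blank lines and sentences end in ., ! or ?; it keeps whole paragraphs together where it can, falls back to sentence breaks for very long paragraphs, and only hard-cuts a single sentence that is itself longer than the limit.

```python
import re

MAX_CHARS = 5_000  # per-paste limit described above


def _split_paragraph(paragraph: str, max_chars: int) -> list[str]:
    """Break one over-long paragraph at sentence ends; hard-cut only as a last resort."""
    if len(paragraph) <= max_chars:
        return [paragraph]
    pieces: list[str] = []
    current = ""
    for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
        # A single sentence longer than the limit has to be cut mid-sentence.
        while len(sentence) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = sentence
    if current:
        pieces.append(current)
    return pieces


def split_into_chunks(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Group paragraphs (separated by blank lines) into chunks of at most max_chars."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for paragraph in paragraphs:
        for piece in _split_paragraph(paragraph, max_chars):
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Numbering the chunks and naming each downloaded MP3 to match (chunk_001, chunk_002, and so on, a hypothetical naming scheme) turns the final splice into a matter of dropping the files in order.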
The Time-Consuming Part
First, you need to pick or create a voice. If you wish to be your own narrator, you can record your own voice, and ElevenLabs will clone it with amazing accuracy. For nonfiction authors this is very appealing, and even fiction authors find some joy in this option. However, if you want to use a different voice or, in the case of a fiction book, multiple voices, you can create them in the Voice Lab.
You can create and save dozens or even hundreds of voices, depending on your subscription plan. That number is technically accurate, but it’s a little bit of propaganda.
The Voice Lab is broken down by gender (male and female), accent (American, British, African, Australian, Indian), and age (young, middle-aged, old). Once you pick those three settings, a sliding scale controls the strength of the accent.
While technically you can make endless voices from those settings, realistically you can only get a few unique voices per combination. In other words, with a young American female as your primary settings, adjusting the accent strength will only get you 5-10 unique voices. An 80% accent sounds too similar to an 85% accent to use both in the same book.
Still, I would expect future upgrades to add more options.
Now to Record: The Other Time-Consuming Part
With your voice or voices selected, it’s time to copy and paste. If your book is told from one POV, this is easy, as you only have one voice to deal with. For novels that switch between multiple main characters, you need to pay attention at the beginning of every chapter to ensure you have the right voice. If you record under the wrong voice, you’ve wasted characters that you can’t get back.
The time-consuming part is listening to every recording for mistakes. You’ll be surprised at how good the AI is at picking up voice inflection and emotion, but it’s not perfect. Sometimes it mispronounces a word, usually a foreign town name, or an acronym.
The point is that you need to listen to every single recording, in its entirety, to ensure the best quality.
Tips and Tricks
If your book has alternating perspectives, then start each chapter like this:
Chapter 1. Bob. I walked into the room…
Adding a period and several spaces after the chapter number and the name records a pause long enough to avoid sounding like a strange run-on sentence.
If you use *** to break up a chapter, delete it and hit the space bar ten times. This creates another pause. You can also add a 1-2 second pause during editing.
If you use italics to indicate internal monologue, the AI will not recognize them. Add ‘I thought’ or ‘he thought’ at the end of every italicized passage so the listener can tell internal thoughts from spoken words.
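If your manuscript is in plain text, the chapter heading, scene break, and italics tips above can be applied with a small script before you paste. This Python sketch assumes scene breaks are a line containing only *** and that italics were exported as *single asterisks*, as in Markdown; the "several spaces" after a heading is set to five here, which is a guess to tune by ear. Italics used for emphasis rather than thought would also get tagged, so review the output.

```python
import re

SCENE_BREAK_PAUSE = " " * 10  # the ten spaces suggested for *** breaks
HEADING_PAUSE = " " * 5       # "several" spaces; five is an assumption


def chapter_heading(number: int, pov: str) -> str:
    """Build a spoken heading like 'Chapter 1. Bob.' with a pause after each part."""
    return f"Chapter {number}.{HEADING_PAUSE}{pov}.{HEADING_PAUSE}"


def _tag_thought(match: re.Match, tag: str) -> str:
    """Drop the italic markers and append a thought tag such as 'I thought'."""
    thought = match.group(1).strip()
    if thought.endswith("."):
        thought = thought[:-1] + ","   # "This is bad." -> "This is bad,"
    elif not thought.endswith(("?", "!")):
        thought += ","                 # "Run" -> "Run,"
    return f"{thought} {tag}."


def prepare_for_narration(text: str, thought_tag: str = "I thought") -> str:
    # Replace scene breaks first so the *** line isn't mistaken for italics.
    text = re.sub(r"^[ \t]*\*\*\*[ \t]*$", SCENE_BREAK_PAUSE, text,
                  flags=re.MULTILINE)
    # Strip *italic* markers (single line only) and append the thought tag.
    return re.sub(r"\*([^*\n]+)\*", lambda m: _tag_thought(m, thought_tag), text)
```

For example, `prepare_for_narration("*Where is she?* Bob looked around.")` produces "Where is she? I thought. Bob looked around." For a chapter told from a third-person POV, pass `thought_tag="he thought"` or `"she thought"`.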
The voice you select will do its own inflections for the opposite sex. Meaning, if your chosen voice is a woman and she is narrating a conversation with a man, she will deepen her voice to indicate a man. It only knows to do this if every quotation ends with a tag like ‘he said’ or ‘Bill said’. Ensure every quotation ends with the name or gender of the speaker; otherwise the inflection might be wrong.
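Checking every quotation by eye is tedious, so a rough Python sketch like the one below can flag quotations that aren’t followed by a speaker tag. It is a heuristic under assumptions: quotes use straight or curly double quotes and stay on one line, the tag comes right after the closing quote, and the list of speech verbs is a hypothetical starting set to extend. It will miss multi-word names like "Mary Anne said", and it can’t tell whether the tag names the right speaker.

```python
import re

# Opening straight/curly quote, the quoted words, closing quote (single line only).
QUOTE = re.compile(r'["“]([^"”\n]+)["”]')
# A tag right after the quote: optional one-word name or pronoun, then a speech verb.
# The verb list is a hypothetical starting point; extend it for your book.
TAG = re.compile(
    r"\s*(?:\w+\s+)?(?:said|asked|replied|whispered|shouted|yelled)\b",
    re.IGNORECASE,
)


def untagged_quotes(text: str, window: int = 40) -> list[str]:
    """Return quotations not immediately followed by a 'he said'/'Bill said' style tag."""
    missing = []
    for match in QUOTE.finditer(text):
        after = text[match.end():match.end() + window]
        if not TAG.match(after):
            missing.append(match.group(1))
    return missing


sample = '"Hello," Bill said. "Hi." She smiled. "Where?" he asked.'
print(untagged_quotes(sample))  # ['Hi.']
```

Anything it flags is a place to add ‘he said’ or the speaker’s name before pasting.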
When you hear a mistake in an audio clip, don’t re-record the entire thing. That’s a waste of 5000 characters that you paid for. Instead, re-record only the sentence or paragraph that is wrong, and splice it in during editing.
If you need to convey more emotion than the AI is producing, speed up or slow down the recording by 5% during editing. This gives a sense of excitement or drama. It’s not perfect, but it’s something.
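For a sense of scale, the Python sketch below works out what a 5% change does to a clip’s length: changing the playback rate scales the duration by 1 / (1 + change). One caveat: a plain speed change also shifts the pitch, while a tempo change (Audacity’s Change Tempo effect, for example) keeps the pitch and only alters the pace. The 10-minute clip is hypothetical.

```python
def adjusted_duration(seconds: float, speed_change_percent: float) -> float:
    """Clip length after speeding up (positive %) or slowing down (negative %)."""
    return seconds / (1 + speed_change_percent / 100)


for change in (5, -5):
    new = adjusted_duration(600, change)  # a hypothetical 10-minute clip
    print(f"{change:+}%: 600 s -> {new:.1f} s")
```

So a 10-minute clip shrinks by about 29 seconds at +5% and grows by about 32 seconds at -5%, a small enough change that listeners notice the pace rather than the edit.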
Where to Sell Your Audiobook
The immediate downside to an AI audiobook is that you cannot sell it on Audible. Audible is the biggest audiobook platform, and for the time being it is not accepting AI-narrated audiobooks. That’s good news for the audio production companies.
https://twitter.com/Wifi_Pioneers/status/1648690723384958976?s=20
You can sell your audiobook straight from your own website, and you can sell it on platforms like Kobo. You can set your own price and stay competitive with Audible by charging a third of the price. In the world of independent publishing, being cheaper (but not free) leads to a higher volume of sales.
As with your published books, the marketing is up to you. For $99 and 10-15 hours of your time, you can put your audiobook out there and start making sales right away. Despite Amazon sticking with the production companies, I don’t think those companies will be able to compete with AI in the long term. When you compare $99 to $2-7k, the production companies’ days are numbered. Some people will still hold out for real narrators, but as the cost of audiobooks increases, even the most die-hard fans might have to consider giving AI-narrated books a chance.