Day 10: Behind the scenes

Turning Starborn Alive into an audiobook may sound simple: paste in the text, and the AI voice reads it. In practice, it’s a process of fine-tuning and constant adjustments.

It all begins with the voice. You select one from ElevenLabs’ vast library, each with its own timbre and personality. Once chosen, the voice can be fine-tuned — adjusting speed, expressiveness, and how much it should exaggerate emotions or dialogue.
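As a rough illustration of what those knobs look like under the hood, here is a small sketch of the kind of request payload a text-to-speech call might carry. The field names follow ElevenLabs' public REST API (`stability`, `similarity_boost`, `style`, `speed`), but treat the specific model name and default values as assumptions, not a reference:

```python
# Sketch of an ElevenLabs-style text-to-speech payload (no network call).
# Field names mirror ElevenLabs' public API; exact values are illustrative.

def build_tts_payload(text: str,
                      stability: float = 0.5,
                      similarity_boost: float = 0.75,
                      style: float = 0.3,
                      speed: float = 1.0) -> dict:
    """Bundle narration text with voice settings for one generation."""
    return {
        "text": text,
        "model_id": "eleven_multilingual_v2",  # model choice is an assumption
        "voice_settings": {
            "stability": stability,            # lower = more expressive, less consistent
            "similarity_boost": similarity_boost,
            "style": style,                    # how strongly to lean into emotion
            "speed": speed,                    # narration pace
        },
    }

payload = build_tts_payload("The ship drifted past the dying star.", style=0.6)
```

Lowering `stability` and raising `style` is roughly what "more expressive" means in practice: the voice takes bigger risks with emphasis and emotion.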

The narration is then generated piece by piece. ElevenLabs produces voices with striking realism, but they don’t always deliver exactly what you need. There’s a degree of randomness that makes the reading feel alive, but also unpredictable. A single word might suddenly come out with the wrong emphasis, pitch, or rhythm, and sometimes an entire sentence has to be regenerated several times before it sounds natural.
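Generating "piece by piece" in practice means splitting the manuscript into chunks small enough to regenerate cheaply when a sentence comes out wrong. This is a minimal sketch of such a splitter; the character limit and the paragraph-based packing are my assumptions, not ElevenLabs' rules:

```python
def split_into_chunks(manuscript: str, max_chars: int = 2500) -> list[str]:
    """Pack paragraphs into chunks under max_chars, splitting only on
    paragraph breaks so a regeneration never cuts mid-sentence."""
    chunks, current = [], ""
    for para in manuscript.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)   # chunk is full; start a new one
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Keeping chunks aligned to paragraph boundaries means a bad take costs one short regeneration, not a whole chapter.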

Names can be especially tricky. My character Amalie, for instance, was often mispronounced with the wrong stress. After much trial and error, I discovered that spelling it as Æmahlee finally produced the correct pronunciation. ElevenLabs also includes a pronunciation editor, where you can create a custom dictionary to lock in how specific names and words should always sound. That feature quickly became essential for consistency.
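Before that built-in dictionary existed in my workflow, the same idea can be approximated with a simple substitution table applied to the text before synthesis. The `Amalie → Æmahlee` entry comes from the post; everything else here is a hypothetical sketch:

```python
import re

# Hypothetical pronunciation table: respellings found by trial and error.
PRONUNCIATIONS = {
    "Amalie": "Æmahlee",
}

def apply_pronunciations(text: str, table: dict[str, str]) -> str:
    """Swap each name for its phonetic respelling before synthesis,
    matching whole words only so substrings stay untouched."""
    for word, respelling in table.items():
        text = re.sub(rf"\b{re.escape(word)}\b", respelling, text)
    return text

apply_pronunciations("Amalie looked up.", PRONUNCIATIONS)
# → "Æmahlee looked up."
```

The advantage of a table over hand-editing is consistency: every chapter gets the same respelling, which is exactly what the pronunciation editor locks in.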

Pauses are another subtle but crucial detail. The difference between a short pause and a longer breath can completely reshape the pacing of a scene — whether it feels rushed, reflective, or dramatic. Being able to insert and control these pauses adds an extra layer of performance, something you don’t expect from an AI at first glance.
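ElevenLabs documents an inline break tag for exactly this kind of control. A tiny helper makes the idea concrete; the `<break time="..."/>` syntax reflects their documented tag, but verify the exact format against the current docs before relying on it:

```python
def with_pause(before: str, after: str, seconds: float) -> str:
    """Join two sentences with an explicit pause marker, in the style of
    ElevenLabs' documented break tag (exact syntax: check current docs)."""
    return f'{before} <break time="{seconds}s" /> {after}'

line = with_pause("She closed the hatch.", "Outside, the stars waited.", 1.5)
```

A 0.3-second gap reads as a comma; pushing it toward 1.5 seconds turns the same two sentences into a dramatic beat.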

In the end, creating an audiobook with AI is less about clicking a button and more about directing a performance — experimenting, adjusting, and listening closely until the voice on the page truly comes alive.