
AI Can Clone Any Voice. Making It Feel Is the Hard Part.

Language Tech UnboxED 

Beluga CEO Jan Hinrichs, @TheObiJanNetwork, sits down with Voiseed founder Andrea Ballista to talk about emotional AI voice, the real limits of voice cloning, and what AI dubbing means for the people who run localization.

Watch the full conversation with Andrea Ballista, founder and CEO of Voiseed.

Every tech company is racing to build the perfect synthetic voice, and most of them are solving the easy half of the problem. Cloning a voice takes seconds. Translating it into fifty languages is close to a commodity. Making that same voice sound genuinely afraid, or furious, or heartbroken, on the right line and in the right language, is where the work actually starts.

That gap is Andrea Ballista’s whole thesis. He spent three decades in recording studios directing voice talent and producing audio for thousands of video games, and in 2020, before ChatGPT put generative AI on every boardroom agenda, he founded Voiseed to close it.

Jan Hinrichs, Beluga Linguistics CEO and the host many of you know as Obi Jan, sat down with Ballista for our Language Tech Unbox series. The session was recorded in the Beluga studio, with Andrea joining remotely. Here is what stood out, and why it matters for anyone who runs translation, dubbing, or localization for a living.

The elephant in the room

Cloning is solved. Emotion is not.

Ballista is direct about the companies everyone names first: ElevenLabs, Google, Microsoft. He calls them the elephant in the room, and he does not pretend Voiseed wins on raw cloning or sheer language coverage. Those problems are largely behind us. What none of the big platforms do well yet, in his view, is emotion under control.

A model can read a line cleanly. It cannot reliably decide that this line is bitter, that the next one cracks, and that the reply lands somewhere between relief and exhaustion.

So if you are searching for an ElevenLabs alternative, the honest framing is about a different job entirely: expressive performance you can direct, line by line, with control over how each line actually feels rather than a clean read with the feeling sanded off.

The framework

An emotional universe you can navigate

To make emotion something you can engineer rather than hope for, Voiseed built what Ballista calls an emotional universe, navigated with an emotional compass. It is grounded in Plutchik’s wheel of emotions, the psychologist Robert Plutchik’s model of eight core emotions that blend and shift in intensity, and in thirty years of studio instinct about how those emotions actually sound when a human performs them.

The model earns its keep by giving an editor coordinates. Instead of nudging anonymous sliders and listening for luck, you place a line somewhere in a mapped space of emotion and intensity, then move it on purpose. The illustration below shows the idea: eight core emotions, each rising from neutral at the centre to full intensity at the rim.

[Figure: a wheel with a calm centre and eight core emotions around it: joy, trust, fear, surprise, sadness, disgust, anger, anticipation.]
Illustration of the eight-emotion concept behind expressive voice models, after Plutchik’s wheel. Intensity rises from the centre outward.
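
To make "coordinates" concrete, here is a minimal Python sketch of a wheel like the one described above. It is not Voiseed's model or API. The names `EmotionPoint`, `opposite`, and `nudge` are hypothetical, and the 0.0 to 1.0 intensity scale and the even 45-degree spacing are assumptions made for illustration. The only borrowed facts are Plutchik's eight emotions and the way each one sits directly across the wheel from its opposite.

```python
import math
from dataclasses import dataclass

# Plutchik's eight core emotions in wheel order. Opposites sit four steps apart.
EMOTIONS = ["joy", "trust", "fear", "surprise",
            "sadness", "disgust", "anger", "anticipation"]


@dataclass(frozen=True)
class EmotionPoint:
    """Hypothetical coordinate: which emotion, and how far from the calm centre."""
    emotion: str
    intensity: float  # assumed scale: 0.0 = neutral centre, 1.0 = rim

    def __post_init__(self):
        if self.emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {self.emotion}")
        if not 0.0 <= self.intensity <= 1.0:
            raise ValueError("intensity must be between 0.0 and 1.0")

    def to_xy(self):
        """Place the point on a unit wheel, with emotions spaced evenly (assumed)."""
        angle = 2 * math.pi * EMOTIONS.index(self.emotion) / len(EMOTIONS)
        return (self.intensity * math.cos(angle),
                self.intensity * math.sin(angle))


def opposite(emotion):
    """Return the emotion directly across the wheel (joy <-> sadness, and so on)."""
    return EMOTIONS[(EMOTIONS.index(emotion) + 4) % len(EMOTIONS)]


def nudge(point, delta):
    """Move a point toward the rim or the centre, clamped to the wheel."""
    new_intensity = min(1.0, max(0.0, point.intensity + delta))
    return EmotionPoint(point.emotion, new_intensity)


# Usage: start a line at moderate fear, then push it harder on purpose.
line_mood = EmotionPoint("fear", 0.5)
stronger = nudge(line_mood, 0.3)
```

The point of the sketch is the editing gesture: a mood is a named position you can move deliberately and reproduce later, which is what separates it from an unlabeled slider.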

Inside the tool

Line by line, in every language

The platform, Revoiceit, relaunched in early 2026 as Voiseed Studio, turns that model into a working editor. Ballista walked Obi Jan through it line by line. Each line of a script can be tuned for emotion and intensity, per character, and crucially per language, so a scene holds together whether it ships in English, German, or Japanese.

This is text to speech with a director’s hand on it, built for internationalization rather than a single English master.
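
One way to picture per-line, per-character, per-language control is as a small data structure. The sketch below is an assumption about how such a script could be modelled, not a description of Voiseed Studio's actual format. `Direction`, `ScriptLine`, and `direction_for` are hypothetical names, and the sample line and its translations are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Direction:
    """Hypothetical emotional direction for one delivery of a line."""
    emotion: str
    intensity: float  # assumed scale: 0.0 neutral to 1.0 full


@dataclass
class ScriptLine:
    """One line of dialogue, with text per language and optional per-language direction."""
    line_id: str
    character: str
    text: dict                      # language code -> translated text
    default: Direction              # direction used when a language has no override
    overrides: dict = field(default_factory=dict)  # language code -> Direction

    def direction_for(self, lang):
        """Return the direction for a language, falling back to the default."""
        if lang not in self.text:
            raise KeyError(f"line {self.line_id} has no text for language {lang!r}")
        return self.overrides.get(lang, self.default)


# Usage: the same line, directed slightly hotter in German than elsewhere.
line = ScriptLine(
    line_id="scene3_012",
    character="Mara",
    text={"en": "Leave me alone.", "de": "Lass mich in Ruhe.", "ja": "ほっといて。"},
    default=Direction("anger", 0.6),
    overrides={"de": Direction("anger", 0.8)},
)

german = line.direction_for("de")    # uses the German override
japanese = line.direction_for("ja")  # falls back to the default direction
```

Keeping a shared default with per-language overrides reflects the idea in the interview: the scene has one intent, and an editor adjusts each language only where the performance needs it.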

Watching it, Jan reached for two comparisons that stuck. It is like conducting a symphony, and it is like painting. You go through the script again and again, adjusting, until the scene feels right. That is a craft, and it takes a trained ear. Which is the point Beluga keeps making about AI in language work: the tool strips out the mechanical cost and puts a premium on the person with judgement.

Where it pays off

Games and film, for very different reasons

Two markets came up most: video games and film dubbing.

In games, the economics are blunt. Sending a second- or third-tier character back into a studio for a handful of extra lines, in every language, is slow and expensive. Producing those lines virtually, with emotional control, changes what an indie studio can afford to localize at all. More characters get a real voice. More languages become viable. The long tail of dialogue stops being a budget problem and starts being a creative decision.

For game localization, that is the difference between shipping two languages and shipping twelve.

Film and television dubbing is the harder test, and the one Ballista seemed most energized by. The interview included a dubbed clip that, played cold, could have come straight from the original cast. A real performance, with the breaks and the weight in the right places. For anyone who has sat through flat synthetic voices, that clip is the part of the episode worth rewinding.

Fits the workflow

It plugs into the stack LSPs already run

For the localization professionals in our network, this is the part that matters most. Voiseed is not trying to live in a separate silo. It already integrates with memoQ and with Blackbird.io, which means expressive voice can sit inside the translation management and automation stack that language service providers already operate.

Two customer types are in scope: localization companies, and corporates buying directly. The work shifts toward whoever can orchestrate it.

Ballista has a name for the role that emerges from this: the voice AI editor. Someone who understands script, performance, emotion, and language, and who directs the model the way a dubbing director once directed actors in a booth. Voiseed is planning training for exactly that role. For experienced linguists and audio professionals wondering where they fit in an AI dubbing workflow, that is a concrete and frankly encouraging answer.

Who owns a voice

Consent is the foundation

None of this is clean, and the interview did not pretend otherwise. The question of who owns a voice stopped being theoretical this year. In April 2026, Taylor Swift filed to trademark her own voice as a sound mark, an attempt to fence off AI imitation that, as the trademark lawyers who flagged it pointed out, has never been tested in court. Rights holders are split on the whole question. Sony and Universal are exploring joint ventures with AI voice companies, while others move to block voice cloning outright.

Ballista’s position is that consent and ethical production come first and shape everything downstream: voices used with permission, talent compensated, provenance clear.

For LSPs fielding dubbing and voice cloning requests from regulated clients, that is the same conversation Beluga has every day about translation in FinTech or HR SaaS. The value is in being the partner who gets the governance right, which is what a regulated client is really buying.

The market

A twenty billion dollar reason to pay attention

The numbers explain the urgency. Industry analysts at MarketsandMarkets project the AI voice generator market at roughly twenty billion dollars by 2031, growing around thirty percent a year. Ballista cited figures in the same range. Media and entertainment is the largest slice today; gaming and developer tools are the fastest growing.

Beluga’s read, and the thread running through our Language is Infrastructure work, is that voice is becoming part of that infrastructure. When a game, a course, or a product speaks to someone in their own language with the right emotion, that is what makes the content land. The companies that win this market will be the ones who can guarantee the result, line by line and language by language.

Watch the full conversation

The complete interview with Andrea Ballista is above. Explore Voiseed’s emotional universe, follow Andrea for news on the voice AI editor training, and catch the rest of the Language Tech Unbox series on the ObiJan channel.