VALL-E: Text2Voice AI DEMO

Here is our review of Microsoft’s VALL-E text-to-voice AI.

Everything you need to know about Microsoft’s new algorithm, VALL-E, is here.

VALL-E can take a three-second recording of someone’s voice and create a realistic synthetic version of that voice.

VALL-E’s Synthesized Voice

So, here is an example.

You input a short clip of audio.

And VALL-E can take that clip and synthesize a voice that sounds identical.

Incredibly, it maintains both the speaker’s accent and the acoustic environment in which they recorded, whether they were in a small echoey room or a large expansive space.

What’s most interesting about this AI tool is that it can take a piece of text and vary its intonation and emotion.

This stops it from producing a synthetic, computerized voice that has no understanding of the meaning of the words it is speaking.

It adds a cadence, an intonation, and a feeling of what might be needed.

So you can see here that it’s taking the emotion.

And here it’s maintaining that same anger.

A sleepy voice.

I would say that the amused intonation is not quite as deliberate as the others.

Did that sound amused to you?

So what are the implications of this new technology?

Well, one thing is that it’s not necessarily a revolution, more of an evolution.

But the improvements are remarkable.

VALL-E’s Ethics Statement

Here is the ethics statement from the VALL-E white paper.

So, what happens if your voice is stolen and recreated using this model?

Suddenly there is somebody who sounds exactly like you existing in the world.

Does that affect our own notion of identity as AI avatars become more common?

The number of AI personalities that are generated and exist on the web is going to explode.

So now we can keep our voice alive forever.

Imagine this: somebody you loved has died. With a recording of their voice, you can keep them alive inside this voice generator.

You can write down conversations that you wish you had with people.

Or what about therapeutic uses, where you wish you had had a conversation with someone and you wanted to hear something from a mother, a father, a teacher, or a lover?

Well, you can create these experiences for developing and exploring your relationships on a deeper level.

What about creating experiential therapeutic sessions where you’re deeply role-playing with people who have had an immense impact on your psychological makeup?

Imagine combining this with ChatGPT to generate realistic conversations and have them voiced with the intonation and emotion of a real person’s voice. That is going to lead to an explosion of intricately complex worlds and interactions.

You can already see AI-assisted content generation popping up.

Apple Books Digital Narration

Apple has now launched Apple Books Digital Narration, offering a new way for publishers to automatically generate high-quality AI-narrated audio from written text.

Here is an example of an AI narration in Apple audiobooks.

So, there is this beautiful soprano female voice that certainly would work very well for romance.

However, for me, something on a deep level can recognize this as synthetic.

Although you will be able to lose yourself in the voice, I would still prefer an authentic human voice in this situation.

And this is the baritone version.

Again, there is some intonation and some emotional inflection in the voice.

And yet there is this slight recurrence of a monotonous voice that reduces the experience to something that is not quite as fluent and lustrous and lilting and effervescent as the human intonation.

But obviously, it’s going to improve rapidly.

And I would say that the inflection of the reading, the oratory, is obviously almost at the level of a professional human.

So, it far outstrips what the vast majority of people without training can achieve, and that is the key.

Final Thoughts

I would be lying if I said I wasn’t slightly worried about my own unique talents being supplanted by AI.

But for the moment I think it is important to take these tools and collaborate with them to create what we can whilst we still can.

The last embers of creativity are here.

And of course, this can be used to give a real voice to those who have lost theirs due to disability.

I’m sure Stephen Hawking would have appreciated this.

Rest in peace.

