DeepMind says it’s developed much more realistic computer speech

Google DeepMind claims to have significantly improved computer-generated speech with its AI technology, paving the way for sophisticated talking machines like those seen in the sci-fi films “Her” and “Ex Machina.”

The London-based research lab, acquired by Google in 2014 for a reported £400 million, announced on Thursday that it has developed a talking computer programme called “WaveNet” that halves the quality gap that currently exists between human speech and computer speech.

Although WaveNet sounds more like a human voice than existing artificial voice generators — known as “text-to-speech” (TTS) systems — it requires too much computing power to make it practical, meaning Google won’t be integrating it into its products any time soon, according to The Financial Times.

Aäron van den Oord, a research scientist at DeepMind, said: “Mimicking realistic speech has always been a major challenge, with state-of-the-art systems, composed of a complicated and long pipeline of modules, still lagging behind real human speech. Our research shows that not only can neural networks learn how to generate speech, but they can already close the gap with human performance by over 50%.

“This is a major breakthrough for text-to-speech systems, with potential uses in everything from smartphones to movies, and we’re excited to publish the details for the wider research community to explore.”

Unlike existing artificial voice generators, WaveNet focuses on the sound waves being produced as opposed to the language itself. It uses a neural network — a technology that tries to replicate the human brain — to analyse raw waveforms of an audio signal and model speech and other types of audio, including music.
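To make "raw waveform" concrete: digital audio is just a long list of numbers, one per sample. The Python sketch below is a toy illustration, not DeepMind's method. It assumes a model that builds audio one sample at a time from the samples already produced, which the article itself does not spell out. The function `toy_next_sample`, the 16 kHz sample rate, and the 440 Hz tone are all made up for the example; a fixed formula stands in for the trained neural network.

```python
import math

SAMPLE_RATE = 16000  # assumed samples per second, chosen for illustration
TONE_HZ = 440        # assumed pitch of the toy signal
OMEGA = 2 * math.pi * TONE_HZ / SAMPLE_RATE


def toy_next_sample(history):
    """Hypothetical stand-in for a trained network.

    Predicts the next sample from the two most recent ones using a fixed
    recurrence that traces out a sine wave. A real system would learn its
    prediction from recorded audio instead.
    """
    return 2 * math.cos(OMEGA) * history[-1] - history[-2]


def generate(n_samples):
    """Build a waveform one sample at a time, feeding each output back in."""
    waveform = [0.0, math.sin(OMEGA)]  # two seed samples to start from
    while len(waveform) < n_samples:
        waveform.append(toy_next_sample(waveform))
    return waveform[:n_samples]


if __name__ == "__main__":
    audio = generate(160)  # 10 milliseconds of audio at the assumed rate
    print(len(audio), "samples; first few:", [round(x, 3) for x in audio[:4]])
```

Even in this toy, a single second of audio takes 16,000 predictions, each depending on the ones before it, which hints at why generating speech this way demands so much computing power.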

DeepMind published sample audio recordings of WaveNet talking in English and Mandarin, and it’s easy to hear that they are an improvement on Google Now, Amazon’s Alexa, and Apple’s Siri. The company also showed off some of the music that WaveNet has been able to produce after studying solo piano music on YouTube.

Like other AI systems, WaveNet requires vast quantities of existing data to train itself. DeepMind used Google’s existing TTS datasets to do this.

DeepMind, which sits under Alphabet, Google’s parent company, is best known for developing artificial intelligence systems that can master games like Space Invaders and Go. However, Google has been slow to integrate the company’s technology into its products, with just one data centre efficiency project announced so far, albeit on a global scale.

For more details on WaveNet, take a look at Google DeepMind’s academic paper.
