Deep Learning AI Mimics Human Voices In 30 Minutes or Less

Advancements made in artificial intelligence are seemingly announced every single week. Earlier this year, Deep Voice 1 was released by Baidu Silicon Valley AI Lab. That system was received with considerable enthusiasm and hesitation alike, as it generates synthetic human voices through deep neural networks. (As it happens, the end result is quite creepy.) Now that Deep Voice 2 has been announced, things have taken a dramatic new turn.

Deep Voice 2 Contains Hundreds of Voices

Transforming text to speech in a human-like fashion has proven to be difficult. Even when projects come close, the end results tend to sound synthetic and far less polished than we would like. Thanks to solutions such as Deep Voice 1 and Deep Voice 2, that situation is slowly changing for the better. There are some definite benefits to these solutions, even though they may not become accessible to everyday consumers anytime soon.

The Deep Voice 1 platform generated synthetic human voices through deep neural networks. It did so in real time, synthesizing audio as fast as it needed to be played. This was a major development for interactive applications such as video games, conversational interfaces, and media production. Baidu Silicon Valley AI Lab took the time to train deep neural networks to learn from vast amounts of data. As a result, it created one of the most comprehensive voice synthesis solutions the world had seen to date.

However, the company has not rested on its laurels. Instead, it has been working hard on improving its Deep Voice solution over the past few months. Over the course of roughly 90 days, the team managed to make its next Deep Voice system far more advanced than its first. Deep Voice 1 was built on 20 hours of speech and offered only one voice. Although this was not necessarily a major limitation, companies have to keep moving forward, improving upon existing solutions and scaling the technology.

That is exactly what the company did. Deep Voice 2 now contains hundreds of hours of speech and provides hundreds of voices to choose from. It can also learn from hundreds of voices and imitate them almost perfectly. Indeed, it’s pretty creepy to hear a computer use your voice to say something you never said. Deep Voice 2 is capable of learning a new voice in the span of just 30 minutes.

Based on samples provided on the official website, one can hardly tell which voice is human and which is a copy, as the difference between the two is incredibly small. This goes to show that AI technology has come a long way and is advancing at an exceptional pace. Although this is by far one of the creepier developments we have seen (or heard) in recent months, it also represents a major breakthrough for any sector reliant on voice synthesis.

Baidu will not release this project’s source code free of charge, nor does it have to. Still, it would be nice to know whether this platform will ever become a consumer-grade product. Technology such as this can have major consequences and powerful use cases, assuming people get a chance to experiment with it at some point. Any company working with chat assistants, robotics, or other tools involving some degree of spoken communication would certainly benefit from embracing this technology in the future.

JP Buntinx

JP Buntinx is a FinTech and Bitcoin enthusiast living in Belgium. His passion for finance and technology made him one of the world's leading freelance Bitcoin writers, and he aims to achieve the same level of respect in the FinTech sector.
