
The Speech-to-Text Revolution

First, many schools decided to stop teaching cursive, since digital devices present text in print. Now, some parents are campaigning to end handwriting lessons entirely in favor of earlier typing education. Soon enough, kids might not learn any form of writing at all; they’ll just speak instead.

This might seem like a regression – after all, written language is widely believed to be what catapulted the human race into civilization. However, thanks to advances in speech recognition software, a return to purely spoken communication may be the next big step into the future.

The History of Speech Recognition

As is always the case with technology, the first speech recognition machines were extremely limited in what they could understand. In 1952, “Audrey” – the earliest ancestor of today’s “Siri” and “Alexa” – could recognize spoken digits, but only from a single, familiar voice. Ten years later, “Shoebox” could pick out a total of 16 English words. More than 10 years after that, the DARPA-funded “Harpy” had roughly the vocabulary of a 3-year-old – about 1,000 words – but could search faster and more efficiently than any system before it.

Indeed, advancements in speech recognition have largely been associated with advances in search technology and methods because speech recognition machines must be able to match perceived sounds with possible meanings exceedingly quickly. Google has excelled at producing speech recognition software for mobile devices because its core product is a powerful web search that can discern meaning regardless of spelling or ambiguous phrasing.

During the 1970s and 1980s, technological innovation in the field of speech recognition came fast and furious. Bell Laboratories developed a system that could interpret multiple voices, and mathematicians applied a statistical framework called the hidden Markov model, which relied on the probability of sound patterns rather than fixed word templates. With this innovation, speech recognition machines began entering the consumer marketplace, as dictation aids (for adults) or responsive toys (for kids).
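The core idea behind hidden Markov model recognition can be shown with a toy decoder. The sketch below uses the classic Viterbi algorithm to pick the most probable sequence of hidden states (standing in for phonemes) given a sequence of observed acoustic symbols; every state name and probability here is invented for illustration, not drawn from any real recognizer.

```python
# Toy hidden Markov model decoding with the Viterbi algorithm.
# Hidden states stand in for phonemes; observations stand in for
# coarse acoustic features. All values below are made up.
states = ["S1", "S2"]
start_p = {"S1": 0.6, "S2": 0.4}
trans_p = {
    "S1": {"S1": 0.7, "S2": 0.3},
    "S2": {"S1": 0.4, "S2": 0.6},
}
emit_p = {
    "S1": {"low": 0.9, "high": 0.1},
    "S2": {"low": 0.2, "high": 0.8},
}

def viterbi(observations):
    """Return the most probable hidden-state path for the observations."""
    # V[state] = (probability of the best path ending in `state`, that path)
    V = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        layer = {}
        for s in states:
            # Extend the best predecessor path into state `s`.
            prob, path = max(
                (V[prev][0] * trans_p[prev][s] * emit_p[s][obs], V[prev][1])
                for prev in states
            )
            layer[s] = (prob, path + [s])
        V = layer
    _, best_path = max(V.values())
    return best_path

print(viterbi(["low", "low", "high"]))  # -> ['S1', 'S1', 'S2']
```

Instead of asking “does this sound match a stored template?”, the decoder asks “which sequence of states was most likely to have produced these sounds?” – which is exactly what lets it tolerate the variation in real human speech.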


However, these systems were significantly hampered by one serious flaw in most people’s speech: poor enunciation. For machines to understand them, speakers had to talk unbearably slowly, which made handwriting or human-to-human dictation more practical. This improved gradually: in the ‘90s, “NaturallySpeaking” dictation software allowed speakers to talk at a rate of 100 words per minute. Yet by the mid-‘00s progress had stalled, and demand for speech-to-text programs was low.

Then came the smartphone. One of the primary constraints on speech recognition had been the scarcity of speech data: machines had little information to help them learn what speakers were probably saying. With smartphones, Google and other speech recognition developers gained an abundance of data; soon, the addition of voice search on computers added to the wealth of sound files machines could analyze and learn from. Today, advanced voice-to-text software has been trained on more than 230 billion words – a massive jump from the original 16.

Voice Tech of the Future

Speech recognition has improved enough to make it a useful technology for everyday life, and the masses are now clamoring for more voice-controlled options on every device. It seems that developers are complying with enthusiasm. Samsung, Apple, Google, and other smartphone and mobile device manufacturers are racing to produce the smoothest speech recognition apps on the market to help users avoid the labor of typing once and for all.

If voice technology isn’t already ubiquitous, it will be fairly soon. Speech recognition software is becoming increasingly natural and intelligent: it can function in noisy settings, comprehend multiple languages, discern different speakers, and respond with lifelike (and customizable) speech of its own. Alongside speech recognition, engineers have worked diligently to build smart home networks, so voice may soon become the primary means by which users interact with and change their physical environments: closing the blinds, raising the temperature, playing a new song, locking the doors, and so on. As processors shrink, powerful wearable tech will begin recognizing and reacting to speech. Even cars – which may soon be autonomous anyway – will likely respond to voice commands rather than manual controls.

Voice is the oldest of humankind’s myriad tools – and arguably the most influential. It should come as no surprise that, after centuries of emphasizing the written word, we are now returning to a natural and easy means of communicating with and affecting the world around us.

Mark Arguinbaev

