By Matt Zajechowski
Automatic speech recognition, or ASR for short, is a technology we’ve all come into contact with at some point in our lives. Along with Interactive Voice Response (IVR), the technology behind automated phone menus, ASR is the mechanism by which we can use our voices to communicate with computers and electronic interfaces.
Some common uses of ASR, in both business and private contexts, are automated telephone customer service systems, voice-controlled computing applications, dictation software like Dragon NaturallySpeaking, and the voice command interfaces of many modern smartphones, such as Siri on Apple’s newer iPhones.
Small businesses make use of ASR technology where they can afford to do so, and its applications in both customer service and back-end contexts are numerous.
Before we get down to more detail on how ASR can be used in your small business, let’s first cover just how the technology itself works.
A Primer on ASR: How It Works
The basics of the ASR/IVR voice recognition process go as follows:
- A person talks to an ASR/IVR interface
- The software behind the interface creates a raw wave of what the person said
- The software then cleans this wave up by reducing background noise and normalizing volume
- The filtered waveform is then broken down into pieces known as phonemes. These are the basic sound building blocks of words, such as “k”, “wh” or “t”. English is usually counted as having about 44 of them, and other languages have more or fewer depending on how their sounds are counted.
- Each phoneme is a sort of link in a long chain and by examining these links in sequence, the ASR software can intuit complete words and then sentences, thus “understanding” what’s being said to it.
- The ASR then responds to the human speaking to it in a meaningful way.
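The steps above can be sketched in a few lines of Python. This is a toy illustration rather than how a production recognizer works: real systems use statistical acoustic and language models, while the noise gate, the phoneme labels and the LEXICON table here are all made-up assumptions chosen to keep the example small.

```python
def noise_gate(samples, threshold=0.05):
    """Crude noise reduction: silence anything quieter than `threshold`."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def normalize_volume(samples, peak=1.0):
    """Scale the wave so that its loudest sample reaches `peak`."""
    loudest = max((abs(s) for s in samples), default=0.0)
    if loudest == 0.0:
        return list(samples)
    return [s * peak / loudest for s in samples]


# Hypothetical pronunciation lexicon: phoneme sequence -> word.
LEXICON = {
    ("k", "ae", "t"): "cat",
    ("s", "ae", "t"): "sat",
}


def phonemes_to_words(phonemes):
    """Greedily match the longest known phoneme sequence at each position."""
    words, i = [], 0
    max_len = max(len(key) for key in LEXICON)
    while i < len(phonemes):
        for n in range(min(max_len, len(phonemes) - i), 0, -1):
            chunk = tuple(phonemes[i:i + n])
            if chunk in LEXICON:
                words.append(LEXICON[chunk])
                i += n
                break
        else:
            i += 1  # skip a phoneme that no known word explains
    return words


raw_wave = [0.01, 0.25, -0.5]  # invented sample values
cleaned = normalize_volume(noise_gate(raw_wave))
words = phonemes_to_words(["k", "ae", "t", "s", "ae", "t"])
```

Here the phoneme list is simply given by hand. Turning a cleaned waveform into phonemes is the hard part of ASR and is left out of this sketch.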
Some ASR Types
ASR software can be broken down into two essential types that we most commonly see in personal and business contexts. These are “directed dialogue conversation” and “natural language conversation” systems.
Directed dialogue conversations are the more basic type of ASR system and the one most commonly used by businesses. They are used to create many of the automated customer service platforms that respond to a user’s voice with information. These systems are simpler because they limit a human user to a selection of pre-programmed word choices that trigger software responses when spoken.
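As a rough sketch of the directed dialogue idea, the Python below maps a fixed set of phrases to canned responses. It assumes the caller’s speech has already been turned into text; the MENU entries and the directed_reply function are hypothetical names invented for this example.

```python
# Hypothetical menu: the only phrases this toy system accepts.
MENU = {
    "billing": "Transferring you to billing.",
    "opening hours": "We are open 9 to 5, Monday to Friday.",
    "operator": "Connecting you to an operator.",
}


def directed_reply(utterance):
    """Return the canned response for the first menu phrase found in the utterance."""
    text = utterance.lower().strip()
    for phrase, response in MENU.items():
        if phrase in text:
            return response
    # Nothing matched, so steer the caller back to the allowed choices.
    options = ", ".join(MENU)
    return f"Sorry, I didn't catch that. You can say: {options}."
```

Calling directed_reply("Billing, please") returns the billing response, while anything outside the menu gets the prompt listing the allowed choices. That restriction is what keeps this kind of system simple.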
Natural language conversations involve the creation of ASR that more closely resembles human conversation. The more sophisticated these systems are, the more “human-like” that conversation is. They enable human users to speak openly with the software. Siri in the iPhone is a well-known example of natural language ASR.
Natural language ASR is very complex and represents the future of voice recognition software. Because it requires a human-like conversation capacity, this technology typically uses an internal vocabulary of tens of thousands of words and utilizes “tagged” keywords to intuit the context of what human users say to it. So, for example, if a person says “weather forecast” to such a system, it recognizes the keyword “forecast” and uses it to guess that the other word is indeed “weather” and not the identical-sounding word “whether”.
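The “weather” versus “whether” example can be illustrated with a small Python sketch. It is a simplified stand-in for what real systems do with statistical language models, and the CONTEXT_TAGS table and pick_homophone function are assumptions made up for this illustration.

```python
# Hypothetical "tagged" keywords: context words that point to one spelling.
CONTEXT_TAGS = {
    "weather": {"forecast", "rain", "sunny", "temperature"},
    "whether": {"or", "not", "if", "decide"},
}


def pick_homophone(candidates, neighbours):
    """Return the candidate whose tagged keywords overlap most with the nearby words.

    On a tie, the candidate listed first wins.
    """
    nearby = {word.lower() for word in neighbours}
    return max(candidates, key=lambda c: len(CONTEXT_TAGS.get(c, set()) & nearby))


choice = pick_homophone(["weather", "whether"], ["forecast"])
```

With “forecast” as the neighbouring word, the sketch picks “weather”; with neighbours such as “or” and “not”, it picks “whether”.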
Sophisticated natural language ASR is where programmers want to take automated customer service systems in the future.
How ASR Systems are “Taught” to Understand Humans
The two main ways by which all ASR systems are taught to better understand human speech consist of something called “tuning” and another, more sophisticated process known as “active learning”.
“Tuning” is a human-powered process by which programmers periodically review an ASR system’s conversation logs to see which new words it has been hearing more often and then add those words to its recognition dictionary.
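To make the tuning review concrete, here is a minimal Python sketch that scans conversation logs for frequently heard words missing from the recognition dictionary. The log lines, the vocabulary and the tuning_candidates function are hypothetical, and a person would still decide which candidates to add.

```python
from collections import Counter


def tuning_candidates(log_lines, vocabulary, min_count=2):
    """List out-of-vocabulary words heard at least `min_count` times, most frequent first."""
    counts = Counter(
        word
        for line in log_lines
        for word in line.lower().split()
        if word not in vocabulary
    )
    return [word for word, n in counts.most_common() if n >= min_count]


# Invented log excerpts and dictionary for illustration.
logs = ["I want a refund", "refund my voucher", "voucher refund please"]
known = {"i", "want", "a", "my", "please"}
candidates = tuning_candidates(logs, known)
```

In this made-up log, “refund” and “voucher” surface as the words a programmer might review and add.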
“Active learning”, on the other hand, is much more complex and involves programming ASR software to autonomously learn, adopt and then use new words it starts to hear over time, in ways that are meaningful to the human users it most often interacts with. The result is (hopefully) a more individually tailored comprehension between the machine and its human user.
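A very reduced picture of that autonomous behaviour is sketched below in Python: the software itself adopts a word once it has heard it often enough, with no programmer in the loop. The LearningVocabulary class and its adopt_after threshold are assumptions for illustration only; real systems also have to learn how a new word sounds and what it means, which this sketch ignores.

```python
from collections import Counter


class LearningVocabulary:
    """Toy vocabulary that adopts unknown words after hearing them repeatedly."""

    def __init__(self, known_words, adopt_after=3):
        self.known = set(known_words)
        self.adopt_after = adopt_after
        self.unknown_counts = Counter()

    def hear(self, utterance):
        """Process one utterance and return any words adopted because of it."""
        adopted = []
        for word in utterance.lower().split():
            if word in self.known:
                continue
            self.unknown_counts[word] += 1
            if self.unknown_counts[word] >= self.adopt_after:
                # Heard often enough: move the word into the vocabulary.
                self.known.add(word)
                del self.unknown_counts[word]
                adopted.append(word)
        return adopted


vocab = LearningVocabulary({"play", "some"}, adopt_after=2)
first = vocab.hear("play some podcasts")
second = vocab.hear("play podcasts")
```

After the second utterance, “podcasts” has been heard twice and joins the known vocabulary on its own.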
How Businesses Can Use ASR Systems
ASR/IVR technology is already heavily used by many large and even small businesses. For starters, programming and installing such a system, while initially expensive, can save thousands or even millions of dollars in staffing costs, because the machine can solve customers’ problems in place of a live agent who has to be paid to remain on standby.
One example of a company that offers these kinds of services to small businesses is West Interactive, which also created the infographic below that goes into more visual detail on how ASR technology works.
Furthermore, many companies involved in media, entertainment and consumer electronics are now adding natural language, Siri-style ASR interfaces to their applications so that the customers who use them can have a more “hands-free” user experience.
These are just some examples of the “front end” use of ASR technology. On the back end, many companies are taking advantage of the software to integrate workflow more closely between employees and to let staffers move more easily between several different kinds of work at the same time without being overburdened by typing and clicking.