Automatic speech recognition
The transformation of the speech signal into a text stream
Verbal communication is a natural and convenient way for humans to interact. The aim of speech recognition is to remove the intermediary in human-computer communication, that is, to teach a machine to understand spoken human language directly. Real-time voice control of machines, as well as entering information by speech, would considerably simplify the life of a modern person.
Scientists and engineers have been working on verbal communication between humans and machines for many years. The first speech recognition device was created in 1952; it could recognize digits spoken by a person. Commercial speech recognition software appeared in the early 1990s.
All speech recognition systems can be divided into two classes:
- Speaker-dependent systems must be adapted to the speaker's speech during training. To work with another speaker, such systems require complete reconfiguration.
- Speaker-independent systems work regardless of who is speaking. They require no prior training and can recognize the speech of any speaker.
By recognition method, systems follow one of two fundamentally different approaches:
- Recognition of voice marks: a speech fragment is recognized by comparing it with a pre-recorded speech sample. This approach is widely used in relatively simple systems designed to execute pre-recorded voice commands.
- Recognition of lexical elements: speech is recognized from its elementary units (individual sounds) rather than from whole pre-recorded samples. This approach is used for recognizing continuous speech with large vocabularies.
In systems of the first kind, the sound pattern of each command was stored as a holistic template. To compare an unknown utterance with a reference command, dynamic programming methods were used. These systems worked well when recognizing small sets of 10 to 30 commands and could understand only one speaker; to work with another speaker, they had to be completely reconfigured.
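The text does not say which dynamic programming method was used; a classic choice for matching an utterance against a whole-word template is dynamic time warping (DTW), which tolerates differences in speaking rate. The following is a minimal sketch under that assumption: each utterance is a list of per-frame feature vectors, frames are compared by Euclidean distance, and the command whose template aligns with the lowest cost wins. The function names and toy feature values are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one feature vector per analysis frame (hypothetical features)


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(utterance: List[Frame], template: List[Frame]) -> float:
    """Minimal cumulative alignment cost between two frame sequences (classic DTW)."""
    n, m = len(utterance), len(template)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i utterance frames
    # with the first j template frames
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(utterance[i - 1], template[j - 1])
            # allowed steps: stretch the utterance, stretch the template, or advance both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance: List[Frame], templates: Dict[str, List[Frame]]) -> str:
    """Return the command whose stored template aligns with the lowest DTW cost."""
    return min(templates, key=lambda name: dtw_distance(utterance, templates[name]))


if __name__ == "__main__":
    # Toy one-dimensional "features"; a real system would use spectral features.
    templates = {
        "yes": [[1.0], [3.0], [3.0], [1.0]],
        "no": [[5.0], [5.0], [0.0]],
    }
    spoken = [[1.0], [1.0], [3.0], [3.0], [3.0], [1.0]]  # "yes" spoken more slowly
    print(recognize(spoken, templates))  # prints "yes"
```

The slower utterance aligns with the stored "yes" template at zero cost despite having more frames. Because every template is one speaker's recording, a new speaker means recording a new set of templates, which matches the reconfiguration problem described above.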
Understanding continuous speech required vocabularies of a much larger size, from several tens to hundreds of thousands of words. The methods used in systems of the first kind were not suitable for this task, since it was simply impossible to create reference templates for so many words.
In addition, there was a desire to create speaker-independent systems. This is a challenging problem, because every person has an individual manner of speaking: speech rate, voice timbre, and peculiarities of pronunciation. These differences are called speech variability. To take it into account, new statistical methods were proposed, based primarily on the mathematical apparatus of hidden Markov models (HMM) or artificial neural networks. Instead of creating templates for each word, templates of individual sounds, the so-called acoustic models, are built. They are obtained by statistical processing of large speech databases containing recordings of hundreds of speakers.
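The article names HMMs but does not describe a particular model topology. The sketch below assumes the simplest textbook setup: each sound is a single HMM state with a discrete emission distribution over quantized acoustic symbols, a word model is a left-to-right chain of sound models, and the forward algorithm scores an utterance against each word in the vocabulary. All symbols, probabilities, the self-loop value, and the two-word lexicon are invented for illustration; real acoustic models are far richer and are trained on large speech databases as described above.

```python
from typing import Dict, List, Sequence

# Hypothetical acoustic models: each sound is one HMM state with a discrete
# emission distribution over quantized acoustic symbols "a".."d".
SOUND_MODELS: Dict[str, Dict[str, float]] = {
    "s": {"a": 0.7, "b": 0.1, "c": 0.1, "d": 0.1},
    "i": {"a": 0.1, "b": 0.7, "c": 0.1, "d": 0.1},
    "n": {"a": 0.1, "b": 0.1, "c": 0.7, "d": 0.1},
    "o": {"a": 0.1, "b": 0.1, "c": 0.1, "d": 0.7},
}
SELF_LOOP = 0.6  # assumed probability of staying in the same sound for another frame

# Hypothetical lexicon: words are spelled as sequences of sound models.
LEXICON: Dict[str, List[str]] = {"sin": ["s", "i", "n"], "son": ["s", "o", "n"]}


def word_likelihood(sounds: Sequence[str], observations: Sequence[str]) -> float:
    """Forward algorithm for a left-to-right HMM built by chaining sound models."""
    n = len(sounds)
    emit = [SOUND_MODELS[s] for s in sounds]
    # alpha[k] = probability of the observations so far, ending in state k
    alpha = [0.0] * n
    alpha[0] = emit[0][observations[0]]  # the model must start in its first sound
    for symbol in observations[1:]:
        new_alpha = [0.0] * n
        for k in range(n):
            stay = alpha[k] * SELF_LOOP
            advance = alpha[k - 1] * (1.0 - SELF_LOOP) if k > 0 else 0.0
            new_alpha[k] = (stay + advance) * emit[k][symbol]
        alpha = new_alpha
    return alpha[-1]  # the model must end in its last sound


def recognize(observations: Sequence[str]) -> str:
    """Pick the lexicon word whose chained HMM gives the highest likelihood."""
    return max(LEXICON, key=lambda w: word_likelihood(LEXICON[w], observations))


if __name__ == "__main__":
    print(recognize(["a", "a", "b", "b", "c"]))  # prints "sin"
```

The design point mirrors the text: adding a new word only requires spelling it with existing sound models, so no new recordings of that word are needed.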
Note that creating speech recognition systems is an extremely difficult task. The experts at Speech Technology Ltd. have many years of experience in the practical application of speech technologies.
Today, automatic speech recognition systems are widely used in various fields of human activity.
The most obvious application of continuous speech recognition is automatic transcription systems, which can take the place of a secretary (letters, notes, and reports are dictated by voice). This reduces costs by cutting down on stenographers' work and also increases data confidentiality. At present, such systems are better developed for English (though with many limitations on their use), while systems for recognizing continuous Russian speech are under active development.
It is well known how inconvenient and dangerous it is to use a mobile phone with ordinary (tactile) dialing while driving. To reduce the number of accidents, many countries have adopted laws banning drivers from using such phones. As a result, more and more people are interested in mobile phones with voice dialing, which free users from dialing numbers manually: it is enough to say the name of the person to call, and the connection is made automatically. In these phones, all function and digit keys are also replaced by voice commands, so using them while driving is safer than using ordinary mobile phones, or even mobile phones with a hands-free headset. Voice control systems are already used in some car brands: the driver controls the climate settings, radio, and navigation system with voice commands (see DIVO and VoiceCommander).
Automatic speech recognition systems are used extensively in call centers, where they are usually known as IVR (Interactive Voice Response) systems. An IVR system can automate the dialogue with a client, so there is no need to hire a large number of operators to take calls, which reduces staff costs. Customer service also improves, because the caller is connected to the machine almost immediately instead of waiting in a long queue. An IVR system lets callers choose menu items with voice commands instead of tone dialing, which greatly simplifies interaction with clients: to obtain the necessary information, the caller does not have to listen to the entire list of offered services. Once connected, a caller can reach any menu level by saying a single sentence, which saves time. Many large companies have already switched to IVR systems.
Speech recognition systems offer opportunities that were unavailable with tone dialing. Consider a telephone ticket booking service: there are so many cities that a tone menu is impractical, whereas a speech recognition system allows the most natural form of communication, in which the caller simply says the name of the city.
Speech recognition also underlies video games with voice-controlled characters, voice-to-voice dictionaries and translators, and complex human-computer dialogue systems.
We have presented only a few examples of how automatic speech recognition technology is used; in fact, there are many more.