VoiceBox enables XM conversational control

VoiceBox Technologies is beta-testing a first-of-its-kind reference design that enables XM users to conversationally search for radio stations, stock quotes, weather, and other data.

VoiceBox Technologies engineers have entered beta testing with a first-of-its-kind reference design that will let users conversationally search for radio stations, stock quotes, weather, and other data on an XM satellite radio. Creating the design meant overcoming several difficult challenges: finding a chip with enough processing power to understand the user's conversation and respond without noticeable delay; identifying an operating system and development environment that would minimize the time needed to port the company's core technology to the new hardware; and integrating the design with the satellite radio and other hardware specified by XM.

As users speak, an automatic speech-recognition (ASR) system, such as IBM's ViaVoice, converts the sound into words. The VoiceBox conversational language processor then analyzes the words to determine their context, using algorithms patterned after the way humans infer context from speech: we know where we are, who we are talking to, and what we are talking about. In the same way, the conversational language processor examines the relationships among words to find similar concepts, and analyzes sentence structure to account for the semantic relationships between words. It also weighs words against recently and frequently used concepts and semantics. This makes it possible to repair sentences that have been distorted by noise or accents.
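The article does not describe VoiceBox's algorithms in detail. As a rough illustration of how recently and frequently used concepts can help repair a misrecognized sentence, the sketch below re-ranks an ASR engine's candidate transcriptions by adding a context bonus to each candidate's acoustic score. The concept table, scores, window size, and weights are all hypothetical; this is a toy stand-in, not the VoiceBox conversational language processor.

```python
from collections import Counter

# Hypothetical concept table: maps words to the domain concept they evoke.
CONCEPTS = {
    "jazz": "music", "rock": "music", "station": "music",
    "stock": "finance", "quote": "finance",
    "weather": "weather", "forecast": "weather",
}


class ContextTracker:
    """Tracks which concepts the user mentioned recently and how often overall."""

    def __init__(self, window=5):
        self.window = window          # how many recent concept mentions to remember
        self.recent = []              # most recent concept mentions, oldest first
        self.frequency = Counter()    # total mentions per concept

    def observe(self, words):
        """Record the concepts in an utterance the system has accepted."""
        for word in words:
            concept = CONCEPTS.get(word)
            if concept:
                self.recent = (self.recent + [concept])[-self.window:]
                self.frequency[concept] += 1

    def bonus(self, words):
        """Favor recently mentioned concepts, plus a small boost for frequent ones."""
        score = 0.0
        for word in words:
            concept = CONCEPTS.get(word)
            if concept:
                if concept in self.recent:
                    score += 1.0
                score += 0.1 * self.frequency[concept]
        return score


def rerank(hypotheses, tracker, weight=0.5):
    """Return the text of the (text, acoustic_score) pair with the best combined score."""
    def combined(hypothesis):
        text, acoustic = hypothesis
        return acoustic + weight * tracker.bonus(text.split())
    return max(hypotheses, key=combined)[0]


tracker = ContextTracker()
tracker.observe("play some jazz station".split())
# "stock" scored slightly higher acoustically, but "station" fits the current context.
best = rerank([("play rock stock", 0.62), ("play rock station", 0.60)], tracker)
print(best)  # play rock station
```

Here the acoustically weaker candidate wins because the conversation is already about music, which is the kind of noise or accent repair described above.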

VoiceBox has developed a platform-independent core technology that can run on servers, desktops, and mobile devices. Each application requires porting the core technology to a specific hardware platform and operating system, and integrating it with an ASR engine and with the device being controlled. In this case, XM chose VoiceBox as its voice-technology supplier and asked the company to integrate its technology onto an XM-designed printed circuit board that includes an XM receiver. The board is being provided to aftermarket audio vendors as a reference design.
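One common way to keep a core portable, as described here, is to isolate the ASR engine and the controlled device behind small interfaces so that only thin adapters change from port to port. The sketch below assumes that structure; all class and method names are hypothetical, the command handling is deliberately trivial, and the fake adapters stand in for a real ASR engine and radio.

```python
from abc import ABC, abstractmethod


class SpeechRecognizer(ABC):
    """Per-platform adapter around an ASR engine."""

    @abstractmethod
    def transcribe(self, audio: bytes) -> str:
        ...


class ControlledDevice(ABC):
    """Per-platform adapter for the hardware being controlled, such as a radio."""

    @abstractmethod
    def execute(self, command: str, argument: str) -> str:
        ...


class ConversationCore:
    """Portable logic: it only talks to the two interfaces above."""

    def __init__(self, asr: SpeechRecognizer, device: ControlledDevice):
        self.asr = asr
        self.device = device

    def handle(self, audio: bytes) -> str:
        words = self.asr.transcribe(audio).lower().split()
        if "channel" in words and words[-1].isdigit():
            return self.device.execute("tune", words[-1])
        return self.device.execute("search", " ".join(words))


# Stand-in adapters for one hypothetical port; a real port would wrap vendor APIs.
class FakeASR(SpeechRecognizer):
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class FakeRadio(ControlledDevice):
    def execute(self, command: str, argument: str) -> str:
        return f"{command}:{argument}"


core = ConversationCore(FakeASR(), FakeRadio())
print(core.handle(b"Tune to channel 42"))   # tune:42
print(core.handle(b"Weather in Boston"))    # search:weather in boston
```

Under this design, moving to a new board means writing new adapters while `ConversationCore` stays unchanged.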

VoiceBox engineers initially worked with Analog Devices' Blackfin 533 device but planned from the start to roll out the product on the Blackfin 539, because they needed nearly all of the nine ports it offers. The Blackfin processor runs the conversational language processor, the IBM ViaVoice ASR, the interface to the XM radio, and the process that parses the radio's 255 data streams. The ports are occupied by the audio-out codec, data in from the radio, Bluetooth for the wireless remote microphone, a UART (universal asynchronous receiver-transmitter) to control the Bluetooth chip, a UART for the radio's control line, and a UART for the console. These last three UARTs could potentially be multiplexed, allowing the device to run on a less expensive chip.
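The saving from multiplexing can be tallied with a few lines. This sketch counts only the interfaces named above, assumes each occupies one port, and assumes the three UARTs could share a single port if multiplexed; the grouping is illustrative and says nothing about how the Blackfin's peripherals are actually wired.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    name: str
    is_uart: bool


# Interfaces as listed in the article; one port each is an assumption.
INTERFACES = [
    Interface("audio-out codec", False),
    Interface("data in from the radio", False),
    Interface("Bluetooth (wireless remote microphone)", False),
    Interface("UART: Bluetooth chip control", True),
    Interface("UART: radio control line", True),
    Interface("UART: console", True),
]


def ports_needed(interfaces, multiplex_uarts=False):
    """Count ports, assuming one per interface unless the UARTs share one."""
    uarts = sum(1 for i in interfaces if i.is_uart)
    others = len(interfaces) - uarts
    if multiplex_uarts:
        uarts = min(uarts, 1)
    return others + uarts


print(ports_needed(INTERFACES))                        # 6
print(ports_needed(INTERFACES, multiplex_uarts=True))  # 4
```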

The next step was to select an operating system. VoiceBox engineers considered several real-time embedded operating systems (RTOSs) and selected Integrity from Green Hills Software. The main reason was Integrity's conformance with the POSIX (Portable Operating System Interface) standard, which provides a common interface for low-level operating-system services such as semaphores, messages, process control, and thread management. Because VoiceBox has built its code around POSIX, that conformance saves considerable time when porting to a new platform. The engineers also liked the Green Hills toolkit, which includes the powerful MULTI integrated development environment.
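To show the kind of primitives POSIX standardizes, here is a minimal producer/consumer sketch in Python, whose `threading.Semaphore`, `queue.Queue`, and `threading.Thread` play the roles of a POSIX counting semaphore, message queue, and pthread. It is an analogy only: it does not use Integrity's or any RTOS's API, and the buffer count and the "recognition" step are hypothetical.

```python
import queue
import threading

FREE_BUFFERS = 4  # hypothetical number of audio buffers

free_buffers = threading.Semaphore(FREE_BUFFERS)  # role of a POSIX counting semaphore
mailbox = queue.Queue()                           # role of a POSIX message queue


def capture(frames):
    """Producer: claims a free buffer for each audio frame, then posts it."""
    for frame in frames:
        free_buffers.acquire()   # like sem_wait(): block until a buffer is free
        mailbox.put(frame)       # like mq_send()
    mailbox.put(None)            # sentinel marking the end of the utterance


def recognize(results):
    """Consumer: processes frames in order and returns each buffer when done."""
    while True:
        frame = mailbox.get()    # like mq_receive(): block until a message arrives
        if frame is None:
            break
        results.append(frame.upper())  # stand-in for real recognition work
        free_buffers.release()   # like sem_post(): hand the buffer back


def run(frames):
    results = []
    workers = [
        threading.Thread(target=capture, args=(frames,)),    # like pthread_create()
        threading.Thread(target=recognize, args=(results,)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()                                         # like pthread_join()
    return results


print(run(["tune", "to", "jazz", "channel", "twenty", "five"]))
```

With six frames and four buffers, the producer blocks until the consumer releases a buffer, which is exactly the coordination a semaphore provides. Code written against these standard primitives, rather than one OS's private calls, is what makes a POSIX-centered codebase cheaper to port.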

Another important point is that Integrity supports JTAG (Joint Test Action Group), a standard that couples the development board to the PC running the debugger much more tightly than a simple Ethernet connection can. With JTAG, a breakpoint stops the program at exactly that point, and every register on the chip is visible. "This level of control was particularly important since this was the first time we have ever worked with the Blackfin chip or this new hardware platform," said Alan Gordan, Director of Embedded Development at VoiceBox Technologies.

With the new embedded device up and running, engineers could measure the effectiveness of the implementation. The performance of a voice-recognition device is measured by how long it takes to process a given amount of human speech, so lower numbers are better. For example, if it takes 15 s to process 10 s of speech, the device has a performance index of 1.5. The XM application currently achieves 0.4, meaning it processes speech in less than half the time it took to speak. This leaves engineers ample room to trade speed for higher accuracy.
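The performance index described here is the ratio of processing time to speech duration, a quantity often called the real-time factor in speech recognition. A minimal sketch of the calculation follows; the function names are hypothetical.

```python
def performance_index(processing_s: float, speech_s: float) -> float:
    """Processing time divided by speech duration; below 1.0 is faster than real time."""
    if speech_s <= 0:
        raise ValueError("speech duration must be positive")
    return processing_s / speech_s


def processing_time(index: float, speech_s: float) -> float:
    """Seconds needed to process speech_s seconds of speech at a given index."""
    return index * speech_s


print(performance_index(15, 10))   # 1.5, the article's example
print(processing_time(0.4, 10))    # 4.0: 10 s of speech takes 4 s at index 0.4
```

At an index of 0.4, the system could in principle spend up to 2.5 times as much computation per utterance (1 / 0.4 = 2.5) before falling behind real time, ignoring other overheads; that margin is the headroom available for accuracy improvements.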

This article was written for AEI by Jerry Fireman based on information provided by Green Hills Software and VoiceBox.

©2009 SAE International. All rights reserved.