NCSA Speech Translator
David Bock
NCSA Visualization and Virtual
Environments
January, 1998
Overview
NCSA's Speech Translator is an interface tool for Windows 95/98/NT
that provides voice recognition capabilities using IBM's
ViaVoice speech processing technology. Translator also
allows a user to connect to a waiting remote process and send it recognized
speech commands through a standard TCP/IP socket connection across the Internet.
Description
NCSA's Speech Translator was designed with the primary goal of providing
an easy-to-use interface to two basic operations. First, the tool
was designed to provide an interface to a typical voice recognition session,
including such tasks as enabling/disabling the speech engine, reading a
grammar file, activating/deactivating the microphone, and processing the
speech. Second, the tool was designed to allow a user to easily
open a socket connection and send the recognized speech commands to a
waiting process. Translator's voice recognition is enabled by accessing
speech processing technology provided by the IBM
ViaVoice software product. Users must have this software installed
on their system to access Translator's speech processing functionality.
Translator was developed by NCSA's
Visualization and Virtual Environments
group at the University of Illinois at
Urbana-Champaign. Translator is written in C++/MFC using Microsoft's
Visual C++ and IBM's
ViaVoice SDK for Windows 95/98/NT platforms.
Operation
The Translator tool interface is shown below followed by a detailed
description of its operation.
Preparing the grammar file
Speech command grammar is created in a readable text format called
Backus-Naur Form or BNF. This grammar file can be created in any
standard text editor. The ViaVoice speech engine processes grammar
in a binary representation called a finite state grammar file or FSG.
To prepare your grammar file for the speech engine, you must convert the
readable BNF file into the corresponding FSG format. This can be accomplished
with the grammar compiler supplied with the ViaVoice
SDK. Once the grammar file is in FSG format, it is ready to be
read by Translator.
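To illustrate what a command grammar expresses, the following minimal sketch expands a small grammar into the full set of phrases it accepts. It is only a conceptual aid: the rule names and the Python dictionary layout are hypothetical, this is not the ViaVoice BNF syntax, and it does not produce an FSG file (that still requires the grammar compiler from the ViaVoice SDK).

```python
from itertools import product

# Hypothetical command grammar, written as a Python dict for illustration only.
# Each rule maps to a list of alternatives; each alternative is a sequence of
# plain words or <rule> references. Rules must not be recursive.
GRAMMAR = {
    "<command>": [["<action>", "<object>"]],
    "<action>": [["open"], ["close"]],
    "<object>": [["file"], ["window"]],
}

def expand(symbol, grammar):
    """Return every phrase (as a list of words) that `symbol` can produce."""
    if symbol not in grammar:
        # A plain word expands to itself.
        return [[symbol]]
    results = []
    for alternative in grammar[symbol]:
        # Expand each token, then combine one choice per token, in order.
        choices = [expand(token, grammar) for token in alternative]
        for combo in product(*choices):
            results.append([word for part in combo for word in part])
    return results

def phrases(grammar, root="<command>"):
    """Return the accepted phrases as space-separated strings."""
    return [" ".join(words) for words in expand(root, grammar)]
```

Calling `phrases(GRAMMAR)` lists the four two-word commands this example grammar allows, which is a convenient way to review a command vocabulary before compiling it.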
Enabling the speech engine
Before speech can be recognized, the ViaVoice speech engine must be
enabled. To start the ViaVoice speech engine, either select "Connect"
under the "Controls" menu item or press the button with the keys icon (shown
above). This will start the speech engine and a message will appear
in the "Output Information" window informing the user of operation results.
Reading the grammar file
Once the speech engine has been started, Translator is ready to
accept an FSG grammar file. Use the corresponding icon button or select "Open"
under the "File" menu item to read an FSG grammar file. A message
will appear in the "Output Information" window informing the user of operation
results.
Connecting to a remote host
To send your processed speech commands over a socket connection to a
listening process on a remote host, use the corresponding icon button or select
"Remote server..." under the "Controls" menu item to connect and establish
communication. This action presents a dialog box in which to enter
1) the IP address or host name of the remote machine
and 2) the socket port number on which to establish communication.
A message will appear in the "Output Information" window informing the
user of operation results.
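The listening process on the remote host can be sketched as follows. This page does not describe the wire format Translator uses, so the sketch assumes that each recognized command arrives as a line of ASCII text terminated by a newline; the function names `iter_commands` and `serve_once` are hypothetical and are not part of Translator. Adjust the framing to match what your copy of Translator actually sends.

```python
import socket

def iter_commands(conn, encoding="ascii"):
    """Yield one command per received line (assumed newline-delimited text)."""
    buffer = b""
    while True:
        chunk = conn.recv(1024)
        if not chunk:
            # The sender closed the connection.
            break
        buffer += chunk
        # A single recv may hold several commands or only part of one.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            text = line.decode(encoding, errors="replace").strip()
            if text:
                yield text
    # Deliver any trailing text that was not newline-terminated.
    tail = buffer.decode(encoding, errors="replace").strip()
    if tail:
        yield tail

def serve_once(port, handle, host=""):
    """Accept a single connection on `port` and pass each command to `handle`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen(1)
        conn, _addr = server.accept()
        with conn:
            for command in iter_commands(conn):
                handle(command)
```

For example, `serve_once(5000, print)` would wait on port 5000 (an arbitrary choice) and print each command as it arrives; the same port number would then be entered in Translator's "Remote server..." dialog.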
Activating the microphone
Select the corresponding icon button or "Start recognition" under the "Controls"
menu item to activate the microphone and begin speaking. A message
will appear in the "Output Information" window informing the user of operation
results.
Deactivating the microphone
Select the corresponding icon button or "Stop recognition" under the "Controls"
menu item to deactivate the microphone. A message will appear in
the "Output Information" window informing the user of operation results.
Disabling the speech engine
Upon completion of a session, disable the speech engine by pressing the
corresponding icon button or selecting "Disconnect" under the "Controls" menu item.
A message will appear in the "Output Information" window informing the
user of operation results.
Download
Download the Windows 95/98/NT executable Translator
tool only.
Unzip this file to get the binary executable only.
Download the source code for the Windows 95/98/NT
Translator tool.
Unzipping this archive will create a folder "NCSASpeech" containing
the Microsoft Visual C++ project complete with all source code.
Installation
Installation instructions for both the executable and source code are
described below.
IBM ViaVoice software
NCSA's Speech Translator tool requires
IBM's ViaVoice software product. This software must be installed
on your system for proper operation of the Translator tool.
Setting your system path
The Translator executable must have access to the ViaVoice libraries
installed with the software product. To accomplish this, add the
ViaVoice\lib directory to your system path. For example, if you've
installed ViaVoice at the root of your C: drive, add "C:\ViaVoice\lib" to
your system path.
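A quick way to confirm the path setting is to check whether the directory appears among the entries of the PATH variable. The sketch below is a hypothetical helper, not part of Translator; it assumes Windows conventions (entries separated by ";" and compared without regard to case) and uses "C:\ViaVoice\lib" only because that is the example location given above.

```python
import ntpath
import os

def dir_on_path(directory, path_value=None, sep=";"):
    """Return True if `directory` is one of the entries in a Windows-style PATH."""
    if path_value is None:
        # Default to the PATH of the current process.
        path_value = os.environ.get("PATH", "")

    def norm(entry):
        # Drop surrounding quotes and trailing separators, and ignore case,
        # following Windows path comparison rules.
        return ntpath.normcase(ntpath.normpath(entry.strip().strip('"')))

    target = norm(directory)
    return any(
        norm(entry) == target
        for entry in path_value.split(sep)
        if entry.strip()  # skip empty entries such as those produced by ";;"
    )
```

Running `dir_on_path(r"C:\ViaVoice\lib")` on the machine where Translator is installed reports whether the current PATH includes that directory.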
Compiling the source code
To compile the source code for NCSA's Speech Translator tool, you'll
need to download and install IBM's
ViaVoice SDK.