[conspire] Speech recognition on Linux (or, don't rely on IBM)
Rick Moen
rick at linuxmafia.com
Sun Feb 10 21:17:39 PST 2008
Subject matter of this post is something I've never had occasion to try,
but I've just looked it up because Kai is interested. Things I found:
http://tldp.org/HOWTO/Speech-Recognition-HOWTO/index.html
Speech Recognition HOWTO, dated 2002 (slightly moldy)
Talks about various aspects of the problem including hardware.
Software mentioned:
A. Open source:
1. XVoice
Dictation/continuous speech recognizer that can be used with a variety
of X applications.
Requires IBM ViaVoice for Linux and Motif/Lesstif graphics libs.
http://www.compapp.dcu.ie/~tdoris/Xvoice/
http://www.zachary.com/creemer/xvoice.html
http://xvoice.sourceforge.net
http://www.onelist.com/community/xvoice
2. CVoiceControl (Console Voice Control)
A basic speech recognition system that allows a user to execute Linux
commands by using spoken commands, and includes a microphone-level
configuration utility, a vocabulary "model editor" for adding new
commands and utterances, and the speech recognition system itself.
(Replaces KVoiceControl.)
http://www.kiecza.de/daniel/linux/
http://www.kiecza.de/daniel/linux/cvoicecontrol/
3. Open Mind Speech
Not end-user oriented, and still under development at the time of the
HOWTO update. Previously called FreeSpeech, before that SpeechInput,
before that VoiceControl.
http://freespeech.sourceforge.net/
2008 update: "mostly complete". Last update was 2002.
http://sourceforge.net/projects/freespeech/ They've added a nice
C++ rapid-development environment called FlowDesigner and are using
that.
http://flowdesigner.sourceforge.net/wiki/index.php/Main_Page
Looks like the "Open Mind Speech environment aka Piper PL" has been
given the name "Overflow". (I hope this is meaningful to some people,
because it isn't to me.)
4. GVoice
A library (i.e. core module to be used by other software) to use
IBM's ViaVoice to control Gtk/GNOME apps, including libraries for
initialization, recognition engine, vocabulary manipulation, and panel
control. Development was stalled at the time of the HOWTO update.
http://www.cse.ogi.edu/~omega/gnome/gvoice/
5. ISIP
Speech recognition engine (toolkit) from the Mississippi State U.
Institute for Signal and Information Processing, aimed at developers,
including a front-end, a decoder, and a training module.
http://www.isip.msstate.edu/project/speech/
6. CMU Sphinx
Large package, aimed at developers, including trainers, recognizers,
acoustic models, language models, and some limited documentation.
http://www.speech.cs.cmu.edu/sphinx/Sphinx.html
http://download.sourceforge.net/cmusphinx/sphinx2-0.1a.tar.gz
7. Ears
Another in-progress kit for developers.
ftp://svr-ftp.eng.cam.ac.uk/comp.speech/recognition/
8. NICO ANN Toolkit
NICO Artificial Neural Network toolkit, aimed at developers, is a
flexible back propagation neural network toolkit optimized for
speech recognition applications.
http://www.speech.kth.se/NICO/
9. Myers's Hidden Markov Model Software
Developers' toolkit implementing in C++ Hidden Markov Model algorithms
detailed in L. Rabiner's book "Fundamentals of Speech Recognition".
http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/myers.hmm.html
10. Jialong He's Speech Recognition Research Tool
Research tool for developers implementing three different types of
recognisers: DTW (dynamic time warping), Dynamic Hidden Markov Model,
and a Continuous Density Hidden Markov Model.
http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/jialong.html
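
For anyone to whom "DTW" (dynamic time warping) in the last entry is
just an acronym: it scores how well two sequences match when one may be
stretched or compressed in time relative to the other, which is why
early recognisers used it to compare an utterance against stored word
templates. What follows is only a minimal sketch of the textbook
recurrence, not code from Jialong He's tool or any other package listed
here. The function name dtw_distance and the use of plain numbers
instead of per-frame acoustic feature vectors are my own simplifying
assumptions.

```python
def dtw_distance(a, b):
    """Return the dynamic-time-warping cost between two numeric sequences.

    Illustrative only: real recognisers compare sequences of feature
    vectors, not single numbers, and usually constrain the warping path.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])      # local distance between frames
            cost[i][j] = d + min(
                cost[i - 1][j],               # a advances, b repeats a frame
                cost[i][j - 1],               # b advances, a repeats a frame
                cost[i - 1][j - 1],           # both advance
            )
    return cost[n][m]
```

A template matcher built on this would compute dtw_distance between the
incoming utterance and each stored template, and pick the template with
the lowest cost.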
B. Proprietary software:
1. IBM ViaVoice
Proprietary, partly gratis, partly for pay as of the HOWTO update.
Had hefty resource requirements for the day. Includes documentation
(PDF), trainer, dictation system, and installation scripts. Some other
components available. Apparently, Java stuff.
http://www-4.ibm.com/software/speech/dev/sdk_linux.html (Gone.)
(See footnote [1], below: IBM killed it.)
2. Vocalis Speechware
http://www.vocalisspeechware.com/
http://www.vocalis.com/
3. Babel Technologies's Babear SDK for Linux
Speaker-independent system based on Hybrid Markov Models and
Artificial Neural Networks technology. They also have a variety of
products for Text-to-speech, speaker verification, and phoneme analysis.
http://www.babeltech.com/
4. SpeechWorks
http://www.speechworks.com/
5. Nuance
Speech recognition/natural language product; can handle very large
vocabularies and uses a unique distributed architecture for scalability
and fault tolerance.
http://www.nuance.com/
6. Abbot/AbbotDemo
Very large-vocabulary, speaker-independent system, originally
developed at Cambridge Univ., then spun off.
http://www.softsound.com/
7. Entropic
Offered software for Linux, but then were bought by Microsoft.
Old site http://www.entropic.com/ showed what they had (but you'll
probably have to use an Internet Archive snapshot, by now).
Older copy of their Hidden Markov Model Toolkit is available gratis
(but proprietary) from http://htk.eng.cam.ac.uk/ .
A bunch more options (a second catalogue of projects):
http://linux-sound.org/speech.html
A good page on the subject, last updated _June 2005_ and hence much less
moldy than the HOWTO:
http://volker.top.geek.nz/linux/speechrec.html
[1] Article from 2004 about IBM plans to finally open-source ViaVoice:
http://www.theinquirer.net/en/inquirer/news/2004/09/14/ibm-to-open-source-speech-recognition
(As of 2008, I see no sign that they ever did.)
Article from 2002 about IBM making yet more bizarre moves, including
discontinuing without comment the Linux SDK for ViaVoice:
http://www.linuxjournal.com/article/6383
Article from 2004 reporting that IBM had open-sourced _some_
voice-recognition software, donating it to the Apache Software
Foundation and the Eclipse Foundation, but had omitted ViaVoice:
http://www.hackinthebox.org/modules.php?op=modload&name=News&file=article&sid=14188&mode=thread&order=0&thold=0
Further detail:
http://www.theinquirer.net/en/inquirer/news/2004/09/22/open-sourced-ibm-speech-code-doesnt-include-viavoice
Sounds like ViaVoice for Linux -- both the SDK and runtime -- has been
bureaucratised to death and buried somewhere within IBM. Too bad, but
that's what happens all too often when you rely on proprietary software.
http://xvoice.sourceforge.net/faq.html includes:
What is xvoice?
Xvoice enables continuous speech dictation and speech control of
most X applications. To convert users' speech into text it uses the IBM
ViaVoice speech recognition engine, which is no longer made available
from IBM.
Where can I get the ViaVoice Runtime RPM, the ViaVoice SDK RPM, or the
ViaVoice Dictation (GUI) RPM?
They are no longer available from IBM. Used versions may be
available; ask on the mailing list for more help locating people who are
willing to relinquish their license(s) to you. Check in at the xvoice
mailing list to stay up to date on developments.