[conspire] Speech recognition on Linux (or, don't rely on IBM)

Sun Feb 10 21:17:39 PST 2008

Subject matter of this post is something I've never had occasion to try,
but I've just looked it up because Kai is interested.  Things I found:

http://tldp.org/HOWTO/Speech-Recognition-HOWTO/index.html 
  Speech Recognition HOWTO, dated 2002 (slightly moldy)
  Talks about various aspects of the problem including hardware.
  Software mentioned:

  A. Open source:
  1.  XVoice 
  Dictation/continuous speech recognizer that can be used with a variety
  of X applications. 
  Requires IBM ViaVoice for Linux and Motif/Lesstif graphics libs.
  http://www.compapp.dcu.ie/~tdoris/Xvoice/
  http://www.zachary.com/creemer/xvoice.html
  http://xvoice.sourceforge.net
  http://www.onelist.com/community/xvoice

  2.  CVoiceControl (Console Voice Control) 
  A basic speech recognition system that allows a user to execute Linux
  commands by using spoken commands, and includes a microphone-level
  configuration utility, a vocabulary "model editor" for adding new
  commands and utterances, and the speech recognition system itself.
  (Replaces KVoiceControl.)
  http://www.kiecza.de/daniel/linux/
  http://www.kiecza.de/daniel/linux/cvoicecontrol/

  3.  Open Mind Speech
  Not end-user oriented, and still under development at the time of the 
  HOWTO update.  Previously called FreeSpeech, before that SpeechInput,
  before that VoiceControl.
  http://freespeech.sourceforge.net/

  2008 update:  "mostly complete".  Last update was 2002.
  http://sourceforge.net/projects/freespeech/  They've added a nice 
  C++ rapid-development environment called FlowDesigner and are using
  that.
  http://flowdesigner.sourceforge.net/wiki/index.php/Main_Page
  Looks like the "Open Mind Speech environment aka Piper PL" has been 
  given the name "Overflow".  (I hope this is meaningful to some people,
  because it isn't to me.)

  4.  GVoice
  A library (i.e. core module to be used by other software) to use 
  IBM's ViaVoice to control Gtk/GNOME apps, including libraries for
  initialization, recognition engine, vocabulary manipulation, and panel
  control.  Development was stalled at the time of the HOWTO update.
  http://www.cse.ogi.edu/~omega/gnome/gvoice/

  5.  ISIP
  Speech recognition engine (toolkit) from the Mississippi State U.
  Institute for Signal and Information Processing, aimed at developers, 
  including a front-end, a decoder, and a training module.
  http://www.isip.msstate.edu/project/speech/

  6.  CMU Sphinx
  Large package, aimed at developers, including trainers, recognizers, 
  acoustic models, language models, and some limited documentation.
  http://www.speech.cs.cmu.edu/sphinx/Sphinx.html
  http://download.sourceforge.net/cmusphinx/sphinx2-0.1a.tar.gz

  7.  Ears
  Another in-progress kit for developers.
  ftp://svr-ftp.eng.cam.ac.uk/comp.speech/recognition/

  8.  NICO ANN Toolkit
  NICO Artificial Neural Network toolkit, aimed at developers, is a 
  flexible back propagation neural network toolkit optimized for 
  speech recognition applications.
  http://www.speech.kth.se/NICO/

  9.  Myers's Hidden Markov Model Software
  Developers' toolkit implementing in C++ Hidden Markov Model algorithms 
  detailed in L. Rabiner's book "Fundamentals of Speech Recognition".
  http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/myers.hmm.html

  10.  Jialong He's Speech Recognition Research Tool
  Research tool for developers implementing three different types of 
  recognisers:  DTW (dynamic time warping), a Dynamic Hidden Markov
  Model, and a Continuous Density Hidden Markov Model.
  http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/jialong.html

  B. Proprietary:
  1.  IBM ViaVoice
  Proprietary, partly gratis, partly for pay as of the HOWTO update.  
  Had hefty resource requirements for the day.  Includes documentation 
  (PDF), trainer, dictation system, and installation scripts.  Some other 
  components available.  Apparently, Java stuff.
  http://www-4.ibm.com/software/speech/dev/sdk_linux.html  (Gone.)
  (See footnote [1], below:  IBM killed it.)

  2.  Vocalis Speechware
  http://www.vocalisspeechware.com/
  http://www.vocalis.com/

  3.  Babel Technologies's Babear SDK for Linux
  Speaker-independent system based on Hybrid Markov Models and
  Artificial Neural Networks technology. They also have a variety of
  products for text-to-speech, speaker verification, and phoneme analysis.
  http://www.babeltech.com/

  4.  SpeechWorks
  http://www.speechworks.com/

  5.  Nuance
  Speech recognition/natural language product; can handle very large
  vocabularies and uses a unique distributed architecture for scalability
  and fault tolerance. 
  http://www.nuance.com/

  6.  Abbot/AbbotDemo
  Very-large-vocabulary, speaker-independent system, originally
  developed at Cambridge Univ., then spun off.
  http://www.softsound.com/

  7.  Entropic
  Offered software for Linux, but then were bought by Microsoft.
  Old site http://www.entropic.com/ showed what they had (but you'll
  probably have to use an Internet Archive snapshot, by now).
  Older copy of their Hidden Markov Model Toolkit is available gratis
  (but proprietary) from http://htk.eng.cam.ac.uk/ .

A bunch more options (a second catalogue of projects):
http://linux-sound.org/speech.html

A good page on the subject, last updated _June 2005_ and hence much less
moldy than the HOWTO:
http://volker.top.geek.nz/linux/speechrec.html

[1] Article from 2004 about IBM plans to finally open-source ViaVoice:
http://www.theinquirer.net/en/inquirer/news/2004/09/14/ibm-to-open-source-speech-recognition
(As of 2008, I see no sign that they ever did.)
Article from 2002 about IBM making yet more bizarre moves, including 
discontinuing without comment the Linux SDK for ViaVoice: 
http://www.linuxjournal.com/article/6383
Article from 2004 reporting that IBM had open-sourced _some_
voice-recognition software, donating it to Apache Software Foundation and Eclipse
Foundation, but had omitted ViaVoice:
http://www.hackinthebox.org/modules.php?op=modload&name=News&file=article&sid=14188&mode=thread&order=0&thold=0
Further detail:
http://www.theinquirer.net/en/inquirer/news/2004/09/22/open-sourced-ibm-speech-code-doesnt-include-viavoice

Sounds like ViaVoice for Linux -- both the SDK and runtime -- has been
bureaucratised to death and buried somewhere within IBM.  Too bad, but
that's what happens all too often when you rely on proprietary software.
http://xvoice.sourceforge.net/faq.html includes:

  What is xvoice?
    Xvoice enables continuous speech dictation and speech control of
  most X applications. To convert users' speech into text it uses the IBM
  ViaVoice speech recognition engine, which is no longer made available
  from IBM. 

  Where can I get the ViaVoice Runtime RPM, the ViaVoice SDK RPM, or the
  ViaVoice Dictation (GUI) RPM?
    They are no longer available from IBM. Used versions may be
  available; ask on the mailing list for more help locating people who are
  willing to relinquish their license(s) to you. Check in at the xvoice
  mailing list to stay up to date on developments.