About Chuck Cosby Expertise I can answer questions about speech recognition and natural language understanding. I am particularly strong in knowledge-based natural language techniques. I cannot answer questions about robotics, neural nets, Prolog, or vision recognition, just speech and natural language.
Experience I have spent 25 years developing natural language software products. I have never developed speech systems, but I have developed sophisticated interfaces from natural language to speech. I have also been working with speech recognition systems for 25 years.
Question [I accidentally asked this in general programming, but clearly you are the right person to ask!]
I have lots of speech data and corresponding text that I would like to use to train SAPI (or a similar or better speech recognition engine) in order to build speaker-invariant speech recognition software.
Do you know what Google is using to process all the information it is collecting from its GOOG-411 service? It is my belief that they use all the voice information and corresponding text to train an invariant engine to better translate their videos into searchable text.
Does SAPI only allow for one user profile at a time, or if you run many voices through it, will it begin to understand speech that varies from speaker to speaker?
If not, what can I use that will learn an invariant form from a large data set?
I hope to hear from you soon.
Thanks!
-Bilal Ghalib
Answer SAPI is mostly a speaker-dependent system: you will need to do voice training for each user. I don't know what Google is using, but look for an open source solution, as that is their preference. SAPI works best with one user at a time. Your last question represents one of the holy grails of speech recognition. Let me know when you figure it out. (Hint: deep natural language processing integrated with speech recognition is a big part of the solution.)