Speech recognizer sample code

Now let us look at a sample program. I assume that readers are familiar with speech technology and its terms (if not, take a look at: Java speech recognizer).

Requirements

To use the Java Speech API, a user must have certain minimum software and hardware available.

Speech software: we require JSAPI (in our case we are going to use CMU Sphinx)

Audio hardware: microphone, speakers
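Before running the recognizer, it can help to confirm that Java can see a capture device at all. The following is a minimal sketch using the standard javax.sound.sampled API. The class name MicrophoneCheck and the capture format (16 kHz, 16-bit, mono) are assumptions made for illustration; the sample rate is guessed from the "16k" in the acoustic model name used later in this post.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.TargetDataLine;

// Hypothetical helper, not part of Sphinx-4.
class MicrophoneCheck {

    // Assumed capture format: 16 kHz, 16-bit, mono, signed, big-endian.
    static AudioFormat assumedFormat() {
        return new AudioFormat(16000f, 16, 1, true, true);
    }

    // True if the audio system offers a capture line for the given format.
    static boolean captureLineAvailable(AudioFormat format) {
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
        return AudioSystem.isLineSupported(info);
    }

    public static void main(String[] args) {
        if (captureLineAvailable(assumedFormat())) {
            System.out.println("A capture line for the assumed format is available.");
        } else {
            System.out.println("No capture line found. Check that a microphone is connected.");
        }
    }
}
```

If this prints that no capture line was found, the problem is likely with the audio setup rather than with the recognizer configuration.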

**********************************************************

package edu.cmu.sphinx.demo.hellodigits;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

/**
 * A simple HelloDigits demo showing a simple speech application built using
 * Sphinx-4. This application uses the Sphinx-4 endpointer, which automatically
 * segments incoming audio into utterances and silences.
 */
public class HelloDigits {

    public static void main(String[] args) throws MalformedURLException {
        URL url;
        if (args.length > 0) {
            // use the configuration file given on the command line
            url = new File(args[0]).toURI().toURL();
        } else {
            // otherwise load the configuration file stored next to this class
            url = HelloDigits.class.getResource("hellodigits.config.xml");
        }

        ConfigurationManager cm = new ConfigurationManager(url);

        // allocate the recognizer
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone
        Microphone microphone = (Microphone) cm.lookup("microphone");
        microphone.startRecording();

        System.out.println("Say any digit(s): e.g. \"two oh oh four\", or \"three six five\".");

        // loop the recognition until the program exits
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");

            Result result = recognizer.recognize();

            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }
}
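Class.getResource returns null when the named resource is not found next to the class, and a null URL would typically only surface later as a NullPointerException while the configuration is being loaded. The following is a minimal sketch of resolving the configuration location with an explicit check first. The class ConfigLocator and its locate method are hypothetical helpers written for illustration and are not part of Sphinx-4.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper, not part of Sphinx-4.
class ConfigLocator {

    // Resolves the config from the first argument, or from the classpath
    // next to the anchor class, and fails with a clear message if missing.
    static URL locate(String[] args, Class<?> anchor, String resourceName)
            throws FileNotFoundException, MalformedURLException {
        if (args.length > 0) {
            File file = new File(args[0]);
            if (!file.isFile()) {
                throw new FileNotFoundException(
                        "Config file not found: " + file.getAbsolutePath());
            }
            return file.toURI().toURL();
        }
        URL url = anchor.getResource(resourceName);
        if (url == null) {
            throw new FileNotFoundException(resourceName
                    + " is not on the classpath next to " + anchor.getName());
        }
        return url;
    }

    public static void main(String[] args) {
        try {
            URL url = locate(args, ConfigLocator.class, "hellodigits.config.xml");
            System.out.println("Using configuration: " + url);
        } catch (FileNotFoundException | MalformedURLException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

In the demo, the returned URL would be passed to the ConfigurationManager constructor in place of the unchecked url variable.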

 

Code for the grammar file

***************digits.gram******************

#JSGF V1.0;

/**
 * JSGF Digits Grammar for Hello World example
 */

grammar digits;

public <numbers> = (oh | zero | one | two | three | four | five | six | seven | eight | nine) * ;

*************************************************
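The grammar accepts any sequence (including an empty one) of the eleven digit words, so the recognizer returns text such as "two oh oh four" rather than a number. The following is a minimal sketch of turning that text into a digit string. The class DigitWords and its toDigits method are hypothetical and only illustrate one way to post-process the result text.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Sphinx-4.
class DigitWords {

    // The eleven words allowed by the digits grammar.
    private static final Map<String, Character> DIGITS = new HashMap<>();
    static {
        DIGITS.put("oh", '0');
        DIGITS.put("zero", '0');
        DIGITS.put("one", '1');
        DIGITS.put("two", '2');
        DIGITS.put("three", '3');
        DIGITS.put("four", '4');
        DIGITS.put("five", '5');
        DIGITS.put("six", '6');
        DIGITS.put("seven", '7');
        DIGITS.put("eight", '8');
        DIGITS.put("nine", '9');
    }

    // Returns the digit string, or null if any word is outside the grammar.
    static String toDigits(String utterance) {
        String trimmed = utterance.trim();
        if (trimmed.isEmpty()) {
            return ""; // the grammar's * allows zero words
        }
        StringBuilder out = new StringBuilder();
        for (String word : trimmed.split("\\s+")) {
            Character digit = DIGITS.get(word.toLowerCase(Locale.ROOT));
            if (digit == null) {
                return null;
            }
            out.append(digit);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toDigits("two oh oh four")); // prints 2004
        System.out.println(toDigits("three six five")); // prints 365
    }
}
```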

 

Code for the XML configuration file

************hellodigits.config.xml************

<?xml version="1.0" encoding="UTF-8"?>

<!--
    Sphinx-4 Configuration file
-->

<!-- ******************************************************** -->
<!-- an4 configuration file -->
<!-- ******************************************************** -->

<config>

    <!-- ******************************************************** -->
    <!-- frequently tuned properties -->
    <!-- ******************************************************** -->

    <property name="logLevel" value="WARNING"/>

    <property name="absoluteBeamWidth" value="-1"/>
    <property name="relativeBeamWidth" value="1E-80"/>
    <property name="wordInsertionProbability" value="1E-36"/>
    <property name="languageWeight" value="8"/>

    <property name="frontend" value="epFrontEnd"/>
    <property name="recognizer" value="recognizer"/>
    <property name="showCreations" value="true"/>

    <!-- ******************************************************** -->
    <!-- word recognizer configuration -->
    <!-- ******************************************************** -->

    <component name="recognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
        <property name="decoder" value="decoder"/>
        <propertylist name="monitors">
            <item>accuracyTracker </item>
            <item>speedTracker </item>
            <item>memoryTracker </item>
        </propertylist>
    </component>

    <!-- ******************************************************** -->
    <!-- The Decoder configuration -->
    <!-- ******************************************************** -->

    <component name="decoder" type="edu.cmu.sphinx.decoder.Decoder">
        <property name="searchManager" value="searchManager"/>
    </component>

    <component name="searchManager"
               type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
        <property name="logMath" value="logMath"/>
        <property name="linguist" value="flatLinguist"/>
        <property name="pruner" value="trivialPruner"/>
        <property name="scorer" value="threadedScorer"/>
        <property name="activeListFactory" value="activeList"/>
    </component>

    <component name="activeList"
               type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
        <property name="logMath" value="logMath"/>
        <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
        <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
    </component>

    <component name="trivialPruner"
               type="edu.cmu.sphinx.decoder.pruner.SimplePruner"/>

    <component name="threadedScorer"
               type="edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer">
        <property name="frontend" value="${frontend}"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The linguist configuration -->
    <!-- ******************************************************** -->

    <component name="flatLinguist"
               type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
        <property name="logMath" value="logMath"/>
        <property name="grammar" value="jsgfGrammar"/>
        <property name="acousticModel" value="tidigits"/>
        <property name="wordInsertionProbability"
                  value="${wordInsertionProbability}"/>
        <property name="languageWeight" value="${languageWeight}"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The Grammar configuration -->
    <!-- ******************************************************** -->

    <component name="jsgfGrammar" type="edu.cmu.sphinx.jsapi.JSGFGrammar">
        <property name="dictionary" value="dictionary"/>
        <property name="grammarLocation"
                  value="resource:/edu.cmu.sphinx.demo.hellodigits.HelloDigits!/edu/cmu/sphinx/demo/hellodigits/"/>
        <property name="grammarName" value="digits"/>
        <property name="logMath" value="logMath"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The Dictionary configuration -->
    <!-- ******************************************************** -->

    <component name="dictionary"
               type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
        <property name="dictionaryPath"
                  value="resource:/edu.cmu.sphinx.model.acoustic.TIDIGITS_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/TIDIGITS_8gau_13dCep_16k_40mel_130Hz_6800Hz/dictionary"/>
        <property name="fillerPath"
                  value="resource:/edu.cmu.sphinx.model.acoustic.TIDIGITS_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/TIDIGITS_8gau_13dCep_16k_40mel_130Hz_6800Hz/fillerdict"/>
        <property name="addSilEndingPronunciation" value="false"/>
        <property name="wordReplacement" value="&lt;sil&gt;"/>
        <property name="allowMissingWords" value="false"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The acoustic model configuration -->
    <!-- ******************************************************** -->

    <component name="tidigits"
               type="edu.cmu.sphinx.model.acoustic.TIDIGITS_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model">
        <property name="loader" value="sphinx3Loader"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <component name="sphinx3Loader"
               type="edu.cmu.sphinx.model.acoustic.TIDIGITS_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader">
        <property name="logMath" value="logMath"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The unit manager configuration -->
    <!-- ******************************************************** -->

    <component name="unitManager"
               type="edu.cmu.sphinx.linguist.acoustic.UnitManager"/>

    <!-- ******************************************************** -->
    <!-- The live frontend configuration -->
    <!-- ******************************************************** -->

    <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
        <propertylist name="pipeline">
            <item>microphone </item>
            <item>dataBlocker </item>
            <item>speechClassifier </item>
            <item>speechMarker </item>
            <item>nonSpeechDataFilter </item>
            <item>preemphasizer </item>
            <item>windower </item>
            <item>fft </item>
            <item>melFilterBank </item>
            <item>dct </item>
            <item>liveCMN </item>
            <item>featureExtraction </item>
        </propertylist>
    </component>

    <!-- ******************************************************** -->
    <!-- The frontend pipelines -->
    <!-- ******************************************************** -->

    <component name="speechClassifier"
               type="edu.cmu.sphinx.frontend.endpoint.SpeechClassifier">
        <property name="threshold" value="13"/>
    </component>

    <component name="dataBlocker" type="edu.cmu.sphinx.frontend.DataBlocker"/>

    <component name="nonSpeechDataFilter"
               type="edu.cmu.sphinx.frontend.endpoint.NonSpeechDataFilter"/>

    <component name="speechMarker"
               type="edu.cmu.sphinx.frontend.endpoint.SpeechMarker">
        <property name="speechTrailer" value="50"/>
    </component>

    <component name="preemphasizer"
               type="edu.cmu.sphinx.frontend.filter.Preemphasizer"/>

    <component name="windower"
               type="edu.cmu.sphinx.frontend.window.RaisedCosineWindower">
    </component>

    <component name="fft"
               type="edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform">
    </component>

    <component name="melFilterBank"
               type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank">
    </component>

    <component name="dct"
               type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform"/>

    <component name="liveCMN"
               type="edu.cmu.sphinx.frontend.feature.LiveCMN"/>

    <component name="featureExtraction"
               type="edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor"/>

    <component name="microphone"
               type="edu.cmu.sphinx.frontend.util.Microphone">
        <property name="closeBetweenUtterances" value="false"/>
    </component>

    <!-- ******************************************************* -->
    <!-- monitors -->
    <!-- ******************************************************* -->

    <component name="accuracyTracker"
               type="edu.cmu.sphinx.instrumentation.BestPathAccuracyTracker">
        <property name="recognizer" value="${recognizer}"/>
        <property name="showAlignedResults" value="false"/>
        <property name="showRawResults" value="false"/>
    </component>

    <component name="memoryTracker"
               type="edu.cmu.sphinx.instrumentation.MemoryTracker">
        <property name="recognizer" value="${recognizer}"/>
        <property name="showSummary" value="false"/>
        <property name="showDetails" value="false"/>
    </component>

    <component name="speedTracker"
               type="edu.cmu.sphinx.instrumentation.SpeedTracker">
        <property name="recognizer" value="${recognizer}"/>
        <property name="frontend" value="${frontend}"/>
        <property name="showSummary" value="true"/>
        <property name="showDetails" value="false"/>
    </component>

    <!-- ******************************************************* -->
    <!-- Miscellaneous components -->
    <!-- ******************************************************* -->

    <component name="logMath" type="edu.cmu.sphinx.util.LogMath">
        <property name="logBase" value="1.0001"/>
        <property name="useAddTable" value="true"/>
    </component>

</config>
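In the configuration above, values written as ${name} (for example ${frontend} or ${relativeBeamWidth}) refer to the top-level properties declared at the start of the file, so a tuned value only has to be changed in one place. The following is a small illustration of that substitution idea in plain Java. It is not the Sphinx-4 implementation; the class GlobalPropertySketch and its resolve method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: mimics ${name} lookup, not Sphinx-4's own code.
class GlobalPropertySketch {

    // Matches a value that consists entirely of one ${name} reference.
    private static final Pattern REFERENCE = Pattern.compile("\\$\\{(\\w+)\\}");

    // Returns the referenced global value, or the value itself if it is a literal.
    static String resolve(String value, Map<String, String> globals) {
        Matcher matcher = REFERENCE.matcher(value);
        if (!matcher.matches()) {
            return value;
        }
        String name = matcher.group(1);
        String resolved = globals.get(name);
        if (resolved == null) {
            throw new IllegalArgumentException("Undefined global property: " + name);
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Top-level properties copied from the configuration file.
        Map<String, String> globals = new HashMap<>();
        globals.put("absoluteBeamWidth", "-1");
        globals.put("relativeBeamWidth", "1E-80");
        globals.put("frontend", "epFrontEnd");

        System.out.println(resolve("${frontend}", globals));          // prints epFrontEnd
        System.out.println(resolve("${relativeBeamWidth}", globals)); // prints 1E-80
        System.out.println(resolve("logMath", globals));              // prints logMath
    }
}
```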

 

Note: The recognizer consumes more heap memory in the JVM than a typical program, so it is necessary to set the heap size manually.

Set the maximum heap size with -Xmx256m and then run the program.
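To confirm that the heap setting actually reached the JVM, the program can check Runtime.maxMemory() at startup. The following is a minimal sketch; the class HeapCheck, the 256 MB target and the 10% tolerance are assumptions for illustration (the tolerance is there because the JVM may report slightly less than the -Xmx value).

```java
// Hypothetical helper, not part of Sphinx-4.
class HeapCheck {

    // Target taken from the -Xmx256m setting.
    static final long ASSUMED_MIN_BYTES = 256L * 1024 * 1024;

    // True if the reported maximum heap is within 10% of the required size or larger.
    static boolean hasEnoughHeap(long maxHeapBytes, long requiredBytes) {
        return maxHeapBytes >= requiredBytes - requiredBytes / 10;
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory();
        System.out.println("Maximum heap: " + (max / (1024 * 1024)) + " MB");
        if (!hasEnoughHeap(max, ASSUMED_MIN_BYTES)) {
            System.out.println("Heap looks too small. Try running with -Xmx256m.");
        }
    }
}
```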


18 Comments

  1. andy wrote
    at 3:59 PM - 18th March 2011

    how do u compile the above program using cmu sphinx?

  2. shakthydoss wrote
    at 4:46 PM - 18th March 2011

    It is the same as compiling any Java program.
    You must provide a proper heap memory size at the time of execution; also make sure that the config file and grammar files are in the correct path.

  3. techstu123 wrote
    at 12:10 PM - 7th June 2011

    ok i figured what i was doin wrong, lot many things apparently.
    anyway, now i’m getting this error:

    Exception in thread "main" java.lang.NullPointerException
    at edu.cmu.sphinx.util.props.SaxLoader.load(SaxLoader.java:64)
    at edu.cmu.sphinx.util.props.ConfigurationManager.loader(ConfigurationManager.java:383)
    at edu.cmu.sphinx.util.props.ConfigurationManager.<init>(ConfigurationManager.java:115)
    at HelloDigits.HelloDigits.main(HelloDigits.java:36)

    It gives this error where i get the recognizer. something wrong with the config file i guess. it compiles fine though.

  4. shakthydoss wrote
    at 2:14 PM - 7th June 2011

    I don't know exactly what changes you have made.
    Anyhow, the current exception may be due to the grammar file configuration.

    In the grammar file (XML file), go to the Grammar configuration section
    then check this

  5. techstu123 wrote
    at 8:28 AM - 7th June 2011

    I tried this out, i didn’t include the package statement n i saved the files to my regular directory called Project. It gives me a NoClassDefFoundError.
    Speech Recognition is a major part of my project. Please help.

  6. shakthydoss wrote
    at 2:43 PM - 7th June 2011

    In the grammar file (XML file), go to the Grammar configuration section
    then check this
    property name="grammarLocation"

    value="resource:/edu.cmu.sphinx.demo.hellodigits.HelloDigits!/edu/cmu/sphinx/demo/hellodigits/"

  7. techstu123 wrote
    at 3:45 AM - 8th June 2011

    ya its right,no problem there.

  8. techstu123 wrote
    at 3:47 AM - 8th June 2011

    Exception in thread "main" java.lang.NullPointerException
    at edu.cmu.sphinx.util.props.SaxLoader.load(SaxLoader.java:64)
    at edu.cmu.sphinx.util.props.ConfigurationManager.loader(ConfigurationManager.java:383)
    at edu.cmu.sphinx.util.props.ConfigurationManager.<init>(ConfigurationManager.java:115)
    at hellodigits.HelloDigits.main(HelloDigits.java:36)

  9. techstu123 wrote
    at 4:33 AM - 8th June 2011

    found out the prob. the placement of config.xml was outside the package!
    now i am getting this error:

    Exception in thread "main" java.lang.NullPointerException
    at hellodigits.HelloDigits.main(HelloDigits.java:47)

    the line is: microphone.startRecording(); could be wrong?

    my mike is connected and working. What

  10. shakthydoss wrote
    at 7:35 AM - 8th June 2011

    To techstu123
    Take a look at this http://shakthydoss.wordpress.com/2011/06/08/speech-recognizer-execution/

  11. Mustafa adel wrote
    at 2:32 AM - 8th June 2011

    I have an error on recognizer.allocate(); >> null. Please help, as I am getting disappointed with the error. mustafa.adel.elnagar@ieee.org

  12. shakthydossdoss wrote
    at 7:32 AM - 8th June 2011

    To Mustafa adel
    OK, this may be helpful for you http://shakthydoss.wordpress.com/2011/06/08/speech-recognizer-execution/

  13. Ryan Kareem wrote
    at 4:57 AM - 1st February 2012

    Hi, I have run the 2000 project in my NetBeans and it worked fine. However, I have no clue how to add a new file to the dictionary. I followed this process:

    PART ONE
    Step 1 : Create a txt file “words.txt”, Write all the names of cities and states in it and save.
    Step 2 : Open this link : http://www.speech.cs.cmu.edu/tools/lmtool.html
    Step 3 : On that page, go to “Sentence corpus file:” section, Browse to “words.txt” file and click “Compile Knowledge Base”.
    Step 4 : On next page, Click on “Dictionary” link and save that .DIC file.

    PART TWO
    Step 1 : Extract WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar file.
    Step 2 : Go to edu\cmu\sphinx\model\acoustic\WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz\dict folder.
    Step 3 : Open “cmudict.0.6d” file in that folder.
    Step 4 : Copy data from .DIC file, you have downloaded in PART ONE, paste it in “cmudict.0.6d” file and save.
    Step 5 : Zip the extracted hierarchy back as it was and Zip file named should be same as JAR file.

    Now, remove “WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar” file from Project’s CLASSPATH and add “WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.zip” instead of it.

    (got these steps from this link http://puneetk.com/expanding-dictionary-of-acoustic-model)

    but after following the process, I got this exception message,

    class not found !java.lang.ClassNotFoundException: edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model
    Exception in thread "main" Property Exception component:'flatLinguist' property:'acousticModel' - mandatory property is not set!
    edu.cmu.sphinx.util.props.InternalConfigurationException
    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:292)
    at edu.cmu.sphinx.linguist.flat.FlatLinguist.setupAcousticModel(FlatLinguist.java:207)
    at edu.cmu.sphinx.linguist.flat.FlatLinguist.newProperties(FlatLinguist.java:171)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:430)
    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:280)
    at edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager.newProperties(SimpleBreadthFirstSearchManager.java:145)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:430)
    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:280)
    at edu.cmu.sphinx.decoder.AbstractDecoder.newProperties(AbstractDecoder.java:52)
    at edu.cmu.sphinx.decoder.Decoder.newProperties(Decoder.java:31)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:430)
    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:280)
    at edu.cmu.sphinx.recognizer.Recognizer.newProperties(Recognizer.java:78)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:430)
    at edu.cmu.sphinx.util.props.ConfigurationManager.lookup(ConfigurationManager.java:163)
    at edu.cmu.sphinx.demo.helloworld.HelloWorld.main(HelloWorld.java:36)
    Java Result: 1

    Could you please help me to add more words to the project you posted (2000)? It would be great to have a video tutorial on the steps for adding words. I searched and found one video about this on YouTube, but it did not work either.

    Thanks in advance

  14. shakthydoss wrote
    at 7:59 AM - 1st February 2012

    I will try and let you know what is going wrong.

  15. abinash bastola wrote
    at 1:53 PM - 8th February 2013

    “hi shakthydoss! i am avinash, i need your help to understand what sphinx and freetts actually are. why can't we use jsapi only to develop a speech synthesizer or recognizer? jsapi.jar can also be created using jsapi.exe found in freetts and sphinx, what does it mean? Is that jsapi.jar different from the other jsapi.jar provided by oracle? please help…”

  16. SWAPNIL wrote
    at 11:10 PM - 31st May 2013

    @Ryan Kareem I am also getting the same problem. Were you able to debug your problem? Please help me out as soon as possible.

  17. shakthydoss wrote
    at 7:29 PM - 4th September 2013

    Hi all,

    Demo Application has been moved to github. Please find the link.
    https://github.com/shakthydoss/SpeechRecognizerDemo/

  18. Nguyen Thanh Hung wrote
    at 2:22 PM - 27th May 2015

    now, I want to build a program that will say a new dictionary, how to do it: for example my dictionary is
    Hello John, Are you studying CMU Sphinx?
    🙂
