Saturday, August 14, 2010

Speech recognition under Linux using simon

Simon is a free speech recognition engine for Linux and Windows. Trying to make it work, I found that there is quite a lack of easy-to-understand tutorials for this software. In this article I'll describe how to record some words and link them to commands.
First, install simon. You can download the software for free. You'll also need to install HTK; you have to register on their website, but it's free and, as far as I can tell, they don't send you spam. I won't describe how to do that here; you'll find plenty of tutorials on the internet.

When you're done installing, start simon. First, create a new Scenario (Scenarios -> Manage -> New). You can also download some predefined scenarios from the internet if you like. The next thing you should do is import a shadow dictionary. Such a dictionary contains a list of words and how they are pronounced, which helps simon recognize them more easily. This blog has a nice collection of dictionaries for all the languages you could think of (right column, "Import PLS dictionary"). Import one by clicking Vocabulary -> Import Dictionary. Select "Shadow dictionary" and "PLS".
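If you're curious what such a dictionary looks like inside, here is a minimal Python sketch that reads a PLS file and prints the pronunciation of a word. It assumes the dictionary follows the W3C PLS layout (lexeme, grapheme and phoneme elements); the two sample entries and their phoneme strings are made up by me for illustration and are not taken from a real dictionary. This is not how simon itself reads the file, just a way to peek into it.

```python
import xml.etree.ElementTree as ET

# Namespace used by W3C PLS (Pronunciation Lexicon Specification) files.
PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Tiny hand-written sample. The phoneme strings are invented for illustration.
SAMPLE_PLS = """<lexicon version="1.0" alphabet="x-sampa" xml:lang="en"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
  <lexeme>
    <grapheme>firefox</grapheme>
    <phoneme>f aI r f O k s</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>terminal</grapheme>
    <phoneme>t 3 m I n @ l</phoneme>
  </lexeme>
</lexicon>
"""


def load_pronunciations(pls_text):
    """Return a dict mapping each word to its list of pronunciations."""
    root = ET.fromstring(pls_text)
    result = {}
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        phonemes = [
            p.text.strip()
            for p in lexeme.findall("pls:phoneme", PLS_NS)
            if p.text
        ]
        for grapheme in lexeme.findall("pls:grapheme", PLS_NS):
            if grapheme.text:
                word = grapheme.text.strip().lower()
                result.setdefault(word, []).extend(phonemes)
    return result


pronunciations = load_pronunciations(SAMPLE_PLS)
print(pronunciations.get("firefox"))
```

To try it on a downloaded dictionary, read the file into a string and pass that to load_pronunciations instead of SAMPLE_PLS.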

Now you can add a word. Click "Add word" and just enter the word the way you normally write it. A good word to start with could be "firefox", used to start the Firefox web browser. Simon should now find that word in the shadow dictionary you just imported and suggest its pronunciation (in fact, it does not matter which syntax is used for the pronunciation, as long as the same phoneme is always written with the same letter; the dictionary does this very well). You're also asked to select a type for the word you entered (like "Noun"). Be sure to select "Trigger" here (create that category if it does not exist). You are then asked to speak the word twice so that simon can learn how it sounds. I recommend training the word at least five more times using the training function on the vocabulary tab (recognition became reliable for me once I had trained a word about ten to fifteen times).

The next thing you should do is add a grammar structure that allows simon to recognize the word as a command. We'll keep it very simple: click the Grammar tab, add a new sentence, and just enter the type of the word you selected earlier (like "Trigger"; by the way, simon calls those types "Terminals").
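To make the relationship between words, terminals and grammar sentences a bit more concrete, here is a small Python model of the idea. It is only my mental picture, not simon's actual code: every word has a terminal, every grammar sentence is a sequence of terminals, and a spoken phrase is accepted only if its terminals match one of those sentences exactly. All names in it are made up for illustration.

```python
# Each known word is assigned a terminal (its "type").
VOCABULARY = {"firefox": "Trigger"}

# Each grammar sentence is a sequence of terminals.
# Our single sentence consists of exactly one "Trigger" word.
GRAMMAR = [["Trigger"]]


def sentence_is_allowed(words, vocabulary, grammar):
    """Return True if the terminals of the spoken words form a grammar sentence."""
    terminals = []
    for word in words:
        terminal = vocabulary.get(word.lower())
        if terminal is None:
            # Unknown word: it cannot be part of any sentence.
            return False
        terminals.append(terminal)
    return terminals in grammar


print(sentence_is_allowed(["firefox"], VOCABULARY, GRAMMAR))
print(sentence_is_allowed(["firefox", "firefox"], VOCABULARY, GRAMMAR))
```

With the single sentence "Trigger", only one-word phrases made of a "Trigger" word pass; saying the word twice in a row does not match any sentence.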

Now, the interesting part: we define commands. Go to the "Commands" tab and click "Manage plugins". Click "Add". Select "Program" and click "Ok". Now, select "Program" in the list and clear the "Trigger" field (for me, it was set to "Computer"). This field lets you define a trigger word that has to be spoken before any command of that category, but we don't want that now because it makes things more complicated. Click "Ok". Now, click the "New Command" button on the right. Select "Program" as type. At the top of the window, there's a "Name" field; in brackets, you see the Terminal (type) of the word simon expects. It's very important that this type ("Trigger" for me) matches the Terminal you selected for the word you just added. You can now select a program or a command. On Linux, I just used Command and entered "firefox".
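The effect of the plugin's "Trigger" field is easier to see in a sketch. The following Python snippet is my own simplified model of what happens, not simon's implementation: the phrase you have to say is the plugin trigger (if any) followed by the command name, and only an exact match starts the program. The function names are hypothetical.

```python
import shlex
import subprocess

# Command name (the word you speak) mapped to the command line to run.
COMMANDS = {"firefox": "firefox"}


def expected_phrase(plugin_trigger, command_name):
    """Phrase that must be spoken: optional plugin trigger, then the name."""
    return " ".join(part for part in (plugin_trigger, command_name) if part)


def resolve(recognized, plugin_trigger, commands):
    """Return the argument list for the matching command, or None."""
    spoken = recognized.strip().lower()
    for name, command_line in commands.items():
        if spoken == expected_phrase(plugin_trigger, name).lower():
            return shlex.split(command_line)
    return None


def launch(argv):
    """Start the program without waiting for it to finish."""
    return subprocess.Popen(argv)


print(resolve("firefox", "", COMMANDS))
print(resolve("firefox", "Computer", COMMANDS))
print(resolve("Computer firefox", "Computer", COMMANDS))
```

With an empty trigger, saying "firefox" is enough. With the trigger left at "Computer", "firefox" alone matches nothing and you would have to say "Computer firefox" instead. Passing the result of resolve to launch would actually start the program.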

You're done! Now click the "Activate" button and hope that HTK (that's what it's for) succeeds in compiling your language model. When it's done, say "firefox" (as much as possible like you did during training) and hope Firefox starts. Good luck!

Troubleshooting:
  • Nothing happens when I speak. For me, this was usually because the terminals of the commands did not match. Be sure that: a) you have defined a grammar structure that matches the command you issue (in our case, it should just contain "Trigger"); b) the type of the word "firefox" (or whatever word you use) is the same ("Trigger" in our case); and c) "Trigger" is what simon expects as type (you can see that in the "Add command" dialogue; it should ask you for something like "Name (Trigger):"). If you select the command in the "Commands" tab, you'll see a summary on the right; you can also trigger the command there to see if it works. The top of that summary states what you need to say to activate this command; be sure it reads "firefox", and not something like "Computer firefox". In the latter case, remove the "Computer" trigger in the "Manage plugins" dialogue. Note: it's unlikely that your executable setup is the problem, because simon displays a small blue window at the bottom of the screen whenever it recognizes a command, so you can tell recognition and program launching apart.
  • I cannot activate simon; it displays error messages. Sorry, I'm not an expert, so I can only advise you to read the error message carefully; usually, it provides enough information to fix the problem. Be sure you have defined at least one word, trained it, and defined at least one grammar structure.
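The checks from the first item can be written down as a small checklist in Python. This is just a sketch of the reasoning, with made-up names and plain dictionaries standing in for what you see in simon's dialogues; it does not read simon's real configuration.

```python
def find_problems(word, vocabulary, grammar, command_terminal, plugin_trigger):
    """Return a list of human-readable problems for a one-word command."""
    word_terminal = vocabulary.get(word)
    if word_terminal is None:
        return ["the word '%s' is not in the vocabulary" % word]

    problems = []
    # a) a grammar sentence must consist of exactly the word's terminal
    if [word_terminal] not in grammar:
        problems.append("no grammar sentence consists of '%s'" % word_terminal)
    # b) and c) the word's terminal must be the one the command expects
    if command_terminal != word_terminal:
        problems.append(
            "the command expects '%s' but the word is '%s'"
            % (command_terminal, word_terminal)
        )
    # a leftover plugin trigger changes what has to be spoken
    if plugin_trigger:
        problems.append(
            "you have to say '%s %s', not just '%s'"
            % (plugin_trigger, word, word)
        )
    return problems


# The setup from this article: nothing to complain about.
print(find_problems("firefox", {"firefox": "Trigger"}, [["Trigger"]], "Trigger", ""))

# A broken setup: wrong word type and a leftover "Computer" trigger.
for problem in find_problems(
    "firefox", {"firefox": "Noun"}, [["Trigger"]], "Trigger", "Computer"
):
    print(problem)
```

An empty list means the three conditions hold and no extra trigger word is required.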
I'd be glad if you told me in the comments whether or not this worked for you.

1 comment:

  1. I will argue that the very fact that we have such models is at the root of the problem. Let me explain.
