
Python pocketsphinx speech recognition (GStreamer-free)

Hi. I come from the speech recognition community and am just starting to experiment with ROS.

I found several examples of ROS voice control using pocketsphinx. However, they seem a little too complicated and outdated, and they also require a GStreamer dependency.

I prepared a simple Python demo using the latest pocketsphinx-python release. It works in keyword-spotting mode, which means better filtering of out-of-vocabulary words.
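To give an idea of the shape of such a demo, here is a minimal sketch of the dispatch side: mapping spotted keyphrases to velocity commands. The phrase table and velocity values are hypothetical, and the actual pocketsphinx calls are only shown in comments; a real node would publish geometry_msgs/Twist messages instead of returning tuples.

```python
# Hypothetical keyphrase table: (linear x, angular z) velocities.
COMMANDS = {
    "move forward": (0.2, 0.0),
    "move backward": (-0.2, 0.0),
    "turn left": (0.0, 0.5),
    "turn right": (0.0, -0.5),
    "stop": (0.0, 0.0),
}

def phrase_to_command(hypothesis):
    """Return (linear, angular) for a spotted phrase, or None if the
    hypothesis is out of vocabulary."""
    return COMMANDS.get(hypothesis.strip().lower())

# In the demo itself the hypotheses come from pocketsphinx running in
# keyword-spotting mode, roughly like:
#
#   from pocketsphinx import LiveSpeech
#   for phrase in LiveSpeech(kws='keyphrase.list'):
#       cmd = phrase_to_command(str(phrase))
#       # ... publish a Twist built from cmd
```

Keeping the mapping in a plain dictionary makes it trivial to add or change phrases without touching the recognition code.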

I debugged it only with the TurtleBot world simulator, where it works fine. However, I have not had a chance to try it on anything else.

The questions are:

  • could it be useful?
  • if yes, what is the proper way to integrate it?

Thanks in advance for any suggestions.


Hi Arseniy,

I am working with Ubiquity Robotics, and last year I wrote a speech control module for these robots, which use ROS. Originally I tried to use pocketsphinx, but I couldn't make it work well (surely my fault), and when I learned about the Web Speech API I decided to go that way. I also realized that putting the recognition on the robot was a bad idea, because it could easily go out of earshot.

So my stuff is written in JavaScript and runs in a browser on a notebook or phone. It does recognition using Web Speech and transmits commands to the robot via rosbridge, over wifi.
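The rosbridge side of a setup like this is language-agnostic: it is just JSON messages over a WebSocket, following the rosbridge protocol. A minimal sketch of building the protocol's `publish` operation (the topic name and the Twist-shaped payload are assumptions for illustration):

```python
import json

def rosbridge_publish(topic, msg):
    """Build a rosbridge-protocol 'publish' message as a JSON string.
    A client would send this string over the rosbridge WebSocket."""
    return json.dumps({"op": "publish", "topic": topic, "msg": msg})

# Example: a geometry_msgs/Twist-shaped payload for "turn left".
twist = {
    "linear": {"x": 0.0, "y": 0.0, "z": 0.0},
    "angular": {"x": 0.0, "y": 0.0, "z": 0.5},
}
payload = rosbridge_publish("/cmd_vel", twist)
```

Because the wire format is plain JSON, the same message can be produced from JavaScript in the browser or from a local Python recognizer interchangeably.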

This works pretty well, as long as there is a good internet connection and
good wifi between the controlling station and the robot. Pocketsphinx
could provide local recognition and so make the internet connection
unnecessary, if the local wifi is reliable and the distances permit.

The only problem I see with command-spotting mode is in setting up waypoints, which (at least in my implementation) requires the use of unfamiliar words.

Here is a demo, and the code is on GitHub.

I’ll be happy to talk shop anytime.

Joe Landau


That would definitely be useful, as there are already a few efforts to deal with speech recognition.
I would suggest you look for other packages on the wiki or ROS Answers and see if you could come up with a common message. That would greatly help speech recognition efforts.
For pocketsphinx specifically, check with the authors of the other packages you have found and see whether they would be OK with merging with yours (they might not be on this mailing list, so pinging them directly could be nice).
And yes there is interest! Robots need to interact with humans!


Thank you for the suggestions and for the links.

As for pocketsphinx, I tried searching for existing projects. The official one seems to be no longer maintained, judging by commit and PR activity.

Since then, the pocketsphinx community has come up with some nice features: a greatly simplified interface, minimal dependencies, better accuracy, and support for many more languages. So I felt it should be spread around somehow. One concern is that my expertise is only enough to provide a simple working example; I have no idea how this should be integrated into real robots, or whether it will conflict with other versions of ROS, interfaces, etc.

You are right, I'll try to reach out to the authors of other similar projects too.
Thanks again!

This is pretty cool! I agree with Vincent that human-robot interaction is important, but unfortunately mostly ignored.

Since it seems that you want to use this project as an opportunity to learn ROS, there are a few things you could do to become more familiar with ROS while also integrating your code better, ordered roughly by difficulty.

  • Put your node in a catkin package; this allows it to be run and launched with rosrun and roslaunch, and lets others depend on your node from their own packages.
  • Write a demo roslaunch file, which could for example run both the simulator and your node.
  • Use ROS parameters instead of command-line args; this way your node can be configured the same way as other nodes.
  • You could also try separating the detection of words and the actions into separate nodes. The speech recognition node can be given a dictionary at start and publish std_msgs/String messages to a node that moves the robot based on the commands. This way people can swap out parts, such as using web-based speech instead of pocketsphinx, or a natural language processing node instead of a simple dictionary one.
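The roslaunch and parameter suggestions above might look something like the following. All package, node, and parameter names here are hypothetical placeholders, not the actual demo's names; the TurtleBot include path is an assumption about the simulator setup.

```xml
<!-- demo.launch: hypothetical names, for illustration only -->
<launch>
  <!-- bring up the TurtleBot world simulator -->
  <include file="$(find turtlebot_gazebo)/launch/turtlebot_world.launch"/>

  <!-- the keyword-spotting node; a ROS parameter replaces what was
       previously a command-line argument -->
  <node pkg="pocketsphinx_demo" type="kws_control.py" name="kws_control"
        output="screen">
    <param name="kws_list"
           value="$(find pocketsphinx_demo)/config/keyphrase.list"/>
  </node>
</launch>
```

With this in place, `roslaunch pocketsphinx_demo demo.launch` would start both pieces, and the node would read its keyphrase file path via `rospy.get_param`.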

This list is merely some suggestions for using this project to learn. Anyway, this list already goes quite a ways up the ROS learning curve, and will put you well on the way to becoming comfortable with ROS. If you run into any issues, I, and I'm sure others in the ROS community, will be glad to help.



Great input! I’ll use your project as an example then.

Agree that using ASR onboard is hard. Noise and far-field signal distortions are still challenging unless you have a microphone array and some fancy processing.

Adding new words in keyword-spotting mode is actually quite easy and can be done on the fly. But yes, for good performance we are limited to a few dozen phrases…
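For reference, the keyphrase file that pocketsphinx consumes in kws mode is just one phrase per line, each with a per-phrase detection threshold between slashes. The phrases and threshold values below are placeholders; thresholds generally need empirical tuning (longer phrases tolerate lower, i.e. more permissive, thresholds).

```
move forward /1e-20/
move backward /1e-20/
turn left /1e-25/
turn right /1e-25/
stop /1e-15/
```

Adding a word on the fly amounts to appending a line like these and reloading the keyphrase search.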

Again thanks a lot. This was helpful.

Ah great, I definitely have to study this. This was something I actually removed from the old pocketsphinx GStreamer project.

Thank you, I will modify the code soon.

Just for reference, here is an implementation / wrapper for the Sphinx4 Java library: code

Unfortunately it’s a bit complex to install just this package without the entire RAPP Platform infrastructure, but I think it’s worth a look.

We have a live instance of the RAPP Platform if you want to test calling it with the Python API. Check here for the cloud part and here for the Python API.