
Announcement: A ROS package for Google's speech-to-text API and NLP API - Dialogflow

kinetic

#1

Hi everyone,

This is my first time contributing to open source in general, so bear with me.
I have developed a ROS package that uses Google’s speech-to-text API to publish text onto a topic. That text is then parsed by their NLP platform, Dialogflow, to extract the user’s intent and to decide whether to run any actions the developer has associated with that intent. It’s much better explained when you see their console here.

My package can be found here: http://wiki.ros.org/dialogflow_ros

I’d love to get some feedback on how to enhance this and what features to add. I would also like to see if this is of interest to people.

Best,
Anas


#2

You have a PR for a few little things. Thanks for your contribution.


#3

Anas,

I have been working on adding Dialogflow to my Robot Commander app, which runs on Android. So far it’s looking good, but it’s not ready for release. One problem: I am using Dialogflow API v.1, because v.2 speech handling is subject to charges, and I don’t see a way to pass them on.

But if I understand Dialogflow correctly, v.1 does not allow sending audio to Google for complete processing, including speech recognition, NLP, and voice interaction. Thus I am using speech recognition to get the user’s utterance as text, then sending the text to Dialogflow, as I think you are. The drawback is two trips over the network instead of one, with the consequent performance penalty.

I’ll be interested to follow your progress.

Joe


#4

Thanks Sam! Saw the PR and appreciate you setting me on track. I did not know Python 2 classes should explicitly inherit from object, so that’s a +1.
I did notice that some of the requirements are already listed in rosdep, so that will simplify things. I will update requirements.txt in the meantime.
Merged and will follow up accordingly.
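
For anyone else who had not run into the Python 2 point above, here is a minimal sketch of what explicit inheritance from object looks like. The class names are hypothetical and are not taken from the package. On Python 3 both classes behave the same; on Python 2 only the first is a new-style class, which is what makes features such as super() and property setters work as expected.

```python
class IntentResult(object):
    """Hypothetical example class. The explicit `object` base makes this a
    new-style class on Python 2 (on Python 3 it is redundant but harmless)."""

    def __init__(self, text):
        self._text = text

    @property
    def text(self):
        # Read-only access to the stored text.
        return self._text


class LegacyResult:
    """Hypothetical counterpart without a base class. On Python 2 this would
    be an old-style class; on Python 3 it implicitly inherits from object."""
    pass


result = IntentResult("move forward")
```

Writing the base class explicitly keeps the code behaving the same way on both interpreters.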


#5

Hey Joe,

I am using a similar approach to yours, because even in v2 the streaming API is not stable (not even implemented, if I read the code correctly…). So what I do is use the asynchronous continuous speech streaming function, get the text, send it to NLP, and get the fulfillment text back.

Feel free to take a look at my implementation. You’ll notice I have 2 nodes, one for TTS and the other for NLP.
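
The flow described above (speech in, text out, text to NLP, fulfillment text back) can be sketched roughly as below. This is only an illustration and not the package's actual code: run_pipeline, fake_transcribe, and fake_detect_intent are hypothetical names, and the two fakes stand in for the real calls to the speech and Dialogflow services so the sketch runs without credentials or a network connection.

```python
def run_pipeline(utterances, transcribe, detect_intent):
    """Yield one fulfillment text per recognized utterance.

    `transcribe` and `detect_intent` are injected callables, so the real
    cloud clients can be swapped in without changing this function.
    """
    for audio in utterances:
        text = transcribe(audio)       # trip 1: speech recognition
        if not text:
            continue                   # nothing recognized, skip the NLP call
        yield detect_intent(text)      # trip 2: NLP, returns fulfillment text


def fake_transcribe(audio):
    # Assumption for the demo: the "audio" is just UTF-8 bytes of the words.
    return audio.decode("utf-8").strip()


def fake_detect_intent(text):
    # Assumption for the demo: echo the text as the fulfillment response.
    return "You said: " + text


replies = list(run_pipeline([b"hello ", b"", b"stop"],
                            fake_transcribe, fake_detect_intent))
```

Keeping the two stages as separate callables mirrors the two-node split, and it also makes the two network trips Joe mentioned explicit.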

Once I get word that Dialogflow has audio streaming ready, I’ll add that functionality.

Anas