Announcement: A ROS package for Google's speech-to-text API and NLP API - Dialogflow

piraka9011 · March 28, 2018, 11:12pm

Hi everyone,

This is my first time contributing to open source in general, so bear with me.
I have developed a ROS package that uses Google’s speech-to-text API to publish text onto a topic. That text is then parsed by their NLP platform, Dialogflow, to extract a user’s intent and determine whether to run any actions the developer has associated with that intent. It’s much better explained when you see their console here.

My package can be found here: http://wiki.ros.org/dialogflow_ros

I’d love to get some feedback on how to enhance this and what features to add. I would also like to see if this is of interest to people.

Best,
Anas

awesomebytes · March 29, 2018, 12:21am

You have a PR for a few little things. Thanks for your contribution.

jrlandau · March 29, 2018, 12:30am

Anas,

I have been working on adding Dialogflow to my Robot Commander app, which runs on Android. So far it’s looking good, but it’s not ready for release. One problem: I am using Dialogflow API v.1, because v.2 speech handling is subject to charges, and I don’t see a way to pass them on.

But if I understand Dialogflow correctly, v.1 does not allow sending audio to Google for complete processing including speech recognition, NLP, and voice interaction. Thus I am using speech recognition to get the user’s utterance as text, then sending the text to Dialogflow, as I think you are. The drawback of this is 2 trips over the net instead of one, with the consequent performance penalty.

I’ll be interested to follow your progress.

Joe

piraka9011 · March 29, 2018, 1:11am

Thanks Sam! Saw the PR and appreciate you setting me on track. I did not know Python 2 classes should explicitly inherit from object, so that’s a +1.
I did notice that there were some requirements already listed within rosdep so that will simplify things. Will update requirements.txt in the meantime.
Merged and will follow up accordingly.

piraka9011 · March 29, 2018, 1:42am

Hey Joe,

So I am using a similar approach to yours, and that’s because even in v2 the streaming API is not stable (not even implemented, if I read the code correctly…). So what I do is use the asynchronous continuous speech streaming function, get the text, send it to NLP, and get the fulfillment text.
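
Roughly, that flow looks like the sketch below. This is only an illustration, not the package's actual code: `run_pipeline`, `fake_transcripts` and `fake_detect_intent` are made-up names, and the two fakes stand in for the Google speech stream and the Dialogflow text request so the example runs without any credentials.

```python
from typing import Callable, Iterable, Iterator, Tuple


def run_pipeline(
    transcripts: Iterable[str],
    detect_intent: Callable[[str], str],
) -> Iterator[Tuple[str, str]]:
    """Send each recognized utterance to the NLP step.

    Yields (utterance, fulfillment_text) pairs. Each utterance costs a
    second network round trip in the real setup, because recognition and
    intent detection are separate requests.
    """
    for text in transcripts:
        text = text.strip()
        if not text:
            continue  # skip empty recognition results
        yield text, detect_intent(text)


# Hypothetical stand-in for the continuous speech-to-text stream.
def fake_transcripts() -> Iterator[str]:
    yield "turn left"
    yield "   "
    yield "stop"


# Hypothetical stand-in for the Dialogflow text query.
def fake_detect_intent(text: str) -> str:
    return "OK: " + text


if __name__ == "__main__":
    for utterance, reply in run_pipeline(fake_transcripts(), fake_detect_intent):
        print(utterance, "->", reply)
```

Swapping the two fakes for real clients would keep the same shape: one generator that produces text, one function that turns text into a fulfillment string.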

Feel free to take a look at my implementation. You’ll notice I have 2 nodes, one for speech-to-text and the other for NLP.

Once I get word that Dialogflow has audio streaming ready, I’ll add that functionality.

Anas
