Do I need a cloud for speech recognition or speech-to-text?

buhtz · March 29, 2018, 12:15pm

This is a cross-post to the StackExchange “Robotics” section.

Definition
In the context of speaking to a robot and making it understand: is there a difference between the terms speech recognition and speech-to-text?

My tasks
Given my current knowledge and plans, I only need speech-to-text in the sense that the robot records spoken words via its microphones and converts them to strings, nothing more. How the strings are interpreted is the programmer's business, i.e. the scripts I put on the machine.
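The split described above (audio in, plain string out, scripts decide what to do) can be sketched as a small interface. This is a minimal illustration, not any real library's API: `SpeechToText`, `FakeOfflineEngine` and `handle_utterance` are hypothetical names, and the "engine" just returns canned text so the sketch runs without a speech library. The point is that the scripts only ever see a string, so whether that string comes from an on-board engine or a remote service is a configuration choice, not something the rest of the code depends on.

```python
from typing import Protocol


class SpeechToText(Protocol):
    """Anything that turns recorded audio into a plain string (hypothetical interface)."""

    def transcribe(self, audio: bytes) -> str:
        ...


class FakeOfflineEngine:
    """Hypothetical stand-in for a local engine (e.g. a wrapper you would write
    around PocketSphinx or Julius). It returns canned text so this sketch runs
    without any speech library installed."""

    def __init__(self, canned_text: str) -> None:
        self.canned_text = canned_text

    def transcribe(self, audio: bytes) -> str:
        # A real engine would decode `audio` here, entirely on the robot.
        return self.canned_text


def handle_utterance(engine: SpeechToText, audio: bytes) -> str:
    # The robot's scripts only receive the resulting string; they do not
    # know which engine (local or remote) produced it.
    text = engine.transcribe(audio)
    return text.strip().lower()


if __name__ == "__main__":
    engine = FakeOfflineEngine("  Move Forward  ")
    print(handle_utterance(engine, b"\x00\x01"))  # prints "move forward"
```

Swapping in a different backend only means writing another class with a `transcribe` method; nothing downstream changes.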

It is not my goal to automatically extract “sense” from the spoken words, as “smart” speakers try to do.

Do I need a “cloud”?
When and why would I need an external computer system (e.g. a cloud-based service like one of the data-hungry US “AI” systems, or an NVIDIA Jetson system) for these tasks? Where are the boundaries between the different solutions?

Why I ask?
My question is not about specific products or coding problems. I am preparing to buy a research robot (I don't want to advertise at this point) and am trying to figure out the concrete setup/configuration of the machine.

During the research, the machine will be in contact with vulnerable people. So there are many reasons why cloud-based services are not an option: privacy of the subjects, data security laws, and ethical concerns (ethics commissions would never approve it).

The goal of my question is to get a bigger picture of the topic and its side topics.

Loy · March 29, 2018, 1:37pm

No, you do not need a ‘cloud’.

Getting text from speech is a separate problem from the interpretation of that text. That last part is sometimes called NLU: Natural Language Understanding and is related to Natural Language Processing.
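To make the separation concrete: once some speech-to-text engine has produced a transcript, the "understanding" step can be as simple as matching the string against a fixed set of commands. The sketch below is a hypothetical illustration using plain regular expressions (the command patterns and the `interpret` function are made up for this example); it is not a real NLU library and not the API of any package mentioned in this thread.

```python
import re
from typing import Optional, Tuple

# Hypothetical command patterns for illustration; a real robot would define its own.
COMMANDS = [
    ("move", re.compile(r"^(?:please )?(?:move|go) (forward|backward)$")),
    ("turn", re.compile(r"^(?:please )?turn (left|right)$")),
    ("stop", re.compile(r"^(?:please )?stop$")),
]


def interpret(transcript: str) -> Optional[Tuple[str, Tuple[str, ...]]]:
    """Map a transcript to (intent, arguments), or None if nothing matches."""
    text = transcript.strip().lower()
    for intent, pattern in COMMANDS:
        match = pattern.match(text)
        if match:
            return intent, match.groups()
    return None


print(interpret("Please go forward"))  # ('move', ('forward',))
print(interpret("Turn left"))          # ('turn', ('left',))
print(interpret("what time is it"))    # None
```

Restricting the robot to a small command set like this also tends to make offline recognition more reliable, since the recognizer only has to distinguish a handful of phrases.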

Several cloud or online services exist that do both, and there are some that only do the speech-to-text, not the NLU.
There are offline (ROS) packages as well:

https://wiki.ros.org/pocketsphinx (though not easy to make reliable in my experience). Also take a look at https://answers.ros.org/question/246247/speech-recognition-packages-for-ros-kinetic-kame/
For RoboCup@Home, my team uses https://github.com/tue-robotics/dragonfly_speech_recognition together with https://github.com/tue-robotics/grammar_parser
https://answers.ros.org/question/60323/speech-recognition-packages/
https://github.com/julius-speech/julius
https://snips.ai/

ruffsl · March 31, 2018, 6:31pm

The Mycroft community has been working on local speech-to-text:

Mycroft also has a project called OpenSTT, although it seems to be presently dormant.

Also, if you happen to need something to go the other direction, i.e. TTS (text-to-speech):
