This is a cross-post to the StackExchange “Robotics” section.
Definition
In the context of speaking to a robot and making it understand: is there a difference between the two terms “speech recognition” and “speech to text”?
My tasks
At my current state of knowledge and planning, I only need speech to text, in the sense that the robot records spoken words via its microphones and converts them to strings, nothing more. How the strings are interpreted is the business of the programmer, i.e. the scripts I put on the machine.
It is not my goal to automatically bring “sense” into the spoken words, the way the “smart” speakers try to do.
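To illustrate the split I have in mind, here is a minimal Python sketch. It is only an assumption about how such a setup could look: `transcribe` is a hypothetical stand-in for whatever speech-to-text engine ends up on the robot, and the command table and return values are made up for illustration.

```python
from typing import Callable, Dict, Optional


def transcribe(audio: bytes) -> str:
    """Hypothetical stand-in for an on-device speech-to-text engine.

    Assumption: a real engine would decode recorded audio here. This stub
    simply treats the bytes as UTF-8 text so the sketch can run anywhere.
    """
    return audio.decode("utf-8")


def normalize(text: str) -> str:
    """Lower-case the string and collapse whitespace before matching."""
    return " ".join(text.lower().split())


# The "programmer's business": plain strings mapped to robot actions.
# The command phrases and action names are invented for this example.
COMMANDS: Dict[str, Callable[[], str]] = {
    "raise your arm": lambda: "ARM_UP",
    "stop": lambda: "HALT",
}


def interpret(text: str) -> Optional[str]:
    """Look up the transcribed string; return None if it is not a known command."""
    handler = COMMANDS.get(normalize(text))
    return handler() if handler else None


if __name__ == "__main__":
    spoken = transcribe(b"  Raise YOUR arm ")
    print(interpret(spoken))
```

The point of the sketch is that everything after `transcribe` is ordinary string handling; only the transcription step itself raises the question of on-board versus external computing.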
Do I need a “cloud”?
When and why would I need an external computer system (e.g. a cloud-based service like one of the data-hungry US “AI” systems, or an NVIDIA Jetson system) for these tasks? What are the “limits” of the different solutions?
Why do I ask?
My question is not about specific products or coding problems. I am preparing to buy a research robot (I don’t want to advertise at this point) and am trying to figure out the concrete setup/configuration of the machine.
The machine will be in contact with vulnerable people as part of the research. So there are a lot of reasons why cloud-based services are not an option: privacy of the subjects, data protection laws, ethical concerns (ethics committees would never say OK).
The goal of my question is to get a bigger picture of this topic and its side topics.