Do I need a cloud for speech recognition or speech-to-text?

No, you do not need a ‘cloud’.

Getting text from speech is a separate problem from interpreting that text. The interpretation step is sometimes called Natural Language Understanding (NLU) and is related to Natural Language Processing (NLP).
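
To make that split concrete, here is a minimal sketch in Python. It is only an illustration: `SpeechToText`, `Intent`, `interpret`, `handle_utterance` and `fake_recognizer` are hypothetical names made up for this example, not part of any ROS package or speech service, and the keyword matching is a toy stand-in for real NLU.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical type: any function that turns raw audio bytes into text.
# A cloud service or an offline engine could both sit behind this signature.
SpeechToText = Callable[[bytes], str]


@dataclass
class Intent:
    """Result of the interpretation step (illustrative only)."""
    name: str
    argument: Optional[str] = None


def interpret(text: str) -> Intent:
    """Toy stand-in for NLU: map a transcript to an intent by keyword."""
    words = text.lower().split()
    if "go" in words and "to" in words:
        # Everything after the first "to" is treated as the destination.
        destination = " ".join(words[words.index("to") + 1:])
        return Intent("navigate", destination or None)
    if "stop" in words:
        return Intent("stop")
    return Intent("unknown")


def handle_utterance(audio: bytes, recognize: SpeechToText) -> Intent:
    transcript = recognize(audio)  # step 1: speech -> text
    return interpret(transcript)   # step 2: text -> meaning


def fake_recognizer(audio: bytes) -> str:
    # Placeholder recognizer: pretends the audio bytes are already UTF-8 text.
    return audio.decode("utf-8")
```

Because `handle_utterance` only depends on the `recognize` callable, the recognizer can be swapped between an online and an offline engine without touching the interpretation code.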

Several cloud or online services exist that do both, and there are some that only do the speech-to-text, not the NLU.
There are offline (ROS) packages as well:
