We implemented an inference node for OpenAI's Whisper speech recognition model here: https://github.com/mhubii/ros2_whisper
It is a little hacky right now, and can be activated by saying "Hello ROS", which Whisper understands as "Hello Ross".
Notably, audio_common does not build for ROS 2. Hence, there is a small node that captures and publishes audio using pyaudio.
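For a rough idea of what such a capture node can look like, here is a minimal sketch. It assumes rclpy and a plain std_msgs UInt8MultiArray payload; the topic name, sample rate, and chunk size are illustrative, not necessarily what the repo uses:

```python
# Hypothetical sketch of a pyaudio capture node, not the repo's actual code.
import pyaudio
import rclpy
from rclpy.node import Node
from std_msgs.msg import UInt8MultiArray


class AudioCaptureNode(Node):
    def __init__(self):
        super().__init__('audio_capture')
        # Topic name and audio parameters are assumptions for illustration.
        self.pub = self.create_publisher(UInt8MultiArray, 'audio', 10)
        self.rate = 16000   # Whisper expects 16 kHz audio
        self.chunk = 1024
        self.pa = pyaudio.PyAudio()
        self.stream = self.pa.open(
            format=pyaudio.paInt16, channels=1, rate=self.rate,
            input=True, frames_per_buffer=self.chunk)
        # Fire roughly once per chunk; stream.read blocks until data is ready.
        self.timer = self.create_timer(self.chunk / self.rate, self.capture)

    def capture(self):
        msg = UInt8MultiArray()
        # Read one chunk of raw 16-bit PCM and publish it as bytes.
        msg.data = list(self.stream.read(self.chunk))
        self.pub.publish(msg)


def main():
    rclpy.init()
    rclpy.spin(AudioCaptureNode())


if __name__ == '__main__':
    main()
```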
Moving forward, it would probably make sense to:

- Fix audio_common.
- Clean up the code for doing inference using whisper. Ideally, the inference node should just publish text to a ROS 2 topic, rather than doing further processing (see the sketch after this list).
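That text-only design could look something like the sketch below. This is a hedged illustration of the proposal, not the repo's current code; the topic names, buffering policy, and model size are all assumptions:

```python
# Hypothetical sketch of an inference node that only publishes text.
import numpy as np
import whisper
import rclpy
from rclpy.node import Node
from std_msgs.msg import String, UInt8MultiArray


class WhisperNode(Node):
    def __init__(self):
        super().__init__('whisper_inference')
        self.model = whisper.load_model('base')  # model size is an assumption
        self.buffer = bytearray()
        self.pub = self.create_publisher(String, 'transcript', 10)
        self.sub = self.create_subscription(
            UInt8MultiArray, 'audio', self.on_audio, 10)

    def on_audio(self, msg):
        self.buffer.extend(bytes(msg.data))
        # Transcribe roughly every 5 s of 16 kHz, 16-bit mono audio.
        if len(self.buffer) >= 2 * 16000 * 5:
            pcm = np.frombuffer(bytes(self.buffer), dtype=np.int16)
            self.buffer.clear()
            # Whisper's Python API accepts normalized float32 samples.
            audio = pcm.astype(np.float32) / 32768.0
            text = self.model.transcribe(audio)['text']
            # Publish the raw transcript; downstream nodes do the processing.
            self.pub.publish(String(data=text))


def main():
    rclpy.init()
    rclpy.spin(WhisperNode())


if __name__ == '__main__':
    main()
```

Keeping all keyword matching and command handling out of this node would make it reusable: the "Hello ROS" activation logic could then live in a small downstream subscriber instead.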