ROS2 OpenAI Whisper

We implemented an inference node for OpenAI Whisper speech recognition here: GitHub - mhubii/ros2_whisper

It is a little hacky right now. It can be activated by saying “Hello ROS”, which Whisper transcribes as “Hello Ross”.
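
As a rough illustration of that activation step, here is a minimal sketch that checks a transcript for the wake phrase while tolerating the “Ross” spelling. The function names and the list of accepted variants are hypothetical and not taken from the repository, whose actual activation logic may differ.

```python
import re

# Assumed spellings to accept: the intended phrase plus the variant
# Whisper tends to produce, as reported above.
WAKE_VARIANTS = ("hello ros", "hello ross")


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9\s]", " ", text.lower()).split())


def contains_wake_phrase(transcript, variants=WAKE_VARIANTS):
    """Return True if any variant occurs as whole words in the transcript."""
    # Padding with spaces keeps "hello rossi" from matching "hello ross".
    padded = f" {normalize(transcript)} "
    return any(f" {variant} " in padded for variant in variants)
```

Matching on whole words rather than raw substrings avoids triggering on longer words that merely start with the wake phrase.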

Notably, audio_common does not build for ROS 2. Hence, there is a small node that captures and publishes audio using pyaudio.
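
For readers who want a picture of what such a capture node might look like, here is a minimal sketch, not the repository's code. The node name `audio_capture`, the topic `audio`, the `Int16MultiArray` message type, and the sample rate and chunk size are all assumptions chosen for illustration.

```python
import array

SAMPLE_RATE = 16000  # assumed rate; Whisper models work on 16 kHz mono audio
CHUNK = 1024         # assumed number of frames read per published message


def pcm16_to_samples(raw):
    """Decode raw native-endian 16-bit PCM bytes into a list of ints."""
    samples = array.array("h")
    samples.frombytes(raw)
    return samples.tolist()


def main():
    # Imported here so the helper above stays usable without ROS 2 or pyaudio.
    import pyaudio
    import rclpy
    from std_msgs.msg import Int16MultiArray

    rclpy.init()
    node = rclpy.create_node("audio_capture")                  # hypothetical name
    pub = node.create_publisher(Int16MultiArray, "audio", 10)  # hypothetical topic

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=SAMPLE_RATE,
                     input=True, frames_per_buffer=CHUNK)
    try:
        while rclpy.ok():
            # Blocking read of one chunk from the default input device.
            raw = stream.read(CHUNK, exception_on_overflow=False)
            msg = Int16MultiArray()
            msg.data = pcm16_to_samples(raw)
            pub.publish(msg)
    except KeyboardInterrupt:
        pass
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()
        node.destroy_node()
        if rclpy.ok():
            rclpy.shutdown()
```

In an ament_python package, `main` would typically be registered as a `console_scripts` entry point in `setup.py` and started with `ros2 run`.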

Moving forward, it would probably make sense to

  • Fix audio_common
  • Clean up the code for doing inference using Whisper. Ideally, the inference node should just publish text to a ROS 2 topic rather than doing further processing
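
The second point could be sketched roughly as follows: a node that only turns audio into text and publishes it, leaving any further processing to downstream subscribers. This is an assumption-laden sketch rather than the repository's implementation. The topic names `audio` and `whisper/text`, the node name, the 5 second window, and the choice of the `base` model are all hypothetical.

```python
SAMPLE_RATE = 16000  # Whisper models expect 16 kHz mono audio
WINDOW_SECONDS = 5   # hypothetical amount of audio gathered per transcription


def int16_to_unit_floats(samples):
    """Scale signed 16-bit samples to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def main():
    # Imported here so the helper above stays usable without ROS 2 or Whisper.
    import numpy as np
    import rclpy
    import whisper
    from std_msgs.msg import Int16MultiArray, String

    rclpy.init()
    node = rclpy.create_node("whisper_inference")               # hypothetical name
    model = whisper.load_model("base")                          # assumed model size
    text_pub = node.create_publisher(String, "whisper/text", 10)  # hypothetical topic
    buffer = []

    def on_audio(msg):
        # Accumulate samples until a full window is available.
        buffer.extend(msg.data)
        if len(buffer) < SAMPLE_RATE * WINDOW_SECONDS:
            return
        audio = np.array(int16_to_unit_floats(buffer), dtype=np.float32)
        buffer.clear()
        result = model.transcribe(audio)
        # Publish the plain text only; consumers decide what to do with it.
        text_pub.publish(String(data=result["text"].strip()))

    node.create_subscription(Int16MultiArray, "audio", on_audio, 10)
    try:
        rclpy.spin(node)
    except KeyboardInterrupt:
        pass
    finally:
        node.destroy_node()
        if rclpy.ok():
            rclpy.shutdown()
```

Note that transcribing inside the subscription callback blocks the executor while the model runs, so a real node would probably hand the audio off to a worker thread instead.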

This is pretty cool, thanks @mhubii!

Any chance you could land a README at GitHub - mhubii/ros2_whisper and provide simple instructions on how to get it to work on a Linux-based workstation? I think this has lots of potential to be re-used.

Thanks for checking this out, @vmayoral. There is now a README and a little documentation. But this repository is very experimental. I wonder if this could be turned into something more useful.
