ROS2 OpenAI Whisper

We implemented an inference node for OpenAI's Whisper speech recognition model here: GitHub - mhubii/ros2_whisper

It is a little hacky right now; it can be activated by saying “Hello ROS”, which Whisper understands as “Hello Ross”.

Notably, audio_common does not build for ROS 2. Hence, there is a small node that captures and publishes audio using PyAudio.
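
The repository has the actual PyAudio node; as a minimal sketch of one detail such a node needs, here is the byte-to-sample conversion, assuming the stream is opened as 16-bit little-endian mono PCM (the `bytes_to_samples` name is hypothetical, not from the package):

```python
import struct

def bytes_to_samples(chunk: bytes) -> list:
    """Unpack a raw PCM chunk (16-bit little-endian mono) into int samples.

    PyAudio's stream.read() returns bytes in this layout when the stream
    is opened with format=paInt16 and channels=1; the resulting integer
    list is what you would put into a ROS 2 message payload.
    """
    n = len(chunk) // 2  # two bytes per 16-bit sample
    return list(struct.unpack(f"<{n}h", chunk))

# Example: two samples, 1000 and -1000, packed as little-endian int16.
chunk = struct.pack("<2h", 1000, -1000)
print(bytes_to_samples(chunk))  # [1000, -1000]
```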

Moving forward, it would probably make sense to

  • Fix audio_common
  • Clean up the Whisper inference code. Ideally, the inference node should just publish text to a ROS 2 topic rather than doing further processing

This is pretty cool, thanks @mhubii!

Any chance you could land a README.md at GitHub - mhubii/ros2_whisper and provide simple instructions on how to get it to work on a Linux-based workstation? I think this has lots of potential to be re-used.

Thanks for checking this out, @vmayoral. There is now a README and a little documentation. This repository is still very experimental, though; I wonder whether it could be turned into something more useful.


There have been changes to the ros2_whisper package. The package now lives under GitHub - ros-ai/ros2_whisper: Whisper C++ inference action server for ROS 2 and is built using whisper.cpp.

Main new ideas are:

  • Action server for whisper.cpp that publishes intermediate transcription as feedback and final transcription as result
  • A vendor package for whisper.cpp
  • A manager for downloading whisper.cpp models to cache
  • Ring buffer for audio
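
The ring buffer keeps only the most recent audio in a fixed amount of memory, which is what a continuously running transcriber needs. A minimal sketch of the idea (not the package's actual implementation, which is C++):

```python
class RingBuffer:
    """Fixed-capacity ring buffer: once full, new samples overwrite the oldest."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buffer = [0] * capacity
        self.head = 0   # next write position
        self.size = 0   # number of valid samples stored

    def extend(self, samples) -> None:
        """Append samples, silently dropping the oldest once capacity is hit."""
        for s in samples:
            self.buffer[self.head] = s
            self.head = (self.head + 1) % self.capacity
            self.size = min(self.size + 1, self.capacity)

    def read(self) -> list:
        """Return the stored samples oldest-first."""
        if self.size < self.capacity:
            return self.buffer[:self.size]
        return self.buffer[self.head:] + self.buffer[:self.head]

rb = RingBuffer(4)
rb.extend([1, 2, 3, 4, 5])  # 1 falls out of the window
print(rb.read())            # [2, 3, 4, 5]
```

The key design point is that writers never block: stale audio is simply discarded, so a slow inference step cannot back-pressure the capture node.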

Note that there is another package out there, whisper_ros, which also utilizes whisper.cpp but has no action server, no vendor package, no model caching, and no ring buffer. It does, however, have voice activity detection through silero-vad. This could be supported nicely within ros2_whisper through the action server.
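
silero-vad is a learned model, but the gating interface it would provide is simple: frame of samples in, speech/no-speech decision out, and that boolean is what would trigger the action server. As a much cruder stand-in to illustrate the interface (the function and threshold here are hypothetical, not from either package):

```python
import math

def is_speech(samples, threshold: float = 500.0) -> bool:
    """Crude energy-based voice activity detector.

    Returns True when the RMS energy of a frame of 16-bit PCM samples
    exceeds the threshold. A learned model like silero-vad makes this
    decision far more robustly, but exposes the same shape of interface.
    """
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms > threshold

print(is_speech([0] * 160))           # False: a silent frame
print(is_speech([2000, -2000] * 80))  # True: a loud frame
```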


Thanks @mhubii seems super useful, looking forward to trying this out.

Hi @peterdavidfagan , thanks for expressing your interest.

There are some cool concepts, such as the action server, but the whisper.cpp implementation is (despite high expectations and a CUDA backend) much slower than the plain PyTorch one. Maybe there have been some improvements by now, but it might make more sense to just use OpenAI's PyTorch version and keep some of the action server concepts.

whisper.cpp 1.5.0 is out and it is wild!

The Whisper port now runs fully on the CUDA backend with no other dependencies.

It is now available for ros2_whisper as well: GitHub - ros-ai/ros2_whisper: Whisper C++ inference action server for ROS 2
