We implemented an inference node for OpenAI's Whisper speech recognition model here: GitHub - mhubii/ros2_whisper
It is a little hacked together right now. It can be activated by saying "Hello ROS", which Whisper understands as "Hello Ross".
Notably, audio_common does not build for ROS 2. Hence, there is a small node that captures and publishes audio.
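The capture node essentially slices the microphone stream into fixed-size frames and publishes each one. A minimal sketch of that framing step, assuming 16 kHz mono int16-style samples (the real node reads from the sound card and publishes ROS messages; here a plain list stands in for both):

```python
def chunk_audio(samples, frame_size):
    """Yield fixed-size frames from a flat sample buffer, dropping
    any trailing partial frame (as a live capture loop would)."""
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        yield samples[start:start + frame_size]

# One second of fake 16 kHz mono audio; the real node would read these
# samples from the sound card and publish each frame as a message.
samples = [0] * 16000
frames = list(chunk_audio(samples, 1024))
```

Each yielded frame would then be wrapped in a message and published on an audio topic.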
Moving forward, it would probably make sense to:
- Clean up the code for doing inference with Whisper. Ideally, the inference node should just publish text to a ROS 2 topic, rather than doing further processing.
This is pretty cool, thanks @mhubii!
Any chance you could land a README.md at GitHub - mhubii/ros2_whisper and provide simple instructions on how to get it to work on a Linux-based workstation? I think this has lots of potential to be re-used.
Thanks for checking this out @vmayoral! There is now a README and a little documentation, but this repository is very experimental. I wonder if this could be turned into something more useful.
There have been changes to the ros2_whisper package. The package now lives under GitHub - ros-ai/ros2_whisper: Whisper C++ inference action server for ROS 2 and is built using whisper.cpp.
The main new ideas are:
- An action server for whisper.cpp that publishes intermediate transcriptions as feedback and the final transcription as the result
- A vendor package for whisper.cpp
- A manager for downloading whisper.cpp models to a cache
- A ring buffer for audio
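The ring buffer keeps a bounded window of the most recent audio samples for the inference side to consume, with old samples falling out as new ones arrive. A minimal sketch of the idea (the class and method names are my own, not the package's API):

```python
from collections import deque

class AudioRingBuffer:
    """Fixed-capacity sample buffer: writers append new samples and
    the oldest samples are dropped once capacity is exceeded."""

    def __init__(self, capacity):
        self._buf = deque(maxlen=capacity)

    def write(self, samples):
        # deque with maxlen silently evicts from the front on overflow
        self._buf.extend(samples)

    def read_all(self):
        """Return a snapshot of the currently buffered window."""
        return list(self._buf)

# Tiny capacity just to show the eviction behavior.
rb = AudioRingBuffer(capacity=4)
rb.write([1, 2, 3])
rb.write([4, 5, 6])  # samples 1 and 2 fall out of the window
window = rb.read_all()
```

A real implementation would also need locking (or a lock-free design), since the audio callback and the inference thread touch the buffer concurrently.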
Note that there is another package out there, whisper_ros, which also utilizes whisper.cpp but has no action server, no vendor package, no model caching, and no ring buffer. It does, however, have voice activity detection through silero-vad. This could be supported nicely within ros2_whisper through the action server.
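The model manager mentioned above boils down to resolving a model name to a file in a local cache directory and downloading it only on first use. A hedged sketch of that pattern (the URL layout, cache path, and file naming here are assumptions for illustration; whisper.cpp's actual download scripts differ):

```python
import os
import tempfile
import urllib.request

def get_model(name, cache_dir, fetch=urllib.request.urlretrieve):
    """Return the cached path for a model file, downloading on first use.
    `fetch` is injectable so the download step can be stubbed in tests."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, "ggml-{}.bin".format(name))
    if not os.path.exists(path):
        # Hypothetical URL layout -- the real model host differs.
        url = "https://example.com/models/ggml-{}.bin".format(name)
        fetch(url, path)
    return path

# Demo with a stubbed-out fetch so nothing is actually downloaded.
calls = []
def fake_fetch(url, path):
    calls.append(url)
    with open(path, "wb") as f:
        f.write(b"stub model bytes")

cache = tempfile.mkdtemp()
first = get_model("base.en", cache, fetch=fake_fetch)
second = get_model("base.en", cache, fetch=fake_fetch)  # cache hit, no re-download
```

Injecting the fetch function keeps the caching logic testable without network access.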
Thanks @mhubii, this seems super useful. Looking forward to trying this out.
Hi @peterdavidfagan , thanks for expressing your interest.
There are some cool concepts with the action server etc., but the whisper.cpp implementation is (despite high expectations and a CUDA backend) much slower than the plain PyTorch one. Maybe there have been improvements since, but it might make more sense to just use OpenAI's PyTorch version and keep some of the action server concepts.
whisper.cpp 1.5.0 is out and it is wild!
The Whisper port now runs fully on the CUDA backend with no other dependencies.
It is now available for ros2_whisper as well: GitHub - ros-ai/ros2_whisper: Whisper C++ inference action server for ROS 2