It is a little hacky right now, and can be activated by saying “Hello ROS”, which whisper understands as “Hello Ross”.
Notably, audio_common does not build for ROS 2. Hence, there is a small node that captures and publishes audio using pyaudio.
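For illustration, a minimal version of such a node could look like the sketch below (topic name, chunk size, and the Int16MultiArray message type are assumptions, not the repository's actual code):

```python
# Minimal sketch: capture microphone audio with PyAudio and publish raw
# chunks on a ROS 2 topic. 16 kHz mono matches what whisper expects.
import numpy as np
import pyaudio
import rclpy
from rclpy.node import Node
from std_msgs.msg import Int16MultiArray


class AudioCaptureNode(Node):
    def __init__(self):
        super().__init__('audio_capture')
        self.pub = self.create_publisher(Int16MultiArray, 'audio', 10)
        self.chunk = 1024
        self.pa = pyaudio.PyAudio()
        self.stream = self.pa.open(
            format=pyaudio.paInt16, channels=1, rate=16000,
            input=True, frames_per_buffer=self.chunk)
        # Poll the stream at roughly the chunk rate (1024 / 16000 s).
        self.timer = self.create_timer(self.chunk / 16000.0, self.capture)

    def capture(self):
        data = self.stream.read(self.chunk, exception_on_overflow=False)
        msg = Int16MultiArray()
        msg.data = np.frombuffer(data, dtype=np.int16).tolist()
        self.pub.publish(msg)


def main():
    rclpy.init()
    rclpy.spin(AudioCaptureNode())


if __name__ == '__main__':
    main()
```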
Moving forward, it would probably make sense to:

- Fix audio_common
- Clean up the code for doing inference using whisper. Ideally, the inference node should just publish text to a ROS 2 topic rather than doing further processing (see the sketch below).
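A rough sketch of what that split could look like (topic names, the Int16MultiArray audio format, and the `transcribe()` placeholder are illustrative only):

```python
import rclpy
from rclpy.node import Node
from std_msgs.msg import Int16MultiArray, String


class InferenceNode(Node):
    """Subscribes to raw audio, runs whisper, publishes plain text."""

    def __init__(self):
        super().__init__('whisper_inference')
        self.sub = self.create_subscription(
            Int16MultiArray, 'audio', self.on_audio, 10)
        self.pub = self.create_publisher(String, 'transcript', 10)

    def on_audio(self, msg: Int16MultiArray) -> None:
        text = self.transcribe(msg.data)
        if text:
            self.pub.publish(String(data=text))

    def transcribe(self, samples) -> str:
        # Placeholder for the actual whisper inference call.
        return ''


def main():
    rclpy.init()
    rclpy.spin(InferenceNode())


if __name__ == '__main__':
    main()
```

Downstream nodes (e.g. the keyword activation) would then simply subscribe to the text topic.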
Any chance you could land a README.md at GitHub - mhubii/ros2_whisper and provide simple instructions on how to get it to work on a Linux-based workstation? I think this has lots of potential to be re-used.
Thanks for checking this out @vmayoral, there is now a README and a little documentation. But this repository is very experimental. I wonder if it could be turned into something more useful:
- Action server for whisper.cpp that publishes intermediate transcription as feedback and final transcription as result
- A vendor package for whisper.cpp
- A manager for downloading whisper.cpp models to cache
- Ring buffer for audio (sketched below)
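As an illustration of the ring buffer idea (names, capacity, and sample type are placeholders, not the actual implementation), new samples overwrite the oldest ones and the most recent window can be read back in order:

```python
import numpy as np


class RingBuffer:
    """Fixed-size ring buffer for int16 audio samples."""

    def __init__(self, capacity: int):
        self.buffer = np.zeros(capacity, dtype=np.int16)
        self.capacity = capacity
        self.head = 0   # next write position
        self.size = 0   # number of valid samples

    def write(self, samples) -> None:
        # Simple per-sample write; a real implementation would vectorize this.
        for s in samples:
            self.buffer[self.head] = s
            self.head = (self.head + 1) % self.capacity
            self.size = min(self.size + 1, self.capacity)

    def read(self) -> np.ndarray:
        """Return the stored samples, oldest first."""
        if self.size < self.capacity:
            return self.buffer[:self.size].copy()
        # Buffer is full: the oldest sample sits at `head`.
        return np.concatenate((self.buffer[self.head:], self.buffer[:self.head]))
```

Whisper inference would then always run on the most recent few seconds of audio returned by `read()`.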
Note that there is another package out there, whisper_ros, which also utilizes whisper.cpp but has no action server, no vendor package, no model caching, and no ring buffer. It does, however, have voice activity detection through silero-vad. This could be supported nicely within ros2_whisper through the action server.
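As a rough idea of how that could look (loading silero-vad via torch.hub as described in its README; the exact utils returned may differ between versions, and the file name and sampling rate are placeholders), VAD would pick out the speech segments that get forwarded to the transcription action server:

```python
import torch

# Load the silero-vad model and helper functions from torch.hub.
model, utils = torch.hub.load('snakers4/silero-vad', 'silero_vad')
(get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks) = utils

wav = read_audio('audio.wav', sampling_rate=16000)
# Timestamps (in samples) of detected speech; only these spans would be
# handed to whisper via the action server.
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=16000)
print(speech_timestamps)
```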
There are some cool concepts with the action server etc., but the whisper.cpp implementation is (despite high expectations and the CUDA backend) much slower than the plain PyTorch one. Maybe there have been some improvements since, but it might make more sense to just use OpenAI's PyTorch version and keep some of the action server concepts.
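For reference, the plain PyTorch route via the openai-whisper package is only a few lines (model size and file name are placeholders):

```python
import whisper

model = whisper.load_model("base")       # downloads the checkpoint on first use
result = model.transcribe("audio.wav")   # runs on the GPU if available
print(result["text"])
```

The action server could wrap a call like `model.transcribe()` instead of whisper.cpp while keeping the same feedback/result interface.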