Dear ROS community,
Following REP-1 recommendations for new REP submissions, I would like to gauge interest in a new REP to create a set of conventions around HRI (human-robot interaction) application scenarios.
This new ‘ROS4HRI’ REP proposal aims to provide a set of conventions and common interfaces for Human-Robot Interaction (HRI) scenarios. These interfaces are designed to promote interoperability and reusability of core functionality across the many HRI-related software tools, from skeleton tracking, to face recognition, to natural language processing.
By following the naming conventions and leveraging the interfaces defined in this REP, tools and libraries can be designed to be reusable across different frameworks and experiments. Importantly, the REP does not mandate specific tools or algorithms for human perception or social signal recognition per se; it only specifies the naming conventions and interfaces between such nodes.
These interfaces are designed to be relevant to a broad range of HRI situations, from crowd simulation, to kinesthetic teaching, to social interaction.
ROS is widely used in the context of human-robot interaction (HRI). However, to date, no effort has succeeded in establishing broadly accepted interfaces and pipelines for this domain, as exist in other parts of the ROS ecosystem (for manipulation or 2D navigation, for instance). As a result, many different implementations of common tasks (skeleton tracking, face recognition, speech processing, etc.) coexist; while they achieve similar goals, they are generally not compatible with one another, hampering code reusability, experiment replicability, and the general sharing of knowledge.
In order to address this issue, this REP aims to structure the whole “ROS for HRI” space by creating an adequate set of ROS messages and services to describe the software interactions relevant to the HRI domain, as well as a set of conventions (e.g. topic structure, tf frames) to expose human-related information.
The REP aims to model these interfaces on existing, state-of-the-art algorithms relevant to HRI, while considering the broad range of application scenarios in HRI.
It is hoped that such an effort will allow easier collaboration between projects and reduce the duplication of effort to implement the same functionality.
The proposed conventions cover:
- human modeling, as a combination of a permanent identity (person) and transient parts that are intermittently detected (e.g. face, skeleton, voice);
- topic naming conventions under a dedicated namespace;
- 3D tf frame conventions (naming, orientation; compatible with REP 120 where possible);
- representation of group interactions (groups, mutual gaze).
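To make the modeling idea concrete, here is a minimal Python sketch of a person as a permanent identity with transient parts, together with helpers that build per-detection topic and frame names. The `/humans` namespace root, the `faces/<id>/roi` channel, and the `face_<id>` frame pattern are illustrative assumptions for this example, not the REP's normative names.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed namespace root for the example; the actual root is defined in the REP.
HUMANS_NS = "/humans"

def topic_for(kind: str, part_id: str, channel: str) -> str:
    """Build a per-detection topic name, e.g. /humans/faces/<id>/roi."""
    return f"{HUMANS_NS}/{kind}/{part_id}/{channel}"

def frame_for(kind: str, part_id: str) -> str:
    """Build a tf frame name for a detected part, e.g. face_<id>."""
    return f"{kind}_{part_id}"

@dataclass
class Person:
    """Permanent identity; transient parts are set only while detected."""
    person_id: str
    face_id: Optional[str] = None    # set while a face is tracked
    body_id: Optional[str] = None    # set while a skeleton is tracked
    voice_id: Optional[str] = None   # set while a voice is heard

p = Person("anna1", face_id="f4a2")
print(topic_for("faces", p.face_id, "roi"))   # -> /humans/faces/f4a2/roi
print(frame_for("face", p.face_id))           # -> face_f4a2
```

The key design point is that part identifiers (face, body, voice) are short-lived tracker outputs, while the person identifier persists across detections, so downstream nodes can address either level.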
A detailed proposal was presented at IROS 2021: “ROS for Human-Robot Interaction” (IEEE Xplore).
The reference implementation will include:
- a set of HRI-related ROS messages;
- libhri, a library that eases access to human-related signals;
- a reference open-source pipeline that will include:
- face detection and gaze estimation
- multi-body 3D pose estimation
- voice activity detection and speaker diarization
- sound source localisation
- rviz plugins to visualise human-related information, such as 3D skeletons and face/body regions of interest.
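As a sketch of how a consumer might combine the pipeline's outputs with the person model above, the function below attaches currently tracked transient parts (faces, voices) to known person identities. The dict-based in-memory API is invented for illustration; the reference pipeline would expose this information over ROS topics instead.

```python
def associate(persons: dict, detections: dict) -> dict:
    """Attach currently tracked part ids to known persons.

    persons:    person_id -> {"face": None, "voice": None, ...}
    detections: part kind -> {part_id: person_id or None}
    """
    # Reset transient parts: a part is only valid while it is detected.
    for parts in persons.values():
        for kind in parts:
            parts[kind] = None
    # Attach each identified detection to its person record.
    for kind, tracked in detections.items():
        for part_id, person_id in tracked.items():
            if person_id in persons:
                persons[person_id][kind] = part_id
    return persons

persons = {"anna1": {"face": None, "voice": None}}
# Voice v01 has been detected but not yet identified (person_id is None).
detections = {"face": {"f4a2": "anna1"}, "voice": {"v01": None}}
print(associate(persons, detections))  # -> {'anna1': {'face': 'f4a2', 'voice': None}}
```

This illustrates why identification (linking a transient part to a person) is kept separate from detection: a voice or face can be tracked for a while before anyone knows whose it is.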