It’s obvious by now that LLMs/GPTs and other AI models create many powerful new ways to interact with robots. For example:
Is the door open?
Is it safe to move forward?
Did the robot grasp the item?
These questions can all be answered by a simple, free prompt to ChatGPT (for example).
However, there is no ROS-standard way to communicate these prompts yet. A recent PR to rosdistro began the conversation. I’d like to get your feedback here. The naming question is really difficult. I’ll try to summarize the discussion so far:
ai_msgs: too vague
llm_msgs: it’s debatable whether the recent AI models should even be called LLMs any longer. They take multi-modal input (image & audio) now.
openai_msgs: OpenAI isn’t the org releasing the message package, so it wouldn’t be fair to use that name. It also doesn’t describe a concrete intention for the messages, and the same messages would work well with other AI models.
ai_prompt_msgs: Hasn’t been shot down yet, I think?
Naming ROS interfaces is always astonishingly difficult…
I would avoid using the term “AI” in general because that has negative connotations with hype-chasing. And definitely agree that using an organization name is also not ideal.
A term that people are using ubiquitously nowadays is Vision Language Models (VLM). For example,
While vlm_msgs sounds like a nice idea, it might be too specific to exactly this instance of multimodality. And like many terms in this quickly moving field, it may very well wind up being a cringe term in a few months.
My gut says ml_msgs could also work, where the message/service/action types could include different model subcategories, like PromptLanguageModel, PromptVisionLanguageModel, etc.
Or if you want to stay specific to models involving language, maybe language_model_msgs?
I like the idea of PromptLanguageModel and PromptVisionLanguageModel as service names.
Hopefully audio will be added in the near future, although there’s no ROS2 message type for it yet. What would the service name be then? PromptAudioModel?
Video inputs to AI are also on the horizon…
Currently the service name is StringImagePrompt.srv, which (imo) extends pretty well to AudioPrompt.srv and VideoPrompt.srv.
Still thinking that ai_prompt_msgs is my package name preference.
Generally, the idea behind REP 144 is to avoid both too-generic package names and land-grab package names (historically the latter hasn’t been an issue). At the same time, we recognize that renaming packages (especially interface packages) once they have been released is difficult and should be avoided. That’s why the process can seem a bit opaque at times, and we appreciate your patience.
We discussed this in the weekly maintenance meeting and came to a consensus around ai_prompt_msgs. In this case, the stated goal of the package is to be an interface that is mostly generic and interoperable. We agreed with the reasoning summarized in the initial post that ai_msgs and llm_msgs are probably either too vague or not a good fit for the intention of the package. We also recognize that putting the organization name (robosoft) in the package name would probably hurt generic interoperability.
Thanks for raising this discussion and thanks again for your patience.
I honestly think better names for it might be hmi_msgs (human-machine interface) or hri_msgs (human-robot interaction), since these tools are geared more toward humans interfacing with a machine or robot. Either name is less tied to a particular technique than ml_msgs or vlm_msgs and could cover interfaces that use ML as well as ones that don’t.