Soliciting feedback on a msg package for AI prompting

AndyZe · July 13, 2024, 7:25pm

It’s obvious by now that LLMs/GPTs and other AI models create many powerful new ways to interact with robots. For example:

Is the door open?
Is it safe to move forward?
Did the robot grasp the item?

These questions can all be answered by a simple, free prompt to ChatGPT (for example).

However, there is no ROS-standard way to communicate these prompts yet. A recent PR to rosdistro began the conversation. I’d like to get your feedback here. The naming question is really difficult. I’ll try to summarize the discussion so far:

ai_msgs: too vague

llm_msgs: it’s debatable whether the recent AI models should even be called LLMs any longer. They take multi-modal input (image & audio) now.

openai_msgs: OpenAI isn’t the org releasing the message package, so it wouldn’t be fair to use that name. It also doesn’t describe a concrete intention for the messages, and the same messages would work well with other AI models.

ai_prompt_msgs: Hasn’t been shot down yet, I think?

robosoft_ai_msgs: The message repo is hosted at https://github.com/robosoft-ai/ but there’s no functional dependency.

Suggestions and feedback appreciated!

scastro · July 13, 2024, 8:12pm

Naming ROS interfaces is always astonishingly difficult…

I would avoid using the term “AI” in general because it carries negative connotations of hype-chasing. And I definitely agree that using an organization name is not ideal either.

A term that people are using ubiquitously nowadays is Vision Language Models (VLM). For example,

While vlm_msgs sounds like a nice idea, it might be too specific to exactly this instance of multimodality. And like many terms in this quickly moving field, it may very well wind up being a cringe term in a few months.

My gut says ml_msgs could also work, where the message/service/action types could include different model subcategories, like PromptLanguageModel, PromptVisionLanguageModel, etc.

Or if you want to stay specific to models involving language, maybe language_model_msgs?

AndyZe · July 14, 2024, 10:22pm

I like the idea of PromptLanguageModel and PromptVisionLanguageModel as service names.

Hopefully audio will be added in the near future, although there’s no ROS2 message type for it yet. What would the service name be then? PromptAudioModel?

Video inputs to AI are also on the horizon…

Currently the service name is StringImagePrompt.srv which (imo) extends pretty well to AudioPrompt.srv and VideoPrompt.srv.
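
To make the request/response shape concrete, here is a minimal Python sketch of what a StringImagePrompt-style exchange could look like. It runs without ROS: Image is a reduced stand-in for sensor_msgs/Image, and the field names prompt, image and response, as well as handle_prompt and fake_model, are assumptions for illustration only, not the actual contents of the .srv file.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Image:
    """Reduced stand-in for sensor_msgs/Image (only a few of its fields)."""
    width: int = 0
    height: int = 0
    encoding: str = "rgb8"
    data: bytes = b""


@dataclass
class StringImagePromptRequest:
    # Hypothetical field names; the real .srv may differ.
    prompt: str = ""
    image: Image = field(default_factory=Image)


@dataclass
class StringImagePromptResponse:
    # Hypothetical field name.
    response: str = ""


def handle_prompt(
    request: StringImagePromptRequest,
    model: Callable[[str, Image], str],
) -> StringImagePromptResponse:
    """Pass the text and image to whichever model backend is plugged in."""
    return StringImagePromptResponse(response=model(request.prompt, request.image))


def fake_model(prompt: str, image: Image) -> str:
    """Placeholder backend; a real node would call an AI model here."""
    return f"received {len(image.data)} image bytes for prompt: {prompt}"


if __name__ == "__main__":
    request = StringImagePromptRequest(
        prompt="Is the door open?",
        image=Image(width=2, height=1, data=bytes(6)),
    )
    print(handle_prompt(request, fake_model).response)
```

Under the same assumptions, an AudioPrompt or VideoPrompt variant would only swap the image field for a different payload type and keep the handler unchanged.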

Still thinking that ai_prompt_msgs is my package name preference.

mjcarroll · July 16, 2024, 3:51pm

It’s one of the “two hard things”.

Generally, the idea behind REP 144 is to avoid both too-generic package names and land-grab package names (historically this hasn’t been an issue). At the same time, we recognize that renaming packages (especially interfaces) once they have been released is difficult and to be avoided. That’s why the process can seem a bit opaque at times, and we appreciate your patience.

We discussed this in the weekly maintenance meeting and came to consensus around ai_prompt_msgs. In this case, the stated goal of the package is to be an interface that is mostly generic and interoperable. We agreed with the summary of the reasoning laid out in the initial post that ai_msgs and llm_msgs are probably either too vague or not a good fit for the intention of the package. We also recognize that putting the organization name robosoft in the package name would probably hurt generic interop.

Thanks for raising this discussion and thanks again for your patience.

zmk5 · July 16, 2024, 4:40pm

I honestly think better names for it might be hmi_msgs (human-machine interface) or hri_msgs (human-robot interface) since these tools are more geared to human interfacing with a machine/robot. It’s less specific than ml_msgs or vlm_msgs and may encompass multiple means of using or not using ML.

gbiggs · July 17, 2024, 3:11am

@zmk5’s point is quite valid, I think. Naming interfaces by their semantic purpose is better than naming them like data types.
