`ros2ai` next-gen command line interface

Hi ROS users,

I was able to allocate some spare time during the Thanksgiving holiday to come up with the idea and an implementation.
I would like to introduce ros2ai, a next-gen command line interface. (Currently this is just a hobby and a personal project of mine.)

Please see the demo; it makes it really easy to see what we can do:

(https://github.com/fujitatomoya/ros2ai/assets/43395114/78a0799b-40e3-4dc8-99cb-488994e94769)

If you are interested, please reach out to me :smile:

I have some more ideas, so I will keep working on this :+1:

thanks,
Tomoya


This is a pretty good endeavour @tomoyafujita, thanks for sharing! This has potential in my opinion.

I find myself typing parts of my ROS graphs into foundational models’ chat-like UIs quite often lately. I find these great for troubleshooting through lots of ROS data and/or summarizing it (which I simply can’t digest on my own). The extension you propose can bridge that gap very nicely and avoid having to copy and paste unnecessarily.

I don’t have much bandwidth these days for side projects but I’ll throw some ideas in case some in the community may want to grab and implement them on top of your disclosure:

  • I see there are exec, query and status subverbs implemented for now. How about an init subverb which initiates a new conversation, packs the most representative (if not all) ROS graph data, and passes it over for digestion, so that future queries take that into account? (A minimal sketch of this idea follows the list.)
  • Subverbs update and summarize come to mind as well, which could build upon the init one proposed above (and with a similar mindset).
  • Currently the proposed implementation makes use of the API, which is costly AFAIK. Something interesting would be to try out the hack we introduced back while developing PentestGPT. In a nutshell, instead of using the API, in earlier prototypes we simulated browser interactions through an accepted/initiated cookie. We exposed this through the environment variable CHATGPT_COOKIE instead (see here for a branch where this was prototyped). Extraction of the cookie could easily be automated (tested only in some browsers).
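
For illustration, here is a minimal sketch of what such an init subverb could do under the hood, assuming rclpy for graph introspection and the official OpenAI Python client. The function names, model name, and prompt are made up for the example and are not part of ros2ai:

```python
# Hypothetical sketch of an "init"-style context collector (not part of ros2ai).
# It snapshots the ROS graph with rclpy and seeds an OpenAI chat session with it.
import json

import rclpy
from rclpy.node import Node
from openai import OpenAI  # official OpenAI Python client (openai>=1.0)


def collect_graph_snapshot() -> str:
    """Return a JSON summary of nodes, topics, and services currently in the graph."""
    rclpy.init()
    probe = Node('ros2ai_graph_probe')
    # In practice you may want to give DDS discovery a moment before querying.
    snapshot = {
        'nodes': probe.get_node_names_and_namespaces(),
        'topics': probe.get_topic_names_and_types(),
        'services': probe.get_service_names_and_types(),
    }
    probe.destroy_node()
    rclpy.shutdown()
    return json.dumps(snapshot, indent=2)


def start_session_with_context() -> list:
    """Build the initial message list so that later queries see the graph context."""
    return [
        {'role': 'system',
         'content': 'You are a ROS 2 assistant. Here is the current ROS graph:\n'
                    + collect_graph_snapshot()},
    ]


if __name__ == '__main__':
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    messages = start_session_with_context()
    messages.append({'role': 'user', 'content': 'Summarize this ROS graph.'})
    reply = client.chat.completions.create(model='gpt-4o-mini', messages=messages)
    print(reply.choices[0].message.content)
```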

You don’t suggest working around the Terms and Conditions of the service, do you, Victor? :slight_smile:

@vmayoral thank you so much for sharing your thoughts and ideas.

How about an init subverb which initiates a new conversation, packs the most representative (if not all) ROS graph data, and passes it over for digestion, so that future queries take that into account?

Yeah, that would be interesting, so that the AI can help the user more precisely and in a way dedicated to their environment.

BTW, I am considering some ideas about sessions with a context-aware mode; I have got to spend some more time to figure that out, instead of playing Nintendo Switch :computer: (should be PlayStation :sweat_smile: )…

Subverbs update and summarize come to mind as well, which could build upon the init one proposed above (and with a similar mindset).

I see, this would be useful too, good idea.

Actually, I am kind of inclined to have far fewer subcommands in the future, ideally only ros2 ai, so that the user can just rely on calling a simple command without being bothered by subcommands and options…

which is costly AFAIK

True, I personally use my own API key. I need funding :yen: :laughing:

It would be really nice if we could have community support for this; we could collect ROS 2 data at a larger scale, and that could be really useful for the ROS 2 community as well.

To be honest, the pain right now is not the cost but latency. The API is kind of lagging… I guess I need to find some better fine-tuning…

All that said, I just have some ideas, but concrete ones; I will post issues and slides to share more details :smile:

thanks.

AFAIK the terms don’t establish that you can’t input the data by means other than your keyboard with their Free and Pro accounts, but I’d be happy to learn about it if that’s the case. Note that these accounts are limited to a specific number of interactions per hour, which is a significant restriction (as opposed to the unlimited APIs), but I believe it fits this use case (I don’t see myself asking questions about my graph every minute, after all). As I said, this is something I’m already doing manually, and I’d be surprised if others don’t.

What’s proposed is a reasonable use: a simple click-button tool that pre-fills the conversation for me with the right context sounds fair. Instead of dumping things in manually, you’re just feeding the data programmatically, but you can always turn to the UIs (e.g. your phone, browser, etc.) to continue with the querying. Also, given the competitive landscape of foundational language models becoming increasingly capable and accessible, I think it’s in their interest to facilitate use cases like this, especially with paid and more capable accounts.

Interesting. I’ve been paying attention to locally run, smaller foundational models for various robotic use cases. These could overcome the limitation you’re observing. Maybe keeping a plug-and-play interface (from a foundational models’ perspective) within ros2ai would be helpful?
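
Since the OpenAI Python client lets you override the base URL, a plug-and-play backend could be as simple as pointing the same client at a local, OpenAI-compatible server (for example Ollama). The environment-variable names, endpoint, and model names below are assumptions for illustration, not ros2ai behaviour:

```python
# Hypothetical sketch: swap between the hosted OpenAI API and a local,
# OpenAI-compatible server (e.g. Ollama) purely through configuration.
import os

from openai import OpenAI


def make_client() -> OpenAI:
    # If ROS2AI_LOCAL_URL is set (assumed variable name), talk to the local server;
    # otherwise fall back to the hosted API and OPENAI_API_KEY.
    local_url = os.environ.get('ROS2AI_LOCAL_URL')  # e.g. http://localhost:11434/v1
    if local_url:
        return OpenAI(base_url=local_url, api_key='not-needed-locally')
    return OpenAI()


client = make_client()
resp = client.chat.completions.create(
    model=os.environ.get('ROS2AI_MODEL', 'gpt-4o-mini'),  # or e.g. 'llama3' locally
    messages=[{'role': 'user', 'content': 'What is a ROS 2 topic?'}],
)
print(resp.choices[0].message.content)
```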


Special thanks to The Construct @ricardotellez for the ros2ai podcast interview :exclamation:

It was a really good opportunity to summarize my thoughts and expand ideas from a different perspective :+1:

I do need to study and learn more, but I will try to keep this up :rocket:

Some updates (since last time):

thanks,

@tomoyafujita

Can this be used to call actions and services?

For example: ros2 ai exec "move robot_1 to position (0,0,0), and then to (10,10,10)"

This command would look up the available topics, actions, and services, and call them if it finds a match. For instance, from the string above, it would identify an action that suggests moving robot 1, execute it, wait for completion, and then invoke it again with different parameters.

Perhaps also something like ros2 ai exec "..." --require-confirmation would be useful. This would prompt the user to confirm each action or service before calling it. Alternatively, that could be the default behavior, with --skip-confirmation as an option to bypass confirmations.
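
For illustration only, such a gate could be a thin wrapper that prints each proposed ros2 command and waits for a "y" before running it. The function name and the assumption that the model proposes plain ros2 command lines are mine, not something ros2ai does today:

```python
# Hypothetical sketch of a --require-confirmation style gate: each proposed
# `ros2 ...` command is shown to the user and only executed after approval.
import shlex
import subprocess


def run_with_confirmation(proposed_commands: list[str], skip_confirmation: bool = False) -> None:
    for cmd in proposed_commands:
        if not skip_confirmation:
            answer = input(f'Run "{cmd}"? [y/N] ').strip().lower()
            if answer != 'y':
                print('Skipped.')
                continue
        subprocess.run(shlex.split(cmd), check=False)


if __name__ == '__main__':
    # These commands would normally come from the model's plan for the user's prompt.
    run_with_confirmation([
        'ros2 action list',
        'ros2 service list',
    ])
```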


To be honest, I don't know; I have never tried.

But I do not think that can be done with ros2ai at the moment.
It only knows general ROS 2 and distro-specific information; that said, it cannot understand the user-specific environment at all.

Obviously this requirement is one of the goals for the user experience; it is in the consideration phase of what more we can do.

Please create an issue at GitHub - fujitatomoya/ros2ai: ros2ai is a next-generation ROS 2 command line interface extension with AI such as OpenAI, and also take a look at https://raw.githack.com/fujitatomoya/ros2ai/rolling/doc/overview.html

thanks for sharing the use case.

tomoya

Hey Craig, you may be interested in the LLM example here (it uses the OpenAI API): TurtleBot 4 Navigator · User Manual

I just checked my costs, and during the development of this tutorial I spent $0.10 USD total for 196 API calls (64.7k tokens).
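
For a rough sense of scale, those numbers work out to about $0.0005 per call, or roughly $1.5 per million tokens:

```python
# Rough per-call / per-token cost from the numbers above.
total_usd, calls, tokens = 0.10, 196, 64_700
print(f'{total_usd / calls:.5f} USD per call')                  # ~0.00051
print(f'{total_usd / tokens * 1_000_000:.2f} USD per 1M tokens')  # ~1.55
```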
