ROS Software Architecture for GPT/AGI Integration

This is a reboot of the lapsed ROS Software Architecture thread.
ROS software architecture - General - ROS Discourse

ROS is a well-laid foundation for robotics, with a wealth of great resources aligned around it, but its ongoing architectural design is radically challenged by rapidly advancing AI.

This thread is to review classical software architecture models and emerging concepts for ROS as a layered data-flow middleware, sandwiched between hardwired control and deliberative AI application layers. Please suggest your ideas for standard semantic networks of master topics with robust operational properties.

So far, GPT already seems quite expert at coding ROS in Python, and it is only getting better; it is almost too eager to help architect itself into robots under widely accepted best practices.


I’m no generative AI expert, but I think it’s a bit premature to say the software architecture designs in ROS are “radically challenged” by things like ChatGPT. Most robotics systems follow (at a very, very high level) a sense->perceive->plan->act paradigm.

I haven’t seen much evidence that modern generative AI (like ChatGPT) can change the sense or act portions in a meaningful way, as these are usually low-level interfaces to sensors, actuators, and controllers, possibly with some sophisticated data filtering on inputs and outputs. The perceive and plan portions will certainly be affected, but the level of impact on architecture might not be that great. At the moment, both these pieces are already AI focused, with perception being handled by various “AI” techniques like CNNs, particle filters, etc. The planning portion also leverages a wide range of AI techniques like A* path planning, behavior trees, Q-learning, etc.

A big portion of robotics is the need to experiment with different strategies to find the one most suited to the current domain. So most architectures already allow for a fair bit of flexibility in the underlying AI algorithms. Nav2 is a great example of this letting users select different path planners and controllers based on their needs.
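To make the flexibility point concrete, here is a minimal sketch of the plugin-style selection Nav2 uses for planners and controllers, reduced to plain Python. The class names and the two toy planners are purely illustrative, not Nav2 APIs:

```python
from abc import ABC, abstractmethod

class PathPlanner(ABC):
    """Interface every planner plugin must satisfy."""
    @abstractmethod
    def plan(self, start, goal):
        ...

class StraightLinePlanner(PathPlanner):
    """Trivial strategy: a direct segment from start to goal."""
    def plan(self, start, goal):
        return [start, goal]

class WaypointPlanner(PathPlanner):
    """Alternative strategy: insert a midpoint waypoint."""
    def plan(self, start, goal):
        mid = tuple((s + g) / 2 for s, g in zip(start, goal))
        return [start, mid, goal]

# Registry keyed by name, mimicking selection of a planner by parameter.
PLANNERS = {"straight": StraightLinePlanner, "waypoint": WaypointPlanner}

def make_planner(name: str) -> PathPlanner:
    """Swap strategies at configuration time without touching callers."""
    return PLANNERS[name]()
```

The point is that callers depend only on the `plan` interface, so experimenting with a different algorithm is a one-line configuration change.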

I think things like ChatGPT would probably be wrapped in ROS nodes just like any other AI algorithm. You could either convert inputs and outputs as needed to provide structure, or maybe even get away with passing raw message content directly. (That would certainly be interesting to experiment with.)
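A rough sketch of the “convert inputs and outputs” variant, in plain Python: structured robot state goes in, a text prompt goes to the model, and the reply is parsed back into a structured command. The `llm_call` function here is a stub standing in for a real model endpoint, and in a real system this logic would live inside an rclpy node:

```python
import json

def llm_call(prompt: str) -> str:
    # Stub for illustration only; a deployed node would call an
    # actual LLM service here.
    return json.dumps({"action": "stop", "reason": "obstacle ahead"})

def structured_query(sensor_summary: dict) -> dict:
    """Convert structured robot state into a text prompt, then parse
    the model's JSON reply back into a structured command."""
    prompt = (
        "You are a robot planner. Given this state, reply with JSON "
        '{"action": ..., "reason": ...}.\nState: ' + json.dumps(sensor_summary)
    )
    reply = llm_call(prompt)
    return json.loads(reply)
```

The structured round-trip keeps the LLM behind a conventional message interface, so downstream nodes never see free-form text.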

However, I think this ignores the two biggest architectural hurdles to using these tools: safety assurance and execution time. ChatGPT is quite slow and uses a large amount of resources to run, so it’s not something you could run on a robot while maintaining any type of hard real-time constraint. Similarly, its more unpredictable nature means using it in a safety-critical system would be difficult.


Thanks msmcconnell. We must be careful not to fall into the classic boiled-frog trap if “a bit premature” turns out to be “a bit late”. Surely someone reading this urgently needs to become a generative AI expert, although I know fine COBOL coders still eking out careers.

Generative pre-trained transformers (GPT), as a fast-evolving technology, present a disruptive picture; ChatGPT is a mere chatbot. The new robotics paradigm is the LWM (large world model), based on multi-modal data (thanks Electric Sheep).

The emerging skill gap is in expert GPT prompting and execution of ROS code, rather than in more hand-coding. Real-time demands will be helped by AI-driven architectural optimization; the next phase of Moore’s Law will expand performance into optical neural nets as a dominant quantum-computing variant, with semantification as the human-readable basis:

All-optical machine learning using diffractive deep neural networks | Science

Here is a robotic AGI Reference Architecture, not yet fully ROSified, in which the original OODA Loop paradigm matches exactly your “sense->perceive->plan->act” naming:

Certainly things are changing rapidly, and that’s why I responded to this thread to see what insights the discussion might have! Could you provide a bit more description of the architecture diagram you shared? I see a low-latency hardware layer of FPGAs, a ROS layer, and the LWM. The core idea of an LWM, as I understand it, is that the AGI learns its world model and can then reason on it. What’s less clear to me is whether the many LLM and LLM Agent text boxes are separate agents or the same one over time.

These LLMs can mostly be the same underlying model, but with each one a thread running under special prompts. The secret of prompting is to “warm up” the LLM with preparatory prompts that define context, then ask for the generative leap.
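A minimal sketch of that pattern, assuming one shared base model with per-role conversation threads. `PromptThread` and its stubbed `ask` method are illustrative names, not any real API; a real system would send the accumulated history to the model on each call:

```python
from dataclasses import dataclass, field

@dataclass
class PromptThread:
    """One conversational thread over a shared base model, primed by
    role-specific warm-up prompts before the generative ask."""
    role: str
    history: list = field(default_factory=list)

    def warm_up(self, *context_prompts: str) -> None:
        # Preparatory prompts that define context for this role.
        self.history.extend(context_prompts)

    def ask(self, question: str) -> str:
        self.history.append(question)
        # Stub reply; a real system would submit self.history to the model.
        return f"[{self.role}] answer to: {question}"

planner = PromptThread("planner")
planner.warm_up("You plan robot motions.", "Output short imperative steps.")
```

Each role (planner, perception, safety monitor) gets its own thread and warm-up context while sharing the same weights underneath.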

The diagram suggests how to train LWMs for robotics: let the model watch the robot’s broad I/O while it performs realistic tasks, with human or scripted help or in VR simulation, and then remove all the “synthetic data”.

For history, the cybernetic loop was first formalized by Wiener, then operationalized as OODA by Boyd, with many variants from Brooks, Braitenberg, Simon, etc., but all conform and regularize onto OODA. GPT is even suggesting a new refinement, OO3DPA (more later).
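For readers unfamiliar with the loop, here is a toy rendering of one Observe-Orient-Decide-Act pass, with each stage injected as a plain function. The stage names follow Boyd; the one-dimensional “drive toward position 10” world is invented purely for illustration:

```python
def ooda_step(observe, orient, decide, act, world):
    """One pass of Boyd's Observe-Orient-Decide-Act loop."""
    observation = observe(world)      # Observe: sample the world
    model = orient(observation)       # Orient: build/update a model
    command = decide(model)           # Decide: choose an action
    return act(command, world)        # Act: apply it to the world

# Toy stages: drive a 1-D robot toward position 10, one unit per step.
world = {"pos": 0}
observe = lambda w: w["pos"]
orient = lambda pos: {"error": 10 - pos}
decide = lambda m: 1 if m["error"] > 0 else 0

def act(cmd, w):
    w["pos"] += cmd
    return w

for _ in range(3):
    world = ooda_step(observe, orient, decide, act, world)
```

The same skeleton accommodates the variants mentioned above: Brooks-style subsumption collapses orient/decide into reactive layers, while a deliberative planner fattens the decide stage.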

ROS 1 and ROS 2 may even evolve into separate layers, with ROS 1 interacting with distributed intelligence and ROS 2 running over the real-time core. Nothing is frozen yet, and many features are sketchy. The AI is eerily willing to sort it all out.


Toward integrating real-time and deliberative threads via a uniform schema.


Interesting thoughts. One thing I would suggest is avoiding having both ROS 1 and ROS 2 utilized in these systems, as shown in the diagram directly above. ROS 1 will reach end of life in about 1.5 years, and while it will probably still get some usage, that will be due mostly to legacy code bases. ROS 2 can certainly fill any role ROS 1 could. If you want to interface with legacy systems, it is better to just add a bridge node instead of a whole ROS 1 layer in your theoretical architecture.

In terms of a ROS-specific architecture, this makes me wonder if there is value in a common message specification for “prompts” beyond just raw data (text, etc.). Is there some metadata common to prompts which would be useful to capture and might help LLM/LWM interoperability?
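One way to explore that question is to sketch what such a message might carry. Everything below is hypothetical — the field names (`model_hint`, `correlation_id`, etc.) are guesses at metadata that might aid interoperability, rendered as a Python dataclass rather than a real `.msg` definition:

```python
from dataclasses import dataclass, field
import time
import uuid

@dataclass
class PromptMsg:
    """Hypothetical common 'prompt' message: raw text plus metadata
    that could help LLM/LWM interoperability."""
    text: str
    role: str = "user"          # e.g. system / user / tool
    model_hint: str = ""        # preferred model family, if any
    max_tokens: int = 256       # response budget
    frame_id: str = ""          # ROS-style context/coordinate frame
    stamp: float = field(default_factory=time.time)
    correlation_id: str = field(default_factory=lambda: uuid.uuid4().hex)

msg = PromptMsg(text="Summarize the last LIDAR scan.")
```

A `correlation_id` would let asynchronous responses be matched back to their prompts, much as ROS services pair requests with replies.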

There is no expectation that ROS is mature in any version. The NASA and SwRI variants are presumed to represent the most rigorous R&D, while anyone is welcome to try their best.

“ROS1” and “ROS2” are stand-in names, based on ROS 2 having improved real-time functionality while ROS 1 currently carries more legacy code. Names will vary between versions. There may be a ROS-RT (real-time), a ROS-MW (middleware), a ROS-AI, and so on. The architectural challenge is clarity and soundness as these layers interact; hence the synched epicyclic layer clock to constrain latencies.

There will be tension between enterprise-scale design standards that managers and users can understand and brilliant but poorly documented code that no one but the genius who wrote it understands. ROS is developing both ways, and AI will soon reconcile it all. What is it about “superhuman intelligence” that some folks cannot understand? It’s computer science come true.

The question should not be who is coding with AI, but how to code for AI integration. Already, AI can do good superhuman work finding human-coded bugs in a twinkling, even if it is not yet as reliable and creative as the best human coders.

Natural language is the new “no-code” coding standard. The ability to semantically express high-level logic and knowledge without ambiguity is the top prompter qualification. A classical education is helpful. Prompting is Socratic dialogue:

Meno’s Slave

This topic is loosely mirrored on the LESSWRONG forum on the AI-developer side. A notable post to the AI Alignment Forum by folks from Anthropic, Conjecture, etc., covers “induction heads”: clusters that form in large neural nets as they train, emergent “circuit” structures hoped to provide a sound basis for “mechanistic interpretability”. We may anticipate growing induction heads in the OODA framework presented here to interface with ROS objects.

Especially relevant here is Conjecture’s independent parallel identification of the OODA Loop paradigm.

“Increasing automation elevates the importance of thinking about the ‘automated interpretability’ [OODA] in which we use models to help us interpret networks and decide which experiments or interventions to perform on them.”

Current themes in mechanistic interpretability research — AI Alignment Forum
