Looking inside ROScribe and the idea of LLM-based robotic platform

ROScribe

ROScribe is an open source tool that uses LLM for software generation in robotics within ROS (Robot Operating System) framework. ROScribe supports both ROS 1 and ROS 2 with python implementations. In this post, I want to dive deep into how ROScribe works under the hood, and explain some high level ideas behind this project. Further, I want to discuss how using LLM for code generation in a specific domain compares to a generic domain code generation.

Robot Operating System (ROS)

ROS is not an operating system; it is an open-source software framework for robot software development. It provides hardware abstraction, device drivers, libraries of commonly-used functionalities, visualizers, message-passing, package management, and more [1, 2].

ROS was designed to be open source, so that users could choose the configuration of tools and libraries to get their software stacks to fit their robot and application area [1]. There is very little which is core to ROS, beyond the general structure within which programs must exist and communicate. In one sense, ROS is the underlying plumbing behind processes and message passing [1]. However, in reality, ROS is not only that plumbing, but also a rich and mature set of tools, a wide-ranging set of robot-agnostic abilities provided by packages, and a greater ecosystem of additions to ROS [1].

There are two versions of ROS available today (i.e. ROS 1 and ROS 2), each containing several distributions (e.g. ROS Noetic, ROS 2 humble, ROS 2 Iron). A comprehensive collection of repositories and packages across different ROS versions and distributions can be found in ROS Index [3].

ROS graph

ROS graph is a graph representation of all processes involved in the robot and the communication among them. ROS graph is a directed bipartite graph (a.k.a. bigraph), meaning that there are two disjoint sets of vertices in the graph: ROS nodes, and ROS topics. ROS nodes abstract hardware components such as sensors, actuators, controllers, processors, etc., that run ROS-based processes. ROS topics are the information communicated by the ROS nodes. Figure 1 shows an example of a ROS graph. A ROS node can publish to a topic (when the node is the source of the data) or subscribe to it (when the node is the destination of the data). The edges of the ROS graph represent the publish and subscribe actions.

Figure 1. An example of a ROS graph; ovals represent the ROS nodes (i.e. the processes) and rectangles represent the ROS topics (i.e. the data); the direction of the edges indicate the flow of data.

Adopting ROS

ROS is the most common framework for robot softwares; it is growing in popularity in robotics industry and it’s finding its way to other industries such as IoT (Internet of Things). However, if you are new to ROS, whether you are a robotics enthusiast, a college student, or a professional engineer, you might find it intimidating at first, and you might hesitate to adopt ROS for your robotic project. This skill barrier sometimes forces the beginner roboticist to give up on ROS altogether and opt out for non-standard options.

ROScribe is designed to help you overcome the skill barrier so that you can adopt ROS for your robotic project. ROScribe creates the ROS graph for your project, and implements all ROS nodes, and takes care of the ROS-specific installation and launch files. If you are working on a new robotic solution (e.g. a new mapping algorithm, a new clustering solution), you can focus on your own component and let ROScribe take care of everything else that is needed to be connected to your component to have a functioning robot. In other words, you can use ROScribe to create a blueprint for your software; ROScribe gives you a complete software that you can use as an initial draft for your project; then, you can replace parts of the generated code (e.g. one of the ROS nodes) with the specific code that you have manually developed for your component of interest.

We believe ROScribe helps beginners to better learn ROS and adopt it for their projects, while helping advanced users to quickly create a comprehensive blueprint of the robot software for their project.

Using LLMs to generate robot software

LLMs (Large Language Models) belong to a class of AI specialized in processing human languages. This raises the question: how well an AI made for human language can perform when dealing with programming languages such as Python and C? In [4], we discussed how LLMs fare in code generation for a general-purpose software. It is notable that the power of the LLM is mainly in understanding the software spec (which is written in human language) rather than in code generation itself. Please refer to [4] for a discussion on how LLMs can be used to generate a general-purpose software, and how computer programming paradigm could evolve in the presence of LLMs.

In ROScribe, we are not dealing with software generation in general, but rather, software generation in a special case of robotics. More specifically, we use LLMs to generate robot softwares within ROS framework.This limits the scope of the software design, and narrows down the LLM’s attention to a smaller task, resulting in lower likelihood of hallucination and higher quality outcome. Like humans, LLMs have limited attention span; having a smaller design space, when dealing with a more specific task of robot software, yields a better outcome compared to when dealing with a broader design space for a generic software design. Prompt engineering techniques, fine tuning, and priming could further help to limit the design space. In the next section, we will explain how ROScribe breaks down the given task to smaller pieces to implement the software through divide and conquer.

To improve the quality of the generated code, and increase the efficiency of the LLM, other methods such as RAG (Retrieval Augmented Generation) can be used. Having a well-structured scope for designing robot softwares in ROS makes it easier to effectively integrate RAG in ROScribe and further improve the quality of the generated robot software. The current version of ROScribe (v0.0.3) doesn’t use RAG, and therefore we save the RAG discussion for future articles. In the current version of ROScribe, the entire robot software is generated by the LLM.

How ROScribe works

The process of robot software generation in ROScribe can be explained in three steps:

  1. ROS graph synthesis
  2. ROS node synthesis
  3. ROS-specific installation scripts generation

ROS graph synthesis

First, ROScribe captures a high-level description of the robotic task from the user in the initial prompt, and asks the user to specify which version of ROS to use (i.e. ROS 1 or ROS 2).

Then, ROScribe asks a series of high-level questions about the overall design of the system. The questioning process continues until ROScribe gathers enough information for generating the ROS graph. The process of generating follow-up questions is illustrated in Figure 2. ROScribe uses the chat history of all previous questions and answers, as well as the first prompt about the task and ROS version, to generate a prompt that will be fed to the LLM to create the follow-up question. Sometimes the follow-up question is accompanied by a suggested answer, or viable options, to help guide the user to the right direction.

Our intention here is to keep the human in the driver’s seat and let him make all decisions on how the robotic system should be designed. ROScribe uses the LLM as a robotics expert who knows what questions to ask to get the necessary information from the human designer.

When there is no further question, the ROS graph can be generated. ROScribe uses the entire chat history, the initial prompt and ROS version, and some internal formatting instructions to generate a prompt and feed it to LLM to create the ROS graph (Figure 3). ROScribe drafts a visual representation of the ROS graph in RQT-style, and allows the user to modify it by adding or removing ROS nodes.

Figure 2. ROS graph synthesis; generating follow-up questions

Figure 3. ROS graph generation

ROS node synthesis

For every ROS node identified and finalized in the previous step, ROScribe captures the spec from the user; only when the spec is completely clear, it implements that node. The task of capturing the spec involves an elaborate process of prompt engineering that we call prompt synthesis. This is the heart of ROScribe where the LLM’s strong suit is used in processing conversations and generating questions all in natural language.

Figure 4 shows the process of prompt synthesis in which ROScribe uses a summary of the chat history plus the top-level information about the task, the ROS version, and the ROS graph to generate a prompt that will be fed to the LLM to create a follow-up question. This process will continue in a loop until the spec is clear and the user has provided the necessary details about the design.

Finally, when the spec is clear, and all the design details are resolved, ROScribe generates the code for that ROS node and moves on to the next node. Figure 5 illustrates the ROS node generation step.

Throughout the entire process of code generation, ROScribe keeps the user involved. The idea is to walk the user across the design space and shed light on different subjects that need clarification, and let the user make the decision on each subject. Ultimately, it is not the tool that invents the software; it is the user utilizing the tool who is in charge of the project.

Figure 4. ROS node specification using prompt synthesis

Figure 5. ROS node generation

ROS-specific installation scripts generation

At the final step of the software generation process, ROScribe creates the files needed for installation and launch of the generated ROS packages; the generated files include package.xml, CMakeLists.txt, and launchfile.launch (Figure 6).

Figure 6. ROS-specific installation files

Lessons we learned from ROScribe

The following remarks summarize the lessons we learned from development of ROScribe:

  • The strength of LLM-based software generation tools are in capturing the spec, and the spec cannot be captured efficiently in a single prompt.
  • A good prompt engineering is key to capture design details from user, and the LLM’s output is only as good as its prompts.
  • Human should remain in the driver’s seat and control the design process.
  • The LLM performs better when dealing with software generation in a special domain, as opposed to a general domain.

About ROScribe

We made ROScribe open source hoping that it would benefit robotics students and engineers who want to speed up the process of robot software generation. We encourage all of you to check out this tool, and give us feedback here, or by filing issues on our github. If you like ROScribe, or the ideas behind it, please star our repository to give it more recognition and let others know about it. We plan to keep maintaining and updating this tool, and we welcome all of you to participate in this open source project.

About RoboCoach

We are a small early-stage startup company based in San Diego, California. We are exploring the applications of LLMs in software generation in general, and in robot software generation specifically. You can learn more about our products in our github [5, 6].

References

[1] Robot Operating System - Wikipedia

[2] Documentation - ROS Wiki

[3] https://index.ros.org/

[4] Looking inside GPT-Synthesizer and the idea of LLM-based code generation - DEV Community

[5] GitHub - RoboCoachTechnologies/GPT-Synthesizer: Collaboratively build an entire code base for any project with the help of an AI

[6] GitHub - RoboCoachTechnologies/ROScribe: Translate natural language into robot software.

13 Likes

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.