I’m familiar with your work :+1: I’m collaborating with Lan B. Dang to bring this functionality to OSR, and I think joining forces would be fantastic! :muscle:

ROSA offers several advantages over manual processes:

  • First, the agent can select from a wide range of tools to answer a query. It can invoke multiple tools for a single query, even executing them in parallel. The responses are then synthesized into a final result, or the agent continues with additional tools if the query hasn’t been fully addressed. This is made possible by its “reasoning-action-observation” loop.

  • You can also integrate custom tools for tasks beyond information retrieval. For example, we’ve developed a custom ROSA-Spot agent that can sit, stand, and even walk around. Essentially, we’ve given ROSA the same control capabilities you’d get with an Xbox controller (with plenty of safety measures in place to mitigate the risk of unwanted behavior). You can instruct it to “stand up, walk forward and to the left about 3 meters, then sit down,” and it will follow through. (short demo video attached to this post)

  • Additionally, ROSA can produce semi-structured output from fuzzy templates. For instance, the following system report was generated primarily by feeding diagnostics, environment variables, and a depth image into the model, along with system prompts and a very generic system report template.
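To make the custom-tool idea from the second bullet concrete, here is a minimal sketch of what such a control tool might look like. This is an illustration, not ROSA’s actual code: the function name `walk_forward` and the 5-meter safety limit are invented, and in practice the function would be registered with the agent via a tool decorator and would publish a real locomotion command.

```python
def walk_forward(distance_m: float) -> str:
    """Command the robot to walk forward by `distance_m` meters."""
    MAX_DISTANCE_M = 5.0  # assumed safety limit, not ROSA's actual value
    if not 0.0 < distance_m <= MAX_DISTANCE_M:
        # Guardrail: refuse implausible or unsafe commands outright.
        return f"Refused: distance must be in (0, {MAX_DISTANCE_M}] meters."
    # In a real agent this is where the locomotion command would be
    # published to the robot; here we just report success.
    return f"Walked forward {distance_m:.1f} meters."
```

Because the tool returns a plain-language result, the agent can fold it back into its reasoning loop and decide whether further actions are needed.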


(btw, I didn’t have to parse the response to create this report; the model’s output is rendered directly as Markdown)

As you can see, ROSA is also multi-modal, capable of capturing depth images from a RealSense camera, describing what it observes, and even estimating distances.
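For a rough idea of how distance estimation from a depth image can work, here is a minimal, hypothetical sketch. The helper name and the central-patch approach are my own illustration; real code would read frames via the RealSense SDK and use the device’s reported depth scale rather than the assumed default below.

```python
import numpy as np

def center_distance_m(depth_image: np.ndarray, depth_scale: float = 0.001) -> float:
    """Estimate the distance to whatever is at the image center.

    depth_image: 2-D array of raw depth units (e.g. uint16 from a depth
    sensor); depth_scale converts raw units to meters (0.001 is a common
    RealSense default, but it is device-specific).
    """
    h, w = depth_image.shape
    # Median over a small central patch is more robust than a single pixel.
    patch = depth_image[h // 2 - 2 : h // 2 + 3, w // 2 - 2 : w // 2 + 3]
    valid = patch[patch > 0]  # zero means "no depth reading"
    if valid.size == 0:
        return float("nan")
    return float(np.median(valid)) * depth_scale
```

An estimate like this can then be handed to the model as plain text alongside the image, which is what lets the agent answer “how far away is that?” style questions.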

These are just a few examples of the more complex agents we’ve developed using ROSA. There’s so much more that can be done, and we’re just getting started :blush:
