[URDF-NG] Next-generation robot descriptions

I agree with @Nate_Koenig and @wjwwood that the “data exchange” format shouldn’t be a scripting language. As others have pointed out, you can then have libraries/tools/DSLs for generating that data exchange format, one of which could be python.

I think that the choice of language (XML/json/yaml/etc) is less important now than determining what information about the robot will be represented. Once we have an outline of what we want in the format, expressing that in XML or yaml or whatever should be a (mostly) easy step. Personally I like XML, but I don’t care much.

I’d like to suggest that we should actually be coming up with a set of robot description formats, each of which expresses a particular type of robot information. For example we could have formats for:

  1. kinematic description - what are the joints, their spatial relationships, and how do they move
  2. dynamic description - what are the dynamic properties (mass/inertia)
  3. visual description - how does the robot look? This might include textures and things that 99% of robot code doesn’t care about, but which is used by visualization tools like rviz.
  4. simulation parameters - it is sometimes necessary to set parameters that are specific to the way that physics are simulated.
  5. limits - are there max velocities for the joints? max positions that are safe?
  6. semantic information about joints and groups of joints - “this is a gripper” or “these joints make up an arm”
    . . .

My hope is that by breaking this down into sub-specs, we would be able to have smaller groups of people flesh out a spec for the part that they are deeply involved in. Some of these matter to almost everyone. Kinematics, for example, will be used by almost every piece of robot software. That means there will be more interested parties who want to have input about what goes into that spec, but it (hopefully) is also a simpler spec to come up with.

For any given robot, then, the description could include some subset of these. To create a simple robot that you can publish TF for, you might only need to use the elements from (1). Then if you want to make that robot simulate-able, you might need to define the dynamics elements, and add some simulation elements that give the physics engine hints.
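
As a rough illustration of that layering, here is a minimal Python sketch of a description assembled from optional sub-specs. Every name in it (`Kinematics`, `Dynamics`, `RobotDescription`, `extras`, `solver_iterations`) is hypothetical and not part of any existing URDF tooling; it only shows how a TF-only robot and a simulate-able robot could share one container.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Kinematics:
    # Hypothetical sub-spec (1): joint name -> (parent link, child link)
    joints: Dict[str, Tuple[str, str]]


@dataclass
class Dynamics:
    # Hypothetical sub-spec (2): link name -> mass in kg
    masses: Dict[str, float]


@dataclass
class RobotDescription:
    name: str
    kinematics: Kinematics                 # assumed mandatory in this sketch
    dynamics: Optional[Dynamics] = None    # only needed for simulation
    # "Proposed" sub-specs that are not yet accepted, keyed by sub-spec name
    extras: Dict[str, dict] = field(default_factory=dict)

    def sub_specs(self) -> List[str]:
        """List the names of the sub-specs this description actually uses."""
        present = ["kinematics"]
        if self.dynamics is not None:
            present.append("dynamics")
        present.extend(sorted(self.extras))
        return present


# Enough to publish TF: kinematics only.
tf_only = RobotDescription("arm", Kinematics({"shoulder": ("base", "upper_arm")}))

# The same robot made simulate-able by adding dynamics and simulation hints.
simulatable = RobotDescription(
    "arm",
    Kinematics({"shoulder": ("base", "upper_arm")}),
    dynamics=Dynamics({"base": 2.0, "upper_arm": 0.8}),
    extras={"simulation": {"solver_iterations": 50}},
)
```

A consumer that only cares about TF could ignore everything except `kinematics`, while a simulator could refuse a description whose `sub_specs()` lacks `dynamics`.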

The overall spec would then be a list of “accepted sub-specs”. When specs have been proposed but not fully stabilized and accepted, they could be “proposed sub-specs”. This would give smaller groups within the robotics community a chance to come up with a new format that helps them describe their needs, test it out, and as it gets stabilized and finds broader use it could be merged in as a sub-spec.

As an example, let’s say that roboticists working on soft robots find that they need a way to describe the physical characteristics of the links of their robots in particular ways. They could create a sub-spec for defining physical properties of materials, start using that in some of their custom tools, and over time it could be accepted as an official sub-spec. I think this approach is better than putting all elements for all robot use cases into one spec.

Yes, of course it’s possible to use URDF and/or ROS without python.

Honestly, when was the last time you did that? How many users or deployments do you know of that are loading nodes without the python tools roslaunch and friends?

‘considerably reduce’ is a bit vague. What amount of processing time, energy, or memory is painful for you? How does that amount compare to your mission time, battery capacity, or total memory?

For myself, by the time I’m working with a system so constrained that the difference matters (a microcontroller), I’m using a binary image rather than parsing anything, text or code.

For a next gen format, I think it’s a really interesting question of how to balance the needs of the machine with the needs of the user.

With a text file, especially a purely declarative one, as the complexity of the model increases, we have to bring in tools to help us visualize the current state, and tools to help generate the file (templating, xacro, whatever).

Meanwhile every process using the file has to parse it.

So it raises the question for me, is there value in a format that favors the machine? If we will be using tools for generation and visualization anyway, why not store the data in shared object form? For a large subset of users, loading a shared object file is trivial. For the rest, a port of libelf might be tractable and of comparable effort to parsing xml.

Someone somewhere mentioned reading URDF with Javascript being desirable. I can’t imagine many use cases for client side only URDF reading. Anyone have any? Perhaps a presentation or visualization tool loaded as a local html file rather than across a server?

Thoughts on a standard binary (instead of text) format?

Good point about the format not needing to be text if it is primarily for machine transmission/consumption. ELF would be an interesting way to do it - is ELF commonly used for serialization? ROS runs on networks and robots run different CPUs, so whatever format is used needs to be architecture independent.

There are lots of binary formats though - for example ROS messages! Also things like BSON: http://bsonspec.org/ (just naming two; there are tons).
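
To make the architecture-independence requirement concrete, here is a small Python sketch using the standard `struct` module. The record layout (a joint id plus lower and upper limits) is invented for illustration and is not taken from ROS messages or BSON; the point is only that fixing the byte order and field sizes in the format makes the bytes identical on every CPU.

```python
import struct

# Hypothetical record: joint id (uint16), lower limit and upper limit (float64).
# The "<" prefix selects little-endian, standard sizes, and no padding,
# so the encoding does not depend on the host architecture.
JOINT_LIMIT = struct.Struct("<Hdd")


def pack_limit(joint_id, lower, upper):
    """Encode one joint-limit record into a fixed-size byte string."""
    return JOINT_LIMIT.pack(joint_id, lower, upper)


def unpack_limit(data):
    """Decode a record produced by pack_limit back into a tuple."""
    return JOINT_LIMIT.unpack(data)


blob = pack_limit(3, -1.5, 1.5)
record = unpack_limit(blob)
```

Real formats such as BSON or ROS messages add field names, versioning, and variable-length data on top of this idea.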

When I suggested going python only, I wasn’t thinking of applications other than robot_state_publisher. I think it makes perfect sense to have a non-interpreted exchange format, in addition to having an interpreted or templated (not sure which term is most correct) format to generate the exchange format.

I mentioned ROS IPC because it is the most common binary exchange format in ROS systems :). But it may not be suited for all the different use cases for a robot description.

If the exchange format is not to be seen by humans, going with a binary format makes a lot of sense. It makes parsing easier and you don’t have to deal with human things like whitespace and capitalization. (HTTP/2 made a similar decision.)

BSON and Protobufs are binary formats that come to mind when I think about multi-arch and multi-language support.

@jon
ELF is architecture dependent, though there is a FatELF that can wrap multiple architecture specific ELFs. It would be a stretch to use it if one were storing and sharing binary data only.

There is a big difference between using a python tool and having to embed a python interpreter in your own program. For starters the tool is completely optional, and there are actually cases where I didn’t use them. Additionally, having an interpreted description format introduces the risk of malicious robot descriptions abusing exploits in the interpreter or surrounding process. It would also force the use of a specific scripting language while with current URDF you can use whichever scripting language or template engine you want.

I’m confused by this train of thought. To me, shared object form or any other binary format is “favouring the machine”. Binary formats are generally more compact, easier to parse and arguably easier to generate from code. They are also very difficult to read by humans, let alone edit.

The vast majority of users will not be parsing the description themselves, they will be using libraries or tools to do that. I think those users’ needs should be considered most important when selecting a format, and to me that says a human readable, human editable text based format.

However, I also agree with @jon that the most important question first is “What should be in the robot description(s)?” and not what format(s) will be used to store it. The latter is a technical detail (an important one, but nevertheless).

Embedding interpreters is not that bad, comparable to adding a parser for XML, depending on the interpreter. I spent a little while above convincing myself that declarative text is better, but when I look at the total ecosystem, the total workflow, I’m not sure. Meanwhile many would never have to embed, as they are just writing descriptions and letting robot_state_publisher and friends do the work.

The security aspect is very interesting. Probably especially important as something like ROS Industrial gains traction. Depending on context, sandboxing the interpreter would make sense. Same as with xacro. Signed config files (regardless of format) will become more common I would expect.

Yes, it would force a language, just like XML, and to be honest, I don’t really love any of the candidates. But python is the scripting language of choice for ROS, and very lightweight python engines exist, so that would be a strong contender.

I was observing that since the complexity of a URDF is often enough that people will use a language (xacro is the common one now I guess) or a GUI editor, maybe there is no real benefit to the text file. No text file might simplify all the readers. Just thinking outside the box here. (ELF is not an idea that will survive, but it does intrigue me.)

The ultimate goal is to have a definition that everyone knows how to read and agrees what the meaning of the sub-bits is. That’s what the URDF is. So when trying to design a format, we take a step back and look at all the producers and all the consumers and try to see where the overlap is the densest. Maybe it’s a text file, maybe it’s a text program, maybe it’s a binary file, maybe it’s a memory image.

Almost everyone consuming the description will parse it and map it into a data structure. For many of the consumers, the same data structure will work fine. A GUI editor would probably work directly with that same data structure. A very simple console editor could work with that same structure. And anyone wanting to create the structure without the GUI editor could certainly throw together a shell script using the console editor.

There are pros and cons. Equivalents in common use are the windows registry, gnome configuration, and there are certainly frustrations when one can’t just use a text editor.

Yes, I hope the other thread picks that up.

That depends. TinyXML for example is not at all comparable to an interpreter. It only supports a (sane) subset of XML which avoids the complexity and dangers of DTD parsing and remote entity lookups. That makes it a lot more lightweight than any script interpreter that I know of. If the description format limits itself to this subset of XML, it is not much different from other text formats.

It’s getting more and more complex already, when all we wanted to do is read a robot description. This is exactly why I would stay away from having an interpreted robot description format. In my view all of this would be unnecessary complexity compared to a non-interpreted data format, considering you can always use any scripting language you like to generate such a non-interpreted robot description.

I’m not aware of lightweight python engines. Do you have more information on this (a link perhaps)?

Being tied to a specific scripting language is not the same as being tied to a declarative format in my opinion. Not every language you want to work in has a python interpreter, and the same is true for every other scripting language out there. However, XML, JSON and YAML parsers are almost universally available. Even custom declarative formats are easy to parse in almost every scripting language.
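
As a sketch of how little is needed on the consumer side, the following Python snippet reads a tiny URDF-style document with the standard library’s `xml.etree.ElementTree` and maps it into plain dictionaries. The robot (`two_link`) and the helper `load_description` are made up for illustration, and the snippet only touches the `robot`, `link`, `joint`, `parent`, and `child` elements; it is not a replacement for urdfdom.

```python
import xml.etree.ElementTree as ET

# A made-up two-link robot using only basic URDF elements.
URDF_TEXT = """
<robot name="two_link">
  <link name="base"/>
  <link name="arm"/>
  <joint name="shoulder" type="revolute">
    <parent link="base"/>
    <child link="arm"/>
  </joint>
</robot>
"""


def load_description(text):
    """Parse the XML text and map it into a plain dictionary."""
    root = ET.fromstring(text.strip())
    return {
        "name": root.get("name"),
        "links": [link.get("name") for link in root.findall("link")],
        "joints": {
            joint.get("name"): {
                "type": joint.get("type"),
                "parent": joint.find("parent").get("link"),
                "child": joint.find("child").get("link"),
            }
            for joint in root.findall("joint")
        },
    }


description = load_description(URDF_TEXT)
```

The same mapping could be written against a JSON or YAML parser with comparable effort, which is the portability argument being made here.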

It is true that you would normally still be tied to whichever declarative format is used then (possibly XML), but with the right tools you won’t even have to know what output format you are generating. In fact, you can still embed an interpreter to generate the required robot description as the wanted internal data structure without going through the data format. The reverse is not true: without a declarative data format you cannot skip the interpreter (not without coming up with your own declarative format anyway).

I believe your key point is that having only a declarative format is too restricting and leads to an unmanageable mess in the face of complex robot systems. I agree with that. However, I also think that there are two separate problems here. The first is to have a format that can be used to store robot descriptions that can be written and loaded from different tools. The second is to provide users with a method of keeping their robot descriptions readable and maintainable. I think that solving both problems in the description format is not worth the added complexity that would introduce in the parser.

One more observation: An interpreted format would be turned into a different data structure by the interpreter. That data structure represents the same robot description but with any programming constructs already processed. Simply serializing that data structure already gives a declarative data format, so why force the trip through a script interpreter?
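
A minimal Python sketch of that observation: a script uses a loop to build the description, and serializing the resulting structure yields a purely declarative document that any consumer can load without running the script. The function `build_arm` and the dictionary layout are hypothetical, and JSON stands in for whatever exchange format would be chosen.

```python
import json


def build_arm(num_segments):
    """The 'interpreted' side: programming constructs generate the model."""
    links = ["base"] + [f"segment_{i}" for i in range(num_segments)]
    joints = [
        {"name": f"joint_{i}", "parent": links[i], "child": links[i + 1]}
        for i in range(num_segments)
    ]
    return {"name": "generated_arm", "links": links, "joints": joints}


# Serializing the already-evaluated structure gives a declarative document.
declarative_text = json.dumps(build_arm(3), indent=2, sort_keys=True)

# A consumer needs only a JSON parser, not the generating script.
reloaded = json.loads(declarative_text)
```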

I have not measured in-memory footprints, but disk footprints of .so files are (approximately) 90k tinyxml, 200k lua, 200k javascript (jerryscript reported), 300k micropython (unix system embedded form), 3M python2.7. Perhaps the important comparison is to system resources? What is the most resource constrained system you’ve seen or heard of urdfdom being run on?

This is why we have to step back and look at the whole ecosystem. The non-interpreted data format (for non-text information) almost never exists in a vacuum without the machine component. There are a few formats used for human-human communication (BNF is a good example, as are regular expressions). The rest involve template/program/GUI generators on the front end and parser->internal-model code on the back end.

Yes. My argument in favor of a configuration language is that it solves some of the problems in a declarative format very cleanly and simply. Some complexities around multiple configurations (if this, else that) and permutations of options disappear. It also gives an easy route to callback function scripting, adding finite state machines, and other such things (strictly within the boundary of describing a system).

Note that a configuration language is no more difficult to generate text for than a declarative format. If the language were Lua and someone wanted to generate it from python, the effort would be equal to generating XML from python. The same holds if the configuration language were C++ (Cling is a bit heavy but it would make handling the C/C++ parts of the ecosystem, which are large, very easy).

I mostly agree. (it’s the VM/interpreter that’s the extra complexity, not the parsing). But then we have to look at how many, and where, interpreters would need to be written.

I wonder how many unique consumers of URDF(+ friends) are out there? URDFDOM, etc.

This is the argument that for me supports a binary format.

Human editable formats (text) add a couple burdens. The format and data have to be validated as correct by the parser and the code mapping the serialized data to the final model data structure.
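
To illustrate the kind of checking this implies, here is a small Python sketch of a validation pass over an already-parsed description. The dictionary layout and the `validate` function are assumptions made for this example; a real consumer would check much more (types, units, tree structure).

```python
def validate(description):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    links = description.get("links", [])
    if len(set(links)) != len(links):
        problems.append("duplicate link names")
    known = set(links)
    for joint in description.get("joints", []):
        # Every joint must reference links that were actually declared.
        for role in ("parent", "child"):
            target = joint.get(role)
            if target not in known:
                problems.append(
                    f"joint {joint.get('name')!r}: unknown {role} link {target!r}"
                )
    return problems


good = {
    "links": ["base", "arm"],
    "joints": [{"name": "shoulder", "parent": "base", "child": "arm"}],
}
# A hand-editing mistake: the 'arm' link was never declared.
bad = {
    "links": ["base"],
    "joints": [{"name": "shoulder", "parent": "base", "child": "arm"}],
}
```

Referential checks like the one above apply to any source of data; what hand editing adds is the syntax and typo handling that has to happen before this stage.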

Nonetheless the expressiveness and readability of a configuration language are so much better than those of static formats that it really makes a strong case for standardizing around the data model of a configuration language’s VM rather than a static text format.

@de-vri-es Does URDF/SDF present any problems for you? If so, where does it fall short?