Design By Contract

What do you think about adding “Design By Contract” functionality to ROS?

“Design By Contract is an approach for designing software. It prescribes that software designers should define formal, precise and verifiable interface specifications for software components, which extend the ordinary definition of abstract data types with preconditions, postconditions and invariants.” Design By Contract (wikipedia.org)

“Design By Contract (DbC)” can dramatically decrease the effort during software integration and dramatically increase the overall system reliability.

“Design By Contract (DbC)” is a built-in feature of many recent programming languages. Usually the verification of these interface specifications is supported on different levels of abstraction at runtime during “debug” builds. For example, D (contracts) supports runtime checking of contracts on the interface, the class and the (member) function level:

On the class level the caller of a member function ensures defined preconditions. When the preconditions are fulfilled, the member function guarantees proper functioning and defined postconditions. Invariant checks ensure that an object remains in a valid state during runtime. The validity of the internal state (e.g. class members) is checked after the execution of the constructor, before the execution of the destructor, and before and after the execution of a public member function. [Cehreli, Ali: Programming in D, IngramSpark, 1st edition, 2017, p. 218 and 386 or somewhere in the online version]
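To make the class-level checks described above concrete, here is a minimal sketch in Python rather than D. The names `ContractViolation`, `checked` and `BoundedCounter` are made up for illustration, and the destructor check that D performs is left out.

```python
import functools


class ContractViolation(Exception):
    """Raised when a precondition, postcondition or invariant fails."""


def checked(method):
    """Check the object's invariant before and after a public method."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        self._check_invariant()
        result = method(self, *args, **kwargs)
        self._check_invariant()
        return result
    return wrapper


class BoundedCounter:
    """Hypothetical example class: a counter that must stay in [0, limit]."""

    def __init__(self, limit):
        self.limit = limit
        self.value = 0
        self._check_invariant()  # the invariant must hold after construction

    def _check_invariant(self):
        if not 0 <= self.value <= self.limit:
            raise ContractViolation("invariant: 0 <= value <= limit")

    @checked
    def add(self, amount):
        # Precondition: the caller must pass an amount that fits.
        if amount < 0 or self.value + amount > self.limit:
            raise ContractViolation("precondition: amount must fit the remaining capacity")
        old = self.value
        self.value += amount
        # Postcondition: the callee guarantees the new state.
        if self.value != old + amount:
            raise ContractViolation("postcondition: value == old + amount")
        return self.value
```

A call such as `BoundedCounter(10).add(4)` passes all three kinds of checks, while `add(11)` is rejected by the precondition before the state is touched.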

In ROS one could think of node level contracts w.r.t. … (impact of DbC on timing):

  • if node is topic publisher (postcondition checks)
    • (“guaranteed” topic publish rate)
    • “guaranteed” topic message type values
  • if node is topic subscriber (precondition checks)
    • (“expected” topic reception rate)
    • “expected” topic message type values
  • if node is service server
    • (“guaranteed” service response time) (postcondition check)
    • “expected” service request message type values (precondition check)
    • “guaranteed” service response message type values (postcondition check)
  • if node is service client
    • “guaranteed” service request message type values (postcondition check)
    • “expected” service response message type values (precondition check)
  • if node is action server
    • (“guaranteed” feedback transmission delay after goal request has been received) (postcondition check)
    • (“guaranteed” result transmission delay after goal request has been received) (postcondition check)
    • “expected” action goal message type values (precondition check)
    • “guaranteed” action feedback message type values (postcondition check)
    • “guaranteed” action result message type values (postcondition check)
  • if node is action client
    • (“expected” feedback message delay after goal request has been transmitted) (precondition check)
    • (“expected” result message delay after goal request has been transmitted) (precondition check)
    • “guaranteed” action goal message type values (postcondition check)
    • “expected” action feedback message type values (precondition check)
    • “expected” action result message type values (precondition check)

Checks w.r.t. timing could be heavily impacted by the processing overhead of checks for complex message types. However, in some circumstances (comparably low processing overhead due to checks) they could at least give a rough estimate of “dynamic” node dependencies.

My two cents.
It sounds like a good idea, if a missing requirement generates a warning rather than an assertion/exception.

Some of the things you described seem to be related more to Quality of Service than to Design by Contract, but I might be wrong.

The more I learn modern C++, the more I realize that the best “contracts” can usually be implemented using “strong types”.
I am in favour of using something similar to boost::units in ROS messages, some kind of metadata attached to the topic itself that is NOT transmitted every time and is immutable.

But I am aware that this is another topic…
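As a rough illustration of the “strong types” idea, the sketch below gives each unit its own type so that mixing units fails instead of silently producing a wrong number. `Meters` and `Feet` are made-up names. Unlike boost::units, plain Python only catches the mistake at run time (a static type checker could flag it earlier), and the sketch does not cover the idea of attaching metadata to a topic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meters:
    value: float

    def __add__(self, other):
        # Only lengths in the same unit may be added.
        if not isinstance(other, Meters):
            return NotImplemented
        return Meters(self.value + other.value)


@dataclass(frozen=True)
class Feet:
    value: float

    def to_meters(self):
        # Explicit conversion is the only way to combine the two units.
        return Meters(self.value * 0.3048)
```

With these types, `Meters(1.0) + Feet(1.0)` raises a `TypeError`, whereas `Meters(1.0) + Feet(1.0).to_meters()` is accepted.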

I have often put some thought into providing such an idea over the years, but I’ve never come close to putting in enough effort to actually do it. :slight_smile: I’d love to see such work move forward so I’m happy to see someone putting in the work!

Having said that, I think that it would be a major effort to achieve for ROS 1, but quite doable for ROS 2 because many of the things you want to establish as the “contract” for a data-flow-based software component can be implemented using the DDS QoS policies. It would need some thought put into how and where to specify the contracts and then how to translate that into QoS settings. On from that things get more complicated. DDS provides facilities to be informed when a QoS policy has been violated, but how to respond to that is probably going to be very application specific. (In classical Eiffel-style DbC, the contracts are essentially asserts that raise exceptions when violated at run time.)

This can be done using the DDS DEADLINE QoS policy.
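The idea behind a deadline check can be sketched without any middleware. The class below is a library-free stand-in, not the DDS API: `DeadlineMonitor` and its parameters are hypothetical, timestamps are passed in by hand, and a miss is only noticed when the next sample arrives, whereas a real DEADLINE implementation would typically use a timer.

```python
class DeadlineMonitor:
    """Library-free stand-in for a DEADLINE-style check (not the DDS API)."""

    def __init__(self, period_s, on_missed):
        self.period_s = period_s    # maximum allowed gap between samples
        self.on_missed = on_missed  # callback invoked with the observed gap
        self.last_stamp = None

    def sample_received(self, stamp_s):
        """Record a sample time; report a miss if the gap exceeds the period."""
        if self.last_stamp is not None:
            gap = stamp_s - self.last_stamp
            if gap > self.period_s:
                self.on_missed(gap)
        self.last_stamp = stamp_s


missed = []
monitor = DeadlineMonitor(period_s=0.1, on_missed=missed.append)
for stamp in (0.0, 0.05, 0.10, 0.35):
    monitor.sample_received(stamp)
# Only the last gap (about 0.25 s) exceeds the 0.1 s period.
```

What the callback does (log, warn, stop the robot) is left open here, which matches the point that the response to a violation is application specific.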

This could be done in the message IDL and an assert in the topic publish API. There has been occasional talk in ROS 2 discussions of adding allowable ranges to the IDL, but I don’t think it’s gone anywhere. However if it is going to be node specific it would probably be better done purely in the topic publish API, with a node declaring when it sets up the publisher what allowable ranges the message can hold. This would be fine for run-time checks, but it would make it a lot harder to check contracts statically than having them in the IDL. I think that the need for a unique message definition for every node would be too prohibitive to put contracts in there, though. Other than perhaps things that everyone agrees are sensible for that particular message.
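A node-specific range check in the publish path might look roughly like the sketch below. `CheckedPublisher` and `RangeViolation` are hypothetical names, not part of any ROS API, and the message is faked with `SimpleNamespace` so the example runs without ROS installed.

```python
from types import SimpleNamespace


class RangeViolation(ValueError):
    """Raised when a published field is outside its declared range."""


class CheckedPublisher:
    """Hypothetical wrapper: checks declared field ranges before publishing."""

    def __init__(self, publish_fn, ranges):
        self._publish_fn = publish_fn  # e.g. the real publisher's publish method
        self._ranges = ranges          # {field name: (minimum, maximum)}

    def publish(self, msg):
        for field, (low, high) in self._ranges.items():
            value = getattr(msg, field)
            if not low <= value <= high:
                raise RangeViolation(f"{field}={value} outside [{low}, {high}]")
        self._publish_fn(msg)


sent = []
pub = CheckedPublisher(sent.append, {"speed": (0.0, 2.0)})
pub.publish(SimpleNamespace(speed=1.5))  # within range, so it is forwarded
```

Because the ranges live in the node's code rather than in the message definition, the generic message type stays reusable, at the cost of making static checking harder, as noted above.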

This can also be done using the DDS DEADLINE QoS policy.

Same comment as for the publishing side above.

DEADLINE QoS policy again.

Now we’re getting close to Eiffel-style DbC. These can be done now by putting asserts in your service callback, but what you really want is a way to notify the caller that there was a problem fulfilling the service call due to a contract violation. This would probably require extending the way services are implemented.
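One way to picture “notifying the caller” is a handler wrapper that turns contract violations into an error response instead of an assert in the server. This is only a sketch: `Response`, `with_contract` and `sqrt_service` are invented for illustration and do not correspond to how ROS services are implemented.

```python
from dataclasses import dataclass


@dataclass
class Response:
    ok: bool
    value: float = 0.0
    error: str = ""


def with_contract(pre, post):
    """Wrap a service handler so violations are returned, not raised."""
    def decorate(handler):
        def wrapped(request):
            if not pre(request):
                return Response(ok=False, error="precondition violated by caller")
            response = handler(request)
            if not post(request, response):
                return Response(ok=False, error="postcondition violated by server")
            return response
        return wrapped
    return decorate


@with_contract(pre=lambda req: req >= 0.0,
               post=lambda req, res: abs(res.value * res.value - req) < 1e-6)
def sqrt_service(request):
    # Hypothetical service body: the contract wrapper guards it on both sides.
    return Response(ok=True, value=request ** 0.5)
```

The caller can then distinguish “I sent a bad request” from “the server broke its promise” by inspecting the `error` field.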

The RPC over DDS specification, which was finally published in April this year and hopefully the OSRF’s DDS vendors will rapidly support, does not provide any QoS policies specific to services (only ways to specify existing QoS policies on a per-interface level). Therefore ROS2 would need to decide for itself how to deal with contract violations.

Same as above.

Again same as above.

So the things that you are requesting are doable in ROS 2 using a combination of the DEADLINE QoS policy and adding some features to the API for specifying pre- and postconditions. I do not think it would be a huge level of work, but there would need to be a focus on how the API will work to make it clear what is happening, and making sure that the performance overhead is both minimal and zero-able (i.e. all checks can be turned off).
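The “zero-able” requirement can be sketched as a decorator that hands back the original function untouched when checks are disabled, so a disabled check costs nothing per call. The environment variable `DBC_CHECKS` and the helper `require` are assumptions made up for this example.

```python
import os

# Hypothetical switch; the variable name is invented for this sketch.
CONTRACTS_ENABLED = os.environ.get("DBC_CHECKS", "1") != "0"


def require(condition_fn, message, enabled=None):
    """Decorator factory: check a precondition unless contracts are disabled."""
    active = CONTRACTS_ENABLED if enabled is None else enabled

    def decorate(func):
        if not active:
            return func  # checks off: the original function is returned as-is

        def wrapped(*args, **kwargs):
            if not condition_fn(*args, **kwargs):
                raise ValueError(message)
            return func(*args, **kwargs)
        return wrapped
    return decorate


def double(x):
    return x * 2


checked_double = require(lambda x: x >= 0, "x must be >= 0", enabled=True)(double)
unchecked_double = require(lambda x: x >= 0, "x must be >= 0", enabled=False)(double)
```

Here `unchecked_double` is literally the same object as `double`, while `checked_double(-1)` raises a `ValueError`.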

Design by contract encompasses more than can be specified using strong types. It includes the behaviour of the function called. Even in a language where functions strictly have no side effect, there are parts of this that cannot be expressed by ensuring the types are correct, unless every function takes and returns types unique to that function that exactly define its input and output spaces. For functions that do have side effects (such as class methods) and situations where the side effects are the whole point (many service calls and many data-flow-based nodes), types will not cover everything you want to check.

I think that this would be a useful feature. Adding units to data is a powerful way to catch a common class of potentially fatal errors, and it can be done in a way that has minimal impact on performance.

Re-posting here because replying from email breaks…

IMHO there are many, many known ways to improve software quality in general, and ROS in particular, but also many limiting factors, mostly the lack of resources (including everyone’s motivation and time).

So I think the focus should be on applying the software development methodologies that are most likely to bring big benefits with relatively little investment, picking from the list of already proven existing software systems in other areas.

The contracts as described seem to be a “weak” version of a specification + model checker (check TLA+) that could also be integrated with a ROS system, but the effort required from potential users is probably prohibitive…

Before doing contracts, I would first focus on proper static (since message structure is static), strong typing. Despite the default weak/dynamic typing of the supported languages, it is doable using existing libraries for C++ and Python, and even Lisps.

My personal top two wishes are :

  • static & strong typing for ROS message fields (typing helps, and even more when things are distributed). It’s a first step… later we could do much more, like adding an external dynamic typechecker that can check communication during execution (for types, contracts or stronger formal systems)

  • erlang VM integration (ROS messages as a port, communicating with fast number-crunching C++ code, and able to use the erlang VM for all the distribution concerns), especially for ROS1.

I have often put some thought into providing such an idea over the years, but I’ve never come close to putting in enough effort to actually do it. :slight_smile: I’d love to see such work move forward so I’m happy to see someone putting in the work!

I have my roots in the domain of embedded software development. Most languages used there, like C/C++, lack built-in support for “Design By Contract” (one exception is e.g. Ada) and try to compensate for that, e.g. with coding standards suggesting to define the interfaces in a “contract”-like manner as part of the built-in code documentation. But in comparison with language built-in support all these measures are very weak. To get to the point… this topic has been keeping me busy for quite a while :slight_smile:.

Having said that, I think that it would be a major effort to achieve for ROS 1, but quite doable for ROS 2 because many of the things you want to establish as the “contract” for a data-flow-based software component can be implemented using the DDS QoS policies. It would need some thought put into how and where to specify the contracts and then how to translate that into QoS settings. On from that things get more complicated.

ROS is a new technology for me. But after reading a bit about the ROS2 design I thought the proposal for “Design By Contract” would fit better into the ROS2 design GitHub repo than into ROS1 (and discourse.ros.org) in the first place. Unfortunately I do not know enough about ROS2 to implement something reasonable on my own yet. However I am very interested in contributing code if it is ensured that it is no waste of time. (Means contributing code w.r.t. some reasonable up-front design.)

This can be done using the DDS DEADLINE QoS policy.

I will dive deeper into DDS and QoS policies in the near future.

DDS provides facilities to be informed when a QoS policy has been violated, but how to respond to that is probably going to be very application specific. (In classical Eiffel-style DbC, the contracts are essentially asserts that raise exceptions when violated at run time.)

(In D one can decide whether to use “assert”, “static assert” or “enforce” checks. “assert” checks throw an AssertError at runtime, which is an Error rather than an Exception. “static assert” checks are evaluated at compile time instead. Both are usually enabled during debugging only.
“enforce” checks throw exceptions at runtime which can be handled and are most suitable for public interfaces.)

This could be done in the message IDL and an assert in the topic publish API. There has been occasional talk in ROS 2 discussions of adding allowable ranges to the IDL, but I don’t think it’s gone anywhere. However if it is going to be node specific it would probably be better done purely in the topic publish API, with a node declaring when it sets up the publisher what allowable ranges the message can hold. This would be fine for run-time checks, but it would make it a lot harder to check contracts statically than having them in the IDL. I think that the need for a unique message definition for every node would be too prohibitive to put contracts in there, though. Other than perhaps things that everyone agrees are sensible for that particular message.

Unfortunately I do not know much about IDL either right now. From a conceptual point of view, the possibility of static checking against a single source of specification should be favored. Considering only the “static” behaviour of the node interface, the IDL seems suitable to me. What exactly do you mean by “if it is going to be node specific”? The case of exotic node topic message types?

Will “dynamic” aspects like the “guaranteed” publish rate of topics be defined in the IDL as well? If so, then the IDL could be a suitable place for this kind of contract information as well. However, if one thinks further about situations where the “guaranteed” publish rate is not constant but depends on the node state during runtime (in general, if the behavior of the node depends on its state during runtime), the IDL no longer seems that suitable.
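A state-dependent rate contract of this kind could be written as a small table plus a check, as in the sketch below. The state names, the rates and the function `rate_ok` are made up for illustration.

```python
# Hypothetical node states and the minimum publish rate promised in each (Hz).
MIN_RATE_HZ = {"idle": 1.0, "tracking": 30.0}


def rate_ok(state, stamps_s):
    """Return True if consecutive timestamps satisfy the rate for this state."""
    max_gap = 1.0 / MIN_RATE_HZ[state]
    gaps = (later - earlier for earlier, later in zip(stamps_s, stamps_s[1:]))
    return all(gap <= max_gap for gap in gaps)
```

The same sequence of timestamps can satisfy the contract in one state and violate it in another, which is the part a static message definition cannot express.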

Now we’re getting close to Eiffel-style DbC. These can be done now by putting asserts in your service callback, but what you really want is a way to notify the caller that there was a problem fulfilling the service call due to a contract violation. This would probably require extending the way services are implemented.

In case a “strict” contract check (analogous to D “assert” checks throwing errors) fails, the programmer should be notified somehow (e.g. via a logger). In case a “weak” contract check fails (analogous to D “enforce” checks throwing exceptions), the caller should be notified directly.

The RPC over DDS specification, which was finally published in April this year and hopefully the OSRF’s DDS vendors will rapidly support, does not provide any QoS policies specific to services (only ways to specify existing QoS policies on a per-interface level). Therefore ROS2 would need to decide for itself how to deal with contract violations.

I will dive deeper into DDS and QoS policies in the near future.

So the things that you are requesting are doable in ROS 2 using a combination of the DEADLINE QoS policy and adding some features to the API for specifying pre- and postconditions. I do not think it would be a huge level of work, but there would need to be a focus on how the API will work to make it clear what is happening, and making sure that the performance overhead is both minimal and zero-able (i.e. all checks can be turned off).

I consider the possibility to “disable” the performance overhead essential as well. Whether the overhead can be kept minimal would depend to a great extent on the communicated data type, I guess. I would love to contribute.

Yes, it’s specification and model checking. Whether the effort required from potential users is prohibitive depends heavily on the domain and the environmental conditions they are acting in. I think one should give every possible user as many optional technical possibilities to work with as possible. Whether users make use of the concepts offered is their choice.

I agree that static strong typing is very important. However, from an integration point of view I wouldn’t consider “Design By Contract” less important. DbC helps to avoid “higher level” interaction issues in addition to typing issues. But as DbC would require many features to be most effective, static strong typing could probably be achieved faster in terms of effort.

Freedom-from-choice supporters would probably disagree with you here.

From a maintenance pov this is also not a very popular sentiment.

I am one of those “Freedom-from-choice” supporters :wink:

I am a freedom-from-choice supporter as well :slight_smile: . I would stick to freedom-from-choice w.r.t. all “internals” of a framework. However, from a framework user’s perspective it is probably not always possible or reasonable to be forced to define contracts in practice.

For everyone who is interested in getting their hands dirty: there is a short introductory overview of ROS2, with tutorials, from Erle Robotics on their docs.

I am interested in this topic as well. But I would prefer to discuss only DbC in this thread. If you can convince me that I should prioritize it higher than DbC, I will be with you :wink:

This does not fit into this thread either, but what is the benefit of erlang VM integration?

For more information about the DDS DEADLINE QoS policy refer to page 95 in the DDS specification v1.4.

This relates to the way the message definition language is used in ROS. It defines data types, not node interfaces. Contracts are much more likely to be specific to node interfaces than to the messages, which are intended to be generic and highly reusable. Because ROS doesn’t currently have a node interface definition language, there is not yet a suitable place to specify contracts.

Furthermore, some contracts may be specific to a particular implementation of a node, and so wouldn’t fit in a node interface specification intended to be reused by many different implementations (although then I would argue that the nodes with different contracts should not be considered interchangeable and so should be using different interfaces).
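To illustrate what a node-level description (as opposed to a message definition) might hold, here is a small sketch using plain data classes. ROS has no such node interface language, as stated above, so `TopicContract`, `NodeInterface`, `compatible` and the node names are all invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicContract:
    topic: str
    msg_type: str             # reusable, generic message type name
    min_rate_hz: float = 0.0  # 0.0 means "no rate promised or required"


@dataclass(frozen=True)
class NodeInterface:
    name: str
    publishes: tuple = ()
    subscribes: tuple = ()


def compatible(provided, expected):
    """A publisher satisfies a subscriber if it promises at least as much."""
    return (provided.topic == expected.topic
            and provided.msg_type == expected.msg_type
            and provided.min_rate_hz >= expected.min_rate_hz)


laser = NodeInterface(
    "laser_driver",
    publishes=(TopicContract("/scan", "sensor_msgs/LaserScan", 10.0),))
mapper = NodeInterface(
    "mapper",
    subscribes=(TopicContract("/scan", "sensor_msgs/LaserScan", 5.0),))
```

The message type stays generic; the rate promise is attached to the node's use of the topic, so two nodes with different promises simply have different interface descriptions.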

Contracts aim to enforce complex dynamic properties of a system.
Types aim to enforce simple, (usually) static properties of a system.

Therefore I am of this point of view : haskell - Comparing design by contract to type systems - Stack Overflow
Notice how the complexity increases from top left to bottom right.

So, before trying to do something complex (which means heavy maintenance, and likely to be unused until it is perfectly optimized), I would focus on the doable, lighter side of things.

I also agree with @gbiggs and would first like to see more strictly enforced message types, before thinking about their combination in an IDL, how this would behave dynamically, and how to enforce some behaviors and prevent others…
Right now the message field type is too ambiguous (“node N can subscribe to a message M with an int field, but actually there will only ever be even numbers there…” except the developer of N doesn’t know that, unless they go through the code of all nodes publishing M)
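The “only even numbers” case can be made explicit with a refined type, sketched here as an `int` subclass. `EvenInt` is a made-up name, and note that arithmetic on it yields a plain `int` again, so the guarantee only holds for values that went through the constructor.

```python
class EvenInt(int):
    """An int that is guaranteed to be even (checked at construction)."""

    def __new__(cls, value):
        if value % 2 != 0:
            raise ValueError(f"{value} is not even")
        return super().__new__(cls, value)
```

A field declared as `EvenInt` documents the restriction where the subscriber's developer can see it, and `EvenInt(3)` fails immediately at the publisher instead of surprising node N later.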

This does not fit into this thread either, but what is the benefit of erlang VM integration?

Don’t try to reinvent the wheel, reuse 30 years of expertise in distributed system programming. There are already a bunch of people working on these questions in a distributed setting, and some tools available : Types (or lack thereof) | Learn You Some Erlang for Great Good!

By the way, isn’t this thread some kind of X-Y problem? Which problem exactly are you planning to solve with DbC?

Always nice to see another Erlang fan! That language is such a pleasure to program in.

Some would say that this is one thing contracts are meant to check… but I think that depends on where you draw the line between what is a contract and what is the type. But I think you are correct in saying that this sort of information really needs to be available and checkable. It is helpful to have in documentation but far more beneficial to developers (and safer) for it to be automatically checkable.

Types/Contracts: In my mind, ‘Design by Contract’ was an informal concept/practice introduced a few years ago because the type systems of most languages at that time were insufficient to guarantee correct program behavior. But it is fundamentally the same thing…
Except that we have researched type theory for a while now, whereas ‘contract theory’ is probably not what you would expect after learning about DbC…

These days I am following dependent types and experiments to bring them into distributed systems.

That summarizes the difference between types and contracts. And that is exactly why I didn’t propose types here: in my experience the hard-to-find defects tend to have their root cause in implicit, incomplete or missing definitions of dynamic characteristics, which here in ROS means node interactions.

Unfortunately that is exactly what I found out when looking into the ROS2 sources. One could add deadlines for topics that (a) do not change or (b) do change over node runtime; they could be (a) defined and/or (b) updated via the rmw C API, which wraps the DDS DynamicData API or the statically generated DDS functionality from the IDL definitions. However, as you said: the interface considers only the aspects of the message description language (IDL), not a node description language. And a node description language would be required to add the functionality that would be most beneficial.

That’s right. The question should be: “How can I avoid introducing defects into, or detect defects in, distributed ROS systems which have their root cause in the dynamic interaction of several ROS nodes?” I am biased and did not propose physical continuous integration because that seems hard to implement for distributed systems. DbC, or actually model checking based on some kind of node description language, seems cheaper to me.

I might be stating the obvious here, but still worth reminding everyone I think…

How can I avoid introducing defects into distributed ROS systems which have their root cause in the dynamic interaction of several ROS nodes?

  • Don’t build a distributed (==multiprocess) system if you don’t have to. Programming language elements (functions, classes, libraries, packages) are made for composing correctly in all sorts of ways, and there is usually theoretical background, tooling, conventions, processes, to help you satisfy the cognitive biases you didn’t know you had. No distributed software system that allows you to control the distribution graph, has anything equivalent to that currently. ROS is no exception (actually erlang might be the only exception).
    Example: A whole part of operating system design is about preventing process interactions, and most recent OSes are preemptive. This is opposed to the features suitable for a distributed system, which by definition needs process cooperation, and where controlling when each process can be interrupted, or not, is really useful. In one process, in one language, all these problems vanish.

  • If you have to build a distributed system, congratulations, you are doing distributed system research. This is not robotics and there is a different set of assumptions coming along in that context.
    Example: most existing and widely-used distributed software systems rely on the fact that a message, a unit of computation (“task”), is atomic and idempotent. That requirement usually cannot be met on a robotic platform, because of side effects on the real world, which are the whole point of it. Painful lesson after a year working on GitHub - asmodehn/celeros: Celery ROS python interface.

For the rest of us having to do both distribution and real world side-effect, I feel the most promising way, is still integrating side-effects into the theory. But, as far as I know, it is still a software research topic on its own.

Regarding ROS, the best bet is likely to integrate/interface/implement ROS with the existing programming languages that provide the features that you need, instead of trying to integrate “that awesome language feature” into ROS (because it implies re-implementation and proactive maintenance from the ROS community for something that is not purely robotics related).

For DbC, I’m thinking that if you get around to implementing an Eiffel-based ROS interface/integration/implementation, you might find some interesting changes needed in ROS itself, even in REPs, in order to make that possible without compromising Eiffel. I’m thinking these changes would likely be worth it for ROS, especially in the long run.
Disclaimer: I’m currently following the same path with Python, improving how ROS integrates with it along the way, and finding basic problems where I didn’t expect to…

But ultimately, writing a “solid” software project is a matter of computer science and software engineering expertise, so general software theory, knowledge and tools apply there. It’s not a problem specific to robotics, and therefore robotic science and tools (like ROS) are not focusing on it.