My ROS2 minimal examples, difficulties coming from ROS1

Hey, I am new here on Discourse, but not new to ROS. I am a long-time “power user” of ROS1, coming from the field of multirotor aerial vehicles at the MRS group, CTU in Prague. Here is our code base if you are interested: github.com/ctu-mrs/mrs_uav_system. We are slowly getting to a point where others are starting to ask about ROS2 support in our system, so I took a peek into it.

I started from the basics, first trying the essential building blocks before attempting any serious application. I immediately stumbled upon difficulties and major problems. I had already realized that ROS2 is not “the second version of ROS” but rather another variant of a similar system, so the concepts and design patterns might not be transferable. Still, I approach the transition from the point of view of a ROS1 user, since we have tens of thousands of lines of code built around the ROS1 design.

I was told that you guys here might appreciate my feedback, so here is my repository with the basic examples (Foxy): github.com/ctu-mrs/ros2_examples. The README inside covers all the basics on my list that should be probed before going further. It also contains my impressions and notes regarding the differences and problems. Please feel free to collaborate; any help will be appreciated. It took me a relatively long time to investigate those basics, and I found the existing documentation and examples difficult to follow. Truthfully, I might have used Stack Overflow more during the last week than during the past five years. More in the README.

TL;DR from the repo:

  • How to make services synchronous? Services in ROS1 were completely synchronous. I concede that I might want asynchronous services maybe 5% of the time, but that can be done easily with some threading or one-shot timers. Making all services asynchronous makes things difficult to port from ROS1.
  • Multithreading with timers seems to be broken.
  • Timers seem to struggle to produce a stable rate with the multi-threaded executor.
  • Why are most demos and examples not using the “component” architecture which is promoted in the ROS2 docs? We are already nodeleting everything in ROS1, so seeing main() in the ROS2 demos seems very strange.

Thanks for sharing, @klaxalk! I think this is very valuable feedback (and hopefully also support for others deciding about the ROS 2 transition).

@klaxalk These are some great examples and we really appreciate it. To your last point regarding why the docs don’t use nodelets, the docs have a history and a lot of them didn’t get re-written with ROS 2, just updated. Nodelets also end up being a slightly more advanced topic; it seems perfectly fine to avoid that subject in the more introductory parts of the docs. There is certainly room for examples of both approaches so this material is greatly appreciated.

I hope the following helps. Thanks for sharing your thoughts.

How to make services synchronous?

By the way, I think you have a point. Providing synchronous methods in the Client class sounds reasonable. (ParameterClient does have a SyncParametersClient class.)

I created the issue accordingly,

  • Multithreading with timers seems to be broken.
  • Timers seem to struggle to produce a stable rate with the multi-threaded executor.

Right, this is a known issue; we’ve been working on it.

  • Why are most demos and examples not using the “component” architecture which is promoted in the ROS2 docs? We are already nodeleting everything in ROS1, so seeing main() in the ROS2 demos seems very strange.

I do not think that is strange, because it depends on the use case, and that is something the user can choose. IMO, each demonstration or example should focus on the topic it is meant to show and be simple enough. We do use ROS1 nodelets in a product on the market, but we don’t put everything inside a nodelet, since we consider security, process isolation, performance and so on.

Anyway, as @Katherine_Scott mentioned, there is room to enhance the examples.

Thanks for the reply. But I would say the docs would not agree with you on the topic:

[Screenshot “Selection_076” from the ROS2 docs; image not preserved]

And since components (as well as nodelets in ROS1) can be run standalone, there is no point in developing standalone nodes anymore. I am with the ROS2 docs on this.

Thanks for the thorough reply and for creating the issue. I hope something good will come out of it :-).

Regarding the components (nodelets), my answer is similar to the one I gave @Katherine_Scott. Already in ROS1, the nodelet architecture does not take anything away from the traditional ROS1 node (besides a slightly more complex build setup). We nodelet everything, but that does not mean we really run everything under the same manager. Some nodelets run standalone, some are grouped if necessary (typically due to large image/point cloud data rates). I honestly cheered for ROS2 for embracing the nodelet architecture when I heard it was going to be the default option.

Edit: your suggested solution for the synchronicity of the service response is not applicable in a Component (as far as I know). You don’t have access to the node in a Component, and if you try to get it through this->get_node_base_interface() and pass it to spin_until_future_complete(), the executor complains that the node is already added to an executor.

Just to be precise - nodelets do take away something from you in ROS 1. They take away node(let)-specific ros::ok() checks (i.e. for unloading just the nodelet and not shutting down the whole manager). Nodelets have no way of knowing they are requested to stop (the request is performed by calling their destructor). You can implement some state variable which changes in the destructor, but the existing tooling doesn’t count on it, resulting in issues like “canTransform() with timeout while unloading nodelet with paused time -> freeze” (Issue #381, ros/geometry2 on GitHub).

Plus, nodelets (if I’m not mistaken) do not allow you to rebuild the .so with the manager still running and load the nodelet again with the updated .so, which slows down development and is less friendly for inexperienced developers.

Just to add a few cents against nodelets in ROS 1 (admitting they have a lot going for them, though). I guess the first isn’t an issue with ROS 2 components, while I’m not so sure about the latter…

I must admit that it would not occur to me that someone needs to unload/reload nodelets while keeping the manager running, or to recompile them “on the fly”. These, however, seem like “development inconveniences” rather than deployment problems. And you only have this problem if you use nodelets as nodelets. You won’t have these issues if you use them as standalone nodes. That was my point: making everything a nodelet gives you the option to combine them together, but you don’t have to use it if you don’t want to.

Those points you made still do not really take anything away since they apply only in the use cases where multiple nodelets are loaded in a nodelet manager. You can work just fine if you emulate the single node behavior by running them as standalone. Then you can freely shut them down one by one, or recompile and re-run while the rest is still running. I am not saying nodelets in ROS1 are perfect. For sure, the mechanism could be improved. But the discussion was not supposed to be about nodelets in ROS1, but about their recommended counterpart in ROS2, which currently has actual deal-breaking problems.

I agree. I just wanted to point out that it shouldn’t be just “let’s nodelet/componentize everything, because we can”, but “let’s nodelet/componentize everything, because it’s better”. With ROS1 nodelets, I can make this decision. With ROS 2 components, I can’t (yet), so I’m not sure whether converting most tutorials to components wouldn’t actually complicate/break things. And writing code in components which have to run isolated because of the deal-breaking problems you’re talking about doesn’t seem like a good idea to me. Once it’s a component, somebody will take it and compose it with another one, regardless of what the docs say…

@peci1 I might have expressed myself poorly. Please check the examples. The deal-breaking issues are not solved by running the components standalone. I have not even tried composing multiple components together yet. All the examples are launched as standalone components and the following issues are present:

  • Timer parallelism does not work.
  • Timer regularity is poor.
  • Retrieving the service response synchronously (how to?) … the offered solutions (which I already tried before, see the commented parts of the example) do not work with components.

I think it is ok to admit that ROS2 is not ready. I am just trying to approach it from the already established field of ROS1 and I am finding that the usual features are not yet ready. I am fine with that, I understand that it takes time to develop them, our transition can wait. But please, don’t try to convince me that

  • I don’t actually need the features (they are implemented, therefore, I expect them to work), or
  • I am not supposed to expect Components to work, or that I am not supposed to use them (remember, they are the recommended way), or
  • I am using the features wrong (If so, please provide help).

We don’t need to transition right now. We will happily use ROS1 till its final days. But this whole thread was started because I attempted the first steps and I failed.

I thought you convinced yourself they don’t (at the moment). Apart from that, I didn’t want to convince you about any of the other points. I was reacting to

trying to show that, given the sole fact that components still have these problems, it might still make sense to write standalone nodes (until components get better).

Thanks for the catch. You are mentioning it here, right? It says it leads to a crash of the process, which I believe is a bug. Would you mind filing an issue for rclcpp with a reproducible procedure, so that we don’t miss it?

I am afraid it’s not a bug. It’s just how the ROS2 node is designed there, and the thrown error message tells us the reason.

Why don’t we use another node for creating the service client if we want to use spin_until_future_complete inside a node’s callback?

Please refer to somebody’s source code,

True, thanks for the correction. It is the application’s responsibility to catch the exception in the current implementation. Actually, I was thinking that the current implementation could be discussed.

Why don’t we use another node for creating the service client if we want to use spin_until_future_complete inside a node’s callback?

On this topic: this is really related to the implementation of rclcpp, so I was thinking it would be nice to talk about it in rclcpp or probably on ROS Answers. ROS Discourse is for news and general interest discussions.

@klaxalk thanks for the examples. Personally, I came into ROS 2 with less ROS 1 ‘baggage’ and now try to get people on board with it based on the good experience and enjoyment I have had working with it so far, so it is good to learn more about pain points from people like you who have invested a lot of time in mastering ROS 1.

In the use cases I had so far for services in ROS 2 the async model always made sense, and I could use the callback to do what needed to be done at the time that the response was received, without having to explicitly block and wait for the results. So it would be interesting for me to learn how you use services in your code and get a better feeling of where synchronous calls may be a must. Could you point out some examples? I couldn’t immediately find any client calls in the repository you linked or some of the other ctu-mrs repos. Cheers!

Hey @sgvandijk, before I explain more: I am fine with asynchronous-only services if I can design my system around them from the beginning. In this particular example of mine, the issue is with the transition from a system (ROS1) that only worked synchronously. I would be fine with waiting for the std::future as suggested in my examples; however, this does not seem to work when calling a service from a callback (currently due to malfunctioning parallelism of timer callbacks).

One such example of the use of synchronous services is our high-level takeoff service callback for a multirotor helicopter. When approached from the high-level point of view, this service (potentially called by a user) has to trigger the following sub-services in a fast and consecutive fashion:

  1. Activate a particular feedback controller for takeoff (service call)
  2. Activate a particular reference generator for takeoff (service call)
  3. Trigger a reference generation for the takeoff (service call)

Each service call can fail, which triggers an abort of the takeoff by calling more services to either restore the previous state or call an emergency landing routine. You could say that this should be implemented as a full-fledged “state machine” where every awaited service call result is checked by its own state asynchronously. For some situations, yes; here, however, a few if-then-else statements are fine, especially in the case of a linear state machine. Moreover, all this happens in a service callback, so that the user gets a message back saying whether the takeoff was successful or not. Some mechanism for blocking the callback until all the asynchronous mess is settled would have to be implemented. Someone could say that this should be handled by an action server. Mind that this whole event takes a millisecond, and an action server just seems like too large a cannon for this job.

It is not pretty, but you asked for it :-). There are a ton of similar situations where the logical flow of our programs just expects that the “one-liner” of a service call will get you the result… And building asynchronous state machines for all of them seems to me just too much.
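
To make the shape of that logic concrete, here is a minimal sketch of the linear takeoff sequence described above, in plain C++ with no ROS dependency. All names (`ServiceCall`, `TakeoffResult`, `takeoff`, `abort_takeoff`) are made up for illustration and are not taken from the mrs_uav_system code base; each `ServiceCall` is assumed to stand in for a service call that blocks until the response is available and returns whether it succeeded.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for a service call that blocks until the response
// arrives and returns true when the called service reports success.
using ServiceCall = std::function<bool()>;

// What the high-level takeoff callback reports back to its caller.
struct TakeoffResult {
  bool success;
  std::string message;
};

// Linear sequence: every step must succeed before the next one is attempted.
// On any failure, abort_takeoff() is called (restoring the previous state or
// landing, depending on the system) and the failure is reported immediately.
TakeoffResult takeoff(const ServiceCall& activate_controller,
                      const ServiceCall& activate_reference_generator,
                      const ServiceCall& trigger_reference,
                      const ServiceCall& abort_takeoff) {
  if (!activate_controller()) {
    abort_takeoff();
    return {false, "failed to activate the takeoff controller"};
  }
  if (!activate_reference_generator()) {
    abort_takeoff();
    return {false, "failed to activate the takeoff reference generator"};
  }
  if (!trigger_reference()) {
    abort_takeoff();
    return {false, "failed to trigger the takeoff reference"};
  }
  return {true, "takeoff started"};
}
```

With blocking calls the whole sequence is three if statements, and the result can be returned directly from the enclosing service callback. With asynchronous-only clients, each step would need its own continuation instead.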

I’m sorry to continue a discussion here if it should be somewhere else, but I was also fighting with how to wait on a service future. Since the component manager adds the node to an executor and spins that executor, can you not just wait on the future with get() in your component node? This is the approach I’m using and it seems to work.

Relating to this, I discovered this usage of spin_until_future_complete when trying to write a component node that uses the infrastructure to request changing a parameter in another node. The SyncParametersClient uses this method and is therefore useless in component nodes as it exists. Instead, I just interact with the set_parameters service just like any other ros2 service, calling get() on the future to block until it returns.
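
To illustrate the idea with something that runs without ROS, here is a minimal sketch in plain C++. It is only a model, not rclcpp code: `fake_async_call`, `call_and_wait` and `CallStatus` are hypothetical names, and a detached `std::thread` plays the role of the executor thread that delivers the response. What it shows is that blocking on the future works only when some other thread is free to deliver the result, and that `wait_for()` with a timeout avoids blocking forever when that is not the case.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>
#include <utility>

enum class CallStatus { Succeeded, Failed, TimedOut };

// Hypothetical stand-in for an asynchronous service client. The response is
// delivered from a separate thread after `latency`, playing the role of an
// executor thread that is already spinning the node.
std::future<bool> fake_async_call(std::chrono::milliseconds latency) {
  std::promise<bool> promise;
  std::future<bool> future = promise.get_future();
  std::thread([p = std::move(promise), latency]() mutable {
    std::this_thread::sleep_for(latency);
    p.set_value(true);  // the "service response"
  }).detach();
  return future;
}

// Block the calling thread until the response arrives, but give up after
// `timeout` so that a response which never arrives cannot block forever.
CallStatus call_and_wait(std::chrono::milliseconds latency,
                         std::chrono::milliseconds timeout) {
  std::future<bool> future = fake_async_call(latency);
  if (future.wait_for(timeout) != std::future_status::ready) {
    return CallStatus::TimedOut;
  }
  return future.get() ? CallStatus::Succeeded : CallStatus::Failed;
}
```

In this model, if the only thread able to deliver the response were the one that is blocked, the wait could only end in the timeout branch.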