ROS 2 Humble – Continue Program Execution After Network Loss

Hello everyone,

I’m using ROS 2 Humble on an embedded platform. During program execution, if the robot loses network connection, everything stops and the program halts. I would like to ensure that the robot continues executing the program even after the network connection is lost.

I’ve already tried the solution described in this post, but unfortunately, it didn’t work for me.

Has anyone encountered a similar issue or found a reliable solution?

Thank you in advance!

What middleware are you using? If it’s an option, you may want to consider switching to rmw_zenoh, which by default only works on localhost. This should make your system robust to the network dropping.

We have had the best results compiling rmw_zenoh from source as we found the release builds had a few issues.

Alternatively, try the “Improved Dynamic Discovery” configuration linked in the first reply to “Proposal: Restrict DDS to localhost by default” (direct link: Improved Dynamic Discovery, ROS 2 Documentation: Rolling).
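One caveat: as far as I know, the ROS_AUTOMATIC_DISCOVERY_RANGE setting described on that page arrived with Iron, so on Humble the closest equivalent is the older ROS_LOCALHOST_ONLY variable. A minimal sketch, assuming every node runs on the robot itself and nothing needs to talk to another machine (whether this cures the network-loss problem with your RMW is untested here):

```shell
# Assumption: all nodes run on the robot, so no ROS traffic needs to leave the machine.
# ROS_LOCALHOST_ONLY is the Humble-era setting; it has to be set in every shell
# (or service environment) that starts ROS 2 nodes.
export ROS_LOCALHOST_ONLY=1

# Restart the CLI daemon so tools like "ros2 topic list" pick up the new setting.
# Skipped when the ros2 command is not on PATH.
if command -v ros2 >/dev/null 2>&1; then
    ros2 daemon stop
fi
```

Putting the export at the top of the startup script is probably the least invasive way to try it.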

How are you invoking the “program”?

Does your OS have nohup command?

I use something similar to this from a remote term:

nohup /home/ubuntu/HumbleDave2/start_robot.sh >/dev/null 2>&1 &

If you want to move the concept into crontab, getting the user and path right is tricky, so make a “nohup_start_robot.sh” wrapper and have crontab start that script.
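A rough sketch of what such a wrapper could look like. The log location and the environment defaults are assumptions for illustration, not part of my actual setup; adjust them to your robot:

```shell
#!/usr/bin/env bash
# nohup_start_robot.sh: hypothetical wrapper meant to be started from crontab.
# START_SCRIPT and LOG_FILE are placeholders that can be overridden from the environment.
START_SCRIPT="${START_SCRIPT:-/home/ubuntu/HumbleDave2/start_robot.sh}"
LOG_FILE="${LOG_FILE:-/tmp/start_robot.log}"

# cron runs with a minimal environment, so fill in what the start script relies on.
export HOME="${HOME:-/home/ubuntu}"
export PATH="$PATH:/usr/local/bin:/usr/bin:/bin"

# Source ROS 2 Humble if it is installed at the default location.
if [ -f /opt/ros/humble/setup.bash ]; then
    source /opt/ros/humble/setup.bash
fi

# Detach from the terminal and keep the output in a log instead of discarding it.
nohup "$START_SCRIPT" >"$LOG_FILE" 2>&1 &
echo "started PID $!"
```

With that in place, a crontab line such as `@reboot /home/ubuntu/nohup_start_robot.sh` (path hypothetical) would start it at boot.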

A better approach is to start your robot with a service instead of crontab, but that is more complex to explain and has many more options.

Thank you for your reply. We are using the Cyclone DDS RMW, but switching to Zenoh is not an option because we are building a system for a robotics competition next week. We therefore need a robust solution that does not require major changes.

We connect to the embedded platform over SSH and start all the necessary components there (navigation, detection, hardware components). A single startup launch script starts everything.

Thank you for your reply. Unfortunately, our system doesn’t support that command. We are using Ubuntu 22.04.

quick and easy test:

  • use “nohup” before the startup script:

nohup start_robot.sh &
exit

Then ssh back in and look at the file nohup.out to see the output of your programs.

??? Ubuntu 22.04 doesn’t have nohup ???
What is the result of:

nohup --version

If command not found then:

sudo apt-get install coreutils

Have you tried?

  • ssh into the robot
  • ‘screen’
  • run the system
  • CTRL+A, then D. (Detach terminal) It keeps running in the background
  • go on, disconnect ssh, whatever

At any moment:

  • ssh
  • screen -R

This is a common question, but it actually has nothing to do with ROS. See this question for a number of good options (including nohup, screen, and also tmux):
How to keep processes running after ending ssh session?

We are already using tmux, so I don’t think that is the issue.

Is using systemd service suitable for you?
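In case it helps, here is a minimal sketch of generating such a unit from a shell script. The unit name robot.service, the ubuntu user, and the script path are hypothetical placeholders, and the file is written to a temporary directory so nothing on the system is touched until you copy it yourself:

```shell
#!/usr/bin/env bash
# Sketch: write a systemd unit for the robot start script.
# Hypothetical names: robot.service, the "ubuntu" user, and the script path.
UNIT_DIR="${UNIT_DIR:-$(mktemp -d)}"
UNIT_FILE="$UNIT_DIR/robot.service"

cat >"$UNIT_FILE" <<'EOF'
[Unit]
Description=Robot startup (hypothetical example)
After=network.target

[Service]
Type=simple
User=ubuntu
ExecStart=/home/ubuntu/HumbleDave2/start_robot.sh
Restart=on-failure
RestartSec=2

[Install]
WantedBy=multi-user.target
EOF

echo "wrote $UNIT_FILE"
```

To try it, copy the file to /etc/systemd/system/, then run `sudo systemctl daemon-reload` and `sudo systemctl enable --now robot.service`; `journalctl -u robot.service` shows the output. A service started this way is not tied to any SSH session, so dropping the connection should not take it down.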

Are you saying your problem is that when a network interface disappears and then appears again, the RMW doesn’t notice it and doesn’t communicate on this interface? Or does the disappearing of the network device even break comms on localhost while the link is lost?

Hi @Marija_Golubovic and everyone helping resolve this,

We appreciate the iterations here. However, this thread is much better suited for Robotics Stack Exchange. As described in our support guidelines, on ROS Discourse we want to focus on general discussions; troubleshooting support is on Robotics Stack Exchange, which has many better mechanisms for getting to the best answer faster.

@Marija_Golubovic based on all these iterations, please try to make a minimal reproducible example so that others can help you better. Because your question is not clearly defined and reproducible, you’re getting a lot of suggestions that are not helpful for you. Please ask your question there and then follow up here one last time with a link to it, so that anyone with a similar issue can find the question and hopefully answer it on Stack Exchange.