Code Smells in ROS Code

In the General forum, @tareq97 asked me to share some dubious code patterns (code smells?) I have observed over the years in various ROS projects. He is working on a static analysis tool, and adding checks for such patterns would be a nice addition to his tool.

I thought such a discussion would fit better in the QA forum, and so I will share here a small list of code patterns along with my reasoning for finding them questionable. Some of them are found in public repositories, but I will refrain from citing the sources (blaming, or shaming), unless asked to. Feel free to tell me whether you agree or disagree with my assessment, as well as adding code patterns you have observed yourself.


Pattern: Creating one thread to process callbacks, while the main thread stays idle.

ros::AsyncSpinner spinner(1);
spinner.start();
while (!gShutdownRequest) { ros::Duration(0.05).sleep(); }

Reasoning: Why create a new thread? Seems like a remnant from code refactoring. This pattern can be replaced with:

ros::spin();

Pattern: Using a Rate object just to receive callbacks.

ros::Rate r(100);
while (ros::ok()) {
  ros::spinOnce();
  r.sleep();
}

Reasoning: This is only justifiable if one wants to limit the rate at which callbacks are processed by some degree, at the risk of possibly dropping some messages. This pattern can be replaced with:

ros::spin();

Pattern: Calling spinOnce at the end of the loop.

  while (ros::ok())
  {
    /** ... */
    chatter_pub.publish(msg);
    ros::spinOnce();
    loop_rate.sleep();
  }

Reasoning: If the published message is based on the received data, spinOnce should be called at the start of the loop, so that the newest data is processed as fast as possible.


Pattern: One thread processes callbacks, another publishes messages. They share internal state, but lack thread synchronization.

void MyClass::onInit() {
  update_thread_.start(&MyClass::update, *this);
}

void MyClass::topicCB(const std_msgs::Int8ConstPtr msg) {
  var1_ = ...
  var2_ = ...
}

void MyClass::update() {
  ros::Rate spin_rate(10);
  while (!shutdown_requested_ && ros::ok()) {
    if (var1_ || var2_) { ... }
    spin_rate.sleep();
  }
}

Reasoning: Lack of thread synchronization means that variable updates on one thread might not be visible to the other thread. That is, updates to var1_ or var2_ in topicCB() might never make it to the thread that runs update().

18 Likes

I love this pattern of discussion. We should write a book, “Learn ROS by Doing it the Wrong Way First.”

I made this slide for a talk a few weeks back. The intended audience was SREs / ops people so it addresses deployment patterns in production. I wanted the audience to be on the look out for non-standard ways of using ROS. Like the slide says, these aren’t forbidden or wrong, they are just things I would ask questions about if I saw them in production.

7 Likes

I’ve thought about this, it could be the anti-pattern version of Guidelines on how to architect ROS-based systems .

3 Likes

Interesting! Well spotted @afsantos !
By the way, we just finished to extend the original “guidelines paper” into a journal paper containing the description of ALL 47 guidelines. I will come back on this somewhere next week.

4 Likes

I think it would also be helpful to talk about this kind of thing in the context of ROS2. There are a few patterns which appear in the demos and example packages (and which are therefore implicitly endorsed) that can cause problems down the road. As an example:


ROS2 (Anti-)Pattern: Using spin_until_future_complete (which is used in many places, such as the minimal_client rclcpp example):

  auto request = std::make_shared<AddTwoInts::Request>();
  request->a = 41;
  request->b = 1;
  auto result_future = client->async_send_request(request);
  if (rclcpp::spin_until_future_complete(node, result_future) !=
    rclcpp::FutureReturnCode::SUCCESS)
  {
    RCLCPP_ERROR(node->get_logger(), "service call failed :(");
    return 1;
  }
  auto result = result_future.get();

Reasoning: It is not possible to call spin_until_future_complete from a callback which is being handled by an executor. If you’ve written some portable function that uses it, it may fail depending on the architecture of the node that is using it!

Also, if you’re able to use spin_until_future_complete in your node, then you have intermingled the custom functionality of your node with the scope of the node’s executor. This will make it much more difficult to structure the node so it is composable as a Component.

I’ve been using the future returned by the rclcpp service client’s async_send_request function to help wait until the service has finished, though this requires manual setup of callback groups and a MultithreadedExecutor to let the node get the service result response in a different thread.

7 Likes

I think having a collection of anti-patterns/code smells of some sort (book, repo, thread…) would be a great! Having some of these patterns checked by static analysis would be even greater!

1 Like

Using spin_until_future_complete

This could be considered an anti-pattern in itself, since, in its current form, the function should actually be called spin_until_future_complete_and_then_some, where “then_some” may be rounding up to 1 second (that happens with FastRTPS at least, I surmise this may be middleware specific) or something else.