High cpu load for simple python nodes

In a project I have been using various nodes, some written in C++ and some in Python. I noticed a huge difference in CPU usage between Python and C++, even for basic nodes that just do some publishing. A Python node generally uses several times more CPU than a C++ node doing the same work.
This happens for publisher nodes, as well as for server nodes, even in their idle state when no services or actions are actually requested, which I find extremely strange…

Did anyone notice similar issues? Does anyone have any recommendations for reducing the CPU usage of Python nodes?
So far the issue has been observed in ROS 2 Galactic, on different CPU architectures (both AMD64 and ARM).


I have observed this as well. For this reason, I only use rclpy for development, and never for production nodes.


Not just CPU overhead when spinning, but memory footprint as well. Mirroring Aposhian, we usually prototype in Python and then convert to C++ for release.


@Marco, can you create an issue on GitHub (ros2/rclpy: rclpy, ROS Client Library for Python) with a description?

Or maybe there is already a similar issue?

CC: @aposhian


Every single time I observed that, it was a busy loop that could be easily avoided (while loop without any sleep or blocking operation).
There was always a simple solution to avoid that. If you share the code of the main loop, I can give you some hints.
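As a generic illustration of that pattern (plain Python, no rclpy; the helper names are hypothetical), the sketch below compares the CPU time used by a loop that polls a flag without sleeping against a thread that blocks on `threading.Event.wait()` for the same wall-clock time.

```python
import threading
import time


def busy_wait(stop: threading.Event, duration: float) -> float:
    """Poll the flag in a tight loop (no sleep) and return CPU seconds used."""
    start_cpu = time.process_time()
    deadline = time.monotonic() + duration
    while not stop.is_set() and time.monotonic() < deadline:
        pass  # nothing here blocks, so this keeps one core fully busy
    return time.process_time() - start_cpu


def blocking_wait(stop: threading.Event, duration: float) -> float:
    """Block on the event with a timeout and return CPU seconds used."""
    start_cpu = time.process_time()
    stop.wait(timeout=duration)  # the OS parks the thread until set() or timeout
    return time.process_time() - start_cpu


if __name__ == "__main__":
    never_set = threading.Event()
    busy = busy_wait(never_set, 0.5)
    blocked = blocking_wait(never_set, 0.5)
    print(f"busy loop:     {busy:.3f} s CPU over 0.5 s wall time")
    print(f"blocking wait: {blocked:.3f} s CPU over 0.5 s wall time")
```

In an rclpy node the equivalent fix is usually to move periodic work into a callback registered with `Node.create_timer()` and let `rclpy.spin()` block, rather than running a `while rclpy.ok():` loop with no sleep.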


Can you share the difference you see in CPU and memory usage when you run

```
docker run --rm -it osrf/ros:humble-desktop ros2 run demo_nodes_py talker
```

vs

```
docker run --rm -it osrf/ros:humble-desktop ros2 run demo_nodes_cpp talker
```

?

This would make a great ROS 2 Docs article. :wink: :wink:


At the risk of sounding callous, you guys do realize you're comparing a compiled language known for its memory efficiency and speed to one that's none of those three things. The virtual machine will add overhead, so will the interpreter, and of course the language itself needs more processor cycles to run, given dynamic typing and other high-level abstractions. Even if you're not receiving anything, there's still a thread in the background that needs to periodically poll for new socket data, I presume. Unlike C++, which will strip out all unused code at compile time, Python also needs to keep all imported modules loaded in memory.

But none of us are using Python because it’s fast or memory efficient, we’re using it because it cuts down development time by an order of magnitude :smiley:


I would have to check again, but I have noticed this same issue with the CLI tools that are implemented in Python, such as ros2 topic or ros2 bag record. I believe one basic issue is the executor used in this setting, which has the busy loop (as @facontidavide mentioned) that can eat up a whole CPU even if there is little actual work to be done. I don't think it's easy to just get rid of, though.

At the risk of sounding callous, you guys do realize you’re comparing a compiled language known for its memory efficiency and speed to one that’s none of those three things.

@MoffKalast my first reaction was to think the same and say: “let’s convert this to C++, of course”.

But I was always wrong. If you observe a process that takes 100% of CPU and your intuition is telling you that should be below 10%, then your intuition is usually right and there is a busy loop.

This, at least, is my experience (and I am a hardcore C++ developer obsessed with optimization :wink: )


@Marco you started an animated conversation. Could you share your code or a similar example, to give us the opportunity to give you and other users proper advice?


I think the "python node taking 100% of CPU" must be missing a subscriber or timer callback that would give spin a chance to block the node?

As for ros2 topic taking a lot of CPU: on my Galactic setup, ros2 topic echo /battery_state, which publishes every 10 seconds, takes around 1% of CPU, and only while it is active.

Ah yeah, once something caps out a core, you're either doing something wrong or processing more of something than you have the capacity for, and it may be time to multithread. If that's really the case, then it's surely a problem.

But in my experience the rule of thumb for non-busy code is that if something takes 1-2% of CPU in C++ that’ll likely still be around 9-15% in python. Usually that trade-off is still acceptable.


It would be great if someone could run the problematic node with a profiler, like the one Pycharm pro offers. That could shed some light on where the problem is…
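PyCharm is not required for this; Python's built-in `cProfile` works too. Below is a minimal sketch that profiles a single call and returns the top entries sorted by cumulative time. `hot_function` is a hypothetical stand-in for a node callback. For a whole node, `python3 -m cProfile -o node.prof my_node.py` (script name hypothetical) writes a stats file that `pstats` can read, and a sampling profiler such as py-spy can attach to an already running process.

```python
import cProfile
import io
import pstats


def hot_function(n: int) -> int:
    """Hypothetical stand-in for a node callback that does some work."""
    return sum(i * i for i in range(n))


def profile_call(func, *args, top: int = 5) -> str:
    """Run func(*args) under cProfile and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)  # keep only the top rows
    return out.getvalue()


if __name__ == "__main__":
    print(profile_call(hot_function, 200_000))
```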

I have an Intel Core i7-7700HQ with 16 GB of memory, running Galactic. The Python and C++ results were posted as screenshots, which are not preserved here.

So yes, the Python node uses more CPU (0.7% vs 0.0%), but nothing egregious. Same with memory.


Sorry for the late reply.
Thanks a lot everyone for all the great feedback!
I see there are contrasting opinions on the topic, so I started to analyze things a bit more in detail on our code base.
I found out, surprisingly, that the high CPU usage I had noticed appears only in our testing pipelines with launch_testing.
There I have, for example, an idle action server node that only listens for action goals and does nothing else, yet its process uses a roughly constant 40% of a core (this huge load for an idle process is the main reason I started investigating potential performance issues with rclpy).
I now tried to analyze the same idle node when run individually via ros2 run, and the CPU load of that same process is instead close to 0%.
As a next step I will look further into whether the cause is launch_testing, something in our pipelines, or somehow the effect of the other nodes that are launched in parallel in our tests.
I’ll update you on these findings as soon as I get some results, and if it is still unclear I’ll also share the code of the problematic nodes with you.
Might take a few days, since in parallel I am also working on several other topics, so I apologize in advance for the wait.


Although far from 100%, we have seen high CPU usage by nodes that don’t do much. For example a simple node that just subscribes to a high-frequency topic (e.g. /clock in simulation) can consume about 10-15% of a single CPU… I would need to profile more, and on a recent distribution (this was Foxy), but I suspect that there are quite a lot of inefficiencies in the executor and / or the deserialization of messages.


Serialization and deserialization of non-primitive elements (such as ROS time) in ROS 2 are slow in C++, and can be terribly slow in Python. Because it involves deserialization, the simple command "ros2 topic hz" can choke so badly that it is close to useless for measuring the actual frequency at which a topic is published, at least for high-frequency messages containing non-primitive types. This sad fact is re-learned by every ROS 2 newbie (like me). Please see the link below for more.
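To see why a slow subscriber distorts the measurement, note that a rate tool can only compute statistics from the messages its own callback actually receives. The sketch below (a hypothetical helper, not the ros2cli implementation) summarizes arrival timestamps into an average rate plus min/max/std-dev intervals, roughly the kind of numbers `ros2 topic hz` prints.

```python
import statistics
from typing import Dict, Sequence


def rate_stats(stamps: Sequence[float]) -> Dict[str, float]:
    """Summarize the arrival times (in seconds) of consecutive messages."""
    if len(stamps) < 2:
        raise ValueError("need at least two timestamps")
    intervals = [b - a for a, b in zip(stamps, stamps[1:])]
    mean = statistics.mean(intervals)
    return {
        "average_rate_hz": 1.0 / mean,
        "min_s": min(intervals),
        "max_s": max(intervals),
        "std_dev_s": statistics.pstdev(intervals),
    }


if __name__ == "__main__":
    # A 1000 Hz publisher whose messages only reach the callback every 2 ms
    # (e.g. because half are dropped) is reported as ~500 Hz.
    received = [i * 0.002 for i in range(11)]
    print(rate_stats(received))
```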


Hi everyone.
After some more investigation I found that it really is the gazebo simulation that causes the peaking CPU usage in python nodes.
As @haudren pointed out, this really seems to be caused by Python nodes listening to the sim time coming from Gazebo.
In our setup in particular, we had set the Gazebo publish_rate parameter to 1000Hz. This is mainly because some C++ nodes run at high frequency, and we need clock updates on the order of milliseconds.
The C++ nodes do not seem to suffer noticeably from this high frequency messaging, while the python nodes really go crazy instead.

I tested this with an empty idle action server node and an empty gazebo world, and these are the results using different publish rates for gazebo:
publish_rate → action server node CPU load
1000Hz → ~50%
100Hz → ~12%
10Hz → ~2%
1Hz → 0%
(tested on a laptop with an intel i7 2.60GHz CPU)
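A quick back-of-the-envelope check on these numbers, assuming each load is a fraction of one core: dividing load by publish rate gives the approximate CPU time the idle Python node spends per /clock message.

```python
# Approximate values from the measurements above:
# Gazebo publish_rate (Hz) -> CPU load of the idle action server (fraction of one core, assumed).
measurements = {1000: 0.50, 100: 0.12, 10: 0.02}


def cpu_ms_per_message(rate_hz: float, load: float) -> float:
    """CPU time spent per /clock message, in milliseconds."""
    return load / rate_hz * 1000.0


for rate, load in measurements.items():
    print(f"{rate:>5} Hz: ~{cpu_ms_per_message(rate, load):.1f} ms CPU per message")
```

At 1000 Hz this comes to roughly 0.5 ms of CPU per message, i.e. about half of every 1 ms clock period goes to handling /clock. The "~" values are too coarse to read much into the lower-rate figures.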

This is really terrible handling of the simulation time. I find it crazy that deserializing something as basic as a Time message would take this much CPU…

Does anyone have any suggestions for dealing with this situation?
Our C++ nodes still need a high clock update rate. We could of course port all our python nodes to C++, but it is normally very convenient to have some high level stuff in python for quick development iterations…

If anyone would like to try to reproduce this, here is the code for the minimal action server (it uses a nav2 action, just because I did not see any standard ros2 actions yet):


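```python
import rclpy
from rclpy.node import Node
from rclpy.action import ActionServer
from nav2_msgs.action import Wait


class MinimalActionServerNode(Node):
    """Idle action server: it only waits for goals on "random_action_name"."""

    def __init__(self):
        super().__init__(
            node_name="MinimalActionServerNode",
        )
        self._minimal_action_server = ActionServer(
            self,
            Wait,
            "random_action_name",
            execute_callback=self.exec_callback,
        )
        self.get_logger().info("Initialized!")

    def exec_callback(self, goal_handle_):
        # Mark the goal as succeeded before returning; otherwise rclpy warns
        # that the goal state was not set and aborts the goal.
        goal_handle_.succeed()
        return Wait.Result()


def main(args=None):
    rclpy.init(args=args)
    node = MinimalActionServerNode()

    try:
        # Waits for work (action requests, plus /clock messages when
        # use_sim_time is set) and runs the callbacks until shutdown.
        rclpy.spin(node)
    except KeyboardInterrupt:
        node.get_logger().info("MinimalActionServerNode interrupted via keyboard")
    node.destroy_node()
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```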
