Of clocks and simulation, betimes and otherwise

I wish to make some commentary and discussion on this article: http://design.ros2.org/articles/clock_and_time.html

First, it appears that there has not yet been any attempt to support non-wall-clock time (i.e., simulator or playback time) in ROS2. True? rcl/time.c seems to have some support for it, but I can’t see that support exposed in rclcpp::Time. Is there something I missed, or is anyone working on a newer use_sim_time? I feel a little late to the party. Sorry.

At ASI we’ve worked around this by publishing our own clock signal at high frequency, but I feel that arbitrary clock support needs to be built into the core framework. Non-realtime playback is a critical feature: simulation and playback should work out of the box.

In addition to simulation, though, there is a need to support hardware platforms that don’t have (wall) clock chips. This is not mentioned in the design document. Although they are getting rarer, there are still plenty of embedded processor boards without clock chips or backup batteries. On the other hand, there are no remaining embedded devices without runtime frequency counters. I think we’re safe to assume that all platforms have timers accurate to the millisecond, and that 99% of platforms have timers accurate to the microsecond.

I want to discuss a few options for the /clock topic:

  1. A “steady clock publisher” or simulator would publish this value at high frequency. Use the most recent value if you have one (or fall back to wall-clock time otherwise) for any timestamps on published messages. Rely on NTP to synchronize networked nodes’ wall-clock sources. This is what is proposed in the design doc.

  2. We keep the /clock message, but it carries more than a timestamp: it would also include a realtime multiplier, and it would only need to be published when that multiplier changes.

  3. The clock message would have an uptime offset and a multiplier but no timestamp.

  4. Assume all nodes have synchronized realtime clocks. Send out the realtime multiplier and the wall clock offset.

  5. The timestamp in the message header would include information about its source, epoch, realtime scale, and futurity.

Concerning more sophisticated clock message schemes, the design document states that “all of these techniques will require making assumptions about the future behavior of the time abstraction. And in the case that playback or simulation is instantaneously paused, it will break any of these assumptions.” I don’t believe that to be entirely true. There will be some propagation delay of a “pause” in any situation. I’ve also pondered some other IPC mechanisms for synchronizing clocks between processes, but I think I would disfavor all of them as being too platform-specific.

For #1, what publishing frequency is enough? Can we really go at 1000Hz? We typically aren’t running on RTOS platforms, and even on an RTOS that’s a hard constraint to meet. I think 200Hz is more realistic. Going much faster than that just fills the loopback buffer with clock messages that never see the light of day, which is a waste of CPU and networking resources. Cartographer (or any other mapper) relies on the time difference between IMU readings and laser scans; it’s a critical part of the algorithm. Do we really want that difference quantized to the clock period, 1ms at 1000Hz or 5ms at 200Hz? Even a single millisecond may be too long for a high-speed robot.

Option #2 allows us to publish the clock less frequently. It also allows (requires) us to utilize our onboard timer. We start the timer whenever we receive a clock message. Anything we publish gets timestamped with last_timestamp + timer * realtime_multiplier.
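A minimal sketch of how option #2 could work on the subscriber side, assuming a hypothetical clock message with a stamp and a realtime multiplier (none of these names are an existing ROS 2 interface):

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical richer /clock message for option #2 (not an existing ROS 2 type).
struct ClockMsg
{
  int64_t stamp_ns;            // simulated/playback time at publication
  double realtime_multiplier;  // 0.0 = paused, 1.0 = realtime, 2.0 = double speed, ...
};

// Extrapolates "now" between clock messages with a local steady timer:
// last_timestamp + elapsed * realtime_multiplier, as described above.
class ExtrapolatingClock
{
public:
  void on_clock(const ClockMsg & msg)
  {
    last_msg_ = msg;
    received_at_ = std::chrono::steady_clock::now();
  }

  int64_t now_ns() const
  {
    const auto elapsed = std::chrono::steady_clock::now() - received_at_;
    const double elapsed_ns =
      std::chrono::duration<double, std::nano>(elapsed).count();
    return last_msg_.stamp_ns +
      static_cast<int64_t>(elapsed_ns * last_msg_.realtime_multiplier);
  }

private:
  ClockMsg last_msg_{0, 0.0};  // before the first message, time simply holds at 0
  std::chrono::steady_clock::time_point received_at_{std::chrono::steady_clock::now()};
};
```

A pause would then just be a message with a multiplier of 0.0, subject to the propagation delay already mentioned.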

For #3, I can’t guarantee that all nodes share the same system uptime, though in simulation they commonly do.

Option #4 doesn’t meet the requirement to support a platform without a clock. However, it may be handy for rosbag replay if you are using real timestamps. Those are helpful for video synchronization.

Option #5 is more sophisticated. The output timestamp on any published message would typically be derived from the input timestamps on the data that went into that calculation. Sensors would still need access to a real or simulated clock, but transformer nodes would need no external clock; they would just need to throw an exception if the timestamps on their inputs were too disparate. Tools to make these calculations easier would need to be included in the library.
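For example, a helper along these lines (the function name and tolerance parameter are made up, not an existing API) could derive an output stamp and enforce the disparity check:

```cpp
#include <algorithm>
#include <cstdint>
#include <initializer_list>
#include <stdexcept>

// Derive an output timestamp from the timestamps of the inputs that went
// into a calculation, throwing if the inputs are too far apart.
// Purely illustrative; not part of any existing ROS 2 library.
int64_t derive_stamp_ns(std::initializer_list<int64_t> input_stamps_ns,
                        int64_t max_skew_ns)
{
  if (input_stamps_ns.size() == 0) {
    throw std::invalid_argument("no input timestamps");
  }
  const auto [oldest, newest] =
    std::minmax_element(input_stamps_ns.begin(), input_stamps_ns.end());
  if (*newest - *oldest > max_skew_ns) {
    throw std::runtime_error("input timestamps too disparate");
  }
  // One reasonable policy: stamp the output with the newest input.
  return *newest;
}
```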

We have traditionally shied away from automatically adding message subscriptions behind the scenes, but is that where we want to go on the clock? And do we have some other general node coordination data that should be part of the magic-message-subscribed-to-behind-the-scenes? Something like a machine ID and a wall clock timestamp so that the message delay could be estimated?

Thoughts? Progress? Do we have a list of out-of-the-box nodes that need this work?

As a preface: I am definitely not an expert in this field, and I’m not 100% versed in ROS2, so some of this may be off.

For the end goal, I think a good aim would be a local node that runs in the background and serves time to other nodes as they request it (similar to how roscore is launched if it isn’t running when using roslaunch). If this node is not running, then any given node should fall back to the system wall clock.

If the clock node is running, it should publish to a clock topic. If a clock topic is already being published, the clock node should instead subscribe and base its time calculations on that topic. This would also allow a clock node to run on multiple machines and let them semi-transparently sync to a common source (step time forward to the message furthest in the future by default?).

As for the message itself, I think a default rate of 1Hz should cover most uses. When one system connects to another, there will have to be some ping exchange to account for the network delay (and maybe the clock nodes should repeat this every so often to keep the delay estimate fresh). Set up well, it should be quite accurate at low frequencies while also inherently allowing much higher ones (perhaps 1000Hz would be overkill in most applications simply because it floods the link?).
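For reference, the usual NTP-style arithmetic over one such ping exchange looks like this; the transport and any filtering are left out, so treat it as a sketch:

```cpp
#include <cstdint>

// One request/reply exchange yields four timestamps:
//   t0: client sends request   (client clock)
//   t1: server receives it     (server clock)
//   t2: server sends reply     (server clock)
//   t3: client receives reply  (client clock)
struct OffsetEstimate
{
  double offset_ns;  // estimated (server clock - client clock)
  double delay_ns;   // round-trip network delay, excluding server processing
};

OffsetEstimate estimate(int64_t t0, int64_t t1, int64_t t2, int64_t t3)
{
  OffsetEstimate e;
  e.offset_ns = (static_cast<double>(t1 - t0) + static_cast<double>(t2 - t3)) / 2.0;
  e.delay_ns = static_cast<double>((t3 - t0) - (t2 - t1));
  return e;
}
```

Repeating the exchange periodically and filtering the estimates is what keeps the delay term honest over time.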

The approach the clock nodes use should follow something like NTP, and I feel that most of the diagnostic info shown by tools like chrony (https://docs.fedoraproject.org/en-US/Fedora/18/html/System_Administrators_Guide/sect-Checking_if_chrony_is_synchronized.html) would be key in letting the clock node automatically calculate clock skew and frequency skew.

I think having a multiplier set in the message would be a good call, as it would allow systems with conflicting expectations to reject sync messages (sure this adds more configuration, but it ensures your system is all on the same page). This would also allow you to run a stand-alone system in slower or faster time easily by specifying such parameters in the launch file.

Another advantage is that for systems that may disconnect and then reconnect, a scheme like this could allow a graceful handover in those moments without disrupting each machine’s local ROS clock node.

Another improvement may be a “time-since-node-start” function (rather than just the current ros::Time() wall time), so that nodes which rely on time differences for loop control don’t panic as time syncs externally. It would also allow ROS_INFO to log with the synced wall time while the node runs on its own “clock” for relative time calculations.
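A rough sketch of that relative-time part, using a monotonic clock so that external syncs never make it jump (illustrative only):

```cpp
#include <chrono>

// Node-local "time since node start" that is immune to external time syncs,
// for loop control and other relative-time calculations. Illustrative only.
class NodeUptime
{
public:
  NodeUptime() : start_(std::chrono::steady_clock::now()) {}

  std::chrono::nanoseconds since_start() const
  {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
      std::chrono::steady_clock::now() - start_);
  }

private:
  std::chrono::steady_clock::time_point start_;
};
```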

Hopefully that makes enough sense to be useful for you!

The other thing I need to mention about the time message that has to change: two fields for the timestamp. Two fields, really? Is there some advantage to that? A single 64-bit nanosecond value would be infinitely more handy in my mind. I can read and write that atomically with no extra work on common x64 platforms. I’ve had to resort to std::atomic with the current time message just so I don’t accidentally read a nanosec value with the wrong sec value. It’s not pretty.
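To spell out the problem: with two fields a reader can see a fresh sec paired with a stale nsec unless the pair is guarded, while a single 64-bit count needs nothing more than a plain std::atomic. A minimal illustration (not real message code):

```cpp
#include <atomic>
#include <cstdint>

// Current layout: two fields. Without a mutex, a seqlock, or wrapping the
// whole struct in std::atomic, a reader can combine `sec` from one update
// with `nsec` from another.
struct TwoFieldStamp
{
  int32_t sec;
  uint32_t nsec;
};

// Single 64-bit nanosecond count: one load/store is enough, and it is
// lock-free on common 64-bit platforms.
std::atomic<int64_t> latest_stamp_ns{0};

void publish_stamp(int64_t now_ns)
{
  latest_stamp_ns.store(now_ns, std::memory_order_release);
}

int64_t read_stamp()
{
  return latest_stamp_ns.load(std::memory_order_acquire);
}
```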

x64 platforms are increasingly common, but an explicit use case for ROS2 is small, embedded microprocessors that are not 64 bits.

It would seem that rclcpp::Time::now(), a static method, is not what we want if that method is built from subscribed data. Instead, it would need to live on the node: you would call a node instance method to get the current time, which would be computed from the subscribed clock data, or pulled from the system clock if there is nothing to compute it from. That way, the node could subscribe to whatever global messages it wants behind the scenes.
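Something like this sketch, where now() is an instance method and the /clock subscription (if any) is an implementation detail hidden inside the node; this is not the actual rclcpp API:

```cpp
#include <chrono>
#include <cstdint>
#include <optional>

// Node-scoped time source: use the externally supplied time (e.g. from a
// behind-the-scenes /clock subscription) when available, otherwise fall
// back to the system clock. A sketch, not the real rclcpp interface.
class NodeTimeSource
{
public:
  // Would be called from the node's hidden clock subscription, if one exists.
  void set_external_time_ns(int64_t ns) { external_ns_ = ns; }

  int64_t now_ns() const
  {
    if (external_ns_) {
      return *external_ns_;
    }
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
      std::chrono::system_clock::now().time_since_epoch()).count();
  }

private:
  std::optional<int64_t> external_ns_;
};
```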

Thanks for the suggestion. For those of you who haven’t seen it, @BrannonKing put together a PR for the ROS2 Time design document.

If you’re interested in this topic I’ve replied in detail at: https://github.com/ros2/design/pull/128#pullrequestreview-35463951

At a high level I don’t think that we want to overload the simulation /clock abstraction to support system time synchronization as well. Clock synchronization is something that is quite specialized and there are many mature mechanisms for doing that already. We don’t want to have to provide our own implementation when we can leverage existing technologies such as Network Time Protocol.

For the use case of a system that starts without a system clock, it’s relatively straightforward to have it run ntpdate or chrony on startup to get it to the right system time.

Sorry it has taken so long for me to reply here, but I just wanted to make some short replies to help connect different parts of the discussion.

No you’re right about that. We’ve thought about it enough to start the design doc and to implement the parts we believe are necessary to support ROS 1 style sim time in rcl’s C code, but we’ve stopped short of implementing ROS 1 style sim time (using a /clock topic) or exposing it in C++ and making an example to demonstrate how it would work.

I agree, this is a crucial feature in ROS 1 and I think we need it in the core of ROS 2 as well.

You’re right, and thanks for starting to discuss this related topic there (@tfoote already referenced it too): enhanced time article to include realtime factor by BrannonKing · Pull Request #128 · ros2/design · GitHub

I think we can continue to discuss the options for /clock (or a similarly named topic) in that PR.

There’s a lot more to comment on in @BrannonKing’s original post, but I’ll get back to that in another post or on the referenced PR.

I agree with you.

If you look at the implementation of time in rcl, it uses a single uint64_t to store nanoseconds for a time point and a single int64_t to store nanoseconds for a duration. I did this after much research and after talking with a few people at OSRF.

I did not, however, get the chance to write down all the justifications so that I could argue that the time message be switched from the current system of two 32-bit ints to a single 64-bit int. There are some trade-offs, but I think it’s a good path forward. Unfortunately there would be a lot of inertia to overcome when changing this, because changing the time layout would affect lots and lots of code in ROS 1 I think. This isn’t a reason in itself not to do it, but it does mean we need to be sure it is necessary and properly justify it and provide decent migration options.
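For what it’s worth, the conversion between the two layouts is mechanical, which should make migration shims easy to provide (the field and function names here are just illustrative):

```cpp
#include <cstdint>

// ROS 1 style stamp: seconds plus nanoseconds within the second.
struct SecNsec
{
  int32_t sec;
  uint32_t nsec;  // always < 1000000000
};

// Two 32-bit fields -> single signed 64-bit nanosecond count.
int64_t to_ns(const SecNsec & t)
{
  return static_cast<int64_t>(t.sec) * 1000000000LL + t.nsec;
}

// Single 64-bit nanosecond count -> two 32-bit fields.
// Assumes a non-negative time point (at or after the epoch).
SecNsec from_ns(int64_t ns)
{
  SecNsec t;
  t.sec = static_cast<int32_t>(ns / 1000000000LL);
  t.nsec = static_cast<uint32_t>(ns % 1000000000LL);
  return t;
}
```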

This is true, but even on machines without 64-bit instructions, the compiler will generate code to do the 64-bit math using 32-bit registers, and it will almost certainly be faster and more correct than what we would write ourselves. I think this is a commonly used but weak argument against using a 64-bit type for this.

Historically, time_t was 32-bit because it was conceived before machines and languages had 64-bit type support. This is an interesting read:

Some OSes have already expanded their underlying time storage to 64 bits. Either way, we have no good reason to stick with this layout for time in our messages.

But this is an argument that needs to be made formally, so I won’t dive deep into it now.

I’ll continue with @BrannonKing and @tfoote on the PR.