Experience with PTP (Precision Time Protocol) for mobile robots

Hello everybody, I would like to collect some ideas on using the Precision Time Protocol (IEEE 1588v2) on mobile robots. Ten years ago, most robots had only one computer and maybe a camera. Nowadays, there can be multiple computers, SBCs or Jetsons on a single robot, together with GigE cameras, lidars, IMUs and other equipment that benefits from having synchronized time.

We started with PTP on our robots quite recently and we still haven’t found an ideal solution. So I’m reaching out to the community to collect ideas. I’ll describe my view of how the time sync should work. Please comment on it if you do something differently! This topic is not exactly ROS-specific, but I guess a lot of people working with ROS are facing the same kind of problems.


From my point of view, each mobile robot should behave as an isolated time island which synchronizes its island time via NTP with either the Internet (if possible) or with some other central unit running itself as an isolated time island (NTP does not like this, but can be persuaded). It can also use GPS time if available. The reason for using NTP “to the outside” is simple: PTP usually requires wired connections to work well, while most mobile robots are connected wirelessly, so it is not possible to use PTP to sync with the outer world. I think getting to the 10-100 ms precision offered by NTP is enough here.
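
For illustration, a chrony configuration along these lines could look like the sketch below. The addresses and subnets are made up, and the `local` directive is the part that “persuades” NTP to keep serving island time when no upstream source is reachable:

```
# /etc/chrony.conf on the robot computer acting as the time island (illustrative sketch)

# Upstream sources, used whenever the robot has connectivity:
pool pool.ntp.org iburst
# Optional: a central base station reachable over WiFi (hypothetical address):
server 192.168.1.1 iburst

# Keep serving time to the rest of the robot even with no reachable upstream:
local stratum 10 orphan

# Let the other onboard computers use this machine as their NTP server
# (hypothetical robot-internal subnet):
allow 10.0.0.0/24

# Step the clock quickly after boot instead of slewing for a long time:
makestep 1.0 3
```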

However, “inside” the robot, all time should be ideally synchronized very precisely (at microsecond level) to be able to fuse various sensor measurements. At the same time, the network architecture on mobile robots tends to be mostly static, which makes it very similar to what the Automotive profile defines (the Automotive profile is a PTP profile that discards the Best Master Clock Algorithm and allows selecting the master clock manually; it also adds configuration that allows faster convergence of the time sync). I would usually want one of the computers to be the NTP client synced with the outer world, and this computer should always act as PTP master to the rest of the robot. All other computers and sensors should be set to be PTP slaves.
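
A minimal sketch of how such a fixed-master setup could be expressed with linuxptp (ptp4l) is shown below. The interface name and interval values are illustrative, and this is not a published profile, just the combination of options described above:

```
# ptp4l.conf on the NTP-synced computer that should always be the PTP master
[global]
masterOnly           1      # port(s) never fall into the slave state
delay_mechanism      E2E    # any number of slaves can share one segment
network_transport    L2     # raw Ethernet; UDPv4 also possible if sensors need it
logSyncInterval      -3     # faster sync messages for quicker convergence
logAnnounceInterval  0

[eth0]
```

On the other computers, the corresponding slave config would use `slaveOnly 1` instead of `masterOnly 1`, and `phc2sys` is still needed to carry the time between the NIC clocks and the system clocks.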

So, that was the theory. Practically, we see several problems.

  1. PTP delay mechanism. PTP defines peer-to-peer (P2P) and end-to-end (E2E) delay mechanisms. P2P is probably better, but as the name suggests (peer-to-peer), there can only be 2 PTP endpoints on each network time segment. This is quite a limitation: either you can only have 2 synced devices on the whole robot, or you have to buy an expensive boundary clock or P2P transparent clock switch (or you have to put multiple network cards in the computers). The E2E delay mechanism, on the other hand, can serve any number of endpoints on one segment, but it is not used by the Automotive profile (gPTP and Automotive only allow P2P). Some devices and libraries tend to only provide whatever is published as an official PTP profile, so getting them to work with “Automotive on E2E” is not possible. E2E is usually supported by the default PTP profile, but that has BMCA selection enabled, which is not suitable on a mobile robot (you don’t want your clocks desynced by a rogue device claiming to be a master just because there was a network hiccup).
  2. Switches. As said above, P2P between more than 2 devices requires a boundary clock switch. And even with E2E, it is better if the switch has at least the transparent clock functionality, which greatly decreases the time needed for synchronization. But it seems almost nobody makes small PTP-enabled switches. Mobile robots usually can’t carry rack-mount beasts. We’ve only found BotBlox, who make high-performance miniature switches. GigaBlox Rugged (unavailable due to the chip shortage until '23; 4 gigabit ports, roughly $300 if chip prices go back to normal) should work as an E2E transparent clock (but the firmware still has to be fixed for this to really work). GigaStax Rugged (first experimental batch in May '22; 5 gigabit ports, roughly $400) should work as a P2P and E2E transparent clock and can also be configured as a boundary clock. These are the only 1588-aware switches of reasonable size and price we have found. It also seems to me that there are two types of transparent clocks: those only filling the correction field in passing packets, and those that can act as P2P peers. So not all TC switches are compatible with the P2P protocol.
  3. Network cards. Not all network cards support packet timestamping. Figuring out which card supports software or hardware timestamping is almost impossible (maybe by examining the Linux kernel, but even then, most NIC manufacturers do not tell you which NIC chip they used). It would be really useful to have an online database of ethtool -T outputs which you could search when you’re looking for the right network card, but I haven’t found one (see the example output after this list).
  4. When is sync ready? The last problem (but a smaller one) is how to tell that all devices are in sync and operation of the robot can begin. The PTP protocol does not define this important aspect. The official way is “look at the master offset on slaves and decide whether its mean and standard deviation are in an acceptable range”. But even then, PTP clients such as linuxptp cannot do this for you automatically. The best thing I have found so far is running pmc in a loop, querying the client (or the whole network) and manually analyzing the received offsets (see the sketch after this list). I haven’t found a ROS node that could do this for me and e.g. publish a diagnostics message.
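
Regarding point 3, the quickest local check is ethtool -T. On a NIC with full hardware timestamping support, the output looks roughly like this (abbreviated, illustrative example; the important parts are the hardware-transmit/receive capabilities and the presence of a PTP Hardware Clock):

```
$ ethtool -T eth0
Time stamping parameters for eth0:
Capabilities:
        hardware-transmit     (SOF_TIMESTAMPING_TX_HARDWARE)
        software-transmit     (SOF_TIMESTAMPING_TX_SOFTWARE)
        hardware-receive      (SOF_TIMESTAMPING_RX_HARDWARE)
        software-receive      (SOF_TIMESTAMPING_RX_SOFTWARE)
        software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
        hardware-raw-clock    (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 0
Hardware Transmit Timestamp Modes:
        off                   (HWTSTAMP_TX_OFF)
        on                    (HWTSTAMP_TX_ON)
Hardware Receive Filter Modes:
        none                  (HWTSTAMP_FILTER_NONE)
        all                   (HWTSTAMP_FILTER_ALL)
```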
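
Regarding point 4, a rough sketch of the pmc-in-a-loop approach is below. The threshold and polling period are made-up values to adapt to your setup; a proper tool would also look at the standard deviation over a window and at the port state:

```bash
#!/bin/bash
# Poll the local ptp4l instance via pmc and report the current offset from the
# PTP master. Illustrative sketch, not a drop-in tool.
THRESHOLD_NS=10000   # consider ourselves "in sync" below 10 us (made-up value)

while true; do
    # CURRENT_DATA_SET contains the offsetFromMaster field (in nanoseconds)
    offset=$(pmc -u -b 0 'GET CURRENT_DATA_SET' 2>/dev/null \
             | awk '/offsetFromMaster/ {print $2; exit}')
    if [ -n "$offset" ]; then
        abs=${offset#-}    # drop the sign
        abs=${abs%.*}      # drop a possible fractional part
        if [ "$abs" -lt "$THRESHOLD_NS" ]; then
            echo "in sync (offsetFromMaster = ${offset} ns)"
        else
            echo "NOT in sync yet (offsetFromMaster = ${offset} ns)"
        fi
    else
        echo "no response from ptp4l"
    fi
    sleep 1
done
```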

When doing the research, I sketched this table that helps me reason about the various official PTP profiles:

| Profile | BMCA | Delay mech. | Layer |
|---|---|---|---|
| Default | Yes | P2P/E2E | L2/L3 |
| gPTP (802.1AS) | Yes | P2P | L2 |
| Automotive | No | P2P | L2 |
| Autosar | No | P2P | L2 |
| LXI | Yes | P2P | L3 |
| IEC 62439-3 L2P2P | Yes | P2P | L2 |
| IEC 62439-3 L3E2E | Yes | E2E | L3 |
| Power Profile | Yes | P2P | L2 |
| GigE Vision | Yes | P2P/E2E | ? |
| My ideal | No | E2E | L2 |

I have also seen some newer mentions of unicast PTP, but we haven’t experimented with that too much and it also seems that support in sensors is not yet there (Basler ace2 cameras seem to be among the first ones).

This reading was quite useful to me when doing my research:

11 Likes

I’ll chime in here with what little experience I have using PTP with an Ouster lidar. At the time my understanding was that you could still use PTP in “software only” mode and achieve sync in the 10-100 µs range, which was good enough for my application at the time. You probably won’t like this answer, but in the end I’ve always found it easier to sync lidars using a PPS input + an INS or GNSS receiver. Same with cameras: most INSs on the market have “trigger events” and can capture the timestamp of those rising-edge signals.

For switches I’ve used the Techaya boards in the past. They work great, although some of them require a heatsink, so you’ll have to be careful about that.

2 Likes

I’ve used PTP quite a lot in the past. @AndreTrimble hit the nail on the head. In addition, what you need will be pretty tied to your hardware setup and system architecture. If you’re lucky, you can get away with the software only approach and having a PTP daemon on each machine.

Thanks for your experience. What do you mean by the software-only PTP? I know you can have NICs doing software timestamping, but that still does not remove the requirement for PTP-aware switches.

Consider the use case I had, where an Ouster lidar was directly connected to my computer and synced via PTP; no special switch was required. What I mean by “software only” is that you can still run PTP with a regular switch, but you won’t get the best accuracy because you won’t be able to measure the processing time introduced by the switch.

Really depends on your application.

This thread is informative for time synchronization, and I’m adding here some ideas from experience.

There are many different methods for time synchronization across clock domains, and we use the application to derive the requirements for the accuracy, drift, and jitter of time measurements.

NTP is very useful when we need a wall clock measurement for an application, or recorded data.

For synchronization between computers for communication, we rely on PTP, as the network topology is fixed. Software PTP allows for broader compatibility across hardware devices, but comes with more drift and jitter in the synchronized clock compared to PTP in hardware. For most of the high-level applications (e.g. monitoring) this can be sufficient.
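
As an illustration, if software PTP here means ptp4l using software timestamping, the slave side can be run like this (interface name illustrative); in that mode ptp4l disciplines the system clock directly, so no separate phc2sys step is needed:

```
# Slave with software timestamping on a NIC without a usable PHC (illustrative):
ptp4l -i eth0 -S -s -m
```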

When tighter accuracy is needed, PTP is implemented in a combination of hardware and firmware; this has the added cost of verifying interoperability between the hardware components. There is effort involved in having the computer, switches, and sensors all support PTP in HW.

For sensors, time synchronization methods are more fragmented. We strive to have the highest accuracy for the acquisition time of the sensor data when we need it. Acquisition time is when the sensor reads measurements (i.e. photons or waves); we measure it where possible, and otherwise derive it from the arrival time at the computer. When we are unable to measure acquisition time, we strive for the lowest jitter on the arrival time of the sensor data at the computer to more reliably derive the acquisition time.

In practice, for sensors we timestamp key events using timestamping hardware in our computer with 2 µs accuracy. We use this hardware timestamp as the singular common clock source to align the multiple concurrent clock domains where acquisition and arrival time measurements occur. Timestamps are captured by the hardware for the Linux kernel clock, PTP, PPS, and several camera interface events (i.e. frame sync, frame start, frame stop), so we can convert time between the various fragmented clocks. This is the most effort-intensive method for time synchronization, as it’s used to provide the highest accuracy for recorded sensor streams, which are inputs to sensor fusion functions, offline development of perception systems, and offline re-simulation of open-loop perception systems on real data for testing. Accuracy of acquisition time for these operations correlates with accuracy in the three dimensions of perception.

LIDAR is used with PTP over Ethernet, with PPS measuring acquisition time. In indoor applications, the computer generates the PPS, as GPS does not work there. In both cases we hardware-timestamp the PPS to accurately derive acquisition time. RADAR is typically PTP on Ethernet, and hardware-measured on a CAN interface, both providing arrival time. Camera and IMU are very precise with timestamps in hardware, as we connect them directly to the computer; frame synchronization is used to trigger all the cameras to capture at the same time for an aligned observation of data.

Hardware timestamping mechanisms allow the platform to maintain accuracy under heavy load.

The accuracy of time synchronization depends on the application.

2 Likes

@peci1 thanks for sharing your thoughts!

but that still does not remove the requirement for PTP-aware switches.

Do you actually need a Transparent Clock switch/router for the following use cases?

However, “inside” the robot, all time should be ideally synchronized very precisely (at microsecond level) to be able to fuse various sensor measurements.

It sounds like only one network segment is required inside the robot network?

In this case, would master and slave clocks with E2E be precise enough? What about making the network switch the master clock in this domain?

I was thinking that a Transparent Clock would only be required for a much bigger network configuration with multiple network switches.

Great writeup, @ggrigor! So do you have a specific extra piece of hardware for the timestamping, or is it something already present inside the computer? And is your description for a system with just one “fully-fledged” computer, or do you have more computers on the robot?

1 Like

Thanks for sharing your ideas, Tomoya!

I think E2E can work over any switch; TC switches just make the sync more precise. The problem can be with P2P: officially, the peer delay messages should not be forwarded by any switch. So if you want to sync P2P over a switch, you should have at least a P2P TC-enabled switch. However, we’ve found out some switches are so dumb they do not know these packets should not be forwarded, and forward them as any other packets. That is why P2P might sometimes work over dumb switches. But in any case, P2P over a dumb switch means only 1 master and 1 slave can be present. Otherwise, the master yells at you and refuses to work.

I think E2E would usually be enough. The problem is there is no published PTP profile with E2E and without BMCA. We really do not want BMCA. And as mentioned earlier, some sensors only offer PTP configs that correspond to some official profiles.

Do you know a small switch capable of acting as master clock? :slight_smile: Moreover, if you’d want to sync the PTP master with an outside NTP server/GPS receiver, this might rule out most switches…

With E2E, it is technically not required. It might just happen that the jitter on a bigger network will be too high to get a good sync.

Again the mysterious “software PTP” :slight_smile: What is that? Do you refer only to the timestamping mode of the network cards, or to something completely different? Is there some “PTP over IP”? AFAIK, the PTP packets would still be the same, and if a switch on the path knows that some of them should not be forwarded, it will not forward them, thus making any sync attempts impossible.

It is present in the computer. The hardware timestamping engine is called GTE in the Xavier & Orin SoCs, as a feature of the Jetson and DRIVE platforms.

It’s a question of implementation; is the hardware designed with the clock and synchronization directly, or is it supported through a combination of drivers & firmware on hardware adapted to support it.

Thanks

Sorry … missed this one.

It depends.

On less compute-intensive platforms we use a single computer. On others we run more than one computer. The distributed systems are typically used to increase compute for demanding applications, or to provide fault tolerance/safety for higher levels of autonomy, where perception systems are distributed across computers to allow completion of a minimal risk maneuver on failure. The connection between computers is GigE or faster.

Hi,

I’ve used PTP on this handheld SLAM device, to synchronize the host, lidar and camera.



I feel that PTP and ROS are not directly related; it is managed by the host. I did not use a switch and configured ptp4l (linuxptp) directly on the host. The camera uses the GigE Vision 2.0 protocol, so IEEE 1588 is supported.
On WiFi there is a similar protocol called WiFi Fine Time Measurement (WiFi FTM), but it is mainly used for WiFi ToF positioning. WiFi-based PTP is supported by the openwifi project.
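
For reference, a direct host-to-sensor setup like this can be run with something along these lines (interface name illustrative; the phc2sys step assumes the NIC does hardware timestamping and that the host gets its wall time from elsewhere, e.g. NTP):

```
# Host acts as PTP master on the sensor-facing interface (illustrative):
ptp4l -i eth0 -m &
# Keep the NIC's hardware clock (PHC) following the system clock, so the
# timestamps distributed to the sensors match the host's wall time:
phc2sys -s CLOCK_REALTIME -c eth0 -w
```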

Thanks

Lots of good comments above in the topic.

My experiences with PTP are very positive while using it with ROS 2 in fully distributed systems (and though not for mobile robots specifically, they are easily applicable to various interconnected SBCs within your robotic system), especially when using hardware PTP implementations (either through hardware acceleration or via dedicated subsystems in the SoCs). Part of our work was published using PTP in modular ROS 2-based robots: Time Synchronization in modular collaborative robots.

The article above shows how using a tiny and inexpensive AMD Zynq 7000-based SoM module you can obtain a) sub-microsecond clock synchronization accuracy, b) ROS 2.0 timestamping accuracies below 100 microseconds and c) bounded end-to-end communication latencies between 1-2 milliseconds.

2 Likes

I have synchronized radars, lidar and cameras in our autonomous driving research vehicle. I had to use various protocols, because it all depends on your hardware. For example, the Velodyne HDL-64E S3 only accepts NMEA+PPS; the PTP option is not available. For the radars, I had to use gPTP/Automotive. It is also a matter of purchasing the right hardware, such as switches that support PTP and Ethernet devices with hardware timestamping.

I do not remember experiencing any issues with running multiple P2P endpoints.

You should also be able to see when the sync is ready with Wireshark. The packets contain the delays.
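
For example, something like this (illustrative) captures the PTP traffic on an interface so you can inspect the announce/sync/delay messages and their correction and offset fields:

```
# PTP over L2 uses EtherType 0x88F7; PTP over UDP uses ports 319/320:
tshark -i eth0 -f "ether proto 0x88F7 or udp port 319 or udp port 320"
```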

Hi,
we use PTP primarily for syncing multiple PCs and GigE cameras. Our setup is that we have one master PC for control and multiple vision PCs. Note that our GigE cameras only support automatic clock selection.
For our use case we use it in this way: the control PC is configured as Grandmaster. The vision PCs sync themselves to the control PC. The vision PCs themselves act as Grandmaster on the network interfaces towards the cameras. This ensures that the cameras sync to the PC and not to some other camera…
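
I don’t know if it matches your exact setup, but with linuxptp this kind of vision PC (slave towards the control PC, master towards the cameras) could be sketched as a boundary-clock-style config like the one below. Interface names and priority values are made up, and cross-NIC details (e.g. boundary_clock_jbod, phc2sys between the two NICs) are left out:

```
# Hypothetical ptp4l config for one vision PC:
#   eth0 = uplink to the control PC, eth1 = camera-facing interface
[global]
priority1   200     # worse than the control PC's priority1, so we lose BMCA upstream

[eth0]

[eth1]
masterOnly  1       # never let a camera become master on this segment
```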
We only use network cards with hardware support. Datasheets usually state if they support this; sometimes it’s written as PTP and sometimes as IEEE 1588. If either of these terms was present in the datasheet of the card, hardware PTP worked out of the box with linuxptp for us. If in doubt, just buy Intel cards.

Something that I would not recommend is having NTP running together with PTP. This always resulted in non-monotonic clocks and problems with TF.

We also use dumb switches. I guess as long as there is only one subnet/master on the switch, there should be no problem. AFAIK you’ll only need boundary clock switches if you need to get your master into a different subnet.

2 Likes

Thank you for your inspiring description, @JM_ROS. This looks like an almost ideal setup if you can afford fitting your robot with PCs that have PCIe slots. We’re looking for more integrated solutions like the Intel NUC, where we can’t really choose the network cards. And USB network cards are a big no-no (not only because they don’t usually have HW PTP). We also run Jetson AGX Xavier devkits, which have a PCIe slot, but as there is no good fixing/screwing point for the cards, they are just “floating” around and the connection is not as secure as in a desktop box (we made some 3D prints to hold the cards, but still…).

I’m interested in this. Is it the case that multiple cameras are connected to a single ethernet port on the NIC via a dumb switch? Then, according to our experience, P2P delay mechanism should not work, while E2E should work. Can you confirm either of these cases?

I’m interested in this. Is it the case that multiple cameras are connected to a single ethernet port on the NIC via a dumb switch? Then, according to our experience, P2P delay mechanism should not work, while E2E should work. Can you confirm either of these cases?

We once had to do a cowboy solution to sync the cameras of a third-party system. I’m 99% sure that this was E2E. Anyway, the setup was like this: one vision PC with 4 NICs running Windows, each NIC connected to a GigE camera. The problem we ran into was that the third party was not able to set up its (Windows) PC to use PTP. So in the end, we came up with the solution to plug all cameras into one switch, and all 4 NICs into the same switch, the idea being that the switch would route the traffic correctly between the cameras and the NICs. Additionally, we connected our Linux box running linuxptp to the switch (a 16-port non-managed TP-Link). This super goofy setup actually worked. I don’t remember if we put all cameras into the same subnet or if they were on individual subnets, though.

Okay, I like (and do!) this kind of cowboy solution :slight_smile: Thank you for confirming that E2E was used in this case. I’m still surprised E2E is kind of frowned upon (gPTP, audio-video bridging, Automotive and other profiles strictly require P2P).

Are you able to share the ptp4l configuration file (and any other services you used like chrony or phc2shm)? We have the exact same set of sensors (Livox + Mako) but are having issues setting up PTP with them.