Experiment to inhibit DDS and ROS2 child threads

Abstract

In our previous post, we reported that DDS child threads and their scheduling policy affect wake-up latency.
In summary, the RR-TS setting gives better wake-up latency than RR-RR, where

  • RR-TS means that the main thread uses Round-Robin scheduling (real-time) and the child threads use Time-Sharing scheduling (non-real-time),
  • RR-RR means that both the main thread and the child threads use RR.

At the end of that post, we said we would report on the effect of a higher-priority process on communication in the next post.

This post is that report.

Motivation
Under RR-TS, child threads stop running if another thread or process has a higher priority than they do.
If the DDS implementation performs any part of its communication in a child thread, DDS cannot send topics in such a situation.

Therefore, we ran an experiment in which a higher-priority process inhibits the child threads.
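
The starvation described above can be illustrated with a toy model of fixed-priority preemptive scheduling on a single core. This is only a sketch under simplifying assumptions: the `simulate` function, the task names, and the work amounts are hypothetical, a TS thread is modeled simply as priority 0, and real Linux behavior (RT throttling, blocking I/O, multiple cores) is ignored. The priorities 98 and 90 are the ones used later in this post.

```python
def simulate(tasks, ticks):
    """Run a single-core, strict-priority schedule for at most `ticks` ticks.

    tasks: dict name -> (priority, work_ticks); the higher priority always wins.
    Returns dict name -> tick at which the task finished (None if it never did).
    """
    remaining = {name: work for name, (_, work) in tasks.items()}
    finished = {name: None for name in tasks}
    for tick in range(1, ticks + 1):
        runnable = [name for name in tasks if remaining[name] > 0]
        if not runnable:
            break
        # The runnable task with the highest priority gets the core for this tick.
        current = max(runnable, key=lambda name: tasks[name][0])
        remaining[current] -= 1
        if remaining[current] == 0:
            finished[current] = tick
    return finished


# Hypothetical workload: work amounts are arbitrary tick counts.
tasks = {
    "main thread (RR 98)": (98, 3),
    "inhibition program (RR 90)": (90, 60),
    "DDS child thread (TS)": (0, 5),
}
print(simulate(tasks, ticks=100))
```

In this model the child thread only gets the core after the inhibition program has finished, which is the situation the experiment below tries to provoke.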

Experiment Condition

Our setup is as follows:

  • Software stack (see the previous post for details)

    • Hardware: Raspberry Pi B+ with careful tuning
    • OS: Ubuntu 18.04, kernel 4.19.55-rt24-v7+
    • ROS distro: ROS 2 Eloquent Elusor
  • Benchmark program

    • Ping-pong program.
      The ping sender wakes up periodically, publishes to the ping topic, and listens on the pong topic.
      The pong sender subscribes to the ping topic and replies on the pong topic.
    • We implemented the ping-pong program in 3 patterns.
      • 1 process with 1 node (1e1n)
      • 1 process with 2 nodes (1e2n). The nodes are a ping-sender node and a pong-sender node.
      • 2 processes (2e): a ping-sender process and a pong-sender process.
      • In every pattern, I used SingleThreadedExecutor.
  • Inhibition program

    • A program that applies bitwise_not to a large image
  • Metrics: we measure the following in the benchmark program

    • (M1) Time until the benchmark program ends
    • (M2) Time from wake-up to ping sent:
      This increases when publish is blocked,
      because rclcpp::Publisher::publish calls the blocking function rcl_publish.
    • (M3) Ping-pong RTT:
      If rcl_publish blocks, this metric degrades too.
      If rcl_publish does not block but this metric is still bad, publishing the pong may be blocking.
    • (M4) Difference between the previous wake-up time and the current wake-up time:
      We check this to detect sleeps that are too long.
  • Experimentation Protocol

    • Run the benchmark program with the RR-TS schedule. The RR priority is 98.
      Immediately afterwards, run the inhibition program with the RR schedule on the same core. Its priority is 90.
      So the scheduling priority order is “main thread” > “inhibition program” > “DDS child thread”.
    • We tuned the benchmark program and the inhibition program so that each finishes within 1 minute,
      so if the child thread is completely stopped by the inhibition program, the benchmark program takes about 2 minutes to finish.
    • We measured 9 times under each condition and summarized the results.
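
To make metrics (M2) to (M4) concrete, here is a minimal sketch of how they could be derived from per-loop timestamps. It is not the benchmark's actual code: the `compute_metrics` function, the field names `wake`, `sent`, and `pong`, and the choice to measure the RTT from "ping sent" to "pong received" are all assumptions made for illustration.

```python
def compute_metrics(loops):
    """Summarize per-loop timestamps (all in microseconds).

    loops: list of dicts with hypothetical keys
      'wake' - time the ping sender woke up
      'sent' - time publish() returned for the ping
      'pong' - time the pong was received
    """
    # (M2) wake-up to ping sent: grows if publish blocks.
    m2 = [loop["sent"] - loop["wake"] for loop in loops]
    # (M3) ping-pong round trip, measured here from ping sent to pong received.
    m3 = [loop["pong"] - loop["sent"] for loop in loops]
    # (M4) interval between consecutive wake-ups: grows if a sleep overruns.
    m4 = [cur["wake"] - prev["wake"] for prev, cur in zip(loops, loops[1:])]
    return {
        "M2_max": max(m2),
        "M3_max": max(m3),
        "M4_max": max(m4) if m4 else 0,
    }


# Made-up sample data for a 10 ms period, only to show the call.
sample = [
    {"wake": 0, "sent": 100, "pong": 1100},
    {"wake": 10000, "sent": 10150, "pong": 13150},
    {"wake": 20500, "sent": 20600, "pong": 21000},
]
print(compute_metrics(sample))
```

Looking at the worst value of each metric separately is what lets us tell a blocked ping publish (M2) from a blocked pong publish (M3) or an overlong sleep (M4).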

I plan to publish the code of the benchmark program and the inhibition program.
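
The actual inhibition program is not shown in this post. As a rough stand-in, the following sketch shows the kind of CPU-bound work it performs: repeatedly inverting every bit of a large 8-bit image buffer. The names `bitwise_not` and `inhibit`, the buffer size, and the repeat count are hypothetical, and the real program may well use an image library instead of plain bytes.

```python
# Lookup table mapping every byte value to its bitwise complement.
_NOT_TABLE = bytes(255 - value for value in range(256))


def bitwise_not(image):
    """Return a copy of an 8-bit image buffer with every bit inverted."""
    return image.translate(_NOT_TABLE)


def inhibit(width, height, repeats):
    """Keep the CPU busy by inverting a width x height grayscale image."""
    image = bytes(width * height)  # all-zero (black) image
    for _ in range(repeats):
        image = bitwise_not(image)
    return image


if __name__ == "__main__":
    # Hypothetical size and repeat count; tune them so the run lasts long enough.
    result = inhibit(1920, 1080, repeats=10)
    print(len(result), result[0])
```

In the experiment, the real program is additionally run with the RR schedule at priority 90 on the same core as the benchmark; that part is not reproduced here.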

Result Summary

Our results are as follows.
With 1 executor, neither the 1-node nor the 2-node pattern was affected.

Number of executors | DDS        | Result
1                   | CycloneDDS | no effect from the inhibition program
1                   | FastRTPS   | no effect from the inhibition program
2                   | CycloneDDS | no effect from the inhibition program
2                   | FastRTPS   | 3 effect patterns

The details are described below.

1 executor: CycloneDDS and FastRTPS

  • Result

    • The benchmark program finished in about 1 minute.
  • Comment

    • With 1 executor, topic communication appears to be implemented as intra-process communication.
      We captured the network devices with Wireshark and found that no data packets were exchanged.
      It seems that neither UNIX sockets nor shared memory are used; a “copy” or “move” may be used internally in DDS.

2 executors with CycloneDDS

  • Result

    • The benchmark program finished in about 1 minute.
  • Comment

    • The order in which the programs start is important.
    • When we start the inhibition program before the benchmark program, the benchmark program blocks until the inhibition program finishes.
      (I think the child thread is used for negotiation or discovery.)

2 executors with FastRTPS

There are 3 patterns.

  • (1) ping blocks
    • The benchmark program takes 2 minutes to finish.
    • More precisely, one specific ping takes about 1 minute to send.
  • (2) 2 minutes, for an unknown reason
    • The benchmark program takes 2 minutes to finish.
    • But the reason is unknown.
      (M2) The worst latency from wake-up to ping sent is on the order of 100 [us], almost the same as without the inhibition program.
      (M3) The worst ping-pong RTT is 3 ms, which is no problem.
      (M4) The worst value is 1 second, which is too large because the ping sender wakes up with a 10 ms period.
      But this worst value occurred only once, so it is not the cause.
  • (3) no effect
    • The benchmark program finished in about 1 minute.

For (1), I plotted (M2), i.e. the time between wake-up and ping sent, with and without the inhibition program.
The y-axis is the value of (M2), and the x-axis is the loop count.
With the inhibition program, the maximum (M2) becomes 59,736,187 [us] at x ≈ 500, which is almost 1 minute.

  • Without inhibition program
    tw_2exec_pub_ping_pong_1_rrts_without_task97.log
  • With inhibition program
    tw_2exec_pub_ping_pong_1_rrts_with_task97.log
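
The blocked ping in pattern (1) shows up as a single extreme sample in the (M2) series. A minimal sketch of how such a sample could be located is given below; the `worst_sample` function is hypothetical and assumes the (M2) values have already been parsed from the log into a list of microsecond values, since the log format is not described here. The list in the usage example is made up except for the value 59,736,187 [us] reported above.

```python
def worst_sample(m2_us):
    """Return (loop index, value in us, value in seconds) of the largest M2 sample."""
    index = max(range(len(m2_us)), key=m2_us.__getitem__)
    return index, m2_us[index], m2_us[index] / 1_000_000


# Made-up series around the one reported worst value.
series = [120, 95, 59_736_187, 110]
index, value_us, value_s = worst_sample(series)
print(f"loop {index}: {value_us} us = {value_s:.1f} s")
```

Converting to seconds makes it easy to see that the one blocked ping accounts for almost the entire extra minute.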

Patterns (1), (2), and (3) occurred 2, 2, and 5 times, respectively.
My hypotheses are:

  • Child threads are used for control communication in FastRTPS
    • If child threads were used for data packets, (1) would happen every time.
  • So does timing matter?
    If the inhibition program and the control communication run at the same time, pattern (1) or (2) happens.

Investigating the cause of (2) is future work.

Conclusion

We examined how a higher-priority process affects RR-TS child threads.

  • If you use only 1 process, you don’t need to care.
  • If you use multiple processes, don’t run a process/thread with higher priority than the child threads on the same CPU core.
    For example, use RR(97) for the main thread, RR(97 or 96) for the child threads, and a lower priority or TS for other processes/threads.
    • If the CPU cores are different, there is no effect (I didn’t cover this in this post).
  • I think MultiThreadedExecutor, or other executors under development such as the cbg or let executor, may use threads for callbacks and use priorities (as I remember, the cbg executor uses priorities).
    I don’t know whether such executors take DDS child thread scheduling, priority, and CPU core into account, but I hope our posts will be useful.
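
The rule of thumb above can be written down as a small check over a planned thread layout. This is only a sketch: the `Thread` tuple, the `rank` and `starvation_risks` helpers, and the example layout are hypothetical, and a TS thread is modeled as ranking below every RR thread regardless of its nice value.

```python
from typing import NamedTuple


class Thread(NamedTuple):
    name: str
    core: int
    policy: str    # "RR" (real-time) or "TS" (time-sharing)
    priority: int  # 1..99 for RR; ignored for TS


def rank(thread):
    """Any RR thread outranks every TS thread; RR threads are ordered by priority."""
    return thread.priority if thread.policy == "RR" else 0


def starvation_risks(child, others):
    """Names of other threads that can keep `child` off its CPU core."""
    return [
        other.name
        for other in others
        if other.core == child.core and rank(other) > rank(child)
    ]


inhibition = Thread("inhibition", core=0, policy="RR", priority=90)

# RR-TS as in the experiment: the DDS child thread can be starved.
print(starvation_risks(Thread("dds_child", 0, "TS", 0), [inhibition]))
# Child raised to RR(96) as suggested above: no risk from the same load.
print(starvation_risks(Thread("dds_child", 0, "RR", 96), [inhibition]))
```

The application's own main thread is deliberately left out of `others` here, because it is expected to run at the same or a higher priority than its child threads.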

Next post

We have finished our experiments on RR-RR vs RR-TS communication performance, so we will report on them in the next post.
