Memory growth in fleet adapter and traffic schedule (#664)

Posted by @cwrx777:

Hi,

ros2: humble
openrmf: iron-231229
fleet A size: 29 robots
fleet B size: 8 robots
total participants: 37

We’ve observed memory growth in the traffic schedule and fleet adapter components during task execution. After task completion, memory usage did not noticeably decrease. These nodes were running in a k3s environment. Other rmf components did not exhibit similar memory behavior over the same period.

The diagram below only shows fleet A and traffic schedule.

  • at 12 am (after daily rmf restart)
    • fleet A: 1GB
    • traffic schedule: 40MB
  • at 9pm
    • fleet A: 10 GB
    • traffic schedule: 4.2 GB

delivery schedule:

  • 6:20 am to 7:40 am
  • 9:30 am to 12pm
  • 1:40 pm to 6pm
  • 7:20 pm to 9 pm

For the traffic schedule node (orange line), there seems to be an upper limit of slightly more than 4 GB.

What makes both nodes not release memory when the robots are idle?

Posted by @mxgrey:

In general, programs don’t usually release memory from their heap back to the operating system, even when it’s no longer needed.

Since we’ve seen memory spikes during negotiations in the past, we added an explicit malloc_trim in two places:

  • When a task begins a new phase [1]
  • Every 5 minutes on a timer [2]
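
For context, malloc_trim is the glibc call that asks the allocator to return free heap pages to the operating system. Below is a minimal standalone sketch of what the periodic trim amounts to; the actual fleet adapter wires this into a ROS timer rather than a bare loop.

#include <malloc.h>

#include <chrono>
#include <thread>

int main()
{
  // Every 5 minutes, ask glibc to hand any unused pages at the top of the
  // heap back to the operating system. This cannot release memory that live
  // data structures are still holding onto.
  while (true)
  {
    std::this_thread::sleep_for(std::chrono::minutes(5));
    malloc_trim(0);
  }
}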

If the memory remains this high indefinitely, then that suggests something within the fleet adapter is still “using” the memory, perhaps inadvertently. To successfully release the memory, we’ll need to figure out which data structure specifically is accumulating so much of it. If you’re willing and able to help us introduce some additional logging into your system, maybe we can track it down.

I’ve opened a pair of PRs that log the size of open negotiation activity every 5 minutes:

If there are no active negotiations going on, then these logs should just say:

0 active negotiations, largest sizes:

This is what I would expect to be the case in general since negotiation data should get cleared out after each negotiation finishes. If you see this number growing instead of generally being 0, then that means there’s a race condition in the negotiation protocol which is failing to clear out the negotiations.

In my own testing, I found that this always goes back to zero after each negotiation is finished, but that could just mean that I haven’t managed to trigger the race condition.

Posted by @cwrx777:

@mxgrey

The memory chart in the first post was generated with the following code modified to always return no conflict here, as shown below. Will negotiation even be triggered?

//==============================================================================
std::optional<DetectConflict::Conflict> DetectConflict::Implementation::between(
  const Profile& input_profile_a,
  const Trajectory& trajectory_a,
  const DependsOnCheckpoint* deps_a_on_b,
  const Profile& input_profile_b,
  const Trajectory& trajectory_b,
  const DependsOnCheckpoint* deps_b_on_a,
  Interpolate /*interpolation*/,
  std::vector<Conflict>* output_conflicts)
{
  // CW: always assume there is no conflict
  return std::nullopt;

  if (trajectory_a.size() < 2)
  {
    throw invalid_trajectory_error::Implementation
          ::make_segment_num_error(
            trajectory_a.size(), __LINE__, __FUNCTION__);
  }

  ....

}

Posted by @mxgrey:

The memory chart in the first post was generated with the following code modified to always return no conflict here, as shown below. Will negotiation even be triggered?

With that modification you’re right, there should be no negotiations taking place, so there shouldn’t be any memory taken up by the negotiation system.

Traffic Schedule Memory

The most likely reason the traffic schedule node flattens out is that every minute we cull any schedule data that is more than 2 hours old, so it shouldn’t be able to grow unbounded unless there’s a bug in the culling mechanism.
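
As a rough illustration of the idea (this is not the actual rmf_traffic implementation), the cull amounts to erasing everything older than the retention window on each pass:

#include <chrono>
#include <map>
#include <string>

using Clock = std::chrono::steady_clock;

// Sketch of time-based culling. 'history' stands in for whatever
// per-participant change records the schedule database actually keeps.
void cull_old_entries(std::map<Clock::time_point, std::string>& history)
{
  // Drop every entry whose timestamp is more than 2 hours in the past.
  const auto cutoff = Clock::now() - std::chrono::hours(2);
  history.erase(history.begin(), history.lower_bound(cutoff));
}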

While auditing the implementation of the schedule database, I realized that we’re holding onto traffic history much longer than is really necessary. Essentially every change, including each delay signal, for every robot gets tracked until it is more than 2 hours old. This is done with the intention of making network traffic lighter so mirrors can be updated with diffs of the schedule instead of republishing the whole schedule repeatedly. However, the memory cost of this obviously isn’t scaling well.

I’ve added some commits to Reduce memory footprint of database by mxgrey · Pull Request #116 · open-rmf/rmf_traffic · GitHub that should drastically reduce how much history is retained in between the culls. I expect that with these changes the memory should stay at the MB scale rather than reaching GB, although I haven’t tested with the scale of robots that you’re using. If you can try it out and let me know how it goes, I would appreciate it.

Fleet Adapter Memory

In the fleet adapter, the schedule mirror shouldn’t be growing at all. It should reach a steady state very quickly and have much less memory usage than the traffic schedule node because it doesn’t attempt to retain any history. Whatever is happening in the fleet adapter, I expect it’s unrelated to what has been happening in the traffic schedule. I’ve added memory profile logging for the mirrors in this PR but I expect that to just tell us the mirrors aren’t growing.

So here are my best guesses for what the cause might be:

1. Task Log Growth

Everything inside rmf_fleet_adapter besides the traffic mirror and negotiations uses RAII, so every time a new task is started for a robot, the memory of its previous task should be cleared out. However, we are aware of an issue where task logs can grow very large for long-running tasks. This can even happen while a robot is idle and charging, since charging is still considered a task and may produce logs.

If you’re willing to test this out, you could apply this patch to rmf_task, which will block the task logging mechanism entirely. Virtually all task logging will still get sent to the ROS logger, so you’ll still be able to find the information in the stdout of the fleet adapter; it just won’t go to the operator dashboard. If you find that this is the cause of the memory growth, then we can discuss options for addressing it.

2. Python memory leaks

Another possibility to consider is whether there are memory leaks happening in your integration code. This is especially risky if you are integrating with Python since most likely you wouldn’t be using weakref to manage circular dependencies. I don’t have personal experience with debugging memory leaks in Python, but there seem to be some options available.

3. Something else??

Unfortunately memory profiling is very difficult to do in general. The only way I know how to do it with C/C++ code is with valgrind, but valgrind adds so much overhead that the fleet adapter can’t actually function properly when run through it. If neither of the above theories pans out, then we may have to start considering more obscure corners, like the websocket server or the quality of service settings for the ROS primitives.

Posted by @mxgrey:

For what it’s worth, for the next generation protocol I plan to move to an entirely new distributed schedule system where every robot gets its own pair of topics, e.g. /<fleet_name>/<robot_name>/traffic/trajectories and /<fleet_name>/<robot_name>/traffic/delay, and both topics will use transient local durability with a history depth of 1. Knowing a robot’s schedule is just a matter of subscribing to both and combining the data from the two. This would make network traffic extremely efficient, keep the memory footprint minimal, and eliminate the need for the schedule node.

The catch with the above is that DDS scales very badly as the number of topics increases, so this will require users to use rmw_zenoh or to configure their DDS very carefully.
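
To make the topic layout concrete, here is a rough rclcpp sketch of what the subscriber side could look like; the fleet and robot names and the std_msgs::msg::String payload are placeholders rather than the real message types.

#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/string.hpp>

// Sketch of subscribing to the proposed per-robot traffic topics.
// std_msgs::msg::String is only a stand-in for the eventual message types.
class RobotTrafficListener : public rclcpp::Node
{
public:
  RobotTrafficListener()
  : Node("robot_traffic_listener")
  {
    // transient_local + history depth 1: a late-joining subscriber always
    // receives the latest trajectory and delay without a central schedule
    // node having to replay them.
    const auto qos = rclcpp::QoS(1).transient_local();
    trajectory_sub_ = create_subscription<std_msgs::msg::String>(
      "/fleet_A/robot_1/traffic/trajectories", qos,
      [](std_msgs::msg::String::SharedPtr) { /* combine with latest delay */ });
    delay_sub_ = create_subscription<std_msgs::msg::String>(
      "/fleet_A/robot_1/traffic/delay", qos,
      [](std_msgs::msg::String::SharedPtr) { /* combine with latest trajectory */ });
  }

private:
  rclcpp::Subscription<std_msgs::msg::String>::SharedPtr trajectory_sub_;
  rclcpp::Subscription<std_msgs::msg::String>::SharedPtr delay_sub_;
};

int main(int argc, char** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<RobotTrafficListener>());
  rclcpp::shutdown();
  return 0;
}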

Posted by @cwrx777:

@mxgrey

Does the cache used in quickest_path have any upper bound or expiry?
DifferentialDriveHeuristic::generate

Posted by @mxgrey:

The theoretical upper bound for the cache is to hold a path plan for every pair of waypoints in the graph. The realistic upper bound is that it will hold a path plan leading to every destination you’ve requested, from every initial point you’ve requested it from. In practice I wouldn’t expect this cache to get very big, but it’s certainly worth considering in case there’s some bug in the implementation that’s causing significant bloat. I’ll open a PR to help test that theory.

Posted by @mxgrey:

I’ve opened up a pair of PRs that should help you investigate:

In particular, Add a method to audit the size of planner caches by mxgrey · Pull Request #117 · open-rmf/rmf_traffic · GitHub should actually improve the memory usage, since I found and fixed an implementation flaw while working on it. You could try using that branch to see if it resolves the issue you’re having.

If you do find that the planner cache is the culprit behind the memory inflation then I could follow up on this with a function to periodically clear the caches.

Posted by @cwrx777:

@mxgrey
Thanks for providing this in such a short time.

I compared Ubuntu’s System Monitor Resident Memory with the DifferentialDrive and ShortestPath values in the CacheAudit printout, both when new_items.insert({new_key, solution}); is commented out and when it is not.

I used the dispatch patrol script to send a robot to a waypoint, cancelled the task, and repeated this multiple times with different waypoints.

My observation is that when new_items.insert({new_key, solution}) is not commented out, Resident Memory and DifferentialDrive scale with the number of different waypoints submitted in dispatch patrol.
Before any task submission, the DifferentialDrive value is around 43,000. After the 1st waypoint, it’s around 49,000; after the 2nd waypoint, 54,000; …

But with new_items.insert({new_key, solution}) commented out, the DifferentialDrive value is 0, and the Resident Memory value remains flat, without significant growth over time.

In both scenarios, the ShortestPath value is around 17,000.

Posted by @mxgrey:

Commenting out that line means that you’ll no longer be caching the orientation-aware motion plans that get generated by the planner and which are used as a heuristic for the traffic-aware planner. Without caching those results, the traffic-aware planner will need to recalculate the heuristic at every search step. The cost of all that recalculating might not be too noticeable with a small number of robots, but I expect it to get very significant as the number of robots increases (all robots in one fleet share the same planner cache).

Even within the process of generating a single plan, the cache should significantly improve performance, so I would suggest we keep it operational. However, if I provide an API to clear the differential drive cache, it might be reasonable to have a timer clear it every N minutes, or to clear it when the cache audit value gets too large.
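
If that turns out to be useful, the periodic clearing could look something like the sketch below; planner_cache::entries and planner_cache::clear() are hypothetical placeholders for whatever the eventual audit/clear API exposes, not real rmf_traffic calls.

#include <cstddef>
#include <iostream>

// Hypothetical placeholders for the eventual audit/clear API.
namespace planner_cache
{
std::size_t entries = 0;        // stands in for the CacheAudit value
void clear() { entries = 0; }   // stands in for the clear-cache call
}  // namespace planner_cache

// Intended to run from a periodic timer: only clear the cache once it grows
// past a configured threshold, so recently computed plans keep being reused.
void maybe_clear_planner_cache(const std::size_t max_cache_size)
{
  if (planner_cache::entries > max_cache_size)
  {
    planner_cache::clear();
    std::cout << "Cleared planner cache" << std::endl;
  }
}

int main()
{
  planner_cache::entries = 500000;  // pretend the audit reported this many entries
  maybe_clear_planner_cache(400000);
}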


Edited by @mxgrey at 2025-05-09T05:21:19Z

Posted by @mxgrey:

I’ve opened Add an API for clearing the planner cache by mxgrey · Pull Request #118 · open-rmf/rmf_traffic · GitHub and updated Add a timer to log the size of planner caches by mxgrey · Pull Request #427 · open-rmf/rmf_ros2 · GitHub to allow periodic clearing of the planner cache.

If you can help confirm that periodically clearing the planner cache helps mitigate the memory inflation problem you were seeing, then I could add a parameter to that PR that lets users set how large they want to allow the planner cache to grow before clearing it.

Posted by @cwrx777:

@mxgrey
Thanks for the periodic clearing. It helps limit the memory growth.

Posted by @mxgrey:

I’ve updated Add a timer to log the size of planner caches by mxgrey · Pull Request #427 · open-rmf/rmf_ros2 · GitHub to allow setting a max size for the planner cache and then Provide example of using reset planner cache setting by mxgrey · Pull Request #300 · open-rmf/rmf_demos · GitHub shows how to use this parameter with the Python API.

I’ll get these merged and included in the kilted release as soon as the CI passes.

Posted by @cwrx777:

@mxgrey
Do you mind also creating an API to clear the shortest path cache?

Posted by @cwrx777:

@mxgrey

Just to update after running in production with the changes:

fleet adapter and traffic schedule memory trend:

DifferentialDrive (max_planner_cache_size set to 400000) and ShortestPath cache size:

Based on the chart, the increase in ShortestPath cache size also contributed to the overall memory consumption of the fleet adapter.
It would be good if there could also be an API to clear the ShortestPath cache, with a separate threshold value.

Another observation is that at one point the DifferentialDrive and ShortestPath caches went down to 0 without being explicitly cleared via the clear differential drive cache API.

[1748406716.933130192] Cache sizes: DifferentialDrive[116567], ShortestPath[281481], Euclidean[0]
[1748406961.223595126] close lane request due to change in lift state
[1748406964.843701333] open lane request due to change in lift state
[1748407017.078993313] Cache sizes: DifferentialDrive[0], ShortestPath[0], Euclidean[0]

FYI, our fleet adapter subscribes to the lane_closure_request topic (type: LaneRequest.msg) and calls fleet_handle.close_lanes, similar to how it is done here.


Edited by @cwrx777 at 2025-05-29T03:29:02Z

Posted by @mxgrey:

Thanks for the update. Can I check what the units are for the y-axis? Is that still megabytes?

FYI, our fleet adapter subscribes to the lane_closure_request topic (type: LaneRequest.msg) and calls fleet_handle.close_lanes, similar to how it is done here.

This is most likely the reason the shortest path cache was cleared. Opening or closing a lane will force the planner caches to be cleared since the heuristic values need to change when the graph structure changes.
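
As a generic illustration of why this happens (not the rmf_traffic code itself), cached heuristic values are keyed against a particular graph structure, so any lane change makes them stale and the simplest correct response is to drop them all:

#include <cstddef>
#include <map>
#include <utility>

// Generic illustration: heuristic costs are cached per waypoint pair for a
// specific set of open lanes, so a structural change invalidates all of them.
struct PlannerCaches
{
  std::map<std::pair<std::size_t, std::size_t>, double> differential_drive;
  std::map<std::pair<std::size_t, std::size_t>, double> shortest_path;

  void clear_all()
  {
    differential_drive.clear();
    shortest_path.clear();
  }
};

void on_lane_state_change(PlannerCaches& caches)
{
  // The cached costs were computed against the old set of open lanes, so
  // they must be recomputed from scratch against the updated graph.
  caches.clear_all();
}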

When I find some time (probably the week of June 9) I can add an API for clearing the shortest path cache. Or if you can reverse engineer what I did for the differential drive cache, you’re welcome to open a PR. I believe this field holds the shortest path cache.

Posted by @cwrx777:

Thanks for the update. Can I check what the units are for the y-axis? Is that still megabytes?

For the fleet adapter and traffic schedule memory trend, it’s in megabytes.

For the cache graph, the values are from the CacheAudit printout.