Adding clang thread safety analysis for ROS2 core packages

emersonknapp · February 20, 2019, 11:55pm

Hi all

There’s been some discussion about adding clang’s compile time thread safety analysis for ROS2 core packages, to help us find mistakes automatically and early on. I’ve talked to @tfoote and @Thomas_Moulard about this a bit offline, we wanted to move it into a public forum before moving further.

Annotation Macros

The thread safety features are enabled by some annotations that are best used by macros that can be turned off when using non-clang compilers (see mutex.h in the clang doc page), as a general utility. The first question is, where should these be defined?

One thought was to add it to rclcpp, but since we definitely want to use it in the rmw layer, that would be too high
Another candidate is rcutils, but these macros are C++ specific (extending functionality of std::mutex, and that packages is meant to be C-specific

Given that, I personally think creating a new rcpputils package makes the most sense. It hasn’t been needed yet but could make sense now given we have something that would fit there.

Usage

In terms of using these annotations, we have a Ci.ros2.org clang nightly job now available that could raise thread safety warnings for annotated code, at least in the nightly build.

For usage and roll-out, the rmw packages are prime candidates for first packages to start using the annotations, since they are highly threaded code. The thinking on guidelines

private mutexes
use GUARDED_BY on mutex-protected members to surface the rest of the usage needs
copy small things to avoid blocking

I’ve put together a proof of concept in rmw_fastrtps to illustrate

Feedback?

Before changing anything, want to hear thoughts on 1) where the macros should live, whether we should create that new package, and 2) usage patterns as put together in that WIP commit

gbiggs · February 20, 2019, 11:57pm

How does the analysis handle threads that are not annotated? For example, threads inside the DDS implementation.

I noticed in your proof of concept that there are places where locks have been added and places where they have been removed. Is this for the analysis or because the analysis found errors?

emersonknapp · February 21, 2019, 12:05am

Anything not annotated doesn’t get analyzed. The llvm libcxx implementation has the Capability annotations on its std::mutex implementation, but if you don’t mark any data as GUARDED_BY the mutex, the analysis has nothing to operate on

I kept all existing locking logic intact but shifted ownership of the locking around - The main difference is that in favor of a private mutex ownership patterm, I removed the “lock-wrapper” LockedObject pattern, because it spread out locking and safety in such a way that it would be easy to make more mistakes, spreads ownership of safety to any user of the data, instead of centralizing in the thread-accessed data. That may not be the way we want to go, it’s just what seems to make the most sense to me

gbiggs · February 21, 2019, 12:08am

Yep, that makes sense. What I’m wondering is does this have any impact on the analysis of threads that are annotated? If an annotated thread interacts with a non-annotated one, how trustworthy are the analysis results?

gbiggs · February 21, 2019, 12:09am

I haven’t looked at the details, but what you say makes sense from a maintainability point of view.

emersonknapp · February 21, 2019, 12:13am

Good question - the way I understand it the annotations are more about data than the threads themselves - so if you annotate data as being guarded, the analysis will spit out warnings for other code/packages only if it is built with -Wthread-safety, and accesses that data unsafely (according to the the analysis rules). So we could keep “non-ported” code warning-free by not enabling that flag on it until it’s been looked at specifically.

gbiggs · February 21, 2019, 12:22am

OK, that makes sense to me. Thanks for the clarifications!

emersonknapp · February 22, 2019, 8:10pm

Proposal for phasing/staging moving forward on this:

Create rcpputils package with annotation macros
Enable -Wthread-safety on rmw_fastrtps and migrate to the private-mutex pattern for TopicCache
Enable -Wthread-safety-negative on rmw_fastrtps and add negative capability annotations where needed
Perform 2&3 on the other rmw implementations

tfoote · February 22, 2019, 11:53pm

@emersonknapp Thanks for your complete and well thought out proposal. Hopefully this can be a good example for others going forward.

Overall we have general support for a utility package with a scope like rcpputils in parallel with rcutils here. There are several utility functions in rclcpp that could be refactored out into this package too in the future.

There was some discussion about whether there might be another more generic package scope. But there wasn’t any strong alternative proposals so lets go ahead with this as proposed.

emersonknapp · February 25, 2019, 10:02pm

@tfoote what’s the process for creating the new repo? I have a take at a an empty package on my account, with a PR against it adding the macros

But I realized I can’t transfer ownership to ros2 because I don’t have permissions to create repos in the org. Advice?

tfoote · February 26, 2019, 1:03am

Moving into an organization is a bit of a challenge. The most common thing to do is to transfer it to one of the individuals who has access and then they transfer it into the organization. Luckily github is relatively good about setting up redirects so most people won’t notice the indirect path.

If you can transfer it to me, I can then transfer it to the organization.

emersonknapp · February 26, 2019, 8:08pm

Great! Done, thanks!

Another thought, it seems a little unwieldy to add this block to the CMakeLists.txt for every package that wants to use this analysis

# Enable thread safety annotations and analysis when using clang
if(CMAKE_CXX_COMPILER_ID MATCHES "Clang")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -stdlib=libc++ -std=c++14")
  add_definitions(-D_LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS)
  add_compile_options(-Wthread-safety)
endif()

Would there be a better way to either

Put that block in for all ament packages, so it’s enabled globally, but only affects code that is annotated using the C++ macros
Add a cmake macro/function to enable it as a one-liner, like ament_enable_thread_safety_analysis?

@Thomas_Moulard maybe you have thoughts on this?

Thomas_Moulard · February 27, 2019, 2:04am

We have two ways IMHO:

Keep TSA optional and write a colcon mixin to enable this behavior.
Inject the statements you quoted everywhere using the *-extras.cmake mechanism.

My feeling is that, keeping this opt-in per-package will be challenging because mixing the clang and GNU standard libraries is probably unsafe. So it would need to be a global on/off flag. 2. is most likely too brutal of a change so 1. could be the best option.

emersonknapp · February 28, 2019, 8:50pm

I’ve hit a wall with enabling this - given the following constraint

we do not want to make our own Mutex wrapper, in favor of using std::mutex

High level problem -

To get the capability annotations on std::mutex, I need to use libc++ (LLVM)
On Linux, by default, all libraries use libstd++ (GNU)
The build, using libc++ on an annotated package, can succeed just fine, BUT
Any std objects passed across .so boundaries into this library now have runtime undefined behavior, because you call libc++ functions on libstdc++-created objects

Specific example case that I ran into -

I add thread safety annotation to rmw_fastrtps_shared_cpp
- link it to libc++ to have std::mutex be analyzable
upstream, an rclcpp test creates a Node, which eventually resolves to fastrtps_shared_cpp::rmw_node.cpp::__rmw_create_node, which:
- creates a ParticipantAttributes from fastrtps (which has not been modified, so it uses libstdc++ std::vector creation)
- calls vector::resize on the member vector (which jumps into the libc++ implementation)
- program hangs
- slight variations on operations can cause a crash

Workarounds I’ve tried:

colcon build --cmake-args -DCMAKE_CXX_FLAGS=-stdlib=libc++ (mixin approach)
- FAILS (on Linux) because poco_vendor is not built in the ROS2 workspace, and we get stdlib linker errors against it
rmw_shared_cpp-extras.cmake with the -stdlib=libc++ block
- FAILS for same reason

In conclusion, here are a list of options I have come up with to move forward (I am also very open to other suggestions)

When we run this analysis, libc++ linking flag that is turned on by the build caller, not specified at the package level
- Static analysis runs on compilation, giving useful warning messages
- We cannot expect linking or running code to work
- So, somehow disable linking?
Build all code from source, allowing forcing the standard library for all code
- looking at libpoco specificially here
Run this analysis on Mac only, where all the code will be linked against LLVM libc++
Remove the initially stated constraint and provide our own annotated mutex wrapper, removing the need for libc++
Modify Fast-RTPS interfaces to remove the need to interact with std:: containers directly
- This may not be the only place where std:: objects are used across API boundaries, but it’s the place I identified in my first annotations

@tfoote thoughts?

I think this analysis is valuable, but it’s definitely got its technical hangups here

@Thomas_Moulard

gbiggs · March 1, 2019, 12:32am

Should the analysis always be run for best results, or can we limit it to a CI run? If the latter, then perhaps we can set up a separate CI run that builds everything from source using libc++ and run that once a day or something, rather than making it part of every CI build.

dirk-thomas · March 1, 2019, 5:04am

That sounds like the commonly used approach for this.

In general the same applies when mixing libraries built with different C++ compilers, e.g. g++ vs. clang. While you might be lucky this is not something guaranteed.

emersonknapp · March 4, 2019, 6:22pm

I think that we gain value by running at all. Though it would be nice to run for every build, it seems prohibitively expensive given the current situation - we’re only running a Clang build nightly today as it is.

Based on @dirk-thomas input, my thinking now is to get all code building from source in the nightly Clang build and force that build to use libc++

Thomas_Moulard · March 4, 2019, 9:30pm

As we talked offline, I would do as follow:

Add and merge PRs adding the TSA colcon mixin, the rcpputils code for the macros and the class annotation. We can do that now and this is a no-op if the mixing is not being used.
Try to find a way for the whole build + test run to succeed:

Only target Mac OS X for now (if clang stdc++ is the default one on OS X? otherwise I guess brew can pass custom flags?)
Try to build everything from source on Linux and check that it works. If yes, we can figure a way to build everything from source in the CI.

Topic		Replies	Views
Ci.ros2.org clang nightly job now available Next Generation ROS ros2 , quality-assurance	0	962	February 14, 2019
Interop SIG March 6, 2025: Memory-safe High-performance Async ROS Programming: A Preview General rust , rclrs , open-rmf , interoperability	3	250	March 6, 2025
Need reviewers for ROSIDL related change General ros2	2	839	August 6, 2022
Comprehensive code walkthrough of the rclcpp / rcutils / rcl logging subsystems ros2 , logging	3	1394	March 7, 2023
Help wanted for rclcs Client Libraries	0	1157	December 12, 2016

Adding clang thread safety analysis for ROS2 core packages

Annotation Macros

Usage

Feedback?

Related topics