FastRTPS throughput issues

I am publishing some data at a very high frequency (1000 Hz). It’s essentially a time server, so the data isn’t huge. Just seconds and nanoseconds. This is my way of replicating the ROS Time that exists in ROS 1.

I created another node that subscribes to this time data.
I am hitting a roadblock using Alpha8 and FastRTPS, where I call spin_some(node) and it basically never comes back.
My intuition is that it spends the whole time processing messages for the subscription, never gets ahead of the incoming data, so the spin_some can never return.

Any experience with this? Or pointers? Is this something that’s an inherent limitation of FastRTPS?
I didn’t notice this with Alpha7, but I was using OpenSplice. I assume the ROS layer didn’t change enough to make this a ROS issue, but rather a FastRTPS issue. I can run that test to verify.

Thanks.

Hi,

1000 Hz is not so high, and I think you should be able to do it with our Fast RTPS. We have throughput tests sending at higher rates with larger data.

Without seeing the actual test you are running is hard to answer. Could you try to reproduce the problem on Fast RTPS and post an issue on our Github ?

Thanks,
Jaime Martin Losa
CEO eProsima - The middleware experts

Phone: + 34 607 91 37 45
Twitter: @JaimeMartinLosa
www.eProsima.com

1 Like

It’s hard to say why you’re experiencing this behavior. If you can provide a simple example where spin_some doesn’t return, we can have a look at it. I’d have to look into how spin_some is implemented, but I was aware of that pitfall when we were writing it. It could still be broken though.

I finally had some time to revisit this. It’s pretty reproducible for me on a Windows 10 machine.
It could very well be that I’m doing something incorrect in my setup.
The code is essentially 1 talker node that publishes “time” (It’s fake time, just a static counter starts at 0 and increments every millisecond). And a listener node that creates an object that contains a subscription to this time. The desire is that the Time object can be called on during program execution to get the current time (similar to the ROS::Time::now() concept).

The behavior is that the listener node gets stuck just processing the subscription to the Time. It never exits the spin_some(node) function call.

Here’s the code:
Listener

#include "framework/os/Time.hpp"
#include "rclcpp/rclcpp.hpp"

int main(int argc, char* argv[]) {

    rclcpp::init(argc, argv);
    auto node = rclcpp::node::Node::make_shared("ros_time_echo");
    std::cout << "waiting for time" << std::endl;
    framework::Time time(node);
    std::cout << "count, time (s)" << std::endl;

    rclcpp::WallRate wallRate(100.0);
    uint64_t i = 0U;
    while (rclcpp::ok())
    {
        rclcpp::spin_some(node);
        std::cout << i << ", " << std::fixed << std::setprecision(3) << framework::toSecondsAsDouble(time.getTime())
              << std::endl;
        ++i;
        wallRate.sleep();
    }
    return EXIT_SUCCESS;
}

Talker

#include <rclcpp/rclcpp.hpp>
#include "framework/os/Time.hpp"

int main(int argc, char* argv[]) {
    rclcpp::init(argc, argv);
    auto node = rclcpp::node::Node::make_shared("steady_clock_publisher_single_threaded");
    auto pub = node->create_publisher<builtin_interfaces::msg::Time>("asi_clock", rmw_qos_profile_sensor_data);

    rclcpp::WallRate rate(1000);
    while (rclcpp::ok())
    {
	static uint32_t fake_sec = 0;
	builtin_interfaces::msg::Time msg;
	msg.sec = fake_sec;
	fake_sec++;
            pub->publish(msg);
            rclcpp::spin_some(node);
            rate.sleep();
    }

    return EXIT_SUCCESS;
}

Time Object (hpp)

#include <rclcpp/node.hpp>
#include <rclcpp/rclcpp.hpp>
#include <rclcpp/subscription.hpp>
#include "builtin_interfaces/msg/time.hpp"

namespace framework
{
class Time
{
public:
    explicit Time(rclcpp::node::Node::SharedPtr node);
builtin_interfaces::msg::Time getTime() const;
private:
    bool receivedFirstClockMsg;
builtin_interfaces::msg::Time lastTime;
    rclcpp::subscription::Subscription<builtin_interfaces::msg::Time>::SharedPtr sub;
};

std::chrono::time_point<std::chrono::steady_clock> toTimePoint(const builtin_interfaces::msg::Time& msg);
std::chrono::time_point<std::chrono::steady_clock> toTimePoint(double time_s);

double toSecondsAsDouble(const builtin_interfaces::msg::Time& msg);
double toSecondsAsDouble(std::chrono::time_point<std::chrono::steady_clock> const& time_point);

}  // namespace framework

Time Object (cpp)

namespace framework
{

using std::chrono::nanoseconds;
using std::chrono::steady_clock;
using std::chrono::time_point;

Time::Time(rclcpp::node::Node::SharedPtr node)
    : receivedFirstClockMsg(false)
    , sub(node->create_subscription<builtin_interfaces::msg::Time>(
            "asi_clock",
            [this](builtin_interfaces::msg::Time::SharedPtr msg) {
        this->lastTime = *msg;
        this->receivedFirstClockMsg = true;
    std::cout << "time is " << msg->sec << "\n";
    }, rmw_qos_profile_sensor_data))
{
    while (rclcpp::ok() && !this->receivedFirstClockMsg)
    {
        rclcpp::spin_some(node);
        rclcpp::sleep_for(std::chrono::milliseconds(1));
    }
}

builtin_interfaces::msg::Time Time::getTime() const
{
    return this->lastTime;
}

time_point<std::chrono::steady_clock> toTimePoint(const builtin_interfaces::msg::Time& msg)
{
    return time_point<steady_clock>(
        std::chrono::nanoseconds(msg.sec * static_cast<uint64_t>(1000000000LU) + msg.nanosec));
}

time_point<std::chrono::steady_clock> toTimePoint(const double time_s)
{
    return time_point<steady_clock>(
        std::chrono::nanoseconds(static_cast<uint64_t>(time_s * 1000000000.0)));
}

double toSecondsAsDouble(const builtin_interfaces::msg::Time& msg)
{
    return toSecondsAsDouble(toTimePoint(msg));
}

double toSecondsAsDouble(time_point<steady_clock> const& time_point)
{
    return std::chrono::duration_cast<nanoseconds>(time_point.time_since_epoch()).count() / 1e9;
}
}

Any pointers or attempts to replicate would be appreciated. I want to switch to use FastRTPS, but unless this type of mechanism works, I’m stuck on OpenSplice.

Thanks.

HI SecretaryBirds,

We are going to test this. I will keep you posted.

In the meanwhile, could you please send me a message with your contact details just in case we need more information about the case?

1 Like

It looks like the fastrtps part works with the latest code from the ros2 package (i.e. https://github.com/ros2/ros2).
I packaged the demo code at https://github.com/geoffviola/time_prototype.

I see two issues

  1. A problem with sourcing the setup.bash, which is probably not related: https://github.com/ros2/rosidl_typesupport/issues/2.
  2. The listener program does not close with ctrl-c. I haven’t explored why it acts this way. The talker program does close with ctrl-c.

Thanks geoffviola,

For us works also. Are you in the SecretaryBirds team?

Yeah, we work together.

Hi @geoffviola,
I just tested this code:

  1. As been solved on my side by https://github.com/ros2/rosidl_typesupport/pull/3
  2. On my machine (xenial) the code is still stuck in the spin_some. I didn’t have time to investigate but changing the rate of the main loop to 1000Hz allowed me to get both spin_some and ctrl+c to work as expected

Cheers

I recompiled with the latest code as of today. The typesupport problem is definitely fixed. With fast RTPS The talker uses 12% CPU even if no one is subscribed. Talker uses 25% CPU, when there is one subscriber. The listener takes 100% CPU.

Ctrl-C works on the listener if there is no publisher. When the talker is publishing to the listener, Ctrl-C does not close it. Holding Ctr-C does stops it from using so much CPU and printing to the screen.

Also, this package does not compile in Linux alpha 8 opensplice binary. It seems that some symbols don’t generate. I see the following error:
[ 66%] Linking CXX executable talker
[ 66%] Linking CXX executable listener
libtime.a(Time.cpp.o): In function rosidl_message_type_support_t const* rosidl_generator_cpp::get_message_type_support_handle<builtin_interfaces::msg::Time_<std::allocator<void> > >()': Time.cpp:(.text._ZN20rosidl_generator_cpp31get_message_type_support_handleIN18builtin_interfaces3msg5Time_ISaIvEEEEEPK29rosidl_message_type_support_tv[_ZN20rosidl_generator_cpp31get_message_type_support_handleIN18builtin_interfaces3msg5Time_ISaIvEEEEEPK29rosidl_message_type_support_tv]+0x5): undefined reference torosidl_message_type_support_t const* rosidl_typesupport_opensplice_cpp::get_message_type_support_handle_opensplice<builtin_interfaces::msg::Time_<std::allocator > >()'
collect2: error: ld returned 1 exit status
CMakeFiles/listener.dir/build.make:164: recipe for target ‘listener’ failed
make[2]: *** [listener] Error 1
CMakeFiles/Makefile2:173: recipe for target ‘CMakeFiles/listener.dir/all’ failed
make[1]: *** [CMakeFiles/listener.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs…
CMakeFiles/talker.dir/src/talker.cpp.o: In function rosidl_message_type_support_t const* rosidl_generator_cpp::get_message_type_support_handle<builtin_interfaces::msg::Time_<std::allocator<void> > >()': talker.cpp:(.text._ZN20rosidl_generator_cpp31get_message_type_support_handleIN18builtin_interfaces3msg5Time_ISaIvEEEEEPK29rosidl_message_type_support_tv[_ZN20rosidl_generator_cpp31get_message_type_support_handleIN18builtin_interfaces3msg5Time_ISaIvEEEEEPK29rosidl_message_type_support_tv]+0x5): undefined reference torosidl_message_type_support_t const* rosidl_typesupport_opensplice_cpp::get_message_type_support_handle_opensplice<builtin_interfaces::msg::Time_<std::allocator > >()'
collect2: error: ld returned 1 exit status
CMakeFiles/talker.dir/build.make:171: recipe for target ‘talker’ failed

1 Like

We are still seeing some issues with this. Now we are trying to enable beta1 with fastRTPS. With a similar node to SecretaryBirds, publishing time at rmw_qos_profile_sensor_data, and the subscribing node subscribing at the same qos, we see the subscribing node using 100% cpu. Ths subscriber is calling spin(). If we try using spin_some or spin once we see the cache fill up and errors. Even when the loop rate is 2000 Hz.

We found a way to get the cpu usage down by subscribing and publishing at qos_default. However, now the publisher slowly climbs up to 100% cpu (takes about 30 seconds to a minute on my machine).

Also, please note that even if we only publish the time 1x per second, we still see the subscriber at 100% cpu.

During my testing, I found that the listener is capable of keeping up at least for a time on my machine. However, at some point, it fails to keep up, at which point it doesn’t return from Spin Some, which makes sense. However, if I close the talkers at this point, I would expect that spin_some would eventually return, which it never does.

Because I don’t like dealing with high rate items (hard to see), I slowed everything down to 10 Hz (talker/listener) and started 1 talker, then 1 listener, waiting for a bit to make sure things were operating as expected, then started another 2 talkers. It ran fine for awhile, but eventually (within 1 minute) the spin some stopped working. At this point, I stopped all the talkers, expecting the spin some to return. Never did. I was able to re-start a talker and the callback was still functional, but spin some would never return.

@dayoung99 I believe what you’re seeing is ticketed here: https://github.com/ros2/rclcpp/issues/280