
New perception architecture: message types

We have been discussing the architecture of the perception pipeline in issue
. However, the discussion stopped
before it converged.

The current perception architecture does not contain the information necessary for planning to
avoid obstacles or stop in front of an obstacle. These features were not considered when the architecture
was developed. Based on these needs and other requirements from the planning layer, we would like
to propose the following message types for use in the perception pipeline.

@yukkysaito have you seen these msgs by AutonomousStuff? It looks like they cover what you are proposing.


We tried to make the Object and ObjectWithCovariance types in derived_object_msgs as broad as possible, covering as many sensor-specific inputs as we could with generic structures (like geometry_msgs/Pose and shape_msgs/SolidPrimitive), but we are always open to feedback if these don’t cover a specific need.
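To make the discussion concrete, here is a minimal sketch of what such a generic object type looks like when mirrored in plain Python. The field names and constants are a simplified paraphrase for illustration, not the actual derived_object_msgs definition:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified mirror of a generic object message.
# Consult derived_object_msgs for the real field layout.

@dataclass
class Point:
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0

@dataclass
class PoseWithCovariance:
    position: Point = field(default_factory=Point)
    covariance: List[float] = field(default_factory=lambda: [0.0] * 36)  # 6x6 row-major

@dataclass
class SolidPrimitive:
    # Constants as in shape_msgs/SolidPrimitive
    BOX, SPHERE, CYLINDER, CONE = 1, 2, 3, 4
    type: int = 0
    dimensions: List[float] = field(default_factory=list)

@dataclass
class ObjectWithCovariance:
    id: int = 0
    pose: PoseWithCovariance = field(default_factory=PoseWithCovariance)
    shape: SolidPrimitive = field(default_factory=SolidPrimitive)
    polygon: List[Point] = field(default_factory=list)  # e.g. convex hull footprint

# A car-sized box detection, for example:
obj = ObjectWithCovariance(
    id=7,
    shape=SolidPrimitive(type=SolidPrimitive.BOX, dimensions=[4.5, 1.8, 1.5]),
)
```

The point of the generic structure is that any sensor pipeline can populate the same pose/shape slots, leaving algorithm-specific detail out of the interface.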


@yukkysaito I second @sgermanserrano’s opinion: let’s analyse the work that AS has done, that is, the messages from , and improve those rather than come up with a completely new definition of messages.

To comment on this: in that issue we tried to come up with a better decomposition of nodes that would a) allow code reuse and b) allow fusion of sensor data on multiple levels (raw data, features, objects). This is the result of the last discussion that I, @kfunaoka, and @Kosuke_MURAKAMI arrived at.

Our next step would be what you did: the data modeling step.

From a quick glance, there is nothing in your proposal that is not already included in , so I suggest including the messages from AS.


I third using the AutonomousStuff messages. They have what we need, and while I can see a couple of things I’d consider making more exact, I don’t see any problems, not even small ones, in using them.

I updated the slides.


The current message type is designed to cover various kinds of information, and a lot of information can be defined in it. This includes not only the requirements for planning but also data specific to algorithms and sensors.
The current definition has some issues.

  • Issue 1: The large amount of algorithm-specific data makes it difficult to define the interface and to modularize perception. By “interface” I mean which information detection and tracking fill in and which information they are expected to output.
  • Issue 2: The message type is unstructured and redundant.
  • Issue 3: Information required for planning is missing.


This time, we defined the message type from the information required for a dynamic object (something that can move, such as a pedestrian, car, or truck).
The details are written here.
As for paths, they are still undefined and will be added as we discuss them in the future.

About derived_object_msgs

derived_object_msgs is built with the same idea as the current message type, as @JWhitleyAStuff said.


@yukkysaito Can you give a detailed comparison of what each of the three message types can store?

@yukkysaito @JWhitleyAStuff

Please feel free to add some comments.






@Kosuke_MURAKAMI - Please see for a version which contains the PoseWithCovariance, TwistWithCovariance, and AccelWithCovariance. The uuid is definitely a useful addition that we had not considered and would be happy to add. This was the intent of the “id” field but it is somewhat limited as an int.


Thank you, @Kosuke_MURAKAMI.
As a side note, I don’t think the label-specific shape in the AS msg is a good idea. It cannot switch between Polygon and SolidPrimitive; it can only switch within SolidPrimitive (cone, sphere, bounding box, cylinder). This is also important from the point of view of multi-object tracking and path planning.

Who uses the detection label and classification age, and for what?
If we add fields too broadly, then when we modularize detection, tracking, and prediction, these modules will have to fill in all of those fields, and some algorithms may not be able to. I think it is better to define only what is really necessary in the msg.

@yukkysaito - I think the field you’re referencing is “detection_level” which is meant to indicate whether this object has been “detected” or “tracked.” From the notes in, here are the definitions:

  • A Detected object is one which has been seen in at least one scan/frame of a sensor.
  • A Tracked object is one which has been correlated over multiple scans/frames of a sensor.
  • An object which is detected can only be assumed to have valid pose and shape properties.
  • An object which is tracked should also be assumed to have valid twist and accel properties.
  • The validity of the individual components of each object property is defined by the property’s covariance matrix.
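Under these definitions, a consumer such as a planner could gate which fields it trusts on the detection level. A minimal sketch (the constant values here are assumptions for illustration, not taken from the real message definition):

```python
# Sketch: decide which object fields a consumer may rely on, based on the
# detection level. Constant values are illustrative assumptions.
DETECTED = 0   # seen in at least one scan: pose/shape assumed valid
TRACKED = 1    # correlated over multiple scans: twist/accel also valid

def usable_fields(detection_level: int) -> set:
    """Return the set of object fields that may be treated as valid."""
    fields = {"pose", "shape"}
    if detection_level == TRACKED:
        # Velocity and acceleration only make sense once the object has
        # been correlated across frames.
        fields |= {"twist", "accel"}
    return fields
```

Per the note above, even a “valid” field should still be weighted by its covariance matrix rather than trusted blindly.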

classification_age indicates the number of “scans” or “detections” made of the object where the classification type is the same. When a sensor classifies an object, it usually tells you how many “scans” of that object have been sent since the object was classified as that type. This helps determine the certainty of the classification.
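In other words, the age is a per-object counter that resets whenever the classification changes. Roughly (a paraphrase of the counting rule, not actual sensor firmware code):

```python
def update_classification(obj: dict, new_class: int) -> dict:
    """Increment classification_age while the class is stable; reset on change.

    `obj` is a plain dict standing in for tracked-object state; this sketches
    the counting rule only.
    """
    if obj.get("classification") == new_class:
        obj["classification_age"] = obj["classification_age"] + 1
    else:
        obj["classification"] = new_class
        obj["classification_age"] = 1
    return obj

track = {}
for observed in [2, 2, 2, 5]:   # e.g. CAR, CAR, CAR, then PEDESTRIAN
    track = update_classification(track, observed)
# After a class change, the age drops back to 1, signalling low certainty.
```

A downstream consumer can then treat a large classification_age as higher confidence that the label is stable.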

Based on your experience in robotics and autonomous vehicles, I would like to hear your opinions, @JWhitleyAStuff, @Dejan_Pangercic, and @sgermanserrano, about this message definition not including sensor data (i.e. ImageROI, PointCloud).
Do you think it is necessary? Or do you think it’s better to keep it like this to add an abstraction layer?

@amc-nu I think this is based on your intent for the message. If you intend to follow a “domain-specific-controller” approach, then you are trusting that the individual sensor processing nodes know how to correctly filter the raw data and produce abstracted objects for the most part. The uncertainty is then encoded into the covariance matrix and the classification quality data.

However, if you intend to do either data fusion before object segmentation or fusion and segmentation in the same node, you would need the raw data in the message as well. My understanding is that Autoware is shooting for the first approach so I would say we don’t need the raw data.


Agreed with @JWhitleyAStuff’s comment; I would expect the raw data to be processed in a separate step so that the filtered sensor data is in a usable state. I think it also makes sense in a setup where you might have an edge device on the sensor that is pre-processing the raw data for consumption by higher-level nodes.


I think this work is being blocked by a lack of a shared understanding of what it is we want to achieve. What objects do we want to recognise, where do we want to recognise them, what sorts of data do we want to use, what data rates, should data be synchronised or can information be added on to a detection after the fact, do we or do we not use consecutive detections to strengthen an object’s presence, how interchangeable/optional do we want different algorithms and detection types to be, and so on. There are a huge number of unanswered questions that need to be defined and then answered before we can even begin to think about the messages used.

In other words, we need to define our requirements before we try to solve them. Otherwise we are solving an unknown or undefined problem.

We also need to keep in mind that we are designing Autoware for all Autoware users, not just for Tier IV’s favourite sensor set, or AutonomousStuff’s specific demonstration. I’m not saying that that is what is happening, but it is easy to forget.

Additionally, I think it would be useful to draw up a list of:

  • The different types of sensors we expect to be used. Not just ones we use now, but also ones a potential Autoware user might use.
  • The different types of data we might process. Obviously this closely relates to the sensors used, but don’t forget using post-processed data as an input to an algorithm, e.g. merged dense point clouds versus individual sparse point clouds, or point clouds with or without RGB data added from a camera.
  • The object locating, object identifying, object tracking, object predicting, etc. algorithm types that we might use.
  • Possible orderings of algorithms.

@amc-nu in my experience you need to decide between performance and synchronization. That is if you do not have many nodes and you have fast middleware, then you can use message types that also include raw data. If you have the opposite case then you should go with small messages.

If you split your message types too much, you will have to deal with time synchronization once the messages are received by the end node. That is both hard to do and computationally expensive.
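The pairing problem this creates looks roughly like the following toy matcher, which pairs two sorted streams by nearest timestamp within a tolerance. This is a minimal sketch only; in ROS one would normally reach for message_filters (e.g. its approximate-time synchronization policy) rather than hand-rolling this:

```python
def match_by_stamp(queue_a, queue_b, slop=0.05):
    """Pair messages from two streams whose stamps differ by at most `slop` sec.

    Each queue holds (stamp, payload) tuples sorted by stamp. Toy sketch of
    approximate time synchronization across split message streams.
    """
    pairs = []
    i = j = 0
    while i < len(queue_a) and j < len(queue_b):
        ta, tb = queue_a[i][0], queue_b[j][0]
        if abs(ta - tb) <= slop:
            pairs.append((queue_a[i][1], queue_b[j][1]))
            i += 1
            j += 1
        elif ta < tb:
            i += 1   # drop the unmatched older message from stream A
        else:
            j += 1   # drop the unmatched older message from stream B
    return pairs

lidar = [(0.00, "scan0"), (0.10, "scan1"), (0.20, "scan2")]
camera = [(0.02, "img0"), (0.21, "img1")]
matched = match_by_stamp(lidar, camera)
```

Note that unmatched messages are simply dropped here; a real synchronizer also has to bound its queues and handle latency, which is exactly the cost Dejan is pointing at.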

In any case I believe that we should finish the computational graph architecture first (how many nodes and composition) and then define the messages and not the other way around. I assume that AS has a solid computational graph architecture that let them define such messages.


I listed the differences between derived_object_msgs/ObjectWithCovariance and DynamicObject, and heard why DynamicObject is constructed the way it is. I’d like to clarify the differences and reasons, and build a more common view across the whole community.

@JWhitleyAStuff Would you comment about the different information?

Same information

  • geometry_msgs/PoseWithCovariance pose
  • geometry_msgs/TwistWithCovariance twist
  • geometry_msgs/AccelWithCovariance accel
  • geometry_msgs/Polygon polygon and geometry_msgs/Polygon Shape::footprint

New common view

Different information


PoseWithCovariance[] past_paths is removed from DynamicObject since planning is not interested in past information. Though prediction will require past information, it can be resolved inside prediction.
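If past_paths is dropped from the message, a prediction node can reconstruct the history internally by buffering poses per object uuid. A sketch under that assumption (class and field names here are hypothetical):

```python
from collections import defaultdict, deque

class PoseHistory:
    """Bounded per-object pose buffer, keyed by the object's uuid.

    Sketch of how a prediction module could keep past poses itself,
    so that past_paths never needs to travel in the message.
    """

    def __init__(self, maxlen=20):
        self._paths = defaultdict(lambda: deque(maxlen=maxlen))

    def update(self, uuid: str, pose) -> list:
        """Record the latest pose and return the buffered past path."""
        self._paths[uuid].append(pose)
        return list(self._paths[uuid])

hist = PoseHistory(maxlen=3)
for t in range(5):
    path = hist.update("obj-1", (float(t), 0.0))
# The deque keeps only the 3 most recent poses for "obj-1".
```

This is one reason a stable uuid per object matters: without it, the prediction node cannot associate the incoming object with its buffered history.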


Thank you for the summary. Do you have any opinions?

I think the reasons for DynamicObject are reasonable, since I created the table based on the hearing :wink: My opinion is already reflected in the table, e.g. object_classified and past_paths are not necessary.

@JWhitleyAStuff Sorry to bother you again. Would you comment about the different information? I’m not familiar with the background of ObjectWithCovariance.

@kfunaoka I’m sorry it has taken so long to get back to you. Here are responses addressing your issues:

The object does have a header field. However, it does not need to be populated and ObjectWithCovarianceArray also has a header.

The types listed for “classification” are not exhaustive nor definitive. As far as I know, the message type has not been extensively used so it is open to modification. We have only done tests with it internally and have not released any packages that use it. I agree with your assessments for the “UNKNOWN_” types. They were types provided by a sensor vendor so we included them.

Not a problem to change the classification_certainty to a float (0-1).
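For anyone migrating, converting a legacy integer certainty (assumed here to be a 0–255 byte) to the proposed 0–1 float is a simple scaling:

```python
def certainty_to_float(raw: int) -> float:
    """Convert a legacy 0-255 certainty byte to the proposed 0.0-1.0 float.

    Assumes the old field was an unsigned byte; adjust the divisor if the
    legacy range differs.
    """
    if not 0 <= raw <= 255:
        raise ValueError("certainty byte out of range")
    return raw / 255.0
```

The float form also makes the field directly comparable with probability-style confidences produced by classifiers.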

Many algorithms use a convex hull bounding area to define an object. This is why “shape” was included. There is also geometry_msgs/Polygon polygon in the message for defining non-standard polygons.

Regarding the rest of the comments: I think the overall concept is that our message structure for these is flexible. We can add or modify just about anything in the message, though I would prefer not to remove much (if any) of the fields.