Since ROSCon 2018 I’ve been thinking about whether it makes sense to include a common interface for visual localization that allows the perception step to be decoupled from state estimation through message abstraction. This would allow us to push the output from localization pipelines (feature tracking, marker tracking, ICP tracking) to a filter in a standardized way.
Here is a tentative set of messages for exchanging registration pulses and correspondences:
Registration.msg
std_msgs/Header header # Camera frame/id, time at which the image was taken
float64 shutter_delay # Latency between the shutter fire and registration timestamp
Landmark.msg
uint64 feature_id # Feature ID
geometry_msgs/Vector3 camera_coord # Position in the camera frame
geometry_msgs/Vector3 parent_coord # Position in the parent frame
Landmarks.msg
std_msgs/Header header # Camera frame/id, time at which the landmarks were calculated
uint8 type # Camera type
uint8 TYPE_RGB = 0
uint8 TYPE_DEPTH = 1
Landmark[] landmarks
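As a rough illustration of what the filter side could do with these correspondences, here is a minimal Python sketch. It uses a plain dataclass as a stand-in for Landmark.msg (not a generated ROS class), and the helper names `transform` and `mean_residual` are hypothetical. It assumes the filter holds a candidate camera pose (rotation `R`, translation `t`) in the parent frame and scores it by how well `R * camera_coord + t` reproduces each `parent_coord`.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]  # row-major 3x3 rotation matrix


@dataclass
class Landmark:
    """Plain-Python stand-in for Landmark.msg (assumption, not a ROS class)."""
    feature_id: int
    camera_coord: Vec3  # position in the camera frame
    parent_coord: Vec3  # position in the parent frame


def transform(rotation: Mat3, translation: Vec3, p: Vec3) -> Vec3:
    """Apply p' = R * p + t (hypothetical helper)."""
    return tuple(
        sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def mean_residual(landmarks: List[Landmark], rotation: Mat3, translation: Vec3) -> float:
    """Mean distance between each parent_coord and its transformed camera_coord."""
    if not landmarks:
        raise ValueError("need at least one landmark")
    total = 0.0
    for lm in landmarks:
        predicted = transform(rotation, translation, lm.camera_coord)
        total += sqrt(sum((a - b) ** 2 for a, b in zip(lm.parent_coord, predicted)))
    return total / len(landmarks)


IDENTITY: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

landmarks = [
    Landmark(feature_id=1, camera_coord=(0.0, 0.0, 1.0), parent_coord=(1.0, 0.0, 1.0)),
    Landmark(feature_id=2, camera_coord=(0.5, 0.0, 2.0), parent_coord=(1.5, 0.0, 2.0)),
]

# A camera offset 1 m along x in the parent frame explains both correspondences;
# a camera at the parent origin does not.
good = mean_residual(landmarks, IDENTITY, (1.0, 0.0, 0.0))
bad = mean_residual(landmarks, IDENTITY, (0.0, 0.0, 0.0))
```

The point is only that the filter never needs to know whether the landmarks came from feature tracking, marker tracking or ICP; it sees ID-tagged point pairs and nothing else.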
Additionally, we could add some general messages to store features and maps of features:
Feature.msg
uint64 feature_id # Feature identifier
geometry_msgs/Pose pose # Position / pose of the feature in the world frame (optional)
byte[] descriptor # Feature descriptor
FeatureMap.msg
std_msgs/Header header # Time, frame in which features are described
string keypoint_algorithm # Keypoint detection algorithm
string descriptor_algorithm # Descriptor algorithm string identifier
Feature[] features # Features
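To give a feel for how a consumer might use such a map, here is a small Python sketch with dataclasses standing in for Feature and FeatureMap (the pose field is left out for brevity). It assumes the descriptors are fixed-length binary strings compared by Hamming distance, which the messages themselves do not require. The helper names `hamming` and `closest_feature`, and the algorithm strings "ORB", are just illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Feature:
    """Stand-in for the Feature message (pose omitted in this sketch)."""
    feature_id: int
    descriptor: bytes


@dataclass
class FeatureMap:
    """Stand-in for the FeatureMap message (header omitted in this sketch)."""
    keypoint_algorithm: str
    descriptor_algorithm: str
    features: List[Feature] = field(default_factory=list)


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptor lengths differ")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def closest_feature(fmap: FeatureMap, query: bytes) -> Optional[Feature]:
    """Return the map feature nearest to the query, or None for an empty map."""
    return min(fmap.features, key=lambda f: hamming(f.descriptor, query), default=None)


# Algorithm identifiers here are example values, not a proposed vocabulary.
fmap = FeatureMap(
    keypoint_algorithm="ORB",
    descriptor_algorithm="ORB",
    features=[
        Feature(feature_id=10, descriptor=b"\x0f\x00"),
        Feature(feature_id=11, descriptor=b"\xff\xff"),
    ],
)

match = closest_feature(fmap, b"\x0e\x00")  # differs from feature 10 by one bit
```

Carrying the algorithm names as strings in the map is what lets a consumer decide which distance metric applies before attempting any matching.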
I don’t know if anybody else would find this useful, or where we could put these messages (perception_msgs? vision_msgs?). Comments warmly welcomed!
Andrew