VSLAM + ROS2 + Innovation

Hello everyone,

My name is Alex, and alongside my work in the industry, I am pursuing a Ph.D. in Robotics/Computer Vision at a Brazilian university.

We are working on an autonomous vessel project, a USV (Unmanned Surface Vehicle). I am currently dedicated to implementing VSLAM (Visual Simultaneous Localization and Mapping) for the boat, which is currently a sailboat but could potentially be another type of vessel, such as a catamaran.

I need your help. I am looking for a scientific contribution that my work could address. If anyone is an expert in the field of VSLAM applied to USV or has experience in this area, could you please provide insights on where you think we can improve upon the current solutions available in ROS (Robot Operating System) and how this could be an innovation?

The idea is to create something new and contribute to the ROS community.

Thank you,

VSLAM on the water? That sounds hard. I’m no expert on VSLAM, but doesn’t it rely on identifying visual features that can be recognized between frames? On the water, you wouldn’t expect many such features to exist, especially since the water itself keeps moving, too, no?

Additionally, one usually has optimal conditions for GNSS (aka GPS) on the water. Short outages can be easily bridged by coupling with an IMU.
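
To make the GNSS+IMU point concrete, here is a minimal 2D sketch of bridging an outage by dead reckoning. It is only an illustration under stated assumptions: the `State` class, the function names, and the `gain` blending value are hypothetical, the accelerations are assumed to be already rotated into the world frame, and a real system would use a proper filter (e.g. an EKF) rather than this fixed-gain blend.

```python
from dataclasses import dataclass


@dataclass
class State:
    x: float   # east position, metres
    y: float   # north position, metres
    vx: float  # east velocity, m/s
    vy: float  # north velocity, m/s


def propagate(state, ax, ay, dt):
    """Dead-reckon one step from world-frame IMU acceleration (assumed input)."""
    return State(
        x=state.x + state.vx * dt + 0.5 * ax * dt * dt,
        y=state.y + state.vy * dt + 0.5 * ay * dt * dt,
        vx=state.vx + ax * dt,
        vy=state.vy + ay * dt,
    )


def gnss_update(state, gx, gy, gain=0.8):
    """Pull the position toward a GNSS fix; gain in [0, 1] is a hypothetical tuning value."""
    return State(
        x=state.x + gain * (gx - state.x),
        y=state.y + gain * (gy - state.y),
        vx=state.vx,
        vy=state.vy,
    )


def run(initial, samples, dt=0.1):
    """samples: iterable of (ax, ay, fix); fix is (gx, gy), or None during an outage."""
    state = initial
    track = []
    for ax, ay, fix in samples:
        state = propagate(state, ax, ay, dt)  # IMU always runs
        if fix is not None:                   # GNSS corrects when available
            state = gnss_update(state, *fix)
        track.append(state)
    return track
```

During an outage the loop simply keeps integrating the IMU, so the error grows with time; that is why this only works for short gaps.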


I agree. Perhaps they are assuming the vessel will be in clear or shallow water, and then perform VSLAM on the sea/lake bed below? Or they are navigating rivers/waterways, where there are constant nearby landmarks on the shore. Even so, this seems challenging (for the reasons you’ve mentioned, but also because of the up-and-down bobbing of the camera). It’s also not clear what advantages this offers over GPS+RTK: if you’re on the water, you’ve got a great line of sight to satellites basically all the time.

I think this is the backwards way to do it: rather than building a solution and then finding a problem to apply it to, perhaps you should be focusing on finding the problem and then solving that? For example, perhaps there’s a specific edge-scenario where this is applicable (like navigating around a dock). Alternatively, focusing on the problem of VSLAM in highly noisy areas or with hard-to-model camera movement would be useful beyond just USVs.

@Alex_Salgado Could you provide some additional detail on what your use-case is here? That might make it easier to see what you have in mind.


The idea is to use VSLAM for USVs in coastal waters, whether or not fixed landmarks on land are used as reference points.

I’ve talked to some sailor friends, and they mentioned that despite a vessel being equipped with various sensors, in port or dock areas, they often rely on visual navigation for the practicality of reaction time (as seen in the example of the iceberg with the Titanic).

So, I would like to create a system that replicates this behavior. They refer to it as “bearing problems navigation in restricted waters.”

As mentioned, some challenges I already foresee are:

  • A scarcity of reference landmarks.
  • Various and often compromised visibility conditions.
  • Typically unstable observation points.
  • The teleportation effect due to sensor signal loss.

Additionally, we plan to incorporate an Inertial Measurement Unit (IMU) to enhance and adjust the trajectory.

I apologize if my earlier expression was unclear, but the main idea is to consult the community to see if anyone has undertaken a similar endeavor and can identify an unresolved issue that would be valuable for someone to dedicate their efforts to and contribute to the community. This could serve as my starting point as a kind of initial survey.

I’m already appreciative of your feedback (@chfritz and @cst0 )

I think to contribute something useful here you’d have to come up with something pretty novel, since traditional VSLAM won’t really perform well in the general maritime use case. Additionally, there are already a ton of tried-and-true sensors in use for the same purpose, and navigation/localization of surface vessels is well studied and sort of solved (except for edge cases, which you should try to identify!).

These methods/sensors are generally quite robust, like traditional GNSS with RTK, INS, and DVLs (Doppler velocity logs). You could argue a use case for ASVs in GPS-denied environments, but these are rare for your typical ASV application. Maybe if you’re travelling through tunnels, or in defense.

I’ve seen a couple papers where they used monocular cameras for autonomous docking. This makes sense, since you have clear, static physical features. I would try to find a concrete scenario where VSLAM specifically would perform better than current methods.

Lastly, are you just implementing traditional VSLAM on a boat? Or are you looking to design a new method (or tailor an existing one)? Maybe this calls for a neural network.


My wish is to implement something new.

Ah, so this is an interesting detail: does this mean you’re focusing less on a mapping/localizing task and more on an obstacle avoidance task? So in this context, using vision processing to aid an odometry estimation and obstacle detection/avoidance. At the risk of being pedantic, that’s different from VSLAM, so that may be a source of confusion here.

If that’s the case, we can narrow our focus a bit. For example: visual odometry is pretty widely researched, and so there’s a lot you can build off of there. But I imagine many of those approaches struggle with the noisy environments that you’re bound to encounter (from the waves, as previously mentioned). An approach that finds points for calculating VIO while remaining highly noise resistant would be very useful to the broader community: for example, I could imagine VIO being infeasible in human-robot contexts because of crowd dynamics; an approach that handles waves may also be useful here.
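
As a rough illustration of what "noise resistant" point selection could look like, here is a sketch that drops feature tracks whose frame-to-frame flow disagrees with the dominant motion. This is a generic median/MAD outlier filter that I am swapping in for illustration, not an established maritime VIO method; the function name and the `k` threshold are hypothetical, and it assumes the flow vectors `(dx, dy)` come from some feature tracker you already have.

```python
from statistics import median


def reject_flow_outliers(flows, k=3.0):
    """Keep (dx, dy) flow vectors within k scaled MADs of the median flow."""
    med_dx = median(dx for dx, _ in flows)
    med_dy = median(dy for _, dy in flows)
    # Distance of each flow vector from the median flow.
    residuals = [
        ((dx - med_dx) ** 2 + (dy - med_dy) ** 2) ** 0.5 for dx, dy in flows
    ]
    mad = median(residuals)
    # 1.4826 scales the MAD to a standard deviation for Gaussian noise;
    # the small epsilon keeps exact matches when mad == 0.
    threshold = k * 1.4826 * mad + 1e-9
    return [f for f, r in zip(flows, residuals) if r <= threshold]


# Four consistent tracks plus one track that landed on a moving wave crest.
flows = [(1.0, 0.0), (1.1, 0.0), (0.9, 0.0), (1.0, 0.1), (8.0, -5.0)]
inliers = reject_flow_outliers(flows)
```

A filter like this breaks down once more than about half the tracks sit on water, which is exactly the regime where a new contribution would be needed.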

Real-time object tracking despite hard-to-predict movements will likely be necessary for you to solve, since you’re dealing with sensing that’s bobbing around (and the object you’re tracking may also be bobbing around or in and out of waves). While folks have made object tracking much more robust using approaches that model object movement, this isn’t feasible if you can’t predict the object movement. If you can crack this, it would be very useful for a lot of robotics (and even plain video processing) applications.

Those are just two quick examples I can think of, but I’m sure there’s more out there with more digging. In my opinion, breaking the problem into these two sub-problems of VO + obstacle tracking might be the best approach.


From my personal experience driving watercraft, I often relied on the outline of the shore and islands to navigate (not using GPS). At times the shoreline can be entirely underexposed due to lighting conditions. Still, there are features to track, even if they are limited, so long as a shoreline is visible on the horizon.

One simple approach would be to use an AI perception function to mask water and sky out of the image, and apply the well-explored VIO approaches to the visual features that remain. Consider fusing multiple sources of odometry, including stereo and mono VIO, into a more robust prediction.
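
A minimal sketch of those two steps, under loud assumptions: the binary mask is hand-written here and would in practice come from a segmentation model (not shown), the function names are hypothetical, and the fusion is plain inverse-variance weighting of scalar estimates, which assumes the odometry sources are independent.

```python
def keep_static_keypoints(keypoints, mask):
    """Keep (row, col) keypoints where mask is 1 (shore/structure); drop water and sky (0)."""
    return [(r, c) for r, c in keypoints if mask[r][c] == 1]


def fuse_estimates(estimates):
    """Inverse-variance weighted fusion of (value, variance) pairs.

    Returns the fused value and its variance.
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, 1.0 / total


# Toy 3x4 mask: top row is sky, middle row is shoreline, bottom row is water.
mask = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
keypoints = [(0, 1), (1, 0), (1, 3), (2, 2)]
static_points = keep_static_keypoints(keypoints, mask)

# Hypothetical forward-displacement estimates (metres, variance) from stereo and mono VIO.
fused_value, fused_variance = fuse_estimates([(1.0, 1.0), (3.0, 1.0)])
```

The fused variance is never larger than the smallest input variance, which is the sense in which combining sources gives a more robust prediction; it stops holding if the sources share the same errors.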

Another would be to use AI perception for the path prediction, using recordings of prior navigation as training data. PilotNet is an efficient DNN for end-to-end driving of autonomous vehicles: it takes camera input and outputs steering, gas, and brake commands. Efficient as in we could do this on mobile GPUs in 2016. It could be an informative starting point for adaptation, as the principle is similar. You may need more cameras than the current plane-based design to augment the training data for pitch and roll, in addition to yaw, to account for waves. With this, it’s likely to handle drift from currents and waves.

Good luck :crossed_fingers:t6: as it seems an interesting problem.


Refer to ArduPilot; it has a solution for boats.