From my personal experience driving watercraft, I often relied on the outline of the shore and islands to navigate (not using GPS). At times the shoreline can be entirely underexposed due to lighting conditions. Even so, there are features to track, limited as they may be, as long as a shoreline is visible on the horizon.
One simple approach would be to leverage an AI perception model to mask water and sky out of the image, then focus on the visual features that remain using well-explored VIO approaches. Consider fusing multiple sources of odometry, including stereo and mono VIO, into a more robust prediction.
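To make the masking idea concrete, here is a minimal sketch of the filtering step. It assumes the segmentation model has already produced a per-pixel usability mask (True for land/structure, False for water or sky); the function name, shapes, and the toy data are placeholders, not a real API.

```python
def mask_features(keypoints, seg_mask):
    """Keep only keypoints that land on usable (non-water, non-sky) pixels.

    keypoints: list of (row, col) pixel coordinates from a feature detector.
    seg_mask:  2D list of bools from a segmentation model,
               True where the pixel is land/structure, False for water/sky.
    """
    return [(r, c) for (r, c) in keypoints if seg_mask[r][c]]

# Toy 4x6 "image": top two rows are sky, bottom row is water,
# so only row 2 (the shoreline band) is usable.
mask = [[False] * 6 for _ in range(4)]
mask[2] = [True] * 6

pts = [(0, 1),   # sky   -> dropped
       (2, 3),   # shore -> kept
       (3, 5)]   # water -> dropped
print(mask_features(pts, mask))  # [(2, 3)]
```

In practice you would push the mask further upstream rather than filter afterward; for example, OpenCV's `goodFeaturesToTrack` accepts a mask argument, so the detector never proposes features on water or sky in the first place.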
Another would be to use AI perception for path prediction directly, leveraging recordings of prior navigation as training data. PilotNet is an efficient DNN for end-to-end driving of autonomous vehicles from camera input, outputting steering commands; efficient in the sense that it could run on mobile GPUs back in 2016. It could be an informative starting point for adaptation, as the principle is similar. You may need more cameras than the current planar design to augment the training data for pitch and roll, in addition to yaw, induced by waves. With that, it is likely to handle drift from current and waves.
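For a sense of how small PilotNet is, its convolutional stack can be traced by hand. This sketch walks the valid-convolution output sizes of the five conv layers from the 2016 NVIDIA paper (66x200 input, three 5x5 stride-2 layers, then two 3x3 stride-1 layers), down to the flattened vector that feeds the fully-connected head ending in a single steering output:

```python
def conv_out(size, kernel, stride):
    # Output size of a valid (no-padding) convolution along one dimension.
    return (size - kernel) // stride + 1

# PilotNet conv stack: (kernel, stride) per layer, with output channel counts.
layers = [(5, 2), (5, 2), (5, 2), (3, 1), (3, 1)]
channels = [24, 36, 48, 64, 64]

h, w = 66, 200  # input image height x width
for (k, s), c in zip(layers, channels):
    h, w = conv_out(h, k, s), conv_out(w, k, s)
    print(f"{c:3d} feature maps of {h}x{w}")

flat = channels[-1] * h * w
print("flattened feature vector:", flat)  # 1152
```

That modest footprint is the point: retraining it on boat footage, with extra cameras covering pitch and roll, is computationally cheap compared to most modern perception stacks.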
Good luck; it seems an interesting problem.