Consider adding video. Landmark navigation works well over longer distances and in outdoor environments. After all, this is what humans do when driving a car: they look out the window, see that big green building, and know that is where to make the left-hand turn. I think vision works best at a certain scale, and lidar is best for getting through a doorway without hitting the jambs.
There is another class of vision-based navigation that works differently: it converts stereo 3D images into simulated lidar data. I am not talking about that. I mean recognizing landmarks and entering them into a list; then, when a landmark is recognized again, the list is consulted. This can be robust when multiple landmarks are in sight and the location is determined by triangulation.
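To make the landmark-list idea a bit more concrete, here is a minimal sketch of the triangulation step, assuming the robot's heading is already known (compass or odometry) so each sighting gives an absolute bearing from robot to landmark. The names `locate`, `observations`, and `landmark_map` are made up for illustration. Each sighting says the robot lies on a line through the landmark, and with two or more landmarks in different directions a small least-squares solve gives the position.

```python
import math


def locate(observations, landmark_map):
    """Estimate robot (x, y) from absolute bearings to recognized landmarks.

    observations: {landmark_name: bearing_rad}, measured from the robot to the
        landmark in the map frame (assumes heading is already known).
    landmark_map: {landmark_name: (x, y)}, the hypothetical landmark list built
        when each landmark was first recognized.
    """
    # Accumulate the 2x2 normal equations A^T A p = A^T b.
    a11 = a12 = a22 = b1 = b2 = 0.0
    used = 0
    for name, theta in observations.items():
        if name not in landmark_map:
            continue  # not in the list yet, so it cannot help localize
        lx, ly = landmark_map[name]
        s, c = math.sin(theta), math.cos(theta)
        # Robot lies on the line through the landmark along the bearing:
        # s*x - c*y = s*lx - c*ly
        rhs = s * lx - c * ly
        a11 += s * s
        a12 -= s * c
        a22 += c * c
        b1 += s * rhs
        b2 -= c * rhs
        used += 1
    det = a11 * a22 - a12 * a12
    if used < 2 or abs(det) < 1e-9:
        raise ValueError("need at least two known landmarks in different directions")
    x = (a22 * b1 - a12 * b2) / det
    y = (a11 * b2 - a12 * b1) / det
    return x, y
```

With noisy real bearings, more landmarks in view pull the least-squares answer toward the truth, which is one reason having several landmarks in sight makes this more robust. A single landmark only constrains you to a line, so the sketch refuses to answer.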
The same camera data can also feed other algorithms for “visual odometry,” possibly using optical flow.
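As a rough illustration of the visual odometry idea, here is a toy sketch that swaps real optical flow for brute-force block matching: it finds the single integer image shift between two grayscale frames that minimizes the mean absolute difference. Real optical flow (e.g. Lucas-Kanade) estimates sub-pixel, per-feature motion; this only recovers one global translation. `accumulate` assumes a hypothetical downward-facing camera with its image axes aligned to the robot and a known meters-per-pixel scale.

```python
def estimate_shift(prev, curr, max_shift=3):
    """Global (dx, dy) image translation between two grayscale frames.

    Brute-force block matching: try every integer shift up to max_shift and
    keep the one with the lowest mean absolute difference over the overlap.
    Returns (dx, dy) such that curr[y][x] ~= prev[y - dy][x - dx].
    """
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            sad, count = 0, 0
            # Only compare pixels that exist in both frames under this shift.
            for y in range(max(0, dy), min(h, h + dy)):
                for x in range(max(0, dx), min(w, w + dx)):
                    sad += abs(curr[y][x] - prev[y - dy][x - dx])
                    count += 1
            score = sad / count
            if best is None or score < best[0]:
                best = (score, dx, dy)
    return best[1], best[2]


def accumulate(shifts, meters_per_pixel):
    """Sum per-frame shifts into a 2D track (hypothetical downward camera).

    Assumes image axes line up with the robot's axes; the ground appears to
    move opposite to the robot, hence the sign flip.
    """
    x = y = 0.0
    for dx, dy in shifts:
        x -= dx * meters_per_pixel
        y -= dy * meters_per_pixel
    return x, y
```

Summing small per-frame estimates like this drifts over time, which is why it pairs well with occasional landmark fixes.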
Then, as the robot approaches a wall or doorway, the lidar data becomes useful again.
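One way to act on the "vision far away, lidar up close" split is a simple mode switch keyed on the nearest lidar return. This is only a sketch under my own assumptions: the class name and the 1.5 m / 2.5 m thresholds are hypothetical, and the gap between them is hysteresis so the robot does not flap between modes while hovering near one distance.

```python
import math


class LocalizerSelector:
    """Pick 'vision' or 'lidar' localization from the nearest lidar return."""

    def __init__(self, enter_lidar_m=1.5, exit_lidar_m=2.5):
        # Hypothetical thresholds; exit > enter gives hysteresis.
        self.enter_lidar_m = enter_lidar_m
        self.exit_lidar_m = exit_lidar_m
        self.mode = "vision"

    def update(self, lidar_ranges):
        # Ignore dropouts (0 or inf/nan), which many scanners report.
        valid = [r for r in lidar_ranges if math.isfinite(r) and r > 0.0]
        nearest = min(valid) if valid else math.inf
        if self.mode == "vision" and nearest < self.enter_lidar_m:
            self.mode = "lidar"
        elif self.mode == "lidar" and nearest > self.exit_lidar_m:
            self.mode = "vision"
        return self.mode
```

In practice you might blend both estimates rather than hard-switch, but a switch like this is the simplest version of the idea.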
That said, a lidar upgrade is conceptually simpler.