How do y’all model sensor noise when simulating depth/RGBD cameras?
I’m simulating a depth camera in Unity by reading directly from the depth buffer of a rendering camera. This approach works well, but the depth data is too good. My first idea for adding some realistic noise is to just generate random Gaussian noise for each pixel, potentially weighting this noise by the dot product between the camera rays and the surface normals, such that surfaces orthogonal to the camera view give accurate measurements and surfaces that are close to parallel with the camera beams give noisy measurements.
This shouldn’t be too hard to implement, but it’s just a model I made up, more or less. From experience I know that these sensors struggle along edges, so that could be something to weight as well. I’m just curious if anyone has done this sort of thing before.
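For the record, the weighting part of that idea would look something like this in a fragment shader (rough, untested sketch; how the standard-normal sample gets generated and what the base sigma should be are still open questions):

```hlsl
// depth  : clean depth read from the depth buffer (metres)
// normal : surface normal at the pixel (any space, as long as viewDir matches)
// viewDir: direction from the camera to the surface point
// gauss  : a standard-normal random sample (generated however)
// sigma0 : base noise standard deviation (tuning knob)
float NoisyDepth(float depth, float3 normal, float3 viewDir, float gauss, float sigma0)
{
    // |cos| of the angle between the view ray and the surface normal:
    // 1 = surface faces the camera head-on, 0 = grazing incidence.
    float incidence = abs(dot(normalize(normal), normalize(viewDir)));

    // Accurate when head-on, noisy when the surface is nearly parallel
    // to the camera ray.
    float sigma = sigma0 * (1.0 - incidence);

    return depth + sigma * gauss;
}
```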
I have not! One complication is that it needs to run constantly, so I’ll be implementing it in shader code… For simple stuff like Gaussian noise this isn’t a big problem (although random number generation on GPUs is… weird), but I wonder how parallelizable the fancier stuff is. I’ll have a closer look at it.
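For what it’s worth, the way I’m thinking of sidestepping the RNG weirdness is a stateless integer hash per pixel, along the lines of the PCG-style hash that shows up in GPU noise write-ups. Sketch only; the resolution constants baked into the seed are placeholders:

```hlsl
// Stateless per-pixel RNG: hash the pixel coordinates plus a frame counter
// into a (roughly) uniform float in [0, 1).
uint PcgHash(uint state)
{
    state = state * 747796405u + 2891336453u;
    uint word = ((state >> ((state >> 28u) + 4u)) ^ state) * 277803737u;
    return (word >> 22u) ^ word;
}

float Rand01(uint2 pixel, uint frame)
{
    // Placeholder 1920x1080 indexing; use the real render target size.
    uint seed = pixel.x + 1920u * pixel.y + frame * 2073600u;
    return PcgHash(seed) / 4294967296.0;
}
```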
The basic additive Gaussian noise is practically useless, as it is very far from the real noise. What we did is implement multiplicative Gaussian noise, so that the further away the measurements are, the more noise is applied. But this is still very basic. You could also add some spatial coherence to the noise, since neighboring pixels usually have strongly correlated noise (driving it further from IID Gaussian).
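For stereo-based depth there is a concrete reason the error grows with range: depth is z = f·b/d for focal length f, baseline b and disparity d, so a roughly constant disparity error σ_d turns into a depth error of about σ_z ≈ z²·σ_d/(f·b). In other words it grows quadratically with distance, which is even harsher than a plain multiplicative model.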
As an extreme, you could also actually render the 2 images and run stereo on them =)
I have gone down the “extreme” route of rendering the stereo images, running a stereo algorithm on them to compute the disparity, and publishing a depth image computed from the disparity image. I was emulating a RealSense, so I also had a projection plugin to create a similar dot pattern (I borrowed this idea from a PR2 simulation project I had seen on GitHub). This gave a very high fidelity simulation of the depth camera. But there are pitfalls to watch out for if you go down that path:
Unlike the rendered depth, the realism of the stereo depth depends somewhat on your simulation environment. Part of the noise from a real depth camera comes from stereo matching failures, so if your sim environment is too easy for the stereo algorithm, you might not see as many of these failures as you do in the real world.
The CPU overhead is significant (I have not gotten to try GPU implementations yet; maybe that would help. GPU certainly helps the rendered depth). So we have a flag to switch between this high-fidelity stereo depth plugin and a rendered depth plugin with some additive Gaussian noise, and we only use the higher-fidelity model when we are testing or prototyping in a scenario where realistic stereo depth camera noise is impactful.
Hmm yeah, this is interesting, thank you! I’m working on a shader to produce these textures in real time. I think I’ll just go with multiplicative Gaussian noise for now (Box-Muller transform), and maybe look at blurring/neighbour coherence later.
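For anyone curious, the core of the shader I’m writing looks roughly like this (sketch; `rand01` is just a quick hash, and `sigmaPerMetre` is a made-up tuning knob, not something from a datasheet):

```hlsl
// Quick-and-dirty per-pixel uniform in [0, 1); swap in a better hash if needed.
float rand01(float2 seed)
{
    return frac(sin(dot(seed, float2(12.9898, 78.233))) * 43758.5453);
}

// Box-Muller: two independent uniforms -> one standard-normal sample.
float BoxMuller(float u1, float u2)
{
    u1 = max(u1, 1e-6); // avoid log(0)
    return sqrt(-2.0 * log(u1)) * cos(6.2831853 * u2);
}

// Multiplicative noise: the standard deviation grows linearly with range.
// depth         : clean depth in metres
// sigmaPerMetre : e.g. 0.01 for 1 cm of std-dev per metre of range
float ApplyRangeNoise(float depth, float2 uv, float sigmaPerMetre)
{
    float gauss = BoxMuller(rand01(uv), rand01(uv + 0.317));
    return depth * (1.0 + sigmaPerMetre * gauss);
}
```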
What about lens distortion? What’s a good way to model this? I made a lens distortion shader that applies the Brown-Conrady/plumb_bob distortion to an RGB camera texture. Is it reasonable to just apply the same shader to the depth texture, as I’m doing here?
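Concretely, the remap is the standard plumb_bob polynomial applied to the texture lookup. A trimmed-down sketch, not the full shader (fx/fy/cx/cy and k1/k2/p1/p2 are the usual intrinsics and distortion coefficients, and I’m ignoring k3 and higher terms):

```hlsl
// Map a uv coordinate through the Brown-Conrady / plumb_bob model.
// The same remapped uv is used to sample both the RGB and the depth texture.
float2 DistortUV(float2 uv, float2 texSize,
                 float fx, float fy, float cx, float cy,
                 float k1, float k2, float p1, float p2)
{
    // pixel coords -> normalised camera coords
    float2 pix = uv * texSize;
    float x = (pix.x - cx) / fx;
    float y = (pix.y - cy) / fy;

    float r2 = x * x + y * y;
    float radial = 1.0 + k1 * r2 + k2 * r2 * r2;

    float xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x);
    float yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y;

    // normalised camera coords -> back to uv
    return float2(xd * fx + cx, yd * fy + cy) / texSize;
}
```

One thing worth noting: when warping the depth texture, sample it with point/nearest filtering rather than bilinear, otherwise the filter blends depths across object edges and invents geometry that isn’t there.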
this depends on your camera type.
TL;DR when you go to actually use the depth, you will have to undistort it prior to projecting it into 3d space. but for your simulation, it depends. you are basically just choosing a format for your depth image at this point. some sensors provide the depth as either distorted or undistorted depending on how they get the depth OR if they are aligning the depth to the distorted rgb image.
in the case of a stereo camera, the input images are always undistorted prior to computing depth, so the output is also not distorted.
in the case of time of flight rgbd cameras (e.g. azure kinect dk and orbbec femto), the output depth may have its own distortion. usually this is modeled just like the rgb camera’s distortion (e.g. brown conrady) and that spec is given by the camera’s sdk.
although sometimes, you might want to align that depth to your rgb image, so you would apply the same distortion as your rgb camera. So in the example you have here, it would be like your program is just automatically generating a depth image that is aligned to and distorted like the rgb image. which is reasonable imo.
Thanks for the insight! It’s so tempting to dive into how to fully simulate stereo cameras on the GPU just based on two RGB feeds, but sadly I don’t think I have time for that right now.
I guess the utility in how I’m doing distortion now is that the user of the simulator can attempt to figure out how to undistort the image without looking at the actual parameters in the simulator. Maybe that identification is trivial, I don’t know, but hopefully it can still be useful in some sense. Either way, they’ll get the depth distortion for free if they identify the RGB distortion, which I think is fine.
It’s easy to enable/disable, and the actual computation cost is basically negligible with shaders.
It seems even the multiplicative Gaussian error is quite far from reality. I’ve just stumbled upon a very interesting article from Luxonis, where they show that because of the integer-valued nature of disparity matching, the error looks more like a distorted sine wave when plotted against true distance. This would explain the “wavy walls” effect I always see in stereo cams.
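If you want to fake that effect in a renderer-based pipeline without running actual stereo, one cheap approximation is to convert the clean depth to disparity, quantise it to the sensor’s disparity resolution, and convert back. A sketch (the focal length, baseline and sub-pixel bits are whatever your emulated camera uses):

```hlsl
// Approximate the staircase error caused by quantised disparity matching:
// depth -> disparity -> quantise -> depth.
// focalPx    : focal length in pixels
// baselineM  : stereo baseline in metres
// subpixBits : e.g. 3 for 1/8-pixel disparity resolution
float QuantisedDepth(float depth, float focalPx, float baselineM, float subpixBits)
{
    float steps = exp2(subpixBits);                        // disparity steps per pixel
    float disparity = focalPx * baselineM / max(depth, 1e-4);
    float quantised = round(disparity * steps) / steps;
    return focalPx * baselineM / max(quantised, 1e-4);
}
```

This only reproduces the staircase part of the error curve, not the smoother wave shape the article shows, but it already gives you visibly banded flat surfaces.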
Oooh, that’s interesting. It makes a lot of sense, the wavy walls. And of course, the multiplicative Gaussian noise is a massive simplification/assumption. Also, due to the nature of stereo cameras you get these “shadows” of zero confidence, where the object is occluded from the perspective of one of the cameras. This is another major effect of real stereo that I’m not really able to generate artificially (not easily, anyway).