
OpenCV AI Kit (OAK)

Is it possible to build robot control like that of Stretch with the OAK-D?
How does the OAK-D compare to the Intel RealSense (e.g. the D435i, which has an IMU)?
For example in frame rate, accuracy, compute power, and simplicity of integration.

And how would one build a system like that in ROS? Is there a general roadmap?


Yes, I think so. Would love to see it used in that application.

WRT the D435i: OAK-D’s premise is real-time spatial AI, whereas the RealSense cameras can’t do any neural inference onboard.

So here’s the lineage:
D4 chipset (Gen0): Intel D400 series (D410, D415, D435/i, D455): Depth only
Myriad 2 (Gen1): Intel T265: Depth + Tracking (i.e. great for SLAM)
Myriad X (Gen2): OpenCV AI Kit: Depth + AI + a whole slew of accelerated computer vision functions (H.265 encoding, stereo neural inference, etc.)

So to get the equivalent functionality of OAK-D you’d have to buy 3 things:

  1. 12MP camera
  2. Depth Camera (e.g. Intel D455)
  3. AI processor (e.g. NCS2).

And unlike previous depth or SLAM cameras, OAK-D isn’t for making maps of rooms or objects or for doing SLAM… its use-case is new:
giving real-time information on the physical location of objects (e.g. people, strawberries, fish and how big they are, seeds when seeding a farm, etc.).

It supports two ways of doing this:

Monocular Neural inference (e.g. off the color camera) fused with stereo depth. An example of that is below:
Spatial AI

Or stereo neural inference, like below:
Stereo AI

So the monocular AI + stereo depth is useful for objects, while the stereo AI can be used for objects or for features (e.g. facial landmarks, pose information, etc.)

Both modes support standard 2D models (since 3D training data is extremely limited compared to 2D data) and use the stereo cameras to produce 3D results all onboard.
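
If it helps to make that concrete, here is a rough sketch in plain Python of the underlying math: back-projecting a 2D detection plus a depth value into 3D camera coordinates with the pinhole model. To be clear, none of this is the DepthAI API - the function and all the numbers are purely illustrative.

```python
def pixel_to_3d(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with known depth into camera-space
    X/Y/Z using the pinhole model (focal lengths and principal point
    in pixels, depth and results in metres)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m

# Illustrative numbers only: a detection centered at (700, 420) in a
# 1280x800 frame, 2.0 m away, with a guessed 860 px focal length.
x, y, z = pixel_to_3d(700, 420, 2.0, fx=860, fy=860, cx=640, cy=400)
print(x, y, z)
```

This is essentially what "monocular AI fused with stereo depth" computes per detection, except it all happens onboard the Myriad X.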

Sorry that was a bit long but I hope it helps!



Brandon & Jon will demo OAK during the 2020-07-23T22:00:00Z call, which is open to all: ROS 2 Edge AI WG Meeting - Thursday July 23rd 15:00 PST (UTC-8). Fernando will demo ROS YOLACT, and we will review the results of the ROS user ML survey and make a plan.



Thanks @joespeed! It was great being able to share on the talk. Oh, and if anyone is interested in more strawberry picking, see below:

The student who is making this system is also going to record a version with a bunch of clutter/grass/etc. put into the scene while it picks, to show that it is indeed ML-based and not just hue-detection against a background. :slight_smile:


This sounds pretty clear. So it’s not suitable for doing SLAM. Is this a result of the optics chosen for the OAK-D? If so what, in particular, is the issue? Or is it something fundamental about the Myriad X (as opposed to the Myriad 2)?

So for SLAM, one would still have to choose e.g. a T265, rather than an OAK-D?


Great question @ghawkins.

TL;DR: It’s the optics. Wide FOV (160-deg.) is great for SLAM. We have narrow FOV (70 deg). You could use an OAK-D as-is for SLAM, similar to how you could use an Intel D455 for SLAM, but likely the T265 will outperform both (disclaimer: I don’t really know, just a guess).

Myriad X actually has a ton of features that enable great SLAM that Myriad 2 doesn’t (super-low-latency hardware-accelerated depth, hardware-accelerated feature tracking and optical flow, etc).

So using wide FOV optics on OAK-D would make it an insanely good SLAM camera.

So… I may have overstated it when I said it’s not suitable for doing SLAM. What I should have said is that it was not architected with SLAM in mind (narrow optics). The Intel T265, in contrast, was architected to be a simultaneous localization and mapping device.

And yes, you nailed it on the optics WRT being optimized for SLAM (or not). The main difference is that the Intel T265 has fisheye lenses optimized for a wide field of view (163 degrees IIRC) so as to optimize optical tracking - the more of the world around you that you see, the better the tracking results - whereas our grayscale cameras are optimized for a comparatively narrow field of view (~70 degrees) to produce better spatial neural inference results, i.e. object or feature positions in 3D space.
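
To put rough numbers on that trade-off (all figures below are my own illustrative assumptions, including the ~7.5 cm baseline), a narrower FOV means a longer focal length in pixels, and stereo depth follows Z = f·B/d:

```python
import math

def focal_px(width_px, hfov_deg):
    # Pinhole model: f = (W / 2) / tan(HFOV / 2)
    return (width_px / 2) / math.tan(math.radians(hfov_deg) / 2)

def depth_m(f_px, baseline_m, disparity_px):
    # Stereo depth: Z = f * B / d
    return f_px * baseline_m / disparity_px

f_narrow = focal_px(1280, 70)    # ~914 px for a 70-deg lens
f_wide = focal_px(1280, 155)     # ~142 px for a 155-deg fisheye

# The same 10 px of disparity corresponds to very different depths:
print(depth_m(f_narrow, 0.075, 10))
print(depth_m(f_wide, 0.075, 10))
```

So a 1-px disparity error costs far more depth accuracy on the wide lens, which is (roughly) why narrow optics suit spatial inference and wide optics suit tracking.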

I’m also FAR from knowledgeable about SLAM… so someone who’s great at it might come in here and say: “What the heck are you talking about? OAK-D as-is is ideal for SLAM for reasons X, Y, and Z”.

One thing I do know is that the capability to run AI onboard, even in SLAM use-cases, can be quite powerful for ignoring moving objects/etc. and so producing better localization results in the presence of people, forklifts, other robots, etc.

And in fact, OAK-D uses the exact same image sensor (OV9282) as the T265… so it really is as simple as different optics making the device optimized for SLAM or not: wide FOV is desirable for SLAM applications, while narrow FOV is favorable for real-time 3D object/feature neural inference applications.

One other thing to note is that since OAK-D is open source, it is possible to use other camera modules - say ones like those used in the T265 - in which case OAK-D would be a top-notch SLAM+AI camera. For example, here is an OV9281 module which would allow fisheye lenses, and there are some options for direct-integrated fisheye as well.

And some companies have reached out about making OAK-D variants (based off our open source hardware here) with wide-angle OV9282-based cameras that these companies already produce. And we’re super excited for them to do so as it’s part of our goal with open sourcing - to allow folks to leverage all this and go optimize for other use-cases we either didn’t focus on or couldn’t even think of.


I hope that helps!



Wow - thanks @Luxonis-Brandon - I never expected such a detailed reply!

I only have experience with the Sony IMX219 sensor used in the cheaper wide-angle cameras for the Jetson Nano (e.g. this 160° one from Waveshare).

One cool option might have been to make it possible to swap out the sensor and optics. Despite its low price, this is possible with the current (V2) standard Raspberry Pi camera like so:

So companies like Waveshare sell alternative modules with different optics and sensors (e.g. this 160° module) that you can snap on and off the main camera PCB as needed.

Something for the OAK-D mark 2 :smile:


I would highly advocate for M12 lens mounts in the next version, or at least an option for them. I love M12 lenses as you get all the functionality of a C-mount lens at a fraction of the price. The diversity of options has really proliferated in the past couple of years. The ability to add filters also has advantages for some applications (e.g. autonomous boats). The only difficulty with using M12s is that they would require re-calibration for stereo reconstruction and SLAM. That’s not a huge deal, but it makes things not work “out of the box.”

Having said that, thanks to cell phones, we have some pretty great surface mount sensors at this point so the current cameras probably cover 80% of applications.


Thanks both! Totally agree with the comments here.

So serendipitously, ArduCam actually reached out to offer to support other camera modules. And so, with a crazy-short turnaround afforded by ArduCam, I am excited to mention that we are already working on support for everything requested in this thread:

So this means we can make an OAK-D variant that is entirely M12-mount compatible! How cool!

M12-mount OV9281 as used on an existing ArduCam dual-camera solution:

155-degree HFOV fisheye for enabling SLAM use-cases:

So @ghawkins, with this 155° HFOV module on OAK-D, this would be the ideal SLAM solution you were asking about.

And @Katherine_Scott, this combination allows the awesome flexibility you mention from the M12-mount capability, while also allowing pre-calibrated camera intrinsics for SLAM applications with the integrated 155° HFOV fisheye module.




Wow - ArduCam has changed a lot over the years. I’m embarrassed to admit that the last time I looked at them (several years ago) it was in relation to mid-range optics that were extremely competitive price-wise with the kind of AI-oriented lenses/sensors available at the time, but were still somewhat on the pricey side for people whose company/institution wasn’t paying for things.

On looking now, I see they offer a huge range of cameras/sensors for many different platforms (almost a bewilderingly large range - it’s a little difficult to get an overview).

I added a comment to the GitHub issue you created. I couldn’t find the OV9282 module you mentioned (maybe it’s still in development?). The nearest I could find was this US$30 170° OV9281 based camera.


You are right, we have made a lot of progress in the past few years and there are lots of new things coming out. Yes, it is a diagonal 166-degree lens, almost fisheye, with no black corners. This is the very module I showed to Brandon last night.
I also mentioned it has only 1-lane MIPI output, which can only handle half of the frame rate (60 fps) stated in the datasheet (120 fps). I think that is still good enough for stereo 60 fps performance on the OAK-D board.
We will work with DepthAI to port a 1-lane MIPI driver to make it work on the OAK-D board.
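
For anyone curious about the arithmetic behind the 1-lane limit, here is a back-of-the-envelope check. The 800 Mbps/lane rate and 80% link efficiency are my own rough assumptions, not ArduCam’s figures; the 1280x800 10-bit RAW format is the OV9281/OV9282’s native one:

```python
def max_fps(width, height, bits_per_px, lanes, lane_mbps, efficiency=0.8):
    # Usable link throughput divided by the bits in one RAW frame
    usable_bps = lanes * lane_mbps * 1e6 * efficiency
    bits_per_frame = width * height * bits_per_px
    return usable_bps / bits_per_frame

print(max_fps(1280, 800, 10, lanes=1, lane_mbps=800))  # roughly 60 fps
print(max_fps(1280, 800, 10, lanes=2, lane_mbps=800))  # roughly 120 fps
```

Which lines up with the halved ~60 fps figure mentioned above: one lane carries about half the frames two lanes would.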


Thanks both! Yes, I had incorrectly stated OV9282 for the fisheye whereas it is actually OV9281 - but as far as I know they are extremely similar, with the only differences being the packaging and the chief ray angle (CRA).

From here:

The OV9281 has a chief ray angle (CRA) of 9 degrees and comes in a chip scale package (CSP). The OV9282 features a CRA of 27 degrees and is available in a reconstructed wafer (RW) format. Both sensors are currently available in volume production.

I have to admit I don’t know enough about CRA to know how this would flow to SLAM/etc. Anyone here happen to know?

Two M12-lens ArduCam camera modules, for those who need to import them into Altium for evaluation:
Arducam 3d model for drop-in replacement IMX219 camera module

Arducam 3d model for OV2311 camera module

As far as I know, the die-package sensors are designed for mobile devices, which have strict low-profile requirements. The light enters the lens at a steep angle in order to cover the whole sensor surface, so the sensor’s CRA (set by the micro-lenses on top of the sensor) should match the CRA of the lens on the module. That’s why the OV9282 is designed with a bigger CRA (Chief Ray Angle), although the electrical performance is the same as the OV9281’s.
Actually, the fisheye module uses the OV9282 - sorry for the confusion.


Thanks for both! We’ll be trying these out in Altium today.

Fits almost perfectly. Just happenstance that the existing mounting hole fits with the lens housing mounting hole, but nice! We could make room for this on the design pretty easily, I think. Would need to move a few passive components and punch a hole (doesn’t need to be plated or have annular ring).

This is just the physical M12 housing - the connector/MIPI pinout would need to be figured out. Likely shorter cables and then either changing the connector on OAK-D, or if possible changing the connector on the ArduCam module.



@ArduCAM @Luxonis-Brandon We’re calling y’all the postmen, because y’all deliver! Expedited shipping.


Now to figure out an open source calibration routine. I think the camera calibration packages could probably use some ROS 2 love.
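
One small, testable building block of any OpenCV-style calibration routine is generating the checkerboard’s 3D object points for each captured view. Here is a hedged pure-Python sketch - the board dimensions and square size are just example values:

```python
def checkerboard_object_points(cols, rows, square_size_m):
    """3D coordinates (on the Z=0 board plane) of a checkerboard's
    inner corners, in the per-view layout that calibration code such
    as cv2.calibrateCamera expects."""
    return [(c * square_size_m, r * square_size_m, 0.0)
            for r in range(rows) for c in range(cols)]

# A 9x6 inner-corner board with 25 mm squares -> 54 points per view
pts = checkerboard_object_points(9, 6, 0.025)
print(len(pts))
```

The real work - corner detection, solving intrinsics/extrinsics, stereo rectification - would still sit on top of OpenCV (or a ROS 2 port of camera_calibration).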


Heh. Thanks, and agreed WRT ArduCam. Very much looking forward to working with ArduCam on this. The NDVI applications alone, afforded by lenses with filters, will be super cool.


Hi guys, I’ve checked in a working first draft of ROS2 support for the OpenCV AI Kit. It can be found here along with some setup instructions:

It’s essentially a ROS2 wrapper for the Python interface defined here: it’ll broadcast a topic for each stream specified with the cliArgs parameter, and it will also take any input parameter that the underlying Python interface will take. The included demoListen component can be used as an example of how to receive those topics in your own ROS2 nodes.

For a more detailed list of arguments that can be passed see the help or add ‘help’ as a cliArg to the depthai_wrapper talker component.

I’ll be offline for about a week but am interested in any comments and opinions you guys might have. I’m pretty new to both ROS2 and Python so I’d very much welcome suggestions and criticism.

Thanks for reading!


A new search about OAK - the search was “OAK FISHEYE” - brought me to this thread. The reason for the search is that I’m starting a small robotics company focused on affordable R&D robots. My first product uses a D435, a T265 and an AI accelerator; its main feature is that it is AI- and SLAM-capable.
My second, more capable product uses from 1 to 4 ArduCam IMX477 cameras, a D455 and a T265.

They are customized to each customer’s requirements, so a while ago I looked at making a platform based on OAK, as I knew it was going to be successful and some customer would probably ask for it. Unfortunately, I found that with that FOV it could not be integrated into any of my mobile robots as the main sensor.
Two months later I searched again, in case some mod or alternative version had been released in the meantime - and what a surprise!

I’m very happy with what I’m reading here: happy because ArduCam will have hands on it, happy that the decision is spot on for robotics requirements, and very happy with how the OAK team is listening. Just great.

If you allow me, I would like to add that a dynamic calibration feature would be desirable - please check this feature of the RealSense. It is also desirable that the RGB camera be pixel-to-pixel aligned with the depth frame and, very importantly, that it have a global shutter - just like the D455. These are quite challenging features, but I think they deserve at least a study of their feasibility. An input via I2C would also be great (for example, to correct or receive external odometry), as would an external synchronization input, if there isn’t one, for multi-camera setups.
Looking forward to the new modification being released so I can build a prototype and offer it to my future customers.
These features could turn OAK into a software-defined camera - going, for example, from a spatial object detection camera to a SLAM camera, a tracking camera, a CV camera and, who knows, with the right hardware, a semantic SLAM camera (for which dynamic calibration would be great - well, almost necessary).

Best wishes,

Andrés Camacho
