OpenCV AI Kit (OAK)

Hi Everyone,

Brandon from Luxonis here, one of the many folks behind the OpenCV AI Kit!

So we wanted to announce here that our campaign for the kit is live!


TLDR: AI + depth in a small, modular, and embeddable device.

We have prototyped (proof-of-concept'ed) initial support for ROS in our host driver (and some alpha testers are actually using it in production!), and we will be delivering full support by the time the Kickstarter hardware ships.
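To give a flavor of the ROS side, here is a minimal sketch of a node consuming OAK streams. The topic names (`/oak/rgb/image`, `/oak/depth/image`) are hypothetical placeholders, not the final driver interface:

```python
import rospy
from sensor_msgs.msg import Image

# Sketch of a ROS node consuming OAK streams. The topic names are
# hypothetical placeholders; the real driver's interface may differ.

def on_rgb(msg):
    rospy.loginfo("rgb frame %dx%d", msg.width, msg.height)

def on_depth(msg):
    rospy.loginfo("depth frame %dx%d", msg.width, msg.height)

rospy.init_node("oak_listener")
rospy.Subscriber("/oak/rgb/image", Image, on_rgb)
rospy.Subscriber("/oak/depth/image", Image, on_depth)
rospy.spin()
```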

Please feel free to ping us with any and all questions!

Thanks,
Brandon Gilles
OpenCV AI Kit Team

13 Likes

So we hit $250k, which means OAK-D will now include an integrated IMU for all the backers!

2 Likes

Do you perhaps mean “will now include”? :stuck_out_tongue:

1 Like

Oh yes thank you! Just edited!

1 Like

Now that it includes an IMU, will the IMU be hardware synchronized with the cameras?

2 Likes

Great question, thanks. Yes, we will be timestamping at MIPI receipt, so it will be synced to the microsecond level.
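For anyone wondering how you'd consume that on the host, here is the idea in miniature (a generic nearest-timestamp-matching sketch, not our driver's actual API):

```python
import bisect

# Generic nearest-timestamp matching: pair each camera frame with the IMU
# sample whose device timestamp is closest. This works because both streams
# are stamped from the same clock at MIPI receipt.

def nearest_imu(frame_ts_us, imu_ts_us):
    """imu_ts_us must be sorted; returns index of the closest IMU sample."""
    i = bisect.bisect_left(imu_ts_us, frame_ts_us)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(imu_ts_us)]
    return min(candidates, key=lambda j: abs(imu_ts_us[j] - frame_ts_us))

imu_ts = [0, 2500, 5000, 7500, 10000]  # e.g. a 400 Hz IMU, in microseconds
print(nearest_imu(3300, imu_ts))       # -> 1 (the 2500 us sample is closest)
```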

2 Likes

Huge fan of the work @Luxonis-Brandon and team are doing, I backed it!

Offloading edge compute from the host CPU to a small, modular, and embeddable device will enable much safer, less expensive robotic solutions.

Keep up the great work.

2 Likes

Thanks @SeeBQ ! (And cool to see you on here!)

2 Likes

Is it possible to build robot control like the https://hello-robot.com STRETCH with OAK-D?
How does OAK-D compare to the Intel RealSense (there is the RealSense D435i with IMU)? Maybe frame rate, accuracy, compute power, simplicity of integration.

How would one build a system like that in ROS - a general roadmap?

1 Like

Yes, I think so. Would love to see it used in that application.

WRT the D435i: OAK-D's premise is real-time spatial AI, whereas the RealSense cameras can't do any neural inference.

So here's the lineage:

  1. D4 chipset (Gen0): Intel D400 series (D410, D415, D435/i, D455): depth only
  2. Myriad 2 (Gen1): Intel T265: depth + tracking (i.e. great for SLAM)
  3. Myriad X (Gen2): OpenCV AI Kit: depth + AI + a whole slew of accelerated computer vision functions (H.265 encoding, stereo neural inference, etc.)

So to get the equivalent functionality of OAK-D you'd have to buy 3 things:

  1. A 12MP camera
  2. A depth camera (e.g. Intel D455)
  3. An AI processor (e.g. NCS2)

And unlike previous depth or SLAM cameras, OAK-D isn't for making maps of rooms or objects or for doing SLAM… its new use-case is giving real-time information on the physical location of objects (e.g. people, strawberries, fish - and how big they are - seeds when seeding a farm, etc.).

It supports two ways of doing this:

Monocular neural inference (e.g. off the color camera) fused with stereo depth. An example of that is below:
Spatial AI

Or stereo neural inference, like below:
Stereo AI

So the monocular AI + stereo depth is useful for objects, while the stereo AI can be used for objects or for features (e.g. facial landmarks, pose information, etc.).

Both modes support standard 2D models (since 3D training data is extremely limited compared to 2D data) and use the stereo cameras to produce 3D results all onboard.
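To make that concrete, here's a sketch of the first mode (monocular AI + stereo depth) in code. It's written against the depthai Gen2 pipeline Python API rather than anything final, and "mobilenet.blob" is a placeholder path for a compiled detection model:

```python
import depthai as dai

# Minimal spatial-detection pipeline: monocular neural inference on the
# color camera, fused on-device with stereo depth.
pipeline = dai.Pipeline()

cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(300, 300)          # MobileNet-SSD input size
cam.setInterleaved(False)

left = pipeline.create(dai.node.MonoCamera)
right = pipeline.create(dai.node.MonoCamera)
left.setBoardSocket(dai.CameraBoardSocket.LEFT)
right.setBoardSocket(dai.CameraBoardSocket.RIGHT)

stereo = pipeline.create(dai.node.StereoDepth)
stereo.setDepthAlign(dai.CameraBoardSocket.RGB)  # align depth to color
left.out.link(stereo.left)
right.out.link(stereo.right)

nn = pipeline.create(dai.node.MobileNetSpatialDetectionNetwork)
nn.setBlobPath("mobilenet.blob")      # placeholder model path
nn.setConfidenceThreshold(0.5)
cam.preview.link(nn.input)
stereo.depth.link(nn.inputDepth)

xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("detections")
nn.out.link(xout.input)

with dai.Device(pipeline) as device:
    q = device.getOutputQueue("detections")
    while True:
        for det in q.get().detections:
            # X/Y/Z of the detected object in millimeters, camera-centered
            print(det.label, det.spatialCoordinates.x,
                  det.spatialCoordinates.y, det.spatialCoordinates.z)
```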

Sorry that was a bit long but I hope it helps!

Thanks,
Brandon

5 Likes

Brandon & Jon will demo OAK during the 2020-07-23T22:00:00Z call, which is open to all: ROS 2 Edge AI WG Meeting - Thursday July 23rd 15:00 PST (UTC-8). Fernando will demo ROS YOLACT, we will review the results of the ROS user ML survey, and we will make a plan.

1 Like

Thanks @joespeed! It was great being able to share during the talk. Oh and if anyone is interested in more strawberry picking, see below:
https://youtu.be/Okjh2OCP-o8

The student who is making this system is also going to record a version with a bunch of clutter/grass/etc. being put into the scene while it picks, to show that it is indeed ML-based and not just hue-detection against a background. :slight_smile:

3 Likes

This sounds pretty clear. So it’s not suitable for doing SLAM. Is this a result of the optics chosen for the OAK-D? If so what, in particular, is the issue? Or is it something fundamental about the Myriad X (as opposed to the Myriad 2)?

So for SLAM, one would still have to choose e.g. a T265, rather than an OAK-D?

1 Like

Great question @ghawkins.

TL;DR: It's the optics. Wide FOV (160 deg) is great for SLAM; we have narrow FOV (70 deg). You could use an OAK-D as-is for SLAM, similar to how you could use an Intel D455 for SLAM, but likely the T265 will outperform both (disclaimer: I don't really know, just a guess).

Myriad X actually has a ton of features that enable great SLAM that Myriad 2 doesn't (super-low-latency hardware-accelerated depth, hardware-accelerated feature tracking and optical flow, etc.).

So using wide FOV optics on OAK-D would make it an insanely good SLAM camera.

So… I may have overstated it when I said it's not suitable for doing SLAM. What I should have said is that it was not architected with SLAM in mind (narrow optics); the Intel T265, in contrast, was architected to be a simultaneous localization and mapping device.

And yes, you nailed it on the optics WRT being optimized for SLAM (or not). The main difference is that the Intel T265 has fisheye lenses optimized for a wide field of view (163 degrees IIRC) so as to optimize optical tracking (the more of the world around you that you see, the better the tracking results), whereas our grayscale cameras are optimized for a comparatively narrow field of view (~70 degrees) to produce better spatial neural inference results - i.e. object or feature positions in 3D space.
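To put rough numbers on that trade-off, here is a back-of-envelope sketch. The assumptions are mine, not measured specs: a 1280-pixel-wide image, uniform pixels-per-degree (real lens distortion breaks this, especially for fisheye), and the standard stereo depth-error model:

```python
import math

# Angular resolution and focal length for narrow vs. wide optics.
# Assumes a 1280-px-wide image and a simple pinhole model (a poor fit
# for a real fisheye, but it illustrates the trend).
WIDTH_PX = 1280

for name, hfov_deg in (("~70 deg mono", 70), ("163 deg fisheye", 163)):
    px_per_deg = WIDTH_PX / hfov_deg
    # Pinhole focal length in pixels: f = (W/2) / tan(HFOV/2)
    f_px = (WIDTH_PX / 2) / math.tan(math.radians(hfov_deg / 2))
    print(f"{name}: {px_per_deg:.1f} px/deg, f ~ {f_px:.0f} px")

# ~70 deg mono:    18.3 px/deg, f ~ 914 px
# 163 deg fisheye:  7.9 px/deg, f ~  96 px
# Stereo depth error grows as dz ~ z^2 * d_disparity / (f * B), so the
# longer focal length of narrow optics gives finer depth for the same
# baseline B, while the wide lens sees more of the world for tracking.
```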

I'm also FAR from knowledgeable about SLAM… so someone who's great at it might come in here and say: "What the heck are you talking about? OAK-D as-is is ideal for SLAM for reasons X, Y, and Z".

One thing I do know is that the capability to run AI onboard, even in SLAM use-cases, can be quite powerful for ignoring moving objects/etc. and so producing better localization results in the presence of people, forklifts, other robots, etc.
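To make that concrete, here is a sketch of the usual pattern (OpenCV-based; the detection box is a stand-in value - in practice it would come from the on-device network):

```python
import cv2
import numpy as np

# Sketch: mask out dynamic objects (people, forklifts, ...) using NN
# detection boxes before extracting features for SLAM/visual odometry.
# The box below is a stand-in; real boxes come from the detector.

def static_features(gray, moving_boxes, max_corners=500):
    mask = np.full(gray.shape, 255, dtype=np.uint8)
    for x, y, w, h in moving_boxes:
        mask[y:y + h, x:x + w] = 0  # exclude dynamic regions from tracking
    return cv2.goodFeaturesToTrack(gray, max_corners,
                                   qualityLevel=0.01, minDistance=7,
                                   mask=mask)

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder frame
pts = static_features(gray, [(100, 50, 80, 160)])     # one "person" box
```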

And in fact OAK-D actually uses the exact same image sensor (OV9282) as the T265… so it's as simple as different optics that make the device optimized for SLAM or not: wide FOV is desirable for SLAM applications, and narrow FOV is favorable for real-time 3D object/feature neural inference applications.

One other thing to note is that since OAK-D is open source, it is possible to use other camera modules - say ones like those used in the T265 - and in that case OAK-D would be a top-notch SLAM+AI camera. For example here is an OV9281 module which would allow fisheye lenses, and there are some options for direct-integrated fisheye as well.

And some companies have reached out about making OAK-D variants (based off our open source hardware here) with wide-angle OV9282-based cameras that these companies already produce. And we’re super excited for them to do so as it’s part of our goal with open sourcing - to allow folks to leverage all this and go optimize for other use-cases we either didn’t focus on or couldn’t even think of.

Thoughts?

I hope that helps!

Thanks,
Brandon

2 Likes

Wow - thanks @Luxonis-Brandon - I never expected such a detailed reply!

I only have experience with the Sony IMX219 sensor used in the cheaper wide-angle cameras for the Jetson Nano (e.g. this 160° one from Waveshare).

One cool option might have been to make it possible to swap out the sensor and optics. Despite its low price, this is possible with the current (V2) standard Raspberry Pi camera.

Companies like Waveshare sell alternative modules with different optics and sensors (e.g. this 160° module) that you can snap on and off the main camera PCB as needed.

Something for the OAK-D mark 2 :smile:

3 Likes

I would highly advocate for M12 lens mounts in the next version, or at least an option for them. I love M12 lenses, as you get all the functionality of a C-mount lens at a fraction of the price. The diversity of options has really proliferated in the past couple of years. The ability to add on filters also has advantages for some applications (e.g. autonomous boats). The only difficulty with M12s is that they would require re-calibration for stereo reconstruction and SLAM (a sketch of that step is below). That's not a huge deal, but it means things don't work "out of the box."
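For reference, that re-calibration is standard OpenCV. A condensed sketch, with assumptions of mine throughout: a 9x6 inner-corner chessboard with 25 mm squares, and `image_pairs` filled with (left, right) grayscale captures of the board:

```python
import cv2
import numpy as np

# Condensed stereo re-calibration after a lens swap. Placeholders:
# a 9x6 inner-corner chessboard with 25 mm squares, and `image_pairs`
# filled with (left_gray, right_gray) captures of that board.
PATTERN, SQUARE_MM = (9, 6), 25.0
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_MM

image_pairs = []  # fill with (left_gray, right_gray) images of the board
obj_pts, left_pts, right_pts, size = [], [], [], None
for gl, gr in image_pairs:
    size = gl.shape[::-1]
    okl, cl = cv2.findChessboardCorners(gl, PATTERN)
    okr, cr = cv2.findChessboardCorners(gr, PATTERN)
    if okl and okr:
        obj_pts.append(objp)
        left_pts.append(cl)
        right_pts.append(cr)

# Per-camera intrinsics (K, D), then the left->right extrinsics (R, T) --
# the same quantities a factory-calibrated module ships with.
_, K1, D1, _, _ = cv2.calibrateCamera(obj_pts, left_pts, size, None, None)
_, K2, D2, _, _ = cv2.calibrateCamera(obj_pts, right_pts, size, None, None)
_, K1, D1, K2, D2, R, T, E, F = cv2.stereoCalibrate(
    obj_pts, left_pts, right_pts, K1, D1, K2, D2, size,
    flags=cv2.CALIB_FIX_INTRINSIC)
```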

Having said that, thanks to cell phones, we have some pretty great surface mount sensors at this point so the current cameras probably cover 80% of applications.

5 Likes

Thanks both! Totally agree with the comments here.

So, serendipitously, ArduCam actually reached out to offer support for other camera modules. And with a crazy-short turnaround afforded by ArduCam, I am excited to already mention that we are working on support for everything requested in this thread:

So this means we can make an OAK-D variant that is entirely M12-mount compatible! How cool!

M12-mount OV9281 used on existing ArduCam dual-camera solution:

155-degree HFOV fisheye for enabling SLAM use-cases:

So @ghawkins, with this 155° HFOV module on OAK-D, this would be the ideal SLAM solution you were asking about.

And @Katherine_Scott this combination allows the awesome flexibility you mention from M12-mount capability, while also allowing pre-calibrated camera intrinsics for SLAM applications with the integrated 155° HFOV fisheye module.

Thoughts?

Thanks,
Brandon

2 Likes

Wow - ArduCam has changed a lot over the years. I'm embarrassed to admit that the last time I looked at them (several years ago) it was in relation to mid-range optics that were extremely competitive price-wise with the kind of AI-oriented lenses/sensors available at the time, but were still somewhat on the pricey side for people whose company/institution wasn't paying for things.

On looking now, I see they offer a huge range of cameras/sensors for many different platforms (almost a bewilderingly large range - it’s a little difficult to get an overview).

I added a comment to the GitHub issue you created. I couldn't find the OV9282 module you mentioned (maybe it's still in development?). The nearest I could find was this US$30 170° OV9281-based camera.

2 Likes

You are right, we have made a lot of progress in the past few years and there are lots of new things coming out.
https://www.arducam.com/product/arducam-ov9281-mipi-1mp-monochrome-global-shutter-camera-module-raspberry-pi/ - yes, it is a 166-degree diagonal lens, almost fisheye, and no black corners. This is the very module I showed to Brandon last night.
I also mentioned it has only 1-lane MIPI output, which can only handle half of the frame rate (60 fps) stated in the datasheet (120 fps). I think that is still good enough for stereo 60 fps performance on the OAK-D board.
We will work with DepthAI to port a 1-lane MIPI driver to make it work on the OAK-D board.
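A rough back-of-envelope for that frame-rate limit (the per-lane rate and blanking overhead below are assumed typical values, not datasheet figures):

```python
# Why 1-lane MIPI roughly halves the OV9281's frame rate.
# Assumptions (typical, not from the datasheet): ~800 Mbps per D-PHY
# lane and ~25% line/frame blanking overhead.

WIDTH, HEIGHT, BITS_PER_PX = 1280, 800, 10  # OV9281: 1280x800, RAW10
LANE_RATE_BPS = 800e6                       # assumed per-lane rate
OVERHEAD = 1.25                             # assumed blanking overhead

bits_per_frame = WIDTH * HEIGHT * BITS_PER_PX * OVERHEAD

for lanes in (1, 2):
    fps = lanes * LANE_RATE_BPS / bits_per_frame
    print(f"{lanes}-lane MIPI: ~{fps:.0f} fps")

# 1-lane MIPI: ~62 fps
# 2-lane MIPI: ~125 fps  -> consistent with the 60 vs 120 fps figures
```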

1 Like

Thanks both! Yes, I had incorrectly stated OV9282 fisheye whereas it is actually OV9281 - but as far as I know they are extremely similar with the only difference being the packaging and the chief ray angle (CRA).

From here: https://www.prnewswire.com/news-releases/omnivisions-new-1-megapixel-high-speed-global-shutter-image-sensors-enable-low-latency-computer-vision-applications-300386955.html

The OV9281 has a chief ray angle (CRA) of 9 degrees and comes in a chip scale package (CSP). The OV9282 features a CRA of 27 degrees and is available in a reconstructed wafer (RW) format. Both sensors are currently available in volume production.

I have to admit I don’t know enough about CRA to know how this would flow to SLAM/etc. Anyone here happen to know?