Building a TensorFlow Object Detection and Localization ROS Package

Can't wait for the release. Since you mentioned it also does localization, may I ask how you get the localization information (x, y, and z) from a streaming video? Does the input include a point cloud or something similar?