GOOGLE ANNOUNCES THE OBJECTRON DATASET TO UNDERSTAND OBJECTS IN 3D


Tue 17 November 2020

Machine learning powers a wide range of computer vision tasks, but understanding objects in 3D remains a challenging problem, largely because of the scarcity of 3D datasets. Researchers at Google recently published the Objectron dataset, a collection of short, object-centered video clips that also includes camera metadata and manually annotated 3D bounding boxes.

On November 9, Google AI released its Objectron dataset — a collection of short, object-centric video clips capturing a large set of common objects from different angles. Each video clip is accompanied by AR session metadata that includes both camera poses and sparse point-clouds.

The Google researchers hope the dataset’s release will help the research community push the limits of 3D object-geometry understanding, which has the potential to power a wide range of applications such as augmented reality, robotics, autonomy, and image retrieval.

The dataset can be applied to 3D object detection, for which the researchers propose a two-stage model. First, an object detector finds a 2D crop of the object. Then, a 3D bounding box is estimated from the crop, and the 2D crop for the next frame is computed from it. This way, the object detector does not have to run on every frame.
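The two-stage loop described above can be sketched roughly as follows. This is a minimal illustration, not the actual Objectron models: `detect_2d_crop` and `estimate_3d_box` are hypothetical placeholders for the real 2D detector and 3D lifting network, and the crop propagation is simplified.

```python
# Sketch of the two-stage 3D detection pipeline described above.
# detect_2d_crop and estimate_3d_box are hypothetical stand-ins,
# not the actual Objectron networks.

def detect_2d_crop(frame):
    """Stage 1 (hypothetical): run the 2D object detector on a frame.
    Returns a crop as (x, y, width, height)."""
    return (40, 30, 100, 120)

def estimate_3d_box(frame, crop):
    """Stage 2 (hypothetical): lift the 2D crop to a 3D bounding box
    and predict where the crop will be in the next frame."""
    box_3d = {"center": (0.0, 0.0, 0.5), "size": (0.1, 0.1, 0.2)}
    x, y, w, h = crop
    next_crop = (x + 2, y + 1, w, h)  # object drifts slightly between frames
    return box_3d, next_crop

def track(frames):
    """Run the 2D detector only on the first frame; afterwards the
    3D stage propagates the crop, so the detector stays idle."""
    results = []
    crop = None
    for frame in frames:
        if crop is None:  # detector runs only when no crop is being tracked
            crop = detect_2d_crop(frame)
        box_3d, crop = estimate_3d_box(frame, crop)
        results.append(box_3d)
    return results
```

The key property is that the expensive detector runs once per track rather than once per frame, with the lighter 3D stage keeping the crop aligned.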


The approach is demonstrated on shoes, chairs, mugs, and cameras. The authors also propose an algorithm for computing evaluation metrics, such as 3D intersection over union, to measure 3D object-detection performance. This work paves the road for further applications in view synthesis and 3D representation learning.
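To give a feel for the metric, here is a simplified 3D intersection-over-union sketch restricted to axis-aligned boxes. This is an assumption-laden toy version: the authors' algorithm handles arbitrarily oriented boxes, which requires clipping the box polytopes against each other rather than the simple per-axis overlap used here.

```python
def iou_3d_axis_aligned(box_a, box_b):
    """Simplified 3D IoU for axis-aligned boxes given as
    (xmin, ymin, zmin, xmax, ymax, zmax).
    NOTE: a toy sketch only; the Objectron metric supports
    arbitrarily oriented 3D boxes."""
    # Intersection volume: product of per-axis overlaps.
    inter = 1.0
    for i in range(3):
        lo = max(box_a[i], box_b[i])
        hi = min(box_a[i + 3], box_b[i + 3])
        if hi <= lo:          # no overlap on this axis
            return 0.0
        inter *= hi - lo

    def volume(b):
        return (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])

    union = volume(box_a) + volume(box_b) - inter
    return inter / union
```

For example, two unit cubes sharing half their volume yield an IoU of 1/3 (intersection 1, union 3).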

The Google researchers believe the ML community has a strong need for object-centric video datasets that capture more of the 3D structure of an object while matching the data format used for many vision tasks, and so decided to release the Objectron dataset to aid in the training and benchmarking of ML models.

The Objectron dataset contains manually annotated 3D bounding boxes that describe each object’s position, orientation, and dimensions. It comprises 15,000 annotated video clips and over 4 million annotated images, collected from a geo-diverse sample covering ten countries across five continents.


Along with the dataset, the researchers also shared a 3D object detection solution for the shoe, chair, mug, and camera categories. The models are trained on the Objectron dataset and have been released in MediaPipe, Google’s open-source framework for cross-platform, customizable ML solutions for live and streaming media.

The Objectron dataset is available on GitHub.

Article originally published in Technology.org.
