Abstract
Recent advances in autonomous agents such as self-driving cars can be attributed to computer vision, a branch of artificial intelligence that enables software to understand the content of images and video. These autonomous agents require a three-dimensional model of their surroundings in order to operate reliably in the real world. Despite the significant progress of 2D object detectors, they have a critical limitation in location-sensitive applications: they do not provide accurate physical information about objects in 3D space. 3D object detection is a promising topic that can provide relevant solutions and improve existing 2D-based applications. Owing to advances in deep learning methods and relevant datasets, the task of 3D scene understanding has evolved greatly in the past few years. 3D object detection and localization are crucial in autonomous driving tasks such as obstacle avoidance, path planning, and motion control. Successful 3D object detection methods exist, but they have traditionally relied on highly expensive 3D LiDAR sensors for accurate depth information. On the other hand, 3D object detection from single monocular images is inexpensive but lacks accuracy. The primary reason for this disparity in performance is that monocular image-based methods attempt to infer 3D information from 2D images. In this work, we try to bridge the performance gap observed with single-image input by introducing different mapping strategies between the 2D image data and its corresponding 3D representation, and we use these mappings to perform object detection in 3D. The performance of the proposed method is evaluated on the popular KITTI 3D object detection benchmark dataset.
Library of Congress Subject Headings
Pattern recognition systems; Computer vision; Three-dimensional imaging; Automated vehicles--Data processing
Publication Date
7-7-2021
Document Type
Thesis
Student Type
Graduate
Degree Name
Computer Engineering (MS)
Department, Program, or Center
Computer Engineering (KGCOE)
Advisor
Guoyu Lu
Advisor/Committee Member
Andres Kwasinski
Advisor/Committee Member
Alexander Loui
Recommended Citation
Tembe, Atharva Arun, "Monocular 3D Object Detection via Ego View-to-Bird’s Eye View Translation" (2021). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/10854
Campus
RIT – Main Campus
Plan Codes
CMPE-MS