Abstract

Sensing and modelling of the surrounding environment is crucial for solving many of the problems in intelligent machines like self-driving cars, autonomous robots, and augmented reality displays. Performance, reliability and safety of the autonomous agents rely heavily on the way the environment is modelled. Two-dimensional models are inadequate to capture the three-dimensional nature of real-world scenes. Three-dimensional models are necessary to achieve the standards required by the autonomy stack for intelligent agents to work alongside humans. Data driven deep learning methodologies for three-dimensional scene modelling has evolved greatly in the past few years because of the availability of huge amounts of data from variety of sensors in the form of well-designed datasets. 3D object detection and localization are two of the key requirements for tasks such as obstacle avoidance, agent-to-agent interaction, and path planning. Most methodologies for object detection work on a single sensor data like camera or LiDAR. Camera sensors provide feature rich scene data and LiDAR provides us 3D geometrical information. Advanced object detection and localization can be achieved by leveraging the information from both camera and LiDAR sensors. In order to effectively quantify the uncertainty of each sensor channel, an appropriate fusion strategy is needed to fuse the independently encoded point clouds from LiDAR with the RGB images from standard vision cameras. In this work, we introduce a fusion strategy and develop a multimodal pipeline which utilizes existing state-of-the-art deep learning based data encoders to produce robust 3D object detection and localization in real-time. The performance of the proposed fusion model is evaluated on the popular KITTI 3D benchmark dataset.

Library of Congress Subject Headings

Multisensor data fusion; Machine learning; Computer vision; Pattern recognition systems; Three dimensional imaging

Publication Date

6-2020

Document Type

Thesis

Student Type

Graduate

Degree Name

Computer Engineering (MS)

Department, Program, or Center

Computer Engineering (KGCOE)

Advisor

Raymond Ptucha

Advisor/Committee Member

Amlan Ganguly

Advisor/Committee Member

Clark Hochgraf

Recommended Citation

Bhanushali, Darshan Ramesh, "Multi-Sensor Fusion for 3D Object Detection" (2020). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/10526

Campus

RIT – Main Campus

Plan Codes

CMPE-MS

Download

COinS

Theses

Multi-Sensor Fusion for 3D Object Detection

Abstract

Library of Congress Subject Headings

Publication Date

Document Type

Student Type

Degree Name

Department, Program, or Center

Advisor

Advisor/Committee Member

Advisor/Committee Member

Recommended Citation

Campus

Plan Codes

Search

Browse

Author Corner

RIT Links

Theses

Multi-Sensor Fusion for 3D Object Detection

Author

Abstract

Library of Congress Subject Headings

Publication Date

Document Type

Student Type

Degree Name

Department, Program, or Center

Advisor

Advisor/Committee Member

Advisor/Committee Member

Recommended Citation

Campus

Plan Codes

Share

Search

Browse

Author Corner

RIT Links