Abstract
Multi-step manipulation tasks in unstructured environments are extremely challenging for a robot to learn. Such tasks interlace high-level reasoning, which determines the sequence of intermediate states needed to achieve the overall task, with low-level reasoning, which decides what actions will yield those states. A model-free deep reinforcement learning method is proposed to learn such multi-step manipulation tasks.

This work first introduces a novel Generative Residual Convolutional Neural Network (GR-ConvNet) model that can generate robust antipodal grasps from n-channel image input at real-time speeds (20 ms). The proposed architecture achieves state-of-the-art accuracy on three standard grasping datasets. The adaptability of the approach is demonstrated by directly transferring the trained model to a 7 DoF robotic manipulator, which achieves grasp success rates of 95.4% and 93.0% on novel household and adversarial objects, respectively.

A novel vision-based model architecture, the Robotic Manipulation Network (RoManNet), is then introduced to learn action-value functions and predict manipulation action candidates. A Task Progress based Gaussian (TPG) reward function is defined that computes the reward based on actions that lead to successful motion primitives and progress towards the overall task goal. To balance exploration and exploitation, a Loss Adjusted Exploration (LAE) policy is introduced that selects actions from the action candidates according to the Boltzmann distribution of their loss estimates. The effectiveness of the approach is demonstrated by training RoManNet to learn several challenging multi-step robotic manipulation tasks in both simulation and the real world. Experimental results show that the proposed method outperforms existing methods, achieving state-of-the-art success rates and action efficiency. Ablation studies show that TPG and LAE are especially beneficial for tasks like stacking multiple blocks.
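To make the abstract's two learning signals concrete, the following is a minimal Python sketch of how a TPG-style reward and LAE-style action selection might look. It is illustrative only: the function names, the task-progress value in [0, 1], the Gaussian width sigma, the temperature, and the convention that higher loss estimates receive more exploration probability are all assumptions, not the dissertation's exact formulation.

    import numpy as np

    def tpg_reward(task_progress, prev_task_progress, sigma=0.2):
        # Gaussian-shaped reward centered on task completion (progress = 1.0).
        # Reward is granted only when the action advanced the task, so motion
        # primitives that move the system toward the goal are reinforced.
        if task_progress <= prev_task_progress:
            return 0.0
        return float(np.exp(-((1.0 - task_progress) ** 2) / (2.0 * sigma ** 2)))

    def lae_select_action(loss_estimates, temperature=1.0):
        # Boltzmann distribution over per-candidate loss estimates: candidates
        # the model currently fits poorly (high loss) are sampled more often,
        # shifting from exploration toward exploitation as losses fall.
        logits = np.asarray(loss_estimates, dtype=np.float64) / temperature
        logits -= logits.max()  # subtract max for numerical stability
        probs = np.exp(logits)
        probs /= probs.sum()
        return int(np.random.choice(len(probs), p=probs))

    # Example with three hypothetical action candidates:
    action = lae_select_action([0.8, 0.1, 0.3], temperature=0.5)
    reward = tpg_reward(task_progress=0.75, prev_task_progress=0.5)

Under this reading, annealing the temperature (or simply letting the loss estimates shrink as training converges) reproduces the exploration/exploitation balance the abstract attributes to LAE.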
Library of Congress Subject Headings
Reinforcement learning; Robots--Motion; Computer vision
Publication Date
4-2022
Document Type
Dissertation
Student Type
Graduate
Degree Name
Engineering (Ph.D.)
Department, Program, or Center
Engineering (KGCOE)
Advisor
Ferat Sahin
Advisor/Committee Member
Andres Kwasinski
Advisor/Committee Member
Christopher Kanan
Recommended Citation
Kumra, Sulabh, "Learning Multi-step Robotic Manipulation Tasks through Visual Planning" (2022). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/11140
Campus
RIT – Main Campus
Plan Codes
ENGR-PHD