TY - JOUR
T1 - DeepSLAM: A Robust Monocular SLAM System With Unsupervised Deep Learning
AU - Li, Ruihao
AU - Wang, Sen
AU - Gu, Dongbing
N1 - Funding Information:
Manuscript received October 15, 2019; revised January 8, 2020 and February 5, 2020; accepted March 2, 2020. Date of publication March 25, 2020; date of current version December 8, 2020. This work was supported in part by the National Natural Science Foundation of China under Grant 61903377, in part by the Engineering and Physical Sciences Research Council (EPSRC) Robotics and Artificial Intelligence Offshore Robotics for Certification of Assets (ORCA) Hub under Grant EP/R026173/1, in part by the EU H2020 Program under the EUMarineRobots Project under Grant 731103, and in part by the DeepField Project under Grant 857339. (Corresponding author: Sen Wang.) Ruihao Li is with the Artificial Intelligence Research Center, National Innovation Institute of Defense Technology, Beijing 100166, China, and also with the Tianjin Artificial Intelligence Innovation Center, Tianjin 300457, China (e-mail: [email protected]).
Publisher Copyright:
© 1982-2012 IEEE.
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2021/4
Y1 - 2021/4
N2 - In this article, we propose DeepSLAM, a novel unsupervised deep-learning-based visual simultaneous localization and mapping (SLAM) system. The DeepSLAM training is fully unsupervised since it only requires stereo imagery instead of annotated ground-truth poses. Its testing takes a monocular image sequence as the input; therefore, it is a monocular SLAM paradigm. DeepSLAM consists of several essential components, including Mapping-Net, Tracking-Net, Loop-Net, and a graph optimization unit. Specifically, the Mapping-Net is an encoder-decoder architecture for describing the 3-D structure of the environment, whereas the Tracking-Net is a recurrent convolutional neural network architecture for capturing the camera motion. The Loop-Net is a pretrained binary classifier for detecting loop closures. DeepSLAM can simultaneously generate the pose estimate, depth map, and outlier rejection mask. In this article, we evaluate its performance on various datasets and find that DeepSLAM achieves good pose estimation accuracy and is robust in some challenging scenes.
AB - In this article, we propose DeepSLAM, a novel unsupervised deep-learning-based visual simultaneous localization and mapping (SLAM) system. The DeepSLAM training is fully unsupervised since it only requires stereo imagery instead of annotated ground-truth poses. Its testing takes a monocular image sequence as the input; therefore, it is a monocular SLAM paradigm. DeepSLAM consists of several essential components, including Mapping-Net, Tracking-Net, Loop-Net, and a graph optimization unit. Specifically, the Mapping-Net is an encoder-decoder architecture for describing the 3-D structure of the environment, whereas the Tracking-Net is a recurrent convolutional neural network architecture for capturing the camera motion. The Loop-Net is a pretrained binary classifier for detecting loop closures. DeepSLAM can simultaneously generate the pose estimate, depth map, and outlier rejection mask. In this article, we evaluate its performance on various datasets and find that DeepSLAM achieves good pose estimation accuracy and is robust in some challenging scenes.
KW - Depth estimation
KW - machine learning
KW - recurrent convolutional neural network (RCNN)
KW - simultaneous localization and mapping (SLAM)
KW - unsupervised deep learning (DL)
UR - http://www.scopus.com/inward/record.url?scp=85094849307&partnerID=8YFLogxK
U2 - 10.1109/TIE.2020.2982096
DO - 10.1109/TIE.2020.2982096
M3 - Article
SN - 0278-0046
VL - 68
SP - 3577
EP - 3587
JO - IEEE Transactions on Industrial Electronics
JF - IEEE Transactions on Industrial Electronics
IS - 4
ER -