Electronic Science and Technology ›› 2022, Vol. 35 ›› Issue (4): 14-19. doi: 10.16180/j.cnki.issn1007-7820.2022.04.003


VSLAM for Indoor Dynamic Scenes

Hongjun SAN1,Wanglin WANG1,Jiupeng CHEN1,Feiya XIE2,Yangyang XU1,Jia CHEN1   

  1. Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650500, China
    2. No.78098 Unit of PLA, Meishan 620031, China
  • Received: 2021-05-07  Online: 2022-04-15  Published: 2022-04-15
  • Supported by:
    National Key R&D Projects (2017YFC1702503); Major Special Project of Yunnan Provincial S&T Department (202002AC080001)

Abstract:

The traditional VSLAM algorithm assumes a static scene, so its positioning accuracy degrades in indoor dynamic scenes and the resulting 3D sparse point cloud map suffers from problems such as mismatched dynamic feature points. In this study, the ORB-SLAM2 framework is improved by combining it with Mask R-CNN: the input images are semantically segmented to remove feature points located on dynamic objects, the camera pose is then optimized, and a static 3D sparse point cloud map is obtained. Experimental results on the public TUM dataset show that ORB-SLAM2 combined with Mask R-CNN effectively improves the pose estimation accuracy of intelligent mobile robots: the root mean square error of the absolute trajectory is reduced by 96.3%, the root mean square error of the relative translation is reduced by 41.2%, and the relative rotation error is also significantly improved. Compared with ORB-SLAM2, the proposed method builds the 3D sparse point cloud map more accurately, free from the interference of dynamic object feature points.
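To illustrate the core idea described above, the sketch below shows how feature points falling on dynamic objects can be discarded using a Mask R-CNN instance mask before pose estimation. This is only a minimal Python sketch of the filtering step, not the authors' implementation (which is integrated into the C++ ORB-SLAM2 front end); the function name, thresholds, and the choice of the COCO "person" class as the dynamic category are assumptions for illustration.

```python
# Minimal sketch (not the paper's code): remove ORB keypoints that fall on
# segmented dynamic objects (here, COCO "person") before pose tracking.
import cv2
import numpy as np
import torch
import torchvision

# Off-the-shelf Mask R-CNN from torchvision (pretrained on COCO)
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

PERSON_CLASS_ID = 1  # COCO label id for "person" in torchvision detection models

def static_keypoints(bgr_image, score_thresh=0.5, mask_thresh=0.5):
    """Detect ORB keypoints and drop those lying inside dynamic-object masks."""
    # ORB feature detection, as in ORB-SLAM2's front end
    orb = cv2.ORB_create(nfeatures=1000)
    keypoints = orb.detect(bgr_image, None)

    # Instance segmentation of the frame with Mask R-CNN
    rgb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        pred = model([tensor])[0]

    # Union of all confident "person" masks = dynamic region of the image
    h, w = bgr_image.shape[:2]
    dynamic = np.zeros((h, w), dtype=bool)
    for label, score, mask in zip(pred["labels"], pred["scores"], pred["masks"]):
        if label.item() == PERSON_CLASS_ID and score.item() >= score_thresh:
            dynamic |= mask[0].numpy() >= mask_thresh

    # Keep only keypoints outside the dynamic region; these static points
    # would then be matched and used for camera pose optimization.
    return [kp for kp in keypoints if not dynamic[int(kp.pt[1]), int(kp.pt[0])]]
```

In the full system, only the retained static feature points participate in matching, pose optimization, and construction of the 3D sparse point cloud map, which is what removes the trajectory drift caused by moving objects in the TUM dynamic sequences.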

Key words: VSLAM, indoor dynamic scene, Mask R-CNN, semantic segmentation, pose estimation accuracy, ORB-SLAM2, TUM dataset, 3D sparse point cloud map

CLC Number: 

  • TP242.6