Electronic Science and Technology ›› 2022, Vol. 35 ›› Issue (4): 14-19.doi: 10.16180/j.cnki.issn1007-7820.2022.04.003


VSLAM for Indoor Dynamic Scenes

Hongjun SAN1, Wanglin WANG1, Jiupeng CHEN1, Feiya XIE2, Yangyang XU1, Jia CHEN1

  1. Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650500, China
    2. No.78098 Unit of PLA, Meishan 620031, China
  • Received:2021-05-07 Online:2022-04-15 Published:2022-04-15
  • Supported by:
    National Key R&D Projects(2017YFC1702503);Major Special Project of Yunnan Provincial S&T Department(202002AC080001)


The traditional VSLAM algorithm assumes a static scene, so its positioning accuracy degrades in indoor dynamic scenes and the resulting 3D sparse point cloud map suffers from mismatches of dynamic feature points. In this study, the ORB-SLAM2 framework is improved by combining it with Mask R-CNN, which semantically segments each image so that feature points located on dynamic objects can be removed, the camera pose optimized, and a static 3D sparse point cloud map obtained. Experimental results on the public TUM dataset show that ORB-SLAM2 combined with Mask R-CNN effectively improves the pose estimation accuracy of intelligent mobile robots: the root mean square error of the absolute trajectory is reduced by 96.3%, the root mean square error of the relative translational trajectory is reduced by 41.2%, and the relative rotational trajectory error is also significantly improved. Compared with ORB-SLAM2, the proposed method builds a 3D sparse point cloud map more accurately, free from the interference of dynamic-object feature points.
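The core idea described above is to discard any extracted feature point that falls inside a region Mask R-CNN labels as a dynamic object, before those points reach tracking and mapping. A minimal sketch of that filtering step is shown below; the function name, array shapes, and the toy mask are illustrative assumptions, not part of the paper's actual implementation.

```python
import numpy as np

def filter_dynamic_keypoints(keypoints, dynamic_mask):
    """Keep only keypoints lying outside dynamic-object regions.

    keypoints    : (N, 2) array of (x, y) pixel coordinates of ORB features
    dynamic_mask : (H, W) boolean array, True where the segmentation network
                   (e.g. Mask R-CNN) labels a pixel as a dynamic object
    """
    xs = keypoints[:, 0].astype(int)
    ys = keypoints[:, 1].astype(int)
    is_static = ~dynamic_mask[ys, xs]  # index mask at each keypoint location
    return keypoints[is_static]

# Toy example: a 4x4 image whose left two columns are marked dynamic
mask = np.zeros((4, 4), dtype=bool)
mask[:, :2] = True
kps = np.array([[0, 0], [3, 3], [1, 2]])
print(filter_dynamic_keypoints(kps, mask))  # only [3, 3] survives
```

The surviving static feature points would then feed ORB-SLAM2's pose optimization and map construction as usual.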

Key words: VSLAM, indoor dynamic scene, Mask R-CNN, semantic segmentation, accuracy of pose estimation, ORB-SLAM2, TUM dataset, 3D sparse point cloud map

CLC Number: TP242.6