[1] LAVALLE S M. Motion Planning[J]. IEEE Robotics and Automation Magazine, 2011, 18(2): 108-118.
[2] NILSSON N J. Shakey the Robot: Technical Note 323[R]. Menlo Park: SRI International, 1984.
[3] ORLIN J. Network Flows[J]. Journal of the Operational Research Society, 1993, 45(11): 791-796.
[4] STENTZ A. The Focussed D* Algorithm for Real-time Replanning[C]// Proceedings of the 14th International Joint Conference on Artificial Intelligence. San Francisco: Morgan Kaufmann, 1995: 1652-1659.
[5] KHATIB O. Real-time Obstacle Avoidance for Manipulators and Mobile Robots[J]. International Journal of Robotics Research, 1986, 5(1): 90-98.
[6] SUTTON R S, BARTO A G. Reinforcement Learning: An Introduction[M]. 2nd ed. Cambridge: The MIT Press, 2018.
[7] ZHANG Q C, LIN M, YANG L T, et al. Energy-efficient Scheduling for Real-time Systems Based on Deep Q-learning Model[J]. IEEE Transactions on Sustainable Computing, 2017.
doi: 10.1109/TSUSC.2017.2743704
[8] DERHAMI V, MAJD V J, AHMADABADI M N. Fuzzy Sarsa Learning and the Proof of Existence of Its Stationary Points[J]. Asian Journal of Control, 2008, 10(5): 535-549.
doi: 10.1002/asjc.54
[9] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level Control Through Deep Reinforcement Learning[J]. Nature, 2015, 518: 529-533.
doi: 10.1038/nature14236
pmid: 25719670
[10] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with Deep Reinforcement Learning[EB/OL]. arXiv: 1312.5602, 2013.
[11] PAN J, WANG X, CHENG Y, et al. Multisource Transfer Double DQN Based on Actor Learning[J]. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(6): 2227-2238.
doi: 10.1109/TNNLS.2018.2806087
[12] WANG Z, SCHAUL T, HESSEL M, et al. Dueling Network Architectures for Deep Reinforcement Learning[C]// Proceedings of the 33rd International Conference on Machine Learning. New York: International Machine Learning Society (IMLS), 2016: 2939-2947.