Journal of Xidian University ›› 2020, Vol. 47 ›› Issue (2): 75-82. doi: 10.19665/j.issn1001-2400.2020.02.011

Aircraft reinforcement learning multi-mode control in orbit

ZHANG Ying1,2,3, WEI Minfeng2,3,4, WANG Shihui2,3, TAO Leiyan5, CAO Jian1, ZHANG Xing1

  1. School of Software and Microelectronics, Peking University, Beijing, 100871, China
  2. Beijing Aerospace Automatic Control Institute, Beijing, 100854, China
  3. National Key Laboratory of Science and Technology on Aerospace Intelligent Control, Beijing, 100854, China
  4. School of Automation, Beijing Institute of Technology, Beijing, 100081, China
  5. Beijing Institute of Remote Sensing Equipment, Beijing, 100854, China
  • Received: 2019-08-30 Online: 2020-04-20 Published: 2020-04-26
  • Contact: Jian CAO, E-mail: caojian@ss.pku.edu.cn

Abstract:

To improve the reliability of an aircraft control system during long-term in-orbit flight, a multi-mode control scheme based on reinforcement learning is proposed. The system comprises a sensor module, a control module and an execution module. The sensor module feeds sensed flight data to the control module in real time. These data fall into two classes: multidimensional, structured floating-point data with historical relevance that can be used directly for aircraft control, and physical quantities unique to a particular sensor. The control module is divided into an input layer, a feature extraction layer and a fully connected layer. The execution module receives driving data from the control module in real time; these data include the optimal state value, used for decision-making, and the action output value, used for evaluation. The system decides which specific execution modules to use according to the optimal return value for decision-making, and the output of each selected execution module depends on the action output value used for evaluation. The system enables the aircraft to complete long-term in-orbit operation in multi-mode input and output states, with a fast response of 15 ms and a performance per watt of 5.23 GOP/s/W.
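The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the random untrained weights, the ReLU and tanh activations, and all names (ControlModule, select_execution_module, decision_head, action_head) are assumptions made here for illustration. It only shows one plausible reading of the pipeline: a window of sensor readings passes through an input layer, a feature extraction layer and fully connected output heads, one head picks an execution module, and the other supplies that module's output value.

```python
import math
import random


def dense(x, weights, bias):
    """Affine layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_layer(n_out, n_in, rng):
    """Small random weights (assumption); a trained system would load learned values."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class ControlModule:
    """Hypothetical control module: input layer, feature extraction, fully connected heads."""

    def __init__(self, n_inputs, n_features, n_modules, seed=0):
        rng = random.Random(seed)
        self.feature = init_layer(n_features, n_inputs, rng)
        # Two fully connected heads (assumed split): one value per execution
        # module for decision-making, one action output per module for evaluation.
        self.decision_head = init_layer(n_modules, n_features, rng)
        self.action_head = init_layer(n_modules, n_features, rng)

    def forward(self, sensor_window):
        # Input layer: flatten the time-ordered window of sensor frames.
        x = [v for frame in sensor_window for v in frame]
        # Feature extraction layer (a single ReLU layer here, as a stand-in).
        h = relu(dense(x, *self.feature))
        decision_values = dense(h, *self.decision_head)
        # tanh keeps each action output in [-1, 1] (assumed normalisation).
        action_values = [math.tanh(v) for v in dense(h, *self.action_head)]
        return decision_values, action_values


def select_execution_module(decision_values, action_values):
    """Pick the module with the highest decision value; return its index and action output."""
    idx = max(range(len(decision_values)), key=decision_values.__getitem__)
    return idx, action_values[idx]


# Usage: two time steps of three hypothetical sensor readings, three execution modules.
window = [[0.01, -0.02, 0.00], [0.02, -0.01, 0.01]]
ctrl = ControlModule(n_inputs=6, n_features=8, n_modules=3)
module, command = select_execution_module(*ctrl.forward(window))
```

Keeping the selection step separate from the network makes the two roles named in the abstract explicit: one output decides which execution module acts, and the other determines what that module outputs.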

Key words: aircraft, control system, multi-mode, reinforcement learning

CLC Number: 

  • TN911.22