Proximal policy optimization with an integral compensator for quadrotor control
School of Automation, Southeast University, Nanjing 210096, China
Qing-ling WANG, E-mail: qlwang@seu.edu.cn
Published in print: 2020-05
Received: 2019-11-22
Revised: 2020-04-27
Citation: Hu H, Wang QL, 2020. Proximal policy optimization with an integral compensator for quadrotor control. Frontiers of Information Technology & Electronic Engineering, 21(5):777-795. DOI: 10.1631/FITEE.1900641
We use the advanced proximal policy optimization (PPO) reinforcement learning algorithm to optimize the stochastic control strategy and achieve speed control of a "model-free" quadrotor. The quadrotor is controlled by four learned neural networks, which directly map the system states to control commands in an end-to-end style. By introducing an integral compensator into the actor-critic framework, the speed tracking accuracy and robustness are greatly enhanced. In addition, a two-phase learning scheme, which includes both offline and online learning, is developed for practical use. A model with strong generalization ability is learned in the offline phase. Then the flight policy of the model is continuously optimized in the online learning phase. Finally, the performance of the proposed algorithm is compared with that of the traditional proportional-integral-derivative (PID) algorithm.
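The abstract's two key ingredients can be illustrated with a minimal sketch: an integral compensator that accumulates the velocity-tracking error and appends it to the observation fed to the actor network, and PPO's clipped surrogate objective. The gains `k_i`, `dt`, and the clipping bounds below are hypothetical illustration values, not the paper's actual parameters.

```python
import numpy as np

def augment_state(velocity, target, integ, dt=0.02, k_i=0.1, clip=5.0):
    """Integral compensator: accumulate the tracking error over time and
    append it to the observation, so the policy can see (and cancel)
    steady-state error. Gains here are illustrative, not from the paper."""
    err = target - velocity
    # Anti-windup: keep the accumulated term bounded.
    integ = np.clip(integ + k_i * err * dt, -clip, clip)
    obs = np.concatenate([velocity, err, integ])  # augmented observation
    return obs, integ

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective (to be maximized): the probability
    ratio pi_new/pi_old is clipped to [1-eps, 1+eps] so a single update
    cannot move the policy too far from the data-collecting policy."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped).mean()
```

The pessimistic `minimum` of the clipped and unclipped terms is what distinguishes PPO from a plain policy-gradient update: large ratios stop contributing gradient once they leave the trust interval.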
Keywords: Reinforcement learning; Proximal policy optimization; Quadrotor control; Neural network