1. School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China
2. Institute of Artificial Intelligence, Beihang University, Beijing 100191, China
E-mail: shiyu_sasee@buaa.edu.cn;
E-mail: yongzhaohua@buaa.edu.cn;
E-mail: sdjxyjl@buaa.edu.cn;
‡Corresponding authors
E-mail: renzhang@buaa.edu.cn
Received: 2022-01-03; accepted: 2022-04-21; print publication date: 2022-07-23
Shi Y, Hua YZ, Yu JL, et al., 2022. Multi-agent differential game based cooperative synchronization control using a data-driven method. Front Inform Technol Electron Eng, 23(7):1043-1056. doi: 10.1631/FITEE.2200001
This paper studies the multi-agent differential game problem and its application to cooperative synchronization control. A systematized formulation and analysis method for the multi-agent differential game is proposed, together with a data-driven methodology based on the reinforcement learning (RL) technique. First, it is shown that typical distributed controllers do not, in general, guarantee a global Nash equilibrium of the differential game, because of the coupling of networked interactions. Second, to this end, the problem is decomposed into local differential games, and an alternative local Nash solution is derived by defining the best-response concept. An off-policy RL algorithm is constructed that uses online neighboring interaction data to update the controller without requiring a system model, and its stability and robustness properties are proved. Third, to further tackle this dilemma, another differential game configuration is investigated, based on modified coupling index functions. In contrast to the previous case, the distributed solution achieves a global Nash equilibrium while guaranteeing stability. An equivalent parallel RL method is constructed corresponding to this Nash solution. Finally, simulation results illustrate the effectiveness of the learning process and the stability of synchronization control.
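To make the coupling issue described above concrete, the per-agent cost of a differential graphical game can be written in a standard form from that literature (the notation here is illustrative, not taken verbatim from this paper):

```latex
% Per-agent cost in a differential graphical game (illustrative form)
J_i(u_i, u_{-i}) = \int_0^{\infty} \Big( \delta_i^{\top} Q_i \delta_i
    + u_i^{\top} R_{ii} u_i
    + \sum_{j \in \mathcal{N}_i} u_j^{\top} R_{ij} u_j \Big) \, \mathrm{d}t
```

where $\delta_i$ is agent $i$'s local synchronization error and $\mathcal{N}_i$ its neighbor set. A profile $(u_1^*,\dots,u_N^*)$ is a global Nash equilibrium if $J_i(u_i^*, u_{-i}^*) \le J_i(u_i, u_{-i}^*)$ for every agent $i$ and every admissible $u_i$. Because $\delta_i$ evolves with the neighbors' states and inputs, agent $i$'s optimality condition is coupled to the neighbors' policies, which is why a purely distributed controller need not attain this global property.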
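The data-driven update outlined above can be sketched, under strong simplifying assumptions, as off-policy Q-learning for a single linear agent: a quadratic Q-function is fitted by least squares on arbitrarily generated state-action data, followed by a policy-improvement step. The dynamics, cost weights, and initial gain below are hypothetical placeholders, not the paper's multi-agent design; the point is that the learner touches only recorded data, never the model matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical second-order agent (discretized double integrator); the paper's
# multi-agent game is reduced here to a single-agent LQR surrogate.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Qc = np.eye(2)          # state cost weight
Rc = np.array([[1.0]])  # control cost weight

def phi(z):
    """Quadratic basis z_i * z_j, i <= j, for the joint vector z = [x; u]."""
    n = z.shape[0]
    return np.array([z[i] * z[j] for i in range(n) for j in range(i, n)])

def h_to_H(h, n):
    """Recover the symmetric Q-function kernel H from the basis weights h."""
    H = np.zeros((n, n))
    k = 0
    for i in range(n):
        for j in range(i, n):
            H[i, j] = h[k] if i == j else h[k] / 2.0
            H[j, i] = H[i, j]
            k += 1
    return H

# Off-policy data: arbitrary state-action pairs and their successors.
# Below this line the learner uses only (X, U, Xn), never A or B.
N = 400
X = rng.uniform(-1, 1, (N, 2))
U = rng.uniform(-1, 1, (N, 1))
Xn = X @ A.T + U @ B.T

K = np.array([[1.0, 1.5]])  # initial stabilizing gain (assumed available)
for _ in range(12):
    # Policy evaluation: Q_K(x,u) = cost(x,u) + Q_K(x', -K x'), linear in h
    Phi, y = [], []
    for x, u, xn in zip(X, U, Xn):
        un = -K @ xn  # the TARGET policy acts at the successor state
        Phi.append(phi(np.concatenate([x, u])) - phi(np.concatenate([xn, un])))
        y.append(x @ Qc @ x + u @ Rc @ u)
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = h_to_H(h, 3)
    # Policy improvement: u = -inv(H_uu) H_ux x
    K = np.linalg.solve(H[2:, 2:], H[2:, :2])

# Model-based Riccati iteration, used only to check the learned gain.
P = np.eye(2)
for _ in range(500):
    Kp = np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)
    P = Qc + A.T @ P @ (A - B @ Kp)
```

With noiseless data the evaluation step solves the Bellman identity exactly, so the loop reproduces exact policy iteration (Hewer's iteration) and the learned gain `K` converges to the Riccati gain `Kp`; in the paper's setting the analogous updates run per agent on neighboring interaction data.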
Key words: Multi-agent system; Differential game; Synchronization control; Data-driven; Reinforcement learning