1. Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
2. Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
E-mail: liuyu77360132@126.com;
‡Corresponding authors
Published in print: 2022-07-23
Published online: 2022-05-20
Received: 2022-02-14
Accepted: 2022-04-23
Liu Y, Li Z, Jiang ZZ, et al., 2022. Prospects for multi-agent collaboration and gaming: challenge, technology, and application. Frontiers of Information Technology & Electronic Engineering, 23(7):1002-1009. DOI: 10.1631/FITEE.2200055.
Recent years have witnessed significant improvements in multi-agent systems for solving various decision-making problems in complex environments, with performance similar to or even better than that of humans. In this study, we briefly review multi-agent collaboration and gaming technology from three perspectives: task challenges, technology directions, and application areas. We first highlight the typical research problems and challenges in recent work on multi-agent systems. We then discuss some of the promising research directions for multi-agent collaboration and gaming tasks. Finally, we provide some focused prospects on the application areas in this field.
Keywords: Multi-agent; Game theory; Collective intelligence; Reinforcement learning; Intelligent control