Significance extraction based on data augmentation for reinforcement learning
Yuxi HAN, Dequan LI, Yang YANG
Faculty of Artificial Intelligence, Anhui University of Science and Technology, Huainan 232000, China
E-mail: hanyuxi0712@163.com
‡Corresponding author
Received: 17 May 2024
Revised: 18 September 2024
Published online: 6 March 2025
Published: March 2025
Citation: Yuxi HAN, Dequan LI, Yang YANG. Significance extraction based on data augmentation for reinforcement learning[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(3): 385-399. DOI: 10.1631/FITEE.2400406.
Abstract: Deep reinforcement learning has shown remarkable capabilities in visual tasks, but its generalization ability is weak when the input images are corrupted by interference signals, making it difficult to apply a well-trained agent to a new environment. To enable agents to distinguish between noise signals and important pixels in images, data augmentation techniques and the construction of auxiliary networks have proven to be effective solutions. We introduce a novel algorithm, saliency-extracted Q-value by augmentation (SEQA), which encourages the agent to explore unknown states more comprehensively and to focus its attention on important information. Specifically, SEQA masks out interfering features, extracts salient features, and updates the mask decoder network with the critic loss, thereby encouraging the agent to attend to important features and make correct decisions. We evaluate our algorithm on the DeepMind Control generalization benchmark (DMControl-GB). The experimental results show that SEQA greatly improves training efficiency and stability; moreover, it outperforms state-of-the-art reinforcement learning methods in sample efficiency and generalization on most DMControl-GB tasks.
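As a concrete illustration of the mechanism described in the abstract, below is a minimal PyTorch sketch of a SEQA-style critic update: an augmented observation is encoded, a mask decoder predicts a per-pixel saliency mask that suppresses distracting pixels, and the critic loss backpropagates through the decoder so that masking is learned from value estimation alone. All module names (Encoder, MaskDecoder, Critic), architectures, hyperparameters, and the random_overlay augmentation are illustrative assumptions; the paper's exact design is not reproduced here.

```python
# Minimal sketch of a SEQA-style update (illustrative, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps an 84x84 RGB observation to a spatial feature map."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 84 -> 42
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 42 -> 21
        )

    def forward(self, obs):
        return self.conv(obs)

class MaskDecoder(nn.Module):
    """Upsamples encoder features to a per-pixel soft saliency mask in [0, 1]."""
    def __init__(self):
        super().__init__()
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 21 -> 42
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),              # 42 -> 84
        )

    def forward(self, feats):
        return torch.sigmoid(self.deconv(feats))

class Critic(nn.Module):
    """Estimates Q(s, a) from encoder features and a continuous action."""
    def __init__(self, action_dim=6):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(64 * 21 * 21 + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, feats, action):
        return self.q(torch.cat([feats.flatten(1), action], dim=1))

def random_overlay(obs, alpha=0.5):
    """Stand-in pixel-level augmentation: blend the observation with noise."""
    return (1 - alpha) * obs + alpha * torch.rand_like(obs)

def seqa_style_update(encoder, decoder, critic, optimizer, obs, action, target_q):
    aug_obs = random_overlay(obs)              # perturb the input image
    mask = decoder(encoder(aug_obs))           # predict per-pixel saliency
    salient_obs = mask * aug_obs               # mask out distracting pixels
    q = critic(encoder(salient_obs), action)   # Q-value from salient input only
    critic_loss = F.mse_loss(q, target_q)      # the critic loss also trains the decoder
    optimizer.zero_grad()
    critic_loss.backward()                     # gradients flow into the mask decoder
    optimizer.step()
    return critic_loss.item()

if __name__ == "__main__":
    enc, dec, cri = Encoder(), MaskDecoder(), Critic()
    opt = torch.optim.Adam(
        [*enc.parameters(), *dec.parameters(), *cri.parameters()], lr=1e-3)
    obs = torch.rand(8, 3, 84, 84)             # batch of observations
    action = torch.rand(8, 6)
    target_q = torch.rand(8, 1)                # from a target network in practice
    print(seqa_style_update(enc, dec, cri, opt, obs, action, target_q))
```

Because the mask decoder in this sketch receives gradients only from the critic loss, no separate reconstruction target is needed: the mask is shaped purely by what helps value prediction, consistent with the abstract's statement that the mask decoder network is updated with the critic loss.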