Attention-based efficient robot grasp detection network
1.School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
2.College of Science, University of Shanghai for Science and Technology, Shanghai 200093, China
3.Shanghai Key Laboratory of Modern Optical System, Shanghai 200093, China
4.Key Laboratory of Biomedical Optical Technology and Devices of Ministry of Education, Shanghai 200093, China
5.Shanghai Institute of Intelligent Science and Technology, Tongji University, Shanghai 201210, China
†E-mail: xiaofei.qin@usst.edu.cn
‡Corresponding author
Received: 23 October 2022; Accepted: 9 April 2023; Published: October 2023
XIAOFEI QIN, WENKAI HU, CHEN XIAO, et al. Attention-based efficient robot grasp detection network [J]. Frontiers of Information Technology & Electronic Engineering, 2023, 24(10): 1430-1444. DOI: 10.1631/FITEE.2200502.
To balance the inference speed and detection accuracy of a grasp detection algorithm, both of which are important for robot grasping tasks, we propose an encoder–decoder structured pixel-level grasp detection neural network named the attention-based efficient robot grasp detection network (AE-GDN). Three spatial attention modules are introduced in the encoder stages to enhance detailed information, and three channel attention modules are introduced in the decoder stages to extract more semantic information. Several lightweight and efficient DenseBlocks connect the encoder and decoder paths to improve the feature modeling capability of AE-GDN.
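As a rough illustration of this layout (not the authors' implementation), the following PyTorch sketch wires three downsampling stages with spatial attention, a DenseBlock bottleneck, and three upsampling stages with channel attention into pixel-level output maps. The CBAM-style attention modules, channel widths, layer counts, and the four output heads (grasp quality, cos 2θ, sin 2θ, gripper width) are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: weight each pixel by a map
    predicted from channel-pooled features."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)         # (N, 1, H, W)
        mx = x.max(dim=1, keepdim=True).values    # (N, 1, H, W)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))           # global average pool -> (N, C)
        return x * w[:, :, None, None]

class DenseBlock(nn.Module):
    """Lightweight DenseNet-style block: each conv sees all earlier features."""
    def __init__(self, channels, growth=16, n_layers=2):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels + i * growth, growth, 3, padding=1),
                          nn.ReLU(inplace=True))
            for i in range(n_layers)])
        self.out_channels = channels + n_layers * growth

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(conv(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)

class AEGDNSketch(nn.Module):
    """Encoder (3 stages, spatial attention) -> DenseBlock bottleneck ->
    decoder (3 stages, channel attention) -> pixel-wise grasp maps."""
    def __init__(self):
        super().__init__()
        enc_ch = [3, 32, 64, 128]
        self.encoder = nn.ModuleList([
            nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                          nn.ReLU(inplace=True), SpatialAttention())
            for cin, cout in zip(enc_ch, enc_ch[1:])])
        self.bottleneck = DenseBlock(128)
        dec_ch = [self.bottleneck.out_channels, 64, 32, 16]
        self.decoder = nn.ModuleList([
            nn.Sequential(nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1),
                          nn.ReLU(inplace=True), ChannelAttention(cout))
            for cin, cout in zip(dec_ch, dec_ch[1:])])
        # Four pixel-wise maps: grasp quality, cos(2θ), sin(2θ), gripper width.
        self.head = nn.Conv2d(16, 4, kernel_size=1)

    def forward(self, x):
        for stage in self.encoder:
            x = stage(x)
        x = self.bottleneck(x)
        for stage in self.decoder:
            x = stage(x)
        return self.head(x)

out = AEGDNSketch()(torch.zeros(1, 3, 224, 224))  # -> (1, 4, 224, 224)
```

Encoding the angle as (cos 2θ, sin 2θ) is a common choice in pixel-level grasp detection because grasp rectangles are symmetric under a 180° rotation; it is assumed here, not stated in the abstract.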
A high intersection over union (IoU) value between the predicted grasp rectangle and the ground truth does not necessarily mean a high-quality grasp configuration, and may even correspond to a grasp that causes a collision. This is because traditional IoU loss calculation methods treat the pixels at the center of the predicted rectangle as being as important as the pixels near the grippers. We design a new IoU loss calculation method based on an hourglass box matching mechanism, which establishes a good correspondence between high IoU values and high-quality grasp configurations.
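The abstract does not spell out the hourglass construction; the numpy sketch below illustrates one plausible reading of the idea: each grasp rectangle is rasterized as a mask that is pinched at the center and full-height at the two gripper ends, and IoU is computed over these masks so that overlap near the grippers dominates the score. The axis-aligned simplification, the pinch parameter, and the function names are assumptions made for brevity.

```python
import numpy as np

def hourglass_mask(cx, cy, w, h, shape, pinch=0.4):
    """Rasterize a grasp rectangle (center cx, cy; size w x h) as an
    hourglass-shaped mask: full height h at the two gripper ends,
    pinched to pinch*h at the center. Rotation is omitted for brevity."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    u = (xs - cx) / (w / 2)                  # -1..1 along the gripper axis
    # The half-height limit grows linearly from pinch*h/2 at the center
    # (|u| = 0) to h/2 at the gripper plates (|u| = 1).
    limit = (pinch + (1 - pinch) * np.abs(u)) * (h / 2)
    return (np.abs(u) <= 1) & (np.abs(ys - cy) <= limit)

def hourglass_iou(box_a, box_b, shape=(224, 224)):
    """IoU of two grasp boxes (cx, cy, w, h) under hourglass matching."""
    a, b = hourglass_mask(*box_a, shape), hourglass_mask(*box_b, shape)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

# A 10-pixel shift along the gripper axis scores about 0.60 here versus
# about 0.78 under plain rectangle IoU: the mismatch is concentrated at
# the gripper ends, which decide whether the jaws can close without collision.
print(hourglass_iou((112, 112, 80, 30), (122, 112, 80, 30)))
```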
AE-GDN achieves accuracies of 98.9% and 96.6% on the Cornell and Jacquard datasets, respectively. Its inference speed reaches 43.5 frames per second with only about 1.2×10^6 parameters. The proposed AE-GDN has also been deployed on a practical robotic arm grasping system, where it performs grasping well. The code is available at https://github.com/robvincen/robot_gradet.
Keywords: Robot grasp detection; Attention mechanism; Encoder–decoder; Neural network