
1.State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
2.Ytech, Kuaishou Technology, Beijing 100085, China
3.State Key Laboratory for Turbulence and Complex Systems, Department of Advanced Manufacturing and Robotics, College of Engineering, Peking University, Beijing 100871, China
4.School of Mechanical Engineering and Automation, Beihang University, Beijing 100191, China
‡Corresponding author
Print publication date: 2022-11-0
Online publication date: 2022-05-31
Received: 2021-07-29
Accepted: 2022-01-25
Lu Y, Chen XY, Wu ZX, et al., 2022. A novel robotic visual perception framework for underwater operation. Frontiers of Information Technology & Electronic Engineering, 23(11):1602-1619. https://doi.org/10.1631/FITEE.2100366
Underwater robotic operation usually requires visual perception (e.g., object detection and tracking), but underwater scenes suffer from poor visual quality and represent a special domain, both of which can degrade the accuracy of visual perception. In addition, detection continuity and stability are important for robotic perception, but the commonly used static accuracy-based evaluation (i.e., average precision) is insufficient to reflect detector performance across time. In response to these two problems, we present a novel robotic visual perception framework. First, we investigate the relationship between a quality-diverse data domain and visual restoration in terms of detection performance. The results show that although domain quality has a negligible effect on within-domain detection accuracy, visual restoration benefits detection in real sea scenarios by reducing the domain shift. Moreover, non-reference assessments of detection continuity and stability are proposed based on object tracklets, and online tracklet refinement (OTR) is developed to improve the temporal performance of detectors. Finally, combined with visual restoration, an accurate and stable underwater robotic visual perception framework is established. To extend video object detection (VID) methods to the single-object tracking task, small-overlap suppression (SOS) is proposed, enabling a flexible switch between detection and tracking. Extensive experiments on the ImageNet VID dataset and real-world robotic tasks verify the correctness of our analysis and the superiority of the proposed approaches. The code is available at https://github.com/yrqs/VisPerception.
Keywords: Underwater operation; Robotic perception; Visual restoration; Video object detection