

Hikvision Research Institute, Hangzhou 310000, China
Liang MA, E-mail: maliang6@hikvision.com
Qiaoyong ZHONG, E-mail: zhongqiaoyong@hikvision.com
Yingying ZHANG, E-mail: zhangyingying7@hikvision.com
Di XIE, E-mail: xiedi@hikvision.com
Shiliang PU, E-mail: pushiliang.hri@hikvision.com
Received: 2020-06-04
Revised: 2021-07-22
Published: 2021-09
Liang MA, Qiaoyong ZHONG, Yingying ZHANG, et al. Associative affinity network learning for multi-object tracking[J]. Frontiers of Information Technology & Electronic Engineering, 2021, 22(9): 1194-1206. DOI: 10.1631/FITEE.2000272.
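To address multi-object tracking in videos, we propose a deep neural network architecture for joint feature and metric learning, called the associative affinity network. The associative affinity network learns the associative affinity between tracks and detections in an end-to-end manner. To cope with flawed detections, it jointly learns three tasks: bounding box regression, object classification, and affinity regression. Unlike existing methods based on contrastive ranking, we directly train a binary classifier to learn the associative affinity between tracks and detections, and we design a loss function that constrains the cardinality of the matching set. Thanks to this design, the associative affinity network can not only solve the matching problem in multi-object tracking but also perform single-object tracking. Based on the proposed network, we design a simple multi-object tracking algorithm; experimental results on the MOT16 and MOT17 test sets demonstrate its effectiveness.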
We propose a joint feature and metric learning deep neural network architecture, called the associative affinity network (AAN), as an affinity model for multi-object tracking (MOT) in videos. The AAN learns the associative affinity between tracks and detections across frames in an end-to-end manner. Considering flawed detections, the AAN jointly learns bounding box regression, classification, and affinity regression via the proposed multi-task loss. Contrary to networks that are trained with ranking loss, we directly train a binary classifier to learn the associative affinity of each track-detection pair and use a matching cardinality loss to capture information among candidate pairs. The AAN learns a discriminative affinity model for data association to tackle MOT, and can also perform single-object tracking. Based on the AAN, we propose a simple multi-object tracker that achieves competitive performance on the public MOT16 and MOT17 test datasets.
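
The abstract does not give the form of the losses or the association step, so the following is only a minimal sketch of the general idea: score every track-detection pair with a binary classifier, add a penalty on the number of matches per track, and then associate pairs from the resulting affinity matrix. The function names (`affinity_loss`, `greedy_associate`), the squared-difference cardinality term, the `card_weight` and `threshold` parameters, and the greedy matcher are all assumptions made for illustration and should not be read as the authors' implementation.

```python
import math


def affinity_loss(probs, labels, card_weight=1.0):
    """Binary cross-entropy over all pairs plus an ASSUMED cardinality penalty.

    probs[i][j]  : predicted affinity in (0, 1) for track i and detection j
    labels[i][j] : 1 if track i and detection j are the same object, else 0
    """
    eps = 1e-7
    bce, n_pairs, card = 0.0, 0, 0.0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            bce -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            n_pairs += 1
        # Assumed form: the soft match count of each track should equal
        # its true match count (0 or 1 in a one-to-one setting).
        card += (sum(p_row) - sum(y_row)) ** 2
    return bce / n_pairs + card_weight * card / len(probs)


def greedy_associate(probs, threshold=0.5):
    """Greedy one-to-one matching; a stand-in for any assignment solver."""
    pairs = sorted(
        ((p, i, j) for i, row in enumerate(probs) for j, p in enumerate(row)),
        reverse=True,
    )
    used_tracks, used_dets, matches = set(), set(), []
    for p, i, j in pairs:
        if p < threshold:
            break  # remaining pairs are all below the threshold
        if i in used_tracks or j in used_dets:
            continue
        used_tracks.add(i)
        used_dets.add(j)
        matches.append((i, j))
    return sorted(matches)


if __name__ == "__main__":
    probs = [[0.9, 0.2, 0.1], [0.3, 0.8, 0.4]]   # 2 tracks, 3 detections
    labels = [[1, 0, 0], [0, 1, 0]]
    print(affinity_loss(probs, labels))
    print(greedy_associate(probs))
```

In this toy example the third detection stays unmatched, which in a tracker would typically start a new track. A Hungarian-style solver could replace the greedy matcher where a globally optimal assignment is preferred.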