

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210093, China
Key Laboratory of Safety-Critical Software, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Tian-bao DU, E-mail: tbdu_312@outlook.com
Guo-hua SHEN, E-mail: ghshen@nuaa.edu.cn
Zhi-qiu HUANG, E-mail: zqhuang@nuaa.edu.cn
Received: 02 May 2019
Revised: 18 May 2020
Published Online: 04 July 2020
Published: August 2020
Tian-bao DU, Guo-hua SHEN, Zhi-qiu HUANG, et al. Automatic traceability link recovery via active learning[J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21(8): 1217-1225. DOI: 10.1631/FITEE.1900222.
Traceability link recovery (TLR) is an important and costly software engineering task that requires humans to establish relationships between source and target artifact sets within the same project. Previous research has proposed establishing traceability links with machine learning approaches. However, current machine learning approaches cannot be applied well to projects that lack traceability information (links), because training an effective predictive model requires humans to label too many traceability links. To save manpower, we propose a new TLR approach based on active learning (AL), called the AL-based approach. We evaluate the AL-based approach on seven commonly used traceability datasets and compare it with an information-retrieval-based approach and a state-of-the-art machine learning approach. The results indicate that the AL-based approach outperforms the other two approaches in terms of F-score.
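The abstract does not describe how the AL-based approach selects links or which learner it uses, so the following is only a generic sketch of pool-based active learning with uncertainty sampling, followed by an F-score evaluation. It assumes each candidate source-target pair has already been turned into a numeric feature vector. The logistic-regression learner, the toy feature values, the `oracle` standing in for a human analyst, and all function names (`train_logreg`, `active_learning`, `f_score`) are hypothetical illustrations, not the paper's method.

```python
import math
import random


def predict_proba(w, b, x):
    """Probability that candidate link x is a true trace link (logistic model)."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-60.0, min(60.0, z))  # clip to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=300, lr=0.5):
    """Fit logistic regression with full-batch gradient descent (stdlib only)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/dz for one example
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def active_learning(pool, oracle, seed_ids, budget):
    """Pool-based active learning with uncertainty sampling.

    oracle(i) is a hypothetical stand-in for a human analyst who vets
    candidate link i and answers 1 (true link) or 0 (not a link).
    """
    labeled = {i: oracle(i) for i in seed_ids}
    for _ in range(budget):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        w, b = train_logreg([pool[i] for i in labeled], list(labeled.values()))
        # Ask the analyst about the candidate the model is least sure of (p closest to 0.5).
        query = min(unlabeled, key=lambda i: abs(predict_proba(w, b, pool[i]) - 0.5))
        labeled[query] = oracle(query)
    # Final model is trained on every link the analyst has labeled.
    w, b = train_logreg([pool[i] for i in labeled], list(labeled.values()))
    return w, b, labeled


def f_score(predicted, actual):
    """F1 over sets of links: harmonic mean of precision and recall."""
    tp = len(predicted & actual)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(actual)
    return 2 * precision * recall / (precision + recall)


# Toy pool: each candidate pair is a hypothetical 2-D standardized feature vector.
rng = random.Random(0)
pool = [[rng.uniform(0.3, 1.0), rng.uniform(0.3, 1.0)] for _ in range(20)]      # true links
pool += [[rng.uniform(-1.0, -0.3), rng.uniform(-1.0, -0.3)] for _ in range(20)]  # non-links
labels = [1] * 20 + [0] * 20

w, b, labeled = active_learning(pool, lambda i: labels[i], seed_ids=[0, 20], budget=10)
predicted = {i for i, x in enumerate(pool) if predict_proba(w, b, x) >= 0.5}
actual = {i for i, y in enumerate(labels) if y == 1}
print(len(labeled), round(f_score(predicted, actual), 2))
```

Here the analyst labels only 12 of the 40 candidates (2 seeds plus a budget of 10 queries), which is the labeling saving that active learning aims for. The toy pool is balanced and cleanly separable for brevity; real candidate-link pools are usually heavily imbalanced, with far fewer true links than non-links, and would need a stronger learner and imbalance handling.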