College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210093, China
Key Laboratory of Safety-Critical Software, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Tian-bao DU, E-mail: tbdu_312@outlook.com
Guo-hua SHEN, E-mail: ghshen@nuaa.edu.cn
Zhi-qiu HUANG, E-mail: zqhuang@nuaa.edu.cn
Published: 2020-08
Published Online: 04 July 2020
Received: 02 May 2019
Revised: 18 May 2020
Tian-bao DU, Guo-hua SHEN, Zhi-qiu HUANG, et al. Automatic traceability link recovery via active learning [J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21(8): 1217-1225. DOI: 10.1631/FITEE.1900222.
Traceability link recovery (TLR) is an important and costly software task that requires humans to establish relationships between source and target artifact sets within the same project. Previous research has proposed establishing traceability links with machine learning approaches. However, current machine learning approaches cannot be readily applied to projects without existing traceability information (links), because training an effective predictive model requires humans to label too many traceability links. To save manpower, we propose a new TLR approach based on active learning (AL), which we call the AL-based approach. We evaluate the AL-based approach on seven commonly used traceability datasets and compare it with an information-retrieval-based approach and a state-of-the-art machine learning approach. The results indicate that the AL-based approach outperforms the other two approaches in terms of F-score.
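The abstract does not state which classifier, features, or query strategy the AL-based approach uses, so the following is only a minimal sketch of the general idea under stated assumptions: pool-based active learning with uncertainty sampling, a toy one-dimensional threshold model, and made-up similarity scores. The names `active_learning`, `fit_threshold`, `scores`, and `oracle` are hypothetical and are not taken from the paper. The sketch also shows how an F-score over recovered links can be computed.

```python
from typing import Callable, Dict, Set, Tuple

Link = Tuple[str, str]  # (source artifact id, target artifact id)


def f_score(predicted: Set[Link], true: Set[Link]) -> float:
    """Harmonic mean of precision and recall over recovered links."""
    hits = len(predicted & true)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(true)
    return 2 * precision * recall / (precision + recall)


def fit_threshold(labeled: Dict[Link, bool], scores: Dict[Link, float]) -> float:
    """Toy model (assumption): midpoint between the highest-scoring labeled
    non-link and the lowest-scoring labeled link."""
    pos = [scores[link] for link, is_link in labeled.items() if is_link]
    neg = [scores[link] for link, is_link in labeled.items() if not is_link]
    lo = max(neg) if neg else 0.0
    hi = min(pos) if pos else 1.0
    return (lo + hi) / 2


def active_learning(
    scores: Dict[Link, float],
    oracle: Callable[[Link], bool],
    budget: int,
) -> Set[Link]:
    """Ask the human oracle for at most `budget` labels, each time querying
    the candidate the current model is least certain about."""
    labeled: Dict[Link, bool] = {}
    threshold = 0.5  # initial guess before any label is available
    for _ in range(budget):
        pool = [link for link in scores if link not in labeled]
        if not pool:
            break
        # Uncertainty sampling: the candidate closest to the decision boundary.
        query = min(pool, key=lambda link: abs(scores[link] - threshold))
        labeled[query] = oracle(query)
        threshold = fit_threshold(labeled, scores)
    return {link for link, score in scores.items() if score >= threshold}


# Hypothetical similarity scores for eight candidate links (illustrative data).
scores = {
    ("R1", "C1"): 0.91, ("R1", "C2"): 0.15, ("R2", "C1"): 0.48,
    ("R2", "C3"): 0.77, ("R3", "C2"): 0.66, ("R3", "C4"): 0.30,
    ("R4", "C4"): 0.58, ("R4", "C1"): 0.05,
}
true_links = {("R1", "C1"), ("R2", "C3"), ("R3", "C2")}


def oracle(link: Link) -> bool:
    """Stands in for the human who labels a queried candidate."""
    return link in true_links


predicted = active_learning(scores, oracle, budget=4)
print(sorted(predicted), f_score(predicted, true_links))
```

In this toy setting the loop labels only four of the eight candidates, all of them near the decision boundary, which is the sense in which active learning aims to reduce labeling effort. A realistic setup would replace the threshold model with a trained classifier over richer features.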
Keywords: Automatic; Traceability link recovery; Manpower; Active learning