[1] Gai Jinjing, Zheng Shang, Yu Hualong, et al. A Cross Project Defect Prediction Method for Source Project Training Data Selection[J]. Journal of Nanjing Normal University (Natural Science Edition), 2022, (01): 110-117. [doi:10.3969/j.issn.1001-4616.2022.01.016]

A Cross Project Defect Prediction Method for Source Project Training Data Selection

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Volume:
Issue:
2022, Issue 01
Pages:
110-117
Section:
Computer Science and Technology
Publication date:
2022-03-15

Article Info

Title:
A Cross Project Defect Prediction Method for Source Project Training Data Selection
Article ID:
1001-4616(2022)01-0110-08
Author(s):
Gai Jinjing, Zheng Shang, Yu Hualong, Gao Shang
(School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China)
Keywords:
cross-project defect prediction; software quality; data selection; JS divergence; relative density; Gaussian mixture model; Monte Carlo
CLC number:
TP311.5
DOI:
10.3969/j.issn.1001-4616.2022.01.016
Document code:
A
Abstract:
In real software development, the target project that needs defect prediction may be newly started or may have very little training data of its own, so a prediction model has to be built from training data already collected for other projects. Cross-project defect prediction (CPDP) addresses this setting; it has become a means of software quality assurance and has attracted attention from researchers in China and abroad. However, for different target projects, the choice of training data directly affects the performance of the prediction model. To address this problem, this paper describes a cross-project software defect prediction method based on Jensen-Shannon (JS) divergence and relative density. First, a Gaussian mixture model (GMM) is fitted to the source projects and the target project separately, and the JS divergence between the target project and every candidate project is computed by the Monte Carlo method. Second, the source project closest to the target project is selected according to the obtained JS divergence. Third, the concept of relative density is proposed and used to select training data effectively from the chosen source project. Finally, classifiers commonly used in CPDP are used to build the prediction model. Experimental comparisons show that the proposed method not only improves the performance of cross-project defect prediction models but also adapts well to different classifiers.
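
The first two steps of the abstract (Monte Carlo estimation of the JS divergence between mixture models, then picking the closest source project) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Mixture1D` class, the function names, and the one-dimensional toy mixtures are all hypothetical, and the GMM fitting step is assumed to have happened already (in practice a library EM fit on multi-dimensional metric data would supply the parameters). The relative-density step is not shown because the abstract does not define it.

```python
import math
import random


class Mixture1D:
    """Toy 1-D Gaussian mixture with fixed parameters.

    Hypothetical stand-in for a GMM already fitted to one project's data.
    """

    def __init__(self, weights, means, stds):
        self.weights, self.means, self.stds = weights, means, stds

    def sample(self, n, rng):
        # Pick a component for each draw, then sample from that Gaussian.
        comps = rng.choices(range(len(self.weights)), weights=self.weights, k=n)
        return [rng.gauss(self.means[c], self.stds[c]) for c in comps]

    def log_pdf(self, x):
        # Log-density via log-sum-exp over components for numerical stability.
        terms = [
            math.log(w) - math.log(s) - 0.5 * math.log(2 * math.pi)
            - 0.5 * ((x - m) / s) ** 2
            for w, m, s in zip(self.weights, self.means, self.stds)
        ]
        top = max(terms)
        return top + math.log(sum(math.exp(t - top) for t in terms))


def _log_mid(p, q, x):
    # Log-density of the midpoint distribution M = (P + Q) / 2.
    a, b = p.log_pdf(x), q.log_pdf(x)
    top = max(a, b)
    return math.log(0.5) + top + math.log(math.exp(a - top) + math.exp(b - top))


def js_divergence_mc(p, q, n=5000, seed=0):
    """Monte Carlo estimate of JS(P, Q) = 0.5*KL(P||M) + 0.5*KL(Q||M), in nats."""
    rng = random.Random(seed)
    kl_pm = sum(p.log_pdf(x) - _log_mid(p, q, x) for x in p.sample(n, rng)) / n
    kl_qm = sum(q.log_pdf(x) - _log_mid(p, q, x) for x in q.sample(n, rng)) / n
    return 0.5 * (kl_pm + kl_qm)


def closest_source(target, sources, **kwargs):
    """Return the name of the candidate with the smallest JS divergence to the target."""
    return min(sources, key=lambda name: js_divergence_mc(target, sources[name], **kwargs))
```

Sampling is used because the JS divergence between two Gaussian mixtures has no closed form; the estimate is bounded above by ln 2 and is zero when the two mixtures are identical. Calling `closest_source(target, {"A": gmm_a, "B": gmm_b})` with hypothetical fitted mixtures returns the key of the most similar candidate.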

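Memo

Received: 2020-07-27.
Funding: General Program of the Natural Science Foundation of Jiangsu Province (BK20191457); General Fund Project for Universities of Jiangsu Province (18JKB520011); Key Research and Development Project for Social Development of Zhenjiang, Jiangsu Province (SH2019021); High-Level Talent Start-up Project of Jiangsu University of Science and Technology.
Corresponding author: Zheng Shang, Ph.D., associate professor; research interest: intelligent software engineering. E-mail: szheng@just.edu.cn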