
A Cross Project Defect Prediction Method for Source Project Training Data Selection

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Issue:
2022, No. 01
Page:
110-117
Research Field:
Computer Science and Technology
Publishing date:

Info

Title:
A Cross Project Defect Prediction Method for Source Project Training Data Selection
Author(s):
Gai Jinjing, Zheng Shang, Yu Hualong, Gao Shang
(School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China)
Keywords:
cross project; defect prediction; software quality; data selection; JS divergence; relative density; mixed Gaussian model; Monte Carlo
PACS:
TP311.5
DOI:
10.3969/j.issn.1001-4616.2022.01.016
Abstract:
In real software development, a project that needs defect prediction is often new or has no historical data. It is therefore necessary to train on data from other projects and perform prediction on the target project. Consequently, cross-project defect prediction (CPDP) has become a means of software quality assurance and has been studied by researchers. However, the performance of a prediction model is directly affected by its training data. To address this problem, this paper proposes a cross-project software defect prediction method based on Jensen-Shannon (JS) divergence and relative density. First, a Gaussian mixture model (GMM) is fitted to the source and target projects respectively, and the JS divergence between the target project and each candidate source project is computed by the Monte Carlo method. Second, the source project most similar to the target project is selected according to the obtained JS divergence. Third, the concept of relative density is proposed to improve the quality of the training data of the selected source project. Finally, several common classifiers are used to build the prediction model. Experimental results show that the proposed method not only improves the performance of the prediction model but also adapts well to different classifiers.
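
The first two steps of the abstract (JS divergence between Gaussian mixtures estimated by Monte Carlo sampling, then choosing the closest source project) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one-dimensional mixtures whose parameters have already been fitted, and the function names (`gmm_pdf`, `gmm_sample`, `js_divergence_mc`, `select_source`), the sample size, and the example parameters are all hypothetical choices made for illustration.

```python
import math
import random

# A 1-D Gaussian mixture is a list of (weight, mean, std) tuples.
# Assumption: the parameters are already fitted (e.g. by EM); fitting is not shown.

def gmm_pdf(x, gmm):
    """Density of the mixture at point x."""
    return sum(
        w * math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        for w, mu, sd in gmm
    )

def gmm_sample(gmm, n, rng):
    """Draw n samples: pick a component by weight, then sample from it."""
    weights = [w for w, _, _ in gmm]
    components = rng.choices(gmm, weights=weights, k=n)
    return [rng.gauss(mu, sd) for _, mu, sd in components]

def js_divergence_mc(p, q, n=20000, seed=0):
    """Monte Carlo estimate of JS(P, Q) = 0.5*KL(P||M) + 0.5*KL(Q||M),
    where M = (P + Q) / 2. Each KL term is the sample mean of log(a(x)/m(x))
    over points x drawn from the first argument of that KL term."""
    rng = random.Random(seed)

    def kl_to_midpoint(a):
        total = 0.0
        for x in gmm_sample(a, n, rng):
            m = 0.5 * (gmm_pdf(x, p) + gmm_pdf(x, q))
            total += math.log(gmm_pdf(x, a) / m)
        return total / n

    return 0.5 * kl_to_midpoint(p) + 0.5 * kl_to_midpoint(q)

def select_source(target, candidates):
    """Return the name of the candidate with the smallest JS divergence to the target."""
    return min(candidates, key=lambda name: js_divergence_mc(target, candidates[name]))

# Hypothetical fitted mixtures for a target project and two candidate source projects.
target = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]
candidates = {
    "near": [(0.5, 0.2, 1.0), (0.5, 4.1, 1.1)],
    "far": [(1.0, 10.0, 1.0)],
}
best = select_source(target, candidates)
```

Sampling is used here because the KL divergence between two Gaussian mixtures has no closed form. With natural logarithms the true JS divergence lies between 0 and ln 2, so the estimates for different candidates are directly comparable.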

