[1] Song Huiling, Li Yong, Zhang Wenjing. Cross-project Software Defect Prediction Based on Federated Transfer[J]. Journal of Nanjing Normal University (Natural Science Edition), 2024, (03): 122-128. [doi:10.3969/j.issn.1001-4616.2024.03.015]

Cross-project Software Defect Prediction Based on Federated Transfer

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Volume:
Issue:
2024, No. 03
Pages:
122-128
Column:
Computer Science and Technology
Publication date:
2024-09-15

Article Info

Title:
Cross-project Software Defect Prediction Based on Federated Transfer
Article ID:
1001-4616(2024)03-0122-07
Author(s):
Song Huiling (1, 2), Li Yong (1, 2, 3), Zhang Wenjing (1, 2)
(1. College of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054, China)
(2. Software Development Department, Xinjiang Electronic Research Institute, Urumqi 830010, China)
(3. Key Laboratory of Ministry of Industry and Information Technology for Safety-critical Software Development and Verification, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
Keywords:
software defect prediction; federated learning; transfer learning; differential privacy; convolutional neural network
CLC number:
TP181; TP311.5
DOI:
10.3969/j.issn.1001-4616.2024.03.015
Document code:
A
Abstract:
Cross-project software defect prediction builds a model from labeled data of multiple source projects, which addresses the problems of insufficient historical software data and high labeling cost. In traditional cross-project defect prediction, however, source project data holders withhold their data to protect the commercial privacy of their software, and the resulting "data island" problem directly degrades model performance. This paper therefore proposes a cross-project software defect prediction method based on federated transfer (FT-CPDP). First, to address data privacy leakage and feature heterogeneity between projects, a model algorithm combining federated learning with transfer learning is proposed; it breaks down the "data barriers" among data holders and enables cross-project defect prediction in a privacy-preserving setting. Second, noise that satisfies the privacy budget is added during federated communication to raise the level of privacy protection. Finally, a convolutional neural network model is built to perform software defect prediction. Experiments on the NASA software defect prediction datasets show that, compared with traditional cross-project defect prediction methods, FT-CPDP achieves better overall performance while preserving the privacy of software data.
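The abstract does not specify the noise mechanism or the aggregation rule, so the following is only a minimal sketch of the general idea of adding privacy-budget noise during federated communication, not the FT-CPDP algorithm itself. It assumes a Laplace mechanism applied to L1-clipped client updates and sample-weighted averaging on the server; the names `privatize`, `fed_avg`, `clip_l1`, and `epsilon` are hypothetical, and privacy composition over multiple rounds is not accounted for.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(update, clip_l1, epsilon, rng):
    """Clip a client's update to L1 norm <= clip_l1, then add Laplace noise.

    Assumption: any two clipped updates differ by at most 2 * clip_l1 in
    L1 norm, so per-coordinate noise with scale 2 * clip_l1 / epsilon is the
    standard Laplace mechanism for one release with budget epsilon.
    """
    norm = sum(abs(x) for x in update)
    shrink = min(1.0, clip_l1 / norm) if norm > 0 else 1.0
    scale = 2.0 * clip_l1 / epsilon
    return [x * shrink + laplace_noise(scale, rng) for x in update]


def fed_avg(updates, sizes):
    """Average client updates on the server, weighted by local sample count."""
    total = sum(sizes)
    dim = len(updates[0])
    return [
        sum(u[i] * n for u, n in zip(updates, sizes)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical source projects, each holding a local model update.
    local_updates = [[0.4, -0.2, 0.1], [0.3, 0.5, -0.6]]
    local_sizes = [120, 80]
    noisy = [privatize(u, clip_l1=1.0, epsilon=0.5, rng=rng) for u in local_updates]
    print(fed_avg(noisy, local_sizes))
```

A smaller `epsilon` gives stronger privacy but larger noise, which is the trade-off a privacy budget controls. Only the noisy updates leave each data holder; the raw project data stays local.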

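Memo

Received: 2022-09-20.
Funding: Tianshan Youth Program of Xinjiang Uygur Autonomous Region (2020Q019); Doctoral Research Start-up Fund of Xinjiang Normal University (XJNUBS1905).
Corresponding author: Li Yong, Ph.D., associate professor; research interests: machine learning and empirical software engineering. E-mail: liyong@live.com
Last Update: 2024-09-15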