Lv Feng, Song Mei, Zhao Li, et al. An Improved Borderline-SMOTE Oversampling Method Based on Local Resultant Gravitation[J]. Journal of Nanjing Normal University (Natural Science Edition), 2025, 48(05): 93-103. [doi:10.3969/j.issn.1001-4616.2025.05.011]

An Improved Borderline-SMOTE Oversampling Method Based on Local Resultant Gravitation

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Volume:
48
Issue:
2025, No. 05
Pages:
93-103
Section:
Computer Science and Technology
Publication date:
2025-10-20

Article Info

Title:
An Improved Borderline-SMOTE Oversampling Method Based on Local Resultant Gravitation
Article number:
1001-4616(2025)05-0093-11
Author(s):
Lv Feng (吕峰), Song Mei (宋媚), Zhao Li (赵礼), Zhu Yi (祝义), Li Henan (李赫男)
(School of Computer Science and Technology, Jiangsu Normal University, Jiangsu Provincial Key Laboratory of Intelligent Education Technology, Xuzhou 221116, China)
Keywords:
imbalanced data; overfitting; class overlap; oversampling; Borderline-SMOTE; local resultant gravitation
CLC number:
TP181
DOI:
10.3969/j.issn.1001-4616.2025.05.011
Document code:
A
Abstract:
Data classification is a key step in ensuring effective big data analysis, and addressing class imbalance in classification has become a focus of current research. Owing to their simplicity and effectiveness, oversampling techniques have become one of the main approaches to handling class imbalance. However, existing oversampling techniques lack a sound sampling strategy for class overlap in imbalanced data, which leads to overfitting when machine learning models make predictions. This study therefore proposes an improved Borderline-SMOTE oversampling method based on local resultant gravitation (IBSLG). First, a boundary region is constructed from the nearest-neighbour distribution of the minority-class samples. Second, the concentration of each sample in this region is computed from its local resultant gravitation, and the samples are divided into low-probability and high-probability boundary samples according to their concentration. Third, a scaling factor is computed from the distributions of these two types of boundary samples to construct a new boundary region. Finally, new samples are generated adaptively in the new boundary region according to the class imbalance ratio. Comparative experiments between IBSLG and six sampling methods, using four classifiers on eight imbalanced datasets, show that IBSLG achieves the best F1, G-mean, AUC and Friedman rankings on most datasets and the highest average suboptimal ratio on most classifiers, demonstrating the effectiveness of the proposed method.
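The abstract describes IBSLG only at the level of its four steps, so the following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. Step 1 uses the standard Borderline-SMOTE "DANGER" rule [13]: a minority sample is a boundary sample when at least half, but not all, of its k nearest neighbours are majority samples. The local resultant force and concentration in step 2 are an assumed simplification in the spirit of local gravitation clustering [20]. They use unit masses, neighbours are taken within the minority class, and the concentration is the mean cosine between each neighbour's force and the direction toward the sample. The helper names, the threshold `threshold=0.0`, the choice of high-concentration samples as seeds, and balancing to an imbalance ratio of 1 are all hypothetical. Step 3 (the scaling factor) is omitted because the abstract does not define it.

```python
import math
import random


def k_nearest(x, points, k, exclude=None):
    """Indices of the k points closest to x (Euclidean), skipping index `exclude`."""
    candidates = [j for j in range(len(points)) if j != exclude]
    candidates.sort(key=lambda j: math.dist(x, points[j]))
    return candidates[:k]


def borderline_region(X, y, minority, k=5):
    """Step 1 (Borderline-SMOTE): minority samples with at least half, but not
    all, of their k nearest neighbours in the majority class (DANGER set)."""
    danger = []
    for i, label in enumerate(y):
        if label != minority:
            continue
        nn = k_nearest(X[i], X, k, exclude=i)
        n_major = sum(1 for j in nn if y[j] != minority)
        if k / 2 <= n_major < k:
            danger.append(i)
    return danger


def resultant_force(P, i, k):
    """ASSUMED local resultant force: vector sum of unit-mass pulls from the
    k nearest neighbours of P[i] (not necessarily the paper's definition)."""
    nn = k_nearest(P[i], P, k, exclude=i)
    force = [sum(P[j][d] - P[i][d] for j in nn) for d in range(len(P[i]))]
    return force, nn


def cosine(u, v):
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def concentration(P, i, k):
    """ASSUMED concentration: mean cosine between each neighbour's resultant
    force and the direction from that neighbour to P[i]; near 1 when the
    neighbours are pulled toward P[i]."""
    _, nn = resultant_force(P, i, k)
    total = 0.0
    for j in nn:
        force_j, _ = resultant_force(P, j, k)
        towards_i = [a - b for a, b in zip(P[i], P[j])]
        total += cosine(force_j, towards_i)
    return total / len(nn)


def oversample(X, y, minority, k=5, threshold=0.0, seed=0):
    """Hypothetical IBSLG-like pipeline for binary data (step 3 omitted)."""
    rng = random.Random(seed)
    danger = borderline_region(X, y, minority, k)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    if not danger or len(min_idx) < 2:
        return list(X), list(y)
    P = [X[i] for i in min_idx]               # minority samples only
    pos = {i: p for p, i in enumerate(min_idx)}
    k_min = min(k, len(P) - 1)
    # Step 2: split boundary samples by concentration (threshold is assumed).
    conc = {i: concentration(P, pos[i], k_min) for i in danger}
    high = [i for i in danger if conc[i] >= threshold]
    seeds = high or danger                    # assumed: prefer high-probability samples
    # Step 4: assumed target is a fully balanced set (imbalance ratio 1).
    n_new = (len(y) - len(min_idx)) - len(min_idx)
    new_X = []
    for _ in range(max(n_new, 0)):
        s = pos[rng.choice(seeds)]
        nbr = rng.choice(k_nearest(P[s], P, k_min, exclude=s))
        gap = rng.random()                    # SMOTE-style linear interpolation
        new_X.append(tuple(a + gap * (b - a) for a, b in zip(P[s], P[nbr])))
    return list(X) + new_X, list(y) + [minority] * len(new_X)
```

For example, with ten majority points on the line y = 0 and four minority points, the three minority points that sit just above the majority line form the DANGER set. The isolated minority point among majority neighbours is treated as noise, and six synthetic points bring the minority class up to ten.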

References:

[1]EL-ASSY A M,AMER H M,IBRAHIM H M,et al. A novel CNN architecture for accurate early detection and classification of Alzheimer's disease using MRI data[J]. Scientific reports,2024,14(1):3463-3481.
[2]HATAMI M,YAGHMAEE F,EBRAHIMPOUR R. Improving Alzheimer's disease classification using novel rewards in deep reinforcement learning[J]. Biomedical signal processing and control,2025,100(3):106920-106936.
[3]ULLAH F,ULLAH S,SRIVASTAVA G,et al. IDS-INT:intrusion detection system using transformer-based transfer learning for imbalanced network traffic[J]. Digital communications and networks,2024,10(1):190-204.
[4]LUO J,ZHANG Y,YANG F,et al. Imbalanced data fault diagnosis of rolling bearings using enhanced relative generative adversarial network[J]. Journal of mechanical science and technology,2024,38(2):541-555.
[5]FAN X,DUAN L,ZHANG N. A multi-scale graph-guided dynamic enhanced alignment network for mechanical fault diagnosis considering domain shift and data imbalance[J]. Neurocomputing,2025,625(1):129546-129559.
[6]XIE Y,LI A,HU B,et al. A credit card fraud detection model based on multi-feature fusion and generative adversarial network[J]. Computers,materials & continua,2023,76(3):2707-2726.
[7]GHOSH K,BELLINGER C,CORIZZO R,et al. The class imbalance problem in deep learning[J]. Machine learning,2024,113(7):4845-4901.
[8]CHEN L,JING X Y,CHEN R,et al. Sample-pair learning network for extremely imbalanced classification[J]. Neurocomputing,2025,634(1):129859-129871.
[9]MA H,ZHANG X,SONG M,et al. SD-CSMOTE:Over-sampling method based on SNN-DPC and improved SMOTE[J]. Neurocomputing,2025,620(1):129233-129243.
[10]AGUIAR G,KRAWCZYK B,CANO A. A survey on learning from imbalanced data streams:taxonomy,challenges,empirical study,and reproducible experimental framework[J]. Machine learning,2024,113(7):4165-4243.
[11]CHAWLA N V,BOWYER K W,HALL L O,et al. SMOTE:synthetic minority over-sampling technique[J]. Journal of artificial intelligence research,2002,16(1):321-357.
[12]HE H,BAI Y,GARCIA E A,et al. ADASYN:adaptive synthetic sampling approach for imbalanced learning[C]//2008 IEEE International Joint Conference on Neural Networks(IEEE World Congress on Computational Intelligence). Hong Kong:IEEE,2008:1322-1328.
[13]HAN H,WANG W Y,MAO B H. Borderline-SMOTE:a new over-sampling method in imbalanced data sets learning[C]//International Conference on Intelligent Computing. Berlin,Heidelberg:Springer,2005:878-887.
[14]CHEN B,XIA S,CHEN Z,et al. RSMOTE:a self-adaptive robust SMOTE for imbalanced problems with label noise[J]. Information sciences,2021,553(1):397-428.
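[15]CHEN H L,YANG C,DU M,et al. Credit risk prediction model based on borderline adaptive SMOTE and Focal Loss improved LightGBM[J]. Journal of computer applications,2022,42(7):2256-2264.(in Chinese)
[16]MA H,SONG M,ZHU Y. Borderline-SMOTE oversampling method with improved boundary classification[J]. Journal of Nanjing University (natural science),2023,59(6):1003-1012.(in Chinese)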
[17]GUO J,WU H,CHEN X,et al. Adaptive SV-Borderline SMOTE-SVM algorithm for imbalanced data classification[J]. Applied soft computing,2024,150(1):110986-110998.
[18]ZHANG R,LU S,YAN B,et al. A density-based oversampling approach for class imbalance and data overlap[J]. Computers & industrial engineering,2023,186(1):109747-109760.
[19]LU X,YE X,CHENG Y. An overlapping minimization-based over-sampling algorithm for binary imbalanced classification[J]. Engineering applications of artificial intelligence,2024,133(1):108107-108120.
[20]WANG Z,YU Z,CHEN C L P,et al. Clustering by local gravitation[J]. IEEE transactions on cybernetics,2018,48(5):1383-1396.
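[21]JI C P,SHANG J Q,DAI W. DC-SMOTE oversampling method for imbalanced datasets[J]. CAAI transactions on intelligent systems,2024,19(3):525-533.(in Chinese)
[22]TAO J Q,HE Z W,LENG Q K. Synthetic oversampling method for borderline minority samples based on Tomek links[J]. Application research of computers,2023,40(2):463-469.(in Chinese)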
[23]DOUZAS G,BACAO F,LAST F. Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE[J]. Information sciences,2018,465(1):1-20.
[24]FEI Y,CHEN F,HE L,et al. Intelligent classification of antenatal cardiotocography signals via multimodal bidirectional gated recurrent units[J]. Biomedical signal processing and control,2022,78:104008-104015.
[25]CHEN Q,YE A,ZHANG Y,et al. An intra-class distribution-focused generative adversarial network approach for imbalanced tabular data learning[J]. International journal of machine learning and cybernetics,2024,15(1):2551-2572.
[26]YANG K,YU Z,CHEN C L P,et al. Incremental weighted ensemble broad learning system for imbalanced data[J]. IEEE transactions on knowledge and data engineering,2022,34(12):5809-5822.


Notes:
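Received: 2025-02-25.
Funding: National Natural Science Foundation of China (71503108, 62077029, 62401235); Jiangsu Province Educational Science Planning Project (B-b/2024/01/47).
Corresponding author: Song Mei, PhD, associate professor; research interests: machine learning, big data analysis. E-mail: msong@jsnu.edu.cn
Last update: 2025-10-20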