Sun Lin, Shi Enhui, Si Shanshan, et al. Weak Label Feature Selection Method Based on AP Clustering and Mutual Information[J]. Journal of Nanjing Normal University (Natural Science Edition), 2022, 45(03): 108-115. [doi:10.3969/j.issn.1001-4616.2022.03.014]

Weak Label Feature Selection Method Based on AP Clustering and Mutual Information

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Volume:
Vol. 45
Issue:
2022, No. 03
Pages:
108-115
Section:
Computer Science and Technology
Publication Date:
2022-09-15

Article Info

Title:
Weak Label Feature Selection Method Based on AP Clustering and Mutual Information
Article ID:
1001-4616(2022)03-0108-08
Author(s):
Sun Lin, Shi Enhui, Si Shanshan, Xu Jiucheng
(College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China)
Keywords:
multi-label learning; feature selection; AP clustering; mutual information; missing labels
CLC Number:
TP399
DOI:
10.3969/j.issn.1001-4616.2022.03.014
Document Code:
A
Abstract:
Feature selection is an important preprocessing step in multi-label learning. To address the problems that existing multi-label classification methods neither consider the influence of label proportions on the correlation between features and label sets nor can effectively handle weak label data, a weak label feature selection method based on affinity propagation (AP) clustering and mutual information was proposed. First, based on AP clustering, the remaining label information was combined with sample similarity to construct a probability filling formula that predicts missing label values and effectively fills in all missing labels. Second, the prior probability was used to define the label proportion, which was combined with mutual information to construct a correlation measure for evaluating the degree of correlation between features and the label set. Finally, a weak label feature selection algorithm was designed to effectively improve the classification performance on weak label data. Simulation experiments on six multi-label datasets show that the proposed algorithm achieves good classification performance on multiple evaluation metrics and outperforms several current multi-label feature selection algorithms, which verifies its effectiveness.
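The three-step pipeline the abstract describes can be sketched in outline. The sketch below is an illustrative reconstruction, not the authors' code: all function names are hypothetical, the AP clustering assignment is assumed to be supplied externally (e.g. by an off-the-shelf affinity propagation implementation), a missing label is filled with the similarity-weighted proportion of positive labels among same-cluster samples, and feature relevance is the mutual information with each label weighted by that label's prior proportion of positives.

```python
# Illustrative sketch of the abstract's pipeline; names are hypothetical,
# and the exact probability filling formula and correlation measure in the
# paper may differ from these simplified stand-ins.
import math
from collections import defaultdict

def fill_missing_labels(Y, clusters, sim):
    """Fill missing entries (None) of the label matrix Y.

    Y        : list of label vectors, entries in {0, 1, None}
    clusters : cluster index of each sample (e.g. from AP clustering)
    sim      : sim[i][j] = similarity between samples i and j
    A missing label is replaced by the similarity-weighted proportion of
    positive labels among same-cluster samples, thresholded at 0.5.
    """
    n, q = len(Y), len(Y[0])
    filled = [row[:] for row in Y]
    for i in range(n):
        for k in range(q):
            if Y[i][k] is not None:
                continue
            num = den = 0.0
            for j in range(n):
                if j != i and clusters[j] == clusters[i] and Y[j][k] is not None:
                    num += sim[i][j] * Y[j][k]
                    den += sim[i][j]
            filled[i][k] = 1 if den > 0 and num / den >= 0.5 else 0
    return filled

def mutual_information(xs, ys):
    """Discrete mutual information I(X; Y) in bits."""
    n = len(xs)
    pxy, px, py = defaultdict(int), defaultdict(int), defaultdict(int)
    for x, y in zip(xs, ys):
        pxy[(x, y)] += 1
        px[x] += 1
        py[y] += 1
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def weighted_relevance(feature, Y):
    """Relevance of one discrete feature to the label set: MI with each
    label, weighted by that label's prior proportion of positives."""
    n, q = len(Y), len(Y[0])
    score = 0.0
    for k in range(q):
        col = [row[k] for row in Y]
        score += (sum(col) / n) * mutual_information(feature, col)
    return score
```

Ranking features by `weighted_relevance` on the filled label matrix and keeping the top k then gives the selected feature subset.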

References:

[1]YUAN J Z,GAO H,ZHOU J T,et al. Hierarchical multi-instance multi-label learning based on tree structure[J]. Journal of Nanjing Normal University(Natural Science Edition),2019,42(3):80-87.(in Chinese)
[2]SUN L,YIN T Y,DING W P,et al. Multilabel feature selection using ML-ReliefF and neighborhood mutual information for multilabel neighborhood decision systems[J]. Information sciences,2020,537:401-424.
[3]SUN L,YIN T Y,DING W P,et al. Feature selection with missing labels using multilabel fuzzy neighborhood rough sets and maximum relevance minimum redundancy[J]. IEEE transactions on fuzzy systems,2021,DOI:10.1109/TFUZZ.2021.3053844.
[4]XU H F,ZHANG Y,LIU J,et al. Feature selection method based on coefficient of variation and maximum feature tree[J]. Journal of Nanjing Normal University(Natural Science Edition),2021,44(1):111-118.(in Chinese)
[5]LIU Y,CHENG L,SUN L. Feature selection method based on K-S test and neighborhood rough sets[J]. Journal of Henan Normal University(Natural Science Edition),2019,47(2):21-28.(in Chinese)
[6]DENG W,GUO Y X,LI Y,et al. Distribution network loss prediction based on feature selection and Stacking ensemble learning[J]. Power System Protection and Control,2020,48(15):108-115.(in Chinese)
[7]WANG C X,LIN Y J,LIU J H. Feature selection for multi-label learning with missing labels[J]. Applied intelligence,2019,49(8):3027-3042.
[8]YING Z Y. Research and implementation of incomplete data processing methods based on AP clustering[D]. Beijing:Beijing University of Posts and Telecommunications,2018.(in Chinese)
[9]ZHU P F,XU Q,HU Q H,et al. Multi-label feature selection with missing labels[J]. Pattern recognition,2018,74:488-502.
[10]JIANG L,YU G X,GUO M Z,et al. Feature selection with missing labels based on label compression and local feature correlation[J]. Neurocomputing,2020,395:95-106.
[11]XUE Z A,PANG W L,YAO S Q,et al. Intuitionistic fuzzy three-way decision model based on prospect theory[J]. Journal of Henan Normal University(Natural Science Edition),2020,48(5):31-36.(in Chinese)
[12]LI Z,LI B. A domain ontology construction method based on association rules and K-means[J]. Journal of Henan Normal University(Natural Science Edition),2020,48(1):24-32.(in Chinese)
[13]WEI X X,HUANG H J,ZHOU Y Q. A fast classification algorithm of reduced twin support vector machines based on AP clustering[J]. Computer Engineering & Science,2019,41(10):1899-1904.(in Chinese)
[14]LEE J,KIM D W. Feature selection for multi-label classification using multivariate mutual information[J]. Pattern recognition letters,2013,34(3):349-357.
[15]LIN Y J,HU Q H,LIU J H,et al. Multi-label feature selection based on max-dependency and min-redundancy[J]. Neurocomputing,2015,168:92-103.
[16]SUN Z Q,ZHANG J,DAI L,et al. Mutual information based multi-label feature selection via constrained convex optimization[J]. Neurocomputing,2019,329:447-456.
[17]SHI E H,SUN L,XU J C,et al. Multilabel feature selection using mutual information and ML-ReliefF for multilabel classification[J]. IEEE access,2020,8:145381-145400.
[18]FREY B J,DUECK D. Clustering by passing messages between data points[J]. Science,2007,315(5814):972-976.
[19]XU H F,SUN Z Q. Fast feature selection method based on mutual information in multi-label learning[J]. Journal of Computer Applications,2019,39(10):2815-2821.(in Chinese)
[20]ZHANG M L,ZHOU Z H. ML-KNN:a lazy learning approach to multi-label learning[J]. Pattern recognition,2007,40(7):2038-2048.
[21]ZHANG Y,ZHOU Z H. Multilabel dimensionality reduction via dependence maximization[J]. ACM transactions on knowledge discovery from data,2010,4(3):1-21.
[22]ZHANG M L,PENA J M,ROBLES V. Feature selection for multilabel naive Bayes classification[J]. Information sciences,2009,179:3218-3229.
[23]LIN Y J,LI Y W,WANG C X,et al. Attribute reduction for multi-label learning with fuzzy rough set[J]. Knowledge-based systems,2018,152:51-61.
[24]FRIEDMAN M. A comparison of alternative tests of significance for the problem of m rankings[J]. Annals of mathematical statistics,1940,11(1):86-92.
[25]SUN L,ZHAO J,XU J C,et al. Feature selection method based on improved monarch butterfly optimization algorithm[J]. Pattern Recognition and Artificial Intelligence,2020,33(11):981-994.(in Chinese)
[26]DEMŠAR J. Statistical comparisons of classifiers over multiple data sets[J]. Journal of machine learning research,2006,7:1-30.

Similar Articles:

[1]Xu Haifeng,Zhang Yan,Liu Jiang,et al. Feature Selection Method Based on Coefficient of Variation and Maximum Feature Tree[J]. Journal of Nanjing Normal University(Natural Science Edition),2021,44(01):111.[doi:10.3969/j.issn.1001-4616.2021.01.016]
[2]Ji Shanshan. Neuron Network Tree and Artificial Bee Colony Optimization Based Data Clustering Algorithm[J]. Journal of Nanjing Normal University(Natural Science Edition),2021,44(01):119.[doi:10.3969/j.issn.1001-4616.2021.01.017]
[3]Lu Jiahua,Mei Fei,Yang Sai,et al. Short-term Load Forecasting Method Based on Feature Selection and Combination Forecasting Model[J]. Journal of Nanjing Normal University(Natural Science Edition),2023,46(04):114.[doi:10.3969/j.issn.1001-4616.2023.04.015]
[4]Mu Xiaoxia,Zheng Lijing. Tumor Gene Selection Based on F-score and Binary Grey Wolf Optimization[J]. Journal of Nanjing Normal University(Natural Science Edition),2024,(01):111.[doi:10.3969/j.issn.1001-4616.2024.01.013]

Memo:
Received: 2021-04-25.
Foundation items: National Natural Science Foundation of China (62076089, 61772176, 61976082); Science and Technology Research Project of Henan Province (212102210136).
Corresponding author: Sun Lin, Ph.D., associate professor; research interests: granular computing, data mining, bioinformatics. E-mail: sunlin@htu.edu.cn
Last Update: 2022-09-15