
Weak Label Feature Selection Method Based on AP Clustering and Mutual Information

Journal of Nanjing Normal University (Natural Science Edition) [ISSN:1001-4616/CN:32-1239/N]

Issue:
2022, No. 3
Page:
108-115
Research Field:
Computer Science and Technology
Publishing date:

Info

Title:
Weak Label Feature Selection Method Based on AP Clustering and Mutual Information
Author(s):
Sun Lin, Shi Enhui, Si Shanshan, Xu Jiucheng
(College of Computer and Information Engineering,Henan Normal University,Xinxiang 453007,China)
Keywords:
multi-label learning; feature selection; AP clustering; mutual information; missing labels
CLC Number:
TP399
DOI:
10.3969/j.issn.1001-4616.2022.03.014
Abstract:
Feature selection is an important preprocessing step in multi-label learning. Some multi-label classification methods neither account for the influence of the label proportion on the correlation between features and the label set nor handle weak label data efficiently. To address these issues, a weak label feature selection method based on affinity propagation (AP) clustering and mutual information is proposed. First, to fill in all missing labels, the remaining label information is combined with sample similarity via AP clustering, and a probability filling formula is constructed to predict the values of the missing labels. Second, the prior probability is used to define the label proportion, which is combined with mutual information to build a correlation metric that evaluates the degree of correlation between features and the label set. Finally, a weak label feature selection algorithm is designed to improve classification performance on weak label data. Experimental results on six multi-label datasets show that the algorithm achieves better classification performance on multiple evaluation metrics and outperforms many current multi-label feature selection algorithms, verifying its effectiveness.
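The pipeline described in the abstract can be illustrated with a minimal numpy-only sketch. Assumptions (the paper's exact formulas are not reproduced here): labels are binary with NaN marking missing entries; the paper's probability filling formula is stood in for by the label's observed frequency within each AP cluster; the label-proportion-weighted correlation metric is stood in for by prior-weighted mutual information summed over labels. All function names (`affinity_propagation`, `fill_missing_labels`, `rank_features`) are hypothetical, not the authors' code.

```python
import numpy as np

def affinity_propagation(S, max_iter=200, damping=0.9):
    """Affinity propagation (Frey & Dueck, 2007) on a similarity matrix S whose
    diagonal holds the exemplar preferences. Returns each sample's exemplar index."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities
    A = np.zeros((n, n))  # availabilities
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        idx = AS.argmax(axis=1)
        first = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf
        second = AS.max(axis=1)
        Rnew = S - first[:, None]
        Rnew[np.arange(n), idx] = S[np.arange(n), idx] - second
        R = damping * R + (1 - damping) * Rnew
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        colsum = Rp.sum(axis=0)
        Anew = np.minimum(0, colsum[None, :] - Rp)
        np.fill_diagonal(Anew, colsum - np.diag(R))
        A = damping * A + (1 - damping) * Anew
    exemplars = np.flatnonzero(np.diag(R) + np.diag(A) > 0)
    if exemplars.size == 0:  # degenerate case: force a single exemplar
        exemplars = np.array([np.argmax(np.diag(R) + np.diag(A))])
    clusters = exemplars[S[:, exemplars].argmax(axis=1)]
    clusters[exemplars] = exemplars  # each exemplar belongs to itself
    return clusters

def fill_missing_labels(Y, clusters):
    """Replace NaN entries of a (samples x labels) 0/1 matrix by the label's
    observed frequency among same-cluster samples (a probability-style fill)."""
    Yf = Y.astype(float).copy()
    for c in np.unique(clusters):
        members = clusters == c
        for j in range(Y.shape[1]):
            known = members & ~np.isnan(Y[:, j])
            missing = members & np.isnan(Y[:, j])
            if missing.any() and known.any():
                Yf[missing, j] = Y[known, j].mean()
    return Yf

def mutual_info(x, y):
    """Mutual information (in nats) between two discrete vectors."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:
                mi += pxy * np.log(pxy / (np.mean(x == xv) * np.mean(y == yv)))
    return mi

def rank_features(X, Y):
    """Rank discrete features by label-prior-weighted MI summed over all labels."""
    priors = Y.mean(axis=0)  # label proportions used as weights
    scores = [sum(p * mutual_info(X[:, f], Y[:, j]) for j, p in enumerate(priors))
              for f in range(X.shape[1])]
    return np.argsort(-np.array(scores))

# Toy demo: two tight clusters of samples; sample 2 has a missing label (NaN).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
np.fill_diagonal(S, np.median(S[~np.eye(len(X), dtype=bool)]))  # preferences
clusters = affinity_propagation(S)
Y = np.array([[1.0], [1.0], [np.nan], [0.0], [0.0], [0.0]])
Yf = fill_missing_labels(Y, clusters)            # NaN filled from its cluster
F = np.array([[1, 0], [1, 1], [1, 0], [0, 0], [0, 1], [0, 1]])  # discrete features
order = rank_features(F, (Yf >= 0.5).astype(int))  # informative feature ranked first
```

The median off-diagonal similarity is a common default for the AP preference; the real method's filling and weighting formulas would replace the cluster-frequency fill and the plain prior weights used here.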

References:

[1]YUAN J Z,GAO H,ZHOU J T,et al. Hierarchical multi-instance multi-label learning based on tree structure[J]. Journal of Nanjing Normal University (Natural Science Edition),2019,42(3):80-87. (in Chinese)
[2]SUN L,YIN T Y,DING W P,et al. Multilabel feature selection using ML-ReliefF and neighborhood mutual information for multilabel neighborhood decision systems[J]. Information sciences,2020,537:401-424.
[3]SUN L,YIN T Y,DING W P,et al. Feature selection with missing labels using multilabel fuzzy neighborhood rough sets and maximum relevance minimum redundancy[J]. IEEE transactions on fuzzy systems,2021,DOI:10.1109/TFUZZ.2021.3053844.
[4]XU H F,ZHANG Y,LIU J,et al. Feature selection method based on coefficient of variation and maximum feature tree[J]. Journal of Nanjing Normal University (Natural Science Edition),2021,44(1):111-118. (in Chinese)
[5]LIU Y,CHENG L,SUN L. Feature selection method based on K-S test and neighborhood rough set[J]. Journal of Henan Normal University (Natural Science Edition),2019,47(2):21-28. (in Chinese)
[6]DENG W,GUO Y X,LI Y,et al. Loss prediction of distribution network based on feature selection and Stacking ensemble learning[J]. Power System Protection and Control,2020,48(15):108-115. (in Chinese)
[7]WANG C X,LIN Y J,LIU J H. Feature selection for multi-label learning with missing labels[J]. Applied intelligence,2019,49(8):3027-3042.
[8]YING Z Y. Research and implementation of incomplete data processing methods based on AP clustering[D]. Beijing:Beijing University of Posts and Telecommunications,2018. (in Chinese)
[9]ZHU P F,XU Q,HU Q H,et al. Multi-label feature selection with missing labels[J]. Pattern recognition,2018,74:488-502.
[10]JIANG L,YU G X,GUO M Z,et al. Feature selection with missing labels based on label compression and local feature correlation[J]. Neurocomputing,2020,395:95-106.
[11]XUE Z A,PANG W L,YAO S Q,et al. Intuitionistic fuzzy three-way decision model based on prospect theory[J]. Journal of Henan Normal University (Natural Science Edition),2020,48(5):31-36. (in Chinese)
[12]LI Z,LI B. A domain ontology construction method based on association rules and K-means[J]. Journal of Henan Normal University (Natural Science Edition),2020,48(1):24-32. (in Chinese)
[13]WEI X X,HUANG H J,ZHOU Y Q. Fast classification algorithm of reduced twin support vector machines based on AP clustering[J]. Computer Engineering and Science,2019,41(10):1899-1904. (in Chinese)
[14]LEE J,KIM D W. Feature selection for multi-label classification using multivariate mutual information[J]. Pattern recognition letters,2013,34(3):349-357.
[15]LIN Y J,HU Q H,LIU J H,et al. Multi-label feature selection based on max-dependency and min-redundancy[J]. Neurocomputing,2015,168:92-103.
[16]SUN Z Q,ZHANG J,DAI L,et al. Mutual information based multi-label feature selection via constrained convex optimization[J]. Neurocomputing,2019,329:447-456.
[17]SHI E H,SUN L,XU J C,et al. Multilabel feature selection using mutual information and ML-ReliefF for multilabel classification[J]. IEEE access,2020,8:145381-145400.
[18]FREY B J,DUECK D. Clustering by passing messages between data points[J]. Science,2007,315(5814):972-976.
[19]XU H F,SUN Z Q. Fast feature selection method based on mutual information in multi-label learning[J]. Journal of Computer Applications,2019,39(10):2815-2821. (in Chinese)
[20]ZHANG M L,ZHOU Z H. ML-KNN:a lazy learning approach to multi-label learning[J]. Pattern recognition,2007,40:2038-2048.
[21]ZHANG Y,ZHOU Z H. Multilabel dimensionality reduction via dependence maximization[J]. ACM transactions on knowledge discovery from data,2010,4(3):1-21.
[22]ZHANG M L,PENA J M,ROBLES V. Feature selection for multilabel naive Bayes classification[J]. Information sciences,2009,179:3218-3229.
[23]LIN Y J,LI Y W,WANG C X,et al. Attribute reduction for multi-label learning with fuzzy rough set[J]. Knowledge-based systems,2018,152:51-61.
[24]FRIEDMAN M. A comparison of alternative tests of significance for the problem of m rankings[J]. Annals of mathematical statistics,1940,11(1):86-92.
[25]SUN L,ZHAO J,XU J C,et al. Feature selection method based on improved monarch butterfly optimization algorithm[J]. Pattern Recognition and Artificial Intelligence,2020,33(11):981-994. (in Chinese)
[26]DEMŠAR J. Statistical comparisons of classifiers over multiple data sets[J]. Journal of machine learning research,2006,7:1-30.

Last Update: 2022-09-15