[1] Xu Haifeng, Zhang Yan, Liu Jiang, et al. Feature Selection Method Based on Coefficient of Variation and Maximum Feature Tree[J]. Journal of Nanjing Normal University (Natural Science Edition), 2021, (01): 111-118. [doi:10.3969/j.issn.1001-4616.2021.01.016]

Feature Selection Method Based on Coefficient of Variation and Maximum Feature Tree

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Volume:
Issue:
2021, No. 01
Pages:
111-118
Section:
Computer Science and Technology
Publication date:
2021-03-15

Article Info

Title:
Feature Selection Method Based on Coefficient of Variation and Maximum Feature Tree
Article ID:
1001-4616(2021)01-0111-08
Author(s):
Xu Haifeng, Zhang Yan, Liu Jiang, Lü Danjv
School of Big Data and Intelligent Engineering, Southwest Forestry University, Kunming 650224, Yunnan, China
Keywords:
feature selection; feature contribution scoring; coefficient of variation; mutual information; maximum feature tree; two-neighborhood redundancy removal
Classification number:
TP3-0
DOI:
10.3969/j.issn.1001-4616.2021.01.016
Document code:
A
Abstract:
Feature selection is a key process in data mining, and feature contribution scoring and feature optimization are its core parts. For feature contribution scoring, this paper proposes CVMI (coefficient of variation and mutual information), a method that uses the coefficient of variation to measure intra-class distance and mutual information to measure inter-class distance, and applies it to embedded feature selection for feature optimization. The experiments use four UCI data sets, one remote sensing data set, and one bird sound data set, and compare CVMI with seven feature contribution scoring methods. The results show that CVMI is more consistent with the objective pattern of feature contribution evaluation and achieves better results than the other seven methods. In addition, the paper proposes CVMI-RRMFT (remove redundancy of maximum feature tree), a feature optimization method that builds a maximum feature tree from CVMI feature scores and removes redundancy using two-neighborhoods. Experiments on the same data sets show that this method both reduces data dimensionality effectively and improves classification accuracy.
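
The abstract names the two ingredients of CVMI but does not give the scoring formula, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that a feature is better when its within-class coefficient of variation is small and its mutual information with the class label is large, and it combines the two as a simple ratio. The function names (`within_class_cv`, `mutual_information`, `cvmi_like_scores`), the equal-width binning, and the ratio itself are all hypothetical choices made for illustration.

```python
import numpy as np

def within_class_cv(x, y, eps=1e-12):
    """Mean coefficient of variation (std / |mean|) of feature x inside each class."""
    cvs = []
    for label in np.unique(y):
        values = x[y == label]
        cvs.append(values.std() / (abs(values.mean()) + eps))
    return float(np.mean(cvs))

def mutual_information(x, y, bins=10):
    """Estimate I(X; Y) in nats after equal-width binning of the continuous feature x."""
    edges = np.histogram_bin_edges(x, bins=bins)
    x_binned = np.digitize(x, edges[1:-1])  # bin index in 0..bins-1
    labels, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((bins, len(labels)))
    np.add.at(joint, (x_binned, y_idx), 1)  # joint histogram of (bin, class)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal over bins
    py = joint.sum(axis=0, keepdims=True)   # marginal over classes
    nz = joint > 0                          # skip empty cells to avoid log(0)
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def cvmi_like_scores(X, y, bins=10, eps=1e-12):
    """Assumed combination (not from the paper): mutual information / within-class CV."""
    return np.array([
        mutual_information(X[:, j], y, bins) / (within_class_cv(X[:, j], y) + eps)
        for j in range(X.shape[1])
    ])

# Synthetic check: column 0 separates the two classes, column 1 is pure noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 200)
informative = np.where(y == 0, 5.0, 10.0) + rng.normal(0, 0.5, y.size)
noise = rng.normal(7.5, 2.0, y.size)
X = np.column_stack([informative, noise])

scores = cvmi_like_scores(X, y)
ranking = np.argsort(scores)[::-1]  # feature indices, best score first
```

On this toy data the class-separating column should rank ahead of the noise column. Note that the coefficient of variation is only meaningful for features whose class means are not close to zero, so real data would likely need shifting or scaling before a score of this kind is applied.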



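Memo:
Received: 2020-09-16.
Funding: National Natural Science Foundation of China (61462078, 31860332); Scientific Research Fund of the Yunnan Provincial Department of Education (2017ZZX212).
Corresponding author: Zhang Yan, Ph.D., professor; research interests: intelligent information processing, machine learning. E-mail: zydyr@163.com
Last Update: 2021-03-15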