Mu Xiaoxia, Zheng Lijing. Tumor Gene Selection Based on F-score and Binary Grey Wolf Optimization[J]. Journal of Nanjing Normal University (Natural Science Edition), 2024, (01): 111-120. [doi:10.3969/j.issn.1001-4616.2024.01.013]

Tumor Gene Selection Based on F-score and Binary Grey Wolf Optimization

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Issue: 2024, No. 01
Pages: 111-120
Section: Computer Science and Technology
Publication date: 2024-03-15

Article Info

Title: Tumor Gene Selection Based on F-score and Binary Grey Wolf Optimization
Article number: 1001-4616(2024)01-0111-10
Authors: Mu Xiaoxia, Zheng Lijing
(College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China)
Keywords: tumor gene; Fisher-score; Spearman correlation coefficient; binary grey wolf optimization algorithm; feature selection
CLC number: TP311
DOI: 10.3969/j.issn.1001-4616.2024.01.013
Document code: A
Abstract:
Tumor gene expression data are high-dimensional, noisy, and highly redundant. To address this, this paper improves the F-score algorithm with the Spearman correlation coefficient, optimizes the binary grey wolf algorithm on that basis, and proposes a tumor gene selection algorithm based on the improved F-score and binary grey wolf optimization. First, to account for the correlation between features, the F-score of each feature and the absolute value of the Spearman correlation coefficient between features are computed. Second, weight coefficients are calculated to obtain a weight for each feature; the features are ranked by importance and a preliminary feature subset is selected. Finally, the binary grey wolf algorithm is optimized through the decay curve of its convergence factor and its initialization method, which adjusts the balance between global and local search, strengthens global search, and speeds up local search. This reduces time overhead, improves the classification performance and efficiency of feature selection, and yields the optimal feature subset. The proposed algorithm is tested on nine tumor gene datasets, with simulation experiments measuring classification accuracy and the number of selected features, and is compared with four other algorithms. The experimental results show that the proposed algorithm performs well, effectively reduces the dimensionality of gene data, and achieves good classification accuracy.
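The filter stage described in the abstract (an F-score per feature, the absolute Spearman correlation between features, a weight per feature, then a ranking) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `f_score` uses a common multi-class form of the F-score (scatter of the class means around the overall mean divided by the summed within-class sample variances); Spearman's rho is computed as the Pearson correlation of average ranks; and the weighting in the hypothetical `preselect` function (min-max scaled F-score minus the mean absolute correlation to the other features, balanced by a hypothetical parameter `lam`) is an illustrative stand-in, because the abstract does not give the paper's weight coefficient.

```python
import numpy as np


def f_score(X, y):
    """Per-feature F-score, multi-class form: sum over classes of the squared
    distance between class mean and overall mean, divided by the sum of the
    within-class sample variances."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    numerator = np.zeros(X.shape[1])
    denominator = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        numerator += (Xc.mean(axis=0) - overall_mean) ** 2
        # ddof=1 gives the 1/(n_j - 1) sample variance (needs >= 2 samples per class)
        denominator += Xc.var(axis=0, ddof=1)
    # Guard against division by zero for features constant within every class
    return numerator / np.maximum(denominator, 1e-12)


def average_ranks(col):
    """Ranks 1..n; tied values share the mean of the positions they occupy."""
    order = np.argsort(col, kind="mergesort")
    sorted_vals = col[order]
    ranks = np.empty(len(col))
    i = 0
    while i < len(col):
        j = i
        while j + 1 < len(col) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def abs_spearman_matrix(X):
    """|Spearman rho| between every pair of features (columns of X).
    Assumes no column is constant (np.corrcoef would return nan)."""
    X = np.asarray(X, dtype=float)
    R = np.column_stack([average_ranks(X[:, k]) for k in range(X.shape[1])])
    return np.abs(np.corrcoef(R, rowvar=False))


def preselect(X, y, k, lam=0.5):
    """Hypothetical weighting: reward relevance (scaled F-score) and penalize
    redundancy (mean |rho| to the other features); keep the top-k features."""
    f = f_score(X, y)
    f_scaled = (f - f.min()) / (f.max() - f.min() + 1e-12)
    rho = abs_spearman_matrix(X)
    n = rho.shape[0]
    redundancy = (rho.sum(axis=1) - 1.0) / (n - 1)  # drop the self-correlation of 1
    weight = lam * f_scaled - (1 - lam) * redundancy
    ranked = np.argsort(-weight)  # most important feature first
    return ranked[:k], weight
```

With `lam` close to 1 the ranking follows relevance alone; smaller values push genes that are strongly correlated with the rest further down the list. The returned indices would then form the preliminary subset passed to the wrapper search.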
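The wrapper stage can be illustrated with a binary grey wolf optimizer. This sketch assumes the standard GWO position update of Mirjalili et al. (2014) and the sigmoid transfer function of the binary variant by Emary et al. (2016). The cosine-shaped `convergence_factor`, the uniform random initialization, the elitism step, and the names `bgwo` and `toy_fitness` are hypothetical placeholders: the abstract states only that the decay curve of the convergence factor and the initialization method were modified, not how. In practice `fitness` would evaluate a classifier on the features marked in the preliminary subset.

```python
import numpy as np


def convergence_factor(t, max_iter, a0=2.0):
    """Hypothetical nonlinear decay of the convergence factor a from a0 to 0.

    The cosine curve keeps a large early (more global search) and shrinks it
    faster near the end (more local search). Standard GWO uses the linear
    schedule a0 * (1 - t / max_iter).
    """
    return a0 * np.cos(0.5 * np.pi * t / max_iter)


def bgwo(fitness, n_features, n_wolves=10, max_iter=50, seed=0):
    """Binary grey wolf optimizer that minimizes fitness(mask), where mask is
    a boolean vector marking the selected features."""
    if n_wolves < 3:
        raise ValueError("need at least 3 wolves for alpha, beta and delta")
    rng = np.random.default_rng(seed)
    # Placeholder initialization: each bit is switched on with probability 0.5
    wolves = rng.random((n_wolves, n_features)) < 0.5
    wolves[~wolves.any(axis=1), 0] = True  # never evaluate an empty subset
    scores = np.array([fitness(w) for w in wolves], dtype=float)
    best = int(np.argmin(scores))
    best_mask, best_score = wolves[best].copy(), float(scores[best])

    for t in range(max_iter):
        a = convergence_factor(t, max_iter)
        # alpha, beta, delta: the three lowest-fitness wolves in the pack
        leaders = wolves[np.argsort(scores)[:3]].astype(float)
        new_wolves = np.empty_like(wolves)
        for i in range(n_wolves):
            x = wolves[i].astype(float)
            moves = []
            for leader in leaders:
                A = 2.0 * a * rng.random(n_features) - a
                C = 2.0 * rng.random(n_features)
                D = np.abs(C * leader - x)
                moves.append(leader - A * D)
            x_cont = np.mean(moves, axis=0)
            # Sigmoid transfer, then stochastic binarization of each bit
            prob = 1.0 / (1.0 + np.exp(-10.0 * (x_cont - 0.5)))
            new_wolves[i] = rng.random(n_features) < prob
        wolves = new_wolves
        wolves[~wolves.any(axis=1), 0] = True
        scores = np.array([fitness(w) for w in wolves], dtype=float)
        i_best = int(np.argmin(scores))
        if scores[i_best] < best_score:
            best_mask, best_score = wolves[i_best].copy(), float(scores[i_best])
        elif scores[i_best] > best_score:
            # Elitism: put the best-so-far subset back in place of the worst wolf
            worst = int(np.argmax(scores))
            wolves[worst], scores[worst] = best_mask, best_score
    return best_mask, best_score


if __name__ == "__main__":
    # Toy stand-in for a classifier-based fitness such as
    # 0.99 * error_rate + 0.01 * (selected / total): recover a known subset.
    target = np.zeros(20, dtype=bool)
    target[[2, 5, 11]] = True

    def toy_fitness(mask):
        return 0.99 * np.mean(mask != target) + 0.01 * mask.mean()

    mask, score = bgwo(toy_fitness, 20, seed=42)
    print(np.flatnonzero(mask), round(score, 4))
```

Swapping in a different decay schedule for `convergence_factor` changes only how quickly the pack shifts from global to local search; the rest of the loop stays the same.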


Memo
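Received: 2023-05-31.
Funding: National Natural Science Foundation of China (61772176).
Corresponding author: Mu Xiaoxia, PhD, associate professor; research interests: machine learning, mathematical modeling, etc. E-mail: muxx1980@126.com
Last update: 2024-03-15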