[1] Wang Yao, Chai Bianfang, Li Wenbin, et al. A Semi-supervised Clustering Algorithm Based on Anti-annealing and Gaussian Mixture Model[J]. Journal of Nanjing Normal University (Natural Science Edition), 2017, 40(03): 67. [doi:10.3969/j.issn.1001-4616.2017.03.010]

A Semi-supervised Clustering Algorithm Based on Anti-annealing and Gaussian Mixture Model

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Volume:
Vol. 40
Issue:
2017, No. 03
Pages:
67
Section:
Computer Science
Publication date:
2017-09-30

Article Info

Title:
A Semi-supervised Clustering Algorithm Based on Anti-annealing and Gaussian Mixture Model
Article ID:
1001-4616(2017)03-0067-07
Author(s):
Wang Yao (王垚), Chai Bianfang (柴变芳), Li Wenbin (李文斌), Lü Feng (吕峰)
School of Information Engineering, Hebei GEO University, Shijiazhuang 050031, Hebei, China
Keywords:
Gaussian mixture model; expectation maximization algorithm; anti-annealing; semi-supervised clustering
CLC number:
TP391
DOI:
10.3969/j.issn.1001-4616.2017.03.010
Document code:
A
Abstract:
The semi-supervised Gaussian mixture model (SGMM) based on labeled nodes can use a small number of labeled samples to improve the accuracy of model parameter estimation. However, the accuracy and convergence speed of its parameter estimation algorithm, SGMM Expectation Maximization (SGMM-EM), are affected by the degree of overlap among the Gaussian distributions and by the differences among their mixing coefficients. To improve the accuracy and convergence speed of SGMM parameter estimation, the anti-annealing framework is combined with the EM algorithm of the SGMM, yielding a semi-supervised Gaussian mixture model clustering algorithm based on anti-annealing (Anti-annealing SGMM-EM, ASGMM-EM). The inverse temperature parameter of the algorithm gradually increases from a small value greater than 0 to an upper bound greater than 1, and then gradually decreases back to 1. At each value of the inverse temperature parameter, the semi-supervised clustering algorithm SGMM-EM is run until convergence. Experiments on synthetic and real data show that ASGMM-EM outperforms Gaussian-mixture-model EM algorithms that use only the semi-supervised technique or only the anti-annealing technique.
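
The schedule described in the abstract can be illustrated with a minimal one-dimensional sketch in Python. It is not the authors' implementation. The abstract does not give the update equations, so the sketch assumes that labeled points are clamped to their class in the E-step, that tempering raises each weighted component density to the power of the inverse temperature (as in deterministic annealing EM), and that every class has at least one labeled sample. All function names, the schedule values (0.2 up to 1.2, then back to 1) and the demo data are hypothetical.

```python
import math
import random

def tempered_resp(x, pi, mu, var, beta):
    """Posterior over components for one point, each term raised to the power beta."""
    logs = [beta * (math.log(p) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v))
            for p, m, v in zip(pi, mu, var)]
    top = max(logs)  # subtract the max for numerical stability
    w = [math.exp(l - top) for l in logs]
    total = sum(w)
    return [wi / total for wi in w]

def sgmm_em(xs, labels, pi, mu, var, beta, tol=1e-6, max_iter=300):
    """Semi-supervised EM at a fixed beta, iterated until the means stop moving."""
    K = len(pi)
    for _ in range(max_iter):
        # E-step: labeled points are clamped (assumption), unlabeled ones are tempered.
        resp = [[1.0 if k == y else 0.0 for k in range(K)] if y is not None
                else tempered_resp(x, pi, mu, var, beta)
                for x, y in zip(xs, labels)]
        # M-step: weighted updates; small constants guard against division by zero.
        nk = [sum(r[k] for r in resp) + 1e-12 for k in range(K)]
        new_mu = [sum(r[k] * x for r, x in zip(resp, xs)) / nk[k] for k in range(K)]
        var = [sum(r[k] * (x - new_mu[k]) ** 2 for r, x in zip(resp, xs)) / nk[k] + 1e-6
               for k in range(K)]
        pi = [n / sum(nk) for n in nk]
        shift = max(abs(a - b) for a, b in zip(new_mu, mu))
        mu = new_mu
        if shift < tol:
            break
    return pi, mu, var

def beta_schedule(start=0.2, peak=1.2, n_up=6, n_down=2):
    """Inverse temperatures: rise from start (> 0) to peak (> 1), then fall back to 1."""
    up = [start + (peak - start) * i / (n_up - 1) for i in range(n_up)]
    down = [peak + (1.0 - peak) * i / n_down for i in range(1, n_down)] + [1.0]
    return up + down

def asgmm_em(xs, labels, K, betas):
    """Run semi-supervised EM to convergence at each beta, warm-starting from the last."""
    mu = []
    for k in range(K):
        pts = [x for x, y in zip(xs, labels) if y == k]
        mu.append(sum(pts) / len(pts))  # assumes every class has a labeled point
    mean = sum(xs) / len(xs)
    var = [sum((x - mean) ** 2 for x in xs) / len(xs)] * K
    pi = [1.0 / K] * K
    for beta in betas:
        pi, mu, var = sgmm_em(xs, labels, pi, mu, var, beta)
    # Hard assignments at beta = 1; labeled points keep their given label.
    pred = []
    for x, y in zip(xs, labels):
        r = tempered_resp(x, pi, mu, var, 1.0)
        pred.append(y if y is not None else r.index(max(r)))
    return pi, mu, var, pred

def demo(seed=0):
    """Two unbalanced synthetic clusters with five labeled points per class."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(150)] + [rng.gauss(6.0, 1.0) for _ in range(50)]
    truth = [0] * 150 + [1] * 50
    labels = [None] * 200
    for i in range(5):
        labels[i], labels[150 + i] = 0, 1
    pi, mu, var, pred = asgmm_em(xs, labels, 2, beta_schedule())
    accuracy = sum(p == t for p, t in zip(pred, truth)) / len(truth)
    return pi, mu, accuracy
```

Calling `demo()` returns the estimated mixing coefficients, the component means and the clustering accuracy on the synthetic data. Warm-starting each inverse temperature from the previous solution is a design choice of this sketch. The abstract states only that SGMM-EM is iterated to convergence at each value.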

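Memo:
Received: 2017-03-18.
Funding: National Natural Science Foundation of China (61503260); Hebei Province Graduate Innovation Funding Project (CXZZSS2017131).
Corresponding author: Li Wenbin, postdoctoral fellow, professor; research interests: machine learning, complex networks, etc. E-mail: 25304189@qq.com
Last Update: 2017-09-30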