[1] Yang Tianxia, Wang Zhihe, Wang Hua, et al. Research of Clustering Initial Center Selection[J]. Journal of Nanjing Normal University (Natural Science Edition), 2010, 33(04): 161-165.

Research of Clustering Initial Center Selection

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Volume:
33
Issue:
2010, No. 04
Pages:
161-165
Column:
Computer Science
Publication date:
2010-12-20

Article Info

Title:
Research of Clustering Initial Center Selection
Author(s):
Yang Tianxia; Wang Zhihe; Wang Hua; Wang Lingyun
College of Mathematics and Information Science, Northwest Normal University, Lanzhou 730070, Gansu, China
Keywords:
K-means; sequential patterns; Huffman tree; clustering; initial center
CLC number:
TP311.13
Abstract:
This paper studies the problem of re-clustering and rediscovery in a sequence database on the basis of frequent sequential patterns that have already been mined. The existing K-means clustering algorithm selects its initial centers at random, which can make the clustering results unstable. To address this shortcoming, an initial center selection algorithm based on the Huffman idea is proposed, named K-SPAM (K-means algorithm of sequence pattern mining based on the Huffman Method). The algorithm can reduce, to a certain extent, the probability of falling into a local optimum. Moreover, the similarity between pairs of sequences is computed with highly efficient "AND" and "OR" operations, which can greatly improve the execution efficiency of the algorithm.
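The abstract gives neither the exact similarity formula nor the details of the Huffman-based selection, so the following is only a minimal sketch under stated assumptions, not the K-SPAM algorithm itself. It assumes that each sequence is encoded as an integer bitmap (one bit per frequent pattern it contains), that similarity is the Jaccard-style ratio of the "AND" bit count to the "OR" bit count, and that the Huffman idea is approximated by repeatedly merging the two most similar groups until k remain. The names `bitmap_similarity`, `medoid`, and `huffman_style_seeds` are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def bitmap_similarity(a: int, b: int) -> float:
    """Assumed Jaccard-style ratio: popcount(a AND b) / popcount(a OR b)."""
    union = a | b
    if union == 0:
        return 1.0  # assumption: two empty bitmaps count as identical
    return bin(a & b).count("1") / bin(union).count("1")


def medoid(group: list[int], bitmaps: list[int]) -> int:
    """Index of the member with the highest total similarity to its group."""
    return max(
        group,
        key=lambda m: sum(bitmap_similarity(bitmaps[m], bitmaps[o]) for o in group),
    )


def huffman_style_seeds(bitmaps: list[int], k: int) -> list[int]:
    """Return indices of k initial centers (hypothetical Huffman-like merge).

    Like Huffman tree construction, each step greedily combines two nodes;
    here the pair whose representatives are most similar is merged, and the
    merged group is represented by its medoid. Ties go to the first pair.
    """
    if not 1 <= k <= len(bitmaps):
        raise ValueError("k must be between 1 and the number of sequences")
    groups = [[i] for i in range(len(bitmaps))]
    reps = list(range(len(bitmaps)))  # representative index of each group
    while len(groups) > k:
        i, j = max(
            combinations(range(len(groups)), 2),
            key=lambda p: bitmap_similarity(bitmaps[reps[p[0]]], bitmaps[reps[p[1]]]),
        )
        groups[i] = groups[i] + groups[j]
        reps[i] = medoid(groups[i], bitmaps)
        del groups[j], reps[j]  # i < j, so index i is unaffected
    return reps


# Four toy bitmaps: two share high bits, two share low bits.
toy = [0b1110, 0b1111, 0b0001, 0b0011]
print(huffman_style_seeds(toy, 2))  # [0, 2]
```

Because the merge order depends only on the data, the seeds are deterministic, which is the property the abstract contrasts with random initialization. The pairwise search is quadratic per merge and is kept simple here for readability.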

Memo

Funding: Northwest Normal University Key Discipline Fund, 2006-2010 (2007C04). Corresponding author: Yang Tianxia, master's student; research interest: data mining. E-mail: ytxluck@163.com
Last Update: 2013-04-08