
Research on Text Clustering of Micro-Blog Public Opinion: Word Sense Cluster and Collocation-Based Method

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Issue:
2015, No. 01
Page:
57-
Research Field:
Computer Science
Publishing date:

Info

Title:
Research on Text Clustering of Micro-Blog Public Opinion: Word Sense Cluster and Collocation-Based Method
Author(s):
Wang Hengjing (1), Cao Cungen (2), Gao Shang (1)
(1. School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003, China) (2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China)
Keywords:
micro-blog public opinion analysis; word sense cluster; collocation; similarity; text clustering
PACS:
TP391
DOI:
-
Abstract:
Micro-blogging is a recently emerged platform for exchanging information on the Internet. Its posts are topically scattered, short, and free in style, and they can have a large impact on society. Information supervision departments and commercial enterprises therefore have an urgent need for public opinion analysis based on micro-blog information. This paper presents a novel collocation-based method for text clustering. The method first preprocesses the micro-blog text, then uses a word sense clustering model to extract effective collocations automatically, and finally clusters the texts on the basis of those collocations. Experiments show that text clustering using word sense clusters outperforms the traditional text clustering method by 6.3%, and that the method proposed in this paper outperforms text clustering using word sense clusters by 16.8%. The results demonstrate the validity of the proposed method.
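
The three stages named in the abstract (preprocessing, collocation extraction, collocation-based clustering) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the word sense clustering model is not described here, so the sketch substitutes a simple pointwise mutual information (PMI) filter over word pairs that co-occur in the same post, and it clusters with a greedy single-pass scheme using Jaccard similarity. All function names, thresholds, and the sample posts are hypothetical, and the posts are assumed to be already segmented into tokens.

```python
import math
from collections import Counter
from itertools import combinations


def extract_collocations(docs, min_count=2, min_pmi=0.0):
    """Return word pairs that co-occur in at least min_count posts
    and whose document-level PMI exceeds min_pmi.
    Assumption: a PMI filter stands in for the paper's word sense model."""
    n_docs = len(docs)
    word_df = Counter()  # number of posts containing each word
    pair_df = Counter()  # number of posts containing each word pair
    for tokens in docs:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    collocations = set()
    for (a, b), count in pair_df.items():
        if count < min_count:
            continue
        pmi = math.log(count * n_docs / (word_df[a] * word_df[b]))
        if pmi > min_pmi:
            collocations.add((a, b))
    return collocations


def collocation_features(tokens, collocations):
    """Represent one post by the collocations it contains."""
    vocab = sorted(set(tokens))
    return {pair for pair in combinations(vocab, 2) if pair in collocations}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(docs, collocations, threshold=0.3):
    """Greedy single-pass clustering: a post joins the first cluster whose
    accumulated collocation set is similar enough, else it starts a new one.
    Posts without any collocation therefore end up as singletons."""
    clusters = []  # list of (feature set, member indices)
    for i, tokens in enumerate(docs):
        feats = collocation_features(tokens, collocations)
        for cluster_feats, members in clusters:
            if jaccard(feats, cluster_feats) >= threshold:
                members.append(i)
                cluster_feats |= feats  # in-place update of the cluster's set
                break
        else:
            clusters.append((set(feats), [i]))
    return [members for _, members in clusters]


# Hypothetical, pre-tokenized sample posts.
posts = [
    ["flight", "delay", "airport", "angry"],
    ["flight", "delay", "refund"],
    ["phone", "battery", "explode"],
    ["phone", "battery", "recall"],
]
pairs = extract_collocations(posts)
groups = cluster(posts, pairs)
print(pairs)
print(groups)
```

In this toy example only the pairs that recur across posts survive the filter, so the posts are grouped by shared collocations rather than by every shared word. A real system would need Chinese word segmentation in the preprocessing step and a proper word sense model in place of the PMI filter.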


Memo

Memo:
-
Last Update: 2015-03-30