Research on Text Clustering of Micro-Blog Public Opinion: Word Sense Cluster and Collocation-Based Method

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Volume:
38
Issue:
2015, No. 01
Page:
57
Section:
Computer Science
Publication date:
2015-06-30

Article Info

Title:
Research on Text Clustering of Micro-Blog Public Opinion: Word Sense Cluster and Collocation-Based Method
Author(s):
Wang Hengjing (1), Cao Cungen (2), Gao Shang (1)
(1. School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003, China)
(2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China)
CLC number:
TP391
Document code:
A
Abstract:
Micro-blogs are a new Internet information exchange platform that has emerged in recent years. They are characterized by dispersed topics, short length, and free style, and they can have a large impact on society, so information supervision departments and commercial enterprises urgently need public opinion analysis based on micro-blog information. This paper presents a novel collocation-based method for text clustering. The method first preprocesses the micro-blog text, then uses a word sense cluster model to extract effective collocations automatically, and finally clusters the text with a model based on those effective collocations. Experiments show that text clustering using word sense clusters outperforms traditional text clustering by 6.3%, and that the method of this paper outperforms text clustering using word sense clusters by 16.8%. The results demonstrate the effectiveness of the method.
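The three steps named in the abstract (preprocessing, collocation extraction through word classes, clustering on the retained collocations) can be illustrated with a toy sketch. It is not the authors' implementation. The lexicon `WORD_CLASS`, the use of adjacent class pairs as collocations, the document-frequency filter, and the single-pass Jaccard clustering are all assumptions chosen for illustration, and the input is assumed to be already tokenized.

```python
from collections import Counter

# Hypothetical word-class lexicon (an assumption, not from the paper):
# maps surface words to a shared class label.
WORD_CLASS = {
    "quake": "DISASTER", "earthquake": "DISASTER",
    "relief": "AID", "rescue": "AID",
    "phone": "PRODUCT", "handset": "PRODUCT",
    "price": "COST", "cost": "COST",
}

def to_classes(tokens):
    """Replace each token with its word class; unknown tokens are dropped."""
    return [WORD_CLASS[t] for t in tokens if t in WORD_CLASS]

def collocations(tokens):
    """Adjacent pairs of word classes, a simple stand-in for collocations."""
    classes = to_classes(tokens)
    return set(zip(classes, classes[1:]))

def effective_collocations(docs, min_docs=2):
    """Keep collocations that occur in at least min_docs documents."""
    counts = Counter(c for d in docs for c in collocations(d))
    return {c for c, n in counts.items() if n >= min_docs}

def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(docs, threshold=0.5, min_docs=2):
    """Single-pass clustering: a document joins the first cluster whose
    seed features are similar enough, otherwise it starts a new cluster."""
    keep = effective_collocations(docs, min_docs)
    features = [collocations(d) & keep for d in docs]
    clusters = []  # list of (seed_features, member_indices)
    for i, f in enumerate(features):
        for seed, members in clusters:
            if jaccard(seed, f) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((f, [i]))
    return [members for _, members in clusters]

docs = [
    ["earthquake", "relief", "arrives"],
    ["quake", "rescue", "teams"],
    ["phone", "price", "drops"],
    ["handset", "cost", "rises"],
]
print(cluster(docs))  # [[0, 1], [2, 3]]
```

Mapping words to classes before pairing them lets "earthquake relief" and "quake rescue" share one feature, which is the intuition behind combining word classes with collocations for short texts.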

Memo

Last Update: 2015-03-30