Research on Text Clustering of Micro-Blog Public Opinion: Word Sense Cluster and Collocation-Based Method
Wang Hengjing¹, Cao Cungen², Gao Shang¹
(1. School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003, China)
(2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China)
Keywords: micro-blog public opinion analysis; word sense cluster; collocation; similarity; text clustering
Micro-blogging is a recently emerged Internet platform for information exchange. Micro-blog posts are characterized by topic dispersion, short length, and stylistic freedom, and they can have a huge impact on society. Information supervision departments and commercial enterprises therefore have an urgent demand for public opinion analysis based on micro-blog information. This paper presents a novel collocation-based method for text clustering. The method first preprocesses micro-blog texts, then uses a word sense clustering model to extract effective collocations automatically, and finally clusters the texts based on these effective collocations. Experiments show that the efficiency of text clustering using word sense clusters is 6.3% higher than that of the traditional text clustering method, and that the proposed method achieves a rate 16.8% higher than text clustering using word sense clusters. These results demonstrate the validity of the proposed method.
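The abstract outlines a three-stage pipeline (preprocessing, extraction of effective collocations via word sense clusters, and collocation-based clustering) without giving the model details. The Python sketch below illustrates that general idea under stated assumptions only; it is not the authors' algorithm. The word-to-sense table `SENSE_CLUSTER`, the stopword list, the co-occurrence window, the document-frequency cutoff used to call a collocation "effective", cosine similarity, and single-pass threshold clustering are all hypothetical choices made for illustration, and whitespace splitting stands in for real (e.g. Chinese) word segmentation.

```python
import math
from collections import Counter

# HYPOTHETICAL word -> word-sense-cluster table; the paper learns its own model.
SENSE_CLUSTER = {
    "phone": "DEVICE", "mobile": "DEVICE",
    "battery": "POWER", "charger": "POWER",
    "price": "COST", "fare": "COST",
    "train": "TRANSPORT", "subway": "TRANSPORT",
    "rises": "INCREASE", "up": "INCREASE",
}
STOPWORDS = {"the", "a", "is", "of", "my", "again"}  # hypothetical stopword list


def preprocess(text):
    """Lowercase, split on whitespace and drop stopwords (stand-in for segmentation)."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def candidate_collocations(tokens, window=3):
    """Unordered pairs of sense clusters co-occurring within `window` consecutive tokens."""
    senses = [SENSE_CLUSTER.get(t, t) for t in tokens]  # unknown words keep their surface form
    pairs = set()
    for i, s in enumerate(senses):
        for t in senses[i + 1 : i + window]:
            if s != t:
                pairs.add(tuple(sorted((s, t))))
    return pairs


def effective_collocations(docs, min_docs=2):
    """Assumed criterion: a collocation is 'effective' if it occurs in >= min_docs texts."""
    df = Counter()
    for tokens in docs:
        df.update(candidate_collocations(tokens))
    return {c for c, n in df.items() if n >= min_docs}


def vectorize(tokens, vocab):
    """Represent a text by the effective collocations it contains."""
    return Counter(c for c in candidate_collocations(tokens) if c in vocab)


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def single_pass_cluster(vectors, threshold=0.5):
    """Join the most similar existing cluster if similarity >= threshold, else start a new one."""
    clusters = []  # each cluster: member indices plus a summed-count centroid
    for idx, vec in enumerate(vectors):
        best, best_sim = None, 0.0
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(idx)
            best["centroid"].update(vec)  # summed counts; cosine ignores scale
        else:
            clusters.append({"members": [idx], "centroid": Counter(vec)})
    return [c["members"] for c in clusters]


texts = [
    "my phone battery is dead again",
    "the mobile charger is broken",
    "subway fare rises again",
    "train price up",
]
docs = [preprocess(t) for t in texts]
vocab = effective_collocations(docs, min_docs=2)
vectors = [vectorize(d, vocab) for d in docs]
clusters = single_pass_cluster(vectors, threshold=0.5)
print(clusters)  # → [[0, 1], [2, 3]]
```

In this toy example, "phone battery" and "mobile charger" share no words, yet both map to the collocation (DEVICE, POWER) once words are replaced by their sense clusters, so the two short posts land in the same cluster.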


Last Update: 2015-03-30