Research of Text Subject Extraction Based on Improved Weight for Bayesian Reasoning and TFIDF Algorithm

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Issue:
2014, Issue 01
Page:
57-
Research Field:
Computer Science
Publishing date:

Info

Title:
Research of Text Subject Extraction Based on Improved Weight for Bayesian Reasoning and TFIDF Algorithm
Author(s):
Shao Xiaogen(1), Ju Xunguang(1), Hu Juxin(1), Ma Zhongwei(2)
(1. Department of Information and Electrical Engineering, Xuzhou Institute of Technology, Xuzhou 221111, China) (2. College of Information Engineering, Xiangtan University, Xiangtan 411105, China)
Keywords:
Bayesian reasoning; position weight; topic words extraction; TFIDF algorithm
PACS:
TP391; TP301
DOI:
-
Abstract:
This paper addresses shortcomings of the TFIDF algorithm for topic word extraction from Chinese text. Taking into account both the frequency with which keywords appear and their position weight in the text, a hybrid algorithm combining Bayesian reasoning and TFIDF is designed to extract topic words. Topic words are extracted from the forward, reverse, and middle positions of the ranked list of candidate words. The results show an average accuracy 6.2% higher than that of plain TFIDF.
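
The idea of combining term frequency with position weight can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the weighting formula or the Bayesian reasoning step, so the position labels and weight values (`POSITION_WEIGHTS`), the smoothed IDF formula, and the function names below are all assumptions chosen for illustration, and the Bayesian component is omitted.

```python
import math
from collections import Counter

# Hypothetical position weights; the abstract does not state the paper's values.
POSITION_WEIGHTS = {"title": 3.0, "lead": 2.0, "body": 1.0}


def weighted_tf(doc):
    """Sum position weights over every occurrence of each term, then normalize.

    `doc` maps a position label to the list of tokens found at that position.
    """
    scores = Counter()
    for position, tokens in doc.items():
        weight = POSITION_WEIGHTS[position]
        for token in tokens:
            scores[token] += weight
    total = sum(scores.values())
    return {term: score / total for term, score in scores.items()}


def idf(term, corpus):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    df = sum(1 for doc in corpus if any(term in tokens for tokens in doc.values()))
    return math.log((1 + len(corpus)) / (1 + df)) + 1.0


def rank_candidates(doc, corpus):
    """Return candidate terms sorted by position-weighted TF times IDF."""
    tf = weighted_tf(doc)
    scored = {term: value * idf(term, corpus) for term, value in tf.items()}
    # Break ties alphabetically so the ordering is deterministic.
    return sorted(scored, key=lambda term: (-scored[term], term))


# Toy, pre-tokenized corpus (made up for this example).
corpus = [
    {"title": ["bayesian", "keyword"], "lead": ["keyword", "extraction"],
     "body": ["text", "weight", "text"]},
    {"title": ["image", "retrieval"], "lead": ["text", "image"],
     "body": ["text", "feature"]},
    {"title": ["speech", "model"], "lead": ["text", "speech"],
     "body": ["model", "text"]},
]
ranking = rank_candidates(corpus[0], corpus)
```

In this toy corpus, a term that appears in the title and lead of the first document ranks above a term that appears equally often but only in the body, which is the effect a position weight is meant to have. Real Chinese text would first need word segmentation, which the sketch assumes has already been done.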

Memo

Memo:
-
Last Update: 2014-03-30