
Research on Method for Independent Languages Acquisition and Clustering Corpus

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Issue:
2014, Issue 04
Page:
150-
Research Field:
Computer Science
Publishing date:

Info

Title:
Research on Method for Independent Languages Acquisition and Clustering Corpus
Author(s):
Zhou Feng1, Zhu Junwu1, Tong Lin1,2, Chen Weicong3, Chen Bo1
(1. College of Information Engineering, Yangzhou University, Yangzhou 225127, China) (2. Institute of Computing Technology, Chinese Academy of Sciences, Laboratory of Intelligent Information Processing, Beijing 100190, China) (3. Department of Computer Science and
Keywords:
independent languages; acquisition; recognition; algorithm
PACS:
TP391
DOI:
-
Abstract:
Eliminating irrelevant content and clustering the corpus are of great significance for improving the quality of natural language understanding, and they are key preprocessing techniques for it. Independent languages have obvious features in a corpus. This paper derives strong independent languages from seed independent languages, then uses the strong independent languages to recognize and derive new independent languages. Next, based on a sentence similarity constructed from 2-grams, hierarchical clustering is applied to group similar questions in a QA corpus. Finally, an experiment on identifying new independent languages and an experiment on corpus clustering verify the validity of the proposed method.
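
The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract states only that sentence similarity is built from 2-grams and that hierarchical clustering is used; everything else here is an assumption made for illustration: character-level 2-grams, Jaccard overlap as the similarity, average linkage, and a hypothetical `threshold` parameter that stops merging. It is not the authors' implementation.

```python
from itertools import combinations


def bigrams(sentence):
    """Return the set of character 2-grams of a sentence (whitespace removed)."""
    chars = "".join(sentence.split())
    return {chars[i:i + 2] for i in range(len(chars) - 1)}


def similarity(a, b):
    """Jaccard overlap of the two 2-gram sets (an assumed similarity measure)."""
    ga, gb = bigrams(a), bigrams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def cluster(sentences, threshold=0.5):
    """Agglomerative clustering with average linkage (assumed).

    Starts with one cluster per sentence and repeatedly merges the most
    similar pair until the best pair falls below `threshold`.
    """
    clusters = [[i] for i in range(len(sentences))]

    def linkage(c1, c2):
        # Mean pairwise similarity between the members of two clusters.
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(similarity(sentences[i], sentences[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        (x, y), best = max(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        # x < y is guaranteed by combinations, so deleting y keeps x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [[sentences[i] for i in c] for c in clusters]


if __name__ == "__main__":
    questions = [
        "how to reset password",
        "how to reset my password",
        "what is the weather today",
    ]
    for group in cluster(questions, threshold=0.5):
        print(group)
```

Character 2-grams are used so the same sketch applies to unsegmented Chinese text; a word-level variant would need a tokenizer first. The threshold value is arbitrary and would have to be tuned on real QA data.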


Memo

Memo:
-
Last Update: 2014-12-31