
Research on Disambiguation of Multiple Syntactic Category Words Based on Ensemble of Classifiers

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Issue:
2010, No. 4
Page:
144-147
Research Field:
Computer Science
Publishing date:

Info

Title:
Research on Disambiguation of Multiple Syntactic Category Words Based on Ensemble of Classifiers
Author(s):
Zhang Yizhe, Qu Weiguang, Liu Jinke, Sun Yuxia
School of Computer Science and Technology,Nanjing Normal University,Nanjing 210097,China
Keywords:
disambiguation of multiple syntactic category words; support vector machine; conditional random fields; maximum entropy; ensemble of classifiers
CLC number:
TP391.1
DOI:
-
Abstract:
One of the difficulties of Chinese part-of-speech (POS) tagging is the disambiguation of words that belong to multiple syntactic categories. To tackle this problem, this article tries an ensemble of three classifiers: a support vector machine, a maximum entropy model, and conditional random fields. In the experiment, 410 frequently used examples from the January 1998 People's Daily corpus are used, and the average precision reaches 89.69%. This is a relatively good result.
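The abstract does not state how the outputs of the three classifiers are combined. A common fusion rule for an odd number of classifiers is simple majority voting, so the following is a minimal Python sketch under that assumption, not the authors' method. It assumes each base classifier (SVM, maximum entropy, CRF) has already produced one tag per occurrence of an ambiguous word. The function names, the "CRF wins when all three disagree" tie-break, and the macro-averaged definition of average precision are hypothetical illustrations.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def majority_vote(predictions: Sequence[str], priority: int = 2) -> str:
    """Return the tag chosen by at least two of the three base classifiers.

    `predictions` holds one tag per classifier in a fixed (assumed) order:
    SVM, maximum entropy, CRF. If all three disagree, fall back to the
    classifier at index `priority` (hypothetical tie-break; default: CRF).
    """
    tag, votes = Counter(predictions).most_common(1)[0]
    if votes > 1:
        return tag
    return predictions[priority]


def ensemble_tag(svm_tags: List[str], me_tags: List[str], crf_tags: List[str],
                 priority: int = 2) -> List[str]:
    """Vote position by position over the three classifiers' tag sequences."""
    return [majority_vote((s, m, c), priority)
            for s, m, c in zip(svm_tags, me_tags, crf_tags)]


def precision(predicted: List[str], gold: List[str]) -> float:
    """Fraction of occurrences whose predicted tag matches the gold tag."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def average_precision(per_word: Dict[str, Tuple[List[str], List[str]]]) -> float:
    """Macro-average: mean of per-word precision (an assumed definition)."""
    scores = [precision(pred, gold) for pred, gold in per_word.values()]
    return sum(scores) / len(scores)


# Usage with made-up tags in the style of the PKU tag set (v, n, vn).
combined = ensemble_tag(["v", "n", "vn", "v"],
                        ["v", "vn", "n", "n"],
                        ["n", "vn", "v", "v"])
print(combined)                                     # ['v', 'vn', 'v', 'v']
print(precision(combined, ["v", "vn", "vn", "v"]))  # 0.75
```

With three voters, any tag receiving two or more votes is a unique majority, so the tie-break only fires when all three classifiers disagree. A weighted or correlation-aware voting rule could replace `majority_vote` without changing the rest of the sketch.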

Last Update: 2013-04-08