
Chinese Named Entity Recognition Method Based on Ensemble Learning

Journal of Nanjing Normal University (Natural Science Edition) [ISSN:1001-4616/CN:32-1239/N]

Issue:
No. 03, 2022
Page:
123-131
Research Field:
Computer Science and Technology
Publishing date:

Info

Title:
Chinese Named Entity Recognition Method Based on Ensemble Learning
Author(s):
Liang Bingtao1, Ni Yunfeng2
(1. Hangzhou Youxing Technology Co., Ltd., Zhejiang 310000, China)(2. College of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an 710600, China)
Keywords:
named entity recognition; BERT model; ensemble learning; attention mechanism; IDCNN
CLC number:
TP391
DOI:
10.3969/j.issn.1001-4616.2022.03.016
Abstract:
Aiming at the problems of the classical BiLSTM-CRF (bi-directional long short-term memory-conditional random field) model for Chinese named entity recognition, namely that the embedding vectors cannot represent polysemy, that the attention of the encoding layer is dispersed, and that local spatial features are not captured, this paper proposes an ensemble model that combines the advantages of the BERT-BiGRU-MHA-CRF and BERT-IDCNN-CRF models to perform named entity recognition. The method uses the BERT model to obtain semantic vectors containing contextual information, and then feeds them into the BiGRU-MHA (bi-directional gated recurrent unit-multi-head attention) and IDCNN (iterated dilated convolutional neural network) networks. The former captures the temporal characteristics of the input sequence and assigns weights to characters according to their importance, while the latter mainly captures the spatial characteristics of the input; the features captured by the two branches are fused with a mean ensemble. Finally, the globally optimal label sequence is obtained through the CRF layer. The F1 scores of the ensemble model on the People's Daily and Microsoft Research Asia (MSRA) datasets reach 96.09% and 95.01%, respectively, improvements of more than 0.74% and 0.55% over the single models, which verifies the effectiveness of the proposed method.
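
To make the fused architecture described in the abstract concrete, the following is a minimal sketch in PyTorch, assuming the HuggingFace transformers library for BERT ("bert-base-chinese") and the pytorch-crf package for the CRF layer; the class name EnsembleNER and all hyper-parameters (hidden size, number of attention heads, convolution filters, dilation rates) are illustrative assumptions, not the authors' implementation.

```python
# Sketch of: BERT -> (BiGRU + multi-head attention) and (IDCNN) branches
# -> mean fusion of emissions -> CRF decoding.
import torch
import torch.nn as nn
from transformers import BertModel   # HuggingFace transformers
from torchcrf import CRF             # pytorch-crf package


class EnsembleNER(nn.Module):
    def __init__(self, num_tags, bert_name="bert-base-chinese",
                 hidden=256, heads=8, filters=512, dilations=(1, 1, 2)):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        d = self.bert.config.hidden_size                  # 768 for bert-base

        # Branch 1: BiGRU captures sequential features; multi-head attention
        # re-weights characters by importance.
        self.bigru = nn.GRU(d, hidden, batch_first=True, bidirectional=True)
        self.mha = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)

        # Branch 2: iterated dilated convolutions capture local spatial
        # features with a growing receptive field (illustrative dilations).
        convs, in_ch = [], d
        for dil in dilations:
            convs += [nn.Conv1d(in_ch, filters, kernel_size=3,
                                padding=dil, dilation=dil), nn.ReLU()]
            in_ch = filters
        self.idcnn = nn.Sequential(*convs)

        # Project both branches to the tag space; fuse by averaging; CRF decodes.
        self.fc_rnn = nn.Linear(2 * hidden, num_tags)
        self.fc_cnn = nn.Linear(filters, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def _emissions(self, input_ids, attention_mask):
        x = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        g, _ = self.bigru(x)
        g, _ = self.mha(g, g, g, key_padding_mask=~attention_mask.bool())
        c = self.idcnn(x.transpose(1, 2)).transpose(1, 2)  # Conv1d wants (B, C, T)
        return (self.fc_rnn(g) + self.fc_cnn(c)) / 2       # mean ensemble of emissions

    def forward(self, input_ids, attention_mask, tags):
        e = self._emissions(input_ids, attention_mask)
        return -self.crf(e, tags, mask=attention_mask.bool())   # negative log-likelihood

    def decode(self, input_ids, attention_mask):
        e = self._emissions(input_ids, attention_mask)
        return self.crf.decode(e, mask=attention_mask.bool())   # best tag sequence per sentence
```

In this sketch the two branches are projected into the tag space and averaged (the mean ensemble mentioned in the abstract) before CRF decoding; training minimizes the negative log-likelihood returned by forward, and decode yields the globally optimal label sequence for each sentence.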

References:

[1]HAMMERTON J. Named entity recognition with long short-term memory[C]//Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003. 2003:172-175.
[2]LAMPLE G,BALLESTEROS M,SUBRAMANIAN S,et al. Neural architectures for named entity recognition[J]. arXiv preprint arXiv:1603.01360,2016.
[3]ZHANG Y,YANG J. Chinese NER using lattice LSTM[J]. arXiv preprint arXiv:1805.02023,2018.
[4]GUI T,MA R,ZHANG Q,et al. CNN-Based Chinese NER with Lexicon Rethinking[C]//Twenty-Eighth International Joint Conference on Artificial Intelligence. Macao:Springer,2019:4982-4988.
[5]WANG C Q,CHEN W,XU B. Named entity recognition with gated convolutional neural networks[M]//Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. Springer,Cham,2017:110-121.
[6]YU F,KOLTUN V. Multi-scale context aggregation by dilated convolutions[J]. arXiv preprint arXiv:1511.07122,2015.
[7]YU B H,WEI J X. IDCNN-CRF-based domain named entity recognition method[C]//2020 IEEE 2nd International Conference on Civil Aviation Safety and Information Technology. 2020:542-546.
[8]YANG X H,BI X H,ZHANG L L,et al. Research on named entity recognition in Chinese electronic medical records based on multi-task learning[J]. Journal of Northeast Normal University(Natural Science Edition),2020,52(1):81-87. DOI:10.16163/j.cnki.22-1123/n.2020.01.016.
[9]SUN Y,LIANG B T. Chinese named entity recognition method based on BERT and multi-head attention[J/OL]. Journal of Chongqing University of Posts and Telecommunications(Natural Science Edition):1-10[2022-02-13]. http://kns.cnki.net/kcms/detail/50.1181.N.20211209.2010.004.html.
[10]ZHANG K W,LI X,YAN Y Y,et al. Domain expert entity extraction method based on multi-feature bi-directional gated neural network[J]. Journal of Nanjing Normal University(Natural Science Edition),2021,44(1):128-135.
[11]KONG X P,WUSHOUER Silamu,YANG Q M,et al. Uyghur named entity recognition based on transfer learning[J]. Journal of Northeast Normal University(Natural Science Edition),2020,52(2):58-65. DOI:10.16163/j.cnki.22-1123/n.2020.02.010.
[12]LI N,GUAN H M,YANG P,et al. Chinese named entity recognition method based on BERT-IDCNN-CRF[J]. Journal of Shandong University(Natural Science),2020,55(1):102-109.
[13]ZHOU Z H. Machine learning[M]. Beijing:Tsinghua University Press,2016.
[14]SHI C D,QIN L. Chinese named entity recognition method based on BGRU-CRF[J]. Computer Science,2019,46(9):237-242.
[15]YANG P,DONG W Y. Chinese named entity recognition method based on BERT embedding[J]. Computer Engineering,2020,46(4):40-45,52. DOI:10.19678/j.issn.1000-3428.0054272.

Last Update: 2022-09-15