[1] Gong Lejun, Zhang Lipeng, Li Yuxi, et al. Mining and Decision-Making of Breast Cancer Medical Record Text Based on Decision Tree[J]. Journal of Nanjing Normal University (Natural Science Edition), 2019, 42(03): 42-51. [doi:10.3969/j.issn.1001-4616.2019.03.006]

Mining and Decision-Making of Breast Cancer Medical Record Text Based on Decision Tree

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Volume:
Vol. 42
Issue:
2019, No. 03
Pages:
42-51
Column:
· Special Column: Papers from the National Conference on Machine Learning ·
Publication date:
2019-09-30

Article Info

Title:
Mining and Decision-Making of Breast Cancer Medical Record Text Based on Decision Tree
Article ID:
1001-4616(2019)03-0042-10
Author(s):
Gong Lejun¹, Zhang Lipeng¹, Li Yuxi¹, Wu Xianghui¹, Gao Zhihong², Pan Chuandi², Yang Geng¹
(1. Jiangsu Key Lab of Big Data Security & Intelligent Processing, School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China)
(2. Zhejiang Engineering Research Center of Intelligent Medicine, Wenzhou 325035, China)
Keywords:
breast cancer; natural language processing; decision tree; text mining; Neo4j
CLC number:
TP391
DOI:
10.3969/j.issn.1001-4616.2019.03.006
Document code:
A
Abstract:
Breast cancer is one of the most common malignant tumors in women and seriously threatens women's health worldwide. Clinical medical records carry the diagnostic information recorded by experienced doctors, and mining them can reveal breast cancer-related conditions and thereby support decision-making. This paper presents a method that approaches the problem from the perspective of text processing: it applies a data mining algorithm, the decision tree, to medical record text in order to mine breast cancer-related information. The method makes TNM and clinical cancer staging decisions for breast cancer, and the decision results are validated. In addition, the Neo4j graph database is used to build a breast cancer TNM-clinical staging knowledge graph. A worked example shows that the method can obtain the TNM and clinical cancer staging status of breast cancer, which indicates that the presented method could be used to assist doctors in making decisions.
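
As a rough illustration of the staging decision described in the abstract, the following Python sketch pulls a compact TNM string out of a record sentence with a regular expression and maps it to a stage group through hand-written rules. It is not the authors' method: the paper applies a decision tree algorithm to medical record text, whereas the function names `parse_tnm` and `clinical_stage`, the sample sentence, and the simplified rule table are assumptions made here. The rules cover only the major T, N, and M categories of the widely used AJCC anatomic stage groups and ignore subcategories such as N1mi.

```python
import re
from typing import Optional, Tuple

# Assumed input form: a compact TNM string such as "T2N1M0" or "TisN0M0".
TNM_PATTERN = re.compile(r"T(is|[0-4])\s*N([0-3])\s*M([01])")


def parse_tnm(text: str) -> Optional[Tuple[str, str, str]]:
    """Return the first (T, N, M) triple found in text, or None if absent."""
    match = TNM_PATTERN.search(text)
    if match is None:
        return None
    return "T" + match.group(1), "N" + match.group(2), "M" + match.group(3)


def clinical_stage(t: str, n: str, m: str) -> str:
    """Map major T/N/M categories to a simplified anatomic stage group."""
    if m == "M1":
        return "IV"      # distant metastasis overrides T and N
    if m != "M0":
        return "unknown"
    if n == "N3":
        return "IIIC"    # any T with N3
    if t == "T4":
        return "IIIB"    # T4 with N0, N1, or N2
    if t == "Tis":
        return "0" if n == "N0" else "unknown"
    if n == "N2":
        return "IIIA"    # T0 to T3 with N2
    if t == "T3":
        return {"N0": "IIB", "N1": "IIIA"}.get(n, "unknown")
    if t == "T2":
        return {"N0": "IIA", "N1": "IIB"}.get(n, "unknown")
    if t == "T1":
        return {"N0": "IA", "N1": "IIA"}.get(n, "unknown")
    if t == "T0":
        return {"N1": "IIA"}.get(n, "unknown")
    return "unknown"


# Hypothetical record sentence, used only to exercise the two functions.
record = "Postoperative pathology: T2N1M0, invasive ductal carcinoma."
tnm = parse_tnm(record)
if tnm is not None:
    print(tnm, "->", clinical_stage(*tnm))
```

Each `if` in `clinical_stage` plays the role of one internal node of a decision tree, tested in a fixed order from the most decisive attribute (M) downward. A learned tree would instead choose its split attributes from labeled records.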



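Memo

Memo:
Received: 2019-07-16. Funding: National Natural Science Foundation of China (61502243, 61502247, 61572263); Zhejiang Engineering Research Center of Intelligent Medicine (2016E10011); China Postdoctoral Science Foundation (2018M632349); Natural Science Foundation of the Jiangsu Higher Education Institutions (16KJB520003). Corresponding author: Gong Lejun, Ph.D., associate professor, master's supervisor; research interests: data and text mining, biomedical information processing. E-mail: glj98226@163.com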
Last Update: 2019-09-30