
Research on Drug Addicts Screening Model Based on Random Forest Algorithm

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Issue:
2019, No. 02
Page:
44-49
Research Field:
Mathematics and Computer Science
Publishing date:

Info

Title:
Research on Drug Addicts Screening Model Based on Random Forest Algorithm
Author(s):
Gu Haiyan1, Wang Quan2
(1. Department of Computer Information and Cyber Security, Jiangsu Police Institute, Nanjing 210031, China) (2. Shenzhen Qianhai General Health Technology Co., Ltd, Shenzhen 518000, China)
Keywords:
data preprocessing; Random Forest; classification model; recall rate; F1 score
PACS:
TP183
DOI:
10.3969/j.issn.1001-4616.2019.02.007
Abstract:
Using pulse wave data to build a data-mining screening model for detecting drug addicts is a new line of research. After the pulse wave data are preprocessed, an initial Random Forest classification model is built; it achieves high accuracy but a relatively low recall rate and F1 score. To address this, an improved classification model is proposed that relies on three strategies: first, cross-validation over multiple training/test splits to estimate the generalization error; second, down-sampling to balance the sample distribution; and third, selection of model parameters by multi-criteria analysis. Evaluated on accuracy, precision, recall, and F1 score, the improved Random Forest classification model performs significantly better than the original model.
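
The abstract's central observation, that a classifier on imbalanced data can score high accuracy while recall and F1 stay low, and that down-sampling is one way to rebalance the classes, can be illustrated with a minimal Python sketch. This is not the authors' code: the labels are made-up toy data, and the helper names `classification_metrics` and `downsample` are hypothetical. The cross-validation and parameter-selection steps of the paper are not shown.

```python
import random
from collections import Counter


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def downsample(samples, labels, seed=0):
    """Randomly drop samples so every class is as small as the rarest one."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        balanced.extend((x, y) for x in rng.sample(xs, n_min))
    rng.shuffle(balanced)
    return [x for x, _ in balanced], [y for _, y in balanced]


# Toy, made-up labels: 95 negatives and 5 positives (heavily imbalanced).
y_true = [0] * 95 + [1] * 5
# A hypothetical classifier that detects only one of the five positives.
y_pred = [0] * 95 + [1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)  # accuracy is high, recall and F1 are low

# Down-sample the majority class so both classes have equal size.
samples = list(range(100))  # stand-in for pulse wave feature vectors
balanced_x, balanced_y = downsample(samples, y_true)
class_counts = Counter(balanced_y)
print(class_counts)
```

In this toy case the accuracy is 0.96 while the recall is only 0.2, which is the kind of gap the abstract describes. When down-sampling is combined with cross-validation, it is usually applied to the training split only, so that the test split keeps the original class distribution; whether the paper does this is not stated in the abstract.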


Memo

Last Update: 2019-06-30