Jia Tao,Han Meng,Wang Shaofeng,et al.Survey of Decision Tree Classification Methods over Data Streams[J].Journal of Nanjing Normal University(Natural Science Edition),2019,42(04):49-60.[doi:10.3969/j.issn.1001-4616.2019.04.008]

Survey of Decision Tree Classification Methods over Data Streams

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Volume: 42
Issue: 2019, No. 04
Pages: 49-60
Section: Mathematics and Computer Science
Publication date: 2019-12-30

Article Info

Title: Survey of Decision Tree Classification Methods over Data Streams
Article ID: 1001-4616(2019)04-0049-12
Author(s): Jia Tao, Han Meng, Wang Shaofeng, Du Shiyu, Shen Mingyao
School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, Ningxia, China
Keywords: data stream mining; classification; decision tree; concept drift; ensemble classification
CLC number: TP3
DOI: 10.3969/j.issn.1001-4616.2019.04.008
Document code: A
Abstract:
Data streams are characterized by massive volume, high arrival speed, and the need for real-time processing. Because the data distribution of some streams changes over time, such streams are said to exhibit concept drift. First, data stream decision trees are categorized by classification model into single-classifier decision trees and ensemble decision trees. Single-classifier models are further divided into the very fast decision tree (VFDT), mutation decision trees, and other decision tree algorithms; ensemble models are divided into ensembles derived from the very fast decision tree and random decision tree variants. Second, concept drift handling techniques are introduced, including a description of the concept drift problem, common drift handling techniques, and decision tree algorithms designed to cope with concept drift. Next, incremental-model decision tree algorithms are introduced. Finally, the decision tree algorithms covered in this paper are analyzed and summarized.
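The very fast decision tree (VFDT) family named in the abstract rests on the Hoeffding-bound split test of Domingos and Hulten [32]: a leaf splits once the observed gain gap between the two best attributes exceeds ε = sqrt(R² ln(1/δ) / (2n)). The sketch below is an illustration under stated assumptions, not code from this survey: it assumes information gain as the split criterion (so R = log2 of the number of classes), and the function names `hoeffding_bound` and `should_split` and the defaults `delta=1e-7` and `tau=0.05` are hypothetical choices.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with range `value_range` lies within epsilon of the mean observed
    over n independent examples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int,
                 num_classes: int, delta: float = 1e-7, tau: float = 0.05) -> bool:
    """VFDT-style split decision for a leaf that has accumulated n examples.

    best_gain and second_gain are the information gains of the two best
    attributes at this leaf; delta and tau are illustrative defaults.
    """
    # For information gain, the range R is log2(number of classes).
    eps = hoeffding_bound(math.log2(num_classes), delta, n)
    # Split if the best attribute wins by more than eps, or break a tie
    # once eps has shrunk below the tie threshold tau.
    return (best_gain - second_gain) > eps or eps < tau


if __name__ == "__main__":
    print(round(hoeffding_bound(1.0, 1e-7, 1000), 4))        # prints 0.0898
    print(should_split(0.30, 0.18, n=1000, num_classes=2))   # prints True
    print(should_split(0.30, 0.25, n=1000, num_classes=2))   # prints False
```

With two classes (R = 1), δ = 1e-7 and n = 1000, ε is about 0.0898, so a gain gap of 0.12 justifies a split while a gap of 0.05 does not yet. The tie threshold τ lets the tree split eventually when two attributes stay almost equally good, instead of waiting indefinitely.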
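Among the common concept drift handling techniques, the Drift Detection Method (DDM) of Gama et al. [51] monitors a classifier's online error rate p and its standard deviation s = sqrt(p(1 - p)/n), remembers the point where p + s was lowest (p_min, s_min), and signals a warning when p + s exceeds p_min + 2s_min and a drift when it exceeds p_min + 3s_min. The following is a minimal sketch assuming a stream of correct/incorrect prediction outcomes; the class name `DriftDetectionMethod`, the `min_samples=30` default, and the use of strict comparisons (so an error-free start with p = s = 0 is not flagged) are assumptions for illustration.

```python
import math


class DriftDetectionMethod:
    """Minimal DDM sketch: tracks the online error rate p and its standard
    deviation s = sqrt(p * (1 - p) / n), remembering where p + s was lowest."""

    def __init__(self, min_samples: int = 30) -> None:
        # Hypothetical default: no decisions before min_samples predictions.
        self.min_samples = min_samples
        self.reset()

    def reset(self) -> None:
        """Forget all statistics, e.g. after a drift has been signalled."""
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, misclassified: bool) -> str:
        """Feed one prediction outcome; return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += int(misclassified)
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        # Record the lowest p + s seen so far.
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + 3.0 * self.s_min:
            self.reset()  # drift confirmed: restart the statistics
            return "drift"
        if p + s > self.p_min + 2.0 * self.s_min:
            return "warning"
        return "stable"


if __name__ == "__main__":
    ddm = DriftDetectionMethod()
    before = [ddm.update(i % 10 == 9) for i in range(1000)]  # 10% error rate
    after = [ddm.update(i % 2 == 0) for i in range(500)]     # 50% error rate
    print("drift" in before, "drift" in after)               # prints False True
```

In the original method, the examples that arrive after the warning level is reached are used to train a replacement model once drift is confirmed; this sketch only reports the detector state.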

References:

[1]GHAZIKHANI A,MONSEFI R,YAZDI H S. Ensemble of online neural networks for non-stationary and imbalanced data streams[J]. Neurocomputing,2013,122:535-544.
[2]CAO K Y,WANG G R,HAN D H,et al. An algorithm for classification over uncertain data based on extreme learning machine[J]. Neurocomputing,2016,174(Part A):194-202.
[3]CERVANTES J,LAMONT F G,CHAU A L,et al. Data selection based on decision tree for SVM classification on large datasets[J]. Applied soft computing,2015,37:787-798.
[4]KRANJC J,SMAILOVIC J,PODPECAN V,et al. Active learning for sentiment analysis on data streams:methodology and workflow implementation in the ClowdFlows platform[J]. Information processing & management,2015,51(2):187-203.
[5]RUTKOWSKI L,JAWORSKI M,DUDA P. Decision trees in data stream mining[M]//Stream Data Mining:Algorithms and Their Probabilistic Properties. Studies in Big Data. Cham:Springer Nature Switzerland AG,2020:37-50.
[6]COSTA V G T D,CARVALHO A C,JUNIOR S B. Strict very fast decision tree:a memory conservative algorithm for data stream mining[J]. Pattern Recognition Letters,2018,116:22-28.
[7]DING Jian,HAN Meng,LI Juan. A survey of concept-drifting data stream mining algorithms[J]. Computer science,2016,43(12):24-29.(in Chinese)
[8]GABER M M,ZASLAVSKY A,KRISHNASWAMY S. A survey of classification methods in data streams[M]//AGGARWAL C C. Data Streams:Models and Algorithms. Boston,MA:Springer,2007:39-59.
[9]BRZEZINSKI D,STEFANOWSKI J. Combining block-based and online methods in learning ensembles from concept drifting data streams[J]. Information sciences,2014,265(5):50-67.
[10]ABBASZADEH O,AMIRI A,KHANTEYMOORI A R. An ensemble method for data stream classification in the presence of concept drift[J]. Frontiers of information technology & electronic engineering,2015,16(12):1059-1068.
[11]RUTKOWSKI L,JAWORSKI M,PIETRUCZUK L,et al. A new method for data stream mining based on the misclassification error[J]. IEEE transactions on neural networks & learning systems,2015,26(5):1048-1059.
[12]RUTKOWSKI L,JAWORSKI M,PIETRUCZUK L,et al. Decision trees for mining data streams based on the Gaussian approximation[J]. IEEE transactions on knowledge & data engineering,2013,26(1):108-119.
[13]RUTKOWSKI L,JAWORSKI M,PIETRUCZUK L,et al. The CART decision tree for mining data streams[J]. Information sciences,2014,266(5):1-15.
[14]RUTKOWSKI L,PIETRUCZUK L,DUDA P,et al. Decision trees for mining data streams based on the McDiarmid's bound[J]. IEEE transactions on knowledge & data engineering,2013,25(6):1272-1279.
[15]JANKOWSKI D,JACKOWSKI K. Evolutionary algorithm for decision tree induction[C]//IFIP International Conference on Computer Information Systems and Industrial Management. Berlin,Heidelberg:Springer,2014:23-32.
[16]JANKOWSKI D,JACKOWSKI K. An increment decision tree algorithm for streamed data[C]//2015 IEEE Trustcom/BigDataSE/ISPA. Helsinki,Finland:IEEE,2015:199-204.
[17]JANKOWSKI D,JACKOWSKI K,CYGANEK B. Learning decision trees from data streams with concept drift[J]. Procedia computer science,2016,80:1682-1691.
[18]MIRZAMOMEN Z,KANGAVARI M R. Evolving fuzzy min-max neural network based decision trees for data stream classification[J]. Neural processing letters,2017,45(1):341-363.
[19]DUDA P,JAWORSKI M,PIETRUCZUK L,et al. A novel application of Hoeffding’s inequality to decision trees construction for data streams[C]//International Joint Conference on Neural Networks. Beijing:IEEE,2014:3324-3330.
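[20]CHEN Yu,LI Lingjuan. A privacy-preserving data stream classification algorithm based on decision tree[J]. Computer technology and development,2017,27(7):111-114.(in Chinese)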
[21]MANAPRAGADA C,WEBB G,SALEHI M. Extremely fast decision tree[C]//International Conference on Knowledge Discovery and Data Mining. London,United Kingdom,2018:1-10.
[22]CZARNOWSKI I,JEDRZEJOWICZ P. Ensemble classifier for mining data streams[J]. Procedia computer science,2014,35(9):397-406.
[23]HAN D,LI S,WEI F,et al. Two birds with one stone:classifying positive and unlabeled examples on uncertain data streams[J]. Neurocomputing,2018,277(1):149-160.
[24]KRAWCZYK B,SKRYJOMSKI P. Cost-sensitive perceptron decision trees for imbalanced drifting data streams[M]//Machine Learning and Knowledge Discovery in Databases. Cham:Springer,2017:512-527.
[25]YANG H,FONG S. Incrementally optimized decision tree for mining imperfect data streams[M]//Networked Digital Technologies. Berlin,Heidelberg:Springer,2012:281-296.
[26]ZLIOBAITE I. Learning under concept drift:an overview[R]. arXiv:1010.4784,2010:1-36.
[27]LI P P,WU X D,HU X G. Learning concept-drifting data streams with random ensemble decision trees[J]. Neurocomputing,2015,166(C):68-83.
[28]LIANG C Q,ZHANG Y,SONG Q. Decision tree for dynamic and uncertain data streams[C]//Proceedings of the Second Asian Conference on Machine Learning. ACML,Tokyo,Japan:Microtome Publishing,2010:209-224.
[29]LI P P,WU X D,HU X G,et al. A random decision tree ensemble for mining concept drifts from noisy data streams[J]. Applied artificial intelligence,2010,24(7):680-710.
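[30]ZHANG Jian,CAO Ping,SHOU Guochu. Adaptive hierarchical sliding-window decision tree algorithm for network traffic identification[J]. Application research of computers,2013,30(8):2470-2472.(in Chinese)
[31]LIU Zhijun,ZHANG Jie,XU Guangyi. Concept drift classification algorithm for uncertain data streams based on adaptive very fast decision tree[J]. Control and decision,2016,31(9):1609-1614.(in Chinese)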
[32]DOMINGOS P,HULTEN G. Mining high-speed data streams[C]//Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Boston:ACM Press,2000:71-80.
[33]HULTEN G,DOMINGOS P. Mining decision trees from streams[M]//Data Stream Management. Berlin Heidelberg:Springer,2016.
[34]GAMA J,ROCHA R,MEDAS P. Accurate decision trees for mining high-speed data streams[C]//Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington,DC,USA:ACM,2003:523-528.
[35]HULTEN G,SPENCER L,DOMINGOS P. Mining time-changing data streams[C]//Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco,2001:97-106.
[36]KRAWCZYK B,MINKU L L,GAMA J. Ensemble learning for data stream analysis:a survey[J]. Information fusion,2017,37(C):132-156.
[37]PFAHRINGER B,HOLMES G,KIRKBY R. New options for Hoeffding trees[M]//AI 2007:Advances in Artificial Intelligence. Berlin Heidelberg:Springer,2007.
[38]BIFET A,HOLMES G,PFAHRINGER B,et al. Fast perceptron decision tree learning from evolving data streams[C]//Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining. Berlin:Springer,2010:299-310.
[39]FAN W,WANG H,YU P S,et al. Is random model better? On its accuracy and efficiency[C]//IEEE International Conference on Data Mining. New York:Hawthorne,IEEE,2003:51-58.
[40]BREIMAN L. Random forests[J]. Machine learning,2001,45(1):5-32.
[41]GAMA J,MEDAS P,ROCHA R. Forest trees for on-line data[C]//Proceedings of ACM Symposium on Applied Computing. ACM,New York,NY,USA,2004:632-636.
[42]ABDULSALAM H,SKILLICORN D B,MARTIN P. Streaming random forests[C]//11th International Database Engineering and Applications Symposium (IDEAS 2007). Banff,Alberta,Canada:IEEE,2007:225-232.
[43]HU X,LI P,WU X,WU G. A semi-random multiple decision-tree algorithm for mining data streams[J]. Journal of Computer Science and Technology,2007,22(5):711-724.
[44]ABDULSALAM H,SKILLICORN D B,MARTIN P. Classifying Evolving Data Streams Using Dynamic Streaming Random Forests[C]//International Conference on Database and Expert Systems Applications. Kingston,Canada:Springer-Verlag,2008:643-651.
[45]LI P,HU X,WU X. Mining concept-drifting data streams with Multiple Semi-Random Decision Trees[C]//International Conference on Advanced Data Mining and Applications. Berlin,Heidelberg:Springer-Verlag,2008:733-740.
[46]IKONOMOVSKA E,GAMA J,DZEROSKI S. Learning model trees from evolving data streams[J]. Data mining & knowledge discovery,2011,23(1):128-168.
[47]JABER G,CORNUEJOLS A,TARROUX P. A new on-line learning method for coping with recurring concepts:The ADACC System[C]//International Conference on Neural Information Processing. Berlin,Heidelberg:Springer,2013:595-604.
[48]ZHUKOV A V,SIDOROV D N,FOLEY A M. Random forest based approach for concept drift handling[C]//Analysis of Images Social Networks and Texts:5th International Conference. Yekaterinburg,Russia,2016,661:69-77.
[49]BIFET A,GAVALDA R. Learning from time-changing data with adaptive windowing[C]//Proceedings of the 2007 SIAM International Conference on Data Mining. Minneapolis:SIAM,2007.
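[50]BAI Yang. Research on concept drift detection and imbalanced data stream classification algorithms[D]. Beijing:Beijing Jiaotong University,2017.(in Chinese)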
[51]GAMA J,MEDAS P,CASTILLO G,et al. Learning with drift detection[C]//Advances in Artificial Intelligence:SBIA 2004. Berlin,Heidelberg:Springer,2004:286-295.
[52]RAUDYS S. Statistical and neural classifiers:an integrated approach to design[M]. Berlin,Heidelberg:Springer-Verlag,2014:289.
[53]ROSS G J,ADAMS N M,TASOULIS D K,et al. Exponentially weighted moving average charts for detecting concept drift[J]. Pattern recognition letters,2012,33(2):191-198.
[54]GAMA J,ZLIOBAITE I,BIFET A,et al. A survey on concept drift adaptation[J]. ACM computing surveys,2014,46(4):1-37.
[55]FARID D M,RAHMAN C M. Novel class detection in concept-drifting data stream mining employing decision tree[C]//International Conference on Electrical & Computer Engineering. Dhaka,Bangladesh:IEEE,2013:630-633.
[56]BARDDAL J P,GOMES H M,ENEMBRECK F. A survey on feature drift adaptation[C]//IEEE International Conference on Tools with Artificial Intelligence. Vietri Sul Mare,Italy:IEEE,2015:1-8.
[57]HASHEMI S,YANG Y. Flexible decision tree for data stream classification in the presence of concept change,noise and missing values[J]. Data mining & knowledge discovery,2009,19(1):95-131.
[58]ISAZADEH A,MAHAN F,PEDRYCZ W. MFlexDT:multi flexible fuzzy decision tree for data stream classification[J]. Soft computing,2016,20(9):3719-3733.
[59]SONG X,WANG H,HE H Y,et al. MHFlexDT:a multivariate branch fuzzy decision tree data stream mining strategy based on hybrid partitioning standard[C]//International Symposium on Neural Networks. Cham:Springer,2018:310-317.
[60]LI P P,WU X D,LIANG Q H,et al. Random ensemble decision trees for learning concept-drifting data streams[M]//Advances in Knowledge Discovery and Data Mining. Berlin,Heidelberg:Springer,2011:313-325.
[61]RAMIREZ-GALLEGO S,KRAWCZYK B,GARCIA S,et al. A survey on data preprocessing for data stream mining[J]. Neurocomputing,2017,239(C):39-57.
[62]JAPKOWICZ N,STEFANOWSKI J. Big data analysis:new algorithms for a new society[M]. Switzerland:Springer International Publishing,2016:1-10.
[63]SHAKER A,HULLERMEIER E. Survival analysis on data streams:analyzing temporal events in dynamically changing environments[J]. International journal of applied mathematics & computer science,2014,24(1):199-212.
[64]CANO A,ZAFRA A. Solving classification problems using genetic programming algorithms on GPUs[C]//International Conference on Hybrid Artificial Intelligence Systems. Berlin,Heidelberg:Springer-Verlag,2010:17-26.


Memo:
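Received: 2019-06-17.
Foundation item: National Natural Science Foundation of China (61563001).
Corresponding author: Han Meng, Ph.D., associate professor, research interest: data mining. E-mail: 861254268@qq.com, 2003051@nun.edu.cn
Last update: 2019-12-31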