Survey of Decision Tree Classification Methods over Data Streams

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Issue:
No. 4, 2019
Page:
49-60
Research Field:
Mathematics and Computer Science
Publishing date:

Title:
Survey of Decision Tree Classification Methods over Data Streams
Author(s):
Jia Tao, Han Meng, Wang Shaofeng, Du Shiyu, Shen Mingyao
School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
Keywords:
data stream mining; classification; decision tree; concept drift; ensemble classification
CLC number:
TP3
DOI:
10.3969/j.issn.1001-4616.2019.04.008
Abstract:
Data streams are massive, arrive at high speed, and must be processed in real time. In some data streams the underlying data distribution changes over time; this phenomenon is called concept drift. First, decision tree algorithms for data streams are categorized by classification model into single-classifier decision trees and ensemble decision trees. Single-classifier models are further divided into the very fast decision tree (VFDT), variant decision trees, and other decision tree algorithms; ensemble models are divided into algorithms derived from VFDT and random decision tree variants. Second, concept drift handling is introduced, covering a description of the concept drift problem, common drift handling techniques, and decision tree algorithms designed to cope with concept drift. Next, incremental decision tree algorithms are introduced. Finally, the decision tree algorithms covered in this paper are analyzed and summarized.
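VFDT, the first single-classifier family named above, grows a tree incrementally and decides when a leaf has seen enough examples to split by means of the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)). The Python sketch below illustrates that split test under stated assumptions: the function names are hypothetical, information gain is the split measure (so R = log2 of the number of classes), and the defaults δ = 1e-7 and tie threshold τ = 0.05 are illustrative choices, not values taken from this paper.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    With probability 1 - delta, the true mean of a variable with range R
    lies within epsilon of the mean observed over n samples.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n_classes: int,
                 n: int, delta: float = 1e-7, tie_threshold: float = 0.05) -> bool:
    """Hypothetical VFDT-style split test at a single leaf.

    best_gain / second_gain: information gain of the two best attributes,
    estimated from the n examples seen at this leaf so far.
    """
    # Information gain (in bits) ranges over [0, log2(n_classes)].
    value_range = math.log2(n_classes)
    eps = hoeffding_bound(value_range, delta, n)
    # Split if the best attribute is clearly better than the runner-up,
    # or if eps is so small that the two are effectively tied.
    return best_gain - second_gain > eps or eps < tie_threshold


if __name__ == "__main__":
    for n in (200, 1000, 10000):
        eps = hoeffding_bound(1.0, 1e-7, n)
        print(n, round(eps, 4), should_split(0.30, 0.20, 2, n))
```

With two classes and δ = 1e-7, a gain gap of 0.1 is not enough to split after 200 examples but is after 1,000, because ε shrinks roughly as 1/√n. The tie threshold lets a leaf split once ε is small even when the two best attributes remain nearly equal, instead of waiting indefinitely.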

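Among common concept drift detection techniques, one widely cited example is DDM (drift detection method), which watches the online error rate p of the current classifier together with its standard deviation s = sqrt(p(1 − p)/n). The sketch below assumes the usual formulation: it records the point where p + s was lowest (p_min, s_min), signals a warning when p + s ≥ p_min + 2·s_min and a drift when p + s ≥ p_min + 3·s_min, and ignores the first 30 examples. The class name, parameter names and returned strings are hypothetical; this is an illustration, not code from the paper.

```python
import math


class DriftDetectionMethod:
    """Sketch of the DDM drift detector (error-rate based).

    Feed it one prediction error (True/False) per example; it returns
    "stable", "warning" or "drift".
    """

    def __init__(self, min_instances: int = 30,
                 warning_level: float = 2.0, drift_level: float = 3.0):
        self.min_instances = min_instances
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self) -> None:
        self.n = 0              # examples seen since the last reset
        self.p = 0.0            # running error rate
        self.s = 0.0            # its standard deviation sqrt(p(1-p)/n)
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error: bool) -> str:
        self.n += 1
        self.p += (float(error) - self.p) / self.n   # incremental mean
        self.s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_instances:
            return "stable"
        # Remember where p + s was lowest (the best state of the model).
        if self.p + self.s <= self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s >= self.p_min + self.drift_level * self.s_min:
            self.reset()        # a learner would be rebuilt at this point
            return "drift"
        if self.p + self.s >= self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


if __name__ == "__main__":
    ddm = DriftDetectionMethod()
    # Synthetic errors: 10% error rate for 500 examples, then 50%.
    stream = [i % 10 == 0 for i in range(500)] + [i % 2 == 0 for i in range(500)]
    states = [ddm.update(e) for e in stream]
    print("first drift at position", states.index("drift"))
```

In the usage example the error rate jumps at position 500; the detector stays stable before the change, passes through the warning state, and reports drift a few dozen examples later. In DDM as usually described, examples arriving during the warning period are kept, and a new model is trained from them once drift is confirmed.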

Last Update: 2019-12-31