
Analysis and Study of SPRINT Algorithm Based on Hadoop Platform

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Issue:
2016, No. 04
Page:
0-
Research Field:
Mathematics and Computer Science
Publishing date:

Info

Title:
Analysis and Study of SPRINT Algorithm Based on Hadoop Platform
Author(s):
Huang Gang, Sun Yuan
School of Computer Science & Technology, School of Software, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Keywords:
Hadoop; MapReduce; data mining; decision tree; SPRINT algorithm
PACS:
TP301.6
DOI:
10.3969/j.issn.1001-4616.2016.04.006
Abstract:
When traditional decision tree algorithms perform massive-scale data mining on a single platform, their limited computing power and storage capacity lead to long running times, poor fault tolerance, and insufficient storage. The Hadoop platform, with its high reliability and fault tolerance, offers a new way to parallelize decision tree algorithms. In this paper, a parallel SPRINT classification algorithm based on the Hadoop platform is designed and implemented. The results show that the Hadoop-based SPRINT algorithm achieves better classification accuracy than the non-parallelized SPRINT algorithm, with lower time complexity and better parallel performance. In particular, it significantly reduces the time needed to find the best split point.
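The abstract attributes much of the speed-up to a faster search for the best split point. As a rough illustration only, the sketch below shows a SPRINT-style scan for one numeric attribute (a sorted attribute list, class histograms below and above the candidate threshold, weighted Gini index) and a map/reduce-shaped wrapper in which each hypothetical mapper evaluates one attribute list and a reducer keeps the split with the lowest Gini value. The names `best_numeric_split`, `map_phase`, and `reduce_phase`, and the one-attribute-per-mapper partitioning, are assumptions for illustration; this is plain Python, not the authors' Hadoop implementation.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple


def gini(counts: Counter, total: int) -> float:
    """Gini index 1 - sum(p_j^2) for a class histogram with `total` records."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_numeric_split(values: Sequence[float], labels: Sequence[Hashable]) -> Tuple[float, float]:
    """One pass over a sorted attribute list, SPRINT-style.

    Returns (weighted Gini of the best split, threshold), where the threshold
    is the midpoint between two adjacent distinct values.
    """
    attr_list = sorted(zip(values, labels))  # attribute list: (value, class) sorted by value
    n = len(attr_list)
    below: Counter = Counter()
    above: Counter = Counter(label for _, label in attr_list)
    best = (float("inf"), float("nan"))
    for i in range(n - 1):
        value, label = attr_list[i]
        below[label] += 1  # move one record from the "above" histogram to "below"
        above[label] -= 1
        next_value = attr_list[i + 1][0]
        if value == next_value:
            continue  # only split between distinct attribute values
        n_below, n_above = i + 1, n - (i + 1)
        score = (n_below / n) * gini(below, n_below) + (n_above / n) * gini(above, n_above)
        if score < best[0]:
            best = (score, (value + next_value) / 2)
    return best


def map_phase(columns: Dict[str, List[float]], labels: List[Hashable]) -> Iterable[Tuple[str, float, float]]:
    """Hypothetical mapper: one task per attribute list, emitting its local best split."""
    for name, values in columns.items():
        score, threshold = best_numeric_split(values, labels)
        yield name, score, threshold


def reduce_phase(candidates: Iterable[Tuple[str, float, float]]) -> Tuple[str, float, float]:
    """Hypothetical reducer: keep the candidate with the lowest weighted Gini index."""
    return min(candidates, key=lambda c: c[1])
```

For example, with attributes `age = [23, 45, 31, 52, 38, 27]` and `income = [30, 80, 40, 90, 35, 60]` and classes `["no", "yes", "no", "yes", "yes", "no"]`, `reduce_phase(map_phase(...))` selects `age` at threshold 34.5, which separates the two classes perfectly (Gini 0.0). The expensive part, scanning each attribute list, is independent per attribute, which is why it maps naturally onto parallel tasks.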


Last Update: 2016-12-31