[1] Huang Gang, Sun Yuan. Analysis and Study of SPRINT Algorithm Based on Hadoop Platform[J]. Journal of Nanjing Normal University (Natural Science Edition), 2016, 39(04): 0. [doi:10.3969/j.issn.1001-4616.2016.04.006]

Analysis and Study of SPRINT Algorithm Based on Hadoop Platform
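Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Volume:
39
Issue:
No. 04, 2016
Pages:
0
Section:
Mathematics and Computer Science
Publication date:
2016-12-30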

Article Info

Title:
Analysis and Study of SPRINT Algorithm Based on Hadoop Platform
Article ID:
1001-4616(2016)04-0025-06
Author(s):
Huang Gang, Sun Yuan
School of Computer Science & Technology, School of Software, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Keywords:
Hadoop; MapReduce; data mining; decision tree; SPRINT algorithm
CLC number:
TP301.6
DOI:
10.3969/j.issn.1001-4616.2016.04.006
Document code:
A
Abstract:
When traditional decision tree algorithms mine massive data sets on a single machine, they are constrained by that machine's computing power and storage capacity, and therefore suffer from long running times, poor fault tolerance, and limited storage. The Hadoop platform, with its high reliability and high fault tolerance, offers a new approach to parallelizing decision tree algorithms. This paper designs and implements a parallel SPRINT classification algorithm based on the Hadoop platform. Experimental results show that, compared with the non-parallelized SPRINT algorithm, the Hadoop-based SPRINT classification algorithm achieves better classification accuracy, lower time complexity, and better parallel performance, and that it markedly speeds up the search for the best split point.
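The search for the best split point mentioned in the abstract can be illustrated with a small sketch. SPRINT selects splits by the Gini index, scanning a sorted attribute list while maintaining class histograms for the records below and above each candidate threshold. The code below is a minimal single-process illustration, not the authors' Hadoop implementation, which the abstract does not describe. The function names, the "map"/"reduce" labelling of the two steps, and the toy data are all assumptions made for this example.

```python
from collections import Counter

def gini(counts, total):
    """Gini impurity of a class histogram covering `total` records."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def best_split_for_attribute(name, values, labels):
    """Hypothetical 'map' step: scan one sorted numeric attribute list and
    return (gini_split, attribute name, threshold) for its best binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    below = Counter()                              # classes with value <= threshold
    above = Counter(label for _, label in pairs)   # classes with value > threshold
    best = (float("inf"), name, None)
    for i in range(n - 1):
        value, label = pairs[i]
        below[label] += 1
        above[label] -= 1
        next_value = pairs[i + 1][0]
        if value == next_value:
            continue  # a threshold cannot separate equal values
        n_below = i + 1
        n_above = n - n_below
        score = (n_below / n) * gini(below, n_below) \
              + (n_above / n) * gini(above, n_above)
        if score < best[0]:
            best = (score, name, (value + next_value) / 2)
    return best

def best_split(table, labels):
    """Hypothetical 'reduce' step: keep the candidate with the smallest gini_split."""
    candidates = [best_split_for_attribute(a, col, labels)
                  for a, col in table.items()]
    return min(candidates, key=lambda c: c[0])

# Toy data invented for this sketch (not taken from the paper).
table = {"age": [23, 17, 43, 68, 32, 20],
         "car_value": [30, 10, 50, 20, 40, 15]}
labels = ["high", "high", "high", "low", "low", "high"]

score, attribute, threshold = best_split(table, labels)
print(attribute, threshold, round(score, 3))  # prints: age 27.5 0.222
```

Because each attribute list is scanned independently, the per-attribute step is the natural unit to distribute across workers, with a final step that only compares one candidate per attribute. Whether the paper partitions the work this way is not stated in the abstract.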

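Memo:
Received: 2015-12-16.
Funding: National Natural Science Foundation of China (61171053).
Corresponding author: Huang Gang, professor; research interests: computer applications in communications, massive data management, and the design and development of mobile commerce platforms. E-mail: huanggang@njupt.edu.cn
Last Update: 2016-12-31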