[1]孙鸿艳,吉根林.一种新的基于FP_Growth的频繁项目集并行挖掘算法[J].南京师范大学学报(自然科学版),2016,39(04):0.[doi:10.3969/j.issn.1001-4616.2016.04.005]
 Sun Hongyan,Ji Genlin.New Parallel Algorithm for Mining Frequent Item Sets Based on FP_Growth[J].Journal of Nanjing Normal University(Natural Science Edition),2016,39(04):0.[doi:10.3969/j.issn.1001-4616.2016.04.005]
点击复制

一种新的基于FP_Growth的频繁项目集并行挖掘算法()
分享到:

《南京师范大学学报》(自然科学版)[ISSN:1001-4616/CN:32-1239/N]

卷:
第39卷
期数:
2016年04期
页码:
0
栏目:
·数学与计算机科学·
出版日期:
2016-12-30

文章信息/Info

Title:
New Parallel Algorithm for Mining Frequent Item Sets Based on FP_Growth
文章编号:
1001-4616(2016)04-0019-06
作者:
孙鸿艳吉根林
南京师范大学计算机科学与技术学院,江苏 南京 210023
Author(s):
Sun HongyanJi Genlin
School of Computer Science and Technology,Nanjing Normal University,Nanjing 210023,China
关键词:
频繁项目集关联规则FP_GrowthHadoopMap/Reduce
Keywords:
frequent item setsassociation ruleFP_GrowthHadoopMap/Reduce
分类号:
TP311.11
DOI:
10.3969/j.issn.1001-4616.2016.04.005
文献标志码:
A
摘要:
频繁项目集挖掘用于发现项目之间的关联规则. 为了高效求解面向大数据的频繁项目集,本文提出一种新的基于FP_Growth的频繁项目集并行挖掘算法NPFP_Growth(New Parallel algorithm based on FP_Growth),该算法对频繁模式树的存储结构进行改进,基于Map/Reduce并行计算模型,利用HDFS实现数据存储,在各自计算节点上构造局部频繁模式树,求解该局部频繁模式树中每个分支的最长全局频繁项目集; 对于全局非频繁项目集,计算其支持数,发送至相应计算节点进行支持度统计,从而以较为简单的算法实现频繁项目集并行挖掘. 实验表明,NPFP_Growth算法具有较高的计算效率和良好的可伸缩性.
Abstract:
Mining of frequent item sets is used to find the association rules between items. In order to get frequent item sets of big data efficiently,this paper proposes a new parallel algorithm for mining frequent item sets based on FP_Growth,named NPFP_Growth(New Parallel algorithm based on FP_Growth). The storage structure of local frequent pattern tree is improved and created in each node based on parallel computing model Map/Reduce and distributed storage system HDFS,and then longest global frequent item sets are mined in each branch of the tree. Finally,Support for item sets which does not meet global minimum support is computed and then sent to corresponding computing node to count. Parallel mining algorithm NPFP_Growth is implemented. The experimental results show that the algorithm have high computing efficiency and good scalability.

参考文献/References:

[1] HAN J,PEI J,YIN Y. Mining frequent patterns without candidate generation[J]. ACM SIGMOD Record,2000,29(2):1-12.
[2]?ZDOGAN G ?,Abul O. Task-Parallel FP_growth on cluster computers[C]//Proceedings of the International Symposium on Computer and Information Science,London,UK,2010:383-388.
[3]TANBEER S K,AHMED C F,JEONG B S. Parallel and distributed frequent pattern mining in large databases[C]//11th IEEE International Conference on High Performance Computing and Communications,Seoul,Korea,2009:407-414.
[4]SHEN X L,TAO L. Association rules parallel algorithm based on FP-tree[C]//2010 2nd International Conference on Computer Engineering and Technology,Century City New International Convention & Exhibition Center,Chengdu,China,2010,4:687-689.
[5]TU F,HE B. A parallel algorithm for mining association rules based on FP-tree. Advances in computer science,environment,ecoinformatics,and education[M]. Berlin,Heidelberg:Springer,2011:399-403.
[6]金桃,何艳珊,宋伟国,等. 一种简单有效的并行化频繁项集挖掘算法[J]. 微计算机信息,2010(18):147-149.
[7]LI H,WANG Y,ZHANG D,et al. PFP:parallel FP-Growth for query recommendation[C]//Proceedings of the 2008 ACM Conference on Recommender Systems,Lausanne,Switzerland,2008:107-114.
[8]章志刚,吉根林. 一种基于FP-Growth的频繁项目集并行挖掘算法[J]. 计算机工程与应用,2014(2):103-106.

备注/Memo

备注/Memo:
收稿日期:2016-03-20.
基金项目:国家自然科学基金(41471371).
通讯联系人:吉根林,博士,教授,博士生导师,研究方向:数据挖掘与云计算. E-mail:glji@njnu.edu.cn
更新日期/Last Update: 2016-12-31