Wang Xinghu, He Anyuan. Source Deduplication-based Data Backup and Recovery System [J]. Journal of Nanjing Normal University (Natural Science Edition), 2020, 43(02): 131-139. [doi:10.3969/j.issn.1001-4616.2020.02.020]

Design and Implementation of a Source Deduplication-based Data Backup and Recovery System

Journal of Nanjing Normal University (Natural Science Edition) [ISSN:1001-4616 / CN:32-1239/N]

Volume:
Vol. 43
Issue:
No. 02, 2020
Pages:
131-139
Column:
Computer Science and Technology
Publication Date:
2020-05-30

Article Info

Title:
Source Deduplication-based Data Backup and Recovery System
Article No.:
1001-4616(2020)02-0131-09
Author(s):
Wang Xinghu1, He Anyuan2
(1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
(2. Informationization Technology Center, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
Keywords:
source data deduplication; efficiency of backup and recovery; pre-processing concurrent computing; multi-level caches; data backup and recovery system
CLC Number:
TP311
DOI:
10.3969/j.issn.1001-4616.2020.02.020
Document Code:
A
Abstract:
Current source-side data deduplication suffers from low deduplication efficiency, time-consuming fingerprint computation, and the overhead of frequent database operations. To address these problems, this paper designs and implements a data backup and recovery system based on source-side deduplication. On the client, the system pre-segments the data stream, stores the segments in a pre-processing circular queue, and splits each segment into variable-size chunks with content-defined chunking; the whole pipeline runs concurrently, so the pre-processing concurrent computing module effectively shortens the computation time. On the server, adjacent data chunks and their index information are packed into containers, and a multi-level cache organized at container granularity is designed, which markedly improves the cache hit rate. In addition, Bloom filters and the multi-level cache reduce the frequency of database operations. Simulation experiments show that the system effectively improves the efficiency of data backup and recovery.
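The client-side pipeline described in the abstract (segments drained from the pre-processing circular queue, content-defined chunking, concurrent fingerprint computation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the Gear-style rolling hash, the chunk-size parameters, and the SHA-1 fingerprint are assumptions chosen for readability.

# Minimal sketch of the client-side pre-processing pipeline described in the
# abstract. Assumptions (not from the paper): a Gear-style rolling hash for
# content-defined chunking, ~8 KiB average chunks, SHA-1 chunk fingerprints.
import hashlib
import random
from concurrent.futures import ThreadPoolExecutor

random.seed(1)                                      # deterministic gear table for the sketch
GEAR = [random.getrandbits(32) for _ in range(256)]
MASK = 0x1FFF                                       # boundary mask, average chunk ~8 KiB (assumed)
MIN_CHUNK, MAX_CHUNK = 2 * 1024, 64 * 1024          # chunk-size bounds (assumed)

def cdc_chunks(segment: bytes):
    """Content-defined chunking: a boundary is declared wherever the rolling
    hash's low bits match MASK, so cut points depend on content rather than
    on fixed offsets and survive insertions earlier in the stream."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(segment):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF       # Gear rolling hash over recent bytes
        size = i - start + 1
        if ((h & MASK) == MASK and size >= MIN_CHUNK) or size >= MAX_CHUNK:
            chunks.append(segment[start:i + 1])
            start, h = i + 1, 0
    if start < len(segment):
        chunks.append(segment[start:])               # trailing chunk
    return chunks

def process_segment(segment: bytes):
    """Chunk one segment and compute a fingerprint per chunk."""
    return [(hashlib.sha1(c).hexdigest(), len(c)) for c in cdc_chunks(segment)]

def process_stream(segments):
    """Segments taken from the pre-processing circular queue are chunked and
    fingerprinted concurrently, mirroring the concurrent pre-processing module."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(process_segment, segments))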

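On the server side, the abstract describes a lookup order of Bloom filter, then a container-granularity multi-level cache, and only then the index database, so that most fingerprint queries never touch the database. The sketch below is likewise only illustrative: the class and function names, the LRU eviction policy, and the set- and dict-based stand-ins for the Bloom filter and the database are assumptions, not the authors' implementation.

# Illustrative sketch of the server-side duplicate lookup order described in
# the abstract (Bloom filter -> container-level cache -> index database).
from collections import OrderedDict

class ContainerCache:
    """LRU cache keyed by container id. A database hit loads the whole
    container's fingerprint index, so neighbouring chunks of the same backup
    stream are then served from the cache (locality of adjacent chunks)."""
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.containers = OrderedDict()   # container_id -> {fingerprint: (container_id, offset)}

    def lookup(self, fp):
        for cid, index in self.containers.items():
            if fp in index:
                self.containers.move_to_end(cid)       # keep hot containers resident
                return index[fp]
        return None

    def insert(self, container_id, index):
        self.containers[container_id] = index
        self.containers.move_to_end(container_id)
        if len(self.containers) > self.capacity:
            self.containers.popitem(last=False)        # evict the coldest container

def find_duplicate(fp, bloom, cache, db_index, load_container_index):
    """Return a (container_id, offset) reference if the chunk already exists.
    `bloom` is any set-like membership filter (a real Bloom filter may give
    false positives but never false negatives, so a miss means the chunk is
    new); `db_index` maps fingerprint -> (container_id, offset);
    `load_container_index` fetches one container's full fingerprint index."""
    if fp not in bloom:                    # definitely a new chunk: skip cache and database
        return None
    ref = cache.lookup(fp)
    if ref is not None:
        return ref                         # served from the container-level cache
    ref = db_index.get(fp)
    if ref is not None:                    # database hit: pre-load the whole container's index
        cid, _offset = ref
        cache.insert(cid, load_container_index(cid))
    return ref

In a typical container-based design of this kind, a fingerprint that finds no match anywhere is treated as a new chunk: it is written into the current container, added to the Bloom filter, and inserted into the database index.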

Memo:
Received: 2019-11-20. Corresponding author: Wang Xinghu, Ph.D. candidate, engineer; research interests: software engineering and computer networks. E-mail: tiger@nuaa.edu.cn
Last Update: 2020-05-15