Source Deduplication-based Data Backup and Recovery System

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Issue:
2020, Issue 02
Page:
131-139
Research Field:
Computer Science and Technology
Publishing date:

Info

Title:
Source Deduplication-based Data Backup and Recovery System
Author(s):
Wang Xinghu (1), He Anyuan (2)
(1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
(2. Informationization Technology Center, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
Keywords:
source data deduplication; efficiency of backup and recovery; pre-processing concurrent computing; multi-level caches; data backup and recovery system
PACS:
TP311
DOI:
10.3969/j.issn.1001-4616.2020.02.020
Abstract:
Current source deduplication techniques suffer from low deduplication efficiency, time-consuming fingerprint calculation, and frequent database requests. To address these problems, this paper designs a source deduplication-based data backup and recovery system. At the client, the data stream is pre-segmented, the segments are stored in a pre-processing circular queue, and data blocks are cut by content-defined chunking, with the entire process executed concurrently. This pre-processing concurrent computing module effectively shortens the calculation time. The server stores adjacent data blocks and their index information in containers and multi-level caches. Because the containers and multi-level caches are organized in units of containers, the cache hit rate improves clearly. Furthermore, Bloom filters and multi-level caches reduce the number of database accesses. Experimental results show that the system effectively improves the efficiency of data backup and recovery.
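
The two core ideas named in the abstract, content-defined chunking at the client and a Bloom filter placed in front of the fingerprint index, can be illustrated with a minimal single-threaded sketch. It is not the authors' implementation: the Gear-style rolling hash, the chunk-size limits, the SHA-1 fingerprint, the in-memory dictionary standing in for the server database, and all names (`cdc_chunks`, `BloomFilter`, `DedupStore`, `backup`, `restore`) are assumptions chosen for illustration. The circular queue, concurrency, containers, and multi-level caches are omitted.

```python
import hashlib
import random

# Hypothetical Gear table: 256 fixed pseudo-random 32-bit values (assumption).
_rng = random.Random(2020)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data, mask=0xFFC00000, min_size=256, max_size=4096):
    """Yield chunks whose boundaries depend on content, not on byte offsets."""
    start = 0
    h = 0
    for i, b in enumerate(data):
        # Rolling hash: older bytes are shifted out of the 32-bit state.
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        size = i - start + 1
        # Cut when the top hash bits are zero, or when the chunk is too long.
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]


class BloomFilter:
    """Bit-array Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        digest = hashlib.sha256(key).digest()
        for i in range(self.num_hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, key):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))


class DedupStore:
    """Fingerprint index guarded by a Bloom filter (dict stands in for the database)."""

    def __init__(self):
        self.bloom = BloomFilter()
        self.index = {}          # fingerprint -> chunk bytes
        self.index_lookups = 0   # how often the "database" was actually queried

    def put(self, chunk):
        fp = hashlib.sha1(chunk).digest()
        if fp in self.bloom:             # only a possible duplicate costs a lookup
            self.index_lookups += 1
            if fp in self.index:
                return fp, False         # duplicate: nothing new to store
        self.bloom.add(fp)
        self.index[fp] = chunk
        return fp, True


def backup(data, store):
    """Return the file recipe (ordered fingerprints) and the bytes newly stored."""
    recipe, new_bytes = [], 0
    for chunk in cdc_chunks(data):
        fp, is_new = store.put(chunk)
        recipe.append(fp)
        if is_new:
            new_bytes += len(chunk)
    return recipe, new_bytes


def restore(recipe, store):
    """Rebuild the original data by concatenating chunks in recipe order."""
    return b"".join(store.index[fp] for fp in recipe)
```

In this sketch, a chunk whose fingerprint is absent from the Bloom filter is stored without querying the index at all, which is the sense in which a Bloom filter can cut down database accesses. Backing up the same data a second time stores no new bytes, and a small insertion in the middle of the data leaves the chunks before the change untouched.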

Memo

Last Update: 2020-05-15