Source Deduplication-based Data Backup and Recovery System
Wang Xinghu1, He Anyuan2
(1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China) (2. Informationization Technology Center, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
Keywords: source data deduplication; efficiency of backup and recovery; pre-processing concurrent computing; multi-level caches; data backup and recovery system
To address the problems of current source deduplication technology, namely low deduplication efficiency, time-consuming fingerprint calculation, and frequent database requests, this paper designs a source deduplication-based data backup and recovery system. At the client, the data stream is pre-segmented, the segments are stored in a pre-processing circular queue, and data blocks are divided by content-defined chunking, with the entire process executed concurrently. This pre-processing concurrent computing module effectively shortens the calculation time. The server stores adjacent data blocks and their index information in containers and multi-level caches. Because the multi-level caches are organized in units of containers, the cache hit rate improves markedly. Furthermore, Bloom filters and multi-level caches reduce the frequent accesses to the database. Experimental results show that the system effectively improves the efficiency of data backup and recovery.
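
The abstract names two of the building blocks without giving details: content-defined chunking at the client and a Bloom filter placed in front of the fingerprint index. The following is only a minimal sketch of how those two pieces might fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the gear-style rolling hash, the chunk size parameters, the SHA-1 fingerprints, the `BloomFilter` class, and the in-memory `index` dictionary that stands in for the database. The pre-processing circular queue, concurrency, containers, and multi-level caches are left out.

```python
import hashlib
import random

# Hypothetical parameters; the abstract does not specify any of these values.
MIN_CHUNK = 256
MAX_CHUNK = 4096
BOUNDARY_MASK = (1 << 10) - 1  # a boundary is declared when the low 10 hash bits are zero

_table_rng = random.Random(0)  # fixed seed so chunk boundaries are reproducible
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]


def chunk_boundaries(data: bytes):
    """Yield (start, end) spans of content-defined chunks (gear-style rolling hash)."""
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        at_boundary = length >= MIN_CHUNK and (h & BOUNDARY_MASK) == 0
        if at_boundary or length >= MAX_CHUNK:
            yield start, i + 1
            start = i + 1
            h = 0
    if start < len(data):
        yield start, len(data)  # trailing chunk, possibly shorter than MIN_CHUNK


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: bytes):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def backup(data: bytes, bloom: BloomFilter, index: dict):
    """Deduplicate data into index; return (new_chunks, duplicate_chunks, index_lookups)."""
    new = dup = lookups = 0
    for start, end in chunk_boundaries(data):
        chunk = data[start:end]
        fingerprint = hashlib.sha1(chunk).digest()
        # Consult the index (the stand-in for the database) only if the
        # Bloom filter says the fingerprint may already be stored.
        if bloom.might_contain(fingerprint):
            lookups += 1
            if fingerprint in index:
                dup += 1
                continue
        bloom.add(fingerprint)
        index[fingerprint] = chunk
        new += 1
    return new, dup, lookups
```

In this sketch, backing up the same data twice with a shared `bloom` and `index` stores every chunk on the first pass and reports every chunk as a duplicate on the second. On the first pass the Bloom filter lets most new fingerprints skip the index lookup entirely, which is the kind of database access reduction the abstract describes.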




Last Update: 2020-05-15