Journal of Xidian University ›› 2024, Vol. 51 ›› Issue (1): 187-200. doi: 10.19665/j.issn1001-2400.20230205

• Cyberspace Security •

Deduplication scheme with data popularity for cloud storage

HE Xinfeng1,2, YANG Qinqin1,2

  1. School of Cyberspace Security and Computer, Hebei University, Baoding 071002, China
  2. Key Lab of High Trusted Information System of Hebei Province, Baoding 071002, China
  • Received: 2022-10-25 Online: 2024-01-20 Published: 2023-08-30
  • Contact: YANG Qinqin E-mail: popsoda@126.com; yangqinqin202207@163.com

Abstract:

With the development of cloud computing, more enterprises and individuals outsource their data to cloud storage providers to relieve local storage pressure, so the pressure on cloud storage itself is becoming increasingly prominent. To improve storage efficiency and reduce communication cost, data deduplication has been widely adopted. Existing approaches include identical-data deduplication based on hash tables and similar-data deduplication based on Bloom filters, but both rarely consider the impact of data popularity. In fact, data outsourced to cloud storage can be divided into popular and unpopular data. Popular data are frequently accessed and have numerous duplicate copies and similar data in the cloud, so high-accuracy deduplication is required. Unpopular data are rarely accessed and have fewer duplicate copies and similar data in the cloud, so low-accuracy deduplication is sufficient. To address this problem, a novel Bloom filter variant named PDBF (popularity dynamic Bloom filter) is proposed, which incorporates data popularity into the Bloom filter. Moreover, a PDBF-based deduplication scheme is constructed that performs different degrees of deduplication depending on how popular a datum is. Experiments demonstrate that the scheme achieves an excellent tradeoff among computational time, memory consumption, and deduplication efficiency.
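The idea of matching deduplication accuracy to popularity can be illustrated with a minimal sketch. This is not the PDBF construction from the paper, whose details the abstract does not give; it simply keeps two ordinary Bloom filters with different target false-positive rates and routes each chunk by an access count supplied by the caller. The class names, the threshold, the two false-positive rates, and the `access_count` parameter are all assumptions made for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Plain Bloom filter sized for a target false-positive rate."""

    def __init__(self, capacity: int, fp_rate: float):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes
        self.m = max(1, math.ceil(-capacity * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: bytes):
        # Double hashing: derive k bit positions from two halves of a SHA-256 digest
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class PopularityAwareIndex:
    """Hypothetical two-tier index: a stricter filter for popular chunks."""

    def __init__(self, capacity: int = 10_000, threshold: int = 3):
        self.threshold = threshold  # assumed cut-off between unpopular and popular
        self.unpopular = BloomFilter(capacity, fp_rate=0.05)   # cheaper, less accurate
        self.popular = BloomFilter(capacity, fp_rate=0.0001)   # larger, more accurate

    def check_and_add(self, chunk: bytes, access_count: int) -> bool:
        """Return True if the chunk is flagged as already stored, then record it."""
        tier = self.popular if access_count >= self.threshold else self.unpopular
        seen = chunk in tier
        tier.add(chunk)
        return seen


index = PopularityAwareIndex(capacity=1000, threshold=3)
first = index.check_and_add(b"chunk-A", access_count=5)   # popular tier, not yet stored
second = index.check_and_add(b"chunk-A", access_count=5)  # flagged as a duplicate
```

The sketch ignores what happens when a chunk's popularity changes over time and it would need to move between tiers, which is presumably part of what a dynamic scheme has to handle.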

Key words: cloud computing, cloud storage, data deduplication, data popularity, Bloom filter

CLC Number: 

  • TP309.2