A Survey on Efficient Storage and Retrieval System for the Implementation of Data Deduplication in Cloud

  • R. Vinoth
  • L. Jegatha Deborah
Conference paper
Part of the Lecture Notes on Data Engineering and Communications Technologies book series (LNDECT, volume 49)


Cloud computing gives users the ability to store their files and to access and retrieve them from any device connected to the Internet. Virtualization is used alongside cloud computing to enable smarter, more efficient use of resources within an organization by simplifying maintenance and speeding up resource configuration. Cloud computing has shifted software support for large systems from a single-server model to a service-oriented model. When a massive number of files is stored on a cloud server, duplicate copies of the same file accumulate and the usable storage capacity shrinks. Since companies invest heavily in data storage, an efficient method is needed to manage duplication in big data. Deduplication is a mechanism used in cloud storage to avoid storing repeated copies of the same data, which increases the effective storage capacity. When data or keys are stored in encrypted form, however, ciphertext-based deduplication is needed to identify and remove duplicates. This survey examines a number of deduplication mechanisms and compares the techniques used to identify and eliminate duplicate files in cloud storage. Based on the results of the survey, we plan to develop an algorithm that compares and identifies duplicate data on a cloud server by improving the way chunks and hashes are created.
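The core mechanism described above works in three steps. Data is split into chunks, each chunk is hashed, and only chunks whose hash has not been seen before are stored. The Python sketch below illustrates this under stated assumptions. It uses content-defined chunking with a Gear-style rolling hash, the family of chunkers that FastCDC belongs to, and SHA-256 chunk fingerprints. The names `cdc_chunks` and `DedupStore` and all size parameters are hypothetical, chosen only for illustration. This is not the algorithm the authors plan to develop, and it does not cover the encrypted (ciphertext-based) case.

```python
import hashlib
import random
from typing import Dict, List

# Hypothetical chunking parameters, chosen only for illustration.
MIN_CHUNK = 2 * 1024           # never cut a chunk shorter than 2 KiB
MAX_CHUNK = 16 * 1024          # always cut once a chunk reaches 16 KiB
BOUNDARY_MASK = (1 << 12) - 1  # cut where the low 12 hash bits are all zero
MASK64 = (1 << 64) - 1

# Gear table: one pseudo-random 64-bit value per byte value. A fixed seed keeps
# chunk boundaries reproducible across runs (a real system would ship a constant table).
_gear_rng = random.Random(2020)
GEAR = [_gear_rng.getrandbits(64) for _ in range(256)]


def cdc_chunks(data: bytes) -> List[bytes]:
    """Split data into variable-size chunks with a Gear-style rolling hash.

    h is shifted left one bit per byte, so its low 12 bits depend only on the
    last 12 bytes read: cut points follow local content, not absolute offsets.
    """
    chunks: List[bytes] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        length = i - start + 1
        if length < MIN_CHUNK:
            continue
        if (h & BOUNDARY_MASK) == 0 or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk may be shorter than MIN_CHUNK
    return chunks


class DedupStore:
    """Toy in-memory store: each unique chunk is kept once, keyed by SHA-256."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}        # fingerprint -> chunk bytes
        self.recipes: Dict[str, List[str]] = {}   # file name -> ordered fingerprints

    def put(self, name: str, data: bytes) -> int:
        """Store a file and return the number of bytes that were actually new."""
        recipe: List[str] = []
        new_bytes = 0
        for chunk in cdc_chunks(data):
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.chunks:  # duplicates are referenced, not stored again
                self.chunks[fp] = chunk
                new_bytes += len(chunk)
            recipe.append(fp)
        self.recipes[name] = recipe
        return new_bytes

    def get(self, name: str) -> bytes:
        """Rebuild a file by concatenating its chunks in recipe order."""
        return b"".join(self.chunks[fp] for fp in self.recipes[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    rng = random.Random(7)
    v1 = bytes(rng.getrandbits(8) for _ in range(200_000))
    v2 = v1[:1000] + b"inserted" + v1[1000:]  # small edit near the start
    store = DedupStore()
    print("v1 new bytes:", store.put("v1", v1))
    print("v2 new bytes:", store.put("v2", v2))
    print("logical:", len(v1) + len(v2), "stored:", store.stored_bytes())
    assert store.get("v2") == v2
```

The design choice that matters here is content-defined cut points. With fixed-size chunks, inserting a few bytes near the start of a file would shift every later boundary and defeat deduplication for the rest of the file. Here, only the chunk containing the edit, and occasionally a neighbour, has to be stored again.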


Keywords: Cloud computing, Virtualization, Deduplication, Chunk


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of CSE, University College of Engineering, Tindivanam, India
