Abstract
In both industry and academia, efficient use of data storage is a growing concern, because large volumes of duplicate data in the cloud waste storage space. This creates a need to explore and propose algorithms that improve the efficiency of cloud storage. Data deduplication is a technique that addresses this need by removing duplicate data. It is therefore important to study the existing state-of-the-art deduplication techniques in the literature that address the storage problem. This paper examines the research impact of data deduplication through a bibliometric analysis covering the period from 2010 to 2023. The analysis is based on a sample of 461 documents taken from the Scopus database. The bibliometric review is carried out with the Biblioshiny application, which is part of the bibliometrix package for R. The analysis covers annual scientific production, total citations per year, author and document citations, common key terms, the most relevant authors and sources, and trending topics in the field. The results are organized to help future researchers by indicating directions to explore. The findings show that, as research advances, experts are paying greater attention to the consequences of duplicate data in the cloud addressed by the data deduplication process, and that research goals are becoming more focused.
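As a rough illustration of what "removing duplicate data" means in practice, the following minimal sketch stores each distinct chunk of a byte stream only once and keeps an ordered list of fingerprints to rebuild the original. It is not taken from the paper or from any of the surveyed systems: the fixed chunk size, the use of SHA-256 as the fingerprint, and the function names `deduplicate` and `restore` are all assumptions chosen for brevity.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4):
    """Split data into fixed-size chunks and keep each distinct chunk once.

    Hypothetical helper for illustration only; production systems often
    use content-defined chunking rather than fixed-size chunks.
    """
    store = {}   # fingerprint -> chunk bytes (each unique chunk stored once)
    recipe = []  # ordered fingerprints needed to rebuild the original data
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)  # skip chunks already stored
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original bytes from the chunk store and the recipe."""
    return b"".join(store[fp] for fp in recipe)


data = b"ABCDABCDABCDWXYZ"
store, recipe = deduplicate(data)
stored_bytes = sum(len(chunk) for chunk in store.values())
print(len(data), stored_bytes)
```

In this toy input the chunk `ABCD` occurs three times but is stored once, so the space saved depends directly on how much of the data repeats at the chosen chunk granularity.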
Data Availability
Data are available from the corresponding author upon reasonable request.
Abbreviations
- PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
- CSV: Comma-Separated Values
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Author information
Contributions
All authors contributed to this article and qualify as co-authors. Conceptualization, methodology, and implementation: AG; draft preparation: CP; supervision, writing, review, and editing: PS; writing, review, and editing: NM and VM.
Ethics declarations
Conflict of interest
The authors affirm that they have no known financial or interpersonal conflicts that could have appeared to influence the research presented in this study.
Ethical Approval
All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national).
Human and Animal Rights
This article does not contain any studies with human or animal subjects performed by any of the authors.
About this article
Cite this article
Goel, A., Prabha, C., Sharma, P. et al. Emerging Research Trends in Data Deduplication: A Bibliometric Analysis from 2010 to 2023. Arch Computat Methods Eng (2024). https://doi.org/10.1007/s11831-024-10074-x
DOI: https://doi.org/10.1007/s11831-024-10074-x