Securing the Data Deduplication to Improve the Performance of Systems in the Cloud Infrastructure

Nishant N. Pachpor; Prakash S. Prasad

doi:10.1007/978-981-13-8253-6_5


Part of the book series: Asset Analytics (ASAN)


Abstract

Data duplication is a data quality problem in which the same record is stored multiple times in the same or in different database systems. It can lead to data redundancy, wasted cost, lost income, lower response rates, reduced ROI, damaged brand reputation, poor customer service, inefficiency and lost productivity, decreased user adoption, inaccurate reporting, less informed decisions, and poor business processes. Data deduplication, often termed intelligent compression or single-instance storage, counters this problem: it eliminates duplicate copies of information, which reduces storage overhead and improves various performance parameters. Recent studies on data deduplication have shown that moderate data redundancy exists in primary storage in the cloud infrastructure. Data deduplication can reduce this redundancy in the primary storage systems of a cloud architecture. This research work highlights the identified and established methods of data deduplication with respect to capacity and performance parameters. The authors propose a performance-oriented data (POD) deduplication scheme that improves the performance of the primary storage system in the cloud. In addition, a security analysis using an encryption technique is performed and demonstrated, to protect sensitive data after the deduplication process completes.
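
To make the idea of single-instance storage concrete, the following is a minimal sketch of hash-based chunk deduplication. It is not the POD scheme proposed in the chapter, and it omits the encryption step. The class name `DedupStore`, the fixed chunk size, and the choice of SHA-256 as the fingerprint are all assumptions made for illustration only.

```python
import hashlib

# Assumed fixed chunk size; the abstract does not specify one.
CHUNK_SIZE = 4096


class DedupStore:
    """Toy single-instance store (hypothetical, for illustration).

    Each unique chunk is kept once, keyed by its SHA-256 digest.
    A file is recorded as an ordered list of chunk digests.
    """

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes, stored once
        self.files = {}   # file name -> ordered list of digests

    def put(self, name, data):
        """Split data into fixed-size chunks and store only unseen chunks."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # setdefault keeps the existing copy if the digest is already known
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.files[name] = recipe

    def get(self, name):
        """Rebuild a file by concatenating its chunks in recorded order."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def logical_bytes(self):
        """Total size of all files as the user sees them."""
        return sum(len(self.chunks[d])
                   for recipe in self.files.values() for d in recipe)

    def physical_bytes(self):
        """Total size of the unique chunks actually stored."""
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore(chunk_size=4)
    store.put("a", b"AAAABBBBAAAA")
    store.put("b", b"AAAABBBB")
    print(store.logical_bytes(), store.physical_bytes())  # prints "20 8"
```

In this toy run the two files hold 20 logical bytes but only two distinct 4-byte chunks, so 8 bytes are physically stored. A real primary-storage system would also have to weigh the fingerprint lookup cost on the I/O path, which is the performance concern the abstract raises.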



Author information

Authors and Affiliations

P.I.E.T, Nagpur, India
Nishant N. Pachpor & Prakash S. Prasad


Corresponding author

Correspondence to Nishant N. Pachpor.

Editor information

Editors and Affiliations

Department of Applied Science and Engineering, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India
Millie Pant
Amity School of Engineering & Technology, Amity University Rajasthan, Jaipur, Rajasthan, India
Tarun K. Sharma
Department of Computer Science, Czech Technical University in Prague, Ostrava, Praha, Czech Republic
Sebastián Basterrech
Amity Institute of Information Technology, Amity University Rajasthan, Jaipur, Rajasthan, India
Chitresh Banerjee


About this chapter

Cite this chapter

Pachpor, N.N., Prasad, P.S. (2020). Securing the Data Deduplication to Improve the Performance of Systems in the Cloud Infrastructure. In: Pant, M., Sharma, T., Basterrech, S., Banerjee, C. (eds) Performance Management of Integrated Systems and its Applications in Software Engineering. Asset Analytics. Springer, Singapore. https://doi.org/10.1007/978-981-13-8253-6_5


DOI: https://doi.org/10.1007/978-981-13-8253-6_5
Published: 11 September 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-8252-9
Online ISBN: 978-981-13-8253-6
eBook Packages: Business and Management, Business and Management (R0)
