Abstract
Data duplication is a data quality problem in which the same record is stored multiple times, either within a single database system or across several. Duplication can lead to data redundancy, wasted cost, lost income, lower response rates and ROI, damaged brand reputation, poor customer service, inefficiency and reduced productivity, decreased user adoption, inaccurate reporting, less informed decisions, and poor business processes. The problem can be countered with data deduplication, which is often termed intelligent compression or single-instance storage. Data deduplication eliminates duplicate copies of information, reducing storage overhead and improving various performance parameters. Recent studies on data deduplication have shown that data redundancy exists in primary storage in cloud infrastructure, and that it can be reduced in the primary storage system of a cloud architecture using data deduplication. This research work highlights the identified and established methods of data deduplication based on capacity and performance parameters. The authors propose a performance-oriented data (POD) deduplication scheme that improves the performance of the primary storage system in the cloud. In addition, a security analysis using an encryption technique is performed and demonstrated to protect sensitive data after the deduplication process completes.
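To make the idea of single-instance storage concrete, the following is a minimal sketch of chunk-level deduplication by content fingerprint. It is an illustration only and not the POD scheme described in the chapter: the class name `SingleInstanceStore`, the fixed-size chunking, the chunk size, and the choice of SHA-256 are all assumptions made for this example.

```python
import hashlib


class SingleInstanceStore:
    """Toy content-addressed store: each unique chunk is kept only once.

    Hypothetical illustration; fixed-size chunks and SHA-256 fingerprints
    are assumptions of this sketch, not details taken from the chapter.
    """

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}        # fingerprint -> chunk bytes (stored once)
        self.files = {}         # file name -> ordered list of fingerprints
        self.logical_bytes = 0  # bytes written by callers, duplicates included

    def put(self, name, data):
        """Split data into chunks and store only chunks not seen before."""
        fingerprints = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(fp, chunk)  # no-op if the chunk already exists
            fingerprints.append(fp)
        self.files[name] = fingerprints
        self.logical_bytes += len(data)

    def get(self, name):
        """Rebuild a file from its list of chunk fingerprints."""
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def physical_bytes(self):
        """Bytes actually held after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())


store = SingleInstanceStore(chunk_size=4)
store.put("a.txt", b"AAAABBBBAAAA")
store.put("b.txt", b"AAAACCCC")
print(store.logical_bytes, store.physical_bytes())  # 20 12
```

In this toy run, 20 logical bytes are held in 12 physical bytes because the chunk `AAAA` is stored once and referenced three times. A real primary-storage system would also have to weigh the fingerprint lookup cost on the I/O path, which is the kind of capacity versus performance trade-off the abstract refers to.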
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Pachpor, N. N., Prasad, P. S. (2020). Securing the Data Deduplication to Improve the Performance of Systems in the Cloud Infrastructure. In: Pant, M., Sharma, T., Basterrech, S., Banerjee, C. (eds) Performance Management of Integrated Systems and its Applications in Software Engineering. Asset Analytics. Springer, Singapore. https://doi.org/10.1007/978-981-13-8253-6_5
DOI: https://doi.org/10.1007/978-981-13-8253-6_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-8252-9
Online ISBN: 978-981-13-8253-6
eBook Packages: Business and Management (R0)