Privacy Preserve Hadoop (PPH)—An Implementation of BIG DATA Security by Hadoop with Encrypted HDFS

Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 10)


As data grows exponentially rather than linearly, the rising abuse of large data sets underscores the need to preserve and protect data. Hadoop, a big data solution, has become increasingly popular and has been adopted across most industries. However, Hadoop by default does not contain any security mechanism. In particular, it does not support data encryption, which makes data privacy and security a cardinal concern. The most widely accepted way to preserve and protect data is through cryptographic algorithms, which are computationally intensive. Applying cryptography while distributing its processing through the MapReduce framework can improve the security of Hadoop. This paper presents two applications that distribute the cryptographic process among MapReduce jobs: the first handles encryption of an input file that resides in HDFS, and the second handles decryption of the encrypted file. Our experimental results compare the two cryptographic algorithms.
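The abstract states the idea of spreading encryption and decryption over MapReduce jobs but gives no mechanics. The Java sketch below is a hypothetical illustration under stated assumptions, not the authors' implementation:

- fixed-size byte chunks stand in for HDFS input splits;
- a parallel stream stands in for one map task per split;
- AES-GCM from the standard `javax.crypto` API is an assumed cipher, because the abstract does not name the two algorithms it compares.

The class `SplitCrypto` and its methods are invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical illustration, not the PPH code: each fixed-size chunk stands in for an
// HDFS input split and is encrypted or decrypted independently, as one map task per
// split would do. AES-GCM is an assumption; the abstract does not name its algorithms.
final class SplitCrypto {
    private static final int IV_BYTES = 12;   // 96-bit nonce, the usual size for GCM
    private static final int TAG_BITS = 128;  // length of the GCM authentication tag
    private static final SecureRandom RNG = new SecureRandom();

    // Encrypts one split with a fresh random IV; output layout is IV || ciphertext || tag.
    static byte[] encryptSplit(byte[] plain, SecretKey key) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RNG.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] body = cipher.doFinal(plain);
        byte[] out = new byte[IV_BYTES + body.length];
        System.arraycopy(iv, 0, out, 0, IV_BYTES);
        System.arraycopy(body, 0, out, IV_BYTES, body.length);
        return out;
    }

    // Decrypts one split; tag verification fails (AEADBadTagException) if it was modified.
    static byte[] decryptSplit(byte[] blob, SecretKey key) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, blob, 0, IV_BYTES));
        return cipher.doFinal(blob, IV_BYTES, blob.length - IV_BYTES);
    }

    // Cuts the input into fixed-size chunks (stand-ins for HDFS blocks / input splits).
    static List<byte[]> split(byte[] data, int splitSize) {
        List<byte[]> parts = new ArrayList<>();
        for (int off = 0; off < data.length; off += splitSize) {
            parts.add(Arrays.copyOfRange(data, off, Math.min(data.length, off + splitSize)));
        }
        return parts;
    }

    // Encryption "map phase": splits are processed in parallel; collect keeps split order.
    static List<byte[]> encryptAll(byte[] data, int splitSize, SecretKey key) {
        return split(data, splitSize).parallelStream().map(p -> {
            try {
                return encryptSplit(p, key);
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }).collect(Collectors.toList());
    }

    // Decrypts every split in parallel, then concatenates the plaintexts in split order.
    static byte[] decryptAll(List<byte[]> blobs, SecretKey key) {
        List<byte[]> plains = blobs.parallelStream().map(b -> {
            try {
                return decryptSplit(b, key);
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }).collect(Collectors.toList());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : plains) {
            out.write(p, 0, p.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        SecretKey key = gen.generateKey();
        byte[] data = "record-1\nrecord-2\nrecord-3\n".repeat(1000).getBytes(StandardCharsets.UTF_8);
        List<byte[]> encrypted = encryptAll(data, 4096, key);
        byte[] decrypted = decryptAll(encrypted, key);
        System.out.println(encrypted.size() + " splits, round trip ok: " + Arrays.equals(data, decrypted));
    }
}
```

Every split carries its own random IV and authentication tag, so any split can be decrypted without reading the others. The decryption side therefore parallelizes the same way as the encryption side. A modified split fails tag verification instead of silently decrypting to garbage. In an actual Hadoop job, the per-split logic would live inside a `Mapper`, and the key would need a secure distribution mechanism, which this sketch does not address.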


Keywords: Hadoop, Data security, Cryptography, Encrypted HDFS, MapReduce


Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

School of Computer Science and Engineering, Galgotias University, Greater Noida, India
