Encyclopedia of Big Data Technologies

2019 Edition
| Editors: Sherif Sakr, Albert Y. Zomaya

Scalable Big Data Privacy with MapReduce

  • Sibghat Ullah Bazai
  • Julian Jang-Jaccard
  • Xuyun Zhang
Reference work entry
DOI: https://doi.org/10.1007/978-3-319-77525-8_243


Processing big data to derive useful information has been in the spotlight in recent years, and numerous approaches have been proposed for analysing it. Data privacy remains a concern throughout this process, however, because the data may come from various sources and may contain sensitive personal information about individuals. Hadoop MapReduce is considered one of the most promising approaches to big data processing. This chapter provides an overview of the MapReduce environment, the privacy challenges that arise when data is processed in a MapReduce cluster, and the existing approaches researchers have adopted to mitigate these issues. We also provide future guidelines for anonymized data processing to ensure individual privacy in MapReduce.
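As a rough illustration of what anonymized processing in a MapReduce style can look like, the following minimal sketch simulates the map, shuffle, and reduce phases in plain Python rather than on a Hadoop cluster. It is not the chapter's method: the record layout (age, zip code, diagnosis), the generalization rules, the threshold `K`, and all function names are assumptions made for this example. The map step generalizes quasi-identifiers, and the reduce step suppresses any group with fewer than `K` records, in the spirit of k-anonymity through generalization and suppression.

```python
from collections import defaultdict

K = 2  # hypothetical anonymity threshold, chosen only for this example


def generalize(record):
    """Map step: coarsen the quasi-identifiers (age, zip code) of one record."""
    age, zipcode, diagnosis = record
    decade = (age // 10) * 10
    age_range = f"{decade}-{decade + 9}"   # e.g. 34 becomes "30-39"
    zip_prefix = zipcode[:3] + "**"        # e.g. "10115" becomes "101**"
    return (age_range, zip_prefix), diagnosis


def map_phase(records):
    """Emit one (generalized key, sensitive value) pair per input record."""
    for record in records:
        yield generalize(record)


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, k=K):
    """Reduce step: release a group only if it holds at least k records."""
    released = {}
    for key, values in groups.items():
        if len(values) >= k:
            released[key] = sorted(values)
    return released


def anonymize(records, k=K):
    """Run the three simulated phases in sequence."""
    return reduce_phase(shuffle(map_phase(records)), k)


# Illustrative, made-up records: (age, zip code, diagnosis).
RECORDS = [
    (34, "10115", "flu"),
    (37, "10117", "asthma"),
    (52, "20095", "diabetes"),
]

if __name__ == "__main__":
    print(anonymize(RECORDS))
```

With the sample records, the two patients in their thirties fall into the same generalized group and are released together, while the single record in the "50-59" group is suppressed because it would be unique. On a real cluster the shuffle would be performed by the framework, and the choice of generalization hierarchy and threshold would depend on the data set.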


Big data analytics is an emerging technology for finding new insights from large amounts of data. Processing and analyzing these large amounts of data require an extra set of tools and services....




Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Sibghat Ullah Bazai (2)
  • Julian Jang-Jaccard (1)
  • Xuyun Zhang (2), corresponding author

  1. Institute of Natural and Mathematical Sciences, Massey University, Auckland, New Zealand
  2. Department of Electrical and Computer Engineering, University of Auckland, Auckland, New Zealand