Abstract
Big data applications typically require a large number of clusters running in parallel to process data quickly and efficiently. This processing is typically controlled and managed by MapReduce. In MapReduce operations, Mappers transform the original input key/value pairs into a set of intermediate key/value pairs, while Reducers aggregate a set of intermediate values, compute the result, and write it to the output. The output, however, can raise serious privacy concerns. First, the output can directly leak sensitive information because it contains the global view of the final computation. Second, the output can indirectly leak information via composite attacks, in which the adversary links it with public information published through other sources such as Facebook or Twitter. To address these privacy concerns, we propose a privacy-preserving platform that prevents privacy leakage in MapReduce. Our platform can be plugged into the Reduce phase to sanitize the final output so that privacy is preserved while high data utility is retained. We demonstrate the feasibility of our platform through empirical studies and highlight that our proposal can be used in real-life applications.
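The Mapper, Reducer, and output-sanitizing roles described above can be illustrated with a minimal in-memory sketch. It is not the platform proposed in the paper: the abstract does not state which sanitization technique is used, so the sketch assumes a simple suppression rule (drop any output group with fewer than `k` contributing records) as a stand-in. The names `mapper`, `reducer`, `sanitize`, `run_job`, and the sample records are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def mapper(record: Tuple[str, str]) -> Iterable[Tuple[str, int]]:
    """Turn one input (user_id, city) pair into an intermediate (city, 1) pair."""
    _user_id, city = record
    yield (city, 1)


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Aggregate all intermediate values that share a key."""
    return (key, sum(values))


def sanitize(output: Dict[str, int], k: int) -> Dict[str, int]:
    """Assumed stand-in sanitizer: suppress groups with fewer than k records."""
    return {key: count for key, count in output.items() if count >= k}


def run_job(records: Iterable[Tuple[str, str]], k: int) -> Dict[str, int]:
    """Run map, shuffle, and reduce in memory, then sanitize the final output."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    reduced = dict(reducer(key, values) for key, values in groups.items())
    return sanitize(reduced, k)  # sanitization hooks in after the Reduce phase


# Hypothetical input: one record per user.
records = [
    ("u1", "Auckland"),
    ("u2", "Auckland"),
    ("u3", "Auckland"),
    ("u4", "Quetta"),
]
result = run_job(records, k=2)
```

With `k=2`, the group containing a single record is withheld from the output, which shows the trade-off the abstract refers to: a larger `k` hides more small groups but also reduces data utility.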
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Bazai, S.U., Jang-Jaccard, J., Zhang, X. (2017). A Privacy Preserving Platform for MapReduce. In: Batten, L., Kim, D., Zhang, X., Li, G. (eds) Applications and Techniques in Information Security. ATIS 2017. Communications in Computer and Information Science, vol 719. Springer, Singapore. https://doi.org/10.1007/978-981-10-5421-1_8
DOI: https://doi.org/10.1007/978-981-10-5421-1_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5420-4
Online ISBN: 978-981-10-5421-1
eBook Packages: Computer Science, Computer Science (R0)