Towards Privacy for MapReduce on Hybrid Clouds Using Information Dispersal Algorithm
MapReduce is a powerful model for parallel data processing. The motivation of this work is to allow MapReduce jobs to run partially on untrusted infrastructures, such as public clouds and desktop grids, while using a trusted infrastructure, such as a private cloud, to ensure that no outsider can obtain the entire data set. Our idea is to break the data into meaningless chunks and spread them across a combination of public and private clouds, so that a compromise does not allow the attacker to reconstruct the whole data set. To realize this, we use an Information Dispersal Algorithm (IDA), which splits a file into pieces such that, when the pieces are carefully dispersed, a single node cannot reconstruct the data unless it collaborates with other nodes. We propose a protocol that allows MapReduce computing nodes to exchange data and perform IDA-aware MapReduce computation. We conduct experiments on the Grid’5000 testbed and report a performance evaluation of the prototype.
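To make the dispersal idea concrete, the following is a minimal sketch of a Rabin-style information dispersal scheme, not the protocol proposed in the paper. It assumes an m-of-n threshold (any m of the n fragments rebuild the data), arithmetic modulo the prime 257, and a Vandermonde encoding matrix; the names `disperse`, `reconstruct`, and `P` are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: a Rabin-style IDA over the prime field GF(257).
# The threshold scheme, field, and all names here are assumptions, not the
# authors' implementation.
P = 257  # smallest prime above 255, so every byte value is a field element


def _row(x, m):
    """Vandermonde row [1, x, x^2, ..., x^(m-1)] mod P."""
    return [pow(x, j, P) for j in range(m)]


def disperse(data, n, m):
    """Split data into n fragments such that any m of them rebuild it."""
    if not 1 <= m <= n < P:
        raise ValueError("need 1 <= m <= n < 257")
    padded = data + bytes((-len(data)) % m)  # zero-pad to a multiple of m
    blocks = [padded[i:i + m] for i in range(0, len(padded), m)]
    fragments = {}
    for x in range(1, n + 1):
        row = _row(x, m)
        # One field element per block: the dot product of the row and block.
        fragments[x] = [sum(a * b for a, b in zip(row, blk)) % P
                        for blk in blocks]
    return fragments, len(data)


def _solve(matrix, rhs):
    """Solve matrix * s = rhs (mod P) by Gauss-Jordan elimination."""
    m = len(matrix)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], P - 2, P)  # modular inverse (Fermat)
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(v - f * w) % P for v, w in zip(aug[r], aug[col])]
    return [aug[r][m] for r in range(m)]


def reconstruct(fragments, m, length):
    """Rebuild the original bytes from at least m fragments."""
    xs = sorted(fragments)[:m]
    if len(xs) < m:
        raise ValueError("not enough fragments to reconstruct")
    matrix = [_row(x, m) for x in xs]
    out = bytearray()
    for k in range(len(fragments[xs[0]])):
        out.extend(_solve(matrix, [fragments[x][k] for x in xs]))
    return bytes(out[:length])


if __name__ == "__main__":
    frags, length = disperse(b"hello hybrid cloud", n=5, m=3)
    subset = {x: frags[x] for x in (2, 4, 5)}  # any three fragments suffice
    print(reconstruct(subset, 3, length))
```

In this sketch a node holding fewer than m fragments cannot solve the linear system and so cannot rebuild the data on its own. Note that plain dispersal of this kind is not an encryption scheme, so individual fragments may still leak partial information about the input.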