Eavesdropper: A Framework for Detecting the Location of the Processed Result in Hadoop
Hadoop has become increasingly popular because it processes big data rapidly in parallel, and a number of security mechanisms have been introduced or studied for it. Nevertheless, security issues that should not be neglected remain, and data leakage is one of the major challenges. This paper studies the vulnerability of the service authorization mechanisms in Hadoop and the resulting threat of information leakage. Because some authorization mechanisms allow all users to access services by default, an adversary can use these services to collect information about other users. We design and implement Eavesdropper, a framework that uses k-means clustering to locate the nodes that store the processed results. We conduct a comprehensive set of experiments, which demonstrate that our detection framework is capable of detecting the nodes that store the results.
Keywords: Hadoop, MapReduce, YARN, Security, Data leakage, k-means
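The abstract names k-means clustering as the technique used to locate result-storing nodes but does not say which features are clustered. As a minimal sketch only, the example below assumes hypothetical per-node measurements (disk bytes written and network bytes received during a job, in MB) and a hypothetical helper `candidate_nodes`; none of these names or values come from the paper. It runs plain Lloyd's k-means with k = 2 and flags the cluster with the larger centroid as the likely storage nodes.

```python
import random
from math import dist


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, labels


def candidate_nodes(metrics):
    """Hypothetical helper: return nodes in the cluster with the largest centroid."""
    names = sorted(metrics)
    points = [metrics[n] for n in names]
    centroids, labels = kmeans(points, k=2)
    busiest = max(range(2), key=lambda c: sum(centroids[c]))
    return [n for n, lab in zip(names, labels) if lab == busiest]


# Assumed example data: (disk MB written, network MB received) per node.
node_metrics = {
    "node1": (12.0, 8.0),
    "node2": (15.0, 10.0),
    "node3": (240.0, 180.0),
    "node4": (10.0, 9.0),
    "node5": (260.0, 175.0),
    "node6": (14.0, 7.0),
}

print(candidate_nodes(node_metrics))
```

With two well-separated groups like these, the nodes with heavy write and receive activity fall into their own cluster. Real measurements would likely need feature scaling and a less arbitrary choice of k.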
This work is supported by the National High Technology Research and Development Program (“863” Program) of China under Grant No. 2015AA016009, the National Natural Science Foundation of China under Grant No. 61232005, and the Science and Technology Program of Shenzhen, China under Grant No. JSGG20140516162852628.