Abstract
Data with high volume, variety, velocity, variability and veracity is termed Big Data. Big Data is usually processed in a distributed environment, where a number of connected machines support applications typically termed Big Data Analytics. However, the amount of sensitive data processed in typical Big Data Analytics has made these applications an attractive target for malicious users. Processing large volumes of data in a distributed environment also makes them attractive prey. In this chapter, we cover the security issues related to various aspects of Big Data Analytics. We show the vulnerabilities involved as well as the situations in which these vulnerabilities arise. We focus on the security aspects that arise in practical environments in industries and organizations. Finally, we conclude the chapter by providing some security countermeasures, which may vary according to the situation and the type of attack.
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Swarnkar, M., Bhadoria, R.S. (2017). Security Issues and Challenges in Big Data Analytics in Distributed Environment. In: Mazumder, S., Singh Bhadoria, R., Deka, G. (eds) Distributed Computing in Big Data Analytics. Scalable Computing and Communications. Springer, Cham. https://doi.org/10.1007/978-3-319-59834-5_5
DOI: https://doi.org/10.1007/978-3-319-59834-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59833-8
Online ISBN: 978-3-319-59834-5
eBook Packages: Computer Science, Computer Science (R0)