
Security Issues and Challenges in Big Data Analytics in Distributed Environment


Part of the book series: Scalable Computing and Communications (SCC)

Abstract

Data characterized by high volume, variety, velocity, variability, and veracity is termed Big Data. Big Data is usually processed in a distributed environment on a number of connected machines, and the applications that perform this processing are typically termed Big Data Analytics. However, the amount of sensitive data handled in typical Big Data Analytics has made these applications a prime target for malicious users, and processing large volumes of data across a distributed environment makes them an even more attractive target. In this chapter, we cover the security issues related to various aspects of Big Data Analytics. We show the vulnerabilities involved, as well as the situations in which these vulnerabilities arise. We focus on the security aspects that arise in practical environments in industries and organizations. Finally, we conclude the chapter by providing some security countermeasures, which may vary according to the situation and the type of attack.



Author information

Corresponding author

Correspondence to Mayank Swarnkar.


Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Swarnkar, M., Bhadoria, R.S. (2017). Security Issues and Challenges in Big Data Analytics in Distributed Environment. In: Mazumder, S., Singh Bhadoria, R., Deka, G. (eds) Distributed Computing in Big Data Analytics. Scalable Computing and Communications. Springer, Cham. https://doi.org/10.1007/978-3-319-59834-5_5


  • DOI: https://doi.org/10.1007/978-3-319-59834-5_5


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59833-8

  • Online ISBN: 978-3-319-59834-5

  • eBook Packages: Computer Science, Computer Science (R0)
