Enhancing Dependability in Big Data Analytics Enterprise Pipelines

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11342)

Abstract

Big Data Analytics (BDA) gives enterprises extensive opportunities to extract valuable information from high-volume, high-velocity and high-variety data streams. However, the dynamics of BDA can lead to significant project failures because of high-risk factors in data availability, reliability, integrity, security and resilience. These are the key components of a dependable system and are strongly linked to BDA process execution. Specifically, the heterogeneity of big data sources, the diverse challenges of big data integration and processing, and a rapidly expanding landscape create the need for dependable big data systems capable of providing standard analytical solutions. In this paper, we propose the first dependable pipeline architecture for the BDA process. It has a layered front-end and back-end implementation, employs the standard lambda architecture in a DataOps analytical cycle, incorporates state-of-the-art tools that are all open-source, and is coded entirely in standard Python to remove cross-platform implementation dependencies. We have implemented this architecture in five enterprise BDA projects, but we are unable to present implementation details and results due to space limitations.
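For readers unfamiliar with the lambda architecture named in the abstract, the following is a minimal Python sketch of its three layers (batch, speed and serving). It is an illustration under stated assumptions, not the authors' implementation: the class `LambdaPipeline`, its method names and the in-memory counters are all hypothetical stand-ins for the distributed storage and processing tools a real pipeline would use.

```python
from collections import Counter


class LambdaPipeline:
    """Toy lambda architecture (hypothetical names, in-memory only)."""

    def __init__(self):
        # Master dataset: append-only log of every event ever ingested.
        self.master_dataset = []
        # Batch view: counts recomputed from the full master dataset.
        self.batch_view = Counter()
        # Speed view: incremental counts for events since the last batch run.
        self.speed_view = Counter()

    def ingest(self, event_key):
        """Append the event to the master dataset and update the speed layer."""
        self.master_dataset.append(event_key)
        self.speed_view[event_key] += 1

    def run_batch(self):
        """Batch layer: rebuild the batch view from scratch, then reset the speed view."""
        self.batch_view = Counter(self.master_dataset)
        self.speed_view.clear()

    def query(self, event_key):
        """Serving layer: merge the batch view with the real-time speed view."""
        return self.batch_view[event_key] + self.speed_view[event_key]


pipeline = LambdaPipeline()
for key in ["login", "login", "purchase"]:
    pipeline.ingest(key)
pipeline.run_batch()        # batch view now covers the first three events
pipeline.ingest("login")    # arrives after the batch run, held in the speed view
print(pipeline.query("login"))
```

Because the batch view is always rebuilt from the immutable master dataset, an error in the speed layer is corrected at the next batch run; this recomputation property is one reason the lambda architecture is often associated with fault tolerance.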

Author information

Corresponding author

Correspondence to Tariq Mahmood.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Zahid, H., Mahmood, T., Ikram, N. (2018). Enhancing Dependability in Big Data Analytics Enterprise Pipelines. In: Wang, G., Chen, J., Yang, L. (eds) Security, Privacy, and Anonymity in Computation, Communication, and Storage. SpaCCS 2018. Lecture Notes in Computer Science, vol 11342. Springer, Cham. https://doi.org/10.1007/978-3-030-05345-1_23

  • DOI: https://doi.org/10.1007/978-3-030-05345-1_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05344-4

  • Online ISBN: 978-3-030-05345-1

  • eBook Packages: Computer Science, Computer Science (R0)
