Enhancing Dependability in Big Data Analytics Enterprise Pipelines

Zahid, Hira; Mahmood, Tariq; Ikram, Nassar

doi:10.1007/978-3-030-05345-1_23

Conference paper
First Online: 07 December 2018

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11342)

Abstract

Big Data Analytics (BDA) offers enterprises extensive opportunities to extract valuable information from data streams of high volume, velocity, and variety. However, the dynamics of BDA can lead to significant project failures owing to high-risk factors in data availability, reliability, integrity, security, and resilience. These are the key components of a dependable system and are strongly linked to BDA process execution. Specifically, the heterogeneity of big data sources, the diverse challenges of big data integration and processing, and a rapidly expanding landscape create the need for dependable big data systems capable of providing standard analytical solutions. In this paper, we propose the first dependable pipeline architecture for the BDA process. It has a layered front-end and back-end implementation, employs the standard lambda architecture in a DataOps analytical cycle, incorporates state-of-the-art tools that are all open source, and is coded entirely in standard Python to remove cross-platform implementation dependencies. We have implemented this architecture in five enterprise BDA projects, but we are unable to present implementation details and results due to space limitations.

Author information

Authors and Affiliations

Department of Computer Science, Institute of Business Administration, Karachi, Pakistan
Hira Zahid & Tariq Mahmood
Department of Computer Science, National University of Science and Technology, Islamabad, Pakistan
Nassar Ikram

Corresponding author

Correspondence to Tariq Mahmood.

Editor information

Editors and Affiliations

Guangzhou University, Guangzhou, China
Guojun Wang
Swinburne University of Technology, Melbourne, VIC, Australia
Jinjun Chen
St. Francis Xavier University, Antigonish, NS, Canada
Laurence T. Yang

About this paper

Cite this paper

Zahid, H., Mahmood, T., Ikram, N. (2018). Enhancing Dependability in Big Data Analytics Enterprise Pipelines. In: Wang, G., Chen, J., Yang, L. (eds) Security, Privacy, and Anonymity in Computation, Communication, and Storage. SpaCCS 2018. Lecture Notes in Computer Science, vol 11342. Springer, Cham. https://doi.org/10.1007/978-3-030-05345-1_23

DOI: https://doi.org/10.1007/978-3-030-05345-1_23
Published: 07 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05344-4
Online ISBN: 978-3-030-05345-1
eBook Packages: Computer Science, Computer Science (R0)