Abstract
Big Data Analytics (BDA) offers enterprises extensive opportunities to extract valuable information from high-volume, high-velocity and high-variety data streams. However, BDA dynamics can lead to significant project failures because of high-risk factors in data availability, reliability, integrity, security and resilience, which are the key components of a dependable system and are strongly linked to BDA process execution. Specifically, the heterogeneity of big data sources, the diverse challenges of big data integration and processing, and a rapidly expanding landscape warrant dependable big data systems capable of providing standard analytical solutions. In this paper, we propose the first dependable pipeline architecture for the BDA process. It has a layered front-end and back-end implementation, employs the standard lambda architecture in a DataOps analytical cycle, incorporates state-of-the-art tools, all of which are open source, and is coded entirely in Python to remove cross-platform implementation dependencies. We have implemented this architecture in five enterprise BDA projects, but we are unable to present implementation details and results due to space limitations.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Zahid, H., Mahmood, T., Ikram, N. (2018). Enhancing Dependability in Big Data Analytics Enterprise Pipelines. In: Wang, G., Chen, J., Yang, L. (eds) Security, Privacy, and Anonymity in Computation, Communication, and Storage. SpaCCS 2018. Lecture Notes in Computer Science(), vol 11342. Springer, Cham. https://doi.org/10.1007/978-3-030-05345-1_23
DOI: https://doi.org/10.1007/978-3-030-05345-1_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05344-4
Online ISBN: 978-3-030-05345-1
eBook Packages: Computer Science, Computer Science (R0)