A Case for Understanding End-to-End Performance of Topic Detection and Tracking Based Big Data Applications in the Cloud

Wang, Meisong; Ranjan, Rajiv; Jayaraman, Prem Prakash; Strazdins, Peter; Burnap, Pete; Rana, Omer; Georgakopulos, Dimitrios

doi:10.1007/978-3-319-47063-4_33

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 169)

Included in the following conference series:

International Internet of Things Summit

Abstract

Big Data is revolutionizing nearly every aspect of our lives, from enterprises to consumers and from science to government. Meanwhile, cloud computing has emerged as a platform that can provide an effective and economical infrastructure for collecting and analyzing the big data produced by applications such as topic detection and tracking (TDT). The fundamental challenge is how to cost-effectively orchestrate big data applications such as TDT over existing cloud computing platforms so that they accomplish their analytic tasks while meeting performance Service Level Agreements (SLAs). In this paper, we propose a layered performance model for TDT big data analytic applications that takes into account big data characteristics and the data and event flow across myriad cloud software and hardware resources. We present preliminary results showing that the proposed model is effective for understanding the complex performance dependencies across multiple layers of TDT applications.

Author information

Authors and Affiliations

Australian National University, Canberra, ACT, 2601, Australia
Meisong Wang & Peter Strazdins
Newcastle University, Newcastle Upon Tyne, UK
Rajiv Ranjan
RMIT University, Melbourne, 3000, Australia
Prem Prakash Jayaraman & Dimitrios Georgakopulos
Cardiff University, Cardiff, UK
Pete Burnap & Omer Rana
CSIRO, Canberra, Australia
Rajiv Ranjan

Corresponding author

Correspondence to Rajiv Ranjan.

Editor information

Editors and Affiliations

IBM Research, Haifa, Israel
Benny Mandler
CONNECT Centre, Trinity College, University of Dublin, Dublin, Ireland
Johann Marquez-Barja
GTA/PEE-COPPE/DEL-Poli, Universidade Federal do Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil
Miguel Elias Mitre Campista
Faculty of Materials Science and Technology in Trnava, Slovak University of Technology in Bratislava, Trnava, Slovakia
Dagmar Cagáňová
Institut Télécom SudParis, Evry, France
Hakima Chaouchi
College of Communication and Information, University of Kentucky, Lexington, KY, USA
Sherali Zeadally
College of Technological Innovation, Zayed University, Dubai, United Arab Emirates
Mohamad Badra
University of Pisa, Pisa, Italy
Stefano Giordano
DICIEAMA Department, University of Messina, Messina, Italy
Maria Fazio
CREATE-NET, Trento, Italy
Andrey Somov
University of Trento, Trento, Italy
Radu-Laurentiu Vieriu

About this paper

Cite this paper

Wang, M. et al. (2016). A Case for Understanding End-to-End Performance of Topic Detection and Tracking Based Big Data Applications in the Cloud. In: Mandler, B., et al. Internet of Things. IoT Infrastructures. IoT360 2015. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 169. Springer, Cham. https://doi.org/10.1007/978-3-319-47063-4_33

DOI: https://doi.org/10.1007/978-3-319-47063-4_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47062-7
Online ISBN: 978-3-319-47063-4
eBook Packages: Computer Science, Computer Science (R0)
