Building Autonomic Elements from Video-Streaming Servers

Cunha, Carlos

doi:10.1007/s10922-019-09503-1

Published: 16 July 2019

Volume 28, pages 160–192, (2020)

Journal of Network and Systems Management

Carlos Cunha (ORCID: orcid.org/0000-0002-2754-5401)

230 Accesses
1 Citation

Abstract

HTTP streaming is now the main approach for delivering video on the Internet. As a consequence, widely deployed HTTP infrastructures face new challenges posed by the sensitivity of video-streaming users to service quality degradation and by the specificities of video-streaming workloads. Performance issues are one main class of server-infrastructure problems that can significantly deteriorate the end users’ quality of experience (QoE), in proportion to the time they have already spent watching the videos. This paper addresses the development of autonomic HTTP streaming servers organized into Autonomic Elements (AEs), the building blocks of Autonomic Computing (AC) systems. AEs are structured using container-based virtualization and are provided with monitoring, failure prediction, failure diagnosis and repair features. These features are incorporated into SHStream, a self-healing framework developed by us. SHStream relies on online learning algorithms to build and evaluate classification models dynamically for the prediction and diagnosis of performance anomalies. The results of our experimental analysis show that: (1) failure prediction can be performed with approximately 98% recall and 99% precision; (2) the diagnosis activity can localize and identify the resource responsible for performance failures, without misclassifications; (3) the classifiers’ performance stabilizes after a small number of learning instances; and (4) container-based virtualization technologies enable recovery times shorter than 1 s through rebooting and shorter than 3 s using server migration techniques.
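
For readers less familiar with the two metrics quoted in result (1), the following minimal sketch shows how precision and recall are computed for a binary failure predictor. The label names, the function name and the sample data in the usage note are hypothetical and are not taken from the paper's experiments.

```python
from typing import Sequence, Tuple

# Hypothetical class labels, chosen only for this illustration.
PRE_FAILURE = "pre-failure"
NORMAL = "normal"


def precision_recall(actual: Sequence[str], predicted: Sequence[str]) -> Tuple[float, float]:
    """Return (precision, recall) for the PRE_FAILURE class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    pairs = list(zip(actual, predicted))
    # True positives: a pre-failure period that was predicted as such.
    tp = sum(1 for a, p in pairs if a == PRE_FAILURE and p == PRE_FAILURE)
    # False positives: a normal period flagged as pre-failure (false alarm).
    fp = sum(1 for a, p in pairs if a == NORMAL and p == PRE_FAILURE)
    # False negatives: a pre-failure period that was missed.
    fn = sum(1 for a, p in pairs if a == PRE_FAILURE and p == NORMAL)

    # Precision: share of raised alarms that were real.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of real pre-failure periods that were caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

With three real pre-failure periods of which two are caught, plus one false alarm, both metrics come out at 2/3. High recall means few missed failures; high precision means few unnecessary repair actions.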

Notes

An extension of the fail-stop model (the process or the system terminates upon any error or failure) [10] that takes performance faults into account.
A long period of time from fault activation to failure manifestation at the QoE level.
In online learning, models update continuously as each data point arrives.
Combinations of metrics’ values expected during normal periods.
HTTP HEAD requests issued every n seconds to assess the server responsiveness.
In failure prediction, one class value is either a normality pattern or a pre-failure pattern.
Concepts represent combinations of metric values associated with each classification outcome.
Number of requests being served.
Distribution of user requests over video objects.
Temporal dependency

References

Cisco Systems: Cisco visual networking index: forecast and methodology, 2014 to 2019. http://www.cisco.com/c/en/us/solutions/collateral/service-provider/ip-ngn-ip-next-generation-network/white_paper_c11-481360.html (2015). Accessed 20 May 2018
Sandvine Inc. ULC: Global internet phenomena report (2013)
Real time streaming protocol. http://tools.ietf.org/html/rfc2326 (1998). Accessed 20 May 2018
Rtp: A transport protocol for real-time applications. https://tools.ietf.org/html/rfc3550 (2003). Accessed 20 May 2018
Saxena, M., Sharan, U., Fahmy, S.: Analyzing video services in web 2.0: a global perspective. In: Proceedings of the 18th international workshop on network and operating systems support for digital audio and video, NOSSDAV ’08, pp. 39–44. ACM, New York (2008)
Pariag, D., Brecht, T., Harji, A., Buhr, P., Shukla, A., Cheriton, D.R.: Comparing the performance of web server architectures. In: Proceedings of the 2nd ACM SIGOPS/EuroSys European conference on computer systems 2007, EuroSys ’07, pp. 231–243. ACM, New York (2007)
Brecht, T., Pariag, D., Gammo, L.: accept()able strategies for improving web server performance. In: USENIX annual technical conference, general track, pp. 227–240 (2004)
Gill, P., Arlitt, M., Li, Z., Mahanti, A.: Youtube traffic characterization: a view from the edge. In: Proceedings of the 7th ACM SIGCOMM conference on internet measurement, IMC ’07, pp. 15–28. ACM, New York (2007)
Avizienis, A., Laprie, J.-C., Randell, B., Landwehr, C.: Basic concepts and taxonomy of dependable and secure computing. IEEE Trans. Dependable Secur. Comput. 1(1), 11–33 (2004)
Chandra, S., Chen, P.M.: How fail-stop are faulty programs? In: Twenty-eighth annual international symposium on fault-tolerant computing, 1998. Digest of papers, pp. 240–249 (1998)
Arpaci-Dusseau, R.H., Arpaci-Dusseau, A.C.: Fail-stutter fault tolerance. In: Proceedings of the eighth workshop on hot topics in operating systems, 2001, pp. 33–38 (2001)
Dobrian, F., Sekar, V., Awan, A., Stoica, I., Joseph, D., Ganjam, A., Zhan, J., Zhang, H.: Understanding the impact of video quality on user engagement. In: Proceedings of the ACM SIGCOMM conference, SIGCOMM ’11, pp. 362–373. ACM, New York (2011)
Avižienis, A., Laprie, J.C., Randell, B.: Dependability and its threats: a taxonomy. In: Jacquart, René (ed.) Building the information society. IFIP international federation for information processing, vol. 156, pp. 91–120. Springer, New York (2004)
Cherkasova, L., Ozonat, K., Mi, N., Symons, J., Smirni, E.: Anomaly? application change? or workload change? towards automated detection of application performance anomaly and change. In: IEEE international conference on dependable systems and networks (FTCS and DCC), 2008. DSN 2008, pp. 452–461 (2008)
Chen, M.Y., Kiciman, E., Fratkin, E., Fox, A., Brewer, E.: Pinpoint: problem determination in large, dynamic internet services. In: Proceedings of the international conference on dependable systems and networks, 2002. DSN 2002, pp. 595–604 (2002)
Gupta, M., Neogi, A., Agarwal, M.K., Kar, G.: Discovering dynamic dependencies in enterprise environments for problem determination. In: Brunner, M., Keller, A. (eds.) Self-managing distributed systems. lecture notes in computer science, vol. 2867, pp. 221–233. Springer, Berlin (2003)
Cohen, I., Goldszmidt, M., Kelly, T., Symons, J., Chase, J.S.: Correlating instrumentation data to system states: a building block for automated diagnosis and control. In: OSDI’04: symposium on operating systems design and implementation, pp. 16–16. USENIX Association, Berkeley (2004)
Grottke, M., Li, L., Vaidyanathan, K., Trivedi, K.S.: Analysis of software aging in a web server. IEEE Trans. Reliab. 55(3), 411–420 (2006)
Lei, L., Vaidyanathan, K., Trivedi, K.: An approach for estimation of software aging in a web server. In: International symposium on empirical software engineering, pp. 91–100 (2002)
Huang, Y., Kintala, C., Kolettis, N., Fulton, N.D.: Software rejuvenation: analysis, module and applications. In: Twenty-fifth international symposium on fault-tolerant computing, 1995. FTCS-25. Digest of papers, pp. 381–390 (1995)
Tan, Y., Nguyen, H., Shen, Z., Gu, X., Venkatramani, C., Rajan, D.: Prepare: predictive performance anomaly prevention for virtualized cloud systems. In: 2012 IEEE 32nd international conference on distributed computing systems (ICDCS), pp. 285–294 (2012)
Gu, X., Wang, H.: Online anomaly prediction for robust cluster systems. In: International conference on data engineering, pp. 1000–1011 (2009)
Ganek, A.G., Corbi, T.A.: The dawning of the autonomic computing era. IBM Syst. J. 42(1), 5–18 (2003)
Soltesz, S., Pötzl, H., Fiuczynski, M.E., Bavier, A., Peterson, L.: Container-based operating system virtualization: a scalable, high-performance alternative to hypervisors. In: Proceedings of the 2nd ACM SIGOPS/EuroSys European conference on computer systems, EuroSys ’07, pp. 275–287. ACM, New York (2007)
Liang, Y., Zhang, Y., Jette, M., Sivasubramaniam, A., Sahoo, R.: Bluegene/l failure analysis and prediction models. In: International conference on dependable systems and networks, 2006. DSN 2006, pp. 425–434 (2006)
Lou, J., Jiang, Y., Shen, Q., Shen, Z., Wang, Z., Wang, R.: Software reliability prediction via relevance vector regression. Neurocomputing 186, 66–73 (2016)
Pham, T.-T., Défago, X., Huynh, Q.-T.: Reliability prediction for component-based software systems: dealing with concurrent and propagating errors. Sci. Comput. Program. 97, 426–457 (2015). Special issue: selected papers from the 12th international conference on quality software (QSIC 2012)
Tan, Y., Gu, X., Wang, H.: Adaptive system anomaly prediction for large-scale hosting infrastructures. In: Proceedings of the 29th ACM SIGACT-SIGOPS symposium on principles of distributed computing, PODC ’10, pp. 173–182. ACM, New York (2010)
Sahoo, R.K., Oliner, A.J., Rish, I., Gupta, M., Moreira, J.E., Ma, S., Vilalta, R., Sivasubramaniam, A.: Critical event prediction for proactive management in large-scale computer clusters. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’03, pp. 426–435. ACM, New York (2003)
Hoffmann, G.A., Trivedi, K.S., Malek, M.: A best practice guide to resource forecasting for computing systems. IEEE Trans. Reliab. 56(4), 615–628 (2007)
Ibidunmoye, O., Rezaie, A.-R., Elmroth, E.: Adaptive anomaly detection in performance metric streams. IEEE Trans. Netw. Serv. Manag. 15(1), 217–231 (2018)
Salfner, F., Lenk, M., Malek, M.: A survey of online failure prediction methods. ACM Comput. Surv. 42(3), 10:1–10:42 (2010)
Kelly, T.: Detecting performance anomalies in global applications. In: Proceedings of the 2nd conference on real, large distributed systems, vol. 2, WORLDS’05, pp. 42–47. USENIX Association, Berkeley (2005)
Brown, A., Kar, G., Keller, A.: An active approach to characterizing dynamic dependencies for problem determination in a distributed environment. In: Proceedings of the IEEE/IFIP international symposium on integrated network management, pp. 377–390 (2001)
Jayathilaka, H., Krintz, C., Wolski, R.: Performance monitoring and root cause analysis for cloud-hosted web applications. In: Proceedings of the 26th international conference on world wide web, WWW ’17, pp. 469–478, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland (2017)
Patterson, D., Brown, A., Broadwell, P., Candea, G., Chen, M., Cutler, J., Enriquez, P., Fox, A., Kiciman, E., Merzbacher, M., Oppenheimer, D., Sastry, N., Tetzlaff, W., Traupman, J., Treuhaft, N.: Recovery oriented computing (roc): motivation, definition, techniques. Technical report, Berkeley, CA, USA (2002)
Candea, G., Fox, A.: Designing for high availability and measurability. In: Proceedings of the 1st workshop on evaluating and architecting system dependability (2001)
Candea, G., Fox, A.: Recursive restartability: turning the reboot sledgehammer into a scalpel. In: Proceedings of the eighth workshop on hot topics in operating systems, 2001, pp. 125–130 (2001)
Candea, G., Fox, A.: Crash-only software. In: Proceedings of the 9th conference on hot topics in operating systems, vol. 9, HOTOS’03, pp. 12–12, USENIX Association, Berkeley (2003)
Grottke, M., Kim, D.S., Mansharamani, R., Nambiar, M., Natella, R., Trivedi, K.S.: Recovery from software failures caused by mandelbugs. IEEE Trans. Reliab. 65(1), 70–87 (2016)
Sultan, F., Srinivasan, K., Iyer, D., Iftode, L.: Migratory tcp: connection migration for service continuity in the internet. In: Proceedings of the 22nd international conference on distributed computing systems, 2002, pp. 469–470 (2002)
Zhang, R., Abdelzaher, T.F., Stankovic, J.A.: Efficient tcp connection failover in web server clusters. In: INFOCOM 2004. twenty-third annual joint conference of the IEEE computer and communications societies, vol. 2, pp. 1219–1228 (2004)
Singh, K., Schulzrinne, H.: Failover, load sharing and server architecture in sip telephony. Comput. Commun. 30(5), 927–942 (2007)
Dobre, C., Pop, F., Cristea, V.: A virtualization-based approach to dependable service computing. Scalable Comput. Pract. Exp. 12(3), 337–350 (2011)
Tamura, Y., Sato, K., Kihara, S., Moriai, S.: Kemari: virtual machine synchronization for fault tolerance. In: Proceedings of the USENIX annual technical conference (Poster Session) (2008)
Bressoud, T.C., Schneider, F.B.: Hypervisor-based fault tolerance. ACM Trans. Comput. Syst. 14(1), 80–107 (1996)
Cully, B., Lefebvre, G., Meyer, D., Feeley, M., Hutchinson, N., Warfield, A.: Remus: high availability via asynchronous virtual machine replication. In: Proceedings of the 5th USENIX symposium on networked systems design and implementation, NSDI’08, pp. 161–174. USENIX Association, Berkeley (2008)
Cunha, C.A., Moura e Silva, L.: Shstream: Self-healing framework for http video-streaming. In: 2013 13th IEEE/ACM international symposium on cluster, cloud and grid computing (CCGrid), pp. 514–521 (2013)
Stockhammer, T.: Dynamic adaptive streaming over http: standards and design principles. In: Proceedings of the second annual ACM conference on Multimedia systems, MMSys ’11, pp. 133–144. ACM, New York (2011)
Sodagar, I.: The mpeg-dash standard for multimedia streaming over the internet. IEEE Multimedia 18(4), 62–67 (2011)
Feamster, N., Balakrishnan, H.: Packet loss recovery for streaming video. In: 12th international packet video workshop. Pittsburgh (2002)
Puri, R., Ramchandran, K.: Multiple description source coding using forward error correction codes. In: Conference record of the thirty-third asilomar conference on signals, systems, and computers, 1999, vol. 1, pp. 342–346 (1999)
Kephart, J.O., Chess, D.M.: The vision of autonomic computing. Computer 36(1), 41–50 (2003)
Padala, P., Zhu, X., Wang, Z., Singhal, S., Shin, K.G., et al.: Performance evaluation of virtualization technologies for server consolidation. HP laboratories technical report (2007)
Openvz. http://wiki.openvz.org/main_page. Accessed 20 May 2018
Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., Warfield, A.: Xen and the art of virtualization. SIGOPS Oper. Syst. Rev. 37(5), 164–177 (2003)
Hyperic system information gatherer (sigar). http://sourceforge.net/projects/sigar/files/. Accessed 20 May 2018
Bifet, A., Kirkby, R.: Data stream mining: a practical approach. Technical report, The University of Waikato (2009)
Gama, J., Medas, P., Rocha, R.: Forest trees for on-line data. In: Proceedings of the 2004 ACM symposium on applied computing, SAC ’04, pp. 632–636. ACM, New York (2004)
Witten, I.H., Frank, E., Hall, M.A.: Data mining: practical machine learning tools and techniques. Morgan Kaufmann, Cambridge (2011)
Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’00, pp. 71–80. ACM, New York (2000)
Agrawal, R., Imielinski, T., Swami, A.: Database mining: a performance perspective. IEEE Trans. Knowl. Data Eng. 5(6), 914–925 (1993)
Tumer, K., Ghosh, J.: Error correlation and error reduction in ensemble classifiers. Connect. Sci. 8(3–4), 385–404 (1996)
Littlestone, N., Warmuth, M.K.: The weighted majority algorithm. In: 30th Annual symposium on foundations of computer science, 1989, pp. 256–261 (1989)
Oza, N.C., Russell, S.: Online bagging and boosting. In: Artificial intelligence and statistics, pp. 105–112. Morgan Kaufmann, Cambridge (2001)
Freund, Y., Schapire, R.: A decision-theoretic generalization of on-line learning and an application to boosting. In: Vitányi, P. (ed.) Computational learning theory. lecture notes in computer science, vol. 904, pp. 23–37. Springer, Berlin (1995)
Breiman, L.: Bagging predictors. Mach. Learn. 24, 123–140 (1996)
Pfahringer, B., Holmes, G., Kirkby, R.: New options for hoeffding trees. In: Orgun, M., Thornton, J. (eds.) AI 2007: advances in artificial intelligence. Lecture notes in computer science, vol. 4830, pp. 90–99. Springer, Berlin (2007)
Kohavi, R., Kunz, C.: Option decision trees with majority votes. In: Proceedings of the fourteenth international conference on machine learning, ICML ’97, pp. 161–169, Morgan Kaufmann Publishers Inc, San Francisco (1997)
Kuncheva, L.I.: Classifier ensembles for changing environments. In: Roli, F., Kittler, J., Windeatt, T. (eds.) Multiple classifier systems. Lecture notes in computer science, vol. 3077, pp. 1–15. Springer, Berlin (2004)
Bifet, A., Gavalda, R.: Learning from time-changing data with adaptive windowing. In: SIAM international conference on data mining, pp. 443–448 (2007)
rsync. http://rsync.samba.org/. Accessed 20 May 2018
Mosberger, David, Jin, Tai: httperf—a tool for measuring web server performance. SIGMETRICS Perform. Eval. Rev. 26(3), 31–37 (1998)
García, R., Pañeda, X.G., García, V., Melendi, D., Vilas, M.: Statistical characterization of a real video on demand service: user behaviour and streaming-media workload analysis. Simul. Model. Pract. Theory 15(6), 672–689 (2007)
Sripanidkulchai, K., Maggs, B., Zhang, H.: An analysis of live streaming workloads on the internet. In: IMC ’04: Proceedings of the 4th ACM SIGCOMM conference on Internet measurement, pp. 41–54. ACM, New York (2004)
Finamore, A., Mellia, M., Munafò, M.M., Torres, R., Rao, S.G.: Youtube everywhere: impact of device and infrastructure synergies on user experience. In: Proceedings of the 2011 ACM SIGCOMM conference on internet measurement conference, IMC ’11, pp. 345–360. ACM, New York (2011)
Kang, X., Zhang, H., Jiang, G., Chen, H., Meng, X., Yoshihira, K.: Measurement, modeling, and analysis of internet video sharing site workload: A case study. In: IEEE international conference on web services, 2008. ICWS’08, pp. 278–285. IEEE, New York (2008)
Mori, T., Kawahara, R., Hasegawa, H., Shimogawa, S.: Characterizing traffic flows originating from large-scale video sharing services. In: Ricciato, F., Mellia, M., Biersack, E. (eds.) Traffic monitoring and analysis. Lecture notes in computer science, pp. 17–31. Springer, Berlin (2010)
Adhikari, V.K., Jain, S., Chen, Y., Zhang, Z.-L.: Vivisecting youtube: an active measurement study. In: Proceedings IEEE INFOCOM, 2012, pp. 2521–2525 (2012)
Summers, J., Brecht, T., Eager, D., Wong, B.: Methodologies for generating http streaming video workloads to evaluate web server performance. In: Proceedings of the 5th annual international systems and storage conference, SYSTOR ’12, pp. 2:1–2:12. ACM, New York (2012)
Standard performance evaluation corporation. Specweb2009 benchmark. http://www.spec.org/web2009 (2010). Accessed 20 May 2018
Stress tool. http://weather.ou.edu/~apw/projects/stress/. Accessed 20 May 2018
Hemminger, S.: Network emulation with netem. In: Linux Conf Au (2005)
Jiang, W., Schulzrinne, H.: Modeling of packet loss and delay and their effect on real-time multimedia service quality. In: Proceedings of NOSSDAV ’2000 (2000)
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to information retrieval, vol. 1. Cambridge University Press, Cambridge (2008)

Acknowledgements

This research was supported by FCT-Portugal under grant SFRH/BD/35784 and Center of Studies in Education, Technologies and Health of the Polytechnic Institute of Viseu.

Author information

Authors and Affiliations

Department of Informatics, Polytechnic Institute of Viseu, Viseu, Portugal
Carlos Cunha
Department of Informatics, University of Coimbra, Coimbra, Portugal
Carlos Cunha

Corresponding author

Correspondence to Carlos Cunha.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Cunha, C. Building Autonomic Elements from Video-Streaming Servers. J Netw Syst Manage 28, 160–192 (2020). https://doi.org/10.1007/s10922-019-09503-1

Received: 05 June 2018
Revised: 26 April 2019
Accepted: 06 July 2019
Published: 16 July 2019
Issue Date: January 2020
DOI: https://doi.org/10.1007/s10922-019-09503-1
