Abstract
HTTP streaming is now the main approach for delivering video on the Internet. As a consequence, the widely deployed HTTP infrastructures face new challenges posed by the sensitivity of video-streaming users to service quality degradation and by the specific characteristics of video-streaming workloads. Performance issues are one main class of problems in the server infrastructure. They can significantly deteriorate the end-users’ quality of experience (QoE), with an impact proportional to the time users have already spent watching the videos. This paper addresses the development of autonomic HTTP streaming servers organized into Autonomic Elements (AEs), the building blocks of Autonomic Computing (AC) systems. AEs are structured using container-based virtualization and are provided with monitoring, failure prediction, failure diagnosis and repair features. These features are incorporated into SHStream, a self-healing framework that we developed. SHStream relies on online learning algorithms to build and evaluate classification models dynamically for the prediction and diagnosis of performance anomalies. The results of our experimental analysis show that: (1) failure prediction can be performed with approximately \(98\%\) recall and \(99\%\) precision; (2) the diagnosis activity can localize and identify the resource responsible for performance failures, without misclassifications; (3) the classifiers’ performance stabilizes after a small number of learning instances; and (4) container-based virtualization technologies enable recovery times shorter than 1 s through rebooting and shorter than 3 s using server migration techniques.
Notes
An extension of the fail-stop model (the process or the system terminates upon any error or failure) [10] that takes performance faults into account.
A long period of time from fault activation to failure manifestation at the QoE level.
In online learning, models update continuously as each data point arrives.
Combinations of metrics’ values expected during normal periods.
HTTP HEAD requests issued every n seconds to assess the server responsiveness.
In failure prediction, one class value is either a normality pattern or a pre-failure pattern.
Concepts represent combinations of metrics’ values associated with each classification outcome.
Number of requests being served.
Distribution of user requests over video objects.
Temporal dependency.
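The notes on online learning and on class values (normality versus pre-failure patterns) can be illustrated with a minimal sketch. It is not SHStream's implementation: the nearest-centroid learner, the test-then-train evaluation loop, the two-metric feature vectors and the names `OnlineCentroidClassifier` and `prequential_eval` are all assumptions chosen for brevity. The sketch only shows how a model can be updated as each data point arrives while precision and recall are tracked for the pre-failure class.

```python
from collections import defaultdict


class OnlineCentroidClassifier:
    """Toy online learner (hypothetical): one running mean vector per class."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.means = {}

    def predict(self, x):
        # Before any instance has been learned, fall back to "normal".
        if not self.means:
            return "normal"

        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.means[label]))

        # Pick the class whose running mean is closest to x.
        return min(self.means, key=sq_dist)

    def learn(self, x, label):
        # Incremental mean update: mean += (x - mean) / n.
        self.counts[label] += 1
        n = self.counts[label]
        mean = self.means.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            mean[i] += (value - mean[i]) / n


def prequential_eval(stream, positive="pre-failure"):
    """Test-then-train: predict each instance first, then learn from it."""
    model = OnlineCentroidClassifier()
    tp = fp = fn = 0
    for x, label in stream:
        predicted = model.predict(x)
        if predicted == positive and label == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif label == positive:
            fn += 1
        model.learn(x, label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up (CPU, memory) utilisation pairs, for illustration only.
    demo = [
        ((0.20, 0.30), "normal"),
        ((0.90, 0.80), "pre-failure"),
        ((0.25, 0.35), "normal"),
        ((0.85, 0.90), "pre-failure"),
    ]
    print(prequential_eval(demo))
```

Because every instance is used for testing before it is used for training, the reported precision and recall reflect how the model would have behaved on unseen data at each point in the stream.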
References
Cisco Systems: Cisco visual networking index: forecast and methodology, 2014 to 2019. http://www.cisco.com/c/en/us/solutions/collateral/service-provider/ip-ngn-ip-next-generation-network/white_paper_c11-481360.html (2015). Accessed 20 May 2018
Sandvine Inc. ULC: Global internet phenomena report (2013)
Real time streaming protocol. http://tools.ietf.org/html/rfc2326 (1998). Accessed 20 May 2018
Rtp: A transport protocol for real-time applications. https://tools.ietf.org/html/rfc3550 (2003). Accessed 20 May 2018
Saxena, M., Sharan, U., Fahmy, S.: Analyzing video services in web 2.0: a global perspective. In: Proceedings of the 18th international workshop on network and operating systems support for digital audio and video, NOSSDAV ’08, pp. 39–44. ACM, New York (2008)
Pariag, D., Brecht, T., Harji, A., Buhr, P., Shukla, A., Cheriton, D.R.: Comparing the performance of web server architectures. In: Proceedings of the 2nd ACM SIGOPS/EuroSys European conference on computer systems 2007, EuroSys ’07, pp. 231–243. ACM, New York (2007)
Brecht, T., Pariag, D., Gammo, L.: accept()able strategies for improving web server performance. In: USENIX annual technical conference, general track, pp. 227–240 (2004)
Gill, P., Arlitt, M., Li, Z., Mahanti, A.: Youtube traffic characterization: a view from the edge. In: Proceedings of the 7th ACM SIGCOMM conference on internet measurement, IMC ’07, pp. 15–28. ACM, New York (2007)
Avizienis, A., Laprie, J.-C., Randell, B., Landwehr, C.: Basic concepts and taxonomy of dependable and secure computing. IEEE Trans. Dependable Secur. Comput. 1(1), 11–33 (2004)
Chandra, S., Chen, P.M.: How fail-stop are faulty programs? In: Twenty-eighth annual international symposium on fault-tolerant computing, 1998. Digest of papers, pp. 240–249 (1998)
Arpaci-Dusseau, R.H., Arpaci-Dusseau, A.C.: Fail-stutter fault tolerance. In: Proceedings of the eighth workshop on hot topics in operating systems, 2001, pp. 33–38 (2001)
Dobrian, F., Sekar, V., Awan, A., Stoica, I., Joseph, D., Ganjam, A., Zhan, J., Zhang, H.: Understanding the impact of video quality on user engagement. In: Proceedings of the ACM SIGCOMM conference, SIGCOMM ’11, pp. 362–373. ACM, New York (2011)
Avižienis, A., Laprie, J.C., Randell, B.: Dependability and its threats: a taxonomy. In: Jacquart, R. (ed.) Building the information society. IFIP international federation for information processing, vol. 156, pp. 91–120. Springer, New York (2004)
Cherkasova, L., Ozonat, K., Mi, N., Symons, J., Smirni, E.: Anomaly? application change? or workload change? towards automated detection of application performance anomaly and change. In: IEEE international conference on dependable systems and networks (FTCS and DCC), 2008. DSN 2008, pp. 452–461 (2008)
Chen, M.Y., Kiciman, E., Fratkin, E., Fox, A., Brewer, E.: Pinpoint: problem determination in large, dynamic internet services. In: Proceedings of the international conference on dependable systems and networks, 2002. DSN 2002, pp. 595–604 (2002)
Gupta, M., Neogi, A., Agarwal, M.K., Kar, G.: Discovering dynamic dependencies in enterprise environments for problem determination. In: Brunner, M., Keller, A. (eds.) Self-managing distributed systems. lecture notes in computer science, vol. 2867, pp. 221–233. Springer, Berlin (2003)
Cohen, I., Goldszmidt, M., Kelly, T., Symons, J., Chase, J.S.: Correlating instrumentation data to system states: a building block for automated diagnosis and control. In: OSDI’04: symposium on operating systems design and implementation, pp. 16–16. USENIX Association, Berkeley (2004)
Grottke, M., Li, L., Vaidyanathan, K., Trivedi, K.S.: Analysis of software aging in a web server. IEEE Trans. Reliab. 55(3), 411–420 (2006)
Lei, L., Vaidyanathan, K., Trivedi, K.: An approach for estimation of software aging in a web server. In: International symposium on empirical software engineering, pp. 91–100 (2002)
Huang, Y., Kintala, C., Kolettis, N., Fulton, N.D.: Software rejuvenation: analysis, module and applications. In: Twenty-fifth international symposium on fault-tolerant computing, 1995. FTCS-25. Digest of papers, pp. 381–390 (1995)
Tan, Y., Nguyen, H., Shen, Z., Gu, X., Venkatramani, C., Rajan, D.: Prepare: predictive performance anomaly prevention for virtualized cloud systems. In: 2012 IEEE 32nd international conference on distributed computing systems (ICDCS), pp. 285–294 (2012)
Gu, X., Wang, H.: Online anomaly prediction for robust cluster systems. In: International conference on data engineering, pp. 1000–1011 (2009)
Ganek, A.G., Corbi, T.A.: The dawning of the autonomic computing era. IBM Syst. J. 42(1), 5–18 (2003)
Soltesz, S., Pötzl, H., Fiuczynski, M.E., Bavier, A., Peterson, L.: Container-based operating system virtualization: a scalable, high-performance alternative to hypervisors. In: Proceedings of the 2nd ACM SIGOPS/EuroSys European conference on computer systems, EuroSys ’07, pp. 275–287. ACM, New York (2007)
Liang, Y., Zhang, Y., Jette, M., Sivasubramaniam, A., Sahoo, R.: Bluegene/l failure analysis and prediction models. In: International conference on dependable systems and networks, 2006. DSN 2006, pp. 425–434 (2006)
Lou, J., Jiang, Y., Shen, Q., Shen, Z., Wang, Z., Wang, R.: Software reliability prediction via relevance vector regression. Neurocomputing 186, 66–73 (2016)
Pham, T.-T., Défago, X., Huynh, Q.-T.: Reliability prediction for component-based software systems: dealing with concurrent and propagating errors. Sci. Comput. Program. 97, 426–457 (2015). Special issue: selected papers from the 12th international conference on quality software (QSIC 2012)
Tan, Y., Gu, X., Wang, H.: Adaptive system anomaly prediction for large-scale hosting infrastructures. In: Proceedings of the 29th ACM SIGACT-SIGOPS symposium on principles of distributed computing, PODC ’10, pp. 173–182. ACM, New York (2010)
Sahoo, R.K., Oliner, A.J., Rish, I., Gupta, M., Moreira, J.E., Ma, S., Vilalta, R., Sivasubramaniam, A.: Critical event prediction for proactive management in large-scale computer clusters. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’03, pp. 426–435. ACM, New York (2003)
Hoffmann, G.A., Trivedi, K.S., Malek, M.: A best practice guide to resource forecasting for computing systems. IEEE Trans. Reliab. 56(4), 615–628 (2007)
Ibidunmoye, O., Rezaie, A.-R., Elmroth, E.: Adaptive anomaly detection in performance metric streams. IEEE Trans. Netw. Serv. Manag. 15(1), 217–231 (2018)
Salfner, F., Lenk, M., Malek, M.: A survey of online failure prediction methods. ACM Comput. Surv. 42(3), 10:1–10:42 (2010)
Kelly, T.: Detecting performance anomalies in global applications. In: Proceedings of the 2nd conference on Real, large distributed systems, vol. 2, WORLDS’05, pp. 42–47. USENIX Association, Berkeley (2005)
Brown, A., Kar, G., Keller, A.: An active approach to characterizing dynamic dependencies for problem determination in a distributed environment. In: Proceedings of the IEEE/IFIP international symposium on integrated network management, pp. 377–390 (2001)
Jayathilaka, H., Krintz, C., Wolski, R.: Performance monitoring and root cause analysis for cloud-hosted web applications. In: Proceedings of the 26th international conference on world wide web, WWW ’17, pp. 469–478, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland (2017)
Patterson, D., Brown, A., Broadwell, P., Candea, G., Chen, M., Cutler, J., Enriquez, P., Fox, A., Kiciman, E., Merzbacher, M., Oppenheimer, D., Sastry, N., Tetzlaff, W., Traupman, J., Treuhaft, N.: Recovery oriented computing (roc): motivation, definition, techniques. Technical report, Berkeley, CA, USA (2002)
Candea, G., Fox, A.: Designing for high availability and measurability. In: Proceedings of the 1st workshop on evaluating and architecting system dependability (2001)
Candea, G., Fox, A.: Recursive restartability: turning the reboot sledgehammer into a scalpel. In: Proceedings of the eighth workshop on hot topics in operating systems, 2001, pp. 125–130 (2001)
Candea, G., Fox, A.: Crash-only software. In: Proceedings of the 9th conference on hot topics in operating systems, vol. 9, HOTOS’03, pp. 12–12, USENIX Association, Berkeley (2003)
Grottke, M., Kim, D.S., Mansharamani, R., Nambiar, M., Natella, R., Trivedi, K.S.: Recovery from software failures caused by mandelbugs. IEEE Trans. Reliab. 65(1), 70–87 (2016)
Sultan, F., Srinivasan, K., Iyer, D., Iftode, L.: Migratory tcp: connection migration for service continuity in the internet. In: Proceedings of the 22nd international conference on distributed computing systems, 2002, pp. 469–470 (2002)
Zhang, R., Abdelzaher, T.F., Stankovic, J.A.: Efficient tcp connection failover in web server clusters. In: INFOCOM 2004. twenty-third annual joint conference of the IEEE computer and communications societies, vol. 2, pp. 1219–1228 (2004)
Singh, K., Schulzrinne, H.: Failover, load sharing and server architecture in sip telephony. Comput. Commun. 30(5), 927–942 (2007)
Dobre, C., Pop, F., Cristea, V.: A virtualization-based approach to dependable service computing. Scalable Comput. Pract. Exp. 12(3), 337–350 (2011)
Tamura, Y., Sato, K., Kihara, S., Moriai, S.: Kemari: virtual machine synchronization for fault tolerance. In: Proceedings of the USENIX annual technical conference (Poster Session) (2008)
Bressoud, T.C., Schneider, F.B.: Hypervisor-based fault tolerance. ACM Trans. Comput. Syst. 14(1), 80–107 (1996)
Cully, B., Lefebvre, G., Meyer, D., Feeley, M., Hutchinson, N., Warfield, A.: Remus: high availability via asynchronous virtual machine replication. In: Proceedings of the 5th USENIX symposium on networked systems design and implementation, NSDI’08, pp. 161–174. USENIX Association, Berkeley (2008)
Cunha, C.A., Moura e Silva, L.: Shstream: Self-healing framework for http video-streaming. In: 2013 13th IEEE/ACM international symposium on cluster, cloud and grid computing (CCGrid), pp. 514–521 (2013)
Stockhammer, T.: Dynamic adaptive streaming over http: standards and design principles. In: Proceedings of the second annual ACM conference on Multimedia systems, MMSys ’11, pp. 133–144. ACM, New York (2011)
Sodagar, I.: The mpeg-dash standard for multimedia streaming over the internet. IEEE Multimedia 18(4), 62–67 (2011)
Feamster, N., Balakrishnan, H.: Packet loss recovery for streaming video. In: 12th international packet video workshop. Pittsburgh (2002)
Puri, R., Ramchandran, K.: Multiple description source coding using forward error correction codes. In: Conference record of the thirty-third asilomar conference on signals, systems, and computers, 1999, vol. 1, pp. 342–346 (1999)
Kephart, J.O., Chess, D.M.: The vision of autonomic computing. Computer 36(1), 41–50 (2003)
Padala, P., Zhu, X., Wang, Z., Singhal, S., Shin, K.G., et al.: Performance evaluation of virtualization technologies for server consolidation. HP laboratories technical report (2007)
Openvz. http://wiki.openvz.org/main_page. Accessed 20 May 2018
Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., Warfield, A.: Xen and the art of virtualization. SIGOPS Oper. Syst. Rev. 37(5), 164–177 (2003)
Hyperic system information gatherer (sigar). http://sourceforge.net/projects/sigar/files/. Accessed 20 May 2018
Bifet, A., Kirkby, R.: Data stream mining: a practical approach. Technical report, The University of Waikato (2009)
Gama, J., Medas, P., Rocha, R.: Forest trees for on-line data. In: Proceedings of the 2004 ACM symposium on applied computing, SAC ’04, pp. 632–636. ACM, New York (2004)
Witten, I.H., Frank, E., Hall, M.A.: Data mining: practical machine learning tools and techniques. Morgan Kaufmann, Cambridge (2011)
Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’00, pp. 71–80. ACM, New York (2000)
Agrawal, R., Imielinski, T., Swami, A.: Database mining: a performance perspective. IEEE Trans. Knowl. Data Eng. 5(6), 914–925 (1993)
Tumer, K., Ghosh, J.: Error correlation and error reduction in ensemble classifiers. Connect. Sci. 8(3–4), 385–404 (1996)
Littlestone, N., Warmuth, M.K.: The weighted majority algorithm. In: 30th Annual symposium on foundations of computer science, 1989, pp. 256–261 (1989)
Oza, N.C., Russell, S.: Online bagging and boosting. In: Artificial intelligence and statistics, pp. 105–112. Morgan Kaufmann, Cambridge (2001)
Freund, Y., Schapire, R.: A decision-theoretic generalization of on-line learning and an application to boosting. In: Vitányi, P. (ed.) Computational learning theory. lecture notes in computer science, vol. 904, pp. 23–37. Springer, Berlin (1995)
Breiman, L.: Bagging predictors. Mach. Learn. 24, 123–140 (1996)
Pfahringer, B., Holmes, G., Kirkby, R.: New options for hoeffding trees. In: Orgun, M., Thornton, J. (eds.) AI 2007: advances in artificial intelligence. Lecture notes in computer science, vol. 4830, pp. 90–99. Springer, Berlin (2007)
Kohavi, R., Kunz, C.: Option decision trees with majority votes. In: Proceedings of the fourteenth international conference on machine learning, ICML ’97, pp. 161–169, Morgan Kaufmann Publishers Inc, San Francisco (1997)
Kuncheva, L.I.: Classifier ensembles for changing environments. In: Roli, F., Kittler, J., Windeatt, T. (eds.) Multiple classifier systems. Lecture notes in computer science, vol. 3077, pp. 1–15. Springer, Berlin (2004)
Bifet, A., Gavalda, R.: Learning from time-changing data with adaptive windowing. In: SIAM international conference on data mining, pp. 443–448 (2007)
rsync. http://rsync.samba.org/. Accessed 20 May 2018
Mosberger, D., Jin, T.: httperf—a tool for measuring web server performance. SIGMETRICS Perform. Eval. Rev. 26(3), 31–37 (1998)
García, R., Pañeda, X.G., García, V., Melendi, D., Vilas, M.: Statistical characterization of a real video on demand service: user behaviour and streaming-media workload analysis. Simul. Model. Pract. Theory 15(6), 672–689 (2007)
Sripanidkulchai, K., Maggs, B., Zhang, H.: An analysis of live streaming workloads on the internet. In: IMC ’04: Proceedings of the 4th ACM SIGCOMM conference on Internet measurement, pp. 41–54. ACM, New York (2004)
Finamore, A., Mellia, M., Munafò, M.M., Torres, R., Rao, S.G.: Youtube everywhere: impact of device and infrastructure synergies on user experience. In: Proceedings of the 2011 ACM SIGCOMM conference on internet measurement conference, IMC ’11, pp. 345–360. ACM, New York (2011)
Kang, X., Zhang, H., Jiang, G., Chen, H., Meng, X., Yoshihira, K.: Measurement, modeling, and analysis of internet video sharing site workload: A case study. In: IEEE international conference on web services, 2008. ICWS’08, pp. 278–285. IEEE, New York (2008)
Mori, T., Kawahara, R., Hasegawa, H., Shimogawa, S.: Characterizing traffic flows originating from large-scale video sharing services. In: Ricciato, F., Mellia, M., Biersack, E. (eds.) Traffic monitoring and analysis. Lecture notes in computer science, pp. 17–31. Springer, Berlin (2010)
Adhikari, V.K., Jain, S., Chen, Y., Zhang, Z.-L.: Vivisecting youtube: an active measurement study. In: Proceedings IEEE INFOCOM, 2012, pp. 2521–2525 (2012)
Summers, J., Brecht, T., Eager, D., Wong, B.: Methodologies for generating http streaming video workloads to evaluate web server performance. In: Proceedings of the 5th annual international systems and storage conference, SYSTOR ’12, pp. 2:1–2:12. ACM, New York (2012)
Standard performance evaluation corporation. Specweb2009 benchmark. http://www.spec.org/web2009 (2010). Accessed 20 May 2018
Stress tool. http://weather.ou.edu/~apw/projects/stress/. Accessed 20 May 2018
Hemminger, S.: Network emulation with netem. In: Linux Conf Au (2005)
Jiang, W., Schulzrinne, H.: Modeling of packet loss and delay and their effect on real-time multimedia service quality. In: Proceedings of NOSSDAV ’2000 (2000)
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to information retrieval, vol. 1. Cambridge University Press, Cambridge (2008)
Acknowledgements
This research was supported by FCT-Portugal under grant SFRH/BD/35784 and Center of Studies in Education, Technologies and Health of the Polytechnic Institute of Viseu.
Cite this article
Cunha, C. Building Autonomic Elements from Video-Streaming Servers. J Netw Syst Manage 28, 160–192 (2020). https://doi.org/10.1007/s10922-019-09503-1