Building Autonomic Elements from Video-Streaming Servers

Journal of Network and Systems Management

Abstract

HTTP streaming is now the main approach for delivering video on the Internet. As a consequence, widely deployed HTTP infrastructures face new challenges posed by the sensitivity of video-streaming users to service quality degradation and by the specific characteristics of video-streaming workloads. Performance issues are one main class of server-infrastructure problems that can significantly deteriorate end users’ quality of experience (QoE), with an impact proportional to the time users have already spent watching the videos. This paper addresses the development of autonomic HTTP streaming servers organized into Autonomic Elements (AEs), the building blocks of Autonomic Computing (AC) systems. AEs are structured using container-based virtualization and are provided with monitoring, failure prediction, failure diagnosis and repair features. These features are incorporated into SHStream, a self-healing framework that we developed. SHStream relies on online learning algorithms to build and evaluate classification models dynamically for the prediction and diagnosis of performance anomalies. The results of our experimental analysis show that: (1) failure prediction can be performed with approximately \(98\%\) recall and \(99\%\) precision; (2) the diagnosis activity can localize and identify the resource responsible for performance failures without misclassifications; (3) the classifiers’ performance stabilizes after a small number of learning instances; and (4) container-based virtualization technologies enable recovery times shorter than 1 s through rebooting and shorter than 3 s using server migration techniques.
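
The online evaluation of classification models mentioned in the abstract can be illustrated with a minimal sketch. The code below is not SHStream's implementation: the `OnlineCentroidClassifier` class, the `prequential_precision_recall` function, the two-metric feature vectors and the label names are hypothetical, and a toy nearest-centroid learner stands in for the classifiers used in the paper. It only shows the general test-then-train idea, in which each arriving instance first tests the current model and then updates it, while precision and recall for the pre-failure class are accumulated.

```python
from collections import defaultdict


class OnlineCentroidClassifier:
    """Toy online learner (an assumption, not the paper's algorithm):
    keeps a running mean per class and predicts the nearest centroid."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.means = {}

    def predict(self, x):
        # No prediction is possible before the first training instance.
        if not self.means:
            return None

        def squared_distance(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.means[label]))

        return min(self.means, key=squared_distance)

    def learn(self, x, label):
        # Incremental mean update: one instance at a time, no stored history.
        self.counts[label] += 1
        n = self.counts[label]
        if label not in self.means:
            self.means[label] = list(x)
        else:
            old = self.means[label]
            self.means[label] = [m + (v - m) / n for m, v in zip(old, x)]


def prequential_precision_recall(stream, positive="pre-failure"):
    """Test-then-train evaluation over (features, label) pairs."""
    model = OnlineCentroidClassifier()
    tp = fp = fn = 0
    for x, label in stream:
        predicted = model.predict(x)      # 1) test on the new instance
        if predicted is not None:
            if predicted == positive and label == positive:
                tp += 1
            elif predicted == positive:
                fp += 1
            elif label == positive:
                fn += 1
        model.learn(x, label)             # 2) then train on it
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical (cpu_load, io_wait) samples; values are made up.
    demo = [
        ((0.1, 0.2), "normal"),
        ((0.9, 0.8), "pre-failure"),
        ((0.2, 0.1), "normal"),
        ((0.8, 0.9), "pre-failure"),
    ]
    print(prequential_precision_recall(demo))
```

Because every instance is scored before the model has seen its label, the resulting precision and recall reflect how the model would have behaved on live data, which is why this scheme suits classifiers that are built dynamically.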

Notes

  1. An extension of the fail-stop model (the process or the system terminates upon any error or failure) [10] that takes performance faults into account.

  2. A long period of time from fault activation to failure manifestation at the QoE level.

  3. In online learning, models update continuously as each data point arrives.

  4. Combinations of metrics’ values expected during normal periods.

  5. HTTP HEAD requests issued every n seconds to assess the server responsiveness.

  6. In failure prediction, one class value is either a normality pattern or a pre-failure pattern.

  7. Concepts represent combinations of metric values associated with each classification outcome.

  8. Number of requests being served

  9. Distribution of user requests over video objects.

  10. Temporal dependency
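
The responsiveness probe described in note 5 can be sketched as follows. This is an illustration under stated assumptions rather than the monitoring code used in the paper: the function names `probe_head` and `monitor`, the timeout value and the returned `(status, latency)` tuple are hypothetical choices. Only standard-library calls are used.

```python
import time
import urllib.error
import urllib.request


def probe_head(url, timeout=2.0):
    """Issue one HTTP HEAD request and return (status, latency_seconds).

    status is None when the connection fails or no answer arrives
    within the timeout (hypothetical convention for this sketch).
    """
    request = urllib.request.Request(url, method="HEAD")
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code          # the server answered with an error status
    except (urllib.error.URLError, OSError):
        status = None              # unreachable server or timeout
    return status, time.monotonic() - start


def monitor(url, interval_s, rounds, probe=probe_head):
    """Probe the server every interval_s seconds, rounds times in total."""
    samples = []
    for i in range(rounds):
        samples.append(probe(url))
        if i < rounds - 1:
            time.sleep(interval_s)
    return samples
```

A caller might run `monitor("http://localhost:8080/video.mp4", interval_s=5, rounds=12)` (URL and values are placeholders) and treat missing statuses or rising latencies as signs of degraded responsiveness. The `probe` parameter exists only so the loop can be exercised without a network.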

References

  1. Cisco Systems: Cisco visual networking index: forecast and methodology, 2014 to 2019. http://www.cisco.com/c/en/us/solutions/collateral/service-provider/ip-ngn-ip-next-generation-network/white_paper_c11-481360.html (2015). Accessed 20 May 2018

  2. Sandvine Inc. ULC: Global internet phenomena report (2013)

  3. Real time streaming protocol. http://tools.ietf.org/html/rfc2326 (1998). Accessed 20 May 2018

  4. Rtp: A transport protocol for real-time applications. https://tools.ietf.org/html/rfc3550 (2003). Accessed 20 May 2018

  5. Saxena, M., Sharan, U., Fahmy, S.: Analyzing video services in web 2.0: a global perspective. In: Proceedings of the 18th international workshop on network and operating systems support for digital audio and video, NOSSDAV ’08, pp. 39–44. ACM, New York (2008)

  6. Pariag, D., Brecht, T., Harji, A., Buhr, P., Shukla, A., Cheriton, D.R.: Comparing the performance of web server architectures. In: Proceedings of the 2nd ACM SIGOPS/EuroSys European conference on computer systems 2007, EuroSys ’07, pp. 231–243. ACM, New York (2007)

  7. Brecht, T., Pariag, D., Gammo, L.: accept()able strategies for improving web server performance. In: USENIX annual technical conference, general track, pp. 227–240 (2004)

  8. Gill, P., Arlitt, M., Li, Z., Mahanti, A.: Youtube traffic characterization: a view from the edge. In: Proceedings of the 7th ACM SIGCOMM conference on internet measurement, IMC ’07, pp. 15–28. ACM, New York (2007)

  9. Avizienis, A., Laprie, J.-C., Randell, B., Landwehr, C.: Basic concepts and taxonomy of dependable and secure computing. IEEE Trans. Dependable Secur. Comput. 1(1), 11–33 (2004)

  10. Chandra, S., Chen, P.M.: How fail-stop are faulty programs? In: Twenty-eighth annual international symposium on fault-tolerant computing, 1998. Digest of papers, pp. 240–249 (1998)

  11. Arpaci-Dusseau, R.H., Arpaci-Dusseau, A.C.: Fail-stutter fault tolerance. In: Proceedings of the eighth workshop on hot topics in operating systems, 2001, pp. 33–38 (2001)

  12. Dobrian, F., Sekar, V., Awan, A., Stoica, I., Joseph, D., Ganjam, A., Zhan, J., Zhang, H.: Understanding the impact of video quality on user engagement. In: Proceedings of the ACM SIGCOMM conference, SIGCOMM ’11, pp. 362–373. ACM, New York (2011)

  13. Avižienis, A., Laprie, J.C., Randell, B.: Dependability and its threats: a taxonomy. In: Jacquart, Renè (ed.) Building the information society. IFIP international federation for information processing, vol. 156, pp. 91–120. Springer, New York (2004)

  14. Cherkasova, L., Ozonat, K., Mi, N., Symons, J., Smirni, E.: Anomaly? application change? or workload change? towards automated detection of application performance anomaly and change. In: IEEE international conference on dependable systems and networks (FTCS and DCC), 2008. DSN 2008, pp. 452–461 (2008)

  15. Chen, M.Y., Kiciman, E., Fratkin, E., Fox, A., Brewer, E.: Pinpoint: problem determination in large, dynamic internet services. In: Proceedings of the international conference on dependable systems and networks, 2002. DSN 2002, pp. 595–604 (2002)

  16. Gupta, M., Neogi, A., Agarwal, M.K., Kar, G.: Discovering dynamic dependencies in enterprise environments for problem determination. In: Brunner, M., Keller, A. (eds.) Self-managing distributed systems. lecture notes in computer science, vol. 2867, pp. 221–233. Springer, Berlin (2003)

  17. Cohen, I., Goldszmidt, M., Kelly, T., Symons, J., Chase, J.S.: Correlating instrumentation data to system states: a building block for automated diagnosis and control. In: OSDI’04: symposium on operating systems design and implementation, pp. 16–16. USENIX Association, Berkeley (2004)

  18. Grottke, M., Li, L., Vaidyanathan, K., Trivedi, K.S.: Analysis of software aging in a web server. IEEE Trans. Reliab. 55(3), 411–420 (2006)

  19. Lei, L., Vaidyanathan, K., Trivedi, K.: An approach for estimation of software aging in a web server. In: International symposium on empirical software engineering, pp. 91–100 (2002)

  20. Huang, Y., Kintala, C., Kolettis, N., Fulton, N.D.: Software rejuvenation: analysis, module and applications. In: Twenty-fifth international symposium on fault-tolerant computing, 1995. FTCS-25. Digest of papers, pp. 381–390 (1995)

  21. Tan, Y., Nguyen, H., Shen, Z., Gu, X., Venkatramani, C., Rajan, D.: Prepare: predictive performance anomaly prevention for virtualized cloud systems. In: 2012 IEEE 32nd international conference on distributed computing systems (ICDCS), pp. 285–294 (2012)

  22. Gu, X., Wang, H.: Online anomaly prediction for robust cluster systems. In: International conference on data engineering, pp. 1000–1011 (2009)

  23. Ganek, A.G., Corbi, T.A.: The dawning of the autonomic computing era. IBM Syst. J. 42(1), 5–18 (2003)

  24. Soltesz, S., Pötzl, H., Fiuczynski, M.E., Bavier, A., Peterson, L.: Container-based operating system virtualization: a scalable, high-performance alternative to hypervisors. In: Proceedings of the 2nd ACM SIGOPS/EuroSys European conference on computer systems, EuroSys ’07, pp. 275–287. ACM, New York (2007)

  25. Liang, Y., Zhang, Y., Jette, M., Sivasubramaniam, A., Sahoo, R.: Bluegene/l failure analysis and prediction models. In: International conference on dependable systems and networks, 2006. DSN 2006, pp. 425–434 (2006)

  26. Lou, J., Jiang, Y., Shen, Q., Shen, Z., Wang, Z., Wang, R.: Software reliability prediction via relevance vector regression. Neurocomputing 186, 66–73 (2016)

  27. Pham, T.-T., Défago, X., Huynh, Q.-T.: Reliability prediction for component-based software systems: dealing with concurrent and propagating errors. Science of computer programming, 97:426–457 (2015). Special issue: selected papers from the 12th international conference on quality software (QSIC 2012)

  28. Tan, Y., Gu, X., Wang, H.: Adaptive system anomaly prediction for large-scale hosting infrastructures. In: Proceedings of the 29th ACM SIGACT-SIGOPS symposium on principles of distributed computing, PODC ’10, pp. 173–182. ACM, New York (2010)

  29. Sahoo, R.K., Oliner, A.J., Rish, I., Gupta, M., Moreira, J.E., Ma, S., Vilalta, R., Sivasubramaniam, A.: Critical event prediction for proactive management in large-scale computer clusters. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’03, pp. 426–435. ACM, New York (2003)

  30. Hoffmann, G.A., Trivedi, K.S., Malek, M.: A best practice guide to resource forecasting for computing systems. IEEE Trans. Reliab. 56(4), 615–628 (2007)

  31. Ibidunmoye, O., Rezaie, A.-R., Elmroth, E.: Adaptive anomaly detection in performance metric streams. IEEE Trans. Netw. Serv. Manag. 15(1), 217–231 (2018)

  32. Salfner, F., Lenk, M., Malek, M.: A survey of online failure prediction methods. ACM Comput. Surv. 42(3), 10:1–10:42 (2010)

  33. Kelly, T.: Detecting performance anomalies in global applications. In: Proceedings of the 2nd conference on Real, large distributed systems, vol. 2, WORLDS’05, pp. 42–47. USENIX Association, Berkeley (2005)

  34. Brown, A., Kar, G., Keller, A.: An active approach to characterizing dynamic dependencies for problem determination in a distributed environment. In: Proceedings of the IEEE/IFIP international symposium on integrated network management, pp. 377–390 (2001)

  35. Jayathilaka, H., Krintz, C., Wolski, R.: Performance monitoring and root cause analysis for cloud-hosted web applications. In: Proceedings of the 26th international conference on world wide web, WWW ’17, pp. 469–478, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland (2017)

  36. Patterson, D., Brown, A., Broadwell, P., Candea, G., Chen, M., Cutler, J., Enriquez, P., Fox, A., Kiciman, E., Merzbacher, M., Oppenheimer, D., Sastry, N., Tetzlaff, W., Traupman, J., Treuhaft, N.: Recovery oriented computing (roc): motivation, definition, techniques. Technical report, Berkeley, CA, USA (2002)

  37. Candea, G., Fox, A.: Designing for high availability and measurability. In: Proceedings of the 1st workshop on evaluating and architecting system dependability (2001)

  38. Candea, G., Fox, A.: Recursive restartability: turning the reboot sledgehammer into a scalpel. In: Proceedings of the eighth workshop on hot topics in operating systems, 2001, pp. 125–130 (2001)

  39. Candea, G., Fox, A.: Crash-only software. In: Proceedings of the 9th conference on hot topics in operating systems, vol. 9, HOTOS’03, pp. 12–12, USENIX Association, Berkeley (2003)

  40. Grottke, M., Kim, D.S., Mansharamani, R., Nambiar, M., Natella, R., Trivedi, K.S.: Recovery from software failures caused by mandelbugs. IEEE Trans. Reliab. 65(1), 70–87 (2016)

  41. Sultan, F., Srinivasan, K., Iyer, D., Iftode, L.: Migratory tcp: connection migration for service continuity in the internet. In: Proceedings of the 22nd international conference on distributed computing systems, 2002, pp. 469–470 (2002)

  42. Zhang, R., Abdelzaher, T.F., Stankovic, J.A.: Efficient tcp connection failover in web server clusters. In: INFOCOM 2004. twenty-third annual joint conference of the IEEE computer and communications societies, vol. 2, pp. 1219–1228 (2004)

  43. Singh, K., Schulzrinne, H.: Failover, load sharing and server architecture in sip telephony. Comput. Commun. 30(5), 927–942 (2007)

  44. Dobre, C., Pop, F., Cristea, V.: A virtualization-based approach to dependable service computing. Scalable Comput. Pract. Exp. 12(3), 337–350 (2011)

  45. Tamura, Y., Sato, K., Kihara, S., Moriai, S.: Kemari: virtual machine synchronization for fault tolerance. In: Proceedings of the USENIX annual technical conference (Poster Session) (2008)

  46. Bressoud, T.C., Schneider, F.B.: Hypervisor-based fault tolerance. ACM Trans. Comput. Syst. 14(1), 80–107 (1996)

  47. Cully, B., Lefebvre, G., Meyer, D., Feeley, M., Hutchinson, N., Warfield, A.: Remus: high availability via asynchronous virtual machine replication. In: Proceedings of the 5th USENIX symposium on networked systems design and implementation, NSDI’08, pp. 161–174. USENIX Association, Berkeley (2008)

  48. Cunha, C.A., Moura e Silva, L.: Shstream: Self-healing framework for http video-streaming. In: 2013 13th IEEE/ACM international symposium on cluster, cloud and grid computing (CCGrid), pp. 514–521 (2013)

  49. Stockhammer, T.: Dynamic adaptive streaming over http: standards and design principles. In: Proceedings of the second annual ACM conference on Multimedia systems, MMSys ’11, pp. 133–144. ACM, New York (2011)

  50. Sodagar, I.: The mpeg-dash standard for multimedia streaming over the internet. IEEE Multimedia 18(4), 62–67 (2011)

  51. Feamster, N., Balakrishnan, H.: Packet loss recovery for streaming video. In: 12th international packet video workshop. Pittsburgh (2002)

  52. Puri, R., Ramchandran, K.: Multiple description source coding using forward error correction codes. In: Conference record of the thirty-third asilomar conference on signals, systems, and computers, 1999, vol. 1, pp. 342–346 (1999)

  53. Kephart, J.O., Chess, D.M.: The vision of autonomic computing. Computer 36(1), 41–50 (2003)

  54. Padala, P., Zhu, X., Wang, Z., Singhal, S., Shin, K.G., et al.: Performance evaluation of virtualization technologies for server consolidation. HP laboratories technical report (2007)

  55. Openvz. http://wiki.openvz.org/main_page. Accessed 20 May 2018

  56. Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., Warfield, A.: Xen and the art of virtualization. SIGOPS Oper. Syst. Rev. 37(5), 164–177 (2003)

  57. Hyperic system information gatherer (sigar). http://sourceforge.net/projects/sigar/files/. Accessed 20 May 2018

  58. Bifet, A., Kirkby, R.: Data stream mining: a practical approach. Technical report, The University of Waikato (2009)

  59. Gama, J., Medas, P., Rocha, R.: Forest trees for on-line data. In: Proceedings of the 2004 ACM symposium on applied computing, SAC ’04, pp. 632–636. ACM, New York (2004)

  60. Witten, I.H., Frank, E., Hall, M.A.: Data mining: practical machine learning tools and techniques. Morgan Kaufmann, Cambridge (2011)

  61. Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’00, pp. 71–80. ACM, New York, (2000)

  62. Agrawal, R., Imielinski, T., Swami, A.: Database mining: a performance perspective. IEEE Trans. Knowl. Data Eng. 5(6), 914–925 (1993)

  63. Tumer, K., Ghosh, J.: Error correlation and error reduction in ensemble classifiers. Connect. Sci. 8(3–4), 385–404 (1996)

  64. Littlestone, N., Warmuth, M.K.: The weighted majority algorithm. In: 30th Annual symposium on foundations of computer science, 1989, pp. 256–261 (1989)

  65. Oza, N.C., Russell, S.: Online bagging and boosting. In: Artificial intelligence and statistics, pp. 105–112. Morgan Kaufmann, Cambridge (2001)

  66. Freund, Y., Schapire, R.: A decision-theoretic generalization of on-line learning and an application to boosting. In: Vitányi, P. (ed.) Computational learning theory. lecture notes in computer science, vol. 904, pp. 23–37. Springer, Berlin (1995)

  67. Breiman, L.: Bagging predictors. Mach. Learn. 24, 123–140 (1996)

  68. Pfahringer, B., Holmes, G., Kirkby, R.: New options for hoeffding trees. In: Orgun, M., Thornton, J. (eds.) AI 2007: advances in artificial intelligence. Lecture notes in computer science, vol. 4830, pp. 90–99. Springer, Berlin (2007)

  69. Kohavi, R., Kunz, C.: Option decision trees with majority votes. In: Proceedings of the fourteenth international conference on machine learning, ICML ’97, pp. 161–169, Morgan Kaufmann Publishers Inc, San Francisco (1997)

  70. Kuncheva, L.I.: Classifier ensembles for changing environments. In: Roli, F., Kittler, J., Windeatt, T. (eds.) Multiple classifier systems. Lecture notes in computer science, vol. 3077, pp. 1–15. Springer, Berlin (2004)

  71. Bifet, A., Gavalda, R.: Learning from time-changing data with adaptive windowing. In: SIAM international conference on data mining, pp. 443–448 (2007)

  72. rsync. http://rsync.samba.org/. Accessed 20 May 2018

  73. Mosberger, D., Jin, T.: httperf—a tool for measuring web server performance. SIGMETRICS Perform. Eval. Rev. 26(3), 31–37 (1998)

  74. García, R., Pañeda, X.G., García, V., Melendi, D., Vilas, M.: Statistical characterization of a real video on demand service: user behaviour and streaming-media workload analysis. Simul. Model. Pract. Theory 15(6), 672–689 (2007)

  75. Sripanidkulchai, K., Maggs, B., Zhang, H.: An analysis of live streaming workloads on the internet. In: IMC ’04: Proceedings of the 4th ACM SIGCOMM conference on Internet measurement, pp. 41–54. ACM, New York (2004)

  76. Finamore, A., Mellia, M., Munafò, M.M., Torres, R., Rao, S.G.: Youtube everywhere: impact of device and infrastructure synergies on user experience. In: Proceedings of the 2011 ACM SIGCOMM conference on internet measurement conference, IMC ’11, pp. 345–360. ACM, New York (2011)

  77. Kang, X., Zhang, H., Jiang, G., Chen, H., Meng, X., Yoshihira, K.: Measurement, modeling, and analysis of internet video sharing site workload: A case study. In: IEEE international conference on web services, 2008. ICWS’08, pp. 278–285. IEEE, New York (2008)

  78. Mori, T., Kawahara, R., Hasegawa, H., Shimogawa, S.: Characterizing traffic flows originating from large-scale video sharing services. In: Ricciato, F., Mellia, M., Biersack, E. (eds.) Traffic monitoring and analysis. Lecture notes in computer science, pp. 17–31. Springer, Berlin (2010)

  79. Adhikari, V.K., Jain, S., Chen, Y., Zhang, Z.-L.: Vivisecting youtube: an active measurement study. In: Proceedings IEEE INFOCOM, 2012, pp. 2521–2525 (2012)

  80. Summers, J., Brecht, T., Eager, D., Wong, B.: Methodologies for generating http streaming video workloads to evaluate web server performance. In: Proceedings of the 5th annual international systems and storage conference, SYSTOR ’12, pp. 2:1–2:12. ACM, New York (2012)

  81. Standard performance evaluation corporation. Specweb2009 benchmark. http://www.spec.org/web2009 (2010). Accessed 20 May 2018

  82. Stress tool. http://weather.ou.edu/~apw/projects/stress/. Accessed 20 May 2018

  83. Hemminger, S.: Network emulation with netem. In Linux Conf Au (2005)

  84. Jiang, W., Schulzrinne, H.: Modeling of packet loss and delay and their effect on real-time multimedia service quality. In: Proceedings of NOSSDAV ’2000 (2000)

  85. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to information retrieval, vol. 1. Cambridge University Press, Cambridge (2008)

Acknowledgements

This research was supported by FCT-Portugal under grant SFRH/BD/35784 and Center of Studies in Education, Technologies and Health of the Polytechnic Institute of Viseu.

Author information

Corresponding author

Correspondence to Carlos Cunha.

About this article

Cite this article

Cunha, C. Building Autonomic Elements from Video-Streaming Servers. J Netw Syst Manage 28, 160–192 (2020). https://doi.org/10.1007/s10922-019-09503-1

  • DOI: https://doi.org/10.1007/s10922-019-09503-1
