Journal of Network and Systems Management, Volume 27, Issue 1, pp 188–232

Joint Minimization of Monitoring Cost and Delay in Overlay Networks: Optimal Policies with a Markovian Approach

  • Sandrine Vaton
  • Olivier Brun
  • Maxime Mouchet
  • Pablo Belzarena
  • Isabel Amigo
  • Balakrishna J. Prabhu
  • Thierry Chonavel


Continuous monitoring of network resources enables better-informed resource allocation decisions but incurs overhead. We investigate the trade-off between monitoring costs and the benefits of accurate state information for a routing problem. In our approach, link delays are modeled by Markov chains or hidden Markov models. The current delay on a link can be obtained by actively monitoring that link at a fixed cost. At each time slot, the decision maker chooses a subset of links to monitor, with the objective of minimizing a linear combination of the long-run average delay and the monitoring cost. This decision problem is modeled as a Markov decision process whose solution is computed numerically. In addition, we prove that in simple settings minimizing the immediate monitoring cost and delay leads to a threshold policy on a filter that summarizes the information from past measurements. This lightweight method and the optimal policy are both tested on several use cases. On an overlay of 30 RIPE Atlas nodes, we demonstrate that when delays between nodes are modeled with hierarchical Dirichlet process hidden Markov models, we obtain delays close to those of the path that is always the best, with an extremely low monitoring effort.
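To make the idea of a threshold policy on a filter concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a deliberately simplified setting chosen for illustration: one link whose delay follows a fully observable-on-probe Markov chain, one alternative path with a fixed delay `d_ref`, and a myopic (immediate-cost) decision rule. The function names and the numeric values are hypothetical.

```python
def predict(belief, P):
    """Propagate the belief one slot ahead when the link is not probed: b' = b P."""
    n = len(belief)
    return [sum(belief[i] * P[i][j] for i in range(n)) for j in range(n)]


def myopic_should_monitor(belief, delays, d_ref, cost):
    """Compare the expected immediate cost with and without probing the link."""
    # Without probing: route on whichever path has the lower expected delay.
    blind = min(d_ref, sum(b * d for b, d in zip(belief, delays)))
    # With probing: pay the monitoring cost, then take the truly better path.
    informed = cost + sum(b * min(d_ref, d) for b, d in zip(belief, delays))
    return informed < blind


def step(belief, true_state, P, delays, d_ref, cost):
    """Apply the myopic rule for one slot; return (incurred cost, next belief)."""
    if myopic_should_monitor(belief, delays, d_ref, cost):
        incurred = cost + min(d_ref, delays[true_state])
        # The probe reveals the state, so the next belief is that row of P.
        next_belief = list(P[true_state])
    else:
        expected = sum(b * d for b, d in zip(belief, delays))
        # Route blindly; the realized delay depends on the hidden true state.
        incurred = d_ref if d_ref <= expected else delays[true_state]
        next_belief = predict(belief, P)
    return incurred, next_belief


# Hypothetical numbers: a two-state link (10 ms or 50 ms) against a fixed 30 ms path.
P = [[0.9, 0.1], [0.2, 0.8]]
delays = [10, 50]
d_ref, cost = 30, 2

for b in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
    print(b, myopic_should_monitor(b, delays, d_ref, cost))
```

With these assumed values the rule probes only when the belief is uncertain enough for the observation to be worth its cost, which is the threshold behaviour the abstract refers to. The optimal long-run policy would instead require solving the Markov decision process.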


Active monitoring; Routing overlays; Markov chains; Hidden Markov models; HDP-HMM; Markov decision processes; Sparse monitoring; Round trip times; RIPE Atlas



The authors would like to thank the STIC AmSud program which financially supports their collaboration through the PROVE Project (2016–2017). P. Belzarena was partially supported by CSIC, UDELAR (GRUPOS I+D, ARTES).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. IMT Atlantique, IRISA, UBL, Brest, France
  2. Facultad de Ingeniería, Universidad de la República, Montevideo, Uruguay
  3. CNRS, LAAS-CNRS, Université de Toulouse, Toulouse, France
  4. IMT Atlantique, Lab-STICC, UBL, Brest, France
