Skip to main content
Log in

Scalable algorithms for estimating flow length distributions from sampled data

  • Published:
Computing Aims and scope Submit manuscript

Abstract

With the rapid growth of link speed, obtaining detailed traffic statistics becomes much more difficult. In order to reduce the resource consumption of measurement systems, more and more passive traffic measurement employs sampling at the packet level. Packet sampling has become an attractive and scalable method to measure flow data on high-speed links. However, knowing the length distributions of traffic flows passing through a network link is very useful for network operation and management. In this paper, we consider the problem of estimating flow length distributions based on sampled flow data. This paper introduces two algorithms for estimating flow length distributions, fitting estimation and factoring estimation. The fitting estimation uses piecewise Pareto distribution to fit the original traffic for a small sampling period. The factoring estimation is used for a large sampling period. It first factorizes the large sampling period into a product of some smaller integer factors, then iteratively invokes fitting estimation or other algorithms such as \({\textit{EM}}\) in the arranged order of the factors. Evaluations of the proposed algorithms on large Internet traces obtained from several sources demonstrate that they have high measurement accuracy with low computation overhead. The algorithms allow us to recover the complete flow length distributions.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3

Similar content being viewed by others

References

  1. Li T, Chen S, Ling Y (2012) Per-flow traffic measurement through randomized counter sharing. IEEE/ACM Trans Netw 20(5):1622–1634

    Article  Google Scholar 

  2. Hu C, Liu B, Wang S, Tian J, Cheng Y, Chen Y (2012) ANLS: adaptive non-linear sampling method for accurate flow size measurement. IEEE Trans commun 60(3):789–798

    Article  Google Scholar 

  3. Marold A, Lieven P, Scheuermann B (2012) Probabilistic parallel measurement of network traffic at multiple locations. IEEE Netw 26(1):6–12

    Article  Google Scholar 

  4. Yoon M, Li T, Chen S, Peir J (2011) Fit a compact spread estimator in small high-speed memory. IEEE/ACM Trans Netw 19(5):1253–1264

    Article  Google Scholar 

  5. Duffield N, Lund C, Thorup M (2005) Estimating flow distributions from sampled flow statistics. IEEE/ACM Trans Netw 13(5):933–945

    Article  MathSciNet  Google Scholar 

  6. Wojcik R, Jajszczyk A (2012) Flow oriented approaches to QoS assurance. ACM Comput Surv 44(1):5

    Article  Google Scholar 

  7. Hohn N, Veitch D (2006) Inverting sampled traffic. IEEE/ACM Trans Netw 14(1):68–80

    Article  Google Scholar 

  8. Kumar A, Xu J, Li L, Wang J (2004) Space code bloom filter for efficient traffic flow measurement. In: IEEE INFOCOM 2004, pp 1762–1773

  9. Kumar A, Sung M, Xu J, Wang J (2004) Data streaming algorithms for efficient and accurate estimation of flow size distribution. In: ACM SIGMETRICS 2004, pp 177–188

  10. Packet Sampling (psamp) (2005) http://www.ietf.org/html.charters/psamp-charter.html. Accessed 02 Feb 2005

  11. Duffield N, Grossglauser M (2001) Trajectory sampling for direct traffic observation. IEEE/ACM Trans Netw 9(3):280–292

    Article  Google Scholar 

  12. Duffield N, Grossglauser M (2004) Trajectory sampling with unreliable reporting. In: IEEE Infocom 2004, pp 1570–1581

  13. Sampled Cisco http://www.cisco.com/en/US/products/sw/iosswrel/ps1829/products_feature_guide09186a0080081201.html. Accessed 8 Dec 2002

  14. NeTraMet Version 4.4. http://www2.auckland.ac.nz/net/Accounting/ntm.Release.note.html. Accessed Dec 2002

  15. Duffield N, Lund C, Thorup M (2002) Properties and prediction of flow statistics from sampled packet streams. In: ACM SIGCOMM internet measurement, workshop 2002, November, pp 159–171

  16. Yang L, Michailidis G (2007) Sampled based estimation of network traffic flow characteristics. In: IEEE INFOCOM 2007, pp 1775–1783

  17. Jeff Wu CF (1983) On the convergence properties of the EM algorithm. Ann Stat. 11(1):95–103

    Google Scholar 

  18. Loiseau P, Gonçalves P, Girard S, Forbes F, Vicat-Blanc P (2009) Maximum likelihood estimation of the flow size distribution tail index from sampled packet data. In: ACM sigmetrics conference, Seattle, WA, USA, June 2009, pp 263–274

  19. Ribeiro B, Towsley D, Ye T, Bolot J (2006) Fisher information on sampled packets: an application to flow size estimation. In: ACM/SIGCOMM internet measurement conference, Rio de Janeiro, Oct 2006, pp 15–26

  20. Tune P, Veitch D (2008) Towards optimal sampling for flow size estimation. In: ACM IMC 2008, October

  21. Li K, Shen H (2005) Coordinated enroute multimedia object caching in transcoding proxies for tree networks. ACM Trans Multimed Comput Commun Appl 5(3):289–314

    Article  Google Scholar 

  22. Li K, Shen H, Chin F, Zheng S (2005) Optimal methods for coordinated enroute web caching for tree networks. ACM Trans Internet Technol 5(3):480–507

    Article  Google Scholar 

  23. Hardy GH, Wright EM (1979) An introduction to the theory of numbers, 5th edn. Oxford University Press, New York

    MATH  Google Scholar 

  24. WIDE (2012) http://tracer.csl.sony.co.jp/mawi/samplepoint-F/

  25. JSLAB (2012) http://ntds.njnet.edu.cn/data/index.php. Accessed Oct 2012

  26. NLANR (2012) ftp://wits.cs.waikato.ac.nz/pma/long/ipls/3/

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Wenyu Qu.

Additional information

This work is supported in part by National Natural Science Foundation of China under Grant No.61370198 and 61370199, the Scientific Research Fund of Liaoning Provincial Education Department under Grant No.L2013195, and the Fundamental Research Funds for the Central Universities under Grant No.3132013040 and 3132013335.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Liu, W., Qu, W. & Liu, Z. Scalable algorithms for estimating flow length distributions from sampled data. Computing 96, 527–543 (2014). https://doi.org/10.1007/s00607-014-0386-9

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00607-014-0386-9

Keywords

Mathematics Subject Classification (2010)

Navigation