Abstract
Despite its importance in today’s Internet, network measurement was not an integral part of the original Internet architecture, i.e., there was (and still is) little native support for many essential measurement tasks. Targeting the inadequacy of counting/accounting capabilities of existing routers, many data streaming and sketching techniques have been proposed to estimate the important statistics of traffic going through a network link. Most of these techniques are, however, developed to track one specific statistic and/or answer a specific type of query. Since there are a large number of such statistics and queries of interest, it is very difficult, if not impossible, for network vendors and operators to implement and deploy data streaming/sketching solutions for all of them, due to router resource (memory, CPU, bus bandwidth, etc.) constraints.
In this paper, we propose a general-purpose solution that can not only answer a wide range of queries, but also answer types of queries that were not known a priori. In particular, we introduce the use of the Conditional Random Sampling (CRS) sketch data structure for succinctly capturing network traffic data between a set of nodes in the network. This sketch is a first step towards a “universal” sketch data structure, in the sense that it is not tied to the measurement of a single quantity. We show that the CRS sketch can compute unbiased estimates for any linear summary statistic in the intersection of a pair of traffic streams, e.g., traffic and flow matrix information, flow counts, and entropy. We present detailed experiments, using data collected at a tier-1 ISP, showing that our sketch estimates this wide range of statistics with fairly high accuracy.
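To make the idea of estimating a statistic over the intersection of two traffic streams concrete, here is a minimal Python illustration in the spirit of CRS. It is not the paper's implementation: the names `flow_id`, `build_sketch`, and `estimate_intersection`, the use of SHA-256 as a stand-in for a random permutation, and the 64-bit ID space `D` are all assumptions made for this example. Each node keeps the k flows with the smallest hashed IDs; for a pair of nodes, the flows below the smaller of the two boundaries behave like a random sample of the ID space, and the sampled sum is scaled up accordingly.

```python
import hashlib

D = 2 ** 64  # size of the hashed ID space (an assumption for this example)


def flow_id(key: str) -> int:
    """Map a flow key to a pseudo-random ID in [0, D).

    SHA-256 is a stand-in for the random permutation used by CRS.
    """
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def build_sketch(stream: dict[str, float], k: int) -> tuple[dict[int, float], int]:
    """Keep the k flows with the smallest hashed IDs.

    Returns (entries, bound): every flow of the stream whose ID is below
    `bound` is present in `entries`. If nothing was dropped, bound is D.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    hashed = sorted((flow_id(key), value) for key, value in stream.items())
    kept = hashed[:k]
    truncated = len(hashed) > k
    bound = kept[-1][0] + 1 if truncated else D
    return dict(kept), bound


def estimate_intersection(s1, s2, f=min):
    """Estimate the sum of f(x, y) over flows present in both streams."""
    (e1, b1), (e2, b2) = s1, s2
    ds = min(b1, b2)  # both streams are fully known for IDs below ds
    total = sum(f(e1[i], e2[i]) for i in e1 if i < ds and i in e2)
    return total * D / ds  # scale the sampled sum up to the whole ID space


# Hypothetical per-flow byte counts observed at an origin and a destination.
origin = {f"flow{i}": float(i % 7 + 1) for i in range(200)}
destination = {f"flow{i}": float(i % 5 + 1) for i in range(100, 300)}

so, sd = build_sketch(origin, 64), build_sketch(destination, 64)
print("estimated common flows:", estimate_intersection(so, sd, lambda x, y: 1.0))
print("estimated common volume:", estimate_intersection(so, sd))
```

Passing a different function `f` changes the statistic without rebuilding the sketches, which is the sense in which a single sketch can serve several queries. When k is at least the number of flows, nothing is dropped and the estimate equals the exact value.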
Copyright information
© 2011 IFIP International Federation for Information Processing
About this paper
Cite this paper
Zhao, H., Hua, N., Lall, A., Li, P., Wang, J., Xu, J. (2011). Towards a Universal Sketch for Origin-Destination Network Measurements. In: Altman, E., Shi, W. (eds) Network and Parallel Computing. NPC 2011. Lecture Notes in Computer Science, vol 6985. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24403-2_17
DOI: https://doi.org/10.1007/978-3-642-24403-2_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24402-5
Online ISBN: 978-3-642-24403-2
eBook Packages: Computer Science (R0)