Abstract
What patterns can we find in bursty web traffic? On the web or the internet graph itself? How about the distribution of galaxies in the sky, or the distribution of a company’s customers in geographical space? How long should we expect a nearest-neighbor search to take when there are 100 attributes per patient or customer record? The traditional assumptions (uniformity, independence, Poisson arrivals, Gaussian distributions) often fail miserably. Should we give up trying to find patterns in such settings?
Self-similarity, fractals and power laws are extremely successful in describing real datasets (coastlines, river basins, stock prices, brain surfaces and communication-line noise, to name a few). We show some old and new successes, involving modeling of graph topologies (internet, web and social networks); modeling galaxy and video data; dimensionality reduction; and more.
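As a rough illustration of what a power law looks like in practice, the sketch below fits a straight line to data on log-log axes and reads the exponent off the slope. It is not taken from the chapter: the function name `loglog_slope` and the Zipf-like data are hypothetical, and plain least squares on logarithms is only one simple estimator, which can be biased on noisy real data.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope and intercept of log(y) against log(x).

    If y = c * x**a, then log(y) = log(c) + a * log(x), so the slope
    estimates the power-law exponent a and the intercept estimates log(c).
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical, noise-free Zipf-like data: frequency = 1000 / rank.
ranks = list(range(1, 101))
freqs = [1000.0 / r for r in ranks]

slope, intercept = loglog_slope(ranks, freqs)
print(f"estimated exponent: {slope:.3f}")
print(f"estimated constant: {math.exp(intercept):.1f}")
```

On this synthetic input the fitted exponent is close to -1 and the constant close to 1000, because the data were generated that way. Real rank-frequency or degree data would only follow the line approximately.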
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Faloutsos, C. (2003). Next Generation Data Mining Tools: Power Laws and Self-similarity for Graphs, Streams and Traditional Data. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds) Knowledge Discovery in Databases: PKDD 2003. Lecture Notes in Computer Science, vol 2838. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39804-2_3
DOI: https://doi.org/10.1007/978-3-540-39804-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20085-7
Online ISBN: 978-3-540-39804-2
eBook Packages: Springer Book Archive