Fractal Mining
 Daniel Barbara,
 Ping Chen
Abstract
Self-similarity is the property of being invariant with respect to the scale used to look at the data set. Self-similarity can be measured using the fractal dimension. Fractal dimension is an important characteristic of many complex systems and can serve as a powerful representation technique. In this chapter, we present a new clustering algorithm based on the self-similarity properties of data sets, together with its applications to other areas of Data Mining, such as projected clustering and trend analysis. Clustering is a widely used knowledge discovery technique. The new algorithm, which we call Fractal Clustering (FC), places points incrementally in the cluster for which the change in the fractal dimension after adding the point is the least. This is a very natural way of clustering points, since points in the same cluster have a great degree of self-similarity among them (and much less self-similarity with respect to points in other clusters). FC requires one scan of the data, is incremental, and can be suspended at will, providing the best answer possible at that point. We show via experiments that FC deals effectively with large data sets, high dimensionality, and noise, and is capable of recognizing clusters of arbitrary shape.
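The placement rule described in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes a box-counting estimate of the fractal dimension, recomputes that estimate from scratch for every candidate cluster, and assumes each cluster has already been seeded with some points. The names `box_count`, `fractal_dimension`, `assign`, and the grid sizes in `CELLS` are hypothetical and chosen only for illustration.

```python
import math

# Grid cell sizes used for box counting (an assumption of this sketch).
CELLS = (0.5, 0.25, 0.125, 0.0625)

def box_count(points, cell):
    """Count the grid cells of side `cell` that contain at least one point."""
    return len({tuple(math.floor(x / cell) for x in p) for p in points})

def fractal_dimension(points, cells=CELLS):
    """Box-counting estimate: least-squares slope of log N(r) against log(1/r)."""
    xs = [math.log(1.0 / r) for r in cells]
    ys = [math.log(box_count(points, r)) for r in cells]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def assign(point, clusters, cells=CELLS):
    """Add `point` to the cluster whose estimated fractal dimension changes least.

    `clusters` is a list of non-empty lists of points; returns the chosen index.
    """
    best, best_change = None, math.inf
    for i, cluster in enumerate(clusters):
        before = fractal_dimension(cluster, cells)
        after = fractal_dimension(cluster + [point], cells)
        change = abs(after - before)
        if change < best_change:
            best, best_change = i, change
    clusters[best].append(point)
    return best

# Two seeded clusters: a line segment (dimension about 1) and a filled square (about 2).
line = [(i / 256, 0.0) for i in range(256)]
square = [(2 + i / 32, 2 + j / 32) for i in range(32) for j in range(32)]
clusters = [line, square]

print(assign((0.3, 0.0), clusters))    # -> 0, the point lies on the line
print(assign((2.51, 2.49), clusters))  # -> 1, the point lies inside the square
```

A point that falls inside an already occupied region of a cluster leaves its box counts unchanged, so the estimated dimension does not move. The same point added to the other cluster occupies a new cell at every scale and shifts the slope. Recomputing the counts from scratch is done here only for clarity and would not meet the one-scan behavior the abstract describes.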
 Title
 Fractal Mining
 Book Title
 Data Mining and Knowledge Discovery Handbook
 Book Part
 V
 Pages
 pp 627–647
 Copyright
 2005
 DOI
 10.1007/0-387-25465-X_28
 Print ISBN
 9780387244358
 Online ISBN
 9780387254654
 Publisher
 Springer US
 Copyright Holder
 Springer Science+Business Media, Inc.
 Keywords

 self-similarity
 clustering
 projected clustering
 trend analysis
 Editors

 Oded Maimon ^{(1)}
 Lior Rokach ^{(1)}
 Editor Affiliations

 1. Dept. of Industrial Engineering, Tel-Aviv University
 Authors

 Daniel Barbara ^{(2)}
 Ping Chen ^{(3)}
 Author Affiliations

 2. George Mason University, Fairfax, VA, 22030, USA
 3. University of Houston-Downtown, Houston, TX, 77002, USA