A Density-Biased Sampling Technique to Improve Cluster Representativeness

  • Ana Paula Appel
  • Adriano Arantes Paterlini
  • Elaine P. M. de Sousa
  • Agma J. M. Traina
  • Caetano Traina Jr.
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4702)

Abstract

The volume and complexity of data collected by modern applications have grown significantly, making both data manipulation and analysis increasingly costly. Sampling is a useful data reduction technique that brings the data down to a more manageable volume. Uniform sampling has been widely used, but in datasets with a skewed cluster distribution, biased sampling gives better results. This paper presents the Biased Box Sampling (BBS) algorithm, which aims to preserve the skewed tendency of the clusters in the original data. We also present experimental results obtained with the proposed BBS algorithm.
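
To make the contrast between uniform and density-biased sampling concrete, the following is a minimal sketch of a generic grid-based density-biased sampler. It is not the BBS algorithm, whose details are not given in this abstract; the function name, the fixed-size grid, and the `exponent` parameter are all assumptions chosen for illustration. Each point is kept with probability alpha / n^exponent, where n is the number of points in its grid cell, so exponent = 0 reduces to uniform sampling and larger values favour sparse regions.

```python
import random
from collections import defaultdict


def density_biased_sample(points, cell_size, target_size, exponent, seed=0):
    """Hypothetical grid-based density-biased sampler (illustration only, not BBS).

    A point in a cell holding n points is kept with probability
    alpha / n**exponent. exponent=0 is uniform sampling; exponent=1 aims for
    the same expected number of sampled points from every occupied cell.
    """
    rng = random.Random(seed)

    # Group points by the grid cell they fall into.
    cells = defaultdict(list)
    for p in points:
        key = tuple(int(c // cell_size) for c in p)
        cells[key].append(p)

    # Pick alpha so the expected sample size equals target_size
    # (before probabilities are clipped to 1.0 below).
    alpha = target_size / sum(len(m) ** (1 - exponent) for m in cells.values())

    sample = []
    for members in cells.values():
        keep_prob = min(1.0, alpha / len(members) ** exponent)
        sample.extend(p for p in members if rng.random() < keep_prob)
    return sample


# Usage: one dense cluster (1000 points) and one sparse cluster (20 points).
rng = random.Random(1)
dense = [(rng.uniform(0.1, 0.9), rng.uniform(0.1, 0.9)) for _ in range(1000)]
sparse = [(rng.uniform(5.1, 5.9), rng.uniform(5.1, 5.9)) for _ in range(20)]
data = dense + sparse

biased = density_biased_sample(data, cell_size=1.0, target_size=40, exponent=1.0)
kept_sparse = sum(1 for p in biased if p[0] > 5)
print(len(biased), kept_sparse)
```

In this toy setting with exponent = 1, the keep probability for the sparse cell is 1, so all 20 sparse points survive, while the dense cell is thinned to roughly 20 points in expectation. A uniform sample of the same expected size would keep fewer than one sparse point on average, which is the kind of small-cluster loss that biased sampling is meant to avoid.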

Keywords

Local Density, Leaf Node, Hash Table, Original Dataset, Sampling Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Ana Paula Appel (1)
  • Adriano Arantes Paterlini (1)
  • Elaine P. M. de Sousa (1)
  • Agma J. M. Traina (1)
  • Caetano Traina Jr. (1)

  1. Computer Science Department - ICMC, University of São Paulo at São Carlos, Brazil
