
Knowledge and Information Systems, Volume 6, Issue 6, pp 750–770

An Efficient Density-based Approach for Data Mining Tasks

  • Carlotta Domeniconi
  • Dimitrios Gunopulos

Abstract

We propose a locally adaptive technique to address the problem of setting the bandwidth parameters for kernel density estimation. Our technique is efficient and can be performed in only two dataset passes. We also show how to apply our technique to efficiently solve range query approximation, classification and clustering problems for very large datasets. We validate the efficiency and accuracy of our technique by presenting experimental results on a variety of both synthetic and real datasets.
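The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of locally adaptive bandwidths, not the authors' method. It uses the classical sample-point adaptive estimator in one dimension: a first step builds a fixed-bandwidth pilot estimate, and a second step shrinks the bandwidth where the pilot density is high and widens it where the density is low. The function names, the normal reference rule for the pilot bandwidth, and the sensitivity parameter alpha = 0.5 are all assumptions made for illustration. The pilot step here is a naive O(n²) computation and does not reproduce the two-pass efficiency claimed in the abstract.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def normal_reference_bandwidth(samples):
    """Global pilot bandwidth from the normal reference rule (an assumption).

    Requires at least two samples that are not all identical.
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return 1.06 * math.sqrt(var) * n ** (-1.0 / 5.0)


def fixed_kde(x, samples, h):
    """Fixed-bandwidth kernel density estimate at x."""
    n = len(samples)
    return sum(gaussian_kernel((x - s) / h) for s in samples) / (n * h)


def local_bandwidths(samples, h, alpha=0.5):
    """One bandwidth per sample: h scaled by (pilot / geometric mean)^(-alpha).

    alpha is a hypothetical sensitivity parameter; 0.5 is a common choice.
    """
    pilot = [fixed_kde(s, samples, h) for s in samples]
    # Geometric mean of the pilot densities keeps the overall scale near h.
    g = math.exp(sum(math.log(p) for p in pilot) / len(pilot))
    return [h * (p / g) ** (-alpha) for p in pilot]


def adaptive_kde(x, samples, bandwidths):
    """Adaptive estimate at x: each sample contributes with its own bandwidth."""
    n = len(samples)
    total = 0.0
    for s, h_i in zip(samples, bandwidths):
        total += gaussian_kernel((x - s) / h_i) / h_i
    return total / n


if __name__ == "__main__":
    data = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0]  # dense cluster plus one isolated point
    h = normal_reference_bandwidth(data)
    bws = local_bandwidths(data, h)
    for s, b in zip(data, bws):
        print(f"sample {s:4.1f}  bandwidth {b:.3f}")
    print("density at 0.2:", adaptive_kde(0.2, data, bws))
```

On this toy data the isolated point receives a wider kernel than the points in the cluster, which is the behaviour a locally adaptive scheme is meant to produce: fine resolution in dense regions and smoother estimates in sparse ones.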

Keywords

Bandwidth setting; Classification; Clustering; Kernel density estimation; Range query approximation



Copyright information

© Springer-Verlag 2004

Authors and Affiliations

  1. Information and Software Engineering Department, George Mason University, Fairfax, USA
  2. Computer Science Department, University of California, Riverside, USA
