The Pruning Power: Theory and Heuristics for Mining Databases with Multiple k-Nearest-Neighbor Queries

  • Christian Böhm
  • Bernhard Braunmüller
  • Hans-Peter Kriegel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1874)

Abstract

Numerous data mining algorithms rely heavily on similarity queries. Although many or even all of the queries performed are independent of each other, the algorithms process them sequentially. Recently, a novel technique for efficiently processing multiple similarity queries issued simultaneously has been introduced. It was shown that multiple similarity queries substantially speed up query-intensive data mining applications. For the important case of multiple k-nearest-neighbor queries on top of a multidimensional index structure, the problem of scheduling directory pages and data pages arises. This aspect has not been addressed so far. In this paper, we derive the theoretical foundation of this scheduling problem. Additionally, we propose several scheduling algorithms based on our theoretical results. In our experimental evaluation, we show that considering the maximum priority of pages clearly outperforms other scheduling approaches.
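
To make the scheduling problem concrete, the following is a minimal Python sketch, not the algorithm from the paper. It assumes a flat set of data pages (no directory pages), Euclidean distance, and a hypothetical priority rule: a page's priority is the smallest box distance to any query that still needs it, so the page that is most urgent for some query is read first. The names `mindist` and `multi_knn` are illustrative. Each page read is shared by all queries that cannot yet prune the page.

```python
import heapq
import math


def mindist(q, lo, hi):
    """Smallest Euclidean distance from point q to the box [lo, hi]."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))


def multi_knn(pages, queries, k):
    """Answer several k-NN queries while reading each data page at most once.

    pages   : list of non-empty lists of points (tuples of floats)
    queries : list of query points
    Returns (results, pages_loaded); results[i] is a sorted list of
    (distance, point) pairs for queries[i].
    """
    # Bounding box of every page (assumed to be known without reading the page).
    boxes = [(tuple(map(min, zip(*pg))), tuple(map(max, zip(*pg)))) for pg in pages]
    heaps = [[] for _ in queries]  # per query: max-heap stored as (-dist, point)
    pending = set(range(len(pages)))
    loaded = 0

    def bound(i):
        # Current pruning distance of query i: its k-th best distance so far.
        return -heaps[i][0][0] if len(heaps[i]) == k else math.inf

    while pending:
        best = None  # (priority distance, page id, queries that need the page)
        for p in sorted(pending):
            need = [(mindist(q, *boxes[p]), i) for i, q in enumerate(queries)]
            need = [(d, i) for d, i in need if d < bound(i)]
            if not need:
                pending.discard(p)  # pruned for every query, never read
                continue
            d = min(need)[0]  # hypothetical rule: most urgent query decides
            if best is None or d < best[0]:
                best = (d, p, [i for _, i in need])
        if best is None:
            break
        _, p, users = best
        pending.discard(p)
        loaded += 1  # one physical read serves all queries in `users`
        for i in users:
            for pt in pages[p]:
                item = (-math.dist(queries[i], pt), pt)
                if len(heaps[i]) < k:
                    heapq.heappush(heaps[i], item)
                elif item > heaps[i][0]:
                    heapq.heappushpop(heaps[i], item)

    results = [sorted((-nd, pt) for nd, pt in h) for h in heaps]
    return results, loaded
```

With four small pages and two queries that each lie inside a different page, this sketch reads only two pages: the first read tightens the pruning distance of one query, the second that of the other, and the remaining pages are then pruned for both. Other priority rules can be tried by changing the single line that computes `d`.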

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Christian Böhm (1)
  • Bernhard Braunmüller (1)
  • Hans-Peter Kriegel (1)
  1. University of Munich, Munich, Germany
