
3D Shape Histograms for Similarity Search and Classification in Spatial Databases

  • Mihael Ankerst
  • Gabi Kastenmüller
  • Hans-Peter Kriegel
  • Thomas Seidl
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1651)

Abstract

Classification is one of the basic tasks of data mining in modern database applications including molecular biology, astronomy, mechanical engineering, medical imaging, and meteorology. The underlying models have to consider spatial properties such as shape or extension as well as thematic attributes. We introduce 3D shape histograms as an intuitive and powerful similarity model for 3D objects. Particular flexibility is provided by using quadratic form distance functions in order to account for errors of measurement, sampling, and numerical rounding, all of which may result in small displacements and rotations of shapes. For query processing, a general filter-refinement architecture is employed that efficiently supports similarity search based on quadratic forms. An experimental evaluation in the context of molecular biology demonstrates both the high classification accuracy of more than 90% and the good performance of the approach.
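To illustrate why a quadratic form distance tolerates small displacements, the following is a minimal sketch and not the authors' implementation. It assumes a one-dimensional histogram whose bins have scalar centers, and a similarity matrix with entries exp(-sigma * |c_i - c_j|). The weighting scheme, the parameter `sigma`, and all function names are assumptions made for this illustration.

```python
import math


def similarity_matrix(bin_centers, sigma=1.0):
    """Assumed weighting: A[i][j] = exp(-sigma * |c_i - c_j|).

    Bins whose centers are close get a weight near 1, distant bins near 0.
    """
    return [
        [math.exp(-sigma * abs(ci - cj)) for cj in bin_centers]
        for ci in bin_centers
    ]


def quadratic_form_distance(p, q, A):
    """Compute sqrt((p - q)^T A (p - q)) for two histograms p and q."""
    diff = [pi - qi for pi, qi in zip(p, q)]
    n = len(diff)
    total = sum(diff[i] * A[i][j] * diff[j] for i in range(n) for j in range(n))
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(total, 0.0))


def euclidean_distance(p, q):
    """Plain Euclidean distance, which treats all bins as unrelated."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


# Hypothetical four-bin histograms: all mass in one bin.
centers = [0.0, 1.0, 2.0, 3.0]
A = similarity_matrix(centers)
query = [1.0, 0.0, 0.0, 0.0]
shifted_one_bin = [0.0, 1.0, 0.0, 0.0]
shifted_three_bins = [0.0, 0.0, 0.0, 1.0]

d_near = quadratic_form_distance(query, shifted_one_bin, A)
d_far = quadratic_form_distance(query, shifted_three_bins, A)
```

In this toy setting the Euclidean distance cannot tell the two shifted histograms apart, while the quadratic form distance ranks the one-bin shift as closer to the query than the three-bin shift.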

Keywords

  • 3D Shape Similarity Search
  • Quadratic Form Distance Functions
  • Spatial Data Mining
  • Nearest Neighbor Classification

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Mihael Ankerst (1)
  • Gabi Kastenmüller (1)
  • Hans-Peter Kriegel (1)
  • Thomas Seidl (1)
  1. University of Munich, Munich, Germany
