Skip to main content

On Data Partitioning in Tree Structure Metric-Space Indexes

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2014)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8421))

Included in the following conference series:

Abstract

Tree structure metric-space indexing methods recursively partition data according to their distances to a set of selected reference points (also called pivots). There are two basic forms of data partitioning: ball partition and General Hyper-plane (GH) partition. Most existing work only shows their superiority experimentally, and little theoretical proof is found. We propose an approach to unify existing data partitioning methods and analyze their performance theoretically. First, in theory, we unify the two basic forms of partitioning by proving that there are rotations of each other. Second, we show several theoretical or experimental results, which are able to indicate that ball partition outperforms GH partition. Our work takes a step forward in the theoretical study of metric-space indexing and is able to give a guideline of future index design.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Aizerman, M., Braverman, E., Rozonoer, L.: Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control 25, 821–837 (1964)

    MathSciNet  Google Scholar 

  2. Bentley, J.L.: Multidimensional binary search trees used for associative searching. ACM Commun. 18(9), 509–517 (1975)

    Article  MATH  Google Scholar 

  3. Beygelzimer, A., Kakade, S., Langford, J.: Cover trees for nearest neighbor. In: ICML, pp. 97–104 (2006)

    Google Scholar 

  4. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Haussler, D. (ed.) 5th Annual ACM Workshop on COLT, pp. 144–152. ACM Press, Pittsburgh (1992)

    Google Scholar 

  5. Bozkaya, T., Ozsoyoglu, M.: Indexing large metric spaces for similarity search queries. ACM Trans. Database Syst. 24(3), 361–404 (1999)

    Article  Google Scholar 

  6. Brin, S.: Near Neighbor Search in Large Metric Spaces. In: The 21th International Conference on Very Large Data Bases (VLDB 1995). Morgan Kaufmann Publishers Inc. (1995)

    Google Scholar 

  7. Bustos, B., Navarro, G., Chavez, E.: Pivot selection techniquesfor proximity searching in metric spaces. Pattern Recogn. Lett. 24(14), 2357–2366 (2003)

    Article  MATH  Google Scholar 

  8. Casella, G., Berger, R.L.: Statistical Inference. Duxbury Press (2001)

    Google Scholar 

  9. Chavez, E., Navarro, G., Baeza-Yates, R., Marroqu, J.: Searching in metric spaces. ACM Computing Surveys 33(3), 273–321 (2001)

    Article  Google Scholar 

  10. Ciaccia, P., Patella, M.: Bulk loading the M-tree. In: 9th Australasian Database Conference, ADO 1998 (1998)

    Google Scholar 

  11. Ciaccia, P., Patella, M.: Proceedings of the Third International Conference on Similarity Search and Applications, Istanbul, Turkey, September 18-19. ACM, New York (2010)

    Book  Google Scholar 

  12. Ciaccia, P., Patella, M., Zezula, P.: M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. Presented at the 23rd International Conference on Very Large Data Bases (VLDB 1997), Athens, Greece (1997)

    Google Scholar 

  13. Cortes, C., Vapnik, V.: Support-Vector Networks. Machine Learning 20 (1995)

    Google Scholar 

  14. Feng, Y.-C., Kui, C., Cao, Z.-S.: A Multidimensional Index Structure for Fast Similarity Retrieval. Journal of Software 13(8), 1678–1685 (2002)

    Google Scholar 

  15. Filho, R.F.S., Traina, A.J.M., Traina, C., Faloutsos, C.: Similaritysearch without tears: The OMNI family of all-purpose accessmethods. In: ICDE, pp. 623–630 (2001)

    Google Scholar 

  16. Fuchs, H., Kedem, Z.M., Naylor, B.F.: On visible surface generation by a priori tree structures. In: Proceedings of the 7th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH 1980), SIGGRAPH Computer Graphics, vol. 14(3), pp. 124–133. ACM, New York (1980)

    Google Scholar 

  17. Hjaltason, G.R., Samet, H.: Index-driven similarity search in metric spaces. ACM Transactions on Database Systems (TODS) 28(4), 517–580 (2003)

    Article  Google Scholar 

  18. Hochbaum, D.S., Shmoys, D.B.: A best possible heuristic for the k-center problem. Mathematics of Operational Research 10(2), 180–184 (1985)

    Article  MATH  MathSciNet  Google Scholar 

  19. Jagadish, H.V., Ooi, B.C., Tan, K., Yu, C., Zhang, R.: iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans. Database Syst. 30(2), 364–397 (2005)

    Article  Google Scholar 

  20. Johnson, R.A., Wichern, D.W.: Applied Multivariate Statistical Analysis, 6th edn. Prentice Hall (2007)

    Google Scholar 

  21. Karger, D., Ruhl, M.: Finding nearest neighbors in growth restricted metrics. In: Proceedings of the 34th Annual ACM Symposium on Theory of Computing, pp. 741–750 (2002)

    Google Scholar 

  22. Li, J.-Z., Zhang, Z.-G.: Haperplane Tree: A Structure of Indexing Metric Spaces for Similarity Search Queries. Journal of Computer Research and Development 40(8), 1209–1215 (2003)

    Google Scholar 

  23. Lokoč, J., Skopal, T.: On applications of parameterized hyper-plane partitioning. Poster in the Proceedings of the Third International Conference on Similarity Search and Applications (SISAP2010), Istanbul, Turkey, September 18-19, pp. 131–132 (2010)

    Google Scholar 

  24. Mao, R., Miranker, W., Miranker, D.P.: Pivot Selection:Dimension Reduction for Distance-Based Indexing. Journal of Discrete Algorithms 13, 32–46 (2012)

    Article  MATH  MathSciNet  Google Scholar 

  25. Mao, R., Xu, W., Ramakrishnan, S., Nuckolls, G., Miranker, D.P.: On Optimizing Distance-Based Similarity Search for Biological Databases. In: The 2005 IEEE Computational Systems Bioinformatics Conference (2005)

    Google Scholar 

  26. Matousek, J.: Lectures on Discrete Geometry, p. 497. Springer-Verlag New York, Inc. (2002)

    Google Scholar 

  27. MoBIoS test suite, http://www.cs.utexas.edu/~mobios/

  28. Navarro, G.: Searching in Metric Spaces by Spatial Approximation. In: Proceedings of the String Processing and Information Retrieval Symposium & International Workshop on Groupware. IEEE Computer Society (1999)

    Google Scholar 

  29. Samet, H.: Foundations of Multidimensional and Metric Data Structures. Morgan-Kaufmann (2006)

    Google Scholar 

  30. Shen, H.T., Ooi, B.C., Zhou, X.: Towards Effective Indexing for Very Large Video Sequence Database. In: SIGMOD Conference 2005, pp. 730–741 (2005)

    Google Scholar 

  31. SISAP test suite, http://sisap.org/Metric_Space_Library.html

  32. Traina Jr., C., Traina, A.J.M., Seeger, B., Faloutsos, C.: Slim-trees: High performance metric trees minimizing overlap between nodes. In: Zaniolo, C., Grust, T., Scholl, M.H., Lockemann, P.C. (eds.) EDBT 2000. LNCS, vol. 1777, pp. 51–65. Springer, Heidelberg (2000)

    Chapter  Google Scholar 

  33. Uhlmann, J.K.: Satisfying General Proximity/Similarity Queries with Metric Trees. Information Processing Letter 40(4), 175–179 (1991)

    Article  MATH  Google Scholar 

  34. Venkateswaran, J., Kahveci, T., Jermaine, C.M., Lachwani, D.: Reference-based indexing for metric spaces with costly distance measures. VLDB J. 17(5), 1231–1251 (2008)

    Article  Google Scholar 

  35. Xu, W., Miranker, D.P.: A Metric Model of Amino Acid Substitution. Bioinformatics 20(8), 1214–1221 (2004)

    Article  Google Scholar 

  36. Yianilos, P.N.: Data structures and algorithms for nearest neighbor search in general metric spaces. In: The Fourth Annual ACM-SIAM Symposium on Discrete Algorithms.Society for Industrial and Applied Mathematics (1993)

    Google Scholar 

  37. Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search: the Metric Space Approach. Springer, Heidelberg (2006)

    Google Scholar 

  38. Zhang, Z.-G., Li, J.-Z.: An Algorithm Based on RGH-Tree for Similarity Search Queries. Journal of software 13(10), 1969–1976 (2002)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Mao, R., Liu, S., Xu, H., Zhang, D., Miranker, D.P. (2014). On Data Partitioning in Tree Structure Metric-Space Indexes. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8421. Springer, Cham. https://doi.org/10.1007/978-3-319-05810-8_10

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-05810-8_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05809-2

  • Online ISBN: 978-3-319-05810-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics