The Omni-family of all-purpose access methods: a simple and effective way to make similarity search more efficient

  • Regular Paper
The VLDB Journal

Abstract

Similarity search operations require executing expensive algorithms and, although broadly useful in many new applications, they rely on specific structures not yet supported by commercial DBMSs. In this paper we discuss the new Omni-technique, which makes it possible to build a variety of dynamic Metric Access Methods based on a number of objects selected from the dataset and used as global reference objects. We call them the Omni-family of metric access methods. This technique enables building similarity search operations on top of existing structures, significantly improving their performance in terms of the number of disk accesses and distance calculations. Additionally, our methods scale up well, exhibiting sub-linear behavior with growing database size.
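
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of filtering with global reference objects, not the authors' method. It assumes that each object's distances to the reference objects are precomputed and that the triangle inequality is used to discard objects before the real distance is evaluated. The names `ReferenceFilter`, `range_query`, and `distance_calls` are hypothetical, and the sketch works in memory with Euclidean distance, ignoring disk access entirely.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance; any metric obeying the triangle inequality would do."""
    return math.dist(a, b)


class ReferenceFilter:
    """Hypothetical in-memory filter built on global reference objects."""

    def __init__(
        self,
        data: Sequence[Point],
        refs: Sequence[Point],
        dist: Callable[[Point, Point], float] = euclidean,
    ) -> None:
        self.data: List[Point] = list(data)
        self.refs: List[Point] = list(refs)
        self.dist = dist
        # Precompute, once, the distance from every object to every reference object.
        self.coords = [[dist(o, f) for f in self.refs] for o in self.data]
        # Counts distance evaluations performed while answering queries.
        self.distance_calls = 0

    def range_query(self, q: Point, radius: float) -> List[Point]:
        """Return all objects within `radius` of `q`."""
        q_coords = [self.dist(q, f) for f in self.refs]
        self.distance_calls += len(self.refs)
        result = []
        for obj, obj_coords in zip(self.data, self.coords):
            # Triangle inequality: |d(q, f) - d(o, f)| <= d(q, o) for every
            # reference f, so a gap larger than the radius rules the object out
            # without computing d(q, o).
            if any(abs(qc - oc) > radius for qc, oc in zip(q_coords, obj_coords)):
                continue
            self.distance_calls += 1
            if self.dist(q, obj) <= radius:
                result.append(obj)
        return result
```

On a small grid of points with two reference objects, such a filter returns the same answer as a sequential scan while evaluating the distance function for only a fraction of the objects. How much is saved depends on the data and on how the reference objects are chosen.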

Author information

Corresponding author

Correspondence to Caetano Traina Jr.

About this article

Cite this article

Traina, C., Filho, R.F.S., Traina, A.J.M. et al. The Omni-family of all-purpose access methods: a simple and effective way to make similarity search more efficient. The VLDB Journal 16, 483–505 (2007). https://doi.org/10.1007/s00778-005-0178-0

  • DOI: https://doi.org/10.1007/s00778-005-0178-0