Querying Metric Spaces with Bit Operations
Metric search techniques can be usefully characterised by the time at which distance calculations are performed during a query. Most exact search mechanisms use a “just-in-time” approach where distances are calculated as part of a navigational strategy. An alternative is to use a “one-time” approach, where distances to a fixed set of reference objects are calculated at the start of each query. These distances are typically used to re-cast data and queries into a different space where querying is more efficient, allowing an approximate solution to be obtained.
In this paper we use a “one-time” approach for an exact search mechanism. A fixed set of reference objects is used to define a large set of regions within the original space, and each query is assessed with respect to the definition of these regions. Data is then accessed if, and only if, it is useful for the calculation of the query solution.
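As a rough illustration of the idea, and not the implementation used in this work, the sketch below assumes ball-shaped regions around each reference object and stores one bitmap per region, with bit i set when data item i lies inside. The names `build_regions` and `range_search`, the choice of ball regions, and the use of Python integers as bitmaps are all assumptions made for this example. At query time the distances to the reference objects are computed once; the triangle inequality then decides, per region, whether all solutions must lie inside it or outside it, and the candidate set is narrowed with bitwise AND operations before any data item is touched.

```python
import math

def euclidean(a, b):
    # Any proper metric could be substituted here.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_regions(data, refs, radii, dist):
    """Return (ref_index, radius, bitmap) triples for hypothetical ball regions.

    Bit i of a bitmap is set iff data[i] lies within `radius` of the reference.
    """
    regions = []
    for j, p in enumerate(refs):
        ref_dists = [dist(p, x) for x in data]
        for r in radii:
            bits = 0
            for i, d in enumerate(ref_dists):
                if d <= r:
                    bits |= 1 << i
            regions.append((j, r, bits))
    return regions

def range_search(query, t, data, refs, regions, dist):
    """Exact range search: indices i with dist(query, data[i]) <= t."""
    # "One-time" phase: distances to the reference objects only.
    q_dists = [dist(query, p) for p in refs]

    candidates = (1 << len(data)) - 1  # start with every item as a candidate
    for j, r, bits in regions:
        dq = q_dists[j]
        if dq > r + t:
            # The query ball cannot reach the region: no solution is inside it.
            candidates &= ~bits
        elif dq <= r - t:
            # The query ball lies within the region: every solution is inside it.
            candidates &= bits

    # Only the surviving candidates are fetched and compared with the query.
    return [i for i in range(len(data))
            if (candidates >> i) & 1 and dist(query, data[i]) <= t]
```

Because every exclusion step follows from the triangle inequality, the result is the same as that of an exhaustive scan; the regions affect only how many data items are read in the final check.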
As dimensionality increases, the number of defined regions must increase, but the memory required for the exclusion calculation does not. We show that the technique gives excellent performance over the SISAP benchmark data sets, and most interestingly we show how increases in dimensionality may be countered by relatively modest increases in the number of reference objects used.
This work was supported by ESRC grant ES/L007487/1 “Administrative Data Research Centre—Scotland”. We would like to thank Tom Dalton for his help with preparation of the data and creating R scripts for rendering results, and Peter Christen along with the anonymous reviewers for helpful comments on earlier drafts.