Abstract
In the \((1+{\varepsilon },r)\)-approximate near-neighbor problem for curves (ANNC) under some similarity measure \(\delta \), the goal is to construct a data structure for a given set \(\mathcal {C}\) of curves that supports approximate near-neighbor queries: Given a query curve Q, if there exists a curve \(C\in \mathcal {C}\) such that \(\delta (Q,C)\le r\), then return a curve \(C'\in \mathcal {C}\) with \(\delta (Q,C')\le (1+{\varepsilon })r\). There exists an efficient reduction from the \((1+{\varepsilon })\)-approximate nearest-neighbor problem to ANNC, where in the former problem the answer to a query is a curve \(C\in \mathcal {C}\) with \(\delta (Q,C)\le (1+{\varepsilon })\cdot \delta (Q,C^*)\), where \(C^*\) is the curve of \(\mathcal {C}\) most similar to Q. Given a set \(\mathcal {C}\) of n curves, each consisting of m points in d dimensions, we construct a data structure for ANNC that uses \(n\cdot O(\frac{1}{{\varepsilon }})^{md}\) storage space and has O(md) query time (for a query curve of length m), where the similarity measure between two curves is their discrete Fréchet or dynamic time warping distance. Our method is simple to implement, deterministic, and results in an exponential improvement in both query time and storage space compared to all previous bounds. Further, we also consider the asymmetric version of ANNC, where the length of the query curves is \(k \ll m\), and obtain essentially the same storage and query bounds as above, except that m is replaced by k. Finally, we apply our method to a version of approximate range counting for curves and achieve similar bounds.
Notes
Since our storage space is already in \(O(\frac{1}{{\varepsilon }})^{md}\), and \(m\cdot 2^{2\,m}\le 3^{2\,m}\) is in \(O(1)^{md}\), we could have used this larger upper bound. However, in Lemma 4 we show a tight upper bound on the number of relevant alignments, which may be useful for other applications.
See [5] for a closely related more recent result on simplifications with bounded length.
References
Afshani, P., Driemel, A.: On the complexity of range searching among curves. In: Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, January 7-10, 2018, pp 898–917, (2018), https://doi.org/10.1137/1.9781611975031.58
Aronov, B., Filtser, O., Horton, M., Katz, M.J., Sheikhan, K.: Efficient nearest-neighbor query and clustering of planar curves. In: Algorithms and Data Structures—16th International Symposium, WADS 2019, Edmonton, AB, Canada, August 5–7, 2019, Proceedings, pp 28–42 (2019), https://doi.org/10.1007/978-3-030-24766-9_3
Buchin, K., Driemel, A., Gudmundsson, J., Horton, M., Kostitsyna, I., Löffler, M., Struijs, M.: Approximating (k, l)-center clustering for curves. In: Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pp 2922–2938, (2019), https://doi.org/10.1137/1.9781611975482.181
Bringmann, K., Driemel, A., Nusser, A., Psarros, I.: Tight bounds for approximate near neighbor searching for time series under the Fréchet distance. In: Symposium on Discrete Algorithms, SODA (2022)
Buchin, M., Driemel, A., van Greevenbroek, K., Psarros, I., Rohde, D.: Approximating length-restricted means under dynamic time warping. In: Approximation and Online Algorithms—20th International Workshop, WAOA, volume 13538, pp 225–253, (2022), https://doi.org/10.1007/978-3-031-18367-6_12
Bereg, S., Jiang, M., Wang, W., Yang, B., Zhu, B.: Simplifying 3D polygonal chains under the discrete Fréchet distance. In LATIN 2008: Theoretical Informatics, 8th Latin American Symposium, Búzios, Brazil, April 7-11, 2008, Proceedings, pp 630–641, (2008), https://doi.org/10.1007/978-3-540-78773-0_54
Bringmann, K.: Why walking the dog takes time: Fréchet distance has no strongly subquadratic algorithms unless SETH fails. In: 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18-21, 2014, pp 661–670, 2014, https://doi.org/10.1109/FOCS.2014.76
de Berg, M., Cook, A.F., IV., Gudmundsson, J.: Fast Fréchet queries. Comput. Geom. 46(6), 747–755 (2013). https://doi.org/10.1016/j.comgeo.2012.11.006
de Berg, M., Gudmundsson, J., Mehrabi, A. D.: A dynamic data structure for approximate proximity queries in trajectory data. In: Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS 2017, Redondo Beach, CA, USA, November 7–10, 2017, pp 48:1–48:4, (2017), https://doi.org/10.1145/3139958.3140023
Driemel, A., Har-Peled, S.: Jaywalking your dog: Computing the Fréchet distance with shortcuts. SIAM J. Comput. 42(5), 1830–1866 (2013). https://doi.org/10.1137/120865112
Driemel, A., Psarros, I.: ANN for time series under the Fréchet distance. In: A. Lubiw and M. R. Salavatipour, editors, Algorithms and Data Structures—17th International Symposium, WADS 2021, Virtual Event, August 9-11, 2021, Proceedings, volume 12808 of Lecture Notes in Computer Science, pp 315–328. Springer, (2021), https://doi.org/10.1007/978-3-030-83508-8_23
Driemel, A., Psarros, I., Schmidt, M.: Sublinear data structures for short Fréchet queries. CoRR, abs/1907.04420, 2019, arXiv:1907.04420
Driemel, A., Silvestri, F.: Locality-sensitive hashing of curves. In Proceedings of the 33rd International Symposium on Computational Geometry, volume 77, pp 37:1–37:16, Brisbane, Australia, July 2017. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, https://doi.org/10.4230/LIPIcs.SoCG.2017.37
Emiris, I.Z., Psarros, I.: Products of Euclidean metrics, applied to proximity problems among curves: unified treatment of discrete Fréchet and dynamic time warping distances. ACM Trans. Spatial Algorithms Syst. 6(4), 27:1-27:20 (2020). https://doi.org/10.1145/3397518
Filtser, A., Filtser, O., Katz, M. J.: Approximate nearest neighbor for curves—simple, efficient, and deterministic. In: A. Czumaj, A. Dawar, and E. Merelli, editors, 47th International Colloquium on Automata, Languages, and Programming, ICALP 2020, July 8-11, 2020, Saarbrücken, Germany (Virtual Conference), volume 168 of LIPIcs, pages 48:1–48:19. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020, https://doi.org/10.4230/LIPIcs.ICALP.2020.48
Har-Peled, S., Indyk, P., Motwani, R.: Approximate nearest neighbor: towards removing the curse of dimensionality. Theory Comput. 8(1), 321–350 (2012). https://doi.org/10.4086/toc.2012.v008a014
Har-Peled, S., Kumar, N.: Approximate nearest neighbor search for low dimensional queries. In: D. Randall, editor, Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2011, San Francisco, California, USA, January 23–25, 2011, pp 854–867. SIAM, 2011, https://doi.org/10.1137/1.9781611973082.67
Indyk, P.: High-dimensional computational geometry. PhD thesis, Stanford University, 2000
Indyk, P.: Approximate nearest neighbor algorithms for Fréchet distance via product metrics. In: Proceedings of the 8th Symposium on Computational Geometry, pp 102–106, Barcelona, Spain, June 2002. ACM Press, https://doi.org/10.1145/513400.513414
Kumar, P., Mitchell, J. S. B., Yildirim, E. A.: Computing core-sets and approximate smallest enclosing hyperspheres in high dimensions. In: Proceedings of the Fifth Workshop on Algorithm Engineering and Experiments, Baltimore, MD, USA, January 11, 2003, pp 45–55, (2003), https://doi.org/10.1145/996546.996548
Lemire, D.: Faster retrieval with a two-pass dynamic-time-warping lower bound. Pattern Recogn. 42(9), 2169–2180 (2009). https://doi.org/10.1016/j.patcog.2008.11.030
Megiddo, N.: Linear programming in linear time when the dimension is fixed. J. ACM 31(1), 114–127 (1984). https://doi.org/10.1145/2422.322418
Pagh, R., Rodler, F.F.: Cuckoo hashing. J. Algorithms 51(2), 122–144 (2004). https://doi.org/10.1016/j.jalgor.2003.12.002
Shakhnarovich, G., Darrell, T., Indyk, P.: Nearest-Neighbor Methods in Learning and Vision: Theory and Practice (Neural Information Processing). The MIT Press, Cambridge (2006)
Acknowledgements
We wish to thank Boris Aronov for helpful discussions on the problems studied in this paper.
Funding
Arnold Filtser was partially supported by Grant 1042/22 from the Israel Science Foundation. Omrit Filtser was supported by the Eric and Wendy Schmidt Fund for Strategic Innovation, by the Council for Higher Education of Israel, and by Ben-Gurion University of the Negev. Matthew J. Katz was partially supported by Grant 1884/16 from the Israel Science Foundation.
Appendices
Appendix A: A Deterministic Construction Using a Prefix Tree
When the dictionary \(\mathcal {D}\) is implemented as a hash table, the construction of the data structure is randomized, and thus the preprocessing time may be higher in the worst case. To avoid this, we can implement \(\mathcal {D}\) as a prefix tree.
A.1 Discrete Fréchet Distance
In this section we describe the implementation of \(\mathcal {D}\) as a prefix tree in the case of ANNC under DFD.
We can construct a prefix tree \(\mathcal {T}\) for the curves in \(\mathcal {I}\), where any path in \(\mathcal {T}\) from the root to a leaf corresponds to a curve stored in the tree. For each \(1\le i\le n\) and each curve \({\overline{Q}}\in \mathcal {I}_i\), if \({\overline{Q}}\notin \mathcal {T}\), we insert \({\overline{Q}}\) into \(\mathcal {T}\) and set \(C({\overline{Q}})\leftarrow C_i\).
Each node \(v\in \mathcal {T}\) corresponds to a grid point from \(\mathcal {G}\). Denote the set of v’s children by N(v). We store with v a multilevel search tree on N(v), with a level for each coordinate. The points in \(\mathcal {G}\) are the grid points contained in nm balls of radius \((1+{\varepsilon })r\). Thus, when projecting these points to a single dimension, the number of 1-dimensional points is at most \(nm\cdot \frac{\sqrt{d}(1+{\varepsilon })2r}{{\varepsilon }r}=O(\frac{nm\sqrt{d}}{{\varepsilon }})\). Hence, each level of the search tree on N(v) contains \(O(\frac{nm\sqrt{d}}{{\varepsilon }})\) 1-dimensional points, and the query time is \(O(d\log (\frac{nmd}{{\varepsilon }}))\).
Inserting a curve of length m into the tree \(\mathcal {T}\) takes \(O(md\log (\frac{nmd}{{\varepsilon }}))\) time. Since \(\mathcal {T}\) is a compact representation of \(|\mathcal {I}|=n\cdot O(\frac{1}{{\varepsilon }})^{dm}\) curves of length m, the number of nodes in \(\mathcal {T}\) is \(m\cdot |\mathcal {I}|=nm\cdot O(\frac{1}{{\varepsilon }})^{dm}\). Each node \(v\in \mathcal {T}\) contains a search tree for its children of size \(O(d\cdot |N(v)|)\), and \(\sum _{v\in \mathcal {T}}|N(v)|=nm\cdot O(\frac{1}{{\varepsilon }})^{dm}\), so the total space complexity is \(O(nmd)\cdot O(\frac{1}{{\varepsilon }})^{md}=n\cdot O(\frac{1}{{\varepsilon }})^{md}\). Constructing \(\mathcal {T}\) takes \(O(|\mathcal {I}|\cdot md\log (\frac{nmd}{{\varepsilon }}))=n\log (\frac{nmd}{{\varepsilon }})\cdot O(\frac{1}{{\varepsilon }})^{md}\) time.
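To make the construction concrete, the insertion and lookup procedures on \(\mathcal {T}\) can be sketched as follows. This is a minimal Python sketch with names of our own choosing; for simplicity, each node keeps its children in a single list sorted lexicographically by grid point (searched by binary search) rather than in a d-level search tree with one level per coordinate, which gives the same deterministic \(O(d\log |N(v)|)\) comparison cost per step.

```python
import bisect

class PrefixTreeNode:
    """Node of the prefix tree; children are kept in a sorted list,
    so lookups are deterministic (no hashing involved)."""
    def __init__(self):
        self.keys = []      # sorted list of child grid points (tuples)
        self.children = []  # child nodes, aligned with self.keys
        self.curve = None   # input curve C(Q-bar) stored at a leaf

    def child(self, point):
        # Binary search for an existing child labeled `point`.
        i = bisect.bisect_left(self.keys, point)
        if i < len(self.keys) and self.keys[i] == point:
            return self.children[i]
        return None

    def add_child(self, point):
        # Return the child labeled `point`, creating it if absent.
        i = bisect.bisect_left(self.keys, point)
        if i < len(self.keys) and self.keys[i] == point:
            return self.children[i]
        node = PrefixTreeNode()
        self.keys.insert(i, point)
        self.children.insert(i, node)
        return node

def insert(root, snapped_curve, original_curve):
    """Insert a snapped (grid) curve; keep the first input curve seen,
    mirroring the rule that C(Q-bar) is set only when Q-bar is new."""
    v = root
    for p in snapped_curve:
        v = v.add_child(p)
    if v.curve is None:
        v.curve = original_curve

def query(root, snapped_curve):
    """Return the curve stored for the snapped query, or None."""
    v = root
    for p in snapped_curve:
        v = v.child(p)
        if v is None:
            return None
    return v.curve
```

Each insertion or query of a curve of length m performs m descents, and each descent costs one binary search over the node's children, matching the \(O(md\log (\frac{nmd}{{\varepsilon }}))\) bound up to the per-coordinate organization of the search tree.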
Theorem 23
There exists a data structure for the \((1+{\varepsilon },r)\)-ANNC under DFD, with \(n\cdot O(\frac{1}{{\varepsilon }})^{dm}\) space, \(n\cdot \log (\frac{n}{{\varepsilon }})\cdot O(\frac{1}{{\varepsilon }})^{md}\) preprocessing time, and \(O(md\log (\frac{nmd}{{\varepsilon }}))\) query time.
Similarly, for the asymmetric case we obtain the following theorem.
Theorem 24
There exists a data structure for the asymmetric \((1+{\varepsilon },r)\)-ANNC under DFD, with \(n\cdot O(\frac{1}{{\varepsilon }})^{dk}\) space, \(nm\log (\frac{n}{{\varepsilon }})\cdot \left( O(d\log m)+O(\frac{1}{{\varepsilon }})^{kd}\right) \) preprocessing time, and \(O(kd\log (\frac{nkd}{{\varepsilon }}))\) query time.
A.2 \(\ell _{p,2}\)-Distance
For the case of ANNC under \(\ell _{p,2}\)-distance, the total number of curves stored in the tree \(\mathcal {T}\) is roughly the same as in the case of DFD. We only need to show that for a given node v of the tree \(\mathcal {T}\), the upper bound on the size and query time of the search tree associated with it are similar.
The grid points corresponding to the nodes in N(v) are from n sets of m balls with radius \((1+{\varepsilon })\). When projecting the grid points in one of the balls to a single dimension, the number of 1-dimensional points is at most \(\frac{m^{1/p}\sqrt{d}}{{\varepsilon }}\cdot (1+{\varepsilon })\), so the total number of projected points is at most \(\frac{nm^{1+\frac{1}{p}}\sqrt{d}}{{\varepsilon }}\cdot (1+{\varepsilon })\).
Thus, each level of the search tree of v contains \(O(\frac{nm^2\sqrt{d}}{{\varepsilon }})\) 1-dimensional points, the query time is \(O(d\log (\frac{nmd}{{\varepsilon }}))\), and inserting a curve of length m into the tree \(\mathcal {T}\) takes \(O(md\log (\frac{nmd}{{\varepsilon }}))\) time. Note that the size of the search tree of v remains \(O(d\cdot |N(v)|)\).
We conclude that the total space complexity is \(O(\frac{nm^2\sqrt{d}}{{\varepsilon }})\cdot O(\frac{1}{{\varepsilon }})^{m(d+1)}=n\cdot O(\frac{1}{{\varepsilon }})^{m(d+1)}\), constructing \(\mathcal {T}\) takes \(O(|\mathcal {I}|\cdot md\log (nmd/{\varepsilon }))=n\log (\frac{n}{{\varepsilon }})\cdot O(\frac{1}{{\varepsilon }})^{m(d+1)}\) time, and the total query time is \(O(md\log (\frac{nmd}{{\varepsilon }}))\).
Theorem 25
There exists a data structure for the \((1+{\varepsilon },r)\)-ANNC under \(\ell _{p,2}\)-distance, with \(n\cdot O(\frac{1}{{\varepsilon }})^{m(d+1)}\) space, \(n\cdot \log (\frac{n}{{\varepsilon }})\cdot O(\frac{1}{{\varepsilon }})^{m(d+1)}\) preprocessing time, and \(O(md\log (\frac{nmd}{{\varepsilon }}))\) query time.
Appendix B: Dealing with Query Curves and Input Curves of Varying Size
Notice that if an input curve \(C_i\) has length \(t<m\), then the size of the set of candidates \(\mathcal {I}_i\) (and \(\mathcal {I}'_i\) in the asymmetric case) can only decrease.
In addition, our assumption that all query curves are of length exactly k can be easily removed by constructing k data structures \(\mathcal {D}_1,\dots ,\mathcal {D}_k\), where \(\mathcal {D}_i\) is our data structure constructed for query curves of length i (instead of k), for \(1 \le i \le k\). Clearly, the query time does not change. The storage space is multiplied by k, so for the case of DFD we have storage space \(nk\cdot O(\frac{1}{{\varepsilon }})^{kd}\), but \(k<2^{kd}\), so the storage space remains \(n\cdot O(\frac{1}{{\varepsilon }})^{kd}\). Similarly, for the case of \(\ell _{p,2}\)-distance we obtain storage space of \(n\cdot O(\frac{1}{{\varepsilon }})^{k(d+1)}\cdot \left( \frac{m}{k}\right) ^{kd/p}\).
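The dispatching scheme above can be sketched in a few lines of Python. The `build_for_length` and `query` interfaces below are hypothetical stand-ins for the actual data structure; the sketch only illustrates routing a query to the structure \(\mathcal {D}_i\) matching its length.

```python
def build_structures(build_for_length, k):
    """Build one data structure per query length 1..k.
    `build_for_length(i)` is assumed to construct our data structure
    for query curves of length exactly i."""
    return {i: build_for_length(i) for i in range(1, k + 1)}

def answer_query(structures, q):
    """Dispatch query curve q to the structure for len(q).
    Queries longer than k are not supported and return None."""
    ds = structures.get(len(q))
    return None if ds is None else ds.query(q)
```

Since the dispatch is a single dictionary lookup by length, the query time is unchanged, and only the storage is multiplied by k, as argued above.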
Appendix C: One-Way Alignments
Claim 26
Let A, B, C be three curves, and let \(\tau _1\), \(\tau _2\) be two one-way alignments such that \(\tau _1\) matches C to A and \(\tau _2\) matches C to B. Then \(d_{p,2}(A,B)\le \sigma _{p,2}(\tau _1(C,A))+\sigma _{p,2}(\tau _2(C,B))\).
Proof
Denote by \(k_A,k_B,k_C\) the lengths of the curves A, B, C respectively. Consider the following algorithm that constructs an alignment \(\tau \). For every \(1\le x\le k_C\), denote by \(i_x,j_x\) the unique indices such that \((x,i_x)\in \tau _1\) and \((x,j_x)\in \tau _2\). Add the pair \((i_x,j_x)\) to \(\tau \) if it is not already there.
First, we need to show that \(\tau =\langle (i_1,j_1),\dots ,(i_t,j_t)\rangle \) is a valid alignment. Clearly, \((i_1,j_1)=(1,1)\) because \((1,1)\in \tau _1\) and \((1,1)\in \tau _2\). Similarly, \((i_t,j_t)=(k_A,k_B)\) because \((k_C,k_A)\in \tau _1\) and \((k_C,k_B)\in \tau _2\).
For any \(1\le s<t\), consider the two consecutive pairs \((i_s,j_s),(i_{s+1},j_{s+1})\in \tau \). Let \(x_1\) be an index such that \((x_1,i_s)\in \tau _1\) and \((x_1,j_s)\in \tau _2\), and let \(x_2\) be an index such that \((x_2,i_{s+1})\in \tau _1\) and \((x_2,j_{s+1})\in \tau _2\). Since \(\tau _1,\tau _2\) are one-way alignments, we have \(x_1\ne x_2\). Moreover, since the algorithm added \((i_s,j_s)\) to \(\tau \) before \((i_{s+1},j_{s+1})\), we have \(x_1<x_2\). This implies that \(i_{s+1}\ge i_s\) and \(j_{s+1}\ge j_s\). Assume, for the sake of contradiction, that \(i_{s+1} > i_s+1\), and let x be the index such that \((x,i_s+1)\in \tau _1\); then \(x_1<x<x_2\), and thus the algorithm adds a pair \((i_s+1,j)\) for some index j after \((i_s,j_s)\) and before \((i_{s+1},j_{s+1})\), a contradiction. So we have \(i_s\le i_{s+1} \le i_s+1\), and by symmetric arguments, \(j_s\le j_{s+1} \le j_s+1\), and therefore \(\tau \) is valid.
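The merging procedure from this proof translates directly into code. Below is a short Python sketch of our own (names are illustrative), with a one-way alignment represented as a list of pairs (x, i) containing exactly one pair per index x of C.

```python
def merge_alignments(tau1, tau2):
    """Combine two one-way alignments tau1 (matching C to A) and
    tau2 (matching C to B) into an alignment tau between A and B:
    for each index x of C, pair up i_x with j_x, skipping pairs
    already added (consecutive duplicates)."""
    to_a = dict(tau1)  # x -> i_x (unique per x in a one-way alignment)
    to_b = dict(tau2)  # x -> j_x
    tau = []
    for x in sorted(to_a):
        pair = (to_a[x], to_b[x])
        if not tau or tau[-1] != pair:
            tau.append(pair)
    return tau
```

By the argument above, consecutive pairs of the returned list differ by at most 1 in each coordinate, so the result is a valid alignment between A and B.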
Using the triangle inequality for the \(\ell _p\) norm, we get that
$$\begin{aligned} d_{p,2}(A,B)&\le \left( \sum _{s=1}^{t}\Vert a_{i_s}-b_{j_s}\Vert _2^p\right) ^{1/p} \le \left( \sum _{s=1}^{t}\big (\Vert a_{i_s}-c_{x_s}\Vert _2+\Vert c_{x_s}-b_{j_s}\Vert _2\big )^p\right) ^{1/p}\\ &\le \left( \sum _{s=1}^{t}\Vert a_{i_s}-c_{x_s}\Vert _2^p\right) ^{1/p}+\left( \sum _{s=1}^{t}\Vert c_{x_s}-b_{j_s}\Vert _2^p\right) ^{1/p} \le \sigma _{p,2}(\tau _1(C,A))+\sigma _{p,2}(\tau _2(C,B)), \end{aligned}$$
where for each \(1\le s\le t\), \(x_s\) is an index with \((x_s,i_s)\in \tau _1\) and \((x_s,j_s)\in \tau _2\); the last inequality holds since the pairs \((x_s,i_s)\) form a subset of \(\tau _1\) and the pairs \((x_s,j_s)\) form a subset of \(\tau _2\).
\(\square \)
Filtser, A., Filtser, O. & Katz, M.J. Approximate Nearest Neighbor for Curves: Simple, Efficient, and Deterministic. Algorithmica 85, 1490–1519 (2023). https://doi.org/10.1007/s00453-022-01080-1