Partially Specified Nearest Neighbor Search

  • Tomas Hruz
  • Marcel Schöngens
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7434)

Abstract

We study the Partial Nearest Neighbor Problem, which consists in preprocessing a set \(\mathcal{D}\) of \(n\) points from a \(d\)-dimensional metric space such that the following query can be answered efficiently: given a query vector \(Q \in \mathbb{R}^d\) and an axis-aligned query subspace represented by \(S \in \{0,1\}^d\), report a point \(P \in \mathcal{D}\) with \(d_S(Q,P) \le d_S(Q,P')\) for all \(P' \in \mathcal{D}\), where \(d_S(Q,P)\) is the distance between \(Q\) and \(P\) in the subspace \(S\). This problem is related to similarity search between feature vectors with respect to a subset of features, and is therefore of great practical importance in bioinformatics, image recognition, and similar fields. However, because there are exponentially many subspaces, each of which can change distances significantly, the problem has considerable complexity. We present the first exact algorithms for the \(\ell_2\)- and \(\ell_\infty\)-metrics with linear space and sub-linear worst-case query time. We also give a simple approximation algorithm, and show experimentally that our approach performs well on real-world data.
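
To make the query definition concrete, the following minimal Python sketch answers a partial nearest neighbor query by a plain linear scan. It is a naive baseline that needs time proportional to n·d per query, written only for illustration; it is not the sub-linear algorithm of the paper. The function names `subspace_distance` and `partial_nearest_neighbor` are hypothetical, and the sketch assumes that \(d_S\) is the \(\ell_2\) or \(\ell_\infty\) distance computed over exactly those coordinates \(i\) with \(S_i = 1\).

```python
from math import inf, sqrt


def subspace_distance(q, p, s, metric="l2"):
    """Distance between q and p using only the coordinates i with s[i] == 1.

    Assumption of this sketch: d_S ignores every coordinate with s[i] == 0.
    """
    diffs = [abs(qi - pi) for qi, pi, si in zip(q, p, s) if si]
    if metric == "l2":
        return sqrt(sum(x * x for x in diffs))
    if metric == "linf":
        # An empty subspace yields distance 0 by convention here.
        return max(diffs, default=0.0)
    raise ValueError("metric must be 'l2' or 'linf'")


def partial_nearest_neighbor(points, q, s, metric="l2"):
    """Return (P, d_S(Q, P)) for a point P minimizing d_S(Q, P) by linear scan."""
    best, best_dist = None, inf
    for p in points:
        d = subspace_distance(q, p, s, metric)
        if d < best_dist:
            best, best_dist = p, d
    return best, best_dist


if __name__ == "__main__":
    data = [(0, 0, 10), (5, 5, 0), (1, 9, 1)]
    query = (0, 0, 0)
    # Same query point, different subspaces, different answers.
    print(partial_nearest_neighbor(data, query, (1, 1, 0)))          # ((0, 0, 10), 0.0)
    print(partial_nearest_neighbor(data, query, (0, 0, 1)))          # ((5, 5, 0), 0.0)
    print(partial_nearest_neighbor(data, query, (1, 0, 1), "linf"))  # ((1, 9, 1), 1)
```

The example shows why the subspace matters: the same query point has a different nearest neighbor under each choice of \(S\), so a single index built for one fixed subspace does not answer the others.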

Keywords

Approximation Ratio, Near Neighbor, Query Range, Query Point, Query Time
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Tomas Hruz (1)
  • Marcel Schöngens (1)
  1. Institute of Theoretical Computer Science, ETH Zurich, Switzerland
