Processing All k-Nearest Neighbor Queries in Hadoop

  • Takuya Yokoyama
  • Yoshiharu Ishikawa
  • Yu Suzuki
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7418)

Abstract

A k-nearest neighbor (k-NN) query, which retrieves the k nearest points from a database, is one of the fundamental query types in spatial databases. An all k-nearest neighbor query (AkNN query), a variation of the k-NN query, determines the k nearest neighbors of every point in the dataset in a single query process. In this paper, we propose a method for processing AkNN queries in Hadoop. We decompose the given space into cells and execute the query with the MapReduce framework in a distributed and parallel manner. By using the distribution statistics of the target data points, our method can process the given queries efficiently.
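
The idea of decomposing the space into cells can be illustrated with a small single-process sketch. This is not the authors' algorithm: the uniform grid, the ring-by-ring expansion, and the names `cell_of`, `all_knn`, and `size` are assumptions made for illustration, and the sketch ignores the distribution statistics the paper relies on. It assumes 2-D points and k >= 1. Grouping points by cell plays the role of the map phase; the per-cell loop plays the role of the reduce phase.

```python
import math
from collections import defaultdict


def cell_of(p, size):
    """Key a point by the grid cell that contains it (the 'map' key)."""
    return (math.floor(p[0] / size), math.floor(p[1] / size))


def all_knn(points, k, size):
    """Return {point index: indices of its k nearest neighbors}."""
    if not points:
        return {}
    # "Map": group point indices by cell.
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[cell_of(p, size)].append(i)
    xs = [c[0] for c in cells]
    ys = [c[1] for c in cells]
    # A block of this radius around any occupied cell covers every cell.
    max_ring = max(max(xs) - min(xs), max(ys) - min(ys))

    result = {}
    # "Reduce": handle each cell, widening the searched block when needed.
    for (cx, cy), members in cells.items():
        for i in members:
            p = points[i]
            ring = 0
            while True:
                cand = [j
                        for dx in range(-ring, ring + 1)
                        for dy in range(-ring, ring + 1)
                        for j in cells.get((cx + dx, cy + dy), ())
                        if j != i]
                cand.sort(key=lambda j: (math.dist(p, points[j]), j))
                best = cand[:k]
                # Distance from p to the edge of the searched block.
                margin = min(p[0] - (cx - ring) * size,
                             (cx + ring + 1) * size - p[0],
                             p[1] - (cy - ring) * size,
                             (cy + ring + 1) * size - p[1])
                # Stop once the circle through the k-th neighbor fits inside
                # the searched block, or once every cell has been searched.
                done = len(best) == k and math.dist(p, points[best[-1]]) <= margin
                if done or ring >= max_ring:
                    break
                ring += 1
            result[i] = best
    return result
```

For example, `all_knn([(0, 0), (1, 0), (5, 5)], k=1, size=2)` pairs the first two points with each other and assigns the third point its closest neighbor from the distant cell. In an actual Hadoop job the cell key would be emitted by mappers and each reducer would see only its own cell plus whatever neighboring data is shipped to it; how that data is selected is exactly what a real method must decide and is not modeled here.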

Keywords

Target Space; Spatial Database; Cell Decomposition; Boundary Circle; MapReduce Framework
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Takuya Yokoyama (1)
  • Yoshiharu Ishikawa (2, 1, 3)
  • Yu Suzuki (2)

  1. Graduate School of Information Science, Nagoya University, Japan
  2. Information Technology Center, Nagoya University, Japan
  3. National Institute of Informatics, Japan
