On GPU-Based Nearest Neighbor Queries for Large-Scale Photometric Catalogs in Astronomy

  • Justin Heinermann
  • Oliver Kramer
  • Kai Lars Polsterer
  • Fabian Gieseke
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8077)

Abstract

Today's astronomical catalogs contain patterns for hundreds of millions of objects, with data volumes in the terabyte range. Upcoming projects will gather such patterns for several billion objects, amounting to peta- and exabytes of data. From a machine learning point of view, these settings often yield unsupervised, semi-supervised, or fully supervised tasks with large training sets and huge test sets. Recent studies have demonstrated the effectiveness of prototype-based learning schemes such as simple nearest neighbor models. Although these models are among the most computationally efficient methods for such settings (if implemented via spatial data structures), applying them to all remaining patterns in a given catalog can easily take hours or even days. In this work, we investigate the practical effectiveness of GPU-based approaches for accelerating such nearest neighbor queries. Our experiments indicate that carefully tuned implementations of spatial search structures for such multi-core devices can significantly reduce the practical runtime, which makes the resulting frameworks an important algorithmic tool for current and upcoming data analyses in astronomy.
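To make the phrase "nearest neighbor queries implemented via spatial data structures" concrete, the following is a minimal CPU-only sketch of a k-d tree query checked against brute-force search. It is not the GPU implementation studied in the paper. The function names (`build_kdtree`, `nearest`, `brute_force`), the four-dimensional random data, and the set sizes are assumptions chosen purely for illustration.

```python
import random

def build_kdtree(points, depth=0):
    """Recursively build a k-d tree as nested tuples (point, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])            # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                   # median split keeps the tree balanced
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, query, depth=0, best=None):
    """Return (squared distance, point) for the nearest neighbor of `query`."""
    if node is None:
        return best
    point, left, right = node
    d = sq_dist(point, query)
    if best is None or d < best[0]:
        best = (d, point)
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Visit the far subtree only if the splitting plane is closer
    # than the best candidate found so far.
    if diff ** 2 < best[0]:
        best = nearest(far, query, depth + 1, best)
    return best

def brute_force(points, query):
    """Reference answer: scan every training pattern."""
    return min((sq_dist(p, query), p) for p in points)

# Hypothetical data: random 4-dimensional patterns standing in for catalog features.
random.seed(0)
train = [tuple(random.random() for _ in range(4)) for _ in range(2000)]
tests = [tuple(random.random() for _ in range(4)) for _ in range(100)]
tree = build_kdtree(train)
```

A call such as `nearest(tree, tests[0])` returns the same squared distance as `brute_force(train, tests[0])` while typically inspecting only a fraction of the training patterns. The pruning test on the splitting plane is what gives spatial search structures their advantage over a full scan in low-dimensional feature spaces.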

Keywords

Test Pattern, Test Instance, Neighbor Query, Neighbor Model, Test Query
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Justin Heinermann (1)
  • Oliver Kramer (1)
  • Kai Lars Polsterer (2)
  • Fabian Gieseke (3)
  1. Department of Computing Science, University of Oldenburg, Oldenburg, Germany
  2. Faculty of Physics and Astronomy, Ruhr-University Bochum, Bochum, Germany
  3. Department of Computer Science, University of Copenhagen, Copenhagen, Denmark