Abstract
Today's astronomical catalogs contain patterns for hundreds of millions of objects, with data volumes in the terabyte range. Upcoming projects will gather such patterns for several billion objects, amounting to peta- and exabytes of data. From a machine learning point of view, these settings often yield unsupervised, semi-supervised, or fully supervised tasks with large training sets and huge test sets. Recent studies have demonstrated the effectiveness of prototype-based learning schemes such as simple nearest neighbor models. However, although these models are among the most computationally efficient methods for such settings (if implemented via spatial data structures), applying them to all remaining patterns in a given catalog can easily take hours or even days. In this work, we investigate the practical effectiveness of GPU-based approaches for accelerating such nearest neighbor queries in this context. Our experiments indicate that carefully tuned implementations of spatial search structures for such multi-core devices can significantly reduce the practical runtime. This makes the resulting frameworks an important algorithmic tool for current and upcoming data analyses in astronomy.
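To make the notion of nearest neighbor queries "implemented via spatial data structures" concrete, the following is a minimal CPU-only sketch of a k-d tree with a k-nearest-neighbor query in plain Python. It is not the GPU implementation studied in the paper; the function names (`build_kdtree`, `knn_query`, `brute_force`), the tuple-based tree layout, and the four-dimensional random points standing in for photometric features are all assumptions made for illustration.

```python
import heapq
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, depth=0):
    """Build a k-d tree as nested tuples (point, left, right); None is empty."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # median point splits the subtree
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def knn_query(node, query, k, depth=0, heap=None):
    """Return the k nearest neighbors as sorted (squared distance, point)."""
    top_level = heap is None
    if top_level:
        heap = []                          # max-heap via negated distances
    if node is not None:
        point, left, right = node
        d = sq_dist(point, query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, point))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, point))
        axis = depth % len(query)
        diff = query[axis] - point[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        knn_query(near, query, k, depth + 1, heap)
        # Visit the far side only if the splitting plane is closer than
        # the current k-th nearest neighbor (or fewer than k were found).
        if len(heap) < k or diff * diff < -heap[0][0]:
            knn_query(far, query, k, depth + 1, heap)
    if top_level:
        return sorted((-neg_d, p) for neg_d, p in heap)
    return heap


def brute_force(points, query, k):
    """Reference answer: scan every point (hypothetical helper for checks)."""
    return sorted((sq_dist(p, query), p) for p in points)[:k]


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical stand-in for a catalog: 1000 objects with 4 features each.
    catalog = [tuple(random.random() for _ in range(4)) for _ in range(1000)]
    tree = build_kdtree(catalog)
    query = (0.5, 0.5, 0.5, 0.5)
    for dist, point in knn_query(tree, query, k=3):
        print(dist, point)
```

The pruning test in `knn_query` is what lets a tree-based search skip most of the catalog for each query, whereas `brute_force` touches every point. Each query is independent of the others, which is the property that makes batches of such queries a natural candidate for parallel execution on a GPU.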
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Heinermann, J., Kramer, O., Polsterer, K.L., Gieseke, F. (2013). On GPU-Based Nearest Neighbor Queries for Large-Scale Photometric Catalogs in Astronomy. In: Timm, I.J., Thimm, M. (eds) KI 2013: Advances in Artificial Intelligence. KI 2013. Lecture Notes in Computer Science, vol 8077. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40942-4_8
DOI: https://doi.org/10.1007/978-3-642-40942-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40941-7
Online ISBN: 978-3-642-40942-4
eBook Packages: Computer Science (R0)