Efficient Astronomical Data Condensation Using Fast Nearest Neighbors Search

  • Szymon Łukasik
  • Konrad Lalik
  • Piotr Sarna
  • Piotr A. Kowalski
  • Małgorzata Charytanowicz
  • Piotr Kulczycki
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 945)


Analyzing astronomical observations is one of the most challenging tasks in data exploration, largely because of the volume of data acquired with advanced observational tools. Other challenges typical of Big Data problems, such as data variety, are also present, but dataset size is the most significant obstacle to visualization and subsequent analysis. The paper studies an efficient data condensation algorithm that aims to provide a compact representation of the data. The algorithm relies on fast nearest neighbor calculation using tree structures and parallel processing. A preliminary study of the proposed approach's properties is carried out on astronomical datasets related to the GAIA mission. The study concludes that the introduced technique might serve as a scalable way of alleviating the problem of dataset size.
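The abstract describes condensation driven by fast nearest neighbor queries over tree structures. The block below is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a density-based multiscale condensation rule in the style of Mitra, Murthy and Pal: compute each point's distance r_k to its k-th nearest neighbor, keep the point with the smallest r_k, discard every point within 2 r_k of it, and repeat. A small hand-written k-d tree stands in for the optimized tree libraries a real system would use. The function names (`build_kdtree`, `kth_neighbor_distance`, `within_radius`, `condense`) and the parameter `k` are hypothetical.

```python
import heapq
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence

Points = Sequence[Sequence[float]]  # equal-length coordinate tuples


@dataclass
class KDNode:
    idx: int                   # index of the splitting point in `points`
    axis: int                  # coordinate used for the split at this level
    left: Optional["KDNode"]   # points whose `axis` coordinate is <= the split value
    right: Optional["KDNode"]  # points whose `axis` coordinate is >= the split value


def build_kdtree(points: Points, indices: List[int], depth: int = 0) -> Optional[KDNode]:
    """Build a balanced k-d tree by splitting on the median, cycling through axes."""
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return KDNode(indices[mid], axis,
                  build_kdtree(points, indices[:mid], depth + 1),
                  build_kdtree(points, indices[mid + 1:], depth + 1))


def kth_neighbor_distance(points: Points, root: Optional[KDNode], q_idx: int, k: int) -> float:
    """Distance from points[q_idx] to its k-th nearest neighbor (the point itself excluded)."""
    q = points[q_idx]
    heap: List[tuple] = []  # max-heap of the k best candidates, stored as (-distance, index)

    def visit(node: Optional[KDNode]) -> None:
        if node is None:
            return
        p = points[node.idx]
        if node.idx != q_idx:
            d = math.dist(q, p)
            if len(heap) < k:
                heapq.heappush(heap, (-d, node.idx))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, node.idx))
        diff = q[node.axis] - p[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Every point on the far side is at least |diff| away, so that side is
        # skipped once k candidates closer than |diff| have been found.
        if len(heap) < k or abs(diff) < -heap[0][0]:
            visit(far)

    visit(root)
    return -heap[0][0]


def within_radius(points: Points, root: Optional[KDNode], center: Sequence[float], r: float) -> List[int]:
    """Indices of all points at Euclidean distance <= r from `center`."""
    found: List[int] = []

    def visit(node: Optional[KDNode]) -> None:
        if node is None:
            return
        p = points[node.idx]
        if math.dist(center, p) <= r:
            found.append(node.idx)
        diff = center[node.axis] - p[node.axis]
        if diff <= r:   # the left subtree can still hold points within r
            visit(node.left)
        if diff >= -r:  # likewise for the right subtree
            visit(node.right)

    visit(root)
    return found


def condense(points: Points, k: int) -> List[int]:
    """Hypothetical Mitra-style condensation; returns indices of the retained points."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(points)")
    root = build_kdtree(points, list(range(n)))
    # r_k for every point, computed once on the full data set (an assumption of
    # this sketch). The values are independent, so this loop parallelizes trivially.
    radii = [kth_neighbor_distance(points, root, i, k) for i in range(n)]
    removed = [False] * n
    selected: List[int] = []
    # A small r_k marks a locally dense point, so dense regions are visited first.
    for i in sorted(range(n), key=lambda j: (radii[j], j)):
        if removed[i]:
            continue
        selected.append(i)
        # Discard every point inside the disc of radius 2 * r_k around the representative.
        for j in within_radius(points, root, points[i], 2 * radii[i]):
            removed[j] = True
    return selected
```

For instance, `condense(points, k=8)` on a list of `(x, y)` tuples returns the indices of the retained representatives. Larger values of `k` generally give larger radii and therefore a coarser, smaller condensed set. A production version would more likely rely on an optimized tree library (for example SciPy's `cKDTree`) and distribute the per-point radius computation across worker processes.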


Keywords: Big Data, Astronomy, Data reduction



This work was partially financed by the statutory tasks of the Faculty of Physics and Applied Computer Science, AGH UST, within the subsidy of the Ministry of Science and Higher Education.

The study was also supported in part by the PL-Grid Infrastructure.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Szymon Łukasik (1, 2)
  • Konrad Lalik (1)
  • Piotr Sarna (1)
  • Piotr A. Kowalski (1, 2)
  • Małgorzata Charytanowicz (2, 3)
  • Piotr Kulczycki (1, 2)

  1. Faculty of Physics and Applied Computer Science, AGH University of Science and Technology, Kraków, Poland
  2. Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
  3. Faculty of Electrical Engineering and Computer Science, Lublin University of Technology, Lublin, Poland
