UCAmI 2017: Ubiquitous Computing and Ambient Intelligence, pp 840-852
Distributed Unsupervised Clustering for Outlier Analysis in the Biggest Milky Way Survey: ESA Gaia Mission
Abstract
The Gaia mission (ESA) is collecting huge amounts of information about the objects that populate our Galaxy and beyond. Such data must be processed and analyzed before being released, and this work is carried out by the Data Processing and Analysis Consortium (DPAC) through several work packages. One of these packages is Outlier Analysis, devoted to the study, by means of unsupervised clustering, of all the objects that cannot be fitted into any of the existing models. An algorithm based on optimized Self-Organized Maps (SOM) is proposed and implemented to take advantage of distributed computing platforms, namely the MapReduce paradigm on Apache Hadoop, and Apache Spark. Finally, the processing times of the sequential implementation of the algorithm are compared with those of the Hadoop-based and Spark-based implementations.
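The abstract does not spell out how the SOM training is distributed, so the following is only a minimal sketch of one common way to do it, assuming the batch SOM formulation with a Gaussian neighbourhood. It is not the authors' optimized algorithm. Each data partition is mapped to per-neuron partial sums, the partials are reduced, and the weights are updated once per epoch. All function names (`bmu`, `map_partition`, `reduce_partials`, `train_epoch`) and the parameter `sigma` are hypothetical, and plain Python lists stand in for Hadoop or Spark partitions.

```python
import math
from collections import defaultdict


def bmu(weights, x):
    """Index of the neuron closest to sample x (squared Euclidean distance)."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))


def map_partition(weights, partition):
    """Map step: per-BMU sums of samples and hit counts for one partition."""
    dim = len(weights[0])
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for x in partition:
        c = bmu(weights, x)
        sums[c] = [s + v for s, v in zip(sums[c], x)]
        counts[c] += 1
    return dict(sums), dict(counts)


def reduce_partials(partials, dim):
    """Reduce step: merge the partial sums and counts of all partitions."""
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for part_sums, part_counts in partials:
        for c, vec in part_sums.items():
            sums[c] = [a + b for a, b in zip(sums[c], vec)]
            counts[c] += part_counts[c]
    return sums, counts


def train_epoch(weights, grid, partitions, sigma):
    """One batch-SOM epoch: map every partition, reduce, update weights.

    grid[j] is the (row, col) position of neuron j on the map;
    sigma is the width of the assumed Gaussian neighbourhood.
    """
    dim = len(weights[0])
    partials = [map_partition(weights, p) for p in partitions]
    sums, counts = reduce_partials(partials, dim)
    new_weights = []
    for j, w in enumerate(weights):
        num, den = [0.0] * dim, 0.0
        for c, n in counts.items():
            d2 = (grid[j][0] - grid[c][0]) ** 2 + (grid[j][1] - grid[c][1]) ** 2
            h = math.exp(-d2 / (2.0 * sigma ** 2))
            num = [a + h * b for a, b in zip(num, sums[c])]
            den += h * n
        # Keep the old weight if no sample influenced this neuron.
        new_weights.append([a / den for a in num] if den > 0 else list(w))
    return new_weights
```

Because the update depends only on sums and counts, the result should not depend on how the data is partitioned (up to floating-point rounding), which is what makes this formulation a natural fit for the map and reduce phases mentioned in the abstract.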
Keywords
Computational Astrophysics; Fast Self-Organized Maps; Parallel computing; Map-reduce; Apache Hadoop; Apache Spark; Remote sensing
Acknowledgements
This work was supported by the Spanish FEDER through grants ESP2016-80079-C2-2-R and ESP2014-55996-C2-2-R.