Abstract
Astronomy, one of the oldest observational sciences, has accumulated a vast amount of data over the ages. In recent times, it has been experiencing a huge data surge driven by advances in telescope technology with automated digital outputs. The main aim of this article is to present the Machine Learning (ML) algorithms and big data frameworks or tools that are being applied, or could be employed, in the analysis of large astronomical data sets to help astronomers solve a range of important problems. Throughout this survey, we review, evaluate and summarize diverse astronomical data sources, and examine the structure and complexity of the data and the challenges in processing them. Additionally, we discuss the technologies being developed to handle and process this voluminous data. We also look at activities being carried out around the world that enrich this domain. While going through the existing literature, we found only a limited number of comprehensive studies that analyze astronomy data sets from the viewpoint of parallel processing and machine learning together. This motivated us to undertake this extensive literature review, outlining up-to-date contributions and opportunities in this area. In addition, this article briefly discusses a cloud-based machine learning approach to estimating the redshifts of extragalactic objects using photometric data as input features. As the intersection of big data, machine learning and astronomy is a fairly new paradigm, this article aims to raise awareness among interested young scientists for future research and to provide insight into how these algorithms and tools are becoming indispensable to the astronomy community.
References
Kremer, J., Stensbo-Smidt, K., Gieseke, F., Pedersen, K.S., Igel, C.: Big universe, big data: Machine learning and image analysis for astronomy. IEEE Intell. Syst. https://doi.org/10.1109/mis.2017.40 (2017)
Tallada, P., Carretero, J., Casals, J., Acosta-Silva, C., Serrano, S., Caubet, M., Castander, F.J., César, E., Crocce, M., Delfino, M., et al.: Cosmohub: Interactive exploration and distribution of astronomical data on hadoop. Astron. Comput., 100391 (2020)
Baron, D.: Machine Learning in Astronomy: a practical overview (2019)
Borne, K.D.: Astroinformatics: A 21st century approach to astronomy (2009)
Wang, K., Guo, P., Yu, F.: Computational intelligence in astronomy: A survey. Int. J. Comput. Intell. Syst. 11, 575–590 (2018)
Fluke, C.J., Jacobs, C.: Surveying the reach and maturity of machine learning and artificial intelligence in astronomy. Wiley Interdisc. Rev. Data Mining Knowl. Discov 10(2), 1349 (2020)
Bird, J., Petzold, L., Lubin, P., Deacon, J.: Advances in deep space exploration via simulators & deep learning. New Astron. 84, 101517 (2021)
Ntampaka, M., Avestruz, C., Boada, S., Caldeira, J., Cisewski-Kehe, J., Di Stefano, R., Dvorkin, C., Evrard, A.E., Farahi, A., Finkbeiner, D., et al.: The role of machine learning in the next decade of cosmology. arXiv:1902.10159 (2019)
Carleo, G., Cirac, I., Cranmer, K., Daudet, L., Schuld, M., Tishby, N., Vogt-Maranto, L., Zdeborová, L.: Machine learning and the physical sciences. Rev. Modern Phys. 91(4), 045002 (2019)
Navamani, T.: Efficient deep learning approaches for health informatics. In: Deep Learning and Parallel Computing Environment for Bioengineering Systems, pp 123–137. Elsevier (2019)
Verbraeken, J., Wolting, M., Katzy, J., Kloppenburg, J., Verbelen, T., Rellermeyer, J.S.: A survey on distributed machine learning. ACM Comput. Surv. (CSUR) 53(2), 1–33 (2020)
Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: A survey. J. Artif. Intell. Res. 4, 237–285 (1996)
Mehta, P., Bukov, M., Wang, C.-H., Day, A.G., Richardson, C., Fisher, C.K., Schwab, D.J.: A high-bias, low-variance introduction to machine learning for physicists. Phys. Rep. 810, 1–124 (2019)
Ball, N.M., Brunner, R.J.: Data mining and machine learning in astronomy. Int. J. Modern Phys. D 19(07), 1049–1106 (2010). https://doi.org/10.1142/s0218271810017160
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
C.J., B.: A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Disc. 2, 121–167 (1998)
Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (2013)
Cristianini, N., Shawe-Taylor, J., et al.: An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, Cambridge (2000)
Kecman, V.: Learning and Soft Computing: Support Vector Machines, Neural Networks, and Fuzzy Logic Models. MIT Press, Cambridge (2001)
Schölkopf, B., Smola, A.J., Bach, F., et al.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge (2002)
Abe, S.: Support Vector Machines for Pattern Classification, vol. 2. Springer, New York (2005)
Lin, C.-F., Wang, S.-D.: Fuzzy support vector machines with automatic membership setting. Support vector machines: Theory and applications, 233–254 (2005)
Steinwart, I., Christmann, A.: Support Vector Machines. Springer, New York (2008)
Fix, E.: Discriminatory Analysis: Nonparametric Discrimination, Consistency Properties. USAF School of Aviation Medicine (1951)
Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1), 21–27 (1967)
Aha, D.W., Kibler, D., Albert, M.K.: Instance-based learning algorithms. Mach. Learn. 6(1), 37–66 (1991)
Dasarathy, B.V.: Nearest neighbor (nn) norms: Nn pattern classification techniques. IEEE Comput. Soc. Tutorial (1991)
Shakhnarovich, G., Darrell, T., Indyk, P.: Nearest-neighbor methods in learning and vision. IEEE Trans. Neural Netw. 19(2), 377 (2008)
Beitia-Antero, L., Yáñez, J., de Castro, A.I.G.: On the use of logistic regression for stellar classification. Exp. Astron. 45(3), 379–395 (2018)
Carliles, S., Budavári, T., Heinis, S., Priebe, C., Szalay, A.S.: Random forests for photometric redshifts. Astrophys. J. 712(1), 511 (2010)
Baron, D., Poznanski, D.: The weirdest sdss galaxies: results from an outlier detection algorithm. Mon. Not. R. Astron. Soc. 465(4), 4530–4555 (2017)
Cao, H., Bastieri, D., Rando, R., Urso, G., Luo, G., Paccagnella, A.: Machine learning on compton event identification for a nano-satellite mission. Exp. Astron. 47(1), 129–144 (2019)
Steinhaus, H.: Sur la division des corps materiels en parties. bull. acad. polon. sci., c1. iii vol iv: 801-804 (1956)
MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp 281–297, Oakland, CA, USA (1967)
Ward Jr, J.H.: Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58(301), 236–244 (1963)
Parzen, E.: On estimation of a probability density function and mode. Ann Math. Stat. 33(3), 1065–1076 (1962)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification and Scene Analysis, vol. 3. Wiley, New York (1973)
Silverman, B.W.: Density Estimation for Statistics and Data Analysis, vol. 26. CRC Press, Boca Raton (1986)
Scott, D.W.: Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons, New York (2015)
Taylor, C.: Classification and kernel density estimation. Vistas Astron. 41(3), 411–417 (1997)
Wasserman, L.: All of Statistics: a Concise Course in Statistical Inference. Springer, New York (2013)
Klemelä, J.S.: Smoothing of Multivariate Data: Density Estimation and Visualization, vol. 737. John Wiley & Sons, New York (2009)
Kohonen, T.: Self-organized formation of topologically correct feature maps. Biol. Cybern. 43(1), 59–69 (1982)
Kohonen, T.: An overview of som literature. In: Self-Organizing Maps, pp 347–371. Springer (2001)
Galvin, T.J., Huynh, M., Norris, R.P., Wang, X.R., Hopkins, E., Wong, O., Shabala, S., Rudnick, L., Alger, M.J., Polsterer, K.L.: Radio galaxy zoo: Knowledge transfer using rotationally invariant self-organizing maps. Publ. Astron. Soc. Pac. 131(1004), 108009 (2019)
Wilson, D., Nayyeri, H., Cooray, A., Häußler, B.: Photometric redshift estimation with galaxy morphology using self-organizing maps. Astrophys. J. 888(2), 83 (2020)
Gomes, Z., Jarvis, M.J., Almosallam, I.A., Roberts, J.S.: Improving photometric redshift estimation using gpz: size information, post processing, and improved photometry. Mon. Not. R. Astron. Soc. 475, 331–342 (2018)
Boroson, T.A., Green, R.F.: The emission-line properties of low-redshift quasi-stellar objects. Astrophys. J. Suppl. Ser. 80, 109–135 (1992)
Djorgovski, S.: The fundamental plane correlations for globular clusters. Astrophys. J. 438, 29–32 (1995)
Govada, A., Sahay, S.K.: A communication efficient and scalable distributed data mining for the astronomical data. Astron. Comput. 16, 166–173 (2016)
Collister, A.A., Lahav, O.: Annz: estimating photometric redshifts using artificial neural networks. Publ. Astron Soc Pac. 116(818), 345 (2004)
Sadeh, I., Abdalla, F.B., Lahav, O.: Annz2: photometric redshift and probability distribution function estimation using machine learning. Publ. Astron Soc Pac. 128(968), 104502 (2016)
Angel, J.R.P., Wizinowich, P., Lloyd-Hart, M., Sandler, D.: Adaptive optics for array telescopes using neural-network techniques. Nature 348(6298), 221–224 (1990)
Bazell, D., Peng, Y.: A comparison of neural network algorithms and preprocessing methods for star-galaxy discrimination. Astrophys. J. Suppl. Ser. 116(1), 47 (1998)
Andrešič, D., Šaloun, P., Pečíková, B.: Large astronomical time series pre-processing for classification using artificial neural networks. In: Intelligent Astrophysics, pp 265–293. Springer (2021)
Barrientos, A., Holdship, J., Solar, M., Martín, S., Rivilla, V.M., Viti, S., Mangum, J., Harada, N., Sakamoto, K., Muller, S., et al.: Towards the prediction of molecular parameters from astronomical emission lines using neural networks. Exp. Astron., 1–26 (2021)
Hinners, T.A., Tat, K., Thorp, R.: Machine learning techniques for stellar light curve classification. Astron. J. 156(1), 7 (2018)
Muthukrishna, D., Lochner, M., Webb, S.: Real-time detection of anomalies in large-scale transient surveys (2019)
Barchi, P., de Carvalho, R., Rosa, R., Sautter, R., Soares-Santos, M., Marques, B., Clua, E., Gonçalves, T., de Sá-Freitas, C., Moura, T.: Machine and deep learning applied to galaxy morphology-a comparative study. Astron. Comput. 30, 100334 (2020)
González, R.E., Munoz, R.P., Hernández, C.A.: Galaxy detection and identification using deep learning and data augmentation. Astron. Comput. 25, 103–109 (2018)
Cacho Martínez, R.: Distant galaxies analysis with deep neural networks. http://hdl.handle.net/10609/107807 (2020)
Hoyle, B., Rau, M.M., Bonnett, C., Seitz, S., Weller, J.: Data augmentation for machine learning redshifts applied to sloan digital sky survey galaxies. Mon. Not. R. Astron. Soc. 450(1), 305–316 (2015)
Iten, R., Metger, T., Wilming, H., Del Rio, L., Renner, R.: Discovering physical concepts with neural networks. Phys. Rev. Lett. 124(1), 010508 (2020)
Sedaghat, N., Romaniello, M., Carrick, J.E., Pineau, F.-X.: Machines learn to infer stellar parameters just by looking at a large number of spectra. Mon. Not. R. Astron. Soc. 501(4), 6026–6041 (2021)
Mu, Y.-H., Qiu, B., Zhang, J.-N., Ma, J.-C., Fan, X.-D.: Photometric redshift estimation of galaxies with convolutional neural network. Res. Astron. Astrophys. 20(6), 089 (2020)
Schawinski, K., Turp, D., Zhang, C.: Exploring galaxy evolution with latent space walks. AAS 231, 309–01 (2018)
Ribli, D., Pataki, B.Á., Zorrilla Matilla, J.M., Hsu, D., Haiman, Z., Csabai, I.: Weak lensing cosmology with convolutional neural networks on noisy data. Mon. Not. R. Astron. Soc. 490(2), 1843–1860 (2019)
Yue, Y., Cao, Z., Gu, H., Wang, X.: Dynamic simulation and parameter fitting method of cometary dust based on machine learning. Exp. Astro, 1–34 (2021)
Hezaveh, Y.D., Levasseur, L.P., Marshall, P.J.: Fast automated analysis of strong gravitational lenses with convolutional neural networks. Nature 548(7669), 555–557 (2017)
Pearson, J., Pennock, C., Robinson, T.: Auto-detection of strong gravitational lenses using convolutional neural networks. Emergent Sci. 2, 1 (2018)
Schaefer, C., Geiger, M., Kuntzer, T., Kneib, J.-P.: Deep convolutional neural networks as strong gravitational lens detectors. Astron. Astrophys. 611, 2 (2018)
Lanusse, F., Ma, Q., Li, N., Collett, T.E., Li, C.-L., Ravanbakhsh, S., Mandelbaum, R., Póczos, B.: Cmu deeplens: deep learning for automatic image-based galaxy–galaxy strong lens finding. Mon. Not. R. Astron. Soc. 473(3), 3895–3906 (2018)
Sedaghat, N., Mahabal, A.: Effective image differencing with convolutional neural networks for real-time transient hunting. Mon. Not. R. Astron. Soc. 476(4), 5365–5376 (2018)
Sadeh, I.: Deep learning detection of transients. arXiv:1902.03620 (2019)
Mong, Y.-L., Ackley, K., Galloway, D., Killestein, T., Lyman, J., Steeghs, D., Dhillon, V., O’Brien, P., Ramsay, G., Poshyachinda, S., et al.: Machine learning for transient recognition in difference imaging with minimum sampling effort. Mon. Not. R. Astron. Soc. 499(4), 6009–6017 (2020)
Agrawal, S., Basak, S., Saha, S., Rosario-Franco, M., Routh, S., Bora, K., Theophilus, A.J.: A comparative study in classification methods of exoplanets: Machine learning exploration via mining and automatic labeling of the habitability catalog (2015)
Basak, S., Agrawal, S., Saha, S., Theophilus, A.J., Bora, K., Deshpande, G., Murthy, J.: Habitability classification of exoplanets: a machine learning insight. arXiv:1805.08810 (2018)
Viquar, M., Basak, S., Dasgupta, A., Agrawal, S., Saha, S.: Machine learning in astronomy: A case study in quasar-star classification. In: Emerging Technologies in Data Mining and Information Security, pp 827–836. Springer (2019)
Saha, S., Nagaraj, N., Mathur, A., Yedida, R.: Evolution of novel activation functions in neural network training with applications to classification of exoplanets. arXiv:1906.01975 (2019)
Saha, S., Mathur, A., Bora, K., Agrawal, S., Basak, S.: Sbaf: A new activation function for artificial neural net based habitability classification. arXiv:1806.01844 (2018)
Bora, K., Saha, S., Agrawal, S., Safonova, M., Routh, S., Narasimhamurthy, A.: Cd-hpf: New habitability score via data analytic modeling. Astron. Comput. 17, 129–143 (2016)
Theophilus, A., Saha, S., Basak, S., Murthy, J.: A novel exoplanetary habitability score via particle swarm optimization of ces production functions. In: 2018 IEEE Symposium Series on Computational Intelligence (SSCI), pp 2139–2147. IEEE (2018)
Saha, S., Basak, S., Safonova, M., Bora, K., Agrawal, S., Sarkar, P., Murthy, J.: Theoretical validation of potential habitability via analytical and boosted tree methods: An optimistic study on recently discovered exoplanets. Astron. Comput. 23, 141–150 (2018)
Khaidem, L., Saha, S., Kar, S., Saha, S., Basak, S.: Quantifying exoplanet habitability via penalized multi-objective optimization (2019)
Basak, S., Saha, S., Mathur, A., Bora, K., Makhija, S., Safonova, M., Agrawal, S.: Ceesa meets machine learning: A constant elasticity earth similarity approach to habitability and classification of exoplanets. Astron. Comput. 30, 100335 (2020)
Heitmann, K., Bingham, D., Lawrence, E., Bergner, S., Habib, S., Higdon, D., Pope, A., Biswas, R., Finkel, H., Frontiere, N., et al.: The mira–titan universe: precision predictions for dark energy surveys. Astrophys. J. 820(2), 108 (2016)
Varma, V., Field, S.E., Scheel, M.A., Blackman, J., Kidder, L.E., Pfeiffer, H.P.: Surrogate model of hybridized numerical relativity binary black hole waveforms. Phys. Rev. D 99(6), 064045 (2019)
Ford, E.B., Moorhead, A.V., Veras, D., et al.: A bayesian surrogate model for rapid time series analysis and application to exoplanet observations. Bayesian Anal. 6(3), 475–499 (2011)
Khan, S., Green, R.: Gravitational-wave surrogate models powered by artificial neural networks: The ann-sur for waveform generation. arXiv:2008.12932 (2020)
Aricò, G., Angulo, R.E., Hernández-Monteagudo, C., Contreras, S., Zennaro, M., Pellejero-Ibañez, M., Rosas-Guevara, Y.: Modelling the large-scale mass density field of the universe as a function of cosmology and baryonic physics. Mon. Not. R. Astron. Soc. 495(4), 4800–4819 (2020)
Blanchard, A., Camera, S., Carbone, C., Cardone, V., Casas, S., Clesse, S., Ilić, S., Kilbinger, M., Kitching, T., Kunz, M., et al.: Euclid preparation-vii. forecast validation for euclid cosmological probes. Astron. Astrophys. 642, 191 (2020)
Collaboration, E., Knabenhans, M., Stadel, J., Marelli, S., Potter, D., Teyssier, R., Legrand, L., Schneider, A., Sudret, B., Blot, L., et al.: Euclid preparation: Ii. the euclidemulator–a tool to compute the cosmology dependence of the nonlinear matter power spectrum. Mon. Not. R. Astron. Soc. 484(4), 5509–5529 (2019)
Skilling, J., et al.: Nested sampling for general bayesian computation. Bayesian Anal. 1(4), 833–859 (2006)
Feroz, F., Hobson, M.P.: Multimodal nested sampling: an efficient and robust alternative to markov chain monte carlo methods for astronomical data analyses. Mon. Not. R. Astron. Soc. 384(2), 449–463 (2008)
Speagle, J.S.: dynesty: a dynamic nested sampling package for estimating bayesian posteriors and evidences. Mon. Not. R. Astron. Soc. 493(3), 3132–3158 (2020)
Graff, P., Feroz, F., Hobson, M.P., Lasenby, A.: Neural networks for astronomical data analysis and bayesian inference. In: 2013 IEEE 13th International Conference on Data Mining Workshops, pp 16–23. IEEE (2013)
Higson, E., Handley, W., Hobson, M., Lasenby, A.: Dynamic nested sampling: an improved algorithm for parameter estimation and evidence calculation. Stat. Comput. 29(5), 891–913 (2019)
Brewer, B.J., Pártay, L.B., Csányi, G.: Diffusive nested sampling. Stat. Comput. 21(4), 649–656 (2011)
Akeret, J., Refregier, A., Amara, A., Seehars, S., Hasner, C.: Approximate bayesian computation for forward modeling in cosmology. J. Cosmol. Astropart. Phys. 2015(08), 043 (2015)
Taylor, P.L., Kitching, T.D., Alsing, J., Wandelt, B.D., Feeney, S.M., McEwen, J.D.: Cosmic shear: Inference from forward models. Phys. Rev. D 100(2), 023519 (2019)
Savage, R.S., Oliver, S.: Bayesian methods of astronomical source extraction. Astrophys. J. 661(2), 1339 (2007)
Rogers, K.K., Peiris, H.V., Pontzen, A., Bird, S., Verde, L., Font-Ribera, A.: Bayesian emulator optimisation for cosmology: application to the lyman-alpha forest. J. Cosmol. Astropart. Phys. 2019(02), 031 (2019)
Ishida, E., Vitenti, S., Penna-Lima, M., Cisewski, J., de Souza, R., Trindade, A., Cameron, E., Busti, V., collaboration, C., et al.: Cosmoabc: likelihood-free inference via population monte carlo approximate bayesian computation. Astron. Comput. 13, 1–11 (2015)
Cameron, E., Pettitt, A.: Approximate bayesian computation for astronomical model analysis: a case study in galaxy demographics and morphological transformation at high redshift. Mon. Not. R. Astron. Soc. 425(1), 44–65 (2012)
Leclercq, F.: Bayesian optimization for likelihood-free cosmological inference. Phys. Rev. D 98(6), 063511 (2018)
Pellejero-Ibañez, M., Angulo, R.E., Aricó, G., Zennaro, M., Contreras, S., Stücker, J: Cosmological parameter estimation via iterative emulation of likelihoods. Mon. Not. R. Astron. Soc. 499(4), 5257–5268 (2020)
Gao, G., Jiang, H., Vink, J.C., Chen, C., El Khamra, Y., Ita, J.J.: Gaussian mixture model fitting method for uncertainty quantification by conditioning to production data. Comput. Geosc, 1–19 (2019)
Kristiadi, A., Däubener, S., Fischer, A.: Predictive uncertainty quantification with compound density networks. arXiv:1902.01080 (2019)
Zhu, Y., Zabaras, N.: Bayesian deep convolutional encoder–decoder networks for surrogate modeling and uncertainty quantification. J. Comput. Phys. 366, 415–447 (2018)
Goyal, J.M., Wakeford, H.R., Mayne, N.J., Lewis, N.K., Drummond, B., Sing, D.K: Fully scalable forward model grid of exoplanet transmission spectra. Mon. Not. R. Astron. Soc. 482(4), 4503–4513 (2019)
Schmidt, F., Elsner, F., Jasche, J., Nguyen, N.M., Lavaux, G.: A rigorous eft-based forward model for large-scale structure. J. Cosmol. Astropart. Phys. 2019(01), 042 (2019)
Bailer-Jones, C.A: The ilium forward modelling algorithm for multivariate parameter estimation and its application to derive stellar parameters from gaia spectrophotometry. Mon. Not. R. Astron. Soc. 403(1), 96–116 (2010)
Sartori, L.F., Trakhtenbrot, B., Schawinski, K., Caplar, N., Treister, E., Zhang, C.: A forward modeling approach to agn variability–method description and early applications. Astrophys. J. 883(2), 139 (2019)
Hu, F.-M., Jiang, M.-H: The fuzzy classification of the solar cycle and the prediction for the 22nd solar cycle. ChJSS 5, 237–244 (1985)
Metcalfe, T.S: Genetic-algorithm-based light-curve optimization applied to observations of the w ursae majoris star bh cassiopeiae. Astron. J 117 (5), 2503 (1999)
Ordóñez, D., Dafonte, C., Manteiga, M., Arcay, B.: Parameterization of rvs synthetic stellar spectra for the esa gaia mission: Study of the optimal domain for ann training. Expert Syst. Appl. 37(2), 1719–1727 (2010)
Spiekermann, G.: Automated morphological classification of faint galaxies. In: Digitised Optical Sky Surveys, pp 209–213. Springer (1992)
Dumitrescu, A., Pop, A., Dumitrescu, D.: Structural properties of pulsating star light curves through fuzzy divisive hierarchical clustering. Astrophys. Space Sci. 250(2), 205–226 (1997)
Rodrıéguez, A., Arcay, B., Dafonte, C., Manteiga, M., Carricajo, I.: Automated knowledge-based analysis and classification of stellar spectra using fuzzy reasoning. Expert Syst. Appl. 27(2), 237–244 (2004)
Liu, Z.-B., Gao, Y.-Y., Wang, J.-Z., et al.: Automatic classification method of star spectra data based on manifold fuzzy twin support vector machine. Spectrosc. Spectr. Anal. 35(1), 263–266 (2015)
Revathy, K., Lekshmi, S., Nayar, S.P: Fractal-based fuzzy technique for detection of active regions from solar images. Solar Phys. 228(1-2), 43–53 (2005)
Freistetter, F.: Fuzzy characterization of near-earth-asteroids. Celest. Mech. Dyn. Astron. 104(1-2), 93–102 (2009)
Shamir, L., Nemiroff, R.J: Astronomical pipeline processing using fuzzy logic. Appl. Soft Comput. 8(1), 79–87 (2008)
Attia, A.-F: Hierarchical fuzzy controllers for an astronomical telescope tracking. Appl. Soft Comput. 9(1), 135–141 (2009)
Charbonneau, P.: Genetic algorithms in astronomy and astrophysics. Astrophys. J. Suppl. Ser. 101, 309 (1995)
Jin-shu, H.: Parameter estimation of stellar population synthesis using a combined genetic algorithm. Chin. Astron. Astrophys. 39(4), 454–465 (2015)
Oussous, A., Benjelloun, F.-Z., Lahcen, A.A., Belfkih, S.: Big data technologies: A survey. J. King Saud Univ.-Comput Inf. Sci. 30(4), 431–448 (2018)
Furht, B., Villanustre, F.: Introduction to big data. In: Big Data Technologies and Applications, pp 3–11. Springer (2016)
Kapil, G., Agrawal, A., Khan, R.: A study of big data characteristics. In: 2016 International Conference on Communication and Electronics Systems (ICCES), pp 1–4. IEEE (2016)
York, D.G., Adelman, J., Anderson Jr, J.E., Anderson, S.F., Annis, J., Bahcall, N.A., Bakken, J., Barkhouser, R., Bastian, S., Berman, E., et al.: The sloan digital sky survey: Technical summary. Astron. J. 120(3), 1579 (2000)
Alam, S., Albareti, F.D., Prieto, C.A., Anders, F., Anderson, S.F., Anderton, T., Andrews, B.H., Armengaud, E., Aubourg, É., Bailey, S., et al.: The eleventh and twelfth data releases of the sloan digital sky survey: final data from sdss-iii. Astrophys. J. Suppl. Ser. 219, 12 (2015)
Vipers:the vimos public extragalactic redshift survey. http://vipers.inaf.it (2020)
Guzzo, L., Scodeggio, M., Garilli, B., Granett, B., Fritz, A., Abbas, U., Adami, C., Arnouts, S., Bel, J., Bolzonella, M., et al.: The vimos public extragalactic redshift survey (vipers)-an unprecedented view of galaxies and large-scale structure at 0.5< z< 1.2. Astron. Astrophys. 566, 108 (2014)
Manzoni, G., Scodeggio, M., Baugh, C., Norberg, P., De Lucia, G., Fritz, A., Haines, C., Zamorani, G., Gargiulo, A., Guzzo, L., et al.: Modelling the quenching of star formation activity from the evolution of the colour-magnitude relation in vipers. New Astron. 84, 101515 (2021)
The Two Micron All Sky Survey at IPAC. https://old.ipac.caltech.edu/2mass/ (As on June, 2020)
Conselice, C., Bundy, K., Trujillo, I., Coil, A., Eisenhardt, P., Ellis, R., Georgakakis, A., Huang, J., Lotz, J., Nandra, K., et al.: The properties and evolution of a k-band selected sample of massive galaxies at z 0.4–2 in the palomar/deep2 survey. Mon. Not. R. Astron. Soc. 381(3), 962–986 (2007)
The large synoptic survey telescope. https://www.lsst.org/lsst (2020)
SKA in India: science with big data. https://asi2020.astron-soc.in/workshops/workshop3/ (As on July, 2020)
Ligolaser interferometer gravitational-wave observatory. https://www.ligo.caltech.edu (As on June, 2020)
Fevre, O.L., Cassata, P., Cucciati, O., Garilli, B., Ilbert, O., Brun, V.L., Maccagni, D., Moreau, C., Scodeggio, M., Tresse, L., et al.: The vimos vlt deep survey final data release: a spectroscopic sample of 35016 galaxies and agn out to z˜ 6.7 selected with 17.5<= i_ {AB} <= 24.7. arXiv:1307.0545 (2013)
Lawrence, A., Warren, S., Almaini, O., Edge, A., Hambly, N., Jameson, R., Lucas, P., Casali, M., Adamson, A., Dye, S., et al.: The ukirt infrared deep sky survey (ukidss). Mon. Not. R. Astron. Soc. 379, 1599–1617 (2007)
Pović, M., Huertas-Company, M., Aguerri, J., Márquez, I., Masegosa, J., Husillos, C., Molino, A., Cristóbal-Hornillos, D., Perea, J., Benítez, N., et al.: The alhambra survey: reliable morphological catalogue of 22 051 early-and late-type galaxies. Mon. Not. R. Astron. Soc. 435(4), 3444–3461 (2013)
Djorgovski, S., Gal, R., Odewahn, S., De Carvalho, R., Brunner, R., Longo, G., Scaramella, R.: The palomar digital sky survey (dposs). arXiv:astro-ph/9809187 (1998)
Lintott, C.J., Schawinski, K., Slosar, A., Land, K., Bamford, S., Thomas, D., Raddick, M.J., Nichol, R.C., Szalay, A., Andreescu, D., et al.: Galaxy zoo: morphologies derived from visual inspection of galaxies from the sloan digital sky survey. Mon. Not. R. Astron. Soc. 389(3), 1179–1189 (2008)
Cutri, R.e., Wright, E., Conrow, T., Fowler, J., Eisenhardt, P., Grillmair, C., Kirkpatrick, J., Masci, F., McCallon, H., Wheelock, S., et al.: Vizier online data catalog: Allwise data release (cutri+ 2013). VizieR Online Data Catalog, 328 (2021)
de Jong, J.T., Kleijn, G.A.V., Kuijken, K.H., Valentijn, E.A., et al.: The kilo-degree survey. Exp. Astron. 35(1-2), 25–44 (2013)
Zhang, Y., Zhao, Y.: Astronomy in the big data era. Data Sci. J. 14 (2015)
Wells, D.C., Greisen, E.W: Fits-a flexible image transport system. In: Image Processing in Astronomy, p 445 (1979)
Anderson, K., Alexov, A., Baehren, L., Grießmeier, J.-M., Wise, M., Renting, A.: Lofar and hdf5: Toward a new radio data standard. arXiv:1012.2266 (2010)
Goucher, G., Love, J., Leckner, H.: A discipline independent scientific data management package-the national space science common data format (cdf). step, 691 (1994)
Warren-Smith, R., Lawden, M., McIlwrath, B., Jenness, T., Draper, P.: Hds heirarchical data system: Programmer’s manual. Technical report, Technical Report. Council for the Central Laboratory of the Research … (2008)
Williams, R., Ochsenbein, F., Davenhall, C., Durand, D., Fernique, P., Giaretta, D., Hanisch, R., McGlynn, T., Szalay, A., Wicenec, A.: Votable: A proposed xml format for astronomical tables. CDS: Strasbourg 28 (2002)
Thomas, B., Shaya, E., Gass, J., Blackwell, J., Cheung, C.: An xml representation of fits-introducing fitsml. AAS 197, 116–03 (2000)
Greenfield, P., Droettboom, M., Bray, E.: Asdf: A new data format for astronomy. Astron. Comput. 12, 240–251 (2015)
Patidar, S., Rane, D., Jain, P.: A survey paper on cloud computing. In: 2012 Second International Conference on Advanced Computing & Communication Technologies, pp 394–398. IEEE (2012)
Berriman, G.B., Juve, G., Deelman, E., Regelson, M., Plavchan, P.: The application of cloud computing to astronomy: A study of cost and performance. In: 2010 Sixth IEEE International Conference on e-Science Workshops, pp 1–7. IEEE (2010)
Araya, M., Osorio, M., Díaz, M., Ponce, C., Villanueva, M., Valenzuela, C., Solar, M.: Jovial: Notebook-based astronomical data analysis in the cloud. Astron. Comput. 25, 110–117 (2018)
Grid computing to tackle the mystery of the dark universe. https://astronomynow.com/2016/11/26/grid-computing-to-tackle-the-mystery-of-the-dark-universe/ (As on December, 2020)
Spark. http://spark.apache.org/ (As on June, 2020)
Flume. https://flume.apache.org/ (As on June, 2020)
Apache Pig. https://pig.apache.org/ (As on June, 2020)
Apache Oozie. https://oozie.apache.org/ (As on June, 2020)
Statwing. https://www.statwing.com/ (As on June, 2020)
Stonebraker, M., Brown, P., Zhang, D., Becla, J.: Scidb: A database management system for applications with complex analytics. Comput. Sci. Eng. 15(3), 54–62 (2013)
Saha, B., Shah, H., Seth, S., Vijayaraghavan, G., Murthy, A., Curino, C.: Apache tez: A unifying framework for modeling and building data processing applications. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp 1357–1369 (2015)
Parallel supercomputing for astronomy
Liu, L., Liu, D., Lü, S., Zhang, P.: An abstract description method of map-reduce-merge using haskell. Math. Probl. Eng. 2013 (2013)
Zhou, L., Huang, M.: Challenges of software testing for astronomical big data. In: 2017 IEEE International Congress on Big Data (BigData Congress), pp 529–532. IEEE (2017)
Szalay, A.S., Gray, J., Kunszt, P., Thakar, A., Slutz, D.: Large databases in astronomy. In: Mining the Sky, pp 99–116. Springer (2001)
Brahem, M., Zeitouni, K., Yeh, L.: Astroide: a unified astronomical big data processing engine over spark. IEEE Trans. Big Data (2018)
Jacob, J.C., Katz, D.S., Miller, C.D., et al.: Grist: Grid-based Data Mining for Astronomy, Astronomical Data Analysis Software and Systems XIV, ASP Conference Series, Vol. XXX (2005)
Ivanova, M., Nes, N., Goncalves, R., Kersten, M.: Monetdb/sql meets skyserver: the challenges of a scientific database. In: 19th International Conference on Scientific and Statistical Database Management (SSDBM 2007), pp 13–13. IEEE (2007)
Juric, M.: Large survey database: A distributed framework for storage and analysis of large datasets. AAS 217, 433–19 (2011)
Wiley, K., Connolly, A., Gardner, J., Krughoff, S., Balazinska, M., Howe, B., Kwon, Y., Bu, Y.: Astronomy in the cloud: using mapreduce for image co-addition. Publ. Astron. Soc. Pac. 123(901), 366 (2011)
Brahem, M., Zeitouni, K., Yeh, L.: Hx-match: In-memory cross-matching algorithm for astronomical big data. In: International Symposium on Spatial and Temporal Databases, pp 411–415. Springer (2017)
Brahem, M., Lopes, S., Yeh, L., Zeitouni, K.: Astrospark: towards a distributed data server for big data in astronomy. In: Proceedings of the 3rd ACM SIGSPATIAL PhD Symposium, pp 1–4 (2016)
Zhang, Z., Barbary, K., Nothaft, F.A., Sparks, E.R., Zahn, O., Franklin, M.J., Patterson, D.A., Perlmutter, S.: Kira: Processing astronomy imagery using big data technology. IEEE Trans. Big Data (2016)
Zečević, P., Slater, C.T., Jurić, M., Connolly, A.J., Lončarić, S., Bellm, E.C., Golkhou, V.Z., Suberlak, K.: Axs: A framework for fast astronomical data processing based on apache spark. Astron. J. 158 (1), 37 (2019)
Garofalo, M., Botta, A., Ventre, G.: Astrophysics and big data: Challenges, methods, and tools. Proc. Int. Astron. Union 12(S325), 345–348 (2016)
Ball, N.M.: Canfar+ skytree: A cloud computing and data mining system for astronomy. arXiv:1312.3996 (2013)
Hong, S., Jeong, D., Hwang, H.S., Kim, J., Hong, S.E., Park, C., Dey, A., Milosavljevic, M., Gebhardt, K., Lee, K.-S: Constraining cosmology with big data statistics of cosmological graphs. Mon. Not. R. Astron. Soc. 493(4), 5972–5986 (2020)
Vujčić, V., Jevremović, D: Real-time stream processing in astronomy. In: Knowledge Discovery in Big Data from Astronomy and Earth Observation, pp 173–182. Elsevier (2020)
Sciacca, E., Pistagna, C., Becciani, U., Costa, A., Massimino, P., Riggi, S., Vitello, F., Bandieramonte, M., Krokos, M.: Towards a big data exploration framework for astronomical archives. In: 2014 International Conference on High Performance Computing & Simulation (HPCS). IEEE, pp 351–357 (2014)
Fillatre, L., Lepiller, D.: Processing solutions for big data in astronomy. EAS Publ. Ser. 78, 179–208 (2016)
Zhao, Q., Sun, J., Yu, C., Cui, C., Lv, L., Xiao, J.: A paralleled large-scale astronomical cross-matching function. In: International Conference on Algorithms and Architectures for Parallel Processing, pp 604–614. Springer (2009)
Mesmoudi, A., Hacid, M.-S., Toumani, F.: Benchmarking sql on mapreduce systems using large astronomy databases. Distrib. Parallel Databases 34(3), 347–378 (2016)
Peloton, J., Arnault, C., Plaszczynski, S.: Analyzing astronomical data with apache spark. arXiv:1804.07501 (2018)
Xie, D., Li, F., Yao, B., Li, G., Zhou, L., Guo, M.: Simba: Efficient in-memory spatial analytics. In: Proceedings of the 2016 International Conference on Management of Data, pp 1071–1085 (2016)
Wei, S., Wang, F., Deng, H., Liu, C., Dai, W., Liang, B., Mei, Y., Shi, C., Liu, Y., Wu, J.: Opencluster: a flexible distributed computing framework for astronomical data processing. Publ. Astron. Soc. Pac. 129(972), 024001 (2016)
Berriman, G.B., Good, J.: The application of the montage image mosaic engine to the visualization of astronomical images. Publ. Astron. Soc. Pac. 129(975), 058006 (2017)
Corizzo, R., Ceci, M., Zdravevski, E., Japkowicz, N.: Scalable auto-encoders for gravitational waves detection from time series data. Expert Syst. Appl., 113378 (2020)
Sen, S., Saha, S., Chakraborty, P., Singh, K.P: Implementation of neural network regression model for faster redshift analysis on cloud-based spark platform. In: International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, pp 591–602. Springer (2021)
Vanderplas, J.T., Connolly, A.J., Ivezić, Ž., Gray, A.: Introduction to astroml: Machine learning for astrophysics. In: 2012 Conference on Intelligent Data Understanding, pp 47–54. https://doi.org/10.1109/CIDU.2012.6382200 (2012)
Saha, S., Agrawal, S., R, M., Bora, K., Routh, S., Narasimhamurthy, A.: ASTROMLSKIT: a new statistical machine learning toolkit: a platform for data analytics in astronomy (2015)
Astropy. https://www.astropy.org/ (As on June, 2020)
González, R.E., Muñoz, R.P., Hernández, C.A.: Astrocv: Astronomy computer vision library. ASCL, 1804 (2018)
http://astroweka.sourceforge.net/: Astroweka (Collected on June,2020)
pyfits 3.3. https://pypi.org/project/pyfits/3.3/ (As on June, 2020)
Singh, N., Browne, L.-M., Butler, R.: Parallel astronomical data processing with python: Recipes for multicore machines. Astron. Comput. 2, 1–10 (2013)
pyraf. http://astro.if.ufrgs.br/ (As on June, 2020)
Khlamov, S., Savanevych, V., Briukhovetskyi, O., Pohorelov, A., Vlasenko, V., Dikov, E.: Colitec software for the astronomical data sets processing. In: 2018 IEEE Second International Conference on Data Stream Mining & Processing (DSMP), pp 227–230. IEEE (2018)
Breddels, M.A., Veljanoski, J.: Vaex: big data exploration in the era of gaia. Astron. Astrophys. 618, 13 (2018)
https://authors.library.caltech.edu/50265//: Dameware, a web cyberinfrastructure for astrophysical data mining (As on June, 2020)
Welge, M., Hsu, W., Auvil, L., Redman, T., Tcheng, D.: High-performance knowledge discovery and data mining systems using workstation clusters. In: 12th National Conference on High Performance Networking and Computing (SC99) (1999)
https://ipython.org/: Ipython interactive computing (As on June, 2020)
Yu, W., Kind, M.C., Brunner, R.J: Vizic: A jupyter-based interactive visualization tool for astronomical catalogs. Astron. Comput. 20, 128–139 (2017)
https://astrostatistics.psu.edu/statcodes/: Online statistical software for astronomy and related physical sciences (As on June, 2020)
Jacob, J.C., Katz, D.S., Berriman, G.B., Good, J.C., Laity, A., Deelman, E., Kesselman, C., Singh, G., Su, M.-H., Prince, T., et al.: Montage: a grid portal and software toolkit for science-grade astronomical image mosaicking. Int. J. Comput. Sci. Eng. 4(2), 73–87 (2009)
Tools for astronomical big data. https://www.noao.edu/meetings/bigdata/ (As on July, 2020)
Big data and astronomy. http://www.astro4dev.org/jan-mar-2017/ (As on July, 2020)
2nd Australia-China SKA big data workshop. https://eridanus.net.au/?p=269 (As on July, 2020)
Artificial intelligence in astronomy. https://www.eso.org/sci/meetings/2019/AIA2019.html (As on July, 2020)
Machine learning tools for research in astronomy. https://www2.mpia-hd.mpg.de/ml2019/ (As on July, 2020)
Swiss-SA Astronomy. https://astro.ukzn.ac.za/swiss-sa-astronomy-big-data-workshop/ (As on July, 2020)
Innovation in data driven astronomy. https://www.nrao.edu/meetings/bigdata/index.shtml (As on June, 2011)
Data science for physics. https://www.turing.ac.uk/events/data-science-physics-and-astronomy-scoping-workshop/ (As on July, 2020)
International conference on modeling, machine learning and astronomy. http://mmla.pes.edu/ (As on June, 2019)
Big data and digital technology. https://indico.narit.or.th/ (As on July, 2020)
Data science in astrophysics. https://dsap.iiita.ac.in/ (As on June, 2020)
Machine learning in astronomical data analysis. http://hea-www.harvard.edu/AstroStat/aas233/special.html/ (As on July, 2020)
Bigdata challenge. dca2019.csp.escience.cn (As on July, 2020)
AstroInformatics virtual conference. https://www.astroinformatics2020.org/ (As on July, 2020)
Workshop: astronomical data science. https://tamids.tamu.edu/2020 (As on July, 2020)
IAU symposia. https://www.iau.org/science/meetings/future/symposia/2528/ (As on July, 2020)
Astronomy in the big data era. https://generalassemb.ly/education/astronomy-in-the-big-data-era/dallas/ (As on July, 2020)
ADASS. http://adass2018.astro.umd.edu/ (As on July, 2020)
Berriman, G.B., Groom, S.L.: How will astronomy archives survive the data tsunami? Communications of the ACM (2011)
Nichols, M.R.: The fast and the curious: How’s big data changing astronomy? https://schooledbyscience.com/big-datas-changing-astronomy/ (2016)
How Big Data Analytics is shaping Astronomy. https://runyourbusiness.deskera.in/big-data-analytics-shaping-astronomy/ (As on July, 2020)
Big data is transforming. https://www.smithsonianmag.com/science-nature/ (As on July, 2020)
Henry, L.: Data’s big bang: Applying analytics to astronomy. https://www.informationweek.com/datas-big-bang-applying-analytics-to-astronomy/a/d-id/282405 (2017)
Andersen, R.: How big data is changing astronomy (again). https://www.theatlantic.com/technology/archive/2012/04/how-big-data-is-changing-astronomy-again/255917/ (2012)
Urton, J.: With launch of new night sky survey, uw researchers ready for era of ‘big data’ astronomy. https://www.washington.edu/news/2017/11/14/with-launch-of-new-night-sky-survey-uw-researchers-ready-for-era-of-big-data-astronomy/ (2017)
Norris, R.: Expect the unexpected from the big-data boom in radio astronomy. https://phys.org/news/2017-09-unexpected-big-data-boom-radioastronomy.html (2017)
Mcguire, A.: It’s the turn of the celestial world now, big data transforming astronomy! https://irishtechnews.ie/its-the-turn-of-the-celestial-world-now-big-data-transforming-astronomy/ (As on June, 2020)
Data science in astronomy. https://medium.com/trends-in-data-science/data-science-in-astronomy-f0e9b499273/ (As on July, 2020)
Ananthaswamy, A.: Faced with a data deluge, astronomers turn to automation. https://irishtechnews.ie/its-the-turn-of-the-celestial-world-now-big-data-transforming-astronomy/ (As on June, 2020)
Beginning with ML. https://beginningwithml.wordpress.com/ (2020)
Murphy, T.: Data-driven astronomy. https://www.coursera.org/learn/data-driven-astronomy (As on June, 2020)
DataMining and machine learning in astronomy. https://www.as.arizona.edu/ (As on July, 2020)
Leiden University: Astronomy and data science. https://www.mastersportal.com/studies/188902/astronomy-and-data-science.html (As on June, 2020)
BigSkyEarth. https://bigskyearth.eu/ (As on July, 2020)
Astrostatistics and astroinformatics portal. https://asaip.psu.edu// (As on June, 2020)
Astroinformatics research group. http://astrirg.org/ (As on June, 2020)
IDIA data intensive astronomy cloud. http://www.researchsupport.uct.ac.za/idia-data-intensive-astronomy-cloud (As on June, 2020)
Kong, L., Huang, T., Zhu, Y., Yu, S.: Big data in astronomy (2020)
Edwards, K.J., Gaber, M.M.: Astronomy and big data. Studies in Big Data. Springer (2014)
Skoda, P., Adam, F.: Knowledge discovery in big data from astronomy and earth observation 1st edition (2020)
Murtagh, F., Heck, A.: Multivariate data analysis. 131 (2012)
Wall, J.V., Jenkins, C.R.: Practical statistics for astronomers (2012)
Ivezić, Ž., Connolly, A.J., VanderPlas, J.T., Gray, A.: Statistics, data mining, and machine learning in astronomy: a practical python guide for the analysis of survey data. 1 (2014)
Cavuoti, S.: Data-rich astronomy: mining synoptic sky surveys. arXiv:1304.6615 (2013)
Babu, G.J., Feigelson, E.D.: Statistical challenges in modern astronomy ii (2012)
Feigelson, E.D., Babu, G.J.: Statistical challenges in modern astronomy ii (1997)
Podgorski, K.: Advances in machine learning and data mining for astronomy. Int. Stat. Rev 82(1), 153–154 (2014)
Kumar, M.: Sparse image and signal processing: Wavelets, curvelets, morphological diversity, by Jean-Luc Starck, Fionn Murtagh, and Jalal M. Fadili. J. Electron. Imaging 19(4), 049901 (2007)
Chattopadhyay, A.K., Chattopadhyay, T.: Statistical Methods for Astronomical Data Analysis, vol. 3. Springer, New York (2014)
S.S., et al.: Machine learning in astronomy: A workman’s manual (2017)
Scientific discovery through advanced computing. https://www.scidac.gov/ (As on January, 2021)
Project. https://www.scidac.org/tags/projects.html (As on January, 2021)
SciDAC-3 scientific computation application partnership project. https://www.bnl.gov/physics/scidac/ (As on January, 2021)
LQCD SciDAC-4 project. https://lqcdscidac4.github.io/ (As on January, 2021)
Frameworks, algorithms and scalable technologies for mathematics (FASTMath) SciDAC-5 Institute. https://scidac5-fastmath.lbl.gov/home (As on January, 2021)
Laureijs, R., Amiaux, J., Arduini, S., Augueres, J.-L., Brinchmann, J., Cole, R., Cropper, M., Dabin, C., Duvet, L., Ealet, A., et al.: Euclid definition study report. arXiv:1110.3193 (2011)
AWS SageMaker. https://aws.amazon.com/sagemaker/ (As on July, 2020)
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Sonali Agarwal, Pavan Chakraborty and Krishna Pratap Singh contributed equally to this work.
Cite this article
Sen, S., Agarwal, S., Chakraborty, P. et al. Astronomical big data processing using machine learning: A comprehensive review. Exp Astron 53, 1–43 (2022). https://doi.org/10.1007/s10686-021-09827-4
DOI: https://doi.org/10.1007/s10686-021-09827-4