Abstract
Classification of young stellar objects (YSOs) into different evolutionary stages helps us to understand the formation process of new stars and planetary systems. Such classification has traditionally been based on spectral energy distribution (SED) analysis. An alternative approach is provided by supervised machine learning algorithms, which can be trained to classify large samples of YSOs much faster than via SED analysis. We attempt to classify a sample of Orion YSOs (the parent sample size is 330) into different classes, where each source has already been classified using multiwavelength SED analysis. We used eight different learning algorithms to classify the target YSOs, namely a decision tree, random forest, gradient boosting machine (GBM), logistic regression, naïve Bayes classifier, \(k\)-nearest neighbour classifier, support vector machine, and neural network. The classifiers were trained and tested by using a 10-fold cross-validation procedure. As the learning features, we employed ten different continuum flux densities spanning from the near-infrared to submillimetre wavebands (\(\lambda= 3.6\mbox{--}870~\upmu\mbox{m}\)). With a classification accuracy of 82% (with respect to the SED-based classes), a GBM algorithm was found to exhibit the best performance. The lowest accuracy of 47% was obtained with a naïve Bayes classifier. Our analysis suggests that the inclusion of the \(3.6~\upmu\mbox{m}\) and \(24~\upmu\mbox{m}\) flux densities is useful to maximise the YSO classification accuracy. Although machine learning has the potential to provide a rapid and fairly reliable way to classify YSOs, an SED analysis is still needed to derive the physical properties of the sources (e.g. dust temperature and mass), and to create the labelled training data. The machine learning classification accuracies can be improved with respect to the present results by using larger data sets, more detailed missing value imputation, and advanced ensemble methods (e.g. extreme gradient boosting). Overall, the application of machine learning is expected to be very useful in the era of big astronomical data, for example to quickly assemble interesting target source samples for follow-up studies.
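The 10-fold cross-validation procedure mentioned in the abstract can be illustrated with a minimal sketch in plain Python. This is not the code used for the paper. The data are synthetic (three hypothetical classes, each with ten Gaussian-distributed features standing in for the ten flux densities), all function names are invented for this example, and a \(k\)-nearest neighbour classifier is used only because it is the simplest of the eight algorithm types to write from scratch.

```python
import math
import random
from collections import Counter


def make_synthetic_sources(n_per_class=30, n_features=10, seed=0):
    """Hypothetical stand-in data: 3 classes, each a Gaussian blob in 10 features."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(3):
        centre = 3.0 * label  # assumed class separation, chosen for illustration
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    dists = sorted((math.dist(x, t), label) for t, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, n_folds=10, k=5, seed=0):
    """Each fold is held out once; accuracy is pooled over all held-out sources."""
    correct = 0
    for test_idx in k_fold_indices(len(X), n_folds, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        for i in test_idx:
            if knn_predict(train_X, train_y, X[i], k) == y[i]:
                correct += 1
    return correct / len(X)


if __name__ == "__main__":
    X, y = make_synthetic_sources()
    print(f"10-fold cross-validated accuracy: {cross_val_accuracy(X, y):.2f}")
```

Because every source is predicted exactly once by a model that never saw it during training, the pooled fraction of correct predictions plays the same role as the classification accuracies quoted in the abstract. The value obtained here reflects only the synthetic data and says nothing about the Orion sample.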
Notes
Herschel is an ESA space observatory with science instruments provided by European-led Principal Investigator consortia and with important participation from NASA.
Acknowledgements
I would like to thank the referee for providing constructive comments and suggestions that helped to improve the quality of this paper. This research has made use of NASA’s Astrophysics Data System and the NASA/IPAC Infrared Science Archive, which is operated by JPL, California Institute of Technology, under contract with NASA.
Cite this article
Miettinen, O. Protostellar classification using supervised machine learning algorithms. Astrophys Space Sci 363, 197 (2018). https://doi.org/10.1007/s10509-018-3418-7
DOI: https://doi.org/10.1007/s10509-018-3418-7