
Protostellar classification using supervised machine learning algorithms

  • Original Article
Astrophysics and Space Science

Abstract

Classification of young stellar objects (YSOs) into different evolutionary stages helps us to understand the formation process of new stars and planetary systems. Such classification has traditionally been based on spectral energy distribution (SED) analysis. An alternative approach is provided by supervised machine learning algorithms, which can be trained to classify large samples of YSOs much faster than via SED analysis. We attempt to classify a sample of Orion YSOs (the parent sample size is 330) into different classes, where each source has already been classified using multiwavelength SED analysis. We used eight different learning algorithms to classify the target YSOs, namely a decision tree, random forest, gradient boosting machine (GBM), logistic regression, naïve Bayes classifier, \(k\)-nearest neighbour classifier, support vector machine, and neural network. The classifiers were trained and tested by using a 10-fold cross-validation procedure. As the learning features, we employed ten different continuum flux densities spanning from the near-infrared to submillimetre wavebands (\(\lambda= 3.6\mbox{--}870~\upmu\mbox{m}\)). With a classification accuracy of 82% (with respect to the SED-based classes), a GBM algorithm was found to exhibit the best performance. The lowest accuracy of 47% was obtained with a naïve Bayes classifier. Our analysis suggests that the inclusion of the \(3.6~\upmu\mbox{m}\) and \(24~\upmu\mbox{m}\) flux densities is useful to maximise the YSO classification accuracy. Although machine learning has the potential to provide a rapid and fairly reliable way to classify YSOs, an SED analysis is still needed to derive the physical properties of the sources (e.g. dust temperature and mass), and to create the labelled training data. The machine learning classification accuracies can be improved with respect to the present results by using larger data sets, more detailed missing value imputation, and advanced ensemble methods (e.g. extreme gradient boosting). Overall, the application of machine learning is expected to be very useful in the era of big astronomical data, for example to quickly assemble interesting target source samples for follow-up studies.
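The 10-fold cross-validation procedure mentioned in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the data below are synthetic stand-ins for ten log flux densities per source, the three class labels are made up, and a simple \(k\)-nearest neighbour classifier is used only because it is the shortest of the eight algorithms to write out. The function names (`make_synthetic_sample`, `knn_predict`, `cross_validate`) are hypothetical.

```python
import math
import random


def make_synthetic_sample(n_per_class=60, n_features=10, seed=0):
    """Generate toy feature vectors for three made-up classes (assumption:
    these stand in for ten log flux densities per source)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(3):
        for _ in range(n_per_class):
            # Each class has a different mean trend across the ten bands.
            X.append([rng.gauss(label * 0.5 * j, 1.0) for j in range(n_features)])
            y.append(label)
    return X, y


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def cross_validate(X, y, n_folds=10, k=5, seed=0):
    """Return the mean accuracy over n_folds held-out folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    # Every index lands in exactly one fold.
    folds = [idx[i::n_folds] for i in range(n_folds)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train_X = [X[i] for i in idx if i not in held_out]
        train_y = [y[i] for i in idx if i not in held_out]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / n_folds


X, y = make_synthetic_sample()
accuracy = cross_validate(X, y)
print(f"10-fold CV accuracy: {accuracy:.2f}")
```

Each source is used for testing exactly once and for training in the other nine folds, so the reported accuracy is an average over ten held-out subsets rather than a single train/test split.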


Notes

  1. Herschel is an ESA space observatory with science instruments provided by European-led Principal Investigator consortia and with important participation from NASA.

  2. https://www.itl.nist.gov/div898/handbook/eda/section3/scatterb.htm.


Acknowledgements

I would like to thank the referee for providing constructive comments and suggestions that helped to improve the quality of this paper. This research has made use of NASA’s Astrophysics Data System and the NASA/IPAC Infrared Science Archive, which is operated by JPL, California Institute of Technology, under contract with NASA.

Author information


Corresponding author

Correspondence to O. Miettinen.


About this article


Cite this article

Miettinen, O. Protostellar classification using supervised machine learning algorithms. Astrophys Space Sci 363, 197 (2018). https://doi.org/10.1007/s10509-018-3418-7


  • DOI: https://doi.org/10.1007/s10509-018-3418-7
