The Imbalanced Problem in Morphological Galaxy Classification

  • Jorge de la Calleja
  • Gladis Huerta
  • Olac Fuentes
  • Antonio Benitez
  • Eduardo López Domínguez
  • Ma. Auxilio Medina
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6419)


In this paper we present an experimental study of the performance of six machine learning algorithms applied to morphological galaxy classification. We also address the problem of learning from imbalanced data sets, which is inherent to many real-world applications such as astronomical data analysis. We used two over-sampling techniques, SMOTE and Resampling, and varied the number of generated instances for classification. Our experimental results show that Random Forest with Resampling obtains the best results for three, five and seven galaxy types, with an F-measure of about 0.99 in all cases.
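As a rough illustration of the two over-sampling ideas named in the abstract, the following sketch contrasts random resampling (copying existing minority instances) with SMOTE-style interpolation (creating new instances between a minority instance and one of its nearest minority neighbours), and computes a per-class F-measure. It is a minimal sketch under stated assumptions and not the authors' implementation: the function names, the toy two-feature data, and the parameter `k` are all hypothetical and chosen only for illustration.

```python
import random


def resample_minority(samples, n_new, rng):
    """Random over-sampling: copy n_new minority samples drawn with replacement."""
    return [list(rng.choice(samples)) for _ in range(n_new)]


def smote_like(samples, n_new, k, rng):
    """SMOTE-style over-sampling: each synthetic point lies on the segment
    between a minority sample and one of its k nearest minority neighbours."""
    if len(samples) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [s for j, s in enumerate(samples) if j != i]
        others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def f_measure(y_true, y_pred, positive):
    """F-measure (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy minority class with two hypothetical features per galaxy (assumption).
rng = random.Random(0)
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
copies = resample_minority(minority, 5, rng)
synthetic = smote_like(minority, 5, k=2, rng=rng)
score = f_measure(["a", "a", "b", "b"], ["a", "b", "b", "b"], "a")
```

In this sketch the resampled instances are exact duplicates of existing minority instances, whereas the SMOTE-style instances are new points inside the region spanned by the minority class. Either set would be appended to the training data before fitting a classifier such as Random Forest.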


Keywords: machine learning, imbalanced data sets, galaxies


Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Jorge de la Calleja (1)
  • Gladis Huerta (1)
  • Olac Fuentes (2)
  • Antonio Benitez (1)
  • Eduardo López Domínguez (1)
  • Ma. Auxilio Medina (1)
  1. Ingeniería en Informática, Universidad Politécnica de Puebla, Puebla, México
  2. Computer Science Department, University of Texas at El Paso, U.S.A.
