
Maps Ensemble for Semi-Supervised Learning of Large High Dimensional Datasets

  • Elie Prudhomme
  • Stéphane Lallich
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4994)

Abstract

In many practical cases, only a few labels are available for the data. Learning algorithms must then take advantage of the unlabeled examples to learn efficiently; this setting is called semi-supervised learning (SSL). In this article, we propose a methodology adapted to both the representation and the prediction of large datasets in that situation. To overcome the problems raised by high-dimensional spaces, the attributes are first divided into uncorrelated groups. An ensemble is then built in which each group is learned by a self-organizing map (SOM). Besides prediction, these maps aim to provide a relevant representation of the data that can be exploited for semi-supervised learning. Finally, the prediction is obtained by a vote of the different maps. Experiments, performed in both supervised and semi-supervised settings, show the relevance of this approach.
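The pipeline outlined above (group the attributes, train one SOM per group on all examples, then vote) can be made concrete with a short sketch. The Python code below is a minimal illustration, not the authors' implementation: the grouping step (Ward clustering of the attributes on a 1 - |correlation| distance, so that the resulting groups are roughly uncorrelated with each other), the toy SOM, the grid size, and every hyper-parameter are assumptions made for the example.

```python
# Minimal sketch of the maps-ensemble pipeline described in the abstract.
# All names, grid sizes and hyper-parameters are illustrative assumptions.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def group_attributes(X, n_groups):
    """Cluster attributes with Ward linkage on a 1 - |corr| distance, so
    correlated attributes share a group and groups stay roughly uncorrelated."""
    corr = np.corrcoef(X, rowvar=False)
    condensed = squareform(1.0 - np.abs(corr), checks=False)
    labels = fcluster(linkage(condensed, method="ward"),
                      n_groups, criterion="maxclust")
    return [np.flatnonzero(labels == g) for g in np.unique(labels)]


class TinySOM:
    """Bare-bones online SOM on a side x side grid."""

    def __init__(self, side, dim, seed=0):
        self.w = np.random.default_rng(seed).normal(size=(side * side, dim))
        grid = np.indices((side, side)).reshape(2, -1).T   # unit coordinates
        d = grid[:, None, :] - grid[None, :, :]
        self.grid_d2 = (d ** 2).sum(-1)                    # squared grid distances
        self.sigma0 = side / 2.0

    def bmu(self, x):
        # Index of the best-matching unit for sample x.
        return int(np.argmin(((self.w - x) ** 2).sum(axis=1)))

    def fit(self, X, epochs=20, lr0=0.5):
        t, T = 0, epochs * len(X)
        for _ in range(epochs):
            for x in X:
                decay = 1.0 - t / T
                sigma = max(self.sigma0 * decay, 0.5)
                # Gaussian neighborhood around the BMU, shrinking over time.
                h = np.exp(-self.grid_d2[self.bmu(x)] / (2 * sigma ** 2))
                self.w += lr0 * decay * h[:, None] * (x - self.w)
                t += 1
        return self


def fit_maps_ensemble(X, y, n_groups=5, side=6):
    """y == -1 marks unlabeled rows: every row shapes the maps, but unit
    labels are taken by majority from the labeled rows only."""
    groups = group_attributes(X, n_groups)
    maps, unit_labels = [], []
    lab = y != -1
    for cols in groups:
        som = TinySOM(side, len(cols)).fit(X[:, cols])
        votes = {}
        for x, c in zip(X[lab][:, cols], y[lab]):
            votes.setdefault(som.bmu(x), []).append(c)
        maps.append(som)
        unit_labels.append({u: max(set(v), key=v.count) for u, v in votes.items()})
    return groups, maps, unit_labels


def predict(X, groups, maps, unit_labels):
    """Each map votes with the class of the BMU; the majority vote wins."""
    out = []
    for x in X:
        ballots = [ul.get(som.bmu(x[cols]))
                   for cols, som, ul in zip(groups, maps, unit_labels)]
        ballots = [b for b in ballots if b is not None]  # skip unlabeled units
        out.append(max(set(ballots), key=ballots.count) if ballots else -1)
    return np.array(out)
```

Training every map on labeled and unlabeled examples together is what makes this kind of scheme usable for semi-supervised learning: the unlabeled data shape the topology of each map, while the few available labels only serve to name the units before voting.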

Keywords

Random Forest · Supervised Learning · High Dimensional Space · Unlabeled Data · Ensemble Approach

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Elie Prudhomme¹
  • Stéphane Lallich¹

  1. Laboratoire ERIC, Université Lumière Lyon 2, Bron, France
