On-Line Classification of Data Streams with Missing Values Based on Reinforcement Learning

  • Mónica Millán-Giraldo
  • Vicente Javier Traver
  • J. Salvador Sánchez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6669)


In some applications, data arrive sequentially and are not available in batch form, which makes traditional classification systems difficult to use. In addition, some attributes may be missing because of real-world conditions. In this setting, a number of decisions have to be made about how to proceed with each incomplete, unlabeled incoming object: how to estimate its missing attribute values, how to classify it, whether to include it in the training set, and when to ask an expert for its class label. Unfortunately, no single decision works well for all data sets. This data dependency motivates our formulation of the problem in terms of elements of reinforcement learning. To the best of our knowledge, the application of this learning paradigm to this problem is novel. The empirical results are encouraging: the proposed framework behaves better and more generally than many strategies used in isolation, and it makes efficient use of human effort (requests to an expert for the class label) and computer memory (growth of the training set).
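As a rough illustration of how a choice among competing strategies can be cast in reinforcement-learning terms, the sketch below uses a simple epsilon-greedy value estimate over a set of candidate actions. It is not the framework of the paper: the class name EpsilonGreedySelector, the helper mean_impute, the action labels, and the reward values are all assumptions introduced only for this example.

```python
import random


class EpsilonGreedySelector:
    """Hypothetical helper: choose among candidate strategies from scalar rewards."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon              # probability of exploring a random action
        self.rng = random.Random(seed)      # seeded so runs are repeatable
        self.counts = {a: 0 for a in self.actions}
        self.values = {a: 0.0 for a in self.actions}

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[a])

    def update(self, action, reward):
        # Incremental sample-average update of the action's estimated value.
        self.counts[action] += 1
        n = self.counts[action]
        self.values[action] += (reward - self.values[action]) / n


def mean_impute(x, means):
    """Assumed imputation rule: replace missing attributes (None) by attribute means."""
    return [m if v is None else v for v, m in zip(x, means)]


# Illustrative action labels only; the paper's actual action set is not given here.
selector = EpsilonGreedySelector(["impute_and_classify", "ask_expert"], epsilon=0.1)
action = selector.select()
selector.update(action, reward=1.0)  # the reward value is a placeholder
```

In a sketch like this, the reward could combine classification correctness with a penalty for each label request or each object added to the training set, but that reward design is an assumption and is not taken from the abstract.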


Reinforcement learning, Active learning, Adaptive learning, Streaming data, Incomplete data, Imputation techniques, On-line classification





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Mónica Millán-Giraldo (1)
  • Vicente Javier Traver (1)
  • J. Salvador Sánchez (1)
  1. Institute of New Imaging Technologies, Universitat Jaume I, Castellón, Spain
