Lacking Labels in the Stream: Classifying Evolving Stream Data with Few Labels

  • Clay Woolam
  • Mohammad M. Masud
  • Latifur Khan
Conference paper

DOI: 10.1007/978-3-642-04125-9_58

Part of the Lecture Notes in Computer Science book series (LNCS, volume 5722)
Cite this paper as:
Woolam C., Masud M.M., Khan L. (2009) Lacking Labels in the Stream: Classifying Evolving Stream Data with Few Labels. In: Rauch J., Raś Z.W., Berka P., Elomaa T. (eds) Foundations of Intelligent Systems. ISMIS 2009. Lecture Notes in Computer Science, vol 5722. Springer, Berlin, Heidelberg

Abstract

This paper outlines a data stream classification technique that addresses the problem of insufficient and biased labeled data. It is practical to assume that only a small fraction of instances in the stream are labeled. An even more realistic assumption is that the labeled data may not be independently distributed among all training documents; that is, the instances that get labeled are not a random sample of the stream. How can we ensure that a good classification model is built in these scenarios, given that the data stream also evolves over time? In our previous work, we applied semi-supervised clustering to build classification models from a limited amount of labeled training data. However, that approach assumed that the data to be labeled are chosen randomly. In our current work, we relax this assumption and propose a label propagation framework for data streams that can build good classification models even if the data are not labeled randomly. Comparison with state-of-the-art stream classification techniques on synthetic and real benchmark data demonstrates the effectiveness of our approach.
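The abstract does not spell out the internals of the framework. As a rough illustration of the general idea only, the sketch below implements classic graph-based label propagation (in the style of Zhu and Ghahramani) on a single chunk of stream data: a handful of labeled points spread their labels to unlabeled neighbors over a similarity graph. The function names `rbf_weight` and `label_propagation`, the Gaussian kernel, the `sigma` parameter, and the fixed iteration count are all hypothetical choices for illustration, not the authors' method. The paper's framework also has to cope with stream evolution and non-random labeling, which this sketch does not address.

```python
import math


def rbf_weight(a, b, sigma=1.0):
    """Gaussian (RBF) similarity between two feature vectors (assumed kernel)."""
    dist2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-dist2 / (2 * sigma ** 2))


def label_propagation(points, labels, num_classes, sigma=1.0, iterations=100):
    """Propagate a few known labels over a similarity graph (hypothetical helper).

    points: feature vectors for one chunk of the stream.
    labels: a class index for each labeled point, None for unlabeled points.
    Returns a predicted class index for every point.
    """
    n = len(points)
    # Dense similarity graph without self-loops.
    w = [[0.0 if i == j else rbf_weight(points[i], points[j], sigma)
          for j in range(n)] for i in range(n)]
    # Row-normalize the weights into a transition matrix.
    t = []
    for row in w:
        s = sum(row)
        t.append([v / s if s > 0 else 0.0 for v in row])

    def one_hot(c):
        return [1.0 if k == c else 0.0 for k in range(num_classes)]

    # Start labeled points as one-hot vectors, unlabeled points as uniform.
    f = [one_hot(y) if y is not None else [1.0 / num_classes] * num_classes
         for y in labels]
    for _ in range(iterations):
        new_f = []
        for i in range(n):
            if labels[i] is not None:
                # Clamp labeled points to their known class.
                new_f.append(one_hot(labels[i]))
            else:
                # Each unlabeled point takes the weighted average of its neighbors.
                new_f.append([sum(t[i][j] * f[j][k] for j in range(n))
                              for k in range(num_classes)])
        f = new_f
    # Predict the class with the largest propagated score.
    return [dist.index(max(dist)) for dist in f]


# Two well-separated clusters, one labeled point in each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = [0, None, None, 1, None, None]
print(label_propagation(points, labels, num_classes=2))  # → [0, 0, 0, 1, 1, 1]
```

Clamping the labeled points at every iteration keeps the few known labels fixed while the unlabeled points converge toward the labels of their closest neighbors in the graph.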

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Clay Woolam (1)
  • Mohammad M. Masud (1)
  • Latifur Khan (1)
  1. Department of Computer Science, University of Texas at Dallas, USA