Classification and Novel Class Detection of Data Streams in a Dynamic Feature Space

  • Mohammad M. Masud
  • Qing Chen
  • Jing Gao
  • Latifur Khan
  • Jiawei Han
  • Bhavani Thuraisingham
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6322)

Abstract

Data stream classification poses many challenges, most of which are not addressed by state-of-the-art techniques. We present DXMiner, which addresses four major challenges to data stream classification, namely, infinite length, concept-drift, concept-evolution, and feature-evolution. Data streams are assumed to be infinite in length, which necessitates single-pass incremental learning techniques. Concept-drift occurs in a data stream when the underlying concept changes over time. Most existing data stream classification techniques address only the infinite length and concept-drift problems. However, concept-evolution and feature-evolution are also major challenges, and most existing approaches ignore them. Concept-evolution occurs in the stream when novel classes arrive, and feature-evolution occurs when new features emerge in the stream. Our previous work addresses the concept-evolution problem in addition to the infinite length and concept-drift problems. Most existing data stream classification techniques, including our previous work, assume that the feature space of the data points in the stream is static. This assumption may be impractical for some types of data, such as text data. DXMiner considers the dynamic nature of the feature space and provides an elegant solution for classification and novel class detection when the feature space is dynamic. We show that our approach outperforms state-of-the-art stream classification techniques in classifying and detecting novel classes in real data streams.
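
To make the feature-evolution problem concrete, the following minimal sketch shows one simple way to compare two data points whose feature sets differ: project both onto the union of their features and fill missing entries with zero. This is an illustrative assumption and is not presented as DXMiner's actual mechanism, which the abstract does not specify. The function name `to_union_space` and the sample feature names are hypothetical.

```python
from typing import Dict, List, Tuple


def to_union_space(
    a: Dict[str, float], b: Dict[str, float]
) -> Tuple[List[str], List[float], List[float]]:
    """Project two sparse feature maps onto the union of their feature sets.

    Features absent from one map are filled with 0.0, so both returned
    vectors have the same length and the same feature order.
    (Hypothetical helper for illustration only.)
    """
    # Sorted union gives a deterministic, shared feature ordering.
    features = sorted(set(a) | set(b))
    vec_a = [a.get(f, 0.0) for f in features]
    vec_b = [b.get(f, 0.0) for f in features]
    return features, vec_a, vec_b


# An older data point and a newer one in which a new feature has emerged.
old_point = {"stream": 2.0, "drift": 1.0}
new_point = {"drift": 3.0, "novel": 1.0}

features, old_vec, new_vec = to_union_space(old_point, new_point)
print(features)
print(old_vec)
print(new_vec)
```

Under this assumption no feature value is discarded from either point, at the cost of vectors that grow as new features appear in the stream.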

Keywords

Feature Space, Data Stream, Test Instance, Data Chunk, Class Detection

References

  1. Chen, S., Wang, H., Zhou, S., Yu, P.: Stop chasing trends: Discovering high order models in evolving data. In: Proc. ICDE 2008, pp. 923–932 (2008)
  2. Fan, W.: Systematic data selection to mine concept-drifting data streams. In: Proc. ACM SIGKDD, Seattle, WA, USA, pp. 128–137 (2004)
  3. Hulten, G., Spencer, L., Domingos, P.: Mining time-changing data streams. In: SIGKDD, San Francisco, CA, USA, pp. 97–106 (August 2001)
  4. Katakis, I., Tsoumakas, G., Vlahavas, I.: Dynamic feature space and incremental feature selection for the classification of textual data streams. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 102–116. Springer, Heidelberg (2006)
  5. Kolter, J., Maloof, M.: Using additive expert ensembles to cope with concept drift. In: ICML, Bonn, Germany, pp. 449–456 (August 2005)
  6. Masud, M.M., Gao, J., Khan, L., Han, J., Thuraisingham, B.M.: Integrating novel class detection with classification for concept-drifting data streams. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds.) ECML PKDD 2009. LNCS, vol. 5782, pp. 79–94. Springer, Heidelberg (2009); Extended version is in the preprints, IEEE TKDE, vol. 99 (2010), doi: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.61
  7. Masud, M.M., Gao, J., Khan, L., Han, J., Thuraisingham, B.M.: A practical approach to classify evolving data streams: Training with limited amount of labeled data. In: Perner, P. (ed.) ICDM 2008. LNCS (LNAI), vol. 5077, pp. 929–934. Springer, Heidelberg (2008)
  8. Spinosa, E.J., de Leon, A.P., de Carvalho, F., Gama, J.: Cluster-based novel concept detection in data streams applied to intrusion detection in computer networks. In: ACM SAC, pp. 976–980 (2008)
  9. Wang, H., Fan, W., Yu, P.S., Han, J.: Mining concept-drifting data streams using ensemble classifiers. In: KDD 2003, pp. 226–235 (2003)
  10. Wenerstrom, B., Giraud-Carrier, C.: Temporal data mining in dynamic feature spaces. In: Perner, P. (ed.) ICDM 2006. LNCS (LNAI), vol. 4065, pp. 1141–1145. Springer, Heidelberg (2006)
  11. Yang, Y., Wu, X., Zhu, X.: Combining proactive and reactive predictions for data streams. In: Proc. SIGKDD, pp. 710–715 (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Mohammad M. Masud (1)
  • Qing Chen (1)
  • Jing Gao (2)
  • Latifur Khan (1)
  • Jiawei Han (2)
  • Bhavani Thuraisingham (1)

  1. University of Texas at Dallas
  2. University of Illinois at Urbana-Champaign
