Multivariate Stream Data Classification Using Simple Text Classifiers

  • Sungbo Seo
  • Jaewoo Kang
  • Dongwon Lee
  • Keun Ho Ryu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4080)


We introduce a classification framework for continuous multivariate stream data. The proposed approach works in two steps. In the preprocessing step, it takes a sliding window of multivariate stream data as input and discretizes the data in the window into a string of symbols that characterize the signal changes. In the classification step, it applies a simple text classification algorithm to the discretized window. We evaluated both supervised and unsupervised classification algorithms: Naïve Bayes and SVM for the supervised case, and Jaccard, TFIDF, Jaro, and JaroWinkler for the unsupervised case. In our experiments, SVM and TFIDF outperformed the other classification methods. In particular, classification accuracy improved when the correlation among attributes was considered along with the n-gram tokens of symbols.
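The two steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the symbol alphabet, the discretization rule, or how attribute correlation is encoded. The three-symbol alphabet (U, D, S), the threshold `eps`, the attribute-prefixed tokens, and every function name below are assumptions made for illustration. The sketch covers only the Jaccard variant and leaves out attribute correlation.

```python
def discretize(values, eps=0.5):
    """Turn a numeric series into symbols describing signal changes.

    Assumed alphabet: U (up), D (down), S (steady), chosen by comparing
    each consecutive difference against the threshold eps.
    """
    symbols = []
    for prev, cur in zip(values, values[1:]):
        delta = cur - prev
        if delta > eps:
            symbols.append("U")
        elif delta < -eps:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def ngrams(s, n=2):
    """Return the overlapping character n-grams of a symbol string."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def window_tokens(window, n=2, eps=0.5):
    """Tokenize one sliding window.

    window maps an attribute name to its list of values. Each n-gram is
    prefixed with the attribute name so tokens from different attributes
    stay distinct (an assumption, not taken from the paper).
    """
    tokens = []
    for name, values in window.items():
        for gram in ngrams(discretize(values, eps), n):
            tokens.append(f"{name}:{gram}")
    return tokens


def jaccard(a, b):
    """Jaccard similarity between two token collections, treated as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def classify(window, labeled, n=2, eps=0.5):
    """Return the label of the most Jaccard-similar labeled window."""
    query = window_tokens(window, n, eps)
    best_label, best_score = None, -1.0
    for label, reference in labeled:
        score = jaccard(query, window_tokens(reference, n, eps))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled windows and a query window with one attribute.
labeled = [
    ("rising", {"temp": [0, 1, 2, 3, 4]}),
    ("falling", {"temp": [4, 3, 2, 1, 0]}),
]
print(classify({"temp": [10, 11, 12, 13, 13]}, labeled))
```

Treating each window as a bag of tokens is what lets ordinary text similarity measures and text classifiers operate on stream data. Replacing `jaccard` with a TFIDF-weighted cosine similarity, or feeding the token counts to an SVM, would follow the same pattern.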


Keywords: Sensor Node, Wireless Sensor Network, Dynamic Time Warping, Bayesian Classifier, Time Series Classification
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sungbo Seo
    • 1
  • Jaewoo Kang
    • 2
    • 3
  • Dongwon Lee
    • 4
  • Keun Ho Ryu
    • 1
  1. Dept. of Computer Science, Chungbuk National University, Chungbuk, Korea
  2. Dept. of Computer Science and Engineering, Korea University, Seoul, Korea
  3. Dept. of Computer Science, North Carolina State University, Raleigh, USA
  4. College of Information Sciences and Technology, Penn State University, USA
