A Minimum Description Length Technique for Semi-Supervised Time Series Classification

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 263)

Abstract

In recent years, the plunging costs of sensors and storage have made it possible to obtain vast amounts of medical telemetry, both in clinical settings and, more recently, even in patients' own homes. However, for this data to be useful, it must be annotated. This annotation, which requires the attention of medical experts, is very expensive and time consuming, and remains the critical bottleneck in medical analysis. Semi-supervised learning is the obvious way to reduce the need for human labor; however, most such algorithms are designed for intrinsically discrete objects such as graphs or strings, and do not work well in this domain, which requires the ability to deal with real-valued objects arriving in a streaming fashion. In this work we make two contributions. First, we demonstrate that in many cases a surprisingly small set of human-annotated examples is sufficient to perform accurate classification. Second, we devise a novel parameter-free stopping criterion for semi-supervised learning. We evaluate our work with a comprehensive set of experiments on diverse medical data sources, including electrocardiograms. Our experimental results suggest that our approach can typically construct accurate classifiers even if given only a single annotated instance.
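To make the setting concrete, the following is a minimal sketch of generic self-training on real-valued series, starting from a single annotated instance. It is an illustration only and not the chapter's method: the abstract does not describe the MDL-based stopping criterion, so the sketch substitutes a hypothetical fixed budget (`max_additions`). The function names, the Euclidean distance, and the toy data are likewise assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length real-valued sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def self_train(seed, unlabeled, max_additions):
    """Grow a positive set from one labeled series by repeatedly adopting
    the unlabeled series nearest to any member of the positive set.

    max_additions is a hypothetical fixed budget standing in for a
    stopping criterion; it is NOT the chapter's MDL-based rule.
    Returns the indices of `unlabeled` in the order they were adopted.
    """
    positives = [seed]
    remaining = dict(enumerate(unlabeled))
    adopted = []
    while remaining and len(adopted) < max_additions:
        # Pick the unlabeled series closest to the current positive set.
        best = min(
            remaining,
            key=lambda i: min(euclidean(remaining[i], p) for p in positives),
        )
        positives.append(remaining.pop(best))
        adopted.append(best)
    return adopted

# Toy data (invented for illustration): one annotated seed, three unlabeled series.
seed = [0.0, 1.0, 0.0, -1.0]
pool = [
    [0.1, 0.9, 0.1, -0.9],   # close to the seed
    [5.0, 5.0, 5.0, 5.0],    # far from the seed
    [0.2, 0.8, 0.2, -0.8],   # close to the first series above
]
order = self_train(seed, pool, max_additions=2)
```

With a fixed budget, the loop keeps adopting series even after the genuinely similar ones run out, which is why the choice of stopping point matters in this setting.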

Acknowledgments

This research was funded by NSF grant IIS-1161997.

Author information

Correspondence to Nurjahan Begum or Bing Hu.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Begum, N., Hu, B., Rakthanmanon, T., Keogh, E. (2014). A Minimum Description Length Technique for Semi-Supervised Time Series Classification. In: Bouabana-Tebibel, T., Rubin, S. (eds) Integration of Reusable Systems. Advances in Intelligent Systems and Computing, vol 263. Springer, Cham. https://doi.org/10.1007/978-3-319-04717-1_8

  • DOI: https://doi.org/10.1007/978-3-319-04717-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-04716-4

  • Online ISBN: 978-3-319-04717-1

  • eBook Packages: Engineering (R0)
