A Minimum Description Length Technique for Semi-Supervised Time Series Classification

Begum, Nurjahan; Hu, Bing; Rakthanmanon, Thanawin; Keogh, Eamonn

doi:10.1007/978-3-319-04717-1_8

Chapter
First Online: 01 January 2014

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 263)

Abstract

In recent years, the plunging costs of sensors and storage have made it possible to obtain vast amounts of medical telemetry, both in clinical settings and, more recently, even in patients’ own homes. However, for this data to be useful, it must be annotated. This annotation, which requires the attention of medical experts, is very expensive and time-consuming, and remains the critical bottleneck in medical analysis. Semi-supervised learning is the obvious way to reduce the need for human labor; however, most such algorithms are designed for intrinsically discrete objects such as graphs or strings, and do not work well in this domain, which requires the ability to deal with real-valued objects arriving in a streaming fashion. In this work we make two contributions. First, we demonstrate that in many cases a surprisingly small set of human-annotated examples is sufficient to perform accurate classification. Second, we devise a novel parameter-free stopping criterion for semi-supervised learning. We evaluate our work with a comprehensive set of experiments on diverse medical data sources, including electrocardiograms. Our experimental results suggest that our approach can typically construct accurate classifiers even if given only a single annotated instance.

Acknowledgments

This research was funded by NSF grant IIS-1161997.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Kasetsart University, University of California, Riverside, CA, USA
Nurjahan Begum, Bing Hu, Thanawin Rakthanmanon & Eamonn Keogh

Corresponding authors

Correspondence to Nurjahan Begum or Bing Hu.

Editor information

Editors and Affiliations

Laboratoire de Communication dans les Systèmes Informatiques, Ecole Nationale Supérieure d’Informatique, Algiers, Algeria
Thouraya Bouabana-Tebibel
SPAWAR Systems Center Pacific, San Diego, USA
Stuart H. Rubin

About this chapter

Cite this chapter

Begum, N., Hu, B., Rakthanmanon, T., Keogh, E. (2014). A Minimum Description Length Technique for Semi-Supervised Time Series Classification. In: Bouabana-Tebibel, T., Rubin, S. (eds) Integration of Reusable Systems. Advances in Intelligent Systems and Computing, vol 263. Springer, Cham. https://doi.org/10.1007/978-3-319-04717-1_8

DOI: https://doi.org/10.1007/978-3-319-04717-1_8
Published: 18 February 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04716-4
Online ISBN: 978-3-319-04717-1
eBook Packages: Engineering, Engineering (R0)
