Abstract
In recent years, the plunging costs of sensors and storage have made it possible to obtain vast amounts of medical telemetry, both in clinical settings and, more recently, in patients’ own homes. However, for this data to be useful, it must be annotated. This annotation, which requires the attention of medical experts, is very expensive and time-consuming, and it remains the critical bottleneck in medical analysis. Semi-supervised learning is the obvious way to reduce the need for human labor. However, most such algorithms are designed for intrinsically discrete objects such as graphs or strings, and they do not work well in this domain, which requires the ability to deal with real-valued objects arriving in a streaming fashion. In this work we make two contributions. First, we demonstrate that in many cases a surprisingly small set of human-annotated examples is sufficient to perform accurate classification. Second, we devise a novel parameter-free stopping criterion for semi-supervised learning. We evaluate our work with a comprehensive set of experiments on diverse medical data sources, including electrocardiograms. Our experimental results suggest that our approach can typically construct accurate classifiers even if given only a single annotated instance.
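To make the setting concrete, the following is a minimal sketch of generic self-training on real-valued series, starting from a single annotated instance. It is not the method of this chapter: the function names, the Euclidean distance, the toy data, and above all the `max_additions` budget are assumptions made for illustration. The fixed budget merely stands in for a stopping criterion, which is exactly the parameter the abstract says the proposed approach avoids.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length real-valued sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def self_train(seed, unlabeled, max_additions):
    """Grow a positive set from one labeled seed by repeatedly adding the
    unlabeled series closest to any series already in the positive set.

    max_additions is a hypothetical fixed budget standing in for a real
    stopping criterion; it is NOT the parameter-free criterion of the chapter.
    Returns the indices of `unlabeled` in the order they were added.
    """
    positives = [seed]
    remaining = set(range(len(unlabeled)))
    added = []
    while remaining and len(added) < max_additions:
        # Pick the unlabeled series with the smallest distance to the positive set.
        best = min(
            remaining,
            key=lambda i: min(euclidean(unlabeled[i], p) for p in positives),
        )
        positives.append(unlabeled[best])
        remaining.remove(best)
        added.append(best)
    return added


# Toy data, made up for illustration: series 0 and 2 resemble the seed.
seed = [0.0, 1.0, 0.0, -1.0]
unlabeled = [
    [0.1, 0.9, 0.0, -1.0],   # near the seed
    [5.0, 5.0, 5.0, 5.0],    # far from the seed
    [0.2, 0.8, 0.1, -0.9],   # near series 0
    [-5.0, 4.0, -5.0, 4.0],  # far from the seed
]
added = self_train(seed, unlabeled, max_additions=2)
print(added)
```

With a budget of 2 the sketch picks up the two series that resemble the seed. A larger budget would start absorbing the dissimilar series as well, which illustrates why the choice of when to stop matters so much in this setting.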
Acknowledgments
This research was funded by NSF grant IIS-1161997.
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Begum, N., Hu, B., Rakthanmanon, T., Keogh, E. (2014). A Minimum Description Length Technique for Semi-Supervised Time Series Classification. In: Bouabana-Tebibel, T., Rubin, S. (eds) Integration of Reusable Systems. Advances in Intelligent Systems and Computing, vol 263. Springer, Cham. https://doi.org/10.1007/978-3-319-04717-1_8
DOI: https://doi.org/10.1007/978-3-319-04717-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04716-4
Online ISBN: 978-3-319-04717-1
eBook Packages: Engineering, Engineering (R0)