Data Mining and Knowledge Discovery, Volume 23, Issue 2, pp 322–344

Sequence classification via large margin hidden Markov models

  • Minyoung Kim
  • Vladimir Pavlovic
Article

Abstract

We address the sequence classification problem using a probabilistic model based on hidden Markov models (HMMs). In contrast to commonly used likelihood-based learning methods such as the joint/conditional maximum likelihood estimator, we introduce a discriminative learning algorithm that focuses on class margin maximization. Our approach has two main advantages: (i) As an extension of support vector machines (SVMs) to sequential, non-Euclidean data, the approach inherits the benefits of margin-based classifiers, such as provable generalization error bounds. (ii) Unlike many algorithms based on non-parametric estimation of similarity measures, which enforce only weak constraints on the data domain, our approach uses the HMM's latent Markov structure to regularize the model in the high-dimensional sequence space. In an extensive set of evaluations on time-series sequence data that frequently appear in data mining and computer vision domains, we demonstrate that the proposed method significantly improves classification performance.
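To make the notion of a class margin concrete, the following is a minimal sketch and not the authors' algorithm. It assumes one discrete-emission HMM per class with strictly positive parameters, scores a sequence by log prior plus forward-algorithm log-likelihood, and takes the margin to be the true class score minus the best competing score. The function names (`log_likelihood`, `class_scores`, `margin`) and the toy parameters are hypothetical, chosen only for illustration.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(seq, start, trans, emit):
    """Forward algorithm in the log domain for a discrete-emission HMM.

    Assumes every probability is strictly positive (math.log(0) would raise).
    """
    n = len(start)
    # alpha[j] = log P(x_1..x_t, state_t = j)
    alpha = [math.log(start[j]) + math.log(emit[j][seq[0]]) for j in range(n)]
    for x in seq[1:]:
        alpha = [
            _logsumexp([alpha[i] + math.log(trans[i][j]) for i in range(n)])
            + math.log(emit[j][x])
            for j in range(n)
        ]
    return _logsumexp(alpha)


def class_scores(seq, models, priors):
    """Score of each class: log prior + log-likelihood under that class's HMM."""
    return {
        c: math.log(priors[c]) + log_likelihood(seq, *models[c]) for c in models
    }


def margin(seq, label, models, priors):
    """True-class score minus the best competing class score (assumed definition)."""
    scores = class_scores(seq, models, priors)
    rival = max(s for c, s in scores.items() if c != label)
    return scores[label] - rival


# Hypothetical toy models: two classes, two hidden states, two symbols.
TRANS = [[0.8, 0.2], [0.2, 0.8]]
MODELS = {
    "a": ([0.5, 0.5], TRANS, [[0.9, 0.1], [0.8, 0.2]]),  # mostly emits symbol 0
    "b": ([0.5, 0.5], TRANS, [[0.1, 0.9], [0.2, 0.8]]),  # mostly emits symbol 1
}
PRIORS = {"a": 0.5, "b": 0.5}

if __name__ == "__main__":
    seq = [0, 0, 0, 0]
    scores = class_scores(seq, MODELS, PRIORS)
    print(max(scores, key=scores.get), margin(seq, "a", MODELS, PRIORS))
```

A positive margin means the sequence is classified correctly with some slack. A margin-based learner would adjust the HMM parameters to enlarge this quantity over the training set, whereas maximum likelihood training fits each class model to its own data without regard to the competing classes.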

Keywords

Sequence classification, Hidden Markov models

Copyright information

© The Author(s) 2010

Authors and Affiliations

  1. Department of Electronic & Information Engineering, Seoul National University of Science & Technology, Seoul, Korea
  2. Department of Computer Science, Rutgers University, Piscataway, USA
