Abstract
Mixtures of Hidden Markov Models (MHMM) are widely used for clustering of sequential data, by letting each cluster correspond to a Hidden Markov Model (HMM). Expectation Maximization (EM) is the standard approach for learning the parameters of an MHMM. However, due to the non-convexity of the objective function, EM can converge to poor local optima. To tackle this problem, we propose a novel method, the Orthogonal Mixture of Hidden Markov Models (oMHMM), which aims to direct the search away from candidate solutions that include very similar HMMs, since those do not fully exploit the power of the mixture model. The directed search is achieved by including a penalty in the objective function that favors higher orthogonality between the transition matrices of the HMMs. Experimental results on both simulated and real-world datasets show that the oMHMM consistently finds equally good or better local optima than the standard EM for an MHMM; for some datasets, the clustering performance is significantly improved by our novel oMHMM (up to 55 percentage points w.r.t. the v-measure). Moreover, the oMHMM may also decrease the computational cost substantially, reducing the number of iterations down to a fifth of those required by MHMM using standard EM.
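The penalty described above rewards dissimilarity between the clusters' transition matrices. As a minimal sketch of the idea (not the paper's exact objective), one common way to quantify (non-)orthogonality of two matrices is their Frobenius inner product, trace(AᵀB): it is zero when the matrices are orthogonal and grows as they become similar. The function name and the example matrices below are illustrative assumptions.

```python
import numpy as np

def orthogonality_penalty(A, B):
    """Frobenius inner product trace(A.T @ B) of two HMM transition matrices.

    Rows of a transition matrix are probability distributions. The closer
    this value is to zero, the more orthogonal (dissimilar) the matrices
    are; a penalty proportional to it, added to the EM objective with a
    suitable sign, discourages near-identical component HMMs.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    return float(np.sum(A * B))  # element-wise product summed = trace(A.T @ B)

# Two 2-state transition matrices: dissimilar vs. identical dynamics.
A = np.array([[0.9, 0.1],
              [0.8, 0.2]])
B = np.array([[0.1, 0.9],
              [0.2, 0.8]])
print(orthogonality_penalty(A, B))  # 0.5  (near-orthogonal: low penalty)
print(orthogonality_penalty(A, A))  # 1.5  (identical: high penalty)
```

In the full method, a term of this form over all pairs of component HMMs would be combined with the data log-likelihood inside the M-step; the sketch only shows the pairwise quantity being penalized.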
Notes
- 1.
- 2. Available from the NCBI Sequence Read Archive (SRA) under accession number SRP074289; for pre-processing of the data, see [18].
References
Aghabozorgi, S., Seyed Shirkhorshidi, A., Ying Wah, T.: Time-series clustering - a decade review. Inf. Syst. 53, 16–38 (2015)
Agrawal, A., Verschueren, R., Diamond, S., Boyd, S.: A rewriting system for convex optimization problems. J. Control Decision 5(1), 42–60 (2018)
Altosaar, J., Ranganath, R., Blei, D.: Proximity variational inference. AISTATS (2017)
Bache, K., Lichman, M.: UCI Machine Learning Repository (2013)
Baum, L., Petrie, T.: Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Stat. 37(6), 1554–1563 (1966)
Bishop, C.: Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, New York (2006)
Bishop, C.: Model-based machine learning. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 371 (2012)
Blei, D., Kucukelbir, A., Mcauliffe, J.: Variational inference: a review for statisticians. J. Am. Statist. Assoc. 112(518), 859–877 (2017)
Chamroukhi, F., Nguyen, H.: Model-based clustering and classification of functional data. Wiley Interdiscip. Rev. Data Mining Knowl. Disc. 9(4), e1298 (2019)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc.: Ser. B (Methodol.) 39(1), 1–22 (1977)
Diamond, S., Boyd, S.: CVXPY: a Python-embedded modeling language for convex optimization. J. Mach. Learn. Res. 17(83), 1–5 (2016)
Dias, J., Vermunt, J., Ramos, S.: Mixture hidden Markov models in finance research. In: Advances in Data Analysis, Data Handling and Business Intelligence, pp. 451–459 (2009)
Esmaili, N., Piccardi, M., Kruger, B., Girosi, F.: Correction: Analysis of healthcare service utilization after transport-related injuries by a mixture of hidden Markov models. PLoS One 14(4), e0206274 (2019)
Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (2013)
Jebara, T., Song, Y., Thadani, K.: Spectral clustering and embedding with hidden Markov models. In: ECML 2007. LNCS, vol. 4701, pp. 164–175 (2007)
Alon, J., Sclaroff, S., Kollios, G., Pavlovic, V.: Discovering clusters in motion time-series data. In: CVPR (2003)
Kulesza, A., Taskar, B.: Determinantal point processes for machine learning. Found. Trends Mach. Learn. 5(2–3), 123–286 (2012)
Leung, M., et al.: Single-cell DNA sequencing reveals a late-dissemination model in metastatic colorectal cancer. Genome Res. 27(8), 1287–1299 (2017)
Ma, Q., Zheng, J., Li, S., Cottrell, G.: Learning representations for time series clustering. Adv. Neural Inf. Process. Syst. 32, 3781–3791 (2019)
Qiao, M., Bian, W., Xu, D., Tao, D.: Diversified hidden Markov models for sequential labeling. IEEE Trans. Knowl. Data Eng. 27(11), 2947–2960 (2015)
McGibbon, R., Ramsundar, B., Sultan, M., Kiss, G., Pande, V.: Understanding protein dynamics with L1-regularized reversible hidden Markov models. In: Proceedings of the 31st International Conference on Machine Learning, vol. 32, no. 2, pp. 1197–1205 (2014)
Montanez, G., Amizadeh, S., Laptev, N.: Inertial hidden Markov models: modeling change in multivariate time series. In: AAAI Conference on Artificial Intelligence (2015)
Oates, T., Firoiu, L., Cohen, P.: Clustering time series with hidden Markov models and dynamic time warping. In: IJCAI-99 Workshop on Neural, Symbolic and Reinforcement Learning Methods for Sequence Learning, pp. 17–21 (1999)
Pernes, D., Cardoso, J.S.: SpaMHMM: sparse mixture of hidden Markov models for graph connected entities. In: 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1–10 (2019)
Rand, W.: Objective criteria for the evaluation of clustering methods. J. Am. Statist. Assoc. 66(336), 846–850 (1971)
Rosenberg, A., Hirschberg, J.: V-measure: a conditional entropy-based external cluster evaluation measure. In: EMNLP-CoNLL (2007)
Safinianaini, N., Boström, H., Kaldo, V.: Gated hidden markov models for early prediction of outcome of internet-based cognitive behavioral therapy. In: Riaño, D., Wilk, S., ten Teije, A. (eds.) AIME 2019. LNCS (LNAI), vol. 11526, pp. 160–169. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-21642-9_22
Safinianaini, N., De Souza, C., Lagergren, J.: Copymix: mixture model based single-cell clustering and copy number profiling using variational inference. bioRxiv (2020). https://doi.org/10.1101/2020.01.29.926022
Smyth, P.: Clustering sequences with hidden Markov models. In: Advances in Neural Information Processing Systems (1997)
Subakan, C., Traa, J., Smaragdis, P.: Spectral learning of mixture of hidden Markov models. Adv. Neural Inf. Process. Syst. 27, 2249–2257 (2014)
Tao, L., Elhamifar, E., Khudanpur, S., Hager, G., Vidal, R.: Sparse hidden Markov models for surgical gesture classification and skill evaluation. In: Proceedings of International Conference on Natural Language Processing and Knowledge Engineering, pp. 167–177 (2012)
Wang, Q., Schuurmans, D.: Improved estimation for unsupervised part-of-speech tagging. In: Proceedings of International Conference on Natural Language Processing and Knowledge Engineering, pp. 219–224 (2005)
Xing, Z., Pei, J., Keogh, E.: A brief survey on sequence classification. ACM SIGKDD Explor. Newslett. 12(1), 40–48 (2010)
Qi, Y., Paisley, J., Carin, L.: Music analysis using hidden Markov mixture models. IEEE Trans. Signal Process. 55(11), 5209–5224 (2007)
Acknowledgments
We thank Johan Fylling, Mohammadreza Mohaghegh Neyshabouri, and Diogo Pernes for their great help during the preparation of this paper.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Safinianaini, N., de Souza, C.P.E., Boström, H., Lagergren, J. (2021). Orthogonal Mixture of Hidden Markov Models. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science(), vol 12457. Springer, Cham. https://doi.org/10.1007/978-3-030-67658-2_29
Print ISBN: 978-3-030-67657-5
Online ISBN: 978-3-030-67658-2