Data Mining pp 244-259
Hierarchical Hidden Markov Models: An Application to Health Insurance Data
Abstract
This paper presents a constructive algorithm with which a hierarchical tree of hidden Markov models can be obtained directly from data under an unsupervised learning regime. The method is applied to health insurance transaction data so that profiles with similar local temporal behaviours are grouped together. By judiciously incorporating limited additional prior information, the profiles can be further separated into sub-behavioural groups, which provides a technique for large-scale automatic labelling of data. In the application to the health insurance transaction data set, incorporating limited information about the medical functions used in a medical procedure makes it possible to label some individual medical transactions according to whether or not they are related to a particular medical condition. This automatic labelling process adds value to the collected transactional database for possible further applications, e.g. public health studies.
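The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of grouping sequences with a tree of HMMs, not the authors' method. It assumes discrete-symbol HMMs with hand-set parameters (nothing here is fitted to insurance data) and shows two ingredients: scoring a sequence with the forward algorithm in log space, and routing that sequence down a tree to the leaf whose HMM explains it best. The names `DiscreteHMM`, `Node` and `assign`, and the symbols 0 and 1, are hypothetical placeholders. The constructive tree-building and parameter estimation (e.g. Baum-Welch) described in the paper are not shown.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def safe_log(p: float) -> float:
    """log(p), mapping zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


@dataclass
class DiscreteHMM:
    """HMM over discrete symbols; parameters are plain probabilities."""
    start: List[float]          # start[i]    = P(z_0 = i)
    trans: List[List[float]]    # trans[i][j] = P(z_t = j | z_{t-1} = i)
    emit: List[List[float]]     # emit[i][k]  = P(x_t = k | z_t = i)

    def log_likelihood(self, seq: Sequence[int]) -> float:
        """Forward algorithm in log space: returns log P(seq | model)."""
        n = len(self.start)
        alpha = [safe_log(self.start[i]) + safe_log(self.emit[i][seq[0]])
                 for i in range(n)]
        for x in seq[1:]:
            alpha = [
                logsumexp([alpha[i] + safe_log(self.trans[i][j]) for i in range(n)])
                + safe_log(self.emit[j][x])
                for j in range(n)
            ]
        return logsumexp(alpha)


@dataclass
class Node:
    """Hypothetical tree node: an HMM plus optional children (leaves = groups)."""
    name: str
    model: DiscreteHMM
    children: List["Node"] = field(default_factory=list)


def assign(root: Node, seq: Sequence[int]) -> str:
    """Descend the tree, at each level picking the child HMM with highest log P(seq)."""
    node = root
    while node.children:
        node = max(node.children, key=lambda c: c.model.log_likelihood(seq))
    return node.name


# Hypothetical hand-set models for illustration only; symbols 0 and 1 are
# arbitrary placeholders, not real transaction codes.
sticky = [[0.9, 0.1], [0.1, 0.9]]
low = DiscreteHMM([0.5, 0.5], sticky, [[0.9, 0.1], [0.8, 0.2]])
high = DiscreteHMM([0.5, 0.5], sticky, [[0.1, 0.9], [0.2, 0.8]])
uniform = DiscreteHMM([0.5, 0.5], sticky, [[0.5, 0.5], [0.5, 0.5]])
root = Node("root", uniform, [Node("low", low), Node("high", high)])

print(assign(root, [0, 0, 0, 1, 0]))  # low
print(assign(root, [1, 1, 1, 1, 0]))  # high
```

Because each step only compares the likelihoods of a node's children, such a tree can in principle be deepened one level at a time by splitting a leaf into finer sub-groups; how the paper decides when and how to split is not reproduced here.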
Keywords
Hidden Markov Model, Gaussian Mixture Model, Pattern Discovery, Postcode Area, Health Insurance Data