Data Mining, pp. 244–259

Hierarchical Hidden Markov Models: An Application to Health Insurance Data

  • Ah Chung Tsoi
  • Shu Zhang
  • Markus Hagenbuchner
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3755)

Abstract

This paper provides a constructive algorithm by which a hierarchical tree of hidden Markov models may be obtained directly from data using an unsupervised learning regime. The method is applied to health insurance transaction data so that profiles with similar local temporal behaviours are grouped together. By judicious incorporation of limited additional prior information, profiles can be separated into various sub-behavioural groups, thus providing a technique for large-scale automatic labelling of data. In the application to the health insurance transaction data set, by incorporating limited information concerning the medical functions used in a medical procedure, it is possible to label some individual medical transactions as to whether or not they are related to a particular medical condition. This automatic labelling process adds value to the collected transactional database for possible further applications, e.g. public health studies.
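The abstract's core idea is grouping profiles whose local temporal behaviour is similar. As a rough, simplified illustration (not the paper's hierarchical HMM algorithm), one can estimate a first-order Markov transition matrix per sequence and cluster sequences by the similarity of those matrices; the function names `transition_matrix` and `cluster_by_dynamics` below are illustrative inventions, and the hard-assignment k-means loop is a crude stand-in for likelihood-based HMM clustering:

```python
import numpy as np

def transition_matrix(seq, n_states):
    """Estimate a first-order Markov transition matrix from one sequence.
    A small pseudo-count keeps every row normalisable."""
    counts = np.full((n_states, n_states), 1e-3)
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

def cluster_by_dynamics(seqs, n_states, k, n_iter=20):
    """Hard-assignment (k-means style) clustering of sequences by their
    estimated transition dynamics -- a simplified stand-in for grouping
    HMM profiles by local temporal behaviour."""
    mats = np.array([transition_matrix(s, n_states).ravel() for s in seqs])
    centres = mats[:k].copy()  # deterministic init: first k sequences
    labels = np.zeros(len(seqs), dtype=int)
    for _ in range(n_iter):
        # squared Euclidean distance from each sequence's dynamics to each centre
        d = ((mats[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centres[j] = mats[labels == j].mean(axis=0)
    return labels
```

For example, sequences that alternate between two states end up in a different group from sequences that mostly repeat a state, even when both visit the same states equally often — the separation comes from the dynamics, not the marginal state frequencies.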

Keywords

Hidden Markov Model · Gaussian Mixture Model · Pattern Discovery · Postcode Area · Health Insurance Data



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ah Chung Tsoi (1)
  • Shu Zhang (2)
  • Markus Hagenbuchner (2)

  1. e-Research Centre, Monash University
  2. Faculty of Informatics, University of Wollongong
