Abstract
We are concerned with clustering sequences that have been classified according to heterogeneous schemas. We adopt a model-based approach using a hidden Markov model (HMM) whose states are the stages of the underlying process that generates the sequences, which allows us to handle complex and heterogeneous data. Each cluster is described by an HMM, and we seek schema mappings between the states of the original sequences and the states of the HMM. The general solution that we propose involves several distinct tasks. Firstly, there is a clustering problem in which we seek to group similar sequences; for this we use mutual entropy to identify associations between sequence states. Secondly, because the sequences being clustered are heterogeneous, we must determine the mappings between the states of each sequence in a cluster and the states of an underlying hidden process; for this we compute the most probable mapping. Thirdly, on the basis of these mappings we use maximum likelihood techniques to learn the probabilistic description of the hidden Markov process for each cluster. Finally, we use these descriptions to characterise the clusters, applying dynamic programming to determine the most probable pathway for each cluster. Such an approach provides an intuitive way of describing the underlying shape of the process by explicitly modelling the temporal aspects of the data; non-time-homogeneous HMMs are also considered. The approach is illustrated using gene expression sequences.
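The third and fourth tasks can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on simplifying assumptions: the sequences in a cluster are taken to be already mapped onto integer-coded hidden states, the chain is time-homogeneous, and the function names `fit_markov_chain` and `most_probable_path` are hypothetical. Maximum likelihood estimation then reduces to counting, and the most probable pathway of a given length is found by a Viterbi-style dynamic programme.

```python
import math


def fit_markov_chain(sequences, n_states):
    """Maximum likelihood estimates of initial and transition probabilities.

    Assumes each sequence is a non-empty list of state indices that have
    already been mapped onto the hidden states (a simplifying assumption).
    """
    init = [0.0] * n_states
    trans = [[0.0] * n_states for _ in range(n_states)]
    for seq in sequences:
        init[seq[0]] += 1
        for a, b in zip(seq, seq[1:]):
            trans[a][b] += 1
    total = sum(init)
    init = [c / total for c in init]
    for row in trans:
        row_sum = sum(row)
        if row_sum > 0:  # rows with no observed transitions stay all-zero
            row[:] = [c / row_sum for c in row]
    return init, trans


def most_probable_path(init, trans, length):
    """Most probable state path of the given length, by dynamic programming."""
    n = len(init)

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # score[j] = best log-probability of any path so far that ends in state j
    score = [log(p) for p in init]
    back = []
    for _ in range(length - 1):
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + log(trans[i][j]))
            new_score.append(score[best_i] + log(trans[best_i][j]))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state
    state = max(range(n), key=lambda j: score[j])
    best_log_prob = score[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, math.exp(best_log_prob)
```

For example, `fit_markov_chain([[0, 1, 2], [0, 1, 2], [0, 2, 2]], 3)` followed by `most_probable_path(init, trans, 3)` characterises that toy cluster by a single pathway. A non-time-homogeneous variant would need one transition matrix per time step, which this sketch does not cover.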
Keywords
- Hidden Markov Model
- Schema Mapping
- Conceptual Cluster
- Heterogeneous Sequence
- Hidden Markov Model State
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
McClean, S., Scotney, B., Palmer, F. (2002). Conceptual Clustering of Heterogeneous Sequences via Schema Mapping. In: Hacid, MS., Raś, Z.W., Zighed, D.A., Kodratoff, Y. (eds) Foundations of Intelligent Systems. ISMIS 2002. Lecture Notes in Computer Science, vol 2366. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48050-1_11
DOI: https://doi.org/10.1007/3-540-48050-1_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43785-7
Online ISBN: 978-3-540-48050-1
eBook Packages: Springer Book Archive
