Conceptual Clustering of Heterogeneous Sequences via Schema Mapping

Part of the Lecture Notes in Computer Science book series (LNAI, volume 2366)

Abstract

We are concerned with clustering sequences that have been classified according to heterogeneous schemas. We adopt a model-based approach built on a hidden Markov model (HMM) whose states are the stages of the underlying process that generates the sequences, which allows us to handle complex and heterogeneous data. Each cluster is described in terms of an HMM, and we seek schema mappings between the states of the original sequences and the states of the HMM. The general solution that we propose involves several distinct tasks. Firstly, there is a clustering problem in which we seek to group similar sequences; for this we use mutual entropy to identify associations between sequence states. Secondly, because the sequences being clustered are heterogeneous, we must determine the mappings between the states of each sequence in a cluster and the states of an underlying hidden process; for this we compute the most probable mapping. Thirdly, on the basis of these mappings, we use maximum likelihood techniques to learn the probabilistic description of the hidden Markov process for each cluster. Finally, we use these descriptions to characterise the clusters, applying dynamic programming to determine the most probable pathway for each cluster. Such an approach provides an intuitive way of describing the underlying shape of the process by explicitly modelling the temporal aspects of the data; non-time-homogeneous HMMs are also considered. The approach is illustrated using gene expression sequences.
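The third and fourth tasks can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the sequences in a cluster have already been mapped to integer indices of the hidden states, and that the process is a first-order, time-homogeneous Markov chain. The function names `estimate_markov_chain` and `most_probable_path` are hypothetical, chosen only for this example.

```python
import math


def estimate_markov_chain(sequences, n_states):
    """Maximum likelihood estimates of initial and transition probabilities.

    Assumption: each sequence is a non-empty list of hidden-state indices,
    i.e. the schema mapping has already been applied.
    """
    init_counts = [0] * n_states
    trans_counts = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        init_counts[seq[0]] += 1
        for a, b in zip(seq, seq[1:]):
            trans_counts[a][b] += 1
    total = sum(init_counts)
    init = [c / total for c in init_counts]
    trans = []
    for row in trans_counts:
        row_sum = sum(row)
        # States with no observed outgoing transition get a uniform row
        # (an arbitrary choice made for this sketch).
        trans.append([c / row_sum if row_sum else 1.0 / n_states for c in row])
    return init, trans


def most_probable_path(init, trans, length):
    """Most probable state path of a given length, by dynamic programming."""
    n = len(init)

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    score = [log(p) for p in init]  # best log-probability ending in each state
    back = []                       # back-pointers, one list per time step
    for _ in range(length - 1):
        new_score, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + log(trans[i][j]))
            new_score.append(score[best_i] + log(trans[best_i][j]))
            ptr.append(best_i)
        back.append(ptr)
        score = new_score
    # Trace back from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    cluster = [[0, 1, 2], [0, 1, 2], [0, 1, 1]]  # toy data, not from the paper
    init, trans = estimate_markov_chain(cluster, n_states=3)
    print(most_probable_path(init, trans, length=3))
```

Working in log-probabilities avoids numerical underflow on long sequences. A non-time-homogeneous variant would, under the same assumptions, keep one transition matrix per time step instead of a single shared one.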

Keywords

  • Hidden Markov Model
  • Schema Mapping
  • Conceptual Cluster
  • Heterogeneous Sequence
  • Hidden Markov Model State

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

McClean, S., Scotney, B., Palmer, F. (2002). Conceptual Clustering of Heterogeneous Sequences via Schema Mapping. In: Hacid, MS., Raś, Z.W., Zighed, D.A., Kodratoff, Y. (eds) Foundations of Intelligent Systems. ISMIS 2002. Lecture Notes in Computer Science, vol 2366. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48050-1_11

  • DOI: https://doi.org/10.1007/3-540-48050-1_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43785-7

  • Online ISBN: 978-3-540-48050-1

  • eBook Packages: Springer Book Archive
