Artificial Intelligence Review, Volume 20, Issue 1, pp 53–73

Conceptual Clustering of Heterogeneous Gene Expression Sequences

  • Sally McClean
  • Bryan Scotney
  • Steve Robinson

DOI: 10.1023/A:1026036631075

Cite this article as:
McClean, S., Scotney, B. & Robinson, S. Artificial Intelligence Review (2003) 20: 53. doi:10.1023/A:1026036631075

Abstract

We are concerned with clustering and characterising gene expression sequences that have been classified according to heterogeneous classification schemes. We adopt a model-based approach that uses a Hidden Markov Model (HMM) that has as states the stages of the underlying process that generates the gene sequences, thus allowing us to handle complex and heterogeneous data. Each cluster is described in terms of a HMM where we seek to find schema mappings between the states of the original sequences and the states of the HMM.

The general solution that we propose involves several distinct tasks. Firstly, there is a clustering problem where we seek to group similar sequences; for this we use mutual entropy to identify associations between sequence states. Secondly, because we are concerned with clustering heterogeneous sequences, we must determine the mappings between the states of each sequence in a cluster and the states of an underlying hidden process; for this we compute the most probable mapping. Thirdly, using these mappings we employ maximum likelihood techniques to learn the probabilistic description of the hidden Markov process for each cluster. Fourthly, we use these descriptions to characterise the clusters using Dynamic Programming to determine the most probable pathway for each cluster. Finally, we derive linguistic labels to describe the clusters in a user-friendly manner. Such an approach provides an intuitive way of describing the underlying shape of the process by explicitly modelling the temporal aspects of the data. Non time-homogeneous HMMs are used to capture the full temporal semantics.
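
The fourth task, finding the most probable pathway by Dynamic Programming, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a learned cluster description is available as an initial state distribution plus one transition matrix per time step (so the chain need not be time-homogeneous), and the function name, argument layout, and example probabilities are all hypothetical.

```python
import math


def most_probable_pathway(initial, transitions):
    """Return (path, probability) of the most probable state sequence.

    initial: list of K initial state probabilities (assumed input format).
    transitions: list of T-1 matrices; transitions[t][i][j] is the
        probability of moving from state i at time t to state j at
        time t+1. One matrix per step allows non time-homogeneous chains.
    """
    def log(p):
        # Work in log space; zero probability maps to minus infinity.
        return math.log(p) if p > 0 else float("-inf")

    k = len(initial)
    score = [log(p) for p in initial]  # best log-probability ending in each state
    back = []                          # back-pointers, one list per time step

    for matrix in transitions:
        new_score, pointers = [], []
        for j in range(k):
            # Best predecessor state for arriving in state j at this step.
            best_i = max(range(k), key=lambda i: score[i] + log(matrix[i][j]))
            pointers.append(best_i)
            new_score.append(score[best_i] + log(matrix[best_i][j]))
        score = new_score
        back.append(pointers)

    # Trace the best final state back to the start.
    last = max(range(k), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, math.exp(score[last])


# Illustrative numbers only: two states, three time points.
initial = [0.6, 0.4]
transitions = [
    [[0.7, 0.3], [0.2, 0.8]],  # step from time 0 to time 1
    [[0.1, 0.9], [0.5, 0.5]],  # step from time 1 to time 2
]
path, prob = most_probable_pathway(initial, transitions)
# path is [0, 0, 1] with probability 0.6 * 0.7 * 0.9 = 0.378
```

Using a separate transition matrix for each step is what lets the pathway reflect temporal change in the process; with a single repeated matrix the same routine reduces to the time-homogeneous case.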

Keywords: bioinformatics, clustering, knowledge discovery, schema mapping, sequence processing

Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Sally McClean (1)
  • Bryan Scotney (1)
  • Steve Robinson (1)
  1. School of Computing and Information Engineering, Faculty of Informatics, University of Ulster, Coleraine, Northern Ireland
