Distribution of Statistics of Hidden State Sequences Through the Sum-Product Algorithm

Methodology and Computing in Applied Probability

Abstract

We compute exact distributions of statistics of hidden state sequences in general settings. Distributions are computed for undirected and directed graphical models represented as conditional random fields and factor graphs. The methods apply to graphs whose edges are sparse enough to allow exact computation of the normalization constant. The distributions are obtained efficiently by integrating sequential updates of the statistic's value into the sum-product algorithm. Applications include discrete hidden state sequences perturbed by noise and/or missing values, and state sequences that serve to classify observations; in the classification case, the methods quantify the uncertainty in statistics associated with the classifications. The algorithm is applied to model-based false discovery distributions for protein-protein interactions and to distributions related to CpG island lengths in DNA sequences.
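The central idea, carrying the statistic's running value alongside the sum-product messages, can be illustrated in the simplest case: a hidden Markov model, that is, a chain-structured factor graph, where the sum-product algorithm reduces to the forward recursion. The sketch below is not the authors' implementation. It assumes a statistic that grows additively by an amount determined by each pair of consecutive hidden states, and the names `statistic_distribution`, `increment`, `pi`, `A`, and `emis` are hypothetical. Each forward message is indexed by the pair (state, statistic value), and the sum of the final messages is the normalization constant.

```python
from collections import defaultdict


def statistic_distribution(pi, A, emis, increment):
    """Exact posterior distribution of an additive statistic of the hidden path.

    Hypothetical interface (a sketch, not the paper's code):
      pi[k]       initial probability of hidden state k
      A[j][k]     transition probability from state j to state k
      emis[t][k]  likelihood P(y_t | X_t = k) of the observation at time t
      increment(prev, cur)  amount the statistic grows on the step prev -> cur
                            (prev is None at the first position)
    Returns {value: P(S = value | y_1..y_T)}.
    """
    K = len(pi)
    # alpha[(k, s)] is proportional to P(y_1..y_t, X_t = k, S_t = s):
    # forward messages augmented with the running value of the statistic.
    alpha = defaultdict(float)
    for k in range(K):
        alpha[(k, increment(None, k))] += pi[k] * emis[0][k]
    for t in range(1, len(emis)):
        new = defaultdict(float)
        for (j, s), mass in alpha.items():
            for k in range(K):
                p = mass * A[j][k] * emis[t][k]
                if p > 0.0:
                    # Sequential update of the statistic inside the sum-product step.
                    new[(k, s + increment(j, k))] += p
        # Rescale to avoid underflow; the common factor cancels at the end.
        z = sum(new.values())
        alpha = {key: m / z for key, m in new.items()}
    # Marginalize out the final state, then divide by the normalization constant.
    dist = defaultdict(float)
    for (_, s), mass in alpha.items():
        dist[s] += mass
    total = sum(dist.values())
    return {s: m / total for s, m in sorted(dist.items())}
```

For example, with a two-state model where state 1 hypothetically marks an island, `increment = lambda prev, cur: 1 if cur == 1 and prev != 1 else 0` yields the posterior distribution of the number of islands. The cost is roughly the usual forward-recursion cost multiplied by the number of attainable statistic values, which is why exact computation stays tractable only when the graph is sparse.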



Author information

Corresponding author

Correspondence to Donald E. K. Martin.

About this article

Cite this article

Martin, D.E.K., Aston, J.A.D. Distribution of Statistics of Hidden State Sequences Through the Sum-Product Algorithm. Methodol Comput Appl Probab 15, 897–918 (2013). https://doi.org/10.1007/s11009-012-9289-4

  • DOI: https://doi.org/10.1007/s11009-012-9289-4
