Distribution of Statistics of Hidden State Sequences Through the Sum-Product Algorithm

Martin, Donald E. K.; Aston, John A. D.

doi:10.1007/s11009-012-9289-4

Published: 17 May 2012

Volume 15, pages 897–918 (2013)

Methodology and Computing in Applied Probability

Abstract

We compute exact distributions of statistics of hidden state sequences in general settings. Distributions are computed for undirected and directed graphical models that are represented using conditional random fields and factor graphs. The methods discussed apply to graphs whose edges are sparse enough to allow exact computation of the normalization constant. The distributions are obtained efficiently by integrating sequential updates of the statistic’s value with the sum-product algorithm. Applications of this work include discrete hidden state sequences perturbed by noise and/or missing values, and state sequences that serve to classify observations. In the case of classification, the methods give a way to quantify the uncertainty in statistics associated with the classifications. The algorithm is applied to model-based false discovery distributions for protein-protein interactions and to distributions related to CpG island lengths in DNA sequences.
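As a rough illustration of what "integrating sequential updates of the statistic's value with the sum-product algorithm" can look like, the sketch below treats the simplest chain-structured case, a hidden Markov model, where sum-product message passing reduces to the forward recursion. It is not the authors' implementation. The function names, the choice of statistic (the number of time steps spent in one hidden state), and the toy parameters are all assumptions made for this example. The forward message is indexed by both the current hidden state and the current value of the statistic, and the result is checked against brute-force enumeration of all hidden paths.

```python
from itertools import product


def count_distribution(init, trans, emit, obs, target=1):
    """Exact P(N = k | obs), where N counts time steps with hidden state == target.

    init[s]     : P(S_1 = s)
    trans[s][r] : P(S_{t+1} = r | S_t = s)
    emit[s][y]  : P(Y_t = y | S_t = s)
    Assumes len(obs) >= 1. All names here are hypothetical, for illustration only.
    """
    n_states, T = len(init), len(obs)
    # alpha[s][k] = P(obs so far, current state = s, count so far = k):
    # the usual forward message, augmented with the value of the statistic.
    alpha = [[0.0] * (T + 1) for _ in range(n_states)]
    for s in range(n_states):
        alpha[s][int(s == target)] = init[s] * emit[s][obs[0]]
    for y in obs[1:]:
        new = [[0.0] * (T + 1) for _ in range(n_states)]
        for s in range(n_states):
            for k in range(T + 1):
                if alpha[s][k] == 0.0:
                    continue
                for r in range(n_states):
                    k_next = k + int(r == target)  # sequential update of the statistic
                    new[r][k_next] += alpha[s][k] * trans[s][r] * emit[r][y]
        alpha = new
    joint = [sum(alpha[s][k] for s in range(n_states)) for k in range(T + 1)]
    z = sum(joint)  # likelihood of the observations (normalization constant)
    return [p / z for p in joint]


def brute_force(init, trans, emit, obs, target=1):
    """Same distribution by enumerating every hidden path (feasible only for tiny T)."""
    T = len(obs)
    joint = [0.0] * (T + 1)
    for path in product(range(len(init)), repeat=T):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, T):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        joint[sum(s == target for s in path)] += p
    z = sum(joint)
    return [p / z for p in joint]


# Toy parameters, chosen arbitrarily for illustration.
INIT = [0.6, 0.4]
TRANS = [[0.9, 0.1], [0.2, 0.8]]
EMIT = [[0.7, 0.3], [0.1, 0.9]]
OBS = [0, 1, 1, 0, 1]

dist = count_distribution(INIT, TRANS, EMIT, OBS)
```

Here `dist[k]` is the conditional probability that the hidden chain visits the target state exactly `k` times given the observations. The augmented recursion costs on the order of T² times the square of the number of states, whereas the enumeration in `brute_force` grows exponentially in T and serves only as a check on small inputs.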

Author information

Authors and Affiliations

Department of Statistics, NC State University, 4272 SAS Hall, Raleigh, NC, 27695-8203, USA
Donald E. K. Martin
CRiSM, Department of Statistics, University of Warwick, Warwick, UK
John A. D. Aston

Corresponding author

Correspondence to Donald E. K. Martin.

About this article

Cite this article

Martin, D.E.K., Aston, J.A.D. Distribution of Statistics of Hidden State Sequences Through the Sum-Product Algorithm. Methodol Comput Appl Probab 15, 897–918 (2013). https://doi.org/10.1007/s11009-012-9289-4

Received: 28 December 2010
Revised: 23 January 2012
Accepted: 25 April 2012
Published: 17 May 2012
Issue Date: December 2013
DOI: https://doi.org/10.1007/s11009-012-9289-4
