Identifying Change-Points in Biological Sequences via Sequential Importance Sampling

Sofronov, George Yu.; Evans, Gareth E.; Keith, Jonathan M.; Kroese, Dirk P.

doi:10.1007/s10666-008-9160-8

Published: 05 July 2008

Volume 14, pages 577–584, (2009)

Environmental Modeling & Assessment

Abstract

The genomes of complex organisms, including the human genome, are highly structured. This structure takes the form of segmental patterns of variation in various properties and may be caused by the division of genomes into regions of distinct function, by the contingent evolutionary processes that gave rise to genomes, or by a combination of both. Whatever the cause, identifying the change-points between segments is potentially important, as a means of discovering the functional components of a genome, understanding the evolutionary processes involved, and fully describing genomic architecture. One property of genomes that is known to display a segmental pattern of variation is GC content. The GC content of a portion of DNA is the proportion of GC pairs that it contains. Sharp changes in GC content can be observed in human and other genomes. Such change-points may be the boundaries of functional elements or may play a structural role. We model genome sequences as a multiple change-point process, that is, a process in which sequential data are separated into segments by an unknown number of change-points, with each segment supposed to have been generated by a different process. We consider a Sequential Importance Sampling approach to change-point modeling using Monte Carlo simulation to find estimates of change-points as well as parameters of the process on each segment. Numerical experiments illustrate the effectiveness of the approach. We obtain estimates for the locations of change-points in artificially generated sequences and compare the accuracy of these estimates to those obtained via Markov chain Monte Carlo and a well-known method, IsoFinder. We also provide examples with real data sets to illustrate the usefulness of this method.
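The multiple change-point model described above can be made concrete with a small sketch. It is not the Sequential Importance Sampling procedure itself, which the abstract does not spell out. It assumes, purely for illustration, that each base is reduced to a 0/1 indicator of G or C and that each segment is an independent Bernoulli sequence with its own GC probability. The function names `gc_indicator`, `simulate`, and `segment_log_lik` are hypothetical and do not come from the article.

```python
import math
import random


def gc_indicator(seq):
    """Map a DNA string to 0/1 values, where 1 marks a G or C base."""
    return [1 if base in "GCgc" else 0 for base in seq]


def simulate(change_points, probs, length, seed=0):
    """Draw a 0/1 sequence whose k-th segment has GC probability probs[k].

    change_points holds the start index of every segment after the first,
    in strictly increasing order (an assumed convention for this sketch).
    """
    rng = random.Random(seed)
    bounds = [0] + list(change_points) + [length]
    data = []
    for k in range(len(probs)):
        n = bounds[k + 1] - bounds[k]
        data.extend(1 if rng.random() < probs[k] else 0 for _ in range(n))
    return data


def segment_log_lik(data, change_points):
    """Log-likelihood of a segmentation under the assumed Bernoulli model.

    Each segment's probability is set to its maximum-likelihood value,
    i.e. the observed proportion of ones in that segment.
    """
    bounds = [0] + list(change_points) + [len(data)]
    total = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        n = hi - lo
        ones = sum(data[lo:hi])
        # Terms with a zero count contribute nothing (0 * log 0 = 0).
        for count in (ones, n - ones):
            if count > 0:
                total += count * math.log(count / n)
    return total


if __name__ == "__main__":
    data = simulate(change_points=[500], probs=[0.3, 0.6], length=1000)
    print(segment_log_lik(data, []))      # no change-point
    print(segment_log_lik(data, [500]))   # true change-point
```

With plug-in parameter estimates, adding a change-point can never lower this likelihood, so the likelihood alone cannot select the number of change-points. That is one reason a sampling-based treatment of the unknown number and locations of change-points, as the abstract proposes, is attractive.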

References

Braun, J. V., & Muller, H.-G. (1998). Statistical methods for DNA sequence segmentation. Statistical Science, 13, 142–162.
Keith, J., Kroese, D. P., & Bryant, D. (2004). A generalized Markov sampler. Methodology and Computing in Applied Probability, 6(1), 29–53.
Keith, J. M. (2006). Segmenting eukaryotic genomes with the generalized Gibbs sampler. Journal of Computational Biology, 13(7), 1369–1383.
Keith, J. M., Adams, P., Stephen, S., & Mattick, J. S. (2008). Delineating slowly and rapidly evolving fractions of the Drosophila genome. Journal of Computational Biology, 15(4), 407–430.
Oliver, J. L., Bernaola-Galvan, P., Carpena, P., & Roman-Roldan, R. (2001). Isochore chromosome maps of eukaryotic genomes. Gene, 276, 47–56.
Oliver, J. L., Carpena, P., Hackenberg, M., & Bernaola-Galvan, P. (2005). IsoFinder. http://bioinfo2.ugr.es/IsoF/isofinder.html.
Oliver, J. L., Carpena, P., Hackenberg, M., & Bernaola-Galvan, P. (2004). IsoFinder: computational prediction of isochores in genome sequences. Nucleic Acids Research, 32(Web Server issue), W287–W292.
Oliver, J. L., Carpena, P., Roman-Roldan, R., Mata-Balaguer, T., et al. (2002). Isochore chromosome maps of the human genome. Gene, 300, 117–127.
Oliver, J. L., Roman-Roldan, R., Perez, J., & Bernaola-Galvan, P. (1999). Segment: identifying compositional domains in DNA sequences. Bioinformatics, 15, 974–979.
Rubinstein, R. Y., & Kroese, D. P. (2007). Simulation and the Monte Carlo method, 2nd edition. Wiley, New York.
Waterston, R. H., Lindblad-Toh, K., Birney, E., Rogers, J., et al. (2002). Initial sequencing and comparative analysis of the mouse genome. Nature, 420, 520–562.

Acknowledgements

G. Yu. Sofronov and D. P. Kroese acknowledge the support of an Australian Research Council Discovery grant (DP0556631). J. M. Keith acknowledges the support of Australian Research Council Discovery grants (DP0452412, DP0556631) and a National Health and Medical Research Council grant, “Statistical methods and algorithms for analysis of high-throughput genetics and genomics platforms” (389892).

Author information

Authors and Affiliations

School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, New South Wales, 2522, Australia
George Yu. Sofronov
Department of Mathematics, The University of Queensland, Brisbane, Queensland, 4072, Australia
Gareth E. Evans & Dirk P. Kroese
School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, Queensland, 4001, Australia
Jonathan M. Keith

Corresponding author

Correspondence to George Yu. Sofronov.

Additional information

This is an extended version of a paper presented at the 17th Biennial Congress on Modelling and Simulation, Christchurch, New Zealand, December 2007.

About this article

Cite this article

Sofronov, G.Y., Evans, G.E., Keith, J.M. et al. Identifying Change-Points in Biological Sequences via Sequential Importance Sampling. Environ Model Assess 14, 577–584 (2009). https://doi.org/10.1007/s10666-008-9160-8

Received: 14 May 2008
Accepted: 29 May 2008
Published: 05 July 2008
Issue Date: October 2009
