Distribution on Contingency of Alignment of Two Literal Sequences Under Constrains

Jäntschi, Lorentz; Bolboacă, Sorana D.

doi:10.1007/s10441-014-9243-7


Regular Article
Published: 19 December 2014

Acta Biotheoretica, Volume 63, pages 55–69 (2015)

Lorentz Jäntschi^1,2,3,4 &
Sorana D. Bolboacă^3,5


Abstract

The case of ungapped alignment of two literal sequences under constraints is considered. The analysis led to general formulas for the probability mass function and the cumulative distribution function in the general case of an alphabet with a chosen number of letters (e.g. 4 for deoxyribonucleic acid sequences) used to write the literal sequences. Formulas were obtained for three statistics: the mean, the mode, and the standard deviation. Distributions are depicted for three important particular cases: alignment of binary sequences, alignment of trinomial series (such as those arising from the generalized Kronecker delta), and alignment of genetic sequences (with four letters in the alphabet). The particular case in which each letter of the alphabet occurs at least once in both sequences has also been analyzed, and some statistics for this restricted case are given.
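
The abstract does not state which quantity derived from the alignment's contingency table the closed-form PMF describes, so the following is only a sketch under an assumed, illustrative choice: the number of matching positions (the diagonal sum of the k x k table of aligned letter pairs), with all admissible pairs of length-n sequences over a k-letter alphabet taken as equally likely. It obtains the distribution by exhaustive enumeration, not by the paper's formulas, and then derives the CDF, mean, mode(s), and standard deviation from it. The `restricted` flag is an assumed reading of the "each letter at least once in both sequences" case. All function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb, sqrt


def contingency(a, b, k):
    """k x k table: cell (i, j) counts positions where a has letter i and b has letter j."""
    table = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):  # ungapped: position t of a is aligned with position t of b
        table[x][y] += 1
    return table


def match_pmf(n, k, restricted=False):
    """Exact PMF of the number of matching positions (diagonal sum of the table),
    with every admissible pair of length-n sequences over {0..k-1} equally likely.

    restricted=True keeps only sequences that use every letter at least once
    (assumed reading of the abstract's restricted case); it needs n >= k.
    """
    seqs = [s for s in product(range(k), repeat=n)
            if not restricted or len(set(s)) == k]
    counts, total = Counter(), 0
    for a in seqs:
        for b in seqs:
            table = contingency(a, b, k)
            counts[sum(table[i][i] for i in range(k))] += 1
            total += 1
    return {x: Fraction(c, total) for x, c in sorted(counts.items())}


def binomial_pmf(n, k):
    """Reference: unrestricted matches are Binomial(n, 1/k) under uniform letters."""
    p = Fraction(1, k)
    return {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}


def cdf(pmf):
    """Cumulative distribution function evaluated at each support point."""
    acc, out = Fraction(0), {}
    for x, p in sorted(pmf.items()):
        acc += p
        out[x] = acc
    return out


def summary(pmf):
    """Mean, list of modes (all maximisers), and standard deviation."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    top = max(pmf.values())
    return mean, [x for x, p in pmf.items() if p == top], sqrt(var)


if __name__ == "__main__":
    # k = 4 stands in for a DNA alphabet (A, C, G, T encoded as 0..3);
    # n stays small because enumeration builds k**(2n) tables.
    for restricted in (False, True):
        pmf = match_pmf(4, 4, restricted)
        mean, modes, sd = summary(pmf)
        print(f"restricted={restricted} mean={mean} modes={modes} sd={sd:.4f}")
        print("  pmf:", {x: str(p) for x, p in pmf.items()})
        print("  cdf:", {x: str(p) for x, p in cdf(pmf).items()})
```

Under these assumptions the unrestricted PMF coincides with the binomial reference, while the restricted case does not: for k = n = 4 every admissible sequence is a permutation of the four letters, so the number of matches follows the fixed-point distribution of a random permutation (probabilities 3/8, 1/3, 1/4, 0 and 1/24 for 0 to 4 matches). Enumeration is used here only because it is easy to check; its cost grows as k^(2n), which is why closed-form formulas such as those described in the abstract matter for realistic sequence lengths.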



Author information

Authors and Affiliations

Department of Physics and Chemistry, Technical University of Cluj-Napoca, 103-105 Muncii Bvd., 400641, Cluj-Napoca, Romania
Lorentz Jäntschi
Institute for Doctoral Studies, Babeş-Bolyai University, 1st Mihail Kogalniceanu Street, 400084, Cluj-Napoca, Romania
Lorentz Jäntschi
University of Agricultural Science and Veterinary Medicine Cluj-Napoca, 3-5 Calea Mănăştur, 400372, Cluj-Napoca, Romania
Lorentz Jäntschi & Sorana D. Bolboacă
Department of Chemistry, The University of Oradea, 1st Universităţii Street, 410087, Oradea, Romania
Lorentz Jäntschi
Department of Medical Informatics and Biostatistics, Iuliu Haţieganu University of Medicine and Pharmacy, 6 Louis Pasteur, 400349, Cluj-Napoca, Romania
Sorana D. Bolboacă


Corresponding author

Correspondence to Sorana D. Bolboacă.


About this article

Cite this article

Jäntschi, L., Bolboacă, S.D. Distribution on Contingency of Alignment of Two Literal Sequences Under Constrains. Acta Biotheor 63, 55–69 (2015). https://doi.org/10.1007/s10441-014-9243-7


Received: 01 July 2014
Accepted: 05 December 2014
Published: 19 December 2014
Issue Date: March 2015
DOI: https://doi.org/10.1007/s10441-014-9243-7
