Acta Biotheoretica, Volume 63, Issue 1, pp 55–69

Distribution on Contingency of Alignment of Two Literal Sequences Under Constrains

Regular Article

DOI: 10.1007/s10441-014-9243-7

Cite this article as:
Jäntschi, L. & Bolboacă, S.D. Acta Biotheor (2015) 63: 55. doi:10.1007/s10441-014-9243-7

Abstract

The case of ungapped alignment of two literal sequences under constraints is considered. The analysis led to general formulas for the probability mass function and the cumulative distribution function for the general case of an alphabet with a chosen number of letters (e.g. 4 for deoxyribonucleic acid sequences). Formulas for three statistics (mean, mode, and standard deviation) were obtained. Distributions are depicted for three important particular cases: alignment of binary sequences, alignment of trinomial series (such as those coming from the generalized Kronecker delta), and alignment of genetic sequences (with four literals in the alphabet). The particular case in which each letter of the alphabet occurs at least once in both sequences has also been analyzed, and some statistics for this restricted case are given.
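To make the kind of quantity discussed in the abstract concrete, the following Python sketch enumerates all pairs of short sequences and tabulates how many positions agree when the two are aligned without gaps. It is not the paper's derivation: the abstract does not state the constraints or the exact definition of contingency, so the sketch assumes the simplest reading (all sequence pairs equally likely, no further constraint, statistic = number of matching positions). The function names `match_count_pmf` and `summarize` are hypothetical and chosen for illustration only.

```python
from fractions import Fraction
from itertools import product
from math import sqrt


def match_count_pmf(n, k):
    """Exact PMF of the number of matching positions for two length-n
    sequences over a k-letter alphabet, aligned without gaps.

    Assumption (not taken from the paper): every pair of sequences is
    equally likely and no additional constraint is imposed.
    """
    alphabet = range(k)
    counts = [0] * (n + 1)
    # Brute-force enumeration of all k**n * k**n sequence pairs.
    for a in product(alphabet, repeat=n):
        for b in product(alphabet, repeat=n):
            matches = sum(x == y for x, y in zip(a, b))
            counts[matches] += 1
    total = k ** (2 * n)
    return [Fraction(c, total) for c in counts]


def summarize(pmf):
    """Return (CDF, mean, mode, standard deviation) of a PMF given as a list."""
    cdf = []
    running = Fraction(0)
    for p in pmf:
        running += p
        cdf.append(running)
    mean = sum(m * p for m, p in enumerate(pmf))
    var = sum((m - mean) ** 2 * p for m, p in enumerate(pmf))
    # On ties, the smallest match count with maximal probability is returned.
    mode = max(range(len(pmf)), key=lambda m: pmf[m])
    return cdf, mean, mode, sqrt(var)


if __name__ == "__main__":
    # Alphabet sizes 2, 3 and 4 mirror the binary, trinomial and genetic cases.
    for k in (2, 3, 4):
        pmf = match_count_pmf(4, k)
        cdf, mean, mode, sd = summarize(pmf)
        print(f"k={k}: pmf={[str(p) for p in pmf]} mean={mean} mode={mode} sd={sd:.4f}")
```

Under these assumptions the match count is binomial with success probability 1/k, so the enumeration serves mainly as a check on small cases. The constrained and restricted cases treated in the article would need the paper's own formulas.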

Keywords

Alignment; Contingency matrix; Probability mass function (PMF); Cumulative distribution function (CDF)

Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  1. Department of Physics and Chemistry, Technical University of Cluj-Napoca, Cluj-Napoca, Romania
  2. Institute for Doctoral Studies, Babeş-Bolyai University, Cluj-Napoca, Romania
  3. University of Agricultural Science and Veterinary Medicine Cluj-Napoca, Cluj-Napoca, Romania
  4. Department of Chemistry, The University of Oradea, Oradea, Romania
  5. Department of Medical Informatics and Biostatistics, Iuliu Haţieganu University of Medicine and Pharmacy, Cluj-Napoca, Romania
