Scan Statistics in DNA and Protein Sequence Analysis
Scientists in fields ranging from evolution to medicine compare protein or DNA sequences from several biological sources. DNA (deoxyribonucleic acid) is a long molecule that carries the genetic code controlling biological processes. The DNA molecule most often consists of two strands of nucleotides, each nucleotide consisting of a deoxyribose residue, a phosphate group, and a nucleotide base. The four nucleotide bases (or bases for short) are denoted A, C, G, and T, corresponding to adenine, cytosine, guanine, and thymine. The deoxyribose residues, linked by phosphate bonds, form the backbone of each strand, like the string of a long necklace, with the bases as the attached beads. The two strands are linked by hydrogen bonds between pairs of bases: an A on one strand pairs with a T on the other, and a C on one strand pairs with a G on the other. Knowing the sequence of bases in one strand therefore determines the sequence of the complementary strand.
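The pairing rule (A with T, C with G) is what lets one strand determine the other. The minimal Python sketch below applies that rule base by base. The function names `complement` and `reverse_complement` are illustrative, not taken from the source. It also assumes the standard fact that the two strands are antiparallel, so reading the partner strand in its own conventional 5'→3' direction reverses the order of the complemented bases. Input is assumed to contain only A, C, G, and T; any other symbol raises a `KeyError`.

```python
# Base-pairing rules described in the text: A pairs with T, C pairs with G.
# Assumption: input uses only the four letters A, C, G, T (case-insensitive).
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base paired with each position of `strand`, in the same order."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the complementary strand read in its own 5'->3' direction.

    Because the two strands run antiparallel, this is the complement reversed.
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    s = "AACGT"
    print(complement(s))          # TTGCA
    print(reverse_complement(s))  # ACGTT
```

Complementing twice returns the original strand, which reflects the statement that either strand fully determines the other.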
Keywords: Perfect Match, Common Word, Independent Sequence, Protein Sequence Analysis, Poisson Approximation