String Mathematics, BLAST, and FASTA

  • Xuhua Xia


What is an E-value in ungapped and gapped BLAST? Which Karlin-Altschul parameters determine the E-value? How do nucleotide frequencies and match-mismatch scoring matrices affect these parameters? What are the key algorithms underlying FASTA and BLAST, and how do their differences affect the sensitivity of sequence searches? This chapter addresses these questions and illustrates applications of string matching in genomics, transcriptomics, proteomics, and drug discovery.
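The first questions above concern the E-value and the Karlin-Altschul parameters λ and K. As a hedged sketch (not the chapter's own implementation), the Python below uses the standard ungapped Karlin-Altschul relations: λ is the unique positive root of Σ p_i p_j exp(λ s_ij) = 1, and E = K m n exp(-λS) for a query of length m, a database of length n, and raw score S. It assumes a simple DNA scheme in which every match scores `match` and every mismatch scores `mismatch`. K is treated as a given input rather than derived, because its calculation is more involved. The function names `karlin_lambda`, `evalue`, and `bit_score` are hypothetical.

```python
import math


def karlin_lambda(freqs, match, mismatch, tol=1e-12):
    """Return the Karlin-Altschul lambda for a match/mismatch DNA scoring scheme.

    lambda is the unique positive root of
        sum_{i,j} p_i * p_j * exp(lambda * s_ij) = 1,
    where s_ij = match if i == j else mismatch.
    """
    if abs(sum(freqs.values()) - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    letters = list(freqs)

    def score(a, b):
        return match if a == b else mismatch

    # The theory requires a negative expected score and at least one positive score.
    expected = sum(freqs[a] * freqs[b] * score(a, b) for a in letters for b in letters)
    if expected >= 0 or match <= 0:
        raise ValueError("need expected score < 0 and match > 0")

    def f(lam):
        return sum(freqs[a] * freqs[b] * math.exp(lam * score(a, b))
                   for a in letters for b in letters) - 1.0

    # f(0) = 0, f < 0 just above 0, and f grows without bound: bracket the positive root.
    lo, hi = 0.0, 1.0
    while f(hi) <= 0:
        hi *= 2.0
    # Bisection: points below the root give f <= 0, points above it give f > 0.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def evalue(raw_score, lam, K, m, n):
    """Expected number of chance HSPs scoring >= raw_score: E = K*m*n*exp(-lambda*S)."""
    return K * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, K):
    """Normalized score S' = (lambda*S - ln K) / ln 2, so that E = m*n*2**(-S')."""
    return (lam * raw_score - math.log(K)) / math.log(2)


if __name__ == "__main__":
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    at_rich = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}
    for name, p in [("uniform", uniform), ("AT-rich", at_rich)]:
        print(name, round(karlin_lambda(p, 1, -1), 4))
```

With equal base frequencies and +1/-1 scoring, λ works out to exactly ln 3 (about 1.0986); the AT-rich composition gives a smaller λ (about 0.895), so, holding K fixed, the same raw score yields a larger E-value. This is one way nucleotide frequencies enter the calculation. The sketch covers only the ungapped case and ignores edge-effect (effective length) corrections; for gapped alignments λ and K have no closed form and are generally estimated by simulation.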


Copyright information

© Springer Science+Business Media LLC 2018

Authors and Affiliations

  • Xuhua Xia, University of Ottawa, CAREG and Biology Department, Ottawa, Canada
