Computational Techniques for a Comprehensive Understanding of Different Genotype-Phenotype Factors in Biological Systems and Their Applications

Synthetic Biology

Abstract

Identifying and analysing the discrete genotype and phenotype components of a biological system is a complex, multi-scale problem. Computational techniques are widely used to extract meaningful information from the underlying heterogeneous, raw biological data. Here, we review different approaches to mathematically modelling biological data at different levels of complexity. At the raw sequence level, we discuss techniques that model biological sequences based on frequency, geometry and spectral representations of nucleotides/amino acids. We also discuss techniques that capture quantitative patterns of codons and briefly present an application to identifying the evolutionary mechanisms that preserve codon usage patterns in trypanosomatids. Lastly, we discuss the integration of the genotype and phenotype components into a systems-level mathematical representation of functional interactions, in the form of a reconstructed genome-scale metabolic network, and discern its role in governing variations in metabolic behaviour under different environmental conditions.
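As a rough illustration of what frequency-based and geometric sequence representations can look like, the following minimal Python sketch computes k-mer frequencies and chaos game representation (CGR) coordinates for a nucleotide sequence. It is not taken from the chapter: the function names `kmer_frequencies` and `cgr_points`, the corner assignment in `CORNERS`, and the choice to skip ambiguous bases are all assumptions made for this example.

```python
from collections import Counter

# Assumed corner layout for the unit square; other conventions exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def kmer_frequencies(seq, k):
    """Return the relative frequency of each overlapping k-mer in seq."""
    seq = seq.upper()
    total = len(seq) - k + 1
    if total <= 0:
        return {}  # sequence shorter than k: no k-mers to count
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


def cgr_points(seq):
    """Return CGR coordinates: each base moves the point halfway to its corner."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # assumption: ambiguous symbols such as N are skipped
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


if __name__ == "__main__":
    example = "ACGTAC"  # toy sequence, not real data
    print(kmer_frequencies(example, 2))
    print(cgr_points(example))
```

Both functions map a sequence to numbers that can be compared without alignment: the k-mer dictionary is a frequency vector, and the CGR points can be binned into a grid to give an image-like signature of the sequence.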

Author information

Corresponding author

Correspondence to Ram Rup Sarkar.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Subramanian, A., Sarkar, R.R. (2018). Computational Techniques for a Comprehensive Understanding of Different Genotype-Phenotype Factors in Biological Systems and Their Applications. In: Singh, S. (eds) Synthetic Biology. Springer, Singapore. https://doi.org/10.1007/978-981-10-8693-9_8
