Abstract
Identification and analyses of the discrete genotype and phenotype components of a biological system are complex, multi-scale problems. Computational techniques are widely used to extract some meaningful information from the underlying heterogeneous, raw biological data. Here, we review the use of different approaches to mathematically model biological data at different levels of complexities. At the raw sequence level, we discuss about the various techniques that model biological sequences based on frequency, geometry and spectral representation of nucleotides/amino acids. We also discuss about techniques that can be used to capture quantitative patterns of codons and briefly present an application in identification of evolutionary mechanisms that preserve codon usage patterns in trypanosomatids. Lastly, we discuss about the integration of the genotype and phenotype components into a systems-level mathematical representation of functional interactions in the form of a reconstructed genome-scale metabolic network and discern its role in governing variations in metabolic behaviours under different environmental conditions.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Almeida JS, Carrico JA, Maretzek A et al (2001) Analysis of genomic sequences by Chaos Game Representation. Bioinformatics 17:429–437
Alonso G, Guevara P, Ramirez JL (1992) Trypanosomatidae codon usage and GC distribution. Mem Inst Oswaldo Cruz 87:517–523
Apweiler R, Martin MJ, O’Donovan C et al (2013) Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res 41:D43–D47
Arakawa K, Oshita K, Tomita M (2009) A web server for interactive and zoomable Chaos Game Representation images. Source Code Biol Med 4:6
Bairoch A (2000) The ENZYME database in 2000. Nucleic Acids Res 28:304–305
Benson DA, Karsch-Mizrachi I, Lipman DJ et al (2008) GenBank. Nucleic Acids Res 36:D25–D30
Benson DA, Karsch-Mizrachi I, Lipman DJ et al (2010) GenBank. Nucleic Acids Res 38:D46–D51. https://doi.org/10.1093/nar/gkp1024
Bielińska-Wąż D, Wąż P (2017) Spectral-dynamic representation of DNA sequences. J Biomed Inform 72:1–7
Borst P (1986) How proteins get into microbodies (peroxisomes, glyoxysomes, glycosomes). Biochim Biophys Acta (BBA)-Gene Struct Expr 866:179–203
Burge C, Karlin S (1997) Prediction of complete gene structures in human genomic DNA1. J Mol Biol 268:78–94
Consortium U et al (2014) Activities at the universal protein resource (UniProt). Nucleic Acids Res 42:D191–D198
Crick F (1970) Central dogma of molecular biology. Nature 227:561–563
Dai Q, Liu X-Q, Wang T-M, Vukicevic D (2007) Linear regression model of DNA sequences and its application. J Comput Chem 28:1434–1445
Dandekar T, Snel B, Huynen M, Bork P (1998) Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci 23:324–328
De Castro E, Sigrist CJA, Gattiker A et al (2006) ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res 34:W362–W365
Dror G, Sorek R, Shamir R (2005) Accurate identification of alternatively spliced exons using support vector machine. Bioinformatics 21:897–901. https://doi.org/10.1093/bioinformatics/bti132
Eddy SR (1996) Hidden markov models. Curr Opin Struct Biol 6:361–365
Eddy SR (2001) HMMER: profile hidden Markov models for biological sequence analysis
Emanuelsson O, Brunak S, Von Heijne G, Nielsen H (2007) Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc 2:953–971
Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA (1999) Protein interaction maps for complete genomes based on gene fusion events. Nature 402:86–90. https://doi.org/10.1038/47056
Fertil B, Massin M, Lespinats S et al (2005) GENSTYLE: exploration and analysis of DNA sequences with genomic signature. Nucleic Acids Res 33:W512–W515
Finn RD, Bateman A, Clements J et al (2013) Pfam: the protein families database. Nucleic Acids Res 42:D222–D230
Fridolin G, Green S (2017) The sum of the parts: large-scale modeling in systems biology. Philos Theory Biol 9:1–26
Gasteiger E, Hoogland C, Gattiker A et al (2005) Protein identification and analysis tools on the ExPASy server. In: Walker JM (ed) The proteomics protocols handbook. Humana press, Totowa, pp 571–607
Glunčić M, Paar V (2012) Direct mapping of symbolic DNA sequence into frequency domain in global repeat map algorithm. Nucleic Acids Res 41:e17–e17
Guerra-Giraldez C, Quijada L, Clayton CE (2002) Compartmentation of enzymes in a microbody, the glycosome, is essential in Trypanosoma brucei. J Cell Sci 115:2651–2658
Henry CS, Overbeek R, Xia F et al (2011) Connecting genotype to phenotype in the era of high-throughput sequencing. Biochim Biophys Acta (BBA) Gen Subj 1810:967–977
Hershberg R, Petrov DA (2008) Selection on codon bias. Annu Rev Genet 42:287–299
Hou W, Pan Q, He M (2016) A new graphical representation of protein sequences and its applications. Phys A Stat Mech Appl 444:996–1002
Huh W-K, Falvo JV, Gerke LC et al (2003) Global analysis of protein localization in budding yeast. Nature 425:686
Huynen M (2000) Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res 10:1204–1210. https://doi.org/10.1101/gr.10.8.1204
Jeffrey HJ (1990) Chaos game representation of gene structure. Nucleic Acids Res 18:2163–2170
Jensen LJ, Kuhn M, Stark M et al (2008) STRING 8—a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res 37:D412–D416
Joseph J, Sasikumar R (2006) Chaos game representation for comparison of whole genomes. BMC Bioinforma 7:243
Kanehisa M, Goto S, Sato Y et al (2013) Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res 42:D199–D205
Lin J (1991) Divergence measures based on the Shannon entropy. Inf Theory IEEE Trans 37:145–151
Lu YY, Tang K, Ren J et al (2017) CAFE: accelerated alignment-free sequence analysis. Nucleic Acids Res 45:W554–W559. https://doi.org/10.1093/nar/gkx351
Michels PAM (1988) Compartmentation of glycolysis in trypanosomes: a potential target for new trypanocidal drugs. Biol Cell 64:157–164
Misset O, Bos OJM, Opperdoes FR (1986) Glycolytic enzymes of Trypanosoma brucei. Eur J Biochem 157:441–453
Mostafavi S, Ray D, Warde-Farley D et al (2008) GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biol 9:S4
Mount DW (2004) Bioinformatics: sequence and genome analysis. Cold Spring Harbor Laboratory Press, New York
Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:443–453
Opperdoes FR, Szikora J-P (2006) In silico prediction of the glycosomal enzymes of Leishmania major and trypanosomes. Mol Biochem Parasitol 147:193–206
Orgogozo V, Morizot B, Martin A (2015) The differential view of genotype – phenotype relationships. Front Genet 6:179
Overbeek R, Fonstein M, D’Souza M et al (1999) The use of gene clusters to infer functional coupling. Proc Natl Acad Sci U S A 96:2896–2901
Pandit A, Sinha S (2010) Using genomic signatures for HIV-1 sub-typing. BMC Bioinforma 11:S26
Pellegrini M, Marcotte EM, Thompson MJ et al (1999) Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci U S A 96:4285–4288
Plotkin JB, Kudla G (2010) Synonymous but not the same: the causes and consequences of codon bias. Nat Rev Genet 12:32–42
Qi Z-H, Jin M-Z (2016) An intuitive graphical method for visualizing protein sequences based on linear regression and physicochemical properties. Match Commun Math Comput Chem 75:463–480
Qi J, Luo H, Hao B (2004) CVTree: a phylogenetic tree reconstruction tool based on whole genomes. Nucleic Acids Res 32:W45–W47
Qi X, Wu Q, Zhang Y et al (2011) A novel model for DNA sequence similarity analysis based on graph theory. Evol Bioinforma 7:EBO–S7364
Randić M, Vracko M, Nandy A, Basak SC (2000) On 3-D graphical representation of DNA primary sequences and their numerical characterization. J Chem Inf Comput Sci 40:1235–1244
Randić M, Vračko M, Lerš N, Plavšić D (2003a) Novel 2-D graphical representation of DNA sequences and their numerical characterization. Chem Phys Lett 368:1–6
Randić M, Vračko M, Lerš N, Plavšić D (2003b) Analysis of similarity/dissimilarity of DNA sequences based on novel 2-D graphical representation. Chem Phys Lett 371:202–207
Randić M, Zupan J, Balaban AT (2004) Unique graphical representation of protein sequences based on nucleotide triplet codons. Chem Phys Lett 397:247–252
Ren Q, Chen K, Paulsen IT (2006) TransportDB: a comprehensive database resource for cytoplasmic membrane transport systems and outer membrane channels. Nucleic Acids Res 35:D274–D279
Rice P, Longden I, Bleasby A et al (2000) EMBOSS: the European molecular biology open software suite. Trends Genet 16:276–277
Schomburg I, Chang A, Placzek S et al (2012) BRENDA in 2013: integrated reactions, kinetic data, enzyme function data, improved disease classification: new options and contents in BRENDA. Nucleic Acids Res 41:D764–D772 gks1049
Sharma D, Issac B, Raghava GPS, Ramaswamy R (2004) Spectral Repeat Finder (SRF): identification of repetitive sequences using Fourier transformation. Bioinformatics 20:1405–1412
Sharp PM, Li W-H (1987) The codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15:1281–1295
Smith TF, Waterman MS (1981) Comparison of biosequences. Adv Appl Math 2:482–489
Snoep JL, Westerhoff HV (2005) From isolation to integration, a systems biology approach for building the Silicon Cell. In: Alberghina L, Westerhoff HV (eds) Systems biology: definitions and perspectives. Springer Berlin Heidelberg, Berlin, pp 13–30
Stanke M, Waack S (2003) Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19:ii215–ii225
Subramanian A, Sarkar RR (2015) Comparison of codon usage bias across Leishmania and Trypanosomatids to understand mRNA secondary structure, relative protein abundance and pathway functions. Genomics 106:232–241
Subramanian A, Sarkar RR (2017) Revealing the mystery of metabolic adaptations using a genome scale model of Leishmania infantum. Sci Rep 7:10262. https://doi.org/10.1038/s41598-017-10743-x
Subramanian A, Jhawar J, Sarkar RR (2015) Dissecting Leishmania infantum energy metabolism – a systems perspective. PLoS One 10:e0137976. https://doi.org/10.1371/journal.pone.0137976
Sueoka N (1988) Directional mutation pressure and neutral molecular evolution. Proc Natl Acad Sci 85:2653–2657
Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680
Tsonis AA, Elsner JB, Tsonis PA (1991) Periodicity in DNA coding sequences: implications in gene evolution. J Theor Biol 151:323–331
Vinga S, Almeida J (2003) Alignment-free sequence comparison—a review. Bioinformatics 19:513–523
Wang Y, Hill K, Singh S, Kari L (2005) The spectrum of genomic signatures: from dinucleotides to chaos game representation. Gene 346:173–185
Wang S-Y, Tian F-C, Liu X, Wang J (2009a) A novel representation approach to DNA sequence and its application. IEEE Signal Process Lett 16:275–278
Wang S, Tian F, Feng W, Liu X (2009b) Applications of representation method for DNA sequences based on symbolic dynamics. J Mol Struct THEOCHEM 909:33–42
Wang Y, Tang H, DeBarry JD et al (2012) MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res 40:e49–e49
van Weelden SWH, van Hellemond JJ, Opperdoes FR, Tielens AGM (2005) New functions for parts of the Krebs cycle in procyclic Trypanosoma brucei, a cycle not operating as a cycle. J Biol Chem 280:12451–12460
Wolf YI, Rogozin IB, Kondrashov AS, Koonin EV (2001) Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context. Genome Res 11:356–372. https://doi.org/10.1101/gr.161901
Wright F (1990) The “effective number of codons” used in a gene. Gene 87:23–29
Yang ZR (2004) Biological applications of support vector machines. Brief Bioinform 5:328–338
Yang X, Wang T (2013) Linear regression model of short k-word: a similarity distance suitable for biological sequences with various lengths. J Theor Biol 337:61–70
Yao Y-H, Dai Q, Li C et al (2008) Analysis of similarity/dissimilarity of protein sequences. Protein Struct Funct Bioinforma 73:864–871
Yao Y, Yan S, Han J et al (2014) A novel descriptor of protein sequences and its application. J Theor Biol 347:109–117
Yuan C, Liao B, Wang T (2003) New 3D graphical representation of DNA sequences and their numerical characterization. Chem Phys Lett 379:412–417
Zielezinski A, Vinga S, Almeida J, Karlowski WM (2017) Alignment-free sequence comparison: benefits, applications, and tools. Genome Biol 18:186
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Subramanian, A., Sarkar, R.R. (2018). Computational Techniques for a Comprehensive Understanding of Different Genotype-Phenotype Factors in Biological Systems and Their Applications. In: Singh, S. (eds) Synthetic Biology. Springer, Singapore. https://doi.org/10.1007/978-981-10-8693-9_8
Download citation
DOI: https://doi.org/10.1007/978-981-10-8693-9_8
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8692-2
Online ISBN: 978-981-10-8693-9
eBook Packages: Biomedical and Life SciencesBiomedical and Life Sciences (R0)