Computational Techniques for a Comprehensive Understanding of Different Genotype-Phenotype Factors in Biological Systems and Their Applications

Subramanian, Abhishek; Sarkar, Ram Rup

doi:10.1007/978-981-10-8693-9_8

Abstract

Identifying and analysing the discrete genotype and phenotype components of a biological system is a complex, multi-scale problem. Computational techniques are widely used to extract meaningful information from the underlying heterogeneous, raw biological data. Here, we review the use of different approaches to mathematically model biological data at different levels of complexity. At the raw sequence level, we discuss various techniques that model biological sequences based on frequency, geometry and spectral representations of nucleotides/amino acids. We also discuss techniques that can be used to capture quantitative patterns of codons and briefly present an application in identifying evolutionary mechanisms that preserve codon usage patterns in trypanosomatids. Lastly, we discuss the integration of the genotype and phenotype components into a systems-level mathematical representation of functional interactions in the form of a reconstructed genome-scale metabolic network and discern its role in governing variations in metabolic behaviours under different environmental conditions.
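
As a rough illustration of the frequency-based and geometric sequence representations mentioned above, the following Python sketch computes a k-mer frequency vector and the points of a chaos game representation for a DNA string. It is a minimal example under stated assumptions, not the chapter's own implementation: the function names, the corner assignment for the four nucleotides and the handling of ambiguous symbols are all choices made here for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2):
    """Relative frequencies of all DNA k-mers, counted over overlapping windows."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing symbols other than A, C, G, T are left out of the total.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


# Assumed corner assignment on the unit square (a convention, chosen for illustration).
CGR_CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game_points(seq):
    """Chaos game representation: each step moves halfway towards the base's corner."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        if base not in CGR_CORNERS:
            continue  # assumption: ambiguous symbols such as N are skipped
        cx, cy = CGR_CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points
```

Two sequences can then be compared without alignment, for example by taking a distance between their k-mer frequency vectors, while the chaos game points give a two-dimensional picture in which recurring subsequences cluster in the same sub-squares.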

Author information

Authors and Affiliations

Chemical Engineering and Process Development, CSIR-National Chemical Laboratory, Pune, Maharashtra, India
Abhishek Subramanian & Ram Rup Sarkar
Academy of Scientific & Innovative Research (AcSIR), CSIR-NCL Campus, Pune, India
Abhishek Subramanian & Ram Rup Sarkar

Corresponding author

Correspondence to Ram Rup Sarkar.

Editor information

Editors and Affiliations

Department of Pathogenesis and Cellular Response, National Centre for Cell Science, Computational and Systems Biology Lab, Pune, Maharashtra, India
Shailza Singh

About this chapter

Cite this chapter

Subramanian, A., Sarkar, R.R. (2018). Computational Techniques for a Comprehensive Understanding of Different Genotype-Phenotype Factors in Biological Systems and Their Applications. In: Singh, S. (ed) Synthetic Biology. Springer, Singapore. https://doi.org/10.1007/978-981-10-8693-9_8

DOI: https://doi.org/10.1007/978-981-10-8693-9_8
Published: 02 October 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8692-2
Online ISBN: 978-981-10-8693-9
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)
