Linking Genome-Scale Metabolic Modeling and Genome Annotation

  • Protocol
Systems Metabolic Engineering

Part of the book series: Methods in Molecular Biology (MIMB, volume 985)

Abstract

Genome-scale metabolic network reconstructions, assembled from annotated genomes, serve as a platform for integrating data from heterogeneous sources and generating hypotheses for further experimental validation. Implementing constraint-based modeling techniques such as flux balance analysis (FBA) on network reconstructions allows for interrogating metabolism at a systems level, which aids in identifying and rectifying gaps in knowledge. With genome sequences for various organisms from prokaryotes to eukaryotes becoming increasingly available, a significant bottleneck lies in the structural and functional annotation of these sequences. Using topologically based and biologically inspired metabolic network refinement, we can better characterize enzymatic functions present in an organism and link annotation of these functions to candidate transcripts; both steps can be experimentally validated.
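
To make the idea of topology-based gap identification concrete, here is a minimal sketch, not the authors' protocol. It scans a toy network for dead-end metabolites: compounds that are produced but never consumed, or consumed but never produced. Such metabolites often point to a missing reaction and hence to a candidate enzyme annotation. The reaction names, metabolite names, and the function `find_dead_ends` are hypothetical, and every reaction is assumed to be irreversible for simplicity.

```python
from collections import defaultdict

# Hypothetical toy network: reaction id -> {metabolite: stoichiometric coefficient}.
# Negative coefficients are consumed, positive are produced. All names are illustrative.
REACTIONS = {
    "EX_A": {"A": 1},          # uptake of A from the medium
    "R1": {"A": -1, "B": 1},
    "R2": {"B": -1, "C": 1},
    "R3": {"B": -1, "D": 1},   # D is produced, but no reaction consumes it
    "EX_C": {"C": -1},         # secretion of C
}


def find_dead_ends(reactions):
    """Return metabolites lacking a consumer or a producer (irreversible reactions assumed)."""
    producers = defaultdict(set)
    consumers = defaultdict(set)
    for rxn, stoich in reactions.items():
        for met, coeff in stoich.items():
            if coeff > 0:
                producers[met].add(rxn)
            elif coeff < 0:
                consumers[met].add(rxn)
    mets = set(producers) | set(consumers)
    return {
        "never_consumed": sorted(m for m in mets if not consumers[m]),
        "never_produced": sorted(m for m in mets if not producers[m]),
    }


dead_ends = find_dead_ends(REACTIONS)
print(dead_ends)
```

In this toy network, metabolite D is flagged as never consumed, which would prompt a search for a reaction (and a gene) that uses it. A real analysis would also account for reversibility, compartments, and exchange reactions, and would typically be combined with FBA-based checks.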

Author information

Corresponding author

Correspondence to Jason A. Papin.

Copyright information

© 2013 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Blais, E.M., Chavali, A.K., Papin, J.A. (2013). Linking Genome-Scale Metabolic Modeling and Genome Annotation. In: Alper, H. (eds) Systems Metabolic Engineering. Methods in Molecular Biology, vol 985. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-299-5_4

  • DOI: https://doi.org/10.1007/978-1-62703-299-5_4

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-62703-298-8

  • Online ISBN: 978-1-62703-299-5

  • eBook Packages: Springer Protocols
