Integrated Analysis of Transcriptomic and Proteomic Datasets Reveals Information on Protein Expressivity and Factors Affecting Translational Efficiency

  • Jiangxin Wang
  • Gang Wu
  • Lei Chen
  • Weiwen Zhang
Part of the Methods in Molecular Biology book series (MIMB, volume 1375)


Integrated analysis of large-scale transcriptomic and proteomic data can provide important insights into the metabolic mechanisms underlying complex biological systems. In this chapter, we present methods that address two issues in integrated transcriptomic and proteomic analysis. First, proteomic datasets are often incomplete, and integrated analysis of partial proteomic data may introduce significant bias. To address this issue, we describe a zero-inflated Poisson (ZIP)-based model that captures the complicated relationship between protein abundance and mRNA expression level, and then apply it to predict the abundance of proteins that were not experimentally detected. The ZIP model accounts for undetected proteins by assuming a probability mass at zero that represents expressed proteins left undetected owing to technical limitations. The validity of the model is demonstrated using biological information on operons, regulons, and pathways. Second, weak correlation between transcriptomic and proteomic datasets is often due to biological factors affecting translation. To quantify the effects of these factors, we describe a multiple regression-based statistical framework that examines the effects of various sequence features related to translational efficiency on mRNA–protein correlation. Applied to datasets from the sulfate-reducing bacterium Desulfovibrio vulgaris, the analysis shows that translation-related sequence features can contribute up to 15.2–26.2% of the total variation in the correlation between transcriptomic and proteomic datasets, and it also reveals the relative importance of the various features in the translation process.
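The ZIP idea can be illustrated with a short standard-library sketch. It is not the chapter's implementation and rests on several assumptions: protein abundance is treated as a count, log λ_i = b0 + b1·x_i depends on a single covariate x_i (a hypothetical log mRNA level), the zero-inflation probability π is one constant, and the data are simulated. Under these assumptions P(y = 0) = π + (1 − π)·exp(−λ) and P(y = k) = (1 − π)·exp(−λ)·λ^k / k! for k ≥ 1. Parameters are fitted by EM, and for an undetected protein the Poisson mean λ_i is returned as its predicted abundance if expressed. The function names (`simulate`, `fit_zip`, `predict_expressed_abundance`) are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the moderate means used in this sketch (roughly lam < 100).
    """
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate(n, b0, b1, pi, seed=0):
    """Simulate hypothetical (log mRNA, protein count) pairs from a ZIP model."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 3.0) for _ in range(n)]
    # With probability pi the protein is "undetected" (structural zero).
    y = [0 if rng.random() < pi else sample_poisson(math.exp(b0 + b1 * xi), rng)
         for xi in x]
    return x, y


def fit_zip(x, y, em_iters=200, newton_steps=5, tol=1e-8):
    """Fit log(lambda_i) = b0 + b1 * x_i with a constant pi by EM (assumed form)."""
    n = len(y)
    pi = sum(1 for v in y if v == 0) / n            # start: observed zero fraction
    b0, b1 = math.log(max(sum(y) / n, 1e-6)), 0.0   # start: intercept-only fit
    for _ in range(em_iters):
        # E-step: posterior probability that each observed zero is a structural
        # zero, i.e. an expressed protein that went undetected.
        lam = [math.exp(b0 + b1 * xi) for xi in x]
        w = [pi / (pi + (1.0 - pi) * math.exp(-li)) if yi == 0 else 0.0
             for li, yi in zip(lam, y)]
        new_pi = sum(w) / n
        # M-step for (b0, b1): Newton-Raphson on the Poisson log-likelihood,
        # with each observation weighted by 1 - w_i.
        for _ in range(newton_steps):
            lam = [math.exp(b0 + b1 * xi) for xi in x]
            g0 = g1 = h00 = h01 = h11 = 0.0
            for xi, yi, wi, li in zip(x, y, w, lam):
                c = 1.0 - wi
                g0 += c * (yi - li)          # gradient
                g1 += c * (yi - li) * xi
                h00 += c * li                # negative Hessian
                h01 += c * li * xi
                h11 += c * li * xi * xi
            det = h00 * h11 - h01 * h01
            d0 = (h11 * g0 - h01 * g1) / det
            d1 = (h00 * g1 - h01 * g0) / det
            b0, b1 = b0 + d0, b1 + d1
        done = abs(new_pi - pi) < tol and abs(d0) + abs(d1) < tol
        pi = new_pi
        if done:
            break
    return b0, b1, pi


def predict_expressed_abundance(x_new, b0, b1):
    """Predicted abundance of an undetected (assumed expressed) protein: lambda."""
    return [math.exp(b0 + b1 * xi) for xi in x_new]


if __name__ == "__main__":
    x, y = simulate(1000, b0=0.5, b1=1.0, pi=0.3, seed=1)
    b0, b1, pi = fit_zip(x, y)
    print(f"b0={b0:.3f} b1={b1:.3f} pi={pi:.3f}")
    print(predict_expressed_abundance([1.0, 2.0], b0, b1))
```

With real data, x would hold each gene's measured log mRNA level and y its detected protein abundance. Lambert's original ZIP formulation also lets π depend on covariates through a logit link, which this constant-π sketch omits.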


Keywords: Transcriptome; Proteome; Correlation; Zero-inflated Poisson regression; Prediction; Undetected proteins; Translation; Sequence features
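The multiple-regression analysis summarized in the abstract can also be sketched. One simple way to express how much translation-related sequence features contribute, assumed here rather than taken from the chapter, is the increase in R² when those features are added to an ordinary least-squares regression of log protein abundance on log mRNA level. The data are simulated, the two features (`codon_usage`, `sd_pairing`) are hypothetical stand-ins for sequence features such as codon usage or Shine-Dalgarno pairing strength, and the helper functions are illustrative rather than the authors' code.

```python
import random


def ols_r2(columns, y):
    """R^2 of an OLS fit of y on an intercept plus the given predictor columns.

    Solves the normal equations (X^T X) beta = X^T y by Gauss-Jordan
    elimination with partial pivoting.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yk for row, yk in zip(X, y))]
         for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def feature_contribution(log_mrna, features, log_protein):
    """R^2 with mRNA only, R^2 with mRNA plus features, and their difference."""
    r2_mrna = ols_r2([log_mrna], log_protein)
    r2_full = ols_r2([log_mrna] + list(features), log_protein)
    return r2_mrna, r2_full, r2_full - r2_mrna


def simulate(n, seed=0):
    """Hypothetical data: log mRNA plus two made-up translation-related features."""
    rng = random.Random(seed)
    log_mrna = [rng.gauss(0.0, 1.0) for _ in range(n)]
    codon_usage = [rng.gauss(0.0, 1.0) for _ in range(n)]
    sd_pairing = [rng.gauss(0.0, 1.0) for _ in range(n)]
    log_protein = [0.6 * m + 0.4 * cu + 0.3 * sd + rng.gauss(0.0, 0.7)
                   for m, cu, sd in zip(log_mrna, codon_usage, sd_pairing)]
    return log_mrna, [codon_usage, sd_pairing], log_protein


if __name__ == "__main__":
    m, feats, prot = simulate(2000, seed=3)
    r2_mrna, r2_full, gain = feature_contribution(m, feats, prot)
    print(f"R2 mRNA only={r2_mrna:.3f}, with features={r2_full:.3f}, gain={gain:.3f}")
```

Because the OLS R² never decreases when predictors are added, on real data the increment is better judged alongside an adjusted R² or a held-out evaluation.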



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Jiangxin Wang (1, 2, 3)
  • Gang Wu (4)
  • Lei Chen (1, 2, 3)
  • Weiwen Zhang (1, 2, 3; corresponding author)
  1. Laboratory of Synthetic Microbiology, School of Chemical Engineering and Technology, Tianjin University, Tianjin, People’s Republic of China
  2. Key Laboratory of Systems Bioengineering, Ministry of Education of China, Tianjin, People’s Republic of China
  3. Collaborative Innovation Center of Chemical Science and Engineering, Tianjin, People’s Republic of China
  4. University of Maryland at Baltimore County, Baltimore County, USA
