Statistics in Biosciences, 1:228

Bayesian Analysis of iTRAQ Data with Nonrandom Missingness: Identification of Differentially Expressed Proteins

  • Ruiyan Luo
  • Christopher M. Colangelo
  • William C. Sessa
  • Hongyu Zhao


iTRAQ (isobaric Tags for Relative and Absolute Quantitation) is a technique that allows simultaneous quantitation of proteins in multiple samples. In this paper, we describe a Bayesian hierarchical model-based method to infer relative protein expression levels and hence to identify differentially expressed proteins from iTRAQ data. Our model assumes that the measured peptide intensities are affected by both protein expression levels and peptide-specific effects. The values of these two effects across experiments are modeled as random effects. The nonrandom missingness of peptide data is modeled with a logistic regression that relates the missingness probability of a peptide to the expression level of the protein that produces it. We propose a Markov chain Monte Carlo (MCMC) method for inference on the model parameters, including the relative expression levels across samples. Our simulation results suggest that estimates of relative protein expression levels based on the MCMC samples have smaller bias than those obtained from ANOVA models or fold changes. We apply our method to an iTRAQ dataset from a study of the roles of caveolae in postnatal cardiovascular function.
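The data-generating structure described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' model or code: the additive form on the log scale, the Gaussian effects, every parameter value, the negative sign of the missingness slope, and all function and variable names (`simulate_protein`, `observed_fraction`, `miss_intercept`, `miss_slope`) are hypothetical choices made only for illustration.

```python
import math
import random


def simulate_protein(protein_levels, n_peptides, rng,
                     peptide_sd=0.5, noise_sd=0.2,
                     miss_intercept=0.0, miss_slope=-1.0):
    """Simulate log-intensities of one protein's peptides (illustrative only).

    protein_levels: assumed log expression level of the protein in each sample.
    Returns one row per peptide; each entry is a float or None (missing).
    """
    rows = []
    for _ in range(n_peptides):
        # Peptide-specific random effect, shared across samples (assumption).
        peptide_effect = rng.gauss(0.0, peptide_sd)
        row = []
        for level in protein_levels:
            # Logistic link: the missingness probability depends on the
            # protein expression level (assumed slope < 0, so low-abundance
            # proteins are missing more often).
            p_missing = 1.0 / (1.0 + math.exp(-(miss_intercept + miss_slope * level)))
            if rng.random() < p_missing:
                row.append(None)
            else:
                row.append(level + peptide_effect + rng.gauss(0.0, noise_sd))
        rows.append(row)
    return rows


def observed_fraction(rows, sample_index):
    """Fraction of peptides with a non-missing value in one sample."""
    values = [row[sample_index] for row in rows]
    return sum(v is not None for v in values) / len(values)


rng = random.Random(0)
rows = simulate_protein([-1.0, 2.0], n_peptides=2000, rng=rng)
low_obs = observed_fraction(rows, 0)   # sample where the protein is scarce
high_obs = observed_fraction(rows, 1)  # sample where the protein is abundant
print(f"observed fraction: low={low_obs:.2f}, high={high_obs:.2f}")
```

Under these assumed settings the low-expression sample loses far more peptide measurements than the high-expression one, so a summary computed only from observed values ignores information carried by the missingness pattern. That is the motivation for modeling missingness jointly with the intensities.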


Bayesian hierarchical model, iTRAQ, Mixed-effects model, Nonignorable missing, Protein quantitation



Copyright information

© International Chinese Statistical Association 2009

Authors and Affiliations

  • Ruiyan Luo (1)
  • Christopher M. Colangelo (2)
  • William C. Sessa (3)
  • Hongyu Zhao (1)

  1. Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, USA
  2. W.M. Keck Foundation, Biotechnology Resource Laboratory, Yale University School of Medicine, New Haven, USA
  3. Department of Pharmacology, Yale University School of Medicine, New Haven, USA
