Sankhya B, 73:123

Accounting for spot matching uncertainty in the analysis of proteomics data from two-dimensional gel electrophoresis



Two-dimensional gel electrophoresis is a biochemical technique that combines isoelectric focusing and SDS-polyacrylamide gel technology to separate protein mixtures simultaneously by isoelectric point and molecular weight. Upon staining, each protein on a gel can be characterized by an intensity measurement that reflects its abundance in the mixture. These intensities can then, in principle, be used to determine which proteins are differentially expressed under different experimental conditions. We propose an EM approach to identify differentially expressed proteins using an inferential strategy that accounts for uncertainty in matching spots to proteins across gels. The underlying mixture model has trivariate Gaussian components. Applying the EM algorithm is not straightforward, however: the main difficulty lies in the E-step calculations, because proteins within each gel are dependent. The usual model-based clustering approach is therefore inapplicable, and an MCMC approach is employed instead. Through data-based simulation, we demonstrate that our proposed method effectively accounts for uncertainty in spot matching and distinguishes differentially from non-differentially expressed proteins more successfully than a naïve t-test that ignores this uncertainty.
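
As a point of reference, the naïve baseline mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes spot matching is taken as known, it uses Welch's unequal-variance form of the t statistic (the abstract does not say which variant was used), and the protein names and log-intensity values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a), variance(group_b)  # sample variances
    se2 = va / na + vb / nb                        # squared standard error
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical log-intensities: (condition 1 gels, condition 2 gels).
# The naive analysis assumes every spot is matched to the right protein.
intensities = {
    "protein_1": ([10.1, 10.3, 9.9, 10.2], [11.0, 11.2, 10.9, 11.1]),
    "protein_2": ([8.0, 8.4, 7.9, 8.1], [8.1, 8.0, 8.3, 8.2]),
}

# One test per protein, then rank proteins by evidence of differential expression.
results = {name: welch_t(a, b) for name, (a, b) in intensities.items()}
ranked = sorted(results, key=lambda name: abs(results[name][0]), reverse=True)

for name in ranked:
    t_stat, df = results[name]
    print(f"{name}: t = {t_stat:.2f}, df = {df:.1f}")
```

A mismatched spot would put the wrong intensity into one of these lists without any warning, which is the weakness the proposed method is designed to address.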


Conditional point process simulation; Isoelectric points; Molecular weights; ROC curves; Observed information matrix; EM algorithm; Markov chain Monte Carlo; Gaussian mixture model



Copyright information

© Indian Statistical Institute 2011

Authors and Affiliations

  • Volodymyr Melnykov (1)
  • Ranjan Maitra (2)
  • Dan Nettleton (2)
  1. Department of Statistics, North Dakota State University, Fargo, USA
  2. Department of Statistics, Iowa State University, Ames, USA
