Abstract
Two-dimensional gel electrophoresis is a biochemical technique that combines isoelectric focusing and SDS-polyacrylamide gel electrophoresis to separate the proteins in a mixture simultaneously by isoelectric point and molecular weight. Upon staining, each protein spot on a gel can be characterized by an intensity measurement that reflects the protein's abundance in the mixture. In principle, these intensities can then be used to determine which proteins are differentially expressed under different experimental conditions. We propose an EM approach to identify differentially expressed proteins using an inferential strategy that accounts for uncertainty in matching spots to proteins across gels. The underlying mixture model has trivariate Gaussian components. Application of the EM algorithm is, however, not straightforward. The main difficulty lies in the E-step calculations, because of the dependence structure of proteins within each gel. The usual model-based clustering approach is therefore inapplicable, and an MCMC approach is employed instead. Through data-based simulation, we demonstrate that the proposed method effectively accounts for uncertainty in spot matching and distinguishes differentially from non-differentially expressed proteins more successfully than a naïve t-test that ignores this uncertainty.
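The source of the E-step difficulty can be illustrated with a deliberately simplified sketch. It is not the authors' model. It assumes, hypothetically, that every gel shows exactly one spot per protein, so the spot-to-protein assignment within a gel is a permutation rather than a set of independent labels, and that each spot is a 3-vector (for example, two gel coordinates and a log intensity) drawn from a trivariate Gaussian with a protein-specific mean and a shared diagonal covariance. Under that constraint an exact E-step would sum over all permutations, so the sketch replaces it with an average over draws from a Metropolis sampler that proposes swapping the proteins of two spots, in the spirit of Monte Carlo EM. The function names `log_density`, `sample_matchings` and `mcem`, and all tuning defaults, are illustrative choices.

```python
import math
import random


def log_density(x, mu, var):
    """Log density of a trivariate Gaussian with mean mu and diagonal covariance diag(var)."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, mu, var)
    )


def sample_matchings(gel, mu, var, perm, n_iter, rng):
    """Metropolis sampler over one-to-one spot-to-protein matchings within one gel.

    perm[i] is the protein assigned to spot i. Each proposal swaps the proteins of
    two spots, so every draw stays a permutation (the within-gel dependence).
    Returns the final state and the list of all draws. Assumes at least two spots.
    """
    perm = list(perm)
    draws = []
    for _ in range(n_iter):
        i, j = rng.sample(range(len(gel)), 2)
        a, b = perm[i], perm[j]
        # Change in log-likelihood if spot i takes protein b and spot j takes protein a.
        delta = (log_density(gel[i], mu[b], var) + log_density(gel[j], mu[a], var)
                 - log_density(gel[i], mu[a], var) - log_density(gel[j], mu[b], var))
        if delta >= 0 or rng.random() < math.exp(delta):
            perm[i], perm[j] = b, a
        draws.append(list(perm))
    return perm, draws


def mcem(gels, n_em=20, n_mcmc=300, burn_in=100, seed=0):
    """Monte Carlo EM: MCMC-approximated E-step, closed-form Gaussian M-step.

    Hypothetical setup: every gel holds the same number of spots, one per protein.
    """
    if not 0 <= burn_in < n_mcmc:
        raise ValueError("need 0 <= burn_in < n_mcmc")
    rng = random.Random(seed)
    k, dim = len(gels[0]), len(gels[0][0])
    mu = [list(spot) for spot in gels[0]]  # start protein means at the first gel's spots
    var = [1.0] * dim
    perms = [list(range(k)) for _ in gels]  # current matching for each gel
    for _ in range(n_em):
        # E-step: sample matchings for each gel and keep the post-burn-in draws.
        kept = []
        for g, gel in enumerate(gels):
            perms[g], draws = sample_matchings(gel, mu, var, perms[g], n_mcmc, rng)
            kept.append(draws[burn_in:])
        # M-step for the means: average the spots matched to each protein over all draws.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for gel, gel_draws in zip(gels, kept):
            for perm in gel_draws:
                for i, p in enumerate(perm):
                    counts[p] += 1
                    for d in range(dim):
                        sums[p][d] += gel[i][d]
        mu = [[s / counts[p] for s in sums[p]] for p in range(k)]
        # M-step for the shared diagonal variances, given the updated means.
        sq, total = [0.0] * dim, 0
        for gel, gel_draws in zip(gels, kept):
            for perm in gel_draws:
                for i, p in enumerate(perm):
                    total += 1
                    for d in range(dim):
                        sq[d] += (gel[i][d] - mu[p][d]) ** 2
        var = [max(s / total, 1e-6) for s in sq]  # floor guards against degenerate fits
    return mu, var
```

Because a swap proposal is symmetric and every permutation carries equal prior weight, the acceptance ratio reduces to the likelihood ratio. Protein labels are fixed by the spot order in the first gel, which also supplies the starting means. A realistic analysis would also need, among other things, parameters for differential expression between conditions, which this sketch omits.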
Acknowledgements
The authors acknowledge partial support from the National Science Foundation through awards NSF CAREER DMS-0437555, NSF IOS-0236060, and NSF DMS-0502347.
Cite this article
Melnykov, V., Maitra, R. & Nettleton, D. Accounting for spot matching uncertainty in the analysis of proteomics data from two-dimensional gel electrophoresis. Sankhya B 73, 123–143 (2011). https://doi.org/10.1007/s13571-011-0016-x