The principle of exhaustiveness versus the principle of parsimony: a new approach for the identification of biomarkers from proteomic spot volume datasets based on principal component analysis
Biomarker discovery is one of the leading research areas in proteomics. One of the most widely used approaches identifies potential biomarkers from spot volume datasets produced by 2D gel electrophoresis. Problems may arise here because of the large number of spots present in each map and the small number of maps available for each class (control/pathological). Multivariate methods are therefore usually applied together with variable selection procedures to provide a subset of potential candidates. The available variable selection procedures usually pursue the so-called principle of parsimony: the smallest set of spots that provides the best classification performance is selected. This approach is not effective in proteomics, since all potential biomarkers must be identified: not only the most discriminating spots, which are usually related to general responses to inflammatory events, but also the smallest differences and all redundant molecules, i.e. biomarkers showing similar behaviour. The principle of exhaustiveness should therefore be pursued rather than parsimony. To this end, a new ranking and classification method, “Ranking-PCA”, based on principal component analysis and forward-search variable selection, is proposed here for the exhaustive identification of all possible biomarkers. The method is successfully applied to three different proteomic datasets to prove its effectiveness.
Keywords: Exhaustiveness; Biomarker discovery; Ranking PCA; Variable selection; 2D gel electrophoresis; Classification methods
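The abstract gives only the outline of the method (principal component analysis combined with forward-search variable selection, continued until every candidate is ranked), so the following is a minimal sketch of that general idea and not the authors' published algorithm. The function names `pc1_separation` and `forward_pca_ranking`, the two-class 0/1 coding of `y`, and the choice of class separation along the first principal component as the selection criterion are all assumptions made for illustration.

```python
import numpy as np


def pc1_separation(X, y):
    """Standardised distance between two classes along the first PC of X.

    Assumption: y holds 0 (control) and 1 (pathological), with at least
    two samples per class. This criterion is illustrative only.
    """
    Xc = X - X.mean(axis=0)  # mean-centre each variable (spot)
    # Thin SVD of the centred data; PC1 scores are U[:, 0] * s[0]
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, 0] * s[0]
    a, b = scores[y == 0], scores[y == 1]
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)
    # The sign of a principal component is arbitrary, hence abs()
    return abs(a.mean() - b.mean()) / (pooled_sd + 1e-12)


def forward_pca_ranking(X, y):
    """Rank ALL variables by forward search (hypothetical helper).

    At each step, the variable whose addition gives the best PC1 class
    separation is appended. The search does not stop at a parsimonious
    subset: it continues until every variable has a rank.
    """
    remaining = list(range(X.shape[1]))
    ranked = []
    while remaining:
        best = max(
            remaining,
            key=lambda j: pc1_separation(X[:, ranked + [j]], y),
        )
        ranked.append(best)
        remaining.remove(best)
    return ranked
```

Calling `forward_pca_ranking(X, y)` on a samples-by-spots matrix returns a full ordering of the spot indices. Keeping the complete ranking, instead of truncating it at the smallest well-classifying subset, is what distinguishes an exhaustive search from a parsimonious one in this sketch; redundant spots stay in the list and can be inspected further down the ranking.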
The authors gratefully acknowledge the collaboration of Prof Pier Giorgio Righetti (Polytechnic of Milan, Italy) and Dr Daniela Cecconi (University of Verona, Italy) who provided the proteomic datasets used in this study.