Multiprobabilistic prediction in early medical diagnoses
This paper describes a methodology for producing multiprobability predictions for proteomic mass spectrometry data. The methodology is based on a newly developed machine learning framework called Venn machines, which output a valid probability interval. For demonstrative purposes, we applied the methodology to MALDI-TOF data sets in order to predict the diagnosis of heart disease and the early diagnosis of ovarian cancer and breast cancer. The experiments showed that the probability intervals are narrow, that is, the output of the multiprobability predictor is similar to a single probability distribution. In addition, the probability intervals produced for the heart disease and ovarian cancer data were more accurate than the output of the corresponding probability predictor. When Venn machines were forced to make point predictions, the accuracy of these predictions was, for most of the data, better than that of the underlying algorithm, which outputs a single probability distribution over labels. Applying the methodology to the MALDI-TOF data sets empirically demonstrates its validity. The accuracy of the proposed method on the ovarian cancer data rises from 66.7 % at 11 months before the moment of diagnosis to up to 90.2 % at the moment of diagnosis. The same approach was applied to heart disease data without time dependency, although the achieved accuracy was not as high (up to 69.9 %). The methodology also allowed us to confirm mass spectrometry peaks previously identified as carrying statistically significant information for discrimination between controls and cases.
Keywords: Confident prediction, Probabilistic prediction, Risk, Diagnostic
Mathematics Subject Classifications (2010): 68T05, 68Q32
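To make the idea of a multiprobability prediction concrete, the following is a minimal sketch of a Venn predictor. It is not the implementation used in the paper: the nearest-neighbour taxonomy, the function names (`nn_category`, `venn_predict`, `intervals`, `point_prediction`), and the one-dimensional toy data are all assumptions chosen for illustration. For each hypothesised label of the new example, the sketch assigns every example to a category, takes the label frequencies in the new example's category as one probability distribution, and then reports the range of those distributions as a probability interval.

```python
from math import dist


def nn_category(i, objects, labels):
    """Assumed taxonomy: the category of example i is the label of its
    nearest neighbour among all other examples."""
    j = min((k for k in range(len(objects)) if k != i),
            key=lambda k: dist(objects[i], objects[k]))
    return labels[j]


def venn_predict(train_x, train_y, new_x, label_set):
    """Return one label distribution per hypothesised label of new_x."""
    objects = list(train_x) + [new_x]
    rows = {}
    for hyp in label_set:
        # Tentatively give the new example the hypothesised label.
        labels = list(train_y) + [hyp]
        cats = [nn_category(i, objects, labels) for i in range(len(objects))]
        # Empirical label frequencies in the new example's category
        # (the new example itself is counted with its hypothesised label).
        members = [labels[i] for i in range(len(objects)) if cats[i] == cats[-1]]
        rows[hyp] = {y: members.count(y) / len(members) for y in label_set}
    return rows


def intervals(rows, label_set):
    """Lower and upper probability of each label across all hypotheses."""
    return {y: (min(r[y] for r in rows.values()),
                max(r[y] for r in rows.values()))
            for y in label_set}


def point_prediction(rows, label_set):
    """Forced point prediction: the label with the largest lower probability."""
    iv = intervals(rows, label_set)
    best = max(label_set, key=lambda y: iv[y][0])
    return best, iv[best]


# Toy one-feature data, invented for illustration only.
train_x = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,)]
train_y = ["control", "control", "control", "case", "case", "case"]
label_set = ["control", "case"]

rows = venn_predict(train_x, train_y, (1.04,), label_set)
print(intervals(rows, label_set))
print(point_prediction(rows, label_set))
```

With only six training examples the interval is wide; the abstract's observation that intervals are narrow refers to the much larger MALDI-TOF data sets, where each category contains many examples and a single hypothesised label changes the frequencies only slightly.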