Multiprobabilistic prediction in early medical diagnoses

  • Ilia Nouretdinov (email author)
  • Dmitry Devetyarov
  • Volodya Vovk
  • Brian Burford
  • Stephane Camuzeaux
  • Aleksandra Gentry-Maharaj
  • Ali Tiss
  • Celia Smith
  • Zhiyuan Luo
  • Alexey Chervonenkis
  • Rachel Hallett
  • Mike Waterfield
  • Rainer Cramer
  • John F. Timms
  • Ian Jacobs
  • Usha Menon
  • Alex Gammerman
Article

Abstract

This paper describes a methodology for providing multiprobability predictions for proteomic mass spectrometry data. The methodology is based on a newly developed machine learning framework called Venn machines, which output a valid probability interval for each prediction. For demonstrative purposes, we applied this methodology to MALDI-TOF data sets in order to predict the diagnosis of heart disease and the early diagnosis of ovarian cancer and breast cancer. The experiments showed that the probability intervals are narrow, that is, the output of the multiprobability predictor is similar to a single probability distribution. In addition, the probability intervals produced for the heart disease and ovarian cancer data were more accurate than the output of the corresponding probability predictor. When Venn machines were forced to make point predictions, the accuracy of these predictions was, for most of the data, better than the accuracy of the underlying algorithm, which outputs a single probability distribution over the label. Application of this methodology to MALDI-TOF data sets empirically demonstrates its validity. The accuracy of the proposed method on ovarian cancer data rises from 66.7 % at 11 months before the moment of diagnosis to up to 90.2 % at the moment of diagnosis. The same approach was applied to heart disease data, which have no time dependency, although the achieved accuracy was not as high (up to 69.9 %). The methodology allowed us to confirm mass spectrometry peaks previously identified as carrying statistically significant information for discrimination between controls and cases.
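As a rough illustration of how a Venn machine turns data into a probability interval, the sketch below follows the general Venn prediction scheme: the new object is tried with every candidate label, a taxonomy assigns all examples to categories, and the label frequencies in the new example's category form one probability distribution per candidate label. The abstract does not say which taxonomy or underlying algorithm the paper uses, so the one-nearest-neighbour taxonomy here is an assumption, and the names `nearest_label`, `venn_predict` and `intervals` are hypothetical.

```python
from collections import Counter


def nearest_label(i, xs, ys):
    """Label of the nearest other example to example i (assumed 1-NN taxonomy)."""
    nearest = min(
        (j for j in range(len(xs)) if j != i),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(xs[i], xs[j])),
    )
    return ys[nearest]


def venn_predict(train_x, train_y, new_x, labels):
    """Return one label distribution per candidate label for new_x."""
    rows = {}
    for candidate in labels:
        # Tentatively add the new object with the candidate label.
        xs = list(train_x) + [new_x]
        ys = list(train_y) + [candidate]
        # Category of each example = label of its nearest neighbour.
        cats = [nearest_label(i, xs, ys) for i in range(len(xs))]
        # Label frequencies within the category of the new example.
        members = [y for y, c in zip(ys, cats) if c == cats[-1]]
        counts = Counter(members)
        rows[candidate] = {k: counts[k] / len(members) for k in labels}
    return rows


def intervals(rows, labels):
    """Lower and upper probability of each label across all candidate rows."""
    return {
        k: (min(r[k] for r in rows.values()), max(r[k] for r in rows.values()))
        for k in labels
    }


if __name__ == "__main__":
    # Toy one-dimensional data, invented purely for illustration.
    train_x = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,)]
    train_y = [0, 0, 0, 1, 1, 1]
    rows = venn_predict(train_x, train_y, (0.15,), [0, 1])
    bounds = intervals(rows, [0, 1])
    # A common point-prediction rule: pick the label with the largest lower bound.
    point = max(bounds, key=lambda k: bounds[k][0])
    print(bounds, point)
```

The interval for a label runs from its smallest to its largest frequency across the candidate rows. A narrow interval means the multiprobability output is close to a single probability distribution, which is the behaviour the abstract reports.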

Keywords

Confident prediction Probabilistic prediction Risk Diagnostic 

Mathematics Subject Classifications (2010)

68T05 68Q32 


Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  • Ilia Nouretdinov (1, email author)
  • Dmitry Devetyarov (1)
  • Volodya Vovk (1)
  • Brian Burford (1)
  • Stephane Camuzeaux (2)
  • Aleksandra Gentry-Maharaj (2)
  • Ali Tiss (3)
  • Celia Smith (3)
  • Zhiyuan Luo (1)
  • Alexey Chervonenkis (1)
  • Rachel Hallett (2)
  • Mike Waterfield (2)
  • Rainer Cramer (3)
  • John F. Timms (2)
  • Ian Jacobs (2)
  • Usha Menon (2)
  • Alex Gammerman (1)

  1. Computer Learning Research Centre, Royal Holloway, University of London, London, UK
  2. EGA Institute for Women’s Health, University College London, London, UK
  3. BioCentre and Department of Chemistry, University of Reading, West Berkshire, UK