Application of Machine Learning for Multicenter Learning

  • Chapter
Machine Learning in Radiation Oncology

Abstract

Advancements in radiation oncology are driving more specific, and thus improved, treatment opportunities. This creates challenges for the assessment of treatment options, as more information is needed to make an informed decision. One approach is to use machine-learning techniques to develop predictive models. Although prediction models, embedded in clinical decision support systems (CDSSs), are the foreseen solution, developing and training such models requires large amounts of detailed patient information to reach decisive power. The number of patients needed to train a reliable prediction model rapidly outgrows the number available in a single institution, hence the need for multicenter machine learning. To be able to learn across multiple centers, several infrastructural prerequisites need to be addressed. First, data need to be extracted from multiple source systems and represented using standardized terminologies, preferably including the semantics (the actual description) of the represented data. For research and model training purposes, this means that value representations (e.g. “m” or “f” indicating gender) need to be converted into standardized terms (the NCI Thesaurus codes C20197 or C16576, respectively), and that patient-identifiable information (e.g. name, institutional ID, address) needs to be removed or changed in a non-identifiable way. Second, if datasets from different institutions use the same standardized terminology and data structure, the data can be merged. Finally, after merging, prediction models can be learned on the complete dataset, which this chapter refers to as centralized learning.
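
The preparation steps named in the abstract (code conversion, removal of identifiers, merging) can be illustrated with a minimal Python sketch. It is not the chapter's implementation. The two NCI Thesaurus codes are the ones given in the abstract; the field names, the sample records, and the helper functions `standardize` and `merge_centers` are hypothetical. Real de-identification involves more than dropping a few fields, and the abstract also allows identifiers to be changed rather than removed, which this sketch does not show.

```python
# NCI Thesaurus codes as stated in the abstract: "m" -> C20197, "f" -> C16576.
GENDER_TO_NCIT = {"m": "C20197", "f": "C16576"}

# Hypothetical names for the patient-identifiable fields to strip.
IDENTIFYING_FIELDS = {"name", "institutional_id", "address"}


def standardize(record):
    """Return a copy of one patient record without identifiers and with
    the local gender value replaced by its standardized code."""
    clean = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    raw = str(clean.get("gender", "")).strip().lower()
    # Values without a known mapping become None instead of being guessed.
    clean["gender"] = GENDER_TO_NCIT.get(raw)
    return clean


def merge_centers(*datasets):
    """Standardize every record from every center and pool them in one list."""
    merged = []
    for center_records in datasets:
        merged.extend(standardize(r) for r in center_records)
    return merged


# Invented example records from two hypothetical centers.
center_a = [{"name": "Patient One", "institutional_id": "A-001",
             "address": "Street 1", "gender": "m", "age": 64}]
center_b = [{"name": "Patient Two", "institutional_id": "B-17",
             "gender": "F", "age": 58}]

merged = merge_centers(center_a, center_b)
print(merged)
```

Once every center has applied the same mapping and structure, the pooled list is the kind of single dataset on which a model could be trained in the centralized setting the abstract describes.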

Author information

Corresponding author

Correspondence to Georgi Nalbantov.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

van Soest, J.P.A., Dekker, A.L.A.J., Roelofs, E., Nalbantov, G. (2015). Application of Machine Learning for Multicenter Learning. In: El Naqa, I., Li, R., Murphy, M. (eds) Machine Learning in Radiation Oncology. Springer, Cham. https://doi.org/10.1007/978-3-319-18305-3_6

  • DOI: https://doi.org/10.1007/978-3-319-18305-3_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18304-6

  • Online ISBN: 978-3-319-18305-3

  • eBook Packages: Medicine (R0)
