Abstract
Advancements in radiation oncology are driving more specific, and thus improved, treatment opportunities. This creates challenges for the assessment of treatment options, as more information is needed to make an informed decision. One approach is to use machine-learning techniques to develop predictive models. Although prediction models, embedded in clinical decision support systems (CDSSs), are the foreseen solution, developing and training such models requires large amounts of detailed patient information to reach decisive power. The number of patients needed to train a reliable prediction model rapidly outgrows the number available in a single institution, hence the need for multicenter machine learning. To be able to learn over multiple centers, several infrastructural prerequisites need to be addressed. First, data needs to be extracted from multiple source systems and represented using standardized terminologies, preferably including the semantics (the actual description) of the represented data. For research and model training purposes, this means that value representations (e.g. “m” or “f” indicating gender) need to be converted into standardized terms (the NCI Thesaurus codes C20197 or C16576, respectively), and that patient-identifiable information (e.g. name, institutional ID, address) needs to be removed or altered so that it is no longer identifiable. If datasets from different institutions use the same standardized terminology and data structure, the data can be merged. Finally, after merging, prediction models can be learned on the complete dataset; in this chapter this is known as centralized learning.
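The preparation steps described in the abstract (term standardization, removal of identifiers, merging) can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the field names (`name`, `institutional_id`, `address`, `gender`, `age`), the helper functions, and the two example centers are hypothetical and exist only for illustration. Only the two NCI Thesaurus codes are taken from the text.

```python
# Mapping from local value representations to NCI Thesaurus codes.
# The two codes come from the abstract; the dictionary itself is an assumption.
GENDER_TO_NCIT = {"m": "C20197", "f": "C16576"}

# Hypothetical names of patient-identifiable fields that must not leave a center.
IDENTIFYING_FIELDS = {"name", "institutional_id", "address"}


def standardize(record):
    """Return a copy of one patient record without identifying fields and
    with the gender value replaced by its NCI Thesaurus code."""
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    cleaned["gender"] = GENDER_TO_NCIT[record["gender"].strip().lower()]
    return cleaned


def merge(*datasets):
    """Standardize and concatenate datasets that share one data structure."""
    merged = []
    for dataset in datasets:
        merged.extend(standardize(record) for record in dataset)
    return merged


# Two made-up centers that encode gender slightly differently ("f" vs "M").
center_a = [{"name": "Jane Doe", "institutional_id": "A-001",
             "address": "1 Example St", "gender": "f", "age": 64}]
center_b = [{"name": "John Doe", "institutional_id": "B-17",
             "address": "2 Example Ave", "gender": "M", "age": 71}]

merged = merge(center_a, center_b)
print(merged)
```

Once every center emits records in this shared form, the merged list can be handed to any learning routine, which corresponds to the centralized learning setting named above.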
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
van Soest, J.P.A., Dekker, A.L.A.J., Roelofs, E., Nalbantov, G. (2015). Application of Machine Learning for Multicenter Learning. In: El Naqa, I., Li, R., Murphy, M. (eds) Machine Learning in Radiation Oncology. Springer, Cham. https://doi.org/10.1007/978-3-319-18305-3_6
DOI: https://doi.org/10.1007/978-3-319-18305-3_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18304-6
Online ISBN: 978-3-319-18305-3
eBook Packages: Medicine, Medicine (R0)