Knowledge Integration for Disease Characterization: A Breast Cancer Example

  • Oshani Seneviratne
  • Sabbir M. Rashid
  • Shruthi Chari
  • James P. McCusker
  • Kristin P. Bennett
  • James A. Hendler
  • Deborah L. McGuinness
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11137)


With rapid advances in cancer research, the information used to characterize disease, stage tumors, and create treatment and survivorship plans is changing at a pace that makes it difficult for physicians to remain current. One example is the increasing use of biomarkers in characterizing the pathologic prognostic stage of a breast tumor. We present a semantic technology approach to support cancer characterization and demonstrate it in an end-to-end prototype system that collects the newest breast cancer staging criteria from authoritative oncology manuals and uses them to construct an ontology for breast cancer. With a tool we developed on top of this ontology, physician-facing applications can quickly stage a new patient, which supports identifying risks, treatment options, and monitoring plans based on authoritative, best-practice guidelines. Physicians can also re-stage existing patients or patient populations, allowing them to find patients whose stage has changed in a given patient cohort. Using our proposed mechanism, which is grounded in semantic technologies for ingesting new data from staging manuals as new guidelines emerge, we have created an enriched cancer staging ontology that integrates relevant data from several sources with very little human intervention.


Ontologies · Knowledge integration · Deductive inference · Automatic extraction · Cancer characterization · Cancer staging guidelines



This work is partially supported by IBM Research AI through the AI Horizons Network. We thank our colleagues from IBM (Amar Das, Ching-Hua Chen) and RPI (John Erickson, Alexander New, Rebecca Cowan) who provided insight and expertise that greatly assisted the research.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Rensselaer Polytechnic Institute, Troy, USA
