Mining Hierarchical Pathology Data Using Inductive Logic Programming

  • Tim Op De Beéck
  • Arjen Hommersom
  • Jan Van Haaren
  • Maarten van der Heijden
  • Jesse Davis
  • Peter Lucas
  • Lucy Overbeek
  • Iris Nagtegaal
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9105)


Pathologists continuously generate considerable amounts of data in the form of pathology reports. To date, relatively little work has explored how to apply machine learning and data mining techniques to these data in order to extract novel clinical relationships. From a learning perspective, pathology data possess a number of challenging properties, most notably their temporal and hierarchical structure. In this paper, we propose a methodology based on inductive logic programming to extract novel associations from pathology excerpts. We discuss the challenges posed by analyzing these data and how we address them. As a case study, we apply our methodology to Dutch pathology data to discover possible causes of two rare diseases: cholangitis and breast angiosarcomas.


Keywords: Background Knowledge · Electronic Health Record · Rule Mining · Inductive Logic · Inductive Logic Programming





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Tim Op De Beéck (1)
  • Arjen Hommersom (2)
  • Jan Van Haaren (1)
  • Maarten van der Heijden (2)
  • Jesse Davis (1)
  • Peter Lucas (2)
  • Lucy Overbeek (3)
  • Iris Nagtegaal (4)

  1. Department of Computer Science, KU Leuven, Belgium
  2. Institute for Computing and Information Sciences, Radboud University, Nijmegen, The Netherlands
  3. Registry of Histo- and Cytopathology in the Netherlands, Utrecht, The Netherlands
  4. Department of Pathology, Radboud University Medical Centre, Nijmegen, The Netherlands
