The Clinical Data Intelligence Project

A smart data initiative


This article introduces the “Klinische Datenintelligenz” (KDI, Clinical Data Intelligence) project, a new smart data initiative funded by the Federal Ministry for Economic Affairs and Energy (BMWi). The project transfers research and development (R&D) results from the analysis of data generated in clinical routine in specific medical domains. We present the project structure and goals, explain how patient care should be improved, and describe the joint efforts in data and knowledge engineering, information extraction (from textual and other unstructured data), statistical machine learning, and decision support, together with their integration into special use cases that move towards individualised medicine. In particular, we describe some details of our medical use cases and our cooperation with two major German university hospitals.


Author information



Corresponding author

Correspondence to Daniel Sonntag.

Cite this article

Sonntag, D., Tresp, V., Zillner, S. et al. The Clinical Data Intelligence Project. Informatik Spektrum 39, 290–300 (2016).


  • Clinical Decision Support
  • Semantic Annotation
  • Protected Health Information
  • Unstructured Data
  • Data Intelligence