Big Data in Healthcare and Medicine

  • Stefan Rüping
  • Jil Sander


In medicine and healthcare, ever larger volumes of increasingly diverse data are available and are being generated at increasing speed. This general trend is referred to as Big Data. Analyzing Big Data with machine learning methods leads to the development of innovative solutions that can generate new medical insights and increase quality and efficiency in the healthcare system. Prototypical examples exist in the analysis of clinical texts, in clinical decision support, in the analysis of data from public data sources or wearables, and in the development of personal assistants. This potential, however, also brings new challenges in data protection and in the transparency and traceability of the results for medical experts.


Copyright information

© Springer-Verlag GmbH Deutschland, part of Springer Nature 2019

Authors and Affiliations

  1. Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme (IAIS), Sankt Augustin, Germany
