Knowledge Discovery and Data Mining in Biomedical Informatics: The Future Is in Integrative, Interactive Machine Learning Solutions

  • Andreas Holzinger
  • Igor Jurisica
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8401)

Abstract

Biomedical research is drowning in data, yet starving for knowledge. Current challenges in biomedical research and clinical practice include information overload (the need to combine vast amounts of structured, semi-structured and weakly structured data with vast amounts of unstructured information) and the need to optimize workflows, processes and guidelines in order to increase capacity while reducing costs and improving efficiency. In this paper we provide a very short overview of interactive and integrative solutions for knowledge discovery and data mining. In particular, we emphasize the benefits of including the end user in the “interactive” knowledge discovery process. We describe some of the most important challenges, including the need to develop and apply novel methods, algorithms and tools for the integration, fusion, pre-processing, mapping, analysis and interpretation of complex biomedical data, with the aim of identifying testable hypotheses and building realistic models. The HCI-KDD approach, a synergistic combination of the methodologies and approaches of two areas, Human–Computer Interaction (HCI) and Knowledge Discovery & Data Mining (KDD), offers ideal conditions for solving these challenges, with the goal of supporting human intelligence with machine intelligence. There is an urgent need for integrative and interactive machine learning solutions, because no medical doctor or biomedical researcher can today keep pace with the increasingly large and complex data sets often called “Big Data”.

Keywords

Knowledge Discovery, Data Mining, Machine Learning, Biomedical Informatics, Integration, Interaction, HCI-KDD, Big Data

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Andreas Holzinger
    • 1
  • Igor Jurisica
    • 2
  1. Institute for Medical Informatics, Statistics and Documentation, Research Unit HCI, Austrian IBM Watson Think Group, Medical University Graz, Graz, Austria
  2. IBM Life Sciences Discovery Centre, and TECHNA Institute for the Advancement of Technology for Health, Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
