
Extravaganza Tutorial on Hot Ideas for Interactive Knowledge Discovery and Data Mining in Biomedical Informatics

  • Andreas Holzinger
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8609)

Abstract

Biomedical experts are confronted with "Big data", driven by the trend towards precision medicine. Although humans are excellent at pattern recognition in dimensions of ≤ 3, most biomedical data lives in dimensions much higher than 3, which often makes manual analysis impossible. In their daily routine, experts are less and less able to cope with such data. Efficient, usable and useful computational methods, algorithms and tools for interactively gaining insight into such data are therefore urgently needed. A synergistic combination of methodologies from two areas may be of great help here: Human–Computer Interaction (HCI) and Knowledge Discovery/Data Mining (KDD), with the goal of supporting human intelligence with machine learning. Mapping higher-dimensional data into lower dimensions is a major task in HCI, and a concerted effort drawing on recent advances in graph theory and algebraic topology may contribute to finding solutions. Moreover, much biomedical data is sparse, noisy and time-dependent, so entropy is also among the promising topics. This tutorial gives an overview of the HCI-KDD approach and focuses on three topics: graphs, topology and entropy. The goal of this introductory tutorial is to motivate and stimulate further research.
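The abstract singles out entropy as promising for sparse, noisy, time-dependent data. As one concrete illustration (our choice, not code from the tutorial), the sketch below computes approximate entropy (ApEn), the regularity statistic introduced by Pincus (1991), for a one-dimensional series. It assumes the Chebyshev (maximum-coordinate) distance between templates, counts self-matches as in Pincus's definition, and treats `r` as an absolute tolerance. The function name `approximate_entropy`, the default `m = 2` and the choice r = 0.2 × standard deviation in the usage example are illustrative assumptions.

```python
import math
import random
import statistics
from typing import Sequence


def approximate_entropy(x: Sequence[float], m: int = 2, r: float = 0.2) -> float:
    """Approximate entropy ApEn(m, r) = Phi^m(r) - Phi^(m+1)(r) of a 1-D series.

    Hypothetical helper for illustration; r is an absolute tolerance.
    """
    n = len(x)
    if n <= m + 1:
        raise ValueError("series too short for the chosen embedding dimension m")

    def phi(k: int) -> float:
        # All overlapping templates (embedding vectors) of length k.
        templates = [list(x[i:i + k]) for i in range(n - k + 1)]
        count = len(templates)
        total = 0.0
        for a in templates:
            # Count templates within Chebyshev distance r of a; the self-match
            # guarantees c >= 1, so the logarithm is always defined.
            c = sum(
                1
                for b in templates
                if max(abs(p - q) for p, q in zip(a, b)) <= r
            )
            total += math.log(c / count)
        return total / count

    return phi(m) - phi(m + 1)


if __name__ == "__main__":
    periodic = [0.0, 1.0] * 50          # strictly regular signal
    rng = random.Random(0)
    noisy = [rng.random() for _ in range(300)]  # i.i.d. uniform noise
    for name, series in (("periodic", periodic), ("noisy", noisy)):
        tol = 0.2 * statistics.pstdev(series)   # assumed tolerance convention
        # Expect a value near 0 for the periodic series and a clearly larger one for noise.
        print(name, approximate_entropy(series, m=2, r=tol))
```

Lower ApEn indicates a more regular, predictable series. The brute-force template comparison is quadratic in the series length, which is adequate for a sketch but not for long recordings.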

Keywords

Knowledge Discovery; Data Mining; HCI-KDD; Graph-based Text Mining; Topological Data Mining; Entropy-based Data Mining
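The keywords include topological data mining, whose central tool, persistent homology, tracks topological features of a point cloud across a range of scales. The sketch below covers only the simplest case, 0-dimensional homology (connected components). It assumes a small Euclidean point cloud and a Vietoris–Rips filtration in which an edge enters at the pairwise distance. The names `h0_barcode` and `betti0` are hypothetical helpers, not an API from the tutorial, and this is not the tutorial's method. For higher-dimensional features, dedicated libraries such as GUDHI or Ripser are the usual choice.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
Bar = Tuple[float, float]


def h0_barcode(points: Sequence[Point]) -> List[Bar]:
    """H0 persistence barcode of a Vietoris-Rips filtration (hypothetical helper).

    Every point is born at scale 0. Processing edges by increasing length,
    each edge that joins two different components kills one bar, so the finite
    bars end at the minimum-spanning-tree edge lengths (Kruskal's algorithm).
    The last surviving component gets an infinite bar.
    """
    n = len(points)
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    bars: List[Bar] = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj          # two components merge: one H0 bar dies at d
            bars.append((0.0, d))
    if n:
        bars.append((0.0, math.inf))  # the component that never dies
    return bars


def betti0(bars: List[Bar], eps: float) -> int:
    """Number of connected components alive at scale eps (birth <= eps < death)."""
    return sum(1 for birth, death in bars if birth <= eps < death)


if __name__ == "__main__":
    # Two well-separated unit triangles.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    bars = h0_barcode(pts)
    for eps in (0.5, 2.0, 20.0):
        print(eps, betti0(bars, eps))
```

Long bars (here, the one ending near 13.45 and the infinite one) indicate structure that persists across scales, while short bars are usually read as noise. This is the intuition behind barcode-based analysis.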

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Andreas Holzinger (1, 2)

  1. Research Unit Human-Computer Interaction, Institute for Medical Informatics, Statistics & Documentation, Medical University Graz, Austria
  2. Institute for Information Systems and Computer Media, Graz University of Technology, Austria
