Biomedical Text Mining: State-of-the-Art, Open Problems and Future Challenges

  • Andreas Holzinger
  • Johannes Schantl
  • Miriam Schroettner
  • Christin Seifert
  • Karin Verspoor
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8401)


Text is a very important type of data within the biomedical domain. For example, patient records contain large amounts of text that has been entered in a non-standardized format, which poses many challenges for processing such data. For the clinician, the written text of the medical findings, rather than images or multimedia data, is still the basis for decision making. However, the steadily increasing volumes of unstructured information call for machine learning approaches to data mining, in this case text mining. This paper provides a concise overview of selected text mining methods, focusing on statistical methods, namely Latent Semantic Analysis, Probabilistic Latent Semantic Analysis, Latent Dirichlet Allocation, Hierarchical Latent Dirichlet Allocation, Principal Component Analysis, and Support Vector Machines, along with some examples from the biomedical domain. Finally, we present some open problems and future challenges, particularly from the clinical domain, that we expect to stimulate future research.


Text Mining, Natural Language Processing, Unstructured Information, Big Data, Knowledge Discovery, Statistical Models, Text Classification, LSA, PLSA, LDA, hLDA, PCA, SVM





Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Andreas Holzinger (1)
  • Johannes Schantl (1)
  • Miriam Schroettner (1)
  • Christin Seifert (2)
  • Karin Verspoor (3, 4)
  1. Research Unit Human-Computer Interaction, Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz, Austria
  2. Media Informatics, University of Passau, Germany
  3. Department of Computing & Information Systems, University of Melbourne, Australia
  4. Health and Biomedical Informatics Centre, University of Melbourne, Australia
