Quality-Based Knowledge Discovery from Medical Text on the Web

Example of Computational Methods in Web Intelligence

  • Chapter
Quality Issues in the Management of Web Information

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 50)

Abstract

The MEDLINE database (Medical Literature Analysis and Retrieval System Online) contains a rapidly growing volume of biomedical articles. Consequently, there is a need for techniques that enable the quality-based discovery, extraction, integration and use of hidden knowledge in those articles. Text mining helps to cope with the interpretation of these large volumes of data. Co-occurrence analysis is one technique applied in text mining: statistical models are used to evaluate the significance of the relationship between entities such as disease names, drug names, and keywords in titles, abstracts or even entire publications. In this paper we present a selection of quality-oriented Web-based tools for analyzing biomedical literature, and specifically discuss PolySearch, FACTA and Kleio. Finally, we discuss Pointwise Mutual Information (PMI), a measure of the strength of a relationship. PMI indicates how much more often the query and concept co-occur than expected by chance. The results reveal hidden knowledge in articles regarding rheumatic diseases indexed by MEDLINE, thereby exposing relationships that can provide important additional information to medical experts and researchers for medical decision-making and quality enhancement.
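
As an illustration of the PMI measure mentioned in the abstract, the following is a minimal sketch and not the chapter's own implementation. It uses the standard definition PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ), with probabilities estimated from document counts. The function name, parameter names and example counts are hypothetical and chosen only for illustration; they are not taken from MEDLINE or from the tools discussed.

```python
import math


def pmi(n_xy: int, n_x: int, n_y: int, n_total: int) -> float:
    """Pointwise mutual information (base 2) estimated from document counts.

    n_xy    -- documents mentioning both the query x and the concept y
    n_x     -- documents mentioning x
    n_y     -- documents mentioning y
    n_total -- documents in the collection
    """
    if min(n_xy, n_x, n_y, n_total) <= 0:
        raise ValueError("all counts must be positive")
    # p(x, y) / (p(x) * p(y)) simplifies to (n_xy * n_total) / (n_x * n_y).
    return math.log2((n_xy * n_total) / (n_x * n_y))


# Hypothetical counts, for illustration only.
score = pmi(n_xy=50, n_x=1000, n_y=2000, n_total=1_000_000)
print(f"PMI = {score:.3f}")
```

A positive score means the pair co-occurs more often than would be expected if the two terms were independent, a score of zero corresponds to independence, and a negative score means the pair co-occurs less often than expected by chance.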



Author information

Corresponding author

Correspondence to Andreas Holzinger.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Holzinger, A., Yildirim, P., Geier, M., Simonic, KM. (2013). Quality-Based Knowledge Discovery from Medical Text on the Web. In: Pasi, G., Bordogna, G., Jain, L. (eds) Quality Issues in the Management of Web Information. Intelligent Systems Reference Library, vol 50. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37688-7_7

  • DOI: https://doi.org/10.1007/978-3-642-37688-7_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37687-0

  • Online ISBN: 978-3-642-37688-7

  • eBook Packages: Engineering, Engineering (R0)
