Abstract
The MEDLINE database (Medical Literature Analysis and Retrieval System Online) contains a rapidly growing volume of biomedical articles. Consequently, there is a need for techniques that enable the quality-based discovery, extraction, integration, and use of the knowledge hidden in these articles. Text mining helps to interpret such large volumes of data. One text-mining technique is co-occurrence analysis, in which statistical models are used to evaluate the significance of the relationship between entities, such as disease names, drug names, and keywords, that appear together in titles, abstracts, or entire publications. In this chapter we present a selection of quality-oriented Web-based tools for analyzing biomedical literature and discuss PolySearch, FACTA, and Kleio in particular. Finally, we discuss Pointwise Mutual Information (PMI), a measure of the strength of an association: PMI indicates how much more often the query and a concept co-occur than would be expected by chance. The results reveal hidden knowledge in MEDLINE-indexed articles on rheumatic diseases, exposing relationships that can provide medical experts and researchers with important additional information for medical decision-making and quality enhancement.
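To make the co-occurrence idea concrete, the sketch below computes PMI in its standard form, PMI(x, y) = log2(P(x, y) / (P(x) P(y))), estimating each probability as a document count divided by the number of documents. It is a minimal illustration under stated assumptions, not the implementation used by PolySearch, FACTA, Kleio, or the authors: it assumes entity names have already been extracted from each abstract, counts a term or pair at most once per document, and applies no smoothing. The function names `cooccurrence_counts` and `pmi` and the toy documents are hypothetical. A positive score means the two entities co-occur more often than chance, zero means they co-occur exactly as often as independence predicts, and negative values mean less often.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Return document frequencies for single terms and for unordered term pairs.

    Assumption: each document is an iterable of already-extracted entity names;
    a term or pair is counted at most once per document.
    """
    term_df = Counter()
    pair_df = Counter()
    for terms in docs:
        unique = sorted(set(terms))  # dedupe and give every pair a canonical order
        term_df.update(unique)
        pair_df.update(combinations(unique, 2))
    return term_df, pair_df


def pmi(x, y, term_df, pair_df, n_docs):
    """PMI = log2(P(x,y) / (P(x) P(y))), with probabilities estimated from document counts."""
    joint = pair_df[tuple(sorted((x, y)))]
    if joint == 0:
        return float("-inf")  # log of zero: the pair never co-occurs
    p_xy = joint / n_docs
    p_x = term_df[x] / n_docs
    p_y = term_df[y] / n_docs
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical toy "abstracts", each reduced to the entities it mentions.
docs = [
    {"rheumatoid arthritis", "methotrexate"},
    {"rheumatoid arthritis", "methotrexate", "anemia"},
    {"lupus", "hydroxychloroquine"},
    {"lupus", "anemia"},
]

term_df, pair_df = cooccurrence_counts(docs)
n = len(docs)
print(pmi("rheumatoid arthritis", "methotrexate", term_df, pair_df, n))  # 1.0
print(pmi("rheumatoid arthritis", "anemia", term_df, pair_df, n))        # 0.0
print(pmi("lupus", "methotrexate", term_df, pair_df, n))                 # -inf
```

Counting is kept separate from scoring so that the corpus is scanned once and many query/concept pairs can then be ranked by PMI. Real systems typically differ in the counting unit (sentence, abstract, or full text) and often add smoothing or normalization, since raw PMI favors rare terms.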
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Holzinger, A., Yildirim, P., Geier, M., Simonic, KM. (2013). Quality-Based Knowledge Discovery from Medical Text on the Web. In: Pasi, G., Bordogna, G., Jain, L. (eds) Quality Issues in the Management of Web Information. Intelligent Systems Reference Library, vol 50. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37688-7_7
DOI: https://doi.org/10.1007/978-3-642-37688-7_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37687-0
Online ISBN: 978-3-642-37688-7
eBook Packages: Engineering, Engineering (R0)