Abstract
Web-based medical digital libraries contain a huge amount of valuable, up-to-date health care information. However, their size, their keyword-based access methods and their lack of semantic structure make it difficult to find the desired information. In this paper we present an automatic, unsupervised and domain-independent approach for structuring the resources available in an electronic repository. The system automatically detects and extracts the main topics related to a given domain and arranges them in a taxonomy. Our Web-based system integrates smoothly with the digital library’s search engine, offering a tool for accessing the library’s resources by hierarchically browsing domain topics in a comprehensive and natural way. The system has been tested on the well-known PubMed medical library, where it obtained better topic hierarchies than those generated by widely used taxonomic search engines that employ clustering techniques.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sánchez, D., Moreno, A. (2010). Creating Topic Hierarchies for Large Medical Libraries. In: Riaño, D., ten Teije, A., Miksch, S., Peleg, M. (eds) Knowledge Representation for Health-Care. Data, Processes and Guidelines. KR4HC 2009. Lecture Notes in Computer Science, vol 5943. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11808-1_1
DOI: https://doi.org/10.1007/978-3-642-11808-1_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11807-4
Online ISBN: 978-3-642-11808-1
eBook Packages: Computer Science, Computer Science (R0)