Creating Topic Hierarchies for Large Medical Libraries

Sánchez, David; Moreno, Antonio

doi:10.1007/978-3-642-11808-1_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5943)

Included in the following conference series:

International Workshop on Knowledge Representation for Health Care

Abstract

Web-based medical digital libraries contain a huge amount of valuable, up-to-date health care information. However, their size, their keyword-based access methods and their lack of semantic structure make it difficult to find the desired information. In this paper we present an automatic, unsupervised and domain-independent approach for structuring the resources available in an electronic repository. The system automatically detects and extracts the main topics related to a given domain, building a taxonomical structure. Our Web-based system is integrated smoothly with the digital library’s search engine, offering a tool for accessing the library’s resources by hierarchically browsing domain topics in a comprehensive and natural way. The system has been tested on the well-known PubMed medical library, obtaining better topic hierarchies than those generated by widely used taxonomic search engines employing clustering techniques.

Author information

Authors and Affiliations

ITAKA-Intelligent Technologies for Advanced Knowledge Acquisition, Department of Computer Science and Mathematics, University Rovira i Virgili, Av. Països Catalans, 26, 43007, Tarragona, Spain
David Sánchez & Antonio Moreno

Editor information

Editors and Affiliations

Rovira i Virgili University, Av. Països Catalans 26, 43007, Tarragona, Spain
David Riaño
Free University Amsterdam, De Boelelaan 1081A, 1081HV, Amsterdam, The Netherlands
Annette ten Teije
Danube University Krems, Dr.-Karl-Dorrek-Str. 30, 3500, Krems, Austria
Silvia Miksch
University of Haifa, Rabin Bldg, 31905, Haifa, Israel
Mor Peleg

About this paper

Cite this paper

Sánchez, D., Moreno, A. (2010). Creating Topic Hierarchies for Large Medical Libraries. In: Riaño, D., ten Teije, A., Miksch, S., Peleg, M. (eds) Knowledge Representation for Health-Care. Data, Processes and Guidelines. KR4HC 2009. Lecture Notes in Computer Science, vol 5943. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11808-1_1

DOI: https://doi.org/10.1007/978-3-642-11808-1_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11807-4
Online ISBN: 978-3-642-11808-1
eBook Packages: Computer Science, Computer Science (R0)
