Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5943)

Abstract

Web-based medical digital libraries contain a large amount of valuable, up-to-date health care information. However, their size, their keyword-based access methods, and their lack of semantic structure make it difficult to find the desired information. In this paper we present an automatic, unsupervised, and domain-independent approach for structuring the resources available in an electronic repository. The system automatically detects and extracts the main topics related to a given domain and organizes them into a taxonomy. Our Web-based system integrates smoothly with the digital library’s search engine, so users can access the library’s resources by browsing domain topics hierarchically in a comprehensive and natural way. The system has been tested on the well-known PubMed medical library, where it obtained better topic hierarchies than those generated by widely used taxonomic search engines that employ clustering techniques.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sánchez, D., Moreno, A. (2010). Creating Topic Hierarchies for Large Medical Libraries. In: Riaño, D., ten Teije, A., Miksch, S., Peleg, M. (eds) Knowledge Representation for Health-Care. Data, Processes and Guidelines. KR4HC 2009. Lecture Notes in Computer Science, vol 5943. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11808-1_1

  • DOI: https://doi.org/10.1007/978-3-642-11808-1_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-11807-4

  • Online ISBN: 978-3-642-11808-1

  • eBook Packages: Computer Science, Computer Science (R0)
