GlossExtractor: A Web Application to Automatically Create a Domain Glossary

  • Roberto Navigli
  • Paola Velardi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4733)


We describe a web application, GlossExtractor, that takes as input either the output of a terminology extraction web application, TermExtractor, or a user-provided terminology. It then searches several repositories (on-line glossaries, web documents, user-specified web pages) for sentences that are candidate definitions of each input term. Candidate definitions are filtered using statistical indicators and machine-learned regular patterns. Finally, the user can inspect the acquired definitions and validate them individually or as a group. The validated glossary can then be downloaded in one of several formats.
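The pattern-based part of the filtering step can be illustrated with a minimal sketch. It assumes a handful of hand-written definitional patterns ("X is a ...", "X refers to ..."); these patterns, and the names `PATTERN_TEMPLATES`, `is_candidate_definition` and `filter_definitions`, are hypothetical and are not taken from GlossExtractor, whose patterns are machine-learned and not given in the abstract. The statistical indicators are not modelled here.

```python
import re

# Hypothetical, hand-written definitional patterns used only for illustration.
# GlossExtractor's actual patterns are machine-learned and are not listed in
# the abstract.
PATTERN_TEMPLATES = [
    r"^(?:an?\s+|the\s+)?{term}\s+(?:is|are)\s+(?:an?|the)\s+",
    r"^(?:an?\s+|the\s+)?{term}\s+(?:refers\s+to|is\s+defined\s+as|denotes)\s+",
]


def is_candidate_definition(term, sentence):
    """Return True if the sentence matches any illustrative pattern for term."""
    escaped = re.escape(term)  # treat the term as literal text in the regex
    text = sentence.strip()
    return any(
        re.search(template.format(term=escaped), text, re.IGNORECASE)
        for template in PATTERN_TEMPLATES
    )


def filter_definitions(terms, sentences):
    """Map each term to the sentences that look like definitions of it."""
    return {
        term: [s for s in sentences if is_candidate_definition(term, s)]
        for term in terms
    }


if __name__ == "__main__":
    sentences = [
        "An ontology is a formal specification of a shared conceptualization.",
        "We built an ontology for the tourism domain.",
        "A glossary refers to an alphabetical list of terms with definitions.",
    ]
    for term, found in filter_definitions(["ontology", "glossary"], sentences).items():
        print(term, "->", found)
```

In this sketch a sentence is kept only when the term appears in subject position followed by a definitional verb phrase, so a sentence that merely mentions the term ("We built an ontology ...") is discarded. A real system would combine such patterns with further evidence, as the abstract indicates.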





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Roberto Navigli (1)
  • Paola Velardi (1)
  1. Dipartimento di Informatica, Università di Roma “La Sapienza”, via Salaria 113, Roma
