Technical terminology for domain specification and content characterisation

  • Branimir Boguraev
  • Christopher Kennedy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1299)

Abstract

The identification and extraction of technical terms is one of the better understood and most robust natural language processing (NLP) technologies in the current state of the art of language engineering. What is particularly interesting here is the clear understanding of how to derive, from the linguistic properties of terms, computational procedures for their reliable identification and extraction from technical and scientific prose. In generic information management contexts, terms have been associated both with procedures that seek to identify a term set uniquely distinguishing a document within a nearly homogeneous document collection, and with procedures that seek to extract a representative sample of terms characterising a document's content. Terminology has a wide range of uses, commonly identified with, e.g., text indexing, computational lexicology and machine-assisted translation; most of these rely on the notion that terminology is representative of a given domain. This paper discusses specific extensions of terminology identification technology that make it fully capable of domain specification; it also presents extensions of the technology beyond domain specification, to the purpose of document characterisation. These extensions make terminology identification the foundation of an operational environment for document processing, content characterisation and abstraction; more generally, it becomes a powerful enabling technology in an age of growing information overload.
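
The abstract does not spell out the extraction procedure itself. Purely as an illustration, the sketch below uses the well-known Justeson and Katz noun-phrase filter (a part-of-speech pattern plus a frequency threshold of two) on text that is assumed to be already tagged with a simplified, hypothetical tag set. It then contrasts two deliberately crude selection criteria for the information management uses named above: terms that occur in no other document of a collection (distinguishing) versus the most frequent terms of one document (characterising). The function names, tag set and sample text are hypothetical, and this is not the authors' procedure or the extensions the paper describes.

```python
import re
from collections import Counter

# Hypothetical simplified tag set (an assumption for this sketch):
# A = adjective, N = noun, P = preposition, O = anything else.
# A real pipeline would obtain tags from a part-of-speech tagger.
# Justeson-Katz-style pattern: ((A|N)+ | ((A|N)*(NP)?)(A|N)*) N
TERM_PATTERN = re.compile(r"(?:[AN]+|[AN]*(?:NP)?[AN]*)N")


def candidate_terms(tagged, max_len=5):
    """Yield each n-gram (2 <= n <= max_len) whose tag sequence fully
    matches TERM_PATTERN, lower-cased and space-joined."""
    for i in range(len(tagged)):
        for n in range(2, max_len + 1):
            window = tagged[i:i + n]
            if len(window) < n:  # ran off the end of the text
                break
            tags = "".join(tag for _, tag in window)
            if TERM_PATTERN.fullmatch(tags):
                yield " ".join(word.lower() for word, _ in window)


def extract_terms(tagged, min_freq=2, max_len=5):
    """Keep only candidates recurring at least min_freq times
    (the frequency filter: genuine terms tend to be repeated)."""
    counts = Counter(candidate_terms(tagged, max_len))
    return {term: c for term, c in counts.items() if c >= min_freq}


def distinguishing_terms(doc_terms, other_docs_terms):
    """Terms of one document that occur in no other document of the
    collection: a crude stand-in for a uniquely distinguishing term set."""
    elsewhere = set().union(*other_docs_terms)
    return {t for t in doc_terms if t not in elsewhere}


def characterising_terms(doc_terms, k=3):
    """The k most frequent terms of a document, as a crude representative
    sample of its content (ties keep first-seen order)."""
    return [t for t, _ in Counter(doc_terms).most_common(k)]


# Hypothetical sample: two pre-tagged sentences in the style of a user guide.
SAMPLE = [
    ("Insert", "O"), ("the", "O"), ("floppy", "A"), ("disk", "N"),
    ("into", "P"), ("the", "O"), ("disk", "N"), ("drive", "N"), (".", "O"),
    ("Eject", "O"), ("the", "O"), ("floppy", "A"), ("disk", "N"),
    ("from", "P"), ("the", "O"), ("disk", "N"), ("drive", "N"),
    ("before", "P"), ("shutdown", "N"), (".", "O"),
]

if __name__ == "__main__":
    terms = extract_terms(SAMPLE)
    print(terms)
    print(distinguishing_terms(terms, [{"disk drive": 3, "hard disk": 2}]))
    print(characterising_terms(terms, k=2))
```

On the sample, "drive before shutdown" matches the pattern (noun, preposition, noun) but occurs only once, so the frequency filter drops it, while "floppy disk" and "disk drive" survive. Against a second hypothetical document that also mentions "disk drive", only "floppy disk" distinguishes the first document, even though both terms characterise it equally well; this is the difference between the two uses of terminology that the abstract separates.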

Keywords

Noun Phrase, Floppy Disk, Domain Object, Discourse Referent, Local Ontology
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Branimir Boguraev, Apple Research Laboratories, USA
  • Christopher Kennedy, Department of Linguistics, USA
