Abstract
Technical terms (henceforth simply terms) are important elements for digital libraries. In this paper we present a domain-independent method for the automatic extraction of multi-word terms from machine-readable special language corpora.
The method, C-value/NC-value, combines linguistic and statistical information. The first part, C-value, enhances the common statistical measure of frequency of occurrence for term extraction, making it sensitive to a particular type of multi-word term, the nested term. The second part, NC-value, provides: 1) a method for the extraction of term context words (words that tend to appear with terms), and 2) the incorporation of information from term context words into the extraction of terms.
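To illustrate how a frequency measure can be made sensitive to nested terms, the following is a minimal sketch of the C-value score in the form usually given for it in the literature. The abstract itself does not state the formula, so the formula is an assumption here: a candidate a scores log2|a| · f(a) when it is not nested, and log2|a| · (f(a) − mean frequency of the longer candidates containing a) when it is. The function names and the example frequencies are hypothetical, and the linguistic filtering that would produce the candidate list is omitted.

```python
import math
from collections import defaultdict


def is_sublist(short, long):
    """Return True if the word tuple `short` occurs contiguously in `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_value(freq):
    """Score multi-word candidates (assumed formulation, see lead-in).

    freq maps each candidate term (a tuple of at least two words) to its
    total corpus frequency, including occurrences inside longer candidates.
    """
    candidates = list(freq)

    # For each candidate, collect the longer candidates that contain it.
    containers = defaultdict(list)
    for a in candidates:
        for b in candidates:
            if len(b) > len(a) and is_sublist(a, b):
                containers[a].append(b)

    scores = {}
    for a in candidates:
        adjusted = freq[a]
        longer = containers[a]
        if longer:
            # Nested candidate: discount the average frequency of its containers.
            adjusted -= sum(freq[b] for b in longer) / len(longer)
        scores[a] = math.log2(len(a)) * adjusted
    return scores


# Hypothetical frequencies, for illustration only.
example_freq = {
    ("soft", "contact", "lens"): 5,
    ("contact", "lens"): 8,
}
scores = c_value(example_freq)
```

In this example the nested candidate "contact lens" is credited only for the occurrences not accounted for by the longer candidate that contains it, while the length factor favours the longer candidate.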
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
Cite this paper
Frantzi, K.T., Ananiadou, S., Tsujii, J. (1998). The C-value/NC-value Method of Automatic Recognition for Multi-word Terms. In: Nikolaou, C., Stephanidis, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 1998. Lecture Notes in Computer Science, vol 1513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49653-X_35
DOI: https://doi.org/10.1007/3-540-49653-X_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65101-7
Online ISBN: 978-3-540-49653-3
eBook Packages: Springer Book Archive