A Heuristic Strategy for Extracting Terms from Scientific Texts

  • Elena I. Bolshakova
  • Natalia E. Efremova
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 542)


The paper describes a strategy that applies heuristics to combine sets of terminological words and word combinations pre-extracted from a scientific text by several term recognition procedures. Each procedure is based on a collection of lexico-syntactic patterns that represent specific linguistic information about terms within scientific texts. The strategy aims to improve the quality of automatic term extraction from a particular scientific text. Experiments have shown that the strategy yields an 11–17% increase in F-measure compared with commonly used methods of term extraction.
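As an illustration of how combining term sets and scoring them by F-measure might look, here is a minimal Python sketch. It is not the authors' method: the abstract does not specify the heuristics, so the voting rule `combine_by_votes`, the threshold `min_votes`, and the toy data are all hypothetical, and the balanced F1 variant of F-measure is assumed.

```python
from collections import Counter


def f_measure(extracted, gold):
    """Return (precision, recall, F1) for extracted terms against a gold set.

    Balanced F1 is an assumption; the abstract only says "F-measure".
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def combine_by_votes(term_sets, min_votes=2):
    """Hypothetical rule: keep a term proposed by at least min_votes procedures.

    This stands in for the paper's heuristics, which are not given here.
    """
    votes = Counter(term for terms in term_sets for term in set(terms))
    return {term for term, count in votes.items() if count >= min_votes}


# Toy data, invented for illustration only.
gold = {"term extraction", "lexico-syntactic pattern", "multiword term"}
procedure_outputs = [
    {"term extraction", "multiword term", "paper"},
    {"term extraction", "lexico-syntactic pattern", "section"},
    {"multiword term", "section", "term extraction"},
]

combined = combine_by_votes(procedure_outputs, min_votes=2)
precision, recall, f1 = f_measure(combined, gold)
```

On this toy input the vote keeps the two terms that at least two procedures agree on, but it also admits the spurious candidate "section". That is the kind of trade-off a more informed heuristic would have to handle.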


Keywords: Multiword terms; Automatic term extraction; Text variants of terms; Term occurrences in scientific text; Lexico-syntactic patterns



We would like to thank the anonymous reviewers of our paper for their helpful and constructive comments.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Lomonosov Moscow State University, National Research University Higher School of Economics, Moscow, Russia
