A Comparison of Methods for Automatic Term Extraction for Domain Analysis

  • William B. Frakes
  • Gregory Kulczycki
  • Jason Tilley
Conference paper

DOI: 10.1007/978-3-319-14130-5_19

Part of the Lecture Notes in Computer Science book series (LNCS, volume 8919)
Cite this paper as:
Frakes W.B., Kulczycki G., Tilley J. (2014) A Comparison of Methods for Automatic Term Extraction for Domain Analysis. In: Schaefer I., Stamelos I. (eds) Software Reuse for Dynamic Systems in the Cloud and Beyond. ICSR 2015. Lecture Notes in Computer Science, vol 8919. Springer, Cham

Abstract

Fourteen word frequency metrics were tested to evaluate their effectiveness in identifying vocabulary in a domain. Fifteen domain-engineering projects were examined to measure how closely the vocabularies selected by the fourteen metrics matched the vocabularies produced by domain engineers. Stemming and stopword removal were also evaluated to measure their impact on selecting proper vocabulary terms. The results of the experiment show that stemming and stopword removal improve performance and that term frequency is a valuable contributor to performance. Most word frequency metrics gave similar results; a few did poorly compared to the others.
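
The kind of pipeline the abstract describes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the stoplist is a toy set, `naive_stem` is a crude suffix stripper standing in for a real stemmer, raw term frequency stands in for the fourteen metrics (which the abstract does not list), and `overlap` is one hypothetical way to compare an extracted vocabulary against one chosen by a domain engineer. It is not the authors' implementation.

```python
import re
from collections import Counter

# Toy stoplist (assumption); the stoplist used in the paper is not given here.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "by"}


def naive_stem(word):
    """Crude suffix stripper (assumption), standing in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def top_terms(text, k, use_stoplist=True, use_stemming=True):
    """Return the k most frequent terms, optionally filtered and stemmed."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if use_stoplist:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if use_stemming:
        tokens = [naive_stem(t) for t in tokens]
    counts = Counter(tokens)  # plain term frequency
    return [term for term, _ in counts.most_common(k)]


def overlap(candidates, reference):
    """Fraction of the reference vocabulary found among the candidates."""
    reference = set(reference)
    if not reference:
        return 0.0
    return len(set(candidates) & reference) / len(reference)


if __name__ == "__main__":
    doc = ("The parser parses tokens. Parsers and tokens are "
           "the core of the parsing domain.")
    expert_vocabulary = {"parser", "token"}  # hypothetical engineer-chosen terms
    print(overlap(top_terms(doc, 3, False, False), expert_vocabulary))
    print(overlap(top_terms(doc, 3, True, True), expert_vocabulary))
```

On this toy document the unfiltered run spends a slot on "the" and never produces the stem "token", while the filtered, stemmed run recovers both reference terms. That mirrors the direction of the reported result but says nothing about its size.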

Keywords

domain engineering; vocabulary extraction; stemming; stoplists; word frequency metrics; software reuse; domain documents

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • William B. Frakes (1)
  • Gregory Kulczycki (1)
  • Jason Tilley (1)
  1. Software Reuse Laboratory, Virginia Tech, Falls Church, USA