Towards Automatic Domain Classification of Technical Terms: Estimating Domain Specificity of a Term Using the Web

  • Takehito Utsuro
  • Mitsuhiro Kida
  • Masatsugu Tonoike
  • Satoshi Sato
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4182)

Abstract

This paper proposes a method for estimating the domain specificity of technical terms using the Web. The method assumes that, for a given technical domain, a list of known technical terms of that domain is available. Technical documents of the domain are collected through a Web search engine and then used to build a vector space model of the domain. The domain specificity of a target term is estimated from the distribution of the domains of sample Web pages collected for that term. Experimental evaluation shows that the proposed method achieved around 90% precision and recall in most cases.
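The estimation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a plain term-frequency bag-of-words representation, cosine similarity against the centroid of the domain documents, and two hypothetical cut-offs, `threshold` and `min_ratio`. The paper's actual document model and page classifier may differ. The Web search steps (collecting domain documents and sample pages for a term) are omitted, and both inputs are passed in as plain strings.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Bag-of-words term-frequency vector (naive lowercase whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def domain_model(domain_docs: list[str]) -> Counter:
    """Domain vector as the summed term frequencies of the domain documents.

    Summing rather than averaging is fine here because cosine similarity
    ignores vector length.
    """
    centroid = Counter()
    for doc in domain_docs:
        centroid.update(tf_vector(doc))
    return centroid


def domain_specificity(sample_pages: list[str], model: Counter, threshold: float = 0.2) -> float:
    """Fraction of a term's sample pages judged in-domain.

    A page counts as in-domain when its similarity to the domain model reaches
    the hypothetical `threshold`.
    """
    if not sample_pages:
        return 0.0
    hits = sum(1 for page in sample_pages if cosine(tf_vector(page), model) >= threshold)
    return hits / len(sample_pages)


def is_domain_specific(sample_pages: list[str], model: Counter,
                       threshold: float = 0.2, min_ratio: float = 0.5) -> bool:
    """Label the term domain-specific if enough of its sample pages are in-domain (hypothetical rule)."""
    return domain_specificity(sample_pages, model, threshold) >= min_ratio


if __name__ == "__main__":
    docs = ["support vector machine kernel classifier training data",
            "neural network training classifier gradient"]
    model = domain_model(docs)
    pages = ["kernel classifier training", "recipe for chocolate cake"]
    print(domain_specificity(pages, model))  # 0.5
    print(is_domain_specific(pages, model))  # True
```

In this toy run, one of the two sample pages shares vocabulary with the domain documents, so the estimated specificity is 0.5. Thresholding a per-page ratio is only one plausible way to turn a distribution over sample pages into a decision.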

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Takehito Utsuro (1)
  • Mitsuhiro Kida (2)
  • Masatsugu Tonoike (3)
  • Satoshi Sato (4)

  1. Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan
  2. Nintendo Co., Ltd., Kyoto-shi, Japan
  3. Graduate School of Informatics, Kyoto University, Kyoto, Japan
  4. Graduate School of Engineering, Nagoya University, Nagoya, Japan

Personalised recommendations