Pruning Terminology Extracted from a Specialized Corpus for CV Ontology Acquisition

  • Mathieu Roche
  • Yves Kodratoff
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4278)


This paper presents an experimental study of terminology extraction from a corpus of curricula vitae (CVs). The extracted terminology is intended for ontology acquisition. The choice of pruning rate for the terminology is crucial to the quality of the acquired ontology. We investigate this pruning rate using several evaluation measures (precision, recall, F-measure, and ROC curve).
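As an illustration of how such measures can be compared across pruning rates, here is a minimal sketch. It assumes that pruning means discarding candidate terms whose corpus frequency falls below a threshold; the abstract does not define it. The candidate list, the relevance labels, and the function names `evaluate_pruning` and `auc` are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical data: (term, corpus frequency, judged relevant by an expert).
CANDIDATES: List[Tuple[str, int, bool]] = [
    ("project management", 5, True),
    ("computer science", 4, True),
    ("last year", 3, False),
    ("data analysis", 2, True),
    ("good level", 1, False),
    ("team leader", 1, True),
]


def evaluate_pruning(candidates, min_freq: int, beta: float = 1.0):
    """Return (precision, recall, F-measure) of the terms kept after pruning.

    Assumption: a term is kept when its frequency is >= min_freq.
    """
    kept = [c for c in candidates if c[1] >= min_freq]
    true_pos = sum(1 for c in kept if c[2])
    total_relevant = sum(1 for c in candidates if c[2])
    precision = true_pos / len(kept) if kept else 0.0
    recall = true_pos / total_relevant if total_relevant else 0.0
    denom = beta ** 2 * precision + recall
    f_measure = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def auc(candidates) -> float:
    """Area under the ROC curve when terms are ranked by frequency.

    Computed as the fraction of (relevant, irrelevant) pairs in which the
    relevant term has the higher frequency; ties count one half.
    """
    pos = [c[1] for c in candidates if c[2]]
    neg = [c[1] for c in candidates if not c[2]]
    if not pos or not neg:
        raise ValueError("need at least one relevant and one irrelevant term")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    for threshold in (1, 2, 4):
        p, r, f = evaluate_pruning(CANDIDATES, threshold)
        print(f"min_freq={threshold}: P={p:.2f} R={r:.2f} F={f:.2f}")
    print(f"AUC={auc(CANDIDATES):.4f}")
```

On this toy data, raising the threshold increases precision and lowers recall, which is the trade-off a pruning rate has to balance. The AUC summarizes the quality of the frequency ranking independently of any single threshold.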


Ontology Learning; Binary Term; Curriculum Vitae; Specialized Corpus; Supervised Machine Learning Method





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Mathieu Roche (1)
  • Yves Kodratoff (2)
  1. LIRMM – UMR 5506, Université Montpellier 2, Montpellier Cedex 5, France
  2. LRI – UMR 8623, Université Paris-Sud, Orsay Cedex, France
