Yet Another Ranking Function for Automatic Multiword Term Extraction

  • Juan Antonio Lossio-Ventura
  • Clement Jonquet
  • Mathieu Roche
  • Maguelonne Teisseire
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8686)

Abstract

Term extraction is an essential task in domain knowledge acquisition. We propose two new measures to extract multiword terms from domain-specific text. The first measure combines linguistic and statistical information. The second measure is graph-based and assesses the importance of a multiword term within a domain. Existing measures often address, but do not completely solve, some of the problems related to term extraction, e.g., noise, silence, low frequency, large corpora, and the complexity of the multiword term extraction process. Instead, we focus on managing the entire set of problems, e.g., detecting rare terms and overcoming the low-frequency issue. By comparing the two proposed measures with state-of-the-art reference measures, we show that they outperform the precision results previously reported for automatic multiword term extraction.

Keywords

Ranking Function; Candidate Term; Term Extraction; Linguistic Pattern; Text Mining Approach
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Juan Antonio Lossio-Ventura (1)
  • Clement Jonquet (1)
  • Mathieu Roche (1, 2)
  • Maguelonne Teisseire (1, 2)

  1. University of Montpellier 2, LIRMM, CNRS - Montpellier, France
  2. Irstea, CIRAD, TETIS - Montpellier, France
