Automatic Extraction of the Phraseology Through NooJ

  • Conference paper
Formalizing Natural Languages with NooJ and Its Natural Language Processing Applications (NooJ 2017)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 811))

Abstract

To teach nominal phraseological expressions to foreign learners, we must build a corpus from which the desired sequences can be extracted. Modeling and disambiguation are at the heart of this extraction. In this article, we discuss how the two procedures are established and show how they are implemented on data in NooJ. At the end of the article, our quantitative and qualitative analyses show that the extraction results are positive.

Notes

  1. In NooJ, the angle brackets (<>) allow us to find all occurrences of this term and its variants.

Author information

Corresponding author

Correspondence to Tong Yang.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Yang, T. (2018). Automatic Extraction of the Phraseology Through NooJ. In: Mbarki, S., Mourchid, M., Silberztein, M. (eds) Formalizing Natural Languages with NooJ and Its Natural Language Processing Applications. NooJ 2017. Communications in Computer and Information Science, vol 811. Springer, Cham. https://doi.org/10.1007/978-3-319-73420-0_14

  • DOI: https://doi.org/10.1007/978-3-319-73420-0_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73419-4

  • Online ISBN: 978-3-319-73420-0

  • eBook Packages: Computer Science (R0)
