Identifying Lexical Bundles for an Academic Writing Assistant in Spanish

  • Marcos García Salido
  • Marcos Garcia
  • Margarita Alonso-Ramos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11755)


This paper presents the process of identifying recurrent multi-word expressions (i.e. lexical bundles) relevant for a writing aid for academic texts in Spanish. It also proposes a repertory of discourse functions that enables both the classification of the candidate word strings and their onomasiological retrieval. This classification into discourse functions is also key to candidate selection, since candidates that do not fit any function will predictably be of little interest for this writing aid. By examining the resulting data, the study explores the correlation between candidate selection by lexicographers and several association measures proposed in the literature for obtaining high-quality lexical bundles, with a view to assessing the feasibility of automating this process.
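As a rough illustration of what corpus-based extraction of lexical bundle candidates can look like, the following minimal sketch counts contiguous word n-grams, keeps those that meet a frequency threshold and a range (number of texts) threshold, and ranks them by a PMI-style association score. The function names, the whitespace tokenizer, the thresholds, and the choice of score are assumptions made for this example only; the abstract does not name the association measures or cut-offs used in the study.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bundle_candidates(docs, n=3, min_freq=2, min_docs=2):
    """Hypothetical helper: list n-gram candidates with frequency,
    range (number of texts), and a PMI-style score.

    Assumptions (not taken from the paper): lowercased whitespace
    tokenization, illustrative thresholds, and a score defined as
    log2(P(ngram) / product of P(word)).
    """
    freq = Counter()       # total occurrences of each n-gram
    doc_range = Counter()  # number of texts containing each n-gram
    unigrams = Counter()   # word frequencies for the independence baseline

    for doc in docs:
        tokens = doc.lower().split()
        unigrams.update(tokens)
        grams = ngrams(tokens, n)
        freq.update(grams)
        doc_range.update(set(grams))  # count each text at most once

    total_tokens = sum(unigrams.values())
    total_grams = sum(freq.values())

    results = []
    for gram, f in freq.items():
        if f < min_freq or doc_range[gram] < min_docs:
            continue
        p_gram = f / total_grams
        p_indep = math.prod(unigrams[w] / total_tokens for w in gram)
        score = math.log2(p_gram / p_indep)
        results.append((" ".join(gram), f, doc_range[gram], score))

    # Highest association score first
    return sorted(results, key=lambda r: r[3], reverse=True)
```

Calling `bundle_candidates` on a list of text strings returns tuples of the form (n-gram, frequency, range, score). A ranked list of this kind is what could then be compared against the lexicographers' manual selection.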


Keywords: Lexical bundles; Academic writing; Discourse functions



Supported by Xunta de Galicia, through grant ED481D 2017/009, MINECO, through grant IJCI-2016-29598 and project FFI2016-78299-P, and by a 2017 Leonardo Grant for Researchers and Cultural Creators (BBVA Foundation).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Grupo LyS, Departamento de Letras, Fac. de Filoloxía, Universidade da Coruña, A Coruña, Spain
  2. CITIC, Universidade da Coruña, A Coruña, Spain
