Abstract
This paper presents the process of identifying recurrent multi-word expressions (i.e. lexical bundles) relevant for a writing aid for academic texts in Spanish. It also proposes a repertoire of discourse functions that enables both the classification of the candidate word strings and their onomasiological retrieval. This classification into discourse functions is also key to selecting candidates, since those that do not fit any function will predictably be of little interest for the writing aid. By examining the resulting data, the study explores the correlation between candidate selection by lexicographers and several association measures proposed in the literature for obtaining high-quality lexical bundles, with a view to assessing the feasibility of automating this process.
Notes
- 1.
- 2.
- 3. The corpus was tokenized with Freeling [16].
- 4. Biber et al. [3] call them “stance expressions”, “referential expressions” and “discourse organizers”, and add a fourth category of “special conversational” functions.
- 5. For the sake of brevity, we do not give all the names of this group of functions.
- 6. A sample of the most frequent bundles selected can be seen in the Appendix, Table 5. For reasons of space, we only give the 35 most frequent ones.
- 7. For this, we used R’s PRROC package [11].
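The evaluation mentioned in the abstract and in note 7 compares rankings produced by association measures with the lexicographers' selection, using precision-recall curves. The following is a minimal Python sketch of that idea, not the PRROC computation used in the study: it ranks candidates by a score, records precision and recall after each cut-off, and summarises the ranking as average precision, which only approximates the area under the precision-recall curve. The function names and the toy scores and labels are illustrative assumptions and do not come from the paper's data.

```python
from typing import List, Sequence, Tuple


def _ranked_labels(scores: Sequence[float], labels: Sequence[int]) -> List[int]:
    """Return the 0/1 labels ordered by descending score.

    Ties keep their input order (Python's sort is stable); a real
    evaluation might handle ties differently.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if sum(labels) == 0:
        raise ValueError("at least one candidate must be labelled as selected")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def precision_recall_points(
    scores: Sequence[float], labels: Sequence[int]
) -> List[Tuple[float, float]]:
    """Return (recall, precision) after each cut-off in the ranking."""
    ranked = _ranked_labels(scores, labels)
    total_selected = sum(ranked)
    points = []
    hits = 0
    for rank, label in enumerate(ranked, start=1):
        hits += label
        points.append((hits / total_selected, hits / rank))
    return points


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of the precision values at the rank of each selected candidate."""
    ranked = _ranked_labels(scores, labels)
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits


if __name__ == "__main__":
    # Toy values only: one association score per candidate bundle and
    # 1 if a (hypothetical) lexicographer selected it, 0 otherwise.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
    toy_labels = [1, 0, 1, 1, 0]
    for recall, precision in precision_recall_points(toy_scores, toy_labels):
        print(f"recall={recall:.2f} precision={precision:.2f}")
    print(f"average precision={average_precision(toy_scores, toy_labels):.3f}")
```

Under these assumptions, a measure whose ranking places the manually selected bundles near the top yields a higher average precision, which is one way to compare measures before attempting to automate the selection.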
References
Alonso-Ramos, M., García-Salido, M., Garcia, M.: Exploiting a corpus to compile a lexical resource for academic writing: Spanish lexical combinations. In: Kosem, I., Tiberius, C., Jakubícek, M., Kallas, J., Krek, S., Baisa, V. (eds.) Electronic Lexicography in the 21st Century. Proceedings of eLex 2017 Conference, pp. 571–586. Lexical Computing CZ, Brno (2017). https://elex.link/elex2017/proceedings-download/
Appel, R., Trofimovich, P.: Transitional probability predicts native and non-native use of formulaic sequences. Int. J. Appl. Linguist. 27, 1–20 (2017). https://doi.org/10.1111/ijal.12100. http://doi.wiley.com/10.1111/ijal.12100
Biber, D., Conrad, S., Cortes, V.: If you look at...: lexical bundles in university teaching and textbooks. Appl. Linguist. 25, 371–405 (2004)
Cortes, V.: Lexical bundles in published and student disciplinary writing: examples from history and biology. Engl. Specif. Purp. 23(4), 397–423 (2004). https://doi.org/10.1016/j.esp.2003.12.001
da Cunha, I., Montané, M.A., Hysa, L.: The arText prototype: an automatic system for writing specialized texts. In: Peñas, A., Martins, A. (eds.) Proceedings of the EACL 2017 Software Demonstrations, pp. 57–60. Association for Computational Linguistics (2017). https://doi.org/10.18653/v1/e17-3015
Durrant, P.: Lexical bundles and disciplinary variation in university students’ writing: mapping the territories. Appl. Linguist. 38(2), 165–193 (2017). https://doi.org/10.1093/applin/amv011. http://applij.oxfordjournals.org/cgi/doi/10.1093/applin/amv011
Evert, S., Uhrig, P., Bartsch, S., Proisl, T.: E-VIEW-alation - a large-scale evaluation study of association measures for collocation identification. In: Kosem, I., Tiberius, C., Jakubícek, M., Kallas, J., Krek, S., Baisa, V. (eds.) Electronic Lexicography in the 21st Century. Proceedings of eLex 2017 Conference, pp. 531–549. Lexical Computing CZ, Brno (2017). https://elex.link/elex2017/proceedings-download/
Ferrero, C.L., Renau, I., Nazar, R., Torner, S.: Computer-assisted revision in Spanish academic texts: peer-assessment. Procedia - Soc. Behav. Sci. 141, 470–483 (2014). https://doi.org/10.1016/j.sbspro.2014.05.083
Grabowski, Ł., Juknevičienė, R.: Towards a refined inventory of lexical bundles: an experiment in the formulex method. Stud. Lang. 29, 58–73 (2017). https://doi.org/10.5755/j01.sal.0.29.15327. http://www.kalbos.ktu.lt/index.php/KStud/article/view/15327
Granger, S., Paquot, M.: Electronic lexicography goes local: design and structures of a needs-driven online academic writing aid. Lexicogr.: Int. Ann. Lexicogr. 31(1), 118–141 (2015). https://doi.org/10.1515/lexi. http://hdl.handle.net/2078.1/166516
Grau, J., Grosse, I., Keilwagen, J.: PRROC: computing and visualizing precision-recall and receiver operating characteristic curves in R. Bioinformatics 31(15), 2595–2597 (2015). https://doi.org/10.1093/bioinformatics/btv153
Hyland, K.: As can be seen: lexical bundles and disciplinary variation. Engl. Specif. Purp. 27(1), 4–21 (2008). https://doi.org/10.1016/j.esp.2007.06.001
Kübler, N., Pecman, M.: The ARTES bilingual LSP dictionary: from collocation to higher order phraseology. In: Electronic Lexicography, pp. 187–210. Oxford University Press, November 2012. https://doi.org/10.1093/acprof:oso/9780199654864.003.0010
Mel’čuk, I.: Clichés, an understudied subclass of phrasemes. Yearb. Phraseol. 6(1), 55–86 (2015). https://doi.org/10.1515/phras-2015-0005. http://www.degruyter.com/view/j/yop.2015.6.issue-1/phras-2015-0005/phras-2015-0005.xml
Montolío, E.: Mecanismos de cohesión (II). Los conectores. In: Montolío, E. (ed.) Manual de escritura académica y profesional, pp. 9–92. Ariel, Barcelona (2014)
Padró, L., Stanilovsky, E.: Freeling 3.0: towards wider multilinguality. In: Calzolari, N., et al. (eds.) Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC2012), pp. 2473–2479. European Language Resources Association (ELRA) (2012). http://dblp.uni-trier.de/db/conf/lrec/lrec2012.html#PadroS12
Pecina, P.: Lexical association measures and collocation extraction. Lang. Resour. Eval. 44(1–2), 137–158 (2010)
Pecina, P., Schlesinger, P.: Combining association measures for collocation extraction. In: Proceedings of the COLING/ACL on Main Conference Poster Sessions, pp. 651–658. Association for Computational Linguistics, July 2006
Pérez-Llantada, C.: Formulaic language in L1 and L2 expert academic writing: convergent and divergent usage. J. Engl. Acad. Purp. 14, 84–94 (2014)
Salazar, D.: Lexical Bundles in Native and Non-native Scientific Writing: Applying a Corpus-based Study to Language Teaching. Studies in Corpus Linguistics. John Benjamins Publishing Company, Amsterdam (2014). https://books.google.es/books?id=9OJ4oAEACAAJ
Simpson-Vlach, R., Ellis, N.C.: An academic formulas list: new methods in phraseology research. Appl. Linguist. 31(4), 487–512 (2010). https://doi.org/10.1093/applin/amp058
Sinclair, J.: Corpus, Concordance, Collocation. Oxford University Press, Oxford (1991). https://doi.org/10.2307/330144
Verdaguer, I., et al.: SciE-Lex. A lexical database. In: Verdaguer, I., Laso, N.J., Salazar, D. (eds.) Biomedical English: A Corpus-Based Approach, pp. 21–38. John Benjamins, Amsterdam/Philadelphia (2013)
Wei, N., Li, J.: A new computing method for extracting contiguous phraseological sequences from academic text corpora. Int. J. Corpus Linguist. 18(4), 506–535 (2013). https://doi.org/10.1075/ijcl.18.4.03wei
Acknowledgments
Supported by Xunta de Galicia, through grant ED481D 2017/009, MINECO, through grant IJCI-2016-29598 and project FFI2016-78299-P, and by a 2017 Leonardo Grant for Researchers and Cultural Creators (BBVA Foundation).
Appendix: Sample of the Selected Most Frequent Bundles
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
García Salido, M., Garcia, M., Alonso-Ramos, M. (2019). Identifying Lexical Bundles for an Academic Writing Assistant in Spanish. In: Corpas Pastor, G., Mitkov, R. (eds) Computational and Corpus-Based Phraseology. EUROPHRAS 2019. Lecture Notes in Computer Science, vol 11755. Springer, Cham. https://doi.org/10.1007/978-3-030-30135-4_11
DOI: https://doi.org/10.1007/978-3-030-30135-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30134-7
Online ISBN: 978-3-030-30135-4
eBook Packages: Computer Science, Computer Science (R0)