Identifying Lexical Bundles for an Academic Writing Assistant in Spanish

García Salido, Marcos; Garcia, Marcos; Alonso-Ramos, Margarita

doi:10.1007/978-3-030-30135-4_11

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11755)

Included in the following conference series:

International Conference on Computational and Corpus-Based Phraseology

Abstract

This paper presents the process of identifying recurrent multi-word expressions (i.e. lexical bundles) relevant to a writing aid for academic texts in Spanish. It also proposes a repertory of discourse functions that enables both the classification of the candidate word strings and their onomasiological retrieval. This classification into discourse functions is also key to candidate selection, since strings that do not fit any function will predictably be of little interest for the writing aid. By examining the resulting data, the study explores the correlation between candidate selection by lexicographers and several association measures proposed in the literature for obtaining high-quality lexical bundles, with a view to assessing the feasibility of automating this process.
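
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the association measure used here (a multi-word generalisation of pointwise mutual information), the evaluation by average precision, the function names, the frequency threshold, and the toy token list are all assumptions made for illustration. The idea is to extract contiguous n-grams from a tokenized corpus, score them with an association measure, and check how well the resulting ranking agrees with a set of bundles selected by hand.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def pmi_scores(tokens, n, min_freq=2):
    """Score n-grams with a multi-word generalisation of PMI (an assumption):

        score = log2( P(w1..wn) / (P(w1) * ... * P(wn)) )

    N-grams occurring fewer than `min_freq` times are discarded.
    """
    unigram_counts = Counter(tokens)
    ngram_list = ngrams(tokens, n)
    ngram_counts = Counter(ngram_list)
    total_tokens = len(tokens)
    total_ngrams = len(ngram_list)
    scores = {}
    for gram, freq in ngram_counts.items():
        if freq < min_freq:
            continue
        p_gram = freq / total_ngrams
        # Probability of the same words co-occurring if they were independent.
        p_independent = math.prod(unigram_counts[w] / total_tokens for w in gram)
        scores[gram] = math.log2(p_gram / p_independent)
    return scores


def average_precision(ranked, selected):
    """Average precision of a ranked candidate list against a hand-picked set."""
    hits = 0
    precision_sum = 0.0
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in selected:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(selected) if selected else 0.0


if __name__ == "__main__":
    # Toy token list, invented for illustration only.
    tokens = ("por otro lado los datos muestran que los resultados "
              "por otro lado los datos indican que").split()
    scores = pmi_scores(tokens, n=3)
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Hypothetical manual selection standing in for the lexicographers' choice.
    selected = {("por", "otro", "lado")}
    print(ranked)
    print(average_precision(ranked, selected))
```

A higher average precision means that the measure places the manually selected bundles nearer the top of the candidate list, which is one way of gauging whether the measure could stand in for manual selection.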

Notes

1. http://www.estilector.com/.
2. http://sistema-artext.com/.
3. The corpus was tokenized with Freeling [16].
4. Biber et al. [3] call them “stance expressions”, “referential expressions” and “discourse organizers”, and add a fourth category of “special conversational” functions.
5. For the sake of brevity, we do not give all the names of this group of functions.
6. A sample of the most frequent bundles selected can be seen in the Appendix, Table 5. For reasons of space, we only give the 35 most frequent ones.
7. For this, we used R’s PRROC package [11].

References

1. Alonso-Ramos, M., García-Salido, M., Garcia, M.: Exploiting a corpus to compile a lexical resource for academic writing: Spanish lexical combinations. In: Kosem, I., Tiberius, C., Jakubíček, M., Kallas, J., Krek, S., Baisa, V. (eds.) Electronic Lexicography in the 21st Century. Proceedings of eLex 2017 Conference, pp. 571–586. Lexical Computing CZ, Brno (2017). https://elex.link/elex2017/proceedings-download/
2. Appel, R., Trofimovich, P.: Transitional probability predicts native and non-native use of formulaic sequences. Int. J. Appl. Linguist. 27, 1–20 (2017). https://doi.org/10.1111/ijal.12100. http://doi.wiley.com/10.1111/ijal.12100
3. Biber, D., Conrad, S., Cortes, V.: If you look at...: lexical bundles in university teaching and textbooks. Appl. Linguist. 25, 371–405 (2004)
4. Cortes, V.: Lexical bundles in published and student disciplinary writing: examples from history and biology. Engl. Specif. Purp. 23(4), 397–423 (2004). https://doi.org/10.1016/j.esp.2003.12.001
5. da Cunha, I., Montané, M.A., Hysa, L.: The arText prototype: an automatic system for writing specialized texts. In: Peñas, A., Martins, A. (eds.) Proceedings of the EACL 2017 Software Demonstrations, pp. 57–60. Association for Computational Linguistics (2017). https://doi.org/10.18653/v1/e17-3015
6. Durrant, P.: Lexical bundles and disciplinary variation in university students’ writing: mapping the territories. Appl. Linguist. 38(2), 165–193 (2017). https://doi.org/10.1093/applin/amv011. http://applij.oxfordjournals.org/cgi/doi/10.1093/applin/amv011
7. Evert, S., Uhrig, P., Bartsch, S., Proisl, T.: E-VIEW-alation - a large-scale evaluation study of association measures for collocation identification. In: Kosem, I., Tiberius, C., Jakubíček, M., Kallas, J., Krek, S., Baisa, V. (eds.) Electronic Lexicography in the 21st Century. Proceedings of eLex 2017 Conference, pp. 531–549. Lexical Computing CZ, Brno (2017). https://elex.link/elex2017/proceedings-download/
8. Ferrero, C.L., Renau, I., Nazar, R., Torner, S.: Computer-assisted revision in Spanish academic texts: peer-assessment. Procedia - Soc. Behav. Sci. 141, 470–483 (2014). https://doi.org/10.1016/j.sbspro.2014.05.083
9. Grabowski, Ł., Juknevičiené, R.: Towards a refined inventory of lexical bundles: an experiment in the formulex method. Stud. Lang. 29, 58–73 (2017). https://doi.org/10.5755/j01.sal.0.29.15327. http://www.kalbos.ktu.lt/index.php/KStud/article/view/15327
10. Granger, S., Paquot, M.: Electronic lexicography goes local: design and structures of a needs-driven online academic writing aid. Lexicogr.: Int. Ann. Lexicogr. 31(1), 118–141 (2015). https://doi.org/10.1515/lexi. http://hdl.handle.net/2078.1/166516
11. Grau, J., Grosse, I., Keilwagen, J.: PRROC: computing and visualizing precision-recall and receiver operating characteristic curves in R. Bioinformatics 31(15), 2595–2597 (2015). https://doi.org/10.1093/bioinformatics/btv153
12. Hyland, K.: As can be seen: lexical bundles and disciplinary variation. Engl. Specif. Purp. 27(1), 4–21 (2008). https://doi.org/10.1016/j.esp.2007.06.001
13. Kübler, N., Pecman, M.: The ARTES bilingual LSP dictionary: from collocation to higher order phraseology. In: Electronic Lexicography, pp. 187–210. Oxford University Press, November 2012. https://doi.org/10.1093/acprof:oso/9780199654864.003.0010
14. Mel’čuk, I.: Clichés, an understudied subclass of phrasemes. Yearb. Phraseol. 6(1), 55–86 (2015). https://doi.org/10.1515/phras-2015-0005. http://www.degruyter.com/view/j/yop.2015.6.issue-1/phras-2015-0005/phras-2015-0005.xml
15. Montolío, E.: Mecanismos de cohesión (II). Los conectores. In: Montolío, E. (ed.) Manual de escritura académica y profesional, pp. 9–92. Ariel, Barcelona (2014)
16. Padró, L., Stanilovsky, E.: Freeling 3.0: towards wider multilinguality. In: Calzolari, N., et al. (eds.) Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC2012), pp. 2473–2479. European Language Resources Association (ELRA) (2012). http://dblp.uni-trier.de/db/conf/lrec/lrec2012.html#PadroS12
17. Pecina, P.: Lexical association measures and collocation extraction. Lang. Resour. Eval. 44(1–2), 137–158 (2010)
18. Pecina, P., Schlesinger, P.: Combining association measures for collocation extraction. In: Proceedings of the COLING/ACL on Main Conference Poster Sessions, pp. 651–658. Association for Computational Linguistics, July 2006
19. Pérez-Llantada, C.: Formulaic language in L1 and L2 expert academic writing: convergent and divergent usage. J. Engl. Acad. Purp. 14, 84–94 (2014)
20. Salazar, D.: Lexical Bundles in Native and Non-native Scientific Writing: Applying a Corpus-based Study to Language Teaching. Studies in Corpus Linguistics. John Benjamins Publishing Company, Amsterdam (2014). https://books.google.es/books?id=9OJ4oAEACAAJ
21. Simpson-Vlach, R., Ellis, N.C.: An academic formulas list: new methods in phraseology research. Appl. Linguist. 31(4), 487–512 (2010). https://doi.org/10.1093/applin/amp058
22. Sinclair, J.: Corpus, Concordance, Collocation. Oxford University Press, Oxford (1991). https://doi.org/10.2307/330144
23. Verdaguer, I., et al.: SciE-Lex. A lexical database. In: Verdaguer, I., Laso, N.J., Salazar, D. (eds.) Biomedical English: A Corpus-Based Approach, pp. 21–38. John Benjamins, Amsterdam/Philadelphia (2013)
24. Wei, N., Li, J.: A new computing method for extracting contiguous phraseological sequences from academic text corpora. Int. J. Corpus Linguist. 18(4), 506–535 (2013). https://doi.org/10.1075/ijcl.18.4.03wei

Acknowledgments

Supported by Xunta de Galicia, through grant ED481D 2017/009, MINECO, through grant IJCI-2016-29598 and project FFI2016-78299-P, and by a 2017 Leonardo Grant for Researchers and Cultural Creators (BBVA Foundation).

Author information

Authors and Affiliations

Grupo LyS, Departamento de Letras, Fac. de Filoloxía, Universidade da Coruña, Campus da Zapateira, 15071, A Coruña, Spain
Marcos García Salido, Marcos Garcia & Margarita Alonso-Ramos
CITIC, Universidade da Coruña, Campus de Elviña, 15071, A Coruña, Spain
Marcos Garcia & Margarita Alonso-Ramos

Corresponding authors

Correspondence to Marcos García Salido, Marcos Garcia or Margarita Alonso-Ramos.

Editor information

Editors and Affiliations

University of Malaga, Malaga, Spain
Gloria Corpas Pastor
University of Wolverhampton, Wolverhampton, UK
Ruslan Mitkov

Appendix: Sample of the Selected Most Frequent Bundles

Table 5. Most frequent bundles selected among bi-, tri- and four-grams

About this paper

Cite this paper

García Salido, M., Garcia, M., Alonso-Ramos, M. (2019). Identifying Lexical Bundles for an Academic Writing Assistant in Spanish. In: Corpas Pastor, G., Mitkov, R. (eds) Computational and Corpus-Based Phraseology. EUROPHRAS 2019. Lecture Notes in Computer Science, vol 11755. Springer, Cham. https://doi.org/10.1007/978-3-030-30135-4_11

DOI: https://doi.org/10.1007/978-3-030-30135-4_11
Published: 18 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30134-7
Online ISBN: 978-3-030-30135-4
eBook Packages: Computer Science, Computer Science (R0)
