Identifying Lexical Bundles for an Academic Writing Assistant in Spanish

  • Conference paper
Computational and Corpus-Based Phraseology (EUROPHRAS 2019)

Abstract

This paper presents the process of identifying recurrent multi-word expressions (i.e. lexical bundles) relevant to a writing aid for academic texts in Spanish. It also proposes a repertory of discourse functions that enables both the classification of the candidate word strings and their onomasiological retrieval. This classification into discourse functions is also key to selecting candidates, since those that do not fit any function will predictably be of little interest for such a writing aid. Through an examination of the resulting data, the study explores the correlation between lexicographers' selection of candidates and several association measures proposed in the literature for obtaining high-quality lexical bundles, with a view to assessing the feasibility of automating this process.
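The candidate-extraction and scoring step described in the abstract can be illustrated with a minimal Python sketch. The exact thresholds and measures are not given in this section, so several assumptions apply. The corpus is taken to be already tokenized; the paper used FreeLing for this. Candidates are contiguous n-grams filtered by hypothetical frequency and document-range thresholds (`min_freq`, `min_docs`). Each candidate is scored with a generalized pointwise mutual information, which is one common association measure from the literature and not necessarily one of those the authors evaluated.

```python
import math
from collections import Counter


def extract_ngrams(docs, n):
    """Count contiguous n-grams and the number of documents each occurs in."""
    freq = Counter()
    doc_range = Counter()
    for tokens in docs:
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(grams)
        doc_range.update(set(grams))  # count each n-gram once per document
    return freq, doc_range


def candidate_bundles(docs, n, min_freq=2, min_docs=2):
    """Return (n-gram, score) pairs sorted by descending score.

    min_freq and min_docs are hypothetical, illustrative thresholds.
    The score is generalized PMI: log2(p(w1..wn) / (p(w1) * ... * p(wn))).
    """
    unigrams = Counter(tok for tokens in docs for tok in tokens)
    total = sum(unigrams.values())
    freq, doc_range = extract_ngrams(docs, n)
    scored = {}
    for gram, f in freq.items():
        if f < min_freq or doc_range[gram] < min_docs:
            continue
        # Simplification: the token total is used as the denominator for
        # n-gram probabilities as well as for unigram probabilities.
        p_gram = f / total
        p_indep = math.prod(unigrams[w] / total for w in gram)
        scored[gram] = math.log2(p_gram / p_indep)
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    docs = [
        ["en", "primer", "lugar", "se", "analiza"],
        ["en", "primer", "lugar", "cabe", "señalar"],
        ["por", "otra", "parte", "en", "primer", "lugar"],
    ]
    for gram, score in candidate_bundles(docs, n=2):
        print(" ".join(gram), round(score, 3))
```

A ranking like this only produces candidates. In the study described here, lexicographers make the final selection, and the discourse-function classification also acts as a filter.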

Notes

  1. http://www.estilector.com/.

  2. http://sistema-artext.com/.

  3. The corpus was tokenized with FreeLing [16].

  4. In Biber et al. [3], these are called “stance expressions”, “referential expressions” and “discourse organizers”; a fourth category of “special conversational” functions is also added.

  5. For the sake of brevity, we do not list all the function names in this group.

  6. A sample of the most frequent bundles selected can be seen in the Appendix, Table 5. For reasons of space, only the 35 most frequent are given.

  7. For this, we used R’s PRROC package [11].
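Note 7 refers to precision-recall analysis with R's PRROC package. As a rough Python analogue, the sketch below ranks candidates by an association score and computes step-wise average precision against a binary gold standard. A candidate is labeled 1 if lexicographers selected it and 0 otherwise. This is an illustration, not the authors' procedure. Average precision is a common summary of a precision-recall curve, but it is not necessarily identical to the area estimates PRROC computes. Ties in scores are also broken arbitrarily here.

```python
def average_precision(scores, labels):
    """Step-wise average precision of a score ranking against binary labels.

    scores: association-measure values, one per candidate bundle.
    labels: 1 if lexicographers selected the candidate, else 0.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one selected candidate")
    # Rank candidates from highest to lowest score (ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at each recall step
    return total / n_pos


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6]
    labels = [1, 0, 1, 0]
    print(average_precision(scores, labels))
```

A measure that agrees well with the lexicographers' choices should score well above the proportion of selected candidates, which is roughly what a random ranking would obtain.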

References

  1. Alonso-Ramos, M., García-Salido, M., Garcia, M.: Exploiting a corpus to compile a lexical resource for academic writing: Spanish lexical combinations. In: Kosem, I., Tiberius, C., Jakubícek, M., Kallas, J., Krek, S., Baisa, V. (eds.) Electronic Lexicography in the 21st Century. Proceedings of eLex 2017 Conference, pp. 571–586. Lexical Computing CZ, Brno (2017). https://elex.link/elex2017/proceedings-download/

  2. Appel, R., Trofimovich, P.: Transitional probability predicts native and non-native use of formulaic sequences. Int. J. Appl. Linguist. 27, 1–20 (2017). https://doi.org/10.1111/ijal.12100

  3. Biber, D., Conrad, S., Cortes, V.: If you look at...: lexical bundles in university teaching and textbooks. Appl. Linguist. 25, 371–405 (2004)

  4. Cortes, V.: Lexical bundles in published and student disciplinary writing: examples from history and biology. Engl. Specif. Purp. 23(4), 397–423 (2004). https://doi.org/10.1016/j.esp.2003.12.001

  5. da Cunha, I., Montané, M.A., Hysa, L.: The arText prototype: an automatic system for writing specialized texts. In: Peñas, A., Martins, A. (eds.) Proceedings of the EACL 2017 Software Demonstrations, pp. 57–60. Association for Computational Linguistics (2017). https://doi.org/10.18653/v1/e17-3015

  6. Durrant, P.: Lexical bundles and disciplinary variation in university students’ writing: mapping the territories. Appl. Linguist. 38(2), 165–193 (2017). https://doi.org/10.1093/applin/amv011

  7. Evert, S., Uhrig, P., Bartsch, S., Proisl, T.: E-VIEW-alation - a large-scale evaluation study of association measures for collocation identification. In: Kosem, I., Tiberius, C., Jakubícek, M., Kallas, J., Krek, S., Baisa, V. (eds.) Electronic Lexicography in the 21st Century. Proceedings of eLex 2017 Conference, pp. 531–549. Lexical Computing CZ, Brno (2017). https://elex.link/elex2017/proceedings-download/

  8. Ferrero, C.L., Renau, I., Nazar, R., Torner, S.: Computer-assisted revision in Spanish academic texts: peer-assessment. Procedia - Soc. Behav. Sci. 141, 470–483 (2014). https://doi.org/10.1016/j.sbspro.2014.05.083

  9. Grabowski, Ł., Juknevičienė, R.: Towards a refined inventory of lexical bundles: an experiment in the formulex method. Stud. Lang. 29, 58–73 (2017). https://doi.org/10.5755/j01.sal.0.29.15327

  10. Granger, S., Paquot, M.: Electronic lexicography goes local: design and structures of a needs-driven online academic writing aid. Lexicogr.: Int. Ann. Lexicogr. 31(1), 118–141 (2015). https://doi.org/10.1515/lexi. http://hdl.handle.net/2078.1/166516

  11. Grau, J., Grosse, I., Keilwagen, J.: PRROC: computing and visualizing precision-recall and receiver operating characteristic curves in R. Bioinformatics 31(15), 2595–2597 (2015). https://doi.org/10.1093/bioinformatics/btv153

  12. Hyland, K.: As can be seen: lexical bundles and disciplinary variation. Engl. Specif. Purp. 27(1), 4–21 (2008). https://doi.org/10.1016/j.esp.2007.06.001

  13. Kübler, N., Pecman, M.: The ARTES bilingual LSP dictionary: from collocation to higher order phraseology. In: Electronic Lexicography, pp. 187–210. Oxford University Press, November 2012. https://doi.org/10.1093/acprof:oso/9780199654864.003.0010

  14. Mel’čuk, I.: Clichés, an understudied subclass of phrasemes. Yearb. Phraseol. 6(1), 55–86 (2015). https://doi.org/10.1515/phras-2015-0005

  15. Montolío, E.: Mecanismos de cohesión (II). Los conectores. In: Montolío, E. (ed.) Manual de escritura académica y profesional, pp. 9–92. Ariel, Barcelona (2014)

  16. Padró, L., Stanilovsky, E.: FreeLing 3.0: towards wider multilinguality. In: Calzolari, N., et al. (eds.) Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC2012), pp. 2473–2479. European Language Resources Association (ELRA) (2012). http://dblp.uni-trier.de/db/conf/lrec/lrec2012.html#PadroS12

  17. Pecina, P.: Lexical association measures and collocation extraction. Lang. Resour. Eval. 44(1–2), 137–158 (2010)

  18. Pecina, P., Schlesinger, P.: Combining association measures for collocation extraction. In: Proceedings of the COLING/ACL on Main Conference Poster Sessions, pp. 651–658. Association for Computational Linguistics, July 2006

  19. Pérez-Llantada, C.: Formulaic language in L1 and L2 expert academic writing: convergent and divergent usage. J. Engl. Acad. Purp. 14, 84–94 (2014)

  20. Salazar, D.: Lexical Bundles in Native and Non-native Scientific Writing: Applying a Corpus-based Study to Language Teaching. Studies in Corpus Linguistics. John Benjamins Publishing Company, Amsterdam (2014). https://books.google.es/books?id=9OJ4oAEACAAJ

  21. Simpson-Vlach, R., Ellis, N.C.: An academic formulas list: new methods in phraseology research. Appl. Linguist. 31(4), 487–512 (2010). https://doi.org/10.1093/applin/amp058

  22. Sinclair, J.: Corpus, Concordance, Collocation. Oxford University Press, Oxford (1991). https://doi.org/10.2307/330144

  23. Verdaguer, I., et al.: SciE-Lex. A lexical database. In: Verdaguer, I., Laso, N.J., Salazar, D. (eds.) Biomedical English: A Corpus-Based Approach, pp. 21–38. John Benjamins, Amsterdam/Philadelphia (2013)

  24. Wei, N., Li, J.: A new computing method for extracting contiguous phraseological sequences from academic text corpora. Int. J. Corpus Linguist. 18(4), 506–535 (2013). https://doi.org/10.1075/ijcl.18.4.03wei

Acknowledgments

Supported by Xunta de Galicia, through grant ED481D 2017/009, MINECO, through grant IJCI-2016-29598 and project FFI2016-78299-P, and by a 2017 Leonardo Grant for Researchers and Cultural Creators (BBVA Foundation).

Author information

Correspondence to Marcos García Salido, Marcos Garcia or Margarita Alonso-Ramos.

Appendix: Sample of the Selected Most Frequent Bundles

Table 5. Most frequent bundles selected among bi-, tri- and four-grams

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

García Salido, M., Garcia, M., Alonso-Ramos, M. (2019). Identifying Lexical Bundles for an Academic Writing Assistant in Spanish. In: Corpas Pastor, G., Mitkov, R. (eds) Computational and Corpus-Based Phraseology. EUROPHRAS 2019. Lecture Notes in Computer Science, vol 11755. Springer, Cham. https://doi.org/10.1007/978-3-030-30135-4_11

  • DOI: https://doi.org/10.1007/978-3-030-30135-4_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30134-7

  • Online ISBN: 978-3-030-30135-4

  • eBook Packages: Computer Science, Computer Science (R0)
