2.3 Designing a National Corpus in a Minoritised Language (Cynllunio Corpws Cenedlaethol mewn Iaith Leiafrifoledig)

Knight, Dawn; Morris, Steve; Fitzpatrick, Tess

doi:10.1007/978-3-030-72484-9_8

Abstract

Creating a corpus involves the principled collection of a dataset and, typically in the case of corpora designed for general purposes, those data must then go through an annotation process in which every item is ‘tagged’ according to its part of speech (POS). In some cases an existing tagset is applied to the data; in others a bespoke tagset is required. Corpora also need an infrastructure to host them, and creating or sourcing this is another essential element of corpus design. The plan for building CorCenCC was to create these components, together with a semantic tagger (to denote the meaning of the data rather than the part of speech) with its own tagset, as well as a bespoke pedagogic toolkit (Y Tiwtiadur). Decisions about the user-driven infrastructure and about data collection and processing, in particular, pose distinctive challenges in the context of minoritised languages. In this chapter we outline how the CorCenCC project addressed these challenges.

Author information

Authors and Affiliations

English, Communication & Philosophy, Cardiff University, Cardiff, UK
Dawn Knight
College of Arts and Humanities, Swansea University, Swansea, UK
Steve Morris & Tess Fitzpatrick

Corresponding author

Correspondence to Dawn Knight.

About this chapter

Cite this chapter

Knight, D., Morris, S., Fitzpatrick, T. (2021). 2.3 Cynllunio Corpws Cenedlaethol mewn Iaith Leiafrifoledig. In: Corpus Design and Construction in Minoritised Language Contexts - Cynllunio a Chreu Corpws mewn Cyd-destunau Ieithoedd Lleiafrifoledig. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-030-72484-9_8

DOI: https://doi.org/10.1007/978-3-030-72484-9_8
Published: 06 July 2021
Publisher Name: Palgrave Macmillan, Cham
Print ISBN: 978-3-030-72483-2
Online ISBN: 978-3-030-72484-9
eBook Packages: Social Sciences (R0)
