Abstract

Creating a corpus involves compiling a principled dataset. Usually, in the case of corpora designed for general purposes, those data must then go through an annotation process in which each item is 'tagged' according to its part of speech (POS). In some cases, an existing tagset is applied to the data; in others, a bespoke tagset is required. Corpora also need an infrastructure to host them, and creating or finding one is another essential element of corpus design. The plan for building CorCenCC was to create these components, together with a semantic tagger (which marks the meaning of the data rather than its part of speech) with its own tagset, as well as a dedicated pedagogical toolkit (Y Tiwtiadur). Decisions about the user-driven infrastructure, and about data collection and processing in particular, present specific challenges in the context of minoritised languages. In this chapter, we outline how the CorCenCC project addressed these challenges.
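To illustrate the two annotation layers described above (a part-of-speech tag and a semantic tag for each token), here is a minimal Python sketch based on simple dictionary lookup. The lexicon, the tag labels and the tokenizer are all hypothetical and far simpler than a real tagger, which must also resolve ambiguity, handle mutation and cope with unknown words. This is not the CorCenCC tagging pipeline or its tagsets.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: token -> (POS tag, semantic tag).
# The tag labels are illustrative only; they are NOT the CorCenCC tagsets.
LEXICON: Dict[str, Tuple[str, str]] = {
    "mae": ("VERB", "EXISTENCE"),
    "'r": ("ART", "GRAMMATICAL"),
    "gath": ("NOUN", "ANIMAL"),  # soft-mutated form of "cath" (cat)
    "yn": ("PART", "GRAMMATICAL"),
    "cysgu": ("VERBNOUN", "BODILY_STATE"),
}

# Fallback pair for tokens missing from the lexicon.
UNKNOWN: Tuple[str, str] = ("UNK", "UNK")


def tokenize(text: str) -> List[str]:
    """Naive tokenizer: lowercases, splits on whitespace and detaches the
    elided article "'r" (e.g. "Mae'r" -> "mae", "'r")."""
    tokens: List[str] = []
    for word in text.lower().split():
        if word.endswith("'r") and len(word) > 2:
            tokens.extend([word[:-2], "'r"])
        else:
            tokens.append(word)
    return tokens


def annotate(text: str) -> List[Tuple[str, str, str]]:
    """Return one (token, POS tag, semantic tag) triple per token."""
    return [(tok, *LEXICON.get(tok, UNKNOWN)) for tok in tokenize(text)]


if __name__ == "__main__":
    # "Mae'r gath yn cysgu" = "The cat is sleeping"
    for token, pos, sem in annotate("Mae'r gath yn cysgu"):
        print(f"{token}\t{pos}\t{sem}")
```

Keeping both tags on the same token makes the point that semantic annotation is a second layer over the same segmentation, not a replacement for POS tagging.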

Author information

Corresponding author

Correspondence to Dawn Knight .

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Knight, D., Morris, S., Fitzpatrick, T. (2021). 2.3 Cynllunio Corpws Cenedlaethol mewn Iaith Leiafrifoledig [Planning a National Corpus in a Minoritised Language]. In: Corpus Design and Construction in Minoritised Language Contexts - Cynllunio a Chreu Corpws mewn Cyd-destunau Ieithoedd Lleiafrifoledig. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-030-72484-9_8

  • DOI: https://doi.org/10.1007/978-3-030-72484-9_8

  • Published:

  • Publisher Name: Palgrave Macmillan, Cham

  • Print ISBN: 978-3-030-72483-2

  • Online ISBN: 978-3-030-72484-9

  • eBook Packages: Social Sciences, Social Sciences (R0)