Abstract
Creating a corpus involves collecting a dataset on a principled basis. Typically, in the case of corpora designed for general purposes, that data must then go through an annotation process in which each item is 'tagged' according to its part of speech (POS). In some cases an existing tagset is applied to the data; in others a bespoke tagset is required. Corpora also need an infrastructure to host them, and creating or sourcing this is another essential element of corpus design. The plan for building CorCenCC was to create these components, together with a semantic tagger (to denote the meaning of the data rather than its part of speech) with its own tagset, as well as a bespoke pedagogical toolkit (Y Tiwtiadur). Decisions about the user-driven infrastructure, and about data collection and processing in particular, present specific challenges in the context of minoritised languages. In this chapter we outline how the CorCenCC project addressed these challenges.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Knight, D., Morris, S., Fitzpatrick, T. (2021). 2.3 Cynllunio Corpws Cenedlaethol mewn Iaith Leiafrifoledig. In: Corpus Design and Construction in Minoritised Language Contexts - Cynllunio a Chreu Corpws mewn Cyd-destunau Ieithoedd Lleiafrifoledig. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-030-72484-9_8
DOI: https://doi.org/10.1007/978-3-030-72484-9_8
Published:
Publisher Name: Palgrave Macmillan, Cham
Print ISBN: 978-3-030-72483-2
Online ISBN: 978-3-030-72484-9
eBook Packages: Social Sciences, Social Sciences (R0)