Taming Digital Texts, Voices and Images for the Wild: Models and Methods for Handling Unconventional Corpora to Engage the Public

Creating and Digitizing Language Corpora

Abstract

This volume is the third in a series of books published by Palgrave Macmillan which focus on establishing guidelines for the creation and digitization of language corpora that are unconventional in some respect (see Beal et al. 2007a, b). Volume 3 is dedicated to the issue of public engagement and questions of how linguists can and should make their corpora accessible for a broader range of uses and to a wider audience. Although in this regard the road to building a corpus is often paved with good intentions, as Rickford (1993: 130) observes, these are frequently overtaken by ‘the less escapable commitments’ of teaching and further research. While this may be understandable, it is ‘not a picture, when we step back and view it, with which we can be proud’, since it means that ‘[m]ost of us fall short of paying our debts to the communities whose data have helped to build and advance our careers’ (Rickford 1993: 130). The importance of taking public engagement initiatives more seriously has generated considerable recent scholarly debate (especially amongst researchers in the arts, humanities and social sciences) as the so-called ‘impact agenda’ has taken hold particularly, though not exclusively, in UK higher education institutions (Lawson and Sayers 2016; Martin 2011; Samuel and Derrick 2015). A key objective of this volume is to examine the evidence for the view that despite the new requirements by funding bodies (and ultimately governments) that corpora should have a dual purpose as data that is deployable for engagement as well as research, twenty-first-century corpus linguists who do just that are not following conventional practices within their discipline. A second goal is to demonstrate how the issues that purportedly stand in the way of developing what one might term ‘impactful corpora’ can be circumvented (as our contributors have done) with a little ingenuity and motivation. 
Another objective is to sketch what we consider to be best practices in creating corpora for public engagement by offering guidance on optimal methods by which such data (audio, text and still/moving images) can be created, digitized and subsequently exploited for public engagement projects.

The term ‘unconventional’ here relates to the distinction first articulated in Beal et al. (2007a, b) between large-scale standardized or conventional corpora like the International Corpus of English or COBUILD and smaller more specialized databases. These are often not devised at the outset as corpora strictly speaking since they initially arise from sociolinguistically oriented projects, but such resources can indeed be used as such providing they are ‘tamed’ in particular ways (Beal et al. 2007a: 1). See also D’Arcy (2011: 54–6) and Kendall (2011: 362–3).

Notes

  1. The introduction by Lawson and Sayers (2016) to their Routledge volume, which explores the possibilities for combining sociolinguistic research with the impact agenda, offers an excellent historical overview of how this ideology developed and its implications for scholarship from the 1980s to the present day.

  2. Often in the sense that there is no intention to share it with other scholars, let alone members of the public. On the distinction between public and private corpora, see also Bauer (2004) and D’Arcy (2011: 51–6).

  3. This can be gathered from the published guidelines: ‘an effect on, change or benefit to the economy, society, culture, public policy or services, health, the environment or quality of life, beyond academia’ (Research Excellence Framework 2011: 26).

  4. See also: http://www.rcuk.ac.uk/research/openaccess.

  5. See: http://www.w3.org.

  6. In the North American sense of non-fee-paying, state-funded educational settings.

References

Books and Articles

  • Allen, Will, Joan C. Beal, Karen P. Corrigan, Hermann L. Moisl, and Warren Maguire. 2007. The Newcastle Electronic Corpus of Tyneside English. In Creating and Digitizing Language Corpora: Vol. 2, Diachronic Databases, eds. Joan C. Beal, Karen P. Corrigan, and Hermann L. Moisl, 16–48. Basingstoke: Palgrave Macmillan.

  • Bauer, Laurie. 2004. Inferring variation and change from public corpora. In The Handbook of Language Variation and Change, 1 edn, eds. J.K. Chambers, and Natalie Schilling, 97–114. Malden: Blackwell.

  • Beal, Joan C., Karen P. Corrigan, and Hermann L. Moisl, eds. 2007a. Creating and Digitizing Language Corpora: Vol. 1, Synchronic Databases. Basingstoke: Palgrave Macmillan.

  • ———, eds. 2007b. Creating and Digitizing Language Corpora: Vol. 2, Diachronic Databases. Basingstoke: Palgrave Macmillan.

  • Beal, Joan C., and Karen P. Corrigan. 2013. Working with unconventional existing data resources. In Data Collection in Sociolinguistics: Methods and Applications, eds. Becky Childs, Christine Mallinson, and Gerard van Herk, 213–216. London: Routledge.

  • Beal, Joan C., Karen P. Corrigan, Adam J. Mearns, and Hermann L. Moisl. 2014. The Diachronic Electronic Corpus of Tyneside English: annotation and dissemination practices. In The Oxford Handbook of Corpus Phonology, eds. Jacques Durand, Ulrike Gut, and Gjert Kristoffersen, 517–533. Oxford: Oxford University Press.

  • Cameron, Deborah, Elizabeth Frazer, Penelope Harvey, Ben Rampton, and Kay Richardson. 1997. Ethics, advocacy and empowerment in researching language. In Sociolinguistics, eds. Nikolas Coupland, and Adam Jaworski, 145–162. Houndmills: Macmillan. (Originally published in Language and Communication 13(2): 81–94 in 1993.)

  • Childs, Becky, Gerard van Herk, and Jennifer Thorburn. 2011. Safe harbour: ethics and accessibility in sociolinguistic corpus building. Corpus Linguistics and Linguistic Theory 7(1): 163–180.

  • Choudrie, Jyoti, Susan Grey, and Nicholas Tsitsianis. 2010. Evaluating the digital divide: the Silver Surfer’s perspective. Electronic Government, An International Journal 7(2): 148–167.

  • Corrigan, Karen P., Adam J. Mearns, and Hermann L. Moisl. 2013. Data-mining the DECTE Corpus: phonological and morphological variability in Tyneside English. In Cross-Linguistic and Language-Internal Variation in Text and Speech, eds. Benedikt Szmrecsanyi, and Bernhard Wälchli, 113–149. Berlin: Walter de Gruyter.

  • D’Arcy, Alexandra. 2011. Corpora: capturing language in use. In Analysing Variation in English, eds. Warren Maguire, and April McMahon, 49–71. Cambridge: Cambridge University Press.

  • Day, Timothy. 2001. The National Sound Archive: the first fifty years. In Aural History: Essays on Recorded Sound, ed. Andy Linehan, 41–64. London: The British Library.

  • Durand, Jacques, Ulrike Gut, and Gjert Kristoffersen, eds. 2014. The Oxford Handbook of Corpus Phonology. Oxford: Oxford University Press.

  • Kendall, Tyler. 2007. The Sociolinguistic Archive and Analysis Project: empowering the sociolinguistic archive. Penn Working Papers in Linguistics 13(2): 15–26.

  • ———. 2008. On the history and future of sociolinguistic data. Language and Linguistics Compass 2(2): 332–351.

  • ———. 2011. Corpora from a sociolinguistic perspective. Revista Brasileira de Linguística Aplicada 11(2): 361–389.

  • Kretzschmar, William A., Jean Anderson, Joan C. Beal, Karen P. Corrigan, Lisa-Lena Opas-Hänninen, and Bartek Plichta. 2006. Collaboration on corpora for regional and social analysis. Journal of English Linguistics 34(3): 172–205.

  • Labov, William. 1982. Objectivity and commitment in linguistic science. Language in Society 11: 165–201.

  • Lawson, Robert, and Dave Sayers. 2016a. Introduction. In Sociolinguistic Research: Application and Impact, eds. Robert Lawson, and Dave Sayers, 1–6. London: Routledge.

  • Lawson, Robert, and Dave Sayers. 2016b. Where we’re going, we don’t need roads: the past, present, and future of impact. In Sociolinguistic Research: Application and Impact, eds. Robert Lawson, and Dave Sayers, 7–22. London: Routledge.

  • Martin, Ben R. 2011. The Research Excellence Framework and the ‘impact agenda’: are we creating a Frankenstein monster? Research Evaluation 20(3): 247–254.

  • Norris, Pippa. 2001. Digital Divide: Civic Engagement, Information Poverty and the Internet in Democratic Societies. New York: Cambridge University Press.

  • Perks, Robert P. 2011. Messiah with a microphone? Oral historians, technologies and sound archives. In The Oxford Handbook of Oral History, ed. Donald A. Ritchie, 315–332. Oxford: Oxford University Press.

  • Reaser, Jeffrey, and Carolyn Temple Adger. 2007. Developing language awareness materials for non-linguists: lessons learned from the Do You Speak American? project. Language and Linguistics Compass 1(3): 155–167.

  • Rickford, John. 1993. Comments on ‘ethics, advocacy and empowerment’. Language and Communication 13(2): 129–131.

  • Robertson, Beth M. 2011. The archival imperative: can oral history survive the funding crisis in archival institutions? In The Oxford Handbook of Oral History, ed. Donald A. Ritchie, 393–408. Oxford: Oxford University Press.

  • Rowlands, Ian, David Nicholas, Peter Williams, Paul Huntington, Maggie Fieldhouse, Barrie Gunter, Richard Withey, Hamid R. Jamali, Tom Dobrowolski, and Carol Tenopir. 2008. The Google Generation: the information behaviour of the researcher of the future. Aslib Proceedings 60(4): 290–310.

  • Samuel, Gabrielle N., and Gemma E. Derrick. 2015. Societal impact evaluation: exploring evaluator perceptions of the characterization of impact under the REF2014. Research Evaluation 24: 229–241.

  • Smith, Abby, David Allen, and Karen Allen. 2004. Survey of the State of Audio Collections in Academic Libraries. Washington, DC: Council on Library and Information Resources.

  • Wolfram, Walt. 1993. Ethical considerations in language awareness programs. Issues in Applied Linguistics 4: 225–255.

  • ———. 2012. In the profession: connecting with the public. Journal of English Linguistics 40(1): 111–117.

  • ———. 2013. Community, commitment and responsibility. In The Handbook of Language Variation and Change, 2 edn, eds. J.K. Chambers, and Natalie Schilling, 555–576. Malden: Wiley/Blackwell.

  • ———. 2016. Public sociolinguistic education in the United States: a proactive, comprehensive program. In Sociolinguistic Research: Application and Impact, eds. Robert Lawson, and Dave Sayers, 87–108. London: Routledge.

  • Wolfram, Walt, Jeffrey Reaser, and Charlotte Vaughan. 2008. Operationalizing linguistic gratuity: from principle to practice. Language and Linguistics Compass 2(6): 1109–1134.

Copyright information

© 2016 The Author(s)

Cite this chapter

Corrigan, K.P., Mearns, A. (2016). Taming Digital Texts, Voices and Images for the Wild: Models and Methods for Handling Unconventional Corpora to Engage the Public. In: Corrigan, K., Mearns, A. (eds) Creating and Digitizing Language Corpora. Palgrave Macmillan, London. https://doi.org/10.1057/978-1-137-38645-8_1

  • DOI: https://doi.org/10.1057/978-1-137-38645-8_1

  • Publisher Name: Palgrave Macmillan, London

  • Print ISBN: 978-1-137-38644-1

  • Online ISBN: 978-1-137-38645-8

  • eBook Packages: Social Sciences (R0)
