Initiatives of Digital Humanities in Cantonese Studies: A Corpus of Mid-Twentieth-Century Hong Kong Cantonese

  • Andy Chi-on Chin
Part of the Digital Culture and Humanities book series (DICUHU, volume 1)


This paper reports on one of the new digital humanities initiatives in Cantonese studies undertaken in the author’s department at the Education University of Hong Kong: A Corpus of Mid-twentieth-century Hong Kong Cantonese. The corpus data were collected by transcribing part of the dialogue of Cantonese movies produced between the 1950s and the 1970s. The corpus was built in two phases. This paper focuses on the second phase, which, unlike the first, includes lexical semantic information and media technology. These additions enable users to undertake Cantonese linguistic studies that go beyond the traditional approaches, in areas such as discourse and pragmatics, multimodality, and ontology.


Digital Humanities; Corpus of mid-twentieth-century Hong Kong Cantonese; Cantonese movies; Cantonese linguistics research


General Note and Acknowledgment

The construction of The Corpus of Mid-twentieth-century Hong Kong Cantonese (phases I and II) was supported by four research grants: Spoken Corpus construction and linguistic analysis of mid-twentieth-century Cantonese (Internal Research Grant, The Hong Kong Institute of Education, Project No.: RG41/2010–2011), a preliminary linguistic analysis of mid-twentieth-century Cantonese from a corpus-based approach (Internal Research Grant, The Hong Kong Institute of Education, Project No.: RG62/12-13R), linguistic analysis of mid-twentieth-century Hong Kong Cantonese by constructing an annotated spoken corpus (Early Career Scheme, Research Grants Council, Hong Kong SAR Government, Project No.: ECS859713), and initiatives in digital humanities (Central Reserve for Strategic Development, The Education University of Hong Kong). The author would like to acknowledge the following colleagues (in alphabetical order of last names) for their advice and assistance in the development of the corpus and the relevant tools: Dicky Cheung, Hintat Cheung, William Chong, Ka Po Chow, Calvin Lai, Yick Sun Lam, Tin Yau Lau, Chung-sum Leung, Tin King Lo, Wing Ng, Lili Ou, Chris Sun, Cat Tang, Crono Tse, Benjamin Tsou, Alistair Tweed, Byron Wong, and Tak-sum Wong.



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. The Education University of Hong Kong, Hong Kong, China
