Abstract
This paper reports on one of the new digital humanities initiatives in Cantonese studies undertaken in the author’s department at the Education University of Hong Kong: A Corpus of Mid-twentieth-century Hong Kong Cantonese. The corpus data were collected by transcribing part of the dialogue of Cantonese movies produced between the 1950s and the 1970s. The corpus was built in two phases. This paper focuses on the second phase, which, unlike the first, includes lexical semantic information and media technology that enable users to undertake Cantonese linguistic studies beyond the traditional approaches, in areas such as discourse and pragmatics, multimodality, and ontology.
Notes
- 1.
- 2.
Trilingualism and biliteracy focus on the spoken varieties and the written varieties respectively. The former refers to the ability and proficiency in Cantonese, English, and Putonghua, while the latter refers to Chinese (i.e., Modern Standard Chinese) and English.
- 3.
For a summary of the studies that made use of these earlier materials, see Chin (2011).
- 4.
Apparent-time and real-time approaches are used to investigate language change in sociolinguistic studies. The former compares “the speech of older speakers with that of younger speakers in a given community” and assumes that “differences between them are due to changes currently taking place within the dialect of that community” (Trudgill 2003, p. 9). On the other hand, the real-time approach studies “language changes as they happen … by investigating the speech of a particular community and then returning a number of years later to investigate how speech in this community has changed” (p. 109).
- 5.
There are in fact extant publications on Cantonese prior to Morrison (1828). In 1815, Morrison published A Grammar of the Chinese Language (Morrison 1815), which contains a section on Cantonese. However, the section runs to only a few pages, and the data are not sufficient for a detailed examination of Cantonese prior to 1828. Other pre-nineteenth-century works include 二荷花史 (er hehua shi), 花箋記 (hua jian ji), and 粵謳 (yue ou). Although their exact publication dates are uncertain, the first two could be dated back to the sixteenth or the seventeenth centuries, while the third one was likely published in the nineteenth century. These three works are usually categorized as Cantonese wooden fish songs (木魚歌), which are “…printed and hand-written texts of oral and performance-related prosimetric narratives” (Bender 2001, pp. 1025–26). The language in these wooden fish songs is considered colloquial and reflects “…a more informal style of spoken Cantonese” (Chan 2005, p. 4). Since these are among the very few extant works produced in the sixteenth and seventeenth centuries, they may not allow us to trace the development of Cantonese across time.
- 6.
This word has no entry in any modern Cantonese dictionary. The author informally consulted some senior speakers of Cantonese, and none of them recalled this item. It has been suggested that chicken tea could be a thick and salty extract of chicken similar to Bovril.
- 7.
The URLs of these corpora are provided below (accessed on March 16, 2018).
Cantonese corpus | URL |
---|---|
Hong Kong Cantonese Child Language Corpus | |
Hong Kong Bilingual Child Language Corpus | |
Hong Kong Cantonese Corpus | |
Hong Kong Cantonese Adult Corpus | The corpus is not available online |
PolyU Corpus of Spoken Chinese (Cantonese) | |
A Parallel Corpus of Spoken Cantonese and Written Chinese | The corpus is not available online |
Early Cantonese Colloquial Texts: A Database | |
Early Cantonese Tagged Database | |
- 8.
This Cantonese corpus was developed by Professor Luke Kang Kwong at the University of Hong Kong in the 1990s. Professor Luke is now working at the Nanyang Technological University (NTU), Singapore, and the corpus is now hosted at NTU.
- 9.
See Yiu, C. (December, 2012). Zaoqi yueyu biaozhu yuliaoku: jiankou he yingyong [The early Cantonese tagged database]. Paper presented at the 17th International Conference on Yue Dialects, Guangzhou.
- 10.
In 2008, the Cantonese Cinema Study Association (香港粵語片研究會) was set up with the aim of reaffirming the status of Cantonese movies as an important legacy and asset in Hong Kong history, but its focus remains on the cultural aspect (https://www.facebook.com/groups/CCSAHK/). The potential value of these movies for linguistic analysis, however, is not mentioned.
- 11.
There are only a few studies such as Lee and Hsu (2005) and Lau and Siu (2010), but these studies do not use the corpus approach. There are other studies using TV dramas or radio programs such as Chan (2006) and Leung (2005).
The study by Lee and Hsu (December, 2005) refers to: Lee, H-K. and Hsu, T-P. (2005). Wu, liushi niandai xianggang yueyu dianying yuyan yanjiu – yi yuqici ‘ze’ ‘zek’ weili [A study of Hong Kong Cantonese in movies of 1950s and 1960s with special reference to ze and zek]. Paper presented at the 10th International Conference on Yue Dialects, Hong Kong.
The study by Lau and Siu (December, 2010) refers to: Lau, C-F. and Siu, P-S. (2010). Xianggang yuyan bianhua de tantao: touguo liushi niandai yueyu dianying bijiao jinxi yueyu yuyin [A study of language change in Hong Kong: An analysis of sound change with Cantonese movies of the 1960s]. Paper presented at the 15th International Conference on Yue Dialects, Macau.
- 12.
Appendix I lists the 60 movies.
- 13.
It should be stressed that the video segment displayed corresponds only to the utterance returned by the search algorithm. The corpus is not intended to distribute whole movies; only the relevant segments are shown, to illustrate how the utterance was made.
- 14.
See Tse, M. S. and Chin, A. C. (April, 2015). Yueyu “ming-liang-ming” jiegou de tongzhi yongfa [The co-referential usage of the N-CL-N structure in Cantonese]. Paper presented at the 15th Workshop on Cantonese, Hong Kong.
- 15.
See Chin, A. C. (March, 2018). Discourse markers in Cantonese. Paper presented at the 30th North American Conference on Chinese Linguistics (NACCL), Columbus, Ohio.
References
Alvarez-Pereyre, M. (2011). Using film as linguistic specimen. In R. Piazza, M. Bednarek, & F. Rossi (Eds.), Telecinematic discourse: Approaches to the language of films and television series (pp. 47–67). Amsterdam: John Benjamins.
Bacon-Shone, J., & Bolton, K. (1997). Charting multilingualism: Language censuses and language surveys in Hong Kong. In M. C. Pennington (Ed.), Language in Hong Kong at century’s end (pp. 43–90). Hong Kong: Hong Kong University Press.
Baker, P. (2009). Contemporary corpus linguistics. London: Continuum.
Bauer, R. S. (2000). Hong Kong Cantonese and the road ahead. In D. C. S. Li, A. Lin, & W. K. Tsang (Eds.), Language and education in post-colonial Hong Kong (pp. 35–58). Hong Kong: Linguistic Society of Hong Kong.
Bednarek, M. (2010). The language of fictional television: Drama and identity. London: Continuum International Publishing.
Bender, M. (2001). Regional literatures. In V. H. Mair (Ed.), The Columbia history of Chinese literature (pp. 1015–1031). New York: Columbia University Press.
Biber, D. (1993). Representativeness in corpus design. Literary and Linguistic Computing, 8(4), 243–257.
Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge: Cambridge University Press.
Brugman, H., & Russel, A. (2004). Annotating multi-media/multi-modal resources with ELAN. In M. Lino, M. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of the 4th international conference on language resources and language evaluation (LREC 2004) (pp. 2065–2068). Paris: European Language Resources Association.
Burdick, A., Drucker, J., Lunenfeld, P., Presner, T., & Schnapp, J. (2012). Digital humanities. Cambridge, MA: The MIT Press.
Chan, M. K. M. (2005). Cantonese opera and the growth and spread of vernacular written Cantonese in the twentieth century. In G. Qian (Ed.), Proceedings of the seventeenth North American conference on Chinese linguistics (NACCL-17) (pp. 1–18). Los Angeles: GSIL Publications, University of Southern California.
Chan, M. K. M. (2006). Gender-marked speech in Cantonese: The case of sentence-final particles je and jek. Studies in the Linguistic Sciences, 26(1/2), 1–38.
Chao, Y. R. (1947). Cantonese primer. Cambridge, MA: Harvard University Press.
Cheng, S.-P., & Tang, S. W. (2014). Languagehood of Cantonese: A renewed front in an old debate. Open Journal of Modern Linguistics, 4, 389–398.
Cheung, S. H.-N. (1972). Xianggang yueyu yufa yanjiu [A study of the grammar of Hong Kong Cantonese]. Hong Kong: The Chinese University Press.
Chin, A. C. (2011). Yueyu yufa de duojiaodu yanjiu [Studies of Cantonese from multiple perspectives]. Studies in Chinese Linguistics, 31(32), 33–43.
Chin, A. C. (2013). Yueyu yanjiu xin ziyuan: xianggang ershi shiji zhongqi yueyu yuliaoku [New resources for Cantonese studies: A corpus of mid-20th century Hong Kong Cantonese]. Newsletter of Chinese Language, 92(1), 7–16.
Chin, A. C. (2017). Yueyu (sizi) xiehouyu de xiuci gongneng [The rhetorical function of xiehouyu in Cantonese]. Yueyu yanjiu (Studies in Cantonese), 125–134.
Chu, Y. (2002). Hong Kong cinema: Colonizer, motherland and self. London: Routledge.
Chung, P. Y. (2004). Xianggang yingshiye bai nian [The Hong Kong TV and movie industry in the past 100 years]. Hong Kong: Joint Publishing.
Claridge, C. (2008). Historical corpora. In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics: An international handbook (Vol. 1, pp. 242–258). Berlin: Mouton de Gruyter.
Ferguson, C. (1959). Diglossia. Word, 15, 325–340.
Flewitt, R., Hampel, R., Hauck, M., & Lancaster, L. (2014). What are multimodal data and transcription? In C. Jewitt (Ed.), The Routledge handbook of multimodal analysis (pp. 44–59). London: Routledge.
Fonoroff, P. (1988). A brief history of Hong Kong cinema. Renditions, 29 & 30, 293–308.
Fu, P. S., & Desser, D. (Eds.). (2000). The cinema of Hong Kong: History, arts, identity. Cambridge: Cambridge University Press.
Gao, H. (1980). Guangzhou fangyan yanjiu [A study of the Guangzhou dialect]. Hong Kong: Commercial Press.
Guldin, G. E. (1997). Hong Kong ethnicity: Of folk models and change. In G. Evans & M. Tam (Eds.), Hong Kong: The anthropology of a Chinese metropolis (pp. 25–50). London: Curzon Press.
Jarvie, I. C. (1977). Window on Hong Kong: A sociological study of the Hong Kong film industry and its audience. Hong Kong: Centre of Asian Studies, University of Hong Kong.
Kendon, A. (2004). Gesture: Visible action as utterance. Cambridge: Cambridge University Press.
Lai, M. L. (2005). Language attitudes of the first postcolonial generation in Hong Kong secondary schools. Language in Society, 34(4), 363–388.
Lai, M. L. (2009). ‘I love Cantonese but I want English’: A qualitative account of Hong Kong students’ language attitudes. The Asia-Pacific Education Researcher, 18(1), 79–92.
Lai, Y. P., & Chin, A. C. (2017). Yueyu de dongci houzhui “zhe” [The verbal suffix “zhe” in Cantonese]. Hanyu yu hanzangyu qianyan yanjiu – Ding Bangxin xiansheng badie shouqing lunwenji [A festschrift to celebrate the 80th birthday of Professor Ting Pang-hsin]. Beijing: Social Sciences Academic Press.
Lau, C.-F., & So, D. W.-C. (2005). Cong fangyan zachu dao guangfuhua weizhu – 1949–1971 nianjian xianggang shehui yuyan zhuanxing de chubu tantao [A preliminary study of the sociolinguistic realignment in Hong Kong during 1949 and 1971]. Journal of Chinese Sociolinguistics, 5, 89–104.
Lee, J. (2011). Toward a parallel corpus of spoken Cantonese and written Chinese. In Proceedings of the 5th International Joint Conference on Natural Language Processing (pp. 1462–1466), Chiang Mai.
Lee, K. S., & Leung, W. M. (2012). The status of Cantonese in the education policy of Hong Kong. Multilingual Education, 2(2), 1–22.
Lee, H.-T., & Wong, C. (1998). CANCORP: The Hong Kong Cantonese child language corpus. Cahiers de Linguistique – Asie Orientale, 27(2), 211–228.
Leung, C.-S. (2005). A study of the utterance particles in Cantonese as spoken in Hong Kong. Hong Kong: Language Information Sciences Research Centre, City University of Hong Kong.
Leung, M. T., & Law, S. P. (2001). HKCAC: The Hong Kong Cantonese adult language corpus. International Journal of Corpus Linguistics, 6(2), 305–325.
Li, X. (1995). Guangzhou fangyan yanjiu [A study of the Guangzhou dialect]. Guangzhou: Guangdong People’s Publishing House.
Lin, Y., & Qin, F. (2008). Guangxi nanning baihua yanjiu [A study of the Yue dialect in Nanning of Guangxi]. Guilin: Guangxi Normal University Press.
Luke, K. K., & Wong, M. L. Y. (2015). The Hong Kong Cantonese corpus: Design and uses. In B. Tsou & O. Kwong (Eds.), Linguistic corpus and corpus linguistics in the Chinese context, Journal of Chinese Linguistics Monograph Series Number 25 (pp. 312–333). Hong Kong: The Chinese University of Hong Kong Press.
Mai, Y., & Tan, B. Y. (2011). Shiyong guangzhouhua fenlei cidian [A practical thesaurus of the Guangzhou dialect]. Hong Kong: Commercial Press.
Matthews, S., & Yip, V. (1994). Cantonese: A comprehensive grammar (1st ed.). London: Routledge.
McEnery, T., & Hardie, A. (2012). Corpus linguistics: Method, theory and practice. Cambridge: Cambridge University Press.
McEnery, T., & Wilson, A. (2001). Corpus linguistics: An introduction. Edinburgh: Edinburgh University Press.
Morrison, R. (1828). Vocabulary of the Canton dialect. Macao: The Honorable East India Company’s Press.
Quaglio, P. (2008). Television dialogue and natural conversation: Linguistic similarities and functional differences. In A. Ädel & R. Reppen (Eds.), Corpora and discourse (pp. 189–210). Amsterdam: John Benjamins.
Quaglio, P. (2009). Television dialogue: The sitcom Friends vs. natural conversation. Amsterdam: John Benjamins.
Richardson, K. (2010). Television dramatic dialogue: A sociolinguistic study. New York: Oxford University Press.
Tang, S.-F. (2008). Guanyu pujiaozhong taolun de fansi. [A reflection on using Putonghua to teach the Chinese language subject]. Journal of Basic Education, 17(2), 1–13.
Tang, S. W. (2015). Yueyu yufa jiangyi [Lecture notes on Cantonese grammar]. Hong Kong: Commercial Press.
Teo, S. (2007). Hong Kong cinema: The extra dimensions. London: BFI Publishing.
Trudgill, P. (2003). A glossary of sociolinguistics. Edinburgh: Edinburgh University Press.
Tsou, B. K. (1997). Sanyan, liangyu shuo xianggang. [Trilingualism and biliteracy in Hong Kong]. Journal of Chinese Linguistics, 25(2), 290–307.
Tsou, B. K. (2002). Some considerations for additive bilingualism: A tale of two cities (Singapore and Hong Kong). In D. W.-C. So & G. M. Jones (Eds.), Education and society in plurilingual contexts (pp. 163–198). Brussels: Brussels University Press.
Tsou, B. K., & You, R. (2003). Hanyu yu huaren shehui [Chinese language and society]. Hong Kong: City University of Hong Kong Press.
Wong, P.-W. (2006). The specification of POS tagging of the Hong Kong University Cantonese corpus. International Journal of Technology and Human Interaction, 2(1), 21–38.
Wurm, S. A., Li, R., Baumann, T., & Lee, M. W. (1987). Language Atlas of China. Hong Kong: Longman.
Yang, R. (2008). Yueyu lishi wenxian shujuku de zhizuo yu yingyong [Construction and application of a corpus of early Cantonese texts]. Studies in Chinese Linguistics, 25(1), 1–8.
Yip, V., & Matthews, S. (2007). The bilingual child – Early development and language contact. Cambridge: Cambridge University Press.
You, R. (2002). Xiyang chuanjiaoshi hanyu fangyanxue zhuzuo shumu kaoshu [A study of the reference materials on Chinese dialects compiled by western missionaries]. Harbin: Heilongjiang University.
Yue, A. (2004). Materials for the diachronic study of the Yue dialects. In F. Shi & Z. Shen (Eds.), The joy of research: A festschrift in honor of Professor William S-Y. Wang on his seventieth birthday (pp. 246–271). Nankai: Nankai University.
Yue-Hashimoto, A. (1972). Studies in Yue dialects 1: Phonology of Cantonese. Cambridge: Cambridge University Press.
Yue-Hashimoto, A. (1991). The Yue dialects. In W. S.-Y. Wang (Ed.), Languages and dialects of China, Journal of Chinese Linguistics Monograph Series Number 3 (pp. 294–324). Berkeley: University of California, Berkeley.
Yue-Hashimoto, A. (2005). The Dancun dialect of Taishan. Hong Kong: Language Information Sciences Research Center, City University of Hong Kong.
Zhan, B. (2002). Guangdong yue fangyan gaiyao [An outline of Yue dialects in Guangdong]. Guangzhou: Ji’nan University Press.
Zhan, B., & Cheung, Y.-S. (1988). Zhujiangsanjiaozhou fangyan cihui duizhaobiao [A comparison of vocabulary among the dialects of Pearl River Delta]. Guangzhou: Guangdong People’s Publishing House.
Zhan, B., & Cheung, Y.-S. (1994). Yuebei shi xianshi yue fangyan diaocha baogao [A report on the 10 Yue dialects in northern Guangdong]. Guangzhou: Ji’nan University Press.
Zhan, B., & Cheung, Y.-S. (1998). Yuexi shi xianshi yue fangyan diaocha baogao [A report on the 10 Yue dialects in western Guangdong]. Guangzhou: Ji’nan University Press.
Zhang, Z. (2009). Language and society in early Hong Kong (1841–1884). Guangzhou: Sun Yat-sen University Press.
General Note and Acknowledgment
The construction of The Corpus of Mid-twentieth-century Hong Kong Cantonese (phases I and II) was supported by four research grants: Spoken Corpus construction and linguistic analysis of mid-twentieth-century Cantonese (Internal Research Grant, The Hong Kong Institute of Education, Project No.: RG41/2010–2011), a preliminary linguistic analysis of mid-twentieth-century Cantonese from a corpus-based approach (Internal Research Grant, The Hong Kong Institute of Education, Project No.: RG62/12-13R), linguistic analysis of mid-twentieth-century Hong Kong Cantonese by constructing an annotated spoken corpus (Early Career Scheme, Research Grants Council, Hong Kong SAR Government, Project No.: ECS859713), and initiatives in digital humanities (Central Reserve for Strategic Development, The Education University of Hong Kong). The author would like to acknowledge the following colleagues (in alphabetical order of last names) for their advice and assistance in the development of the corpus and the relevant tools: Dicky Cheung, Hintat Cheung, William Chong, Ka Po Chow, Calvin Lai, Yick Sun Lam, Tin Yau Lau, Chung-sum Leung, Tin King Lo, Wing Ng, Lili Ou, Chris Sun, Cat Tang, Crono Tse, Benjamin Tsou, Alistair Tweed, Byron Wong, and Tak-sum Wong.
Appendix: List of 60 Movies with Transcribed Data in the Second Phase of the Corpus
Year | Film title | English title |
---|---|---|
1943 | 癡兒女 | Stubborn Lovers |
1947 | 新白金龍 | The New White Golden Dragon |
1948 | 刁蠻宮主 | A Spoilt Brat |
1950 | 血淚洗殘脂 | Blood, Rouge and Tears |
1950 | 細路祥 | The Kid |
1950 | 英雄難過美人關 | The Hero Becomes a Prisoner of Love |
1951 | 唔嫁 | She Says “No” to Marriage |
1951 | 紅菱血(下集) | Mysterious Murder, Part Two (alias: Hongling’s Blood, Part Two) |
1952 | 為情顛倒 | Lovesick |
1952 | 十月芥菜 | A Ready Lover |
1952 | 契爺艷史 | Foster-Daddy’s Romantic Affairs |
1953 | 危樓春曉 | In the Face of Demolition |
1953 | 鬼妻 | The Ghostly Wife |
1954 | 好女十八嫁 | Eighteen Marriages of a Smart Girl |
1954 | 長使我郎淚滿襟 | Grief-Stricken for My Husband |
1954 | 金蘭姊妹 | Sworn Sisters |
1955 | 人頭奇案 | The Mystery of the Human Head |
1955 | 半夜奇談 | Strange Tale at Midnight |
1955 | 飛天蠄蟧 | The Flying Spider |
1956 | 失匙夾萬/失匙甲萬濶少爺 | The Scatterbrain (alias: All Lost But One) |
1956 | 九九九命案 | Dragnet |
1956 | 人面桃花相映紅 | Peach-Blossom Face |
1956 | 同撈同煲 | Great Chums |
1957 | 黛綠年華 | The Tender Age |
1957 | 彩鳳引金龍 | She’s So Neat |
1957 | 小婦人 | Four Daughters |
1957 | 鬼夜哭 | The Nightly Cry of the Ghost |
1958 | 歷盡滄桑一美人 | The Beauty Who Lived Through Great Changes |
1958 | 奸情 | Adultery |
1959 | 大廈情殺案 | Crime of Passion in the Mansion |
1959 | 歡喜冤家 | The Quarrelsome Couple |
1960 | 亞福對錯馬票 | A Wonderful Dream |
1960 | 龍鳳合歡花 | The Joyful Matrimony |
1961 | 骨肉情深/父子情深/偷香血債 | Blood Is Thicker Than Water |
1961 | 小千金 | Valuable False Daughter |
1962 | 九九九怪屍案 | 999 Grotesque Corpse |
1962 | 難得有情郎 | He Is a Rare and Passionate Lover |
1962 | 秋風秋雨 | Autumn Wind and Autumn Rain |
1962 | 浴室飛屍 | Murder in the Bathroom |
1963 | 九九九我是兇手 | I Am the Murderer |
1963 | 夜半人狼 | Midnight Were-Wolf |
1963 | 千金之女 | The Millionaire’s Daughter |
1964 | 小夫妻 | Beware of the Husband |
1964 | 死亡角之夜 | A Deadly Night |
1965 | 標準丈夫 | Standard Husband (alias: An Ideal Husband) |
1965 | 女生外向 | When Girls Are in Love |
1965 | 八個兇手/午夜追兇 | Eight Murderers |
1965 | 恩義難忘 | Your Infinite Kindness |
1966 | 神秘的血案 | A Fatal Adventure |
1966 | 難為了嬌妻 | Love Burst/Aggrieve My Wife |
1966 | 送錯禮餅煲錯薑/喜結良緣 | The Topsy-Turvy Marriage |
1967 | 紅衣少女 | Girl with Red Coat |
1967 | 血染鐵魔掌 | The Anti-poison Heroine |
1967 | 一步一驚心 | Shaky Steps (alias: Every Step of Alarm) |
1968 | 青春歌后 | Lady Songbird (alias: The Great Singer) |
1969 | 說謊的人 | The Liar |
1969 | 相思甜如蜜 | My Sweet Heart |
1969 | 聰明太太笨丈夫 | Lovely Husbands |
1970 | 瘋狂酒 | The Mad Bar (alias: The Crazy Bar) |
1970 | 歡樂時光 | Happy Times |
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Chin, A. C. (2019). Initiatives of Digital Humanities in Cantonese Studies: A Corpus of Mid-Twentieth-Century Hong Kong Cantonese. In: Tso, A. W. B. (eds) Digital Humanities and New Ways of Teaching. Digital Culture and Humanities, vol 1. Springer, Singapore. https://doi.org/10.1007/978-981-13-1277-9_5
DOI: https://doi.org/10.1007/978-981-13-1277-9_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1276-2
Online ISBN: 978-981-13-1277-9
eBook Packages: Social Sciences, Social Sciences (R0)