Building Cantonese Dictionaries Using Crowdsourcing Strategies: The Project

  • Chaak-ming Lau
Part of the Digital Culture and Humanities book series (DICUHU, volume 1)


The project is the first attempt to build a Cantonese-to-Cantonese dictionary using a lean start-up model (Ries, 2011) combined with crowdsourcing strategies. The goal is to produce a comprehensive dictionary written for Cantonese and in Cantonese. Existing resources are often (1) not available electronically, (2) out of date, or (3) too Anglo- or Sino-centric. Building large data sets from these existing resources requires a great deal of editing and ‘data-janitorial’ work, which a large group of less-experienced people can do far better than a handful of experts, so crowdsourcing strategies are particularly appropriate. We started with a small team of editors and software developers in 2014. In less than three years, we grew into an organisation with over 400 volunteers and gathered over 42,000 entries, more than 36,000 of which had been edited with Written Cantonese descriptions, examples, and translations as of June 2017. Given the nature of the project and the composition of its membership (a language with no authority to fall back on, and most members with no formal linguistic or lexicographical training), we adhere to two simple principles in order to keep the dictionary growing without introducing major issues into the core data: ‘usage over etymology’ and ‘decision problem avoidance’. I will discuss how these principles have shaped the architecture of the project and the editing workflow, as well as the technological difficulties that we face.


Keywords: Cantonese; Dictionary compilation; Crowdsourcing; Usage over etymology; Decision problem avoidance; Open data


  1. Bauer, R. (1988). Written Cantonese of Hong Kong. Cahiers de Linguistique-Asie Orientale, 17(2), 245–293.
  2. Caau2. n.d. Retrieved June 25, 2017, from
  3. Cantonese Wikipedia. n.d. Retrieved June 25, 2017, from Wikipedia
  4. Chin, A. C.-O. (2018). Initiatives of digital humanities in Cantonese studies: A corpus of mid-20th century Hong Kong Cantonese. In K.-K. Tam (Ed.), Digital humanities and new ways of teaching. Singapore: Springer.
  5. Chishima, E. (2005). Tōhō Kantongo Jiten [Tōhō Cantonese dictionary]. Tōkyō: Tōhō Shoten.
  6. Cowles, R. (1965). Cantonese speaker’s dictionary. Hong Kong: Hong Kong University Press.
  7. Eitel, E. (1877). A Chinese dictionary in the Cantonese dialect. London: Trübner and Co. 57 & 59, Ludgate Hill and Hong Kong: Lane, Crawford & Co.
  8. Ferguson, C. (1959). Diglossia. Word, 15, 325–340.
  9. Huang, P. (1970). Cantonese dictionary: Cantonese-English, English-Cantonese. New Haven: Yale University Press.
  10. Hutton, C., & Bolton, K. (2005). A dictionary of Cantonese slang: The language of Hong Kong movies, street gangs and city life. Honolulu: University of Hawaii Press.
  11. Kong, Z. N. (1933). Guangdong Suyu Kao [Study on common sayings in Cantonese]. Guangzhou: Nanfang Fulunshe.
  12. Lau, S. (1977). A practical Cantonese-English dictionary. Hong Kong: Hong Kong Government Printer.
  13. Leung, M., & Law, S. (2002). HKCAC: The Hong Kong Cantonese adult language corpus. International Journal of Corpus Linguistics, 6(2), 305–326.
  14. Li, Y. M. F. (2011). Qingmo Minchu de Yueyu Shuxie [Cantonese writing in late Qing and early Republic of China]. Hong Kong: Joint Publishing (HK).
  15. Luke, K., & Wong, M. (2015). The Hong Kong Cantonese corpus: Design and uses. In B. K. Tsou & O. Y. Kwong (Eds.), JCL monograph series no. 25: Linguistic corpus and corpus linguistics in the Chinese context (pp. 312–333). Hong Kong: The Chinese University Press.
  16. Meyer, B., & Wempe, T. (1935). The student’s Cantonese-English dictionary. Unknown: St. Louis Industrial School Printing Press.
  17. Qu, D. J. (1678). Guangdong Xinyu [New words about Guangdong]. (n.p.)
  18. Rieder, B., & Röhle, T. (2012). Digital methods: Five challenges. In D. Berry (Ed.), Understanding digital humanities (pp. 67–84). London: Palgrave Macmillan.
  19. Ries, E. (2011). The lean startup: How today’s entrepreneurs use continuous innovation to create radically successful businesses. New York: Crown Business.
  20. Snow, D. B. (2004). Cantonese as written language: The growth of a written Chinese vernacular. Hong Kong: Hong Kong University Press.
  21. Snow, D. B. (2008). Cantonese as written standard? Journal of Asian Pacific Communication, 18(2), 190–208.
  22. Tang, S. W. (2015). Yueyu Yufa Jiangyi [Lectures on Cantonese grammar]. Hong Kong: Commercial Press.
  23. Wong, S. L. (1941). Yueyin Yunhui [A Chinese syllabary pronounced according to the dialect of Canton]. Hong Kong.
  24. Zhao, Z. Y. (1821). Yue’ou [Cantonese folklore].

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Chaak-ming Lau
  1. The Chinese University of Hong Kong, Hong Kong, China