Applying Functional Partition in the Investigation of Lexical Tonal-Pattern Categories in an Under-Resourced Chinese Dialect

  • Junru Wu
  • Yiya Chen
  • Vincent J. van Heuven
  • Niels O. Schiller
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 807)


The present study applied functional partition to investigate disyllabic lexical tonal-pattern categories in an under-resourced Chinese dialect, Jinan Mandarin. A two-stage partitioning procedure was introduced to process, in a semi-automatic way, a multi-speaker corpus that contains irregular lexical variants. In the first stage, a program suggests lexical tonal variants for the recordings of each word, and the phonetician decides among them. The suggestions are based on the result of a functional k-means partitioning algorithm and on tonal information from an available pronunciation dictionary of a related Chinese dialect, Standard Chinese. The second stage iterates a functional version of k-means partitioning with Silhouette-based criteria to abstract an optimal number of tonal patterns from the whole corpus. It also allows phoneticians to adjust the results of the automatic procedure in a controlled way and to redo the partitioning for a subset of clusters. The procedure yielded eleven disyllabic tonal patterns for Jinan Mandarin, representing the tonal system used by contemporary Jinan Mandarin speakers from a wide range of age groups. In this respect, the study differs from previous linguistic descriptions, which were based on the pronunciations of more elderly speakers. The method incorporates phoneticians’ linguistic knowledge and preliminary linguistic resources into the partitioning procedure. It can improve the efficiency and objectivity of investigating lexical tonal-pattern categories when building pronunciation dictionaries for under-resourced languages.
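The core of the second stage, k-means partitioning of contours combined with a Silhouette-based choice of the number of clusters, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the farthest-point initialisation, the root-mean-square approximation of the L2 distance on a common time grid, and the synthetic rising, falling, and level contours are all assumptions made for illustration.

```python
import math

def l2_distance(f, g):
    """Approximate L2 distance between two contours sampled on the same time grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)) / len(f))

def mean_curve(curves):
    """Pointwise mean of a non-empty list of contours."""
    return [sum(vals) / len(vals) for vals in zip(*curves)]

def init_centers(curves, k):
    """Deterministic farthest-point initialisation (an illustrative assumption)."""
    centers = [curves[0]]
    while len(centers) < k:
        far = max(curves, key=lambda c: min(l2_distance(c, m) for m in centers))
        centers.append(far)
    return centers

def functional_kmeans(curves, k, max_iter=100):
    """Assign each contour to its nearest mean contour until labels stop changing."""
    centers = init_centers(curves, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: l2_distance(c, centers[j]))
                      for c in curves]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centers[j] = mean_curve(members)
    return labels, centers

def mean_silhouette(curves, labels):
    """Average silhouette width; singleton clusters contribute 0."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0
    n = len(curves)
    total = 0.0
    for i in range(n):
        own = [l2_distance(curves[i], curves[j]) for j in range(n)
               if j != i and labels[j] == labels[i]]
        if not own:
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(l2_distance(curves[i], curves[j])
                for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n

def choose_k(curves, k_values):
    """Return the k (and its labels) with the highest mean silhouette width."""
    scored = []
    for k in k_values:
        labels, _ = functional_kmeans(curves, k)
        scored.append((mean_silhouette(curves, labels), k, labels))
    best = max(scored, key=lambda item: item[0])
    return best[1], best[2]

# Synthetic normalised contours (hypothetical data): 4 rising, 4 falling, 4 level.
grid = [i / 9 for i in range(10)]
offsets = [0.0, 0.02, -0.02, 0.04]
curves = ([[t + o for t in grid] for o in offsets]
          + [[1 - t + o for t in grid] for o in offsets]
          + [[0.5 + o for _ in grid] for o in offsets])

best_k, best_labels = choose_k(curves, range(2, 6))
print(best_k, best_labels)
```

In a real workflow the contours would be speaker-normalised f0 tracks extracted from the recordings, and a phonetician could call `choose_k` again on the contours of a subset of clusters to repartition them, in the spirit of the controlled adjustment described above.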


Pattern recognition · Phonetics · Tone · Pronunciation dictionary · K-means partition



J. Wu’s work was supported by a PhD scholarship sponsored by the Talent and Training China-Netherlands Program, by the “Chenguang Program” supported by the Shanghai Education Development Foundation and the Shanghai Municipal Education Commission, and by the Shanghai Philosophy and Social Sciences Fund (Grant number 2017BYY001). We also gratefully acknowledge the support given to Yiya Chen by the European Research Council (ERC-Starting Grant 206198).



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Department of Chinese Language and Literature, Lab of Language Cognition and Evolution, East China Normal University, Shanghai, China
  2. Leiden University Centre for Linguistics, Leiden, Netherlands
  3. Department of Hungarian and Applied Linguistics, University of Pannonia, Veszprém, Hungary
