Abstract
In human speech, laughter plays a special role as an important non-verbal element, signaling general positive affect and cooperative intent. Laughter occurrences can, however, be divided into several sub-groups, each with a slightly or significantly different role in conversation. Besides automatically locating laughter events in speech, it would therefore be beneficial to categorize them automatically as well. In this study, we focus on laughter events occurring in Hungarian spontaneous conversations. First, we use the manually annotated occurrence time segments, so the task is simply to determine the correct laughter type, which we do with Deep Neural Networks (DNNs). Second, we also seek to localize the laughter events, for which we utilize Hidden Markov Models. Detecting different laughter types poses a further challenge to DNNs because specific types have only a small number of training examples; this can be handled by applying probabilistic sampling during frame-level DNN training.
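The abstract does not spell out how probabilistic sampling works, so the following is only a minimal sketch under a stated assumption: that a class is drawn first, with probability linearly interpolated between the empirical class prior and the uniform distribution, and a training frame is then drawn from that class. The parameter name `lam`, the function names and the toy label counts are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def class_sampling_probs(labels, lam):
    """Interpolate between empirical class priors (lam=0) and uniform (lam=1).

    Assumed form: P(c) = lam * (1 / K) + (1 - lam) * prior(c), K = number of classes.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    return {
        c: lam * (1.0 / n_classes) + (1.0 - lam) * (n / total)
        for c, n in counts.items()
    }


def sample_frame_indices(labels, lam, n_draws, seed=0):
    """Draw frame indices: pick a class first, then a frame of that class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, c in enumerate(labels):
        by_class[c].append(idx)
    probs = class_sampling_probs(labels, lam)
    classes = sorted(probs)
    weights = [probs[c] for c in classes]
    drawn_classes = rng.choices(classes, weights=weights, k=n_draws)
    return [rng.choice(by_class[c]) for c in drawn_classes]


# Toy, made-up frame labels: 90 "other" frames and two rare laughter types.
labels = ["other"] * 90 + ["laugh_a"] * 8 + ["laugh_b"] * 2
probs_prior = class_sampling_probs(labels, lam=0.0)    # follows the class priors
probs_uniform = class_sampling_probs(labels, lam=1.0)  # every class equally likely
batch = sample_frame_indices(labels, lam=1.0, n_draws=1000)
```

With `lam=0.0` the rare types are drawn as seldom as they occur in the data, while `lam=1.0` presents every class equally often; intermediate values trade off between the two.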
This study was partially funded by the National Research, Development and Innovation Office of Hungary via contract NKFIH FK-124413. Gábor Gosztolya was also supported by the Ministry of Human Capacities, Hungary (grant 20391-3/2018/FEKUSTRAT). András Beke was supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Gosztolya, G., Beke, A., Neuberger, T. (2019). Differentiating Laughter Types via HMM/DNN and Probabilistic Sampling. In: Salah, A., Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol 11658. Springer, Cham. https://doi.org/10.1007/978-3-030-26061-3_13
DOI: https://doi.org/10.1007/978-3-030-26061-3_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26060-6
Online ISBN: 978-3-030-26061-3
eBook Packages: Computer Science, Computer Science (R0)