Abstract
Language is closely related to how we perceive ourselves and signify our reality. In this context, we created Desiring Machines, an interactive media art project that lets participants experience affective virtual environments, with speech emotion recognition as the main input source. Participants can share their emotions by speaking, singing, reciting poetry, or making any vocal sounds to generate virtual environments on the fly. Our contribution combines two machine learning models. We propose a long short-term memory network and a convolutional neural network to predict four main emotional categories from high-level semantic and low-level paralinguistic acoustic features. Predicted emotions are mapped to audiovisual representations by an end-to-end process that encodes emotion in virtual environments. We use a generative model of chord progressions, based on the tonal interval space, to transfer speech emotion into music. We also implement a generative adversarial network to synthesize an image from the speech-to-text transcription. The generated image is used as the style image in a style-transfer process applied to an equirectangular projection of a spherical panorama selected for each emotional category. The result is an immersive virtual space that encapsulates emotions in spheres arranged in a 3D environment. Users can create new affective representations or interact with previously encoded instances. (This ArtsIT publication is an extended version of the earlier abstract presented at ACM MM 2022 [1].)
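The abstract does not state how the outputs of the two models are combined. As a minimal sketch, assuming each model yields a probability vector over the same four categories and that the two are merged by a weighted average (late fusion), the prediction step could look like the following. The label names, the function names `fuse_predictions` and `predict_emotion`, and the `semantic_weight` parameter are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Placeholder labels (assumption): the abstract mentions four emotional
# categories but does not name them.
EMOTIONS: Tuple[str, ...] = ("emotion_a", "emotion_b", "emotion_c", "emotion_d")


def fuse_predictions(
    semantic_probs: Sequence[float],
    acoustic_probs: Sequence[float],
    semantic_weight: float = 0.5,
) -> List[float]:
    """Weighted average of two probability vectors, renormalized to sum to 1.

    Late fusion is an assumption made for illustration; the paper may
    combine its models differently.
    """
    if len(semantic_probs) != len(acoustic_probs):
        raise ValueError("both models must score the same set of categories")
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must lie in [0, 1]")

    fused = [
        semantic_weight * s + (1.0 - semantic_weight) * a
        for s, a in zip(semantic_probs, acoustic_probs)
    ]
    total = sum(fused)
    if total <= 0.0:
        raise ValueError("fused scores must have a positive sum")
    return [p / total for p in fused]


def predict_emotion(
    semantic_probs: Sequence[float],
    acoustic_probs: Sequence[float],
    labels: Sequence[str] = EMOTIONS,
    semantic_weight: float = 0.5,
) -> Tuple[str, List[float]]:
    """Return the label with the highest fused probability and the fused vector."""
    fused = fuse_predictions(semantic_probs, acoustic_probs, semantic_weight)
    if len(labels) != len(fused):
        raise ValueError("one label is required per category")
    best_index = max(range(len(fused)), key=fused.__getitem__)
    return labels[best_index], fused
```

In this sketch, the first vector would come from the model working on the transcribed text and the second from the model working on the acoustic features. The predicted label would then select the panorama and drive the music and image generation described above.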
References
Forero, J., Bernardes, G., Mendes, M.: Emotional machines: toward affective virtual environments. In: Proceedings of the 30th ACM International Conference on Multimedia (MM 2022), pp. 7237–7238. Association for Computing Machinery, New York (2022). https://doi.org/10.1145/3503161.3549973
Minsky, M.: The Emotion Machine. Simon & Schuster (2006)
Picard, R.W.: Affective computing. MIT Media Laboratory Perceptual Computing Section Technical Report No. 321 (1995)
Kitayama, S., Markus, H.R.: Emotion and culture: Empirical studies of mutual influence. Am. Psychol. Assoc. 1–19 (1994)
Roach, P.: Techniques for the phonetic description of emotional speech. In: Proceedings of the ISCA Workshop on Speech and Emotion (2000)
Russell, J.: A circumplex model of affect. J. Pers. Soc. Psychol. 39(6), 1161 (1980)
Russell, J.: How shall an emotion be called? (1997)
Scherer, K.: What are emotions? And how can they be measured? Soc. Sci. Inf. 44, 695–729 (2005)
Cowie, R., et al.: Emotion recognition in human-computer interaction. IEEE Signal Process. Mag. 18, 32–80 (2001)
Oudeyer, P.-Y.: Novel useful features and algorithms for the recognition of emotions in human speech. In: International Conference on Speech Prosody 2002 (2002)
Petrushin, V.: Emotion in speech: recognition and application to call centers. In: Proceedings of Artificial Neural Networks in Engineering (2000)
Mehrabian, A., Wiener, M.: Decoding inconsistent communications. J. Pers. Soc. Psychol. 6, 109–114 (1967)
Pinilla, A., Garcia, A., Raffe, W., Voigt-Antons, J.-N., Spang, R., Müller, S.: Affective visualization in virtual reality: an integrative review. Front. Virtual Reality 2, 630731 (2021)
Aylett, R., Cavazza, M.: Intelligent virtual environments – a state-of-the-art report. In: Proceedings of the Eurographics Workshop in Manchester UK (2001)
Karlgren, J., Bretan, N., Jonsson, L.: Interaction models, reference, and interactivity for speech interfaces to virtual environments. In: Göbel, M. (ed.) Virtual Environments 1995. Eurographics, pp. 149–159. Springer, Vienna (1995). https://doi.org/10.1007/978-3-7091-9433-1_13
Kamath, R., Kamat, R.: Development of an intelligent virtual environment for augmenting natural language processing in virtual reality systems. Int. J. Emerg. Trends Technol. Comput. Sci. (IJETTCS) 2, 198–203 (2013)
Everett, S., Wauchope, K., Pérez, M.: A natural language interface for virtual reality systems (1995)
Clay, S.R., Wilhelms, J.: Put: language-based interactive manipulation of objects. IEEE Comput. Graph. Appl. 16, 31–39 (1996)
Levin, G., Lieberman, Z.: In-situ speech visualization in real-time interactive installation and performance (2004)
Sra, M., Maes, P., Vijayaraghavan, P., Roy, D.: Auris: creating affective virtual spaces from music. In: Proceedings of the 23rd ACM Symposium on Virtual Reality Software and Technology (VRST 2017), pp. 1–11 (2017)
Sheng, E., Uthus, D.: Investigating societal biases in a poetry composition system. arXiv (2020)
Maas, A., Daly, R., Pham, P., Huang, D., Ng, A., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (2011)
Livingstone, S.R., Russo, F.A.: The Ryerson audio-visual database of emotional speech and song (RAVDESS). Zenodo 13 (2018)
Holz, N., Larrouy-Maestri, P., Poeppel, D.: The variably intense vocalizations of affect and emotion (VIVAE) corpus prompts new perspectives on nonspeech perception. Emotion 22, 213 (2022)
Cao, H., Cooper, D.G., Keutmann, M.K., Gur, R.C., Nenkova, A., Verma, R.: CREMA-D: crowd-sourced emotional multimodal actors dataset. IEEE Trans. Affect. Comput. 5, 377–390 (2014)
Bernardes, G., Cocharro, D., Guedes, C., Davies, M.E.P.: Conchord: an application for generating musical harmony by navigating in the tonal interval space. In: Kronland-Martinet, R., Aramaki, M., Ystad, S. (eds.) CMMR 2015. LNCS, vol. 9617, pp. 243–260. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46282-0_15
Lacan, J.: Écrits, A Selection. Alan Sheridan (1977)
Deleuze, G.: L’Île déserte et autres textes: textes et entretiens 1953–1974. In: Lapoujade, D. (ed.) Les Éditions de Minuit (2002)
Deleuze, G., Guattari, F.: Qu’est-ce que la philosophie? N.p.: Editions de Minuit (1991)
Acknowledgments
This research was partially funded by the project “Experimentation in music in Portuguese culture: History, contexts and practices in the 20th and 21st centuries” (POCI-01-0145-FEDER-031380) co-funded by the European Union through the Operational Program Competitiveness and Internationalization, in its ERDF component, and by national funds, through the Portuguese Foundation for Science and Technology (FCT). It was also partially funded by FCT in the scope of the projects: UIDB/11918/2022 and UIDB/50009/2020.
Copyright information
© 2023 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Forero, J., Bernardes, G., Mendes, M. (2023). Desiring Machines and Affective Virtual Environments. In: Brooks, A.L. (eds) ArtsIT, Interactivity and Game Creation. ArtsIT 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 479. Springer, Cham. https://doi.org/10.1007/978-3-031-28993-4_28
DOI: https://doi.org/10.1007/978-3-031-28993-4_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28992-7
Online ISBN: 978-3-031-28993-4
eBook Packages: Computer Science (R0)