Abstract
Embodied Cognition (EC) states that semantics is encoded in the brain as firing patterns of neural circuits, which are learned according to the statistical structure of human multimodal experience. However, each human brain is idiosyncratically biased by its own subjective experience, which makes this biological semantic machinery noisy with respect to the semantics inherent to media such as music and language. We propose to represent media semantics as low-dimensional vector embeddings obtained by jointly modeling the functional Magnetic Resonance Imaging (fMRI) activity of several brains via Generalized Canonical Correlation Analysis (GCCA). We evaluate the semantic richness of the resulting latent space on two semantic classification tasks: music genres and language topics. We show that the resulting unsupervised representations outperform the original high-dimensional fMRI voxel spaces in these downstream tasks while being more computationally efficient. Furthermore, we show that the semantic richness of the learned latent vector spaces increases with the number of jointly modeled subjects. Quantitative results and the corresponding statistical significance tests demonstrate the instantiation of music and language semantics in the brain, thereby providing further evidence for multimodal embodied cognition as well as a method for extracting media semantics from multi-subject brain dynamics.
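To make the joint-modeling idea concrete, the following is a minimal sketch of one common GCCA formulation (MAXVAR with ridge regularization) in NumPy. It is not taken from the authors' code: the function names `gcca_fit` and `subspace_alignment`, the `reg` parameter, and the synthetic "subjects" are all assumptions made for illustration. Each subject's response matrix (stimuli by voxels) is treated as one view, and the shared embedding is the set of leading eigenvectors of the summed per-view projection matrices.

```python
import numpy as np


def gcca_fit(views, k, reg=1e-3):
    """MAXVAR-style GCCA sketch (illustrative, not the paper's implementation).

    views: list of arrays, each (n_samples, n_features_j), rows aligned by stimulus.
    Returns the shared embedding G (n_samples, k) and one linear map per view.
    """
    n = views[0].shape[0]
    M = np.zeros((n, n))
    factors = []
    for X in views:
        Xc = X - X.mean(axis=0, keepdims=True)  # center each feature
        A, s, Bt = np.linalg.svd(Xc, full_matrices=False)
        keep = s > 1e-10 * s.max()  # drop numerically null directions
        A, s, Bt = A[:, keep], s[keep], Bt[keep]
        # Ridge-regularized projection onto the column space of this view.
        M += (A * (s**2 / (s**2 + reg))) @ A.T
        factors.append((A, s, Bt))
    # eigh returns eigenvalues in ascending order; keep the k largest.
    _, eigvecs = np.linalg.eigh(M)
    G = eigvecs[:, ::-1][:, :k]
    # Per-view maps U_j = (X^T X + reg I)^{-1} X^T G, computed via the SVD.
    maps = [Bt.T @ ((s / (s**2 + reg))[:, None] * (A.T @ G))
            for A, s, Bt in factors]
    return G, maps


def subspace_alignment(G, Z):
    """Smallest cosine of the principal angles between span(G) and span(Z)."""
    Zc = Z - Z.mean(axis=0, keepdims=True)
    Qz, _ = np.linalg.qr(Zc)
    Qg, _ = np.linalg.qr(G)
    return np.linalg.svd(Qz.T @ Qg, compute_uv=False).min()


# Synthetic example (hypothetical data): 4 "subjects" observe the same
# 3-dimensional latent signal through different random mixings plus noise.
rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 3))
views = [Z @ rng.standard_normal((3, 30)) + 0.1 * rng.standard_normal((200, 30))
         for _ in range(4)]
G, maps = gcca_fit(views, k=3)
alignment = subspace_alignment(G, Z)  # close to 1 when the latent space is recovered
```

In this sketch the rows of `G` play the role of the low-dimensional embeddings that would be fed to a downstream classifier. With real fMRI data the number of voxels typically far exceeds the number of stimuli, in which case every view can span nearly the whole sample space and the choice of `reg` (or a prior dimensionality reduction) becomes important.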
Data Availability
The data used for these experiments are based on the following repositories: https://openneuro.org/datasets/ds000113/versions/1.3.0 (MG) and https://osf.io/crwz7 (LT243 and LT384).
Code Availability
The source code is freely available for use from https://gitlab.hlt.inesc-id.pt/fraposo/fmri-gcca-pub.
Funding
Francisco Afonso Raposo is supported by a PhD scholarship granted by Fundação para a Ciência e a Tecnologia (FCT), with reference SFRH/BD/135659/2018. Additionally, this work was supported by Portuguese national funds through FCT, with reference UIDB/50021/2020.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Raposo, F.A., Martins de Matos, D. & Ribeiro, R. Learning Low-Dimensional Semantics for Music and Language via Multi-Subject fMRI. Neuroinform 20, 451–461 (2022). https://doi.org/10.1007/s12021-021-09560-5
DOI: https://doi.org/10.1007/s12021-021-09560-5