Evaluation of Audio Feature Groups for the Prediction of Arousal and Valence in Music

Vatolkin, Igor; Nagathil, Anil

doi:10.1007/978-3-030-25147-5_19

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Computer-aided prediction of arousal and valence ratings helps to automatically associate emotions with music pieces, enabling new approaches to music categorisation and recommendation as well as the theoretical analysis of listening habits. The impact of several groups of music properties, such as timbre, harmony, melody, or rhythm, on perceived emotions has often been studied in the literature. However, little work has been done to extensively measure the potential of specific feature groups when they supplement combinations of other features already integrated into the regression model. In our experiment, we measure the performance of multiple linear regression applied to combinations of energy, harmony, rhythm, and timbre audio features to predict arousal and valence ratings. Each group is represented by a smaller number of dimensions estimated using Minimum Redundancy–Maximum Relevance (MRMR) feature selection. The results show that cepstral timbre features are particularly useful for predicting arousal, and rhythm features are the most relevant for predicting valence.

Acknowledgements

We thank Philipp Kramer for providing the code and explanations of experiments from his bachelor’s thesis, in particular for the extraction of MFCC and OBSC features.

Author information

Authors and Affiliations

Department of Computer Science, Technische Universität Dortmund, Dortmund, Germany
Igor Vatolkin
Institute of Communication Acoustics, Ruhr-Universität Bochum, Bochum, Germany
Anil Nagathil

Corresponding author

Correspondence to Igor Vatolkin.

Editor information

Editors and Affiliations

Department of Computer Science, Dortmund University of Applied Sciences and Arts, Dortmund, Germany
Nadja Bauer
Faculty of Statistics, TU Dortmund University, Dortmund, Germany
Katja Ickstadt
Institute for Empirical Research and Statistics, FOM University of Applied Sciences, Essen, Germany
Karsten Lübke
School of Business Studies, HOST University of Applied Sciences Stralsund, Stralsund, Germany
Gero Szepannek
Department of Information Systems, University of Münster, Münster, Germany
Heike Trautmann
Department of Statistical Sciences, Sapienza University of Rome, Rome, Italy
Maurizio Vichi

About this chapter

Cite this chapter

Vatolkin, I., Nagathil, A. (2019). Evaluation of Audio Feature Groups for the Prediction of Arousal and Valence in Music. In: Bauer, N., Ickstadt, K., Lübke, K., Szepannek, G., Trautmann, H., Vichi, M. (eds) Applications in Statistical Computing. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-030-25147-5_19

DOI: https://doi.org/10.1007/978-3-030-25147-5_19
Published: 12 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-25146-8
Online ISBN: 978-3-030-25147-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)