Nordic Music Genre Classification Using Song Lyrics

  • Adriano A. de Lima
  • Rodrigo M. Nunes
  • Rafael P. Ribeiro
  • Carlos N. Silla Jr.
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8455)


Lyrics-based music genre classification remains understudied within the music information retrieval community. The existing approaches reported in the literature deal only with lyrics in English. It is therefore necessary to evaluate whether standard text classification techniques are suitable for lyrics in other languages. More precisely, this work analyzes which approach gives better results: a language-dependent approach using stemming and stopword removal, or a language-independent approach using n-grams. To perform the experiments, we created the Nordic music genre lyrics database. The experimental results show that the language-independent approach with the n-gram representation performs better than the language-dependent approach with stemming. Additional experiments with stylistic features show that combining stylistic features with the other approaches improves the classification results.
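
The contrast between the two feature-extraction approaches can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes character-level trigrams for the language-independent side (the abstract does not say whether the n-grams are over characters or words), and it uses a tiny hypothetical Swedish stopword list for the language-dependent side. Stemming is left out because it requires a language-specific stemmer. The function names and the sample lyric are invented for illustration.

```python
from collections import Counter

# Hypothetical, tiny Swedish stopword list for illustration only
# (not taken from the paper).
STOPWORDS = {"och", "i", "det", "en"}


def char_ngrams(text, n=3):
    """Language-independent features: counts of overlapping character n-grams.

    Assumption: character-level n-grams; no language resources are needed.
    """
    normalized = " ".join(text.lower().split())  # lowercase, collapse whitespace
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def stopword_filtered_tokens(text, stopwords=STOPWORDS):
    """Language-dependent features: word counts after stopword removal.

    A real pipeline would also apply a stemmer for the lyric's language.
    """
    tokens = text.lower().split()
    return Counter(t for t in tokens if t not in stopwords)


# Invented sample lyric line, used only to exercise the two functions.
lyric = "Det regnar och det regnar i en stad"
ngram_features = char_ngrams(lyric, n=3)
word_features = stopword_filtered_tokens(lyric)
```

The n-gram function works unchanged on any language, whereas the stopword-based function depends on a word list that must be supplied per language. That dependency is the practical difference the abstract's comparison turns on.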


Keywords: Lyrics classification, Multi-language text classification, Music genre classification





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Adriano A. de Lima (1)
  • Rodrigo M. Nunes (1)
  • Rafael P. Ribeiro (1)
  • Carlos N. Silla Jr. (1)

  1. Computer Music Technology Laboratory, Federal University of Technology of Parana (UTFPR), Cornélio Procópio, Brazil
