
Unsupervised Tagging of Spanish Lyrics Dataset Using Clustering

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7988)


Abstract

In this paper, an approach to music clustering that uses only lyrics features is developed to identify groups of songs with similar feelings, content, or emotions. The study uses a collection of 30,000 Spanish lyrics. The songs were represented in a vector space model (bag of words, BOW), and part-of-speech (POS) techniques were applied as part of preprocessing. Partitional and hierarchical methods were used to perform the clustering, estimating the appropriate number of clusters (k). The clustering results were evaluated with internal measures such as the Davies-Bouldin index (DBI) and intra-cluster and inter-cluster similarity. Finally, the resulting clusters were tagged using top words and association rules. Experiments show that music can be organized into related groups and tagged with unsupervised techniques such as clustering, using only lyrics information.
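The abstract names the pipeline stages but not how they are implemented. The following is a minimal, pure-Python sketch of one way they could fit together, under several assumptions that are not taken from the paper: lyrics are already tokenized, lemmatized and POS-filtered upstream; BOW terms are weighted with smoothed tf-idf; the partitional method is spherical k-means (cosine similarity) with farthest-first seeding; DBI is computed with cosine distance instead of the usual Euclidean distance; k is chosen as the candidate with the lowest DBI; and each cluster is tagged with its highest-weighted centroid terms. Hierarchical clustering and association-rule tagging are not shown. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Bag-of-words vectors with smoothed tf-idf weights, L2-normalized (weighting is an assumption)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        vec = {t: c * (math.log(n / df[t]) + 1.0) for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def dot(a, b):
    """Cosine similarity of two L2-normalized sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(vectors):
    """Mean direction of a cluster, re-normalized to unit length (spherical k-means)."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds the weights term by term
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def spherical_kmeans(vectors, k, iters=20, seed=0):
    """Partitional clustering by cosine similarity with farthest-first seeding."""
    rng = random.Random(seed)
    centroids = [dict(rng.choice(vectors))]
    while len(centroids) < k:
        # Seed with the vector least similar to its closest already-chosen centroid.
        far = min(vectors, key=lambda v: max(dot(v, c) for c in centroids))
        centroids.append(dict(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [max(range(k), key=lambda j: dot(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = centroid(members)
    return labels, centroids


def davies_bouldin(vectors, labels, centroids):
    """DBI over non-empty clusters, using cosine distance (1 - similarity); lower is better."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return float("inf")
    scatter = {}
    for j in clusters:
        members = [v for v, lab in zip(vectors, labels) if lab == j]
        scatter[j] = sum(1.0 - dot(v, centroids[j]) for v in members) / len(members)
    worst = [
        max((scatter[i] + scatter[j]) / max(1.0 - dot(centroids[i], centroids[j]), 1e-12)
            for j in clusters if j != i)
        for i in clusters
    ]
    return sum(worst) / len(clusters)


def choose_k(vectors, k_values, seed=0):
    """Return (dbi, k, labels, centroids) for the candidate k with the lowest DBI."""
    best = None
    for k in k_values:
        labels, cents = spherical_kmeans(vectors, k, seed=seed)
        score = davies_bouldin(vectors, labels, cents)
        if best is None or score < best[0]:
            best = (score, k, labels, cents)
    return best


def top_words(centroid_vec, n=3):
    """Tag a cluster with its n highest-weighted centroid terms (ties broken alphabetically)."""
    ranked = sorted(centroid_vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


if __name__ == "__main__":
    # Toy lyrics, assumed already lemmatized and POS-filtered (hypothetical data).
    lyrics = [
        ["amor", "corazón", "beso"], ["amor", "beso", "noche"],
        ["corazón", "amor", "querer"], ["fiesta", "bailar", "noche"],
        ["bailar", "fiesta", "ritmo"], ["ritmo", "bailar", "cuerpo"],
    ]
    vecs = tfidf_vectors(lyrics)
    dbi, k, labels, cents = choose_k(vecs, range(2, 5))
    print("k =", k, "DBI =", round(dbi, 3))
    for j in sorted(set(labels)):
        print(j, top_words(cents[j]))
```

Cosine similarity on length-normalized vectors is a common choice for sparse text because it ignores how long each lyric is. The mean similarity of a cluster's members to its centroid (one minus the per-cluster scatter computed above) is one way to read the intra-cluster similarity measure mentioned in the abstract.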




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Parra, F.L., León, E. (2013). Unsupervised Tagging of Spanish Lyrics Dataset Using Clustering. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2013. Lecture Notes in Computer Science, vol 7988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39712-7_10


  • DOI: https://doi.org/10.1007/978-3-642-39712-7_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39711-0

  • Online ISBN: 978-3-642-39712-7

  • eBook Packages: Computer Science, Computer Science (R0)
