A general framework for learning prosodic-enhanced representation of rap lyrics

  • Hongru Liang
  • Haozheng Wang
  • Qian Li
  • Jun Wang
  • Guandong Xu
  • Jiawei Chen
  • Jin-Mao Wei
  • Zhenglu Yang (corresponding author)

Abstract

Learning and analyzing rap lyrics is a significant basis for many Web applications, such as music recommendation, automatic music categorization, and music information retrieval, owing to the abundance of digital music on the World Wide Web. Although numerous studies have explored the topic, knowledge in this field remains far from satisfactory, because critical issues, such as prosodic information and its effective representation, as well as the appropriate integration of various features, are usually ignored. In this paper, we propose a hierarchical attention variational autoencoder framework (HAVAE), which simultaneously considers semantic and prosodic features for rap lyrics representation learning. Specifically, the representation of the prosodic features is encoded from phonetic transcriptions with a novel and effective strategy (i.e., rhyme2vec). Moreover, a feature aggregation strategy is proposed to appropriately integrate the various features and generate a prosodic-enhanced representation. A comprehensive empirical evaluation demonstrates that the proposed framework outperforms state-of-the-art approaches under various metrics in different rap lyrics learning tasks.
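
The paper's own code is not reproduced here. The sketch below only illustrates, under an assumed PyTorch setting, how semantic embeddings (e.g., of the lyric text) and prosodic embeddings (e.g., produced by rhyme2vec from phonetic transcriptions) might be fused with a simple attention weighting and then encoded by a variational autoencoder. The class name ProsodicEnhancedVAE, the single-layer additive attention, and all dimensions are illustrative assumptions, not HAVAE's actual architecture.

# A minimal sketch (not the authors' released code): attention-weighted fusion of
# semantic and prosodic feature vectors followed by a variational autoencoder.
# Assumes a PyTorch environment; all module names and dimensions are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProsodicEnhancedVAE(nn.Module):
    def __init__(self, sem_dim=300, pros_dim=100, hidden=256, latent=64):
        super().__init__()
        # Project both feature types into a shared space before fusion.
        self.sem_proj = nn.Linear(sem_dim, hidden)
        self.pros_proj = nn.Linear(pros_dim, hidden)
        # One additive-attention score per feature type.
        self.attn = nn.Linear(hidden, 1)
        # VAE encoder/decoder over the fused representation.
        self.enc_mu = nn.Linear(hidden, latent)
        self.enc_logvar = nn.Linear(hidden, latent)
        self.dec = nn.Linear(latent, hidden)

    def fuse(self, semantic, prosodic):
        feats = torch.stack(
            [torch.tanh(self.sem_proj(semantic)),
             torch.tanh(self.pros_proj(prosodic))], dim=1)       # (B, 2, H)
        weights = F.softmax(self.attn(feats), dim=1)             # (B, 2, 1)
        return (weights * feats).sum(dim=1)                      # (B, H)

    def forward(self, semantic, prosodic):
        x = self.fuse(semantic, prosodic)
        mu, logvar = self.enc_mu(x), self.enc_logvar(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = self.dec(z)
        # Standard VAE objective: reconstruction error plus KL divergence.
        rec_loss = F.mse_loss(recon, x)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return z, rec_loss + kl


if __name__ == "__main__":
    model = ProsodicEnhancedVAE()
    sem = torch.randn(8, 300)   # e.g. document embeddings of the lyric text
    pros = torch.randn(8, 100)  # e.g. rhyme2vec embeddings of phonetic transcriptions
    z, loss = model(sem, pros)
    print(z.shape, loss.item())

The latent vector z would then serve as the prosodic-enhanced representation fed to downstream tasks such as classification or retrieval.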

Keywords

Representation learning · Variational autoencoder · Hierarchical attention mechanism · Rap lyrics

Notes

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. U1636116, 11431006, and 61772288, by the Research Fund for International Young Scientists under Grant Nos. 61650110510 and 61750110530, and by the Ministry of Education Humanities and Social Science Project under Grant No. 16YJC790123.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Hongru Liang (1)
  • Haozheng Wang (1)
  • Qian Li (1)
  • Jun Wang (2)
  • Guandong Xu (3)
  • Jiawei Chen (1)
  • Jin-Mao Wei (1)
  • Zhenglu Yang (1), corresponding author

  1. College of Computer Science, Nankai University, Tianjin, China
  2. College of Mathematics and Statistics Science, Ludong University, Yantai, China
  3. Advanced Analytics Institute, University of Technology Sydney, Ultimo, Australia