
Large-scale user modeling with recurrent neural networks for music discovery on multiple time scales


The amount of content on online music streaming platforms is immense, and most users only access a tiny fraction of this content. Recommender systems are the application of choice to open up the collection to these users. Collaborative filtering has the disadvantage that it relies on explicit ratings, which are often unavailable, and generally disregards the temporal nature of music consumption. On the other hand, item co-occurrence algorithms, such as the recently introduced word2vec-based recommenders, are typically left without an effective user representation. In this paper, we present a new approach to model users through recurrent neural networks by sequentially processing consumed items, represented by any type of embeddings and other context features. This way we obtain semantically rich user representations, which capture a user’s musical taste over time. Our experimental analysis on large-scale user data shows that our model can be used to predict future songs a user will likely listen to, both in the short and long term.
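The idea in the abstract, feeding a user's consumed items through a recurrent network to obtain a user representation and then matching that representation against song embeddings, can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' implementation: the names `TinyGRU`, `taste_vector` and `recommend` are hypothetical, the cell is a plain GRU without bias terms, the weights are random and untrained, and the embeddings are synthetic.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real model would learn these from data."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyGRU:
    """Hypothetical single-layer GRU with a linear read-out (untrained)."""

    def __init__(self, embed_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One (W, U) pair per gate: reset "r", update "u", candidate "c".
        self.W = {g: init_matrix(hidden_dim, embed_dim, rng) for g in "ruc"}
        self.U = {g: init_matrix(hidden_dim, hidden_dim, rng) for g in "ruc"}
        # Read-out that maps the hidden state back into the embedding space.
        self.out = init_matrix(embed_dim, hidden_dim, rng)

    def _gate(self, name, x, h, activation):
        wx = matvec(self.W[name], x)
        uh = matvec(self.U[name], h)
        return [activation(a + b) for a, b in zip(wx, uh)]

    def step(self, x, h):
        """One GRU update: combine song vector x with previous hidden state h."""
        r = self._gate("r", x, h, sigmoid)
        u = self._gate("u", x, h, sigmoid)
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = self._gate("c", x, reset_h, math.tanh)
        # Assumed convention: u close to 1 favours the new candidate state.
        return [(1 - ui) * hi + ui * ci for ui, hi, ci in zip(u, h, cand)]

    def taste_vector(self, song_vectors):
        """Process the listening history in order and read out a user vector."""
        h = [0.0] * self.hidden_dim
        for x in song_vectors:
            h = self.step(x, h)
        return matvec(self.out, h)


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def recommend(taste, catalog, k=3):
    """Return the k song ids whose embeddings are closest to the taste vector."""
    return sorted(catalog, key=lambda sid: cosine_distance(taste, catalog[sid]))[:k]


# Usage with synthetic 4-dimensional embeddings (stand-ins for real ones).
rng = random.Random(1)
catalog = {f"song_{i}": [rng.gauss(0, 1) for _ in range(4)] for i in range(10)}
history = [catalog["song_0"], catalog["song_3"], catalog["song_7"]]
model = TinyGRU(embed_dim=4, hidden_dim=8)
taste = model.taste_vector(history)
print(recommend(taste, catalog))
```

Because the weights are untrained, the printed ranking is arbitrary; the sketch only shows the data flow from an ordered listening history to a vector that lives in the same space as the song embeddings.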





  3. There is an excellent web article by Radim Rehurek from 2014 which studies this in depth, see




Cedric De Boom is funded by a PhD grant of the Research Foundation - Flanders (FWO). We greatly thank Nvidia for its donation of a Tesla K40 and Titan X GPU to support the research of the IDLab group at Ghent University.


Correspondence to Cedric De Boom.

Appendix: Table of symbols

In order of appearance:

\(\mathbf{v}_u(t)\): User vector for user u at time t
\(\mathbf{v}_i(t)\): Item vector for item i at time t
\(r_{ui}\): Rating of item i by user u
\(\mu\): Global average rating
\(b_u(t)\): Rating bias of user u at time t
\(b_i(t)\): Rating bias of item i at time t
\(\mathbf{h}_t\): Hidden state at time t
\(\mathbf{c}_t\): Cell state at time t
\(\mathbf{f}_t\): Forget gate at time t
\(\mathbf{o}_t\): Output gate at time t
\(\mathbf{r}_t\): Reset gate at time t
\(\mathbf{u}_t\): Update gate at time t
\(\mathbf{U}_x, \mathbf{W}_x\): Weight matrices for gate x
\(\mathbf{w}_x\): Weight vector for gate x
\(\mathbf{b}_x\): Bias for gate x
\(\mathcal{F}(\cdot)\): Non-linear function
\(\sigma(\cdot)\): Sigmoid function
\(\odot\): Element-wise multiplication operator
[…]: Number of songs in the catalog
[…]: Embedding dimensionality
[…]: Set of all users on the platform
\((\mathbf{s}^u)\): Ordered sequence of song vectors user u listened to
\(\mathbf{t}_u\): Taste vector of user u
\(\mathcal{R}(\cdot; \mathbf{W})\): RNN function with parameters W
\(\mathcal{L}(\cdot)\): Loss function
\(\lVert \cdot \rVert_2\): L2 norm
\(L_{\cos}(\cdot)\): Cosine distance
[…]: Uniform distribution between x and y
\(\mathcal{D}\): Dataset of song sequences
min, max: Minimum and maximum sampling offsets
[…]: Learning rate
\(\mathbf{c}_u\): Context vector for user u
[…]: Vector concatenation operator
\(C\): Ordered set of contexts on the Spotify platform
\(C_i\): i-th context in C
[…]: Set of contexts for song s
[…]: One-hot vector of length L with a 1 at position i
\(1_A(x)\): Indicator function: 1 if x ∈ A, else 0
[…]: Time difference between playing songs x and y
\(D_{hid}\): Hidden dimensionality
[…]: Discount factor
\(\mathcal{W}(\cdot; \mathbf{w})\): Weight-based model function with weights w
[…]: Regularization term
\(\zeta(\cdot)\): Riemann zeta function
\(\mathrm{Zipf}_z(\cdot)\): Zipf probability density function with parameter z
rPST, rPLT: Short- and long-term playlist RNN
rHST, rHLT: Short- and long-term user listening history RNN
bWST, bWLT: Short- and long-term weight-based model
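
For orientation, two of the entries above have common textbook forms, written out below. These are the standard definitions of cosine distance and of the Zipf (zeta) distribution over ranks k = 1, 2, ...; the arguments a, b and the rank k are notation assumed here, and the paper's exact formulation (for example a Zipf distribution restricted to a finite range) may differ.

```latex
L_{\cos}(\mathbf{a}, \mathbf{b})
  = 1 - \frac{\mathbf{a}^{\top} \mathbf{b}}
             {\lVert \mathbf{a} \rVert_2 \, \lVert \mathbf{b} \rVert_2},
\qquad
\mathrm{Zipf}_z(k) = \frac{k^{-z}}{\zeta(z)},
\qquad
\zeta(z) = \sum_{n=1}^{\infty} n^{-z}, \quad z > 1.
```

Under these definitions the cosine distance is 0 for vectors pointing in the same direction and 1 for orthogonal vectors, and larger z concentrates the Zipf probability mass on small ranks.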


Cite this article

De Boom, C., Agrawal, R., Hansen, S. et al. Large-scale user modeling with recurrent neural networks for music discovery on multiple time scales. Multimed Tools Appl 77, 15385–15407 (2018).
