The amount of content on online music streaming platforms is immense, and most users only access a tiny fraction of this content. Recommender systems are the application of choice to open up the collection to these users. Collaborative filtering has the disadvantage that it relies on explicit ratings, which are often unavailable, and generally disregards the temporal nature of music consumption. On the other hand, item co-occurrence algorithms, such as the recently introduced word2vec-based recommenders, are typically left without an effective user representation. In this paper, we present a new approach to model users through recurrent neural networks by sequentially processing consumed items, represented by any type of embeddings and other context features. This way we obtain semantically rich user representations, which capture a user’s musical taste over time. Our experimental analysis on large-scale user data shows that our model can be used to predict future songs a user will likely listen to, both in the short and long term.
There is an excellent web article by Radim Rehurek from 2014 which studies this in depth, see http://rare-technologies.com/performance-shootout-of-nearest-neighbors-querying.
Cedric De Boom is funded by a PhD grant of the Research Foundation - Flanders (FWO). We greatly thank Nvidia for its donation of a Tesla K40 and Titan X GPU to support the research of the IDLab group at Ghent University.
Appendix: Table of symbols
In order of appearance:
\(\mathbf{v}_u(t)\) | User vector for user u at time t |
\(\mathbf{v}_i(t)\) | Item vector for item i at time t |
\(r_{ui}\) | Rating of item i by user u |
\(\mu\) | Global average rating |
\(b_u(t)\) | Rating bias of user u at time t |
\(b_i(t)\) | Rating bias of item i at time t |
\(\mathbf{h}_t\) | Hidden state at time t |
\(\mathbf{c}_t\) | Cell state at time t |
\(\mathbf{f}_t\) | Forget gate at time t |
\(\mathbf{o}_t\) | Output gate at time t |
\(\mathbf{r}_t\) | Reset gate at time t |
\(\mathbf{u}_t\) | Update gate at time t |
\(\mathbf{U}_x, \mathbf{W}_x\) | Weight matrices for gate x |
\(\mathbf{w}_x\) | Weight vector for gate x |
\(\mathbf{b}_x\) | Bias for gate x |
\(\mathcal{F}(\cdot)\) | Non-linear function |
\(\sigma(\cdot)\) | Sigmoid function |
\(\odot\) | Element-wise multiplication operator |
\(N\) | Number of songs in the catalog |
\(D\) | Embedding dimensionality |
\(U\) | Set of all users on the platform |
\((\mathbf{s}_u)\) | Ordered sequence of song vectors user u listened to |
\(\mathbf{t}_u\) | Taste vector of user u |
\(\mathcal{R}(\cdot; \mathbf{W})\) | RNN function with parameters W |
\(\mathcal{L}(\cdot)\) | Loss function |
\(\|\cdot\|_2\) | L2 norm |
\(L_{\cos}(\cdot)\) | Cosine distance |
\(\mathrm{unif}\{x, y\}\) | Uniform distribution between x and y |
\(\mathcal{D}\) | Dataset of song sequences |
\(\ell_{\min}, \ell_{\max}\) | Minimum and maximum sampling offsets |
\(\eta\) | Learning rate |
\(\mathbf{c}_u\) | Context vector for user u |
\(\oplus\) | Vector concatenation operator |
\(C\) | Ordered set of contexts on the Spotify platform |
\(C_i\) | i-th context in C |
\(c(s)\) | Set of contexts for song s |
\(\mathrm{onehot}(i, L)\) | One-hot vector of length L with a 1 at position i |
\(\mathbf{1}_A(x)\) | Indicator function: 1 if x ∈ A, else 0 |
\(\Delta(x, y)\) | Time difference between playing songs x and y |
\(D_{\mathrm{hid}}\) | Hidden dimensionality |
\(\gamma\) | Discount factor |
\(\mathcal{W}(\cdot; \mathbf{w})\) | Weight-based model function with weights w |
\(\lambda\) | Regularization term |
\(\zeta(\cdot)\) | Riemann zeta function |
\(\mathrm{Zipf}_z(\cdot)\) | Zipf probability density function with parameter z |
\(r_{\mathrm{PST}}, r_{\mathrm{PLT}}\) | Short- and long-term playlist RNN |
\(r_{\mathrm{HST}}, r_{\mathrm{HLT}}\) | Short- and long-term user listening history RNN |
\(b_{\mathrm{WST}}, b_{\mathrm{WLT}}\) | Short- and long-term weight-based model |
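For readers who prefer code to notation, a few of the elementary operators in the table above can be sketched in plain Python. This is only an illustration under stated assumptions, not the implementation used in the paper: the function names are hypothetical, the cosine distance is taken to be the standard one minus cosine similarity, and position i in the one-hot vector is treated as 0-based.

```python
import math


def l2_norm(x):
    """L2 norm of a vector given as a list of floats."""
    return math.sqrt(sum(v * v for v in x))


def cosine_distance(x, y):
    """Cosine distance, assumed here to be 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(x, y))
    return 1.0 - dot / (l2_norm(x) * l2_norm(y))


def onehot(i, length):
    """One-hot vector of the given length with a 1 at position i (0-based, an assumption)."""
    return [1.0 if j == i else 0.0 for j in range(length)]


def indicator(x, A):
    """Indicator function: 1 if x is an element of the set A, else 0."""
    return 1 if x in A else 0


def concat(u, v):
    """Vector concatenation: the entries of u followed by the entries of v."""
    return list(u) + list(v)


if __name__ == "__main__":
    # Toy vectors, invented purely for illustration.
    taste = [0.2, 0.9, 0.1]
    song = [0.1, 0.8, 0.3]
    print(cosine_distance(taste, song))
    print(concat(onehot(1, 3), [float(indicator("playlist", {"playlist", "album"}))]))
```

A smaller cosine distance means the two vectors point in more similar directions, which is why it is a natural way to compare a taste vector with a song vector of the same dimensionality D.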
Cite this article
De Boom, C., Agrawal, R., Hansen, S. et al. Large-scale user modeling with recurrent neural networks for music discovery on multiple time scales. Multimed Tools Appl 77, 15385–15407 (2018). https://doi.org/10.1007/s11042-017-5121-z
DOI: https://doi.org/10.1007/s11042-017-5121-z