
Large-scale user modeling with recurrent neural networks for music discovery on multiple time scales

Multimedia Tools and Applications

Abstract

The amount of content on online music streaming platforms is immense, and most users access only a tiny fraction of it. Recommender systems are the application of choice for opening up the collection to these users. Collaborative filtering has the disadvantage that it relies on explicit ratings, which are often unavailable, and it generally disregards the temporal nature of music consumption. Item co-occurrence algorithms, on the other hand, such as the recently introduced word2vec-based recommenders, typically lack an effective user representation. In this paper, we present a new approach to modeling users with recurrent neural networks that sequentially process consumed items, represented by any type of embedding together with other context features. In this way we obtain semantically rich user representations, which capture a user's musical taste over time. Our experimental analysis on large-scale user data shows that our model can be used to predict the songs a user will likely listen to in the future, both in the short and the long term.
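To make the core idea concrete, here is a minimal sketch of such a model. It is not the authors' implementation (which, per the notes below, was built on Theano/Lasagne); this version uses PyTorch, and all names and dimensions are hypothetical. A GRU reads a user's listening history as a sequence of pre-trained song embeddings, emits a taste vector in the same embedding space, and is trained with a cosine-distance loss toward the embedding of a song the user plays later.

```python
import torch
import torch.nn as nn

class TasteRNN(nn.Module):
    """Sketch only: a GRU over a user's song-embedding sequence,
    projected back into the song-embedding space as a taste vector."""
    def __init__(self, emb_dim=40, hidden_dim=128):
        super().__init__()
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.project = nn.Linear(hidden_dim, emb_dim)  # back to song space

    def forward(self, song_seq):           # (batch, seq_len, emb_dim)
        _, h_last = self.gru(song_seq)     # final hidden state
        return self.project(h_last[-1])    # (batch, emb_dim) taste vector

def cosine_loss(pred, target):
    # 1 - cosine similarity, averaged over the batch
    return (1 - nn.functional.cosine_similarity(pred, target, dim=1)).mean()

# Hypothetical training step: batches of 30-song histories with 40-d
# embeddings; the target is the embedding of a song played later on.
model = TasteRNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
histories = torch.randn(64, 30, 40)
future_song = torch.randn(64, 40)

optimizer.zero_grad()
loss = cosine_loss(model(histories), future_song)
loss.backward()
optimizer.step()
```

At recommendation time, the predicted taste vector is matched against the catalog with an approximate nearest-neighbour search (see note 1 below on Annoy).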



Notes

  1. github.com/spotify/annoy; a brief usage sketch follows these notes.

  2. github.com/Lasagne/Lasagne.

  3. There is an excellent web article by Radim Rehurek from 2014 that studies this in depth; see http://rare-technologies.com/performance-shootout-of-nearest-neighbors-querying.
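As a concrete illustration of note 1: once song embeddings and predicted taste vectors live in the same space, Annoy can index the catalog for fast approximate nearest-neighbour retrieval. The dimensionality, tree count, and toy data below are made up for the example; the calls are Annoy's documented API.

```python
import random
from annoy import AnnoyIndex  # note 1: github.com/spotify/annoy

DIM = 40  # hypothetical embedding dimensionality
index = AnnoyIndex(DIM, 'angular')  # angular distance ~ cosine

# Toy stand-in for the catalog's song embeddings
for song_id in range(10_000):
    index.add_item(song_id, [random.gauss(0, 1) for _ in range(DIM)])

index.build(50)  # 50 trees: more trees give better recall, larger index

# The 10 catalog songs closest to a user's predicted taste vector
taste_vector = [random.gauss(0, 1) for _ in range(DIM)]
recommendations = index.get_nns_by_vector(taste_vector, 10)
```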


Acknowledgments

Cedric De Boom is funded by a PhD grant from the Research Foundation - Flanders (FWO). We gratefully thank Nvidia for donating a Tesla K40 and a Titan X GPU in support of the research of the IDLab group at Ghent University.

Author information

Corresponding author: Cedric De Boom.

Appendix: Table of symbols

In order of appearance:

\(\mathbf{v}_u(t)\): User vector for user u at time t
\(\mathbf{v}_i(t)\): Item vector for item i at time t
\(r_{ui}\): Rating of item i by user u
\(\mu\): Global average rating
\(b_u(t)\): Rating bias of user u at time t
\(b_i(t)\): Rating bias of item i at time t
\(\mathbf{h}_t\): Hidden state at time t
\(\mathbf{c}_t\): Cell state at time t
\(\mathbf{f}_t\): Forget gate at time t
\(\mathbf{o}_t\): Output gate at time t
\(\mathbf{r}_t\): Reset gate at time t
\(\mathbf{u}_t\): Update gate at time t
\(\mathbf{U}_x, \mathbf{W}_x\): Weight matrices for gate x
\(\mathbf{w}_x\): Weight vector for gate x
\(b_x\): Bias for gate x
\(\mathcal{F}(\cdot)\): Non-linear function
\(\sigma(\cdot)\): Sigmoid function
\(\odot\): Element-wise multiplication operator
\(N\): Number of songs in the catalog
\(D\): Embedding dimensionality
\(U\): Set of all users on the platform
\((s^u)\): Ordered sequence of song vectors user u listened to
\(\mathbf{t}_u\): Taste vector of user u
\(\mathcal{R}(\cdot; \mathbf{W})\): RNN function with parameters W
\(\mathcal{L}(\cdot)\): Loss function
\(\|\cdot\|_2\): L2 norm
\(L_{\cos}(\cdot)\): Cosine distance
\(\mathrm{unif}\{x, y\}\): Uniform distribution between x and y
\(\mathcal{D}\): Dataset of song sequences
\(\mathit{min}, \mathit{max}\): Minimum and maximum sampling offsets
\(\eta\): Learning rate
\(\mathbf{c}_u\): Context vector for user u
\(\oplus\): Vector concatenation operator
\(C\): Ordered set of contexts on the Spotify platform
\(C_i\): i'th context in C
\(c(s)\): Set of contexts for song s
\(\mathrm{onehot}(i, L)\): One-hot vector of length L with a 1 at position i
\(\mathbb{1}_A(x)\): Indicator function: 1 if \(x \in A\), else 0
\(\Delta(x, y)\): Time difference between playing songs x and y
\(D_{hid}\): Hidden dimensionality
\(\gamma\): Discount factor
\(\mathcal{W}(\cdot; \mathbf{w})\): Weight-based model function with weights w
\(\lambda\): Regularization term
\(\zeta(\cdot)\): Riemann zeta function
\(\mathrm{Zipf}_z(\cdot)\): Zipf probability density function with parameter z
\(r_P^{ST}, r_P^{LT}\): Short- and long-term playlist RNN
\(r_H^{ST}, r_H^{LT}\): Short- and long-term user listening history RNN
\(b_W^{ST}, b_W^{LT}\): Short- and long-term weight-based model
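As a reading aid, the gate symbols above fit the standard GRU equations. This is the textbook formulation, reconstructed here rather than copied from the paper (conventions for the role of \(\mathbf{u}_t\) vary), writing \(\mathbf{x}_t\) for the input song vector at step t:

\[
\begin{aligned}
\mathbf{r}_t &= \sigma(\mathbf{W}_r \mathbf{x}_t + \mathbf{U}_r \mathbf{h}_{t-1} + \mathbf{b}_r), \\
\mathbf{u}_t &= \sigma(\mathbf{W}_u \mathbf{x}_t + \mathbf{U}_u \mathbf{h}_{t-1} + \mathbf{b}_u), \\
\tilde{\mathbf{h}}_t &= \mathcal{F}\left(\mathbf{W}_h \mathbf{x}_t + \mathbf{U}_h (\mathbf{r}_t \odot \mathbf{h}_{t-1}) + \mathbf{b}_h\right), \\
\mathbf{h}_t &= \mathbf{u}_t \odot \mathbf{h}_{t-1} + (1 - \mathbf{u}_t) \odot \tilde{\mathbf{h}}_t.
\end{aligned}
\]

Likewise, assuming the usual definitions, the cosine distance used as a loss and the Zipf distribution (presumably normalized by the Riemann zeta function listed above) read:

\[
L_{\cos}(\mathbf{x}, \mathbf{y}) = 1 - \frac{\mathbf{x}^\top \mathbf{y}}{\|\mathbf{x}\|_2\,\|\mathbf{y}\|_2},
\qquad
\mathrm{Zipf}_z(k) = \frac{k^{-z}}{\zeta(z)}.
\]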


Cite this article

De Boom, C., Agrawal, R., Hansen, S. et al. Large-scale user modeling with recurrent neural networks for music discovery on multiple time scales. Multimed Tools Appl 77, 15385–15407 (2018). https://doi.org/10.1007/s11042-017-5121-z
