
1 Introduction

Recommendation systems have become indispensable in daily life, helping users navigate the abundant data on the Internet. With the rapid expansion of music databases, traditional music recommendation techniques struggle to help listeners choose songs from such huge digital music resources. Managing and recommending music effectively in massive music libraries has therefore become the main task of music recommendation systems [1].

Mainstream recommendation algorithms can be classified as content-based [2, 3], collaborative filtering [5, 25], knowledge-based [6] and hybrid approaches [7]. Collaborative filtering methods recommend items to a user by exploiting the tastes of similar users; however, cold-start and data sparsity problems are very common in collaborative filtering. In knowledge-based approaches, users directly express their requirements and the recommendation system tries to retrieve items that match those requirements. Content-based approaches find items similar to the ones a user liked before; they require content information or expert labels for items, but not a large number of user-item rating records [4]. To improve performance, the above methods can be combined into a hybrid recommendation system. The hybrid approach we use is feature augmentation, which takes the feature output of one method as input to another.

Nowadays, reinforcement learning [8] has become one of the most active research areas. It focuses on learning interactively: an agent obtains feedback from an action-evaluation environment and then improves its choice of actions to adapt to the environment. In this paper, we propose a personalized hybrid recommendation algorithm for music based on reinforcement learning (PHRR). Following the idea of hybrid recommendation, we use a WMF-CNN model, which combines content and collaborative filtering, to learn and predict music features, and we simulate listeners' decision-making behavior with a model-based reinforcement learning process. On this basis, we establish a novel personalized music recommendation model to recommend song sequences that better match listeners' preferences. Our contributions are as follows:

  • Our proposed PHRR algorithm combines WMF-CNN-based music feature extraction with a reinforcement learning model to recommend personalized song sequences to listeners.

  • We improve the way listeners' decision-making behavior is learned, and increase the accuracy of model learning by enhancing the simulation of the interaction process in the reinforcement learning model.

  • We conduct experiments on real-world datasets. The experimental results show that the proposed PHRR algorithm achieves better recommendation performance than the comparison algorithms.

The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 presents the proposed PHRR algorithm in detail. Section 4 reports experimental results and analyses. Section 5 concludes our work.

2 Related Work

Music recommendation differs from recommendation for other services (such as movies or e-books), because implicit user preferences for music are harder to track than the explicit item ratings available in other applications. Besides, users are likely to listen to a song several times. In recent years, music recommendation has been widely studied in both academia and industry. Since music contains a considerable amount of textual and acoustic information, several recommendation algorithms model users' preferences based on extracted textual and acoustic features [24].

Moreover, advanced recommendation approaches have started to apply reinforcement learning [8] to the recommendation process, treating recommendation as a decision problem in order to provide more accurate recommendations. Wang et al. [11] proposed a reinforcement learning framework based on a Bayesian model to balance the exploration and exploitation of users' preferences; to learn user preferences, it uses a Bayesian model that accounts for both audio content and the novelty of recommendations. Chen et al. [12] combined an interest-forgetting mechanism with Markov models, since people's interest in earlier items fades over time; they represented discrete random states with random variables in a Markov chain. Zhang et al. [15] took social networks and Markov chains into account and proposed PRCM, a recommendation algorithm based on collaborative filtering. Considering the influence of song transitions, Liebman et al. [13] added listeners' preferences for transitions between songs to the recommendation process and proposed a reinforcement learning model named DJ-MC. Hu et al. [14] integrated users' feedback into the recommendation model and proposed RLWRec, a Q-learning based window-list recommendation model with a greedy strategy, which trades off the precision and recall of recommendation; however, as a model-free reinforcement learning framework it suffers from data inefficiency.

Different from previous research, we focus more on simulating the interaction process of listeners based on their implicit preferences for songs and song transitions. Our main aim is to capture changes in listeners' preferences sensitively during the recommendation process and to improve the quality of music recommendation.

3 Our Approach

3.1 Music Feature Extraction

As the song transition dataset is not large enough to train a good model on its own, we perform "transfer learning", i.e. the WMF-CNN process, from the larger Million Song Dataset [22]. To extract music features, we use WMF [9, 17], an improved matrix factorization approach for implicit feedback datasets, to compute the feature vectors of a subset of songs. The feature vectors calculated by WMF are then used to train the CNN model [18], which learns to predict the feature vectors of all other songs. Each song's feature vector only needs to be trained once, so training does not take long. Suppose the play count of listener \( u \) for song \( i \) is \( r_{ui} \); for each listener-song pair, we define a preference variable \( p_{ui} \) and a confidence variable \( c_{ui} \) (\( \alpha \) and \( \epsilon \) are hyper-parameters, set to 2.0 and 1e-6 respectively):

$$ p_{ui} = \begin{cases} 1, & r_{ui} > 0 \\ 0, & r_{ui} = 0 \end{cases} $$
(1)
$$ c_{ui} = 1 + \alpha \log \left( 1 + \epsilon^{-1} r_{ui} \right) $$
(2)

The preference variable \( p_{ui} \) indicates whether listener \( u \) has ever listened to song \( i \). If it is 1, we assume that listener \( u \) may like song \( i \). The confidence variable \( c_{ui} \) measures the extent to which listener \( u \) likes song \( i \): a song with a higher play count is more likely to be preferred. The objective function of WMF contains a confidence-weighted mean squared error term and an L2-regularization term, given by Eq. 3.

$$ \min_{x_{*}, y_{*}} \sum_{u,i} c_{ui} \left( p_{ui} - x_{u}^{T} y_{i} \right)^{2} + \lambda \left( \sum_{u} \left\| x_{u} \right\|^{2} + \sum_{i} \left\| y_{i} \right\|^{2} \right) $$
(3)

where \( \lambda \) is the regularization parameter (set to 1e-5), \( x_{u} \) is the latent feature vector of listener \( u \) and \( y_{i} \) is the latent feature vector of song \( i \).
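
As a concrete illustration of this step (a minimal NumPy sketch, not the exact implementation used here), the code below builds the preference and confidence matrices of Eqs. 1-2 from a play-count matrix and performs one closed-form alternating-least-squares update of a listener's latent vector for the weighted objective in Eq. 3. The array names and the toy data are illustrative assumptions.

```python
import numpy as np

alpha, eps, lam = 2.0, 1e-6, 1e-5           # hyper-parameters as set in this section
n_factors = 20                              # latent dimensionality of the song features

# play_counts[u, i] = r_ui (toy data standing in for the listener-song-play count triples)
play_counts = np.random.poisson(0.1, size=(100, 500)).astype(float)

P = (play_counts > 0).astype(float)               # Eq. 1: preference p_ui
C = 1.0 + alpha * np.log1p(play_counts / eps)     # Eq. 2: confidence c_ui

X = 0.01 * np.random.randn(play_counts.shape[0], n_factors)   # listener factors x_u
Y = 0.01 * np.random.randn(play_counts.shape[1], n_factors)   # song factors y_i

def update_listener(u, Y, P, C, lam):
    """One ALS closed-form update of x_u for the confidence-weighted loss in Eq. 3."""
    Cu = np.diag(C[u])
    A = Y.T @ Cu @ Y + lam * np.eye(Y.shape[1])
    b = Y.T @ Cu @ P[u]
    return np.linalg.solve(A, b)

X[0] = update_listener(0, Y, P, C, lam)     # analogous updates alternate over X and Y
```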

In this paper, we use ResNet [26] as the CNN model. The input of the CNN is the mel-frequency cepstral coefficient (MFCC) [19] spectrum of a song, with 500 frames in the time dimension and 12 frequency bins in the frequency dimension. The output is the 20-dimensional predicted latent feature vector of the song. The objective of the CNN is to minimize the sum of the mean squared error (MSE) and the weighted prediction error (WPE), given by Eq. 4 (\( \theta \) denotes the model parameters).

$$ \min_{\theta} \sum_{i} \left\| y_{i} - y_{i}^{\prime} \right\|^{2} + \sum_{u,i} c_{ui} \left( p_{ui} - x_{u}^{T} y_{i}^{\prime} \right)^{2} $$
(4)

where \( y_{i} \) is the feature vector of song \( i \) calculated by WMF, and \( y_{i}^{\prime} \) is the feature vector of song \( i \) predicted by the CNN model.
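
For a concrete picture of this component, the PyTorch sketch below uses a small stand-in CNN rather than the full ResNet of [26]: it maps a 500x12 MFCC input to a 20-dimensional latent vector and implements the combined MSE + WPE loss of Eq. 4. Layer sizes and names are illustrative assumptions, not the configuration used in the experiments.

```python
import torch
import torch.nn as nn

class MFCC2Latent(nn.Module):
    """Small stand-in CNN (the paper uses ResNet [26]) mapping a 500x12 MFCC
    spectrum to a 20-dimensional latent song vector."""
    def __init__(self, latent_dim=20):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((5, 2)),                       # 500x12 -> 100x6
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                    # global average pooling
        )
        self.fc = nn.Linear(64, latent_dim)

    def forward(self, mfcc):                            # mfcc: (batch, 1, 500, 12)
        return self.fc(self.features(mfcc).flatten(1))

def cnn_loss(y_pred, y_wmf, x_listeners, p, c):
    """Eq. 4: MSE against the WMF vectors plus the confidence-weighted
    prediction error over the listener-song pairs in the batch."""
    mse = ((y_pred - y_wmf) ** 2).sum()
    wpe = (c * (p - x_listeners @ y_pred.T) ** 2).sum()
    return mse + wpe
```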

3.2 Problem Description

We model the reinforcement learning based music recommendation problem as an improved Markov decision process (MDP) [10], denoted as a five-tuple \( \left( {S,A,P,R,T} \right) \); the framework is shown in Fig. 1. Given the song set \( M = \{a_{1}, a_{2}, \ldots, a_{n}\} \) and the length \( k \) of the song sequence to recommend, the mathematical description of the MDP model for music recommendation is as follows.

Fig. 1. Reinforcement learning framework for music recommendation.

State set S. The state set, denoted as \( S = \{ (a_{1}, a_{2}, \ldots, a_{i}) \mid 1 \le i \le k \} \), is the set of recommended song sequences, including all intermediate states. A state \( s \in S \) is a song sequence in the recommendation process.

Action set A. The action set \( A \) is the actions of listening to songs in M, denoted as \( A = \{ listening \,to\, song\, a_{i} |a_{i} \in M\} \). An action \( a_{i} \in A \) means listening to song \( a_{i} \).

State transition probability function P. We write \( P\left( {s,a,s^{\prime}} \right) = 1 \) to indicate that taking action \( a \) in state \( s \) leads to state \( s^{\prime} \) with probability 1, and 0 otherwise, i.e. \( P\left( {\left( {a_{1}, a_{2}, \ldots, a_{i}} \right), a, \left( {a_{1}, a_{2}, \ldots, a_{i}, a} \right)} \right) = 1. \)

Reward function R. The reward function \( R\left( {s,a} \right) \) gives the reward obtained when the listener takes action \( a \) in state \( s \), and each listener has a unique reward function. One of our key problems is how to estimate the reward function of a new listener effectively.

Final state T. The final state denoted as \( T = \left\{ {\left( {a_{1} ,a_{2} , \ldots ,a_{k} } \right)} \right\} \) is the final recommended song sequence of length \( k \).

Solving the MDP means finding a policy \( \pi : S \to A \) that yields an action \( \pi \left( s \right) \) for a given state \( s \); the optimal policy \( \pi^{*} \) generates the highest expected total reward. However, the listener's reward function is unknown, so the basic challenge of song sequence recommendation is how to model \( R \) effectively.
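
Since the transition dynamics are deterministic, the MDP can be represented very compactly; the sketch below (with hypothetical type names) simply encodes states as song tuples, appends the chosen song on each transition, and terminates at length k. Only the reward function R needs to be learned.

```python
from typing import Tuple

Song = int                      # a song id from M (illustrative assumption: ids 0..n-1)
State = Tuple[Song, ...]        # a (partial) recommended song sequence

def transition(state: State, action: Song) -> State:
    """Deterministic transition: P((a_1,...,a_i), a, (a_1,...,a_i, a)) = 1,
    i.e. listening to a song simply appends it to the sequence."""
    return state + (action,)

def is_final(state: State, k: int) -> bool:
    """A state is final once the recommended sequence reaches length k."""
    return len(state) == k
```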

3.3 Listener Reward Function Model

For our recommendation problem, the transition probability function \( P \) is known, so the only unknown element is the reward function \( R \). Most existing work on music recommendation only considers listeners' preferences for songs, without considering their preferences for song transitions. Our reward function \( R \) considers listeners' preferences for both songs and song transitions, as shown in Eq. 5.

$$ R\left( {s,a} \right) = R_{s} \left( a \right) + R_{t} \left( {s,a} \right) $$
(5)

where \( R_{s} :A \to R \) is the listener’s preference reward for song \( a \), and \( R_{t} :S \times A \to R \) is the listener’s preference reward for the song transition from song sequence \( s \) to song \( a \).

Listener Reward for Songs.

After the feature extraction of Sect. 3.1, we obtain a 20-dimensional feature vector for each song. We then binarize this vector by sparse coding and use the binarized vector to represent the song's features. As the feature vector is 20-dimensional, it has 20 descriptors, and each descriptor is represented by m binarized feature factors, so the binarized feature vector of song \( a \), denoted \( \theta_{s} \left( a \right) \), is a 20m-dimensional vector. Moreover, each listener has a preference factor corresponding to each binarized song feature factor: for a 20m-dimensional binarized feature vector \( \theta_{s} \), listener \( u \) has a 20m-dimensional preference vector \( \varPhi_{s} \left( u \right) \). Therefore,

$$ R_{s} \left( a \right) = \varPhi_{s} \left( u \right) \cdot \theta_{s} \left( a \right) $$
(6)
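
The following sketch shows one way to realize Eq. 6, assuming each of the 20 descriptors is discretized into m bins via precomputed bin edges (the value of m and the name `bin_edges` are illustrative assumptions, not the paper's exact settings).

```python
import numpy as np

m = 10  # bins per descriptor (illustrative value; the paper leaves m as a parameter)

def binarize_song(features, bin_edges):
    """Sparse-code a 20-dim song feature vector into a 20*m binary vector theta_s(a):
    each of the 20 descriptors activates exactly one of its m bins.
    bin_edges[d] holds the m-1 bin boundaries of descriptor d (e.g. percentiles
    computed over the training songs)."""
    theta = np.zeros(20 * m)
    for d in range(20):
        b = np.digitize(features[d], bin_edges[d])   # bin index in 0..m-1
        theta[d * m + min(b, m - 1)] = 1.0
    return theta

def song_reward(phi_s, theta_s):
    """Eq. 6: R_s(a) = Phi_s(u) . theta_s(a)."""
    return float(phi_s @ theta_s)
```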

Listener Reward for Song Transitions.

When the listener listens to song \( a_{j} \) after song \( a_{i} \), we denote the transition reward as \( r_{t} \left( {a_{i} ,a_{j} } \right) \). The song transition reward \( R_{t} :S \times A \to R \) depends on the current song sequence \( s \) and the next song \( a \) to be listened to, as shown below.

$$ R_{t}\left( s,a \right) = R_{t}\left( \left( a_{1}, \ldots ,a_{t-1} \right), a_{t} \right) = \sum_{i=1}^{t-1} \frac{1}{i^{2}} \, r_{t}\left( a_{t-i}, a_{t} \right) $$
(7)

In Eq. 7, the probability that the i-th most recent song influences the transition reward is \( 1/i \), and this influence is further attenuated over time by another factor of \( 1/i \); the coefficient \( 1/i^{2} \) is the product of these two factors [13].

The calculation of \( r_{t} \left( {a_{i} ,a_{j} } \right) \) is similar to that of \( R_{s} \left( a \right) \), as shown in Eq. 8. We use sparse coding of the song transition feature vector to generate the binarized feature vector \( \theta_{t} \left( {a_{i} ,a_{j} } \right) \), in which each descriptor is represented by \( m^{2} \) binarized feature factors. Similar to \( \varPhi_{s} \left( u \right) \), listener \( u \) has a 20\( m^{2} \)-dimensional preference vector \( \varPhi_{t} \left( u \right) \) for the 20\( m^{2} \)-dimensional binarized feature vector \( \theta_{t} \left( {a_{i} ,a_{j} } \right) \).

$$ r_{t} \left( {a_{i} ,a_{j} } \right) = \varPhi_{t} \left( u \right)\cdot\theta_{t} \left( {a_{i} ,a_{j} } \right) $$
(8)
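
A short sketch of Eqs. 7-8 is given below; it assumes a helper `theta_t_fn(a_i, a_j)` that returns the binarized transition feature vector, and it takes the listener's transition preference vector \( \varPhi_{t}(u) \) as an argument. The helper name is an illustrative assumption.

```python
def transition_reward(phi_t, theta_t_fn, sequence, next_song):
    """Eqs. 7-8: reward for appending next_song to the current sequence, a
    1/i^2-discounted sum of pairwise rewards r_t(a_{t-i}, a_t), each computed
    as Phi_t(u) . theta_t(a_{t-i}, a_t)."""
    total = 0.0
    for i, prev_song in enumerate(reversed(sequence), start=1):
        total += (1.0 / i ** 2) * float(phi_t @ theta_t_fn(prev_song, next_song))
    return total
```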

Listener Preference for Songs.

We keep listeners' historical song sequences whose length is greater than \( k_{s} \). To make \( \varPhi_{s} \left( u \right) \) initially uniform, we initialize each factor of \( \varPhi_{s} \left( u \right) \) to \( 1/(k_{s} + bins) \), where \( bins \) indicates the discretization granularity of the song features and equals the m above. For each song \( a_{i} \) in the listener's historical song sequence, we add \( 1/(k_{s} + bins) \cdot \theta_{s} \left( {a_{i} } \right) \) to \( \varPhi_{s} \left( u \right) \) iteratively, so that the features of song \( a_{i} \) are learned. After \( \varPhi_{s} \left( u \right) \) is computed, we normalize it so that the weights of the m factors corresponding to each descriptor sum to 1.

Listener Preference for Song Transitions.

Similar to the computation of \( \varPhi_{s} \left( u \right) \), the length of the song transition sequence is \( k_{s} - 1 \), denoted \( k_{t} \). To make \( \varPhi_{t} \left( u \right) \) initially uniform, we initialize each factor of \( \varPhi_{t} \left( u \right) \) to \( 1/(k_{t} + bint) \), where the value of \( bint \) is \( bins \times bins \). The song transition pattern of the historical song sequence is taken as the transition pattern the listener prefers. For each transition from \( a_{i} \) to \( a_{j} \) in the historical song sequence, we add \( 1/(k_{t} + bint) \cdot \theta_{t} \left( {a_{i} ,a_{j} } \right) \) to \( \varPhi_{t} \left( u \right) \) iteratively. After \( \varPhi_{t} \left( u \right) \) is computed, we normalize it in the same way as \( \varPhi_{s} \left( u \right) \), as sketched below.
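
The sketch below illustrates how such a preference vector can be accumulated from a listener's history as described above; calling it with `block = bins` on song vectors yields \( \varPhi_{s}(u) \), and calling it with `block = bint` on transition vectors yields \( \varPhi_{t}(u) \). It is a simplified reading of the procedure, not the exact implementation.

```python
import numpy as np

def learn_preference(history_thetas, block):
    """Accumulate a preference vector from a listener's history (Sect. 3.3):
    initialize every factor to 1/(k + block), add theta/(k + block) for each
    observed song (or transition), then renormalize each descriptor's block of
    factors to sum to 1. Use block = bins for Phi_s and block = bint for Phi_t."""
    k = len(history_thetas)
    phi = np.full(history_thetas[0].shape, 1.0 / (k + block))
    for theta in history_thetas:
        phi += theta / (k + block)
    phi = phi.reshape(-1, block)                 # one row per descriptor
    phi /= phi.sum(axis=1, keepdims=True)
    return phi.ravel()
```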

3.4 Next-Song Recommendation Model

To reduce the time and space complexity, we use a hierarchical search heuristic [20] to recommend the next song, and the search is only performed in the part of the space where \( R_{s} \) is relatively high (line 1 of Algorithm 1). Besides, we take into account a horizon problem similar to that in Go-playing algorithms: the first step of the path with the highest total reward is chosen as the next step (lines 9-14).

Algorithm 1. Next-song recommendation.

Since the song space is very large, it is not feasible to select songs from the complete song set \( M \). To alleviate this problem, we cluster songs by song type to reduce the search complexity (line 6). Clustering by song type is performed with the \( \delta \)-medoids algorithm [21], a method for representative selection.
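
Because Algorithm 1 is only shown as a figure, the sketch below illustrates the search strategy described in the text: restrict the candidates to cluster representatives with relatively high \( R_{s} \), sample horizon-length continuations, and return the first song of the highest-reward continuation. The random-rollout strategy and all parameter names are illustrative assumptions.

```python
import random

def recommend_next(state, candidates, horizon, n_rollouts,
                   song_reward, transition_reward):
    """Sample horizon-length continuations over a reduced candidate set
    (e.g. cluster representatives with relatively high R_s) and return the
    first song of the highest-reward continuation. song_reward(a) and
    transition_reward(sequence, a) are closures over the listener's
    preference vectors from Sect. 3.3."""
    best_first, best_value = None, float("-inf")
    for _ in range(n_rollouts):
        seq, value = list(state), 0.0
        rollout = [random.choice(candidates) for _ in range(horizon)]
        for song in rollout:
            value += song_reward(song) + transition_reward(tuple(seq), song)
            seq.append(song)
        if value > best_value:
            best_first, best_value = rollout[0], value
    return best_first
```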

3.5 Song Sequence Recommendation and Update Model

To recommend a song sequence, we define \( r_{adj} \) as \( \log \left( {r_{i} /\overline{r} } \right) \), which determines the direction and magnitude of the update (lines 2-5 of Algorithm 2). If \( r_{adj} \) is positive, the listener likes the recommended song and the update direction is positive, and vice versa. The relative contributions of the song reward \( R_{s} \) and the song transition reward \( r_{t} \) to the total reward are computed as \( w_{s} \) and \( w_{t} \) respectively, as shown in Eqs. 9 and 10.

$$ w_{s} = \frac{{R_{s} \left( {a_{i} } \right)}}{{R_{s} \left( {a_{i} } \right) + r_{t} \left( {a_{i - 1} ,a_{i} } \right)}} $$
(9)
$$ w_{t} = \frac{{r_{t} \left( {a_{i - 1} ,a_{i} } \right)}}{{R_{s} \left( {a_{i} } \right) + r_{t} \left( {a_{i - 1} ,a_{i} } \right)}} $$
(10)
$$ \varPhi_{s} = \frac{i}{i + 1} \cdot \varPhi_{s} + \frac{1}{i + 1} \cdot \theta_{s} \cdot w_{s} \cdot r_{adj} $$
(11)
$$ \varPhi_{t} = \frac{i}{i + 1} \cdot \varPhi_{t} + \frac{1}{i + 1} \cdot \theta_{t} \cdot w_{t} \cdot r_{adj} $$
(12)

The preference vectors \( \varPhi_{s} \) and \( \varPhi_{t} \) are then updated based on \( r_{adj} \), \( w_{s} \) and \( w_{t} \) according to Eqs. 11 and 12, and are renormalized afterwards. This update process accounts for changes in the listener's interest over time and balances trust in history against new rewards (lines 6-7).

Algorithm 2. Song sequence recommendation and model update.
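
The update step of Eqs. 9-12 can be sketched as follows (a simplified reading of Algorithm 2; the per-descriptor renormalization follows Sect. 3.3, and all argument names are illustrative).

```python
import numpy as np

def update_preferences(phi_s, phi_t, theta_s, theta_t, R_s, r_t,
                       reward, avg_reward, i, bins, bint):
    """Eqs. 9-12: compute r_adj = log(r_i / r_bar), split it between the song
    and transition preference vectors according to their relative reward
    contributions, blend with the history-weighted old vectors, and
    renormalize each descriptor's block of factors."""
    r_adj = np.log(reward / avg_reward)
    w_s = R_s / (R_s + r_t)                       # Eq. 9
    w_t = r_t / (R_s + r_t)                       # Eq. 10
    phi_s = (i / (i + 1)) * phi_s + (1 / (i + 1)) * theta_s * w_s * r_adj   # Eq. 11
    phi_t = (i / (i + 1)) * phi_t + (1 / (i + 1)) * theta_t * w_t * r_adj   # Eq. 12

    def renorm(phi, block):
        phi = phi.reshape(-1, block)
        return (phi / phi.sum(axis=1, keepdims=True)).ravel()

    return renorm(phi_s, bins), renorm(phi_t, bint)
```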

4 Experiment

4.1 Datasets

Million Song Dataset.

Million Song Dataset (MSD) [22] is a dataset of audio features for one million songs, providing strong support for the CNN model in learning and extracting music features. The dataset is available at http://labrosa.ee.columbia.edu/millionsong/.

Taste Profile Subset Dataset.

The Taste Profile Subset [22], shown in Table 1, consists of listener-song-play count triples and provides a sufficient amount of data for WMF. The dataset is available at https://labrosa.ee.columbia.edu/millionsong/.

Table 1. A dataset of listener-song-playcount

Historical Song Playlist Dataset.

This dataset is collected from the music website Yes.com [23] and is available at http://lme.joachims.org/. As shown in Table 2, it contains 51,260 historical song sequences.

Table 2. Listeners’ historical song playlist dataset

4.2 Comparison Algorithms and Evaluation Methods

Comparison Algorithms.

We compare PHRR with the baselines listed below. For the historical song playlist dataset, we use 90% of the data for training and the remaining 10% for testing.

PHRR-S: the PHRR recommendation algorithm without taking song transitions into account.

DJ-MC [13]: a reinforcement learning model that adds listeners' preferences for transitions between songs to the recommendation process.

PRCM [15]: PRCM algorithm is a collaborative filtering recommendation algorithm taking the social network and Markov chain into account.

PopRec [16]: PopRec algorithm recommends the most popular songs.

RandRec: RandRec algorithm recommends songs randomly.

Evaluation Methods.

Our evaluation metrics include hit ratio of the recommended next-songs and F1-score of the recommended song sequences.

Hit Ratio (HR). We calculate the hit ratio of the recommended next songs for evaluation. In the historical song sequence dataset, the first \( n \) songs of each song sequence are used to recommend the (n+1)-th song. We compare the recommended (n+1)-th song with the true (n+1)-th song in the actual song sequence: if they are the same, it is a hit; otherwise, it is not.

F1-Score (F1). The second evaluation metric is the F1-score, which combines the precision and recall of recommendation; the score computed by Eqs. 13-15 is used to evaluate the quality of song sequence recommendation.

$$ Precision = \frac{\left| S_{p} \cap S_{t} \right|}{\left| S_{p} \right|} $$
(13)
$$ Recall = \frac{\left| S_{p} \cap S_{t} \right|}{\left| S_{t} \right|} $$
(14)
$$ Score = \frac{2 \times Precision \times Recall}{Precision + Recall} $$
(15)

where \( S_{p} \) denotes the set of songs in the recommended song sequence and \( S_{t} \) denotes the set of songs in the corresponding sequence from the historical song sequence dataset.
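
For clarity, the two metrics can be computed as in the sketch below (illustrative helper names; the hit ratio is averaged over test cases, and the F1-score follows Eqs. 13-15 with sequences treated as song sets).

```python
def hit_ratio(recommended_next, true_next):
    """HR: fraction of test cases whose recommended (n+1)-th song equals the
    true (n+1)-th song."""
    hits = sum(1 for rec, true in zip(recommended_next, true_next) if rec == true)
    return hits / len(true_next)

def f1_score(recommended_seq, true_seq):
    """Eqs. 13-15 for one recommended sequence, treating sequences as song sets."""
    s_p, s_t = set(recommended_seq), set(true_seq)
    overlap = len(s_p & s_t)
    if overlap == 0:
        return 0.0
    precision = overlap / len(s_p)
    recall = overlap / len(s_t)
    return 2 * precision * recall / (precision + recall)
```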

4.3 Experimental Results on Hit Ratio

The proposed PHRR algorithm recommends song sequences. In this comparison experiment, recommendation performance is measured by the hit ratio of the recommended next songs.

Performance Comparison on Hit Ratio.

HR@k is the hit ratio when the k most probable candidates for the next song are recommended; a case counts as a hit if the true next song is among them. The hit ratios of the above recommendation algorithms are shown in Table 3, with the best results boldfaced.

Table 3. Performance comparison on hit ratio

Effect of Training Length of Song Sequence on Hit Ratio.

The reinforcement learning process is based on feedback from interaction and simulates listeners' decision behavior. The longer the training song sequence, the more simulated interactions occur in the reinforcement learning process. The effect of training length on hit ratio is shown in Fig. 2(b).

Fig. 2. Experimental results on hit ratio. (a) Comparison of hit ratio across algorithms. (b) Effect of training length of song sequence on hit ratio. (c) Effect of horizon length on hit ratio.

Effect of Horizon Length on Hit Ratio.

When recommending the next song, we consider a horizon problem similar to that in Go-playing algorithms: we choose the first song of the song sequence with the highest total reward as the next song (Algorithm 1). The effect of horizon length on hit ratio is shown in Fig. 2(c).

Experimental Results and Analyses.

Figure 2(a) shows that the hit ratio of PHRR is 7% higher than PRCM, 10% higher than DJ-MC, 11% higher than PHRR-S and 20% higher than PopRec, while the hit ratio of RandRec is as low as 1%. Figure 2(b) indicates that the hit ratio for training sequence length n = 15 is higher than for n = 10 or n = 5: the longer the training sequence, the higher the hit ratio of the recommended next songs and the more accurate the recommendation. Figure 2(c) shows that, as the horizon length increases, the hit ratio of the recommended next songs also tends to increase.

4.4 Experimental Results on F1-Score

In this section, we use the F1-score to measure the performance of the above algorithms on song sequence recommendation.

Performance Comparison on F1-Score.

The F1-scores of the above recommendation algorithms are shown in Table 4. F1@k denotes the F1-score of a recommended song sequence of length k, and the best results are boldfaced.

Table 4. Performance comparison on F1-score
Fig. 3. Experimental results on F1-score. (a) Comparison of F1-score across algorithms. (b) Effect of training length of song sequence on F1-score. (c) Effect of horizon length on F1-score.

Effect of Training Length of Song Sequence on F1-Score.

Compared with other reinforcement learning based algorithms, PHRR improves precision by enhancing the simulation of the interaction process. The effect of song sequence training length on F1-score is shown in Fig. 3(b).

Effect of Horizon Length on F1-Score.

In the next-song recommendation stage (Algorithm 1), we only recommend the first song of the song sequence with the highest total reward instead of the entire sequence, because noise accumulates during the self-updating process and would make the model vary more. The effect of horizon length on F1-score is shown in Fig. 3(c).

Experimental Results and Analyses.

As shown in Fig. 3(a), the F1-score of PHRR is on average 4% higher than DJ-MC, 6% higher than PHRR-S, 11% higher than PRCM and 20% higher than PopRec. PHRR enhances the simulation of the listener's interaction in the reinforcement learning process, which the other algorithms do not consider. Figure 3(b) shows that the F1-score for training length n = 15 is higher than for n = 10 or n = 5; a longer training length provides more opportunities to enhance the simulation of interaction. Figure 3(c) indicates that the F1-score increases slightly as the horizon length grows. The horizon length should not be too long, however, because an overly long horizon does not significantly improve performance but increases complexity.

5 Conclusion

In this paper, we propose a personalized hybrid recommendation algorithm for music based on reinforcement learning (PHRR) to recommend higher-quality song sequences. WMF and a CNN are trained to learn song feature vectors from the songs' audio signals. We present a model-based reinforcement learning framework to simulate the decision-making behavior of listeners, and model the recommendation problem as a Markov decision process based on listeners' preferences for both songs and song transitions. To capture minor changes in listeners' preferences sensitively, we enhance the simulation of the interaction process so that the model is updated more data-efficiently. Experiments conducted on real-world datasets demonstrate that PHRR achieves better music recommendation performance than the comparison algorithms.

In the future, we will incorporate more human behavioral characteristics into the model and analyze the role of these characteristics in recommendation.