1 Introduction

Recommendation technology [1], which helps users pick out the items they are interested in from a wide range of candidates, is an important tool for alleviating the information overload problem, and therefore attracts a great deal of attention from both academia and industry. It is of great practical value in many web service scenarios such as e-commerce, advertisement, multimedia (movies, music, etc.), and even social networking.

It is natural to expect that the type of information (also called “context” in a broad sense) introduced into a recommendation system has an important impact on the quality of the recommendation service, with each type of context having its own advantages. In this paper, we divide the context in a recommendation system into two categories, external context and internal context, as illustrated in Fig. 1. External context includes physical context (e.g., time, location) [2], social context (e.g., trust, companion, festival) [14] and so on, while internal context mainly refers to users’ direct preferences in the form of complex latent patterns hidden in the known rating matrix [7, 9]. In recent years, more and more researchers have turned to context-aware recommendation systems [4], which make predictions by introducing various kinds of external context, believing that appropriate external context can effectively improve recommendation performance. However, doing so may make the model inflexible and computationally expensive. Furthermore, external context may not be available in many systems due to design flaws in early versions, so algorithms that only make use of internal context still occupy an important place in the research community.

Fig. 1.

Illustration of two categories of context in recommendation systems.

Different from context-aware recommendation systems, traditional recommendation systems (typically collaborative filtering systems [4, 7, 9, 10]) only consider internal context, that is, the ratings assigned by users to items. A recently proposed model called matrix factorization with multiclass preference context (MF-MPC) [9] is a unified method that combines the two major categories of collaborative filtering – neighborhood-based [4] and model-based [7, 10]. Briefly, MF-MPC improves on SVD (a matrix factorization method) [10] by adding a vector derived from the multiclass preference context of a given user, which represents the user similarities in a neighborhood-based method. In this paper, we further introduce a matrix factorization model that combines not only user similarities but also item similarities, which is thus called MF with dual MPC (MF-DMPC for short). We also study the effects of user similarities and item similarities separately. Experimental results show that our MF-DMPC performs better than MF-MPC [9], and that the degree of improvement is probably affected by the ratio of the user group size to the item group size, as well as the density of the rating matrix. MF-DMPC inherits not only the high accuracy of model-based recommendation algorithms, but also the good explainability of neighborhood-based algorithms.

Table 1. Some notations and explanations.

2 Preliminaries

2.1 Problem Definition

In this paper, we study the problem of making good use of internal context in recommendation systems, which means that we only need an incomplete rating matrix represented by \(\mathcal {R} = \{(u,i,r_{ui})\}\) for our prediction task, where u is the ID of one of the n users (or rows), i is the ID of one of the m items (or columns), and \(r_{ui}\) is the recorded rating of user u to item i (\(r_{ui} \in \mathbb {M}\), where \(\mathbb {M}\) can be \(\{1,2,3,4,5\}\), \(\{0.5,1,1.5,\ldots ,5\}\) or another range). We build an improved model based on MF-MPC [9] to estimate the missing entries of the rating matrix; in other words, we study the problem of rating prediction. The notations used to establish our model are shown in Table 1.

2.2 Multiclass Preference Context

In the well-known matrix factorization based model – the SVD model [10] – the prediction rule for the rating of user u to item i is as follows,

$$\begin{aligned} \hat{r}_{ui} = U_{u\cdot } V_{i\cdot }^T + b_u + b_i + \mu , \end{aligned}$$
(1)

where \(U_{u\cdot } \in \mathbb {R}^{1 \times d}\) and \(V_{i\cdot } \in \mathbb {R}^{1 \times d}\) are the user-specific and item-specific latent feature vectors, respectively, and \(b_u\), \(b_i\) and \(\mu \) are the user bias, the item bias and the global average, respectively.
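As a concrete illustration, the SVD prediction rule in Eq. (1) can be sketched in a few lines of NumPy (the function and variable names below are ours, not from [10]):

```python
import numpy as np

def predict_svd(U, V, b_u, b_i, mu, u, i):
    """Eq. (1): r_hat = U_u V_i^T + b_u + b_i + mu.

    U: n x d user factors, V: m x d item factors,
    b_u: n user biases, b_i: m item biases, mu: global average rating.
    """
    return U[u] @ V[i] + b_u[u] + b_i[i] + mu
```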

In the MF-MPC model [9], the rating of user u to item i, i.e., \(r_{ui}\), can be represented in a probabilistic way as follows:

$$\begin{aligned} P(r_{ui}|(u,i);(u,i',r_{ui'}), i' \in \cup _{r \in \mathbb {M}}\mathcal {I}_u^r \backslash \{i\}), \end{aligned}$$
(2)

which means that \(r_{ui}\) depends not only on the (user, item) pair (u, i), but also on the examined items \(i' \in \mathcal {I}_u \backslash \{i\}\) and the categorical score \(r_{ui'} \in \mathbb {M}\) of each such item. Notice that the multiclass preference context (MPC) refers to the condition \((u,i',r_{ui'}), i' \in \cup _{r \in \mathbb {M}}\mathcal {I}_u^r \backslash \{i\}\).

In order to introduce multiclass preference context into matrix factorization based model, we need a user-specific latent preference vector \(\bar{U}^{MPC}_{u\cdot }\) for user u from the multiclass preference context [9],

$$\begin{aligned} \bar{U}^{MPC}_{u\cdot } = \sum _{r \in \mathbb {M}} \frac{1}{\sqrt{|\mathcal {I}^{r}_u \backslash \{i\}|}} \sum _{i' \in \mathcal {I}^{r}_u \backslash \{i\}} M^{r}_{i'\cdot }. \end{aligned}$$
(3)

Notice that \(M^r_{i\cdot } \in \mathbb {R}^{1 \times d}\) is a classified item-specific latent feature vector and \(\frac{1}{\sqrt{|\mathcal {I}^{r}_u \backslash \{i\}|}}\) serves as a normalization term for the preference of class r.
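Equation (3) can be computed directly from a user's rated items grouped by score. The sketch below assumes a hypothetical data layout of our own: `M` maps each score r to the matrix of class-specific item factors \(M^r\), and `items_by_score` maps each score r to the set of items that user u rated with r:

```python
import numpy as np

def user_mpc(M, items_by_score, exclude_item):
    """Eq. (3): user-based MPC vector for one user.

    M: dict score r -> (m x d) array of class-specific item factors M^r.
    items_by_score: dict score r -> set of items the user rated with score r.
    exclude_item: the target item i, left out of its own prediction.
    """
    d = next(iter(M.values())).shape[1]
    u_mpc = np.zeros(d)
    for r, items in items_by_score.items():
        rest = items - {exclude_item}          # I_u^r \ {i}
        if rest:
            u_mpc += sum(M[r][j] for j in rest) / np.sqrt(len(rest))
    return u_mpc
```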

If we take the inner product of the MPC vectors of two users u and \(u'\) (\(\bar{U}^{MPC}_{u\cdot }\) and \(\bar{U}^{MPC}_{u'\cdot }\)), and set \(\mathbb {M} = \{1\}\) for simplicity, we have

$$\begin{aligned} \left\langle \bar{U}^{MPC}_{u\cdot }, \bar{U}^{MPC}_{u'\cdot } \right\rangle \approx \frac{\sum _{i' \in \mathcal {I}_u} M^{r}_{i'\cdot } \cdot \sum _{i' \in \mathcal {I}_{u'} } M^{r}_{i'\cdot }}{\sqrt{|\mathcal {I}_u| |\mathcal {I}_{u'}|}}, \end{aligned}$$
(4)

which is quite similar to the cosine-based similarity [11] – a measure of the similarity between two users shown as follows:

$$\begin{aligned} sim(u,u') = cos(\varvec{R}_{u,*}, \varvec{R}_{u',*}) = \frac{\varvec{R}_{u,*} \cdot \varvec{R}_{u',*}}{\left\| \varvec{R}_{u,*}\right\| _2 \left\| \varvec{R}_{u',*}\right\| _2}, \end{aligned}$$
(5)

where \(\varvec{R}\) is the \(n \times m\) user-item matrix. Therefore we believe that multiclass preference context can represent user similarities.
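For reference, the cosine-based similarity of Eq. (5) between two users' rating rows can be written as:

```python
import numpy as np

def cosine_sim(r_u, r_v):
    """Eq. (5): cosine of the angle between two users' rating rows."""
    return (r_u @ r_v) / (np.linalg.norm(r_u) * np.linalg.norm(r_v))
```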

By adding the neighborhood information \(\bar{U}^{MPC}_{u\cdot }\) to the SVD model, we get the MF-MPC prediction rule for the rating of user u to item i [9]:

$$\begin{aligned} \hat{r}_{ui} = U_{u\cdot } V_{i\cdot }^T + \bar{U}^{MPC}_{u\cdot } V_{i\cdot }^T + b_u + b_i + \mu , \end{aligned}$$
(6)

where \(U_{u\cdot }\), \(V_{i\cdot }\), \(b_u\), \(b_i\) and \(\mu \) are exactly the same as in the SVD model. MF-MPC has been shown to achieve better recommendation performance than SVD [10] and SVD++ [7], and also contains them as special cases.

3 Matrix Factorization with Dual Multiclass Preference Context

Inspired by the differences between user-based and item-based collaborative filtering [4], we can infer that item similarities (item-based multiclass preference context) can also be introduced to improve the performance of the matrix factorization model. Furthermore, thanks to the extensibility of the MF model, we can introduce both user-based and item-based MPC into the prediction rule, obtaining an improved model – matrix factorization with dual multiclass preference context (MF-DMPC). The derivation process is illustrated in Fig. 2.

Fig. 2.

Illustration of user-based MF-MPC, item-based MF-MPC and our MF-DMPC.

3.1 Dual Multiclass Preference Context

We now call \(\bar{U}^{MPC}_{u\cdot }\) (introduced in the previous section) the user-based multiclass preference context (user-based MPC). Before defining the dual multiclass preference context (DMPC), we first define the item-based multiclass preference context (item-based MPC) \(\bar{V}^{MPC}_{i\cdot }\) to represent item similarities. Symmetrically, we have

$$\begin{aligned} \bar{V}^{MPC}_{i\cdot } = \sum _{r \in \mathbb {M}} \frac{1}{\sqrt{|\mathcal {U}^{r}_i \backslash \{u\}|}} \sum _{u' \in \mathcal {U}^{r}_i \backslash \{u\}} N^{r}_{u'\cdot }, \end{aligned}$$
(7)

where \(N^r_{u\cdot } \in \mathbb {R}^{1 \times d}\) is a classified user-specific latent feature vector. We then have the item-based MF-MPC prediction rule,

$$\begin{aligned} \hat{r}_{ui} = U_{u\cdot } V_{i\cdot }^T + \bar{V}_{i\cdot }^{MPC} U_{u\cdot }^T + b_u + b_i + \mu . \end{aligned}$$
(8)

When choosing between a user-based and an item-based neighborhood recommender system, we often consider criteria such as accuracy, efficiency, and so on. However, these properties largely depend on the ratio between the number of users and the number of items in the system.

The good news is that, through the matrix factorization framework, we can introduce both user-based and item-based neighborhood information into our model by keeping both \(\bar{U}^{MPC}_{u\cdot }\) and \(\bar{V}^{MPC}_{i\cdot }\) in the model, collectively called the dual multiclass preference context (DMPC).

3.2 Prediction Rule and Optimization Problem

For matrix factorization with dual multiclass preference context, the prediction rule for the rating of user u to item i is defined as follows,

$$\begin{aligned} \hat{r}_{ui} = U_{u\cdot } V_{i\cdot }^T + \bar{U}^{MPC}_{u\cdot } V_{i\cdot }^T + {\bar{V}_{i\cdot }}^{MPC} U_{u\cdot }^T + b_u + b_i + \mu , \end{aligned}$$
(9)

where all notations are as described above. We call our new model MF-DMPC for short.
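The prediction rule in Eq. (9) can be sketched as follows, assuming the MPC vectors \(\bar{U}^{MPC}_{u\cdot }\) and \(\bar{V}^{MPC}_{i\cdot }\) have already been computed from Eqs. (3) and (7); the names here are our own:

```python
import numpy as np

def predict_dmpc(U, V, u_mpc, v_mpc, b_u, b_i, mu, u, i):
    """Eq. (9): MF-DMPC prediction combining user- and item-based MPC.

    u_mpc: user-based MPC vector for user u (length d),
    v_mpc: item-based MPC vector for item i (length d).
    """
    return (U[u] @ V[i] + u_mpc @ V[i] + v_mpc @ U[u]
            + b_u[u] + b_i[i] + mu)
```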

With the prediction rule, we can learn the model parameters by solving the following minimization problem,

$$\begin{aligned} \min _\varTheta \sum ^n_{u=1} \sum ^m_{i=1} y_{ui} [ \frac{1}{2} (r_{ui}-\hat{r}_{ui})^2 + reg(u,i)], \end{aligned}$$
(10)

where \(reg(u,i) = \frac{\alpha _m}{2} \sum _{r\in \mathbb {M}} \sum _{i' \in \mathcal {I}^{r}_u \backslash \{i\}} ||M^r_{i'\cdot }||^2_F + \frac{\alpha _n}{2} \sum _{r\in \mathbb {M}} \sum _{u' \in \mathcal {U}^{r}_i \backslash \{u\}} ||N^r_{u'\cdot }||^2_F\) + \(\frac{\alpha _u}{2}||U_{u\cdot }||^2 + \frac{\alpha _v}{2}||V_{i\cdot }||^2 + \frac{\beta _u}{2}||b_{u}||^2 + \frac{\beta _v}{2}||b_{i}||^2\) is the regularization term used to avoid overfitting, and \(\varTheta = \{U_{u\cdot },V_{i\cdot },b_u,b_i,\mu ,M^r_{i\cdot },N^r_{u\cdot }\}\), \(u = 1,2,\dots ,n\), \(i = 1,2,\dots ,m\), \(r\in \mathbb {M}\). Notice that the objective function of MF-DMPC is quite similar to that of MF-MPC. The difference lies in the “dual” MPC, i.e., \({\bar{V}_{i\cdot }}^{MPC} U_{u\cdot }^T\) in the prediction rule and \(\frac{\alpha _n}{2} \sum _{r\in \mathbb {M}} \sum _{u' \in \mathcal {U}^{r}_i \backslash \{u\}} ||N^r_{u'\cdot }||^2_F\) in the regularization term.

3.3 Algorithm

Using the stochastic gradient descent (SGD) algorithm, we have the gradients of the model parameters for a randomly sampled rating record \((u, i, r_{ui})\),

$$\begin{aligned} \nabla U_{u\cdot }&= -e_{ui} (V_{i\cdot } + \bar{V}^{MPC}_{i\cdot }) + \alpha _u U_{u\cdot } \end{aligned}$$
(11)
$$\begin{aligned} \nabla V_{i\cdot }&= -e_{ui} (U_{u\cdot } + \bar{U}^{MPC}_{u\cdot }) + \alpha _v V_{i\cdot } \end{aligned}$$
(12)
$$\begin{aligned} \nabla b_u&= -e_{ui} +\beta _u b_u \end{aligned}$$
(13)
$$\begin{aligned} \nabla b_i&= -e_{ui} +\beta _v b_i \end{aligned}$$
(14)
$$\begin{aligned} \nabla \mu&= -e_{ui} \end{aligned}$$
(15)
$$\begin{aligned} \nabla M^{r}_{i'\cdot }&= \frac{-e_{ui} V_{i\cdot }}{\sqrt{|\mathcal {I}^{r}_u \backslash \{i\}|}} + \alpha _m M^{r}_{i'\cdot }, i' \in \mathcal {I}^{r}_u \backslash \{i\}, r \in \mathbb {M}. \end{aligned}$$
(16)
$$\begin{aligned} \nabla N^{r}_{u'\cdot }&= \frac{-e_{ui} U_{u\cdot }}{\sqrt{|\mathcal {U}^{r}_i \backslash \{u\}|}} + \alpha _n N^{r}_{u'\cdot }, u' \in \mathcal {U}^{r}_i \backslash \{u\}, r \in \mathbb {M}. \end{aligned}$$
(17)

where \(e_{ui} = (r_{ui} - \hat{r}_{ui})\) is the difference between the true rating and the predicted rating.

We then have the update rule,

$$\begin{aligned} \theta = \theta - \gamma \nabla \theta \end{aligned}$$
(18)

where \(\gamma \) is the learning rate, and \(\theta \in \varTheta \) is a model parameter to be learned.
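One SGD step of Eqs. (11)–(15) and (18) can be sketched as below; for brevity the sketch treats the MPC vectors as fixed inputs and omits the updates of \(M^r\) and \(N^r\) (Eqs. (16)–(17)), which follow the same pattern. All names are our own:

```python
import numpy as np

def sgd_step(U, V, b_u, b_i, mu, u, i, r_ui, u_mpc, v_mpc,
             alpha_u=0.01, alpha_v=0.01, beta_u=0.01, beta_v=0.01,
             gamma=0.01):
    """One SGD update on a sampled record (u, i, r_ui)."""
    # Eq. (9): prediction, then the error e_ui.
    r_hat = (U[u] @ V[i] + u_mpc @ V[i] + v_mpc @ U[u]
             + b_u[u] + b_i[i] + mu)
    e = r_ui - r_hat
    grad_U = -e * (V[i] + v_mpc) + alpha_u * U[u]   # Eq. (11)
    grad_V = -e * (U[u] + u_mpc) + alpha_v * V[i]   # Eq. (12)
    U[u] -= gamma * grad_U                          # Eq. (18) for each theta
    V[i] -= gamma * grad_V
    b_u[u] -= gamma * (-e + beta_u * b_u[u])        # Eq. (13)
    b_i[i] -= gamma * (-e + beta_v * b_i[i])        # Eq. (14)
    mu -= gamma * (-e)                              # Eq. (15)
    return mu  # floats are immutable, so the new global average is returned
```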

The algorithm of MF-DMPC (see Fig. 3) consists of three major steps. First, we randomly pick a rating record from the training data. Second, we calculate the gradients via Eqs. (11)–(17). Third, we update each model parameter via Eq. (18). The major difference between the algorithm of MF-DMPC and that of MF-MPC [9] lies in the prediction rule shown in Eq. (9) and the corresponding gradients.

Fig. 3.

The algorithm of MF-DMPC.

In terms of time complexity, the ordering of MF-MPC [9], SVD++ [7] and the proposed MF-DMPC is \(\text {MF-DMPC}> \text {MF-MPC} > \text {SVD++}\), mainly because of the traversals required to calculate \(\bar{U}^{MPC}_{u\cdot }\) and \(\bar{V}^{MPC}_{i\cdot }\) in MF-DMPC, \(\bar{U}^{MPC}_{u\cdot }\) in MF-MPC, and \(\bar{U}^{OPC}_{u\cdot }\) (the oneclass preference context defined in [9]) in SVD++. The space complexity can be estimated from the sizes of the dominating model parameter vectors shown in Table 2. In general, our MF-DMPC consumes more time and memory than the closely related algorithms.

Table 2. The size of dominating model parameters vectors in different models.

4 Experiments

In this section, we investigate what benefit MF-DMPC brings in exchange for its additional resource consumption. We expect it to improve accuracy, which is probably related to the effects of the different kinds of MPC (user-based MPC, item-based MPC and dual MPC).

4.1 Data Sets and Evaluation Metrics

For convenience, we use the same data sets as previous research on MF-MPC [9]: three public data sets from the GroupLens research lab, namely MovieLens100K (ML100K), MovieLens1M (ML1M) and MovieLens10M (ML10M). Table 3 shows their statistics. Notice that the ratio of the user group size to the item group size and the density of the rating matrix are important factors in analyzing the experimental results, due to the different characteristics of the neighborhood-based algorithms. We use five-fold cross validation in the empirical studies.

Table 3. Statistics of the data sets used in the experiments.

We adopt mean absolute error (MAE) and root mean square error (RMSE) as evaluation metrics:

$$\begin{aligned} MAE= & {} \sum _{(u,i,r_{ui}) \in \mathcal {R}^{te}} |r_{ui} - \hat{r}_{ui}|/|\mathcal {R}^{te}| \\ RMSE= & {} \sqrt{\sum _{(u,i,r_{ui}) \in \mathcal {R}^{te}} (r_{ui} - \hat{r}_{ui})^2/|\mathcal {R}^{te}|} \end{aligned}$$
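The two metrics can be computed over the test set \(\mathcal {R}^{te}\) as follows (a small sketch with our own names):

```python
import numpy as np

def mae_rmse(test_records, predict):
    """MAE and RMSE over (u, i, r_ui) test records, given predict(u, i)."""
    errs = np.array([r - predict(u, i) for (u, i, r) in test_records])
    return np.mean(np.abs(errs)), np.sqrt(np.mean(errs ** 2))
```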

4.2 Baselines and Parameter Settings

In order to find out the effects of introducing different kinds of MPC into matrix factorization (MF) model, we compare the performance of SVD (see Eq. (1)) against that achieved by matrix factorization with user-based MPC (see Eq. (6)), item-based MPC (see Eq. (8)) and dual MPC (see Eq. (9)).

We configure the parameter settings of factorization-based methods as follows:

  • For the learning rate \(\gamma \), we set it to a commonly used default value, that is \(\gamma = 0.01\).

  • For the number of latent dimensions d, it is enough to show the advantages of introducing MPC when \(d = 20\) (according to [9]).

  • We set the iteration number \(T = 50\), by which point the results have stabilized.

  • The tradeoff parameters are searched on the first copy of each data set using the RMSE metric, subject to the following conditions: \(\alpha _u = \alpha _v = \beta _u = \beta _v = \alpha \), \(\alpha \in \{0.001, 0.01, 0.1\}\); for user-based MF-MPC, \(\alpha _m = \alpha \); for item-based MF-MPC, \(\alpha _n = \alpha \); and for dual MF-MPC, \(\alpha _m,\alpha _n \in \{0.001,0.01,0.1\}\).
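The tradeoff-parameter search amounts to a small grid search. In the sketch below, `validate_rmse` is a hypothetical callback that would train the model on the first copy of a data set with the given configuration and return the validation RMSE:

```python
import itertools

def search_tradeoffs(validate_rmse, grid=(0.001, 0.01, 0.1)):
    """Return the tradeoff-parameter setting with the lowest validation RMSE.

    alpha is shared by alpha_u, alpha_v, beta_u and beta_v; alpha_m and
    alpha_n (for dual MPC) are searched on the same grid.
    """
    candidates = [{"alpha": a, "alpha_m": am, "alpha_n": an}
                  for a, am, an in itertools.product(grid, grid, grid)]
    return min(candidates, key=validate_rmse)
```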

4.3 Results

The experimental results are shown in Table 4. Notice that the tradeoff parameters shown in the table are the best values found by the search for each method.

Table 4. Recommendation performance of our MF-DMPC and other baseline methods on three MovieLens data sets.

From the results in Table 4, we can have the following observations:

  • The accuracy of the factorization framework improves considerably when multiclass preference context is introduced;

  • Among all kinds of MPC, dual MPC contributes the most to minimizing the prediction error;

  • In the MovieLens data sets, whether user-based or item-based MPC is more helpful depends on the ratio of the user group size to the item group size (n/m). Item-based MF-MPC normally performs better when n/m is moderate; as n/m grows larger, user-based MPC becomes more important (possibly affected by additional factors such as the density of the rating matrix); and

  • The performance of MF-DMPC is largely bounded by the better of user-based and item-based MF-MPC – it is slightly better than that result. This improvement shows that MF-DMPC strikes a good balance between user-based MPC and item-based MPC.

Last but not least, MF-DMPC inherits not only the high accuracy of model-based algorithms, but also the good explainability of neighborhood-based algorithms.

5 Related Work

5.1 Recommendation with Internal Context

Recommendation with internal context refers primarily to collaborative filtering, which considers only the rating data given by users. Collaborative filtering (or simply “CF”) methods can be broadly separated into two categories, neighborhood-based and model-based (see Fig. 4). Neighborhood-based CF, which is based on the assumption that users with similar interests have similar preferences for an item, makes predictions by calculating user similarities (as in the user-based CF method) or item similarities (as in the item-based CF method) [11]. Model-based CF (represented by SVD methods) obtains latent feature vectors of users and items through a matrix factorization model and uses them to predict scores. The SVD++ method proposed by Koren [7], which can be regarded as an attempt to combine neighborhood-based and model-based methods by adding a latent feature vector (actually a latent form of user similarities), achieves better results. After analyzing the merits and demerits of SVD++, Pan et al. proposed the MF-MPC method [9]. Instead of ignoring the categorical scores, this method makes full use of multiclass preference context and contains SVD++ as a special case, thereby achieving better results. As an upgraded version of MF-MPC, our MF-DMPC model is also a typical example of using internal context. There are also different views on modeling the rating scores, such as categorical, numerical and ordinal [6], from which richer preference contexts may be derived.

Fig. 4.

Illustration of related model-based algorithms.

5.2 Recommendation with External Context

For a long time, researchers searched for ways to make predictions using only users’ explicit feedback in the form of (user, item, rating) records, due to the limited access to further information about users in web services. However, with the rapid spread of big data, we are now able to make use of more context in recommendation algorithms. Context-aware recommendation has been developed and applied, thanks to findings in behavioral research on consumer decisions. For example, temporal context is likely to have an impact on a travel recommendation system. In contrast with traditional recommendation, context-aware recommendation typically deals with data records of the form (user, item, rating, context), which means that it takes more contextual information (mainly relevant external context) into consideration. According to [4], there are four major approaches to modeling contextual information in context-aware recommendation systems, distinguished by the change (static, dynamic) of contextual factors and the knowledge (fully observable, partially observable, unobservable) about them. [4] also introduces the three main algorithmic paradigms for incorporating contextual information into rating-based recommender systems: contextual pre-filtering, post-filtering, and modeling.

5.3 Discussions

From single to hybrid CF recommendation methods, great progress has been made on recommendation with internal context. However, there is still room for improvement, because the two-dimensional (2D) data (user, item) is also included in recommendation with external context, and the way traditional methods exploit this 2D data can be illuminating for many other methods, such as the prediction rule of timeSVD++ [8].

Different from ours, timeSVD++ is a representative method that incorporates external context while also adopting the matrix factorization model. TimeSVD++ takes the temporal effects on the baseline predictors and user preferences into consideration. Its prediction rule is as follows [8]:

$$\begin{aligned} \hat{r}_{ui} = U_{u\cdot }(t_{ui}) V_{i\cdot }^T + \bar{U}^{OPC}_{u\cdot } V_{i\cdot }^T + b_u(t_{ui}) + b_i(t_{ui}) + \mu , \end{aligned}$$
(19)

where the user bias \(b_u\), item bias \(b_i\) and user-specific latent feature vector \(U_{u\cdot }\) of the SVD++ model are replaced by the time-changing parts \(b_u(t_{ui})\), \(b_i(t_{ui})\) and \(U_{u\cdot }(t_{ui})\), respectively. Notice that \(\bar{U}^{OPC}_{u\cdot }\) is the oneclass form of MPC (\(\mathbb {M} = \{0,1\}\)). It is reported that a timeSVD++ model of dimension 10 is already more accurate than an SVD model of dimension 200, which is evidence of the importance of capturing proper external context (temporal dynamics, in this case). Multiclass preference context (as defined in our MF-DMPC) can also be introduced into timeSVD++ because of the shared factorization framework. We do not elaborate further on improving timeSVD++, since we mention it here only to highlight the significance of recommendation with internal context in the field of recommendation systems.

Notably, the approaches mentioned above are all applied in isolated services and ignore service sociability. Lately, however, a methodology has been proposed to construct a global social service network for social influence-aware service recommendation, which provides recommend-as-you-go [3]. From one perspective, this work provides a way to connect recommendation approaches used in different services and thereby achieves better results.

6 Conclusions and Future Work

In this paper, we present a novel collaborative filtering method that incorporates neighborhood information into a factorization model for rating prediction. Specifically, we extend multiclass preference context (MPC) to two types, user-based and item-based, and combine them in a single prediction rule in order to achieve better recommendation performance than the reference models.

For future work, as discussed, the inefficiency of our model is an unavoidable problem and the first one we must solve. Second, we are interested in studying the robustness of factorization-based algorithms with preference context. We also expect advanced strategies such as adversarial sampling [13], denoising [12] or multilayer perceptrons [5] to be useful.