1 Introduction

Recommender systems are useful tools for overcoming the information overload problem faced by users. These systems provide each user with personalized recommendations of items that he/she might like, based on past preferences or observed behavior toward one or more items. An essential problem in real-world recommender systems is that users are likely to change their preferences over time. This evolution of user preferences is known in the literature as temporal dynamics (Koren 2010) and may have various causes. According to Koren (2010), Rafailidis et al. (2017), and Lo et al. (2018), the most important of these are: (i) User experiences: past interactions between users and items make users like some items and dislike others. For example, if a user is satisfied with a purchase on an auction website, then he/she will probably continue buying from it in the future. (ii) New items: the appearance of new items may shift the focus of users. For example, users usually like to explore new items over time instead of interacting repeatedly with the same items. (iii) Social influence: friends’ preferences may affect a user’s decisions and change the user’s preferences over time. (iv) Item popularity: popular items may affect user interactions, regardless of the user’s past preferences. For example, if there is a popular action movie, a user who is mainly interested in romantic films may still choose to watch it.

Modeling the temporal dynamics of user preferences is essential when designing a recommender system (Koren 2010; Shokeen and Rana 2018), as it leads to significant improvements in recommendation accuracy (Zafari et al. 2019; Rana and Jain 2015; Cheng et al. 2015). However, modeling how user preferences evolve over time poses several challenges. First, because the amount of available data is dramatically reduced within a particular time period, the data sparsity problem (Yusefi Hafshejani et al. 2018) becomes more severe (Lo et al. 2018). Moreover, since the pattern of change over time may differ from user to user (Rafailidis and Nanopoulos 2016; Tang et al. 2015), how can temporal information be incorporated to capture the preference dynamics of each individual user? Finally, what is an efficient approach to modeling the dynamics of user preferences in order to generate more accurate recommendations? To address these questions, in this paper we present a Temporal and Social Collective Matrix Factorization model, called TSCMF. The model captures user preference dynamics within the collective matrix factorization (CMF) framework (Singh and Gordon 2008) to perform temporal recommendation. CMF is an extension of matrix factorization (MF) that takes side information into account, leading to more effective latent features. Taking into account that user preferences can change individually over time, and based on the intuition that social influence can affect users’ preferences in a recommender system, we jointly factorize the users’ rating matrix and the social trust matrix through a joint objective function. We adopt the stochastic gradient descent (SGD) method and present an efficient optimization algorithm for solving this objective function. In our model, we assume that user preferences change smoothly (Lo et al. 2018; Tang et al. 2015) and that a user’s preferences in the current time period depend on his/her preferences in the previous time period. Therefore, we introduce into CMF a transition matrix for each individual user, which is learned to model that user’s dynamics between two successive time periods. Experimental results on a real-world dataset, Epinions, show that our proposed model outperforms competitive methods. In addition, the complexity analysis indicates that our model can scale to large datasets.

The remainder of this paper is structured as follows. The next section reviews related work. Section 3 defines our problem and details our proposed model. Section 4 reports the experimental results. Finally, Section 5 presents our conclusions and directions for future research.

2 Related work

Some studies capture the dynamics of user preferences in recommender systems by computing user or item neighborhoods. These approaches generally boost recent ratings and penalize older ratings, which may be less relevant at recommendation time, by employing time windows or a decay function (Vinagre 2012). For instance, Su et al. (2015) and Liu et al. (2010) give more weight to recently rated items and gradually reduce the importance of earlier rated items in rating prediction using an exponential time decay function. These methods assume that preference dynamics are homogeneous across all users, whereas changes in user preferences may be individual. A similar method was proposed in Cheng and Wang (2020), which takes into account that different users have different degrees of sensitivity to time. However, the primary challenge in these approaches is that it is hard to estimate an appropriate weighting scheme (Rabiu et al. 2020; Zhang 2015).
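To illustrate the decay-function idea, the Python sketch below predicts a rating as a time-weighted average of neighbors' ratings with weight \( \exp(-\alpha \, \Delta t) \), where \( \Delta t \) is the age of a rating. The decay rate `alpha`, the function names, and the plain (not similarity-weighted) average are assumptions made for illustration; they are not the exact formulations of Su et al. (2015) or Liu et al. (2010).

```python
import math

def decay_weight(t_now, t_rated, alpha):
    """Exponential time decay: weight 1.0 for a rating made at t_now, shrinking with age."""
    return math.exp(-alpha * (t_now - t_rated))

def decayed_prediction(neighbor_ratings, t_now, alpha=0.1):
    """Time-weighted average of the (rating, timestamp) pairs given by a user's neighbors.

    Returns None when there are no neighbor ratings to aggregate.
    """
    num = 0.0
    den = 0.0
    for rating, t_rated in neighbor_ratings:
        w = decay_weight(t_now, t_rated, alpha)
        num += w * rating
        den += w
    return num / den if den > 0 else None

# A recent 5-star rating outweighs an old 1-star rating, so the estimate is close to 5.
print(decayed_prediction([(5, 9.0), (1, 0.0)], t_now=10.0, alpha=0.5))
```

Because a single `alpha` is shared by every user, the sketch also exhibits the homogeneity assumption discussed above.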

The most widely used technique for implementing temporal recommender systems is matrix factorization (MF) (Yin et al. 2014). MF offers relatively high accuracy and scalability (Lo et al. 2018; Yang et al. 2017). In this technique, each user and each item is characterized by a vector of latent factors. MF decomposes the matrix of users’ ratings on items into two low-dimensional matrices, which map users and items, respectively, into a shared latent feature space; these latent features are then used to predict user behavior. TimeSVD++ (Koren 2010) is the first popular MF-based method for modeling user preference dynamics. This model adopts singular value decomposition (SVD), the most basic matrix factorization technique (Yang et al. 2014). TimeSVD++ incorporates time-varying rating biases of each item and user into MF and assumes that older ratings are less important for rating prediction. The parameters of this method for different aspects and time periods must be learned separately, so it requires considerable effort for parameter tuning (Lo et al. 2018). Lo et al. (2018) proposed a temporal MF method to capture the temporal dynamics of each individual user’s preferences. This model uses both the rating information within a specific time period and the overall rating information to learn the latent feature vector of each user in each time period, via a modified SGD algorithm. The method then learns a linear model with Lasso regression to extract the transition pattern of each user’s latent feature vector. Ju et al. (2015) presented an approach based on multi-task non-negative MF that uses a transition matrix to map between users’ latent features in two successive time periods in order to track the temporal dynamics of user preferences. The transition matrix used in this method is fixed, whereas in practice this matrix differs for each user and each time period. Zhang et al. (2014) proposed a temporal MF (TMF) approach that captures the temporal dynamics of user preferences by designing a transition matrix for each user’s latent feature vector between two successive time periods. They then extended this approach to a fully Bayesian treatment, called BTMF, by introducing priors on the hyperparameters to control the complexity and improve the accuracy of TMF. Sun et al. (2014) proposed a dynamic MF approach based on collaborative Kalman filtering. This method extends Gaussian probabilistic MF to capture user preference dynamics using a transition matrix of users’ features based on a dynamical state space model. To learn the model parameters from users’ historical preferences, it uses an expectation-maximization (EM) algorithm with a Kalman filter in the expectation step. Despite the comprehensiveness of this method, its transition matrix is homogeneous across all users. Moreover, the method is impractical for large datasets because of its high running time.
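As an example of a time-varying bias, Koren (2010) models the drift of user u's bias as \( b_{u}(t) = b_{u} + \alpha_{u} \, \mathrm{dev}_{u}(t) \), with \( \mathrm{dev}_{u}(t) = \mathrm{sign}(t - t_{u}) \, |t - t_{u}|^{\beta} \), where \( t_{u} \) is the mean date of the user's ratings and \( \beta = 0.4 \) in that paper. The sketch below implements only this component plus a static item bias, with fixed illustrative parameter values instead of learned ones; the other TimeSVD++ terms (time-binned item biases, day-specific effects, and the factor part) are omitted, and the function names are hypothetical.

```python
import math

def dev_u(t, t_mean, beta=0.4):
    """sign(t - t_mean) * |t - t_mean| ** beta: signed, sublinear distance from the mean rating day."""
    diff = t - t_mean
    return math.copysign(abs(diff) ** beta, diff)

def user_bias(t, b_u, alpha_u, t_mean, beta=0.4):
    """Drifting user bias b_u(t) = b_u + alpha_u * dev_u(t).

    In timeSVD++ b_u and alpha_u are learned per user; here they are plain inputs.
    """
    return b_u + alpha_u * dev_u(t, t_mean, beta)

def baseline(t, mu, b_u, alpha_u, t_mean, b_i):
    """Baseline estimate mu + b_u(t) + b_i, with a static item bias for simplicity."""
    return mu + user_bias(t, b_u, alpha_u, t_mean) + b_i

# A user whose mean rating day is 100 rates on day 132: dev_u = 32 ** 0.4 = 4.
print(round(user_bias(132, 0.1, 0.05, 100), 6))
```

The sublinear exponent makes the bias move quickly near the user's mean rating date and more slowly far from it; ratings before the mean date shift the bias in the opposite direction.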

The aforementioned methods exploit only a single type of user-item interaction (users’ rating information), without any side information. Exploiting side information about users or items (Sun et al. 2019) in addition to users’ rating information can help alleviate the data sparsity problem and thus provide users with better personalized recommendations (Pan 2016). In this regard, a series of MF-based studies exploit side information in temporal recommender systems. Wu et al. (2018) proposed an MF-based method that fuses ratings, review texts, and relationships between items while considering the temporal dynamics of user preferences to improve prediction results. The authors use TimeSVD++ as part of the model to capture temporal dynamics. However, this method has difficulty predicting ratings for new users. Moreover, it assumes that the number of latent factors in ratings equals the number of hidden topics in reviews, whereas, as the authors point out, the number of latent factors is larger than the number of hidden topics. CMF is an effective method that can be employed in recommender systems to simultaneously factorize multiple related matrices, such as rating and trust matrices. Li and Fu (2017) proposed a temporal CMF method for generating recommendations. This work jointly factorizes multimodal user-item interactions to extract users’ temporal patterns, introducing a transition matrix of users’ preferences between two successive user latent feature matrices. Similarly, Rafailidis et al. (2017) proposed a dynamic CMF approach to predict user behavior, which introduces a transition matrix of users’ behaviors. This method models the temporal dynamics between users’ purchase activity and click responses, and it exploits side information to alleviate the sparsity problem. A major limitation of these last two methods is that their transition matrix is shared by all users.

Social trust information accumulated in social networks is a rich source of information for addressing the aforementioned sparsity problem (Shokeen and Rana 2018), and many researchers have recently incorporated it into their recommendation models (Guo et al. 2016; Wu et al. 2016). A user is more likely to be influenced by users whom he/she trusts; therefore, trust relations between users affect users’ preferences. Although trust information is also very sparse, especially within a single time period, it is complementary to rating information. Using collective preferences and social trust between users in a social recommender system as additional input can help make recommendations more accurate and personalized (Bao et al. 2013). Tong et al. (2019) presented an SVD-based method that integrates rating, trust, and time information to model user preference dynamics. This method includes time-variant biases for each item and each user; however, the users’ feature vectors are not optimized with temporal information. Aravkin et al. (2016) developed a framework that incorporates trust relations into a dynamic MF model to capture user preference dynamics. The method defines a transition matrix of users’ preferences, models the trust relations among users in each time period as a graph, and adds a dynamics regularization term that incorporates known trust relations via the graph Laplacian. This method assumes that preference dynamics are homogeneous across all users. Liu et al. (2013) proposed an approach that exploits heterogeneous user feedback as well as time and social network information for more accurate movie recommendation. It proposes a ranking-based MF model that combines implicit and explicit user feedback and extends it to a sequential MF model that enables time-aware parameterization. Bao et al. (2013) proposed an approach based on social probabilistic MF that exploits both temporal and social information to predict user preferences in micro-blogging. In this method, users’ latent features and the associated topics are constructed from previous latent features by employing an exponential time decay function. The method treats the importance of the previous and current time periods as identical for all users, assigning the same weight to every user. In practice, however, the importance of previous time periods varies from user to user.
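Graph-Laplacian regularization of the kind mentioned for Aravkin et al. (2016) rests on a standard identity: for a symmetric trust-weight matrix A with degree matrix D and Laplacian \( L_{G} = D - A \), \( \mathrm{tr}(U^{T} L_{G} U) = \frac{1}{2} \sum_{i,k} A_{ik} \parallel U_{i} - U_{k} \parallel^{2} \). Penalizing this quantity therefore pulls the latent vectors of connected users toward each other. The plain-Python sketch below checks the identity on a toy three-user trust chain; it is a generic illustration, not the exact regularizer of that paper, and the function names are hypothetical.

```python
def laplacian(A):
    """Graph Laplacian L_G = D - A of a symmetric weight matrix A given as a list of rows."""
    n = len(A)
    return [[float((sum(A[i]) if i == k else 0) - A[i][k]) for k in range(n)] for i in range(n)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def trace_form(U, lap):
    """tr(U^T L_G U) = sum over i, k of L_G[i][k] * <U_i, U_k>."""
    n = len(U)
    return sum(lap[i][k] * dot(U[i], U[k]) for i in range(n) for k in range(n))

def pairwise_penalty(U, A):
    """0.5 * sum over i, k of A[i][k] * ||U_i - U_k||^2."""
    n = len(U)
    return 0.5 * sum(A[i][k] * sum((a - b) ** 2 for a, b in zip(U[i], U[k]))
                     for i in range(n) for k in range(n))

# Toy trust chain 0 - 1 - 2 (symmetric weights) and 2-dimensional user factors.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
U = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
print(trace_form(U, laplacian(A)), pairwise_penalty(U, A))  # prints: 3.0 3.0
```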

Some studies exploit tensor factorization (TF) (Frolov and Oseledets 2017; Oh et al. 2019) to model user preference dynamics. In these studies, TF extends MF to a three-dimensional tensor by adding time as a third dimension. Xiong et al. (2010) proposed a movie recommendation method based on Bayesian probabilistic TF. This method introduces a set of additional time features and adds constraints in the time dimension of the tensor to model the evolution of data over time. Dunlavy et al. (2011) and Spiegel et al. (2011) proposed temporal link prediction methods based on TF. In Dunlavy et al. (2011), time-evolving bipartite graphs were employed, and several methods based on both matrix and tensor factorizations were presented for predicting future links. Spiegel et al. (2011) reduced the importance of past user preferences using a smoothing factor. This method gives all user preferences the same weight in a specific time period, whereas the preference dynamics of each user may vary individually (Rafailidis and Nanopoulos 2016). Rafailidis and Nanopoulos (2016) proposed a temporal recommendation model based on coupled TF, in which the importance of users’ past preferences is weighted by a proposed user preference dynamics rate and user demographics are coupled, as side information, with users’ temporal interactions. Despite the success of TF-based temporal recommendation methods, computing the tensor decomposition is difficult (Lo et al. 2018) and usually incurs a very high computational cost in practice (Zou et al. 2015), especially when the tensor is large and sparse (Lo et al. 2018).
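For reference, the most common TF model, the CP (CANDECOMP/PARAFAC) decomposition, approximates an entry of a user × item × time tensor as \( \sum_{f=1}^{d} U_{if} V_{jf} S_{tf} \), where S (a name used only here) holds the time-period factors. The plain-Python sketch below shows only this prediction rule with toy values; it is not the specific model of any cited work, which add priors, constraints, or coupled side information on top. Time indices are 0-based in the code.

```python
def cp_predict(U, V, S, i, j, t):
    """CP prediction of entry (i, j, t): sum over factors f of U[i][f] * V[j][f] * S[t][f]."""
    return sum(u * v * s for u, v, s in zip(U[i], V[j], S[t]))

# Two users, two items, two time periods, d = 2 latent factors (toy values).
U = [[1.0, 0.0], [0.5, 1.0]]
V = [[2.0, 1.0], [0.0, 3.0]]
S = [[1.0, 1.0], [1.0, 0.5]]   # in period 1 the weight of factor 2 is halved
print(cp_predict(U, V, S, 1, 1, 0), cp_predict(U, V, S, 1, 1, 1))  # prints: 3.0 1.5
```

Changing only the period factors \( S_{t} \) rescales the same user and item factors, which is how this form expresses temporal effects.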

Unlike the aforementioned methods, the present study models the temporal dynamics of user preferences by extending the CMF formulation to jointly factorize the user-item rating matrix and the social trust matrix. Under the assumptions that the pattern of change over time differs for each user and that user preferences change smoothly, we learn a transition matrix for each individual user to capture his/her dynamics between two successive time periods.

3 Proposed model

In this section, we first define the problem and introduce the notation used throughout the paper. Then we present our TSCMF model.

3.1 Problem definition

Table 1 presents the important notations used throughout this paper.

Table 1 Important notations

Suppose we have a social recommender system with m users, indexed by i = 1, 2, …, m, and n items, indexed by j = 1, 2, …, n. We consider two types of timestamped information sources: user-item ratings and social trust relations between users. Given P predefined time periods, indexed by t = 1, 2, …, P, we define \( R^{\left (t \right ) } \in \mathbb{R}^{m \times n} \) to be the user-item rating matrix in time period t, where \( R_{ij}^{\left (t \right ) } \) is the rating given by user i to item j in time period t. Ratings are usually integer values between 0 and \( R_{max} \) (e.g., 0 to 5), where 0 denotes that the user has not rated the item in time period t. A higher rating indicates greater satisfaction. In practice, each user rates only a few items, so \( R^{\left (t \right ) } \) is usually very sparse.

In social recommender systems, a user can not only rate items but often also specify other users as trusted friends. Let \( T^{\left (t \right ) } \in \mathbb{R}^{m \times m} \) be the user-user trust matrix in time period t, where \( T_{ik}^{\left (t \right ) } \in \left [ 0,1 \right ] \) denotes the extent to which user i trusts user k in time period t. \( T_{ik}^{ \left (t \right ) }=1 \) indicates that user i fully trusts user k in time period t, and \( T_{ik}^{\left (t \right ) }=0 \) denotes that user i does not trust user k in this time period.

Based on the intuitions that users’ preferences change individually over time and that social influence can affect users’ preferences in a recommender system, our goal is to predict \( R^{\left (t \right ) } \) by capturing the preference dynamics of each individual user through integrating the rating and trust matrices.

3.2 Temporal and social collective matrix factorization (TSCMF)

In this section, we first formulate the objective function of our TSCMF model to capture the user preference dynamics based on CMF for performing temporal recommendation. We then devise an optimization algorithm for solving the objective function. Finally, we analyze the complexity of our model. Figure 1 shows the framework of the proposed model.

Fig. 1 The framework of the proposed TSCMF model

3.2.1 Objective function

As mentioned earlier, when a user rates an item, the existing ratings of users whom he/she trusts are likely to affect his/her rating. Based on this intuition, and considering the temporal dynamics of user preferences, we fuse the users’ rating matrix and the social trust matrix under a CMF framework that uses temporal information to model the preference dynamics of each individual user. Standard CMF ignores temporal dynamics and simply uses all previous data for model training. Old data may not be useful and may even hurt recommendations in the current period, since user preferences might change dramatically over a long period of time (Li and Fu 2017; De Pessemier et al. 2010). Therefore, unlike standard CMF, we do not use the training data from all previous time periods. Because users’ rating information within a single time period is very sparse, the social trust information that we use in addition to the users’ rating information helps alleviate the sparsity problem.

Let \( U^{\left (t \right ) } \in \mathbb{R}^{m \times d} \) and \( V^{\left (t \right ) } \in \mathbb{R}^{n \times d} \) be the latent feature matrices of users and items in time period t, respectively, with row vectors \( U_{i}^{\left (t \right ) } \) and \( V_{j}^{\left (t \right ) } \) denoting the d-dimensional latent feature vectors of user i and item j in time period t (where \( d \ll \min(m,n) \)). MF learns the latent feature vectors of users and items in time period t from all known ratings in this period and can then predict \( \hat{R}_{ij}^{\left (t \right ) } \), an approximation of \( R_{ij}^{\left (t \right ) } \), as the inner product of \( U_{i}^{\left (t \right ) } \) and \( V_{j}^{\left (t \right ) } \), i.e., \( \hat{R}_{ij}^{\left (t \right ) }=U_{i}^{\left (t \right ) }V_{j}^{\left (t \right )^{T}} \), where \( V_{j}^{\left (t \right )^{T}} \) is the transpose of \( V_{j}^{\left (t \right ) } \). Similarly, let \( B^{\left (t \right ) } \in \mathbb{R}^{m \times d} \) and \( W^{\left (t \right ) } \in \mathbb{R}^{m \times d} \) be the latent feature matrices of trusters and trustees in time period t, respectively, with row vectors \( B_{i}^{\left (t \right ) } \) and \( W_{k}^{\left (t \right ) } \) denoting the d-dimensional latent feature vectors of truster i and trustee k in time period t. MF learns these vectors from the existing trust relations in time period t, and the trust value \( T_{ik}^{ \left (t \right ) } \) can then be predicted by the inner product \( B_{i}^{\left (t \right ) }W_{k}^{\left (t \right )^{T}} \), where \( W_{k}^{\left (t \right )^{T}} \) is the transpose of \( W_{k}^{\left (t \right ) } \).

Since the users in the rating matrix and the trusters in the trust matrix are the same (Yang et al. 2017; Guo et al. 2016), we follow CMF and jointly factorize these matrices by letting them share a common user latent feature space. We use the user feature matrix \( U^{\left (t \right ) } \) as the latent space shared by \( R^{\left (t \right ) } \) and \( T^{ \left (t \right ) } \), so that \( U^{\left (t \right ) } \) takes the place of the truster matrix \( B^{\left (t \right ) } \). Therefore, every vector \( U_{i}^{\left (t \right ) } \) simultaneously characterizes how user i rates items and how the same user trusts others in time period t. In addition, without loss of generality and similar to Yang et al. (2017), Yu et al. (2018), and Jamali and Ester (2010), we map each raw rating \( R_{ij}^{\left (t \right ) } \) to the interval [0,1] using the function \( f \left (x \right ) =x/R_{max} \). We also use the logistic function \( g \left (x \right ) =1/ \left (1+\exp \left (-x \right ) \right ) \) to bound the inner product of latent feature vectors to the range [0,1]. Thus, the objective function of CMF for time period t is as follows:

$$ \begin{array}{@{}rcl@{}} \min_{U^{(t) },V^{(t) },W^{(t) }} ~ \frac{1}{2} \sum\limits_{i=1}^{m} \sum\limits_{j=1}^{n}I_{ij}^{R^{(t) }} (R_{ij}^{(t) }-g (U_{i}^{(t) }V_{j}^{(t)^{T}}) )^{2}\\ +\frac{\lambda_{T}}{2} \sum\limits_{i=1}^{m} \sum\limits_{k=1}^{m}I_{ik}^{T^{(t) }} (T_{ik}^{(t) }-g(U_{i}^{(t) }W_{k}^{(t)^{T}}))^{2}\\ + \frac{\lambda }{2} (\parallel U^{(t) }{\parallel_{F}^{2}}+\parallel V^{(t) }{\parallel_{F}^{2}}+\parallel W^{(t) }{\parallel_{F}^{2}}) \end{array} $$
(1)

where the first two sums represent the approximation errors. \( I_{ij}^{R^{\left (t \right ) }} \) and \( I_{ik}^{T^{\left (t \right ) }} \) are indicator functions: \( I_{ij}^{R^{\left (t \right ) }} \) equals 1 if user i rated item j in time period t, and 0 otherwise; \( I_{ik}^{T^{\left (t \right ) }} \) equals 1 if user i trusted user k in time period t, and 0 otherwise. The parameter \( \lambda_{T} \) controls the weight of the trust information, i.e., how much the users whom a user trusts influence his/her preferences. The last three terms in (1) are regularization terms that prevent overfitting. \( \lambda \) is the regularization parameter, and \( \parallel \cdot \parallel_{F} \) denotes the Frobenius norm, e.g., \( \parallel R^{\left (t \right ) }\parallel_{F}=\sqrt{ {\sum }_{i=1}^{m} {\sum }_{j=1}^{n} \vert R_{ij}^{\left (t \right ) } \vert ^{2}} \), so that \( \parallel \cdot \parallel_{F}^{2} \) is the sum of the squared entries.
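To make (1) concrete, the following Python sketch evaluates it for one time period from sparse lists of observed entries, so the indicator functions are handled by iterating only over observed ratings and trust values. The function names and the triplet input format are assumptions made for illustration; raw ratings are divided by \( R_{max} \) inside the sketch, i.e., f is applied on the fly.

```python
import math

def g(x):
    """Logistic function bounding inner products to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sq_frobenius(rows):
    """Squared Frobenius norm of a matrix stored as a list of rows."""
    return sum(x * x for row in rows for x in row)

def cmf_objective(ratings, trusts, U, V, W, r_max, lam_t, lam):
    """Objective (1) for one time period.

    ratings: list of (i, j, raw_rating) observed in period t (entries with I^R = 1)
    trusts:  list of (i, k, trust_value in [0, 1]) observed in period t (entries with I^T = 1)
    """
    rating_err = sum((r / r_max - g(dot(U[i], V[j]))) ** 2 for i, j, r in ratings)
    trust_err = sum((tv - g(dot(U[i], W[k]))) ** 2 for i, k, tv in trusts)
    reg = sq_frobenius(U) + sq_frobenius(V) + sq_frobenius(W)
    return 0.5 * rating_err + 0.5 * lam_t * trust_err + 0.5 * lam * reg

# Toy period: user 0 rated item 0 with 4 stars (R_max = 5) and trusts user 1.
U = [[0.1, 0.2], [0.0, 0.3]]
V = [[0.2, 0.1]]
W = [[0.0, 0.0], [0.1, 0.1]]
print(cmf_objective([(0, 0, 4)], [(0, 1, 1.0)], U, V, W, r_max=5, lam_t=1.0, lam=0.01))
```

Note that the same row `U[i]` appears in both error terms; this sharing is what couples the rating and trust factorizations.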

In practice, user preferences change smoothly over time (Lo et al. 2018; Tang et al. 2015; Li and Fu 2017); therefore, users’ latent features should not change significantly within a short time period. Based on this intuition, we assume that users’ latent features in time period t (t > 1) depend on their latent features in time period t-1. For each user i, we introduce a transition matrix \( M_{i}^{\left (t \right ) } \in \mathbb{R}^{d \times d} \) between the user latent feature vectors \( U_{i}^{\left (t-1 \right ) } \) and \( U_{i}^{\left (t \right ) } \) in the two successive time periods t-1 and t. The transition matrix \( M_{i}^{\left (t \right ) } \) captures the mapping from the previous latent feature vector \( U_{i}^{\left (t-1 \right ) } \) of user i to the current one, \( U_{i}^{\left (t \right ) } \). To account for the temporal dynamics of user preferences, we add the following temporal smoothness constraint to (1):

$$ U_{i}^{(t) } \approx U_{i}^{(t-1) }M_{i}^{(t) } . $$
(2)

Therefore, we can rewrite the objective function in (1) as follows:

$$ \begin{array}{@{}rcl@{}} \min_{U^{\left( t \right) },V^{\left( t \right) },W^{\left( t \right) },\{M_{i}^{\left( t \right) }\}}~ L= \frac{1}{2} \sum\limits_{i=1}^{m} \sum\limits_{j=1}^{n}I_{ij}^{R^{\left( t \right) }} \left( R_{ij}^{\left( t \right) }-g \left( U_{i}^{\left( t \right) }V_{j}^{\left( t \right)^{T}} \right) \right)^{2}\\ +\frac{ \lambda_{T}}{2} \sum\limits_{i=1}^{m} \sum\limits_{k=1}^{m}I_{ik}^{T^{\left( t \right) }} \left( T_{ik}^{\left( t \right) }-g \left( U_{i}^{\left( t \right) }W_{k}^{\left( t \right)^{T}} \right) \right)^{2}\\ +\frac{ \lambda_{1}}{2} \sum\limits_{i=1}^{m}\parallel U_{i}^{\left( t \right) }-U_{i}^{\left( t-1 \right) }M_{i}^{\left( t \right) }{\parallel_{F}^{2}}\\ + \frac{ \lambda_{2}}{2} \left( \parallel U^{\left( t \right) }{\parallel_{F}^{2}}+\parallel V^{\left( t \right) }{\parallel_{F}^{2}}+\parallel W^{\left( t \right) }{\parallel_{F}^{2}}+ \sum\limits_{i=1}^{m}\parallel M_{i}^{\left( t \right) }{\parallel_{F}^{2}} \right) \end{array} $$
(3)

where the third term, with regularization parameter \( \lambda_{1} \), is the smoothness regularization, based on the intuition that user preferences should change smoothly over time. The last regularization term \( {\sum }_{i=1}^{m}\parallel M_{i}^{\left (t \right ) }{\parallel _{F}^{2}} \) controls the model complexity, and \( \lambda_{2} \) is the regularization parameter for the last group of terms. For simplicity, we set \( \lambda_{1} = \lambda_{2} \) in our implementation.
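The terms that (3) adds on top of (1) can be sketched in plain Python as follows, with hypothetical names. Adding this quantity to objective (1) evaluated with \( \lambda \) replaced by \( \lambda_{2} \) yields L in (3). Each \( U_{i}^{\left (t-1 \right ) } \) is treated as a row vector multiplied on the right by the d × d matrix \( M_{i}^{\left (t \right ) } \).

```python
def vec_mat(u, M):
    """Row vector u (length d) times a d x d matrix M given as a list of rows."""
    d = len(M[0])
    return [sum(u[a] * M[a][b] for a in range(len(u))) for b in range(d)]

def temporal_terms(U, U_prev, M, lam1, lam2):
    """Smoothness penalty plus regularization of every M_i, i.e., what (3) adds to (1).

    U, U_prev: user rows for periods t and t-1; M: list of per-user d x d transition matrices.
    """
    smooth = 0.0
    reg_m = 0.0
    for u, u_prev, m_i in zip(U, U_prev, M):
        pred = vec_mat(u_prev, m_i)  # U_i^{(t-1)} M_i^{(t)}
        smooth += sum((a - b) ** 2 for a, b in zip(u, pred))
        reg_m += sum(x * x for row in m_i for x in row)
    return 0.5 * lam1 * smooth + 0.5 * lam2 * reg_m

# With identity transitions and unchanged user factors, only the regularizer of M remains.
I2 = [[1.0, 0.0], [0.0, 1.0]]
print(temporal_terms([[0.3, 0.4]], [[0.3, 0.4]], [I2], lam1=0.1, lam2=0.1))
```

Because Algorithm 1 initializes every \( M_{i}^{\left (t \right ) } \) to the identity, the smoothness penalty initially just measures how far each user has drifted from \( U_{i}^{\left (t-1 \right ) } \).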

Choosing the proper length of the time period is critical to the performance of our model. We study its impact on recommendation accuracy in Section 4.7.

3.2.2 Optimization algorithm

The objective function L in (3) is not jointly convex in the variables \( U_{i}^{\left (t \right ) } \), \( V_{j}^{\left (t \right ) } \), \( W_{k}^{\left (t \right ) } \), and \( M_{i}^{\left (t \right ) } \). Because of the logistic function g, convexity in each of \( U_{i}^{\left (t \right ) } \), \( V_{j}^{\left (t \right ) } \), and \( W_{k}^{\left (t \right ) } \) separately is not guaranteed either, although L is quadratic, and hence convex, in \( M_{i}^{\left (t \right ) } \) when the other variables are fixed. Therefore, we seek a local minimum of L using the SGD method, which has become very popular for non-convex optimization problems (Sidiropoulos et al. 2017) and usually has good convergence properties (Li and Fu 2017). We update each variable while keeping the other variables fixed. After selecting a random pair of observed entries \( R_{ij}^{\left (t \right ) } \) and \( T_{ik}^{\left (t \right ) } \), the variables \( U_{i}^{\left (t \right ) } \), \( V_{j}^{\left (t \right ) } \), \( W_{k}^{\left (t \right ) } \), and \( M_{i}^{\left (t \right ) } \) are updated as follows:

$$ U_{i}^{\left( t \right) }=U_{i}^{\left( t \right) }- \eta \frac{ \partial L }{ \partial U_{i}^{\left( t \right) }} $$
(4)
$$ V_{j}^{\left( t \right) }=V_{j}^{\left( t \right) }- \eta \frac{ \partial L }{ \partial V_{j}^{\left( t \right) }} $$
(5)
$$ W_{k}^{\left( t \right) }=W_{k}^{\left( t \right) }- \eta \frac{ \partial L }{ \partial W_{k}^{\left( t \right) }} $$
(6)
$$ M_{i}^{\left( t \right) }=M_{i}^{\left( t \right) }- \eta \frac{ \partial L }{ \partial M_{i}^{\left( t \right) }} $$
(7)

where η is the learning rate. We derive the gradients of L with respect to each variable as follows:

$$ \begin{array}{@{}rcl@{}} \frac{ \partial L }{ \partial U_{i}^{\left( t \right) }}= \sum\limits_{j=1}^{n}I_{ij}^{R^{\left( t \right) }}g^{\prime} \left( U_{i}^{\left( t \right) }V_{j}^{\left( t \right)^{T}} \right) \left( g \left( U_{i}^{\left( t \right) }V_{j}^{\left( t \right)^{T}} \right) -R_{ij}^{\left( t \right) } \right) V_{j}^{\left( t \right) }\\ +\lambda_{T} \sum\limits_{k=1}^{m}I_{ik}^{T^{\left( t \right) }}g^{\prime} \left( U_{i}^{\left( t \right) }W_{k}^{\left( t \right)^{T}} \right) \left( g \left( U_{i}^{\left( t \right) }W_{k}^{\left( t \right)^{T}} \right) -T_{ik}^{\left( t \right) } \right) W_{k}^{\left( t \right) }\\ + \lambda_{1} \left( U_{i}^{\left( t \right) }-U_{i}^{\left( t-1 \right) }M_{i}^{\left( t \right) } \right) + \lambda_{2}U_{i}^{\left( t \right) } \end{array} $$
(8)
$$ \frac{ \partial L }{ \partial V_{j}^{\left( t \right) }}= \sum\limits_{i=1}^{m}I_{ij}^{R^{\left( t \right) }}g^{\prime} \left( U_{i}^{\left( t \right) }V_{j}^{\left( t \right)^{T}} \right) \left( g \left( U_{i}^{\left( t \right) }V_{j}^{\left( t \right)^{T}} \right) -R_{ij}^{\left( t \right) } \right) U_{i}^{\left( t \right) } + \lambda_{2}V_{j}^{\left( t \right) } $$
(9)
$$ \frac{ \partial L }{ \partial W_{k}^{\left( t \right) }}= \lambda_{T} \sum\limits_{i=1}^{m}I_{ik}^{T^{\left( t \right) }}g^{\prime} \left( U_{i}^{\left( t \right) }W_{k}^{\left( t \right)^{T}} \right) \left( g \left( U_{i}^{\left( t \right) }W_{k}^{\left( t \right)^{T}} \right) -T_{ik}^{\left( t \right) } \right) U_{i}^{\left( t \right) }+ \lambda_{2}W_{k}^{ \left( t \right) } $$
(10)
$$ \frac{ \partial L }{ \partial M_{i}^{\left( t \right) }}= \lambda_{1}U_{i}^{\left( t-1 \right)^{T}} \left( U_{i}^{\left( t-1 \right) }M_{i}^{\left( t \right) }-U_{i}^{\left( t \right) } \right) + \lambda_{2}M_{i}^{\left( t \right) } $$
(11)

where \(g^{\prime }(x)=\exp(-x)/(1+\exp(-x))^{2}\) is the derivative of the logistic function \( g \left (x \right ) \). The pseudocode of our proposed TSCMF model is presented in Algorithm 1. First, the raw ratings in \( R^{\left (t \right ) } \) and \( R^{\left (t-1 \right ) } \) are mapped to the interval [0,1] in line 1. Then, in line 2, the transition matrix \( M_{i}^{\left (t \right ) } \) of each user i is initialized to \( M_{i}^{\left (t \right ) }=I \), where I is the d × d identity matrix. In line 3, the latent feature matrices \( U^{\left (t \right ) } \), \( V^{\left (t \right ) } \), and \( W^{\left (t \right ) } \) are initialized with small random values. In line 4, we perform MF on \( R^{\left (t-1 \right ) } \) using the LIBMF library (Chin et al. 2016) to compute the user latent feature matrix \( U^{\left (t-1 \right ) } \). In the iterative optimization in lines 5-12, after selecting a random pair of entries \( R_{ij}^{\left (t \right ) } \) and \( T_{ik}^{\left (t \right ) } \), the variables \( U_{i}^{\left (t \right ) } \), \( V_{j}^{\left (t \right ) } \), \( W_{k}^{\left (t \right ) } \), and \( M_{i}^{\left (t \right ) } \) are updated using (4)–(7), respectively. In line 11, the objective function L in (3) is calculated from the updated variables. The algorithm repeats until L has converged or the maximum number of iterations has been reached. Convergence is achieved when the change in L between the current and the previous iteration is smaller than a predefined convergence threshold. In our implementation, we set the convergence threshold to \( 10^{-6} \) and the maximum number of iterations to \( 10^{5} \). Finally, in lines 13-15, the predicted rating matrix \(\hat { R}^{\left (t \right ) } \) is computed as the output of the algorithm.
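The per-sample update in lines 5-12 of Algorithm 1 can be sketched in plain Python as below. This is a minimal, hypothetical illustration rather than the authors' MATLAB implementation: it restricts the sums in (8)-(10) to the single sampled rating and trust entries, applies the regularization at every step, and computes all four gradients from the current values before applying (4)-(7) (a simultaneous step; applying the updates one after another is an equally plausible reading of "fixing the other variables"). A copy of the initial \( U^{\left (t \right ) } \) stands in for the LIBMF factors \( U^{\left (t-1 \right ) } \); the function name `sgd_step`, the learning rate, and the toy data are assumptions.

```python
import math
import random

def g(x):
    """Logistic function used in (1) and (3)."""
    return 1.0 / (1.0 + math.exp(-x))

def g_prime(x):
    """Derivative of g; equal to exp(-x) / (1 + exp(-x)) ** 2."""
    s = g(x)
    return s * (1.0 - s)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sgd_step(U, V, W, M, U_prev, rating, trust, eta, lam_t, lam1, lam2):
    """One stochastic update of U_i, V_j, W_k and M_i for a sampled pair; lists change in place.

    rating: (i, j, r) with r already mapped to [0, 1] by f(x) = x / R_max
    trust:  (i, k, t_ik) for the same user i, or None if user i has no trust entry
    """
    i, j, r = rating
    u, v, m_i, u_prev = U[i], V[j], M[i], U_prev[i]
    d = len(u)

    x_r = dot(u, v)
    e_r = g_prime(x_r) * (g(x_r) - r)                 # residual factor in (8) and (9)
    e_t, w = 0.0, [0.0] * d
    if trust is not None:
        _, k, t_ik = trust
        w = W[k]
        x_t = dot(u, w)
        e_t = lam_t * g_prime(x_t) * (g(x_t) - t_ik)  # residual factor in (8) and (10)

    # U_i^{(t-1)} M_i^{(t)}: row vector times d x d matrix.
    pred = [sum(u_prev[a] * m_i[a][b] for a in range(d)) for b in range(d)]

    grad_u = [e_r * v[f] + e_t * w[f] + lam1 * (u[f] - pred[f]) + lam2 * u[f]
              for f in range(d)]                                      # (8)
    grad_v = [e_r * u[f] + lam2 * v[f] for f in range(d)]             # (9)
    grad_w = [e_t * u[f] + lam2 * w[f] for f in range(d)]             # (10)
    grad_m = [[lam1 * u_prev[a] * (pred[b] - u[b]) + lam2 * m_i[a][b]
               for b in range(d)] for a in range(d)]                  # (11)

    # All gradients were computed from the current values; now apply (4)-(7).
    U[i] = [u[f] - eta * grad_u[f] for f in range(d)]
    V[j] = [v[f] - eta * grad_v[f] for f in range(d)]
    if trust is not None:
        W[k] = [w[f] - eta * grad_w[f] for f in range(d)]
    M[i] = [[m_i[a][b] - eta * grad_m[a][b] for b in range(d)] for a in range(d)]

# Toy run: 2 users, 1 item, d = 2; user 0 rated item 0 with 5/5 and fully trusts user 1.
random.seed(0)
d = 2
U = [[random.uniform(0, 0.1) for _ in range(d)] for _ in range(2)]
V = [[random.uniform(0, 0.1) for _ in range(d)]]
W = [[random.uniform(0, 0.1) for _ in range(d)] for _ in range(2)]
M = [[[1.0 if a == b else 0.0 for b in range(d)] for a in range(d)] for _ in range(2)]
U_prev = [row[:] for row in U]  # hypothetical stand-in for the LIBMF factors of period t-1
before = abs(g(dot(U[0], V[0])) - 1.0)
for _ in range(2000):
    sgd_step(U, V, W, M, U_prev, (0, 0, 1.0), (0, 1, 1.0), 0.5, 1.0, 0.01, 0.01)
after = abs(g(dot(U[0], V[0])) - 1.0)
print(before, after)
```

In the toy run, the scaled prediction \( g(U_{0}V_{0}^{T}) \) moves toward the observed scaled rating of 1.0, so `after` is smaller than `before`. A full implementation would also sample entries at random, evaluate (3) periodically, and stop once its change falls below the convergence threshold.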

Algorithm 1 Pseudocode of the proposed TSCMF model

3.2.3 Complexity analysis

The main computational cost of learning our model lies in evaluating the objective function L and its gradients with respect to the variables. The computational complexity of evaluating L is \( O \left (dN_{R}+dN_{T} \right ) \), where \( N_{R} \) and \( N_{T} \) are the numbers of nonzero entries in \( R^{\left (t \right ) } \) and \( T^{\left (t \right ) } \), respectively, and the number of latent features d is fixed. The computational complexities of calculating the gradients \( \frac { \partial L }{ \partial U_{i}^{\left (t \right ) }} \), \( \frac { \partial L }{ \partial V_{j}^{\left (t \right ) }} \), \( \frac { \partial L }{ \partial W_{k}^{\left (t \right ) }} \), and \( \frac { \partial L }{ \partial M_{i}^{\left (t \right ) }} \) are \( O \left (dN_{R}+dN_{T} \right ) \), \( O \left (dN_{R} \right ) \), \( O \left (dN_{T} \right ) \), and \( O \left (1 \right ) \), respectively; the last bound holds for each user because (11) requires \( O(d^{2}) \) operations, which is constant for fixed d. Therefore, the overall computational complexity of each iteration is \( O \left (dN_{R}+dN_{T} \right ) \), which is linear in the number of nonzero entries in the rating and trust matrices \( R^{\left (t \right ) } \) and \( T^{\left (t \right ) } \). This suggests that our model can scale to large datasets with millions of users and items.
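For a rough sense of scale, assume d = 10 (a value chosen only for this illustration, not one reported here). Substituting the full-dataset Epinions totals reported in Section 4.1 for the per-period counts \( N_{R} \) and \( N_{T} \) gives an upper bound, up to constant factors, on the work of one pass over the observed entries:

$$ d \left( N_{R}+N_{T} \right) \le 10 \times \left( 922267+300548 \right) = 10 \times 1222815 = 12228150 $$

Since each time period holds only a fraction of these entries, the per-period cost is lower still.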

4 Experiments

4.1 Dataset and evaluation methodology

Epinions (Footnote 1) is a popular product review site on which users can rate items on a numerical 1-5 scale and write reviews of them. An item may be a product or a service. In addition, Epinions provides a social network in which users can add other users to their trust networks. We conduct experiments on the Epinions dataset (Tang 2019). This dataset contains rating information, social trust relations, and temporal information for both ratings and trust relations, which makes it well suited to our experiments. The Epinions dataset used in our experiments contains 22166 users who have rated at least one of a total of 296277 items. The total numbers of ratings and trust relations are 922267 and 300548, respectively. The rating data span July 5, 1999 to May 8, 2011. The whole dataset was split into 11 time periods in chronological order. Since temporal information about trust relations before January 11, 2001 is not available, the first time period contains the data before January 11, 2001, and the last time period covers data after January 11, 2010. Each of the other time periods contains data for one year; for example, the second time period contains data from January 12, 2001 to January 11, 2002.
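The chronological split described above can be sketched with Python's standard `bisect` and `datetime` modules. The sketch assumes that January 11 itself belongs to the earlier period (so period 1 ends on January 11, 2001 and period 2 starts on January 12, 2001, as in the example above); the constant `BOUNDARIES` and the function `time_period` are hypothetical names.

```python
from bisect import bisect_left
from datetime import date

# Inclusive last days of periods 1..10: January 11 of 2001, 2002, ..., 2010.
# Anything after January 11, 2010 falls into period 11.
BOUNDARIES = [date(year, 1, 11) for year in range(2001, 2011)]

def time_period(day):
    """Return the period index t (1..11) for the date of a rating or trust relation."""
    # bisect_left counts the boundaries strictly earlier than `day`.
    return 1 + bisect_left(BOUNDARIES, day)

print(time_period(date(1999, 7, 5)), time_period(date(2001, 1, 12)), time_period(date(2011, 5, 8)))  # prints: 1 2 11
```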

As the evaluation methodology, we use time-dependent cross-validation based on an increasing time window (Campos et al. 2014). This method preserves the temporal order of the data in each train-test pair: the training set never contains data from after the test period. Under this method, the data in each time period (except the first) serve as the test set, and all data prior to that time period serve as the training set, which gives 10 train-test splits in total. We report the average results over the test sets. We use the threshold-based relevant item condition (Campos et al. 2014) to determine each user's favorite items: items in the user's test set rated greater than or equal to a threshold value are considered favorite items. Following Yang et al. (2017), we consider items in the user's test set with ratings greater than or equal to 4 as his/her favorite items. All experiments were conducted using MATLAB 2016a on a Windows 10 PC with an Intel Core i5 2.53 GHz CPU and 8 GB of memory.
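A minimal Python sketch of the increasing-window splits and the favorite-item condition is given below. It assumes that periods are numbered 1 to 11 as in Section 4.1 and that test ratings are stored as a per-user dictionary; expanding_window_splits and favorite_items are hypothetical names.

```python
from typing import Dict, Iterator, List, Set, Tuple


def expanding_window_splits(num_periods: int) -> Iterator[Tuple[List[int], int]]:
    """Yield (training periods, test period) pairs.

    Every period after the first is used once as the test set and is trained
    on all earlier periods, so no training data postdate the test data.
    """
    for test_period in range(2, num_periods + 1):
        yield list(range(1, test_period)), test_period


def favorite_items(test_ratings: Dict[int, Dict[int, float]],
                   threshold: float = 4.0) -> Dict[int, Set[int]]:
    """Fav_i: items in user i's test set rated greater than or equal to threshold."""
    return {user: {item for item, r in items.items() if r >= threshold}
            for user, items in test_ratings.items()}
```

With 11 periods, list(expanding_window_splits(11)) yields the 10 train-test pairs, the first training on period 1 and testing on period 2, the last training on periods 1 to 10 and testing on period 11.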

4.2 Evaluation metrics

We adopt the two most popular rating prediction metrics, i.e., Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) (Yang et al. 2014), to evaluate the rating accuracy of our proposed model in comparison with other methods. These metrics are defined as:

$$ MAE=\frac{\sum_{\left( i,j \right) \in R_{test}} \vert r_{ij}-\hat{r}_{ij} \vert}{\vert R_{test} \vert} $$
(12)
$$ RMSE=\sqrt{\frac{\sum_{\left( i,j \right) \in R_{test}} \left( r_{ij}-\hat{r}_{ij} \right)^{2}}{\vert R_{test} \vert}} $$
(13)

where \( R_{test} \) is the set of ratings in the test set, \( r_{ij} \) is the actual rating of user i on item j, and \( \hat {r}_{ij} \) is the predicted rating of user i on item j. Lower MAE and RMSE values indicate better predictive accuracy.
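Equations (12) and (13) translate directly into code. The Python sketch below, with the hypothetical function name mae_rmse, takes the test set as a list of (actual, predicted) rating pairs, one per \( \left( i,j \right) \in R_{test} \).

```python
import math
from typing import List, Tuple


def mae_rmse(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (MAE, RMSE) over (r_ij, r_hat_ij) pairs from R_test, per Eqs. (12)-(13)."""
    errors = [r - r_hat for r, r_hat in pairs]
    n = len(errors)                                    # |R_test|
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mae, rmse
```

Because RMSE squares the errors before averaging, it penalizes a few large mistakes more heavily than MAE does, which is why both metrics are usually reported together.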

In addition, we use the metrics Recall@K (R@K for short), Precision@K (P@K for short), and F1@K (Yang et al. 2017) to assess the quality of the top-K recommendations. These metrics are defined as:

$$ R@K=\frac{1}{m} \sum\limits_{i=1}^{m}\frac{ \vert Rec_{i} \cap Fav_{i} \vert }{ \vert Fav_{i} \vert } \ \ $$
(14)
$$ P@K=\frac{1}{m} \sum\limits_{i=1}^{m}\frac{ \vert Rec_{i} \cap Fav_{i} \vert }{K} \ \ $$
(15)
$$ F1@K=\frac{2 \times R@K \times P@K}{R@K+P@K} \ \ $$
(16)

where \( Fav_{i} \) is the set of favorite items of user i in the test set, \( Rec_{i} \) is the set of top-K items recommended to user i, generated by selecting the K items with the highest predicted ratings, and m is the number of users.
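The following Python sketch computes Eqs. (14)-(16). The function names and the input layout (predicted scores per candidate item, favorite sets per user) are assumptions; users without any favorite items are skipped because R@K is undefined for them, which is also an assumption about how such users are handled. As in Eq. (16), F1@K is computed from the averaged R@K and P@K rather than averaged per user.

```python
from typing import Dict, Set, Tuple


def top_k(scores: Dict[int, float], k: int) -> Set[int]:
    """Rec_i: the k items with the highest predicted ratings."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def ranking_metrics(pred: Dict[int, Dict[int, float]],
                    fav: Dict[int, Set[int]],
                    k: int) -> Tuple[float, float, float]:
    """Return (R@K, P@K, F1@K) following Eqs. (14)-(16)."""
    users = [u for u in fav if fav[u]]           # assumption: skip users with no favorites
    recall_sum = precision_sum = 0.0
    for u in users:
        hits = len(top_k(pred[u], k) & fav[u])   # |Rec_i ∩ Fav_i|
        recall_sum += hits / len(fav[u])
        precision_sum += hits / k
    m = len(users)
    r_at_k, p_at_k = recall_sum / m, precision_sum / m
    f1 = 2 * r_at_k * p_at_k / (r_at_k + p_at_k) if r_at_k + p_at_k > 0 else 0.0
    return r_at_k, p_at_k, f1
```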

4.3 Comparison methods

We compare our TSCMF model with the following approaches:

  • Probabilistic Matrix Factorization (PMF) (Salakhutdinov and Mnih 2008): This method is the baseline MF approach. It does not consider the temporal dynamics.

  • Collective Matrix Factorization (CMF) (Singh and Gordon 2008): This method jointly factorizes two matrices that share a common set of entities and does not consider temporal dynamics. We use the user-item rating and social trust matrices, which share the users, in this method. CMF is the basis of our proposed model.

  • TimeSVD++ (Koren 2010): This method is a baseline for modeling user preference dynamics. It incorporates time-varying user and item rating biases into MF to generate recommendations.

  • Bayesian Temporal Matrix Factorization (BTMF) (Zhang et al. 2014): This Bayesian temporal MF approach captures the temporal dynamics of user preferences by learning a transition matrix of the user latent feature vectors between two successive time periods.

  • Dynamic Multi-Task Non-Negative Matrix Factorization (DMNMF) (Ju et al. 2015): This method models the user preference dynamics by fusing multi-task non-negative MF and a transition matrix of users’ latent features.

  • Temporal Matrix Factorization (TMF) (Lo et al. 2018): This method models the user preference dynamics by extracting a transition pattern for each user’s latent feature vector.

  • Dynamic Matrix Factorization with Social Influence (Aravkin et al. 2016): This method incorporates trust relations into a dynamic MF model to capture user preference dynamics. It introduces a transition matrix of users' preferences and models the trust relations among users as a graph at each time period. For ease of reference, we refer to this model as DMF.

  • TimeTrustSVD (Tong et al. 2019): This method integrates rating, trust and time information. It adopts the time-variant biases for each item and each user into the model to capture temporal dynamics of user preferences.

The PMF, TimeSVD++, BTMF, DMNMF, and TMF methods exploit only the user-item rating matrix without any side information.

4.4 Parameter settings

The optimal parameters for each method are determined by cross-validation. Accordingly, we set the learning rate η to 0.001 in PMF and to 0.003 in TimeSVD++, CMF, TMF, TimeTrustSVD, and TSCMF. We also set υ0 = d, β0 = 2, W0 = Z0 = I, and μ0 = 0 for BTMF, α = 0.6 in CMF, λ = 10⁻² in DMF, λT = 0.8, and λT = 5 in TSCMF. For a fair comparison, we fix the dimension of the latent feature vectors to 10 in all compared methods. In addition, we set the regularization parameters to 0.001 in all our experiments.

4.5 Experimental results

Table 2 reports the MAE and RMSE of the compared methods on the Epinions dataset. PMF performs worse than all other methods, by a clear margin in both MAE and RMSE. This is because PMF neither considers the temporal dynamics of user preferences nor exploits side information such as trust relations.

Table 2 The performance of comparative methods in terms of MAE and RMSE. The boldface numbers highlight the best results in each metric, and the row ‘Improve’ presents the improvement percentage that TSCMF gains relative to respective competitors

The results show that the proposed TSCMF method achieves the best MAE and RMSE among the compared methods. Its improvements over the competing methods indicate that our model can substantially improve rating prediction accuracy.

Table 3 reports the R@K, P@K, and F1@K (with K = 5, 10) of the compared methods. The temporal methods achieve considerably higher R@K, P@K, and F1@K than PMF, which implies that considering the temporal dynamics of user preferences helps improve the recommendations. Table 3 also shows that the proposed TSCMF performs best in terms of R@K, P@K, and F1@K among the compared methods for both values of K. Compared with the other temporal competitors, these results indicate that TSCMF better captures the temporal dynamics of user preferences. We believe that the transition matrix introduced in our model is a key contributor to this improvement: unlike the transition matrices used in DMNMF, BTMF, TMF, and DMF, it is dynamic and is trained individually for each user.

Table 3 The performance of comparative methods in terms of R@K, P@K and F1@K (with K = 5, 10). The boldface numbers highlight the best results in each metric, and the row ‘Improve’ presents the improvement percentage that TSCMF gains relative to respective competitors

Tables 2 and 3 show that CMF, which uses both ratings and trust information but does not consider temporal dynamics, outperforms the temporal methods TimeSVD++, DMNMF, and TMF. This finding indicates that, even without temporal information, incorporating social trust relations is effective in improving recommendation accuracy. On the other hand, the temporal methods that use both ratings and trust information (i.e., DMF, TimeTrustSVD, and our TSCMF method) perform better than CMF. This implies that temporal dynamics and trust relations can be complementary in boosting recommendation accuracy. The superiority of TSCMF over the competing methods indicates that the latent features learned in the previous time period are helpful. In addition, modeling the dynamics of user preferences while accounting for the fact that each user's preferences change individually over time improves recommendation accuracy.

To evaluate the efficiency of our TSCMF model, we compare its running time with that of the other methods. Table 4 reports the results in seconds. PMF has the lowest running time because it exploits only ratings to learn latent features and does not consider temporal dynamics. The trust-based methods CMF, DMF, and TimeTrustSVD have higher running times than the other methods, mainly because they also process trust information. Among the trust-based methods, TSCMF has the lowest running time. Moreover, even compared with methods that do not exploit any trust information, TSCMF runs faster than BTMF and DMNMF. The main reason is that TSCMF uses only the data of the previous time period to learn latent features, which reduces the running time.

Table 4 Efficiency comparison (s)

Table 2 shows that the relative improvement of TSCMF over BTMF is modest (around 6.38% in MAE). However, comparing Tables 2 and 3, we observe that this reduction in MAE from 0.9722 to 0.9102 comes with a relative improvement of more than 10% in precision. Since TSCMF also runs faster than BTMF, this improvement in recommendation quality can be valuable.
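Assuming that the ‘Improve’ rows report the relative error reduction (baseline - TSCMF)/baseline, the two MAE values quoted above reproduce the stated percentage:

$$ \frac{0.9722-0.9102}{0.9722}=\frac{0.0620}{0.9722}\approx 0.0638=6.38\% $$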

4.6 Impact of parameter λT

The parameter λT plays an important role in our TSCMF model by controlling the impact of social trust on users' preferences: larger values of λT give the social trust information more influence on users' preferences. To assess how λT affects the final recommendation accuracy, we set λT to 0.1, 0.5, 1, 2, 5, 10, and 20 in our model and perform this assessment on each of the 10 train-test splits. Figure 2 presents the average MAE and RMSE of our model for the different values of λT. As can be seen, λT substantially affects the recommendation results, suggesting that fusing the users' rating matrix with the social trust matrix helps improve recommendation accuracy. As λT increases, the average MAE and RMSE first decrease, indicating that recommendation accuracy improves; however, once λT exceeds a certain value, the average MAE and RMSE increase again.
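The sweep over λT can be organized as in the Python sketch below. It rests on assumptions: evaluate is a hypothetical callback that trains the model with a given λT on one train-test split and returns its test error (e.g., MAE); the error is averaged over all splits, and the value with the lowest average is selected.

```python
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple

LAMBDA_T_GRID = [0.1, 0.5, 1, 2, 5, 10, 20]


def sweep_lambda_t(evaluate: Callable[[float, object], float],
                   splits: Sequence[object],
                   grid: Sequence[float] = LAMBDA_T_GRID) -> Tuple[Dict[float, float], float]:
    """Average a test error over all splits for each lambda_T value.

    evaluate(lambda_t, split) is a hypothetical callback returning the test
    error of a model trained with lambda_t on that split.
    Returns (average error per lambda_T, lambda_T with the lowest average).
    """
    averages = {lam: mean(evaluate(lam, s) for s in splits) for lam in grid}
    best = min(averages, key=averages.get)
    return averages, best
```

Averaging over all 10 splits before choosing λT keeps the selection from being driven by a single time period.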

Fig. 2 Impact of different values of λT: a MAE and b RMSE

These findings suggest that exploiting only the user-item rating matrix or only the social trust information yields worse results than appropriately fusing these two sources in our model. As shown in Fig. 2, TSCMF achieves its best results at λT = 5.

4.7 Impact of the length of the time period

Choosing the optimal length of the time period is critical in temporal models (Li and Fu 2017) and usually depends on the application of the recommender system (Rafailidis and Nanopoulos 2016). For example, in a news recommender system, users' interest in specific news topics may last only a few days, whereas in movie recommendation, users' preferences for movies may change slowly over time. Therefore, a shorter time period may be more appropriate for capturing users' preferences for news than for movies (Sahoo et al. 2012). In such a setting, a time period that is too long may miss changes in user behavior that occur within it.

To study the effect of the time period length on performance, we consider only the methods that incorporate the temporal dynamics of user preferences. Since the trust information in the Epinions dataset used in our experiments is available only at yearly granularity, the shortest time period length we select is 1 year. Figure 3 shows the average MAE and RMSE for three different time period lengths. All methods achieve their best results when the time period length is set to 1 year, and their performance degrades as the length increases. Interestingly, our model outperforms the other compared methods for all three examined time period lengths.

Fig. 3 Impact of different time period lengths: a MAE and b RMSE

5 Conclusion

In this paper, we proposed the Temporal and Social Collective Matrix Factorization (TSCMF) model, which captures the temporal dynamics of user preferences for temporal recommendation. We jointly factorized the users' rating information and social trust information in a collective matrix factorization framework by introducing a joint objective function. We assumed that user preferences in the current time period depend on user preferences in the previous time period, and we modeled user dynamics within the collective matrix factorization framework by learning, for each individual user, a transition matrix of user preferences between two successive time periods. We presented an efficient optimization algorithm based on stochastic gradient descent for solving the objective function. Experiments on a real-world dataset collected from the popular product review website Epinions show that our proposed model outperforms the other compared methods. Moreover, because its per-iteration cost is linear in the number of observed ratings and trust relations, the proposed model can scale to large datasets with millions of users and items. Our findings support the idea that modeling user preference dynamics while accounting for the fact that preference changes vary across individual users improves recommendation accuracy and, potentially, user satisfaction. They also suggest that temporal dynamics and trust relations can be complementary in the development of social recommender systems.

The proposed method can help improve the quality of social recommender systems. However, in some social recommender systems, trust information is not explicitly available. In future work, we plan to extract implicit trust from users' interactions whenever explicit trust is not available and use it in our model. We also want to extend the model to address cold-start users who have no ratings and no trust relations in either the previous or the current time period (referred to as new users). One possible approach to this problem is to exploit additional side information such as user attributes. Finally, since some social recommender systems allow users to express distrust toward other users, we want to exploit time-stamped distrust relations, in addition to trust relations, in our model to generate better personalized recommendations.