How does serendipity affect diversity in recommender systems? A serendipity-oriented greedy algorithm
Abstract
Most recommender systems suggest items that are popular among all users and similar to items a user usually consumes. As a result, the user receives recommendations that she/he is already familiar with or would find anyway, leading to low satisfaction. To overcome this problem, a recommender system should suggest novel, relevant and unexpected, i.e., serendipitous, items. In this paper, we propose a serendipity-oriented re-ranking algorithm called the serendipity-oriented greedy (SOG) algorithm, which improves the serendipity of recommendations through feature diversification and helps overcome the overspecialization problem. To evaluate our algorithm, we employed the only publicly available dataset containing user feedback regarding serendipity. We compared our SOG algorithm with topic diversification, a popularity baseline, singular value decomposition, serendipitous personalized ranking and Zheng's algorithm on this dataset. SOG outperforms the other algorithms in terms of serendipity and diversity. It also outperforms the serendipity-oriented algorithms in terms of accuracy, but underperforms the accuracy-oriented algorithms in terms of accuracy. We found that an increase in diversity can hurt accuracy and either harm or improve serendipity, depending on the size of the increase.
Keywords
Recommender systems · Learning to rank · Serendipity · Novelty · Unexpectedness · Algorithms · Evaluation · Serendipity 2018
Mathematics Subject Classification
97R70 (User programs, administrative applications)
1 Introduction
Recommender systems are software tools that suggest items of use to users [17, 27]. An item is “a piece of information that refers to a tangible or digital object, such as a good, a service or a process that a recommender system suggests to the user in an interaction through the Web, email or text message” [17]. For example, an item could refer to a movie, a song or a new friend.
To increase the number of items that will receive high ratings, most recommender systems tend to suggest items that are (a) popular, as these items are consumed by many individuals and are often of high quality in many domains [5], and (b) similar to those the user has assigned high ratings, as these items correspond to the user's preferences [17, 19, 29]. As a result, users might become bored with the suggestions provided, as (a) users are likely to be familiar with popular items, while the main reason to use a recommender system is to find novel and relevant items [5], and (b) users often lose interest in using the system when they are offered only items similar to items from their profiles (the so-called overspecialization problem) [17, 18, 19, 29]. Here the term user profile refers to the unique ID and the set of items rated by the target user [17], though in other papers it might include further information, such as real name, user name and age.

An item is relevant to a user if the user has expressed or will express preference for the item. The user might express his/her preference by liking or consuming the item depending on the application scenario of a particular recommender system [17, 19]. In different scenarios, ways to express preference might vary. For example, we might regard a movie as relevant to a user if the user gave it more than 3 stars out of 5 [21, 33], whereas we might regard a song as relevant to a user if the user listened to it more than twice. The system is aware that a particular item is relevant to a user if the user rates the item, and unaware of its relevance otherwise.

An item is novel to a user if the user had not heard of this item or had not thought of it prior to its recommendation [16]. Items novel to a user are usually unpopular, as users are often familiar with popular items, where popularity can be measured by the number of ratings an item has received in the system [5, 17, 18, 19]. For example, a user is more likely to be familiar with the popular movie “The Shawshank Redemption” than with the unpopular movie “Coherence”. Novel items also have to be relatively dissimilar to the user profile, as the user is likely to be familiar with items similar to the ones she/he has rated [17, 19]. For example, a rock fan is more likely to be familiar with rock songs than with pop songs.

An item is unexpected to a user if the user does not anticipate this item to be recommended to him/her or found by him/her, or if this item is very dissimilar to what this user usually consumes [16]. The user does not expect items that are dissimilar to the ones usually recommended to him/her. Generally, recommender systems suggest items similar to items rated by the user [17, 19, 29]. Consequently, an item dissimilar to the rated ones is regarded as unexpected [17, 19]. The measure of dissimilarity could be based on user ratings or item attributes depending on the application scenario of a recommender system [13]. For example, a comedy fan would mostly rate comedies and receive recommendations of comedies in a recommender system. A recommendation of a documentary would be unexpected to the user, as the user does not expect a recommendation of this genre from this particular recommender system.
State-of-the-art serendipity-oriented recommendation algorithms are rarely compared with one another and often employ different serendipity metrics and definitions of the concept, as there is no agreement on the definition of serendipity in recommender systems [19, 21, 32].
In this paper, we propose a serendipity-oriented recommendation algorithm based on our definition above. We compare our algorithm with state-of-the-art serendipity-oriented algorithms relying on the first and currently the only publicly available dataset containing user feedback regarding serendipity.
Our serendipity-oriented algorithm re-ranks recommendations provided by an accuracy-oriented algorithm and improves serendipity through feature diversification. The proposed algorithm is based on the existing re-ranking algorithm topic diversification (TD) [34], and outperforms this algorithm and other algorithms in terms of serendipity and diversity. Our algorithm also outperforms the state-of-the-art serendipity-oriented algorithms in terms of accuracy.

The advantages of the proposed algorithm are the following:

– It considers each component of serendipity.

– It improves both serendipity and diversity.

– It can be applied to any accuracy-oriented algorithm.

The contributions of this paper are the following:

– We propose a serendipity-oriented recommendation algorithm.

– We evaluate existing serendipity-oriented recommendation algorithms.

– We investigate the effect of diversity on accuracy and serendipity.
2 Related work
In this section, we discuss definitions of serendipity and diversity, and give an overview of algorithms that improve these properties.
2.1 Definition of serendipity
The term serendipity was coined by Horace Walpole in 1754 in reference to the Persian fairy tale “The Three Princes of Serendip”. In the fairy tale, the three princes of the country Serendip ventured out to explore the world and made many unexpected discoveries on their way.^{1} In his letter, Horace Walpole mentioned that the princes were “always making discoveries, by accidents & sagacity, of things which they were not in quest of” [26].
The dictionary definition of serendipity is “the faculty of making fortunate discoveries by accident”.^{2} However, there is no consensus on the definition of serendipity in recommender systems. Some researchers require items to be relevant and unexpected to be considered serendipitous [19, 22], whereas other researchers suggest that serendipitous items are novel and unexpected [19, 32]. However, the most common definition of serendipity includes all three components: relevance, novelty and unexpectedness [17, 18].
Novelty and unexpectedness also have multiple definitions, which results in eight variations of serendipity [16]. Novelty has two variations: strict novelty—the user has never heard about an item; and motivational novelty—the user had not thought of consuming an item before it was recommended to him/her. Unexpectedness has four variations: unexpectedness (relevant)—the user does not expect to enjoy the item; unexpectedness (find)—the user does not expect to find the item on his/her own; unexpectedness (implicit)—the item is very dissimilar to what the user usually consumes; and unexpectedness (recommend)—the user does not expect the item to be recommended to him/her. Relevance has one variation and indicates how much the user enjoys consuming the item. Since relevance has one variation, novelty two and unexpectedness four, there are 1 × 2 × 4 = 8 variations of serendipity (proposed in [16]): strict serendipity (relevant), strict serendipity (find), strict serendipity (implicit), strict serendipity (recommend), motivational serendipity (relevant), motivational serendipity (find), motivational serendipity (implicit) and motivational serendipity (recommend). We employ these variations in this study.
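The eight labels follow directly from this combination of components; the short sketch below is purely illustrative (the label strings are ours) and simply enumerates them:

```python
from itertools import product

# One relevance component, two novelty variations and four unexpectedness
# variations give 1 x 2 x 4 = 8 serendipity variations.
novelty = ["strict", "motivational"]
unexpectedness = ["relevant", "find", "implicit", "recommend"]

variations = [f"{n} serendipity ({u})" for n, u in product(novelty, unexpectedness)]
print(variations)  # eight labels, e.g. 'strict serendipity (find)'
```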
2.2 Improving serendipity
There are three categories of serendipity-oriented algorithms [19]: (a) re-ranking algorithms (these algorithms change the order of items in recommendation lists using relevance scores provided by accuracy-oriented algorithms); (b) serendipity-oriented modifications (these algorithms are based on particular accuracy-oriented algorithms); and (c) novel algorithms (these algorithms are not based on any common accuracy-oriented algorithms, but rather utilize different techniques to improve serendipity).
Re-ranking algorithms improve serendipity by changing the order of the output of accuracy-oriented algorithms [19]. These algorithms often use relevance scores to filter out potentially irrelevant items first and then use other techniques to promote potentially serendipitous ones. For example, the algorithm proposed by Adamopoulos and Tuzhilin first filters out items likely to be irrelevant and obvious to a user and then orders items based on their overall utility for the user. The latter is based on how different an item is from the user's expectations and on the relevance scores for this item provided by an accuracy-oriented algorithm [1]. Another example is Auralist, the algorithm proposed by Zhang et al. [32]. It combines three sub-algorithms: Basic Auralist, which is responsible for relevance scores, Listener Diversity, which is responsible for diversity, and Declustering, which is responsible for unexpectedness. The algorithm orders items in the recommendation list according to a final score, which is a linear combination of the scores provided by the three sub-algorithms.
Serendipity-oriented modifications refer to common accuracy-oriented algorithms modified with the purpose of increasing serendipity [19]. The main difference between re-ranking algorithms and modifications is that modifications are always based on particular accuracy-oriented algorithms, whereas a particular re-ranking process can be applied to any accuracy-oriented algorithm that provides relevance scores. For example, Nakatsuji et al. modified a common user-based collaborative filtering algorithm (the k-nearest neighbor algorithm) [10] by replacing the user similarity measure with relatedness, which is calculated using random walks with restarts on a user similarity graph [23]. The graph consists of nodes corresponding to users and edges corresponding to similarities based on an item taxonomy. By utilizing the relatedness, the algorithm picks, for a target user, a neighborhood of users who are not necessarily similar, but who are in some way related to the target user [23]. Another example of a modification is the algorithm proposed by Zheng et al., which is based on PureSVD (a variation of the singular value decomposition algorithm) [7]. The main difference between PureSVD and its modification is that the objective function of the modification includes components responsible for unexpectedness, whereas the objective function of PureSVD lacks these components [33].
Novel serendipity-oriented algorithms fall into neither the re-ranking nor the modification category, as they are not based on any common accuracy-oriented algorithms and do not use relevance scores provided by such algorithms [19]. For example, TANGENT recommends items using relevance scores and bridging scores, where both kinds of scores are inferred from a bipartite graph [24]. The graph contains nodes that represent users and items, and edges that represent ratings. The algorithm calculates relevance scores using random walks with restarts and bridging scores based on the calculated relevance scores [24]. Another example of an algorithm that belongs to the category of novel algorithms is a random walk with restarts enhanced with knowledge infusion [8]. The algorithm orders items in recommendation lists according to their relatedness to a user profile. The relatedness is calculated using random walks with restarts on an item similarity graph, where nodes correspond to items and edges correspond to similarities between these items. To calculate the similarities, the authors used a spreading activation network based on Wikipedia and WordNet [8].
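Several of the approaches above [8, 23, 24] rely on random walks with restarts over a similarity graph. The following minimal numpy sketch illustrates that primitive via power iteration; the function name, parameters and the assumption of a dense similarity matrix are ours and do not reproduce any of the cited implementations:

```python
import numpy as np

def random_walk_with_restarts(similarity, seed_idx, restart_prob=0.15,
                              iters=100, tol=1e-8):
    """Relatedness scores of all nodes with respect to a seed node.

    `similarity` is a hypothetical (n x n) non-negative matrix whose nodes are
    users or items and whose entries are edge weights (similarities).
    """
    n = similarity.shape[0]
    # Column-normalize so each column sums to 1 (transition probabilities).
    col_sums = similarity.sum(axis=0).astype(float)
    col_sums[col_sums == 0] = 1.0
    transition = similarity / col_sums

    restart = np.zeros(n)
    restart[seed_idx] = 1.0
    scores = restart.copy()
    for _ in range(iters):
        new_scores = (1 - restart_prob) * transition @ scores + restart_prob * restart
        if np.abs(new_scores - scores).sum() < tol:
            scores = new_scores
            break
        scores = new_scores
    return scores  # higher score = more related to the seed node
```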
Most existing algorithms have been designed to achieve serendipity measured by artificial evaluation metrics, due to the lack of publicly available datasets containing user feedback regarding serendipity. The results of artificial evaluation metrics might be misleading, as the assumptions these metrics are based on might not correspond to reality in the absence of ground truth [15]. Furthermore, the performance of most existing algorithms has not been compared with that of others [18]. In this article, we propose a re-ranking algorithm and compare it with state-of-the-art algorithms in an evaluation conducted on the first publicly available dataset containing serendipity ground truth.
2.3 Definition of diversity
Diversity is a property of a recommendation list, or a set of lists composed by one or several recommender systems, that reflects how dissimilar the items in the list are to each other [4, 14]. To measure diversity inside a list, researchers often calculate the average pairwise dissimilarity of items in the recommendation list [4, 14], where dissimilarity can be represented by any metric that reflects how dissimilar items are to one another. A dissimilarity metric is often based on attributes of items. The higher the average pairwise dissimilarity, the higher the diversity of the list.
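As an illustration, the intra-list diversity described above can be computed as follows; the function and the dissimilarity callback are placeholders for whatever metric a particular system uses:

```python
import numpy as np

def intra_list_diversity(items, dissimilarity):
    """Average pairwise dissimilarity of a recommendation list.

    `items` is a list of item ids; `dissimilarity(i, j)` is any function
    returning a value in [0, 1], e.g. 1 - cosine similarity of attribute
    vectors (both names are illustrative placeholders).
    """
    pairs = [(i, j) for idx, i in enumerate(items) for j in items[idx + 1:]]
    if not pairs:
        return 0.0
    return float(np.mean([dissimilarity(i, j) for i, j in pairs]))
```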
Diversity is considered a desirable property of a recommender system, as it has been shown to improve user satisfaction [34], and by diversifying the recommendation results, we are more likely to suggest an item satisfying a current need of the target user [14]. For example, a fan of the movie The Matrix is likely to prefer a recommendation list of movies similar to The Matrix, including this movie, over a recommendation list consisting of The Matrix sequels only.
Diversity is not always related to dissimilarity of items in a particular recommendation list. The term can also refer to diversity of recommendations provided by different recommender systems [3], diversity across recommendation lists suggested to all the users of a particular system [2], or diversity of recommendations to the same user in a particular system over time [20]. In these cases one needs a more complicated diversity measure than the pairwise average diversity of items inside one recommendation list.
2.4 Improving diversity
Greedy re-ranking algorithms are very common for improving the diversity of recommendation lists. They maintain two lists of items (a candidate list and a recommendation list) and iteratively move items from the candidate list to the recommendation list [4, 14]. In each iteration, these algorithms calculate scores, which depend on the particular algorithm, and use them to pick the item to be moved from the candidate list to the recommendation list [4, 14]. For example, the TD algorithm, on which our algorithm is based, calculates in each iteration the average similarity between each item in the candidate list and the items in the recommendation list, and uses the obtained scores to pick the item that is the most relevant but at the same time the most dissimilar to the items already added to the recommendation list [34].
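The sketch below illustrates this greedy re-ranking idea in the spirit of TD. It is a simplification: the original TD [34] merges rank positions of a relevance ranking and a dissimilarity ranking using the damping factor \(\varTheta_F\), whereas this sketch blends the two scores directly (both assumed normalized to [0, 1]):

```python
def greedy_diversification(candidates, relevance, dissimilarity, theta_f=0.9, n=10):
    """Iteratively pick the candidate that balances predicted relevance
    against similarity to the items already chosen (TD-style sketch).
    `relevance` maps item id to a predicted score; `dissimilarity(i, j)`
    returns a value in [0, 1]; `theta_f` controls diversification strength.
    """
    result = [max(candidates, key=lambda i: relevance[i])]  # most relevant item first
    remaining = [i for i in candidates if i != result[0]]
    while remaining and len(result) < n:
        def score(i):
            avg_dissim = sum(dissimilarity(i, j) for j in result) / len(result)
            return (1 - theta_f) * relevance[i] + theta_f * avg_dissim
        best = max(remaining, key=score)
        result.append(best)
        remaining.remove(best)
    return result
```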
Another group of algorithms optimized for diversity takes diversity into account in the process of generating recommendations. For example, Su et al. proposed an algorithm that integrates diversification into a traditional matrix factorization model [28]. Another example of an algorithm falling into this category is the diversified collaborative filtering algorithm (DCF), which employs a combination of a support vector machine and parametrized matrix factorization to generate accurate and diversified recommendation lists [6].
To the best of our knowledge, studies that focus on both serendipity and diversity are very limited. In this article, we propose an algorithm that improves both serendipity and diversity.
3 A serendipity-oriented greedy algorithm
Notations
Symbol  Description 

\(I=\{i_1, i_2, \ldots , i_{|I|}\}\)  The set of items 
\(I_u, I_u \subseteq I\)  The set of items rated by user u (user profile) 
\(U=\{u_1, u_2, \ldots , u_{|U|}\}\)  The set of users 
\(U_{i}, U_i \subseteq U\)  The set of users who rated item i 
\(RS_u(n), RS_u(n) \subseteq I\)  The set of top-n recommendations provided by an algorithm to user u 
\(r_{ui}\)  The rating given by user u to item i 
\(\hat{r}_{ui}\)  The prediction of the rating given by user u to item i 
3.1 Description
Algorithm 1 describes the proposed approach. An accuracy-oriented algorithm predicts item ratings \(\hat{r}_{ui}\) and generates top-n suggestions \(RS_u(n)\) for user u. SOG iteratively picks items from the set corresponding to \(RS_u(n)\) to fill the diversified list Res. In each iteration, the algorithm generates a candidate set \(B'\) that contains the top-n recommendations \(RS_u(n)\) except the items already picked into the list Res (converted to the set B). The candidate item with the highest score is added to the diversified list Res. The result Res contains the same items as \(RS_u(n)\), but in a (possibly) different order.
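A minimal sketch of this greedy loop is given below. Since Eq. (2) is not reproduced here, the candidate score is shown as an illustrative weighted combination of the four components the algorithm uses (relevance, diversity with respect to the items already picked, dissimilarity to the user profile and unpopularity), with the weights \(\alpha_{rel}\), \(\alpha_{div}\), \(\alpha_{prof}\) and \(\alpha_{unpop}\) mentioned in Sect. 4.2; it is not the exact scoring function of SOG:

```python
def sog_rerank(rs_u, rel, div, prof_dissim, unpop,
               a_rel=0.9, a_div=0.1, a_prof=0.7, a_unpop=0.7):
    """Greedy re-ranking sketch of SOG (illustrative, not Eq. (2) itself).

    `rs_u` is the top-n list RS_u(n) from the base algorithm, `rel[i]` the
    predicted relevance, `div(i, picked)` the dissimilarity of i to the items
    already picked, `prof_dissim[i]` the dissimilarity of i to the user
    profile and `unpop[i]` the unpopularity of i; all assumed pre-computed
    and normalized to [0, 1].
    """
    res = []
    candidates = list(rs_u)
    while candidates:
        def score(i):
            d = div(i, res) if res else 0.0
            return (a_rel * rel[i] + a_div * d +
                    a_prof * prof_dissim[i] + a_unpop * unpop[i])
        best = max(candidates, key=score)  # candidate with the highest score
        res.append(best)
        candidates.remove(best)
    return res  # same items as RS_u(n), possibly in a different order
```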

The key properties of SOG are the following:

– SOG considers item scores instead of positions of items in lists, which leads to more accurate scores (\(score_{ui}\)).

– SOG takes into account parameters important for serendipity.

– The algorithm considers each component of serendipity.

– As our algorithm is based on the diversification algorithm, SOG improves both serendipity and diversity.

– As SOG is a re-ranking algorithm, it can be applied to any accuracy-oriented algorithm, which might be useful for a live recommender system (re-ranking could also be conducted on the client’s side in a client-server application scenario).

– Our algorithm employs four weights that allow one to control serendipity. The weights could be different for each user and be adjusted as the user becomes familiar with the system.
3.2 Computational complexity
In cases where n must be very high, making the computation time unacceptable, one may ignore \(div_{i,B}\) in Eq. (2), which decreases the computational complexity to \(\mathcal {O}(n^2)\). This also weakens the diversification effect of the algorithm, but preserves the improvement in serendipity.
4 Experiments
In this section, we present the dataset we used in our experiments, baseline algorithms and evaluation metrics.
4.1 Dataset
To compare the performance of our algorithm with the baselines, we evaluated these algorithms on the Serendipity 2018 dataset, as, to the best of our knowledge, this is the only publicly available dataset that contains user feedback regarding serendipity [16]. As the amount of this feedback is limited, we generated additional user feedback based on this dataset. We then split the resulting dataset into three datasets to tune and evaluate our baselines. In this section, we first describe the dataset and then provide details on its preprocessing.
4.1.1 Description
Serendipity 2018 contains ratings given by users to movies in the movie recommender system MovieLens,^{3} where users rate movies they have watched on a scale from 0.5 to 5 stars and receive recommendations of movies to watch based on their ratings. The authors of the dataset conducted a survey in MovieLens, in which they asked users how serendipitous they found particular movies. For the survey, the authors selected movies that had received a low number of ratings in the system (unpopular movies) and had been given high ratings by the users (relevant movies), as these movies were likely to be serendipitous to the users.
The authors proposed eight variations of serendipity and asked users to indicate how serendipitous each movie was to them according to each of the eight variations (see Sect. 2.1). The dataset thus contains eight binary variables indicating whether a movie is serendipitous or not according to a particular variation.
The dataset contains two types of user feedback: relevance ratings and serendipity ratings. Relevance ratings are 5-star ratings that indicate how much users enjoyed watching the movies. Serendipity ratings are binary ratings that indicate whether users considered movies serendipitous or non-serendipitous. Serendipity 2018 contains 10 million relevance ratings given by 104,661 users to 49,151 different movies and 2150 serendipity ratings given by 481 users (up to five serendipity ratings per user) to 1678 different movies.
4.1.2 Preprocessing
In this experiment, we targeted the union of six variations of serendipity, as the two remaining variations are likely to reduce user satisfaction [16]. A movie was considered serendipitous to a user if it was serendipitous to the user according to at least one of the six remaining serendipity variations: strict serendipity (find), strict serendipity (implicit), strict serendipity (recommend), motivational serendipity (find), motivational serendipity (implicit) and motivational serendipity (recommend). For a detailed discussion, the reader is referred to [16] and Sect. 2.1.
We generated a number of serendipity ratings due to the lack of these ratings in the dataset. For each user who participated in the survey, we randomly selected five movies the user had rated with a relevance rating and labeled these movies non-serendipitous for this user. We regarded these movies as non-serendipitous because they were unlikely to be serendipitous to the users: according to the dataset, the chance that a movie is serendipitous to a user is at most 13%,^{4} even though the authors of the dataset deliberately selected movies likely to be serendipitous for their survey (those movies were relevant to the users, as the users gave them high relevance ratings, and likely to be novel, as they had a relatively low number of ratings in MovieLens). When randomly selecting relevance ratings to label non-serendipitous, we did not control for popularity or relevance, so the chance of mistakenly labeling a movie non-serendipitous is much lower than 13%. The final dataset contained 4555 serendipity ratings (2405 of them generated) given by 481 users to 1931 different movies and 10 million relevance ratings given by 104,661 users (including the 481 users) to 49,151 different movies.
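The following pandas sketch illustrates this negative-label generation under assumed column and file names; the actual Serendipity 2018 schema may differ:

```python
import pandas as pd

# Hypothetical schema: relevance ratings (userId, movieId, rating) and the
# ids of the users who provided serendipity feedback in the survey.
ratings = pd.read_csv("relevance_ratings.csv")
survey_users = pd.read_csv("serendipity_ratings.csv")["userId"].unique()

# Sample five relevance-rated movies per survey user and label them
# non-serendipitous (0), as described above.
generated = (
    ratings[ratings["userId"].isin(survey_users)]
    .groupby("userId", group_keys=False)
    .apply(lambda df: df.sample(n=min(5, len(df)), random_state=0))
    .assign(serendipitous=0)
)
```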
To tune and evaluate the baselines, we split the final dataset into three datasets: the training dataset, the tuning dataset and the test dataset. The training dataset contains almost 10 million relevance ratings, the tuning dataset contains 3043 relevance and serendipity ratings (67% of the serendipity ratings) of the same user-movie pairs, and the test dataset contains 1512 relevance and serendipity ratings (33% of the serendipity ratings) of the same user-movie pairs. To tune the parameters of the baselines, we trained them on the relevance ratings of the training dataset and tuned the parameters based on the performance of these baselines on the serendipity ratings of the tuning dataset. We then trained the baselines with the inferred parameters on the relevance ratings of the training and tuning datasets combined and evaluated them on the relevance ratings (to measure relevance) and serendipity ratings (to measure serendipity) of the test dataset.
4.1.3 Similarity measure
4.2 Baselines

POP ranks items according to the number of ratings each item received in descending order.
 SVD is a singular value decomposition algorithm that ranks items according to generated scores [33]. The objective function of the algorithm is the following:
$$\begin{aligned} \min \quad \sum _{u \in U} \sum _{i \in I_u} (r_{ui} - p_{u} \cdot q_{i}^{T})^2 + \beta (\Vert p_u\Vert ^2 + \Vert q_i\Vert ^2) , \end{aligned}$$(10)
where \(p_{u}\) and \(q_{i}\) are the user-factor vector and the item-factor vector, respectively, while \(\beta (\Vert p_u\Vert ^2 + \Vert q_i\Vert ^2)\) represents the regularization term. Based on tuning, we picked the following parameters: feature number \(=\) 200, learning rate \(= 10^{-5}\) and regularization term \(=\) 0.1. (A minimal gradient-descent sketch of this objective is given after the baseline list.)
 SPR (serendipitous personalized ranking) is an algorithm based on SVD that maximizes the serendipitous area under the ROC (receiver operating characteristic) curve [21]:
$$\begin{aligned} \max \quad \sum _{u \in U} f(u) , \end{aligned}$$(11)
$$\begin{aligned} f(u) = \sum _{i \in I_u^+} \sum _{j \in I_u \backslash I_u^+} z_u \cdot \sigma (0, \hat{r}_{ui} - \hat{r}_{uj}) \cdot (\vert U_j\vert )^\alpha , \end{aligned}$$(12)
where \(I_u^+\) is the set of items a user likes. We considered that a user likes items that she/he rates higher than threshold \(\theta \) (in our experiments \(\theta =3\)). The normalization term \(z_u\) is calculated as follows: \(z_u = \frac{1}{\vert I_u^+\vert \cdot \vert I_u \backslash I_u^+\vert }\). Based on tuning, we picked the following parameters: Bayesian loss function, \(\alpha =0.4\), feature number \(=\) 200, learning rate \(= 10^{-5}\) and regularization term \(=\) 0.1.
 Zheng’s is an algorithm based on SVD that considers observed and unobserved ratings and weights the error with unexpectedness [33]:
$$\begin{aligned} \min \quad \sum _{u \in U} \sum _{i \in I_u} (r_{ui} - p_{u} \cdot q_{i}^{T})^2 \cdot w_{ui} + \beta (\Vert p_u\Vert ^2 + \Vert q_i\Vert ^2) , \end{aligned}$$(13)
$$\begin{aligned} w_{ui} = \left( 1 - \frac{\vert U_i\vert }{\max _{j \in I}(\vert U_j\vert )} \right) + \frac{\sum _{j \in I_u} diff(i, j)}{\vert I_u\vert } , \end{aligned}$$(14)
where \(\max _{j \in I}(\vert U_j\vert )\) is the maximum number of ratings given to an item. The collaborative dissimilarity between items i and j is represented by diff(i, j) and calculated as \(diff(i, j) = 1 - \rho _{i,j}\), where the similarity \(\rho _{i,j}\) corresponds to the Pearson correlation coefficient:
$$\begin{aligned} \rho _{i, j} = \frac{\sum _{u \in S_{i,j}} (r_{ui} - \overline{r}_u)(r_{uj} - \overline{r}_u)}{\sqrt{\sum _{u \in S_{i,j}} (r_{ui} - \overline{r}_u)^2}\sqrt{\sum _{u \in S_{i,j}} (r_{uj} - \overline{r}_u)^2}} , \end{aligned}$$(15)
where \(S_{i,j}\) is the set of users who rated both items i and j, while \(\overline{r}_u\) corresponds to the average rating of user u. In our implementation, we excluded unobserved ratings due to the size of our dataset. Based on tuning, we picked the following parameters: feature number \(=\) 200, learning rate \(= 10^{-5}\) and regularization term \(=\) 0.1.

TD is a topic diversification algorithm, where similarity corresponds to Eq. (8) and the ratings are predicted by SVD [34]. Based on tuning, we set \(\varTheta _F=0.9\).

SOG is the proposed serendipityoriented greedy algorithm, where the ratings are predicted by SVD. Based on tuning, we set \(\alpha _{rel} = 0.9\), \(\alpha _{div} = 0.1\), \(\alpha _{prof} = 0.7\) and \(\alpha _{unpop} = 0.7\).
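For reference, a minimal stochastic gradient descent sketch of the SVD baseline objective (Eq. 10) follows; it is an illustration under our own naming and initialization choices, not the implementation used in the experiments:

```python
import numpy as np

def train_svd(ratings, n_users, n_items, n_factors=200,
              lr=1e-5, reg=0.1, n_epochs=20, seed=0):
    """SGD on (r_ui - p_u . q_i)^2 + beta(|p_u|^2 + |q_i|^2) (Eq. 10).

    `ratings` is an iterable of (u, i, r) triples with integer indices;
    hyper-parameters mirror the tuned values reported above.
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, n_factors))
    Q = rng.normal(scale=0.1, size=(n_items, n_factors))
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q  # predicted rating: P[u] @ Q[i]
```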
4.3 Evaluation metrics
 To measure the ranking ability of an algorithm, we use normalized discounted cumulative gain (NDCG), which, in turn, is based on discounted cumulative gain (DCG) [12]:
$$\begin{aligned} DCG_u@n = rel_u(1) + \sum _{i=2}^n \frac{rel_u(i)}{\log _2(pos(i))}, \end{aligned}$$(16)
where \(rel_u(i)\) indicates the relevance of item i with rank pos(i) for user u, while n indicates the number of top recommendations selected. pos(i) is the distance of the item from the beginning of the list (1, 2, 3, ..., n). The NDCG metric is calculated as follows:
$$\begin{aligned} NDCG_u@n = \frac{DCG_u@n}{IDCG_u@n} , \end{aligned}$$(17)
where \(IDCG_u@n\) is the \(DCG_u@n\) value calculated for a recommendation list with an ideal order according to relevance.
 To measure serendipity, we adopted the accuracy metric precision, since user feedback regarding serendipity is a binary variable:
$$\begin{aligned} Serendipity_u@n = \frac{ser_u@n}{n} , \end{aligned}$$(18)
where \(ser_u@n\) corresponds to the number of serendipitous items in the first n results. To tune our algorithms, we used Serendipity@3. (A minimal sketch of both metrics follows this list.)
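A minimal sketch of both metrics, assuming ranked lists of per-item relevance values and binary serendipity labels, could look as follows:

```python
import numpy as np

def ndcg_at_n(relevances, n):
    """`relevances` holds rel_u(i) for the recommended items in ranked order."""
    rels = list(relevances)[:n]
    if not rels:
        return 0.0
    dcg = rels[0] + sum(r / np.log2(pos) for pos, r in enumerate(rels[1:], start=2))
    ideal = sorted(relevances, reverse=True)[:n]
    idcg = ideal[0] + sum(r / np.log2(pos) for pos, r in enumerate(ideal[1:], start=2))
    return dcg / idcg if idcg > 0 else 0.0

def serendipity_at_n(is_serendipitous, n):
    """`is_serendipitous` holds binary serendipity labels in ranked order."""
    return sum(list(is_serendipitous)[:n]) / n
```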
5 Results
Table 2 Serendipity
Algorithm  Serendipity@1  Serendipity@3  Serendipity@5 

POP  0.021  0.021  0.051 
Random  0.170  0.149  0.140 
TD  0.277  0.227  0.226 
SVD  0.277  0.241  0.213 
Zheng’s  0.319  0.248  0.204 
SPR  0.319  0.291  0.268 
SOG  0.277  0.305  0.230 
Table 3 Accuracy
Algorithm  NDCG@1  NDCG@3  NDCG@5 

POP  0.832  0.842  0.864 
Random  0.823  0.850  0.878 
Zheng’s  0.855  0.884  0.902 
SPR  0.850  0.886  0.905 
SOG  0.881  0.887  0.899 
TD  0.881  0.898  0.920 
SVD  0.881  0.903  0.921 
Table 4 Diversity
Algorithm  Div@5 

SVD  0.347 
Zheng’s  0.347 
POP  0.353 
SPR  0.356 
TD  0.359 
Random  0.367 
SOG  0.371 
Our observations are the following:
1. Serendipity (Table 2)
1.1 SOG outperforms the other algorithms at top-3 results.
1.2 SOG underperforms SPR at top-1, as our algorithm always keeps the most relevant item first in the list.
1.3 SOG underperforms SPR at top-5, as our algorithm was tuned for top-3 results.
1.4 POP demonstrates the lowest performance among the presented baselines, as the most popular items are the most well known and the least surprising to the users [18, 19, 32, 33].
1.5 The serendipity-oriented algorithms Zheng's and SPR outperform the accuracy-oriented algorithm SVD, as the serendipity-oriented algorithms were designed to achieve high serendipity.
1.6 Personalized algorithms (Zheng's, SPR, TD, SVD and SOG) outperform non-personalized ones (POP and Random).
2. Accuracy (Table 3)
2.1 SOG underperforms the accuracy-oriented algorithms SVD and TD, and outperforms the serendipity-oriented algorithms SPR and Zheng's, due to the objective of our algorithm.
2.2 Personalized algorithms (Zheng's, SPR, TD, SVD and SOG) outperform non-personalized ones (POP and Random).
2.3 The accuracy-oriented algorithm SVD outperforms the serendipity-oriented ones (SOG, SPR and Zheng's), as SVD was optimized for accuracy, while the other algorithms were optimized for serendipity.
3. Diversity (Table 4)
3.1 SOG outperforms the other algorithms in terms of diversity.
According to observations 1.1 and 3.1, serendipity and diversity are properties that can be increased simultaneously. Meanwhile, observations 1.1 and 2.1 indicate that an increase in serendipity can cause a decrease in accuracy.
5.1 Investigating the effect of diversity
To investigate the effect of diversity on serendipity and accuracy, we ran TD multiple times, varying the damping factor from 0 to 0.95. We picked TD for the sake of simplicity.
6 Discussion
In our experiments, we only considered the movie domain, as the only publicly available serendipity dataset contains information on movies. In other domains, the findings might be different. In fact, in some domains and situations, serendipity either might not be suitable or might need to be redefined. For example, generating a playlist based on a number of songs or keywords might not require any serendipity [30]. Investigating the effect of serendipity in other domains requires user studies and datasets from these domains.
In our experiments, we assumed that the number of candidate items n is relatively small (around 20), as with an increase of n our algorithm is likely to pick items irrelevant to the user, which is likely to put him/her off. This assumption is reasonable when serendipitous recommendations are mixed with non-serendipitous ones. However, in situations when a recommender system needs to suggest serendipitous items to the user regardless of the number of irrelevant ones (a “surprise me” option), n might need to be high, which would significantly increase the time needed to generate recommendations. A solution in this situation might be to choose another baseline algorithm, such as SPR.
7 Conclusion and future work
We proposed the serendipity-oriented greedy (SOG) algorithm and provided evaluation results of our algorithm and state-of-the-art algorithms on the only publicly available dataset that contains user feedback regarding serendipity. We also investigated the effect of diversity on accuracy and serendipity.
According to our results, our algorithm outperforms the other algorithms in terms of serendipity and diversity, and the serendipity-oriented algorithms in terms of accuracy, but underperforms the accuracy-oriented algorithms in terms of accuracy.
We found that accuracy, serendipity and diversity are not independent properties of recommender systems. An increase in diversity can hurt accuracy and either hurt or improve serendipity, depending on the size of the increase.
In our future work, we plan to further investigate serendipity by designing serendipity-oriented algorithms and evaluating them with real users. A bigger serendipity dataset might provide further insights into the phenomenon. Deep learning seems to be a promising direction for designing serendipity-oriented algorithms [25]. User studies might help to further investigate the effect of serendipity on users and the performance of the algorithms in terms of user satisfaction.
Acknowledgements
Open access funding provided by University of Jyväskylä (JYU). The research at the University of Jyväskylä was performed in the MineSocMed project, partially supported by the Academy of Finland, grant #268078 and the KAUTE Foundation.
References
1. Adamopoulos P, Tuzhilin A (2014) On unexpectedness in recommender systems: or how to better expect the unexpected. ACM Trans Intell Syst Technol 5(4):1–32
2. Adomavicius G, Kwon Y (2012) Improving aggregate recommendation diversity using ranking-based techniques. IEEE Trans Knowl Data Eng 24(5):896–911
3. Bellogín A, Cantador I, Castells P (2013) A comparative study of heterogeneous item recommendations in social systems. Inf Sci 221:142–169
4. Castells P, Hurley NJ, Vargas S (2015) Novelty and diversity in recommender systems. In: Ricci F, Rokach L, Shapira B (eds) Recommender systems handbook. Springer, Boston, pp 881–918
5. Celma Herrada Ò (2009) Music recommendation and discovery in the long tail. Ph.D. thesis, Universitat Pompeu Fabra
6. Cheng P, Wang S, Ma J, Sun J, Xiong H (2017) Learning to recommend accurate and diverse items. In: Proceedings of the 26th international conference on World Wide Web, pp 183–192. International World Wide Web Conferences Steering Committee
7. Cremonesi P, Koren Y, Turrin R (2010) Performance of recommender algorithms on top-n recommendation tasks. In: Proceedings of the fourth ACM conference on recommender systems, pp 39–46. ACM, New York, NY, USA. https://doi.org/10.1145/1864708.1864721
8. de Gemmis M, Lops P, Semeraro G, Musto C (2015) An investigation on the serendipity problem in recommender systems. Inf Process Manag 51(5):695–717. https://doi.org/10.1016/j.ipm.2015.06.008
9. Ekstrand MD, Ludwig M, Konstan JA, Riedl JT (2011) Rethinking the recommender research ecosystem: reproducibility, openness, and LensKit. In: Proceedings of the 5th ACM conference on recommender systems, pp 133–140. ACM, New York, NY, USA
10. Ekstrand MD, Riedl JT, Konstan JA (2011) Collaborative filtering recommender systems. Found Trends Hum Comput Interact 4(2):81–173. https://doi.org/10.1561/1100000009
11. Gunawardana A, Shani G (2015) Evaluating recommender systems. In: Ricci F, Rokach L, Shapira B (eds) Recommender systems handbook. Springer, Boston, pp 265–308
12. Järvelin K, Kekäläinen J (2000) IR evaluation methods for retrieving highly relevant documents. In: Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval, pp 41–48. ACM, New York, NY, USA
13. Kaminskas M, Bridge D (2014) Measuring surprise in recommender systems. In: Proceedings of the workshop on recommender systems evaluation: dimensions and design (workshop programme of the 8th ACM conference on recommender systems)
14. Kaminskas M, Bridge D (2016) Diversity, serendipity, novelty, and coverage: a survey and empirical analysis of beyond-accuracy objectives in recommender systems. ACM Trans Interact Intell Syst (TiiS) 7(1):2
15. Kotkov D (2018) Serendipity in recommender systems. Jyväskylä studies in computing (281)
16. Kotkov D, Konstan JA, Zhao Q, Veijalainen J (2018) Investigating serendipity in recommender systems based on real user feedback. In: Proceedings of SAC 2018: symposium on applied computing. ACM
17. Kotkov D, Veijalainen J, Wang S (2016) Challenges of serendipity in recommender systems. In: Proceedings of the 12th international conference on web information systems and technologies, vol 2, pp 251–256. SCITEPRESS
18. Kotkov D, Veijalainen J, Wang S (2017) A serendipity-oriented greedy algorithm for recommendations. In: Proceedings of the 13th international conference on web information systems and technologies, vol 1, pp 32–40. ScitePress
19. Kotkov D, Wang S, Veijalainen J (2016) A survey of serendipity in recommender systems. Knowl Based Syst 111:180–192
20. Lathia N, Hailes S, Capra L, Amatriain X (2010) Temporal diversity in recommender systems. In: Proceedings of the 33rd international ACM SIGIR conference on research and development in information retrieval, pp 210–217. ACM
21. Lu Q, Chen T, Zhang W, Yang D, Yu Y (2012) Serendipitous personalized ranking for top-n recommendation. In: Proceedings of the IEEE/WIC/ACM international joint conferences on web intelligence and intelligent agent technology, vol 1, pp 258–265. IEEE Computer Society, Washington, DC, USA
22. Maksai A, Garcin F, Faltings B (2015) Predicting online performance of news recommender systems through richer evaluation metrics. In: Proceedings of the 9th ACM conference on recommender systems, pp 179–186. ACM, New York, NY, USA
23. Nakatsuji M, Fujiwara Y, Tanaka A, Uchiyama T, Fujimura K, Ishida T (2010) Classical music for rock fans? Novel recommendations for expanding user interests. In: Proceedings of the 19th ACM international conference on information and knowledge management, CIKM ’10, pp 949–958. ACM, New York, NY, USA. https://doi.org/10.1145/1871437.1871558
24. Onuma K, Tong H, Faloutsos C (2009) TANGENT: a novel, ’surprise me’, recommendation algorithm. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’09, pp 657–666. ACM, New York, NY, USA. https://doi.org/10.1145/1557019.1557093
25. Pandey G, Kotkov D, Semenov A (2018) Recommending serendipitous items using transfer learning. In: Proceedings of the 27th ACM international conference on information and knowledge management, pp 1771–1774. ACM
26. Remer TG (ed) (1965) Serendipity and the three princes: from the Peregrinaggio of 1557. University of Oklahoma Press, Norman, p 20
27. Ricci F, Rokach L, Shapira B (2011) Recommender systems handbook, chap. Introduction to recommender systems handbook, pp 1–35. Springer, New York
28. Su R, Yin L, Chen K, Yu Y (2013) Set-oriented personalized ranking for diversified top-n recommendation. In: Proceedings of the 7th ACM conference on recommender systems, pp 415–418. ACM
29. Tacchini E (2012) Serendipitous mentorship in music recommender systems. Ph.D. thesis
30. Vall A, Dorfer M, Schedl M, Widmer G (2018) A hybrid approach to music playlist continuation based on playlist-song membership, pp 1374–1382. https://doi.org/10.1145/3167132.3167280
31. Vig J, Sen S, Riedl J (2012) The tag genome: encoding community knowledge to support novel interaction. ACM Trans Interact Intell Syst (TiiS) 2(3):13
32. Zhang YC, Ó Séaghdha D, Quercia D, Jambor T (2012) Auralist: introducing serendipity into music recommendation. In: Proceedings of the 5th ACM international conference on web search and data mining, pp 13–22. ACM, New York, NY, USA
33. Zheng Q, Chan CK, Ip HH (2015) An unexpectedness-augmented utility model for making serendipitous recommendation. In: Advances in data mining: applications and theoretical aspects, vol 9165, pp 216–230. Springer International Publishing
34. Ziegler CN, McNee SM, Konstan JA, Lausen G (2005) Improving recommendation lists through topic diversification. In: Proceedings of the 14th international conference on World Wide Web, pp 22–32. ACM, New York, NY, USA
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.