Balancing the popularity bias of object similarities for personalised recommendation

Network-based similarity measures have found wide applications in recommendation algorithms and made significant contributions to uncovering users’ potential interests. However, existing measures are generally biased in terms of popularity: popular objects tend to have more common neighbours with others and are thus considered more similar to others. Such popularity bias in similarity quantification results in biased recommendations, with either poor accuracy or poor diversity. Based on the bipartite network modelling of user-object interactions, this paper first calculates the expected number of common neighbours of two objects with given popularities in random networks. A Balanced Common Neighbour similarity index is accordingly developed by removing the random-driven common neighbours, estimated as the expected number, from the total number. Recommendation experiments on three data sets show that balancing the popularity bias to a certain degree can significantly improve the recommendations’ accuracy and diversity simultaneously.


Introduction
The overwhelming amount of online information, though it provides users with massive and diverse choices, makes it more and more difficult for them to find what they really want. Accordingly, users largely rely on information filtering systems, such as search engines and recommender systems [1][2][3], to look for relevant information.
Recommender systems have received significant attention and found wide application over the past decades owing to their ability to find users' potential interests [4][5][6]. Various algorithms have been developed, including content-based systems, which apply object information such as attributes [7], contents [8] and tags [9,10] to define similarities, and the most widely used collaborative filtering systems [1,11,12]. Modelling the interactions between users and objects as bipartite networks [13,14], collaborative filtering systems normally examine the association patterns of either users accessing the same objects or objects being accessed by the same users, leading to user-based and object-based collaborative filtering respectively. Many practical online systems are object-based systems using object similarities, such as Amazon's product recommender system [4] and YouTube's video recommender system [5]. These systems generally recommend objects that are similar to the target users' historical selections. The quantification of object similarity consequently becomes the crucial part of such recommender systems.
e-mail: l.hou@pgr.reading.ac.uk
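Based on the user-object bipartite network modelling, where the users and objects are abstracted as two sets of vertices, object similarities can be evaluated according to the structure of the network. The most widely used approach of this kind is the Common Neighbour (CN) index, which examines the size of two objects' common neighbourhood. Letting $\Gamma_\alpha$ be the set of users connected to object $\alpha$, the CN similarity between objects $\alpha$ and $\beta$ can be written as $s^{CN}_{\alpha\beta} = |\Gamma_\alpha \cap \Gamma_\beta|$. Taking into account the popularities, i.e. the degrees $k$, of the objects, a number of variations have been developed, such as the Leicht-Holme-Newman (LHN) index [15] and the Hub-Promoted (HP) index [16], which read
$$s^{LHN}_{\alpha\beta} = \frac{|\Gamma_\alpha \cap \Gamma_\beta|}{k_\alpha k_\beta} \quad \text{and} \quad s^{HP}_{\alpha\beta} = \frac{|\Gamma_\alpha \cap \Gamma_\beta|}{\min\{k_\alpha, k_\beta\}}$$
respectively. On the other hand, the Adamic-Adar (AA) index [17] and the Resource Allocation (RA) index [18] weight each common neighbour of the CN index by its user degree and thus read $s^{AA}_{\alpha\beta} = \sum_{i \in \Gamma_\alpha \cap \Gamma_\beta} 1/\log(k_i)$ and $s^{RA}_{\alpha\beta} = \sum_{i \in \Gamma_\alpha \cap \Gamma_\beta} 1/k_i$ respectively. Diffusion processes have also been introduced to measure vertex similarity in bipartite networks, leading to the Mass Diffusion (MD) index [13] and the Heat Conduction (HC) index [19], which read
$$s^{MD}_{\alpha\beta} = \frac{1}{k_\alpha} \sum_{i \in \Gamma_\alpha \cap \Gamma_\beta} \frac{1}{k_i} \quad \text{and} \quad s^{HC}_{\alpha\beta} = \frac{1}{k_\beta} \sum_{i \in \Gamma_\alpha \cap \Gamma_\beta} \frac{1}{k_i}$$
respectively, with the resource spreading from object $\alpha$ to object $\beta$. For a review of these similarity quantifications, see reference [3].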
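The indices listed above can all be obtained from the user-object adjacency matrix with a few matrix products. The following is a minimal sketch rather than the authors' implementation: the function name `similarity_indices`, the dense NumPy representation, and the convention that MD divides the RA sum by the degree of the first object while HC divides by the degree of the second are assumptions made for illustration.

```python
import numpy as np

def similarity_indices(A):
    """A: binary user-by-object matrix (rows = users, columns = objects).

    Hypothetical helper; returns a dict of object-by-object similarity matrices.
    Diagonal entries (self-similarities) are computed but carry no meaning here.
    """
    A = np.asarray(A, dtype=float)
    k_obj = A.sum(axis=0)                      # object popularities k_alpha
    k_usr = A.sum(axis=1)                      # user degrees k_i
    cn = A.T @ A                               # cn[a, b] = |Gamma_a ∩ Gamma_b|
    with np.errstate(divide="ignore", invalid="ignore"):
        lhn = np.nan_to_num(cn / np.outer(k_obj, k_obj))
        hp = np.nan_to_num(cn / np.minimum.outer(k_obj, k_obj))
        w_ra = np.where(k_usr > 0, 1.0 / k_usr, 0.0)
        # A user of degree 1 can never be a common neighbour of two distinct
        # objects, so giving it weight 0 leaves off-diagonal AA values unchanged.
        w_aa = np.where(k_usr > 1, 1.0 / np.log(k_usr), 0.0)
        ra = A.T @ (A * w_ra[:, None])
        aa = A.T @ (A * w_aa[:, None])
        md = np.nan_to_num(ra / k_obj[:, None])   # assumed: divide by degree of first object
        hc = np.nan_to_num(ra / k_obj[None, :])   # assumed: divide by degree of second object
    return {"CN": cn, "LHN": lhn, "HP": hp, "AA": aa, "RA": ra, "MD": md, "HC": hc}
```

For realistic data sizes a sparse matrix library would be the natural choice; the dense form is used here only to keep the correspondence with the formulas visible.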
Although they have been widely applied both in the analysis of complex networks and in recommender systems, most existing similarity indices have a systematic popularity bias [12,20,21]. Objects generally tend to be evaluated as more similar to either very popular objects (for indices such as CN and MD) or very unpopular objects (for indices such as HC). As a consequence, the recommendations are biased in terms of popularity, leading to either poor accuracy (HC) or poor diversity (CN, MD). Given the purpose of serving online users, both accuracy and diversity have been argued to be crucial for recommender systems [22][23][24]. Therefore, balancing the popularity bias of object similarities, and exploring to what extent the system should recommend popular objects, is significant for achieving accurate and diverse recommendations.
In this paper we develop a Balanced Common Neighbour (BCN) index for measuring object similarity in bipartite networks, based on the evaluation of the expected common neighbourhood of two objects with specific popularities. Applying the proposed method to personalised recommendation, we show that balancing the popularity bias of similarities to a certain degree can largely improve the performance of the recommendations.

Balanced object similarity index
The most fundamental CN index regards two objects that have been collected by many of the same users as similar to each other. However, popular objects, which have been collected by a large number of users, tend to have more common neighbours with (and thus be more similar to) others than unpopular ones. As shown in Figure 1, popular objects are frequently evaluated by the CN index as the most similar objects to others. If the ranking were random, the popularity distribution of the top similar objects would be expected to be the same as the object popularity distribution of the empirical data. However, the distribution of the CN top similar objects is much higher than the empirical distribution in the tail (the range of popular objects). As a consequence, the recommender system has a tendency to recommend popular objects to users, which may lead to good accuracy but results in poor personalisation.
In fact, the number of common neighbours of two objects $\alpha$ and $\beta$ consists of two components: one that comes from the random mechanism, $n^{rand}_{\alpha\beta}$, and one that comes from the similarity between them, $n^{sim}_{\alpha\beta}$. While the random component is completely popularity-correlated, the other purely describes the similarity regardless of their popularities. Distinguishing these two components is thus important for controlling the popularity tendency and the similarity tendency of the recommender system in order to optimise the recommendations. To do so, we first assume the user-object bipartite network to be completely random and explore the expected number of common neighbours $n^{exp}_{\alpha\beta}$ for two arbitrary objects with given popularities. Considering a random user-object bipartite network with $N$ users and $M$ objects, we let $T$ be the total number of links between users and objects, i.e.
$$T = \sum_{u} k_u = \sum_{o} k_o.$$
We break all the links between users and objects into half-links so that we can explore the expected number of common neighbours in a process of random rewiring, which is also known as the configuration model [25][26][27]. Assume that an object $\beta$ has randomly connected to $k_\beta$ users, and let $T_\beta$ be the total number of remaining half-links originating from these $k_\beta$ users. Each half-link of $\beta$ connects to a user $u$ with probability $k_u/T$, so the expected degree of the user at the end of each link of $\beta$ is $\sum_u k_u^2/T$. Since one half-link of each such user is already occupied by $\beta$, we accordingly have
$$T_\beta = k_\beta \left( \frac{\sum_u k_u^2}{T} - 1 \right).$$
The number of common neighbours between any other object $\alpha$ and $\beta$ is thus determined by the process in which $\alpha$ selects $k_\alpha$ out of the $T - k_\beta$ available half-links. Whenever one of the $T_\beta$ half-links is selected, one common neighbour is generated for $\alpha$ and $\beta$. Therefore, the number of common neighbours between $\alpha$ and $\beta$, $n_{\alpha\beta}$, can be described by a hypergeometric distribution $H(n_{\alpha\beta}; k_\alpha, T_\beta, T - k_\beta)$. For any bipartite network which is sparse enough, i.e. $T \gg k_o, \forall o$, we can approximately use $H(n_{\alpha\beta}; k_\alpha, T_\beta, T)$ to describe the distribution of $n_{\alpha\beta}$. The mean of this hypergeometric distribution is thus the expected number of common neighbours between two random objects, which reads
$$n^{exp}_{\alpha\beta} = \frac{k_\alpha T_\beta}{T} = \frac{k_\alpha k_\beta}{T} \left( \frac{\sum_u k_u^2}{T} - 1 \right). \qquad (2)$$
Note that this consideration does not exclude multi-links, which makes the calculated value slightly higher than the actual theoretical value of the number of common neighbours, especially for objects with very large degrees. However, the expression is valid in the sparse limit, or in the limit $N \to \infty$, $M \to \infty$.
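The half-link argument can be checked numerically by matching half-links at random and counting how often the two objects land on the same user. The sketch below is an illustration under stated assumptions, not part of the original study: it assumes the prediction takes the form $k_\alpha k_\beta (\sum_u k_u^2 - T)/T^2$ that follows from the reasoning above, it counts multi-links as separate common neighbours (as the derivation does), and the function names are hypothetical.

```python
import numpy as np

def predicted_common_neighbours(k_usr, k_a, k_b):
    """Expected common neighbours under the random null model (assumed form)."""
    k_usr = np.asarray(k_usr)
    T = k_usr.sum()
    return k_a * k_b * (np.sum(k_usr**2) - T) / T**2

def simulate_pair_count(k_usr, k_a, k_b, trials=2000, seed=0):
    """Monte Carlo estimate using random matching of half-links."""
    rng = np.random.default_rng(seed)
    k_usr = np.asarray(k_usr)
    n_users = len(k_usr)
    # One entry per user half-link: user u appears k_u times.
    stubs = np.repeat(np.arange(n_users), k_usr)
    total = 0.0
    for _ in range(trials):
        rng.shuffle(stubs)
        # The first k_a shuffled half-links go to object alpha, the next k_b to beta.
        c_a = np.bincount(stubs[:k_a], minlength=n_users)
        c_b = np.bincount(stubs[k_a:k_a + k_b], minlength=n_users)
        # Pairs of half-links (one of alpha, one of beta) ending at the same user.
        total += np.dot(c_a, c_b)
    return total / trials
```

For example, with 100 users of degree 2 and 100 users of degree 6, and two objects of degrees 30 and 40, the assumed formula gives 6.0, and the simulated mean should land close to that value.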
The derivation of the expected number of common neighbours is based on a null model of bipartite networks in which the two kinds of nodes connect to each other randomly, and the resulting expression is thus similar to those obtained in other similar contexts [15,28]. As shown by equation (2), the expected number of common neighbours between two objects $\alpha$ and $\beta$ is linearly correlated with the product of their popularities $k_\alpha k_\beta$. We further use $\mathcal{H}$ to denote the prefactor of the product of the popularities, which, using $T = N \langle k_u \rangle$, can be rewritten as
$$\mathcal{H} = \frac{1}{T} \left( \frac{\sum_u k_u^2}{T} - 1 \right) = \frac{1}{N} \left( \frac{\langle k_u^2 \rangle}{\langle k_u \rangle^2} - \frac{1}{\langle k_u \rangle} \right), \qquad (3)$$
where $\langle \cdot \rangle$ represents the mean value. In this parameter, the component $\langle k_u^2 \rangle / \langle k_u \rangle^2$ is normally referred to as the degree heterogeneity $H$ of a network [18,29]. A large value of $H$ suggests that the network's degree distribution is very heterogeneous. As a consequence, the parameter $\mathcal{H}$ reflects the heterogeneity of the user degree distribution.
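Both heterogeneity quantities can be read directly off the user degrees. The following sketch assumes the prefactor has the form $(\sum_u k_u^2 - T)/T^2$ implied by the half-link argument; the function names and the dense matrix input are illustrative choices, not taken from the paper.

```python
import numpy as np

def heterogeneity(A):
    """Return (H, H_cal) for a binary user-by-object matrix A.

    H     : user degree heterogeneity <k^2> / <k>^2
    H_cal : prefactor of k_alpha * k_beta in the expected number of
            common neighbours (form assumed from the derivation)
    """
    A = np.asarray(A, dtype=float)
    k_usr = A.sum(axis=1)
    T = k_usr.sum()
    H = np.mean(k_usr**2) / np.mean(k_usr)**2
    H_cal = (np.sum(k_usr**2) - T) / T**2
    return H, H_cal

def expected_common_neighbours(A):
    """Matrix of expected common neighbours for every object pair."""
    A = np.asarray(A, dtype=float)
    _, H_cal = heterogeneity(A)
    k_obj = A.sum(axis=0)
    return H_cal * np.outer(k_obj, k_obj)
```

As a small worked case, three users with degrees 2, 3 and 2 give $T = 7$ and $\sum_u k_u^2 = 17$, hence $H = 51/49$ and a prefactor of $10/49$; two objects of degrees 2 and 3 would then be expected to share $60/49$ common neighbours at random.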
With the expected number of common neighbours as the estimate of the random component, one can compare the actual and expected numbers by taking either ratios or differences to obtain a similarity index. Normally, real user-object systems are extremely sparse, and most object pairs have no common neighbours at all. In order to make these object pairs distinguishable from each other, we define the similarity between two objects $\alpha$ and $\beta$ by taking the difference,
$$s_{\alpha\beta} = |\Gamma_\alpha \cap \Gamma_\beta| - \mathcal{H} k_\alpha k_\beta.$$
This expression can be used as an object similarity index, and theoretically the quantified object similarities carry no popularity bias. Considering that the popularity of objects may be an influential factor in recommender systems, this paper explores to what extent the popularity bias should be balanced to achieve good performance. By introducing a free parameter λ, we define a new similarity index, namely the BCN index, as
$$s^{BCN}_{\alpha\beta} = |\Gamma_\alpha \cap \Gamma_\beta| - \lambda \mathcal{H} k_\alpha k_\beta. \qquad (4)$$
One may find from the expression that λ = 0 gives the standard CN index and λ = 1 gives the theoretical similarity with no popularity bias.

Table 1. Statistics of the applied datasets. The MovieLens and Netflix datasets are records of users watching movies, and the Last.FM dataset records which artists users chose to follow from a group of artists. All three datasets can be modelled as user-object bipartite networks and have been widely used in studies and tests of recommendation algorithms. In the table, $N$, $M$ and $T$ represent the numbers of users, objects and total links respectively, $H$ is the user degree heterogeneity calculated as $\langle k_u^2 \rangle / \langle k_u \rangle^2$, and $\mathcal{H}$ is the heterogeneity parameter defined in equation (3).
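A minimal sketch of the BCN index as a function of λ is given below. It assumes that the random component is $\mathcal{H} k_\alpha k_\beta$ with $\mathcal{H} = (\sum_u k_u^2 - T)/T^2$, as implied by the half-link argument; the function name `bcn_similarity` and the dense matrix representation are hypothetical choices for illustration.

```python
import numpy as np

def bcn_similarity(A, lam=1.0):
    """Balanced Common Neighbour similarities for a binary user-by-object matrix.

    lam = 0 reproduces the CN index; lam = 1 removes the whole expected
    (random-driven) component. The prefactor form is an assumption.
    """
    A = np.asarray(A, dtype=float)
    k_usr = A.sum(axis=1)                     # user degrees
    k_obj = A.sum(axis=0)                     # object popularities
    T = A.sum()                               # total number of links
    H_cal = (np.sum(k_usr**2) - T) / T**2     # prefactor of k_alpha * k_beta
    cn = A.T @ A                              # actual common neighbours
    return cn - lam * H_cal * np.outer(k_obj, k_obj)
```

Because the subtraction acts on every pair, object pairs with no common neighbours receive different (negative) scores depending on their popularities, which is what makes them distinguishable in sparse data.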
For the theoretical similarity (the BCN index with λ = 1), the popularity distribution of the top similar objects differs from that of the CN index, as shown in Figure 1. Extremely popular objects are considered most similar to others much less frequently than under the CN index. Instead, the BCN index puts more focus on the middle range: objects that are neither extremely popular nor unpopular are considered similar to others more frequently than the empirical distribution would suggest. However, the difference between the distributions of the BCN and CN indices is relatively small for the Last.FM dataset. This may be due to the fact that the user degree heterogeneity $H$ is almost 1 in Last.FM (Table 1), while those of the MovieLens and Netflix datasets are 3.2 and 3.7 respectively.

Recommendation algorithm and evaluation metrics
A recommender system can normally be described as a user-object bipartite network with a set of users $U = \{u_1, u_2, \ldots, u_N\}$ and a set of objects $O = \{o_1, o_2, \ldots, o_M\}$. The interactions between users and objects can thus be represented by the adjacency matrix $A = \{a_{uo}\}$, where $a_{uo} = 1$ if user $u$ has collected object $o$, and 0 otherwise. The task of the recommender system is therefore to predict a number of new links between unconnected user-object pairs. For a target user $i$, the score of an object $\alpha$ to be connected, $w_{i\alpha}$, can be calculated as
$$w_{i\alpha} = \sum_{o \in O} a_{io} s_{o\alpha},$$
where $s_{o\alpha}$ is the similarity between objects $o$ and $\alpha$ calculated by an arbitrary index. Note that this paper focuses on the proposed BCN index and considers all the other mentioned indices for comparison. As shown by the equation, the score of object $\alpha$ is the sum of its similarities to all of the target user's historical collections. One can rank the scores of all the uncollected objects, and those ranked in the top $L$ positions form the recommendation list for the target user.
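The scoring and ranking step can be sketched as a single matrix product followed by a sort. This is an illustrative sketch: the function name `recommend`, the tie-breaking by object index, and the use of dense arrays are assumptions, and `S` can be any object-by-object similarity matrix.

```python
import numpy as np

def recommend(A, S, L=20):
    """Top-L recommendation lists from similarities S.

    A : binary user-by-object matrix of historical collections
    S : object-by-object similarity matrix, S[o, a] = s_{o a}
    Returns an integer array of shape (n_users, L) with object indices.
    """
    A = np.asarray(A, dtype=float)
    W = A @ S                         # W[i, a] = sum_o a_io * s_oa
    W[A > 0] = -np.inf                # collected objects are never recommended
    # Sort each row by decreasing score; ties are broken by object index.
    order = np.argsort(-W, axis=1, kind="stable")
    return order[:, :L]
```

If a user has fewer than L uncollected objects, the tail of that user's list is padded with already-collected objects, so L should be chosen smaller than the number of candidates.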
To evaluate the performance of the recommendations, one can randomly divide the data (the links of the bipartite network) into a training set and a probe set. One can then compare the recommendation lists generated using the training set with the records in the probe set. This paper considers four widely used metrics to measure two aspects of the recommendation performance, i.e. accuracy and diversity.
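A random division of links into training and probe sets might look as follows. The sketch is hypothetical in its details: the function name `split_links`, the fixed seed and the rounding of the probe size are illustrative choices, while the default fraction of 20% follows the experimental setting reported in the Results section.

```python
import numpy as np

def split_links(A, probe_fraction=0.2, seed=0):
    """Randomly move a fraction of the links of A into a probe matrix.

    Returns (train, probe), two binary matrices with train + probe == A.
    """
    A = np.asarray(A)
    rng = np.random.default_rng(seed)
    users, objects = np.nonzero(A)                 # one (user, object) pair per link
    n_probe = int(round(probe_fraction * len(users)))
    idx = rng.choice(len(users), size=n_probe, replace=False)
    train = A.copy()
    probe = np.zeros_like(A)
    train[users[idx], objects[idx]] = 0            # hide the probe links
    probe[users[idx], objects[idx]] = 1
    return train, probe
```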

Accuracy
For a specific user $i$, if an object recommended by the system is actually one that s/he collected in the probe set, we call it an accurate recommendation. Denoting the number of accurate recommendations for user $i$ as $h_i(L)$, two accuracy metrics can be defined accordingly, i.e. the precision and the recall. The precision considers how many of the $L$ recommendations are accurate, while the recall considers how many of the $k^{probe}_i$ removed records are retrieved. Thus, the precision and recall for the target user $i$ are calculated as $p_i(L) = h_i(L)/L$ and $r_i(L) = h_i(L)/k^{probe}_i$ respectively. Averaging over all the users, the precision and recall of the recommender system read
$$P(L) = \frac{1}{|U|} \sum_{i \in U} p_i(L)$$
and
$$R(L) = \frac{1}{|U|} \sum_{i \in U} r_i(L),$$
where $U$ is the set of users. For both precision and recall, higher values are better, with 0 and 1 as the lower and upper limits respectively.
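Given recommendation lists and a probe matrix, the two accuracy metrics can be computed as sketched below. One detail is an assumption not stated in the text: users with no probe records have an undefined recall, so here they are left out of the recall average while still counting in the precision average. The function name is hypothetical.

```python
import numpy as np

def precision_recall(rec_lists, probe):
    """Average precision P(L) and recall R(L).

    rec_lists : integer array (n_users, L) of recommended object indices
    probe     : binary user-by-object matrix of hidden links
    """
    rec_lists = np.asarray(rec_lists)
    probe = np.asarray(probe)
    L = rec_lists.shape[1]
    # h_i(L): number of recommended objects found in user i's probe records.
    hits = np.take_along_axis(probe, rec_lists, axis=1).sum(axis=1)
    k_probe = probe.sum(axis=1)
    precision = np.mean(hits / L)
    has_probe = k_probe > 0            # assumption: skip users without probe links
    recall = np.mean(hits[has_probe] / k_probe[has_probe])
    return precision, recall
```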

Diversity
It has long been argued that being accurate is not enough for recommender systems [22], because users may want personalised and novel recommendations. Accordingly, many diversity metrics have been developed. One way to measure the diversity is to evaluate the differences between different users' recommendation lists, which is normally referred to as the personalisation. For the recommendation lists of two users $i$ and $j$, the Hamming distance can be calculated as $1 - Q_{ij}/L$, where $Q_{ij}$ is the number of objects that appear in both users' recommendation lists. The personalisation is then defined as the average Hamming distance over all possible user pairs, i.e.
$$\frac{1}{N(N-1)} \sum_{i \neq j} \left( 1 - \frac{Q_{ij}}{L} \right).$$
On the other hand, recommending popular objects to users may be of little value because they can be easily found by the users themselves or through other means such as search engines. Therefore, the novelty of recommendations has also been addressed. While there are several ways of measuring the novelty of recommendations, we follow reference [3] and define the novelty as the average popularity of all the recommended objects, i.e.
$$N(L) = \frac{1}{|U| L} \sum_{i \in U} \sum_{\alpha \in \Omega_i} k_\alpha,$$
where $\Omega_i$ is the set of the $L$ objects that are recommended to user $i$. For the personalisation, higher values suggest that the recommendations are more personalised. For the novelty, lower values are preferred, as they indicate that the recommended objects are novel (unpopular).
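The two diversity metrics can be sketched from the recommendation lists alone. The function names are hypothetical, each list is assumed to contain no repeated object, and the pairwise overlap matrix is built densely, which is only practical for a moderate number of users.

```python
import numpy as np

def personalisation(rec_lists, n_objects):
    """Mean Hamming distance 1 - Q_ij / L over all pairs of distinct users."""
    rec_lists = np.asarray(rec_lists)
    N, L = rec_lists.shape
    R = np.zeros((N, n_objects))
    R[np.arange(N)[:, None], rec_lists] = 1.0    # indicator of recommended objects
    Q = R @ R.T                                  # Q[i, j] = objects shared by lists i and j
    off_diag = ~np.eye(N, dtype=bool)            # exclude i == j
    return np.mean(1.0 - Q[off_diag] / L)

def novelty(rec_lists, k_obj):
    """Average popularity of all recommended objects (lower means more novel)."""
    return np.mean(np.asarray(k_obj)[np.asarray(rec_lists)])
```

For instance, three users with lists {0, 1}, {0, 2} and {1, 2} share exactly one object per pair, so every Hamming distance is 0.5 and the personalisation is 0.5.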

Results
We carry out recommendation experiments based on three empirical datasets, as shown in Table 1. Notably, the user degree heterogeneity $H$ for MovieLens and Netflix is high, while Last.FM has evenly distributed user degrees ($H \approx 1$). For all the recommendation experiments in this paper, we randomly assign 20% of the links to the probe set for each dataset, and take a recommendation list length L = 20. Furthermore, all of the results on the recommendation performance are averaged over 100 independent experiments.
We start by exploring to what extent the popularity bias of object similarity should be balanced (by tuning λ) to achieve better recommendation performance in terms of accuracy and diversity. The results are shown in Figure 2. As discussed earlier, λ = 0 gives the standard CN index, which results in relatively good accuracy but poor diversity. The reason is that CN generally recommends the extremely popular objects, which have a better chance of suiting many users' common interests. However, every user's recommendation list would be dominated by the same popular objects, leading to poor personalisation and novelty. When gradually increasing the parameter λ, the BCN index removes more and more randomly generated common neighbours, as suggested by equation (4). Accordingly, the average popularity of the recommended objects, i.e. the novelty N(20), decreases with the increase of λ, as the not-so-popular objects are evaluated as more similar to others, as shown in Figure 1. The recommendations thus become more and more personalised. Therefore, Figure 2 suggests that removing the random component of the common neighbours can remarkably improve the diversity of the recommendations. On the other hand, the recommendation accuracy is also largely influenced by the balance of the popularity bias. If the popularity bias is balanced slightly, the recommendations become more accurate. However, when applying a large λ, which means totally removing the popularity bias (λ = 1) or even reversing the bias (λ > 1), the recommendation lists would be dominated by only unpopular objects, leading to poor accuracy. Here we take an optimised value λ_o maximising the precision for each dataset. With the optimised value λ_o, both the accuracy and the diversity of the recommendations can be significantly improved in comparison to the algorithm applying the original CN index. The optimised values of λ are 0.33, 0.36 and 1.4 for the MovieLens, Netflix and Last.FM datasets respectively.

Fig. 2. Recommendation performances on three datasets using the BCN similarity index. For the precision and recall, which measure the accuracy of the recommendations, higher is better. As to the diversity, higher is better for the personalisation while lower is better for the novelty. Each column of subplots is the result for one dataset, where the red dashed line represents the optimised parameter λ_o maximising the precision. The results for each dataset are based on 100 independent recommendation experiments with random data partitions.
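The tuning of λ described above amounts to a grid search that maximises the precision averaged over random data partitions. The following compact sketch is self-contained and only illustrative: the function names, the λ grid, the default of 10 runs (the experiments reported here use 100) and the assumed prefactor $(\sum_u k_u^2 - T)/T^2$ are all assumptions rather than the authors' code.

```python
import numpy as np

def precision_at_L(train, probe, lam, L):
    """Precision of BCN-based recommendation for one data partition."""
    k_usr, k_obj, T = train.sum(axis=1), train.sum(axis=0), train.sum()
    H_cal = (np.sum(k_usr**2) - T) / T**2                        # assumed prefactor
    S = train.T @ train - lam * H_cal * np.outer(k_obj, k_obj)   # BCN similarities
    W = train @ S                                                # scores w_{i alpha}
    W[train > 0] = -np.inf                                       # skip collected objects
    top = np.argsort(-W, axis=1, kind="stable")[:, :L]
    hits = np.take_along_axis(probe, top, axis=1).sum(axis=1)
    return hits.mean() / L

def tune_lambda(A, lambdas, L=20, probe_fraction=0.2, runs=10, seed=0):
    """Return (best lambda, mean precision per lambda) over random partitions."""
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    users, objects = np.nonzero(A)
    n_probe = int(round(probe_fraction * len(users)))
    mean_precision = np.zeros(len(lambdas))
    for _ in range(runs):
        idx = rng.choice(len(users), size=n_probe, replace=False)
        train, probe = A.copy(), np.zeros_like(A)
        train[users[idx], objects[idx]] = 0.0
        probe[users[idx], objects[idx]] = 1.0
        for j, lam in enumerate(lambdas):
            mean_precision[j] += precision_at_L(train, probe, lam, L) / runs
    best = lambdas[int(np.argmax(mean_precision))]
    return best, mean_precision
```

Evaluating every λ on the same partition within a run, as done here, keeps the comparison between λ values free of partition-to-partition noise.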
As shown in Table 2, the accuracy metrics (precision P(20) and recall R(20)) are improved by about 10% for the MovieLens and Netflix datasets and by more than 20% [...]

We further compare the recommendation performance of the optimised BCN index with that of the classical similarity measures introduced in the Introduction section, including the CN, LHN, HP, AA, RA, MD and HC indices. As shown in Table 3, the recommendation accuracies (precision P(20) and recall R(20)) of the BCN index are comparable to those of the MD index, which is normally considered one of the most accurate algorithms. The BCN index is more accurate than many of the accuracy-based indices such as CN, AA and RA. In terms of diversity, the BCN index is comparable to the HC index, which is designed to achieve good diversity. While some diversity-based indices such as LHN, HP and HC sacrifice much accuracy to focus on extremely unpopular recommendations, the proposed BCN index can achieve good accuracy and diversity simultaneously, with a reasonable preference on the popularity of recommended objects.

Discussion
Nowadays, a vast amount of valuable niche information is hidden behind the dominance of popular information. While many channels, such as the mass media and search engines [30,31], continuously enhance the dominance of popular information, the recommender system offers a chance to fulfil the service purpose of providing accurate recommendations for users while also enhancing the accessibility of niche information. However, the popularity bias of the similarity quantifications makes the recommendations biased as well, leading to either poor accuracy or poor diversity.
Comparing a given bipartite network with the random network, this paper develops a BCN similarity index for the quantification of object similarities. The experimental results show that the diversity of recommendations can be largely improved by balancing the popularity bias of the CN index. However, the accuracy is sacrificed if all the random-driven common neighbours are removed (λ = 1). To achieve good accuracy and diversity simultaneously, one should optimise the similarity quantification to remove the random-driven common neighbours only to a certain degree. The optimised values λ_o for MovieLens and Netflix are 0.33 and 0.36 respectively, which are less than the theoretical value λ = 1. On the other hand, the optimised value for Last.FM is 1.4, which is larger than the theoretical value. This difference between the Last.FM dataset and the others may arise from the different object degree distributions and the extremely even user degree distribution ($H \approx 1$). Without hub users, objects are less likely to share many common neighbours, so a relatively larger value of λ may be needed to balance the popularity bias. With the optimised value, the accuracy and diversity of the recommendations can be simultaneously improved and are comparable to those of accuracy-based algorithms (such as MD) and diversity-based algorithms (such as HC) respectively.

Fig. 1. Distributions of object popularity (degree) in the datasets (a) MovieLens, (b) Netflix and (c) Last.FM respectively. The introduction and basic statistics of these datasets can be found in Table 1. In each of the subplots, the green circles are the distribution of all the objects, i.e. the basic degree distribution of the object side of the user-object bipartite network. For each object, we calculate its similarities with each of the others using the CN index and the proposed BCN index with parameter λ = 1, and the corresponding curves (black triangles for the CN index and red squares for the BCN index) show the distributions of popularities of the top-20 similar objects.

Table 2. Numerical results of the recommendation algorithm applying the BCN index and its improvements in comparison to the CN index.

Table 3. Comparison of recommendation performances among algorithms applying different similarity indices. The results of the BCN index are based on the optimised value of λ_o, i.e. 0.33, 0.36 and 1.4 for MovieLens, Netflix and Last.FM respectively.