An adjustable re-ranking approach for improving the individual and aggregate diversities of product recommendations

The effectiveness of product recommendations is previously assessed based on recommendation accuracy. Recently, individual diversity and aggregate diversity of product recommendations have been recognized as important dimensions in evaluating the recommendation effectiveness. However, the gain of either diversity is usually at the cost of accuracy and the increase of one diversity does not guarantee a significant improvement in the other. A few attempts have been made to achieve reasonable trade-offs either between recommendation accuracy and individual diversity or between recommendation accuracy and aggregate diversity. Little attention has been paid to obtain a balance among the three important aspects of product recommendations. To address this problem, we propose an adjustable re-ranking approach that incorporates two new ranking criteria for improving both diversities. Three ranking lists are generated to guarantee recommendation accuracy, individual diversity, and aggregate diversity, respectively. The three ranking lists are finally merged with tunable parameters to generate a recommendation list. To evaluate the proposed method, experiments are conducted on a data set obtained from Alibaba. The results show that the proposed method achieves much higher improvements in both diversities than the baseline methods when sacrificing the same amount of recommendation accuracy.


Introduction
Recommender systems are widely used in electronic commerce to help customers find products.A recommender system learns customers' preferences and recommends products that customers are likely to purchase.Studies have consistently shown that product recommendations can increase the sales and profits of electronic commerce [1,2].The effectiveness of a recommender system is usually assessed by the accuracy of recommendation results, i.e. how well a customer's preference on a product is predicted by the system.Accuracy is no doubt of great importance in evaluating the recommendation effectiveness, which can be exemplified by a large pool of studies dedicated to improve the accuracy of recommendation methods [3][4][5][6][7].Recently, recommendation diversity has been recognized as another important aspect for increasing customers' satisfaction and the sales of electronic commerce [8][9][10][11].
There are two types of recommendation diversity, namely individual diversity (ID) and aggregate diversity (AD).ID refers to the diversity or dissimilarity of products in a recommendation list for a customer.Recommender systems are expected to recommend diverse products to meet the needs of a customer.A diverse set of products also leads to more serendipity in recommendations.Therefore, ID is important regarding customers' satisfaction [12,13].AD is the diversity or dissimilarity of products across all recommendation lists for all customers.Low AD indicates that only a small portion of products is recommended to customers, which will exacerbate the long tail phenomenon in online marketplaces.Increasing AD, therefore having a higher possibility to recommend more products in the long tail, has great potential for gaining higher profits since products in the long tail are extremely abundant.From this point of view, AD is significant to ecommerce business [14,15].
Although recommendation accuracy and diversities are important, product recommendations are facing the dilemma between high accuracy and high diversities.The increase of either diversity comes with a decrease in recommendation accuracy [16].In addition, increasing ID does not necessarily lead to a significant improvement in AD, and vice versa [8,15,17].For example, high ID can be obtained by recommending the same diverse set of products to all customers, but AD still remains low.A few methods have been proposed to achieve reasonable trade-offs between recommendation accuracy and diversities, such as ranking-based techniques [18,19] and optimization-based approaches [20,21].However, previous methods focus on maintaining the balance either between accuracy and ID or between accuracy and AD.Little effort has been made to maintain the balance among the three aspects.
In this paper, we propose an adjustable re-ranking approach to address the above problem.Specifically, we first obtain a candidate list using accuracy oriented recommendation methods.Two copies of the candidate list are re-ranked based on product pairs' distance in the concept hierarchy of products and products' degree centrality in the co-purchased network of products, respectively.The distance of product pairs is used to guarantee high ID while the degree centrality is used to guarantee high AD.To obtain reasonable levels of recommendation accuracy, ID, and AD, the three ranking 1 3 An adjustable re-ranking approach for improving the individual… lists are merged with tunable parameters (i.e. the weight assigned to each list).Top N products from the merged list are then recommended to customers.Experiments are conducted on a real-world data set obtained from Alibaba, the largest ecommerce company in China.The experimental results demonstrate that the proposed method can produce different trade-offs between recommendation accuracy and both diversities by changing the length and the weight of the three lists.The results also show that the proposed method achieves much higher improvements in both diversities than several baseline methods when sacrificing the same amount of recommendation accuracy.To summarize, this paper proposes two novel criteria to guarantee ID and AD respectively, i.e. the distance between products in the concept hierarchy and the degree centrality of products in the co-purchased network.A re-ranking approach is designed to obtain adjustable balances among recommendation accuracy, ID, and AD.Experiments are also conducted to demonstrate the effectiveness of the proposed method.
The rest of this paper is organized as follows.The second section reviews accuracy oriented recommendation methods, accuracy and diversity measures, and studies on maintaining the trade-offs between recommendation accuracy and diversities.The third section introduces the proposed re-ranking approach.The fourth section provides details of the experiments and the experimental results.The last section concludes this study with major contributions and possible future research.

Accuracy oriented recommendation methods
Most existing recommendation methods are designed with the goal of high recommendation accuracy.These methods can be classified into three categories, namely collaborative filtering (CF), content-based filtering (CB), and hybrid method [22,23].For a given customer, CF recommends products that his or her like-minded customers have liked before, and CB recommends products with features liked by the customer.Hybrid method can combine both types of methods in different ways, such as switching mechanism, cascade mechanism, and feature augmentation [24].A recommender system generally takes two stages for recommending products to customers.The first one is to estimate customers' preferences on products using certain recommendation algorithms.Based on the estimated preferences, the second stage is to select and recommend products that maximize customers' satisfaction.The proposed adjustable reranking approach is used at the second stage and it can be combined with any accuracy oriented recommendation method.To illustrate and evaluate the proposed re-ranking approach, we combine it with the most popular accuracy oriented method, i.e. the userbased CF [25].Given a set of customers U, a set of products I, and a rating matrix R representing customers' preferences on products, the user-based CF estimates the unknown preference of customer u on product i as follows: where I u, u ′ is the set of products rated by both customer u and u ′ , sim u, u ′ is the similarity between customer u and u ′ , and N(u) is the set of neighbors nearest to customer u.

Accuracy and diversity measures
Different measures have been proposed to evaluate recommendation accuracy and diversities.To select suitable measures for evaluating the proposed method, we review the most commonly used accuracy and diversity measures in this subsection.There are two popular types of accuracy metrics, namely predictive accuracy metrics (such as Mean Absolute Error and Root Mean Square Error) and classification accuracy metrics (such as Precision, Recall, and F measure) [26,27].Predictive accuracy metrics measure how close the predicted preferences are to the true customer preferences.They are used when preferences are scalar value.Classification accuracy metrics measure the percentage of correct predictions about whether or not a customer likes a product.They are used when preferences are binary value.In the context of electronic commerce, customer preferences are usually expressed by binary value, such as viewing or not viewing a web page, clicking or not clicking a link, and buying or not buying a product.ID is measured by the average dissimilarity or distance of all product pairs in one recommendation list [13,27].Previously, the dissimilarity or distance between products is calculated based on the ratings given by customers [28].However, the obtained diversity may not be explicitly sensed by customers because products with similar features could have very different rating patterns.For example, two smart phones with similar design and functions may get totally different ratings from the same set of customers due to the difference of brands.Therefore, we propose to measure ID by the average distance between products in products' concept hierarchy which explicitly shows the diversification of products.The proposed criterion is introduced in Sect.3.
AD is measured either by the number of distinct products recommended to all customers (e.g.Diversity_in_top_N) [19,20] or the distribution of recommended products among all recommendation lists (e.g.Gini_diversity) [14,15].Diversity_ in_top_N favors recommending more products, but it does not consider the distribution of recommended products.Consequently, recommended products could have high Diversity_in_top_N while bearing an unbalanced distribution.For example, recommending every possible product to a different customer and filling the remaining recommendation lists with the same set of products could produce high Diversity_in_top_N.But the product distribution among all recommendation lists is unbalanced.To compensate for this drawback, Gini_diversity is proposed to measure how evenly the products are recommended to all customers.This metric is also An adjustable re-ranking approach for improving the individual… important to product recommendations because it ensures a more even amount of exposure for each of the recommended products.

Accuracy-diversity trade-off
According to the stage where diversities are taken into account, previous methods balancing recommendation accuracy and diversities can be categorized into two types.The first type of methods considers the diversities at the stage of preference estimation.For example, Niemann and Wolpers [29] improved the rating predictions for niche items by considering their jointly consumed items when measuring item similarities.Gan and Jiang [16] identified the adverse effect of item popularity and proposed to adjust user similarity through a power function for more correct rating predictions, and thus for higher recommendation accuracy and AD.Gan [30] balanced recommendation accuracy and AD by calculating discriminant scores for items using a random walk with restart model on a user similarity network.These methods try to identify problems that lead to the lack of recommendation diversities and therefore propose corresponding solutions to enhance diversities.Although such methods can increase diversities at the stage of preference estimation, they provide no control over the extents of recommendation accuracy and diversities.
Another type of methods guarantees recommendation diversities at the product selection stage.Such methods predict customers' preferences on products using previous accuracy oriented methods and then select products based on the predicted preferences and diversity oriented criteria.For instance, several studies [14,21,31] formulated the accuracy-diversity trade-off problem as a multi-objective optimization problem with certain constraints and proposed corresponding solutions to solve the optimization problem.It is possible for optimization methods to adjust the tradeoffs between recommendation accuracy and diversities by setting certain parameters, but it is very difficult to find the optimal solution [32,33].Some studies proposed new ranking mechanisms to generate balanced recommendation lists such as parameterized item-popularity-based ranking approach [19] and graph-based re-ranking approach [15].Compared to other methods, ranking methods are more efficient and can easily adjust the balance between recommendation accuracy and diversities through tunable parameters.However, previous methods aim at achieving a fair trade-off either between recommendation accuracy and ID or between recommendation accuracy and AD.Little attention has been paid to balance the three important aspects at the same time.This paper addresses this problem by proposing an adjustable re-ranking approach which is employed at the stage of product selection.

The proposed re-ranking approach
In this section, we introduce the proposed re-ranking approach for improving the individual and aggregate diversities of product recommendations.
Figure 1 shows the framework of the proposed re-ranking approach and illustrates how to improve ID and AD while keeping acceptable recommendation accuracy.

Fig. 1
The framework of the proposed re-ranking approach 1 3 An adjustable re-ranking approach for improving the individual… The proposed method has three steps.The first one is to predict customers' preferences and generate an accuracy oriented ranking list by accuracy oriented recommendation methods.The second step is to make two copies of the accuracy oriented ranking list and re-rank them to produce ID and AD oriented ranking lists respectively.The final step is to merge the three ranking lists and obtain the top N recommendation list.Details are introduced in the following subsections.

Accuracy oriented ranking list
The generation of an accuracy oriented ranking list is the same as that of previous recommender systems.Customers' preferences on products are first estimated by accuracy oriented recommendation methods.For each customer, the list is then filled with products that are of the highest estimated preferences and have not been consumed by the customer.For the sake of re-ranking efficiency, only top k products with the highest estimated preferences are included in the accuracy oriented ranking list.To leave space for improving ID and AD, k should be larger than the intended number of recommended products.Specifically, given a customer and his or her estimated preferences p, the accuracy oriented ranking list for the customer is as follows: where p I 1 ≥ p I 2 ≥ ⋯ ≥ p I k and k is larger than the intended number of recommended products.

Individual diversity oriented ranking list
To show ID in a way that can be explicitly sensed by customers, we propose to rerank products based on the distance of product pairs in products' concept hierarchy.A concept hierarchy is a tree-like structure that organizes concepts from groups of low-level and specific concepts to several high-level and general concepts, and finally to one universal concept [34,35].Due to the explicit representation of relationships between concepts, concept hierarchies have been widely used in applications that require measuring relationships between objects, such as segmenting customers [35], recommending documents [36], and finding similar patients [37].In the context of electronic commerce, products are related to certain product concepts that are well organized as a concept hierarchy.Figure 2 shows part of the concept hierarchy on clothing products.
Given such a concept hierarchy, the distance between two products can be calculated by a commonly used measure [38,39] defined below: where N 1 and N 2 are the numbers of edges from the lowest-level concepts related to I i and I j to their least common subsumer, i.e. the lowest-level ancestor that includes both concepts.N 3 is the number of edges from the least common subsumer to the root of the concept hierarchy.For example, if I 1 and I 2 are products that belong to category A01020201 and A010201 respectively in the concept hierarchy shown in Fig. 2, then their least common subsumer is category A0102.Therefore, N 1 = 2, N 2 = 1, N 3 = 1, and dis I 1 , There are other types of products, such as digital products and food.Different types of products belong to different sub-hierarchies, and all sub-hierarchies constitute the concept hierarchy that contains all types of products.Therefore, the distance between products of any categories can be calculated within the same concept hierarchy.
Since ID is measured by the average distance between products in a recommendation list, it can be improved by positioning the product pair with the largest distance in the first and second places of the list, and continuously putting in the next place the product that has the largest average distance to the existing products in the list.Therefore, to guarantee certain level of ID, we copy the accuracy oriented ranking list and re-rank the copied list according to the proposed mechanism.Specifically, the ID oriented ranking list can be represented as follows: The positions of I ′ 1 and I ′ 2 are determined by their estimated preferences, i.e. the product with larger estimated preference is ranked first.

Aggregate diversity oriented ranking list
Previous studies have shown that recommendation agents can lower AD of product recommendations since they are usually made based on products' co-purchased patterns [8,40].It is thus reasonable to conjecture that re-ranking products against their An adjustable re-ranking approach for improving the individual… co-purchased relationships can improve AD.Co-purchased relationships among products can be constructed as a co-purchased network in which products are represented by nodes and co-purchasing relationships between products are represented by links between corresponding nodes.In graph and network analysis domain, degree centrality represents the number of links incident upon a node [41] and indicates the co-purchased frequency of products in the context of the co-purchased network [42].We conjecture that ranking against the degree centrality can increase AD.An example of the co-purchased network is presented in Fig. 3.In the example, numbers beside the links refer to the co-purchased frequency between two products.Formally, the degree centrality of products in the co-purchased network is defined as follows: where f i,j refers to the co-purchased frequency of product i and j.To examine our conjecture, we empirically investigated the correlation between the degree centrality and AD by increasing the average degree centrality of products in recommendation lists and observing the resulted AD.The empirical results shown in Fig. 4 verify our conjecture, i.e.Gini_diversity and Diversity_in_top_N are negatively related to the average degree centrality of products in all recommendation lists.Therefore, to improve AD, we copy the accuracy oriented ranking list and re-rank the products in non-descending order of the degree centrality in the co-purchased network.The AD oriented ranking list is then represented as follows: where degree_centrality I �� i ≤ degree_centrality I �� i+1 .( 6) 3 An example of the co-purchased network

Fig. 4
The correlation between the degree centrality and AD

3
An adjustable re-ranking approach for improving the individual…

Merging of ranking lists
After obtaining the three ranking lists, the balance among recommendation accuracy, ID, and AD can be achieved by merging the three lists.Rank aggregation is a well-studied area and many aggregate functions have been proposed to generate an aggregated ranking list from multiple ranking lists.Among existing aggregate functions, Borda's method has gained wide popularity because of its effectiveness and simplicity [43].Therefore, Borda's method is employed to merge the three ranking lists in this paper.Borda's method works by assigning a score to each position of a ranking list and sorting products according to their total scores calculated based on their positions in different ranking lists.In this study, the score assigned to a position is set to the ranking of the position.Consequently, the higher a product scores, the lower it ranks.Traditionally, all ranking lists are treated equally in Borda's method.
To make the balance among recommendation accuracy, ID, and AD adjustable, we adapt Borda's method by assigning different weight to the three ranking lists.Given an item and its rankings in the three lists, the integrated ranking score obtained by the adapted Borada's method is as follows: where Ranking List i is the ranking of item i in the corresponding ranking list, , , and are weight assigned to corresponding ranking lists, 0 ≤ , , ≤ 1 and + + = 1 .After merging the three ranking lists, the final recommendation list can be represented as follows: where Score I ��� i ≤ Score I ��� i+1 .

Data set
To evaluate the proposed re-ranking approach, we conduct experiments on a realworld data set obtained from Alibaba, the largest ecommerce company in China.
The original data set 1 contains more than one hundred million consumer-product interaction records.Without compromising the evaluation, we randomly select 50,000 transaction records from the original data set.Each transaction record contains one customer and at least one product.Each product belongs to at least one product category.A product category can range from the most general level to the finest level in the concept hierarchy of products.The statistical description of the selected data set is provided in Table

Experimental design
The selected data set is randomly divided into training and testing sets with an 80:20 split.Specifically, 40,000 transaction records are used to tune the parameter of and then used as the input of the accuracy-oriented recommendation method (i.e., the user-based CF).The remaining 10,000 transaction records are used to test the recommendation results re-ranked by the proposed method as well as the baseline methods.In our experiments, customers' preferences are binary value (i.e.purchasing or not purchasing a product).Therefore, we use the following classification accuracy metric to measure the accuracy of recommendation results: where Precision = #|RS∩TS| #|RS| , Recall = #|RS∩TS| #|TS| , RS is the set of products recommended to customers, and TS represents test set, the set of products that customers actually like.Precision measures the percentage of products in the recommendation set that are actually liked by customers, Recall measures that of products in the test set that are recommended to customers, and F is the harmonic mean of both Precision and Recall [23,26].
To measure ID, Individual_Diversity metric defined below is used because it explicitly shows the diversification of products.
where RS is a recommendation list, N is the number of products in the recommendation list, and dis(i, j) is the dissimilarity or distance between product i and j as defined by Eq. ( 4).
In terms of measuring AD, both Diversity_in_top_N and Gini_diversity are used in our experiments.The two metrics emphasize different but important aspects of AD as discussed in Sect.2.2.They are defined as follows: 1 3 An adjustable re-ranking approach for improving the individual… where N is the number of products recommended to one customer, RS N (u) is the set of products recommended to customer u, i refers to the index of a product in the product set I, rec i is the number of customers to whom product i is recommended, products in I are arranged in non-descending order of rec i , and total indicates the total number of recommendations across all customers.For a target customer, the user-based CF is applied to estimate his or her preferences on products.Ranking approaches are then used to rank and thus recommend suitable products to the customer.Three ranking approaches are used as our baselines, namely the standard ranking approach [25], the enhancing ID approach [44], and the enhancing AD approach [19].The standard ranking approach arranges products in descending order of the estimated preferences and then recommends the top N ranked products to the target customer.
The enhancing ID approach first selects a candidate list S which contains k candidate products with the highest estimated preferences.The candidate products are then re-ranked according to the re-estimated preferences defined below: where is ranging from 0 to 1 and controls the balance between recommendation accuracy and ID, R * (u, i) is the estimated preference of user u on product i, sim(i, j) is the similarity between product i and j, sim a (i, j) is the similarity between product i and j on attribute a, A is the collection of product attributes which are defined as product categories in this paper, and w u(a) is the importance of attribute a to user u and is decided based on the number of products previously purchased by the user and his or her propensity towards diversities on attribute a.The latter one is measured through Shannon's entropy.Further calculation details can be found in [44].
Similarly, the enhancing AD approach selects candidate products whose estimated preferences are higher than a predefined threshold T H .The selected candidate products are then re-ranked based on the following mechanism: where u = max i∈I u( T R) rank random (i) and I u T R = i ∈ I|R * (u, i) ≥ T R .Simply put, candidate products with estimated preferences higher than the re-ranking threshold T R are re-ranked randomly, while products that are lower than T R are ranked using the (13) standard ranking approach.Besides, all candidate products which are above T R get ranked ahead of all candidate products which are below T R .This is ensured by u .

Experimental results and analysis
To show the ability of the proposed method in balancing recommendation accuracy with ID and AD, we conduct a series of recommendations by varying the values of , , and .Figure 5 shows the resulting recommendation accuracy, ID, and AD when , , and are ranging from 0 to 1.In the experiment, the length of the recommendation list (i.e.N) is fixed at 5, and the length of the accuracy oriented ranking list (i.e.k) is fixed at 10, 20, 30, 40, and 50 respectively.And is changing from 1 to 0 with step decrement of 0.1 while = = (1 − )∕2.
There are many other possible combinations of , , and , but we are not interested in enumerating all of them.The combinations in Fig. 5 are used to illustrate that the proposed method can be adjusted to obtain different trade-offs between recommendation accuracy and diversities.The results also demonstrate that varying k results in different trade-offs.For a specific combination of , , and , higher k leads to higher diversities and lower accuracy.Thus, the proposed method allows electronic commerce platforms to achieve expected trade-offs between recommendation accuracy and diversities by setting suitable values for , , , and k.
We also calculate the changes of F, Iidividual_diversity, Gini_diversity, and Diversity_in_top_N respectively when decreasing from 1 to 0 and setting = = (1 − )∕2 .Table 2 presents the results which show that the proposed method can significantly improve ID and AD at the cost of only a small decrease in recommendation accuracy.For example, the proposed method achieves 13.00%, 12.56%, and 16.43% increases in Individual_diversity, Gini_diveisity, and Diversity_in_top_N respectively with only a 4.08% decrease in F by changing from 1 to 0.8 when k = 10.
To show the superiority of the proposed method further, we compare the proposed method with the enhancing ID approach and the enhancing AD approach in terms of the changes in F, Individual_diversity, Gini_diveisity, and Diversity_in_top_N.In this experiment, the length of the recommendation list (i.e.N) is fixed at 5, the length of the accuracy oriented ranking list (i.e.k) is fixed at 20, and the recommendation performance of the standard ranking approach is used as a comparison standard.The parameters of the enhancing ID approach, the enhancing AD approach, and the proposed method are tuned to improve recommendation diversities at the cost of certain decreases in recommendation accuracy.The extents of the changes compared to the standard then can be calculated.The results are shown in Fig. 6.They demonstrate that the proposed method achieves higher improvements in Individual_diversity, Gini_ diveisity, and Diversity_in_top_N than the enhancing ID approach and the enhancing AD approach when sacrificing the same amount of F.

3
An adjustable re-ranking approach for improving the individual… Fig. 5 The recommendation performance of the proposed method

Conclusions
In this research, we propose an adjustable re-ranking approach for improving the individual and aggregate diversities of product recommendations.This approach balances recommendation accuracy with both diversities at the product selection stage.To improve ID and AD while keeping acceptable accuracy, three ranking lists (i.e. the 1 3 An adjustable re-ranking approach for improving the individual…

Fig. 6
The trade-offs between recommendation accuracy and diversities accuracy, ID, and AD oriented ranking lists, respectively) are produced.The accuracy oriented ranking list consists of top k products with the highest ratings estimated by accuracy oriented recommendation methods (e.g., the user-based CF).The ID oriented ranking list copies the accuracy oriented ranking list and re-ranks the copied list by maximizing ID of the sub list with top N products.Similarly, the AD oriented ranking list copies the accuracy oriented ranking list and arranges the products in non-descending order of the degree centrality in the co-purchased network of products.The three lists are merged by a rank aggregation method, i.e.Borda's method.Experiments are conducted on a real-world data set.The experimental results demonstrate that the proposed method can achieve different trade-offs between recommendation accuracy and diversities by changing the length and the weight of the three lists.Compared to the baseline methods, the proposed method can generate much higher improvements in both diversities when sacrificing the same amount of recommendation accuracy.The contribution of this research is twofold.First, two novel criteria are proposed to ensure ID and AD of product recommendations respectively, i.e. the distance between products in the concept hierarchy and the degree centrality of products in the co-purchased network.Second, a re-ranking approach is designed to balance recommendation accuracy with both diversities for product recommendations.Experimental evaluation demonstrates the advantages of the proposed approach.With the proposed method, ecommerce platforms can better balance recommendation accuracy with ID and AD.The proposed method may be adapted for recommendations in other contexts, such as expert recommendation and literature recommendation.In terms of expert recommendation, a researcher looking for cross-discipline collaborations may expect recommended experts with different discipline backgrounds, and a platform may want to balance the collaboration workload of experts.In this situation, a discipline classification tree can be used to ensure ID and a collaboration network can be used to guarantee AD.In the context of literature recommendation, an author writing a research paper may expect recommended citations covering different topics of the paper, and a platform may want to disseminate as many publications as possible.In this case, a research domain classification tree can be used to ensure ID and a citation network can be used to guarantee AD.Future research will investigate the generalizability of the proposed method in these recommendation contexts.

( 5 )Fig. 2
Fig. 2 Part of the concept hierarchy on clothing products

Table 1
The statistical description of the selected data set

Table 2
The changes of the performance of the proposed method