Introduction

To overcome the information overload problem, recommender systems use specific characteristics of users and items to filter the content users need and generate personalized recommendations. Trust is receiving considerable attention in the academic community and the e-commerce industry and plays a central role in exchange relationships involving unknown risk [1, 2]. Trust has been observed to correlate significantly with user preference similarity in rating systems [3], especially in the e-commerce industry [4]. Thus, most existing work on trust has studied it in the context of rating prediction [5, 6], although user trust may be violated if a recommender system deliberately reduces its accuracy. Users with similar preferences are more likely to trust each other, demonstrating homophily [7] in the trust network. Previous work suggests that a model combining rating similarity with trust networks can achieve better performance than the trust propagation model [8, 9].

However, users' trust relationships usually follow a power-law distribution, and a long tail of users have only a few trusted or distrusted users [10]. Furthermore, recent state-of-the-art methods have relied on learning ever larger and more complex factorization models, often using nontrivial combinations of multiple submodels [11, 12].

Thus, in this paper, we improve the trust matrix calculation algorithm with the genre trust degree (GTD) and combine the GTD with user similarities. The main idea of our model is the synthesis of user credibility and the trust degree between users; we increase the weight of high-credibility users to make better recommendations. In summary, we make the following main contributions: first, we propose a new method that detects shilling attackers with higher accuracy and ensures the stability of the recommendation results; second, we propose an improved trust degree calculation algorithm to resist shilling attacks.

This paper is structured as follows. In section "Related work", we review related work and explain how our study differs from it. Section "Trust degree in recommender system" briefly describes the definition of trust. We then provide a detailed description of the GTD model in section "Genre trust degree model for defending shilling attacks". Section "Experimental setup" describes our experiments on real data, and section "Results analysis" presents and analyzes the results. Finally, in section "Conclusions", we conclude this paper and outline future research directions.

Related work

Recommender systems have taken on an impressive role in our daily lives, to the extent that profile-injection attacks aimed at misleading recommendation results appear continuously. The shilling attack [13] is one regularly used type of profile-injection attack, and research on shilling attacks has made significant advances in the last few years. Shilling attacks aim to manipulate item ratings for the attackers' own purposes. From the intention perspective, early work classified shilling attacks into two basic types: the push attack and the "nuke" attack. The general form of a shilling attack is depicted in Table 1. "Item" refers to the items evaluated by users in the system, such as movies, books, merchandise, and so on.

Table 1 General form of shilling attack profile

A shilling attack profile generally consists of ratings of selected items, filler items, unrated items, and the target item. Various attack models have been discovered and appropriate metrics developed to measure the effectiveness of an attack [14]. Different shilling attack models correspond to different methods of choosing the selected items and filler items. The commonly used shilling attack models are the random attack, average attack, bandwagon attack, segmented attack, and sampling attack:

  • Random attack: The ratings of the filler items are chosen randomly and the target item is assigned a pre-specified rating. The selected item set is empty.

  • Average attack: An attack model with a better attack effect than the random attack. The selected item set is still empty, while each filler item is rated with that item's individual mean rather than the global mean.

  • Bandwagon attack: Attackers pick a set of frequently rated items in the system as the selected items. The filler items are chosen randomly and rated with the mean across the whole system.

  • Segmented attack: To avoid being detected, attackers give high ratings to items similar to the target items.

  • Sampling attack: This attack model requires more system information, as it copies existing user profiles. Attackers imitate real users in selecting and rating items, while the target item is assigned the highest rating.
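To make these attack models concrete, the following sketch (our illustration, not code from the paper) generates push-attack profiles for the random, average, and bandwagon models on a toy rating matrix. The function name, the 1-5 rating scale, and the use of a normal distribution around the global mean for random fillers are assumptions.

```python
import numpy as np

def make_attack_profiles(ratings, target_item, model="random", n_attackers=5,
                         filler_size=20, popular_items=None, r_max=5.0, r_min=1.0):
    """Sketch of push-attack profile generation; 0 in `ratings` means "unrated"."""
    n_items = ratings.shape[1]
    rated = ratings > 0
    global_mean = ratings[rated].mean()
    item_means = np.where(rated.sum(axis=0) > 0,
                          ratings.sum(axis=0) / np.maximum(rated.sum(axis=0), 1),
                          global_mean)
    popular = list(popular_items) if popular_items is not None else []
    rng = np.random.default_rng(0)
    profiles = np.zeros((n_attackers, n_items))

    for a in range(n_attackers):
        candidates = [i for i in range(n_items) if i != target_item and i not in popular]
        fillers = rng.choice(candidates, size=filler_size, replace=False)
        if model == "random":
            # random attack: filler ratings drawn around the global mean
            profiles[a, fillers] = np.clip(rng.normal(global_mean, 1.0, filler_size),
                                           r_min, r_max)
        elif model == "average":
            # average attack: each filler item gets its own mean rating
            profiles[a, fillers] = item_means[fillers]
        elif model == "bandwagon":
            # bandwagon attack: popular (selected) items get r_max, fillers the global mean
            profiles[a, fillers] = global_mean
            profiles[a, popular] = r_max
        # push attack: the target item is assigned the maximum rating
        profiles[a, target_item] = r_max
    return profiles
```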

The detection of shilling attacks on recommender systems has been studied by many researchers, typically by using clustering methods to filter out shilling attackers or by analyzing normal users' behavior patterns to detect them. Alostad et al. [15] presented an improved support vector machine (SVM) and Gaussian mixture model (GMM)-based shilling attack detection algorithm (SVM-GMM). Samaiya et al. [16] combined two classification algorithms, PCA and SVM, to improve the defense against shilling attacks. Lee et al. [17] showed that users' evaluations are affected by other users and that users connected by a trust network exhibit significantly higher similarity on items. Ardissono et al. [18] proposed a compositional recommender system based on multi-faceted trust using social links and global feedback about users. Jaehoon et al. [19] proposed a new trust recommendation algorithm, TCRec. Pan et al. [20] presented an adaptive learning method that maps trust values to the [0, 1] interval and uses "directionality" to distinguish users.

However, these methods share an obvious limitation: they require additional side information, such as user types and item categories, and they still suffer from the cold-start problem. Compared with the existing methods, the model proposed in this work can produce more efficient recommendations using the limited information that is already available. Additionally, user credibility is a kind of real user interaction data that can be exploited; we integrate it into the system model not only to better eliminate the interference of attacking users but also to mitigate the restriction of data sparsity. At the same time, the resulting recommendations better match user preferences to a certain extent.

Trust degree in recommender system

In this section, we first give the definition and properties of trust in recommender systems. Then, we review several frequently mentioned trust calculation models.

Definition of trust

The formation of a trust relationship requires interaction between real users. Because attack users lack such real interaction behaviors, they are rarely incorporated into trust relationships; therefore, the impact of attacking users on the recommender system is reduced.

Trust is generally considered to be a complex concept; two common definitions, reliability trust [21] and decision trust [22], are the most widely used. However, in recommender systems, trust is mostly defined as being correlated with similar preferences toward items commonly rated by two users [23, 24]. Guo et al. [25] gave a more unambiguous definition: trust is one's belief in the ability of others to provide valuable ratings.

As research has deepened, trust measurement methods have been divided into global trust and local trust [26]. It has been found that local trust methods perform better in resisting shilling attacks [27]; furthermore, researchers have summarized some distinct properties of trust, described as follows [28, 29]:

  • Asymmetry/subjectivity: A user may hold different opinions toward different target users, and different users may also hold different opinions of, or trust degrees toward, the same user. Thus, we can have trust(u, v) ≠ trust(v, u).

  • Transitivity: If user u trusts user v, and user v trusts user w, we believe that user u trusts user w to some degree.

  • Dynamicity/temporality: Trust is built on users' previous interactions and changes over time.

  • Context dependence: Trust is context-specific, which means that a user who is trustworthy in art may not be helpful in computer science.

Trust calculation methods

Different trust calculation models use different trust metrics to derive user trust from user ratings, and most of these methods are based on the assumption that users with similar ratings tend to trust each other. Researchers often regard user similarity as an inferred trust value. One representative method was proposed by Papagelis et al. [30], who take the Pearson correlation coefficient as the trust calculation method, as follows:

$$ s_{u,v} = \frac{{\sum\nolimits_{i} {(r_{u,i} - \overline{r}_{u} )(r_{v,i} - \overline{r}_{v} )} }}{{\sqrt {\sum\nolimits_{i} {(r_{u,i} - \overline{r}_{u} )^{2} } } \sqrt {\sum\nolimits_{i} {(r_{v,i} - \overline{r}_{v} )^{2} } } }}, $$
(1)

where su,v is the similarity of useru and userv, and the trust value is set to this similarity. Some researchers set a similarity threshold to filter trustworthy users. Although one characteristic of similarity-based trust calculation methods is symmetry, Sotos et al. [31] showed that the Pearson correlation coefficient is not transitive unless users are highly correlated. Hwang and Chen [32] obtain the trust score by averaging the prediction error on co-rated items, as follows:

$$ t_{u,v} = \frac{1}{{\left| {I_{u,v} } \right|}}\sum\limits_{{i \in I_{u,v} }} {\left( {1 - \frac{{\left| {\overline{{r_{u} }} + (r_{v,i} - \overline{{r_{v} }} ) - r_{u,i} } \right|}}{{r_{\max } }}} \right)} . $$
(2)
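As an illustration, here is a minimal sketch of the two similarity-based trust metrics in Eqs. (1) and (2); the helper names are ours, ratings of 0 mark unrated items, and we interpret each user's mean as the mean over all of that user's ratings.

```python
import numpy as np

def pearson_trust(r_u, r_v):
    """Eq. (1): Pearson correlation over the items co-rated by u and v."""
    co = (r_u > 0) & (r_v > 0)
    if co.sum() < 2:
        return 0.0                       # too little overlap to correlate
    du = r_u[co] - r_u[r_u > 0].mean()   # deviation from u's own mean rating
    dv = r_v[co] - r_v[r_v > 0].mean()
    denom = np.sqrt((du ** 2).sum()) * np.sqrt((dv ** 2).sum())
    return float((du * dv).sum() / denom) if denom > 0 else 0.0

def hwang_chen_trust(r_u, r_v, r_max=5.0):
    """Eq. (2): one minus the normalized prediction error, averaged over co-rated items."""
    co = (r_u > 0) & (r_v > 0)
    if not co.any():
        return 0.0
    pred = r_u[r_u > 0].mean() + (r_v[co] - r_v[r_v > 0].mean())
    return float(np.mean(1.0 - np.abs(pred - r_u[co]) / r_max))
```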

However, the current trust metrics are not satisfactory for producing distinguishable trust lists and may be further limited by the similarity measures used or the thresholds required. Moreover, the existing trust metrics are generally based on symmetric measures, such as similarity and error measures; thus, they can all be treated as similarity-based trust metrics. The model described in this paper improves these trust calculation methods by distinguishing trust values according to the ratings in different genres and by providing a more reasonable user similarity calculation algorithm.

Genre trust degree model for defending shilling attacks

In this section, we introduce our method, the GTD model, to resist shilling attacks, including our improved recommendation method. After introducing the genre trust degree, we present our proposed model with its optimization method. Before delving into the GTD model, we define the notation used in this paper. We denote U, I, R, and K as the sets of all users, items, ratings, and item-genres in the recommender system, respectively. For simplicity, suppose our experimental dataset has an m × n rating matrix, and we keep symbols u, v for users and i, j for items; thus, ru,i represents the rating given by useru to itemi, Iu is the set of items rated by useru, and R = {ru,i} denotes the m users' ratings on items of k genres. As users record their trust relationships in trust lists, we let tu,v be the trustworthiness of userv toward useru, and T = {tu,v} represents the trust network.
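For readability of the code sketches in the rest of the paper, we assume the following toy representation of this notation (a dense NumPy rating matrix plus Python dicts for genres and explicit trust statements); this is purely illustrative and not the authors' implementation.

```python
import numpy as np

m, n, K = 4, 6, 2                                  # toy numbers of users, items, genres
R = np.zeros((m, n))                               # R[u, i] = r_{u,i}; 0 means "unrated"
R[0, [0, 1, 3]] = [5, 4, 2]                        # a few example ratings
R[1, [0, 2, 3]] = [4, 5, 1]

genre_of = {0: 0, 1: 0, 2: 1, 3: 1, 4: 0, 5: 1}    # Gen(i) = k for every item i

explicit_trust = {(0, 1): True, (1, 3): False}     # stated trust/distrust links t_{u,v}
```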

Genre trust degree

Many trust metrics have been proposed to calculate implicit trust from user ratings, mainly based on the intuition that users whose ratings are close to each other tend to trust each other [33]. In our paper, we propose that trust should be useful not only for generating item predictions but also for suggesting reliable users. In contrast to traditional user trust calculation methods, the trust degree in the GTD model has two parts: the traditional trust value between users and the credibility of users. In most proposed trust degree computing methods, if useru trusts userv and userw, then when giving recommendations to useru, the influence of userv is identical to that of userw, because they have the same trust degree. Trust theory indicates that "context dependence" is one main characteristic of trust: users have different trust degrees in different fields, whereas traditional trust degree computing methods leave this out of consideration. In this paper, we suppose that users have their own preferences, so users understand their familiar areas better, and their scores in those areas provide a better reference when making recommendations to other, new users.

As users are more indicative in their familiar or favorite areas, we introduce "user credibility" in our model to distinguish the reliabilities of different trustees in different areas. When calculating users' trust degrees, instead of assigning a single fixed value, we compute trust degrees according to the genres of the items. First, we briefly describe some essential notation used in our method. We use the review ratings to calculate user credibility, and reRatingv,s,r stands for the review rating given by users (the rater) to the reviewr written by userv. Gen(r) is the genre of the item to which reviewr refers. The sum of the review ratings of the reviews that userv provides for items belonging to genre k is given by:

$$ {\text{ReRating}}(v,k) = \sum\limits_{S,Gen(r) = k} {{\text{reRating}}_{v,s,r} } . $$
(3)

Based on Eq. (3), we can obtain the credibility of userv for the items of genre k, where N is the total number of item-genres:

$$ {\text{Credibility}}_{v,k} = \frac{{{\text{ReRating}}(v,k)}}{{\sum\limits_{k} {{\text{ReRating}}(v,k)} }} \times N. $$
(4)

Thus, the genre trust degree of trustoru to trusteev on genre k is as follows:

$$ {\text{GenreTrust}}_{u,v,k} = {\text{TrustValue}}_{u,v} \times {\text{Credibility}}_{v,k} {\text{ = TrustValue}}_{u,v} \frac{{\sum\nolimits_{S,Gen(r) = k} {{\text{reRating}}_{v,s,r} } }}{{\sum\nolimits_{k} {{\text{ReRating}}(v,k)} }}N, $$
(5)

where TrustValueu,v is the total trust value between trustoru and trusteev, which is usually a single-fixed value and is defined as follows:

$$ {\text{TrustValue}}_{u,v} = \left\{ {\begin{array}{*{20}l} 1, & {{\text{trust}}} \\ {\frac{{\sum\nolimits_{i} {(r_{u,i} - \overline{r}_{u} )(r_{v,i} - \overline{r}_{v} )} }}{{\sqrt {\sum\nolimits_{i} {(r_{u,i} - \overline{r}_{u} )^{2} } } \sqrt {\sum\nolimits_{i} {(r_{v,i} - \overline{r}_{v} )^{2} } } }},} & {{\text{unknown}}} \\ 0, & {{\text{distrust}}} \\ \end{array} } \right.. $$
(6)

We combine the traditional user trust value in Eq. (1) with user credibility to determine the genre trust degree, and this fully demonstrates the “context dependence” characteristic of trust.
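A hedged sketch of Eqs. (3)-(6) follows: per-genre credibility is computed from review ratings, and the genre trust degree combines it with the trust value. The tuple layout of the review ratings and the helper names are our assumptions, and the Pearson fallback reuses the pearson_trust helper sketched earlier.

```python
import numpy as np
from collections import defaultdict

def credibility(review_ratings, n_genres):
    """Eqs. (3)-(4): per-genre credibility of every reviewer v.

    review_ratings: iterable of (rater_s, reviewer_v, genre_k, reRating) tuples.
    """
    rr = defaultdict(lambda: np.zeros(n_genres))    # ReRating(v, k)
    for _, v, k, score in review_ratings:
        rr[v][k] += score
    cred = {}
    for v, per_genre in rr.items():
        total = per_genre.sum()
        cred[v] = per_genre / total * n_genres if total > 0 else per_genre
    return cred                                     # cred[v][k] = Credibility_{v,k}

def trust_value(u, v, explicit_trust, R):
    """Eq. (6): 1/0 for stated trust/distrust, Pearson similarity otherwise."""
    if (u, v) in explicit_trust:
        return 1.0 if explicit_trust[(u, v)] else 0.0
    return pearson_trust(R[u], R[v])                # Eq. (1), sketched earlier

def genre_trust(u, v, k, explicit_trust, R, cred):
    """Eq. (5): GenreTrust_{u,v,k} = TrustValue_{u,v} * Credibility_{v,k}."""
    c = cred.get(v)
    return 0.0 if c is None else trust_value(u, v, explicit_trust, R) * c[k]
```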

Genre trust model

As we know, users are more inclined to trust those who have similar interests or behavior patterns. Therefore, some researchers have introduced the notion of trust into collaborative filtering (CF) recommender systems to improve robustness against malicious attacks, especially shilling attacks. Because a shilling attacker is a fake user, it can only disguise its own rating behavior but lacks interaction data with real users. Therefore, combining trust with the CF algorithm can effectively resist the interference of shilling attacks.

CF recommendation algorithms are generally divided into two types: item-based and user-based collaborative filtering. The common steps by which CF algorithms generate recommendations for users are shown in Fig. 1:

Fig. 1 Steps of collaborative filtering (CF) recommendation algorithms

Based on traditional CF recommendation algorithms, the GTD model uses the genre trust degree to improve the user similarity calculation algorithm. The common steps of our GTD model are shown in Fig. 2:

Fig. 2 Steps of the genre trust degree (GTD) recommendation model

The algorithm flowchart of the GTD model is shown in Fig. 3:

  1. According to Eq. (5), we calculate the genre trust degrees between each pair of users for items belonging to different genres:

    $$ GT = \{ {\text{GenreTrust}}_{u,v,k} ,u,v \in U,k \in K\} ,\;GT \in R^{m \times m \times k} . $$
    (7)
  2. We calculate all similarities between users and then sort each user’s user-similarity set, Simu = {Similarityu,v, v ∈ U}, in descending order and select the top s similar users as the corresponding user’s similar-user set; that is, SimUseru.

Fig. 3 Algorithm flowchart of the genre trust degree (GTD) recommendation model

There are many ways to calculate user similarity. In the experimental part of this article, we compare the cosine correlation coefficient, the Pearson correlation coefficient, and the trust correlation coefficient as user similarity calculation methods.

  3. We take the ratings of the items rated by each userv in the set SimUseru as the similar users’ rating lists for useru:

    $$ SR = \{ {\text{SimRating}}_{u} \} = \{ r_{v,i} {,}v \in {\text{SimUser}}_{u} \} {.} $$
    (8)
  4. Then, we use the corresponding genre trust degrees of useru to weight these ratings and obtain the genre trust rating matrix GR, which is given by:

    $$ {\text{GenreTrustRating}}_{u,i} = {\text{GenreTrust}}_{u,v,k} \times {\text{SimRating}}_{v,i} ,{\text{where}}\;{\text{Gen}}(i) = k. $$
    (9)
  5. For each itemi, we add up its genre trust ratings as follows, sort all items’ genre trust ratings in descending order, and select the top r items as the final items we recommend to the corresponding useru:

    $$ GR_{i} = \sum\nolimits_{{u \in {\text{SimUser}}_{u} }} {GR_{u,i} } . $$
    (10)

Thus, the predicted score of useru for item i in the system can be calculated as follows:

$$ \begin{aligned} p_{u,i} &= \sum\nolimits_{{v \in {\text{Sim}}_{u} ,{\text{Gen}}(i) = k}} {{\text{GenreTrustRating}}_{v,i} } = \sum\nolimits_{{v \in {\text{Sim}}_{u} }} {{\text{TrustValue}}_{u,v} \frac{{\sum\nolimits_{{S,{\text{Gen}}(r) = k}} {{\text{reRating}}_{v,s,r} } }}{{\sum\nolimits_{k} {{\text{ReRating}}(v,k)} }}N} \\ &= \sum\nolimits_{{v \in {\text{Sim}}_{u} }} {\frac{{\sum\nolimits_{i} {(r_{u,i} - \overline{r}_{u} )(r_{v,i} - \overline{r}_{v} )} }}{{\sqrt {\sum\nolimits_{i} {(r_{u,i} - \overline{r}_{u} )^{2} } } \sqrt {\sum\nolimits_{i} {(r_{v,i} - \overline{r}_{v} )^{2} } } }}\frac{{\sum\nolimits_{{S,{\text{Gen}}(r) = k}} {{\text{reRating}}_{v,s,r} } }}{{\sum\nolimits_{k} {{\text{ReRating}}(v,k)} }}N} . \\ \end{aligned} $$
(11)
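Putting the steps together, a sketch of the prediction and top-r selection implied by Eqs. (7)-(11): the ratings of the top-s similar users are weighted by their genre trust degrees and aggregated per item. The parameter names top_s and top_r and the reuse of the earlier genre_trust helper are assumptions, not the authors' implementation.

```python
import numpy as np

def gtd_recommend(u, R, genre_of, explicit_trust, cred, sim_fn, top_s=10, top_r=5):
    """Steps 1-5 / Eq. (11): genre-trust-weighted aggregation of similar users' ratings."""
    m, n = R.shape
    # step 2: the top-s users most similar to u
    sims = [(v, sim_fn(R[u], R[v])) for v in range(m) if v != u]
    sim_users = [v for v, _ in sorted(sims, key=lambda x: x[1], reverse=True)[:top_s]]

    scores = np.zeros(n)                             # GR_i, accumulated per item
    for v in sim_users:
        for i in np.nonzero(R[v])[0]:                # step 3: items rated by v
            k = genre_of[i]
            gt = genre_trust(u, v, k, explicit_trust, R, cred)   # step 1 / Eq. (5)
            scores[i] += gt * R[v, i]                # steps 4-5 / Eqs. (9)-(10)
    # step 5: recommend the top-r items that u has not rated yet
    ranked = [i for i in np.argsort(scores)[::-1] if R[u, i] == 0]
    return ranked[:top_r], scores
```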

The GTD model combines the user trust value with user credibility, providing a better recommendation method for resisting different types of shilling attacks while recommending items to users. Our GTD model outperforms state-of-the-art algorithms in the following aspects: (1) combining user credibility with the trust value is more reasonable and comprehensive than considering only the one-way trust value given by users; (2) the trust degree can compensate for the weaknesses of recommender systems under shilling attacks; (3) the method weights item ratings with trust degrees, that is, it combines the user-based recommendation algorithm with an item-based one, which increases the diversity of the results and makes the system more robust.

Experimental setup

In this section, we first introduce the details of the Ciao dataset used in our experiments and then describe our experimental procedures and the results of applying our GTD model to recommend items to users under shilling attacks.

Dataset analysis

The Ciao dataset presents trust links directly in a trust file as trustor, trustee, and trust value, and it contains 17,615 users and 16,121 movies. In the Ciao dataset, users not only rate movies but also write reviews for them. Furthermore, users can rate these movie reviews even if they have not rated the movies themselves. There are 72,665 movie reviews and 1,625,480 review ratings. The dataset consists of three files, movie-rating.txt, review-rating.txt, and trusts.txt, and the specific data contents of each file are shown in Table 2. The statistics of the Ciao dataset show that more than 76% of the ratings are high scores, while the data in the Ciao dataset are very sparse.

Table 2 Ciao dataset description

Generate attack data

We divide the Ciao dataset into training and test parts and use tenfold cross validation in our experiments. For each experiment, we take nine folds as the training set and the remaining fold as the test set. Ten rounds are conducted, and the average result is taken as the final performance for each parameter setting. However, the data sparseness in the Ciao dataset is severe; if we experimented directly on the original dataset, the effects of both the shilling attacks and the recommendation algorithms would be too slight to reflect the improvement of our GTD model. For this reason, we preprocess the original dataset to retain users who rated at least 20 items and items that were rated at least 10 times.
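A sketch of the preprocessing and tenfold split described above follows. The 20-ratings-per-user and 10-ratings-per-item thresholds come from the text; dropping sparse users and items iteratively until both thresholds hold, and the helper names, are our interpretation.

```python
import numpy as np

def densify(R, min_user=20, min_item=10):
    """Keep users with >= min_user ratings and items rated >= min_item times."""
    keep_u, keep_i = np.arange(R.shape[0]), np.arange(R.shape[1])
    changed = True
    while changed:                                   # repeat until both thresholds hold
        mask_u = (R > 0).sum(axis=1) >= min_user
        mask_i = (R > 0).sum(axis=0) >= min_item
        changed = (~mask_u).any() or (~mask_i).any()
        R = R[mask_u][:, mask_i]
        keep_u, keep_i = keep_u[mask_u], keep_i[mask_i]
    return R, keep_u, keep_i                         # reduced matrix and surviving ids

def tenfold_indices(n_ratings, seed=0):
    """Shuffle rating indices and split them into ten folds for cross validation."""
    idx = np.random.default_rng(seed).permutation(n_ratings)
    return np.array_split(idx, 10)
```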

As the aim of a shilling attack is to push or “nuke” the target items’ ratings and thereby increase or decrease how often the target items are recommended, we take three frequently used shilling attack models, the random attack, average attack, and bandwagon attack, as the attack data generation methods in our experiments: the random attack and average attack are easy to implement, while the bandwagon attack has a better attack effect [34]. We inject these attack data into the system to simulate the shilling attack process. In contrast to ordinary shilling attack data, we also need to generate the attackers’ trust relationships, because newly injected attack users often lack trust relationship data; we randomly select trustees and use the average number of trustees in the original system as each attacker’s trustee number. Following the same pattern used to determine the attackers’ profile size, we take the average size of users’ trust lists in the system as the attackers’ trust list size. To simulate real users and obtain a stronger attack effect, we take the attackers’ trustors from the users with high trust values. We now introduce some dimensions of the attack data:

  • Attack size: Attack size is the number of attackers. Only a few malicious users can mislead recommender systems into making inappropriate judgments, even though they do not know the implementation details or algorithms of the systems [35].

  • Profile size: Profile size is the number of ratings in a given attack profile. To better simulate real users’ rating behavior, we use the system average rating number as our shilling attackers’ profile size.

We set the attack sizes at 5%, 6%, 7%, 8%, 9%, and 10%, and the profile size to 20 for the three types of shilling attack methods simulated in our experiments. We conduct a series of comparison experiments using the cosine and Pearson correlation coefficient collaborative recommendation models, which are defined as follows:

  • Cosine correlation coefficient:

    $$ {\text{Cos}}_{u,v} = \frac{{\sum {r_{u,i} r_{v,i} } }}{{\sqrt {\sum {r_{u,i}^{2} \sum {r_{v,i}^{2} } } } }}. $$
    (12)
  • Pearson correlation coefficient:

    $$ \rho_{u,v} = \frac{{{\text{cov}} (u,v)}}{{\sigma_{u} \sigma_{v} }} = \frac{{E((u - \mu_{u} )(v - \mu_{v} ))}}{{\sigma_{u} \sigma_{v} }} = \frac{{\sum {(r_{u,i} - \overline{{r_{u} }} )(r_{v,i} - \overline{{r_{v} }} )} }}{{\sqrt {\sum {(r_{u,i} - \overline{{r_{u} }} )^{2} \sum {(r_{v,i} - \overline{{r_{v} }} )^{2} } } } }}. $$
    (13)
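For completeness, a small sketch of the cosine baseline in Eq. (12) over co-rated items; Eq. (13) coincides with the Pearson metric of Eq. (1) already sketched above, so it is not repeated here.

```python
import numpy as np

def cosine_sim(r_u, r_v):
    """Eq. (12): cosine correlation over the items co-rated by u and v."""
    co = (r_u > 0) & (r_v > 0)
    if not co.any():
        return 0.0
    denom = np.sqrt((r_u[co] ** 2).sum()) * np.sqrt((r_v[co] ** 2).sum())
    return float((r_u[co] * r_v[co]).sum() / denom) if denom > 0 else 0.0
```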

Meanwhile, we also compare the recommendation results when the cosine and Pearson correlation coefficients are used as the user similarity calculation method in step 2 of the GTD model (section “Genre trust model”) to test and verify the robustness of our model.

Performance measures

  • MAE

MAE, short for mean absolute error, calculates the average absolute difference between the predicted ratings and the real user ratings for all users and items in the dataset, before and after a shilling attack of a specified pattern. MAE is defined as follows:

$$ MAE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {f_{i} - y_{i} } \right|} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {e_{i} } \right|} , $$
(14)

where n is the number of predicted item ratings. For a specified item i, \(y_{i}\) is the real user rating, while \(f_{i}\) is the predicted user rating of item i. In our experiments, item i is the target item ID of shilling attackers.

  • RecRate

Shilling attacks usually aim to deliberately increase or decrease recommendations for items. Thus, if shilling attack data are injected into the recommender system, the target items’ recommendations must be increased or decreased intentionally. The target item recommendation rate for one recommendation under a shilling attack is defined as follows:

$$ {\text{recRate}}_{{{\text{target}}}} = \frac{{\sum\nolimits_{u} {{\text{Recommend}}_{{u,{\text{target}}}} } }}{N}, $$
(15)

where u is a user in the test dataset and Recommendu,target equals 1 if the target item has been recommended to useru; otherwise, it equals 0. N is the number of users in the corresponding test dataset.
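Both measures translate directly to code. Here is a sketch, assuming aligned arrays of predicted and real ratings for MAE and one top-r recommendation list per test user for recRate:

```python
import numpy as np

def mae(predicted, actual):
    """Eq. (14): mean absolute error between predicted and real ratings."""
    predicted, actual = np.asarray(predicted, float), np.asarray(actual, float)
    return float(np.mean(np.abs(predicted - actual)))

def rec_rate(recommend_lists, target_item):
    """Eq. (15): fraction of test users whose top-r list contains the target item."""
    hits = sum(1 for items in recommend_lists if target_item in items)
    return hits / len(recommend_lists)
```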

Results analysis

To verify the performance of the GTD model and evaluate its improvement in defending against shilling attacks compared with traditional CF recommendation algorithms and the EnsembleTrustCF algorithm, we use Eq. (15) to compute the target item recommendation rate under different types of shilling attacks and take the original recommendation rate as the baseline. Figure 4 shows the recommendation rate (recRate) of the target item for the traditional CF recommendation algorithm and the GTD model, each using the two correlation coefficient similarity computing methods, with no shilling attack data; we take these values as the baselines. The movieID of the target item is “300”. Figures 5, 6 and 7 present the recRate of the target item under different shilling attack sizes: 5%, 6%, 7%, 8%, 9%, and 10%.

Fig. 4 recRate of CF recommendation algorithms and GTD model with original data

Fig. 5 Recommendation rate (recRate) under random attack

Fig. 6 recRate under average attack

Benchmark recommendation ratio

Figure 4 shows that, for both the traditional collaborative filtering and GTD recommendation methods, the recRate of the target movie with the cosine correlation coefficient is higher than with the Pearson correlation coefficient. Meanwhile, the GTD model generally has a lower recRate than the traditional collaborative filtering models, which also shows that the GTD model affects the diversity of item recommendations. In our experiments, we take these four values as baselines and compare them with the recRate of the target movie after injecting shilling data into the system. The comparison results reflect how well the different recommendation models defend against the effects of shilling attacks.

Experimental results and analysis

To verify the validity of the GTD model more comprehensively, the EnsembleTrustCF algorithm is selected for a comparison experiment. EnsembleTrustCF is a recommendation algorithm based on a trust network proposed by Victor et al. [36]. This algorithm takes the trust value as the main weight when predicting user ratings and has a high coverage rate and prediction accuracy. The formula used by EnsembleTrustCF to calculate user-item ratings is as follows:

$$ P_{a,i} = \overline{{r_{a} }} + \frac{{\sum\nolimits_{{u \in R^{T} }} {t_{a,u} (r_{u,i} - \overline{{r_{u} }} ) + \sum\nolimits_{{u \in R^{S} \backslash R^{T} }} {s_{a,u} (r_{u,i} - \overline{{r_{u} }} )} } }}{{\sum\nolimits_{{u \in R^{T} }} {t_{a,u} + \sum\nolimits_{{u \in R^{S} \backslash R^{T} }} {s_{a,u} } } }}, $$
(16)

where Pa,i is the predicted rating that usera gives to itemi, ra is the average rating of usera, RT indicates the set of users trusted by usera, RS indicates the set of users similar to usera, sa,u is the rating similarity between usera and useru, and ta,u is the trust value of usera toward useru. The existing experimental results show that the defense effect of using the Pearson correlation coefficient to calculate user similarity is better than that of using the cosine correlation coefficient. Therefore, we also use the Pearson correlation coefficient to calculate user similarity in the EnsembleTrustCF algorithm.
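A minimal sketch of the EnsembleTrustCF prediction in Eq. (16), following the description above; the containers for the trusted set R^T, the similar set R^S, the trust dictionary, and the similarity function are our assumptions.

```python
import numpy as np

def ensemble_trust_cf(a, i, R, trust, sim_fn, trusted, similar):
    """Eq. (16): trust/similarity-weighted deviations from each neighbour's own mean.

    trust: dict of trust values t_{a,u}; trusted: the set R^T; similar: the set R^S.
    """
    mean_a = R[a][R[a] > 0].mean()
    num = den = 0.0
    for u in set(trusted) | set(similar):
        if R[u, i] == 0:
            continue                                 # u has not rated item i
        mean_u = R[u][R[u] > 0].mean()
        # users in R^T are weighted by trust; those only in R^S by rating similarity
        w = trust[(a, u)] if u in trusted else sim_fn(R[a], R[u])
        num += w * (R[u, i] - mean_u)
        den += w
    return mean_a + num / den if den > 0 else mean_a
```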

In this part, the experimental results of five algorithms under three different shilling attack methods are compared: the CF recommendation algorithm using the cosine correlation coefficient and using the Pearson correlation coefficient, the GTD recommendation model using the cosine correlation coefficient and using the Pearson correlation coefficient, and the EnsembleTrustCF algorithm. The detailed data are presented in Tables 5, 6, 7 and 8.

Effectiveness

We split the Ciao dataset into a training set and a test set randomly. Then, we used different methods to calculate user similarities, created recommendation lists, and computed the accuracy of these algorithms to prove the validity of the GTD model. The accuracy of the recommendation results on the Ciao dataset without shilling attack data, for different test set sizes, is shown in Table 3. Although different training sets may influence the results, the overall trend is clear: the GTD model performs better than the traditional CF recommendation algorithms and EnsembleTrustCF, and the Pearson correlation coefficient yields higher accuracy than the cosine correlation coefficient.

Table 3 Accuracy without shilling attack data

recRate

In this part, we compare the recRate among the GTD model, traditional CF recommendation algorithms, and EnsembleTrustCF on the Ciao dataset. Defending against shilling attacks means that the recommendation results should not have an obvious change after injecting attack data. Thus, a lower recRate represents better performance.

  1. recRate under Random Attack

    Figure 5 shows that the traditional CF recommendation algorithms, whether using the cosine or the Pearson correlation coefficient similarity computing method, are sensitive to random attacks: the rate at which target items are recommended increases significantly even when the attack size is small. In contrast, the rates do not increase noticeably when we use the GTD model. Furthermore, the EnsembleTrustCF algorithm performs better than the GTD model with the cosine correlation coefficient but worse than the GTD model with the Pearson correlation coefficient.

    Moreover, despite some fluctuations, the overall trend is that a larger shilling attack size is more effective at misleading recommender systems into recommending the target item more frequently. Under the same shilling attack circumstances, the Pearson correlation coefficient CF recommendation algorithm defends against shilling attacks better than the cosine correlation coefficient method, and the results of the GTD model likewise show that the Pearson correlation coefficient performs better than the cosine method.

  2. recRate under Average Attack

    Figure 6 shows experimental results similar to those in Fig. 5. Comparing the two figures, we can conclude that the recommendation effectiveness of the cosine and Pearson correlation coefficient CF algorithms is similar under the random attack and the average attack. Besides, the target item recommendation rates obtained with the GTD model are only slightly above the baselines obtained without shilling attack data. For shilling attack methods with random factors, the choice of the user similarity calculation method is also one of the important factors affecting the algorithm’s performance. In the meantime, the shilling attack effect does not increase in proportion to the shilling attack size, meaning that only a few malicious users can mislead recommender systems into giving inappropriate results. However, when the same similarity calculation method is used, our proposed GTD model achieves a better shilling attack defense effect.

  3. recRate under Bandwagon Attack

Figure 7 presents the recommendation rates of the target item under the bandwagon attack. The bandwagon attack has the strongest attack effect of the three: the recommendation rate of the target item is increased to varying degrees. However, in contrast to the random and average attacks, the GTD recommendation algorithm has a better defense effect than the EnsembleTrustCF algorithm when using either the cosine or the Pearson correlation coefficient. Moreover, when the attack size increases, the rate at which shilling attack targets are recommended to normal users remains well controlled in the GTD model.

Fig. 7 recRate under bandwagon attack

Mean absolute error

We select attack sizes varying from 5 to 10% for the bandwagon attack and calculate the MAE over the test set, choosing 20% of the user profile data from the Ciao dataset as the test set. Then, we compare the MAE results in four settings: cosine CF model, Pearson CF model, cosine GTD model, and Pearson GTD model. The results are shown in Table 4.

Table 4 Overall mean absolute error (MAE) under bandwagon attack

After interpreting the data from our experiments, we can draw the following conclusions:

  1. The bandwagon attack has the best shilling attack effect, while the random attack is similar to the average attack;

  2. CF recommender systems are susceptible to shilling attacks, and only a small amount of malicious data can lead to visible attack effects;

  3. The GTD model has very acceptable efficiency in defending against shilling attacks and recommending items to users;

  4. The Pearson correlation coefficient is better than the cosine correlation coefficient in both the GTD model and traditional CF recommender algorithms;

  5. In some cases, the customized filtering conditions in the GTD model are so strict that they may impose limitations in terms of recommendation genres and selection range. This is an area requiring improvement in our future research.

Conclusions

Based on the intuition that users with higher credibility should carry more weight in recommendations and that a user’s credibility differs across item genres, we proposed the GTD model for trust-aware recommendation by introducing genre trust degrees and improving the user similarity calculation algorithm. Compared with traditional trust metric models, our GTD model focuses on high-credibility users and can better handle new users’ recommendation problems. Moreover, the GTD model raises the cost of shilling attack injection, since high-credibility users would be needed in advance to interfere with the system’s recommendation results, and this supports effective and sensible recommendations. The experimental results and extensive comparisons discussed in this paper show that our GTD model achieves performance better than or comparable to the other two representative trust metrics in defending against shilling attacks of different sizes. The recRate results clearly show that the GTD model can keep the rate at which target items are improperly recommended to users within a certain range and thus ensures the reliability of the recommender system.

Since user ratings are time-sensitive, in further study we will incorporate time sensitivity and ameliorate the lack of new items in recommendations to adapt to the time-sensitive requirements of the recommendation system.