1 Introduction

The development of the internet has greatly changed people's lifestyles. Various e-commerce websites, e.g., Yelp, Amazon and Taobao,Footnote 1 provide internet users with a convenient, efficient and relatively reliable online trading environment. More and more merchants prefer to build virtual shops on different online platforms. Meanwhile, an increasing number of consumers have gradually become used to this way of shopping, and they share their shopping experiences and reviews through the online review systems provided by e-commerce websites. Because most of these reviews come from online consumers, they largely reflect the quality of a product or the user experience. More and more people are accustomed to checking online reviews before placing an order. Moreover, many merchants realize that the more positive online reviews they have, the more transactions they receive, and the faster they can expand and gain a high reputation.

In every online review system, consumers' reviews have a significant influence on merchants' rankings. Existing research has shown that each half-star rating improvement makes a restaurant sell out 19 percent more frequently [1], and a one-star increase in an online rating system brings a five to nine percent increase in revenue [2]. Unfortunately, driven by these substantial economic benefits, many malicious merchants have begun to run illegal operations, including deliberately posting spam online reviews. They publish spam reviews or opinions to promote their brand reputation and attract more consumers, because people tend to purchase products or choose services that are frequently bought, ranked highly, and have more positive feedback [3]. According to BBC News, approximately a quarter of Yelp reviews could be spam.Footnote 2

Further, producing spam online reviews has become an industrial chain: malicious merchants can easily find professional spam review writing services online, such as Sponsored Reviews,Footnote 3 a site where advertisers and bloggers get in touch to write paid reviews. This deteriorating online review environment forces us to face the task of spam review detection.

Spam online reviews, which are similar to opinion spam in certain situations, refer to violation activities such as writing spam reviews that try to confuse consumers by imitating real buyers. Furthermore, malicious merchants may hire real users to post spam reviews directly. All of this increases the difficulty of defining spam reviews. Jindal and Liu [4] made early contributions to spam review detection and first proposed three general types of spam reviews:

  • Untruthful opinions: Reviews that deliberately mislead readers or opinion mining systems by giving undeserved positive reviews to some target objects in order to promote them, or by giving unjust or malicious negative reviews to other objects in order to damage their reputations.

  • Reviews on brands only: Reviews that do not give any useful information on the specific products but comment only on the brands, the manufacturers or the sellers of the products. Although they may be useful, we consider them spam because they are not targeted at the specific products and are often biased.

  • Non-reviews: These reviews have two forms: (1) advertisements and (2) other irrelevant texts containing no opinions (e.g., questions, answers, and random texts).

Most online spam reviews observed so far can be covered by these three types. The first type may change purchase decisions or have negative effects on benign merchants, and this kind of spam review is especially difficult to identify. For these reasons, many studies have carried out research on detecting untruthful reviews.

Besides, reviews on brands and advertisements also attract great interest from researchers, because these types of reviews carry the potential for fraud. Some spammers may use the online review platform to broadcast their own illegal brands, steal personal social contact information, or even induce consumers to conduct offline transactions. These tricks confuse people and pose an enormous challenge for anti-spam frameworks [5]. In order to recognize different fraud patterns, many scholars work on such adversarial tasks.

In this paper, we integrate untruthful opinions and the fraud arising from the other two types of reviews, and uniformly call them "spam reviews". Spam reviews conflict with the genuine assessment of products and attempt to mislead readers by intentionally overestimating or underestimating a category of items. The sources of spam reviews may be malicious merchants, individual spammers or fraud groups, and spam reviews take the form of various patterns designed by spammers [5, 6]. For instance, the following shows two spam reviews posted to the Amazon review platform, which were identified by a prototype review spam detection framework [7]. From "Review 1" alone, it is difficult for human readers to decide whether the review is spam or benign. Fortunately, if a reader encounters both reviews at the same time, he or she can capture the essential spam feature and classify both as spam, because they share a fixed semantic pattern applied to different products. Clearly, manual approaches to identifying spam reviews are not feasible in this case.

  • Review 1: I did extensive research before selecting the SD60D, and I am thrilled with my purchase. This camera is small (smaller than my iPod) and lightweight, but still takes great pictures. The screen is much bigger than my friends' cameras, and it has all the extra settings that the average person needs to take great photos in all sorts of conditions. I have not had any bad or blurry pictures with it yet. I am thrilled with this camera and would recommend it to everybody.

  • Review 2: I did extensive research before selecting the Kodak EasyShare C875, and I am thrilled with my purchase. This camera takes great pictures. The screen is much bigger than my friends' cameras, and it has all the extra settings that the average person needs to take great photos in all sorts of conditions. I have not had any bad or blurry pictures with it yet. I am thrilled with this camera and would recommend it to everybody.

The number of online spam reviews has increased year by year. According to existing statistics, spam reviews account for around 2–6% of reviews on Priceline, Orbitz, TripAdvisor and Expedia [8, 9]. In particular, on the online review platform Yelp, the proportion of spam reviews is already up to 14–20% [10]. It is therefore urgent to build effective detection frameworks that identify spam reviews automatically. Many state-of-the-art methods have been proposed, driven by this detection task. The main challenges of this task are threefold:

  • How to recognize a set of linguistic features from spam reviews?

  • How to deal with the lack of labelled review datasets?

  • How to utilize the relationship between products, consumers and reviews?

This research direction has attracted a lot of research attention [4, 11,12,13]. Existing studies mainly focus on basic language models that do not consider deep or relational information. Deep learning models have been broadly applied to many NLP tasks; compared with traditional statistical models, newer methods (e.g., deep neural networks and graph-based methods) open a large space for new research [14,15,16]. Recently, a few survey works have been published; three of them summarize existing methods for spam review detection [5, 17, 18]. However, these three works have several shortcomings. First, they do not systematically summarize the labelled datasets or verify the availability of their listed data sources. Second, they lack a summary of graph-based techniques, especially the graph convolutional network methods that have developed rapidly in recent years. Third, they fail to give a complete task classification covering existing methods. To address these issues, we systematically summarize previous research from three aspects: existing methods, available datasets, and suggestions for future research. In particular, we untangle the graph-based strategies that have been proposed to solve the problem of spam review detection.

Our work first defines the main tasks of spam review detection. Then we present the existing state-of-the-art approaches along four directions: feature engineering, traditional statistical models, neural network models and graph network frameworks. In addition, we summarize some existing data resources and their data structure. Finally, we provide some constructive research directions for the future.

2 Task Definitions and Concepts

Most of the existing spam review detection research is driven by different tasks. Some studies focus on the spam reviews themselves, and others are dedicated to identifying spammers or spammer groups. From our selected research works, we identified four categories of detection tasks: the review mining task, end-to-end classification, the cold-start problem, and the spammer detection task, as shown in Fig. 1.

Fig. 1 Distribution of focused research works. The review mining task accounts for 17% of the works, end-to-end classification for 22%, cold-start problems for 22% (the same proportion as classification), and spammer detection for 39%

2.1 Review Mining Task

Review mining is the process of extracting linguistic features and context from an online review platform, evaluating customers' opinions using statistical models and feature engineering, and learning reviewer behavior to identify spam reviews. It is generally modeled as a rule-based system.

KL Divergence: Kullback–Leibler divergence is broadly used to assess the distance between two probability distributions. Lai et al. [19] used KL divergence to measure the distance between a pair of language models, denoted as \(M_{d_{1}}\) and \(M_{d_{2}}\) [19]. They proposed an untruthful review spam detection strategy supported by KL divergence, which is characterized by:

$$\begin{aligned} \text{KL}\left( M_{d_{1}} \Vert M_{d_{2}}\right) =\sum _{t \in \left\{ d_{1} \cup d_{2}\right\} } P\left( t \mid M_{d_{1}}\right) \times \log _{2} \frac{P\left( t \mid M_{d_{1}}\right) }{P\left( t \mid M_{d_{2}}\right) } \end{aligned}$$
(1)

Then, they applied Joachims' SVM packageFootnote 4 for the classifier modules.
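To make Eq. (1) concrete, the following is a minimal Python sketch of the KL-divergence computation between two smoothed unigram language models built from a pair of reviews. The whitespace tokenization, the add-one smoothing, and the helper names `unigram_lm` and `kl_divergence` are our own assumptions for illustration, not details of Lai et al.'s implementation.

```python
import math
from collections import Counter

def unigram_lm(tokens, vocab, alpha=1.0):
    """Smoothed unigram language model P(t | M_d) over a shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}

def kl_divergence(tokens_d1, tokens_d2):
    """KL(M_d1 || M_d2) over the union vocabulary of the two reviews (Eq. 1)."""
    vocab = set(tokens_d1) | set(tokens_d2)
    m1, m2 = unigram_lm(tokens_d1, vocab), unigram_lm(tokens_d2, vocab)
    return sum(m1[t] * math.log2(m1[t] / m2[t]) for t in vocab)

# Near-duplicate reviews yield a small divergence; unrelated reviews a larger one.
r1 = "great camera takes great pictures highly recommend".split()
r2 = "terrible battery life would not buy again".split()
print(kl_divergence(r1, r1), kl_divergence(r1, r2))
```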


MI Measures: Mutual information has been used for collocational analysis in existing research works. Mutual information is an information-theoretic measure of the dependence between two entities:

$$\begin{aligned} MI\left( t_{i}, t_{j}\right) =\log _{2} \frac{{\text {Pr}}\left( t_{i}, t_{j}\right) }{{\text {Pr}}\left( t_{i}\right) {\text {Pr}}\left( t_{j}\right) } \end{aligned}$$
(2)

Lau et al. [7] proposed a text mining model with an adjusted mutual information measure for the detection of untruthful reviews. They consider both term presence and term absence as evidence for assessing the strength of the association between a concept and its underlying descriptive terms.
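As a quick illustration of Eq. (2), the sketch below estimates the mutual information of two terms from raw co-occurrence counts. The count-based probability estimates and the example numbers are assumptions; Lau et al.'s adjusted measure additionally accounts for term absence, which is not shown here.

```python
import math

def mutual_information(n_ij, n_i, n_j, n_total):
    """MI(t_i, t_j) as in Eq. (2), with probabilities estimated from counts."""
    p_ij = n_ij / n_total                    # Pr(t_i, t_j)
    p_i, p_j = n_i / n_total, n_j / n_total  # Pr(t_i), Pr(t_j)
    return math.log2(p_ij / (p_i * p_j))

# Example: "battery" and "life" co-occur in 40 of 1000 reviews.
print(mutual_information(n_ij=40, n_i=120, n_j=90, n_total=1000))
```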


Multicriteria Decision: Viviani et al. [20] used a multicriteria decision-making approach, based both on the evaluation of multiple criteria and on the use of aggregation operators, with the aim of obtaining a veracity score associated with each review. Based on this score, it is possible to identify spam reviews [20]. Specifically, they provide a definition of an aggregation operator \(F\):

$$\begin{aligned} F: I^{n} \rightarrow I \end{aligned}$$
(3)

Then their aggregated value is calculated as follows:

$$\begin{aligned} F\left( a_{1}, a_{2}, \ldots , a_{n}\right) =\sum _{j=1}^{n} w_{j} b_{j} \end{aligned}$$
(4)
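A minimal sketch of one such aggregation operator follows, assuming the usual ordered-weighted-averaging reading of Eq. (4) in which \(b_j\) is the j-th largest of the criterion scores \(a_i\); the criterion names and the weight vector in the example are purely illustrative, not those used by Viviani et al.

```python
def aggregate(scores, weights):
    """Ordered weighted averaging (Eq. 4): sort criterion scores in descending
    order and combine them with a fixed weight vector summing to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    b = sorted(scores, reverse=True)       # b_j = j-th largest criterion score
    return sum(w * bj for w, bj in zip(weights, b))

# Three hypothetical per-review criterion scores (e.g. rating deviation,
# burstiness, textual plausibility), each normalized to [0, 1].
veracity = aggregate([0.9, 0.4, 0.7], weights=[0.5, 0.3, 0.2])
print(veracity)
```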

Cross-Platform E-Commerce Fraud Detection: Weng et al. [13] identified a set of platform-independent features at the word level, the semantic level and the structural level (Table 1) to separate fraudulent and normal items on different e-commerce platforms.

Table 1 Description of features in CATS

Further, CATS ran a performance comparison to choose the best of six commonly used models: AdaBoost, SVM, XGBoost, neural network, decision tree and naive Bayes. The XGBoost model showed better performance than the other benchmarks and was better able to identify features that discriminate whether an item is fraudulent or normal.
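The sketch below illustrates how such a model-selection step could be run with scikit-learn and the xgboost package; it is not the CATS implementation, and the feature matrix `X` (platform-independent item features) and labels `y` (fraudulent/normal) are placeholders.

```python
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

candidates = {
    "AdaBoost": AdaBoostClassifier(),
    "SVM": LinearSVC(),
    "XGBoost": XGBClassifier(),
    "NeuralNetwork": MLPClassifier(max_iter=500),
    "DecisionTree": DecisionTreeClassifier(),
    "NaiveBayes": GaussianNB(),
}

def benchmark(X, y):
    """5-fold F1 score for each candidate classifier; the best one is kept."""
    return {name: cross_val_score(m, X, y, cv=5, scoring="f1").mean()
            for name, m in candidates.items()}
```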

2.2 End-to-End Classification

Many research works aim to build an end-to-end classifier to detect spam reviews. They take advantage of word embeddings as the input of a linear or non-linear classifier. This kind of task can be subdivided into two directions: text classification and graph classification. More specifically, text classification is usually modeled as a two-class (binary) classification task, while the graph-based problem contains node, edge and sub-graph classification tasks.


LR & RNNLM: Fontanarava et al. [21] proposed two classifiers: one on linguistic features and another on metadata and behavioral characteristics. They evaluated their models on datasets from the Yelp.com site. In particular, the classifiers are based on Logistic Regression (LR) applied to textual features in the form of weighted bag-of-words, and on a generative model using Recurrent Neural Network based Language Models (RNNLM), respectively.


Graph-based Model: In recent years, there has been growing interest in constructing graph relationships among nodes and edges. Noekhah et al. [22] built a graph-based model with three entities: review, reviewer and target. They then designed an algorithm that iteratively updates the spamicity degree of each entity to determine whether the entity is spam or not.

$$\begin{aligned} \text{ Spamicity } =\frac{\sum _{i=1}^{n} \sum _{j=1}^{m} f_{i} \times w_{i}}{n \times m} \end{aligned}$$
(5)

Bipartite Graph: Yang et al. [23] used a bipartite graph with a popular ranking algorithm to detect spam reviews. In this work, they focused on measuring the coherence of a review based on two kinds of flow-smoothness information among sentences: word transition probability and word concurrence probability. Specifically, they defined the one-step transition score as follows:

$$\begin{aligned} H(r)=\exp \left\{ -\frac{\sum _{i=1}^{n-1} \log \left( P\left( s_{i} \rightarrow s_{i+1}\right) \right) }{\sum _{i=1}^{n-1}\left| s_{i+1}\right| }\right\} \end{aligned}$$
(6)

The word concurrence metric for words \(w_i\) and \(w_j\) is then defined as \(\log (O_{i,j}/(O_i O_j))\). Further, the value of this coherence metric for spam reviews is often lower than that of honest reviews, so it is also very helpful for identifying spam reviews and can be used jointly with the previous one.

$$\begin{aligned} {\text {Con}}(r)=\frac{1}{n-1} \sum _{i=1}^{n-1} {\text {Con}}\left( s_{i}, s_{i+1}\right) \end{aligned}$$
(7)
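A minimal sketch of how the two coherence signals above could be computed for one review follows. The transition-probability function, the co-occurrence count tables `o_pair` and `o_single`, and the way pairwise scores are averaged within a sentence pair are assumptions, since the formulas above are only specified at the review level.

```python
import math

def transition_score(sentences, p_transition):
    """Eq. (6): length-normalized word-transition score over consecutive
    sentences; p_transition(s_i, s_next) returns P(s_i -> s_{i+1})."""
    log_sum = sum(math.log(p_transition(a, b)) for a, b in zip(sentences, sentences[1:]))
    length_sum = sum(len(b) for b in sentences[1:])
    return math.exp(-log_sum / length_sum)

def pair_concurrence(s_i, s_next, o_pair, o_single):
    """Average word-concurrence metric log(O_ij / (O_i * O_j)) over word pairs."""
    vals = [math.log(o_pair[(wi, wj)] / (o_single[wi] * o_single[wj]))
            for wi in s_i for wj in s_next if (wi, wj) in o_pair]
    return sum(vals) / len(vals) if vals else 0.0

def coherence(sentences, o_pair, o_single):
    """Eq. (7): mean concurrence over consecutive sentence pairs."""
    pairs = list(zip(sentences, sentences[1:]))
    return sum(pair_concurrence(a, b, o_pair, o_single) for a, b in pairs) / len(pairs)
```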

GCN: Li et al. [6] first applied graph convolutional networks to the spam review detection problem. GCN-based strategies follow a layer-wise propagation scheme: in each propagation layer, all the nodes update simultaneously. In general, a GCN-based model can be written as:

$$\begin{aligned} \begin{aligned} h_{N(v)}^{l} & = \sigma \left( W^{l} \cdot A G G\left( \left\{ h_{v^{\prime }}^{l-1}, \forall v^{\prime } \in N(v)\right\} \right) \right) \\ h_{v}^{l} & = C O M B I N E\left( h_{v}^{l-1}, h_{N(v)}^{l}\right) \end{aligned} \end{aligned}$$
(8)

Moreover, they proposed a heterogeneous graph to represent the interaction between products and users, and formulated detection as an edge classification problem with attributed nodes and edges.
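The sketch below is one way to instantiate Eq. (8) in PyTorch, assuming a mean aggregator for AGG and concatenation followed by a linear layer for COMBINE; it is a generic layer of this family rather than Li et al.'s heterogeneous-graph implementation.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One propagation layer in the style of Eq. (8)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_agg = nn.Linear(in_dim, out_dim)          # W^l applied to aggregated neighbors
        self.w_comb = nn.Linear(in_dim + out_dim, out_dim)

    def forward(self, h, adj):
        # h: (num_nodes, in_dim) node states; adj: (num_nodes, num_nodes) 0/1 adjacency.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        h_neigh = torch.relu(self.w_agg(adj @ h / deg))             # h_{N(v)}^l (mean AGG)
        return torch.relu(self.w_comb(torch.cat([h, h_neigh], 1)))  # h_v^l (concat COMBINE)
```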

2.3 Cold-Start Problem

Due to the lack of labelled data and negative samples, some scholars have started to deal with zero-shot learning issues. Moreover, solving the cold-start problem in review spam detection can help online review websites mitigate the damage caused by spammers in time. These works mainly focus on positive-labelled or unlabelled learning.


PU-learning: PU-learning learns from positive and unlabeled data, where \(P\) denotes a set of positive examples and \(U\) a set of unlabeled examples. The goal is to build a classifier using \(P\) and \(U\) to classify the data in \(U\) or a future test set \(T\). Li and Liu [24] proposed this method of learning from positive and unlabeled examples (PU-learning).
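A minimal two-step PU-learning sketch under stated assumptions: the labelled spam reviews form the positive set \(P\), everything else is the unlabelled set \(U\). Step one mines reliable negatives with a classifier trained on \(P\) versus \(U\); step two trains the final SVM on \(P\) versus those reliable negatives. The logistic-regression first step, the `neg_fraction` parameter and the feature matrices are illustrative, not the exact procedure of Li and Liu [24].

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

def pu_learn(X_pos, X_unlabeled, neg_fraction=0.3):
    # Step 1: treat U as negative and train a first classifier on P vs. U.
    X = np.vstack([X_pos, X_unlabeled])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unlabeled))])
    step1 = LogisticRegression(max_iter=1000).fit(X, y)

    # Unlabeled reviews scored least like P become the reliable negatives.
    scores = step1.predict_proba(X_unlabeled)[:, 1]
    reliable_neg = X_unlabeled[np.argsort(scores)[: int(neg_fraction * len(scores))]]

    # Step 2: train the final classifier on P vs. reliable negatives.
    X2 = np.vstack([X_pos, reliable_neg])
    y2 = np.concatenate([np.ones(len(X_pos)), np.zeros(len(reliable_neg))])
    return LinearSVC().fit(X2, y2)
```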


TransE: TransE is a model that can encode the network structure and represent nodes and edges in a low-dimensional vector space using the (head, relation, tail) triple form. Work on distributional representations for knowledge bases has shown that TransE is good at describing the global information of the graph structure. Therefore, Wang et al. [16] utilized TransE to encode the textual and behavioral information into the review embeddings for the cold-start spam detection task. As shown in Fig. 2, they take the products \(\beta\) as the head part of the TransE network in their model, the reviewers \(\alpha\) as the relation part, and the review text embeddings \(\tau\) learned by a CNN as the tail part.

Fig. 2 Wang et al., model overview. They take the products as the head part of the TransE network in their model, take the reviewers as the translation (relation) part and take the review as the tail part


$$\begin{aligned} S^{\prime }=\left\{ \left( \varvec{\beta }^{\prime }, \varvec{\alpha }, \varvec{\tau }\right) \mid \varvec{\beta }^{\prime } \in B\right\} \cup \left\{ \left( \varvec{\beta }, \varvec{\alpha }, \varvec{\tau }^{\prime }\right) \mid \varvec{\tau }^{\prime } \in T\right\} \end{aligned}$$
(9)
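A minimal sketch of the TransE-style objective under the head/tail corruption of Eq. (9) follows, assuming the standard L2 translation distance and a margin ranking loss over embedding tensors; the CNN that produces the review embeddings \(\tau\) in Wang et al.'s model is omitted, and the function names are our own.

```python
import torch

def transe_distance(beta, alpha, tau):
    """TransE treats the reviewer as a translation: product (head) + reviewer
    (relation) should land close to the review embedding (tail)."""
    return torch.norm(beta + alpha - tau, p=2, dim=-1)

def margin_loss(beta, alpha, tau, beta_corrupt, tau_corrupt, margin=1.0):
    """Margin ranking loss over the corrupted triples of Eq. (9): corrupt
    either the product (head) or the review (tail), keeping the reviewer fixed."""
    pos = transe_distance(beta, alpha, tau)
    neg = torch.cat([transe_distance(beta_corrupt, alpha, tau),
                     transe_distance(beta, alpha, tau_corrupt)])
    return torch.clamp(margin + pos.repeat(2) - neg, min=0).mean()
```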

Attribute Enhanced Domain Adaptive Model: Inspired by the previous work of Wang et al. [16], You and Qian [25] presented a neural network that encodes attributes, entities and their relations, and leveraged this abundant information to alleviate the data-scarcity problem in the cold-start scenario of spam review detection.


JESTER: Li et al. [26] proposed a user-item-review-rating representation for fraud review detection; they embedded user-item social relations into their model to tackle the cold-start problem. JESTER simultaneously considers three tasks: user reviewing behavior learning, social relation preservation, and fraud review detection, corresponding to three learning loss functions: a behavior learning loss, a social relation preservation loss, and a fraud detection loss. By jointly optimizing these three losses, JESTER learns user-item-review-rating attributed representations for fraud review detection. A toy example of the architecture of this model is shown in Fig. 3.

Fig. 3 Li et al., model overview. The embedding network consists of four parts: user embedding layers, item embedding layers, review embedding networks, and rating embedding layers

2.4 Spammer Detection Task

Semantic and contextual analysis of online review content, as well as deep learning methods, can capture the explicit and implicit language-level features from the review text alone, but such analysis tends to suffer from accuracy problems and limits the efficiency of identifying spam reviews. Therefore, an increasing number of research works have been devoted to analyzing the behavior of review posters, which can improve detection performance. Online review spammer detection is a comprehensive analysis of both the review content and the review posters.


SR Spammer: Xie et al. [27] focused on singleton review (SR) spammer detection and mapped the SR spammer detection problem to an abnormally correlated temporal pattern detection problem. They explored the relationship of reviewers, customers, spammers and SR spammers as shown in Fig. 4.

Fig. 4 Xie et al., model overview. Relationship of spammers, reviewers and customers


Group Spam Behavior Indicators: Mukherjee et al. [11] first used a frequent itemset mining strategy to discover a set of candidate spam reviewer groups. They used several behavioral models derived from the collusion phenomenon among spam reviewers, and relation models based on the relationships among groups, individuals, and the products they reviewed, to identify spammer groups, as shown in Table 2.

Table 2 Mukherjee et al. indicators

Reviewer Trustiness: To further learn the relationship among reviews, items and reviewers, Wang et al. [28, 29] proposed a review graph to recognize suspicious reviewers. They captured the connections by introducing three essential concepts: the trustiness of reviewers, the honesty of reviews, and the reliability of stores. In particular, the trustiness of a reviewer \(r\), denoted as \(T(r)\), is a score of how much we can trust \(r\). To resolve these relations, the authors give the general form of the trustiness function as follows:

$$\begin{aligned} T(r)=\frac{K}{1+e^{-K H_{r}}} \end{aligned}$$
(10)

SRD-BM & SRD-LM: Recently, Hussain et al. [30] worked on this task and used thirteen distinct spammer behavioral features to calculate a review spam score, which is then used to identify spammers and spam reviews. Meanwhile, their linguistic method (SRD-LM) works on the content of the reviews and uses transformation, feature selection and classification to identify spam reviews.

The SRD-BM system begins with the identification and calculation of spammer behavioral features on an unlabelled Amazon review dataset. The model executes in four steps: (1) calculating the normalized value of each spammer behavioral feature; (2) computing the mean score for each review and the overall accuracy of the whole dataset; (3) assessing the impact of each behavioral feature and assigning a weight according to its importance; (4) calculating the spam score and distinguishing spam from non-spam reviews using different threshold values.
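A hedged sketch of this four-step pipeline: min-max normalization of each behavioral feature, a weighted combination into a per-review spam score, and thresholding. The uniform default weights and the threshold value are placeholders, not the weights or cut-offs learned in Hussain et al.'s study.

```python
import numpy as np

def behavioral_spam_scores(feature_matrix, weights=None, threshold=0.5):
    """Rows are reviews, columns are the spammer behavioral features."""
    X = np.asarray(feature_matrix, dtype=float)
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    X_norm = (X - col_min) / (col_max - col_min + 1e-9)   # step (1): normalize
    if weights is None:                                    # step (3): per-feature weights
        weights = np.full(X.shape[1], 1.0 / X.shape[1])
    spam_score = X_norm @ weights                          # step (2): combined review score
    return spam_score, spam_score >= threshold             # step (4): thresholding
```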

In addition, generative adversarial models have become more and more popular recently, because they can automatically generate negatively labelled training data and gradually enhance the discriminator. Aghakhani et al. [31] first used a generative adversarial network (GAN) with a double discriminator to detect deceptive, fraudulent reviews. Zheng et al. [32] created one-class adversarial nets (OCAN) for fraud detection with only benign users as training data.

3 Techniques of Spam Review Detection

Over the past decade, a growing number of researchers have worked to find better strategies for identifying spam reviews. Some scholars have tried using traditional statistical methods to learn various aspects of text features from large-scale datasets, continually optimizing their evaluation results with different feature extractors. In addition, others have begun to utilize machine learning approaches to boost their detection frameworks. In particular, as more and more CNN/RNN methods are applied in the field of Natural Language Processing (NLP), researchers have also proposed representative neural models to solve spam review detection problems. Due to the dependence among reviews, products and users, graph-based methods are also used to capture relational features; this part is introduced in Sect. 3.4.

3.1 Traditional Statistical Methods

From the perspective of existing research works, conventional methods need to extract various features from reviews, usually represented as a language model. Because spam reviews have some identifiable attributes, feature engineering is essential for statistical models. Further, spam review writers may change the form of their comments, so the detection models also need to adjust constantly. Previous research using various kinds of feature engineering is summarized as follows.

Jindal and Liu [4] first identified three types of spam reviews and then analyzed real-world datasets from Amazon. Through statistics on the datasets, they found a large number of duplicate and near-duplicate reviews. They used 2-grams to calculate the similarity score of two reviews, and review pairs with a similarity score of at least 90% were chosen as duplicates [33]. However, since duplicates can only detect one type of spam review, Jindal et al. characterized a large set of features to describe reviews, up to thirty-five in total, such as the length of the review title, the price of the product and so on. They separated these features into three categories: review-centric features, reviewer-centric features and product-centric features. For model building, they used logistic regression with the statistical package R.

Further, the method proposed by Mukherjee et al. [11] first used several behavioral models derived from the collusion phenomenon among spam reviewers to detect spam reviews, as shown in Table 2.

Mukherjee and Venkataraman [34] then considered an additional set of behavioral features about reviewers and their reviews for learning, which significantly improves the classification results on real-world review datasets. For the behavioral study, they crawled the profiles of all reviewers in their hotel and restaurant datasets and proposed eight behavioral features:

  • Activity window (AW): The difference of timestamps of the last and first reviews for that reviewer.

  • Maximum number of reviews (MNR): Since 35.1% of spammers posted their reviews within a single day, the MNR per day is a suitable feature.

  • Review count (RC): The number of reviews that a reviewer has. This feature shows a clear separation of spammers from non-spammers based on their reviewing activities.

  • Percentage of positive reviews (PR): The percentage of positive reviews, i.e., those that received 4 or 5 stars.

  • Review length (RL): The average number of words per review for each reviewer.

  • Reviewer deviation (RD): The amount by which spammers deviate from the general rating consensus. They compute the absolute rating deviation of a review from the other reviews on the same business.

  • Maximum content similarity (MCS): The authors compute the maximum content similarity based on cosine similarity between any two reviews of a reviewer.

  • Tip count (TC): The "Tip" function is unique to the Yelp website; a tip is a short (less than 140 characters) description of, or insight about, a business.

In addition, Alom et al. [35] conducted spam review detection research on the Twitter website. They made use of several new features that were more effective and robust than previously used features. In particular, some graph-based features were put forward: (a) the triangle count of the user's network: \(\text {Triangle\_Count}(u)\); (b) the ratio of the triangle count to the number of followers of the user:

$$\begin{aligned} \text {RateTNF}(u)=\frac{\text {Triangle}\_\text {Count}(u)}{N_{f e r}(u)} \end{aligned}$$

(c) the ratio of bidirectional links from the users’ social network:

$$\begin{aligned} \text {Rate}_{\text{ bilink }}(u)=\frac{N_{\text{ bilink }}(u)}{N_{f e r}(u)+N_{\text{ fing }}(u)} \end{aligned}$$

Meanwhile, Myo Myo Swe [36] proposed a novel and robust blacklist-creation method for recognizing spam accounts on Twitter. The blacklist was built using LDA and TF-IDF strategies. In this work, fourteen content-based features are proposed that can distinguish fake accounts from legitimate accounts, such as the spam words ratio \(ratio = \frac{CountsOfSpamWords}{TotalNumberOfWords}\), the hashtag ratio, and so on.

Jia et al. [37] used linguistic features based on term frequency, Latent Dirichlet Allocation (LDA) and word2vec, respectively, then merged them into one model to conduct experiments. In detail, they utilized topic modeling techniques from natural language processing and extracted hidden topics from Yelp datasets. This work extracted five topics, each containing eight words, related to fake reviews, as follows:

  • Topic1: promise, quality, pushy, peeled, rationalize, podium, decorated and gulped.

  • Topic2: care, shots, spray, cliff, ramps, edge, comments and park.

  • Topic3: swirling, settle, breadth, strict, eavesdrop, split, discarded and stones.

  • Topic4: writing, reserve, injure, damn, autographed, hate, olfactory and zealand.

  • Topic5: cube, parings, shined, pomp, bamboo, heroin, absurd and unsalted.

Recently, Weng et al. [13] from Alibaba Group summarized eleven platform-independent features at the word level, the semantic level and the structural level to separate fraudulent and normal items. They developed a cross-platform e-commerce fraud detection system called CATS. Additionally, they selected an XGBoost model as the binary classifier, and their evaluation results indicated that CATS achieves both high precision and recall [13]. According to their feature engineering, this research work identified the features shown in Table 1. These features have achieved good results in cross-platform spam detection.

Another statistics-based method, presented by Lai et al. [19], successfully carried out review spam detection: an unsupervised probabilistic language model for untruthful review detection and a supervised classifier for non-review detection. They outlined a probabilistic language modeling and Kullback–Leibler (KL) divergence based strategy for the detection of untruthful reviews, and Support Vector Machine (SVM) based approaches for the detection of non-reviews. Lau et al. [7] also proposed a text mining model that is integrated into a semantic language model for the detection of untruthful reviews.

3.2 Machine Learning Approaches

Machine learning approaches have been broadly utilized in spam review detection. Most existing research works can be classified into three directions: supervised learning, semi-supervised learning and unsupervised learning (Table 3).

Table 3 Comparison of supervised methods in previous work

3.2.1 Supervised Learning Model

Supervised learning models rely on a large amount of labelled data. Ott et al. [8, 46] collected a balanced dataset of TripAdvisor reviews and trained a supervised deception classifier on the labelled data. They proposed a general framework for estimating the prevalence of deception in online review communities and devised a Bayesian-based classifier that distinguishes truthful from deceptive reviews. They further compared the Naïve model with the Bayesian model and found that the Bayesian prevalence model addresses the Naïve strategy's limitations by modeling the generative process through the joint probability distribution of the observed and latent data. Their proposed prevalence estimate is denoted as:

$$\begin{aligned} \pi ^{*}=\frac{1}{N^{\text{ test } }} \sum _{i=1}^{N^{\text{ test } }} y_{i} \end{aligned}$$
(11)

Inspired by previous work showing that Support Vector Machines (SVMs) trained on n-gram features perform well in spam detection tasks, this work trained linear SVM classifiers using the LIBSVM software package and represented reviews using unigram and bigram bag-of-words (BoW) features.
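A hedged scikit-learn equivalent of that setup is sketched below, using `LinearSVC` rather than the LIBSVM package itself; the `reviews` and `labels` variables are placeholders for the labelled corpus.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),   # unigram + bigram bag-of-words features
    LinearSVC(C=1.0),
)
# scores = cross_val_score(clf, reviews, labels, cv=5, scoring="accuracy")
```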

Mesnil et al. [39] used a supervised reweighting of the counts, as in the Naïve Bayes Support Vector Machine method, and achieved strong results on a dataset of IMDB movie reviews.

Siagian et al. [38] exploited function words as a prospective feature and combined them with character n-grams as input features for distinguishing deceptive from truthful reviews.

Yang et al. [40] proposed an unbalanced support vector machine to deal with the lack of manually labeled deceptive reviews. The WMUSVM model, first proposed in this paper, is based on a hypersphere with maximum volume that contains all deceptive review data, while all truthful review data lie outside this hypersphere, as shown in Fig. 5.

Fig. 5 Yang et al., WMUSVM algorithm overview. The model builds a hypersphere with maximum volume containing all deceptive review data, with all genuine review data outside the hypersphere

Kennedy et al. [41, 42] utilized the BERT model to pre-train their word embeddings. So far, most well-known machine learning methods have been used as benchmark classifiers, such as Naïve Bayes, neural networks and support vector machines [47,48,49]. Barushka et al. [47] demonstrated the central importance of text preprocessing strategies for detecting spam reviews with these methods. Their experimental results indicated that the number and length of the extracted word segments had a major effect on the performance of the classifiers.

A random forest method was chosen by Nilizadeh et al. [43] as a classifier because of its value in a wide variety of applications, its resistance to over-fitting, and its utility in understanding feature importance [50, 51].

Tingxuan et al. [44] used under-sampling and over-sampling techniques to expand training datasets for imbalanced learning. Sihombing et al. [45] designed an implementation of gradient boosted decision trees that aims to detect spam and non-spam reviews on datasets from Yelp.com.

3.2.2 Semi-Supervised Learning Model

Semi-supervised learning models combine labeled data with a large amount of unlabelled data to train a classifier for the detection of spam reviews.

Most of the previous research works approach the problem from a novel angle by modeling it as positive-unlabelled (PU) learning. PU learning generally has two classes of framework:

  • Constructing a classifier by using positive sample dataset and some samples of the unlabeled dataset.

  • Learning a classifier by using positive sample dataset and the full unlabelled dataset.

Further, PU learning aims to build the major classifiers using positive and unlabelled samples with four steps:

  • Extracting the reliable negative samples.

  • Calculating the representative positive and negative samples.

  • Generating the similarity weights for spam samples.

  • Building the major SVM-based classifier.

Li and Liu (2014) first reported a supervised learning study of two classes, spam and unknown, with a Chinese review dataset from Dianping.Footnote 5 They focused on using text content, since it can detect spam reviews right after posting, so spam reviews do less damage. Further, they used Support Vector Machines (SVM) and Positive and Unlabeled (PU) learning to detect spam reviews [52]. Li and Chen [52] also used Dianping's online review datasets to examine the underlying mechanism of opinion spamming and performed supervised learning on the binary classification task. They used the complex dependencies among reviews, users and IP addresses to propose a collective classification algorithm called Multi-typed Heterogeneous Collective Classification (MHCC) and then extended it to Collective Positive and Unlabelled learning (CPU). Ren et al. [53] created a mixing population and individual property PU learning (MPIPUL) model; in this work, PU learning was carried out based on Latent Dirichlet Allocation (LDA) and SVM.

Hai et al. [54] proposed a semi-supervised multi-task learning strategy via Laplacian regularized logistic regression to boost review spam detection capability. Wu et al. [55] made use of both labeled and unlabelled data to build a semi-supervised learning model based on Bayesian inference. A semi-supervised learning system named SPR2EP (SPam Review REPresentation) was built by using document (review) and node (reviewer and item) embeddings separately [56]. Their evaluation results showed that the model built using the combined feature vectors achieves better performance.

3.2.3 Unsupervised Learning Model

Unsupervised learning models utilize only a set of unlabeled data to discover potential relationships among reviews. Existing research works refer to utilizing Generative Adversarial Nets (GANs) to generate spam samples from the original labelled samples and enhance the subsequent training process. In general, a GAN is formalized as a minimax game with the value function:

$$\begin{aligned} V(G, D)=\mathbb {E}_{\mathbf {x} \sim p_{\text {data}}}[\log D(\mathbf {x})]+\mathbb {E}_{\mathbf {z} \sim p_{z}}[\log (1-D(G(\mathbf {z})))] \end{aligned}$$
(12)
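The sketch below translates the minimax game of Eq. (12) into per-step discriminator and generator losses, assuming `D` and `G` are torch modules returning probabilities and generated samples respectively, and using the common non-saturating form of the generator loss; OCAN's LSTM-autoencoder and complementary-sample generation are not shown.

```python
import torch

def gan_step_losses(D, G, x_real, z):
    x_fake = G(z)
    # Discriminator ascends V(G, D): maximize log D(x) + log(1 - D(G(z))).
    d_loss = -(torch.log(D(x_real)).mean() + torch.log(1 - D(x_fake.detach())).mean())
    # Generator descends V(G, D); the non-saturating variant maximizes log D(G(z)).
    g_loss = -torch.log(D(x_fake)).mean()
    return d_loss, g_loss
```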

Due to the difficulty of manual labelling, Zheng et al. [32] applied a one-class classification method to address the lack of labelled spam review datasets. In this work, named OCAN, they learned only the representations of benign reviewers with an LSTM-autoencoder [32]. Interestingly, OCAN generated many complementary samples of benign reviewers to boost the capability of the discriminator, while the generator kept trying to make the discriminator fail. After this self-enhancement process, the detection model can adaptively update a text representation once the reviewer posts a new comment and predict whether the review is spam or non-spam.

3.3 Neural Networks Models

Neural network methods, also known as deep neural networks, have been broadly used in the field of computer vision, e.g., the Convolutional Neural Network (CNN), and for representing sequential information, e.g., Long Short-Term Memory (LSTM) or Recurrent Neural Networks (RNN).

Wang et al. [12] attempted to utilize a Long Short-Term Memory recurrent neural network framework to distinguish spam reviews. They established three types of layers to predict spam reviews: an input layer for receiving data, an LSTM hidden layer, and an output layer. Liu and Jing [57] experimented with bidirectional long short-term memory (BiLSTMWF) to learn document embeddings for detecting deceptive reviews. They formulated the spam review detection task as a two-class classification problem and then added the feature embeddings to the BiLSTMWF model by aggregating the feature representations. Generally, the (Bi)LSTM unit consists of an input gate, a forget gate and an output gate, given by the following equations:

$$\begin{aligned} \begin{aligned} i_{t}&=\sigma \left( W_{m i} m_{t}+P_{m i} h_{t-1}+b_{i}\right) \\ f_{t}&=\sigma \left( W_{m f} m_{t}+P_{m f} h_{t-1}+b_{f}\right) \\ o_{t}&=\sigma \left( W_{m o} m_{t}+P_{m o} h_{t-1}+b_{o}\right) \\ c_{t}&=f_{t} \odot c_{t-1}+i_{t} \odot \tanh \left( W_{m c} m_{t}+P_{m c} h_{t-1}+b_{c}\right) \\ h_{t}&=o_{t} \odot \tanh \left( c_{t}\right) \end{aligned} \end{aligned}$$
(13)
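For illustration, a minimal PyTorch BiLSTM review classifier in the spirit of the model above is sketched below; the vocabulary size, layer dimensions, mean pooling, and the absence of the hand-crafted feature embeddings are assumptions rather than the BiLSTMWF configuration.

```python
import torch
import torch.nn as nn

class BiLSTMSpamClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> logits over {truthful, deceptive}
        h, _ = self.lstm(self.embed(token_ids))   # h: (batch, seq_len, 2*hidden_dim)
        return self.fc(h.mean(dim=1))             # mean-pool the BiLSTM hidden states
```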

Barushka et al. [58] used a deep feed-forward neural network (DNN) to handle the high-dimensional feature representation and classified online reviews into spam and legitimate categories. A CNN model was developed by Archchitha et al. [59] to detect opinion spam. They mapped feature tokens to embeddings using the pre-trained Global Vectors for Word Representation (GloVe) word embedding model and built their CNN with three parallel convolution layers with different filter sizes.

Yuan et al. [60] designed a hierarchical fusion attention network to facilitate learning semantic representations at the reviewer and item levels. They considered that a reviewer may post several reviews and used TransH to encode the relationship among reviewers, reviews and products [61]. Additionally, they evaluated their model, named HFAN, on four public spam review datasets, namely Mobile01 Reviews, YelpCHI, YelpNYC and YelpZip, and the neural network based models outperformed feature-based methods. This work designed several major components for the HFAN model: (1) to capture the user-related semantic features of the review at the word level, they design the multi-attention unit (MAU). The MAU is an attention mechanism that summarizes the local context matrix to extract user-related words, denoted as follows:

$$\begin{aligned} \begin{array}{l} {\varvec{v}}_{j}^{(.)}=\sum _{t=1}^{2 r+1} \alpha _{t}^{(.)} {\varvec{X}} \\ \alpha _{t}^{(.)}=\frac{\exp \left( {\varvec{u}}_{t}^{(.)}\right) }{\sum _{k=1}^{2 r+1} \exp \left( {\varvec{u}}_{k}^{(.)}\right) } \\ {\varvec{u}}^{(.)}=\tanh \left( {\varvec{X}} {\varvec{W}}_{x}^{(\cdot )}+{\varvec{U}}_{j} {\varvec{W}}_{u}^{(.)}\right) \end{array} \end{aligned}$$
(14)

(2) To obtain the sentence representation, the paper applies a linear layer and max pooling to the sentence matrix:

$$\begin{aligned} \begin{array}{l} {\varvec{S}}_{i}=\tanh \left( {\varvec{V}}_{i} {\varvec{W}}_{v}+{\varvec{b}}\right) \\ {\varvec{s}}_{i}^{u}=\max _{d i m=1}\left( {\varvec{S}}_{i}\right) \end{array} \end{aligned}$$
(15)

(3) They utilized dot-product attention to calculate a fusion matrix that builds the relationship between the two review matrices. A TransH-based model is applied to model the user-review-product relationship, defined as:

$$\begin{aligned} l({\varvec{u}}, {\varvec{d}}, {\varvec{p}})=\left\| \left( {\varvec{u}}-{\varvec{w}}_{d}^{T} {\varvec{u}} {\varvec{w}}_{d}\right) +{\varvec{d}}-\left( {\varvec{p}}-{\varvec{w}}_{d}^{T} {\varvec{p}} {\varvec{w}}_{d}\right) \right\| _{2}^{2} \end{aligned}$$
(16)

(4) Fully connected layers are used, and the \(softmax(\cdot )\) function converts the output values into class probabilities:

$$\begin{aligned} \begin{array}{l} {\varvec{y}}={\varvec{W}}_{c}\left( {\text {relu}}\left( {\varvec{d}}_{j} {\varvec{W}}_{d}+{\varvec{b}}_{d}\right) \right) \\ p_{i}\left( c \mid {\varvec{u}}_{k}, {\varvec{d}}_{j}, {\varvec{p}}_{i}; \theta \right) =\frac{\exp \left( {\varvec{y}}_{i}\right) }{\sum _{k=1}^{c} \exp \left( {\varvec{y}}_{k}\right) } \end{array} \end{aligned}$$
(17)

DeepSpot, proposed by Nayak et al. [62], recognizes spam and non-spam reviews based on real-world reviews and generated reviews. DeepSpot applied three well-known supervised learning algorithms for text classification: support vector machines, naive Bayes, and random forest. Further, they trained an encoder-decoder system as the review generator, using a bidirectional LSTM with word embeddings. The improvement in performance further supported the idea that neural networks can capture more complex context information that is difficult to extract using conventional discrete manual features [63]. Specifically, the spam review indicator in DeepSpot was built with stacked LSTM models and outputs the prediction of each review being spam or non-spam. The proposed architecture of the spam review indicator is shown in Fig. 6.

Fig. 6 Nayak et al., review indicator framework. Review spamicity prediction is modeled as a two-class classification problem; the spamicity indicator is built with stacked LSTM models and outputs the probability of each review being genuine or fake

3.4 Graph-based Methods

Graph-based algorithms have been widely used in representation learning on networks, such as social networks and knowledge graphs. In recent years, researchers have gradually realized that the above feature-based methods ignore the relationship among reviews, reviewers and products. Under some circumstances, however, the connections between different objects play an important role in spam review detection. For this reason, some researchers have begun to apply graph-based methods to capture features among different entities. Motivated by graph embedding, existing works focus mainly on graph neural networks and graph convolutional networks.

3.4.1 Graph Neural Networks

Graph neural networks (GNNs) are deep learning based methods that operate on homogeneous or heterogeneous graphs [64]. Machine learning tasks on GNNs can be classified into node classification for predicting the type of a given node, link prediction for predicting whether two nodes are connected, community detection for recognizing densely connected clusters of nodes, and network similarity for assessing how similar two networks are.


Review Graph: The first GNN-based spam review detection strategy was proposed by Wang et al. [28]. They built a heterogeneous "review graph" to represent the relationship among reviewers, reviews and stores [28]. This was the first time a graph model with three types of nodes was used to capture spam review clues. Each node is associated with a set of features; for instance, a store node has features such as its number of reviews, its rank rating, etc. They further proposed three crucial concepts to recognize different entities, i.e., the trustiness of reviewers, the honesty of reviews, and the reliability of stores. Wang et al. also created an iterative computation framework (ICF) to compute reliability, trustiness and honesty by exploring the inter-relationships among them.


Spamicity Degree: Noekhah et al. [22] first proposed the "spamicity" concept to define to what extent an entity is spam. First, they extracted proper and efficient features from Amazon datasets, and then designed an effective learning algorithm to update the corresponding spamicity degree of each entity iteratively [22]. Each entity's spamicity degree is updated iteratively based on its features and the values from the previous iteration, and the final value is used to determine whether the entity is spam or not.


SpEagle: Rayana and Akoglu [14] utilized clues from all metadata, such as text, timestamp and rating from Yelp.com datasets [14], as well as relational information, and combined them in a unified framework to identify spam reviews. Further, they built a user-review-product model with a pairwise Markov Random Field (MRF) [65] to handle a network-based classification task. They also designed a light version of SpEagle, called SpLite, which uses a very small set of review features to boost the computation speed.


Coherence Metrics Computation: Yang et al. [23] found that human writing naturally exhibits certain word transition patterns between two consecutive sentences. When a word appears in one sentence, certain words can be observed in the following sentence with some probability [66]. However, such transition patterns in spam reviews can be disrupted due to their deceptive nature. They therefore first defined some coherence metrics to analyze review coherence at the sentence level [23]. They proposed a bipartite graph to model all store-sentiment word pairs, the set of reviews, and the associations between reviews and store-sentiment word pairs. Further, Yang et al. provided several metrics to measure the coherence of a review based on two kinds of information: word transition probability and word concurrence probability.


NetSpam: Shehnepoor et al. [15] utilized spam features to model review datasets as heterogeneous information networks, mapping the spam review detection strategy into a classification problem on such networks [15]. The main contribution of this work was the proposal of distinctive metapath types, which were novel in the spam review detection task. A metapath extends the concept of edge types to path types and describes the different relations among nodes through indirect links, i.e., paths, which also carry differing semantics. Further, they developed the classification part of the detection task in two steps: metapath weight calculation and final probability calculation.


ATF: Weng et al. [67] created an efficient and scalable AnTi-Fraud framework (ATF) to identify e-commerce fraud on large-scale e-commerce platforms. The ATF architecture has three components: a preprocessor, a Graph-Based Detection module (GBD), and a Time Series based Detection module (TSD). The GBD is designed as a user-item bipartite graph, as part of the overall framework, for performing spam detection by leveraging the structural and behavioral characteristics of e-commerce spam activities.


Trust Propagation: Xue et al. [68] proposed a three-layer trust propagation model based on the inter-dependencies among three types of nodes: users, reviews, and statements. Different from the bipartite-graph-based two-layer models, the three-layer model provides an extra intermediate layer to represent the influence on one review from other reviews about the same object. Further, they developed an iterative content-based computational model to compute honesty scores for different entities.


Ianus: Yuan [69] used a Sybil detection method that leverages account registration information. They modeled spam detection as a graph inference problem, which integrated heterogeneous features. Further, they constructed a registration graph that integrated the heterogeneous synchronization and anomaly patterns.

SemiGNN: The research work of Wang et al. [70] mainly focused on tackling three challenges: bridging labeled and unlabeled data, handling data heterogeneity, and building an interpretable model. To address these challenges, they proposed a semi-supervised graph neural model with an attention mechanism.


GEM: Liu et al. [71] presented a neural-network-based graph model, named Graph Embeddings for Malicious accounts (GEM), which considered both "device aggregation" and "activity aggregation" in heterogeneous graphs. They focused on dealing with multiple types of nodes and proposed an attention mechanism to learn the importance of each node type. Further, they partitioned the network into subgraphs according to node types and calculated attention coefficients for recognizing spam accounts [71].

3.4.2 Graph Convolutional Networks

Graph convolutional networks can be considered a simplification of traditional graph spectral methods; the common strategy is to model a node's neighborhood as the receptive field and then apply the convolution operation in the deep-learning pipeline. The graph convolution operator is defined as feature aggregation over one-hop neighbors; with multi-layer convolution operations, information can be transferred across multiple hops. GCN-based methods achieve significant improvements compared to earlier graph embedding methods such as DeepWalk [72]. After Perozzi et al. [72] first proposed DeepWalk, which applies the SkipGram model [73] to generated random walks, a large number of scholars have engaged in this area.


GCNN: Alhosseini et al. [74] developed a model based on graph convolutional neural networks (GCNN) for spam bot detection. The key idea of this work was an inductive representation learning approach based on reviewer profile information and the social network graph of Twitter datasets. The inductive representation learning method is similar to GraphSAGE [75], with a propagation layer that has two sub-layers: aggregation and combination. Finally, they compared with a multilayer perceptron (MLP) and belief propagation (BP) [76] and obtained better performance on the detection task.


FdGars: Wang et al. [77] first applied a graph convolutional network approach to spam review detection in an online app review system. In particular, they extracted content features and behavior features for each reviewer based on their review logs. Then, the review logs were transformed into a rule-based graph structure. They also designed a labeling strategy to mark highly suspicious spammers and benign users. Further, they trained a semi-supervised GCN model to learn node features and graph structure, and evaluated FdGars on real-world review datasets from the Tencent App Store.


MNCN: Ghadery et al. [78] proposed a deep convolutional network architecture that optimizes three different objective functions at the same time to address spam review detection. Specifically, they used three parallel convolution layers to capture text features from the input reviews, with components such as convolution filters, n-gram feature maps and a max-pooling layer [78].


GAS: Li and Qin [6] first applied a GCN-based method to the spam advertisement detection problem and extended the GCN algorithm to heterogeneous graphs. The heterogeneous graph captures the local context among reviews, users and products, while the homogeneous graph utilizes global context extracted only from reviews. In particular, the key point of the heterogeneous graph was to customize the aggregation sub-layer and combination sub-layer with an attention mechanism and a time-related sampling procedure. Besides, they used an approximate KNN graph algorithm [79] to construct the comment graph based on the K nearest neighbors of nodes. Further, they utilized the TextCNN [80] model to obtain comment embeddings and integrated it into their graph neural network model as an end-to-end classification system.

4 Data Resource

As mentioned above, most spam review detection tasks are highly dependent on labeled data. However, there are few well-labeled datasets for supervised or semi-supervised learning tasks in the real world. Moreover, the data resources from existing research work are also hard to collect. In this section, we mainly focus on the open-source datasets.


TripAdvisor: This corpus consists of truthful and deceptive hotel reviews of 20 Chicago hotels. The data are described in two papers according to the sentiment of the review [46, 81]. Modeled as a graph, it only has two entities {hotel, reviews} and two classes {truthful, deceptive}.

This dataset contains 400 truthful positive reviews from TripAdvisor, 400 deceptive positive reviews from Mechanical Turk, 400 truthful negative reviews from Expedia, Hotels.com, Orbitz, Priceline, TripAdvisor and Yelp, and 400 deceptive negative reviews from Mechanical Turk. Each of the above groups consists of 20 reviews for each of the 20 most popular Chicago hotels, as summarized in Table 4.

Table 4 Op_spam_v1.4 Datasets statistics

Amazon: This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996–July 2014 [82, 83]. It includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). Table 5 shows various features of different categories of products [84].

Table 5 Various features of different categories of products

Each Amazon.com review record contains the following features: user/item interactions, star ratings, helpfulness score, timestamps, product reviews, price, brand, category information and other metadata. In particular, the "helpful score" describes the helpfulness rating of the review, e.g., 2/3. Some scholars have used this indicator as a threshold for supervised training and applied it to the classification task [85].


Yelp: We can use the results of Yelp.com's commercial spam review filter as the ground truth for performance evaluation by crawling the "not recommended" information at the bottom of the review page. Meanwhile, there are three datasets from Yelp.com available to use; their summary statistics are given in Table 6.

Table 6 Review datasets in Yelp.com

The YelpCHI dataset was first used by Liu et al. [86] and contains user comments from the restaurant and hotel domains in the city of Chicago on the Yelp website. The NYC and ZIP datasets were created by Rayana et al. [14]. NYC contains online reviews for restaurants located in New York City, while ZIP collects reviews for restaurants according to zipcode. The zipcodes are organized by geography, so this process gives us reviews for restaurants in a contiguous region of the U.S. map, including NJ, VT, CT, and PA [14].


Dianping: This Chinese dataset consists of filtered (spam) reviews and unfiltered (unlabeled) reviews from 500 restaurants in Shanghai. It can be modeled with three types of nodes: user, review and IP address. This dataset was used in [52], and its statistics are shown in Table 7.

Table 7 Statistics of the 500 restaurants in Shanghai

5 Conclusion and Future Work

In the sections above, we introduced the basic motivation for detecting spam reviews on e-commerce platforms. Then, we presented the categories of detection tasks, including review mining, end-to-end classification, the cold-start problem and spammer detection. Further, we summarized the existing techniques of spam review detection and divided them into four categories: traditional statistical methods, machine learning, neural networks and graph-based methods. Specifically, we discussed machine learning methods in detail from three aspects: supervised learning, semi-supervised learning and unsupervised learning. We then collected state-of-the-art graph-based techniques for spam review detection and summarized the core idea of each approach under two subcategories: GNN and GCN. Finally, we presented four available datasets from public websites and described the data structure of each open-source dataset.

Previous research has done substantial work on spam review detection. Most scholars have used supervised learning, pattern discovery, graph-based methods, and relational modeling to solve the problem. However, there is still a lack of state-of-the-art GCN-based techniques for real-world spam review detection, so designing an effective graph convolutional network algorithm will be a promising research direction for the spam review detection task.