1 Introduction

Although the significance of writing style and authorial linguistics in identifying fake reviews has been previously explored (Chang & Chen, 2019; Mousavizadeh et al., 2022; Tripathi et al., 2022), its impact on persuading consumer buying preferences is not well understood. By “persuading consumer buying preferences”, we mean nudging online browsers to consider or reject a product before the actual act of buying (Zhang & Piramuthu, 2016), specifically through fraudulent reviews and misrepresentation of facts (Ismagilova et al., 2020). Extant literature establishes that deciphering customer purchase behavior remains a crucial marketing objective, wherein buying preferences play a central role (Yan et al., 2015; Bitran & Mondschein, 1997; Han et al., 2009). Furthermore, previous research on misrepresentation of online word of mouth shows that consumers may fail to detect manipulations, and thus their buying preferences might be driven by fake reviews (Kumar et al., 2022).

While most past studies have tried to identify the motivation and rationale behind misinformation, the literature is silent on the underlying characteristics of misinformation (Bang et al., 2021; Valecha et al., 2021), particularly fake reviews (Assaf et al., 2015; Sun & Kim, 2013; Thakur et al., 2018). A few attempts using linguistic features appear in the extant literature (Aghakhani et al., 2022; Banerjee & Chua, 2017; Tripathi et al., 2022); however, their practical applications are limited. Other past studies have offered a fragmented understanding of the dynamics concerning fake reviews: either the social factors of fake (and negative) reviews are identified, or the impact of fake (and positive) reviews on reinforcing purchase intentions is analysed (Salehi-Esfahani & Ozturk, 2018). However, no study has examined in depth the inherent nature of fake reviews based on how they affect product evaluation. Furthermore, no study has compared these characteristics with those of genuine reviews.

This lack of theoretical investigation has produced a dearth of insights and, consequently, an absence of mechanisms to detect the missing links between linguistic features and the persuasion of buying behavior. Moreover, the underlying characteristics of fake reviews have not been comprehensively studied, and their impact on nudging product preferences remains largely unknown. Given this gap in the extant literature, an important question needs to be addressed: How does review manipulation affect product recommendation or discouragement? Grounded in the above arguments and the findings of past works, and founded on the theory of environmental psychology in the form of the Stimulus-Organism-Response (SOR) model, we believe that an answer to this research question lies in the pillars of the SOR model (Mehrabian & Russell, 1974; Mo et al., 2015): the review itself is the stimulus, which influences the consumer as the organism and finally shapes the response through product recommendation or discouragement (Mehrabian & Russell, 1974).

In this context, a consumer’s belief may be distorted depending on whether the review is honestly based on the product experience, or is spurious and without experience (Senecal & Nantel, 2004). Therefore, it becomes crucial to detect the role of underlying characteristics of fake reviews on persuading buying preferences, and whether they are governed by the authorial writing styles (Sun & Kim, 2013; Tripathi et al., 2022). As a result, we aim to capture how fake reviews’ latent or hidden characteristics influence product recommendation or discouragement. Since reviews are short descriptions of consumers’ experience, they consist of diverse vocabulary and thus are difficult for manual comprehension to unveil and identify common concerns. We use computational intelligence comprising text mining and natural language processing (NLP) to uncover these hidden characteristics (Sheth, 2021; Pitt et al., 2020). These dimensions are latent in nature and otherwise are unobservable by manual comprehension. The study at hand utilizes the technological strength complemented with an advanced analytics application, structural topic modelling (STM) (Fresneda et al., 2021), to understand the cognitive side of consumer response behavior. Therefore, the contribution of our work falls at the interface of information science and consumer behavior for electronic commerce.

In this realm, results in the extant literature are disjoint in terms of the context, such as the geography (Zhu & Zhang, 2010), associated services (Mejia et al., 2020; Zhu et al., 2021), and particularly, the product category (Ghose & Ipeirotis, 2011; Li et al., 2022; Mousavizadeh et al., 2022; Nikolay et al., 2011). Thus, we answer the above research question for a comprehensive set of product categories to give an overarching set of findings. The resultant hypotheses which we put to test in this research are as follows: (1) whether product-specific linguistic features alone can explain the persuasion of consumer purchase preferences; (2) whether linguistic features help in explaining the product-specific underlying characteristics of fake reviews; (3) whether product-specific fake review characteristics affect the persuasion of buying preferences; and finally, (4) whether product-specific underlying characteristics of fake reviews play an intermediary role between linguistic features and buying preference persuasion.

Previous studies have observed that buying intentions are influenced mainly by customer satisfaction, particularly ratings, for e-commerce (Ismagilova et al., 2020; Ku, 2012; Shiau & Luo, 2012). Barring the act of receiving the ordered product, almost all transactions in the e-commerce ecosystem are made virtually, where the opinions of the online crowd matter (Pi et al., 2011). Studies using ratings as a close proxy for satisfaction have also reported mixed findings on their impact on product preference (Ku, 2012; Shiau & Luo, 2012; Tripathi et al., 2022), and therefore, we test for their influence on the above-postulated relationships. Accordingly, we investigate the moderating effect of ratings on the relationship between underlying characteristics and the persuasion of buying preferences. In summary, we propose a novel approach to unveil the features underlying fake reviews (both positive and negative) and their influence on nudging buying behavior.

To reach a more generalized conclusion, we also verify our conceptualization for a collection of real reviews. This comparison reveals that fake and honest reviews have significant authorial features that may have an impact on persuading buying preferences. This study shows that the complex relationship between authorial writing style-based features and buying preference propulsion is fully recognized through the mediating role of characteristics underlying genuine reviews. One of our central findings is that fake reviews have latent characteristics that are superficial and lack the potential to impact the persuasion of buying preferences. Our findings have significant implications for practice, as we conclude that permitting fake reviews on e-commerce portals may not offer any significant economic benefit. Our conclusion is also consistent with a few seminal works; however, they are mainly based on lab experiments or survey-based methodology. Previous works have elucidated the importance of relying on the first-hand opinion of the vast consumer base, for example, the importance of using online reviews to understand adoption by consumers in multiple contexts like e-learning (Chakraborty & Biswal, 2022), mobile payments (Kar & Aswani, 2021), food delivery services (Ray & Bala, 2021) and online hotel bookings (Zhao et al., 2019). None of the extant works have used user reviews directly to arrive at such a conclusion for uncovering the persuasion of buying preferences for e-tailing (Chu & Chen, 2019; Nikbin et al., 2022).

For this study, we have looked at products across different categories on Amazon and inferred the results for each of those categories. This was done so as not to combine two or more categories which might be unrelated. For instance, combining reviews of automotive equipment and beauty products would be counter-intuitive because of different consumer bases, age groups and expectations from the shopping experience. The “side effects” for beauty products may not be applicable to the rest of the categories such as office or sports. Likewise, “material” was one of the pertinent latent factors underlying fashion products; however, it is unlikely for material to be a relevant factor for the other product categories. Thus, the study of products in different niches helps ensure the accuracy and generalizability of the results. Hence, we divided the products across several categories in tandem with the catalogue of the e-commerce giant. Thus, we highlight the importance of differentiating the latent review characteristics in terms of the product category they belong to. It is observed that the influence of review characteristics in nudging buying preferences varies with product category. Hence, this study endeavours to understand which products are more prone to review manipulation and its corresponding impact on nudging product preference(s).

In the next section we discuss the theoretical background and conduct a literature review on the consumer decision-making process on e-commerce portals. We also develop our hypotheses based on consumer response grounded in SOR theory. In Section 2 we present the research methodology. We develop a research summary diagram, followed by details of data pre-processing and identification of latent factors in reviews. In Section 3 we discuss the results obtained across the product categories with emphasis on descriptive statistics and their implications. Section 4 elaborates on the theoretical and managerial implications of our findings; we dedicate separate subsections to each. Finally, Section 5 concludes our paper; here we provide some additional implications for future research and discuss the limitations of our study.

1.1 Theoretical Background

Our study is grounded in the theory of environmental psychology, with the three pillars of SOR as its foundational elements (Mehrabian & Russell, 1974). The theory postulates that in any practical setting, there is an external stimulus affecting the organism, which in turn responds through arousal, pleasure, and dominance. The external stimuli are defined as elements which motivate the person to change their current disposition towards the environment. The organism is defined as the internal state that intervenes between the external stimuli to the person and the final action taken by the person. Finally, the response represents the action taken by the individual. Since its development, the SOR framework has been widely applied in the consumer behavior discourse (Laato et al., 2020; Jacoby, 2002). Chang et al. (2011), in their seminal paper on hedonic motivation for impulse buying behavior, consider the retail environmental characteristics as the external stimuli, the consumer’s emotional response as the organism and the consumer’s behavioral response as the action taken.

We draw our motivation and analogy from the above theoretical underpinnings and present the reviews as an external stimulus for consumers which leads to an internal emotional response and ultimately shapes the behavior of the customer in the form of an external response (buying or not buying) reflected through their recommendation behavior. The theoretical framework is depicted below in Fig. 1. Furthermore, the stimulus (reviews in this case) is perceived to be trustworthy, mostly if they are based on actual product experiences (Senecal & Nantel, 2004). The role of actual product-based experience becomes instrumental in shaping response behaviors. Therefore, in this study, we hope to distinguish between honest reviews based on actual experience, vis-à-vis, fake reviews which are fabricated and misinformative. We use the SOR framework as our theoretical grounding and test whether consumer response through external stimuli can only be achieved through genuine reviews based on actual consumer experience or even through fake reviews. The role of product characteristics underlying fake and genuine reviews would help us capture their individual impact on the recommendation responses. Overall, the role of stimuli in the form of writing style and its influence on perceiving the underlying product characteristics and future recommendation has never been analysed. This study therefore is the first of its kind to unveil the novel manifestation of writing style with respect to consumer perception and response behavior.

Fig. 1 Proposed research model (Adapted from Mehrabian & Russell, 1974) [*Mediator; **Moderator]

1.2 Proposed Research Model and Hypotheses Development

With the emergence of Web 3.0, online reviews have become an essential outlet for consumer communication (Chanias et al., 2019; Wang et al., 2018a, b; Xu & Jin, 2022). The improvement in customer engagement has been intertwined with the rise of unscrupulous organisations who create spam profiles to promote or denigrate products on social media (Aswani et al., 2018; Aswani et al., 2019; Michail et al., 2022). Consumer reviews are so ubiquitous in the modern world that consumers’ purchase intentions and decisions are deeply affected by them (Kwon et al., 2014; Senecal & Nantel, 2004). Purchase intention has been defined as the probability of a consumer’s willingness to exhibit a purchase behavior (Nikbin et al., 2022; Wien & Olsen, 2012; Xu & Jin, 2022). It is a well-known measure to predict purchase behavior in the marketing literature. One of the major advantages of online reviews is that consumers across various geographical locations can share their buying and product experiences (Chanias et al., 2019). Research suggests that online reviews are more persuasive than any marketing initiative by the company, as these reviews are assumed to be free from vested interest and thus more credible (Plotkina & Munzel, 2016; Jang et al., 2012). Furthermore, perceived credibility reduces consumer risk and eases the buying process (Ma & Atkin, 2017). Literature has also established that it is common for users to post dishonest reviews to either publicize or malign a product (service) using e-commerce portals (Wu et al., 2019). Despite efforts by e-commerce platforms, the number of fake reviews on them is extremely high (Plotkina & Munzel, 2016). The consensus is that at least one-third of the consumer (online) reviews are susceptible to manipulation. Hence, it is fair to assess that fake reviews can impact consumers’ buying decisions.

As stated, customer reviews on online purchase platforms have been found to affect consumer purchase intentions (Heydari et al., 2015; Xu & Jin, 2022). The sentiment of the review (positive or negative) can lead to opposite effects on prospective consumers (Hao et al., 2010). Earlier studies confirm that linguistic styles with language subjectivity boost consumers’ purchase intentions (Liu et al., 2018; Nikbin et al., 2022; Wang et al., 2022). Studies also report significant differences in the micro-linguistic constituents of fake and genuine reviews (Chatterjee et al., 2021). Despite the knowledge about influence of online word-of-mouth on purchase intention, the linguistic complexity of the reviews and their effect on purchase intention are still poorly understood (Liu et al., 2020). Linguistic features include writing style, part of speech, content vs. functional words, and lexical richness, among other characteristics (Bang et al., 2021; Jabeur et al., 2023). Furthermore, hidden signals of consumer psyche can also be captured through important revelations of consumer sentiments, opinion and emotional valence (Jiang et al., 2017; Zhu et al., 2021). In this study, we therefore conceptualize and measure writing style as a second order construct, by capturing linguistic features, sentiments and emotions as sub-constructs. Hence, we hypothesize that writing or authorial style related features can persuade the purchase preference of the customers:

  • H1: Authorial style of fake reviews does not impact buying persuasion propulsion in customers.

  • H1a: Authorial style of fake reviews impacts buying persuasion propulsion in customers.

Likewise, the following hypotheses are formulated for genuine reviews:

  • H2: Authorial style of genuine reviews does not impact buying persuasion propulsion in customers.

  • H2a: Authorial style of genuine reviews impacts buying persuasion propulsion in customers.

The online reviews of products consist of customer opinions on various product characteristics, which help the customers decide whether to buy the available product and/or service(s) (Wang et al., 2018a, b). Reviews posted by other consumers provide ample information about the offerings, facilitating new consumers’ buying decisions (Eslami & Ghasemaghaei, 2018; Mudambi & Schuff, 2010). Furthermore, customers are additionally interested in the pertinent information which is latent in reviews (Zhan et al., 2009), and whether or not the reviews are based on actual experiences (Senecal & Nantel, 2004). Review quality, as another reflection of latent information, is also known to shape the product evaluation process and ultimately influence buying intentions (Lee & Shin, 2014). However, the above relationship is vastly dependent on the product type and accordingly, we acknowledge the importance of unveiling product-specific characteristics (Lee & Shin, 2014). In this regard, product-related features discussed in online consumer reviews are expected to play an instrumental role in shaping consumers’ purchase-related decision-making process (Zhang & Piramuthu, 2016). Hence, we hypothesize the following:

  • H3: Product-specific characteristics underlying fake reviews do not impact buying persuasion propulsion in customers.

  • H3a: Product-specific characteristics underlying fake reviews impact buying persuasion propulsion in customers.

Similarly, we postulate that product characteristics underlying genuine reviews propel consumer buying persuasion:

  • H4: Product-specific characteristics underlying genuine reviews do not impact buying persuasion propulsion in customers.

  • H4a: Product-specific characteristics underlying genuine reviews impact buying persuasion propulsion in customers.

Following the above discussion, we postulate that both the authors’ writing style and the hidden characteristics underlying the user-generated reviews impact buying persuasion (Jabeur et al., 2023; Liu et al., 2020; Mudambi & Schuff, 2010). This manifestation becomes more crucial due to the burgeoning user generated content in the e-commerce realm. Resultantly, consumers look forward to precise and meaningful content in reviews which sufficiently explain product aspects before forming an opinion (Sangeetha et al., 2018; Zhang et al., 2018). A few studies have investigated the tone (negative) of consumer reviews to form a perspective towards product attributes, which ultimately influences their buying decision (Lee et al., 2008). In this line, we postulate that overall writing style can be considered as an antecedent to the product characteristics revealed through the reviews, which ultimately will influence product recommendation or discouragement. It is thus proposed that writing style affects buying persuasion not only by itself but also through the review’s features. Therefore, consistent with the above formulations, we hypothesize the following:

  • H5a: Product-specific characteristics (expressed through writing style) underlying fake reviews mediate the relationship between authorial style and buying persuasion propulsion.

  • H6a: Product-specific characteristics (expressed through writing style) underlying genuine reviews mediate the relationship between authorial style and buying persuasion propulsion.
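
As an illustrative sketch only (not the PLS-SEM specification estimated later), the mediation idea behind H5a and H6a can be written as two linear path equations. The symbols are assumptions introduced here for exposition: \(S\) is the authorial style score, \(C\) the product-specific characteristic score, \(Y\) the buying persuasion outcome (treated as continuous for simplicity), and \(a\), \(b\), \(c'\) are path coefficients.

$$C = a\,S + {\varepsilon }_{C}, \qquad Y = c'\,S + b\,C + {\varepsilon }_{Y}$$

Under these assumptions, the indirect effect of style on persuasion through the characteristics is \(a \cdot b\) and the total effect is \(c' + a \cdot b\). Mediation is supported when \(a \cdot b\) differs from zero; if \(c'\) is additionally indistinguishable from zero, the mediation would usually be described as full.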

Customer rating has been defined as a quantitative measure of customer satisfaction, leading to increased purchases (Hu & Liu, 2004). Research has demonstrated that prospective consumers receive value from positive customer ratings (Clemons et al., 2006). Likewise, negative customer reviews can reduce the buying intention of the prospective customer. Past literature also documents the importance of ratings in evaluating the credibility of reviews (Besbes & Scarsini, 2018). There is further evidence that ratings are positively associated with both product quality and purchase intention (Flanagin et al., 2014). A positive correlation may not necessarily indicate the effect of ratings on shaping product perception and/or product preference, but at least reflects the role in influencing the relationship between product quality and buying intentions. Based on the above arguments, ratings can be postulated to strengthen or weaken the relationship between reviews and buying persuasion. Hence, we hypothesize the moderating effect of ratings on the above relationships:

  • H7a: Customer ratings moderate the relationship between product-specific characteristics underlying fake reviews and buying persuasion propulsion.

  • H8a: Customer ratings moderate the relationship between product-specific characteristics underlying genuine reviews and buying persuasion propulsion.
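
A hedged way to read H7a and H8a is through an interaction term. The equation below is a generic moderation sketch under assumed notation, not the authors' estimated model: \(C\) is the product-specific characteristic score, \(R\) the customer rating, \(Y\) the buying persuasion outcome, and \({\gamma }_{0}, \dots , {\gamma }_{3}\) are coefficients.

$$Y = {\gamma }_{0} + {\gamma }_{1} C + {\gamma }_{2} R + {\gamma }_{3}\left(C \times R\right) + \varepsilon$$

In this form the effect of \(C\) on \(Y\) is \({\gamma }_{1} + {\gamma }_{3} R\), so it changes with the rating. A non-zero \({\gamma }_{3}\) would indicate that ratings strengthen or weaken the relationship, which is what the moderation hypotheses test.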

2 Research Methodology

The central motive of this work is to identify the latent characteristics underlying fake reviews. Moreover, we investigate their direct and intermediary role in nudging buying preferences. The writing style is hypothesized to be an antecedent to the above relationships. To achieve these objectives, we have adopted a comprehensive and robust methodology. For a better understanding, we have divided the methodology into various components. First, the data collection and pre-processing are explained in Subsection 2.2. In Section 2.3, we elaborate on the use of NLP and text mining to uncover the latent characteristics underlying the reviews. Finally, a statistical analysis is presented in Section 2.4. One of our goals in this research work is to build inferential models by the application of statistical modelling and machine learning in tandem to produce results which are interpretable (Kar & Dwivedi, 2020). Figure 2 presents a summary diagram of the entire methodology. The portion relevant to the statistical analysis is outlined in the figure.

Fig. 2 Overview of the adapted research methodology

2.1 Theory Building and Validation

We have framed our research methodology based on the guidelines for theory building in big data research by Kar and Dwivedi (2020). They have specified the following major steps for theory development in the big data setting; we explain each of them in the context of our research.

Research questions and hypothesis development: We have formulated research questions that address the gaps in the extant literature. Specifically for this study, we intend to investigate the role of fake reviews in shaping the buying persuasion of consumers. Grounded in the theoretical underpinnings of environmental psychology and a synthesis of the extant literature, we develop appropriate hypotheses and test them through statistical analysis.

Data acquisition and conversion: We collect user review data from the prominent e-commerce platform Amazon.com. The data consists of consumers’ self-expression of product experience as textual reviews along with their ratings. The data is cleaned and converted from unstructured to structured form, followed by the application of text mining algorithms. The details of the approach are elaborated in the subsequent Section 2.2.

Big data analytics: The application of big data analytics is central to our research. We use NLP to identify the latent dimensions of product characteristics using STM. Use of NLP was warranted since reviews are short in nature and have diverse vocabulary to express common concerns. Therefore, statistical distribution of words helps us in getting a coherent understanding of various dimensions underlying the consumer description. These dimensions are latent in nature and otherwise are unobservable by manual comprehension. Moreover, we could arrive at scores for the entire corpus of reviews corresponding to all the latent factors. This is incrementally beneficial to the extant approaches, since our study is not based on a small set of respondents. Rather, the findings are more generalizable and conclusive. The details of the adopted approach are discussed in Section 2.3.

Model specification and validation: We map the dimensions obtained through topic modelling to the constructs of our research framework. Each of the hidden dimensions acts as an underlying latent factor of the product characteristics, with several topics representing a product category (organism). The identification of topics is based on a set of keywords from two sources: evidence from previous literature of examined factors and individual discretion of authors. Here to address the inherent bias in the process, we cross-check with the past works to identify similarity with earlier literature on consumer behavior. Therefore, the explored factors have references of variables predominant in the consumer behavior literature. The authors then discussed with each other until they reached mutual agreement on the identified latent factors in the underlying reviews (Kar, 2020). The labelling of the buying persuasion could be the source of another bias; hence the second author took extreme care in labelling the reviews and confirmed the label with subject matter experts on contentious reviews.

2.2 Data Collection and Pre-Processing

To study the latent pattern, we gather consumer reviews and user feedback from Amazon.com, Inc., the e-commerce giant (Chevalier & Mayzlin, 2006; Das & Chen, 2007; Kaushik et al., 2018; Nikolay et al., 2011). The dataset contains 21,000 reviews, equally divided and labelled as fake or genuine. Given that comprehensive and generalized findings are lacking in the existing literature, we intend to unveil the characteristics underlying each product category. Therefore, the first task was to segregate the data into broad product categories. Though the raw data had 30 different product categories, the instances per category were limited. We thus referred to the Amazon portal (Footnote 1) to make the representation coarser. We arranged the 30 categories into 11 broader categories, namely: automotive (includes auto, tools, and industrial supplies) − 700, baby & baby products (baby and toys) − 700, beauty & healthcare (beauty, personal, and healthcare) − 700, books − 350, electronics (camera, Electronics, PC & wireless) − 1400, entertainment (Music & Instruments, Video Games, Video DVDs) − 1050, grocery − 350, fashion (Apparel, Jewellery, Shoes & Watches) − 1400, home & kitchen (Home, Kitchen, home improvement, Home entertainment, Lawn & garden, Outdoors) − 2100, office − 350, pets − 350, sports & luggage (Sports & Luggage) − 700.

This segregated data was used to extract and identify the characteristics underlying fake reviews for each of the 11 broader categories. Next, we wanted to capture hints of persuading consumers’ buying intentions. Reviews mentioning a product recommendation were selected if any of the following keywords/phrases appeared: “buy”, “purchase”, “recommend”, “go for it”, etc. Out of the fake reviews, 4411 reviews pertained to explicit recommendations. Later, the second author manually labelled these reviews into two categories: promote or discourage buying intentions. This exercise helped create a dichotomous dependent variable identifying the direction of nudging of the buying intentions. The detailed count of such reviews and the corresponding distribution is reported later in the Findings section (Section 3).
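
The keyword-based selection described above could be sketched roughly as follows. This is a minimal illustration under assumptions, not the authors' pipeline: only the four keywords quoted in the text are used (the source list ends with “etc.”), and the matching rule (case-insensitive, anchored at the start of a word so that forms such as “recommended” also match) as well as the function names are hypothetical.

```python
import re

# Keywords quoted in the text; the original list ends with "etc.",
# so this is only the documented subset, not the full list.
RECOMMENDATION_KEYWORDS = ["buy", "purchase", "recommend", "go for it"]

# Assumed matching rule: case-insensitive, anchored at a word start,
# so inflected forms such as "buying" or "recommended" also match.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in RECOMMENDATION_KEYWORDS) + r")",
    re.IGNORECASE,
)


def mentions_recommendation(review: str) -> bool:
    """Return True if the review contains any recommendation keyword."""
    return _PATTERN.search(review) is not None


def filter_reviews(reviews):
    """Keep only the reviews that mention a recommendation keyword."""
    return [r for r in reviews if mentions_recommendation(r)]
```

The selected reviews would still need the manual promote/discourage labelling described in the text, since a keyword hit such as “do not buy” says nothing about direction.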

2.3 Extracting Latent Dimensions

Using NLP and text mining, it is possible to extract the underlying structure of documents (Kar & Kushwaha, 2021). Specifically, topic modelling is a technique that assumes that documents are mixtures of themes, while each theme is a distribution over a group of words (Blei et al., 2003; Pang & Lee, 2008). The prominent techniques for topic modelling assume latent distributions and dimensions, viz., Latent Dirichlet Allocation, Latent Semantic Analysis and Probabilistic Latent Semantic Analysis (Blei, 2012; Manning & Schutze, 1999). The above techniques, particularly Latent Dirichlet Allocation (LDA), are widely used for generating themes from a corpus of documents (Bholat et al., 2015; Hendry & Madeley, 2010). While these techniques are beneficial in uncovering the underlying themes, they cannot reflect on the trajectory of these themes. Thus, we use the structural topic model (STM), which can capture the relationship between the subject and other covariates of the meta-data (years in this case) (Sharma et al., 2021).

Principally, the above techniques assume a latent distribution of words across different themes for any textual corpus. Consequently, the orientation of each document towards all the latent dimensions is measured (Blei et al., 2003). Therefore, one may infer that each document shares a proportion of each of the latent themes assumed under STM (Tirunillai & Tellis, 2014). Intuitively, each document defines a subset of these K dimensions by selecting a list of words that best explain the underlying topic (Blei, 2012). To elaborate further, STM performs two broad computations. The first is the probability \({\theta }_{d,k}\), denoting the orientation of document d towards each of the K latent dimensions or assumed themes (k = 1, ..., K). The vector \({\theta }_{d}\) follows a Dirichlet distribution with parameter \(\alpha\):

$${\theta }_{d} \sim \mathrm{Dir}\left(\alpha \right), \quad \alpha =\left({\alpha }_{1}, {\alpha }_{2},\dots , {\alpha }_{K}\right)$$
(1)

The second computation involves the latent dimension \({z}_{d,n}\) assigned to the nth word \({w}_{d,n}\) of the dth document. The distribution of words \({\beta }_{k}\) over the entire vocabulary obtained from the corpus, conditioned on the assumed dimension, again follows a Dirichlet distribution with parameter \(\eta\): \({\beta }_{k} \sim \mathrm{Dir}\left(\eta \right)\).

The joint likelihood is then

$$P\left(\theta ,\beta ,w,z \mid \alpha ,\eta \right)= {\prod }_{k=1}^{K}P\left({\beta }_{k} \mid \eta \right){\prod }_{d=1}^{D}\left[P\left({\theta }_{d} \mid \alpha \right){\prod }_{n=1}^{N}P\left({z}_{d,n} \mid {\theta }_{d}\right) P\left({w}_{d,n} \mid {z}_{d,n},\beta \right)\right]$$
(2)
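
To make the generative story behind the joint likelihood concrete, the following sketch samples a toy corpus from the Dirichlet-based process described above. It is a simulation for intuition only, not the estimation procedure and not the STM software used in the study. The function names, the symmetric prior \(\eta\) and the fixed document length are assumptions made for illustration; a Dirichlet draw is obtained by normalizing independent Gamma draws.

```python
import random


def sample_dirichlet(params, rng):
    """Draw from Dir(params) by normalizing independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [x / total for x in draws]


def sample_corpus(n_docs, n_words, alpha, eta, vocab_size, seed=0):
    """Simulate theta, beta and word ids under the Dirichlet topic model.

    alpha: list of K positive values (document-topic prior).
    eta:   positive scalar, assumed symmetric topic-word prior.
    """
    rng = random.Random(seed)
    n_topics = len(alpha)
    # beta_k ~ Dir(eta): one word distribution per topic
    beta = [sample_dirichlet([eta] * vocab_size, rng) for _ in range(n_topics)]
    # theta_d ~ Dir(alpha): one topic mixture per document
    theta = [sample_dirichlet(alpha, rng) for _ in range(n_docs)]
    docs = []
    for d in range(n_docs):
        # z_{d,n} drawn from theta_d: topic assignment of each word slot
        z = rng.choices(range(n_topics), weights=theta[d], k=n_words)
        # w_{d,n} drawn from beta_{z_{d,n}}: word id given its topic
        w = [rng.choices(range(vocab_size), weights=beta[k], k=1)[0] for k in z]
        docs.append(w)
    return theta, beta, docs
```

Each row of `theta` is a vector of document-level topic proportions that sums to one, which is the kind of 0 to 1 orientation score a fitted topic model assigns to each review.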

Next, one needs to maximize the above likelihood which is not possible explicitly, as the dimensions z are latent. Therefore, a posterior distribution is deduced as explicated in Eq. 3, reflecting probability conditioned on the parameters. Consistent with a conditional probability representation, the ratio in Eq. 3 comprises a joint probability in the numerator, with a marginal probability in the denominator:

$$P\left(\theta ,\beta ,z \mid w,\alpha ,\eta \right)= \frac{P\left(\theta ,\beta ,w,z \mid \alpha ,\eta \right)}{P\left(w \mid \alpha ,\eta \right)}$$
(3)

Since a detailed explanation of STM is beyond the scope of this paper, we explain STM intuitively by analogy with another (latent) dimension exploration approach, namely exploratory factor analysis. Documents (reviews in our case) serve as individual rows, while each unique word is treated as a separate variable. Just as factor analysis yields hidden factors, the topic model yields latent dimensions as topics. Associating words within a topic is analogous to mapping variables to a factor (or construct) (Travis et al., 2017). The above process yielded a topic-wise distribution of words, which we outline in the subsequent section.
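
The rows-and-variables analogy corresponds to a document-term matrix. The sketch below builds one from raw review strings; it is a simplified illustration with assumed preprocessing (lower-casing and whitespace tokenization only) and a hypothetical function name, and it does not reproduce the cleaning steps of the study.

```python
from collections import Counter


def document_term_matrix(reviews):
    """Return (vocab, rows): one row per review, one column per unique word."""
    # Assumed preprocessing: lower-case and split on whitespace.
    tokenized = [r.lower().split() for r in reviews]
    # Columns are the sorted unique words across the whole corpus.
    vocab = sorted({tok for doc in tokenized for tok in doc})
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[t] for t in vocab])  # Counter returns 0 if absent
    return vocab, rows
```

For example, the two toy reviews "great fit great price" and "poor fit" give the columns `fit, great, poor, price` and the rows `[1, 2, 0, 1]` and `[1, 0, 1, 0]`. A topic model then groups columns into topics, much as factor analysis groups variables into factors.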

2.4 Statistical Analysis

Do the extracted characteristics underlying fake reviews impact the persuasion of buying preferences? We directly use customer reviews as our observations to answer this question. Previous attempts to validate similar relationships have performed comparable investigations; however, their validation is based on the perception of representative stakeholders through survey-based instruments. The extant approaches have severe generalizability issues due to limited observations (respondents), a lack of field data, factors that are commonly identified and borrowed as per the researcher’s understanding, and the statistical significance of overall results (Rana & Dwivedi, 2016; Reimer & Benkenstien, 2016; Wu et al., 2020). In this study, we perform statistical modelling using NLP and Partial Least Squares Structural Equation Modelling (PLS-SEM). The reviews directly act as observations for an overarching consumer base. The use of NLP helps capture the representation of each review in terms of the underlying product characteristics. The representation is a scaled orientation measure ranging between 0 and 1; a higher value means the review reflects the corresponding product aspect more accurately.

We intend to explain a theoretical model in a parsimonious and straightforward way. PLS-SEM has become a natural and popular choice for scholars, since many constraints corresponding to distributional properties, such as multivariate normality, factor indeterminacy, etc., are absent (Hair et al., 2009). Moreover, previous works have argued that PLS-SEM addresses the individual shortcomings of PLS and covariance-based SEM (CB-SEM), such as flexibility of measurement scale and handling complex models (Fosso Wamba et al., 2015, 2018; Goodhue et al., 2012). Additionally, CB-SEM is known to be less suitable for models involving both reflective and formative constructs, as in our case (Akter et al., 2016, 2017). In this regard, PLS-SEM is better equipped to handle constructs modelled as factors and/or composites (Sarstedt et al., 2014). The above mechanism establishes a premise for econometric investigation (Bhuian et al., 2018; Sharma et al., 2019; Sharma & Sharma, 2019). Based on the previous works, we consider authorial style as the antecedent, which has a demonstrated effect on review-related characteristics (Zhu & Zhang, 2010). Motivated specifically by Banerjee and Chua (2017), we extend their idea by treating authorial style as a higher-order construct. The higher-order construct offers a more comprehensive reflection with additional empirical and theoretical meaning (Sousa-Zomer et al., 2020).

The sub-dimensions of authorial style-based features are linguistic features (Weiss et al., 2010), sentiments (Nikolay et al., 2011; Wang et al., 2018a, b), and emotional valence (Zhu et al., 2021). These three features aggregate the diverse and foundational tenets of the writing style of consumers. The items under each factor are selected based on past works and their statistical significance. Next, the impact of authorial features on the underlying product characteristics for various categories is investigated. Finally, the effect of authorial style on buying preference persuasion is gauged with the intermediary role of underlying product characteristics. The previously posited hypotheses for the moderation of ratings are investigated in the second level of the PLS-SEM model. Since all the hypotheses involve product-specific characteristics as a construct, we test and validate each set of characteristics, corresponding to various product categories. The results and their inferences are discussed in the subsequent sections in greater detail.
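
The mediation logic described above (authorial style, then product characteristics, then buying persuasion) can be illustrated with a product-of-coefficients sketch. Note that this uses ordinary least squares on a synthetic toy data set, not PLS-SEM, and it is not the estimation procedure of this study. The variable names `style`, `characteristics`, and `persuasion`, and the path values 0.5, 0.8, and 0.3, are assumptions chosen purely for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def cov(x, y):
    """Population covariance of two equally long sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def simple_slope(x, y):
    """Slope of the least-squares regression of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)


def two_predictor_slopes(x1, x2, y):
    """Slopes of the least-squares regression of y on x1 and x2 (with intercept)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Synthetic toy data (assumption): the two noise patterns are constructed to be
# uncorrelated with the predictor and with each other, so the least-squares
# estimates recover the chosen path values up to floating-point error.
style = [-1.0, 1.0, -1.0, 1.0] * 25
noise_m = [-1.0, -1.0, 1.0, 1.0] * 25
noise_y = [1.0, -1.0, -1.0, 1.0] * 25
characteristics = [0.5 * s + e for s, e in zip(style, noise_m)]
persuasion = [0.3 * s + 0.8 * m + e
              for s, m, e in zip(style, characteristics, noise_y)]

path_a = simple_slope(style, characteristics)              # style -> characteristics
direct, path_b = two_predictor_slopes(style, characteristics, persuasion)
indirect = path_a * path_b                                  # mediated effect
total = simple_slope(style, persuasion)                     # total effect

print(path_a, path_b, direct, indirect, total)
```

A non-zero indirect effect alongside a non-zero direct effect corresponds to partial mediation. Moderation by ratings would require an additional interaction term, which is omitted from this sketch.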

3 Results and Findings

3.1 Descriptive Results

We first report the summary statistics for fake and genuine reviews. We focus on the number of sentences per review (sentence count), word count per review (word count), and the proportion of stop (function) words per review (stop word count). The findings and results of statistical tests are documented in Table 1 below.

Table 1 Summary statistics of writing style-related features underlying online reviews

The average number of sentences and the corresponding standard deviation for genuine reviews are significantly higher than those of fake reviews, which suggests honest reviews are generally lengthier than non-genuine ones. The significantly lower standard deviation in fake reviews indicates a possible standard reply style with little to no variation. It logically follows that the average word count for genuine reviews is significantly higher than that of fake reviews. Moreover, if we investigate the ratio of words to stop words, we find the average for genuine reviews to be 1.22. The corresponding ratio for fake reviews stands at 0.96. This is an important revelation. Word count is the count of meaningful words in a sentence, also known as content words, which have inherent meaning and exist independently of the sentence (e.g., nouns and adjectives). On the other hand, stop words (also known as function words) do not convey any semantic meaning when used in a sentence (e.g., articles and auxiliary verbs). The ratio thus tells us about the “meaningfulness” of a review. Intuitively, a higher proportion of content words is associated with more informative reviews. We find that genuine reviews have a higher meaningfulness score than fake reviews. This holds high importance in cases where fake reviews may be verbose but, in reality, are less informative than genuine reviews. It can be observed that for each of the feature types in Table 1, the difference between genuine and fake reviews is statistically significant at the 95% confidence level.
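
The “meaningfulness” ratio discussed above can be sketched as follows. The snippet counts content words and stop words in a single review and returns their ratio. The tiny stop-word list, the tokenization rule, and the example review are hypothetical; this section does not state which stop-word list or tokenizer was used, so values produced here are not comparable with the 1.22 and 0.96 reported above.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real analysis would use
# a standard list from an NLP library.
STOP_WORDS = {
    "a", "an", "the", "is", "it", "to", "and", "of",
    "this", "was", "i", "for", "my",
}


def meaningfulness_ratio(review):
    """Ratio of content words to stop words in a single review."""
    tokens = re.findall(r"[a-z']+", review.lower())
    stop_count = sum(1 for token in tokens if token in STOP_WORDS)
    content_count = len(tokens) - stop_count
    if stop_count == 0:
        # The ratio is undefined without stop words; signal this explicitly.
        return float("inf")
    return content_count / stop_count


# Hypothetical example review, not taken from the study's data.
review = "The stroller is sturdy and it folds easily for travel"
print(meaningfulness_ratio(review))
```

Averaging this ratio separately over genuine and fake reviews would give the kind of group-level comparison reported in the text, under the stated assumptions.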

3.2 Inferential Results

This section discusses in detail the results obtained from the statistical investigation. Authorial style is a reflective construct comprising linguistic features, sentiments, and emotional valence as unique dimensions. For each product category, we find that authorial style has a significant relation with its sub-constructs for fake and genuine reviews, except for baby products. We further observe that product characteristics mediate the relationship between authorial style and buying recommendation for real reviews. Product characteristics is a formative construct consisting of latent dimensions identified through STM on the customer reviews. The detailed items under each of the broad constructs of authorial style, linguistic features, sentiments, emotional valence, and product characteristics are summarized in Table A1 of Supplementary material. Interested readers are also referred to Figs. 1, 2, 3, and 4 of Supplementary material, showing the unique and exclusive set of keywords in fake reviews which were obtained from STM analysis and further used to identify the latent product characteristics. To have a more generalized conclusion, we compare the results corresponding to both genuine and fake reviews. We observe significant relationships for the five most relevant product categories, viz., baby products, beauty and health products, fashion products, office products, and sports products. We now deep-dive into the PLS-SEM results for these five product categories. We start with the “beauty & health” products category.

In Fig. 3, we observe that writing style significantly explains the characteristics underlying both fake and genuine reviews. We also find a significant relationship between writing style and buying persuasion for genuine reviews. This relationship is partially mediated by product characteristics without significant moderation effects (ratings). However, for fake reviews, the association is non-existent, both with and without mediation. The extracted latent factors for genuine reviews are ease of use, side effects, price, usefulness, catalogue, and taste. The obtained factors indicate the customer’s considerations when buying beauty and health-related products. Consumers are concerned about the effects, value for money, and product utility (usefulness and ease of use). However, for the fake reviews, the latent factors are less varied and have little mention of price sensitivity or taste. Next, we examine the “fashion” products category.

Fig. 3 PLS-SEM Model for beauty and health products category

Consistent with the trends observed for “beauty and health” products, Fig. 4 highlights a significant mediating effect of characteristics underlying reviews between authorial style and buying persuasion. Ratings appear ineffective in moderating the relationship between the constructs for both fake and genuine reviews. Multiple fashion-specific sub-constructs emerge as latent factors that form the product characteristics related to real reviews. Prominent among them are looks, variety, comfort, and material, which are all applicable to fashion accessories and products. In contrast, the factors that emerge from fake reviews are quality, price, defect, etc. These are generic factors and cannot be considered specific to fashion. Next, we analyse the office products category (Fig. 5).

Fig. 4 PLS-SEM Model for the fashion products category

Fig. 5 PLS-SEM Model for office products category

For the office products category, authorial style significantly affects product characteristics. However, contrary to the other findings, a direct, significant relationship does not exist between authorial style and nudging buying preferences for fake or genuine reviews. Moreover, there is no statistical evidence of a mediating role of underlying product characteristics. This phenomenon can be understood through the lens of product involvement (Kuenzel & Musters, 2007). Studies in psychology have stated that involvement with a product can be traced to the consumers’ desirable values and needs based on their self-knowledge and personal values about the product (Peter & Olson, 1987). Office products such as papers, stationery items, etc. are low-involvement products for the consumer, as they are not closely linked with the individual’s abstract psychological and value attributes (Knox & Walker, 1992). Hence, the reviews for these products do not have any effect on consumer buying behavior. However, it is to be noted that the significant latent factors observed for real reviews of office products are utility, supplies, durability, and quality. In contrast, the factors for fake reviews of office products are availability, ease of use, uniqueness, etc. Again, factors such as durability and supplies, which are typical of office products, fail to emerge from the fake reviews, strengthening the efficacy of the real review-based output. Next, we analyse the sports category (Fig. 6).

Fig. 6 PLS-SEM Model for sports products category

Unlike other product categories, a significant mediation and moderation effect can be observed between authorial style and nudging buying preferences for genuine reviews for products in the “sports” category. Product characteristics mediate the relationship between authorial style and buying persuasion, whereas ratings moderate the relationship between product characteristics and buying persuasion. In contrast, the underlying factors for the fake reviews do not significantly affect buying persuasion, nor is there any significant evidence of a moderation effect. The latent factors for genuine reviews consist of utility, weight, material, quality, style, and handling. Fewer underlying factors are observed for the fake reviews, and these again turn out to be very generic, such as quality, usefulness, benefits, and offers.

The latent characteristics underlying fake reviews for the baby category were inconclusive and incoherent; thus, no statistical model could be tested. For the genuine reviews, we uncovered and identified eight latent dimensions: uniqueness, aesthetics, attributes, advantages, disadvantages, assembly and after sales. It can be observed that the authorial style of the reviews and the product characteristics significantly affect the recommendation behavior of customers (solid lines) for this category. Moreover, underlying product characteristics mediate the relationship between authorial style and shaping buying preferences; however, ratings do not moderate this relationship significantly.

Furthermore, the latent factors and underlying product characteristics reveal what customers talk about while buying baby products. Some dimensions, such as advantages, weaknesses, and after-sales services, indicate parents’ sensitivity towards assessing what is best for their children (Fig. 7).

Fig. 7 PLS-SEM output for the baby products category (Genuine Reviews only)

Genuine reviews have profound informational content, particularly about product-specific attributes, which nudge customers’ buying preferences. On the contrary, fake reviews mostly convey generic information and lack the potential to aid buying decisions (Reich & Maglio, 2020). This could be one of the primary reasons that fake reviews do not invoke consumers’ buying intentions as genuine reviews do. To the best of our knowledge, a similar in-depth analysis of the underlying factors that expose the difference between fake and real reviews has not been available in the extant literature, and this remains a major contribution of this research work.

4 Discussion

The study reveals that genuine reviews significantly impact buying persuasion in nearly all product categories through the mediation of latent factors extracted from customer reviews, which fake reviews fail to achieve. This result is consistent with some earlier studies (Zhuang et al., 2018). In this study, authorial style as a construct is significant across the product categories for both fake and genuine reviews. Authorial style comprises sub-factors such as linguistic, sentiment, and emotion-based features. One of the study’s findings is that authorial style significantly shapes buying persuasion for baby products, “beauty and health” products, and fashion products. These three product categories can be classified as high-involvement products. High-involvement products carry an intrinsic risk for the customer, are complex, and are generally expensive (Martin, 1998). Hence, across the three product categories, what the consumer writes in the reviews (product characteristics) and how they write the review (authorial style) play a vital role in shaping online recommendation for a potential new consumer. However, the customer ratings do not moderate the relationship between product characteristics and buying persuasion, suggesting that the strength of the relation between the variables is independent of the ratings. Hence, it can be concluded that for the high-involvement product categories such as fashion, baby, and “beauty and health”, ratings do not interact with latent features such as writing style or product-specific characteristics. Therefore, managers may be better off focusing on the content and tone of reviews (sentiment analysis) rather than ratings when analysing customer response to new or existing products.

There is no significant direct relationship between authorial style and buying persuasion for the remainder of the product categories (office products and sports products). For office products, mediation through latent factors is non-existent, nor is there any evidence of a moderation effect. Since office products are perceived as low-involvement products carrying less consumer risk, the underlying latent characteristics, writing style, and ratings have no effect on buying persuasion (Kuenzel & Musters, 2007).

On the other hand, for sports products, one can observe a significant mediation through the latent factors and a significant moderation through the customer ratings. Since sports products are more complex and exhibit characteristics of both low-involvement (general goods bought in bulk) and high-involvement products (professional sports gear), we observe a mediation effect of the latent factors complemented by a significant interaction of ratings. We report the summary of the hypothesis test results in Table 2.

Table 2 Summary of hypothesis test results [NA: Not Applicable, NS: Nonsignificant, S: Significant]

4.1 Theoretical Contributions

The extant literature has concentrated on identifying fake reviews by exploring their writing style and linguistic features. A few studies have also attempted to derive their influence on the authenticity of reviews, future recommendations, or even popularity. None of the past works have tried to investigate the effect of the authorial style of reviews on consumers’ perception of the product features or consumer response behaviors. One of the principal reasons for this gap is the lack of theoretical grounding of the extant explorations. Grounded on the SOR framework, our study advances the theoretical underpinnings of stimulus, organism, and response to ultimately unveil consumer response behaviors, particularly in an online setting. This study tries to uncover whether the complex relation between authorial writing style-based features and persuading buying preferences is mediated through latent product characteristics underlying fake reviews. To offer a comprehensive understanding, we also put to test the postulated hypotheses for genuine reviews vis-à-vis the fake reviews. This comparison is carried out over several product categories for increased generalizability.

Our study investigates how the writing style of fake reviews acts as a stimulus to shape consumers’ perception about product characteristics and ultimately the effect on recommendation behavior. For fake reviews, we found no effect except an indirect effect for fashion products. For a comprehensive understanding, we also test whether the complex relation between authorial writing style-based features and buying preference persuasion is mediated through latent product characteristics underlying genuine reviews. We conclude that fake reviews lack “meaningfulness” of content and have extreme tonal expressions. Another pivotal finding is that product characteristics underlying fake reviews do not mediate the effect of writing style on shaping recommendation behavior.

Our findings corresponding to the authorial features offer profound insights regarding fake reviews. We specifically observe that fake reviews have a significantly lower word count while the proportion of function words is higher than genuine reviews. Interestingly, this indicates that fake reviews lack precision and clarity of discussion. Furthermore, the tone of fake reviews is more polar than real reviews, echoing extreme positivity or negativity in expression. On the contrary, the emotional valence is lower for fake reviews, indicating a lack of touch points. The past literature has also posited that only genuine experiences have higher emotional touch points. Summarily, we inform the scholars of information science that fake reviews lack “meaningfulness” in their expression.

The study also contributes to the domain of consumer behavior. As one of our central findings, we statistically establish that the intermediary role of product characteristics holds only for honest reviews, indicating that consumer perception about product characteristics cannot be misled by review manipulation. We further conclude that fake reviews have latent characteristics that do not influence recommendation behavior. The above conclusion is consistent across various product categories. One possible explanation could be that fake reviews lack genuine experience(s) of the product usage. Along similar lines, previous investigations have revealed the importance of sharing first-hand product experience and its impact on shaping buying intentions (Senecal & Nantel, 2004). The descriptive results highlighted in Section 3.1 are another testimony to the synthetic tone and verbiage of fake reviews.

From a methodological standpoint, exploration of underlying characteristics of misrepresented and manipulated information, online consumer reviews in this case, further adds novelty to our work. Since the findings are based on a large collection of reviews, the adopted methodology does not suffer from the inadequacies of survey-based research, lab findings or qualitative methods, which traditionally have limited implications for real-life situations. Specifically, we have studied over 21,000 fake and genuine reviews collected from Amazon.com, an online retail giant, making the findings directly applicable to advertisers, vendors, and buyers who use Amazon or similar platforms to conduct their e-business.

4.2 Implications for Practice

The findings and conclusions drawn from this study are promising for e-commerce retailers. We first highlight the importance of meaningful content in consumer reviews. Practitioners may plan strategies to reward pertinent reviews on their portal which convey important information. Evidence suggests that non-verbose and polar expressions generally echo manipulated information, and they also lack emotional valence. The above findings may be applicable to not just practice, but also government agencies trying to detect and prevent misinformation in social media and other online media.

Moving further, the moderating role of ratings on the influence of underlying product characteristics on shaping buying persuasion is not established. This has a strong implication for the portal managers and their marketing teams. We posit that managers should concentrate on the product characteristics alone rather than ratings to prevent product discouragement.

Next, we suggest that firms entertaining fake reviews on their portals should be aware of their limited role, both in shaping perceptions about latent product characteristics and in influencing buying preferences. Consequently, fake reviews may offer limited economic advantages to e-commerce portals. One of the possible reasons for the above finding could be that fake reviews lack true product experience and, therefore, the underlying characteristics are vague and misleading. In this context, customer relationship managers who are sceptical of any manipulation may further engage in conversations to explore more about the specific sources of dissatisfaction. This would be a step towards preventing the spread of negative word-of-mouth in the online medium.

We also gain an understanding of specific factors which the consumer deems important in different product categories. Consumers may be segmented based on preferred factors such as price, quality, utility or a combination of the different features.

This study observes consistent findings for several product types, thereby suggesting to practitioners that principles of consumer behavior remain consistent across product categories. High-involvement products such as those falling under “beauty and health”, baby, and fashion need to be handled with more caution. The buying persuasion for such high-involvement products can be easily nudged relative to other product categories. Therefore, identifying fake reviews for high-involvement products becomes crucial to prevent further product discouragement.

Last, we conclude that latent characteristics underlying fake reviews, as opposed to genuine ones, fail to influence buying preferences significantly. Therefore, our study suggests that firms entertaining fake reviews on their portals need to be aware of the limited economic advantages of such practices. Overall, this study contributes to understanding the antecedents and impact of misrepresented or manipulated online information on overall consumer behavior.

4.3 Limitations and Future Research Directions

Like other studies, this work is also not free from limitations. This study was performed at a general level across all Amazon reviews worldwide. However, various countries, cities, and ethnicities might have different opinions on the same products, which brings into view dynamic product design strategies based on customer feedback for other places and people. Previous studies have also observed significant influence of socio-cultural aspects on buying intentions. Therefore, future works may investigate the role of country and related demographic attributes moderating the observed relationships in this study. Comparing product reviews across specific countries could bring out further insights for companies and help them design innovative products and culture-specific marketing strategies.

Another drawback of the study is that it is based on general e-commerce behavior of customers. It would be interesting to explore the buying behavior of individuals under specific circumstances. For instance, the effect of the COVID pandemic on customer response which led to mass buying and hoarding of products could be an interesting use case. Similarly, the seasonality of offers on e-commerce portals during Black Friday sales or religious festivals may lead to a rise in nudging behavior caused through fake reviews. Hence, our findings of the relevance of fake reviews are valid for general response behavior of customers but we have not investigated specific contexts and scenarios which could augment nudging buying preferences through misinformation.

Unlike other work, our study did not attempt to detect fake reviews (Ansar & Goswami, 2021). We rather focus on the business implications of entertaining fake reviews on e-commerce portals. Nevertheless, a possible gap arises here for enthusiasts of computational intelligence to automatically identify fake reviews. We encourage future researchers to explore this area.

5 Conclusions

The proliferation of online retailing and the popularity of e-commerce has made product quality information imperative for consumer decision-making. Such information is primarily available through consumer reviews, acting as a resource to shape buying decisions. More than the product description and expression corresponding to the experience of consumption of products or services, the perceived authenticity of the reviews has gained tremendous importance. Consequently, reviews are prone to manipulation and misrepresenting information. Previous research posits that consumers can diagnose such review manipulation; however, there is a dearth of efforts to unveil the genesis of such (fake) reviews. This study is an early attempt to investigate the characteristics and authorial writing styles underlying fake reviews and their influence on persuading buying preferences.

Grounded on the theory of environmental psychology, our work is founded on the SOR framework. We draw our motivation and analogy from the SOR framework to map stimulus, organism and response with the nature and role of online consumer reviews. In line with the theory development guidelines for big data research, we use a combination of machine learning and statistical analysis to understand the cognitive side of customer response behavior. We develop hypotheses based on existing literature and recognize the relationship between external stimulation and response behavior in an individual. Hence, the study also functions as an exercise in further validating the value of SOR in consumer behavior studies. We observe that for most of the product categories the proposed hypotheses are significant for the genuine reviews and non-significant for the fake reviews. This illustrates the discriminative power of the SOR framework. We therefore conclude that psychological theories and frameworks could act as a theoretical pillar in supporting such studies and developing the necessary philosophy which may provide us with new ways to understand a social phenomenon and possibly develop ways to counter anti-social behavior.