Controlling for the effects of customer satisfaction and online review type, we tested our predictions across two separate studies. Study 1 examines the effects of responses to online reviews on relationship quality and repurchase intention. Study 2 not only generalizes these findings to another scenario to extend validity but also adds a control scenario in which promotional information is provided on its own, i.e., not in response to an online review. This rules out the influence of any general appeal or aversion to advertising among the participants.
Study 1: effects on relationship quality and repurchase intention
Data collection
We commissioned Sojump, an online survey company, to collect the research data for Study 1. Sojump provides access to data of a quality equivalent to traditional sampling approaches (Berinsky et al. 2012). Respondents who completed the questionnaire were rewarded with Sojump points. During data collection, Sojump used "trap" questions (e.g., 3X + 4 = 13, X = ?) to screen out invalid questionnaires. Further, to rule out the effects of the online review itself, we treated the type of online review as a control variable. Thus, in Study 1, 442 participants were randomly assigned to one of four online review–response scenarios in a 2 × 2 between-subjects design: the seller's response with and without promotional information, crossed with the seller's response to positive and negative reviews.
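To make the screening step concrete, the following is a minimal sketch of filtering on the trap question; the file name and trap_answer column are hypothetical, not the actual Sojump data layout.

```python
import pandas as pd

# Hypothetical sketch of the screening step: the trap question 3X + 4 = 13
# has the answer X = 3, so questionnaires giving any other answer are
# discarded as invalid. The file and column names are illustrative only.
raw = pd.read_csv("study1_raw.csv")
valid = raw[raw["trap_answer"] == 3]
print(f"{len(raw) - len(valid)} invalid questionnaires removed; "
      f"{len(valid)} retained")
```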
On the first page of the questionnaire, participants were asked to imagine themselves in an online shopping situation. Some were given a satisfactory experience and others an unsatisfactory one, as follows. The positive experience was described as: “Last month, you bought a pair of jeans online at a price of 300 yuan. After receiving the jeans, you found them to be well-made, well-rounded with a good fit, and very comfortable to wear. Therefore, you want to post a positive comment about the pants on the website. Please write down your comments.” The description of the negative experience was: “Last month, you bought a pair of jeans online at a price of 300 yuan. After receiving the goods, you found they had rough workmanship, uneven lines, and an unsatisfactory fit. They looked terrible and were uncomfortable to wear. Therefore, you want to post a negative comment about the jeans on the website. Please write down your comments.” Participants were asked to write their own positive or negative reviews to engage them more deeply in the scenario and thereby increase the integrity and authenticity of their answers to the survey.
The seller’s response to the participant’s review was provided on the second page of the questionnaire. The responses assigned to each of the four scenarios are given in Additional file 1.
Variables and measurements
The survey design is shown in Table 1. Participants were asked to score their agreement with various statements on a 7-point Likert scale (1 = strongly disagree, 7 = strongly agree). Relationship quality was measured with eight statements adapted from De Wulf et al. (2001). Three further constructs (repurchase intention, perceived purpose of response, and customer satisfaction) were each measured with a single statement. In addition to the constructs, a dummy variable indicated whether the seller's response included promotional information (= 1) or not (= 0). Further, we conducted two independent analyses: one for those given a satisfactory experience (the positive review group) and another for those given an unsatisfactory experience (the negative review group).
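A minimal sketch of how these variables could be constructed follows; the column names (rq_1 through rq_8, response_type, review_valence) are illustrative assumptions, not the actual codebook.

```python
import pandas as pd

# Hypothetical sketch of variable construction from the screened responses.
df = pd.read_csv("study1_valid.csv")

rq_items = [f"rq_{i}" for i in range(1, 9)]            # eight 7-point Likert items
df["relationship_quality"] = df[rq_items].mean(axis=1)

# Dummy variable: 1 if the seller's response included promotional information
df["promo"] = (df["response_type"] == "with_promo").astype(int)

# The two review groups are analyzed independently
positive_group = df[df["review_valence"] == "positive"]
negative_group = df[df["review_valence"] == "negative"]
```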
Table 1 Constructs and measurements

An initial list of sentiment statements for each of the constructs was discussed with one professor and eight postgraduate students of marketing, and the list was revised and extended as a result. The entire item pool was then tested in qualitative interviews, followed by a pretest with 35 graduate students and a Ph.D. candidate at a university in Beijing, China. The final survey instrument was constructed by selecting and modifying the statements according to feedback from the interviews and the pretest. Because the questionnaire was translated from English into Chinese, we used back-translation to ensure accuracy (Brislin 1970).
Hypothesis tests: study 1
We chose a PLS-SEM analysis for Study 1 because the sample size was relatively small and the variables deviate from normal distributions. PLS is suitable for small samples and does not rest on the assumption of normality (Hair et al. 2011). Moreover, when appropriately applied, PLS-SEM provides more robust estimations of the structural model than covariance-based SEM (CB-SEM) (Reinartz et al. 2009). We analyzed the research data in SmartPLS 3.0, following Hair et al. (2011), who recommend assessing the significance of the model estimates through a bootstrapping procedure with 5000 samples.
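SmartPLS performs this resampling internally; purely as an illustration of the logic, the sketch below bootstraps a single path coefficient on simulated data. A simple OLS slope stands in for the PLS path estimate, and all values are made up.

```python
import numpy as np

# Illustrative bootstrap of a path coefficient with 5000 resamples,
# mirroring the significance procedure recommended by Hair et al. (2011).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)                  # e.g., perceived purpose of response
y = -0.25 * x + rng.normal(size=n)      # e.g., relationship quality

def path_coefficient(x, y):
    # Simple regression slope as a stand-in for a PLS path estimate
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

boot = np.empty(5000)
for b in range(5000):
    idx = rng.integers(0, n, size=n)    # resample respondents with replacement
    boot[b] = path_coefficient(x[idx], y[idx])

estimate = path_coefficient(x, y)
se = boot.std(ddof=1)                   # bootstrap standard error
print(f"beta = {estimate:.3f}, SE = {se:.3f}, t = {estimate / se:.2f}")
```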
Measurement model
The descriptive statistics and correlation coefficients for the positive and negative review groups are shown in Tables 2 and 3, respectively.
Table 2 Study 1 statistics and correlation coefficients—positive reviews
Table 3 Study 1 statistics and correlation coefficients—negative reviews
Looking at the positive review group first, the Cronbach's alpha for relationship quality was 0.874, exceeding the benchmark of 0.7 and confirming good internal consistency for all measurement items. All factor loadings exceeded 0.7 and were significant (p < 0.001). We assessed convergent validity using the average variance extracted (AVE) and composite reliability (CR); discriminant validity was assessed with Fornell and Larcker's (1981) test, which requires that the AVE of each construct exceed its squared correlation with any other construct. An AVE of 0.52 and a CR of 0.90 verify convergent validity, and the Fornell and Larcker (1981) test confirms discriminant validity. Further, with a maximum variance inflation factor (VIF) of 1.17 among all constructs, well below the recommended cutoff of 5, multicollinearity is not a threat in this research. The standardized root mean square residual (SRMR) was 0.058, slightly above the 0.05 standard.
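For reference, these indices follow the standard definitions. With σi² the variance of item i, σT² the variance of the summed scale, λi the standardized loading of item i, and θi = 1 − λi² its error variance, for a k-item construct:

```latex
\alpha = \frac{k}{k-1}\left(1-\frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_T^{2}}\right),
\qquad
\mathrm{AVE} = \frac{1}{k}\sum_{i=1}^{k}\lambda_i^{2},
\qquad
\mathrm{CR} = \frac{\left(\sum_{i=1}^{k}\lambda_i\right)^{2}}
                   {\left(\sum_{i=1}^{k}\lambda_i\right)^{2}+\sum_{i=1}^{k}\theta_i}
```

Fornell and Larcker's (1981) criterion then requires AVEj > rjl² for every pair of constructs j ≠ l.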
In the negative review group, the Cronbach's alpha for relationship quality was 0.955, which exceeds the benchmark of 0.7, confirming good internal consistency for all items. All factor loadings exceeded 0.7 and were significant (p < 0.001). Both convergent and discriminant validity were assured with an AVE of 0.76 and a CR of 0.96. The maximum VIF was 1.06, again dismissing multicollinearity as a threat. Last, the SRMR was 0.034, lower than the 0.05 standard, indicating a satisfactory model fit.
Structural model
The primary evaluation criteria for the structural model were the R2 measures, plus the significance levels and path coefficients (Hair et al. 2011). Hair et al. (2011) propose that an R2 of 0.20 is considered high in consumer behavior research. As Tables 4 and 5 show, the R2 values were 0.15 for relationship quality and 0.11 for repurchase intention in the positive review group, and 0.20 and 0.30, respectively, in the negative review group; all reach an acceptable level. Thus, H1 is supported (β = 0.276, p < 0.001 for responses to positive reviews; β = 0.169, p < 0.05 for responses to negative reviews). When a store's responses to online reviews contain promotional information, consumers perceive that the intention is to promote the store rather than to sincerely thank or apologize to customers. H2 and H3 are also supported: for H2 (relationship quality), β = −0.289, p < 0.001 for responses to positive reviews and β = −0.272, p < 0.001 for responses to negative reviews; for H3 (repurchase intention), β = −0.182, p < 0.01 and β = −0.255, p < 0.001, respectively. These results indicate that when consumers perceive the motivation behind a seller's response to be self-interested, relationship quality degrades and customers are less likely to purchase from that business again (Table 5).
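Schematically, reading the specification from the hypotheses (the exact set of controls entering each equation follows the tables, so this is a sketch rather than the definitive model), the structural relations tested are:

```latex
% PROMO: promotional-information dummy; PP: perceived purpose of response;
% RQ: relationship quality; RI: repurchase intention; SAT: satisfaction control
\mathrm{PP} = \gamma\,\mathrm{PROMO} + \varepsilon_{1} \quad (\mathrm{H1})
\qquad
\mathrm{RQ} = \beta_{1}\,\mathrm{PP} + \beta_{2}\,\mathrm{SAT} + \varepsilon_{2} \quad (\mathrm{H2})
\qquad
\mathrm{RI} = \beta_{3}\,\mathrm{PP} + \beta_{4}\,\mathrm{SAT} + \varepsilon_{3} \quad (\mathrm{H3})
```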
Table 4 Study 1 hypothesis test results—positive reviews
Table 5 Study 1 hypothesis test results—negative reviews
Study 2: alternative explanation tests
Study 2 was designed to replicate Study 1 with another scenario to extend external validity and isolate the effects of promotional information not given in response to an online review. Our main purpose was to rule out any general aversion to advertising as an influence over the participants’ views.
Data collection
The data collection procedure for Study 2 was similar to that of Study 1. Three hundred ninety-two Chinese residents were recruited through Sojump and compensated for their time with Sojump points. Participants were randomly assigned to one of six scenarios: the four from the same 2 × 2 between-subjects design as Study 1 (i.e., with and without promotional information, for positive and negative reviews), plus two promotion-only scenarios serving as a control to rule out the argument that promotional information on its own creates a consumer aversion that affects decision-making.
In this scenario, participants were asked to imagine that they had purchased a neck massager at a cost of 500 yuan because of neck discomfort. They were also given either a positive or a negative review of the neck massager. In the high satisfaction group, the review was positive: “This neck massager is exquisitely made, comfortable to wear and has a heating function. The massage force is moderate and provides a very comfortable neck massage.” The review for the low satisfaction group was negative: “This neck massager is roughly made, uncomfortable to wear and has no heating function. The massage force is too weak and does not provide a comfortable neck massage at all.” The seller's response to the review appeared on the second page of the questionnaire. Additional file 2 shows the response allocated to each scenario, plus the promotion-only text. The variables and measurements were the same as in Study 1.
Hypothesis tests: study 2
Measurement model
To evaluate the psychometric adequacy of the constructs, we conducted a confirmatory factor analysis. The results, shown in Tables 6 and 7, indicate that all factor loadings were significant (p < 0.001), ranging from 0.71 to 0.90. The Cronbach's alphas for relationship quality in the positive and negative review groups were 0.91 and 0.96, respectively. The CRs were 0.92 and 0.96, which exceed the benchmark of 0.70, suggesting that the measures are reliable. Relationship quality had AVEs well above the recommended value of 0.50, ranging from 0.58 to 0.75, and, according to Fornell and Larcker's (1981) test, all the constructs have discriminant validity. The VIFs were below the recommended cutoff of 5 (the maximum was 1.03), verifying that multicollinearity is not a threat.
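As a reference implementation of the internal-consistency check used in both studies, a minimal sketch of Cronbach's alpha (assuming item responses arrive as an n × k NumPy array; the simulated data are purely illustrative) is:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x k_items) response matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Example: simulated 7-point Likert responses to eight items that share
# one latent factor, so alpha should come out high
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 1))
items = np.clip(np.round(4 + latent + rng.normal(scale=0.8, size=(300, 8))), 1, 7)
print(f"alpha = {cronbach_alpha(items):.2f}")
```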
Table 6 Study 2 statistics and correlation coefficients—positive reviews
Table 7 Study 2 statistics and correlation coefficients—negative reviews
Structural model
The results of the hypothesis tests are consistent with the findings of Study 1 (see Tables 8 and 9). H1 is again supported (β = 0.205, p < 0.01 for responses to positive reviews; β = 0.344, p < 0.001 for responses to negative reviews), from which we conclude that including promotional information in a response to an online review adversely affects the customer's perceived purpose of the response. Further, these perceptions significantly weaken relationship quality, in support of H2 (β = −0.200, p < 0.01 for positive reviews; β = −0.248, p < 0.01 for negative reviews). However, the effect of customer perceptions on repurchase intention was only marginally significant in the positive review group (β = −0.115, p < 0.1), though significant in the negative review group (β = −0.206, p < 0.05). Therefore, we find only partial support for H3.
In the case of the seller responses to positive reviews, the model explains 16% of the variance in relationship quality (adjusted R2 = 0.14) and 16% of the variance in repurchase intention (adjusted R2 = 0.14). The fit of the model was good, with an SRMR of 0.05. With the negative reviews, the model explains 42% of the variance in relationship quality (adjusted R2 = 0.42) and 37% of the variance in repurchase intention (adjusted R2 = 0.37). The SRMR was 0.036, indicating a good model fit.
Among the control variables, satisfaction had a positive and significant coefficient, signaling that consumer satisfaction strengthens both the customer relationship and repurchase intention. The analysis of variance with the control scenario revealed a significant difference in perceptions between promotional information presented on its own and promotional information combined into a response to an online review. For relationship quality, Myes = 3.96 versus Mno = 4.35; F(1, 431) = 4.42, p < 0.05. For repurchase intention, Myes = 3.99 versus Mno = 4.47; F(1, 431) = 7.28, p < 0.01. Both indicate a substantial reduction in favorability when promotional information is included in a response to an online review (Tables 8 and 9).
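An illustrative version of this control comparison follows. The data are simulated around the reported means (the per-group sizes are assumptions; F(1, 431) implies 433 observations in total), so the output only demonstrates the test, not the actual result.

```python
import numpy as np
from scipy import stats

# One-way ANOVA: relationship-quality scores when promotional information
# appears inside a review response (yes) versus on its own (no).
rng = np.random.default_rng(1)
rq_yes = rng.normal(loc=3.96, scale=1.5, size=217)  # assumed group size
rq_no = rng.normal(loc=4.35, scale=1.5, size=216)   # assumed group size

f_stat, p_value = stats.f_oneway(rq_yes, rq_no)
df_within = len(rq_yes) + len(rq_no) - 2
print(f"F(1, {df_within}) = {f_stat:.2f}, p = {p_value:.3f}")
```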
Table 8 Study 2 hypothesis test results—positive reviews
Table 9 Study 2 hypothesis test results—negative reviews