1 Introduction

The booming web culture, characterized by peer-to-peer collaboration, data sharing and consumer-generated content, is transforming business and has changed consumer behavior (Cantallops and Salvi 2014). Before buying a product or a service, a customer can consult online reviews from previous customers, which contributes to shifting power from companies to consumers (Hennig-Thurau et al. 2004). In general, word of mouth (WOM) is considered one of the most influential factors for consumer behavior (Daugherty and Hoffman 2014). This factor may be especially important when buying intangible products, like hospitality, which can hardly be evaluated prior to consumption. For such products, WOM could help the potential buyer to reduce the risk (Hussain et al. 2017) and increase confidence.

The travel market is a heavily affected industry, where online reviews influence more than $10 billion in online travel purchases per year (Vermeulen and Seegers 2009). Within the tourism industry, hotels are likely to be the most affected (Cantallops and Salvi 2014), and TripAdvisor users confirm that online reviews are important when deciding “where to stay” (77.9%) (Gretzel and Yoo 2008). It is claimed that more than 60% of consumers consult customer reviews before making a purchase (Mauri and Minazzi 2013). Clearly, consumer-generated content in terms of electronic word of mouth (eWOM) plays an important role in the hospitality industry, and especially in lodging. It is not surprising that eWOM is considered the most influential source of travel information preceding a purchase (Sotiriadis and Van Zyl 2013). From a company perspective, positive online reviews can significantly increase the number of bookings at a hotel, e.g. a 10% improvement in reviewers’ ratings is estimated to increase sales by 4.4% (Ye et al. 2009).

Traditional WOM has perhaps been overshadowed somewhat by the research focus on eWOM in recent years. Even though these two concepts may seem to be the same, they are actually very different (Huete-Alcocer 2017). The body of knowledge regarding how these differences are related to the impact on consumer behavior is relatively limited. One study suggests that the impact of others’ opinions may differ depending on whether the source is “social” (e.g. colleagues) or anonymous (e.g. internet) (Viglia and Abrate 2014). The fact that eWOM is most often anonymous may also indicate lower credibility than traditional WOM (Huete-Alcocer 2017). This is important to investigate further, since source credibility may be the most influential factor (Sotiriadis and Van Zyl 2013).

In this study we want to further explore the difference in impact between eWOM and traditional WOM. The primary aim is to investigate the impact of a single review made by a good friend (WOM) compared to the impact of the online majority. The study is experimental and based on a fictive hotel and its online reviews. Respondents are randomized to read reviews with either an overall positive opinion or an overall negative opinion. The primary endpoint is the respondents’ booking intention. We analyze the booking intention, given the overall opinion, and whether this intention changes when the respondent receives WOM (positive or negative) from a good friend.

2 Related research—state of the art

The impact of eWOM on consumers’ decision-making process has been explored in previous research, considering various products such as books, movies, software and hotel stays. Generally, there seems to be agreement that reviews (WOM both face-to-face and online) have the potential to influence customers’ decision making (Cantallops and Salvi 2014), and as a matter of fact WOM is ranked as the most important information source preceding a purchase decision (Litvin et al. 2008). The results point out that the overall valence, i.e. the opinion of the majority, also called the “bandwagon effect” (Moe and Schweidel 2012), is important. There is a significant difference in factors such as booking intention and hotel consideration when comparing positive and negative overall valence, i.e. whether most of the reviews are negative or positive (Sparks and Browning 2011), and this is especially noticeable for lesser-known hotels (Vermeulen and Seegers 2009). It has also been shown that recent reviews are more influential than older ones, and that recent positive reviews could actually override or at least moderate the effect of older negative reviews (Sparks and Browning 2011).

The use of online reviews rather than expert assessments found in travel magazines or made by organizations like the Automobile Association (AA) significantly reconfigures the everyday practice of hotels, and the organization tends to be more micro-managed by the constant flow of assessments, due to the performative nature of the materialization of service processes (Orlikowski and Scott 2013, 2015). The research on how a hotel should respond to a negative review is rather limited, but some interesting results indicate that it is worth responding, preferably rather quickly, and using a “human voice” rather than being formal (Sparks et al. 2016).

Even though eWOM offers customers a convenient and easily accessible way of making evaluations and reducing the risk of “buying a pig in a poke”, there is also another side of the coin. According to Nielsen (2016), only roughly 10% of all users of an online community contribute to the content. Moreover, customers who post a review are generally extremely satisfied or extremely dissatisfied (Litvin et al. 2008; Hu et al. 2009). There may also be a purchasing bias, i.e. customers who have a favorable disposition towards a specific hotel are more likely to book that hotel and thus have the opportunity to review it after the stay (Hu et al. 2009). Another source of bias is fake reviews, aimed at either improving a company’s reputation or giving an unfair view of a competitor (Dellarocas 2006; Hu et al. 2012; Ott et al. 2012). One estimate, based on data-mining analyses of Amazon, indicates that as many as one third of all reviews may be fake (Streitfeld 2012). It has also been shown that higher ratings from predecessors make subsequent reviewers more likely to enter a high rating as well, i.e. a herding phenomenon occurs (Lee et al. 2015).

However, despite these sources of bias, internet users find online reviews more trustworthy and credible than commercial sources, e.g. common marketing documents from travel agencies (Litvin et al. 2008).

If the reader knows the reviewer of an eWOM personally, that would intuitively be almost the same as receiving the review in person. On a site like TripAdvisor most of the reviewers are unknown to the reader and are in that sense anonymous. However, some information about the reviewer, e.g. country of origin, could contribute to profiling the reviewer even though the reviewer remains anonymous and unknown to the reader. How the credibility of a review is affected by the identity of the reviewer, i.e. whether the reviewer is completely anonymous or accompanied by information that gives a profile, is relatively unexplored (Lee et al. 2011). One previous study indicates that negative reviews are perceived as more credible than positive ones, even though the initial trust may be higher for positive reviews. However, the differences found are significant only when the reviewer’s identity is disclosed (Kusumasondjaja et al. 2012). Another study argues that the identity of the reviewer may be related to credibility based on both trustworthiness and expertise, which in turn affects attitude and intention (Ayeh et al. 2013). It is also concluded that identity-relevant information about reviewers is important for community members’ judgement of the products and of the reviews themselves (Forman et al. 2008). The profile of a reviewer could be based on a reputation cue, i.e. how helpful previous reviews from that reviewer have been perceived to be by other readers. Furthermore, a profile could include personal information and a picture. A picture may make the impersonal reading of an online review somewhat closer to a face-to-face conversation with a friend (Xu 2014). Reputation seems to influence the perceived credibility (Xu 2014).
A recommendation for websites with customer-generated content is to provide more signals that would help readers to assess a reviewer’s credibility (Filieri 2016). Intuitively, a review made by a good friend has high credibility and may be worth more than an anonymous review. This assumption is supported by a previous study showing that the herding effect is smaller among friends and that the impact of the anonymous crowd decreases as the volume of reviews made by friends increases (Lee et al. 2015). The presentation of previous research given above illustrates the complexity of eWOM and a lack of research that highlights differences between eWOM and traditional WOM (Porter 2017) and how these differences are related to impact.

3 Conceptual framework and hypotheses

3.1 Word of mouth: online and face-to-face

Electronic word of mouth could be defined as internet-mediated peer-to-peer recommendations and opinions (Dellarocas et al. 2007). The digital technique for mediating eWOM could be anything from small weblogs and chat rooms to large commercialized platforms like TripAdvisor, which is currently the most visible third-party eWOM site in the hospitality industry. The reasons for taking part in an eWOM community vary with the type of community, but could include information sharing, socializing, information exchange, friendship, social support, and recreation (Ridings and Gefen 2004). Factors like “sense of community” and contributing by helping others are also popular motivations for being active in a community. Perceived customer satisfaction may also be an incentive, especially for customers who are either extremely satisfied or dissatisfied (Litvin et al. 2008; Hu et al. 2009; Cantallops and Salvi 2014). The fundamental advantages of eWOM, i.e. that it is convenient, fast and follows a one-to-many logic (Phelps et al. 2004; Sun et al. 2006), make it, intuitively, an efficient distributor and a potential influencer.

3.2 Influence of eWOM and manager’s response

It is claimed that word of mouth is the dominant information source in the purchase decision process (Litvin et al. 2008). Word of mouth may be extra important in the travel industry since the products are intangible and difficult to evaluate prior to the actual consumption. The overall valence, i.e. whether the majority of the reviews are positive or negative, has an impact on product evaluation and purchase intention (Vermeulen and Seegers 2009; Sparks and Browning 2011; Browning et al. 2013). Vermeulen and Seegers (2009) found that the impact was greater for lesser-known hotels. Consumers seem to be more influenced by recent reviews. In the work by Sparks and Browning (2011), it was shown that a positive framing (the two most recent reviews were positive) can produce a higher booking intention than a negative framing (the two most recent reviews were negative). They also noted an interaction effect between overall valence and framing, implying that recent positive reviews could to some extent compensate for an overall negative valence. In this study, we have included positive and negative overall valence and framing as experimental factors. This gives us the possibility to confirm these previous research results.

Research on whether it is favorable to respond to a negative review is rather equivocal. By ignoring the unwanted situation and acting as if nothing serious has happened, individuals may be able to mitigate negative consequences (McLaughun et al. 1983). A passive strategy could, however, damage the company’s image (Lee and Song 2010). One study shows that overall valence has an impact on purchasing intention, but that a manager’s response to reviews may even have a negative impact on purchasing intention (Mauri and Minazzi 2013). A more recent study gives the opposite message, i.e. that a response is favorable compared to no response, and that a response should be timely and use a “human voice” (Sparks et al. 2016).

In this study we want to further explore the effect of a response to a negative review, and how this may be mediated by subsequent negative reviews with the same content. According to traditional service quality research, the perceived service quality is related to the agreement between expected service quality and perceived service quality (Grönroos 1984). When a manager responds to a negative review and promises to take some action to prevent repetition of the failure, customers do not expect to see the same kind of criticism again. Consequently, if negative reviews with the same complaint continue to appear after such a response, this could generate an image of low service quality, i.e. the manager does not seem to deliver the promised improvements. We include response (or not) to both older and recent complaints as an experimental factor, enabling us to study the importance of a reply.

3.3 Social comparison, credibility and hypotheses

The research field of hospitality and tourism management has to a large extent used social psychology as a theoretical foundation (Tang 2014). One of the cornerstones of social psychology, which is also important considering the aim of this study, is social comparison. The theory of social comparison is based on the belief that individuals have a drive to evaluate their own opinions and beliefs (Festinger 1954). Furthermore, the theory described by Festinger explicates how self-evaluations are refined when compared with others’ opinions. One important conclusion is that comparisons with others tend to diminish if the others belong to a group whose common opinions differ from one’s own, and if there is a range of possible persons for comparison, then the comparison will be made with someone close to one’s own opinion (Festinger 1954).

Similarity between individuals is referred to as homophily (McPherson et al. 2001), and it is suggested that the credibility of eWOM may increase with perceived homophily, i.e. that the reader feels that the reviewer “is like me” (Ayeh et al. 2013). Consequently, since source credibility is claimed to be the most influential factor (Sotiriadis and Van Zyl 2013), homophily may be an important underlying factor affecting opinion and, in turn, behavior.

Credibility as mentioned above, sometimes also referred to as believability or reliability, is often conceptualized with two underlying dimensions: trustworthiness and expertise (Cheung et al. 2008; Ayeh et al. 2013; Park et al. 2014). Third-party websites with user-generated content (UGC) are in general considered more credible than websites provided by the hotel or travel agency (Litvin et al. 2008). Whether the reviewer is anonymous, whether any personal information is given, and whether the reviewer has a reputation (a sign of expertise) for delivering helpful reviews may affect credibility (Xu 2014). Beyond personal information about the reviewer, the review itself provides some information that affects credibility, e.g. writing style, content, valence, and whether it seems to be consistent with other reviews or not (Kusumasondjaja et al. 2012; Ayeh et al. 2013; Mauri and Minazzi 2013; Xu 2014; Filieri 2016).

In sum, a vital factor for WOM and eWOM is credibility, and there seem to be three important aspects considered when the credibility of WOM is evaluated: where it was given, by whom, and its content. Considering eWOM on a third-party site, we argue that the degree of homophily may be low and that the credibility of a single review may be low. It is also difficult for a potential customer to judge whether a review comes from “someone like me” or to judge the reviewer’s expertise. The situation with WOM from a good friend is completely different. The degree of homophily is usually known and high, which implies high credibility. The fact that the homophily is known may include knowledge about the good friend’s preferences regarding a hotel and how similar they are to one’s own preferences, which makes it possible to calibrate for potential differences. A good friend’s travelling habits and level of expertise are also known and possible to account for. Thus, if we only consider a single review, it is rather clear that WOM is superior to eWOM and that a single WOM has higher impact on booking intention than a single eWOM. However, the large number of eWOM available on a travel site makes the comparison more complex. A third-party travel site may contain a large number of reviews, which together may carry considerable aggregated credibility and impact on booking intention.

Given the arguments above, we believe that a good friend’s review will change the booking intention previously based on the online majority. If the good friend’s review agrees with the online majority, we expect that the common opinion will be somewhat enhanced and that booking intention will increase (given a joint positive opinion) or decrease (given a joint negative opinion). However, in order to contrast the power of eWOM and WOM, we are primarily interested in how a good friend’s review could alter the booking intention if the good friend’s opinion is opposite to the online majority. Due to the high degree of homophily and credibility, we hypothesize that the booking intention will be altered to some extent. Beyond testing the hypotheses given below, we also want to estimate the magnitude of such an alteration.

H1: A good friend’s positive review compensates for a negative online majority and thereby increases the booking intention to some extent.

H2: A good friend’s negative review compensates for a positive online majority and thereby decreases the booking intention to some extent.

4 Method

4.1 Responders

A number of previous research studies use samples of students (Mauri and Minazzi 2013; Xie et al. 2011; Park et al. 2014) and there is a call for more research on non-student populations (Min et al. 2015). In our study, participants in an online course were engaged as questionnaire distributors rather than responders, and all participants were asked to deliver exactly 15 questionnaires. According to a previous study, participants in this online course are spread all over Sweden and are on average more than 10 years older than students taking the course on campus, and roughly 80% of the participants work at least part time (Gellerstedt et al. 2014). Thus, it is likely that these participants chose family members, friends and colleagues as responders instead of other students, which might be expected on a campus course. The sampling procedure was also stratified: each participant was instructed not to choose other participants, to have a balanced gender distribution (7 or 8 males), and to choose five responders from each age span: 18–39, 40–59 and 60+. Each of these three age intervals includes roughly one third of all individuals in Sweden above 18 years of age. This strategy for collecting questionnaires using participants in a course was used three times, aiming at collecting around 1000 responders. Logistically, we had the possibility to run the sampling in 2014, 2016 and 2018, which engaged 60, 31 and 32 participants, respectively. In total this corresponds to 123 participants delivering 1845 questionnaires, of which 1319 were completed (71.5% response rate). Sweden is suitable for studying behavior on the internet due to its high degree of digitalization. According to a recent report, 94% of all households in Sweden have internet access, nearly everyone below 65 years of age uses the internet, and 91% of those aged 65+ use the internet (Internetstiftelsen 2018).
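The sampling figures above can be checked with a few lines of arithmetic. The sketch below only recomputes numbers stated in the text; the variable names are our own and not part of the study material.

```python
# Figures taken from the text; variable names are illustrative only.
distributors_per_year = {2014: 60, 2016: 31, 2018: 32}
questionnaires_per_distributor = 15
returned = 1319

distributors = sum(distributors_per_year.values())
distributed = distributors * questionnaires_per_distributor
response_rate = returned / distributed

# Quota each distributor was asked to fill: five responders per age span.
age_quota = {"18-39": 5, "40-59": 5, "60+": 5}
assert sum(age_quota.values()) == questionnaires_per_distributor

print(distributors, distributed, f"{response_rate:.1%}")
```

The three quantities printed correspond to the number of distributors, the number of questionnaires handed out and the response rate reported above.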

Among the 1319 responders, 46% were male and 54% female (excluding 63 persons who did not disclose their sex); 41% were aged 18–39, 33% were 40–59 and 26% were 60+. Thus, our sample deviates somewhat from the general population, with a higher proportion of females and young people. This is, however, in line with the profile of people who book online, who are more frequently female and young (Cantallops and Salvi 2014). The proportion of responders with higher education (university/college studies) was 59%, which is higher than in the corresponding population (43% in 2017). Moreover, 82% of our responders booked online the last time they booked a hotel and 68% read previous guest reviews before booking, of whom 90% claimed to be affected by these reviews. These figures are comparable with previous reports, e.g. that roughly 60% of all customers consult online reviews before booking (Mauri and Minazzi 2013) and that 84% are affected by reviews (Vermeulen and Seegers 2009). Overall, we believe that our responders have experience and knowledge matching the purpose of this study.

4.2 Design

This study is to a large extent inspired by the impressive work done by Sparks and Browning (2011). We used the same experimental design and a fictive hotel site displaying twelve guest reviews. The twelve reviews were arranged in two sections. The first section, including six of the reviews, was titled: latest reviews. The second section (lower down on the page) was titled: older reviews (at least a month old). The page with reviews mimicked a web page of a hotel, including a picture of the hotel and a general description of the hotel: “centrally located 3.5 star hotel, all rooms equipped with bath, wifi, coffee machine”. See “Appendix” for a typical fictive hotel page for one of the experimental situations. Naturally, the simulated web page looked exactly the same in all experimental situations except for the included reviews. The questionnaire was distributed online using an inbuilt procedure for randomization of the different experimental situations. We had three experimental factors. First, the overall valence, i.e. whether the majority of reviews were positive or negative (two possible values: positive or negative) [factor: overall valence]. Second, the two latest reviews (the first two reviews from the top of the fictive review page) were either both positive or both negative (two possible “frames”: positive or negative) [factor: frame]. Finally, the reviews could be without any reply from the hotel, or have a reply to an old complaint or to the most recent complaint, thus three different versions of reply [factor: reply]. These three experimental factors, overall valence (2 possible values), frame (2 possible values) and reply (3 possibilities), generate in total 2 × 2 × 3 = 12 different situations. The example in “Appendix” illustrates the combination overall valence: negative, frame: positive and reply: reply to an older complaint.
The questionnaire started with some basic demographic questions (gender, age, and level of education: elementary/high school/college). The responder was then informed that the next page would show a page with previous customers’ reviews (see example page in “Appendix”), and was asked to read all reviews thoroughly. In the next step the responder was asked “How did you perceive the previous guests’ reviews in general”, as a control that the experimental situation was noted by the responder. Thereafter the questionnaire continued with a statement about booking intention: “Assume that you are travelling to the city where this hotel is located. After reading the reviews, it is very likely that I would book a room at this hotel”, accompanied by the shorter question: “Would you consider booking a room at this hotel”. Thereafter some items related to the service quality of this hotel were addressed (unpublished data), and then the question of booking intention was repeated twice, with a good friend’s recommendation of, or advice against, the hotel as added information. The questionnaire ended with some questions about experience of and opinions about using previous guests’ reviews as a basis for decisions (unpublished data). The questions addressing booking intention are discussed in more detail in the section on dependent variables.
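
The full factorial structure described in this section can be made concrete by enumerating the cells. The following is a minimal sketch: the level labels are our own shorthand, and the `assign` function is a hypothetical simple randomization, since the text only states that an inbuilt randomization procedure of the survey tool was used.

```python
import itertools
import random

# Factor levels as described in the text; the labels are our own shorthand.
FACTORS = {
    "overall_valence": ["positive", "negative"],
    "frame": ["positive", "negative"],
    "reply": ["none", "on_older_complaint", "on_latest_complaint"],
}

def all_conditions():
    """Return every combination of factor levels as a dict."""
    names = list(FACTORS)
    return [dict(zip(names, levels))
            for levels in itertools.product(*FACTORS.values())]

def assign(respondent_ids, seed=0):
    """ASSUMPTION: simple randomization, one condition drawn per respondent.
    The actual procedure of the survey tool is not described in the text."""
    rng = random.Random(seed)
    conditions = all_conditions()
    return {rid: rng.choice(conditions) for rid in respondent_ids}

conditions = all_conditions()
print(len(conditions))  # number of experimental situations

# The combination illustrated in the Appendix example.
appendix_example = {
    "overall_valence": "negative",
    "frame": "positive",
    "reply": "on_older_complaint",
}
print(appendix_example in conditions)
```

Simple randomization does not guarantee equal cell sizes; a blocked scheme would, but which of the two the survey tool applied is not stated.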

4.3 Construction of reviews

One of the most common reasons for dissatisfaction among guests is failure to deliver service quality (Browning et al. 2013). Previous research shows that the valence of guest reviews regarding service has an impact on booking intention (Sparks and Browning 2011). The same study showed no significant main effect when comparing reviews targeting service with reviews targeting core features of the hotel (size of rooms, cleaning quality, etc.), nor were ratings included in the reviews significant. Due to the importance of service quality, and considering previous research results and the ambition to keep the experimental design as pure and simple as possible, we chose to target only service quality in the reviews. We scrutinized TripAdvisor and used a number of suitable reviews as inspiration. The reviews that we used were rephrased slightly and translated into Swedish. Typical reviews included phrases like: “really service minded”, “friendly”, “helpful staff”, “not at all helpful”, “low service level”, “bad attitude”, etc.

4.4 Valence of reviews

Each experimental situation, i.e. the fictive hotel page displaying twelve customer reviews, had either an overall positive valence (8 out of 12 reviews were positive) or an overall negative valence (8 out of 12 reviews were negative). The positive and negative reviews were of roughly equal length and in corresponding pairs, i.e. a positive review could have the title “good personal service” and include sentences like “very helpful and friendly staff”, while the corresponding negative review had the title “bad personal service” and included the sentence: “hard to get some help and not that friendly staff”.

4.5 Frame

Because previous research points out the importance of the latest reviews, we adopted the same approach as Sparks and Browning (2011) and started each page with either two positive reviews or two negative reviews (the latest reviews).

4.6 Reply

As pointed out previously, the research results regarding the importance of replying to a negative review are equivocal, and thus we wanted to study this issue further. According to Sparks et al. (2016) a reply could be valuable, and a reply should use a “conversational human voice”. The reply we used was:

Thank you for your valuable review. We are sorry that our service was unsatisfying. We will promptly discuss this with our staff. We hope that you will give us a new chance to provide you with a pleasant stay with friendly staff with a smile on their face, making you get the same smile. You are very welcome to contact me directly if you have any more considerations or suggestions. Kind regards Martin, manager.

This independent variable, “reply”, had three levels. Firstly, the reply could be given to a review in the second section of reviews, i.e. the section with the title “older reviews”, ensuring that there were negative reviews given also after this single reply. Secondly, a page with reviews could be left completely without any reply. Thirdly, a page could include one single reply to the latest negative review (ensuring that no negative review was given after this reply).

4.7 Dependent variables

We chose to focus on booking intention as the dependent variable. The number of bookings can be heavily affected by reviews, which makes booking intention crucial. In order to enable comparisons we measured booking intention in the same fashion as Sparks and Browning (2011). The statement, which is our primary variable, reads as follows:

Assume that you are travelling to the city where this hotel is located. After reading the reviews, it is very likely that I would book a room at this hotel (response scale: 1 = Strongly disagree to 7 = Strongly agree).

In order to evaluate the seven different points of the scale, we also used the straightforward question: “Would you consider booking a room at this hotel” (yes/no). As expected, there is a strong relationship between these two variables (p < 0.001, Cramér’s V = 0.7), and the relationship illustrates the differences between the points on the seven-point scale. There is a huge step between points 2 and 3, i.e. an increase of 41 percentage points in %Yes answers, while there are only small differences between the three highest points (see Table 1). It is rather obvious that an increase in the lower and middle part of the scale is vital in comparison to a change in the upper part of the scale. Previous research reports mean values in the middle part of the scale, which may make the obtained differences important in terms of %Yes answers regarding booking consideration.

Table 1 Proportion of responders who would consider booking a room (%Yes) for each point of the seven-point booking intention scale

To study whether a good friend’s recommendation affects the booking intention, we added the statement: “Now assume that a good friend of yours recommends the hotel and has mentioned that the service is good”, after which the seven-point booking intention question was repeated. In the 2018 questionnaire, we also added the assumption that a good friend advises against the hotel and mentions that the service is bad, followed by the booking intention question.

4.8 Statistical analyses, reliability and validity

For analyzing the experimental factors overall valence, frame and reply and their potential impact on booking intention (seven-point scale), we used a standard 2 × 2 × 3 factorial ANOVA including interactions. Due to the non-representative sample, we added gender, age group and education as factors. No interactions between demographics and experimental factors were significant. For the purpose of checking reliability, we ran an analysis including year in the model. There were no significant interactions between year and experimental factors. Since there were no interactions between demographics/year and experimental factors, we chose to present only the standard model with experimental factors as main and interaction (two- and three-way) effects. We performed residual analyses and found symmetrically distributed residuals, no disturbing extreme values, and no heteroscedasticity. Thus, we found no evidence of model violations. We also used the yes/no question regarding booking consideration as a dependent variable in a logistic regression.
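
For reference, the standard model with main effects and two- and three-way interactions can be written as follows. The notation is our own (the text does not give a formula): i indexes overall valence, j frame, k reply and l the respondent within a cell, and the error assumption is the conventional one for ANOVA rather than something stated explicitly in the text.

```latex
\[
Y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k
         + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk}
         + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl},
\qquad i, j \in \{1, 2\},\; k \in \{1, 2, 3\},
\]
\[
\varepsilon_{ijkl} \sim N(0, \sigma^2) \text{ independently.}
\]
```

Here Y is the booking intention on the seven-point scale, and the residual checks reported above concern the error term.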

For analyzing how the booking intention was affected by a good friend’s recommendation, we used a two-way ANOVA with the experimental factors overall valence, frame and reply as between-respondents factors and the booking intention before and after a good friend’s recommendation as a within-respondent factor. The same model was used for analyzing a good friend’s advice against the hotel.

The major strengths with this study are the experimental design and a large sample size. As a reliability check we analyzed the results for each year the survey was distributed (2014, 2016 and 2018). Descriptive statistics showed consistent results over all years and in line with the non-significant interaction effects in the ANOVA-analysis. Thus, that the between years reliability (“repeatability”) was high. Several previous studies used students as responders and to our knowledge there are no studies with pure independent random samples from a general population, regarding this study aim. We used participants on an online course as distributors of the questionnaire rather than respondents. It is likely that the participants asked colleagues, friends and family members. This may be an explanation to the relatively high response rate (71.5%), compared to independent random samples. It may be assumed that a person is willing to respond to a questionnaire if the request comes from a friend or colleague, no matter if the responder is interested in the subject or not. Thus, the biased caused by the fact that people with a certain opinion and interest in the subject are more likely to respond than others, may not be of the same magnitude in this study as compared to an independent random sample study. However, our sample turned out to differ somewhat from the intended population. But as pointed out above, there was however no interaction between gender, age, education and the experimental factors, indicating that the results would be the same if the sample would be perfectly representative regarding these demographics. As will be shown in the next section our results confirm previous findings and this homogeneity indicates validity agreement. The dependent variable booking intention (seven-point graded scale) has been used in previous research which enables comparisons. We believe that the question is straightforward and has a strong face validity. 
We were, however, concerned about the seven-point ordinal scale and the magnitude of differences between the different steps on that scale. The subsequent yes/no question (“Would you consider booking a room at this hotel”) showed that the lower and middle parts of the seven-point scale were crucial. This evaluation makes it easier to discuss the potential practical impact of significant effects on the seven-point scale, that is, whether statistically significant effects are also of practical significance, which is a form of relevance validation. In sum, we conclude that the study has high validity, reliability and context relevance.

5 Results

5.1 Did the manipulation of the experimental factors work?

After reading the fictitious hotel page and the guest reviews, the responder was asked: how did you perceive the previous guests’ reviews in general? (scale: −3, −2, −1, 0, 1, 2, 3 with anchor descriptions: −3 = entirely negative and 3 = entirely positive). There was a significant relationship between valence and perception (p < 0.001, Chi-square test), which confirms that the manipulation had an effect (Cramér’s V = 0.6, a “large effect”); see Table 2 for details. Roughly 6 out of 10 of the responders reading reviews with a positive valence perceived the reviews as positive, 21% as neutral and 18% as negative. The responders reading reviews with a negative valence had a higher degree of corresponding perception, with 76% perceiving the reviews as negative, 14% as neutral and 10% as positive. A positive frame could strengthen a positive valence and increase the proportion perceived as positive from 61 to 70%, while perception under negative valence was unaffected. A negative frame could decrease the proportion of responders with a positive perception from 61 to 53%, while the perception of responders in the negative valence groups remained stable. Thus, frame seems to be important for experimental situations with a positive valence but not for situations with a negative valence.
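As a rough illustration of how the effect size reported above relates to a contingency table, the following sketch computes the Chi-square statistic and Cramér’s V in plain Python. The counts are hypothetical: they assume 100 responders per valence group and use the rounded percentages quoted above (18/21/61 and 76/14/10) rather than the actual cell counts of Table 2, and the function names are illustrative only.

```python
from math import sqrt

def chi_square(table):
    """Pearson Chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def cramers_v(table):
    """Cramér's V: sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return sqrt(chi_square(table) / (n * k))

# Hypothetical counts (negative, neutral, positive perception) per 100 responders.
hypothetical = [
    [18, 21, 61],  # positive valence (assumed group size 100)
    [76, 14, 10],  # negative valence (assumed group size 100)
]
print(round(cramers_v(hypothetical), 2))
```

With these assumed counts the value comes out close to 0.6, in line with the reported effect size, although the exact figure depends on the true cell counts.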

Table 2 Perception of guest reviews overall (response alternatives −3 to −1 categorized as “Negative” and 1–3 as “Positive”)

There was no significant relationship between the different levels of reply and how the respondents perceived the guest reviews in general, even though there was a small difference between replying to the latest complaint (39% of responders perceived the reviews as positive) and replying to an old review or not replying (33% and 34%, respectively, perceived the reviews as positive; p > 0.20). Thus, we cannot claim that the level of reply affected the responders’ perception of the given guest reviews.

5.2 The effect of valence, frame and reply on booking intention

The ANOVA gave significant main effects for overall valence (p < 0.001, η² = 0.208) and frame (p < 0.001, η² = 0.020), but not for reply (p > 0.20, η² = 0.002). Furthermore, there was only one significant interaction effect, namely valence × frame (p = 0.002, η² = 0.007); see Table 3 for more details. As indicated by the effect sizes (η²), the effect of overall valence, having a medium effect size, was the dominant effect compared to the small effect size of frame, which is consistent with previous research. The average booking intention for positive valence was 4.14, which was 1.36 units higher than the average for negative valence, 2.78 (Table 4). This corresponds to a difference between 76.2% and 42.1%, i.e. 34.1 percentage points, in the proportion of respondents who would consider booking a room at the hotel (Table 4). The difference between negative and positive frame was lower: 2.86 − 2.70 = 0.16, see Table 4. Regarding the significant interaction effect, the difference between negative and positive frame was higher within positive valence than within negative valence. As seen in Table 4, the difference between a negative and positive frame within positive valence was 4.45 − 3.84 = 0.61 (simple effect test: p < 0.001, η² = 0.026), while it was 2.86 − 2.70 = 0.16 (simple effect test: p = 0.138, η² = 0.002) within negative valence. The difference of 0.16 was thus not significant. In a logistic regression analysis of the proportion of responders who would consider booking a room at this hotel, the main effect of valence was significant (p < 0.001, odds ratio = 4.1), as was the main effect of frame (p = 0.029, odds ratio = 1.8), but neither reply nor any of the interaction effects was significant. Table 4 illustrates the estimated proportions based on the logistic regression model. The total difference between a negative and positive frame was 64.9% − 53.7% = 11.2 percentage points. The effect of frame within negative valence was 46.8% − 37.8% = 9 percentage points.
The effect of frame within positive valence was 82.5% − 70% = 12.5 percentage points. In other words, framing had roughly the same effect regardless of valence, i.e. there was no interaction effect; see Table 4 for details. In sum, regarding the primary variable booking intention (seven-point scale), the dependency on overall valence and frame, including an interaction, confirms previous research.
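The percentage-point differences above and the odds ratios from the logistic regression are two ways of expressing the same kind of contrast. The sketch below, with illustrative function names, shows the arithmetic for a pair of proportions. An odds ratio computed this way from two proportions is unadjusted, so it is not expected to reproduce the model-based odds ratios reported above.

```python
def percentage_point_diff(p_high, p_low):
    """Difference between two percentages, in percentage points."""
    return p_high - p_low

def odds(p):
    """Convert a proportion (0 < p < 1) to odds."""
    return p / (1.0 - p)

def odds_ratio(p_a, p_b):
    """Unadjusted odds ratio of proportion p_a relative to p_b."""
    return odds(p_a) / odds(p_b)

# Frame contrasts quoted in the text (percent who would consider booking),
# stored as (higher, lower) without assuming which frame is which.
contrasts = {
    "total": (64.9, 53.7),
    "negative valence": (46.8, 37.8),
    "positive valence": (82.5, 70.0),
}
for label, (higher, lower) in contrasts.items():
    diff = percentage_point_diff(higher, lower)
    ratio = odds_ratio(higher / 100, lower / 100)
    print(f"{label}: {diff:.1f} percentage points, unadjusted odds ratio {ratio:.2f}")
```

Keeping the two measures side by side makes it easier to see that similar percentage-point differences can correspond to different odds ratios, depending on the baseline proportion.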

Table 3 ANOVA table
Table 4 Average booking intention by valence and frame

5.3 Main results: the effect of a good friend’s recommendation or advice against the hotel

The two-way ANOVA (including all interactions), with the difference between booking intention with and without a good friend’s advice as a within-respondent variable and the other experimental factors as between-respondent factors, showed that a friend’s recommendation had a significant effect regardless of whether the responder belonged to the group with negative or positive valence; see Table 5. There was only one significant interaction, i.e. the influence of a good friend’s recommendation was stronger for negative valence than for positive valence (p < 0.001). It is interesting to note that the average booking intention after a good friend’s recommendation is 4.3 for responders with negative valence, which is actually slightly higher than 4.1, the initial booking intention in the groups with positive valence. In other words, one single good friend’s recommendation could outweigh a negative valence and even surpass the booking intention for positive valence.

Table 5 The influence of a good friend’s recommendation or advice against booking

The opposite result was found when the good friend advised against booking the hotel; see Table 5. The good friend had a significant impact for both positive and negative valence. As in the previous situation, the impact was greatest when the good friend’s opinion differed from the valence (p < 0.001). In this case, a good friend’s advice against booking lowered the booking intention by 1.5 and 1.1 units on average for positive and negative valence, respectively (p values below 0.001). Moreover, a good friend’s advice against booking the hotel could outweigh a positive valence and lower the booking intention to roughly the same level as when the valence is negative.

To sum up, our results confirm previous research, and the hypotheses related to our primary aim were confirmed, as summarized in Table 6 below:

Table 6 Summary of hypotheses and conclusions

6 Discussion

This study confirms previous research and shows that overall valence influences booking intention. Overall valence had the highest impact among the experimental factors. The proportion of responders who would consider booking a room at the hotel increased by as much as 34.1 percentage points. The magnitude of the difference between negative and positive valence indicates a large impact for a hotel. This can be compared with the estimate that a 10% increase in average rating could increase sales by 4.4% (Ye et al. 2009). The effect of frame was also significant, even though it was not of the same magnitude. Interestingly, the effect sizes (η²) were larger than those estimated in Sparks and Browning (2011), especially for overall valence (0.208 as compared to 0.026) but also for frame (0.020 as compared to 0.009). One possible reason for this might be the internet maturity level in Sweden as well as the fact that this study is based on data from 3 to 7 years later. Since reply had no significant effect on the booking intention, we reran all analyses, in an exploratory manner, excluding reply as an experimental factor. This had no effect on the results for overall valence and frame or for the interaction between overall valence and frame.

This study also contributes methodologically. The seven-point graded scale used for measuring booking intention has been used in previous research, and in our opinion the question is straightforward with high face validity. Only 9 of the 1319 responders (0.7%) did not reply to this question. When measuring abstract phenomena, it is usually wise to use a number of items for constructing an index, but we believe that this single question captures “booking intention” in a straightforward and easy way. However, even if the question (or statement) is distinct, the scale for responding to it may not be equally explicit. In this case one may wonder whether it would have been better to use a scale from 0 to 100%, since the question is about making a probability judgement. We did, however, evaluate the scale by comparing the yes/no question (“Would you consider booking a room at this hotel”) with the seven-point scale and found that the steps 2–3 and 3–4 on the seven-point graded scale are the most “dramatic” steps. Notably, it is exactly in this part of the scale that we find our effects. In other words, changes in this part of the scale are of higher practical importance than changes from 5 to 6 or from 6 to 7. The most dominant factor is overall valence; hence, changing from a negative overall valence to a positive one should be of great concern for hotel managers. The experimental factor of the response from the hotel was not significant. We also analyzed the three pairwise comparisons between the three levels of this factor, but there were no significant differences. Due to the large sample size, the risk of a type II error is low, so the non-significant results indicate either that responding or not responding (to either an old or a new complaint) has no practical effect, or that our experimental factor was poorly designed and did not convince our respondents.
One could speculate that if a response were followed by more positive reviews than before, this would give some credit to the response and the efforts made by the hotel. This was, however, not included in our experimental design.

We expected that a good friend’s opinion would have an impact. We anticipated that in situations where the good friend’s opinion is opposite to the online majority, this would compensate the booking intention to some extent. This was confirmed in our tests. We were, however, surprised by the magnitude of the effect. We had not expected that a single good friend could outweigh the online majority, i.e. that a positive good friend and a negative online majority give the same booking intention as a positive online majority (no advice from a good friend). And, the other way around, a negative good friend and a positive online majority give the same booking intention as a negative online majority (no advice from a good friend). It is interesting to consider why one single good friend could have the same impact as the online majority. As pointed out previously, credibility depends on trustworthiness and expertise. Since it is a good friend, trustworthiness should not be a problem. And since it is a good friend, it may be easier to evaluate the expertise. Good friends have at least a fairly good knowledge of each other’s preferences and can calibrate judgements. For instance, if a person knows that the good friend is pickier regarding hotel service, this can be taken into account when considering the hotel. The good friend’s expertise in terms of travelling habits, frequency of hotel visits, type of travels, etc. is also known, which makes it easier to assess and take into account the level of expertise. A good friend could be regarded as a kind of customized expert who can give a customized opinion. This is a kind of customized credibility. Source credibility is claimed to be the most important predictor of whether the information is going to be used or not (Ayeh et al. 2013). Furthermore, Ayeh et al. (2013) show that homophily affects both trustworthiness and expertise, and thereby credibility.
Two good friends are likely to share common interests and think alike. In an online setting, homophily relates to the extent to which a community attracts users with the same interests and the same mindset. A community like TripAdvisor attracts users interested in travelling, which makes for a wide community, implying that the level of homophily between different users may be rather low. Thus, one possible explanation is that the sum of the opinions given by several persons with a low level of homophily, and thus a low level of credibility, is actually outweighed by a good friend due to the higher level of homophily and thereby credibility.

Interestingly, the interaction between frame and valence shows that the effect of frame on booking intention is highest when valence is positive. This is contrary to the results of Sparks and Browning (2011), where the effect of frame on booking intention was lowest when valence was positive. This calls for further research. There are several plausible reasons for this that need to be investigated further. These include cultural differences between Sweden and Australia and differences in the design of the simulated web pages, making the framing more dominant in our case, where there is a clearer division between new and older reviews. Other, more speculative explanations might be differences in the availability of hotels with online reviews, which makes customers more prone to change booking intention when bad reviews are present, and that potential hotel guests are now more used to using online reviews and know that there are alternative hotels just a click away. One should, however, note that the effect size of the interaction is so small that the practical effect on booking is negligible.

7 Implications for further research and limitations

We believe that the impact a good friend had on booking intention compared to a majority online, and the possible explanation in terms of level of homophily, touch on an interesting research question. We use the concept of “peers” in a number of situations, not least in academia, where mutual understanding is important in the review process. The mutual benefit of peer-to-peer cooperation may depend on the proximity in interest, knowledge and mindset, i.e. the level of homophily. A higher degree of homophily may increase the mutual benefit, engagement and credibility. An example where homophily is used for marketing purposes is the Facebook lookalike audience, which helps marketers to identify potential customers with a high level of homophily related to the product. Another example is the cooperation between Netflix and Facebook allowing Facebook friends to share movie and series reviews with each other. Filieri (2016) recommends that consumer review websites provide more signals that would help consumers to assess reviewers’ trustworthiness; furthermore, the identity may be important (Kusumasondjaja et al. 2012; Ayeh et al. 2013).

These recommendations may be further developed to also suggest that such websites could help a user to identify other homophilous users. For instance, if an overall rating is used together with several other judgements of attributes (level of service, space, clean rooms, amenities, etc.), then it would be possible on an individual level, at least after a number of reviews, to estimate how much influence each attribute has on the overall rating. In other words, it would be possible to characterize frequent users’ preferences and which attributes are most important for a high overall rating. Given such a profile, a simple algorithm could identify several frequent travelers with the same preferences, i.e. homophilous travelers, and offer reviews from travelers who favor the same kind of hotels. There are also other data analytics possibilities for identifying travelers with a high level of homophily, e.g. by finding travelers who have stayed at the same hotel previously and have written similar reviews. If a site also offers users the possibility to add information about other interests, such as culture, sport, activities, food, adventure, spa and relaxation, this would increase the possibilities of finding homophilous travelers who share the same interests and have the same preferences for a hotel stay. An example where homophily is used for marketing purposes is the “Facebook lookalike audience” service, which helps marketers to identify persons who are likely to become new customers due to a high level of homophily with already existing customers. A potential problem is that a traveler may want to have several different profiles, depending on the purpose of the trip. Preferences for a hotel may look rather different depending on the purpose of the stay, e.g. whether it is business or pleasure, staying alone or with family.
For instance, the choice of hotel in Munich may be completely different when booking for a business trip on your own, compared to visiting with friends to attend the Oktoberfest or skiing with the family in winter. Another issue is privacy and the willingness to share a profile. An intuitive hypothesis is that increased homophily also increases the probability that customers who based their decision on these reviews find them to be correct when consuming the product, i.e. it is more likely that you find a suitable product if you take advice from persons with similar preferences.
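The profiling idea outlined above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not a method used in this study: the attribute names are hypothetical, the influence of each attribute on the overall rating is approximated by a simple Pearson correlation (a stand-in for the regression-type estimate the text alludes to), and similarity between travelers is measured with cosine similarity.

```python
from math import sqrt

# Hypothetical attribute names; a real site would define its own.
ATTRIBUTES = ["service", "space", "cleanliness", "amenities"]

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def preference_profile(reviews):
    """reviews: list of (attribute_ratings dict, overall rating) for one user."""
    overall = [rating for _, rating in reviews]
    return [pearson([attrs[a] for attrs, _ in reviews], overall) for a in ATTRIBUTES]

def cosine(u, v):
    """Cosine similarity between two profiles; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def most_similar(target, candidates):
    """candidates: dict of user name -> profile; returns names, most similar first."""
    return sorted(candidates, key=lambda name: cosine(target, candidates[name]), reverse=True)
```

A site could compute one such profile per trip purpose (business, family, and so on) to address the concern that the same traveler may have different preferences on different trips.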

Another issue that has been discussed within the research group, and which seems to be rather unexplored in research, is how reviews are used. Does a customer first collect a number of potential hotels, i.e. similar hotels worth considering (making the question about consideration used in this study important), and after that use the reviews to discriminate between them and make the final choice? Does a customer scrutinize a number of hotels, choose one hotel, but before booking just use the reviews as a check that there are no big warning signs? Or are the reviews used from the beginning to find a group of hotels worth considering, with price etc. checked thereafter? If a number of friends suggest different hotels, will the reviews be used as an objective guide? In sum, we suggest that homophily is worth further exploration in online communities intended for peers. We also believe that more knowledge is needed about how the reviews are used.

Naturally, booking intention depends on many other factors beyond overall valence, frame, reply and a good friend’s advice, and therefore the booking intention varies between responders even given the same experimental situation. However, the strength of a randomized study is that all these factors, some known and some unknown, are expected to be evenly distributed across the experimental groups. Accordingly, a randomized trial allows us to interpret the effects found as caused by the experimental factors. Still, our experimental setup has some limitations. First of all, we used the same proportion of positive/negative reviews throughout, i.e. eight positive and four negative reviews when the overall valence was positive and vice versa when it was negative. Thus, we do not know how powerful the online opinion would be with a stronger or weaker majority. This could be a research question in forthcoming studies. Secondly, our study only answers the question of whether a good friend could alter a booking intention already affected by the online majority. We did not include the possibility to first declare a booking intention after a good friend’s advice and thereafter read the online opinion. A study altering the order of presentation of these two pieces of information and varying the level of majority would certainly be an interesting follow-up study. Finally, we believe that the factor “Reply” could have been made more explicit in order to increase the attention from the responders. A further limitation is that we did not include explicit questions about how important the responders perceived the reply to be, which could have given important information beyond the experimental factor.

8 Conclusion and practical implications

We found that the word of mouth from a good friend could outweigh the opinion from the online majority. A positive review from a good friend could outweigh a negative online majority, and the other way around: a negative review from a good friend could outweigh a positive online majority.

This study also contributes a deeper understanding of the seven-point graded scale used for booking intention and shows that changes in the response in the lower and middle parts of the scale are essential. Since the overall average of booking intention is in this part of the scale, differences between negative and positive overall valence and the influence of a good friend are practically important. A change from grade 3 to 4 corresponds to a 25 percentage point increase (58–83%) in the probability of answering yes to the question: “Would you consider booking a room at this hotel”. The magnitude of the difference between a negative and positive overall valence was of the same size as the influence from a good friend and is estimated to be more than one unit (from somewhat below 3 to somewhat above 4) on the crucial part of the scale. Thus, we conclude that the effects observed are practically relevant and in turn of financial importance for a hotel business.

For hotel managers, our results emphasize the importance of being active in social media and of analyzing and using online reviews systematically as guidance for improvements. Our study indicated that a negative overall valence of reviews, or a positive overall valence with negative framing, could mean that a potential customer chooses to make an extra click and check another hotel. In other words, as customers become increasingly used to eWOM and can easily browse for alternative hotels, proactive work with service quality and social media becomes more important than ever. Why choose a hotel with some negative reviews when there are comparable hotels without such bad indications?

At the same time, our results show that traditional WOM between friends also has great impact and should not be forgotten. Encouraging satisfied guests to recommend the hotel to good friends is, according to our results, an effective action. One approach to monitoring the effect of strategies and efforts for customer satisfaction is to systematically use and analyze customer satisfaction questionnaires, which frequently include the question “How likely is it that you would recommend this hotel to a good friend”. Thus, working proactively with WOM both online and offline should be prioritized on hotel managers’ agendas.

Our study also has important implications for third-party sites offering reviews of products or services. Strategies for increasing homophily between users, e.g. by collecting information valuable for clustering or using analyses for figuring out preferences, may increase the perceived credibility and thereby the impact of reviews. For instance, third-party sites could offer customers the opportunity to select their top five important features in a hotel, such as reliable Wi-Fi, free Wi-Fi, cleanliness, hotel restaurant quality, training facilities, etc. Using statistical regression models or AI techniques for profiling preferences gives the opportunity to match customers with similar profiles. The customer could then get reviews from customers with a similar important-feature profile, giving a higher level of homophily and thereby potentially increasing the trust in the reviews and helping customers make informed decisions. This sort of “find your twin traveler” service could give a competitive edge to online travel agencies. If third-party sites could offer more tailor-made help, customers would intuitively experience that the information really was helpful and that the reviews were in line with their own experience, i.e. that the reviews were correct. This would increase the trust in, reliability of and use of such help from third-party sites. In a sense, a review from a “twin traveler” may be as valuable as a review from a good friend known to be reliable.
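The “twin traveler” matching described above could, under simple assumptions, be sketched as a set-overlap comparison of the features each customer selects. The Jaccard index, the feature labels, the traveler names and the threshold below are all illustrative assumptions; the text itself only refers to regression models or AI techniques in general terms.

```python
def jaccard(a, b):
    """Jaccard index of two feature collections; 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def twin_travelers(my_features, others, threshold=0.5):
    """Return (name, similarity) pairs at or above the threshold, best first."""
    scored = ((name, jaccard(my_features, feats)) for name, feats in others.items())
    matches = [(name, score) for name, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)

# Hypothetical top-five selections.
me = {"reliable Wi-Fi", "cleanliness", "restaurant", "gym", "quiet rooms"}
others = {
    "traveler_a": {"reliable Wi-Fi", "cleanliness", "restaurant", "gym", "pool"},
    "traveler_b": {"pool", "spa", "bar", "parking", "cleanliness"},
}
print(twin_travelers(me, others))
```

In this toy example only traveler_a, who shares four of five selected features, passes the assumed threshold; a site could then prioritize that traveler’s reviews for the customer.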