1 Introduction

Customer reviews play an important and unique role in E-commerce; a staggering 90 % of people use and monitor reviews in their online purchasing process. However, the overwhelming number of reviews and their inconsistent writing styles demand significant reading effort, and important information tends to slip by. To help users glean information from reviews effectively and efficiently, a number of systems have summarized customer reviews by extracting features and their associated sentiments. From the perspective of customers, online purchasing can be viewed as a decision-making process. Human decision-making theory tells us that the foundation of effective information displays for user decision improvement is a deep understanding of user decision-making behavior. However, no clear picture yet exists of how consumers make purchase decisions in E-commerce, in particular with respect to customer reviews. In this paper, we take online hotel booking as an example to investigate customer decision-making behaviors in three stages of online purchasing: (1) screening out interesting alternatives, (2) evaluating alternatives in detail, and (3) comparing candidates for final choice. Interfaces that aggregate information from customer reviews are developed to support the three stages. Through analysis of the results, we identify the decision strategies users utilize to process information and the information they are inclined to seek at each stage. These findings lay solid groundwork for designing E-commerce interfaces to improve consumer purchase decisions.

2 Related Work

2.1 Literature Research on the Summary of Customer Reviews

Most E-commerce websites, such as Amazon, provide an overall review score for each entity to help users make purchase decisions. However, given that people evaluate whether a product fits their desire in an attribute-driven manner [9], a number of systems have summarized customer reviews by extracting features and the associated sentiment toward each feature. Liu et al. (2005) and Carenini et al. (2009) used bar charts to visualize the sentiment toward each feature [4, 10]. Carenini et al. (2006) summarized reviews in the form of a treemap, representing a feature as a rectangle with nested rectangles corresponding to the descendants of the feature [3]. In addition to numerical ratings, Yatani et al. (2011) used adjective-noun word pairs to summarize the sentiment (adjective) toward each feature (noun) to help users explore reviews in greater detail. Huang et al. (2013) developed a system that automatically highlights sentences related to relevant features, striking a balance between reducing information overload and preserving the original review context [8].

2.2 Three-Stage Decision Making Process of Online Purchasing

In most conditions, customers identify the need for a product or service without specific requirements on which one to buy [13]; accordingly they need to select interesting one(s) from a range of options that satisfy their desire. Chen (2010) interpreted online purchasing as a precise, three-stage decision-making process: (1) screening out interesting alternative(s) for further consideration, (2) evaluating alternatives in detail, and (3) comparing candidates to confirm the final choice [5]. The transition between the three stages does not follow a rigorous linear order; it is iterative in nature. However, on the whole, the process does follow an approximate sequence.

2.3 Human Decision Making Theory

In classical decision theory, decision makers are assumed to properly process all relevant information and explicitly consider trade-offs among values to choose an optimal alternative on the basis of an invariant strategy. However, human decision-making behaviors in reality often violate the prescriptions of classical decision theory. On the one hand, decision makers do not process all available information, but devote attention to perceptually salient information or information that they believe to be helpful [2]. On the other hand, they use a wide variety of strategies depending on the relative weight they place on making an accurate decision versus saving cognitive effort, because the accuracy and effort characteristics differ across strategies for a given decision environment and across environments for a given strategy [2].

The adaptive nature of decision-making behavior provides the insight that information display can impact not only information acquisition but also information combination, leading to higher/lower decision accuracy and less/more cognitive effort. For example, an insufficient information display can lead a decision maker to a myopic, uninformed decision [15]. Moreover, merely presenting all necessary information is not enough. Decision makers tend to ignore important information when the most salient information on display is not diagnostic or important for the decision [12]. Thus, the match between the relative importance of information and the salience of information display is important. In addition, decision effort can be reduced by improving the congruence between the format and organization of information and the way that users process information to make decisions. If the decision strategy adopted is not efficient or proper for a task, reducing the effort needed to execute certain operations can direct decision makers toward the use of compensatory processing [17].

To recap, in light of human decision-making theory, an in-depth understanding of customers’ decision making behaviors serves as the foundation of effective information displays for user decision improvement.

3 Formative Study

3.1 Research Questions

A central distinction among strategies is the extent to which they make trade-offs among attributes. Decision strategies that explicitly consider trade-offs are called compensatory strategies, whereas strategies that do not make trade-offs are called non-compensatory strategies. RQ1: which kind(s) of decision strategies do customers adopt to process information, compensatory or non-compensatory strategies?

In an E-commerce environment, each entity is described by diverse information. In general, the information can be classified into two types: static features (such as price and specifications) and customer reviews. RQ2: which kind(s) of information do decision makers seek, static features and/or customer reviews?

The format in which the sentiment toward each attribute extracted from customer reviews is presented can also differ. Numerical values provide an easy proxy for opinions, whereas verbal values provide the reasons underlying the scores. RQ3: which kind(s) of values do decision makers refer to concerning the sentiment of attributes extracted from customer reviews, numerical and/or verbal?

3.2 Tasks

To examine decision-making behavior in an E-commerce environment, we took online hotel booking as the test domain for two reasons. First, it is feasible to recruit appropriate and sufficient subjects to participate in the study. Second, the hotel domain contains abundant online customer reviews that are written with multiple attributes in mind. All hotel information and corresponding customer reviews used in the formative study were crawled from Tripadvisor.com in May, 2014.

Three tasks were implemented corresponding to the three-stage decision making process. Task 1: imagine that you will have a trip to Beijing in the summer holiday and need to book a hostel online. The top 10 Beijing Bed and Breakfast are presented. Please choose interesting one(s) for further consideration. Task 2: please read detailed information of the hotel you selected in the preceding task and decide whether to save it as a candidate. Task 3: compare the candidates to choose one as the final choice.

3.3 Research Methods

Two process-tracing methods that have proven especially valuable in decision research are verbal protocols and information acquisition methods [14]. The verbal protocol method asks subjects to “think aloud” while performing decision tasks. Among information acquisition techniques, eye tracking follows a process most similar to real-world information acquisition. A computerized process-tracing tool (CPT) instead sets up a decision task so that all relevant information is hidden in boxes until a subject clicks on them with the mouse. Because we did not have eye-tracking equipment and using CPT has no substantial influence on our research questions [11], we employed CPT in our study. CPT makes data collection fairly straightforward but cannot directly observe internal cognitive processes. In contrast, verbal protocols can measure information processing directly but are difficult to analyze formally. Thus, verbal protocols and CPT were used concurrently to complement each other.

3.4 Interfaces for the Formative Study

The feature-sentiment summary of customer reviews has proven to be an effective way to help users digest the massive quantity of customer reviews [4, 10]. However, the variance in other elements of reviews is not taken into account. In our study, we provide a multiple-level exploration of customer reviews, which incorporates post date, usefulness and reviewer into the review summary, in addition to features and associated sentiment. For example, when users learn that there are 23 5-star reviews for location, they can inspect the usefulness, time, and reviewer distribution of that subset of reviews, as shown in the red boxes of Fig. 1. There is evidence that subjects tend to use non-compensatory strategies when faced with complex decision tasks [18]. Thus, in the interface for task one, in addition to static features and a review summary for each hotel, sorting and filtering functions help users select the alternative with the best value on the most important attribute and eliminate alternatives whose value on an attribute falls below a cut-off. In addition, attributes extracted from customer reviews are incorporated in sorting and filtering (see Fig. 1).

Fig. 1. Screenshot of the interface for task one with one review summary uncovered
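The multiple-level exploration described above can be pictured as a filter followed by a few frequency counts. The snippet below is a minimal sketch in Python, not the study's actual implementation; the review fields (`ratings`, `year`, `reviewer_type`) and the sample data are hypothetical and chosen only for illustration.

```python
from collections import Counter

def review_subset(reviews, feature, stars):
    """Return the reviews that gave `feature` exactly `stars` stars."""
    return [r for r in reviews if r["ratings"].get(feature) == stars]

def distribution(reviews, field):
    """Count how often each value of `field` occurs in the given reviews."""
    return Counter(r[field] for r in reviews)

# Hypothetical reviews; the field names are assumptions made for this sketch.
reviews = [
    {"ratings": {"location": 5, "service": 4}, "year": 2014, "reviewer_type": "couple"},
    {"ratings": {"location": 5}, "year": 2013, "reviewer_type": "solo"},
    {"ratings": {"location": 3, "cleanliness": 4}, "year": 2014, "reviewer_type": "family"},
    {"ratings": {"location": 5, "cleanliness": 5}, "year": 2014, "reviewer_type": "couple"},
]

# First level: all 5-star reviews for location.
five_star_location = review_subset(reviews, "location", 5)
# Second level: how that subset is distributed over time and reviewer type.
by_year = distribution(five_star_location, "year")
by_reviewer = distribution(five_star_location, "reviewer_type")
```

The same `distribution` helper could be applied to a usefulness field, if one were available in the review data.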

With respect to the interface for evaluating alternatives in detail, Sinha and Swearingen (2002) used a music system as an example and noted that the information that comes into play during this stage can be classified into three categories: basic item information, social opinion and item sample [16]. In the hotel system, hotel name, price, address and facilities are included as basic information. Social opinion is customer reviews from a large community of travelers. Traveler photos are taken as the item sample to enable hotel preview (see Fig. 2).

Fig. 2. Screenshot of the interface for task two

The shopping cart provides a comparison matrix in the form of alternatives (columns) and attributes (rows), with which users can perform feature-by-feature comparison between products. This method has been demonstrated to improve decision quality compared with its absence [7]. Moreover, the attributes (rows) are not limited to brief static features; the {opinion attribute, sentiment} pairs extracted from customer reviews are embedded to complement {static feature, value} pairs (see Fig. 3).

Fig. 3. Screenshot of the interface for task three
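A comparison matrix of the kind described above, with attributes as rows and alternatives as columns, can be assembled as in the following minimal Python sketch. The data layout (a `static` dictionary and an `opinions` dictionary per hotel) and all sample values are assumptions made for illustration, not the format used in the study.

```python
def build_comparison_matrix(candidates):
    """Return (column names, rows); each row holds one value per candidate.

    Rows are keyed by (kind, attribute) so that a static feature and an
    opinion attribute with the same name do not collide. Missing values
    are left as None.
    """
    names = [c["name"] for c in candidates]
    rows = {}
    for kind in ("static", "opinions"):  # static rows first, then opinion rows
        for position, candidate in enumerate(candidates):
            for attribute, value in candidate[kind].items():
                row = rows.setdefault((kind, attribute), [None] * len(candidates))
                row[position] = value
    return names, rows

# Hypothetical candidates; opinion values are (average rating, word pair).
candidates = [
    {
        "name": "Hotel A",
        "static": {"price": 300, "address": "Dongcheng"},
        "opinions": {"location": (4.5, "convenient location"),
                     "cleanliness": (4.0, "clean room")},
    },
    {
        "name": "Hotel B",
        "static": {"price": 550},
        "opinions": {"location": (4.8, "great location")},
    },
]

names, rows = build_comparison_matrix(candidates)
```

Keeping {static feature, value} and {opinion attribute, sentiment} rows in one structure mirrors the idea that both kinds of information sit side by side in the matrix.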

3.5 Procedure and Participants

The main procedure for the formative study can be divided into three steps. Step 1: Each participant was required to fill in his/her personal background and E-commerce experience. Then, we gave a brief introduction to the experiment and explained the interfaces to participants. All boxes within a given screen were uncovered. Step 2: Before conducting the tasks, we asked participants several testing questions to make sure that they understood the hidden content of each box and would not click randomly. Step 3: Participants were asked to perform the three tasks and verbalize their thinking processes. All mouse clicks and verbal protocols were recorded automatically.

Fifty participants were recruited to take part in the experiment. They were students at Hong Kong Baptist University pursuing Bachelor, Master or PhD degrees in different departments, such as Computer Science, Chemistry, Education and Management. In the pre-study questionnaire, they specified their frequency of Internet use (on average 4.96 ‘daily/almost daily’, S.D. = .23), E-commerce shopping experience (on average 3.5 ‘1–3 times a month’, S.D. = .56), and online hotel booking experience (on average 2.42 ‘1–3 times’, S.D. = .45). Thus, most of them were frequent E-commerce users and target customers of online hotel booking.

4 Analysis of the Results

We transcribed individual cases by coding each observed behavior in terms of Elementary Information Processes (EIPs) [1] and the corresponding verbal protocols (i.e., supporting commentary). In turn, based on a specific collection and sequence of EIPs, the decision strategy participants adopted can be inferred. An example of a formally coded data transcript is recorded in Table 1. To guarantee the reliability of coding, two coders independently transcribed all the cases. The Kappa measure of agreement for each variable is above 0.7, suggesting a good level of consistency between the two coders. Disagreements in the coding were resolved through discussion.
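For readers unfamiliar with the agreement measure, the following minimal Python sketch computes Cohen's kappa for two coders who label the same cases. It assumes the Kappa reported above is Cohen's kappa for two raters; the ten coded cases are hypothetical and are not taken from the study data.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Proportion of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical strategy codes assigned by two coders to ten cases.
coder_1 = ["EBA", "EBA", "LEX", "EBA+ADDIF", "EBA",
           "EBA+LEX", "EBA+ADDIF", "EBA", "LEX", "EBA+ADDIF"]
coder_2 = ["EBA", "EBA", "LEX", "EBA+ADDIF", "EBA+LEX",
           "EBA+LEX", "EBA+ADDIF", "EBA", "LEX", "EBA"]

kappa = cohens_kappa(coder_1, coder_2)
```

In this made-up example the coders agree on 8 of 10 cases, and chance agreement is 0.28, which gives a kappa of roughly 0.72.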

Table 1. An example of formally coded data transcript

4.1 Stage 1: Screening Out Interesting Alternatives

Decision Strategy. 3/50 (6 %) participants adopted Lexicographic, 9/50 (18 %) participants made use of Eliminate-by-aspect plus Lexicographic, 18/50 (36 %) participants screened out alternatives by Eliminate-by-aspect, and 20/50 (40 %) participants used Eliminate-by-aspect plus Additive difference. In the following, we elaborate on the four types of decision strategies and use LEX, EBA + LEX, EBA, EBA + ADDIF to denote participants who adopted the corresponding decision strategy.

Eliminate-by-aspect. Decision makers eliminate alternatives with values for an attribute below a cut-off. The process continues with the second attribute, and then the third, until a smaller set of alternatives remains.
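As an illustration, Eliminate-by-aspect can be sketched as a sequence of filters applied one attribute at a time. The Python snippet below is a minimal sketch under assumed data; the hotel attributes, their values, and the cut-offs are hypothetical.

```python
def eliminate_by_aspects(alternatives, cutoffs):
    """Apply cut-offs one attribute at a time, in the given order.

    `cutoffs` is a list of (attribute, predicate) pairs, most important
    attribute first; an alternative survives a step only if the predicate
    accepts its value on that attribute.
    """
    remaining = list(alternatives)
    for attribute, passes in cutoffs:
        remaining = [alt for alt in remaining if passes(alt[attribute])]
    return remaining

# Hypothetical hotels and cut-offs.
hotels = [
    {"name": "A", "price": 300, "location_rating": 4.5, "cleanliness_rating": 4.0},
    {"name": "B", "price": 550, "location_rating": 4.8, "cleanliness_rating": 4.6},
    {"name": "C", "price": 280, "location_rating": 3.2, "cleanliness_rating": 4.4},
    {"name": "D", "price": 350, "location_rating": 4.1, "cleanliness_rating": 3.1},
]
cutoffs = [
    ("price", lambda value: value <= 400),            # static feature
    ("location_rating", lambda value: value >= 4.0),  # opinion attribute
]

shortlist = eliminate_by_aspects(hotels, cutoffs)
shortlist_names = [hotel["name"] for hotel in shortlist]
```

Here the price cut-off removes hotel B, and the location cut-off then removes hotel C, leaving A and D.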

Lexicographic. Decision makers determine the most important attribute and then select the alternative with the best value on that attribute.
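The Lexicographic strategy amounts to sorting on a single attribute and taking the top entry. A minimal Python sketch follows; the hotel data are hypothetical, and ties are resolved by simply keeping the first alternative encountered, which is an assumption of this sketch.

```python
def lexicographic_choice(alternatives, attribute, higher_is_better=True):
    """Pick the alternative with the best value on one attribute."""
    pick = max if higher_is_better else min
    return pick(alternatives, key=lambda alt: alt[attribute])

# Hypothetical hotels.
hotels = [
    {"name": "A", "price": 300, "location_rating": 4.5},
    {"name": "B", "price": 550, "location_rating": 4.8},
    {"name": "C", "price": 280, "location_rating": 3.2},
]

# A participant who cares most about price picks the cheapest hotel ...
cheapest = lexicographic_choice(hotels, "price", higher_is_better=False)
# ... while one who cares most about location picks the best-rated one.
best_located = lexicographic_choice(hotels, "location_rating")
```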

Eliminate-by-aspect plus Lexicographic. Firstly, users eliminate alternatives to a smaller set in terms of Eliminate-by-aspect. Then, they select the alternative with the best value on the most important attribute.

Eliminate-by-aspect plus Additive Difference. Decision makers begin by narrowing down the set of alternatives in terms of Eliminate-by-aspect. Then, they compare the remaining alternatives by summing the differences between alternatives on multiple attributes. Finally, they select the alternative with the best overall value.
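The additive difference step can be sketched as a pairwise comparison in which weighted attribute differences are summed and the winner is carried forward. The Python snippet below is a minimal sketch under assumptions: the weights, including the negative weight that makes a lower price preferable, and the hotel data are hypothetical and do not come from the study.

```python
def additive_difference(a, b, weights):
    """Sum weighted attribute differences; a positive result favours `a`."""
    return sum(w * (a[attr] - b[attr]) for attr, w in weights.items())

def choose_by_additive_difference(alternatives, weights):
    """Compare alternatives pairwise and keep the current winner."""
    best = alternatives[0]
    for challenger in alternatives[1:]:
        if additive_difference(challenger, best, weights) > 0:
            best = challenger
    return best

# Hypothetical weights: each attribute's relative importance per unit.
weights = {"price": -0.01, "location_rating": 1.0, "cleanliness_rating": 1.0}

# Hypothetical alternatives remaining after elimination.
remaining = [
    {"name": "A", "price": 300, "location_rating": 4.5, "cleanliness_rating": 4.0},
    {"name": "D", "price": 350, "location_rating": 4.1, "cleanliness_rating": 3.1},
    {"name": "B", "price": 550, "location_rating": 4.8, "cleanliness_rating": 4.6},
]

choice = choose_by_additive_difference(remaining, weights)
a_versus_d = additive_difference(remaining[0], remaining[1], weights)
```

With these weights, hotel B's better ratings do not compensate for its higher price, so hotel A is retained; different weights would shift that trade-off.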

Information Acquisition in Eliminate-by-aspect. 47 participants began by narrowing down the range of options in terms of Eliminate-by-aspect to simplify the complexity of choice. As shown in Fig. 4 (left), significantly more users eliminated alternatives by both static features and customer reviews (26/47) compared to those merely using static features (16/47) or customer reviews (5/47), \( \chi^{2}(2) = 14.09, p < .05 \).

Fig. 4. Information acquisition in Eliminate-by-aspect

In greater detail, Fig. 4 (right) lists the specific information of static features and customer reviews to which participants referred. On average, 2.62 attributes (S.D. = 1.22) were utilized, to which static features and customer reviews respectively contribute 1.5 and 1.12. Moreover, significantly more participants eliminated alternatives by attributes extracted from reviews (denoted by opinion attributes) compared to those referring to an overall review score (26/47 vs. 5/47), \( \chi^{2}(1) = 14.23, p < .05 \).
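The chi-square statistics in this subsection can be checked with a short calculation. The Python sketch below assumes a goodness-of-fit test against equal expected proportions, which is an assumption on our part but reproduces the reported values from the reported counts; the critical values are the standard ones for a significance level of .05.

```python
def chi_square_equal_proportions(counts):
    """Chi-square goodness-of-fit statistic against equal expected counts.

    Returns (statistic, degrees of freedom).
    """
    total = sum(counts)
    expected = total / len(counts)
    statistic = sum((observed - expected) ** 2 / expected for observed in counts)
    return statistic, len(counts) - 1

# Standard chi-square critical values at alpha = .05.
CRITICAL_05 = {1: 3.841, 2: 5.991}

# Both / static features only / customer reviews only (26, 16 and 5 of 47).
stat_types, df_types = chi_square_equal_proportions([26, 16, 5])
# Opinion attributes vs. overall review score (26 vs. 5).
stat_opinion, df_opinion = chi_square_equal_proportions([26, 5])

significant_types = stat_types > CRITICAL_05[df_types]
significant_opinion = stat_opinion > CRITICAL_05[df_opinion]
```

Rounded to two decimals, the two statistics are 14.09 with 2 degrees of freedom and 14.23 with 1 degree of freedom, and both exceed the corresponding critical value.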

The process of generating cut-offs is adaptive in nature, determined by the value distribution of an attribute and the correlation among attributes, in addition to stable preference. Participants referred to the value distribution to avoid ineffective filters, such as cut-offs so loose or so strict that too many or too few options remain. Moreover, participants who explicitly considered trade-offs among values frequently referred to attribute correlation to determine cut-offs. For example, one might explore a hotel priced above the original price limit to see how much better it is. If its rating greatly exceeds expectations, the price cut-off may be shifted; otherwise, the cut-off is reinforced.

Information Acquisition in Lexicographic. 12 participants selected alternatives by Lexicographic (3 with LEX, 9 with EBA + LEX). 58.3 % (7/12) of subjects chose the entity with the best value on some static feature, and 41.7 % (5/12) chose based on customer reviews (see Fig. 5 (left)). The frequency of each attribute considered the most important is listed in Fig. 5 (right). As to sorting by customer reviews, the proportion of participants who selected hotels in terms of an opinion attribute is not significantly different from that using an overall rating (2/12 vs. 3/12), \( \chi^{2}(1) = .20, p > .05 \).

Fig. 5. Information acquisition in Lexicographic

The weight of an attribute is determined not only by stable preference but also by the value range of an attribute. In other words, the weight given to an attribute is a function of attribute ranges. As the variance in the values on one attribute across alternatives increases, the importance weight on that attribute becomes higher [6].

Information Acquisition in Additive Difference. Because price and quality are generally thought to be negatively correlated (i.e., higher-quality hotels tend to have higher prices), all 20 participants who compared alternatives on multiple attributes (i.e., EBA + ADDIF) referred to both price and customer reviews to make decisions. In addition, 45 % of participants added address into the comparison (see Fig. 6 (left)). Considering the information of customer reviews, as shown in Fig. 6 (right), significantly more participants compared alternatives by opinion attributes (e.g., location and cleanliness) than by an overall review score (17/20 vs. 3/20), \( \chi^{2}(1) = 9.80, p < .05 \). Moreover, during product comparison, the extent to which one is willing to trade off more of one attribute for less of another differs. In other words, people gave different relative importance to attributes.

Fig. 6. Information acquisition in Additive Difference

The format in which the sentiment of an opinion attribute is evaluated can differ. Half of these participants (10/20) made their decisions based on both numerical values (i.e., the average rating and number of reviews) and verbal values (i.e., adjective-noun word pairs), followed by those using just numerical values (9/20). The smallest proportion relied on only verbal values (1/20).

4.2 Stage 2: Evaluating Alternatives in Detail

Decision Strategy. In this stage, only one alternative is considered at a time. Participants process information in an alternative-based manner, which means they evaluate multiple attributes of a single alternative and compare them with an aspiration level. When the values of all attributes meet the aspiration level, the alternative is saved as a purchase candidate.
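The alternative-based evaluation at this stage can be sketched as a check of every attribute of one hotel against an aspiration level. The Python snippet below is a minimal sketch; the attributes, aspiration levels, and hotel values are hypothetical.

```python
def meets_aspiration(alternative, aspiration):
    """True only if every attribute value satisfies its aspiration level."""
    return all(
        is_acceptable(alternative[attribute])
        for attribute, is_acceptable in aspiration.items()
    )

# Hypothetical aspiration levels for one participant.
aspiration = {
    "price": lambda value: value <= 400,
    "location_rating": lambda value: value >= 4.0,
    "cleanliness_rating": lambda value: value >= 4.0,
}

# Hypothetical hotels, evaluated one at a time.
hotel_a = {"name": "A", "price": 300, "location_rating": 4.5, "cleanliness_rating": 4.0}
hotel_d = {"name": "D", "price": 350, "location_rating": 4.1, "cleanliness_rating": 3.1}

# Only hotels that pass on all attributes are saved as purchase candidates.
candidates = [h["name"] for h in (hotel_a, hotel_d) if meets_aspiration(h, aspiration)]
```

Hotel D is acceptable on price and location but falls short on cleanliness, so only hotel A is saved.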

Information Acquisition. For different types of participants, the type of information they evaluated at stage two is shown in Fig. 7 (left). On the whole, 50 % (25/50), 84 % (42/50) and 88 % (44/50) of participants evaluated static features, photos and customer reviews, respectively. More specifically, Fig. 7 (right) shows which aspects of customer reviews participants inspected. The number of participants reading reviews in a feature-driven manner was significantly larger than the number of participants doing so in a holistic manner (38/50 vs. 6/50), \( \chi^{2}(1) = 23.27, p < .05 \). For example, people mentioned “I mainly concern about cleanliness and location, while others are indifferent… (reading reviews)… but I cannot find content on cleanliness, most of them are about location and service”.

Fig. 7. Information acquisition at stage two

Concerning the numerical values of customer reviews, 24 % and 39 % of participants evaluated the average rating and the average rating plus the number of reviews, respectively, whereas the other 37 % also read the time distribution of all reviews and of the 5-point reviews to examine whether there is a downward trend in customer reviews. For example, people noted “the trend of customer reviews often change over time, like the hotel may improve its service, so that the recent reviews may be opposite to old reviews in which people complained about the service”.

In addition to the numerical values, 7 %, 56.5 % and 36.5 % of participants referred to the verbal values in terms of summarized adjective-noun word pairs, raw reviews and both, respectively. Overall, 93 % of participants read raw reviews to better understand the context. Because of the large quantity of raw reviews, participants exhibited two types of behavior: inspecting the latest and/or the most negative customer reviews. Participants who sorted customer reviews by date mentioned that “I would like to read the newest reviews… especially the reviews written by those who just lived in there last night… I think it will be closer to the real condition and more credible”. More than half of the participants clearly indicated that they favored negative comments over positive ones. For example, people said “I would like to read negative ratings and learn the reasons why other customers gave lower rating to see if I have the same concern”, and “The reason of adding it in shopping cart is not only how good it is, but also whether I can stand its drawbacks”.

4.3 Stage 3: Comparing Candidates for Final Choice

Decision Strategy. The 43 participants who saved more than one option engaged in this stage. The decision strategy can be interpreted as calculating the value difference between alternatives on one attribute. The process repeats with other attributes. Then, the differences are summed to obtain an overall relative evaluation for each entity. Finally, the alternative with the best evaluation is retained as the final choice.

Information Acquisition. Figure 8 (left) lists the type of information that participants compared at stage three. Statistical analysis shows no significant difference among participant types in static feature comparison, \( \chi^{2}(3) = 1.32, p > .05 \), whereas there are significant associations between participant type and whether they compared customer reviews (\( \chi^{2}(3) = 8.21, p < .05 \)) and whether they compared photos (\( \chi^{2}(1) = 11.87, p < .05 \)). More notably, we found that participants who adopted a compensatory strategy at stage one, i.e. EBA + ADDIF (denoted as compensatory in Fig. 8 (right)), focused significantly more on customer reviews (\( \chi^{2}(1) = 16.59, p < .001 \)) and less on photos (\( \chi^{2}(1) = 7.34, p < .01 \)) compared with participants who adopted non-compensatory strategies, i.e. EBA, LEX, and EBA + LEX (denoted as non-compensatory in Fig. 8 (right)). The reason for the difference might be that participants who prefer non-compensatory strategies placed greater emphasis on minimizing effort than on consulting an extensive amount of information to make an optimal decision: “I would compare the photos, as it can give me a more intuitive impression, which facilitates choosing the most attractive one”.

Fig. 8. Information acquisition at stage three

Figure 9 (left) illustrates the frequency of each attribute utilized in the product comparison. For all types of participants, price is most frequently compared, which means that people treat price as a crucial factor in online purchasing. Moreover, significantly more participants used {opinion attribute, sentiment} pairs extracted from reviews to perform feature-by-feature comparison between products compared to those merely referring to an overall review score (22/43 vs. 5/43), \( \chi^{2}(1) = 10.7, p < .001 \).

Fig. 9. Attributes in product comparison

Out of the 27 participants who compared customer reviews, 13/27 participants made their decisions based on numerical values, 3/27 participants relied on adjective-noun word pairs, and 11/27 participants referred to both, as shown in Fig. 9 (right).

5 Conclusion

The results of the formative study provide practical implications for E-commerce interface design. For the interface for screening out interesting alternatives, we propose: (1) including both static features and opinion attributes in filtering, (2) visualizing the value distribution of each attribute and the correlation among attributes, (3) enabling users to sort alternatives by multiple attributes and to give different weights to attributes, and (4) integrating opinion attributes in sorting, in addition to static features. For the detail page, we offer the following advice: (1) categorizing customer reviews by features, (2) representing the time distribution for opinion attributes, in addition to the average rating and number of reviews, to support the analysis of temporal evolution, (3) coupling numerical values with verbal values (i.e., adjective-noun word pairs and raw reviews), and (4) helping users inspect the latest and most negative raw reviews. For the comparison interface, we suggest: (1) reducing the difficulty of calculating value differences on each attribute across alternatives, (2) summarizing customer reviews in the form of {feature, sentiment} pairs, and (3) representing both numerical and verbal values for each opinion attribute.