Introduction

In this paper, we investigate the interplay of online consumer ratings and online consumer reviews in mobile app downloads. An online rating is an assessment of a product's overall quality on a numerical scale, whereas an online review is a text comment on a product's attributes and quality. These mechanisms have become essential in electronic markets (Huang et al. 2015), particularly in product markets characterized by many competing alternatives available to consumers, e.g., books, restaurants, hotels, movies, and mobile apps. Ratings and reviews help consumers inform their choices and provide producers with useful information for forecasting sales, developing products, and designing marketing promotions (Li et al. 2019).

App downloads are an important performance variable for app developers and platform providers. This follows from how app revenues are generated (see Roma and Ragaglia 2016 for a review). Consumers sometimes pay for downloads directly, and downloads also enable in-app purchase revenues. Furthermore, app advertising revenues are positively related to the size of the app user base, which is a function of downloads (Lee et al. 2021).

Some studies have reported that ratings and reviews strongly impact consumers' product choices (Burgers et al. 2016; Finkelstein et al. 2017; Gokgoz et al. 2021; Kashyap et al. 2022), which is consistent with consumers' tendency to rely on peer information over commercial information for their product choices (Sher and Lee 2009). Overall, however, empirical findings on online ratings' and reviews' impact on consumer behavior are inconsistent (Liang et al. 2015; Gottschalk and Mafael 2017; Li et al. 2019; Picoto et al. 2019; Li et al. 2020; Sadiq et al. 2021). Such studies have typically investigated either ratings or reviews, so there is a need to better understand the interplay of ratings and reviews in consumer choice, as called for in recent studies (Chen and Xu 2017; Li et al. 2019; Kaur and Singh 2021; Xia et al. 2021). This study contributes to filling this gap in the literature. Based on cue theoretical premises, we provide rationales for consumers' combined use of ratings and reviews in app store settings. Drawing on cue-consistency theory (Miyazaki et al. 2005), we argue that rating cues (e.g., average rating score) and review cues (e.g., review polarity) may corroborate one another, thereby reinforcing each cue's individual credibility. As contended in the literature, when average rating scores are too similar across competing product alternatives, ratings alone cannot determine product choice (Hazarika et al. 2021). Consumers may instead combine rating and review information for their app downloading decisions, especially since potential consumers reportedly read and use reviews for their decision-making (Li et al. 2019; Lutz et al. 2022; Kashyap et al. 2022). Drawing on cue diagnosticity theory (Feldman and Lynch 1988), we further argue that rating and review cues may complement each other, in combination providing more diagnostic (reliable) information for consumers' app choices. Such complementary cue value is consistent with arguments that online reviews give potential consumers deeper insights into how specific attributes of an app appeal to current users, which aggregated ratings (average rating score, dispersion of ratings, and volume of ratings) alone cannot reveal (Liang et al. 2015).

In this study, we explore rating and review variables’ interaction effects on downloads of gaming and productivity apps in the Apple App Store. The rating variables we study are average rating score, volume of ratings, and dispersion of ratings. Corresponding review variables are polarity, subjectivity, and review length. Polarity represents a quantitative measure of the valence of a text review, whereas subjectivity is a quantitative measure of how objective (fact-based) versus subjective (emotional) a text review is. These variables are investigated due to their argued importance in the literature (Salehan and Kim 2016; Li et al. 2019; Filieri et al. 2019).

Studies on mobile apps have divided them into hedonic and utilitarian consumption value segments based on app store category (e.g., entertainment, games, productivity, health and fitness). Different empirical strategies have been employed to arrive at this dual classification. These include neutral expert interrater coding (Tafesse 2021) and surveys of consumers using measurement instruments to identify the main type of consumption value perceived for products across categories (Kim et al. 2014; Tang 2016; Yang and Lin 2019). Yet other studies have used logical reasoning to divide app categories into the two value segments (Arora et al. 2017). Regardless of procedure, such studies classify gaming apps into the hedonic value segment, as they are mainly used for the fun and enjoyment they bring. By the same token, productivity apps have been classified into the utilitarian value segment because they are mainly used for the efficiency and effectiveness gains they bring in solving tasks, e.g., spreadsheet problems, making presentations, or writing reports. Using such classifications, research has reported that ratings and reviews impact app consumption behavior differently for the two value segments (Liu et al. 2014; Roma and Ragaglia 2016; Tafesse 2021). For other product domains as well, ratings' and reviews' impact on product decision-making has been reported to depend on whether consumption is of hedonic or utilitarian value (Ren and Nickerson 2019; Akdim et al. 2022). Following the merit of these classifications reported in previous studies, we assume that gaming apps are overall more hedonic consumption value-oriented, whereas productivity apps are overall more utilitarian consumption value-oriented; hence the rationale for studying these two app categories. While acknowledging that games can sometimes be consumed for utilitarian purposes, and productivity apps for more hedonic reasons (see Akdim et al. 2022 for such observations of social mobile apps), we investigate how ratings and reviews in combination impact downloads of gaming apps and productivity apps.

Literature review

The research into online ratings and reviews is part of a broader stream of literature on electronic word-of-mouth (eWOM). eWOM is "any positive or negative statement made by potential, actual, or former customers about a product or company, which is made available to a multitude of people and institutions via the Internet" (Hennig-Thurau et al. 2004). Arguments in the eWOM literature favor the use of online consumer ratings and reviews for app downloading decisions. First, compared with other forms of online reviews, such as critic reviews or other third-party reviews, consumer comments are often considered more trustworthy (Liang et al. 2015). Second, online consumer app ratings and reviews can be posted anonymously, and consumers are more comfortable sharing both positive and negative comments when anonymous (Deng et al. 2021). Third, the less commercial actors can control review content, e.g., by deleting unfavorable reviews or posting fake reviews, the more likely potential consumers are to use reviews for their decision-making (DeAndrea et al. 2018). App developers have little such control since platform providers supply the rating and review mechanisms. In fact, platform providers remove apps from the store and expel developers if they manipulate ratings or reviews (Apple.com, March 2022). Finally, eWOM, compared with traditional WOM, can be more easily evaluated across space and time, e.g., repeatedly or at a pace suitable to the reader (Sun et al. 2006). Potential app consumers may for such reasons prefer eWOM over traditional WOM for their app decision-making. This study thus contributes to a deeper understanding of how eWOM impacts product performance under conditions argued to spur its use.

Online ratings’ impact on product performance

The literature on online ratings has mainly considered how three variables (average rating score, volume of ratings, and dispersion of ratings) impact product performance. Arguments in the literature state why these variables ought to matter. Average rating score informs a potential consumer about other consumers' perception of a product's value. Moe and Trusov (2011) therefore argue that products of higher quality are more likely to receive higher ratings than products of low quality, which impacts consumer decision-making. Volume of ratings represents a product's number of ratings. Arguments in the literature contend that higher volume is associated with more discussions about a product, leading to increased awareness of it among potential consumers (Lu et al. 2020). Volume of ratings is also argued to indicate the trustworthiness of the general opinion about a product, that is, whether consumers have reached a consensus on its general evaluation (Burgers et al. 2016). On this reasoning, a positive relationship between volume of ratings and consumer decision-making has been argued. Dispersion of ratings is a measure of the spread in ratings, e.g., the variance or standard deviation of ratings. Consumers generally seek to avoid risk, implying that dispersion of ratings should negatively impact potential consumers' reliance on ratings for their decision-making (Chu et al. 2014).

Empirical findings on how the three online rating variables impact product performance are inconsistent across products such as books, movies, hotels, and apps (Baugher et al. 2016; Finkelstein et al. 2017; Li et al. 2019; Lu et al. 2020; Tafesse 2021; Gokgoz et al. 2021; Chen et al. 2022). Appendix 1 provides a review of such studies, revealing that different performance metrics (downloads, top-list survival, and sales rank) have been investigated. Moreover, it indicates that rating variables' impact on app performance is contingent on contextual variables, e.g., app category and country profile. We contribute in three main ways to these prior online ratings studies. First, we analyze how dispersion of ratings impacts app performance. To our knowledge, prior work has dealt with this issue for other products (Chu et al. 2014; de Langhe et al. 2015; Zheng et al. 2021), but not for apps. It is not obvious how this variable impacts app performance. On the one hand, consumers generally seek to avoid using less reliable information for their decision-making (Chu et al. 2014). On the other hand, apps are typically low priced, which is why this risk may be ignored (Burgers et al. 2016). In a similar fashion, there are arguments and counterarguments for the role played by average ratings in electronic markets. A higher average rating score signals higher product quality with a positive influence on consumer decision-making (Moe and Trusov 2011), but average ratings may be too similar across competing app alternatives and thereby fail to inform consumer choice (Li 2018).

Second, we analyze how the three online rating variables impact app downloads for gaming apps (hedonic consumption value-oriented) versus productivity apps (utilitarian consumption value-oriented). Apart from Liu et al.’s (2014) analysis into freemium apps, there is a gap in the literature on how ratings impact the performance of hedonic versus utilitarian consumption value-oriented apps. Ratings’ impact on consumer decision-making for other products has been found to depend on such consumption value (Chu et al. 2014; Li et al. 2019; Tafesse 2021). Whether this generalizes to apps is not obvious since apps are typically low priced and can be uninstalled with little effort and regret (Burgers et al. 2016). Third, as Appendix 1 shows, most prior works on apps have analyzed top-listed apps for shorter time intervals. In this study, we track apps on a daily basis over a period of almost two years from their launch in the Apple App Store. This way we contribute to a deeper understanding of how ratings play a role in app performance.

Online reviews’ impact on product performance

Consumer text reviews are argued to help consumers find products that match their needs (Liang et al. 2015). Consequently, investigations into how different text review variables influence product performance, mainly product sales, have been conducted. Results of such studies are mixed across the products investigated (Gopinath et al. 2014; Liang et al. 2015; Li et al. 2019; Guo and Shasha 2016; Đurović and Kniepkamp 2022). However, the majority of prior research into online reviews has focused on determining review helpfulness, which represents the subjective value of a review to the reader (Huang et al. 2015; Kashyap et al. 2022). Such studies have reported polarity (valence), subjectivity, and length of reviews as determinants of review helpfulness.

Considering the app market's historically high and continued expected growth (Borasi and Baul 2019), there is surprisingly sparse research into how online reviews impact app performance. Liang et al. (2015), studying weekly panel data of top-500 listed apps, report a positive impact of their review valence measure on both free and paid app sales rank. Oh et al. (2015) reported that the number of question posts generated by potential consumers about an app positively impacts its downloads. This study contributes to such studies by analyzing how polarity, subjectivity, and length of reviews impact app downloads. Li et al. (2019), in their review of the ratings and reviews literature, conclude that eWOM studies focus on numerical ratings but rarely address textual reviews, due to the complexity of text analysis. They moreover conclude that few studies that incorporate textual reviews use techniques such as sentiment analysis. Our study contributes to this stream of literature by utilizing such techniques.

The interplay of ratings and reviews in product performance

Research into the combinatory impact of ratings and reviews has been called for in recent studies (Li et al. 2019; Filieri et al. 2019; Kaur and Singh 2021; Shin et al. 2021). Previous research has suggested that numerical ratings and textual comments might work separately or in combination (Floh et al. 2013), but little research has dealt with how ratings and reviews interplay. Tsang and Prendergast (2009) found in their experimental study of movie reviews that, for evaluations containing both a text review and a rating, the former matters more for product purchase intention. However, the authors did not find that positive ratings accompanied by positive reviews produced significantly higher purchase intention compared with inconsistent evaluations. Hu et al. (2013), utilizing a large panel data set on 4000 Amazon books, reported no direct effect of ratings on book sales rank, but a positive moderating effect of ratings and review valence on such rank. Chong et al. (2016) reported a positive interaction effect of their sentiment polarity measure and volume of ratings in predicting sales of 12,000 Amazon electronics products based on a neural network approach. Al-Natour and Turetken (2020), based on Amazon and Yelp rating and review data across product domains, reported sentiment polarity to be a good substitute for star ratings, and at times a good complement to them. These findings are consistent with the cue theoretical premises put forth in the present study of mobile apps. Similarly, Zhu et al. (2020), in the context of hotel reviews, report consistency between review polarity and rating scores. By contrast, Li et al. (2019), utilizing 22-week panel data on consumer reviews of 312 PC products, reported that numerical ratings mediate the effect of textual reviews on product sales. Filieri et al. (2019) found that extreme ratings in combination with long, linguistically clear hotel reviews on TripAdvisor positively impact review helpfulness. Kaur and Singh (2021) reported a mixed impact of rating score combined with review volume on book sales.

Our research contributes to this prior work by analyzing a larger set of interaction effects between rating and review variables on product performance.

The interplay of ratings and reviews in mobile app downloads: cue consistency and diagnosticity rationales

Ratings and reviews are cues, i.e., information signals, used by consumers to infer product quality (Byun et al. 2021). Consumers may use rating and review cues in combination for two main reasons. One is that different cues, by corroborating each other, strengthen each other’s reliability from the consumer’s perspective. This is a main premise of cue-consistency theory (Miyazaki et al. 2005). The theory holds that observation of consistent signals increases information diagnosticity, which is the extent to which a cue helps the consumer assign a product to a specific quality category (such as high or low quality). The other main reason is that multiple cues may complement each other. This is consistent with cue diagnosticity theory, which holds that consumers continue to assess cues until a perceived reliable or diagnostic inference of product quality has been reached (Feldman and Lynch 1988; Reddy et al. 1994). Consumers do so to reduce uncertainty and risk around their product decisions (Kirmani and Rao 2000). Moreover, in line with these two theoretical premises, empirical studies have repeatedly revealed that consumers prefer relying on multiple cues over single cues for their product decisions (see Byun et al. 2021, for a review).

The corroborating and complementary rationales may apply to a multitude of rating and review variable interactions in line with the cue theoretical premises. First, both ratings and reviews are retrievable, as they are publicly accessible in app marketplaces. Second, they enable provision of corroborating and complementary product quality signaling value to consumers. Specifically, consistency in valence between ratings (average rating score) and reviews (polarity) may strengthen the consumer's trust in each cue. Similarly, average rating score combined with objective (rather than subjective or emotional) reviews may have this corroborating effect. Research indeed reveals that consumers put higher trust in objective (factual) online reviews (Darley and Smith 1993). The corroborating effect may furthermore apply to review length accompanied by average rating score, as longer reviews may provide deeper insight into why the average rating score for an app is high or low (Li et al. 2019).

Rating and review variables may also offer complementary signaling value to consumers. This stems partly from their different nature (numbers versus text) and the way they are displayed in app marketplaces. Aggregated rating cues such as average rating score, the number of ratings, and the distribution of ratings (on the, e.g., 5-star scale) provide overall consumer population information about an app’s quality. Text reviews, on the other hand, are displayed in a disaggregated fashion, so they may provide information about specific quality attributes or aspects of an app not revealed by the aggregated rating information. It follows that consumers may use average rating score as a cue for an app’s overall quality while simultaneously using individual text reviews to obtain cue information about how specific attributes of the app appear to fulfill the consumer’s quality expectations. Both cues may have to meet the consumer’s expectations for the app to be downloaded, hence a combinatory effect. Similarly, consumers may consult disaggregated text reviews to obtain additional insight into why ratings are dispersed or not. Thus, review polarity and dispersion of ratings may in this way offer complementary value to one another and may be used in combination to determine whether to download an app. Furthermore, whether dispersion of ratings is based on subjective or objective consumer evaluations can be inferred by consumers inspecting individual text reviews along with overall rating dispersion information (how ratings are distributed on the 5-point scale across raters). As such, the two may be used in combination for consumers’ downloading decisions. Moreover, related work has argued that text and numerical components of a product review would often interact within the consumer’s processing system (Li et al. 2019). Yet other research suggests that combinatory use is enhanced by ratings and reviews being displayed simultaneously in app marketplaces, making the interplay between them particularly valid to study (Chong et al. 2016).

Despite the two main cue theoretical premises favoring consumers' use of ratings and reviews in combination, characteristics of mobile apps may attenuate their combinatory use. First, apps typically have a low upfront price and can be uninstalled with low effort and regret (Burgers et al. 2016). Accordingly, consumers might not engage in in-depth exploration of ratings and reviews, but instead download apps and try them out. Moreover, text reviews might require too much effort to evaluate if they are not linguistically clear (Salehan and Kim 2016), which may attenuate their corroborating and complementary effect with ratings. For these reasons, an exploratory approach is adopted in this study, whereby interaction effects of rating cues (average rating score, volume of ratings, and dispersion) and review cues (polarity, subjectivity, and review length) on mobile app downloads are explored. The remainder of this study focuses on this issue, reporting on such combinatory effects for apps in the game and productivity categories in the Apple App Store.

Methodology

Data

To analyze how ratings and reviews impact mobile app downloads, we used US Apple App Store data ranging from January 1, 2015 to December 19, 2016. These data were acquired from a large, reputable global provider specializing in app market analytics (https://www.mobileaction.co/). This allowed faster data collection than gathering two years of data ourselves with scraping algorithms. US Apple App Store data were used because of the market's size and its common use in related work (Lee and Raghu 2014; Liang et al. 2015; Kübler et al. 2018; Gokgoz et al. 2021). Our dataset is restricted to apps tracked daily from their release in the App Store. We thereby contribute to prior work by capturing new apps and more granular app data over a longer period of time (see Appendix 1 for a comparison to related work). The final sample consisted of an unbalanced panel of 341 mobile apps, of which 295 were gaming apps and 46 were productivity apps. Apps from these two categories were selected for two related reasons. First, according to previous studies, games are more hedonic consumption value-oriented, while productivity apps are more utilitarian consumption value-oriented (see Tafesse 2021, for a review). Second, ratings and reviews are reported to impact consumer decision-making differently depending on such consumption value orientation (Roma and Ragaglia 2016; Ren and Nickerson 2019).

The raw data acquired included the following: count of daily downloads per app, app rating (from one to five stars) per reviewer and app, text review per reviewer and app, app release date, app type (gaming or productivity), and app download type (free or paid). The data also revealed the exact times when text reviews and app ratings were posted on the App Store. This enabled us to generate panel data.

Variables and measurement

In Table 1, measures of variables are summarized. Downloads, which constitutes the dependent variable in our econometric models, was measured as daily count per app. We generated three rating variables: average rating score (Av_Rating), volume of ratings (Vol_Rating), and dispersion of ratings (Disp_Rating). Cumulative measures for these variables were used to achieve consistency with how ratings are displayed to potential app adopters in the App Store. This is also consistent with how rating variables are measured in related work (Lee and Raghu 2014; Baugher et al. 2016; Finkelstein et al. 2017; Kübler et al. 2018; Li et al. 2019).
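To make the cumulative measurement concrete, the three rating variables can be derived from raw per-rating data along the following lines. This is a minimal pandas sketch under assumed column names (app_id, date, stars), rather than the data provider's actual schema or our exact pipeline:

```python
import pandas as pd

# One row per rating event; column names (app_id, date, stars) are illustrative.
ratings = pd.read_csv("ratings.csv", parse_dates=["date"])
ratings = ratings.sort_values(["app_id", "date"])

# Cumulative measures per app, mirroring what the App Store displays:
# running mean (Av_Rating), running count (Vol_Rating), and running
# standard deviation (Disp_Rating) of the 1-5 star ratings.
grp = ratings.groupby("app_id")["stars"]
ratings["Av_Rating"] = grp.expanding().mean().reset_index(level=0, drop=True)
ratings["Vol_Rating"] = grp.cumcount() + 1
ratings["Disp_Rating"] = grp.expanding().std().reset_index(level=0, drop=True)

# Collapse to an app-day panel by keeping each day's last cumulative value
# (days without new ratings would need forward-filling in a full pipeline).
ratings["day"] = ratings["date"].dt.normalize()
panel = (ratings.drop(columns=["date"])
         .groupby(["app_id", "day"]).last().reset_index())
```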

Table 1 Variables and measurement

In order to econometrically analyze online consumer reviews' impact on app downloads, we used sentiment analysis to generate statistical variables from text. This is a common method in online consumer review studies (Lopez et al. 2020). Two such variables were generated: polarity (Av_Polarity) and subjectivity (Av_Subjectivity). Polarity classifies words, phrases, or sentences from positive to negative (Liu 2010). In our study, polarity reflects to what extent a text review expresses a positive or negative view of an app's quality. Subjectivity expresses to what extent a text review is fact-based (objective) versus emotional (subjective) in character (Liu 2012). Specifically, for their downloading decisions, potential app adopters may rely to different extents on emotionally expressed views compared with more fact-based ones. To extract a polarity score and a subjectivity score from each review, we used a lexicon-based approach, i.e., a dictionary of words annotated with a word's or text phrase's opinion orientation and subjectivity. Specifically, we used pattern.en, a natural language processing toolkit that leverages WordNet to score sentiment according to the English adjectives used in the text (De Smedt and Daelemans 2012; www.pattern.en for details). WordNet is a large electronic lexical English database including more than 117,000 synsets, i.e., groups of words constituting cognitive synonyms (Fellbaum 1998; wordnet.princeton.edu for details). Polarity scores obtained using pattern.en are in the range −1 to +1, where a higher value denotes a more positive opinion and 0 reflects a neutral opinion. Subjectivity scores are in the range 0 to 1, where a higher score implies a more emotionally oriented expression. Review length (Rev_Length) was used as a third variable since longer reviews have been argued to be more helpful to readers (Huang et al. 2015; Li et al. 2019). Like the rating variables, the three review variables were measured at the daily level, as daily averages.
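As an illustration, the snippet below scores two invented example reviews with pattern.en; a minimal sketch assuming the pattern package is installed:

```python
# Lexicon-based sentiment scoring with pattern.en (De Smedt and Daelemans 2012).
# sentiment() returns a (polarity, subjectivity) tuple: polarity in [-1, +1],
# subjectivity in [0, 1], derived from annotated English adjectives.
from pattern.en import sentiment

reviews = [
    "Great app, the new spreadsheet functions work flawlessly.",       # invented example
    "I hate this update, it crashes constantly and feels sluggish.",   # invented example
]

for text in reviews:
    polarity, subjectivity = sentiment(text)
    print(f"polarity={polarity:+.2f} subjectivity={subjectivity:.2f} :: {text}")
```

Daily Av_Polarity and Av_Subjectivity then follow by averaging these per-review scores over the reviews posted for an app on a given day.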

To investigate the interplay of ratings and reviews in mobile app downloads, we generated nine interaction variables: for the three rating variables and the three review variables, we multiplied each rating variable with each review variable. Prior work has dealt with only a subset of such interaction effects (Tsang and Prendergast 2009; Hu et al. 2013; Li et al. 2019). More comprehensive analyses have been called for, given how such pieces of information are displayed together to readers in electronic markets and because rating and review information carry different signals of product quality (Chong et al. 2016; Chen and Xu 2017; Li et al. 2019). However, little is known about how different pieces of rating and review information in combination influence product performance; this is our rationale for exploring a larger set of interaction effects.
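Continuing the illustrative pandas sketch from above (and assuming the app-day panel already holds the rating and review variables), the nine product terms can be generated as follows:

```python
# Nine interaction variables: each rating variable times each review variable.
# Whether logged or raw levels enter the product terms is a specification
# detail not spelled out here; raw levels are used for illustration.
rating_vars = ["Av_Rating", "Vol_Rating", "Disp_Rating"]
review_vars = ["Av_Polarity", "Av_Subjectivity", "Rev_Length"]

for r in rating_vars:
    for v in review_vars:
        panel[f"{r}_x_{v}"] = panel[r] * panel[v]
```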

Finally, we included app age and gross ranking as independent variables in our econometric analyses. These were included due to their argued importance for app performance according to related work (Jung et al. 2012; Lee and Raghu 2014; Roma and Ragaglia 2016; Kübler et al. 2018). Gross_Rank is a measure of the overall popularity of an app relative to other apps, where a lower positive rank integer value implies higher relative popularity. The ranking is provided by the Apple App Store, which does not openly reveal its measurement procedure. App_Age is included following product-life cycle theory arguments that different types of consumers may rely on different sources of information for their decision-making. It refers to the number of days an app has existed in the App Store since its initial release.

All independent variables are lagged one day relative to the dependent variable, so that changes in ratings and reviews precede the downloads they are meant to explain, as a step toward causal identification. A one-day lag was used rather than additional days because related work has found that consumers rely only, or to a greater extent, on the most recent reviews (Li et al. 2019; Alzate et al. 2021). Inspection of our data set reveals that reviews and ratings change on a daily basis, which, along with the aforementioned literature arguments, motivates the one-day lag.
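In the same illustrative sketch (assuming Gross_Rank and App_Age have been merged into the panel), the one-day lag amounts to a within-app shift of all regressors:

```python
# Lag every regressor one day within each app, so day-t downloads are
# explained by day t-1 information. This row-wise shift assumes consecutive
# daily observations per app; calendar gaps would require reindexing by date.
panel = panel.sort_values(["app_id", "day"])
interactions = [f"{r}_x_{v}" for r in rating_vars for v in review_vars]
regressors = rating_vars + review_vars + interactions + ["Gross_Rank", "App_Age"]
lagged = [f"{c}_lag1" for c in regressors]
panel[lagged] = panel.groupby("app_id")[regressors].shift(1).to_numpy()
```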

Econometric models and analyses

Following Hausman test results, fixed-effects panel regression analyses were performed to analyze how ratings and reviews impact app downloads. Model 1 is the additive benchmark model, capturing the direct effects of the independent variables on mobile app downloads:

$$\mathrm{ln}({Downloads}_{i,t})={\alpha }_{i}+{\beta }_{1}{Av\_Rating}_{i,t-1}+{\beta }_{2}{Disp\_Rating}_{i,t-1}+{\beta }_{3}\mathrm{ln}({Vol\_Rating}_{i,t-1})+{\beta }_{4}{Av\_Polarity}_{i,t-1}+{\beta }_{5}{Av\_Subjectivity}_{i,t-1}+{\beta }_{6}\mathrm{ln}({Rev\_Length}_{i,t-1})+{\beta }_{7}{Gross\_Rank}_{i,t-1}+{\beta }_{8}{App\_Age}_{i,t-1}+{\varepsilon }_{i,t}$$
(1)

Models 2 to 10 are the interaction effect models, each including one interaction variable, which enables analysis of each product term's impact on downloads relative to the benchmark model. Hence, the interaction effect models take the form:

$$\mathrm{ln}({Downloads}_{i,t})={\alpha }_{i}+{\beta }_{1}{Av\_Rating}_{i,t-1}+{\beta }_{2}{Disp\_Rating}_{i,t-1}+{\beta }_{3}\mathrm{ln}({Vol\_Rating}_{i,t-1})+{\beta }_{4}{Av\_Polarity}_{i,t-1}+{\beta }_{5}{Av\_Subjectivity}_{i,t-1}+{\beta }_{6}\mathrm{ln}({Rev\_Length}_{i,t-1})+{\beta }_{7}{Gross\_Rank}_{i,t-1}+{\beta }_{8}{App\_Age}_{i,t-1}+{\beta }_{9}{Rating\_Variable}_{i,t-1}\times {Review\_Variable}_{i,t-1}+{\varepsilon }_{i,t}$$
(2)

As we used fixed-effects model analyses, \({App\_Type}_{i}\) would be omitted as a time-invariant category variable if included as an independent variable. Hence, we split the dataset to separately analyze the benchmark model and interaction effect models for gaming and productivity apps, respectively. For causality reasons, independent variables were lagged one day in all models, as shown in (1)–(10). Appendix 2 presents a correlation matrix for the explanatory variables. It reveals no correlation higher than (\(\pm\)) 0.8, implying no severe issue of multicollinearity (Mota and Moreira 2015). Appendix 3 presents descriptive statistics for the variables. Due to the large standard deviations, high skewness, and differences in scale reported for Downloads, Rev_Length, and Vol_Rating, we used the natural logarithm of these variables. This procedure was taken to normalize the data, in line with recommendations for valid econometric analysis (Li et al. 2020). The same procedure has commonly been applied to these variables in related work based on app store data (Lee and Raghu 2014; Oh et al. 2015; Gokgoz et al. 2021; Kaur and Singh 2021). Due to the presence of heteroskedasticity revealed by Breusch-Pagan tests, robust standard errors were used in our regression analyses, in line with recommendations (Angrist and Pischke 2008).
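For completeness, the benchmark specification can be estimated along the following lines. This is a sketch using the linearmodels package (an assumption; the estimation software is not named here), continuing the illustrative panel from the previous sketches and assuming Downloads has been merged in; log1p rather than log is used only to guard against zero counts:

```python
import numpy as np
from linearmodels.panel import PanelOLS

# linearmodels expects an entity-time MultiIndex (app, day).
df = panel.set_index(["app_id", "day"]).dropna()

# Natural-log transforms as in the paper; log1p guards against zeros
# (an implementation choice here, not stated in the source).
df["ln_Downloads"] = np.log1p(df["Downloads"])
df["ln_Vol_Rating_lag1"] = np.log1p(df["Vol_Rating_lag1"])
df["ln_Rev_Length_lag1"] = np.log1p(df["Rev_Length_lag1"])

exog = ["Av_Rating_lag1", "Disp_Rating_lag1", "ln_Vol_Rating_lag1",
        "Av_Polarity_lag1", "Av_Subjectivity_lag1", "ln_Rev_Length_lag1",
        "Gross_Rank_lag1", "App_Age_lag1"]

# Benchmark model (1): app fixed effects with heteroskedasticity-robust SEs.
fe = PanelOLS(df["ln_Downloads"], df[exog], entity_effects=True)
print(fe.fit(cov_type="robust"))

# Models (2)-(10): re-estimate with one lagged rating-review product term
# appended to exog, one model per interaction, per app category subsample.
```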

Results and discussion

Tables 2 and 3 in the two subsequent sections report the findings of the fixed-effect regression analyses of how rating and review variables in combination impact gaming and productivity app downloads. The findings are discussed next.

Table 2 Interplay of ratings and reviews in gaming app downloads
Table 3 Interplay of ratings and reviews in productivity app downloads

Ratings’ and reviews’ impact on gaming app downloads

For gaming apps, volume of ratings is the only piece of rating information found to have a direct effect on downloads, as shown in Table 2. Contrary to previous studies of apps (Wang et al. 2015; Burgers et al. 2016), we report a significant negative effect of volume of ratings on downloads. In general, the higher the volume of ratings, the more popular a product is perceived to be (see Khare et al. 2011, for a review of arguments). However, Khare et al. (2011) demonstrate that when consumers have a high need for uniqueness, a higher volume of ratings for a product decreases preference for it. Future work should investigate whether this need-for-uniqueness effect pertains to mobile apps.

We report that length of text reviews has a positive significant effect on gaming app downloads. This finding is consistent with literature arguing that longer text reviews are more helpful to potential consumers due to providing richer product quality cue information (Huang et al. 2015). Polarity, as a measure of a text review’s valence, was not found to impact gaming app downloads. This finding is consistent with consumers displaying heterogeneity in preferences for products consumed mainly for hedonic purposes (Liu et al. 2014; Tafesse 2021). In other words, if consumers have very different needs and desires for a product, valence measures, such as average rating score and polarity, could be insufficient to rely on separately for making downloading decisions.

Instead of relying on either ratings or reviews, our results indicate that consumers use a combination of both for their gaming app downloading decisions. The findings are thus in line with the cue consistency and diagnosticity rationales for consumers' combinatory use of ratings and reviews (Feldman and Lynch 1988; Miyazaki et al. 2005). Four significant interaction effects between rating and review variables are reported in Table 2. First, polarity enhances the positive impact of average ratings on downloads, which is consistent with findings for books reported by Hu et al. (2013). One interpretation of this interaction effect is that by reading positive reviews of quality aspects that pertain to their needs, consumers rely more on the high average rating score of a gaming app. Second, polarity is found to enhance the negative impact of volume of ratings on downloads, following a positive interaction effect. Whether this is due to a need for uniqueness, as demonstrated by Khare et al. (2011), needs further scrutiny. Third, a negative interaction effect on downloads is reported for length of text reviews and dispersion of ratings. This implies that longer text reviews, by providing richer product quality information and being more helpful than shorter ones, as argued by Huang et al. (2015), could reduce the quality uncertainty that spread in gaming app ratings creates. Since consumers generally seek to avoid risk, dispersion of ratings tends to be undesirable (Chu et al. 2014).

Fourth, in a similar fashion, length of text reviews is found to lessen the negative effect of volume of ratings on downloads. Due to the direct and indirect effects of length of text reviews on gaming app downloads, having consumers write extensive reviews seems important to gaming app developers.

Ratings’ and reviews’ impact on productivity app downloads

The findings for productivity app downloads are reported in Table 3. Again, volume of ratings is the only rating variable with a significant direct effect on downloads. In contrast to gaming apps, its effect is positive and significant for productivity apps. This finding is consistent with arguments in the literature that higher volume of word-of-mouth has more persuasive power in decision-making and indicates a more popular product (see Khare et al. 2011 for a review). It moreover corroborates previous empirical findings on how volume of ratings impacts app downloads (Oh et al. 2015; Wang et al. 2015). This suggests the importance of having users rate a productivity app in order to grow the developer's customer base.

Polarity is the only review variable reported to have a direct effect on productivity app downloads. For products providing mainly utilitarian value, consumers tend to be much more homogeneous in their preferences. Therefore, it has been argued that consumers can infer an app’s quality from its average rating score (Roma and Ragaglia 2016). However, our findings reveal that average rating score does not impact downloads of such apps. One potential explanation of this finding is that, to potential consumers, average rating scores for competing app alternatives are too similar, as stated previously in the literature (Li 2018). In our dataset, the average rating score for an app was 4.027 on a scale of 1 to 5 stars, which corresponds to what is reported in related work (Hyrynsalmi et al. 2015). Consumers may therefore instead turn to text reviews to obtain peers’ detailed opinions on an app’s quality. Moreover, average rating score is an aggregated measure of a product’s quality to a consumer. For multi-attribute productivity apps (e.g., statistics software, spreadsheet apps, word processing apps), potential consumers may be particularly interested in the quality of specific functions or tools of such apps. By representing richer cues than ratings, consumer text reviews may better reveal such information to potential app adopters. This could explain why polarity, but not average rating score, significantly impacts downloads of productivity apps.

In the case of productivity apps as well, consumers seem to use a combination of rating and review information rather than relying on one or the other. As reported in Table 3, an increase in polarity enhances the negative effect of dispersion of ratings on downloads. This suggests that consumers grow more skeptical toward mainly positive text reviews when they see that different consumers have rated an app very differently. This line of reasoning is consistent with findings by Huang et al. (2015) on how consumers use online text reviews for their decision-making. Finally, our results reveal that subjectivity in written text reviews enhances the negative impact of dispersion of ratings on productivity app downloads. This is consistent with arguments that for products of a utilitarian value nature, consumers are more oriented toward fulfilling professional responsibilities, which implies risk aversion (Das et al. 2018). Specifically, subjective text reviews accompanied by high dispersion of ratings are suggested to increase potential consumers' skepticism that the app meets the requirements for successful professional task completion. Overall, the findings reported in Tables 2 and 3 reveal different interaction effects of rating and review variables on downloads for gaming compared with productivity apps. The findings are therefore in line with arguments that consumers use rating and review information to different extents depending on whether app consumption is mainly of hedonic or utilitarian value orientation (Roma and Ragaglia 2016; Tafesse 2021).

Conclusions and implications

In this paper, we have explored the combinatory role of online ratings and online reviews in mobile app downloads. This was achieved by utilizing a daily panel data set of 295 gaming apps and 46 productivity apps, tracked for almost two years from their launch in the Apple App Store. We report that ratings and reviews have both direct effects and interaction effects on downloads. At the same time, these effects differ for gaming versus productivity apps. We thereby provide further support to the literature arguing for classification of apps into hedonic and utilitarian consumption value segments, that is, apps consumed mainly for fun versus for professional purposes (Liu et al. 2014; Roma and Ragaglia 2016; Tafesse 2021). Mainly, this study has contributed to the sparse literature on how online ratings and reviews, separately and in combination, impact app consumer behavior. The findings have important implications for the user attraction and retention strategies of app developers and platform providers.

Limitations

The limitations of this study are important to acknowledge. First, our empirical analysis was based on a comparison of two app categories. These categories were selected because ratings and reviews have been found to have different effects depending on the type of consumption value a product mainly provides to consumers. Previous studies have thus segmented apps according to hedonic and utilitarian value based on their app store category and reported the merits of such classification. However, the app categories analyzed in this study may differ in other dimensions as well, such as the role played by network effects (Arora et al. 2017) and consumer segments targeted (Liu et al. 2017). The data we utilized did not allow us to control for such effects. Such extensions of this study are recommended for future work into apps. Second, our results are limited to one country (US) and the US Apple App Store market. To what extent our findings generalize to countries with other characteristics, e.g., other cultural dimensions (Hofstede 2001), and across different platforms for apps (Roma and Vasi 2019) needs further scrutiny. Third, our study was restricted to how peer influence impacts app downloads based on literature arguments favoring its use (Sher and Lee 2009). Our work thus needs to be extended by analyzing how peer information relative to commercial marketing information impacts app downloads. Future studies could consider how app tutorial videos and advertisements supplied by app providers impact downloads. Fourth, our findings rely on the use of one specific opinion mining technique. Although this technique has been validated repeatedly and used in related work (Fellbaum 1998; De Smedt and Daelemans 2012), testing the robustness of findings across lexicons and mining techniques is called for.

Directions for further research

This study explored the combinatory role of ratings and reviews in app downloads. Cue theoretical arguments on the one hand, and app characteristics discussed in the literature on the other, guided our exploratory study of how rating and review variables impact consumer decision-making. As our findings reveal that ratings and reviews have a combinatory impact, future experimental work should examine the corroborating and complementary cue value effects separately. Moreover, qualitative research into how consumers make mobile app downloading decisions is called for. An improved understanding could offer valuable insights to app developers and platform providers on how to describe apps in appropriate ways as well as how to appeal to users.

Identification of the conditions that make consumers rely on ratings, reviews, or a combination of both is important for improved knowledge of such mechanisms' effectiveness. Previous studies have indicated that ratings matter to different extents depending on country profile (Kübler et al. 2018) and app store market (Jung et al. 2012). This study has contributed to such work by demonstrating how app type matters for the combinatory role of ratings and reviews. Additional conditions are worth exploring in future work, such as how consumer type, for example opinion leadership versus opinion-seeking (see Flynn et al. 1996), and type of revenue model (see Roma and Ragaglia 2016) matter for the role played by ratings and reviews. Another condition to consider is how product updates impact ratings and reviews and, in turn, consumer decision-making. Until now, there has been little research into the role of updates in product appeal, such as for software (see Comino et al. 2018). Moreover, comparative studies into how ratings and reviews matter for different outcomes, such as creating product awareness, initiating use, and generating sales, represent another avenue for further research.

Previous research (see Schrum et al. 2020, for a review) has demonstrated that the design of a scale, such as a rating system, matters for consumers' response to it. The range of a rating scale and how alternatives on the scale are phrased or depicted may therefore influence consumer behavior. This effect has been studied for other products, albeit not for mobile apps with their specific characteristics. Optimizing the app store rating system could therefore potentially improve ratings and provide consumers with better insight into the expected experience before downloading a mobile app. Other research (Filieri et al. 2021) has shown that consumers are biased in their consumption decisions based on how previous consumers rated a product. Whether to treat ratings linearly on a scale is therefore worthy of further scrutiny. For instance, how consumers weight differences in average rating score across competing product alternatives represents one such issue.

Moreover, how long the memory process is for low, high, and moderate ratings could be worthy of further investigation (cf. Zhang et al. 2015). Such studies could offer complementary value to the present study, in which a one-day lag of rating and review variables was used to investigate their impact on mobile app downloads.

Managerial implications

Our findings have important implications for app developers and platform providers. First, this study reveals that consumers rely on both rating and text review information for their app downloading decisions. However, which pieces of this information matter is found to differ between gaming apps and productivity apps.

For gaming apps, developers should encourage current users to write extensive reviews, as these are found to positively impact new downloads of such apps. This can be done via in-app prompts asking users to rate the app and provide text reviews on its important attributes. Preferences for products used mainly for fun tend to vary from person to person (Akdim et al. 2022). This is consistent with our findings that both a high rating for a gaming app and a positive text review are necessary to attract new downloads. These implications should also be considered by developers of other apps that are mainly consumed for leisure or fun.

For productivity apps, which tend to have multiple attributes and functions such as statistics software or word processing tools, positive reviews are important for stimulating downloads. This information represents important cues about the quality of specific functions of interest to potential consumers, e.g., the quality of a specific data analysis tool in statistics software. Moreover, having consumers rate productivity apps is important for attracting downloads. Developers of productivity apps should therefore create means for users to rate their apps. It is argued that doing so creates discussions about the app, which stimulates downloads (Mitchell and Khazanchi 2008). These implications should be considered by developers of apps consumed mainly for professional purposes.

Finally, for app platform providers, it is important that rating mechanisms fulfill the function of providing users with valuable cue information on competing app alternatives. In our study sample, the average rating score for competing app alternatives was 4.027 on a scale of 1 to 5. Similar findings have been reported in related work (Hyrynsalmi et al. 2015). In effect, ratings may not be sufficiently different across competing app alternatives to guide consumer choice. App stores should therefore consider having consumers provide an overall rating of an app as well as of its various quality attributes. Displaying this specific information in app stores could remedy the problem of ratings being too similar to inform consumer decision-making. As reading an extensive number of text reviews may require significant effort by potential consumers, particularly for lower-priced products such as apps (Burgers et al. 2016), readers need help sorting large volumes of reviews. One extension opportunity for app platform providers is to have readers click on an icon next to a text review if they find it helpful. This information could help readers sort text reviews based on users’ helpfulness ratings, which could make review information more useful to potential app consumers. Finally, the implications for developers of gaming and productivity apps should be considered by app platform providers as well. This follows from the revenue-sharing agreements between app developers and app platform providers (Roma and Ragaglia 2016).