1 Introduction

App stores have become important platforms for the distribution of software products. In 2020, the Google Play Store and the Apple App Store hosted over 5 million apps and were widely used for the discovery, purchase, and update of software products (Clement 2020). The emergence of these App Stores has had important effects on software engineering practices, notably by bridging the gap between developers and users, by increasing market transparency, and by affecting release management (AlSubaihin et al. 2019). Martin et al. (2017) used the term ‘app store analysis’ to denote the emerging research using app store data for software engineering. Their survey identified the richness and diversity of research using App Store data, notably for API analysis, feature analysis, release engineering, security, and review analysis (Martin et al. 2017).

This paper focuses on analysing app reviews for software engineering. App reviews are textual feedback, associated with a star rating, that app users can provide to other App Store users and to app developers about their experience of an app (App Store 2021). Most reviews are at most 675 characters long (Pagano and Maalej 2013) and convey information on a variety of topics such as feature requests, bug reports, or user opinions (Martin et al. 2017; Al-Hawari 2020). Analysing these reviews can benefit a range of software engineering activities. For example, in requirements engineering, analysing app reviews can help software engineers elicit new requirements about app features that users desire (Johann et al. 2017; Dąbrowski et al. 2020); in testing, app reviews can help in finding bugs (Maalej and Nabil 2015; Iacob et al. 2016; Shams et al. 2020) and in evaluating users’ reactions to released beta versions of apps (Gao et al. 2019; AlSubaihin et al. 2019); during product evolution, analysing app reviews may help in identifying and prioritizing change requests (Villarroel et al. 2016; Gao et al. 2018b; Gao et al. 2019; Dąbrowski et al. 2020).

In recent years, scholars have also been studying on-line user feedback from other digital sources such as microblogs, e.g., Twitter (Guzman et al. 2017), on-line forums, e.g., Reddit (Khan et al. 2019), or issue tracking systems, e.g., JIRA (Nyamawe et al. 2019). Most research efforts, however, have focused on analyzing app reviews (Lim et al. 2021). The large volume of this data, its availability, and its usefulness presumably make app reviews unique and thus the most frequently studied type of on-line user feedback (Lim et al. 2021).

Significant research has been devoted to studying what relevant information can be found in app reviews, how that information can be analysed using manual and automatic approaches, and how it can help software engineers. However, this knowledge is scattered across the literature, and consequently there is no clear view of how app review analysis can support software engineering. The previous survey on app store data analysis (Martin et al. 2017) identified app review analysis as one important topic within the broader area of app store analysis, but it does not present a detailed, comprehensive analysis of app review analysis techniques. Other literature reviews focus on specific types of review analysis such as opinion mining (Genc-Nayebi and Abran 2017) and information extraction (Tavakoli et al. 2018; Noei and Lyons 2019), but they do not cover the whole range of research on analysing app reviews. In contrast, this paper provides a systematic literature review of that whole range of research, from the first paper published in 2012 up to the end of 2020. The paper’s objectives are to:

  • identify and classify the range of app review analyses proposed in the literature;

  • identify the range of natural language processing and data mining techniques that support such analysis;

  • identify the range of software engineering activities that app review analysis can support;

  • report the methods and results of the empirical evaluation of app review analysis approaches.

To accomplish these objectives, we have conducted a systematic literature review following a well-defined methodology that identifies, evaluates, and interprets the relevant studies with respect to specific research questions (Kitchenham 2004). After a systematic selection and screening procedure, we ended up with a set of 182 papers, covering the period 2012 to 2020, that were carefully examined to answer the research questions.

The primary contributions of the study are: (i) synthesis of approaches and techniques for mining app reviews, (ii) new knowledge on how software engineering scenarios can be supported by mining app reviews, (iii) a summary of empirical evaluation of review mining approaches, and finally (iv) a study of literature growth patterns, gaps, and directions for future research.

2 Research Method

To conduct our systematic literature review, we followed the methodology proposed by Kitchenham (2004). We first defined research questions and prepared a review protocol, which guided our conduct of the review and the collection of data. We then performed the literature search and selection based on agreed criteria. The selected studies were read thoroughly, and the data items listed in Table 3 were collected using a data extraction form. Finally, we synthesized the results for reporting.

2.1 Research Questions

The primary aim of the study is to understand how analysing app reviews can support software engineering. Based on this aim, the following research questions have been derived:

  • RQ1: What are the different types of app review analyses?

  • RQ2: What techniques are used to realize app review analyses?

  • RQ3: What software engineering activities are claimed to be supported by analysing app reviews?

  • RQ4: How are app review analysis approaches empirically evaluated?

  • RQ5: How well do existing app review analysis approaches support software engineers?

The aim of RQ1 is to identify and classify the different types of app review analysis presented in the primary literature, where an app review analysis refers to a task of examining, transforming, or modeling data with the goal of discovering useful information. The aim of RQ2 is to identify the range of techniques used to realize the different types of app review analysis identified in RQ1, where a technique denotes a means of performing an app review analysis. The aim of RQ3 is to identify the range of software engineering activities that have been claimed to be supported by analyzing app reviews, where a software engineering activity refers to a task performed along the software development life cycle (Bourque et al. 1999). The aim of RQ4 is to understand how primary studies obtain empirical evidence about the effectiveness and perceived quality of their review analysis approaches. The aim of RQ5 is to summarize the results of empirical studies about the effectiveness and user-perceived quality of different types of app review analysis.

2.2 Literature Search and Selection

We followed a systematic search and selection process to collect relevant literature published between January 2010 and December 2020. Figure 1 outlines the process as a PRISMA diagram; it illustrates the main steps of the process and their outcomes (the number of publications).

Fig. 1 PRISMA diagram showing study search and selection

The initial identification of publications was performed using keyword-based search on five major digital libraries: the ACM Digital Library, IEEE Xplore Digital Library, Springer Link Online Library, Wiley Online Library, and Elsevier Science Direct. We defined two search queries that we applied to both the meta-data and the full text (when available) of the publications. To construct the first query, we looked at the content of several dozen publications analysing reviews for software engineering. We identified key terms that these papers share and used those terms to formulate a specific query.


To avoid omitting relevant papers not covered by this specific query, we also formulated a general query based on phrases reflecting the key concepts of our research objective.


The initial search via digital libraries resulted in 1,656 studies, of which 303 were duplicates. We screened the remaining 1,353 studies against the inclusion and exclusion criteria (see Table 1). To ensure the reliability of our screening process, the four authors of this paper independently classified a sample of 20 papers (each paper was assigned to two authors). We then assessed the inter-rater agreement (Cohen’s Kappa = 0.9) (Viera and Garrett 2005).

Table 1 Inclusion and exclusion criteria

Because the search queries were deliberately broad, the majority of the retrieved studies were unrelated to the scope of the survey. We excluded 1,225 publications that did not meet the inclusion criteria. Subsequently, we complemented our search with two other strategies to find relevant papers that could have been missed in the initial search. First, we performed a manual issue-by-issue search of major conference proceedings and journals in software engineering in the period from January 2010 to December 2020; the searched journals and proceedings are listed in Table 2. This step produced another 14 unique publications. Second, we completed the search with a snowballing procedure following the guidelines proposed by Wohlin (2014). We performed backward snowballing considering all the references of the relevant studies found by the previous search strategies, and forward snowballing based on the 10 most cited papers. The snowballing procedure yielded an additional 40 relevant articles matching our inclusion criteria; as before, we screened these papers based on title, abstract, and full text (if needed). In total, we ended up with 182 articles included in the survey.

Table 2 Selected conference proceedings and journals for manual search

2.3 Data Extraction

The first author created a data extraction form to collect detailed content for each of the selected studies. The extracted data items were used to synthesize information from the primary studies and to answer research questions RQ1-RQ5. Table 3 presents the data items the first author extracted:

  • Title, Author(s), Year, Venue, Citation (F1-F5) are used to identify the paper and its bibliographic information. For F5, we recorded the citation count for each paper according to Google Scholar as of 4 August 2021.

  • Review Analysis (F6) records the type of app review analysis (F6.1) (e.g. review classification), mined information (F6.2) (e.g. bug report) and supplementary description (F6.3).

  • Technique (F7) records the techniques used to realize the analysis. We recorded the technique type (F7.1), e.g., machine learning, and its name (F7.2), e.g., Naïve Bayes.

  • Software Engineering Activity (F8) records the specific software engineering activities (e.g. requirements elicitation) mentioned in the paper as being supported by the proposed app review analysis method. We used a widely known taxonomy of software engineering phases and activities to identify and record these items (Bourque et al. 1999).

  • Justification (F9) records the paper’s explanation of how the app review analysis supports the software engineering activities. Some papers do not provide any justification.

  • Evaluation Objective (F10) records the general objective of the paper’s evaluation section (F10.1) (e.g. quantitative effectiveness, or user-perceived usefulness) and the type of evaluated app review analysis (F10.2).

  • Evaluation Procedure (F11) records the paper’s evaluation method and detailed evaluation steps.

  • Evaluation Metrics and Criteria (F12) records the quantitative metrics (e.g. precision and recall) and criteria (e.g. usability) used in the evaluation.

  • Evaluation Result (F13) records the result of empirical evaluation with respect to the evaluation metrics and criteria.

  • Annotated Dataset (F14) records information about the datasets used in the study. We stored information about App Store name from which reviews were collected (F14.1) e.g., Google Play, and the number of annotated reviews (F14.2).

  • Annotation Task (F15) records the task that human annotators performed when labeling a sample of app reviews, e.g., classifying reviews by the type of issue discussed.

  • Number of Annotators (F16) records the number of human annotators labeling app reviews for empirical evaluation.

  • Quality Measure (F17) records the measures used for assessing the reliability of the annotated dataset, e.g., Cohen’s Kappa.

  • Replication Package (F18) records whether a replication package is available. When one is available, we also recorded details about its content such as the availability of an annotated dataset, the analysis method implementation, and experiment scripts. In addition to the reported information, we contacted the authors of primary studies to check the availability of the replication packages.

Table 3 Data extraction form

The reliability of data extraction was evaluated through inter- and intra-rater agreement (Ide and Pustejovsky 2017). The agreement was measured using percentage agreement on a recommended sample size (Graham et al. 2012; Bujang and Baharum 2017). To evaluate intra-rater agreement, the first author re-extracted data items from a random sample of 20% of the selected papers; an external assessor then validated the extraction results between the first and second rounds and computed the percentage agreement. To evaluate inter-rater agreement, the assessor independently extracted data items from a new random sample of 10% of the selected papers; the first author and the assessor then compared their results and computed the agreement. The intra-rater agreement was 93% and the inter-rater agreement was 90%, indicating nearly complete agreement (Ide and Pustejovsky 2017).
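
For illustration, the sketch below shows how percentage agreement and Cohen’s Kappa can be computed for two raters; the labels and the use of scikit-learn are illustrative assumptions rather than the actual scripts used in this survey.

from sklearn.metrics import cohen_kappa_score

# Hypothetical labels assigned independently by two raters to the same five papers.
rater_a = ["include", "exclude", "include", "include", "exclude"]
rater_b = ["include", "exclude", "include", "exclude", "exclude"]

# Percentage agreement: share of items on which both raters assigned the same label.
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

# Cohen's Kappa corrects raw agreement for the agreement expected by chance.
kappa = cohen_kappa_score(rater_a, rater_b)

print(f"Percentage agreement: {agreement:.2f}, Cohen's Kappa: {kappa:.2f}")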

2.4 Data Synthesis

Most data in our review are grounded in qualitative research. As found by other researchers, tabulating the data is useful for the aggregation, comparison, and synthesis of information (Kitchenham 2004). The data were thus stored in spreadsheets, manually reviewed, and interpreted to answer the research questions. Parts of the extracted data were synthesized using descriptive statistics.

We also used three classification schemas to group the collected information on app review analyses (F6), mining techniques (F7), and SE activities (F8). We constructed each schema following the same general procedure based on the content analysis method (Bauer 2007): the first author initially examined all the collected information of a specific data item type and then performed an iterative coding process. During the coding, each piece of information was labeled with one of the categories identified in the literature or inferred from the collected data.

To create the schema of app review analyses, we adopted the 5 categories proposed in the previous survey (Martin et al. 2017). As these categories were not exhaustive for the coding, we extended them with 14 additional categories: 7 from the taxonomy of mining tasks (Cannataro and Comito 2003) and 7 standard types of text analytics (Miner et al. 2012); we referred to the data and text mining areas as they have well-defined terminology for text analysis. We then merged semantically related categories and removed those unrelated to the domain of app review analysis. We extended the resulting list of 8 categories by adding a Recommendation category abstracted from the remaining unlabelled data. With these 9 categories, the first author performed the final coding. Table 7, in the corresponding result section, presents the nine types of app review analyses.

The classification schema of mining techniques is informed by categories in a previous survey on intelligent mining techniques (Tavakoli et al. 2018) and in text analytics (Miner et al. 2012; Singh 2021; Software 2021). We first identified 5 categories of mining techniques: 4 categories proposed in the previous survey (Tavakoli et al. 2018) and 1 category identified from text analytics, i.e., statistical analysis (Miner et al. 2012; Singh 2021; Software 2021). While coding, however, we excluded the feature extraction category, as it refers to an instance of the general information extraction task rather than a type of technique (Miner et al. 2012), and performed the final coding using the remaining 4 categories. The resulting categories of mining techniques can be found in Table 9.

We derived the schema of SE activities from the terminology of the software engineering body of knowledge (Bourque et al. 1999); we first identified 258 terms related to the main software engineering concepts and then selected 58 terms describing candidate activities for the coding process. While coding, we excluded 44 terms as they did not match any data items, and performed the final coding using the remaining 14 terms (henceforth SE activities). Table 13 lists the resulting software engineering activities in the corresponding result section.

We validated the coding reliability of each schema using inter- and intra-rater agreement. We measured the reliability using percentage agreement on a recommended sample size (Graham et al. 2012; Bujang and Baharum 2017). To evaluate intra-rater agreement, the first author re-coded a random sample of 20% of the selected papers; the external assessor then checked the coding between the first and second rounds. To evaluate inter-rater agreement, both the first author and the assessor coded a new random sample of 10% of the papers and then cross-checked their results. The percentage intra- and inter-rater agreements were equal to or above 90% and 80%, respectively, for each schema, indicating very good coding quality (Ide and Pustejovsky 2017); Table 4 provides detailed statistics for the reliability evaluation.

Table 4 The intra- and inter-rater agreement for the classification schemas

The spreadsheets resulting from our data extraction and data grouping can be found in the supplementary material of this survey (Dąbrowski 2021).

3 Result Analysis

3.1 Demographics

Figure 2 shows the number of primary studies per year, including a breakdown by publication type (Journal, Conference, Workshop, and Book). The publication dates of the primary studies range from 2012 to 2020. We observed that 53% of the primary studies were published in the last 3 years, indicating a growing interest in research on analyzing app reviews to support software engineering.

Fig. 2 Number of publications per year. The first papers on app review analysis were published in 2012

Figure 3 shows the distribution by venue type: 65% of papers are published in conferences, 23% in journals, 10% in workshops, and 2% as book chapters. Table 5 lists the top ten venues in terms of the number of published papers. The venues include the main conferences and journals in the software engineering community. Table 6 lists the twenty most cited papers in the field of app review analysis for software engineering and summarizes their contributions. These studies advanced the field in substantial ways or introduced influential ideas.

Fig. 3 Pie chart showing the distribution of research papers per venue type in the period from 2010 to December 31, 2020

Table 5 Top ten venues publishing papers on app review analysis between 2010 and 2020
Table 6 Twenty most influential papers in the field of app reviews analysis for software engineering, ordered by year of publication

3.2 RQ1: App Review Analysis

In this section, we answer RQ1 (what are the different types of app review analysis) based on data extracted in F6 (review analyses). To answer the question, we grouped data items into one of nine general categories, each representing a different review analysis type (F6.1). We performed the grouping following the classification schema we had constructed for this study (see Section 2.4); and categories previously proposed in the context of app store analysis (Martin et al. 2017) as well as data and text mining (Cannataro and Comito 2003; Miner et al. 2012). Here, we focused on an abstract representation, because primary studies sometimes use slightly different terms to refer to the same type of analysis. Table 7 lists the different types of app review analyses and their prevalence in the literature.

Table 7 App review analysis types and their prevalence in the literature

3.2.1 Information Extraction

App reviews are unstructured text. Manually extracting relevant information from a large volume of reviews is not cost-effective (Vu et al. 2015a). To address the problem, 56 of the primary studies (31%) proposed approaches facilitating information extraction. Formally, information extraction is the task of extracting specific (pre-specified) information from the content of a review; this information may concern app features (Guzman and Maalej 2014; Johann et al. 2017; Dąbrowski et al. 2020), qualities (Groen et al. 2017; Wang et al. 2020b), problem reports and/or new feature requests (e.g., Iacob and Harrison 2013; Wang et al. 2017; Gao et al. 2019; Shams et al. 2020), opinions about favored or unfavored features (e.g., Guzman and Maalej 2014; Gu and Kim 2015; Vu et al. 2015a; Li et al. 2017), as well as user stories (Guo and Singh 2020). Relevant information can be found at any location in a review. For instance, a problematic feature can be discussed in the middle of a sentence (Guzman and Maalej 2014; Williams et al. 2020), or a requested improvement can be expressed anywhere in a review (Gao et al. 2015; Guo and Singh 2020).
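
As a minimal illustration of this task, the sketch below approximates feature mentions with noun phrases extracted by spaCy; this is a simplified stand-in for the PoS patterns, collocations, and learned extractors used in the surveyed approaches, and the review text is invented.

import spacy

# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

review = "The photo editor crashes when I export a video, please fix the export feature."

# Approximate candidate feature mentions as noun phrases headed by a common noun.
doc = nlp(review)
candidates = [chunk.text.lower() for chunk in doc.noun_chunks if chunk.root.pos_ == "NOUN"]

print(candidates)  # e.g. ['the photo editor', 'a video', 'the export feature']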

3.2.2 Classification

Classification consists of assigning predefined categories to reviews or textual snippets (e.g., sentences or phrases). Classification is by far the most common type of app review analysis found in the literature: 58% of publications describe techniques for classifying reviews. Classification can be used to separate informative reviews from those that are uninformative (e.g., Oh et al. 2013; Chen et al. 2014; Di Sorbo et al.2016; Di Sorbo et al. 2020), spam (Chandy and Gu 2012) or fake (Martens and Maalej 2019b). Informative reviews can be subsequently classified to detect user intentions (e.g., Maalej et al. 2016; Zhou et al. 2020) and discussion topics (e.g., Di Sorbo et al. 2017; van Vliet et al. 2020). User intentions include reporting an issue or requesting a new feature (Panichella et al. 2015; Panichella et al. 2016; Srisopha et al. 2020b).

Discussion topics include a variety of concerns such as installation problems, the user interface, or price (Mujahid et al. 2017; Ciurumelea et al. 2018; Williams et al. 2020); topics concerning user perception, e.g., rating, user experience, or praise (Pagano and Maalej 2013; Li et al. 2020); and topics reporting different types of issues (Khalid 2013; McIlroy et al. 2016; Tao et al. 2020). For instance, review classification has been proposed to detect different types of usability and user experience issues (Bakiu and Guzman 2017; Alqahtani and Orji 2019), quality concerns (Mercado et al. 2016; Wen and Chen 2020), or different types of security and privacy issues (Cen et al. 2014; Tao et al. 2020). Similarly, app store feedback can be classified by the type of requirements it reports (Yang and Liang 2015; Deocadez et al. 2017a; Lu and Liang 2017; Wang et al. 2018; Wang et al. 2018; Wen and Chen 2020). This can help distinguish reviews reporting functional requirements from those reporting non-functional requirements (Yang and Liang 2015; Deocadez et al. 2017a; Wang et al. 2018; Wang et al. 2020b), and distil non-functional requirements into fine-grained quality categories such as reliability, performance, or efficiency (Lu and Liang 2017; Wang et al. 2018). Another key use of the classification task is rationale mining; it involves detecting the types of argumentation and justification users give in reviews when making certain decisions, e.g., about upgrading, installing, or switching apps (Kurtanović and Maalej 2017; Kurtanovic and Maalej 2018; Kunaefi and Aritsugi 2020).

3.2.3 Clustering

Clustering consists of organizing reviews, sentences, and/or snippets into groups (called clusters) whose members share some similarity: members of the same group are more similar (in some sense) to each other than to those in other groups. Unlike classification, clustering does not rely on predefined categories. Clustering is thus widely used as an exploratory analysis technique to infer topics commonly discussed by users (Pagano and Maalej 2013; Guzman et al. 2014; Guzman and Maalej 2014; Liu et al. 2018) and to aggregate reviews containing semantically related information (Chen et al. 2014; Guzman et al. 2015; Palomba et al. 2017; Zhou et al. 2020). Clustering can be used to group reviews that request the same feature (Peng et al. 2016; Di Sorbo et al. 2016), report similar problems (Martin et al. 2015; Villarroel et al. 2016; Gao et al. 2018b; Williams et al. 2020), or discuss a similar characteristic of the app (Vu et al. 2016; Chen et al. 2019; Xiao et al. 2020). The generated clusters might help software engineers synthesize information from a group of reviews referring to the same topic rather than examining each review individually (Fu et al. 2013; Gao et al. 2015; Wang et al. 2017; Hadi and Fard 2020).
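
A minimal sketch of such clustering, assuming a TF-IDF representation and k-means over an invented set of reviews (the surveyed studies often use topic models or more elaborate pipelines):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

reviews = [
    "App crashes when I open the camera",
    "Camera freezes and the app closes",
    "Please add a dark mode theme",
    "A dark theme would be great for night use",
]

# Represent each review as a TF-IDF vector and group similar reviews into two clusters.
vectors = TfidfVectorizer(stop_words="english").fit_transform(reviews)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for label, review in zip(labels, reviews):
    print(label, review)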

3.2.4 Search and Information Retrieval

Search and information retrieval concerns finding and tracing reviews (or their textual snippets) that match needed information. The task can be used to find reviews discussing a queried app feature (Vu et al. 2015a; Vu et al. 2015b; Dąbrowski et al. 2019), to obtain the most diverse user opinions in reviews (Guzman et al. 2015), or to trace which features described in the app description are discussed by users (Johann et al. 2017; Li et al. 2018). Information retrieval is also used to establish traceability links between app reviews and other software engineering artefacts (Palomba et al. 2015; Palomba et al. 2018), such as source code (Palomba et al. 2017; Zhou et al. 2020; Shams et al. 2020), stack traces (Pelloni et al. 2018), issues from tracking systems (Palomba et al. 2015; Noei et al. 2019), and warnings from static analysis tools (Wei et al. 2017), in order to locate problems in source code (Palomba et al. 2017; Ciurumelea et al. 2017; Grano et al. 2018), suggest potential changes (Palomba et al. 2015; Palomba et al. 2017), or flag errors and bugs in an application under test (Wei et al. 2017). Such traceability links can also be detected between reviews and feedback from other sources such as Twitter, to study whether the same issues are discussed in both digital channels (Yadav and Fard 2020; Yadav et al. 2020; Oehri and Guzman 2020); or between reviews and goals in a goal model, to understand the extent to which an app satisfies users’ goals (Liu et al. 2020; Gao et al. 2020).

Table 8 summarizes types of data that have been combined with app reviews using search and information retrieval; indicates the purpose of the analysis; and provides references to primary studies.

Table 8 Types of data that have been combined with app reviews using search and information retrieval

3.2.5 Sentiment Analysis

Sentiment analysis (also known as opinion mining) refers to the task of interpreting user emotions in app reviews. The task consists in detecting the sentiment polarity (i.e., positive, neutral, or negative) of a full review (Martens and Johann 2017; Martens and Maalej 2019a; Srisopha et al. 2020c), a sentence (Guzman and Maalej 2014; Panichella et al. 2015; Panichella et al. 2016), or a phrase (Gu and Kim 2015; Dąbrowski et al. 2020).

App reviews are a rich source of user opinions (Guzman and Maalej 2014; Malik et al. 2018; Masrury and Alamsyah 2019; Martens and Maalej 2019a; Wen and Chen 2020). Mining these opinions involves identifying user sentiment about discussed topics (Gu and Kim 2015; Dąbrowski et al. 2020), features (Guzman and Maalej 2014; Gunaratnam and Wickramarachchi 2020), or software qualities (Bakiu and Guzman 2017; Masrury and Alamsyah 2019; Franzmann et al. 2020). These opinions can help software engineers understand how users perceive their app (Guzman and Maalej 2014; Gu and Kim 2015; Huebner et al. 2018; Franzmann et al. 2020), discover users’ requirements (Dąbrowski et al. 2019; Dalpiaz and Parente 2019) and preferences (Guzman and Maalej 2014; Bakiu and Guzman 2017; Malik et al. 2018; Nicolai et al. 2019), and identify factors influencing sales and downloads of the app (Liang et al. 2015). Not surprisingly, knowing user opinions is an important information need that developers seek to satisfy (Buse and Zimmermann 2012; Begel and Zimmermann 2014; Dąbrowski et al. 2020).
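
A minimal sentence-level sketch using the off-the-shelf VADER lexicon shipped with NLTK; the sentences and thresholds are illustrative, and real approaches may rely on other lexicons or trained classifiers.

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # lexicon required by VADER
analyzer = SentimentIntensityAnalyzer()

sentences = [
    "I love the new offline mode, it works flawlessly.",
    "The latest update broke the login screen.",
]

for sentence in sentences:
    # The 'compound' score ranges from -1 (most negative) to +1 (most positive).
    score = analyzer.polarity_scores(sentence)["compound"]
    polarity = "positive" if score > 0.05 else "negative" if score < -0.05 else "neutral"
    print(f"{polarity:>8} {score:+.2f}  {sentence}")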

3.2.6 Content Analysis

Content analysis studies the presence of given words, themes, or concepts within app reviews.

For example, studies have analysed the relation between user ratings and the vocabulary and length of their reviews (Hoon et al. 2012; Vasa et al. 2012). Studies have shown that users discuss diverse topics in reviews (Pagano and Maalej 2013; Shams et al. 2020), such as app features, qualities (Williams and Mahmoud 2018; Franzmann et al. 2020), requirements (Wang et al. 2018; Wang et al. 2018), or issues (Khalid 2013; Alqahtani and Orji 2019; Kalaichelavan et al. 2020; Williams et al. 2020). For example, using content analysis, researchers analysed recurring types of issues reported by users (McIlroy et al. 2016; Wang et al. 2020a; Shams et al. 2020), their distribution in reviews, as well as relations between app issue types and other information such as price and rating (Iacob et al. 2013b; Hassan et al. 2018) or between issue types and code quality indicators (Di Sorbo et al. 2020). Interestingly, studies have pointed out that users’ perceptions of the same apps can vary by country (Srisopha et al. 2019), user gender (Guzman and Paredes Rojas 2019), development framework (Malavolta et al. 2015a), and app store (Ali et al. 2017). Content analysis can also help software engineers understand whether cross-platform apps achieve consistent user perceptions across different app stores (Hu et al. 2018; Hu et al. 2019), or whether hybrid development tools achieve their main purpose of delivering an app that is perceived similarly by users across platforms (Hu et al. 2019). Finally, studies of the dialogue between users and developers have shown evidence that users are more likely to update their rating for an app after developers respond to their reviews (McIlroy et al. 2015; Hassan et al. 2018).

3.2.7 Recommendation

The recommendation task aims to suggest courses of action that software engineers should follow. Several mining approaches, for instance (Chen et al. 2014; Villarroel et al. 2016; Scalabrino et al. 2019; Gao et al. 2020), have been proposed to recommend reviews that software engineers should investigate. These approaches typically assign priorities to a group of comments reporting the same bug (Gao et al. 2015; Man et al. 2016; Gao et al. 2018b) or requesting the same modification or improvement (Villarroel et al. 2016; Keertipati et al. 2016; Scalabrino et al. 2019; Zhou et al. 2020). The assigned priorities indicate the relative importance, from the users’ perspective, of the information these reviews convey. Factors affecting this importance range from the number of reviews in these groups (Chen et al. 2014; Zhou et al. 2020), to the influence of the feedback on app downloads (Tong et al. 2018), to the overall sentiment these comments convey (Licorish et al. 2017; Gunaratnam and Wickramarachchi 2020). In line with this direction, mining approaches have been elaborated to recommend feature refinement plans for the next release (Licorish et al. 2017; Zhang et al. 2019), to highlight static analysis warnings that developers should check (Wei et al. 2017), to recommend test cases triggering bugs (Shams et al. 2020), to indicate mobile devices that should be tested (Khalid et al. 2014), and to suggest reviews that developers should reply to (Greenheld et al. 2018; Gao et al. 2019; Srisopha et al. 2020c); such approaches can analogously recommend responses for these reviews (Greenheld et al. 2018; Gao et al. 2019), encouraging users to upgrade their ratings or revise their feedback to be more positive (McIlroy et al. 2015; Vu et al. 2019).

3.2.8 Summarization

Review summarization aims to provide a concise and precise summary of one or more reviews. Review summarisation can be performed based on common topics, user intentions, and user sentiment for each topic (e.g., Guzman and Maalej 2014; Ciurumelea et al. 2018; Liu et al. 2020). For example, Di Sorbo et al. (2016, 2017) proposed summarizing thousands of app reviews in an interactive report that suggests to software engineers what maintenance tasks need to be performed (e.g., bug fixing or feature enhancement) with respect to specific topics discussed in reviews (e.g., UI improvements). Other review summarization techniques give developers a quick overview of users’ perceptions of the core features of their apps (Iacob and Harrison 2013; Guzman and Maalej 2014; Xiao et al. 2020), software qualities (Ciurumelea et al. 2018), and/or main user concerns (Iacob et al. 2013a; Iacob et al. 2016; Ciurumelea et al. 2017; Tao et al. 2020). With the addition of statistics, e.g., the number of reviews discussing each topic or requesting specific changes, such a summary can help developers prioritize their work by focusing on the most important modifications (Ciurumelea et al. 2017). In addition, such a summary can be exported to other software management tools, e.g., GitHub or JIRA (Iacob et al. 2016), to generate new issue tickets and help in problem resolution (Phetrungnapha and Senivongse 2019).

3.2.9 Visualization

Visualization can aid developers in identifying patterns, trends, and outliers, making it easier to interpret information mined from reviews (Guzman et al. 2014; Liu et al. 2020). To communicate information clearly and efficiently, review visualization uses tables, charts, and other graphical representations (Guzman et al. 2014; Maalej et al. 2016), accompanied by numerical data (Maalej et al. 2016; Bakiu and Guzman 2017). For example, Maalej et al. (2016) demonstrated that trend analysis of review types (e.g., bug report, feature request, user experience) over time can be used by software engineers as an overall indicator of the project’s health. Other studies proposed visualizing the dynamics of the main themes discussed in reviews to identify emerging issues (Gao et al. 2015; Gao et al. 2015; Gao et al. 2018b; Gao et al. 2019), or to show the issue distribution for an app across different app stores (Man et al. 2016). Simple statistics about these issues (e.g., ‘How many reviews reported specific issues?’) can give an overall idea of the main problems, in particular when compared against other apps (e.g., ‘Do users complain more about the security of my app compared to similar apps?’). Similarly, analyzing the evolution of user opinions and bug reports about specific features can help software engineers monitor the health of these features and prioritize maintenance tasks (Vu et al. 2015a; Vu et al. 2016; Bakiu and Guzman 2017; Shah et al. 2019c). For instance, software engineers can analyse how often negative opinions emerge, for how long these opinions have been reported, and whether their frequency is rising or declining (Vu et al. 2015a; Gu and Kim 2015; Tao et al. 2020). This information could provide developers with evidence of the relative importance of these opinions from the users’ perspective (Bakiu and Guzman 2017; Dąbrowski et al. 2019).
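
A minimal sketch of such a trend visualization with matplotlib, using invented weekly counts of review categories (the categories and numbers are illustrative):

import matplotlib.pyplot as plt

# Hypothetical weekly counts of reviews per category for one app.
weeks = ["W1", "W2", "W3", "W4", "W5", "W6"]
crash_reports = [3, 5, 4, 12, 18, 25]
feature_requests = [7, 6, 8, 7, 6, 5]

plt.plot(weeks, crash_reports, marker="o", label="crash reports")
plt.plot(weeks, feature_requests, marker="s", label="feature requests")
plt.xlabel("Week")
plt.ylabel("Number of reviews")
plt.title("Review categories over time (illustrative data)")
plt.legend()
plt.show()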


3.3 RQ2: Mining Techniques

App review analyses (see Section 3.2) are realized using different text mining techniques. In this section, we address RQ2 (what techniques are used to realize app review analysis) based on the data extracted in F7 (mining technique), which we grouped following the classification schema constructed for this study (see Section 2.4). The categories of this schema come from a survey on intelligent mining techniques and tools (Tavakoli et al. 2018) and from the text analytics area (Miner et al. 2012; Singh 2021; Software 2021).

In answer to this question, we identified 4 broad categories of mining techniques: manual analysis (MA), natural language processing (NLP), machine learning (ML), and statistical analysis (SA). Table 9 lists the techniques and their prevalence in the literature. More than half of the studies employed NLP or ML, whereas MA and SA were present in 25% and 29% of the studies, respectively. Table 10 reports how many studies used a certain technique to realize a given type of app review analysis. We observe that NLP and ML are dominant for realizing app review analyses, except for Content Analysis, which is mostly performed using MA or SA techniques.

Table 9 Mining techniques and their prevalence in the literature
Table 10 How often primary studies used certain mining techniques to realise a type of app review analysis

A single study frequently used the same type of technique for realizing several app review analyses (e.g., Clustering and Classification); on the other hand, we also recorded that studies frequently combined techniques to perform a single app review analysis. Table 11 reports which combinations of techniques were used in the literature and how many studies used each combination for realizing a specific app review analysis. The results indicate that NLP and ML were mostly combined for Classification, MA and SA were used together for Content Analysis, whereas NLP and SA were adopted for Information Extraction. The following sections discuss each type of technique.

Table 11 How often primary studies used certain combination of techniques to realise a type of app review analysis; MA stands for manual analysis; NLP denotes natural language processing; ML marks machine learning; and SA signifies statistical analysis

3.3.1 Manual Analysis

Scholars have shown an interest in the manual analysis of app reviews (Kurtanovic and Maalej 2018; van Vliet et al. 2020). The technique is used to facilitate Content Analysis, e.g., to understand the topics users discuss (Pagano and Maalej 2013; Franzmann et al. 2020; Williams et al. 2020), and to develop ground truth datasets for training and evaluating mining techniques (Kurtanović and Maalej 2017; Dąbrowski et al. 2020). Manual analysis typically takes the form of tagging a sample of reviews with one or more meaningful tags (representing certain concepts). For example, tags might indicate the type of user complaint (Khalid et al. 2015; Wang et al. 2020a), the features discussed in a review (Maalej and Nabil 2015; Dąbrowski et al. 2020), or the sentiment users express (Sänger et al. 2016). To make replicable and valid inferences from manual analysis, studies perform it in a systematic manner. Figure 4 illustrates the overall procedure of manual analysis. Scholars first formulate the analysis objective, corresponding either to the exploration of review content (e.g., understanding types of user complaints) or to the development of a ground truth (e.g., labelling types of user feedback). They then select the reviews to be analysed and specify the unit of analysis (e.g., a review or a sentence). Next, one or more humans (called ‘coders’) follow a coding process to systematically annotate the reviews. A coder examines a sample of reviews and tags them with specific concepts. Unless these concepts are known in advance or coders agree about the tagging, the step is iterative: when, for example, new concepts are identified, coders re-examine all the previously tagged reviews and check whether they should also be tagged with the new concepts. Such iterations minimize the threat of human error when tagging the reviews. Once all the reviews are tagged, authors either analyse the findings or use the dataset to evaluate other mining techniques (Stanik et al. 2019; Williams et al. 2020; Dąbrowski et al. 2020).

Fig. 4 The overall process of manual analysis

Manual analysis is time-consuming and requires vast human effort (Pagano and Maalej 2013; Guzman and Maalej 2014; van Vliet et al. 2020); a pilot study typically precedes the actual analysis (Sänger et al. 2016; Kurtanović and Maalej 2017; Dąbrowski et al. 2020), and the actual tagging, focusing on a statistically representative sample of reviews, then takes place (Khalid et al. 2015). For example, Guzman and Maalej (2014) involved seven coders who independently tagged 2800 randomly sampled user reviews. For each review, two coders independently tagged the type of user feedback, the features mentioned in the review, and the sentiments associated with these features. The study reports that coders spent between 8 and 12.5 hours coding around 900 reviews.

3.3.2 Natural Language Processing

User-generated content of app reviews takes the form of text (Hoon et al. 2012; Vasa et al. 2012). Such text has plenty of linguistic structure intended for human consumption rather than for computers (Jurafsky and Martin 2009). The content must, therefore, undergo a good amount of natural language processing (NLP) before it can be used (Manning et al. 2008; Jurafsky and Martin 2009). Given this fact, it is not surprising that the majority of primary studies (62% of surveyed papers) adopt NLP techniques to support review analysis (see Section 3.2). At a high level, pre-processing can be simply seen as turning review content into a form that is analysable for a specific mining task (see Section 3.2). There are different ways to pre-process reviews including text normalization, cleaning and augmenting (Manning et al. 2008; Jurafsky and Martin 2009; Panichella et al. 2015; Gao et al. 2020). These pre-processing steps typically involve converting texts into lowercase (Fu et al. 2013; Sänger et al. 2016; Hadi and Fard 2020), breaking up a text into individual sentences (Lu and Liang 2017; Jha and Mahmoud 2017a; Zhou et al. 2020), separating out words i.e., tokenization (Iacob et al. 2016; Palomba et al. 2017; Al-Hawari 2020), spelling correction (Palomba et al. 2017; Grano et al. 2018) as well as turning words into their base forms e.g., stemming or lemmatization (Maalej and Nabil 2015; Lu and Liang 2017; Panichella et al. 2015; Xiao 2019). Of course, not all the review content is meaningful (Guzman and Maalej 2014; Chen et al. 2014; Oehri and Guzman 2020). Some parts are noisy and obstruct text analysis (Palomba et al. 2015; Palomba et al. 2017; Gunaratnam and Wickramarachchi 2020). The content is thus cleaned by removing punctuation (Puspaningrum et al. 2018; Hu et al. 2019), filtering out noisy words like stop words (Johann et al. 2017; Ciurumelea et al. 2017; Gunaratnam and Wickramarachchi 2020), or non-English words (Palomba et al. 2015; Stanik et al. 2019). Such normalized and cleaned text tends to be augmented with additional information based on linguistic analysis e.g., part-of-speech tagging (PoS) (Puspaningrum et al. 2018; Zhang et al. 2019; Gunaratnam and Wickramarachchi 2020) or dependency parsing (Gu and Kim 2015; Liu et al. 2018; Song et al. 2020).
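
A minimal pre-processing sketch combining the steps mentioned above (lowercasing, cleaning, tokenization, stop-word removal, and lemmatization) with NLTK; the exact steps and their order vary across the surveyed studies.

import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# One-time downloads of the resources used below.
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

def preprocess(review):
    """Lowercase, clean, tokenize, remove stop words, and lemmatize a review."""
    text = review.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # strip punctuation and digits
    tokens = text.split()                  # simple whitespace tokenization
    stop = set(stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(t) for t in tokens if t not in stop]

print(preprocess("The app CRASHES after the latest update!!!"))
# e.g. ['app', 'crash', 'latest', 'update']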

A review can be modelled as a word sequence (Johann et al. 2017), a bag-of-words (BoW) (Maalej and Nabil 2015), or in a vector space model (VSM) (Vu et al. 2015a) to serve as input for other mining techniques. In particular, primary studies refer to NLP techniques for comparing text similarity (Vu et al. 2015b; Wang et al. 2018), pattern matching (Groen et al. 2017; Johann et al. 2017; Song et al. 2020), and collocation finding (Guzman and Maalej 2014; Li et al. 2018; Dalpiaz and Parente 2019; Xiao et al. 2020).

Text similarity techniques (employed in 21 studies) determine how “close” two textual snippets (e.g., review sentences) are (Manning et al. 2008). These snippets, represented in VSM or BoW, are compared using similarity measures such as cosine similarity (Vu et al. 2015a; Shams et al. 2020), the Dice similarity coefficient (Palomba et al. 2015; Zhou et al. 2020), or the Jaccard index (Iacob et al. 2016). These techniques support Search and Information Retrieval, e.g., to link reviews with issue reports from issue tracking systems (Noei et al. 2019); Recommendation, e.g., to recommend review responses based on old responses posted to similar reviews (Greenheld et al. 2018); Clustering, e.g., to group semantically similar user opinions (Vu et al. 2016; Malgaonkar et al. 2020); and Content Analysis, e.g., to compare review content (Malavolta et al. 2015a).
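
A minimal sketch of cosine similarity over TF-IDF vectors, here used to rank reviews against a feature query; the reviews and the query are invented, and the weighting schemes differ across studies.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reviews = [
    "Login fails every time I use my Google account",
    "Cannot sign in with Google, the button does nothing",
    "Great app, the new dark theme looks amazing",
]
query = "problems with Google account login"

# Fit TF-IDF on reviews plus query, then compare the query vector with each review vector.
matrix = TfidfVectorizer(stop_words="english").fit_transform(reviews + [query])
scores = cosine_similarity(matrix[len(reviews)], matrix[:len(reviews)]).ravel()

for score, review in sorted(zip(scores, reviews), reverse=True):
    print(f"{score:.2f}  {review}")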

Pattern matching techniques (employed in 22 studies) localize parts of review text (or of its linguistic analysis) matching hand-crafted patterns. Such patterns can take many forms, such as regular expressions (Yang and Liang 2015; Groen et al. 2017; Uddin et al. 2020), PoS sequences (Vu et al. 2016; Johann et al. 2017), dependencies between words (Gu and Kim 2015; Peng et al. 2016; Di Sorbo et al. 2017; Srisopha et al. 2020c), or simple keyword matching (Yang and Liang 2015; Maalej et al. 2016; Di Sorbo et al. 2017; Tao et al. 2020). The technique has been adopted for Information Extraction, e.g., to extract requirements from reviews (Yang and Liang 2015; Groen et al. 2017), Classification, e.g., to classify requirements into functional and non-functional (Yang and Liang 2015), and Summarization, e.g., to provide a bug report summary (Groen et al. 2017).
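
A minimal sketch of keyword/regular-expression matching for two user intentions; the patterns are invented for illustration and are far simpler than the linguistic rules used in the surveyed papers.

import re

# Illustrative hand-crafted patterns; real approaches use richer PoS and dependency patterns.
FEATURE_REQUEST = re.compile(r"\b(please add|add support for|would be (great|nice)|wish (it|you) (had|could))\b", re.I)
BUG_REPORT = re.compile(r"\b(crash(es|ed|ing)?|freez(es|ing)|doesn'?t work|stopped working)\b", re.I)

reviews = [
    "Please add a widget for the home screen.",
    "The app crashes on startup since the last update.",
    "Five stars, love it!",
]

for review in reviews:
    if FEATURE_REQUEST.search(review):
        print("feature request:", review)
    elif BUG_REPORT.search(review):
        print("bug report    :", review)
    else:
        print("other         :", review)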

Collocation finding techniques are employed for Information Extraction, e.g., to extract features (Guzman and Maalej 2014; Xiao 2019) or issues (Gao et al. 2018b) from reviews. Collocations are phrases consisting of two or more words that appear side by side in a given context more commonly than their parts appear separately (Jurafsky and Martin 2009). The most common type of collocation detected in the primary studies is the bigram, i.e., two adjacent words (Guzman and Maalej 2014; Dalpiaz and Parente 2019). Raw co-occurrence counts may be insufficient, as phrases such as ’all the’ co-occur frequently but are not meaningful. Hence, primary studies explore several methods to filter out the most meaningful collocations, such as Pointwise Mutual Information (PMI) (Gao et al. 2018b; Malgaonkar et al. 2020) and hypothesis testing (Jurafsky and Martin 2009; Guzman and Maalej 2014; Dąbrowski et al. 2020).
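
A minimal collocation-finding sketch with NLTK’s bigram finder and the PMI measure, run over an invented stream of pre-processed review tokens.

from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

# Tokens from many (already pre-processed) reviews, concatenated into one stream.
tokens = ("battery drain after update dark mode please add dark mode "
          "battery drain since last update love dark mode battery drain").split()

finder = BigramCollocationFinder.from_words(tokens)
finder.apply_freq_filter(2)  # keep only bigrams occurring at least twice

# Rank the remaining candidate bigrams by Pointwise Mutual Information (PMI).
for bigram, score in finder.score_ngrams(BigramAssocMeasures().pmi):
    print(bigram, round(score, 2))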

3.3.3 Machine Learning

Overall, 108 of the 182 primary studies (59%) reported the use of machine learning (ML) techniques to facilitate mining tasks and review analysis. Table 12 reports the ten most commonly applied ML techniques; 8 of them are supervised, whereas 2 are unsupervised (Bishop 2006). The widespread interest in ML techniques may be attributed to the fact that Clustering, e.g., to group reviews discussing the same topics (Fu et al. 2013; Srisopha et al. 2020b), and Classification, e.g., to categorize user feedback based on user intention (Dhinakaran et al. 2018; Zhou et al. 2020), which are among the most common review analysis types (see Table 7), are mainly facilitated using ML. Looking at the whole spectrum of review analyses these ML techniques support, we also recorded their use for Sentiment Analysis, e.g., to identify feature-specific sentiment (Gu and Kim 2015), Recommendation, e.g., to assign priorities to reviews reporting bugs (Villarroel et al. 2016), and Information Extraction, e.g., to identify features (Sänger et al. 2017; Wang et al. 2020b).

Table 12 Distribution of machine learning techniques used in primary studies in the period from 2010 to December 31, 2020

Scholars experimented with many textual and non-textual review properties to make ML techniques work best (Maalej and Nabil 2015; Guzman et al. 2015). Choosing informative and independent properties is a crucial step in making these techniques effective (Bishop 2006; Maalej et al. 2016). Textual properties concern, for example, text length, the tense of the text (Kurtanović and Maalej 2017; Kurtanovic and Maalej 2018), the importance of words, e.g., tf-idf (Lu and Liang 2017; Williams et al. 2020), word sequences, e.g., n-grams (Maalej and Nabil 2015; Al-Hawari 2020), as well as linguistic analysis, e.g., dependency relationships (Shah et al. 2018). These properties are commonly combined with non-textual properties like user sentiment (Maalej et al. 2016; Srisopha et al. 2020a), review rating (Kurtanović and Maalej 2017), or app category (Gao et al. 2019). We found that primary studies experiment with different combinations of these properties (Maalej et al. 2016; Kurtanovic and Maalej 2018; Al-Hawari 2020).
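
A minimal supervised-classification sketch combining TF-IDF textual features with a Naive Bayes classifier (one of the techniques in Table 12); the training reviews and labels are invented, and real studies train on thousands of manually annotated reviews.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hand-labelled training set (illustrative only).
train_reviews = [
    "App crashes when I upload a photo",
    "It freezes on the payment screen",
    "Please add an export to PDF option",
    "Would be nice to have a dark mode",
]
train_labels = ["bug report", "bug report", "feature request", "feature request"]

# TF-IDF features feed a Naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
model.fit(train_reviews, train_labels)

print(model.predict(["The app crashed after the update",
                     "Please add fingerprint login"]))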

3.3.4 Statistical Analysis

Statistical analysis is used in many papers to report research results (Martin et al. 2015; Sänger et al. 2016; Di Sorbo et al. 2020), demonstrate their significance (Vasa et al. 2012; Khalid et al. 2016), and draw conclusions about a large population of reviews by analysing a small portion of them (Pagano and Maalej 2013; Mercado et al. 2016; Wang et al. 2020a). We observed an interest in the use of descriptive and inferential techniques for Content Analysis, e.g., Vasa et al. (2012), Pagano and Maalej (2013), Mercado et al. (2016), Guzman et al. (2018), and Wang et al. (2020a). Summary statistics, box plots, and cumulative distribution charts help to gain an understanding of review characteristics such as vocabulary size (Hoon et al. 2012; Vasa et al. 2012), issue type distribution (McIlroy et al. 2016; Hu et al. 2018; Williams et al. 2020), or the topics reviews convey (Pagano and Maalej 2013; Srisopha and Alfayez 2018). Scholars employ different statistical tests to check their hypotheses (Khalid et al. 2016; Guzman and Paredes Rojas 2019; Franzmann et al. 2020), to examine relationships between review characteristics (Srisopha and Alfayez 2018; Guzman and Paredes Rojas 2019; Di Sorbo et al. 2020), and to study how sampling bias affects the validity of research results (Martin et al. 2015).

Guzman et al. (2018) and Guzman and Paredes Rojas (2019), for example, conducted an exploratory study investigating 919 reviews from eight countries. They studied how reviews written by male and female users differ in terms of content, sentiment, rating, timing, and length. The authors employed the non-parametric Chi-square (e.g., for content) and Mann-Whitney (e.g., for rating) tests for nominal and ordinal variables, respectively (Guzman and Paredes Rojas 2019). Srisopha and Alfayez (2018) studied whether a relationship exists between user satisfaction and an application’s internal quality characteristics. Employing the Pearson correlation coefficient, the authors studied to what extent warnings reported by static code analysis tools correlate with different types of user feedback and with average user ratings. Similarly, another study employed the Mann-Whitney test to examine whether the densities of such warnings differ between apps with high and low ratings (Khalid et al. 2016).
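
A minimal sketch of such a non-parametric comparison with SciPy, using invented warning densities for two groups of apps (in the spirit of, but not reproducing, the comparison by Khalid et al. (2016)).

from scipy.stats import mannwhitneyu

# Hypothetical static-analysis warning densities (warnings per KLOC) for two groups of apps.
high_rated_apps = [1.2, 0.8, 1.5, 0.9, 1.1, 1.3]
low_rated_apps = [2.1, 1.9, 2.6, 1.7, 2.4, 2.0]

# Two-sided Mann-Whitney U test of whether the two distributions differ.
statistic, p_value = mannwhitneyu(high_rated_apps, low_rated_apps, alternative="two-sided")
print(f"U = {statistic}, p = {p_value:.4f}")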


3.4 RQ3: Supporting Software Engineering

To answer RQ3 (what software engineering activities might be supported by analysing app reviews), we used the data extracted in F8 (software engineering activity) and F9 (justification) as well as the classification schema of SE activities derived from the software engineering body of knowledge (see Section 2.4). Table 13 provides a mapping between primary studies and the SE activities the studies claim to support; it also reports the number and percentage of studies per activity. We can observe that primary studies motivated their approaches to support activities across different software engineering phases, including requirements (36%), maintenance (36%), testing (15%), and design (4%). In total, 14 SE activities are supported; research effort is mostly focused on requirements elicitation (26%), requirements prioritization (10%), validation by users (11%), problem and modification analysis (23%), and requested modification prioritization (11%). We also recorded that 62 studies (34%) did not specify any SE activity that their approaches support.

Table 13 Software engineering activities supported by app review analysis

To support the SE activities, primary studies used the 9 broad types of app review analysis identified in answer to RQ1 (see Section 3.2). Table 14 shows how often a type of review analysis was used for an SE activity. It can be observed that each SE activity was supported using multiple analyses; classification was the most commonly used one, and the only analysis motivated for all the activities. A further analysis revealed that studies used these analyses in combination to mine useful information and support SE activities; we recorded 53 unique combinations, each composed of 1 to 5 types of analysis, with a median of 2. Table 15 lists the combinations used in at least 2 primary studies. The following sections provide a thorough synthesis of how mining useful information from app reviews might support SE activities.

Table 14 How often a type of app review analysis is used to realise an SE activity
Table 15 How often certain combination of app review analyses are used to realise a SE activity; IE refers to Information Extraction; CL denotes Classification; CU signifies Clustering; CA presents Content Analysis; SA denotes Sentiment Analysis; SIR refers to Search and Information Retrieval; RE presents Recommendation; SU denotes Summarization; and VI signifies Visualization

3.4.1 Requirements

Requirements engineering includes involving system users, obtaining their feedback, and agreeing on the purpose of the software to be built (Maalej et al. 2016). It is therefore not surprising that review analysis has received much attention as a means to support requirements engineering activities, including requirements elicitation, requirements classification, requirements prioritization, and requirements specification (see Table 13).

Requirements Elicitation

In app reviews, users give feedback describing their experience with apps, expressing their satisfaction with software products, and raising needs for improvements (Pagano and Maalej 2013; AlSubaihin et al. 2019). Software engineers can make use of these reviews to elicit new requirements (AlSubaihin et al. 2019; Dalpiaz and Parente 2019; Dąbrowski et al. 2019; 2020). For instance, they can employ opinion mining approaches to examine reviews talking negatively about app features (Guzman and Maalej 2014; Shah et al. 2016; Li et al. 2018; Shah et al. 2019c; Liu et al. 2019; Dalpiaz and Parente 2019; Dąbrowski et al. 2019; 2020). This can help developers understand user concerns about problematic features, and potentially help elicit new requirements (Johann et al. 2017; Dalpiaz and Parente 2019; Dąbrowski et al. 2019; 2020). Additionally, searching and retrieving user reviews that refer to a specific feature allows the engineers responsible for that feature to quickly identify what users have been saying about it (Li et al. 2018; Dąbrowski et al. 2019; Liu et al. 2019). In line with this direction, approaches have been proposed to classify reviews by user intention (e.g., a reviewer requesting a new feature) (Iacob et al. 2013a; Maalej and Nabil 2015; Maalej et al. 2016; Villarroel et al. 2016; Scalabrino et al. 2019; Song et al. 2020) and by the type of requirements these reviews formulate (e.g., functional or non-functional) (Yang and Liang 2015; Lu and Liang 2017; Al Kilani et al. 2019; Jha and Mahmoud 2019; Wen and Chen 2020). Such aggregated information can be further summarized and visualized for developers as a report of all the feature requests reported for an app (Iacob et al. 2013a; Iacob et al. 2016; Di Sorbo et al. 2016; Di Sorbo et al. 2017; Ciurumelea et al. 2018; Liu et al. 2020).

Requirements Classification

User feedback can be classified along a number of dimensions (Bourque et al. 1999). Several studies classified user comments based on the types of requirements the feedback conveys (Yang and Liang 2015; Deocadez et al. 2017a; Lu and Liang 2017; Groen et al. 2017; Wang et al. 2018; Wang et al. 2018; Jha and Mahmoud 2019; Wen and Chen 2020). These works typically classified the feedback into two broad categories: functional requirements (FRs) specifying the behavior of an app, and non-functional requirements (NFRs) describing the constraints and quality characteristics of the app. Classification at a finer level of granularity has also been demonstrated (Lu and Liang 2017; Wang et al. 2018; Jha and Mahmoud 2019; Wen and Chen 2020; van Vliet et al. 2020): user feedback can be classified into the concrete quality characteristics it refers to, e.g., those defined by the ISO 25010 model (ISO/IEC 25010 2011), so that software engineers can analyse candidate requirements more efficiently.

Requirements Prioritization

Statistics about user opinions and requests can help prioritize software maintenance and evolution tasks (Pagano and Maalej 2013; Guzman and Maalej 2014; Maalej et al. 2016; Johann et al. 2017; Dąbrowski et al. 2019; 2020). Bugs and missing features that are more commonly reported can be prioritized over those less commonly reported (Villarroel et al. 2016; Kurtanović and Maalej 2017; Kurtanovic and Maalej 2018; Scalabrino et al. 2019; Di Sorbo et al. 2020). Users’ requests may not by themselves be sufficient for prioritization (one must also consider costs and the needs of other stakeholders) but can provide valuable evidence-based information to support prioritization (Maalej et al. 2016; Shah et al. 2019c; Oehri and Guzman 2020).

Requirements Specification

Requirements specification consists in structuring and documenting detailed descriptions of the software’s required behaviour and quality properties (van Lamsweerde 2009). App reviews can serve for generating lightweight, partial documentation of user requirements; they convey information about functional and non-functional requirements, usage scenarios and user experience (Pagano and Maalej 2013; Maalej et al. 2016; Maalej et al. 2016; Kurtanović and Maalej 2017; Kurtanovic and Maalej 2018; Williams et al. 2020). Software engineers can benefit from review mining approaches that distil this information into first drafts of software requirements specifications (SRS) or user stories (Pagano and Maalej 2013; Maalej et al. 2016; Maalej et al. 2016). These approaches can, for example, classify reviews by the type of requests users make (e.g., asking for new functions), summarise reviews referring to the same requests, and generate a provisional SRS based on this information. Such an SRS may list new functions that users require; recap scenarios in which these functions are used; and report statistics indicating the relative importance of the requirements, e.g., the number of users requesting the functions (Maalej et al. 2016). Since users often justify their needs and opinions, the SRS may also document user rationale, serving later for requirements negotiation or design decisions (Kurtanović and Maalej 2017; Kurtanovic and Maalej 2018).

3.4.2 Design

A few studies motivated app review analysis to assist software design activities: user interface (UI) design (Alqahtani and Orji 2019; Sharma and Bashir 2020; Franzmann et al. 2020) and capturing design rationale (Groen et al. 2017; Kurtanović and Maalej 2017; Kurtanovic and Maalej 2018; Jha and Mahmoud 2019; Kunaefi and Aritsugi 2020).

User Interface Design

The success of mobile applications depends substantially on user experience (AlSubaihin et al. 2019; Franzmann et al. 2020). For an app to be successful, software engineers should design the interface to match the experience, skills and needs of users (Bourque et al. 1999). Alqahtani and Orji performed a content analysis of user reviews to identify usability issues in mental health apps (Alqahtani and Orji 2019). They manually tagged 1,236 reviews with different types of usability issues for 106 apps from Apple’s App Store and Google Play. Poor design of the user interface was the second most frequently reported issue. User-submitted content concerning the interface has been found to provide valuable design recommendations on how to improve interface layout, boost readability and ease app navigation. UI/UX designers should therefore take advantage of this feedback; addressing it would likely increase user engagement with the apps and reduce the attrition rate (Franzmann et al. 2020).

Design Rationale Capture

Design rationale is essential for making the right design decisions and for evaluating architectural alternatives for a software system (Nuseibeh 2001; Burge et al. 2008). A few studies motivated their approaches by the need to capture potential reasons for design decisions (Groen et al. 2017; Kurtanović and Maalej 2017; Kurtanovic and Maalej 2018; Jha and Mahmoud 2019; Kunaefi and Aritsugi 2020). Kurtanović and Maalej devised a grounded theory for gathering user rationale and evaluated different review classification approaches to mine this information from app reviews (Kurtanović and Maalej 2017; Kurtanovic and Maalej 2018). User justifications, e.g., about problems they encounter or the criteria they choose for app assessment (e.g., reliability or performance), can enrich documentation with new design rationale and guide design decisions. Similarly, user-reported NFRs can convey architecturally significant requirements and serve as the rationale behind an architecture decision (Nuseibeh 2001; Groen et al. 2017; Kunaefi and Aritsugi 2020). To capture such requirements, app reviews can be classified by the quality characteristics users discuss (Nuseibeh 2001; Groen et al. 2017).

3.4.3 Testing

App review analysis can be used to support various testing activities: validation by users (Iacob et al. 2013a; Iacob et al. 2013b; Iacob and Harrison 2013; Guzman et al. 2014; Guzman and Maalej 2014; Maalej and Nabil 2015; Gu and Kim 2015; Maalej et al. 2016; Bakiu and Guzman 2017; Ciurumelea et al. 2018; Durelli et al. 2018; Liu et al. 2018; Shah et al. 2019c; Gao et al. 2019; AlSubaihin et al. 2019; Dąbrowski et al. 2020; Xiao et al. 2020), test documentation (Iacob et al. 2016; Grano et al. 2018; Pelloni et al. 2018), test design (Man et al. 2016; Maalej et al. 2016; Groen et al. 2017; Shams et al. 2020) and test prioritization (Khalid et al. 2014).

Validation by Users

Evaluating a software system with users usually involves expensive usability testing in a laboratory (Iacob et al. 2013a) or acceptance testing performed in a formal manner (IEEE 1990). In the case of mobile apps, software engineers can exploit user feedback to assess user satisfaction (Fu et al. 2013; Iacob et al. 2013a; Iacob et al. 2013b; Gu and Kim 2015; Bakiu and Guzman 2017; Ciurumelea et al. 2018; Shah et al. 2019c; Xiao 2019; Dąbrowski et al. 2020) and to identify any glitches with their products (Iacob et al. 2013a; Maalej and Nabil 2015; Gu and Kim 2015; Maalej et al. 2016; Ciurumelea et al. 2018; AlSubaihin et al. 2019; Gao et al. 2019). A recent survey with practitioners has shown that developers release the alpha/beta version of their apps to test the general reaction of users and to discover bugs (AlSubaihin et al. 2019).

In line with this direction, several approaches have been proposed to mine user opinions (Guzman and Maalej 2014; Guzman et al. 2014; Gu and Kim 2015; Bakiu and Guzman 2017; Shah et al. 2019c; Dąbrowski et al. 2020; Xiao et al. 2020) and to generate bug reports (Iacob et al. 2013a; Maalej and Nabil 2015; Maalej et al. 2016; Man et al. 2016; Ciurumelea et al. 2018; Liu et al. 2018; Shah et al. 2019c). Opinion mining approaches help to discover the most problematic features and to quantify the number of negative opinions. Knowing what features users praise or hate can give a developer a hint about user acceptance of these features (Bakiu and Guzman 2017; AlSubaihin et al. 2019; Dąbrowski et al. 2020). When core features have been modified, the team may want to know how users react to them so that they can fix any issues quickly and refine these features. Analogously, identifying and quantifying reported bugs within a given time frame can help a development team during beta testing before an official release (Iacob et al. 2013a; Iacob et al. 2013b; Ciurumelea et al. 2018; Gao et al. 2019; Shah et al. 2019c). If the number of reported issues is unusually high, development teams can reschedule the release of a new version in order to refocus on quality management and testing (Maalej and Nabil 2015; Maalej et al. 2016).

Test Documentation

Test documentation can be partly supported by analysing app reviews (Iacob et al. 2016; Pelloni et al. 2018; Grano et al. 2018). Iacob et al. developed a tool that produces a summary of the bugs reported in reviews, with a breakdown by app version and by the features these bugs refer to (Iacob et al. 2016). Such a summary can form the basis for later debugging the app and fixing the problems. User comments can also be integrated into mobile app testing tools (Pelloni et al. 2018; Grano et al. 2018). These tools originally generate a report of stack traces leading to an app crash (Pelloni et al. 2018; Grano et al. 2018), but understanding the root cause of a problem from this information alone can often be unintuitive. In such cases, user comments can serve as a human-readable companion for the report; linked to a related stack trace, a user-written description of the problem can guide testers to where the fault emerged (Pelloni et al. 2018; Grano et al. 2018).

Test Design

Analysing app reviews can support test case design (Man et al. 2016; Maalej et al. 2016; Groen et al. 2017; Shams et al. 2020). Analysing reported issues can help testers determine the app behavior, features, and functionality to be tested (Man et al. 2016). Reviews may describe a particular use of the software in which users encountered an unusual situation (e.g., the app crashing without informing users of what happened) or report a lack of support for finding a workaround (Maalej et al. 2016). Such information may help testers to design test cases capturing exceptions leading to a problem, or to exercise alternative scenarios other than those initially considered (Maalej et al. 2016; Groen et al. 2017; Shams et al. 2020). Additionally, identifying negative comments on quality characteristics can help in specifying acceptance criteria an app should satisfy (Groen et al. 2017). For example, user complaints about performance efficiency can indicate performance criteria for functions that are expected to finish faster or more smoothly (Groen et al. 2017).

Test Prioritization

Reviews and their ratings have been found to correlate with download rank, a key measure of an app’s success (Khalid et al. 2015; Martin et al. 2017). User complaints about specific issues can have a negative impact on the rating and, in turn, discourage users from downloading the app (Khalid et al. 2015). It has therefore been suggested to prioritize issue-related test cases based on the frequency and impact of these complaints (Khalid et al. 2015; Man et al. 2016). To address device-specific problems, a development team must test their apps on a large number of devices, which is inefficient and costly (Erfani et al. 2013). The problem can be partially ameliorated by selecting the devices reported in reviews that have the greatest impact on app ratings (Khalid et al. 2014). This strategy can be particularly useful for teams with limited resources that can only afford to buy a few devices: it helps them determine the optimal set of devices to buy and test their app on (Khalid et al. 2014).

3.4.4 Maintenance

In an attempt to support software maintenance, review analysis has been proposed for problem and modification analysis, requested modification prioritization, help desk, and impact analysis (see Table 13).

Problem and Modification Analysis

Software engineers strive continuously to satisfy user needs and keep their app product competitive in the market (AlSubaihin et al. 2019). To this end, they can exploit approaches facilitating problem and modification analysis (Fu et al. 2013; Khalid 2013; Cen et al. 2014; Guzman et al. 2014; Gao et al. 2015; Gomez et al. 2015; Panichella et al. 2015; Gao et al. 2015; Palomba et al. 2015; Guzman et al. 2015; Khalid et al. 2015; Khalid et al. 2015b; Malik and Shakshuki 2016; Vu et al. 2016; Di Sorbo et al. 2016; Iacob et al. 2016; Wei et al. 2017; Licorish et al. 2017; Johann et al. 2017; Bakiu and Guzman 2017; Deocadez et al. 2017b; Wang et al. 2017; Palomba et al. 2017; Malik et al. 2018; Gao et al. 2018b; Muñoz et al. 2018; Palomba et al. 2018; Pelloni et al. 2018; Tong et al. 2018; Dalpiaz and Parente 2019; Phetrungnapha and Senivongse 2019; Gao et al. 2019; Shah et al. 2019c; AlSubaihin et al. 2019; Li et al. 2020; Hadi and Fard 2020; Zhou et al. 2020). The approaches detect user requests in app store feedback and classify them as problem reports and modification requests (Zhou et al. 2020). Fine-grained classification can be carried out too, for example, to detect specific issues like privacy (Khalid 2013; Cen et al. 2014; Tao et al. 2020) or concrete change requests like feature enhancement (Palomba et al. 2017; Al-Hawari 2020). Mining such information allows software engineers to determine and analyze user demands in a timely and efficient fashion (Gao et al. 2015; Wang et al. 2017; Gao et al. 2018b; Gao et al. 2019; Guo and Singh 2020). By analysing the dynamics of reported problems over time, software engineers can immediately spot when a "hot issue" emerges and link it to a possibly flawed release (Fu et al. 2013; Guzman et al. 2014; Gao et al. 2015; Shah et al. 2019c). Moreover, they can generate a summary of user demands to obtain interim documentation serving as a change request or problem report (Iacob et al. 2016; Di Sorbo et al. 2016; Phetrungnapha and Senivongse 2019).

Requested Modification Prioritization

App developers may receive hundreds or even thousands of reviews requesting modifications and reporting problems (Khalid 2013; Villarroel et al. 2016; Noei et al. 2019). It is therefore not a trivial task for developers to select those requests which should be addressed in the next release (Villarroel et al. 2016). As with requirements, developers can investigate statistics concerning these requests (e.g., how many people requested specific modifications), estimate their impact on perceived app quality (e.g., expressed as user rating) or analyze how these requests change over time (Gu and Kim 2015; Gao et al. 2015; Khalid et al. 2015; Man et al. 2016; Keertipati et al. 2016; Villarroel et al. 2016; Iacob et al. 2016; Licorish et al. 2017; Wei et al. 2017; Muñoz et al. 2018; Scalabrino et al. 2019; Dąbrowski et al. 2019; Hu et al. 2019; Noei et al. 2019; Noei et al. 2019; Oehri and Guzman 2020). If developers have to decide which change to address first, they could select the one with the largest share of requests, or the one whose feedback drives down the app rating the most (Gu and Kim 2015; Dąbrowski et al. 2019; Di Sorbo et al. 2020). Similarly, a sharp growth in feedback reporting a specific problem (e.g., security or privacy) may suggest that the issue is harmful to users and should be resolved quickly.
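
As a rough illustration of such evidence-based prioritization, the sketch below ranks hypothetical request clusters by how often they are reported and how low their associated ratings are; the data and the simple scoring rule are invented for illustration and are not taken from any cited study.

```python
# Illustrative sketch: rank requested modifications by report frequency and
# by how strongly the associated reviews drag the rating down.
from collections import defaultdict

# (request cluster, star rating) pairs, e.g., produced by a review classifier
reviews = [
    ("dark mode", 3), ("dark mode", 4), ("dark mode", 2),
    ("crash on startup", 1), ("crash on startup", 1),
    ("sync with calendar", 5),
]

stats = defaultdict(lambda: {"count": 0, "rating_sum": 0})
for request, rating in reviews:
    stats[request]["count"] += 1
    stats[request]["rating_sum"] += rating

def priority(item):
    _, s = item
    avg_rating = s["rating_sum"] / s["count"]
    # more reports and a lower average rating => higher priority
    return s["count"] * (5 - avg_rating)

for name, s in sorted(stats.items(), key=priority, reverse=True):
    print(f"{name}: {s['count']} reviews, avg rating {s['rating_sum'] / s['count']:.1f}")
```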

Help Desk

A help desk typically provides end-users with answers to their questions, resolves their problems or assists in troubleshooting (Bourque et al. 1999). Analogously, app developers can respond to specific user reviews to answer users’ questions, to inform them that problems have been fixed or to thank users for their kind remarks about apps (McIlroy et al. 2015; Hassan et al. 2018; Srisopha et al. 2020a; Srisopha et al. 2020c). Though the task is not traditionally included in the typical responsibilities of software engineers, user support and managing the product’s reputation on the app store are essential to an app’s success; they should be viewed as important activities in the software lifecycle. In fact, responding to reviews motivates app users to revise their feedback and ratings to be more positive (McIlroy et al. 2015). Some users even update their feedback to inform developers that the response solved their problem or to thank developers for their help (McIlroy et al. 2015; Hassan et al. 2018). Since responding to a large number of reviews can be time-consuming, developers can make use of approaches highlighting reviews that are more likely to require a response (Srisopha et al. 2020a; Srisopha et al. 2020c) and generating automatic replies to these reviews (Greenheld et al. 2018; Hassan et al. 2018; Vu et al. 2019; Gao et al. 2019).

Impact Analysis

Review mining approaches help developers to discover modification requests posted in reviews; to identify the app source code affected by these modifications (Zhou et al. 2020); and to estimate how implementing the modifications may impact users’ satisfaction (Palomba et al. 2015; Ciurumelea et al. 2017; Palomba et al. 2017; Palomba et al. 2018). The approaches typically cluster feedback requesting the same modifications (Ciurumelea et al. 2017; Palomba et al. 2017; Zhou et al. 2020), then search and retrieve links between review clusters and the corresponding source code artefacts referring to the modifications (Palomba et al. 2015; Ciurumelea et al. 2017; Palomba et al. 2017; Palomba et al. 2018; Zhou et al. 2020). Such information can be useful to engineers both before and after a new release is issued. Software engineers can track which requests have (not) been implemented; monitor the proportion of reviews linked to software changes; and estimate the number of users affected by these changes. After the release has been issued, software engineers can also use the approaches to observe gains or losses in average rating with respect to the implemented changes.
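
A minimal sketch of the retrieval step, assuming review clusters and source files are compared lexically with TF-IDF and cosine similarity; the file names and contents are invented, and real approaches use richer code representations than the raw text shown here.

```python
# Sketch of IR-based impact analysis: link a cluster of change requests to
# the source files most lexically similar to it.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

review_cluster = "app crashes when exporting a note as pdf, export button does nothing"
source_files = {
    "PdfExporter.java": "class PdfExporter export note pdf render page write file",
    "LoginActivity.java": "class LoginActivity authenticate user password token",
    "NoteEditor.java": "class NoteEditor edit note save text format",
}

vec = TfidfVectorizer()
tfidf = vec.fit_transform([review_cluster] + list(source_files.values()))

# Similarity between the review cluster (row 0) and each source file
sims = cosine_similarity(tfidf[0:1], tfidf[1:]).ravel()
for (name, _), score in sorted(zip(source_files.items(), sims),
                               key=lambda p: p[1], reverse=True):
    print(f"{name}: similarity {score:.2f}")
```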


3.5 RQ4: Empirical Evaluation

To answer RQ4 (how are app review analysis approaches empirically evaluated), we used data items: F10 (evaluation objective), F11 (evaluation procedure), F12 (metrics and criteria), F14 (annotated datasets), F15 (annotation task), F16 (number of annotators), F17 (quality measure) and F18 (replication package). We found that 109 primary studies performed empirical evaluation of review mining approaches; 105 studies included evaluation of effectiveness and 23 of user-perceived quality.

3.5.1 Effectiveness Evaluation

A common procedure for effectiveness assessment consists of four steps: (i) formulate an evaluation objective, (ii) create an annotated dataset, (iii) apply the approach on the annotated dataset, and (iv) quantify the effectiveness. The evaluation objective refers to assessing the degree to which an approach can correctly perform a specific mining task or analysis (see Section 3.2). Human judgement is usually required to create the annotated dataset.

Primary studies involved humans performing the task manually on a sample of reviews and annotating the sample with correct solutions. Such an annotated dataset (called the “ground truth”) served as a baseline for evaluating the approach and quantifying the outcome.

Most studies provided a detailed description of how each step of their evaluation method was performed. Hence, we could record additional information:

Availability of Dataset and Tool

Most studies have released neither their annotated datasets nor the tools they evaluated. Table 16 provides an overview of the 23 annotated datasets that are publicly available, reporting the reference to the paper, a short description of the dataset and its size in terms of number of reviews, whereas Table 17 presents 16 available tools, providing the reference to the paper and a short description of the characteristics of the tool.

Table 16 Publicly-available datasets of annotated reviews
Table 17 Publicly-available app review mining tools

Evaluation Objective

Scholars evaluated the effectiveness of their app review mining approaches in performing: Classification, Clustering, Sentiment Analysis, Information Extraction, Searching and Information Retrieval, Recommendation and Summarization.

Annotation Procedure

The number of annotators labeling the same review sample (or a fragment of it) ranged from 1 to 5, with a median of 2 human annotators. Only 26 primary studies (25%) reported how the quality of their annotated datasets was measured. The three most common metrics for inter-rater agreement evaluation were Cohen’s Kappa (Pustejovsky and Stubbs 2012), Percentage Agreement (Hallgren 2012) and the Jaccard index (Manning et al. 2008). Percentage Agreement and Cohen’s Kappa were used to measure the quality of human annotation for Classification, Sentiment Analysis, or Feature Extraction; the Jaccard index was used for assessing human agreement for the task of Searching and Information Retrieval; whereas Fleiss’ Kappa was used to assess the quality of manual Clustering. No study reported how agreement was measured when annotators performed the Recommendation or Summarization task.
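
For illustration, the snippet below computes the two most common agreement measures for a pair of hypothetical annotators; the labels are invented.

```python
# Sketch: two common inter-rater agreement measures for review annotation.
from sklearn.metrics import cohen_kappa_score

annotator_1 = ["bug", "feature", "bug", "praise", "bug", "feature"]
annotator_2 = ["bug", "feature", "other", "praise", "bug", "bug"]

# Percentage agreement: share of reviews labelled identically
agreement = sum(a == b for a, b in zip(annotator_1, annotator_2)) / len(annotator_1)
# Cohen's kappa corrects the raw agreement for chance agreement
kappa = cohen_kappa_score(annotator_1, annotator_2)

print(f"percentage agreement: {agreement:.2f}")
print(f"Cohen's kappa:        {kappa:.2f}")
```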

Characteristics of Dataset

Most annotated datasets were created using reviews coming from Google Play and the Apple App Store (84% in total); the remaining datasets were created using reviews from the Amazon Appstore, BlackBerry App World, Huawei Store, Windows Phone Store and 360 Mobile Assistant. On average, an annotated dataset was prepared using 2,800 reviews collected from a single app store; the reviews were collected for 19 apps from 6 app categories. Table 18 provides a five-number summary detailing descriptive statistics about the datasets.

Table 18 Five-number summary providing descriptive statistics of annotated datasets that primary studies used to evaluate app review mining approaches
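
For readers unfamiliar with the statistic, the following sketch computes a five-number summary (minimum, lower quartile, median, upper quartile, maximum) over hypothetical dataset sizes; the numbers are illustrative only.

```python
# Sketch: five-number summary of annotated-dataset sizes (made-up values).
import numpy as np

dataset_sizes = np.array([300, 950, 1400, 2000, 2800, 4400, 7500, 12000])
q = np.percentile(dataset_sizes, [0, 25, 50, 75, 100])
print(dict(zip(["min", "Q1", "median", "Q3", "max"], q)))
```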

Effectiveness Quantification

The three most common metrics used for assessing the effectiveness of app review mining approaches are precision, recall, and F1-measure (Manning et al. 2008). These metrics were employed for evaluating Classification, Clustering, Information Extraction, Searching and Information Retrieval, Sentiment Analysis, Recommendation and Summarization.
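
A minimal sketch of this quantification step, assuming a multi-class review classification task and an invented, hand-labelled ground truth:

```python
# Sketch: quantify effectiveness against a human-annotated ground truth
# using the metrics most common in the surveyed studies. Labels are invented.
from sklearn.metrics import precision_score, recall_score, f1_score

ground_truth = ["bug", "bug", "feature", "other", "bug", "feature"]   # annotators
predicted    = ["bug", "other", "feature", "other", "bug", "bug"]     # mining approach

for name, metric in [("precision", precision_score),
                     ("recall", recall_score),
                     ("F1", f1_score)]:
    score = metric(ground_truth, predicted, average="macro", zero_division=0)
    print(name, round(score, 2))
```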

A few studies deviate from the common procedure outlined above. The studies evaluated their review mining approaches without annotated datasets:

  • Eight studies asked annotators to assess the quality of output produced by their approaches, instead of creating an annotated dataset before applying the mining approach. This was practiced for evaluating Classification (Li et al. 2017), Clustering (Guzman and Maalej 2014; Vu et al. 2015a; Palomba et al. 2017), Information Extraction (Johann et al. 2017; Li et al. 2017), Searching and Information Retrieval (Wei et al. 2017), and Recommendation (Shams et al. 2020).

  • Seven studies used other software artefacts as an evaluation baseline rather than creating an annotated dataset (Gao et al. 2015; Man et al. 2016; Gao et al. 2018b; Uddin et al. 2020; Srisopha et al. 2020a; Srisopha et al. 2020c; Xiao et al. 2020). To evaluate Recommendation (e.g., determining priorities for reported issues), the studies compared the recommended priorities with priorities for the same issues reported in user forums or changelogs; to assess the quality of Clustering, the studies benchmarked the output of their approaches against topics from app changelogs; whereas to evaluate their approaches in recommending reviews that need a response, the studies used information about reviews that developers had already responded to in app stores.

3.5.2 User Study

Twenty-three studies evaluated their review mining approaches through user studies (Guzman et al. 2014; Chen et al. 2014; Gu and Kim 2015; Guzman et al. 2015; Villarroel et al. 2016; Maalej et al. 2016; Panichella et al. 2016; Di Sorbo et al. 2016; Di Sorbo et al. 2017; Ciurumelea et al. 2017; Palomba et al. 2017; Ciurumelea et al. 2018; Greenheld et al. 2018; Liu et al. 2018; Gao et al. 2018b; Dalpiaz and Parente 2019; Scalabrino et al. 2019; Liu et al. 2019; Zhou et al. 2020; Gao et al. 2020; Tao et al. 2020; Shams et al. 2020; Liu et al. 2020). The objective of these evaluations was to qualitatively assess how the approach and/or the analysis it facilitates are perceived by intended users (e.g., software engineers). Such an evaluation procedure typically consists of the following steps: (i) define an evaluation subject and assessment criteria, (ii) recruit participants, (iii) instruct participants to perform a task with an approach or a produced analysis, (iv) elicit participants’ opinions of the approach through questionnaires and/or interviews.

We looked in detail at how studies performed each of these steps. The extracted data yield the following insights:

Evaluation Subjects

User studies evaluated the following types of app review analyses: Clustering, Classification, Sentiment Analysis, Information Extraction, Search and Information Retrieval, Recommendation, Summarization, and Visualization.

Assessment Criteria

Five evaluation criteria were typically taken into account: 1) Usefulness denoting the quality of being applicable or having practical worth; 2) Accuracy indicating the ability of being correct; 3) Usability signifying the quality of being easy to use; 4) Efficiency indicating the capability of producing desired results with little or no human effort; and 5) Informativeness denoting the condition of being informative and instructive. Table 19 provides reference mapping of user studies with a breakdown of evaluation criteria and evaluated subjects.

Table 19 Reference mapping of user studies with breakdown of evaluation criterion and app review analysis

Study Participants

The number of participants involved in the studies ranges from 1 to 85, with a median of 9 participants. The participants included professionals, scientists and students; Table 20 details the types of participants taking part in user studies and provides references to the corresponding studies.

Table 20 Reference mapping of user studies with breakdown of the types of participants taking part in the studies

Evaluation Procedure

The participants were instructed either to perform a specific task with or without the use of the mining approach being evaluated, to review the outputs produced by the approach, or to simply trial the proposed approach without being given any specific tasks.


3.6 RQ5: Empirical Results

We answered RQ5 (how well do existing app review analysis approaches support software engineers) based on data item F13 (evaluation result). The data come from 87 studies reporting results of their empirical evaluations: effectiveness evaluations (83 studies) and user studies (18 studies). We synthesize results of these studies in the subsequent subsections.

3.6.1 Effectiveness Evaluation Results

The methodology that primary studies employed for effectiveness evaluation was too diverse to undertake a meta-analysis or other statistical synthesis methods (Higgins et al. 2019); these studies differed, for example, in their treatment (e.g., review mining approach), population (e.g., review dataset) or study design (e.g., annotation procedure). We thus employed the ‘summarizing effect estimates’ method (Higgins et al. 2019); Table 21 reports the magnitude and range of effectiveness results that primary studies reported for different review analyses with a breakdown by mined information type.

Table 21 Summary of the results of controlled experiments measuring the effectiveness of app review analysis techniques

Information Extraction

The effectiveness of extracting information from reviews depends on the type of mined information. Techniques for extracting features from reviews have the lowest performance, with a median precision of 58% (Guzman and Maalej 2014) and a median recall of 62% (Sänger et al. 2016), and the most divergent results: precision varies from 21% to 84% (Shah et al. 2019a; Gao et al. 2020). Techniques for extracting user requests and NFRs from reviews have higher performance, with a median precision above 90% (Iacob et al. 2016; Groen et al. 2017) and only small variations between techniques.

Classification

App reviews can be classified by information types these reviews contain, such as user requests, NFRs and issues. State-of-the-art review classification techniques have a median precision above 81% (Yang and Liang 2015; Lu and Liang 2017; Deshpande and Rokne 2018; Scoccia et al. 2018) and median recall around 83% (Peng et al. 2016; Lu and Liang 2017; Scoccia et al. 2018; Nayebi et al. 2018; Jha and Mahmoud 2019).

Clustering

Studies have shown the accuracy of clustering semantically related reviews to be 83% (Vu et al. 2015a); this result is in line with findings concerning the quality of review clustering, where authors reported MojoFM of 80% (Villarroel et al. 2016; Scalabrino et al. 2019).

Search and Information Retrieval

Mining approaches showed effectiveness in retrieving reviews relevant to specific information needs. In particular, the results show that tracing information between reviews and issues in ticketing systems, and between reviews and source code, can be precise (median precision above 75%) (Palomba et al. 2017; Palomba et al. 2018; Pelloni et al. 2018) and complete (median recall above 70%) (Palomba et al. 2015; Palomba et al. 2018; Pelloni et al. 2018; Grano et al. 2018), whereas linking reviews to goals in goal models has been achieved with a median precision of 85% and a median recall of 73% (Liu et al. 2020; Gao et al. 2020). Similarly, finding reviews related to specific features has been reported with a precision of 70% and a recall of 56% (Johann et al. 2017). The variability of the results, e.g., precision between 36% and 80% (Dąbrowski et al. 2019; Liu et al. 2019), however, may make the findings inconclusive.

Sentiment Analysis

The overall sentiment of a review can be identified with an accuracy of 91% (Masrury and Alamsyah 2019). Identifying the sentiment of a review with respect to a specific app feature is less effective, with a median precision of 71% and a median recall of 67% (Bakiu and Guzman 2017; Dąbrowski et al. 2020).
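
As an illustration of feature-specific sentiment analysis (not the method of any cited study), the sketch below scores only the reviews mentioning a given feature with NLTK’s off-the-shelf VADER analyser; the reviews and the target feature are invented.

```python
# Sketch: feature-specific sentiment with NLTK's VADER analyser.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)   # one-off lexicon download
sia = SentimentIntensityAnalyzer()

feature = "dark mode"
reviews = [
    "The new dark mode looks great and saves battery",
    "Dark mode is broken, half the screen stays white",
    "Syncing works fine for me",
]

for r in reviews:
    if feature in r.lower():
        score = sia.polarity_scores(r)["compound"]   # -1 (negative) .. +1 (positive)
        print(f"{score:+.2f}  {r}")
```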

Recommendation

Recommending priorities for user requests was reported with medium to high effectiveness: a median accuracy of 78% (Villarroel et al. 2016; Scalabrino et al. 2019) and a precision of 62% (Gao et al. 2015; Gao et al. 2018b). Generating review responses, in turn, was reported with a BLEU-4 score greater than 30% (Gao et al. 2019), which reflects human-understandable text.
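
For readers unfamiliar with the metric, the sketch below computes BLEU-4 for an invented generated response against an invented developer-written reference, using NLTK; it only illustrates how the score is obtained, not how the cited approach generates responses.

```python
# Sketch: BLEU-4 of a generated review response against a reference reply.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "thanks for reporting the crash we will fix it in the next release".split()
generated = "thanks for reporting this crash it will be fixed in the next release".split()

bleu4 = sentence_bleu([reference], generated,
                      weights=(0.25, 0.25, 0.25, 0.25),          # up to 4-grams
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU-4: {bleu4:.2f}")
```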

Summarization

Mining techniques were reported to generate a compact description outlining the main themes present in reviews with a recall of 71% (Jha and Mahmoud 2018).

3.6.2 User Study Results

Twenty-three studies evaluated the user-perceived quality of review mining approaches. Table 22 provides a synthesis of the user study results that primary studies reported for different review analyses, with a breakdown by evaluation criterion.

Table 22 Summary of user studies evaluating the perceived quality of app review analysis techniques

Information Extraction

Extracting information from reviews, e.g., issue reports and user opinions, is useful for developers (Gao et al. 2018b); it can help to elicit new requirements or prioritize development effort (Guzman et al. 2015; Dalpiaz and Parente 2019). In particular, machine learning techniques are able to identify issues with acceptable accuracy (Gao et al. 2018b); feature extraction methods, in contrast, produce analyses that are too imprecise to be applicable in practice (Dalpiaz and Parente 2019).

Classification

Review classification showed its utility for identifying different users’ needs, e.g., feature requests or bug reports (Di Sorbo et al. 2016; Panichella et al. 2016; Maalej et al. 2016; Ciurumelea et al. 2017; Liu et al. 2018; Ciurumelea et al. 2018; Zhou et al. 2020). Such categorized feedback is informative and eases further manual review inspection (Ciurumelea et al. 2017; Liu et al. 2018; Dalpiaz and Parente 2019). Practitioners reported saving up to 75% of their time thanks to the analysis (Chen et al. 2014; Ciurumelea et al. 2017; Ciurumelea et al. 2018) and found its accuracy sufficient for practical application (Villarroel et al. 2016; Di Sorbo et al. 2016; Panichella et al. 2016; Scalabrino et al. 2019).

Clustering

Review clustering is convenient for grouping feedback conveying similar content; for example, those reporting the same feature request or discussing the same topic (Palomba et al. 2017; Zhou et al. 2020). Evaluated approaches can perform the analysis with a high level of precision and completeness (Palomba et al. 2017; Zhou et al. 2020).

Searching and Information Retrieval

Developers acknowledged the usefulness of linking reviews to the source code components to be changed (Palomba et al. 2017); traditionally, this task requires enormous manual effort and is highly error-prone.

Sentiment Analysis

Analyzing user opinions can help to identify problematic features and to prioritize development effort to improve these features (Guzman et al. 2015).

Recommendation

Project managers found recommending priorities of user requests useful for release planning (Villarroel et al. 2016; Scalabrino et al. 2019); it can support their decision-making about the requirements and modifications that users wish to be addressed. Developers perceived an automatic review response system as more usable than the traditional mechanism (Greenheld et al. 2018); recommending reviews that require a response and suggesting responses to those reviews can reduce developers’ workload (Greenheld et al. 2018). Similarly, recommending goals that an app needs to satisfy is informative and may guide the app’s evolution (Gao et al. 2020), whereas suggesting test cases that trigger bugs can help developers reproduce bugs reported in user reviews and save the cost of manual bug reproduction (Shams et al. 2020).

Summarization

A compact description outlining the most important review content is useful to developers in their software engineering activities (Di Sorbo et al. 2017; Liu et al. 2019; Tao et al. 2020); in particular, summaries conveying information about frequently discussed topics, user opinions, user requests and security issues. Presenting this information in tabular form is easy to read and expressive (Di Sorbo et al. 2016; Di Sorbo et al. 2017; Dalpiaz and Parente 2019). Such summaries are generated with sufficient accuracy to be used in practical scenarios (Di Sorbo et al. 2017; Tao et al. 2020); in fact, developers reported saving up to 50% of their time thanks to the analysis (Di Sorbo et al. 2016; Di Sorbo et al. 2017; Liu et al. 2019; Tao et al. 2020).

Visualization

Presenting trends of frequently discussed topics can inform developers about urgent issues, ’hot features’, or popular user opinions (Guzman et al. 2014; Gao et al. 2018b). A heat map illustrating feature-specific sentiment (i.e., user opinions) helps developers to understand users’ experience with these features (Gu and Kim 2015); it indicates which features users praise and which are problematic. Visualizing how user opinions change over time aids developers in examining users’ reactions, e.g., to newly implemented modifications of these features, and in understanding to what extent an app satisfies users’ goals (Liu et al. 2020).


4 Discussion

In this section we highlight and discuss some of the findings from our study, summarize gaps in the literature, and point to directions for future research.

4.1 Mining App Reviews Is a Growing Research Area

Mining app reviews for software engineering is a relatively new research area. The first use of app reviews for software engineering purposes dates back to 2012. Nevertheless, the analysis of demographics has revealed that the research area increasingly attracts the attention of scholars. The number of papers published in this direction has grown substantially in the last three years. A recent survey on app store analysis found 45 papers relevant to app review analysis published up to 2015 (Martin et al. 2017). Our findings show that the number of published papers in the area had quadrupled by the end of 2020. The most frequent venues where scholars have published their work are high-quality software engineering conferences and journals (see Table 5). This implies not only that there is increasing effort to explore this research direction, but also that the contributions of these efforts are relevant from a software engineering perspective; in fact, the empirical evidence (RQ5) demonstrates that software engineers find mining app reviews useful in support of their SDLC activities: mining approaches can reduce their workload and provide knowledge that would be difficult to obtain manually. Like other work (Martin et al. 2017), we hypothesize that the factors behind the research interest in the field include the increased popularity of mobile apps, easy access to user feedback on a scale not seen before, and a general interest in adopting data mining techniques for mining software repositories.

4.2 Software Engineering Goals and Use Cases

App review analysis has broad applications in software engineering (RQ3). It can be used to support a variety of activities in requirements, design, testing and maintenance (see Table 6). Researchers, however, do not always clearly describe the envisioned software engineering use cases for their techniques.

So far, research in this area has been driven mostly by the opportunity to apply ML techniques to app reviews. Most studies (61%) relate their approaches to potential software engineering activities, but they remain vague about the details of how they envision the techniques being used in practice. A greater focus on software engineering goals and use cases would increase the relevance and impact of app review analysis techniques. This systematic literature review includes a complete inventory of the software engineering use cases already envisioned for the various app review analysis techniques (RQ3). This inventory can provide the basis for a more detailed investigation of software engineering goals and use cases for app review analysis tools. This investigation will contribute to designing future app review analysis tools that best serve the needs of software engineers.

4.3 Need Of Reference Model For Review Mining Tools

A reference model of stakeholders’ goals, use cases and system architectures for review mining tools would help structure research efforts in this area, and communicate how fitting review mining techniques together helps to address real stakeholders’ needs. In the future, scholars could elaborate such a model by generalizing existing review mining solutions, explaining how different components help to realize intended use cases and satisfy stakeholders’ goals. The model would also help researchers to identify and reuse common components in a typical architecture of review mining tools, as well as to explain the novelty and contribution of their work within that framework.

4.4 Small Size Of Evaluation Datasets

A great deal of effort has been made to evaluate the effectiveness of data mining techniques (RQ4). Primary studies, however, used evaluation datasets of small size (on average 2,800 reviews). This is a tiny portion of the user-submitted feedback in app stores. Popular mobile apps (like WhatsApp or Instagram) can receive more than 5,000 reviews per day, and more than one million reviews in a year (App Annie 2020). This is a significant threat to the validity of their results when trying to generalize them, e.g., (Ciurumelea et al. 2017; Deocadez et al. 2017a; Dąbrowski et al. 2019). The problem is attributed to the substantial effort of manual review annotation; labeling 900 reviews can take up to 12.5 hours (Guzman and Maalej 2014). As none of the surveyed studies tried to tackle the problem, it opens an avenue for future research. Researchers may experiment with semi-automated data labeling techniques currently used to minimize the effort of preparing training datasets (Deocadez et al. 2017b; Dhinakaran et al. 2018; Miller et al. 2020). Even if this problem is handled, scholars should still be mindful of sampling bias when curating datasets (Annis 2005). Techniques to ameliorate the latter problem, however, have been well studied in a recent study (Martin et al. 2015).

4.5 Replication Packages

Most papers did not make their review mining tools and evaluation datasets available (see Table 16 and Table 17). This hinders the replicability of these works as well as new comparative studies. Our survey contains a single replication study, and that study reported challenges in validating the results of the original work due to the absence of an annotated dataset and an insufficiently documented evaluation procedure (Shah et al. 2019a). Future studies should provide replication packages, including evaluation datasets, procedures, and approaches, so that researchers are able to validate existing works and confirm reported findings. This would also help in benchmarking approaches and provide a baseline for evaluating new approaches aiming to improve the performance of review mining techniques.

4.6 Impacts On Software Engineering Practice

It is not yet clear whether app review analysis techniques are already good enough to be useful in practice (RQ5). Identifying what performance the approaches should have to be useful for software engineers is an important open question (Berry 2017; 2018). Essentially, an approach facilitating review analysis should synthesize reviews so that the effort for further manual inspection of the outcomes of that analysis is negligible or at least manageable. Clearly, the effort would depend on the scenario an approach aims to realize. In addition to evaluating review analysis tools in terms of ML performance metrics (e.g., precision and recall), it will become increasingly important to evaluate them in terms of software engineering concerns: Does the tool save time? Does it improve the quality of, for example, the requirements elicitation and prioritisation process? Evaluating techniques with respect to software engineering concerns is more difficult but necessary to ensure research efforts are aligned with real stakeholders’ goals. Such evaluation will involve a combination of quantitative and qualitative studies aimed at reducing our current uncertainty about the potential impacts of review mining techniques on software engineering activities.

4.7 Practitioners’ Requirements For App Review Mining Tools

Numerous tools have been developed in the context of app review analysis research; they satisfy requirements coming mainly from scholars rather than practitioners. We have recorded no research studying what features the tools should provide nor what goals they should satisfy. The current research is data-driven rather than goal-driven: studies apply different types of app review analyses and techniques to mine information from app reviews without explicitly examining the practitioners’ perspective. It is not clear to what extent the tools satisfy real practitioners’ goals. Though existing user studies provide evidence that software practitioners find certain types of analyses valuable, e.g., Classification (Palomba et al. 2017), more systematic research is necessary in this direction to understand practitioners’ needs. Future research should plan to actively involve practitioners, for example via interview sessions or the analysis of their development practices, to understand why the tools are needed; what SE goals they want to satisfy with the tools; what features the tools should provide; and how the tools would be used in organizational settings. Such knowledge will help to understand the actual use case scenarios of the tools, and to identify whether there is a misalignment between what state-of-the-art tools offer and what practitioners actually need.

4.8 Verifying the Industrial Needs for App Review Analysis

Most studies motivated their mining approaches by the need to reduce the manual effort of app review analysis. Such a rationale seems reasonable in the context of popular apps (e.g., WhatsApp or Facebook Messenger) that are frequently commented on and receive hundreds or thousands of reviews per day. However, an average app receives 22 reviews per day (Pagano and Maalej 2013). It therefore seems legitimate to study the potential impact of app review analysis research on the app store industry, and to what extent the mining tools would be useful in industrial settings. Such a study could address this problem from multiple perspectives, e.g., which small, medium and large app development organizations are interested in app review mining tools? Who in the organization would use the tools? Is manual app review analysis ‘the real pain’ of practitioners? If so, how does ‘the pain’ manifest itself? Are any tasks obstructed? Does the problem generate additional costs? Answering these questions could help to understand who the actual beneficiaries of app review analysis research are, and what the size of that market is. Not only would this help to scope and justify future research directions, but it would also provide insights for commercializing this research.

4.9 Pay Attention to Efficiency and Scalability of Mining Tools

Primary studies are mostly focused on evaluating the effectiveness and perceived quality of their mining tools. We recorded no study assessing the efficiency and scalability of these tools; efficiency indicates how much time the tools take to produce their outcomes, whereas scalability indicates how that time changes as the input to the tools grows. Efficiency and scalability are fundamental qualities of analytics tools (Talia 2019); app review mining tools are no exception. The number of reviews that an app receives can vary from a few to more than thousands. Existing approaches, e.g., for feature extraction (Guzman and Maalej 2014) or app review classification (Maalej and Nabil 2015), rely on NLP and ML techniques that may be challenging to scale up (Analytics India Mag 2020). Future studies, therefore, should take efficiency and scalability into consideration when developing and evaluating their mining tools, to demonstrate that the tools can be used in practical settings.

4.10 The Problem of Training ML Techniques

Machine learning is the most frequent type of technique used for app review analysis (RQ2). Most of these techniques, however, are supervised and require a training dataset consisting of manually annotated reviews. Preparing a manually annotated dataset is time-consuming and often error-prone (Guzman and Maalej 2014). More importantly, such an annotated dataset might be domain- and time-specific; annotated reviews of one app might not be reusable for training a technique on the feedback of another app. Further, the dataset may be prone to data drift - a phenomenon in which the characteristics of app reviews change over time. In such a case, an ML technique must be periodically retrained with an up-to-date training dataset to maintain its predictive abilities (Explorium 2020). Recent studies have thus experimented with active learning (Dhinakaran et al. 2018) and semi-supervised techniques (Deocadez et al. 2017b) to reduce the cost of annotating a large amount of data. More research is, however, needed to understand how many reviews should be annotated to prepare a training dataset when the technique is used in industrial settings; how often such a dataset needs to be prepared; and whether or not practitioners would accept the cost of preparing it.

5 Threats to Validity

One of the main threats to the validity of this systematic literature review is incompleteness. The risk of this threat highly depends on the selected list of keywords forming the search queries. To decrease the risk of an incomplete keyword list, we used an iterative approach to keyword-list construction. We constructed two queries: one generic and one specific. The generic query was formed using keywords appearing in the index of terms of sample studies analysing app reviews for SE. The specific query was formed based on a set of keywords representing the concepts of our research objective. As in any other literature survey, we are also prone to publication bias. To mitigate this threat, we complemented the digital library search with other strategies: we conducted an issue-by-issue search of top-level conferences and journals as well as performed backward and forward snowballing.

To ensure the quality and reliability of our study, we defined a systematic procedure for conducting our survey, including the research questions to answer, searching strategies and selection criteria for determining primary studies of interest. We conducted a pilot study to assess technical issues, such as the completeness of the data form, and usability issues, such as the clarity of the procedure instructions. The protocol was reviewed by a panel of researchers in addition to the authors of the study. It was then revised based on their critical feedback. Consequently, the selection of primary studies followed a strict protocol in accordance with well-founded guidelines (Kitchenham 2004; Kitchenham et al. 2004; Ralph et al. 2020).

Another threat to validity we would like to highlight is our subjectivity in the screening, data extraction and classification of the studied papers. To mitigate this threat, each step was performed by one coder, the first author of this paper, and then cross-checked by a second coder. Each step was validated on a randomly selected sample of 10% of the selected papers. The percentage inter-coder agreement reached for all the phases was equal to or higher than 80%, indicating high agreement between the authors (Ide and Pustejovsky 2017). In addition, intra-rater agreement was assessed: the first author re-coded a randomly selected sample of 20% of the studied papers, and an external evaluator, who has no relationship with the research, verified the agreement between the first and second rounds. The percentage intra-coder agreement was higher than 90%, indicating near complete agreement (Ide and Pustejovsky 2017).

A similar threat concerns whether our taxonomies are reliable enough for analysing and classifying the extracted data. To mitigate this threat, we used an iterative content analysis method to continuously develop each taxonomy. New concepts that emerged when studying the papers were introduced into a taxonomy and changes were made accordingly. These taxonomies were discussed among all the authors, who agreed on their final form.

6 Related Work

This review is not the first effort synthesizing knowledge from the literature analysing app reviews for SE (Martin et al. 2017; Genc-Nayebi and Abran 2017; Tavakoli et al. 2018; Noei and Lyons 2019). Our SLR, however, differs substantially from previous studies in the scope of the literature surveyed and the depth of our analysis. Table 23 shows the differences between our study and previous works according to the dimensions we considered for the comparison. We grouped the dimensions into information related to study characteristics and to the topics surveyed in our study. The characteristics concern study type (i.e., systematic literature review or survey), time period covered and number of papers surveyed. The topics concern: Paper Demographics, App Review Analyses (RQ1), Mining Techniques (RQ2), Supporting Software Engineering (RQ3), Empirical Evaluation (RQ4) and Empirical Results (RQ5).

Table 23 Main differences between our study and previous surveys

Martin et al. (2017) surveyed the literature with the aim of demonstrating a newly emerging research area, i.e., app store analysis for software engineering. The scope of their survey is much broader than that of our study, as it covers literature analyzing various types of app store data (e.g., APIs, download ranks, or prices). Our work has a much narrower scope, focussing only on app review analysis, but studies the papers in greater depth in order to answer our five research questions.

Though the related survey also addresses RQ1, our study is more up-to-date and larger in scale, covering 182 papers. More importantly, most dimensions of our SLR, i.e., RQ2-RQ5, are missing in this other study.

Two other studies addressed our RQ2, but only partially, as they are narrower in scope (Genc-Nayebi and Abran 2017; Tavakoli et al. 2018). Tavakoli et al. (2018) surveyed the literature on techniques and tools for mining app reviews. Similarly, Genc-Nayebi and Abran (2017) consolidated the literature to synthesize information on techniques for opinion mining. Our SLR addresses this dimension more broadly, rather than in the context of techniques for a specific review analysis or of tool-supported approaches. We have made an effort to consolidate general knowledge on the techniques the literature employs for 9 broad types of review analyses. We also provided a mapping between the different review analyses and the techniques facilitating their realization.

Noei and Lyons (2019) summarized 21 papers analysing app reviews from Google Play. The authors provided an overview of each paper, briefly explaining its applications and mentioning its limitations. The surveyed papers were selected subjectively, rather than following a systematic search procedure. In contrast, our study is an SLR rather than a summary. Following a systematic procedure, we selected 182 studies that we carefully read and then synthesized to answer five research questions. The related work only marginally covers information for RQ1 and RQ2.

In summary, previous studies do not cover our research questions related to software engineering activities (RQ3) and empirical evaluations (RQ4 and RQ5). They partly cover our research questions RQ1 and RQ2, but on a smaller set of papers and in less detail.

7 Conclusion

In this paper, we presented a systematic literature review of the research on analysing app reviews for software engineering. Through a systematic search, we identified 182 relevant studies that we thoroughly examined to answer our research questions. The findings have revealed a growing interest in the research area. Research on analysing app reviews is published in the main software engineering conferences and journals, e.g., ICSE, TSE or EMSE, and the number of publications has tripled in the last four years. The research in this area will likely continue to gain importance as a consequence of the increased interest in mobile app development.

This systematic literature review structures and organizes the knowledge on the different types of app review analyses as well as the data mining techniques used for their realization. With that knowledge, researchers and practitioners can understand what useful information can be found in app reviews, and how app review analysis can be facilitated at abstract and technical levels. More importantly, the literature review sheds new light on why mining app reviews can be useful; the findings identify 14 software engineering activities that have been the target of previous research on app review analysis. Important future research for app review analysis will involve developing a deeper understanding of the stakeholders’ goals and context for app review analysis tools in order to increase the applicability, relevance and value of these tools.

The findings have revealed that software engineers find mining approaches useful and that their performance is promising for generating different app review analyses. It however remains unclear to what extent these approaches are already good enough to be used in practice.

It will become increasingly important to evaluate them in terms of software engineering specific concerns: Does the approach improve the quality of, for example, the requirements elicitation and prioritization process? We also recommend that empirical evaluation continue to improve in scale and reproducibility. Research in this area is currently of inconsistent quality in terms of evaluation methods and the ability for the research to be reproduced. Future studies should share evaluation datasets and mining tools, allowing their experiments to be replicated. They should also pay more attention to the scalability and efficiency of their mining approaches.

In conclusion, this study helps to communicate knowledge on analyzing app reviews for software engineering purposes. We hope our effort will inspire scholars to advance the research area and assist them in positioning their new works.