1 Introduction

Software development organizations compete to provide mobile applications (apps)Footnote 1 that successfully satisfy user needs. Worldwide, around 2.87 billion people use smartphones, and 47% of them say they cannot live without their devices Turner (2020). Mobile apps provide valuable services for users; however, their quality is equally important. Each mobile app should provide several quality characteristics, especially usability. The best way to evaluate the quality of mobile apps from a user perspective is to analyze their feedback.

Users review mobile apps they have used or are currently using. User feedback contains usage scenarios, bug reports, and feature requests that can help app developers accomplish maintenance and evolution tasks Panichella et al. (2015). Hence, developers can use user feedback to fix bugs early and enhance new releases Maalej et al. (2015). However, the manual analysis of user reviews does not scale: mobile app developers spend considerable effort collecting and analyzing reviews to better satisfy user needs.

Mobile apps are increasingly being adopted in the healthcare industry, by patients as well as medical professionals. Statistics indicate that over 318,000 mobile apps for healthcare are available in major app stores, with more than 270 million people having downloaded a healthcare app Mobius MD (2019). Mobile apps in healthcare are classified into different types such as health & fitness, stress, and diagnosis. Healthcare apps (mHealth apps) mostly provide assistance outside hospitals and can help patients manage their daily routine, such as measuring vital parameters (e.g., pulse, blood sugar) and taking medicine. On the other hand, mHealth apps help healthcare providers conduct virtual visits and gather data from their patients. For these reasons, healthcare organizations are increasingly adopting mHealth apps to improve the quality of their services. Currently, mHealth apps are used by a considerable number of users, which may lead to a large volume of reviews. Due to this volume of text, the manual extraction of relevant information is impracticable Messaoud et al. (2019). In fact, manually analyzing user reviews is tedious and time-consuming, especially when looking for valuable reviews Tamjeed (2020).

Since their introduction in 1949 by the Canadian psychologist Hebb (1949), machine learning algorithms have been increasingly adopted in different domains (e.g., software engineering, healthcare) due to their problem-solving capacity Alpaydin (2020). Different software development and maintenance activities can be expressed as learning problems and solved by learning algorithms, such as effort estimation Pospieszny et al. (2018) and requirements classification Zhang and Tsai (2002). Moreover, 40% of United States companies use machine learning to improve sales and marketing, with 76% of them having exceeded their sales targets thanks to machine learning Agrawal (2020). These promising results encouraged us to use machine learning algorithms in our study to evaluate mHealth apps' quality based on user feedback.

The quality evaluation of mobile apps, in particular mHealth apps, has recently been investigated by many researchers (cf., Al Kilani et al. (2019); Idri et al. (2018); Dewi et al. (2020), etc.). Several techniques have been used for this purpose, such as quality metrics (cf., Zulfa et al. (2020); Dewi et al. (2020), etc.), quality assessment questionnaires (cf., Idri et al. (2017, 2018), etc.), and machine learning (cf., Al Kilani et al. (2019); Lu and Liang (2017), etc.). Researchers evaluated the quality of mobile apps according to a set of quality characteristics (e.g., portability, maintainability, performance); however, none of these studies fully complied with the ISO/IEC 25010 quality model (cf., Idri et al. (2017, 2018), etc.). For instance, Idri et al. (2017) considered Operability as a quality characteristic, while Operability is a sub-characteristic of the Usability quality characteristic according to the ISO/IEC 25010 quality model ISO/IEC (2010). In addition, machine learning has been successfully used in previous works (cf., Al Kilani et al. (2019); Araujo et al. (2020), etc.) to classify user reviews into different requirements engineering categories such as bug reports and enhancement reports. However, except for Al Kilani et al. (2019), none of the previous studies proposed to classify user reviews on mobile apps according to the different ISO/IEC 25010 quality characteristics.

The main objective of this paper is to improve the quality of mHealth apps based on user feedback. Hence, the main research question that we address in this paper is “How to evaluate the quality of mHealth apps based on user feedback according to the ISO/IEC quality model?”. The main contributions of this work can be summarized as follows:

  • Firstly, we collect data from the user feedback on mHealth apps provided by the Google Play Store and apply natural language processing techniques to construct a classification system using machine learning algorithms.

  • Secondly, we apply six machine learning algorithms (Random Forest, Decision Tree, Multinomial Naïve Bayes, K-Nearest Neighbors, Support Vector Machine, and Stochastic Gradient Descent) to classify the collected user reviews according to the eight quality characteristics of ISO/IEC 25010 quality model (Functional suitability, Reliability, Performance efficiency, Compatibility, Usability, Security, Maintainability, and Portability) as well as sentiment polarity (Positive, Neutral, and Negative).

  • Finally, we conduct a set of experiments using our created dataset, showing that our proposed model yields the highest performance compared to other machine learning and deep learning models (BERT, RoBERTa, DistilBERT, and DistilBERT ML).

This paper is structured as follows: Sect. 2 presents background information about the ISO/IEC 25010 quality model ISO/IEC (2010) and reviews some related works. Section 3 describes how machine learning algorithms are used to evaluate mHealth apps' quality based on user feedback (i.e., opinions). Section 4 presents the results of our experiments. Section 5 discusses the obtained results and compares them with the state of the art. In Sect. 6, we highlight several threats to the validity of this study. Finally, Sect. 7 summarizes this work with a set of future work directions.

2 Background and literature review

This section gives background information about the ISO/IEC 25010 quality model ISO/IEC (2010) and reviews some works studying the use of machine learning algorithms to evaluate the quality of mHealth apps using user feedback.

2.1 Software product quality: ISO/IEC 25010

ISO/IEC 25010 quality model, part of the SQuaRE (Software product Quality Requirements and Evaluation) series, presents a standardized way of defining and quantifying software/service quality characteristics. The ISO/IEC 25010 quality model is a “set of characteristics, sub-characteristics, quality measures, quality measure elements and relationships between them” ISO/IEC (2010). This model is composed of eight characteristics and 31 sub-characteristics that are related to the static properties of the software and dynamic properties of the computer system.

Compared to other quality models, ISO/IEC 25010 is more comprehensive and complete Herrera et al. (2010). For this reason, we selected this model in this paper; however, we focus only on the first level of the ISO/IEC 25010 quality model (i.e., the quality characteristics level). This level includes the following eight quality characteristics: Functional suitability, Reliability, Performance efficiency, Compatibility, Usability, Security, Maintainability, and Portability.

2.2 Related work

In this section, we first review related work that evaluated the quality of mobile apps, in particular mHealth apps. Then, we survey previous work that used user feedback analysis to improve mobile apps. Finally, we provide a discussion of the state of the art.

2.2.1 Quality evaluation of mobile apps

The evaluation of mHealth apps quality has been investigated recently in several research studies (cf., Idri et al. (2017, 2018), etc.). Some of those studies will be detailed below.

Zulfa et al. (2020) evaluated the portability of the MyITS mobile app based on its three sub-characteristics (Adaptability, Installability, and Replaceability) using six metrics provided by ISO/IEC (2016). The weights calculated from these six metrics over the three sub-characteristics reached a maximum of 1.0. The calculated weight for the Adaptability sub-characteristic reached 7.89, whereas the calculated weight for the Installability sub-characteristic reached 2. For the Replaceability sub-characteristic, no weight was calculated since its attributes could not be measured. The obtained results showed that the MyITS mobile app can work appropriately in a variety of environments (e.g., Android, iOS). For the same mobile app, Dewi et al. (2020) evaluated the maintainability quality characteristic based on four of its sub-characteristics (Analysability, Modularity, Reusability, and Testability) using 10 metrics provided by ISO/IEC (2016). The results showed that the maintainability of the MyITS mobile app is good: the weights calculated for the four sub-characteristics reached a maximum of 2.670 for myITS Lecturer and 2.083 for myITS Student. The best weight values obtained were 1.0 for Analysability, 0.75 for Modularity, 0.5 for Reusability, and 0.67 for Testability. It must be noted that this study did not include the Modifiability sub-characteristic.

Falih and Firdaus (2019) evaluated the quality of hybrid mobile apps based on the ISO/IEC 25010 quality standard using three quality characteristics: Performance efficiency, Functional suitability, and Portability. The Functional suitability characteristic is evaluated according to the functional implementation completeness and functional implementation coverage metrics. The Performance efficiency characteristic is evaluated according to CPU usage (%), memory usage (MB), API device execution time (ms), screen first loading time (ms), and screen resume loading time (ms), whereas the Portability characteristic is assessed using the plugin compatibility and number of supported platforms metrics. The authors used three case studies to empirically assess their proposed method: the RocketChat, Fresh Food Finder, and PropertyCross apps. The results showed that, for the Functional suitability characteristic, both Fresh Food Finder and PropertyCross provided good values; for the Performance efficiency characteristic, both RocketChat and PropertyCross provided a better response time; and for the Portability characteristic, Fresh Food Finder and PropertyCross gave the best results. Hence, concerning the three selected quality characteristics, PropertyCross outperforms Fresh Food Finder, followed by the RocketChat app.

On the other hand, several researchers investigated the quality evaluation of mHealth apps. For example, Idri et al. (2017) evaluated the software quality of mobile Personal Health Records (mPHRs) for pregnancy monitoring based on the ISO/IEC 25010 quality standard ISO/IEC (2010). This study selected 17 mPHR apps available to iOS and Android users from the Apple App and Google Play stores, respectively. The evaluation is based on a quality assessment questionnaire that covers four selected external quality characteristics: Functional suitability, Operability, Performance efficiency, and Reliability. For each quality characteristic, a set of questions was proposed depending on the number of sub-characteristics. Each selected app was then evaluated according to a 5-interval scale (1–1.5: Very low, 1.6–2.5: Low, 2.6–3.5: Moderate, 3.6–4.5: High, and 4.6–5: Very high). This study showed that the majority of the selected apps satisfied the Functional suitability (16 of 17: 94.11%) and Reliability (17 of 17: 100%) quality characteristics more than the Operability (14 of 17: 82.35%) and Performance efficiency (7 of 17: 41.17%) quality characteristics. Moreover, this study used four classifiers (Iterative Dichotomiser 3, C4.5, K-nearest neighbors, and Naïve Bayes) to predict the quality in use (i.e., user ratings) from the external quality of the mPHR apps. Among the 17 selected apps, the 14 apps that include user ratings were used in the classification. Each app is represented by the median scores of the four selected quality characteristics and labeled by its user rating (Moderate, High, or Very High). The evaluation, conducted using 2-fold cross-validation, showed that K-nearest neighbors achieved the highest mean accuracy rate, followed by C4.5, Naïve Bayes, and Iterative Dichotomiser 3. However, this study did not report the accuracy values of the selected classifiers.

Idri et al. (2018) used a quality assessment questionnaire to evaluate the requirements provided by 30 gamified blood donation apps concerning eight quality characteristics (Functional suitability, Reliability, Performance efficiency, Operability, Security, Compatibility, Maintainability, and Transferability). Each selected app was evaluated according to a set of questions and, based on its score, classified into one of five groups: Very high if the app's score \(\in\) [0.90, 1.00], High if the score \(\in\) [0.7, 0.89], Moderate if the score \(\in\) [0.4, 0.69], Low if the score \(\in\) [0.2, 0.39], and Very low if the score \(\in\) [0, 0.19]. The results showed that the majority of the selected apps satisfied Functional suitability with 100%, Operability with 91%, Performance efficiency with 86%, and Reliability with 84%.

Davalbhakta et al. (2020) assessed the quality of the mobile apps currently used for COVID-19 using the Mobile Application Rating Scale (MARS) with its Engagement, Functionality, Aesthetics, and Information sub-scales. This study selected 63 apps from the Apple App and Google Play stores. The authors conducted their evaluation by continent. The results showed that apps from Asia rated higher on the Functionality sub-scale (mean = 0.54; 95%), while UK (8 of 17 from Europe) and North American apps together rated higher on the Information sub-scale (mean = 0.6; 95%). The Aesthetics and Engagement sub-scales did not vary between western and Asian apps. Overall, this study showed that COVID-19 mobile apps satisfied the Functionality dimension with 91.87%, followed by Aesthetics with 77.94%, Information with 72.58%, and finally Engagement with 64.12%.

Table 1 Summary of the research studies that evaluated the quality of mobile apps

Table 1 presents a summary of the research studies that evaluated the quality of mobile apps, including mHealth apps. As illustrated in this table, several researchers investigated the use of the ISO/IEC 25010 quality model to evaluate mHealth apps. The majority of the studies in Table 1 used quality metrics (e.g., Zulfa et al. (2020); Dewi et al. (2020)) or a quality assessment questionnaire (e.g., Idri et al. (2017, 2018)) to evaluate mobile apps according to a set of quality characteristics (e.g., Functional suitability, Maintainability, Performance efficiency). Moreover, some of these studies did not fully comply with the ISO/IEC 25010 quality model (cf., Idri et al. (2017, 2018), etc.). For example, Idri et al. (2017) considered Operability as a quality characteristic, while Operability is a sub-characteristic of the Usability quality characteristic according to the ISO/IEC 25010 quality model ISO/IEC (2010).

2.2.2 User reviews classification for mobile apps improvement

Reviewing user feedback is a pertinent way to improve apps according to user needs. For this purpose, many researchers proposed to evaluate and analyze user reviews to improve mobile apps. For instance, Panichella et al. (2015) presented a taxonomy to classify mobile app reviews relevant to software maintenance (e.g., problem discovery, information seeking). The authors collected 32,210 reviews from seven apps: AngryBirds, Dropbox, and Evernote from Apple's App Store, and TripAdvisor, PicsArt, Pinterest, and Whatsapp from the Google Play store. They applied natural language processing, text analysis, and sentiment analysis techniques with five machine learning classifiers (alternating decision tree, logistic regression, naive Bayes, support vector machine, and J48) to classify the collected reviews into the identified categories. The best results were provided by the J48 classifier with 75.20%, 74.20%, and 72.00% for Precision, Recall, and F1-score, respectively.

Guzman et al. (2015) suggested a taxonomy for classifying app reviews relevant to software evolution. The taxonomy includes seven categories: Bug report, Feature strength, Feature shortcoming, User request, Praise, Complaint, and Usage scenario. This study collected 4550 reviews from seven mobile apps: AngryBirds, Dropbox, and Evernote from the Apple App Store, and TripAdvisor, PicsArt, Pinterest, and Whatsapp from the Google Play store. To evaluate the performance of their proposed system, the authors used supervised machine learning algorithms (Naive Bayes, support vector machine, Logistic Regression, and Neural Network) to classify user reviews into the identified categories. The best results were provided by the Neural Network classifier with averages of 74.00%, 59.00%, and 64.00% for Precision, Recall, and F-measure, respectively. The results also showed that fusing the predictions of the Logistic Regression and Neural Network classifiers keeps Precision unchanged compared to the Neural Network alone while improving Recall to 63.00%.

Al-Hawari et al. (2020) proposed an Associative Classification approach for Review Mining (ACRM) to classify user reviews into four maintenance tasks: information giving, information seeking, problem discovery, and feature requests. They used natural language pre-processing and text analysis techniques in the data pre-processing phase and applied several machine learning classifiers. The authors tested their proposed system on two datasets, the Pan dataset and the Maalej dataset, provided by Panichella et al. (2015) and Maalej et al. (2016), respectively. To evaluate the performance of ACRM, the authors used machine learning classifiers such as decision tree, naïve Bayes (NB), k-nearest neighbor (KNN), Gradient Boosting Trees (GBT), Classification based on Multiple Association Rules (CMAR), random forest (RF), support vector machine (SVM), and AC algorithms. The results showed that the best average accuracies were achieved by the GBT classifier with 79.00% and 80.00% over the Pan and Maalej datasets, respectively.

Aslam et al. (2020) proposed a deep learning approach for the classification of app reviews into four categories: bug reports, enhancement reports, user experiences, and ratings. The dataset used in this study was provided by Maalej et al. (2016). The proposed approach was evaluated against classifiers such as NB, Multinomial Naïve Bayes (MNB), DT, SVM, Convolutional Neural Network (CNN), and Long Short Term Memory (LSTM). The results showed that the best performance was provided by the CNN model with averages of 95.49%, 93.94%, and 94.71% for Precision, Recall, and F1-score, respectively.

Finally, Al Kilani et al. (2019) investigated the use of machine learning and natural language processing techniques to classify mHealth app reviews initially into bugs, new features, and sentimental reviews. Bugs are further classified into general, usability, security, and performance bugs. Similarly, sentimental reviews are further classified into positive, negative, and neutral. The proposed model includes four phases: data collection, data labeling, feature extraction, and data classification. The authors evaluated their model over 7500 reviews of 10 different health-related mobile apps using three supervised machine learning classifiers: MNB, RF, and SVM. The collected reviews were annotated manually by experts. The best performances were provided by the MNB classifier with F1-scores of 72.00%, 52.00%, 90.00%, 86.00%, 12.00%, 11.00%, and 21.00% for the bugs, new features, sentimental, general bug, security, performance, and usability classes, respectively. The averages of Precision, Recall, and F1-score are 74.00%, 72.00%, and 73.00%, respectively, for the first-level classification (bugs, new features, and sentimental).

Table 2 Summary of the research studies that classified user reviews for mobile apps improvement

Table 2 presents a summary of the research studies that classified user reviews to improve the quality of mobile apps. As illustrated in this table, machine learning algorithms have been successfully used for this purpose. Researchers classified user reviews into different maintenance-oriented categories such as bug reports and enhancement reports. These categories can include both functional and non-functional requirements; however, distinguishing between requirement types would be beneficial for developers. Moreover, these categories are restricted to the maintenance phase, whereas app quality must be maintained throughout all software life cycle phases. Finally, except for Al Kilani et al. (2019), none of the previous studies proposed to classify user reviews according to the different quality characteristics.

2.2.3 Discussion

The quality of mobile apps is increasingly being investigated to increase users' satisfaction. User feedback provides information that expresses users' opinions towards a specific mobile app. Several methods can be used to extract user reviews, since they help identify the app's issues and enhance its quality. Moreover, other users consider reviews a reliable source of information; in fact, reading negative feedback could deter potential users from trying the app.

As illustrated in Table 1, several researchers investigated the use of the ISO/IEC 25010 quality model to evaluate the quality of mHealth apps based on quality metrics (cf., Zulfa et al. (2020); Dewi et al. (2020), etc.) or a quality assessment questionnaire (cf., Idri et al. (2017, 2018), etc.) according to a set of quality characteristics (e.g., Functional suitability, Maintainability, Performance efficiency). However, some of these studies did not use all the ISO/IEC 25010 quality characteristics (cf., Idri et al. (2017, 2018), etc.). In addition, quality assessment questionnaires are very time-consuming and cannot capture user perceptions in real time.

On the other hand, as shown in Table 2, several researchers (cf., Panichella et al. (2015); Aslam et al. (2020), etc.) investigated the classification of user reviews into categories restricted to the maintenance phase (e.g., bug reports, enhancement reports); however, app quality matters to developers as well as users throughout all software life cycle phases.

Compared to the state of the art and the previous approaches that focused on mHealth apps quality evaluation, the approach proposed in this paper analyzes user feedback using several supervised machine learning algorithms. Moreover, this paper adopts the ISO/IEC 25010 quality model ISO/IEC (2010) and hence uses all the quality characteristics identified by this model. This analysis will help developers evaluate the quality of mHealth apps from the users' perspective.

3 Research design

This section presents a precise description of the method applied to evaluate mHealth apps quality based on user feedback.

3.1 Machine learning for mHealth apps user feedback analysis

The main objective of this paper is the evaluation of mHealth apps quality based on user feedback using machine learning algorithms. Thus, user reviews are collected from a set of selected mHealth apps. Each review is first classified according to the eight ISO/IEC 25010 quality characteristics (Functional suitability, Compatibility, Performance efficiency, Portability, Reliability, Security, Maintainability, and Usability) ISO/IEC (2010). Then, the collected reviews are classified as positive, negative, or neutral (sentiment polarity).

Figure 1 presents the main four steps followed in this research:

  • Step 1 — Dataset construction and annotation: To create our dataset, we initially searched for mHealth apps on the Google Play store using the keywords “healthcare”, “health”, and “mHealth”. Then, we applied a set of exclusion criteria to select the most relevant mHealth apps for this research. Thereafter, we used the Appbot tool Appbot (2021) to extract the recent reviews from the selected mHealth apps. This step is described in Sect. 3.2. The collected user reviews were then cleaned by removing reviews written in a language other than English, duplicated reviews, and reviews providing irrelevant information. After that, we annotated the collected user reviews, classifying them first according to the ISO/IEC 25010 quality characteristics ISO/IEC (2010) and then into positive, negative, or neutral opinions (sentiment polarity). This classification is based on the classification provided by the Appbot tool Appbot (2021) and the authors' experience. More details about this step are provided in Sect. 3.3.

  • Step 2 — Data pre-processing: In the data pre-processing, we applied natural language processing techniques (e.g., sentence tokenization, stop word removal), as detailed in Sect. 3.4. Furthermore, we applied the Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer, as detailed in Sect. 3.5.

  • Step 3 — Machine learning: We applied six machine learning algorithms (Random Forest, Decision Tree, Multinomial Naïve Bayes, K-Nearest Neighbors, Support Vector Machine, and Stochastic Gradient Descent) to classify the reviews, first according to the ISO/IEC 25010 quality characteristics ISO/IEC (2010) and second according to sentiment polarity (positive, negative, or neutral). More details about this step are provided in Sect. 3.6.

  • Step 4 — Evaluation and experimentation: To evaluate the selected machine learning algorithms, we used Accuracy, Precision, Recall, and F1-score, and conducted a set of experiments. More details about this step are provided in Sect. 4.

Fig. 1
figure 1

The proposed approach to evaluate mHealth apps quality

3.2 Dataset construction and data cleaning

To the best of our knowledge, there is no annotated dataset suitable for our research study. For this reason, we prepared the dataset ourselves. To collect the most appropriate mHealth apps for this study, we searched the Google Play store using the keywords “health”, “mhealth”, and “healthcare”. A total of 341 apps were returned, where 50 apps were collected using the keyword “health”, 46 apps using the keyword “healthcare”, and 231 apps using the keyword “mhealth”; 14 of these apps are duplicates.

The 327 collected apps are available in many languages (e.g., English, French) and classified into different categories (e.g., Medical, Health & Fitness, News & Magazines).

To ensure the good quality of the reviews to be collected, we applied the following four Exclusion Criteria (EC):

  • EC1: Exclude apps that are not in the Health & Fitness or Medical categories of the Google Play store. The objective of this criterion is to keep only apps developed mainly for healthcare purposes. By applying EC1, a total of 43 apps were excluded (e.g., Leap mHealth, WWE SuperCard - Collection de cartes multijoueur), leaving 284 apps.

  • EC2: Exclude mobile apps without an English interface. The objective of this criterion is to ensure that the reviews collected later are written in English. By applying EC2, a total of three apps were excluded (e.g., mHealth, CardioApp - Risco Cardiovascular Perioperatório), leaving 281 apps.

  • EC3: Exclude apps whose last update was before January 1st, 2021. The purpose of this criterion is to select recently updated apps. By applying EC3, a total of 153 apps were excluded (e.g., Huawei Health, Google Health Studies), leaving 128 apps.

  • EC4: Exclude mobile apps with a very limited number of reviews. The purpose of this criterion is to collect a sufficient number of reviews for each app. By applying EC4, a total of 42 apps were excluded (e.g., Health Tracker, Spectrum Health App), leaving 86 apps.

Figure 2 represents the mHealth apps selection process and the number of excluded and kept apps for each exclusion criterion.

To collect the user reviews from the selected 86 mHealth apps, we used the Appbot tool Appbot (2021), which supports app review and rating analysis for mobile teams. The initial number of user reviews extracted from the mHealth apps is 2980. The collected reviews fall into different categories. Some reviews provide the user's general opinion about the mobile app (e.g., “I love it!”, “I don't like this app”). Other reviews expose a technical problem related to the app (e.g., “The button in the page don't work”), ask for new functionality (e.g., “The app shall provide the functionality to retrieve the current location of the user”), or propose to improve the app quality (e.g., “let's try another method to login”). In this paper, we focus on relevant reviews that criticize or acknowledge a mobile app's quality or suggest some improvement (i.e., features). To select the most appropriate reviews for this study, we excluded eight duplicated reviews, 21 multi-language reviews (English and others), and 1270 irrelevant reviews (general opinion reviews), leaving 1681 reviews.

Fig. 2
figure 2

Representation of the mHealth apps selection process

3.3 Data annotation

A user review, expressed in natural language, may include several sentences, where each sentence addresses a specific quality characteristic. Hence, each sentence must be evaluated and classified independently. Table 3 gives an example of user feedback that provides a Negative opinion on two quality characteristics (Performance and Security).

Table 3 A user feedback that affects different quality characteristics

In this paper, data annotation was a challenging task. Although some reviews had already been annotated by the Appbot tool Appbot (2021), we decided to perform this task manually with the help of an expert in software engineering. Data annotation was done by each author individually.

When the authors disagreed on a manual annotation, the final decision was made by the software engineering expert. The collected reviews were classified according to the ISO/IEC 25010 quality characteristics ISO/IEC (2010), each review being labeled with its most suitable class (e.g., security, usability, performance). Since the decision regarding a single review could differ from one expert to another, the authors discussed the review classifications carefully to guarantee their correctness.

Then, each review was annotated as positive, negative, or neutral. Each user review is rated by the Appbot tool Appbot (2021) on a scale of 1 to 5. The annotation of the selected user reviews as positive, negative, or neutral follows these rules:

  • User reviews ranked 1 or 2 are Negative. The total number of Negative reviews is 754.

  • User reviews ranked 3 are Neutral. The total number of Neutral reviews is 39.

  • User reviews ranked 4 or 5 are Positive. The total number of Positive reviews is 888.

Table 4 presents the total number of user reviews for each quality characteristic and their classification into Positive, Negative, and Neutral classes. As provided in this table, the total number of the user reviews that will be used in this research is 1681, where 589 are Functional Suitability reviews, 123 are Compatibility reviews, 159 are Performance reviews, 32 are Reliability reviews, 6 are Portability reviews, 198 are Security reviews, 78 are Maintainability reviews, and 496 are Usability reviews, as illustrated in Fig. 3. Furthermore, 888 are Positive reviews, 754 are Negative reviews and 39 are Neutral reviews, as illustrated in Fig. 4.

Table 4 User review dataset corresponds to eight quality characteristics from the ISO/IEC 25010
Fig. 3
figure 3

The number of user reviews according to the quality characteristics

Fig. 4
figure 4

The distribution of user review sentiment classes

Table 5 presents some user reviews collected from the selected mHealth apps corresponding to three quality characteristics (Usability, Security, and Functional suitability).

3.4 Data pre-processing

When using machine learning classifiers, the collected user reviews need to be pre-processed. Data pre-processing is a crucial step. It includes the following tasks: (i) remove punctuation and special characters, (ii) tokenize sentences (i.e., split reviews into tokens), and (iii) remove stop words. A minimal sketch of these tasks is shown below.
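The following Python sketch illustrates the three tasks; the exact tokenizer and stop-word list used in the study are not specified in the text, so NLTK's defaults are an assumption here.

```python
# Pre-processing sketch: (i) remove punctuation/special characters,
# (ii) tokenize, (iii) remove stop words. NLTK resources are assumptions.
import re
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
STOP_WORDS = set(stopwords.words("english"))

def preprocess(review):
    review = re.sub(r"[^a-zA-Z\s]", " ", review.lower())  # (i)
    tokens = word_tokenize(review)                        # (ii)
    return [t for t in tokens if t not in STOP_WORDS]     # (iii)

print(preprocess("The app crashes every time I try to log in!"))
# ['app', 'crashes', 'every', 'time', 'try', 'log']
```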

Table 5 Some user reviews collected from the selected mHealth apps

3.5 Data vectorization

In the data vectorization, we extracted unique clean tokens from the collected reviews. Then, we extracted bigrams and applied the Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer to assign a weight to each term Medina and Ramon (2015).
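As a sketch, and assuming scikit-learn's TfidfVectorizer (the library used for this step is not named in the text), the unigram-plus-bigram weighting can be expressed as follows:

```python
# TF-IDF vectorization sketch; ngram_range=(1, 2) adds bigrams to the
# unigram vocabulary, and each term receives a TF-IDF weight.
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = [
    "app crashes after update",
    "love the new interface",
    "login fails on my phone",
]
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(reviews)  # sparse matrix: reviews x terms
print(X.shape, vectorizer.get_feature_names_out()[:5])
```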

3.6 Machine learning algorithms

In this section, we present the implementation of the six machine learning algorithms used to classify reviews, first with respect to the ISO/IEC 25010 quality characteristics and then into Negative, Neutral, or Positive opinions.

Among the different machine learning classifiers used in the literature, we selected the following six: SVM, KNN, MNB, RF, DT, and SGD.

3.6.1 Support vector machine

Support Vector Machine (SVM) is a very popular supervised machine learning algorithm proposed by Vapnik in 1995 Vapnik (2013). SVM shows promising classification performance with a reasonable dataset size. It categorizes data by finding the decision boundary (hyperplane) that best separates the classes, maximizing the margin between the nearest data points belonging to different classes. Several kernel functions (i.e., linear, polynomial, radial basis, and sigmoid) can be used to obtain the optimal hyperplane. In our study, after several experiments, we used the support vector classifier (SVC) with a linear kernel and tested several hyper-parameters, such as the penalty factor \(C \in \{1,10,100\}\). The best parameter value (\(C = 10\)) was chosen empirically.
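A sketch of this empirical tuning, assuming scikit-learn's SVC and a cross-validated grid search (the exact tuning protocol is not described in the text); the synthetic data stands in for the TF-IDF features and quality labels of the previous steps:

```python
# Grid search over the penalty factor C for a linear-kernel SVC.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative stand-in for the TF-IDF features and quality labels.
X_train, y_train = make_classification(n_samples=200, n_features=50,
                                       n_classes=3, n_informative=10,
                                       random_state=0)
grid = GridSearchCV(SVC(kernel="linear"), {"C": [1, 10, 100]}, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_)  # C=10 was retained in our experiments
```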

3.6.2 Multinomial Naïve Bayes

Multinomial Naïve Bayes (MNB) is a variant of the naïve Bayes family, one of the most popular groups of supervised machine learning algorithms, and tends to perform very well in practice. MNB is a classification technique that makes a probabilistic prediction of the class label given the observed features, based on Bayes' theorem Ren et al. (2009). The algorithm assumes that attributes are independent even if they are related: it estimates the conditional probability of two or more events from the occurrence probability of each individual event Singh et al. (2019), which is why it is called naïve. In this study, we used the default parameters (alpha=1.0).

3.6.3 Stochastic gradient descent

Stochastic Gradient Descent (SGD) is a variant of gradient descent, an iterative optimization algorithm used to fit a model by minimizing a cost function. An SGD classifier finds the linear model parameters that minimize the loss function by moving iteratively in the direction of the minimum. SGD requires several hyper-parameters (e.g., loss function, regularization term, and number of iterations). In our study, after several experiments, we used the following parameters: loss function = “hinge”, penalty (regularization term) = “l2”, and number of iterations max_iter = 5.

3.6.4 K-Nearest neighbors

The K-Nearest Neighbors (KNN) classifier is a supervised machine learning algorithm that computes the Euclidean distance between feature vectors to measure their similarity and assigns the class of the K most similar training examples. In our experiments, we used K equal to 3, after empirical tests of the values \(K \in \{1,3,5\}\).

A summary of the parameter settings used for each machine learning classifier is shown in Table 6.

Table 6 Used parameter setting of the selected machine learning classifiers
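As a concrete sketch of Table 6, the six classifiers can be instantiated as follows; settings not stated in the text (e.g., for RF and DT) are left at scikit-learn defaults, which is an assumption:

```python
# The six selected classifiers with the reported parameter settings.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

classifiers = {
    "SVM": SVC(kernel="linear", C=10),
    "MNB": MultinomialNB(alpha=1.0),
    "SGD": SGDClassifier(loss="hinge", penalty="l2", max_iter=5),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "RF":  RandomForestClassifier(),   # defaults assumed
    "DT":  DecisionTreeClassifier(),   # defaults assumed
}
```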

4 Experiments results and discussion

In this section, we present the results of our experiments. All experiments were run on a system with an NVIDIA GeForce GTX GPU and an Intel Core i7 processor and were implemented in Python. We randomly split the dataset into 80:20 training and testing sets, respectively. We used the following metrics to evaluate our models: Accuracy (Eq. 1), Precision (Eq. 2), Recall (Eq. 3), and F1-score (Eq. 4).

$$\begin{aligned} Accuracy \, (\%) = \frac{TP + TN}{TP + TN + FP + FN} \end{aligned}$$
(1)
$$\begin{aligned} Precision \, (\%) = \frac{TP}{TP + FP} \end{aligned}$$
(2)
$$\begin{aligned} Recall \, (\%) = \frac{TP}{TP + FN} \end{aligned}$$
(3)
$$\begin{aligned} F1\text {-}score \, (\%) = \frac{2 \times Precision \times Recall}{Precision + Recall} \end{aligned}$$
(4)
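A sketch of this evaluation protocol, assuming scikit-learn's split and metric functions and weighted averaging for the multi-class setting (consistent with the weighted averages reported in Table 9); X, y, and classifiers refer to the sketches of Sect. 3:

```python
# 80:20 random split and the four metrics of Eqs. 1-4. X and y stand for
# the TF-IDF features and annotated labels from Sect. 3; `classifiers`
# is the dictionary sketched in Sect. 3.6. The random seed is illustrative.
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20,
                                                    random_state=42)
clf = classifiers["SGD"].fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average="weighted"))
print("Recall   :", recall_score(y_test, y_pred, average="weighted"))
print("F1-score :", f1_score(y_test, y_pred, average="weighted"))
```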

The references and introduced labels for the evaluation of the proposed models are provided in Table 7.

Table 7 Reference and introduced labels for the evaluation of the proposed models

In Table 8, we provide the performance evaluation results for the classification of mHealth apps user reviews according to the ISO/IEC 25010 quality characteristics using the six machine learning classifiers. As illustrated in this table, SGD achieved the best overall accuracy of 82.00%, followed by RF, DT, SVM, KNN, and MNB with 80.00%, 74.04%, 73.45%, 70.50%, and 67.55%, respectively. For a multi-class classification problem, accuracy, precision, recall, and F1-score do not always provide a complete performance evaluation of a classifier. For this reason, we also used Cohen's Kappa score, given by the following equation:

$$\begin{aligned} \kappa&= \frac{P_o - P_e}{1 - P_e}. \end{aligned}$$
(5)

where \(P_o\) is the observed overall agreement and \(P_e\) is the expected proportion of agreement due to chance. The proposed system achieved a quadratic-weighted kappa of 0.74, which is above the cutoff value.
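A sketch of this check, assuming scikit-learn's implementation of Eq. 5 with quadratic weights and continuing from the evaluation sketch above:

```python
# Quadratic-weighted Cohen's kappa between true and predicted labels.
from sklearn.metrics import cohen_kappa_score

kappa = cohen_kappa_score(y_test, y_pred, weights="quadratic")
print(f"quadratic-weighted kappa = {kappa:.2f}")
```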

Table 8 The performance evaluation results (Accuracy) for the classification of mHealth apps reviews according to the ISO/IEC 25010 quality characteristics using six selected machine learning classifiers

Since the SGD classifier provided the best accuracy, we selected it for further evaluation using the Precision, Recall, and F1-score metrics. In Table 9, we present the performance evaluation results for the mHealth apps reviews classification by quality characteristic using the SGD classifier. As shown in this table, the weighted average values of Precision, Recall, and F1-score are 83.00%, 82.00%, and 82.00%, respectively.

Table 9 The performance evaluation results (Precision, Recall, and F1-score) for the mHealth apps reviews classification by quality characteristic using SGD

Table 10 presents the performance evaluation results for the classification of mHealth apps user reviews according to sentiment polarity using the six machine learning classifiers. As illustrated in this table, SVM and RF provided the best overall accuracy of 90.50%, followed by SGD, MNB, DT, and KNN with 90.20%, 90.00%, 86.05%, and 81.30%, respectively. For a more complete evaluation, we also assessed the performance using Cohen's Kappa score: the proposed system achieved a quadratic-weighted kappa of 0.83, above the cutoff value.

Table 10 The performance evaluation results (Accuracy) for the mHealth apps reviews classification by sentiment polarity using six selected machine learning classifiers

Since the SVM and RF classifiers provided the best accuracy, we selected them for further evaluation using the Precision, Recall, and F1-score metrics. In Table 11, we present the performance evaluation results for the mHealth apps reviews classification by sentiment polarity using the SVM and RF classifiers. As shown in this table, the best weighted average values of Precision, Recall, and F1-score are 90.00%, 91.00%, and 90.00%, respectively, provided by the RF classifier.

Table 11 The performance evaluation results (Precision, Recall, and F1-score) for the mHealth apps reviews classification by sentiment polarity using SVM and RF classifiers

Figure 5 presents the confusion matrix for sentiment polarity. As illustrated in this figure, 95.00% of the Negative reviews were correctly classified as Negative, whereas 90.00% of the Positive reviews were correctly classified as Positive. Our system made mistakes mostly in classifying Neutral reviews, where 50% were classified as Negative and 50% as Positive.
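A sketch of the row-normalized matrix behind Fig. 5, assuming scikit-learn's confusion_matrix and continuing from the sentiment-classification predictions; the class order is an assumption matching the figure:

```python
# Row-normalized confusion matrix: each row sums to 1.0, so the diagonal
# gives the per-class recall values quoted above.
from sklearn.metrics import confusion_matrix

labels = ["Negative", "Neutral", "Positive"]
cm = confusion_matrix(y_test, y_pred, labels=labels, normalize="true")
print(cm)
```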

Fig. 5
figure 5

Confusion matrix for sentiment polarity

Table 12 presents the performance evaluation results (Accuracy) for each quality characteristic by sentiment polarity using the six selected machine learning classifiers.

Among the quality characteristics, we excluded Portability because its number of user reviews is very limited (six reviews in total). As illustrated in this table, the SGD classifier provides better results than the other classifiers for all quality characteristics except Reliability, where SVM and MNB provided the best accuracy values. The Usability quality characteristic is considered a referential concept for the quality of mHealth apps Idri et al. (2018). As shown in Table 12, the accuracy of Usability review sentiment classification with the SGD classifier is about 98.00%, the best result among the quality characteristics.

Table 12 The performance evaluation results (Accuracy) for each quality characteristic by sentiment polarity using the six selected machine learning classifiers

Since the SGD classifier provided the best accuracy, we further evaluated our model using the Precision, Recall, and F1-score metrics (see Table 13). The number of Neutral reviews for most quality characteristics is very limited (e.g., Compatibility, Reliability, Security), which explains why the system could not handle those reviews as a class (None in Table 13). As an example, for the Usability quality characteristic, the weighted average values of Precision, Recall, and F1-score all reach 98.00%.

Table 13 Performance evaluation results for each quality characteristic by sentiment polarity using SGD classifier

Figure 6 presents two confusion matrices for sentiment polarity, for (a) Functional suitability and (b) Performance. As illustrated in this figure, for the Functional suitability quality characteristic, 91.00% of the Negative reviews were correctly classified as Negative, and 91.00% of the Positive reviews were correctly classified as Positive; the system made mistakes mostly on Neutral reviews, where 83% were classified as Negative and 17% as Positive. For the Performance quality characteristic, 78% of the Negative reviews were correctly classified as Negative and 22% were classified as Positive, whereas 100% of the Positive reviews were correctly classified as Positive; again, all (100%) of the Neutral reviews were misclassified as Positive.

Fig. 6
figure 6

The confusion matrices for sentiment polarity for two quality characteristics. a Confusion matrix for Functional suitability, b Confusion matrix for Performance

Table 14 depicts a performance comparison between the Unigram and Bigram bag-of-words variants for both quality and sentiment classification using all selected machine learning algorithms. As illustrated in this table, Bigram provides better performance than Unigram for both classification tasks. A sketch of this comparison is given below.
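The comparison amounts to re-running the same pipeline with two TfidfVectorizer settings; the pipeline wiring below is an assumption, and train_texts/test_texts with their labels stand for the cleaned reviews and annotations of Sect. 3:

```python
# Unigram vs. Bigram bag-of-words comparison on held-out accuracy.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

for name, ngrams in [("Unigram", (1, 1)), ("Bigram", (1, 2))]:
    pipe = make_pipeline(
        TfidfVectorizer(ngram_range=ngrams),
        SGDClassifier(loss="hinge", penalty="l2", max_iter=5),
    )
    pipe.fit(train_texts, train_labels)
    print(name, pipe.score(test_texts, test_labels))
```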

Table 14 Comparison (Accuracy) of bag-of-words models

5 Discussion and comparative evaluation

In this study, we empirically evaluated machine learning algorithms that can be successfully used to classify mHealth apps user reviews according to the ISO/IEC 25010 quality characteristics ISO/IEC (2010) and help practitioners incorporate user feedback faster and more accurately. The implications of this study are summarized as follows:

  1. The most addressed quality characteristic by the users of mHealth apps is Functional suitability, followed by Usability. Since the users of these apps usually have health issues, they need apps that provide several functionalities, are easy to use, and have low complexity.

  2. The majority of users are not satisfied with the functionality provided by mHealth apps or with their reliability, while they are mostly satisfied with their usability. The reliability of mHealth apps is important: for instance, an app that “Doesn't work half the time” is not beneficial.

  3. With a limited amount of data, machine learning models can still provide sufficient results for classification problems.

The automatic classification of user reviews will help developers identify the quality issues of their mobile apps based on users' experiences, and hence the quality characteristics that should be improved in the next release. In addition, since many people use mHealth apps, improving their quality will certainly increase the number of users and improve user satisfaction. The model proposed in this paper could also be used to improve the quality of other app categories, such as gaming and education apps.

It is difficult to draw a direct and fair comparison with existing works, given the differences in the categories and protocols used in their empirical evaluations, except for Al Kilani et al. (2019) and Uddin and Khomh (2019). Al Kilani et al. (2019) classified mHealth app reviews into bugs, new features, sentimental, general bug, security, performance, and usability. Among those categories, only usability, security, and performance are quality characteristics. The results obtained by Al Kilani et al. (2019) showed that the main quality characteristics addressed by mHealth apps' users are usability, performance, and security with 839, 555, and 96 reviews, respectively. In our dataset, the main quality characteristics addressed by users are functional suitability, usability, security, and performance with 589, 496, 198, and 159 reviews, respectively. Hence, if we exclude functional suitability, as it is directly related to the functionality provided by the mHealth apps, we can conclude that the main quality characteristics addressed by users are usability, security, and performance. In addition, using the SGD model, we obtained better results than Al Kilani et al. (2019) for Performance, Usability, and Security (see Table 15). On the other hand, Uddin and Khomh (2019) classified API reviews into 11 categories, five of which are quality characteristics (performance, usability, security, compatibility, and portability), using SVM/RF. That study showed that the main quality characteristics addressed by API users are usability, performance, and security with 1457, 357, and 163 reviews, respectively. Compared to this study, using the SGD model, we obtained better results for the five quality characteristics (see Table 15).

Table 15 Comparison between the bag-of-words models and state-of-the-art models

In this paper, we also used sentiment polarity to analyze the user reviews on mHealth apps and classify them into positive, negative, or neutral opinions. As illustrated in Table 4 and Fig. 4, the percentages of positive, negative, and neutral reviews were 52.82%, 44.85%, and 2.32%, respectively. Practitioners should not handle negative, neutral, and positive reviews in the same manner. Negative reviews should be addressed by developers in order to improve the quality of their apps. Positive reviews indicate that users praise the app's functionality and are generally satisfied with it; analyzing them allows developers to identify the strengths of their own mHealth apps. In addition, positive reviews on other mHealth apps help developers in the maintenance and evolution of their own apps; for instance, they could identify features that should be implemented in the next release to increase the adoption of their apps. Finally, the number of neutral reviews is limited compared to positive and negative reviews, and neutral reviews usually provide new feature requests. Hence, neutral and negative reviews should be addressed with the same importance: negative reviews address a quality characteristic that should be improved, whereas neutral reviews suggest new features. Both positive and negative reviews have an important impact on mHealth app ratings.

The SVM and RF models show 0% accuracy for the Neutral class. The two models handle a three-class problem (Positive, Negative, and Neutral); however, the number of Neutral user reviews in the test set is limited to six, and SVM and RF classified three of them as Positive and three as Negative. Neutral reviews usually provide suggestions on how to improve the app, such as “Could you PLEASE make the back button NOT close the app from every screen”; however, in some Neutral reviews users mix words expressing positive and negative opinions, such as “..., it's incredibly annoying... Also, how about being able to expand a single day's worth of stats instead of only the main scroll screen and the weekly metrics,... thank you very much and everyone that needs detailed hart monitoring, this a must. (Hard to use.).”

Table 16 Comparison (Accuracy) of bag-of-words with deep learning models

On the other hand, some researchers used deep learning models to detect quality aspects in software reviews, such as Application Programming Interface (API) reviews (cf., Uddin and Khomh (2019); Yang et al. (2022), etc.). For a fair comparison with those, we applied the BERT, RoBERTa, DistilBERT, and DistilBERT ML models. In Table 16, we compare the results provided by the best performing machine learning models (SVM/RF and SGD) with the deep learning models (BERT, RoBERTa, DistilBERT, and DistilBERT ML). As illustrated in this table, for quality classification, DistilBERT ML achieved the best overall accuracy of 76.00%, followed by RoBERTa, DistilBERT, and BERT with 58.00%, 56.00%, and 51.00%, respectively. For sentiment classification, BERT achieved the best overall accuracy of 85.00%, followed by RoBERTa, DistilBERT ML, and DistilBERT with 83.00%, 81.00%, and 80.00%, respectively. This can be explained by the fact that deep learning models need large amounts of training data, which our dataset does not provide.
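For reference, a minimal sketch of how such a transformer baseline can be set up with the Hugging Face transformers library; the checkpoint name and the eight-class head match our quality classification task, but the fine-tuning details (optimizer, epochs) are omitted and would need to be supplied:

```python
# Loading DistilBERT with an 8-way classification head (one class per
# ISO/IEC 25010 quality characteristic) and scoring a single review.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=8)

batch = tokenizer(["App crashes on startup"], return_tensors="pt",
                  truncation=True, padding=True)
with torch.no_grad():
    logits = model(**batch).logits
print(logits.argmax(dim=-1))  # predicted class (head untrained here)
```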

Table 17 Comparison between the deep learning models and the state-of-the-art models

In Table 17, we compare the results obtained using SGD and the deep learning models (BERT, RoBERTa, DistilBERT, and DistilBERT ML) with the state-of-the-art models. As illustrated in this table, in most cases the SGD model provides better results than the deep learning models for our approach. Compared to Yang et al. (2022), we obtained competitive results only for the Usability and Performance quality characteristics using DistilBERT ML. This is explained by the number of reviews in our dataset; deep learning models perform better with large datasets.

6 Threats to validity

Threats to the validity of our study are related to internal validity and external validity.

  • Internal validity: the main threat to the internal validity of our study concerns the dataset collected from the 86 mHealth apps on the Google Play store using the Appbot tool Appbot (2021). The collected reviews were automatically classified by the Appbot tool into different topics (e.g., Performance, Use cases, Bug, Feature Requests). Some of those topics match the ISO/IEC 25010 quality characteristics, while others do not. Hence, we manually classified the collected reviews in our dataset based on the ISO/IEC 25010 quality standard ISO/IEC (2010). We conducted these classifications carefully to guarantee their correctness and suitability for machine learning algorithms. The same issue arises when examining sentiment polarity. While collecting and classifying user reviews, we found the ISO/IEC 25010 quality model appropriate for the kind of reviews we collected from the mHealth apps: except for Neutral reviews, all collected reviews could be easily classified according to the quality characteristics of the ISO/IEC 25010 model.

  • External validity: deals with the generalization of the results of this study to other subsets of mobile applications. The method proposed in this paper is not specific to mHealth apps and could be generalized to any subset of mobile apps. We believe the proposed approach can be used to evaluate the quality of other mobile app categories (e.g., gaming, kids, education). Moreover, user feedback can be classified not only according to the ISO/IEC 25010 quality characteristics (e.g., portability, performance) but also according to the quality sub-characteristics (e.g., availability, flexibility).

7 Conclusion

User reviews vary from relevant reviews providing ideas for mHealth improvement to reviews complaining about an app's issues. In the work presented here, we used six supervised machine learning algorithms to evaluate the quality of 86 mHealth apps according to the ISO/IEC 25010 quality model based on user feedback. We collected 1681 reviews, including positive, negative, and neutral opinions, from the Google Play store. We applied natural language processing techniques and machine learning in the review analysis process. The evaluation results showed that the SGD classifier provided the best accuracy of 82.00% in classifying user reviews according to the quality characteristics, whereas the SVM and RF classifiers provided the best accuracy of 90.50% in classifying user reviews according to sentiment polarity.

For future work, we propose to identify functional requirements from user feedback for future releases of mHealth apps. In addition, prioritizing relevant user reviews could help identify which quality characteristic should be improved first. Moreover, we plan to enlarge the dataset and use deep learning to improve the classification results.