1 Introduction

Despite the well-known health benefits of physical activity, 23% of the adult population worldwide does not meet the recommended guidelines [1]. Systematic reviews concluded that levels of physical activity in Europe vary across countries, with the proportion of adults meeting the guidelines ranging from 15.6% in Israel to 84.8% in Slovakia [2]. In the Netherlands, approximately one third of the adult population does not meet the Dutch guidelines for healthy physical activity [3]. Moreover, compared to other European countries, Dutch people lead a relatively sedentary lifestyle, with 25% spending at least 8.5 hours sitting on a usual day and over 60% at least 5.5 hours [4]. Insufficient physical activity is one of the leading risk factors for premature mortality and for avoidable health problems such as cardiovascular diseases, cancer, and diabetes [1, 5]. Thus, effective and widely accessible interventions to increase levels of physical activity are needed.

Smartphones and smartphone applications (apps) could be useful as mobile coaching systems that aim to increase levels of physical activity, because they are deeply embedded in modern society, always accessible to the user, and can lower the barrier for people to address their health problems [6]. Although many adults do not meet the activity guidelines, apps that focus on health and fitness promotion are popular. To illustrate, the number of health and fitness apps is still growing: the iTunes App Store contained 71,895 health and fitness apps in 2016 [7], including both free and paid apps. Traditional interventions have also been influenced by ICT developments and make use of mobile phones and the internet. Web-based and mobile (app)-based interventions (i.e., eHealth and mHealth interventions) provide opportunities for delivering personalized materials to promote physical activity at a population level [8, 9]. Several reviews and meta-analyses of eHealth interventions targeting physical activity found small effects on levels of physical activity in favor of the intervention groups [9,10,11]. The mHealth interventions included in systematic reviews and meta-analyses were mainly delivered via SMS or a personal digital assistant (PDA) and showed promising results [8, 12,13,14]. However, to date, no systematic reviews on the effectiveness of app-based interventions to promote physical activity are available.

Smartphones offer a wide range of technological possibilities, as part of or in addition to techniques used in eHealth and mHealth, such as telecommunication, sensing/monitoring, and any-time, any-place support. Even though no systematic reviews on the effectiveness of mobile interventions to promote physical activity have been published yet, several content analyses are available that focus on the inclusion of behavior change theories and behavior change techniques. Those reviews showed that the apps generally lacked both a foundation in behavior change theories and the use of behavior change techniques that are associated with effectiveness [15,16,17,18,19]. Behavior change techniques that were often included in apps were self-monitoring, providing feedback on performance, and goal-setting [15]. However, sensing and monitoring can be done in various ways, and it remains unclear to what extent current physical activity apps use these technological possibilities to help users be physically active, and thus actually deliver on the promise of mobile coaching systems. For example, a feature such as self-monitoring can be based on different types of input, e.g., user input (i.e., a diary) or sensor data obtained from the phone or from external sensors, such as a Fitbit or a GPS watch.

Although some technological features can be mapped to behavior change techniques (which in turn can be associated with effectiveness), it is currently unknown which features are associated with higher effectiveness of physical activity apps. A useful first step is to investigate the prevalence of those features in the current supply of physical activity apps. Therefore, the aim of the present paper is to inventory the landscape of state-of-the-art smartphone apps that promote physical activity, in order to (1) gain insight into technological possibilities and (2) identify missed opportunities. More specifically, a framework of technological features is proposed, and a set of apps is selected systematically for a content analysis to determine how often those features are implemented. In addition, we investigate whether the price of an app and the app store in which it is available (Google Play Store vs. iTunes App Store) are related to the number of implemented features. We also explore whether the number of features in an app is correlated with the reviewers’ ratings of its usability.

The remainder of this paper is organized as follows. Section 2 describes the methods of identifying, screening, and scoring the eligible apps, including the framework used to score the apps. The results are presented in Section 3 and discussed in Section 4. Finally, Section 5 provides a conclusion.

2 Methods

This section describes the process of searching, screening, and selecting the apps to be included in the systematic review, as well as the scoring procedure and how the scores were analyzed.

2.1 Identification

For this review, the Google Play Store and the iTunes App Store were searched for relevant apps. In the first quarter of 2015, Android and iOS (the mobile operating systems served by these two app stores) accounted for 96.7% of the market share; the remaining 3.3% was covered by Windows Phone, BlackBerry OS, and other mobile operating systems [20]. For reasons of efficiency, only apps from the app stores of these two market leaders were reviewed in this study.

The Google Play Store and the iTunes App Store were searched between April and May 2015. The search terms were based on an exploration of the 20 most popular apps in the “Health and Fitness” category of both app stores: the descriptions of those apps were screened and the most prevalent terms were listed. The resulting list of keywords was used to construct a set of combined search terms: coach fitness, coach exercise, coach fit, coach workout, coach training, fitness exercise, fitness fit, fitness workout, fitness training, exercise fit, exercise workout, exercise training, and physical activity. These search terms were used to identify relevant physical activity apps in the two app stores, up to a maximum of 100 apps per search term. This led to 100 screened apps per search term and app store, except for physical activity and coach fit, which yielded only 48 and 69 results, respectively, in the iTunes App Store. Thus, a total of 2517 apps was identified.
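The total of 2517 follows from the numbers above: 13 search terms in each of two stores, capped at 100 results, minus the shortfall of the two iTunes App Store searches that returned fewer results. A small Python check (the variable names are illustrative) makes the arithmetic explicit. Note that the total counts search hits, so an app found through several search terms is counted more than once at this stage.

```python
# The 13 combined search terms listed above.
SEARCH_TERMS = [
    "coach fitness", "coach exercise", "coach fit", "coach workout",
    "coach training", "fitness exercise", "fitness fit", "fitness workout",
    "fitness training", "exercise fit", "exercise workout",
    "exercise training", "physical activity",
]
MAX_RESULTS = 100  # at most 100 apps screened per search term and store

# iTunes App Store searches that returned fewer than MAX_RESULTS apps.
ITUNES_SHORTFALLS = {"physical activity": 48, "coach fit": 69}


def identified_apps() -> int:
    """Total number of search hits screened over both app stores."""
    google_play = MAX_RESULTS * len(SEARCH_TERMS)
    itunes = sum(ITUNES_SHORTFALLS.get(term, MAX_RESULTS) for term in SEARCH_TERMS)
    return google_play + itunes


print(identified_apps())  # 2517
```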

2.2 Screening

All 2517 identified apps were screened for inclusion in the app review. The screening procedure consisted of evaluating the app description and screenshots in the app stores to determine whether the app met the predefined inclusion criteria. Some apps that were included based on this screening were excluded at a later stage, after downloading and further exploring the app.

The general inclusion criteria stated that (i) the app is in either English or Dutch; (ii) the app promotes physical activity; (iii) the app is aimed at a healthy population, rather than at a specific target group; (iv) the app is focused on adult users, i.e., suitable for users 18 to 65 years of age; (v) the app is not specifically aimed at male or female users; and (vi) the app offers more than static information only.

This leads to the following list of exclusion criteria for the apps that were identified through the initial search:

1. General
   (a) Language: The app is in a language other than English or Dutch.
   (b) Gender: The app is aimed specifically at male or female users.
   (c) Age/Health: The app is not aimed at adults but specifically at children, adolescents, or elderly people, or the app is not aimed at a healthy population but at a specific target group, such as people with obesity or other physical problems or illnesses.
2. Aim
   (a) Dieting: The app is aimed at weight loss, for example through information about dieting, nutrition, or calorie counting, without a (substantial) physical activity component.
   (b) Brain: The app is aimed at brain training to improve cognitive capacities.
   (c) Tactics: The app is aimed at teaching tactics (for sports, games, or exams).
   (d) Games: The app is a game that does not require or promote physical activity.
   (e) Mind: The app is aimed at stimulating the mind, for example through meditation and mindfulness.
   (f) Specific: The app is aimed at a very specific physical activity, such as training one particular muscle group.
3. Methods
   (a) Testing: The app only offers a test of physical fitness or endurance, without further support or advice to become more physically active.
   (b) Timer: The app only offers a timer.
   (c) Information: The app only offers static information, such as opening times of local sports clubs.
   (d) Book/Magazine: The app is a digital version of a book or magazine about physical activity or health.
4. Other
   Any other reason why an app was excluded that does not fit the reasons listed above. For example, the app only offers a heart rate measurement tool.

After the first screening of the 2517 identified apps, 227 apps remained to be reviewed. Of those 227 apps, 113 were found in the iTunes App Store, 89 in the Google Play Store, and 25 in both app stores. In the next step, another 58 apps were excluded, for example because they had apparently been removed from the app stores, because the app required external hardware or a paid subscription, or because they did not meet the inclusion criteria for the review after all. For the remaining 169 apps, a targeted search revealed in which app store(s) they were actually available, irrespective of the app store in which they were originally identified. This led to a total of 38 apps in the iTunes App Store only, 39 apps in the Google Play Store only, and 92 in both app stores.
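The counts reported in this paragraph can be checked for internal consistency with a few lines of Python; the dictionary keys are illustrative labels, not terms used in the study.

```python
# Counts reported in Section 2.2; the keys are illustrative labels.
after_first_screening = {"itunes_only": 113, "google_play_only": 89, "both": 25}
excluded_after_download = 58
final_availability = {"itunes_only": 38, "google_play_only": 39, "both": 92}

screened_in = sum(after_first_screening.values())
included = screened_in - excluded_after_download

# Availability was re-checked for the included apps, so the per-store split
# changes, but the total must still equal the number of included apps.
assert sum(final_availability.values()) == included
print(screened_in, included)  # 227 169
```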

Figure 1 provides a schematic overview of the decision sequence from the identification to the inclusion of the apps. Please note that as soon as one exclusion criterion was identified in an app, the app was rejected and only that criterion was registered. This implies that there could have been more reasons why an app was not eligible than are represented in this overview. Apps that were identified multiple times through different search terms are registered under “doubles.”
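Because screening stopped at the first applicable criterion, each rejected app contributes exactly one reason to the counts in Fig. 1. The Python sketch below illustrates that first-match rule. It assumes, hypothetically, that criteria are checked in the order of the exclusion list above and that each app is represented as the set of criteria a screener judged to apply; the criterion codes are illustrative labels, not the authors’ coding scheme.

```python
from collections import Counter

# Hypothetical codes for the exclusion criteria, in the order of the list above.
CRITERIA = [
    "1a_language", "1b_gender", "1c_age_health",
    "2a_dieting", "2b_brain", "2c_tactics", "2d_games", "2e_mind", "2f_specific",
    "3a_testing", "3b_timer", "3c_information", "3d_book_magazine",
    "4_other",
]


def first_exclusion(flags):
    """Return the first criterion that applies to an app, or None if it is retained.

    Checking stops at the first match, so later applicable criteria are never recorded.
    """
    for criterion in CRITERIA:
        if criterion in flags:
            return criterion
    return None


def tally(apps):
    """Count retained apps and the single recorded reason for each rejected app."""
    reasons = Counter()
    retained = 0
    for flags in apps:
        reason = first_exclusion(flags)
        if reason is None:
            retained += 1
        else:
            reasons[reason] += 1
    return retained, reasons


# Toy example: the second app meets both 1a and 2a, but only 1a is recorded.
apps = [set(), {"1a_language", "2a_dieting"}, {"3b_timer"}]
retained, reasons = tally(apps)
print(retained, dict(reasons))  # 1 {'1a_language': 1, '3b_timer': 1}
```

As a consequence, the per-criterion counts are a lower bound on how many apps met each criterion, which is exactly the caveat stated above.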

Fig. 1 Schematic overview of the selection process for apps eligible for full review

2.3 Scoring

This section describes how the selected apps were scored. In Section 2.3.1, the framework used for scoring is introduced and explained. In Section 2.3.2, the procedure of scoring the apps is described, including an analysis of the agreement between the different raters.

2.3.1 Scoring framework

The selected apps were scored using a framework of smartphone features (techniques and functionalities) that can be used to monitor or encourage physical activity in an e-coaching system. The framework was designed based on relevant literature, in order to ensure good coverage of the important features. First, it was partly based on a systematic review of scientific publications on smartphone applications that aim to increase physical activity levels [21]. In that review, the authors investigated 26 articles reporting on the viability of smartphones to measure and/or influence physical activity, but did not investigate the described apps themselves. Second, the framework was partly based on literature on desired features in smartphone applications that promote physical activity [22, 23].

The resulting framework consists of 50 items, which are organized into five categories about app features (measuring and monitoring, information and analysis, support and feedback, adaptation, and social) and two categories for additional information (usability and other). The categories were established through discussions between the authors and allow for analysis of the apps at different levels of abstraction. The two additional categories play a subordinate role in the current work, as they concern aspects other than app features; they serve as a basis for further analyses of the collected data, such as investigating correlations between the perceived usability of apps and their implemented features.

1. Measuring and monitoring
   This category contains items about how the app receives its input, for example through user input, through built-in smartphone sensors, or through external (hardware) sensors or other sources.

2. Information and analysis
   This category contains items about how the collected data are analyzed, summarized, and represented.

3. Support and feedback
   The items in this category cover what kind of support or feedback the app offers its users. For example, does it provide auditory or visual (real-time) feedback, or feedback based on the user’s context?

4. Adaptation
   This category contains items about whether (and to what extent) the app adapts to the user; for example, does the app automatically adjust the goals to the user’s behavior?

5. Social
   This category covers the social aspect of the app. For example, is there a community within the app, or is it possible to connect to external social networking platforms? Additionally, the category contains items about the functionalities within these online communities, e.g., is it possible to send messages, to compete, or to see a leader board?

6. Usability
   In this category, the apps are scored on clarity (how easy is it to find information?) and attractiveness (does the app look appealing?), on a scale from 1 to 5.

7. Other
   This category contains items that do not fit in the other categories. The items cover whether a website is available where users can view their data in (more) detail and whether the app offers in-app purchases that enable more functionalities.

See the Appendix for the complete scoring framework.

2.3.2 Scoring procedure

The scoring procedure consisted of downloading each of the 227 selected apps and exploring the different functionalities offered by the app. Downloading the app yields more reliable results than merely screening the description and screenshots in the app store [17, 24]. The hands-on exploration of each app took approximately 15 min. If some of the functionalities depended on actual use (e.g., responding to registered accelerometer or location data), the app was kept running in the background for several hours to days, to see whether other functionalities would be revealed.

The framework described in Section 2.3.1 was implemented in Microsoft Excel. Each item from the framework was assigned a 1 or 0: 1 if the answer to the question was “yes” and 0 if the answer was “no” or if it was not clear whether the app included the described feature. If an item consisted of several subitems, it was awarded a 1 if at least one of the subitems was also awarded a 1, and a 0 otherwise. This allows for analysis of the apps’ features on different levels of abstraction. In addition to the scores on the framework, the name of the app, the name of the app store, and the price of the app were registered during the scoring process.
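The collapsing rule for items with subitems can be expressed in a few lines. This Python sketch assumes a hypothetical representation in which each item maps either to its own 0/1 score or to a list of 0/1 subitem scores; the item names are invented for illustration and do not correspond to the items in the Appendix.

```python
# Hypothetical excerpt of one app's raw scores. Each item maps either to its own
# 0/1 score or to a list of 0/1 subitem scores (names are illustrative only).
app_scores = {
    "user_input": [1, 0],
    "built_in_sensor": [0, 0, 1],
    "external_sensor": [0, 0],
    "website_available": 1,  # an item without subitems
}


def item_score(score):
    """Collapse an item to 0/1: an item with subitems scores 1 if any subitem scores 1."""
    if isinstance(score, list):
        return 1 if any(s == 1 for s in score) else 0
    return score


item_level = {item: item_score(s) for item, s in app_scores.items()}
print(item_level)
# {'user_input': 1, 'built_in_sensor': 1, 'external_sensor': 0, 'website_available': 1}
```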

Four reviewers (RK, JM, AM, and RFH) contributed to scoring the 227 selected apps, and each app was scored by two reviewers: RK and JM reviewed 125 apps, RK and RFH reviewed 65 apps, and RK and AM reviewed the remaining 37 apps. To ensure consensus on the interpretation of the framework, it was discussed extensively before the start of the app review. In addition, decisions on interpretation that arose during the scoring process were documented and continuously shared among the four reviewers. After the review of the apps, discrepancies between the two reviewers’ scores for each app were resolved by discussion. If a disagreement could not be resolved easily (e.g., if one reviewer had overlooked a functionality), a score of “0.5” was registered to reflect the disagreement or ambiguity.

The inter-rater reliability was assessed with Cohen’s kappa, by calculating the agreement for each app separately and taking the average. This resulted in κ = 0.69, which indicates substantial agreement. However, since some of the apps received many 0 scores, the probability of chance agreement is relatively high. This lowers the value of κ, even though the percentage of agreement between the reviewers is high, namely 90% of all item scores.
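The effect described above, where many shared 0 scores raise chance agreement and depress κ despite high raw agreement, can be illustrated with a small Python sketch of Cohen’s kappa for two raters’ 0/1 item scores, averaged over apps. The toy data are invented. Skipping apps for which κ is undefined (both reviewers gave every item the same constant score, e.g. all 0) is an assumption, since the handling of such apps is not described.

```python
import math
from statistics import mean


def cohen_kappa(a, b):
    """Cohen's kappa for two raters' 0/1 scores on the same list of items."""
    if len(a) != len(b) or not a:
        raise ValueError("expected two non-empty score lists of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n      # observed agreement
    p_a, p_b = sum(a) / n, sum(b) / n                # proportion of 1s per rater
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)          # agreement expected by chance
    if p_e == 1:
        # Both raters gave every item the same constant score: kappa is undefined.
        return math.nan
    return (p_o - p_e) / (1 - p_e)


def mean_kappa(score_pairs):
    """Average per-app kappa, skipping apps where kappa is undefined (an assumption)."""
    kappas = [cohen_kappa(a, b) for a, b in score_pairs]
    return mean(k for k in kappas if not math.isnan(k))


# Toy data: 20 items per app; the first two pairs both have 90% observed agreement.
sparse = ([1, 1] + [0] * 18, [1, 0, 1] + [0] * 17)            # mostly 0s
balanced = ([1] * 10 + [0] * 10, [1] * 9 + [0, 1] + [0] * 9)  # half 1s
all_zero = ([0] * 20, [0] * 20)                               # kappa undefined

print(round(cohen_kappa(*sparse), 2))    # 0.44
print(round(cohen_kappa(*balanced), 2))  # 0.8
print(round(mean_kappa([sparse, balanced, all_zero]), 2))  # 0.62
```

Both toy pairs agree on 18 of 20 items, yet the sparse pair yields a much lower κ because its expected chance agreement is 0.82 rather than 0.5.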

2.4 Analyses

Using the scores obtained as described in Section 2.3, the results can be analyzed from different perspectives. First, the scores per reviewed app give insight into the extent to which the features in the framework are incorporated in physical activity promotion apps. Second, the sum score per item in the framework shows how often certain features are implemented.

In addition, we used the results to investigate whether the number of implemented features differs between free and paid apps, or between apps from the two app stores. This could reveal whether paid apps are generally more sophisticated (in terms of implemented features) than free apps, or vice versa. Similarly, if apps in one of the app stores are generally equipped with more features, this could indicate a difference in the selection/admission mechanism of that app store. The significance of these potential differences was assessed using a Mann-Whitney U test with a significance level of α = 0.05. Also, we used Spearman’s rank correlations to investigate whether the number of features implemented in apps is correlated with the reviewers’ ratings for the clarity and attractiveness of the apps, as captured by the usability category of the framework.
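The Mann-Whitney U test compares two independent groups (for example, free versus paid apps) through the ranks of the pooled observations. The Python sketch below uses the large-sample normal approximation with a correction for ties and without a continuity correction. Statistical packages may instead report exact p-values or apply a continuity correction, so p-values can differ slightly; the software used in the study is not stated here, and the example data are toy values.

```python
import math


def average_ranks(values):
    """1-based ranks in which tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of tied values as far as it goes.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p), where U is the statistic for sample x.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if variance <= 0:
        return u, 1.0  # all values tied: no evidence of a difference
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


# Toy example: every value in the first group is below every value in the second.
u, p = mann_whitney_u([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(u, p < 0.05)  # 0.0 True
```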

3 Results

In this section, we elaborate on the results of reviewing the 169 apps that remained after careful screening and selection.

3.1 Overview of included apps

Of the 169 apps, 39 apps were found only in the Google Play Store, 38 apps only in the iTunes App Store, and 92 apps in both app stores. Of the apps selected from the Google Play Store, 34 were free and 5 were paid, with an average price of €2.48 (range €0.76–€3.39). In the iTunes App Store, 35 of the apps found were free and 3 were paid, with an average price of €1.66 (range €0.99–€2.99). Of the 92 apps found in both app stores, 83 apps were free and 9 were paid, with an average price of approximately €2.48 (range €0.99–€2.99).

3.2 Number of features per app

First, we investigated how many features are generally included in the apps. The items in category 6 (usability) and category 7 (other) were not considered in this analysis, since they do not represent technological features. Also, if an item was divided into several subitems, only these subitems were considered in counting the number of features, to avoid counting the same feature twice. The framework contains 37 such subitems.
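The counting rule can be sketched as follows, assuming (hypothetically) that each app’s consensus scores are stored per leaf-level feature together with its category. Only scores from categories 1 to 5 are summed, half scores of 0.5 are kept as they are, and the total can be expressed relative to the 37 possible features. Feature names are illustrative only.

```python
# Feature categories 1-5 of the framework; categories 6 (usability) and
# 7 (other) are not counted as technological features.
FEATURE_CATEGORIES = {
    "measuring_monitoring", "information_analysis",
    "support_feedback", "adaptation", "social",
}
N_FEATURES = 37  # leaf-level features (subitems) in categories 1-5


def feature_count(leaf_scores):
    """Sum leaf-level scores (0, 0.5 or 1) over the feature categories only."""
    return sum(
        score
        for (category, _feature), score in leaf_scores.items()
        if category in FEATURE_CATEGORIES
    )


def feature_share(count):
    """Express a feature count as a percentage of the 37 possible features."""
    return 100 * count / N_FEATURES


# Hypothetical app; the feature names do not correspond to the Appendix items.
app = {
    ("measuring_monitoring", "manual_activity_log"): 1,
    ("measuring_monitoring", "gps"): 0.5,        # unresolved disagreement
    ("social", "share_workout"): 1,
    ("adaptation", "auto_adjust_goal"): 0,
    ("usability", "clarity"): 4,                 # 1-5 rating, not a feature
}
print(feature_count(app))          # 2.5
print(round(feature_share(19.5)))  # 53
```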

The average number of features included in the eligible apps was 8.18 (range 0.5–19.5).

Table 1 shows the 18 apps with the highest numbers of features. The app with the highest score was Endomondo, with 19.5 out of the possible 37 features (53%). Endomondo is available for free in both the Google Play Store and the iTunes App Store.

Table 1 Overview of apps with highest number of features

Table 2 shows the 16 apps with the lowest numbers of features. The app with the lowest score was 7 Minute Workout by mphan, with 0.5 out of the possible 37 features (1.4%). 7 Minute Workout is available for free in the Google Play Store.

Table 2 Overview of apps with lowest number of features

3.3 Number of apps per feature

Second, we investigated which features and categories of features were implemented most often. Figure 2 shows how often the five categories of features were implemented in the 169 apps. Clearly, the categories Measuring and Monitoring, Information and Analysis, and Support and Feedback were well represented, with 141 to 161 apps (approximately 83 to 95%) that include at least one of the features from that category. Social features were implemented in 121 of the 169 apps (72%). However, adaptation was part of only 7 out of the 169 apps (4%).

Fig. 2 Frequencies of the five categories of features implemented in the apps

The next five figures show in more detail how often the features in these five categories were applied in the set of 169 apps, ordered from most to least frequently implemented. Figure 3 shows that many apps made use of user input and a reasonable number incorporated data from a built-in sensor, but external sources were used very rarely. In Fig. 4, we see that the user’s data was usually summarized textually or numerically and less often in a visual format. Figure 5 shows that real-time feedback was usually offered as audio, whereas feedback on the user’s overall progress was usually presented in textual format. The type of feedback used most often was generic advice or tips about physical activity, whereas gamification, punishment, and context-aware feedback were very rare among the apps. Figure 5 also shows that users more often set their own goals than received a personal goal chosen by the app. In Fig. 6, we see that very few apps incorporated any form of adaptation to the user. Figure 7 indicates that sharing workouts or achievements among users was quite common, whereas other social features (e.g., ranking or competition among users) were much less prevalent.

Fig. 3 Frequencies of the Measuring and Monitoring features implemented in the apps

Fig. 4 Frequencies of the Information and Analysis features implemented in the apps

Fig. 5 Frequencies of the Support and Feedback features implemented in the apps

Fig. 6 Frequencies of the Adaptation features implemented in the apps

Fig. 7 Frequencies of the Social features implemented in the apps

3.4 Differences between types of apps

The average number of features implemented in free apps was 8.26 out of 37, and paid apps contained 7.63 features on average. Using a two-tailed Mann-Whitney U test, we determined that this difference was not significant (p = 0.64). The free apps received an average rating of 3.32 out of 5.0 for usability, whereas the paid apps were marked with an average rating of 3.23 out of 5.0. This difference was again not significant (p = 0.66).

The average number of features implemented in apps that were only available in the Google Play Store was 6.95, and apps that were only found in the iTunes App Store contained 8.83 features on average. This difference was not significant (p = 0.07). The apps from the Google Play Store were marked with an average rating of 3.23 out of 5.0, whereas the apps from the iTunes App Store received an average rating of 3.32 out of 5.0. The difference was again not significant (p = 0.56).

3.5 Correlation between number of features and usability rating

The number of features implemented in the eligible apps ranged from 0.5 to 19.5. The reviewers’ ratings for the two usability items (clarity and attractiveness) both spanned the entire range from 1.0 to 5.0, with averages of 3.5 and 3.1, respectively. Using Spearman’s rank correlation, we found no significant correlation between the number of features implemented in an app and its rating for clarity (r_s = 0.085, p = 0.138). However, there was a moderate positive correlation between the number of features and the rating for attractiveness, which was statistically significant (r_s = 0.477, p < 0.001).
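For reference, Spearman’s rank correlation is the Pearson correlation computed on ranks, with tied values (common for 1 to 5 ratings and half-point feature counts) given their average rank. The Python sketch below implements only the coefficient; it does not reproduce the study’s data or software, and the example values are toy numbers.

```python
import math


def average_ranks(values):
    """1-based ranks in which tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: the Pearson correlation of the two rank vectors.

    Undefined (division by zero) if either variable is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("expected two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy data (not the study data): feature counts and attractiveness ratings.
features = [0.5, 3, 3, 8, 19.5]
attractiveness = [1, 2, 3, 3, 5]
print(round(spearman_rho(features, attractiveness), 2))  # 0.92
```

A p-value is commonly obtained from t = r_s * sqrt((n - 2) / (1 - r_s^2)) with n - 2 degrees of freedom; that step is omitted here because the Python standard library has no t distribution.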

4 Discussion

The current review was designed to investigate the use of technological features in apps available through the Google Play Store and the iTunes App Store that aim to promote a physically active lifestyle. To do so, a framework of techniques and functionalities that can be used to monitor or encourage physical activity was constructed. This framework organized the features into five categories, so that the apps could also be evaluated at a higher level of abstraction. Two additional categories were added to the framework to collect additional information about the apps.

The 169 apps included in the review implemented approximately 8 of the 37 features in the framework on average. The highest number of features found in an app was 19.5 and the lowest was 0.5; disagreements between reviewers on the presence of a feature were reflected by a score of 0.5. The features that were implemented most often were user input (to log activities or to form a personal profile), a textual/numerical overview of the user’s behavior and progress, sharing achievements or workouts in internal or external social networks, and general advice on physical activity. The features that were identified least often were adaptation, integration with external sources, and encouragement through gamification, some form of punishment, or the possibility to contact an expert through the app. No significant differences were found between apps from the Google Play Store and the iTunes App Store, or between free and paid apps. There was no significant correlation between the number of features implemented in an app and the reviewers’ rating for its clarity, which suggests that the ease of finding information in an app is not compromised by the number of implemented features. However, there was a moderate positive correlation between the number of features and the rating for attractiveness. This suggests that relatively simple apps (in terms of the number of implemented features) are generally less visually appealing than apps with more implemented features. Although the ratings are based on the subjective evaluation of only two reviewers, these results could indicate that more sophisticated apps (that are equipped with more features) are also developed with more care for their visual design.

The results demonstrate that some features or categories of features are applied quite often, whereas other functionalities are almost never implemented. Examples of features that were rarely present in the reviewed apps are integration with information from external sources, such as the user’s calendar or the local weather forecast, and adaptation to the user’s behavior. These rarely implemented functionalities could strengthen the impression of the app as an intelligent virtual personal coach, since they imply a better understanding of the user’s personal context and progress. This is in line with the wishes and expectations of users regarding physical activity apps [23] and therefore suggests an important area for improving physical activity apps. Although we did not investigate why certain features were or were not implemented, a plausible hypothesis is that this is related to the technical and conceptual complexity of the implementation. Future work could provide more insight into this question.

The interpretation of the apps’ scores in terms of implemented technological features depends on the research question under consideration. For example, one could investigate whether the presence of specific features is related to positive user experiences (e.g., as provided via the ratings in the app store), or whether some features are correlated with the effectiveness of the app. Up to now, the effectiveness of such features in physical activity apps is unknown. Therefore, it is not (yet) possible to argue which apps are more effective than others based on the implemented features. A plausible hypothesis is that elements from each category are necessary to create a versatile and complete app. Also, some subitems seem superior to others: for example, automatic registration of physical activity through (built-in) sensors is usually more user-friendly than manual input of activities. However, such evaluations also depend heavily on user preferences, the robustness of (the implementation of) the technology, and the objectives of an app, and are therefore difficult to claim on a global level. Nonetheless, the proposed framework provides a valuable tool for such more specifically motivated evaluations of apps.

One of the limitations of this study—and this type of research in general—is that it provides a snapshot of the landscape at a certain time point. This means that the results that are valid now could be different after some time. In addition, because the number of health and fitness apps has grown to unmanageable numbers, a search strategy is necessary to find a selection of apps to review. This inevitably implies that not all apps can be covered, and certain eligible apps can be missed. Another limitation is that the review depends on the visibility of the features: if certain features are used in the background, they can be missed by the reviewers. Also, the results are susceptible to the reviewers’ interpretation. This is reflected in the “half” features in the scores and the imperfect inter-rater reliability. In future work, interviews with app developers could reveal in more detail which features they did or did not implement, and for what reason.

Among the strengths of the present review are the large number of apps covered in the screening (n = 2517); the relatively large number of apps analyzed (n = 227) and included (n = 169) in the review, which was performed by multiple independent reviewers; and the inclusion of both free and paid apps from the two largest app stores. Moreover, the rating of the apps was done based on downloading and using all functions of the app, rather than considering the app description and screenshots only.

Another contribution of this work is the proposed framework. Although this framework’s relevance is also subject to advances in modern technology, it provides a basis for research on the prevalence of technological features in physical activity apps. In addition, the hierarchical organization of the items in the framework allows for analysis of the apps at different levels of abstraction. Depending on the research question under consideration, one could investigate only the main items or only the subitems, or, for example, focus on coverage across all categories rather than a simple count of the number of implemented features. These different perspectives pave the way for further analysis of the current range of physical activity apps.

The findings of this review are in line with other content analyses of physical activity apps. Although those analyses generally focused on the application of behavior change techniques rather than technological features, they also established that apps generally lack such generic techniques for behavior change [15,16,17,18,19]. Similar to the results of this study, these reviews showed that there is considerable room for improving the content of physical activity apps. Overall, smartphones seem to provide a wide range of possibilities for more intelligent physical activity promotion interventions, but the developers of such apps are not yet taking full advantage of them.

5 Conclusion

This study aimed to investigate the current landscape of smartphone apps that promote physical activity for healthy adults. To this end, we present a framework to rate the extent to which such apps incorporate technological features. It provides a basis for research on the prevalence of technological features in physical activity apps, and the hierarchical organization of its items allows for analysis of the apps at different levels of abstraction. Additionally, we show that the physical activity promotion apps included in the review implement an average of approximately 8 techniques and functions out of the possible 37. The features that were implemented most often were user input, textual/numerical overviews of the user’s behavior and progress, sharing achievements or workouts in social networks, and general advice on physical activity. The features that were identified least often were adaptation, integration with external sources, and encouragement through gamification, some form of punishment, or the possibility to contact an expert. Furthermore, we found no significant correlation between the number of features implemented in an app and the reviewers’ rating for its clarity, but a moderate positive correlation with the rating for the app’s attractiveness. Overall, the results indicate that physical activity apps can be enhanced substantially in terms of their utilization of the possibilities that current mobile technology offers.