Feelings of well-being and happiness play a preventive role in psychopathology (e.g. depression) and are important to overall physical and mental health (Diener et al. 2017; Greenspoon and Saklofske 2001; Howell et al. 2007). Happier and more optimistic people are found to live longer and healthier lives (Kim et al. 2019; Steptoe 2019; Zaninotto and Steptoe 2019). For instance, feeling happy reduces the likelihood of heart disease and leads to better cardiovascular health (Boehm et al. 2012; Davidson et al. 2010). Furthermore, well-being is associated with successful outcomes in life; e.g. happier people are more often married, have better relationships with their partners and are more socially engaged (Lyubomirsky et al. 2005; Moore and Diener 2019). The effects of well-being on health are found to be independent of the negative effects of ill-being on health, underlining the importance of investigating well-being (Howell et al. 2007).

Well-being is defined in multiple ways in the literature, and a distinction is often made between subjective and psychological well-being (Keyes et al. 2002). Briefly, subjective well-being is characterized by high levels of positive affect, low levels of negative affect and a favorable subjective evaluation of life satisfaction (Diener et al. 2018), whereas psychological well-being refers to thriving, positive functioning, and judgments about the meaning and purpose of an individual’s life (Ryff 1989). As subjective and psychological well-being are strongly related (e.g. Baselmans et al. 2019; Joshanloo 2016), in this paper we refer to well-being in the broad sense, including all definitions and constructs.

The majority of studies assessing well-being (WB) make use of questionnaires that participants complete at a single time point or at multiple time points (for reviews of well-being questionnaires and well-being research, see Cooke et al. 2016; Diener et al. 2012; Linton et al. 2016). Well-being questionnaires ask about general well-being or happiness and include, for example, the Satisfaction with Life Scale (Diener et al. 1985), the Cantril ladder (Cantril 1965), and the Subjective Happiness Scale (Lyubomirsky and Lepper 1999). The scores on these well-being measures are found to be relatively stable and reliable over time (e.g. Fujita and Diener 2005; Pavot 2008; Schimmack and Oishi 2005).

However, like many complex human traits, momentary feelings of well-being (e.g. mood) fluctuate over time and across contexts (Eid and Diener 2004; Li et al. 2014; Lyubomirsky 2001). Individuals can have similar well-being scores at different questionnaire waves while the underlying pattern (WB over the day or week) differs substantially. Some people show relatively stable levels of WB over the day or week, while others fluctuate considerably (Eid and Diener 1999; Gadermann and Zumbo 2007). To better understand the relationship between well-being and, for instance, psychopathology or environmental influences, it is important to understand within-person fluctuations of well-being over time. One way to capture the dynamic nature of well-being is by measuring well-being multiple times a day in the natural context and daily life of participants, i.e. using an ecological momentary assessment design.

Investigating the daily experiences and behavior of people in their natural context is not a novel idea. Already in the 1920s, the diary method was used by Favill and Rennick (1924) and Flügel (1925) to collect participants' reports about their symptoms, behavior or mood over several days. In different disciplines (e.g. behavioral medicine and psychology), these methods developed further and are nowadays known under different labels, such as ambulatory assessment (Fahrenberg 1996; Fahrenberg et al. 2007), the experience sampling method (ESM: Csikszentmihalyi and Larson 1987) and ecological momentary assessment (EMA: Stone and Shiffman 1994). The goal of all these assessment methods is to study people in their natural environment, using self-report, observational, biological, physiological and behavioral measures. Later, Kahneman and colleagues (2004) developed the day reconstruction method (DRM), in which participants are asked to reconstruct and describe all experiences and events of the day. See Wilhelm et al. (2012) for a detailed review and historical overview of the development of the different methods. In this review we will use the term ecological momentary assessment (EMA) to describe all ambulatory assessment, ESM and EMA methods, as this term is widely used in well-being research.

Before modern technological developments, EMA studies required participants to carry beepers and booklets of questionnaires. After each randomly timed beep, participants had to complete the questions using pen and paper. In an early study with a small sample, Dysinger (1938) reported individual differences in the average level of mood and the scale of fluctuations. However, averaged across participants, there was no evidence for daily or weekly periodicity in these fluctuations. Later, following technological developments, devices such as personal digital assistants (PDAs) and palmtop computers were introduced to collect EMA data. Data collection became easier and, according to the results of larger EMA studies, on average, happiness varied both throughout the day (e.g. happier in the afternoon compared to the morning) and the week (e.g. happier on a Saturday compared to a Monday) (e.g. Brandstatter 1991; Csikszentmihalyi and Hunter 2003; Zelenski and Larsen 2000). Furthermore, physical, social and leisure activities were associated with higher degrees of happiness, whereas being alone or at work was associated with lower happiness (Csikszentmihalyi and Hunter 2003; Schwerdtfeger et al. 2008), and smoking was unrelated to momentary positive affect (Shiffman et al. 2002).

Leveraging the rapid technological progress of the last few years, EMA researchers have started to replace the older methods with smartphone applications (Runyan and Steinke 2015). Using smartphones can mean a leap forward for EMA research, since smartphones are ubiquitous in our society and data can be collected more easily. However, smartphone-based EMA research also introduces problems and difficulties. In this paper, we first describe smartphone-based EMA studies and their advantages and difficulties. Furthermore, we systematically review the literature on smartphone-based ecological momentary assessment of well-being in healthy participants.

Since the different well-being measures show a strong phenotypic and genetic overlap and the field of smartphone-based EMA research is relatively new, we will include all measures of well-being (e.g. happiness, positive affect, life satisfaction, quality of life). Specific research questions addressed in our review are: Which designs (context, schedule, sampling, WB measure, applications, smartphones and statistical analyses) are used in smartphone-based EMA studies of well-being? To what extent are objective data, such as GPS or accelerometer data, included in smartphone-based EMA well-being research? Are the response rate and compliance in smartphone-based EMA studies related to the design? What are the results of smartphone-based EMA studies with respect to well-being? Finally, what are the limitations of current smartphone-based EMA research? Based on guidelines proposed by Liao et al. (2016), we describe the studies in the areas of (1) sampling and measures, (2) schedule, (3) objective data, (4) technology and administration, (5) prompting strategy, and (6) response and compliance. Additionally, we (7) describe the analyses, (8) summarize the findings and (9) report limitations and risk of bias in the reviewed studies. In addition, as smartphone-based EMA designs are comparable to studies using other devices to collect EMA data, such as palmtop computers or PDAs, we compare the designs and findings of smartphone-based and other device-based EMA studies. Lastly, we provide guidelines for the use of (smartphone-based) EMA in future well-being research based on our findings.

1 Smartphone-Based EMA Studies

Recently, more researchers have been applying EMA designs in their studies, since the development of smartphones and applications facilitates the use of such designs. Nowadays, smartphone research is feasible for widespread use, especially in the more economically developed countries. The number of mobile phone users worldwide is approaching 5.2 billion, with 3.5 billion people owning a smartphone, i.e. more than one third of the world population (GSMA intelligence, 2019). In the Western world and the USA, more than 70% of the population already owns a smartphone. Furthermore, developing tailored EMA applications is becoming easier, as more knowledge and app-building software becomes available.

The defining feature of EMA studies is the design in which daily behavior and experiences are assessed multiple times per day. EMA data have a high ecological validity, since information is collected in the moment of the experience and memory or recall biases are reduced (Schwarz 2007; Scollon et al. 2003). Different forms of sampling can be used (Shiffman et al. 2008). In time-contingent sampling designs, participants are signaled to answer questions at fixed times or at random times within a predefined time frame. In interval-contingent sampling designs, participants are prompted to fill in questions after a time interval has passed. In event-contingent sampling designs, participants are requested to complete questions when a predefined event happens. The latter can be subjective and initiated by the participants themselves (e.g. when drinking alcohol) or objective (e.g. at a specific GPS location or a specific level of physical activity). Mixed designs combine event and time or interval sampling (e.g. to prevent too few data points).
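To make these sampling schemes concrete, the sketch below generates prompt times for one day under each strategy. It is an illustration only, assuming a 09:00–21:00 sampling window, five prompts per day and a hypothetical step-count trigger; it does not correspond to any reviewed application.

```python
import random
from datetime import datetime, timedelta

DAY_START, DAY_END, N_PROMPTS = 9, 21, 5  # assumed 09:00-21:00 window, 5 prompts/day

def time_contingent(day: datetime) -> list:
    """Time-contingent (random) sampling: N prompts at random minutes in the window."""
    window_minutes = (DAY_END - DAY_START) * 60
    offsets = sorted(random.sample(range(window_minutes), N_PROMPTS))
    start = day.replace(hour=DAY_START, minute=0, second=0, microsecond=0)
    return [start + timedelta(minutes=m) for m in offsets]

def interval_contingent(day: datetime) -> list:
    """Interval-contingent sampling: one prompt after each fixed time interval."""
    step = (DAY_END - DAY_START) * 60 // N_PROMPTS
    start = day.replace(hour=DAY_START, minute=0, second=0, microsecond=0)
    return [start + timedelta(minutes=step * k) for k in range(N_PROMPTS)]

def event_contingent(step_count: int, threshold: int = 1000) -> bool:
    """Event-contingent trigger: prompt when a (hypothetical) activity threshold is passed."""
    return step_count >= threshold

today = datetime.now()
print("time-contingent:", [t.strftime("%H:%M") for t in time_contingent(today)])
print("interval-contingent:", [t.strftime("%H:%M") for t in interval_contingent(today)])
print("prompt after 1200 steps?", event_contingent(1200))
```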

1.1 Advantages and Difficulties

Integrating smartphone applications into (EMA) research has multiple advantages compared to pen-and-paper methods and older EMA devices, such as PDAs (García et al. 2016). First, whereas pen-and-paper methods could not measure compliance with the scheduled assessment times, response logging is automated in applications, resulting in precise response time information. Second, large nation- or worldwide samples, even in more remote areas, can be reached with EMA applications, as researchers no longer need to provide participants with PDAs or palmtop computers. The application can also be used longitudinally. Third, EMA applications can be designed and optimized more easily according to the research question. Relatedly, several smartphone sensors can be used to measure passive data continuously, for example location, light and noise levels, accelerometer and gyroscope data, and phone use. Adding the data of these mobile sensors to a smartphone-based EMA study is relatively easy and valuable. For example, instead of asking participants to report their physical activity or phone use, we can infer this from the continuous stream of accelerometer and screen use data, resulting in objective and rich data.

The nature of EMA data (i.e. real-time and real-life data) makes it possible to accurately capture the variability of subjective experiences and to detect and discover patterns that are missed when using a sum or average score. For example, in the field of physical activity, an accelerometer study by Chinapaw et al. (2019) suggests that not the volume, but the individual pattern of accumulated physical activity is important in relation to health. Similarly for EMA data, using advanced analyses (e.g. time-series analyses) can lead to new insights into patterns and fluctuations. EMA data are powerful for making inferences about an individual instead of the (often non-existent) average person. Since subjective experiences are measured multiple times a day, the intra-individual variability can be assessed and related to other variables. For example, in a small sample of depressed and non-depressed participants, Stavrakakis et al. (2015) measured physical activity and affect multiple times per day and investigated the fluctuations for each participant. Individual differences in both the daily pattern and the strength and direction of the relationship between physical activity and affect were found, indicating important inter-individual variability. Furthermore, Wichers et al. (2016) provided evidence for a complex dynamic pattern and network of mental states underlying depression using extensive EMA data in one participant.
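As an illustration of such person-level analysis, the sketch below computes the correlation between momentary affect and physical activity separately for each participant, in the spirit of Stavrakakis et al. (2015). The long data format and column names are assumptions for the example, not the original analysis.

```python
import pandas as pd

# Assumed long-format EMA data: one row per completed prompt, with columns
# "pid" (participant), "affect" (momentary well-being) and "activity".
ema = pd.DataFrame({
    "pid":      [1, 1, 1, 1, 2, 2, 2, 2],
    "affect":   [6, 7, 5, 8, 4, 5, 6, 3],
    "activity": [10, 30, 5, 40, 25, 10, 35, 5],
})

# Within-person correlations: the direction and strength of the
# affect-activity relation is estimated for each individual separately.
per_person_r = ema.groupby("pid").apply(lambda d: d["affect"].corr(d["activity"]))

print(per_person_r)          # one r per participant; signs and sizes may differ
print(per_person_r.mean())   # a single average would hide these differences
```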

Designing smartphone-based EMA studies can also lead to difficulties. The main problems are issues with (personal) privacy, the intrusive design and data storage (Trull and Ebner-Priemer 2013). Especially when combining EMA with passive sensor data, privacy and data storage are important, as large amounts of personal data are collected. Passive data contain highly sensitive and possibly identifying personal information, such as GPS location. The question is whether such data can be fully anonymized, as combining data of multiple passive measures might lead to identifying information about participants. For example, Brownstein et al. (2006) could identify the home locations of 79% of the participants in a study based on a published image map. This clearly violates the privacy of participants. Researchers should put effort into anonymizing data, for example by using relative rather than absolute location data to protect privacy. Next, multiple assessments can result in a high daily burden for participants in smartphone-based EMA studies, leading to increased dropout and decreased response rates (Shiffman et al. 2008). Relatedly, most studies do not pay participants for their participation. The trade-off between participating and the individual benefit might be unclear to participants, leading to less willingness to participate. Lastly, the multiple assessments and passive data collection can affect the battery life of (older) smartphones, interfering with daily life or excluding part of the population.

2 Methods

2.1 Systematic Review of Smartphone-Based EMA Studies

2.1.1 Eligibility Criteria

To review the literature on smartphone-based ecological momentary assessment of well-being, a systematic review was conducted and reported in accordance with the Preferred Reporting Items for Systematic Review and Meta-Analysis (PRISMA) guidelines (Moher et al. 2009). Titles and abstracts of collected articles were screened for eligibility, and articles were included if (1) (subjective) well-being, happiness, or positive affect/mood was assessed using ecological momentary assessment (EMA was defined as at least two assessments per day for a number of days in natural settings), (2) a smartphone (application) was used, and (3) healthy participants were included (adults or adolescents).

2.1.2 Information Source and Search Strategy

In September 2019, with an update in August 2020, the search for relevant articles was conducted in the bibliographic databases PubMed and Web of Science. Additional articles that were missed during this search were identified via the reference lists of the selected articles. The search strategy included combinations of search terms related to (1) ecological momentary assessment, (2) well-being/wellbeing and (3) smartphone research (see Table 1). The search applied iterative combinations of these categories by employing the Boolean search operators AND (horizontal) and OR (vertical).

Table 1 Search terms, the search applied iterative combinations of these terms by employing the Boolean search operators AND (horizontal) and OR (vertical)
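To illustrate this strategy, the snippet below assembles such a query from three term lists, with OR within each category and AND between categories. The terms shown here are illustrative placeholders; the actual search terms are those listed in Table 1.

```python
# Illustrative only: OR combines terms within a category (vertical),
# AND combines the three categories (horizontal).
ema_terms   = ['"ecological momentary assessment"', '"experience sampling"']
wb_terms    = ['well-being', 'wellbeing', 'happiness', '"positive affect"']
phone_terms = ['smartphone', '"mobile phone"', 'app']

query = " AND ".join(
    "(" + " OR ".join(terms) + ")"
    for terms in (ema_terms, wb_terms, phone_terms)
)
print(query)
```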

2.1.3 Study Selection and Data Extraction

A PRISMA flow diagram of the study selection process is presented in Fig. 1. All identified titles and abstracts were screened for eligibility. In cases of insufficient information to determine eligibility, papers were subjected to further screening. The first author screened the full-text reports and decided whether papers met the inclusion criteria. Uncertainties and disagreements were resolved through discussion. Articles were excluded if they met one of the following criteria: (1) no smartphone-based EMA of well-being; (2) non-healthy participants; (3) review papers; (4) descriptions of planned studies or methodological papers; or (5) studies that used smartphone applications only for an intervention instead of data collection.

Fig. 1 PRISMA flow diagram of the included studies

2.1.4 Other EMA Data Collection Devices

To put the findings of smartphone-based EMA studies in perspective, we conducted an additional systematic review of EMA studies using other data collection devices, such as palmtops or PDAs, with a similar search strategy as described above (see supplementary data Sheet 2 for the keywords, search details and PRISMA flow diagram). We compared the designs and results of the two groups of studies.

3 Results

3.1 Study Selection and Characteristics

The initial electronic database search resulted in 368 hits in PubMed and 63 hits in Web of Science. After removing duplicates, 396 original articles remained. After screening the titles and abstracts of those articles against the selection criteria, 56 articles remained. These articles were examined and read in full. In addition, we included 19 additional articles based on references. After excluding 29 articles based on the full-text reading, 46 articles met our selection criteria and were included in the study. In the update of August 2020, we added 7 new studies, resulting in a total of 53 studies (see Fig. 1).

Tables 2 and 3 provide an overview of the included studies, the samples and the design characteristics (see supplementary file S1 for more detailed characteristics of the reviewed studies). The first smartphone-based EMA study was published in 2010 and the number of studies published increased exponentially over the years, with 38 of the 53 studies published in 2017–2020. Most studies were conducted in Germany (k = 10), the UK (k = 8), the Netherlands (k = 9), USA (k = 8), Australia (k = 5) and South Korea (k = 2). Single studies were conducted in Canada, Israel, China, Japan, and Switzerland. The remaining studies (k = 6) used participant samples from multiple countries.

(1) Sampling and measures

The median sample size of the 53 studies was 97 participants. Sample sizes varied substantially, ranging from 15 to 31,302 participants, with a mean of 2212.15 participants (SD = 6687.55). A few studies included samples of over 1000 participants, e.g. the UK-wide and freely available Mappiness app (studies 3, 12, 13, 18, 44 in Table 2) and the worldwide Track Your Happiness app (study 1).

The mean age of the participants was available for 41 studies and ranged from 16.0 to 60.1 years. Across all studies, the mean age was 30.8 (SD = 10.5). The proportion of females ranged from 5.2 to 100%, with a mean of 57.4%. One study included adolescents (36), whereas the rest included adult participants. In 18 studies (34%), participants were only or mostly students. In four studies, participants were employees (17, 34, 38, 48) and two studies included only women (25, 53) (see Table 2 for a full overview of the sample characteristics).

Most studies (k = 30, 56.6%) assessed momentary happiness with a single question (variations on “To what extent are you feeling happy?”) using different scales, ranging from 0/1 (not at all) to 5/7/9/10/100 (very/strongly). Five studies used the 6-item Short Mood Scale or Multidimensional Mood Questionnaire (valence, calmness and energetic arousal) (2, 5, 9, 32, 38), five studies used positive affect items from the PANAS (e.g. relaxed, excited, energetic) (33, 36, 37, 42, 52), and three studies used a grid of valence (positive/negative) and arousal (8, 11, 19). One study used the 14-item Warwick-Edinburgh Mental Well-Being Scale (26). Eight studies used other measures, either a qualitative measure (“Select the emotions you have experienced in the last few hours.”) (6, 25, 35) or a combination of other positive mood adjectives (e.g. relaxed, calm) (29, 34, 40, 41, 43).

The 53 studies measured well-being in different contexts. Momentary well-being in relation to natural environmental variables and contexts (e.g. nature, daylight, urban areas) was assessed in 11 studies (3, 6, 18, 21, 26, 27, 28, 36, 40, 44, 50). Well-being was assessed in relation to physical activity and/or sedentary behavior (k = 9) (2, 5, 9, 19, 25, 38, 43, 48, 53), cognitive processes (e.g. visual search, emotion detection) (30, 31, 32), health (20, 22, 23), alcohol (13, 37, 42), mind wandering (1, 4), food (24, 33), work (12, 17, 51, 52), and sleep (16, 46). Two studies focused on fluctuations of well-being in daily life (34, 45). Single studies measured momentary well-being in the context of homesickness (41), phone use (49), immigrants (14), soccer (10), music (8), fitness exposure (39), transport (47) and social interactions (7). The goal of three studies was to predict momentary mood based on objective data and compare the predictions with EMA data (11, 29, 35). Finally, one study compared different sampling strategies to assess the effect of context on mood (15) (see Table 3).

(2) Use of objective data

Of the 53 studies, 22 used only self-reported data, whereas 31 studies (58.5%) included objective, passively measured data. Ten studies used an accelerometer to measure movements (2, 5, 9, 16, 19, 25, 36, 38, 48, 53). Eight studies used GPS location data of the smartphone to investigate well-being in the (natural) environment (3, 12, 13, 15, 18, 26, 44, 45). Four studies used a combination of GPS and accelerometer data (6, 21, 47, 50). Single studies assessed phone use (49) or heart rate variability (52). Single studies also used an exposimeter measuring radiofrequency electromagnetic fields (23) or a light sensor on the clothes of participants to sample the amount of daylight every minute (40). Furthermore, five studies used multiple objective measures, such as a combination of GPS data, time of day, temperature, and neighborhood information (28) or GPS, accelerometer, light, microphone, calls, texts, and Wi-Fi (46). The studies predicting mood used objective measures of mobile phone use (call events, screen use, application use and mobile camera use) (11) or a wide range of (body) sensing data, such as accelerometer data, heart rate, light level, Wi-Fi signals and GPS location (29, 35).

(3) Schedule

The majority of studies (k = 43, 81.1%) monitored participants during one specified wave of data collection. The study duration ranged from 1 to 56 uninterrupted days, with a mean of 12.8 days (SD = 12.8) and a median of 7 days. In eight studies the duration was up to the participant (1, 3, 12, 13, 18, 19, 32, 44). Two studies used multiple waves of data collection (7, 28), with 3 times 21 days and 4 times 7 days (each time a different week of the month), respectively.

The number of questions per EMA ranged from 1 to 57, with a mean of 13.1 (SD = 11.8). The average completion time, based on the reports of 13 studies, was somewhat more than 1.5 min (104.9 s, SD = 50.5 s).

(4) Technology and administration

In most studies (k = 33) participants could use their own smartphone in the study, either an Android smartphone (k = 16), iPhone (k = 9) or both (k = 8). In 13 studies, research smartphones with the application installed were provided to the participants. The remaining studies (k = 7) did not specify what type of smartphones or operating systems were used.

Applications were specially developed to answer the research question and included research-specific functions. For example, to measure happiness in relation to food, the SnackImpuls application has an additional feature to quickly categorize and select food choices (33), and the MuPsych app plays music (8). The movisensXS software was used by seven different research teams to create an adapted EMA application (2, 5, 15, 24, 38/48, 40, 52). The (data of the) Mappiness app (k = 5) (3, 12, 13, 18, 44), the Well-being Science app (k = 2) (10, 45), the SnackImpuls app (k = 2) (33, 37), the CalFit app (21, 50) and a specific movisensXS app (38, 48) were used in multiple publications to answer different research questions.

Excluding applications used in multiple publications, 45 publications used unique applications, and 25 of those 45 studies included objective data in addition to self-report to answer the research question. In 13 of the 25 applications (52.0%), the objective data collection (e.g. GPS and accelerometer data) was integrated in the application, using smartphone sensors. In the other studies, either a different application was developed to collect the objective data (k = 2) (6, 11), an additional accelerometer worn on the arm, chest or hip was provided (k = 8) (2, 5, 9, 25, 36, 38, 40, 53), or an additional exposure meter (23) or heart rate monitor (52) was used.

(5) Prompting strategy

A time- or interval-contingent design was the most common sampling form (k = 42, 79.2%). Participants received on average 5.0 prompts per day (SD = 2.9) at random times on their smartphones, with a range of 2–12 times. Seven studies used an event-contingent design. In four of those studies participants had to initiate the EMA themselves, after eating (24), a trip (47) or a social interaction (7, 41). In the other studies participants were prompted with a questionnaire when an accelerometer measured physical activity that surpassed or fell below a predefined activity threshold (9), when they started to listen to music (8) or based on the exposimeter data (23).

Three studies used a mixed design (28, 38, 8), combining time- and event-contingent sampling based on physical activity or location. A mixed design was used to prevent collecting too few data points when not enough events occurred. Lastly, one study (15) compared four different sampling designs: based on time; based on combined time and distance (prompt when moving); based on location (prompt when moving to a new location); and based on land use and population density (prompt when moving to a location with a different type of land use or a different population density). Location-based sampling resulted in fewer prompts compared to time-based sampling, but in more triggers for unique locations and a greater spatial spread.

In 41 of the 53 studies (77.4%), participants were prompted with a buzz or auditory signal of the smartphone application. Three studies used text messages to remind the participants to open the app and answer questions (10, 22, 45) and one study used a smartwatch to signal participants (29). In the remaining eight studies, the participants had to initiate the questionnaire themselves. Participants had to start a questionnaire after a predefined event in five event-based sampling studies (7, 8, 24, 41, 47). In the other three studies, participants had to initiate an EMA in the morning and the evening (21, 50) or within three predefined timeslots (31).

(6) Response and Compliance

Only 25 studies (47.2%) reported information on compliance with the EMA design. Additionally, seven studies provided enough information to calculate the compliance rate. Based on these 32 studies, participants completed on average 71.6% (SD = 14.1%) of all EMAs, with a range of 43–95%.

Five studies reported a relation between compliance and another participant or study variable. Higher compliance was found in the first three days compared to the fourth until seventh day (37) and during weekends compared to weekdays (42). In study 25, no differences in compliance levels for assessments over the day were found, whereas in study 20 surveys were most often responded to between 12 and 2 pm. Also, participants were found to be most often at home, alone, or involved in work/study when responding (20). In study 42, EMA compliance was unrelated to age, gender and other personal characteristics.

Based on 9 of the 13 studies with a research phone that reported compliance, the average completion rate was 76.4% (SD = 8.0%). This is not significantly different from studies where participants could use their own smartphone (70.6%, SD = 16.0%, based on 23 studies), p = 0.294.

Compliance levels did not differ between studies in which participants received incentives (k = 14; compliance based on reports of 9 studies: 72.1%) and studies in which participants did not receive incentives (k = 30; compliance based on 19 studies: 69%), p = 0.614. Participants in the remaining 9 studies received course credit, participated in a lottery or received a gift (compliance based on 4 reports: 82%). This compliance also does not differ from the compliance in the paid or non-paid studies (p = 0.168 and p = 0.127).

Studies with the highest compliance rates (> 84%) tended to last either 7 or 14 days with 3–6 prompts per day. Furthermore, studies with a relatively long duration tended to have lower compliance, as 30 days of sampling (22) resulted in a compliance rate of 50% and 21 uninterrupted days of sampling in a rate of 43% (42). However, based on the reports of 27 studies, there is no significant correlation between the reported compliance rates and the duration of the study (r = − 0.144, p = 0.458) or the prompts per day (r = 0.243, p = 0.204).

Besides overall compliance, the timing of responses to prompts is important in EMA studies. This is essential information, since the validity of EMA studies is based on momentary experiences, i.e. the questions should be answered in the moment. Whereas a number of studies reported a limited time (90 s–1 h) to respond to a prompt, only three studies actually reported the average latency between prompt and answer. The assessments were completed within 38 min (51), 11.6 min (22) or 28 s (23). In addition, study 20 reported that 66.5% (n = 260) of the EMAs were responded to within a minute, and only 2 took more than 5 min.

Most studies (k = 43) recruited a target sample, whereas in 10 studies anonymous participants downloaded a freely available and widely advertised application (1, 3, 12, 13, 18, 19, 22, 26, 32, 44). Both recruitment methods lead to participant dropout during the study, resulting in a difference between the number of participants in the initial enrollment sample and the analytical sample. Thirty studies reported this attrition rate; the average dropout rate was 17.1% (SD = 20.8%), with a range of 0–96.8%. Based on the reports of 25 and 32 studies respectively, the attrition rate was not related to study duration (r = 0.106, p = 0.550) or the number of prompts per day (r = 0.233, p = 0.241).

(7) Analyses

Most studies (k = 40, 75.5%) analyzed the data using multilevel or mixed models (fixed and random effects), taking into account the nested nature of EMA data. EMA data consist of repeated measurements of participants over multiple days; therefore, the analysis needs to adjust for individual effects. However, five studies only looked at group differences and did not analyze differences at the individual level (10, 14, 24, 30, 31). In addition, two studies computed the correlation between well-being and the other measure separately for every participant and averaged these correlations (5, 8). One study analyzed the data at the response level instead of the participant level (20). Finally, two studies did not perform any data analysis (6, 15) and three studies only provided a summary and descriptive data to show the feasibility of smartphone-based EMA (17, 23, 34).
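For illustration, a minimal multilevel specification of this kind, fit with statsmodels on simulated data, is sketched below. The variable names, effect sizes and random-slope structure are assumptions for the example, not a model taken from any reviewed study.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated long-format EMA data: 50 participants x 35 prompts (e.g. 5/day x 7 days).
n, t = 50, 35
pid = np.repeat(np.arange(n), t)
activity = rng.normal(size=n * t)                      # momentary predictor
u = rng.normal(0, 1.0, size=n)[pid]                    # person-specific intercepts
b = (0.3 + rng.normal(0, 0.2, size=n))[pid]            # person-specific slopes
wb = 5 + b * activity + u + rng.normal(0, 1.0, n * t)  # momentary well-being
df = pd.DataFrame({"pid": pid, "activity": activity, "wb": wb})

# Random intercept and random slope for activity: the fixed effect estimates the
# average within-person relation, the random effects capture individual
# differences around it, respecting the nested structure of the EMA data.
model = smf.mixedlm("wb ~ activity", df, groups=df["pid"], re_formula="~activity")
print(model.fit().summary())
```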

(8) Results of the studies

Table 4 describes the results of the reviewed studies, grouped by context, with examples of statistical results of interest. To briefly summarize, momentary well-being fluctuated daily and weekly, with on average higher well-being in the evening and in the weekend. Yet, both daily and weekly fluctuations often disappear when type of place, physical activity or other environmental variables are controlled for. On average, being in a natural environment in daily life (e.g. walking in the park) and physical activity over the day are associated with positive affect and higher well-being. Working was ranked lowest in happiness levels, but employees' happiness does fluctuate depending on the task, the place of work (office, home or elsewhere), whether they are alone or with others, the time of day or night worked, and personal characteristics (e.g. lower WB is associated with working when married, but higher WB when having children). Work stress is negatively related to positive affect.

The effects of mind wandering on positive affect are inconclusive, with both a negative relation and no relation reported. Eating is related to positive affect, with stronger effects for vegetable consumption and snacks. Drinking alcohol is related to higher momentary positive affect on average, but this increased well-being does not last or spill over to other moments. Sleep and well-being were related, with a stronger effect of sleep on mood than of mood on sleep quality. Furthermore, on average, positive affect was associated with faster visual search reaction times, but not with other cognitive measures. Focusing attention on well-being by completing multiple questionnaires about well-being for a few weeks does increase well-being. One study investigated why internal migrants report lower levels of happiness than locals, even after accounting for socio-economic factors. EMA showed that migrants spend less time on happiness-producing activities, such as active leisure and social parties, than locals. Furthermore, single studies show that exposure to fitness inspiration online and social media use is related to lower happiness, whereas listening to music and watching soccer are related to higher momentary happiness on average. Walking and cycling are better for mood than sitting in a bus or car. Lastly, the studies predicting mood and well-being based on objective data, such as phone use, were successful in 55% to 76% of their mood predictions. However, other models, for example based on the mean mood state, performed better than the personalized models using objective data. See Table 4 and the supplementary data for more detailed results and the corresponding studies.

(9) Limitations and risk of bias

Table 2 Characteristics of the samples included in the 53 reviewed studies
Table 3 Design characteristics of the 53 included Ecological Momentary Assessment studies of well-being
Table 4 Summary of the results of the 53 reviewed studies and examples of estimated effects of interest on mood or well-being, sorted by study context

Using the checklists for EMA studies from Liao et al. (2016) and van Roekel et al. (2019), we checked the risk of bias across the studies. Regarding the procedure and methods, 49 of the 53 studies reported on the type of smartphone used, all studies reported on the prompt design, study duration, number of assessments per day and incentives provided, and all but one study reported on the total number of items per assessment. Of the 53 studies, 37 reported the attrition rate, i.e. how many participants dropped out of the study. Fourteen of the 53 studies reported the average response time (time lag between prompt and response) of the participants, and, as already noted, the overall compliance is reported in only 25 of the 53 studies, while reasons for noncompliance are often lacking. In a few studies, participants were excluded based on compliance rates. Only three reviewed studies (25, 27, 31) reported a power analysis prior to data collection to determine the sample size needed to reach a power of 0.8. One study performed a post-hoc power analysis (36) and concluded that its power was very low (0.18 instead of 0.8). Many studies lack data on compliance and full details of the design. The reporting of the included EMA studies is therefore often incomplete, which indicates some risk of bias and selective reporting.

An often-reported limitation in the reviewed studies is the limited sample size or limited generalizability to the whole population. Samples in studies that use freely available applications seem to be biased by self-selection. For example, 35 studies reported a bias in their sample, having attracted a younger (k = 10), more highly educated (k = 10), mostly female (k = 21) or mostly male (k = 6) sample compared to the whole population, with 12 studies having multiple biases, affecting the generalizability of the results. In addition, some of the applications (k = 25) were available only for iPhone or only for Android smartphones. Such biased samples might affect the results and outcomes of the studies.

Another frequently reported limitation is that only a small number of environmental variables (e.g. green/urban, noise or light levels, or social environment) and other confounders (e.g. personal characteristics or weather) were controlled for. Since EMA measures behavior and experiences in real life, there is less control over the measurement and its context. Confounders such as the weather, social environment, nutritional status, alcohol consumption, but also personality traits, might have affected well-being instead of the variable of interest.

Some studies (2, 5, 6, 36) reported their sampling strategy as a limitation, since the number of data points and compliance levels were lower than expected. A way to solve this is to trigger questionnaires not only based on time, but also based on variables of interest, such as location or activity. Lastly, another often-reported limitation is the impossibility of determining causality between the variable of interest and happiness with the current designs and available data.

3.2 Findings of Studies Using Other EMA Data Collection Devices

The search for EMA studies that used palmtops, PDAs or other EMA devices to assess well-being multiple times a day resulted in 8 studies that included healthy participants (Dvorak et al. 2018; Elavsky et al. 2016; Ilies et al. 2010; Johnston et al. 2013; Juth et al. 2015; Shiffman et al. 2002; South and Miller 2014; Yip 2005) (see supplementary data Sheet 3 for details of the studies and the results).

To summarize, seven of the eight studies were conducted in the USA, and the remaining study in the United Kingdom. The average number of participants was 142.9 (SD = 85.8), with a range from 62 to 304 participants. The participants were on average 31.7 years old (SD = 13.9) and more females than males were included (68.1%, SD = 20.1). All studies used a measure of positive affect (e.g. the PANAS) in their assessments. Positive affect was assessed in different contexts, namely in relation to smoking, stress, ethnicity, emotional coping, sedentary behavior and blood pressure. Three studies used a personal digital assistant (PDA), four used a palmtop computer and the last study used a handheld computer to deliver the prompts and assessments. The average study duration was 8.5 days (SD = 4.3), with an average of 5.4 prompts per day (SD = 1.4). All studies used time-contingent designs, but in the two smoking studies participants were also asked to answer questions after smoking a cigarette, i.e. event-contingent sampling. Only two of the eight studies added objective data, either using an accelerometer to measure sedentary behavior or a cardiovascular monitor to assess blood pressure. Across the 8 studies, the average compliance rate was 78.4% (SD = 8.3), with a range from 70 to 91%.

Shiffman et al. (2002) found no relation between momentary positive affect and smoking, whereas Dvorak et al. (2018) reported an association between momentary positive affect and momentary smoking. Single studies found a negative association between momentary positive affect and stress and internalizing problems (South and Miller 2014) and sedentary behavior (Elavsky et al. 2016). Other studies found a positive association of positive affect with ethnic salience (Yip 2005), emotional approach coping (Juth et al. 2015), and work effort and demands in nurses (Johnston et al. 2013). Finally, there was no correlation between momentary positive affect and blood pressure (Ilies et al. 2010).

4 Discussion

We performed a systematic review of smartphone-based ecological momentary assessment in well-being research in healthy participants. Using the PRISMA guidelines, we retained 53 studies (out of 398 studies), which we included in the present review. Overall, the study designs were very heterogeneous in nature, with varying sample sizes, questionnaires, phones, study durations (M = 12.8 days, with most studies lasting seven days) and assessment frequencies (range of 2–12 times a day). Additionally, well-being was assessed in relation to different (environmental) variables, from the (natural) environment to physical activity, work and other variables. In addition to self-report data, more than half of the studies included some objective data measured using smartphone sensors or additional meters (e.g. GPS, accelerometer data or telephone use). Less than half of the studies (47.2%) reported the response rate and compliance. Based on this limited information, on average 71.6% of the EMAs were responded to by the participants.

Based on the reviewed studies, we can conclude that momentary well-being fluctuated daily and weekly, with higher well-being in the evening and in the weekend. These fluctuations disappeared when location and activity were included. On average, being in a natural environment and physical activity relate to higher well-being. Working relates to lower well-being, but the workplace and the company of others influence this relation. Besides the limitations mentioned before (e.g. relatively homogeneous samples and less environmental control), a main limitation of the reviewed studies is the focus on average scores across people, ignoring individual differences and patterns over time.

There are some notable differences between the designs of EMA studies with other data collection devices and smartphone-based EMA studies of well-being. First, the maximum sample size in smartphone-based EMA is much larger than in palmtop or PDA studies. Next, the average study duration is longer when using smartphones (12.8 days) compared to other devices (8.5 days). These differences show that data collection is easier when participants can use their own devices. There are no (financial) restrictions on either the sample size or the study duration, as researchers do not need to buy devices. Furthermore, smartphone studies more often add objective data to their designs, reflecting the flexibility of smartphone applications compared to other devices. Based on the limited number of studies, the compliance rates of both sets of studies are similar (p = 0.204), namely 71.6% for smartphone research and 78.4% for other devices. Only sedentary behavior was assessed in relation to well-being in both smartphone studies and a PDA study. In most of these studies, momentary positive affect was associated with less sedentary behavior. The other results cannot be compared.

Our review shows that smartphone-based EMA designs are feasible, have multiple advantages over other EMA collection devices and can be used to inform our understanding of well-being in addition to traditional questionnaire research. The real-time responses in the participants' natural environment give insight into the fluctuations and patterns of well-being in individuals that cannot be captured with retrospective self-report measures. Furthermore, assessing well-being in different contexts and/or in response to activities or events can contribute to understanding the dynamic nature of well-being and to identifying causal influences on well-being. This might be useful for creating interventions to increase well-being, if we know what makes people happy.

4.1 Guidelines for the Use of Smartphone-Based EMA Designs in Well-Being Research

Based on the findings of the reviewed studies and the current limitations, we propose recommendations and future directions for EMA studies to better capture the complexity of well-being (see Table 5 for an overview).

Table 5 Guidelines for future smartphone-based EMA research of well-being

As a first guideline, we recommend using smartphones instead of other data collection devices in future EMA studies. Whereas researchers need to provide participants with other devices, nearly everyone nowadays owns a smartphone (already more than 70% of the population in the Western world and the USA, and 45% worldwide (GSMA intelligence, 2019)). This makes larger samples and longer study durations possible. In addition, the flexibility and continuing development of smartphones and applications are preferable when designing strong studies, i.e. the addition of objective data collection is easier. Only in studies where the target sample is expected not to have a smartphone, e.g. younger children, providing a device (either a smartphone or any other device) could be a good solution.

4.1.1 Sampling

An important advantage of smartphone-based EMA designs is the possibility of reaching many people and including unique or large samples. Advertising the nation- and worldwide applications Track Your Happiness and Mappiness resulted in the largest sample sizes ever used in EMA research (N > 10,000). This review shows that a smartphone-based EMA design is feasible in all types of samples, ranging from adolescents to older participants. Vilaysack et al. (2016) showed that it is even feasible to perform smartphone-based EMA research in 5- to 7-year-old children. With the increasing rate of smartphone users around the world and the simple design of an application, smartphone studies can reach people who are otherwise not included in research. Using such large and heterogeneous samples, the influence of and interplay with many factors (e.g. culture, education, environment) can be investigated on a large scale.

Whereas some reviewed studies advertised such a freely available application, most studies recruited a convenience or specific target sample (e.g. employees or university students). Both recruitment methods have advantages, and the best choice depends on the study goal. An anonymous sample can become very large and leads to more power, but there is more control over a targeted sample (e.g. specific characteristics or the possibility to contact participants for follow-up). Large-scale open recruitment can have the same drawbacks as large online survey studies, such as oversampling of females (Saleh and Bista 2017). A quantity-quality trade-off can be seen in recruiting an anonymous larger sample versus a smaller sample with more control over its representativeness (similar to the quantity-quality trade-off in genome-wide association studies (Okbay et al. 2016)). We recommend choosing the recruitment method based on the specific research question and goal.

Only three reviewed studies reported a power analysis to estimate the required sample size prior to data collection. A power analysis to calculate the required sample size in EMA studies is often complicated, since power in EMA studies depends not only on the number of participants, but also on the number of prompts per day. The choice to increase power via the number of participants or via the number of prompts depends on the level at which the effects of interest lie. Mathieu et al. (2012) suggest that when one is interested in detecting lower-level effects (e.g. the relationship between environment and well-being), maximizing the number of lower-level units (number of prompts) is most beneficial, whereas when one is interested in higher-level effects (e.g. the etiology of within-person fluctuations of well-being), increasing the sample size might be more beneficial.
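One practical way to handle this is a simulation-based power analysis that varies both the number of participants and the number of prompts. The sketch below is a minimal example for a within-person (level-1) effect under assumed effect size and variance components; it is not a procedure taken from any of the reviewed studies.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def power_ema(n_participants, n_prompts, beta=0.15, sd_intercept=1.0,
              sd_resid=1.0, n_sims=200, alpha=0.05, seed=1):
    """Monte Carlo power for a within-person effect in a random-intercept
    multilevel model; all parameter values here are assumptions."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        pid = np.repeat(np.arange(n_participants), n_prompts)
        x = rng.normal(size=pid.size)                        # momentary predictor
        y = (beta * x + rng.normal(0, sd_intercept, n_participants)[pid]
             + rng.normal(0, sd_resid, pid.size))            # momentary well-being
        df = pd.DataFrame({"pid": pid, "x": x, "y": y})
        fit = smf.mixedlm("y ~ x", df, groups=df["pid"]).fit()
        hits += fit.pvalues["x"] < alpha
    return hits / n_sims

# The trade-off: more participants with fewer prompts, or vice versa.
print(power_ema(n_participants=50, n_prompts=35))   # e.g. 50 people, 5/day for 7 days
print(power_ema(n_participants=100, n_prompts=21))  # e.g. 100 people, 3/day for 7 days
```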

4.1.2 Measures of Well-Being

Most reviewed studies used one happiness question or a combination of positive affect adjectives to assess subjective well-being (e.g. relaxed, feeling good). Only one study used a full well-being scale, the 14-item Warwick-Edinburgh Mental Well-Being Scale (Bakolis et al. 2018). Answering many questions multiple times a day might become too much of a burden for participants and could decrease compliance and data quality. For example, in the study of Bakolis et al., only 25 of the 108 participants had a compliance rate of 66% or higher. Using a full questionnaire to investigate momentary well-being is thus not recommended. Krueger and Schkade (2008) investigated the reliability of single momentary affect measures (e.g. happy, depressed, angry) in the Day Reconstruction Method (DRM) and found reliabilities of 0.50–0.70 (similar to the reliability of general well-being measures (Diener et al. 1985; Lyubomirsky and Lepper 1999)). As DRM and EMA designs result in nearly identical happiness ratings over the day, single-item measures are also thought to be reliable in EMA research (Dockray et al. 2010; Kahneman et al. 2004). Therefore, including only one or a few well-being questions is preferable to keep the participant burden low.

Whereas the reviewed studies only investigated the hedonic/affective part of well-being (e.g. happiness or mood), well-being also consists of a cognitive part (e.g. satisfaction with life) and eudemonic well-being (the fulfilment of human potential (Ryff 1989; Ryan and Deci 2001)). These components of well-being have not been assessed in smartphone-based EMA studies yet. Rating one's current mood or happiness (i.e. affective WB) is relatively easy, whereas somewhat more thinking and cognitive processing is needed to report on momentary life satisfaction, i.e. cognitive well-being. Nevertheless, reporting life satisfaction in the moment is possible and some variation over time is expected. In contrast, Steptoe (2019) suggests that answering questions on eudemonic well-being (the meaning of life) needs more thinking and cognitive processing, including aggregation over time and comparisons with self-selected standards. Less variance in daily or hourly eudemonic well-being is expected and therefore this aspect of the well-being spectrum seems less suitable for EMA. We recommend including questions in an EMA study to investigate fluctuations in both cognitive and affective well-being (e.g. “How do you feel right now?” (unhappy/happy) and “How satisfied are you with your life at this moment?” (unsatisfied/satisfied)).

4.1.3 Objective Data

More than half of the reviewed studies combined self-report with objectively measured data, either collected from smartphone sensors or from an additional (accelero)meter. Including the measures on the same device (e.g. two different apps) or in the same application is most convenient for both researchers and participants, as data are integrated immediately and participants do not have to carry additional meters.

An important advantage of objective data in general is the passive nature of the data and thereby the ecological validity (i.e. data are collected without active participation of the participant in their natural setting). This reduces participant burden and complements self-report data, without issues of compliance. Furthermore, objective data can provide more reliable information. For example, in physical activity research, only a weak correlation between self-reported and objectively assessed accelerometer data has been found, indicating biases in self-report (Dyrstad et al. 2014; Prince et al. 2008). Also in sleep and phone/Internet use research, objectively assessed data might be more reliable than self-report (Boase and Ling 2013; Girschik et al. 2012; Junco 2013).

Combining self-report data and sensor data is a powerful design and can lead to new insights into the interaction of mental, physical, and environmental processes in daily life. Over the next years, objective data is likely to be included more often with the continuing developments in smartphone technology, e.g. more sensors are being embedded in mobile phones (step count or heart rate) (Mohr et al. 2017). Recently, open source platforms have been developed to make the collection of passive data from smartphone sensors easier for researchers, facilitating the use of passive data in research even further. A few examples are the AWARE framework (Ferreira et al. 2015), RADAR-base (Ranjan et al. 2019) and the Insight app (Montag et al. 2019).

However, some difficulties with passive data collection remain, as also noted by Mohr and colleagues (2017). First, smartphone sensors vary between phones, and personal characteristics, such as age and gender, might affect the data. Older people use their phones differently than younger people, and males more often carry their phones in their pockets compared to females (Ichikawa et al. 2005). More research is needed to correctly process and interpret objective data from different phones and populations. Furthermore, with the European General Data Protection Regulation (https://eugdpr.org) in force, researchers have to adhere to strict rules regarding privacy and personal data. Passive data result in large amounts of possibly identifying information, so data processing is needed before storage. For example, GPS data cannot be stored in full detail, to prevent identifying information on the home location, but should be measured using an unknown relative location or transformed to an appropriate level (e.g. street level).
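As an illustration of such pre-storage processing, the sketch below shows two simple transformations: expressing coordinates relative to an arbitrary per-participant anchor, and coarsening them to roughly street-level precision. These are minimal examples under assumed coordinates, not a full anonymization procedure; real studies would need a more careful privacy analysis.

```python
from typing import Tuple

def to_relative(lat: float, lon: float, anchor: Tuple[float, float]) -> Tuple[float, float]:
    """Express a GPS fix relative to a per-participant anchor point
    (e.g. the first recorded location), so absolute positions such as
    the home address are never stored."""
    return lat - anchor[0], lon - anchor[1]

def coarsen(lat: float, lon: float, decimals: int = 3) -> Tuple[float, float]:
    """Round coordinates; 3 decimals is roughly 100 m (about street level).
    The appropriate precision depends on the research question."""
    return round(lat, decimals), round(lon, decimals)

anchor = (52.33412, 4.86524)          # hypothetical first fix of a participant
fix = (52.36021, 4.88358)             # a later GPS sample
print(to_relative(*fix, anchor))      # stored as an offset, not an absolute position
print(coarsen(*fix))                  # coarsened to ~100 m precision
```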

4.1.4 Schedule and Prompting Strategy

The average number of prompts per day in the reviewed studies on well-being is 5.0 (range: 2–12) and the average duration 12.8 days. According to an earlier review of EMA, it is reasonable to administer up to 10 EMAs a day to participants for periods ranging from 1 week to 1 month (aan het Rot et al. 2012). Our review shows that the frequency of prompting and the study duration are influenced by the specific study goal and context. To capture fluctuations, the number of prompts needed depends on the time-scale of variation in the context of interest. When interested in more rapidly varying events or contexts, participants need to be prompted more often to capture all types of the context. For example, Poerio et al. (2013) prompted participants 12 times per day to investigate the effect of different types of mind wandering on well-being, whereas three or four prompts per day seem to be enough to assess well-being in different environmental contexts (e.g. Bejarano et al. 2019; MacKerron and Mourato 2013). With rare events, event-based sampling or a combination of different sampling strategies might be preferred to capture these events and reduce the number of prompts.

Based on these findings and the literature on well-being, we recommend prompting participants based on the context of interest, but at least three times a day (every morning, afternoon and evening) for a minimum of seven days, to assess the fluctuations of momentary well-being over the day and on different weekdays. In addition, based on findings on seasonal depression (American Psychiatric Association 2013) and individual differences in mood changes in summer and winter (Golder and Macy 2011; Klimstra et al. 2011), we might expect seasonal fluctuations in well-being or a different pattern of daily WB fluctuations. Therefore, we recommend assessing well-being in the different seasons using EMA as well.

A few studies let participants decide how many prompts they wanted to receive, with a minimum or default level. This individual control over the number of prompts might increase compliance levels. Unfortunately, with the limited reported data, we could not test this relation. In addition, individual differences might exist in the preference for and reaction to the number of prompts. More research is needed to assess this preference in combination with participant burden and compliance levels. Until these questions are resolved, we recommend basing the number of prompts on the time-scale of variation in well-being in the context of interest.

4.1.5 Applications

Most studies used applications specifically developed for the research question and context of interest. As the development of a properly working application costs a lot of time and money, collaborations with app developers or other researchers should be encouraged to create an app combining research, good usability and technical factors (McEwan et al. 2019). The application should be tested multiple times to reduce the burden for participants, since participants drop out as soon as a problem occurs or an update is needed (McEwan et al. 2019).

Furthermore, especially when not offering incentives, providing feedback to participants after the study (e.g. in the form of a well-being graph over time) is recommended to make participation fun and keep compliance levels high. For example, in the Mappiness studies (MacKerron and Mourato 2013), participants received personalized graphs of happiness over time, related to where they were, with whom and what they were doing. However, and undesirably for EMA studies, such feedback may make participants aware of their well-being, leading to a reactive effect that influences the data. Therefore, feedback should be provided only after data collection is completed. Awareness of well-being might lead people to initiate more activities that make them happy. Ludwigs et al. (2018) did show that two weeks of paying more attention to one's well-being has a small positive effect on well-being. However, recently De Vuyst et al. (2019) showed that 10 EMAs of emotions per day do not impact participants' emotional experience over time. Both samples consisted only of (psychology) university students. More research on this reactive effect of attention to well-being in more heterogeneous samples is necessary for a conclusive answer.

4.1.6 Response and Compliance

Fewer than half of the studies reported the response rate and compliance of the participants. Based on the limited data, EMA compliance was relatively high, with a mean of 71.6%, but most studies did not reach the 80% compliance preferred for generalizable data on daily life (Stone and Shiffman 2002). Furthermore, the nature of the missing data (random vs. non-random) was not reported in the reviewed studies, but should be taken into consideration. Based on the limited reported data, results suggested no difference in compliance between using a research smartphone and the participant's own smartphone. In addition, similar to Jones et al. (2019), we did not find a clear relation between compliance and other study characteristics, such as study duration, the number and timing of prompts, incentives, or participant characteristics. Further research is needed to confirm these findings, since this information is critical for optimizing future EMA studies.
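
To support more consistent reporting, compliance can be operationalized simply as the share of delivered prompts that were answered, checked per participant against the 80% benchmark (Stone and Shiffman 2002). The sketch below illustrates this with a hypothetical prompt log; the data frame and column names are our own invented examples.

```python
import pandas as pd

# Hypothetical prompt log: one row per delivered prompt.
log = pd.DataFrame({
    "participant": [1, 1, 1, 1, 2, 2, 2, 2],
    "answered":    [1, 1, 0, 1, 1, 0, 0, 1],   # 1 = prompt was answered
})

compliance = log.groupby("participant")["answered"].mean()
print(compliance)                    # compliance rate per participant
print((compliance >= 0.80).mean())   # share of participants meeting 80%
```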

Only three studies reported the latency (time difference) between prompt and answer. This latency is important, since the validity of EMA rests on capturing momentary experiences, i.e. questions should be answered in the moment. Although most studies limited the response window, it could be as long as an hour, reducing ecological validity. However, returning to the added value of passive data: if the timing of the answer is known and passive data are collected continuously, the time difference between prompt and answer becomes less of a problem, since self-reports can still be matched to the passive data at the moment of answering.
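
As an illustration of how latency can be screened, the sketch below computes the delay between prompt and answer timestamps and flags responses given within 15 minutes; the timestamps are invented and the cut-off is our own illustrative choice (the reviewed studies allowed windows of up to an hour).

```python
import pandas as pd

# Hypothetical response log with prompt and answer timestamps.
df = pd.DataFrame({
    "prompted": pd.to_datetime(["2020-03-02 09:14", "2020-03-02 13:40"]),
    "answered": pd.to_datetime(["2020-03-02 09:16", "2020-03-02 14:35"]),
})
df["latency_min"] = (df["answered"] - df["prompted"]).dt.total_seconds() / 60
df["in_the_moment"] = df["latency_min"] <= 15   # illustrative cut-off
print(df[["latency_min", "in_the_moment"]])
```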

To keep compliance levels high, subject-management procedures such as incentives, training and feedback might help. For example, incentives for completing assessments have been found to increase compliance to some extent in earlier EMA studies (e.g. in substance abusers; Beckham et al. 2008; Shiffman 2009; Sokolovsky et al. 2014). However, this evidence is limited to non-smartphone EMA research. In the smartphone studies we reviewed, compliance levels seemed to be independent of incentives. Further research on compliance and incentives in smartphone-based EMA is needed.

4.1.7 Analyses

Whereas most studies applied a multilevel model to the EMA data, a main limitation of the reviewed studies is that they ignore the variance (individual differences) and within-person patterns over time. Focusing on group averages is common in the field of well-being (and beyond), but it discards important information about individual differences. In EMA research, ignoring individual differences also reduces the ecological validity (Ram et al. 2017). As individuals differ in their patterns of mood, behavior, and other experiences in life, the average person or the average pattern of those variables across participants does not exist in the real world. For example, in contrast to the general belief that people feel better after physical activity, strong individual differences in affective responses during and after exercise have been found (Ekkekakis et al. 2005; Rose and Parfitt 2007; Welch et al. 2007). In the study of Van Landuyt et al. (2000), half of the participants felt less happy during and after physical exercise compared to before, and Schutte and colleagues (2017) showed that genetic factors explain 15% of the individual differences in these affective responses.
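
One way to retain such individual differences in the multilevel models most studies already use is to include random slopes, so that each participant gets their own estimated response instead of being averaged away. The sketch below, using simulated data and the statsmodels library, is a minimal example of this approach; all variable names, effect sizes and sample sizes are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate EMA data where participants differ in how physical activity
# relates to momentary well-being (person-specific slopes).
rng = np.random.default_rng(1)
n_persons, n_obs = 30, 40
pid = np.repeat(np.arange(n_persons), n_obs)
person_slope = rng.normal(0.5, 0.8, n_persons)[pid]  # individual differences
activity = rng.uniform(0, 1, pid.size)
wellbeing = 5 + person_slope * activity + rng.normal(0, 1, pid.size)
df = pd.DataFrame({"pid": pid, "activity": activity, "wellbeing": wellbeing})

# Random intercept *and* random slope: the re_formula term is what keeps
# individual differences in the model rather than averaging them away.
model = smf.mixedlm("wellbeing ~ activity", df, groups=df["pid"],
                    re_formula="~activity")
result = model.fit()
print(result.summary())
print(list(result.random_effects.values())[0])  # one person's deviations
```

The estimated variance of the random slope then quantifies exactly the individual differences that a fixed-effects-only analysis would hide.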

The same individual differences might exist for well-being and other environmental variables, such as location preferences or outdoor activities. For example, MacKerron and Mourato (2013) showed that people, on average, are happier in natural environments. However, individuals differ in their reactions to urban and natural environments: Newman and Brucks (2016) showed that people high on neuroticism prefer an urban environment to restore their self-control, whereas people low on neuroticism prefer a more natural environment. This suggests that individual differences in the effect of natural environments on well-being might exist as well.

Another limitation of averaging is that a large part of the data is ignored. Patterns of mood or behavior over the day or week contain a lot of information and might be more informative than a total or average score. As mentioned in the introduction, when investigating the relation between health and physical activity, the pattern of accumulation, rather than the total volume, is the more informative measure (Chinapaw et al. 2019). As another example, Smit et al. (2019) tried to predict an increase in depressive symptoms in six patients. Results showed a rise in restlessness more than two months before the increase in depressive symptoms, whereas the total negative and positive affect scores did not have the same capacity to signal this future increase.
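
A simple way to quantify such within-person patterns is to complement the person mean with a fluctuation measure, for example the root mean square of successive differences (RMSSD). The toy example below shows two hypothetical participants with nearly identical means but very different dynamics; the numbers are invented purely for illustration.

```python
import numpy as np

# Two hypothetical participants: similar mean well-being, different dynamics.
stable   = [6, 6, 7, 6, 6, 7, 6, 6]
volatile = [2, 9, 3, 10, 2, 9, 3, 10]

def rmssd(x):
    """Root mean square of successive differences: larger values
    indicate stronger moment-to-moment fluctuation."""
    d = np.diff(np.asarray(x, dtype=float))
    return float(np.sqrt(np.mean(d ** 2)))

for name, series in (("stable", stable), ("volatile", volatile)):
    print(name, round(float(np.mean(series)), 2), round(rmssd(series), 2))
# The means barely differ (6.25 vs 6.0), but RMSSD clearly
# separates the two patterns (about 0.76 vs 6.89).
```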

In summary, we recommend focusing on fluctuations, patterns of well-being, and individual differences rather than on averages or sums. Moreover, EMA data are well suited to investigating individual differences in well-being and their relation with environmental variables or other (causal) correlates.

4.2 The Future of EMA in Well-Being

Well-being has become a popular research topic, with an increasing number of publications in recent years. Well-being findings have the potential to inform policy and are increasingly applied in society. Nowadays, using new technologies, detailed information about fluctuations in well-being can provide more insight into what people experience in their daily lives and what makes them happy, from day to day or even hour to hour.

Our review shows that using a smartphone application to measure well-being multiple times a day is feasible and preferable to other EMA data collection devices. More systematic research on the fluctuations of the different components of well-being (e.g. affective and cognitive WB) in diverse contexts is needed to understand what influences momentary well-being and makes people happy. The continuing developments in smartphone technology (e.g. processing power, battery life, and new sensors and functions) and open-source platforms will make it easier to collect self-report and objective data (Harari et al. 2016). The addition of objective data to self-report data is especially valuable for understanding the relation between context, environmental variables, and momentary well-being. We recommend using all aspects of EMA data and focusing on data patterns and individual differences, using sophisticated methods such as time-series analysis.

In the future, once the fluctuations and patterns of well-being, and their interaction with the environmental variables that make individuals happy, are better known, a next step in smartphone-based well-being research is a shift from EMA (assessing behavior in the moment) to EMI (ecological momentary intervention): intervening on behavior in the moment to increase well-being (Heron and Smyth 2010). For example, the Shmapped application (McEwan et al. 2019) was developed as both a data collection and an intervention tool. When prompted, participants are instructed to notice the nature around them and to recall good things. Preliminary results show improvements in well-being at one-month follow-up (McEwan et al., under review). Furthermore, a recent meta-analysis showed that smartphone-based interventions (e.g. awareness or relaxation exercises) to increase well-being in clinical samples have a small positive effect on quality of life (Versluis et al. 2016). However, most interventions use a one-size-fits-all approach: all participants receive the same intervention, and individual differences in reactions to the intervention are not responded to. Taking individual differences into account might increase the effectiveness of future well-being interventions.

5 Limitations

A quantitative meta-analysis of the fluctuations of well-being or the effect of environmental variables was not possible, since the studies were too heterogeneous in methods, contexts and analyses (e.g. the well-being measure, duration, number of prompts, and reported statistics). Another limitation of this review might be the incomplete retrieval of articles. The field of smartphone- and application-based EMA research is relatively new and gaining popularity; the terminology used to describe EMA studies varies, and results are published in various forms and places. To be as comprehensive as possible, we followed the PRISMA criteria for systematic reviews.

6 Conclusions

The 53 reviewed studies show that smartphone-based EMA research is a feasible way to assess the fluctuations of momentary well-being in different contexts. Based on the designs and findings of these studies, we provided recommendations for future smartphone-based EMA research on well-being. In the near future, with the continuing developments in smartphone technology, smartphone applications combining EMA self-reports and objective data can yield more specific knowledge about the fluctuations of well-being and what makes people happy.