1 Background & introduction

More and more adults worldwide suffer from obesity or overweight [14, 47]. Comprehensive approaches to obesity management are required to prevent weight (re)gain (after a diet) among all population groups [17, 35]. Widespread use of the internet, and especially of smartphones, provides a useful means of designing intervention systems that facilitate behavior change and health improvements [36]. The rise of the internet and mobile broadband technologies has been tremendous, with the latter being the most dynamic market (penetration level of 47%), demonstrating a 12-fold increase since 2007 [48]. At the same time, more and more people use the internet to look up information about health services [15] and, more specifically, about diets [26]. Interventions based on such e-health (e-coach) approaches will play a significant role in shaping health and diet tailoring systems in the coming years [23].

As expected, many studies examine the potential of e-health for the prevention and treatment of overweight and obesity. Such approaches can be web-based [3], SMS-based [43], or mobile-based [5, 46], and can furthermore rely on different analytical methodologies [12, 13, 49]. There is also an increasing focus on techniques that use a smartphone to provide an e-coach service [9, 10, 11, 25, 30, 50]. The majority of these studies focus on weight loss, but they do not provide automated feedback based on an analysis of each person’s individual data.

Ecological Momentary Intervention (EMI) uses a combination of real-time assessment and treatment. One way to obtain real-time assessment data is through Ecological Momentary Assessment (EMA). EMA includes a suite of methods that assess research subjects in their natural environments, in their current or recent states, at predetermined events of interest, and repeatedly over time [44]. Thus, EMA allows for an on-line self-monitoring data collection leading to more accurate and ecologically valid results compared to self-report retrospective questionnaire assessments [40]. Essentially, memory recall bias associated with such retrospective assessment is minimized since EMA measures events very promptly, while in retrospective questionnaires the events that stand out, such as emotionally salient events, are recalled disproportionately more often than other events [18].

EMI allows the provision of (indefinite) treatment in the natural environment [21]. To accomplish this, assessment and treatment are conducted and provided via a mobile platform, such as a smartphone. The advantage over traditional treatment is that EMI does not necessarily involve therapist contact; instead, observations made in daily life are used as input to guide therapy-based techniques and progress. EMA methods have been applied to the domain of dieting and self-control [22], but not systematically through an intervention. Several randomized controlled trials assessing the effectiveness of interventions for eating behavior and weight reduction have been conducted [1, 4]. However, as with smartphone-based e-coaches, none of these approaches uses a systematic machine learning approach.

In this work, we will present an algorithmic process that utilizes EMA methods for developing a machine learning based approach in order to provide adaptive semi-individualized feedback to users regarding their eating behavior. Our machine learning pipeline (based on decision trees and clustering algorithms) takes as input the user collected data (through the EMA) and provides information regarding possible unhealthy eating events (i.e., when participants are likely to eat something unhealthy). We directly adjust machine learning algorithms in order to work smoothly with EMA data. Classical statistics often assume that observations are drawn from the same general population and are independent and identically distributed [41]. This assumption is not applicable to EMA data and most machine learning algorithms do not take this into account when treating this kind of data [51]. There have been some efforts to apply decision tree based methods to EMA data [2] to overcome dependencies between data but with limited applications. Other approaches tried to introduce a random factor, but they are only applied to regression problems [20, 33, 42] and not classification.

Our contributions to the domain of e-coaching can be summarized as follows:

  • Our approach combines EMI techniques and machine learning in an integrated framework, which provides the necessary tools to use lagged data to construct rules that predict unhealthy eating behavior. Emphasis is given to providing feedback prior to possible unhealthy eating events (i.e., warning users in a timely manner using a classification algorithm) and to constructing groups of eating behavior profiles (using a clustering algorithm).

  • We address the issue of “new” participants (for whom not enough data are available) by introducing one week of monitoring, after which they are matched to the existing profiles (groups of eating behavior).

  • Feedback offered to users is semi-tailored: it takes into account each person’s individual behavior but also benefits from similar persons’ data, thus balancing personalization and generalization.

The paper will first describe the overall framework of Think Slim, the kind of data collected and necessary annotations in Section 2. In Section 3 our machine learning methods are presented and a detailed process of how they can be applied to EMA data is described. Results are presented in Section 4 and finally Section 5 concludes the paper.

2 Overall framework, data, and annotations

“Think Slim” is an iPhone application developed in-house that allows users to report factors that potentially promote unhealthy eating (emotions, activities, etc.). It is designed on the basis of EMA (or Experience Sampling Methods (ESM)) [44]. The application acts as a logbook, collecting information about each user (subject), and implements EMA principles in two ways: (a) random sampling (sometimes also called signal-contingent sampling) and (b) event sampling. For random sampling, limited input is requested at pseudo-random time points throughout the day (pseudo-random means that the waking day is divided into, on average, eight 2-h timeframe boxes, and assessments occur at random times within each box). Every day, users are notified by a beeper (random sampling) between approximately 0730 and 2230 (exact times depend on the participant’s actual bedtime habits, which can be entered in the application), with an average interval of 2 h. For event sampling, users are instructed to use the application immediately prior to eating something, filling in a questionnaire similar to that of the random sampling moments, with additional information regarding the food items that are about to be consumed. The implemented EMA schedule can be found in Fig. 1, and the information collected per type of sample is presented in Table 1.
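The pseudo-random (signal-contingent) schedule described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `random_prompt_times`, the fixed 0730 to 2230 waking window, and the use of equally sized boxes are hypothetical choices, not the application's actual implementation.

```python
import random
from datetime import datetime

def random_prompt_times(day_start, day_end, n_boxes=8, rng=None):
    """Split the waking day into n_boxes equal boxes and draw one
    uniformly random prompt time inside each box (assumed scheme)."""
    rng = rng or random.Random()
    box = (day_end - day_start) / n_boxes  # length of one box (timedelta)
    return [day_start + box * (i + rng.random()) for i in range(n_boxes)]

# Hypothetical waking window; the real one depends on the participant's bedtime habits.
start = datetime(2024, 1, 1, 7, 30)
end = datetime(2024, 1, 1, 22, 30)
prompts = random_prompt_times(start, end, rng=random.Random(0))
for t in prompts:
    print(t.strftime("%H:%M"))
```

Drawing one prompt per box keeps the notifications spread over the day while still making the exact moment unpredictable for the participant.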

Fig. 1 Implementation of EMA protocol in Think Slim application: example distribution of samples per day

Table 1 Variables collected through Think Slim application events

Using the protocol above, we are able to collect longitudinal data on eating behavior for every participant. This process provides us with an average of ten responses (random and eating events) per participant per day. Using exploratory analysis techniques, we preprocessed the data in the following manner:

(a) Mood states are measured using seven emotions (on a visual analogue scale (VAS)). To further enhance discrimination between positive and negative mood states, we aggregated (per record) non-zero positive emotions (cheerful, relaxed) and non-zero negative emotions (sad, bored, stressed, angry, worried). Moreover, we discretized the aggregated values to {Low, Mid, High} for the positive emotions and {No, Yes} for the negative emotions. This choice is based on the fact that negative emotions were reported more sporadically and were usually bursty, so a binary value showing whether a negative emotion is present or not is justified [28]. In a similar manner, we discretized food craving to {Low, Mid, High}.

(b) Location, activities, and thoughts were provided as free text and were also analyzed and discretized into categories reflecting the whole spectrum of their possible values.

(c) Participants can report the strength of their food craving(s) in the app via a VAS ranging from 0 to 10. They can also indicate whether they had a craving for a specific type of food. If so, the app allows participants to choose the most appropriate food type(s) from among 19 food icons. If the craved food is not represented by any of these 19 food icons, participants are instructed to select the most closely resembling icon. The same idea is applied to eating events: whenever an eating event occurs, the user selects the icon (out of the 19 possible) that is most similar to the food consumed. The rationale for using these 19 icons is that people find it very difficult to estimate what food they are eating, which leads to inaccurate results [29], and asking them to list their exact food intake at each measurement point would increase response time and lower compliance [31]. Moreover, our work focuses on classifying eating behavior on a high level, i.e., unhealthy (which refers to high-caloric choices) versus healthy (which refers to all other items that represent the healthier option), so detailed information about caloric value or exact food products is not needed. The 19 icons represent 19 different food categories based on the most common foods in the Western / Dutch diet. Users are also asked to upload a picture of their food as confirmation. This broad selection (19 possible choices) allows us to categorize each specific item as either unhealthy or healthy food. For random sampling events, users are considered to eat nothing at that moment (as they would have reported an eating event if they were eating). It should be noted that when users selected more than one food item and the selection was a combination of healthy and unhealthy options, the unhealthy option is considered.

(d) Finally, time-related attributes were created for each sample, given the time of day at which the sample was filled in ({morning, noon/afternoon, evening}) and whether it was filled in during the weekend or not ({No, Yes}). The transformed attributes are described in detail in Table 2.
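A minimal sketch of these preprocessing steps is given below. The cut points (3 and 7 on an assumed 0 to 10 scale), the use of the mean as the aggregate, the 18:00 start of the evening, and all function names are illustrative assumptions; the text does not specify them. Only the category sets and the rule that an unhealthy item dominates a mixed selection come from the description above.

```python
from datetime import datetime

def three_level(score, low_cut=3.0, high_cut=7.0):
    """Map a VAS score to Low/Mid/High; the cut points are hypothetical."""
    if score < low_cut:
        return "Low"
    return "Mid" if score < high_cut else "High"

def positive_mood(cheerful, relaxed):
    """Aggregate the non-zero positive emotions (here: their mean, an assumption)."""
    nonzero = [s for s in (cheerful, relaxed) if s > 0]
    return three_level(sum(nonzero) / len(nonzero) if nonzero else 0.0)

def negative_mood(scores):
    """Binary flag: is any negative emotion present?"""
    return "Yes" if any(s > 0 for s in scores) else "No"

def eating_label(item_labels):
    """No items (random sample) means nothing eaten; any unhealthy item wins."""
    if not item_labels:
        return None
    return "U" if "U" in item_labels else "H"

def time_attributes(ts, noon_hour=12, evening_hour=18):
    """Return (time of day, weekend flag); the hour boundaries are assumed."""
    if ts.hour < noon_hour:
        part = "morning"
    elif ts.hour < evening_hour:
        part = "noon/afternoon"
    else:
        part = "evening"
    return part, "Yes" if ts.weekday() >= 5 else "No"

print(positive_mood(8, 0), negative_mood([0, 0, 2, 0, 0]),
      eating_label(["H", "U"]), time_attributes(datetime(2024, 1, 6, 19, 30)))
```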

Table 2 Discretized attributes

Using this EMA-based application, we collected data in two studies, which are briefly described below. During study I [6, 45], 100 people (57 overweight and 43 healthy-weight people) participated, and the goal was to analyze these data in order to build the necessary framework for providing feedback. The duration of the study was 2 weeks. Compliance was calculated as the percentage of completed random moment assessments relative to the total number of notifications people received during the 14-day EMA period. Overweight participants (on whose data the groups were created) completed 81% (SD = 10.26%) of the assessments, and healthy-weight participants completed 80% (SD = 9.73%) of the assessments (there were no significant differences between the two groups). Furthermore, participants (overweight and healthy-weight) reported on average 3.9 eating events (SD = 1.2) per day (again with no significant difference between the two groups).

During study II [7], 100 overweight people participated in the intervention trial (a randomized controlled trial (RCT) with two groups), where the goal was to provide and test the effectiveness of a CBT-based EMI. The total duration of this study was 8 weeks (6 weeks for the intervention and 2 weeks for data collection before and after the intervention). Since there are no prior data for the new participants, we are faced with the well-known machine learning “cold start” problem [39]. The design of the study was such that we could overcome this issue: the first week of the trial is used as a data collection week with no intervention, so for every new participant we collect all attributes presented in Table 2, making data from the two studies directly comparable. After the end of the first week, participants are split into the intervention group (feedback mechanisms are activated) and the control group (no feedback, but they follow their own diet). For study II, compliance dropped somewhat, probably due to the longer duration (8 weeks in total, apart from training, which required more effort from the participants), but remained high, i.e., 70.5% (SD = 13.38%). In this case, participants reported on average 3.6 eating events (SD = 1.1) per day. In addition, participants used the application actively (i.e., it was in the foreground) for 9.9 h on average (SD = 8.4) over the whole duration of the study.

During the intervention phase, feedback is provided using three mechanisms: passive feedback, cognitive behavioral therapy (CBT), and adaptive (active) feedback. Passive feedback involves the presentation of several statistics on the user’s eating behavior in the form of pie diagrams, and with CBT users critically evaluate dysfunctional thoughts that promote unhealthy eating (e.g., “Even though this food is not part of my eating-plan, I can eat it anyway. I deserve it, because I’m so stressed”). More details on these modules can be found in [7]. This paper focuses on the adaptive feedback module. We developed a mechanism that analyzes the user-specific data, highlights the most discriminating patterns that lead to unhealthy eating behavior, and incorporates this information into the feedback. This is made possible by machine learning algorithms (classification decision trees and hierarchical agglomerative clustering). The following section elaborates on the developed approaches.

3 Methods

In the following subsections, the methods applied to the data of the two studies are presented. Data from study I are used to build discriminative rules that lead to unhealthy eating events and to group participants based on their eating behavior. Data from study II are used to assess the effectiveness of the resulting framework and, more specifically, how the adaptive feedback module can be used to warn participants prior to possible unhealthy eating events.

3.1 Rule building and classification

In order to predict under which instances (i.e., combinations of values of the attributes) participants are led to unhealthy eating, we further processed our dataset (creating data points with lag 1) so as to assess whether a data point can accurately predict whether the next data point (provided both occur on the same day and derive from the same participant) will be a healthy or an unhealthy eating event. Figure 2 shows an example of how data points (belonging to participant “pp5”) are converted and combined in order to enable early prediction of unhealthy eating events. In this example, all data points belong to the same user, so they can all be used for generating the lagged data points. However, data point #1 cannot be used to predict eating at data point #2 since they occur on different days. Data point #2 is used to predict eating at data point #3, and similarly data point #3 is used to predict eating at data point #4. This is why these four data points yield two lagged data points.
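The lag-1 conversion can be sketched as below. The record layout (`participant`, `time`, `features`, `eating` keys) and the feature values are hypothetical; the pairing rule (same participant, same day, consecutive samples) is the one described above, and the example mirrors the four data points of participant “pp5”.

```python
from datetime import datetime

def make_lagged(records):
    """Pair each sample with the eating outcome of the next sample,
    keeping only pairs from the same participant and the same day."""
    ordered = sorted(records, key=lambda r: (r["participant"], r["time"]))
    lagged = []
    for prev, nxt in zip(ordered, ordered[1:]):
        same_person = prev["participant"] == nxt["participant"]
        same_day = prev["time"].date() == nxt["time"].date()
        if same_person and same_day:
            lagged.append({"participant": prev["participant"],
                           "features": prev["features"],   # state at time t
                           "target": nxt["eating"]})        # eating at time t + 1
    return lagged

# Four samples of one participant: #1 on one day, #2 to #4 on the next day.
records = [
    {"participant": "pp5", "time": datetime(2024, 1, 1, 21, 0), "features": {"loc": "home"}, "eating": None},
    {"participant": "pp5", "time": datetime(2024, 1, 2, 9, 0), "features": {"loc": "work"}, "eating": None},
    {"participant": "pp5", "time": datetime(2024, 1, 2, 12, 30), "features": {"loc": "work"}, "eating": "H"},
    {"participant": "pp5", "time": datetime(2024, 1, 2, 16, 0), "features": {"loc": "home"}, "eating": "U"},
]
lagged = make_lagged(records)
print(len(lagged), [row["target"] for row in lagged])
```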

Fig. 2 Data points conversion example for enabling early prediction

Using observations from all participants, converted to lagged data points following the process of Fig. 2, we use a classifier to determine under which conditions (i.e., combinations of attributes) participants are led to unhealthy eating. We chose decision trees as the base classifier since they are fast and produce an interpretable result [38]. Let A = {A,B,C,...,X} be a set of m attributes (like the ones in Table 2) and let Y be the outcome attribute (the class variable, which in our case takes the two values {H,U}, representing healthy and unhealthy eating respectively, but can be extended to any other classification problem). Dataset D contains n records with various assignments of values for A and Y, each of which represents one observation. Different criteria exist for deciding which attribute will be selected as a branching node, such as information gain, Gini index, etc. [32]. In our case we select information gain (IG), but the branching is performed in a way that takes into account the longitudinal structure of the data. IG uses the concept of entropy to assess how homogeneous a node of the tree is. The entropy H of a node t is defined as follows:

$$ H(t)=-\sum\limits_{j}p(j|t)\log p(j|t) $$
(1)

where p(j|t) is the relative frequency of class j at node t, and j ranges over all classes (in our case there are two: H and U). Any base for the logarithm can be used since we are interested in the relative gain in entropy (in the following we assume that all logarithms have base 10). Ideally, a node has zero entropy, which implies that all data points in the node belong to a single class.

Then IG is formally defined as:

$$ \text{IG}_{\text{split}}=H(p)- \left( \sum\limits_{i=1}^{k} \frac{n_{i}}{n} H(i) \right) $$
(2)

where H denotes the entropy of a node as defined in (1), p denotes the parent node (that we want to split), k is the number of partitions that node p is split into (i.e., how many new nodes are created), n_i is the number of data points in node i, and n is the number of data points in the parent node p.
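Equations (1) and (2) translate directly into a few lines of code. The sketch below uses base-10 logarithms, as assumed in the text, and represents each node by its list of class counts; the function names are illustrative. The counts in the usage line are those of the 17-sample example of Fig. 3 (a parent node with 11 and 6 points per class, split into nodes with counts 1/5 and 10/1).

```python
from math import log10

def entropy(counts):
    """Entropy of a node, Eq. (1), from its per-class counts (base-10 logs)."""
    n = sum(counts)
    return -sum((c / n) * log10(c / n) for c in counts if c > 0)

def information_gain(parent_counts, child_counts):
    """Information gain of a split, Eq. (2): parent entropy minus the
    size-weighted entropies of the child nodes."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy(child) for child in child_counts)
    return entropy(parent_counts) - weighted

ig = information_gain([11, 6], [[1, 5], [10, 1]])
print(round(ig, 4))  # approximately 0.1273
```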

The process is as follows. First, the attribute with the largest IG is selected. Then, if C is the dominant class (the class that most data points of the node belong to), we define Z_k = + for every participant k if the number of observations (in that node) with Y = C is greater than or equal to the number with Y ≠ C. Otherwise, Z_k = −. We form a contingency table with the 2k patterns of Z as columns and the attribute splits as rows, and compute its significance using an independence test (Fisher test). If the test is significant, the associated variable is selected for splitting and we continue building the tree. If not, the variable with the second best IG is selected and the process is repeated. An example of this process can be found in Fig. 3. In this figure, we assume a small dataset of 17 data samples and we want to assess whether attribute X is suitable for branching. First, we construct a contingency table for computing the IG. This is the 2 × 2 table at the top of Fig. 3b. The IG for splitting parent node p into two new nodes n_1 and n_2 based on attribute X is computed using (2) as follows:

$$\begin{array}{@{}rcl@{}} \text{IG} &=& H(p) - \left[\frac{n_{1}}{n}H(n_{1}) + \frac{n_{2}}{n}H(n_{2}) \right]\\ &=& -\frac{11}{17} \log \frac{11}{17} - \frac{6}{17} \log \frac{6}{17} \\ &&- \left[ \frac{6}{17} \left( -\frac{1}{6} \log \frac{1}{6}-\frac{5}{6} \log \frac{5}{6} \right) \right.\\ &&\left.+ \frac{11}{17} \left( -\frac{10}{11} \log \frac{10}{11}-\frac{1}{11} \log \frac{1}{11} \right) \right] \\ &=& 0.28197 - \frac{11}{17} \times 0.13231 - \frac{6}{17} \times 0.19568 = 0.12729 \end{array} $$

Fig. 3 Explanatory process of building the decision tree

Then, we form the contingency table based on the previous process, which leads to the bottom table of Fig. 3b. The significance of this table is computed using the Fisher test, and the result is significant (p value 0.0009791), so attribute X is selected for branching. Fig. 3a shows the branching and how it improves the separation of data points with regard to the outcome Y. If higher accuracy is required, the same process can be repeated recursively for the two newly created nodes, which is usually the case for large datasets.
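The participant-level significance check can be sketched as follows. This is one possible reading of the procedure, not the authors' implementation: it assumes that C is the dominant class of the node being split, that Z is computed per participant within each branch, and that the resulting table is 2 × 2. The data are invented for illustration, and the two-sided Fisher exact test is written out with the standard hypergeometric formula so that the block needs no external library.

```python
from collections import Counter, defaultdict
from math import comb

def sign_table(rows, dominant):
    """rows: (participant, branch, label) tuples.
    Returns {branch: [n_plus, n_minus]}, where a participant counts as '+'
    if, in that branch, observations with label == dominant are at least
    as many as the others."""
    counts = defaultdict(Counter)
    for participant, branch, label in rows:
        counts[(branch, participant)][label == dominant] += 1
    table = defaultdict(lambda: [0, 0])
    for (branch, _), c in counts.items():
        table[branch][0 if c[True] >= c[False] else 1] += 1
    return dict(table)

def fisher_exact_2x2(table):
    """Two-sided Fisher exact test p value for a 2 x 2 table."""
    (a, b), (c, d) = table
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d
    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)
    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Invented data: three participants per branch of a hypothetical attribute X.
rows = []
for p in ("p1", "p2", "p3"):
    rows += [(p, "yes", "U"), (p, "yes", "U"), (p, "yes", "H")]
for p in ("p4", "p5", "p6"):
    rows += [(p, "no", "H"), (p, "no", "H"), (p, "no", "H")]
table = sign_table(rows, dominant="H")
p_value = fisher_exact_2x2([table["yes"], table["no"]])
print(table, round(p_value, 3))
```

Counting participants rather than raw observations is what lets the test respect the dependence between samples of the same person: a participant with many samples contributes a single sign per branch.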

3.2 Participant profiling

By applying the above decision tree algorithm, we are able to extract (suppose N) significant rules that indicate what combinations of states (e.g., scoring high on food craving + being at home + low positive feelings + negative feelings) are predictive of unhealthy or healthy eating. Both healthy and unhealthy eating are considered in order to discriminate the conditions that lead to unhealthy eating compared to healthier options and also for better assessment of eating behavior.

The motivation behind constructing groups of participants based on their eating behavior is twofold. First, we want to explore whether we can identify specific patterns in the participants’ sample that describe, on a high level, the eating profiles of different people. Second, we want to generalize as much as possible to new participants and tackle the issue of not having enough data to proceed. Profiling via clustering facilitates this goal, since assigning a new participant to a group allows us to provide feedback to this person based on his/her eating profile.

In order to construct profiles of eating behavior based on the rules, the data samples of all participants (suppose P) are checked to compute each rule’s triggering frequency. More specifically, each participant is represented by an N-dimensional vector (participant vector), where each component corresponds to a rule. The value of the component is the frequency of occurrence of that rule for the participant:

$$ u_{i} = [u_{i1},u_{i2},...,u_{ij},...,u_{iN}] $$
(3)

where i = 1,...,P indexes the participants, j = 1,...,N indexes the rules, and u_ij is the frequency of rule j for person i, computed by the following formula:

$$ u_{ij} = \frac{\# \text{of times that rule j was activated for subj. i}}{\# \text{total rules that subj. i activated}} $$
(4)

Vectors u_i represent participants’ eating behavior and are used to assess whether there are any significant groups among the P persons that share common characteristics in their rule activations. In order to compare participants, we compute the distance between the corresponding participant vectors (u_i) using the Euclidean distance:

$$ \text{dist}_{i,k}=\text{dist}_{u_{i},u_{k}}=\sqrt{\sum\limits_{j=1}^{N}(u_{ij}-u_{kj})^{2}} $$
(5)

where i, k are any two different participants out of the P, and u_ij, u_kj are taken from the vectors of participants i and k respectively.

By computing the distance between all P participants we end up with a P x P matrix, which we use in a standard hierarchical agglomerative clustering (HAC) algorithm [34]. This results in M groups of participants (M is determined by standard evaluation of the clustering results), which are expected to have some similar characteristics (i.e., similar rule activations).
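The steps from Eq. (4) to the clustering can be sketched with the standard library alone. The naive average-linkage (UPGMA) loop below is an illustrative stand-in for a library implementation, the number of groups is passed in by hand rather than chosen by evaluation criteria, and the rule activations are invented.

```python
from itertools import combinations
from math import sqrt

def rule_vector(activations, n_rules):
    """Eq. (4): share of each rule among all rule activations of one participant.
    `activations` is a list of (0-based) rule indices, one per activation."""
    total = len(activations)
    return [activations.count(j) / total if total else 0.0 for j in range(n_rules)]

def euclidean(u, v):
    """Eq. (5): Euclidean distance between two participant vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_linkage(vectors, n_groups):
    """Naive agglomerative clustering with average linkage (UPGMA).
    Returns the groups as lists of participant indices."""
    clusters = [[i] for i in range(len(vectors))]
    def linkage(c1, c2):  # mean pairwise distance between two clusters
        return sum(euclidean(vectors[i], vectors[k]) for i in c1 for k in c2) / (len(c1) * len(c2))
    while len(clusters) > n_groups:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair (a < b)
        del clusters[b]
    return clusters

# Invented activations for four participants and N = 3 rules.
vectors = [rule_vector(a, 3) for a in ([0, 0, 1], [0, 0, 0, 1], [2, 2, 1], [2, 2, 2])]
groups = average_linkage(vectors, n_groups=2)
print(groups)
```

For realistic numbers of participants, a library routine would be preferable to this quadratic-per-merge loop; the sketch only makes the linkage rule explicit.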

In order to describe each one of these groups, we use a rule vector (similar to the participant vectors) that is representative of the rule frequencies within the group.

$$ g_{m}= \left[g_{m1},g_{m2},...g_{mj},...,g_{mN}\right] $$
(6)

where m = 1,...,M indexes the groups, j = 1,...,N indexes the rules, and g_mj is the frequency of rule j within group m, computed as follows:

$$ g_{mj} = \frac{{\sum}_{o=1}^{P_{m}}{u_{oj}}}{P_{m}} $$
(7)

where P_m is the number of participants in group m and u_oj (as before) is the frequency of rule j for participant o.

Analysis of these groups can lead to significant findings regarding eating behavior of people and can allow us to generalize about the factors that promote unhealthy eating. These findings are presented in Section 4.

Finally, each group is represented by a ruleset that describes 80% of the eating behavior of participants in the group (thus removing rules with low occurrence and keeping only those with high predictive value). The goal of these rulesets is to provide feedback to participants (see next subsection). The whole process of profiling and ruleset extraction is described in Fig. 4.
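A minimal sketch of the group vector of Eq. (7) and of one way to obtain such a ruleset is shown below. The selection rule used here (keep the most frequent rules until their cumulative frequency reaches 80%) is an assumption about how the 80% coverage is applied; the text does not spell out the exact criterion, and the member vectors are invented.

```python
def group_vector(member_vectors):
    """Eq. (7): component-wise mean of the participant vectors in a group."""
    p_m = len(member_vectors)
    return [sum(column) / p_m for column in zip(*member_vectors)]

def group_ruleset(g, coverage=0.8):
    """Keep the most frequent rules until their cumulative frequency
    reaches `coverage` (assumed reading of the 80% criterion).
    Returns 0-based rule indices."""
    ranked = sorted(range(len(g)), key=lambda j: g[j], reverse=True)
    kept, covered = [], 0.0
    for j in ranked:
        if covered >= coverage:
            break
        kept.append(j)
        covered += g[j]
    return kept

# Two invented participant vectors over N = 3 rules.
g = group_vector([[0.6, 0.1, 0.3], [0.4, 0.1, 0.5]])
ruleset = group_ruleset(g)
print(g, ruleset)
```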

Fig. 4 Group ruleset construction

3.3 Adaptive feedback mechanism

As already mentioned, each new participant is monitored for one week in order to collect enough data to assess their eating habits. We then follow the same process described in Section 3.2 and derive rule vectors for these new participants, so that each one is represented by a vector u_i. Our goal is to match each of these new participants to one of the M groups by comparing the rule vector of the participant to the rule vectors of the groups. The group that best matches the new participant is given by the following equation:

$$ m^{*} = \underset{m=1,\dots,M}{\text{argmin}}\; \text{dist}(i,m) $$
(8)

where m* is the group to which participant i is assigned, and dist(i,m) is the distance between participant i and group m, computed using (5) with the rule vector of participant i (u_i) and the vector of group m (g_m).
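Equation (8) amounts to a nearest-centroid assignment, which can be sketched as follows. The group vectors and the new participant's vector are invented, and groups are indexed from 0 here rather than from 1.

```python
from math import sqrt

def euclidean(u, v):
    """Eq. (5) applied to a participant vector and a group vector."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def assign_group(u_i, group_vectors):
    """Eq. (8): index of the group whose rule vector is closest to u_i."""
    return min(range(len(group_vectors)), key=lambda m: euclidean(u_i, group_vectors[m]))

group_vectors = [[0.7, 0.3, 0.0], [0.0, 0.2, 0.8]]  # hypothetical g_1 and g_2
new_participant = [0.1, 0.3, 0.6]                   # u_i after the first week
best = assign_group(new_participant, group_vectors)
print(best)
```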

After the group assignment process (which happens at the end of the first week), the intervention phase begins. The following will clarify how adaptive feedback works.

Every new (random) sample completed by a participant is checked for a match with one of the pre-existing rules within the eating profile of the participant (using the implemented decision tree algorithm). If there is a match, the participant receives a warning and a feedback message via the application. Note that these feedback messages can only occur after a random sample is completed by the participant, and only when the application detects that the participant is likely to eat something unhealthy in the time period directly following the random sample. This process is shown in Fig. 5a.
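The matching step can be sketched as below. The representation is an assumption: a rule is stored as a mapping from attribute name to the set of values that trigger it, and attributes that a rule does not mention may take any value. The attribute names and the example rule are hypothetical, loosely modeled on a rule that fires for watching TV with high positive emotions, a craving for something unhealthy, and evening hours.

```python
def rule_matches(rule, sample):
    """True if every attribute constrained by the rule has an allowed value
    in the sample; unconstrained attributes are ignored."""
    return all(sample.get(attr) in allowed for attr, allowed in rule.items())

def check_sample(sample, ruleset):
    """Return the first unhealthy-eating rule triggered by a random sample,
    or None if no warning should be issued."""
    for rule in ruleset:
        if rule_matches(rule, sample):
            return rule
    return None

# Hypothetical rule and sample (attribute names are illustrative).
rule_tv = {"activity": {"watching TV"}, "positive": {"High"},
           "craving_type": {"unhealthy"}, "time": {"evening"}}
ruleset = [rule_tv]
sample = {"activity": "watching TV", "positive": "High", "craving_type": "unhealthy",
          "time": "evening", "location": "home", "negative": "No"}

triggered = check_sample(sample, ruleset)
if triggered is not None:
    print("Warning: possible unhealthy eating event ahead")
```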

Fig. 5 Adaptive feedback process

Each group has its own set of rules, where a rule is a combination of variables that has statistically been shown to lead to unhealthy eating for participants with the same eating profile. Each participant receives warnings (and feedback messages) based on the rulesets of the group that he/she belongs to. To allow for individual tailoring, rules that have shown to be statistically important to the participant during the first week of data collection, but do not belong to the group rule set, are included in the set of rules that are used for providing feedback to this particular participant. In our setup, we chose to include one participant-specific rule (but this can be extended to more). This process is shown in Fig. 5b.

4 Results

4.1 Derivation of rules

Given the dataset from the 57 overweight participants of study I, we extracted 65 significant rules (36 leading to healthy eating and 29 to unhealthy eating) using the algorithm described in Section 3. An example of what a decision tree looks like can be seen in Fig. 6. Given the decision tree structure, we follow every path that leads from the root to a leaf and infer one rule per leaf (in this example we expect six rules: four that lead to healthy eating (H) and two that lead to unhealthy eating (U)). The split condition is shown on each node: if it is “true” (i.e., “yes”) we take the left branch, otherwise we take the right branch. The six rules extracted from Fig. 6 can be found in Fig. 7.
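Reading one rule per leaf corresponds to enumerating all root-to-leaf paths, as in the sketch below. The nested-dictionary tree format and the small three-leaf tree are hypothetical and are not the tree of Fig. 6.

```python
def extract_rules(node, path=()):
    """Yield (conditions, prediction) for every root-to-leaf path.
    Each condition is a (test, outcome) pair, where outcome is True for
    the 'yes' (left) branch and False for the 'no' (right) branch."""
    if "predict" in node:  # leaf node
        yield list(path), node["predict"]
        return
    test = node["test"]
    yield from extract_rules(node["yes"], path + ((test, True),))
    yield from extract_rules(node["no"], path + ((test, False),))

# Hypothetical tree with two split nodes and three leaves.
tree = {
    "test": "craving is unhealthy",
    "yes": {"test": "time is morning",
            "yes": {"predict": "H"},
            "no": {"predict": "U"}},
    "no": {"predict": "H"},
}
rules = list(extract_rules(tree))
for conditions, prediction in rules:
    print(conditions, "->", prediction)
```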

Fig. 6 Decision tree example: H/U refers to prediction of healthy/unhealthy eating event, respectively. For the rest of the abbreviations see Table 2

Fig. 7 Rule examples derived from tree of Fig. 6

For example, the rule corresponding to the far right leaf (i.e., the last rule in Fig. 7) is activated when a participant completes a sample, has a craving for something unhealthy, and the time is after 1200 (noon/afternoon, evening), resulting in a warning about a “possible” unhealthy eating event. For this rule, the rest of the variables are irrelevant (although they are still filled in by the participant). Weak rules (with either low coverage of the data samples or low discriminating capability over the data) are pruned and the corresponding nodes are removed from the tree. The actual tree covering the whole dataset (pruned, but still dense and complex) can be seen in Fig. 8, and readers can deduce all the rules using a similar process.

Fig. 8 Full decision tree: H/U refers to prediction of healthy/unhealthy eating event, respectively. For the rest of the abbreviations see Table 2

4.2 Profiling and grouping

After the extraction of the rules, the profiling process takes place. Every participant is represented by a 65-dimensional vector, and participants are clustered using hierarchical agglomerative clustering (HAC, UPGMA variant). The results of HAC can be found in Fig. 9. Since clustering is an unsupervised algorithm, the optimal number of groups has to be decided using intrinsic evaluation criteria [27]. The goal of the clustering process is to create groups in which participants are similar to each other (based on their rule vectors, i.e., their eating behavior), while participants that belong to different groups are as different from each other as possible. In our case, multiple criteria (see Table 3) suggested that the optimal number of clusters is six (groups are denoted with different colors in Fig. 9). Moreover, rules were tested in order to assess their significance in the clustering process and were found to be significant.

Fig. 9 Clustering process. Grey: group 4, purple: group 5, yellow: group 6, red: group 1, blue: group 2, green: group 3

Table 3 Clustering criteria

Using the group information, we follow the process of Fig. 4 to form the group rulesets (i.e., the sets of rules that will be used for providing adaptive feedback to participants assigned to each of these groups). Some quantitative characteristics of the groups can be found in Table 4. Column 3 shows how many rules are active in the ruleset of the group (i.e., how many rules are included for the intervention); column 4 is the percentage (average per person in the group) of the triggered rules that led to unhealthy eating; column 5 is the percentage (average per person in the group) of triggered rules, computed as the number of random samples that led to a rule activation over the total number of random samples; and column 6 is the average (per person in the group) number of rule triggers per day of study. From this table, it becomes apparent that group 2 features the most healthy-eating participants, since they tend to activate fewer unhealthy rules than any other group (5.30%), which explains the low rate of triggers per day (0.42). This is also supported by the fact that the percentage of unhealthy rules that are triggered (19%) is much lower than the percentage of healthy rules. In contrast, group 6 features the participants who activated mostly unhealthy rules (52.4%), and they also trigger almost two warnings per day (on average).

Table 4 Group characteristics

Table 5 shows how rules are formed in one of the groups. Each row of this table denotes a combination of circumstances (at time (t)) that leads to unhealthy eating (at time (t + 1)). When there is more than one possible value (like the different activities for the first rule), any of these values can trigger the rule, and when no value is reported (like for “crv” in the first rule), the variable can hold any value. For example, the first rule is activated when a participant watches TV + has high positive emotions + has a craving for something unhealthy + it is evening. Other values are still reported by the participant (e.g., being at home, at work, etc.) but they do not affect the triggering of the rule. Finally, some of the most prevalent qualitative characteristics of the behavior of participants within the groups are presented below.

Table 5 Example top-5 rules for group 1
Group 1: The “evening at home” eaters

Group 1 has the highest number of participants. Analysis of the group’s most significant rules and of the actual triggering statistics showed that most participants in the group triggered rules when they were at “home” (64.5% of rule triggers occurred at home), especially during “evening” hours (62.3% of rule triggers occurred during evening hours, and 87.8% of the rules activated at “home” were in the evening). Snacking at home in the evening summarizes the profile of this group.

Group 2: The “outdoors/social” eaters

Group 2 (already mentioned as the healthiest-eating group) features, among its most significant rules, cases that involve “outdoors” or “other” as locations and “socializing” as activities. More specifically, these characteristics (“outdoors,” “other,” “social” for the locations and “outdoors,” “socializing” for the activities) dominated the rules that triggered warnings for unhealthy eating (at least one of them was present in every rule). Notably, these characteristics were not present in the rules found to lead to healthy eating behavior, which supports our hypothesis.

The rule-triggering findings for group 2 agree with the “healthy-eating” assumption, since they indicate that these participants eat unhealthily only when they are out (e.g., in a restaurant, bar, etc.) and/or in the presence of others (which acts as a social influence factor as well).

Group 3: The “circumstances-driven” eaters

Group 3 features the highest number of rules (15), meaning that behavior within the group is more diverse (and also based on more complex rules). Analysis of the triggered rules reveals many different combinations of activities and locations that trigger many different rules. Examples include “Computer-related/working and home,” “traveling and outdoors,” and “other and socializing.” These combinations also appear to be independent of the food craving value, since in the majority of triggered rules (64.8%) food craving was reported to be low.

Group 4: The “very-occasional” eaters

Group 4 is the group with the smallest number of participants and is considered to gather participants who do not fit well in any of the other groups. It features very specific rules that are applicable to other groups as well but are more prevalent here, e.g., the rule that covers circumstances like “computer-related” and “watching TV,” negative emotions, and high food craving. This rule, with these specific values, was present for all participants in this group.

Group 5: The “after-activity” snackers

Group 5 has the distinctive characteristic that unhealthy eating mostly follows either healthy cravings or no cravings at all (86% of the triggered rules). A closer look at the rule triggers revealed that activities within the house (“high level in, low level”) or “traveling” moments lead to unhealthy snacking despite the not-unhealthy cravings. Because of this specific characteristic, this group is not expected to gather many people.

Group 6: The “unhealthy-cravings satisfaction” eaters

Group 6 features significant rules that are governed by the presence of unhealthy cravings leading to unhealthy eating. Regardless of emotions and time of day, these participants tend to indulge their unhealthy cravings (88.9% of the rules triggered in this group reported unhealthy cravings before an unhealthy eating event) in various locations and while performing different activities. Not surprisingly, this is the group with the most triggers per day (almost 2).

It should be noted that some rules overlap between groups (e.g., the first rule of group 1 is also present in three other groups). This does not affect performance, since triggering of the rules is based (mostly) on different combinations of variables. Besides, some rules cover generic cases and can be used as general warnings, even if the participant did not report that rule previously. Furthermore, rules that overlap between groups and rules that cover multiple cases provide a way to generalize to new participants (especially when their data are not available) and to previously unseen examples.

On the other hand, each new participant will receive feedback based on the ruleset of the group that he/she is assigned to and on the rule that is most significant to that specific person (provided that this rule is not already present in the group ruleset). In this way, we are able to offer a degree of tailoring for every participant.
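As a minimal sketch of this semi-tailoring step, the feedback ruleset of a participant can be assembled from the group ruleset plus the participant-specific rule when it is not already included. The function name and the rule identifiers below are hypothetical and serve only as an illustration.

```python
from typing import List, Optional


def build_feedback_ruleset(group_rules: List[str],
                           personal_rule: Optional[str] = None) -> List[str]:
    """Combine the group ruleset with the participant's most significant rule.

    Rules are represented by hypothetical string identifiers. The personal
    rule is added only if it is not already part of the group ruleset.
    """
    ruleset = list(group_rules)  # copy, so the shared group ruleset is not modified
    if personal_rule is not None and personal_rule not in ruleset:
        ruleset.append(personal_rule)
    return ruleset


# Hypothetical identifiers for illustration only.
group_1_rules = ["tv_evening_craving", "home_evening_snack"]
print(build_feedback_ruleset(group_1_rules, "work_stress_afternoon"))
print(build_feedback_ruleset(group_1_rules, "home_evening_snack"))
```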

4.3 Evaluation of groups

New participants (of study II) were assigned to one of the mentioned groups and received feedback according to the ruleset of their group, enhanced (if necessary) with a participant-specific significant rule. Study II is ongoing, but data for the first group of the randomized controlled trial are available, and we present them here as a way to validate our approach. In total, 47 participants who received the Think Slim intervention were assigned to the six groups based on their rule activations during the first week of the study (a monitoring-only week). Statistics for this assignment can be found in Table 6. Column 3 shows the percentage (average per person) of rules that were triggered over the number of random samples completed (i.e., how often a participant of the new study triggered an unhealthy rule). Column 4 shows the increase in variance caused by adding new participants to the existing clusters (i.e., how group homogeneity is affected by adding more subjects); it is computed by comparing the equivalent rule vectors of all participants (new and old) in a group against those from study I.

Table 6 Group assignments for new participants

The percentages of unhealthy rule triggering are in accordance with what was found during study I (group 6 has the highest and group 2 has the lowest), which suggests that new participants behave in the same way as the study I participants of the same group. Moreover, the small increase in the within-group distance (without and with study II participants) shows that groups remain cohesive even after adding new participants, which supports the validity of having six groups and indicates that generalization over new subjects is possible.
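One possible way to quantify such a change in cohesion is sketched below. It assumes that each participant is represented by a numeric rule vector and that cohesion is measured as the mean squared Euclidean distance to the group centroid; both the measure and the function names are assumptions made for illustration and may differ from the computation behind Table 6.

```python
from typing import List, Sequence

Vector = Sequence[float]


def within_group_variance(vectors: List[Vector]) -> float:
    """Mean squared Euclidean distance of the rule vectors to their centroid."""
    n = len(vectors)
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / n for i in range(dim)]
    total = sum(
        sum((v[i] - centroid[i]) ** 2 for i in range(dim)) for v in vectors
    )
    return total / n


def variance_increase(old: List[Vector], new: List[Vector]) -> float:
    """Change in within-group variance after adding new participants."""
    return within_group_variance(list(old) + list(new)) - within_group_variance(old)


# Hypothetical rule vectors: two study I participants and one new participant.
study_one = [[0.0, 0.0], [2.0, 0.0]]
study_two = [[1.0, 3.0]]
print(variance_increase(study_one, study_two))
```

A value close to zero would indicate that the new participants lie near the existing members of the group, whereas a large positive value would indicate reduced homogeneity.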

The intervention for these participants lasted 6 weeks, during which they received feedback according to the rules they triggered. To evaluate the effect of adaptive feedback in the intervention, we measure how many rules participants triggered throughout the 6 weeks of intervention and divide these numbers by the number of random samples filled out by participants during each individual week. Results can be found in Fig. 10. This figure shows that during the first weeks of intervention (weeks 1 through 4) there was considerable instability in the participants’ rule triggering (see the outliers and the difference between the median and the mean value). However, during the last 2 weeks of intervention (when participants have already triggered several rules and received appropriate feedback), the average rule triggering drops (below the values of the first weeks) and the behavior of all participants in regard to rule triggering becomes more homogeneous (smaller variances). No outliers are observed in these 2 weeks, and more participants have zero rule triggers in the last week (20 out of 47) than in the first week (8 out of 47) (Fig. 10).
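The weekly measure described above can be sketched as follows. The data layout (one pair of trigger and sample counts per participant) and the function names are assumptions for illustration; participants without completed random samples in a week are skipped here, which is a choice of this sketch and not necessarily of the study.

```python
from statistics import mean, median
from typing import Dict, List, Optional, Tuple


def trigger_rate(n_triggers: int, n_samples: int) -> Optional[float]:
    """Rule triggers divided by completed random samples (None if no samples)."""
    if n_samples == 0:
        return None
    return n_triggers / n_samples


def summarize_week(counts: List[Tuple[int, int]]) -> Dict[str, float]:
    """Summarize one week from (triggers, samples) pairs, one per participant."""
    rates = [trigger_rate(t, s) for t, s in counts]
    valid = [r for r in rates if r is not None]
    return {
        "mean": mean(valid),
        "median": median(valid),
        "zero_trigger_participants": sum(1 for r in valid if r == 0),
    }


# Hypothetical counts for four participants in one week.
week_counts = [(0, 10), (2, 10), (4, 10), (0, 0)]
print(summarize_week(week_counts))
```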

Fig. 10 Rule triggering per intervention week

5 Discussion

In this paper, we presented a framework for using machine learning algorithms (classification and clustering) to build an adaptive feedback module for an e-coach mobile application about eating behavior. The techniques (classification decision trees and hierarchical agglomerative clustering) were specifically developed and tailored for ecological momentary assessment data collected through a mobile application. The overall work can be summarized as answering the following questions: how to build a classification system that predicts when people are prone to unhealthy eating, and how to use a profiling system to provide feedback regarding possible unhealthy eating events. The algorithms were developed by taking into account the need for both generalization (how to apply data-driven techniques to people who have not previously been assessed by the application) and personalization (how to provide a means for person-specific semi-tailoring of the feedback).

Our approaches can be adjusted with respect both to the variables measured (and how they are measured) and to the degree of tailoring (one can increase personalization over generalization). Moreover, results from running the intervention study on participants using our application show that the success of our approach is twofold. First, participants are assigned to groups based on their eating behavior (which supports the validity of having six groups). Second, there is a decreasing trend in the actual percentage of rule triggers, observed during the last 2 weeks of intervention, leading to more stable behavior across all participants. We interpret this as participants taking into account the warnings they receive from the application, identifying the moments that lead to unhealthy eating behavior, and not repeating them (at least not as often).

There are two main limitations in the developed approach. First, it relies solely on questionnaires (and thus on users’ responses), and although EMA/EMI techniques minimize recall bias, the approach is still rather intrusive (participants must fill in several questionnaires throughout the day, although this takes place in their natural environment). However, the overall approach can easily be extended to include information from sensors (e.g., to assess stress levels or other emotions) or GPS data (e.g., to assess location information). Second, we chose to balance between full generalization (treating all participants as belonging to one group) and full personalization (taking into account only participant-specific data). The advantage of the first case is that, by using data from different people, a population-based feedback module can be built quickly (owing to the data aggregation). On the other hand, the limitation of such an approach is that full aggregation of all information smooths out the individual characteristics that emerged in the group analysis we performed. Full personalization is the ultimate (and perhaps ideal) goal of many e-coach systems, but it requires that enough data be available for each participant. Collecting enough data to build a personalized rule-based feedback system will require additional time and effort from participants before they start receiving useful personal advice.

We are currently analyzing the main outcomes of the intervention. Further evaluation of how rule triggering changes over each week of intervention is needed in order to justify the success of the approach on a personal and a group level. For this reason, comparisons before and after the intervention period are being considered. Finally, we are exploring opportunities for transitioning from a semi-tailored to a fully tailored approach to adaptive feedback. We believe that an ideal e-coach system should start from a semi-tailored approach (like ours); as soon as the application gathers enough individual data from the user and “learns” their profile better, fully tailored approaches become possible. For this reason, the exact moment at which participants have provided enough data has to be identified; the algorithms can then easily be adjusted for each individual, since the framework we developed is generic.