1 Introduction

The Internet enables ubiquitous access to a vast array of online products and services. However, while this offers users the benefit of greater choice, finding a preferred product or service when presented with seemingly endless options requires significant exploration time. To attenuate this problem of information overload, recommender systems (RS) were introduced to supply a user with targeted results (i.e. recommendations) based on that user’s individual preferences and the preferences of other users with similar characteristics (Aggarwal 2006).

While there is a considerable literature on recommender systems, across a variety of domains, the effort focused on health and well-being recommendation is comparatively scarce. The vast majority of these works focus on a single aspect of health, such as exercise (Berndsen et al. 2017; Pilloni et al. 2017; Reimer et al. 2016) or healthy food intake (Achananuparp and Weber 2016; Akkoyunlu et al. 2017; Schäfer 2016), in isolation. Another limitation of previous work is the lack of flexibility to recommend “tailored” items. In some domains, such as retail (Wu et al. 2019) or entertainment (Gómez-Uribe and Hunt 2015), this is not an issue because recommendations (i.e. products or movies) are intrinsically static, with unchanging properties. By contrast, recommendations in a personalised well-being domain—meal ingredients and serving sizes; duration and intensity of exercise—should be configurable.

We argue that the interrelationship between daily meals and exercise plays a fundamental role in general well-being and the adoption of healthy living habits. Therefore, it is crucial to consider this interrelationship in personalised well-being recommendation. Our present study constitutes—to the best of our knowledge—one of the first research efforts in this direction. We also produce recommendations that are dynamically tailored to users’ preferences, such that we capture not only what to recommend, but also how much. To achieve this, we employ a genetic algorithm (GA). While GAs have been used in recommender systems before (e.g. Caldeira et al. 2018; Lv et al. 2015), this technique has not been used to realise the potential of producing highly tailored recommendations, which are necessary in domains such as well-being.

To overcome the aforesaid limitations in recommender systems for well-being, we propose a novel approach:

  1. We first introduce EvoRecSys, a conceptual recommendation framework based on an evolutionary multi-objective optimisation problem, where constraints are modelled upon users’ preferences, their physical condition, and their well-being goals. The underlying evolutionary algorithm explores the search space of all possible combinations of recommendable items, which are in the form of meal-exercise bundles. These bundles capture the interrelationship between eating and exercising in the domain of personal well-being. Output solutions (i.e. items to recommend) balance what the user likes with what the user needs in order to achieve their well-being goals. For example, if a user wants to lose weight then the recommended food items will not only meet user preferences, but also keep within strict calorie limits; while recommended exercise will consider calories burned, so that weight loss is ensured. At the same time, general health guidelines, such as controlling the amount of saturated fats, sugars, etc., are observed.

  2. To demonstrate the suitability of the EvoRecSys framework, we instantiate it as a model for general well-being, with four possible well-being goals as the personalised constraints for the user, a well-defined quantity of items to be recommended, and a specific evolutionary implementation.

  3. Our instantiated model incorporates principles from collaborative filtering recommender systems. Using a similarity metric, our model identifies users that are similar to the target user in terms of preferences, physical condition, and well-being goal. We show that integrating principles from collaborative filtering helps deal with situations where user interaction information is incomplete and enhances recommendation diversity, which is fundamental to motivating users during their well-being journey.

Our implemented model is evaluated and validated in two respects: (1) we measure the algorithmic performance and optimality of the underlying evolutionary approach for different parameter settings and by benchmarking against several baseline implementations, demonstrating the model’s ability to produce efficient and optimal recommendations that are semantically meaningful; (2) we conduct a user study with more than 200 participants and demonstrate that personalised recommendations are positively perceived under four criteria: health, diversity, serendipity, and attractiveness. An additional A/B test shows users’ tendency to prefer recommendations produced by the proposed EvoRecSys implementation over those of a CF-based implementation.

EvoRecSys advocates the use of genetic algorithms (GAs) as the core evolutionary technique to optimise recommendations. Although this paper illustrates a model instantiated upon EvoRecSys, different models emulating our conceptual framework can be seamlessly built by defining the necessary input data and objectives to optimise, thereby adapting it to the intended users and aims of the model in question. Furthermore, as opposed to recommending static immutable items, which conventional RS approaches usually deal with, EvoRecSys enables the generation of dynamic recommendations. In summary, this study provides the first effort in establishing a conceptual framework for producing configurable recommendations that incentivise users’ well-being using evolutionary computing as the core of the recommendation engine.

This paper is structured as follows. Section 2 describes related research on recommender systems for food and exercise, along with studies that use GAs in the recommendation process. Section 3 explains the architecture and key elements that compose the general EvoRecSys framework. Section 4 presents a concrete implementation of EvoRecSys for health and well-being recommendations. Section 5 analyses algorithmic performance. A user study is then presented in Sect. 6. Finally, Sect. 7 concludes.

2 Related work

There exist a number of recent research studies on RS for health and well-being. However, unlike our proposed approach, existing studies tend to consider food and physical activity recommendations in isolation, rather than as a combined bundle. Section 2.1 outlines relevant work focused on food recommendation. Section 2.2 describes research related to physical activity recommender systems. Section 2.3 shows studies that incorporate a genetic algorithm as a complementary technique or an extra step during the recommendation process. Finally, Sect. 2.4 presents studies on recommender systems whose output contains bundles of more than one recommendable item. Our contribution to the literature is summarised in Sect. 2.5.

2.1 Food recommender systems

In the scope of personalised food recommendation, some approaches have focused on food substitution. For instance, Achananuparp and Weber (2016) hypothesised that food items consumed in the same context can be seamlessly replaced by each other (e.g. a tuna sandwich can be substituted for a ham sandwich if both are consumed with a salad), thereby allowing for greater diversity in daily meals. The “substitutability” between two food items was measured using cosine similarity. Two vector representations of food items were explored: a positive pointwise mutual information (PPMI) matrix and singular value decomposition (SVD), with the latter obtaining the best performance. This method produces the top-10 substitute candidates for each food item. The food data used in this work came from the food consumption diaries of 9896 users of the MyFitnessPal (MFP) web platform.
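As a minimal sketch of the cosine-similarity ranking described above, assume that each food item is already represented by a numeric vector (for instance a row of a PPMI matrix or an SVD embedding). The item names, the three-dimensional vectors, and the helper `top_substitutes` are invented for illustration and are not taken from the cited study.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

# Hypothetical context vectors for three food items (illustrative values only).
food_vectors = {
    "tuna sandwich": [0.9, 0.1, 0.4],
    "ham sandwich": [0.8, 0.2, 0.5],
    "ice cream": [0.0, 0.9, 0.1],
}

def top_substitutes(target, vectors, k=10):
    """Rank all other items by similarity to the target and keep the top k."""
    scores = [
        (other, cosine_similarity(vectors[target], vec))
        for other, vec in vectors.items()
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]

ranked = top_substitutes("tuna sandwich", food_vectors)
```

With these toy vectors, the ham sandwich is ranked above ice cream as a substitute for the tuna sandwich, because its vector points in almost the same direction.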

Akkoyunlu et al. (2017) argued that it is possible to recommend healthy food substitutes that match user preferences within the same context; where context is defined as the set of other food items that are consumed with the target food. For example, in the meal \(\{{\textit{tea}}, {\textit{bread}}, {\textit{juice}}\}\), the context of \({\textit{tea}}\) is \(\{{\textit{bread}}, {\textit{juice}}\}\). Using the French database, INCA 2, which contains food diaries of 2624 adults, the recommender model generates a graph, with nodes representing meals in the database. Under this design, substitutable nodes—those belonging to the same dietary context—are adjacent and form a fully connected sub-graph, or clique. Nodes are considered highly substitutable if consumed in similar contexts, and less substitutable if consumed together.
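The notion of context used above can be illustrated with a short sketch. It assumes that a meal is a plain set of item names and groups items that were observed in exactly the same context. This exact-match grouping is a simplification chosen for illustration and is not the graph construction of the cited work. The diary entries and function names are hypothetical.

```python
from collections import defaultdict

def context_of(item, meal):
    """The context of an item is the rest of the meal it was eaten with."""
    return frozenset(meal) - {item}

def items_by_context(meals):
    """Group items that were observed in exactly the same context."""
    groups = defaultdict(set)
    for meal in meals:
        for item in meal:
            groups[context_of(item, meal)].add(item)
    return groups

# Invented diary entries: two meals that differ only in the hot drink.
meals = [{"tea", "bread", "juice"}, {"coffee", "bread", "juice"}]
tea_context = context_of("tea", meals[0])
groups = items_by_context(meals)
```

Here tea and coffee share the context {bread, juice}, so they fall into the same group and would be treated as substitution candidates.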

Caldeira et al. (2018) suggest that meal recipes can be recommended by considering their nutritional value, harmony of ingredients, and the availability of the ingredients. This research uses the Non-dominated Sorting Genetic Algorithm II (NSGA-II), introduced by Deb et al. (2002), which is an evolutionary algorithm with the following features: (i) elitism, to preserve the best solution of the current population in the next generation; (ii) crowding distance techniques, to provide diversity in solutions; and (iii) non-dominated sorting techniques, to maintain an archive of Pareto-optimal solutions. Using this algorithm, a list of suggested meal recipes is found by considering the number of portions, quantity of ingredients, and tastiness. Their approach also makes it possible to specify a food style, such as vegetarian or vegan. The set of recipes used in this study was collected from the Brazilian website TudoGostoso (Footnote 1).
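The idea of non-domination that underlies feature (iii) can be sketched as follows. This is only the filter that extracts the first Pareto front, not NSGA-II itself (there is no ranking into successive fronts and no crowding distance). All objectives are assumed to be maximised, and the recipe scores are invented for illustration.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are maximised here)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated(points):
    """Return the points that no other point dominates (first Pareto front)."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q is not p)
    ]

# Hypothetical (nutritional value, ingredient harmony) scores for four recipes.
recipes = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.1, 0.9)]
front = non_dominated(recipes)
```

In this toy example the recipe scored (0.5, 0.5) is dominated by (0.6, 0.6) and is excluded, while the other three represent different trade-offs and remain on the front.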

Recently, Musto et al. (2020) proposed a knowledge-based strategy that incorporates “holistic” user profile information in a popularity-driven recipe recommender algorithm. Profiles include user data such as demographics, age, gender, and weight, as well as food requirements, physical activity level, and body mass index, in order to re-rank popularity-based recommendations so that user-related health factors are considered. This solution is different from state-of-the-art food RS in that it handles knowledge about users’ physical health and behavioural characteristics rather than considering diet preferences alone. However, we note that while physical activity data are used as input to the model, only food recipes are recommended (unlike the approach that we propose in this paper, where we recommend a bundle containing interrelated food items for meals and physical activities).

2.2 Recommender systems for physical activity

In terms of physical activity recommender systems, several studies aim to change user behaviour towards a healthy physical lifestyle. For example, Reimer et al. (2016) advocated for users to change their habits in a tailored manner in order to reach exercise goals. The framework proposed in this research motivates a user through “nudges” (Thaler and Sunstein 2008). There are various types of nudges, such as suggestions, praise and rewards. The nudges accepted by the user are used to create a personalised profile that encourages the user to reach the goal. Furthermore, this framework utilises a collaborative filtering technique to generate recommendations focused on the goals. Users’ socio-demographic data and their past behaviour are used to characterise the feature vector that represents each user. Similarity between two users is calculated using cosine similarity.

Some research tackles the domain of sports. For instance, Pilloni et al. (2017) argue that it is possible to predict when a user is going to abandon an exercise routine based on their previous behaviour and thus prevent it. The proposed model uses a machine learning algorithm as the core of the recommendation process. The user’s previous behaviour is used as a training vector with 34 features, including covered distance, workout duration, and rest time. Once the algorithm is trained, it is able to predict whether a user is going to abandon the routine. If so, a recommendation encouraging the user to continue the routine is triggered. Otherwise, the system predicts the user will not abandon the routine. The study tested four classification algorithms: (i) random forest, (ii) AdaBoost, (iii) extra trees, and (iv) multi-layer perceptron, of which random forest obtained the best performance. Data used for the analysis were taken from the u4fit platform (Footnote 2).

Following the same path, Berndsen et al. (2017) showed that amateur runners (those without advanced training or access to a coach) can improve performance using elite runners’ behaviour as a target to follow. Two models, K-nearest neighbours (KNN) and extreme gradient boosting (XGB), were trained to predict marathon times using users’ finishing times at various distances, with XGB shown to have higher performance. The predicted times are the basis of the recommendations, which are generated by collaborative filtering. For instance, if a runner finishes a 10 km race in 63 min, while an elite runner takes 46 min, the recommendation would be, “you have to train a little bit more”. The dataset employed in this work was taken from diverse websites where athletes are allowed to declare their race times, such as the website RunnersWorld (Footnote 3). Additionally, the work explored ways to best present recommendations to runners in order to nudge their training behaviours.

2.3 Evolutionary algorithms in recommender systems

Evolutionary computation techniques have been used in many domains, including, to name a few, Computer Science (Dutta et al. 2020; Gunasegaran and Cheah 2019), Geology (Rezaei and Asadizadeh 2020), Biology (Guo et al. 2020), and Chemistry (Buchely et al. 2020). In the area of RS, genetic algorithms (GAs), one of the primary evolutionary computing techniques, have rarely been applied to date.

One of the most typical applications of RS is in e-commerce. For example, Lv et al. (2015) proposed a framework that helps the standard recommendation techniques (collaborative filtering and content-based) to improve the quality of recommendations through a traditional GA and a class-based ontology, which is built by considering an item that the user is interested in. For instance, if the user is interested in a book, the class would be Book and some of its attributes would be title and publication date. The workflow of this framework consists of: (i) retrieving the items in the user’s shopping cart from the log file of an e-commerce site; (ii) mapping each attribute as a class to build the ontology; (iii) using the GA to optimise the different feature weights in the set of items; (iv) using the coefficients calculated in the previous step to cluster items; and (v) recommending the nearest cluster to the product that the user was interested in during the last visit. To evaluate the framework, the MovieLens dataset (Footnote 4) was used, showing better performance than standard collaborative filtering and content-based techniques.

Hassan and Hamada (2018) also used a GA as an optimisation step within a multi-criteria RS to compare well-known RS methods (collaborative filtering and content-based) with GA-based methods. In the study, three variations of GA were used: (i) a standard GA, in which a population of candidate solutions (called individuals) to an optimisation problem is evolved towards better solutions, and each candidate solution has a set of properties that can be mutated and altered by genetic operators with a fixed probability of occurrence (Whitley 1994); (ii) an adaptive GA, in which population information in each generation is used to adjust the probability of both mutation and crossover in order to maintain population diversity and sustain the convergence capacity (Srinivas and Patnaik 1994); and (iii) a multi-heuristic GA, in which the principal features of two or more heuristic approaches are combined to form a single algorithm for enhancing performance and preventing premature convergence during the search process. Crossover and mutation rates are initially set high, and then reduced slowly over time. Results demonstrated that GA-based approaches could outperform collaborative filtering and content-based RS methods when tested using data from the Yahoo! Movie website (Footnote 5).

Karabadji et al. (2018) presented another multi-criteria based study and found that a GA can suggest a suitable set of neighbours in a collaborative filtering RS. The research demonstrated that a GA (i) alleviates the parameter-setting problem of selecting the N most similar neighbours and (ii) guarantees diversity by selecting groups of individuals that are different. In this way, both high similarity and high diversity are achieved. The model was tested using a dataset containing 239 ratings by 100 customers from 17 Algerian insurance companies and the MovieLens dataset (Footnote 4).

Finally, Cui et al. (2017) argued that, while losing a certain degree of precision, it is possible to improve the metrics of diversity and novelty by adding a multi-objective GA during the recommendation process. The GA represents each individual as a 1-D integer array, where each locus in the genotype represents an item that can only appear once in the recommendation list. While the mutation operator has a standard functionality, the crossover operator is designed to preserve a user’s habits such that if an item appears frequently in the user’s recommendation lists, the probability of preserving it unchanged during the crossover process will increase. For evaluation, two objective functions are used: (i) accuracy and (ii) diversity. The authors validate the performance of their GA in combination with a set of traditional recommendation algorithms. Results demonstrate that the combination of their GA and the recommendation algorithms can achieve a good balance between precision and diversity.

2.4 Bundling in recommender systems

In recommender systems, an output recommendation may contain multiple items, which we call a bundle. For instance, Rapti et al. (2014) introduced an agent-based approach for the generation of personalised product bundles for enterprise networks. The core process includes identifying complementary associations between products and building bundles according to customer preferences. Furthermore, the approach is able to adapt itself if the environment changes: customer profile modifications, product availability, and rule and constraint rule diversity. The authors provide an example under an e-Furniture context, employing an agent-based system in a network of enterprises that manufacture.

Bundling is also used in other recommender system domains such as the telecommunication industry. For example, Dragone et al. (2018) present a system whose outputs are combined services (mobile connectivity, broadband allocation, TV on demand, etc.) and electronic device plans (smartphones, tablets, TVs), selected according to the customer’s needs. The system uses the constructive preference elicitation framework, which allows bundle offers to be modelled as a defined set of variables and constraints. By using constraint optimisation, the system generates high-utility recommendations. Furthermore, an empirical validation study involving 134 participants is presented. Results show that the outputs of the system were considered more satisfactory than those obtained with standard techniques used in the market.

Zanker et al. (2010) applied bundling to the tourism domain, with recommendations containing bundle collections of accommodation, activities, and restaurants. This system uses a constraint satisfaction problem (CSP) solver, which invokes numerous recommender systems to propose a ranked list of items for each product category based on the user model and available community knowledge. The authors evaluate their system using an example scenario consisting of 5 product classes with 30 different product properties and 23 representative constraints. The evaluation only focuses on computation time and does not consider the quality of the final recommendations. Results demonstrate the system is able to generate bundles within a time period that is acceptable for typical e-commerce situations.

2.5 Contribution

While bundle recommendations have been explored in a number of domains, we present the first research that considers recommendation bundling in the health and well-being domain. We focus on linking two aspects that have previously been studied separately: (i) nourishment (see Sect. 2.1) and (ii) physical activities (see Sect. 2.2). We combine physical activity and diet as this aligns with the weight loss/management goal of the user and is designed to illustrate how lifestyle bundles can be incorporated to provide more holistic advice. Elliot and Hamlin (2018) have presented evidence that people like to treat healthy lifestyle in a collective manner when making efforts to change their behaviour and improve health (also see Johns et al. 2014).

Also, while other recommender systems have included a GA, often the role of the GA is limited to a single step inside the recommendation process (see Sect. 2.3). By contrast, we introduce a novel approach for generating recommendations entirely based on a GA. Our approach also enables us to combine items with a high level of granularity, such that the attributes of each recommendable item can be optimised throughout the evolutionary process. This provides an opportunity to offer highly tailored recommendation items based on user preferences.

Furthermore, as previously stated in Sect. 2.3, GAs have been successfully used to improve traditional RS techniques such as collaborative filtering (Hassan and Hamada 2018; Karabadji et al. 2018) and matrix factorisation (Kilani et al. 2018). Thus, the presented research represents an effort to continue the integration path of GAs in the RS domain.

3 EvoRecSys: general description

This section introduces EvoRecSys (Evolutionary Recommender System), a conceptual framework that reformulates the recommendation problem as a multi-objective optimisation problem in which solutions are modelled by configurable items or groups of items to recommend to the end user. Under this approach, it is plausible to build recommendations focused on (i) reaching a specific well-being goal specified by the user and, at the same time, (ii) considering the user preferences. In other words, this framework creates recommendations that balance the user preferences against the types of food and physical activity the user should consider in order to reach a goal.

The remainder of this section describes the main elements of our proposed approach for personalised well-being. Section 3.1 shows the general framework architecture, the workflow model and a general description of the data sources that could be used by the framework during the evolutionary-recommendation process. Section 3.2 defines essential concepts related to the framework. Finally, Sect. 3.3 describes the key features of the core element of this framework: a genetic algorithm.

3.1 Architecture and workflow

EvoRecSys receives the user input (physical characteristics, well-being goals, and food and exercise preferences) and recommends meal and physical activity (PA) bundles that are tailored to the user through an evolutionary optimisation process. The architecture and workflow are presented in Fig. 1.

Fig. 1 Architecture of EvoRecSys for well-being

The main inputs of EvoRecSys are divided into four elements of user-related information:

  (i) Physical status and exercising habits. Data related to age, gender and body measurements of the user, as well as their frequency of exercising.

  (ii) Food category preferences. How much the user likes certain ingredients, predetermined types of food, etc. To capture this, a numerical scale can be used, for example a 5-point Likert scale. Due to the versatility of this element, it is possible to implement it in diverse ways. For instance, it might focus on specific dietary requirements, such as those of vegetarians, vegans or people with certain allergies.

  (iii) Preferences on types of physical activity. Information that describes how much the user likes certain types of PA. As in the previous element, it can be implemented by focusing on a predetermined set of physical activities, for example water activities, or sports where a ball is used. To measure the user preferences, a minimum–maximum-based numerical scale can be used, similar to the previous element.

  (iv) Well-being goal. A goal chosen by the user from a set of predefined goals focused on a specific well-being aspect to be improved. The goals can be set for handling either general circumstances (losing weight, maintaining weight, etc.) or specific ones (controlling chronic diseases), depending on the specific aims of the model implemented upon this general framework. Without loss of generality, our instantiated model in Sect. 4 focuses on general-purpose recommendations aligned with various well-being goals.

During the evolutionary recommendation process, the framework interacts with a data source that contains, at least, food data and PA data. Previous user preferences are also needed in models that gather users’ interactions and feedback over time. Due to the decoupled architecture of the framework, the data source can be any readily accessible database (e.g. myfitnesspal, u4fit, or Kaggle; see Footnote 6) as long as it contains the necessary data to perform the evolutionary recommendation. The framework outputs a list of K recommendations, which are tailored to the user preferences and the chosen well-being goal. Next, we describe the proposed evolutionary process and its core components.

3.2 Basic definitions

Genetic algorithms (GAs) are an optimisation technique inspired by natural selection. They operate by “evolving” a population of individuals, each representing a possible solution in the problem domain. Initially, the population is of poor quality and widely dispersed across the search space. Over time, the population gradually converges towards regions of the search space with better solutions, which in our setting are those closer to the user’s preferences and their health needs.

In the scope of this study, and based on existing coding schemes to represent individuals (Goldberg 1989), we encode individuals as meal-PA bundles, containing food attributes and PA features. A meal is defined as a set of N food items, and a PA item defines a physical exercise that could contain attributes such as the type of PA, duration, and intensity level. This is illustrated in Fig. 2. It is also feasible to define semantic rules that guarantee coherent meal structures (an example will be shown in the concrete implementation presented in Sect. 4).

Fig. 2 Structure of a bundle or recommendable item in the EvoRecSys framework. An individual contains \(K \ge 1\) bundles

Depending on the design decisions made to build a model upon this general framework architecture, the output can have different structures. For instance, the typical output is the best evaluated individual after the evolutionary process with K bundles. Another possible output would consider the top N individuals after the evolutionary process under the assumption that each individual would have one bundle.
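One possible way to express this encoding in code is sketched below. The class and attribute names (`FoodItem`, `PAItem`, `Bundle`, `Individual`, `serving_grams`, `calories`, and so on) and the sample values are assumptions made for illustration. The framework itself only prescribes that an individual holds \(K \ge 1\) bundles, each pairing a meal of N food items with a PA item.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FoodItem:
    name: str
    serving_grams: float  # hypothetical attribute: configurable serving size
    calories: float       # hypothetical attribute: energy for that serving

@dataclass
class PAItem:
    activity: str
    duration_min: float
    intensity: str        # e.g. "low", "moderate", "high"

@dataclass
class Bundle:
    meal: List[FoodItem]  # a meal is a collection of N food items
    pa: PAItem            # the physical activity paired with the meal

@dataclass
class Individual:
    bundles: List[Bundle]

    def __post_init__(self):
        # The framework requires K >= 1 bundles per individual.
        if len(self.bundles) < 1:
            raise ValueError("an individual needs at least one bundle (K >= 1)")

# Illustrative values only; they are not taken from any dataset.
breakfast = Bundle(
    meal=[FoodItem("oats", 60.0, 230.0), FoodItem("banana", 120.0, 105.0)],
    pa=PAItem("brisk walking", 30.0, "moderate"),
)
candidate = Individual(bundles=[breakfast])
meal_calories = sum(item.calories for item in breakfast.meal)
```

Keeping serving size and duration as explicit attributes is what would allow the evolutionary operators to adjust how much is recommended, not only what.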

3.3 The evolutionary process

Here, we describe the main steps of the genetic algorithm that drives evolution in the EvoRecSys architecture. First, a population of individual “recommendations” is initialised at random across the solution space. Each individual created must: (i) match user preferences; (ii) be within the intrinsic boundaries of the inputs defined; and (iii) have an interrelationship between food and PA items. To determine the performance of individual recommendations, a key element of a GA is the evaluation function or fitness function. It is inspired by the principle of natural selection, whereby the individuals best adapted to a given environment are more likely to survive and, hence, to transmit their genetic information to the next generation of individuals (Goldberg 1989). In order to provide suitable and consistent recommendations that meet (i) the user’s preferences and (ii) their well-being goals, we define the fitness function upon a set of restrictions or objectives to optimise. Let \(\mathcal {R} = \{\mu _1, \mu _2, \ldots , \mu _M\}\) represent the set of all possible restrictions to consider with \(M\ge 1\). A fitness function \(FF_i\) associated with a user \(u_i \in U\) with a goal \(\mathcal {G}_i\) is defined based on \(\mathcal {R}\) and their individual food-PA preferences \(\varPsi _{u_i}\). It assesses the degree to which a recommended bundle simultaneously meets \(\mathcal {G}_i\) and the user preferences.

$$\begin{aligned} {\textit{FF}}_i = {\textit{FF}}(\varPsi _{u_i};\mathcal {G}_i) = \phi \left( \varPsi _{u_i};\varGamma _i(\mu _{i,1}), \varGamma _i(\mu _{i,2}), \ldots , \varGamma _i(\mu _{i,M})\right) \end{aligned}$$
(1)

with \(\mathcal {G}_i = \{\mu _{i,1}, \mu _{i,2}, \ldots , \mu _{i,M}\}\). \(\varGamma _i(\mu _{i,j})\) is an aptitude function describing the degree to which restriction \(\mu _{i,j}\) \((j=1,\ldots ,M) \) is satisfied by the individual; \(\varPsi _{u_i}\) is a function that measures how much \(u_i\) preferences are met, and \(\phi \) is a combination function, e.g. an averaging or aggregation operator (Beliakov et al. 2007). For example, \(\varPsi _{u_i}\) could be a distance function between a representation of the user preferences and the properties of recommendable bundles in an individual. Figure 3 illustrates the definition and application of a fitness function \({\textit{FF}}_i\).
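To make Eq. (1) concrete, the following sketch instantiates \(\phi \) as the arithmetic mean and uses two toy aptitude functions. Everything specific here is an assumption made for illustration: the flat dictionary representation of an individual, the function names, the calorie thresholds, and the convention that every score lies in [0, 1] with higher values being better.

```python
from statistics import mean

def fitness(individual, preference_score, aptitude_functions, combine=mean):
    """Combine a preference score (Psi) with one aptitude score (Gamma)
    per restriction, using the aggregation operator `combine` (phi)."""
    scores = [preference_score(individual)]
    scores += [gamma(individual) for gamma in aptitude_functions]
    return combine(scores)

# Hypothetical individual, flattened to the quantities the scores need.
candidate = {
    "meal_calories": 550.0,
    "pa_calories_burned": 250.0,
    "liked_items": 3,
    "total_items": 4,
}

def preference_score(ind):
    """Fraction of recommended items that the user likes."""
    return ind["liked_items"] / ind["total_items"]

def calorie_limit_aptitude(ind, limit=600.0):
    """1.0 within the limit, falling linearly to 0.0 at twice the limit."""
    excess = max(0.0, ind["meal_calories"] - limit)
    return max(0.0, 1.0 - excess / limit)

def exercise_aptitude(ind, target=300.0):
    """Share of a target energy expenditure achieved, capped at 1.0."""
    return min(1.0, ind["pa_calories_burned"] / target)

score = fitness(candidate, preference_score,
                [calorie_limit_aptitude, exercise_aptitude])
```

Passing a different operator as `combine`, for example `min`, would turn the fitness into a worst-case measure in which a single unmet restriction dominates the evaluation.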

Fig. 3 Fitness (aptitude) evaluation for an individual consisting of K meal-PA bundles

The next evolutionary step is the selection of “parent” individuals to reproduce. This step focuses on choosing the fittest individuals (i.e. those with the highest aptitude values). There are numerous selection methods, including proportional selection, rank-based selection, tournament selection, disruptive selection, and elitism (for further reading, see Jong et al. 1997). Here, we use tournament selection; however, EvoRecSys enables any selection method to be used.
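Tournament selection can be sketched in a few lines. The function name, the default tournament size of 3, and the `higher_is_better` switch are illustrative assumptions rather than the settings used in our implementation.

```python
import random

def tournament_select(population, fitness_of, tournament_size=3,
                      higher_is_better=True, rng=random):
    """Draw `tournament_size` distinct individuals at random and return
    the best of them according to `fitness_of`."""
    contestants = rng.sample(population, tournament_size)
    pick = max if higher_is_better else min
    return pick(contestants, key=fitness_of)

# Toy population in which each individual is simply its own fitness value.
population = [0.2, 0.9, 0.5, 0.7]
winner = tournament_select(population, fitness_of=lambda x: x, tournament_size=4)
```

A larger tournament increases selection pressure: when the tournament covers the whole population, as in the toy call above, the best individual always wins, whereas a tournament of size 1 reduces to uniform random selection.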

We then produce the offspring population from the selected parents. To enable exploitation of good genetic combinations that have produced high fitness in parents, offspring should be similar to parents. At the same time, to explore solution space, we need to introduce some novelty in the offspring population. To achieve these two aims, we use the genetic operators crossover and mutation, respectively (e.g. see Goldberg 1989). The crossover operator randomly takes two individuals of the new population and combines a part of each to create two new ones, with the aim of further exploring a specific (and sometimes promising) part of the search space. Under the EvoRecSys framework, this genetic operator can be implemented in different ways and at different levels of granularity. Regarding meals, we suggest recombining food items among individuals’ bundles, rather than complete meals. Similarly, we suggest recombining physical activities among individuals’ bundles. The mutation operator acts on one individual, such that one element of the genotype (its representation in GA terms) is modified. This operator therefore explores the local region of search space. In the context of EvoRecSys, a variety of mutation approaches exist. We nevertheless suggest mutating only food items within meals and PA items within bundles of an individual. Due to the flexible architecture of bundles, both food items and PA items can be mutated regardless of the stochastic process implemented for this genetic operator.
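The item-level granularity suggested above can be sketched as follows. This is one possible reading under simplifying assumptions, not the operators of the concrete implementation: an individual is a list of bundles, each bundle is a dictionary with a `meal` list and a `pa` entry, and the 0.5 probabilities and catalogue arguments are hypothetical.

```python
import copy
import random

def crossover(parent_a, parent_b, rng=random):
    """Swap one food item and, with probability 0.5, the PA item between
    a randomly chosen bundle of each parent. Returns two new individuals."""
    child_a, child_b = copy.deepcopy(parent_a), copy.deepcopy(parent_b)
    bundle_a = rng.choice(child_a)
    bundle_b = rng.choice(child_b)
    i = rng.randrange(len(bundle_a["meal"]))
    j = rng.randrange(len(bundle_b["meal"]))
    bundle_a["meal"][i], bundle_b["meal"][j] = bundle_b["meal"][j], bundle_a["meal"][i]
    if rng.random() < 0.5:
        bundle_a["pa"], bundle_b["pa"] = bundle_b["pa"], bundle_a["pa"]
    return child_a, child_b

def mutate(individual, food_catalogue, pa_catalogue, rng=random):
    """Replace either one food item or the PA item of one random bundle
    with an entry drawn from the corresponding catalogue."""
    mutant = copy.deepcopy(individual)
    bundle = rng.choice(mutant)
    if rng.random() < 0.5:
        k = rng.randrange(len(bundle["meal"]))
        bundle["meal"][k] = rng.choice(food_catalogue)
    else:
        bundle["pa"] = rng.choice(pa_catalogue)
    return mutant

# Toy individuals with a single bundle each (illustrative values only).
parent_a = [{"meal": ["oats", "banana"], "pa": "walking"}]
parent_b = [{"meal": ["rice", "beans"], "pa": "swimming"}]
child_a, child_b = crossover(parent_a, parent_b)
mutant = mutate(parent_a, ["apple"], ["cycling"])
```

Both operators work on deep copies, so parents survive unchanged and can take part in further recombination. Meal sizes are preserved because items are swapped or replaced, never added or removed.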

Finally, genetic algorithms have a number of other parameters to be considered, including population size (i.e. number of individuals), number of generations (or evolutionary iterations), crossover probability, and mutation probability. Holland (1975) states that crossover is the most reliable operator for exploring the search space, whereas mutation complements crossover. Therefore, crossover should have a considerably high probability of taking place and mutation a comparatively small one. Regarding the number of generations and the population size, these parameters are directly proportional to the problem size (Jong et al. 1997). In other words, the more elements individuals represent, the larger the population size and the number of generations must be in order to explore the search space and converge on an optimal region. Considering the encoding used in this framework, it is plausible that other parameters might arise to control processes inherent to the encoding itself. Similarly, other data sources or supplementary processes typical in recommender systems, such as collaborative filtering, could be flexibly incorporated (see Sect. 4.3). The abstract and conceptual nature of EvoRecSys enables the flexible incorporation of such additional parameters.

Table 1 User inputs and their range values

4 EvoRecSys implementation for personalised well-being

We introduce a concrete proof-of-concept implementation of EvoRecSys in the domain of well-being and preventative health. Here, we do not consider the more difficult problem of accommodating clinical conditions, so the target user profiles exclude people with chronic diseases (diabetes, hypertension, allergies, etc.) and kinetic limitations (paralysed or amputated limbs). We also do not consider different traditions or cultural backgrounds. However, in future, the framework can be easily extended to encompass these more general cases. Section 4.1 presents the architectural considerations and the data used. Section 4.2 describes the specific design choices made for the GA. Finally, Sect. 4.3 describes the integration of a nearest neighbourhood-based mechanism, inspired by collaborative filtering, which is used in the mutation operator.

4.1 Architecture, inputs, and data source

We implement an evolutionary model following the description in Sect. 3.1. The inputs are the user’s physical status and exercising habits (see Table 1) and one of four well-being goals: (1) losing weight, (2) maintaining weight, (3) gaining weight, and (4) building muscle mass. Data on 166 food items grouped into 14 food types and 50 physical activity items (PAs) grouped into 8 types are taken from Health Canada (2008) and Arizona State University (2011), respectively (see Table 2). Users express their preferences on a 5-point numerical scale for the following categories:

  • Food: bread, cereal, dairy products, egg, fish, fruits, grains, legumes, meat, nuts, pasta, poultry, seafood, vegetables.

  • PA: balance (yoga, tai-chi, etc.), bicycling, conditioning (cardio, gym workouts, etc.), dancing, running, sports, walking, water activities.

Table 2 Dataset structure

Previous users’ data were also collected from an initial survey of 145 users, where each provided their physical status along with their food and activity preferences. This dataset is only used for finding users with similar preferences during the collaborative filtering stage (see Sect. 4.3). The output of this evolutionary model is the best evaluated individual, comprising K meal-PA bundle recommendations.

4.2 Evolutionary specifications of the implemented model

This subsection describes the specific design choices made for the GA components in the model implemented, based on the general framework guidelines introduced in Sects. 3.2 and 3.3. The principal element of EvoRecSys is a genetic algorithm (see Algorithm 1), which we describe below.

Algorithm 1

Fig. 4 Example of an individual in the implemented model, containing \(K=3\) bundles

4.2.1 Creation of individuals

In this model, a meal contains four food items (\(N=4\)). To ensure semantic consistency of portions, and to allow for finding diverse and non-repetitive recommendations, each meal contains two food types: (i) a single main food item and (ii) three side food items. Without loss of generality, we consider that each individual contains three bundles (\(K=3\)), as shown in Fig. 4.

A relevant parameter that influences the interrelationship between the meal and the PA being jointly recommended is the number of calories that the user should consume per day. To calculate this target value, we use the Harris-Benedict equation, owing to the proven trustworthiness of its predictions in existing health-related literature (Lee and Kim 2012). The basal metabolic rate (BMR) is estimated as follows:

$$\begin{aligned} {\textit{BMR}}_{{\textit{male}}}= & {} 66.5 + (13.75 \times {\textit{weight}}) + (5.003 \times {\textit{height}}) - (6.755 \times {\textit{age}}) \end{aligned}$$
(2)
$$\begin{aligned} {\textit{BMR}}_{{\textit{female}}}= & {} 655.1 + (9.563 \times {\textit{weight}}) + (1.850 \times {\textit{height}}) - (4.676 \times {\textit{age}}) \end{aligned}$$
(3)

where weight is measured in kilograms, height is in centimetres, and age is in years. Total energy expenditure (TEE) is then obtained by multiplying BMR by the physical activity level (PAL) (Shetty et al. 1996):

$$\begin{aligned} {\textit{TEE}} = {\textit{BMR}} \times {\textit{PAL}} \end{aligned}$$
(4)

Table 3 shows the possible PAL values for each activity level, which every user of our model is asked to report:

Table 3 PAL values and their associated description

Finally, the calculated TEE value is tailored according to the chosen goal. In this implementation, the set of available goals is: (i) losing weight; (ii) maintaining weight; (iii) gaining weight; and (iv) gaining muscle mass. Under the evidence-based assumption that a kilogram of body fat contains 7717.75 kilocalories (Wishnofsky 1958), it is feasible to estimate the number of required intake calories for some of the well-being goals previously defined. For instance, if a user consumes 551.26 kilocalories fewer than the daily TEE, then over 7 days the user would lose approximately 500 g of weight.
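The calculation of the tailored daily calorie target can be illustrated with a minimal sketch. It assumes the standard Harris-Benedict assignment of coefficients (constant 66.5 for men and 655.1 for women) and the 7717.75 kcal/kg figure quoted above. The function names, the signed `kg_per_week` argument, and the PAL value used in the usage note are hypothetical and are not taken from the implemented model.

```python
KCAL_PER_KG_FAT = 7717.75  # kcal per kilogram of body fat, as quoted in the text


def bmr(sex, weight_kg, height_cm, age_yr):
    """Harris-Benedict basal metabolic rate in kcal/day."""
    if sex == "male":
        return 66.5 + 13.75 * weight_kg + 5.003 * height_cm - 6.755 * age_yr
    if sex == "female":
        return 655.1 + 9.563 * weight_kg + 1.850 * height_cm - 4.676 * age_yr
    raise ValueError("sex must be 'male' or 'female'")


def tee(bmr_value, pal):
    """Total energy expenditure: BMR scaled by the physical activity level."""
    return bmr_value * pal


def daily_adjustment(kg_per_week):
    """Daily kcal change needed for a weekly weight change (assumed linear)."""
    return KCAL_PER_KG_FAT * kg_per_week / 7


def daily_target(sex, weight_kg, height_cm, age_yr, pal, kg_per_week=0.0):
    """Tailored daily intake; negative kg_per_week means weight loss."""
    base = tee(bmr(sex, weight_kg, height_cm, age_yr), pal)
    return base + daily_adjustment(kg_per_week)
```

For example, `daily_adjustment(0.5)` evaluates to roughly 551.27 kcal, which agrees with the daily reduction quoted above for losing about 500 g in a week, and `daily_target("male", 70, 175, 30, 1.5, kg_per_week=-0.5)` subtracts that amount from the TEE of a hypothetical user with an assumed PAL of 1.5.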

The resulting output represents the maximum number of kilocalories that a meal should have, and it is used to calculate the suggested time that should be spent on the PA, considering the chosen well-being goal. In other words, during the stochastic creation of the population, this value helps to identify the regions of the search space that fulfil the user’s PA preferences, her/his nourishment requirements, and her/his goal, and to exclude those that do not.

Remark 1

Although meal and exercise activities are both generated stochastically, they have a linking parameter in common, namely the tailored number of intake calories associated with the target user.

4.2.2 Evaluation of individuals

This model implements three specific restrictions that compose the fitness function to evaluate individuals (see Table 4). Let \(\mathcal {R} = \{\mu _{hf}, \mu _{PA}, \mu _{cd}\}\) be the set of all restrictions to consider. A fitness function \(FF_i\) associated with a user \(u_i \in U\) with a goal \(\mathcal {G}_i\) is defined based on \(\mathcal {R}\) and her/his individual food-exercising preferences \(\varPsi _{u_i}\). It assesses the degree to which a recommended bundle simultaneously meets \(\mathcal {G}_i\) and the user preferences.

$$\begin{aligned} {\textit{FF}}_i = {\textit{FF}}(\varPsi _{u_i};\mathcal {G}_i) = \phi \left( \varPsi _{u_i};\varGamma _i(\mu _{hf}), \varGamma _i(\mu _{PA}), \varGamma _i(\mu _{cd})\right) \end{aligned}$$
(5)

where \(\phi \) is the arithmetic mean averaging operator. The three restrictions are described as follows:

  • The Healthy Food Restriction follows the England Government Dietary Recommendations (England 2016). Based on this, the restriction evaluates independently the amount of proteins, carbohydrates, sugar, fibre, fat, saturated fat, and salt in a meal.

  • The Exercising Restriction evaluates how closely the recommended duration of the PA item matches the average time that the user spends exercising. This helps, for instance, to ensure that a PA for a given user is neither too mild nor too ambitious or intense for her/him. We use the MET (metabolic equivalent of task) as the reference value to ensure that the meal-PA combination aligns with the user’s intended well-being goal.

  • The Consistency and Diversity Restriction evaluates food item diversity in two ways: firstly, it evaluates the diversity within a single meal (among the food items that make up that meal) and secondly, it evaluates the diversity among meals within an individual. It also evaluates the diversity among exercising items within an individual. Moreover, this restriction evaluates, in terms of serving size, how well-proportioned the food items are in each meal. This helps to prevent overly similar bundles within the same individual.

Table 4 Set of restrictions used in EvoRecSys

Finally, in this EvoRecSys instance, \(\varPsi _{u_i}\) has been implemented as an additional restriction whose purpose is to evaluate how likeable both the recommended meals and the recommended PAs are, based on the user preferences. Thus, once all four restrictions are employed, the aptitude of the individuals is calculated as follows:

$$\begin{aligned} {\textit{FF}}_{i,x} = (\mu _{hf_x} + \mu _{PA_x} + \mu _{cd_x} + \varPsi _{u_i,x})/4 \end{aligned}$$
(6)

Remark 2

The fitness function of the GA focuses on minimising error. The values that it yields are normalised to the range [0.0, 1.0].
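As a minimal sketch of Eq. (6), the aptitude of an individual can be computed as the arithmetic mean of its four restriction values, with lower values being better. The function names are hypothetical, and how each restriction value is itself obtained is outside the scope of this sketch.

```python
from statistics import mean


def aptitude(mu_hf, mu_pa, mu_cd, psi):
    """Arithmetic mean of the four restriction values, each in [0.0, 1.0]."""
    values = (mu_hf, mu_pa, mu_cd, psi)
    if not all(0.0 <= v <= 1.0 for v in values):
        raise ValueError("restriction values must be normalised to [0, 1]")
    return mean(values)


def best_index(aptitudes):
    """Index of the fittest individual, i.e. the one with the lowest aptitude."""
    return min(range(len(aptitudes)), key=aptitudes.__getitem__)
```

For instance, restriction values of 0.2, 0.3, 0.1, and 0.4 average to an aptitude of 0.25, which lies just above the 0.2480 threshold discussed in Sect. 5.1.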

4.2.3 Selection

We use tournament selection (Zhang and Kim 2000), which works as follows. First, a pair of individuals is randomly sampled from the population, with replacement. The aptitude values of the two individuals are compared, and the individual with the best aptitude (the lower value in this implementation) is selected and added (as an “offspring”) to the new population. The process repeats N times (where N here denotes the population size, not the number of food items per meal), until a new offspring population is formed with size equal to the parent population. Tournament size T directly controls selection pressure in the population. Since we use a tournament of size \(T=2\), we expect, on average, the best member of the parent population to have \(T=2\) offspring in the new population and the median member to have \(T/2=1\) offspring, while the worst member (i.e. the one with the highest aptitude value) has no offspring, since it cannot win a tournament against a different individual. We also include elitism, such that the best individual of each generation is reproduced (without modification) into the new offspring population. This ensures that good solutions are not lost during the reproduction process.
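The selection step described above can be sketched as follows. This is an illustrative sketch rather than the implemented code: it assumes that the two contestants in each tournament are distinct individuals, that individuals are returned to the pool after each tournament, and that the elite copy occupies one slot of the new population. The function name and arguments are hypothetical.

```python
import random


def tournament_selection(population, aptitudes, rng, tournament_size=2):
    """Return a new population of the same size; the lower aptitude wins."""
    n = len(population)
    # Elitism: the best individual is copied unchanged into the new population.
    elite = min(range(n), key=aptitudes.__getitem__)
    selected = [population[elite]]
    while len(selected) < n:
        # Draw distinct contestants; they stay available for later tournaments.
        contestants = rng.sample(range(n), tournament_size)
        winner = min(contestants, key=aptitudes.__getitem__)
        selected.append(population[winner])
    return selected
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible. Increasing `tournament_size` raises selection pressure, since weaker individuals become less likely to win any tournament.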

Fig. 5 Example of the crossover operator mechanism over a bundle

4.2.4 Genetic operators: crossover and mutation

Crossover and mutation follow a stochastic process and occur at the element level of each bundle. During crossover and mutation, each element is selected using the following method: (i) for each bundle, the bundle is selected with probability \(p_0=0.9\); then, (ii) one element within the bundle \(\{{\textit{Main}}, {\textit{Side}}, {\textit{PA}}\}\) is selected with probability \(\{0.2, 0.6,0.2\}\); (iii) if Side is selected, each sub-element (each side meal) is selected with probability 0.5, enabling the possibility of multiple sides to be selected.

The crossover operator is used to recombine genetic code (the items) between two “parent” individuals, A and B (see example in Fig. 5). An offspring (i.e. a child) is created as a copy of parent A. Then, using the selection process described above, each element (or sub-element) that is selected in parent B is inserted into the child. In the example shown in Fig. 5, the child is a copy of parent A, with two side meals (“beans” and “broccoli”) copied from parent B.
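The element selection method and the crossover operator can be sketched together as follows. The representation of an individual (a list of bundles, each a dictionary with `main`, `sides`, and `pa` keys) is an assumption made for illustration, as is the choice to insert each selected element of parent B at the same position in the child. The probabilities are those stated above.

```python
import copy
import random

P_BUNDLE = 0.9  # probability that a bundle takes part
ELEMENT_WEIGHTS = {"main": 0.2, "sides": 0.6, "pa": 0.2}
P_SIDE = 0.5  # probability that each side item is selected


def select_elements(num_bundles, num_sides, rng):
    """Return (bundle index, element name, side index or None) picks."""
    picks = []
    names = list(ELEMENT_WEIGHTS)
    weights = list(ELEMENT_WEIGHTS.values())
    for b in range(num_bundles):
        if rng.random() >= P_BUNDLE:
            continue  # this bundle is skipped
        element = rng.choices(names, weights=weights)[0]
        if element == "sides":
            # Each side is picked independently, so several may be selected.
            picks.extend((b, "sides", s) for s in range(num_sides)
                         if rng.random() < P_SIDE)
        else:
            picks.append((b, element, None))
    return picks


def crossover(parent_a, parent_b, rng):
    """Child starts as a copy of A and receives the selected elements of B."""
    child = copy.deepcopy(parent_a)
    num_sides = len(parent_b[0]["sides"])
    for b, element, s in select_elements(len(parent_b), num_sides, rng):
        if element == "sides":
            child[b]["sides"][s] = parent_b[b]["sides"][s]
        else:
            child[b][element] = parent_b[b][element]
    return child
```

Because the child is a deep copy, neither parent is modified, which keeps both available for further tournaments or crossovers within the same generation.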

The mutation operator is directed by collaborative filtering (detailed fully in Sect. 4.3). First, “similar” neighbours are discovered using collaborative filtering over user preferences; then, when a mutation occurs, the element or sub-element selected is replaced by the corresponding element in the neighbouring user. Figure 6 shows an example of mutation in bundle number 2, for the element PA, which is replaced by the exercise activity “Yoga, 59 minutes”, taken directly from a neighbour with similar preferences.

Remark 3

Using collaborative filtering within the mutation operator is non-standard. This novel contribution is designed to heuristically navigate through the population search space, guided by neighbours’ preferences.

Fig. 6 Mutation operator example. \(\mathrm{K}=2\) nearest neighbours are discovered using collaborative filtering (see Sect. 4.3). Then, one nearest neighbour is selected at random and the bundle element of the neighbour (“Yoga, 59 minutes”) replaces the individual’s element. (Color figure online)

4.3 Directing evolution using nearest-neighbour collaborative filtering

In a collaborative filtering RS, items are typically recommended to a given user based on the preferences of users similar to her/him (Alhijawi et al. 2016; Karabadji et al. 2018). In essence, if \(u_a\) and \(u_b\) are similar users, and \(u_b\) has positively rated or liked an item \(x_j\) not seen by \(u_a\) yet, then \(x_j\) is likely to be recommended to \(u_a\). Accordingly, our proposed model incorporates in the core GA a strategy, inspired by collaborative filtering, that identifies users similar to the target user. Due to the multi-objective nature of our evolutionary approach, we consider a holistic notion of similarity among users that considers not only their taste towards food and PA, but also their physical characteristics (weight, height, age, gender) and their selected well-being goal. For example, two users who have very similar food preferences but exhibit different physical characteristics and opposing goals, e.g. losing weight versus gaining weight, are unlikely to be considered similar.

To reflect this holistic view, we quantify similarity \({\textit{sim}}(u_a,u_b)\) between users \(u_a,u_b \in U\), using: (i) food preferences; (ii) PA preferences; (iii) physical status; and (iv) well-being goal. Let \(FT = \{ft_1, ft_2, \ldots \}\) be a non-empty finite set of food types and let \(\mathbf{p} _a^{ft} = [p_a^{ft_1}\; p_a^{ft_2} \ldots p_a^{ft_{|FT|}}]\) be a vector describing \(u_a\)’s preferences towards food types. Then, the food-based similarity between \(u_a,u_b\) is computed using the following formula:

$$\begin{aligned} {\textit{sim}}^{ft}(u_a,u_b) = 1 - d(\mathbf{p} _a^{ft},\mathbf{p} _b^{ft}) \end{aligned}$$
(7)

with \(d(\cdot ,\cdot )\) a normalised distance metric between two vectors, e.g. Euclidean distance. Let \(AT = \{at_1, at_2, \ldots \}\) be a non-empty finite set of PA types. Accordingly, let \(\mathbf{p} _a^{at} = [p_a^{at_1}\; p_a^{at_2} \ldots p_a^{at_{|AT|}}]\) be a vector describing \(u_a\)’s preferences towards such PA types. The PA-based similarity between \(u_a,u_b\) is computed as follows:

$$\begin{aligned} {\textit{sim}}^{at}(u_a,u_b) = 1 - d(\mathbf{p} _a^{at},\mathbf{p} _b^{at}) \end{aligned}$$
(8)

The user’s physical status is modelled after the attributes employed to calculate the standardised calorie expenditure function: height, weight, age, and gender. Formally, we have \({\textit{Status}}_a = [{\textit{weight}}(kg), \; {\textit{height}}(cm), \; {\textit{age}}(yr), \, {\textit{gender}}(m/f)]\). The rationale is that two users with similar physical status and activity levels will have a similar calorie expenditure rate, and therefore, the interrelationship between their food intake and exercise requirements in the recommended bundle should be similar. Based on their TEE value [Eq. (4)], we use the following formula to calculate the similarity between two users’ physical status:

$$\begin{aligned} {\textit{sim}}^{st}(u_a,u_b) = 1 - \dfrac{|{\textit{TEE}}_a - {\textit{TEE}}_b|}{{\textit{TEE}}_{{\textit{max}}} - {\textit{TEE}}_{{\textit{min}}} } \end{aligned}$$
(9)

Next, an aggregation function \(\Phi _W\) is used to combine the three similarities on food preferences, PA preferences, and physical status into one:

$$\begin{aligned} {\textit{sim}}(u_a,u_b) = \Phi _W({\textit{sim}}^{ft}(u_a,u_b), {\textit{sim}}^{at}(u_a,u_b), {\textit{sim}}^{st}(u_a,u_b)) \end{aligned}$$
(10)

with W a weighting vector for adjusting the relative importance of food preference, PA preference, and physical status. Finally, the selected well-being goal is used to apply a “rewarding effect” on the aggregated similarity if the two users share the same well-being goal, thereby making users with a common goal more likely to be nearest neighbours of each other:

$$\begin{aligned} {\textit{sim}}'(u_a,u_b) = \left\{ \begin{array}{l l l} \sqrt{{\textit{sim}}(u_a,u_b)} &{}{} &{}{} \text{ if } u_a,u_b \text{ have } \text{ the } \text{ same } \text{ well-being } \text{ goal, }\\ {\textit{sim}}(u_a,u_b) &{}{} &{}{} \text{ otherwise. } \end{array} \right. \end{aligned}$$
(11)

Intuitively, since \(0 \le {\textit{sim}}(u_a,u_b) \le 1\), we have \(\sqrt{{\textit{sim}}(u_a,u_b)} \ge {\textit{sim}}(u_a,u_b)\). A simple k-nearest neighbour strategy is then applied to identify the k most similar users to \(u_a\) based on \({\textit{sim}}'(u_a,u_b)\). Information about the preferences and needs of these neighbours is used to direct the mutation operator of our evolutionary process, leading to more diverse and meaningful personalised recommendations by further exploring the search space.
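Equations (7) to (11) and the k-nearest-neighbour step can be sketched as follows. Several choices here are assumptions made for illustration only: the distance is Euclidean, normalised by the largest distance possible on a 1-to-5 rating scale; \(\Phi _W\) is taken to be a weighted arithmetic mean with equal weights; and users are represented as dictionaries with hypothetical keys `food`, `pa`, `tee`, and `goal`.

```python
import math


def normalised_distance(p, q, lo=1, hi=5):
    """Euclidean distance scaled to [0, 1] for ratings on a lo..hi scale."""
    max_dist = math.sqrt(len(p)) * (hi - lo)
    return math.dist(p, q) / max_dist


def sim_pref(p, q):
    """Preference similarity, as in Eqs. (7) and (8)."""
    return 1.0 - normalised_distance(p, q)


def sim_status(tee_a, tee_b, tee_min, tee_max):
    """Physical-status similarity based on TEE, as in Eq. (9)."""
    return 1.0 - abs(tee_a - tee_b) / (tee_max - tee_min)


def sim_total(a, b, tee_min, tee_max, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Aggregated similarity (Eq. 10) with the same-goal reward (Eq. 11)."""
    parts = (
        sim_pref(a["food"], b["food"]),
        sim_pref(a["pa"], b["pa"]),
        sim_status(a["tee"], b["tee"], tee_min, tee_max),
    )
    sim = sum(w * s for w, s in zip(weights, parts))
    # The square root never lowers a value in [0, 1], so a shared goal
    # can only increase the similarity.
    return math.sqrt(sim) if a["goal"] == b["goal"] else sim


def nearest_neighbours(target, others, k, tee_min, tee_max):
    """The k users most similar to the target, most similar first."""
    ranked = sorted(others,
                    key=lambda u: sim_total(target, u, tee_min, tee_max),
                    reverse=True)
    return ranked[:k]
```

With equal weights, a user whose preferences and TEE are identical to the target’s obtains a similarity of 1, and the square-root reward leaves that value unchanged. The reward matters most for mid-range similarities, where it can move a user with the same goal ahead of otherwise closer users with a different goal.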

5 GA Performance analysis

Here, we analyse, optimise, and benchmark the performance of the genetic algorithm used in EvoRecSys.

5.1 Finding suitable aptitude and semantic coherency of recommendations

An essential aspect of the proposed EvoRecSys implementation is to have a comprehensive understanding of the fitness value, which indicates the quality and semantic coherency of the recommendations. To demonstrate how to interpret a fitness value, we use an example based on a vegetarian user with an intake of 1925 calories per day, which yields 642 calories per meal. This example user spends 43 min per exercise session and has “losing weight” as her/his well-being goal. Table 5 shows examples of fitness values (from worst to best aptitude) for bundles tailored to this example user. Using Table 5, we consider 0.2480 as an acceptable fitness threshold for assuring semantic coherency in the output recommendations.

Table 5 Aptitude values of example bundles for a vegetarian user with 1925 calories intake per day (642/meal), 43 min per exercise session, and goal of “losing weight”

Remark 4

Table 5 shows that small variations in fitness values may yield considerable changes in the quality of the output recommendations. This signals that the fitness function is sensitive and nonlinear.

Table 6 GA parameter settings with best performance

5.2 Finding optimal parameters for the genetic algorithm

Since this implementation of EvoRecSys has been deployed as a Web application hosted on a domestic-use machine (see Sect. 6), the evolutionary process must execute in real time. We therefore conducted experiments to find optimal GA parameter values that can consistently reach the required aptitude threshold. We ran 50 repeated evolutionary trials over the following parameter space: \({\textit{popSize}} = \{10,20,\ldots ,50, 100, 150, \ldots , 300 \}\); \({\textit{maxGen}} = \{100, 150, \ldots , 300 \}\); \({\textit{probCross}} = \{0.1, 0.2, \ldots , 1.0 \}\); \({\textit{probMut}} = \{0.1, 0.2, \ldots , 1.0 \}\). The best parameter values, which we use from now on, are shown in Table 6.

Fig. 7 System performance across 50 trials. Green line indicates the mean performance and the shaded area presents the confidence interval (95%). The aptitude threshold (0.2480) is shown as dotted line. (Color figure online)

Figure 7 presents the mean performance of the best individuals across 50 trials (i.e. the mean system performance; with shaded region showing 95% confidence interval). The horizontal dotted line represents the fitness threshold we require for semantic coherency (see Sect. 5.1). Although we can be confident that EvoRecSys will reach the desired aptitude threshold after 60 s, the system continues to improve and does not equilibrate until 80 s. Therefore, we consider 60 s as the minimum computational time required to build coherent recommendations (on the given hardware); but to ensure the best possible recommendations, we use a run time of 80 s when building recommendations during the user study (Sect. 6). We believe the performance improvement is worth the additional 20 s that each user must wait. Moreover, since more powerful hardware would reduce the run times, we focus on producing the best quality recommendations rather than minimising wait time.

Remark 5

All GA trials were conducted using a standard domestic-use machine. Deploying the model on high-performance hardware would significantly reduce run times.

Fig. 8 Benchmark of the four EvoRecSys modified instances. Mean performance ± 95% confidence interval (shaded region). (Color figure online)

5.3 Benchmarking

Here, we benchmark the performance of our proposed algorithm. In particular, since our use of collaborative filtering in the mutation operator is novel, we are interested in quantifying the benefit that this process brings. To achieve this, we compare four approaches, each containing different components of the model. These are:

  1. EvoRecSys-naïve: Rather than calculating the number of intake calories (see Sect. 4.2.1), individuals are created by choosing a random number of intake calories within the range [4, 5000]. Both genetic operators are enabled; however, the mutation operator works under the “standard” approach, such that a randomly selected item is replaced by another item of the same class (i.e. a side replaces a side; a main replaces a main) that is randomly selected from the database (i.e. collaborative filtering is not used in the mutation operator).

  2. EvoRecSys-no-crossover: In this baseline approach, the crossover operator is disabled. Thus, the evolutionary task relies exclusively on the mutation operator, which is guided by the nearest-neighbour collaborative filtering strategy described in Sect. 4.3 and illustrated in Fig. 6. In addition, individuals are built considering the value calculated by the process described in Sect. 4.2.1.

  3. EvoRecSys-standard: Both genetic operators are enabled. However, the mutation operator works under the standard approach, as described in approach (1). Furthermore, individuals are created by the procedure described in Sect. 4.2.1.

  4. EvoRecSys-full: The proposed implementation in this paper. This approach has no modifications; it includes crossover and the collaborative-filtering-based mutation operator, as described in previous sections.

We conducted 50 trials on each approach. Figure 8 presents the mean performance of the best individual (the lowest aptitude value) across all trials, with 95% confidence interval presented as shading. While there are relatively small differences in best aptitude between the approaches, these differences translate into significant differences in coherency of recommendation (see Remark 4). We see that the “full” system (4), which includes both crossover and mutation directed by collaborative filtering, produces the lowest aptitude values (see Remark 2), which indicates that it performs best. In particular, (4) significantly outperforms the “standard” approach (3) (paired t-test, \(p<0.0001\)), indicating that directing mutation using collaborative filtering is beneficial. Approach (3) also significantly outperforms approaches (1) and (2) (paired t-test, \(p<0.001\)). The “naïve” approach (1) appears to tend towards better performance values than the “no crossover” approach (2); however, this difference is not significant (paired t-test; \(p>0.05\)). Approach (1) starts poorly because of the randomised configuration of the initial population, with mean aptitude of 0.4337 at generation 0. However, the addition of the crossover operator enables approach (1) to quickly catch up with and then slightly overtake the performance of approach (2). In summary, these results show that both crossover and mutation with CF are necessary for the system to perform best, and that the full system is the only configuration that consistently reaches the desired aptitude threshold.

6 User study

Based on the previous experimental evaluation to determine an optimal configuration of our EvoRecSys Web implementation, we deployed it to conduct a cohort study with users who volunteered to interact with the system. The study provides additional insight about the system performance from the subjective perspective of end users, analysing their response towards recommendations.

It is important to note that the Web front-end used for both user studies was designed to provide the GA with a default value in cases where the user, whether deliberately or accidentally, skipped a question related to food or physical activity preferences. We set a default value of 3, which corresponds to the neutral preference on a 5-point numerical scale. Thus, recommendations are generated even if the user skips all preference-related questions.

6.1 Subjective analysis of EvoRecSys recommendations

Volunteers were invited to conduct a series of interactions with the system in three steps, lasting approximately 10 min:

  i. Providing explicit rating information about food/PA preferences, physical status, exercising habits, and well-being goal (see Fig. 9a, b).

  ii. Receiving a list of three bundle recommendations and evaluating their satisfaction with each one (see Fig. 9c).

  iii. Assessing overall perception of recommendations received based on four criteria: diversity, serendipity, appeal, and healthiness.

A total of 205 users completed the three tasks. The geographical distribution of these users by country is shown in Table 7, and Table 8 summarises their demographic and physical characteristics along with their exercising habits and well-being goals.

Fig. 9 EvoRecSys interface for the user study: (a) eliciting food preferences, (b) eliciting exercising habits, (c) providing bundle recommendations for their evaluation by the user

Table 7 Geographical distribution of 205 volunteer users for the study
Table 8 Demographics, physical status, PA habits, and well-being goal of participants
Fig. 10 Left: Average user satisfaction with \(K=3\) meal and PA recommendations on a 5-point scale. Right: Percentage distribution of ratings given by users to single meals and single PAs in recommendations received

For each of the three meal-PA recommendations received, users were asked to rate the suggested meal and exercise (see Fig. 9c) using a 5-point Likert scale, and to optionally mark one or more of the recommended combinations as favourite. Figure 10 shows, on the left, the average user satisfaction with the individual meals and PAs suggested, alongside the standard deviation. The last two bars show the average results across all three meals (resp. PAs). The plot on the right-hand side of Fig. 10 shows the overall distribution of 1-to-5 ratings given by users to the recommendations. The results show, in general, a prevalence of positive ratings over negative ones, particularly towards meals, which show slightly better values than PAs in terms of both average ratings and rating distribution. The deviations around the average values are consistent between the meals and PAs recommended, suggesting a similar level of consensus among users in their perception of both aspects.

Finally, in order to assess the perception of recommendations received “as a whole”, users were asked for their subjective opinion regarding four quality criteria, again using a 5-point scale with values ranging between 1 and 5: (i) diversity, where higher ratings mean more diverse and less repetitive recommendations; (ii) serendipity, with higher ratings meaning more serendipitous and less expected recommendations; (iii) attractiveness, with higher ratings meaning more appealing recommendations that suit their preferences; and (iv) health, with higher ratings indicating that recommendations are perceived as healthier. These final questions were optional, hence not all users answered all four of them. Figure 11 summarises the feedback collected for the four questions as a rating distribution (left) and the average score per question/criterion (right).

Fig. 11 Left: Rating distribution (number of users) of values in the 5-point feedback scale, considering four criteria for evaluating recommendations. Right: Average rating provided by users to recommendations on each criterion

We believe these are promising results for various reasons. Firstly, all four rating distributions show a moderately skewed trend towards higher ratings, indicating that the average ratings obtained are representative: negative feedback is in the minority, and opinions are not polarised around the two extremes of the rating scale. Health is the most positively assessed criterion, with an average rating (4.15) that is significantly higher than those of the other three. It is also the only criterion in which the majority of users gave the highest rating, and hence, the proposed model succeeds in delivering meal-PA bundles perceived by users as healthy. Most users reported recommendations as appealing (4) or very appealing (5), which is also an encouraging result in terms of balancing healthy recommendations with adaptability to the user preferences. Diversity and serendipity show, on average, results slightly closer to the neutral value (3), although the majority of ratings are still distributed across the \(\{3,4,5\}\) rating interval. This suggests that while the collaborative filtering approach integrated in the GA helps to produce diverse and serendipitous recommendations, there is still room to improve these aspects in future versions of the model or in new ones. This motivates a more thorough exploration of the GA components, the fitness function, and any other RS techniques that could be investigated and integrated in EvoRecSys.

Fig. 12 Challenge study example. User selected option A: EvoRecSys recommendation

6.2 Challenge study: EvoRecSys vs. collaborative filtering

Following the first study, 44 volunteers accepted an invitation to take part in a follow-up study to subjectively compare recommendations generated by EvoRecSys and recommendations generated by the second baseline system used in Sect. 5.3, which can be understood as collaborative filtering (CF) only. Users begin by entering their preferences (see Fig. 9), and are then shown 5 pairs of “blind” recommendations (e.g. see Fig. 12), one generated by EvoRecSys and one generated by CF alone. For each pair, the user is then challenged to select their preferred recommendation, without being told how the recommendations are generated. The ordering (A or B) of pairs is randomly shuffled between EvoRecSys and CF to ensure that there is no selection bias based on ordering of options shown.

For this stage of the user study, EvoRecSys was configured to recommend one bundle, using parameters \({\textit{popSize}}=150\), \({\textit{maxGen}}=100\), \({\textit{probCross}}=0.6\) and \({\textit{probMut}}=0.1\). For the CF-recommendation, we initially created a population of individuals. The best individual in this initial population (generation 0) is then taken, and the CF-based mutation operator is applied (see Sect. 4.2.4). In this way, CF-recommendations are built using collaborative filtering, but without evolutionary optimisation.

In total, we conducted 220 pairwise challenges (\(n=44\times 5=220\)). The recommendation option generated by EvoRecSys was preferred 124 times, while the recommendation generated by CF was preferred 96 times. If we consider the null hypothesis that recommendations generated by each system are equally likely to be selected by users, then we can test this hypothesis by using a binomial distribution with \(p=0.5\) (probability of each option being selected at random), \(n=220\) (number of repeated trials), and \(x=124\) (number of times that the EvoRecSys option is selected). We obtain the one-sided probability \(P(X\ge x)=0.034\). Therefore, we are able to reject the null hypothesis at the 0.05 significance level, and the results suggest that users prefer EvoRecSys recommendations.
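The tail probability used in this test can be checked with a short calculation. The sketch below sums the exact binomial probabilities for \(X\ge 124\) under \(n=220\) and \(p=0.5\); the function name `binomial_upper_tail` is ours and is not part of the study's code.

```python
from math import comb

def binomial_upper_tail(x, n, p=0.5):
    """Exact P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

# Values reported in the challenge study: 124 EvoRecSys wins in 220 trials.
p_value = binomial_upper_tail(124, 220)
print(f"P(X >= 124) = {p_value:.3f}")
```

The result should agree with the reported value up to rounding. It is a one-sided probability, which matches the hypothesis being tested (EvoRecSys preferred more often than chance).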

7 Discussion and lessons learnt

Our efforts to reformulate the recommendation problem as a multi-objective optimisation problem driven by a GA can be summarised as successful in the light of the experimental results. In general terms, recommendations have been positively rated by the majority of users who participated in the study. Furthermore, after careful experimental tuning of the model parameters, the model achieved recommendations that are tailored, consistent with integrated knowledge and domain guidelines, diverse, and acceptable. All of these are fundamental requirements according to the extant RS foundations (Aggarwal 2006). On the other hand, although we presented an implemented model founded on specific design decisions, it must be noted that EvoRecSys deserves further exploration of other recommender principles and user preference/interaction aspects left outside the scope of this work. This, together with the results of our study, suggests that the EvoRecSys framework and its conceptual architecture should be subject to further study by the research community, thereby opening new pathways of research within the field of recommender systems for health and/or based on evolutionary computing.

Although the proposed framework and model have reported favourable results, they constitute, to the best of our knowledge, the first research efforts for health RS in this direction. Consequently, several challenges and areas for improvement have been identified during the framework design, model development, and experimental studies. The most relevant directions are:

  1. Complementary datasets and interpretable recommendations: One of the proven strengths of EvoRecSys is the ability of its GA to construct configurable recommendations that accurately adapt to the users’ needs and preferences, personalising fine-grained aspects such as serving sizes in meals and PA duration. However, these recommendations (specifically the suggested meals based on food items) may sometimes be less interpretable than, for example, recommending a recipe (Musto et al. 2020). For this reason, an immediate aspect deserving study is how to incorporate datasets that facilitate more meaningful food recommendations such as recipes, ready meals from a supermarket, regional food, or specific groceries. An interesting question to study here is the effect of bridging precise and highly optimised meals generated by EvoRecSys with static but more understandable recipes/products (e.g. from third-party datasets) that are similar. This would also help develop bespoke models focused on specific demographic groups.

  2. Highly configurable and diverse components: Experiments on the GA parameter settings have demonstrated the importance of semantic coherency criteria to guarantee higher aptitude and quality in recommendations. Due to the nature of the techniques used at the core of the EvoRecSys framework, it is possible to flexibly define the architecture of the recommender engine. Based on this feature, more semantic rules, such as compatibility among ingredients, can be implemented in order to ensure coherency and diversity from the deepest level (food items within a bundle) to the highest level (food items between bundles). Furthermore, it is possible to add or remove food item categories. For instance, desserts could be incorporated to build more robust recommendations for two- or three-course meals.

  3. Implicit dynamic data acquisition: The way users’ preferences are modelled could be improved in order to gain a more accurate insight into user preferences and habits. The model implemented in this study relies on preferences explicitly provided by the user during their initial interaction with the system. However, more reliable recommendations could be built: (i) by acquiring new forms of data dynamically and over time, e.g. via daily feedback of physical activity logs, and (ii) by incorporating a mechanism that learns from the user's feedback and their evolving behaviour towards recommendations, so as to discover how these recommendations align with the preferences and personal needs stated by the user. In line with this research direction, we also consider it equally important to define more objective evaluation metrics and criteria for experimentally validating the models developed, especially in terms of quantifying the extent to which recommendations align with stated and/or implicitly modelled users’ preferences.

Another relevant aspect to consider is that all experiments were performed on standard commodity hardware. Thus, the thresholds set in Sects. 5.1 and 5.2 were partly dependent on the computational power of the equipment available. A dedicated server with high-performance hardware would enable us to consider much lower efficiency thresholds and therefore more reliable recommendations. Moreover, fast responses in real time and parallel handling of multiple user requests would be possible. Nevertheless, the experiments conducted provide a suitable methodological approach to follow for the experimental configuration and validation of models built upon the EvoRecSys conceptual framework.

8 Concluding remarks and future work

In this research, we have introduced EvoRecSys, a novel conceptual framework for recommendations in health-related domains, entirely based on the premise that it is possible to strike a balance between three main dimensions: (i) what the user prefers, (ii) what the user needs, and (iii) what the user sets as a goal. The framework is characterised by an evolutionary algorithmic approach that establishes the balance of these three components through a multi-objective optimisation problem. A distinctive feature of the framework is its ability to build highly configurable items, in the form of meal-exercise bundles, to be recommended rather than immutable items, which allows more tailored and reliable recommendations for the user. The proposed framework architecture is defined so that it can be flexibly instantiated into different model implementations for recommendation across different application areas of health and well-being. In this paper, we presented an implementation of EvoRecSys as a general-purpose meal-physical activity recommender to help achieve well-being goals. However, the proposed conceptual framework guidelines may also help when building models for people with special needs, such as patients with chronic diseases, professional athletes, or people whose cultural background only allows them to eat specific food. In all cases, when a model is built, we encourage a validation process by health experts to consider the well-being goals and recommendations provided by the system; in particular, users should seek medical assistance to reach goals when there is an illness present.

This study has delivered a first proof of concept where GAs are exploited as the core technique of a recommender system instead of being a complementary part of a recommender engine driven by other currently used techniques. As a consequence of the promising results obtained in this research, future work directions have been outlined. For instance, the inclusion of a dietary and exercise diary for the user is one of the main developments that will help to improve the robustness of this framework by providing better insight into users’ behaviour.

Additionally, a more flexible graphical interface will improve the user experience, for instance by offering the possibility of creating new bundles that combine any of the recommended meals with any of the PAs built by the system. Regarding interpretation of recommendations (see Sect. 7), it can be difficult for users to intuitively compare food portions to the nearest gram and exercise to the nearest minute. Therefore, showing users an average food portion or a valid portion interval (e.g. to the nearest 50 g) and presenting valid time ranges for exercise (e.g. to the nearest 5 min) will improve the user experience.
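As a minimal sketch of the display rounding suggested above, the snippet below rounds an optimised value to the nearest multiple of a step (50 g for portions, 5 min for exercise, as in the examples given). The helper name `round_to_step`, the half-up tie rule, and the sample values are illustrative assumptions.

```python
from math import floor

def round_to_step(value, step):
    """Round value to the nearest multiple of step, with ties rounded up."""
    return step * floor(value / step + 0.5)

# Hypothetical optimised values produced by a recommender.
portion_g = round_to_step(137, 50)    # grams, shown to the nearest 50 g
duration_min = round_to_step(23, 5)   # minutes, shown to the nearest 5 min
print(portion_g, duration_min)
```

`floor(x + 0.5)` is used instead of Python's built-in `round`, which rounds ties to the nearest even integer and would display 125 g as 100 g rather than 150 g.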

On a final note, a native mobile application would provide more freedom to implement a friendlier user interface, possibly linked to wearable devices for seamless capturing of data. For instance, heart rate and number of steps per day would allow us to learn more about the physical status and habits of the user, so that the recommended physical activities would be better aligned with the user's actual physical activity in real time.