“Tell Me Why”: Using Natural Language Justifications in a Recipe Recommender System to Support Healthier Food Choices

Users of online recipe websites tend to prefer unhealthy foods. Their popularity undermines the healthiness of traditional food recommender systems, as many users lack nutritional knowledge to make informed food decisions. Moreover, the presented information is often unrelated to nutrition or difficult to understand. To alleviate this, we present a methodology to generate natural language justifications that emphasize the nutritional content, health risks, or benefits of recommended recipes. Our framework takes a user and two recipes as input and produces an automatically generated natural language justification as output, based on the user’s characteristics and the recipes’ features, following a knowledge-based recommendation approach. We evaluated our methodology in two crowdsourcing studies. In Study 1 (N = 502), we compared user food choices for two personalized


Declarations
This paper or a similar version is not currently under review by a journal or conference. This paper is void of plagiarism or self-plagiarism as defined by the Committee on Publication Ethics and Springer Guidelines. All relevant materials for processing and analysis are available in our repository: https://osf.io/hn3et/.
Author Contributions. Starke was responsible for the analyses and contributed to writing throughout the entire manuscript. Musto developed the paper's main idea and was responsible for the development of the Natural Language Recommender, and contributed to most sections in the manuscript. Rapp helped with the development of the recommender and contributed in writing to Sections 2 and 3. Semeraro helped with the finalization of the paper and supported the initial idea development. Trattner helped to finalize the paper.

Introduction
Food choices are the result of a context-dependent, multi-aspect process [1,2]. While people's general food preferences in part determine short-term decisions [3], a significant part of our eating habits is strongly influenced by contextual factors [4]. Many decisions are made at the point of purchase [5,6]. For example, foods presented at eye level in supermarkets are more likely to be purchased [7], just like food products with visually attractive packaging [4,6]. Such food decisions are often made routinely [8], and are based on heuristics and so-called 'System 1' thinking (cf. [9]) rather than longer-term contemplation.
Online food choices are typically made in the context of information-filtering and retrieval systems [2]. Food recommender studies have examined different approaches to cater towards a user's appetite [10][11][12], but have paid little attention to how users can be supported to nourish themselves more healthily, despite evidence that commonly recommended (popular) internet-sourced recipes tend to be unhealthy [13]. Consumers tend to be overwhelmed with information when making decisions [8], which cannot be alleviated by changing the recommended context. At best, studies have considered specific dietary constraints (e.g., allergies) and nutrient intake to generate healthier recommendations [14,15], or have leveraged human biases to steer user preferences towards specific recipes, for example by using visually attractive images [2,16].
In the recommender context, we argue that users can be supported to make healthier choices by using justifications concerning why a set of recommendations is presented. Specifically for the food domain, justifications of recommendations that elaborate on the nutritional content of different recipes can steer user choices away from the common popularity-based recommendations [17]. An open question is to what extent justifications can affect user preferences if items are already personalized (cf. [18]), as well as whether user preferences can be affected if that user has made prior choices.
In this paper, we present an approach inspired by knowledge-based Natural Language Generation strategies [19], to produce justifications for different recipe recommendations. Recent developments in natural language justification strategies show their merit in improving the transparency of the recommendation process, increasing users' trust and affecting their decision-making processes [20,21]. The proposed framework takes a user and two food recommendations as input and produces an automatically generated natural language justification as output, which is based on the user's characteristics and the recipes' features. It draws upon general knowledge about health risks and benefits related to food consumption to generate justifications. Within the framework, eight different justification strategies are implemented through two different justification styles, based on the combination of different informative content and features. In particular, we generate comparative justifications of recommendations, which juxtapose the main characteristics of two recipes into a single natural language sentence. For instance, such a justification could compare the fiber content of two recipes. This taps into consumer research on the effectiveness of comparative evaluations of item attributes [22], compared to a separate representation of that information (i.e., a 'Single' justification).
We evaluate the effectiveness of the eight implemented justification strategies and two justification styles to support healthy food choices. We examine this across two different studies, asking users in each study to choose between popularity-based and health-based recommendations. In the first study (N = 502), we examine which natural language justification style is most effective in steering users towards healthier recipe choices. Building upon preliminary findings, in the second study (N = 504), we examine which natural language justification strategy is most effective in promoting healthy recipe choices. In doing so, we adopt a strict baseline, where we first present a recommendation pair to users with no justification, immediately followed by the same pair but accompanied by one of our eight justification strategies. Such preference or choice reversal is hard to achieve [23], as people tend to stick to the status quo when making a decision [9]. Finally, in both studies we ask users why they chose either the healthy or the popular recipe, as a user's motivation could help us to understand how to design better justifications (cf. [20]). We posit the following research questions:
[RQ1]: Which natural language justification styles are most effective in steering user preferences towards healthier recipes, and for which types of meals?
[RQ2]: Which natural language justification strategies are most effective in steering user preferences towards healthier recipes, and for which types of meals?
[RQ3]: To what extent can users' self-reported motivation predict healthy recipe choices?
As we will show, users preferred healthier recipes over popularity-based recommendations when comparative-style justifications were presented, as well as for specific types of justification strategies.
We summarize our contributions as follows: (i) We introduce a methodology to automatically generate a natural language justification to support personalized food recommendations; (ii) we design and (iii) evaluate several justification styles (i.e., None, Single, or Comparative styles) and strategies in a user study, where each justification leverages different user characteristics and recipe features. Moreover, we examine (iv) which justification strategies are most effective in affecting user choices.

Related Work
The idea of providing intelligent information systems with explanation facilities has been studied since the early 1990s [24]. It was introduced in the area of recommender systems in the 2000s [25], and has regained attention due to the recent General Data Protection Regulation (GDPR), which calls for more transparency in the underlying algorithms. This particularly applies to recommender systems, since explanation strategies have been shown to positively affect both a user's acceptance of and trust in presented recommendations [26,27].
Explanations in recommender systems can have different aims [20]. For example, explanations can educate users or improve the efficiency of decision-making [28,29]. For our current work, we identify persuasiveness as the main aim (cf. [30]). We specifically aim to promote healthy food choices through our justifications, which is novel to food recommender research [11]. The persuasive explanation aim is touted in other domains as useful to convince users to try or buy a recommended item, such as a product on Amazon or a movie on Netflix [20,31].
With respect to the information content exploited to generate justifications, we frame our approach as being at the intersection between content-based and knowledge-based methods (cf. [28]). It is based on user characteristics and food features, along with general knowledge on food consumption. Taken together, they justify our health-aware recommendation by emphasizing health risks and benefits. This is related to studies where health risks are highlighted in a smoking cessation application in a recommender context [32,33]; although no evidence is provided concerning the effectiveness of such information [32], a knowledge-based health recommender did lead to better results than a hybrid recommender [33]. Our work addresses this knowledge gap by evaluating the impact of justifications, including health risks and benefits, on user food choices.
The effectiveness of justifications can be better understood by cognitive processing and decision-making theories. For one, dual-process theory emphasizes that people's behavior is determined by two diverse processes or systems: a non-conscious process that relates to spontaneous, heuristic-based thinking (i.e., 'System 1'), and a reflective process that relies on rational and conscious decision-making (i.e., 'System 2') [9,34]. This duality in cognition is also described by the Elaboration Likelihood Model [35], which is an information processing theory of persuasion that describes changes in a person's attitude as the result of two diverse 'routes'. Under the central route, the recipient of the persuasion attempt (e.g., the user) is thinking rationally about the message, drawing upon prior experience and knowledge to carefully evaluate all of the information presented. In contrast, the peripheral route of persuasion relies on simple cues and heuristics to judge the relevance and validity of a persuasive message.
Fast decisions without much deliberation seem to be common in low-stake recommender domains, such as movies [36]. These choices are typically the result of a simple association or inference process without much cognitive effort [35], activating the peripheral route. Affect is associated with peripheral activation in food choices, as certain emotions tend to be associated with specific foods [37], and foods are chosen based on their visual appeal [2,16]. Although peripheral activation is likely when an explanation is absent (e.g., when only showing images and ingredients), we argue that providing a justification why specific recipes are presented would increase the likelihood of activating the central rather than the peripheral route. In our study, it can be considered as a cognitively oriented healthy eating nudge [4], making users reflect about the contents of recipes.
Another hallmark of the current work lies in the development of a justification framework, designed specifically for the food domain. As discussed in [38], studies that evaluate the impact of explanations and justifications in the food domain are scarce, even though they could encourage users to stick to better eating habits. A preliminary attempt to introduce explanation mechanisms in a food recommender system is presented by Leipold et al. in [39], where a very simple explanation strategy based on food features is integrated with a food recommender system, but the impact on food choices is not evaluated. Another simple explanation interface is presented in [40], where users' food preferences are linked to the ingredients of the recommended recipe, generating explanations such as 'Because you want food containing X'. We go beyond [40], designing and evaluating a more comprehensive set of justification strategies.
Furthermore, the novelty of this work also lies in the automatic generation of comparative natural language justifications that emphasize similarities and differences between two alternative recommendations. Consumer decision-making research has shown that how two alternatives are presented (e.g., separately or comparatively) affects user preferences [22]. A remotely similar approach is adopted by Chen et al. [41], who introduce a user interface where different recommendations are presented together with their distinctive features, obtained automatically from user reviews. However, in contrast with [41], rather than developing a completely novel user interface, we designed a framework to automatically generate a single natural language justification that compares two alternatives.
To conclude, we frame our approach with respect to the taxonomy of explanation strategies introduced in [42], labelling it as a black box methodology. Hence, the explanation strategy is not aware of the underlying recommendation model, generating a post-hoc explanation that is independent of the recommender algorithm. Post-hoc explanations provide reliable and effective explanations that are typically preferred by final users [43,44]. We evaluate this framework by implementing two food recommender approaches: one that identifies popular recipes and one that selects healthier recipes. More details about the algorithms will be provided in the methodology.
Finally, we emphasize that the term justification is used, instead of the 'traditional' explanation. Even though both concepts appear to be synonymous, we follow the definition provided by Biran [45]: an explanation focuses on how the suggestion is generated, while a justification describes why a user would be interested in an item. This supposedly provides users with a means to make a more informed decision about whether to consume an item, which fits the current study's goal, as we evaluate whether and how natural language justifications affect users' online food choices.

The Profiler module collects the user's characteristics. It adopts a holistic user profiling approach used in other studies [46][47][48], including one on taste-based food recommendations and health-related scenarios [49]. Holistic User Models (HUMs) [50,51] rely on the intuition of modeling a profile of the user by combining heterogeneous data points and mapping them to a set of facets that describe the user. These facets include affect (e.g., a user's current mood), contextual constraints (e.g., time and willingness to pay for meals), demographics (i.e., age, gender), self-reported health data (e.g., BMI, lifestyle self-evaluation, stress), and weight-related goals. Table 1 outlines the seven user aspects used, which are encoded in each user profile. Note that preferences were also elicited by asking about favorite ingredients, assuming that these were related to both the overall preferences and specific taste-related preferences.
In a similar vein, the Recipe Analyzer extracts the main food features of the recommended recipes (e.g., ingredients, nutrients). These include the nutritional content of food, expressed in nutrients (i.e., fats, fibers, proteins), calorie content, and a Food Standards Agency (FSA) recipe health score. The FSA score is an aggregate health score that captures the nutritional content of a recipe, based on the serving weight and the amount per 100g of four nutrients: sugar, fat, saturated fat, and salt [2,13,52]. In addition, the Recipe Analyzer also extracts contextual features of the recipes, such as cooking time and preparation difficulty. All these data are crawled from online sources (e.g., recipe websites, such as GialloZafferano) and publicly available knowledge bases. Finally, the Generator outputs the justification, also based on knowledge about health-related food risks and benefits. The final output comprises eight different justification strategies, each emphasizing different recipe characteristics or user features. The generation process follows the principles of Natural Language Generation systems [19]; it is completely automated and unsupervised, and thus requires no human intervention.
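To make the aggregate health score more concrete, the following is a minimal sketch of a traffic-light style score over the four nutrients named above. It is not the implementation used in the framework: the per-100g cut-offs are assumed values modelled on UK FSA front-of-pack guidance, the serving-weight criteria mentioned above are omitted for brevity, and the nutrient values of the two recipes are hypothetical.

```python
from dataclasses import dataclass

# Per-100g cut-offs (low, high) in grams. These are ASSUMED values modelled on
# UK FSA front-of-pack traffic-light guidance; verify them before any reuse.
THRESHOLDS = {
    "fat": (3.0, 17.5),
    "saturates": (1.5, 5.0),
    "sugar": (5.0, 22.5),
    "salt": (0.3, 1.5),
}


@dataclass
class Recipe:
    name: str
    per_100g: dict  # grams of each nutrient per 100 g of the recipe


def nutrient_points(nutrient: str, grams: float) -> int:
    """Return 1 (low), 2 (medium), or 3 (high) for a single nutrient."""
    low, high = THRESHOLDS[nutrient]
    if grams <= low:
        return 1
    if grams <= high:
        return 2
    return 3


def fsa_score(recipe: Recipe) -> int:
    """Sum the points over the four nutrients (4 = healthiest, 12 = least healthy)."""
    return sum(nutrient_points(n, recipe.per_100g[n]) for n in THRESHOLDS)


# Hypothetical nutrient values, for illustration only.
soup = Recipe("Chickpea Soup", {"fat": 2.0, "saturates": 0.4, "sugar": 1.5, "salt": 0.5})
risotto = Recipe("Seafood Risotto", {"fat": 6.0, "saturates": 2.5, "sugar": 1.0, "salt": 1.8})
healthier = min((soup, risotto), key=fsa_score)
```

Under these assumptions a lower score indicates a healthier recipe, so the recipe with the minimum score is the one a health-oriented justification would favor.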
On the basis of this setting, our framework generates its output by following two different justification styles: single and comparative. It takes as input two different recipes. On the one hand, by following the first justification style, both of them are processed separately and each recipe is provided with a different justification. On the other hand, a comparative justification contrasts the characteristics of the two recipes and is automatically generated by the algorithm.
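The difference between the two styles can be sketched as follows. This is an illustrative simplification for a single nutrient, not the framework's code; the function names and sentence templates are assumptions, while the fiber values are those of the example recipes in Table 2.

```python
def single_style(recipes: list[tuple[str, float]], nutrient: str) -> list[str]:
    """Single style: one separate sentence per recipe."""
    return [f"{name} contains {grams:g}g of {nutrient}." for name, grams in recipes]


def comparative_style(first: tuple[str, float], second: tuple[str, float], nutrient: str) -> str:
    """Comparative style: one sentence that contrasts both recipes."""
    (name_a, grams_a), (name_b, grams_b) = first, second
    if grams_a == grams_b:
        return f"{name_a} and {name_b} have the same amount of {nutrient} ({grams_a:g}g)."
    relation = "higher" if grams_a > grams_b else "lower"
    return (
        f"{name_a} has a {relation} amount of {nutrient} "
        f"({grams_a:g}g vs. {grams_b:g}g) than {name_b}."
    )


risotto = ("Seafood Risotto", 1.2)
soup = ("Chickpea Soup", 12.4)
separate = single_style([risotto, soup], "fibers")
contrast = comparative_style(risotto, soup, "fibers")
```

The single style leaves the comparison to the reader, whereas the comparative style states the relation between the two values explicitly.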
To generate justifications, the Generator module also relies on general food knowledge. It uses a food knowledge base that comprises facts related to the daily intakes of nutrients, as well as food consumption benefits and risks. Such knowledge relies on general guidelines concerning food consumption, such as government publications, academic studies, and commonsense knowledge. In particular, for each of the nutrients (sugar, carbohydrates, fats, proteins, fibers), around ten facts are encoded. For instance, "Consuming too much sugar increases the risk of diabetes", "High protein intake improves muscle development", and "High sodium intake increases blood pressure". In total, we have encoded around 150 facts in our knowledge base, which are used in several justification strategies.
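One possible way to organize such a knowledge base is sketched below. The data structure and the lookup function are assumptions made for illustration; the fact texts are taken from the examples in this paper, and the actual knowledge base may be encoded differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    nutrient: str
    kind: str  # either "benefit" or "risk"
    text: str


# A tiny excerpt; the full knowledge base holds around 150 facts.
KNOWLEDGE_BASE = [
    Fact("sugar", "risk", "Consuming too much sugar increases the risk of diabetes."),
    Fact("proteins", "benefit", "High protein intake improves muscle development."),
    Fact("proteins", "risk", "The intake of too many proteins can lead to constipation and dehydration."),
    Fact("fibers", "benefit", "The intake of fiber reduces cholesterol."),
]


def lookup(nutrient: str, kind: str) -> list[str]:
    """Return the texts of all facts that match a nutrient and a fact type."""
    return [f.text for f in KNOWLEDGE_BASE if f.nutrient == nutrient and f.kind == kind]


protein_risks = lookup("proteins", "risk")
```

Keying facts by nutrient and by type would let a benefit-oriented and a risk-oriented strategy draw on the same store.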

Overview of the Justification Strategies
We defined eight different justification strategies. These are outlined in Table 2 along with the relevant characteristics and features. To define and select the justification strategies we used two criteria:
1) The set of justification strategies should elicit mainly i) the central route of persuasion, i.e., encourage the user to reflect on her food choices, thinking rationally about the information provided; or ii) both the central route and the peripheral route, i.e., based on cues aimed at activating non-conscious processes; or, to a lesser extent, iii) the peripheral route of persuasion. In this way, we could compare different forms of persuasion, attempting to understand their effectiveness. We privileged the central route because we mainly embrace a cognitively oriented healthy eating nudge approach, which encourages users to reflect on their food choices. However, also defining strategies that leverage the peripheral route could give us insights on how a justification, which in principle should act on the conscious level of persuasion by providing information on the target behavior, could be combined with "nudges" that elicit unconscious processes.
2) The formulation of the single justification strategies should tackle specific factors that, either consciously (via the central route) or unconsciously (via the peripheral route), may affect behavior change, as pinpointed by behavior change theory. To this aim, we relied on five widely accepted theories of behavior change: the Health Belief Model (HBM) [53,54] and the Theory of Planned Behavior (TPB) [55], which pinpoint the role of attitudes and beliefs in driving human actions; the goal-setting theory, which shows that people make decisions and take action in line with their set goal [56]; the Social Cognitive Theory (SCT) [57], which posits that behavior is affected by e.g., efficacy expectations (or self-efficacy) and the behavior of others; and the Transtheoretical Model of behavior change (TTM) [58], which describes change as a six-stage process through which an individual progresses. We chose these frameworks because they are the most widely used in technology-based interventions for behavior change [59][60][61][62][63].
The strategies exploit different information sources and follow a pre-set structure that is filled in dynamically, based on the workflow components depicted in Figure 1. The text outputs from the Profiler, Recipe Analyzer, and food knowledge components are concatenated using adverbs and conjunctions by the Generator.
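The concatenation step can be illustrated with a small sketch. The function below is hypothetical and covers only one connective ("However"); it simply joins the fragments that the three components would produce, using the sentences of the benefit-oriented example in Table 2 as input.

```python
def join_fragments(recipe_text: str, knowledge_text: str = "",
                   user_text: str = "", contrast_text: str = "") -> str:
    """Concatenate the non-empty fragments into one justification.

    A contrasting fragment, if given, is introduced with the adverb 'However'.
    """
    parts = [p for p in (recipe_text, knowledge_text, user_text) if p]
    if contrast_text:
        parts.append("However, " + contrast_text[0].lower() + contrast_text[1:])
    return " ".join(parts)


justification = join_fragments(
    recipe_text=(
        "Seafood Risotto has a higher amount of proteins (22.7g vs. 18.2g), "
        "and a lower amount of fibers (1.2g vs. 12.4g) than Chickpea Soup."
    ),
    knowledge_text="The intake of many proteins reduces hunger.",
    user_text="Given your current weight, this can be helpful.",
    contrast_text="The intake of fiber reduces cholesterol.",
)
```

Fragments that a strategy does not use are passed as empty strings and dropped, so the same routine could serve strategies with and without user features.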
While most justification strategies in Table 2 put emphasis on health, some variety is included. The Description strategy contrasts both recipes neutrally, providing context on a recipe's origin. The Popularity strategy is based on Social Cognitive Theory, which highlights that people may imitate the behavior of others and choices that appear to be popular in order to be accepted by others [57]. The strategy contrasts each recipe's popularity score on the food community platform GialloZafferano, where they were initially uploaded. This strategy prioritizes the popularity-based recommendation over the healthy recommendation, in part encouraging peripheral processes of persuasion by creating a majority or bandwagon effect [16], where the pressure of "peers" may act unconsciously, as also employed by [18,23].

Table 2 Overview of the eight comparative justification strategies used in our experimental evaluation. While the Description and Popularity strategies did not incorporate user features, all others did and were designed to promote healthy recipe choices. In our example, the seafood risotto represents the popular recipe, while the chickpea soup represents the healthy recommendation.

Health Benefits. Example: Seafood Risotto has a higher amount of proteins (22.7g vs. 18.2g), and a lower amount of fibers (1.2g vs. 12.4g) than Chickpea Soup. The intake of many proteins reduces hunger. Given your current weight, this can be helpful. However, the intake of fiber reduces cholesterol.
Health Risks. User features: BMI, Mood, Sleep, Stress, Physical Activity. Recipe features: Ingredients, Nutritional Information. Example: Seafood Risotto has a higher amount of proteins (22.7g vs. 18.2g), and a lower amount of fibers (1.2g vs. 12.4g) than Chickpea Soup. The intake of too many proteins can lead to constipation and dehydration.
User Lifestyle. User features: Personal Lifestyle. Recipe features: FSA Health Score. Example: According to the FSA Score, Chickpea Soup is healthier than Seafood Risotto. Please consider this, given the importance you attributed to a healthy lifestyle.
User Skills. Recipe features: Level of Difficulty. Example: Chickpea Soup is easier to prepare than Seafood Risotto. It should be more adequate to your cooking skills, which are low.
The strategies related to the recipe's Food Features and the user's Food Goals support central route persuasion processes.
The Food Features strategy is based on the TTM, which notes that consciousness raising, that is, increasing knowledge about aspects related to the behavior to be changed, may encourage people to progress towards behavior change [58]. The strategy informs users about specific nutrients of both recipes, aiming to overcome poor nutrient intake and low food knowledge levels [64,65], based on a neutral lexicalization of the characteristics, such as 'X contains more proteins and fats than Y, but fewer carbohydrates'.
The Food Goals strategy relies on goal-setting theory [56], which shows that people make decisions and take action in line with their set goal: reminding people of these goals is particularly effective if the goals are important to them and are self-set rather than assigned to them [66].Accordingly, Table 2 shows how nutritional food features per recipe are linked to a user's self-set goals, contrasting them ('X has more calories than Y'), and highlighting the recipe with fewer calories if a user pursues weight-loss goals.
Two other justification strategies are based on the HBM and aim to highlight Health Benefits and Health Risks. The HBM points out that health-related behaviors and choices are affected by: i) the perceived susceptibility to illness or health problems and the perceived severity of the consequences associated with the state or condition, ii) the perceived benefits of a health behavior [53,54]. Both justification strategies link nutrient intake information to health benefits or risks, in a process that is split into three parts: i) macro-nutrient selection, ii) retrieving nutrient-specific food knowledge, highlighting either health benefits or risks, iii) connecting relevant user characteristics to the nutrient-specific knowledge. For example, if the user reported being overweight, the justification could highlight a risk related to heart diseases. Both comparative strategies contrast the different levels of nutrients in two different sentences, each linking food characteristics to health benefits or risks, aiming to elicit emotional as well as reflective responses, activating both central and peripheral route processes.
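The three parts of this process can be sketched as follows. The selection rule in step i (largest relative difference between the recipes), the BMI cut-off in step iii, and the wording of the risk-related user clause are all assumptions made for illustration; the paper does not specify them. The fact texts and nutrient values come from the examples in Table 2.

```python
# Step ii draws on nutrient-specific facts; this excerpt is illustrative.
FACTS = {
    ("proteins", "benefit"): "The intake of many proteins reduces hunger.",
    ("proteins", "risk"): "The intake of too many proteins can lead to constipation and dehydration.",
    ("fibers", "benefit"): "The intake of fiber reduces cholesterol.",
}


def select_nutrient(recipe_a: dict, recipe_b: dict) -> str:
    """Step i (ASSUMED rule): pick the nutrient with the largest relative gap."""
    def gap(nutrient: str) -> float:
        a, b = recipe_a[nutrient], recipe_b[nutrient]
        return abs(a - b) / max(a, b, 1e-9)  # guard against division by zero
    return max(sorted(recipe_a), key=gap)


def user_clause(bmi: float, kind: str) -> str:
    """Step iii (ASSUMED rule): only users with a BMI of 25 or more get a clause."""
    if bmi < 25:
        return ""
    if kind == "benefit":
        return "Given your current weight, this can be helpful."
    return "Given your current weight, please keep this in mind."  # hypothetical wording


def health_justification(recipe_a: dict, recipe_b: dict, kind: str, bmi: float) -> str:
    nutrient = select_nutrient(recipe_a, recipe_b)   # step i
    fact = FACTS.get((nutrient, kind), "")           # step ii
    if not fact:
        return ""
    clause = user_clause(bmi, kind)                  # step iii
    return f"{fact} {clause}".strip()


risotto = {"proteins": 22.7, "fibers": 1.2}
soup = {"proteins": 18.2, "fibers": 12.4}
benefit_text = health_justification(risotto, soup, "benefit", bmi=27.0)
```

With these inputs the fiber content shows the largest relative gap, so the sketch selects the fiber-related benefit and appends the weight-related clause.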
The two final justification strategies are based on the user's self-reported lifestyle and skills and are aimed at eliciting the central route.
The User Lifestyle strategy relies on the Theory of Planned Behavior, which states that human behavior is a consequence of one's behavioral intention, which is in turn explained by e.g., one's attitude and subjective norm [67]. Attitudes may in turn be affected by values [68,69]. While a value may be defined as a desirable and fundamental standard that guides people's actions [67], health value is "the degree to which individuals value their health" [70]. In the food domain, it has been shown that people's perceived health values positively affect their choices and actions towards low-fat or low-calorie menu items [67]. The strategy connects the comparative nutritional evaluation of both recipes (in the form of an FSA health score [13]) to a user's personal values, such as the importance of maintaining a healthy lifestyle. The value-attitude-behavior model explains that both values and attitudes affect behavior [67,70].
In a similar vein, the User Skills strategy is grounded in Social Cognitive Theory and, in particular, in the construct of self-efficacy, which captures the belief in one's capabilities to execute a course of action [71]. People who report higher levels of self-efficacy tend to execute more difficult tasks [16,57], because they are more confident that they will successfully execute the task; conversely, people with low self-efficacy may select less difficult activities and give up on difficult tasks [72][73][74]. Bandura [16,57] hypothesized that self-efficacy affects the choice of activities, effort, and persistence. In our study, we link the user's self-reported cooking experience to each recipe's 'level of difficulty'.

Food Recommendation Algorithms and Dataset
For our experimental evaluation, which spans across two studies, we use two personalized food algorithms to retrieve recipes.The first personalized algorithm optimizes for a recipe's health, which is referred to as the Healthy algorithm or health-aware algorithm.Healthy recipes are retrieved based on a variety of user characteristics, such as food goals and dietary constraints [17].The second algorithm retrieves popular recipes, based on given website ratings stored in the dataset, and is thus referred to as the Popular algorithm.Since our natural language justification framework is decoupled from both algorithms, we consider them as independent parameters in our experimental manipulation.
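As an illustration of the Popular algorithm, the sketch below picks the best-rated recipe within a meal category. The tie-breaking by rating count and the optional filtering on dietary tags are assumptions, and the records and field names are hypothetical; the algorithm used in the studies may rank recipes differently.

```python
from dataclasses import dataclass, field


@dataclass
class Recipe:
    name: str
    category: str
    avg_rating: float
    rating_count: int
    tags: set = field(default_factory=set)


def most_popular(recipes, category, required_tags=frozenset()):
    """Return the best-rated recipe in a category that carries all required tags.

    Ties on the average rating are broken by the number of ratings (ASSUMPTION).
    Returns None when no recipe qualifies.
    """
    candidates = [
        r for r in recipes
        if r.category == category and set(required_tags) <= r.tags
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.avg_rating, r.rating_count))


# Hypothetical records, for illustration only.
catalog = [
    Recipe("Seafood Risotto", "first course", 4.6, 310),
    Recipe("Chickpea Soup", "first course", 4.2, 95, {"vegetarian", "vegan"}),
    Recipe("Lasagne", "first course", 4.6, 120),
    Recipe("Tiramisu", "dessert", 4.8, 900, {"vegetarian"}),
]
top_first_course = most_popular(catalog, "first course")
top_vegan_first_course = most_popular(catalog, "first course", {"vegan"})
```

Because the justification framework is decoupled from the recommender, any other ranking rule could replace this one without changing how justifications are generated.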
The recipes used for our NLP framework were sampled from a database of 4,671 Mediterranean-style recipes. The used dataset is available online, along with processing scripts. The recipes have been obtained from the popular food community platform GialloZafferano and translated to English. The recipes contain information about their name, category, preparation difficulty, as well as their ingredients, (macro-)nutrients, calories, rating count, and average website rating. Moreover, they also include several binary tags, such as vegetarian, vegan, lactose-free, and low-nickel.
Study 1: Examining the Effectiveness of Different Justification Styles

Method
In Study 1, we examined the merits of our natural language framework. We investigated the effectiveness of different justification styles (RQ1), comparing user choices for either the healthy or popular recipe recommendation across trials with no justification, a single-style justification, or a comparative justification. We did so across three meal types, using eight different justification strategies throughout, exploring [RQ2] as well.

Participants
In total, we analyzed a sample of 502 US-based participants (43.8% Male) in an experimental evaluation. They were recruited through Amazon MTurk and were required to have a HIT approval rate of 98% and a minimum of 500 approved HITs. Participants were required to be fluent in English. Most participants were employed (81.3%; 2.6% were students). The largest age group was 30 to 40 years (37.1%), whereas 15.7% were between 20 and 30 years and 17.9% were between 40 and 50 years. More than 55% of the participants declared that they had a weight loss goal, whereas only 9.1% had a weight gain goal. The majority of the participants completed the provided tasks in 5 to 10 minutes. They were reimbursed with 0.5 USD.

Procedure
First, the participants were asked questions about demographics, health and well-being, dietary restrictions, food preferences, and experience with home cooking, which were needed to model their profile (see Table 1 for an overview of the features of the model). Then, the profiler (cf. Figure 1) generated three pairs of recommendations (see an example in Figure 2, where the left recommendation is based on our healthy food recommender, whereas that on the right is generated using a popularity-based algorithm), which were presented sequentially to the participant: first, two first course meals, then two second courses and, finally, two desserts. For each pair, participants were required to choose i) the left-hand side recipe, ii) the right-hand side recipe, or iii) neither. The participants were not aware of which recipe was the healthy recommendation, or if there was any at all. Participants who chose one of the two recipes were asked to indicate the reason behind their choice, like the recipe's taste, healthiness, or ease of preparation.

Research Design
To examine whether healthy recipe choices could be supported with different justification styles (RQ1), we designed three between-subject conditions. The participants were either presented no justification for the prompted recipes (i.e., the baseline), a justification style focusing on each recipe separately (i.e., 'Single Justification'), or a justification style comparing the two recipes (i.e., 'Comparative Justification'). Moreover, to explore the merits of different justification strategies (RQ2), the conditions in which single-style or comparative justifications were presented were subject to eight within-subject conditions (see Table 2). This way, one participant could be presented three different single justifications (e.g., Popularity, Food Goals, and Health Risks), while another participant would be prompted three other comparative justifications (e.g., User Lifestyle, Food Features, Health Benefits), or no explanation at all for each recipe. Figure 2 provides an example of a 'User Skills' justification, displayed within the red box. Users were asked to choose one recipe or neither of them, and to provide reasons why they had chosen a recipe.
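The assignment logic of this mixed design can be sketched as follows. This is a hedged illustration rather than the study's actual procedure: it assumes that the style is drawn uniformly at random and that the three strategies shown to a participant are sampled without replacement, one per meal type.

```python
import random

STYLES = ["none", "single", "comparative"]  # between-subject factor
STRATEGIES = [                              # within-subject factor (see Table 2)
    "Description", "Popularity", "Food Features", "Food Goals",
    "Health Benefits", "Health Risks", "User Lifestyle", "User Skills",
]
MEAL_TYPES = ["first course", "second course", "dessert"]


def assign(rng: random.Random) -> dict:
    """Draw one style per participant and one strategy per meal type."""
    style = rng.choice(STYLES)
    if style == "none":
        strategies = [None] * len(MEAL_TYPES)  # baseline: no justification shown
    else:
        # ASSUMPTION: three distinct strategies, sampled without replacement.
        strategies = rng.sample(STRATEGIES, k=len(MEAL_TYPES))
    return {"style": style, "trials": list(zip(MEAL_TYPES, strategies))}


participant = assign(random.Random(42))
```

Passing an explicit random generator keeps the assignment reproducible, which is convenient when auditing how participants were distributed over conditions.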

Measures
To address [RQ1], we considered the effect of different justification styles on the percentage of healthy recommendations chosen by the participants.To this aim, we compared the 'No Justification' baseline either with any justification style separately, that is 'Single' and 'Comparative' justifications, or across the different justification strategies listed in Table 2.The effectiveness of each justification style was compared against the no explanation baseline, across all dish types for all choices made (i.e., choosing the popular recommendation or choosing neither of the recipes).To address [RQ2], Different justification strategies were compared between the no explanation baseline and the comparative style, as the results showed that the comparative style was the most effective justification style.Moreover, to address [RQ3], we examined participants' motivations for choosing one of the two presented recipes.The participants were required to indicate on 5-point scales to what extent a certain motivation was applicable, as well as to report the reason why they had chosen one of the recipe.Motivation items are depicted in Figure 2, and were related to a match with the user's preferences, weight-loss or gain goals, healthy eating goals, the recipe's taste, and a recipe's ease to prepare.The user preferences herein were related to the overall evaluation of the recommendations, while other motivation related to specific aspects (e.g., recipe taste).
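The main outcome measure, the share of healthy recommendations chosen per condition, can be computed as in the sketch below. The choice records are hypothetical and serve only to show the calculation; they are not study data.

```python
from collections import Counter, defaultdict


def healthy_share(choices):
    """Compute the share of healthy choices per justification style.

    `choices` is an iterable of (style, choice) pairs, where choice is one of
    'healthy', 'popular', or 'neither'. All choices count in the denominator.
    """
    counts = defaultdict(Counter)
    for style, choice in choices:
        counts[style][choice] += 1
    return {style: c["healthy"] / sum(c.values()) for style, c in counts.items()}


# Hypothetical records, for illustration only.
records = [
    ("none", "popular"), ("none", "healthy"), ("none", "neither"), ("none", "popular"),
    ("comparative", "healthy"), ("comparative", "healthy"),
    ("comparative", "popular"), ("comparative", "neither"),
]
shares = healthy_share(records)
```

Keeping 'neither' responses in the denominator matches the description above, in which all choices made are considered.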
Finally, we discuss the set of user characteristics that users were asked to disclose. These measures were employed by the Profiler to produce healthy recommendations (see Table 1). Besides obtaining data on food preferences and demographics (i.e., age, gender, BMI), we asked users to report whether they had any food goal (i.e., weight-loss, weight-gain, or no goals), and to rate the healthiness of their lifestyle and the importance for them of having such a lifestyle (5-point scales). The participants were also required to rate how frequently (5-point scale) they made healthy food choices, used websites with recipes, looked at the nutritional values of food, and engaged in home cooking. Furthermore, the participants were asked about their current levels of sleep, physical activity, and mood (3-point scales), and whether they were depressed or stressed ('yes' or 'no'). Finally, we asked them about their food knowledge, as they had to indicate their cooking experience (5-point scale) and cost and time constraints for cooking.

Manipulation Check
We checked whether the health-aware recommendations could actually be considered healthier than the popular recommendations. We assessed recipe healthiness through the 'WHO Score', which was first used in a digital recipe context by Howard et al. [52]. It captures recommended daily intake levels for six nutrients and calories in a score between 0 and 7 [75]. We confirmed that the health-aware recommendations yielded higher WHO scores for each meal type than the popular recommendations: for first courses (health-aware: 4.21; popular: 2.30), second courses (health-aware: 2.65; popular: 1.61), and desserts (health-aware: 2.94; popular: 1.66). The only nutrient for which the popular recommendations were slightly healthier than the health-aware ones was sugar, as the popular recipes tended to be high in fat and saturated fat but somewhat lower in sugar.
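As a rough illustration of how a score of this kind can be computed, the sketch below counts how many intake criteria a recipe satisfies, giving a value between 0 and 7. The nutrient names, units, and ranges are placeholder assumptions chosen for illustration only; they are not the thresholds used in the study or in [52, 75].

```python
from typing import Dict, Tuple

# HYPOTHETICAL criteria: key -> (lower bound, upper bound) per serving.
# Six nutrients plus calories, mirroring the description in the text;
# the concrete nutrients and ranges are illustrative placeholders.
CRITERIA: Dict[str, Tuple[float, float]] = {
    "protein_g": (10.0, 40.0),
    "sugar_g": (0.0, 15.0),
    "fat_g": (0.0, 20.0),
    "saturated_fat_g": (0.0, 6.0),
    "fibre_g": (5.0, float("inf")),
    "sodium_g": (0.0, 0.8),
    "calories_kcal": (0.0, 700.0),
}


def who_style_score(recipe: Dict[str, float]) -> int:
    """Count how many criteria the recipe meets (0 = none, 7 = all)."""
    score = 0
    for name, (low, high) in CRITERIA.items():
        value = recipe.get(name)
        # A missing value is treated as a criterion that is not met.
        if value is not None and low <= value <= high:
            score += 1
    return score


# Made-up recipe: high in sugar, fat, and saturated fat.
example_recipe = {
    "protein_g": 25.0,
    "sugar_g": 30.0,
    "fat_g": 35.0,
    "saturated_fat_g": 12.0,
    "fibre_g": 6.0,
    "sodium_g": 0.5,
    "calories_kcal": 650.0,
}
print(who_style_score(example_recipe))
```

A count-based score of this form makes the manipulation check straightforward: the mean score of the health-aware recommendations can be compared directly with that of the popular recommendations per meal type.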

Results
We examined user choices through three different analyses. First, we examined whether presenting any explanation, through two different styles, affected user preferences for healthy recommendations. Second, we examined preferences for each of our eight justification strategies. Third, we investigated more specifically why users had chosen either healthy or popular recipes.

Single and Comparative Justifications styles (RQ1)
We studied whether participants were more likely to choose healthier recipes if justifications were presented underneath them. We used a one-way ANOVA to examine choices made across all types of meals. A Shapiro-Wilk test for normality showed no evidence for non-normality of the dependent variable (W = 1.00, p = 1.00). The healthy recommendation was chosen more often when any justification was presented underneath it (M = 47.6% of choices, SD = 0.50%), compared to the 'No Justification' baseline (M = 38.1%, SD = 0.49%): F(1, 1504) = 12.14, p < 0.001. This suggested that justifications helped to steer user preferences towards the health-aware recommendation.
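To make the test statistic concrete, the following minimal sketch computes a one-way ANOVA F value by hand for a binary choice indicator. The function name and the toy data are assumptions for illustration; they are not the study's data or analysis scripts, and in practice a statistics library (e.g., `scipy.stats.f_oneway`) would normally be used.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data (made up): 1 = healthy recommendation chosen, 0 = otherwise
no_justification = [0, 0, 1, 0, 1, 0, 0, 1]
any_justification = [1, 0, 1, 1, 0, 1, 1, 0]

f_stat, df1, df2 = one_way_anova_f([no_justification, any_justification])
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

With two groups, the between-groups degrees of freedom equal 1 and the within-groups degrees of freedom equal the number of choices minus 2. The reported F(1, 1504) is consistent with 1,506 choices, that is, 502 participants each making one choice per meal type.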
We further differentiated between the effects of presenting 'Single' and 'Comparative' styles. To do so, we performed a two-way ANOVA with two condition dummies for 'Single' and 'Comparative' justification styles. Although users were not more likely to choose the healthy recommendation when being presented a 'Single Justification' (43.0% of choices, SD = 0.50%, p = 0.13), compared to the baseline (38.1%), they were more likely to do so when facing a 'Comparative Justification' (M = 51.1%, SD = 0.50%): F(1, 1503) = 18.24, p < 0.001. This suggested that comparative justifications were particularly effective in supporting users' choices for the healthy recommendation.
Further analyses teased apart these effects by differentiating across the three meal types, consistent with previous research indicating that preferences differ across meal types [17]. Using multiple one-way ANOVAs, we found that depicting any justification increased the number of choices for healthy recommendations for first courses (F(1, 500) = 4.83, p < 0.05) and desserts (F(1, 500) = 4.43, p < 0.05), but found no such effect for second course meals (F(1, 500) = 3.03, p = 0.08). We further inspected these effects by discerning between 'Single' and 'Comparative' justification styles per meal type, performing multiple two-way ANOVAs. This revealed that while 'Single' justifications did not significantly boost healthy recommendation choices in any dish type (all p-values > 0.1), 'Comparative' justifications did so: for first courses (F(1, 499) = 5.37, p < 0.05), second courses (F(1, 499) = 6.33, p < 0.05), and desserts (F(1, 499) = 6.61, p < 0.05). This gave us further evidence that justifications comparing popular and healthy recommendations were more effective in steering participants' preferences towards healthy recommendations than separate justifications per recipe.
Figure 3 helps to interpret the results of the different ANOVAs. It illustrates recipe choices per meal type (from left to right: first course, second course, dessert), showing the percentage of each chosen option per meal type: neither recipe, the popularity-based recommendation, or the health-aware recommendation. For first course meals and desserts, the 'Single' justification increased the number of choices for healthy recommendations only a little, while 'Comparative' justifications increased it much further. For second course meals, there was little difference between 'No Justification' and 'Single' in terms of choices made, while 'Comparative' boosted choices for healthy recommendations.

Effectiveness of Justification Strategies (RQ2)
The previous subsection highlighted that pairwise justifications were the most effective in steering participants' preferences towards healthy recommendations. Here, we examine the effectiveness of specific justification strategies (cf. Table 2) to promote our healthy recommendations. We examined the effectiveness across all meal types, as well as per separate type of meal. Table 3 outlines four different logistic regression analyses, each of which predicted whether our health-aware recommendation was chosen (compared to a popularity-based choice or no recipe chosen). We found effects to be mixed across the different meal types, while the second course and dessert models had the highest pseudo R² values. However, all significant effects across all models were positive, indicating that the different justification strategies in the comparative condition increased the likelihood that the healthier recommendation was chosen, not the popularity-based option.
The model across all meal types in Table 3 shows that three justification strategies effectively supported health-aware choices. A comparison of the food features of the two recipes (e.g., Recipe A contains less fat than Recipe B) was related to a higher likelihood of choosing the healthy recommendation compared to the no justification baseline: β = .86, p < 0.001 (also in the first course model), as were justifications that compared the health risks of both recipes: β = .98, p < 0.001 (also in the second course and dessert models).
In a similar vein, comparing recipes in terms of their health benefits led users to choose the healthier dessert more often: β = .84,p < 0.05, but not for other meal types.Table 3 also shows that comparing recipes in terms of food goals increased the likelihood of choosing the healthy option for first courses: β = .78,p < 0.05, but not for second courses and dessert.In contrast, a somewhat counterintuitive effect was that a popularity justification strategy, which typically showed that the healthy recipe was less popular than the popularity-based recommendation, increased the likelihood of choosing the healthy recommendation: β = .59,p < 0.05 (also in the dessert model).
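To help read the size of these effects, the sketch below converts the coefficients reported in the text for the model across all meal types into odds ratios. It assumes that the reported β values are unstandardized log-odds coefficients, which is the usual output of a logistic regression but is not stated explicitly in the text.

```python
import math

# Coefficients as reported in the text (model across all meal types).
# ASSUMPTION: these are log-odds coefficients relative to the
# no-justification baseline.
reported_betas = {
    "Food Features": 0.86,
    "Health Risks": 0.98,
    "Popularity": 0.59,
}


def odds_ratio(beta: float) -> float:
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(beta)


for strategy, beta in reported_betas.items():
    print(f"{strategy}: beta = {beta:.2f}, odds ratio = {odds_ratio(beta):.2f}")
```

Under this assumption, a coefficient of .86 corresponds to odds of choosing the healthy recommendation that are roughly 2.4 times as high as in the baseline. An odds ratio describes a change in odds, not a change in the percentage of healthy choices.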
Table 3 Four logistic regression models, predicting choices for health-aware recommendations (against no choice or popularity-based choices) in the 'Comparative' justification condition, compared to the no explanation baseline. The first model examines choices across all meal types (N = 1,071), the other models concern meal type-specific analyses (N = 357 for each model). The denoted 'Pseudo R²' is McFadden's pseudo R².
Table 3 also points out which strategies did not affect participants' preferences between the 'Comparative' and 'No Explanation' conditions. Neither comparative descriptions of the contents of the recipes (e.g., the ingredients) nor comparisons of whether the recipes matched the participant's lifestyle affected participants' preferences for any meal type. Furthermore, comparative justifications of food goals did not influence choices about desserts, whereas highlighting health benefits and risks did not affect choices about first course meals.
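For readers unfamiliar with the fit measure used in the regression tables, the following minimal sketch computes McFadden's pseudo R² as one minus the ratio of the fitted model's log-likelihood to that of an intercept-only model. The function names and the toy probabilities are illustrative assumptions, not output from the study's models.

```python
import math
from typing import Sequence


def log_likelihood(y: Sequence[int], p: Sequence[float]) -> float:
    """Bernoulli log-likelihood of outcomes y under predicted probabilities p."""
    return sum(
        math.log(pi) if yi == 1 else math.log(1.0 - pi)
        for yi, pi in zip(y, p)
    )


def mcfadden_r2(y: Sequence[int], p_model: Sequence[float]) -> float:
    """McFadden's pseudo R2: 1 - lnL(model) / lnL(intercept-only model)."""
    base_rate = sum(y) / len(y)  # an intercept-only model predicts the base rate
    ll_null = log_likelihood(y, [base_rate] * len(y))
    ll_model = log_likelihood(y, p_model)
    return 1.0 - ll_model / ll_null


# Toy example (made up): 1 = healthy recommendation chosen
choices = [1, 0, 1, 0]
predicted = [0.8, 0.2, 0.8, 0.2]
print(round(mcfadden_r2(choices, predicted), 3))
```

A value of 0 means that the predictors add nothing beyond the overall choice rate. The measure is not a proportion of explained variance, so values are best compared between models of the same outcome, as done across meal types here.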

Choice Motivation (RQ3)
Finally, we investigated why the participants had chosen one of the proposed recipes (RQ3).We performed four logistic regression analyses that compared cases in which either the popular or healthy recommendation was chosen, while ignoring cases in which neither recipe was chosen.Table 4 shows a model that includes a participant's choice motivation across all meal types, as well as three meal-specific models.Significant, positive effects in Table 4 indicate reasons why the healthy recommendation was chosen, while significant negative effects provided evidence as to why a popular recommendation was chosen.The best model fit was observed for the first course meal model, for which the pseudo R 2 was around two times higher than for the other models.
We observed mixed evidence for why healthy recommendations were chosen across different meal types. Our health-aware recommendations were chosen more often because of health-related reasons. A positive effect was found across all meal types (β = .41, p < 0.001), as well as for first course meals (β = .78, p < 0.001) and desserts (β = .47, p < 0.001). In contrast, tastiness was related to popular meal choices: averaged across meal types (β = -.47, p < 0.001), as well as for first course (β = -.54, p < 0.001) and second course meals (β = -.58, p < 0.001). Furthermore, users who indicated that they chose recipes because these matched their preferences were more likely to choose our health-aware recommendations across all meal types (β = .13, p < 0.05), in particular for second course meals (β = .52, p < 0.001). Second course healthy recipes were also chosen more often because of a match in food goals: β = .21, p < 0.05. In contrast, easiness was negatively related to choosing healthy first course recommendations (β = -.26, p < 0.01), suggesting that users had chosen popular first course recommendations because they were easier to prepare, while no such effects were observed for second course meals and desserts.
Table 4 Four logistic regression models, each predicting user choices for the Healthy Recommendation. Models either included choices across all meal types (N = 1,339), or only meal-specific choices: First Course (N = 462), Second Course (N = 437), and Desserts (N = 440). We only considered recipe pairs for which users had chosen either the healthy recommendation (positive effects) or the popular recommendation (negative effects). R² is McFadden's pseudo R² [76]. *** p < 0.001, ** p < 0.01, * p < 0.05.

Conclusion
This study explored the effectiveness of different kinds of justifications aimed at explaining health-aware recommendations. With regard to justification styles (RQ1), the study results show that participants preferred popular recipes when no explanation was presented, whereas they preferred health-aware recommendations when a justification was paired with the suggestion. Among the different justification styles presented, we first discovered that comparative justifications are more effective in encouraging healthy choices than single justifications. This is in line with previous research that emphasizes that individuals tend to make comparative judgments rather than combining two independent observations [3]. Furthermore, we have explored the effectiveness of different justification strategies (RQ2), finding that comparing the two recipes' features and their related health risks better promotes healthy food choices. Finally, we have also shown what drives users' choices in selecting healthier recommendations (RQ3), and whether the reasons differ per meal type. For most meal types, we discovered that popularity-based choices are driven by taste motivations, while choices for our health-aware recommendations are tied to health-related reasons.
This said, the contrast between 'no justification' and 'justification' scenarios is usually evaluated in between-subject designs (i.e., A/B tests) or in a within-subject design across multiple, heterogeneous sets [20,30]. In contrast, examining changing preferences for the same set of recommendations is uncommon [77,78], as this is harder to measure. To date, only Zhu et al. [23] examined whether a recommender could reverse user choices within a single study due to majority-based social explanations (e.g., "108 people prefer this one" vs "8 people prefer this one"). Users were first presented pairs of items without any explanation; later in the study, the same pairs were presented again, this time with social explanations. The explanation was presented alongside furniture products, baby photos, and other items from various domains. They found that 14.1% of the users switched towards the item with the majority norm if it was presented quickly after the first trial, while this percentage was higher (22.4%) if there was more time between trials. We follow this approach of preference reversal in Study 2.

Study 2: Investigating Recipe Choices for Different Justification Strategies
For Study 2, we took a stricter study setup than in Study 1, following the work of [23].We examined whether back-to-back trials with and without justifications lead to choice reversal across a recommendation pair.In doing so, we assessed the effectiveness of eight different justification strategies across three different meal types.Note that all relevant processing scripts and datasets are available in our repository: https://osf.io/hn3et/.

Participants
We invited users from the crowdsourcing platform Amazon Mechanical Turk to participate in a study on recipe recommendations and food enjoyment.
Participants were required to be US-based and to have a hit rate of 98%, with a minimum of 500 approved hits, and were reimbursed with 0.5 USD. In total, 504 participants (54.7% male) completed our user study, among whom 61.0% were between 20 and 39 years old. The majority of users were employed (73.6%; 14.9% were students) and had a weight loss goal (51.1%), while only 70 users (13.9%) had a weight gain goal. Participants were recruited throughout the United States, where levels of familiarity with Italian cuisine and a Mediterranean Diet may vary [79].

Procedure
To provide personalized recipe recommendations, we first asked users to indicate their personal preferences regarding their eating habits and to disclose demographics.These included the different user features that were also used to generate the different justification strategies (cf.Table 2), including questions about a user's BMI, cooking experience (5-point scale), self-reported health (5-point scale), mood and well-being (3-point scales), as well as their dietary restrictions (e.g., no gluten or lactose) and general food preferences (i.e., input of ingredients a user liked).Subsequently, we presented six pairs of recipe recommendations -one at a time.The Profiler (cf. Figure 1) generated three recipe pairs based on a user's responses, which were each presented twice to a user.This included a pair of Mediterranean-style first course meals (cf.[80]), a pair of second course recipes, and a pair of desserts.Figure 4 shows an example set of first course recommendations, depicting the healthy recommendation on the left and the popularity-based recommendation on the right.Users were asked to choose the recipe they preferred the most, or neither of them.In addition, users were required to indicate on 5-point scales to what extent different reasons were underlying their choice, whether this was due to a recipe's ease of preparation, fit with user goals or preferences, health, or taste.

Research Design
Fig. 4 The study's interface for two first course meals. The recipe depicted on the left is generated by our health-aware algorithm, the one on the right by a popularity-based algorithm. On the first trial, no justification is given, only a list of ingredients per recipe. Depicted here is the second trial, presenting a pairwise 'health benefits' justification underneath both recipes. Users were asked to choose one recipe or neither of them, and to provide reasons why they had chosen either recipe.
In line with [23], we presented each recipe pair twice to a user. While the first trial was presented with no justification, the second trial presented the same pair of recommendations with a pairwise justification. In doing so, we examine [RQ2], representing the peripheral route of the elaboration likelihood model by a recommendation scenario with no justifications. In contrast, decisions facing a pairwise justification require users to interpret what is comparatively presented, encouraging them to reflect on the information provided and, thus, eliciting central route processes. Hence, the current study juxtaposes these two scenarios, by initially asking users to choose a recipe from a pair of recommendations in the absence of any justification and, subsequently, revisiting that choice when that same pair is presented again, accompanied by a justification. While the latter should take a more central route towards a user's elaboration, the justifications in the current study are situated on different points of the 'peripheral-central continuum', supporting rational reflection to different degrees and also prompting information that elicits peripheral processes. Each justification strategy was randomly sampled from the eight strategies listed in Table 2.
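The trial structure described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the study's implementation: the strategy list contains only the names mentioned in the text (Table 2 lists eight, so it is incomplete), strategies are assumed to be sampled independently per meal type, and the function and field names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence

# Fixed meal order, as described for the study
MEAL_TYPES = ["first course", "second course", "dessert"]

# Strategy names mentioned in the text. Table 2 lists eight strategies,
# so this list is INCOMPLETE and serves only as an example.
EXAMPLE_STRATEGIES = [
    "Popularity", "Food Goals", "Health Risks", "User Lifestyle",
    "Food Features", "Health Benefits", "User Skills",
]


def build_trials(
    strategies: Sequence[str], rng: random.Random
) -> List[Dict[str, Optional[str]]]:
    """Return six trials: per meal type, one without and one with a justification."""
    trials: List[Dict[str, Optional[str]]] = []
    for meal in MEAL_TYPES:
        # First presentation of the pair: no justification (baseline)
        trials.append({"meal_type": meal, "trial": "pre", "justification": None})
        # Second presentation of the same pair: one randomly sampled strategy
        trials.append(
            {"meal_type": meal, "trial": "post", "justification": rng.choice(strategies)}
        )
    return trials


trials = build_trials(EXAMPLE_STRATEGIES, random.Random(42))
for t in trials:
    print(t)
```

Because each pair is shown first without and then with a justification, every participant serves as their own baseline, which is what allows choice reversal to be measured within a single session.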

Results
In the following, we examined [RQ2] and [RQ3].We first reported the descriptive statistics of our 'no justification' baseline.Then, we examined how often users switched towards a different recipe when facing any justification strategy, before examining the effect of specific strategies (RQ2), and how different choice motivations related to healthy food choices (RQ3).

Baseline Results and Users Switching to the Healthy Recommendation
To investigate whether justifications led users to swap their initial choices for the healthier recommendations (related to all research questions), we first examined user choices in the no justification baseline.Figure 5 depicts the distribution of recipe choices per meal type.For first course meals and desserts, the popular recipe was slightly favored, while the healthier recommendation was preferred for second course meals.Since popular recipes were typically preferred in other studies (cf.[13]), this suggested that our health-aware recommendation pipeline was sufficiently personalized to the extent that many users already liked it -even without any justification.By comparing Figures 5 and 6, we examined whether user choices reversed for the same recipe pair after a justification was presented.By performing paired t-tests, we found that users were more likely to switch to the healthier recommendation when any justification was presented alongside first course meals, compared to no justification: t(503) = −3.17,p < 0.01.In contrast, we observed no differences in healthy recipe choices for second course meals: t(503) = 0.24, p = 0.81, nor for desserts: t(503) = −0.24,p = 0.81.
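The paired comparison can be illustrated with a minimal sketch of a paired t-test on binary choice indicators. The toy data and the sign convention (pre minus post, so that a negative t indicates more healthy choices after the justification) are assumptions for illustration; the study's own scripts are in the repository, and a library routine such as `scipy.stats.ttest_rel` would normally be used.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(pre: Sequence[float], post: Sequence[float]) -> Tuple[float, int]:
    """Return (t, df) for a paired t-test on pre - post differences.

    Requires at least two pairs and non-identical differences,
    otherwise the standard error is zero.
    """
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Toy data (made up): 1 = healthy recipe chosen, one entry per participant
pre_choice = [0, 0, 1, 0, 1, 0, 0, 1]   # first trial, no justification
post_choice = [1, 0, 1, 1, 1, 0, 0, 1]  # second trial, with justification

t_stat, df = paired_t(pre_choice, post_choice)
print(f"t({df}) = {t_stat:.2f}")
```

The degrees of freedom equal the number of participants minus one, which is consistent with the reported t(503) for 504 participants.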

Specific Justification Strategies (RQ2)
We further investigated which justification strategies led users to reverse their choices towards the healthier recommendation (RQ2). We assessed whether the likelihood that a healthy recipe was chosen increased or decreased due to a specific justification strategy (i.e., reversing user choices), compared to the no justification baseline in the first trial. To this end, Table 5 reports three random-effects logistic regression models, one per meal type, of which the second course model is reported but disregarded, because it did not pass the Wald χ² test of model fit. Table 5 shows that different justification strategies affected users' healthy choices for different meals. For first course meals, four different strategies increased the likelihood that a healthy recipe was chosen: a justification that described the features of both recipes (β = 1.69, p < 0.05), justifications that compared both recipes' nutrients and linked them to health benefits (β = 2.21, p < 0.01) and risks (β = 3.25, p < 0.01), and a justification on how a recipe could contribute to a user's lifestyle (β = 1.84, p < 0.05). This suggested that most of the justification strategies that highlighted nutritional aspects of recipes, and possibly linked these to user characteristics, were successful in reversing initial user choices and steering them towards healthier choices for first course meals.
Justifications were less successful in promoting healthy dessert choices.Table 5 shows that the strategies that affected the likelihood of healthy first course choices, did not do so for desserts.Instead, justification strategies on the recipes' health benefits (β = −2.66,p < 0.01) and preparation difficulty (i.e., user skills; β = −1.79,p < 0.05) decreased the likelihood that a healthy dessert was chosen.It seemed that our justification strategies were not appropriate for the dessert context, as users might have had more taste-related reasons for their choices, which was examined next.

Choice Motivation (RQ3)
Finally, to contextualize our findings, we examined to what extent a user's motivation to choose the healthy recommendation changed after being presented any justification (RQ3). Table 6 describes six logistic regression models: three models that predicted healthy recipe choices before a justification was presented (denoted by β_pre; one per meal type), and three models for after a justification was presented (denoted by β_post). Across all meal types, we found that health-related choice motivations positively affected the likelihood of healthy recipe choices 'post-justification', while this only applied to first course meals and desserts 'pre-justification'. This suggested that our health-aware recommendations catered to users who were making health-motivated recipe choices, while the justification was important for second course meals.
baseline. This high proportion of healthy recipes was barely affected by any of the justification strategies.
Table 6 Six logistic regression models predicting healthy recipe choices using different choice motivations. Reported are the β coefficients and standard errors before being presented a justification ('pre') and after being presented one ('post'), per meal type. Food Characteristics-related motivations were only inquired 'post-justification'. *** p < 0.001, ** p < 0.01, * p < 0.05.
In contrast, none of the models showed a relation between preference-related, goal-related, or food characteristics-related motivations and healthy recipe choices, indicating that these motivations were not specifically linked to either recommendation. Table 6 further suggests that the addition of justifications seemed to put less emphasis on contextual factors. Whereas motivations related to taste (first course meals and desserts) and ease of preparation (first course) decreased the likelihood that a healthy recipe was chosen 'pre-justification', these effects were no longer present 'post-justification'. This suggested that the nutritional or health-related emphasis of most of our justifications was successful, arguably making users reflect on their initial food choice and tapping into the more central route of persuasion.

Conclusion
Study 2 analyzed users' changing preferences for the same set of recommendations, examining choice reversal in back-to-back trials with and without justification. We provided additional evidence for addressing our research questions, by evaluating the effectiveness of eight different justification strategies (RQ2), grounded in psychological literature, across three different meal types. The study results pointed out that pairwise justifications may encourage participants to reverse their choices towards healthier recipe recommendations, moving them away from popular recipes, but that this particularly applied to first course meals. Moreover, we discovered that different kinds of justifications may have different effects for different types of meals. Justification strategies tied to food features, health benefits and risks, and the participant's lifestyle are most effective for first course meals. However, for second course meals we found no effect, which might be because the healthy recommendation for this meal type was already preferred by a large part of the participants in the pre-justification trial, leaving little room for improvement when introducing pairwise justifications.
With regard to the choice motivation of participants (RQ3), we found more evidence that users who are interested in health were more likely to choose the healthy recipe. This already applied to the pre-justification conditions for first course meals and desserts, but also post-justification for second course meals. In addition, we observed that other motivations that were present pre-justification, such as ease of preparation and the taste of the recipes, were no longer important after seeing a justification, indicating that the justifications affected what mattered to users when choosing a recipe.

Discussion
We examined to what extent natural language justifications in a knowledge-based food recommender system can support healthier recipe choices. We have presented two studies in which we have predicted healthy recipe choices by the style of justification used (Study 1; RQ1), by the justification strategy used (Study 1, but mostly Study 2; RQ2), and by a user's choice motivation (both studies; RQ3). The effectiveness of eight different justification strategies, which have been grounded in psychological literature, has been evaluated across three different meal types. In doing so, Study 2 has employed a research design with a stricter baseline, examining choice reversal in back-to-back trials with and without justification, which makes us among the first to do so in recommender system research [23] and the first in food recommender research [11,17].
The overall contribution of this paper is twofold. First, we present a recommendation approach that captures a user's eating preferences. In contrast with most earlier work [10,11,81], we have not focused on recipes that users liked in the past, but we have considered a user's general eating preferences, affect, self-reported skills, and domain knowledge. This has resulted in a recommendation pipeline that presents personalized, yet healthier recommendations. Second, we have presented an approach to generate natural language justifications for food recommendations. While the NLP pipeline is a contribution in its own right, particularly in a food recommender system, we have also validated its effectiveness through a user study, showing what types of justifications are most effective in promoting our health-aware recommendations. Whereas popular recipes are preferred by most users if no explanation is presented (our 'baseline'), we have shown that most users prefer our health-aware recommendations over a challenging popularity-based recommendation baseline when both recommendations are presented along with a comparative justification.
Our results indicate that pairwise justifications can help to reverse and steer user preferences towards healthier recipe recommendations, moving away from the commonly preferred popular recipes. However, it seems that different types of justifications might be effective for different types of meals. The use of justifications has led to the most preference reversals in first course meal choices, for which we have found that strategies related to food features, health benefits and risks, and a user's lifestyle are most effective. In terms of persuasiveness, we expect these strategies to have appealed to different parts of the peripheral-central route continuum of the elaboration likelihood model [35], since the Health Benefits and Health Risks justifications comprise both emotional and reflective responses [53,54], while Food Features and User Lifestyle mainly require longer-term contemplation. The justification effectiveness is also reflected in the reported choice motivation of users: whereas ease of preparation and taste-related reasons negatively affected 'pre-justification' healthy choices, we have only found health-related reasons to choose a healthy recipe 'post-justification'.
The lack of any effects due to our justifications for second course meals could be attributed to the relatively high proportion of choices for the healthy recommendations in the baseline. Since these were preferred by a large proportion of users in the pre-justification trial, this left little room for improvement by introducing pairwise justifications.
Furthermore, we find that dessert choices are mostly taste-related, which undermines the effectiveness of most health and nutrition-related justifications. Nonetheless, our analysis of choice motivations suggests that the justifications have put more emphasis on the health aspect, as taste-related motivations decreased post-justification. We expect that justifications will mostly resonate with users who have strict dietary restrictions or ambitious healthy eating goals.
A limiting factor to our study's design was that the same order of meal types was maintained across all participants, starting with first course meals and ending with desserts.It is possible that users facing their second or third pair of recipes were less likely to change their preferences when facing a justification for those meal types.Alternatively, users might have already opted for the healthier choice in the first place (e.g., for the second course meals), because the justification for the first course meal activated reflective cognitive processes [35], which could have spilled over into later trials.In that sense, the results for the first course meals are likely to be more representative than those for second course meals, as this meal type is also less familiar to non-Italian natives.
The extent to which users are familiar with Italian cuisine has not been measured in our studies. It is possible that their evaluation of Italian-style recipes is different from that of, for example, American-style recipes, due to differences in dietary intake styles [80]. Italian recipes could fall under a Mediterranean diet, which is, among others, characterized by a high intake of fruits, vegetables, whole grains, legumes, and nuts and a much more moderate intake of red meat and dairy products compared to a North American diet [82]. While all participants in both studies are based in the USA, requiring fluency in English, their cultural and ethnic background is not known, nor is their knowledge of various cuisines. Regional differences exist in the USA regarding the dominance of the Italian cuisine [79], among others due to large-scale immigration from Italy around the turn of the 20th century [83]. While the implications of an American-Italian match in cuisine cannot be inferred from our results, it is clear that many Americans are familiar with Italian-style meals [79]. Moreover, general attitudes towards Italian products are rather positive [84], which might have increased user favorability towards any Italian recipe. Follow-up studies could control for this match between participants and cuisine.
Another limiting factor to our findings is the extent to which the recommendations fit into one's diet. While shifting towards a healthier dinner meal can go a long way in terms of improving dietary intake [85,86], it is not informative about one's eating habits throughout the rest of the day. In a similar vein, the extent to which longer-term preferences have been considered is minimal. For our approach, we have assumed that one's preferences as elicited in our knowledge-based system apply to the current session and beyond, using the session-based approach of previous recipe recommenders [87,88]. While this has been appropriate to address our research questions regarding justifications (RQ1-RQ2), future research is required to examine whether such an intervention will lead to longer-term changes.
We recommend that follow-up studies explore the effectiveness of different justification strategies in a less controlled environment. Whereas the research design of the current study is suitable to point out specific effects, most food choices are not made between pairs of recipes, but rather in the context of larger lists, such as in 'more like this' recommendations on recipe websites or in the context of multi-list food recommender interfaces [88].
With regard to specific justification styles, we find that comparative approaches are more effective in promoting choices for health-aware recommendations than single justifications. This is in line with research showing that people are much better at making comparative judgments than at combining two 'singular' observations [22], which is reflected by the effectiveness of our 'Comparative' justification style over the 'Single' style. The obtained evidence is convincing, since we have observed this effect across different meal types, even for desserts, for which food choices tend to be more related to taste than to health [17]. Moreover, we have also examined the effectiveness of specific justification strategies, suggesting that presenting a comparison of each recipe's features and health risks seems to cater towards a user's healthy food preferences. The sophistication of these strategies may have contributed to their effectiveness, for they link and compare different aspects, namely user characteristics, recipe features, and food goals. Although the large number of comparisons for specific justification styles may have been prone to a higher false positive rate, the overall results point out that all explanation strategies (even the popularity-based strategy) either promote healthy food choices or have no net effect.
We have also examined what drives users to choose healthier recommendations, and whether this differs per meal type. For most meal types, we have found evidence that popularity-based choices are related to taste motivations, while choices for our health-aware recommendation are linked to health-related reasons. This confirms that our health-aware recommendation pipeline caters to users with healthy eating goals, which is promising for future applications that seek to support such users. Moreover, 'because it fits my preferences' is also found to be a reason to choose the healthy recommendation across all meal types, suggesting that our approach could generate both satisfactory and healthy food recommendations, which is rarely found in food RSs to date [11,81].
An interesting avenue for future research is to test whether the insights generalize to a practical application in which more than two recipes are presented in a recommendation list (e.g., [89]). In addition, we will introduce justifications combining several user-focused aspects, such as food taste and goals, to assess whether these can persuade a user to choose the healthier recommendation. We will also investigate whether such natural language justifications can be personalized further, and whether this would increase their effectiveness. For example, presenting justification styles that address healthy eating goals makes more sense if a user has indicated having such a goal. While the current user study has done so by inquiring about the user's preferences in the first screen, such questions would only need to be asked when a user's profile is created, for instance on a recipe website.
Finally, we wish to emphasize that the study can serve as a blueprint for future studies on healthy food recommendation. We have shown that our algorithm successfully generates healthy recommendations, as users who chose them indicated health-related reasons for their choice. Moreover, we have also shown how such recommendations should be presented to support healthy food choices. Such a combination of a knowledge-aware algorithm and UI design should pave the way for even more sophisticated applications in food recommendation, as well as for applications in other behavioral recommendation domains. Moreover, future work should extend the number of inputs in the recommender framework by taking into account, and evaluating, a larger and more comprehensive set of algorithms.

Fig. 1 Schematic workflow to generate natural language justifications, based on user and recipe features and food knowledge, to be incorporated in food recommendations.

Figure 1 depicts the workflow to generate natural language justifications. It shows three main components. The Profiler module collects the user's characteristics. It adopts a holistic user profiling approach used in other studies [46-48], including one on taste-based food recommendations and health-related scenarios [49]. Holistic User Models (HUMs) [50,51] rely on the intuition of modeling a profile of the user by combining heterogeneous data points and mapping them to a set of facets that describe the user. These facets include affect (e.g., a user's current mood), contextual constraints (e.g., time and willingness to pay for meals), demographics (i.e., age, gender), self-reported health data (e.g., BMI, lifestyle self-evaluation, stress), and weight-related goals. Table 1 outlines the seven user aspects used, which are encoded in each user profile. Note that preferences were also elicited by asking about favorite ingredients, assuming that these were related to both overall preferences and specific taste-related preferences. In a similar vein, the Recipe Analyzer extracts the main food features of the recommended recipes (e.g., ingredients, nutrients). These include the nutritional content of food, expressed in nutrients (i.e., fats, fibers, proteins), calorie content, and a Food Standards Agency (FSA) recipe health score. The FSA score is an aggregate health score that captures the nutritional content of a recipe, based on the serving weight and the weight per 100g of nutrients:
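As a rough illustration of how an aggregate score of this kind can be derived from the serving weight and the per-100g nutrient weights, the Python sketch below implements a traffic-light style FSA score. It is a sketch under stated assumptions, not necessarily the exact computation used in this work: the four nutrients (fat, saturates, sugars, salt), the per-100g cut-offs taken from the UK front-of-pack labelling guidance, and the function names `per_100g`, `traffic_light`, and `fsa_score` are all assumptions introduced here. The additional per-portion criteria of that guidance are omitted for brevity.

```python
# Assumed per-100 g cut-offs in grams: (upper bound for "low", upper bound for "medium").
# These follow the UK front-of-pack traffic-light guidance and are an assumption here.
THRESHOLDS = {
    "fat": (3.0, 17.5),
    "saturates": (1.5, 5.0),
    "sugars": (5.0, 22.5),
    "salt": (0.3, 1.5),
}


def per_100g(amount_g, serving_weight_g):
    """Convert a nutrient amount per serving into grams per 100 g of the dish."""
    return amount_g * 100.0 / serving_weight_g


def traffic_light(nutrient, grams_per_100g):
    """Return 1 (low), 2 (medium), or 3 (high) for a single nutrient."""
    low, high = THRESHOLDS[nutrient]
    if grams_per_100g <= low:
        return 1
    if grams_per_100g <= high:
        return 2
    return 3


def fsa_score(nutrients_per_serving, serving_weight_g):
    """Sum the four traffic-light points: 4 is the healthiest score, 12 the least healthy."""
    return sum(
        traffic_light(name, per_100g(nutrients_per_serving[name], serving_weight_g))
        for name in THRESHOLDS
    )


# Hypothetical recipe: a 400 g serving with the listed nutrient amounts in grams.
example = {"fat": 20.0, "saturates": 4.0, "sugars": 8.0, "salt": 2.0}
print(fsa_score(example, serving_weight_g=400.0))
```

Under these assumptions a lower score indicates a healthier recipe, which makes the score easy to use when ordering two candidate recipes by healthiness.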

Fig. 2 The study's interface for two first course meals. The recipe displayed on the left is our healthy-algorithm recommendation; the one displayed on the right is generated by a popularity-based algorithm. Depicted within the red box is a justification in a specific style, in this case a 'Comparative' User Skills justification; the box is missing in the 'No Justification' condition. Users were asked to choose one recipe or neither of them, and to provide reasons why they had chosen a recipe.

Fig. 3 Percentages of choices per condition, per meal type. Depicted are choices for neither recipe (in blue), the Popular recipe (in red), and the Healthy recommendation across three different meal types. Conditions are the three different justification styles: No justification, single justifications, and comparative justifications. Meal types are First Course, Second Course, and Dessert.
), where 'Lc' denotes the maximized likelihood value of the current model and 'Lnull' of the baseline model.

Fig. 5 Distribution of recipes chosen in the no justification baseline (i.e., the first choice made for a recipe pair), per meal type.

Fig. 6 Distribution of recipes chosen when a pairwise justification was presented (i.e., a recipe pair's second choice), per meal type.

Table 1 User characteristics obtained by the Profiler module in our natural language justification workflow.
Seafood risotto is a classic first dish of Italian cuisine, perfect for special occasions and for all seasons! Chickpea soup is a very simple and tasty first course. A poor farmer's recipe, that is prepared in very few steps! Popularity none Popularity Seafood Risotto is more popular than Chickpea Soup in the community.
Seafood Risotto has a higher amount of proteins (22.7g vs. 18.2g), and a lower amount of fibers (1.2g vs. 12.4g) than Chickpea Soup.
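The comparative feature justification above follows a regular pattern: for each selected nutrient, the first recipe is described as having a higher or lower amount than the second. A minimal template-based sketch of that pattern in Python is given below. The function names, the dictionary layout of a recipe, and the fallback sentence are hypothetical and introduced only for illustration; the nutrient values are the ones quoted in the example.

```python
def compare_feature(name, value_a, value_b, unit="g"):
    """Build one clause, e.g. 'a higher amount of proteins (22.7g vs. 18.2g)'.

    Returns None when both recipes contain the same amount, so that
    uninformative comparisons can be skipped.
    """
    if value_a == value_b:
        return None
    direction = "higher" if value_a > value_b else "lower"
    return f"a {direction} amount of {name} ({value_a}{unit} vs. {value_b}{unit})"


def comparative_justification(recipe_a, recipe_b, features):
    """Fill a fixed sentence template that compares two recipes feature by feature."""
    clauses = []
    for feature in features:
        clause = compare_feature(
            feature, recipe_a["nutrients"][feature], recipe_b["nutrients"][feature]
        )
        if clause is not None:
            clauses.append(clause)
    if not clauses:
        # Hypothetical fallback wording for recipes with identical values.
        return f"{recipe_a['name']} and {recipe_b['name']} do not differ on the selected features."
    return f"{recipe_a['name']} has {', and '.join(clauses)} than {recipe_b['name']}."


# Values taken from the example justification in the text.
risotto = {"name": "Seafood Risotto", "nutrients": {"proteins": 22.7, "fibers": 1.2}}
soup = {"name": "Chickpea Soup", "nutrients": {"proteins": 18.2, "fibers": 12.4}}

print(comparative_justification(risotto, soup, ["proteins", "fibers"]))
```

A fixed template keeps the generated sentence predictable, at the cost of less varied wording than a free-form text generator would produce.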

Table 5 Random-effects logistic regression models (clustered at the user level), capturing different justification strategies that predict whether the healthy recipe is chosen from a recommendation pair. Effects are relative to the effect in the 'No Justification' baseline. Note that the Second Course model does not pass Wald's model test and can be disregarded. *** p < 0.001, ** p < 0.01, * p < 0.05.