1 Introduction

Food choices are the result of a context-dependent, multi-aspect process (Furst et al. 1996; Starke et al. 2021). While people’s general food preferences in part determine short-term decisions (Köster 2009), a significant part of our eating habits is strongly influenced by contextual factors (Cadario and Chandon 2020). Many decisions are made at the point of purchase (Bialkova and van Trijp 2011; Bialkova et al. 2014). For example, foods presented at eye level in supermarkets are more likely to be purchased (Kroese et al. 2016), as are food products with visually attractive packaging (Bialkova et al. 2014; Cadario and Chandon 2020). Such food decisions are often made routinely (Kalnikaitė et al. 2013) and are based on heuristics and so-called System 1 thinking (Kahneman 2011), rather than on longer-term deliberation.

Online food choices are typically made in the context of information-filtering and retrieval systems (Starke et al. 2021). Food recommender studies have examined different approaches to cater to a user’s appetite (Freyne and Berkovsky 2010; Trattner and Elsweiler 2019; Elsweiler et al. 2022), but have paid little attention to how users can be supported in eating more healthily, despite evidence that the popular internet-sourced recipes that are commonly recommended tend to be unhealthy (Trattner and Elsweiler 2017). Moreover, consumers tend to be overwhelmed with information when making a decision (Kalnikaitė et al. 2013), a problem that changing the recommendation context alone cannot alleviate. At best, studies have considered specific dietary constraints (e.g., allergies) and nutrient intake to generate healthier recommendations (Schäfer et al. 2017; Schäfer and Willemsen 2019), or have leveraged human biases to steer user preferences toward specific recipes, for example by using visually attractive images (Elsweiler et al. 2017; Starke et al. 2021).

In the recommender context, we argue that users can be supported in making healthier choices through justifications of why a set of recommendations is presented. Specifically for the food domain, justifications that elaborate on the nutritional content of different recipes can steer user choices away from common popularity-based recommendations (Musto et al. 2020). An open question is to what extent justifications can affect user preferences if items are already personalized (Starke et al. 2020), and whether they can still affect a user’s preferences after that user has already made a choice.

In this paper, we present an approach inspired by knowledge-based natural language generation strategies (Reiter and Dale 2000) to produce justifications for different recipe recommendations. Recent developments in natural language justification strategies have shown their merit in improving the transparency of the recommendation process, increasing users’ trust, and affecting their decision-making processes (Tintarev and Masthoff 2012; Nunes and Jannach 2017). The proposed framework takes a user and two food recommendations as input and produces an automatically generated natural language justification as output, based on the user’s characteristics and the recipes’ features. It draws upon general knowledge about health risks and benefits related to food consumption to generate justifications. Within the framework, eight different justification strategies are implemented through two different justification styles, based on the combination of different informative content and features. In particular, we generate comparative justifications of recommendations, which juxtapose the main characteristics of two recipes in a single natural language sentence. For instance, such a justification could compare the fiber content of two recipes. This taps into consumer research on the effectiveness of comparative evaluations of item attributes (Bettman et al. 1998), compared to a separate representation of that information (i.e., a “Single” justification).

We evaluate the effectiveness of the eight implemented justification strategies and the two justification styles in supporting healthy food choices. We examine this across two studies, asking users in each study to choose between popularity-based and health-based recommendations. In the first study (\(N=502\)), we examine which natural language justification style is most effective in steering users toward healthier recipe choices. Building upon preliminary findings, in the second study (\(N=504\)), we examine which natural language justification strategy is most effective in promoting healthy recipe choices. In doing so, we adopt a strict baseline: we first present a recommendation pair to users with no justification, immediately followed by the same pair accompanied by one of our eight justification strategies. Such a preference or choice reversal is hard to achieve (Zhu et al. 2012), as people tend to stick to the status quo when making a decision (Kahneman 2011). Finally, in both studies we ask users why they chose either the healthy or the popular recipe, as a user’s motivation could help us understand how to design better justifications (Tintarev and Masthoff 2012). We pose the following research questions:

[RQ1]: Which natural language justification styles are most effective in steering user preferences toward healthier recipes, and for which types of meals?

[RQ2]: Which natural language justification strategies are most effective in steering user preferences toward healthier recipes, and for which types of meals?

[RQ3]: To what extent can users’ self-reported motivation predict healthy recipe choices?

As we will show in the following, users preferred healthier recipes over popularity-based recommendations when comparative-style justifications were presented, as well as for specific justification strategies.

We summarize our contributions as follows: (i) we introduce a methodology to automatically generate natural language justifications that support personalized food recommendations; (ii) we design and (iii) evaluate several justification styles (i.e., none, single, or comparative) and strategies in two user studies, where each justification leverages different user characteristics and recipe features; and (iv) we examine which justification strategies are most effective in affecting user choices.

2 Related work

The idea of equipping intelligent information systems with explanation facilities has been studied since the early 1990s (Johnson and Johnson 1993). It was introduced to recommender systems in the 2000s (Herlocker and Konstan 2001) and has recently regained attention due to the General Data Protection Regulation (GDPR), which calls for greater transparency of underlying algorithms. This particularly applies to recommender systems, since explanation strategies have been shown to positively affect both a user’s acceptance of and trust in the presented recommendations (Sinha and Swearingen 2002; Cramer et al. 2008).

Explanations in recommender systems can have different aims (Tintarev and Masthoff 2012). For example, explanations can educate users or improve the efficiency of decision-making (Jannach et al. 2010; Tintarev and Masthoff 2011). For our current work, we identify persuasiveness as the main aim (Symeonidis et al. 2008). We specifically aim to promote healthy food choices through our justifications, which is novel to food recommender research (Trattner and Elsweiler 2019). The persuasive explanation aim is touted in other domains as useful to convince users to try or buy a recommended item, such as a product on Amazon or a movie on Netflix (Gkika and Lekakos 2014; Tintarev and Masthoff 2012).

With respect to the information content exploited to generate justifications, we frame our approach as being at the intersection of content-based and knowledge-based methods (Jannach et al. 2010). It is based on user characteristics and food features, along with general knowledge on food consumption. Taken together, these justify our health-aware recommendations by emphasizing health risks and benefits. This is related to studies in which health risks are highlighted in a smoking cessation application in a recommender context (Hors-Fraile et al. 2016, 2022); although no evidence is provided concerning the effectiveness of such information (Hors-Fraile et al. 2016), a knowledge-based health recommender did lead to better results than a hybrid recommender (Hors-Fraile et al. 2022). Our work addresses this gap by evaluating the impact of justifications that include health risks and benefits on users’ food choices.

The effectiveness of justifications can be better understood through cognitive processing and decision-making theories. For one, dual-process theory emphasizes that people’s behavior is determined by two distinct processes or systems: a non-conscious process that relates to spontaneous, heuristic-based thinking (i.e., “System 1”), and a reflective process that relies on rational and conscious decision-making (i.e., “System 2”) (Hagger 2016; Kahneman 2011). This duality in cognition is also described by the Elaboration Likelihood Model (Petty et al. 1997), an information processing theory of persuasion that describes changes in a person’s attitude as the result of two distinct “routes.” Under the central route, the recipient of the persuasion attempt (e.g., the user) thinks rationally about the message, drawing upon prior experience and knowledge to carefully evaluate all of the information presented. In contrast, the peripheral route of persuasion relies on simple cues and heuristics to judge the relevance and validity of a persuasive message.

Fast decisions without much deliberation seem to be common in low-stakes recommender domains, such as movies (Gomez-Uribe and Hunt 2015). These choices are typically the result of a simple association or inference process without much cognitive effort (Petty et al. 1997), activating the peripheral route. Affect is associated with peripheral activation in food choices, as certain emotions tend to be associated with specific foods (Gutjar et al. 2015), and foods are chosen based on their visual appeal (Elsweiler et al. 2017; Starke et al. 2021). Although peripheral activation is likely when an explanation is absent (e.g., when only showing images and ingredients), we argue that providing a justification for why specific recipes are presented would increase the likelihood of activating the central rather than the peripheral route. In our study, such a justification can be considered a cognitively oriented healthy eating nudge (Cadario and Chandon 2020), making users reflect on the contents of recipes.

Another hallmark of the current work lies in the development of a justification framework designed specifically for the food domain. As discussed in Tran et al. (2018), studies that evaluate the impact of explanations and justifications in the food domain are scarce, even though they could encourage users to adopt better eating habits. A preliminary attempt to introduce explanation mechanisms into a food recommender system (RS) is presented in Leipold et al. (2018), where a very simple explanation strategy based on food features is integrated into the system, but its impact on food choices is not evaluated. Another simple explanation interface is presented in Elahi et al. (2014), where users’ food preferences are linked to the ingredients of the recommended recipe, generating explanations such as “Because you want food containing X.” We go beyond Elahi et al. (2014) by designing and evaluating a more comprehensive set of justification strategies.

Furthermore, the novelty of this work also lies in the automatic generation of comparative natural language justifications that emphasize similarities and differences between two alternative recommendations. Consumer decision-making research has shown that how two alternatives are presented (e.g., separately or comparatively) affects user preferences (Bettman et al. 1998). A loosely related approach is adopted by Chen and Wang (2017), who introduce a user interface where different recommendations are presented together with their distinctive features, obtained automatically from user reviews. In contrast with Chen and Wang (2017), rather than developing a completely novel user interface, we designed a framework that automatically generates a single natural language justification comparing two alternatives.

In addition, we frame our approach with respect to the taxonomy of explanation strategies introduced in Friedrich and Zanker (2011), labelling it a black-box methodology. That is, the explanation strategy is not aware of the underlying recommendation model and generates a post hoc explanation that is independent of the recommender algorithm. Post hoc explanations have been found to be reliable and effective, and are typically preferred by end users (Musto et al. 2019, 2020). We evaluate this framework by implementing two food recommendation approaches: one that identifies popular recipes and one that selects healthier recipes. More details about the algorithms are provided in Section 3.3.

Finally, we emphasize that we use the term justification instead of the more “traditional” explanation. Even though both concepts appear to be synonymous, we follow the definition provided by Biran and Cotton (2017): an explanation focuses on how a suggestion is generated, while a justification describes why a user would be interested in an item. A justification is thus intended to give users a means to make a more informed decision about whether to consume an item, which fits the current study’s goal: we evaluate whether and how natural language justifications affect users’ online food choices.

3 Methodology

3.1 Natural language justification workflow

Figure 1 depicts the workflow to generate natural language justifications. It comprises three main components: the Profiler, the Recipe Analyzer, and the Generator.

The Profiler module collects the user’s characteristics. It adopts a holistic user profiling approach used in other studies (Cena et al. 2018a, b, 2020), including one on taste-based food recommendations and health-related scenarios (Polignano et al. 2020). Holistic user models (HUMs) (Musto et al. 2020, 2021) rely on the intuition of modeling a user profile by combining heterogeneous data points and mapping them to a set of facets that describe the user. These facets include affect (e.g., a user’s current mood), contextual constraints (e.g., time and willingness to pay for meals), demographics (i.e., age, gender), self-reported health data (e.g., BMI, lifestyle self-evaluation, stress), and weight-related goals. Table 1 outlines the seven user aspects used, which are encoded in each user profile. Note that preferences were also elicited by asking about favorite ingredients, assuming that these relate to both overall preferences and specific taste-related preferences.
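To make the facet mapping concrete, the following is a minimal sketch of how questionnaire answers could be grouped into holistic-model facets. The class name `HolisticUserProfile`, its fields, and the facet keys are hypothetical illustrations chosen to mirror the facets listed above; they are not the actual seven aspects of Table 1 or the authors’ data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HolisticUserProfile:
    """Hypothetical holistic user model; field names are illustrative only."""
    # Demographics
    age: int
    gender: str
    # Self-reported health data
    bmi: float
    lifestyle_healthiness: int  # 1-5 scale
    stressed: bool
    # Affect
    mood: str  # e.g., "bad", "neutral", "good"
    # Weight-related goal: "lose", "gain", or None
    weight_goal: Optional[str] = None
    # Contextual constraints
    max_cooking_minutes: Optional[int] = None
    cooking_experience: int = 3  # 1-5 scale
    # Preferences elicited via favorite ingredients
    favorite_ingredients: List[str] = field(default_factory=list)

    def facets(self) -> Dict[str, Dict[str, object]]:
        """Group the raw answers into facets that later modules can query."""
        return {
            "demographics": {"age": self.age, "gender": self.gender},
            "health": {
                "bmi": self.bmi,
                "lifestyle_healthiness": self.lifestyle_healthiness,
                "stressed": self.stressed,
            },
            "affect": {"mood": self.mood},
            "goals": {"weight_goal": self.weight_goal},
            "constraints": {
                "max_cooking_minutes": self.max_cooking_minutes,
                "cooking_experience": self.cooking_experience,
            },
            "preferences": {"favorite_ingredients": list(self.favorite_ingredients)},
        }
```

Grouping by facet, rather than keeping a flat list of answers, lets each justification strategy query only the facet it needs (e.g., goals for Food Goals, constraints for User Skills).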

Fig. 1

Schematic workflow to generate natural language justifications, based on user and recipe features and food knowledge, to be incorporated in food recommendations

Table 1 User characteristics obtained by the Profiler module in our natural language justification workflow

In a similar vein, the Recipe Analyzer extracts the main food features of the recommended recipes (e.g., ingredients, nutrients). These include the nutritional content of food, expressed in nutrients (e.g., fats, fibers, proteins), calorie content, and a Food Standards Agency (FSA) recipe health score. The FSA score is an aggregate health score that captures the nutritional content of a recipe, based on the serving weight and the weight per 100 g of four nutrients: sugar, fat, saturated fat, and salt (Howard et al. 2012; Trattner and Elsweiler 2017; Starke et al. 2021). In addition, the Recipe Analyzer extracts contextual features of the recipes, such as cooking time and preparation difficulty. All these data were crawled from online sources (e.g., recipe websites, such as GialloZafferano (Footnote 1)) and publicly available knowledge bases.
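As an illustration of how such an aggregate score can be computed, the sketch below assumes the common operationalization in which each of the four nutrients receives a UK FSA front-of-pack traffic light (green = 1, amber = 2, red = 3) and the points are summed, giving a range of 4 (healthiest) to 12 (least healthy). The threshold values and the per-portion rule for servings above 100 g are our assumptions based on the FSA labelling guidance for foods, not details stated in this paper; the function names are hypothetical.

```python
from typing import Dict, Optional

# Assumed UK FSA front-of-pack thresholds for foods, in grams:
# (green if per 100 g <= this, red if per 100 g > this, red if per portion > this)
FSA_THRESHOLDS = {
    "fat": (3.0, 17.5, 21.0),
    "saturated_fat": (1.5, 5.0, 6.0),
    "sugar": (5.0, 22.5, 27.0),
    "salt": (0.3, 1.5, 1.8),
}


def traffic_light(nutrient: str, per_100g: float, per_portion: Optional[float] = None) -> int:
    """Return 1 (green), 2 (amber), or 3 (red) for a single nutrient."""
    green_max, red_min_100g, red_min_portion = FSA_THRESHOLDS[nutrient]
    if per_100g > red_min_100g or (per_portion is not None and per_portion > red_min_portion):
        return 3
    if per_100g <= green_max:
        return 1
    return 2


def fsa_score(per_100g: Dict[str, float], serving_weight_g: Optional[float] = None) -> int:
    """Sum traffic-light points over the four nutrients: 4 (healthiest) to 12."""
    score = 0
    for nutrient in FSA_THRESHOLDS:
        grams = per_100g[nutrient]
        # Assumption: the per-portion rule only applies to servings above 100 g.
        portion = None
        if serving_weight_g is not None and serving_weight_g > 100:
            portion = grams * serving_weight_g / 100
        score += traffic_light(nutrient, grams, portion)
    return score
```

For a hypothetical recipe with, per 100 g, 10 g fat, 6 g saturated fat, 3 g sugar, and 0.5 g salt, the per-100 g lights are amber, red, green, and amber (score 8); with a 300 g serving, the 30 g of fat per portion turns the fat light red as well (score 9).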

Finally, the Generator outputs the justification, drawing also on knowledge about health-related food risks and benefits. It supports eight different justification strategies, each emphasizing different recipe characteristics or user features. The generation process follows the principles of natural language generation systems (Reiter and Dale 2000) and is fully automated, requiring no human intervention.

On this basis, our framework generates its output in one of two justification styles: single or comparative. In both cases, it takes two different recipes as input. In the single style, each recipe is processed separately and receives its own justification. In the comparative style, the framework automatically generates one justification that contrasts the characteristics of the two recipes.
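The difference between the two styles can be sketched with a single feature. In the minimal example below, recipes are plain dictionaries with hypothetical `name` and `calories` keys, and calorie content stands in for whatever features a given strategy actually uses; the sentence templates are illustrative, not the framework’s own.

```python
from typing import Dict, List


def single_style(a: Dict, b: Dict) -> List[str]:
    """Single style: each recipe is processed separately and gets its own text."""
    return [f"{r['name']} contains {r['calories']} kcal per serving." for r in (a, b)]


def comparative_style(a: Dict, b: Dict) -> str:
    """Comparative style: one sentence contrasting the two recipes."""
    if a["calories"] == b["calories"]:
        return f"{a['name']} and {b['name']} contain the same number of calories."
    lower, higher = (a, b) if a["calories"] < b["calories"] else (b, a)
    diff = higher["calories"] - lower["calories"]
    return f"{lower['name']} contains {diff} fewer kcal per serving than {higher['name']}."
```

With `a = {"name": "Recipe A", "calories": 420}` and `b = {"name": "Recipe B", "calories": 610}`, the single style yields two independent sentences, whereas the comparative style yields "Recipe A contains 190 fewer kcal per serving than Recipe B."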

To generate justifications, the Generator module also relies on general food knowledge. It uses a food knowledge base that comprises facts related to the daily intake of nutrients, as well as the benefits and risks of food consumption. This knowledge is based on general guidelines concerning food consumption, such as government publications, academic studies, and commonsense knowledge. In particular, around ten facts are encoded for each of the nutrients (sugar, carbohydrates, fats, proteins, and fibers). Examples include “Consuming too much sugar increases the risk of diabetes,” “High protein intake improves muscle development,” and “High sodium intake increases blood pressure.” In total, we have encoded around 150 facts in our knowledge base, which are used in several justification strategies.
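One simple way to organize such a knowledge base is a nested mapping from nutrient to polarity (risk or benefit) to a list of facts. The sketch below stores only the three example facts quoted above; the `FOOD_KNOWLEDGE` structure and the `retrieve_fact` helper are hypothetical and do not reflect how the authors actually store their roughly 150 facts.

```python
import random
from typing import Dict, List, Optional

# Hypothetical excerpt of the food knowledge base: nutrient -> polarity -> facts.
FOOD_KNOWLEDGE: Dict[str, Dict[str, List[str]]] = {
    "sugar": {"risk": ["Consuming too much sugar increases the risk of diabetes."], "benefit": []},
    "proteins": {"risk": [], "benefit": ["High protein intake improves muscle development."]},
    "sodium": {"risk": ["High sodium intake increases blood pressure."], "benefit": []},
}


def retrieve_fact(nutrient: str, polarity: str, rng: Optional[random.Random] = None) -> Optional[str]:
    """Return one stored fact for a nutrient ("risk" or "benefit"), or None if absent."""
    facts = FOOD_KNOWLEDGE.get(nutrient, {}).get(polarity, [])
    if not facts:
        return None
    # Pick randomly among matching facts; pass a seeded Random for reproducibility.
    chooser = rng if rng is not None else random
    return chooser.choice(facts)
```

Returning `None` when no fact matches lets a strategy fall back to a plain nutrient comparison instead of producing an empty clause.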

3.2 Overview of the justification strategies

We defined eight different justification strategies. These are outlined in Table 2 along with the relevant characteristics and features. To define and select the justification strategies, we used two criteria:

(1) The set of justification strategies should mainly elicit (i) the central route of persuasion, i.e., encourage the user to reflect on her food choices by thinking rationally about the information provided; (ii) both the central route and the peripheral route, i.e., also rely on cues aimed at activating non-conscious processes; or, to a lesser extent, (iii) the peripheral route of persuasion. In this way, we could compare different forms of persuasion and attempt to understand their effectiveness. We prioritized the central route because we mainly embrace a cognitively oriented healthy eating nudge approach, which encourages users to reflect on their food choices. However, also defining strategies that leverage the peripheral route could give us insights into how a justification, which in principle acts on the conscious level of persuasion by providing information on the target behavior, could be combined with “nudges” that elicit non-conscious processes.

(2) The formulation of the individual justification strategies should target specific factors that, either consciously (via the central route) or non-consciously (via the peripheral route), may affect behavior change, as identified by behavior change theory. To this end, we relied on five widely accepted theories of behavior change: the Health Belief Model (HBM) (Rosenstock 1974; Taylor et al. 2007) and the Theory of Planned Behavior (TPB) (Ajzen 1991), which emphasize the role of attitudes and beliefs in driving human actions; goal-setting theory, which shows that people make decisions and take action in line with their set goals (Locke and Latham 2002); social cognitive theory (SCT) (Bandura 1986), which posits that behavior is affected by, e.g., efficacy expectations (or self-efficacy) and the behavior of others; and the Transtheoretical Model of behavior change (TTM) (Prochaska and Velicer 1997), which describes change as a six-stage process through which an individual progresses. We chose these frameworks because they are the most widely used theoretical frameworks in technology-based interventions for behavior change (Orji and Moffatt 2018; Pinder et al. 2018; Rapp et al. 2019; DiSalvo et al. 2010; Stowell et al. 2018).

The strategies exploit different information sources and follow a pre-set structure that is filled in dynamically, based on the workflow components depicted in Fig. 1. The Generator concatenates the text outputs derived from the Profiler, the Recipe Analyzer, and the food knowledge base using adverbs and conjunctions.
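The filling of a pre-set structure can be illustrated with a toy sentence frame. In the sketch below, the hypothetical `realize` function receives one clause per component and joins them with fixed connectives; the specific connectives (“and”, “especially since”) are our illustrative choice, not the Generator’s actual vocabulary.

```python
from typing import Optional


def realize(recipe_clause: str, knowledge_clause: Optional[str] = None,
            user_clause: Optional[str] = None) -> str:
    """Fill a fixed sentence frame with component outputs joined by connectives.

    recipe_clause: text derived from the Recipe Analyzer
    knowledge_clause: text derived from the food knowledge base
    user_clause: text derived from the Profiler
    """
    sentence = recipe_clause
    if knowledge_clause:
        sentence += ", and " + knowledge_clause
    if user_clause:
        sentence += ", especially since " + user_clause
    # Capitalize the first character and close the sentence.
    return sentence[0].upper() + sentence[1:] + "."
```

For example, `realize("Recipe A contains less sugar than Recipe B", "consuming too much sugar increases the risk of diabetes", "you reported wanting to lose weight")` yields one sentence combining all three sources. Optional slots let strategies that use fewer components (e.g., Description) reuse the same frame.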

Table 2 Overview of the eight comparative justification strategies used in our experimental evaluation

While most justification strategies in Table 2 put emphasis on health, some variety is included. The Description strategy contrasts both recipes neutrally, providing context on a recipe’s origin. The Popularity strategy is based on social cognitive theory, which highlights that people may imitate the behavior of others and make choices that appear to be popular in order to be accepted by others (Bandura 1986). The strategy contrasts each recipe’s popularity score on the food community platform GialloZafferano (Footnote 2), where the recipes were initially uploaded. This strategy favors the popularity-based recommendation over the healthy recommendation, in part encouraging peripheral processes of persuasion by creating a majority or bandwagon effect (Elsweiler et al. 2017), where the pressure of “peers” may act non-consciously, as also employed by Starke et al. (2020) and Zhu et al. (2012).

The strategies related to the recipe’s Food Features and the user’s Food Goals support central route persuasion processes.

The Food Features strategy is based on the TTM, which posits that consciousness raising, that is, increasing knowledge about aspects related to the behavior to be changed, may encourage people to progress toward behavior change (Prochaska and Velicer 1997). The strategy informs users about specific nutrients of both recipes, aiming to counter poor nutrient intake and low levels of food knowledge (Ilich et al. 1999; Wardle et al. 2000), using a neutral lexicalization of the characteristics, such as “X contains more proteins and fats than Y, but fewer carbohydrates.”
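The neutral lexicalization quoted above can be reproduced with a short comparison over nutrient amounts. This is a minimal sketch, assuming both recipes are given as dictionaries with the same nutrient keys; `join_words` and `food_features` are hypothetical helper names.

```python
from typing import Dict, List


def join_words(words: List[str]) -> str:
    """Render ['a'], ['a', 'b'], ['a', 'b', 'c'] as 'a', 'a and b', 'a, b, and c'."""
    if len(words) <= 2:
        return " and ".join(words)
    return ", ".join(words[:-1]) + ", and " + words[-1]


def food_features(x_name: str, x: Dict[str, float], y_name: str, y: Dict[str, float]) -> str:
    """Neutral comparative lexicalization of nutrient amounts (same keys assumed)."""
    more = [n for n in x if x[n] > y[n]]
    fewer = [n for n in x if x[n] < y[n]]
    if more and fewer:
        return f"{x_name} contains more {join_words(more)} than {y_name}, but fewer {join_words(fewer)}."
    if more:
        return f"{x_name} contains more {join_words(more)} than {y_name}."
    if fewer:
        return f"{x_name} contains fewer {join_words(fewer)} than {y_name}."
    return f"{x_name} and {y_name} contain the same amounts of these nutrients."
```

Because no nutrient is framed as good or bad, the output stays neutral; the health framing is left to the Health Benefits and Health Risks strategies.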

The Food Goals strategy relies on goal-setting theory (Locke and Latham 2002), which shows that people make decisions and take action in line with their set goals; reminding people of these goals is particularly effective if the goals are important to them and are self-set rather than assigned (Munson and Consolvo 2012). Accordingly, Table 2 shows how the nutritional features of each recipe are linked to a user’s self-set goals by contrasting them (“X has more calories than Y”) and highlighting the recipe with fewer calories if the user pursues a weight loss goal.
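A minimal sketch of this goal linkage follows. It implements only the behavior described above (contrast calories, and point to the lighter recipe for users with a weight loss goal); the function name and the goal value `"lose"` are hypothetical, and users with other goals simply receive the neutral contrast.

```python
from typing import Optional


def food_goals(x_name: str, x_kcal: float, y_name: str, y_kcal: float,
               weight_goal: Optional[str] = None) -> str:
    """Contrast calories and, for weight-loss users, point to the lighter recipe."""
    if x_kcal == y_kcal:
        return f"{x_name} and {y_name} have the same number of calories."
    more, fewer = (x_name, y_name) if x_kcal > y_kcal else (y_name, x_name)
    text = f"{more} has more calories than {fewer}"
    if weight_goal == "lose":
        text += f", so {fewer} fits your goal of losing weight better"
    return text + "."
```

For instance, `food_goals("Recipe A", 650, "Recipe B", 420, "lose")` contrasts the two recipes and then names Recipe B as the better fit for the user’s self-set goal.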

Two other justification strategies are based on the HBM and aim to highlight Health Benefits and Health Risks. The HBM points out that health-related behaviors and choices are affected by (i) the perceived susceptibility to illness or health problems and the perceived severity of the consequences associated with that state or condition, and (ii) the perceived benefits of a health behavior (Rosenstock 1974; Taylor et al. 2007). Both justification strategies link nutrient intake information to health benefits or risks in three steps: (i) macronutrient selection, (ii) retrieval of nutrient-specific food knowledge highlighting either health benefits or risks, and (iii) connection of relevant user characteristics to the nutrient-specific knowledge. For example, if the user reported being overweight, the justification could highlight a risk related to heart disease. Both comparative strategies contrast the nutrient levels of the two recipes in two separate sentences, each linking food characteristics to health benefits or risks, aiming to elicit emotional as well as reflective responses and thus to activate both central and peripheral route processes.
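The three steps can be sketched for the Health Risks case as follows. This is a simplified, hypothetical illustration: step (i) is operationalized as choosing the nutrient with the largest relative difference between the two recipes, step (ii) looks up a stored risk statement, and step (iii) appends a clause when the user reported being overweight. The `RISK_FACTS` entries are placeholder knowledge-base entries, the selection rule is our assumption, the output is one sentence rather than the two described above, and the two recipes are assumed to differ on the selected nutrient.

```python
from typing import Dict

# Hypothetical nutrient-specific risk statements used in step (ii).
RISK_FACTS = {
    "fats": "a high fat intake can increase the risk of heart disease",
    "sugar": "consuming too much sugar increases the risk of diabetes",
}


def select_nutrient(x: Dict[str, float], y: Dict[str, float]) -> str:
    """Step (i): pick the nutrient with the largest relative difference."""
    def rel_diff(n: str) -> float:
        return abs(x[n] - y[n]) / max(x[n], y[n], 1e-9)
    return max((n for n in x if n in RISK_FACTS), key=rel_diff)


def health_risks(x_name: str, x: Dict[str, float], y_name: str, y: Dict[str, float],
                 overweight: bool) -> str:
    nutrient = select_nutrient(x, y)                       # step (i)
    fact = RISK_FACTS[nutrient]                            # step (ii)
    richer = x_name if x[nutrient] >= y[nutrient] else y_name
    text = f"{richer} contains more {nutrient}; {fact}"
    if overweight:                                         # step (iii)
        text += ", which is especially relevant given your reported weight"
    return text + "."
```

A Health Benefits variant would follow the same three steps, selecting a nutrient the healthier recipe is richer in (e.g., fibers) and retrieving a benefit statement instead.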

The two final justification strategies are based on the user’s self-reported lifestyle and skills and are aimed at eliciting the central route.

The User Lifestyle strategy relies on the Theory of Planned Behavior, which states that human behavior is a consequence of one’s behavioral intention, which is in turn explained by, e.g., one’s attitude and subjective norm (Jun et al. 2014). Attitudes may in turn be affected by values (Ajzen and Fishbein 2005; Ateş 2020). While a value may be defined as a desirable and fundamental standard that guides people’s actions (Jun et al. 2014), health value is “the degree to which individuals value their health” (Tudoran et al. 2009). In the food domain, it has been shown that people’s perceived health values positively affect their choices of and actions toward low-fat or low-calorie menu items (Jun et al. 2014). The value-attitude-behavior model explains that both values and attitudes affect behavior (Jun et al. 2014; Tudoran et al. 2009). Accordingly, the strategy connects the comparative nutritional evaluation of both recipes (in the form of an FSA health score (Trattner and Elsweiler 2017)) to a user’s personal values, such as the importance of maintaining a healthy lifestyle.
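The value linkage can be sketched as follows, assuming that a lower FSA score indicates a healthier recipe and that the importance of a healthy lifestyle was rated on the 5-point scale described in Sect. 4.1.4. The cut-off of 4 for treating a healthy lifestyle as “important” and the function name are hypothetical choices for illustration.

```python
def user_lifestyle(x_name: str, x_fsa: int, y_name: str, y_fsa: int,
                   lifestyle_importance: int) -> str:
    """Link an FSA comparison (lower = healthier) to a 1-5 lifestyle-importance rating."""
    if x_fsa == y_fsa:
        return f"{x_name} and {y_name} are equally healthy according to their FSA scores."
    healthier, other = (x_name, y_name) if x_fsa < y_fsa else (y_name, x_name)
    text = f"{healthier} is healthier than {other} according to its FSA score"
    if lifestyle_importance >= 4:  # hypothetical cut-off for "important"
        text += f", so {healthier} matches the importance you place on a healthy lifestyle"
    return text + "."
```

Only the value-related clause depends on the user; the FSA comparison itself is identical for all users, which keeps the strategy decoupled from the recommender.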

In a similar vein, the User Skills strategy is grounded in social cognitive theory and, in particular, in the construct of self-efficacy, which captures the belief in one’s capabilities to execute a course of action (Bandura 1997). People who report higher levels of self-efficacy tend to undertake more difficult tasks (Elsweiler et al. 2017; Bandura 1986), because they are more confident that they will execute them successfully; conversely, people with low self-efficacy may select less difficult activities and give up on difficult tasks (Zimmerman 2000; Zulkosky 2009; Schunk 1996). Bandura (1986) hypothesized that self-efficacy affects the choice of activities, effort, and persistence. In our study, we link the user’s self-reported cooking experience to each recipe’s “level of difficulty.”
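This linkage can be sketched by mapping difficulty labels onto an ordinal scale and conditioning the wording on the user’s 5-point cooking experience rating. The difficulty labels, their ordering, and the cut-off of 2 for “limited” experience are hypothetical assumptions for illustration.

```python
# Hypothetical ordinal scale for the recipes' difficulty labels.
DIFFICULTY = {"very easy": 1, "easy": 2, "medium": 3, "hard": 4}


def user_skills(x_name: str, x_level: str, y_name: str, y_level: str,
                cooking_experience: int) -> str:
    """Link self-reported cooking experience (1-5) to recipe difficulty."""
    dx, dy = DIFFICULTY[x_level], DIFFICULTY[y_level]
    if dx == dy:
        return f"{x_name} and {y_name} have the same difficulty level ({x_level})."
    easier, harder = (x_name, y_name) if dx < dy else (y_name, x_name)
    base = f"{easier} is easier to prepare than {harder}"
    if cooking_experience <= 2:  # hypothetical cut-off for low experience
        return base + ", which suits your limited cooking experience."
    return base + ", although your cooking experience should allow you to prepare either."
```

Framing the easier recipe as suitable for low-experience users reflects the self-efficacy argument above: such users are expected to avoid tasks they feel unable to complete.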

3.3 Food recommendation algorithms and dataset

For our experimental evaluation, which spans two studies, we use two personalized food recommendation algorithms to retrieve recipes. The first algorithm optimizes for a recipe’s healthiness and is referred to as the Healthy or health-aware algorithm. Healthy recipes are retrieved based on a variety of user characteristics, such as food goals and dietary constraints (Musto et al. 2020). The second algorithm retrieves popular recipes based on the website ratings stored in the dataset, and is thus referred to as the Popular algorithm. Since our natural language justification framework is decoupled from both algorithms, we consider them independent parameters in our experimental manipulation.
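The sketch below gives simplified stand-ins for the two recommenders; it is not the algorithm of Musto et al. (2020). The Popular stand-in picks the recipe of the requested category with the highest average rating, breaking ties by rating count. The Healthy stand-in keeps recipes whose binary tags satisfy the user’s dietary constraints and picks the lowest (healthiest) FSA score, breaking ties by calories for users with a weight loss goal. All dictionary keys and the tie-breaking rules are hypothetical.

```python
from typing import Dict, List, Optional, Set


def satisfies(recipe: Dict, required_tags: Set[str]) -> bool:
    """True if the recipe carries every required binary tag (e.g., 'vegetarian')."""
    return required_tags <= set(recipe["tags"])


def popular_recommendation(recipes: List[Dict], category: str) -> Dict:
    """Highest average website rating; rating count breaks ties."""
    pool = [r for r in recipes if r["category"] == category]
    return max(pool, key=lambda r: (r["avg_rating"], r["rating_count"]))


def healthy_recommendation(recipes: List[Dict], category: str, required_tags: Set[str],
                           weight_goal: Optional[str] = None) -> Dict:
    """Lowest FSA score among recipes meeting the constraints."""
    pool = [r for r in recipes if r["category"] == category and satisfies(r, required_tags)]
    if weight_goal == "lose":
        return min(pool, key=lambda r: (r["fsa_score"], r["calories"]))
    return min(pool, key=lambda r: r["fsa_score"])
```

Because both functions only return recipes, the justification framework can be applied to their outputs without knowing how either was produced, mirroring the black-box decoupling described in Sect. 2.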

The recipes used for our natural language justification framework were sampled from a database of 4,671 Mediterranean-style recipes. The dataset is available online, along with processing scripts (Footnote 3). The recipes were obtained from the popular food community platform GialloZafferano and translated into English. Each recipe contains its name, category, preparation difficulty, ingredients, (macro-)nutrients, calories, rating count, and average website rating, as well as several binary tags, such as vegetarian, vegan, lactose-free, and low-nickel.

4 Study 1: Examining the effectiveness of different justification styles

4.1 Method

In Study 1, we examined the merits of our natural language justification framework. We investigated the effectiveness of different justification styles (RQ1) by comparing user choices for either the healthy or the popular recipe recommendation across trials with no justification, a single-style justification, or a comparative justification. We did so across three meal types, using eight different justification strategies throughout, thereby also exploring [RQ2].

4.1.1 Participants

In total, we analyzed a sample of 502 US-based participants (43.8% male) in an experimental evaluation (Footnote 4). They were recruited through Amazon MTurk and were required to have a HIT approval rate of at least 98% and a minimum of 500 approved HITs (Footnote 5). Participants were required to be fluent in English. Most participants were employed (81.3%; 2.6% were students). The largest age group was 30 to 40 years (37.1%), whereas only 15.7% were between 20 and 30 years and 17.9% were between 40 and 50 years. More than 55% of the participants reported a weight loss goal, whereas only 9.1% had a weight gain goal. Most participants completed the tasks in 5 to 10 min. They were paid 0.50 USD.

Fig. 2

The study’s interface for two first-course meals. The recipe displayed on the left is our healthy-algorithm recommendation; the one displayed on the right is generated by a popular algorithm. Depicted within the red box is a justification in a specific style, in this case a “Comparative” User Skills justification; the box is absent in the “No Justification” condition. Users were asked to choose one recipe or neither of them, and to provide reasons for their choice

4.1.2 Procedure

First, participants were asked questions about demographics, health and well-being, dietary restrictions, food preferences, and experience with home cooking, which were needed to model their profile (see Table 1 for an overview of the model’s features). Then, based on the profile built by the Profiler (cf. Fig. 1), three pairs of recommendations were generated (see an example in Fig. 2, where the left recommendation is based on our healthy food recommender, whereas the right one is generated using a popularity-based algorithm). These pairs were presented sequentially: first two first-course meals, then two second courses, and finally two desserts. For each pair, participants were required to choose (i) the left-hand recipe, (ii) the right-hand recipe, or (iii) neither. Participants were not aware which recipe was the healthy recommendation, or whether there was one at all. Participants who chose one of the two recipes were subsequently asked to indicate the reason behind their choice, such as the recipe’s taste, healthiness, or ease of preparation.

4.1.3 Research design

To examine whether healthy recipe choices could be supported with different justification styles (RQ1), we designed three between-subject conditions. Participants were presented with either no justification for the recommended recipes (i.e., the baseline), a justification style focusing on each recipe separately (i.e., “Single Justification”), or a justification style comparing the two recipes (i.e., “Comparative Justification”). Moreover, to explore the merits of different justification strategies (RQ2), the single-style and comparative conditions were further varied across eight within-subject justification strategies (see Table 2). In this way, one participant could be presented with three different single justifications (e.g., Popularity, Food Goals, and Health Risks), another with three different comparative justifications (e.g., User Lifestyle, Food Features, Health Benefits), and a participant in the baseline condition with no justification for any recipe pair. Figure 2 provides an example of a “User Skills” justification, displayed within the red box.
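The mixed design can be sketched as a random assignment procedure: one between-subject style per participant and, for the justified styles, three distinct strategies drawn without replacement from the eight, one per meal. Simple random assignment is our assumption; the paper does not state whether assignment was balanced or counterbalanced. The strategy names follow the text above.

```python
import random
from typing import Dict, List, Optional

STYLES = ["none", "single", "comparative"]
STRATEGIES = ["Description", "Popularity", "Food Features", "Food Goals",
              "Health Benefits", "Health Risks", "User Lifestyle", "User Skills"]
MEALS = ["first course", "second course", "dessert"]


def assign_conditions(rng: random.Random) -> Dict:
    """Draw one between-subject style and, if justified, three distinct strategies."""
    style = rng.choice(STYLES)  # between-subject factor
    if style == "none":
        plan: List[Optional[str]] = [None] * len(MEALS)
    else:
        # Within-subject factor: no strategy is repeated for the same participant.
        plan = rng.sample(STRATEGIES, k=len(MEALS))
    return {"style": style, "strategies": dict(zip(MEALS, plan))}
```

Passing a seeded `random.Random` makes an assignment sequence reproducible, which is convenient when auditing the final distribution of conditions.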

4.1.4 Measures

To address [RQ1], we considered the effect of different justification styles on the percentage of healthy recommendations chosen by participants. To this end, we compared the “No Justification” baseline with each justification style separately, that is, “Single” and “Comparative” justifications, as well as across the different justification strategies listed in Table 2. The effectiveness of each justification style was compared against the no-justification baseline across all dish types and all choices made (i.e., also including choices for the popular recommendation or for neither recipe). To address [RQ2], different justification strategies were compared between the no-justification baseline and the comparative style, as the results showed that the comparative style was the most effective justification style.
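The outcome measure can be sketched as the share of healthy choices per condition, with “neither” counted in the denominator so that the percentage is taken over all choices made. The input format of `(condition, choice)` pairs and the example data are hypothetical and do not reflect the study’s results.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def healthy_share(choices: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Percentage of healthy choices per condition.

    choices: (condition, choice) pairs, choice in {"healthy", "popular", "neither"}.
    The denominator includes every choice, including "neither".
    """
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for condition, choice in choices:
        counts[condition][choice] += 1
    return {c: 100 * cnt["healthy"] / sum(cnt.values()) for c, cnt in counts.items()}
```

With four hypothetical baseline choices (one healthy, two popular, one neither), the baseline share is 25.0%; the same computation per strategy supports the [RQ2] comparison.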

Moreover, to address [RQ3], we examined participants’ motivations for choosing one of the two presented recipes. The participants were required to indicate on 5-point scales to what extent a certain motivation was applicable, as well as to report the reason why they had chosen one of the recipes. Motivation items are depicted in Fig. 2 and were related to a match with the user’s preferences, weight loss or gain goals, healthy eating goals, the recipe’s taste, and the recipe’s ease of preparation. The user preferences item herein related to the overall evaluation of the recommendations, while the other motivations related to specific aspects (e.g., a recipe’s taste).

Finally, we discuss the set of user characteristics that users were asked to disclose. These measures were employed by the Profiler to produce healthy recommendations (see Table 1). Besides obtaining data on food preferences and demographics (i.e., age, gender, BMI), we asked users to report whether they had a food goal (i.e., weight loss, weight gain, or no goal), and to rate the healthiness of their lifestyle and how important such a lifestyle was to them (5-point scales). The participants were also required to rate how frequently (5-point scale) they made healthy food choices, used recipe websites, looked at the nutritional values of food, and engaged in home cooking. Furthermore, the participants were asked about their current levels of sleep, physical activity, and mood (3-point scales), and whether they were depressed or stressed (“yes” or “no”). Lastly, we asked about their food knowledge: they indicated their cooking experience (5-point scale) as well as their cost and time constraints for cooking.

4.1.5 Manipulation check

We checked whether the health-aware recommendations could actually be considered healthier than the popular recommendations. We assessed recipe healthiness through the “WHO Score,” which was first used in a digital recipe context by Howard et al. (2012). It counts how many of seven criteria, covering recommended daily intake levels for six nutrients and calories, a recipe meets, yielding a score between 0 and 7 (Organization 2003). We confirmed that the health-aware recommendations yielded higher WHO scores than the popular recommendations for each meal type: first courses (health-aware: 4.21; popular: 2.30), second courses (health-aware: 2.65; popular: 1.61), and desserts (health-aware: 2.94; popular: 1.66). The only nutrient for which the popular recommendations were slightly healthier than the health-aware ones was sugar: the popular recipes tended to be high in fat and saturated fat but somewhat lower in sugar.
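The WHO Score is a simple count: one point per criterion a recipe meets. A minimal sketch, assuming each criterion can be written as a lower and/or upper bound; `who_score` is a hypothetical helper, and the ranges below are illustrative shares of energy in the spirit of the WHO (2003) population nutrient goals, not the exact thresholds used in this study (which follow Howard et al. 2012). Only five of the seven criteria are shown, so this toy version ranges from 0 to 5.

```python
def who_score(values: dict, criteria: dict) -> int:
    """Count the criteria a recipe satisfies; each met criterion adds one point.

    values:   nutrient -> recipe value (here: share of energy in %)
    criteria: nutrient -> (lower, upper) bound; None means unbounded.
    Bounds are treated as inclusive for simplicity.
    """
    score = 0
    for nutrient, (lower, upper) in criteria.items():
        v = values[nutrient]
        if (lower is None or v >= lower) and (upper is None or v <= upper):
            score += 1
    return score


# Illustrative ranges (% of energy); the remaining criteria of the
# 0-7 score (e.g., calories) are omitted in this sketch.
CRITERIA = {
    "protein": (10, 15),
    "carbohydrate": (55, 75),
    "sugar": (None, 10),
    "fat": (15, 30),
    "saturated_fat": (None, 10),
}

recipe = {"protein": 18, "carbohydrate": 50, "sugar": 8, "fat": 28, "saturated_fat": 9}
print(who_score(recipe, CRITERIA))  # 3
```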

4.2 Results

We examined user choices in three ways.Footnote 6 First, we examined whether presenting any explanation, in either of two styles, affected user preferences for healthy recommendations. Second, we examined preferences for each of our eight justification strategies. Third, we investigated more specifically why users had chosen either healthy or popular recipes.

4.2.1 Single and comparative justification styles (RQ1)

We studied whether participants were more likely to choose healthier recipes if justifications were presented underneath them. We used a one-way ANOVA to examine choices made across all meal types. A Shapiro-Wilk test for normality showed no evidence for non-normality of the dependent variable (\(W=1.00\), \(p=1.00\)).Footnote 7 The healthy recommendation was chosen more often when any justification was presented underneath it (\(M=47.6\%\) of choices, \(\textrm{SD}=0.50\)) than in the “No Justification” baseline (\(M=38.1\%\), \(\textrm{SD}=0.49\)): \(F(1,1504)=12.14\), \(p<0.001\). This suggested that justifications helped to steer user preferences toward the health-aware recommendation.
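With two groups (baseline vs. any justification) and choices coded 1 for the healthy recommendation and 0 otherwise, the one-way ANOVA compares two proportions. A minimal pure-Python sketch of the F statistic on toy data (not the study's data); `one_way_anova_f` is a hypothetical helper, and in practice a library routine such as `scipy.stats.f_oneway` would normally be used.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Toy data: 1 = healthy recommendation chosen, 0 = popular recipe or neither.
baseline = [1, 0, 0, 0, 0, 1, 0, 0]   # 25% healthy
justified = [1, 1, 0, 1, 0, 1, 0, 0]  # 50% healthy
print(one_way_anova_f(baseline, justified))  # (1.0, 1, 14)
```

With two groups, F equals the square of the pooled two-sample t statistic, and the denominator degrees of freedom equal the number of observations minus the number of groups.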

We further differentiated between the effects of presenting “Single” and “Comparative” styles. To do so, we performed a two-way ANOVA with two condition dummies for the “Single” and “Comparative” justification styles. Although users were not more likely to choose the healthy recommendation when presented with a “Single Justification” (\(M=43.0\%\) of choices, \(\textrm{SD}=0.50\), \(p=0.13\)) than in the baseline (\(38.1\%\)), they were more likely to do so when facing a “Comparative Justification” (\(M=51.1\%\), \(\textrm{SD}=0.50\)): \(F(1,1503)=18.24\), \(p<0.001\). This suggested that comparative justifications were particularly effective in supporting users’ choices for the healthy recommendation.
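Condition dummies make the comparison with the baseline explicit: with an intercept and one 0/1 dummy per justification style, a least-squares fit returns the baseline's proportion of healthy choices as the intercept and each style's difference from the baseline as its coefficient. A minimal pure-Python sketch on toy data; `dummy_design` and `ols` are hypothetical helpers, and this linear model with dummies illustrates the coding only, not necessarily the exact ANOVA specification used here.

```python
def dummy_design(conditions, levels=("single", "comparative")):
    """Design matrix: intercept plus one 0/1 dummy per non-baseline level."""
    return [[1.0] + [1.0 if c == lvl else 0.0 for lvl in levels] for c in conditions]


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    a = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


# Toy data: 25% (baseline), 50% (single), 75% (comparative) healthy choices.
conditions = ["none"] * 4 + ["single"] * 4 + ["comparative"] * 4
y = [1, 0, 0, 0] + [1, 1, 0, 0] + [1, 1, 1, 0]
b = ols(dummy_design(conditions), y)
print([round(v, 3) for v in b])  # [0.25, 0.25, 0.5]
```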

Further analyses teased apart these effects by differentiating across the three meal types, in line with previous research indicating that preferences differ across meal types (Musto et al. 2020). Using multiple one-way ANOVAs, we found that depicting any justification increased the number of choices for healthy recommendations for first courses (\(F(1,500)=4.83\), \(p<0.05\)) and desserts (\(F(1,500)=4.43\), \(p<0.05\)), but found no such effect for second course meals (\(F(1,500)=3.03\), \(p=0.08\)).Footnote 8 We further inspected these effects by discerning between “Single” and “Comparative” justification styles per meal type, performing multiple two-way ANOVAs. This revealed that while “Single” justifications did not significantly boost healthy recommendation choices for any dish type (all \(p>0.1\)), “Comparative” justifications did: for first courses (\(F(1,499)=5.37\), \(p<0.05\)), second courses (\(F(1,499)=6.33\), \(p<0.05\)), and desserts (\(F(1,499)=6.61\), \(p<0.05\)). This gave us further evidence that justifications comparing popular and healthy recommendations were more effective than separate justifications per recipe in steering participants’ preferences toward healthy recommendations.

Fig. 3 illustrates the results of these ANOVAs. It depicts recipe choices per meal type (from left to right: first course, second course, dessert), showing the percentage of each chosen option: neither recipe, the popularity-based recommendation, or the health-aware recommendation. For first course meals and desserts, the “Single” justification increased the number of choices for the healthy recommendation only slightly, while “Comparative” justifications increased it much further. For second course meals, there was little difference between “No Justification” and “Single” in terms of choices made, while “Comparative” boosted choices for healthy recommendations.

Fig. 3

Percentages of choices per condition, per meal type. Depicted are choices for neither recipe (in blue), the Popular recipe (in red), and the Healthy recommendation across three meal types. Conditions are the three justification styles: No justification, single justifications, and comparative justifications. Meal types are first course, second course, and dessert.

4.2.2 Effectiveness of justification strategies (RQ2)

The previous subsection highlighted that comparative (pairwise) justifications were the most effective in steering participants’ preferences toward healthy recommendations. Here, we examine how effectively specific justification strategies (cf. Table 2) promote our healthy recommendations.Footnote 9

We examined the effectiveness across all meal types, as well as per meal type. Table 3 outlines four logistic regression analyses, each predicting whether our health-aware recommendation was chosen (as opposed to a popularity-based choice or no recipe chosen). Effects were mixed across the meal types, and the second course and dessert models had the highest pseudo \(R^{2}\) values. However, all significant effects across all models were positive, indicating that the justification strategies in the comparative condition increased, rather than decreased, the likelihood that the healthier recommendation was chosen, relative to the no justification baseline.
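If a logistic model contains only dummies for one categorical factor (here, each strategy against the no justification baseline), the model is saturated and the maximum-likelihood coefficients have a closed form: the intercept is the baseline's log-odds of a healthy choice, and each β is the log odds ratio between that strategy and the baseline. A minimal sketch on hypothetical counts, assuming no further covariates (the Table 3 models may include others) and proportions strictly between 0 and 1; `saturated_logit` is a hypothetical helper.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability 0 < p < 1."""
    return math.log(p / (1 - p))


def saturated_logit(counts: dict, baseline: str = "no justification"):
    """Closed-form ML estimates for a logistic model with one categorical predictor.

    counts: condition -> (number of healthy choices, total number of choices)
    Returns the intercept and one coefficient per non-baseline condition.
    """
    healthy, total = counts[baseline]
    intercept = logit(healthy / total)
    coefs = {c: logit(h / n) - intercept
             for c, (h, n) in counts.items() if c != baseline}
    return intercept, coefs


counts = {"no justification": (40, 100),  # hypothetical counts
          "strategy_a": (60, 100),
          "strategy_b": (40, 100)}
intercept, coefs = saturated_logit(counts)
print(round(intercept, 3), {k: round(v, 3) for k, v in coefs.items()})
```

Once covariates are added, the closed form no longer holds and an iterative fit (e.g., Newton-Raphson) is required.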

Table 3 Four logistic regression models, predicting choices for health-aware recommendations (against no choice or popularity-based choices) in the “Comparative” justification condition, compared to the no explanation baseline

The model across all meal types in Table 3 shows that three justification strategies effectively supported health-aware choices. A comparison of the food features of the two recipes (e.g., Recipe A contains less fat than Recipe B) was related to a higher likelihood of choosing the healthy recommendation compared to the no justification baseline: \(\beta =.86\), \(p<0.001\) (also in the first course model), as was a justification that compared the health risks of both recipes: \(\beta =.98\), \(p<0.001\) (also in the second course and dessert models).

In a similar vein, comparing recipes in terms of their health benefits led users to choose the healthier dessert more often (\(\beta =.84\), \(p<0.05\)), but not for other meal types. Table 3 also shows that comparing recipes in terms of food goals increased the likelihood of choosing the healthy option for first courses (\(\beta =.78\), \(p<0.05\)), but not for second courses and desserts. Finally, a somewhat counterintuitive effect was that a popularity justification strategy, which typically showed that the healthy recipe was less popular than the popularity-based recommendation, increased the likelihood of choosing the healthy recommendation: \(\beta =.59\), \(p<0.05\) (also in the dessert model).
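Logistic regression coefficients are on the log-odds scale, so exp(β) gives the multiplicative change in the odds of choosing the healthy recommendation relative to the no justification baseline. A small sketch converting the significant coefficients reported above into odds ratios; the dictionary labels are ours, and the conversion ignores the uncertainty around each estimate.

```python
import math

# Significant coefficients reported for Table 3 (comparative condition vs. baseline).
reported = {
    "food features (all meal types)": 0.86,
    "health risks (all meal types)": 0.98,
    "health benefits (dessert)": 0.84,
    "food goals (first course)": 0.78,
    "popularity (all meal types)": 0.59,
}

# exp(beta) = odds ratio of a healthy choice versus the baseline.
odds_ratios = {label: math.exp(beta) for label, beta in reported.items()}
for label, ratio in odds_ratios.items():
    print(f"{label}: OR = {ratio:.2f}")
```

For example, exp(0.86) ≈ 2.36: the food features justification is associated with roughly 2.4 times higher odds of a healthy choice than no justification.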

Table 3 also points out which strategies did not affect participants’ preferences between the “Comparative” and “No Explanation” conditions. Neither comparative descriptions of the recipes’ contents (e.g., the ingredients) nor comparisons of whether the recipes matched the participant’s lifestyle affected participants’ preferences, for any meal type. Furthermore, comparative justifications of food goals did not influence dessert choices, whereas highlighting health benefits and risks did not affect first course choices.

4.2.3 Choice motivation (RQ3)

Finally, we investigated why the participants had chosen one of the proposed recipes (RQ3). We performed four logistic regression analyses that compared cases in which either the popular or the healthy recommendation was chosen, excluding cases in which neither recipe was chosen. Table 4 shows a model that includes participants’ choice motivations across all meal types, as well as three meal-specific models. Significant positive effects in Table 4 indicate reasons why the healthy recommendation was chosen, while significant negative effects indicate why a popular recommendation was chosen. The best model fit was observed for the first course model, for which the pseudo \(R^2\) was around twice as high as for the other models.
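The analysis above drops choices for neither recipe and codes the remaining choices as healthy (1) or popular (0); model fit is then summarized with a pseudo R². The section does not specify which pseudo R² was used; the sketch below assumes McFadden's version, 1 − LL(model)/LL(null), and uses hypothetical fitted probabilities in place of an actual fitted model. The helper names are illustrative.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under probabilities p."""
    return sum(math.log(pi) if yi else math.log(1 - pi) for yi, pi in zip(y, p))


def mcfadden_r2(y, p_model):
    """McFadden's pseudo R^2 against an intercept-only (null) model."""
    p_null = sum(y) / len(y)
    ll_null = log_likelihood(y, [p_null] * len(y))
    return 1 - log_likelihood(y, p_model) / ll_null


choices = ["healthy", "popular", "neither", "healthy", "popular"]
fitted = [0.8, 0.3, None, 0.6, 0.4]  # hypothetical model probabilities

# Exclude "neither" choices and code healthy = 1, popular = 0.
kept = [(int(c == "healthy"), p) for c, p in zip(choices, fitted) if c != "neither"]
y = [yi for yi, _ in kept]
p = [pi for _, pi in kept]
print(round(mcfadden_r2(y, p), 3))  # 0.422
```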

Table 4 Four logistic regression models, each predicting user choices for the Healthy Recommendation

We observed mixed evidence for why healthy recommendations were chosen across meal types. Our health-aware recommendations were chosen more often for health-related reasons: a positive effect was found across all meal types (\(\beta =.41\), \(p<0.001\)), as well as for first course meals (\(\beta =.78\), \(p<0.001\)) and desserts (\(\beta =.47\), \(p<0.001\)). In contrast, tastiness was related to popular meal choices: across all meal types (\(\beta =-.47\), \(p<0.001\)), as well as for first course (\(\beta =-.54\), \(p<0.001\)) and second course meals (\(\beta =-.58\), \(p<0.001\)). Furthermore, users who indicated that they chose recipes because these matched their preferences were more likely to choose our health-aware recommendations across all meal types (\(\beta =.13\), \(p<0.05\)), and in particular for second course meals (\(\beta =.52\), \(p<0.001\)). Healthy second course recipes were also chosen more often because of a match with food goals (\(\beta =.21\), \(p<0.05\)). In contrast, ease of preparation was negatively related to choosing healthy first course recommendations (\(\beta =-.26\), \(p<0.01\)), suggesting that users chose popular first course recommendations because they were easier to prepare; no such effects were observed for second course meals and desserts.

4.3 Conclusion

This study explored the effectiveness of different kinds of justifications aimed at explaining health-aware recommendations. With regard to justification styles (RQ1), the results show that participants preferred popular recipes when no explanation was presented, whereas they preferred health-aware recommendations when a justification was paired with the suggestion. Among the justification styles presented, comparative justifications were more effective in encouraging healthy choices than single justifications. This is in line with previous research emphasizing that individuals tend to make comparative judgments rather than combining two independent observations (Köster 2009). Furthermore, we explored the effectiveness of different justification strategies (RQ2), finding that comparing the features of two recipes and their related health risks better promoted healthy food choices. Finally, we showed what drives users’ choices for healthier recommendations (RQ3), and whether these reasons differ per meal type. For most meal types, popularity-based choices were driven by taste motivations, while choices for our health-aware recommendations were tied to health-related reasons.

This said, the contrast between “No Justification” and “Justification” scenarios is usually evaluated in between-subject designs (i.e., A/B tests) or in a within-subject design across multiple, heterogeneous sets (Symeonidis et al. 2008; Tintarev and Masthoff 2012). In contrast, examining changing preferences for the same set of recommendations is uncommon (Ekstrand and Willemsen 2016; Starke 2019), as this is harder to measure. To date, only Zhu et al. (2012) have examined whether a recommender could reverse user choices within a single study through majority-based social explanations (e.g., “108 people prefer this one” vs. “8 people prefer this one”). Users were first presented with pairs of items without any explanation; later in the study, the same pairs were presented again, this time with social explanations. The explanations were presented alongside furniture products, baby photos, and other items from various domains. They found that 14.1% of the users switched toward the item with the majority norm if it was presented quickly after the first trial, while this percentage was higher (22.4%) if there was more time between trials. We follow this approach of preference reversal in Study 2.

5 Study 2: Investigating recipe choices for different justification strategies

For Study 2, we considered a stricter study setup than in Study 1, following the work of Zhu et al. (2012). We examined whether back-to-back trials with and without justifications lead to choice reversal within a recommendation pair. In doing so, we assessed the effectiveness of eight different justification strategies across three different meal types. Note that all relevant processing scripts and datasets are available in our repository: https://osf.io/hn3et/.

5.1 Method

5.1.1 Participants

We invited users from the crowdsourcing platform Amazon Mechanical Turk to participate in a study on recipe recommendations and food enjoyment. Participants were required to be US-based and to have a HIT approval rate of at least 98%, with a minimum of 500 approved HITs,Footnote 10 and were compensated with 0.5 USD. In total, 504 participants (54.7% male) completed our user study, of whom 61.0% were between 20 and 39 years old. Most users were employed (73.6%; 14.9% were students), and 51.1% had a weight loss goal, while only 70 users (13.9%) had a weight gain goal. Participants were recruited throughout the USA and may therefore have had varying levels of familiarity with Italian cuisine and a Mediterranean diet (Lee et al. 2014).

Fig. 4

The study’s interface for two first course meals. The recipe on the left is recommended by our healthy algorithm, the one on the right by a popularity-based algorithm. On the first trial, no justification is given, only a list of ingredients per recipe. Depicted here is the second trial, presenting a pairwise “health benefits” justification underneath both recipes. Users were asked to choose one recipe or neither of them, and to provide reasons why they had chosen either recipe.

5.1.2 Procedure

To provide personalized recipe recommendations, we first asked users to indicate their personal preferences regarding their eating habits and to disclose demographics. These covered the user features that were also used to generate the justification strategies (cf. Table 2), such as a user’s BMI, cooking experience (5-point scale), self-reported health (5-point scale), mood and well-being (3-point scales), as well as their dietary restrictions (e.g., no gluten or lactose) and general food preferences (i.e., input of ingredients a user liked).

Subsequently, we presented six pairs of recipe recommendations, one at a time. The Profiler (cf. Figure 1) generated three recipe pairs based on a user’s responses, each of which was presented twice. These included a pair of Mediterranean-style first course meals (Willett 2006), a pair of second course recipes, and a pair of desserts. Figure 4 shows an example set of first course recommendations, depicting the healthy recommendation on the left and the popularity-based recommendation on the right. Users were asked to choose the recipe they preferred most, or neither of them. In addition, users were required to indicate on 5-point scales to what extent different reasons underlay their choice: a recipe’s ease of preparation, fit with user goals or preferences, health, or taste.

5.1.3 Research design

In line with Zhu et al. (2012), we presented each recipe pair twice to a user. While the first trial was presented with no justification, the second trial presented the same pair of recommendations with a pairwise justification. In doing so, we examine [RQ2], representing the peripheral route of the elaboration likelihood model by a recommendation scenario without justifications. In contrast, decisions that come with a pairwise justification require users to interpret the comparison presented, encouraging them to reflect on the information provided and, thus, eliciting central route processes. Hence, the current study juxtaposes these two scenarios by first asking users to choose a recipe from a pair of recommendations in the absence of any justification and, subsequently, letting them revisit that choice when the same pair is presented again, accompanied by a justification. While the latter should elicit more central-route elaboration, the justifications in the current study are situated at different points of the “peripheral-central continuum,” supporting rational reflection to different degrees and also presenting information that elicits peripheral processes. The justification strategy for each pair was randomly sampled from the eight strategies listed in Table 2.
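The within-subject manipulation can be written as a fixed trial schedule. A minimal sketch, assuming that the two trials of a pair are presented back to back, that meal types follow the fixed order described in the Discussion, and that the strategy for each pair is drawn independently and uniformly; `build_schedule` and the labels `strategy_1` to `strategy_8` are hypothetical placeholders for the strategies in Table 2.

```python
import random

STRATEGIES = [f"strategy_{i}" for i in range(1, 9)]  # placeholders for Table 2
MEAL_TYPES = ["first course", "second course", "dessert"]  # fixed order


def build_schedule(rng: random.Random) -> list:
    """Six trials: each recipe pair is shown without and then with a justification."""
    trials = []
    for meal in MEAL_TYPES:
        strategy = rng.choice(STRATEGIES)  # assumption: independent draw per pair
        trials.append({"meal": meal, "trial": 1, "justification": None})
        trials.append({"meal": meal, "trial": 2, "justification": strategy})
    return trials


for t in build_schedule(random.Random(7)):
    print(t)
```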

5.2 Results

In the following, we examine [RQ2] and [RQ3]. We first report the descriptive statistics of our “No Justification” baseline. Then, we examine how often users switched toward a different recipe when facing any justification strategy, before examining the effect of specific strategies (RQ2) and how different choice motivations related to healthy food choices (RQ3).

5.2.1 Baseline results and users switching to the healthy recommendation

To investigate whether justifications led users to swap their initial choices for the healthier recommendations (related to all research questions), we first examined user choices in the no justification baseline. Figure 5 depicts the distribution of recipe choices per meal type. For first course meals and desserts, the popular recipe was slightly favored, while the healthier recommendation was preferred for second course meals. Since popular recipes were typically preferred in other studies (Trattner and Elsweiler 2017), this suggested that our health-aware recommendation pipeline was personalized well enough that many users already liked its recommendations, even without any justification.

Fig. 5

Distribution of recipes chosen in the no justification baseline (i.e., the first choice made for a recipe pair), per meal type

Fig. 6

Distribution of recipes chosen when a pairwise justification was presented (i.e., a recipe pair’s second choice), per meal type

By comparing Figs. 5 and 6, we examined whether user choices reversed for the same recipe pair after a justification was presented. Paired t-tests showed that users were more likely to switch to the healthier recommendation when any justification was presented alongside first course meals, compared to no justification: \(t(503)=-3.17\), \(p<0.01\). In contrast, we observed no differences in healthy recipe choices for second course meals (\(t(503)=0.24\), \(p=0.81\)) or desserts (\(t(503)=-0.24\), \(p=0.81\)).
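Because each user chose twice from the same pair, the pre- and post-justification choices form paired observations. A minimal sketch of the paired t statistic on toy 0/1 data (1 = healthy recommendation chosen), assuming differences are taken as pre minus post, so that a negative t, as for first course meals, indicates more healthy choices after the justification; `paired_t` is a hypothetical helper.

```python
import math


def paired_t(pre, post):
    """Paired t statistic for the differences pre - post; df = n - 1."""
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n), n - 1


# Toy data for six users; two switch to the healthy recipe after the justification.
pre = [0, 0, 1, 0, 1, 0]
post = [1, 0, 1, 1, 1, 0]
t, df = paired_t(pre, post)
switched = sum(a == 0 and b == 1 for a, b in zip(pre, post))
print(round(t, 3), df, switched)
```

For binary paired outcomes, McNemar's test, which uses only the users whose choice changed, is a common alternative.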

5.2.2 Specific justification strategies (RQ2)

We further investigated which justification strategies led users to reverse their choices toward the healthier recommendation (RQ2). We assessed whether a specific justification strategy increased or decreased the likelihood that a healthy recipe was chosen (i.e., reversed user choices), compared to the no justification baseline in the first trial. To this end, Table 5 reports three random-effects logistic regression models, one per meal type; the second course model is reported but disregarded, because it did not pass the Wald \(\chi ^{2}\) test of model fit.Footnote 11

Table 5 Random-effects logistic regression models (clustered at the user level), capturing different justification strategies that predict whether the healthy recipe is chosen from a recommendation pair

Table 5 shows that different justification strategies affected users’ healthy choices for different meals. For first course meals, four strategies increased the likelihood that a healthy recipe was chosen: a justification that described the features of both recipes (\(\beta =1.69\), \(p<0.05\)), justifications that compared both recipes’ nutrients and linked them to health benefits (\(\beta =2.21\), \(p<0.01\)) or risks (\(\beta =3.25\), \(p<0.01\)), and a justification on how a recipe could contribute to a user’s lifestyle (\(\beta =1.84\), \(p<0.05\)). This suggested that most of the justification strategies that highlighted nutritional aspects of recipes, and possibly linked these to user characteristics, were successful in reversing initial user choices and steering them toward healthier choices for first course meals.

Justifications were less successful in promoting healthy dessert choices. Table 5 shows that the strategies that affected the likelihood of healthy first course choices did not do so for desserts. Instead, justification strategies on the recipes’ health benefits (\(\beta =-2.66\), \(p<0.01\)) and preparation difficulty (i.e., user skills; \(\beta =-1.79\), \(p<0.05\)) decreased the likelihood that a healthy dessert was chosen. Our justification strategies seemed ill-suited to the dessert context, possibly because users had more taste-related reasons for their dessert choices, which we examine next.

5.2.3 Choice motivation (RQ3)

Finally, to contextualize our findings, we examined to what extent a user’s motivation to choose the healthy recommendation changed after being presented with a justification (RQ3). Table 6 describes six logistic regression models: three models that predicted healthy recipe choices before a justification was presented (denoted by \(\beta _{pre}\); one per meal type), and three models for after a justification was presented (denoted by \(\beta _{post}\)). Across all meal types, health-related choice motivations positively affected the likelihood of healthy recipe choices “post-justification,” while “pre-justification” this only applied to first course meals and desserts. This suggested that our health-aware recommendations catered to users who were making health-motivated recipe choices, and that for second course meals a justification was needed for this link to emerge. In contrast, none of the models showed a relation between preference-related, goal-related, or food characteristics-related motivations and healthy recipe choices, indicating that these motivations were not specifically linked to either recommendation.

Table 6 Six logistic regression models predicting healthy recipe choices using different choice motivations

Table 6 further suggests that adding justifications put less emphasis on contextual factors. Whereas motivations related to taste (first course meals and desserts) and ease of preparation (first course meals) decreased the likelihood that a healthy recipe was chosen “pre-justification,” these effects were no longer present “post-justification.” This suggested that the nutritional or health-related emphasis of most of our justifications was successful, arguably making users reflect on their initial food choice and tapping into the more central route of persuasion.

5.3 Conclusion

Study 2 analyzed users’ changing preferences for the same set of recommendations, examining choice reversal in back-to-back trials with and without justification. We provided additional evidence regarding our research questions by evaluating the effectiveness of eight different justification strategies (RQ2), grounded in psychological literature, across three meal types. The results pointed out that pairwise justifications may encourage participants to reverse their choices toward healthier recipe recommendations, away from popular recipes, but that this particularly applied to first course meals. Moreover, we discovered that different kinds of justifications may have different effects for different types of meals. Justification strategies tied to food features, health benefits and risks, and the participant’s lifestyle were most effective for first course meals. For second course meals, however, we found no effect, which might be due to the fact that the healthy second course recommendation was already preferred by a large part of the participants in the pre-justification trial, leaving little room for improvement when introducing pairwise justifications.

With regard to the choice motivation of participants (RQ3), we found further evidence that users who are interested in health were more likely to choose the healthy recipe. Pre-justification, this applied to first course meals and desserts; post-justification, it also applied to second course meals. In addition, other motivations that were present pre-justification, such as ease of preparation and the taste of the recipes, were no longer important after seeing a justification, indicating that the justifications affected what mattered to users when choosing a recipe.

6 Discussion

We examined to what extent natural language justifications in a knowledge-based food recommender system can support healthier recipe choices. We presented two studies in which we predicted healthy recipe choices from the justification style used (Study 1; RQ1), the justification strategy used (Study 1, but mostly Study 2; RQ2), and a user’s choice motivation (both studies; RQ3). The effectiveness of eight different justification strategies, grounded in psychological literature, was evaluated across three meal types. In doing so, Study 2 employed a research design with a stricter baseline, examining choice reversal in back-to-back trials with and without justification; we are among the first to do so in recommender system research (Zhu et al. 2012) and the first in food recommender research (Trattner and Elsweiler 2019; Musto et al. 2020).

The overall contribution of this paper is twofold. First, we present a recommendation approach that captures a user’s eating preferences. In contrast with most earlier work (Elsweiler et al. 2022; Freyne and Berkovsky 2010; Trattner and Elsweiler 2019), we have not focused on recipes that users liked in the past, but have considered a user’s general eating preferences, affect, self-reported skills, and domain knowledge. This has resulted in a recommendation pipeline that presents personalized, yet healthier recommendations. Second, we have presented an approach to generate natural language justifications for food recommendations. While the NLP pipeline is a contribution in its own right, particularly in a food recommender system, we have also validated its effectiveness through a user study, showing which types of justifications are most effective in promoting our health-aware recommendations. Whereas popular recipes are preferred by most users if no explanation is presented (our “baseline”), we have shown that most users prefer our health-aware recommendations over a challenging popularity-based baseline when both recommendations are presented along with a comparative justification.

Our results indicate that pairwise justifications can help to reverse and steer user preferences toward healthier recipe recommendations, away from the commonly preferred popular recipes. However, different types of justifications might be effective for different types of meals. Justifications led to the most preference reversals in first course meal choices, for which strategies related to food features, health benefits and risks, and a user’s lifestyle were most effective. In terms of persuasiveness, we expect these strategies to have appealed to different parts of the peripheral-central continuum of the elaboration likelihood model (Petty et al. 1997), since the Health Benefits and Health Risks justifications elicit both emotional and reflective responses (Rosenstock 1974; Taylor et al. 2007), while Food Features and User Lifestyle mainly require longer-term contemplation. The effectiveness of the justifications is also reflected in users’ reported choice motivations: whereas ease of preparation and taste-related reasons negatively affected “pre-justification” healthy choices, we found only health-related reasons to affect healthy recipe choices “post-justification.”

The lack of any effect of our justifications on second course meal choices could be attributed to the relatively high proportion of choices for the healthy recommendation in the baseline. Since these recommendations were preferred by a large proportion of users in the pre-justification trial, there was little room for improvement by introducing pairwise justifications.

Furthermore, we find that dessert choices are mostly taste-related, which undermines the effectiveness of most health- and nutrition-related justifications. Nonetheless, our analysis of choice motivations suggests that the justifications put more emphasis on health, as taste-related motivations decreased post-justification. We expect that justifications will mostly resonate with users who have strict dietary restrictions or ambitious healthy eating goals.

A limiting factor to our study’s design was that the same order of meal types was maintained across all participants, starting with first course meals and ending with desserts. It is possible that users facing their second or third pair of recipes were less likely to change their preferences when facing a justification for those meal types. Alternatively, users might have already opted for the healthier choice in the first place (e.g., for the second course meals), because the justification for the first course meal activated reflective cognitive processes (Petty et al. 1997), which could have spilled over into later trials. In that sense, the results for the first course meals are likely to be more representative than those for second course meals, as this meal type is also less familiar to non-Italian natives.

The extent to which users are familiar with Italian cuisine was not measured in our studies. It is possible that their evaluation of Italian-style recipes differs from their evaluation of, for example, American-style recipes, due to differences in dietary intake styles (Willett 2006). Italian recipes could fall under a Mediterranean diet, which is characterized, among other things, by a high intake of fruits, vegetables, whole grains, legumes, and nuts, and a much more moderate intake of red meat and dairy products than a North American diet (Trichopoulou et al. 2014). While all participants in both studies were based in the USA and were required to be fluent in English, their cultural and ethnic backgrounds are not known, nor is their knowledge of various cuisines. Regional differences exist in the USA regarding the prominence of Italian cuisine (Lee et al. 2014), partly due to large-scale immigration from Italy around the turn of the 20th century (Levenstein 1985). While the implications of a match between American users and Italian cuisine cannot be inferred from our results, it is clear that many Americans are familiar with Italian-style meals (Lee et al. 2014). Moreover, general attitudes toward Italian products are rather positive (Bonaiuto et al. 2021), which might have increased user favorability toward any Italian recipe. Follow-up studies could control for this match between participants and cuisine.

Another limitation concerns the extent to which the recommendations fit into one's overall diet. While shifting toward a healthier dinner meal can go a long way in terms of improving dietary intake (Dallacker et al. 2018; Neumark-Sztainer et al. 2014), it does not capture one's eating habits throughout the rest of the day. Similarly, longer-term preferences were considered only minimally. In our approach, we assumed that one's preferences as elicited in our knowledge-based system apply to the current session and beyond, following the session-based approach of previous recipe recommenders (Elahi et al. 2015; Starke and Trattner 2021). While this was appropriate for addressing our research questions regarding justifications (RQ1-RQ2), future research is required to examine whether such an intervention leads to longer-term changes.

We recommend that follow-up studies explore the effectiveness of different justification strategies in a less controlled environment. Whereas the research design of the current study is suitable for isolating specific effects, most food choices are not made between pairs of recipes, but rather in the context of larger lists, such as "more like this" recommendations on recipe websites or multi-list food recommender interfaces (Starke and Trattner 2021).

With regard to specific justification styles, we find that comparative approaches are more effective in promoting choices for health-aware recommendations than single justifications. This ties in with research showing that people are much better at making comparative judgments than at combining two "singular" observations (Bettman et al. 1998), which is reflected in the effectiveness of our "Comparative" justification style over the "Single" style. The evidence is convincing, since we observed this effect across different meal types, including desserts, for which food choices tend to be driven more by taste than by health (Musto et al. 2020). Moreover, we also examined the effectiveness of specific justification strategies, which suggests that presenting a comparison of each recipe's features and health risks caters to a user's healthy food preferences. The sophistication of these strategies may have contributed to their effectiveness, as they link and compare different aspects, namely user characteristics, recipe features, and food goals. Although the large number of comparisons for specific justification styles may have inflated the false positive rate, the overall results indicate that every explanation strategy either promotes healthy food choices (even the popularity-based strategy) or has no net effect.
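The concern about false positives across many pairwise comparisons can be made concrete with a standard family-wise error correction. The following is a minimal sketch of the Holm-Bonferroni step-down procedure, offered only as an illustration; it is not the analysis reported in this study, and the function name `holm_bonferroni` and the p-values in the usage line are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm-Bonferroni step-down correction (illustrative sketch).

    Returns a list of booleans in the original order of `p_values`:
    True where the null hypothesis is rejected at family-wise level `alpha`.
    """
    m = len(p_values)
    # Visit hypotheses from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (0-based rank) is compared with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            # Step-down rule: once one test fails, all larger p-values are retained.
            break
    return reject


# Hypothetical p-values for four justification-strategy comparisons.
print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))  # [True, False, False, True]
```

Compared with a plain Bonferroni correction, which would test every p-value against alpha / m, the step-down thresholds become less strict after each rejection, so the procedure retains more power while still controlling the family-wise error rate.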

We also examined what drives users to choose healthier recommendations, and whether this differs per meal type. For most meal types, we found evidence that popularity-based choices are related to taste motivations, while choices for our health-aware recommendations are linked to health-related reasons. This confirms that our health-aware recommendation pipeline caters to users with healthy eating goals, which is promising for future applications that seek to support such users. Moreover, "because it fits my preferences" was also reported as a reason for choosing the healthy recommendation across all meal types, suggesting that our approach can generate food recommendations that are both satisfactory and healthy, a combination rarely achieved by food recommender systems to date (Elsweiler et al. 2022; Trattner and Elsweiler 2019).

An interesting avenue for future research is to test whether these insights generalize to a practical application in which a recommendation list contains more than two recipes (Starke et al. 2021). We will also introduce justifications that combine several user-focused aspects, such as food taste and goals, to assess whether these can persuade a user to choose the healthier recommendation. Furthermore, we will investigate whether such natural language justifications can be personalized further, and whether this would increase their effectiveness. For example, presenting justification styles that address healthy eating goals makes more sense if a user has indicated having such a goal. While the current user study did so by inquiring about the user's preferences on the first screen, such questions would only need to be asked when a user's profile is created, for instance on a recipe website.

Finally, we wish to emphasize that this study can serve as a blueprint for future studies on healthy food recommendation. We have shown that our algorithm successfully generates healthy recommendations, as users who chose them reported health-related reasons for their choice. We have also shown how such recommendations should be presented to support healthy food choices. Such a combination of a knowledge-aware algorithm and UI design should pave the way for even more sophisticated applications in food recommendation, as well as for applications in other behavioral recommendation domains. In addition, future work should extend the number of inputs in the recommender framework by considering and evaluating a larger, more comprehensive set of algorithms.