1 Introduction

Food waste is acknowledged as one of the major barriers to sustainable food systems in terms of environmental impact, food safety, and distribution in a world with a growing population [1,2,3,4]. In the EU alone, nearly 88 million tonnes of food are wasted per year. Private households account for the majority (53%) of this food waste [5]. One reason for food waste, as pointed out by Hebrok et al. [6], is consumers’ insecurity about interpreting date labels and assessing the state of food. To make the right assumptions and decisions, consumers require “food literacy,” which can be understood as the competent application of (embodied) food knowledge [7]. Such knowledge grows out of practical, sensory-based interaction with the world and results in “trusting your guts,” which goes beyond simply acknowledging institutionalized forms of knowledge, for instance, formally written rules in cookbooks [6, 8]. As modern consumers increasingly lack food literacy, decisions about food safety become challenging and lead consumers to throw away more food than would be necessary from a safety perspective [6, 9, 10].

While kitchens have become increasingly smart and are equipped with a variety of appliances, from smart ovens that clean themselves to everyday helpers like the Thermomix [11], they are not capable of preventing food waste. Some HCI approaches [12], like bin or fridge cams [13,14,15], attempt to create awareness of the household’s food waste behavior but do not yet address the source of insecurity about food safety. Besides, research in Human-Food Interaction (HFI) indicates that automation-driven technology might even compromise the rich and embodied interaction with food, thus potentially further impeding food relationship building [16, 17].

To address this issue, we propose an approach that utilises human-agent collaboration [18] to enhance embodied knowledge as “competence-to-act” [19] and to promote sustainable and conscious food resource handling. Intelligent personal assistants (IPAs), also referred to as conversational agents, have gained popularity in recent years in the form of commercially available voice assistants like Alexa or Google Assistant and allow for ambient interaction at home without much attention being directed to the device [18, 20,21,22,23,24,25,26]. Even though they are still limited in terms of their skills, the technology as such offers interesting potential for empowering human action by providing context-dependent cues and instructions [25]. By studying both humans and IPAs in collaborative action and decision-making, we want to explore how humans perceive the agency and the role of the IPA, with its qualities and limitations, in supporting the sharing and application of embodied knowledge for food waste prevention. Furthermore, we attempt to derive implications for the design of domestic human-machine co-performance [27, 28].

Our design case study [29] follows a user-centered design approach that is based on the actual food (waste) practices of households with the aim of supporting and enhancing their food literacy. To this end, we conducted contextual inquiries in 15 households and interviewed six experts about their approach to assessing food quality. We have chosen fish as an application domain, as this is a particularly sensitive food that comes with the most insecurities for consumers. Based on the preliminary implications of our formative studies, we developed and implemented a voice assistant called “Fischer Fritz” that aims at supporting users in applying sensory-based embodied knowledge to assess the quality and state of fresh fish (Fig. 1). Finally, we created a scenario-based video prototype to evaluate the experience and approach beyond usability and detailed functions. We evaluated the potential to teach and negotiate embodied knowledge and collaborative decision-making towards food waste reduction with experienced consumers, allowing us to learn how to further improve our design for implementation in common households.

Fig. 1 Asking Fischer Fritz how to assess fish freshness (own representation)

Our research highlights how AI agents can be designed for supporting situated learning and application of embodied knowledge. By means of Research Through Design [30,31,32], we contribute the design of a prototype providing support for assessing the edibility of food to potentially reduce food waste practices in households. Our study further extends related work on HFI by complementing automation approaches with expertise building to reduce insecurity in food quality and edibility assessment without compromising the human-food relationship. Finally, our case study demonstrates how future domestic co-performance might contribute to empowering humans by increasing their competence to sense, think, and act on food (waste), ultimately contributing towards more sustainable food practices.

2 Related work

2.1 From lack of embodied knowledge to food waste

Food waste results from various factors including demographics, lack of routine and planning, inappropriate storage, aversion to leftovers, and even a lack of cooking know-how [33]. All these factors are entangled within the complex nexus of household practices [3]. Freshness of food is a multi-dimensional and cultural-historically shaped concept, where a specific meaning depends on the background of the person [34] as well as their everyday life context [35]. Since childhood, eating habits have developed into one of the most stable habits with every eating experience and sensory perception. They remain non-reflected for a while and are shaped by the social environment, upbringing, and education [36, 37].

However, one reason that should not be underestimated, as it contributes to a significant share of food waste, is the understanding of date labeling and food perception [8, 33, 38]. Especially concerning refrigerated products such as fish or meat, consumers are uncertain about consumption and tend to dispose of food [8]. From a practice-theoretical perspective, Hebrok et al. [6] argue for decreasing the insecurity that arises when consumers assess edibility between institutionalized knowledge, embodied knowledge, and sensorial perceptions. Gherardi and Nicolini [10, 19] define knowledge as a “competence-to-act” by negotiating the “meaning of words, actions, situations, and material artifacts” with people and resulting in “practical accomplishment.” Thereby, applying knowledge becomes observable. Institutionalized knowledge is theoretical knowledge, e.g., labels such as the “best-before date” or explicit rules, e.g., for the storage of food, written by authorities or non-governmental organizations [9, 10]. Embodied knowledge, on the other hand, is built up through prior experiences, e.g., the sensory evaluation of tasting, smelling, seeing, or touching food [9, 39]. Yet, knowledge on safety in particular is frequently formalized and institutionalized, but “does not produce safety by itself, but only when it is put to work by situated actors in situated work practices and in local interpretations of its meaning and constraints” [10]. Although embodied knowledge is formalized into rules to some extent, it still has to be appropriately recognized or applied by its users. Due to little experience in sensory evaluation and trust in formalized regulations, consumers tend to prefer institutional knowledge, especially the best-before date, which leads to unnecessary food waste [6, 8].

Here, Alan Warde [40] argues that, due to the lack of embodied knowledge, consumers cannot perform the practice of assessing food and remain in a reflective state of mind. Interventions should therefore focus on reconnecting consumers and food and promote bodily experiences and memories related to food properties and conditions [11, 41, 42]. In this way, consumers might train their senses, develop trust in their understanding of their sensory perception, and, hence, obtain the competence to act in any given situation. Endless repetitions in the same place and at the same time contribute to establishing strong habits grounded in purposeful action to waste less food [43]. During learning, contextual information and rules should be embedded in the situation to guide the practical application of knowledge [6, 10] before everything is internalized as practice [39, 43]. With somebody acknowledging this participation as competence and people reproducing the results repeatedly from first-hand experience, practices might be established over time [19]. This is called purposive learning, a form of social learning and apprenticeship that focuses on active bodily and mindful participation by observing rules and procedures, accompanied by guidance and feedback [43].

2.2 Human-food (waste) interaction

Due to its high environmental significance, preventing food waste has been a prevalent topic in sustainable HCI research for over a decade [12, 44]. Similar to food waste research in general [45], HCI approaches are dominated by behavioristic and persuasive approaches [12]. Some research [13,14,15] uses bin cams to post images of waste to a Facebook account to promote social comparison and pressure. These design interventions lead to increased awareness and interest in improving personal food waste disposal skills. Furthermore, Lim et al. [46] use direct feedback on discarded food to stimulate self-reflection and improve food planning. Other studies focus more strongly on supporting planning and self-reflection behaviors, for example, by means of fridge cams [3, 47]. Those devices record all interactions with the fridge and allow ubiquitous access to the contents of the refrigerator. Moreover, research that focused on improving fridge management [46] and giving shelf life reminders [48] observed more awareness and hints of a reduction in food waste. Still, managing the inventory by hand and tracking consumed goods are tedious tasks that might not solve the problem in the long term [49].

In the light of HFI research, Bertran et al. [16] argue for a critical reflection on the agency of technology so as not to compromise the rich and embodied interaction with food. To bridge the gap between awareness and action, practice-oriented researchers such as Ganglbauer et al. [3] call for the promotion of “specific practices in which ‘food is done’ to promote more sustainable in-the-moment choices,” which, against the background of food waste literature [6], asks for more engagement in the embodied moments of deciding whether to prepare and consume food.

From a more celebratory perspective [50], HFI engages in the embodiment of the sensorial perception of food [51, 52]. This branch of research focused on the design of gustatory interfaces that simulate taste [53,54,55], touch [56, 57], and smell [58]. Still, what we can learn from it is the embodied reaction of users to these impressions, which comes with emotions and full-body reactions [51]. An emerging theme is the need to engage these experiences in interaction with real food. For example, Vannuci et al. [59] call for more design towards cooking as a craft where technology enhances the cooks’ agency in “touching, smelling, tasting, listening, speaking, and enacting choreographies with the materials at hand.” Similarly, Hassenzahl et al. [17] argue that rather than enhancing the technology’s agency, we should engage users and their senses, let them experience their competencies, and connect them with food.

Food waste research identified the lack of (embodied) knowledge of the sensorial characteristics of food as the main cause of food waste [6], yet this discourse is currently missing in the sustainable HCI literature. However, sustainable HCI has only just begun engaging in handmaking and sensorics [12, 44] and “encourages hands-on learning about food materials and nurture commonsense food knowledge instead of prioritizing automation and standardization” [60].

2.3 Co-performing conversational agents

To enable humans to use their senses and enact choreographies with food, a user interface is required that allows for interaction with both the food and the device at the same time. Here, IPAs, or conversational agents, offer promising solutions as they neither need visual contact nor occupy the sense of touch during the interaction. And indeed, commercial voice agents, like Siri, Alexa, or Google Assistant, are increasingly pervasive in kitchens to assist in various situations. Thus far, they are used for short and trivial actions, such as setting a timer [20, 61, 62], but bear the potential to support the human with complex tasks and decisions [18, 20,21,22,23,24,25]. Furthermore, research shows promising results for using conversational agents as learning environments or companions [63,64,65,66]. However, as Hobert and Meyer von Wolff conclude based on their literature review, generalizable findings, e.g., design knowledge and a thorough understanding of the design process, are missing that could contribute to the future design of valuable learning environments.

Nevertheless, designing more complex tasks for IPAs is difficult. First, the attempt to mimic human-like capabilities leads to high expectations of the intelligence of the assistant [67,68,69]. Up to now, these are mostly not met and leave users frustrated with the limited relationship to the agent [68]. A similar phenomenon is observed for social robots [70,71,72,73,74], where the anthropomorphic or human-like design implies a social presence which they do not live up to in direct comparison to humans. Here, distinct roles with an accordingly defined skill set [74, 75], with speech as a functional, embodied communication feature [67], might be a better paradigm.

Second, despite the opportunities of conversational agents to allow for human agency, they often miss the chance to engage the human directly in action in the decisive moment of the situation [68, 76]. Most dialogues are designed for simple command-response tasks where the human commands the agent to execute [61], or the technology design in general focuses on automation and on eliminating human decision-making altogether [77,78,79]. Consequently, the design space for collaborative decision-making and co-performance remains secondary.

The notion of co-performance addresses both issues by exploring a useful distribution of capabilities and responsibilities [27, 28, 80, 81]. The authors of [27] argue that an artifact should be considered an active contributor to practice and designed to have the autonomy to learn and act next to humans. For example, they studied domestic heating practices and the evolution of the involved artifact, from the fireplace to the thermostat, regarding its capabilities, responsibilities, and roles in collaboration with the human. While in the past the judgment to heat the fire was assigned to the human, nowadays the thermostat has the agency to decide about the temperature. Still, the human with his or her senses might experience temperature differently. Depending on the situation and embodied knowledge of temperature regulation, the human might overrule and negotiate the decision of the artifact. In this sense, the performance and decision-making of an artifact should be discussed in technological terms, in the realm of possible sensing, interpretations, and actions that are embodied differently than in humans. In their study on human and agent co-performance, Kim and Lim [28] identified influencing factors such as the human’s mental model of the agent and the need for a learning period to build trust in decision-making, and found that applying more human-likeness does not automatically contribute to more acceptance and rapport-building. Accordingly, “artificial performers should be considered as a category in their own right and not as (poor) imitations of humans ones” [27]. Instead, we should focus on the design of an appropriate process of collaborative decision-making that exploits each one’s capabilities, especially in situations of uncertainty [67, 82]. From an embodied cognitive science view, according to the Sense-Think-Act cycle of Pfeifer and Scheier [83], intelligent machines first have to sense and then to compute before they act in a situated manner. 
An agent counts as situated “if it acquires information about its environment only through its sensors in interaction with the environment” [83]. Yet, the sensors of machines might not capture the situation in its full multi-sensory richness as humans do. In contrast, humans often sense and perform certain practices simultaneously, with less deliberation involved. Yet, in the case of food assessment, for instance, they have to actively reflect on their intentions and multi-sensory perceptions first. Therefore, our research question focuses on how voice assistants may contribute to negotiating (embodied) knowledge with humans and potentially to preventing food waste.

3 Pre-study: edibility between shelf life, rules-of-thumb, and trusting one’s guts

Gaver [84] and Wulf [29] argue that the aim of a pre-study is to sensitize and inspire design. In this methodological tradition, we present our pre-study descriptively. The objective of the empirical pre-study was twofold. First, we aimed to understand the current practices of consumers: how they examine the edibility of food, which knowledge and skills they apply, and how they negotiate institutionalized and embodied knowledge. With the main problems of consumers well covered by previous research (section 2.1), we summarized the design insights relevant to our specific case. Second, to understand the assessment of fish and how to explain the procedure to an apprentice, we interviewed six experts. Although food waste is present in all food groups, especially in dairy products and vegetables, fish is a particularly sensitive example that is subject to many consumer uncertainties. It is exceptionally challenging and risky because a majority of consumers lack knowledge of and experience with this product.

For the first part, we conducted a qualitative study with 15 consumers (C1–C15), using semi-structured interviews and contextual inquiry in their kitchens. We took photos of the inside of their refrigerators and asked them to explain and show their everyday food handling to further understand the material context of different performances of storing food and assessing the freshness of food. The participants were advised not to prepare for the interviews because we wanted to observe their actual practices, e.g., maintaining the freshness of products that are overdue. All participants confirmed that the inside of their refrigerator had not been altered for the interview.

The participants were recruited through opportunistic sampling within the authors’ extended social network. The sample varies in its socio-demographic characteristics, with 11 female and 4 male participants aged between 18 and 88 years, all of whom have the main responsibility for household management and food practices, as can be seen in Table 1. Furthermore, it ranges from younger, inexperienced consumers to parents with a lot of cooking experience. Due to this diversity, we were able to identify a variety of food practices. For the second part, we conducted semi-structured interviews with six experts (E1–E6), including a university teacher on food safety, a cooking teacher, fish traders (supervising apprentices), and a chef. First, we asked them to explain their assessment procedure using a trout that we had brought with us. Next, we followed a semi-structured interview guideline to understand their explanatory approach, recommendations for consumers, and risks. All interviews were transcribed and analyzed in MAXQDA (Footnote 1) following the inductive approach of thematic analysis [85]. Accordingly, the answers of the participants were coded and clustered by two researchers independently. We discussed the codes among the authors and refined them in a second round to derive the themes of our analysis.

Table 1 Overview of contextual inquiry participants

3.1 Consumers’ approach to assessing food

We found varying strategies for assessing the edibility of food for different products. For packaged food, canned food or jam (C10, C13, and C14), milk (8 of 15), and cheese products as well as meat and sausage products (6 of 15), the shelf life is used as an initial indicator. For some consumers, shelf life is not critical for their consumption decisions, e.g., for meat (C1, C5, and C14) and dairy (C1, C2). Some consumers would even buy and consume products past their date if they can consume them the same day. For others, the role of shelf life ranges from guiding to decisive when disposing of food.

Problems arise when participants can no longer recall when the product was opened, its purchase date, or its expiry date. This is resolved either by sensory evaluation of the products or, for some, by estimating the time (C4, C12, C14, and C15). At this point, all participants state that they intend and attempt to use their senses when examining the freshness or edibility of food before they prepare, eat, or dispose of it. For this purpose, they begin with a visual assessment, looking for signs of decay, e.g., mold or rot. This procedure is conducted for any product. Regarding milk and yogurt, the participants state that they disregard shelf life if the consistency has not changed and no mold is visible. First, they smell the product and then eat or drink a small amount, such as a “small spoon” (C1) or a “knife tip” (C12), which is harmless to health, to further decide on the product. The spoiled smell is described by C7, C11, and C14 as acidic, and C15 would explicitly look at the milk to see if it “crumbles.” However, in the case of fish, meat, and boiled eggs (C5, C11), participants expressed greater concern about food poisoning. This is why they act much more cautiously, and some tend to throw the product away. Here, the meat is also checked for changes in color or “smeariness” (C10) and for its smell (C11, C15). If the storage time can be estimated, the eggs are at least peeled and checked for visual and olfactory signals. Nonetheless, participants state that some qualities cannot be assessed by their senses. As C8, for example, explains, a salmonella infestation cannot be detected.

Interestingly, we could observe varying degrees of food literacy depending on age and household responsibility. Student consumers more often lacked consistent routines and competencies to maintain the freshness of food and storage hygiene, whereas older and experienced consumers had appropriate storage solutions like special Tupperware but also more space, such as a second freezer in the basement. All in all, the explanations of the consumers show them trying to triangulate between their bodily reaction to the food, rules such as the identification of “crumbliness,” and institutionalized knowledge in the form of shelf life. Especially when one information source does not lead to a decision, uncertainty arises and different approaches are combined. C14, for instance, explains that the visual appearance of meat on the verge of expiry must be perfect and stresses that the meat should not “leak,” referring to liquids inside the plastic container. To double-check edibility, she does an additional smell test. To obtain additional information, C15 also asks another person for their opinion on freshness and shelf life.

Edibility turned out to be a complex, culturally related construct that is also shaped by individual horizons of experience. This experience is usually described in years of experience or as gained through cooking with parents. Accordingly, the perception of edibility differs. In this context, several participants also talk about freshness, which seems to be used as closely related to edibility. Nine participants describe freshness as rather “harvest fresh.” Some of them refine it as “from field on the table or directly to the stomach” (C12) and as “ultimate freshness” (C11). Besides, “freshly harvested” also means that the product is just ripe (C4, C8, C10) as it “falls from the tree” (C10). Furthermore, participants used nonsensory characteristics to define edibility. Seven of the participants associate it with food that is healthy and safe to eat. For one participant, however, food safety is nowadays even of secondary importance to environmental considerations:

Sterility is the wish that things are packed that not everyone has touched. Today it is rather that I take the unpacked goods because I would like to support environmental thoughts. –C15

Freezing food to preserve edibility is, however, controversial. C3 and C12 regard shock-frozen fruits and vegetables as being as vitamin-rich as freshly harvested products. In contrast, five participants judge frozen products in general and, more precisely, defrosted bread or ready meals as not fresh.

3.2 Teaching embodied knowledge

As our contextual inquiry confirmed, the perception of freshness as well as assessment procedures differed between the households, with meat and fish as particularly sensitive cases. Against this background, we focused our design on fish as the model case. When we asked the participating experts to explain how a fish should correctly be assessed in terms of its edibility, it quickly became apparent that a multi-sensory approach is needed. This approach includes the senses of sight, smell, and touch, which are applied to various characteristics of a fish. According to E1, however, taste is only appropriate if the fish is processed in a salad or similar.

All experts agree on such a multi-sensory approach, as different preprocessing steps might change certain qualities of the fish. For example, storing it on ice clouds the eyes, which is usually perceived as a sign of decay. The experts also explained some tricks and attempts to deceive consumers. For example, E2 examined a fish whose gills had been removed, which the expert described as a trick to prevent the fish from smelling bad and to cover up its lack of freshness. Besides, fish can also be prepared with additives such as lime juice, or slightly smoked, to extend durability, which hinders the freshness assessment.

In summary, the assessment procedure introduced by the experts includes the following test items. Not every expert uses every one of these test items, as they usually need just a few checks to determine edibility. However, as explained, multiple tests might be needed if, e.g., the gills were removed.

  • checking for the smell (either neutral or fishy)

  • checking the flesh with pressure (either the dent stays or not)

  • visually checking the eyes (cloudy or clear)

  • checking the skin visually and by touch (slimy and shiny or dry and dull)

  • scratching the scales with a knife (falling off or not)

  • checking the fins visually and by touch (dry and frayed or wet and in normal condition)

  • visually checking the gills (red and not slimy or pale color)

  • visually checking the inside (light red or thick/coagulated)

  • visually checking the flesh (normal color or greenish/brownish)
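As a minimal sketch, the test items above could be encoded as a simple data structure that a voice agent steps through. The names `Check`, `CHECKS`, `prompt`, and `remaining` are hypothetical and do not describe the Fischer Fritz implementation. The senses assigned to the pressure, scale, and inside checks are assumptions where the list does not state them. The sketch deliberately records only which of the two listed observations was made and does not compute an edibility verdict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    target: str      # what is examined
    senses: tuple    # senses involved (assumed where the list is silent)
    outcomes: tuple  # the two contrasting observations named in the list


# The nine test items, transcribed from the list above.
CHECKS = (
    Check("smell", ("smell",), ("neutral", "fishy")),
    Check("flesh under pressure", ("touch",), ("dent stays", "dent does not stay")),
    Check("eyes", ("sight",), ("cloudy", "clear")),
    Check("skin", ("sight", "touch"), ("slimy and shiny", "dry and dull")),
    Check("scales", ("sight",), ("falling off", "not falling off")),
    Check("fins", ("sight", "touch"), ("dry and frayed", "wet and normal")),
    Check("gills", ("sight",), ("red and not slimy", "pale")),
    Check("inside", ("sight",), ("light red", "thick/coagulated")),
    Check("flesh color", ("sight",), ("normal", "greenish/brownish")),
)


def prompt(check):
    """Phrase one test item as a question an agent might ask (hypothetical wording)."""
    first, second = check.outcomes
    return f"Test item '{check.target}' ({', '.join(check.senses)}): {first} or {second}?"


def remaining(observations):
    """Return the test items not yet observed.

    `observations` maps a target name to the observed outcome. Unknown targets
    or outcomes are rejected instead of being silently accepted.
    """
    by_target = {check.target: check for check in CHECKS}
    for target, outcome in observations.items():
        if target not in by_target:
            raise KeyError(f"unknown test item: {target}")
        if outcome not in by_target[target].outcomes:
            raise ValueError(f"unknown outcome for {target}: {outcome}")
    return [check for check in CHECKS if check.target not in observations]


if __name__ == "__main__":
    # Ask about every item that has not been observed yet.
    for check in remaining({"eyes": "clear"}):
        print(prompt(check))
```

Keeping the outcomes as unjudged pairs mirrors the experts' point that a single sign, such as cloudy eyes after storage on ice, can mislead, so the interpretation stays with the human.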

Moreover, those rules do not have an explicit order, but some experts used the saying “the fish stinks from the head” (E3) to explain how they start with the gills and their smell. They continue with the eyes and go on with the other parts. Still, some of the items are considered as stronger and more obvious indicators for peak freshness like fire-red gills.

As this rule-of-thumb already indicates, assessing the freshness of a fish is similar to the approach of the non-professional consumers we interviewed and is closely related to experience and some roughly defined rules. Much knowledge is embodied, and over the years, the experts learned to understand their bodily reactions and feelings. For example, E5 said “Bad smell you know, my sense of smell will understand it. It’s non-describable. It’s kind of abstract.” Nonetheless, they tried to articulate their knowledge as rules, for example, using analogies such as “fishy” or “seaweedy” for bad fish or “neutral” or “fresh sea breeze” for good fish. Similar articulations were found for visual characteristics, such as “bloody colored” or “rose.” Here, they also often referred to a normal-looking fish that they had internalized over time. Articulating the tactile sense was quite difficult; the experts referred, for example, to the normal reaction of the fish skin to pressure, which is specific to the type of fish and must be learned with time. Still, they argued that a fast reaction of the skin to the pressure is a good sign. Moreover, they highlighted the need to show and explain the location of certain body parts, e.g., where to find the gills (E6).

Finally, the experts raised our awareness of the tension between sustainability and food safety. While some experts (2/6) were more relaxed about the danger of eating slightly decaying fish, others recommended being more cautious. In the worst case, the fish can be toxic, but still, they argue that in those cases everybody should show some natural bodily reaction. Furthermore, in cases of doubt, they recommend at least cooking the fish well to prevent salmonella. In this respect, the shelf life was mentioned, and the experts stated that any fish far past this date should be disposed of. Otherwise, sensory assessment should only be used in cases of doubt close to the shelf-life date.

3.3 Preliminary implications for design

As the results show, consumers are motivated to use their senses, but often miss guidance on the procedure and interpretation support. Prior research [3, 6] already highlighted that more support for in-the-moment decisions is needed since lasting behavior change is challenging to achieve. Furthermore, our research points out that the meaning of freshness is affected by the dispersed moments of consumption practices [3]: During shopping, freshness is described as harvest-fresh and ripe, yet descriptions change in the home context, where the focus is rather on the assessment of edibility. Hence, storing does not represent a negligible practice within the nexus of consumption practices [3, 16, 86], but is central to ensuring freshness. It is a practice of keeping food as fresh and edible as possible, which requires competencies to assess food qualities by making use of multiple senses and food condition information [87]. Therefore, we should offer advice beyond the obvious visual indicators of decay and provide clear, quick-to-apply instructions that promote experiential and collaborative learning. We need to explain food safety regulations in context and use descriptions that illustrate the gradual differences in food quality like, for example, “sea-weedy.” Furthermore, our findings show that freshness is often described negatively as a deviation from expectations of how something must look, taste, and feel [35]. Therefore, antonyms are used such as “not old,” “not spoiled,” or “if the salad is not withered” to define what is not fresh. This indicates the importance of verbalizing sensory impressions that contribute to a shared understanding, in particular when designing with speech. Furthermore, consumers have to train their senses to trust their bodily reactions and develop personal rules-of-thumb. The freshest food offers the highest taste experience, but consumers need to taste first to know the best condition of the food. 
Moreover, in everyday life, trade-offs between fresh and, therefore, healthy food and edibility cannot always be sufficiently avoided. Hence, the prototype needs to enable consumers to understand that although food can still be processed, it may require additional flavors to improve the taste. The design has to acknowledge how severe consumers perceive the varying health risks of different food groups to be in order to ensure sincerity and reliability. With fish and meat, consumers are very critical and cautious and tend to dispose of the food more quickly. Yet, the prototype should refrain from blaming consumers if they do so anyway. Instead, we need to explain the instructions and assessment categories carefully and patiently, and as transparently and comprehensibly as possible.

4 Prototyping: voice agent

Our first expert interviews and observations verified the main criteria and sensing approach for quality assessment and helped to determine the best guidance order and provide additional reasoning for fish characteristics. In addition, we triangulated the preliminary implications with research-based guidelines on fish food safety [88]. In the next step, we carefully solidified the empirical data in a collaboration model to define both the procedure of assessing freshness and the capabilities of the user and voice agent (Fig. 2, step 1). Based on the preliminary design implications and consumer needs, we aimed to design the food assessment as a collaborative task (Fig. 2, step 2). Subsequently, the main concept evolved iteratively using a combination of Role-Playing and Wizard-of-Oz sessions [89], as can be seen in Fig. 2, steps 3a to 3c. To investigate the procedure and elaborate dialog drafts, we began with Role-Playing in our team. In contrast to Wizard-of-Oz, Role-Playing allows us to explore dialogs freely to collect possible directions and phrases. As a rigorous method to test the system’s capabilities by the efficiency and sufficiency of utterances, we continued with seven scripted Wizard-of-Oz guidance sessions that restricted further use of “common-sense” to empathize with the user [90]. At this stage, the agent already had a structured guideline with questions and answers to adhere to, but the wizard was still able to use some common sense to carry the dialog to a successful ending. After refining the dialog paths, we conducted a second round of Wizard-of-Oz. This time we used the telephone to reduce a potential social presence of the agent and did not deviate from the script. This allowed us to rework error handling and fallbacks by experiencing dead-end conversation cues. Participants in these sessions were between 20 and 30 years old and unfamiliar with fish assessments. At this prototyping stage, the limits to designing agency and coaching became apparent.
The interviewer, in the role of a voice agent, used the list of attributes that indicated the status of freshness. The potential users had a photo of the fish for greater immersion in the situation. They drew upon past encounters with fish and imagined different states of sensory impressions. We refrained from using real fish in the prototyping phase to avoid food waste. Finally, we implemented our dialog tree (see Footnote 2) in Google Dialogflow (Fig. 2, step 3d) and tested all paths (Fig. 3) within our team. Afterwards, we captured the interaction between the user and the agent as one “happy path” in a video (Fig. 2, step 4), to use this video-prototype to illustrate and evaluate the conceptual design of the artifact (Fig. 2, step 5).

Fig. 2 Proceedings and single steps of the Design Case Study, own representation

Fig. 3 Representing all possible dialog paths for human-agent co-performance. In total, 8 indicators to check freshness with 5 possible conversation endings on behalf of the user. The video prototype showcases the solid line from stage 1 to stage 9

4.1 Sketching human-agent collaboration

The first draft of our concept was based on the main assessment criteria from our prior research and food quality experts. Furthermore, we used the observations and suggestions by the experts to prioritize the chronological order of information, so that users get reliable results with a minimum number of questions. Therefore, we visualized the potential paths and outcomes in a decision tree and specified the most critical characteristics to be asked first, as can be seen in Fig. 3. Assessments like gills, smell, and color of fish flesh are primary and mandatory aspects, whereas eyes, scales, and fins are additional determinants to indicate the condition of the fish quality. Nonetheless, ambiguous interim results on the fish condition require further checks before a final decision. We designed transparent step-by-step explanations to allow users to trace the decision path from beginning to end, e.g., “Okay, so your fish has no gills. Then let us skip the gills and start with the fish inside test. Let us now open the fish, so that we can see the abdominal cavity. Is the fish meat bright and more to the whitish, pale, or pinkish or is it more to brownish yellowish or greenish?”. Thereby, the agent encourages the human to interact with the food product and teaches them to interpret the sensory impressions correctly to come to their own (and the same) conclusion, as seen, for example, in Fig. 4. The human continuously describes impressions and answers the agent so that both can determine collaboratively whether the fish is still edible without risking health. During the co-performance of the assessment, the users shall not feel patronized, but self-confident and reassured by the collaboration to trust their own senses: “That is good. The body of a fresh fish is firm and when pressed it should bounce back. The fish is fresh enough to be prepared with heat but the flavour might not be the best as the skin is not at peak freshness. Would you like to further have more detailed info about the freshness of your fish?”. Yet, the agent has to react patiently to possible misunderstandings or indecisiveness by users. Finally, the voice agent emphasizes that the responsibility for further actions lies with the user.
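The prioritized order of checks described above can be illustrated with a minimal sketch. It is not the Dialogflow implementation of the prototype: the reading categories, the verdict labels, and the majority rule for the additional checks are assumptions made purely for illustration. Only the split into primary checks (gills, smell, flesh color) and additional checks (eyes, scales, fins), the skipping of missing features, and the traceable decision path are taken from the text.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Reading(Enum):
    GOOD = "good"
    AMBIGUOUS = "ambiguous"
    BAD = "bad"
    MISSING = "missing"  # e.g. the fish has no gills, so the check is skipped


class Verdict(Enum):
    FRESH = "fresh"
    COOK_ONLY = "edible when prepared with heat"
    DISCARD = "better not eat it"


# Order taken from the text: primary, mandatory checks come first.
PRIMARY = ("gills", "smell", "flesh_color")
ADDITIONAL = ("eyes", "scales", "fins")


def assess(readings: Dict[str, Reading]) -> Tuple[Verdict, List[str]]:
    """Return a suggested verdict and a trace of every check that was asked."""
    trace: List[str] = []
    answered = []
    for check in PRIMARY:
        reading = readings.get(check, Reading.MISSING)
        trace.append(f"{check}: {reading.value}")
        if reading is Reading.BAD:
            return Verdict.DISCARD, trace  # early exit, no further questions
        if reading is not Reading.MISSING:
            answered.append(reading)
    if answered and all(r is Reading.GOOD for r in answered):
        return Verdict.FRESH, trace  # early exit on a clear result

    # Ambiguous interim result: consult the additional determinants.
    asked = good = 0
    for check in ADDITIONAL:
        reading = readings.get(check, Reading.MISSING)
        trace.append(f"{check}: {reading.value}")
        if reading is Reading.MISSING:
            continue
        asked += 1
        good += reading is Reading.GOOD
    # Assumed rule: a majority of good additional checks allows cooking,
    # anything else falls back to the cautious suggestion.
    if asked and good * 2 > asked:
        return Verdict.COOK_ONLY, trace
    return Verdict.DISCARD, trace
```

In this sketch the verdict is only a suggestion and the trace mirrors the step-by-step explanation given to the user, so that the final responsibility stays with the human.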

Fig. 4 Instructions to perform pressure test for freshness by Fischer Fritz, own representation

4.2 Refining voice interaction

In the following Wizard-of-Oz sessions, we explored the conversation with different degrees of role-play restrictions and freedom to simulate the intelligence of the system, as shown in Fig. 2, steps 3a to 3c. Meanwhile, we noted possible dialog sequences, unexpected edge cases, and missing fallbacks, and collected a variety of utterances to refine the dialog. Besides correct keyword use, edge cases include the remaining challenges of sufficiently explaining the position of the gills and the right amount of pressure on the skin, and of verbalizing possible olfactory impressions. As ambiguous descriptions lead to misunderstandings, we implemented non-standardized fallbacks to catch edge cases and to sound more personalized. Moreover, repetitions help to ensure a shared understanding of the progress and indicate active listening, as can be seen in Fig. 3.

Conversational guidance is based on proactive questioning and proposing distinct adjectives to simplify decision-making. Hence, the voice agent is responsible for sustaining the dialog and depends on users to answer. We deliberately reviewed all utterances and refined wording and sentences. Thereby, we decided to use explicit adjectives to provide users with clear answers to use. Some of our participants during prototyping found it hard to describe their sensory impressions in their own words. This also results in an advantage for the interaction, since potential dialog errors and fallbacks are reduced to a minimum. The trade-off is a less free conversation for the human, yet this is better than leading questions on a yes-or-no basis, as criticized in our Wizard-of-Oz sessions. Furthermore, distinct opportunities to exit the dialog increase the satisfaction of an accomplished task. Either the agent ends the dialog by reaching a decision quickly or users are convinced to have enough information to skip some or all further assessment steps. Some test users mentioned that they liked the provided additional or more detailed information, but would prefer to ask actively for it. Furthermore, for transparency reasons and to show trustworthiness, the final suggestion of whether to consume the fish or not is carefully verbalized and communicated: “Fins and scales are in great condition but the eyes make it appear a little bit less fresh. The fish is good but please also rely on your senses to not risk your health.” To emphasize an inclusive understanding of performance and create a team experience, we used utterances like “Let’s perform a few tests.”
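How proposing explicit adjectives keeps fallbacks to a minimum can be sketched as follows. This is a hedged illustration in plain Python, not the Dialogflow configuration of the prototype: the word sets are taken from the flesh-color question quoted earlier, whereas the fallback wordings, the function names, and the limit of two re-prompts are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Adjectives the agent proposes in the flesh-color question.
FRESH_WORDS = {"bright", "whitish", "pale", "pinkish"}
SPOILED_WORDS = {"brownish", "yellowish", "greenish"}
SKIP_WORDS = {"further"}  # voice command for skipping a step

# Hypothetical, deliberately varied re-prompts (non-standardized fallbacks).
FALLBACKS = [
    "Sorry, I did not catch that. Is the flesh more whitish or more brownish?",
    "No worries, take your time. Would you call it pale, or rather yellowish?",
]


def interpret(utterance: str) -> Optional[str]:
    """Map a free utterance onto 'good', 'bad', or 'skip'; None if unclear."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    if words & SKIP_WORDS:
        return "skip"
    fresh = bool(words & FRESH_WORDS)
    spoiled = bool(words & SPOILED_WORDS)
    if fresh and not spoiled:
        return "good"
    if spoiled and not fresh:
        return "bad"
    return None  # no keyword, or contradictory keywords


def ask(question: str, answers: List[str], max_fallbacks: int = 2) -> Tuple[str, List[str]]:
    """Simulate one dialog step; return the result and all prompts spoken."""
    prompts = [question]
    for attempt, utterance in enumerate(answers):
        result = interpret(utterance)
        if result is not None:
            return result, prompts
        if attempt >= max_fallbacks:
            break
        prompts.append(FALLBACKS[attempt % len(FALLBACKS)])
    return "unclear", prompts
```

Because the question already names the expected adjectives, most answers match on the first attempt, and the varied re-prompts only come into play for hesitant or contradictory descriptions.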

5 Prototype evaluation by experienced consumers

The main goal of our evaluation was to explore attitudes towards the usefulness of voice assistants in the prevention of food waste, their potential and limits to convey embodied knowledge, and decision-making in collaboration with our voice agent Fischer Fritz. The prototype is not exclusively designed for cooking novices, but aims to support where guidance is needed. We used video-prototyping, a common method in HCI, to focus on the concept evaluation of novel artifacts, as proposed by Diefenbach and Hassenzahl [91]. It allows observing several experience levels like interaction, functionalities, and emotions at the same time. The attention is directed rather to the embedded everyday experience without distracting users with usability problems or immature technology aspects [91, 92]. This method is also suited for Human-Agent Interaction [93, 94]. In light of our contribution, this work goes beyond a usability evaluation and discusses design implications to improve sensing, thinking, and acting in co-performance as immediate guidance in the situation of challenging indecisiveness based on a novel artifact. As solving usability issues was already in scope of the iterative technology design, the evaluation reflects on the opportunities of the design to promote appreciation of food and prevent unnecessary food waste.

The video prototype runs 4:05 min and shows a typical scene where a consumer picks a fish from the fridge and doubts its edibility. In the next step, Fischer Fritz is approached for support. In the following, the user and the agent exchange information about the fish characteristics and interpretation of the indicators to come to a useful conclusion. To immerse the viewers of the video, close-ups of the fish help them to build their own impressions of everything except the smell. The interactive guidance represents one possible assessment combination out of eight combinations in total (see “Happy Path” in Fig. 3) from the original dialog, although we wanted to display the consideration of all available fish characteristics and sensory impressions. In this take, some of the fish characteristics are ambiguous in perception, which leads to the most insecure scenario of all available outcomes by using the prototype. Our aim was to confront the participants with a remaining risk to provoke insightful discussions about trust in their senses, the voice agent, and prior knowledge as well as their attitude towards technology in general.

Afterward, we interviewed 15 consumers with a varying range of food experience across Germany, following a semi-structured interview guideline that roughly covered the topics of their perception of co-performance and its potential. In particular, we asked about the meaning and communication of (embodied) knowledge regarding food quality and food value, the risk and control in the process of decision-making, distribution of roles and capabilities, and impact of user empowerment. We aimed for a sample that is well suited to assess the role of the agent and the challenges of teaching rule-based and embodied knowledge, as can be seen in Table 2. The participating consumers (P1–P15) were recruited from contacts from prior studies and the extended social network of the authors. Aiming to collect a variety of perceptions and opinions on the prototype and concept to encourage more embodied interaction with food, we chose consumers with different experience levels. Those ranged from highly experienced home and family cooks to professionals who blog about food or who were trained in gastronomy and give cooking classes. Our sample leans towards more food-experienced consumers, as they interact with inexperienced consumers regularly and understand their struggles in a more condensed manner. All participants were between 26 and 80 years old. Moreover, as our sample is familiar with the properties of fish, video prototyping does not limit the evaluation due to less sensorial experience, but rather allows centering the focus on the verbalization and communication of knowledge. Moreover, from a more pragmatic stance, we did not want to risk any food safety issue in a real-world trial or to waste fish unnecessarily (which we would have had to let decay on purpose) in a laboratory setting.

Table 2 Overview of prototype evaluation participants

All interviews were conducted using remote conference calls and sharing a private video link during the session. Afterward, they were transcribed verbatim and thematically analyzed in MAXQDA. We followed the thematic analysis procedure as outlined by Clarke et al. [85]. Two researchers coded the themes independently and discussed towards agreement on the themes for further refinement. The final themes are represented in the headings of the evaluation results. Furthermore, we translated the quotes from German into English.

5.1 Trust and autonomy in decision-making

All participants are pleasantly surprised about the guidance by Fischer Fritz and the interactive design of the provided approach itself. They recognized their methods and explanations, similar to the way they teach knowledge to apprentices in the workplace (P5) or participants in cooking courses (P4). P11 emphasizes that inexperienced consumers could get a new perspective through the voice assistant and regain more confidence in their own senses. Some of the participants also praise the additional information, descriptions of the possible sensory perceptions, and explanations that are given for the respective fish characteristics.

The pressure test is very important. Sensational! Yes, you have addressed everything, everything important. There are of course many fish products that no longer contain gills. That can have many reasons, but it is usually said that gills are also decisive and must be bright red. –P4

Overall, most of the participants (12/14) see an opportunity to reduce food waste and estimate the risk to make mistakes as low due to the distribution of tasks and capabilities. They (7/14) also value the availability of the technology at home and the easy access to information by Fischer Fritz. Even though some (5/14) of them express a certain distrust towards voice assistants, they confirm the potential support and comprehensible advice.

No, he [the voice assistant] is very clear and explicit, but I wouldn’t say patronizing, it’s just that he, and oneself, wants to make sure that everything is in order and properly inspected. If it is then later said ‘yeah, I still got diarrhea’ because the food was spoiled, then people say subsequently, ‘he didn’t tell me that I had to check it’.–P13

Again, all generally appreciate the additional reassurance and note that, in their opinion, potential paternalism only arises when Fischer Fritz confronts users who have a higher level of knowledge than the agent himself (P8, P13, P14) or, for example, when new insights contradict their intuition (P2). Otherwise, they might blame the assistant and deny any responsibility. Moreover, users need to actively seek support when using Fischer Fritz (P8, P11), and they are prompted to make their own decision based on agreement with the results (P12, P13, P14).

The human is still [in control], and everyone should use his or her own mind or willing. Whether to eat it or not, he can decide for himself how he likes. Therefore, finally, I see the control still with the human and, so to say, the device only in such a way as a control body.–P12

Moreover, participants (8/14) positively highlighted the structured, step-by-step guidance and sensory checks (P14) aligned with the actions of the user (P8, P14, P2, P10) without information overload. The descriptions help to check and classify the sensory impressions as well as to look at features that otherwise would not have been considered at all. Hence, the human takes an active role in quality control and retains autonomy in decision-making.

They complement each other. I think the machine has the knowledge and the human simply has the senses, which he has to provide.–P14

5.2 Co-performing food assessment

As mentioned by the participants (7/14) before, complementing the human, our voice assistant can eliminate the last uncertainty and contribute to autonomy in decision-making. For P9, personal control goes beyond his diet. Taking self-responsibility and self-care further leads to decisions for a sustainable environment, since interdependencies determine how we live together. This attitude implies a decision for conscious handling of one’s own life and food.

I think if you eat a healthy diet, you tend to be more conscious of many things that concern you and also the environment. And therefore I would say, most humans I know, who eat very healthy, also pay attention to waste less food.–P11

Besides, four participants considered using Fischer Fritz to check other foods. The value to save animal and plant products is compared to the effort required to use the voice assistant, and, hence, the probability of its use.

With this system, I think it would definitely be possible to avoid [food waste]. I could just imagine other examples that could be a bit more successful. Like potatoes or fruit and vegetables, simply where it is not that critical. The question is which foods should be prevented. Of course, high-quality foods such as fish and meat (...). All the dairy products, for example where you can still eat yogurt after 3 months. The fact that it is thrown away quickly. That the things that can simply be subject to longer storage, are also more likely to be thrown away like those that have a short lifetime anyway. (...) This is maybe with a yogurt that you have 2 weeks in the fridge and then after the 3rd week or a week past the expiration date you just don’t know if you can eat it or not. This case is simply more relevant. The question is, whether in the case of a yogurt one would bother so long asking - answering, because it is also only a 15 cent product, and a fish may have cost 15 euros after all, that somebody is perhaps more likely to do that.–P14

For the efficiency of information retrieval, most of the participants (10/14) compared the voice assistant with their usual Google search. Some of them (2/14) conclude that it would be faster to just read over the information quickly, whereas the majority emphasizes the situated learning and accompanied embodied experience. Some also add that specific information, like why the eyes cloud, is probably not found at the first search online (P2, P3, P4). Nonetheless, time-conscious participants (3/14) suggest having a quick overview of all fish characteristics at the beginning of the dialog. Thus, they can get a first impression of whether the fish is in high-quality condition. Furthermore, P12 notes that there is always the possibility to skip some parts of the coaching by the voice command “Further.”

Quite well, because the written form is just, I think if you want to have a quick look, whether it is still good or whether it is already spoiled. Then it is so cumbersome to enter it somewhere, then just look for something or look it up somewhere. Same with the video. I had to find something first and I want to know it directly. And that is why you simply talk into the room, tap your cell phone, the voice assistant turns on. For me, this is one of the easiest ways to do it, instead of having to look for something somewhere and read it.–P12

Furthermore, they reflect on the modality of speech and its appropriateness to convey knowledge. P3 and P8 note that it might be difficult to teach someone how hard to press on the skin. Although many of the participants (10/14) mention they use visual media like YouTube videos and TV shows, they watch them for inspiration rather than step-by-step guidance.

But what is shown today, I can only say: forget it. I really do not watch any more. (...) Surely anyone can grate or chop carrots. I do not necessarily have to show it on TV every time carrots are needed somewhere, I do not have to show it every time. All I need to say is, “I think carrots belong in there or something.”–P8

Moreover, some argue that pictures to compare the same types of fish (P3) or videos of embodied movements (P8) would support learning significantly. In contrast, P7 expresses his concern that pictures may contribute to more insecurity by prescribing implications that are not appropriate for the fish at hand. Furthermore, P9 elaborates on how book authors are capable of creating images using comparative examples and words only. He further suggests updating the dialog in this manner.

I think examples would still be important there, which can produce such images in the mind (...). For example, the case of “what no longer serves”. And then also creating the smell for “what no longer goes” on the mind. If you then have such an old Harzer cheese in front of you, so the fish smells like an old Harzer, then you still have to know now, how does a Harzer smell, but having so 2-3 examples of what people might know how something smells.–P9

P1, P2, and P7 weigh in that technical features like cameras, scanners, or sensors could offer a technological sensory reassurance for the assessment. At the same time, they claim that it would counteract easy access to the technology already available and would require further investment. Most of the participants (9/14), however, rejected additional sensors, because they see reconnecting to food and using the human senses for this purpose as the most valuable.

If it is just a camera, you hold it in front of the unit, but if the system itself can touch and feel, I could imagine that you put the product somewhere on it and that it is scanned, touched and sampled. And you certainly know completely detached from the knowledge and experience components that this product can be processed. So you just don’t learn, you don’t train, but you completely hand over everything.–P1

In direct comparison with human-human interaction, the participants (4/14) notice differences as they miss some emotion and passion in the interaction describing it as too functional or informative only. Moreover, P2, P3, and P6 see emotions as a key aspect for cooking and food in general. On the other hand, everyone emphasizes the purpose of Fischer Fritz and its contribution.

The interaction between each other, if you were to ask me now, “is the piece of meat still okay” and then I could explain directly “aha here and there and that’s how you see it,” take it in your hand, etc. that’s just not given with the machines. The cooperation, the communication among each other is different. That just doesn’t work with a machine. But apart from that, it’s completely okay, because it’s purely informative - you want to know something from the machine, and that’s why I think it works.–P13

The participants do not agree on the role of our agent in the collaborative practice. Some (4/14) say “assistant” is already a good choice because it provides useful advice and is informative. Other participants (6/14) think of it as more caring and engaged.

I think of a mixture, I ask my mom how I do it when I’m cooking and really a kind of cooking teacher. Well, I don’t think it’s a kind of a true instructor. There is the issue using speech only, perhaps too imprecisely.–P3

Concerning the voice interaction itself, P9, as an instructor himself, immediately felt strongly reminded of a training situation by the “tone of voice, by the way he spoke to the person.” But to be able to speak of a “coach,” in contrast to humans, the participants (8/14) miss traits like empathy, truly open questions, and spontaneous dialogues. Moreover, most of the participants (12/14) emphasized the explorative character of coaching, such as letting the users make mistakes and guiding them to find their own solutions.

A coach helps, so I think what makes a coach is he helps you to develop or discover something yourself. He does not prescribe it but helps you to develop or discover the solution. He does not give you the solution, but he helps you to create it.–P9

5.3 Embodied human-food (waste) interaction

To increase the value of food and develop passion, the majority of participants (11/14) point out that people must engage with food and relearn the natural characteristics of food. Hence, some of the participants highlight that the interaction facilitates shifting the attention of the users to the food itself.

Yes, well, I just thought that it would be better now more practical than a book with my fish hands, or in the iPad, cell phone with my fish finger must search and the eyes are not for “seeing”, etc. and that I also do not have to look anywhere, on a video, but that I can look at the fish all the time. So that I perceive auditorily, so to speak.–P6

Still, the evaluation of food quality or safety without any experience is a challenge. In general, freshness is, according to P3 and P4, an elastic term. P6 mentions that the online requests she receives reveal insecurities, for example about whether to use one or two tablespoons in a recipe, which is rarely decisive. However, generally deciding on the right ingredients, differentiating between high quality and edibility, and recognizing the small differences that improve the taste require experience.

Not fresh anymore means you have to put a little more love into the product when cooking, so that it still tastes good afterward. But inedible and, people are afraid of diseases. You have to know how to avoid it.–P14

Concerning the leftovers of a product, for example, potato peels can be baked to ashes in the oven and mixed into mashed potatoes to intensify flavor (P4). Such stimuli often come as external impulses and need to encourage users to try them. According to the participants (10/14), the successful application of novel information frequently leads to new confidence, both in oneself and in the information itself.

But also that a lot of people don’t know that they can also eat the stem of broccoli when they cut it into small pieces and cook it. (...) That many people simply don’t know what they can use from the vegetable or plant. Cooking experience definitely plays into that. So I’ve read up on it, but from experience I’ve tried it and found it to be good. I would not have had the idea to eat the stem on my own, because you are used to eating only the florets. And that I have read and tried it somewhere. You have to take that step, yes.–P12

At the same time, the direct use of information and the associated experience contributes to engagement and relationship building with different foods. Therefore, people have to learn quality control slowly, step by step, for example, what a good or bad fish means (P4). In the long term, information and experience transform into knowledge and help people to live largely independently of technological systems.

I think the system is good, if the system is ultimately used to learn to be able to do without the system at some point. If you make yourself increasingly dependent on the system, you might not even know what you can eat sometime in 10 years. Therefore, I think it is a support to find back to your own senses.–P1

In this respect, there could be even more self-reflection promoted. Therefore, participants (9/14) claim it needs frequent and situated opportunities for novel topics and actions. Even before the presentation of the prototype, the participating experts (10/14) agreed that experience is gained through experimentation and that people need to be sensitized or confronted with it over a long period of time, in the best case, in comparison to the last experience.

By encouraging and motivating him [the user] to reflect holistically. To relive the experience. To repeat it more often. However, in the end, it is enough to re-ask about the situation. To ask yourself again “Okay, how slimy was the fish now compared to my last fish? Remembering that.”–P9

Participants (4/14) note that the quality of teaching and training depends on the user type and the way of teaching by the assistant. Therefore, the voice assistant needs the ability to learn and remember personal information about users like allergies or last requests. This gives the chance to track progress and build on previously acquired knowledge (P9). Moreover, the perceived role and function of the assistant, whether as a strict instructor or a friendly family member or coach, might impact the learning effect (P4, P14).

I think such an Alexa can sound very smart-ass but maybe there is another way. And then it is pleasant again. Or if people just want to be more factual, or fast and effective, the learning types are very different, how someone understands something, whether you need more repetition or not, and if the tool can do that. If the tool can do that, then I think it is already a great opportunity.–P10

6 Discussion and implications

Food waste was addressed by various HCI prototypes [12]. While current practice-theoretical research [3, 6] highlights the importance of sustainable in-the-moment choices, with a special focus on food quality and safety as well as the value of food, prior HCI research primarily addressed food waste as a motivational issue [12]. As we used Research through Design [30,31,32], we contribute to a thorough understanding of the design process of interactive agents for a learning environment [66] and outline a potential co-performance by our conceptual design [27]. Accordingly, our design approach accounts for those decisive moments that are, according to Hebrok et al. [6], entangled between embodied and institutionalized knowledge, e.g., labeled dates. Hence, we reflect with experienced consumers on the potential impact, trust, and responsibility, as well as the necessary artifact properties in the decision-making process contributing to sustainable food practices. Reviewing our voice agent and the respective design case study, we want to discuss our research along the Sense-Think-Act Cycle by Pfeifer and Scheier [83] as a guiding design principle. Usually, this model is used to describe and analyze machine intelligence in human terms. In our case, however, we argue that true intelligence and agency arise from and within the collaboration between humans and the machine. Hence, it sensitizes us to possible shortcomings of competencies and capabilities arising in co-performance, where consumers act as sensors that need guidance and support by the agent.

6.1 SENSE: interact with food (waste)

According to Bertran et al. [16], the increased use of automation and sensors leads to an increased agency of the technology rather than encouraging human-food interaction and even might compromise this interaction. Here, our design provides an alternative that encourages more interaction with the material at the border between food and waste. Although we have no insight into an actual food waste reduction, our evaluation shows how the design is perceived to increase the value of food and to encourage conscious embodied interaction with food, which directly addresses current practice-theoretical findings [6]. In particular, more experienced consumers agreed on the importance of first-hand experience and the empowerment of one’s own senses. Hence, a useful and enabling design does not necessarily need more or the newest sensors (e.g., to prove edibility), but leaves room for conscious and independent action. Here our research operationalizes the call of Hassenzahl et al. [17] for more conscious interaction to enhance the experience of and engagement in the practice.

From a co-performance perspective, an agent without sensing capabilities relies on and engages human sense-making. Therefore, the interaction itself reconnects humans and food, which bears broader implications for Human-Food Interaction in the sense to use the agency and limitations of the technology to encourage more agency on the human side. Regarding this, the evaluation highlights the importance of not being patronized by the agent and emphasizes consumers being in control of decisions and sense-making. Complementing visuals or sensors that were discussed to increase reassurance and minimize the risk of a wrong decision could even impede the sensory training and increase technology dependency. Here, a field of tension between human reliance on technology, bodily reactions, and safety (or efficiency in other contexts) emerges.

In conclusion, the design should encourage humans to use and trust their own senses to build the embodied knowledge they need. In future designs, voice agents should be considered as a means to expand knowledge beyond food waste and to motivate humans to appreciate and engage in food interaction. This could be done by incorporating additional information, such as regionality or seasonality, that serves the perception of food value [6].

6.2 THINK: from machine knowledge to human thinking

From a thinking perspective, it is acknowledged that consumers quickly get confused when trying to rationalize their bodily reactions to the material, which results in the use of institutionalized knowledge [6], in our case the reliance on shelf life and the disposal of food. In this regard, the evaluation of our prototype shows how Fischer Fritz addresses this problem by providing a step-by-step approach and reassuring the human in their actions. In this sense, the agent takes over part of the thinking, while leaving room for “sense” on the human side. This distribution of tasks was perceived as increasing confidence in decision-making as long as the agent is a trustworthy entity.

This separation of human sense-making and machine thinking, however, requires a common language. In this regard, the pre-study already sensitized us to the language used in the specific task of assessing fish, which relies on metaphors and figurative language. Our evaluation revealed that the challenge is to balance short commands and carefully verbalized instructions in order to move the co-performance forward without confusing users or trying their patience. This shows how mutual reliance and a common language in a task allow for collaboration beyond simple tasks [61]. A further aspect of collaboration in thinking is the perception of the agent and its capabilities. Although the agent was compared to humans regarding senses, it was not expected to act or think in a human-like way. Rather, it fulfilled its purpose by being informative and providing traceable explanations and guidance. In this respect, the machine does not have to mimic human behavior but can complement the human on its own terms [27, 67, 68].

As the participants noted, the agent is ultimately a learning tool which, after temporarily taking over the thinking, needs to provide the means to teach consumers and finally leave them to think for themselves about their bodily reactions. Active support for reflection and demonstration of the practices contributes significantly to the transformation from institutionalized knowledge to embodied knowledge, as our participants noted when reflecting on the prototyping approach. This is in line with the claims of purposive learning and active participation in the practice [10, 43]. Although the machine might take over some thinking, learning always relies on the promotion of self-reflection and the negotiation of (embodied) knowledge, which depends on successful human decision-making leading to the experience of self-competence and autonomy.

To further leverage the role of a coach, the voice agent has to ask more open questions, allow for mistakes, and permit more exploration. For the dialog, this means allowing for intelligent fallbacks that do not feel like dead-end conversations but are enlightening and encouraging [67, 68].

6.3 ACT: side-by-side with an agent

Thus far, voice assistants have not succeeded in engaging humans in directed co-performance or conversations [68, 76]. In our approach, the agent and the human have to collaborate and use their unique capabilities to accomplish their goals in practice [10]. The human naturally embodies the use of senses but needs the agent to guide the procedure and classify sensory interpretations. Hence, they complement each other in their distributed capabilities. Usually in human-machine interaction, users act through the machine by direct commands [61]. In our case, both act upon the real world through talking and listening and by working side-by-side. Interestingly, it is even the agent who leads the interaction of the human with the food. The agent is responsible for communicating the information comprehensibly and in a way that is adjusted to the human's capabilities. Yet, the human can decide at any time to end the interaction or simply not to trust the advice. In comparison to full automation, the human is actively involved in the decision-making process and can control it within reasonable limits. By assigning power to the voice assistant through knowledge and the ability to communicate in human terms, it acts as an equal collaboration partner next to the human [27]. Kuijer et al. [27] argue against using human-likeness as an indicator to assess machines. In our evaluation, we observed that the consumers did not do so either. Instead, Fischer Fritz met their technological expectations and was judged by its technological capabilities. Future design research should therefore focus on how to adapt human features, such as showing empathy through a specific set of words and sounds, and transform them into technological terms.

As stressed by Gherardi and Nicolini [19], knowledge means having the “competence-to-act,” which goes along with engaging in action [17]. To develop embodied knowledge, consumers have to act on their received knowledge, gain experience, and memorize the differences in sensory impressions. Thereby, the voice agent acts as a communicator and offers the human the opportunity to link distinct actions with applied knowledge. Thus far, domestic co-performance is often discussed in terms of efficient automation and the elimination of human decision-making [77,78,79]. Instead, we have to analyze the long-term gains and losses when decision-making is completely handed over to an agent. Through our case study, we could observe different levels of consequences that arise when the ability to control food quality is lacking. By experiencing the competence to act with every interaction, similar to the mastery that follows an apprenticeship, the human might appropriate the capabilities of the agent and transform deliberate actions into practice. Thereby, our design is not limited to the scenario of preventing food waste but is suited to enhancing agency in any craftsmanship with materials at hand [59]. Consequently, the human is empowered and enabled to act alone at some point, but still has the reassurance of being able to ask for support in any case of uncertainty. We did not follow an approach dedicated to educational goals but rather leveraged the sense of urgency, as users rarely plan deliberately to tackle the problem of food waste. In this respect, however, the role of the agent can be adjusted, either to support even quicker decisions or checks, or to anchor knowledge and foster even more learning through additional information.

Finally, both the agent and the human, complementing each other through their capabilities, need to engage in collaboration to act upon the world and accomplish their goals. As with thinking, the repeated offering and mentoring of actions helps humans to acquire the competence to act on their own and to establish new practices.

7 Limitations

Our study has several limitations. We conducted neither a formative usability study nor a study in the wild to investigate long-term behavior change or the effectiveness of the decision-making support, or to adjust further critical speech-related form factors to ensure smooth interaction for a majority of users. Our aim was to explore the design space by Research through Design with a focus on leveraging the opportunities that come with voice interaction and to showcase the design of interactive agents to support domestic practices. Future work needs to evaluate the long-term effects of interaction and appropriation regarding the impact on food waste prevention. Although we can neither elaborate on the possible effectiveness of this intervention in reducing the footprint nor claim that it will impact sustainability on a large scale, we followed the call by Hebrok et al. [6] for more situated consumer decision support along the food lifecycle and offered an alternative approach to persuasive technology design. Furthermore, the lack of cultural comparison is clearly a limitation of our study, which is grounded in Western consumption patterns. Future design studies should address and include culturally related notions of edibility and freshness.

8 Conclusion

The present case study proposes the design of a voice assistant that supports the negotiation and transformation of institutionalized knowledge into embodied knowledge to prevent food waste. Our prototype Fischer Fritz offers humans a domestic co-performance to decrease personal insecurity and gain the competence to act. Empowering human sense-making and decision-making leads to an engaging experience and action without compromising the food relationship. Consequently, this work contributes, through its detailed design process, to design knowledge as well as to considerations on co-performative sensing, thinking, and acting between conversational agents and humans. Future alternative case studies might strengthen the understanding of design practices of interactive agents and learning environments.