Robot's Gendering Trouble: A Scoping Review of Gendering Humanoid Robots and its Effects on HRI

The discussion around the problematic practice of gendering humanoid robots has risen to the foreground in the last few years. To lay the basis for a thorough understanding of how robot's "gender" has been understood within the Human-Robot Interaction (HRI) community (i.e., how it has been manipulated, in which contexts, and which effects it has had on people's perceptions of and interactions with robots), we performed a scoping review of the literature. We identified 553 papers relevant to our review, retrieved from 5 different databases. The final sample of reviewed papers included 35 papers written between 2005 and 2021, which involved a total of 3902 participants. In this article, we thoroughly summarize these papers by reporting information about their objectives and assumptions on gender (i.e., definitions and reasons to manipulate gender), their manipulation of robot's "gender" (i.e., gender cues and manipulation checks), their experimental designs (e.g., demographics of participants, employed robots), and their results (i.e., main and interaction effects). The review reveals that robot's "gender" does not affect constructs crucial for HRI, such as likability and acceptance, but rather has its strongest effect on stereotyping. We leverage our different epistemological backgrounds in Social Robotics and Gender Studies to provide a comprehensive interdisciplinary perspective on the results of the review and suggest ways to move forward in the field of HRI.


Introduction
Gender studies emerged as an academic discipline in the 1980s to study and understand the nuances of how gender is imbued in the power structures of society, as well as how gender materializes in the design of objects, spaces, and knowledge practices [41]. Gendered design is common in machines and objects [19], for instance, in medical devices [18,29] as well as children's toys [24,63], and is oftentimes deemed necessary to accommodate individual differences and users' preferences [45]. More often than not, however, gendered design is redundant and conducive to stereotypes and binary perspectives on gender (i.e., the understanding that gender includes only two discrete and opposite categories, female and male [12]) [27]. The inherent binarism of gender has been heavily contested with the emergence of feminist and queer theory for its normative power and exclusionary potential [12,41]. Gendered robots are a particularly interesting case of gendered design, as their "gender" often derives from their humanoid shape and is thus deeply entangled with the human body [57,56]. There is still little knowledge about what exactly it means to "gender" a humanoid robot and how the gendering of robots impacts users' perception of and interaction with them. In this scoping review, we are particularly interested in the emergence of the practice of gendering humanoid robots in Human-Robot Interaction (HRI) research, in order to assess its feasibility and consequences and to identify ways to move forward.

A Perspective from Gender Studies
"What is gender?" seems to be the pressing question with regard to gendered robots, a question which presupposes the idea that gender is a concrete thing. In feminist theory and the academic field of Gender Studies, the object of study is assumed to be "gender" (see [11,41]), yet the interest does not lie in identifying the essence of gender as a fixed category but rather in recognizing the transformative value of gender as a system of thought and a practice. Once gender is no longer understood as an inherent characteristic or physical attribute of a body but instead as an organizing principle embedded in social structures, behavior, design, and norms, it can be seen as a lens that organizes human life and the knowledge about human bodies. Thus, assessing the effect of "gender" in robots through the theoretical lens of Gender Studies shifts the emphasis from gender as a fixed property of robot bodies to the investigation of the gendering practices of robot development and testing.
arXiv:2207.01130v2 [cs.RO] 27 Apr 2023
Historically, the distinction between sex and gender (or lack thereof) has been influential for acknowledging the socio-culturally constructed aspects of being a woman or being a man in wider society and the roles attached to them. The assumption that gender derives from sex strengthens the idea of an essential difference between men and women [41,11]. Prominent feminist philosopher Butler [12] challenged the dichotomy of sex and gender as false, arguing that sex is just as socially constructed as gender. Through this argument, Butler emphasized the performativity of gender (i.e., a repetitive, ritualized process of talking about and doing gender as a social act [10]) and its use as a principle to organize human bodies and knowledge. Moving from thinking of gender as an attribute ("having a sex/gender") or an essence ("being a sex/gender") to thinking of it as an organizing principle allows a theoretical shift from the analysis of gender as a social marker to the analysis of gendering as a process (how "gender" is done) [12]. Beginning to trouble what "gender" means for robot design, and attempting to focus on how "gender" is done by roboticists, is at the core of this review.
In most cases, gendering is a process of dividing into two categories and positioning them hierarchically in opposition to one another [38,39]. If an object is conceived as masculine, it is associated with concepts opposed to femininity. This is not necessarily problematic in itself, but it becomes problematic when designers are oblivious to the hierarchy imbued in these gendered categorizations and to the resulting social consequences of certain design choices [1]. Gendering humanoid robots means mapping them onto the gendering of human bodies, with its hierarchical positioning and other intersecting structures of power [17]. This entails that the design of this technology is inherently political and likely to reinforce power structures and hierarchies of domination [80,2,23,17]. In addition, the under-representation of women and other marginalized identities in the development of technology contributes to these power imbalances (see [15,17]).
Feminist theory urges a shift from a rather uncritical engagement with technology design and testing to acknowledging the transformative and relational potential of technology. If gender continues to be treated uncritically in relation to technology, the danger is, as Balsamo puts it, that "new technologies will be used primarily to tell old stories - stories that reproduce, in high-tech guise, traditional narratives about the gendered, race-marked body" [2]. Through a critical engagement, feminist theory developed modes of inquiry into gendered knowledges and practices and intersectional structures of power [41,17]. A deeper engagement with ideas and practices of gendering robots from Feminist and Gender Studies scholarship would likely exceed the scope of this literature review. With this section, we wanted to introduce core ideas from Gender Studies that could illuminate the results of this review and provide the HRI scholarship with a different, more complex understanding of the concept of "gender." We acknowledge the many epistemological differences between the two fields of study, but nevertheless hope to inspire an interdisciplinary cross-pollination that could enrich the understanding of what is at stake with regard to the gendering of robots.

Gender in Robotics
Currently, there is still little knowledge about the effects of gendering robots and what exactly it entails to "gender" a robot. This raises the question of whether "gender" is a useful or a harmful design feature in humanoid robots. "Gender" as a design variable and structuring element in robotics is a relatively emergent field of inquiry with only a few theoretical engagements. The need to address gendering practices in robotics developed out of critical analyses of the prevalent bias towards high-pitched (i.e., female-sounding) voices in voice assistants on the market, which have been criticized for promoting stereotypes in gendered job associations and normalizing abuse against women [82,1,37]. With the increasing use of robotic technologies in social settings, aspects such as the gendered voice and embodiment of a robot inevitably call for critical examination. Accordingly, testing for preferences for gendered robots is receiving increased attention.
Within the robotics community, only a few scholars have contributed to the theoretical discussion about the role of gender and called for a more elaborate and sensitive investigation. According to Nomura [48], the influence of gender markers in interactions between humans immediately suggests that gender cues are also relevant in interactions with robots. However, Nomura highlights that the context and quality of the interaction might matter more than gender itself in shaping people's perception of the interaction with the robot. Most importantly, the need for gendering and its ethical implications (i.e., confirming gender role stereotypes) are at the heart of Nomura's critique. He emphasizes the need for a deeper discussion on implementing gendered features in robots. In line with Nomura, Alesich and Rigby [1] argue that there is still a lack of knowledge about the effects of gendered robot design. Roboticists are often not aware of how gender is interwoven with human bodies and how it organizes society and values. The focus on technical problem solving and the fast pace of testing and production in research and industry leave little room for the ethical consideration of the social consequences that implementing "gender" in robot design would require [1]. Thus, critically engaging with gendering practices in HRI is highly recommended.
Søraa [73] introduces the idea of mechanical genders for robots, which mirror the physical and social aspects of human gender as understood in psychology (which commonly distinguishes between biological, social, and psychological gender). Søraa's theorization acknowledges that robot "gender" is invented by modeling it after human gender, which it mirrors, while preserving the difference between the two. Most importantly, Søraa [73] highlights the bidirectional nature of gendering and argues that humanoid robots cannot be "genderless." Indeed, roboticists' and users' understandings of and ideas about humans as a category are inevitably influenced by a gendered perspective and are likely to flow into the design or perception of humanoid robots. This suggests that gendering might not be an entirely controllable process.
The need and interest to address gendering practices in robotics are evident. Interdisciplinary work is still lacking in this regard, and this review attempts an interdisciplinary overview and analysis of robot's "gender" that integrates the different epistemological traditions of Social Robotics and Gender Studies to address whether imbuing robots with gender cues is a viable and ethical design direction for HRI.

Positionality and Terminology
In approaching this review, we want to be transparent about our personal positioning and critical approach towards the concept of gender and its use in experiments. As women, we are personally affected by the potential stereotyping effects of gendered robot design, and so we have a stake in gaining a nuanced understanding of the topic and in finding a productive, yet sensitive, way forward in future research practices. This does not, however, cloud our ability to assess and reason about the advantages and disadvantages of gendering practices. Since many of the reviewed studies referred to gendered robots as female and male, we kept the same terms in our writing. This is primarily a way to avoid confusion and to stay consistent with the terminology used in these papers. However, in this article, we try to shift the thinking towards the process of "gendering" a robot and the "genderedness" of a robot, both described by Perugia et al. [56]. According to Perugia et al. [56], gendering a robot is a two-step process of gender encoding, in which designers imbue robots with gendered cues, and gender decoding, in which users attribute "gender" to robots. Gender encoding is an optional step, which can be avoided by resorting to robots with minimal anthropomorphic cues, or minimized by not adding gender cues to already gendered robot embodiments. Gender decoding, instead, seems to be a spontaneous process. Indeed, it occurs both when designers imbue robots with gender cues and when they do not, as shown, among others, by Marchetti-Bowick in their work on the attribution of gender to the Roomba vacuum cleaner [43]. The present scoping review focuses on the encoding phase of the gendering process, how it is performed by the HRI scholarship, and the effect it has on HRI. We touch upon gender decoding only when discussing the robots' manipulation checks.
In performing this review, we adopt the epistemological perspective of Social Robotics, both in terms of methods and in terms of object of inquiry (i.e., the experimental manipulation of robot's genderedness). Taking a more experimental and techno-centric approach entails consistently simplifying the discussion of gender relative to the complexity outlined in this Introduction. We integrate the lens of Feminist and Gender Studies in the discussion to outline and highlight the potential implications of current HRI research practices. In the following sections, we describe the core objectives and research questions of our scoping review (see Section 2), detail the method we used to retrieve the papers included in the review (see Section 3), report the findings of the reviewed papers (see Section 4), and critically examine these findings in our discussion with the aim of deriving guidelines on how to move forward in the field of HRI (see Section 6).

Objectives & Research Questions
The goal of this scoping review is to describe how the HRI scholarship has understood and manipulated "gender" in humanoid robots, summarize the effects of robot's genderedness on the perception of and interaction with humanoid robots, and identify best practices to manipulate a robot's genderedness from a feminist perspective. In parallel with these main objectives, this scoping review also aims to appraise the reasons for manipulating the robot's genderedness and the validity of such manipulations. We attempt to answer the following research questions (RQ):
- RQ1. How has the robot's genderedness been manipulated by the HRI scholarship?
- RQ2. What role does the robot's genderedness play in the perception of and interaction with humanoid robots?

Data Collection & Eligibility Criteria
In order to identify the papers to include in this scoping review, we performed an electronic search in the following databases: IEEE Xplore, Scopus, ISI Web of Science (WoS), PsycINFO, and Science Direct. We used the following three variations of the same search string; the variation depended on the number of wildcards (*) that each database accepted:
1. "robot gender*" OR "gender of robot*" OR "gender of the robot*" OR "gender* robot*" OR "male* robot" OR "female* robot" OR ("gender cue*" AND "robot*")
2. "robot gender*" OR "gender of robot" OR "gender of the robot" OR "gender* robot*" OR "male* robot" OR "female* robot" OR ("gender cue*" AND "robot*")
3. "robot gender" OR "gender of robot" OR "gender of the robot" OR "gender robot" OR "male robot" OR "female robot" OR ("gender cue" AND "robot")
The papers obtained from the electronic search were imported into a shared spreadsheet and screened against the following eligibility criteria: (i) the papers were written in English, (ii) included the manipulation of at least two "genders" of the robot (e.g., studies including only female robots were excluded), (iii) manipulated the robot's genderedness through the same robotic platform (e.g., studies manipulating two "genders" but with different robotic platforms were excluded), (iv) focused on physical humanoid robots or virtual instantiations of humanoid robots, (v) did not focus on sex robots, and (vi) reported experimental results. These inclusion and exclusion criteria were set so that we could easily identify the cues that the HRI scholarship used to modify the robot's genderedness. Including papers that focused on only one "gender" or manipulated genderedness with different robotic platforms would not have allowed us to isolate these cues so easily, as other factors, such as differences in the robots' embodiments, materials, body parts, and humanlikeness, could have influenced the researchers' choice of cues. In the next section, we describe the three steps of the selection pipeline in more detail.
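The six eligibility criteria act as a conjunction: a paper is retained only if it satisfies all of them at once. The Python sketch below expresses that rule as a predicate over a screening record. The `PaperRecord` fields and the `is_eligible` function are hypothetical names chosen for illustration, and the example records are invented; the screening itself was carried out manually in a shared spreadsheet, not with code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaperRecord:
    """Hypothetical screening record with one field per eligibility criterion."""
    language: str
    robot_genders: frozenset[str]  # "genders" manipulated, e.g. {"female", "male"}
    platforms: frozenset[str]      # robotic platforms used for the manipulation
    humanoid: bool                 # physical or virtual humanoid robot
    sex_robot: bool
    reports_experiment: bool


def is_eligible(paper: PaperRecord) -> bool:
    """Return True only if criteria (i)-(vi) all hold."""
    return (
        paper.language == "English"         # (i) written in English
        and len(paper.robot_genders) >= 2   # (ii) at least two "genders"
        and len(paper.platforms) == 1       # (iii) same robotic platform
        and paper.humanoid                  # (iv) humanoid robot
        and not paper.sex_robot             # (v) not about sex robots
        and paper.reports_experiment        # (vi) reports experimental results
    )


# Invented example records, not taken from the reviewed corpus.
kept = PaperRecord("English", frozenset({"female", "male"}), frozenset({"NAO"}),
                   True, False, True)
dropped = PaperRecord("English", frozenset({"female", "male"}),
                      frozenset({"NAO", "Pepper"}), True, False, True)
print(is_eligible(kept), is_eligible(dropped))  # True False
```

The second record fails only on criterion (iii): it manipulates two "genders", but across two platforms, so differences in embodiment could confound the gender cues.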

Selection Pipeline
From the initial batch of 553 papers, we removed duplicate results, front covers, and tables of contents. This process left us with 470 papers (see Figure 1 for the diagram of the selection pipeline). We read the abstracts of all 470 papers and excluded 253 papers that were not in English (N = 2), did not present an experimental study (e.g., theoretical papers; N = 19), or were off-topic (N = 232). This process resulted in 217 papers.
In a second exclusion round, we skimmed through the papers' content and excluded 169 papers that did not feature any experiment or robot (N = 21), did not include a humanoid robot (N = 15), did not manipulate the genderedness of the robot or manipulated it using multiple robotic platforms (N = 129), or focused on just one "gender" (N = 4). After this step, we were left with 48 papers.
These 48 papers were divided between the authors and read in their entirety. GP read 29 of the papers, DL 17. Of this batch of papers, 13 papers were excluded because they were short versions of a longer journal paper already featured in our list (N = 4), did not employ a robot (N = 7), employed a robot that was not humanoid (N = 1), or did not have a full text available online (N = 1). As a result of the selection pipeline, we included 35 papers written between 2005 and 2021 in our scoping review. Of these 35 papers, 7 were journal papers, 17 were full papers included in conference proceedings, 10 were short papers included in conference proceedings, and 1 was a workshop paper. The selection process is described in Figure 1. The last search was performed in May 2021.
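The counts reported for the selection pipeline can be cross-checked with a short tally. The Python sketch below encodes the exclusion counts stated in the text for each screening round and subtracts them stage by stage; the round and reason labels are paraphrased for illustration, and `remaining_after_each_round` is a hypothetical helper name.

```python
# Exclusion counts per screening round, as reported in the text.
ROUNDS = [
    ("abstracts", {"not in English": 2, "no experimental study": 19,
                   "off-topic": 232}),
    ("skimming", {"no experiment or robot": 21, "no humanoid robot": 15,
                  "no manipulation or multiple platforms": 129,
                  "only one gender": 4}),
    ("full text", {"short version of a journal paper": 4, "no robot": 7,
                   "non-humanoid robot": 1, "no full text online": 1}),
]


def remaining_after_each_round(start: int) -> list[int]:
    """Papers left after each round, starting from the de-duplicated set."""
    counts = [start]
    for _name, reasons in ROUNDS:
        counts.append(counts[-1] - sum(reasons.values()))
    return counts


print(remaining_after_each_round(470))  # [470, 217, 48, 35]

# The final sample broken down by publication type should also add up to 35.
by_type = {"journal": 7, "full conference": 17, "short conference": 10, "workshop": 1}
print(sum(by_type.values()))  # 35
```

The per-round totals (253, 169, and 13) match the numbers of excluded papers given in the text, and the breakdown by publication type is consistent with the final sample of 35 papers.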

Coding & Information Extraction
Once we had obtained the final list of 35 papers to include in our scoping review, we performed a thorough coding and information extraction. For each paper, we recorded:
1. General information: the names of the authors, the year of publication of the paper, and the type of paper (i.e., conference or journal, short or full paper; see Section 3.1.1).
2. Experimental information: the number of participants in the study, their age and gender, the robot used in the study, the type of embodiment of the robot (e.g., picture, video, physical), the independent variables (beyond the robot's genderedness), the dependent variables, and the type of task used in the study (see Tables 1 and 4, and Section 4.2).
3. Gender-related information: definitions of gender, reasons to manipulate the robot's genderedness in the first place, "genders" manipulated (e.g., female, male, and gender-neutral robots), cues used to manipulate the robots' genderedness, presence of a manipulation check, metrics used to perform the manipulation check, and rationale behind the choice of the cues (see Table 2, and Sections 4.3, 4.4, and 4.5).
4. Results: main effects of the robot's genderedness and interaction effects of the robot's genderedness and other independent variables on the dependent variables (see Table 4 and Section 4.6).
Tables 1, 2, and 4 report part of the results of the coding and information extraction process, as well as the summaries of all 35 papers.The rest of the extracted information is presented in the Results section.

Participants
Overall, the studies reported in the papers included 3902 participants (see Table 1). The participants in the studies were more or less equally distributed between female (49%) and male (47%) gender (see Figure 2 for an overview). Interestingly, only 1% of the participants fell into the category other/undisclosed, and the gender of 3% of the participants was not specified. None of the reviewed studies reported the presence of non-binary participants or participants with gender identities beyond the binary. In terms of age, 60% of the papers featured a sample of young adults, presumably university students (aged between 18 and 30 years), 20% a sample of adults (older than 30), and 20% a sample of children (younger than 18).

Robots
In terms of robot choice, NAO was the most used robot (37% of the papers, see Table 1), followed by Furhat and Flobi (featured in 9% of the papers each); Meka M1, Reeti, Willow Garage PR2, and Robovie (featured in 6% of the papers each); and, finally, Alpha 1 Pro, Pepper, Socibot, and Nexi (featured in 3% of the papers each). Four papers did not specify the robotic platform used in the studies (11% of the papers). In 65.7% of the included papers, the robot was presented to participants through a physical embodiment, in 25.7% through a video (although [14] used a video recording of pictures), and in 8.6% through images.

Tasks and Activities
In this section, we report the tasks participants were asked to perform in the reviewed studies, as well as the activities the gendered robots were involved in.
In static image studies (cf. pictures in Table 1), participants were asked to carefully look at a picture of the robot and rate their perception of it on the relevant dependent variables [3,4,21]. Similarly, in video-recording studies (cf. video in Table 1), participants were asked to watch a short video of the robot and fill out a questionnaire. Some of the videos featured the robot speaking to the camera (e.g., explaining a topic) [8,20,22,40,60,77]. Others showed an actual interaction [28] or described it through a series of vignettes [14,34]. In studies including a physical robot (cf. physical in Table 1), participants observed a co-present physical robot performing a (set of) behavior(s) or explaining a topic [13,42,50,49,53,54,69,72,79,84] or directly interacted with the robot [26,31,32,33,58,61,62,64,67,68,70,78,86]. They rated their perceptions of the robot and/or the interaction immediately afterwards.
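The embodiment percentages reported in the Robots section can be recovered from the citation lists in the paragraph above, since each reviewed paper appears in exactly one of the picture, video, or physical groups. The Python sketch below copies those reference numbers and computes each group's share of the 35 papers; the dictionary layout and the `shares` helper are illustrative choices, not part of the review's method.

```python
# Reference numbers per presentation mode, copied from the text.
PRESENTATION = {
    "picture": [3, 4, 21],
    "video": [8, 20, 22, 40, 60, 77, 28, 14, 34],
    "physical": [13, 42, 50, 49, 53, 54, 69, 72, 79, 84,
                 26, 31, 32, 33, 58, 61, 62, 64, 67, 68, 70, 78, 86],
}


def shares(groups: dict[str, list[int]]) -> dict[str, float]:
    """Percentage of papers in each group, rounded to one decimal place."""
    total = sum(len(refs) for refs in groups.values())
    return {mode: round(100 * len(refs) / total, 1) for mode, refs in groups.items()}


all_refs = [ref for refs in PRESENTATION.values() for ref in refs]
assert len(all_refs) == len(set(all_refs)) == 35  # no paper is counted twice
print(shares(PRESENTATION))  # {'picture': 8.6, 'video': 25.7, 'physical': 65.7}
```

The 3, 9, and 23 papers in the three groups correspond to the 8.6%, 25.7%, and 65.7% reported for images, videos, and physical embodiments.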
In the following, we briefly describe the content of the activities in the reviewed studies. In doing so, we focus only on those studies featuring a video-recorded or co-present demo or a video-recorded or first-person interaction, and we leave out those in which the robot is used as a stimulus, for instance, to display an interactive behavior (e.g., facial expressions). We made this choice to ensure that we present only those interactions that had a more or less pronounced social context.
In the demo studies, the robot introduced a topic to a co-present audience or to an audience watching asynchronously. Siegel et al. [72] used the robot to provide a brief explanation of its hardware, software, and technical abilities, and to ask for donations. Makenova et al. [42] and You and Lin [84] replicated Siegel et al.'s study, using the robot to introduce a research project and ask for donations [42] or to give an overview of the research taking place in the lab and ask for donations [84]. Nomura and Kinoshita [49] employed the robot to describe the construction of a commercial building, while Powers and Kiesler [60] and Thellman et al. [79] used it to give health advice to participants [60] or explain why humans should not be afraid of robots [79]. Finally, Sandygulova and O'Hare [69] and Steinhaeusser et al. [77] employed the robot to tell a story.
In presenting the interaction studies, we first introduce the video-recorded studies, in which the interaction was only observed by the participants, and then the first-person interaction studies, in which the participants themselves took part in the interaction. Three papers asked participants to observe or read about an interaction: Chita-Tegmark et al. [14], Jackson et al. [28], and Law et al. [34]. All three papers included very complex interactions, which would have been difficult to carry out in a co-present human-robot interaction study. Chita-Tegmark et al. [14] and Law et al. [34] used the exact same interaction in their studies and presented it through a series of vignettes in a video. The interaction takes place in an office setting between three characters: a supervisor and two subordinates. In the interaction, the supervisor reproaches one of the subordinates for a mistake and then leaves the room. The two subordinates, who are left in the room, discuss the situation, and the subordinate who was not reproached (a human or a robot, depending on the condition) reacts to the one who made the mistake in either a friendly or an unfriendly way. Jackson et al. [28] presented the interaction through a video of a human-robot interaction. In the video, the robot explains how to play the game Battleship and then supervises two humans while they play. At some point during the game, one of the humans receives a call and leaves the room. The human left in the room presents the robot with a morally problematic request, which the robot rejects in different ways.
Thirteen papers featured an actual first-person interaction. In Ghazali et al. [26], participants played a trust game inspired by the investment game, in which they prepared a drink for an alien with the help of the robot. In Jung et al. [31], they interacted with the robot in a music listening scenario, whereas in Pfeifer and Lugrin [58], they learned how to develop a website in HTML together with the robot. In Kraus et al. [32], Powers et al. [61], and Rea et al. [62], participants engaged in a conversation with the robot. In these studies, the robot acted as a dialogue partner in a taxi ordering or baby healthcare scenario [32], engaged participants in a face-to-face conversation on the topic of first dates [61], or involved them in a casual conversation around daily topics (e.g., hobbies, work, or school) [62]. Kuchenbrandt et al. [33] and Reich-Stiebert et al. [64] involved participants in more structured tasks. The former [33] asked participants to sort items into the compartments of a sewing box or a tool box under the instruction of the robot. The latter [64] asked them to solve a set of cognitive tasks (i.e., a memory, an auditory, and a visual task) focusing on stereotypically female or stereotypically male academic fields. Following the line of studies involving participants in stereotypically female or male tasks, Tay et al. [78] engaged participants in either a healthcare scenario in which, among other things, the robot measured their body temperature, or a safety scenario in which it, for instance, enlisted their help in resolving an intrusion in the research space.

Table 1 General and demographic information about the studies included in the scoping review (F = female, M = male, dns = did not specify their gender). The terms used for participants' gender in the tables are derived from the papers. Studies [20] and [22] refer to the same study but report the results of different dependent variables. * = no manipulation of gender in this study; ≈ = calculated from partial means (when only group means are reported); (?) = it is not clear from the paper whether participants interacted with a physical robot.

Table 2 (excerpt)
Eyssel & Hegel (2012) [21]: genders M, F; cues: facial features, hairstyle; manipulation check: yes; participants rated the extent to which the robot appeared "rather male" vs. "rather female" using a 7-point Likert scale.
Eyssel et al. (2012b) [22]: genders M, F; cue: voice; manipulation check: yes; participants indicated whether the voice sounded rather female (1) or male (7) using a 7-point Likert scale.
[31]: cues: no cue, color; manipulation check: gender rated on a semantic differential item ("Would you say that the robot was more like a male or like a female?"; 1 = male, 7 = female); the robot was perceived as more male than the male robot.
*1 The statistical significance in [20] is inferred from [22], which is based on the same study, but is not directly reported.
*2 Thellman et al. [79] do not report the results of the statistical analyses related to the manipulation check.
The studies by Sandygulova et al. [67], Sandygulova and O'Hare [68,70], and Zhumabekova et al. [86] focused on interactions between children and robots. In [67], the children were asked to help the robot practice its new job of keeping people safe by turning off kitchen appliances. In [68], they were asked to help the robot learn how to use the utensils in the kitchen. In [86], they were asked to help the robot lay the table. Finally, in [70], the children interacted with the robot in three sessions: in the first two, they were involved in a card-pairing task; in the last one, they listened to the robot telling a story.

Definitions of Gender
Most of the papers (91%) did not provide a definition of gender or an explanation of the authors' understanding of gender (see Figure 3a). Of the remaining papers, one reported a definition of gendering [8]. Bryant et al. borrowed the term gendering from Robertson et al. [65] and defined it as "the attribution of gender onto a robotic platform via voice, name, physique, or other features." They used this term to describe the encoding of gender into robots via the choice of design features [56] (see Section 1.3), rather than the robot's property of being gendered.
Two other papers gave an explanation of their understanding of gender, both in relation to participants' gender. Rea et al. [62] specified: "we use the term 'gender' synonymously with biological sex, which we recognize is overly simplistic. We used 'gender' for the practical purpose of simplifying our investigation." Reich-Stiebert and Eyssel [64], instead, stated: "Sex refers to biological and physiological features. Gender, however, is a social construction." They explain that they included both of these factors in their experimental design, as a person's biological sex might not correspond to their perceived gender identity. While these two definitions give us a clear understanding of the authors' interpretation of human gender, they do not tell us how the authors understand "gender" or the process of gendering when it comes to robots.
[79], You and Lin [84], and Zhumabekova et al. [86] did not provide an explicit reason to manipulate the robot's genderedness. The other reviewed papers, instead, reported four core reasons behind the manipulation of the robot's genderedness.
The first reported motivation was to study the relationship between social categorization and stereotypical judgments of robots. In this group of papers, the robot's genderedness was manipulated to understand whether the robot's social categorization could elicit gender stereotypes [3,4,21,49,62,64], lead people to attribute capabilities to the robot in line with its perceived "gender" [8,14,33,34,61,60], or lead people to judge the appropriateness of the robot's behavior based on gender norms [28].
The third reason to manipulate the robot's genderedness was to investigate gender segregation ("the separation of boys and girls into same-gender groups in their friendship and casual encounters" [44]) in child-robot interaction (cHRI). In this group of papers, the robot's genderedness was manipulated to explore whether children retained gender segregation with gendered robots [70] and whether their preference for a same-gender robot changed across age and gender groups [67,68]. Finally, the fourth motivation was to test whether female social robots could be used as role models to engage young women in computer science [58]. Since Denner et al. [16] showed that girls benefit from learning how to program in female pairs, Pfeifer and Lugrin wanted to understand whether the genderedness of the robot could impact women's learning process in the domain of computer science.

Voice
In terms of design choices, 28 studies (78%, see Table 2 and Figure 4) manipulated the robot's genderedness through its voice, either in isolation (N = 9) or in combination with other features (N = 19; we report these combinations in the other sections). In most cases, the voices were the default female and male voices provided by commercially available text-to-speech software, such as macOS [34], CereProc [54], Cepstral Theta [60], and Acapella [79], or voices edited with software like Audacity [77]. In other cases, human voices were recorded and implemented on the robot (e.g., Sandygulova et al. [40]).
Since the voices employed in the reviewed studies were in most cases the default voices provided by commercially available software, the majority of authors did not specify the rationale behind their selection.Only Kuchenbrandt et al. [33] mentioned low frequency as the main characteristic of male voices and high frequency as the characterizing feature of female voices, and Powers and Kiesler [60] and Sandygulova and O'Hare [69] mentioned work by Nass and Brave [46] explaining how a voice with a fundamental frequency of ≈110 Hz is perceived as male and a voice with a fundamental frequency of ≈210 Hz as female.
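To put the two reference values in perspective, the sketch below converts the gap between ≈110 Hz and ≈210 Hz into semitones using the standard equal-temperament relation 12·log2(f2/f1). It is our own illustration, not a procedure reported by any of the reviewed studies, and the function name `semitone_shift` is hypothetical.

```python
import math


def semitone_shift(f0_from_hz: float, f0_to_hz: float) -> float:
    """Pitch shift in semitones needed to move a fundamental frequency
    from f0_from_hz to f0_to_hz (12 semitones per octave)."""
    if f0_from_hz <= 0 or f0_to_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12 * math.log2(f0_to_hz / f0_from_hz)


# Approximate fundamental frequencies cited from Nass and Brave [46]
MALE_F0_HZ = 110.0
FEMALE_F0_HZ = 210.0

shift = semitone_shift(MALE_F0_HZ, FEMALE_F0_HZ)
print(f"{shift:.1f} semitones")  # 11.2 semitones
```

The result, about 11.2 semitones, shows that the two reference voices lie just under an octave apart.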

Facial Features
Six studies (17%) employed facial features to manipulate the robot's gender. Within this category, the facial elements used to manipulate the robot's genderedness varied considerably. For instance, Eyssel and Hegel [21] used Flobi's lip module with more defined lips for the female robot and the one with less defined lips for the male robot. Powers et al. [61], instead, used the color of the lips to change the perception of the robot's genderedness: pink lips for the female robot and grey lips for the male one.
At a more holistic level, Calvo-Barajas et al. [13] and Ghazali et al. [26] used the default faces provided by the robots Furhat and Socibot. In both studies, the female texture had thinner eyebrows, rosier cheeks, and redder lips than the male texture. Paetzel et al. [53,54] did not resort to Furhat's predefined faces. Instead, they used the software FaceGen to create the female and male facial textures, which they then projected onto Furhat's face mask. FaceGen makes it possible to model a 3D head and modify its genderedness through a slider. Judging from the pictures shared by the authors, the female texture had thinner eyebrows, redder lips, bigger eyes, and whiter skin than the male texture, facial features that partly overlap with those in Calvo-Barajas et al. and Ghazali et al.
Facial features appear in isolation only once; they are combined with the robot's hairstyle in Eyssel and Hegel [21] and with the robot's voice in 4 studies [26,53,54,61]. Interestingly, none of the studies explains or motivates in detail the choice of facial features used to manipulate the robot's genderedness. This might be because, in most studies, the faces used were the default faces provided by the respective robotic platforms (i.e., Furhat and Socibot). Hence, the authors might have assumed that the respective robotic companies had followed a rationale when designing these facial features.

Apparel & Color
Three studies (8%) used clothes to manipulate the robot's genderedness. Jung et al. [31] gave the male robot a man's hat and the female robot pink earmuffs. Thellman et al. [79] equipped the male robot with a blue white-dotted bow tie and the female robot with a pink ribbon. Finally, Zhumabekova et al. [86] gave the female robot a flower hair clip and the male robot a bow tie. Clothes were combined with voice and names in [79,86]. Jung et al. did not report any gender cues beyond clothes. However, we suspect that they also used the robot's voice to manipulate its genderedness, as the robot conversed with participants in their scenario.
The clothes in the reviewed studies were often stereotypically colored (color was used in 3 studies, 8%): blue for male robots, pink for female robots [31,79]. In You and Lin [84], it is instead the robot's body that is stereotypically colored: blue for the male robot, grey for the neutral robot, and pink for the female robot. None of these studies makes explicit the rationale for using clothes and color to manipulate the robot's genderedness.

Hairstyle
Two studies (6%) employed the robot's hairstyle to suggest its genderedness. Eyssel and Hegel [21] used Flobi's hair module to give the robot short or long hair, whereas You and Lin [84] used the robot Alpha 1 pro with short, mid-length, and long hair to manipulate female, neutral, and male genderedness, respectively. While You and Lin did not provide any rationale for their manipulation, Eyssel and Hegel cited Brown and Perrett [7] and Burton et al. [9] to justify the use of hair length. These papers posit that hairstyle is a salient facial cue for determining someone's gender, and that long hair increases the accessibility of knowledge structures about the social category of women, whereas short hair activates stereotypical knowledge structures about men. In Eyssel and Hegel [21], the robot's hairstyle is used in combination with its facial features (see Section 4.4.3), while in You and Lin [84] it is combined with the robot's voice and color (see Section 4.4.4).

Body Shape
Two studies (6%) used the robot's body proportions to manipulate its genderedness. Both were authored by Bernotat et al. [3,4], and the later study replicated the earlier one. Bernotat et al. modified the Waist-to-Hip Ratio (WHR) and Shoulder Width (SW) in a drawing of a robot to elicit different perceptions of genderedness. They hypothesized that a robot with a WHR of 0.9 and an SW of 100% would be perceived as male, and a robot with a WHR of 0.5 and an SW of 80% as female. The rationale behind this manipulation came from the work of Johnson and Tassinary [30] and Lippa [35], who showed that people rely on WHR to judge a target's "gender" and that the form of the waist is a relevant feature for gender perception. Since the studies used static images, body proportions were not combined with other cues.
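The two hypothesized body conditions can be expressed as simple geometric parameters. The following Python sketch is purely illustrative: it assumes that the drawing's waist and hip widths stand in for the circumferences normally used to compute WHR, and the reference sizes `HIP_WIDTH` and `BASELINE_SHOULDER_WIDTH` (in arbitrary drawing units) are hypothetical values, not taken from Bernotat et al.

```python
from dataclasses import dataclass

# Hypothetical reference sizes in arbitrary drawing units (not from Bernotat et al.)
HIP_WIDTH = 100.0
BASELINE_SHOULDER_WIDTH = 140.0


@dataclass(frozen=True)
class Torso:
    waist: float
    hips: float
    shoulders: float

    @property
    def whr(self) -> float:
        """Waist-to-hip ratio, computed here from 2D widths in the drawing."""
        return self.waist / self.hips


def make_torso(whr: float, shoulder_pct: float) -> Torso:
    """Derive torso widths from a target WHR and a shoulder width expressed
    as a percentage of the baseline shoulder width."""
    if whr <= 0:
        raise ValueError("whr must be positive")
    if not 0 < shoulder_pct <= 100:
        raise ValueError("shoulder_pct must be in (0, 100]")
    return Torso(
        waist=whr * HIP_WIDTH,
        hips=HIP_WIDTH,
        shoulders=BASELINE_SHOULDER_WIDTH * shoulder_pct / 100,
    )


# The two conditions hypothesized by Bernotat et al. [3,4]
male_like = make_torso(whr=0.9, shoulder_pct=100)
female_like = make_torso(whr=0.5, shoulder_pct=80)
print(male_like)    # Torso(waist=90.0, hips=100.0, shoulders=140.0)
print(female_like)  # Torso(waist=50.0, hips=100.0, shoulders=112.0)
```

Keeping the hip width fixed makes the two conditions differ only in waist and shoulder width, which mirrors the two parameters that were manipulated.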

Manipulation Check & Assessment Tools
Only 54.3% of the studies (N = 19) performed statistical analyses to verify whether the manipulation of the robot's genderedness actually succeeded (see Figures 3b and 5). In addition, 8.6% of the studies (N = 3) performed a manipulation check of a non-statistical nature [8,70,86] (see Figures 3b and 5): the authors asked participants which "gender" they thought the robot belonged to, but did not test the result for statistical significance. The remaining 37.1% of the reviewed studies (N = 13) did not perform any manipulation check to test whether participants perceived the robot's genderedness as expected [13,14,28,34,42,49,58,68,69,67,72,84,77].
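The three categories are mutually exclusive and together cover all 35 reviewed studies, so the reported percentages can be checked directly from the counts. A minimal Python sketch (the helper name `shares` and the category labels are ours):

```python
def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each category's share of the total, in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


manipulation_checks = {
    "statistical": 19,
    "non-statistical": 3,
    "none": 13,
}

assert sum(manipulation_checks.values()) == 35  # all reviewed studies
print(shares(manipulation_checks))
# {'statistical': 54.3, 'non-statistical': 8.6, 'none': 37.1}
```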
In the studies that performed a statistical manipulation check, the authors used three different approaches to assess people's attribution of "gender" to the robot (see Table 2 and Figure 3c). The first measurement approach was unidimensional. The authors asked participants to rate the robot's genderedness on a single item, usually phrased as: Rate the extent to which the robot appeared "rather male" versus "rather female". The rating was expressed on a 7-point Likert scale with male and female as end points. The second measurement approach was multidimensional (see Table 2 and Figure 3c). The authors asked participants to fill out two items, usually phrased as: (1) To what extent do you perceive the robot as male? (2) To what extent do you perceive the robot as female? The ratings were expressed on 7-point Likert scales where 1 meant not at all and 7 extremely [3,4,50,54,61,78]. The third measurement approach was nominal (see Table 2 and Figure 3c). The authors asked participants to select the "gender" of the robot from a list of options or to write it in [61,60,69]. Sandygulova and O'Hare used this approach with children, using a pictorial response system [69]. Powers and Kiesler [60] asked participants to give the robot a name and inferred the "gender" attributed to the robot from the gender of that name. Finally, Powers et al. [61] combined the multidimensional and nominal approaches by first asking whether the robot in their study was gendered and then asking participants to specify how feminine and masculine the gendered robot was.
When Likert scales were used to measure the robot's genderedness (first and second approaches), the mean scores on the female/feminine and male/masculine items were only rarely close to the end point of the corresponding gender. For example, in Ghazali et al. [26] the manipulation check was significant, yet the difference between the male and female robot was not marked (male robot: M = 5.50, SD = 1.60; female robot: M = 6.07, SD = 0.83). When the manipulation check used nominal scales (third approach), the difference between the robot's "genders" was, unsurprisingly, more marked. However, female robots were more difficult to categorize across studies. This was particularly evident in [61], where the robot with the dampened female voice was miscategorized by 73% of the participants and given a male name by 70% of them.
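To illustrate why a significant manipulation check can still reflect a modest separation, the sketch below computes a standardized mean difference (Cohen's d) from the summary statistics reported for Ghazali et al. [26]. This is our own illustration, not the original authors' analysis, and it assumes equal group sizes, since pooling the standard deviations otherwise requires the per-group sample sizes.

```python
import math


def cohens_d_equal_n(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Cohen's d for two groups, pooling the SDs under the assumption of equal group sizes."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_b - mean_a) / pooled_sd


# Summary statistics reported for Ghazali et al. [26]: male robot (a), female robot (b)
d = cohens_d_equal_n(mean_a=5.50, sd_a=1.60, mean_b=6.07, sd_b=0.83)
print(f"d = {d:.2f}")  # d = 0.45
```

Under this assumption, the 0.57-point difference on a 7-point scale corresponds to d ≈ 0.45, below the conventional 0.5 benchmark for a "medium" effect.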
Overall, 79% of the studies performing a statistical manipulation check (N = 15) succeeded in manipulating the robot's genderedness. Sixteen percent of them (N = 4) were only partially successful. Finally, 5% of them (N = 1) did not report the results of the statistical manipulation check [79] (see Table 2 and Figure 5). Partial success occurred only in studies with a gender-neutral or gender-incongruent condition [31,53,54] or an altered gendered voice [60].

Methodological Note
The studies we reviewed employed 132 dependent variables. These could be nested into 17 groups based on conceptual similarity (e.g., warmth and mildness were nested under communion). For convenience, we refer to these groups when reporting main and interaction effects. The grouping served only to summarize the results clearly and draw conclusions from them.

Figure 5. The orange column indicates which of the included studies include a manipulation check, the green column shows how many of the studies performing a manipulation check actually succeeded in manipulating the robot's genderedness, and the blue column highlights the studies finding a main effect of the robot's genderedness on the dependent variables. The purple boxes on the right list the papers reporting a main effect of gender on the dependent variables, the gender cues used when such an effect was found, and the dependent variables influenced by the robot's genderedness. * = the dependent variables reported here are only those significantly affected by the robot's genderedness.

Main Effects
In the reviewed studies, only 17% of the dependent variables (22 of 132) showed a main effect of the manipulation of the robot's genderedness. The robot's genderedness did not yield any significant effect on the dependent variables nested under competence (10 dependent variables), likability (15), credibility (3), acceptance (8), task-related robot evaluations (4), proximity (1), closeness (2), and "other" (2). Main effects were also rare in the remaining groups.
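The proportions above follow directly from the counts. A small Python tally (variable names are ours) reproduces the 17% figure and shows that the eight groups without any main effect account for 45 of the 132 dependent variables, roughly a third:

```python
TOTAL_DVS = 132
MAIN_EFFECT_DVS = 22

# Groups in which the robot's genderedness yielded no significant main effect
NULL_GROUPS = {
    "competence": 10,
    "likability": 15,
    "credibility": 3,
    "acceptance": 8,
    "task-related robot evaluations": 4,
    "proximity": 1,
    "closeness": 2,
    "other": 2,
}


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


null_dvs = sum(NULL_GROUPS.values())
print(pct(MAIN_EFFECT_DVS, TOTAL_DVS))     # 16.7
print(null_dvs, pct(null_dvs, TOTAL_DVS))  # 45 34.1
```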
When the results were significant, participants tended to perceive the robot in line with gender stereotypes (see Section 4.3.2). They attributed more communal traits to female robots than to male robots [4,21] (marginally significant in [3]) and more agentic traits to male robots than to female robots [21]. They showed higher affective trust towards female robots than towards male robots [3,4], and rated the female robot as more suitable for stereotypically "female" tasks [3,4,21] and the male robot as more suitable for stereotypically "male" tasks [21]. Moreover, they donated more money to female robots than to male robots [42,72], said more words to them [61], and smiled more at them [70]. The only results that ran counter to gender stereotypes came from Chita-Tegmark et al. [14], where, contrary to the authors' expectations, the male robot was perceived as more emotionally intelligent than the female one, and from Bernotat et al. [3,4], where, contrary to the authors' assumptions, the female robot elicited more cognitive trust than the male robot.
Very few studies disclosed a significant main effect of the robot's genderedness on crucial HRI constructs (see Section 4.3.2).In [31], the female robot was rated significantly higher in animacy and anxiety than the male one, and in [34], it was trusted significantly less.Interestingly, some of these studies report conflicting evidence.For instance, the male robot was perceived as more anthropomorphic than the female robot in [31], while it was perceived as more machinelike in [53].

Interaction Effects
The reviewed studies showed a significant interaction effect of the robot's genderedness and (an)other independent variable(s) on 24.24% of the dependent variables (32 of the 132 dependent variables).Fifty percent of these effects resulted from the interaction between the robot's genderedness and participant's gender.The other half of these effects resulted from the interaction between the robot's genderedness and a further independent variable (i.e., severity of moral infraction [28], interaction modality [53], type of emotion [13], childlikeness of the robot [60], stereotypically gendered task [33], or learning material [58]).
Robot's Genderedness and Participant's Gender. Of the 16 dependent variables showing an interaction effect between the robot's genderedness and the participants' gender, half (8) showed a significantly positive effect of a match between the robot's genderedness and the participant's gender, and the other half (8) showed the opposite, a significantly positive effect of a mismatch. Regarding the former, adults seemed to perceive a robot with the same gender as their own as significantly less harsh [28], more anthropomorphic [20], and more psychologically close [20], and as eliciting less negative cognition [26]. Further results revealed that children were in a significantly better mood [70], smiled more [67], played more [68], and got physically closer [68] to a robot that shared their gender, which lends support to the gender segregation hypothesis for cHRI. No evidence was found in support of using female robots as role models for women learning computer science topics [58].
Regarding the positive effect of a human-robot gender mismatch, women seemed to attribute higher emotional intelligence to male robots [14], and men found female robots more trustworthy [72], credible [84] (although [72] found this effect for both men and women), and engaging [72], and were willing to donate more money to them [72]. Furthermore, men and women uttered more words to the robot of the opposite "gender" in [61], and younger children showed more happiness in the opposite-gender condition than in the same-gender condition in [70]. The latter is the only result that disconfirms the gender segregation hypothesis for cHRI. In general, the results of studies exploring the effect of human-robot gender (mis)match on the perception of and interaction with robots are inconclusive for adult participants.
Robot's Genderedness and Further Independent Variables. The other half of the significant interaction effects were due to the joint effect of the robot's genderedness and another independent variable. In [53], the female robot was perceived as more responsible, intelligent, pleasant, relaxed, and content than the male robot, but only in the multimodal condition (i.e., when the robot used both facial expressions and voice to interact), whereas the male robot was perceived as more familiar and trustworthy than the female robot, but only in the unimodal condition (i.e., when it used only facial expressions). In [13], the male robot was perceived as more likable in terms of appearance when it expressed high anger (as opposed to medium anger) and low happiness (instead of medium anger, and low anger), while the female robot was perceived as less likable in terms of appearance when it expressed high anger (instead of all other emotions: low, medium, and high happiness, and low and medium anger) and low anger and medium happiness (instead of low happiness). In [60], 100% of the participants said they would be willing to follow the advice of the childlike male robot, 91% said so for the adultlike male and childlike female robots, and only 50% for the adultlike female robot. In [28], participants perceived the male robot as too direct in the pre-test but not when it responded to norm-violating commands, whereas they did not perceive such a difference for the female robot. Moreover, male participants liked male robots that rejected commands from male humans for severe norm violations, but did not like female robots that rejected commands from female humans for weak norm violations. Male participants also liked male robots, but not female robots, when they issued strong rejections. Finally, female participants preferred robots that did not comply with the requests of a human of the same gender as the robot. In [64], participants instructed to solve a stereotypically female task with a male robot and those instructed to solve a stereotypically male task with a female robot reported higher contact intentions than participants in conditions where the genderedness of the task matched that of the robot.

Addendum: Papers 2021-2022
To conclude the Results section, we report a short addendum on the studies that manipulated the robot's genderedness between May 2021 and May 2022. To identify these studies, we used the same search strings and databases detailed in Section 3.1 and followed the same selection pipeline discussed in Section 3.1.1. However, we did not perform the full process of coding and information extraction described in Section 3.2. This section only aims to indicate the most recent developments in the investigation of robots' genderedness and to highlight whether novel results have been reported. The short review returned 40 papers, of which 7 met the inclusion criteria after reading the abstract, and only 5 after reading the full article [25,47,57,59,71]. Table 3 gives more details about these papers.
Neuteboom and de Graaf (2021) [47] looked into the effects of the robot's genderedness (female and male robot) and task (analytical and social) on the robot's perceived trustworthiness (i.e., capacity trust and moral trust), social perception (i.e., agency and communion), and humanness (i.e., human uniqueness and human nature). In line with previous studies, they did not find any significant effect of either the robot's genderedness or the task on people's perceptions.
Perugia et al. (2021) [57], instead, explored how people attribute gender (femininity and masculinity) and stereotypical traits (communion and agency) to Furhat. Most of Furhat's faces were attributed a "gender" in line with their names. Interestingly, the robot's genderedness influenced people's perceptions of the robot's agency but not of its communion. This study confirms that the robot's genderedness can influence the attribution of stereotypical traits to humanoid robots, in agreement with [3,4,21].
The other three studies focused on the genderedness of service robots. Forgas-Coll et al. (2022) [25] investigated the effects of gender-personality congruity on customers' intention to use a service robot. They found that, while the congruous gender-personality robots (female-cooperative and male-competitive) did not differ from the incongruous ones (female-competitive and male-cooperative) in promoting intention to use, they did differ from each other, with the female-cooperative robot performing significantly better than the male-competitive one.
With a related objective, Pitardi et al. (2022) [59] looked into the effects of matching the robot's genderedness and the participant's gender on people's perceived comfort and control in a service encounter, as well as on their brand attitude (i.e., positive and negative evaluations of the service provider). The study revealed that human-robot gender congruity has a significant positive influence on perceived control and comfort, but not on brand attitude, and that the cultural value of masculinity mediates the effect of human-robot gender congruity on participants' perception of control. Again in a service context, Seo (2022) [71] investigated the effects of the robot's genderedness on pleasure and customer satisfaction in a service encounter, taking into account the robot's anthropomorphism as an additional independent variable. The results showed that a female service robot leads to higher satisfaction and pleasure than a male service robot, and that the robot's anthropomorphism plays a key role in positively influencing these outcomes.

Table 3. Details about the studies in the addendum: authors, cues used to manipulate the robot's genderedness, and dependent variables (in bold, the significant main effects).

To sum up, the five studies in the addendum did not introduce novel ways of manipulating the genderedness of humanoid robots (except for personal titles, which can be equated to pronouns; see Table 3). In terms of results, however, they do offer some interesting insights. They show a preference for female robots and for human-robot gender congruity in service contexts [25,59,71]. Interestingly, they also reveal that values of masculinity play a role in this preference. It might be that service contexts are much more powerful than other contexts in eliciting stereotypical knowledge of male and female roles, especially for participants with more conservative views of gender.

Discussion
In the following, we summarize the main findings of the literature review, answer the research questions, and identify gaps in the literature that warrant further attention. We then discuss the results of the review and provide guidelines that the HRI community could follow when gendering robots or studying their gendering. In doing so, we combine our epistemological backgrounds in Social Robotics and Gender Studies.

Summary of Results & Answers to RQ1 and RQ2
To summarize the results of the scoping review, the HRI scholarship most often manipulated the robot's genderedness through its voice, name, and facial features (RQ1). These cues were mostly used in interactive studies involving a physical robot (see Figure 4). In the majority of cases, manipulating the robot's genderedness with voice, name, and facial features yielded the expected gendered perceptions (i.e., a successful manipulation check). However, it often failed to produce a main effect of the robot's genderedness on the dependent variables. Indeed, Figure 4b and the purple boxes in Figure 5 show that the gender cues most successful in influencing people's perceptions of robots were body proportions [3,4] and facial features [21,61]. On close inspection, the studies reporting a significant main effect of the robot's genderedness on the dependent variables are predominantly picture-based. Moreover, in these studies, the robot's genderedness mostly succeeds in eliciting gender stereotypes of communion, agency, and task preference/suitability, but does not yield notable significant effects on crucial HRI constructs such as competence, likability, and acceptance (RQ2).
Given that the robot's genderedness seems to be more harmful than useful as a design feature (it affects stereotyping but does not improve HRI), robotic companies might want to resort to less humanlike robots when gender-stereotypical tasks are involved or, in case humanlike robots cannot be avoided, use gender cues that are less prone to elicit gender stereotypes. Perugia et al. [55] started investigating which design cues in a robot are more likely to elicit stereotyping. However, more research in this direction is needed (GAP 1). Moreover, given that stereotypes towards gendered robots are so prevalent but mostly studied with static images and in short-term studies, future HRI research should investigate whether stereotype attribution is influenced by a robot's embodiment (GAP 2) and whether it changes over time (GAP 3). In a repeated interaction study, Paetzel et al. [52] found that participants develop stable perceptions of a robot's warmth and competence (concepts similar to communion and agency) after two minutes of interaction and do not update them over time. Longitudinal perceptual studies like Paetzel et al.'s are also needed in the context of gendered HRI, to reveal whether stereotypes are formed once and for all a few minutes after meeting a robot or can change with repeated interactions. In addition, since many studies focused on explicit stereotyping, it might be worth performing implicit bias studies [51] investigating people's automatic, pre-reflective stereotyping of gendered robots (GAP 4). Finally, since the main concern of Roboethics and Robophilosophy is that people's behaviors towards robots might eventually generalize to humans, the HRI scholarship needs research paradigms and studies that explore whether and how the gender stereotyping people display towards robots can influence their attitudes towards humans (GAP 5).

Discussion of Methodological Pitfalls
None of the studies we reviewed included non-binary, transgender, gender non-conforming, or gender fluid participants. Thirty-nine of the 3902 participants in the reviewed studies (i.e., 1%) selected the option other/undisclosed. We can only assume that some of these participants identified with a gender outside the binary. We consider the lack of gender-diverse participants a major gap when studying the process of gendering robots, especially since the studies in this review brought to light the complex interplay between participants' gender and the robot's genderedness. This might have happened because participants' gender is often asked with check-boxes providing only two options, "female" and "male", but it might also be due to the lack of a proactive effort to include more gender identities. We advocate for this effort; hence, we propose a first guideline for research on gendering robots: Guideline 1: Include transgender, gender fluid, gender non-conforming, and non-binary people, not just cisgender people, in the studies investigating robot's genderedness.
This guideline also calls for dropping the biologized and essentialist practice of asking about sex on a female/male categorical binary. The sex/gender distinction and the deterministic understanding of sex as a binary biology are highly criticized within the neuro- and biofeminist field [6]. Instead, an understanding of the terminology of the variety of gender identities that are actually relevant for social interaction, as well as active and diverse recruiting efforts, are needed. Scheuerman et al. drafted a living document, "HCI Guidelines for Gender Equity and Inclusivity", containing a section on gender-inclusive research methods that gives valuable insights into how to perform inclusive research. For instance, they suggest using the following options to ask about participants' gender: woman, man, non-binary, prefer not to disclose, prefer to self-describe; they also explain how to carry out in-person studies in a way that is respectful of all gender identities (see also [76]).
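For researchers implementing such a question in their own study software, the answer options listed above can be encoded and validated as in the following hypothetical Python sketch; the class, constant, and field names are our own and are not part of Scheuerman et al.'s guidelines. Accepting free text only together with the "prefer to self-describe" option is a design choice of this sketch.

```python
from dataclasses import dataclass
from typing import Optional

# Answer options as listed in Scheuerman et al.'s guidelines
GENDER_OPTIONS = (
    "woman",
    "man",
    "non-binary",
    "prefer not to disclose",
    "prefer to self-describe",
)
SELF_DESCRIBE = "prefer to self-describe"


@dataclass(frozen=True)
class GenderResponse:
    """One participant's answer to the gender question (hypothetical schema)."""
    choice: str
    self_description: Optional[str] = None

    def __post_init__(self) -> None:
        if self.choice not in GENDER_OPTIONS:
            raise ValueError(f"unknown option: {self.choice!r}")
        # Free text is only accepted together with the self-describe option
        if self.self_description is not None and self.choice != SELF_DESCRIBE:
            raise ValueError("free text is only accepted with 'prefer to self-describe'")


answers = [
    GenderResponse("woman"),
    GenderResponse("non-binary"),
    GenderResponse(SELF_DESCRIBE, self_description="genderqueer"),
]
print([a.choice for a in answers])
```

Keeping the free-text answer verbatim, rather than mapping it onto a predefined category, is one way to avoid reintroducing a binary during analysis.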
The studies we reviewed often lacked a clarifying definition of "gender".Only Bryant et al. [8] attempted a description of the gendering process as related to robots.We do not advocate for a universal fixed definition of gender that could fit all research and researchers.
However, we think it is important for researchers working on this topic to: Guideline 2: Reflect on their understanding and experience of "gender," clarify this understanding in their paper, and explain the reason why they are gendering the robot in their study.
A practical way to fulfill this objective is to go through a self-assessment process in which the researcher(s) ask(s) themselves: (i) What does gender mean to me? (ii) Is gendering really needed to answer my research question? (iii) Am I embedding gender stereotypes in the study? We argue that by making the gendering process a reflective part of study design (as suggested by Weiss and Spiel [81]) and, especially, by making it visible when writing about a study, most of the stereotypes imbued into gendered robots might be avoided.
Another methodological pitfall we observed in some of the studies, which is unfortunately endemic to HRI research, is the uniformity of participants' characteristics. Most of the reviewed studies relied on samples of young participants (possibly university students). The main drawback of this homogeneity is that it makes it difficult to address context- and user-specific differences. We acknowledge that resorting to students as participants is often dictated by the complexity of the research or by a lack of funding to recruit a more diverse set of participants. However, in the specific context of gendering robots, this might produce one-sided results, as individual participants' characteristics might reveal relevant insights into how gendered robots are perceived. While we raise a caveat in this sense, we refrain from formulating a guideline, as the use of university students as participants might depend on the economic resources of each research group.
From a methodological perspective, we need to mention another aspect we observed in the reviewed studies, which might constitute a limitation of this review, namely the variety of robots, tasks, and activities. The studies we reviewed used many different robotic platforms and envisioned many different tasks (e.g., observing pictures, watching videos, interacting with the robot), activities (e.g., educational activities, casual conversations), and participants' roles (e.g., remote observer, co-present observer, interactant). This variety is not bad in principle, but it is risky when building a research field from scratch, as it makes studies difficult to compare, thus hindering the possibility of drawing conclusions on the role of robot's genderedness as a whole. To circumvent this, we suggest the following: Guideline 3: Focus on a few application scenarios (e.g., healthcare, education, hospitality) and perform studies under comparable conditions. This way, the HRI scholarship could adopt an incremental approach to the study of robot's genderedness, where scientific clarity is prioritized over novelty, and in turn encourage replication studies, where existing experimental designs are reused with slightly different variables to check whether results still hold.

Discussion on Manipulation of Robot's Genderedness (RQ1)
Through this scoping review, we found that HRI scholars have manipulated the robot's genderedness using cues such as the robot's voice, name, facial features, apparel, colors, body proportions, and hairstyle. Some of these cues are the product of social conventions and socio-cultural schemata (e.g., names, hairstyle, apparel), while others refer to the physical and physiological characteristics of gendered bodies (e.g., the waist-to-hips ratio and voice frequency). Nevertheless, most of them tap into a binary understanding of gender. Indeed, in 89% of the reviewed studies, the robot's genderedness was manipulated within the female/male binary. We therefore draw the following guideline: Guideline 4: Avoid imbuing robots with oversimplified and normative visions of gender as binary.
One way to pursue this objective is for researchers to critically reflect on their own gendering process by asking themselves: (i) Are the gender cues I have chosen really needed? (ii) Can I manipulate genderedness with fewer and more subtle cues? (iii) Why am I manipulating the robot's genderedness with these cues? (iv) Am I embedding gender stereotypes in the robot by using these cues? Since gender cues might layer and affect each other in unexpected ways, a good strategy might be either to choose a robot whose gender attribution is largely undefined at baseline and add gender cues to it, or to investigate the existing genderedness of the robotic embodiment without manipulating its design. Tools like the humanoid ROBOts - Gender and Age Perception (ROBO-GAP) dataset (https://robo-gap.unisi.it/) could help researchers choose the right robot and check its perceived gender at baseline. This brings us to the fifth guideline: Guideline 5: Perform a pre-test of the genderedness of the robotic platform you plan to use, to avoid gendering it further when this is not needed.
A non-negligible aspect of the gendering process in the reviewed studies is that most gender cues were used in combination and only rarely in isolation, as if layering these cues could strengthen the gender attribution. However, the results of the manipulation checks show that gender is attributed to robots on the basis of even the tiniest gender cues (see Rea et al. [62]). Moreover, overdoing gender cues and/or using extremely stereotypical cues (e.g., a pink ribbon or a blue bow tie) might lead to stronger stereotyping [55] and end up revealing the purpose of the study. Since layering gender cues yields no additional effect on the manipulation of gender and also puts researchers at risk of stereotyping, we strongly recommend the following: Guideline 6: Avoid stereotypical gender cues, and use as few gender cues as possible and as subtle gender cues as feasible.
Even though most reviewed studies presented the robots through a physical embodiment, the contexts in which the robots were shown varied widely. The process of gendering is not just triggered by the presence of certain appearance cues; it is also deeply influenced by the context in which the interaction takes place. Interacting with a robot that has a certain role is different from attributing gender to a robot in a context-free task [56]. It is during the interaction that the most performative aspects of the robot's genderedness unfold and become apparent, and it is through the interaction that the robot's genderedness acquires symbolic meaning [5]. Hence, we recommend the following: Guideline 7: Consider the interaction context as part of the manipulation of the robot's genderedness, and study how the gendering of robots' roles, behaviors, and activities influences the gender attributed to the robot or even flips it.
Another striking result of this scoping review is that almost half of the studies did not perform any statistical analysis to assess whether the manipulation of the robot's genderedness actually succeeded. This is particularly problematic, as it makes it difficult to establish whether a lack of significant main effects of the robot's genderedness on the dependent variables reflects a genuine absence of effect or an unsuccessful manipulation. Future studies should therefore follow this guideline: Guideline 8: Always perform a manipulation check to test whether participants perceive the robot's genderedness in the expected way.
In Ghazali et al. [26], the manipulation check was deemed successful because the female and male robot conditions were perceived as significantly different in terms of gender. A look at the descriptive statistics reported by the authors, however, reveals that the robot's perceived genderedness fell into the same category in both conditions. Based on this, we recommend that researchers not only perform a manipulation check, but also: Guideline 9: Check the descriptive statistics of each gender condition as part of the manipulation check, as a significant difference between conditions does not necessarily guarantee a different categorization of the robot's genderedness.
Measuring robots' genderedness is not free of shortcomings. A research concept is necessarily entangled with the questionnaire that asks participants about it [36]. That is, if the underlying concept is a binary understanding of gender, then questions about feminine or masculine aspects, whether asked in one item or in separate items, will ontologically reproduce a binary idea of gender. Moreover, asking people to attribute gender to a robot might elicit a gender attribution even when the robot is not perceived as gendered in the first place. In this scoping review, we identified several quantitative ways to measure the robot's genderedness. However, it might be worthwhile to: Guideline 10: Explore more subtle ways of checking whether gender is attributed to the robot, for instance, through qualitative or indirect measures.
For instance, Roesler et al. [66] used naming frequency to understand how gender was attributed to the robots in their study; this approach let participants give the robots not only traditional names, but also technical and more object-oriented ones.
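Guidelines 8 and 9 can be illustrated with a minimal Python sketch. The ratings below are invented purely for illustration, and the 7-point perceived-gender item (1 = very masculine, 7 = very feminine), its midpoint of 4, and the use of Welch's t statistic are our own assumptions, not the instruments or tests used in the reviewed studies. The two conditions differ strongly, yet both means fall below the midpoint, so a significance test alone would hide that the "female" robot was not categorized as feminine.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical 7-point perceived-gender item: 1 = very masculine, 7 = very feminine.
MIDPOINT = 4.0

# Illustrative ratings invented for this sketch (not data from any reviewed study).
female_robot = [3.6, 3.9, 3.4, 3.8, 3.7, 3.5, 3.9, 3.6]
male_robot = [2.1, 2.5, 2.3, 2.0, 2.4, 2.2, 2.6, 2.3]


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va = stdev(a) ** 2 / len(a)  # squared standard error of the mean of a
    vb = stdev(b) ** 2 / len(b)  # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def categorize(ratings, midpoint=MIDPOINT):
    """Label a condition by which side of the scale midpoint its mean falls on."""
    m = mean(ratings)
    if m > midpoint:
        return "feminine"
    if m < midpoint:
        return "masculine"
    return "neutral"


if __name__ == "__main__":
    t, df = welch_t(female_robot, male_robot)
    print(f"Welch t = {t:.2f}, df = {df:.1f}")  # large t: the conditions clearly differ
    for name, ratings in [("female", female_robot), ("male", male_robot)]:
        print(f"{name} condition: mean = {mean(ratings):.2f} -> {categorize(ratings)}")
    # Both conditions fall on the masculine side of the midpoint, so the
    # "female" robot was not actually categorized as feminine.
```

Computing a p-value would require the CDF of the t distribution (for example via SciPy), which is omitted here to keep the sketch within the standard library. The point is that the descriptive check in `categorize` can flag a failed manipulation that the test statistic alone does not reveal.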

Discussion on Effects of Robot's Genderedness on Perceptions of and Interactions with Robots (RQ2)
Taken as a whole, the results make it quite clear that gendering robots has a strong effect on stereotyping. We cannot help but wonder whether the effects of robot's genderedness on stereotyping might be due to the way the robot was gendered in the first place. In other words, if we imbue robots with stereotypical gender cues, it might become difficult for participants not to stereotype them as a result.
In general, one of the clear-cut outcomes of this scoping review is that genderedness does not affect constructs crucial to HRI, such as acceptance and likability, as it perhaps does for voice assistants. In this regard, however, the studies published in the last year paint a different picture. They reveal that in service contexts, female robots and gender "congruity" (i.e., the match between the participant's gender and the robot's genderedness) are almost always preferred. Comparing these results with research on voice assistants, it seems that something about the service context makes the female genderedness of artificial agents immediately relevant, as if the fact that we as humans are used to seeing women in service roles makes the suitability of female robots for the same roles immediately glaring. From a feminist standpoint, a question arises: do we have to go along with users' preference for female service robots even if we know it stems from a discriminatory understanding of a gendered society? We as authors argue that we do not, and we present the HRI community with a guideline that could serve as a design opportunity: Guideline 11: Use gendered robots to offer occasions of defamiliarization with normative gender roles and to disrupt binary conceptualizations of human gender and tasks.
Regarding interaction effects, two results in the reviewed papers caught our attention. Calvo-Barajas et al. [13] found that children perceived a female robot as less likable when it expressed high anger rather than more positive or less intense emotions, while Jackson et al. [28] found that male participants liked male robots, but not female robots, when these issued strong rejections. These results seem to suggest that female robots, like women, are liked less when they do not comply or consent. This echoes the problematic narrative that expects women to be submissive and aware of "their place" in the world. In a real-life environment, how should a female robot react to people expressing annoyance at its lack of compliance or consent? Should it maintain a jokey, servile tone, as voice assistants originally did [82], or react resolutely, as in Winkle et al. [83]? We consider the approach of Winkle et al. [83] a valid and viable option. Beyond this, however, the HRI scholarship should start reflecting on the ethical implications of gendered robots and their (mis)treatment, especially given the highly symbolic meaning that human-humanoid interactions carry for human-human interactions [75,74,56,85]. We therefore suggest a final guideline: Guideline 12: Critically reflect on the results of your research on gendered robots and engage with a discussion of the ethical implications of your findings, especially considering the highly symbolic value of human-humanoid interactions for human-human relations.
For future robot designs, it remains an open challenge whether we can come close to a gender-neutral or even genderless humanoid robot. Since the human form is so strongly intertwined with the gendering process [56], the predominant use of humanoid design forms could be called into question. The HRI scholarship might want to identify alternatives to humanoid designs, as well as imagine interactions with robots that do not just mimic human-human interactions.
Authors' Contributions.GP formulated the research questions, devised the inclusion and exclusion criteria, performed the string search, went through the selection pipeline, read all the papers, extracted the information from the papers, summarized all the results, wrote sections 2, 3, 4, 5, and 6 of the paper, contributed to the writing of section 1, and prepared all the tables.DL formulated the research questions, devised the inclusion and exclusion criteria, performed the string search, read part of the papers, extracted information from part of the papers, wrote section 1 of the paper and contributed to the writing of section 6.
Table 4: Experimental information about the studies included in the scoping review: Authors (Date), independent variables (bs = between subjects; ws = within subjects), dependent variables (in bold, the significant main effects of robot's genderedness on the dependent variables, i.e., p < .05), and summary of findings.

Authors (Date) | Independent variables | Dependent variables | Summary of findings
Sandygulova et al. (2014) [67] | bs Robot Voice Genderedness and Age (female child, male child, female adult); bs Children Age (8, 9, 10, 11, 12 yo) | happiness | Female children expressed more happiness towards the female robot. Male children expressed more happiness towards the male robot.
Siegel et al. (2009) [72] | bs Robot Genderedness (female, male); bs Participant Gender (female, male); bs Participant alone (not alone, alone) | trust, donations, credibility, engagement | Men trusted the female robot more, donated more money to it, considered it more credible, and felt more engagement with it. Women showed no preference for any of the robots in terms of trust and donations, but rated the male robot as more credible and more engaging than the female robot.
Tay et al. (2014) [78] | bs Robot genderedness (female, male); bs Robot Personality (introverted, extroverted); bs Occupational role (female, male) | perceived trust, acceptance, attitude towards robots, subjective norms, affective evaluation, cognitive evaluation, perceived behavioral control | The perceived gender of the robot did not affect perceived trust, acceptance, attitude towards the robot, subjective norms, cognitive evaluations. Participants showed higher affective evaluations and perceived behavioral control in the female healthcare robot condition, and higher affective evaluations in the male security robot condition. Some of the non-significant results were marginally significant.
You & Lin (2019) [84] | bs Robot Voice Genderedness (female, male); bs Robot Appearance Genderedness (female, male, neutral) | trust, donations, credibility, engagement | The genderedness of the robot did not affect trust, the amount of donations, credibility, and engagement. While men rated the female robot as more credible (safety) than the male robot, women did not show any difference in their ratings.
Zhumabekova et al. (2018) [86] | ws Robot Genderedness (female, male); bs Participant Gender (female, male) | liked interacting with robot | Children liked the interaction with a same-gender robot more than the interaction with a robot with a different gender.
Steinhaeusser et al. (2021) [77] | bs Robot Voice Genderedness (female, male, neutral); bs Participant Gender (female, male) | anthropomorphism, transportation, attitudes towards robots | There was no significant effect of robot's genderedness on anthropomorphism, the attitude towards robots, and the transportation participants felt when the robot told the story, nor were there any interaction effects between participant's gender and robot's genderedness.

Fig. 1 PRISMA diagram detailing the paper selection pipeline

Fig. 2 Distribution of Participants' Gender in the Reviewed Studies. In blue, men/male participants; in red, women/female participants; in orange, participants whose gender was not specified; and in green, participants falling into the other/undisclosed gender category.

Fig. 3 Percentage of studies indicating a gender definition (a), percentage of studies performing a manipulation check (b) and frequency of the different assessment approaches of the robot's "gender" in the studies performing a manipulation check (c).

Fig. 4 Frequency of Manipulations. 4a. Different manipulations in decreasing order of frequency and type of embodiment used in the studies; 4b. different manipulations in decreasing order of frequency and corresponding significant (or not) main effect of robot's genderedness on the dependent variables.

Fig. 5 Diagram Summarizing the Results of the Scoping Review. The orange column displays which of the included studies include a manipulation check, the green column shows how many of the studies performing a manipulation check actually succeeded in manipulating the robot's genderedness, and the blue column highlights the studies finding a main effect of the robot's genderedness on the dependent variables. The purple boxes on the right list the papers featuring a main effect of gender on the dependent variables, the gender cues used when such an effect was found, and the dependent variables influenced by the robot's genderedness. * = the dependent variables reported here are only those significantly affected by the robot's genderedness.

Table 2: Manipulation of the robot's genderedness in the studies included in the scoping review: robot's "genders" manipulated (M = male; F = female; N = neutral), cues used to manipulate the robot's genderedness, presence of a manipulation check (Yes = manipulation check is performed; No = manipulation check is not performed; ns = no statistic performed to verify the manipulation check), significance of the manipulation check (bold = significant, italics = partially significant), metrics used to assess perceived gender, and notes.