Introduction

Computers operate in binary, which significantly differs from the familiar natural language used by humans. To bridge this gap and enhance user-friendliness, user interfaces were developed to make interactions with technology more intuitive. Traditionally, many of these interfaces depended on clicking or typing. However, advancements in natural language processing (NLP), which can be understood as the application of computational techniques for the automatic analysis and representation of human languages [1], along with text-to-speech (TTS) technologies and large language models (LLMs), have enabled vocal interactions with computers. This development allows users to communicate with technology using spoken language.

While there are technologies that rely on a mix of different input modalities (e.g., conversational agents that allow text-based chatting and verbal communication), there are also those that focus primarily on voice interaction. The prevalent example of the latter would be voice assistants, which are defined as “software agents that can interpret human speech and respond via synthesized voices” (p. 81) [2]. This technology uses NLP and TTS algorithms to compute an output in natural language that is highly probable to be an appropriate answer to an inquiry. This internal process can be implemented in smartphones or designated loudspeaker systems. About 40% of the US population is estimated to communicate with voice assistants, with smartphones being the most commonly used device [3].

The use of speech as a control element makes the technology particularly accessible and usable, as it can draw on linguistic patterns that humans use in interpersonal contexts [4]. It primarily addresses efficiency-oriented needs such as controlling smart home applications or fact-checking [5]. At the same time, the ability to communicate socially will continue to improve based on rapidly evolving LLMs (e.g., GPT-4).

Within the realm of sexuality-related applications, these “software agents” can not only serve as social partners with whom one can have sexualized interactions but also as an information source for sexual health communication. Both aspects will be discussed in the following.

The Importance of Voice Within Sexuality-Related Technologies

Technologies used in the broader field of sexuality can be divided into arousal-oriented interactions and non-arousal activities [6••]. The first category covers technologies under the umbrella term “sexual interaction in digital contexts (SIDC)”. Here, users have sexualized interactions with humans or artificial entities via or through technologies. While SIDC would also include video calls with another person in which the voice plays an integral part (sexuality via technology), this article focuses on technologies (or, more precisely, artificial entities) that allow interactions with the system.

When discussing intimacy with voice technologies, the science fiction movie Her is frequently mentioned as an example of how people form bonds with voice agents [7]. In this story, the main character develops a romantic connection and engages in intimate interactions with a voice assistant. In reality, the applications (also known as skills or actions) that can be implemented on a voice assistant are restricted by the companies selling the voice assistant systems.

However, first smartphone applications provide a mix of input modalities (e.g., text or voice) that permit romantic and sexualized interactions, such as the companion application Replika. A romantic partner mode can be activated via a Pro membership, enabling romantic and sexualized conversations [8]. The human voice comes into play here by allowing users to talk on the phone with their persona. The smartphone thus becomes the shell of a person with whom people can exchange whatever is on their mind at any time of the day or night. Technological developments will moreover enable increasingly realistic communication: filler words (e.g., I mean, like, uh, um), pauses, breathing, and other very characteristic forms of vocal communication can now be imitated by algorithms (e.g., Genny by Lovo AI) [9].

In addition to these arousal-oriented interactions, technologies facilitate so-called “sexuality-related non-arousing activities”, such as communication about sexual health. These activities can be categorized as digital health interventions (DHIs), encompassing interventions delivered through digital technologies such as smartphones or websites. DHIs can foster healthy behaviors and facilitate remote access to effective treatments [10].

In the context of sexual health promotion, DHIs have gained popularity and have undergone significant advancements in response to the increasing adoption of the internet and mobile devices. They offer a desirable approach for reaching at-risk populations, e.g., adolescents, sexual or ethnic minorities, and sex workers, who may be hesitant or unable to seek professional advice due to lack of resources or stigmatization. Notably, a recent trend in this field involves incorporating new technologies, including text- and voice-based conversational agents [11]. Widely known examples of conversational agents using NLP for DHIs are digital voice assistants such as Apple’s Siri or Amazon’s Alexa [12••]. Beyond their convenience of home accessibility, voice assistants’ ability to engage in private and natural conversations with users amplifies their value as healthcare assistants [13••].

One example of a virtual assistant teaching teenagers about sex and sexual health is the voice assistant skill Hush Hush, developed by Healthy Teen Network. Hush Hush is a “trusted, confidential mentor that engages students in relevant, thoughtful, and personalized sex education conversations.” The voice assistant skill aims to create a confidential environment for adolescents to explore matters related to gender and sexuality, evaluate relationships, and select appropriate birth control methods. The goal is to provide non-judgmental guidance on complex issues surrounding love and sexuality [14].

Method

In the following, an overview of theoretical and empirical insights for both arousal-oriented interactions and sexual health communication will be presented. The papers provided represent pertinent research derived from a thorough review of the existing literature. For this purpose, common scientific meta-data services (e.g., search engines) were used, such as Google Scholar, Web of Science, ResearchGate, and Elicit. Search terms were composed of key words related to “sexuality,” “sexual health,” “pleasure,” “sex,” “voice assistants,” “voice bots,” and “smart home device.” Given that the field is heavily underrepresented and employs diverse terminology, Google Scholar’s reverse search of cited papers was also extensively utilized. Because both theoretical and empirical research on voice usage in the broader field of sexuality-related technologies is scarce, research gaps in arousal-oriented interactions and sexual health communication are presented at the end of each subsection. The discussion includes a table that summarizes relevant papers that address the usage of voice in both arousal-oriented interactions and sexual health communication.

Theoretical and Empirical Insights on Arousal-Oriented Interactions with Sexualized Voice Technologies

Theoretical Basis for Arousal-Oriented Interactions with Sexualized Voice Technologies

The theoretical basis for humans engaging in social, and specifically in intimate or sexualized, interactions with an artificial persona has been postulated in the sexual interaction illusion model (SIIM) by Szczuka and colleagues [15]. The model is based on the media equation theory and the suspension of disbelief, both of which can help explain why people interact with artificial entities within intimate settings. In terms of the media equation theory, numerous empirical studies have already shown that people respond to media that meet the criteria of natural language, interactivity, and social role similarly to how they respond to contact with another person [16••]. Moreover, the so-called suspension of disbelief could play a role, especially in short-term interactions [17]. This theory originates from theater and film research and states that people quite willingly engage with fictional content for the sake of entertainment or a hoped-for benefit and block out cues of artificiality for the moment (cf. the reception of fictional stories, which are not constantly scrutinized for their degree of reality).

Impressions of Human Authenticity and Uniqueness as Important Variable for Empirical Studies

While research in the field of digitized intimacy and sexuality is still underrepresented, there are a few empirical studies on how people respond to intimate interactions with communicative assistants. An empirical study by Szczuka experimentally investigated social responses to a flirtatious voice assistant (including sexualized communication) [18••]. In this study, heterosexual male participants were exposed either to the flirting voice assistant or to the same content from another human in the form of voice messages, as they can be sent via state-of-the-art messenger applications. The results demonstrated that the voice assistant aroused more interest than the human messages (one explanation could be a novelty effect), yet the humans were perceived to be more flirtatious. Similar patterns are found in other comparative studies of digitized intimacy/sexuality with artificial counterparts. Motivational and intentional processes are experienced differently in humans than in artificial interaction partners; in this case, this concerns flirtatious behaviors. In humans, this behavior is characterized by motivations and intentions that can be driven by the self (e.g., a boost of self-esteem) or the wish to contact another person. These distinctive states are based on consciousness, which can currently only be imitated but not authentically implemented in machines. More research is needed on whether there is a threshold, shaped for instance by contextual or personal factors, that influences the degree to which users accept the imitation of distinctive motivations and emotions.

Based on the SIIM, it is imaginable that the mere imitation will, at some point, underline the artificiality of a technological system as a form of disbelief in the authenticity of the communicated content. A similar effect is presented in the science fiction movie Her [7]. Throughout the movie the protagonist is confronted with the reality that the voice assistant that imitated all different aspects of an intimate relationship does not only operate on his device but that multiple other users also use the algorithm. This emphasizes the persona’s artificiality, which makes the user recognize that the relation is neither unique nor authentic.

Yet, this does not mean that humans will not engage in such interactions at all. Accompanying sexual arousal may direct cognitive capacities precisely toward need satisfaction rather than toward cognitions that might negatively affect the situation. Such cognitions include, for example, considerations of whether such an entity consciously decides to engage in a sexualized interaction or whether a sexualized interaction with a voice assistant might deviate from certain norms [15]. Research on hyper-realistic sex dolls, including users who use technological “extensions” such as the smartphone applications mentioned earlier to communicate with the persona, agrees that users are not under an illusion when using them and understand that their interaction partners are not existing persons. This is evident, for example, in interview studies in which “it” is mainly used as a pronoun despite strong personification [19, 20].

However, Pentina and colleagues conducted a mixed-method study on communication with the companion application Replika. Their findings supported the notion that the perceived social qualities of the technology, in combination with the user’s motivation, are vital in developing an attachment to the technology [21••]. More research needs to be done to understand motivational factors and their potential power to suspend impressions of the artificiality of the technologies or, more generally, cognitions centering around the systems’ artificiality.

Research Gaps: Long-term vs. Short-term Interactions, Relevance of Authenticity/Uniqueness, and Temporary Benefits as Transitional Objects

The present state of theoretical research indicates variances between short- and long-term associations with artificial entities, though further empirical investigation is required to strengthen this understanding [22]. Short-term interactions might be an excellent outlet to explore and/or act out social and sexual needs in a controlled environment. Nevertheless, prolonged interactions over extended periods may be associated with unique difficulties. This is the case for users who actively establish and sustain a lasting bond with an artificial entity, requiring them to suppress or handle indications of artificiality actively.

Moreover, the technology involved inevitably exposes its artificial nature during repeated interactions. Dialogue systems may, for instance, verbally express implemented desires (which are based on what other people labeled as a desire), such as the wish to build a shared future with the user. Still, when it comes to practical implementation (e.g., creating a family or a shared living space), users might quickly be confronted with the limits of such a future in practice simply because resources and capabilities are lacking (e.g., financial resources or physical abilities).

However, research indicates that (temporary) relationships to artificial companions (embodied in terms of sexualized robots, but also non-embodied conversational agents such as Replika AI) can potentially benefit different users. These artificial companions can, for instance, function as a form of transitional object to overcome difficult life stages (e.g., difficult breakups or severe loneliness) [19, 23]. More research is needed to understand the use of technologies only for a temporary life phase (e.g., accompanying therapy concepts as described in [19]).

Empirical Insights on Non-arousal Activities Through Voice: Sexual Health Communication

Usage of Voice Assistants for Sexual Health Promotion

Initial studies have explored the potential of voice-based conversational agents in the context of mental health [24, 25]. However, the current research on conversational agents for sexual health is predominantly focused on text-based platforms [11, 26, 27]. Consequently, investigations into using voice assistants for sexual health advice are limited. Thus far, they have primarily addressed specific facets of sexual health communication (e.g., particular diseases) and compared different modalities.

For instance, Wilson and colleagues examined the efficacy of smartphones and their digital assistants, specifically Siri and Google Assistant, in providing accurate sexual health advice. They compared the results with a laptop-based Google search [28]. The findings revealed that the laptop-based Google search outperformed both voice assistants, with Google Assistant’s performance superior to Siri’s. Although the smartphone assistants understood the questions the authors addressed, they either dismissed necessary inquiries related to general health or did not provide appropriate information. A significant implication of this study is that attempts to use voice assistants for sexual health in the “real world” could be worse than in the conducted research due to influences of slang words or colloquialisms and accents.

Another example is a study by Napolitano and colleagues that concentrates on the ability of voice assistants to recognize and answer questions about male sexual health concerning different subjects with high prevalence among men, such as erectile dysfunction, premature ejaculation, or male infertility [29•]. Consistent with the findings of Wilson and colleagues, overall, the answers given by the voice assistants were classified as intermediate-low quality [28], [29•]. The study, however, underlines that different companies provide answers of differing quality and can therefore create knowledge gaps that are based on a system’s capability to react to users’ needs.

Since no human is involved in the interactions between voice assistants and a patient, voice assistants can represent an accessible and anonymous technology that can be particularly useful in health-related areas associated with stigma. This extends not only to acquiring knowledge but also to managing specific medical conditions. One example is the human immunodeficiency virus (HIV), as HIV is among the eight most prevalent sexually transmitted infections (STIs). Consequently, there is a high interest in the potential of DHIs in this field. While studies have previously focused on evaluating the initial effectiveness of text-based chatbots (e.g., [30, 31]), according to Garett and Young, there is currently no available research investigating the utilization of voice-based conversational agents for pre-test counseling to encourage HIV testing [12••]. However, they point out the potential conversational agents have in this field to alleviate specific challenges faced by high-risk individuals, such as experiencing stigma or limited test kit access. They can be programmed to provide accurate information about available tests, enabling users to compare and select the most suitable option.

Furthermore, voice-based conversational agents can serve as reliable sources of scientifically grounded information, counteracting for instance prevalent HIV-related misconceptions on the internet and social media [12••]. Nevertheless, it is essential to acknowledge that their investigation solely focuses on the theoretical potential of voice assistants in this particular context. Considering the findings of Wilson and colleagues and Napolitano and colleagues, it is debatable whether voice assistants are presently serving as a reliable and accurate source of scientifically correct information [28], [29•].

The Influence of Modality on Sexual Health Communication

As stated earlier, using speech to interact with voice assistants represents a distinctive characteristic that enhances their accessibility [4]. However, there is a lack of research addressing the appropriateness of employing voice assistants in a domain as sensitive and often stigmatized as sexual health. It is debatable whether the modality, i.e., voice usage, is the best choice for gathering sexual health advice. In this context, a text-based conversational agent might be a more comfortable choice for most potential users. Cho investigated whether user perceptions are influenced by the modality (voice vs. text) and device (smartphone vs. smart home device) when attempting to access sensitive health information from voice assistants [13••]. Participants were instructed to ask less and more sensitive health questions. The high-sensitivity information condition concerned sexual health questions like “Do you have to have sex to get an STD?” or “Do condoms affect orgasms?”. The findings suggest that voice interactions can enhance positive evaluations of the agent due to the perception of a social conversation between the user and the voice assistant. However, these patterns were only observed when the requested information was less sensitive and the participants reported low privacy concerns.

Research Gap: High Sensitivity Information and Privacy

Contrary to the assumption that voice interactions would reduce the social perceptions of a voice assistant in highly sensitive contexts involving information sensitivity and privacy concerns, the study by Cho [13••] observed that text interactions had a significant impact on perceived social presence, similar to voice interactions. That implies that in sensitive settings, interactions with voice assistants become more engaging and stimulating for users, thereby enhancing the social presence of the assistant regardless of the interaction modality [13••]. There is evidence that individuals exhibit a higher level of comfort when disclosing personal information to voice assistants that utilize a human-recorded voice without a visible face (i.e., disembodied), as opposed to an agent with synthetic facial features that simulates a human appearance [16••]. A systematic review analyzing the effectiveness and acceptability of conversational agents for sexual health promotion by Balaji and colleagues, however, noted that social presence was only found to be adequate in half of the systems studied, possibly due to the utilization of multimedia and embodiment through avatars [11]. Incorporating social presence into a system often praised for facilitating anonymous and non-human interactions raises an intriguing challenge, highlighting the necessity to strike a delicate balance. Although social presence has been implicated in the user acceptance of conversational agents in other domains, its specific impact in the sexual health field remains to be elucidated.

Discussion

It became evident that voice is utilized differently in arousal-oriented interactions than in sexual health communication, which underlines the importance of the taxonomy provided by Döring and colleagues [6••]. While voice might be an essential feature in social settings to convey impressions of human likeness, it functions as a user-friendly interface for inquiries made in health communication. Table 1 summarizes relevant work that uses voice in sexuality-related technologies.

Table 1 Overview of relevant theoretical and empirical studies

The table demonstrates that the primary research emphasis often does not revolve around the utilization of voice. This contributes to the existence of numerous gaps in research. While specific gaps have already been tackled in the sections discussing arousal-oriented interactions with sexualized voice technologies and health communication, broader areas also require research, which will be further detailed ahead.

Examining Modalities: Voice- vs. Text-Based Conversational Agents

One important research gap concerns the comparison of modalities (voice- vs. text-based conversational agents) and, consequently, the applicability of voice assistants for arousal-oriented and sexual health communication. As previously stated, the research on the impact of modality on user perceptions remains insufficiently explored in the field of sexual health. Similarly, a knowledge gap exists in the realm of intimate communication. Therefore, it is crucial to investigate the role of modality to uncover the specific contribution of voice in close social interactions and knowledge acquisition. This approach can provide a comprehensive understanding of the significance and impact of voice within these domains. Taking this one step further, comparing voice assistants and embodied agents is also necessary. For example, Dworkin and colleagues investigated the application of embodied conversational agents for HIV medication adherence in young HIV-positive African American men [32]. Embodied agents exhibit the potential to function as customizable relational agents that can foster a socio-emotional connection with the user. Those agents can effectively facilitate education, encourage positive behavior change, and enhance user engagement through various modalities, including audio, graphics, animation, and text [32]. On the other hand, existing evidence suggests that individuals prefer to disclose personal information to disembodied conversational agents [16••]. Based on these findings, it can be postulated that voice, as a human-like cue, may hold greater relevance in facilitating intimate social interactions than knowledge acquisition in sexual health.

Diverse (and Inclusive) Perspective on Sexualized Interactions with Artificial Voices

In the following, an outlook on important but underrepresented perspectives in the usage of voice technologies will be provided. Firstly, research on the arousal- and non-arousal-oriented usage of voice assistants does not sufficiently address more diverse and inclusive user groups, although the technology has the potential to be used by non-heteronormative groups.

One already discussed yet under-researched aspect is the gendering of the artificial voice: even when a voice is synthetic, and even when it is ambiguous, users tend to gender it [33]. In line with media equation theory, this can activate gender stereotypes [16••], affecting, for instance, the system’s trustworthiness (e.g., [34, 35]) or likeability [36]. Therefore, the gender perception of the voice has a high relevance for sexual and intimate interactions, as well as for non-arousal activities. Given that voice assistants are utilized by individuals of all genders, it is imperative that representation in voice assistants reflects this diversity. However, this is not yet the reality. The majority of voice assistants on the market have a feminine name [37] and are represented as feminine [38]. This contributes to the potential replication of stereotypes, as the mostly female voice fulfills the social role of a servant, which in some cases is heavily sexualized and degraded [39]. For example, Amazon’s voice assistant system Alexa consistently assumes a subservient role. The system establishes and sustains a link between women and obedience. Among other factors, this leads to sexual harassment within interactions with voice assistants [40, 41]. Therefore, several studies investigate how conversational technologies respond to sexual harassment and verbal abuse (e.g., [40, 42]). While this can also be a way of playing with the system (testing out what is not accepted under the social norms of interpersonal interactions with other humans), it is nevertheless an observed behavior that needs to be researched longitudinally, especially in the realm of arousal-oriented tasks. While a more optimistic view conceptualizes the female artificial voice as the new superpower that is able to answer all imaginable questions within milliseconds, it still needs to be highlighted that this might be more important in the realm of non-arousal-oriented tasks.
While first voice assistant systems are equipped with the option to switch to a masculine artificial voice [33], a notable gap exists in the availability of voices that encompass the complete spectrum of gender presentations, including non-binary or gender-ambiguous voices [37].

Diversity is also a question of the data a system is trained with. Extensive research has underscored both explicit and implicit biases present in algorithms and datasets, which are used to train NLP systems, concerning gender [43, 44, 45], race (leading to discrimination and racism, e.g., [46, 47]), age (leading to ageism [48]), as well as their intersections [46, 49]. The predominant focus has been on gender stereotypes, harassment, and offensive language, particularly emphasizing restricted and/or unfavorable associations with femininities and individuals identifying as genderqueer [50]. Yet, according to Seaborn and colleagues, a disparity is evident in the way the “gender problem” is conceptualized, as most of the present work is guided by a sex/gender binary model of male and female. Therefore, they propose a more comprehensive examination of masculine biases and gender to expose imbalances and disparities in voice assistant-oriented NLP datasets [50].

When talking about more diverse and inclusive user groups, it is important to highlight people of different ages, sexualities, and users that are not in line with affordances of heteronormative user groups, such as people with physical or intellectual disabilities or medical conditions.

Adolescents, who often seek health-related information online, represent a significant user group. A study by Rideout and Fox found that approximately 87% of Americans aged 14 to 22 use online platforms for health-related data [51]. Additionally, the internet is a valuable resource for young individuals exploring their sexuality, particularly for LGBTQAI+ youth seeking connections with like-minded individuals [52, 53]. Moreover, the growing group of older adults exploring the possibilities the digital world offers for sex and love is another significant user segment. A recent study revealed that the online activities among older adults can be categorized into three groups: non-arousal activities (e.g., visiting educational websites or chatting on dating platforms), solitary arousal activities (e.g., watching pornography), and partnered arousal activities which involve at least one other individual (e.g., engaging in webcam sex) [54, 55]. Hence, older adults are utilizing the digital space in domains that align with the application areas discussed in this work. Yet, there exists a significant gap in understanding the potential role of voice for this user group. Voice-only technologies may offer relief for older individuals who, for example, have problems with typing. However, they could also pose a challenge for some who are unfamiliar with communicating with technical devices through speech or expressing intimate thoughts aloud as commands to a voice assistant.

Additionally, voice-only technologies offer a new potential from an accessibility perspective, as individuals experiencing motor, linguistic, and cognitive impairments can engage effectively with voice assistants, given they possess particular levels of remaining cognitive and linguistic abilities [56]. Studies show that despite the presence of certain accessibility challenges, individuals with a range of disabilities already utilize state-of-the-art voice assistant systems. This usage extends to unexpected scenarios such as speech therapy and providing support for caregivers [57]; it is therefore also imaginable that sexualized interaction could be of benefit for these user groups.

The Potential of Personalized LLMs

As LLMs improve, personalization of voice interactions is evolving to become a key factor in satisfying users’ expectations for customized experiences that correspond to their individual needs and preferences [58]. Stronger personification can, for instance, affect how users align their speech models with a system [59] or the level of parasocial relation they form with the voice [59], and can consequently affect variables such as heightened preference, trust in the agent [60], and smoother conversations [61, 62]. In terms of arousal-oriented interactions, this might even be the key to ongoing dialogues and, in some cases, even romantic connections. And while personalized LLMs are likely to enhance usage among heteronormative user groups, they will be especially important for user groups with more diverse affordances.

One conceivable example concerns users with hearing or speech impairments, as well as neurodivergent people, for whom adjustments such as altering the voice assistant’s speed or using appropriate language could enhance the overall user experience, for both arousal and non-arousal applications.

Psychological Aspects and Ethical Considerations

However, there is a demand for further research to explore psychological components related to voice assistants, encompassing their capabilities and quality and factors like acceptability, trust, and trustworthiness. Previous studies on conversational agents in health care have highlighted various limitations. These include privacy concerns, limited conversational responsiveness, user-perceived undesirable personality (e.g., rude, lack of sympathy, patronizing, or judgmental), and a lack of trust in the creators of the digital assistants [12, 63]. It must be noted that providers of such technologies bear immense social responsibility; users can form deep relationships with artificial communication partners, and sudden updates or discontinuation of specific services can lead to severe social reactions (cf.: changing the offer of romantic partner mode in Replika and reports from users that went as far as suicidal thoughts [64]). However, it is essential to highlight that voice assistants in particular are heavily controlled by providers. The applications offered by the most widely used devices are subject to the rules of the respective providers, and these are often heavily regulated, especially in the area of sexuality. It, therefore, remains to be seen to what extent dedicated devices will be developed for intimate interactions with voice assistants or whether greater reliance will be placed on voice interaction via smartphones. This responsibility also extends to the realm of sexual health communication, encompassing social implications and considerations regarding privacy and data security. Previous studies have identified data privacy and confidentiality concerns as obstacles to the regular use of virtual assistants, particularly in contexts involving sensitive information, such as health contexts [65, 66]. 
Thus, it is reasonable to posit that these factors also play a significant role when addressing inquiries on sexuality, which involve deeply intimate subject matter and are widely regarded as highly sensitive.

Moreover, handling sensitive data brings the notion of trust into focus. As a result, questions emerge regarding the trustworthiness of these systems and the features they possess, which can influence the willingness of patients to trust them within a medical setting [67]. These aspects also connect to the abovementioned point, as certain user groups, such as children and adolescents, require particular attention, especially concerning data security. Because a substantial proportion of children aged two to eight already engage with voice assistants daily, it is essential to consider the potential and the risks associated with utilizing voice assistants for sexual health education and intimate communication in general [68]. Particularly in this context, it is crucial to comprehend the mechanisms involved in data storage and access, as research conducted by Szczuka and colleagues demonstrated that such understanding is negatively associated with children’s inclination to disclose private information to a voice assistant. More precisely, the language employed by the voice assistant in their study (e.g., using the phrase “I am silent as a grave” when asked about entrusting a secret) holds the potential to prompt children to disclose sensitive information, which can subsequently be accessed by unauthorized individuals [69]. As the act of safeguarding information is integral to the process of identity formation, this naturally includes highly sensitive details concerning body perception and sexuality. Voice assistants that are used in a sexuality-related context should therefore disclose their data policies and operate in the users’ best interests. This could, for instance, be realized by providing easy options for data deletion, straightforward access to a data management system that outlines authorized access, and perhaps even preventive prompts that remind users of privacy considerations.
To achieve this, however, it would also be important to have adaptive systems that recognize users and their different affordances, a capability that still needs to be implemented in state-of-the-art systems.

Limitations

Due to the limited empirical literature, this article can only be considered an overview of the use of voice as a medium for social interactions involving sexuality-related artificial entities and for sexual health-related communication. However, we believe that the findings of this work can serve as a valuable resource for researchers, informing future work to broaden the presently limited scope of empirical knowledge in this field.

Conclusion

Empirical research on arousal-oriented interaction with sexualized voice technologies and on non-arousal activities through voice is still underrepresented. Although initial studies suggest that voice assumes a crucial role in establishing connections with artificial agents by offering significant human and, consequently, social cues, findings concerning the significance of voice in health-related contexts remain varied. These results tend to lean towards perceiving voice more prominently as an interface type. Because natural language processing is rapidly becoming more sophisticated, voice will likely play a more critical role in the broader field of sex-related technologies in the near future. The present article outlines pertinent gaps in research to spark future studies aimed at gaining deeper insights into the significance of voice in the realm of technology associated with sexuality.