Introduction 

The cost of randomized clinical trials (RCTs) has increased over the past several decades. Consequently, multiple strategies have been explored to reduce the costs associated with these studies [1]. Artificially intelligent interactive voice response (AI-IVR) systems such as Cortana, Siri, and Amazon Alexa have proliferated as a means to reduce overall expenditure and researcher workload. These technologies have been shown to facilitate data analysis and electronic data capture [1, 2]. Despite these benefits, the real-world acceptability of these systems for research purposes remains poorly understood, particularly in cardiovascular populations. To address this knowledge gap and highlight opportunities for optimization, the objective of this study was to evaluate people’s attitudes and perceptions of AI-IVR systems when used for screening and electronic data capture, using data from the Voice-Based Screening For SARS-CoV-2 Exposure in Cardiovascular Clinics study.

Methods

Study Design

The design of the Voice-Based Screening For SARS-CoV-2 Exposure in Cardiovascular Clinics study has previously been reported [3]. In brief, the study aimed to evaluate the reliability of electronic data capture with the AI-IVR system Amazon Alexa. Participants entering cardiovascular clinics were approached sequentially to participate in the study. Individuals who consented were screened for SARS-CoV-2 symptoms and risk factors by a research assistant and the Amazon Echo Show 8 device in a non-randomized cross-over design. Participants had the option to participate in the Amazon Alexa SARS-CoV-2 screening process in English or French. Amazon Alexa was selected as the investigational device because of its commercial availability and flexibility for creating data capture fields. The current study is a post hoc analysis of a post-screening survey that elicited participants’ perceptions about the use of Amazon Alexa for research and clinical use.

Data Collection

Participants who consented to participate in the post-screening survey were invited to provide verbal comments in French or English about the screening process in a non-structured interview with a research volunteer. The comments were then typed by the research volunteer ad verbum. This study was approved by the local ethics board at the McGill University Health Centre.

Data Synthesis

Thematic analysis was performed on the collected data using a systematic strategy proposed by Nowel et al. [4]. First, qualitative coding was performed on the ad verbum comments (e.g., “Alexa [had] trouble hearing”) to translate them into their intended meanings or concerns (e.g., there were challenges with communication). Then, thematic analysis was conducted inductively on the translated comments to produce overarching thematic domains reflecting individuals’ attitudinal relationship with the AI-IVR system. Theme prevalence was calculated by using the number of individuals who responded to the post-screening survey as the denominator for each theme. This process was performed by E.G. and independently validated by A.R. Disagreements were resolved by consensus or by a third reviewer (A.S.) as needed. The thematic trail is available in Appendices A to C.

Results

Overall, 215 people (mean age 46.1; 55% female) consented to participate in the Voice-Based Screening For SARS-CoV-2 Exposure in Cardiovascular Clinics study. Among these individuals, 31% spoke French and 47% worked in the hospital where the study was conducted. In total, 58 (27%) of the 215 individuals consented to participate in the post-screening survey. From these participants, 73 ad verbum comments were obtained, yielding an average of 1.3 comments per participant. Of the 73 comments, 1 was removed because it did not provide attitudinal data regarding the use of AI-IVR.

Following thematic analysis, a total of four key themes affecting the acceptability of AI-IVR systems were identified (Fig. 1). These were difficulties with communication (44.8%, n = 26/58), preferences towards human interaction (41.4%, n = 24/58), concerns with universality and accessibility (27.6%, n = 16/58), and barriers with the development of therapeutic relationships (i.e., therapeutic alliance; 8.6%, n = 5/58). These attitudinal themes and their definitions are described in detail in Table 1.
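The prevalence figures above can be reproduced from the reported counts. The following is a minimal sketch of that arithmetic, assuming the denominator rule stated in the Data Synthesis section (survey respondents, not comments); the short theme labels, variable names, and the `prevalence` helper are illustrative and are not part of the study's analysis materials.

```python
# Theme counts as reported in the Results. The denominator is the number of
# post-screening survey respondents (58), not the number of comments (73).
RESPONDENTS = 58

# Short labels are illustrative shorthand for the four reported themes.
theme_counts = {
    "communication": 26,
    "interaction": 24,
    "universality and accessibility": 16,
    "therapeutic alliance": 5,
}


def prevalence(count: int, denominator: int = RESPONDENTS) -> float:
    """Return the percentage of respondents who raised a theme, to one decimal."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100 * count / denominator, 1)


theme_prevalence = {theme: prevalence(n) for theme, n in theme_counts.items()}

for theme, pct in theme_prevalence.items():
    print(f"{theme}: {pct}% (n = {theme_counts[theme]}/{RESPONDENTS})")
```

Because the theme counts sum to more than 58, some respondents contributed to more than one theme, so the percentages are not expected to sum to 100%.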

Fig. 1 Attitudinal themes affecting the acceptability of artificially intelligent interactive voice response systems in cardiovascular research settings

Table 1 Summary of users’ perceptions and attitudes towards the use of AI-IVR systems for electronic data capture

Within the communication theme, users frequently reported issues with the device’s ability to receive (29.3%, n = 17/58) and produce speech (15.5%, n = 9/58). In the interaction theme, participants reported that the absence of certain features (e.g., the absence of a sound to prompt users to answer a question) rendered the device difficult to use (32.8%, n = 19/58). In addition, within the interaction theme, some people reported technical difficulties, such as the device unexpectedly ceasing to function (8.6%, n = 5/58). The universality and accessibility theme encompassed the device’s ability to accommodate user-specific constraints; the most frequent comments related to the device’s time-consuming screening (15.5%, n = 9/58), followed by concerns with accessibility for different populations (6.9%, n = 4/58) and question flexibility (5.2%, n = 3/58). Finally, the therapeutic alliance theme related to users’ perception of the device’s ability to establish a clinical rapport. Within this theme, users reported a preference for human interaction (8.6%, n = 5/58).

Discussion

This study was designed to evaluate individuals’ perceptions and attitudes towards AI-IVR systems in a real-world setting when used for screening and electronic data capture. A thematic analysis of the obtained comments was conducted, and a total of four themes were identified. The most frequently reported attitudes affecting acceptability related to participants’ communication with Amazon Alexa. Participants frequently expressed dissatisfaction with the device’s ability to readily understand answers and with the device’s absence of certain features (e.g., audio prompts signaling that a response is expected), which rendered the device moderately difficult to use. Participants also frequently commented on accessibility and on the perceived time-consuming process of AI-IVR screening when compared to human-led screening. Finally, in the theme of therapeutic alliance, some participants noted a preference to undergo screening by a human.

One of our key findings was that issues with communication were the most frequently reported theme with the use of AI-IVR in cardiovascular clinical contexts. To date, the use of AI-IVRs in medicine has shown benefit in research and clinical practice [5,6,7,8]. However, these studies have evaluated the use of these devices in controlled clinical settings. In the real world, clinical settings may not always be ideal for the adequate functioning of AI-IVR technologies. Several studies have noted that noise levels in hospitals exceed WHO recommendations [9]. Furthermore, there may be a high prevalence of voice-altering disorders in hospitals due to comorbidities that affect individuals’ ability to produce speech [10]. Additionally, participants in the study wore N95 or surgical masks during screening by Alexa as mandated by hospital policies. It has been noted that the use of N95 masks resulted in a significant decrease in speech perception [11]. As a result, these devices should be used in specific settings where ambient noise levels are optimal and, if possible, in the absence of masking. For instance, the use of AI-IVR devices could be limited to quiet hallways, hospital alcoves, or private rooms. Moreover, screening with AI-IVR could be done with carefully selected populations by excluding people with hearing deficits or voice-altering comorbidities.

The current study also showed that concerns with interaction were a frequently reported theme affecting acceptability. Detailed demographic data for participants in the current study are not available. However, individuals of older age are more likely to utilize the healthcare system [12] and certain studies have associated older age and baseline cognitive ability with the decreased capacity to use novel technologies such as AI-IVR [13]. In this sense, demographic data and background medical histories such as age and neurological comorbidities may be important in further understanding interaction limitations within subgroups. Modifications to the device’s functioning based on comments within this theme may improve usability. More specifically, users reported that auditory feedback prompting an answer may facilitate interaction with AI-IVR systems. Additionally, certain subjects noted the need to reboot the device. Feedback loops could consequently be programmed to recognize faulty screening processes, enabling automated reboots. These self-correcting mechanisms could enable patients with relatively lower technological abilities to interact with AI-IVR systems more easily.

In parallel, survey participants often expressed concerns about the universality and accessibility of AI-IVR systems. In particular, 15.5% (n = 9/58) of people thought the AI-IVR screening process took longer than the human-led one. Concerns about the accessibility of AI-IVR screening in certain demographics (e.g., the elderly; 6.9%, n = 4/58) as well as the screening procedure’s flexibility (i.e., question formulation; 5.2%, n = 3/58) were also expressed. As a result, future research may aim to compare human and AI-IVR screening times, as well as whether AI-IVR screening should be restricted to particular groups. Nonetheless, adding features such as speeding up Alexa’s speech, providing a visual interface with written queries, and options to reformulate screening questions may improve AI-IVR acceptability.

The therapeutic alliance theme was the least frequently reported limitation in this study. Participants with comments on this theme specifically noted preferring human interactions with no specifiers. A study looking at AI-based chatbots in healthcare among individuals with a mean age of 30 identified several shortcomings with this technology. In this study by Nadarzynski et al., users reported preferring human interaction as chatbots were perceived as unable to display empathy or understand emotional issues [14]. Moreover, it is unknown if the Echo Show 8 device is able to address these shortcomings with more sophisticated programming. Current research suggests that AI is capable of emulating human-like behavior [15]. Continued development of AI-IVR systems with more sophisticated programming could address this limitation. Finally, analysis of willingness to undergo AI-IVR screening among participants who prefer human interaction may highlight the relative importance of therapeutic alliance when electronic data capture is the goal.

Limitations

This study had several limitations. The post-screening survey did not collect language preference or demographic (e.g., age, sex, race) information on the individuals who consented to participate in the post-screening survey. As a result, it was not possible to conduct subgroup analyses and delineate language or demographic-based differences in attitudinal characteristics. Moreover, quality-control measures (e.g., duration of screening, ambient noise levels) were not implemented; such measures could permit more quantitative comparisons between AI-IVR and human screening. In parallel, the addition of qualitative measures of willingness to undergo screening by AI-IVR systems despite perceived limitations could prove relevant prior to the widespread implementation of these devices. Results obtained based on stratified data and quantitative measures will help guide required improvements in AI-IVRs with the aim to reduce perceived limitations. Finally, most participants in the study wore masks, which could have affected the prevalence of reported themes affecting acceptability. In future studies, subgroup analysis based on demographic data, spoken language, and the use of a mask may help further clarify the impact of these variables on communication limitations.

Conclusion

This study was designed to evaluate people’s perceptions and attitudes towards AI-IVR systems when used in cardiovascular settings for electronic data capture and research screening. The study suggests that AI-IVR systems were generally well-accepted and that people often responded favorably. However, the findings also demonstrated that communication issues were the most prevalent theme affecting acceptability, followed by interaction preferences, universality and accessibility, and therapeutic alliance. Prior to increased implementation of AI-IVR in cardiovascular research settings, future studies will need to address these thematic domains to facilitate widespread implementation.