1 Introduction

The development of new devices, such as smart speakers or wearables, and recent advances in artificial intelligence (AI) that facilitate more natural interactions via speech or gestures are changing the interplay between user, task, and technology within information systems (IS). Today, people own multiple different devices, such as personal computers, smartphones, tablets, or smart speakers, and use them interchangeably, often switching between multiple devices in order to complete a task (Levin 2014; Westcott et al. 2020). In addition, many of these devices afford users new ways of interacting with them through touch, speech, or gestures (Turk 2014). For example, customers can shop at Amazon using multiple devices and multiple interaction modalities as well as combinations thereof (e.g., Amazon’s Echo Show devices combine speech interaction with a touchscreen display). The same trend can be found at the workplace, where employees can, for example, use natural language to interact with enterprise resource planning (ERP) systems (e.g., SAP CoPilot) or business intelligence and analytics (BI&A) systems (e.g., Tableau Ask Data).

This trend has not gone unnoticed by market research firms, which recently introduced the term multiexperience (MUX) as a key area of strategic importance in the coming years (Gartner 2019). There is also a plethora of existing research related to MUX that provides insights into how users interact with IS across different devices and modalities and how to design for MUX (Brudy et al. 2019; Li and Zhang 2005; Turk 2014; Zhang et al. 2009). However, what is new today is that the sheer number of available devices and mature modalities presents an opportunity – and challenge – to better meet users’ needs and preferences when they interact with an IS to perform tasks. Similar tasks can be performed quite differently and may result in different outcomes, depending on the devices used (e.g., smartphone vs. smart speaker) and modalities available (e.g., clicking on a screen vs. speech input) (Diederich et al. 2020; Rzepka et al. 2020a). Thus, there is a need to improve our understanding of the nature of MUX and the roles of devices and modalities within IS. Considering that many devices (e.g., virtual reality headsets) and modalities (e.g., speech interaction) are now beginning to reach a level of maturity which allows widespread application, we believe the time is ripe to (re)define the concept of MUX.

The goal of this catchword is to build a bridge between the new term of MUX and existing research in our field in order to provide a solid conceptual grounding for MUX and identify future research opportunities for the BISE community. Drawing on the rich body of research on multi-device and multimodal IS in the fields of IS and human–computer interaction (HCI), we propose a clear conceptualization of MUX and provide a framework of three guiding paths toward MUX that may be equally useful for researchers and practitioners in the BISE community. Additionally, we describe several real-world examples to explain the benefits and challenges of each path in our framework and offer practical guidance for moving along these paths toward MUX. Finally, we outline promising areas for future research and highlight the unique position of the BISE community in capitalizing on these opportunities.

2 Conceptual Foundations: Multi-Device and Multimodal Information Systems

In the early days of personal computing, researchers and practitioners primarily focused their efforts on a single device – the personal computer (PC). However, as early as 1991, Mark Weiser shared his vision of a world in which people can interact with content across multiple computing devices in different shapes and sizes (Weiser 1991). This idea served as inspiration for research that went beyond a single user at a single computer (Brudy et al. 2019). A classic example is Rekimoto’s seminal work from the late 1990s on interaction techniques that crossed device boundaries of multiple portable computers and displays on table/wall surfaces (Rekimoto 1997; Rekimoto and Saitoh 1999). Around the same time, commercial products such as the first BlackBerry device were introduced that offered new ways to do everyday tasks that would typically be done on a computer in an office (Lyytinen and Yoo 2002). For example, people could not only send and receive emails but also synchronize their schedule, tasks, and contacts with their PC – something that is commonplace nowadays but represented a major leap forward at that time. Since the advent of mobile devices in the 1990s in general, and smartphones in the 2000s in particular, the number and diversity of devices have grown significantly. In addition, many companies allow employees to use private devices for work purposes, and vice versa (Köffer et al. 2015). Today, the most popular devices range from PCs, smartphones, tablets, and TVs to smart speakers, smartwatches, and augmented reality (AR) or virtual reality (VR) devices (GlobalWebIndex 2020). Each device is characterized by certain display capabilities (e.g., screen size), processing power, input/output modalities, and sensors (Levin 2014). Furthermore, people use devices very differently than they did 10 or 20 years ago.
Since many tasks span multiple devices (Dearman and Pierce 2008), people use several devices simultaneously and switch between them to complete a single task (Brudy et al. 2019). For example, Netflix’s “Continue Watching” feature allows customers to start watching a movie on one device (e.g., a TV in the living room) and continue watching on another device (e.g., a smartphone or tablet) while commuting or waiting for an appointment (Netflix 2013). However, there are also several technical challenges associated with multi-device IS, particularly when it comes to sharing information and keeping it consistent across multiple devices (Dong et al. 2016). Commercial solutions, for example, are often limited to devices within a particular manufacturer’s ecosystem (e.g., Apple), and there are only a few open standards that support the integration of multiple devices (Brudy et al. 2019). Nonetheless, current trends suggest that researchers and practitioners can no longer consider the PC, smartphone, or any other device as a standalone platform, but need to understand and design for use patterns across multiple devices (Levin 2014).

An important distinguishing feature of the aforementioned devices is that they offer users a wide variety and diversity of interaction modalities (hereafter referred to as modalities for simplicity). Modality broadly refers to the type of communication channel used to convey or acquire information (Nigay and Coutaz 1993). This includes both input modalities (i.e., users providing data to the system) and output modalities (i.e., users receiving data from the system). Users provide input using their effectors (e.g., limbs, eyes, vocal system, head) and perceive output through their five senses (i.e., sight, hearing, touch, smell, and taste). For example, in the interaction with a PC, users primarily provide input through typing on a keyboard or mouse clicks using their fingers (i.e., limbs). Further input modalities, such as speech, mid-air gestures, and eye gaze, that leverage other effectors are possible but less common today. In terms of output modalities, users primarily rely on their visual sense since most applications on a PC feature a graphical user interface. Secondary output modalities that leverage other senses, such as audio output (e.g., “beeping” when an error occurs), may also play a role. Table 1 provides an overview of different categories of devices and their input as well as output modalities.

Table 1 Examples of devices and input/output modalities

Although humans employ multiple senses and effectors to interact with the world around them, research in the fields of IS and HCI has historically been focused on unimodal interaction (i.e., using only a single input and a single output modality) (Liu et al. 2019; Turk 2014). An example is the PC that displays text on a screen with a keyboard for input. However, advances within the AI subfields of natural language processing and computer vision as well as affordable sensor technology are paving the way toward more natural interactions that replace or complement traditional modalities (Turk 2014). Multimodality refers to the use of more than one input and/or output modality in the interaction (Nigay and Coutaz 1993). These modalities can be used simultaneously or sequentially during the interaction. For example, in the interaction with an Amazon Echo Show device, users can provide input via speech (e.g., “Alexa, what’s the weather like in Berlin today?”) and its touchscreen (e.g., touching a button) and receive spoken output (e.g., “The current weather in Berlin is …”) and visual output on the screen (e.g., a weather forecast). The key assumption is that “well-designed multimodal systems integrate complementary modalities to yield a highly synergistic blend in which the strengths of each mode are capitalized upon and used to overcome weaknesses in the other” (Oviatt 1999, p. 74). Indeed, research has shown that multimodality can lead to higher task performance (Lee et al. 2001) and improve learning outcomes (Suh and Lee 2005). As with multiple devices, the integration of input from multiple modalities is a key technical challenge (Reeves et al. 2004). The main reason is that due to the unique characteristics of each modality, there are no obvious points of similarity and therefore no straightforward ways to connect them (Turk 2014). Over the years, several so-called fusion approaches have been developed to address this challenge (Jaimes and Sebe 2007).
In general, fusion can be performed at different levels, ranging from feature level (i.e., integrating input signals) to higher semantic levels (i.e., integrating common meaning representations derived from different modalities) (for an overview, see Jaimes and Sebe 2007). Although significant progress has been made, technical challenges related to the integration of modalities remain and the development of fusion approaches continues to be an active area of research.
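
As a rough illustration of the two ends of this spectrum, the following minimal sketch contrasts feature-level fusion (joining the feature vectors of two modalities before a single joint interpretation) with semantic-level fusion (merging meaning representations that each modality produced on its own). The function names, the feature values, and the slot-based meaning representation are hypothetical and chosen for illustration only; they are not taken from Jaimes and Sebe (2007) or any specific system.

```python
from typing import Dict, List, Optional

Meaning = Dict[str, Optional[str]]


def feature_level_fusion(speech_features: List[float],
                         gesture_features: List[float]) -> List[float]:
    """Early fusion: concatenate per-modality feature vectors into one
    vector that a single joint model would then interpret."""
    return speech_features + gesture_features


def semantic_level_fusion(speech_meaning: Meaning,
                          gesture_meaning: Meaning) -> Meaning:
    """Late fusion: each modality was interpreted separately; slots left
    open (None) by speech are filled from the gesture interpretation."""
    fused = dict(speech_meaning)
    for slot, value in gesture_meaning.items():
        if fused.get(slot) is None:
            fused[slot] = value
    return fused


# Hypothetical input: the user says "order this" while pointing at a product.
speech = {"action": "order", "object": None}        # speech lacks the object
gesture = {"action": None, "object": "product_42"}  # pointing supplies it

fused_command = semantic_level_fusion(speech, gesture)
joint_vector = feature_level_fusion([0.2, 0.7], [0.9])
```

In this sketch, semantic-level fusion only needs the two interpretations to share slot names, whereas feature-level fusion would require a model trained on the joint vector, which is one way to see why the lack of "obvious points of similarity" between modalities makes integration difficult.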

To summarize the conceptual foundations of MUX, Table 2 provides an overview of the two research streams on multi-device and multimodal IS with key papers and exemplary artifacts.

Table 2 Overview of literature streams related to multiexperience

3 Three Paths Toward Multiexperience

Against the backdrop of the conceptual foundations, we define MUX as the user’s perceptions and responses resulting from the use and/or anticipated use of an IS that leverages multiple devices and/or multiple modalities. As such, it combines the terms multi (i.e., multi-device, multimodality) and (user) experience to account for the increasing prevalence of more than a single device and/or modality within an IS. New devices and advanced interaction modalities have certainly made scenarios a reality that belonged in the realm of science fiction just a few years ago (e.g., talking to an ERP system or wearing smart glasses that display information directly in the field-of-view). However, since both the IS and HCI fields have a long tradition of investigating how users interact with IS across different devices and modalities and how to design IS with multiple devices and/or modalities, we argue that MUX is not an entirely new phenomenon. What is new is that the sheer number of available devices and modalities today provides a greater opportunity and challenge to better meet users’ needs and preferences when they interact with an IS. In addition, many devices (e.g., virtual reality headsets) and modalities (e.g., speech interaction) are now beginning to reach a level of maturity that allows widespread application.

To shed a more nuanced light on the concept of MUX, we propose and describe a conceptual framework of guiding paths toward MUX. As depicted in Fig. 1, the framework conceptualizes MUX based on the two axes of devices and modalities, illustrating the shift from single to multiple devices and/or modalities. Drawing on previous research on multi-device and multimodal IS, we propose three paths toward MUX that differ not only in their reliance on multiple devices and/or multiple modalities, but also in their prevalence in prior literature. They are intended to serve as starting points for individuals and organizations who seek to proceed on the path toward MUX. In this spirit, our framework is meant to assist; not to constrain or suggest that other paths are not possible. There are four important points to be highlighted. First, our framework suggests that MUX is not achievable when there is only a single device and a single modality. Fundamentally, MUX requires at least two devices or two modalities, but not necessarily both (i.e., multiple devices and multiple modalities). Although many of today’s devices technically support multiple modalities, applications running on these devices do not always capitalize on this potential. For example, a smartphone-only application that only supports touch input and visual output would not be able to achieve MUX. Second, our framework allows for variance in MUX. Similar to the use of the UX concept, MUX can vary from low to high. Just because another device is supported or another modality is added does not automatically imply that MUX has improved. For example, most websites today can be accessed from multiple devices, but a website that offers the exact same layout and content across all devices would achieve rather low MUX (e.g., because text could be difficult to read on a smartphone or large images could result in slow loading times). 
In contrast, higher MUX could be achieved when the website is optimized for each device (i.e., using a responsive design) or dedicated apps are developed for smartphones and tablets. Consequently, the extent to which MUX is achieved also depends on how devices and/or modalities are integrated and allow users to transition from one device or modality to another during use. Third, there are different entry points to each of the three paths. In this sense, our framework is not bound to a specific sequence or set of devices and/or modalities to achieve higher MUX. For example, one company might enter the path to MUX by adding a mobile app to run alongside their website, whereas another might choose a very different path by adding speech interaction capabilities to their ERP system. Finally, our framework suggests that MUX can be achieved by moving along one axis – either vertically along Path 1 (devices) or horizontally along Path 2 (modalities) – or along both axes simultaneously. However, as we explain in the next sections, the greatest potential lies in Path 3, which leverages both multiple devices and multiple modalities rather than focusing on one dimension alone.
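
The two axes of the framework can be read as a simple decision rule. The sketch below is one hypothetical way to express it: the function name and the choice to count input and output modalities separately are assumptions made for illustration, following the definition of multimodality as more than one input and/or output modality.

```python
def classify_path(num_devices: int,
                  num_input_modalities: int,
                  num_output_modalities: int) -> str:
    """Map an IS onto the framework's axes (devices, modalities).

    Being on a path only says which axis was extended; it does not
    imply that the resulting MUX is high.
    """
    if min(num_devices, num_input_modalities, num_output_modalities) < 1:
        raise ValueError("an IS needs at least one device, input, and output modality")

    multi_device = num_devices > 1
    multi_modal = num_input_modalities > 1 or num_output_modalities > 1

    if multi_device and multi_modal:
        return "Path 3: multiple devices and multiple modalities"
    if multi_device:
        return "Path 1: multiple devices"
    if multi_modal:
        return "Path 2: multiple modalities"
    return "No MUX: single device and single modality"


# Smartphone-only app with touch input and visual output only.
smartphone_only = classify_path(1, 1, 1)
# One device that accepts speech and touch and returns speech and visuals.
smart_display = classify_path(1, 2, 2)
```

The rule deliberately returns only the path and not a MUX level, since adding a device or modality does not by itself improve MUX.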

Fig. 1 Conceptual framework of guiding paths toward multiexperience

3.1 Path 1: Leveraging Multiple Devices

Technological development has brought – and continues to bring – a constant stream of new devices to the market. Therefore, taking advantage of these new devices and the opportunities that they offer is a common path toward MUX. For example, starting in the 1990s, mobile devices became increasingly popular and today they are an essential part of our everyday lives. Many companies have developed mobile apps to complement existing desktop applications. However, particularly at the beginning, this often resulted in merely migrating functionality, content, and design from existing applications to mobile devices without taking their specific characteristics, such as smaller screens and limited keyboard input, into account (Levin 2014). Similar trends can be observed for AR/VR devices with applications that try to faithfully recreate existing functionality from mobile or desktop applications without taking the main advantage of AR/VR – not being bound to characteristics of physical reality – into account (Berkemeier et al. 2019; Wohlgenannt et al. 2020). These observations indicate that leveraging multiple devices can be a viable path toward MUX. However, when such efforts are based on a limited understanding of technology characteristics, task characteristics, and user needs, it will be rather difficult to achieve higher MUX. The following two examples may serve to illustrate past and current efforts on this path toward MUX.

3.1.1 Example #1: Mobile Banking Apps

With the advent of the internet in the 1990s, banks started offering online banking to supplement traditional offline (e.g., ATM, local branch) and phone banking. Customers with a PC connected to the internet could access their accounts and conduct financial transactions through the bank’s website. A few years later, when the first cell phones came out, banks launched the first mobile banking services via SMS. However, mobile banking only became an important banking channel after smartphones were introduced at the end of the 2000s. Today, most banks offer native mobile banking apps that enable customers to access their bank accounts through smartphones and tablets in order to conduct a range of financial transactions, including balance checks, fund transfers, and stock trading. However, customers’ usage of mobile banking apps often remains either rudimentary (e.g., only checking balances) or lacking altogether (Crowe et al. 2017). Hoehle et al. (2017) provide the example of a bank that spent EUR 300,000 on designing a mobile banking app that was only used by a handful of customers. Research suggests that customers’ usage patterns of mobile banking apps are related to technology characteristics (Hoehle et al. 2017; Kim et al. 2009). Some tasks (e.g., more complex financial transactions) may be too difficult to perform on a mobile device due to its small screen and on-screen keyboard. In contrast, the same task may be much easier to perform on a laptop or desktop computer with a physical keyboard and a larger screen. Consequently, when leveraging multiple devices, it is also important to develop a thorough understanding of technology characteristics (e.g., screen size) and task characteristics (e.g., simple vs. complex) in order to balance the strengths and weaknesses of each device.

3.1.2 Example #2: Augmented and Virtual Reality in E-Commerce

The gaming industry is considered the pioneer in the use of AR and VR (Wohlgenannt et al. 2020). However, AR and VR applications that can, for example, be experienced via head-mounted displays, smart glasses, or smartphones are increasingly employed by e-commerce providers as well (Wedel et al. 2020). Their main goal is to overcome e-commerce’s inherent limitation “that online consumers can only passively understand the product information but cannot touch and feel the product” (Tarafdar et al. 2019, p. 1). AR and VR applications can allow consumers to evaluate products in real scale and from different angles (Peukert et al. 2019). For example, to complement existing online shopping experiences via their website and mobile apps, the Swedish furniture company IKEA has developed an AR application that enables consumers to view furniture in real size, from different angles (360° view), and at the intended place (Ozturkcan 2020). Furthermore, IKEA provides different VR applications that help consumers imagine product arrangements and encourage co-creation with others. Similarly, Europe’s largest retailer for consumer electronics (the MediaMarktSaturn Retail Group) offers a holistic VR shopping environment, Virtual SATURN, encompassing several products from their online shop. Nevertheless, there are also examples of AR and VR applications that are just standalone “gimmicks” – either to serve as marketing tools or as a means to gain experience with this novel technology (Peukert et al. 2019). Consequently, it is important to reflect on the use of AR and VR for tasks that have little additional benefit when compared to physical reality (Steffen et al. 2019) and explore how to integrate AR and VR applications with existing applications offered on other devices in a way that generates substantial added value for consumers.

3.2 Path 2: Leveraging Multiple Modalities

Humans interact with the world through multiple senses and effectors. However, most IS have traditionally focused on unimodal interaction (i.e., a single input and output modality), such as providing visual output on a screen with a keyboard for input. In recent years, system designers have begun to complement the more traditional modalities, such as mouse, keyboard, and touch, with more advanced modalities such as speech or mid-air hand gestures. For example, Apple’s Siri allows users to perform various commands via speech (e.g., setting an alarm or creating a to-do list), which traditionally had to be made via keystrokes or touching buttons. Similar virtual assistants are introduced to the workplace to assist in work-related tasks (Mirbabaie et al. 2021; Seeber et al. 2020). Furthermore, e-commerce providers are increasingly complementing touch- and mouse-based interaction with gesture-based interaction (i.e., reaching, pointing, and manipulating products using hand movements in the air) in order to provide a more natural interaction experience (Liu et al. 2019). However, multimodal interaction capabilities alone do not automatically result in a better or more natural interaction with a system. The number one myth about multimodality is that “if you build a multimodal system, users will interact multimodally” (Oviatt 1999). Whether or not users interact multimodally depends upon many factors, including the nature of the task, the current environment, as well as the user’s individual expectations, experience, and needs. Moreover, different modalities vary in the degree to which they are capable of transmitting similar information (Oviatt 1999). For example, comprehensive results from data analyses in a BI&A system are rather difficult to communicate to users via speech output, while the same information can be easily conveyed using visual output in the form of a graph or chart. 
Therefore, simply replicating the functionality of one modality in another modality is unlikely to play to the particular strengths of each modality. This is particularly important when multiple modalities can be used simultaneously or sequentially (e.g., pointing at an object and then speaking a command). Taken together, leveraging multiple modalities can be another viable path toward MUX. However, providing multimodal capabilities alone is not sufficient to realize the benefits of having more than one modality available. A thorough understanding of the unique strengths and weaknesses of each modality, the nature of tasks, and user needs is also required to achieve higher MUX. The following two examples may serve to illustrate past and current efforts on this path toward MUX.

3.2.1 Example #1: From Smart Speakers to Smart Displays

Looking at the recent history of smart speakers, it is interesting to observe how they have evolved from smart speakers to smart displays. While smart speakers (e.g., Amazon Echo, Google Home) offer only one modality – speech – for input and output, users can interact multimodally with smart displays (e.g., Amazon Echo Show, Google Nest Hub) because they combine speech interaction with a touchscreen display. For example, users can provide input via speech (e.g., “Alexa, order toilet paper”) and touch (e.g., selecting a product by touching a button on the screen) and receive both speech output (e.g., “Here are some options for toilet paper”) and visual output (e.g., different products with names, images, and prices). In contrast, when a user provides the same speech input to a smart speaker (i.e., “Alexa, order toilet paper”), it would respond with something like “The top choice for toilet paper is (product name). It costs (product price) euro in total. Would you like me to order it?”. Since it is difficult to present several alternatives via speech output, the smart speaker selects a “top choice”, for example based on the user’s shopping history, and asks them if they want to make the purchase. However, while this product selection process can increase efficiency, it also comes at the cost of transparency and control (Rzepka et al. 2020b). A lack of transparency and control may be less critical for routine tasks, such as playing music or getting weather updates, but plays an important role in high-involvement tasks (e.g., purchase decisions). Consequently, smart displays try to combine the best of both worlds by leveraging multiple modalities: efficiency via speech input/output and transparency by augmenting speech output with visual information.

3.2.2 Example #2: Multimodality in Augmented and Virtual Reality

A fundamental characteristic of AR and VR applications is their extensiveness – i.e., “the range of sensory modalities accommodated” (Slater and Wilbur 1997, p. 605). AR/VR applications not only differ in the number of modalities they offer but also in the extent to which these modalities are stimulated. Although head-mounted displays (HMDs) with visual output represent the main category of output devices, additional input and output modalities are increasingly integrated into AR/VR devices and, in turn, leveraged by applications (for a comprehensive overview of input and output devices, see Anthes et al. 2016). For example, many HMDs already provide audio output and, in the future, this modality may be complemented with haptic feedback ranging from controller vibrations to realistic force feedback in order to make forces originating from virtual objects perceptible (e.g., HaptX Gloves or Teslasuit). The latest HMDs are equipped with various sensors that provide the necessary hardware components for multimodal interactions. For example, Facebook’s Oculus Quest 2 is able to capture hand movements via its built-in external cameras and understand speech commands. Therefore, users can control apps through gesture- and speech-based interaction as well. These functionalities can – when supported by an application – make controllers obsolete. For example, YouTube’s VR app can be controlled with only hand gestures. Microsoft’s HoloLens 2 goes one step further and – in addition to gestures and speech – allows gaze-based interactions via integrated eye-tracking technology. As a result, natural interactions are possible even in hands-free scenarios, such as picking tasks in logistics, or remote assistance use cases at the workplace where gloves must be worn or fingers become dirty.

3.3 Path 3: Combining Multiple Devices and Multiple Modalities

The final path in our conceptual framework results from the conflation of both previously described paths. This path can be regarded as the logical next step in the efforts toward MUX because it seeks to leverage both multiple devices and multiple modalities. While researchers and practitioners have traditionally focused their efforts on either devices or modalities, recent years have seen an increased interest in the combination of both elements. For example, Domino’s AnyWare platform allows customers to order pizza in 15 different ways using various devices – laptops, smartphones, smart speakers, smart watches, smart TVs, and even cars – and different modalities – mouse and keyboard, touch, and speech (Domino’s 2021). However, as the example indicates, this path can quickly become complex and unmanageable because of the sheer number of available devices and modalities. The constant stream of new devices and mature modalities leads to an overwhelming number of possible combinations (i.e., number of devices times the number of modalities supported by each device). Therefore, the key challenge on this path is to choose wisely among the numerous possibilities and identify those combinations that provide the greatest benefit to users. To tackle the increased complexity, it is essential to develop a holistic understanding of the interplay between devices and modalities (e.g., strengths and weaknesses), tasks as well as user needs and preferences. Despite these challenges, the plethora of options also opens up the opportunity to balance the relative strengths and weaknesses of different devices and modalities in order to better meet users’ needs overall. Hence, we argue that this path is the one with the greatest potential for both improving user interaction with IS and making significant contributions to research. The following two examples may serve to illustrate past and current efforts on this path toward MUX.
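
To make the growth in combinations concrete, the following minimal sketch enumerates device-modality pairs for the device types and modalities named in the example above. The mapping of which device supports which modality is a purely illustrative assumption and does not describe the actual Domino’s AnyWare platform; the function name is likewise hypothetical.

```python
from typing import Dict, List, Tuple


def device_modality_pairs(support: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """List every (device, modality) combination an IS would have to design for."""
    return [(device, modality)
            for device, modalities in support.items()
            for modality in modalities]


# Assumed support matrix (illustrative only, not taken from the source).
support = {
    "laptop": ["mouse and keyboard"],
    "smartphone": ["touch", "speech"],
    "smart speaker": ["speech"],
    "smartwatch": ["touch", "speech"],
    "smart TV": ["speech"],
    "car": ["touch", "speech"],
}

pairs = device_modality_pairs(support)

# Upper bound if every device supported every modality in the matrix.
all_modalities = {m for modalities in support.values() for m in modalities}
upper_bound = len(support) * len(all_modalities)
```

Under these assumptions, six devices and three modalities already yield nine supported pairs out of eighteen conceivable ones, and each new device or modality adds further pairs, which illustrates why choosing among combinations becomes the key challenge on this path.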

3.3.1 Example #1: Multiexperience in Enterprise Resource Planning (ERP) Systems

Traditionally, ERP systems have focused on a consistent graphical user interface that provides visual output and allows users to interact with the system via mouse and keyboard (Klaus et al. 2000). Most employees access their company’s ERP system when they are in their office at a desk equipped with a PC, mouse, and keyboard. Given the complex information that is conveyed to users through transaction screens, tables, and reports as well as the nature of work, the workplace setting will likely continue to be the dominant context of use. However, with the growing popularity of mobile devices in the 2000s, software vendors started to enable mobile access to ERP systems without requiring a local ERP client (Markus et al. 2000). Today, many vendors offer mobile applications for smartphones and tablets (e.g., Sage Mobile Sales app) and provide platforms or frameworks to allow customers to develop their own mobile applications (e.g., Oracle Mobile Application Framework). Recently, ERP systems have also started integrating natural language interaction via written or spoken language to complement existing modalities of the traditional graphical user interfaces (vom Brocke et al. 2018). For example, in 2017, SAP launched CoPilot, a digital assistant integrated into the SAP Fiori user interface that can be operated via speech (SAP 2017). Instead of using mouse and keyboard to enter transaction codes, users can also speak or write natural language commands to navigate the interface and perform routine tasks (e.g., “create a sales order for customer X”). Moreover, CoPilot can not only be used from inside an SAP application, but also comes as a standalone mobile application for smartphones and tablets. As a result, users are able to switch seamlessly between desktop and mobile devices and between traditional and speech modalities according to their individual preferences and the characteristics of the task at hand.
At their desks, users may prefer to interact with the Fiori user interface via mouse and keyboard, while only occasionally using CoPilot to enter natural language commands for specific transactions. At home and on the go, they may favor the CoPilot mobile app and interact with it via speech commands, for example, to get a quick overview of current sales and inventory levels for an upcoming meeting. As a result, more efficient and intuitive interactions with an ERP system are possible through combining multiple devices and multiple modalities. Users can thus not only select the most suitable device and modality for their current task, but also perform a task seamlessly across multiple devices and multiple modalities.

3.3.2 Example #2: Multiexperience in Business Intelligence and Analytics (BI&A) Systems

Important business decisions need to be made by individuals, teams, or groups in many places – at the desk, in meetings, or in the field. To support data-driven decision-making in all of these situations, BI&A systems have evolved from traditional desktop applications for expert users to flexible systems that leverage both multiple devices and multiple modalities to accommodate a wide range of users (Chen et al. 2012). Today, decision makers can use BI&A systems on their smartphones or tablets while on the go (Power 2013). In meetings, cross-functional teams can make decisions together by collaboratively interacting with BI&A systems on large interactive screens (e.g., Microsoft’s Surface Hub) (Ruoff and Gnewuch 2021). To facilitate transparent interaction, particularly for non-expert users, BI&A systems increasingly support multiple modalities. For example, Tableau’s Ask Data feature helps users visualize and analyze data by asking a question in natural language. Moreover, mid-air hand gestures have been found to facilitate the collaborative analysis of complex data in BI&A systems (Butscher et al. 2018). Bringing both trends together, BI&A systems increasingly try to combine multiple devices and multiple modalities to better meet decision makers’ needs and preferences when performing different tasks. At their desks, individual decision makers may use mouse and keyboard and leverage the large screen of their PCs to perform complex data analysis tasks and prepare detailed management reports. In meetings, teams of decision makers may use a combination of speech- and touch-based interaction to perform ad-hoc analyses on large interactive screens. In the field, particularly when hands-free operation is required, individuals or teams may use speech or gestures to analyze data on the fly. As a result, faster and more intuitive interactions with BI&A systems are possible through combining multiple devices and multiple modalities.
Moreover, these BI&A systems put the human at the center because they allow users to choose the device and modality most suited for the characteristics of the task at hand and switch between them in accordance with their individual preferences.

4 Future Research Directions

This catchword sheds light on the concept of MUX and proposes a framework with two dimensions – devices and modalities – and three different guiding paths toward MUX. Drawing on illustrative examples for each path, we embed the concept of MUX into existing streams of research in IS and HCI, explain benefits and challenges of each path, and offer practical guidance for moving along these paths toward MUX. While substantial research has been conducted on the first two paths toward MUX, less attention has been paid to the third path, which seeks to leverage the combination of multiple devices and multiple modalities. Therefore, many promising research questions remain and the BISE community is well suited to address them. In the following, we suggest four areas for future research on MUX. Table 3 summarizes the identified future research directions and provides illustrative research questions.

Table 3 Suggested directions and questions for a research agenda on multiexperience (MUX)

First, while this catchword represents a valuable step toward a better understanding of MUX, our conceptualization could benefit from further refinement. Our framework of paths toward MUX provides sufficient structure to guide future research, but also leaves ample room for further exploration of the nature of MUX and the interplay between devices and modalities. For example, future research could systematically identify and classify the many different combinations of devices and modalities that can be used for MUX (e.g., in the form of taxonomies or morphological boxes). Another vital step would be to operationalize the concept of MUX and develop suitable measurement instruments. These instruments would be equally useful for researchers who seek to empirically evaluate MUX and practitioners who want to assess their software products’ utility. Existing measurement instruments for usability and UX, such as the system usability scale (Brooke 1996) and the user experience questionnaire (Laugwitz et al. 2008), could serve as a suitable starting point. Additionally, future research could identify ways to measure MUX objectively using behavioral data, such as interaction logs across devices and modalities, to complement self-report measures.
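To make the idea of objective, log-based measures more concrete, the following is a minimal sketch of how simple behavioral indicators might be derived from an interaction log. The `Event` record, the `summarize` function, and the three indicators (distinct device-modality combinations, device switches, modality switches) are hypothetical illustrations and not validated MUX measures; a real instrument would require careful operationalization and validation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One logged interaction (hypothetical log schema)."""
    timestamp: float  # seconds since the start of the observation
    device: str       # e.g., "smartphone", "smart speaker", "pc"
    modality: str     # e.g., "touch", "speech", "keyboard"


def summarize(events):
    """Derive simple behavioral indicators from one user's interaction log."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    # How often each device-modality combination was used.
    combos = Counter((e.device, e.modality) for e in ordered)
    # Compare each event with its successor to count switches.
    pairs = list(zip(ordered, ordered[1:]))
    return {
        "distinct_combinations": len(combos),
        "device_switches": sum(1 for a, b in pairs if a.device != b.device),
        "modality_switches": sum(1 for a, b in pairs if a.modality != b.modality),
    }


log = [
    Event(0.0, "smartphone", "touch"),
    Event(10.0, "smartphone", "speech"),
    Event(20.0, "smart speaker", "speech"),
    Event(30.0, "pc", "keyboard"),
]
summary = summarize(log)
print(summary)
```

Such indicators would only complement self-report instruments; whether frequent switching reflects a good or a poor experience is itself an empirical question.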

Second, a promising direction for future research is the empirical investigation of MUX. From an individual perspective, such work could, for example, examine whether and how MUX influences the adoption and use of IS. Since MUX involves utilitarian and hedonic aspects, studies should investigate both instrumental (e.g., better performance) and experiential outcomes (e.g., enjoyment) as well as potential trade-offs and synergies between them. Given the important role of contextual factors in MUX, future research is also needed to better understand how users behave in different contexts (e.g., at work, at home, while riding on a subway), when performing different tasks (e.g., information search, online transactions, entertainment), and when switching between different contexts and tasks. Moreover, future studies should consider how individual differences, such as demographics, personality characteristics, and preferences for different devices or modalities, affect MUX over time and whether users can be classified into different MUX user types (e.g., mobile-first users, keyboard- or touch-only users). Finally, from an organizational perspective, empirical investigations could attempt to shed light on whether and when investments in MUX (e.g., developing an AR app) pay off and how organizations can find and achieve an “optimal” level of MUX.

Third, numerous opportunities exist for future research to deliver new design knowledge through building and/or evaluating innovative MUX artifacts. Such research could follow a design-oriented behavioral research approach (Maedche et al. 2021) to observe and analyze existing MUX artifacts (e.g., SAP CoPilot, Tableau Ask Data, Amazon’s Echo devices) or a design science research (DSR) approach to build new MUX artifacts that tackle important real-world problems. Moreover, a better understanding of whether and how existing design knowledge for single-device and unimodal artifacts can be reused for the design of MUX artifacts would be beneficial. Of particular interest is design knowledge on how to combine devices and modalities effectively so that the MUX can be adapted to individual users and their changing needs over time. For example, future research could investigate the design of IS that automatically change or recommend modalities and devices according to the users’ current needs. Research in this area may also profit from setting up collaborations with researchers from other fields such as computer science.
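As a purely illustrative starting point, an IS that recommends devices and modalities could begin with simple context rules. The sketch below encodes the desk, meeting, and field situations from the BI&A example as such rules. The `Context` type, the `recommend` function, and the rule set itself are hypothetical assumptions made for illustration; they are not derived from an existing system, and an actual artifact would likely need to learn from user behavior rather than rely on fixed rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical description of a user's current situation."""
    location: str             # "desk", "meeting", or "field"
    hands_free: bool = False  # True if the user cannot use their hands


def recommend(ctx):
    """Return a (device, modalities) recommendation from fixed, assumed rules."""
    if ctx.hands_free:
        # Hands-free operation rules out touch, mouse, and keyboard.
        return ("smartphone or tablet", ("speech", "gestures"))
    if ctx.location == "meeting":
        return ("large interactive screen", ("speech", "touch"))
    if ctx.location == "desk":
        return ("PC", ("mouse", "keyboard"))
    # Fallback for users on the go.
    return ("smartphone or tablet", ("touch",))


print(recommend(Context("meeting")))
print(recommend(Context("field", hands_free=True)))
```

A rule-based design like this is transparent and easy to evaluate, but it cannot capture individual preferences or needs that change over time, which is precisely where new design knowledge is needed.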

Finally, future research should aim to provide methods and tools that support the development, implementation, and management of MUX. While existing methods and tools from areas such as human-centered design, UX, and software engineering may be used as a starting point, it is evident that handling the complexity of MUX – resulting from the large number of possible combinations of devices and modalities – requires a new set of methods and tools. For example, future research could develop methodological guidance to help researchers and practitioners choose among the plethora of options in order to identify the most suitable and promising combinations of devices and modalities for a particular purpose. Similarly, methodological guidance on implementing MUX in an existing IT landscape would be valuable. In addition, while many software vendors offer their own MUX development platforms (Gartner 2021), future research could provide platform-independent tools that empower everyone to design for MUX, regardless of whether they use commercial software packages or their own software stack.

Overall, we believe that this catchword offers a fresh perspective on MUX and opens up manifold opportunities for future research. MUX has not only been an important theme in prior IS research, but current trends and the constant technological advancement also indicate that it will continue to be so for the foreseeable future. At the same time, it is clear that the multitude of ways in which devices and modalities can be combined adds another layer of complexity to understanding the interplay between user, task, and technology. For example, it is difficult enough to design a unimodal artifact for one specific device or to rigorously examine how users interact with an IS on a single device using one modality. Going forward, there is little doubt that these difficulties will increase as the number of available devices and mature modalities continues to grow. Given the background, interests, and skills of BISE researchers, we are convinced that the BISE community is well positioned to both address the challenges and take advantage of the opportunities for future research on MUX, and we invite fellow researchers to contribute to this exciting research stream.