1 Introduction

Mixed reality (MR) games integrate physical entities, e.g. the physical environment and real objects, with digitally mediated contents into immersive entertainment experiences. According to the Reality-Virtuality Continuum proposed by Milgram and Kishino (1994), MR games encompass a wide spectrum of hybrid systems and applications, varying from pervasive augmented reality (AR) games, e.g. Pokemon Go, to fully immersive virtual reality (VR) games involving tangible objects, such as the work by Cheok et al. (2002) and Harley et al. (2017). The term XR games is also sometimes used as a collective label for AR, VR and MR games. While a widespread belief is that XR is the abbreviation of “extended reality”, some recent research, such as the article by Rauschnabel et al. (2022), contests this view. With its core value of bridging and blending the cyber and physical worlds, the MR game is considered the essence of the future metaverse by some tech giants including Microsoft; and the key to approaching the metaverse is to “democratize the game building”, according to Waters (2022). In the past decade, the application of MR technology in serious games and gamification has been widely witnessed in both the public and private sectors, including education (Zikas et al. 2016), cultural heritage (Ioannides et al. 2017) and healthcare (Abdelkader et al. 2011).

Currently, game creators have to traverse multiple technology stacks, such as computer vision, projection mapping, indoor/outdoor positioning and the Internet of Things (IoT), to name a few, just to make a single MR game. In the face of this high technical threshold and development cost, commercial game engines and proprietary solutions provide rather limited and rudimentary support, and MR game creators often find themselves drowning in lower-layer details, making the design and development process far more difficult than for conventional digital games, according to Medford et al. (2018).

Existing MR game research is found to rely heavily on domain- or use-case-specific solutions, indicating a lack of generalizability and extensibility. The current MR game research body exhibits a rather diverse nature regarding in-game interactions, display technologies, degrees of mobility and immersiveness, all the way to the mechanics by which the virtual and the physical components are synergized with one another. These highly divergent perspectives make it even more challenging, if not infeasible, to settle on a unified, one-size-fits-all solution. Rather, a top-level perspective is needed to systematically delineate, or as Gaver (2012) put it, “give dimensionality to”, the design space, thus better informing MR game creators about different technological affordances and their potential as design resources.

Fig. 1 Research structure and procedure

To this end, we adopted a Research-through-Design (RtD) approach in this research (Zimmerman et al. 2007; Stappers and Giaccardi 2014). The reasons are twofold. First, we aim to fully leverage the advantages of RtD as an iterative and reflective design process for generating actionable knowledge, as well as for involving potential users and stakeholders in co-design activities; the latter is also consistent with the user-centered values that have long been practised within the Human-Computer Interaction (HCI) research community. Second, design research values the culture of making, and the resulting artefacts are considered to embody the implicit theories and knowledge of designers in the field, according to Gaver (2012). We intended to design and create a set of creativity support tools integrating both hardware and software to concretely represent the new development paradigm that we propose specifically for MR games and gameful experiences. Our intention is not to claim that it is the optimal solution, but rather to present the actionable understanding gained from our designerly practice in this emerging and complex research field, in the hope of stimulating further interest within both the research and practice communities.

As shown in Fig. 1, we conducted the following research activities following the RtD methodology: (1) Grounding: This mainly involved a literature review to identify initial, multiple perspectives on the current design space of MR game technologies. Note that our goal was not a comprehensive systematic literature review encompassing the entire state of the art, but rather to obtain preliminary insights to guide the subsequent design process. The outcome of this stage is summarized as three technological affordance spectra; see Sect. 2.1. (2) Ideation + Iteration: Based on the results of the previous stage, we projected our proposed technology stack onto the corresponding intervals of the three spectra, then conducted multiple divergent/convergent design and development iterations. The result was a prototype system aimed at facilitating low-code MR game creation. (3) Reflection: Using the prototype, we conducted a co-design workshop to elicit user requirements, scenarios, and future opportunities and challenges, so as to form a broader understanding of MR gameful experiences. Four game concept designs were generated, and from the subsequent user survey and interviews three major design implications were extracted and synthesized, which in turn echoed the three technological affordance spectra. We believe the proposed technology stack, along with the intermediate-level generative knowledge that goes beyond the technological affordances, will contribute to future applications and novel instances in this area.

2 Methods

2.1 State-of-the-art review

A literature query was performed on 9 April 2023. Our review targets were academic publications, initially retrieved from IEEE Xplore and the ACM Digital Library; Scopus and snowballing were used for additional results not covered by the first two sources. We specified identical query strings for all databases, requiring that the metadata contain both (a) games within MR/XR, digital twin, cross-reality or hybrid reality contexts, and (b) one or more of the following keywords: game engine, development framework or toolkit, programming interface, editing or authoring tool. This allowed us to narrow our review targets from general MR game studies down to those that plausibly accentuated the technological perspective and hence provided adequate details for analyzing their technological affordances.
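For illustration, a Boolean query of roughly the following shape would satisfy these two constraints. This is a hypothetical reconstruction for the reader's benefit, not the verbatim string submitted to each database:

```
("mixed reality" OR "XR" OR "digital twin" OR "cross-reality" OR "hybrid reality")
AND (game OR gaming OR gameful)
AND ("game engine" OR "development framework" OR toolkit
     OR "programming interface" OR "editing tool" OR "authoring tool")
```

In practice, each database requires its own field qualifiers (e.g. title/abstract/keyword scoping), so the string above would be adapted per search engine.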

After a first-round screening based on titles and abstracts, apparently irrelevant results were eliminated, and 57 publications remained for the second round of full-text screening. We further ruled out less relevant articles; for example, in some papers MR was merely a term appearing alongside VR and AR, while the described system actually fell outside the definition of MR used in this article. Papers presenting general findings, such as design visions and common technical trends in MR systems, without adequate implementation details were also excluded. As a result of the second-round screening, 34 articles were finally accepted.

Our review approach follows the previous work by Laine and Lindberg (2020), and the detailed review protocol is presented in the table below.

Table 1 Review protocol

As we scrutinized the state-of-the-art research, an intriguing, multifaceted technodiversity was identified in the current literature body. For instance, frequently adopted hardware settings in these MR game studies included: (1) hand-held devices such as smartphones, tablets and personal digital assistants, examples being the work by Samodelkin et al. (2016), Cavallo and Forbes (2016), Linner et al. (2005), Behmel et al. (2014), Torstensson et al. (2020), Alavesa and Ojala (2015) and Zarraonandia et al. (2022); (2) head-mounted devices like Microsoft Hololens, Magic Leap and Oculus, examples being the work by Cheok et al. (2002), Harley et al. (2017), Kim et al. (2020), Wang et al. (2019), Prompolmaueng et al. (2021), Nisiotis and Alboul (2021), Alpala et al. (2022) and Alavesa and Ojala (2015); (3) position-fixed devices like the Microsoft Kinect (Pillat et al. 2012; Oswald et al. 2014; Pratticò et al. 2019; Jing et al. 2017; Yannier et al. 2013; Reilly et al. 2010), ceiling-/wall-mounted projectors (Kim et al. 2020; Oswald et al. 2014; Pratticò et al. 2019; Hong et al. 2017; Lupetti et al. 2015; Lahey et al. 2008; Hatton et al. 2008; Kajastila and Hämäläinen 2014; Swearingen and Swearingen 2018) and public displays (Samodelkin et al. 2016; Smith and Graham 2010). Many studies leveraged standard interactions offered by conventional I/O devices like game pads (Cheok et al. 2002; Kim et al. 2020; Oswald et al. 2014; Pratticò et al. 2019; Hong et al. 2017; Khoo et al. 2009; Swearingen and Swearingen 2018), touchscreens (Samodelkin et al. 2016; Cavallo and Forbes 2016; Behmel et al. 2014; Torstensson et al. 2020; Smith and Graham 2010) and AR/VR headsets (Harley et al. 2017; Kim et al. 2020; Wang et al. 2019; Prompolmaueng et al. 2021). In addition, rich alternative interactions were also found, varying greatly from one another: (1) computer vision based interaction, e.g. fiducial markers (Cheok et al. 2002; Lahey et al. 2008) and motion tracking (Pillat et al. 2012; Jing et al. 2017; Lupetti et al. 2015; Kajastila and Hämäläinen 2014; Tan et al. 2006); (2) location based interaction, with players in either an indoor or outdoor environment, including the work by Cheok et al. (2002), Samodelkin et al. (2016), Cavallo and Forbes (2016), Linner et al. (2005), Pratticò et al. (2019), Khoo et al. (2009) and Alavesa and Ojala (2015); (3) tangible/object based interaction, including the work by Cheok et al. (2002), Harley et al. (2017), Behmel et al. (2014), Oswald et al. (2014), Jing et al. (2017), Yannier et al. (2013), Hong et al. (2017) and Smith and Graham (2010), into which robot based interaction, e.g. the work by Pratticò et al. (2019), Lupetti et al. (2015) and Lahey et al. (2008), could also roughly be merged, noting that there might be no actual physical contact between the players and the robots. As for application scenarios, the most targeted domains were general-purpose use (Harley et al. 2017; Samodelkin et al. 2016; Linner et al. 2005; Kim et al. 2020; Pratticò et al. 2019; Lupetti et al. 2015; Smith and Graham 2010) and education (Torstensson et al. 2020; Wang et al. 2019; Prompolmaueng et al. 2021; Pillat et al. 2012; Yannier et al. 2013; Lahey et al. 2008; Hatton et al. 2008), followed by entertainment (Cheok et al. 2002; Cavallo and Forbes 2016; Oswald et al. 2014; Jing et al. 2017; Hong et al. 2017), social interaction (Swearingen and Swearingen 2018; Khoo et al. 2009; Tan et al. 2006), sports (Kajastila and Hämäläinen 2014) and architecture (Behmel et al. 2014).

While we could go through more implementation aspects, e.g. software components, authoring/editing tools, and computational and communication architectures, focusing only on individual technical primitives and their combinations would be of limited value. Rather, we attempted to extract a higher-level understanding by drawing on the resulting technological affordances, following Gaver (1991) and Hutchby (2001), which are defined by what end users perceive a specific MR system to be like, regardless of its underlying technologies. As Fig. 2 illustrates, we identified a set of technological affordances in current MR game systems, which were then grouped into three spectra:

Fig. 2 Technological affordance spectra of MR game literature

1. Activity Range: User-perceived spatial freedom of in-game physical activities. As shown in Fig. 2, the left extremity is where players are confined to a stationary setting, similar to playing conventional computer games; as mobility extends, players’ physical activities can take place in table- or room-sized spaces. For example, Kajastila and Hämäläinen (2014) proposed an augmented climbing wall where climbers followed a projected climbing route on a wall-sized surface; its activity range falls between the table-sized and room-sized intervals. The right extremity of this spectrum encompasses pervasive gaming settings where players can explore location-based MR contents at a large geographic scale.

2. User Interface: User-perceived access point for interacting with an MR system. At the left extremity of this spectrum lies the graphical user interface (GUI), where conventional WIMP (window, icon, menu, pointer) style elements or their 3D counterparts are directly transplanted into an MR environment. Most mobile-based pervasive games adopted this kind of user interface, such as the work by Samodelkin et al. (2016), Cavallo and Forbes (2016) and Linner et al. (2005). Following the GUI, we have witnessed the use of the tangible user interface (TUI). In an MR context, a TUI often involves physical objects and surfaces that are not intentionally devised as input devices, forming a hybrid game experience that blends physical entities and digital contents. Examples include the work by Harley et al. (2017), Oswald et al. (2014), Jing et al. (2017), Yannier et al. (2013) and Hong et al. (2017). The right extremity is the natural user interface (NUI), including but not limited to voice commands (Wang et al. 2019; Prompolmaueng et al. 2021), gaze (Wang et al. 2019), gestures (Wang et al. 2019; Swearingen and Swearingen 2018; Smith and Graham 2010) and body motion (Cheok et al. 2002; Oswald et al. 2014; Lupetti et al. 2015; Kajastila and Hämäläinen 2014). That said, it is difficult to assert that an NUI is always closer to the reality extremity than a TUI, but both are generally considered closer to the real-life interactions taking place in physical reality. There are also hybrid user interfaces, as in the work by Smith and Graham (2010), in which tangible objects (e.g. a toy car) were combined with the touchscreen GUI of a tabletop computer.

3. Feedback Control: User-perceived feedback control that an MR system establishes between the virtual and physical entities. We defined the left extremity as “physical entity sensing”, meaning that the MR system at least possesses some mechanism to capture the status of physical entities. Next, “virtual entity actuation” describes an MR system’s ability to drive virtual entities or trigger virtual events in reaction to the status of physical entities and changes thereto. Many MR systems in our literature review were located in the first interval between “physical entity sensing” and “virtual entity actuation”. For example, Behmel et al. (2014) described a review tool with which users could navigate the camera view in a virtual game scene by moving a tangible piece on a tablet surface showing the 2D top view; the real-time trajectory of the physical piece was detected and used to control the virtual camera. Similarly, “physical entity actuation” refers to an MR system’s ability to actuate physical entities according to their virtual counterparts or when a certain game-defined condition is met. Research in the second interval included the work by Oswald et al. (2014), Yannier et al. (2013), Hong et al. (2017) and Smith and Graham (2010). Oswald et al. (2014) proposed an MR game level editor that allows players to use differently shaped and colored physical items, e.g. a yellow Post-it, on a wall or projection screen to construct a “Super Mario” style game level. Virtual game characters and other digital game contents were overlaid on top of the physical interface by projection mapping. In this case, the interaction between virtual assets and physical entities permits not only sensing physical status and reviewing static virtual contents, but also dynamic construction and editing of game levels. As the degree of feedback control intensifies, it reaches the right extremity of this spectrum, which we define as “virtual-physical synchronization”: the physical entities and their virtual counterparts in an MR game system are fully synchronized, so that whichever side changes its status triggers the corresponding update of the other in a real-time, automatic manner, similar to what is known as the “digital twin”. Most work in the last interval between “physical entity actuation” and “virtual-physical synchronization” were human-robot games, like the work by Pratticò et al. (2019), Jing et al. (2017), Lupetti et al. (2015) and Tan et al. (2006), where a physical robot could react to the behavior of virtual assets, e.g. a projected virtual ping-pong ball on the floor.

Note that each of the aforementioned technological affordances is a continuous spectrum, and again we want to stress that there are also cases that fall between the intervals. Interestingly, these spectra coincide with the Reality-Virtuality Continuum and all manifest a gradual transition from virtuality (left) to reality (right). The three spectra were commonly shared by the MR game systems we reviewed, despite the technodiversity that the literature body demonstrated. Positioning an MR game system at specific intervals on the spectra allows creators to form a clearer and more precise vision of the target user experience before moving to the actual implementation stage. We therefore believe it is of specific benefit to take these general technological affordances into account when designing and developing either a specific use case or more generalized development tools for MR games.
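One illustrative way to make this positioning exercise concrete is to encode the three spectra as ordered scales and describe a system as a coordinate triple. The interval names below follow the spectra described above; the encoding itself is our sketch, not part of any reviewed system:

```python
from enum import Enum

class ActivityRange(Enum):
    STATIONARY = 0   # conventional desktop-like play
    TABLE_SIZED = 1
    ROOM_SIZED = 2
    PERVASIVE = 3    # large geographic scale

class UserInterface(Enum):
    GUI = 0
    TUI = 1
    NUI = 2

class FeedbackControl(Enum):
    PHYSICAL_ENTITY_SENSING = 0
    VIRTUAL_ENTITY_ACTUATION = 1
    PHYSICAL_ENTITY_ACTUATION = 2
    VIRTUAL_PHYSICAL_SYNC = 3

def position(system):
    """Return a system's coordinates on the three affordance spectra."""
    return tuple(level.value for level in system)

# E.g. the augmented climbing wall (Kajastila and Hämäläinen 2014):
# room-sized activity, body-motion NUI, projection-driven virtual actuation.
climbing_wall = (ActivityRange.ROOM_SIZED,
                 UserInterface.NUI,
                 FeedbackControl.VIRTUAL_ENTITY_ACTUATION)
```

Comparing two systems' triples then makes explicit where their target experiences diverge, before any implementation decision is made.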

Additionally, while very few studies positioned themselves as game development frameworks/toolkits, the majority took the shape of ad-hoc game applications/use cases. To provide a more comprehensive understanding of the current state of the art, we extended our review scope from specifically game-focused studies to the wider context of general MR/XR development. Based on the original query string, we removed the restrictive keywords related to games and gamified applications while adding more general terms like “architecture” and “infrastructure” under MR/XR, digital twin, cross-reality or hybrid reality contexts. The extended literature search was carried out primarily using the Scopus database, resulting in a total of 110 hits. We applied inclusion criteria similar to those in Table 1, and 17 articles remained for full-text examination after removing duplicated and unqualified results. We finalized the selection with 6 publications suitable for parallel analysis.

Within the extant literature, we observed a substantial corpus of work with an exclusive focus on either software architecture or a conceptual framework for the development workflow. For example, Kavouras et al. (2023) proposed a methodological framework that aims to minimize the time and cost of the urban planning process and to increase citizen participation; Silva et al. (2022) proposed a framework for creating VR and AR experiences for learning or training purposes in serious environments, with gamification elements to keep users engaged in the learning process; Kern and Latoschik (2023) proposed a software toolkit for enhancing cross-device and cross-platform compatibility of I/O modules in XR development, mostly targeting consumer-grade VR/XR products. We also identified a few studies that tried to provide full-stack or approximately full-stack solutions. Svanæs et al. (2021) proposed a development framework that offers standardized connectivity among heterogeneous devices, networks and platforms by abstractly defining a WiFi/Bluetooth communication layer. The Unity game engine was used as a hub for cross-reality integration, by which networked physical devices possessed their digital twin counterparts. However, it still required technical expertise to handle the lower-layer details and hardware complexity. Zarraonandia et al. (2019) described an XR game development toolkit, which we consider closest to our proposed technology stack. It differs in that an interaction box, incorporating built-in sensors, buttons, LEDs and an ESP8266 NodeMcu development board, was leveraged as the core component for enabling real-world interactions. Although pre-programmed, the interaction box still required a certain level of electronic engineering knowledge for correct circuit connections and offered restricted extensibility due to its physical configuration.

In the following subsection, we will present an MR game technology stack as a result of an iterative design process, then further reveal the rationale behind the design decisions to showcase how we reflected on the aforementioned affordances in our own practice.

2.2 Mixed reality game technology stack

Reflecting on our findings from the literature review, we propose a modular technology stack for designing and developing MR games. Previous studies (Tsai and Wang 1999; Krahn et al. 2008; Wang et al. 2011) indicate that a modular technology stack with readily integrated virtual and physical components greatly mitigates development cost and technical hurdles. Moreover, it increases overall customizability and adaptability to fulfill various situational needs and domain-specific requirements, thus enabling the research community, game creators and domain experts to fully exploit MR for innovative, full-fledged gameful experiences. We went through an iterative, designerly process with the primary goal of lowering the overall technical threshold and engaging less tech-savvy users in MR game creation. The proposed technology stack is less a general solution than a means to empower users, with or without technical expertise, to explore the design space of cross-reality interactions within the context of MR games and gamified applications.

Fig. 3 Prototype hardware components, from left to right: cardboard VR goggles with an Android smartphone, RTK module with patch antenna, passive UHF RFID tags, and RFID reader

In the first iteration of our prototype, we intended to provide players with an activity range as wide as possible, without being exclusively confined to pervasive game scenarios. Previous research by Cheok et al. (2004) and Magerkurth et al. (2005) has intensively investigated the combined use of location-based gaming and augmented reality, and successful commercial cases like Pokemon Go have come to market. However, virtual reality that can be applied to both outdoor and indoor scenarios remains an underexploited area. To this end, we drew on the mobility of smartphones and cardboard VR goggles, and further integrated a Real-Time Kinematic (RTK) positioning module to improve the precision of player location data from around 10 m (as with ordinary GPS positioning) to the decimeter level. Specifically, we utilized commercial-off-the-shelf (COTS) RTK rovers, taking into account overall availability and affordability (see Fig. 3a). A pervasive VR demo in an outdoor environment was implemented, as shown in Fig. 4; more details can be found in our previous work by Xiao et al. (2021). By utilizing the native SDK of the cardboard goggles, a natural user interface can be partially realized through gaze and head movement. However, we found that the single magnetic button on the side of the cardboard goggles highly constrained interactability, which turned out to be inadequate for gameplay on most occasions.

Therefore, in our second iteration we incorporated a passive Ultra High Frequency (UHF) Radio Frequency Identification (RFID) module to enable full-spectrum user interactions, leveraging motion detection via RFID tags attached to the body surface (NUI) and RFID-embedded physical objects (TUI) (see Fig. 3b). Passive RFID was preferred because it is cableless, battery-free, low-cost and flexible enough to be blended into environments and physical entities, and thus highly compatible with mobile game scenarios where players wear VR goggles. Our previous study by Xiao et al. (2022b) showcased how to dynamically change players’ virtual coordinates, i.e. teleport, and load virtual assets from a remote server by scanning RFID tags. These use cases demonstrated the technical feasibility of driving virtual assets with physical entities, corresponding to the second interval on the feedback control spectrum.
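The tag-driven behaviors above (teleportation, remote asset loading) boil down to a small dispatch on the data each tag carries. The following Python sketch illustrates that logic only; the actual prototype runs in Unity/C#, and the message field names here are our assumptions, not the prototype’s exact schema:

```python
import json

def handle_tag_message(raw: bytes, world) -> None:
    """Dispatch a JSON message produced by scanning an RFID tag.

    `world` is any object exposing the two hypothetical operations below.
    """
    msg = json.loads(raw.decode("utf-8"))
    action = msg.get("action")
    if action == "teleport":
        # Move the player to tag-defined virtual coordinates.
        world.set_player_position(msg["x"], msg["y"], msg["z"])
    elif action == "load_asset":
        # Fetch a virtual asset bundle from the remote server.
        world.load_asset_bundle(msg["asset_url"])
    else:
        raise ValueError(f"unknown action: {action!r}")
```

The same dispatch structure extends naturally: adding a new in-game behavior means adding a new action branch, with no change to the RFID hardware side.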

Fig. 4 A pervasive VR demo in an outdoor environment

The overarching technology stack we propose is shown in Fig. 5. Unity was adopted as a hub, where all system components except the external RFID host server were integrated to form a holistic MR game development environment. By building our architecture upon an existing popular game engine, we can reuse rich existing resources and reduce development cost; for experienced Unity users, it further lowers the learning effort and maintains consistent usage habits.

Fig. 5 Proposed MR game technology stack

We integrated three major functional modules on the basis of Unity: (1) the Cardboard VR module, which relies on the Google Cardboard VR SDK for Unity to handle lower-layer motion detection such as head movement, orientation and gaze; (2) the Outdoor Positioning module, which leverages Google 3D Maps for Unity to establish a spatial mapping between the physical and virtual environments. To achieve better positioning precision, we employ a SparkFun GPS-RTK2 rover with an on-board Bluetooth module; the RTK node with its patch antenna is a palm-sized device that communicates with the smartphone via Bluetooth. (3) The RFID-based Interactive module, where the mobile VR application uses UDP messaging to listen to an external RFID host server. To maintain overall consistency, the server-side program was developed in C#, the same scripting language as Unity’s. The server program runs on an independent laptop, which connects to a ThingMagic M5EC RFID reader via USB cable. Whenever the RFID reader detects a tag, the host service looks up the tag’s EPC (Electronic Product Code, acting as a universal identifier of an RFID tag) in a local CSV (Comma-Separated Values) file and sends the associated data to the mobile VR application in standard JSON format.

Thus, the CSV file functions as an authoring tool, allowing users to customize the data associated with a specific RFID tag (see the bottom of Fig. 5 for an exemplary data structure contained in the CSV file). It turns each RFID tag into a data input device or event trigger, which, on activation, conveys the user-defined data to the mobile VR application for further processing or for manipulating virtual assets. Easy configuration of RFID tags and their associated in-game behaviors without complicated coding was a key consideration when we designed the technology stack, as most designers and developers of MR serious games and gamified applications are domain experts and researchers who do not necessarily possess game programming expertise.

By deliberately separating the RFID host device from the mobile VR application, our intention was to cover a wide range of use scenarios with various degrees of mobility and interactability. As implied in the previous subsection, a technology stack needs to be flexible and extensible enough to offer different technological affordances in response to different requirements and situational needs. Consider two contrasting use cases: an urban scavenger hunt game and a multiplayer motion-based exergame. The first scenario relies intensively on the RTK positioning module to navigate an outdoor environment and reveal location-based virtual contents, while players may need to carry a mobile RFID reader to explore their surroundings and scan hidden tags for, e.g., hints for solving a puzzle or narratives guiding them to the next location. The second is more about co-located, multi-player embodied play: multiple RFID tags can be assigned to different players and attached to different body parts, and by making clever use of the read range of one or more position-fixed readers, enriched embodied interaction and social experience can be expected. Decoupling the RFID host device from the mobile VR app allows bespoke deployment according to particular on-site needs, such as mobile RFID readers (with Android OS) using 5G networks, or fixed readers with USB or WiFi connections to PCs, with little or no re-adaptation on the mobile VR side.

While the authors notice that some recent research has tried to redefine the concept of MR, like the work by Rauschnabel et al. (2022), we stick to our adoption of the Reality-Virtuality Continuum proposed by Milgram and Kishino (1994). The reasons are twofold: (1) Our proposed technology stack can hardly fall into the dichotomous categorization of XR systems proposed by Rauschnabel et al. (2022). According to whether the physical environment is (at least visually) part of the experience or not, they define XR systems as either AR (if yes) or VR (if no). As shown in Fig. 4, our proposed technology stack establishes a real-time spatial synchronization between physical reality and the synthetic virtual reality, and the physical environment does play a significant part in our pervasive VR experience. (2) The proposed technology stack innately lends itself to creating “Augmented Virtuality (AV)” experiences, which, according to Milgram et al. (1995), refer to completely graphic display environments with some amount of “reality” (e.g. the GPS information of a user’s current position, in our case) or additional real-object interactions (e.g. RFID-based tangible interaction, in our case). However, AV was completely excluded from Rauschnabel et al. (2022)’s definition, mostly due to their emphasis on contemporary, industrial AR/VR experiences rather than “niche” academic studies like the one presented in this research. This gap, however, reaffirms our belief that there are still uncharted areas for researchers and industrial practitioners to explore. In general, we agree with Rauschnabel et al. (2022) that AV is much less acknowledged across both academia and business; yet MR remains a proper defining term for this kind of hybrid, cross-reality experience.

2.3 User co-design workshop

To directly observe and gain first-hand end-user feedback about the intended use and affordances of the proposed technology stack, we conducted a user co-design workshop. Compared with evaluative user experiments structured to assess how well an ad-hoc solution addresses given problems, a co-design workshop allows researchers and participants to jointly undertake open-ended exploration and generate a broader understanding of the creation and use of MR gameful experiences. We drew upon the approach for evaluating general creativity support tools advocated by the HCI research community (Zarraonandia et al. 2022; Remy et al. 2020; Frich et al. 2019). Beyond the specific use of the proposed technology stack, our aim is to further abstract an intermediate level of generative design knowledge, or the “strong concept” as Höök and Löwgren (2012) and Löwgren (2013) put it, for inspiring and seeding future research and instantiation in this area.

In the same vein, we recruited 15 participants in total from both design and computer science backgrounds, with varied skills and experience in game design and development. 7 participants claimed no experience at all, while 8 claimed less than three years of game-making experience. We deliberately incorporated less tech-savvy participants, since the proposed technology stack targets not only experienced game developers, but also game designers and domain experts who may carry out actual on-site implementation and playtesting. The majority of participants were university students and research faculty; 7 were female and 8 male, ranging from 19 to 25 years old with an average age of 21.

The overall workshop was structured into three sequential parts: (1) hands-on tutorial, (2) group co-ideation, and (3) user surveys and interviews. We give a more detailed account of each part in the following three sub-subsections. Note that, except for specific devices and materials such as cardboard VR goggles, RFID readers and tags, we did not provide the workshop participants with ordinary hardware and software such as PCs with pre-installed Unity and programming tools. Rather, we asked the participants to bring their own laptops and smartphones, so as to approximate the in-the-wild condition as closely as possible.

2.3.1 Hands-on tutorial

In the first session, the participants were given an overarching introduction to MR games, then offered a step-by-step tutorial to establish the MR game development environment (see Fig. 6a). As shown earlier, the proposed technology stack is a composite of three distinct functional modules, i.e. the mobile VR module, the RFID-based interactive module and the outdoor positioning module. To prevent the learning curve from becoming too steep for beginners, we further broke the task down into subtasks, in which each functional module was added, run and tested independently, following a style similar to what is known as incremental development, according to Larman and Basili (2003).

Fig. 6
figure 6

User co-design workshop

Specifically, the participants first downloaded the Google Cardboard SDK for Unity, proceeded with the necessary configuration, and built and ran a sample game scene on their own smartphones. After successfully testing and experiencing the cardboard VR, the participants were instructed to import the RFID messenger and the JSON data parser from a prepared Unity package. This enabled the participants to dynamically load virtual assets from a remote web server into the same sample scene by scanning RFID tags. To streamline the process, the RFID host server and web asset bundles were shared among multiple participants, so not all participants needed to run their own RFID host service or upload their own assets to the web server. Before moving to the next stage, the participants again built and tested the newly integrated module on their mobile devices. Finally, the outdoor positioning module and its dependent Google 3D Maps for Unity packages were imported. Unfortunately, due to the time and spatial restrictions of the workshop, we were unable to have the participants go out and test this function in an outdoor environment. Instead, we demonstrated several videos of sample game prototypes using the outdoor VR component.
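The lookup step in this workflow, i.e. resolving a scanned RFID tag into a remotely hosted virtual asset, can be sketched as follows. This is a minimal illustration in Python rather than the actual Unity/C# interface of the stack; the tag UIDs, URLs and JSON field names are all hypothetical placeholders.

```python
# Sketch of the tag-to-asset lookup behind an RFID host service: a scanned
# tag UID is resolved to a remote asset-bundle descriptor and returned as
# JSON, which a client-side parser (like the JSON data parser mentioned in
# the tutorial) could consume. All identifiers here are hypothetical.
import json

# Hypothetical mapping from RFID tag UIDs to virtual-asset descriptors.
TAG_ASSETS = {
    "E2000017221101441890": {
        "asset_bundle": "https://example.com/bundles/pet_slime",
        "prefab": "SlimePet",
    },
    "E2000017221101441891": {
        "asset_bundle": "https://example.com/bundles/pet_dragon",
        "prefab": "DragonPet",
    },
}

def resolve_tag(tag_uid: str) -> str:
    """Return a JSON payload describing the asset bound to a scanned tag,
    or an error payload if the tag is unknown."""
    asset = TAG_ASSETS.get(tag_uid)
    if asset is None:
        return json.dumps({"status": "unknown_tag", "tag": tag_uid})
    return json.dumps({"status": "ok", "tag": tag_uid, **asset})

if __name__ == "__main__":
    print(resolve_tag("E2000017221101441890"))
```

Sharing one such table among multiple participants, as in the workshop, only requires pointing their clients at the same host.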

Moreover, due to hardware/software diversity and the resulting technical glitches, participants predictably made varying progress by the end of the tutorial session. Instead of asking each participant to complete all the tasks, the workshop organizer demonstrated the overall workflow, and the participants were encouraged to proceed as far as they could within the planned time slot.

2.3.2 Group co-ideation

Building on the previous session, the participants were randomly divided into 4 groups and continued to a 45-minute co-ideation session. After sensitizing them to the proposed MR technology stack via hands-on practice, video and live demonstration, we expected the participants to envision what could be done with the technology stack and to generate conceptual game designs via group discussion and brainstorming. After the co-ideation step, each group gave a 10-minute presentation of their ideas (see Fig. 6b), followed by a voting session in which all participants and audience members selected their favorite conceptual game. As illustrated in Fig. 7, the four presented game concepts were:

Fig. 7
figure 7

Four user-generated game design concepts

  1. 1.

    MR pets: The first conceptual game design was an MR simulation game in which players raise an imaginary pet, such as a slime, unicorn or dragon, in the virtual world. Feeding the pets different data, e.g. location-specific data, or mental and physical data from the owner, determines each virtual pet’s unique evolutionary path. Each pet is bound to an RFID tag, which can be embedded into a key holder, an amulet, a mobile pendant or similar attachable or wearable accessories; only collocated players wearing VR goggles around a “pet spot”, where an RFID reader has been installed, can see each other’s virtual pets. In this way, players in the real world can connect with each other through virtual pets, creating a community and a sense of belonging by exchanging their experiences of cultivating their virtual pets.

  2. 2.

    MR murder mystery game: Conventional murder mystery games, or more generally script entertainment, rely heavily on script reading, artificial stage settings, game masters (GMs) and/or non-player characters (NPCs) to advance the story and puzzle solving. The second group suggested an MR-enhanced murder mystery game, where the proposed technology stack can be used to improve the overall interactability and immersiveness. For example, GPS can be used to guide players to a real flower shop or around a scenic spot, while RFID tags can be hidden in costumes, treasure chests, weapons and other physical objects to assign different virtual attributes to them. Say a player is role-playing a spy with a mission to sneak into a banquet. He or she may need to pick and wear the right combination of costumes and accessories, either acquiring enough “elegance” value for the doorman (who wears VR goggles and can see all guests’ attributes) to allow entry through the front entrance, or going the other way around by wearing costumes low-key enough to sneak into the banquet through the backyard or service entrance. This enriches the ways players can carry out in-game tasks and interact with the NPCs, and greatly extends the game context and playful experience into real-world settings.

  3. 3.

    AR tour guiding game: The third conceptual design was anchored in serious gaming, with the specific purpose of guiding tourists through an itinerary around a scenic spot or within a museum and providing augmented location-aware information. The third group proposed leveraging a multi-branched narrative, as often seen in text-based adventure games (T-AVGs). Players’ choices among different story lines then structure their itineraries in different and meaningful ways, shaping a more personalized and relevant sightseeing experience. Distinguished from the previous two game designs, where the RFID tags are attached to the players and one or more readers are installed at fixed positions, players in this game carry a mobile RFID reader with them, which may be hidden in a lantern, for instance. Following the points of interest displayed on a mini map, players move the lantern around an environmental object or exhibit; when the lantern gets close to a hidden RFID tag, it lights up and reveals a secret hint or a piece of key information. In this case, a see-through AR mode will be used to provide players with augmented visual presentations while maintaining their situation awareness (see Endsley 2021) of physical exhibits and surroundings at the same time.

  4. 4.

    MR multiverse: The fourth conceptual game design is similar to the third in its emphasis on educational purpose, as well as in that players can switch among several “parallel universes” by making different decisions in the game. The proposed technology stack is used to show players a virtual timeline of a real natural landscape, traversing its past, present and future. At certain points, players are given the chance to determine how the future will unfold. For example, if players decide to take care of a sapling in the virtual world, it increases the chance of avoiding possible future disasters resulting from global warming and climate change. The more people make sustainable choices, the more likely the game is to arrive at a promising future.

Among all the above conceptual game designs, the MR murder mystery game was the top-voted idea. Each member of the winning team was rewarded with a Steam gift card worth 50 RMB.

2.3.3 User survey and interview

After sensitizing the participants to the proposed technology stack during the co-design workshop, we conducted questionnaire-based surveys and semi-structured interviews with individual participants. Our aim was to distill fresh, in-depth design insights from user feedback. The user survey was intended to collect quantitative data reflecting users’ experience of some general aspects of the proposed MR technology stack, including user-perceived learnability, innovativeness, engagement and willingness of future use, while the four open-ended questions used in the subsequent interview focused more on speculative uses and game scenarios, as well as the positioning and potentials of the proposed technology stack.

Each session lasted around 15–20 minutes. All 15 interviews were carried out either online or in person, and completed within five days after the workshop, so as to ensure that the participants’ impressions of the workshop content remained fresh and clear. All interviews were audio and video recorded. We conducted our user experiments in compliance with the Helsinki Declaration (2013). Prior to the user experiment and each interview session, we informed the participant of his/her right to withdraw from the interview at any time, the use of the collected data, e.g. individual information, images and sound, as well as other relevant issues. The participant was asked to sign an informed consent form if he/she agreed to grant consent.

Table 2 User survey questionnaire

Note that, instead of a rigorous statistical verification of pre-established hypotheses, the primary objective of the questionnaire-based survey was to identify possible pitfalls highlighted by particularly salient data. For example, as shown in Table 2, Q1-1 asked about the learning difficulty; if a non-experienced participant answered with a rather low score such as 1 or 2, he/she would be asked to provide additional explanations in the subsequent interview, so as to help us identify what may contribute to the entry thresholds perceived by novice users. The survey thus provides a supplementary perspective that facilitates the qualitative analysis.

The participant was first directed to an online questionnaire page, which consisted of four questions, as shown in Table 2 below. We referred to existing validated questionnaires, e.g. the validated UTAUT2 questionnaire for the elderly (Siow 2016) and others (Camilleri and Camilleri 2022; Indrawati and Putri 2018), and tailored our own questionnaire accordingly. The participants were asked to respond to the four questions on a 5-point Likert scale, with the lowest point 1 indicating the most negative feedback (very difficult/strongly disagree) and the highest point 5 indicating the most positive feedback (very easy/strongly agree).

After the participant completed the questionnaire, the semi-structured interview began. The prepared open-ended questions are listed in Table 3.

Table 3 Semi-structured user interview questions

All recorded interview audios were later transcribed into text files, and a thematic analysis was conducted to break down and comprehend the transcribed interview data. Two researchers coded the transcribed text independently, and discrepancies in the resulting codes were then resolved through discussion until the research team reached consensus.
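Independent coding by two researchers can also be checked quantitatively before the reconciliation discussion, for example with Cohen’s kappa. The study itself resolves discrepancies by discussion and does not report a kappa value; the sketch below, with hypothetical code labels, merely illustrates how such agreement could be measured.

```python
# Illustrative sketch: Cohen's kappa for two coders labeling the same items.
# kappa = (observed agreement - chance agreement) / (1 - chance agreement).
# The code labels below are hypothetical examples, not the study's codes.
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' labels over the same items."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

a = ["immersion", "social", "immersion", "cost", "social", "immersion"]
b = ["immersion", "social", "cost", "cost", "social", "immersion"]
print(round(cohens_kappa(a, b), 3))  # → 0.75
```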

Fig. 8
figure 8

User survey questionnaire results

3 Results and discussion

3.1 Questionnaire survey results

Based on the questionnaire responses, we summarized, analyzed and visualized the user survey results using bar charts. Although we present here an overview of the collected data, the quantitative results per se served the purpose of signaling potentially noteworthy issues through salient data and providing corroboratory information for the later qualitative analysis, rather than a rigorous statistical evaluation. As shown in Fig. 8, the workshop participants perceived an average learning difficulty of around 3.2 (standard deviation=0.98) for the introduced MR game technology stack (Q1-1). Moreover, regarding perceived innovativeness (Q1-2, potential contribution to innovative game development), perceived motivation (Q1-3, user motivation toward game development) and willingness of future use (Q1-4), the results indicated considerably high average ratings of 4.4 (standard deviation=0.8), 4.5 (standard deviation=0.62) and 4.5 (standard deviation=0.81) respectively.
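For readers unfamiliar with how such Likert summaries are computed, the snippet below reproduces the kind of mean/SD calculation reported above. The raw per-participant responses are not published, so the response vector here is a hypothetical stand-in constructed to match the reported Q1-1 figures (mean 3.2, SD 0.98), under the assumption that a population standard deviation was used.

```python
# Sketch of computing Likert-scale summary statistics (n = 15).
# The responses are hypothetical stand-ins, not the study's raw data;
# they are chosen to reproduce the reported mean 3.2 and SD 0.98,
# assuming a population standard deviation (pstdev).
from statistics import mean, pstdev

q1_learning_difficulty = [3, 4, 2, 3, 3, 4, 5, 2, 3, 5, 3, 2, 4, 2, 3]

avg = mean(q1_learning_difficulty)
sd = pstdev(q1_learning_difficulty)
print(f"mean = {avg:.2f}, SD = {sd:.2f}")  # → mean = 3.20, SD = 0.98
```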

Given the small sample size of 15 participants, it was difficult to assert any statistical significance. While we observed no extreme outliers in the aforementioned four indicators, a few low scores, such as the perceived innovativeness and the willingness of future use that were each rated 2 points by one participant, were further investigated and reconfirmed during the follow-up individual interviews. Moreover, some results, e.g. high perceived innovativeness and motivation and relatively moderate learnability, were further reinforced by the qualitative analysis as well. In general, we believe that despite numerous technical glitches and the first-time contact, the results implied a common interest in and positive acceptance of this kind of new game technology stack.

3.2 Semi-structured user interview results

In this subsection, we present the common themes that emerged from the user interviews, following the same sequence as the interview questions.

For the first question, “which conceptual game design did you vote for as your favorite”, most participants answered the MR murder mystery game, even though we did not restrict the participants to voting for their own team’s idea. Prior to the COVID pandemic, the number of offline murder mystery shops in China had reached 30,000, according to Xie (2021), and the genre has already become a popular form of entertainment and socializing among the younger generation. Indeed, when asked why they voted for the MR murder mystery game, a significant portion of the participants answered that they had experienced commercial offline ones before. As participant P3 said during the interview:

“Conventional murder mystery game has limitations like restricted immersion and limited player communication, mostly relying on oral communication. I believe that’s where MR technology can come in and improve.”

Interestingly, multiple participants (P3, P6, P7) unanimously expressed the idea that MR murder mystery games, though they do not fit the traditional definition of serious games, fall into a specific genre of game-with-a-purpose that can “provide added values beyond just digital entertainment that contribute to offline real economy, like local tourism, retail and cultural events” (P6). Great potential and innovation space await further exploration, where the technology is expected to bring not only a stronger sense of immersiveness and authenticity, but also greater opportunities to generate real social impact.

When asked “what kinds of games would be suitable for the proposed technology stack”, the participants’ opinions branched into two divergent tracks. The first, partially aligning with the observation drawn from the first question, includes offline murder mystery games, LARPs (Live Action Role-Playing games), scavenger hunts and other service and event games that entail intensive incorporation of specific physical environments, stages, costumes, items etc. In contrast to product games, the duty of preparing game settings and particular gaming equipment, e.g. VR headsets, RFID tags and readers, falls more frequently on the event organizer and service provider. This permits new technology-enabled gaming experiences accessible to a wider audience, without imposing extra device requirements on the users. One such application scenario, mentioned by both P7 and P14, was for museum and exhibition visitors, closely related to the third AR tour guiding game concept presented earlier. P14 described herself as a history museum enthusiast, “but oftentimes I found those exhibitions lacking the handlers for long-term memory. It’s easy to lose track with the exhibits because of information overload.” To this end, the proposed technology stack is believed to be able to magnify the efficiency and effectiveness of digital storytelling and narrative-based serious gaming, according to Abrahamson (1998).

In contrast, the other possible game scenario suggested by the participants focused on a more casual, even daily, gaming context, e.g. location-based mini games whose sessions players can carefreely initiate and terminate during their commute. Such mini games could be embedded into existing social media applications, e.g. WeChat, TikTok, Facebook etc., and would not require players to download any game software or go to specific game venues. One example brought up by P5, which she thought could be combined with MR, was a mobile pervasive chasing game making use of WeChat’s location-sharing function. Similarly, P9 mentioned another WeChat applet, YangLeGeYang, a match-three mini game that became explosively popular on the Chinese social network in 2022. He commented that:

“This kind of light-weighted applet games with simple gameplay, once spread through the social media, may have the chance to grow into a phenomenally popular one. Then people may be attracted to buy a cheap cardboard VR headset and put their phones into it, just to try and play the game.”

The above comment implies two bottlenecks that prevent current MR/XR games from further popularization: technology availability and the lack of killer applications. We have witnessed in the conventional console and PC game market how a phenomenal hit title can spur purchases and massive-scale upgrades of consumer hardware, mainly game consoles and graphics cards. While MR/XR games may still lie outside gamers’ mainstream options, low-cost entry-level devices for easy tryout may help lower the barriers to technology availability alongside a killer application, and the effects of “viral marketing” via social media may further accelerate this process. These findings are also in line with our earlier research by Xiao et al. (2022c) on user engagement in technical systems. However, the same study also pointed out that a more profound ecosystem and game culture need to be cultivated to maintain long-term user engagement once the initial technical novelty wears off.

As for whether there were any pre-/post-workshop differences, 14 out of 15 participants gave positive feedback. For non-experienced participants, the most reported change was a transition in their understanding of game design and development. According to P2,

“To my understanding, game making was once simply equal to coding and software programming...For the first time, it made me aware that how much physical components can actually lend to game creation. (For example?) Say I am making a rogue game, and I can tweak the parameters of each virtual item at each level in a very detailed and complicated way, but still it’s totally different...It can’t compare to the real feeling that physical objects brings about.”

The same insight was shared and reinforced by feedback from the experienced participants. When comparing the proposed technology stack with their previous tools, e.g. Unity, Processing etc., a significant portion of participants considered that the introduction of RFID and geographical positioning adds to the overall interactability and enriches the spatial experience, making them start to “think outside the screen”. Or as P11 put it,

“My research topic is about VR interactive narratives. Many current VR interactive films still heavily rely on or just simply transplant the conventional screen-based interaction into VR, like popping up a menu and asking the audience to pause and make a choice at certain point. VR turned out to be no more than just a gimmick-like thing. The use of RFID makes it possible to rely on more natural and meaningful user interactions, instead of asking users to learn and follow an interaction guide or explicitly perform something.”

There was, of course, some negative feedback about the use of the prototype tools, which also leads to our last question: what further improvements would you propose for the current technology stack?

We received some specific feedback regarding the development tools of the proposed technology stack, specifically the comprehensibility of the programming interface (7 participants). For example, as bare RFID tags are utilized as part of the programming interface, the participants complained that it was difficult to distinguish between multiple tags; also, there was no visible indicator in the current interface showing the read range of the RFID reader, resulting in repetitive tests and extra effort for users to try out and adjust by themselves. Therefore, a more distinguishable RFID interface was considered a must by 3 participants. In the same vein, users currently have to specify the IP address and port number of the RFID host server by changing the corresponding part of the source code. As 2 participants stressed, a better-encapsulated tool that hides lower-layer technical details would especially facilitate non-tech-savvy beginners. In addition, tutorials and community support were specifically valued by 3 experienced participants.
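The host-address complaint above points to a conventional remedy: moving the endpoint out of the source code into a configuration file. The sketch below illustrates the idea in Python; the file name, keys and default values are hypothetical and not part of the current stack, which uses C# scripts and CSV files instead.

```python
# Sketch of externalizing the RFID host server endpoint into a config file,
# as participants suggested, instead of editing the source code directly.
# File name, keys and defaults below are hypothetical illustrations.
import json
from pathlib import Path

DEFAULTS = {"rfid_host": "127.0.0.1", "rfid_port": 8080}

def load_rfid_config(path: str = "rfid_config.json") -> dict:
    """Merge user-supplied settings over defaults; a missing file
    silently falls back to the defaults."""
    cfg = dict(DEFAULTS)
    p = Path(path)
    if p.exists():
        cfg.update(json.loads(p.read_text()))
    return cfg

if __name__ == "__main__":
    cfg = load_rfid_config()
    print(f"Connecting to RFID host at {cfg['rfid_host']}:{cfg['rfid_port']}")
```

A user would then only edit `rfid_config.json`, never the scripts themselves.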

3.3 Discussion

As the principal results of the user feedback analysis, we recognized a few commonly shared opinions among the participants. By synergizing them with our own interpretations, rooted in our previous design practice and empirical knowledge, we arrived at some major insights and further mapped them to the three technological affordance spectra presented in Sect. 2.1. Note that these insights are not confined to the proposed technology stack, but are generalizable to a wider MR game research and practice context. We hence present the three major design implications below, with bracketed numbers indicating the number of participants advocating each:

  1. 1.

    Seamful Design, Seamless Experience: The use cases that attracted the most attention were located between the intervals of room-sized and pervasive games on the Activity Range Spectrum (8). Pervasive MR games are considered to expand and transcend the temporal, spatial and societal boundaries between the game and the real world, i.e. the “magic circle” that exists in conventional games according to Huizinga (2020) and Montola (2011). Game creators may have to balance immersiveness and situation awareness simultaneously, which greatly distinguishes these games from ordinary pervasive games or fully immersive VR games. For example, several participants (4) mentioned that coherent gameplay may occasionally be broken if players need to put on and take off their goggles from time to time in an outdoor gaming scenario, or when players need to perform the action of scanning an RFID tag with a reader. This issue can be partially addressed by concealing the technologies within the surrounding environment and objects, without users’ explicit awareness of their presence. We also suggest seamful design, proposed by Chalmers et al. (2003), as a practical approach to associating the underlying technologies and the seams they create with meaningful interpretations consistent with the context of the gameplay. One such example is the work by Yi et al. (2020), who proposed a mediation device for museum exhibition visitors. When visitors hold the stethoscope-like device close to an exhibited object or panel embedded with RFID tags and move it along the surface, they can see augmented virtual contents through the scope, such as the cardiovascular system inside the human body or the internal texture and structure of a rock sample. By employing the analogy of a stethoscope, it successfully established a meaningful mapping between the user’s action of holding a VR display and the behavior of exploring knowledge itself. This kind of metaphorical design is also able to immerse users in a playful and consistent experience, without distracting their focus from the physical targets to the virtual contents.

  2. 2.

    Think Outside the Screen: We found that participants generally favored the TUI and NUI on the right side of the User Interface Spectrum over the conventional GUI of screen-based games on the left end. This observation aligns with established design guidelines such as the one by Bowman et al. (2004). It is commonly agreed that user interfaces for MR/XR systems should be native and dedicated to the fully or partially immersive environment; in practice, however, we found that it is not always easy for game creators to truly think outside the screen. Auxiliary props like RFID were reported to be explicitly helpful for pivoting game creators toward alternative user interfaces involving physical entities and body movement (8). The participants also expressed their appreciation of RFID’s ability to foster unambiguous and unobtrusive identification and tracking without complicated electronic engineering and programming (4). As a single RFID tag affords only binary input, either activated or deactivated, some participants also expressed their expectation of finer granularity of motion detection (3). It is possible to mitigate this issue with a more sophisticated arrangement, e.g. an RFID tag array with a dedicated gesture recognition algorithm as developed by Wang et al. (2018), or by incorporating extra sensor units like IMUs, so as to capture continuous, fine-grained motion data. However, our experience as well as previous research by Mueller et al. (2018) both confirm that an engaging game experience can benefit more from the innate ambiguity of embodied interaction than from a fully reliable and accurate sensing/tracking utility. A good example is the party game 1-2-Switch by Nintendo, which lets players fill in the blanks where the Joy-Con cannot sense, through their own bodily improvisation and performative play. Therefore, one may need to consider the best match between the intended game experience and the granularity of user interaction, e.g. is full-body motion tracking core to the gameplay? Again, this probably requires MR game creators to think outside the technology framework.

  3. 3.

    Play with Virtuality and Reality: Referring to the Feedback Control Spectrum, our user study results manifested the participants’ needs for enhanced physical entity actuation (1) and, on some occasions, full-fledged virtual-physical synchronization (2). The former is consistent with our previous finding by Xiao et al. (2022a) that existing smart daily objects and environments can provide new play opportunities and new in-game feedback modalities. For example, when a player enters a dark, icy dungeon in a game, connected smart home appliances could be actuated to lower the room temperature and dim the ambient light, creating a responsive physical environment as part of an enhanced and holistic game experience. In the same vein, the research community is seeking to cohere the vestibular and proprioceptive systems with visual stimuli, e.g. aligning vertical floor vibrations with cannon bombing in a VR game (Jung et al. 2022), or utilizing the sense of gravity as a resource for game design (Hämäläinen et al. 2015). In our user study, the participants were concerned that when players interact with RFID-embedded physical objects, the virtual feedback might not be adequate, especially when there are multiple similar objects nearby at the same time. This implies a need for enhanced physical feedback or even higher-level synchronization between the status of physical entities and their virtual counterparts. Currently, the most relevant research identified in this direction is robot-based MR games. While it is technically feasible to incorporate more sensing and actuating utilities into MR games, a mechanism more sophisticated than simple mutual mirroring between a physical entity and its digital representation is worthy of future research effort in this highly context-dependent and open-ended area.
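The dungeon example above, mapping an in-game scene change onto smart home actuation, can be sketched as a simple scene-to-command table. The device names and command format below are hypothetical illustrations; no real smart home API is assumed.

```python
# Illustrative sketch: mapping a virtual scene change to commands for
# connected smart home appliances, mirroring the dungeon example.
# Device names, command names and values are all hypothetical.
SCENE_EFFECTS = {
    "icy_dungeon": [("thermostat", "set_temperature", 16),
                    ("ambient_light", "set_brightness", 10)],
    "sunny_meadow": [("thermostat", "set_temperature", 24),
                     ("ambient_light", "set_brightness", 80)],
}

def actuate_for_scene(scene: str):
    """Return the appliance commands that mirror the given virtual scene;
    unknown scenes yield no actuation."""
    return SCENE_EFFECTS.get(scene, [])

for device, command, value in actuate_for_scene("icy_dungeon"):
    print(f"{device}: {command}({value})")
```

A full-fledged virtual-physical synchronization would additionally need the reverse path, feeding appliance and sensor state back into the game world.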

4 Conclusions

4.1 Limitations

The proposed technology stack relies on smartphones coupled with cardboard goggles to form an abridged mobile VR display. The hardware limitations of current smartphones in computational capacity and graphics processing result in compressed 3D graphics inferior to dedicated displays; two participants in our user study mentioned this issue. Another limitation is that RFID is not currently among standard smartphone configurations, and although NFC offers similar utility, it has a much shorter read range. The stack thus requires extra RFID reader devices; while mobile readers with pre-installed Android OS exist, we have not yet tested the proposed technology stack on them.

It has been our intention that the technology stack settle in the form of an open source project, with users having full access to the source code. To configure and compose their own MR game applications, users currently need to directly modify parts of the source code, e.g. C# scripts in Unity, CSV files at the RFID server end etc. The user study results suggested that exposing the programming interface as bare source code was inadequate. To this end, we are considering a tangible programming interface that may facilitate end-user development (EUD) and collaboration within a mixed reality environment. This will direct our future work in the next iteration, with user elicitation and evaluation conducted throughout the process.

As for the research process and experiment design per se, the questionnaire-based measurement was restricted by the limited sample size (\(n=15\)), and the participants’ ratings may be confined by their limited use experience during the co-design workshop. Again, we want to stress that the interpretation of the quantitative analysis results should be combined and collated with the parallel qualitative results. We position the three design implications as a sort of intermediate-level knowledge between particular instances and general theories. While we are unable to claim universal applicability, the emphasis is on how this kind of generative knowledge construction can inspire and contribute to future design, research and implementation of similar systems.

4.2 Contributions

In this article, we first distilled the technological affordances of MR game systems from the current state-of-the-art literature. The identified technological affordances encompass three different spectra, namely (1) activity range, (2) user interface and (3) feedback control, each of which reflects a gradation between virtuality and reality from the left to the right extremity. By mapping MR game systems onto specific intervals on the spectra, we believe MR game creators can establish a clearer and more precise vision of their target user experience prior to the actual implementation stage.

Secondly, we proposed a general-purpose technology stack for MR game creation, based on our reflection on the preceding technological affordance spectra. It consists of three distinct functional modules: (1) the mobile VR module, (2) the RFID-based interactive module and (3) the outdoor positioning module. To collect first-hand user feedback, a co-design workshop was conducted, in which a total of 15 participants took part and 4 conceptual game designs were generated as outcomes. The user survey and interview results conjointly indicated an overall positive user experience regarding the perceived innovativeness, perceived motivation and willingness of future use of the proposed technology stack, alongside moderate feedback about the learning difficulty. Future applications and potential improvements were also identified from the user interviews.

Last but not least, we further discussed three generative design implications: (1) seamful design, seamless experience, (2) think outside the screen, and (3) play with virtuality and reality. We believe these insights, as a reflection of our previous empirical research and design practice, will contribute to the collective body of knowledge and facilitate the design and development of future MR games and gamified applications in the coming era of the metaverse.