Towards a model-driven approach for multiexperience AI-based user interfaces

Software systems start to include other types of interfaces beyond the “traditional” Graphical-User Interfaces (GUIs). In particular, Conversational User Interfaces (CUIs) such as chat and voice are becoming more and more popular. These new types of interfaces embed smart natural language processing components to understand user requests and respond to them. To provide an integrated user experience all the user interfaces in the system should be aware of each other and be able to collaborate. This is what is known as a multiexperience User Interface. Despite their many benefits, multiexperience UIs are challenging to build. So far CUIs are created as standalone components using a platform-dependent set of libraries and technologies. This raises significant integration, evolution and maintenance issues. This paper explores the application of model-driven techniques to the development of software applications embedding a multiexperience User Interface. We will discuss how raising the abstraction level at which these interfaces are defined enables a faster development and a better deployment and integration of each interface with the rest of the software system and the other interfaces with whom it may need to collaborate. In particular, we propose a new Domain Specific Language (DSL) for specifying several types of CUIs and show how this DSL can be part of an integrated modeling environment able to describe the interactions between the modeled CUIs and the other models of the system (including the models of the GUI). We will use the standard Interaction Flow Modeling Language (IFML) as an example “host” language.


Introduction
The specification and the implementation of the User Interface (UI) of a system is a key aspect in any software Communicated [19] to offer rich interactions between the user and the system. But nowadays, a new generation of UIs which integrate more interaction modalities (such as chat, voice and gesture) is gaining popularity [27]. Moreover, many of these new UIs are becoming complex software artifacts themselves, for instance, through AI-enhanced software components that enable even more natural interactions, including the possibility to use Natural Language Processing (NLP) via chatbots or voicebots. These NLP-based interfaces are commonly referred as Conversational User Interfaces (CUIs).
Even more, many times several types of UIs are combined as part of the same application (e.g., a chatbot in a web page), what it is known as Multiexperience User Inter-face. These multiexperience UIs may be built together in the development environment provided by a Multiexperience Development Platform (MXDP). According to Gartner 2 , MXDPs serve to centralize life cycle activities-designing, developing, testing, distributing, managing and analyzing -for a portfolio of multiexperience apps. Multiexperience refers to the various permutations of modalities (e.g., touch, voice and gesture), devices and apps that users interact with on their digital journey across the various touchpoints. Multiexperience development involves creating fit-for-purpose apps based on touchpoint-specific modalities, while at the same time ensuring a consistent user experience across web, mobile, wearable, conversational and immersive touchpoints. According to Gartner, by 2023, more than 25% of the mobile apps, progressive web apps, and conversational apps at large enterprises will be built and/or run through a multiexperience platform.
Despite the increasing popularity of these new types of user interfaces, and the rich possibilities they offer when combining among them or with preexisting GUIs, we are still missing of dedicated software development methods and techniques that facilitate their definition and implementation. The fragmented ecosystem of languages, libraries and APIs for building them (many times, proprietary and vendorspecific) add a new challenge to their development. Besides, their development is often completely separated from that of the rest of the system, as an ad hoc extension. This raises significant integration, evolution and maintenance challenges. Developers need to handle the coordination of the cognitive services to build multiexperience UIs, integrate them with external services, and worry about extensibility, scalability, and maintenance. We believe a model-driven approach for MXDP could be an important first-step towards facilitating the specification of rich UIs able to coordinate and collaborate to provide the best experience for end-users. Indeed, most non-trivial systems adhere to some kind of model-based philosophy [8] where software design models (including GUI models) are transformed into the production code the system executes at run-time. This transformation can be (semi)automated in some cases. In this sense, the contribution of this paper is twofold: -We propose to raise the abstraction level used in the definition of this new breed of conversational and smart interfaces, facilitating their specification and implementation on a variety of platforms. -We show how these interface models can be used in conjunction with GUI models to combine the benefits of all these different types of interfaces. 2 www.gartner.com.
The rest of the paper is structured as follows: Section 2 provides the background and introduces some preliminary concepts used through the paper; Section 3, which is focused on Conversational User Interfaces (CUIs), presents a new Domain Specific Language (DSL) for specifying these interfaces; Section 4 shows an extension of the standard Interaction Flow Modeling Language (IFML) to describe the links between the multiexperience UIs and the other software components; Section 5 reviews the related work; Section 6 outlines several lines for the further work; and finally Sect. 7 draws the conclusions.

Background and preliminary concepts
Before presenting our model-driven approach for multiexperience applications, we review in this section some basic concepts on User Interfaces (UIs) and their evolution, with a special focus on Conversational User Interfaces.

Evolution of user interfaces
The history of computer User Interfaces is that of the improvement of the user experience. At the beginning, textbased User Interfaces, such as Command-Line Interfaces (CLIs) which evolved from batch computing, allowed to process commands in the form of plain text. The next relevant improvement was the introduction of Graphical User Interfaces (GUIs), which allow interacting through graphical elements such as icons, cursors, and buttons. In the last decades, GUIs evolved to a new paradigm known as Natural User Interfaces (NUIs), which integrate more interaction modalities such as touch, voice or gestures to make the user interactions with the software feel as natural as possible.
This naturalization process often implies embedding some AI components in the UIs (for instance, for voice and gesture recognition), what result in the Intelligent User Interfaces (IUI) [45]. A clear example of Intelligent UIs are the smart Conversational User Interfaces (CUIs) 3 [28], which allow users to interact with computer-based applications via natural language.
Note that an application may include more than one User Interface as part of an integrated multiexperience User Interface enabling the user to interact with the application in multiple ways depending on the more convenient way to accomplish a certain task in a given context. Even in parallel. The development of this new type of multiple and complex

Conversational user interfaces
CUIs are becoming more and more popular every day. The most relevant example is the rise of bots [27], which are being increasingly adopted in various domains such as ecommerce or customer service, as a direct communication channel between companies and end-users. A bot wraps a CUI as key component but complements it with a behavior specification that defines how the bot should react to a given user message.
Bots are classified in different types depending on the channel employed to communicate with the user. For instance, in chatbots the user interaction is through textual messages, in voicebots is through speech, while in gesturebots is through interactive images. Note that in all cases bots are the mechanism to implement a conversation, it just changes the medium where this conversation takes place. As such, bots always follow the similar working schema depicted in Fig. 1.
As you can see in the Figure, the conversation capabilities of a bot are usually designed as a set of intents, where each intent represents a possible user's goal when interacting with the bot. The bot then waits for its CUI front-end to detect a match of the user's input text (called utterance) with one of the intents the bot implements. The matching phase may rely on external Intent Recognition Providers (e.g., DialogFlow, Amazon Lex, IBM Watson,...). When there is a match, the bot back-end executes the required behavior, optionally calling external services for more complex responses; and finally, the bot produces a response that it is returned to the user (via text, voice or images, depending on the type of CUI).
As an example, we show in Fig. 2 a bot that gives the weather forecast for any city in the world. Following the working schema sketched in Fig. 1, this weather bot defines several intents such as asking for weather forecast. When a user writes (considering a chatbot) or says (considering a voicebot) "What is the weather today in Barcelona?" or "What is the forecast for today in Barcelona?", the intent  [17] asking for weather forecast is matched and "Barcelona" is recognized as a city parameter (also called "entity") to be used when building the response. Then, the bot calls an external service (in this case, the REST API of OpenWeather 4 ) to look up this information and give it back to the user.

Modeling smart conversational user interfaces
In this section we describe how the Model-Driven Development approach may be applied to specify bots comprising some type of Conversational User Interface (e.g., chatbots).
To this end, we present a new Domain Specific Language (DSL), that generalizes the one discussed in [17] to cover all types of CUIs and not just chatbots and follows a state-machine semantics to facilitate the definition of more complex flows and behaviors in the interaction with the users. A DSL is defined through two main components [26]: (i) an abstract syntax (metamodel) which specifies the language concepts and their relationships (in this context, generalizing the primitives provided by the major intent recognition platforms used by bots such as [3,21,24]), and (ii) a concrete syntax which provides a specific (textual or graphical) representation to specify models conforming to the abstract syntax.
In the next subsections we provide the abstract syntax of our language split into three packages in order to facilitate its readability: the intent package metamodel (see Sect. 3.1), the behavioral package metamodel (see Sect. 3.2), and the runtime package metamodel (see Sect. 3.3). Besides, we provide a possible concrete syntax, to show how the concepts of the metamodel are instantiated in our running example (see Sect. 3.4).

Intent package
The Intent Package metamodel (see Fig. 3) describes the set of concepts used for modeling the intent definitions at design time.
It defines a top-level Library class containing a collection of Events, representing occurrences that have some consequence for the system [32]. Events, which are identified by its name, are classified in a hierarchy according to the UML [32] and the IFML [31] metamodels. There are two types of Events: ThrowingEvents (events that are generated by the UI) and CatchingEvents (events that are captured in the UI and that trigger a subsequent change). In its turn, we classify CatchingEvents into two sub-categories: UserEvents (events produced by the user) and SystemEvents (events produced by the system itself, usually provided by ExternalServices, which represent a high-level abstraction of external platforms and providers).
Intents, which in the context of CUIs are the most relevant type of UserEvents, represent the user intentions or requests that bots have to address. For each Intent we need to store a set of training sentences (Text) in a specific language, which are input examples used to detect the user intention underlying a textual message. The Text of a training sentence is split into several TextFragments representing input training sentence parts-typically words-to match. Note that we assume voicebots (or other bots which are not dealing with text) have an internal pre-processing which translates all the interactions to text in order to be able to automatize its treatment, as typically done in industrial practice. As a consequence, in this package we can address all Intents directly as text, just like it occurs in chatbots. Besides, Intents can optionally have Parameters which have a name and an entity, and allow to define specific characteristics of an Intent. We define the EntityType enumeration to save the values that a parameter may acquire. Note that, this enumeration includes basic types (String, Integer, Boolean, etc) as well as other more specific types (Date, Time, Location, City, etc) which are supported by most natural language understanding platforms such as dialogFlow 5 .

Behavioral package
The Behavioral package metamodel (see Fig. 4) defines the set of concepts used for modeling the execution logic of the Intent package. For this aim, we reuse the UML statemachine formalism [32]. Roughly, a state-machine is a graph where vertices represent the States and arcs represent the Transitions. State machine execution is triggered by appropriate Event occurrences. In the context of this paper, this representation allows to express the valid (conversational or event-driven) interaction flows between a user and a bot.
A State models a situation in the execution of the Behavior during which some invariant condition holds. A particular state is the InitialState, which is, as its name indicates, the initial state where the bot is when starting its execution. Each State defines a body, which is the Behavior executed when entering the state. All these Behaviors consist of a set of Actions which specify, in a concrete language, the specific functions performed. These Actions may be provided by an ExternalService, such as platforms and providers.
Sates are connected through Transitions. A Transition represents a single directed arc from a single source State and terminating on a single target State. We distinguish two types of Transitions: AutomaticTransitions (which are triggered automatically) and GuardedTransitions (which are triggered when a specific guard holds). A GuardedTransition may be triggered by several Events (in our context, usually an Intent). When multiple triggers are defined for a Transition, they are logically disjunctive, that is, if any of them are enabled, the Transition will be triggered. Besides, a GuardedTransition may have an associated guardedConstraint, which allows a fine-grained control over the firing of the Transition. The constraint is evaluated when an Event occurrence is dispatched by the state-machine. If the constraint is true at that time, the Transition may be enabled, otherwise, it is disabled. When none of the output Transitions hold, the bot executes the general defaultFallback behavior, or, if defined in the state where the bot is on, the localFallback behavior to try to recover from this unexpected situation (e.g., a matched intent in a state where that intent was not supposed to be matched).

Runtime package
The Runtime package metamodel (see Fig. 5) defines some of the classes used during the runtime execution of the deployed bot. This is useful to illustrate how the bot works but also to describe some of the internal structures required to build a bot engine that can be run CUIs as the ones described before. We focus on the classes related with the Intent package and we avoid the classes from the Behavioral package, since we consider the former is the most relevant for the topic of this paper and the later are similar to any proposal for enacting state machines.
In this package, the UserInput metaclass represents the user utterances. Depending on the type of UI we are dealing with, the user input may be in the form of a SpokenInput (in voicebots), a TextualInput (in chatbots) or a GestualInput (in gestual interfaces). Each of these inputs represent Data in form of Speech (for voicebots), Text (for chatbots) or Video (for gestual interfaces). In all cases, these data is translated into text (see the associations speechToText and videoToText). To simplify, in this paper, we do not model the external AI libraries that could be used to perform this translation, but in practice, all these different input types are translated into text (and later on converted back if necessary) for an homogeneous treatment by the CUI.
A UserInput may match with one or more Intents defined in the Intent Package 6 . We assume there exist an intent recognition provider (out of the scope of this paper) used to classify UserInputs into one or more Intents. Each recognized intent 6 Classes from another packages are shown shaded. is represented by the MatchedIntent class, which stores the level of recognition confidence provided by the intent recognition provider.
Finally, for each Parameter of an Intent, the corresponding MatchedIntents keep its value in the association class ParameterValue. Each of these ParameterValues correspond to a specific TextFragment of the analogous UserInput which has derived the MatchedInput.

Concrete syntax
In order to complete the definition of our DSL, in this section we provide an example of textual concrete syntax based on Xatkit [17]. The syntax is presented through examples using the running case study introduced in Sect. 2.2. The complete syntax of this language and its instantiation can be seen in GitHub 7 . Alternative syntaxes (including graphical ones) can be easily mapped to the abstract syntax presented so far 8 .
The first step is defining the asking for weather forecast intent (see Listing 1). Note how as part of the intent, the bot collects the name of the desired city.
Listing 1 Intent definition for the weather bot.

Modeling multiexperience applications
A key element of multiexperience UIs is that every type of interface communicates and collaborates with each other, across different devices and applications, thus giving the user the perception of a unified and shared experience across the different modalities and interfaces. The expectation of users when aiming at a given objective is that he can interact with a variety of applications in different contexts in a seamless way to reach the target.
For instance, Fig. 6 shows a typical scenario of multiexperience user interaction. Suppose that a customer on a Sunday morning wants to buy a new technical product (a cell phone or a home theater system). He first interacts with his home assistant (like Alexa or Google assistant) to ask it to find the best nearby tech store open on Sunday. With this information in mind, he looks at the store web site on his PC and, being satisfied with the kind of store, he asks the web site chatbot to find the type of products he is looking for. After browsing the various alternatives, he finds one item he likes, and sets the place and the product as preferences on his mobile phone. He reads the details of the product on the phone while walking to his car. When he reaches the car, he transfers the information about the place to the car navigation system and drives there. Finally, in the stores he looks around, tries various items, reads the reviews about them on a dedicated mobile app, and finally picks up the product and pays for it.
This kind of dynamic and seamless interaction demands a variety of complex design and implementation mechanisms to be put in place. In the previous section, we have discussed our approach for the specification of CUIs. In this section, we will see how they can be integrated in other, more general, languages to become part of a multiexperience development project that are able to satisfy requirements like the ones described in the above scenario. As before, given the cost and complexity of developing comprehensive multiexperience UIs, MDD can play an important role in the specification and implementation of the user interaction paradigms and of the respective UIs realizations.
In particular, for describing the integration of the DSL presented in Sect. 3 within a broader-scope user interaction design at the modeling level, we will make use of the Interaction Flow Modeling Language (IFML) [31], one of several existing modeling languages for UI design. We describe various levels of integration and we discuss the advantages of a model-based MDXP approach in our context. IFML is an international standard defined by the OMG that aims at the platform-independent description of graphical user interfaces for applications accessed or deployed on a variety of systems including computers, mobile phones, tablets, and smart devices. It supports event-driven specification of the interactions and it integrates seamlessly with other languages for the specification of the business logic, including UML and BPMN.
The focus of IFML is on the structure and behavior of the application front-end as perceived by the end user, i.e., the view part of the application. Therefore, modeling the UI and interaction with IFML amounts to addressing the following aspects: The view structure and components that define the interface shapes, and the contents visible in the UI; The events specification, which consists of the definition of events (coming from user's interaction, application logic, or external agents) that may affect the state of the UI; The reference to actions triggered by the user's events; The events triggered in the user interaction and their effects in terms of action execution and on the state of the UI; And the parameter binding between the elements of the UI and the triggered actions.
Furthermore, IFML can be complemented with external models for connecting to any kind of content model (representing databases, ontologies, file systems or other resources) and any kind of dynamic model (describing the business logic behind the application front end). Various implementation exist for this modeling languages, including WebRatio 9 [1] and IFMLedit.org 10 [6], which provide modeling, validation, and code generation capabilities.
For granting a multiexperience UI featuring a new CUI integration together with the GUI option already predefined in IFML, we focus on extending the IFML language with appropriate metamodel-level concepts, according to the extensibility rules provided within the language itself. We start from defining the different levels of integration that might be of interest for a multiexperience UI, based on increasing levels of complexity: 1. The first integration level consists in supporting the positioning of the chatbot within a specific page, window, or screen of the GUI, so that it becomes visible and accessible from it, alongside the other interface elements; 2. The second integration level is that of data sharing between the bot and the rest of the interface, thanks to various data sharing mechanisms; 3. The third level of integration is that of interactive integration, where the chatbot and the other elements of the UI are connected in a bi-directional way through interactions on one side which can trigger effects on the other and vice versa. 4. the fourth level of integration enables also event-triggered side effects through execution of actions on both sides.
Thanks to the modular and appropriate design of both the IFML metamodel and the chatbot metamodel, all the integration levels can be achieved with a very simple extension strategy, namely the addition of just two concepts in the IFML language, as described in the IFML metamodel excerpts reported in Figs. 7 and 8. Figure 7 shows the integration of the Chatbot metaclass, which defines the presence of the chatbot component. More precisely, Chatbot is defined as a specification of the ViewComponent IFML metaclass, exactly as any other typical component we usually add to interfaces, including Lists, Forms, Details panels, and so on. This allows the Chatbot to be positioned within View-Containers such as Windows, as per the composite design pattern specified in the metamodel. Thanks to the general semantics of IFML, this also allows full support of the two additional levels of integration, because the component can send or receive data to/from other components, and can trigger navigations of the interface. Figure 8 shows the IFML metamodel extension for the chatbot Action metaclass, corresponding to the Action metaclass defined in the Chatbot metamodel. This allows the invocation of actions in the chatbot from any UI component. And vice versa the bot behavior definition can easily reuse and call IFML behaviors by working at the BehavioralFeatureConcept level when needed.
To demonstrate the advantages of this integration, Fig. 9 shows an example IFML model featuring a chatbot integrated within the UI of a tourism web site. The site includes a page Touristic Offers, including a list of touristic Locations. The user can select a location and thus see the details of the Chosen Location. From there, the user can add the location to the cart, for future purchase. The page also features a chat- bot component named Weather Explorer, which allows the user to ask about the weather in specific locations. Besides asking questions and interacting directly with the chatbot, the user can also ask to perform specific tasks, such as add a location to the cart, or start the checkout process. These two requests are performed by the chatbot through the outgoing arrows, which lead, respectively, to the Add to cart action and to the Payment Information page. Information is transferred between IFML components according to the semantics of the language. In particular, this can be achieved according to two strategies: (i) by transferring parameter values along the navigation flow, i.e., the arrows connecting the components, which therefore carry incoming and outgoing parameters; (ii) by defining a shared Context environment, which is available as a global data space to all the components. These behaviors automatically extend to the chatbot component too.
Finally, Fig. 10 shows a deeper level of integration between the pieces of the user interface, where the chatbot is the chatbot receives a Location Set event, which triggers a change of state in the internal bot state machine, independently on where the user triggered the event directly solicited through event notifications. In IFML, any component, possibly from multiple devices and interfaces, can generate events (event generation is represented by black circles in the models). Such events may be captured by other components, which in turn may change their state or trigger some action according to the event meaning (event capturing is represented by a white circle). In the example, a Location Set event may be generated by the user selecting a location in the Web interface, or by a relocation of a smart watch or another device, which generates a GPS positioning event. The chatbot can capture these events and trigger a state change in the internal state machine that determines the evolution of the dialogue of the bot.
Thanks to this wide variety of integration options, a seamless multiexperience integration can be achieved. Indeed, the flexibility of the proposed metamodel allows to integrate blend interactions with the chatbot, with interactions happening on other components of a Web page, and with interactions happening with other interfaces (mobile apps, embedded systems, wearables, environmental sensors, and other devices), provided that they are able to send and receive events according to a classical event-driven architecture. The event-driven approach allows a very general and light-weight integration practice, which is widely supported. At the same time, it allows for direct integration at the user interface level (by sending around front-end events that trigger UI reactions), and at the back-end level (where the events can directly trigger changes in the state of the components or even on the data layer).

Related work
In this section, we summarize a large corpus of previous works that address how the model-driven development approach has been applied to model user interfaces, mainly focusing on Conversational User Interfaces and their integration with other types of interfaces in a multiexperience application.
Instead, there is much less work focusing on the current emerging set of UIs, aiming at establishing more natural and conversational experiences, supported by the AI. These new type of UIs, which are hard to specify [37], test, verify and debug [39] and require a different and specialized skillset [25] could benefit as well from a model-based perspective. Right now, most of the initiatives focus on CUIs and, in particular, chatbots, mainly lead by commercial companies offering a kind of low-code/no-code [12] front-end to their specific chatbot platform. Some examples are: Tock 11 , Engati 12 or FlowXO 13 . With a more research perspective, we have Xatkit [17] (formerly known as Jarvis [16]), a fully modular and extensible platform-independent chatbot modeling language which provides a meta-model and a textual DSL for defining all types of bots (which we have been used as basis for the generalized metamodel in Sect. 3); CONGA (ChatbOt modeliNg lanGuAge) [36] providing a unifying DSL for specifying some types of chatbots that can then be implemented on top of a couple of chatbot platforms (and even migrated from one to the other); and Baudat et al. [5] proposing wcs-OCaml, a new multi-tier chat-bot generator library designed for use with the reactive language Reac-tiveML. Some other works focus on voicebots. For example, tortu 14 which supports the visual creation of conversation flows and VoiceFlow 15 which facilitates a graphical DSL to create voice-based conversation flows that can be deployed on Google home or Alexa.
Nevertheless, all these initiatives focus on specific types of CUIs and treat them as standalone components and not as part of the more general systems in the context of MXDP. Some 11 https://doc.tock.ai/tock/. 12 https://www.engati.com. well-known low-code platforms like Mendix 16 , GeneXus 17 or OutSystems 18 start to include chatbots as part of their system specification. But this support is rather limited and mostly consisting in either simple chatbot templates or in helping you to connect your application with an external AI component defined with a separate tool (e.g., one of the above).
To sum up, although there exist plenty of tools that provide model-based environments to define CUIs they are limited to specific types or bots or target only concrete technologies. Moreover, none of the tools we have analyzed support the MXDP approach as there is no way to link the CUI models with those of other CUIs or with the software models to provide richer multiexperience interactions.

Roadmap
We believe our proposal is a good first step towards this model-based MXDP vision. But there is still plenty of work to be done to advance in the model-driven development of powerful smart Conversational User Interfaces, especially as part of an integrated systems modeling process. In this section, we discuss a few of them.

Modeling the training process of CUIs
AI-based CUIs must be trained before deploying them. In some cases, the process is simple and fast but in others (e.g., CUIs for very specific domains, like the medical context [41]) we may need to reuse pre-trained language libraries and/or perform specific curation tasks on the input training sentences of the bot.
In order to able to completely define the end-to-end process of specifying and generating a CUI, we should provide new modeling primitives for specifying these tasks as part of our CUI metamodel, potentially looking at existing languages for workflow modeling.
This would be very useful also as a way to trace and explain the bot behavior. For instance, a voicebot can present accessibility issues by only properly recognizing voice messages from a certain set of English Speakers. Examining the training process could help as pinpoint the origin of the issue.

A repository of MXDP models and best practices
Many CUIs need to deal with a set of repetitive interaction patterns. Some examples would be greeting users and other 16  chit-chat conversation flows, providing contact information, or detecting and stopping trolls. It would be ideal to have a community-driven repository of partial models for these repetitive tasks that we could easily import in new projects.
Similarly, we could also use this repository to share best practices around CUI design. There is plenty of literature around good design patterns for Graphical User Interfaces but not so much for CUIs. Same applies to patterns and best practices for combining several UIs part of a MXDP approach.

Modeling extensions
Our proposal covers the core aspects of modeling multiexperience interfaces. But there are several other dimensions of bots that could be added to the metamodel shown in Sect.
3. An example is role-based access control. We may need to define bots that hide some intents from some users (e.g., non-authenticated ones) or that respond differently to them. Integrating a RBAC metamodel (or any of its derivatives [30]) in our proposal would allow for a joint definition of the access-control policies together with the bot behavioral aspects.

Testing of MXDPs
There are some efforts for testing bots (e.g., [11]) but they focus on specific testings properties/techniques. Plenty of new research should be conducted to bring CUI testing to the same level of testing we have for regular interfaces.
This testing strategies should also cover system testing scenarios, where we test the collaboration scenarios between different CUIs part of a multiexperience application and not just each individual CUI in isolation. A roadmap for bot testing is also discussed in [13].

Automatic generation of MXDPs
For structured data sources, we can automatically derive a (limited but useful) behavioral/interface mode providing basic query and modification capabilities for the underlying data [2]. The same idea is implemented in many web application frameworks that offer scaffolding capabilities.
This same principle can be applied to CUIs. By looking at the data structure we could derive potential conversations users may want to have on top of that data. There are some initial works in this direction [15,18,35] but results are still preliminary. This generation can also take place at the MXDP level by combining generators for specific types of UIs to cover as well the glue code required to make them work as an integrated multiexperience application.

CUIs for modeling editors
Modeling editors could also benefit from the same advantages we get when adding CUIs to any other type of software systems. Modeling tools are known to be a barrier to entry for the adoption of modeling practices, CUIs could play a positive role here. For instance, a voicebot could help to create models as part of a collaborative design meeting. Or enable stakeholders to have a more direct participation in the modeling process even if they don't master the language notation.

Conclusions
The growing popularity of all kinds of intelligent Conversational User Interfaces is undeniable. And these interfaces are not isolated components. Instead, they are a core element of the software system that embeds them. As described by the MXDP initiative, CUIs must interact with the other interfaces of the system and have access to its functionality and resources.
The large number of libraries, platforms and technologies to develop CUIs complicates even more their development. To facilitate a global, integrated and platform-independent development process for CUIs, this paper has presented a model-based approach for CUIs covering both the design of each individual interface and the discussion of how such design could be combined with other software models for a complete software generation process. This is a first step in this direction. As we discussed in the previous section, there are plenty of research challenges and potential improvements to advance towards this vision. We plan to tackle them next, starting with the chatbot testing and automatic generation aspects.
Acknowledgements This work has been partially funded by the Spanish government (PID2020-114615RBI00) and the AIDOaRt project, which has received funding from the ECSEL Joint Undertaking (JU) under grant agreement No 101007350. The JU receives support from the European Union's Horizon 2020 research and innovation programme and Sweden, Austria, Czech Republic, Finland, France, Italy and Spain.
Funding Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copy-right holder. To view a copy of this licence, visit http://creativecomm ons.org/licenses/by/4.0/.