1 Introduction

In recent years, the evolution of technologies has opened up new frontiers and found innovative solutions in the field of tourism and enhancement of cultural heritage (Bekele et al. 2018). Augmented Reality (AR) is often adopted to complement traditional forms of fruition and offer visitors new ways of exploring and acquiring knowledge during a visit to a place of historical and artistic interest. Based on the approach of edutainment (De Paolis et al. 2011a, b), which exploits game-oriented environments (De Paolis et al. 2010) and multichannel and multisensory platforms (De Paolis 2013) for educational purposes, AR fosters the direct involvement of visitors (Cervellini and Rossi 2011) by providing them with relevant and contextual information. In museums and archaeological sites, it makes it possible to advance reconstructive hypotheses about the structural parts, polychromy and sculptural furnishings no longer present on site. The visitor, using a tablet or smartphone, can thus take a “trip back in time” and see the differences between what was and what is no longer there. This is particularly useful for environments that have changed considerably over time, or for mutilated sculptures and paintings whose polychromy has been lost. AR applications also represent an alternative to the traditional guided tour, enabling a deeper knowledge and understanding of the artwork through an interactive guide with a stimulating, immersive and engaging communicative impact.

In this context, what might partially disorient the user stems from the fact that AR establishes a connection between a story that follows a narrative order and real environments that can be explored in a nonlinear manner (Shilkrot et al. 2014). In light of this consideration, this work starts from a mobile AR application developed for the promotion of the Basilica of Saint Catherine of Alexandria in Galatina, introduced in a previous work (Cisternino et al. 2021), to conduct a deeper study aimed at investigating the interdependencies among usability, user experience factors and mental workload from users’ answers to the SUS, UEQ and NASA-TLX standard questionnaires. At the end of the analysis, a possible integration of these human factors into a unified framework is also discussed.

In particular, some important research questions addressed by this work are:

  • Is the attractiveness of an AR application such as the one analysed in this paper influenced in any way by usability?

  • Is there a significant correlation between the novelty effect and the attractiveness of the application?

  • Is it mentally more demanding to use the application or to learn how to use it?

  • Does the storytelling influence the mental workload?

The rest of the paper is structured as follows: Sect. 2 presents the related work about AR applications for cultural heritage and some models for usability and user experience evaluation; Sect. 3 introduces digital storytelling; Sect. 4 briefly describes the case study on which the AR application presented in Sect. 5 focuses; Sect. 6 introduces the experimental methodology and the questionnaires used for the post hoc analysis; Sect. 7 analyses the collected data about users’ impressions; Sect. 8 discusses and summarizes the main findings; and Sect. 9 concludes the paper.

2 Related work

Several AR applications have been developed, especially in the archaeological field, since this technology helps the user better understand the transformations that occurred over time through the visualization of faithful reconstructions of monuments and landscapes (Bonacini 2014). The European Union immediately grasped the potential offered by this technology and funded several cultural projects, such as ARCHEOGuide (Augmented Reality based Cultural Heritage On-Site Guide) (Vlahakis et al. 2002) and iTACITUS (Intelligent Tourism and Cultural Information through Ubiquitous Services) (Zoellner et al. 2007). The former is an application that guides the user during a visit to an archaeological site using a laptop equipped with an HMD visor that returns the image of the temple of Zeus at Olympia, superimposing it on the existing ruins. The latter is a tool designed to enhance the points of interest of a territory by providing information in AR simply by pointing a device towards an artwork.

Another mobile AR application (Cisternino et al. 2018) was developed to support the promotion of the archaeological areas of the “Castello di Alceste” Diffuse Museum in San Vito dei Normanni (Brindisi) and the “Fondo Giuliano” site in Vaste (Lecce): when an aerial photograph of the areas is framed using a smartphone, 3D models and other contextual information are visualized.

The ArkaeVision project (Bozzelli et al. 2019) proposed an integrated VR/AR framework for a gamified exploration with elements of digital fiction and an engaging storytelling: it employs VR technology for the exploration of the Temple of Paestum and AR technology for the exploration of the slab of the Swimmer Tomb.

Another work (De Paolis et al. 2021) studied the use of virtual and mixed reality technologies for the promotion of an underground oil mill, addressing place accessibility issues and the valorization of itineraries and rural heritage.

An outdoor location-based mobile AR application that allows users to interact with both virtual and physical objects was presented in Ping et al. (2020): experimental tests were conducted to assess navigation accuracy, ease of use, interaction naturalness, attraction, immersion and users’ attitude towards the application. The tests also revealed the ability of the application to improve learning effectiveness and extend the focusing time of participants.

A special kind of on-site exploration made possible by mobile AR is based on virtual portals, located in specific points of interest, that emulate a transition to a virtually reconstructed past reality when users pass through them (Cisternino et al. 2019).

Location-based AR can also exploit unmanned aerial vehicles (UAVs) to take pictures of an archaeological site from an aerial perspective and display various contextual information (3D models, textual information, etc.) over them (Botrugno et al. 2017).

AR has also been used in various museums. The Museum of London was one of the first museums to make use of AR applications: with StreetMuseum, an application that uses geolocation to render certain views of the city in AR, the user can take a trip back in time and see monuments as they appeared in the past. Another museum that was able to take advantage of AR technology to enhance the understanding of its collections was the Franklin Institute in Philadelphia, which, from 30 September 2017 to 4 March 2018, hosted the Terracotta Warriors of The First Emperor exhibition (Terracotta 2021), including ten of the 8,000 famous statues unearthed in 1974 by a Chinese farmer. The AR application that accompanied the exhibition allowed visitors to frame the warriors and to digitally reconstruct their weapons. By zooming in, it was also possible to admire the statues in detail. In Italy, too, several museums have used this technology to enhance cultural heritage, transforming the traditional guided tour into a more interactive and engaging experience. Among the regions that first proposed cultural tourism routes based on mobile applications, we would like to point out Tuscany and Apulia, whose applications are configured as interactive guides of the territory, with some maps indicating the points of interest, the routes to reach them and some additional information that enriches the knowledge. The potential AR interaction patterns for guided tours in museums were explored in Liu et al. (2021), where a possible combination of handheld device and head-mounted display was studied.

The mobile AR application described in De Paolis et al. (2018) recognizes the sketches on the Atlantic Codex and superimposes animated 3D models showing the structure and the working principles of the machines designed by Leonardo Da Vinci. Another application based on touchless interaction using the Kinect device was made to allow further study of these machines and tests were carried out to assess the impact in a learning context (De Paolis et al. 2019).

Other applications have instead focused on a single artwork, allowing the user to travel back in time through 3D reconstructions viewed by means of visors, or to study the painting technique in depth by penetrating into the brushstrokes or preparatory layers of a painting and, in some cases, restoring the original colours now lost. An example of this is the innovative project “L’Ara com’era”, realized with the scientific collaboration of the Superintendence of Roma Capitale and aimed at enhancing the Ara Pacis of Augustus. Visitors, wearing a Samsung Gear VR visor that houses a smartphone, have the opportunity to see again the colours that originally enriched the monument and to view 3D movies that recount some aspects of it. Even the Hall of Frescoes in the Palazzo Comunale of Tarquinia (Hall 2021) has been the subject of an interesting AR project that has given voice to the characters depicted, allowing them to tell the story, anecdotes and curiosities about the past of the city. There are also museums created specifically to take advantage of AR technology, as in the case of MAUA, the Museo di Arte Urbana Aumentata (Museum of Augmented Urban Art) (MAUA 2021), a diffuse museum created in Milan, Palermo and Turin. The street artworks, once framed with the smartphone, come to life and transform.

A different kind of AR application for cultural heritage is Spatial Augmented Reality (SAR), also referred to as video mapping (VM), a particular form of AR that consists in the projection of light beams on surfaces in order to transform the facades of buildings into screens, altering the real vision and enriching it with content (Bimber and Raskar 2005; Cisternino et al. 2021; De Paolis et al. 2022).

A multiplatform usability evaluation test, based on interviews, observations, think-aloud protocol and questionnaires, was conducted on a digital storytelling application for the city of Timisoara in Romania employing interactive touchscreen table, desktop/laptop, mobile and AR platforms (Vert et al. 2021). The study demonstrated the strengths of various types of technologies in cultural heritage: an interactive touchscreen table fosters social interaction, a desktop/laptop application allows for detailed exploration of content, a mobile application allows for on-the-fly exploration and sharing of information with other users and an AR application offers the possibility of deepening topics at certain landmarks.

The study in Jin et al. (2022) compared natural user interface and graphical user interface for a Hololens narrative application: between the two modes, the former turned out to be the best performing system for users without Role Playing Game (RPG) experience, while the latter turned out to be the best performing system for users with RPG experience.

2.1 Usability and user experience

According to ISO 9241-11 (International Organisation for Standardisation 2018), usability is “the extent to which a system, product or service can be used by specified users to achieve specific goals with effectiveness, efficiency and satisfaction in a specified context of use”.

User performance is influenced by mental workload (Cain 2004), also known as cognitive workload, which is the mental effort needed to perform the tasks. It can be evaluated by means of quantitative performance tests, physiological measures or subjective feedback collected through questionnaires (Moustafa et al. 2017).

Various questionnaires were developed to be administered to users after their experience with a system or application to assess the usability they perceived (Assila et al. 2016; Lewis 2018; Hajesmaeel-Gohari and Bahaadinbeigy 2021).

The HARUS questionnaire was developed for the usability of mobile AR applications (Santos et al. 2014, 2015): it was conceived from an analysis of perceptual and ergonomic issues, which inspired respectively the dimensions of comprehensibility, i.e. the ease of understanding the content, and manipulability, i.e. the ease of handling the device during a task.

User experience goes beyond the traditional concept of usability and covers additional factors such as usefulness, emotional factors and design elegance (Vosinakis and Koutsabasis 2018). As stated by ISO 9241-210 (International Organisation for Standardisation 2019), it deals with “perceptions and responses that result from the use or anticipated use of a product, system or service”.

The Augmented Reality Immersion (ARI) questionnaire (Georgiou and Kyza 2017) was designed to measure immersion in location-based AR. It is made up of 30 items, which can be grouped into six factors: interest, usability, emotional attachment, focus of attention, presence and flow.

The framework proposed in Okanovic et al. (2022) for extended reality environments consists of an introductory part, a quantitative evaluation and open questions. It covers three subscales that can be mapped with two constructs of the UTAUT model introduced several years earlier (Venkatesh et al. 2003): immersion and edutainment are linked to performance expectancy (i.e. how much an individual believes that using the system will help him/her improve his/her performance), while perceived ease of use coincides with the definition of effort expectancy. Data analysis revealed a positive effect of the narrative quality on immersion and edutainment, even when ease of use issues occur.

Among the various models developed for storytelling evaluation, the Narrative Engagement Scale (Busselle and Bilandzic 2009) consisted of several items grouped into four dimensions:

  • Narrative understanding;

  • Attentional focus;

  • Emotional engagement, which refers both to the ability to feel the emotions of the characters and to feelings towards them;

  • Narrative presence, which refers to a transition from the real world to the story world.

The emotional gratification scale presented in Bartsch (2012) consists of three factors describing rewarding feelings (fun, thrill and empathic sadness) and four factors concerning emotions in the context of social and cognitive needs (contemplative emotional experiences, emotional engagement with characters, social sharing of emotions and vicarious release of emotions).

Jin et al. (2022) designed a scale to assess the empathy and the heartfelt connection to the protagonist by combining an emotional engagement scale extracted from Bartsch (2012) and a contemplativeness scale derived from Bartsch et al. (2014) and Bartsch (2012).

A comprehensive framework for the evaluation of the impact of interactive guides and AR/VR technologies on museum settings is the MUSETECH model (Damala et al. 2019): it considers three symbolic entities (the Visitor, the Cultural Heritage Professional and the Museum) and divides the life cycle of technologies into four phases (design, content, operation and compliance).

The questionnaire used in Boskovic et al. (2017) for a cultural heritage application consists of a user profiling section, 30 items for the evaluation of immersion, edutainment and usability and some final open questions to collect users’ opinions on their favourite and most problematic parts. The results of the application to a case study revealed the potential of virtual models to engage users in the exploration of historical artefacts by increasing their motivation, even in the presence of navigation and visibility issues. They also highlighted the importance of the quality of the production and the performance of digital stories for a high level of user immersion.

3 Digital storytelling

According to Abbott (Porter Abbott 2014), “story is an event or sequence of events”, while “narrative discourse is those events as represented”. Ryan (2006) also included other elements in the narrative concept such as a story world, intelligent characters, a timeline and meaningful events.

Digital storytelling refers to a variety of digital media platforms designed for narrative purposes (Miller 2019). In interactive digital storytelling, the user plays an active role, by influencing the flow or even the content of the story (Okanovic et al. 2022).

According to the model proposed in Roth and Koenitz (2016), user experience in interactive digital narratives can be described through 12 dimensions clustered into three experimental qualities: agency, immersion and transformation.

Agency, defined as the ability to take meaningful actions (Murray 1997), includes usability, which is a precondition for any enjoyable user experience and significantly influences effectance, autonomy and user satisfaction. In this context, effectance is the effect of an action, while autonomy refers to the freedom to choose from various options that can influence a narrative without feeling constrained in one direction. However, it is important to underline that high availability of options does not necessarily produce a more enjoyable experience.

Murray (1997) describes narrative as a “transformational” experience that can change the user. Interactive Digital Narratives foster a more direct connection (Murray 1997) through an active participation of the user, also by providing alternative paths and outcomes in different sessions.

An important factor of transformation is eudaimonic appreciation, which represents a special kind of engagement that derives from the aesthetic presentation of the interactive narration and leads the user to develop a personal dimension linked to his/her previous experiences. The distinction between hedonistic enjoyment and eudaimonic appreciation was explored in Oliver and Raney (2011): the former is related to pleasure and amusement, while the latter is related to the need to research and reflect on the meaning, truths and purposes of life.

Other components of the transformation dimension are affect, which consists of various measures of the affective states perceived by the user, and enjoyment, which represents entertainment in a broader sense (Roth and Koenitz 2016).

3.1 Augmented Reality for digital storytelling

One form of interactive digital storytelling is provided by Augmented Reality (AR), a technology that attempts to blur the line between reality and fiction (Shilkrot et al. 2014). A widely discussed topic is the connection that AR establishes between the story world, which follows a narrative model, and the real world, which the user can freely explore in a nonlinear way (Shilkrot et al. 2014). Moreover, AR exploits the concept of minimal departure, where a fictional story world can be understood due to its derivation from a real world that users are already able to perceive for themselves (Herman et al. 2007).

Based on interaction mode, AR-based narratives can be classified into three categories (Shilkrot et al. 2014):

  • Point of view-based exploration, which involves the user in a first-person game experience, where they are able to delve into the story through the eyes of a character;

  • Space-based exploration, where the user can activate the playback of storytelling content associated with frame landmarks or markers in a real-world space;

  • Ontological interaction, which offers the possibility of altering the plot or the world of the AR narrative.

According to the type of experience in which users are involved, AR-based narratives can be divided into three categories (Shilkrot et al. 2014):

  • Situated augmented narratives, which typically have a local nature and take place within a limited time;

  • Location-based narratives, which can augment a wider area of the physical world by exploiting various portals that give access to a story world;

  • World-level narratives, which extend globally and over a long period of time.

This paper examines the case of a space-based exploration with a location-based narrative, made possible by an AR application that is articulated through eleven points of interest associated with portions of the frescoes located at various points in the Basilica (as explained in detail in Sect. 5).

4 The case study: the Basilica of Saint Catherine of Alexandria in Galatina

After the birth of the Kingdom of Italy, the Basilica of Saint Catherine of Alexandria in Galatina (Lecce, Italy) was classified as a “National Monument of the first category” and in 1929 it was entrusted to the Friars Minor of the Province of St. Joseph. At the end of the last century, it was granted the title of Pontifical Minor Basilica. The facade, which underwent a recent restoration that brought back to light the warm colour of the stone, is characterized by some disharmonious elements caused by the different interventions that followed one another over the centuries. A stringcourse frame runs horizontally, making the lower part project further than the upper register, which houses, in the centre, a stone rose window. The three portals correspond to the main nave and to the two lateral ambulatories.

The central portal, larger than the other two, is characterized by the presence of a porch resting on two columns supported by lions and surmounted by headless griffins. The three bands of stone that frame the wooden portal are richly carved with acanthus spirals and anthropomorphic, zoomorphic and phytomorphic figures. On the architrave is placed the relief representing Christ among the twelve apostles, in a frontal and hierarchical arrangement that reveals the presence of archaisms.

The rose window placed in the upper curtain wall is decorated with two orders of concentric circles carved with an ornate plant motif. The twelve rays hold the stained glass windows and converge towards a central polychrome oculus representing the coats of arms of the d’Anjou Durazzo and d’Enghien-Brienne.

The interior of the basilica is striking for its solemn longitudinal layout and for the pictorial decoration that covers its walls (Fig. 1). The central nave is divided into three spans with cross vaults and is separated from the lateral spans by three lowered pointed arches. The spans are marked by transverse ogival arches resting on capitals sculpted with ornamental motifs taken from the Romanesque repertoire.

The pictorial decoration must originally have covered all the walls of the building. Today, the walls of the church preserve traces of two pictorial campaigns: the first one, which took place around 1391, is visible in the lower registers where devotional frescoes depicting saints appear. The second one, of uncertain dating (Cuciniello 2014), unfolds on the walls and on the vaults. Scenes from the Apocalypse of John are depicted in the first span and on the counter-façade, while the Virtues are painted on the vault. The second span is dedicated to the stories of Genesis, surmounted by the representation of the Sacraments. The walls of the third bay are occupied by Christological scenes from the Gospel, while the vault is decorated with the angelic hierarchies.

The story of the life of Saint Catherine is narrated in the presbytery which, at the top, is dominated by representations of the Evangelists and the Doctors of the Church. In the right aisle, instead, where the Orsini chapel is located, there are frescoed scenes of the life of Christ and the Virgin. Traces of a previous pictorial campaign, probably dating back to 1391, are recognizable in the lower parts of the basilica where there are numerous votive representations. According to the most recent criticism, artists of different training and origin worked on the Galatina pictorial cycle (Casciaro 2017). Scholars have recognized the intervention of three workshops headed by as many masters: one worked on the stories of Saint Catherine and on the vaults with the Evangelists, the angelic hosts and the Sacraments; another seems to have taken care of the walls of the second and third span, with stories from Genesis and the life of Christ; a third group of artists was active in the cycle of the Apocalypse. While the first two masters reveal a Venetian education, the master of the Apocalypse seems to have had knowledge of Neapolitan painting, as attested by the similarity with Giotto’s lost cycle in Santa Chiara. The hand of a fourth master, more inclined to anecdotal narration, can instead be identified in the frescoes that cover the Orsini Chapel.

Fig. 1 Interior of the Basilica of Saint Catherine of Alexandria in Galatina (courtesy of the Archdiocese of Otranto)

5 The mobile AR application

5.1 Design

The AR application was born from the need to provide visitors with an innovative tool that can complement the traditional guided tour and enhance it through thematic tours capable of enriching the cultural offer and facilitating the reading of the valuable pictorial cycles that cover the walls of the building. The AR application can offer users a more engaging and educational experience, providing them with a wealth of information conveyed through multimedia content.

An in-depth preliminary study on the historical and artistic aspects of the Basilica was carried out to identify the pictorial portions of greatest interest: in particular, an accurate bibliographical study and numerous inspections were necessary to identify the position of the frescoes and the sources of illumination. The main challenges were the construction of a storytelling based on solid historical and critical foundations and the selection of pictorial portions that could be easily framed by the users’ devices. The iconographic and iconological heterogeneity of the cycles, whose interpretation is still the subject of studies by art historians, led to the selection of eleven points of interest, reported in Tables 1 and 2, chosen on the basis of historical and stylistic peculiarities little known to most visitors. The constructed itinerary is guided by the voice (interpreted by a speaker) of Pietro Cavoti (1819-1890), an illustrious artist from Galatina who devoted himself for a long time to the study of the Basilica in his capacity as President of the Conservative Commission of the Monuments of Terra d’Otranto and as Inspector of the Monuments (Montinari 1978). He made numerous drawings and sketches, accompanying them with historical and stylistic annotations. In order to better frame his activity, in-depth research was carried out at the Fondo Cavoti, which allowed the photographic acquisition of numerous watercolours and pencil sketches.

Even though several tests showed that the pictorial portions placed in the upper registers and on the vaults can also be recognized, it could be uncomfortable for visitors to enjoy the content while keeping their arms stretched upwards. For this reason, in the design of the application, it was decided to keep the AR content visible even when the visitor points the device in other directions. Moreover, since visitors are not allowed to access the presbytery areas, where the Funeral Monuments and the cycle with the Stories of St. Catherine are located, such frescoes are excluded from the list of points of interest.

Table 1 Points of interest of the guided tour—part 1
Table 2 Points of interest of the guided tour—part 2

5.2 Interface

A careful design of the graphic interface was carried out to make the application pleasant and easy to use even for inexperienced users, in order to meet the principles of effectiveness, efficiency and user satisfaction. A good interface and effective storytelling provide a sound experimental basis for the analysis of the relationships between various user experience indicators conducted in the following sections.

In particular, the application interface is made up of graphic shapes taken from architectural and pictorial elements and colours with a strong symbolic value, such as white and gold, which are a clear reference to the royalty of the Basilica’s patronage.

After downloading the application, the user accesses the interface in Fig. 2, which includes a full-screen photograph of the main nave of the Basilica, the application logo and a button that allows access to instructions. At the bottom, the arrow on the right gives access to a plan of the basilica containing clickable points of interest. A drop-down menu at the top provides access to textual information on the history, the pictorial cycles and the adjacent museum.

The map at the bottom left of Fig. 2 indicates the hot spots where the frescoes to be framed with the smartphone are located. The user starts the journey from point 1 and gradually moves towards the other points accompanied by the audio guides. By clicking on a point, the user accesses a preview screen that helps them understand which portion of the painting to frame with the camera to activate the visualization of the AR content associated with it. It contains a photographic reproduction, the title of the fresco and indications regarding the cycle to which it belongs. The camera-shaped button at the bottom right gives access to the AR scenes: it is sufficient to frame the painting of interest in order to see the icons corresponding to the audio guides and, in some cases, to the virtual restoration of lost parts of frescoes or to the game that lets the user listen to the sound of some mediaeval musical instruments depicted in the frescoes.

Fig. 2 AR application interface

5.3 Implementation

The AR application, developed in Unity (Unity 2021) for Android and iOS platforms, is based on the Vuforia development kit (PTC 2021) (version 2018 4.12f1): it supports a variety of 2D and 3D target types, enables markerless and multitarget feature tracking, and has an internal fiducial marker system known as VuMark. Additional features of the SDK include Occlusion Detection, Extended Tracking, and Cloud Storage, i.e. online space for storing images and evaluating them in terms of tracking quality.

The analysis of the context led to the choice of markerless tracking based on natural features and on the recognition of two-dimensional images represented in the frescoes of the Basilica. Therefore, a photographic campaign was carried out to obtain the application targets. At the end of each inspection, all the material was acquired and reviewed, and, in order to improve the reading of the images by Vuforia, each shot was post-processed in Photoshop to crop the images, increase the colour contrast and keep them within the maximum limit of 2 MB imposed by Vuforia. A subsequent evaluation of the photographs allowed us to assess how suitable each image is for tracking: the higher the number of discontinuities, i.e. features, the more suitable that photograph will be for image recognition.
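The idea that an image with more discontinuities is a better tracking target can be illustrated with a rough proxy. The sketch below is not Vuforia's rating algorithm, whose internals are not described here: it simply counts pixels whose intensity jumps sharply with respect to the right or lower neighbour in a small grayscale grid. The function names, the threshold value and the two toy images are assumptions made for illustration only.

```python
from typing import Dict, List

Image = List[List[int]]  # grayscale image as rows of 0-255 intensities


def count_discontinuities(image: Image, threshold: int = 40) -> int:
    """Count pixels with a sharp intensity jump to the right or lower neighbour.

    This is a crude stand-in for a feature detector (assumption), used only
    to compare candidate targets with each other.
    """
    rows, cols = len(image), len(image[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            right = abs(image[r][c] - image[r][c + 1]) if c + 1 < cols else 0
            down = abs(image[r][c] - image[r + 1][c]) if r + 1 < rows else 0
            if max(right, down) >= threshold:
                count += 1
    return count


def rank_targets(candidates: Dict[str, Image], threshold: int = 40) -> List[str]:
    """Return candidate names sorted from most to fewest discontinuities."""
    return sorted(
        candidates,
        key=lambda name: count_discontinuities(candidates[name], threshold),
        reverse=True,
    )


# Toy images (hypothetical): a uniform patch and a high-contrast checkerboard.
flat = [[120] * 4 for _ in range(4)]
checker = [[0 if (r + c) % 2 == 0 else 255 for c in range(4)] for r in range(4)]

ranking = rank_targets({"flat": flat, "checker": checker})
```

Under this proxy the checkerboard ranks above the uniform patch, which mirrors the observation that high-contrast, detail-rich photographs are better candidates for image recognition.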

6 Methodology of analysis and experimentation

The factors of usability and user experience and the connections between them have already been discussed in the literature: most of the published papers tried to define theoretical models or questionnaires used to individually assess the characteristics of an application. At the same time, several models have been proposed to describe various factors that characterize digital storytelling and its effectiveness. This paper follows a different approach, which aims to evaluate by means of a post hoc analysis the connection between different usability and user experience factors in the specific case of a mobile AR application for cultural heritage, also assessing possible correlations with the mental workload.
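As an illustration of what a post hoc correlation between two questionnaire scores could look like, the following minimal sketch computes a Spearman rank correlation in plain Python. The choice of Spearman's coefficient is an assumption, since this section does not state which coefficient is used, and the two score lists are hypothetical values, not data from the study.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("samples must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (spread_x * spread_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-participant scores (illustration only).
usability_scores = [55.0, 70.0, 82.5, 90.0, 65.0]
attractiveness_scores = [0.4, 1.1, 1.8, 2.3, 0.9]

rho = spearman(usability_scores, attractiveness_scores)
```

A rank-based coefficient is a natural fit for ordinal questionnaire data because it only assumes a monotonic relationship between the two scores; a significance test would still be needed before drawing conclusions.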

The tests involved 41 visitors of the Basilica of various ages (from 23 to 77, distributed as illustrated by the density plot in Fig. 3) without previous experience with AR technologies.

As the Basilica is a sacred place, they were provided with both tablets and headphones at the beginning of the visit to allow them to enjoy the contents of the audio guides in silent mode.

After the visit, visitors were asked to fill in a questionnaire made up of items taken from NASA-TLX (Nasa and Administration 2010), SUS (Brooke 1996) and UEQ (Laugwitz et al. 2008) standard tools dealing with the perceived workload, the application’s usability and the user experience.

Fig. 3 Density plot of the ages of the users involved in the test

6.1 NASA-TLX

The NASA Task Load Index (NASA-TLX) (Nasa and Administration 2010) was designed to assess the perceived workload during the execution of a task. Among the 6 NASA-TLX items, physical demand and temporal demand were discarded, since the only physical activity required to use the application consists in touch-based interactions on a tablet and no hurry is imposed during the tasks. The considered items were Mental demand (how much thinking, deciding or calculating was required to perform the task), Performance (the success in accomplishing the application tasks), Effort (how hard the user had to work to achieve his/her level of performance) and Frustration, with scores expressed on a scale from 0 to 6.

6.2 SUS

The System Usability Scale (SUS) (Brooke 1996; Borsci et al. 2009) consists of a 10-item questionnaire with five response options ranging from “Strongly Disagree” to “Strongly Agree”.

  1. I think that I would like to use this system frequently.

  2. I found the system unnecessarily complex.

  3. I thought the system was easy to use.

  4. I think that I would need the support of a technical person to be able to use this system.

  5. I found the various functions in this system were well integrated.

  6. I thought there was too much inconsistency in this system.

  7. I would imagine that most people would learn to use this system very quickly.

  8. I found the system very cumbersome to use.

  9. I felt very confident using the system.

  10. I needed to learn a lot of things before I could get going with this system.

The SUS questionnaire is known to have a two-factor structure (Lewis and Sauro 2009), consisting of Learnability and Usability in a strict sense, which are weakly correlated (Borsci et al. 2009). The former is made up of items 4 and 10, while the latter is made up of all the remaining items.
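As an illustration of how the two SUS factors can be derived from raw answers, the following Python sketch applies Brooke's standard item scoring (odd items contribute the response minus 1, even items contribute 5 minus the response) and then averages the contributions separately over the Learnability items (4 and 10) and the remaining Usability items. The function names, the dictionary-based input, the example respondent and the choice of reporting each factor as a mean contribution on a 0 to 4 scale are assumptions made for this example, not details taken from the study.

```python
LEARNABILITY_ITEMS = (4, 10)
USABILITY_ITEMS = (1, 2, 3, 5, 6, 7, 8, 9)


def item_contribution(item, response):
    """Map a 1-5 answer to a 0-4 contribution (higher is always better)."""
    if not 1 <= response <= 5:
        raise ValueError("SUS responses must be between 1 and 5")
    # Odd items are positively worded, even items negatively worded.
    return response - 1 if item % 2 == 1 else 5 - response


def sus_scores(responses):
    """responses: dict mapping item number (1..10) to the 1-5 answer."""
    if sorted(responses) != list(range(1, 11)):
        raise ValueError("expected answers to items 1..10")
    contrib = {item: item_contribution(item, r) for item, r in responses.items()}
    return {
        # Classic overall SUS score on a 0-100 scale.
        "overall": 2.5 * sum(contrib.values()),
        # Assumed reporting convention: mean contribution per factor (0-4).
        "usability": sum(contrib[i] for i in USABILITY_ITEMS) / len(USABILITY_ITEMS),
        "learnability": sum(contrib[i] for i in LEARNABILITY_ITEMS) / len(LEARNABILITY_ITEMS),
    }


# Hypothetical respondent, for illustration only.
example = {1: 4, 2: 2, 3: 5, 4: 1, 5: 4, 6: 2, 7: 5, 8: 1, 9: 4, 10: 3}
scores = sus_scores(example)
print(scores)
```

Keeping the two item sets as module-level tuples makes it easy to recompute the factors under a different grouping of the items.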

6.3 UEQ

The 26 items of the UEQ questionnaire are grouped into 6 components (Laugwitz et al. 2008):

  • Attractiveness, which is the general impression towards the product;

  • Efficiency, which describes how fast and efficient the application is to use, including the organization of the user interface;

  • Perspicuity, which is the ease to get familiar with the product;

  • Dependability, which is the perception of controlling the interaction;

  • Stimulation, which concerns the interest and motivation fostering the use of the application;

  • Novelty, which expresses how creative and eye-catching the design is considered.

7 Result analysis

Principal component analysis was the first method employed to study data dimensionality and detect any relation among the considered variables. Then a cluster analysis was carried out by means of the ICLUST package, based on a hierarchical algorithm (Revelle 1978). ICLUST and principal component analysis can provide different results, as they are based on two different approaches (Cooksey and Soutar 2006): while the former tries to maximize internal consistency and homogeneity, the latter aims at maximizing variance by considering all the items simultaneously. Internal consistency represents how closely related a set of items are. A well-known internal consistency measure is Cronbach’s alpha (Cronbach 1951), which is defined as the mean of all the possible split-half reliabilities of a scale. However, this measure assumes a single underlying general factor. For this reason, Revelle introduced coefficient beta (Revelle 1979), defined as the minimum value among all the possible split-half reliabilities, to assess the scale homogeneity, which is assumed by coefficient alpha.
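The relation between the two coefficients can be sketched in a few lines of Python. The example below computes alpha from item and total-score variances, enumerates every split of a four-item scale into two halves of equal size, and takes the mean and the minimum of the split-half reliabilities. The data are hypothetical, the split-half reliability is computed as 2(1 - (var A + var B) / var total), and the brute-force enumeration is an assumption of this sketch: ICLUST may estimate beta differently, from the hierarchical cluster structure.

```python
from itertools import combinations
from statistics import variance


def cronbach_alpha(items):
    """items: list of columns, one list of respondent scores per item."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def split_half_reliabilities(items):
    """Reliability of every split into two halves of equal size."""
    k = len(items)
    if k % 2:
        raise ValueError("this sketch only handles an even number of items")
    n = len(items[0])
    var_total = variance([sum(row) for row in zip(*items)])
    result = []
    for half in combinations(range(k), k // 2):
        if 0 not in half:
            continue  # item 0 stays in the first half, so each split is counted once
        other = [i for i in range(k) if i not in half]
        a = [sum(items[i][r] for i in half) for r in range(n)]
        b = [sum(items[i][r] for i in other) for r in range(n)]
        result.append(2 * (1 - (variance(a) + variance(b)) / var_total))
    return result


# Hypothetical scores: 4 items (columns) answered by 6 respondents.
items = [
    [1, 2, 3, 4, 5, 3],
    [2, 2, 4, 4, 6, 3],
    [1, 3, 3, 5, 5, 2],
    [2, 1, 4, 3, 6, 4],
]
splits = split_half_reliabilities(items)
alpha = cronbach_alpha(items)
beta = min(splits)  # worst split-half reliability
print(round(alpha, 3), round(sum(splits) / len(splits), 3), round(beta, 3))
```

A large gap between alpha and beta signals that at least one split separates the items into poorly related halves, which is how the discrepancies between the two coefficients are read in the cluster analyses.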

7.1 Principal component analysis

Principal component analysis (PCA) aims at grouping together multiple variables according to their variability and reducing the dimensionality of the data set. In the following subsections, principal component analysis was first applied to the items of each individual questionnaire, then to the factors describing the user experience in the UEQ questionnaire and finally to the factors of all three questionnaires taken together.

7.1.1 PCA of NASA-TLX items

The plot in Fig. 4 represents the relationships among NASA-TLX items in the space of the first two dimensions, which account for 75% of the total variability; variables correlated with the first two principal components are the most important in explaining the variability in the data set. The longer an arrow, the better the corresponding variable is represented in the factor map. Moreover, positively related variables, such as Effort and Frustration, are represented by very close arrows, while negatively related variables, such as Mental demand and Performance, are represented by arrows pointing in opposite directions. In the same chart, two concentration ellipses enclose the clusters detected in the data set.

Fig. 4 Variable correlation plot for NASA-TLX items

The bar plot in Fig. 5 depicts the squared cosine (cos2) of NASA-TLX variables on the first two dimensions: Performance and Mental demand, which have the highest values (close to 1), are best represented by the first two principal components; on the other hand, lower cos2 values suggest a weaker representation of the Effort and Frustration variables on the two components. Variables that are correlated with the first two principal components are the most influential for the variability in the data set.

Fig. 5 Quality of representation of NASA-TLX items

Table 3 shows that the first two and the first three components account for 75% and 93% of the total variability, respectively.

Table 3 Principal component analysis of NASA items
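The quantities discussed so far (explained variability, variable coordinates and cos2) can be reproduced with a short Python sketch. It runs a PCA on a correlation matrix using a plain Jacobi eigenvalue iteration, so that only the standard library is needed. The correlation matrix is hypothetical and does not reproduce the study data, and the function names are assumptions made for this example; any statistical package offers equivalent routines.

```python
import math


def jacobi_eigh(matrix, sweeps=100, tol=1e-20):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix."""
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # columns p and q of A
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p and q of A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def pca_from_correlation(corr):
    """Return sorted eigenvalues, explained shares, coordinates and cos2."""
    eigvals, eigvecs = jacobi_eigh(corr)
    order = sorted(range(len(eigvals)), key=lambda k: eigvals[k], reverse=True)
    lam = [eigvals[k] for k in order]
    explained = [value / sum(lam) for value in lam]
    # Coordinate of variable j on component m = its correlation with that component.
    coords = [[eigvecs[j][k] * math.sqrt(max(eigvals[k], 0.0)) for k in order]
              for j in range(len(corr))]
    cos2 = [[x * x for x in row] for row in coords]
    return lam, explained, coords, cos2


# Hypothetical correlation matrix for the four retained NASA-TLX items.
names = ["Mental demand", "Performance", "Effort", "Frustration"]
R = [[1.0, -0.6, 0.2, 0.1],
     [-0.6, 1.0, -0.1, -0.2],
     [0.2, -0.1, 1.0, 0.4],
     [0.1, -0.2, 0.4, 1.0]]
lam, explained, coords, cos2 = pca_from_correlation(R)
quality_2d = {name: row[0] + row[1] for name, row in zip(names, cos2)}
print([round(share, 3) for share in explained])
print({name: round(q, 3) for name, q in quality_2d.items()})
```

Summing the cos2 of a variable over all components always gives 1, so the value on the first two dimensions can be read directly as the share of that variable captured by the factor map.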

In principal component analysis, various types of rotation can be applied to the loading matrix to achieve the so-called simple structure, which should make the loadings easier to interpret. After an initial formulation by Thurstone (1947), based on five criteria, a simpler formulation proposed by Kline (2002) is now accepted, according to which “each component should have a few high loadings with the rest of the loadings being zero or close to zero”. In this study, varimax orthogonal rotation and promax rotation were considered: the former aims at maximizing the variance of the squared loadings of each component (Kaiser 1958), while the latter is an oblique enhancement of varimax aimed at meeting the “simple structure” to a greater degree (Hendrickson and White 1964). After an orthogonal rotation, component variances change, but components remain uncorrelated and variable communalities (which represent the part of the variance shared with other variables) are preserved.
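For the two-component case, the varimax rotation reduces to a single planar rotation whose angle has a closed form. The Python sketch below implements this raw varimax criterion (without Kaiser normalization) and checks numerically that communalities are left unchanged by the rotation. The loadings are hypothetical, the function names are assumptions made for this example, and promax is not covered; statistical packages normally add normalization and handle more than two components.

```python
import math


def varimax_criterion(loadings):
    """Sum over components of the variance of the squared loadings."""
    p = len(loadings)
    total = 0.0
    for k in range(len(loadings[0])):
        squared = [row[k] ** 2 for row in loadings]
        mean = sum(squared) / p
        total += sum((value - mean) ** 2 for value in squared) / p
    return total


def rotate(loadings, phi):
    """Orthogonal rotation of two-column loadings by the angle phi."""
    c, s = math.cos(phi), math.sin(phi)
    return [(x * c + y * s, -x * s + y * c) for x, y in loadings]


def varimax_2d(loadings):
    """Raw varimax rotation for exactly two components."""
    p = len(loadings)
    u = [x * x - y * y for x, y in loadings]
    v = [2 * x * y for x, y in loadings]
    a, b = sum(u), sum(v)
    c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    d = 2 * sum(ui * vi for ui, vi in zip(u, v))
    # Angle that maximizes the criterion for this pair of components.
    phi = 0.25 * math.atan2(d - 2 * a * b / p, c - (a * a - b * b) / p)
    return rotate(loadings, phi)


# Hypothetical unrotated loadings of four variables on two components.
loadings = [(0.70, 0.45), (-0.65, -0.50), (0.40, -0.60), (0.35, -0.55)]
rotated = varimax_2d(loadings)

communalities_before = [x * x + y * y for x, y in loadings]
communalities_after = [x * x + y * y for x, y in rotated]
print([tuple(round(value, 3) for value in row) for row in rotated])
print(round(varimax_criterion(loadings), 4), round(varimax_criterion(rotated), 4))
```

Because the rotation is orthogonal, each row keeps its squared length, which is why the communalities printed before and after the rotation coincide.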

The loadings in Table 4 were obtained by performing varimax and promax rotations on the first two components. The sign of a loading indicates whether a variable and a principal component are positively or negatively correlated. Only loadings higher than 0.4 in absolute value, highlighted in bold, were considered significant. They revealed that the first component can be expressed as a combination of Mental demand and Performance: the opposite sign of the loadings indicates that Performance decreases as Mental demand increases. The second component can be expressed as a combination of Effort and Frustration, which are both negatively correlated with it.

Table 4 Loadings obtained through varimax and promax rotations on the first two principal components for NASA items

The loadings in Table 5 were obtained by performing varimax and promax rotations on the first three components. The first component is still a combination of Mental demand and Performance, but now the second principal component coincides almost exclusively with Effort, while the third coincides with Frustration.

Table 5 Loadings obtained through varimax and promax rotations on the first three principal components for NASA items

7.1.2 PCA of SUS items

The plot in Fig. 6 represents the relationships among SUS items in the space of the first two dimensions: they account for 56.8% of the total variability and thus provide a weak representation, as confirmed by the short length of most of the arrows. SUS items 3 and 7 are represented by almost coinciding arrows: this would suggest a close relation between the perceived ease of use and the belief that people can learn to use the application very quickly, but the short length of both arrows reveals that the two items are not well represented in the space of the first two dimensions. SUS items 6 and 8 are also represented by close arrows, but in this case the former has a shorter arrow than the latter: this suggests that item 6 is not effectively represented by the first two components, so it is not possible to hypothesize a correlation between the perception of inconsistencies and the idea that the system is cumbersome. The arrows of SUS items 2 and 10 are also quite close: since their length suggests that the two items are well represented, it is possible to assume a correlation between the perceived complexity and the need for extensive training.

Fig. 6 Variable correlation plot for SUS items

The bar plot in Fig. 7 depicts the squared cosine (cos2) of SUS items on the first two components considered in Fig. 6: SUS items 2 and 10 are best represented by the two components, while SUS items 5, 1, 7 and 3 are only weakly represented, as also suggested by the shorter arrows in the diagram of Fig. 6.

Fig. 7 Quality of representation of SUS items

Table 6 shows that the first four components account for 77% of the total variability.

Table 6 Principal component analysis of SUS items

The loadings in Table 7 were obtained by performing varimax and promax rotations on the first four components. The first component is a combination of SUS items 2 and 10, which represent the perceived complexity and the need for extensive training. The second component is a combination of SUS items 3 and 9, which represent the perceived ease of use and confidence with the application. The third component is a combination of SUS items 1 and 7, which represent the predisposition to use the application frequently and the thought that many people could learn to use it quickly. The fourth component depends only on SUS item 5, which indicates whether the components of the system are perceived as well integrated.

Table 7 Loadings obtained through varimax and promax rotations on the first four principal components for SUS items

7.1.3 PCA of UEQ factors

The plot in Fig. 8 represents the relationships among UEQ factors in the space of the first two dimensions (which account for 90% of the total variability): the direction of the arrows shows three pairs of variables (Efficiency-Perspicuity, Attractiveness-Stimulation and Dependability-Novelty) with very similar orientation along the first two principal components.

Fig. 8 Variable correlation plot for UEQ factors

The bar plot in Fig. 9 depicts the squared cosine (cos2) of UEQ factors on the first two components, showing that all six UEQ factors are fairly well represented by the two components: Stimulation and Novelty are the most influential factors for variability, since they are correlated with the first two principal components, whereas Perspicuity has the lowest influence, although the differences in the contributions of the variables are not very pronounced.

Fig. 9 Quality of representation of UEQ variables

Table 8 shows that the first three and the first four components account for 94% and 97% of the total variability, respectively. The loadings in Table 9 were obtained by considering only the first three components: they do not reveal a clear dominance on a single component for Attractiveness, which has low loadings on all three components, or for Stimulation, whose loadings have almost the same magnitude and opposite sign on PC1 and PC2.

Table 8 Principal component analysis of UEQ factors
Table 9 Loadings of the first three principal components for UEQ factors

On the contrary, varimax and promax rotations applied on the first four components produced the loadings in Table 10, where for each factor there is a clear dominance on a single principal component. The first component can be expressed as a combination of Attractiveness, Stimulation and Novelty. On the other hand, Efficiency, Perspicuity and Dependability correspond to the second, the third and the fourth principal component, respectively.

Table 10 Loadings of the first four principal components for UEQ factors

7.1.4 PCA of all the questionnaires’ factors

The plot in Fig. 10 represents the relationships among the questionnaires’ factors in the space of the first two dimensions (which account for 76.8% of the total variability): the opposite direction of the Frustration arrow indicates that Frustration is negatively correlated with many other variables, such as Perspicuity, Efficiency, Attractiveness and Stimulation. Two very close arrows pointing in the same direction suggest a positive correlation between Attractiveness and Stimulation.

Fig. 10 Variable correlation plot for questionnaires’ factors

The bar plot in Fig. 11 depicts the squared cosine (cos2) of all the questionnaires’ factors: the variables with the weakest representation are Effort and Performance; the former also had a low cos2 value in the bar plot in Fig. 5, which focuses on the NASA-TLX items, while the latter goes from the highest score in the chart in Fig. 5 to the lowest score in the bar plot in Fig. 11; the other variables have very high cos2 values, above 0.75, except for Efficiency, which however has a much higher value than Effort and Performance.

Fig. 11 Quality of representation of questionnaires’ variables

Table 11 shows that the first eight components account for 97% of the total variability.

Table 11 Principal component analysis of questionnaires’ factors

Varimax and promax rotations applied on the first eight components produced the loadings in Table 12. Attractiveness, Stimulation and Novelty, which formed the first principal component in Table 10, now form the seventh component: therefore, they lose their importance in this wider context, which includes more variables. Now the first component is positively correlated with Frustration and negatively correlated with Dependability. However, the dominance of Frustration on the first component is not very clear, especially according to the promax rotation, in which the loadings have opposite signs but not too different magnitudes on the first and last principal components. Mental demand, Performance, Effort, Efficiency and Learnability are represented individually by components 2, 3, 4, 5 and 8, respectively. The sixth component is positively correlated with Usability and Perspicuity.

Table 12 Loadings of the first eight principal components for questionnaires’ factors

7.2 Hierarchical cluster analysis

The hierarchical procedure implemented by ICLUST performs multiple clustering steps between similar items or clusters formed in a previous stage until a stop criterion based on alpha and beta coefficient is satisfied. An item is added to a cluster only if it can improve the cluster’s internal consistency and factorial homogeneity. In the following subsections, Cluster Analysis was first applied to the items of each individual questionnaire, then to the factors describing the user experience in the UEQ questionnaire and finally to the factors of all three questionnaires taken together. In particular, a more detailed analysis is conducted at the end to identify the relationships between Perspicuity, Frustration and the individual items of the SUS questionnaire that make up Usability and Learnability.

7.2.1 Cluster analysis of NASA-TLX items

The diagram in Fig. 12 represents the clusters among the items of the NASA-TLX questionnaire. Each oval connects two items forming a cluster and reports Cronbach’s alpha and Revelle’s beta reliability coefficients. Performance and Mental demand, which form cluster C1, are again negatively correlated, as suggested by PCA. Effort and Frustration, which form cluster C2, are positively correlated, although more weakly and with significantly lower reliability coefficients. At a higher level, clusters C1 and C2 form cluster C3.

Fig. 12 Cluster analysis of NASA-TLX items (Cluster fit = 0.72, Pattern fit = 0.98, RMSR = 0.09)

7.2.2 Cluster analysis of SUS items

The diagram in Fig. 13 represents a cluster analysis of the ten SUS items: a model based on three clusters has been adopted, since it maximizes cluster fit and pattern fit and minimizes the RMSR.

Items 4 and 10, which form the SUS Learnability factor, are not directly correlated, but they belong to the same macro-cluster, which also includes other items. In particular, there is a significant correlation between complexity (SUS2) and the need to learn a lot of things (SUS10), which form cluster C1. The perception of the system as cumbersome (SUS8) forms, together with cluster C1, a higher-level cluster C3. Similarly, the need for technical support (SUS4) and the perceived inconsistency (SUS6) join at higher levels to form clusters C5 and C7, respectively. However, cluster C7 presents a high discrepancy between \(\alpha =0.88\) and \(\beta =0.69\), which makes the inference on the perceived inconsistency less reliable.

Fig. 13 Cluster analysis of SUS items (Cluster fit = 0.8, Pattern fit = 0.98, RMSR = 0.07)

7.2.3 Cluster analysis of UEQ items

The diagrams in Figs. 14 and 15 represent a cluster analysis of the 26 UEQ items, where a model based on six clusters has been adopted. In particular, there is a big macro-cluster, labelled as C19 in Fig. 14, that groups most of the items and presents the highest correlation coefficients: valuable/inferior and good/bad items, belonging to cluster C1, can be considered almost coinciding (due to correlation coefficients equal to 1), but most other coefficients are still greater than 0.9.

Fig. 14 Cluster analysis of UEQ items (Cluster fit = 0.96, Pattern fit = 0.99, RMSR = 0.05)

Other smaller clusters are represented in Fig. 15. Two items, namely impractical/practical and unpredictable/predictable, are completely isolated and do not form any clusters. The small isolated cluster C12 groups the easy to learn/difficult to learn and complicated/easy items: this is in agreement with the weak correlation between the Usability and Learnability SUS factors suggested by Borsci et al. (2009). The medium-sized cluster C20 is the root of a hierarchical tree grouping annoying/enjoyable, conservative/innovative, boring/exciting, not interesting/interesting and not understandable/understandable: in particular, innovative aspects make the application enjoyable; moreover, the more interesting and enjoyable the application, the easier it is to understand.

Fig. 15 Cluster analysis of UEQ items (Cluster fit = 0.96, Pattern fit = 0.99, RMSR = 0.05)

Table 13 compares the UEQ item groupings in the three macro-clusters with the six-factor groupings in the UEQ questionnaire. The biggest cluster C19 covers almost entirely Efficiency and Dependability, half of the Stimulation and Novelty items and only one Perspicuity item. However, the high discrepancy between \(\alpha =0.98\) and \(\beta =0.81\) in C19 makes the clustering between C17 (the subcluster that hierarchically groups the remaining items of C19) and the attractive/unattractive item (the most important for the Attractiveness factor) not very reliable.

Cluster C20 includes half of the Stimulation items and one item from each of the Attractiveness, Perspicuity and Novelty factors, even though the high discrepancy between \(\alpha =0.87\) and \(\beta =0.76\) in C20 makes the clustering between C16 and the not understandable/understandable item (belonging to the Perspicuity factor) not very reliable. The smallest cluster C12 covers only half of the Perspicuity items.

Table 13 UEQ items grouped by factors

7.2.4 Cluster analysis of UEQ factors

The diagram in Fig. 16 represents a cluster analysis of the six UEQ factors, which shows a three-level hierarchical clustering in the form of a tree rooted in cluster C5: at the lowest hierarchy level, the closest correlation is between Attractiveness and Stimulation, followed by the one between Dependability and Novelty and then by the one between Perspicuity and Efficiency.

Fig. 16 Cluster analysis of UEQ factors (Cluster fit = 0.99, Pattern fit = 1, RMSR = 0.04)

7.2.5 Cluster analysis of all the questionnaires’ factors

The cluster diagram in Fig. 17 puts together the factors from the NASA-TLX, SUS and UEQ questionnaires to check whether there is any influence between factors from different questionnaires.

Fig. 17 Cluster analysis of all the questionnaire factors (Cluster fit = 0.97, Pattern fit = 0.99, RMSR = 0.06)

Attractiveness and Stimulation, as well as Dependability and Novelty, are still highly correlated just as in the diagram involving only UEQ factors shown in Fig. 16. They form the same hierarchical structure that was already present in the previous diagram.

Efficiency is no longer linked to Perspicuity, as in the diagram in Fig. 16: while in Fig. 16 it was the Efficiency–Perspicuity cluster that was connected to the subtree formed by Attractiveness, Stimulation, Dependability and Novelty, in Fig. 17 only Efficiency is connected to it.

Perspicuity and Learnability, which refer to very close concepts, are directly grouped in a cluster showing a high correlation. Usability, which joins at a higher-level cluster, can be seen as a consequence of these factors.

Effort does not form any clusters with other factors: in Fig. 12 it was weakly correlated with Frustration, which now has a much stronger correlation with cluster C1.

The correlation between Usability and cluster C5 in the diagram of Fig. 17 reflects the weak connection between Learnability and Usability, which was previously highlighted in the literature (Borsci et al. 2009): it shows that the negative influence of Frustration contributes to determining Usability in addition to Learnability and Perspicuity, represented by cluster C1.

Mental demand and Performance are still correlated just as in the diagram involving only NASA items shown in Fig. 12.

7.2.6 Usability, learnability, perspicuity and frustration

In order to understand in more detail the linking points between Usability, Learnability, Perspicuity and Frustration, a cluster analysis was carried out on the SUS items, the UEQ items that make up the Perspicuity component and the NASA-TLX item that represents Frustration (Figs. 18 and 19). Frustration is closely related to the complexity of the application (SUS2). Moreover, both these aspects increase the difficulty in learning how to use the application, as highlighted by the correlation between cluster C2 and the easy to learn/difficult to learn item, belonging to the UEQ Perspicuity factor. In turn, these three items increase the feeling of having to learn a lot before being able to use the application effectively (SUS10), as highlighted by the correlation between cluster C4 and the SUS10 item. All these aspects influence the perceived clarity in the use of the application, as highlighted by the correlation between cluster C5 and the clear/confusing item of the UEQ Perspicuity factor. The high discrepancy between \(\alpha\) and \(\beta\) values for clusters C8 and C11 makes the hypothesis of a correlation with the SUS8 and SUS6 Usability items less reliable.

Fig. 18 Cluster analysis between Usability, Learnability, Perspicuity and Frustration items (Cluster fit = 0.91, Pattern fit = 0.99, RMSR = 0.06)

The other two items belonging to the UEQ Perspicuity factor represent the perceived level of complication and the level of comprehensibility: the former correlates with SUS4 item of Learnability, which concerns the need for support from an expert, while the latter correlates with SUS3 and SUS9 items of Usability, which concern the ease of use and the level of confidence acquired in using the application.

Fig. 19 Cluster analysis between Usability, Learnability, Perspicuity and Frustration items (Cluster fit = 0.91, Pattern fit = 0.99, RMSR = 0.06)

7.2.7 The influence of storytelling on mental workload

In order to assess in detail possible influences of storytelling on the workload, a cluster analysis was conducted between the Mental demand item of NASA-TLX and the items of the UEQ questionnaire. Among the factors describing the user experience, Dependability (i.e. the perception of controlling the interaction) is the only one with an item, concerning unpredictability, which forms a cluster with Mental demand and thus seems to have influence on it (Fig. 20). Mental demand, on the other hand, did not correlate with any of the items of the SUS questionnaire, nor with the items of the other UEQ factors (Attractiveness, Stimulation, Perspicuity, Efficiency, Novelty). This suggests that mental load is not affected by usability issues, but only by certain storytelling evolutions that are considered unpredictable by some users, probably caused at least in part by the exploratory and nonlinear nature of the AR-based experience (Shilkrot et al. 2014).

Fig. 20 Cluster analysis between the Mental demand item and the Dependability items (Cluster fit = 0.89, Pattern fit = 0.99, RMSR = 0.07)

8 Discussion

The connection between the worlds of usability and user experience has long been debated in the literature. In this sense, it is interesting to evaluate in the present study the connection between the SUS and UEQ questionnaires, which can be found mainly in the strong correlation between Perspicuity and Learnability, two very close concepts: the former, which refers to the ease of becoming familiar with the application, represents the more practical aspects of the latter, which refers to the ease of learning to use the application (in the sense of “understanding how to use it”).

Learnability can be further decomposed into two subcomponents, represented by the SUS4 and SUS10 items: the former refers to the practice needed under the guidance of an expert to master the application and use it correctly, while the latter concerns the amount of things to learn in order to use the application. These two subcomponents are the basis of a double connection between Learnability and Perspicuity: according to the diagrams in Figs. 18 and 19, users think that any difficulties in learning to use the application can be addressed with the help of an expert, whereas the many things to learn would make the system appear complicated.

A noteworthy aspect is the absence of correlation between Mental demand and Usability/Learnability, which suggests that the user’s mind is more focused on the content of the application than on its use. The absence of correlation of Effort with any other factor also seems to support this hypothesis.

The fact that Frustration (which has an average value of only 0.561) correlates more closely with Learnability suggests that any difficulties were perceived mainly in the first approach to the application, as also suggested by the average Learnability value (3.134), which is slightly lower than the average Usability value (3.479).

The main factors determining the Attractiveness of the application are the eye-catching design (the Novelty factor) and the user’s perception of being able to control the interaction (the Dependability factor), two aspects that appear to be interrelated. The latter aspect is not to be confused with Usability, which is unrelated to it, but should rather be understood as the way the storytelling is structured, that is, the way the content is articulated through the sequence of hot spots.

However, it should be emphasized that in the considered scenario the user is given a certain amount of freedom, as he or she can choose how long to stay at each hot spot to enjoy the associated AR content. This most likely contributes to a more pleasant experience and lightens the mental load. On the other hand, the dualism between the narrative order of the story and the nonlinear exploration of the real environment could produce cognitive disorientation that could become an additional mental burden.

Even though the storytelling of the application is linked to a precise path that the user has to follow inside the Basilica, no time constraints are imposed (which is why the NASA-TLX Temporal demand item was not included) nor are there any particular objectives to be reached during the route. This makes the application suitable for an informal learning scenario (Lin et al. 2012), where intrinsic motivations, such as personal curiosity, are predominant and study activities are carried out as hobbies.

It would be interesting to assess how the relationships between the various factors considered in this study would change in the presence of objectives to be achieved or quizzes to be answered during the experience.

The inclusion of scores and gamification elements in the storytelling, such as a treasure hunt or other puzzles along the way through the hot spots, could also have a significant impact on the interactions between the various factors: in particular, the desire to achieve certain game goals or scores could lead users to feel a greater impact of any usability problems on their mental load. In addition, the definition of time constraints within which to complete tasks could reveal relationships between Effort and other factors, as well as bring into play the Temporal demand item that was omitted in this study.

Moreover, the introduction of gamification elements (such as treasure hunts) could be a way to impose an order of visiting the various hot spots that is congruent with the linearity of the story. A constrained workflow could mitigate the above-mentioned cognitive misalignment, though the presence of quizzes and puzzles would require greater attention and concentration. More detailed tests will be conducted in future work to assess which of these two effects prevails.

9 Conclusions and future work

The present study analysed the relationships between usability, user experience and mental factors for a mobile application, based on AR technology, that provides the visitors of the Basilica of St. Catherine of Alexandria with new keys to reading and thematic routes that can complement the traditional guided tour. The thematic itinerary considered in this first experimental scenario is only one of the innumerable thematic approaches that we propose to undertake, since the Basilica with its frescoes is configured as a great book whose chapters still have much to tell to scholars and enthusiasts, a precious historical document capable of opening up to the most attentive and sensitive visitor. Each fresco can be seen as a story within history, a valuable testimony of a time gone with its customs, its creators and its enlightened patrons.

The analysis of the data collected through questionnaires after the trial showed no influence of usability or learnability on mental demand, which suggests that users were able to focus more on the content than on using the application or on learning how to use it. Moreover, usability seems to have no influence on the attractiveness of the application, which appears to be influenced more by novelty (which expresses how creative and eye-catching the design is considered) and the perception of controlling the interaction. Future work will examine whether the addition of quizzes, gamification elements and time constraints in the storytelling path will significantly change this trend.