Guidelines for evaluating wearables’ quality of experience in a mulsemedia context

Quality of Experience (QoE) is inextricably linked to the user experience of multimedia computing and, although QoE has been explored in relation to other types of multimedia devices, its applicability to wearables has thus far been largely ignored. Given the proliferation of wearable devices and their growing use to augment and complement the multimedia user experience, a set of QoE guidelines has become imperative. This study meets that need and puts forward a set of guidelines tailored specifically to wearables’ QoE. Accordingly, an extensive experimental investigation was undertaken to see how wearables impact users’ QoE in a multiple sensorial media (mulsemedia) context. The findings of this exploratory study show that the haptic vest (KOR-FX) enhanced user QoE to a certain extent. In terms of adoption, participants reported that they would generally incorporate the heart rate (HR) monitor wristband (Mio Go), rather than the haptic vest, into their daily lives. Other findings revealed that human factors play a part in users’ attitudes towards wearables, with age being the major influencing factor. Moreover, the participants’ HR varied throughout the experiments, suggesting an enhanced level of engagement whilst viewing the multimedia video clips. Furthermore, the results suggest that there is a potential future for wearables, if the QoE is a positive one and if the design of such devices is appealing as well as unobtrusive.


Introduction
Technology has had a significant impact across the world, especially on the way in which people communicate with one another. Many technological innovations have developed rapidly over the years, and multimedia is no exception. Whilst digital multimedia appeared over two decades ago, constant innovations in respect of communication infrastructure, access devices, as well as multimedia rendering and production have meant that multimedia technology has remained at the forefront of innovation. Given the importance of end users in the acceptance and adoption of technology, the term Quality of Experience (QoE) was introduced in the late 90s. QoE refers to the "degree of delight or annoyance of applications or services" [12]. Although there has been research on QoE, a gap exists with respect to 'wearables'. Wearables have become increasingly popular of late, growing progressively more affordable and offering a variety of options to the contemporary user. However, user experience is key to the adoption of modern technology, and adapting QoE to wearables is long overdue, especially as wearable devices branch out into multimedia consumption and multi-sensorial interaction.
Wearable technologies' most evident manifestation is through computerized gadgets that can be worn on or underneath garments. They encompass a plethora of devices, such as watches, fitness trackers, glasses, headsets, clothing and jewellery, and are used in many fields, e.g., gaming, military, healthcare, education, entertainment, and leisure [49]. When it comes to acceptance, however, users tend to be reluctant to adopt such devices due to privacy and security concerns [19,65]. The most critical element of technology adoption is getting users to change their habits, and precious few studies have discussed the acceptability of wearable devices. Researchers such as Spagnolli et al. [99] have pointed out that issues such as privacy concerns and comfort make users reluctant to use wearable devices in real contexts. Moreover, as mentioned by Buenaflor and Kim [14], concerns over social acceptance mean that not many users take to wearable computers; besides, human factors and technological considerations influence whether users accept technology. QoE research so far has not dealt with wearable devices, apart from a single study by Hupont et al. [41], as wearables have been mostly in the development phase. However, some are now commercially available and have progressively gained notable attention from users as well as markets.

Challenges
Wearable technologies are not always accepted because people's views and opinions are constantly changing. This is a challenge, but finding out how users feel when wearing wearables could help developers improve their designs or functionalities to meet users' needs. To this end, many factors are perceived as being influential in accepting wearables. For instance, Ariyatum et al. [6] highlighted that the physical appearance of a wearable plays a key role when it comes to acceptance. Moreover, the wearable device should fit the user's personality and lifestyle, and the device's usability, functionality and price are also crucial factors in its acceptance. Similarly, Bodine and Gemperle [9] claim that the acceptance of wearables is based on perceptions of comfort and functionality, and that these dimensions should be considered by developers early in the development phase. However, developers do not always involve users in the early development stage or test wearables in iterations, which ultimately causes problems with regular use and acceptance of the device [58].
Users' involvement is critical, as their experience confirms the success or failure of a product [44]. Accordingly, Stickel et al. [104] and Hassenzahl [38] have pointed out that user satisfaction is an important feature that determines whether the product has met a user's expectation. From this review of related work, it becomes clear that the potential of using wearable devices to enhance user QoE of viewing multimedia content has largely been ignored by the literature. In this context, a deeper understanding of user QoE is what inevitably would close the gap between designers and developers, helping them understand what users need and want from the product.

Quality of experience
Many definitions of 'quality' have been proposed in the literature. For instance, Parasuraman et al. [81] have said of quality that it is an indescribable and diverse concept, whilst Martens and Martens [60] have defined 'quality' as an individual's judgement or perception of an outcome that could be from either a product or service [12]. Alongside 'quality', the notion of 'experience' has become prominent in the ICT environment, and the two have distinct meanings. Experience is defined as an individual's interaction with a service or system and their perception of events that occur [12]. An event is defined in the literature as a place where something important happens that is organized by someone. This includes the location and the time the event will occur and involves observations [12].
The term QoE was introduced in various white papers [12,73,92], and different definitions that share a similar meaning have been proposed in the literature. The concept of QoE is based on understanding human behaviour/attitudes, as well as users' needs, perceptions and acceptance of products. The International Telecommunication Union (ITU) defines QoE as "the overall acceptability of an application or service, as perceived subjectively by the end-user" [46]. The ITU notes that QoE includes the complete end-to-end system effects (client, terminal, network, services infrastructure, etc.), and that overall acceptability may be influenced by user expectations and context. According to Kim and Choi [97] and Staelens et al. [103], this definition of QoE is user-centric and is particularly relevant for multimedia streaming services that are linked to quality of service (QoS), which include Internet Protocol Television (IPTV), Video on Demand (VoD), streaming media and broadband data services. Similarly, Li-yuan et al. [56] define QoE as follows: "The function of quality of experience (QoE) evaluation includes two aspects: to monitor the experience of user on-line, then to control and justify the service based on the QoE to ensure that the quality of service can highly meet the requirements of the user". This definition of QoE is also associated with the QoS concept, as it assesses how the end user perceives the value of the service.
In contrast, Laghari et al. [54] define QoE as a blueprint encapsulating experiences and human objective, subjective, hedonic and aesthetic needs, focusing upon a person and their interaction with technology. According to them, understanding human desires requires incorporating cognitive science, engineering science, social psychology, and economics. This definition differs from the one proposed by the ITU, which explicitly refers to QoE as a subjective measure, in that objective human factors are considered equally important. Zapater and Bressan [118] define QoE as "the characteristics of sensations, perceptions and views of people about a particular service or product; these characteristics can be good, fair or bad". Sensation and perception are areas of psychology, and this definition emphasizes these two characteristics as determinants of the user's QoE. In relation to this definition, Rodriguez et al. [88] have stated that other criteria such as human cognitive process, sensory processing and psychological approaches would complement the perceived quality of multimedia services. Although the term QoE has various definitions, which one applies depends on the context in which it is used. For our research we applied a recent (working) definition from the Qualinet paper: "QoE is the degree of delight or annoyance of the user of an application or service. It results from the fulfillment of his or her expectations with respect to the utility and / or enjoyment of the application or service in the light of the user's personality and current state" [12]. Here QoE is defined entirely from the user's perspective ("…the degree of delight or annoyance of the user…") and includes a hedonic component as well ("…utility and/or enjoyment…"). Furthermore, 'application' refers to "A software and/or hardware that enables usage and interaction by a user for a given purpose. Such purpose may include entertainment or information retrieval, or other" [12].
We used this definition of QoE because it is relevant to an exceptionally large array of application fields. It is also the most common and well-established definition used by researchers for different scenarios. This definition was deemed appropriate as we wanted to demonstrate the relevance of the QoE concept to wearables and its applicability in a mulsemedia context.

Motivation
Wearables have revolutionized the technological landscape and lifestyle of individuals. Whilst the popularity and growth of the wearables market has been undeniable, this is not to say that the sector is without its problems. Many issues and challenges have arisen, and the most frequently addressed are design, privacy, data security and cost [2,42,43,66,78,79,83,85,113]. Although researchers and experts have been increasingly discussing the problems associated with consumer acceptance of wearable devices and have identified to some extent the underlying influencing factors, there is a lack of studies centered on measuring users' QoE of wearable devices, notwithstanding the fact that both domains are of importance in the ICT sector. To this end, some recent studies [26,75-77,109] have started exploring the issue, with only a single study [17] looking at wearables in a multisensory and healthcare context. The main contribution of this work is to fill this existing gap by discovering user attitudes and perceptions aligned with the interactivity associated with wearables. To this end, we measure the QoE associated with wearable devices in a mulsemedia context by employing olfaction and haptic effects. We also delve into human factors to see whether they impact user QoE with mulsemedia in relation to wearables. In addition, a set of guidelines is formulated to evaluate user QoE of wearables. These guidelines will assist researchers and developers in examining QoE for existing and future innovations linked to wearable devices.
Accordingly, the structure of this paper is as follows: Section 2 reviews related work, after which Section 3 explains human factors. Section 4 details the experimental methodology employed in our study. Results are then analyzed and discussed in Sections 5, 6, 7, 8 and 9. Section 10 summarizes the results, whilst Section 11 introduces a set of guidelines in relation to evaluating QoE with wearables. Section 12 discusses the limitations of this study. Lastly, conclusions are drawn in Section 13.

Related work
With the recent rapid development in technologies underpinning smart and wearable devices, human senses beyond the audio-visual can now be included in digital applications. These new multisensory technologies are now more affordable and accessible, hence including other senses such as smell and touch is an increasingly realistic proposition which has the potential to enhance a user's QoE. Accordingly, there has been a proliferation of studies exploring user QoE of mulsemedia applications incorporating non-traditional media types such as haptics [47], gustatory [72], olfactory [30,31,67] or indeed, a combination thereof, such as haptic and olfactory [40].
QoE has been comprehensively investigated in, and is considered to be a very important aspect of, multiple sensorial media (mulsemedia) [115], with several potential application areas being identified. For example, Nakamoto et al. [71] applied olfaction in a gaming context, with results showing an increased QoE. In terms of multisensory interaction and design, many studies have demonstrated that applying this approach in practice brings many benefits. Accordingly, Hancock et al. [36] used a multisensory concept in their study and found that it improved the performance of visual searches and reduced the amount of mental workload. Covaci et al. [20] proposed a multisensorial educational game named Fragrance Channel and looked at how learning engagement, performance and QoE can be improved with olfactory stimulation. The findings highlighted that multisensory setups in educational games engage users and can improve performance as well as the learning process. Also in education, Zou et al. used mulsemedia in Technology Enhanced Learning (TEL) to improve the learning process and experience. The authors developed a testbed to play video content enhanced with olfaction, haptic and airflow effects. The results showed that most users are open to TEL, as it would improve their learning experience to a great extent [121]. Gustavo proposed a model named 'Multisensorial Electronic Books', combining enhanced e-books with mulsemedia to improve the readers' learning process and QoE. The author also developed a prototype that integrated olfactory, auditory and haptic effects. The prototype was a notable success and opened avenues for future research [35]. Also in the context of learning, e-books enriched with mulsemedia content have shown positive results in several studies [1,10,55,91]. It is fair to say that multisensory digital learning experiences that involve olfaction can enhance users' QoE.
Additionally, the benefit of olfactory media to enrich QoE has also been proven in several other studies [29,48,69,107,116,117]. All these studies strengthen the belief that multisensory integration in a digital context will enhance QoE when using interactive systems. Although QoE has been studied with mulsemedia this has been without looking at the cross sensorial interaction. Nonetheless, over the last decade, there has been incipient work which has started to explore crossmodal correspondences between olfactory and visual stimuli. We shall now turn our attention to these.

Crossmodal correspondences
Crossmodal correspondences have been addressed mainly in the field of cognitive science, where the phenomenon is defined as "a tendency for a sensory feature, or attribute, in one modality, either physically present or merely imagined, to be matched (or associated) with a sensory feature in another sensory modality" [82,101,102]. Experiencing a stimulus in one sensory modality is often associated with experiences in another sensory dimension (e.g., pitch in audition and brightness in vision). Crossmodal correspondences between audition and vision have long been explored and extensively documented [59]. However, researchers have shifted towards mapping olfaction and vision, an area that had not been studied before. Gilbert et al. [32] provided one of the first examples of olfactory-visual correspondences, showing that there are strong correlations between odours and colours: bergamot smell was associated with yellow, cinnamon with red, pine with green, lilial with blue, and so forth. Part of these matches are illustrated in Table 1. Other studies investigated various smells associated with colours [51,57,90,105]. Correspondingly, Kemp and Gilbert [51] found that strong smells were associated with darker colours. Other studies focused on shape-odour correspondences and found that the odours of pepper and lemon are significantly related to angular shapes, whereas the odours of raspberry and vanilla are linked with round shapes [37].
Crossmodal correspondences have been documented between several pairs of sensory modalities, such as vision and touch [98], audition and touch [114], flavours and sounds [21], and flavours and vision [27]. Even though the focus of this study is on wearables, we decided to incorporate different smells that were crossmodally matched with the six video clips to enhance user QoE and to explore the user experience of wearables in such a context. To this end, we designed an experiment to explore whether cross-modally mapped multisensorial effects (olfaction and auto-generated haptic) derived from visual features of videos enhance users' QoE (see Table 2). We hypothesize that considering crossmodal mappings whilst creating mulsemedia systems could lead to more immersive and effective experiences for users.

Human factors
Numerous studies in the multimedia field have shown that human factors such as age, gender and personal interests influence user QoE [95,119,120]. Scott et al. [96] investigated the influence of personality and cultural traits on the perception of multimedia quality. They reported that human factors play an important role in perceptual media quality as well as user enjoyment. Although these studies are in the context of multimedia applications, very little research has been done on human factors in perceptual mulsemedia quality, apart from a single study by Murray et al. [68]. They investigated how age and gender influence users' perception of the temporal boundaries within which they perceive olfactory data and video to be synchronized. Moreover, whilst there has been previous work on mulsemedia QoE, as discussed in Section 2, there is a paucity of research on the influence of human factors on wearables QoE, and this adds an extra dimension to our investigation.

Methodology
The experiments we designed aim to investigate the potential influence of crossmodal mulsemedia correspondences on user QoE with wearables. More specifically, we used six videos characterized by dominant visual features: colour (blue, yellow), brightness (low, high), and shape (round, angular). Participants viewed these videos enhanced with crossmodally matching smells while wearing a haptic vest and a heart rate (HR) monitor wristband. We chose to use the vibrotactile display because the literature has shown that participants exhibit an increased emotional response to media with haptic enhancement [86].

Sampling
We used convenience sampling, a non-probability strategy, to recruit participants because we had limited resources to reward people for participating in our experiments, which were quite lengthy (30-40 min). Convenience sampling is defined as "a type of nonprobability or non-random sampling where members of the target population that meet certain practical criteria, such as easy accessibility, geographical proximity, availability at a given time, or the willingness to participate are included for the purpose of the study" [24]. We chose convenience sampling because it is a quick and easy method of recruiting participants in a short space of time. We reached out via email and word of mouth to people who were available from both Brunel University, Department of Computer Science and University of West London, School of Computing and Engineering. The participants gave informed consent and could withdraw at any time without giving a reason; they were not compensated for taking part in the experiments. The data was anonymized and kept strictly confidential.

Participants
The sample size of our experiment is based on a study by Brunnström and Barkowsky [11]. These authors have emphasized that, when planning QoE experiments, the sample size depends upon the statistical significance testing to be used in the study. They have also highlighted that the sample size depends upon the test design an experimenter adopts, which could be either a within-subject or a between-subject design. In this experiment we decided to use a between-subject design to see if there is a statistically significant difference between two unrelated groups. Brunnström and Barkowsky [11] observed that, for a between-subject design, a significant difference at alpha = 0.05 can be found with a sample size of 24 subjects. Their study indicates that a small sample size can be used as long as it meets the significance level. We recruited 24 participants (14 males and 10 females) who were randomly allocated into two groups: an experimental group (EG) with 18 participants and a control group (CG) with 6 participants. The crossmodally matched smells and wearables of the two groups can be seen in Table 2. The participants were aged between 18 and 41+ years and came from various nationalities and educational backgrounds. The gender and age of the participants were roughly matched across the experiment. All participants spoke English and self-reported as being computer literate.
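The random allocation described above (24 participants split into an EG of 18 and a CG of 6) can be sketched as follows. This is only an illustrative sketch: the paper does not state how the randomization was performed, so the shuffle-and-split approach, the function name `allocate_groups`, the participant labels and the seed are all assumptions.

```python
import random


def allocate_groups(participant_ids, n_experimental, seed=None):
    """Randomly split participants into an experimental and a control group.

    Hypothetical helper: shuffles a copy of the IDs and cuts it at
    n_experimental. The seed only makes the split reproducible.
    """
    if not 0 <= n_experimental <= len(participant_ids):
        raise ValueError("n_experimental must be between 0 and the sample size")
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_experimental], shuffled[n_experimental:]


# Illustrative labels P01..P24; the study's real identifiers are not given.
participants = [f"P{i:02d}" for i in range(1, 25)]
eg, cg = allocate_groups(participants, n_experimental=18, seed=42)
print(len(eg), len(cg))  # 18 6
```

A fixed seed is used here only so that the allocation can be reproduced and audited; an unseeded generator would serve equally well for a one-off allocation.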

Wearable devices
Two distinct types of wearable devices were used in our experiments (see Fig. 1). The first was a KOR-FX gaming haptic vest. This device was chosen for this study because a user can get engaged with what they are seeing on the screen, enabling them to have an immersive experience. Also, the haptic vest connects to the audio coming from any media content such as movies or games [53].
Applying the KOR-FX device in the experiment would provide different perceptions from users, because the vest has sensors that are meant to immerse the user and enhance the sense of reality as well as giving a better experience overall. The second device used in our study was a wearable HR monitor band 'Mio Go' [33]. The Mio Go wearable band was chosen because it would help in monitoring the HR of a participant, especially seeing how fast or slow the heart beats for each video clip in relation to the haptic vest's vibrations. Mio Go has received positive reviews online from people who have purchased this product and use it regularly [39].

Other devices
Exhalia SBi4: the Exhalia device diffuses scents from cartridges by means of its four small built-in fans. The cartridges contain scented polymer through which air is blown. The SBi4 can store up to four interchangeable scent cartridges at a time, but we used a single slot in our experiments to prevent the mixing of scents [25].

Video clips
The video clips were associated with six scents: bergamot, lilial, clear lavender (low intensity), lavender (high intensity), lemon and raspberry. The accompanying olfactory content was selected in line with the principles of olfactory-visual crossmodal correspondences previously discussed in the literature. The video with dominant blue images (V1) was watched with lilial odour, while the dominantly yellow one (V2) was watched with bergamot odour [32]. In V3, where low brightness was the dominant visual cue, low intensity lavender odour was delivered concurrently to the users, while in V4, where the brightness was high, high intensity lavender odour was employed [32]. Finally, V5, the video displaying angular shapes, was matched with lemon odour, whilst V6, where the dominant shape was round, was delivered with raspberry odour [37,100]. Participants watched 6 multimedia video clips, each of 120 s duration. The view area was 1000 × 700 pixels. The resolution of each video clip was 1366 × 768 pixels and the frame rate was 30 frames per second. The sound was taken from the original video content. The clips were chosen based on visual features: colour, shape, spatial relations, and texture. These clips were chosen because they are based on natural scenes and contain low level information that would offer a more interactive and engaging experience.

Experimental preamble
Our experiment focused on the cross-modal correspondence between olfactory and haptic effects, and their impact on user QoE. The experiment was carried out in a noiseless laboratory and took each participant approximately 40 min to complete. The Exhalia SBi4 device was placed 0.5 m in front of the participant, allowing him/her to detect the smell in 2.7-3.2 s [70]. The procedure and the tasks involved in the experiment were explained to all participants. Participants were seated behind a table, facing the 15.6-in. Lenovo Windows 10 laptop screen. Each participant was then provided with headphones (iShine), a haptic vest to wear (KOR-FX) and an HR monitor wristband (Mio Go), as shown in Fig. 1. Once participants confirmed that the haptic vest and HR monitor wristband were comfortable to wear and that they were satisfied with the whole setup, they continued to view the video clips. The experiment was approved by the Ethical Committee of Brunel University.

Experimental process
The experiment involved 6 video clips that were accompanied by olfactory and vibrotactile content. Videos were viewed in a random order so that order effects were minimized. Olfactory content was emitted using the Exhalia SBi4's four built-in fans blowing through cartridges that contain scented polymer balls. A program employing Exhalia's Java-based SDK was used to emit olfactory content throughout the duration of the video clips. Accordingly, scents were emitted for 10 s at 30 s intervals throughout the video clip (i.e., starting at 0 s, 30 s, 60 s, and 90 s). When the Exhalia SBi4 was not emitting scents, the scent's lingering effect ensured that it was still noticeable for the next 20 s, after which the SBi4's fans were switched back on to emit for the next 10 s. Alongside odours, vibrotactile effects were provided throughout the whole duration of the clips, vibrating according to the associated audio soundtrack. After each video clip, participants were asked to complete a subjective questionnaire with a set of 7 questions in relation to QoE, designed to capture users' views and their overall experience of this experiment (see Table 3). Each question was answered on a 5-point Likert scale, with positive questions anchored at one end with "strongly agree" and at the other end with "strongly disagree". These questions were developed based on the SUS, which is widely used amongst researchers and by a variety of industries [8]. Once the experiment was over, participants were further asked to complete paper questionnaires that featured SUS (see Tables 4 and 5) and UEQ based questions (

Analysis of self-reported QoE
We decided to investigate whether wearing a haptic vest with cross-modally mapped olfaction is more effective in enhancing user QoE. We tested the following hypothesis: users will have a positive experience whilst viewing multimedia with olfactory and haptic vest effects.
We used IBM SPSS software to run our statistical analysis. To check the effect that device type (haptic vest) has on QoE, we performed an independent samples t-test with group as the independent variable and the responses to the 7 self-reported QoE questions as the dependent variables. A significance level of 0.05 was adopted for the analysis, and the results are presented in Tables 7 and 8. Before analysing the data, we converted the scores of each negatively phrased question (Q3 and Q4) to the equivalent score associated with a positively phrased counterpart. As previously stated, participants self-reported their QoE by answering 7 Likert scale questions. The null hypothesis is that any differences in QoE between the CG and EG happen by chance. In Q1, there were no statistically significant differences between the EG and the CG for any of the 6 video clips. This means that the participants' responses do not differ significantly, as the haptic effects, which were automatically generated from the content's original sound, contributed to the enjoyment. For Q2 there were statistically significant differences between the EG and the CG for most of the video clips, the only exception being video clip 2. This suggests that participants in the EG noticed the relevance of the haptic effect for the respective videos (1, 3, 4, 5 and 6), whereas participants in the CG did not notice any effects. This is because no smell or haptic effects were present in this group. On a positive note, in Q3 there were no significant differences between the two groups (EG and CG), revealing that the haptic vest effects were not distracting but rather pleasant. The same can be said for Q4, which also showed no significant differences, as the haptic effects were generally not perceived as annoying. In Q5 there were statistically significant differences between the EG and the CG in video clip 3 (p = .007), video clip 5 (p = .002) and video clip 6 (p = .007).
This indicates that these clips made a positive impact, enhancing the sense of reality for some participants in the EG; however, the same does not apply to the CG, who scored a higher mean across these video clips. In respect of Q6, there were statistically significant differences between the EG and CG for video clips 1, 3, 5 and 6. In the EG the participants found that the haptic vest enhanced their viewing experience to a certain extent, whereas in the CG some participants had a negative view and disagreed, whilst others had a neutral response. Lastly, in Q7 there were no significant differences between the EG and CG, implying that, overall, the participants enjoyed the multisensorial experience. The results confirm the hypothesis, as the use of olfaction and haptic effects enhanced the users' QoE, as seen in the EG.
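The two analysis steps described in this section, recoding the negatively phrased items (Q3 and Q4) and comparing the two groups, can be sketched as follows. This is a minimal illustration rather than the SPSS procedure used in the study: it assumes responses coded 1 to 5, uses a pooled-variance (Student's) t statistic, and all function names and response values are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance

LIKERT_MAX = 5                  # 5-point Likert scale, coded 1..5 (assumed)
NEGATIVE_ITEMS = {"Q3", "Q4"}   # negatively phrased questions in the paper


def reverse_score(score):
    """Map a score on a 1..5 scale to its mirror image (1<->5, 2<->4)."""
    if not 1 <= score <= LIKERT_MAX:
        raise ValueError("score outside the Likert range")
    return LIKERT_MAX + 1 - score


def recode(responses):
    """Reverse-score only the negatively phrased items of one questionnaire."""
    return {q: reverse_score(s) if q in NEGATIVE_ITEMS else s
            for q, s in responses.items()}


def pooled_t(group_a, group_b):
    """Independent samples t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Each group needs at least two values.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(group_a)
              + (n_b - 1) * variance(group_b)) / df
    if pooled == 0:
        raise ValueError("zero variance in both groups; t is undefined")
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df


# Invented example responses to one question for one clip (not study data).
eg_scores = [1, 2, 2, 1, 3, 2]
cg_scores = [3, 4, 3, 4, 5, 3]
t_stat, dof = pooled_t(eg_scores, cg_scores)
print(f"t = {t_stat:.3f}, df = {dof}")
```

Turning the t statistic into a p-value additionally requires the t distribution, which a statistics package such as SPSS or SciPy provides; the sketch stops at the statistic to stay within the standard library.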

Analysis of post questionnaires
We compared users' feedback on the haptic vest and the HR monitor wristband after experiencing mulsemedia interaction in the presence and absence of olfactory and haptic effects. We tested the following hypothesis: users exposed to haptic effects and olfaction would incorporate wearable devices into their daily lives. Accordingly, an independent samples t-test was used to compare the responses obtained from the end-of-experiment paper questionnaires for the EG and the CG. The null hypothesis is that any differences in users' attitudes to wearable devices between the EG and CG happen by chance. The results are displayed in Table 9. In respect of Q1, there were no significant differences between the EG and the CG. This implies that both groups found the haptic vest comfortable to wear. In Q2, no significant differences were found between the two groups (EG and CG). This [...] disagreed (M = 4.50). That said, there were statistically significant differences between the EG and CG in Q7 (p = .029). These results suggest that participants in the EG (M = 2.72) would be more willing than those in the CG (M = 3.83) to wear the haptic vest in their leisure time. In terms of the HR monitor wristband, the results reveal that for Q1 there were no differences in responses across the two groups (EG and CG). This implies that the users were pleased to wear the Mio Go wristband, which was deemed very comfortable to wear. In the case of Q2 there were also no significant differences, as both groups agreed that the HR device is helpful in terms of the activities it offers. For Q3, Q4 and Q5 there were no significant differences, showing that participants in both the EG and the CG prefer wearing the HR device, as opposed to the haptic vest, in public, at work and in their leisure time. This could be because the HR device is discreet, small and can be concealed. Overall, this device received positive feedback from users and would be worn in the future.
Overall, the results confirm the hypothesis, as most users who were exposed to the haptic effects and olfaction (EG) would adopt both wearable devices into their daily lives; the same cannot be said for the CG, who were keener on adopting the HR monitor wristband than the haptic vest. This could be because no haptic vest effects were present in this group.
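The comparison between the EG and the CG described above rests on an independent samples t-test. The following is a minimal sketch of how the t statistic and its degrees of freedom could be computed, assuming Student's pooled-variance form; the function name `independent_t` and the response values are hypothetical and are not the study's data. The p-value would then be read from the t distribution with the returned degrees of freedom, for example with a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Student's independent-samples t statistic with pooled variance.

    Returns a tuple (t, degrees_of_freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two responses")
    df = n_a + n_b - 2
    # Pooled variance: each sample variance weighted by its own df.
    pooled = ((n_a - 1) * variance(group_a)
              + (n_b - 1) * variance(group_b)) / df
    std_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    if std_error == 0:
        raise ValueError("no variability in either group")
    return (mean(group_a) - mean(group_b)) / std_error, df


# Hypothetical Likert responses to one questionnaire item (not study data).
eg = [2, 3, 2, 3, 4, 2]
cg = [4, 4, 3, 5, 4, 3]
t_stat, df = independent_t(eg, cg)
print(f"t({df}) = {t_stat:.3f}")
```

A pooled-variance test assumes similar variances in both groups; where that assumption is doubtful, Welch's variant would be the usual alternative.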

Human factor results
In order to understand whether age, gender and education (the human factors studied in our work) influence a user's satisfaction and enjoyment of mulsemedia applications, we analysed the impact of each on the individual items of the self-reported QoE questionnaire. We tested the following hypothesis: age, gender and education will influence users' QoE with wearables in a mulsemedia context. To this end, we undertook an Analysis of Variance (ANOVA) with age, gender and education as independent variables and the user QoE responses as dependent variables. The results of this analysis are shown in Table 10 and the descriptive statistics are displayed in Table 11. The null hypothesis tested is that QoE differences between age, gender and education groups happen by chance. The ANOVA revealed a significant main effect of age (p = .000), gender (p = .020) and education (p = .002) for Q1. Apart from the 31-35 age group, most of the age groups enjoyed watching the videos whilst wearing the haptic vest,

Analysis of HR
As a physiological metric, we employed the Mio Go HR monitor wristband to carry out objective measurement. The HR of each participant was collected at a rate of one reading per second and measured in beats per minute (BPM). We collected 120 HR readings for the 6 video clips. The HR readings for both groups (EG and CG) varied, with the means for each video illustrated in Fig. 2. In order to understand whether there are any differences in HR between the two groups, we tested the following hypothesis: users' HR will not be the same in the EG and the CG. We undertook an independent samples t-test to test our hypothesis, the results of which are shown in Table 12. The results demonstrate a statistically significant difference in HR between the two groups for all the videos. We observed a tendency towards a higher HR in the EG for the whole duration of the videos, and we can reject the null hypothesis that the difference in HR between the two groups happens by chance. This indicates that the two groups experienced a different mood in the two setups: (i) the one using crossmodally matching smell, the haptic vest and the HR monitor wristband (EG), and (ii) the one where only the HR monitor wristband was provided, with no smell or haptic effects (CG). The most significant differences in HR appear for video clips 1 and 6. This suggests that these video clips considerably changed users' mood, especially in the EG, which recorded a higher HR than the CG. This could be due to the content or to the two setups. The results show that HR differed between the two groups, which confirms our hypothesis.
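The per-video means mentioned above can be obtained by averaging the per-second BPM readings for each group and clip. The sketch below illustrates one way to do this; the `(group, clip, bpm)` record layout, the function name `mean_hr_per_clip` and the readings themselves are assumptions made for illustration and do not reflect the export format of the Mio Go application or the study's data.

```python
from statistics import mean


def mean_hr_per_clip(readings):
    """Average per-second BPM readings for each (group, clip) pair.

    `readings` is an iterable of (group, clip, bpm) tuples.
    """
    buckets = {}
    for group, clip, bpm in readings:
        buckets.setdefault((group, clip), []).append(bpm)
    return {key: mean(values) for key, values in buckets.items()}


# Hypothetical readings for video clip 1 (not study data).
sample = [
    ("EG", 1, 82), ("EG", 1, 86), ("EG", 1, 84),
    ("CG", 1, 74), ("CG", 1, 76), ("CG", 1, 75),
]
means = mean_hr_per_clip(sample)
for (group, clip), bpm in sorted(means.items()):
    print(f"{group} clip {clip}: {bpm:.1f} BPM")
```

The resulting per-clip means for each group are the kind of values that would then feed the between-group comparison.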

UEQ analysis
The short version of the UEQ was used in our study because we wanted to learn about users' attitudes towards the two wearable computing devices (haptic vest and HR monitor wristband) employed in our study. The short UEQ consists of 8 items (Table 6) measuring two scales, pragmatic and hedonic quality. At the end of the experiment, participants from both groups (EG and CG) judged the two wearables. We used the short UEQ data analysis tool in Microsoft Excel developed by Schrepp [93] to measure the reliability of the 8 items. The tool reports the mean, standard deviation, confidence intervals and Cronbach's alpha, which are detailed in the following sections.

Cronbach alpha
Cronbach's alpha is the most commonly used measure of reliability and has become widespread in the literature [22]. It provides a measure of the internal consistency of a test and generally ranges in value from 0 to 1. Internal consistency assesses the inter-correlations between items that should all measure the same construct [106]. Schrepp [93] designed the tool so that it rescales the data from the seven-point Likert scale to the range −3 to +3 and calculates the scale values for pragmatic and hedonic quality per person. Accordingly, −3 represents the most negative answer (horribly bad), 0 a neutral answer, and +3 the most positive answer (extremely good). Scale means between −0.8 and 0.8 signify a neutral evaluation of the corresponding scale, values below −0.8 represent a negative evaluation and values above 0.8 represent a positive evaluation [35].
The results in the EG show that the mean values are above the 0.8 threshold for both the pragmatic (0.94) and hedonic (1.01) quality of the haptic vest. The mean per item is positive throughout, although item 1 (M = 0.38) and item 8 (M = 0.72) had lower means. This suggests that some participants found the haptic vest quite obstructive to wear and did not regard the device as leading edge. Also, most of the participants' responses were average and their outlook on the haptic vest leaned slightly more towards the hedonic, implying that they found the wearable device fun and exciting to wear. However, two items (2 and 4) of the pragmatic quality stood out with highly positive means, indicating that participants deemed the haptic vest to be clear and easy to use. In the CG, the mean values were 1.00 for the pragmatic quality items and 1.12 for the hedonic quality items. The mean values per item were generally positive, and participants' responses were similar across the board for both pragmatic and hedonic qualities. However, item 7 of the hedonic quality scored a higher mean, as participants found the haptic vest inventive. These results can be seen in Fig. 3.
In terms of the HR monitor wristband, the results revealed mean values of 1.54 for pragmatic and 1.47 for hedonic quality in the EG. The means per item were more positive than those for the haptic vest, as shown in Fig. 4. Items 1, 2, 3, 6 and 8 have high positive means, which shows that participants found the device supportive, easy, efficient, interesting and leading edge. Overall, more items had a greater mean in respect of the pragmatic quality, as participants found the HR monitor wristband very useful to wear in terms of its functionalities. The mean scores for pragmatic and hedonic quality were very close, and participants found the device appealing. The mean values were also above the threshold in the CG for both pragmatic (1.33) and hedonic (1.62) quality. The mean values per item were all positive, and participants' impressions of the HR monitor wristband leaned more towards the hedonic quality items. Items 7 and 8 had a higher mean with the same value (M = 1.83), as participants found the wearable device inventive and leading edge. However, of the pragmatic quality items, item 1 had a very high mean (M = 2.00), suggesting that participants found the device rather supportive (see Fig. 4). Furthermore, the confidence intervals of our values are provided in Tables 13, 14, 15 and 16.
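The rescaling, the ±0.8 interpretation thresholds and Cronbach's alpha described in this section can be illustrated with a short sketch. It is not the Excel analysis tool itself: it assumes that items 1 to 4 form the pragmatic scale and items 5 to 8 the hedonic scale, that every item is scored from 1 (most negative) to 7 (most positive), and that alpha is computed with the standard formula k/(k−1) · (1 − Σ item variances / variance of the summed score). All function names and response values are hypothetical.

```python
from statistics import mean, variance

PRAGMATIC = (0, 1, 2, 3)  # zero-based positions of items 1-4 (assumed)
HEDONIC = (4, 5, 6, 7)    # zero-based positions of items 5-8 (assumed)


def rescale(answer):
    """Map an answer on the 1..7 scale onto the -3..+3 range."""
    if not 1 <= answer <= 7:
        raise ValueError("answers must lie between 1 and 7")
    return answer - 4


def scale_means(responses):
    """Return (pragmatic_mean, hedonic_mean) across all participants."""
    rescaled = [[rescale(a) for a in person] for person in responses]
    pragmatic = mean([mean([p[i] for i in PRAGMATIC]) for p in rescaled])
    hedonic = mean([mean([p[i] for i in HEDONIC]) for p in rescaled])
    return pragmatic, hedonic


def interpret(scale_mean):
    """Classify a scale mean using the -0.8 / +0.8 thresholds."""
    if scale_mean > 0.8:
        return "positive"
    if scale_mean < -0.8:
        return "negative"
    return "neutral"


def cronbach_alpha(responses, items):
    """Cronbach's alpha for the chosen item positions."""
    k = len(items)
    item_vars = [variance([p[i] for p in responses]) for i in items]
    total_var = variance([sum(p[i] for i in items) for p in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical answers of four participants to the 8 items (not study data).
responses = [
    [5, 6, 5, 6, 6, 5, 6, 5],
    [4, 6, 5, 7, 5, 6, 6, 4],
    [6, 7, 6, 7, 6, 6, 7, 5],
    [3, 5, 4, 5, 4, 5, 5, 4],
]
pragmatic, hedonic = scale_means(responses)
alpha_pragmatic = cronbach_alpha(responses, PRAGMATIC)
print(pragmatic, interpret(pragmatic))
print(hedonic, interpret(hedonic))
print(round(alpha_pragmatic, 3))
```

Because subtracting 4 from every answer shifts all items by the same constant, alpha is identical whether it is computed on the raw or on the rescaled answers.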

Discussion
From the self-reported questionnaire, the independent samples t-test showed a significant difference in user responses between the EG and the CG. Participants who wore the haptic vest with effects (EG) found the wearable device effective in its utility, as employing mulsemedia did enhance users' QoE. However, participants in the CG were not enthused when viewing the video clips, as they did not get much out of the haptic vest because its effects were not present in this group. As regards the end-of-experiment paper questionnaires, the responses to the SUS questions revealed that users in the EG were neutral towards the haptic vest, whereas participants in the CG had a rather negative attitude. On the other hand, both groups were satisfied with wearing the HR monitor wristband and would incorporate the device into their daily lives. These findings suggest that participants in both groups were keener to adopt the HR monitor wristband than the haptic vest into their daily lives, which could be because of the design: previous literature [4,5,13,64,80,84] has shown that design issues create a barrier to user adoption. Moreover, the HR monitor wristband is a type of device that people are well acquainted with, as HR monitor wristbands are very popular in the health and fitness market [62]. By contrast, haptic vests are less well known, which could be another reason why users may have been hesitant to adopt this device.
The literature so far has looked at human factors in multimedia, and many studies [95,119,120] have found that human factors influence user QoE. However, not much research has explored the impact of human factors in mulsemedia. Indeed, only one study [68] has examined human factors with mulsemedia, and not in respect of wearables. To the best of our knowledge, human factors have not been considered before for QoE with wearables in a mulsemedia context. We explored a subset of human factors, namely age, gender and education. The human factor results showed that all three demographics have an influence on users' QoE with wearable devices. However, age and education were the two factors that impacted users' responses the most. These findings are consistent with previous work [68,95,119,120], which found that age, gender and personal interests influence users' QoE. Our study has shown that human factors are important to consider when evaluating QoE with wearables, as one can gain substantial insights. Lastly, the UEQ part of the questionnaire revealed that the responses of both EG and CG participants leaned slightly more towards the hedonic quality for most of the items regarding the haptic vest. For the HR monitor wristband, the EG leaned more towards the pragmatic qualities, whereas the CG scored higher on the hedonic qualities. These results imply that participants' impressions of the wearables were mostly linked to the hedonic quality items. The participants found the wearable devices fun, original, interesting and engaging.

Fig. 3 Mean values for the haptic vest (blue bars are pragmatic items and yellow bars are hedonic items). Panels: mean value per item (EG) and mean value per item (CG)
Fig. 4 Mean values for the HR monitor wristband (blue bars are pragmatic items and yellow bars are hedonic items)
The results convey mixed views towards the two wearables employed in this study. However, the majority of participants preferred the HR monitor wristband to the haptic vest in terms of practicality. In summary, it appears from the results that users in the EG enjoyed wearing the haptic vest and that it enhanced their overall experience compared with the CG. This could be due to the use of olfaction as well as the content itself.

Guidelines
The guidelines presented for wearables in the literature mostly relate to design aspects, and few exist to assist developers and designers in creating accessible wearables. Of these, worthy of mention are those of Wentzel et al. [111] and Wentzel and Geest [110], who created a set of design guidelines for accessible wearables that cater to the needs of people with a disability. They evaluated the guidelines with developers, researchers and visually impaired people. Burak and Özcan [16] extracted generalisable design guidelines from their research on how to design wearables and movement-based gameplay for the tabletop role playing experience. From their results, they evaluated players' design implications related to game design and accordingly designed a new gaming system (WEARPG) that incorporates arm-worn devices and movement-based gameplay in the tabletop role playing experience. By testing their new system with users, they identified new design guidelines, which enabled them to improve the system before developing a prototype. Accordingly, they designed and implemented an arm-worn device and a tangible device. The use of wearables and movement-based gameplay increased players' sense of immersion. Much earlier, Gemperle et al. [28] examined dynamic wearability and proposed design guidelines, insisting that unobtrusive placement is an important consideration, as is keeping aesthetics in mind. Overall, though, not many guidelines have been presented in relation to wearables. Moreover, to the best of our knowledge, no study has put forward a set of guidelines for evaluating QoE with wearables in multimedia and mulsemedia contexts. This is all the more surprising given the importance of QoE to the user multimedia experience. The work presented in this paper is grounded in an experimental QoE study designed to understand users' attitudes and behaviour towards wearable devices.
The guidelines are meant to inform developers and researchers in evaluating user QoE for wearable devices. From the quantitative findings, we have derived a set of guidelines that offer insight and direction to developers when developing wearable devices that are suitable for use and capable of satisfying users' needs (see Table 17).

Guideline 1: Ensure the device can be affixed sturdily on the body
If the device is to be affixed to the body, it is important that it is secured properly so that the user feels comfortable. When attaching devices to a user's body, one must leave room for movement. Users should be able to move around freely without feeling uneasy; the fitting should therefore be neither too tight nor too loose, and confirmation from the user is crucial. In our study, we assisted the participants in putting on the haptic vest and strapped the HR monitor wristband on either their left or right wrist, based on their preference. This may have contributed to most participants reporting that they found the two wearable devices comfortable and enjoyed wearing them. From a design point of view, Rutter [89] has argued that the design of a wearable device is good when it fits perfectly with the user's body. In our studies, both wearables were fitted properly on the user's body to ensure comfort and to give users the best possible experience, since a loose or unsuitable fitting could lead to a negative user QoE.
The guidelines are as follows:
1. Ensure the device can be affixed sturdily on the body.
2. Facilitate adjustable seating considering height, armrest and backrest to ensure user comfort.
3. Ensure a user is positioned correctly when facing the computer screen.
4. Perform experiments in a quiet environment with minimal distraction.
5. Display cross-modally mapped multimedia content.
6. Incorporate human factors (age, gender and education).
7. Include insights of hedonic and pragmatic qualities of wearables.
8. Design subjective usability questionnaires aligned with the device type.
9. Utilise objective QoE measures (e.g. HR).
10. Stimulate unobtrusive/subtle wearable device and use.

Guideline 2: Facilitate adjustable seating considering height, armrest, and backrest to ensure user comfort
Adjusting the chair's height so that it is neither too high nor too low is important. We encourage correcting a user's sitting posture to prevent physical pain, especially if the user is to be seated for over 30 min. We recommend using a chair with adjustable seat height, armrest and backrest, as this will provide a comfortable and relaxing sitting position. As mentioned by Ayoub [7] and Allie and Kokat [3], the right chair height is one at which the user's feet are flat on the floor. It is vital to be mindful of good posture, as making sure users sit up straight can boost their self-confidence and mood [94]. In our study, we accordingly adjusted the chair's height, armrest and backrest until the users were satisfied. A study by Murray et al. [70] highlights that comfort is crucial when carrying out an experiment on olfaction-based mulsemedia QoE. The authors found that getting the chair height and the user's posture right leads to unbiased results.

Guideline 3: Ensure a user is positioned correctly when facing the computer screen
One must ensure that a user is positioned correctly when facing the computer screen, bearing in mind that the user should be neither too close to nor too far from the desk. In our study, we checked and adjusted the monitor accordingly, and kept a good distance between the user and the desktop computer. As suggested by Chandra et al. [18] and Woo et al. [112], the monitor should be positioned centrally in front of the user's eyes to avoid neck and shoulder pain. Also, the screen's height should match the level of the user's eyes; for instance, a short person cannot have the screen in the same position as a tall person. Moreover, the monitor should be an arm's length away when the user is sitting in their chair. To minimize eyestrain, a user must not be positioned too close to or too far from the screen. Whilst this guideline is well known for desktop-based computers, our study indicates that it also applies when users are looking at a screen whilst wearing wearables. Specifically, in our experiment users were exposed to multimedia video clips whilst wearing two wearable devices; it was therefore important that users were positioned correctly to ensure that their viewing experience was not affected.

Guideline 4: Perform experiments in a quiet environment with minimal distraction
Experiments should take place in a quiet room where there are no distractions that keep users from maintaining focus and productivity. We conducted our experiment in a noiseless and spacious room with white walls. As recommended by Murray et al. [70] and the ISO standard [45], when performing olfactory evaluations, the walls of the room should be matt off-white to minimize the effects of synesthesia.

Guideline 5: Display cross-modally mapped multimedia content
Multimedia videos should accentuate the core content to enhance users' QoE. Our videos were crossmodally matched to certain objects, colours and smells. However, some videos were better perceived than others. The results of our study showed that participants in both the EG and the CG enjoyed watching all 6 video clips in response to Q1, as shown in Table 7. This could be because olfaction was employed, or because the haptic effects attracted users' attention when viewing the video clips. Specifically, 4 of the video clips (blue, dark, angular and round) made a great impact on participants in the EG, who reported that the haptic vest enhanced the sense of reality as well as their viewing experience.

Guideline 6: Incorporate human factors (age, gender and education)
Given the impact that human factors have on QoE with wearable devices, it is essential to incorporate various dimensions of human factors (e.g. age, gender, education) into any QoE evaluation for wearables. As highlighted in the Qualinet paper, one category of factors influencing QoE is human factors, i.e. those that relate to the user [12]. As defined by Kohn et al. [52]: "Human factors examine the relationship between human beings and systems with which they interact". Based on our initial findings, human factors play an important role, with age being the predominant influencing factor. Education also had a significant impact on users' QoE. In the context of this guideline, Scott et al. [95,96] have previously explored the influence of human factors on the perception of multimedia quality, perceived video quality and enjoyment. They found that human factors such as personality and cultural traits play a key role and influence users' responses, especially in the way enjoyment and perceived quality are rated. Additionally, Zhu et al. [119] explored user factors in video QoE. They found that gender and cultural background have a significant impact on users' QoE, as females were more involved in the viewing experience of the videos than males. Cultural background was also shown to affect QoE ratings, as Asian participants rated their QoE much higher than Western participants. Although we only explored a subset of human factors in our work, it is perfectly plausible that other factors such as personality and culture should also be considered in evaluating QoE with wearable devices, and it is left to future research to confirm this hypothesis.
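One way to incorporate a human factor into a QoE evaluation is to test whether ratings differ across its levels. The sketch below computes the F statistic of a one-way ANOVA for a single factor, which is a simplification of the ANOVA with three factors reported in our human factor results; the function name `one_way_anova`, the age bands and the ratings are hypothetical and are not the study's data. The p-value would be read from the F distribution with the returned degrees of freedom.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of ratings per factor level.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and spare observations")
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the ratings around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variability")
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical Q1 ratings grouped by hypothetical age bands (not study data).
ratings_by_age = {
    "band A": [1, 2, 2],
    "band B": [2, 3, 2],
    "band C": [4, 5, 4],
}
f_stat, df_b, df_w = one_way_anova(list(ratings_by_age.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F indicates that the differences between the group means are large relative to the spread within groups, which is the pattern that would lead to rejecting the null hypothesis that group differences happen by chance.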

Guideline 7: Include insights of hedonic and pragmatic qualities of wearables
In our work, we explored considerations of hedonic and pragmatic qualities of wearables using a UEQ questionnaire designed by Schrepp et al. [93]. The UEQ questionnaire is based on self-reported measures and assesses the user's experience of a technical product with regard to hedonic and pragmatic product qualities. We incorporated the UEQ questionnaire in our second study. Participants interacted with two different wearable devices, and the responses from the EG to the haptic vest leaned more towards the hedonic qualities, as they found the device fun and exciting to wear. However, some pragmatic qualities also emerged from the responses, as the haptic vest was perceived to be clear and easy to use. In the CG, participants' responses were similar for hedonic and pragmatic quality items. However, item 7 received a high score, as participants found the haptic vest inventive.
In terms of the HR monitor wristband, both the hedonic and the pragmatic qualities of the device were well received by participants in both groups (EG and CG). However, the EG leaned slightly more towards the pragmatic quality items, whereas the CG leaned towards the hedonic quality items. In their work on the perceived qualities of smart wearables, Karahanoğlu and Erbuğ [50] found that hedonic qualities are as essential as pragmatic qualities. Merčun and Žumer [63] have commented that hedonic and pragmatic qualities combined lead to either positive or negative emotions and guide the acceptance of the product. In our work, we have found that these recommendations are also applicable when it comes to enhancing the QoE associated with the two wearable devices employed in our study.

Guideline 8: Design subjective usability questionnaires aligned with the device type
Subjective measures are usually collected by means of questionnaires; it is therefore important that the questions are designed carefully and are aligned with the device type. Questions should be kept clear and concise, so that they can be answered easily by the user. Using simple language is recommended, as it helps users understand the questions and the goals of the experiment. In our study, we aligned the questionnaires with the particular device type. Moreover, having clear, unambiguous questions is one of the principles of good questionnaire design [15].

Guideline 9: Utilise objective measures (such as HR)
Utilising objective measures when evaluating QoE with wearables is as important as using subjective measures. Wearable sensors that are worn in contact with a user's body measure physiological responses such as HR, blood pressure, body temperature and many more [23]. In our work, we used a Mio Go HR monitor wristband to carry out objective measurement. We connected the wristband to a smartphone via Bluetooth, so that a participant's continuous physiological data were collected and transferred to a mobile application. In our study, we wanted to find out whether there are any differences in HR between the two groups (EG and CG). The participants' HR varied, and the HR monitor wristband proved useful in providing insights that would otherwise have been hard to uncover. Accordingly, we learned that certain video clips increased user HR and had an impact on QoE. Moreover, the recommendation to employ an HR monitor is in line with previous research, as Vermeulen et al. [108] have stated that HR sensors are non-obtrusive in comparison with other physiological sensors, such as those measuring galvanic skin response (GSR). HR sensors are subtly embedded into devices such as fitness trackers or smartwatches that people are already wearing.

Guideline 10: Stimulate unobtrusive/subtle wearable device and use
Wearable devices should be unobtrusive or subtle, and their use in public environments should be encouraged. The responses to the end-of-experiment questionnaires (Table 4: Q5, Q6, Q7 and Table 5: Q3, Q4, Q5) regarding whether participants would consider wearing the two devices in public varied. From our study, it appears that most participants preferred wearing the HR monitor wristband and would incorporate it into their daily lives (work, public and leisure time). However, users' attitudes towards the haptic vest were generally neutral. This indicates that some users might wear the haptic vest in public. Users who were not keen may be reluctant to wear a haptic vest because of its design; moreover, the appearance of the haptic vest is not discreet, as opposed to that of the HR monitor wristband. The haptic vest does not necessarily need to be worn over the user's garments: it can be worn underneath a top or shirt so that it is hidden. Again, this recommendation is in line with previous research such as that of Rekimoto [87], who suggests that, in order for wearable devices to be adopted for everyday use, they should be as unobtrusive and natural as possible. Should this be the case, our work suggests that user QoE can be enhanced.

Limitations
There are several limitations to be addressed. Firstly, although this is the first study to investigate wearables' user QoE within the context of mulsemedia, we only used two wearable devices. Secondly, the sample size of 24 is fairly small, but was adequate for this research. Thirdly, whilst the multimedia content (6 video clips) that we employed was chosen for the specific experimental purposes of our study, it is not representative of general multimedia content; future work could explore wearables' QoE with more representative multimedia content, in which genres such as movies, sport, music and documentaries are also represented. Another limitation is that we did not explore mood in relation to HR, which is something to consider in future work. Lastly, we combined users' views of their experiences with the two wearable devices in our study and presented a set of guidelines. The guidelines will aid developers and researchers in evaluating user QoE for wearable devices. Both developers and researchers can apply the guidelines to evaluate existing wearable devices and can expand upon them accordingly. Whilst the guidelines are evidence based, they still need to be validated and generalized with researchers as well as developers who have previous experience in developing wearable devices. The draft guidelines also require validation from experts with extensive knowledge of the QoE concept. Validation would show that the guidelines are acceptable and can be used in many contexts. Without validation we cannot guarantee how sound our guidelines are, and that is a limitation of our study.

Conclusion
The main contribution of our work is that we have explored and discovered user attitudes and perceptions associated with wearables. Before we carried out this research, we found that there was not a single study in the literature that evaluated user QoE with wearables in a mulsemedia context. This gap in knowledge motivated the need to explore wearables with a view to capturing users' attitudes and behavior with such devices. Wearables have undoubtedly been trending in the consumer market, but user acceptance and users' level of enjoyment were unexplored areas. Accordingly, we set out to discover users' views and opinions in relation to two wearable devices (haptic vest and HR monitor wristband). Our work has shown that users' views towards the two wearables employed in our study were generally positive and that QoE was enhanced to a certain extent. The results showed that the wearable devices, as well as the integration of olfaction, made a considerably positive impact whilst users viewed the video clips. A significant difference was measured between the two groups, with (EG) and without (CG) the haptic effects and olfaction. The EG appeared to have liked the haptic effects, which heightened their experience compared with the CG. We believe that the reason for the difference in user QoE is the level of immersion, as the haptic vest vibration effects significantly affected users' level of enjoyment, as seen with the EG. User QoE was found to be significantly lower in the CG, as these participants did not feel any haptic effects and did not engage well with most of the video clips; some of their responses were either neutral or leaned more towards disagreement. The end-of-experiment questionnaires were based on the SUS and the UEQ. The responses to the SUS questions highlighted that users in both groups would employ the HR monitor wristband more than the haptic vest in their daily lives.
Regarding the UEQ, users in both the EG and the CG leaned more towards the hedonic qualities of the haptic vest. However, the responses from the EG for the HR monitor wristband leaned slightly more towards the pragmatic qualities, whereas the CG favoured the hedonic qualities of this device. From the results, we believe that any device that is perceived to enhance user QoE has a chance of being accepted by a user. Although the functionalities of wearables may be useful, the design of such devices nonetheless plays a key role. Accordingly, our results showed that the haptic vest did heighten users' level of enjoyment, but the responses as to whether participants would wear it in their daily lives were neutral.
We have also presented a set of guidelines emanating from the experimental study. The study was carried out to address the existing gap in knowledge and, on its basis, we have identified the attributes that enhance users' QoE and formed them into guidelines. Developers, researchers and designers may apply those guidelines that are applicable to the context of use of their studies, as not all of them will suit the user's requirements. Moreover, whilst some of the guidelines are also applicable to traditional desktop computing scenarios, our experiments have highlighted their pertinence to wearable computing QoE. Lastly, it is also important to remark that, although we have presented a set of empirically derived guidelines, they are yet to be validated and generalized.

Conflicts of interests/competing interests
The authors report no conflicts of interests or competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .