Towards an integrated framework to measure user engagement with interactive or physical products

Building great products or services is not easy; users want products and services that exceed their expectations and evolve with their needs; it is not just about building the right features. Knowing the user engagement (UE) with a physical or virtual product or service can give valuable information that could be used as feedback for the design, enhancing its chances of success. In the context of user-centered design, UE is the assessment of the user experience characterized by the study of the individual's cognitive, affective, and behavioral response to some stimulus, such as a product, a service, or a website. UE considers not only the users' requirements and wishes but also their perceptions and reactions during and after an interaction with a product, system, or service. Many studies looking to quantify the UE are available. Still, a framework that provides a generic view of the most commonly used methods and metrics to measure UE does not yet exist in the literature. Aiming to understand the UE better, in this research, we developed a conceptual framework summarizing the available metrics and techniques used across different contexts, including good practices of self-report methods and physiological approaches. We expect this study will allow future researchers, developers, and designers to consider the UE as one of the most prominent product/service success indicators and use this guideline to find the most appropriate method, technique, and metric for its measurement.


Introduction
User engagement (UE) is an attribute of the user experience characterized by assessing the user's cognitive, affective, and behavioral investment when interacting with a digital system [1]. The UE can be interpreted as the level of involvement and absorption into an activity [2]. It has been widely defined as the process involving users through interactive experiences to create and enhance user-product relationships [3]. In this context, the product can be a physical or virtual object, service, or experience online and offline.
While developing a product, choosing carefully the most suitable methods to know the user's requirements and needs becomes an advantage [4,5]. In this sense, User-centered design (UCD) is one of the most comprehensive methodologies and philosophies for designing interactive products [6]. It enables the creation of valuable and usable products by significantly involving users [6] in every stage of product development. Following the UCD perspective, the UE considers not only the users' requirements and wishes but also their perceptions and reactions during and after an interaction with a product.
UE is a multifaceted, complex phenomenon with several different potential measurement approaches [8]. However, it is an abstract construct that manifests differently across the whole spectrum of uses and applications; for this reason, one of the challenges is to define how engagement can be measured so that it can be used in design and evaluation. Studies have shown the necessity for a more context-specific approach when constructing engagement measurement scales and the desire to obtain a set of items that allows measuring it in different sectors and settings [7].
Moreover, due to the variability of its application in different contexts and scenarios, an exact definition of "user" within the UE terminology is unclear. Given the various definitions of "user" and "consumer," in this study, we use these terms as synonyms, considering the user as the final consumer of a product, service, or experience. For example, in education, the end-user is the student; in health services, the patient; in marketing, the buyer; and in video games, the player.
All methodological approaches have advantages and limitations regarding UE use in specific populations and settings, and the measures may capture interactions subjectively or objectively [8]. As McNeal et al. [9] realized, performing direct UE measurement and establishing associated metrics can be challenging. There are studies conducted in fields like education, health, videogames, marketing, arts, and social media, among others. Still, one of the major gaps in the UE literature is the lack of evidence about how it should be standardized and measured [136].
Many studies have aimed to quantify the UE; however, a framework that provides a global and generic view of the different scales and methods used to measure it does not yet exist in the literature. After performing a literature review analysis, we found that most methods and metrics are classified into two main focuses: self-report or physiological. In this paper, the term "physiological" describes the measurement done using a physical indicator and an associated instrument that depends on biological responses, for instance, measuring the students' engagement using electrodermal activity [2,9] or heart rate analysis [10], or EEG signals [11,12].
The methods and tools are widely varied; this opens new opportunities for evaluation and metric selection. Thus, with so many options for measuring UE, there is a need for an updated conceptual framework that summarizes the measurement methods and metrics and classifies them according to their parameters, tools/instruments, and context of use.
This study aims to provide a literature review regarding UE's most commonly used measurements and develop a conceptual framework of the current methods and metrics used for engagement measurement. Specifically, the study addresses two objectives: first, to summarize the leading sectors and measures found in the literature; second, to develop a conceptual framework that outlines the most commonly used methods and metrics, particularly physiological approaches, according to their application context. For this study, we use the definition proposed by Maxwell [138], who described the purpose of a conceptual framework as to "clarify, explain, and justify methods," later complemented by Burkholder et al. [139], who described it as a theory or literature review that informs and describes the development of a research question, data selection and collection, analysis, and presentation of findings.
We expect this framework will allow future researchers to start from an initial state of the art, where the methods, techniques, and, whenever possible, the measurement scale used to measure UE can be easily found. Thus, the scope of this study is to offer guidance and a general overview of the most commonly used methods, tools, and techniques. This paper is divided into the following sections. Section 2 sets out the methodology used for the review process. Section 3 reports the literature review analysis, exploring the most frequently used methods according to their parameters. Section 4 presents the results collected in the framework, and Sect. 5 presents the conclusions and discussion.

Method
Authors such as Schimanski et al. [13] and Dresch et al. [14] have made explicit the importance of the information flow in science research; therefore, for carrying out the literature review, we have followed the approach used by Motyl and Filippi [137]. Fig. 1 depicts the flowchart of the process used for our literature review.
For this research, the question posed is defined as follows: "What are the methods used to measure the user engagement?". Consequently, the search string used is as follows: (engagement) AND (measure OR measurement) AND (user OR consumer). The search was carried out in the Scopus DB and Google Scholar for the years 2000 to 2020. We have also considered proceedings documents and additional sources such as Microsoft Academic and Crossref. The following exclusion criteria were applied: duplicated records; documents not in English; documents written before 2000; and, for the other sources, such as Crossref, documents without a registered DOI.
The search protocol in Scopus DB and Google Scholar initially matched 157 documents from 2000 to 2009, 229 documents from 2010 to 2015, and 493 from 2016 to 2021. It resulted in a total of 879 for the last 20 years and more than 600 additional documents from other sources. Table 1 summarizes the number of documents for each year range in the first step of the flowchart.
After the initial match, we applied the exclusion criteria, obtaining 508 documents. Finally, after the abstract and full document analysis, the number of documents considered for this review was 380. The data from the last 20 years show an increase in the number of citations on this topic. The most influential articles had more than 300 citations in the last two decades. In particular, the work in [20], which created a questionnaire to measure engagement in video game playing, has impacted the current research on this subject. After the procedure illustrated in Fig. 1, we classified the documents according to the method used, obtaining a collection of articles for each type (Table 2). Based on the results obtained, we found that most studies applied self-report methods to measure the UE, while others used physiological methods. Over the studies analyzed, we found that the most common methods used to measure UE using physiological

Review of user engagement methods and techniques
The literature review analysis shows that the UE measurement methods can be classified into two main groups, self-report and physiological. Many examples of good practices are available in the literature; according to these, the self-report methods can be categorized by approach (quantitative or qualitative) and type (interview, questionnaire, or survey). In addition, we have noticed that UE dimensionality was considered in many studies to describe and compare the self-report methods. Regarding the physiological methods, we have found seven approaches, as reported in Table 2. Still, the literature review found no evidence of a classification compendium for measuring the UE. To fill this gap, in the following sections, we will explore the most used methods and techniques according to the literature review analysis; our interest is to give an overview of the current UE methods to help researchers and practitioners quickly recognize the available methodologies for analyzing the engagement with a product or service. Table 3 shows an overview of the methods and parameters that will be reviewed in the following sections. Later, in Sect. 4, the conceptual framework will contain the analysis results obtained during the reviews carried out for each type of method.

User engagement self-report methods
Self-report methods are assessments where users are asked to report their responses directly. Many standard measurements of attitudes, such as Likert scales and semantic differentials, are self-report. Similarly, constructs of interest to researchers, such as behavioral intentions, beliefs, and retrospective reports of behaviors, are often measured using this method [134,141]. The self-report methods used to analyze the UE can be classified according to type or approach. Regarding the type, the studies typically used questionnaires, interviews, and surveys. Authors such as Lalmas et al. [8], Fredricks and McColskey [16], and Henrie et al. [17] have made significant contributions to the compilation of self-report instruments.
The self-reports can also be categorized by the approach used (qualitative or quantitative); in this case, the classification is made according to the following criteria: quantitative instruments include numerical scales, and qualitative instruments do not contain numerical scales in their measurement.
From the literature review, considering the self-report type, the approach, and the context of the application (sector), we found numerous tools and methods to measure the UE. Many of these tools and techniques were created ad hoc for each study, so the chances of standardization are low. However, some instruments are more frequently used. For example, the User Engagement Scale (UES) [18] is one of the most used standard questionnaires since it aims to measure the quality of the user experience. The National Survey of Student Engagement (NSSE) measures student participation in Canada and the United States concerning learning and engagement, while the Student Engagement Questionnaire (SEQ) [19] aims to capture student feedback and perception in a learning environment. In the field of task demand, the National Aeronautics and Space Administration (NASA) uses the NASA-TLX instrument as a subjective workload assessment tool for operators, while the instrument in [20] identifies the user's psychological engagement when playing video games. Table 4 summarizes a collection of good practices using self-report methods for the UE analysis in different segments. Many UE studies have used self-report measurements almost exclusively; however, recent advances are making other types of evaluations more feasible. For example, self-reported outcomes can be contrasted with others that do not rely on respondents' reports, such as physiological approaches that measure respondents' behaviors, sometimes in a constrained or controlled environment [141].

User engagement dimensionalities
Many researchers have explored the theoretical foundations as a systematic conceptualization of the UE [35], defining different types of sub-constructs or types of engagement [17], known as User Engagement dimensionality (UED), another way to compare self-report assessments [16].
UED describes the magnitude that captures a person's idea as a goal-oriented being equipped with a set of processes that guide their behavior in a changing environment. In other words, it is the level of interaction between a user and a product in a specific context; meanwhile, the user develops a perception that may or may not generate an engagement. The dimensions definition depends on the product; moreover, it depends on the potentially engaged user perspective.
From the examination of the studies, it is possible to notice considerable differences in the dimensions that constitute the term engagement [36]. Two main dimensionality groups emerged, defined as uni-dimensionality and multi-dimensionality; these groups allow developing and estimating diverse theoretical and empirical models of how various characteristics of customer relations with a product impact the user's actions, behaviors, and intentions.
Uni-dimensionality is commonly expressed [35] as one of the following three engagement aspects: I) Emotional-Affective, II) Cognitive, or III) Behavioural. It retains the characteristics of simplicity and concept uniqueness, without extensive in-depth analysis or combination of dimensions, in contrast to multi-dimensionality, which focuses dynamically on the three aspects defined previously. There is no agreement on the elements that should be primarily considered in uni-dimensionality or multi-dimensionality, but both provide helpful insights concerning the UE concept's appropriateness [37]. Furthermore, despite the number of studies supporting the UE's importance, there are still sharp controversies about its definition and the number of dimensions it includes [36,37].
Furthermore, studies on UE must consider the characterization of the intended measure; in this regard, an important definition comes from Diamantopoulos and Winklhofer [140], who described the difference between reflective and formative measurements. Measurement development can focus on items composing a scale, perceived as reflective indicators of an underlying construct. Or, as an alternative measurement perspective, it could be based on formative indicators that involve creating an index of a weighted sum of variables rather than a scale.
The sub-constructs vary between authors and segments of application. However, we observed that the multidimensionality defined by Fredricks et al. [38] as Cognitive, Emotional, and Behavioural, and subsequently reviewed by Brodie et al. [35], Hollebeek [39], among others, is the most popular. Within the last five years, the research carried out by Dessart et al. [36] is particularly relevant for the topic; the study provides a classification of dimensional UE scales in the context of online brand communities. According to the literature review, it is possible to infer that the multidimensionality accomplishes better the principles that define the concept of UE because it includes a more in-depth analysis of user perception. Table 5 summarizes some examples of dimensional classifications for different segments based on and adapted from Brodie et al. [35], Hollebeek [39], and Dessart et al. [36].

Physiological methods
Body language is a more reliable and authentic form of transmitting information than verbal communication [67]. Without realizing it, the body continually sends information about intentions, feelings, and behaviors. Even without verbal expressions, the physiological factors speak for the body and can be very significant.
During the development process, to create a successful product with higher positioning and differentiation [68], it is crucial to choose the most appropriate method to understand the user requirements [4,5]. For these reasons, as anticipated in Table 3, we have classified the existing physiological methods used to measure the UE level with a product according to: the type of physiological measurement, measurement range period (time in which the measurement is performed), the procedure for data analysis (process used to measure the UE through the implemented method), engagement scale (used or identified scale to measure the UE), and the equations used to calculate or facilitate the UE understanding and classification.
The reviewed physiological methods are Skin conductance, Heart rate, Electroencephalography (EEG), Pupillometry, Posture analysis, Respiratory rate, and Facial expressions. These methods will be further described in the following sections.

Skin conductance
Electrodermal activity (EDA), also known in the literature as skin conductance response and galvanic skin response (GSR), is the continuous autonomic variation in the electrical properties and characteristics of the skin. It can reveal a person's physiological arousal, connected to a subject's attention and alertness, and is considered a suitable approach for measuring the UE during different tasks, stimuli, or situations [2,9].
GSR includes a slow variation or tonic component known as the skin conductance level (SCL) and the skin conductance response (SCR). The tonic component SCL is interrupted by increases in skin conductance due to a particular stimulus [9]. Studies have used a variety of sensors to measure the variation of the electrical properties and characteristics of the skin. For example, Morrison et al. [69] and Di Lascio et al. [2] used the Empatica E4 wristband galvanic sensor; this sensor allows recording the skin conductance variation, measured in μSiemens, over a specific time interval and reporting the user experience. The task could be split into three main periods to guarantee a good measurement:
• Relax period to obtain a baseline recording
• Task development period
• Recovery period
After filtering the data, it is possible to plot the variations (on the vertical axis, the skin conductance measured in μSiemens, and on the horizontal axis, the unit time). The three periods can then be identified in this order on the plot. It is critical to note that every user will have specific data, and every plot will have different variations according to the type of activity measured; consequently, it is difficult to generalize the level of engagement.
For that reason, and because the GSR signal is highly individual-dependent and can vary a lot from user to user [73,76], many studies [2,71–75] carried out normalization of the skin conductance data, comparing the users' skin conductance during the task development against their baseline recording. In order to counteract this dependency, the following normalization is typically used (Eq. 1):
$$ i_x = \frac{\mathrm{SCL}(i_x) - \mathrm{SCL}(\min)}{\mathrm{SCL}(\max) - \mathrm{SCL}(\min)} \qquad (1) $$
Equation 1-Normalization of the skin conductance data. Lykken and Venables [131].
• $i_x$ indicates the normalized value
• $\mathrm{SCL}(i_x)$ indicates the current value
• $\mathrm{SCL}(\min)$ indicates the minimum value
• $\mathrm{SCL}(\max)$ indicates the maximum value
After normalizing the entire signals (rather than just the tonic component), it is possible to generate a general comparable data graph, as shown in Fig. 3.
Finally, Di Lascio et al. [2] stated that the level of UE in skin conductance could be measured using the normalized graph by dividing the different levels of skin conductance into five groups:
• Very high engagement: between 0.8 and 1
• High engagement: between 0.6 and 0.8
• Normal engagement: between 0.4 and 0.6
• Low engagement: between 0.2 and 0.4
• Very low engagement: between 0 and 0.2
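As an illustration only, the range normalization and the five-level classification described above can be sketched in a few lines of Python. The function names are hypothetical, and two points are assumptions not stated in the text: the minimum and maximum are taken over the whole recording of one user, and a value lying exactly on a boundary (for example 0.8) is assigned to the higher level.

```python
from typing import List

def normalize_scl(samples: List[float]) -> List[float]:
    """Range-normalize one user's skin conductance samples (in microsiemens) to [0, 1].

    Assumption: min and max are taken over the whole recording of that user.
    """
    lo, hi = min(samples), max(samples)
    if hi == lo:
        raise ValueError("constant signal: range normalization is undefined")
    return [(s - lo) / (hi - lo) for s in samples]

# Lower bound of each band, from highest to lowest.
# Assumption: boundary values belong to the higher band.
LEVELS = [
    (0.8, "very high"),
    (0.6, "high"),
    (0.4, "normal"),
    (0.2, "low"),
    (0.0, "very low"),
]

def engagement_level(normalized_value: float) -> str:
    """Map a normalized skin conductance value to one of the five bands."""
    if not 0.0 <= normalized_value <= 1.0:
        raise ValueError("expected a value normalized to [0, 1]")
    for lower_bound, label in LEVELS:
        if normalized_value >= lower_bound:
            return label
    raise AssertionError("unreachable for values in [0, 1]")

if __name__ == "__main__":
    recording = [2.0, 2.4, 3.1, 4.0, 6.0, 5.2]  # made-up values for illustration
    for raw, norm in zip(recording, normalize_scl(recording)):
        print(f"{raw:.1f} uS -> {norm:.2f} -> {engagement_level(norm)}")
```

Because the normalization is applied per user, the resulting levels are comparable across users only in the relative sense discussed above.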

Heart rate
Heart rate (HR), also known as pulse, is the number of times a person's heart contracts per minute; this number varies from person to person according to age, weight, body size, heart conditions, medication, and performed activities [129].
The HR is highly susceptible to variations due to the emotions and reactions of the subject; for that reason, different authors have used this physiological characteristic as a measure of UE. Previous studies have identified and confirmed that the level of engagement with a task relates to the HR variation during the development of an activity, because a person's heartbeat reflects emotional levels and reactions [10,77–79,134]. Furthermore, the HR tends to increase when the user is experiencing strong emotions and tends to decrease in an immersive environment [80,81].
The method to identify the engagement of a user using the HR [77,81–83] can be generalized by separating the measurement into two stages (similar to GSR): the first to set the HR baseline and the second to measure the corresponding HR during the development of an activity, as can be observed in Fig. 2(b). After data filtering, the HR can be analyzed graphically in two ways. The first analyzes the HR using a graph where the vertical axis corresponds to the beats per minute (BPM) and the horizontal axis to the unit time; in this way, it is possible to identify the peaks of increase and decrease in the measured HR periods. The second calculates the HR variability (HRV) according to the alterations in the time interval between consecutive heartbeats in milliseconds. To measure the HR, the authors have used a wide range of trackers, and the selection depended on the type of study performed and the required accuracy.
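To make the HRV idea concrete, the following sketch computes the mean HR and two widely used time-domain HRV summaries from a list of intervals between consecutive heartbeats (in milliseconds). The choice of SDNN and RMSSD is an assumption made for illustration; the studies cited above do not necessarily use these two statistics, and the function names are hypothetical.

```python
import math
from typing import List

def mean_hr_bpm(rr_ms: List[float]) -> float:
    """Mean heart rate in beats per minute from inter-beat intervals in ms."""
    return 60000.0 / (sum(rr_ms) / len(rr_ms))

def sdnn(rr_ms: List[float]) -> float:
    """Sample standard deviation of the inter-beat intervals (ms)."""
    n = len(rr_ms)
    if n < 2:
        raise ValueError("need at least two intervals")
    mean = sum(rr_ms) / n
    return math.sqrt(sum((x - mean) ** 2 for x in rr_ms) / (n - 1))

def rmssd(rr_ms: List[float]) -> float:
    """Root mean square of successive differences between intervals (ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

if __name__ == "__main__":
    baseline_rr = [800.0, 810.0, 790.0, 800.0]  # made-up values for illustration
    print(mean_hr_bpm(baseline_rr), sdnn(baseline_rr), rmssd(baseline_rr))
```

The same functions would be applied separately to the baseline stage and to the activity stage so that the two can be compared.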
Authors like Darnell and Krieg [10] proposed a linear regression of the recorded data to establish the variations throughout the activity execution, concluding that there is an HR decrease in comparison to the initial one, except for the fortuitous stimuli that can be due to factors internal or external to the test. Richardson et al. [78] proposed a data normalization to remove any baseline difference in the user data and verify the variation more concisely.
Consequently, depending on the type of activity, the engagement can be represented either by an HR increase or decrease, resulting in high variations in the general population's results. For example, Rooney et al. [132] identified a drop in the HR, together with increased attention, during immersion in a short film. In contrast, during lectures, Darnell and Krieg [10] concluded that there are increasing and decreasing HR variations that can characterize attention changes. Due to this discrepancy, we can conclude that the UE level will depend on the variation of the HR with respect to the baseline identified in the first measurement step; any interpretation based on this indicator must consider that the results will vary from subject to subject and according to the application context. More reflective environments intended to engage the user will tend to have a stable HR variation. In contrast, active environments with many stimuli will have a larger HR variation with respect to the baseline.
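The two analysis steps mentioned in this section, removing the baseline difference and fitting a linear trend over the activity, can be sketched as follows. This is a minimal illustration, not the procedure of the cited authors: subtracting the mean baseline HR is one assumed way of removing baseline differences, and the function names are hypothetical.

```python
from typing import List

def baseline_corrected(task_bpm: List[float], baseline_bpm: List[float]) -> List[float]:
    """Express task HR as the change from the user's mean baseline HR (BPM).

    Assumption: baseline differences are removed by subtracting the baseline mean.
    """
    base = sum(baseline_bpm) / len(baseline_bpm)
    return [x - base for x in task_bpm]

def linear_trend(times_s: List[float], values: List[float]) -> float:
    """Ordinary least-squares slope of values over time (units per second)."""
    n = len(times_s)
    if n != len(values) or n < 2:
        raise ValueError("need two or more paired samples")
    mean_t = sum(times_s) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    if sxx == 0:
        raise ValueError("all time stamps are identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times_s, values))
    return sxy / sxx

if __name__ == "__main__":
    baseline = [70.0, 72.0, 74.0]        # made-up values for illustration
    task = [72.0, 71.0, 70.0, 69.0]
    times = [0.0, 60.0, 120.0, 180.0]
    delta = baseline_corrected(task, baseline)
    print(delta, linear_trend(times, delta))  # a negative slope means HR is falling
```

Whether a falling or rising trend indicates engagement depends on the activity, as discussed above, so the sign of the slope has to be interpreted per context.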

Electroencephalography
Electroencephalography (EEG) is a technique for recording and interpreting the brain's electrical activity. According to the user's state, the brain's nerve cells generate electrical impulses that fluctuate rhythmically in different patterns [128]. The electrical EEG signal is composed of different frequencies produced by neuronal electrical activity; these frequencies, known as bands, are classified as Delta (δ), Theta (θ), Alpha (α), Beta (β), and Gamma (γ). Each type of band reflects specific and different cognitive processing skills in particular brain areas [84]. Furthermore, studies have validated that EEG can provide metrics for determining task engagement and arousal [11]. As with GSR and HR, the EEG measurements regarding UE can be divided into two phases [11,84,85]; the first consists of baseline data to control the initial subject response. The second phase contains the electrical brain activity measurement while carrying out a task. For EEG measurements, a range of instruments is available on the market, such as the NeuroSky MindWave, OpenBCI 3D printed devices, traditional Ag/AgCl electrodes, Neuron spectrum 1, and one of the most used in recent years, the Emotiv EPOC [11,86].
For the location of the electrodes, the studies typically used the 10-20 electrode placement system, as did McMahan et al. [11], who used the Emotiv EPOC device, and Nuamah and Seong [87]. In this system, each electrode has one or two letters to identify the lobe or brain area as follows: Pre-frontal (FP), Frontal (F), Temporal (T), Parietal (P), Occipital (O), Central (C), and includes the Z (zero) reference (FPz, Fz, Cz, Oz); see Fig. 4 for reference.
Many EEG studies have identified different engagement indexes. Kang et al. [88] used the Neural Engagement Index (NEI), and Yamada [89] defined the frontal middle theta rhythm as a valuable index of attention. In the study carried out by Pope et al. [12] and confirmed later by Freeman et al. [85], the authors developed an EEG-based engagement index (Eq. 2), considered to be the most effective among different studies [11,84,86,90,91] because it reflects task engagement in a more precise and validated way. Furthermore, studies have shown that EEG-based indexes are reliably linked to various levels of decision-making tasks.
The study done by Pope et al. [12] identified the frequency bands as Theta (4-8 Hz), Alpha (8-13 Hz), and Beta (13-22 Hz). Theta waves describe a lower mental activity (resting state), while alpha waves are related to lower mental alertness and appear in the sleep-wake cycle. In contrast, Beta waves describe a higher mental activity, system activation, and alert state of the brain; in other words, Beta waves are related to an increase in brain activity during a task and allow the identification of attention changes due to external stimuli, indicating cognitive processes and changes in brain states while performing a task [91].
The EEG engagement index, in Eq. 2, identifies the UE level by comparing the initial user state with the stimulus responses; if it increases, a higher engagement is expected [11,12,84]. In this way, it is possible to identify the user engagement or disengagement with an activity, considering the baseline period versus the EEG data while performing a task.
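Since Eq. 2 is not reproduced in the text above, the sketch below assumes the form in which the index of Pope et al. is commonly cited, namely the ratio of beta power to the sum of alpha and theta power, $\beta / (\alpha + \theta)$. The band powers are taken as already computed inputs; how they are estimated from the raw EEG (for example, by spectral analysis over the 4-8, 8-13, and 13-22 Hz bands) is outside this illustration, and the function names are hypothetical.

```python
def engagement_index(theta: float, alpha: float, beta: float) -> float:
    """Engagement index as beta / (alpha + theta).

    Assumption: this is the commonly cited form of the Pope et al. index;
    inputs are band powers in the same (arbitrary) unit.
    """
    denominator = alpha + theta
    if denominator <= 0:
        raise ValueError("alpha + theta power must be positive")
    return beta / denominator

def is_more_engaged(baseline_index: float, task_index: float) -> bool:
    """Compare the task period against the baseline period: a higher index
    is read as higher engagement."""
    return task_index > baseline_index

if __name__ == "__main__":
    # Made-up band powers for illustration only.
    baseline = engagement_index(theta=4.0, alpha=6.0, beta=5.0)
    task = engagement_index(theta=3.0, alpha=5.0, beta=6.0)
    print(baseline, task, is_more_engaged(baseline, task))
```

Comparing the task index against each user's own baseline, rather than against a fixed threshold, follows the two-phase measurement described in this section.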

Pupillometry
The pupil diameter analysis has been recognized for many years as a psychophysiological arousal index [92,93]; pupil dilatation is related to a certain level of engagement during the performance of an activity [94] and a function of processing load or mental effort required to perform a cognitive task [95].
According to several authors [94–99], two periods can be established during the test to compare the data: one to define the baseline of the pupil diameter and the other for the measurement during the activity development. It is possible to plot the total variation of the pupil diameter or just the variation with respect to the baseline data, as depicted in Fig. 2(c). Also, it is essential to consider that the baseline pupil diameter gradually reduces when the user begins participating in a task [95,100].
The measurement time for each period depends on the study. For example, Jepma and Nieuwenhuis [99] found that the pupil data during the 0.5 s immediately preceding the task could not be included in the baseline period because the user presented an anticipatory increase in the pupil diameter, while Murphy et al. [101] used the mean pupil diameter over the 500 ms pre-stimulus as a baseline.
In recent years, authors have used different specialized instruments, such as the Tobii eye-tracker 2150, brain vision analyzer, and iView X MRI-SV eye-tracker, among others. To eliminate artifacts and blinks, authors [98–100] commonly used a linear interpolation algorithm.
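The two preprocessing steps mentioned above, linear interpolation over blinks and subtraction of a pre-stimulus baseline, can be illustrated with the following sketch. It assumes that blink or artifact samples have already been marked as None, that samples are evenly spaced, and that the baseline is the mean over a 500 ms pre-stimulus window; the function names and the way edge gaps are handled are illustrative choices, not taken from the cited studies.

```python
from typing import List, Optional

def interpolate_blinks(samples: List[Optional[float]]) -> List[float]:
    """Fill missing pupil samples (None) by linear interpolation.

    Assumption: gaps at the very start or end are filled with the nearest
    valid sample, since there is no second point to interpolate from.
    """
    valid = [i for i, v in enumerate(samples) if v is not None]
    if not valid:
        raise ValueError("no valid samples to interpolate from")
    out = list(samples)
    for i in range(valid[0]):
        out[i] = samples[valid[0]]
    for i in range(valid[-1] + 1, len(samples)):
        out[i] = samples[valid[-1]]
    for left, right in zip(valid, valid[1:]):
        gap = right - left
        for k in range(1, gap):
            out[left + k] = samples[left] + (samples[right] - samples[left]) * k / gap
    return out

def subtract_baseline(samples: List[float], stimulus_index: int,
                      sample_rate_hz: float, window_s: float = 0.5) -> List[float]:
    """Return post-stimulus samples relative to the mean of the pre-stimulus window."""
    n = int(round(window_s * sample_rate_hz))
    if n < 1 or stimulus_index < n:
        raise ValueError("not enough pre-stimulus samples for the baseline window")
    baseline = sum(samples[stimulus_index - n:stimulus_index]) / n
    return [s - baseline for s in samples[stimulus_index:]]

if __name__ == "__main__":
    raw = [3.0, None, 3.0, 3.2, None, 3.6]  # made-up diameters in mm, sampled at 4 Hz
    clean = interpolate_blinks(raw)
    print(subtract_baseline(clean, stimulus_index=2, sample_rate_hz=4.0))
```

The output is the task-evoked change in pupil diameter, which is the quantity compared against the baseline in the studies discussed here.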
The data analysis is correlated with the time-on-task and difficulty [93,95,96,98], showing that the pupil diameter is larger when a difficult task is presented, reaching a peak that indicates an engagement point. Kassem et al. [102] suggest that the presence of arousal stimuli induces a dilatation of the pupil, confirming a rise around the moment of the stimuli independent of the baseline [103].
Podder et al. [104] proposed an engagement behavior function (EB) according to a Graph-Based Visual Saliency (GBVS) model while presenting a video. This method has two steps: first, the model forms activation maps on individual feature channels; then, it normalizes them in a way that highlights conspicuity based on human visual perception and attention (Eq. 3) [104].
In the function, all the variables have the same weight to create a ranking between the subjects in the study, identifying the worst and best levels of performance, which can represent the concentration/engagement level. The variables' descriptions are the following:
It is possible to conclude that the level of engagement depends on the pupil diameter variation in relation to the baseline. Additionally, Gilzenrat et al. [98] verified that increases in baseline pupil diameter would be associated with decreases in task utility and disengagement from the task, whereas reduced baseline diameter (but increased task-evoked dilatations) would be related to task engagement.
The findings of Gilzenrat et al. [98] are considered a good base for analyzing the results. Furthermore, Eq. 3 is a valid approximation of the UE level between users of the same study group.

Posture
Posture is the most natural element to observe and interpret within nonverbal communication [105]. Posture is a primarily involuntary signal that can be involved in the communication process. Although it seems to go unnoticed, the brain receives and processes the information, often unconsciously, and responds automatically to a stimulus; for this reason, it is crucial to measure how the posture of a subject can indicate whether a user is engaged or not.
According to the literature, various systems for measuring posture have been developed, usually indicated as Body Pressure Measurement System (BPMS) and Sitting Posture Monitoring System (SPMS) [106,107]; both systems look to identify the position of the user according to a reference system.
The most popular method to measure engagement evaluates the subject's posture (while solving a task seated) using pressure sensors located in the chair. Mota and Picard [105] and D'Mello and Graesser [107] used two pressure sensor matrices located at the seat-pan and in the seat's backrest. Shirehjini et al. [108] decided to embed the sensors in the chair, obtaining a non-invasive system that allows the subject to adopt a normal seated posture, reproducing real-life conditions; Bibbo et al. [106] did the same, using a chair equipped with eight analog tactile pressure sensors. The difference between authors is the location where the sensors are placed in the chair; Table 6 contains some examples of sensor locations.
A graphic example of these locations (Fig. 5) is the one used by Bibbo et al. [106], a study that shows adequate results.
Regardless of the pressure sensor used, different authors [105][106][107][108] conclude that a high level of UE exists when there is increased pressure on the chair seat with minimal movement and when the user tends to lean forward on the chair to focus better on the task [106]. On the other hand, a low UE level (disengagement) is identified when there are rapid changes in the pressure on the chair's seat and when the users keep a significant distance between their face and the screen or activity [107,109].
D'Mello and Graesser [107] used the BPMS produced by Tekscan to examine the pressure map during an emotional stimulus (called frame), finding three significant pressure-related functions: the average pressure (μ), the average coverage (C), and the average coverage change in a pressure mat (a coverage). These functions are represented in Eq. 4. R is the number of rows in the pressure matrix, C is the number of columns, and P_ij is the pressure of a sensing element in row i and column j [107].
Eq. 4 Pressure-related functions
Equation 4 allows the BPMS to identify pressure features efficiently, which facilitates the measurements and the understanding of the results. In conclusion, regardless of the system used, the changes in the subject's posture measured with pressure sensors can indicate the level of UE when performing a task. Specifically, the user is engaged when they tend to lean forward, exerting higher pressure on the seat.
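Since the equation itself is not reproduced in this text, the following Python sketch only illustrates how pressure features of this kind could be computed from a pressure matrix of R rows and C columns. The average pressure follows directly from the definitions given above; the definition of coverage as the fraction of sensing elements with a pressure above a threshold, the definition of coverage change as the mean absolute difference between consecutive frames, and all function names are assumptions made for this example and may differ from those of D'Mello and Graesser [107].

```python
def mean_pressure(frame):
    """Average pressure over all R x C sensing elements of one frame."""
    rows = len(frame)
    cols = len(frame[0])
    return sum(sum(row) for row in frame) / (rows * cols)


def coverage(frame, threshold=0.0):
    """Fraction of sensing elements whose pressure exceeds the threshold (assumed definition)."""
    cells = [p for row in frame for p in row]
    return sum(1 for p in cells if p > threshold) / len(cells)


def mean_coverage_change(frames):
    """Mean absolute frame-to-frame change in coverage (assumed definition)."""
    cov = [coverage(f) for f in frames]
    if len(cov) < 2:
        return 0.0
    return sum(abs(b - a) for a, b in zip(cov, cov[1:])) / (len(cov) - 1)


# Two illustrative 2 x 3 frames in arbitrary pressure units.
frame_a = [[0, 2, 4], [0, 0, 6]]
frame_b = [[1, 2, 4], [0, 1, 4]]
```

Under these assumptions, a sustained high mean pressure with a small coverage change would correspond to the engaged pattern reported in the literature (higher pressure with minimal movement), while large coverage changes would correspond to the rapid pressure changes associated with disengagement.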

Respiratory rate
Respiration is a way to consciously influence our physical and emotional responses, reflect feelings, relate to the environment, and impact the body's general state. It is a factor that reflects emotional response [110]. Its variability is an indicator of mental stress and sustained attention to a particular task [111]; therefore, it is considered a physiological factor that can measure the level of the UE.
According to several authors [111][112][113][114], and following the same procedure used in other physiological methods, the users' respiratory data is collected in two stages: the first to set the baseline and the second during the development of the activity. In this way, it is possible to contrast the initial data with its variation (see Fig. 3d).
There are different types of instruments for measuring respiratory rate. For example, Webster and Colrain [113] used a mouth breathing mask. Wenger et al. [114] used a respiratory pneumotachograph to measure the respiratory volume. Gomez et al. [115] and Vlemincx et al. [111,116] used a respiratory inductive plethysmograph to measure the rate and volume of each breath; and Vlemincx and Luminet [117] used two respiratory belts around the chest and the abdomen, besides making use of a breathing face mask.
According to the research, respiratory data, unlike the other physiological methods for measuring the UE, has not been a leading research method [115]. Only a few authors have analyzed this physiological factor and its relations with UE when performing an activity.
Vlemincx et al. [116] proposed some main respiratory parameters (Eq. 5) to measure respiratory variability, breath by breath, including respiratory volume (V_i), respiration rate (RR), and minute ventilation (MV). Existing studies [110,111,114,115,116] demonstrate that random breathing variability increases during mental stress, difficulty, or excitement, and decreases with task-related attention. Also, the respiratory intensity during different emotional states varies according to the type and duration of the stimulus.
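Eq. 5 is not reproduced in this text, so the following Python sketch relies on the standard physiological definitions rather than on the exact formulation of Vlemincx et al. [116]: the breath-by-breath respiration rate is taken as 60 divided by the breath duration in seconds, and the minute ventilation as the breath volume multiplied by that rate. The coefficient of variation used as a variability index, the function names, and the sample values are assumptions made for this example.

```python
from statistics import mean, pstdev


def respiration_rate(breath_durations_s):
    """Breath-by-breath respiration rate in breaths per minute."""
    return [60.0 / d for d in breath_durations_s]


def minute_ventilation(volumes_l, breath_durations_s):
    """Breath-by-breath minute ventilation in litres per minute (volume x rate)."""
    rates = respiration_rate(breath_durations_s)
    return [v * rr for v, rr in zip(volumes_l, rates)]


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (assumed variability index)."""
    return pstdev(values) / mean(values)


# Illustrative breaths: durations in seconds and volumes in litres.
durations = [4.0, 5.0, 4.0, 5.0]
volumes = [0.5, 0.6, 0.5, 0.6]
rr = respiration_rate(durations)
mv = minute_ventilation(volumes, durations)
rr_variability = coefficient_of_variation(rr)
```

Computing the same index on the baseline stage and on the task stage would make it possible to check whether variability decreases during the task, which is the pattern the cited studies associate with task-related attention.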

Facial expressions
Facial expressions are the changes that occur in our faces as a response to inner emotional states. In this sense, the UE analysis using facial expressions (a type of nonverbal communication) aims to identify facial manifestations and their direct relationship with emotions. The facial expression analysis encodes and interprets the facial muscle movements to determine the internal emotional reactions. These reactions can be a rich source of social signals conveying the user's focus, attention, intention, motivation, and emotion [118].
Different studies have identified the importance of the time scale for recognizing facial expressions. Nezami et al. [119] used in their research three time-scale types: an entire video projection, a 10-s video clip, and single images. This time scale allowed them to categorize the facial expressions manually using labels and annotations to analyze and classify them according to a scale to rate the user status. This process is challenging due to inconsistencies between labelers/annotators, hence the need to automate this kind of process [120].
The literature review shows different computer-vision techniques to automate the analysis of UE with a product. Several methods/architectures facilitate facial expression recognition, such as the Boostbf, Support Vector Machine (SVM), and the CERT Toolbox [120]. One of the most used is the Convolutional Neural Network (CNN) [119,121,122], a deep learning algorithm trained to recognize and classify facial expressions according to a pre-established facial data set.
The engagement scale parameters using facial expressions vary according to the study. For example, Whitehill et al. [120] defined four levels: I) Not engaged at all, II) Nominally engaged, III) Engaged in a task, and IV) Very engaged; Nezami et al. [119] defined two: I) Engaged and II) Disengaged; and Ramya et al. [123] defined three: I) Like, II) Dislike, and III) Favourite. The main difference between studies that use CNNs is related to the dataset used to train the network. For example, Nonis et al. [121] used the Bosphorus public database, Nezami et al. [119] used the dataset from the Facial Expression Recognition Challenge 2013, and Zhang et al. [122] combined temporal information with spatial information by applying CNNs to recognize facial expressions.
The study by Nonis et al. [121] shows adequate results regarding the relationship between facial expression and engagement of a user with a product. The study used an Intel® RealSense™ SR300 camera and the MobileNetV2 network, trained the network using the Bosphorus public database, and then classified the engagement into three classes according to Russell's circumplex model of affect as follows: I) Deactivation: low engagement level with relaxed and calm expressions, II) Average: medium engagement level with happy, contented, serene, and elated expressions, and III) Activation: high engagement level with alert and excited expressions.
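The three-class scheme described above can be expressed as a simple lookup from a recognized expression to an engagement class. The following Python sketch covers only that final mapping step and assumes that a separate classifier (for example, a trained CNN) has already produced an expression label; the label strings, the function name, and the handling of unlisted expressions are assumptions made for this example and are not taken from Nonis et al. [121].

```python
# Engagement classes and the expressions associated with each one in the text.
ENGAGEMENT_CLASSES = {
    "Deactivation": {"relaxed", "calm"},
    "Average": {"happy", "contented", "serene", "elated"},
    "Activation": {"alert", "excited"},
}


def engagement_class(expression):
    """Return the engagement class for an expression label, or None if it is not listed."""
    label = expression.strip().lower()
    for cls, expressions in ENGAGEMENT_CLASSES.items():
        if label in expressions:
            return cls
    return None
```

For instance, `engagement_class("calm")` returns `"Deactivation"`, while a label outside the scheme returns `None` so that the caller can decide how to treat it.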
Olivetti et al. [125] also applied this classification while analyzing the user engagement during the interactions with a virtual environment using a Support Vector Machine (SVM) method to classify the engagement level according to the analysis of the facial expressions. Likewise, Violante et al. [126] used an SVM, and Russell's circumplex model of affect, generating three classes to define the inner users' requirements: I) Deactivation, II) Pleasure and III) Arousal.
It is possible to conclude that the analysis and recognition of facial expressions is an effective physiological method to identify the UE level with a product, mainly because of the adequate results found using CNNs trained with a facial dataset. The classification of the UE used by Nonis et al. [121] can be used as a reference to identify the user status.
Finally, Table 7 shows a general guideline of the physiological methods discussed in this section, including the parameters of range period measurement, metrics, the procedure for data analysis, engagement scale, and equations.

Results
User engagement within a User-Centered Design context has been validated as an approach that integrates design visions with appropriate responses to user feedback and needs. Moreover, fields like manufacturing, engineering, virtual reality, and any technical discipline must consider the user's feelings and reactions towards an expected output during the first design steps; in other words, a wrong decision during the design process would affect the manufacturing operations directly. The design of a product and its formal specification must be combined with behavioral and organizational theories to develop an understanding of the user needs, contexts, and possible solutions [15].
We have realized that no document in the literature provides a generic view of the different methods and metrics used to measure user engagement. Thus, we aim to offer insights and guidelines for any design discipline that intends to implement UCD methods and apply user engagement as feedback for its analysis.
The results of this conceptual framework, summarized in Table 8, came from three analyses: the first one concerned the self-report methods, the second related to the conceptual definition of the UE through its dimensionalities, and the last one focused on the physiological methods. The findings verify that the UE is defined and measured differently depending on the application context; yet, getting to know better the engagement level of a user with the product under analysis can give valuable information that could be used as feedback for the design process, enhancing the chances of success of a product. In many aspects, the focus and extent of the analysis performed will create a product that users wish to interact with [68]. For that reason, we have included examples of good practices in this framework.
Table excerpt (physiological methods)
Pupillometry, procedure for data analysis: Compare the user's pupil diameter or dilatation during the task development against their baseline recording.
Pupillometry, engagement scale: There is no consensus about a scale that indicates a level of UE using the pupil diameter analysis because the level of engagement depends on the variation of the pupil diameter for the baseline identified in the first measurement step, so the interpretation is made on this fact and varies from study to study. However, Gilzenrat et al. [98] verified that increases in baseline pupil diameter would be associated with decreased task utility and disengagement from the task. In contrast, reduced baseline diameter (but increases in task-evoked dilatations) would be related to task engagement.
Pupillometry, equation: It is possible to calculate an engagement behavior function (EB) [104]. It means that a higher EB function indicates a higher engagement ranking.
Posture, engagement scale: When the user tends to lean forward, exerting higher pressure on the seat, they are engaged.
Posture, equation: Where R refers to the number of rows in the pressure mat, C the number of columns, and P_ij is the pressure of a sensing element in row i and column j. Taken from D'Mello and Graesser [107].
Respiratory rate
This study identified several authors who have conducted significant studies on self-report methods and their characteristics. For future studies on this topic, we see opportunities for research related to identifying and standardizing a methodology based on UED sub-constructs to complement the development of self-report instruments.
Regarding physiological methods, we found different approaches in the literature for conducting the UE measurement; in the framework, we have included the most commonly used. From the review, we concluded that there are no standardized scales since the measurements are highly dependent on the conditions under which the studies are carried out. Consequently, standardized scales for some physiological methods, such as heart rate, pupillometry, posture, and respiratory rate, remain a research gap. A common scale could be beneficial, facilitating the measurement and guaranteeing a clear understanding of the results. In this sense, there is an opportunity for future studies interested in this topic that wish to focus on less-used physiological methods for measuring the UE.
Most existing studies on UE focus on determining the engagement value using a parameter explicitly created for their research. As a result, the lack of standardized methods makes it difficult to draw lessons learned and guidance from other studies. The available literature reviews analyzed a particular UE domain and referred to a specific application context. For example, in the study conducted by Pontes et al. [142], the authors made a systematic literature review of the instruments used to measure political engagement. The Handbook of Research on Customer Engagement [143] is a good source of information regarding exclusively marketing practices and organizational performance. Kulikowski [144] explored the literature on work engagement as a predictor of health. Furthermore, Bazzani et al. [145] presented an overview of EEG application in consumer neuroscience, briefly discussing the drivers of engagement.
Some evaluated studies concluded the need to adopt physiological approaches to assess engagement. Current studies on UE do not necessarily focus exclusively on one field; thus, considering this framework as a starting point for further research can give a global vision of the most commonly used methods in different fields and help the reader delimit their research area.
The best method to use will depend on many factors. So far, previous studies have made considerable progress on physiological techniques, analyzing the physical user responses to a stimulus and obtaining a reliable engagement value. These findings demonstrate that the inclusion of physiological characteristics provides information that allows the researcher to determine whether the findings from self-reports are trustworthy and representative; however, the sample size is much lower compared to traditional self-report methods, limiting the generalization of the results. On the other hand, self-report methods have been used in the UE context for longer, and it is usual to find validated instruments for specific fields. It is also essential to consider resource availability, budget, time, and experience. In the case of physiological methods, we found that the equipment needed is much more expensive than that of self-reports, and the researchers must follow a considerable training period to collect and evaluate reliable data. Fortunately, recording equipment is getting relatively cheaper and more user-friendly; that is the case, for example, of the new EEG headsets available on the market.
The findings from this framework suggest the need to apply a mixed UE measurement technique using self-report and, whenever possible, a physiological method to validate the results.
In addition, there is a need for an agreed conceptualization of UE measurement; future studies from the same contexts should try to present their findings as input for creating a standardized UE measure. The information in this framework could then be used as the basis to explore already validated methods and, based on them, develop reliable and standardized UE measurements. Through a review of the different methodologies, the framework developed in this paper provides a comprehensive summary of current evidence obtained through an objective and transparent approach that minimizes bias, without preference or favoritism. This study is a valuable help for other researchers and practitioners involved in UE and can be used as a guide to quickly choose an alternative from all those present in the literature.

Conclusions and discussion
Our research question in this study was whether a common scale/metric for measuring user engagement exists in the literature. To address this question, this conceptual framework aimed to identify a range of studies that met the criteria and reported engagement measurements, in order to draw conclusions about the methods used. A comprehensive search was conducted to identify as many studies as possible; it included the Scopus database and Google Scholar and did not exclude proceedings documents, so as to identify further prominent studies that may not have been fully published yet. We only included articles that reported a precise engagement measure.
The number of articles and citations regarding UE studies has been increasing in recent years. As stated before, many disciplinary fields are interested in engagement measurement. In our study, the sector that has carried out the most research on this topic is education, which represents 27% of the total documents analyzed; lately, interest has grown regarding student engagement during online lessons. The next field by number of studies (20%) is health services, with its interest in patient engagement during different treatments. It is followed by the engineering and manufacturing context (19%), which includes studies related to new technology developments, product design processes, and workers' engagement while developing a specific task. Marketing-related studies represent 15%, virtual reality applications 8%, social services 4%, and videogames 3%, while other services interested in user engagement and user perceptions, such as music, sports, or food, represent the remaining 4%.
To carry out the literature review, an initial search was done; then the representative data was selected and filtered according to the scope of the research to create clusters based on data classification. After the review, a total of 1479 documents were analyzed, screened, and selected to understand the methods, tools/instruments, and metrics to measure the UE. We found that many studies described the use of self-report methods, while others used physiological methods.
Based on the results, we classified the methods to measure the UE with a product into clusters. The preceding review is gathered within a framework that organizes the current methodologies as self-report and physiological methods. The self-report methods refer to measurement instruments that evaluate engagement from different perspectives. This category included the UE dimensionality (UED) as a way to compare the instruments [16], allowing a comparison between sub-constructs that define the interaction level between user and product. It was possible to identify a wide range of dimensionalities considered as the basis of a self-report instrument. However, according to the findings, the multi-dimensionality that better supports the UE analysis includes the emotional, behavioral, and cognitive sub-constructs.
Self-report methods are more prevalent, which opens new possibilities for introducing physiological methods that identify physical metrics in real time and establish whether a user is more or less engaged. Many of the studies used only self-report methods to measure UE, despite the reported limitations of self-report [139], and used engagement assessments specific to one context; therefore, generalizability is limited. However, in recent years, there has been an increase in the use of physiological methods.
We explored seven physiological methods, finding similarities in the measurement phases of the different approaches. These phases, usually defined as a baseline period, a task development period, and in some cases a resting task period, allow reference data to be compared against its variation. For some methods (heart rate, pupillometry, posture, respiratory rate), there is no numerical scale that classifies whether or not the user is engaged; however, there are insights that facilitate the understanding of the information regarding the UE according to the method used. For other methods (skin conductance, electroencephalography, facial expressions), indices and classification levels define the UE more clearly and concisely. Furthermore, considering these metrics as an additional source of information about the user requirements and desires, it is possible to conclude that the physiological methods still have a wide field to be explored concerning UE.
In this context, we consider user engagement a handy indicator in any research field, including interactive design and manufacturing. The interdisciplinarity aimed by UCD methods and the lessons learned from other areas will provide visions that will enhance the current design processes. For that reason, this study described different scenarios, including marketing, health, and education, that can be supportive case studies for future UE applications.
This conceptual framework has analyzed a large number of the publications available and shows examples of good practices from different fields and domains for both self-report and physiological methods. It additionally provides, whenever possible, the measurement scale and equation used to calculate the UE.
Finally, we recognize the challenge of selecting a method for measuring UE since there is a wide range of possibilities between the various existing methods. In this regard, the results of this study can allow future researchers, developers, or designers to consider the UE as one of the most prominent indicators of the product or service success and to use this general guideline as support to find the more appropriate method, technique, and metric for measuring the UE based on the requirements of their studies.
Author contributions All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by IACJ, JSGA, FM, SM and EV. All authors read and approved the final manuscript.
Funding Open access funding provided by Politecnico di Torino within the CRUI-CARE Agreement.

Conflict of interest The authors report no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.