Deriving QoE in systems: from fundamental relationships to a QoE-based Service-level Quality Index

With Quality of Experience (QoE) research having made significant advances over the years, service and network providers aim at a user-centric evaluation of the services provided in their systems. This raises the question of how to derive QoE in systems. In subjective user studies conducted to derive relationships between influence factors and QoE, user diversity leads to varying distributions of user rating scores for different test conditions. The resulting models are commonly exploited by providers to derive various QoE metrics in their system, such as the expected QoE or the percentage of users rating above a certain threshold. The question then becomes: how can (a) user rating distributions obtained from subjective studies and (b) system parameter distributions be combined so as to obtain the actual QoE distribution observed in the system? Moreover, how can various QoE metrics of interest in the system be derived? We prove fundamental relationships for the derivation of QoE in systems, thus providing an important link between the QoE community and the systems community. In our numerical examples, we focus mainly on QoE metrics. We furthermore provide a more generalized view on quantifying the quality of systems by defining a QoE-based Service-level Quality Index. This index exploits the fact that quality can be seen as a proxy measure for utility. Following the assumption that not all user sessions should be weighted equally, we aim to provide a generic framework that can be used to quantify the overall utility of a service delivered by a system.


Introduction
One of the main research challenges faced by the QoE community is deriving QoE models for various applications and services, whereby ratings collected from subjective user studies are used to model the relationship between tested influence factors and QoE. With it being well known that different users perceive both quality and value differently [1], user diversity will inherently impact the distribution of rating scores for a given test condition [2,3]. However, the majority of user studies to date still report only MOS (Mean Opinion Score) values and confidence intervals, and use these values to derive QoE models. When focusing on technical Quality of Service (QoS) influence factors, this leads to the common reporting of so-called QoS-to-MOS mapping functions.
Previous work has argued that, from a service/network provider perspective, there is a likely interest in additional metrics beyond MOS values, thus providing deeper insight into rating distributions and how various conditions are perceived by the user population [3][4][5] (as opposed to how conditions are perceived by an "average user"). As an example, the GoB metric gives the probability that, for a given condition, the user rating will be "good or better" [6] (e.g., on a 5-point Absolute Category Rating (ACR) scale, this corresponds to a rating of 4 or 5). In addition to a QoS-to-MOS mapping function, the results of a user study could thus be used to derive and report a QoS-to-GoB mapping function as well. Such a mapping function could subsequently be used by a service or network provider in the context of QoE management when aiming to maximize the percentage of "happy" users in the system [7]. To generalize, subjective studies are used to derive QoS-to-QoE mapping functions, where QoE in this context can refer to any QoE metric of interest (e.g., MOS, GoB).

Fundamental relationships for deriving QoE in systems
Moving from the domain of user studies to the systems domain, we consider service/network providers interested in deriving various QoE metrics in their system, given (a) the system performance and (b) QoE models available from user studies. To put it in a mathematical context, we observe certain system parameters, each of which is described by a random variable (RV). Assuming, for illustration purposes, a Web-based service, system performance may be quantified by the response time X experienced by the end user. As a result, we have a response time distribution in the system, meaning that different users experience different response times. On the other hand, going back to the results of subjective studies, we know that the user ratings for a certain test condition (response time) also follow a distribution. Hence, due to user diversity, the experienced QoE for a certain response time X = t is again a distribution, Q|t. The question arises as to what the observed QoE distribution Q in the system is, when X is a random variable describing the system's performance and Q|t is a random variable describing the users' QoE for X = t. Moreover, how can various QoE metrics in the system, such as the expected QoE and the expected GoB, be derived?
To this end, we highlight the following key contributions of the paper:
• We prove a fundamental relationship showing that the expected QoE in the system is equal to the expected MOS in the system, even though the actual QoE distribution in the system is not (necessarily) equal to the MOS distribution in the system. We note that the MOS distribution in the system is obtained by mapping the response times of the system to MOS values according to a given QoS-to-MOS mapping function.
• We show that deriving additional QoE metrics in the system requires corresponding mapping functions derived from user rating distributions in subjective studies. In particular, to derive the expected GoB in the system, a QoS-to-GoB mapping function is needed. If only a QoS-to-MOS mapping function is available, the expected GoB in the system cannot be derived.
• Going beyond our previous work [8], in which we derived fundamental relationships assuming that QoE depends on a single QoS parameter only, we now show that these relationships can be extended to an arbitrary number of parameters. Novel use cases demonstrate how to use these fundamental relationships. First, the dimensioning of a web server based on a target GoB ratio is discussed. Second, we consider HTTP video streaming QoE as an example of a multidimensional QoE relationship.
To stress the implications of these contributions, we again highlight the link between the QoE community, the systems community, and end users: if researchers conducting subjective user studies provide QoS-to-QoE mapping functions for the QoE metrics of interest, these are sufficient to derive the corresponding QoE metrics from a system's perspective. This holds for any system parameter distribution, as long as the corresponding values are captured in the reported QoE models.
The term 'service' has evolved over the years from simple transport of data to a model where access to an application may be delivered as a service. When deriving QoE-related metrics in a system, we consider the system as offering a single service (e.g., web browsing, video streaming, etc.). Multiple users using the service each experience their own QoE. Adopting a more generalized approach (portrayed in Fig. 1), we may consider a system as potentially offering multiple services. During a certain time period, multiple sessions corresponding to a given service may be active in the system. A session involves one or more users. For example, in a network (system) we could consider HTTP streaming (a service). Over the course of one hour, multiple users each watch their own video streams for a given duration (session).
For a given service, we distinguish between the following:
• the QoE of an individual user experiencing a session, and
• a measure indicating the overall service quality over a target time period (covering multiple sessions).
It is important to note that QoE is inherently linked to an end user experience. Thus, when measuring aggregate quality over multiple users, we are no longer referring to QoE as such, but rather to a QoE-based quality aggregate.

Not all user sessions are created equal
In our aim to calculate expected QoE in a system and prove fundamental relationships, in Sections "Fundamental relationship: QoE in the system for a single parameter" and "Extension of the fundamental relations to multiple parameters" we will consider a system as offering a single service, where all individual QoE values (inherently linked to individual users) are equally weighted.
In a realistic setting, however, where a system operator is aiming to utilize estimated QoE values of users in the system for purposes such as dimensioning, monitoring, or benchmarking, the relative importance of user sessions may need to be considered. We therefore go a step further and provide a generic framework for quantifying the utility of a given service from a QoE-based perspective, discussed in Section "A QoE-based approach to Service-Level Quality". We address cases when not all user sessions (and consequently corresponding user QoE values) should be weighted equally.
Continuing with the previously given example of HTTP video streaming, we consider the service delivered to multiple users in a network over a one-hour period. We utilize QoE models that map system performance to QoE for each individual user viewing session. One option is to calculate the QoE for each user session, and then consider the average value to be the expected QoE in the system for that service (as reported in Sections "Fundamental relationship: QoE in the system for a single parameter" and "Extension of the fundamental relations to multiple parameters"). However, in a realistic case, some users may watch short 1-3 minute video clips, while others may watch videos lasting 30 min-1 h. Such sessions widely differ both in terms of duration, as well as in terms of consumed system resources. Consequently, we argue that when calculating an aggregate quality index for the given time frame, it may be relevant for the service provider to consider some sessions to be "more important" than others, i.e., they do not contribute to the same extent to a measure of the overall service quality.
We therefore define a QoE-based Service Quality Index (SQI) as a measure indicating the overall utility of a service delivered by a system, derived as a weighted combination of quality values estimated per user session. Individual QoE values are weighted according to factors that are deemed relevant by a service provider, and may be related to session characteristics (e.g., session duration), resource consumption (costs), the number of users involved in a session, etc. We note here that we are working under the assumption that QoE is indeed a good proxy for the users' utility. While we think that this is a reasonable assumption for most services, it should be verified for individual services when necessary. For simplicity's sake, we also assume a linear relationship between QoE and utility. This might be simplistic, as shown by Kilkki [9]; however, a non-linear relationship between QoE and utility does not affect the core of our proposal but simply implies a slightly more complex weighting function. The fundamental relationship extended to the SQI is formulated such that the SQI value in a system is equal to the expected utility value.
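As a rough illustration of why session weighting matters, the sketch below compares an unweighted average of per-session QoE estimates with a duration-weighted one. It assumes, purely for illustration, that the index is a normalized weighted mean, that the weights are session durations, and that the three sessions are hypothetical; the formal SQI definition in Section "A QoE-based approach to Service-Level Quality" may differ. The function name `weighted_quality_index` is likewise hypothetical.

```python
def weighted_quality_index(qoe, weights):
    """Normalized weighted mean of per-session QoE estimates.

    Hypothetical SQI form, used here only for illustration.
    """
    if not qoe or len(qoe) != len(weights):
        raise ValueError("need one weight per session")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * q for w, q in zip(weights, qoe)) / total


# Hypothetical sessions: (estimated QoE on a 1..5 scale, duration in minutes)
sessions = [(4.5, 2.0), (4.2, 3.0), (2.1, 45.0)]
qoe = [q for q, _ in sessions]
durations = [d for _, d in sessions]

unweighted = weighted_quality_index(qoe, [1.0] * len(qoe))
by_duration = weighted_quality_index(qoe, durations)
print(f"unweighted: {unweighted:.3f}, duration-weighted: {by_duration:.3f}")
```

Here the single long session with poor QoE pulls the duration-weighted value (2.322) well below the unweighted mean (3.600), which is exactly the effect a provider may or may not want, depending on the chosen weights.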

Definition of terms
Several different QoE-related terms and concepts are used throughout the paper. For the sake of clarity, we first define relevant terms according to their usage in this paper. Please note that proper mathematical definitions are provided in the later sections.
The term Quality of Experience (QoE) refers to a complex multidimensional construct comprised of various perceptual features contributing to the quality of an individual's experience. A commonly cited definition of QoE is provided in the Qualinet Whitepaper [10] stating that "Quality of Experience (QoE) is the degree of delight or annoyance of the user of an application or service. It results from the fulfillment of his or her expectations with respect to the utility and / or enjoyment of the application or service in the light of the user's personality and current state." Moreover, in the context of communication services, it is stated that QoE is influenced by various underlying influence factors related to the service, content, device, application, and context of use.
The quantification of QoE for an individual user is obtained through individual QoE user ratings. While we are aware that an individual's rating may in fact be influenced by a wide range of underlying system, context, and human factors, we note that network and service providers commonly rely on simplified QoE models, developed to estimate QoE based on measurable QoS parameters in the system. We thus consider a user as quantifying their QoE for a given condition (in our case expressed in terms of QoS parameters) on an underlying rating scale, e.g., a 5-point Absolute Category Rating scale (1: bad, 2: poor, 3: fair, 4: good, 5: excellent). This quantification is necessary for the analysis of systems and for the derivation of fundamental relationships. For the sake of readability, we often simply say 'QoE of a user' instead of the more precise 'QoE rating of a user on the used rating scale'. We further note that the term 'QoE distribution' refers to the 'distribution of QoE user ratings on the used rating scale'. When using the term QoE metrics, we refer to an aggregation of individual QoE user ratings across multiple users for a particular condition. Examples of common QoE metrics are the Mean Opinion Score (MOS), which refers to the average rating over all users for a given test condition; the Good-or-Better (GoB) ratio, indicating the ratio of users rating good or better on the given rating scale; and the Poor-or-Worse (PoW) ratio, indicating the ratio of users rating poor or worse. A more detailed overview of QoE metrics and their definitions is provided in [3]. As stated previously, the majority of conducted user studies only report such aggregated metrics. Most commonly, studies report MOS values per test condition, thereby lacking insights into the distribution of QoE ratings for the condition. The reporting of such aggregate metrics hides the underlying diversity of actual user scores.
Consequently, when only MOS values are reported, the GoB ratio cannot be inferred: the MOS alone says nothing about the GoB. For example, a MOS of 3 is reached (a) if 100% of the users rate 3, or (b) if 50% of the users rate 1 and 50% rate 5. In case (a), the GoB and PoW ratios are both 0%, whereas in case (b) both are 50%. Although both cases result in the same MOS, a system provider may, for example, be more interested in the PoW ratio, which clearly distinguishes them.
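The following sketch makes the example concrete. It assumes that GoB counts ratings of 4 or 5 and PoW counts ratings of 1 or 2 on the 1 to 5 ACR scale, and computes all three metrics for the two rating populations (a) and (b); the helper `qoe_metrics` is hypothetical.

```python
def qoe_metrics(ratings, good=4, poor=2):
    """Return (MOS, GoB, PoW) for ratings on a 1..5 ACR scale."""
    n = len(ratings)
    mos = sum(ratings) / n
    gob = sum(r >= good for r in ratings) / n   # share rating good or better
    pow_ = sum(r <= poor for r in ratings) / n  # share rating poor or worse
    return mos, gob, pow_


case_a = [3] * 100            # (a) every user rates 3
case_b = [1] * 50 + [5] * 50  # (b) half rate 1, half rate 5

metrics = {name: qoe_metrics(r) for name, r in (("a", case_a), ("b", case_b))}
for name, (mos, gob, pow_) in metrics.items():
    print(f"case {name}: MOS={mos:.1f} GoB={gob:.0%} PoW={pow_:.0%}")
```

Both cases print a MOS of 3.0, while GoB and PoW jump from 0% in case (a) to 50% in case (b).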
In a system, users will experience different conditions. When referring to the expected QoE in the system, we are in fact referring to the expected value of the QoE user ratings in the system. Hence, we are using the mathematical definition of expected value, which should in this context not be confused with the notion of the users' expectations (in relation to the perceiving subject's frame of reference) as used in the Qualinet definition and addressed in a number of studies [11]. Figure 2 visualizes the different terms and provides the relationship between individual users in sessions and the expected QoE in the system.
Finally, in this article we introduce the Service Quality Index (SQI) in a system, which we define as being based on session utility functions (further details provided in Section "A QoE-based approach to Service-Level Quality"). Utility functions have generally been used to specify the relation between relative user satisfaction and consumption of a certain resource [12]. With the concept having been adopted from economics, utility functions offer a way to formalize the correlation between network performance and user perceived quality (QoE), by defining a formal mathematical vehicle for expressing a user's degree of satisfaction with respect to corresponding multi-criteria service performance [13,14]. In this paper, when considering the individual user perspective, we consider a QoE mapping function as relating user perceived quality to system conditions, i.e., QoS parameters (please note that when assuming a broader view of QoE, this relationship can of course depend on additional context, user factors, etc.). In the context of customer retention and avoiding churn, it is well known that QoE is one of the main drivers for service and network providers when deploying their services [15][16][17]. Further, considering the provider's perspective, ensuring a certain level of QoE for a given user session entails a certain cost, results in a certain revenue, etc. Thus, as previously mentioned, not all user sessions are necessarily considered equal in terms of their relevance and value to the provider. Consequently, focusing on the provider perspective, we refer to the utility of a given user session as the estimated QoE of the session, scaled using an assigned weight factor so as to reflect additional factors deemed relevant by the provider (e.g., cost in terms of resource consumption, profit). For example, a session with higher costs for the provider could result in a decreased session utility (obtained by scaling the QoE estimate with a lower weight factor). For a provider aiming to quantify the overall quality (or utility) of a service delivered in their system, we propose to integrate individual session utility values into a metric that we refer to as SQI.
[Fig. 2 Overview of system parameter distributions as observed in a real system, and user rating distributions in a subjective study. They are combined into a QoE distribution as observed in the system. Panels: "System and services" (utilization, request patterns, configuration, implementation, ...) leads to the system parameter distribution, where parameter X of the system (e.g., response time) is measured and is a random variable (RV); "User diversity" (experience of the individual user, preferences and expectations, actual context, ...) leads to the user rating distribution, where for fixed x, user ratings (RV) are measured in a subjective study (lab studies, field trials, crowdsourcing); both distributions are combined into the QoE distribution in the system.]
The remainder of the paper is structured as follows. Section "Fundamental relationship: QoE in the system for a single parameter" provides the one-dimensional fundamental relationship between the QoE in the system and the subjective user studies for arbitrary QoE metrics. The multidimensional case is considered in Section "Extension of the fundamental relations to multiple parameters". Section "A QoE-based approach to Service-Level Quality" then provides a generic framework for quantifying the SQI from a user-oriented perspective, building on the assumption that user QoE values in the system may need to be weighted differently, depending on the target use case. A final discussion and conclusions are given in Section "Discussion and conclusions".

Fundamental relationship: QoE in the system for a single parameter
This section revisits fundamental relationships to quantify QoE in the system for a single service, where the users in the system consume the service in a similar way. In this section, we assume that the QoE depends on a single system parameter only. As a consequence, the QoE of the users, Q, may be derived by applying a one-dimensional mapping function to the single system parameter, X. As a concrete example, we consider web QoE: users request certain web pages from a server, and the response time of the server is the relevant QoE parameter. We use this web QoE example to illustrate how to use the QoE in the system for dimensioning a web server in Section "Example: Web QoE dimensioning". Before that, we summarize the fundamental one-dimensional relationships for QoE in the system, as originally contributed in [8]. The restriction to a single parameter is lifted in Section "Extension of the fundamental relations to multiple parameters", where the relationships are extended to an arbitrary number of parameters.
Figure 2 provides an overall picture of deriving QoE in a system. In a system, its users will experience different performance measures, such as response times, throughput, etc. For the sake of readability, in the following we use the response time for web QoE as an example of a system parameter X. The system's performance depends on both its configuration and its implementation. However, since the system utilization varies as the offered load (requests) varies, the users will experience different response times, which can be represented by a continuous random variable X. The cumulative distribution function (CDF), H(x), and the probability density function (PDF), h(x), of the response time are
$$H(x) = P(X \le x), \qquad h(x) = \frac{\mathrm{d}H(x)}{\mathrm{d}x}. \qquad (1)$$
Two different users experiencing the same system condition (e.g., response time) x may rate the situation differently due to user diversity.
The rating scale may be either discrete, like the typical 5-point Absolute Category Rating (ACR) scale, or continuous. Thus, we obtain a (discrete or continuous) QoE user rating distribution that depends on x. This is represented by a random variable Q|x for the QoE user ratings, given that the system parameter is X = x, with the CDF Q(i|x) and PDF q(i|x):
$$Q(i|x) = P(Q \le i \mid X = x), \qquad q(i|x) = \frac{\partial Q(i|x)}{\partial i}. \qquad (2)$$

Fundamental one-dimensional QoE relationship
In the case of a discrete rating scale, q(i|x) is the probability mass function (PMF) indicating the probability P(Q = i|X = x) that the user rating is Q = i for the system parameter X = x. Now let us conduct the following Gedankenexperiment. All users in the system rate the QoE on the same 5-point rating scale after a session. This could be implemented by the service provider with a proper interface in the provided application, although this may be very annoying in practice. The service provider then observes a random variable Q of all QoE user ratings as well as a random variable X of the corresponding QoS measurements. Hence, the provider also obtains the conditional QoE Q|x, which is a random variable reflecting the user ratings in the system for a particular QoS value x. We then obtain the MOS mapping function f(x) as the average over all user ratings for x, which is the expected value of the random variable Q|x (for a discrete rating scale):
$$f(x) = \mathbb{E}[Q \mid X = x] = \sum_{i} i\, q(i|x).$$
In a similar way, the GoB mapping function g(x) = P(Q ≥ 4 | X = x) is obtained for the underlying 5-point rating scale, and analogously other QoE metrics of interest such as the PoW. The fundamental relationships show how the service provider can combine the QoS measurements and QoE mapping functions, and which statements can be derived about the expected QoE in the system or the ratio of users in the system with a QoE user rating of good or better. To be more precise, the expected QoE in the system means the expected value of the random variable Q of the QoE user ratings, i.e., E[Q]. These fundamental relationships are derived in the following. Q is the unconditional random variable for the QoE user ratings over all system performance conditions, with the CDF Q(i) and PDF q(i):
$$Q(i) = P(Q \le i) = \int_{-\infty}^{\infty} Q(i|x)\, h(x)\, \mathrm{d}x, \qquad q(i) = \int_{-\infty}^{\infty} q(i|x)\, h(x)\, \mathrm{d}x.$$
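The Gedankenexperiment can be mimicked with a small simulation. The sketch below assumes, for illustration only, the web QoE model used later in Section "Example: Web QoE dimensioning" (ratings on a 0 to 4 scale, Q|x binomial with p_x = e^{−βx}, β = 0.25 1/s) together with exponentially distributed response times at an arbitrarily chosen rate of 0.5 1/s. It estimates the MOS mapping f(x) and the GoB mapping g(x) (ratings of at least 3 on the shifted scale) by averaging simulated ratings within narrow bins of x, and compares them with the model values at the bin centres.

```python
import numpy as np

N_SCALE = 4   # ratings on 0..4 (shifted 5-point ACR scale)
BETA = 0.25   # IQX sensitivity parameter (1/s), assumed
RATE = 0.5    # assumed rate (1/s) of the exponential response times X

rng = np.random.default_rng(seed=1)

# Gedankenexperiment: every user reports a (response time, rating) pair
x = rng.exponential(scale=1.0 / RATE, size=200_000)
q = rng.binomial(N_SCALE, np.exp(-BETA * x))

# Estimate f(x) = E[Q|x] and g(x) = P(Q|x >= 3) within bins of width 0.5 s
edges = np.arange(0.0, 6.5, 0.5)
bin_of = np.digitize(x, edges) - 1
mids, mos_hat, gob_hat = [], [], []
for k in range(len(edges) - 1):
    ratings = q[bin_of == k]
    mids.append(0.5 * (edges[k] + edges[k + 1]))
    mos_hat.append(ratings.mean())
    gob_hat.append((ratings >= 3).mean())

mids = np.array(mids)
p = np.exp(-BETA * mids)
mos_model = N_SCALE * p               # f(x) at the bin centre
gob_model = 4 * p**3 - 3 * p**4       # g(x) = P(Bin(4, p) >= 3)
for row in zip(mids, mos_hat, mos_model, gob_hat, gob_model):
    print("x=%.2f  MOS %.2f (model %.2f)  GoB %.2f (model %.2f)" % row)
```

In a real system, the provider would of course not have the ratings; the point of the sketch is only that f(x) and g(x) are conditional statistics of Q given x.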
In practice, the service provider does not obtain the QoE user rating distribution Q. Instead, the provider may collect QoS measurements, i.e., observations of the random variable X. The probabilities q(i|x) and Q(i|x) may be estimated from user ratings obtained by means of subjective studies, e.g., in the laboratory, via crowdsourcing, or by field trials, as long as the system condition x is observed. In user studies, the QoE user rating distribution is typically obtained under certain (controlled or observed) conditions, which do not reflect the current system parameter distribution H(x). Moreover, H(x) may change due to reconfiguration or reimplementation of the system and its service, or due to changes in the offered load or system utilization.
A service provider is interested in the QoE distribution Q(i) of all users in the system. The stochastic components of the QoE distribution are (1) the system parameter distribution X (i.e., the response time in our example) and (2) user rating diversity, yielding the user rating distribution Q(i|x). To be more precise, the service provider is interested in QoE metrics such as the expected QoE in the system over all users or the ratio of users in the system who rate their QoE as good or better. In practice, the service provider would not ask all users as in the Gedankenexperiment. However, the service provider may be able to measure QoS and use existing QoE models from the literature, which provide, e.g., a mapping f(x) from QoS to MOS, a mapping g(x) from QoS to GoB, or a mapping q(i|x) from QoS to conditional user rating distributions.
When answering this question from a mathematical point of view, we need to be aware of the inherent limitations of such indirect QoE measurements. The service provider relies on the provided QoS-to-QoE mapping functions in order to estimate the QoE user rating distribution or the QoE metrics of interest. However, in practice, a rough estimate of the true QoE of the users in the system may be sufficient to adopt a QoE-oriented perspective rather than a purely QoS-driven focus.

Expected QoE versus expected MOS
We consider here the case of a discrete rating scale like a 5-point ACR scale for the sake of simplicity. In particular, we use a discrete rating scale with items 0, … , n where 0 indicates the lowest QoE and n indicates the highest QoE of the scale. Please note that those fundamental relationships can also be derived for continuous rating scales analogously.
In Eq. (2), the distribution of the QoE user ratings i under a specific system parameter x is given. The expected user rating, given x, is the MOS value. Let
$$f(x) = \mathbb{E}[Q \mid X = x] = \sum_{i=0}^{n} i\, q(i|x)$$
be the mapping function between the condition x and the MOS rating (the mean opinion score for a given value of x), which may be derived from subjective user studies. In practice, due to cost reasons, subjective studies typically cover only a few values of the response time. However, since x is continuous, the mapping function f(x) needs to be defined for any value of x.
Then, a continuous mapping function f like the exponential function suggested by the IQX hypothesis [15] needs to be fitted to the collected MOS values. Please note that no assumptions are required for this mapping function f(x), besides continuity.
The MOS mapping function allows us to derive the following fundamental one-dimensional relationship between the expected QoE in the system and the system parameter X. The expected QoE in the system means the expected value of the random variable Q reflecting the QoE user ratings in the system as observed in the Gedankenexperiment; for the sake of readability, we simply use the term 'expected QoE in the system':
$$\mathbb{E}[Q] = \mathbb{E}[M] = \mathbb{E}[f(X)],$$
where the random variable M of MOS ratings is the transformation of the random variable X by the MOS mapping function, M = f(X). This equality follows from the law of total probability:
$$\mathbb{E}[Q] = \sum_{i=0}^{n} i\, q(i) = \sum_{i=0}^{n} i \int q(i|x)\, h(x)\, \mathrm{d}x = \int \Big(\sum_{i=0}^{n} i\, q(i|x)\Big) h(x)\, \mathrm{d}x = \int f(x)\, h(x)\, \mathrm{d}x,$$
where the last integral is the expected value of M over the distribution of X. This equation can be read in both directions. From left to right, E[Q] is computed from the QoE user ratings Q in the system, as obtained in the Gedankenexperiment; we may then derive the MOS mapping f(x) from the user ratings in the system for any condition x, and the equality E[Q] = E[M] holds. From right to left, if we only have the QoS-to-MOS mapping function, we may apply the mapping to the QoS values X. Assuming that this MOS mapping function provides the average user rating of the users in the system for any condition x, the equality again holds. Hence, the service provider relies on the MOS mapping function to properly include the relevant parameters and context for the users in its system. This is typically a simplification in practice, but gives the service provider the possibility to evaluate the system in a more QoE-centric way.
Note, however, that while the expected values agree, the distributions of Q and M are in general different. This can be seen from the simple fact that for a discrete rating scale, Q is a discrete random variable, while M is a continuous random variable. Please note that we do not need any assumptions on the user rating distribution Q|x, the response time distribution X, or the MOS mapping function f(x) for this fundamental relationship.
In practice, it is tempting to measure the expected response time E[X] and then apply the MOS mapping function f to obtain the expected MOS (i.e., the expected user rating). However, the relation between the system parameter (e.g., response time) and the MOS is in general a nonlinear function, which implies that, in general,
$$\mathbb{E}[f(X)] \neq f(\mathbb{E}[X]).$$
Equality is guaranteed for a deterministic distribution with a constant value, i.e., a variance of Var[X] = 0 (and, trivially, for a linear mapping function f).
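A quick numerical check of these statements, assuming the illustrative web QoE model (Q|x binomial on 0 to 4 with p_x = e^{−βx}, β = 0.25 1/s) and exponentially distributed response times with an assumed rate of 0.5 1/s, for which direct integration gives E[f(X)] = n·rate/(rate + β). The sketch compares E[Q] from simulated individual ratings with E[M], shows that the shortcut f(E[X]) yields a different value, and shows that Q and M differ in distribution: by the law of total variance, Var[Q] = E[Var(Q|X)] + Var[M] ≥ Var[M].

```python
import numpy as np

N_SCALE, BETA, RATE = 4, 0.25, 0.5  # assumed model parameters


def mos(x):
    """Assumed IQX-type MOS mapping f(x) = n * exp(-beta * x)."""
    return N_SCALE * np.exp(-BETA * x)


rng = np.random.default_rng(seed=7)
x = rng.exponential(scale=1.0 / RATE, size=400_000)  # system parameter X
q = rng.binomial(N_SCALE, np.exp(-BETA * x))         # individual ratings Q
m = mos(x)                                           # MOS values M = f(X)

expected_q = q.mean()                          # E[Q], Monte Carlo estimate
expected_m = N_SCALE * RATE / (RATE + BETA)    # E[f(X)], closed form
shortcut = mos(1.0 / RATE)                     # f(E[X]), generally wrong

print(f"E[Q] ~ {expected_q:.3f}  E[M] = {expected_m:.3f}  f(E[X]) = {shortcut:.3f}")
print(f"Var[Q] ~ {q.var():.3f}  Var[M] ~ {m.var():.3f}")
```

Because f is convex in this model, Jensen's inequality makes the shortcut f(E[X]) underestimate the expected QoE.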

GoB ratio
The probability that the QoE Q in the system is rated good or better is denoted as
$$\mathrm{GoB}[Q] = P(Q \ge \theta),$$
where the threshold θ is set accordingly to reflect good or better. Commonly, a value of θ = (3/4)n is chosen on a rating scale with minimum value 0 and maximum value n [3]. For example, on a 5-point ACR scale, 4 indicates a value of good; after shifting the scale to 0, …, 4 (n = 4), this corresponds to θ = 3. For deriving the QoE metric GoB for the QoE distribution Q over all users in the system and stochastic response times, it is necessary to provide a continuous GoB mapping function
$$g(x) = P(Q \ge \theta \mid X = x) = \sum_{i \ge \theta} q(i|x),$$
which yields the fundamental GoB relationship
$$\mathrm{GoB}[Q] = \int g(x)\, h(x)\, \mathrm{d}x = \mathbb{E}[g(X)].$$
In general, it is not possible to derive the GoB from the MOS distribution M, although approximations may exist under certain assumptions and conditions, see [18]. The fundamental GoB relationship means that the system parameter X may be measured and then mapped to g(X) to derive GoB[Q].
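To illustrate the relationship GoB[Q] = E[g(X)] and why thresholding the MOS distribution is not a substitute, the sketch below evaluates both quantities by numerical integration. It assumes the binomial web QoE model (0 to 4 scale, GoB threshold θ = 3, p_x = e^{−βx}, β = 0.25 1/s) and exponential response times with an assumed rate of 0.5 1/s. The naive quantity P(M ≥ 3), i.e., the share of MOS values at or above the threshold, differs noticeably from the actual GoB in this setting.

```python
import numpy as np

N_SCALE, BETA, RATE = 4, 0.25, 0.5  # assumed model parameters
THETA = 3                           # "good" on the shifted 0..4 scale


def gob_mapping(x):
    """g(x) = P(Q|x >= 3) for Q|x ~ Bin(4, p) with p = exp(-beta * x)."""
    p = np.exp(-BETA * x)
    return 4 * p**3 * (1 - p) + p**4


def trapezoid(y, x):
    """Plain trapezoidal rule (avoids NumPy-version-specific helpers)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)


x = np.linspace(0.0, 80.0, 200_001)   # the tail beyond 80 s is negligible here
h = RATE * np.exp(-RATE * x)          # response time PDF h(x)

gob_system = trapezoid(gob_mapping(x) * h, x)                        # E[g(X)]
mos_above = trapezoid((N_SCALE * np.exp(-BETA * x) >= THETA) * h, x)  # P(M >= 3)

print(f"GoB[Q] = {gob_system:.4f}, P(M >= 3) = {mos_above:.4f}")
```

Under these assumptions, GoB[Q] = 0.6 exactly, whereas P(M ≥ 3) = 1 − (3/4)^2 = 0.4375.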

Higher moments and user distribution
In a similar way, we may derive these fundamental results for variances or distributions, see [8]. To derive the complete distribution of the random variable Q, the conditional distribution q(i|x) = P(Q = i|X = x) is necessary:
$$q(i) = P(Q = i) = \int q(i|x)\, h(x)\, \mathrm{d}x.$$
For the k-th order moments of the random variable Q, we obtain analogously
$$\mathbb{E}[Q^k] = \sum_{i=0}^{n} i^k q(i) = \int \Big(\sum_{i=0}^{n} i^k q(i|x)\Big) h(x)\, \mathrm{d}x = \mathbb{E}\big[\mathbb{E}[Q^k \mid X]\big].$$
Please note that we included an example of deriving the QoE user rating distribution Q from QoS measurements in [8]. In that example, we consider a single web server offering a certain service. Previous subjective studies have shown that, for a given waiting time x, the conditional distribution Q|x can be approximated with a binomial distribution. The fundamental relationship then allows us to obtain the distribution of Q with the equation for q(i) above. Based on this example, we show in Section "Example: Web QoE dimensioning" how to derive the GoB mapping function g(x) to analyze the GoB ratio in the system.
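Under the same illustrative assumptions as the web QoE example (binomial Q|x on a 0 to 4 scale with p_x = e^{−βx}, β = 0.25 1/s) and an assumed exponential response time rate of 0.5 1/s, the sketch below evaluates q(i) = ∫ q(i|x) h(x) dx by numerical integration and then computes the first two moments of Q from q(i). The rate value is chosen only for the example.

```python
from math import comb

import numpy as np

N_SCALE, BETA, RATE = 4, 0.25, 0.5  # assumed model parameters


def trapezoid(y, x):
    """Plain trapezoidal rule."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)


x = np.linspace(0.0, 80.0, 200_001)
h = RATE * np.exp(-RATE * x)   # response time PDF h(x)
p = np.exp(-BETA * x)          # binomial parameter p_x

# q(i) = integral of q(i|x) h(x) dx, with q(i|x) the binomial PMF
q = np.array([
    trapezoid(comb(N_SCALE, i) * p**i * (1 - p) ** (N_SCALE - i) * h, x)
    for i in range(N_SCALE + 1)
])

levels = np.arange(N_SCALE + 1)
moments = {k: float(np.sum(levels**k * q)) for k in (1, 2)}  # E[Q], E[Q^2]

print("q(i):", np.round(q, 4))
print(f"E[Q] = {moments[1]:.4f}, E[Q^2] = {moments[2]:.4f}, "
      f"Var[Q] = {moments[2] - moments[1] ** 2:.4f}")
```

In this model the results can be checked in closed form: E[Q] = 8/3, E[Q^2] = 26/3, and q(4) = 1/3.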

Example: Web QoE dimensioning
In the following example, we use the GoB in the system to dimension a web server. The goal of the dimensioning is to determine the required service rate of the server such that a certain target GoB ratio is ensured for a given request rate. We consider a single web server offering users a certain service, such as access to a web site. We chose this simple use case since there exist web QoE models that may be used for this example, and the system itself can be modeled as a queueing system for which analytical results are well known. The response time of the system, i.e., the web page load time, is the single parameter X influencing the QoE in this example. The system is modeled as an M/M/1-FCFS queueing system. User requests arrive according to a Poisson process with rate λ. The server has a single processing unit, which serves requests in a first-come-first-served (FCFS) manner with service rate μ. If the server is occupied, arriving requests need to wait until they are served. An unlimited waiting room for the incoming requests is assumed. With the request interarrival times and the service times following exponential distributions with intensities λ and μ, respectively, this is the classical M/M/1-FCFS waiting queue with the well-known response time distribution X ∼ Expo(μ − λ) and the probability density function (PDF)
$$h(x) = (\mu - \lambda)\, e^{-(\mu - \lambda)x}, \qquad x \ge 0,$$
under the assumption μ > λ (where the system is said to be stable).
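The response time distribution X ∼ Expo(μ − λ) of an M/M/1-FCFS queue, with arrival rate λ and service rate μ, can be checked with a short simulation. The sketch below uses Lindley's recursion for the waiting times of successive requests in a FCFS single-server queue (a standard queueing technique, not taken from this paper), with illustrative rates λ = 1 1/s and μ = 2 1/s chosen only for the example. It compares the simulated mean response time and the tail probability P(X > t) with 1/(μ − λ) and e^{−(μ−λ)t}.

```python
import math
import random


def mm1_response_times(lam, mu, n_requests, seed=42):
    """Simulate FCFS M/M/1 response times via Lindley's recursion."""
    if not 0 < lam < mu:
        raise ValueError("stability requires 0 < lam < mu")
    rng = random.Random(seed)
    wait = 0.0              # waiting time of the current request
    times = []
    for _ in range(n_requests):
        service = rng.expovariate(mu)
        times.append(wait + service)            # response = waiting + service
        interarrival = rng.expovariate(lam)
        # Lindley: W_{k+1} = max(0, W_k + S_k - A_{k+1})
        wait = max(0.0, wait + service - interarrival)
    return times


lam, mu = 1.0, 2.0  # illustrative rates (1/s)
times = mm1_response_times(lam, mu, 200_000)

mean_sim = sum(times) / len(times)
t = 2.0
tail_sim = sum(rt > t for rt in times) / len(times)
print(f"mean: {mean_sim:.3f} s (theory {1 / (mu - lam):.3f} s)")
print(f"P(X > {t}): {tail_sim:.4f} (theory {math.exp(-(mu - lam) * t):.4f})")
```

The simulation starts with an empty queue, so the first few hundred requests form a short transient; with 200,000 requests its influence on the estimates is negligible.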
In [3], it is shown that the opinion scores Q|x (for a given system parameter x) of a web QoE study can be very well approximated with a binomial distribution for various system response times x. Thereby, the user ratings are shifted to a scale from 0 to 4 (instead of the typical 1 to 5 scale). In that subjective study, a set of different page load times x was rated by 72 subjects. Thus, for any response time x, we may approximate the distribution of Q|x with a binomial distribution, Q|x ∼ Bino(n, p_x), with MOS E[Q|x] = n p_x and n = 4 due to the discrete rating scale {0, …, 4} used. The parameter p_x follows as p_x = E[Q|x]/n. The MOS for a certain response time follows the IQX hypothesis, as shown in [19], which finally leads to Q|x ∼ Bino(n, e^{−βx}) with the probability mass function (PMF)
$$q(i|x) = \binom{n}{i} e^{-\beta x i} \left(1 - e^{-\beta x}\right)^{n-i}, \qquad i = 0, \dots, n,$$
with the sensitivity parameter β = 0.25 of the IQX hypothesis for this distribution, see [8]. The MOS mapping function is f(x) = n e^{−βx}, and the expected QoE in the system follows from the fundamental relationship as
$$\mathbb{E}[Q] = \mathbb{E}[f(X)] = \int_0^{\infty} n e^{-\beta x} (\mu - \lambda) e^{-(\mu - \lambda)x}\, \mathrm{d}x = \frac{n(\mu - \lambda)}{\mu - \lambda + \beta},$$
where λ is the arrival rate, μ the service rate, and β = 0.25 the fitted IQX sensitivity parameter.
Now, we want to dimension the web server system in such a way that the GoB ratio reaches at least a target value, e.g., G(λ, μ) ≥ G* with G* = 90% in this example. This requirement can be expressed either by the service rate μ as a function of λ and G*, or by λ as a function of μ and G*. In this example, we use the required service rate μ(G*) as a function of the arrival rate λ and the target GoB ratio G*. With θ = 3 on the 0 to 4 scale, the GoB mapping function is g(x) = P(Q ≥ 3 | X = x) = 4e^{−3βx} − 3e^{−4βx}, and hence
$$G(\lambda, \mu) = \mathbb{E}[g(X)] = \frac{4(\mu - \lambda)}{\mu - \lambda + 3\beta} - \frac{3(\mu - \lambda)}{\mu - \lambda + 4\beta}.$$
Figure 3 provides the numerical results and allows the service provider to determine the required service rate depending on the actual arrival rate in the system. Hence, the service provider needs to estimate the arrival rate to finally decide on the required service rate, according to the following equation for a target GoB ratio of 0 < G* < 1:
$$\mu(G^*) = \lambda + \frac{\beta}{2}\left(\sqrt{\frac{49 - G^*}{1 - G^*}} - 7\right).$$
Please note that the arrival rate λ and the sensitivity parameter β have the unit 1/s, while the target GoB ratio G* is dimensionless.
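The dimensioning rule can be sketched in a few lines. The code below assumes the binomial model with GoB threshold 3 on the 0 to 4 scale, β = 0.25 1/s, and exponential response times with rate μ − λ. It implements the system GoB ratio G(λ, μ) = 4(μ − λ)/(μ − λ + 3β) − 3(μ − λ)/(μ − λ + 4β) and the required service rate μ(G*) = λ + (β/2)(√((49 − G*)/(1 − G*)) − 7), obtained by setting G(λ, μ) = G* and solving the resulting quadratic in (μ − λ)/β, and then checks that the computed service rate meets the target. The function names are hypothetical.

```python
import math

BETA = 0.25  # IQX sensitivity parameter (1/s)


def system_gob(lam, mu, beta=BETA):
    """G(lam, mu) = E[g(X)] for X ~ Expo(mu - lam), Q|x ~ Bin(4, exp(-beta x))."""
    a = mu - lam
    return 4 * a / (a + 3 * beta) - 3 * a / (a + 4 * beta)


def required_service_rate(lam, target, beta=BETA):
    """Smallest mu with G(lam, mu) >= target, for 0 < target < 1."""
    if not 0 < target < 1:
        raise ValueError("target GoB ratio must lie in (0, 1)")
    return lam + beta / 2 * (math.sqrt((49 - target) / (1 - target)) - 7)


for lam in (1.0, 5.0, 10.0):
    mu = required_service_rate(lam, 0.9)
    print(f"lambda={lam:5.1f}/s -> mu={mu:.4f}/s, G={system_gob(lam, mu):.4f}")
```

Under this model, the required headroom μ − λ depends only on β and G* (about 1.87 1/s for G* = 90%), so the required service rate grows linearly with the arrival rate.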

Extension of the fundamental relations to multiple parameters
The web QoE example in the previous section considered the response time as the single parameter influencing the QoE. Thus, $X$ is a one-dimensional random variable which needs to be measured in the system. Then, the one-dimensional QoE mapping function can be applied to derive QoE metrics in the system.
For other applications and services, several parameters influencing QoE must be considered. As a result, several random variables need to be observed in the system, and a multidimensional QoE mapping function may be applied to quantify the QoE in the system. As an example, we consider non-adaptive HTTP video streaming in Section "Example: HTTP video streaming QoE". The QoE is mainly determined by the number of stalls and the total stall duration [20], which are not independent of each other. In practice, those two parameters need to be measured in the system to derive their joint probability density function. This use case demonstrates how to conduct a QoE evaluation of a running service in a system. In Section "Fundamental multidimensional QoE relationships", we derive fundamental relationships for the QoE in the system for multiple parameters. The underlying multidimensional QoE model may have certain characteristics, e.g., an additive or multiplicative structure, which allow the fundamental relationships to be simplified (Section "Multiplicative and additive QoE models"). The video streaming example in Section "Example: HTTP video streaming QoE" will also show how to utilize the relationships in a real-world system when measuring the system performance.
Note that in this section we still assume that all sessions are comparable. This allows us to quantify the QoE for the offered service in the system across all users.

Fundamental multidimensional QoE relationships
We follow the same line of thinking as sketched in Fig. 2. The system performance distribution is described by the random variables $X_1, X_2, \ldots$, which may be dependent on each other. Hence, we have a multivariate random variable
$$\mathbf{X} = (X_1, X_2, \ldots), \quad (17)$$
i.e., a list of random variables with the corresponding joint probability density function $h(x_1, x_2, \ldots)$.
In subjective studies, the conditional distributions $Q|\mathbf{x}$ are obtained for the test conditions $\mathbf{x} = (x_1, x_2, \ldots)$. From the subjective studies, relevant mapping functions are derived, e.g., a MOS mapping function $f: \mathcal{X} \to [1;5]$ on the parameter space $\mathcal{X}$. In practice, it is a difficult problem to sample the parameter space properly in order to derive appropriate models such as the MOS mapping function. A large set of parameters (reflected in the dimensionality of the parameter space $\mathcal{X}$) makes subjective experiments that cover a representative set of parameter combinations prohibitively expensive. If one or more of the system parameters are continuous rather than discrete, this challenge increases even further. This sampling problem has been investigated in recent literature; as a starting point, the interested reader may review, for example, [21][22][23][24], which develop active learning algorithms for multidimensional QoE models.

Expected QoE
For the sake of simplicity, we consider in the following only two parameters, e.g., X 1 (number of stalls) and X 2 (stall duration), to derive the fundamental multidimensional QoE relationships. The same thoughts can be generalized to an arbitrary number of possibly dependent parameters. We further assume a discrete rating scale yielding a discrete random variable of user ratings, i.e., Q ∈ {0, … , n} . Again, the derivation can be generalized to continuous rating scales analogously.
The expected QoE in the system requires the joint two-dimensional PDF $h(x_1, x_2)$ of the two parameters $\mathbf{X} = (X_1, X_2)$ as well as the conditional QoE distribution $Q|\mathbf{x}$ with the corresponding probability mass function $q(i|\mathbf{x}) = q(i|x_1, x_2) = P(Q|\mathbf{x} = i) = P(Q = i \mid X_1 = x_1, X_2 = x_2)$. Then, we obtain the following fundamental relationship for the expected QoE $\mathrm{E}[Q]$ in the system:
$$\mathrm{E}[Q] = \int\!\!\int \sum_{i=0}^{n} i\, q(i|x_1, x_2)\, h(x_1, x_2)\, dx_1\, dx_2 = \int\!\!\int f(x_1, x_2)\, h(x_1, x_2)\, dx_1\, dx_2 = \mathrm{E}[f(\mathbf{X})]. \quad (18)$$
Similar to the one-dimensional relationship, a MOS mapping function $f$ is required, which maps the system performance to a distribution of MOS values $M = f(\mathbf{X})$. Note that the relation given in Eq. (18) holds in general for $k$-dimensional parameter sets (in Section "Fundamental relationship: QoE in the system for a single parameter", we consider $k = 1$, while in this section we consider $k = 2$). The $k$-dimensional MOS mapping function is $f(\mathbf{x}) = \mathrm{E}[Q|\mathbf{x}]$, and the expected QoE in the system is then the expected MOS, $\mathrm{E}[Q] = \mathrm{E}[M]$.

Fig. 3 Dimensioning of the required service rate $\mu$ for the web QoE example (Section "Example: Web QoE dimensioning") such that the GoB is above a certain target GoB ratio $G^*$ for a given arrival rate $\lambda$ (Eq. (16)). In practice, the provider needs to fix $G^*$ and estimate the arrival rate, e.g. overestimating …
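To make Eq. (18) concrete for discrete parameters, the following Python sketch computes $\mathrm{E}[Q]$ once from the full conditional rating distribution and once as $\mathrm{E}[f(\mathbf{X})]$, and contrasts both with $f(\mathrm{E}[\mathbf{X}])$. The binomial rating model and the joint PMF `h` are hypothetical illustration values, not data from the paper.

```python
from math import comb, exp

N_SCALE = 4  # discrete rating scale {0, ..., n} with n = 4


def q(i, x1, x2):
    """Hypothetical conditional PMF q(i|x1, x2): binomial, p = exp(-0.2*x1 - 0.1*x2)."""
    p = exp(-0.2 * x1 - 0.1 * x2)
    return comb(N_SCALE, i) * p**i * (1 - p) ** (N_SCALE - i)


def mos(x1, x2):
    """MOS mapping function f(x) = E[Q|x] = sum_i i * q(i|x)."""
    return sum(i * q(i, x1, x2) for i in range(N_SCALE + 1))


# Hypothetical joint PMF h(x1, x2) of two dependent parameters,
# e.g. number of stalls x1 and total stall duration x2 in seconds.
h = {(0, 0): 0.5, (1, 2): 0.2, (1, 4): 0.1, (2, 4): 0.1, (3, 8): 0.1}

# First form of Eq. (18): average over the full rating distribution.
e_q_ratings = sum(
    i * q(i, x1, x2) * prob
    for (x1, x2), prob in h.items()
    for i in range(N_SCALE + 1)
)
# Second form of Eq. (18): E[Q] = E[f(X)].
e_q_mos = sum(mos(x1, x2) * prob for (x1, x2), prob in h.items())

# Mapping the mean parameter vector to a MOS value is a different quantity.
mean_x1 = sum(x1 * prob for (x1, _), prob in h.items())
mean_x2 = sum(x2 * prob for (_, x2), prob in h.items())
f_of_mean = mos(mean_x1, mean_x2)

print(f"E[Q] = {e_q_ratings:.4f} = {e_q_mos:.4f}, f(E[X]) = {f_of_mean:.4f}")
```

Since this toy mapping is convex in both parameters, $f(\mathrm{E}[\mathbf{X}])$ underestimates the expected QoE here; in general, the two only coincide for linear mappings.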

GoB ratio
For other QoE metrics, derivations similar to those in the one-dimensional case lead to the fundamental relationships. We show the example of the GoB ratio in the following.
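The GoB ratio in the system follows as
$$\mathrm{GoB} = \int\!\!\int g(x_1, x_2)\, h(x_1, x_2)\, dx_1\, dx_2 = \mathrm{E}[g(\mathbf{X})].$$
Hence, we need the multidimensional GoB mapping function $g$ and the multivariate PDF $h$ for calculating the GoB over all users in the system. It is $g(x_1, x_2) = \sum_{i \ge \theta} q(i|x_1, x_2)$, i.e., the probability that a user rates the test condition $(x_1, x_2)$ good or better, where $\theta$ denotes the lowest rating category counted as good or better.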
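The following sketch illustrates why a dedicated GoB mapping function is needed: with a hypothetical binomial rating model and joint PMF, the GoB ratio $\mathrm{E}[g(\mathbf{X})]$ differs clearly from the share of MOS values that lie above the threshold. The threshold of 3 on a 0 to 4 scale is an assumption made for illustration.

```python
from math import comb, exp

N_SCALE = 4
GOB_THRESHOLD = 3  # assumption: ratings 3 and 4 on the 0..4 scale are "good or better"


def q(i, x1, x2):
    """Hypothetical conditional PMF q(i|x1, x2): binomial, p = exp(-0.2*x1 - 0.1*x2)."""
    p = exp(-0.2 * x1 - 0.1 * x2)
    return comb(N_SCALE, i) * p**i * (1 - p) ** (N_SCALE - i)


def gob(x1, x2):
    """GoB mapping g(x1, x2) = P(Q|x >= threshold)."""
    return sum(q(i, x1, x2) for i in range(GOB_THRESHOLD, N_SCALE + 1))


def mos(x1, x2):
    """MOS mapping f(x1, x2) = E[Q|x]."""
    return sum(i * q(i, x1, x2) for i in range(N_SCALE + 1))


# Hypothetical joint PMF h(x1, x2) of the system parameters.
h = {(0, 0): 0.5, (1, 2): 0.2, (1, 4): 0.1, (2, 4): 0.1, (3, 8): 0.1}

# Correct: fundamental relationship with the GoB mapping function.
gob_ratio = sum(gob(x1, x2) * prob for (x1, x2), prob in h.items())
# Misleading shortcut: fraction of *MOS values* at or above the threshold.
mos_share = sum(prob for (x1, x2), prob in h.items() if mos(x1, x2) >= GOB_THRESHOLD)

print(f"GoB ratio = {gob_ratio:.3f}, share of MOS >= 3 = {mos_share:.3f}")
```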

QoE distribution
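For the derivation of the complete distribution of the random variable $Q$, it is necessary to have the conditional distribution $q(i|x_1, x_2)$:
$$P(Q = i) = \int\!\!\int q(i|x_1, x_2)\, h(x_1, x_2)\, dx_1\, dx_2, \quad i = 0, \ldots, n.$$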
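A minimal sketch of this relationship, with a hypothetical binomial rating model and joint PMF: once $P(Q = i)$ is known, both the expected QoE and the GoB ratio (assuming ratings 3 and 4 on the 0 to 4 scale count as good or better) follow directly from it.

```python
from math import comb, exp

N_SCALE = 4
GOB_THRESHOLD = 3  # assumption: ratings 3 and 4 on the 0..4 scale are "good or better"


def success_prob(x1, x2):
    """Hypothetical binomial parameter p(x1, x2) of the rating distribution."""
    return exp(-0.2 * x1 - 0.1 * x2)


def q(i, x1, x2):
    """Conditional PMF q(i|x1, x2) = Bino(N_SCALE, p(x1, x2))."""
    p = success_prob(x1, x2)
    return comb(N_SCALE, i) * p**i * (1 - p) ** (N_SCALE - i)


# Hypothetical joint PMF h(x1, x2) of two dependent system parameters.
h = {(0, 0): 0.5, (1, 2): 0.2, (1, 4): 0.1, (2, 4): 0.1, (3, 8): 0.1}

# P(Q = i) = sum over x of q(i|x) h(x): the QoE distribution in the system.
dist = [sum(q(i, x1, x2) * prob for (x1, x2), prob in h.items()) for i in range(N_SCALE + 1)]

# Every other QoE metric can be read off this distribution.
expected_qoe = sum(i * p_i for i, p_i in enumerate(dist))
gob_ratio = sum(dist[GOB_THRESHOLD:])

print([round(p_i, 4) for p_i in dist], round(expected_qoe, 4), round(gob_ratio, 4))
```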

Multiplicative and additive QoE models
The fundamental relationships indicate that the joint PDF h is required to compute the metrics for the QoE in the system. Now, let us consider that the two random variables X 1 and X 2 are independent of each other. Then, the joint probability density function h(x 1 , x 2 ) is the product of the marginal PDFs h 1 (x 1 ) and h 2 (x 2 ).
Another simplification considers the underlying QoE model. The literature often provides additive and multiplicative QoE models; see [25] for detailed discussions. In the following, we focus on the expected QoE in the system and the corresponding MOS mapping functions for additive and multiplicative MOS models.

Additive MOS model and expected QoE in the system
The literature suggests additive QoE models for different services like mobile web browsing [26] or speech quality. The E-model [27] is a commonly used parametric planning model for predicting expected speech quality. Its underlying principle for handling multiple different types of impairments originates from the OPINE model proposed by NTT, which assumes that quality degradation factors are summed on a psychological scale [28]. For video quality estimates, the amendment of ITU-T Rec. P.1201 [29] suggests the use of an additive model whereby degradations resulting from stalling and initial delay are subtracted from a maximum MOS value. Let us consider an additive QoE model which maps the parameters $x_1$ and $x_2$ to corresponding MOS values. In general, we may have $m$ different mapping functions $f^*_j$ ($j = 1, \ldots, m$), which are defined on a $k$-dimensional parameter space $\mathcal{X}$, where $m$ and $k$ might be different. The combined additive model weights the individual terms with corresponding factors $a_j$:
$$f(\mathbf{x}) = \sum_{j=1}^{m} a_j f^*_j(\mathbf{x}). \quad (21)$$
For instance, with two terms, the combined additive model weights the two terms with the factors $a_1$ and $a_2$. There are no assumptions on the functions $f_1$ and $f_2$, which may be, e.g., non-linear functions. We arrive at the following (non-linear) two-dimensional MOS mapping function $f$:
$$f(x_1, x_2) = a_1 f_1(x_1) + a_2 f_2(x_2).$$
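In the following, we consider two mapping functions, $f_1$ and $f_2$, and two parameters $x_1$ and $x_2$, and let $f^*_1(\mathbf{x}) = f_1(x_1)$ and $f^*_2(\mathbf{x}) = f_2(x_2)$. By linearity of expectation, the expected QoE in the system is
$$\mathrm{E}[Q] = \int\!\!\int \big(a_1 f_1(x_1) + a_2 f_2(x_2)\big)\, h(x_1, x_2)\, dx_1\, dx_2 = a_1 \int f_1(x_1)\, h_1(x_1)\, dx_1 + a_2 \int f_2(x_2)\, h_2(x_2)\, dx_2.$$
The second step uses $h(x_1, x_2) = h(x_2|x_1)\, h_1(x_1)$, where the conditional density is defined as $h(x_2|x_1) = h(x_1, x_2)/h_1(x_1)$. In other words, the joint density of both random variables is the product of the conditional density and the marginal density, and integrating the conditional density over $x_2$ yields one. Thus, the expected QoE in the system can be described with the marginal density functions alone, even if $X_1$ and $X_2$ are dependent.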
On a $k$-dimensional parameter set, the same argument yields $\mathrm{E}[Q] = \sum_{j=1}^{m} a_j\, \mathrm{E}[f^*_j(\mathbf{X})]$, where each expectation only requires the marginal distribution of the parameters on which $f^*_j$ depends. In practice, using the marginal distributions may be required when the complete information is not available. The complete information could be provided as a tuple of measurement values for each session or as a multidimensional histogram. Due to the large amount of measurement data, often only the marginal distributions are captured in practice. Dashboards, for example, may only provide the aggregated histogram of measurement values for the individual parameters.
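The following Python sketch checks numerically that, for the additive model, the expected QoE computed with the joint PMF of two dependent parameters equals the value computed from the marginal PMFs alone. The functions $f_1$, $f_2$, the weights $a_1 = a_2 = 2$, and the joint PMF are hypothetical.

```python
from math import exp

A1, A2 = 2.0, 2.0  # hypothetical weights a1, a2


def f1(x1):
    return exp(-0.3 * x1)


def f2(x2):
    return exp(-0.1 * x2)


# Hypothetical joint PMF with dependent X1, X2 (h(0, 0) != h1(0) * h2(0)).
h = {(0, 0): 0.4, (0, 3): 0.1, (1, 3): 0.2, (2, 5): 0.2, (2, 9): 0.1}


def marginal(joint, axis):
    """Sum the joint PMF over the other parameter."""
    out = {}
    for x, prob in joint.items():
        out[x[axis]] = out.get(x[axis], 0.0) + prob
    return out


h1, h2 = marginal(h, 0), marginal(h, 1)

# Fundamental relationship with the full joint PMF ...
e_q_joint = sum((A1 * f1(x1) + A2 * f2(x2)) * prob for (x1, x2), prob in h.items())
# ... and the same value from the marginal PMFs only.
e_q_marginals = (A1 * sum(f1(x) * p for x, p in h1.items())
                 + A2 * sum(f2(x) * p for x, p in h2.items()))

print(f"joint: {e_q_joint:.6f}, marginals: {e_q_marginals:.6f}")
```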

Multiplicative QoE model
For multiplicative QoE models, we obtain similar results as in the additive case, except that the simplified formulation with marginal distributions is not possible, as we will see later. Multiplicative models have been demonstrated for audiovisual quality, whereby the multiplicative term between audio and video quality is generally sufficient to estimate audiovisual quality [30]. This was confirmed in a survey comparing integration models [31].
The video streaming example in Section "Example: HTTP video streaming QoE" also relies on such a multiplicative MOS model. In particular, the multidimensional IQX (MIQX) model [21] relies on a multiplicative model. The parameters $x_1$ and $x_2$ contribute to the MOS according to the single-dimensional IQX hypothesis. Thus, $f_1(x_1) = a_1 e^{-\beta_1 x_1}$ and $f_2(x_2) = a_2 e^{-\beta_2 x_2}$. In addition, the interaction between the two parameters is considered as $f_{12}(x_1, x_2) = a_{12} e^{-\beta_{12} x_1 x_2}$. Then, the MIQX model combines these terms multiplicatively:
$$f(x_1, x_2) = f_1(x_1)\, f_2(x_2)\, f_{12}(x_1, x_2) = a\, e^{-\beta_1 x_1 - \beta_2 x_2 - \beta_{12} x_1 x_2}, \quad a = a_1 a_2 a_{12}.$$
Hence, the MIQX is an extension of the IQX to a vector of parameters, in which the exponent combines the parameters and the sensitivity parameters as a linear model with interaction. The result of the MIQX is a multiplicative model.
In the following, we consider the multiplicative model for the MOS and two potentially dependent parameters $X_1$, $X_2$. Here, the MOS mapping function additionally takes into account an additive offset $b$ and a discrete rating scale:
$$f(x_1, x_2) = a\, f_1(x_1)\, f_2(x_2) + b.$$
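The expected MOS in the system can then be derived with the fundamental relationship and the joint PDF $h(x_1, x_2)$:
$$\mathrm{E}[Q] = a \int\!\!\int f_1(x_1)\, f_2(x_2)\, h(x_1, x_2)\, dx_1\, dx_2 + b = a \int f_1(x_1)\, h_1(x_1) \left(\int f_2(x_2)\, h(x_2|x_1)\, dx_2\right) dx_1 + b. \quad (29)$$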
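However, this equation cannot be simplified further as in the additive case, since the inner integral yields a function of $x_1$ (it depends on the conditional density $h(x_2|x_1)$) rather than a constant, so the marginal distributions alone are not sufficient. In the case that the parameters $X_1$ and $X_2$ are independent, Eq. (29) can be simplified, since $h(x_1, x_2) = h_1(x_1)\, h_2(x_2)$:
$$\mathrm{E}[Q] = a\, \mathrm{E}[f_1(X_1)]\, \mathrm{E}[f_2(X_2)] + b.$$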
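For the multiplicative model, the marginal-only computation fails unless the parameters are independent. This Python sketch uses hypothetical $f_1$, $f_2$, $a = 4$, $b = 1$, and a hypothetical joint PMF with dependent parameters; replacing the joint PMF by the product of its marginals makes both computations agree.

```python
from math import exp

A, B = 4.0, 1.0  # hypothetical scale a and offset b


def f1(x1):
    return exp(-0.3 * x1)


def f2(x2):
    return exp(-0.1 * x2)


# Hypothetical joint PMF with dependent (positively correlated) X1, X2.
h = {(0, 0): 0.4, (0, 3): 0.1, (1, 3): 0.2, (2, 5): 0.2, (2, 9): 0.1}


def marginal(joint, axis):
    """Sum the joint PMF over the other parameter."""
    out = {}
    for x, prob in joint.items():
        out[x[axis]] = out.get(x[axis], 0.0) + prob
    return out


def expected_mos(joint):
    """E[Q] = a * E[f1(X1) f2(X2)] + b using the given joint PMF."""
    return A * sum(f1(x1) * f2(x2) * prob for (x1, x2), prob in joint.items()) + B


h1, h2 = marginal(h, 0), marginal(h, 1)
e1 = sum(f1(x) * p for x, p in h1.items())
e2 = sum(f2(x) * p for x, p in h2.items())
e_q_product_of_marginals = A * e1 * e2 + B

e_q_joint = expected_mos(h)  # correct value for the dependent case
h_indep = {(x1, x2): p1 * p2 for x1, p1 in h1.items() for x2, p2 in h2.items()}
e_q_indep = expected_mos(h_indep)  # independent parameters with the same marginals

print(f"joint: {e_q_joint:.4f}, marginals only: {e_q_product_of_marginals:.4f}, "
      f"independent joint: {e_q_indep:.4f}")
```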

Example: HTTP video streaming QoE
This example shows how to quantify the expected QoE for an HTTP video streaming service in a system for two-dimensional QoS measures. It also depicts how a service provider may monitor the QoS in the system and then apply the two-dimensional fundamental relationship in order to estimate the expected QoE in the system. The service provider measures the stalling pattern on the application layer, i.e., the total stall duration $t$ and the number $n$ of stalls, so the system parameter set is $\mathbf{x} = (x_1 = n, x_2 = t)$. Previous work has already shown that stalling is a major QoE influence factor [20]. In particular, for (non-adaptive) HTTP streaming, a two-dimensional QoE mapping function $f(\mathbf{x}) = f(n, t)$ is provided, which maps the number of stalls and the total stall duration to MOS values [20]:
$$f(n, t) = 3.5 \cdot e^{-0.15t - 0.19n} + 1.5, \quad (32)$$
which implies that $f: \mathcal{X} \to [1.5; 5]$. For obtaining the stalling patterns, we conduct a simulation and compare the measurement results for the expected QoE in the system with theoretical results derived from the known simulation parameters. The observed number of stalls $N$ in the system is a random variable which is assumed to follow a geometric distribution, $N \sim \mathrm{Geom}(p)$ with $P(N = k) = (1 - p)^k \cdot p$ for $k = 0, 1, 2, \ldots$. The total stall duration follows an Erlang distribution composed of $N$ exponential phases of average length $L$, where $L$ indicates the average stall duration of each of the $N$ stall events. Thus, $T \sim \mathrm{Erlang}(N, L)$. We select the parameters of the stalling simulation as follows: … For each simulated video $j$, we sample the set $\mathbf{x}_j = \{n_j, t_j\}$, where $n_j$ is the number of stalls for video $j$, and $t_j$ is the total stall duration for the same video. These samples could be used to estimate the joint PDF $h(n, t)$. In practice, however, we simply use the measurement values $\mathbf{x}_j = \{n_j, t_j\}$ and obtain $q_j = f(\mathbf{x}_j) = f(n_j, t_j)$, which is the corresponding MOS value for the $j$-th video. Note that with the MOS mapping function, we can only derive the expected QoE in the system (but not, e.g., the GoB ratio in the system).

The average QoE of the video service obtained from the measurements of $J$ simulated videos is therefore
$$\bar{Q} = \frac{1}{J} \sum_{j=1}^{J} q_j = \frac{1}{J} \sum_{j=1}^{J} f(n_j, t_j). \quad (33)$$
For the QoE in the system, it is also recommended to compute confidence intervals based on the measurements. In this example, the average QoE is $\bar{Q} = 3.380$ with the 95% confidence interval $(3.299, 3.460)$. As already discussed, it is not possible to compute the average QoE by mapping the average number of stalls and the average total stall duration to a MOS value, unless the mapping is a linear function. In our example, we estimated $\bar{Q} = 3.38$, which is not equal to $f(\bar{N}, \bar{T}) = 2.55$.

The theoretical expected QoE in the system can be derived for our scenario with the joint PDF $h(n, t)$ and the MOS mapping function with parameters $\alpha$, $\beta_t$, $\beta_n$, $\gamma$, i.e., $f(n, t) = \alpha e^{-\beta_t t - \beta_n n} + \gamma$ with $\alpha = 3.5$, $\beta_t = 0.15$, $\beta_n = 0.19$, and $\gamma = 1.5$ according to Eq. (32). The joint PDF $h(n, t)$ can be expressed with the conditional density $a_n(t)$ of the total stall duration under the condition of $n$ stalls. Thereby, the number of stalls follows the geometric distribution with parameter $p$, and $a_n(t)$ follows the Erlang distribution with $n$ phases of average length $L = 1/\omega = 2$ s (with $T = 0$ for $n = 0$). Figure 4 shows the results of a parameter study on the average number of stalls in the system according to the following equation:
$$\mathrm{E}[Q] = \sum_{n=0}^{\infty} (1 - p)^n\, p \int_0^\infty f(n, t)\, a_n(t)\, dt. \quad (34)$$
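The simulation procedure and Eqs. (33) and (34) can be sketched in Python as follows. Evaluating the Erlang Laplace transform $\mathrm{E}[e^{-\beta_t T} \mid N = n] = (\omega/(\omega + \beta_t))^n$ turns Eq. (34) into the closed form $\mathrm{E}[Q] = \gamma + \alpha p / \big(1 - (1-p)\,e^{-\beta_n}\,\omega/(\omega + \beta_t)\big)$. The geometric parameter $p = 0.3$ is a hypothetical choice, since its value is not stated in this section; the MOS mapping and $L = 2$ s follow the text, and the confidence interval uses a normal approximation.

```python
import math
import random
import statistics

# MOS mapping of Eq. (32): f(n, t) = alpha * exp(-beta_t * t - beta_n * n) + gamma
ALPHA, BETA_T, BETA_N, GAMMA = 3.5, 0.15, 0.19, 1.5
OMEGA = 0.5   # stall rate: mean stall length L = 1 / OMEGA = 2 s (from the text)
P_GEOM = 0.3  # hypothetical geometric parameter p


def mos(n, t):
    return ALPHA * math.exp(-BETA_T * t - BETA_N * n) + GAMMA


def sample_stalling(rng, p=P_GEOM):
    """Draw (n, t): N ~ Geom(p) on {0, 1, ...}, T | N = n ~ Erlang(n, rate OMEGA)."""
    n = 0
    while rng.random() >= p:  # each further stall occurs with probability 1 - p
        n += 1
    t = sum(rng.expovariate(OMEGA) for _ in range(n))  # t = 0 if there is no stall
    return n, t


def expected_qoe_theory(p=P_GEOM):
    """Closed form of Eq. (34) via the Laplace transform of the Erlang distribution."""
    r = (1 - p) * math.exp(-BETA_N) * OMEGA / (OMEGA + BETA_T)
    return GAMMA + ALPHA * p / (1 - r)


rng = random.Random(1)
samples = [sample_stalling(rng) for _ in range(20_000)]
q = [mos(n, t) for n, t in samples]                          # q_j = f(n_j, t_j)
q_bar = statistics.fmean(q)                                   # Eq. (33)
half_width = 1.96 * statistics.stdev(q) / math.sqrt(len(q))  # normal-approx. 95% CI
n_bar = statistics.fmean(n for n, _ in samples)
t_bar = statistics.fmean(t for _, t in samples)

print(f"Q_bar = {q_bar:.3f} +/- {half_width:.3f}, theory = {expected_qoe_theory():.3f}, "
      f"f(N_bar, T_bar) = {mos(n_bar, t_bar):.3f}")
```

The last printed value illustrates the point made above: mapping the averaged stalling pattern to a MOS value gives a clearly lower number than averaging the mapped per-video values.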
In practice, monitoring the QoE in the system requires the corresponding mapping function (e.g., the MOS mapping function $f$ as in the example above, or the GoB mapping function $g$). The input parameters of the mapping function then need to be measured in the system, and the QoE in the system can be derived according to the fundamental relationships. Hence, the measurement values first need to be mapped to QoE and then averaged, as in Eq. (33).

A QoE-based approach to Service-Level Quality
In the calculations and discussions given in Sections "Fundamental relationship: QoE in the system for a single parameter" and "Extension of the fundamental relations to multiple parameters", the QoE in the system is obtained using estimated per-user QoE values, e.g., by applying MOS or GoB mapping functions to a given set of system parameters. Therefore, the QoE values of the individual users are all taken into account in the same way and weighted equally. However, a service provider (or system operator) may aim at utilizing the estimated QoE values of the users for various purposes (e.g., network/server dimensioning, quality monitoring, system benchmarking). In this context, a user interacts with the service within a certain session. Those sessions may be very different in terms of session duration, resource demands, or costs incurred by the provider. For some services, such as multiparty video calls, several users may take part in the same session, but each user will have their own individual experience of that session. It may be relevant to consider that certain sessions are more "important" than others for the provider, e.g., due to the resource demands or costs of a session, or due to the number of affected users in a session. Such sessions may thus be more indicative of an aggregate service quality measure, which is in turn used by an operator as a proxy measure when performing tasks such as network/server dimensioning, quality monitoring, or system benchmarking. In this section, we first provide a generic definition of a Service Quality Index (SQI), and then provide two example use cases demonstrating scenarios in which a service provider can utilize such an index.

Defining a QoE-based Service Quality Index (SQI)
In Section "Not all user sessions are created equal", we defined the SQI as a measure indicating the overall utility of a service delivered by a system, derived as a weighted combination of quality values estimated per user session. Rather than considering only a QoE mapping function (which estimates QoE per session), we extend this notion to a utility function, which weights individual session QoE values in a manner that is deemed relevant by a service provider (e.g., based on session duration, resource consumption, etc.). For a session $s$ with parameters $\mathbf{x}_s$, we define the utility function as
$$u(\mathbf{x}_s) = w^*(\mathbf{x}_s)\, m^*(\mathbf{x}_s),$$
where $m^*(\mathbf{x}_s)$ is the QoE mapping in session $s$ on a subset of $\mathbf{x}_s$, and $w^*(\mathbf{x}_s) = w^*_s$ is the weight of session $s$. In a generic sense, the weight factor is assigned to indicate the relative importance of a given session quality value. Hence, we may use normalized weights for the utility function, e.g., when having $S$ sessions with raw weights $w_s$, $w^*_s = w_s / \sum_{s=1}^{S} w_s$, and the SQI is the sum of the utility values, $\mathrm{SQI} = \sum_{s=1}^{S} u(\mathbf{x}_s)$. Please note that $\mathbf{x}_s$ may also include aspects other than QoS parameters, like the duration of a session or its resource consumption. The QoE mapping function may then only consider a subset of those parameters. Similarly, the weighting function may also consider a subset of the parameters. If all sessions are weighted equally, i.e., $w^*_s = 1/S$, and the MOS mapping function $f$ is used as $m^*$, then $\mathrm{SQI} = \frac{1}{S}\sum_{s=1}^{S} f(\mathbf{x}_s)$, and the SQI represents the expected QoE in the system. If we use the GoB mapping function $g$ instead, the SQI represents the GoB ratio in the system. In the following sections, we illustrate two example use cases with different weights, showing how the SQI may be utilized in a meaningful way by service providers.
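As a minimal sketch of the session-level definition with normalized weights (the function name, MOS values, and session durations are hypothetical), the SQI reduces to the mean MOS for equal weights and shifts toward long sessions when session duration is used as the raw weight.

```python
def sqi(qoe_values, raw_weights):
    """SQI = sum_s w*_s * m*(x_s) with normalized weights w*_s = w_s / sum(w)."""
    if len(qoe_values) != len(raw_weights):
        raise ValueError("one weight per session is required")
    total = sum(raw_weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w / total * m for m, w in zip(qoe_values, raw_weights))


# Hypothetical per-session MOS values and session durations (s) used as raw weights.
mos_values = [4.2, 3.1, 1.8, 4.6]
durations = [300, 60, 1200, 30]

print(sqi(mos_values, [1] * len(mos_values)))  # equal weights: expected QoE (mean MOS)
print(sqi(mos_values, durations))              # duration-weighted SQI
```

Here the long, poorly rated session dominates the duration-weighted index, which is exactly the kind of effect the weighting is meant to expose.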

Example use case: utilizing SQI for web QoE dimensioning
As an example, we once again consider a web service provider aiming to dimension a web server (as explained previously in Section "Example: Web QoE dimensioning"). For each loaded page, we have a waiting time $t_w$ and a service time $t_s$ to process the user request at the web server. The page load time is $t = t_w + t_s$, and the MOS is obtained from Eq. (31) with $a = 4$, $f^*_1(\mathbf{x}) = f_1(t_w) = e^{-\beta t_w}$, and $f^*_2(\mathbf{x}) = f_2(t_s) = e^{-\beta t_s}$. Rather than simply calculating the expected QoE in the system, the provider may assign weights to individual MOS values that are proportional to the service times. One example of such a time-dependent consideration is processing costs. Let us consider ad impressions during service consumption: a service provider may have contracts with ad companies that pay per impression time. While a user is served, ads are displayed, and the provider may receive revenue from the ad company [33].
Thus, we consider the weight of a user request to be $w^*(t_w, t_s) = w(t_s) = t_s$, i.e., only the time it takes to process the user request, i.e., to download the actual content. The joint PDF is $h(t_w, t_s)$. In the FCFS system, the waiting time and the service time of a request are independent (unlike in a processor sharing system). Hence, we simply multiply the PDFs of the waiting time and the service time, $h(t_w, t_s) = h_w(t_w) \cdot h_s(t_s)$. The PDF of the service time is $h_s(t_s) = \mu e^{-\mu t_s}$. The waiting time has an atom at zero, $P(T_w = 0) = 1 - \rho$ with the utilization $\rho = \lambda/\mu$ (an arriving request finds the server idle), and the density $h_w(t_w) = \rho\, (\mu - \lambda)\, e^{-(\mu - \lambda) t_w}$ for $t_w > 0$. Putting the different parts together, we can derive the SQI based on the random variables $T_w$ and $T_s$ reflecting the waiting time and the service time.
For dimensioning the service rate (how fast user requests can be processed), the service provider first needs to estimate the service demand rate $\lambda$, i.e., the number of user requests per unit time. Figure 5 shows the SQI as well as the expected QoE in the system in relation to the service rate $\mu$ for $\lambda = 3$. The system is stable, i.e., not in overload, when $\mu > \lambda$. As expected, the higher the service rate, the higher the expected QoE in the system. However, taking the session weights into account yields a different picture. Low service rates, i.e., a utilization $\rho = \lambda/\mu$ close to 1, lead to highly loaded systems with bad QoE and low SQI values. High service rates, in contrast, lead to high QoE values but shorter service times. In that case, the provider may lose revenue from the ad company, and as a result, the SQI value is also low. In Fig. 5, the SQI curve thus indicates the optimal operating point with respect to the service rate, which is considered best for the operator according to the SQI.
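The expected QoE curve of this example can be sketched under the M/M/1-FCFS assumptions stated above. The following Python code samples the waiting time (zero with probability $1 - \rho$, otherwise exponential with rate $\mu - \lambda$) independently of the service time and compares the Monte Carlo average of the MOS on the [1; 5] scale with the closed form $1 + 4(\mu - \lambda)/(\mu - \lambda + \beta)$, which follows because $T_w + T_s \sim \mathrm{Expo}(\mu - \lambda)$. For $\mu = 3.58$, this gives approximately 3.80, matching the value reported for Fig. 5. The sketch does not attempt to reproduce the SQI value itself; the function names are hypothetical.

```python
import math
import random
import statistics

LAM = 3.0    # arrival rate lambda (1/s), as in Fig. 5
BETA = 0.25  # IQX sensitivity parameter from the web QoE example


def expected_qoe(mu, lam=LAM, beta=BETA):
    """E[Q] on [1, 5]: 1 + 4 E[exp(-beta (T_w + T_s))] with T_w + T_s ~ Expo(mu - lam)."""
    if mu <= lam:
        raise ValueError("unstable system: need mu > lam")
    y = mu - lam
    return 1 + 4 * y / (y + beta)


def sample_wait_and_service(rng, mu, lam=LAM):
    """M/M/1-FCFS: T_w = 0 w.p. 1 - rho, else Expo(mu - lam); T_s ~ Expo(mu), independent."""
    rho = lam / mu
    t_w = rng.expovariate(mu - lam) if rng.random() < rho else 0.0
    t_s = rng.expovariate(mu)
    return t_w, t_s


def expected_qoe_mc(mu, runs=100_000, seed=7):
    """Monte Carlo estimate of E[1 + 4 f1(T_w) f2(T_s)] from sampled T_w, T_s."""
    rng = random.Random(seed)
    values = []
    for _ in range(runs):
        t_w, t_s = sample_wait_and_service(rng, mu)
        values.append(1 + 4 * math.exp(-BETA * t_w) * math.exp(-BETA * t_s))
    return statistics.fmean(values)


for mu in (3.2, 3.58, 4.5, 6.0):
    print(f"mu = {mu:4.2f}: E[Q] = {expected_qoe(mu):.3f} (MC {expected_qoe_mc(mu):.3f})")
```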

Example use case: utilizing SQI for benchmarking a system offering multiparty calls
For complex applications, such as WebRTC conferences, where many users can participate in a session using several media modalities at once, we face some additional complexities in defining the "quality of a session", since, unlike in the case of video streaming or web browsing, each session itself (a "call") potentially involves: (a) many users, and (b) several QoE models (e.g., for voice and video) that need to be combined. As is the case with, e.g., video streaming, calls can have widely varying durations, which in and of themselves also affect the QoE of the participating users (who in addition can join or drop out at different times during the call).

Fig. 5 Utilizing SQI for the web QoE example in Section "Fundamental one-dimensional QoE relationship" for $\lambda = 3$. We use the service time as weighting function $w(t_w, t_s) = t_s$ and the MOS mapping function $f(t_w, t_s)$. The optimal server rate is $\mu = 3.58$, leading to $\mathrm{SQI} = 0.57$ and $\mathrm{E}[Q] = 3.80$. The QoE values are shifted to the 5-point scale $[1; 5]$.
In such a context, if we want to see the "call" as our session unit at the service level, some quality aggregation needs to be done already at the session level (since we are in any case considering the session quality as a function of the QoE of the participating users). In this case, we can shift the inter-session variation into the session quality estimation itself (assuming that different weights should be assigned to the QoE values of different participating users), and then simply average session qualities over the service as a whole. If, on the other hand, we wanted to consider the session granularity at the user level, as opposed to the call level, we would need to include some aspects of the call itself (e.g., the duration of the user's participation, the number of users in the call) in the weighting function.
Let us assume that a service provider would like to apply the SQI framework for the purpose of benchmarking their service. We use the following notation: $S$: total number of sessions; $n_s$: number of users in session $s$; $n = \sum_{s=1}^{S} n_s$: total number of users across all sessions; $\mathbf{x}_{s,i}$: a set of objectively measured parameters related to user $i$ in session $s$; $u(\mathbf{x}_{s,i})$: the utility function for user $i$ in session $s$; $m(\mathbf{x}_{s,i})$: the mapped QoE value for user $i$ in session $s$; and $w(\mathbf{x}_{s,i})$: the weight for user $i$ in session $s$. We then calculate the SQI as a weighted average of all utility values summarized across all users and sessions:
$$\mathrm{SQI} = \frac{1}{n} \sum_{s=1}^{S} \sum_{i=1}^{n_s} u(\mathbf{x}_{s,i}) = \frac{1}{n} \sum_{s=1}^{S} \sum_{i=1}^{n_s} w(\mathbf{x}_{s,i})\, m(\mathbf{x}_{s,i}).$$
The weight assigned to a given user QoE value can be derived based on the relevance of a user within a particular session (as compared to other session users, for example, if there is a so-called dominant speaker), based on the relevance of the session itself (as compared to other sessions), or some combination thereof. By now considering each user QoE value individually, we can denote the SQI value as
$$\mathrm{SQI} = \frac{1}{n} \sum_{i=1}^{n} u(\mathbf{x}_i) = \frac{1}{n} \sum_{i=1}^{n} w(\mathbf{x}_i)\, m(\mathbf{x}_i),$$
where $\mathbf{x}_i$ is the set of objectively measured parameters for user $i$, $i = 1, \ldots, n$. It thus once again follows that $\mathrm{SQI} = \mathrm{E}[u(\mathbf{X})]$.
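We note that if all weights are equal, we may set $w(\mathbf{x}) = 1$ for all users in the system. In such a case, if we use the MOS mapping function $f$ as $m$, the SQI equals the expected QoE in the system, $\mathrm{SQI} = \mathrm{E}[f(\mathbf{X})] = \mathrm{E}[Q]$; with the GoB mapping function $g$, it equals the GoB ratio in the system.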
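A minimal Python sketch of the multiparty SQI (the record layout, weights, and QoE values are hypothetical): the per-user form and the double sum over sessions give the same value, and with unit weights the SQI reduces to the mean mapped QoE. The dominant-speaker weights in `call-1` are chosen to average one within that call.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UserSample:
    """One participant of one call (hypothetical record layout)."""
    session_id: str
    qoe: float           # mapped QoE value m(x_{s,i})
    weight: float = 1.0  # weight w(x_{s,i}), e.g. higher for a dominant speaker


def sqi_per_user(users):
    """SQI = (1/n) * sum_i w(x_i) * m(x_i) over all n users."""
    if not users:
        raise ValueError("no users")
    return sum(u.weight * u.qoe for u in users) / len(users)


def sqi_per_session(users):
    """Same value, written as the double sum over sessions s and users i in s."""
    sessions = defaultdict(list)
    for u in users:
        sessions[u.session_id].append(u)
    n = sum(len(members) for members in sessions.values())
    return sum(u.weight * u.qoe for members in sessions.values() for u in members) / n


users = [
    UserSample("call-1", 4.1, 1.5),   # dominant speaker
    UserSample("call-1", 3.6, 0.75),
    UserSample("call-1", 3.9, 0.75),
    UserSample("call-2", 2.4),
    UserSample("call-2", 2.9),
]
print(sqi_per_user(users), sqi_per_session(users))
```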

Discussion and conclusions
Service and network providers rely on QoE models (often in the form of QoS-to-MOS mapping functions) for estimating and/or predicting user-perceived service quality in their systems. A common approach is to use the distribution of MOS scores in the system (as obtained from a QoS-to-MOS mapping function) to draw conclusions with respect to the QoE distribution (or other QoE metrics) of users in the system. These metrics are then further used to drive QoE optimization and management decisions [34][35][36]. Similarly, [37] analyzes MOS distributions, but states that "[...] the ultimate goal is to predict the distribution of user ratings". This will "[...] give operators and service providers a holistic view of service quality." Especially in 5G, a user-centric design is foreseen, which requires system QoE to be considered [38].
In this paper, we draw the attention of the systems community to the fact that the actual QoE distribution in a system is not (necessarily) equal to the MOS distribution in the system. The current systems literature, however, indicates a clear lack of common understanding of the implications of using MOS distributions rather than actual QoE distributions. For example, it is not possible to derive the ratio of users experiencing good or better (GoB) quality in the system by utilizing the MOS mapping function to obtain the MOS distribution; instead, a QoS-to-GoB mapping is required. We provide important insights to raise awareness and foster further research in this area, also targeting the QoE community, and once again highlight the need for reporting QoE metrics and mapping functions beyond just those relying on MOS (e.g., GoB).
The main contribution of this paper is a set of proven fundamental multidimensional QoE relationships, which provide an important link between the QoE community and the systems community. If researchers conducting subjective user studies provide different QoS-to-QoE mapping functions for the QoE metrics of interest (e.g., MOS or GoB), this is sufficient to derive the corresponding QoE metrics from a system's perspective. This holds for any QoS distribution in the system, as long as the corresponding QoS parameters are captured in the QoE models.
In addition, we propose a framework for network/service providers which provides guidelines for taking into account the relative importance of user sessions in a system. We define a QoE-based Service Quality Index (SQI) based on individual user QoE values, which are weighted according to factors that are deemed relevant by a service provider. Such factors may be related to session characteristics (e.g., session duration), resource consumption (e.g., costs), number of users involved in a session, etc. The index considers utility as an extension of quality, and is consistent with the definition of QoE metrics in the system when all QoE values are weighted equally.