Introduction

The question of how theoretical constructs like Health-Related Quality of Life (HRQoL) should be related to observables reflects one of the fundamental scientific issues facing any field: how should we think about the relation between constructs and observables? Two dominant approaches to this question are known as formative and reflective modeling [1, 2]. In formative models (FMs), items are viewed as causes of the theoretical construct under consideration, whereas in reflective measurement models (RMMs), items are seen as effects of that construct. In the present paper, we argue that neither of these approaches suits HRQoL, and present an alternative approach based on a network model (NM).

Some of the analyses performed in HRQoL research have been based on the application of FMs using principal components analysis (PCA), creating weighted composites of observables to achieve data reduction [3]. The 36-item Short Form Health Survey (SF-36), a commonly used instrument across different disease conditions and patient groups [4], has been developed on the basis of PCA. In an FM, HRQoL is the common effect of items (or simply a composite score formed out of them, like in PCA [5]). The idea behind the FM is that observed variables contribute to HRQoL: if the observables change, HRQoL changes as a result. A simplified example of the FM is represented in Fig. 1a where the observables are represented as forming a “mental health” (MH) component, one of the domains of the SF-36.

Fig. 1

Examples of an FM (a), RMM (b) and an NM (c) that can be applied to HRQoL. FM formative model; RMM reflective measurement model; NM network model; MH mental health; NP item 9b of the 36-item Short Form Health Survey (SF-36): “how much of the time during the past 4 weeks have you been a very nervous person”; DC item 9c of the SF-36: “how much of the time during the past 4 weeks have you felt so down in the dumps that nothing could cheer you up”; CP item 9d of the SF-36: “how much of the time during the past 4 weeks have you felt calm and peaceful”; DB item 9f of the SF-36: “how much of the time during the past 4 weeks have you felt downhearted and blue”; HP item 9h of the SF-36: “how much of the time during the past 4 weeks have you been a happy person”. Presumed causal relations between variables are displayed by arrows. Labels on covariances among observed variables and on variances between latent and observed variables are omitted for clarity of presentation

An advantage of the FM is that it allows people with similar levels of HRQoL to have different item responses. For example, John may have a poor HRQoL because he is a very nervous person, whereas Jane may have a poor HRQoL because she feels downhearted and blue. Furthermore, the FM is appropriate when one would like to calculate a single score to represent someone’s HRQoL, which can be used as an index of general functioning. However, the FM also has some downsides. First, the FM is unidentified unless external outcome variables are added to identify its parameters [6]. Since different external variables yield different modeling solutions, the definition of a formative construct cannot be assumed stable across applications (i.e., interpretational confounding; [7]). Second, since the FM does not have implications for the correlation structure between item responses, it cannot evaluate important relations between items that make up HRQoL, nor the processes that give rise to the correlation structure that characterizes it [1]: relations between items are modeled as nuisance, even when they may harbor important information.

An alternative to the FM [e.g., 8, 9] is the RMM. In an RMM, HRQoL is defined as the common determinant of item responses. For example, Fig. 1b shows that the items NP, DC, CP, DB and HP have a common determinant, namely MH. When using an RMM, one has to satisfy the assumption of local independence [e.g., 10, 11], which states that two observed variables become statistically independent once the latent variable is controlled for. So, an RMM implies that a high correlation between responses like “feeling calm and peaceful” and “being a very nervous person” can be explained by the common influence of the latent variable MH. For this reason, the RMM is also called a common cause model [6].

However, it is questionable whether HRQoL should be represented this way. It seems conceptually implausible that having a poor HRQoL results in being downhearted and blue. The reverse is more plausible [12, 13]: feeling downhearted and blue contributes to having a poorer HRQoL. Furthermore, the assumption of local independence may be unrealistic. The correlation between CP and NP is probably not due to the central influence of MH, but more likely results from a direct connection between the two. Thus, although the RMM provides a model for the relations between observables, a feature not found in the FM, it is unlikely to be fully appropriate for HRQoL.

In sum, the FM is useful for constructing a general health index, but it ignores the structure present in item connections. In contrast, the RMM does model relations between observables, but does so on unrealistic assumptions. Neither the FM nor the RMM is therefore able to capture the complexity of the relationship between HRQoL and its observables. In other words, we currently have no satisfactory way of thinking about the relation between HRQoL and its observables. In this study, we argue that the novel perspective offered by the NM can fill this gap.

The NM has been introduced as a psychometric approach that offers an alternative to the RMM and the FM. In an NM, connections between observables are assumed to result from a system in which variables have direct (pairwise) interactions [14]. These interactions can reflect the influence of observables on each other via bidirectional, and potentially causal, relations: i.e., feeling downhearted and blue leads to that person feeling less calm and peaceful, which in turn can lead to that person feeling more downhearted and blue. Alternatively, these interactions may arise because variables are part of the same homeostatic system, or because described relations are conditional. In these cases, variables will become coupled: they show dependencies that will not vanish after conditioning on all other variables. The structure of these relations can be represented and analyzed using an NM.

Figure 1c shows a simplified example of an NM applied to HRQoL, in which DB is connected to CP, which, in turn, is connected to NP. Typically, the absence of a direct relation means that variables will become statistically independent when conditioning on the variables mediating the path between them [15]. In Fig. 1c, NP and DB are independent after conditioning on CP. Importantly, within the NM, HRQoL is neither assumed to be a common effect (as in the FM) nor the common cause of item responses (as in the RMM): the NM offers a third alternative for conceptualizing construct–observation relations. This framework has already been fruitfully applied to intelligence [16], psychopathology [17] and personality research [18].
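The conditional independence implied by the chain in Fig. 1c can be illustrated with a minimal Python sketch. The correlation values are hypothetical and chosen only for illustration; they are not estimates from the SF-36 data. The helper `partial_correlation` is likewise an illustrative name, implementing the standard first-order partial correlation formula for three variables.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after conditioning on a single variable z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical correlations for the chain NP - CP - DB (illustrative values only).
r_np_cp = -0.5  # more nervous, less calm and peaceful
r_cp_db = -0.6  # more calm and peaceful, less downhearted and blue
# In a pure Gaussian chain, the two end nodes correlate only through the middle node,
# so their marginal correlation equals the product of the two adjacent correlations.
r_np_db = r_np_cp * r_cp_db

# No direct NP - DB edge: the association vanishes once CP is conditioned on.
np_db_given_cp = partial_correlation(r_np_db, r_np_cp, r_cp_db)
# The direct NP - CP edge remains after conditioning on DB.
np_cp_given_db = partial_correlation(r_np_cp, r_np_db, r_cp_db)

print(round(r_np_db, 3), round(np_db_given_cp, 3), round(np_cp_given_db, 3))
```

Under these assumed values, NP and DB are marginally correlated yet conditionally independent given CP, whereas the NP and CP association survives conditioning on DB.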

Importantly, instead of a causal relation, an NM assumes that the relation between individual item responses and the construct HRQoL is mereological; individual components are part of the construct, because the construct is understood as a network of mutually interacting variables that together form HRQoL [19]. As such, direct connections between item responses are not only accommodated in an NM, but form the flesh and bones of the network structure. Not only connections between item responses, but connections between health domains can be examined as well, as the SF-36 consists of eight domains, which form a profile of a person’s health status [4]. In this study, we aim to demonstrate that the NM can be successfully applied to HRQoL research and show that it provides a fruitful alternative to an RMM or FM: the NM provides novel ways of representing and analyzing connections present among items or domains, which suggest new avenues for research and may inform treatment interventions.

In addition to providing a novel approach for operationalizing HRQoL, the NM allows researchers to ask new questions about item structures in relation to the construct. In this study, we investigate four such questions. First, we examine how HRQoL is structured in terms of its network architecture, by constructing networks of two Dutch samples of healthy and non-healthy individuals. Second, we exploratively examine the structure of HRQoL at the domain level by constructing domain networks for each sample. Third, we investigate which items are most central to HRQoL by using network metrics of centrality. Fourth, we test whether the network structures of the healthy and cancer populations differ significantly.

Method

Data source

This study involves a secondary analysis of data that were originally gathered for the International Quality of Life Assessment Project (IQOLA) and have been described in detail in a previous paper [20]. The data involved were unidentifiable; they could not be traced back to the individual. Therefore, we did not require informed consent. The present study focuses on two subsamples of the IQOLA project: Dutch cancer patients (cancer patient sample) and a Dutch nationwide sample of adults who were not diagnosed with cancer (national sample). Participants completed the SF-36 between 1992 and 1994 (cancer patient sample) or in 1996 (national sample). In addition, we combined the two datasets (combined sample) to analyze the function of diagnostic status (i.e., the distinction between being diagnosed with cancer or not) in the HRQoL system. To this end, we added diagnostic status as a separate variable in the network structure.

SF-36

The SF-36 [21, 22] is an HRQoL questionnaire that has been adapted and translated into more than ten languages over the past few decades, and has also been validated in various patient groups and languages [23–25]. The SF-36 is based on an FM and comprises the following eight first-order latent variables (domains): physical functioning (PF), role limitations–physical (RP), bodily pain (BP), general health (GH), vitality (VT), social functioning (SF), role limitations–emotional (RE) and mental health (MH). These domains are themselves modeled as the cause of two second-order latent variables [26], which are represented by physical and mental component summary scores (PCS and MCS, respectively). Domain scores were calculated by summing up item responses, after which the scores were transformed to range between 0 and 100. PCS and MCS scores were calculated using standard US scoring algorithms [27]. Item allocation to the eight domains identified by the SF-36 can be found in Table 1.
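The rescaling of a summed domain score to the 0–100 range can be sketched in Python as a linear transformation. This is a simplified illustration rather than the official scoring algorithm: the function name and parameters are hypothetical, all items in a domain are assumed to share one response range, and any recoding of reverse-keyed items is assumed to have been done beforehand.

```python
def transform_domain_score(raw_sum, n_items, min_per_item, max_per_item):
    """Linearly rescale a summed domain score to the 0-100 range.

    Assumes every item in the domain uses the same response range and that
    item recoding (if any) has already been applied.
    """
    lowest = n_items * min_per_item    # lowest possible raw sum
    highest = n_items * max_per_item   # highest possible raw sum
    return 100.0 * (raw_sum - lowest) / (highest - lowest)

# Example: a five-item domain with responses scored 1 to 6 and a raw sum of 20.
print(transform_domain_score(20, n_items=5, min_per_item=1, max_per_item=6))  # 60.0
```

The lowest possible raw sum maps to 0 and the highest to 100, so scores from domains with different numbers of items become comparable.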

Table 1 Allocation of the 36-item Short Form Health Survey items to the eight domains of Health-Related Quality of Life

Network analysis

An NM conceptualizes HRQoL as a network of mutually interacting characteristics [28]. NMs consist of two elements: nodes (circles; observed variables) and edges (lines; relations between variables [29]). To obtain a network, we estimated a Gaussian Graphical Model (GGM) [30] for all samples on both the item level and the domain level, a network in which an edge indicates a nonzero partial correlation between two nodes, while controlling for all other nodes in the network. This means that two connected nodes display a level of covariation that cannot be explained by other nodes in the network. To control for spurious connections that may arise due to multiple testing, and for the computational size of the problem, we applied the graphical lasso [31]: a form of lasso regularization [32], which utilizes penalized maximum likelihood estimation. The result is a sparse GGM in which many edge weights are set to zero and thus removed from the network. The network that is formed with a graphical lasso is therefore both interpretable and guarded against overfitting. The graphical lasso uses a tuning parameter to control the sparsity of the network, which we chose by minimizing the Extended Bayesian Information Criterion [EBIC; 33]. This methodology is explained in more detail by Costantini et al. [34]. Because the GGM assumes that the input covariance matrix comes from a population that follows a multivariate Gaussian density, whereas the SF-36 only measures at an ordinal scale, we computed the polychoric correlation matrix to apply the graphical lasso.
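The role of the EBIC in choosing the tuning parameter can be sketched in Python as follows. The sketch covers only the selection step: it does not fit a graphical lasso. The candidate log-likelihoods and edge counts are invented for illustration, and the hyperparameter value gamma = 0.5 is an assumption, since the text does not report the value used. The sample size and number of nodes are set to 485 and 36 only to give the penalty a realistic scale.

```python
import math

def ebic(log_likelihood, n_edges, n_obs, n_nodes, gamma=0.5):
    """Extended BIC for a Gaussian graphical model (lower is better).

    gamma = 0.5 is an assumed value, not one reported in the text.
    """
    return (-2.0 * log_likelihood
            + n_edges * math.log(n_obs)
            + 4.0 * gamma * n_edges * math.log(n_nodes))

# Hypothetical candidates: (tuning parameter, log-likelihood, number of nonzero edges).
# A larger tuning parameter yields a sparser network with a lower log-likelihood.
candidates = [
    (0.30, -5400.0, 40),
    (0.15, -5050.0, 75),
    (0.05, -5010.0, 160),
]
n_obs, n_nodes = 485, 36

scores = {lam: ebic(ll, k, n_obs, n_nodes) for lam, ll, k in candidates}
best_lambda = min(scores, key=scores.get)
print(best_lambda)
```

With these made-up numbers, the middle candidate is selected: the sparsest network loses too much fit, and the densest network pays too large a penalty for its additional edges.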

Network comparison

We checked for differences in network structures by means of a permutation test developed by van Borkulo et al. [35]. The difference is defined as the deviation in absolute weighted sum scores of the connections [36]. This permutation-based test repeatedly (1000 times) regrouped participants from the cancer patient sample and the national sample at random and calculated the differences between the resulting subsamples. The resulting distribution under the null hypothesis (both subsamples are equal) was used to test the observed difference of the original subsamples against a significance level of 0.05. Both weighted network structures (taking the edges’ weights into account) and unweighted network structures (only taking the presence of an edge into account) were tested. The unweighted comparison investigates whether the basic structure of the networks is similar across samples; the weighted comparison investigates whether the strength of individual connections is similar.
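The logic of such a permutation test can be sketched in Python. This is a simplified illustration under stated assumptions, not the procedure of van Borkulo et al.: edges are plain Pearson correlations rather than regularized partial correlations, the data are simulated, and all function names are hypothetical.

```python
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def global_strength(rows):
    """Sum of absolute edge weights; edges are plain correlations here (an assumption)."""
    columns = list(zip(*rows))
    return sum(abs(pearson(columns[i], columns[j]))
               for i, j in combinations(range(len(columns)), 2))

def permutation_test(group_a, group_b, n_perm=1000, seed=1):
    """Return the observed strength difference and its permutation p value."""
    rng = random.Random(seed)
    observed = abs(global_strength(group_a) - global_strength(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign participants to the two groups
        diff = abs(global_strength(pooled[:n_a]) - global_strength(pooled[n_a:]))
        if diff >= observed:
            exceed += 1
    return observed, exceed / n_perm

def simulate(n, coupling, rng):
    """Simulated three-variable chain; coupling = 0 gives unrelated variables."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = coupling * x + rng.gauss(0, 1)
        z = coupling * y + rng.gauss(0, 1)
        rows.append((x, y, z))
    return rows

data_rng = random.Random(0)
group_a = simulate(80, 1.5, data_rng)  # strongly connected variables
group_b = simulate(80, 0.0, data_rng)  # unconnected variables
observed, p_value = permutation_test(group_a, group_b)
print(round(observed, 2), p_value)
```

Because group membership is exchangeable under the null hypothesis, the proportion of permuted differences at least as large as the observed one serves as the p value.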

Centrality analysis

To analyze the place and function of items within individual networks, we used the measure of closeness centrality [37]. Edges between nodes are interpreted as paths: the stronger the edge, the stronger the path between relevant nodes, and the easier it is to travel from one node to another [34]. A highly central node is one from which it is possible to easily travel to all other nodes. Such paths may be interpreted as etiological progressions by which individual problems can lead to closely connected problems. In particular, nodes with high closeness centrality have a high ability to predict other nodes, and as such they may correspond to characteristics that have a particularly important function in HRQoL. Adopting the formative interpretation of HRQoL, when an item has a high closeness, it predicts HRQoL well.
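Closeness centrality in a weighted network can be sketched in Python as follows. The sketch assumes a common convention: the length of an edge is the inverse of its absolute weight, and closeness is the inverse of the summed shortest-path distances to all other nodes. The function name and the edge weights are hypothetical.

```python
def closeness_centrality(weights):
    """Closeness per node; weights maps node -> {neighbor: edge weight} (symmetric)."""
    nodes = list(weights)
    inf = float("inf")
    # Direct distances: stronger edges are shorter (length = 1 / |weight|).
    dist = {a: {b: (0.0 if a == b else inf) for b in nodes} for a in nodes}
    for a in nodes:
        for b, w in weights[a].items():
            if w != 0:
                dist[a][b] = 1.0 / abs(w)
    # Floyd-Warshall: shortest path lengths between all pairs of nodes.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    # Closeness: inverse of the total distance to every other node.
    return {a: 1.0 / sum(dist[a][b] for b in nodes if b != a) for a in nodes}

# Hypothetical weights for the chain NP - CP - DB (illustrative values only).
network = {
    "NP": {"CP": -0.4},
    "CP": {"NP": -0.4, "DB": -0.5},
    "DB": {"CP": -0.5},
}
closeness = closeness_centrality(network)
print(max(closeness, key=closeness.get))  # CP
```

In this toy chain the middle node CP is most central, because every other node can be reached from it over a single strong edge.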

All analyses were performed using the R statistical software 3.1.2. GGMs were constructed with the R-package huge version 1.2.6 [38]. Network visualization and the computation of centrality measures were done with the R-package qgraph version 1.3.1 [39].

Results

Participant characteristics

A total of 2227 participants completed the SF-36. As shown in Table 2, the mean age of the cancer patient sample (N = 485) was 57.27 years with 58 % women. The national sample (N = 1742) had a mean age of 46.71 years with 44 % women. Table 2 also shows the mean scores on the eight domains of the SF-36 for the individual samples.

Table 2 Means (SD) of the cancer patient sample and the national sample

Network analysis

Figure 2a–c show networks of the cancer patient sample, the national sample and the combined sample, respectively. Edges between nodes within a network correspond to polychoric partial correlations between items, controlling for all other items. The stronger a connection between two nodes, the thicker and more saturated the edge. Positive and negative connections are denoted by green and red edges, respectively [34]. Each node corresponds to a single SF-36 item (as given in Table 3) and is colored according to the domain it is allocated to (as given in Table 1). Item 2 and the diagnostic status variable are not part of any domain and thus are represented as separate. The Fruchterman–Reingold algorithm, which places more strongly connected nodes closer together, is used for node placement in all networks [40].

Fig. 2

Network of Health-Related Quality of Life (HRQoL) as measured by the 36-item Short Form Health Survey in a cancer patient sample (a), a national sample (b) and a pooled sample of the former two (c). The size of the absolute polychoric partial correlation between two nodes is represented using the color and thickness of an edge [37]. Node colors correspond to the eight domains: RED general health (GH), YELLOW physical functioning (PF), ORANGE mental health (MH), BLUE role limitations–physical (RP), GREEN role limitations–emotional (RE), PURPLE bodily pain (BP), GREY social functioning (SF), PINK vitality (VT), BROWN items not belonging to a domain

Table 3 Items of the 36-item Short Form Health Survey and their assigned labels

As seen in Fig. 2a–c the global structure of each network reflects the domains set up by Ware et al. [41]; items that belong to the same domain are closely connected and cluster into predetermined domains.

These results are consistent with the idea that the covariance between items may result largely from direct interactions between observables, rather than from the common influence of a latent HRQoL variable. For instance, items 5a and 5b are strongly connected, which likely reflects a causal relation, because being able to spend less time on work will typically lead one to accomplish less. Another example is the strong connection between items 7 and 8, which is visible within all networks. However, there are also connections that are more likely to reflect bidirectional influences. An example is the relation between items 9c and 9f, where feeling down in the dumps and feeling downhearted and blue influence each other. Finally, some strong connections arise because items formulate necessary conditions for other items (exemplifying potential deterministic causal relations). For instance, there exists a strong correlation between items 3g and 3h. This correlation plausibly arises because walking 1 km requires the ability to walk a few hundred meters, such that the latter is a necessary condition for the former.

Figure 3a–c shows networks of the eight domains present in the SF-36 for the cancer patient sample, the national sample and the combined sample. It can be seen that domains whose items are near each other in Fig. 2 are strongly connected. For example, the domains mental health (MH) and vitality (VT) have a strong connection in all networks. Interestingly, there exists a strong connection between bodily pain (BP) and social functioning (SF), while this is not visible in Fig. 2, where item networks are displayed.

Fig. 3

Network of Health-Related Quality of Life (HRQoL) as measured by the domains of the 36-item Short Form Health Survey in a cancer patient sample (a), a national sample (b) and a pooled sample of the former two (c). The size of the absolute partial correlation between two nodes is represented using the color and thickness of an edge [37]. Node colors correspond to the eight domains: RED general health (GH), YELLOW physical functioning (PF), ORANGE mental health (MH), BLUE role limitations–physical (RP), GREEN role limitations–emotional (RE), PURPLE bodily pain (BP), GREY social functioning (SF), PINK vitality (VT), BROWN diagnostic status (DS)

Network comparison

We compared the item network structures from both the cancer patient sample and the national sample. We found that these two network structures are dissimilar (p < .001) when comparing weighted network structures, but we did not find dissimilarity when comparing the unweighted network structures (p = .056). Although care must be taken in interpreting null results in hypothesis testing, this suggests that the basic structure of the SF-36 in the cancer patient sample resembles the structure found in the national sample. However, it should be noted that this does not rule out the existence of local differences in the network structure, as statistical power to detect local differences is limited.

The same analysis was performed for the domain networks. We could not reject the null hypothesis that the network structure is invariant over subpopulations, either when comparing the weighted network structures (p = .16) or when comparing the unweighted network structures (p = .18). These results indicate that the domain network structure generalizes quite well to different subpopulations.

Centrality analysis

Figure 4 and Table 4 display the closeness centrality measures for all item networks. The three networks mostly agree on which items are most central. As seen in Table 4, items 4b, 4c, 4d, 9g and 9i were most central for the cancer patient sample, items 4a, 4b, 4c, 4d and 5b were most central for the national sample and items 4a, 4b, 4c, 4d and the diagnostic status variable were most central for the combined sample. Items 4b, 4c and 4d are thus among the most central items in all networks. This suggests that, in all datasets, the ability to keep participating in work or other activities despite one’s physical health has the largest influence on other characteristics in the network. The networks align less with respect to the least central items. Items 2, 3d, 3e, 3f, and 7 were least central for the cancer patient sample, items 2, 7, 8, 9b and 11c were least central for the national sample and items 3e, 3g, 3h, 3i and 9b were least central for the combined sample. There were no items that were among the least central items in all networks, but items 2 and 7 were among the least central items in both the cancer patient sample and the national sample. Remarkably, this suggests that one’s perception of one’s health compared to 1 year ago and the amount of bodily pain during the past 4 weeks hardly influence other characteristics in the cancer patient sample network and the national sample network.

Fig. 4

Visual representation of the predictive quality of individual Health-Related Quality of Life characteristics of the 36-item Short Form Health Survey in the network structures using the closeness centrality measure. DS diagnostic status

Table 4 Closeness centrality measure for every network to express the predictive quality of individual Health-Related Quality of Life characteristics in the network structure per sample in the 36-item Short Form Health Survey

Figure 5 and Table 5 display the closeness centrality measures for all domain networks. When inspecting closeness centrality, we find that the three networks disagree on which domains are most central. As found in Table 5, the domains general health (GH) and physical functioning (PF) are most central in the cancer patient sample, the domains role limitations–physical (RP) and role limitations–emotional (RE) in the national sample, and the domains RP and PF in the combined sample. The cancer patient sample and the national sample do not share any domain that is among the most central domains in the networks. The domain PF is among the most central domains in both the cancer patient sample and the combined sample, and the domain RP is among the most central domains in the national sample and the combined sample. This suggests that physical health and possible limitations as a result of one’s physical health have the largest influence on other domains in the networks.

Fig. 5

Visual representation of the predictive quality of domains of the 36-item Short Form Health Survey in the network structures using the closeness centrality measures. BP bodily pain, GH general health, MH mental health, PF physical functioning, RE role limitations–emotional, RP role limitations–physical, SF social functioning, VT vitality, DS diagnostic status

Table 5 Closeness centrality measure for every network to express the predictive quality of domains of the 36-item Short Form Health Survey in the network structure per sample

The same non-alignment is found in the least central domains. Domains SF and BP are considered the least central domains in the cancer patient sample and the national sample, and the domain MH and the diagnostic status variable (DS) were the least central domains in the combined sample. It can be seen that the cancer patient sample and the national sample regard the same domains as least central. This suggests that the interaction between social functioning and bodily pain on the one hand, and the rest of the domains on the other hand, is less strong compared to other interactions in the network. This may mean that they have the least influence on other domains in the network, that they are least sensitive to changes in other domains or that the variance in these domains is largely determined by factors outside of the network structure.

Discussion

The present study demonstrated a new approach for modeling HRQoL, in which HRQoL emerges from a network of mutually interacting characteristics. We provided the first estimated network structure for HRQoL, by determining the GGM for the SF-36. In this network, every pairwise interaction is evaluated while controlling for all other variables in the network, after which the network is regularized by a lasso penalization. Edges that survive the resulting process of culling thus are likely to have a causal background. Importantly, the present analysis does not determine what the nature or direction of that causal background is. For example, some relationships between items may reflect potential causal effects, while others may be potential bidirectional relationships or potential conditional relationships that exemplify nearly deterministic relations. In addition, some items may hang together because they depend on one or a set of unmodeled latent variables.

In case unmodeled latent variables affect multiple items simultaneously, this will generate a fully connected subnetwork or clique in the network [42]. However, in our view it is extremely unlikely that all connections result from a common latent structure (as the RMM assumes). Nevertheless, it would be worthwhile to develop analytical techniques that can combine latent variable analysis with network modeling. For now, the NM models relevant associations in a manner that is statistically efficient and may also be more justifiable than the RMM, as it does not force a particular causal model on the data. Given our limited understanding of constructs like HRQoL, further exploration of network analysis as a tool in HRQoL research is therefore warranted.

As research advances, and the field improves its understanding of the causal relations that underlie the network structure of HRQoL, NMs can identify the most important nodes in the structure, as we have shown. By crossing the effect of variables on other variables in the network with the cost and availability of interventions directed at these variables, NMs may inform treatment decisions. Centrality analyses showed that the ability to perform work or other activities and accomplish things despite one’s physical health was most central in all network structures. This is indeed a plausible conclusion, given the importance of maintaining a daily routine in people’s lives. Furthermore, results showed that physical health and possible limitations as a result of one’s physical health are central domains in the domain network. In terms of treatment, these findings suggest that it is important to have access to direct resources that allow people to keep their daily routine and perform work or other activities as usual, as doing so may stop vicious circles from within the network structure. Thus, in the future, NMs may bridge the gap between research and treatment practice by providing specific guidance on treatment interventions.

Moreover, we found similar unweighted item network structures for both samples, as well as domain networks that were equivalent in both the weighted and the unweighted comparison. Even though the domains’ network structure is similar, network structures on item level may differ over distinct groups (e.g., age, depression). Investigating group differences may offer important inroads to understanding differential treatment effects or group differences in vulnerability. Future research may focus on relating the network structure extracted in the current study with networks that characterize other subpopulations.

In conclusion, this study supports the further exploration of NMs as a tool in HRQoL research and highlights the need for more research into comparison and confirmatory methods for network modeling, as this would help to compare networks across subpopulations and to generalize network structures to larger populations. In addition, NMs may be coupled to the analysis of treatment interventions. Thus, we propose that investigating the network structure of HRQoL will allow research to advance by taking advantage of the many possibilities that NMs have to offer.