
1 Introduction

Valid and reliable measurement instruments are vital for human factors in privacy research [23]. Validity means that an instrument measures what it purports to measure. Reliability means that the instrument measures this consistently.

In this chapter, we focus on the validity and reliability of privacy concern scales. While there is a range of privacy concern and behavior measurement instruments available [8, 10, 12, 22, 33, 41, 45, 50], also discussed in studies on the privacy paradox [14, 26], we will focus on the scale Internet Users’ Information Privacy Concerns (IUIPC) [33]. IUIPC has roots in the earlier scale Concerns for Information Privacy (CFIP) [45], itself a popular scale measuring organizational information privacy concern and validated in independent studies [18, 46].

IUIPC has been appraised by researchers as part of other studies [36, 43] and has undergone an independent empirical evaluation of the scale itself [16] and of the applicability of its full nomology in other cultures [39]. Even though the scale was originally created in a diligent, evolutionary fashion and founded on a sound underpinning for its content validity, its construct validity and internal consistency reliability were not always found up to par for the purpose of human factors in privacy research.

In this chapter, we will discuss a brief form of the Internet Users’ Information Privacy Concern scale (IUIPC) [33] as a running example. The brief form, called IUIPC-8, only uses eight of the original ten items and was determined to yield stronger construct validity and internal consistency reliability [16].

Our aim for this chapter is not only to present the IUIPC-8 scale itself, but also to shed light on methods for the evaluation of valid and reliable measurement instruments. To that end, we will employ confirmatory factor analysis (CFA) as the tool of choice. We will use CFA to model the ordinal non-normal data of the questionnaire, to confirm the three-dimensionality of IUIPC-8, to establish global and local fit, and finally to estimate construct validity and internal consistency reliability metrics.

Chapter Overview

We begin this chapter with an overview of information privacy concern and predominant measurement instruments in Sect. 2. We give a brief introduction of validity and reliability notions in Sect. 3 and lay the foundations for the use of confirmatory factor analysis as tool to evaluate measurement-instrument properties in Sect. 4. We discuss the abstract approach used for the evaluation of IUIPC-8 in Sect. 5 and include the empirical results in the validation of the instrument in Sect. 6. Section 7 highlights aspects of the scale properties and considerations for its use in practice in a general discussion. Definitions used throughout the chapter are summarized in the definition box below.

Definitions

  • Validity: Capacity of an instrument to measure what it purports to measure [6, 35].

  • Reliability: Extent to which a variable is consistent in what is being measured [17, p. 123].

  • Construct Validity: Whether the measure accurately reflects the construct intended to measure [23, 35].

  • Factorial Validity: Factor composition and dimensionality are sound.

  • Confirmatory Factor Analysis: Factor analysis in a restricted measurement model: Each indicator is to depend only on the factors specified [25, p. 191].

  • Nested Model: A model that can be derived from another by restricting free parameters.

  • Accept-support test: A statistical inference, in which the acceptance of the null hypothesis supports the model, e.g., the close-fit test [25, p. 265].

  • Reject-support test: A statistical inference, in which the rejection of the null hypothesis supports the model, e.g., the not-close-fit test [25, p. 265].

  • Fit Statistic: A summary measure of the average discrepancy between the sample and model covariances.

  • Goodness of fit χ2: Measures the exact fit of a model and gives rise to the accept-support exact-fit test against null hypothesis \(H_{\chi ^2, 0}\).

  • RMSEA: Root Mean Square Error of Approximation, an absolute badness-of-fit measure estimated as \(\hat{\varepsilon}\) with its 90% confidence interval, yielding a range of fit tests: close fit, not-close fit, and poor fit [25, pp. 274].

  • Bentler Comparative Fit Index (CFI): An incremental fit index based on the non-centrality measure, comparing the selected model against the null model.

  • Standardized Root Mean Square Residual (SRMR): A standardized version of the mean absolute covariance residual, for which zero indicates excellent fit.

  • Standardized Factor Loading β: The factor loading of an indicator when both indicator and factor are standardized (z-transformed).

  • Variance Extracted R2: The proportion of an indicator’s variance accounted for by its factor, computed as the squared standardized loading β2.

  • Average Variance Extracted (AVE): The average of the squared standardized loadings β2 of indicators belonging to the same factor [25, pp. 313].

  • Heterotrait–Monotrait (HTMT) Ratio: A metric of discriminant validity, the ratio of the avg. correlations of indicators across constructs measuring different phenomena to the avg. correlations of indicators within the same construct [20].

  • Cronbach’s α: Internal consistency based on the average inter-item correlations.

  • Congeneric Reliability ω: The amount of general factor saturation (also called composite reliability [25, pp. 313] or construct reliability (CR) [17, p. 676] depending on the source).
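For reference, these coefficients admit standard closed forms. For a factor measured by k standardized items with standardized loadings βi, error variances θi, item score variances σi2, and total score variance σX2 (assuming uncorrelated errors):

\[
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right), \qquad
\omega = \frac{\left(\sum_{i=1}^{k}\beta_i\right)^2}{\left(\sum_{i=1}^{k}\beta_i\right)^2 + \sum_{i=1}^{k}\theta_i}, \qquad
\mathit{AVE} = \frac{1}{k}\sum_{i=1}^{k}\beta_i^2
\]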

2 Information Privacy Concern

2.1 What Is Information Privacy Concern?

Malhotra et al. [33, p. 337] ground their understanding of information privacy concern in Westin’s definition of information privacy: “the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others.” They define information privacy concern as “an individual’s subjective views of fairness within the context of information privacy.”

This framing of information privacy concern resonates with the interdisciplinary review of privacy studies by Smith et al. [44]. Therein, privacy concern is shown as the central antecedent of related behavior in the privacy macro-model. At the same time, the causal impact of privacy concern on behavior has been under considerable scrutiny. The observed phenomenon, the privacy attitude–behavior dichotomy, is commonly called the privacy paradox [14]. Investigating the privacy paradox has been a mainstay topic of the human aspects of privacy community. This investigation calls for instruments to measure information privacy concern accurately and reliably because measurement errors and correlation attenuation of invalid or unreliable privacy concern instruments could confound the research on the privacy paradox.

2.2 Information Privacy Concern Instruments

Information privacy concern can be measured by a range of related and distinct instruments [8, 10, 12, 33, 45, 50]. As a comprehensive comparison would be beyond the scope of this chapter, we refer to Preibusch’s guide to measuring privacy concern [41] for an overview of the field. We will consider the scales most closely related to IUIPC. Table 1 offers a brief overview of these instruments and their dimensions. IUIPC is one of the most used Internet privacy concern scales, and its dimensions influenced further scales, such as Hong and Thong’s Internet Privacy Concern (IPC) [22]. Still, it remains a relatively concise scale.

Table 1 Overview of selected privacy concern instruments

The first port of call is the scale Concern for Information Privacy (CFIP) [45], a major influence on the development of IUIPC. CFIP consists of four dimensions—Collection, Unauthorized Secondary Use, Improper Access, and Errors. While both questionnaires share questions, CFIP focuses on individuals’ concerns about organizational privacy practices and the organization’s responsibilities. CFIP received independent empirical confirmations of its factor structure, by Stewart and Segars [46] and by Harborth and Pape [18] for its German translation.

The scale Internet Users’ Information Privacy Concern (IUIPC) was developed by Malhotra et al. [33], predominantly by adapting questions of the earlier 15-item scale Concern for Information Privacy (CFIP) by Smith et al. [45] and by framing the questionnaire for Internet users as consumers. IUIPC measures their perception of fairness and justice in the context of information privacy and online companies. IUIPC-10 was established as a second-order reflective scale of information privacy concern, with the dimensions Control, Awareness, and Collection. The authors considered the “act of collection, whether it is legal or illegal,” as the starting point of information privacy concerns. The sub-scale Control is founded on the conviction that “individuals view procedures as fair when they are vested with control of the procedures.” The authors considered being “informed about data collection and other issues” as the central concept of the sub-scale Awareness.

Initial appraisals of IUIPC-10 [36, 43] yielded concerns for the validity and reliability of the scale largely tied to two items on awareness and control. These validity and reliability problems were confirmed in an independent empirical evaluation of the scale [16]. Pape et al. [39] independently evaluated the full nomology of IUIPC-10 in Japan.

Internet Privacy Concerns (IPC) [12] considered Internet privacy concerns with the antecedents of perceived vulnerability and control, familiar from Protection Motivation Theory (PMT). IPC differs from IUIPC in its focus on misuse rather than just collection of information and in its attention to concerns of surveillance. In terms of the core scale of privacy concern, Dinev and Hart identified two factors:

  (i) Abuse (concern about misuse of information submitted on the Internet)

  (ii) Finding (concern about being observed and specific private information being found out)

The IPC scale was subsequently expanded and integrated with other scales by Hong and Thong [22] and further investigated with respect to four driver and inhibitor dimensions by Hong et al. [21]. Herein, Hong and Thong reformulated questions to express concern more consistently.

Buchanan et al.’s Online Privacy Concern and Protection for Use on the Internet (OPC) [10] measure comprises three sub-scales—General Caution, Technical Protection (both on behaviors), and Privacy Attitude. Compared to IUIPC, OPC has a strong focus on item stems eliciting being concerned and on items covering a range of concrete privacy risks.

3 Validity and Reliability

When evaluating a privacy concern instrument such as IUIPC-8, two vital questions for research in human factors of privacy and the privacy paradox are:

  (i) Are we measuring the hidden latent construct privacy concern accurately? (validity)

  (ii) Are we measuring privacy concern consistently and with an adequate signal-to-noise ratio? (reliability)

Without sufficient reliability, a measurement instrument cannot be valid [23].

Validity refers to whether an instrument measures what it purports to measure. Messick offered an early, well-regarded definition of validity as the “integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores” [35]. Validity is inferred—judged in degrees—not measured. In this chapter, we focus on the validation procedure and the underlying evidence for validity and reliability. In that, we largely take the content validity of IUIPC for granted. Content validity refers to the relevance and representativeness of the content of the instrument, typically assessed by expert judgment.

3.1 Construct Validity

Messick [34] defines construct validity [11], the interpretive meaningfulness of an instrument, as the extent to which it accurately represents a construct. This definition has also been used in more recent work on measurement [23] as a primary kind of validity. Construct validity is typically established by evaluating the instrument through multiple lenses; here, we will consider factorial, convergent, and discriminant validity.

Factorial Validity

First, we seek evidence of factorial validity, that is, evidence that the factor composition and dimensionality are sound. While IUIPC is a multidimensional scale with three correlated designated dimensions, we require unidimensionality of each sub-scale, a requirement discussed at length by Gerbing and Anderson [15].

Unidimensional measurement models for sub-scales correspond to expecting congeneric measures: the score on an item is the expression of a true score weighted by the item’s loading, plus some measurement error. In the congeneric case, neither the loadings nor the error variances are required to be equal across items. This property entails that the items of each sub-scale must be conceptually homogeneous.
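In equation form, a congeneric measurement model for the items \(x_i\) of a single factor (true score) \(\xi\) reads

\[
x_i = \lambda_i \xi + \delta_i, \qquad i = 1, \dots, k,
\]

where neither the loadings \(\lambda_i\) nor the error variances \(\mathrm{Var}(\delta_i)\) need to be equal across items; tau-equivalent and parallel models arise as the special cases of equal loadings, and of equal loadings and equal error variances, respectively.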

We find empirical evidence for factorial validity of a scale’s measurement model in the closeness of fit to the sample’s covariance structure. Specifically, we gain supporting evidence by passing fit hypotheses of a confirmatory factor analysis for the designated factor structure [3, 15, 25], where we prioritize fit metrics and hypotheses based on the RMSEA included in Table 2.

Table 2 Exact and approximate fit hypotheses

Convergent and Discriminant Validity

Convergent validity [17, pp. 675] (convergent coherence) on an item–construct level means that items belonging together, that is, to the same construct, should be observed as related to each other. Similarly, discriminant validity [17, pp. 676] (discriminant distinctiveness) means that items not belonging together, that is, not belonging to the same construct, should be observed as not related to each other. On a sub-scale level, we expect factors of the same higher-order construct to relate to each other, and at the hierarchical factor level, we expect all 1st-order factors to load strongly on the 2nd-order factor.

In the first instance, a poor local fit and tell-tale residual patterns yield disconfirming evidence for convergent and discriminant validity. We can further inspect inter-item correlation matrices: we expect items belonging to the same sub-scale to be highly correlated and, thereby, to converge on the same construct. Correlation to items of other sub-scales should be low, especially lower than the in-construct correlations [25, pp. 196].

These principles give rise to criteria based on the average variance extracted (AVE), the Fornell–Larcker criterion [13], and the Heterotrait–Monotrait Ratio (HTMT) [1, 20]. We summarize these terms in the definition box of this chapter.

3.2 Reliability

Reliability is the extent to which a variable is consistent in what is being measured [17, p. 123]. It can further be understood as the capacity of “separating signal from noise” [23, 42, p. 709], quantified by the ratio of true score to observed score variance [25, pp. 90]. We evaluate internal consistency as a means to estimate reliability from a single test application. Internal consistency entails that items that purport to measure the same construct produce similar scores [25, p. 91]. We will use the internal consistency measures Cronbach’s α, congeneric reliability ω, and AVE, defined in the definition box of this chapter. While Cronbach’s α is well-known in the community, average variance extracted (AVE) offers a simple intuitive measure, and congeneric reliability provides a robust approach.

Thresholds for reliability estimates such as Cronbach’s α or composite reliability ω are debated in the field, where many recommendations are based on Nunnally’s original treatment of the subject, but equally often misstate it [28]. The often-quoted α ≥ 0.70 was described by Nunnally only to “save time and energy,” whereas the greater threshold of 0.80 was endorsed for basic research [28].

When we designate a priori thresholds as criteria for internal consistency reliability, this approach needs to be put into a broader context. As for validity, reliability is judged in degrees. John and Benet-Martínez [23] discuss the arbitrariness of one-size-fits-all fixed reliability thresholds. Internal consistency reliability needs to be considered in relation to inter-item correlations and the length of a scale and, further, how these aspects fit the nature of the construct in question. Ultimately, the choice of thresholds gives rise to a bandwidth–fidelity trade-off [23]. Whether we call an instrument “reliable enough” depends on the proportion of error variance we are willing to tolerate and on the attenuation of the correlation to other variables as a consequence of that.

4 Factor Analysis as Tool to Establish Measurement Instruments

Factor analysis is a powerful tool for evaluating the construct validity and reliability of privacy concern instruments. Thereby, it constitutes a validation procedure for measurement instruments [23]. Factor analysis refers to a set of statistical methods that are meant to determine the number and nature of latent variables (LVs) or factors that account for the variation and covariation among a set of observed measures, commonly referred to as indicators [9].

Confirmatory factor analysis (CFA) is a factor analysis in a restricted measurement model, that is, one in which each indicator depends only on the factors specified [25, pp. 191]. CFA is commonly used to evaluate psychometric instruments. It is based on the common factor model (CFM), which holds that each indicator variable contributes to the variance of one or more common factors and one unique factor. Thereby, common variance of related observed measures is attributed to the corresponding latent factor, and unique variance (uniqueness) is seen either as variance associated with the item or as error variance. We call the proportion of variance associated with factors communality and the proportion of variance not associated with factors uniqueness. We depict the common factor model in Fig. 1.

Fig. 1
Variance attribution in the common factor model (CFM): the total variance divides into common variance (communality) and unique variance (uniqueness), where the unique variance comprises specific variance and error variance
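For a standardized indicator \(x_i\) with standardized loading \(\lambda_i\) on a single common factor, this variance attribution reads

\[
\mathrm{Var}(x_i) = \underbrace{\lambda_i^2}_{\text{communality}} + \underbrace{1 - \lambda_i^2}_{\text{uniqueness}} = 1.
\]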

IUIPC is based on a reflective measurement, that is, the observed measure of an indicator variable is seen as caused by some latent factor. Indicators are thereby endogenous variables, and latent variables exogenous variables. Reflective measurement requires that all items of the sub-scale are interchangeable [25, pp. 196]. In this chapter, we focus on covariance-based confirmatory factor analysis (CB-CFA). Therein, the statistical tools aim at estimating coefficients for parameters of the measurement model that best fit the covariance matrix of the observed data. The difference between an observed covariance of the sample and an implied covariance of the model is called a residual.

4.1 Estimation Methods for Ordinal Non-normal Data

The purpose of a factor analysis is to estimate free parameters of the model (such as loadings or error variance), which is facilitated by estimators. The choice of estimator matters because each comes with different strengths and weaknesses, requirements, and assumptions that need to be fulfilled for the validity of their use.

While maximum likelihood (ML) estimation is the most commonly used estimation method for CFA, it is based on assumptions [25, pp. 71] that are not satisfied by IUIPC:

  (i) A continuous measurement level

  (ii) Multi-variate normal distribution (entailing the absence of extreme skewness) [25, pp. 74]

The distribution requirements are placed on the endogenous variables: the indicators.

First, the Likert items used in IUIPC are ordinal [17, p. 11], that is, ordered categories in which the distance between categories is not constant. Lei and Wu [30] held, based on a number of empirical studies, that the fit indices of approximately normal ordinal variables with at least five categories are not greatly misleading. However, when ordinal, non-normal data are treated as continuous and normal, the fit is underestimated, and there is a more pronounced negative bias in estimates and standard errors. While Bovaird and Koziol [7] acknowledge the robustness of the ML estimator with normally distributed ordinal data, they stress that increasingly skewed and kurtotic ordinal data inflate the Type I error rate and, hence, require another approach [25, pp. 323]. In the same vein, Kline [24, p. 122] holds the normality assumption for endogenous variables—the indicators—to be critical.

4.2 Comparing Nested Models

Nested models [25, p. 280] are models that can be derived from each other by restricting free parameters. They can be compared with a likelihood ratio χ2 difference test (LRT) [25, p. 270]. This technique comes into play when we compare multiple models that are based on the same indicator variables, e.g., to establish which factor structure best suits the covariance matrix. We use this technique in comparing one-factor solutions with solutions with multiple factors.
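To illustrate, such a comparison can be scripted with the R package lavaan; this is a sketch under assumed tooling, where the item names ct1, ct2, aw1, aw2, cl1, ..., cl4 are placeholders matching Fig. 4 and d is a data frame of item responses:

    # Nested models on the same indicators: a one-factor solution
    # and the designated three-factor solution.
    library(lavaan)

    m1 <- 'ipc =~ ct1 + ct2 + aw1 + aw2 + cl1 + cl2 + cl3 + cl4'
    m3 <- '
      ctr =~ ct1 + ct2
      awr =~ aw1 + aw2
      cll =~ cl1 + cl2 + cl3 + cl4
    '
    fit1 <- cfa(m1, data = d, estimator = "WLSMV", ordered = TRUE)
    fit3 <- cfa(m3, data = d, estimator = "WLSMV", ordered = TRUE)

    # Scaled chi-square difference test between the nested models
    lavTestLRT(fit1, fit3)

A statistically significant result favors the less restricted, here three-factor, model.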

4.3 Global and Local Fit

The closeness of fit of a factor model to an observed sample is evaluated globally with fit indices as well as locally by inspecting the residuals. We shall focus on the measures Kline [25, p. 269] requires as minimal reporting.

Statistical Inference

The χ2 and RMSEA indices offer us statistical inferences of global fit. Such tests can either be accept-support, that is, accepting the null hypothesis supports the selected model, or reject-support, that is, rejecting the null hypothesis supports the selected model. We present them in Table 2.

Local Fit

Even with excellent global fit indices, the inspection of the local fit—evidenced by the residuals—must not be neglected. Kline [25, p. 269] emphasizes “Any report of the results without information about the residuals is incomplete.” Large absolute residuals indicate covariations that the model does not approximate well and that may, thereby, lead to spurious results.

5 Approach

In this section, we weave together a general approach for creating a valid and reliable measurement instrument with the specific design decisions taken for the brief information privacy concern scale IUIPC-8 [16]. General approaches for the systematic construction of measurements [23], measurement models for survey research [5], or their reliability and validity [2] are well-documented in the literature. Here, we introduce specific considerations for IUIPC-8. The following aspects inform this evaluation:

  • The scale IUIPC-8 is derived from the long-standing scale IUIPC-10. Hence, a comparison of both scales is in order.

  • We will conduct confirmatory factor analyses to establish the dimensionality and construct validity of the scale.

  • The IUIPC data will be from ordinal Likert items, with a skewed non-normal distribution.

  • We will need sufficient sample sizes for the statistical power on RMSEA-based statistical inferences.

  • We aim at a scale that yields low attenuation of the correlations of its latent variable in relation to other variables, requiring good internal consistency reliability.

5.1 Analysis Methodology

Our analysis of IUIPC-10 and the brief variant IUIPC-8 will be supported by confirmatory factor analyses on two independent samples, one used for specification and refinement and the other used for validation. The factor analyses yield the evidence for unidimensionality of the sub-scales and the overall dimensionality of the instrument. While the creation of a new measurement instrument would often start with an exploratory factor analysis on a candidate item pool and another independent sample, here we shall focus only on the confirmatory factor analyses setting the 8-item and 10-item variants apart. The corresponding analysis process is depicted in Fig. 2.

Fig. 2
Which steps were taken on what sample (adapted from [16]): base Sample B undergoes data preparation, the IUIPC-10 CFA, and the IUIPC-8 CFA; validation Sample V undergoes data preparation and the validation CFA

Because IUIPC yields ordinal, non-normal data, the distribution of the data asks for careful analysis as part of the data preparation. The assumptions of a standard maximum likelihood estimation will be violated, which is why we prepare a robust diagonally weighted least squares (DWLS) estimation method as the tool of choice. The specific method employed is called WLSMV: DWLS estimation with robust standard errors and mean- and variance-adjusted test statistics using a scale shift. The choice of estimation method also impacts the sample size we need to obtain: apart from cases of small samples (N < 200), WLSMV was found to be less biased and more accurate than robust ML estimation (MLR) [31]. For smaller sample sizes, we would recommend MLR.
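A minimal sketch of this estimation with the R package lavaan follows (an assumption on tooling; item and factor names are placeholders consistent with Fig. 4, and d is the prepared data frame):

    library(lavaan)

    # Second-order measurement model of IUIPC-8: information privacy
    # concern (ipc) with first-order factors Control (ctr),
    # Awareness (awr), and Collection (cll)
    iuipc8 <- '
      ctr =~ ct1 + ct2
      awr =~ aw1 + aw2
      cll =~ cl1 + cl2 + cl3 + cl4
      ipc =~ ctr + awr + cll
    '

    # Declaring the indicators as ordered makes lavaan use DWLS;
    # estimator = "WLSMV" adds robust standard errors and the
    # scale-shifted, mean- and variance-adjusted test statistic
    fit <- cfa(iuipc8, data = d, estimator = "WLSMV", ordered = TRUE)
    summary(fit, fit.measures = TRUE, standardized = TRUE)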

5.2 Sample

The quality of the sampling method, that is, how participants are drawn from a population, has a considerable impact on the sampling and non-sampling biases introduced early in a study. In an ideal case, the target and survey population are clearly specified and the sampling frame explicitly defined [19, 48]. In terms of sampling method, random sampling, possibly stratified to be representative of the population, carries the least bias.

The sample size is determined based on three requirements:

  (i) The size needed to reach a diversity approximately representative of the UK population (N > 300)

  (ii) The minimum sample size to make DWLS estimation viable (N > 200)

  (iii) The sample size required to reach adequate statistical power

For the confirmatory factor analyses considered in this chapter, the key statistical inferences are on the χ2 significance test for the exact fit and the RMSEA-based significance tests for approximate fit. Hence, we determined the sample size for an RMSEA close-fit test in an a priori power analysis [27, 32, 49]. We used the R package semPower and an IUIPC-10 model with npar = 23 free parameters and df = 32 degrees of freedom as a benchmark. To reach 1 − β = 80% statistical power in this constellation, we would need a target sample size of N = 317.
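The following sketch shows such an a priori power analysis with semPower; the effect specification of RMSEA = .05 is our illustrative assumption and may differ from the exact setup of [16]:

    library(semPower)

    # Required N to detect a misspecification of RMSEA = .05 with
    # df = 32 degrees of freedom at alpha = .05 and 80% power
    ap <- semPower.aPriori(effect = .05, effect.measure = "RMSEA",
                           alpha = .05, power = .80, df = 32)
    summary(ap)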

For the analysis of IUIPC-8, we employed two independent samples, B and V. Base sample B and validation sample V were designated with a sample size of 420 each, allowing for some sample size and power loss in the sample refinement and analysis.

The samples used here were used in an earlier study [16] establishing IUIPC-8 and, hence, serve for illustration and not as an independent validation of the questionnaire. The samples were recruited on Prolific Academic [38] to be representative of the UK census by age, gender, and ethnicity. The sampling frame was Prolific users who were registered on the platform as residents of the UK, consisting of 48,454 users at sampling time (August 2019). The sampling process was as follows:

  1. Prolific established sample sizes per demographic stratum of the intended population.

  2. It presented our studies to the registered users with matching demographics.

  3. The users could choose for themselves whether they would participate or not.

We enforced sample independence by uniqueness of the participants’ Prolific ID.

We note that the sampling method is not random; it is a crowdsourcing sample with demographic screening [29]. Yet, Prolific has been found to obtain samples from a diverse population and with high data quality and reliability [40].

5.3 Validity and Reliability Criteria

The first consideration for construct validity is the factorial validity of the model, where we compare multiple possible factor structures to confirm the dimensionality. Based on the overall selected model structure, we then turn to an analysis of the global and local fit in a comparison between IUIPC-10 and IUIPC-8 on Samples B and V. For the fit of the models, we consider the RMSEA-based fit hypotheses shown in Table 2 an important part of our investigation. Here, we are interested in getting support from the close-fit hypothesis, being aware that the CFAs will not have enough statistical power to offer a tight enough confidence interval on the RMSEA estimate to reject the not-close-fit hypothesis.

For convergent and discriminant validity, we turn to empirical criteria, especially relying on the average variance extracted (AVE) and the Heterotrait–Monotrait Ratio (HTMT) defined in the definition box of this chapter. We gain empirical evidence in favor of convergent validity [17, pp. 675]:

  (i) If the variance extracted by an item is R2 > 0.50, entailing that the standardized factor loadings are significant and β > 0.70

  (ii) If the internal consistency (defined in Sect. 3.2) is sufficient (AVE > 0.50, ω > AVE, and ω > 0.70)

The analysis yields empirical evidence of discriminant validity [17, pp. 676]:

  (i) If the square root of the AVE of a latent variable is greater than the maximum correlation with any other latent variable (Fornell–Larcker criterion [13])

  (ii) If the Heterotrait–Monotrait Ratio (HTMT) is less than 0.85 [1, 20]

While stricter thresholds, such as the 0.80 endorsed for basic research, would be beneficial for privacy research as well, we shall adopt the reliability criteria α, ω ≥ 0.70 as suggested by Hair et al. [17, p. 676].
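These criteria can be computed, for instance, with the semTools companion package to lavaan; this is a sketch under the same assumptions as before, where fit is the fitted second-order CFA model and d the data frame of item responses:

    library(semTools)  # also attaches lavaan

    # Cronbach's alpha, congeneric reliability omega, and AVE per factor
    reliability(fit)

    # Fornell-Larcker criterion: sqrt(AVE) against the inter-factor
    # correlations of the latent variables
    sqrt(reliability(fit)["avevar", ])
    lavInspect(fit, "cor.lv")

    # HTMT on the first-order measurement structure, threshold 0.85
    m3 <- '
      ctr =~ ct1 + ct2
      awr =~ aw1 + aw2
      cll =~ cl1 + cl2 + cl3 + cl4
    '
    htmt(m3, data = d)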

6 The Validation of IUIPC-8

In this section, we are examining the model of IUIPC-8 in a diagonally weighted least square (DWLS) CFA estimation with robust standard errors and a scale-shifted mean- and variance-adjusted test statistic (WLSMV). We begin our inquiry with the characteristics of the underlying sample (Sect. 6.1) and distribution (Sect. 6.2), considering a base Sample B and an independent validation Sample V.

6.1 Sample

The demographics of both samples B and V are included in Table 3. While these samples were meant to be drawn to be UK representative, we observe an under-representation of elderly participants compared to the UK census age distribution. Still, the sample offers us sufficient diversity for the evaluation of the scale.

Table 3 Demographics, table taken from Groß [16] licensed under CC BY-NC-ND 4.0

The two samples underwent a sample refinement in stages, which Table 4 accounts for. The refinement included:

  (i) Removing incomplete cases without replacement

  (ii) Removing duplicates across samples by the participants’ Prolific ID, to guarantee independence

  (iii) Removing cases in which participants failed more than one attention check (FailedAC > 1)

Table 4 Sample refinement, table taken from Groß [16] licensed under CC BY-NC-ND 4.0

The named attention checks were instructional manipulation checks [37] distributed over the wider questionnaire.

Overall, of the N = 848 complete cases, only 4.2% were removed due to duplicates or failed attention checks. After this refinement, a small number of multi-variate outliers were removed.

6.2 Descriptives

Evaluating the sample distribution, we found the indicator variables to be negatively skewed. The distributions tail off to the left. The Control and Awareness indicators suffer from positive kurtosis. We found that the indicator distributions as well as the IUIPC sub-scale distributions exhibited substantial non-normality. We illustrate these aspects in Table 5 and Fig. 3.

Fig. 3
Density of IUIPC-8 sub-scale responses across samples (B: violet, V: orange), showing left-skewed distributions. Note: All graphs are on the same scale (adapted from [16]). (a) Control. (b) Awareness. (c) Collection. (d) IUIPC-8 overall

Table 5 Means (SDs) of the summarized sub-scales of IUIPC-8 and the original IUIPC-10 (adapted from [16])

We observed that the two samples displayed approximately equal distributions by sub-scales. Controlling for the difference between Samples B and V, we found that none of their sub-scale means was statistically significantly different, the maximal absolute standardized mean difference being 0.13—a small magnitude.

Our IUIPC-10 samples yielded 6% univariate outliers by the robust outlier labeling rule and 3% multi-variate outliers with a Mahalanobis distance of 12 or greater [25, pp. 72]. We removed these MV outliers as indicated in Table 4.
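A sketch of this multi-variate outlier screening in R follows (the name items for the data frame of IUIPC item responses is an assumption; we read the cutoff of 12 as applying to the squared Mahalanobis distance [25, pp. 72]):

    # Squared Mahalanobis distance of each case from the centroid
    md <- mahalanobis(items, center = colMeans(items), cov = cov(items))

    # Retain only cases below the cutoff
    items <- items[md < 12, ]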

These observations on the distribution of the samples are relevant for the choice of estimator for the confirmatory factor analysis to come. A maximum likelihood (ML) estimation would require continuous measurement with multi-variate normality. These assumptions are clearly not fulfilled. While a robust maximum likelihood estimation (MLM) could also be considered, we opted for a diagonally weighted least squares (DWLS) estimation with robust standard errors and scale-shifted mean- and variance-adjusted test statistics (WLSMV), typically considered preferable for ordinal, non-normal data.

6.3 Construct Validity

6.3.1 Factorial Validity

First, we investigate the three-dimensionality of IUIPC-8. To that end, we computed confirmatory factor analyses on the one-factor, two-factor, and hypothesized three-dimensional second-order models displayed in Table 6. The two-factor solution was statistically significantly better than the one-factor solution, χ2(1) = 215.065, p < .001. In turn, the three-factor solution was statistically significantly better than the two-factor solution, χ2(2) = 30.165, p < .001. Hence, given the results of the likelihood ratio tests (LRTs) on these nested models, we chose the three-dimensional second-order model. This is the model also shown in the path plot of Fig. 4.

Fig. 4
CFA path plot with standardized estimates of IUIPC-8 on Sample B: the second-order factor ipc loads on the first-order factors ctr (indicators ct1, ct2), awr (aw1, aw2), and cll (cl1, cl2, cl3, cl4). Note: The dashed lines signify that the raw factor loading was fixed to 1 (cf. Table 9, figure adapted from [16])

Table 6 Comparison of different model structures of IUIPC-8 on Sample B with WLSMV estimation

6.3.2 Model Fit

Global Fit

Second, we evaluate the global fit as a measure of factorial validity. We do this in a two-way comparison of WLSMV CFAs along the following dimensions: (i) IUIPC-10 vs. IUIPC-8 and (ii) base Sample B vs. validation Sample V. Table 7 reports the fit statistics of the four CFA models.

Table 7 Fit statistic comparison of IUIPC-10 and IUIPC-8 (adapted from [16])

Because the IUIPC-10 and IUIPC-8 models are non-nested, we cannot use the likelihood ratio test (LRT) to evaluate their difference. In the WLSMV estimation, we are left with comparing fit measures.

Regarding the global fit reported in Table 7, we notice that all CFA models fail the exact-fit test on the χ2 test statistic. To evaluate approximate fit, we draw attention to the root mean square error of approximation (RMSEA), its confidence interval, and the close-fit hypothesis \(H_{\varepsilon_0 \leq .05, 0}\). We observe that the IUIPC-10 models are not supported by the close-fit test, while the IUIPC-8 models are (Sample B: \(p_{\varepsilon_0 \leq .05} = .086\); Sample V: \(p_{\varepsilon_0 \leq .05} = .394\)). Hence, we conclude that the IUIPC-8 model shows a sufficient approximate close fit, even if not an exact fit.
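With lavaan, the scaled variants of these global fit statistics of a WLSMV-estimated model can be extracted as follows (a sketch; fit is the fitted model from before):

    # Minimal fit reporting: scaled chi-square with p-value, RMSEA with
    # 90% CI and close-fit p-value, CFI, and SRMR
    fitMeasures(fit, c("chisq.scaled", "df.scaled", "pvalue.scaled",
                       "rmsea.scaled", "rmsea.ci.lower.scaled",
                       "rmsea.ci.upper.scaled", "rmsea.pvalue.scaled",
                       "cfi.scaled", "srmr"))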

Local Fit

The good global fit of IUIPC-8 shown in Table 7 alone is not sufficient to vouch for the overall fit of the model. For this purpose, we inspect the correlation and raw residuals in Table 8. Therein, we observe slightly negative correlation residuals between coll1 and the awareness indicator variables. These negative correlation residuals mean that the CFA model overestimates the correlation between the indicator variables in question. The correlation residuals in the validation model (included in the online supplementary materials) show lower deviations. Hence, we believe both models to have an acceptable local fit.
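Such a residual inspection is readily available in lavaan (a sketch):

    # Correlation residuals: observed minus model-implied correlations;
    # large absolute values flag locally misfitting indicator pairs
    lavResiduals(fit, type = "cor.bollen")$cov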

Table 8 Residuals of the WLSMV CFA of IUIPC-8 on Sample B

6.3.3 CFA Model, Convergent, and Discriminant Validity

We illustrate the selected second-order CFA model for IUIPC-8 in Fig. 4. Table 9 contains the corresponding factor loadings with their standardized solutions. The standardized loadings of the model give us confidence in the convergent validity of the model: the average variance extracted (AVE) was greater than 50% for all first-level factors. This observation holds equally for the standardized factor loadings of the validation CFA, summarized in the online supplementary materials.

Table 9 Factor loadings and their standardized solution of the WLSMV CFA of IUIPC-8 on Sample B

In terms of discriminant validity, we first consider the Fornell–Larcker criterion in Table 10. As required, we find that the square root of the AVE displayed on the diagonal of the matrices is greater than the inter-factor correlations in the rest of the matrix. This holds for both the base Sample B and the validation Sample V.

Table 10 First-level correlations and \( \sqrt {{\mathit {AVE} }}\) as evidence for the Fornell–Larcker criterion for discriminant validity on IUIPC-8

Further, we evaluate the HTMT criterion in Table 11. We are satisfied with this criterion for Samples B and V at a threshold of 0.85. Hence, we conclude that the scale offers sufficient discriminant validity.

Table 11 Heterotrait–Monotrait ratios as criterion for discriminant validity on IUIPC-8

6.4 Reliability: Internal Consistency

Table 9 also includes reliability metrics derived from the WLSMV CFA model of IUIPC-8. For both the base Sample B and the validation Sample V, we observe that the congeneric reliability ω is consistently greater than .70. By that, the reliability criteria established in Sect. 5.3 are fulfilled, and we can expect a good signal-to-noise ratio for the scale.

7 Discussion

We have seen that IUIPC-8 offers good construct validity and reliability. The outcomes of our analysis are summarized in Table 12. IUIPC-8 can serve as a useful measurement instrument for information privacy concern. Like any measurement instrument, it is a working solution, which may be proven wrong eventually and superseded by more refined scales [23].

Table 12 Selected evidence for construct validity and reliability criteria on Samples B and V under WLSMV estimation (adapted from [16])

In terms of the bandwidth–fidelity trade-off [23], IUIPC-8 offers a greater fidelity than the original scale IUIPC-10 [33], at the expense of bandwidth. Because of the greater congeneric reliability, we expect less attenuation in the correlations to other variables than with IUIPC-10: according to classical test theory, such correlations are bounded by the square root of the scale’s reliability. Thereby, IUIPC-8 can be useful in investigating relations to other variables, such as the impact of privacy concern on privacy behavior.

The restriction to eight items also bears limitations that need to be considered carefully. First, the factors Control and Awareness are based on a narrower footing in terms of content validity, that is, in terms of coverage of relevant and representative aspects of the construct. We also need to consider CFA model identification. While IUIPC-8 as a whole is identified because of the two-indicator rule [25, Rule 9.1], the sub-scales Control and Awareness on their own will not be identified. Hence, they cannot be used as robust measurement instruments of their own.

In terms of future work, it would be preferable to refine IUIPC-8 with further items rounding out the sub-scales Control and Awareness, while maintaining high construct validity and reliability. Ideally, each factor would have three or more indicators. While this chapter largely focused on the construct validity and the internal consistency reliability of the scale itself, a more comprehensive evaluation of privacy concern scales is vital. For IUIPC-8, we considered reliability in the form of internal consistency (that is, generalizing across items). At the same time, retest reliability (generalizability across times) and equivalence reliability (generalizability across forms) are still research areas to expand. In addition, the investigation of IUIPC in its full nomology is important, as pursued by Pape et al. [39] for the use of the scale in Japan.

8 Summary

This chapter considers the validity and reliability of privacy concern scales, with IUIPC-8 as an example of a brief information privacy concern scale:

  • We introduced validity and reliability concepts, focusing on construct validity and congeneric reliability.

  • We discussed confirmatory factor analysis as a tool to establish the properties of measurement instruments.

  • We discussed CFA estimation methods for ordinal and non-normal data as found with the IUIPC scale.

  • We included an empirical analysis of IUIPC-8 on both a base sample and an independent validation sample.

  • We evaluated validity and reliability criteria in the comparison of IUIPC-10 and the brief form IUIPC-8.