Lord–Wingersky Algorithm Version 2.5 with Applications

Item response theory scoring based on summed scores is employed frequently in the practice of educational and psychological measurement. Lord and Wingersky (Appl Psychol Meas 8(4):453–461, 1984) proposed a recursive algorithm to compute the summed score likelihood. Cai (Psychometrika 80(2):535–559, 2015) extended the original Lord–Wingersky algorithm to the case of two-tier multidimensional item factor models and called it Lord–Wingersky algorithm Version 2.0. The 2.0 algorithm utilizes dimension reduction to efficiently compute summed score likelihoods associated with the general dimensions in the model. The output of the algorithm is useful for various purposes, for example, scoring, scale alignment, and model fit checking. In the research reported here, a further extension to the Lord–Wingersky algorithm 2.0 is proposed. The new algorithm, which we call Lord–Wingersky algorithm Version 2.5, yields the summed score likelihoods for all latent variables in the model conditional on observed score combinations. The proposed algorithm is illustrated with empirical data for three potential application areas: (a) describing achievement growth using score combinations across adjacent grades, (b) identification of noteworthy subscores for reporting, and (c) detection of aberrant responses.


Introduction
Generalizing the seminal Lord-Wingersky (1984) algorithm to other settings has been a regular topic in item response theory (IRT) research since its initial publication more than 35 years ago. Also well known in the Rasch modeling community (Andersen, 1972; Gustafsson, 1980), this simple recursive algorithm's wide-reaching impact in psychometrics is impressive to behold. For example, Hanson (1994), Thissen et al. (1995), and von Davier and Rost (1995) were among the first to extend the algorithm to polytomous IRT models. Chen and Thissen (1999) derived an item calibration algorithm based on summed scores. Thissen and Wainer's (2001) influential text on test scoring presented extensive methods for handling mixed-format tests, including an approach to handling score combinations (Rosa et al., 2001) that heavily influenced our thinking in the study reported here. The algorithm has also been applied to illustrate summed score-based test linking, another area of consistent interest to psychometricians (e.g., Zeng & Kolen, 1995; Thissen et al., 2011), and a slight alteration of the original Lord-Wingersky algorithm has been proposed as a solution to the item fit testing problem. Li and Cai (2018) further extended the algorithm to create more accurate distributional approximations for test statistics sensitive to latent variable distributional assumptions. Stucky (2009), and independently Kim (2013), developed a weighted version of the algorithm wherein the item scores can take non-integer values.
the item response function of the ith item (i = 1, ..., I_n) in cluster n, such that

T_i(1 | η, ξ_n) = 1 / (1 + exp[−(c_i + a_{0i} η + a_{ni} ξ_n)]),

where a_{0i} and a_{ni} are the primary latent dimension and specific latent dimension item slopes, respectively, and c_i is the item intercept. The item parameters are assumed to be known and fixed, usually from a calibration study.
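As a concrete illustration, the bifactor item response function above can be evaluated directly. The function name, argument layout, and parameter values below are our own choices for the sketch, not part of the original model specification:

```python
import math

def trace_line(eta, xi, a0, a_n, c):
    """Bifactor 2PL item response function T_i(1 | eta, xi_n):
    probability of a correct (1) response given the primary (eta)
    and specific (xi_n) latent dimensions."""
    z = c + a0 * eta + a_n * xi
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical item parameters, for illustration only.
p = trace_line(0.0, 0.0, a0=1.2, a_n=0.8, c=-0.3)
```

At the origin of the latent space the probability depends only on the intercept, so `p` here equals 1 / (1 + exp(0.3)), roughly .43.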
The middle equation in (3) is repeated over values of x between 1 and i − 1.
To avoid notational clutter, let P_n(s_n | η, ξ_n) = P_{nI_n}(s_n | η, ξ_n) denote the likelihood associated with the within-cluster summed score s_n = 0, ..., I_n, after all I_n items in cluster n have been added according to the recursions defined in Eq. (3). At this point, an extra step is performed. The specific latent dimension, ξ_n, is integrated out, leaving the summed score likelihoods solely a function of the primary latent dimension, η. For simplicity, we can approximate this integral with rectangular quadrature:

P_n(s_n | η) ≈ Σ_{q=1}^{Q} P_n(s_n | η, Y_q) W_n(Y_q),

where Q is the number of quadrature points, Y_q is the qth quadrature point, and W_n(Y_q) is the corresponding quadrature weight, computed as normalized ordinates of g(ξ_n).
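The within-cluster recursion of Eq. (3) together with the quadrature step above can be sketched as follows. This is a minimal illustration under assumed conventions (a NumPy grid with η varying over rows and ξ_n over columns; all names are ours), not the authors' implementation:

```python
import numpy as np

def within_cluster_likelihoods(T, W_xi):
    """Stage I: Lord-Wingersky recursion within one item cluster.

    T    : array (I, Qe, Qx) of correct-response probabilities
           T_i(1 | eta, xi_n) evaluated on an eta-by-xi quadrature grid.
    W_xi : normalized quadrature weights for the specific dimension xi_n.
    Returns P_n(s_n | eta): array (I + 1, Qe) with xi_n integrated out.
    """
    I, Qe, Qx = T.shape
    # L[s] holds the likelihood of within-cluster summed score s on the grid.
    L = [np.ones((Qe, Qx))]
    for i in range(I):
        new = [L[0] * (1 - T[i])]                      # score 0: all wrong so far
        for s in range(1, i + 1):                      # middle scores: two routes
            new.append(L[s] * (1 - T[i]) + L[s - 1] * T[i])
        new.append(L[i] * T[i])                        # top score: all correct
        L = new
    # Integrate out the specific dimension by rectangular quadrature.
    return np.stack([l @ W_xi for l in L])
```

With two items whose response probability is .5 everywhere, the within-cluster score likelihoods are .25, .50, .25 at every η point, and the likelihoods sum to 1 over scores, which is a convenient sanity check.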

Stage II
At the end of the first stage, available to us are N sets of within-cluster summed score likelihoods {P_n(s_n | η); s_n = 0, ..., I_n}, for n = 1, ..., N. These quantities depend only on the primary latent dimension η. Each item cluster can now be treated as if it were a polytomous item with I_n + 1 categories, with "item scores" ranging from 0 to I_n.
Denote by L_n(s | η) the likelihood of summed score s after adding item cluster n to the existing summed score likelihoods in the recursive computation described below. Let S_n be the maximum obtainable summed score after adding item cluster n. In our context, when the items are all dichotomous, S_n = Σ_{j=1}^{n} I_j. Obviously, S_N is the maximum summed score. At this point, the standard Lord-Wingersky algorithm for polytomous items can be applied.
Let L_1(s_1 | η) = P_1(s_1 | η), for all s_1 = 0, ..., I_1, for the purpose of initialization. Then in step n (2 ≤ n ≤ N), the likelihoods P_n(s_n | η) from item cluster n are combined with the likelihoods from the previous step to form the desired summed score likelihoods. For each possible summed score 0 ≤ s ≤ S_n, we let

L_n(s | η) = Σ_{s_{n−1}=0}^{S_{n−1}} Σ_{s_n=0}^{I_n} 1_s(s_{n−1} + s_n) L_{n−1}(s_{n−1} | η) P_n(s_n | η),     (5)

where 1_s(s_{n−1} + s_n) is an indicator function that takes the value 1 if s_{n−1} + s_n = s and 0 otherwise. Equation (5) essentially does the bookkeeping for pairs of scores s_{n−1} (from all item clusters added previously) and s_n (from the current item cluster) that add up to the summed score s. When all N item clusters are included, L_N(s | η), or simply L(s | η) to reduce clutter, contains the summed score likelihoods for the primary dimension for 0 ≤ s ≤ S_N.
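The Stage II bookkeeping of Eq. (5) amounts to a discrete convolution of cluster score likelihoods over η. A minimal sketch, under an assumed array layout (rows index scores, columns index η quadrature points; all names are ours):

```python
import numpy as np

def combine_clusters(cluster_liks):
    """Stage II: treat each cluster as a polytomous 'item' and
    accumulate overall summed score likelihoods over eta.

    cluster_liks : list of arrays P_n(s_n | eta), each of shape (I_n + 1, Qe).
    Returns L(s | eta), shape (S_N + 1, Qe).
    """
    L = cluster_liks[0].copy()                 # initialization: L_1 = P_1
    for P in cluster_liks[1:]:
        S_prev, S_add = L.shape[0] - 1, P.shape[0] - 1
        new = np.zeros((S_prev + S_add + 1, L.shape[1]))
        # Bookkeeping of Eq. (5): every (s_prev, s_n) pair with
        # s_prev + s_n = s contributes to score s.
        for s_prev in range(S_prev + 1):
            for s_n in range(S_add + 1):
                new[s_prev + s_n] += L[s_prev] * P[s_n]
        L = new
    return L
```

Combining two one-item clusters with likelihoods (.5, .5) yields overall score likelihoods (.25, .50, .25), the same convolution one would obtain by hand.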

Posterior Summaries
Recall that h(η) is the prior distribution of the primary latent dimension. The normalized posterior of η associated with summed score s is

p(η | s) = L(s | η) h(η) / p(s),

where p(s) is the (marginal) probability of summed score s:

p(s) ≈ Σ_{q=1}^{Q} L(s | X_q) W(X_q),

and Q rectangular quadrature points X_q are used to approximate the posterior, with W(X_q) the normalized ordinates of h(η). The posterior mean E(η | s) and posterior variance Var(η | s) = E(η² | s) − E²(η | s) are useful summaries, where

E(η | s) ≈ Σ_{q=1}^{Q} X_q L(s | X_q) W(X_q) / p(s), E(η² | s) ≈ Σ_{q=1}^{Q} X_q² L(s | X_q) W(X_q) / p(s).

A normal approximation of the posterior based on the posterior mean and variance often works quite well even when the number of items is moderate. The posterior mean can be used as the summed score-based IRT scaled score estimate and the posterior variance as the error variance estimate for the scaled score. The marginal probability p(s) itself can be useful either as a model-based (pre-operational) estimate of the expected summed score group probabilities or as an aid in IRT model fit checking.
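Given the summed score likelihoods on a quadrature grid, the posterior summaries above reduce to weighted sums. A sketch under the same rectangular quadrature assumptions (function and variable names are our own):

```python
import numpy as np

def posterior_summary(L_s, h_weights, X):
    """Posterior summaries of eta given summed score s.

    L_s       : summed score likelihood L(s | X_q) at quadrature points X.
    h_weights : normalized ordinates of the prior h(eta) at X.
    X         : quadrature points for eta.
    Returns (p_s, mean, var): marginal probability of score s,
    posterior mean, and posterior variance.
    """
    post = L_s * h_weights
    p_s = post.sum()            # marginal probability p(s)
    post = post / p_s           # normalized posterior p(eta | s)
    mean = (X * post).sum()     # E(eta | s)
    var = (X ** 2 * post).sum() - mean ** 2   # E(eta^2 | s) - E^2(eta | s)
    return p_s, mean, var
```

For a likelihood that is symmetric about zero, the posterior mean is 0 and the variance is just the posterior-weighted second moment, which makes the function easy to spot-check.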

General Approach
Recall the bifactor model with N specific latent dimensions defined in Sect. 2. Each of the N item clusters includes I n items. The Lord-Wingersky algorithm 2.0 is focused on obtaining the posterior distribution of the primary dimension η, conditioned on the overall summed score. The specific latent dimensions are integrated out at the end of Stage I (see Sect. 2.1). In the proposed algorithm, we obtain bivariate posteriors of the primary latent dimension η and the specific latent dimension ξ n .
Instead of the overall summed score, each bivariate posterior is conditioned on a pair of scores. We continue to use s_n to denote the summed score from item cluster n, where s_n = 0, ..., I_n, and introduce here new notation for the rest score s^{(n)}, i.e., the summed score from all clusters except item cluster n. Let S^{(n)} = Σ_{j=1, j≠n}^{N} I_j be the maximum summed score from the rest of the item clusters, so s^{(n)} = 0, ..., S^{(n)}. Cai (2015) in fact alluded to the possibility of using the summed vs. rest score combination (s_n, s^{(n)}), but stopped short of actually computing the bivariate posterior, as we now outline below.

Stage I
In the first stage, for each item cluster, the within-cluster summed score likelihoods are accumulated over the space spanned by η and ξ_n, n = 1, ..., N, just as in Lord-Wingersky algorithm 2.0. At the end of this stage, we retain and store the likelihoods for the primary dimension {P_n(s_n | η); s_n = 0, ..., I_n}, for all n = 1, ..., N. The critical added requirement is that we also retain and store all the within-cluster summed score likelihoods {P_n(s_n | η, ξ_n); s_n = 0, ..., I_n}, for all n = 1, ..., N. In a quadrature representation of the likelihoods, at most Q × Q floating point values are stored per cluster, per score, if Q quadrature points per dimension are used.

Stage II
We now cycle through the item clusters to compute the desired bivariate posteriors. In general, for item cluster n, we wish to construct bivariate posteriors for η and ξ_n. Recall that the other item clusters do not depend on ξ_n, so we proceed by treating the cluster summed score likelihood values P_1(s_1 | η), ..., P_{n−1}(s_{n−1} | η), P_{n+1}(s_{n+1} | η), ..., P_N(s_N | η) as though they were polytomous items that depend on η. The standard Lord-Wingersky algorithm can now be applied readily to produce item cluster n's rest score likelihoods R_n(s^{(n)} | η), for s^{(n)} = 0, ..., S^{(n)}. In other words, the recursions work in exactly the same manner as in Sect. 2.2, except that we omit the likelihood contributions from P_n(s_n | η).
The rest score likelihoods R_n(s^{(n)} | η) are then combined with the summed score likelihoods from item cluster n, P_n(s_n | η, ξ_n), s_n = 0, ..., I_n, as well as the prior distributions for η and ξ_n, to yield the bivariate posterior distribution of η and ξ_n associated with the summed vs. rest score combination (s_n, s^{(n)}):

p(η, ξ_n | s_n, s^{(n)}) = R_n(s^{(n)} | η) P_n(s_n | η, ξ_n) h(η) g(ξ_n) / p(s_n, s^{(n)}),     (9)

where the marginal probability p(s_n, s^{(n)}) is

p(s_n, s^{(n)}) = ∫∫ R_n(s^{(n)} | η) P_n(s_n | η, ξ_n) h(η) g(ξ_n) dη dξ_n.     (10)

Again, the posterior above can be easily approximated with rectangular quadrature:

p(s_n, s^{(n)}) ≈ Σ_{q=1}^{Q} Σ_{r=1}^{Q} R_n(s^{(n)} | X_q) P_n(s_n | X_q, Y_r) W(X_q) W_n(Y_r).     (11)

Aside from the marginal probability in Eq. (10), other useful summaries of the posterior distribution include the mean vector μ and the covariance matrix Σ, which facilitate a bivariate normal approximation to the posterior that can be quite effective in practice, as we shall demonstrate later. The marginal posterior means μ_0 = E(η | s_n, s^{(n)}) and μ_n = E(ξ_n | s_n, s^{(n)}), and the error variances and covariance σ_00 = Var(η | s_n, s^{(n)}), σ_0n = Cov(η, ξ_n | s_n, s^{(n)}), and σ_nn = Var(ξ_n | s_n, s^{(n)}), provide reasonable point estimates and error (co)variance estimates for all the latent variables in the model. These means and covariance matrix elements can be approximated with quadrature along similar lines as Eq. (11).
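On a quadrature grid, the bivariate posterior and its marginal probability reduce to elementwise products and sums. A minimal sketch assuming the storage layout from Stage I (η-by-ξ_n grids; all names are our own):

```python
import numpy as np

def bivariate_posterior(R_rest, P_cluster, w_eta, w_xi):
    """Bivariate posterior of (eta, xi_n) given a score pair (s_n, s_rest).

    R_rest    : rest score likelihood R_n(s_rest | eta), shape (Qe,)
    P_cluster : cluster likelihood P_n(s_n | eta, xi_n), shape (Qe, Qx)
    w_eta     : normalized prior weights for eta at its quadrature points
    w_xi      : normalized prior weights for xi_n at its quadrature points
    Returns (p, post): marginal probability of the score pair and the
    normalized posterior on the (eta, xi_n) grid.
    """
    # Elementwise product of likelihoods and prior weights on the grid.
    joint = R_rest[:, None] * P_cluster * w_eta[:, None] * w_xi[None, :]
    p = joint.sum()             # quadrature approximation of p(s_n, s_rest)
    return p, joint / p
```

With flat likelihoods and uniform weights, the posterior is uniform over the grid and the marginal probability equals the constant likelihood value, a quick consistency check.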

An Illustrative Example
Consider the same hypothetical six-item scale with bifactor structure as discussed in Cai (2015). These six dichotomous items form three item clusters, each consisting of two items. Priors of all four latent dimensions (one primary latent dimension and three specific latent dimensions) are assumed independent and standard normal. Table 1 shows the item parameters and the factor pattern.
For each dimension, Q = 5 equally spaced quadrature points at − 2, − 1, 0, 1, and 2 are used for demonstration purposes only (practical usage of the algorithm requires much larger values of Q). Thus, a 5 × 5 grid is formed as the direct product of η and each of the three specific latent dimensions when appropriate. Summed score likelihoods are evaluated over these grid points. Tables 2 and 3 show the first stage of the Lord-Wingersky algorithm 2.5 used to compute the bivariate posterior of η and ξ_1. In Table 2, the within-cluster summed score likelihoods of the three item clusters are computed. In Table 3, the specific dimensions ξ_2 and ξ_3 are integrated out, leaving the observed summed scores of item clusters 2 and 3 as functions of η. Table 4 shows the second stage of the algorithm, where item clusters 2 and 3 are treated as polytomous items (both with 3 categories), while the rest score likelihoods for item cluster 1 are calculated. Table 5 shows how the bivariate posteriors associated with each summed vs. rest score pattern are computed. Summaries of the bivariate normal approximations of the posteriors associated with the score combinations are shown in Table 6. Figure 1 shows the equal probability contours of the bivariate normal approximated posteriors for five score combinations (with the mean vectors and covariance matrices in Table 6). Each contour includes 25% of the volume under the posterior density.

The More General Case
The algorithm generalizes naturally when the number of general dimensions exceeds 1 (M > 1). In this case, the single set of rectangular quadrature points Y_q covering the latent variable space of η becomes a direct product of M sets of quadrature points. The marginal posterior mean μ_0 = E(η | s_n, s^{(n)}) is now an M × 1 vector, the primary dimension error covariance matrix Σ_00 = Var(η | s_n, s^{(n)}) is M × M, and the covariance terms in σ_0n = Cov(η, ξ_n | s_n, s^{(n)}) become an M × 1 vector.
It is worth noting that if there are more than two item clusters, the estimates of η will change depending on which item cluster is treated as the focal cluster (e.g., cluster 1 vs. the rest, or cluster 2 vs. the rest, etc.). This phenomenon is analogous to the difference between response pattern-based scaled scores and summed score-based scaled scores in unidimensional IRT models that do not assume equal item slope parameters. Items with varying slope parameters are not equally discriminating, and different response patterns with the same summed score will necessarily lead to different scaled score estimates. This is well understood. In hierarchical item factor models, item clusters take the place of items. The item clusters may have different difficulty and discriminability as far as η is concerned, and therefore different ways of decomposing the total summed score will lead to different η estimates. For a reader whose only concern is scoring for η, Cai's (2015) Lord-Wingersky algorithm 2.0 would suffice.

Growth Interpretation of Observed Score Combinations
ELPA21 is a multi-state assessment program that provides measures of the English language proficiency of English Learners (ELs) in K-12 educational systems in the participating states. It measures ELs' proficiencies in four language domains (reading, listening, writing, and speaking) from kindergarten to high school (i.e., kindergarten, grade 1, grade band 2-3, grade band 4-5, grade band 6-8, and grade band 9-12). Cut scores were established from standard setting studies in each domain and grade band so that students are classified as emerging, progressing, or proficient. Parameters of items in tests of different grade bands were calibrated separately (CRESST, 2017). Thus, the latent ability scales of two adjacent grade bands are different, and the scale scores of tests of different grade bands cannot be compared directly.
A convenient and transparent way to report students' growth is desired, but ELPA21 took the position that vertical scaling, a technique popular in statewide accountability testing, should not be employed. The major reason for this choice was that language learning and proficiency development change rapidly over time, especially in the early stages (e.g., PK to 2), potentially shifting the construct being measured considerably. Measures of proficiency such as ELPA21 test scores probably should not be compared directly (or forced onto the same scale) for a PK English Learner vs. an English Learner in upper elementary. Hansen and Monroe (2018) provide additional discussion of this topic. Here, we explore the possible use of observed score combinations to describe student growth through the application of Lord-Wingersky algorithm 2.5.
Consider two fixed-form tests in adjacent grade bands: one in the lower-grade band and one in the upper-grade band. The example here is the listening tests from grade bands 4-5 and 6-8. The lower-grade band test consists of 24 dichotomous items, and the upper-grade band test includes 30 dichotomous items. A random sample of 300 students who took both tests was used to estimate the population distribution, while item parameters were held at the pre-calibrated values (see Table 7). Items in each test form are thought to load on either the lower- or upper-grade band dimension, so the model can be handled with standard software such as flexMIRT (Cai, 2017) without additional programming. This is analogous to Cai's (2015) Example 4.2, which replicates more specialized score combination computations, wherein the bifactor model was strictly not needed. The classification into emerging, progressing, or proficient is made from the volume of the marginal posterior distribution that falls between the cut scores. For example, the probabilities of emerging, progressing, and proficient for students with observed score combination (13, 18) are .84, .16, and .00 in the lower-grade band and .24, .75, and .01 in the upper-grade band. We may then communicate clearly to the users of the score reports that this particular combination of 13 (out of 24) on the lower-grade band test and 18 (out of 30) on the upper-grade band test indicates an improvement from emerging to progressing. In a similar fashion, the score combination (13, 29) represents an improvement from progressing to proficient.
The probabilities of each of the combinations are also natural by-products of our recursive algorithm. Among students who received a score of 13 on the lower-grade band test (expected to be roughly 1.89% of the student population, based on the model), a score of 24 on the upper-grade band test places the student at the 74th percentile, which is akin to a student growth percentile (SGP; Betebenner, 2009) but entirely based on observed scores. In addition, although not pursued here, the Lord-Wingersky algorithm 2.5, coupled with the calibrated projection method (Thissen et al., 2011), can be applied to predict scores on the upper-grade band test based on the lower-grade band test scores. In sum, Lord-Wingersky algorithm 2.5 serves as a useful tool to facilitate the reporting of student growth in the multi-state EL assessment program.
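The SGP-like percentile described above can be read off the table of score combination probabilities. A sketch using a made-up toy table; the mid-rank convention for the percentile is our own assumption, since the convention is not specified here:

```python
import numpy as np

def observed_sgp(p_table, s_lower, s_upper):
    """Percentile of the upper-grade score among students with a given
    lower-grade score, from the table p(s_lower, s_upper) produced by
    the algorithm.

    p_table : array of joint score combination probabilities,
              rows indexed by s_lower, columns by s_upper.
    """
    row = p_table[s_lower]
    row = row / row.sum()                 # condition on s_lower
    below = row[:s_upper].sum()           # mass strictly below s_upper
    return below + 0.5 * row[s_upper]     # mid-rank convention (an assumption)
```

Conditioning on the row makes the result a within-group percentile, mirroring the "among students who received a score of 13" interpretation in the text.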

Facilitating Subscore Reporting
Educational and psychological assessments usually consist of several item clusters, yielding so-called subscores. Within the IRT framework, several subscoring approaches, including the bifactor model approach and the correlated-traits MIRT model approach, are available. Subscore reporting is another recurrent topic in the recent psychometrics literature (e.g., Sinharay et al., 2007; Haberman, 2008; Haberman et al., 2009; Feinberg & Wainer, 2014) because of the increasing demand for more detailed information about individuals. Two issues must be considered when deciding whether to report subscores obtained through a bifactor model (i.e., the ξ_n estimates) in addition to the overall score (i.e., the η estimate). The first question, whether these subscores are reliable enough, is the easier one to address within the IRT framework. Here we focus on the second question: whether the information the subscores provide is distinct enough from the overall score.
We believe that if a subscore is surprising given an individual's overall score (i.e., if the ξ_n estimate cannot be well predicted by the η estimate), it should be reported, because it adds information. This is similar in spirit to Feinberg and von Davier's (2020) idea of identifying unexpectedly high or low subscores by comparing observed subscores against a discrete distribution of subscores conditional on the overall proficiency variable in a unidimensional IRT model, though the computations and approach are different.
Our context is a psychiatric assessment tool: the Psychiatric Diagnostic Screening Questionnaire (PDSQ; Zimmerman & Mattia, 2001). The PDSQ is a widely used self-report instrument. In particular, it was used in the well-known Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial, a federally funded large-scale study comparing depression treatments. The instrument consists of 139 dichotomous items that cover the 15 most prevalent DSM-IV (American Psychiatric Association, 1994) Axis I disorders. Using STAR*D data, which we also use here, Gibbons et al. (2009) showed that a bifactor model, which includes a general psychiatric distress dimension and 15 domain-specific latent dimensions, provides a plausible theoretical and statistical structure for the instrument.
In our preliminary analysis, three symptom domains are excluded: alcohol abuse dependence (ALC), drug abuse dependence (DRUG), and psychosis (PSYCH). The exclusion of the first two is based on the empirical observation that the substance abuse domains were rather distinct from the other domains, as judged from the item slopes. The psychosis domain is excluded because the STAR*D participants were screened positive for non-psychotic major depressive disorder. Two more items in the major depressive disorder (MDD) domain were further excluded due to their ill fit. Therefore, the MDD domain is measured by 19 dichotomous items, while the rest score on the other 11 domains can range from 0 to 100. Item parameters are calibrated based on a sample of 3999 participants and are assumed to be fixed in the subsequent analysis. The illustrative task is to identify combinations of observed scores on the MDD domain (i.e., s_1) versus the rest (i.e., s^{(1)}) that signal that reporting the MDD subscore would add information to the overall score.
We note that the summaries of the posterior distribution (i.e., μ and Σ), along with the probability associated with each observed score combination, obtainable from the Lord-Wingersky algorithm 2.5, can be utilized to capture the statistical relationship between ξ_n and η and thereby facilitate subscore reporting. In the simplest instantiation, we regress the ξ_1 estimate of each score combination (s_1, s^{(1)}) on the η estimate, weighted by the corresponding marginal probability p(s_1, s^{(1)}). A 95% prediction interval from this weighted least squares regression can be calculated (Fig. 3) from the regression parameter estimates and serves as the basis for evaluating the bivariate normal approximated posterior of each (s_1, s^{(1)}). For each score combination, the proportion of the posterior density volume that falls within the prediction interval is computed, akin to a p-value. The smaller the proportion, the more necessary it is to report the ξ_1 estimate associated with the score combination. For example, as in Fig. 3, the MDD subscore associated with (7, 71) should be reported, while for another combination, (7, 9), it may not be necessary. Table 9 shows the proportions of posterior volume that fall in the prediction interval, with darker cells indicating lower proportions.
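The proportion of posterior volume inside the prediction band can be approximated by summing the quadrature-grid posterior over cells that fall within the band. A sketch with placeholder band functions standing in for the weighted-regression limits (all names are our own):

```python
import numpy as np

def volume_in_band(post, eta_grid, xi_grid, lo, hi):
    """Proportion of posterior volume whose xi value lies inside the
    prediction band [lo(eta), hi(eta)].

    post     : normalized posterior on the (eta, xi) grid, shape (Qe, Qx).
    eta_grid : quadrature points for eta (length Qe).
    xi_grid  : quadrature points for xi (length Qx).
    lo, hi   : callables giving the band limits at each eta value
               (placeholders for the regression prediction limits).
    """
    inside = np.zeros_like(post, dtype=bool)
    for q, eta in enumerate(eta_grid):
        inside[q] = (xi_grid >= lo(eta)) & (xi_grid <= hi(eta))
    return post[inside].sum()
```

A small proportion flags a score combination whose subscore is poorly predicted by the overall score and hence worth reporting, mirroring the p-value-like interpretation in the text.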

Detecting Aberrant Score Combination Pattern
As mentioned in Sect. 4.1, the probability of each observed score combination is a by-product of the Lord-Wingersky algorithm 2.5. When arranged in a contingency table, the probability of observed subscore combination (s n , s (n) ) can be used to detect aberrant score combinations through the construction of posterior high-density region (HDR; Novick & Jackson, 1974). A low probability indicates that the co-occurrence of corresponding summed scores is rare. Depending on context, this approach can be useful for diagnosis of lack of person fit or for forensic data analysis in test security.
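The HDR construction described in this section (sorting cell probabilities from largest to smallest and accumulating until the target coverage is reached) can be sketched as follows; the function name and the handling of the threshold-crossing cell are our own choices:

```python
import numpy as np

def hdr_mask(p_table, alpha=0.95):
    """100*alpha% highest-density region over observed score combinations.

    p_table : array of score combination probabilities p(s_n, s_rest).
    Returns a boolean mask of the same shape: True marks cells inside
    the HDR; False marks rare (aberrant) combinations.
    """
    flat = p_table.ravel()
    order = np.argsort(flat)[::-1]        # sort cells, largest first
    cum = np.cumsum(flat[order])          # cumulative coverage
    keep = np.zeros(flat.size, dtype=bool)
    keep[order[cum <= alpha]] = True      # cells fully below the threshold
    crossing = np.searchsorted(cum, alpha)
    if crossing < flat.size:              # include the cell that crosses alpha
        keep[order[crossing]] = True
    return keep.reshape(p_table.shape)
```

Score combinations outside the returned mask are the low-probability cells (the dark gray cells of Figure 4) that may warrant person-fit or test security follow-up.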
We illustrate this application of Lord-Wingersky algorithm 2.5 with the Quality of Life (QoL) Scale for the Chronically Mentally Ill (Lehman, 1988). Many previous studies indicate a bifactor model fits the 35-item QoL scale extremely well (e.g., Gibbons et al., 2007;Cai & Hansen, 2013).
Beyond an overall quality of life item, there are 7 subscales (Family, Finance, Health, Leisure, Living, Safety, and Social), each of which includes 4 to 6 items. The dataset used here includes responses from 586 patients. To aid presentation, the original 7-category rating scale items are recoded to have two categories (i.e., 0, 1, and 2 in the original scale are recoded as 0; 3, 4, 5, and 6 as 1). Here, we construct the high-density region (HDR) of combinations of the score on Health (s_1) and the rest score (s^{(1)}) using the Lord-Wingersky algorithm 2.5. s_1 ranges from 0 to 6, and s^{(1)} ranges from 0 to 29. To construct an HDR of level α, we first stack the p(s_1, s^{(1)}) of each score combination into a single column, sort the probabilities from largest to smallest, and then compute the cumulative distribution of these probabilities. Observed score combinations that contribute to the first 100α% of the cumulative distribution are identified as the 100α% HDR. Figure 4 shows the HDR for the illustrative task. The unshaded cells represent the 95% HDR. The light gray cells together with the unshaded cells represent the 99% HDR. The dark gray cells represent observed subscore patterns that rarely occur. For example, the score combinations (0, 29) and (6, 0) rarely occur. Individuals with such score combinations deserve further attention.

Discussion
The original Lord-Wingersky (1984) algorithm was developed for binary items under unidimensional IRT models. The algorithm was then expanded to polytomous unidimensional IRT models (Hanson, 1994; Thissen et al., 1995; von Davier & Rost, 1995). The Lord-Wingersky algorithm version 2.0 (Cai, 2015) was proposed to compute likelihoods associated with overall summed scores in the context of hierarchical item factor models. In the present article, we proposed the Lord-Wingersky algorithm 2.5 as an extension of Cai's (2015) Lord-Wingersky algorithm 2.0.
The algorithm yields a characterization of the bivariate posterior associated with observed score combinations from the mutually exclusive clusters of items in the model. The algorithm uses more observed information than the Lord-Wingersky algorithm 2.0 (observed score combinations instead of one overall summed score). Thus it can provide additional information that is useful in practice (summed score likelihoods for all latent dimensions instead of the likelihood for the primary latent dimension only). The Lord-Wingersky algorithm 2.5 also remains computationally efficient due to the continued use of dimension reduction. With the Lord-Wingersky algorithm 2.5, likelihoods of observed score combinations under several IRT models, including the two-tier model, the bifactor model, and the standard MIRT model, can be computed directly under one algorithm.
The bivariate normal approximation (summarized by μ and Σ) to the posterior associated with each observed score combination, as one of the outputs of the Lord-Wingersky algorithm 2.5, is a reasonable alternative to the actual (intractable) posterior distribution and can serve multiple purposes in educational and psychological measurement. The marginal probability of each observed score combination, which comes as a by-product of the proposed algorithm, is also useful in practice. We use three empirical applications to illustrate the range of possible uses of this new algorithm: (a) translating observed score combinations to aid growth interpretations in educational measurement, (b) facilitating subscore reporting in psychiatric assessment, and (c) detecting aberrant observed subscore combinations in health-related outcomes research.
While applying the proposed algorithm, we assume the IRT model is correct. It is also assumed that item parameters are known and fixed, since in practice the parameter calibration stage and scoring stage are often conducted sequentially. To take into account the uncertainty around item parameters (i.e., standard errors from the calibration stage), we suggest using a multiple imputation (MI; Rubin, 1987)-based approach (e.g., Yang et al., 2012).
Hierarchical item factor models, especially the bifactor model, have seen increasing use in psychological and educational assessment. Recent developments in computational algorithms for estimating multidimensional IRT models (Cai, 2010a; Edwards, 2010) and software, e.g., flexMIRT (Cai, 2017), have brought the usage of MIRT models within reach for routine data analysis. We posit that providing scores based on observed statistics (e.g., summed scores, observed subscale scores) will continue to be desired and useful in practice, and the current study is a further contribution to the IRT scoring literature.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.