The Psychometric Properties of an Internet-Administered Version of the Depression Anxiety and Stress Scales (DASS) in a Sample of Dutch Adults

Psychometric work on the widely used Depression Anxiety and Stress Scales (DASS) has mostly used classical psychometrics and ignored common internet-administered versions. Therefore, the present study used not only classical, but also modern psychometrics based on item response theory (IRT) to evaluate an internet-administered version of the DASS (Dutch translation). Internet-administered DASS data were collected as part of a large internet-based study in the Dutch adult population (n = 7972). Initially, external correlates (i.e., demographics and other measures) and some classical psychometrics (internal consistency, convergent/divergent validity) of the DASS scales were evaluated. Next, IRT was used to investigate the scales' dimensionality, discrimination and item-functioning. Finally, the DASS depression scale was further investigated by linking it to the more clinically oriented Quick Inventory of Depressive Symptomatology (QIDS) using IRT. Initial classical psychometric analyses supported the scales' internal consistency (alpha = 0.94–0.98) and convergent/divergent validity. IRT analyses showed that each of the DASS scales was only suitable to measure variations in a very narrow and rather mild severity range. Linking the DASS depression scale with the QIDS also showed that the DASS depression scale discriminated best in the mild-moderate severity range, but not at higher severity levels that were covered by the QIDS. In conclusion, the scales of the internet-administered DASS show good internal consistency and validity. However, users should be aware that the scales discriminate best at mild-moderate severity ranges in the general population.


Introduction
The Depression Anxiety and Stress Scales (DASS) is a 42-item self-report instrument that was developed to improve the discrimination between depression and anxiety (Lovibond and Lovibond 1995a, b). The DASS is widely used and several psychometric studies have shown the internal consistency, convergent/divergent validity and factorial structure to be satisfactory (Lovibond and Lovibond 1995a; Brown et al. 1997; Page et al. 2007). Despite the abovementioned work, the dimensionality, discriminatory ability and item-functioning of the DASS remain incompletely understood. The previous work has only employed classical psychometric analyses that provide limited information about these important measurement characteristics (e.g., Sijtsma 2009). Fortunately, these aspects can be effectively investigated with modern psychometric methods based on item response theory (IRT; Embretson and Reise 2000). In addition to providing information about the functioning of scales and items within an instrument, IRT allows for deeper investigation of the relationship between scores on one measurement scale and scores on another scale. IRT-based linking, for instance, can be used to map scores on two separate scales that are designed to measure similar constructs (e.g., depression) onto a common underlying severity dimension (Kolen and Brennan 2004; Orlando et al. 2000; Wahl et al. 2014). This mapping provides valuable information about two scales' relatedness in terms of measurement range and discriminative properties, and can be used to evaluate if a measure has the properties needed for administration in specific target groups. Unfortunately, IRT work has only been conducted with the shorter DASS-21 (Shea et al. 2009; Parkitny et al. 2012) but, to our knowledge, not with either a paper-and-pencil or internet-administered version of the full-length DASS.
Another subject that has received relatively little attention in the literature is the psychometric quality and measurement characteristics of the DASS when administered via the internet. Some classical psychometric work that was conducted with an internet-administered version of the full-length DASS (Zlomke 2009) showed that the scales had good internal consistency (alpha = 0.93-0.95). However, modern psychometric (i.e., Rasch) analyses have only been conducted with the internet-administered DASS-21 (Shea et al. 2009). An extensive IRT-based study of the full-length internet-administered DASS could give more insight into the potential usefulness and added value of the instrument for large-scale and low-cost mental health research (e.g., Coles et al. 2007; Naglieri et al. 2004; Gosling and Mason 2015). Online mental health assessments have several advantages: (a) lower rates of socially desirable responding and decreased social anxiety (e.g., Joinson 1999), (b) the possibility to include those otherwise unable or unwilling to visit a research site (internet samples tend to be substantially more diverse than conventional samples), and (c) the potential for using (computerized) adaptive testing to shorten assessment time and personalize measurements (Gibbons et al. 2008; Buchanan 2002; Gosling and Mason 2015). Ideally, the psychometric characteristics of internet-administered versions of instruments should be investigated with dedicated studies, as findings for paper-and-pencil versions of questionnaires do not necessarily generalize to internet-administered versions (Buchanan 2002).
The current study addresses the issues described above by evaluating the classical and modern psychometric properties of an internet-administered Dutch version of the DASS in a group of population-dwelling Dutch adults (N = 7972). First, preliminary classical psychometric analyses were conducted (internal consistency and convergent/divergent validity). Next, IRT was used to investigate each scale's measurement properties (i.e., discriminative ability; range of measurement). To gain more insight into the meaning and functioning of the DASS depression scale in the context of more broadly defined clinical depression, IRT-based linking was used to place the DASS depression scores onto a common scale with scores on the Quick Inventory of Depressive Symptomatology (QIDS), which is conceptually closer to the clinical definition of major depressive disorder (MDD; Diagnostic and Statistical Manual fifth edition) and includes a broader set of clinically relevant criterion symptoms (i.e., several somatic/vegetative symptoms and suicidality). These analyses were used to gain some insight into the extent to which the DASS depression-scale scores actually capture variations in clinically defined depression severity.

Method

Participants and Procedures
The data were collected as part of a large-scale project (van der Krieke et al. 2016), which aimed to investigate the distribution of mental health dimensions in the Dutch population, focusing both on mental vulnerabilities/problems (e.g., mood, anxiety, stress) and mental strengths (e.g., positive affect, humor, empathy, well-being). The project was advertised through a press release by the University Medical Center Groningen (UMCG), after which it was picked up by local and national media. Participants could go to the project website (www.hoegekis.nl), create an account, and fill in a questionnaire assessing basic socio-demographics (e.g., age, gender, education, social status, employment). After this, participants could choose to complete different modules of questionnaires (e.g., affect/mood, well-being, mental strengths; van der Krieke et al. 2016). After completion of each module, participants received automated feedback about their scores, including a comparison with the other participants' scores. Participants were informed about the project and the fact that their data were to be stored, anonymized and used for scientific research before deciding to continue and participate in the research project. The study protocol was reviewed by the Medical Ethical Committee of the UMCG and exempted because it concerned a nonrandomized open study targeted at anonymous volunteers in the general public. In a period of exactly one year (December 13th 2013 - December 13th 2014), 12,501 subjects registered online with the research project. Of these, 7972 (63.8%) completed the DASS. Those who did complete the DASS were older (mean: 46.2 years [s.d. = 14.9] vs. 43.7 years [s.d. = 14.9]; t = −9.2, p < 0.001; Cohen's d = 0.17) and more often female (67.5% vs. 61.1% female; χ² = 52.7, p < 0.001; Cramer's V = 0.07) compared to those who did not fill in the DASS (n = 4529; 36.2%).
All questionnaires were administered via the internet and participants could only submit their responses if all items in a questionnaire were completed.

Measures
The DASS (Lovibond and Lovibond 1995a, b; Dutch translation: de Beurs et al. 2001) is a 42-item self-report questionnaire with items rated on a 4-point (0-3) Likert scale. The DASS consists of three subscales of 14 items each to assess the specific emotional dimensions, viz., 'depression', 'anxiety', and 'stress'. The Dutch translation of the DASS was previously found to have good psychometric properties (de Beurs et al. 2001).
The official Dutch translation of the Quick Inventory of Depressive Symptomatology self-report (QIDS; Rush et al. 2003) was used to measure depression severity (translation details are provided on the IDS website: www.ids-qids.org).
The QIDS is a questionnaire consisting of 16 items rated on a 4-point Likert scale (0-3) and covers all criterion symptoms of a depressive episode according to the DSM. Nine scores are summed to obtain a total depression severity score (range: 0-27). For three symptom domains, only the highest item score is used in the sum score: the sleep items (items 1-4), the appetite and weight loss/gain items (items 6-9) and the psychomotor agitation/retardation items (items 15-16). The paper-and-pencil version of the QIDS was previously shown to have good internal consistency (alpha = 0.86; Rush et al. 2003), convergent validity and construct validity (Reilly et al. 2015). Studies specifically investigating the Dutch QIDS translation found adequate psychometric properties (e.g., Lako et al. 2014).
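The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not official scoring software; the function name `qids_total` is hypothetical, and it is assumed that the six items outside the three grouped domains (items 5 and 10-14) each contribute their own score, which is what yields nine summed scores.

```python
def qids_total(items):
    """Sum score (0-27) for 16 QIDS ratings, following the rule in the text.

    `items` is a list of 16 integer ratings (0-3); items[0] is item 1.
    Assumption: items 5 and 10-14 are each counted individually.
    """
    if len(items) != 16 or any(r not in (0, 1, 2, 3) for r in items):
        raise ValueError("expected 16 ratings in the range 0-3")

    def item(n):
        # 1-based access, matching the item numbers used in the text
        return items[n - 1]

    sleep = max(item(n) for n in (1, 2, 3, 4))            # highest sleep item
    appetite_weight = max(item(n) for n in (6, 7, 8, 9))  # highest appetite/weight item
    psychomotor = max(item(n) for n in (15, 16))          # highest psychomotor item
    single = sum(item(n) for n in (5, 10, 11, 12, 13, 14))
    return sleep + appetite_weight + psychomotor + single


# Hypothetical response pattern, for illustration only
example = [0, 2, 1, 0, 1, 0, 0, 3, 0, 1, 0, 0, 2, 1, 0, 1]
print(qids_total(example))  # 2 (sleep) + 3 (appetite/weight) + 1 (psychomotor) + 5 = 11
```

Because three domains contribute only their maximum, the total is bounded at 9 × 3 = 27 even though 16 items are rated.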
The Dutch translation of the Positive Affect and Negative Affect Schedule (PANAS; Watson et al. 1988; Dutch translation: Peeters et al. 1996) is a self-report questionnaire consisting of 20 items rated on a 5-point Likert scale (1-5) that assess the presence of several emotions in the week prior to the assessment, including today (in the version used here). The PANAS consists of two 10-item subscales: Negative Affect (NA) covers negative emotions and distress (e.g., feeling 'guilty', 'pessimistic') and Positive Affect (PA) covers positive emotions (e.g., feeling 'interested', 'enthusiastic'). In the current study, an internet-administered version of the PANAS was used. Previously, the PANAS has been shown to have good internal consistency (alpha = 0.84-0.89 for NA and alpha = 0.84-0.89 for PA; Watson et al. 1988; Crawford and Henry 2004). The Dutch translation has also been shown to have good psychometric properties (Peeters et al. 1996; Engelen et al. 2006).

Statistical Analyses
Associations between DASS scales and sociodemographic factors were investigated by comparison of median DASS scores between sociodemographic groups using nonparametric Mann-Whitney U tests (for comparison of 2 groups) and Kruskal-Wallis (K-W) tests (for comparison of 3 groups). Effect-sizes for these non-parametric analyses were calculated using the formulas presented in Fritz et al. (2012) and Cohen (2014). These analyses were conducted with R (version 3.4.0; R Core Team 2015).

Classical Psychometrics
Cronbach's alpha and average inter-item correlations were calculated based on the polychoric item-correlation matrix. Spearman correlation coefficients were calculated to investigate the inter-relationships between each of the DASS scales, and of the DASS scales with the QIDS and the PANAS scales. These analyses were conducted with R-package 'psych' (version 1.7.5; Revelle 2015).

Item Response Theory
Prior to IRT analyses, each scale's unidimensionality was checked by running a 3-factor exploratory factor analysis (EFA) with a bifactor rotation, and inspecting the proportion of explained variance for the first extracted factor, using ≥70% explained variance as the cutoff for sufficient unidimensionality (Reise et al. 2010). If sufficiently unidimensional, each scale was investigated with IRT analyses. Because the DASS items had an ordinal response scale, a Graded Response Model (GRM; Samejima 1969) was fitted to each of the DASS subscales. This model estimates two types of parameters for each item: the discrimination parameter (α) describes how strongly an item is related to the underlying severity dimension, and the threshold parameters (β; one for each boundary between adjacent response categories) describe the severity of the symptom described by the item. In a constrained model, α is constrained to be the same across items, which implies that all items are equally strongly related to the underlying dimension, similar to a polytomous Rasch model. In an unconstrained model, α is estimated for each item separately. Both model variants were fitted and compared with a likelihood ratio test. For the best-fitting model, the items' thresholds and discrimination parameters were inspected. In addition, the scales' test information curves (TIC) were inspected to gain insight into the information each scale provided along the underlying severity dimension. The item information curves (IIC) were also inspected to evaluate specific items' individual contributions to measurement information. IRT analyses were conducted with R-package 'ltm' (version 1.0-1; Rizopoulos 2006).
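To make the roles of α and β concrete, the following minimal Python sketch computes GRM category probabilities and item information for a single item, using the logistic form of Samejima's model. It is an illustration only and not the 'ltm' implementation used in the study; the function names and the parameter values in the usage lines are hypothetical.

```python
import math


def _cumulative(theta, alpha, betas):
    # P(X >= k) for k = 0..K+1, with P(X >= 0) = 1 and P(X >= K+1) = 0.
    # `betas` must be in increasing order (one threshold per category boundary).
    inner = [1.0 / (1.0 + math.exp(-alpha * (theta - b))) for b in betas]
    return [1.0] + inner + [0.0]


def grm_category_probs(theta, alpha, betas):
    """Probability of each response category (0..len(betas)) at severity theta."""
    cum = _cumulative(theta, alpha, betas)
    return [cum[k] - cum[k + 1] for k in range(len(betas) + 1)]


def grm_item_information(theta, alpha, betas):
    """Item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = _cumulative(theta, alpha, betas)
    info = 0.0
    for k in range(len(betas) + 1):
        p = cum[k] - cum[k + 1]
        # derivative of the category probability with respect to theta
        dp = alpha * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        if p > 0:
            info += dp * dp / p
    return info


# Hypothetical 4-category item (ratings 0-3) with three thresholds
alpha, betas = 2.0, [0.5, 1.5, 2.5]
for theta in (-1.0, 1.5, 4.0):
    probs = grm_category_probs(theta, alpha, betas)
    print(theta, [round(p, 3) for p in probs],
          round(grm_item_information(theta, alpha, betas), 3))
```

Summing item information over the items of a scale gives the test information curve. An item with a high α concentrates its information around its thresholds, which is why a set of items with overlapping thresholds yields high precision over only a narrow severity interval.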

Linking
The items of the DASS depression scale and the QIDS were mapped on a common scale to investigate how measurements with the DASS depression scale were related to measurements with the QIDS, in terms of discriminative ability and the severity range of measurement. To facilitate this linking process, a set of constants (based on similar items in both scales) was identified for calibration. The following item-pairs were identified based on similarities in content: (a) feeling sad/depressed (DASS-13 and QIDS-5; polychoric correlation (r_pch) = 0.86); (b) loss of interest (DASS-16 and QIDS-13; r_pch = 0.74); and (c) feelings of worthlessness in comparison to others (DASS-17 and QIDS-11; r_pch = 0.81). For the linking analyses, item parameters for the DASS depression scale and QIDS were estimated first with a GRM in ltm. Next, the items of both instruments were placed on a common scale. To do this, the IRT parameters of the QIDS items (thresholds and discrimination) were rescaled to the scale of the DASS depression scale, which was used as reference. This was done by use of linking constants obtained with the Stocking-Lord calibration linking method (Stocking and Lord 1983). Linking analyses were rerun with an alternative calibration method (Haebara 1980) to investigate the consistency of the results. Next, item parameters of both instruments were investigated and the IICs were inspected and compared between the items of the two questionnaires, in order to gain insight into their comparative coverage along the depression severity spectrum. Finally, observed DASS scores were equated to observed QIDS scores (original scoring), based on corresponding theta-values on the shared underlying depression severity dimension. Linking was performed with the R-package 'plink' (version 1.5-1; Weeks 2010).
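The rescaling step can be illustrated with a short Python sketch. Linking methods such as Stocking-Lord and Haebara yield two constants, a slope A and an intercept B, that define a linear transformation of the severity dimension (θ* = Aθ + B). The sketch below only applies such constants to item parameters; it does not estimate them and does not reproduce the 'plink' implementation. The function name and all numeric values are hypothetical.

```python
import math


def rescale_item(alpha, betas, A, B):
    """Place one item's parameters on the reference scale.

    Under theta* = A * theta + B, discrimination is divided by A and each
    threshold is transformed like theta, so alpha * (theta - beta) is unchanged.
    """
    return alpha / A, [A * b + B for b in betas]


def cumulative_prob(theta, alpha, beta):
    # Logistic probability of responding at or above the threshold beta
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))


# Hypothetical linking constants and item parameters, for illustration only
A, B = 1.2, 0.4
alpha, betas = 1.8, [0.2, 1.1, 2.3]
alpha_star, betas_star = rescale_item(alpha, betas, A, B)

# The response probabilities are identical before and after rescaling
theta = 0.9
theta_star = A * theta + B
for b, b_star in zip(betas, betas_star):
    assert abs(cumulative_prob(theta, alpha, b)
               - cumulative_prob(theta_star, alpha_star, b_star)) < 1e-12
print(alpha_star, betas_star)
```

Because the transformation leaves the model-implied response probabilities unchanged, thresholds and information curves of items from both instruments can be compared directly once they are expressed on the same scale.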

Results
Median DASS scale scores were significantly higher in females, in younger age groups, in the unmarried group, in the unemployed group, and in those with less than a college education (Table 1), although the observed effect sizes were small.

Classical Psychometric Characteristics
Cronbach's alpha coefficients (see Appendix Table 5) indicated very high internal consistency for each of the scales (alpha = 0.94-0.98). Average inter-item correlations were high for the anxiety (0.55), stress (0.56) and depression (0.74) scales. In addition, the DASS scales showed strong intercorrelations (ρ = 0.60-0.69). Spearman correlations between the DASS scales and the QIDS (alpha = 0.88), NA (alpha = 0.91) and PA (alpha = 0.93) indicated moderate to strong interrelatedness, with the strongest correlations being observed between the DASS depression scale and the QIDS (ρ = 0.77), between the DASS stress scale and NA (ρ = 0.74) and between the DASS depression scale and PA (ρ = −0.68). The weakest correlation was observed between DASS anxiety and PA (ρ = −0.43). The results were stable across gender, age-groups, and gender-by-age subgroups.
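The relation between the reported average inter-item correlations and the alpha coefficients can be checked with a small worked example. The sketch below assumes that alpha was computed in its standardized form from the correlation matrix, in which case alpha = k·r̄ / (1 + (k − 1)·r̄) for a scale of k items with mean inter-item correlation r̄; whether this matches the exact computation in 'psych' is an assumption. The function name is hypothetical.

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha from the number of items and mean inter-item correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Each DASS scale has 14 items; mean inter-item correlations as reported above
for scale, mean_r in {"anxiety": 0.55, "stress": 0.56, "depression": 0.74}.items():
    print(scale, round(standardized_alpha(14, mean_r), 2))
# anxiety 0.94, stress 0.95, depression 0.98
```

Under this assumption, the reported correlations reproduce the reported alpha range (0.94-0.98). The formula also shows why alpha is so high here: with 14 items, even a mean correlation of 0.55 already pushes alpha above 0.94.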

Item Response Theory Analyses
Bifactor EFAs of the individual scales showed that the first general factor explained more than 70% of the variance in each of the DASS scales, indicating sufficient unidimensionality for IRT analyses. For each of the DASS scales, the unconstrained GRM fit the data better than the constrained GRM (Depression: LRT = 2517.1; df = 13, p < 0.01; Anxiety: LRT = 1923.3; df = 13, p < 0.01; Stress: LRT = 2368.4; df = 13, p < 0.01). This indicated that items differed with respect to their discriminatory ability.

Depression Scale
The lower end of the depression scale (see Table 2) was covered by symptoms of mood and motivational disturbance, e.g., 'sad/depressed mood' (item 13), 'feeling down' (item 26), 'difficulty to get going' (item 5). The highest end of the measured severity dimension was covered by items tapping into anhedonia, such as 'lack of enthusiasm' (item 31), 'no positive feelings' (item 3), and 'no interest in anything' (item 16). The TIC and IICs (Fig. 1) indicated that items varied in terms of the amount and severity range of the provided information. Several items provided remarkably high levels of information, for example item 37 ('I could see nothing in the future to be hopeful about') and item 21 ('I felt that life wasn't worthwhile'), which provided 18.1% of the total information. In contrast, item 5 ('I just couldn't seem to get going') and item 42 ('I found it difficult to work up the initiative to do things') were relatively uninformative about severity and provided only 7% of the total information. An examination of the items' thresholds and discrimination parameters showed several subsets of items with comparable measurement properties. For instance, items 17 ('I felt I wasn't worth much as a person'), 21 ('I felt that life wasn't worthwhile'), 34 ('I felt I was pretty worthless'), 37 ('I could see nothing in the future to be hopeful about'), and 38 ('I felt that life was meaningless') showed strongly overlapping thresholds, in line with their overlapping content. This was also observed for items 24 ('I couldn't seem to get any enjoyment out of the things I did') and 31 ('I was unable to become enthusiastic about anything').

Anxiety Scale
The lower end of the anxiety severity dimension was covered by items assessing anxiety and panic, such as 'feeling scared' (item 20), 'feeling close to panic' (item 28), and 'situational anxiety' (item 9). The highest end of the anxiety dimension was covered by items assessing somatic arousal symptoms, such as 'perspiration' (item 19), 'feeling faint' (item 15), and 'difficulties in swallowing' (item 23). Inspection of Fig. 1 showed that items 28 ('I felt I was close to panic') and 36 ('I felt terrified') provided most information. The curves of items 2 ('I was aware of dryness of my mouth'), 19 ('I perspired noticeably') and 25 ('I was aware of the action of my heart') showed that they provided little information along the dimension (apart from some information at the severe end). Several items contributed most of their information at the severe end of the dimension: item 15 ('I had a feeling of faintness') and item 23 ('difficulty swallowing'). Inspection of the item parameters indicated that there was some overlap in item-functioning in the anxiety scale, with the clearest overlap between items 9 ('I found myself in situations that made me so anxious I was most relieved when they ended'), 20 ('I felt scared without any good reason'), 28 ('I felt I was close to panic') and 40 ('I was worried about situations in which I might panic and make a fool of myself').

Stress Scale
The low end of the stress scale was covered by items that assess symptoms of agitation and irritability, such as 'difficulties to relax' (item 8), 'feeling very irritable' (item 27), and 'feeling touchy' (item 18). The severe end was marked [...] Fig. 1 showed that individual items differed substantially in terms of the amount of information they provided along the severity dimension. Items 11 ('I found myself getting upset rather easily') and 27 ('I found that I was very irritable') provided high levels of information, whereas items 14 ('I found myself getting impatient when I was delayed in any way'), 22 ('I found it hard to wind down') and 32 ('I found it difficult to tolerate interruptions') provided relatively little information along most of the dimension. However, only the latter items provided any information at the severe end. In the stress scale, overlap between items' functioning was less pronounced than in the other scales.
Note (Table 2): Per scale, items are ordered in ascending order by mean threshold. Thr1 = threshold between response category 0 and 1; Thr2 = threshold between response category 1 and 2; Thr3 = threshold between response category 2 and 3; Discr = item discrimination parameter.

Linking DASS Depression and the QIDS
The item parameters of the DASS depression scale items and the QIDS items, ordered by increasing mean threshold on the common underlying scale, are shown in Table 3. The two items at the extreme ends of the spectrum showed very low discriminative ability, and were therefore not included in the interpretation of the results. For the remaining items, the range of covered severity was large (lowest threshold at −0.66 and highest threshold at 4.74). The DASS items showed average thresholds ranging from 1.02 to 1.75 (thresholds ranging from −0.66 to 3.01) and the QIDS items showed average thresholds ranging from 1.16 to 3.34 (thresholds ranging from −0.01 to 4.74). This indicates that the DASS items were located more in the lower-middle range of the common severity spectrum, which was also evident from the IICs in Fig. 2. Among the DASS items, only two often-endorsed QIDS items were located at the mild end of the spectrum (QIDS11: 'view of myself', and QIDS5: 'feeling sad'). Most QIDS items provided measurement information in the middle-high range of the common severity dimension. Among these items were DSM criterion symptoms for depression that are not included in the DASS depression scale (i.e., appetite/weight change and psychomotor problems). Similar results were found with another linking method (Haebara), and when using the DASS instead of the QIDS as reference scale (see Appendix Table 4).

Discussion
This paper presented an investigation of the psychometric properties of an internet-administered version of the DASS in a sample of Dutch adults. Previous work showed high internal consistency for the DASS scales, while associations with other instruments indicated good convergent/divergent validity, especially for the depression scale. In line with these previous findings, the current results show that the scales of the internet-administered version also have good classical psychometric properties. Additional modern psychometric analyses showed that the items within each DASS scale showed varying severity and discrimination parameters, although some overlap in item-functioning was observed in the depression and anxiety scales. The measurement information provided by items along the underlying severity dimension also varied within each scale and showed most variation in the anxiety and stress scales. Linking the DASS depression scale items to the items of the QIDS showed that, within the context of a more heterogeneous, clinically defined depression severity spectrum, the DASS items mostly measure in the mild-moderate range of depression severity.
Note (Table 3): Items are ordered in ascending order by mean threshold. Thr1 = threshold between response category 0 and 1; Thr2 = threshold between response category 1 and 2; Thr3 = threshold between response category 2 and 3; Discr = item discrimination parameter. The 4 QIDS sleep items, 2 psychomotor problem items, and weight change and appetite change were not merged for these analyses. (a) These items showed very low discrimination parameters and were not included in the interpretation of the results.
The high alpha coefficients (0.94-0.98) indicated very good internal consistency for the DASS scales. However, together with the high average inter-item correlations (0.55-0.74), these coefficients also suggested that the DASS scales were quite homogeneous in their coverage, especially the DASS depression scale. This is probably because this scale includes overlapping items that measure quite narrow concepts (i.e., depressive cognitions and mood), resulting in a scale that measures a narrow construct (Clark and Watson 1995). Indeed, another direct comparison of the DASS-21 depression scale and the QIDS in a clinical sample showed higher internal consistency for the DASS-21 depression scale, which the authors explained by the fact that the DASS-21 scale is rather homogeneous (mainly cognitive and emotional symptoms) compared to the more comprehensive QIDS, which covers all clinical criteria for a major depressive disorder, including sleeping problems, appetite/weight change, energy-loss and psychomotor retardation/agitation (Weiss et al. 2015). Indeed, deeper investigation of the depression scale with IRT analyses showed strong overlap in item-functioning between items with similar content. For instance, sets of items that all assessed cognitions of worthlessness (items 17, 21, 34, 37 and 38) and items that all assessed lack of positive emotions (items 24 and 31) showed strong overlap. From a theoretical perspective, the fact that many items function in the same way implies that the severity dimension as indexed by the complete scale score has a restricted range. Clusters of similarly functioning items provide a lot of information about a rather small severity interval. Indeed, when mapped on a common severity scale, the DASS items provided most measurement information at the lower end of the overall depression severity spectrum, whereas typical criterion symptoms of clinical depressive episodes that are included in the QIDS but not in the DASS depression scale (i.e., psychomotor symptoms, appetite/weight change and hypo/hypersomnia) were endorsed at higher severity levels.
Importantly, this indicates that the DASS depression scale cannot provide meaningful information along the whole spectrum of depression severity, which could result in ceiling effects when the scale is used in more severely depressed populations. Note that it is not negligence that the DASS included items that are rather similar in content, as the original authors aimed to divide each scale into even more specific 'subscales' of 2-5 items (Lovibond and Lovibond 1995a, b). For instance, the depression scale was meant to assess the following domains: 'dysphoria', 'hopelessness', 'devaluation of life', 'self-deprecation', 'lack of interest/involvement', 'anhedonia' and 'inertia'. However, our results suggest that items of self-deprecation (item 21), devaluation of life (item 38) and hopelessness (item 37) functioned very similarly, indicative of a limited differentiation between these subdomains.

Fig. 2 Test information curves (left) and item information curves (right) for the DASS depression scale (black lines) and the QIDS (blue lines) on the joint underlying depression severity dimension
As stated above, the results show that the DASS depression scale is most useful to differentiate between mild-moderate severity levels. The finding of potentially redundant items may suggest that the depression scale, and possibly the other scales as well, can be shortened without compromising their differentiating ability within this range. Indeed, the short DASS-21 (Lovibond and Lovibond 1995b) includes only seven items per scale and has been quite thoroughly investigated using classical (e.g., Antony et al. 1998; Clara et al. 2001; Sinclair et al. 2012; Osman et al. 2012; Gomez et al. 2014) and modern (Shea et al. 2009; Parkitny et al. 2012) psychometric techniques. However, the depression scale of the DASS-21 still includes sets of items that were found to overlap in this study (DASS-21 items 17 and 21 [worthlessness/meaninglessness] and items 3 and 16 [lack of positive feelings/enthusiasm]). Based on the present findings, further shortening of the DASS scales could be considered. For instance, calculations in the current dataset showed that shortening the DASS depression scale to 5 items would still result in a scale with good internal consistency (alpha = 0.92; with DASS-21 item 5 ['I found it difficult to work up the initiative to do things'] and item 21 ['I felt that life was meaningless'] removed). Although this observation was based on data collected with the full-length DASS, it is in line with previous Rasch analyses (Shea et al. 2009), which suggested that the depression scale could be improved by removing item 5 ('I found it difficult to work up the initiative to do things'). Alternatively, the DASS depression scale could be extended with a range of more diverse symptoms (e.g., vegetative symptoms) to increase the heterogeneity of the covered domains and the scale's measurement range.
Although the properties of the anxiety scale could not be investigated in as much detail because secondary measures of anxiety were not administered, its average inter-item correlation was considerably lower (0.55) than for the depression scale. Although this indicates that scale homogeneity was less marked, some overlap in item functioning was observed in the IRT results, with four items that cover 'situational anxiety' (items 9 and 40) and 'subjective experiences of anxious affect' (items 20 and 28) providing most of their measurement information at the same severity level. Additionally, information at the mild-moderate end of the anxiety spectrum was mostly provided by items covering situational and subjective anxiety (i.e. panic, feeling scared), whereas information on the moderate-severe end of the spectrum was provided by items covering symptoms of autonomic/somatic arousal (i.e. trembling, perspiring, difficulty swallowing).
Within the stress scale, the average inter-item correlation was also lower than for the depression scale (0.56), but was still high enough to indicate some item redundancy. Although inspection of the IRT parameters of the stress scale showed that there were no sets or clusters of items with strongly overlapping functioning, most items were located relatively close together on the latent dimension (as indicated by their averaged item thresholds). This suggests that there is also room for improvement for the stress scale.
The current study had several strengths, including the large sample size, which provided the possibility to investigate the DASS's psychometric properties in different demographic groups. Additional strengths were the use of modern psychometric techniques, and the linking of DASS depression scores with scores on the QIDS. However, some study limitations should be kept in mind. First, the data were collected in volunteers through an internet platform, which attracted respondents who were relatively highly educated and often female. Consequently, the generalizability of the results to the general population, or to subpopulations that are not covered by the current study, requires further investigation. Second, the full version of the DASS was used, instead of the shorter and often used DASS-21. The generalizability of the psychometric performance results from the current study to the short-form version needs further evaluation. Third, for the DASS anxiety and stress scales, convergent validity could not be investigated very deeply, because more specialized anxiety and stress measures were not administered. Consequently, the linking analyses could only be performed for the DASS depression scale. Finally, the sample was recruited from the general population and no information was available about formal (DSM-5) anxiety/depressive disorder diagnoses, limiting possibilities to test the scales' relationships with diagnosed clinical psychopathology.
A promising direction for further research in the context of online-administered depression and anxiety instruments, including the DASS, is the implementation of computerized adaptive testing. The current results already provide some insight into how the scales' items are distributed along their respective underlying severity spectra (Wahl et al. 2014). Such information is a good starting point for the development of algorithms that can quickly and effectively zero in on a person's severity level, by strategically adapting each next administered item to the responses given on the previous items. Such algorithms could save administration time and would make measurement more personal (e.g., less administration of items that do not apply to the respondents) while increasing precision.
In conclusion, the present classical and modern psychometric investigation showed the internet-administered version of the DASS to (a) have good classical psychometric properties, (b) contain sets of items with similar item-functioning, and (c) be most suitable to measure dimensional depression severity variations in population samples (mild-moderate severity levels).
Funding This research project is funded by a VICI grant (no: 91812607) received by Peter de Jonge from the Netherlands organization for Scientific research (ZonMW) and by the University Medical Center Groningen Research Award 2013 received by Peter de Jonge. Part of the project was realized in collaboration with the Espria Academy.

Compliance with Ethical Standards
Ethical Approval All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed Consent Informed consent was obtained from all individual participants included in the study.
Conflict of Interest Klaas J. Wardenaar, Rob B. K. Wanders, Bertus F. Jeronimus and Peter de Jonge declare that they have no conflict of interest.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.