The aim of the study was to examine the psychometric properties of the nine IGD criteria as measured by the IGDS9-SF in a large and heterogeneous sample of adolescent and adult USA gamers using IRT analysis (2PLM). The IRT analysis (based on the binary recoding process outlined for the IGDS9-SF) indicated that the IGDS9-SF items designed to assess IGD criteria are psychometrically sound, as they demonstrated excellent discrimination values. For most criteria measured by the IGDS9-SF, difficulty values were high (above + 2 SD from the mean). The IGD criteria “giving up other activities” (item 5) and “withdrawal” (item 2) had especially high discrimination values (α = + 3.57 and + 3.29, respectively), indicating that they were the strongest items for discriminating between individuals with and without high levels of the IGD trait. These results converge with theoretical assumptions about behavioral addictions and recent empirical findings. More specifically, these findings support Griffiths’ (2005) components model of addiction [which has been demonstrated to overlap with the nine IGD criteria theoretically (see Pontes and Griffiths) and empirically (see Pontes et al. 2014)], which argues that “conflict” (IGD criterion 5) and “withdrawal” (IGD criterion 2) are core features of behavioral addictions such as IGD. Additionally, these findings lend empirical support to recent psychophysiological evidence supporting the presence of withdrawal-like effects in behavioral addictions (see Reed et al. 2017).
The results also revealed that for most IGD criteria as measured by the IGDS9-SF, information values were extremely low up to around + 0.5 SD from the mean, relative to their reliabilities at + 2 SD to + 2.5 SD from the mean. Furthermore, the overall findings for the discrimination values in this study suggest that the nine IGD criteria, as measured by the IGDS9-SF, are able to effectively discriminate between those with high and low levels of the IGD trait. The findings for the difficulty values suggest that for these recoded scores, there is a 50% probability that the criteria would be endorsed as being present when the underlying trait level is around + 2 to + 2.5 SD from the mean, with adequate reliabilities from around the mean to + 2 SD from the mean. It is important, however, to note that these are the reliabilities of the different IGDS9-SF criteria in measuring the IGD trait and not their own individual unique reliabilities (e.g., the reliability of an item measuring its referred criterion). Notwithstanding this, this finding is of utmost clinical importance and underscores the advantages of applying IRT to the investigation of IGD criteria, as standardized psychometric tools designed to assess this phenomenon should be able to discriminate reliably at different levels of the disorder, something that cannot be examined using CTT-based approaches.
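The relationship between discrimination, difficulty, and information described above can be illustrated with the standard 2PLM formulas. The sketch below is illustrative only: the discrimination value (α = 3.57, item 5) is taken from the results discussed here, whereas the difficulty value of + 2.0 SD is an assumed round figure within the reported range, and the function names are hypothetical.

```python
import math


def p_endorse(theta, a, b):
    """2PLM probability of endorsing an item at trait level theta.

    a is the discrimination parameter and b the difficulty parameter.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PLM item at trait level theta."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


A_ITEM5 = 3.57    # discrimination reported for item 5
B_ASSUMED = 2.0   # assumed difficulty (illustrative, not a reported estimate)

for theta in (0.0, 0.5, 1.0, 2.0, 2.5):
    p = p_endorse(theta, A_ITEM5, B_ASSUMED)
    info = item_information(theta, A_ITEM5, B_ASSUMED)
    print(f"theta={theta:+.1f}  P(endorse)={p:.4f}  information={info:.4f}")
```

Under these assumptions, the endorsement probability is exactly 50% when the trait level equals the difficulty value, and the item's information peaks there (at α²/4) while falling close to zero near + 0.5 SD, which mirrors the pattern of low precision at lower trait levels.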
Despite a reasonable level of consistency in the results for the item parameters, there were key differences between the parameter estimates for the criteria worthy of specific note. The results showed that although the discrimination (α) values across the different criteria were high, they varied noticeably (ranging from 1.57 to 3.57), indicating that the IGD criteria as measured by the IGDS9-SF differed in their ability to discriminate between gamers with low and high levels of the IGD trait. The order of the criteria in terms of increasing discrimination values was IGDS9-SF items 8, 1, 6, 9, 3, 4, 7, 2, and 5. The values were 1.57, 2.41, 2.47, 2.50, 2.57, 2.75, 2.82, 3.29, and 3.57, respectively. Based on guidelines proposed by Baker (2001) that discrimination values < 0.64 are considered low, and that discrimination values from 0.65 to 1.34 are moderate, from 1.35 to 1.69 are high, and > 1.69 are perfect (interpreted here as very high), all IGD criteria, except criterion “escape” (item 8), can be considered very high, whereas “escape” (α = 1.57) falls in the high range. “Escape” therefore had a noticeably lower ability to discriminate between those with and without high levels of the IGD trait. In contrast, criteria “giving up other activities” (item 5) and “withdrawal” (item 2) had relatively higher discrimination values (α = 3.57 and 3.29, respectively), and therefore a higher ability to discriminate between those with and without high levels of the IGD trait. There was also some degree of variability for the difficulty values, with values ranging from around + 2 SD to + 2.41 SD from the mean. However, as the differences in the difficulty parameter values were all within 0.5 SD, the differences could be taken as small. Thus, the findings indicated that generally for the recoded IGDS9-SF criteria, there is a 50% probability that they would be endorsed as being present when the underlying IGD latent trait level is around + 2 SD or slightly more than + 2 SD from the mean, depending on the criterion.
Additionally, unlike the other IGD criteria, criterion “escape” (item 8) had low reliability for virtually all levels on the trait spectrum.
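The classification of the discrimination values against the bands attributed to Baker (2001) can be reproduced with a short sketch. The α values are those reported above; the function name is hypothetical, and the treatment of values falling between the quoted bands (e.g., exactly 0.645) is an assumption made so that the cut-offs are contiguous.

```python
# Discrimination (alpha) values by IGDS9-SF item number, as reported in the text.
ALPHAS = {8: 1.57, 1: 2.41, 6: 2.47, 9: 2.50, 3: 2.57,
          4: 2.75, 7: 2.82, 2: 3.29, 5: 3.57}


def baker_label(alpha):
    """Map a discrimination value to the bands quoted in the text.

    Cut-offs are made contiguous (an assumption): values below 0.65 are
    low, below 1.35 moderate, below 1.70 high, and otherwise very high.
    """
    if alpha < 0.65:
        return "low"
    if alpha < 1.35:
        return "moderate"
    if alpha < 1.70:
        return "high"
    return "very high"


# List items in order of increasing discrimination with their labels.
for item, alpha in sorted(ALPHAS.items(), key=lambda pair: pair[1]):
    print(f"item {item}: alpha={alpha:.2f} -> {baker_label(alpha)}")
```

With these cut-offs, item 8 (“escape”) is the only criterion labeled high rather than very high.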
Similar to the findings obtained in the current study, the study by Wu et al. (2017a) that applied the Rasch model to the ratings of the IGDS9-SF, as well as the study by Király et al. (2015) that examined the 2PLM properties for the recoded scores of the IGDT-10, reported wide variability in the item difficulty and/or discrimination parameters. However, there were important similarities and differences between the findings of these studies in comparison to the present study. First, relative to most of the other criteria, Király et al. (2015) found a lower discrimination parameter value for criterion “escape” (item 8), and a higher discrimination parameter value for item 2 (“withdrawal”). Second, it was also found that relative to most of the other IGD criteria, criterion “preoccupation” (item 1) had a lower difficulty parameter value. Third, in line with the findings reported by Király et al. (2015), we also found a relatively low difficulty parameter value for criterion “escape” (item 8).
Although the parameter estimates of the current study were directly compared with the findings of previous studies, caution is advised when interpreting the present results, as IRT utilizes responses to individual items to obtain continuous scaled estimates of the underlying trait or theta. In most statistical modeling packages, the trait value is set at a mean of zero and a standard deviation of one. Since differences in the response categories and scoring methods across the measures used in different studies could lead to different responses by the same individual, the metrics for the measures would be different. Consequently, their parameters cannot be directly compared. Thus, all IRT findings reported in this study are novel, and extend existing psychometric information for the IGDS9-SF that has been derived essentially via CTT-based studies.
Clinical and Diagnostic Implications
The findings obtained in the present study may have key clinical and diagnostic implications for the assessment of IGD and potential implications for revising (or at least investigating further) the nine IGD diagnostic criteria. More specifically, criterion “escape” (item 8) had relatively low discrimination ability, and its reliability was relatively low at all trait levels. The latter means that this criterion does not measure the IGD trait with sufficient accuracy, and may need some revision or even removal from future revisions of the IGD diagnostic criteria. While this study suggests the need to revise the criterion for “escape,” suggesting exactly how it should be revised is beyond the scope of the present study as a clinical sample would be required. Notwithstanding this, at a more general level, the low reliability value for the criterion “escape” may also have direct implications for the use of this criterion in the diagnosis of IGD as it may not be reliable for diagnosing IGD. Our findings suggest that its adequacy as a diagnostic criterion may need to be reviewed in future editions of the DSM. Despite the preliminary evidence found here with regards to the inadequacy of the “escape” criterion, this finding needs to be interpreted with caution because the present sample was community-based and further clinical research is necessary to corroborate or invalidate it. Additionally, as noted by Wender (2004), unlike rating scales, clinical interviews provide opportunities for clinicians to deal with respondents’ uncertainties when answering questions.
The present findings with regards to the criterion “escape” echo those reported by Ko and colleagues (2014) using a clinical sample of disordered gamers (n = 75), remitted disordered gamers (n = 75), and a control group (n = 75), who found that the criterion “escape” (alongside “deception”) presented with the poorest diagnostic accuracy values in comparison to all other criteria. A potential explanation for this finding is that “escape” may be best understood as a gaming motivation and risk factor for disordered gaming (as opposed to a core criterion of IGD), further implying that although “escape” may be a reliable predictor of IGD as found by several studies (e.g., Király et al. 2015; Wu et al. 2017a), it may be better understood as a peripheral feature (i.e., underlying motivation) and not a central aspect of IGD.
The results of this study also have implications for the future use of the IGDS9-SF in research and clinical settings. Given the IRT results, it can be argued that when IGDS9-SF criteria are used as binary scores, they would generally provide a highly discriminative and reliable measure of their underlying IGD latent trait for those with high levels of IGD. In this respect, as item 5 (criterion “giving up other activities”) and item 2 (criterion “withdrawal”) demonstrated sound ability to discriminate between those with and without high levels of the IGD trait, and reliable representation of the IGD trait at high levels, these criteria may be considered important for the identification and diagnosis of IGD. This is an important finding that corroborates recent studies using different methodological approaches with regards to the clinical utility of the criteria “giving up other activities” and “withdrawal.” More specifically, the study by Rehbein et al. (2015) using a large representative sample of 11,003 ninth-graders aged 13 to 18 years from Germany found that “give up other activities” best corresponded with the full IGD diagnosis and that this criterion alongside “withdrawal” were the most relevant and useful criteria for IGD diagnosis. Additionally, Rehbein et al. (2015) also found that although “escape” was endorsed most frequently by their sample, it rarely related to IGD diagnosis, further supporting the present findings with regards to this criterion.
Further implications of the IRT analysis suggest that overall the nine IGD criteria outlined in the DSM-5 are captured in the IGDS9-SF, and more information regarding the validity of the criteria used for diagnosis can be derived from prior and future use of the instrument. Furthermore, IGD can be assessed from both dimensional and categorical vantage points using this instrument. The results obtained here may pave the way to future research, as the present findings may be utilized to discern important screening items for brief screening and intervention for IGD in clinical prevention and intervention research.
Additional implications may be related to the fact that a general rule of thumb in psychological measurement is that scores 2 SD from the mean in the deviant direction are considered clinically meaningful. Since the difficulty parameter values of all the criteria in the IGDS9-SF were at this point or higher, it can be substantiated that an IGD score of + 2 SD may be able to efficiently distinguish those with and without high levels of IGD, which is of utmost importance to any clinical assessment tool.
In addition to this, it is worth noting that the finding that the IGD criteria are generally unreliable and do not adequately represent the appropriate trait, from close to the mean to low trait levels, implies that measuring IGD criteria with the IGDS9-SF may result in unreliable scores for individuals with relatively low levels of the IGD criteria. Thus, its use with individuals with low levels of the IGD criteria may be problematic. This may be particularly relevant for community-based studies and particularly advantageous to clinical-based studies screening individuals with potentially elevated IGD-related symptomatology. Despite this, the use of such measures can still be considered appropriate for epidemiological and prevalence studies since the focus of such studies is not on individuals with low levels of the IGD criteria, but on individuals with high levels of these criteria. The primary goal of such studies is to ascertain prevalence rates of specific disorders in the broader population.
An approach that has been proposed for scoring the IGDS9-SF is to use the total scale scores (Pontes and Griffiths 2015). Given that there was some degree of variability in the difficulty parameters, some may consider that the algebraic summation of unweighted raw scores of the criteria to obtain the total score is mathematically inappropriate. Put simply, assigning the same clinical weight to all nine criteria may be inappropriate given the findings encountered in the present study. Nevertheless, it is argued here that as the variability in the difficulty parameters across the criteria was small, and as the relationships between IRT estimated theta values and the total scores were (fairly) linear from + 2 SD onwards (see Fig. 3), this may not be a problem from a practical viewpoint. However, the findings presented here also showed that IGDS9-SF criteria are generally not reliable and are weak at representing the appropriate traits at low trait levels. As the total score is based on all criteria in the IGDS9-SF, it follows that total scores will include criteria with endorsements of lower response options. Given that at this level there is low reliability and weak representation of the appropriate traits, it can be argued that total scores have questionable utility as they assume all criteria measured by the nine items are equally relevant and important toward IGD diagnosis. Thus, the use of total scores may not be a useful approach for inferring whether an individual would potentially qualify for an IGD diagnosis. It is therefore suggested that when using the IGDS9-SF, the binary recoded scores (as suggested by Pontes and Griffiths 2015 and applied in the present study) may be the best procedure to use.
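The contrast between total scores and binary recoded scores can be illustrated with a minimal sketch. Several details are assumptions not specified in this section: responses are taken to be on a 1 to 5 scale, only the highest response option (5) is treated as endorsement of a criterion, and the cut-off of five endorsed criteria follows the DSM-5 convention of five or more of nine criteria. The two respondents and all names in the code are hypothetical.

```python
ENDORSE_VALUE = 5    # assumed: only the top response option counts as endorsed
CRITERIA_CUTOFF = 5  # DSM-5 convention: five or more of the nine criteria


def total_score(responses):
    """Unweighted sum of the nine raw item responses."""
    return sum(responses)


def binary_recode(responses):
    """Recode each raw response to 1 (criterion endorsed) or 0 (not endorsed)."""
    return [1 if r == ENDORSE_VALUE else 0 for r in responses]


def meets_cutoff(responses):
    """True when the number of endorsed criteria reaches the cut-off."""
    return sum(binary_recode(responses)) >= CRITERIA_CUTOFF


# Two hypothetical respondents with identical total scores.
respondent_a = [5, 5, 5, 5, 5, 1, 1, 1, 1]  # five criteria at the top option
respondent_b = [3, 3, 3, 3, 3, 3, 3, 4, 4]  # moderate responses throughout

for name, resp in (("A", respondent_a), ("B", respondent_b)):
    print(name, total_score(resp), sum(binary_recode(resp)), meets_cutoff(resp))
```

Both hypothetical respondents obtain a total score of 29, yet only respondent A endorses five criteria under the assumed recoding, which shows how a total score can mask differences that the binary recoded scores preserve.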
In summary, the major contribution of the study is that it is one of the few studies to provide IRT properties of the IGD criteria as measured by the IGDS9-SF. The IRT findings indicated that all the criteria assessed by the psychometric test were strong discriminators of the IGD trait. Additionally, they measured more of the IGD trait variance and with more precision in the upper half of the trait continuum, which is useful for clinical and epidemiological studies. Despite these new findings, there are also potential limitations in the study that need to be acknowledged. First, as this study examined gamers from online communities, the findings may not be totally applicable to clinical samples. Second, it is important to keep in mind that the information for the IGDS9-SF was derived from self-ratings, which may be affected by common method variance. However, modeling common method variance effects in IRT procedures is complex, and was not possible in the 2PLM used in this study. Thus, it is not certain how method-related effects could have confounded the results in this study. Third, to the extent that this study examined the recoded scores of the IGDS9-SF in a USA sample, the findings must be seen as limited to gamers from this particular country. Additionally, the relevance of the findings and conclusions made here for the IGDS9-SF to other IGD rating scales or interview-based data in other national groups is uncertain and warrants further research. Fourth, as the IGDS9-SF is a clinical measure to facilitate the diagnosis of IGD, it would be useful to replicate this study with individuals diagnosed with IGD or with high levels of IGD criteria. Finally, the appropriateness of the application of 2PLM in the study could be questioned. This model assumes that traits are bipolar, that is, both ends of the trait continuum scale represent meaningful variations of the trait.
Thus, the mean of the latent trait is defined as zero, with low scores reflecting levels below the average. According to Reise and Waller (2009), many clinical constructs could be unipolar, where one end of the trait continuum represents severity and the other end represents its absence. Lucke (2014) has suggested that for such traits, the person with a certain amount of the trait has to be referenced to the level of no trait, and not the mean. This means that low scores represent the absence of the trait and not scores below the average, and thus zero is the lowest possible latent trait score. Interestingly, he developed new IRT models (called unipolar item response models) and illustrated their applications with reference to a gambling addiction scale. Although such models may seem a viable alternative to the 2PLM for application in the current study, Lucke (2014) has pointed out that the assumption in unipolar item response models, that the probability of item endorsement is zero for those individuals with a trait level at zero, does not necessarily apply to other unipolar traits. Thus, it does not make sense to diminish the relevance of the 2PLM for the current study. Given these limitations, there is a need for additional cross-validation of the findings, keeping in mind the potential limitations discussed here. Despite these potential concerns, at the more general level, this study has shown that the use of IRT procedures can provide valuable additional psychometric information, and also inform practical and theoretical issues relevant for IGD and for clinical psychology in general. It is envisaged by the authors of the study that these findings will facilitate future research using IRT-based models for evaluating the psychometric properties of the different IGD measures that are now available.