Over the past few decades, online gaming (OG) has become incredibly popular worldwide (Hu et al. 2018; Scerri et al. 2018; Stavropoulos et al. 2018a), including in the United States of America (USA) and Australia, where the present study was carried out. Approximately 97% of US adolescents engage in gaming, and the US gaming market's income is estimated at US$12 billion (Norcia 2018). Similarly, the Australian gaming industry exceeds AU$1 billion annually (Brand et al. 2015). Furthermore, 68% of Australians play video games, 90% of households possess some type of gaming device, 78% of Australian gamers are over 18 years old (average age 33 years), and 47% of gamers are female (Brand et al. 2015). These socio-demographic figures contradict the widespread stereotype of gamers as adolescent males (Griffiths et al. 2003). Overall, the rapid growth of OG has been accompanied by uncertainty about which populations are primarily engaged and about OG’s effects on well-being (Anderson et al. 2017; Kuss and Griffiths 2012; Pontes et al. 2017a; Stavropoulos et al. 2018b). Although OG has many positive psychosocial effects (Footnote 1) beyond satisfying one’s leisure needs (Scerri et al. 2018; Carras et al. 2018; Jones et al. 2014), excessive gamers may experience multiple psychosocial repercussions (Footnote 2) that compromise their concurrent and future adaptation (Anderson et al. 2017; Beard et al. 2017; Kuss and Griffiths 2012; Kuss et al. 2017; Pontes et al. 2017b; Kuss et al. 2018; Stavropoulos et al. 2018a, 2018b; Stavropoulos et al. 2019). The extent and significance of these negative repercussions have prompted questions and concerns about whether excessive OG should be classified as a distinct psychopathological entity and included in the broader category of addictive behaviors (Footnote 3) (Kuss et al. 2018; Hu et al. 2018; Scerri et al. 2018).

Disordered Gaming

Addressing these issues, the World Health Organization (WHO 2018) recently included “Gaming Disorder” (GD) among addictive disorders in the proposed beta draft of the International Classification of Diseases (ICD-11). GD describes a pattern of digital or video gaming (whether or not the internet is used) characterized by impaired control, a gradual reduction in other activities, and continuation of gaming despite its negative impact (WHO 2018). The ICD-11 beta draft followed the inclusion of Internet Gaming Disorder (IGD) as a condition warranting further investigation in the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5; American Psychiatric Association [APA] 2013). Because it focuses on disordered gaming involving the internet, the present study uses the IGD construct. The nine IGD criteria are (i) preoccupation with OG; (ii) withdrawal symptoms when OG is prevented; (iii) tolerance, with a progressive increase of OG over time; (iv) using OG to escape or relieve negative mood (e.g., anxiety); (v) unsuccessful attempts to control OG; (vi) continuing excessive OG despite awareness of the risks; (vii) loss of interest in alternative forms of entertainment/satisfaction; (viii) deceiving important/close others regarding time spent OG; and (ix) jeopardizing or losing a significant relationship, job, or educational/career opportunity because of OG. Since the introduction of IGD, researchers and clinicians have begun to converge internationally on the DSM-5 construct when referring to excessive OG and have mostly applied psychometric measures with high IGD construct validity (Gomez et al. 2018a; Pontes et al. 2017a; Stavropoulos et al. 2018c).

Measurement Concerns

Although the introduction of IGD (APA 2013) enhanced conceptual and construct consistency/validity concerning problematic OG, the construct had previously been addressed under numerous definitions, each with its own associated tools/scales (Footnote 4) (e.g., disordered gaming; compulsive gaming; Anderson et al. 2017; Kuss et al. 2017; Pontes and Griffiths 2015; Kuss and Griffiths 2012), and concerns about psychometric equivalence persist across different IGD instruments (Stavropoulos et al. 2018b; Anderson et al. 2017). For instance, research has demonstrated that (i) scale items representing the different (suggested) DSM-5 criteria associate with IGD differently across cultures, (ii) the same item scores may indicate different levels of severity across populations, and (iii) scale items differ in their capacity to discriminate between individuals experiencing different levels of IGD (Gomez et al. 2018a; Pontes et al. 2017; Stavropoulos et al. 2018b). Although psychometric equivalence issues have been confirmed across cultures, the longitudinal psychometric equivalence of IGD scales remains unassessed, despite repeated recommendations (Gomez et al. 2018a; Pontes et al. 2017; Stavropoulos et al. 2018c). This gap is important because using an IGD scale to monitor change, whether in clinical treatment or in community developmental research, requires repeated (longitudinal) measurement, for example to track treatment efficacy. The psychometric properties of the measure must therefore remain stable across time points and groups of participants for it to accurately assess IGD variations over time and to support generalizable results (Gomez et al. 2018b). As with any clinical instrument, differences over time in how respondents answer IGD scales could confound the comparability of repeated measures.
More specifically, and as with invariance across cultures, IGD scale items could associate (i.e., load) differently on IGD over time, and the same item scores may reflect different IGD severities across repeated measures (occasionally due to regression-to-the-mean tendencies; Pontes et al. 2017; Stavropoulos et al. 2018b). In that case, an observed reduction or spike in IGD over time, in either clinical or community samples, could not be accurately interpreted. Accordingly, the purpose of this preliminary study was to address this gap by longitudinally evaluating the psychometric properties of a widely used assessment tool for IGD.

Test-Retest Measurement Invariance

To further evaluate the psychometric properties of scales, test-retest (or longitudinal) measurement invariance (MI) is recommended (Brown 2014; Gomez et al. 2018b). This process confirms that equal ratings at different time points represent the same level (intensity/severity) of the underlying behavior (i.e., IGD; Gomez et al. 2018b). Lack of test-retest MI indicates that ratings obtained at multiple time points cannot be reliably compared, because differences could be confounded by irregularities in the psychometric properties of the scale at these time points (Drasgow and Kanfer 1985). Test-retest MI is not the same as test-retest reliability. Test-retest reliability refers to the consistency (or correlation) between scale scores recorded across repeated measures (Weir 2005), whereas test-retest MI indicates that the same scores observed at different time points reflect the same level of the underlying latent variable; in fact, test-retest MI must be established before test-retest reliability can be meaningfully evaluated (Eignor 2013; Gomez et al. 2018b).

Confirmatory factor analysis (CFA) is a common method for evaluating test-retest MI (Widaman et al. 2010). It involves comparing progressively more constrained models that test successive types of MI across time points: (i) configural invariance (the latent factor structure is the same at each time point [i.e., a unidimensional IGD structure]); (ii) metric invariance (each item is associated with the assessed behavior with the same strength [i.e., equal factor loadings] across repeated measures); and (iii) scalar invariance (the same item scores indicate the same behavior intensity across time points; Stavropoulos et al. 2018b).
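The three nested levels can be written out for ordinal items such as those of the IGDS9-SF. The block below is a sketch that assumes the standard latent-response (threshold) formulation used with WLSMV estimation; the symbols are introduced here for illustration, and identification constraints and correlated residuals for the same item across time points, which vary by specification and software, are omitted.

```latex
\begin{aligned}
x^{*}_{it} &= \lambda_{it}\,\eta_t + \varepsilon_{it},
  && i = 1,\dots,9 \ \text{(items)},\quad t = 1,2 \ \text{(time points)}\\
x_{it} &= c \iff \tau_{i,c-1,t} < x^{*}_{it} \le \tau_{i,c,t},
  && c = 1,\dots,5,\quad \tau_{i,0,t} = -\infty,\quad \tau_{i,5,t} = +\infty\\[4pt]
\text{Configural: } & \text{all items load on one factor } \eta_t \text{ at each } t;\ \lambda_{it},\ \tau_{i,c,t} \text{ free}\\
\text{Metric: } & \lambda_{i1} = \lambda_{i2} \quad \text{for all } i\\
\text{Scalar: } & \lambda_{i1} = \lambda_{i2} \ \text{and}\ \tau_{i,c,1} = \tau_{i,c,2} \quad \text{for all } i,\ c = 1,\dots,4
\end{aligned}
```

Each level adds equality constraints to the previous one, which is what makes the models nested and their fit differences testable.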

Short-Form Nine-Item IGD Scale (IGDS9-SF)

Of the several scales assessing the IGD construct (Footnote 5), the nine-item Internet Gaming Disorder Scale-Short Form (IGDS9-SF) was prioritized in the present study, with respect to its psychometric properties over time, for two significant reasons: (i) it is arguably the most commonly used IGD measure globally and has been validated in more languages than any other IGD instrument; and (ii) its brevity and demonstrated construct validity make it well suited to epidemiological and clinical use. Prior to the present investigation, no study had assessed the psychometric properties of the IGDS9-SF over time, leaving a gap in the reliable assessment of the progression of IGD symptoms in either community or clinical populations (Gomez et al. 2018a; Pontes et al. 2017; Stavropoulos et al. 2018b).

The IGDS9-SF is a short screening tool based on the nine DSM-5 criteria (Pontes and Griffiths 2015). It assesses the severity of IGD symptomatology over the past 12 months. The nine items are answered on a 5-point Likert scale (1 [“never”] to 5 [“very often”]), and the final IGD score is the sum of the responses (range 9 to 45), with higher scores reflecting greater IGD severity. Although there is currently no diagnostic threshold for the IGDS9-SF, the DSM-5 diagnostic framework suggests considering a diagnosis when five (or more) of the nine criteria are met. It follows that answering “very often” to five or more of the nine IGDS9-SF items would warrant further clinical consideration (Pontes and Griffiths 2015; Gomez et al. 2018a).
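A minimal scoring sketch of the rules just described. The function names are hypothetical, and counting a “very often” (5) response as an endorsed criterion follows the rule stated in the text. Because the IGDS9-SF is a screening tool, the flag marks only a response pattern warranting further clinical consideration, not a diagnosis.

```python
def igds9sf_total(responses: list[int]) -> int:
    """Total IGDS9-SF score: sum of nine responses, each 1 ("never") to 5 ("very often")."""
    if len(responses) != 9:
        raise ValueError("the IGDS9-SF has exactly nine items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    return sum(responses)  # range 9..45; higher = greater IGD severity


def criteria_endorsed(responses: list[int]) -> int:
    """Number of items answered "very often" (5), counted here as endorsed criteria."""
    return sum(1 for r in responses if r == 5)


def warrants_further_consideration(responses: list[int], min_criteria: int = 5) -> bool:
    """Hypothetical screening flag mirroring the DSM-5 five-of-nine framework."""
    igds9sf_total(responses)  # validates length and response range
    return criteria_endorsed(responses) >= min_criteria


# Hypothetical respondent (not study data).
example = [5, 5, 4, 5, 2, 5, 1, 5, 3]
print(igds9sf_total(example))                   # 35
print(criteria_endorsed(example))               # 5
print(warrants_further_consideration(example))  # True
```

Note that a high total does not by itself trigger the flag: answering “often” (4) to every item gives a total of 36 but zero “very often” endorsements.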

The Present Study

To the best of the authors’ knowledge, there are currently no studies evaluating the test-retest MI of the IGDS9-SF. Establishing test-retest MI for the IGDS9-SF would show that the same scores obtained at different time points reflect the same severity of IGD symptomatology. Conversely, a lack of test-retest MI would mean that scores observed at different time points cannot be reliably compared, as they could be confounded by psychometric irregularities in the IGDS9-SF. Such evidence would be significant for monitoring changes in IGD manifestations over time in both community and clinical settings. Consequently, the present study examined the test-retest invariance of the IGDS9-SF in two samples of emergent adult Massively Multiplayer Online (MMO) gamers over a 2- to 3-month period. One sample completed the IGDS9-SF face-to-face (FtF) in Victoria (Australia), and the other completed it online in North America. The approximate 3-month timeframe (60–90 days) was chosen as a reference point for critical developments in addictive behaviors (such as IGD) and for variations in the outcome of addiction therapy (Flynn et al. 2003; Sinha 2011). The emphasis on American and Australian gamers is justified because both countries constitute significant and expanding internet gaming markets and are at the forefront of psychometric IGD research and treatment (Stavropoulos et al. 2018b; Stavropoulos et al. 2019). From a developmental perspective, emergent adulthood (18–29 years) was targeted because it has been identified as a relatively under-researched and simultaneously high-risk period for IGD (Adams et al. 2018; Burleigh et al. 2018; Liew et al. 2018). Similarly, the MMO genre was prioritized because it carries a higher IGD propensity than other OG genres (Adams et al. 2018; Burleigh et al. 2018; Liew et al. 2018). Finally, the two data collection methods (online vs. FtF) were assessed independently, as previous literature suggests that they may not yield psychometrically equivalent results when the same measure is used (Weigold et al. 2013).

Methods

Participants

The Australian sample comprised 61 Australian emerging adults from the general community (45 males and 16 females) aged 18 to 29 years (M = 22.53 years, SD = 3.04) who played MMO games (e.g., World of Warcraft). The estimated maximum sampling error for n = 61 is 12.55% (Z = 1.96, 95% confidence level). Measurements were collected FtF over a 3-month period. Attrition was as follows: 61 participants at Time Point 1 (TP1) and 43 at Time Point 2 (TP2), that is, 29.51% attrition from TP1. The Missing Completely at Random Test (MCRT; Little and Rubin 2014) indicated unsystematic attrition (MCRT = 1715.79, p = 1.00).

The US sample comprised 120 (Footnote 6) emerging adults (62 males and 47 females) aged 18 to 29 years (M = 22.35 years, SD = 2.82) who played MMO games. The estimated maximum sampling error for n = 120 is 8.95% (Z = 1.96, 95% confidence level). Measurements were collected via an online questionnaire over an approximately 3-month period (60–90 days). Attrition was non-systematic (MCRT = 88.93, p = 0.26): 120 participants at TP1 and 48 at TP2 (60% attrition from TP1).
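The reported sampling errors and attrition rates for both samples can be reproduced with a short calculation. The text does not state its formula, so the usual worst-case margin of error for a proportion, e = Z·√(p(1 − p)/n) with p = .5 and no finite-population correction, is assumed here; it matches both reported figures. Function names are illustrative.

```python
import math


def max_sampling_error(n: int, z: float = 1.96) -> float:
    """Worst-case (p = 0.5) margin of error for a proportion; no finite-population correction."""
    return z * math.sqrt(0.5 * 0.5 / n)


def attrition_rate(n_tp1: int, n_tp2: int) -> float:
    """Share of TP1 participants who did not complete TP2."""
    return (n_tp1 - n_tp2) / n_tp1


for label, n1, n2 in [("Australia (FtF)", 61, 43), ("USA (online)", 120, 48)]:
    print(f"{label}: max error = {100 * max_sampling_error(n1):.2f}%, "
          f"attrition = {100 * attrition_rate(n1, n2):.2f}%")
# Australia (FtF): max error = 12.55%, attrition = 29.51%
# USA (online): max error = 8.95%, attrition = 60.00%
```

Both errors refer to the baseline (TP1) sample sizes; the smaller TP2 samples would carry larger margins.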

It should be noted that (i) a minimum normative sample of 30 is recommended for the use of parametric criteria in pilot scale assessment studies (Johanson and Brooks 2010); (ii) required sample sizes tend to be smaller when communalities among the examined items are high and when the item-to-factor ratio exceeds 6 (as is the case for the IGDS9-SF; Mundfrom et al. 2005; Pontes and Griffiths 2015); and (iii) the minimum recommended ratio of five participants per item was met for both samples examined here (Dimitrov 2012). In addition, to address potential non-normality, the analyses used the weighted least squares mean- and variance-adjusted (WLSMV) estimator, which is based on polychoric correlation matrices and is appropriate for non-normal distributions (Suh 2015). Table 1 presents general socio-demographic information for the participants who completed the baseline survey.

Table 1 Sociodemographic information of the participants

Measures

Participants answered sociodemographic questions before completing the IGD measure.

The short-form nine-item Internet Gaming Disorder Scale (IGDS9-SF; Pontes and Griffiths 2015) is based on the nine DSM-5 IGD criteria (APA 2013). Items assess IGD behaviors (e.g., “Have you continued your gaming activity despite knowing it was causing problems between you and other people?”) on a 5-point Likert scale (1 “Never” to 5 “Very Often”). Item scores are summed, yielding a final IGD score from 9 to 45, where higher scores reflect stronger IGD behaviors. Internal consistency was very good to excellent across TPs and samples in the present study (Cronbach’s α: Australia TP1 = 0.87, TP2 = 0.89; USA TP1 = 0.91, TP2 = 0.92).
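For reference, Cronbach’s α is k/(k − 1) · (1 − Σσ²ᵢ/σ²ₜ), where k is the number of items, σ²ᵢ the variance of item i, and σ²ₜ the variance of the total score. A minimal sketch follows; the toy data are hypothetical and cannot reproduce the coefficients reported above. Population variances are used throughout as a choice made here; since the same denominator enters both variances in the ratio, sample variances give the same α.

```python
from statistics import pvariance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_columns = list(zip(*rows))                    # one tuple of scores per item
    sum_item_var = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(row) for row in rows])  # variance of total scores
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical data: three identical items give alpha = 1 (up to rounding) ...
identical = [[x, x, x] for x in [1, 2, 3, 4, 5]]
print(round(cronbach_alpha(identical), 6))     # 1.0
# ... while two uncorrelated items give alpha = 0.
uncorrelated = [[1, 1], [1, 2], [2, 1], [2, 2]]
print(round(cronbach_alpha(uncorrelated), 6))  # 0.0
```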

Procedure

The study was approved by the Human Research Ethics Committees of Federation University, Australia (for the Australian sample), and Palo Alto University, USA (for the American sample), as part of a joint project on IGD risk and resilience factors in emergent adults. Australian and US permanent residents or nationals aged 18 to 29 years who identified as MMO gamers were eligible to take part. Australian gamers were informed about the study through both offline advertising (i.e., wall posters and information flyers) and online advertising (email and social networking site links) targeting the general community. Before completing the survey, participants had to carefully read the Plain Language Information Statement (PLIS), which described the research aims and process and clarified that participation was optional and that participants could withdraw at any stage without penalty or explanation. Participants then provided written informed consent for data collection and use. Longitudinal measurement was conducted FtF from June 2016 to September 2016. Uncompensated measurement sessions (25–35 min) were scheduled, with participants’ consent, by a specially trained data-collection team (five undergraduate and two postgraduate psychology students). Measurements were identical across TPs, and data were matched using a re-identifiable code.

US gamers were informed about the study via an Amazon Mechanical Turk (AMT) link posted on various gaming websites and forums. The PLIS covered the aims and reviewed information congruent with that given to the Australian sample and was accessible immediately after participants activated the survey link. Further information about online data collection was provided as part of the consent process. Gamers could enroll only after providing digital consent to the content, process, and aims of the study by ticking a digital consent box. Longitudinal measurement was conducted online from January 2017 to early April 2017, with the measurements included in the present study largely taking place in January and March. Measurements were identical across TPs and were paired using a re-identifiable code. With regard to the online collection of the US sample in particular, it should be noted that (i) online survey applications have been deemed sufficient and reliable modalities for conducting psychological studies (Chandler and Shapiro 2016); (ii) equivalence has been supported between FtF and internet data collection methodologies (Weigold et al. 2013); and (iii) web-based data collection may provide access to otherwise hard-to-reach groups, such as MMO gamers (Griffiths 2010).

Analytic Plan

A test-retest MI CFA procedure based on Brown’s (2014) recommendations was conducted. Specifically, the IGDS9-SF structures at TP1 and TP2 were combined and assessed within the same CFA. Progressively more constrained CFAs then examined configural (item loadings and thresholds free), metric (item loadings constrained to be equal, thresholds free), and scalar (item loadings and thresholds constrained to be equal) invariance, respectively. When one of these successively nested CFAs significantly worsened model fit, the item parameters (loadings or thresholds) responsible for the drop were identified and released one at a time, starting with those with the highest modification indices, until the model fit no longer differed significantly from that of the less constrained model (partial invariance; Millsap and Yun-Tein 2004). Fit differences were evaluated using both the stricter absolute fit criterion, the WLSMV χ² difference test (p < .05 indicating non-invariance), and the more lenient incremental fit differences in RMSEA, CFI, and TLI (ΔRMSEA > .015 and ΔCFI > .015 indicating non-invariance; Chen 2007; Cheung and Rensvold 2002; Gomez et al. 2018b).
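The decision rules and the sequential release of equality constraints described above can be sketched as follows. This is a simplified illustration, not the authors’ code: `toy_compare` is a hypothetical stand-in for refitting a CFA in SEM software, the modification indices are invented, and reading “ΔRMSEA > .015 and ΔCFI > .015” as requiring both cutoffs to be exceeded is an assumption. The fit values used are those reported for the Australian sample, with p < .001 represented as 0.0005. In real analyses modification indices are recomputed after every refit; here they are held fixed for simplicity.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class FitComparison:
    """Fit change from a less constrained to a more constrained nested model."""
    p_chi2_diff: float  # p-value of the WLSMV chi-square difference test
    delta_rmsea: float  # RMSEA(constrained) - RMSEA(less constrained); > 0 = worse
    delta_cfi: float    # CFI(less constrained) - CFI(constrained); > 0 = worse


def non_invariant(cmp: FitComparison, alpha: float = 0.05,
                  cutoff: float = 0.015) -> tuple[bool, bool]:
    """Return (absolute, incremental) non-invariance flags."""
    absolute = cmp.p_chi2_diff < alpha
    # Assumption: both incremental cutoffs must be exceeded.
    incremental = cmp.delta_rmsea > cutoff and cmp.delta_cfi > cutoff
    return absolute, incremental


def release_until_invariant(
    constrained: set[str],
    mod_indices: dict[str, float],
    compare: Callable[[set[str]], FitComparison],
) -> tuple[set[str], list[str]]:
    """Free constraints one at a time (largest modification index first) until
    the stricter absolute test is no longer significant."""
    remaining = set(constrained)
    released: list[str] = []
    while remaining and non_invariant(compare(remaining))[0]:
        worst = max(remaining, key=lambda p: mod_indices.get(p, 0.0))
        remaining.remove(worst)
        released.append(worst)
    return remaining, released


# Hypothetical stand-in for refitting: fit is worse while loading 2 or 3 is held equal.
def toy_compare(constrained: set[str]) -> FitComparison:
    if {"lambda2", "lambda3"} & constrained:
        return FitComparison(0.0005, 0.018, 0.02)  # reported metric-step values
    return FitComparison(0.217, 0.005, 0.01)       # reported partial-metric values


loadings = {f"lambda{i}" for i in range(1, 10)}
mi = {"lambda2": 12.0, "lambda3": 7.5}  # invented modification indices
kept, freed = release_until_invariant(loadings, mi, toy_compare)
print(freed)  # ['lambda2', 'lambda3']
print(non_invariant(FitComparison(0.0005, 0.005, 0.01)))  # (True, False), as at the scalar step
```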

Results

The unidimensional IGD structure was first assessed for each country and TP. Among Australian gamers, the model demonstrated sufficient fit (based on Hu and Bentler [1999] benchmarks) at TP1 (χ² = 28.85, p = .368, CFI = .99, TLI = .98, RMSEA = .039); fit decreased at TP2 but remained adequate (χ² = 34.99, df = 28, p = .088, CFI = .93, TLI = .90, RMSEA = .094). Australian gamers’ unstandardized item loadings ranged from .479 to .938 at TP1 (see Fig. 1) and from .517 to .884 at TP2 (see Fig. 2).

Fig. 1 Australian gamers’ item unstandardized loadings for TP1

Fig. 2 Australian gamers’ item unstandardized loadings for TP2

Among American gamers, the model had adequate fit (based on Hu and Bentler [1999] benchmarks) at TP1 (χ² = 43.83, df = 28, p = .029, CFI = .99, TLI = .98, RMSEA = .093, p < .001); as with the Australian gamers, fit decreased at TP2 (χ² = 68.47, df = 27, p < .0001, CFI = .97, TLI = .96, RMSEA = .148). Among American gamers, unstandardized item loadings ranged from .652 to 1.854 at TP1 (see Fig. 3) and from .554 to 2.507 at TP2 (see Fig. 4).

Fig. 3 American gamers’ item unstandardized loadings for TP1

Fig. 4 American gamers’ item unstandardized loadings for TP2

Test-retest MI analyses for the Australian and American participants followed as a second step. For the Australian gamers, the unidimensional IGD configural model demonstrated acceptable fit (χ² = 184.61, p = .0004; RMSEA = .103; CFI = .94; TLI = .93). Fit dropped significantly at the metric level according to both the more conservative absolute chi-square difference test (ΔWLSMVχ² = 37.11, p < .001) and the more lenient incremental fit differences (ΔRMSEA = .018; ΔCFI = .02). Progressively releasing the loadings of Items 2 and 3, as indicated by modification indices, produced a partial metric invariance model whose fit did not differ significantly from that of the configural model in terms of either absolute fit (ΔWLSMVχ² = 8.29, p = .217) or incremental fit differences (ΔRMSEA = .005; ΔCFI = .01). To test scalar invariance, all item thresholds in this partial metric model were constrained to be equal; this produced a significant drop in fit relative to the partial metric model based on absolute fit differences (ΔWLSMVχ² = 74.97, p < .001), but not based on the more lenient incremental fit differences (ΔRMSEA = .005; ΔCFI = .01). Gradually releasing the thresholds with the highest modification indices, namely all thresholds of Items 1 and 2 and thresholds 1 (“never”) and 3 (“sometimes”) of Items 4, 6, 8, and 9, yielded a partial scalar invariance model with acceptable fit indices (χ² = 207.35, p < .001; RMSEA = .094; CFI = .94; TLI = .94) and nonsignificant absolute and incremental fit differences (ΔWLSMVχ² = 26.77, p = .062; ΔRMSEA = .004; ΔCFI = .01) relative to the partial metric model (see Table 2).

Table 2 Test-retest measurement invariance of IGDS9-SF in Australian and USA gamers

For the American gamers, the unidimensional IGD configural model demonstrated sufficient fit (χ² = 164.68, p = .010; RMSEA = .069; CFI = .98; TLI = .97), and fit did not drop significantly at the metric level on the basis of either absolute or incremental fit differences (ΔWLSMVχ² = 11.14, p = .193; ΔRMSEA = .002; ΔCFI = .00). To test scalar invariance, all item thresholds in the metric model were constrained to be equal; once again, this did not produce a significant drop in fit on the basis of either absolute or incremental fit differences (ΔWLSMVχ² = 30.03, p = .747; ΔRMSEA = −.016; ΔCFI = .00). This final scalar invariance model also demonstrated sufficient fit (χ² = 198.17, p = .062; RMSEA = .051; CFI = .98; TLI = .98; see Table 2).

Discussion

The present study is the first to examine the test-retest measurement invariance of IGD ratings over a 3-month period, as assessed (online and FtF) with the short-form nine-item Internet Gaming Disorder Scale (IGDS9-SF), in two normative national (US and Australian) samples of emergent adult MMO gamers. Before this evaluation, the study examined support for the unidimensional IGD model at both time points in the two samples. The findings indicated reasonable support for the unidimensional IGD structure at both time points in both samples, consistent with other existing data (e.g., Gomez et al. 2018a; Pontes et al. 2017; Stavropoulos et al. 2018a). Two separate sequences of successive CFAs were then conducted to assess the psychometric properties of the IGDS9-SF longitudinally across the two time points in the two populations. Configural invariance was established in both samples, and metric and scalar invariance were supported for the US online sample on the basis of both absolute and incremental fit indices. By contrast, only partial metric invariance (factor loadings of Items 2 and 3 non-invariant) and partial scalar invariance (all thresholds of Items 1 and 2, and thresholds 1 and 3 of Items 4, 6, 8, and 9, non-invariant) were established for the Australian FtF sample based on the more conservative absolute fit difference approach. Taken together, the results can be interpreted as reasonably good support for test-retest measurement invariance over a 2- to 3-month interval in community samples, particularly for US online ratings. Nevertheless, caution is recommended for Australian gamers and/or gamers assessed FtF. More specifically, Items 1 and 2 (preoccupation and withdrawal) may relate differently to the IGD construct over time, while the absent (“never”) and moderate (“sometimes”) scores on Items 4 (loss of control), 6 (conflict), 8 (mood modification), and 9 (consequences) may not discriminate symptom severity equivalently over time.

IGD Unidimensional Structure

The sufficient fit of the unidimensional IGD model at both time points in the two samples of gamers is consistent with past findings across a diverse group of national populations (Gomez et al. 2018a; Pontes et al. 2017a; Stavropoulos et al. 2018b). Consequently, the findings further strengthen previous literature indicating that IGD behaviors are experienced, and therefore reported, as aspects of one problematic dimension that may not be meaningfully divided into further clinical sub-dimensions, such as preoccupation or withdrawal (Pontes and Griffiths 2015). Interestingly, the present study indicates that this unidimensional perception of IGD does not change within a 90-day timeframe, which is considered a significant threshold for variations in addictive behaviors (Flynn et al. 2003; Sinha 2011). Finally, the consistency with which the one-factor structure of the IGD construct is perceived and reported appears not to be confounded by the data collection method (i.e., FtF or online), as the unidimensional model was appropriate for both samples.

Item Loading and Threshold Differences in the Australian FtF Sample

Despite the overall thrust of the present findings, significant variations over time were detected in the Australian FtF sample in the strength of the associations between Items 1 and 2 and the IGD construct (based on both the stricter absolute fit [ΔWLSMVχ²] and the more lenient incremental fit [ΔRMSEA; ΔCFI] differences), as well as in the IGD intensity reflected by scores “1” and “3” on Items 4, 6, 8, and 9 (based only on the stricter absolute fit [ΔWLSMVχ²] differences). Item 1 describes IGD-related “preoccupation” (Do you feel preoccupied with your gaming behavior? Examples: Do you think about previous gaming activity or anticipate the next gaming session? Do you think gaming has become the dominant activity in your daily life?), and Item 2 reflects “withdrawal symptoms” associated with gaming abstinence or reduction (Do you feel more irritability, anxiety, or even sadness when you try to either reduce or stop your gaming activity?). Significant loading differences across the two time points suggest that the extent to which “preoccupation” and “withdrawal symptoms” indicate IGD varies within the interval examined. Interestingly, the strength of the association between preoccupation (Item 1) and IGD was evidenced to increase over the 3-month period (see Fig. 1 vs. Fig. 2), while the reverse occurred with “withdrawal symptoms” (Item 2). Accordingly, it is suggested that reported “preoccupation” be treated as a more significant indication of IGD at the initial assessment and that this weakens over time, whereas the opposite occurs with “withdrawal symptoms.”

Because thresholds “1” and “3” of Items 4, 6, 8, and 9 differed significantly over the 3-month period (based on the stricter absolute fit [ΔWLSMVχ²] differences, but not on the incremental fit [ΔRMSEA; ΔCFI] differences), caution is recommended when comparing these responses over time. More specifically, thresholds 1 (never) and 3 (sometimes) reflected different levels of IGD intensity over time and should therefore be compared cautiously when respondents report “relapse” (Item 4; Do you systematically fail when trying to control or cease your gaming activity?), “relationship conflicts/problems” (Item 6; Have you continued your gaming activity despite knowing it was causing problems between you and other people?), “mood modification” (Item 8; Do you play in order to temporarily escape or relieve a negative mood [e.g., helplessness, guilt, anxiety]?), and “professional or educational functionality issues” (Item 9; Have you jeopardized or lost an important relationship, job or an educational or career opportunity because of your gaming activity?). It is also important to note that thresholds 1 (never) and 3 (sometimes) reflect nonexistent and moderate levels of IGD manifestations, which are considered non-diagnosable (Gomez et al. 2018a). Furthermore, the differences detected were based only on the stricter absolute fit index differences (ΔWLSMVχ²) and not on the more lenient incremental fit index differences (ΔRMSEA; ΔCFI). There is therefore support that all four items provide comparable scores at the diagnosable level, while increased caution and clinical judgment are recommended when comparing “never” and “sometimes” responses provided FtF in a community sample of Australian gamers.

Consistency of Loadings and Thresholds in the US Online Sample

In the US online sample, the findings demonstrated non-significant variations over time in all item loadings and thresholds (based on both the stricter absolute fit [ΔWLSMVχ²] and the more lenient incremental fit [ΔRMSEA; ΔCFI] differences). These results indicate that all items’ associations with the IGD construct remain stable over the 3-month period and that the same scores indicate comparable levels of severity at both time points. Therefore, the IGDS9-SF can be used to assess and monitor changes in IGD behaviors in online US community samples, as such changes are unlikely to be confounded by psychometric irregularities of the scale.

The differences in the test-retest invariance findings between the Australian FtF sample and the US online sample need to be interpreted with caution. First, they could indicate measurement non-invariance, and thus differences in the psychometric properties of the scale, across the two populations, as already highlighted by existing research (Stavropoulos et al. 2018b). Second, the data collection method may have confounded the consistency of the instrument’s measurement properties in the Australian FtF sample (Weigold et al. 2013). Nevertheless, even in this sample, only the differences in the strength of the association of “preoccupation” and “withdrawal symptoms” with the latent IGD construct were significant based on both absolute and incremental fit difference indices (ΔWLSMVχ²; ΔRMSEA; ΔCFI), while threshold differences were nonsignificant based on incremental fit indices (ΔRMSEA; ΔCFI). Consequently, IGDS9-SF scores could be assumed to be comparable and sufficient for assessing and monitoring changes in IGD behaviors longitudinally in both populations, regardless of whether they were collected online or FtF. Nevertheless, “preoccupation” should be considered a more significant IGD indicator among Australian gamers assessed FtF at the initial assessment, while the opposite is recommended for “withdrawal symptoms.” Changes in the sample over time should also be considered, and future studies are encouraged to collect further information about changes in gaming behavior, as these may offer insight into the relative contribution of gaming behavior to changes in IGD symptoms.

Limitations and Conclusions

Although the present study is the first to assess the test-retest invariance properties of the IGDS9-SF and provides valuable insight into the instrument’s capacity to assess and monitor variations in IGD behavior over time, several significant limitations need to be considered and addressed in further research. Specifically, the gamers assessed here were recruited from the community and resided in two specific countries, and their IGD fluctuations were assessed only over the 2- to 3-month study period. Therefore, the results may not generalize to clinical IGD samples, gamers from other countries, and/or measurement over longer time periods. Furthermore, all participants were MMO gamers and emergent adults, so these findings should be applied cautiously to other game genres and age ranges. Finally, although normative, both samples were relatively small, and the study therefore needs to be replicated with larger samples. Despite these limitations, the present study provides initial confirmation of the suitability of the IGDS9-SF for assessing and monitoring changes in IGD behaviors over time.