Electronic gaming has become widespread and popular worldwide, playing a major role in the leisure and social pursuits of children, adolescents, and adults (Paulus et al., 2018; Pontes, 2018). Recent figures suggest that there are more than 214 million gamers across the United States (US; Entertainment Software Association, 2020), and about 51% of the European population plays video games (Europe’s Video Game Industry, 2019). In Spain, for example, about 15 million people actively played video games in 2019 (Spanish Video Game Association, 2020), further supporting the pervasiveness of gaming in this day and age.

Given the rise in popularity of gaming in today’s society, a key area of research within psychology and psychiatry concerns the potential positive (see Griffiths (2019) and Mandryk et al. (2020)) and negative effects stemming from electronic games. Several studies have reported a wide range of detrimental effects elicited by video game play behaviors, such as aggression (Lemmens et al., 2011; López-Fernández et al., 2020), addiction (Pontes, 2018; Pontes & Griffiths, 2014), and other resulting psychiatric comorbidities (Pontes, 2018; Sherry, 2001), including but not limited to attention-deficit/hyperactivity disorder (ADHD) (Stavropoulos et al., 2019), autism spectrum disorders (ASD) (Craig et al., 2021), and depression (Ostinelli et al., 2021), as well as other behavioral addictions (Rozgonjuk et al., 2021). Recent evidence has also linked disordered gaming with poor physical health through impaired psychological well-being (Moore et al., 2021).

Due to its well-documented addictive and detrimental effects (see Männikkö et al. (2020) and Pontes (2017)), the American Psychiatric Association (APA) identified “Internet Gaming Disorder” (IGD) as a tentative disorder in the 5th revision of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5; APA, 2013) in 2013. Following this, in 2019 the World Health Organization (WHO) included “Gaming Disorder” (GD) in the 11th revision of the International Classification of Diseases (ICD-11; WHO, 2019), further recognizing disordered gaming as an official mental health disorder. Arguably, the inclusion of IGD in the DSM-5 was the first key milestone for research examining the addictive effects of video games, but the culmination of scientific effort regarding its conceptualization and legitimacy was formalized by the inclusion of GD in the ICD-11 and its official recognition as an addictive disorder by the WHO (Pontes & Griffiths, 2020).

Taking this into account, the APA has specified that IGD may be present upon the endorsement of at least five of the nine following diagnostic criteria within a 12-month timeframe: (1) preoccupation with gaming; (2) withdrawal symptoms when gaming is taken away; (3) tolerance, leading to greater amounts of time gaming; (4) unsuccessful attempts to control gaming involvement; (5) loss of interest in previously enjoyed activities as a result of, and with the exception of games; (6) continued excessive gaming behavior despite awareness of problems; (7) deception of family members, therapists, or significant others regarding the amount of gaming; (8) gaming in order to escape or relieve negative moods; and (9) jeopardizing or losing a significant relationship, job or education, or career opportunity because of participation in games (APA, 2013).

Moreover, the WHO further conceptualized GD by suggesting that it can occur both online and/or offline and that GD requires the experience of (1) diminished control in relation to gaming (e.g., onset, frequency, intensity, duration, termination, and context), (2) an increase in the priority given to gaming to a point that it takes precedence over other important life interests and activities, and (3) continuation or escalation of gaming regardless of the experience of detrimental consequences. Furthermore, the gaming behavior pattern should be of sufficient severity to result in significant impairments in personal, family, social, educational, occupational, or other important areas of life (WHO, 2019).

The Current Study

Since the earlier inclusion of IGD by the APA in the DSM-5 as a tentative mental health disorder, several psychometric tests have been developed to assess disordered gaming based on the APA framework (Pontes et al., 2021), among which a few have been translated and psychometrically validated in Spanish-speaking samples. Specifically, the Internet Gaming Disorder Test (IGD-20 Test; Pontes & Griffiths, 2014), the Internet Gaming Disorder Scale-Short Form (IGDS9-SF; Pontes & Griffiths, 2015), and the Ten-Item Internet Gaming Disorder Test (IGDT-10; Király et al., 2017) have all been tested in Spanish-speaking populations, with studies generally reporting promising psychometric properties (Beranuy et al., 2020; Fuster et al., 2016; Király et al., 2019; Maldonado-Murciano et al., 2020; Sánchez-Iglesias et al., 2020). Although these standardized psychological measures are useful and psychometrically sound for assessing disordered gaming (Poon et al., 2021), they have been developed to assess IGD under the APA framework. Therefore, they do not fully take into account the latest conceptualization of GD proposed by the WHO in the ICD-11, making it necessary to further investigate and refine the psychometric assessment of GD under the WHO framework (Montag et al., 2019).

To bridge this gap in the assessment of disordered gaming, a brief standardized psychological measure including four items reflecting the key defining diagnostic features of GD according to the WHO framework has been recently developed (Pontes et al., 2021). The Gaming Disorder Test (GDT) was originally developed in English and Chinese-speaking samples (Pontes et al., 2021) due to GD being an emerging public health concern in Asia and developed Western countries (Evren et al., 2020). The GDT has been subsequently psychometrically validated and adapted in German (Montag et al., 2019) and Turkish samples (Evren et al., 2020), with the findings of these recent studies suggesting that the GDT presents with robust psychometric properties and that it is a suitable psychological measure to assess GD across several populations according to the WHO framework.

Based on the aforementioned rationale, the goal of the present study was to develop the first Spanish version of the GDT through Classical Test Theory (CTT) and Item Response Theory (IRT) in order to report its psychometric suitability to assess GD under the WHO framework within the Spanish population. By achieving this goal, this study will be contributing to the field by providing additional information on the suitability of the WHO framework to assess disordered gaming within an international context (i.e., non-English context), adding to the knowledge base about the assessment of GD while further providing a practical resource to health professionals in Spain to assess the severity of GD within the Spanish cultural context.

Methods

Participants and Procedures

A sample of Spanish gamers was recruited using two inclusion criteria (i.e., being at least 16 years old and having played a video game at least once in their lifetime). The study was conducted in accordance with the Declaration of Helsinki and approved by the Committee on Bioethics of the University of Barcelona.

Participation was voluntary, and no financial compensation was offered to eligible participants. An online informed consent was obtained from all participants after they had been informed about the anonymous and confidential nature of the study. Data collection was conducted using an online survey hosted on Qualtrics, which included questions assessing participants’ sociodemographic status, gaming behaviors, personality traits, and psychiatric symptoms. Data collection spanned from April 15 to November 6, 2020, and the survey was advertised online via multiple social media platforms (i.e., Facebook, Instagram, Reddit, and Twitter) and on the online course management system of a second-year course of the degree of Psychology of the University of Barcelona.

A sample of 569 participants was initially recruited. However, a total of 31 participants were excluded from the study for presenting with missing data (n = 21, 3.69%) or for declaring being a professional video game player (n = 10, 1.76%). Since no additional participants were removed due to missing data, a final sample of 538 participants was achieved and their data were subsequently included in the statistical analyses conducted.

Within the final sample, 42.94% (n = 231) of participants were female (age range: 18–57 years) and 57.06% (n = 307) were male (age range: 18–56 years). The overall mean age was 23.29 years (SD = 7.24; range: 16–57 years). Moreover, most participants had completed a secondary educational level (55.76%, n = 300) or a higher educational level (30.85%, n = 166). In relation to gaming behaviors, participants reported having played an average of 1.98 hours a day on working days (SD = 2.07; range: 0–16 hours) and 3.48 hours a day on non-working days (SD = 2.86; range: 0–16 hours).

Measures

Sociodemographic Data

The survey collected sociodemographic data aligned with previous similar psychometric studies (e.g., Maldonado-Murciano et al., 2020). More specifically, data were collected about participants’ gender, age, educational level achieved, and gaming-related behaviors (e.g., time spent gaming during the working days and non-working days such as weekends and holidays).

Gaming Disorder Test (GDT) (Pontes et al., 2021)

The GDT is a brief 4-item standardized psychological test assessing disordered gaming according to the WHO framework, adopting the conceptualization of GD proposed by the WHO in the ICD-11. The first three items of the GDT reflect (1) impaired control over gaming, (2) increased priority given to gaming, and (3) continuation despite negative consequences, while the fourth and final item assesses potential functional impairments by evaluating gamers’ (4) experience of significant problems in life due to GD.

Responses to all four GDT items can be given on a 5-point scale ranging from 1 (never) to 5 (very often). Total scores can range from 4 to 20 points, with higher scores indicating greater degrees of disordered gaming. For non-clinical purposes, an answer of 4 (often) or 5 (very often) to any of the four GDT items can be coded as endorsement of the corresponding GD criterion.
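The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `score_gdt` and the returned dictionary layout are assumptions made for this example and are not part of the published GDT materials.

```python
from typing import Dict, List, Sequence, Union


def score_gdt(responses: Sequence[int]) -> Dict[str, Union[int, List[bool]]]:
    """Score one respondent's answers to the four GDT items.

    `responses` holds the four item answers, each coded 1 (never) to 5 (very often).
    The function name and output layout are hypothetical, chosen for illustration.
    """
    if len(responses) != 4:
        raise ValueError("The GDT has exactly four items.")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("Each response must be an integer from 1 to 5.")

    # Total score: simple sum of the four items (possible range 4 to 20).
    total = sum(responses)
    # Criterion endorsement: answers of 4 (often) or 5 (very often).
    endorsed = [r >= 4 for r in responses]
    return {"total": total, "endorsed": endorsed}


if __name__ == "__main__":
    # Hypothetical respondent, not taken from the study data.
    print(score_gdt([1, 2, 4, 5]))
```

Keeping the total score and the per-item endorsement flags separate mirrors the distinction the text draws between dimensional severity and criterion-level coding.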

The Spanish version of the GDT was devised by adopting a double-translation and reconciliation procedure for the translation of the original English items of the GDT. This procedure involved two psychologists who independently translated the original GDT items from English into Spanish. Following this, a third independent translator identified and resolved any discrepancies between the alternative Spanish forward translations generated (International Test Commission, 2018). Interested readers can access the final Spanish version of the GDT in the Appendix, with further information about the GDT being presented on the author’s website (www.halleypontes.com/gdt).

Internet Gaming Disorder Scale-Short Form (IGDS9-SF) (Pontes & Griffiths, 2015)

The IGDS9-SF is a 9-item standardized test used to assess disordered gaming as per the APA framework for IGD defined in the DSM-5. All nine items can be responded to using a 5-point response scale ranging from 1 (never) to 5 (very often), and greater overall scores indicate higher levels of disordered gaming symptoms. The IGDS9-SF has been shown to present with robust psychometric properties according to a recent review of 21 studies across 15 languages that employed the IGDS9-SF (Poon et al., 2021). The Spanish IGDS9-SF has been found to present with excellent psychometric properties under the CTT (Beranuy et al., 2020; Sánchez-Iglesias et al., 2020) and IRT frameworks (Maldonado-Murciano et al., 2020). In the present study, the IGDS9-SF exhibited high levels of internal consistency (α = 0.90 and ω = 0.85).

Mini International Personality Item Pool-Five-Factor Model-Positively Worded (Mini-IPIP-PW) (Donnellan et al., 2006)

The Mini-IPIP-PW was used to evaluate personality traits under the five-factor model of personality (Goldberg, 1992). These include the traits extraversion, agreeableness, conscientiousness, neuroticism, and openness to experience. The test comprises 20 items answered using a 5-point response scale ranging from 1 (totally disagree) to 5 (totally agree). In the present study the Spanish positively worded version was utilized (Bados López et al., 2005) as it has been shown to have high levels of reliability and of convergent and predictive validity (Martínez-Molina & Arias, 2018). The Spanish Mini-IPIP-PW exhibited high levels of internal consistency across its different domains in the present sample (see Table 1).

Table 1 Descriptive statistics of the IGDS9-SF, the MINI-IPIP-PW, and the DASS-21 and their correlations with the Gaming Disorder Test (GDT)

Depression, Anxiety, and Stress Scales (DASS-21) (Lovibond & Lovibond, 1995)

The DASS-21 includes 21 items that can be responded to on a 4-point Likert scale ranging from 0 (did not apply to me at all) to 3 (applied to me very much, or most of the time). The DASS-21 is used to assess psychiatric symptoms of depression, anxiety, and stress. The Spanish DASS-21 has been shown to exhibit adequate internal consistency, satisfactory convergent validity, and acceptable discriminant validity (Bados López et al., 2005). In the present study, the Spanish DASS-21 has exhibited high levels of internal consistency across its different domains (see Table 1).

Data Analysis

For the purpose of describing the distribution of the GDT items’ scores, the frequency of endorsement of each item response category and item skewness and kurtosis were obtained. Similarly, for the GDT total score, general descriptives and the Shapiro–Wilk test (W) of univariate normality were computed for the overall sample. The Mardia test was also utilized to assess multivariate normality across the GDT items.

In order to assess the one-factor structure of the Spanish GDT, a Confirmatory Factor Analysis (CFA) was estimated using the Weighted Least Squares Mean and Variance Adjusted (WLSMV) estimator, which has been found to provide accurate parameter estimates with ordinal items rated on few response categories, in relatively small samples, and when departures from multivariate normality are observed (Li, 2016). The model fit was assessed with the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI), the Root Mean Square Error of Approximation (RMSEA), and the Standardized Root Mean Square Residual (SRMR). Goodness of fit was interpreted using the guidelines proposed by Hu and Bentler (1999), where an adequate fit is indicated by CFI ≥ 0.95, TLI ≥ 0.95, RMSEA ≤ 0.06, and SRMR ≤ 0.08.
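As a small illustration of how the cutoffs quoted above can be applied, the following Python sketch checks a set of fit indices against them. The function name `check_fit` and the example values are hypothetical; the study itself estimated the model in R, and this sketch only encodes the decision rule, not the estimation.

```python
from typing import Dict


def check_fit(cfi: float, tli: float, rmsea: float, srmr: float) -> Dict[str, bool]:
    """Compare fit indices with the Hu and Bentler (1999) cutoffs quoted in the text.

    Returns one flag per index; True means the index meets its cutoff.
    """
    return {
        "CFI": cfi >= 0.95,      # higher is better
        "TLI": tli >= 0.95,      # higher is better
        "RMSEA": rmsea <= 0.06,  # lower is better
        "SRMR": srmr <= 0.08,    # lower is better
    }


if __name__ == "__main__":
    # Hypothetical fit indices, used only to show the call.
    flags = check_fit(cfi=0.97, tli=0.96, rmsea=0.05, srmr=0.03)
    print(flags, "adequate fit:", all(flags.values()))
```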

The Average Variance Extracted (AVE) coefficient for the GD factor was also estimated. Reliability was additionally assessed using different indicators (i.e., Cronbach’s alpha (α), McDonald’s omega (ω), and Composite Reliability (CR)). Moreover, validity based on relationships with other relevant variables was assessed by computing Pearson correlation coefficients between the GDT and the other relevant psychometric tests used in the study for measuring GD, personality traits, and psychiatric symptomatology (i.e., IGDS9-SF, MINI-IPIP-PW, and DASS-21, respectively).
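For readers unfamiliar with AVE and CR, the sketch below shows one common way to compute both from the standardized loadings of a one-factor model, assuming uncorrelated residuals so that each residual variance equals one minus the squared loading. The loadings used are hypothetical and are not the estimates obtained in this study.

```python
from typing import Sequence


def average_variance_extracted(loadings: Sequence[float]) -> float:
    """AVE: mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings: Sequence[float]) -> float:
    """CR: (sum of loadings)^2 / ((sum of loadings)^2 + sum of residual variances).

    Assumes standardized loadings, so each residual variance is 1 - loading^2.
    """
    squared_sum = sum(loadings) ** 2
    residual_sum = sum(1.0 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + residual_sum)


if __name__ == "__main__":
    # Hypothetical standardized loadings for a four-item scale.
    example_loadings = [0.80, 0.85, 0.78, 0.82]
    print("AVE:", round(average_variance_extracted(example_loadings), 3))
    print("CR:", round(composite_reliability(example_loadings), 3))
```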

In order to complement the CTT analyses performed, a follow-up IRT analysis was conducted on the Spanish GDT as IRT models produce useful information about the quality of items and provide measures of precision at different levels of the trait (θ) (Embretson & Reise, 2000). Prior to the IRT analysis, the local independence and unidimensionality assumptions were inspected, respectively, by computing Yen’s Q3 statistic, with item residual correlations > 0.20 indicating local dependence (Chen & Thissen, 1997; Christensen et al., 2017), and by fitting the one-factor CFA model described above. Due to the ordinal nature of the four items, we compared the model fit of three competing models: the Partial Credit Model (PCM) (Masters, 1982), the Generalized Partial Credit Model (GPCM) (Muraki, 1992), and the Graded Response Model (GRM) (Samejima, 1999). The fit of the models was compared using the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC), selecting the model with lower values, which indicates closer fit to the true model (Burnham & Anderson, 2004).
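To make the PCM more concrete, the following Python sketch computes the category response probabilities of a single item under the textbook formulation of Masters (1982), in which the probability of category k is proportional to the exponential of the cumulative sum of (θ − δ_j) over the first k step thresholds. The threshold values are hypothetical, and the parameterization may differ from the one reported by the software used in the study.

```python
import math
from typing import List, Sequence


def pcm_category_probs(theta: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities for one item under the Partial Credit Model.

    `thresholds` are the step parameters delta_1..delta_m; the item then has
    m + 1 ordered categories (0..m). Category 0 has a cumulative sum of 0.
    """
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical thresholds for a five-category item (four steps).
    steps = [-0.2, 1.0, 2.5, 4.0]
    for theta in (-1.0, 1.0, 3.0):
        probs = pcm_category_probs(theta, steps)
        print(theta, [round(p, 3) for p in probs])
```

Higher step thresholds shift the upper categories toward higher trait levels, which is why an item with large thresholds is described as difficult to endorse.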

For the best fitting model, we calculated the item fit parameters using the S-χ2 statistic and the INFIT and OUTFIT (Wright & Panchapakesan, 1969), the Zh person fit statistic (Drasgow et al., 1985), the items’ Operating Characteristic Curves (OCC), and the information function of both the items and the test. Potential item misfit is indicated by statistically significant values of the S-χ2 statistic (Kang & Chen, 2008; Orlando & Thissen, 2000) and by INFIT and OUTFIT values less than 0.5 or greater than 1.5 (de Ayala, 2009), while values of the Zh statistic ≤ −2 suggest person misfit (Desjardins & Bulut, 2018).

Additionally, Measurement Invariance (MI) of the Spanish GDT across gender was investigated. MI analysis is grounded on the notion that a psychometric test measuring a given trait should reveal differences among individuals if those individuals actually differ on the trait (Millsap, 2011). Thus, if we intend to use GDT scores to make comparisons between male and female gamers, we must then ensure that the latent variable (i.e., GD) is functioning similarly across the two groups of participants. The evaluation of gender invariance was carried out by sequentially assessing (1) configural invariance, to investigate the equivalence of the factor structure; (2) weak or metric invariance, to test equivalence of the item loadings on the latent factor; (3) strong or scalar invariance, to test the equivalence of intercepts; and (4) strict or residual invariance, to assess the equivalence of residual variances (Desjardins & Bulut, 2018).

According to Sass et al. (2014), since interpreting changes in approximate fit indices may be controversial with diagonal weighted least squares-based estimators, the comparison between MI models was based on the examination of the chi-square difference (Δχ2) and its corresponding statistical significance. Finally, since a non-normal distribution of the GDT total scores was observed, the non-parametric Mann–Whitney U test was adopted in order to estimate gender differences on GDT total scores.
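The Mann–Whitney U statistic mentioned above can be computed from rank sums, as in the following minimal Python sketch. It uses average ranks for tied scores, which matter here because GDT totals are integers, and it returns only the U statistics (no p value). The function names and example scores are hypothetical and do not reproduce the study data.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(group_a: Sequence[float], group_b: Sequence[float]) -> Tuple[float, float]:
    """Return (U_a, U_b) for two independent samples."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


if __name__ == "__main__":
    # Hypothetical GDT total scores for two small groups.
    print(mann_whitney_u([4, 5, 7, 9], [5, 6, 8, 12, 13]))
```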

All statistical analyses were conducted with R (version 1.0.136) (R Core Team, 2021), using the packages lavaan (Rosseel, 2012) for the CFA and MI analysis, semTools (Jorgensen et al., 2020), and mirt (Chalmers, 2012) for the IRT analysis.

Results

Distribution of GDT Scores

As shown in Table 2, most of the participants’ responses for all items were located in the first or second response category (never or rarely, respectively), with very few participants endorsing the fifth category (very often). Skewness and kurtosis coefficients suggested that item 4 was right-skewed, with almost 80% of participants endorsing the lowest item response category (never). In terms of univariate normality testing, results showed that the GDT total score was positively skewed (W = 0.879, p < 0.001), indicating a tendency to lower levels of GD (M = 7.04, SD = 2.98, range: 4–17, Md = 2.97). As for the multivariate normality assessment, the results of the Mardia test indicated that the data were not normally distributed (skewness = 879.901, p < 0.001; kurtosis = 26.018, p < 0.001).

Table 2 Endorsement, kurtosis, and skewness of GDT items
Table 3 Item statistics for the Partial Credit Model (PCM) across the four items of the Spanish Gaming Disorder Test (GDT)
Table 4 Gender measurement invariance indices of the Spanish GDT

Dimensionality

A CFA was carried out on the four items of the Spanish GDT in order to test the unidimensionality of the scale. The results obtained supported a one-factor solution (χ2(2) = 3.847, p = 0.146; CFI = 0.999; TLI = 0.997; RMSEA = 0.041 [90% CI: 0.000–0.104], p = 0.489; SRMR = 0.016). The path diagram (see Fig. 1) shows that all standardized factor loadings were high and statistically significant (λ > 0.750, p < 0.001).

Fig. 1 Path diagram with summary of the confirmatory factor analysis (CFA) obtained from the four items of the Gaming Disorder Test (GDT). Notes: GD, gaming disorder

Reliability

The AVE is used as evidence of adequate convergence between items of a psychometric test when its value is ≥ 0.50 and CR is ≥ 0.70 (Hair et al., 2010). The GDT obtained an AVE of 0.669 and CR of 0.890. Moreover, further evidence supporting the reliability of the Spanish GDT was obtained with a Cronbach’s alpha of α = 0.889 and a McDonald’s omega of ω = 0.839. Based on these results, it can be concluded that the Spanish GDT presents with adequate levels of convergence between items and reliability.

Concurrent, Convergent, and Discriminant Validity

As can be seen in Table 1, the GDT was strongly associated with the IGDS9-SF (r = 0.76). In relation to the psychiatric symptoms and personality scores, the GDT moderately correlated with the depression subscale (r = 0.23) and with conscientiousness (r = −0.24), respectively. Furthermore, weak associations were observed between the GDT and agreeableness (r = −0.16), openness (r = 0.10), and stress (r = 0.14). Taken together, these findings provide adequate evidence of concurrent, convergent, and discriminant validity for the Spanish GDT.

IRT Analysis of the Spanish GDT

Unidimensionality and local independence assumptions were met as shown in the CFA and the results of Yen’s Q3 statistic, which indicated that residual correlations ranged between −0.383 and −0.162. The AIC (PCM = 4137.917; GPCM = 4143.280; GRM = 4139.131) and the BIC (PCM = 4210.810; GPCM = 4229.037; GRM = 4325.016) values obtained suggest that the IRT model with the best fit to the data was the PCM. Table 3 shows the item difficulty (β) parameters and item fit statistics.
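The model comparison above reduces to picking the model with the lowest value of each criterion. The Python sketch below applies that rule to the AIC and BIC values reported in the text and also includes the standard definitions of both criteria (AIC = 2k − 2 ln L; BIC = k ln n − 2 ln L). The helper names are assumptions made for this illustration.

```python
import math
from typing import Dict


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def best_model(criteria: Dict[str, Dict[str, float]], criterion: str) -> str:
    """Return the model name with the lowest value of the given criterion."""
    return min(criteria, key=lambda model: criteria[model][criterion])


# AIC and BIC values as reported in the text for the three competing models.
REPORTED = {
    "PCM": {"AIC": 4137.917, "BIC": 4210.810},
    "GPCM": {"AIC": 4143.280, "BIC": 4229.037},
    "GRM": {"AIC": 4139.131, "BIC": 4325.016},
}

if __name__ == "__main__":
    print("Lowest AIC:", best_model(REPORTED, "AIC"))
    print("Lowest BIC:", best_model(REPORTED, "BIC"))
```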

Table 5 Descriptive statistics and reliability indices of the Gaming Disorder Test (GDT) in male and female participants

Altogether, the item threshold parameters covered a wide range of the latent trait, especially the middle-upper band (i.e., ranging from −0.16 to 5.21), suggesting that a high latent trait level is needed to endorse high item response categories. The higher values observed in item 4 (i.e., I have experienced significant problems in life (e.g., personal, family, social, education, occupational) due to the severity of my gaming behavior) indicate that this is the most difficult item to endorse (i.e., a high level of GD is needed to endorse the high item response categories). Conversely, item 1 (i.e., I have had difficulties controlling my gaming activity) was the easiest item to endorse (i.e., a lower level of GD is needed to endorse high item response categories). Regarding item fit based on the S-χ2 statistic, results indicated that GDT items 1, 2, and 4 presented with adequate fit to the PCM, while item 3 exhibited poor fit.

A visual inspection of the empirical plot for item 3 (see Fig. S1) suggested that the misfit was due to discrepancies between the theoretical model and the empirical data on response categories 2 (rarely) and 3 (sometimes). However, since the S-χ2 statistic is very sensitive to sample size (Jöreskog, 1993), items’ INFIT and OUTFIT values were inspected showing that all values were within the recommended range (0.5 ≤ INFIT/OUTFIT ≤ 1.5). To further assess the model fit, person fit indices and items’ OCCs were calculated (Embretson & Reise, 2000). Person fit Zh statistic showed that 98.33% of participants’ response patterns were aligned with the PCM (see Fig. S2). In addition, the items’ OCCs (Fig. 2) indicated that all four items’ response categories of the Spanish GDT were ordered according to increasing levels of the latent variable and did not overlap between them, demonstrating the suitability of the 5-point response scale of the Spanish GDT items.

Fig. 2 Operating Characteristic Curves (OCC) of the four items of the Gaming Disorder Test (GDT)

The item information function shows the amount of information that each item provides as a function of the latent trait level. As shown in Fig. 3, item 1 (i.e., impaired control over gaming) and item 4 (i.e., experience of significant problems in life) were more informative at the medium levels of the latent trait (i.e., θ close to 0). In contrast, item 2 (i.e., increased priority given to gaming) was more informative at higher levels of the latent trait (i.e., peak of precision around θ = 3.5), and item 3 (i.e., continuation despite negative consequences) was more informative at even higher levels of the latent trait (i.e., peak of precision around θ = 4).

Fig. 3 Item information curves of the four items of the Gaming Disorder Test (GDT)

The test information function and standard error (Fig. 4) revealed that the test as a whole was more informative at the highest levels of the trait, more specifically when the trait remains between 1 and 5, and is less precise at lower levels of the latent trait (i.e., θ < 0).
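The link between test information and the standard error can be illustrated with a short Python sketch. Under the PCM, the information an item provides at a given θ equals the conditional variance of its category score, the test information is the sum over items, and the standard error of the trait estimate is 1/√I(θ). The item thresholds below are hypothetical and are not the estimates from Table 3.

```python
import math
from typing import List, Sequence


def pcm_category_probs(theta: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities (0..m) for one item under the Partial Credit Model."""
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    peak = max(cumulative)  # stabilizes the exponentials
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def pcm_item_information(theta: float, thresholds: Sequence[float]) -> float:
    """Item information under the PCM: variance of the category score at theta."""
    probs = pcm_category_probs(theta, thresholds)
    expected = sum(k * p for k, p in enumerate(probs))
    return sum((k - expected) ** 2 * p for k, p in enumerate(probs))


def scale_information(theta: float, items: Sequence[Sequence[float]]) -> float:
    """Test information: sum of the item information values at theta."""
    return sum(pcm_item_information(theta, thresholds) for thresholds in items)


def standard_error(theta: float, items: Sequence[Sequence[float]]) -> float:
    """Standard error of the trait estimate: 1 / sqrt(test information)."""
    return 1.0 / math.sqrt(scale_information(theta, items))


if __name__ == "__main__":
    # Hypothetical step thresholds for four five-category items.
    hypothetical_items = [
        [-0.2, 1.0, 2.0, 3.0],
        [0.5, 2.0, 3.5, 4.5],
        [0.8, 2.5, 4.0, 5.0],
        [1.5, 3.0, 4.5, 5.2],
    ]
    for theta in (-2.0, 0.0, 2.0, 4.0):
        print(theta, round(scale_information(theta, hypothetical_items), 3),
              round(standard_error(theta, hypothetical_items), 3))
```

Because the standard error is the inverse square root of the information, regions of θ with little information (here, low trait levels) are exactly those where scores are least precise.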

Fig. 4 Test information curve and standard error of the Gaming Disorder Test (GDT)

Measurement Invariance Between Genders

The GDT MI between genders was limited to male and female participants, leaving aside those who identified themselves with another gender due to the low sample size in this particular group (n = 11, 2.04%). As shown in Table 4, the chi-square difference statistic was not statistically significant across the configural, metric, scalar, and strict invariance models. This suggests that the Spanish GDT presents strict invariance, where loadings, intercepts, and residual variances are invariant between males and females.

The GDT total score was compared between males and females with the Mann–Whitney U test, suggesting that there were no differences between genders (U = 32,364, p = 0.287). Descriptive statistics and reliability indices of the GDT total score are shown in Table 5 for both gender groups.

Discussion

The present study sought to adapt the GDT into Spanish and to conduct a psychometric validation using both CTT and IRT analytical frameworks, with the goal of testing the suitability of the GDT for assessing GD within the Spanish context. For this purpose, the English version of the GDT was translated into Spanish following conventional international standards (International Test Commission, 2018) and administered to a large sample of Spanish video gamers, and data on psychometric indicators were gathered in a cross-sectional online survey study.

Based on the findings reported, it can be concluded that the GDT is a valid and reliable psychometric test for assessing GD as suggested by the ICD-11 (World Health Organization, 2019). More specifically, the results of the CFA supported the unidimensionality of the Spanish GDT, with high and statistically significant standardized factor loadings. This unifactorial model was found to be gender invariant at the highest level (i.e., strict invariance). This implies that the GD construct as assessed by the GDT is equivalent across males and females and, consequently, GDT scores present the same meaning for both genders, enabling adequate comparisons of the mean GDT total scores between male and female gamers.

This result is encouraging as it is the first gender MI analysis of the GDT in a Spanish population. However, further invariance testing assessing other relevant groups (e.g., GD diagnostic invariance or cross-cultural invariance) would provide additional information about the capabilities of the GDT in terms of MI across groups. In contrast to other studies (Arıcak et al., 2018), male and female participants in our study showed similar levels of GD.

The results pertaining to the reliability analysis of the Spanish GDT demonstrated high levels of internal consistency for the overall sample and across genders in terms of the Cronbach’s alpha and omega reliability coefficients. A detailed analysis at the item level by means of IRT indicated that the Spanish GDT items performed adequately and were more informative at medium–high levels of the GD trait. Despite the statistical misfit of item 3 to the PCM, this item was retained given that other indicators analyzed in the present study supported its adequate performance, and its retention balances the content coverage of the GDT as based on the ICD-11 criteria of GD, and other studies with empirical data have shown that the misfit generally implies a negligible practical impact on score estimates (Sinharay & Haberman, 2014; Zhao, 2017). Furthermore, the test information functions revealed that GDT scores were more accurate at middle-upper levels of the latent trait, suggesting that beyond its application in community samples, the GDT may be especially useful in clinical settings where GD symptoms are prominent and the brevity of administration is a priority.

Moreover, the results obtained provided further support for the concurrent, convergent, and discriminant validity of the Spanish GDT. Concurrent validity was supported by the strong correlation between GDT and IGDS9-SF scores, as they reflect the intensity of disordered symptoms. The strength of this association was similar to that obtained in previous studies (Pontes et al., 2021). Furthermore, GDT scores were also moderately associated with depression and weakly associated with stress (convergent validity). Previous research has shown that depression symptoms have been consistently associated with severity of GD (Ostinelli et al., 2021), showing that adults with depression may resort to gaming to escape from adverse emotions (Kim et al., 2017). In relation to the finding pertaining to stress, the weaker association may be due to the intricate relationship between GD and stress responses as stressful life events are an important predictor of disordered gaming (Li et al., 2016) while at the same time GD has been found to be a stress response (Snodgrass et al., 2014).

The correlation between GD and anxiety was surprisingly low in the present study. Previous studies that have evaluated the relationship between GD, as measured by the IGDS9-SF, and anxiety (i.e., measured with the DASS-21) have found a stronger relationship between the two constructs (Pontes & Griffiths, 2016; Yam et al., 2019), suggesting either that anxiety-related symptoms may be better represented in DSM-5-based instruments such as the IGDS9-SF or that the coronavirus pandemic (COVID-19) may have influenced the results. In relation to the latter, taking into account that data collection for the present study was undertaken during the COVID-19 pandemic, the low correlation between GD and anxiety may be explained by the pandemic situation, as it is likely that baseline levels of mental health factors (e.g., anxiety) among the participants recruited were modulated by the mental and emotional toll brought about by COVID-19 (Liang et al., 2020), potentially confounding the findings related to anxiety.

In terms of personality traits, GDT scores were inversely and moderately associated with conscientiousness. In line with this result, Müller et al. (2014) found that low conscientiousness was a predictor for addictive disorders, and Dieris-Hirche et al. (2020) concluded that disordered gamers showed lower levels of conscientiousness. To a lesser extent, GD was inversely related to agreeableness and directly to openness, although both associations were very weak. Other studies have found agreeableness to be negatively correlated with motives to play (de Hesselle et al., 2020) and have even defined it as a protective factor against behavioral addictions (Kayiş et al., 2016) such as GD (Mihara & Higuchi, 2017). Results on openness are contradictory; some studies have found an inverse association and others no association (Şalvarlı & Griffiths, 2019). Although neuroticism has been repeatedly associated with GD (Dieris-Hirche et al., 2020; Mihara & Higuchi, 2017; Wittek et al., 2016), this finding was not observed within the present study. Future studies should be conducted in order to disentangle relationships between GD, as measured by the GDT, and other relevant constructs for gathering additional evidence of nomological validity.

Taken together, the results on validity and reliability allow us to conclude that the Spanish GDT is an adequate psychometric test to assess GD within the Spanish population, reaching results as promising as those found with the Spanish version of the IGDS9-SF (Beranuy et al., 2020; Maldonado-Murciano et al., 2020; Sánchez-Iglesias et al., 2020). Since the Spanish IGDS9-SF assesses disordered gaming under the APA framework, the present Spanish GDT represents a better assessment option for health professionals and clinicians alike when conducting assessment of disordered gaming adopting the WHO framework. Moreover, the Spanish GDT offers additional practical benefits due to its brevity.

Despite the findings reported, the present study is not without limitations. One of the main limitations is the sampling strategy used to recruit participants, as participants were self-selected. Consequently, the results reported cannot be directly generalized to the general Spanish population. Further research utilizing different sampling strategies (e.g., random sampling) should be conducted to help overcome the current sampling limitations and estimate the prevalence of GD in more representative samples. Since a clinical sample was not recruited for the present study, we were not able to explore the diagnostic accuracy of the Spanish GDT in terms of its sensitivity and specificity nor test its MI across diagnostic groups. Future studies may help advance the literature on the assessment of GD under the WHO framework by examining the diagnostic accuracy of the GDT in clinical samples using a valid and reliable gold standard (i.e., a formal psychiatric assessment) that also allows the estimation of cut-off points for the GDT. Moreover, in the absence of a clinical gold standard, researchers may develop ad hoc cut-off points using mixture modeling techniques such as latent profile analysis or latent class analysis to derive an empirical gold standard, similar to what has been done in past research (Fuster et al., 2016; Király et al., 2017; Pontes et al., 2014; Severo et al., 2020).

Finally, it is plausible that decreased mental health levels due to the COVID-19 pandemic may have influenced the relationship between GD and mental health factors investigated in this study. Notwithstanding these potential limitations, the results obtained indicate that the Spanish GDT is a useful psychometric test to assess GD in community-based samples across a wide age range within the Spanish context, lend empirical support for the concept of GD as suggested by the ICD-11, and pave the way for new psychometric research on GD in Spanish-speaking countries.

Conclusion

This study has developed the Spanish GDT and investigated its psychometric properties, thus contributing to advancing the current understanding of GD and its assessment under the WHO framework. The results reported suggest that the Spanish GDT presents a unidimensional factor structure, consistent test scores, measurement invariance across genders, and that the items of the GDT are more precise at medium and high levels of the GD trait. Notably, the GDT is a promising assessment tool that can be used in both clinical and epidemiological studies within the Spanish context.