Introduction

The use of electronic devices has become a ubiquitous part of children’s and adolescents’ daily life [1]. Although there is no consensus yet on screen time recommendations for children and adolescents, limiting recreational screen use to no more than 1–2 h daily is advised [2, 3]. However, previous studies found that the time spent on electronic devices in childhood and adolescence is rapidly increasing, and increased especially during the COVID-19 pandemic [4, 5••, 6, 7•]. In extreme cases, such behavior can lead to problems akin to those of substance addictions, including psychological and interpersonal consequences such as loss of control, withdrawal symptoms, tolerance development, and deceit of family and friends [8, 9]. Since Internet use disorders (IUD) may have a negative impact on individuals’ offline lives [10], with severe functional impairments and suffering [11••], this issue raises concerns among parents, teachers, clinicians, and others about adverse consequences [12] such as school absenteeism [13] or mental health problems [14]. To prevent or counteract these negative consequences, it is important to identify problematic Internet use (PIU) and initiate effective prevention and/or intervention strategies in a timely manner [7•, 15, 16•, 17].

Younger individuals have been identified as having a higher risk for problematic Internet use (PIU) behavior. While the prevalence of Internet-related disorders is about 1–2% in the general population [18, 19], higher prevalence rates of up to 4% were found among adolescents aged 14 to 16 years [19]. A recent meta-analytic review indicates that the global prevalence of IUD increased significantly due to the COVID-19 pandemic [20].

Notwithstanding the possible adverse effects and the acceptance of PIU as a significant global mental health issue [21], IUD is not officially recognized as a mental disorder, as it has not been incorporated into any of the current classification systems. To date, only Internet Gaming Disorder (IGD) is included: in the appendix of the Diagnostic and Statistical Manual of Mental Disorders, fifth edition (DSM-5) [22] as a condition for further study, and in the International Classification of Diseases, 11th edition (ICD-11) as an officially recognized disorder [23], based on available evidence as well as public health and clinical needs [24, 25].

Suitable screening tools and clinically validated instruments are indispensable to further explore the construct of PIU and to establish the clinical relevance of Internet addiction (IA) [7•]. Despite the growing impact of Internet use through new technological tools [9, 26], the variety of scales on PIU and IUD, and growing research interest in this behavioral addiction [27], the assessment of PIU and IUD still lacks clear recommendations regarding the most suitable tools, especially for the vulnerable group of children and adolescents [28]. Previous reviews are quite outdated (e.g., [29,30,31]) and not limited to children and adolescents [29, 30]. Moreover, none of these reviews provided an exhaustive overview of the available scales and their psychometric properties or clear recommendations for particularly well-validated instruments on the basis of clear, a priori defined criteria. Therefore, this systematic review sought to provide a psychometric guide to assist researchers and clinicians in their choice of adequate instruments. More specifically, we aim to identify instruments assessing PIU and IUD in children and adolescents, critically examine their psychometric properties, and finally derive recommendations for particularly well-validated instruments based on these characteristic values and other assessment criteria.

Methods

Study Procedure

A computer-assisted systematic search based on corresponding key words was conducted in the following five electronic databases in accordance with the PRISMA guidelines [32]: (1) PsycINFO (multi-field search; limitation to: “abstract”), (2) PubMed (advanced search; limitation to: “title/abstract”), (3) PsycArticles (multi-field search; limitation to: “abstract”), (4) Scopus (advanced search; limitation to: “title/abstract”), (5) Web of Science (advanced search; command: “all”).

In the next step, we conducted reference list searches of each of the included validation studies in order to obtain as many relevant reports as possible.

The review followed the guidelines and standards of PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) for writing systematic reviews. The entire literature search and study selection process is summarized in a PRISMA flow chart in Fig. 1.

Fig. 1 PRISMA flow diagram

Inclusion and Exclusion Criteria

The search was limited to primary studies written in English and published in peer-reviewed journals (document type: article). Further criteria were the inclusion of participants aged up to 18 years (maximum sample mean age: 18.9 years) and the validation of (screening) instruments assessing IUD in general (no gaming or social network use disorder) with a detailed report of psychometric properties (more information than solely the reliability of the tool). Some studies did not conduct a systematic validation of an instrument but used IUD (screening) instruments in empirical studies, providing partial evidence for psychometric properties. These studies have also been included in our review. We included studies that aimed at either developing a new scale or providing a psychometric evaluation of already existing instruments.

Analogously, studies evaluating other aspects of Internet use or studies including adults were excluded. Further exclusion criteria were qualitative studies, reviews, and meta-analyses. Other constructs such as problematic use of the smartphone or media in general were not considered, in order to maintain a uniform construct. Furthermore, we excluded articles which gave only very limited evidence of a scale’s psychometric properties (e.g., only assessing the reliability of a scale).

Search Strategy

We generated a search algorithm for PsycINFO in an iterative process. Subsequently, further systematic searches were conducted in the other four databases. Various keyword combinations were created and linked by using the Boolean operators “AND” and “OR.” Moreover, truncations (*) and wildcards ($) for possible additional characters were used to account for different variations and to include terms not yet taken into account. In addition, synonyms were included in the search strategy to achieve a higher hit rate.

The final search in the academic databases took place on January 15, 2024, containing the following search term combination:

(“measur*” OR “tool#” OR “test*” OR “validat*” OR “psychometric*” OR “screen*” OR “diagnos*” OR “item#” OR “instrument#”) AND (“internet” AND “use disorder#” OR “addict*” OR “depend*” OR “abus*” OR “misus*” OR “problem*” OR “risk*” OR “hazard*” OR “compuls*” OR “obsess*”) AND (“child*” OR “adolesc*” OR “young age”).
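The block structure of this search string (synonyms joined by OR within a concept, concepts joined by AND) can be sketched in a few lines of Python. This is only an illustration of how such a string can be assembled, not the procedure used by the authors: the helper name `or_block` and the variable names are hypothetical, typographic quotation marks are replaced by straight quotes, and database-specific field limits (e.g., “abstract”) are not modeled.

```python
# Minimal sketch: assemble a Boolean search string from concept blocks.
# `or_block` and all variable names are hypothetical illustrations.

def or_block(terms):
    """Quote each term, join the terms with OR, and wrap them in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"

# Term lists copied from the search string reported above.
measurement_terms = ["measur*", "tool#", "test*", "validat*", "psychometric*",
                     "screen*", "diagnos*", "item#", "instrument#"]
population_terms = ["child*", "adolesc*", "young age"]

# The construct block mixes AND and OR in the reported string, so it is kept
# verbatim here instead of being rebuilt from a flat term list.
construct_block = ('("internet" AND "use disorder#" OR "addict*" OR "depend*" OR '
                   '"abus*" OR "misus*" OR "problem*" OR "risk*" OR "hazard*" OR '
                   '"compuls*" OR "obsess*")')

# Concept blocks are combined with AND.
query = " AND ".join([or_block(measurement_terms),
                      construct_block,
                      or_block(population_terms)])
print(query)
```

Keeping each concept in its own list makes it easy to add a synonym to one block without touching the others. How a given database resolves the mixed AND/OR inside the construct block depends on its operator precedence rules, which should be checked per database.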

Data Synthesis

Results are summarized in Tables 1 and 2 in compliance with previous reviews on subtypes of IUD, like Gaming Disorder [33••] or Social Network Use Disorder [34]. In Table 1, study characteristics like sample size, age group, country of the validation samples, and sampling approach are presented. This is supplemented by information on the respective tool, such as language version, investigated construct, components, items, response format, and cut-off score.

To illustrate the psychometric properties of each scale, we extracted the relevant psychometric parameters from the validation studies and summarized the information in Table 2. If provided, we reported the dimensionality (method; factor structure), reliability (internal consistency; test–retest), and test refinement (Rasch/IRT; measurement invariance), as well as construct and criterion validity of the scale in question.

The reliability was assessed primarily with internal consistency (mainly Cronbach’s alpha); some studies also reported test–retest reliability (with Pearson’s correlation coefficient r). For construct/external validity, a significant correlation between PIU and IUD scales with other adequate assessment tools measuring a conceptually similar construct was considered. Criterion validity was defined by a significant association between the tool and variables related to Internet use, i.e., frequency or daily/average time spent on the Internet, in anticipation that these variables would be positively correlated with PIU and IUD.
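For readers less familiar with the two reliability coefficients named above, the following Python sketch computes Cronbach’s alpha from an item-score matrix and Pearson’s r between two measurement occasions. It uses the standard textbook formulas; the function names and all response data are invented for illustration and do not come from any of the reviewed studies.

```python
from statistics import mean, variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    items = list(zip(*item_scores))                  # one tuple of scores per item
    item_variances = [variance(item) for item in items]
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long score lists."""
    mean_x, mean_y = mean(x), mean(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return covariation / (spread_x * spread_y)

# Invented data: 5 respondents answering 4 items on a 5-point Likert scale.
responses = [
    [3, 4, 3, 4],
    [1, 2, 1, 2],
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
]
# Invented total scores of the same respondents at test and retest.
totals_time1 = [sum(row) for row in responses]
totals_time2 = [13, 7, 16, 10, 18]

print("Cronbach's alpha:", round(cronbach_alpha(responses), 2))
print("Test-retest r:", round(pearson_r(totals_time1, totals_time2), 2))
```

Alpha rises with the number of items, which is why scale length has to be kept in mind when short forms are compared with their long versions.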

Finally, we evaluated the psychometric properties and other criteria in a quick reference guide (QRG) in Table 3 yielding validation evidence for each study. Table 1 contains a final evaluation of tool characteristics being integrated across studies, i.e., that also considers the frequency of studies in which each criterion for a scale is covered.

Table 1 Integrated quick reference guide to all reviewed tools (n = 31) for IUD among children and adolescents

The rating of the QRG was conducted by two independent coders (S.S. and L.H.). Any discrepancies between the coders in the ratings were resolved through discussion and by involving a third researcher (H.-J.R.).

In evaluating psychometric properties of the scales, we also referred to previous systematic reviews on special forms of IUD, like Gaming Disorder [33••] and Social Network Use Disorder [34]. The latter involves 13 criteria illustrated in Tables 3 and 1: (1) DSM-5 coverage, (2) ICD-11 coverage, (3) Cut-off score, (4) Longitudinal data, (5) Dimensionality, (6) Internal consistency, (7) Test–retest reliability, (8) Test refinement (i.e., Rasch; IRT; Measurement invariance), (9) External validity (i.e., convergent validity), (10) Use of clinical samples, (11) Impairment, (12) Wording, and (13) Quality of sampling approach.
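How a QRG total can be tallied from these 13 criteria is sketched below in Python. The sketch assumes that each criterion is rated from 0 to 2 points; this rating range is an assumption chosen because it matches the 26-point maximum mentioned in the Discussion, and it is not spelled out in this section. The function name and the example ratings are hypothetical.

```python
# Minimal sketch of a QRG tally; the 0-2 rating range is an assumption.
CRITERIA = [
    "DSM-5 coverage", "ICD-11 coverage", "Cut-off score", "Longitudinal data",
    "Dimensionality", "Internal consistency", "Test-retest reliability",
    "Test refinement", "External validity", "Use of clinical samples",
    "Impairment", "Wording", "Quality of sampling approach",
]
MAX_PER_CRITERION = 2  # assumption: 13 criteria x 2 points = 26-point maximum

def qrg_total(ratings):
    """Sum the ratings of one tool after checking completeness and range."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"unrated criteria: {missing}")
    for criterion in CRITERIA:
        if not 0 <= ratings[criterion] <= MAX_PER_CRITERION:
            raise ValueError(f"rating out of range for: {criterion}")
    return sum(ratings[c] for c in CRITERIA)

# Invented ratings for a hypothetical tool (not taken from the review).
example_ratings = {criterion: 0 for criterion in CRITERIA}
example_ratings.update({"Cut-off score": 1,
                        "Dimensionality": 2,
                        "Internal consistency": 2})

print(qrg_total(example_ratings), "of", len(CRITERIA) * MAX_PER_CRITERION)
```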

Results of the Systematic Literature Search

The systematic literature search in the databases resulted in a total of 11,408 potentially relevant hits. After removing 4188 duplicates, the titles of the remaining 7220 non-redundant studies were reviewed, and 5405 were assessed as not relevant. Another 1304 articles were excluded after reading the abstract. Subsequently, 511 full texts were examined, which narrowed the selection down to 68 articles (see Fig. 1 for details). The most common reason for exclusion was methodological, i.e., insufficient reporting of psychometric criteria, mostly only Cronbach’s alpha. This information was not considered sufficient to assess the psychometric quality of the screening tool. In the next step, we conducted reference list searches of each of the 68 included reports for additional scales, yielding a total of 2 more reports. Finally, 70 studies validating 31 survey instruments were included.
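The selection flow can be checked with simple arithmetic. The Python sketch below recomputes each stage from the counts reported in this paragraph; the variable names are illustrative, and the two intermediate figures not stated in the text (records left after title screening and full texts excluded) are derived by subtraction.

```python
# Counts reported in the text.
identified = 11408
duplicates = 4188
excluded_on_title = 5405
excluded_on_abstract = 1304
included_after_full_text = 68
added_from_reference_lists = 2

# Derived stage counts.
after_deduplication = identified - duplicates                    # reported: 7220
after_title_screening = after_deduplication - excluded_on_title  # derived
full_texts_examined = after_title_screening - excluded_on_abstract  # reported: 511
excluded_on_full_text = full_texts_examined - included_after_full_text  # derived
final_included = included_after_full_text + added_from_reference_lists  # reported: 70

for label, count in [
    ("after deduplication", after_deduplication),
    ("after title screening", after_title_screening),
    ("full texts examined", full_texts_examined),
    ("excluded on full text", excluded_on_full_text),
    ("finally included", final_included),
]:
    print(f"{label}: {count}")
```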

Description of the Studies

The inclusion of 70 studies yielded a total of 31 tools to detect PIU and IUD in children and adolescents: (1) Chinese Internet Addiction Goldberg Scale (CIA-Goldberg) [35], (2) Chinese Internet Addiction Goldberg Scale – Young (CIA-Young) [35], (3) Chen Internet Addiction Scale (CIAS) [36, 37], (4) Revised Chen Internet Addiction Scale (CIAS-R) [38], (5) Chen Internet Addiction Scale – Short Form (CIAS-SF) [39•], (6) Chinese Internet Addiction Test (CIAT) [40], (7) Compulsive Internet Use Scale (CIUS) [41,42,43,44,45,46,47,48,49,50,51], (8) Diagnostic Interview for Internet Addiction (DIA) [52], (9) Excessive Internet Use Scale (EIUS) [53], (10) Generalized Problematic Internet Use Scale 2 (GPIUS2) [54,55,56,57,58], (11) Internet Addiction Scale (IAS, originally developed by Nichols and Nicki, 2004 [59]) [60], (12) Internet Addiction Scale (IAS, originally developed by Young, 1998 [61, 62]) [63], (13) Internet Addiction Test (IAT) [64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80], (14) Internet Addiction Test – Adolescence version (IAT-A) [81], (15) IAT – Short version (IAT – Short) [82, 83], (16) brief Internet Addiction Questionnaire (bIAQ) [84], (17) Internet Disorder Scale (IDS-15) [85], (18) Initial screening scale [86], (19) Internet-user Assessment Screen [87], (20) Index of Problematic Online Experiences (I-POE) [88, 89], (21) Internet Related Experiences Questionnaire (IREQ) [90, 91], (22) Korean Scale for Internet Addiction (K-Scale) [92], (23) Online Cognition Scale (OCS) [93], (24) Problematic Internet Entertainment Use Scale for Adolescents (PIEUSA) [94, 95], (25) Problematic Internet Use Questionnaire (PIUQ) [96,97,98], (26) Problematic Internet Use Questionnaire – Short Form, 6 Items (PIUQ-SF-6) [99,100,101], (27) Problematic Internet Use Questionnaire – Short Form, 9 Items (PIUQ-SF-9) [97, 98, 102], (28) Problematic Internet Use Scale in adolescents (PIUS-a) [103], (29) Parental Version of Young Diagnostic Questionnaire (PYDQ) [104], (30) Short Problematic Internet Use Test (SPIUT) [105], and (31) Young’s Diagnostic Questionnaire (YDQ) [106, 107].

These tools are available in a total of 18 languages: Arabic, Chinese, Croatian, English, French, German, Greek, Hebrew, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Malay, Persian, Spanish, and Turkish. The samples were mostly recruited in Europe (n = 35), followed by Asia (n = 29), America (n = 4), and Africa (n = 1; Table 1).

Of all instruments, more than half (n = 18) had an explicit reference to Internet addiction: CIA-Goldberg, CIA-Young, CIAS, CIAS-R, CIAS-SF, CIAT, DIA, IAS (Original: [59]), IAS (Original: [61, 62]), IAT, IAT-A, IAT – Short, bIAQ, IDS-15, IREQ, K-Scale, PYDQ, and YDQ. The remaining 13 focused on problematic/excessive/compulsive Internet use: CIUS, EIUS, GPIUS2, Initial screening scale, Internet-user Assessment Screen, I-POE, OCS, PIEUSA, PIUQ, PIUQ-SF-6, PIUQ-SF-9, PIUS-a, and SPIUT (Table 1).

Of the 31 scales, only 12 had been evaluated more than once in terms of their psychometric properties (CIAS, CIUS, GPIUS2, IAT, IAT – Short, I-POE, IREQ, PIEUSA, PIUQ, PIUQ-SF-6, PIUQ-SF-9, and YDQ). With a total of 17 validation studies, the IAT was by far the most frequently validated measure, followed by the CIUS (n = 11 studies), GPIUS2 (n = 5 studies), and PIUQ-SF-6 (n = 4 studies). Scales validated by 3 studies were the PIUQ and its short version PIUQ-SF-9. The CIAS, IAT – Short, I-POE, IREQ, PIEUSA, and YDQ were validated by 2 studies. The other 19 scales had the weakest evidence base, having been validated by only 1 study: CIA-Goldberg, CIA-Young, CIAS-SF, CIAS-R, CIAT, DIA, EIUS, IAS (Original: [59]), IAS (Original: [61, 62]), IAT-A, bIAQ, IDS-15, Initial screening scale, Internet-user Assessment Screen, K-Scale, OCS, PIUS-a, PYDQ, and SPIUT (Tables 1, 2, 3, and 1).

Except for 7 studies adopting a dichotomous response format (yes/no) on CIA-Goldberg [35], CIA-Young [35], CIAT [40], IAT [67], I-POE [88], PYDQ [104], and YDQ [106, 107], other tools used Likert rating scales, primarily a 5-point Likert scale.

The minimum mean age for which validation was undertaken was 8.55 years (SD = 2.02) (CIUS [48]). Moreover, some of the instruments presented were developed specifically for adolescents and validated in this young age group, like CIA-Young, IAT-A, PIEUSA, and PIUS-a.

A total of 5 short versions of the CIAS, IAT, and PIUQ have been validated in children and adolescents: CIAS-SF, PIUQ-SF-6, PIUQ-SF-9, bIAQ, IAT – Short.

Two instruments standing out from the usual self-reporting questionnaires could be identified: the PYDQ [104], a parental instrument to assess Internet use behavior of their own children, and the DIA [52] consisting of an independent evaluation in the form of a semi-structured interview based on DSM-5.

A second scale based on the DSM-5 criteria was the IDS-15. However, none of the identified tools was based on the ICD-11.

Even though in about half of the studies a cut-off value has been reported (n = 29 studies, 23 tools: CIA-Goldberg [35], CIA-Young [35], CIAS-R [38], CIAS-SF [39•], EIUS [53], GPIUS2 [57], IAS [60], IAS [63], IAT [65, 70, 76, 78,79,80], IAT-A [81], IAT – Short [83], bIAQ [84], Initial screening scale [86], Internet-user Assessment Screen [87], I-POE [89], IREQ [91], K-Scale [92], OCS [93], PIEUSA [95], PIUQ [96, 97], PIUQ-SF-6 [100], PIUQ-SF-9 [97, 102], PIUS-a [103], YDQ [106]), overall, only 2 studies have externally validated the cut-off scores, namely for the CIAS [37] and IAT – Short [83], in which a diagnostic interview was conducted to determine a reliable cut-off value.

Psychometric Findings

In total, internal consistency ranged from α = 0.68 (CIA-Goldberg) [35] to 0.96 (PIUQ) [98], with mostly high reliability scores: in 51 studies, a Cronbach’s alpha higher than 0.80 was reported (Table 2). Although the length of the scale should be taken into account [108], even the short scales proved to be sufficiently reliable. By comparison, considerably fewer tools were tested for test–retest reliability (n = 9 tools; n = 11 studies). Here, the IAS showed the highest outcome (r = 0.98, 1 week) [60], followed by the CIUS (r = 0.89, 2 weeks) [47].

Questionnaires with the longest test–retest duration were the CIAS-SF (48 months, r = 0.20) [39•], and the IAT (24 months, r = 0.56) [72].

With regard to construct validity, operationalized as the relationship between the IUD instrument and other similar or closely related tools, most studies examined the correlation with other screenings on PIU and IUD, using mostly the IAT as a reference tool (range: r = 0.34 for the DIA with the Internet Addiction Proneness Scale for Adolescents (O_A) [52] up to r = 0.88 for the PIUQ with the IAT [98]). The highest convergent validity was found for the PIUQ, which correlated with the IAT at r = 0.88 [98], and its short versions (PIUQ-SF-9: r = 0.86; PIUQ-SF-6: r = 0.85) [98], followed by the CIUS (r = 0.85 with IAT) [43].

In terms of criterion validity, which was evaluated by examining the association between Internet use behavior (i.e., time spent on Internet activities) and the total score of an instrument, the coefficients reported in the validation studies ranged from 0.19 (CIA-Goldberg) [35] to 0.47 (CIUS) [42].

Furthermore, although a total of 11 studies employed invariance analyses, only 2 tools have been tested for measurement invariance using Rasch analyses, namely the IDS-15 and IAT.

No tools retained in this review have been examined in conjunction with standardized measures of functional impairment. Accordingly, no statement on the clinical significance of IUD can be derived.

Discussion

Validation Frequency and Instruments with the Highest Scoring

The systematic review investigated psychometric properties of a total of 31 survey instruments on PIU and IUD in children and adolescents based on 70 identified empirical studies, which were mainly conducted in Europe and Asia. The IAT was the most frequently validated IUD tool, followed by CIUS, GPIUS2, and PIUQ. If the QRG is used as a basis for evaluation, no clear recommendation in favor of a particular tool can be made. With a total score of 10 out of 26 points, the GPIUS2 performed best, followed by the CIUS, IAT, and the short form of the PIUQ (PIUQ-SF-6), each achieving 9 points (Table 1). If only the individual validation studies are taken into account, i.e., without integrating the evaluations of the individual studies on each instrument (Table 3), the IDS-15 [85] has also achieved quite good results. However, when considering the number of validation studies and the individual findings integrated for a single instrument, the IDS-15 scored only 8 points and is therefore not included in the following recommendation, as further validation studies on this instrument are required.

GPIUS2

The GPIUS2 by Caplan [111], consisting of 15 items rated on an 8-point Likert scale, is based on the cognitive-behavioral model and hence does not refer to DSM or ICD criteria, which can be considered a weakness. Furthermore, no data on retest reliability could be identified. A cut-off score has been specified (52 points) [57], though not externally validated. Therefore, it can be used for screening purposes, although an external validation of the cut-off is still pending.

CIUS

With a total of 11 validation studies, the CIUS [109], consisting of 14 items rated on a 5-point Likert scale, was the second most validated tool in young age groups. However, this strength is countered by the disadvantage that the CIUS is based on the DSM-IV criteria for pathological gambling, as well as on the component model of behavioral addiction [110], which are no longer current. In addition, the CIUS has no specified cut-off value for individuals younger than 18 years. Therefore, the CIUS is rather inadvisable for diagnostic purposes in this age group. However, validation studies on this tool relied more frequently on representative, large samples compared to the other scales, which constitutes an advantage. Moreover, data on test–retest reliability are also available [47], in contrast to the IAT and GPIUS2.

IAT

Although the IAT [61] is the most validated tool (17 included studies), this test has some weaknesses, e.g., being based on an older theoretical concept (DSM-IV criteria for pathological gambling) and heterogeneous cut-off values, item numbers (10–20 items), and response formats (dichotomous vs. 3- to 6-point Likert scales). Only one study so far has evaluated the usefulness of the given cut-off scores [77]. In addition, validation studies have not tested this tool with regard to retest reliability. Moreover, with a total of 20 items, the IAT might be quite long and thus less economical compared to other instruments. At the same time, only the IDS-15 [85] and IAT [75] have been tested for measurement invariance using Rasch analysis.

Some refinements of the IAT have been undertaken: an adolescent version, IAT-A [81], and 2 short versions, IAT – Short [82, 83] and bIAQ [84]. However, little data is available regarding their modified items, as they have been validated only once so far in younger age groups. External validity has been tested as correlation with other IUD tools, and first data exist regarding the correlation with an external gold standard derived from a clinical interview [83].

PIUQ

The PIUQ by Demetrovics et al. [112] has also been evaluated comparatively often in young age groups, but it is not based on an addiction theory model, as it refers only to the items of the IAT. A further weakness of the PIUQ is that its cut-off values differ between studies. However, if an economical assessment is desired, the short scales of the PIUQ (PIUQ-SF-9 and PIUQ-SF-6) can be used. It has been shown that, in contrast to the relatively low test–retest reliability ranging between 0.54 and 0.56 [98], the internal consistency of the short scales PIUQ-SF-9 and PIUQ-SF-6 was high enough to reliably measure PIU (α = 0.77) [100, 101]. Thus, the short versions of the PIUQ can be recommended as an economical and more manageable alternative to the long version. Nevertheless, none of the instruments presented above has been validated in clinical samples, so a sound rating of the psychometric quality in clinical contexts is not possible.

PYDQ

As an external rating, the PYDQ extends the view of already established self-report instruments by a parent’s perspective. It appears to be another promising approach overcoming self-assessment, which can be biased by symptom denial [113] or poor introspection skills [114, 115] that could reduce adolescent assessment accuracy [113]. Additionally, the PYDQ is an interesting alternative, especially as parents are often the ones who seek help [104] and might provide a first impression of the PIU of the adolescents. Moreover, some of the items asking about observable behavior are easier for parents to assess than questions concerning the adolescent’s thoughts and feelings [104]. A moderate internal consistency of 0.70 for the PYDQ [104] was measured, which is in the range of the reliability coefficients for the self-assessment version of the YDQ (from α = 0.62 [107] to 0.72 [106]). However, the YDQ exhibited one of the lowest reliability scores compared to the other tools. For future research, comparison with the adolescents' ratings would be of great value to substantiate the construct validity of the PYDQ.

Theoretical Foundation

If the theoretical foundation referring to current DSM-5 criteria for Gaming Disorder is used as an assessment criterion, the instruments IDS-15 and DIA can be recommended. The latter instrument also sounds promising, as it expands the survey formats by providing a supplement to the commonly offered self-report questionnaires. However, further studies on these tools are desirable, as each has been validated only once so far. Thus, we are unfortunately unable to make a well-founded statement regarding their psychometric quality, which marks this as a target for future research.

Clinical Validation

In one of the included studies, the DIA classification was verified in clinical practice by including only children and adolescents scoring above the cut-off for behavioral addiction on at least one screening questionnaire (DIA) [52]. In two other studies (CIAS, IAT – Short), a diagnostic interview by psychiatrists, i.e., an independent evaluation, took place [37, 83]. On the basis of the remaining studies, it is impossible to say whether the investigated tools can be used as a valid measure to distinguish between IUD and normal Internet use.

Cut-off Scores

Although a cut-off value was specified in 29 studies, only 2 studies undertook an external validation of the cut-off (CIAS, IAT – Short). In addition, some studies that reported cut-off values did not explain exactly what the value means or what it refers to. No cut-off value was reported for some of the instruments considered (CIAT, CIUS, DIA, IDS-15, PYDQ, SPIUT), so caution is needed when classifying IUD vs. normal Internet use with these tools [116]. Furthermore, the cut-off values sometimes diverged between studies, even within 1 tool. For example, the cut-off value of 29 points [102] given for the PIUQ-SF-9 to classify PIU was more conservative than the cut-off value of 22 proposed by another study [97]. However, cut-off points are important not only for epidemiological research but also for identifying individuals who need interventions [117]. It would be helpful to use current clinical criteria and diagnostically accurate cut-off values to better differentiate between IA and normal Internet use, enabling the use of already existing tools in clinical contexts [52, 118]. Moreover, longitudinal studies are needed, among other things, to examine how well the identified cut-off points predict behavioral and health outcomes [119]. Overall, it would be advisable to identify problematic users and establish clinically relevant cut-off scores based on an objective gold standard, i.e., a diagnostic interview in clinically diagnosed groups of problematic Internet users as the validation sample [7•, 116, 117].
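One common way to quantify how well a cut-off agrees with an interview-based diagnosis is to compute sensitivity and specificity. The Python sketch below illustrates this general technique; it is not a procedure taken from the reviewed studies. The function name, the rule that a score at or above the cut-off counts as screen-positive, and all scores and diagnoses are assumptions made up for illustration. Only the two cut-off values, 22 and 29, echo the PIUQ-SF-9 example above.

```python
def sensitivity_specificity(scores, interview_positive, cutoff):
    """Compare screening results at a cut-off with interview diagnoses."""
    tp = fp = tn = fn = 0
    for score, is_case in zip(scores, interview_positive):
        screen_positive = score >= cutoff  # assumption: at or above = positive
        if screen_positive and is_case:
            tp += 1
        elif screen_positive and not is_case:
            fp += 1
        elif not screen_positive and is_case:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Invented questionnaire totals and invented interview diagnoses.
scores = [12, 18, 22, 25, 29, 31, 15, 27]
interview_positive = [False, False, False, True, True, True, False, False]

for cutoff in (22, 29):
    sens, spec = sensitivity_specificity(scores, interview_positive, cutoff)
    print(f"cut-off {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this made-up sample, the lower cut-off detects every interview-confirmed case at the cost of false positives, whereas the higher, more conservative cut-off produces no false positives but misses one case.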

Test Refinement

Hardly any of the included tools have been tested for measurement invariance using Rasch analyses, e.g., to assess whether the items and their underlying construct have the same meaning for adolescents as for adults, or to assess cross-national or cultural differences [120]. So far, only the IDS-15 [85] and IAT [75] have been tested for Rasch invariance.

In only about one quarter of the studies (n = 16), validation was based on large, representative samples of children and adolescents. Hence, most validation studies have relied on non-probabilistic samples, so the generalizability of their findings can be considered limited due to the lack of representativeness, which entails a potential impact of sample-specific influences on our comparisons.

Directions for Future Research

As the majority of tools have only had 1 validation study, it seems that most studies are developing and validating new tools on a one-off basis rather than repeatedly validating existing methods. Thus, further studies on already existing instruments are needed to make a final decision on a recommendation [29, 66, 121]. This would be a first step towards a “gold standard” for the assessment of IUD, which would go hand in hand with a better synthesis of research findings as well as better comparability of test results [66]. A next step in establishing the gold standard for IUD assessment would be a reference to current theoretical frameworks, like the DSM-5 for IGD.

Another challenge is a discrepancy in labeling and terminology as various synonyms of IUD exist in the literature with different terms such as “pathological Internet use,” “problematic Internet use,” “excessive Internet use,” “Internet dependency,” or “Internet addiction” being often used interchangeably without clear diagnostic differentiation [122]. Furthermore, not all instruments on IUD have been validated in children and adolescents, like for example the Chinese Internet Addiction Inventory [123].

In addition, transparency regarding the sampling approach, a better description of the sample, and in the case of large representative samples, indication for which characteristics this sample is representative would be also desirable for future research, as we had great difficulties with the rating for this criterion in the QRG due to the fact that only little information on this topic has been reported by the validation studies.

As the majority of tools are based on self-report being thus often susceptible to biases such as social desirability or recall bias, the additional use of external ratings by clinicians or relatives may prove useful in assessing whether self-reported symptoms are a good indicator of PIU [102, 104] especially in young age groups.

Finally, future studies should take a closer look at different activities on the Internet, e.g., chatting and streaming, which are associated with different intensity of use [64, 124]. At the same time, consideration should be given to adapting the already established instruments to the constantly changing trends on the Internet. Some questionnaires were developed many years ago and might no longer correspond to the current use of language, especially by young people. In addition, the field of Internet use is very dynamic: new platforms that should be considered are constantly coming onto the market, and items referring to older platforms need to be reviewed to ensure they are up to date. In the Internet-user Assessment Screen, for instance, rather outdated items like “I often use MSN or ICQ of Blogs” are used [87]. As a further example, the IAT assesses with one item the frequency of checking emails, although in the meantime other communication tools such as TikTok and WhatsApp have largely replaced emails and are of higher importance among adolescents [65]. In general, the wording of items should be examined in order to ensure content validity [125].

In addition, the use of more objective measures of online activities, e.g., to assess the actual usage time with the help of apps, could be a useful valid alternative tool to self-report data [33••, 126].

Previous reviews of PIU screening instruments have reported various inconsistencies and psychometric weaknesses [31], which we can confirm based on the data presented: Overall, the quality of the included studies can only be rated as moderate, as the highest scores achieved in our review were only 11 (IDS-15, Table 3, which does not yet integrate the rating for a single instrument across all validation studies) and 10 (GPIUS2, Table 1, comprising integrated ratings across studies for each instrument) out of a maximum of 26 points in the QRG. Therefore, for future research, it would be desirable to improve study quality, in addition to clinically validating existing tools with consideration of functional impairment.

Limitations

Results of our review should be interpreted in the context of certain limitations: First, our research project was not preregistered, for instance in the Prospective Register of Systematic Reviews (PROSPERO). Secondly, studies presenting only one global value for internal consistency were excluded, which meant that additional reliability values were not considered. Thirdly, studies examining solely divergent validity were not included. Fourthly, a meta-analysis was not conducted, as we assumed considerable heterogeneity in the assessment of PIU and IUD. No additional subgroup or sensitivity analyses were conducted either, as these were not in line with our study aims. Lastly, the search was limited to English language studies, so relevant literature published in other languages may have been omitted.

Conclusions

This systematic review offers an overview of the psychometric evidence of existing screening instruments for IUD in young age groups. The IAT was by far the most widely validated measure in our review. On the whole, our review revealed rather inconclusive findings: None of the validated instruments for the assessment of PIU has proven to be clearly superior, as strengths in one area are offset by weaknesses in other areas. The GPIUS2 achieved the highest scores, followed by CIUS, IAT, and the short form of the PIUQ, the PIUQ-SF-6. When using the reference to the classification systems as a criterion, the DIA and IDS-15 can be used.

Although no final recommendation on a specific screening instrument for IUD in childhood and adolescence can be made, there are a number of promising scales; especially those with only one validation study each still need further validation. Moreover, all of the included studies only partially provided psychometric properties, as none of them covered all of our criteria.

As a relatively new research field in the spectrum of addiction, the assessment of IUD would benefit from clear definitions and standardized screening instruments to identify Internet-related harms across the spectrum of maladaptive behavior patterns on the Internet. To strengthen the research field on IUD, further research is required on the optimization of screening tools and their applicability in large-scale studies, as is the development of future instruments with a clear cut-off score that take the current DSM-5 and ICD-11 criteria for IGD into account and are validated against a recognized (clinical) gold standard. To shed further light on this topic, our working group is currently pursuing an expert appraisal in a Delphi process [127] in order to develop recommendations regarding screening tools for use within younger population groups [128].