1 Introduction

International Large-Scale Assessments (ILSAs), such as the Programme for International Student Assessment (PISA), have increasingly influenced the educational decisions made by policymakers in recent years. ILSAs provide cross-national data on students' cognitive and non-cognitive characteristics, such as achievement, motivation, and attitude. The availability of cross-national data has stimulated comparative analysis in education and psychology (Leitgöb et al., 2023). Educational researchers engage in cross-national studies to understand similarities and differences in human behavior and to contribute to educational goals (Rios & Hambleton, 2016). The International Test Commission (ITC) guidelines point out notable issues that can lead to invalid inferences from cross-national data when measuring educational and psychological constructs across cultures (ITC, 2017). One of these issues involves developing reliable and valid tests for cross-national applications. Every cross-national study needs to address whether test scores obtained in diverse cultural groups can be interpreted in the same way across these groups, a property called measurement invariance or equivalence (Van de Vijver & Tanzer, 2004). The ITC guidelines recommend comparing scores across populations only when measurement invariance has been established for the scale. This emphasizes the importance of using state-of-the-art methodology in international assessments to ensure cross-national comparability and validity of results across different cultural groups (Hambleton & Zenisky, 2010).

Self-report questionnaires are administered, and scores are produced, to evaluate students' non-cognitive constructs across different education systems in ILSAs (He et al., 2019; van de Schoot et al., 2012). PISA has measured non-cognitive constructs such as information and communication technology (ICT) familiarity, social and emotional characteristics, beliefs, and attitudes across the participating countries. PISA assesses the performance of countries by using student scores on cognitive and non-cognitive measures. Achieving cross-national comparability is challenging because observed differences in these constructs should reflect actual group differences across countries rather than items operating inconsistently across them (Leitgöb et al., 2023). Measurement invariance (MI) is a statistical prerequisite for conducting trustworthy cross-national comparisons (Vandenberg & Lance, 2000). MI means that a construct is measured in the same way under different conditions, such as cultures, countries (Ding et al., 2022), or time points (Sözer et al., 2021); that is, the same construct is measured in the same way regardless of group membership (Van de Vijver & Tanzer, 2004; Vandenberg & Lance, 2000). MI is violated if individuals from different groups respond to a test item differently when they are at the same level on the construct (Kim et al., 2017). The observed variation in individual responses under different conditions should not be attributable to the measurement instrument itself but should reflect true differences across the conditions (Ding et al., 2022). Establishing MI is therefore important for drawing trustworthy inferences from scores when researchers compare many groups on non-cognitive constructs.

With the world becoming increasingly digitalized, PISA cycles have measured the relationship between ICT-related variables in the learning process and educational outcomes. Researchers are able to compare ICT-related variables between countries using PISA assessments (Kim & Kim, 2023; Lee, 2023; Sırgancı, 2023; Zheng et al., 2024). Despite the extensive use of these assessments, ensuring their fairness and comparability across diverse cultural and educational contexts remains a challenge (Wurster, 2022). The motivation for this study arises from the need to enhance the validity of cross-cultural comparisons, which is essential for informed policymaking and educational improvement. Because ICT-related measurements are widely utilized, MI is a fundamental requirement for making meaningful comparisons and drawing valid conclusions about educational policies on a global scale (Vandenberg & Lance, 2000). To test MI, both newer and traditional methods were employed to ensure the robustness of our findings. By integrating these methods, we aimed to comprehensively assess the consistency and validity of ICT-related measurements across different educational contexts, thereby improving the reliability of cross-cultural comparisons.

This study examines the comparability of ICT-related measurements in PISA 2022, thereby supporting fair and comparable results across diverse educational contexts. Furthermore, this research seeks to offer practical recommendations for policymakers and educational practitioners. These recommendations can guide the development of more equitable and effective educational policies that address the needs of diverse student populations and contribute to global educational improvement efforts. This paper is organized as follows. First, the literature on students' access to online information using ICT is reviewed, focusing on ICT in PISA, followed by a review of the importance of MI and methods for testing MI. After introducing the methodological approach, the results are presented and discussed.

1.1 Students’ practices regarding online information

Information and communication technologies (ICT) are increasingly significant in education. In recent years, there has been a rapid growth in integrating ICT into educational practices. The integration of ICT in education has allowed for the development of interactive and engaging teaching materials and innovative teaching methods (OECD, 2023a). The accessibility of educational resources through ICT has expanded and provided equal opportunities for quality education to students across countries and socio-economic backgrounds (Ma & Qin, 2021). The use of ICT in education allows for individualized learning experiences, as students can access resources and materials at their own pace and according to their individual needs and interests (Nadeem et al., 2018).

Research has shown that the use of ICT in education can improve students' achievement (Zhao & Chen, 2023), motivation (Toma et al., 2023), and engagement (García-López et al., 2023). Integrating ICT into educational settings has increased access to information. However, online information may not always be trustworthy or reliable, so students should learn how to critically evaluate and verify the information they find online (Hu et al., 2018). This will help them develop digital literacy skills and become discerning information consumers in the digital age (Alkan & Meinck, 2016). The PISA framework explores access to and use of ICT by 15-year-old students in and outside of school. This framework involves three major dimensions: (1) access to ICT, (2) use of ICT, and (3) students' ICT competencies (OECD, 2023a). The use of ICT covers the intensity and the types of ICT used by students in formal and informal environments for learning and leisure. The ICT Familiarity Questionnaire provides data for 12 subscales covering various aspects of ICT. One of the subscales measuring the use of ICT is Students' Practices Regarding Online Information (ICTINFO), which includes various statements about students' practices regarding online information. The aim of this study is to explore the comparability of scores on this scale across countries in the PISA 2022 cycle.

There is a growing body of literature examining the MI of PISA's ICT scales before comparing them across different countries, cultures, or languages (Kankaraš & Moors, 2014; Ma & Qin, 2021; Meng et al., 2018). These studies aim to ensure the validity and reliability of the ICT scales, allowing for meaningful comparisons of students' scale scores across countries. Van de Vijver and Tanzer (2004) argued that cross-cultural studies must consider levels of invariance when assessing contextual factors. This means that, for both cognitive and non-cognitive assessments, the scores from different cultures should reflect what the assessment is intended to measure. Wu et al. (2023) studied the cross-cultural comparability of non-cognitive constructs (school and teacher ICT readiness) from PISA 2018. They found that the school ICT readiness scale was invariant, but the teacher ICT readiness scale showed some degree of noninvariance. Wu and colleagues (2023) suggest that using scores derived from the teacher scale to compare countries might result in improper conclusions. Odell et al. (2021) investigated the international comparability of mathematics, science, and ICT scales from PISA 2015. Their results demonstrated that mathematics and science scores were invariant, whereas the ICT scale was mostly noninvariant and could not be used to compare ICT means reliably across countries. Overall, the results from different studies suggest that before any comparative inference is made from cross-cultural data, MI should be tested to avoid misinterpretation of the results and to make valid comparisons.

1.2 Measurement invariance

It is suggested that measurement instruments used for any purpose should be fair, meaning that they reflect the same construct for all students and that their scores have the same meaning for all students. A fair test does not advantage or disadvantage some students because of characteristics irrelevant to the intended construct, such as gender, age, or cultural background (AERA et al., 2014). However, a number of potential biases might damage the comparability of scores or fairness. Van de Vijver and Tanzer (2004) distinguished three types of measurement bias, depending on where the source of bias is located: (1) construct bias refers to the phenomenon where the interpretation of a construct varies across cultures; (2) method bias arises from differences in sampling methods and modes of administration; and (3) item bias means an item has a different meaning in different cultures. Psychometric comparability (absence of bias) should be checked before comparative inferences are made. Levels of comparability are also referred to as measurement invariance (MI) (He et al., 2019).

Testing MI is a fundamental psychometric validation step when utilizing data from ILSAs to ensure substantive cross-group comparisons (He et al., 2019; Van de Vijver et al., 2019). It provides evidence that group differences in latent/observed scores are not due to the measurement instrument but rather reflect true differences in the construct being measured (Ma & Qin, 2021). Multiple Group Confirmatory Factor Analysis (MGCFA) within Structural Equation Modeling is the most widely used statistical approach (Meade & Lautenschlager, 2004; Leitgöb et al., 2023; Vandenberg & Lance, 2000). Generally, four levels of invariance are tested: configural, metric, scalar, and strict. Configural invariance is the baseline level, meaning that the same items measure the same latent constructs in different groups. Establishing configural invariance alone does not allow for group comparisons at the latent or observed level. Metric invariance requires the same items to have the same factor loadings across groups (Leitgöb et al., 2023). If metric invariance is established, group-estimated factor variances can be compared at the latent level. Scalar invariance additionally requires intercepts/thresholds to be equivalent across groups. Ascertaining scalar invariance allows for group comparisons of means at the latent level. Strict invariance requires factor loadings, intercepts/thresholds, and error variances to be equivalent across groups. However, strict invariance does not directly affect the comparability of structural parameters across groups (Leitgöb et al., 2023).
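These levels can be summarized with the standard MGCFA measurement model; the notation below is a generic formulation added for reference rather than one taken verbatim from the cited sources. For student $i$ in group $g$,

$$ y_{ig} = \nu_g + \Lambda_g \eta_{ig} + \varepsilon_{ig}, \qquad \mathrm{Cov}(\varepsilon_{ig}) = \Theta_g, $$

where $\nu_g$ contains the item intercepts, $\Lambda_g$ the factor loadings, and $\Theta_g$ the error variances. Configural invariance constrains only the pattern of free and fixed elements of $\Lambda_g$ to be the same across groups; metric invariance adds $\Lambda_g = \Lambda$; scalar invariance adds $\nu_g = \nu$; and strict invariance further adds $\Theta_g = \Theta$ for all groups.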

The alignment method, or alignment optimization, introduced by Asparouhov and Muthén (2014) has been proposed as an alternative to MGCFA. Scalar invariance is necessary to make factor means comparable, but it is frequently rejected, especially when many groups are involved (e.g., Kaya et al., 2023). With many groups, the MGCFA approach becomes impractical due to the numerous potential invariance violations. The alignment method does not assume MI and overcomes these limitations by finding the optimal measurement parameters within the configural model (Kim et al., 2017). This method estimates all parameters (factor loadings, intercepts, and measurement errors) under the assumption that the number and amount of noninvariant parameters are minimal. The alignment method has two steps (Muthén & Asparouhov, 2018): (1) it begins with estimation of the configural model, in which factor loadings and intercepts are freely estimated and factor means and variances are fixed to 0 and 1 for model identification; (2) alignment optimization then estimates group-specific factor means and variances (Immekus, 2021), choosing their values to minimize the total amount of noninvariance using a simplicity function. The model fit of the optimized solution is equivalent to that of the configural model (Asparouhov & Muthén, 2014).
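For reference, the core of the optimization can be stated compactly. The expressions below restate the formulation in Asparouhov and Muthén (2014) as a sketch; details such as the exact group-size weighting are simplified. Given configural loadings $\lambda_{pg,0}$ and intercepts $\nu_{pg,0}$ for item $p$ in group $g$, the aligned parameters implied by group factor means $\alpha_g$ and variances $\psi_g$ are

$$ \lambda_{pg,1} = \frac{\lambda_{pg,0}}{\sqrt{\psi_g}}, \qquad \nu_{pg,1} = \nu_{pg,0} - \alpha_g \frac{\lambda_{pg,0}}{\sqrt{\psi_g}}, $$

and the optimization chooses $\alpha_g$ and $\psi_g$ to minimize the total (simplicity) loss

$$ F = \sum_{p} \sum_{g_1 < g_2} w_{g_1,g_2} \Big[ f\big(\lambda_{pg_1,1} - \lambda_{pg_2,1}\big) + f\big(\nu_{pg_1,1} - \nu_{pg_2,1}\big) \Big], \qquad f(x) = \sqrt{\sqrt{x^{2} + \epsilon}}, $$

where $w_{g_1,g_2}$ are group-size weights and $\epsilon$ is a small positive constant. The component loss $f$ rewards solutions in which many pairwise parameter differences are close to zero, which is what makes the method favor a small set of large noninvariances over many small ones.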

Asparouhov and Muthén (2014) demonstrated the strengths of the alignment method. It performs better in estimating group-specific factor means and variances for many groups without requiring MI, and it can also be used to detect the most invariant items in a measurement instrument. In recent years, the alignment method has been applied in many studies to examine MI across countries using ILSA data (Ding et al., 2022; Immekus, 2021; Jin et al., 2023; Munck et al., 2017; Rolfe, 2021; Yiğiter, 2024). However, the alignment method is not yet widely recognized and is less utilized in testing the MI of ICT-related variables (Odell et al., 2021; Wu et al., 2023).

1.3 The present study

The PISA 2022 ICT framework explores the relationship between students' use of ICT and cognitive achievement, well-being, and level of ICT competencies (OECD, 2023a). ICT competencies comprise five main areas, one of which is accessing, evaluating, and managing information and data. Accessing information requires sorting through different information sources and evaluating their relevance and usefulness. As growing amounts of information are generated, the process of filtering through all of it becomes progressively more vital (Fraillon et al., 2013). The quality, credibility, and accuracy of online information shape students' practices with ICT both in and outside the classroom. The PISA 2022 framework measures this area with the "Students' Practices Regarding Online Information (ICTINFO)" scale. To our knowledge, the MI of this scale, which is required for valid cross-national comparison, has not been investigated. The current study aims to examine the MI of the students' practices regarding online information scale across countries within PISA 2022 using MGCFA and the alignment method.

2 Methods

2.1 Data source

PISA 2022, the eighth cycle, was implemented in 81 countries, 37 of which are OECD countries. Data from the 29 OECD countries that administered the ICTINFO questionnaire were retrieved from the PISA 2022 official website (https://www.oecd.org/pisa/data/2022database/). Because some OECD countries did not implement or answer the questionnaire, eight countries were removed from the data. The final dataset contains 187,614 15-year-old students. All countries administered the ICTINFO questionnaire on computers. Table 1 shows the sample size and reference code of each country.

Table 1 Reference code and sample size by countries

The study includes data exclusively from OECD countries due to their diverse yet well-documented and comparable educational systems and data collection practices (OECD, 2023a). This selection allows for a robust analysis of MI that offers comparability across countries with similar socio-economic characteristics. It enhances the study’s reliability and validity in proposing broader implications for international educational assessments (Tabak & Çalık, 2020).
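As an illustration of the data-preparation step described in this subsection, the sketch below shows how the student questionnaire file could be read and restricted to the ICTINFO items and the retained OECD countries. The file name, item variable names, and the country list are illustrative assumptions, not the actual PISA 2022 file structure or the authors' code.

```python
# Hypothetical sketch of retrieving and filtering the PISA 2022 student questionnaire data.
# File name, item variable names, and the country list are assumptions for illustration only.
import pyreadstat

STUDENT_FILE = "STU_QQQ.sav"                       # assumed name of the downloaded SPSS file
ICT_ITEMS = [f"IC180Q0{i}" for i in range(1, 7)]   # hypothetical names of the six ICTINFO items
COLUMNS = ["CNT", "CNTSCHID", "CNTSTUID"] + ICT_ITEMS

# Read only the columns needed for the invariance analysis
df, meta = pyreadstat.read_sav(STUDENT_FILE, usecols=COLUMNS)

# Keep the OECD countries that administered the ICT Familiarity Questionnaire (29 in total)
oecd_ict_countries = ["AUT", "BEL", "CZE", "DNK", "FIN"]  # truncated illustrative list
df = df[df["CNT"].isin(oecd_ict_countries)]

# Drop students with no responses on any ICTINFO item before model estimation
df = df.dropna(subset=ICT_ITEMS, how="all")
print(df.groupby("CNT").size())                    # sample size per country, as in Table 1
```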

2.2 Measures

The Students' Practices Regarding Online Information (ICTINFO) scale is a self-report measure with six items. Students were asked to rate their agreement with various statements about their practices regarding online information on a four-point Likert-type scale ("Strongly disagree," "Disagree," "Agree," "Strongly agree"). The scale, one of the subscales of the ICT Familiarity Questionnaire, consists of the IC180 items, which are presented in Table 2. The Cronbach's alpha coefficients of the scale ranged from 0.75 to 0.89 across participating countries (OECD, 2023b), indicating appropriate reliability.

Table 2 Items of students’ practices regarding online information scale (ICTINFO)

2.3 Data analysis

Data analysis started with data cleaning, including the examination of missing and extreme values and the assumptions of multivariate analysis. After checking the assumptions, MGCFA and the alignment method were used to explore the level of MI of the ICTINFO questionnaire across countries.

2.3.1 Multiple group confirmatory factor analysis (MGCFA)

MGCFA was used to determine the level of invariance among countries. MGCFA follows a sequential procedure. The first step is configural invariance, which assumes that the indicators of the construct share a common factor structure across groups. If configural invariance holds, metric invariance is tested, where the factor loadings are constrained to be equal. If metric invariance is supported, scalar invariance is tested, where the intercepts are also constrained to be equal. Finally, if scalar invariance is supported, strict invariance is tested, where the measurement errors are additionally constrained to be equal (Vandenberg & Lance, 2000). The chi-square (χ2) test was used to evaluate model fit for the nested models. Additional model fit indices are recommended because the chi-square test is sensitive to sample size (Cheung & Rensvold, 2002). The root mean square error of approximation (RMSEA) and the standardized root mean square residual (SRMR) (< 0.08), and the comparative fit index (CFI) and the Tucker-Lewis index (TLI) (> 0.95), were taken as reference values in the model comparisons (Hu & Bentler, 1999). Criteria for metric and scalar invariance included a difference in CFI of less than 0.01 (ΔCFI ≤ 0.01) (Cheung & Rensvold, 2002) and a change in RMSEA of less than 0.015 (ΔRMSEA ≤ 0.015) (Chen, 2007). Rutkowski and Svetina (2014) recommended that more liberal values of ΔCFI ≤ 0.02 and ΔRMSEA ≤ 0.03 be used for evaluating metric invariance when large numbers of groups are under consideration.
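The decision rules above can be expressed as a small helper; the sketch below is illustrative rather than the authors' code, and it assumes the fit indices have already been obtained from external SEM software.

```python
# Illustrative check of the delta-CFI / delta-RMSEA criteria for nested invariance models.
def invariance_step_holds(delta_cfi, delta_rmsea, cfi_cut=0.01, rmsea_cut=0.015):
    """delta_cfi: decrease in CFI; delta_rmsea: change in RMSEA when constraints are added."""
    return delta_cfi <= cfi_cut and delta_rmsea <= rmsea_cut

# Configural -> metric comparison reported in Section 3.1 (standard cut-offs): holds
print(invariance_step_holds(0.006, 0.007))    # True
# Metric -> scalar comparison reported in Section 3.1: rejected
print(invariance_step_holds(0.03, 0.021))     # False
# The more liberal many-group criteria (Rutkowski & Svetina, 2014) still reject scalar here
print(invariance_step_holds(0.03, 0.021, cfi_cut=0.02, rmsea_cut=0.03))  # False
```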

2.3.2 The alignment method

The alignment procedure was introduced by Asparouhov and Muthén (2014). The method assumes that most measurement parameters remain invariant across groups while a small set of parameters is noninvariant (Leitgöb et al., 2023). The method can be based on either maximum likelihood estimation (MLE) or Bayesian estimation, and it offers two alignment optimizations, FIXED and FREE. In the FREE optimization, a configural model is established, and factor loadings and intercepts are freely estimated for each group. It is recommended to initially employ FREE optimization, particularly for comparisons involving many groups. However, the alignment may suffer from poor identification or fail to converge; if so, the FIXED optimization has to be used. The FIXED optimization constrains the factor mean and variance in the reference group to 0 and 1, respectively.

The current study followed two steps in the alignment analysis. It started by estimating a configural model with FREE optimization using Bayesian estimation. Since the analysis produced a warning message about model convergence, the statistical program suggested using FIXED optimization with Sweden (group 27) as the reference group. In the FIXED optimization, the factor mean was constrained to 0 for the reference group. In the next step, factor means and variances are assigned values, based on the pattern of parameter estimates, that minimize the total amount of noninvariance across pairs of groups using a simplicity function. The estimation stops when the smallest amount of noninvariance is achieved (Odell et al., 2021).
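To make the optimization concrete, the toy sketch below applies the alignment idea to simulated configural parameters: group factor means and variances are chosen to minimize a simplicity function built from pairwise differences of the aligned loadings and intercepts. This is a simplified illustration under our own assumptions (equal group weights, a generic optimizer, one reference group fixed), not the Mplus implementation used in the study.

```python
# Toy illustration of alignment optimization; not the Mplus algorithm used in this study.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_groups, n_items = 5, 6
lam0 = rng.uniform(0.6, 0.9, size=(n_groups, n_items))  # simulated configural loadings
nu0 = rng.uniform(-0.2, 0.2, size=(n_groups, n_items))  # simulated configural intercepts

def component_loss(x, eps=0.01):
    # Component loss f(x) = sqrt(sqrt(x^2 + eps)), as in Asparouhov and Muthén (2014)
    return np.sqrt(np.sqrt(x ** 2 + eps))

def total_loss(params):
    # Reference group (group 1) fixed at mean 0 and variance 1; the other groups are free
    alphas = np.concatenate([[0.0], params[:n_groups - 1]])        # factor means
    psis = np.concatenate([[1.0], np.exp(params[n_groups - 1:])])  # factor variances (> 0)
    lam = lam0 / np.sqrt(psis)[:, None]                            # aligned loadings
    nu = nu0 - alphas[:, None] * lam                               # aligned intercepts
    loss = 0.0
    for g1 in range(n_groups):                                     # sum over group pairs
        for g2 in range(g1 + 1, n_groups):
            loss += component_loss(lam[g1] - lam[g2]).sum()
            loss += component_loss(nu[g1] - nu[g2]).sum()
    return loss                                                    # group-size weights omitted

start = np.zeros(2 * (n_groups - 1))
result = minimize(total_loss, start, method="Nelder-Mead")
print("aligned factor means (groups 2-5):", np.round(result.x[:n_groups - 1], 3))
```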

The method separates invariant and noninvariant parameters. Asparouhov and Muthén (2014) recommended that group comparisons can be made with statistical confidence when less than 25% of parameters are noninvariant. The alignment output provides a table showing which groups are invariant for a given parameter; the noninvariant groups are shown in parentheses (Odell et al., 2021). An R2 value measures the invariance effect size, quantifying how much of the variability in an item parameter can be explained by the factor means and variances. It ranges from 0 to 1, and a value closer to 1 indicates a more invariant parameter (Asparouhov & Muthén, 2014; Kim et al., 2017). In this study, a Monte Carlo (MC) simulation was conducted to assess the stability of factor means across groups. This process offers additional evidence for the reliability of the alignment solution (Asparouhov & Muthén, 2014). MGCFA and the alignment method were estimated using Mplus (version 8.3) (Muthén & Muthén, 1998–2012).

3 Results

3.1 MGCFA results

The MGCFA results are presented step by step. Table 3 shows fit statistics comparing the configural, metric, and scalar models for the ICTINFO scale across 29 OECD countries. First, the configural model was fitted for each country, indicating acceptable model-data fit (χ2 (203) = 5600.39, CFI = 0.98, TLI = 0.97, RMSEA = 0.065, SRMR = 0.021). After the configural model was fitted, the metric model was estimated. The metric model demonstrated an acceptable model fit (χ2 (343) = 7780.83, CFI = 0.98, TLI = 0.97, RMSEA = 0.058, SRMR = 0.051). The chi-square difference test was significant between the configural and metric models (Δχ2 = 2180.44, df = 140, p < .05). The changes in CFI (ΔCFI = 0.006 < 0.01) and RMSEA (ΔRMSEA = 0.007 < 0.015) indicated that the more constrained metric model did not fit worse than the less constrained configural model. Thus, metric invariance was established. The scalar model showed an acceptable fit (χ2 (483) = 19600.98, CFI = 0.95, TLI = 0.95, RMSEA = 0.079, SRMR = 0.084). The chi-square difference test was significant between the metric and scalar models (Δχ2 = 11820.15, df = 140, p < .05). The changes in CFI (ΔCFI = 0.03 > 0.01) and RMSEA (ΔRMSEA = 0.021 > 0.015) exceeded the cut-off criteria, indicating that the more constrained scalar model fit worse than the less constrained metric model. Thus, scalar invariance was not established.

Table 3 Goodness-of-fit-statistics for MGCFA

According to the MGCFA results, comparing factor means across countries was not possible because scalar invariance did not hold. With this many groups, it is hard to determine for which countries MI does not hold. There were many large modification indices for the scalar model, which implies that a long sequence of model modifications would be needed.

3.2 Alignment method results

The alignment analysis started with FREE optimization. This optimization produced an error message about poor model identification. Thus, Sweden (27) was selected as the reference group in the subsequent analysis with the FIXED approach.

Table 4 presents each item's intercept and factor loading contribution to the simplicity function. The Fit Function Contribution (FFC) quantifies the contribution of each item's intercepts and factor loadings to the final simplicity function, and a high FFC indicates a possibly noninvariant item. The item IC180.4 contributed the least to the fitting function for the intercept parameters, whereas IC180.6 showed the greatest amount of noninvariance. In other words, item IC180.4 had the greatest amount of invariance in the intercept parameters. The item IC180.5 contributed the least to the fitting function for the factor loading parameters, whereas IC180.1 (very closely followed by IC180.2) showed the greatest amount of noninvariance. The Total Contribution Function (TCF) presents the total contribution of each indicator to the fitted model. The item IC180.4 contributed the least to the fitting function (-280.11), implying that it was the most invariant item, whereas IC180.6 (-358.90) was the least invariant item. There was more invariance in the factor loadings (-883.64) than in the intercepts (-1003.48).

Table 4 Alignment fit statistics of items

A higher R2 value indicates a more invariant item. In Table 4, the R2 values of IC180.4 and IC180.5 were the highest for the factor loadings and the intercepts, respectively, indicating the most invariant items. In contrast, the R2 values suggested that items IC180.6 and IC180.2 were the least invariant items for the intercepts and the factor loadings, respectively. The R2 values of IC180.1 and IC180.2 were equal to zero for the intercepts. Somewhat confusingly, this R2 can be small even if the corresponding parameter is highly invariant (Rudnev, 2024). Consequently, the parameters of each item exhibited varying degrees of noninvariance.

The (non)invariance of factor loadings and intercepts across countries is shown in Table 5. A bold, parenthesized group indicates noninvariance. As seen in Table 5, most of the items show varying degrees of measurement noninvariance in their intercepts and factor loadings. For example, the factor loadings of IC180.1 differed significantly for the parenthesized countries coded as 11, 12, 16, 19, 20, 21, 24, 25, 27, and 28; metric invariance did not hold for item IC180.1 across these countries. Factor variances can be compared for the remaining countries. The intercepts of IC180.1 were noninvariant across the countries coded as 1, 5, 6, 7, 8, 10, 11, 12, 16, 18, 20, 21, 22, and 28, so scalar invariance was not established for this item across these countries, and the latent (factor) means cannot be compared across them. In terms of factor loadings, given six items and 29 countries (a total of 174 (6 × 29) parameters), the 53 noninvariant parameters (30.45%) exceeded the proposed 25% noninvariance rate. The 92 noninvariant intercept parameters (52.8%) also exceeded the 25% rate. These noninvariance rates were too high to allow comparisons across countries; therefore, a trustworthy comparison cannot be established among the 29 countries. In this study, latent (factor) means were not compared because of the large proportion of noninvariance.

Table 5 Approximate measurement (non) invariance for intercepts and factor loadings of items
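The 25% rule applied above can be reproduced with simple arithmetic; the snippet below uses the counts reported in the text, and small rounding differences from the in-text percentages are possible.

```python
# Checking the reported noninvariance rates against the 25% threshold (6 items x 29 countries).
n_items, n_countries = 6, 29
total = n_items * n_countries  # 174 parameters of each type
for label, count in [("factor loadings", 53), ("intercepts", 92)]:
    rate = count / total
    verdict = "exceeds" if rate > 0.25 else "is within"
    print(f"{label}: {count}/{total} = {rate:.1%} {verdict} the 25% threshold")
```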

3.3 Monte Carlo (MC) simulation check of 29-country alignment

Muthén and Asparouhov (2013) suggested that MC simulation studies can be conducted to examine how well the alignment method works under conditions that vary the number of groups and the degree of measurement noninvariance. The alignment results showed that the ICTINFO scale had more than 25% noninvariant parameters. Therefore, an MC simulation was conducted to examine the reliability of the factor means and the quality of the alignment solution.

MC simulations for two sample sizes, n = 5000 and n = 10,000, each with 100 replications (nrep = 100), are reported in Table 6. The correlation between the true factor means and the estimated factor means was computed for each replication and averaged over the replications; the same procedure was used for the factor variances. The standard deviations refer to the standard deviations of these correlations across simulation runs. The correlations between true and estimated values were 0.99 for both factor means and factor variances. These estimates are well above the recommended cut-off value of 0.98 for trustworthiness (Muthén & Asparouhov, 2018). Thus, although a large proportion of noninvariant parameters existed, the parameter estimates obtained for ICTINFO in the alignment analysis can be considered trustworthy.

Table 6 Results of Monte Carlo simulation
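The summary reported in Table 6 can be illustrated with the small sketch below: correlations between true and estimated group factor means are computed per replication and then averaged. The re-estimation step is replaced here by adding noise to the true values, purely to show how the averaged correlation and its standard deviation are formed; it is not the Mplus Monte Carlo setup used in the study.

```python
# Illustrative computation of the averaged true-estimated correlation used as the MC criterion.
import numpy as np

rng = np.random.default_rng(42)
true_means = rng.normal(0.0, 0.5, size=29)   # stand-in "population" factor means, 29 groups
n_rep = 100

corrs = []
for _ in range(n_rep):
    # In the real check, each replication re-estimates the aligned model on generated data;
    # here that step is replaced by adding small noise to the true means.
    estimated = true_means + rng.normal(0.0, 0.05, size=true_means.size)
    corrs.append(np.corrcoef(true_means, estimated)[0, 1])

corrs = np.array(corrs)
print(f"average correlation = {corrs.mean():.3f} (SD = {corrs.std(ddof=1):.3f})")
# Values of at least 0.98 are taken as evidence of a trustworthy alignment solution.
```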

4 Discussion

Researchers should conduct tests of MI before comparing mean scores across different groups. The aim of this study was to examine the MI of the students' practices regarding online information (ICTINFO) scale used in PISA 2022 across countries, as well as to compare different approaches to MI analysis. In this study, comparing latent means across countries was not possible due to the failure of scalar invariance when using traditional MGCFA. Because metric invariance was reached, the MGCFA results imply that factor loadings can be compared across countries without bias. Consistent with earlier research, the majority of non-cognitive factors in PISA, particularly ICT-related indicators (Immekus, 2021; Odell et al., 2021; Wu et al., 2023), fail to achieve higher degrees of invariance among different countries when using the MGCFA method (Ding et al., 2022; Wurster, 2022).

The MGCFA approach has certain limitations, notably the need for numerous model modifications to examine MI when dealing with many groups. In contrast, the alignment method has strengths for comparisons across many groups. The alignment method can identify the most invariant items in a scale and provide estimates of parameter invariance for each model parameter in each group (Muthén & Asparouhov, 2013). Researchers can then focus on the noninvariant items and further identify the sources of noninvariance (Wu et al., 2023). In this study, consistent with the MGCFA results, the ICTINFO scale was identified as noninvariant across countries. The proportions of noninvariant factor loadings and intercepts exceeded the acceptable cutoff value (> 25%), which means that comparisons of means across the participating OECD countries would be untrustworthy. The Monte Carlo simulations confirmed the quality of the alignment analysis and the reliability of the results. Without evidence of MI, it is hard to know whether observed differences between groups are real or merely due to measurement error. Future studies are needed to identify possible sources of the noninvariance. Understanding noninvariance and its causes in ICT scales helps researchers develop more culturally consistent items.

The present study comprises a total of 29 education systems and over 180,000 students from the PISA 2022 cycle. Asparouhov and Muthén (2014) indicated that large numbers of groups and large sample sizes have been shown to be associated with violations of invariance. According to the results of MGCFA and the alignment method, the construct was measured in the same way across countries (the configural model held); however, the indicators (items) of the construct cannot be directly compared. The current study detected the most noninvariant indicators in the construct. The indicators "I try to flag wrong information when I encounter it online (IC180.6)" for the intercepts and "I discuss the accuracy of online information with my teachers or in class (IC180.3)" for the factor loadings appear to contribute the most noninvariance. This could be further investigated to determine possible sources of noninvariance and to check whether these indicators are appropriate measures of the given construct.

Based on the findings of this study, there are several potential explanations for why MI was not achieved for the indicators of the ICTINFO scale. One potential explanation is that differences in educational curricula and teaching practices across countries may influence how students engage with and discuss online information accuracy in classroom settings (Sellar & Lingard, 2013). Educational systems that emphasize digital literacy skills may interpret and teach these concepts differently. Another explanation concerns language and contextual nuances (Addey & Gorur, 2020). The wording and context of survey items (e.g., "flagging wrong information" or "discussing accuracy") may not translate uniformly across languages and cultures. This linguistic and contextual variability can lead to differences in interpretation and response patterns among students from different linguistic backgrounds or educational contexts. Lastly, methodological factors related to the measurement approach used (e.g., survey design, item characteristics, scaling techniques) could contribute to noninvariance (Vandenberg & Lance, 2000). Issues such as differential item functioning (DIF) or varying response styles across cultures might affect the performance of specific items in achieving MI.

The competencies of students' use of ICT are affected by internet accessibility, students' familiarity with ICT resources, and exposure to online information sources (Steffens, 2014). The availability, accessibility, and quality of ICT resources shape teachers' and students' practices with ICT (OECD, 2023a). For instance, Wu et al. (2023) found that Singapore showed the highest factor mean in school ICT readiness, whereas Brazil, Argentina, and Mexico had the lowest factor means. This could be associated with the Singapore government's endorsement of technology integration in schools. Even where schools in these countries (Brazil and others) had opportunities similar to those in Singapore, their factor mean scores still differed because of socio-economic distinctions. The variance of the factor scores indicated that ICT-related scores may be affected by cultural differences. Overall, it is recommended that MI testing be conducted before any cross-group comparisons of ICT-related scores. Pokropek et al. (2019) noted that MI testing is a necessary but not sufficient condition for cross-cultural comparability. Scales might show similar characteristics (equal factor loadings and intercepts) but carry different meanings in different groups, so comparisons between countries or cultures may not always be straightforward. It is recommended that potential causes of noninvariance across indicators be identified so that the PISA scale items can be revised to accommodate cross-cultural differences. These efforts are believed to guide researchers and experts in developing and improving scales to be used in forthcoming PISA cycles.

The substantial proportion of noninvariance found in the MGCFA and alignment analyses indicates that comparisons of students' practices regarding online information across countries should be made carefully. The results imply that a direct comparison of the construct might be debatable. One possible explanation for the noninvariance of the items is that the PISA 2022 cycle was conducted during the COVID-19 pandemic. Educational institutions shifted from face-to-face instruction to remote learning. Remote lessons, digital tools, and educational apps have transformed learning. These changes provided students with a foundation for effective remote and autonomous learning and enriched game-based learning and computer simulations (Schleicher, 2023). Benefiting from these opportunities required access to and use of the Internet. However, not all participating countries have the same level of Internet accessibility, so online education processes might vary between countries. This might lead to differences in students' access to online information and their experiences using it. The scale might therefore not reflect the current global situation properly.

Understanding MI is crucial because it ensures that the constructs being measured (i.e., ICT-related constructs) are interpreted consistently across different educational systems. This consistency has significant implications for educational policies and research. Confirmation of MI provides policymakers with the confidence that data collected from diverse countries can be effectively compared (OECD, 2018). This allows for the formulation and evaluation of educational policies based on reliable international data. For example, effective ICT integration strategies identified in one country could be adapted and implemented in others, knowing that the underlying data are comparable. Furthermore, reliable cross-country comparisons inform resource allocation decisions. They identify countries needing targeted interventions, which is particularly crucial in ICT education, where investments are substantial (Schleicher, 2023). The availability of comparable data also supports the establishment of international benchmarks and standards (UNESCO, 2021). It enables the standardization of educational outcomes and ICT adoption in education, offering countries clear targets and metrics to measure their progress. Finally, the MI of measurements across diverse educational contexts promotes equity and inclusion in educational assessments. It ensures that all students, regardless of their cultural backgrounds, are evaluated fairly and on an equal basis (Gottschalk & Weise, 2023).

5 Practical implications for future studies

The results of the current study indicate that ICTINFO scale scores cannot be directly compared across groups according to the MGCFA and alignment results. Other methods of MI analysis should also be considered. The MGCFA model needs to be specified for each group and produces large modification indices when there are many groups. The alignment method avoids the problems of scalar model misfit and large modification indices; however, it may still reveal violations of invariance when the number of groups and the sample sizes are large. The Bayesian approximate invariance approach can be used to address this limitation, as it allows small differences in parameters (i.e., factor loadings and intercepts), assuming such small differences do not affect the results of subsequent analyses (Kim et al., 2017). In ILSAs, countries can in some cases be nested within regions, religions, and cultural zones, forming multilevel data. Alternatively, countries may be organized into subgroups based on their similar cultural involvement with technology. In such cases, Mixture Multigroup Factor Analysis (MMG-FA), which clusters groups based on their intercepts, can be used to explore MI across many groups (De Roover, 2021). MMG-FA can produce subgroups within which a particular level of invariance holds, facilitating valid comparisons.

This study was limited to measures from the ICTINFO scale and 29 OECD countries. It is recommended that researchers confirm the results of this study across different countries. Noninvariance can arise from the design of rating scales, survey mode, device, and item wording. Another potential source of noninvariance may be language effects (Leitgöb et al., 2023).

Meitinger (2017) demonstrated that the absence of scalar MI was linked to significant variations in the comprehension of concepts across countries. Therefore, further analysis should address the sources of noninvariance and investigate why it occurs. Moreover, answering a scale item involves comprehending the item's meaning, retrieving relevant information, and selecting a response. Scores from the scale should be as equivalent as possible, but comparability is not always guaranteed. It is necessary to take precautions during scale development to minimize potential sources of bias (Meitinger et al., 2020).

The findings from this study stress the necessity of testing the MI of a construct in order to make trustworthy comparisons across countries. Identifying noninvariant items holds promise for enhancing the ICTINFO scale in PISA to achieve comparability. On the other hand, Odell et al. (2021) stated that a lack of invariance allows for national adaptations that capture differences particular to different nationalities. Noninvariant items might reveal language or cultural differences at the item level. This information will be useful for developing culturally specific measurement instruments in the adaptation process.

6 Conclusions

The aim of the present study was to explore the measurement invariance of the ICTINFO scale from the PISA 2022 cycle using MGCFA and the alignment method. The results indicate that MI is not supported and that there is considerable variability in the measurement of ICTINFO across OECD countries. This suggests that ICTINFO scores cannot be reliably compared because the measurements appear to be heterogeneous across OECD countries. Given these findings, researchers are encouraged to further investigate the substantive and methodological sources of noninvariant items. The results also contribute methodologically by demonstrating the effectiveness of the alignment method, which provided valuable results in detecting noninvariant items, particularly at the country level. This approach provides a nuanced understanding of comparability challenges in cross-national studies. The study's contribution extends beyond methodological advancements, as it offers practical insights for policymakers aiming to improve the fairness and comparability of educational assessments.