5.1 Introduction to the IDE

5.1.1 What Is the PIAAC IDE?

The PIAAC International Data Explorer (IDE) is a web-based tool for conducting analyses through a simple point-and-click interface, without requiring special software on the user’s desktop or specialised statistical knowledge. It was commissioned by the OECD but developed and licensed to the OECD by Educational Testing Service (ETS). The IDE can produce design-unbiased estimates and standard errors that reflect PIAAC’s complex sampling and assessment design, accounting for its multi-matrix sampling of items and people, weights, and plausible values, to answer a variety of research questions. These questions range in complexity from simple descriptive results using one or more variables, such as average score by gender, to more complex ones requiring a combination of variables, such as a linear regression of literacy scores on age, gender, and education.

5.1.2 Differences Between the OECD IDE and the US IDE

There are two versions of the PIAAC IDE based on the same technology but containing somewhat different data and hosted by different organizations. The first, which we will refer to as the ‘US IDE’ (accessed at https://nces.ed.gov/surveys/piaac/ideuspiaac/), is supported by the National Center for Education Statistics (NCES) in the United States; the second, the ‘OECD IDE’ (accessed at https://piaacdataexplorer.oecd.org/ide/idepiaac/), is supported by the OECD.

While both IDEs are similar, there are some differences between the two, primarily in terms of data availability and analytical functions (see Table 5.1), as a result of respective sharing agreements, quality considerations, and aspects related to organisational policies. Both IDEs include the countries and economies that participated in PIAAC Cycle 1, more specifically Rounds 1 and 2. The US IDE includes data for Cyprus (Michaelidou-Evripidou et al. 2016), which is not in the OECD version. The OECD IDE includes data for Australia and the Russian Federation (OECD 2016a) that are not available in the US IDE. As a unique source, the US IDE contains the US combined 2012/2014 household data (Holtzman et al. 2016) as well as US prison data (Hogan et al. 2016a), while the OECD IDE contains only the US 2012 data (OECD 2016b). Additionally, the US version contains variables specific to the US and the prison study (e.g. Hogan et al. 2016b) that are not available in the OECD version. There are also some differences in the structure and organisation of variables between the two IDEs.

Table 5.1 Differences between US IDE and OECD IDE

The US IDE has some additional analytical functions, such as gap analysis and regression analysis. It also groups subjects together so they can be displayed simultaneously, and it treats proficiency levels/benchmarks as ‘variables’ rather than ‘statistics options’, which provides additional flexibility in analysis.

Most of the details in this chapter will apply to both versions of the IDE, and differences will be noted or discussed as relevant throughout the chapter.

5.1.3 What Can and Cannot Be Done in the IDE?

The IDE can be used to compute various types of statistical estimates, including averages, percentages (including proficiency level distributions), standard deviations, and percentiles, along with their respective standard errors, while accounting for design (i.e. the use of estimation weights for correct population representation, replicate weights to account for sampling variance, and plausible values to account for measurement variance).
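The estimation machinery summarised above can be illustrated with a short sketch combining plausible values (via Rubin’s rules) with replicate weights. This is not the IDE’s code: the data, the number of plausible values and replicate weights, and the variance factor are invented for illustration, and PIAAC’s actual replication method and factors vary by country (PIAAC itself uses 10 plausible values and 80 replicate weights).

```python
# Illustrative sketch (not the IDE's implementation) of design-based
# estimation: final weights for the point estimate, replicate weights for
# sampling variance, plausible values (PVs) for measurement variance.

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def pv_estimate(pv_sets, final_w, rep_ws, rep_factor=1.0):
    """Combine M plausible values with R replicate weights (Rubin's rules).

    pv_sets    : list of M lists, one set of scores per plausible value
    final_w    : final estimation weights
    rep_ws     : list of R replicate-weight vectors
    rep_factor : method-dependent variance factor (1.0 here for simplicity;
                 PIAAC's factor depends on the country's replication method)
    """
    M = len(pv_sets)
    means = [weighted_mean(pv, final_w) for pv in pv_sets]
    theta = sum(means) / M                      # point estimate

    # Sampling variance: replicate-weight variance, averaged over the PVs
    samp_vars = [
        rep_factor * sum((weighted_mean(pv, rw) - m) ** 2 for rw in rep_ws)
        for pv, m in zip(pv_sets, means)
    ]
    samp_var = sum(samp_vars) / M

    # Measurement (imputation) variance across the plausible values
    meas_var = sum((m - theta) ** 2 for m in means) / (M - 1)

    total_var = samp_var + (1 + 1 / M) * meas_var
    return theta, total_var ** 0.5              # estimate and standard error

# Invented toy data: 2 PVs, 3 respondents, 2 replicate weights
pv_sets = [[270, 300, 250], [275, 295, 255]]
final_w = [1.0, 2.0, 1.0]
rep_ws = [[0.0, 2.5, 1.5], [1.5, 2.5, 0.0]]
theta, se = pv_estimate(pv_sets, final_w, rep_ws)
```

Ignoring either the replicate weights or the plausible values would understate the standard error, which is why the IDE applies both automatically.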

The IDE can also be used to apply basic recoding or collapsing of categories for a variable. Results can be displayed in a variety of formats, including tables, maps, and charts, such as bar charts, column charts, line charts, and percentiles charts. The IDE can also be used to run statistical significance testing, and, as mentioned previously, the US PIAAC IDE can be used to run regression and gap analysis.

Some advanced types of analyses cannot be done in the IDE, including more complex linear regressions and logistic regressions; correlation between variables or scale scores on multiple domains; or analyses that involve more complex recoding of variables, such as creating new variables from multiple existing ones. To conduct more advanced analyses, or those using variables available only on the US Restricted Use File, use of the microdata files and other analytical tools, such as the International Association for the Evaluation of Educational Achievement (IEA) International Database Analyzer (IDB Analyzer; see Chap. 6 in this volume) or the REPEST module for Stata developed by the OECD (see Chap. 7 in this volume), is required.

5.2 Content of the IDE

Four categories of data are available in the PIAAC IDE. They include:

  • Direct assessment data—that is, data on the three cognitive domains of literacy, numeracy, and digital problem solving.

  • Background questionnaire (BQ) data. Both the OECD and US IDE contain international variables that are common across all countries (OECD 2014), including some variables derived from original responses through recoding or categorisation of direct responses to the BQ. The US IDE contains specific variables administered only to the household and/or prison population in the United States.

  • Trend data from the two prior international adult literacy assessments, including literacy data from the International Adult Literacy Survey (IALS; OECD and Statistics Canada 2000), conducted in 1994–1998, and literacy and numeracy data from the Adult Literacy and Life Skills Survey (ALL; Statistics Canada and OECD 2005), conducted in 2003–2008.

  • Jurisdiction information, meaning data organised by OECD Entities (OECD member countries that participated in PIAAC at the national level, such as the United States); OECD Sub-National Entities (OECD members that participated in PIAAC at the sub-national level); and Partners (participating countries that are not OECD members). In addition, the US IDE contains the combined 2012/2014 US data for those aged 16–74 years, as well as data from the incarcerated population.

5.3 Organisation of the IDE

In this section we will describe the IDE’s user interface and how the data are organised and presented. The content and functions of the IDE are organised under four main tabs or pages (as seen in Fig. 5.1):

  • Select Criteria

  • Select Variables

  • Edit Reports

  • Build Reports

Fig. 5.1 Overview of IDE tab organisation

The content and functions of these tabs are outlined in Table 5.2, and we will introduce each in turn.

Table 5.2 Selection options and functions within each tab or page of the IDE

Additional details on the content and organisation of the IDE can be found in the PIAAC International Data Explorer (IDE) Training Video on the PIAAC Gateway website at http://piaacgateway.com/ide-training-video.

5.3.1 Select Criteria Tab

In the OECD IDE, on the Select Criteria page, the user begins by selecting a Subject (literacy, numeracy, or problem solving); this will be the only cognitive domain available for analysis.

In the US IDE, the user begins by selecting a Display, or population of interest. These displays are Adults aged 16–65; Young adults aged 16–34; and US adults residing in households and prisons aged 16–74. The 16–65 display and 16–34 display allow for international comparison, while the last display is focused on US adults only. After this initial selection is made, selection of years, measures, and jurisdictions will become available on the Select Criteria tab.

In both the OECD IDE and the US IDE, for the displays allowing international comparison (Adults 16–65 or Young adults 16–34), there are column headers to select the years/studies for analysis, which can be used to analyse the data from ALL 2003–2008 or IALS 1994–1998.

Following the initial selection, the tab displays variables that can be used as dependent measures for the target analysis. The first category of variables in both IDEs is the scale scores, and the first sub-category is the cognitive skills variables. All variables included on this page are continuous: they contain the specific values of actual responses or derived measures and indices, rather than the range or group into which they fall. So, for example, one would find the specific earnings variable on this tab. One can produce averages, standard deviations, and percentiles of the dependent measures on this page. Therefore, selecting the literacy score measure or a continuous skill use measure here allows one to later conduct analyses such as producing averages or percentiles of these measures.

Note that in the OECD IDE, only the cognitive domain initially selected as the Subject is available for selection, while in the US version, all three cognitive domains are available.

Other variables available on this tab include:

  • The ‘skill use’ indices (continuous measures derived from responses to several questions on frequency of use of specific reading, writing, numeracy, or ICT skills) and reading components variables

  • The population category, which is used to look at percentages across the full sample without looking at any specific continuous measures

  • Categories and variables from the International BQ, with various sub-categories from each section of the International BQ that was common across countries, such as Formal education, Current work, and Background

  • Derived variables, including PIAAC-specific variables as well as comparable trend variables that are available from IALS, ALL, and PIAAC and can be used to do analysis over time

  • Prison-specific variables on topics such as prison jobs, available when the US Adults, 16–74 years old (Household and Prison) display is selected in the US IDE

Data from the jurisdictions (or all participating countries and entities) are included in the lower portion of this tab, allowing for international comparison. In the OECD IDE, the International group includes the OECD Average, which provides the average of OECD national and sub-national entities in the OECD IDE, while the US IDE can provide the Average of All Jurisdictions, which also includes the Partners in the average. These averages always stay the same regardless of the specific jurisdictions selected. The other average listed in this group is the Average of the Selected Jurisdictions, which provides the average of all the specific jurisdictions selected in the analysis; it will vary depending on the selections. For example, if Canada, Japan, and the United States were selected in addition to the Average of the Selected Jurisdictions, this would provide the average of those three selected countries.

Note that when US Adults 16–74 is selected as the target population in the US IDE, then only the US Household (16–74 years old) and US Prison (16–74 years old) will be available in the jurisdiction section.

5.3.2 Select Variables Tab

This tab contains variables organised by category and sub-category, similar to the previous page. The variables here are not at a continuous measurement level. Rather, they are all categorical variables (ordered or nominal). For example, the tab includes variables categorising detailed income into deciles, which is related to but different from the continuous income variables available on the first tab. The variables here can be used differently in analysis than the variables on the first page. They can be used to produce percentage distributions or crosstabs and can also be used to cross or subset the results for the measures selected on the Select Criteria page.

Major reporting groups, the first category on this tab, provides easy access to commonly used variables. This category begins with the All adults option, which allows one to get results for the full population, without breaking it down by additional categories or variables. It also includes common demographic variables, such as gender, age, education level, and employment status.

In the US IDE, proficiency levels is another major subcategory; it allows access to the six proficiency levels for literacy and numeracy and four levels for digital problem solving.

Other variables available on this tab include:

  • The International BQ variables, including current work, education, skill use, and background; see the Background Questionnaire – Conceptual Framework (OECD 2011)

  • Derived variables as well as trend variables

  • US prison variables available within the 16–74 (Household and Prison) display; see examples in the US PIAAC prison report (Rampey et al. 2016).

On this page, multiple variables can be selected, and the reports produced on the next two tabs will use them in separate or combined ways in the analyses as selected.

On both the Select Criteria and Select Variables pages, the search function is available to search for a measure or variable by keyword(s) or variable names, as an alternative to looking through the categories for a measure or variable.

5.4 IDE Functions, Analysis Types, and Statistic Options

In the following sections, most of the IDE’s major functions, types of analysis, and statistic options will be covered.

These options are accessible in the Edit Reports and Build Reports tabs.

5.4.1 Edit Reports Tab: Statistic Options

The following statistics are available through the Statistics Option button on the Edit Reports tab.

Averages

The IDE computes the mean of the selected measure, such as estimating the average scores of the literacy, numeracy, or digital problem-solving domains for a given population or subgroup. Using the average statistic with the cognitive scale scores selected on the first tab can answer questions ranging from ‘How do average numeracy scores compare across countries?’ to ‘What are the average problem solving in technology-rich environments scores of US young adults aged 16–34 by employment status?’

The averages statistic can also be used for other continuous, noncognitive variables, to answer questions like ‘How do the average monthly earnings compare between males and females in the United States, Japan, and Canada?’ or ‘How does the number of hours of participation in non-formal education vary by education level?’

Percentages

The percentages statistic produces the percentage distribution of the column variable within each category of the row variable, meaning that the categories of the column variable will add up to 100% for each category of the row variable. So, for example, if gender is the row variable (i.e. gender categories are listed as the row labels), and analysis is done by level of educational attainment, the results would show the percentage in each educational attainment category for males and for females. This function is useful for answering questions such as ‘What is the percentage distribution of males and females in different areas of study?’ or ‘What percentage of the US adult population is employed?’
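The row-percentage logic just described can be sketched as follows. The data and weights are invented for illustration, and the sketch omits the replicate-weight machinery the IDE uses to attach standard errors to each percentage:

```python
# Weighted row percentages: within each category of the row variable,
# the column-variable categories sum to 100%.
from collections import defaultdict

def row_percentages(rows, cols, weights):
    row_totals = defaultdict(float)   # total weight per row category
    cells = defaultdict(float)        # total weight per (row, column) cell
    for r, c, w in zip(rows, cols, weights):
        row_totals[r] += w
        cells[(r, c)] += w
    return {(r, c): 100.0 * v / row_totals[r] for (r, c), v in cells.items()}

# Invented micro-sample: gender as row variable, education as column variable
gender = ["Female", "Female", "Male", "Male", "Male"]
educ = ["Tertiary", "Secondary", "Tertiary", "Secondary", "Secondary"]
weights = [1.0, 1.0, 1.0, 1.0, 2.0]
pct = row_percentages(gender, educ, weights)
# pct[("Female", "Tertiary")] -> 50.0; Tertiary + Secondary = 100 within each gender
```

Swapping the row and column variables changes which categories sum to 100%, which is exactly the effect of changing the table layout described later under the Edit option.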

The percentage distributions do not include missing data (i.e. data missing due to nonresponse, or valid skips due to survey design) unless additional selections are made in the Format Options section of the Edit Reports page (see below).

Standard Deviations

The standard deviation statistic is a measure of variation or dispersion of the values for a particular variable. This statistic option can be used to answer questions such as ‘What is the standard deviation of the literacy scale across all countries?’ or ‘What is the standard deviation of income in each country?’

Percentiles

The percentiles option shows the threshold, or cut point, at or below which a given percentage of adults score. For example, the 50th percentile literacy score shows the median value, with half of adults performing above the threshold and half below, and the 75th percentile shows the cut point above which the top 25 percent of adults performed. The OECD IDE has selection options for the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles, while the US IDE has the 10th, 25th, 50th, 75th, and 90th percentiles available. The percentiles statistic can be used to answer questions such as ‘What are the percentiles on the literacy scale, including the median, for each age group?’ or ‘How does the monthly income cut point for being in the top 10% of earners vary by education level?’
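The cut-point idea can be sketched with a simple weighted-percentile rule. The rule below (smallest value at which the cumulative weight share reaches the requested percentage) is a simplification; the IDE’s exact interpolation method may differ, and the scores are invented:

```python
def weighted_percentile(values, weights, p):
    """Smallest value at which the cumulative weight share reaches p percent.
    A simplified rule for illustration; the IDE's interpolation may differ."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum / total >= p / 100.0:
            return v
    return pairs[-1][0]

# Invented scores with equal weights
scores = [210, 250, 270, 290, 330]
w = [1, 1, 1, 1, 1]
# weighted_percentile(scores, w, 50) -> 270 (median of five equal-weight scores)
```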

Note: The IDEs do not support user-defined cut points for percentiles.

Achievement Levels

Achievement levels (discrete and combined), commonly referred to as proficiency levels, are available as a statistic option only in the OECD IDE; in the US IDE, this type of analysis is done using proficiency levels as variables, as described later. These proficiency levels are reported as the percentage of adults scoring at performance levels anchored by a specific set of concrete skills.

Discrete Achievement Levels (OECD IDE Only)

The discrete achievement levels option allows users to look at each individual level in the proficiency distribution, so Below Level 1 through Level 5 for literacy and numeracy and Below Level 1 through Level 3 for problem solving. Note that the literacy-related nonresponse category in literacy and the computer-related nonresponse categories in problem solving will display estimates from non-missing values only when Percentage across full sample is selected as the measure. Literacy-related nonresponse includes those adults unable to communicate in the language(s) of the BQ, or those with a learning or mental disability, while the computer-related nonresponse groups include those with no computer experience, those who failed the ICT core, and those who refused the computer-based assessment. The discrete performance levels statistic type can be used to answer questions such as ‘What is the literacy proficiency distribution within each employment status category in Australia?’ or ‘What is the percentage of employed adults performing at the lowest literacy level in the United States and Australia?’

Combined Achievement Levels (OECD IDE Only)

The Combined Achievement Levels option allows users to analyse combined groupings of adjacent proficiency levels, enabling them to focus on those adults performing at higher or lower ranges of levels. For example, if Below Level 3 was selected within the Combined Achievement Levels selection, the percentage of adults at Below Level 1, Level 1, and Level 2 would be reported as a combined grouping. For literacy and numeracy, the grouping options available for the combined achievement levels are Below Level 2, Below Level 3, Low Levels (1 and 2), High Levels (4 and 5), and At or above Level 3. For problem solving, the grouping options are Below Level 2, Below Level 3, Low Levels (1 and 2), and High Levels (2 and 3). Questions that can be answered using combined achievement levels include ‘What is the percentage of young adults performing at the low levels in numeracy within each level of educational attainment?’ or ‘Which age groups have the largest percentage of high performers in digital problem solving?’

Levels as Variables, Profile by Level, and Score by Level (US IDE Only)

Proficiency levels are available as variables in the US IDE, rather than as a statistic type, so selection of proficiency levels for analysis occurs on the Select Variables page rather than the Edit Reports page. This analysis option is not available in the OECD IDE.

If proficiency levels are selected as a variable, then selecting Percentages as the statistic will provide the percentage distribution of proficiency levels. Having proficiency levels as variables allows for some additional flexibility in analysis. For example, the proficiency level categories can be collapsed as desired—for example, collapsing Levels 4 and 5 but leaving the other levels in the proficiency distribution as is—using the Edit action on the Edit Reports page (described in more detail later). Using the proficiency levels in this way can answer similar questions to those in the combined or discrete achievement levels in the OECD IDE.
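The collapsing operation described above amounts to relabelling categories and letting the percentages regroup accordingly. A minimal sketch, with the level labels taken from the chapter and the mapping itself being the user’s choice:

```python
def collapse(levels, mapping):
    """Relabel categories; anything not in the mapping is left as is."""
    return [mapping.get(level, level) for level in levels]

# Hypothetical recode: combine Levels 4 and 5, keep the rest of the
# proficiency distribution unchanged
mapping = {"Level 4": "Level 4/5", "Level 5": "Level 4/5"}
levels = ["Below Level 1", "Level 2", "Level 4", "Level 5", "Level 3"]
collapsed = collapse(levels, mapping)
# -> ["Below Level 1", "Level 2", "Level 4/5", "Level 4/5", "Level 3"]
```

Running a percentage distribution on the collapsed labels then reproduces the effect of a combined-levels analysis.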

Having the proficiency levels as variables also allows one to create profiles of those at different skill levels by looking at the percentage distribution of characteristics within each level. One can create profiles within levels by using the Edit action on the Edit Reports page (see below). This can be used to answer questions such as ‘Among those at the lowest numeracy levels, what percentage have a tertiary education?’ or ‘What is the distribution of health status among US young adults within each numeracy proficiency level?’

One can also look at averages of the continuous measures available on the Select Criteria page, including cognitive scale scores, within each level if proficiency levels are selected as a variable and Averages is selected as the statistic. This can be used to answer questions such as ‘What is the average literacy score at each numeracy proficiency level for older adults?’ or ‘How do average monthly earnings vary by problem-solving proficiency level?’

5.4.2 Build Reports Tab: Additional Statistical Functions

In addition to those statistic types, some other analytical functions are available in the IDE on the next page, Build Reports. The following function is available in both versions of the IDE.

Significance Test

Significance testing can be used to estimate if differences in results reflect a true difference in the population, or are likely to have been observed due to sampling variation or chance. On the Build Reports page, there is a Significance Test selection above the table displaying results for the analysis. In the Significance Test window, users can select to conduct testing either Between Jurisdictions (comparing similar populations across countries, e.g. determining whether females perform higher in numeracy in Germany or Spain), Within Variables (comparing groups or categories of a variable within a jurisdiction, e.g. determining whether females or males perform differently in numeracy in Italy), or Across Years (comparing groups or a full population within a jurisdiction over time, e.g. determining whether the numeracy score of females in Canada was different across IALS, ALL, and PIAAC).

Depending on the variable and jurisdiction selections, some significance testing options that are not applicable will be greyed out and not available for selection. For example, if only PIAAC data were used in analysis, one will not be able to conduct testing across years (i.e. with IALS or ALL). Similarly, if only one jurisdiction was selected, one will not be able to conduct testing between jurisdictions. Other steps in this Significance Testing window allow users to name their significance tests, select whether they want their results displayed in a table format or map format (available only when Between Jurisdictions Testing is selected), or choose to display score details—that is, display the estimates and standard errors on the table. The last section is used to select which jurisdictions, variables, and/or categories, years, and statistics to compare.
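The between-jurisdiction comparison described above rests on a standard rule for independent estimates: divide the difference by the combined standard error and compare it to a critical value. The sketch below uses invented numbers and the conventional 5% critical value of 1.96; the IDE’s exact procedure, including any adjustments for dependent comparisons, is not shown here:

```python
import math

def significance_test(est1, se1, est2, se2, critical=1.96):
    """z statistic for the difference between two independent estimates,
    and whether it exceeds the critical value."""
    z = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return z, abs(z) > critical

# Invented example: mean numeracy 275 (SE 1.2) in one country vs 271 (SE 1.0)
z, significant = significance_test(275, 1.2, 271, 1.0)
# z is about 2.56, so the difference is significant at the 5% level
```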

A few additional analytical functions, gap analysis and regression analysis, are available only on the Build Reports page of the US IDE.

Gap Analysis (US IDE Only)

The Gap Analysis function can compare differences in gaps between countries and/or across different time points. For example, the most basic gap analysis comparing average literacy scores by gender for two countries would compare the male–female gap (i.e. the score difference between males and females) in one country to the male–female gap in another country. The Gap Analysis window is available from the Build Reports page. After selecting the basis for comparison, either Between Jurisdictions or Across Years, users can select the gap, or difference measure, to analyse: Between Groups, Between Years, Between Groups and Years, or Between Percentiles within the selected variable.
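The between-jurisdiction gap comparison can be sketched as a difference of differences, with standard errors combined in quadrature under an independence assumption. All numbers below are invented, and the IDE’s exact variance treatment is not shown:

```python
import math

def gap_difference(a1, se_a1, a2, se_a2, b1, se_b1, b2, se_b2):
    """Difference between the (group1 - group2) gap in jurisdiction A and
    the same gap in jurisdiction B, with its combined standard error
    (independent estimates assumed)."""
    diff = (a1 - a2) - (b1 - b2)
    se = math.sqrt(se_a1 ** 2 + se_a2 ** 2 + se_b1 ** 2 + se_b2 ** 2)
    return diff, se

# Invented male/female means (with SEs) in countries A and B
diff, se = gap_difference(268, 1.1, 262, 1.2, 280, 0.9, 271, 1.0)
# gap in A = 6, gap in B = 9, difference between the gaps = -3
```

Dividing `diff` by `se` then yields the same kind of z statistic used in the significance tests above.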

Similar to significance testing, other steps in this window allow users to name their analyses, select whether they want their results displayed in a table format or map format, or choose to display score details. The last section is used to select which jurisdictions, variables and/or categories, years, and statistics to compare. This function can be used to answer questions such as ‘Is the gap in numeracy skills between young adults (16–24) and older adults (55–65) different in the United States than in Canada?’ or ‘Did the gender gap in numeracy skills change over time between ALL and PIAAC?’

Regression Analysis (US IDE Only)

Regression analysis functionality is available only in the US IDE and uses a linear regression approach. Although this function is more restrictive and offers fewer options than standard statistical packages, it allows users to examine and test the level of association between one continuous dependent variable (predicted) and up to three independent variables (predictors). Dummy coding (i.e. a 0/1 flag) is used to code the independent variables, with the first subgroup of each independent variable serving as the reference group; this cannot be changed. This is useful for comparing each subgroup against a reference group. For example, if the subgroup Excellent is the reference group for the independent variable Health Status, the IDE creates a Very Good dummy variable (1 for respondents who answered Very Good, 0 otherwise), and likewise a Good, a Fair, and a Poor dummy variable, each coded in the same way. The reference group Excellent is excluded from the regression analysis. This way, each of the other health groups is compared to the Excellent group using a total of four dummy variables.
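The dummy coding described above, using the chapter’s hypothetical Health Status example, can be sketched as follows (the response data are invented):

```python
def dummy_code(responses, categories):
    """One 0/1 dummy per non-reference category; categories[0] is the
    reference group and gets no dummy of its own (a respondent who is zero
    in every dummy belongs to the reference group)."""
    return {c: [1 if r == c else 0 for r in responses]
            for c in categories[1:]}

# Hypothetical Health Status variable, with Excellent as the reference group
categories = ["Excellent", "Very Good", "Good", "Fair", "Poor"]
responses = ["Good", "Excellent", "Poor", "Very Good"]
dummies = dummy_code(responses, categories)
# dummies["Good"] -> [1, 0, 0, 0]; the respondent who answered Excellent
# is zero in all four dummies
```

Each dummy’s regression coefficient then estimates that group’s difference from the reference group.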

Regression analysis is accessible on the Build Reports page. In the Regression Analysis window, one can create a name for the regression analysis and select jurisdictions, year, and up to three independent variables for analysis. Regressions can be used to answer questions such as ‘Do US males have higher monthly earnings than females, even when controlling for occupation and industry?’ or ‘Do employed US adults have higher numeracy skills than unemployed adults, even when controlling for educational attainment and age?’

Note that regression analysis can only be performed for 1 year and one jurisdiction at a time, unlike most other analysis types. The continuous measure selected on the Select Criteria page will become the dependent variable for regression, and the independent variables are from the categorical variables selected on the Select Variables page and cannot include continuous measures.

5.4.3 Display and Reporting Results Options

The following are different options to display and format results.

Format Options

The Format Options selection on the Edit Reports page allows users to select how variable labels are displayed, whether missing values are displayed in percentage results, and the year order. It also includes options for the number of decimal places displayed in the results, whether standard errors are included, and whether parentheses/brackets are used. The selections apply to all reports.

Edit Option

In addition, the Edit option, available for each individual report in the Action column on the Edit Reports page, allows users to change or select the measure, jurisdiction, year, and statistic or create new variables by collapsing or combining categories of a selected variable. It also allows users to edit the table layout by changing which variables are located in the rows and columns. Particularly for percentages analysis, changing the rows and columns of the table will impact the results and change how the distributions are analysed and reported (i.e. the categories of which variable(s) add up to 100%). For example, within proficiency levels analysis in the US IDE, moving the proficiency levels variable to the row section and one (or both) of the other selected variables to the column section produces results showing a profile by levels.

Other display formats and options available on the Build Reports page include:

Data Tables

By default, the IDE will present statistical results in a data table format that is displayed on the Build Reports page. Note that up to two statistics and up to three variables can be selected for inclusion in each report table.


Charts

On the Build Reports page, the Chart selection provides another option for displaying results. Multiple years/studies or jurisdictions may be selected, but only a single statistic type can be selected for inclusion in the chart. In the Chart Options window, one can choose the chart type and how the data are displayed. Chart types available in both the OECD and US versions of the IDE include Bar Chart, Column Chart, Line Chart, and Percentiles Chart. Note that the Percentiles Chart is available only when the percentiles statistic is selected.

The OECD IDE also includes Discrete Chart and Cumulative Chart types, which are available when the Discrete Achievement Levels or Combined Achievement Levels statistic is selected, respectively. Below the chart type selection, one can choose what the bar/column/line values will display and what these values will be grouped by. The selections here will determine how the data are displayed and organised. After previewing the chart, use the drop-down menus above the chart to view other sub-sets of the data, depending on previous selections. Once the chart selections are complete, the Done button finalises the chart and allows it to later be saved or exported.

Export Reports

On the Build Reports page, the Export Reports button allows tables, charts, and significance tests produced in the IDE to be saved or printed. The reports checked off in the Edit Reports page, and any associated charts or significance tests, will be available for selection to export. The user also must select the format for export here. In both the OECD and US versions of the IDE, HTML, Excel, or Word formats are available. In the US IDE, a PDF format is also available for export.

The US IDE has one other option to save results on the Build Reports page. The Link to this Page button can be used to produce a link that can be copied and pasted into an email or browser. Note that only the main results table is produced through this link, and if the user had collapsed variables, conducted significance testing, or created charts for the analysis, they would not be directly available.

5.5 Example Research Scenarios

The following research scenarios will provide a basic idea of the kinds of questions that can be answered using the IDE and how to interpret the IDE results. A more detailed step-by-step walk-through of additional research questions can be found in the PIAAC International Data Explorer (IDE) Training Video on the PIAAC Gateway website at http://piaacgateway.com/ide-training-video.

5.5.1 Scenario 1: Averages Analysis

For the first scenario, the IDE will be used to answer the following question: What are the average problem solving in technology-rich environments scores in Australia, Canada, and England/Northern Ireland by employment status?

This scenario covers a basic research question and the simplest way of reporting the results. The OECD IDE will be used here, but the process for answering this question in the US IDE would be similar. To answer this question, the Subject used is Problem Solving, and the continuous measure selected on the Select Criteria page is the skills scale score for problem solving (PIAAC Problem Solving: Solving located in the Scale Scores Category and Skills Sub-Category). The jurisdictions or countries of interest in the scenario are Australia and Canada from the OECD National Entities Group and England/Northern Ireland from the OECD Sub-National Entities Group on the Select Criteria page. The categorical variable for analysis selected on the Select Variables page is the derived employment status variable, the variable Current status/work history – Employment status (DERIVED BY CAPI) found in the International Background Questionnaire Category and Current Sub-Category. Finally, on the Edit Reports page, Averages is the Statistic to answer this type of question.

On the Build Reports page, a table like that in Fig. 5.2 is produced. In this output, read each row across to find the results (averages with their standard errors) for the relevant jurisdiction. For example, the average problem-solving score in Australia is 291 for employed adults, 282 for the unemployed, and 282 for those out of the labour force. The standard errors appear in parentheses next to the main estimates and convey the uncertainty of the estimates. It appears that employed adults perform better in problem solving than those who are unemployed or out of the labour force, but a significance test would need to be conducted to determine whether this apparent difference is statistically significant.
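The averages and standard errors in this table already account for PIAAC's ten plausible values and replicate weights; the IDE performs this combination internally. Purely for intuition, a minimal Python sketch of the final combining step (Rubin's rules) over hypothetical per-plausible-value results might look like this; all numbers below are made up for illustration:

```python
import math

# Hypothetical estimates: the mean problem-solving score computed
# separately from each of the 10 plausible values, plus the sampling
# variance of each mean (in PIAAC the latter comes from replicate
# weights; these numbers are illustrative only).
pv_means = [290.6, 291.2, 290.9, 291.4, 290.8, 291.1, 290.7, 291.3, 291.0, 290.5]
pv_sampling_vars = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.79]

M = len(pv_means)
estimate = sum(pv_means) / M                     # final point estimate
within = sum(pv_sampling_vars) / M               # average sampling variance
between = sum((m - estimate) ** 2 for m in pv_means) / (M - 1)  # imputation variance
total_var = within + (1 + 1 / M) * between       # Rubin's combining rule
se = math.sqrt(total_var)

print(round(estimate, 1), round(se, 2))
```

The between-imputation term widens the standard error to reflect the uncertainty of the proficiency measurement itself, which is why PIAAC standard errors cannot be computed from a single plausible value.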

Fig. 5.2
figure 2

Table output from Scenario 1 (Source: Organisation for Economic Co-operation and Development (OECD) PIAAC International Data Explorer)

5.5.2 Scenario 2: Proficiency Levels Analysis, Significance Testing, and Charts

This scenario uses the OECD IDE to introduce proficiency levels analysis, conduct significance testing, and create charts to answer the following question: How does the percentage of non-native-born US adults performing at low levels on the numeracy proficiency scale compare to the percentage among their peers internationally?

As described in the overview of the analysis types, the process for conducting analysis with proficiency levels is different in the US IDE (not illustrated here). For this scenario, the Subject is Numeracy, the measure chosen on the Select Criteria page can be the numeracy score (PIAAC Numeric: Numeracy located in the Scale Scores Category and Skills Sub-Category), and the jurisdictions are the OECD Average (which is the average of the OECD National and Sub-National Entities and does not include Partners) in the International Group and the United States in the OECD National Entities Group. On the Select Variables page, the variable of interest is Background – Born in country in the Major reporting groups Category and Sub-Category. On the Edit Reports page, within the Combined Achievement Levels Statistic Option, the Below Level 2 option can be used to focus on the group of low-skilled adults.

The results table displayed on the Build Reports page shows that, internationally, 36% of non-native-born adults perform Below Level 2 in numeracy, while 49% in the United States do. The Significance Test function on this page can be used to see whether the difference between these two numbers is statistically significant.

The Between Jurisdictions significance testing type is used to compare proficiency levels between the US and the OECD average, rather than Within Variables, which compares proficiency levels within each jurisdiction for the native born and non-native born. The testing of interest uses All Jurisdictions in the Jurisdiction section and the No (not born in the country) category in the Variable section. The significance results shown in Fig. 5.3 use the Table output type. The table title indicates that this significance test applies to the statistic and group of interest (Below Level 2 and not born in the country). In the legend below the table, the less-than symbol (<) with lighter blue shading indicates ‘significantly lower’, the greater-than symbol (>) with darker blue shading indicates ‘significantly higher’, and the x with white shading indicates ‘no significant difference’.

Fig. 5.3
figure 3

Significance test output from Scenario 2 (Source: Organisation for Economic Co-operation and Development (OECD) PIAAC International Data Explorer)

To interpret the table, read across the row, which shows, for example, that the OECD Average percentage at Below Level 2 is significantly lower than the percentage for the United States. The proficiency level percentage values being compared are in parentheses after each jurisdiction label (e.g. for the United States it was 49%). Within the table, the difference in percentage points between the two jurisdictions or groups being compared is found under the symbol indicating the result of the testing. In this example, the difference between the OECD Average and the United States is 12 percentage points. The difference is estimated from the values at full precision (i.e. unrounded values), so even though it may seem that the difference should be 13, the rounded difference based on the full-precision estimates is 12. The value in parentheses is the standard error of this difference. The p-value for the testing is indicated under the difference. As indicated in the note, an alpha level of 0.05 is used for these comparisons, so a test with a p-value lower than this indicates a significant difference.
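For intuition, the comparison underlying this table can be sketched in Python. The unrounded percentages below are hypothetical values consistent with the scenario, and the standard errors are entirely made up (the IDE reports the real ones); treating the OECD Average and the United States as independent is also a simplification:

```python
import math

# Hypothetical unrounded estimates and made-up standard errors for the
# percentage Below Level 2 among non-native-born adults.
pct_us, se_us = 48.6, 1.8
pct_oecd, se_oecd = 36.4, 0.5

diff = pct_us - pct_oecd                     # unrounded difference
se_diff = math.sqrt(se_us**2 + se_oecd**2)   # SE of the difference (independence assumed)
z = diff / se_diff
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value

significant = p_value < 0.05                 # alpha = 0.05, as in the IDE
print(round(diff, 1), round(se_diff, 2), significant)
```

The unrounded-difference behaviour discussed above appears here as well: the two percentages round to 49 and 36, yet the difference itself rounds to 12, not 13.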

The IDE also provides a Chart option on the Build Reports page as another way to display results. To compare the OECD Average and the United States visually, select both in the Jurisdiction section within the Data Options selections. Then, within the Chart Options selections, use the Bar Chart type, set Jurisdiction as the Bar Values, and group the values by Combined Achievement Levels to create a chart comparing the share of low-skilled adults across jurisdictions. The figure produced displays the results for those who were not born in the country, as shown in Fig. 5.4.

Fig. 5.4
figure 4

Chart output from Scenario 2 (Source: Organisation for Economic Co-operation and Development (OECD) PIAAC International Data Explorer)

5.5.3 Scenario 3: Gap Analysis

The next scenario uses the US IDE to demonstrate the Gap Analysis function and address the question: Is the gap in literacy skills between younger adults (16–24) and older adults (55–65) different in the United States than internationally?

As mentioned previously, the gap analysis function is available only in the US IDE. The Adults, 16–65 Display is used to conduct this international comparison. On the Select Criteria page, the PIAAC Literacy: Overall scale is the Measure, and the jurisdictions include the Average of All Jurisdictions (which includes OECD National and Sub-National Entities as well as Partners) and the United States, found in the International and OECD National Entities Groups, respectively. Age groups in 10-year intervals (derived), within the Major reporting groups Category and Sub-Category, is selected on the Select Variables page, as it provides the relevant categories of younger and older adults. The gap analysis will compare the differences between the averages of the age groups, so the Averages statistic type is used on the Edit Reports page. After an output table displaying average literacy scores by age group in the United States and internationally is produced on the Build Reports page, the Gap Analysis function on this page is used to see whether the score-point difference between younger and older adults differs significantly between the United States and the international average. In the Gap Analysis window, Between Jurisdictions is used as the basis for comparison, and the Between Groups gap is analysed. All Jurisdictions is selected in the Jurisdiction section to include both the Average of All Jurisdictions and the United States, and in the Variables section, only the 24 or less and 55 plus variable categories need to be included.

In the Gap Test tab, a table similar to the tables for significance testing is produced (see Fig. 5.5). The title of the table indicates that this testing is focused on finding differences between jurisdictions for gaps in averages between age groups. Reading across the table shows that the gap for the Average of All Jurisdictions has a significant positive difference compared to the United States, meaning that the gap in literacy skills between younger and older adults is larger internationally than in the United States. The size of the gap, or score-point difference, is shown in parentheses next to each jurisdiction; thus, the 11 next to the United States is the difference between the literacy score of 273 for younger adults and the score of 262 for older adults listed in the table under the testing results. These gaps are what is being tested or compared here. Also within the table, under the symbol indicating the direction of the difference, are the difference in the size of the gaps and its standard error; in this case, the gap internationally is 15 points larger than the US gap. Under that, the p-value is listed.
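The arithmetic behind this comparison can be sketched briefly. The US means (273 and 262) come from the scenario; the international means and all standard errors below are made-up values, chosen only so that the gaps reproduce the 11- and 15-point figures discussed above:

```python
import math

# US means from the scenario; international means are illustrative only.
us_young, us_old = 273, 262
intl_young, intl_old = 281, 255
# Made-up standard errors of each jurisdiction's gap.
se_gap_us, se_gap_intl = 1.5, 0.6

gap_us = us_young - us_old          # US gap in score points
gap_intl = intl_young - intl_old    # international gap in score points
gap_diff = gap_intl - gap_us        # how much larger the international gap is

se_diff = math.sqrt(se_gap_us**2 + se_gap_intl**2)
z = gap_diff / se_diff
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
print(gap_us, gap_intl, gap_diff, p_value < 0.05)
```

In other words, the gap analysis tests a difference of differences: each jurisdiction's age-group gap is computed first, and the gaps themselves are then compared.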

Fig. 5.5
figure 5

Gap analysis output from Scenario 3 (Source: National Center for Education Statistics (NCES) PIAAC International Data Explorer)

5.5.4 Scenario 4: Regression Analysis

This last scenario demonstrates the regression analysis function in the IDE and focuses on the following question: Do US adults (16–74) who are employed have higher numeracy skills than those who are unemployed, even when controlling for age and education level?

This scenario will also use the US IDE, as the regression analysis is available only in this version. This question focuses on the full US 16–74 population, so US Adults 16–74 (Household and Prison) is the Display of interest. PIAAC Numeracy: Overall scale is the Measure, and US Household (16–74 years old) is the Jurisdiction selected on the Select Criteria page. Here, numeracy scores will be the measure, or dependent variable, for the regression. In the Select Variables tab, the independent or control variables that will go into the regression are Age in 10 year bands extended to include ages over 65 (derived), Education – Highest qualification – Level (collapsed, 3 categories), and Current status/work history – Employment status (derived). These variables are all located in the Major reporting groups Category and Sub-Category. When conducting regression analysis, the statistic should be set as Averages on the Edit Reports page. From the Cross-Tabulated Report that contains all the independent or control variables, one can use the Regression Analysis function on the Build Reports page. On the Regression Analysis selection page, all three variables—age, education level, and employment status—are included in the Variable section.

Regression analysis output, as seen in Fig. 5.6, is produced. The title of the regression results table identifies the predicted (dependent) variable, here numeracy; the predictor or explanatory (independent) variables, here age, education level, and employment status; and the contrast coding reference groups for the explanatory variables, that is, the categories to which all other categories of each variable are compared, here the 24 or less age group, those with education ISCED 2 and below, and the Employed.

Fig. 5.6
figure 6

Regression analysis output from Scenario 4 (Source: National Center for Education Statistics (NCES) PIAAC International Data Explorer)

To review how much explanatory power the variables have for numeracy scores, our outcome variable, one should look under R Squared in the top portion of the results. The R-squared value here is 0.27, which means that 27 percent of the variation in numeracy scores is accounted for by the independent variables in the model.

In the lower portion of the table, one can find the regression coefficients for the variables, including the standardised and unstandardised regression coefficients along with their standard errors. The standardised regression coefficients are standardised against the variables’ means and standard deviations, which allows comparison across variables measured in different units. Using the standardised coefficients, one can ask which categories have a stronger or weaker relationship with the outcome variable (numeracy). For example, comparing Unemployed, which has a standardised regression coefficient of −0.07, to the standardised coefficients for the age groups (ranging from −0.08 to −0.16) indicates that age has a stronger relationship with the dependent variable than being unemployed. To interpret the results within each variable, look at the unstandardised regression coefficients, labelled here simply as regression coefficients. For example, the unstandardised regression coefficient for Unemployed is −17, meaning that those who are unemployed scored 17 points lower in numeracy than those who were employed, holding the other explanatory variables constant. Moving to the right of the table, the t-statistic is −7, and the probability is 0, which is below the significance threshold of 0.05. This means that being unemployed is significantly associated with differences in the dependent variable, numeracy score. This statistical significance is also marked in the significance column with a less-than (<) symbol. So, controlling for the two other explanatory variables (age and education level) in the regression, numeracy scores for those who are unemployed are lower than for those who are employed.
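The figures in this table are linked by simple arithmetic: the t-statistic is the coefficient divided by its standard error, and a standardised coefficient rescales the unstandardised one by the variables' standard deviations. In the sketch below, the standard error (2.4) and both standard deviations are made-up values chosen only to be consistent with the numbers reported in the scenario:

```python
import math

# Unstandardised coefficient for Unemployed from the scenario; the SE is
# a hypothetical value consistent with the reported t-statistic of -7.
coef, se = -17.0, 2.4

t = coef / se                                # t-statistic = coefficient / SE
p_value = math.erfc(abs(t) / math.sqrt(2))   # two-sided p, normal approximation
significant = p_value < 0.05

# Hypothetical standard deviations of the Unemployed dummy and of numeracy.
sd_x, sd_y = 0.23, 55.0
beta_std = coef * sd_x / sd_y                # standardised coefficient, about -0.07

print(round(t, 1), significant, round(beta_std, 2))
```

This also shows why standardised coefficients are the ones to compare across variables: the unstandardised −17 is in score points per unit of the dummy variable, while the standardised −0.07 is unit-free.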

5.6 Summary

PIAAC is a complex large-scale study whose major components are a direct assessment of three domains and an extensive background questionnaire. The IDE is a user-friendly online tool that allows users to conduct different types of analyses with PIAAC data, from basic statistical analyses to more advanced ones, such as regression and gap analysis; it is available in two versions, through the OECD and through NCES in the United States. The IDE contains data from the PIAAC direct assessment and background questionnaire for the jurisdictions that participated in PIAAC, as well as trend data from previous large-scale assessments that were rescaled to PIAAC. It can be used to conduct analyses of averages, percentages (including proficiency level distributions), standard deviations, and percentiles, and to perform significance testing, gap analysis, and regression analysis (depending on the version). Analysis in the IDE follows the basic steps organised under the four main tabs: 1. Select Criteria, 2. Select Variables, 3. Edit Reports, and 4. Build Reports. The example research scenarios provide an overview of these basic steps and of how to answer research questions using the IDE.