1 The harmonization of education in survey research

Surveys are an attractive data source because of their standardization, which results from applying a highly structured, uniform questionnaire in a standardized interview. Some processes and data, however, cannot be standardized through the questionnaire, survey operations or standardized interview administration alone, and instead require further standardization during data processing. Applying statistical definitions and classifications to derive nationally or even internationally comparable standard variables is one such area. In such an endeavor, there is a delicate balance between standardization and construct validity (Hauser 2016).

“[...] All efforts that standardize inputs and outputs in comparative surveys” are referred to as ‘harmonization’ (Granda and Blasczyk 2010, p. 1). The measurement of educational attainment [Footnote 1] in cross-national surveys requires harmonization procedures in data processing: educational systems and qualifications differ strongly across countries, and the names of educational qualifications usually cannot be reliably translated, because similar terms are used across languages to denote different levels of education. There is no international terminology that respondents in all countries would understand equally well [Footnote 2]. When measuring levels of education, it is thus not advisable to design one ‘source’ measurement instrument and translate it into all languages needed for the survey (so-called ‘input harmonization’). Instead, comparative research usually relies on country-specific measures of individuals’ educational attainment. The resulting country-specific variables are harmonized by recoding them into a common standard after data collection. This process is called ‘output harmonization’ (for further details, see e.g. Schneider et al. 2016; Schneider 2016).

For truly cross-national surveys, i.e. surveys that are “deliberately designed for comparative research” (Harkness et al. 2010, p. 3), education harmonization is already part of the survey design phase and not limited to data processing, which is why it is called ex-ante output harmonization (Ehling 2003; Granda et al. 2010; Granda and Blasczyk 2010). These surveys define the harmonized target variable(s) before developing the national questionnaires, and develop both the country-specific measurement instruments and the coding rules for harmonization before data collection. In this way, they ensure that every kind of education to be distinguished in the international coding is also identified by the country-specific questionnaire items. Ex-ante output harmonization requires considerable organizational capacity, but given such capacity, it can be implemented successfully.

The situation is rather different when surveys are not designed to be comparable from the outset, but researchers wish to combine data from different surveys (sometimes much) later, for example to examine a research question that requires more variation at the country level or over time (Dubrow and Tomescu-Dubrow 2016; Slomczynski and Tomescu-Dubrow 2018), or a larger sample size to study specific groups (Doiron et al. 2012). Such an undertaking requires ex-post data harmonization (Ehling 2003; Granda et al. 2010; Granda and Blasczyk 2010), i.e. the adjustment of the data to be pooled, resulting in a single integrated dataset with coherent target variables (Dubrow and Tomescu-Dubrow 2016). This applies both to the combination of surveys conducted in different countries and to surveys conducted in just one country but designed independently of each other (i.e. not as part of a time series or panel design). In both cases, the data need to be made comparable after data collection by recoding variables that relate to the same underlying concept, but result from different measurement instruments, into a common standard. Here, flexibility is severely limited by the information collected in the different surveys (Wolf et al. 2016), which makes harmonization difficult (Dubrow and Tomescu-Dubrow 2016). With respect to education, for example, education categories are more detailed or better documented in some datasets than in others. This makes ex-post harmonization more challenging, and its results often more limited, than ex-ante harmonization.

Education is a core social background variable covered by all surveys and used in many statistical analyses, but it is very difficult to harmonize. To facilitate this task and to support ex-post harmonization projects that want to retain as much information as possible from the original data, this paper proposes a new framework for harmonized educational attainment variables to be used as target variables in harmonization projects. It builds on the International Standard Classification of Education (ISCED) 2011 (UNESCO 2012, see Sect. 3), but extends it for use in ex-post harmonization of survey data. First, the framework includes a new coding scheme called the ‘Generalized International Standard Classification of Education’, or GISCED. The framework also builds on experience from ex-ante output harmonization of education in comparative surveys, especially the European Social Survey (ESS), complementing the concepts underlying ISCED (and thus GISCED) to better represent strongly stratified educational systems such as those in many European countries. Second, the framework therefore proposes a set of ‘extension variables’ that allow researchers to operationalize concepts beyond those ISCED traditionally covers, such as the stratification of secondary education or the distinction between different types of vocational education and training.

The paper proceeds in Sect. 2 with a brief overview of existing harmonization schemes for education, and a discussion of the advantages and disadvantages of detailed vs. aggregated measurement and coding. Section 3 then presents ISCED 2011 as the foundation for the proposed coding framework. Section 4 introduces common obstacles in ex-post harmonization of education data and proposes specific solutions, which lead to the GISCED framework, consisting of the GISCED coding scheme and a number of extension variables. In Sect. 5, I then explain how to apply the new framework to existing national education variables, and how to derive its codes from some other international education coding schemes. Section 6 provides an empirical illustration of intergenerational educational inequalities, using different education schemes as independent variables. The final section summarizes the paper, discusses the results, and gives a brief outlook.

2 Existing schemes for ex-post harmonization of education

There are three basic approaches for output-harmonizing information on educational attainment: years of education, levels of education, and scaled levels of education. They are briefly presented below.

2.1 Years of education

The first approach is to convert national education variables into the corresponding years of education (following Duncan and Hodge 1963). The ISCED mappings provided by the UNESCO Institute for Statistics (UIS) (UNESCO 2021) include the required information on the cumulative duration of educational programs across countries. Sometimes years of education are also derived from harmonized categorical education variables. The ex-post harmonized International Social Mobility File (ISMF, Ganzeboom and Treiman 2019), which is only accessible on request, for example provides codes for derived ‘virtual’ years of education (Ganzeboom 2019). In this derivation, vocational education is penalized by not fully counting the related number of years, so that the resulting variable better reflects how much general education a respondent has obtained. The resulting information is easy to include in linear statistical models and is particularly popular among economists analyzing returns to education (Mincer 1974; Flabbi et al. 2008) [Footnote 3].
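The idea of penalizing vocational years can be illustrated with a minimal Python sketch. It assumes that each completed stage of a pathway is recorded with its nominal duration and a vocational flag, and that vocational years are counted with a single discount weight. The `Stage` structure, the `virtual_years` function and the default weight of 0.5 are all hypothetical illustrations, not the ISMF derivation rules.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Stage:
    """One completed stage of an educational pathway (hypothetical structure)."""
    years: float      # nominal full-time duration of the stage
    vocational: bool  # True if the stage is vocationally oriented


def virtual_years(pathway: Iterable[Stage], vocational_weight: float = 0.5) -> float:
    """Sum stage durations, counting vocational years only partially.

    vocational_weight is an illustrative assumption: 1.0 counts vocational
    years fully (plain years of education), 0.0 ignores them entirely.
    """
    if not 0.0 <= vocational_weight <= 1.0:
        raise ValueError("vocational_weight must lie between 0 and 1")
    return sum(
        stage.years * (vocational_weight if stage.vocational else 1.0)
        for stage in pathway
    )


# Hypothetical pathway: 6 years primary and 3 years lower secondary (general),
# followed by 3 years of vocational upper secondary education.
pathway = [Stage(6, False), Stage(3, False), Stage(3, True)]
print(virtual_years(pathway, vocational_weight=1.0))  # 12.0
print(virtual_years(pathway))                         # 10.5
```

Setting the weight to 1.0 reproduces plain years of education, which makes visible how much of the difference between two respondents is driven by the penalty alone.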

2.2 Levels of education

Regarding the second approach, levels of education, various coding schemes are used. For national data from different sources, harmonization can be supported by national education classifications (which not all countries have), while for international data, an international classification is needed.

At the simplest level, data harmonized using levels of education distinguish just a few broad, ordinal categories such as ‘less than primary’, ‘primary’, ‘secondary’ and ‘higher’ (or ‘tertiary’) education, which are often not explicitly related to an international standard. For example, the IPUMS Demographic and Health Surveys (DHS, Boyle et al. 2019) and IPUMS International (Minnesota Population Center 2019) harmonize education into just four broad categories, in addition to years of education. Unfortunately, what exactly constitutes these broad categories differs across projects, reflecting different levels of development and educational expansion across countries. Furthermore, the specific content of broad categories is not always documented, and if the source variables were translated into English, the link with the educational system of the country in question becomes obscure. The exact content of categories is also often unclear because there is no universal understanding of terms like ‘secondary’ or ‘higher’ education. Therefore, when data use only broad education levels for measuring and/or harmonizing education, their interoperability with other data, including harmonization with other sources, is severely limited unless the other data use the same broad categories. Some projects use standard aggregations of ISCED levels to produce three broad levels (e.g. Barone and Ruggera 2018), which are also commonly used in official reports on education across countries (e.g. OECD 2017).

At the most complex level, there are multi-digit coding schemes reflecting not just levels but also types of education within levels. The most common ones here are ISCED 2011 (UNESCO 2012, see further details in Sect. 3) and its predecessor ISCED 1997, developed for international official education statistics, and the CASMIN education scheme, which was developed in the academic project ‘Comparative Analysis of Social Mobility in Industrial Nations’ (König et al. 1988; Brauns et al. 2003). While the CASMIN scheme has been used mostly as a coding scheme for ex-post harmonization in selected European countries (Erikson and Goldthorpe 1992; Breen 2004; Breen et al. 2009), ISCED is by now widely used for ex-ante and ex-post harmonization (e.g. Barone and Ruggera 2018). For CASMIN, mappings linking national education categories with the international ones are only available for a limited set of countries. In the case of ISCED, such mappings are available for almost all countries in the world (UNESCO 2021) and, in considerable detail, for Europe specifically (Eurostat 2021).

To give some examples, the European Social Survey (ESS, 2020c), the European Values Study (EVS, 2020), the Survey of Health, Ageing and Retirement in Europe (SHARE, Börsch-Supan et al. 2013), the Programme for the International Assessment of Adult Competencies (PIAAC, OECD 2016), and the European Union Labour Force Surveys (EU-LFS, Eurostat 2020) harmonize education ex-ante and use fairly detailed categorical coding schemes based on ISCED. However, these ISCED-based schemes still differ from each other. The multi-digit education coding scheme used in the ESS since 2010, called ‘edulvlb’, which differentiates levels and various types of education within levels in one variable, has fared quite well in empirical tests and was thus recommended for use in other comparative surveys (Schröder 2014, chap. 3). It has since gained some acceptance beyond the ESS community and has been implemented in the EVS 2017, in SHARE, and, in slightly adapted form, in the upcoming cycle of OECD’s PIAAC (Allen et al. 2017). Most of these datasets additionally contain information on (actual or derived) years of education.

Between these two solutions using levels of education, there are datasets with a medium amount of differentiation, which provide main education levels without differentiating types of education. These main education levels are often defined with reference to ISCED, which is thereby reduced to its ‘bare bones’ (Schröder 2014, p. 76). The European Social Core variables project has specified ISCED main levels as the minimum requirement for official Eurostat surveys (Eurostat 2007). The Luxembourg Income Study (LIS, 2019) and the Survey Data Recycling project (SDR version 1, Survey Data Harmonization Team 2017; Slomczynski and Tomescu-Dubrow 2018) use main ISCED levels. The latter, however, also provides separate so-called ‘harmonization control’ variables indicating whether the respective education was not completed, whether it was vocational, and whether the category also includes higher levels of education, so that it in fact almost falls into the category of complex coding schemes. These harmonization control variables are intended to help researchers control for properties of the source survey items, and thus to enable them to ‘recycle’ rather than discard data.

2.3 Scaling education

Turning finally to the third approach, scaled education variables are obtained by transforming categorical education variables into a linear metric using supporting variables. Within the ISMF project, Schröder and Ganzeboom (2014) constructed such a linear education scale, called International Standard Levels of Education (ISLED), through cause-and-effect-proportional scaling of national education variables, an approach building on prior work by Treiman (1975) and Smith and Garnier (1987). The resulting scores can in principle be applied to national education variables across surveys [Footnote 4]. Since these national variables are usually not standardized across surveys even within countries, this process is rather complex and introduces new errors [Footnote 5]. ISLED scores can also be derived from the ‘edulvlb’ coding scheme used in the ESS (Schröder 2014, chap. 3) [Footnote 6] as well as from 3-digit ISCED codes (Schröder 2014, chap. 5), which makes ISLED more widely applicable, although it still relies on detailed harmonized measures of levels of education.

This summary of existing harmonization schemes reveals that both scaled education variables and derived years of education rely on harmonized measures of levels of education. Such measures are thus important not just for data users who wish to analyze levels of education, but equally for data users preferring other ways of analyzing educational attainment. However, as will be shown in Sect. 4, the existing international coding schemes for education are not easily applicable in ex-post harmonization, which motivates the development of the more generally applicable coding scheme GISCED.

2.4 Aggregate or detailed coding of education?

It can be debated whether survey data and harmonized datasets should provide detailed or aggregate education variables. Aggregate variables summarize many different educational qualifications in very few categories, which often carry a vaguer meaning. Detailed variables, in contrast, contain many more specific categories. An argument in favor of aggregate data is that misclassifications will be less common than with detailed variables, since some ‘noise’ disappears through aggregation. Producing aggregated variables is also arguably less labor-intensive than producing detailed variables. Much public reporting (see e.g. OECD’s ‘Education at a Glance’ series [Footnote 7]) is also limited to aggregated measures, and broad distinctions are easier to communicate in public debate.

Cross-national surveys involving many European countries, however, have notably chosen more complex coding schemes, because European educational systems tend to be more complex than the US-American system (Shavit and Müller 1998) or the education systems in most low- and middle-income countries. When years of education are used to measure educational attainment in complex educational systems, there is a concern that important distinctions between people who have followed different educational pathways with the same number of years of education are not measured, leading to hidden inequalities and low validity of the data (Braun and Müller 1997; Schneider 2010; Kerckhoff and Dylan 1999; Kerckhoff et al. 2002; Schröder 2014). The same criticism may be applied to projects using a small number of broad education categories, and research shows that types of education within levels can make a difference for various outcomes (Triventi 2013a; van de Werfhorst 2017; Delaruelle et al. 2020). We can only study the effects of extreme disadvantage by adequately representing the periphery of the socio-structural concepts we are interested in. Avoiding loss of information, be it by losing entire cases because of incomplete data, by aggregating categories, or by transforming a categorical measure of educational qualifications into hypothetical years of education, is therefore an important guiding principle of output harmonization (whether ex-ante or ex-post).

So, while both years of education and broad education categories are commonly used and may be perfectly adequate for data analysis in many research projects, they may be deficient as the sole harmonized education variable(s) in projects aiming for a high level of data reuse, since important categorical distinctions cannot be derived from years of education or broad education levels. Educational effects and inequalities may then be underestimated, and hypotheses concerned with unveiling such inequalities cannot be tested. In contrast, with a detailed categorical coding scheme, it is possible to derive years of education from the categories, or to simplify the detailed scheme into broad categories in whichever way is needed for a specific set of countries or a specific research question. Even for deriving years of education or linear education scores, the amount of detail covered in the source variables is crucial (Schröder and Ganzeboom 2014). Such derivations are not possible the other way round. Interoperability and reusability, core principles of FAIR data (Wilkinson et al. 2016), are thus highest when a detailed categorical scheme is applied for harmonization or, equivalently in terms of information content, when the information is harmonized into a whole set of harmonized variables. In contrast, especially when combining data from countries at different stages of development and from different time points, the problem of harmonization becomes rather severe when only broad levels are available in the data and the boundaries between the broad levels do not match across datasets. The GISCED framework therefore follows the strategy of coding at a high level of detail.

3 The International Standard Classification of Education (ISCED) 2011

ISCED 2011 (UNESCO 2012; Schneider 2013) is the international statistical classification for education-related data and is maintained by the UNESCO Institute for Statistics in Montreal, Canada. Like all classifications, its aim is to “[...] group and organize information meaningfully and systematically, usually in exhaustive and structured sets of categories that are defined according to a set of criteria for similarity”, and “to provide a simplification of the real world” (Hancock 2013, p. 3).

In contrast to its predecessor ISCED 1997 (UNESCO 2006), ISCED 2011 provides a three-digit numerical coding system for the sub-classification of levels of education (there is also a sub-classification for fields of education). There are two variants of the sub-classification of levels of education: one for the classification of educational programs (ISCED-P) and one for the classification of educational attainment (ISCED-A). Since surveys are mostly concerned with measuring the latter, ISCED-A forms the backbone of the proposed GISCED [Footnote 8]. This section describes ISCED-A digit by digit. Table 1 gives an overview of all ISCED categories. Which national educational programs and qualifications fulfill which criteria, and are thus assigned to which ISCED codes, is documented in the official ISCED mappings (Eurostat 2021; UNESCO 2021).

Table 1 ISCED 2011 codes and labels for educational attainment

3.1 The first digit: ISCED main levels

The first digit of ISCED consists of an ordered set of categories ranging from no education to the highest possible level of education. ISCED level 0 is used for individuals who have not completed primary education, which is defined as ISCED level 1. ISCED level 1 provides pupils with fundamental skills in reading, writing and arithmetic over four to six years. ISCED level 2, or lower secondary education, is an intermediate level at which a broad range of subjects is taught until approximately age 15 or 16. Completion of ISCED 2 is often regarded as the absolute minimum level of education, and some countries use the term ‘primary education’ for this level, which may lead to confusion regarding the classification of such programs. While most countries offer one educational program for all students at ISCED 2, some countries already start sorting students into different programs by ability at the start of ISCED 2.

In upper secondary education (ISCED level 3), subject specialization increases substantially, especially in vocationally oriented programs. Program choice and achievement at this level largely determine whether an individual will gain access to, and be able to successfully complete, higher education. In most countries, ISCED level 3 ends at age 18 or 19, after about 12 years of schooling. Unless they leave education before the end of upper secondary education, all individuals pass through ISCED levels 1 to 3. ISCED level 4, post-secondary non-tertiary education, in contrast, is an ‘optional’ level. It mostly contains educational programs that allow individuals to change the specialization they had at level 3, either to gain access to tertiary education programs they would otherwise not be admitted to, or to obtain vocational training before entering the labor market. It also contains programs that may nationally be considered ‘tertiary’ but that last less than two years and thus do not fulfill the requirement for classification in tertiary education.

ISCED level 5 is the first of four levels summarized as ‘tertiary education’. It contains all programs leading to qualifications below the level of a Bachelor’s degree that take at least two years of full-time education to complete. These are mostly vocational programs [Footnote 9]. Higher education ‘proper’ starts with ISCED level 6, the Bachelor’s level. Besides prototypical Bachelor’s degrees, it also covers other higher education qualifications that take 3 to 4 years after the end of upper secondary education, such as polytechnic diplomas. ISCED level 7, the Master’s level, contains not just Master’s degrees and other post-graduate qualifications, but also qualifications from long (5 to 6 years) first-degree programs, which were common in many countries before the Bologna reforms and in many countries still exist in certain subject areas such as medicine or law. Finally, ISCED level 8 is reserved for doctoral programs. Code 9 is used for education that cannot be classified in any of these levels.

These levels are also referred to as ‘main’ levels. Very often, nothing more detailed is used, and sometimes not even that: official data often only use the three broad reporting categories ‘low’ (up to ISCED 2), ‘medium’ (ISCED 3 and 4) and ‘high’ education (ISCED 5 and upwards). However, depending on the country, many respondents may be concentrated in one or a few ISCED levels despite heterogeneous educational experiences, and specific types of education (such as vocational vs. general) are also of theoretical interest in research. It is therefore worthwhile to also consider the next two digits of ISCED in data coding and analysis. Especially for data producers that provide harmonized data for researchers with diverging measurement requirements, coding beyond the first digit is advisable.
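The main levels and the three broad reporting categories can be written as a small lookup. This Python sketch reads only the first digit of an ISCED 2011 attainment code, following the groupings described above; the function name `broad_level` and the example codes are illustrative.

```python
from typing import Optional

# ISCED 2011 main levels (first digit), as described in this section.
MAIN_LEVELS = {
    0: "less than primary education",
    1: "primary education",
    2: "lower secondary education",
    3: "upper secondary education",
    4: "post-secondary non-tertiary education",
    5: "short-cycle tertiary education",
    6: "Bachelor's or equivalent level",
    7: "Master's or equivalent level",
    8: "doctoral or equivalent level",
    9: "not elsewhere classified",
}


def broad_level(isced_code: str) -> Optional[str]:
    """Collapse an ISCED 2011 attainment code (1 to 3 digits) into the broad
    reporting categories: low (0-2), medium (3-4), high (5-8).
    Code 9 (not elsewhere classified) has no broad category and yields None."""
    if not isced_code or isced_code[0] not in "0123456789":
        raise ValueError(f"not an ISCED code: {isced_code!r}")
    main = int(isced_code[0])
    if main <= 2:
        return "low"
    if main <= 4:
        return "medium"
    if main <= 8:
        return "high"
    return None


for code in ["100", "244", "344", "354", "550", "760"]:
    print(code, MAIN_LEVELS[int(code[0])], "->", broad_level(code))
```

Because the mapping is many-to-one (344 and 354 both become ‘medium’), the general/vocational distinction cannot be recovered from the broad category, which is the information loss discussed in Sect. 2.4.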

3.2 The 2nd digit: program orientation

The meaning of the second digit depends on the level of education. For ISCED 0, the second digit distinguishes (0) ‘no education’ (at all) from (1) ‘some pre-primary’ and (2) ‘some primary education’, which is also a new feature of ISCED 2011. At ISCED levels 2 to 5, it distinguishes between (4) general and (5) vocational programs, without implying any ordering. A program has a vocational orientation if it prepares for a specific occupation or class of occupations; all other programs are considered ‘general’. At ISCED levels 6 to 8, the terms ‘academic’ and ‘professional’ are sometimes used in a similar vein. However, since the distinction of programs preparing for a specific occupation is less important at these levels, and internationally agreed definitions for distinctions within higher education are lacking, the second digit is rarely used in official data at these levels. This is why ISCED provides code (6) ‘orientation unspecified’ specifically for these levels.
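Because the meaning of the second digit depends on the first, interpreting it requires a level-specific lookup. The following is a minimal Python sketch assuming integer digits as input. It deliberately does not decode the level 0 sub-categories, does not handle level 9, and assumes (as in ISCED 2011) that codes 4 and 5 at levels 6 to 8 stand for ‘academic’ and ‘professional’; `orientation` is a hypothetical helper name.

```python
# Meaning of the second ISCED 2011 digit, by main level (levels 2 to 8).
ORIENTATION_LABELS = {
    **{lvl: {4: "general", 5: "vocational"} for lvl in (2, 3, 4, 5)},
    **{lvl: {4: "academic", 5: "professional", 6: "orientation unspecified"}
       for lvl in (6, 7, 8)},
}


def orientation(main_level: int, second_digit: int) -> str:
    """Interpret the second digit of an ISCED 2011 code given its main level."""
    if main_level == 0:
        # At level 0 the second digit records pre-primary/primary detail,
        # not program orientation; this sketch does not decode it.
        return "level-0 sub-category (not an orientation)"
    if main_level == 1:
        if second_digit != 0:
            raise ValueError("level 1 has no orientation; second digit must be 0")
        return "not applicable (code 0)"
    labels = ORIENTATION_LABELS.get(main_level)
    if labels is None or second_digit not in labels:
        raise ValueError(
            f"second digit {second_digit} is not defined at level {main_level}"
        )
    return labels[second_digit]


print(orientation(3, 5))  # vocational
print(orientation(7, 6))  # orientation unspecified
```

Keeping the labels in a table keyed by level mirrors the text: the same digit value (for example 4) means ‘general’ at level 3 but ‘academic’ at level 7.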

3.3 The 3rd digit: level completion and access to a higher level

The third digit of ISCED 2011 for attainment (ISCED-A) is only defined for levels 2, 3 and 4, and identifies whether the educational program (at least partially) completes the ISCED level (code 2). This is usually not the case for programs of very short duration, e.g. just one year (which receive code 1 in ISCED-P and are not considered for classification in ISCED-A). If the program in question completes the level, ISCED distinguishes whether it provides access to a higher level of education (code 4) or not (code 3). These codes are thus ordered and can be regarded as sub-levels. At ISCED levels 3 and 4, access refers to any of levels 5, 6 or 7. So if a vocational program at level 3 only gives access to a program at level 5 but not to level 6 or 7, it will be classified as providing access, even though it is likely of a lower standard than a general program providing access to levels 6 and 7.


For the second and third digits of ISCED, code 0 is used when these digits are not specified at the respective ISCED level. For example, at ISCED 1 there is no distinction between vocational and general education, and sub-levels do not apply either; and at ISCED 0, 1, and in tertiary education, all programs are assumed to complete the respective level and to give access to a higher level.
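The rules for the third digit, including code 0 where it is not specified, can be expressed as a small derivation function. This is a hypothetical Python sketch: the parameter names are illustrative, programs that do not even partially complete a level (ISCED-P code 1) are assumed to have been excluded beforehand, and the access target for level 2 (upper secondary, level 3) is an assumption not stated in this section.

```python
from typing import Iterable

# Levels a program must give access to for third digit 4.
# Levels 3 and 4 follow the text (access to level 5, 6 or 7);
# the level-2 entry is an assumption of this sketch.
ACCESS_TARGETS = {2: {3}, 3: {5, 6, 7}, 4: {5, 6, 7}}


def third_digit(main_level: int, full_completion: bool,
                accessible_levels: Iterable[int] = ()) -> int:
    """Derive the ISCED-A third digit from properties of a qualification.

    0: not specified at this level (levels 0, 1 and tertiary)
    2: partial level completion
    3: level completion without access to a higher level
    4: level completion with access to a higher level
    """
    if main_level not in ACCESS_TARGETS:
        return 0
    if not full_completion:
        return 2
    if ACCESS_TARGETS[main_level] & set(accessible_levels):
        return 4
    return 3


# The example from the text: a level-3 vocational program that only opens
# access to level 5 is still coded as providing access.
print(third_digit(3, True, accessible_levels={5}))  # 4
print(third_digit(3, True))                         # 3
print(third_digit(6, True, accessible_levels={7}))  # 0
```

The set intersection encodes the rule that access to any one of the target levels is enough, which is exactly why the level-5-only case above ends up with code 4.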

4 Extending the ISCED 2011 code scheme for ex-post harmonization

ISCED was developed for administrative rather than survey data. Only since the adoption of ISCED 2011 has the classification offered some features specifically geared to the measurement of education in surveys: it distinguishes the classification of educational programs (ISCED-P, for indicators related to enrollment) from that of educational qualifications (ISCED-A, for indicators related to educational attainment), and it provides a 3-digit coding scheme. If data using this standard coding scheme were more widely available, ISCED could become a very useful tool for the standard coding of education in surveys, much like the International Standard Classification of Occupations (ISCO, International Labour Organisation 2007) is for the derivation of occupation-related variables such as social status (Ganzeboom et al. 1992) or social class (e.g. Oesch 2006; Rose and Harrison 2010; Erikson et al. 1979).

Surveys using ex-ante output harmonization like the ESS can take the requirements of ISCED into account when developing measurement instruments. This is not the case for existing survey data on educational attainment that a researcher may want to harmonize ex-post, where measurement instruments and categories differ across countries (because of the different educational systems) and even across surveys within countries. Nevertheless, many education categories in actual surveys can be coded directly into ISCED-A using the ISCED mappings (Eurostat 2021; UNESCO 2021). However, a few issues will repeatedly appear in any ex-post harmonization project [Footnote 10]:

  1. What if the ISCED main level of education is not identified in the source education variables, e.g. because an education category spans two ISCED levels?

  2. What if information on program orientation, level completion and/or access to a higher ISCED level is unknown?

  3. What if the source education variables contain a potentially important piece of information that is not covered by ISCED but that the harmonization project wants to keep visible in the harmonized data?

To offer standard rules and codes for ex-post harmonization that take these issues into account, this paper presents a generalized framework for the classification of education in surveys. The numerical code scheme underlying this framework is called GISCED for short, i.e. the Generalized International Standard Classification of Education. It is complemented by a set of separate extension variables that capture information not covered by ISCED in a standardized way. Both GISCED and the extension variables can be adapted to the requirements of a specific project with regard to the level of detail covered. This section describes the GISCED framework and its classification rules in some detail.

One guiding idea in developing GISCED is that ISCED 2011 codes can be derived simply by dropping the newly specified digits, which avoids the need for complex recoding and keeps interoperability with data using official ISCED codes high. In this sense, ISCED forms the ‘heart’ of GISCED. The other guiding idea is that the framework should allow harmonization with minimal loss of information, even if the source education variables are richer in information than ISCED. Therefore, the generalized framework does not just offer new ‘unspecified’ categories at each ISCED digit, but also introduces new digits and so-called ‘extension’ variables to carry information not covered by ISCED but potentially relevant in comparative research.

4.1 Categories that span multiple main ISCED levels

A common harmonization problem is that education categories in questionnaires sometimes do not correspond to a single main ISCED level but include a mix of levels (e.g. Luxembourg Income Study n.d.). For example, depending on their compulsory schooling legislation, countries differ in how finely they differentiate education at very low levels: in the first four rounds of the ESS, Austria only distinguished completion of compulsory school, corresponding to ISCED level 2, from no educational qualification. As a result, ISCED 0 (less than primary education) and ISCED 1 (primary education) cannot be distinguished for Austria in ESS rounds 1 to 4. While ISCED offers code 9 to classify educational programs and qualifications ‘not elsewhere classified’, using it here would result in a severe loss of information, since we do know that the respective category aggregates ISCED levels 0 and 1.

Therefore, another digit is added after the ISCED main level, which indicates the (included) upper bound of a category, while the main level indicates the (included) lower bound [Footnote 11]. If there is no range, the code for the applicable level is simply repeated. Code 9 on the first (and/or second) digit of GISCED should thus only be used if the applicable main ISCED levels are completely unknown. Table 2 shows what the first two digits of GISCED then look like in relation to ISCED main levels [Footnote 12].
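The lower-bound/upper-bound rule can be sketched as a small encoder. In this hypothetical Python helper, `None` marks an unknown bound; coding each unknown bound as 9 on its own digit is an assumption based on the ‘first (and/or second) digit’ wording above, not a rule taken from Table 2.

```python
from typing import Optional


def gisced2(lower: Optional[int], upper: Optional[int]) -> str:
    """Build the first two GISCED digits from the ISCED main levels a source
    category spans: first digit = lowest included level, second digit =
    highest included level. None marks an unknown bound, coded 9
    (assumption: one 9 per unknown bound)."""
    for level in (lower, upper):
        if level is not None and not 0 <= level <= 8:
            raise ValueError(f"ISCED main level out of range: {level}")
    if lower is not None and upper is not None and lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    first = "9" if lower is None else str(lower)
    second = "9" if upper is None else str(upper)
    return first + second


# Austria, ESS rounds 1 to 4: 'no educational qualification' spans ISCED 0 and 1.
print(gisced2(0, 1))        # 01
# A category that maps to exactly one main level repeats that level.
print(gisced2(3, 3))        # 33
print(gisced2(None, None))  # 99
```

Unlike ISCED code 9, the Austrian category keeps its known range (01) instead of becoming ‘not elsewhere classified’.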

Table 2 First two digits of GISCED: level codes including categories spanning ISCED main levels

This leads to a classification where categories are not entirely mutually exclusive, because the ‘spanning’ categories include the non-spanning categories, which is usually undesirable (Hancock 2013). However, this is exactly what is needed when survey data were not collected or harmonized with the ISCED main levels as a coding target in mind. Education categories that can be coded into a non-spanning category should always be coded there, and spanning categories should only be used if missing data would otherwise be produced.
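As a minimal sketch in Python (the helper name `level_digits` is hypothetical), the rule for the first two digits can be expressed as follows: the included lower bound comes first, the included upper bound second, and a non-spanning level is simply repeated. The function only handles known main levels 0 to 8; code 9 for entirely unknown levels would need to be handled separately.

```python
from typing import Optional


def level_digits(lower: int, upper: Optional[int] = None) -> str:
    """Build the first two GISCED digits (hypothetical helper).

    lower: included lower bound, an ISCED main level 0-8
    upper: included upper bound; None means the category does not span levels
    """
    if upper is None:
        upper = lower  # no range: the applicable level is simply repeated
    for level in (lower, upper):
        if not 0 <= level <= 8:
            raise ValueError(f"not an ISCED main level: {level}")
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return f"{lower}{upper}"


# Austria, ESS rounds 1-4: 'no educational qualification' aggregates ISCED 0 and 1
print(level_digits(0, 1))  # 01
print(level_digits(2))     # 22
```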

For harmonization projects for which the supplementary dimensions of ISCED (i.e. all information beyond the main level) are not relevant, this scheme, which turns the first ISCED digit into two digits, may be entirely sufficient. It will be referred to as ‘GISCED2’. Codes 02 (low), 34 (medium), and 58 (high) represent the three broad ISCED levels commonly used in statistical reporting. GISCED2 can also be used to identify specific binary distinctions often used in the analysis of educational transitions, such as less than upper secondary (02) vs. upper secondary or more (38), or less than Bachelor’s level (05) vs. Bachelor’s level or more (68). For more demanding harmonization projects, ISCED second digit codes can be appended to these codes, too, e.g. when an education category clearly identifies vocational education but mixes vocational education at two ISCED levels.Footnote 13
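The following Python sketch (function names are hypothetical) reads the first two digits as inclusive bounds and checks whether a category falls entirely within one broad level, or entirely on one side of a binary cut-off such as ‘upper secondary or more’. Categories that straddle the boundary yield `None` rather than a forced assignment. Because only the first two characters are read, the same functions also accept full four-digit codes.

```python
from typing import Optional, Tuple

# Broad ISCED levels as inclusive ranges of main levels (GISCED2 codes 02, 34, 58)
BROAD_LEVELS = {"low": (0, 2), "medium": (3, 4), "high": (5, 8)}


def bounds(code: str) -> Optional[Tuple[int, int]]:
    """(lower, upper) from the first two GISCED digits, or None if unknown (9)."""
    lower, upper = int(code[0]), int(code[1])
    if 9 in (lower, upper):
        return None
    return lower, upper


def broad_level(code: str) -> Optional[str]:
    """Broad level if the category lies entirely within one; None if it straddles two."""
    b = bounds(code)
    if b is None:
        return None
    for name, (lo, hi) in BROAD_LEVELS.items():
        if lo <= b[0] and b[1] <= hi:
            return name
    return None


def at_least(code: str, threshold: int) -> Optional[bool]:
    """Binary split 'main level >= threshold'; None if the category straddles the cut."""
    b = bounds(code)
    if b is None:
        return None
    if b[0] >= threshold:
        return True
    if b[1] < threshold:
        return False
    return None


print(broad_level("34"), broad_level("23"))  # medium None
print(at_least("38", 3), at_least("05", 6))  # True False
```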

GISCED2 can be approximately transformed into ISCED main levels by dropping the second digit. With this solution, there will be some systematic underestimation of education when simplifying data using the lower bound only, i.e. ignoring the upper bound. However, this is often preferable to the alternative of producing missing data. Sometimes a level between the lower and upper bound may be a more adequate simplification.
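Both simplifications can be sketched in a few lines of Python. The helper name and the ‘middle’ rule (the midpoint of the two bounds, rounded down) are assumptions for illustration; which intermediate level is more adequate depends on the category in question and is not fixed by the framework.

```python
def to_main_level(code: str, rule: str = "lower") -> int:
    """Approximate an ISCED main level from a GISCED code (hypothetical helper).

    rule="lower":  drop the second digit, i.e. keep the lower bound
                   (systematically underestimates spanning categories)
    rule="middle": midpoint of the bounds rounded down, one illustrative
                   choice of a level between the lower and upper bound
    """
    lower, upper = int(code[0]), int(code[1])
    if 9 in (lower, upper):
        raise ValueError("main level unknown (code 9)")
    if rule == "lower":
        return lower
    if rule == "middle":
        return (lower + upper) // 2
    raise ValueError(f"unknown rule: {rule}")


print(to_main_level("58"), to_main_level("58", rule="middle"))  # 5 6
```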

4.2 Unknown information for the second and third digit of ISCED

Sometimes we do not know whether a qualification is vocational or general because of incomplete information, or a response category contains both. For ISCED 0, the distinctions foreseen by ISCED of whether any pre-primary or primary education was attended are hardly ever available in survey data (although for developing countries, sometimes ‘some primary education’ is separately measured). In a similar vein, we may not know whether a qualification or program gives access to a higher level of education or not—or a response category mixes these different categories.

There are several possible solutions to this problem. Firstly, if we happen to know that one of the sub-categories is clearly dominant in the respective education category, i.e. much more common than the other, the respective sub-category could be coded, disregarding the measurement error for those respondents for whom another sub-category would be more adequate. Secondly, a specific code could be used to signify that we do not have this piece of information. This would be advisable if no specific sub-category dominates the education category found in the questionnaire. Following the solution for ISCED main levels, we will use code 9 across all digits and levels to identify unknown further specifications.Footnote 14
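The choice between these two solutions could be sketched as below. Supplying shares of the candidate sub-categories (for instance from enrollment statistics) and the 80% threshold for ‘clearly dominant’ are assumptions for illustration, not part of GISCED. In ISCED-A, digit 4 denotes general and digit 5 vocational orientation at ISCED levels 2 to 5.

```python
from typing import Mapping


def code_subcategory(shares: Mapping[str, float], dominance: float = 0.8) -> str:
    """Pick the dominant sub-category digit, or '9' if none clearly dominates.

    shares:    relative frequency of each candidate digit within the source
               category, e.g. {"4": ..., "5": ...} for general vs vocational
    dominance: hypothetical threshold (> 0.5) for 'clearly dominant'
    """
    total = sum(shares.values())
    if total <= 0:
        return "9"  # no information at all
    digit, share = max(shares.items(), key=lambda item: item[1])
    return digit if share / total >= dominance else "9"


print(code_subcategory({"4": 90, "5": 10}))  # 4
print(code_subcategory({"4": 55, "5": 45}))  # 9
```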

Tables 3 and 4 show the codes available for the third and fourth digit of GISCED, corresponding to the second and third digit of ISCED. The only difference compared to the official ISCED coding (UNESCO 2012, Tables 2 and 3, p. 21–22) is then the changed label of code 9. Table 5 shows some exemplary GISCED codes and their labels to illustrate how the different digits come together, and how the lower and upper bounds (first and second digit) can be used to flexibly classify education categories that span ISCED levels.

Table 3 Codes for the third digit of GISCED, corresponding to the second digit of ISCED-A: sub-level and orientation
Table 4 Codes for the fourth digit of GISCED, corresponding to the third digit of ISCED-A: level completion and access to higher level
Table 5 Illustration of selected GISCED codes across all ISCED levels
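To illustrate how the four digits fit together, a GISCED code can be split into its components and reduced to an ISCED-A code by dropping the second (upper-bound) digit. This Python sketch uses a hypothetical class name; for categories spanning ISCED levels, the resulting ISCED-A code reflects the lower bound only.

```python
from typing import NamedTuple


class Gisced(NamedTuple):
    lower: int        # digit 1: included lower bound (ISCED main level)
    upper: int        # digit 2: included upper bound
    orientation: int  # digit 3: ISCED-A 2nd digit (sub-level / orientation)
    access: int       # digit 4: ISCED-A 3rd digit (completion / access)

    @classmethod
    def parse(cls, code: str) -> "Gisced":
        if len(code) != 4 or not code.isdigit():
            raise ValueError(f"expected four digits, got {code!r}")
        return cls(*(int(c) for c in code))

    def spans_levels(self) -> bool:
        return self.lower != self.upper

    def isced_a(self) -> str:
        """ISCED-A code obtained by dropping the upper-bound digit."""
        return f"{self.lower}{self.orientation}{self.access}"


print(Gisced.parse("3354").isced_a())       # 354
print(Gisced.parse("2359").spans_levels())  # True
```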

4.3 Avoiding loss of information present in source education measures

There are (at least) five situations in which source education categories can be more differentiated than ISCED (see also Schneider and Kogan 2008): (1) external stratification or ‘tracking’ in secondary education and (2) in higher education, (3) different types of vocational education, (4) short higher level programs not considered for level completion, and (5) specific categories for incomplete education (dropout). In such situations, harmonizing data using ISCED leads to information loss. The GISCED framework therefore includes separate ‘extension’ variables so that this information can be retained in the harmonized data without changing the GISCED code itself.Footnote 15 They carry substantive information that goes beyond the information harmonized in ISCED. These extensions will be useful for labor market and social stratification research, but are less relevant when education is just a background variable. Data users can use these variables to custom-build harmonized education variables that include the respective kind of information. More specifically, the extension variable ‘edustrat’ (see Sects. 4.3.1 and 4.3.2, and Table 6) allows researchers to complement the GISCED scheme with information about stratification in secondary and higher education. The extension variable ‘vettype’ (see Sect. 4.3.3 and Table 7) complements GISCED by adding information about different types of vocational education and training.

4.3.1 Stratification in secondary education

While lower secondary education is comprehensive in most countries, some countries track students into different programs or school types as early as lower secondary education. In these countries, some general education programs do not give access to all (and especially not to academically selective) programs at a higher level but only to vocational programs. Enrollment in such a pre-vocational program is consequential for individuals’ educational careers and labor market outcomes (Bol and Van de Werfhorst 2011; Bol et al. 2014; van de Werfhorst and Mijs 2010; Allmendinger 1989), but is not identifiable using ISCED. At the upper secondary level, different general tracks and resulting qualifications (which are rarer at this level) give unequal access to different types of higher education (see Sect. 4.3.2). This is why the ESS opted for a more fine-grained measure of education, distinguishing on the third (access) digit whether or not a secondary qualification provides access to general or academic education at the higher level. The ESS education coding scheme contains this information at ISCED levels 2, 3 and 4. To keep the ISCED code intact, it is suggested here to code an extension variable for ISCED levels 2 – 4 using the following codes:

  1 = track not giving access to any higher level education;
  2 = track giving access only to vocational, professional or lower tier higher level education (i.e. limited access);
  3 = track giving access to academic, university or upper tier or all types of higher level education, including non-tracked (comprehensive) programs (i.e. full access).

These codes of the extension variable ‘edustrat’ are consistent with the third digit of the ‘edulvlb’ coding scheme used in the ESS (see section 5.2).

4.3.2 Stratification in higher education

Vertical stratification in tertiary and higher education is well captured in ISCED by the distinction of four tertiary education levels. Horizontal stratification in higher education, however, takes on different forms in different countries, hampering comparative measurement (Marginson 2016; Triventi 2013b, a). Again, this institutional differentiation is expected to be related to social background (Triventi 2013a) and to have consequences for individuals’ educational and labor market outcomes (Shavit et al. 2007). Educational expansion then does not necessarily lead to equalization of opportunities because it may generate new inequalities through institutional differentiation (Lucas 2001). In countries with little institutional diversification in higher education, field of study is a good indicator of qualitative (horizontal) differences in education, but it is only rarely measured. In countries with a diversified higher education system, there is often, at the most basic level, a differentiation between traditional or elite universities and institutions without university or ‘elite’ status (Bourdieu 1988).

One way of operationalizing this idea is that the former are more academic and research-oriented and are thus the prime (or sole) institutions that award PhDs. The latter are more practically-oriented and usually do not award PhDs. This differentiation is not made in ISCED though. The ESS at levels 6 and 7 therefore distinguishes (commonly less selective and more professional) qualifications from polytechnics, universities of applied science and lower tier colleges from (commonly more selective and academic) traditional university degrees. The ESS education coding scheme contains this information on the second digit (1 signifying lower tier/non-university programs, and 2 signifying higher or single tier/traditional university programs). The variable ‘edustrat’ already used for stratification in secondary education is therefore extended to higher education, using the distinctions already made in the ESS:

  4 = lower tier (non-selective, polytechnic, applied, and other non-university institutions);
  5 = upper tier (selective, traditional, research-oriented universities awarding doctoral degrees) or single tier.

For ISCED 5 qualifications that belong to the vocational education and training system rather than the higher education system of a country (e.g. master craftsman qualifications in Germany), edustrat should be coded 0. For ISCED 5 qualifications awarded at universities or polytechnics, it should be coded according to the respective higher education tier.

Code 9 on this extension variable can again be used if the information on the track in secondary education or the tier in higher education is not available, or if the different types are mixed in a single education category. Code 0 can be used for cases to which this extension variable does not apply (e.g. less than primary education).

Table 6 Codes for the extension variable ‘edustrat’, reflecting stratification in educational systems
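The ‘edustrat’ codes described in Sects. 4.3.1 and 4.3.2 can be represented as an enumeration, together with a plausibility check by ISCED main level. The check below is an assumption inferred from the text (track codes at ISCED 2 to 4, tier codes in higher education, code 0 at very low levels and for ISCED 5 VET) rather than an official rule; Table 6 remains authoritative, and code 6 from Sect. 4.3.4 is left out for brevity.

```python
from enum import IntEnum
from typing import FrozenSet


class Edustrat(IntEnum):
    NOT_APPLICABLE = 0  # e.g. less than primary education, ISCED 5 VET
    NO_ACCESS = 1       # secondary track without access to higher level education
    LIMITED_ACCESS = 2  # access only to vocational/professional/lower tier programs
    FULL_ACCESS = 3     # access to academic/upper tier/all programs, incl. comprehensive
    LOWER_TIER = 4      # polytechnics, applied and other non-university institutions
    UPPER_TIER = 5      # traditional research universities, or single tier systems
    UNKNOWN = 9         # information not available or types mixed


def allowed_edustrat(isced_level: int) -> FrozenSet[Edustrat]:
    """Plausible edustrat codes for an ISCED main level (assumed rules)."""
    E = Edustrat
    if isced_level in (0, 1):
        return frozenset({E.NOT_APPLICABLE, E.UNKNOWN})
    if 2 <= isced_level <= 4:
        return frozenset({E.NO_ACCESS, E.LIMITED_ACCESS, E.FULL_ACCESS, E.UNKNOWN})
    if isced_level == 5:
        return frozenset({E.NOT_APPLICABLE, E.LOWER_TIER, E.UPPER_TIER, E.UNKNOWN})
    if 6 <= isced_level <= 8:
        return frozenset({E.LOWER_TIER, E.UPPER_TIER, E.UNKNOWN})
    raise ValueError(f"not an ISCED main level: {isced_level}")


print(Edustrat.UPPER_TIER in allowed_edustrat(3))  # False
```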

4.3.3 Different vocational education and training types

Vocational education and training (VET) is organized differently across countries (Dieckhoff 2008). ISCED does not distinguish between school-based, work-based or the ‘dual system’ of combined school- and work-based VET. However, when comparatively studying skill production, success in higher education or labor market outcomes, the institutional context may make a difference (e.g. Hanushek et al. 2017; Forster et al. 2016; Saar et al. 2017): different types of VET relate to different degrees of employer involvement, occupational specificity, labor market linkage, standardization, skill transparency, state regulation, and development of general skills (Shavit and Müller 1998; Dieckhoff 2008; Bol and Van de Werfhorst 2016; Andersen and van de Werfhorst 2010). The organization of VET is also relevant for education and labor market policy. Due to a lack of data, the type of VET is often only measured at the aggregate level, even though several types of VET often coexist within one country. For projects interested in respondent-level effects of different types of VET, a separate variable ‘vettype’ is introduced to the GISCED framework, distinguishing:

  1 = School-based VET (with full-time vocational schooling);
  2 = ‘Dual’ VET (combining in-company training and part-time vocational schooling);
  3 = Work-based VET (without vocational schooling). This latter type may be dubious with respect to its recognition as formal education, and could be used to downgrade the affected respondents’ attainment level if, for example, education is intended to proxy general skills in the project in question.

This variable needs to be specified separately from the stratification variable introduced previously because both apply to the same levels and categories of education. Since some countries have started offering ‘dual’ VET in tertiary education (see e.g. degree level apprenticeships in the UK and Berufsakademie or ‘duale Hochschule’ in Germany), this variable is not only relevant for secondary but also tertiary education. Code 9 can again be used for cases for which the information is not available or mixed in a single education category, and category 0 for cases for which type of VET is not relevant (e.g. primary education).

Table 7 Codes for the extension variable ‘vettype’, reflecting different types of vocational education and training (VET)
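Analogously, ‘vettype’ can be sketched as an enumeration with a simple consistency check against the orientation digit (third digit) of a four-digit GISCED code. The rules are assumptions drawn from the text rather than official coding rules: the type of VET does not apply below ISCED 2 or to general programs, and should be specified (or coded 9) for vocational programs at ISCED 2 to 5; tertiary levels are left unrestricted because ‘dual’ VET also exists there.

```python
from enum import IntEnum


class VetType(IntEnum):
    NOT_APPLICABLE = 0  # type of VET not relevant, e.g. primary or general education
    SCHOOL_BASED = 1    # full-time vocational schooling
    DUAL = 2            # in-company training plus part-time vocational schooling
    WORK_BASED = 3      # no vocational schooling
    UNKNOWN = 9         # information not available or types mixed


def vettype_plausible(gisced: str, vettype: VetType) -> bool:
    """Check a vettype value against a four-digit GISCED code (assumed rules)."""
    lower, orientation = int(gisced[0]), int(gisced[2])
    if vettype == VetType.UNKNOWN:
        return True
    if lower <= 1:
        return vettype == VetType.NOT_APPLICABLE
    if lower <= 5 and orientation == 4:  # general orientation
        return vettype == VetType.NOT_APPLICABLE
    if lower <= 5 and orientation == 5:  # vocational orientation
        return vettype != VetType.NOT_APPLICABLE
    return True  # unknown orientation or ISCED 6-8: not restricted here


print(vettype_plausible("3354", VetType.DUAL))  # True
print(vettype_plausible("3344", VetType.DUAL))  # False
```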

4.3.4 Short higher level programs

Educational programs that are too short to complete an ISCED level (a situation that only occurs at ISCED levels 2 and 3) are classified as 2X1 and 3X1, respectively, in ISCED-P. In ISCED-A, which is more relevant for survey research, qualifications from such programs would be classified as 100 and 2X4, respectively, which usually corresponds to the code of the qualification completed prior to entering the short program. Therefore, data coded using ISCED omit the information that a short higher-level program was completed. I therefore suggest, firstly, including the orientation of the higher level program in the coding of attainment at the lower level.Footnote 16 Secondly, a flag variable should be added to signal that, in addition to the highest level of attainment, a short program from a higher level was also completed.Footnote 17 Alternatively, if the extension variable on educational stratification ‘edustrat’ has been coded, code 6 on that variable, so far unused, can be used instead. This is useful because the track of the previously attended education would often be unknown anyway for respondents who completed such short higher level programs, so they would effectively be recoded from code 9 to 6 on ‘edustrat’.
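A possible implementation of the flag and of the recode on ‘edustrat’ is sketched below in Python. The function name and the decision to set the flag in all cases while recoding only unknown tracks (code 9) to 6 are assumptions for illustration; the text presents the flag and code 6 as alternatives.

```python
from typing import Tuple

UNKNOWN_TRACK = 9
SHORT_HIGHER_PROGRAM = 6  # edustrat code for a completed short higher level program


def mark_short_program(edustrat: int, completed_short: bool) -> Tuple[int, int]:
    """Return (edustrat, short_program_flag) for one respondent."""
    flag = 1 if completed_short else 0
    if completed_short and edustrat == UNKNOWN_TRACK:
        # the track of the previously attended education is unknown anyway
        edustrat = SHORT_HIGHER_PROGRAM
    return edustrat, flag


print(mark_short_program(9, True))  # (6, 1)
print(mark_short_program(3, True))  # (3, 1)
```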

4.3.5 Dropout categories

In some surveys, some countries use education categories indicating that an educational program was not successfully completed. In that case, however, the highest level of education actually completed is not known, which is why such categories are usually advised against. When faced with this information in ex-post harmonization, it is not clear how to code it in ISCED. It is suggested here to infer the highest successfully completed program and code this as the respondent’s educational attainment in GISCED, which is then comparable to the information collected in other countries (i.e. excluding dropout categories). A separate variable can then be produced containing the information about additional but incomplete education (again coded in GISCED), which would, however, only be available for those countries and surveys where this information is collected.
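The suggested treatment can be sketched as a lookup that splits each source category into the highest completed program and any additional incomplete education, both coded in GISCED. The category labels, the output names ‘gisced’ and ‘gisced_incomplete’, and the codes in the table are hypothetical examples, not taken from any actual survey.

```python
from typing import Dict, Optional, Tuple

# Hypothetical source categories -> (highest completed, additional incomplete)
DROPOUT_MAP: Dict[str, Tuple[str, Optional[str]]] = {
    "lower secondary general completed": ("2244", None),
    "upper secondary general, not completed": ("2244", "3344"),
    "upper secondary general completed": ("3344", None),
}


def split_dropout(category: str) -> Dict[str, Optional[str]]:
    """Attainment excludes dropout; incomplete education goes to a separate variable."""
    completed, incomplete = DROPOUT_MAP[category]
    return {"gisced": completed, "gisced_incomplete": incomplete}


print(split_dropout("upper secondary general, not completed"))
# {'gisced': '2244', 'gisced_incomplete': '3344'}
```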

5 Applying GISCED

When harmonizing education categories using the GISCED framework, the best approach is to start from detailed, country-specific education variables, since these contain most information. This can be quite a labor-intensive process though. In order to establish which country-specific education category to map to which GISCED code, the official ISCED mappings (UNESCO 2021; Eurostat 2021) should primarily be used. These will usually give all information needed for coding into GISCED. A good rule to follow in order to maximize cross-national comparability is to only take into account completed education levels, and downgrade all education levels that were attended but not completed to the highest level that was successfully completed (see also Luxembourg Income Study n.d.).Footnote 18

For extension variables, depending on the specific extension, this information will also be available in the ISCED mappings, but often in a non-standard format (e.g. included in a textual note rather than a standardized column). Otherwise, it will need to be established via separate channels such as the scholarly literature, existing harmonized data such as the ESS, information on educational systems on the Internet (such as EurydiceFootnote 19 or SurveycodingsFootnote 20), or expert consultations. The same applies to educational qualifications not covered by the ISCED mappings, especially outdated qualifications, which are gradually being added to the ISCED mappings but are covered in Surveycodings.

Table 8 shows an example in which the GISCED scheme and the extension variable on educational stratification are applied to the country-specific education variable for Austria in ESS round 2 (vettype would be 9 for the vocational categories 3 and 5 and is thus not included). The Austrian education measure can be harmonized neither into ISCED main levels nor into broad levels (low/medium/high).Footnote 21 GISCED allows categories 1, 5 and 6, which span ISCED levels, to be coded without any loss of information due to harmonization. The example also shows how ‘edustrat’ differentiates between two categories at ISCED level 3 that both include vocational education with access to tertiary education, but of which only one gives access to university studies.

Table 8 Coding the Austrian education variable in ESS round 2 (2004) to GISCED and edustrat

If the data to be harmonized only contain already harmonized education categories (but with a different harmonization target), GISCED can still be used; this is (at least in the case of cross-national data) less labor intensive, but will potentially result in less differentiated data (see next section for details). If country-specific education variables are used, harmonized ISCED variables already present in the data can substantially speed up the process, since these will often already contain most of the information needed to code the national education variables into GISCED. For example, the ISCED level could be derived from those already harmonized variables, and the country-specific variables would only be needed for coding the second and third digits of ISCED and any extension variables. Whichever approach is chosen, it is important that the process is fully documented, and that the documentation (including mapping tables, Wysmulek et al. 2015) is made available to later users of the harmonized data (Granda et al. 2010).
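A mapping table can serve both as the recoding instruction and as its documentation. The Python sketch below uses the standard `csv` module; the country code ‘XX’, the source values and labels, and the assigned GISCED and edustrat codes are hypothetical placeholders. Unmapped source values raise an error rather than silently becoming missing data.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical mapping rows: (country, source value, source label, gisced, edustrat)
MAPPING_ROWS: List[Tuple[str, int, str, str, int]] = [
    ("XX", 1, "No qualification", "0199", 0),
    ("XX", 2, "Compulsory school", "2299", 9),
    ("XX", 3, "General upper secondary", "3344", 3),
]

LOOKUP: Dict[Tuple[str, int], Tuple[str, int]] = {
    (country, value): (gisced, edustrat)
    for country, value, _label, gisced, edustrat in MAPPING_ROWS
}


def harmonize(country: str, value: int) -> Tuple[str, int]:
    """Return (gisced, edustrat); unmapped categories raise instead of going missing."""
    try:
        return LOOKUP[(country, value)]
    except KeyError:
        raise KeyError(f"no mapping for {country} value {value}") from None


def mapping_as_csv() -> str:
    """Export the mapping table for the data documentation."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["country", "source_value", "source_label", "gisced", "edustrat"])
    writer.writerows(MAPPING_ROWS)
    return buffer.getvalue()


print(harmonize("XX", 3))  # ('3344', 3)
```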

Analysts may also wish to harmonize data ex-post that already contain harmonized education variables, but not the same ones (e.g. ESS and PIAAC). This can be achieved by mapping GISCED to the coding schemes used in these datasets. This paper provides the mapping between four other harmonized education coding schemes and GISCED: years of education, the ESS-coding scheme ‘edulvlb’, the European Survey Version of ISCED (ES-ISCED), and CASMIN.

5.1 Linking GISCED with years of education

Like ISCED, GISCED normally assumes source variables representing levels of education. With some pragmatic inference, variables including information on completed years of education can, however, be recoded into ISCED levels and thus GISCED as well. In this case, the correspondences provided in Table 9 should be used, which use the cumulative years of education by which ISCED levels are generally defined.Footnote 22 In such a case, GISCED does not add anything to ISCED main levels apart from the distinction of ‘some primary education’, which forms the second digit of ISCED-A at ISCED level 0 (and is thus included for ISCED-A in Table 9). Then, it may well be better to use derived years of education as the harmonized target variable anyway in order not to throw out any information. For recoding ISCED levels into years of education, following the Luxembourg Income Study (2019), the years shown in bold here should be used.Footnote 23

Table 9 Correspondence between years of education, ISCED-A, and GISCED

5.2 Linking GISCED with ‘edulvlb’, ISCED-A and ES-ISCED

Next, the links between GISCED and the variables more directly related to ISCED are presented together. The harmonized variable ‘edulvlb’ has been used by the ESS since round 5 (2010). This coding scheme was implemented in the ESS while ISCED 2011 was still in development, but the most important features were already known. Therefore, the relationship between ISCED 2011 and the ESS education scheme is very close, even though the exact codes used at the second and third digits differ. However, the ESS education scheme adds information that is not covered in ISCED, along the lines of the differentiations mentioned here for the extension variable ‘edustrat’ (see Sect. 4.3). The mapping of country-specific education categories to ESS education codes is documented in Appendix A1 of the Data Documentation report of each survey round (European Social Survey 2018a, b, c, 2020a, b). These reports may thus also help with the harmonized coding of the extension variable ‘edustrat’ in other datasets. Table 10 shows how ‘edulvlb’, ISCED-A and GISCED correspond, using ‘edustrat’ as a supporting variable. The close relationship between edulvlb and ISCED-A is clearly apparent, resulting in GISCED codes that do not actually need the new digit for categories spanning ISCED levels (see Sect. 4.1). New information is located almost exclusively in the extension variable ‘edustrat’.

From ‘edulvlb’, and thus also GISCED with the extension variable ‘edustrat’, it is also possible to derive the so-called European Survey-Version of ISCED (ES-ISCED). This variable has a much lower number of categories than ‘edulvlb’ and is thus suitable for statistical analysis. It has been shown to produce more valid and comparable results than ISCED main levels (Schneider 2010; Schröder 2014), and is thus an attractive alternative to ISCED main levels.

Table 10 Correspondence between ‘edulvlb’, ISCED-A, ES-ISCED and GISCED with stratification extension

5.3 Linking GISCED with CASMIN

The final comparative education coding scheme to link with GISCED is the CASMIN education scheme, which has its roots in social stratification research. It is not possible to code CASMIN into ISCED directly because some CASMIN categories span ISCED levels, a problem for which GISCED offers a solution (see Sect. 4.1). Mapping CASMIN to GISCED is more difficult than mapping the ESS education scheme because, in contrast to ISCED, CASMIN is constructed on a relative education scale in order to capture social selectivity effects of education (Brauns et al. 2003; König et al. 1988; Brauns and Steinmann 1999). It thereby identifies both class-specific barriers in educational systems and labor market signals of educational qualifications. This means that the mapping of qualifications to CASMIN is something of a ‘moving target’, since selectivity changes over time as educational systems expand. However, the duration of education as well as the distinction between vocational and general education play a considerable role in both CASMIN and GISCED (as well as ISCED), allowing the construction of a rough correspondence.

CASMIN category 1b (‘general elementary education’) is defined as the (non-selective) ‘social minimum’ of education in any given country. To achieve a correspondence, I assume ISCED level 2 to correspond to this ‘social minimum’, even though this may differ across countries and time. Also, CASMIN differentiates general elementary education and intermediate general education, which, historically, maps to primary education (ISCED 1) and lower secondary education (ISCED 2) respectively, but today rather corresponds to differentiations within ISCED 2. This distinction is achieved by combining the extension variable on educational stratification with ISCED. Similarly, when general elementary or intermediate general education is combined with vocational training (categories 1c - Basic vocational qualification or general elementary education and vocational qualification and 2a - Intermediate vocational qualification or intermediate general qualification and vocational qualification), this can only be distinguished using ‘edustrat’, using the same values as with the qualifications without subsequent VET. Additionally, these two CASMIN categories span across ISCED levels. We therefore map both with vocational education at ISCED 2 to 3, resulting in code 2359Footnote 24. Finally, while CASMIN 3a corresponds to short tertiary education in most countries, which can be mapped to ISCED level 5, for Germany, it refers to the lower tier of higher education, classified as ISCED level 6. As a consequence, CASMIN 3a will be mapped to ISCED 5 and 6 lower tier, and CASMIN 3b to ISCED 6 – 8 upper tier. Table 11 shows how CASMIN coded data can be coded into GISCED, including ‘edustrat’. It is apparent that here, the harmonization using GISCED is much less straightforward than for the ESS education coding scheme, but nevertheless, it can be done (unlike with just ISCED).

Table 11 Correspondence between CASMIN and GISCED with stratification extension

6 Empirical illustration

Beyond the fact that GISCED improves the interoperability and re-usability of data and allows studying more specific education-related hypotheses than years of education or broad education levels, another criterion for its usefulness is what can be gained for substantive and policy-oriented research when using GISCED, possibly with extension variables, rather than other ways of harmonizing educational attainment. For illustration, this section presents some empirical analyses, using educational inequality, i.e. the impact of parental education on respondents’ education, as an example. It uses ESS round 5 to 9 data (ESS ERIC 2018a, b, c, 2020a, b).

Respondents’ education is reduced to levels of education, i.e. the ordinal ISCED main levels, which are treated as a metric variable.Footnote 25 Parental education is harmonized in a number of different ways: Model 1 uses three broad education levels derived from ISCED levels, Model 2 uses ISCED 2011 levels, and Model 3 uses the detailed ESS education variable ‘edulvlb’, which corresponds to the combination of GISCED codes and the extension variable edustrat (see Table 10).Footnote 26 In Model 3, parental education is thus conceptualized as both levels and types of education, i.e. education as a categorical variable using GISCED and extension variables. In this way, we can see how different levels and types of education influence educational attainment in the filial generation. This model shows how using a high level of detail in harmonization, including stratification in education, provides more insight into educational inequalities across generations than ISCED levels alone. Model 4 uses ES-ISCED as an alternative aggregation to main ISCED levels, because a detailed coding scheme like the one used in Model 3 is not very practical in empirical analyses. As has been suggested elsewhere (Schröder 2014), ISCED 0 is differentiated from ISCED 1 here because in the parental generation, less than primary education is actually still a rather common category. Table 12 shows results from these four random intercept multilevel linear regression models of respondents’ educational attainment on the education of their most highly educated parent. For each model, R-squared (Snijders and Bosker 1994) is reported for both level 1 (respondents) and level 2 (countries), computed using the Stata package ‘Multilevel tools’ (Möhring and Schmidt 2012). All models use design weights, control for age and sex, and use respondents aged 26 to 64 only.
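For readers unfamiliar with the measure, the Snijders and Bosker (1994) R-squared compares the variance components of a fitted random intercept model with those of the empty (intercept-only) model, at both levels. The Python sketch below implements these formulas; the input values are made up for illustration, and the choice of the representative cluster size n (e.g. the average number of respondents per country) is an assumption that may differ from the choice made in the Stata package.

```python
from typing import Tuple


def snijders_bosker_r2(
    null: Tuple[float, float], model: Tuple[float, float], n: float
) -> Tuple[float, float]:
    """Level-1 and level-2 R-squared (Snijders and Bosker 1994).

    null, model: (sigma2, tau2) = level-1 residual variance and level-2
                 random intercept variance of the empty and the fitted model
    n: representative cluster size (respondents per country)
    """
    s0, t0 = null
    s1, t1 = model
    # level 1: proportional reduction of total variance
    r2_level1 = 1 - (s1 + t1) / (s0 + t0)
    # level 2: proportional reduction of the variance of cluster means
    r2_level2 = 1 - (s1 / n + t1) / (s0 / n + t0)
    return r2_level1, r2_level2


# Made-up variance components, not results from this paper
r1, r2 = snijders_bosker_r2(null=(1.0, 0.25), model=(0.8, 0.15), n=100)
print(round(r1, 3), round(r2, 3))  # 0.24 0.392
```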

The results show that the broader the education measure, the more heterogeneity of effects is hidden from view. This is particularly visible when comparing Model 1 with Model 2, and Model 2 with Model 3. Looking at Model 1, compared to respondents whose parents have a low level of education (ISCED 0 – 2), respondents with highly educated parents (ISCED 5 – 8) obtain two more levels of education, and those with medium educated parents one more level of education. Model 2 reveals that each additional level of parental education leads to approximately half an extra level of education for the children, with two exceptions: (1) respondents whose parents have not even completed primary education (ISCED 1) are severely disadvantaged and attain three quarters of an education level less than those whose parents have at least completed ISCED 1 (admittedly, the number of affected respondents is low, but in terms of social policy, this is a highly relevant result). (2) Having parents educated at ISCED level 5 only gives a quarter of an extra level compared to parents educated at ISCED level 4, and ISCED 6 another quarter of a level compared to ISCED 5. The differentiations introduced by ISCED 2011 compared to ISCED 1997 (where ISCED levels 5 – 7 were all in one level) appear to have been fruitful, with ISCED 5 having lower effects than ISCED 6, ISCED 6 lower effects than ISCED 7, and ISCED 7 lower effects than ISCED 8 (all effects are statistically significantly different from each other). Having one parent with a PhD gives an advantage of almost three education levels compared to having parents with primary education at most. These results also show that even though ISCED 4 is rare, it does lead to effects that are significantly higher than those of ISCED 3 (the level it is commonly aggregated with).

Then looking at the highly detailed results of Model 3, we find a number of interesting differentiations: Firstly, children of parents with vocational lower secondary education (ISCED 2), which has a rather bad reputation, achieve a higher level of education than children of parents with just general lower secondary education, if the vocational program gives access to upper secondary education (GISCED 2254). If it does not (GISCED 2253), there is no statistically significant disadvantage though. Within general lower secondary education, parents who completed the higher or a single track (edustrat code 3) more positively influence their children’s education than those who completed a lower (pre-vocational) track (edustrat code 2), but the difference between these effects is only marginally statistically significant (p=0.0684).

Table 12 Multilevel models for validation using various education codings (N L1=145350; N L2=35)

Secondly, within upper secondary education (ISCED 3), there are also marked differences between the effects of generally versus vocationally educated parents, here pointing to advantages of children of generally educated parents. The effects of different kinds of upper secondary general education are not statistically significantly different from each other. However, parents with vocational upper secondary education that gives access to academic higher education (GISCED 3354, edustrat code 3) give their offspring the same advantage as those with general upper secondary education (GISCED 3344). In line with previous research (Schneider 2010), vocational upper secondary education giving access to all kinds of higher education is not a disadvantage compared to general education. If the access is limited to only the lower tier or vocational tertiary education though, there is a statistically significant disadvantage compared to those with full access. This is likely due to the different content of vocational upper secondary education programs giving access to university studies, and resulting differences in skills of its graduates. Here we find that the extension variable ‘edustrat’ indeed carries important information that is not represented by ISCED alone.

Thirdly, while ISCED level 4 is rather uncommon, we still find differences in effects here, notably that parents who graduated from ISCED 4 programs that do not give access to tertiary education do not convey any advantage to their children beyond the advantage they would have provided by completing upper secondary education with access to tertiary education. Still, since this type of ISCED 4 education often follows ISCED 3 programs that do not give access to tertiary education, parents with such ISCED 4 qualifications still give their children some advantage compared to parents who completed only those ISCED 3 programs.

Finally, in tertiary education, while distinctions by orientation are irrelevant at ISCED level 5, i.e. for qualifications below the Bachelor’s degree, types of education as identified by the extension variable ‘edustrat’ are relevant at the levels that belong to higher education in the strict sense (ISCED 6–8). At both ISCED levels 6 and 7, parents who completed lower tier (i.e. college or polytechnic) programs (edustrat code 4) convey less educational advantage to their offspring than parents who completed higher or single tier (i.e. traditional university) programs (edustrat code 5). These results are very much in line with expectations from educational stratification research; they again point to the validity of the extension variable ‘edustrat’ and suggest that it is a useful extension of ISCED.
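One way an analyst might carry these findings into an analysis variable is to cross ISCED main levels with ‘edustrat’ only where the distinction proved informative. The sketch below is a hypothetical illustration: the function name, the label format and the set of split levels are assumptions loosely following the Model 3 results, not part of the GISCED specification.

```python
from typing import Optional

# Hypothetical analysis choice: ISCED main levels at which the 'edustrat'
# extension variable is kept as a separate distinction. The levels follow
# the discussion of Model 3 but are an assumption, not a fixed rule.
SPLIT_LEVELS = frozenset({2, 3, 6, 7})


def level_by_edustrat(isced_main: int, edustrat: Optional[int]) -> str:
    """Combine an ISCED main level and an 'edustrat' code into one category.

    Levels outside SPLIT_LEVELS, and records where 'edustrat' is missing,
    fall back to the main level alone.
    """
    if isced_main in SPLIT_LEVELS and edustrat is not None:
        return f"ISCED{isced_main}/edustrat{edustrat}"
    return f"ISCED{isced_main}"


# Lower-tier (4) vs higher/single-tier (5) programs at ISCED 6 stay distinct,
# while ISCED 5 is kept as a single category
print(level_by_edustrat(6, 4), level_by_edustrat(6, 5), level_by_edustrat(5, 4))
# prints: ISCED6/edustrat4 ISCED6/edustrat5 ISCED5
```

Falling back to the main level when ‘edustrat’ is missing keeps records from sources without track information in the analysis, at the cost of mixing them into a coarser category.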

Model 4 shows results when the detailed education variable is aggregated in an alternative way that does not use ISCED main levels. Using ES-ISCED reveals how different types of parental upper secondary education differ in their effects on their children’s educational attainment. The explained variance lies between that obtained with ISCED main levels and that obtained with edulvlb, even though ES-ISCED has one category fewer than ISCED has main levels. It is thus a more parsimonious alternative to ISCED levels, although it cannot be derived from data coded only in ISCED levels.
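This section does not specify how explained variance is computed for the multilevel models. One common measure, shown here only as an assumption, is the proportional reduction of a variance component relative to an empty (intercept-only) model, computed separately per level. The function name and all variance values below are hypothetical and are not taken from Table 12.

```python
def variance_explained(var_null: float, var_model: float) -> float:
    """Proportional reduction of a variance component relative to a null model.

    R^2 = 1 - var_model / var_null. In a two-level model this can be
    computed separately for the level-1 residual variance and for the
    level-2 (country) intercept variance.
    """
    if var_null <= 0:
        raise ValueError("null-model variance must be positive")
    return 1.0 - var_model / var_null


# Hypothetical level-1 residual variances, for illustration only
null_var = 1.00
codings = {"coarse coding": 0.80, "intermediate coding": 0.78, "detailed coding": 0.76}
for name, var in codings.items():
    print(f"{name}: R2 = {variance_explained(null_var, var):.2f}")
```

Unlike the OLS R², this quantity can turn negative in multilevel models when adding predictors increases a variance component, so it is best read as a comparative rather than absolute measure across codings of the same outcome.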

7 Discussion and outlook

The combination of different datasets is increasingly popular. Survey variables on educational attainment, however, are often not measured and coded coherently: different national and cross-national surveys use different measurement instruments and coding schemes. To support researchers, data harmonization projects and data producers in harmonizing education, this paper described a generalized coding framework for the ex-post harmonization of educational attainment variables in surveys, called GISCED. It is inspired by the education coding scheme ‘edulvlb’ used in the ESS, but adapts it for ex-post harmonization and links it more closely to ISCED 2011. By extending the official three-digit ISCED code to four digits, it should be possible to assign almost every education category in any dataset an internationally intelligible code, provided the category is documented well enough to understand its relationship with the ISCED main levels. To retain information in the source data that ISCED does not take into account, it is suggested to code separate ‘extension’ variables that record this information in a standard format, e.g. the secondary education track, the type of higher education, or the type of vocational education and training.
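To make the idea of a detailed code plus separate extension variables concrete, the record type below is a minimal sketch. It assumes, for illustration only, that a GISCED code is stored as a four-character digit string whose first digit is the ISCED 2011 main level, which matches the codes 2254, 2253, 3354 and 3344 used in the empirical illustration; the class and field layout are hypothetical, and only the variable names ‘edustrat’ and ‘vettype’ come from the text.

```python
from dataclasses import dataclass
from typing import Optional

DIGITS = set("0123456789")


@dataclass(frozen=True)
class HarmonizedEducation:
    """Sketch of one respondent's harmonized education record.

    Assumptions (illustrative, not an official specification):
    - 'gisced' is stored as a four-character string of digits, e.g. "3354";
    - its first digit is the ISCED 2011 main level, as in the codes
      2254, 2253, 3354 and 3344 discussed in the text.
    Extension variables are optional because many sources lack the
    information they record; their code values here are placeholders.
    """
    gisced: str
    edustrat: Optional[int] = None  # track / type of higher education
    vettype: Optional[int] = None   # type of vocational education and training

    def __post_init__(self) -> None:
        if len(self.gisced) != 4 or not set(self.gisced) <= DIGITS:
            raise ValueError(f"expected a four-digit code, got {self.gisced!r}")
        if self.isced_main_level > 8:
            raise ValueError("first digit must be an ISCED 2011 main level (0-8)")

    @property
    def isced_main_level(self) -> int:
        return int(self.gisced[0])


rec = HarmonizedEducation("3354", edustrat=3)
print(rec.isced_main_level)  # prints: 3
```

Storing the code as a string rather than an integer preserves leading zeros for ISCED level 0 codes, and keeping extension variables as separate optional fields mirrors the recommendation not to fold source-specific detail into the main code.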

Compared to other education coding schemes, the suggested framework, if coded at the detailed four-digit level, yields data with maximum potential for reuse and interoperability. GISCED can be converted into ISCED 2011 and ISCED 1997 (both detailed codes and main levels) and into derived years of education. Data coded at this level of detail can be further processed to satisfy many different research needs, and they allow testing innovative hypotheses that cannot be tested with measures based on years or main levels of education alone, especially regarding differential effects of different types of education within main levels. The detailed codes can also help to construct aggregated education variables that better model specific education effects, as the empirical illustration above showed when using ES-ISCED, which distinguishes university-preparatory from other programmes at the upper secondary level. Depending on the specific research question, the detailed variable can be simplified in many other ways. The four-digit format makes the framework look more complex than it actually is, since only a few codes are used at the third and fourth digits, and not all combinations are possible.
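A simplification of the detailed variable can be expressed as a lookup table that names the codes an analyst wants to keep apart and collapses everything else to main levels. The sketch below is hypothetical: it assumes that the first digit of a GISCED code is the ISCED 2011 main level, and the category labels are illustrative rather than an official aggregation such as ES-ISCED.

```python
from typing import Dict


def main_level(gisced: str) -> int:
    # Assumption: the first digit of a GISCED code is the ISCED 2011 main level.
    return int(gisced[0])


def recode(gisced: str, keep_separate: Dict[str, str]) -> str:
    """Collapse a detailed four-digit code into an analysis category.

    Codes listed in 'keep_separate' get their own (labelled) category;
    every other code falls back to its ISCED main level.
    """
    return keep_separate.get(gisced, f"ISCED {main_level(gisced)}")


# Hypothetical scheme that separates the two upper secondary codes
# contrasted in the empirical illustration and otherwise uses main levels
scheme = {
    "3344": "upper secondary, general",
    "3354": "upper secondary, vocational with access to academic higher education",
}
for code in ["2253", "2254", "3344", "3354"]:
    print(code, "->", recode(code, scheme))
```

Keeping the scheme as data rather than as branching logic makes it easy to document, share and swap aggregations for different research questions while the harmonized detailed variable stays unchanged.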

Granted, these fine-grained distinctions will not be relevant for all research using education as an independent variable, but the illustration above shows that they do carry meaning. Depending on the purpose of the data harmonization, more or less effort may be justified in producing such additional harmonized information. For many ex-post harmonization projects, the variant ‘GISCED2’, which only improves the harmonization into main levels of education without considering program orientation or sub-levels, will be sufficient. For research on social and educational inequalities, further details will generate richer research opportunities.

The empirical illustration is of course rather limited, and the results may not generalize to other relationships. Further research could therefore investigate for which kinds of research a detailed harmonization is most fruitful, and for which it is not worth the effort. One could, for example, additionally examine how respondents’ track placement is influenced by parental education (where parental tracks may be even more important), or how respondents’ occupational attainment is influenced by their education (where the type of education, including the type of VET as measured by the extension variable ‘vettype’, may be highly relevant), but this goes beyond the scope of this paper.

The illustration is also based on ESS data that are already available with detailed education codes obtained via ex-ante harmonization. As a next step, it would be highly relevant to apply the GISCED scheme in a real ex-post harmonization project to further study its usability and empirical value. The generalized coding framework may be most useful for international ex-post harmonization projects. However, national projects, especially longitudinal surveys, may find it equally useful for structuring their harmonization over time, given that national education classifications suitable for survey use do not exist in all countries. For data producers and harmonization projects that aim to produce data suitable for a range of research questions, including labor market and social stratification research, the framework could be a helpful tool for standardizing the harmonization of education categories across diverse sources that were not designed with cross-national or over-time comparability in mind.

If the source variables already differ in scope, e.g. one dataset measures vocational education only if it is provided as full-time schooling (like the ISSP), while another measures all vocational education irrespective of whether the school-based element is full-time or part-time (like many surveys using ISCED), some degree of non-comparability remains that a common code frame cannot overcome, even if it otherwise harmonizes the categories of the education variables in the respective surveys. GISCED cannot remedy this; it can only recommend that data users apply harmonization controls (Slomczynski and Tomescu-Dubrow 2018), and that data producers adhere to the definitions and guidelines for measuring educational attainment in official statistics, such as OECD, Eurostat (2014).
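Harmonization controls, as we understand the concept of Slomczynski and Tomescu-Dubrow (2018), are variables describing properties of the source data that analysts can include in their models. The sketch below attaches one such source-level flag, recording whether a source's education question also covers VET with a part-time school-based element. The source names, the flag name and the metadata values are hypothetical placeholders.

```python
from typing import Dict, List

# Hypothetical source metadata: does each source's education question also
# count vocational education whose school-based part is part-time?
# Source names are placeholders.
MEASURES_PARTTIME_VET: Dict[str, bool] = {
    "survey_A": False,  # only full-time school-based VET recorded
    "survey_B": True,   # all VET, full-time or part-time
}


def add_harmonization_control(records: List[dict]) -> List[dict]:
    """Attach a source-level harmonization control to each record.

    The flag does not make the sources comparable; it lets analysts
    model (or test for) the difference in measurement scope.
    """
    out = []
    for rec in records:
        flagged = dict(rec)  # copy so the input records stay unchanged
        flagged["hc_parttime_vet"] = int(MEASURES_PARTTIME_VET[rec["source"]])
        out.append(flagged)
    return out


data = [{"source": "survey_A", "gisced": "3354"},
        {"source": "survey_B", "gisced": "3354"}]
print(add_harmonization_control(data))
```

A lookup on unknown source names raises a KeyError by design, so that a source without documented measurement scope is noticed rather than silently assigned a default.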

While GISCED allows almost every education category appearing in surveys that measure levels of education to be harmonized with very little loss of information, it will still result in heterogeneous data if the variables to be harmonized are measured at very different levels of detail. Therefore, the general recommendation that surveys should not aggregate education categories too much already at the data collection stage (Schneider 2008) remains highly relevant: information that was not gathered cannot be coded and harmonized, whether using ISCED, GISCED, derived years of education or some other form of scaling.

It is then up to the analyst to make the most of the information provided by recoding it into simpler categories that are adequate for the research question at hand and can be used in statistical analyses. By looking at the distribution of all codes in the harmonized dataset, analysts will be better able to aggregate education variables in a substantively meaningful way than if they opted for broad ISCED levels from the outset and accepted substantial information loss. Further research should examine how reliably data can be harmonized using GISCED and the proposed extension variables, and investigate more deeply the substantive research potential and cross-national comparability of the resulting data compared to other harmonization schemes.
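As a starting point for such data-driven aggregation, the sketch below inspects the distribution of detailed codes and keeps frequent codes as their own categories while merging rare ones into their main level. The frequency threshold, the function name and the example data are hypothetical, and it assumes that the first digit of a code is the ISCED 2011 main level; in practice, substantive considerations should decide which codes to keep apart, with frequencies as one input.

```python
from collections import Counter
from typing import Dict, Iterable


def aggregate_by_frequency(codes: Iterable[str], min_count: int) -> Dict[str, str]:
    """Map each observed detailed code to an analysis category.

    Codes observed at least 'min_count' times keep their own category;
    rarer codes are merged into their ISCED main level, assumed here to be
    the first digit of the code.
    """
    counts = Counter(codes)
    return {code: (code if n >= min_count else f"ISCED {code[0]}")
            for code, n in counts.items()}


# Hypothetical harmonized codes; the threshold is an analyst's choice
codes = ["3344"] * 50 + ["3354"] * 40 + ["2254"] * 30 + ["2253"] * 3
print(Counter(codes).most_common())  # inspect the full distribution first
mapping = aggregate_by_frequency(codes, min_count=10)
print(mapping)
```

Note that a merged category such as "ISCED 2" then contains only the rare codes of that level, not all of them; whether that is meaningful, or whether the rare code should instead join a substantively similar frequent code, is exactly the judgement the distribution is meant to inform.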