1 Introduction

This article documents the number of target persons participating in the panel surveys of the National Educational Panel Study (NEPS), as well as the number of respondents who temporarily drop out and of those who leave the panel (attrition). We introduce discrete time event history models as an appropriate means to study panel attrition and selectivity in NEPS. For this purpose, we consider all six NEPS starting cohorts and their corresponding Scientific Use Files (SUFs) published by February 1st, 2018. NEPS is a nationwide study gathering information about the educational trajectories of people residing in Germany. To cover the complete life span with respect to significant educational transitions, it surveys target persons from six mutually exclusive starting cohorts:

Starting Cohort 1 (SC1):

children born between February and July 2012,

Starting Cohort 2 (SC2):

children in 2010 whose enrollment in school was expected to be in school year 2012/13,

Starting Cohort 3 (SC3):

students in grade 5 in regular and special schools in school year 2010/11,

Starting Cohort 4 (SC4):

students in grade 9 in regular and special schools in school year 2010/11,

Starting Cohort 5 (SC5):

freshmen in 2010/11 at universities and universities of applied sciences,

Starting Cohort 6 (SC6):

adults born between 1944 and 1986 living in Germany.

Detailed information on the objectives, the composition, and the contents of NEPS is given in Blossfeld et al. (2011). The population and the sampling design of all starting cohorts are described in detail in Würbach et al. (2016) for the SC1, Steinhauer et al. (2016) for the SC2, Steinhauer and Zinn (2016a) for the SC3, Steinhauer and Zinn (2016b) for the SC4, Zinn et al. (2017) for the SC5, and Hammon et al. (2016) for the SC6. Up to now, the following SUFs have been released (see https://www.neps-data.de/):

SC1:

Waves 1 to 4 from 2012 to 2015 (SUF version 4.0.0),

SC2:

Waves 1 to 6 from 2011 to 2015 (SUF version 6.0.0),

SC3:

Waves 1 to 7 from 2010 to 2015 (SUF version 7.0.0),

SC4:

Waves 1 to 9 from 2010 to 2015 (SUF version 9.1.0),

SC5:

Waves 1 to 9 from 2010 to 2015 (SUF version 9.0.0),

SC6:

Waves 1 to 7 from 2009 to 2016 (SUF version 8.0.0).

Taken together, the SUFs comprise a total of 72 studies. Table 1 gives an overview of all of these studies, including their (NEPS-internal) study numbers, study time, survey periods, panel waves, and survey mode. In each study, survey questionnaires have been administered in one of the following survey modes:

  • CATI: computer assisted telephone interview,

  • CAPI: computer assisted personal interview,

  • CAWI: computer assisted web interview,

  • PAPI: paper and pencil interview.

Some studies allowed respondents to choose between modes, while other studies assigned them randomly. In a few studies, special groups of respondents were assigned to a particular survey mode to increase the likelihood of participation. For example, SC6 panel members who could not be interviewed by telephone (via CATI) were automatically assigned to the CAPI mode.

Generally, target persons are surveyed in one of two contexts: either in groups, such as test groups in schools or universities, or individually, for example, when interviewed by telephone or in person at home. Comprehensive details on this and on the NEPS studies in general are given on the NEPS data web page.Footnote 1

Besides questionnaires, NEPS also administers competence tests to gather information on the development of knowledge, skills, and competencies relevant for educational processes and decisions. There are domain-general tests, such as tests of cognitive functioning, and domain-specific tests, such as tests of competencies in mathematics and reading. In Table 1, waves with tests are marked by a star. Note that target persons at younger ages, i.e., in SC1 and in SC2 from 2011 to 2013, are tested, but the questionnaires are answered by their parents. At later ages (i.e., in SC2, SC3, and SC4), both parents and target persons are interviewed.

Table 1 Attribution of studies to starting cohorts and panel waves

The remainder of this article is structured as follows: first, we detail the number of participants and temporary as well as final dropouts along all of the panel waves and starting cohorts. Second, we present the results of the selectivity analyses in which we study how attrition affects the composition of the NEPS samples. We conclude with some recommendations for dealing with the detected selection bias in statistical analyses.

2 Participants, Dropouts, and Attrition

NEPS surveys target persons together with relevant context persons, such as parents, educators, and teachers, where applicable, that is, at younger ages in SC1, SC2, SC3, and SC4. This article, however, focuses on the target persons only. Information on context persons is provided elsewhere, for example, on the NEPS data web page. In the following, a target person is considered to be a participant when that person has provided some information on him- or herself during a study.Footnote 2

Initially, a gross sample was established for each starting cohort, comprising all units drawn to be part of the panel survey. In SC1, SC5, and SC6, the whole gross sample was administered in Wave 1, and each of its members was asked for panel consent during the first wave. All respondents with positive consent form the panel cohort of the corresponding starting cohort at Wave 1. In contrast, in SC2, SC3, and SC4, panel consent was obtained before the first wave; thus, the sample administered in Wave 1 already constituted the panel sample. In other words, the people asked to participate in the first waves constitute different samples: the gross sample in SC1, SC5, and SC6, and the panel sample at Wave 1 in SC2, SC3, and SC4. At the start of a specific wave, the panel sample of each starting cohort consists of all individuals who initially gave their panel consent and who have neither refused further participation nor been defined as final dropouts for one of the following two reasons: (i) continuous non-participation over a period of two yearsFootnote 3 or (ii) a response code in a previous study defined to be an attrition event. These response codes are:

  • respondent refuses participation in general/permanent deletion of address/withdraw panel consent (for target person),

  • death of target person,

  • target person already surveyed,

  • respondent refuses new address (for target person),

  • target person cannot be surveyed/permanently sick or disabled,

  • communication impossible/respondent does not speak enough German/no communication possible in one of the languages offered,

  • respondent refuses participation in general/permanent deletion of all of the data/withdraw panel consent (for target person).

Sometimes not all members of the panel sample are administered in each panel wave. There are two main reasons for this. First, questionnaires or tests cannot be administered because of missing contact information. This occurs mainly in highly mobile populations, such as students graduating from school and leaving home for further training or studies. Second, by design only specific subgroups are considered in some waves, for example, only students of a specific field. Persons who were administered in a study but did not participate and who are not final dropouts are regarded as temporary dropouts. Note that final dropouts can occur within and between studies: within waves, attrition results from a corresponding response code, and between waves, attrition arises because of active refusal or continuous non-participation over a period of two years (see Footnote 3).
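The status rules above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not the NEPS processing code: the record layout (`WaveRecord`), the status labels, and the short code names in `ATTRITION_CODES` are our own stand-ins for the response codes listed above, and the two-year rule is approximated, as an assumption, by flagging an administered non-participant as a final dropout once at least 730 days have passed since the last participation (or panel entry).

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical short labels for the attrition response codes listed above.
ATTRITION_CODES = {
    "refusal_or_consent_withdrawn",
    "death",
    "already_surveyed",
    "refuses_new_address",
    "cannot_be_surveyed",
    "communication_impossible",
    "data_deletion_requested",
}

# Assumption: "two years" of continuous non-participation taken as 730 days.
TWO_YEARS = timedelta(days=730)


@dataclass
class WaveRecord:
    """One panel member in one wave (hypothetical layout)."""
    field_date: date         # end of the wave's field period
    administered: bool       # was the person administered in this wave?
    participated: bool       # did the person provide information?
    response_code: str = ""  # final response code of the study, if any


def classify(history, panel_entry):
    """Return one status per wave for a single panel member.

    Statuses: 'participant', 'temporary', 'final', or 'not administered'.
    Once a person is a final dropout, all later waves are labelled 'final'.
    """
    statuses = []
    last_participation = panel_entry
    is_final = False
    for rec in history:
        if is_final:
            statuses.append("final")
        elif rec.response_code in ATTRITION_CODES:
            # within-study attrition: response code defined as an attrition event
            is_final = True
            statuses.append("final")
        elif rec.participated:
            last_participation = rec.field_date
            statuses.append("participant")
        elif not rec.administered:
            # not fielded (e.g. missing contact data or excluded by design)
            statuses.append("not administered")
        elif rec.field_date - last_participation >= TWO_YEARS:
            # assumed two-year rule for continuous non-participation
            is_final = True
            statuses.append("final")
        else:
            statuses.append("temporary")
    return statuses
```

Waves that were not administered are kept separate here and do not trigger the two-year rule; how such waves are counted in the NEPS tables (e.g., for subsamples that are not fielded by design) is outside the scope of this sketch.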

Subsequently, the distinct panel samples of NEPS are described, broken down by starting cohort, panel wave, administered sample, number of participants and temporary dropouts, as well as final dropouts within and between waves. In SC2–SC6, sampling particularities allow for the derivation of design-specific subsamples, which are considered in our presentation. These are:

Table 2

The figures of SC1 and SC2 are given in Tables 2 and 3, Tables 4 and 5 summarize the numbers of SC3 and SC4, and Tables 6 and 7 present the numbers of SC5 and SC6. Participation rates are calculated as the ratio of the number of participants to the size of the administered sample. Figs. 1 to 6 illustrate the panel progress of all starting cohorts graphically.
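As a small illustration of this definition, the following Python sketch computes participation rates for some of the (participants, administered) pairs reported in the subsections below; the function name is ours.

```python
def participation_rate(participants: int, administered: int) -> float:
    """Participation rate in percent: participants divided by administered cases."""
    if administered <= 0:
        raise ValueError("the administered sample must be positive")
    return 100.0 * participants / administered


# (participants, administered) pairs as reported for SC2 and SC5
EXAMPLES = {
    "SC2, Wave 1": (2949, 3007),
    "SC2, Wave 3 (sample with panel consent)": (6733, 6917),
    "SC5, Wave 1": (17910, 31082),
}

for label, (n_part, n_admin) in EXAMPLES.items():
    print(f"{label}: {participation_rate(n_part, n_admin):.1f}%")
```

Rounded to one decimal, these reproduce the rates of 98.1%, 97.3%, and 57.6% given in the text.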

2.1 Starting Cohort 1

The NEPS SC1 (Newborns) started with a gross sample of 8483 persons (cf. Table 2). In Wave 1, 3481 interviews could be realized, corresponding to a participation rate of 41.0%. The panel cohort was reduced to 3431 (40.4% of the gross sample) since 42 participants gave no panel consent in Wave 1 and 8 participants withdrew their panel consent before Wave 2. The numbers of Wave 2 are reported separately for the CATI and the CAPI mode. In the parent interview (CATI), we recorded 2849 respondents; the corresponding participation rate is 83.0%. Additionally, direct measurements and another parent interview were applied to a random subsample of the SC1 panel cohort in Wave 2. In total, 1893 persons were asked to participate, and 1510 cases could finally be realized, corresponding to a participation rate of 79.8%.
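The SC1 figures above follow from simple arithmetic: the panel cohort is the number of Wave 1 respondents minus those without panel consent and those who withdrew it, and each rate is taken relative to its own base (gross sample, panel cohort, or CAPI subsample).

```latex
\begin{aligned}
3481 - 42 - 8 &= 3431 && \text{(panel cohort)},\\
3431 / 8483 &\approx 0.404 && \text{(40.4\% of the gross sample)},\\
2849 / 3431 &\approx 0.830 && \text{(Wave 2, CATI)},\\
1510 / 1893 &\approx 0.798 && \text{(Wave 2, CAPI subsample)}.
\end{aligned}
```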

Fig. 1 Size of Panel Cohort SC1 along Waves

Among the 2616 realized interviews in Wave 3, 2609 are valid (participation rate 79.5%). Seven interviews are considered invalid due to technical problems during the survey. In Wave 4, 2480 interviews were realized, but two of them had been conducted by interviewers without approval for execution. The data from these two interviews were deemed unusable, and the two cases were therefore treated as temporary dropouts. The corresponding participation rate is 78.8%. Due to continuous non-participation over a period of two years, 143 of the 541 temporary dropouts were converted to final dropouts between Waves 4 and 5. Fig. 1 displays these numbers, where the height of each bar gives the initial number of targets with valid panel consent.

Table 2 Panel Progress SC1

The number of temporary dropouts remains stable across the panel waves, whereas the number of final dropouts naturally accumulates.

2.2 Starting Cohort 2

The NEPS SC2 (Kindergarten) started in 2010 with a panel cohort comprising 3007 Kindergarten children whose school enrollment was expected in the school year 2012/13 (cf. Table 3). In the first wave, 2949 Kindergarten children participated together with a parent. The corresponding participation rate is 98.1%. In Wave 2, there were 2727 participants, yielding the same participation rate as in Wave 1.

Fig. 2 Size of Panel Cohort SC2 along Waves. a KIGA (\(n=3007\)), b K1_AUG (\(n=6341\))

In Wave 3, in the school year 2012/13, an augmentation sample of Grade 1 students (K1_AUG) was drawn and asked to participate. This augmentation sample is linked to the sample of Kindergarten children through the elementary schools to which the children transferred. The augmentation gross sample contains 19205 students. In total, 6917 students provided panel consent and are followed up through their time in elementary school and beyond. A small proportion of these students are the Kindergarten children who had already been surveyed in Waves 1 and 2 (576 students in KIGA_PANEL). Among the sample with panel consent, 6733 participated in the survey and testing of Wave 3, corresponding to a participation rate of 97.3%. Kindergarten children who did not transfer to a NEPS schoolFootnote 4 are assigned to the field of individual retracking (KIGA_IND). By design, they are not interviewed and tested until Wave 6, when they are supposed to be in Grade 4. Accordingly, from Wave 3 up to Wave 5 they are defined as temporary dropouts. Among the 6340 realized interviews in Wave 4 (participation rate 96.1%), 5801 cases belong to K1_AUG and 539 cases to KIGA_PANEL. In Wave 5, 5799 interviews were realized: 5296 cases in the K1_AUG subsample and 503 in the KIGA_PANEL subsample.

Table 3 Panel Progress SC2

The overall participation rate in Wave 5 is 94.1%. All students were asked to participate in Wave 6, including those from the subsample KIGA_IND. In sum, 6943 students were tested and surveyed, yielding a participation rate of 81.8%. Among these, 5462 students belong to the K1_AUG subsample, 483 to the KIGA_PANEL subsample, and 998 to the KIGA_IND subsample. The number of final dropouts in Wave 6 is far higher for KIGA_IND than for the other two subsamples. This might be due to the fact that this particular subsample had not been surveyed for three years. Moreover, the KIGA_IND subsample was tested and surveyed individually in Wave 6, whereas students of K1_AUG and KIGA_PANEL were tested and surveyed in their institutional context. We see a considerable decrease in the panel cohort size in Wave 7, when the school context was left and all students, together with their parents, were tested and surveyed individually. In each subsample, the increase in final dropouts between Waves 6 and 7 is very high. This is mainly attributable to the accumulation of all parental withdrawals from the previous studies. Until Wave 6, the affected target persons could be surveyed and tested in spite of parental withdrawal. In Wave 7, however, all students transitioned to the individual field, i.e., questionnaires and tests were completed at home. Hence, in the case of an existing parental withdrawal, surveying had to be abandoned. As a result, 526 target persons had to be excluded from the panel sample even though they were still willing to participate. Fig. 2 visualizes these numbers, where the height of each bar gives the initial number of targets with valid panel consent.
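When working with these subsample breakdowns, a quick consistency check helps to catch transcription errors. The following Python sketch sums the SC2 subsample counts quoted in this subsection and compares them with the reported totals; the dictionary layout and the function name are hypothetical.

```python
# SC2 counts of realized interviews as quoted in the text (hypothetical layout).
REPORTED = {
    "Wave 4": {"total": 6340, "K1_AUG": 5801, "KIGA_PANEL": 539},
    "Wave 5": {"total": 5799, "K1_AUG": 5296, "KIGA_PANEL": 503},
    "Wave 6": {"total": 6943, "K1_AUG": 5462, "KIGA_PANEL": 483, "KIGA_IND": 998},
}


def mismatched_totals(table):
    """Return (wave, sum of subsamples, reported total) for every inconsistent wave."""
    problems = []
    for wave, counts in table.items():
        parts = sum(n for key, n in counts.items() if key != "total")
        if parts != counts["total"]:
            problems.append((wave, parts, counts["total"]))
    return problems


# K1_AUG panel cohort: Grade 1 students with panel consent minus the KIGA_PANEL children.
K1_AUG_COHORT = 6917 - 576

print(mismatched_totals(REPORTED), K1_AUG_COHORT)
```

All three waves add up, and the derived K1_AUG panel cohort of 6341 matches the subsample size given for Fig. 2.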

2.3 Starting Cohort 3

The SC3 panel cohort (Grade 5) comprises the two subsamples G5 and G7_AUG. The G5 subsample was established in 2010. Its gross sample consisted of 11563 Grade 5 students. Two years later, in 2012, the SC3 sample was enriched by the G7_AUG augmentation sample. For this purpose, 3944 Grade 7 students were drawn and asked to participate in NEPS.

Fig. 3 Size of Panel Cohort SC3 along Waves. a G5 (\(n=6112\)), b G7_AUG (\(n=2205\))

Table 4 Panel Progress SC3

In sum, 6112 students of the G5 gross sample (i.e., 52.9%) and 2205 students of the G7_AUG gross sample (i.e., 55.9%) provided valid panel consent. Table 4 details the SC3 panel progress separately for the two subsamples G5 and G7_AUG. Its third column gives the panel cohort size at the beginning of each wave. Columns four and five show the number of students who had been administered an interview and of those who had not. Columns six to nine then give the numbers of participants, temporary dropouts, and final dropouts at the end of each wave. The last column contains the number of students actively refusing further participation in the SC3 panel study. Essentially the same information is provided by Fig. 3, where the height of each bar gives the initial number of students with valid panel consent. Both Table 4 and Fig. 3 show a large number of final dropouts after Wave 4. This is because 578 students in special-needs schools were dismissed from the panel.

2.4 Starting Cohort 4

The gross sample of the SC4 (Grade 9) consists of 26868 students. Of these, 16425 students (61.1%) provided valid panel consent. Table 5 details the SC4 panel progress separately for its two subsamples ACA (academic track) and VOC (vocational track). The table provides the panel cohort size at the beginning of each wave together with the number of students who had been administered an interview and of those who had not. For students who had been administered an interview, the following columns give the corresponding status (participant, temporary dropout, or final dropout) at the end of each wave. The last column gives the number of students actively refusing further participation in the panel study.

Fig. 4 displays the numbers of Table 5 graphically. Note that the height of each bar gives the initial number of students with valid panel consent. In Waves 1 and 2, all students are in ACA. From Wave 3 to Wave 8, the students in the academic track (ACA) are located at the top of the graphic, whereas the students in the vocational track (VOC) are shown at the bottom. Over time, more and more students leave school for vocational education.

Fig. 4 Size of Panel Cohort SC4 along Waves

Table 5 Panel Progress SC4

Hence, the number of students in the top part (ACA) declines, whereas the number of students in the bottom part (VOC) increases. In Wave 9, all students have left school, and thus a distinction between ACA and VOC is no longer necessary. Table 5 and Fig. 4 reveal some notable figures. First, in Wave 4 and Wave 6 the majority of students had not been administered. This is because these two waves targeted only students in the vocational track who had participated in the previous wave (Wave 3 and Wave 5, respectively), in order to keep in touch. Second, in Wave 8 a large number of students had not been administered. These are mainly students from special-needs schools, for whom further financing was unclear. However, starting from Wave 9, financing was secured again, and the majority of these students participated again. The large number of final dropouts after Waves 8 and 9 is caused by converting temporary dropouts to final ones because of continuous non-participation over a period of two years. Due to this, 1396 students were defined as final dropouts and removed from the panel sample after Wave 8, and another 1246 students after Wave 9.

2.5 Starting Cohort 5

For SC5 (First-Year Students), in total 31082 freshmen with valid contact information could be recruited at private and public universities and universities of applied sciences. These constitute the SC5 gross sample. Of these, 17910 persons took part in Wave 1 and gave their panel consent. This corresponds to 57.6% of the administered cases and constitutes the panel cohort of SC5. The remaining cases are ascribed to the final dropouts of Wave 1. Table 6 details the SC5 panel progress separately for its four subsamples TEA (freshmen studying for a teaching degree), UNI (freshmen at universities, excluding TEA), AUN (freshmen at universities of applied sciences, excluding TEA), and PR (freshmen at private universities). Only one third (33.2%) of the panel cohort took part in the Wave 1 competence tests. In Waves 2–9, participation rates fluctuate between 58.8% and 73.5%. We find that the participation rates in the CAWIs (Waves 4, 6, and 8) are generally lower than those in the CATIs conducted earlier in the same year (Waves 3, 5, and 7).

Fig. 5 Size of Panel Cohort SC5 along Waves

Table 6 Panel Progress SC5

In Wave 7, the oversampling part of the TEA subsample was not administered (i.e., 15.9% of the Wave 7 panel sample) because its further financing was not secured at that time. However, it was administered again starting with Wave 8. In Wave 7, for the first time, study members were considered final dropouts because of continuous non-participation over a period of more than two years. As a consequence, the proportion of people dropping out of the sample (between Waves 7 and 8) is noticeably higher than in the waves before. For the same reason, a large proportion of temporary dropouts were declared final dropouts after Wave 9. Competence tests took place in Waves 1, 5, and 7. The Wave 7 competence test was only administered to a particular subgroup of the panel cohort, namely to 600 business administration students. Compared to participation in the Wave 5 testing (50.6% of the administered cases), participation in the Wave 7 testing was high, i.e., 74.3% of the administered cases. In Wave 9, five years after study start, most students had graduated and/or left university. Thus, their propensity to take (further) part in a student sample likely declines. Fig. 5 displays the numbers of Table 6 graphically. Note that the height of each bar gives the initial number of students with valid panel consent, that is, the 17910 students who took part in Wave 1 and gave their panel consent.

2.6 Starting Cohort 6

The sample of the SC6 (Adults) consists of three subsamples: the participants of the ALWA study who agreed to continue participating in NEPS (ALWA), the newly drawn individuals of the first NEPS wave (NEPS1),Footnote 5 and the individuals of the refreshment sample in the third NEPS wave (NEPS3). Table 7 details the SC6 panel progress separately for its subsamples ALWA, NEPS1, and NEPS3. The column “Not administered” comprises individuals who did not actively withdraw their panel consent but who could no longer be contacted (e.g., because of missing valid contact information).

Fig. 6 Size of Panel Cohort SC6 along Waves. a ALWA/NEPS1 (\(n=11932\)), b NEPS3 (\(n=5208\))

Table 7 Panel Progress SC6

For convenience, these cases were completely excluded from the panel.Footnote 6 For Waves 1 and 3, the column “Administered” contains the gross sample sizes of the newly drawn individuals of the subsamples NEPS1 and NEPS3.Footnote 7 In total, 11649 individuals participated in Wave 1 and gave their panel consent. This corresponds to 43.1% of the administered cases. In Wave 1, 1927 members of the ALWA sample dropped out temporarily. Of these, 833 individuals were administered again in Wave 2, and 283 participated again. These cases (i.e., \(N=283\)), combined with the participants of Wave 1, constitute the ALWA/NEPS1 panel sample of SC6. In Wave 3, the initial panel sample was augmented by a refreshment sample of 17111 persons. Of this drawn gross sample, 30.4% participated in the panel study and gave panel consent. In Wave 4, 76.4% of the administered cases participated in the interview. We see that the ALWA members are more likely to participate in the survey than the individuals from the two other NEPS samples. In particular, the NEPS3 subsample shows a strong decline in participation rates: in the latest wave (Wave 7), only 77.5% of the administered persons agreed to participate, compared to 85.1% in the ALWA sample. Fig. 6 illustrates the SC6 panel progress. The number of temporary dropouts declines over time, since in later waves the panel consists mainly of people who are willing to continue participating.
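The subsample sizes shown in Fig. 6 can be traced back to the figures in this paragraph: the ALWA/NEPS1 panel sample adds the 283 ALWA members who participated again in Wave 2 to the 11649 Wave 1 participants with panel consent, and the NEPS3 panel sample corresponds to the reported 30.4% of the refreshment gross sample.

```latex
\begin{aligned}
11649 + 283 &= 11932 && \text{(ALWA/NEPS1, Fig. 6a)},\\
5208 / 17111 &\approx 0.304 && \text{(NEPS3, Fig. 6b)}.
\end{aligned}
```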

3 Selectivity Analyses

Non-random attrition across panel waves is a common issue in non-mandatory panel surveys. It does not pose a problem as long as it is accounted for in statistical inference; otherwise, biased results might lead to erroneous research conclusions. In NEPS, selectivity (at the level of the respondent) arises at two distinct stages: in the initial sample due to unit nonresponse in the gross sample (yielding the panel samples at Wave 1), and due to wave nonresponse. Unit nonresponse in the gross sample is usually handled by weighted analysis using nonresponse-adjusted design weights or by including relevant design variables in the focal model of the substantive research question. Nonresponse-adjusted design weights are part of the SUFs (in the Weights file), and the design variables are described in detail in the sample documentation. For further information, see Würbach et al. (2016) for the SC1, Steinhauer et al. (2016) for the SC2, Steinhauer and Zinn (2016a) for the SC3, Steinhauer and Zinn (2016b) for the SC4, Zinn et al. (2017) for the SC5, and Hammon et al. (2016) for the SC6. In a second step, attrition along the panel waves has to be studied and individuals with higher dropout propensities have to be identified. This information can then be used to correct for non-random selection processes in statistical analyses. Corresponding approaches are described in Sect. 4.

The first task is to examine the attrition processes present in NEPS Starting Cohorts 1 to 6. Concretely, we explore how attrition (final dropouts) distorts the NEPS panel samples with respect to relevant design variables (such as stratification criteria) and panel member characteristics (like sex and birth year). For this purpose, we study the panel status of each panel member (being part of the panel sample vs. final dropout) across all panel waves with respect to characteristics specific to the starting cohort and target population. For consistency reasons, we include a common set of variables in each of the models (which correspond to the distinct starting cohorts). Each model comprises the region where a person is surveyed (Eastern Germany including Berlin vs. Western Germany), her/his gender (female vs. male), the year of birth, the migration background (target person and/or one of her/his parents born abroad vs. otherwise),Footnote 8 and the CASMIN level of the father and/or the mother (elementary, secondary, and higher level of education according to the length of educational experience).Footnote 9 If the percentage of missing values in a variable exceeds 5%, we specify a missing category for this variable; otherwise, missing values are imputed.Footnote 10
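The 5% rule for missing values can be sketched as follows for a single categorical covariate. This is a hypothetical Python sketch: the function name and the missing-value label are ours, and since the imputation procedure actually used is not described in this section, the modal category serves only as a placeholder for it.

```python
from collections import Counter

MISSING_SHARE_LIMIT = 0.05  # threshold stated in the text


def prepare_covariate(values, missing_label="missing"):
    """Apply the 5% rule to one categorical covariate; None marks a missing value.

    If more than 5% of the values are missing, missingness becomes its own
    category. Otherwise, missing values are imputed; here the modal category is
    used as a simple placeholder (assumption, not the NEPS imputation method).
    """
    n_missing = sum(v is None for v in values)
    if n_missing == 0:
        return list(values)
    if n_missing / len(values) > MISSING_SHARE_LIMIT:
        return [missing_label if v is None else v for v in values]
    mode = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in values]
```

Note that a share of exactly 5% does not "exceed" the threshold, so such a variable is imputed rather than given a missing category.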

We use discrete time event history models (see, e.g., Kalbfleisch and Prentice 2002; Hougaard 2000) to capture the dynamic nature of the attrition process. Discrete time event history models are well suited to this kind of problem: whether attrition was observed for a panel member in a panel wave is regressed on relevant variables. Proceeding this way, the impact of time and of individual characteristics is considered simultaneously when modeling propensities for final dropout. Our modeling approach is also well suited to cope with the unbalanced structure of our data sets that results from attrition events in each wave. Ignoring this particularity of the data and generating, for example, a balanced panel data set by considering as the risk set only those panel members who remained until the last wave likely leads to wrong research conclusions. The reason is that panel members who already dropped out in earlier waves are expected to differ in their composition from panel members of later waves. For example, highly mobile individuals are more prone to drop out early since their contact information may no longer be valid.Footnote 11 All models are specified as proportional hazards models, so-called Cox models, named after their originator (Cox 1972). Hence, in our models the unique effect of a unit increase in a covariate is assumed to be multiplicative with respect to the attrition propensity (hazard). To preserve the proportional hazards property, as required by the Cox model, we specify our models as generalized linear models with a cloglog link function.Footnote 12 All models across all starting cohorts are estimated using the glm function of the statistical software R (R Core Team 2017); see, for example, Broström (2012). As before, each starting cohort is analysed and described separately and in detail.
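The following Python sketch illustrates the two steps behind such a model under simplifying assumptions: person-level records are expanded into person-wave rows (each person is at risk until the wave of final dropout or the last observed wave), and a binomial GLM with complementary log-log link, \(\log(-\log(1-h_{it})) = \alpha_t + x_i^\top\beta\), is fitted by iteratively reweighted least squares with wave-specific baseline terms \(\alpha_t\). It is written in plain Python only to stay self-contained; the authors use R's `glm`, where an analogous call would be `glm(event ~ 0 + factor(wave) + x, family = binomial(link = "cloglog"))`, and in practice a GLM library is preferable. The function names, the record layout, and the toy data are hypothetical.

```python
import math


def person_period(people, n_waves):
    """Expand person records (exit_wave, dropped, covariates) into person-wave rows.

    A person is at risk in waves 1..exit_wave; the event indicator is 1 only in
    exit_wave and only if the person became a final dropout there (else censored).
    """
    rows = []
    for exit_wave, dropped, covariates in people:
        for t in range(1, exit_wave + 1):
            wave_dummies = [1.0 if t == w else 0.0 for w in range(1, n_waves + 1)]
            event = 1 if (dropped and t == exit_wave) else 0
            rows.append((wave_dummies + [float(v) for v in covariates], event))
    return rows


def solve(a, b):
    """Solve the small linear system a z = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def fit_cloglog(rows, max_iter=100, tol=1e-10):
    """Fit a binomial GLM with cloglog link by IRLS; returns the coefficient list."""
    X = [x for x, _ in rows]
    y = [event for _, event in rows]
    p = len(X[0])
    # Starting values as in R's binomial family: mu = (y + 0.5) / 2.
    eta = [math.log(-math.log(1.0 - (yi + 0.5) / 2.0)) for yi in y]
    beta = [0.0] * p
    for _ in range(max_iter):
        xtwx = [[0.0] * p for _ in range(p)]
        xtwz = [0.0] * p
        for xi, yi, ei in zip(X, y, eta):
            e = math.exp(ei)
            surv = math.exp(-e)            # 1 - h: probability of staying in the panel
            mu = -math.expm1(-e)           # h = 1 - exp(-exp(eta))
            dmu = e * surv                 # dh / d eta
            w = dmu * dmu / (mu * surv)    # IRLS weight with binomial variance
            z = ei + (yi - mu) / dmu       # working response
            for j in range(p):
                xtwz[j] += w * xi[j] * z
                for k in range(p):
                    xtwx[j][k] += w * xi[j] * xi[k]
        new_beta = solve(xtwx, xtwz)
        change = max(abs(nb - ob) for nb, ob in zip(new_beta, beta))
        beta = new_beta
        eta = [sum(b * v for b, v in zip(beta, xi)) for xi in X]
        if change < tol:
            break
    return beta


# Usage with hypothetical toy data: 3 waves, one binary covariate.
people = ([(1, True, [0])] * 10 + [(2, True, [0])] * 18 + [(3, True, [0])] * 18
          + [(3, False, [0])] * 54
          + [(1, True, [1])] * 20 + [(2, True, [1])] * 24 + [(3, True, [1])] * 20
          + [(3, False, [1])] * 36)
coef = fit_cloglog(person_period(people, n_waves=3))
hazard_ratio = math.exp(coef[-1])  # multiplicative effect of the covariate
```

In a model with wave terms only, the fitted \(\alpha_t\) equal \(\log(-\log(1 - d_t/n_t))\), where \(d_t\) and \(n_t\) are the numbers of dropouts and persons at risk in wave \(t\); the last coefficient above is exponentiated to obtain a hazard ratio of the kind reported in the tables below.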

3.1 Starting Cohort 1

The SC1 panel sample consists of four waves, surveyed at intervals of approximately one year and covering the period 2012 to 2015. Starting from a gross sample of 8483 targets, 3481 individuals responded in Wave 1. The corresponding model of the propensities for participation is given in Würbach et al. (2016, Chap. 4.1). This model contains only a restricted set of explanatory variables, owing to the fact that very limited information was available in advance from the registration offices (which had been asked to provide information on the target population). These are mainly characteristics of the newborns used for sampling. Additional information from the contact history was included; that is, the number of contact attempts was used to control for accessibility. This model indicates only modest selectivity of the participants with respect to the gross sample. Respondents with non-German citizenship show a slightly lower propensity to participate than respondents with German citizenship.

Table 8 documents the results of the selectivity analysis for the latest published SUF of the SC1 (Waves 1 to 4). The figures are reported in reference to the panel sample of the SC1 at the start (\(N=3431\)). In the SC1, the target persons are newborns, but the respondents are their legal guardians. The contact person may change between two waves; for example, the mother may have provided all information in the first two waves and the father in the last two waves (both with panel consent). If the contact person did not change, all relevant child and parent data were carried over from the previous CATI.

Table 8 Selectivity Analysis for the SC1 Panel Sample along Waves 1–4

In case of a change, the parent data were usually obtained from the new respondent and thus updated. This updated information is used for modeling. The remaining missing values are imputed as mentioned above. We considered the size of the residential community, the employment status and the family status of the reporting parent, and the number of children in the household as relevant variables to model attrition in SC1. All included covariates were regarded as time invariant, because changes, if any, are only modest.

In detail, Table 8 reports the hazard ratios for attrition across all four waves observed so far. The results show a significant increase in the propensity to drop out of the panel sample when the respondent is currently unemployed or has a migration background (generation status lower than three), compared to the respective reference categories. Moreover, respondents with a higher level of education have a remarkably lower propensity to be a final dropout. Compared to respondents whose school leaving certificate is lower than or equal to secondary education without vocational training (reference category), respondents with a higher education entrance qualification (with or without vocational training) as well as respondents with a university degree or a technical college qualification are significantly more willing to participate. Regarding household and family structure, two further results emerge. Missing information on family status is strongly associated with attrition. In addition, large families tend to be more willing to participate: having two or more children in the household increases the propensity to stay in the panel sample, although this effect is not significant. The time effects are highly significant, indicating significant attrition in all waves following Wave 1.
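To read the hazard ratios reported in Table 8 (and in Table 9), it may help to spell out what the cloglog specification implies. In the notation below, which is ours rather than the source's, \(\alpha_t\) is the wave-specific baseline term, \(x\) the covariate vector, \(h_0(t)\) the baseline attrition probability at \(x=0\), and \(\exp(\beta_k)\) the hazard ratio for a unit increase in covariate \(k\):

```latex
\begin{aligned}
h(t \mid x) &= 1 - \exp\bigl(-\exp(\alpha_t + x^\top\beta)\bigr),\\
1 - h(t \mid x) &= \bigl(1 - h_0(t)\bigr)^{\exp(x^\top\beta)},
\qquad h_0(t) = 1 - \exp\bigl(-\exp(\alpha_t)\bigr),\\
h(t \mid x) &\approx \exp(x^\top\beta)\, h_0(t) \quad \text{for small } h_0(t).
\end{aligned}
```

A hazard ratio above 1 (as for unemployed respondents) thus indicates a higher attrition propensity than in the reference category, and a ratio below 1 (as for respondents with higher education) a lower one; for the small per-wave dropout probabilities typical of a panel, the ratio applies approximately to the probabilities themselves.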

3.2 Starting Cohort 2

The SC2 panel sample consists of six waves with one survey every year, covering the period 2011 to 2016. In Wave 1, the SC2 panel sample contains 3007 Kindergarten children. Compared to the gross sample (\(N=4515\)), the panel sample has a lower proportion of children not speaking German at home. Furthermore, the panel sample comprises a lower proportion of children raised by a single parent (as opposed to both parents). The corresponding model of the propensities for participation is given in Steinhauer et al. (2015, Chap. 3.1).

The panel sample of the augmentation subsample K1_AUG (\(N=6341\)) reveals only minor selectivity of participating school children compared to the gross sample (\(N=16{,}784\)). We found that the proportion of children who were enrolled in school early is slightly lower than in the gross sample; see Steinhauer et al. (2016, Chap. 3.2). Again, the set of variables used for analysing selectivity between the gross and the net sample is naturally restricted to the sampling information (because no other information was available in advance). Note that no general statements regarding selectivity can be made beyond this.

Table 9 documents the results of the selectivity analysis for the latest published SC2 SUF (Waves 1 to 6), in which all subsamples (KIGA_IND, KIGA_PANEL, K1_AUG) were tested and surveyed again. The figures are reported in reference to the SC2 panel samples at the start (\(N=9336\) in total), but separately for each of the three subsamples. The set of explanatory variables differs between the subsamples. For the children of the augmentation subsample (K1_AUG), a lot of information on the target as well as on the school context is available. We considered the level of urbanization, the funding of the school, the time of school enrollment, and the presence of special educational needs as relevant variables to model attrition in the SC2 subsample K1_AUG.

Table 9 Selectivity Analysis for the SC2 Panel Sample along Waves 1–6 (KIGA_PANEL/KIGA_IND), and Waves 3–6 (K1_AUG), respectively

Similarly rich information is available for the school children in the subsample KIGA_PANEL. However, owing to the small overall sample size (\(N=576\)) and the resulting small numbers of cases in individual cells, some variables were intentionally excluded when modelling attrition for KIGA_PANEL to increase efficiency. Specifically, this applies to the funding of the school, the level of urbanization, school enrollment, special educational needs, and the migration background of the parents. When modelling attrition propensities in the KIGA_IND subsample, we added the urbanization level to the variables described in the introduction of this section. All included covariates were treated as time invariant, because changes, if any, are only modest.
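
The decision to drop covariates for KIGA_PANEL rests on sparse cells. One simple way to screen for them is to cross-tabulate each candidate covariate against the attrition indicator and flag cells below a minimum count. The sketch below assumes a hypothetical threshold `min_count=10` and hypothetical field names; the text does not state which criterion was actually used.

```python
from collections import Counter


def sparse_cells(rows: list[dict], covariate: str, outcome: str,
                 min_count: int = 10) -> list[tuple]:
    """Return (category, outcome value, count) for every cell below min_count.

    Combinations that never occur are reported with a count of 0.
    """
    counts = Counter((row[covariate], row[outcome]) for row in rows)
    categories = {row[covariate] for row in rows}
    outcomes = {row[outcome] for row in rows}
    return sorted(
        (cat, out, counts[(cat, out)])
        for cat in categories
        for out in outcomes
        if counts[(cat, out)] < min_count
    )


# Toy data (not NEPS data): few children attend privately funded schools.
rows = ([{"funding": "public", "dropout": 0}] * 20
        + [{"funding": "public", "dropout": 1}] * 12
        + [{"funding": "private", "dropout": 0}] * 4
        + [{"funding": "private", "dropout": 1}] * 1)
print(sparse_cells(rows, "funding", "dropout"))
# [('private', 0, 4), ('private', 1, 1)]
```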

Table 9 reports the hazard ratios for attrition across all six waves observed so far (Waves 3 to 6 for K1_AUG) in detail. In all three subsamples, targets whose parents have a higher level of education show a markedly lower propensity to become final dropouts, though not consistently at a significant level. Compared with targets whose parents' school-leaving certificate is at or below secondary education without vocational training (reference category), having parents with a higher education entrance qualification (with or without vocational training) or with a university degree or technical college qualification significantly increases the target's willingness to participate.

In the KIGA_PANEL subsample, the propensity to drop out of the panel sample is significantly lower for targets living in semi-urban areas than for those living in rural areas. For the KIGA_IND subsample, only missing information on the parents' CASMIN has a significant effect on panel attrition. However, this effect is counterintuitive, because missingness in the CASMIN is associated with a lower propensity for attrition here. In subsample K1_AUG, respondents from Western Germany have a significantly higher propensity to drop out of the panel than those from Eastern Germany including Berlin. For the funding of the school and the migration background of the parents, we observe positive effects on the willingness to participate: children from public schools as well as children whose parents have a generation status lower than three are more willing to participate.

The time effects were highly significant at all waves for the KIGA_PANEL and K1_AUG subsamples, indicating a significant loss of panel members at all waves following Wave 1 for KIGA_PANEL and following Wave 3 for K1_AUG. The time effects for KIGA_IND are not significant up to Wave 6. This is not surprising, because KIGA_IND was on hold in Waves 3 to 5.
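
The time effects are wave-specific baseline terms of the discrete-time hazard. Whether exponentiated coefficients are hazard ratios depends on the link function: with a complementary log-log link, \(h_{it} = 1 - \exp(-\exp(\alpha_t + \beta x_i))\) and \(\exp(\beta)\) is the hazard ratio of the underlying continuous-time process, whereas with a logit link it would be an odds ratio. The sketch below assumes the complementary log-log link and purely illustrative values for the wave terms \(\alpha_t\) and the coefficient \(\beta\); it shows how per-wave hazards accumulate into the probability of still being in the panel.

```python
import math


def cloglog_hazard(alpha_wave: float, beta: float, x: float) -> float:
    """Discrete-time attrition hazard with a complementary log-log link."""
    return 1.0 - math.exp(-math.exp(alpha_wave + beta * x))


def retention(hazards: list[float]) -> float:
    """Probability of not having become a final dropout after all listed waves."""
    prob = 1.0
    for h in hazards:
        prob *= 1.0 - h
    return prob


# Illustrative wave terms (time effects) for Waves 2 to 4 and a covariate effect
# with hazard ratio 0.7; these numbers are made up, not estimates from NEPS.
alphas = [-2.0, -2.5, -1.5]
beta = math.log(0.7)

h_ref = [cloglog_hazard(a, beta, 0) for a in alphas]   # reference category
h_cov = [cloglog_hazard(a, beta, 1) for a in alphas]   # covariate present

# Under the cloglog link, -log(1 - h) scales exactly by exp(beta) in every wave.
ratios = [math.log(1 - hc) / math.log(1 - hr) for hc, hr in zip(h_cov, h_ref)]
print([round(r, 6) for r in ratios])         # [0.7, 0.7, 0.7]
print(retention(h_ref) < retention(h_cov))   # True
```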

3.3 Starting Cohort 3

The SC3 panel sample covers seven waves, mostly at annual intervals, from 2010 to 2016. During this period, 6112 students (subsample G5) were surveyed and tested from Grade 5 to Grade 10, and the 2205 students of subsample G7_AUG from Grade 7 to Grade 10. The relevant design variable used for stratification in both subsamples is the school type in which a student was initially sampled. The corresponding secondary school types (offering education to students in Grades 5 to 10) are listed in Table 10.

Table 10 School types in Germany

Some students changed schools, and possibly also school types, over the course of the panel. Unfortunately, no consistent and complete information on the school type histories of the SC3 panel members is available, which is why we rely on the sampling information when modelling attrition propensities. In addition to the individual characteristics described in the introduction of this section, we consider a student's mathematical competence in Grade 5 and Grade 7 (low, medium, high, and no information) as explanatory variables. All covariates considered are time invariant. This also holds for the mathematical competencies in Grade 5 and Grade 7, which enter the model as cross-sectional information because there was no testing in Grade 6. Table 11 shows the results of the respective analysis for the two SC3 subsamples. For the subsample G7_AUG, no estimates are displayed for mathematical competence in Grade 5, because this information is not available by design. Further, no estimates are given for certain school types (special need schools FS, elementary schools GS, and orientation stage schools OS), because either no students were sampled in the corresponding school type (FS) or the school type does not host any students in Grade 7 (GS, OS). In the first four waves, G5 contains students with special needs sampled in school type FS. Since these students were dismissed from the panel after Wave 4 (cf. Table 4), we excluded them from our analysis. Having no information on a variable clearly has a dominant effect on the attrition propensity, although this is only relevant for mathematical competence among students of the G5 subsample. Apart from that, G5 students with high or medium mathematical competence show a lower propensity to drop out of the panel than students with low mathematical competence.
The same holds for G5 students who were initially sampled in OS (school-type-independent orientation stages). This is because these students had to leave OS after Grade 6 and are therefore surveyed individually. Finally, G5 students living in Western Germany have a higher attrition propensity than those living in Eastern Germany (incl. Berlin). Characteristics such as gender, age group, or migration background do not significantly affect the attrition propensity in G5.
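
Treating "no information" as a category of its own, as done here for mathematical competence, keeps respondents with missing values in the model and lets their attrition propensity be estimated separately. A minimal sketch of the corresponding dummy coding, with low competence as the reference category; the function and key names are hypothetical.

```python
from typing import Optional

MATH_LEVELS = ["low", "medium", "high", "no information"]


def dummy_code(value: Optional[str], levels: list[str], reference: str) -> dict:
    """Dummy-code a categorical covariate.

    Missing values (None) are mapped to an explicit 'no information' level,
    and the reference category is omitted, so it is represented by all zeros.
    """
    level = "no information" if value is None else value
    if level not in levels:
        raise ValueError(f"unknown category: {level!r}")
    return {f"math_{lv.replace(' ', '_')}": int(level == lv)
            for lv in levels if lv != reference}


print(dummy_code("medium", MATH_LEVELS, reference="low"))
# {'math_medium': 1, 'math_high': 0, 'math_no_information': 0}
print(dummy_code(None, MATH_LEVELS, reference="low"))
# {'math_medium': 0, 'math_high': 0, 'math_no_information': 1}
```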

Table 11 Selectivity Analysis for the SC3 Panel Sample along Waves 1–7 (G5), and Waves 3–7 (G7_AUG), respectively

We find that students of the G7_AUG subsample living in Western Germany have a higher propensity to drop out of the panel than students from Eastern Germany (incl. Berlin). Compared with G7_AUG students with low mathematical competence, students with medium mathematical competence have a lower attrition propensity. Students whose parents have a high educational background (measured by CASMIN), or for whom information on the parents' educational background is missing, have a higher probability of remaining in the panel sample than students whose parents have a lower educational background.

3.4 Starting Cohort 4

The SC4 panel sample covers nine waves, mostly at annual intervals, from 2010 to 2016. During this period, 16425 students were surveyed and tested from Grade 9 onwards. Students choose their educational track after Grade 10: they can either stay in school, enter the academic track (ACA), and take their A-levels (Abitur), or they can leave secondary school. In the latter case, students start vocational training or enter the German transition system. Both groups, vocational training and transition system, are combined into the vocational track (VOC). The relevant design variable used for stratification is the school type in which a student was initially sampled. Here, all secondary school types listed in Table 10 apply except elementary schools (GS) and orientation stage schools (OS). Compared with SC3, more SC4 students changed schools over the course of the panel, probably often changing school type as well. Unfortunately, no consistent and complete information on their school type history is available, which is why we rely on the sampling information. Besides the individual and design characteristics mentioned above, we consider a student's mathematical competence in Grade 9 (low, medium, high, and no information) as an explanatory variable. Because students change their educational track after Grade 9, we incorporated the educational track into the model as a time-varying covariate. Table 12 shows the results of the respective analysis.

Having no information on a variable clearly has a dominant effect on the attrition propensity, although this is only relevant for migration background and parental CASMIN. Compared with students in the academic track, students in the vocational track have a higher probability of dropping out of the panel sample. This is mostly because students in VOC are surveyed and tested individually, so the peer pressure of testing groups in schools is no longer present, which makes it easier to refuse. Apart from this, students in the VOC group are more mobile and thus harder to track. We find that the school type has a strong effect on panel attrition. Compared with students sampled in schools leading to upper secondary education (GY), students in most other school types are more likely to drop out. Students in GY commonly stay in school longer than students in other school types, which mostly offer schooling only up to Grade 10. Accordingly, students sampled in schools that mainly channel their students into the vocational track (i.e., schools for basic secondary education HS, comprehensive schools IG, Rudolf Steiner schools FW, schools with several courses of education MB, and intermediate secondary schools RS) have a lower propensity to remain in the panel than students in schools of upper secondary education (GY).

Students in special need schools (FS) are less likely to leave the panel sample than students in schools of upper secondary education (GY). This may be because these students do not switch or leave their schools. Moreover, male students have a higher propensity to drop out of the panel than female students. Students belonging to the younger part of the cohort have a lower probability of dropping out. Concerning mathematical competence, students with medium or high mathematical competence are more likely to remain in the panel sample than students with lower achievement in the mathematical competence tests. Finally, the parents' educational background (measured by CASMIN) influences panel attrition: students whose parents have at least a secondary school qualification and completed vocational training (or a higher degree of education) are more likely to remain in the panel than students whose parents have not completed vocational training.
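
Entering the educational track as a time-varying covariate means that each person-wave row carries the track valid at that wave. A sketch of attaching such a covariate to person-wave rows, assuming a hypothetical per-person history that records the track whenever it is observed and carrying the most recent value forward to later waves; carrying forward is an assumption made here, and lagging the covariate by one wave would be an equally plausible choice.

```python
def add_time_varying(rows: list[dict], history: dict, name: str,
                     default=None) -> list[dict]:
    """Attach a time-varying covariate to person-wave rows.

    history maps pid -> {wave: value}. Each row receives the most recent value
    recorded at or before its wave (last observation carried forward), or
    default if nothing has been recorded yet.
    """
    out = []
    for row in rows:
        recorded = history.get(row["pid"], {})
        past_waves = [w for w in recorded if w <= row["wave"]]
        value = recorded[max(past_waves)] if past_waves else default
        out.append({**row, name: value})
    return out


# Hypothetical student: in school at Wave 1, vocational track (VOC) from Wave 3.
rows = [{"pid": 1, "wave": w, "event": int(w == 4)} for w in (2, 3, 4)]
history = {1: {1: "school", 3: "VOC"}}
for row in add_time_varying(rows, history, "track"):
    print(row)
```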

Table 12 Selectivity Analysis for the SC4 Panel Sample along Waves 1–9

3.5 Starting Cohort 5

The SC5 panel sample consists of nine waves, with one survey every six months, from 2010 to 2015. The first-wave sample comprises 17910 students. Relevant design variables are the type of university at which a student started her or his studies (public or private; university or university of applied sciences), whether a student was studying with the aim of becoming a teacher (yes vs. no), and the type of university entrance qualification (traditional university admission in Germany, traditional university admission abroad, or non-traditional university admission). The field of study is a further stratification criterion. However, over the course of the panel many students changed their field of study, partly or completely. There is strong evidence that many students who dropped out had changed their field of study, so no current information on their field of study is available. Including outdated information in our analysis would give a misleading picture; we therefore decided to omit it. Students have of course also changed universities, but we found no evidence that this happened frequently, so we included this criterion in our analysis. In addition to the individual characteristics described above, we consider a student's mathematical competence in the winter semester 2010/11 (low, medium, high in comparison to peers) as an explanatory variable. All covariates considered are time invariant.

Table 13 shows the results of the respective analysis. We find significant effects of birth year, region, competence score, and university type. Younger cohorts (i.e., students born later than 1989) are less likely to leave the panel sample than persons born before 1989. Likewise, students who study or have studied in Eastern Germany (incl. Berlin) are more likely to remain in the panel sample than those in Western Germany. The same applies to students who performed well in the mathematical competence test and to students at universities (compared with students at universities of applied sciences). The latter may be explained by university students continuing with a doctoral programme. Such programmes usually do not exist at universities of applied sciences, so their graduates are more likely to move and become inaccessible. Apart from this, students with no information on their university admission are very likely to drop out. Moreover, we find strong time effects at all waves, mirroring the significant loss of panel members across the nine waves. The strongest effect arises at Wave 8, when, for the first time, persons who had not participated in NEPS for more than two years were no longer surveyed, because they had been converted into final dropouts after Wave 7. Furthermore, we find evidence that final dropouts occur more often in CATIs than in CAWIs. Overall, the general tendency of more and more students leaving the panel becomes apparent. The obvious reason for this is that by Wave 8 most students have finished their studies and moved. They are thus hard to reach, may lose interest in the study, and stop participating.
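
The Wave 8 effect follows from a status rule: panel members who had not participated for more than two years were converted into final dropouts. A sketch of such a rule applied to a per-wave participation history; with surveys every six months, two years correspond to roughly four waves, but the threshold `max_missed` and the function itself are hypothetical illustrations, not the exact NEPS field rule.

```python
def classify_status(participation: list[bool], max_missed: int) -> str:
    """Classify a panel member from a wave-by-wave participation history.

    Counts the current run of consecutive missed waves (from the most recent
    wave backwards): none -> active, up to max_missed -> temporary dropout,
    more than max_missed -> final dropout.
    """
    missed = 0
    for took_part in reversed(participation):
        if took_part:
            break
        missed += 1
    if missed == 0:
        return "active"
    return "temporary dropout" if missed <= max_missed else "final dropout"


print(classify_status([True, True, False], max_missed=4))    # temporary dropout
print(classify_status([True] + [False] * 5, max_missed=4))   # final dropout
print(classify_status([True, False, True], max_missed=4))    # active
```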

Table 13 Selectivity Analysis for the SC5 Panel Sample along Waves 1–9

3.6 Starting Cohort 6

The SC6 panel sample covers a total of seven waves, with surveys at intervals of approximately one year, from 2009 to 2016. The first-wave sample comprises 11932 participants, of whom 11649 gave their panel consent and thus form the panel cohort at Wave 1 (i.e., ALWA/NEPS1). In Wave 3, the panel sample was augmented by a refreshment sample of 5208 participants (i.e., NEPS3). To account for the different starting times, the SC6 selectivity analysis is conducted separately for ALWA/NEPS1 and NEPS3. The relevant design variables considered as covariates in the analysis are gender, birth cohort, migration background, residence in Western or Eastern Germany (incl. Berlin), the size of the residential community, marital status, and the highest educational qualification attained (measured by the CASMIN classification). Furthermore, household size, employment status, and the presence of children in the household are taken into account.

The ALWA/NEPS1 model additionally considers subsample membership (i.e., ALWA or NEPS1). All included covariates were treated as time invariant, because changes, if any, are only modest (especially concerning the presence of children in the household).

Table 14 shows the results of the respective analyses, separately for the two samples ALWA/NEPS1 and NEPS3. In the ALWA/NEPS1 subsample, individuals from the oldest birth cohort leave the panel with a higher probability than those from younger cohorts. Respondents who live in Western Germany are more likely to drop out of the panel than those from Eastern Germany (incl. Berlin). Likewise, leaving the panel is more likely for single and married persons than for widowed or divorced ones. Respondents who live in communities with more than 500,000 inhabitants have a lower dropout rate than individuals who live in places with fewer than 50,000 inhabitants. The likelihood of leaving the panel decreases with increasing educational level. Furthermore, children in the household are associated with a higher propensity to stay in the panel, whereas three or more household members are associated with a higher dropout probability.

For the NEPS3 sample, we observe, as for ALWA/NEPS1, a higher probability of leaving the panel for people from the oldest birth cohort and for respondents living in large households. However, some effects differ from those for ALWA/NEPS1. Neither the educational level nor residence in Western or Eastern Germany has a significant effect on the attrition propensity in NEPS3. Instead, we find that individuals with a migration background are more likely to drop out of the NEPS3 panel.

4 Summary and Recommendations for Statistical Analyses

Our selectivity analyses have shown that, over the course of the panel, specific groups of individuals have a higher tendency to drop out of the panel sample than others. All in all, highly mobile target persons (such as students leaving their parental home for university or vocational training), people with a migration background, and persons with (or whose parents have) elementary or lower secondary education have higher dropout propensities than their counterparts. Likewise, people living in Western Germany are more likely to leave the panel than those living in Eastern Germany including Berlin. Furthermore, persons with low mathematical competence scores and those with missing values have a lower tendency to remain part of NEPS. Further findings of our analyses are mixed and differ between the starting cohorts.

Table 14 Selectivity Analysis for the SC6 Panel Sample along NEPS Waves 1–7 (ALWA/NEPS1), and NEPS Waves 3–7 (NEPS3), respectively

We see that the composition of the NEPS cohort samples changes over time. Neglecting this in statistical analyses is likely to yield biased results. As a guideline, we recommend applying non-response adjusted design weights when computing descriptive statistics. Such weights are provided in the Weights file of the NEPS SUF. However, all weights provided refer to the group of people who participated in a given wave, not to a subgroup that may be of special interest for a particular research question. When analysing a specific subsample of a cohort, further non-response weighting might be necessary. For this purpose, a non-response model has to be specified and fitted, and adjustment factors have to be derived from it. For NEPS, the corresponding procedure is described in great detail in Steinhauer et al. (2015) and Steinhauer (2014). For regression analyses, we advise including the stratum information in the focal model to account for the unequal selection probabilities in the distinct strata. Furthermore, all variables found to have a significant effect on the attrition probability in the sample under consideration should be included as explanatory variables. Missing values may be imputed using multivariate imputation by chained equations (van Buuren and Groothuis-Oudshoorn 2011) or handled using the full information maximum likelihood approach (Enders 2010). Both approaches yield valid results under a missing at random (MAR) mechanism. The situation becomes more complicated, however, if a missing not at random (MNAR) process must be assumed, that is, if the probability of missingness depends on the missing values themselves. In that case, sensitivity analyses have to be performed that compare different MNAR models, such as selection models and pattern-mixture models. For the NEPS data, a corresponding study with recommendations for data users has been conducted by Zinn and Gnambs (2018).
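
For a subsample of special interest, the weighting steps described above end in adjusted weights. As a simplified stand-in for the model-based procedure of Steinhauer et al. (2015), the sketch below uses a weighting-class adjustment: within each adjustment cell (which could, for example, be formed from classes of estimated response propensities), respondents' design weights are inflated by the ratio of the weight total of all eligible units to that of the respondents. The field names are hypothetical, and this is not the NEPS weighting procedure.

```python
from collections import defaultdict


def adjust_weights(units: list[dict], cell_key: str) -> dict:
    """Weighting-class non-response adjustment of design weights.

    Each unit needs 'id', 'design_weight', 'responded' and the cell variable.
    Returns adjusted weights for respondents only; within each cell they sum
    to the design-weight total of all units in that cell.
    """
    total = defaultdict(float)
    responded = defaultdict(float)
    for u in units:
        total[u[cell_key]] += u["design_weight"]
        if u["responded"]:
            responded[u[cell_key]] += u["design_weight"]
    return {
        u["id"]: u["design_weight"] * total[u[cell_key]] / responded[u[cell_key]]
        for u in units
        if u["responded"]
    }


# Toy subsample (not NEPS data) with two adjustment cells.
units = [
    {"id": 1, "cell": "A", "design_weight": 2.0, "responded": True},
    {"id": 2, "cell": "A", "design_weight": 2.0, "responded": False},
    {"id": 3, "cell": "B", "design_weight": 1.0, "responded": True},
    {"id": 4, "cell": "B", "design_weight": 1.0, "responded": True},
]
print(adjust_weights(units, "cell"))  # {1: 4.0, 3: 1.0, 4: 1.0}
```

Cells with very few respondents produce large adjustment factors, which is one reason to form cells that are not too fine.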

Besides the recommendations listed here, users of the NEPS data are invited to post questions in the NEPSforum (https://forum.neps-data.de/), where they are answered by other NEPS data users or by the data providers at the Leibniz Institute for Educational Trajectories.