Intensity and content of private tutoring lessons during German secondary schooling: effects on students’ grades and test achievement

Private supplementary tutoring is a widespread phenomenon. However, evidence that private tutoring has positive effects on academic achievement or about the specific conditions of successful private tutoring is rare. Adapting Carroll’s (1963) model for school learning to private tutoring, we expected to find positive effects of tutoring duration, tutoring intensity, and students’ motivation to attend private tutoring. In a sample of eighth-grade students in German secondary schools (N = 8510, 18.6% currently being tutored), we conducted regression analyses with multiple covariates and did not find a positive main effect of private tutoring attendance in any of the school subjects examined. Moreover, within the subsamples of tutored students, we were not able to identify positive effects of tutoring duration, tutoring intensity, tutoring content (such as a focus on homework completion, test preparation, or study behavior), or students’ motivation to attend private tutoring. Given these disillusioning findings, we primarily derive suggestions for future research.


Introduction
In line with the rising demand for private tutoring lessons (Byun et al., 2018), research on the spread and effectiveness of private tutoring lessons has greatly increased in the past 20 years, resulting in many articles, several special issues (Guill & Spinath, 2014; Manzon & Areepattamannil, 2014; Zhang & Bray, 2019), and various books on the topic (Bray et al., 2015; Jokic, 2013). In Germany, about half of the student population receives private tutoring at some point in their school career (Hille et al., 2016). However, looking at the effectiveness of private tutoring, one can still agree with the conclusion of the review by Park et al. (2016): Research is inconclusive. After systematic differences between tutored and nontutored students (Byun et al., 2018) were controlled for, some studies identified positive effects of private tutoring, but others found none at all. Given the enormous amount of economic resources families invest in private tutoring classes (Germany: Klemm & Hollenbach-Biele, 2016; South Korea: Lee, 2005), it is important to gain further insights into the circumstances under which private tutoring attendance improves academic achievement.
Private tutoring refers to tutoring in academic subjects (e.g., languages, mathematics, or science) to improve students' academic achievement. It is provided by tutors for financial gain and does not include extracurricular activities, such as sports, or remedial lessons in school (Bray, 2014). Despite this quite clear definition, many empirical studies rely on an intuitive understanding of the term "private tutoring" in their questionnaires and fail, for example, to differentiate between paid and fee-free private tutoring (Bray & Kobakhidze, 2014). However, when looking at the effects of private tutoring, as in our study, tutoring costs are not the most important characteristic. More relevant characteristics are the amount of private tutoring that students receive and the content of the tutoring lessons. At an international level, tutored students in East and Southeast Asia tend to attend much more intensive forms of private tutoring than students in Western countries (Guill & Wendt, 2016; Entrich, 2014). However, in both regions, if studies attempted to obtain more details on private tutoring lessons, they considered either the duration of private tutoring attendance, that is, how long students had already been attending (Ömeroğulları et al., 2020; Heinrich & Nisar, 2013), or the number of hours per week spent attending private tutoring lessons (Choi & Park, 2016, in South Korea; Guill et al., 2020a, in Germany); they did not consider the combination of both factors.
Few quantitative studies provide data on tutoring content. In contrast, we were able to rely on a large and rich longitudinal data set from a metropolitan area in Northern Germany. This data set had several strengths: (1) Duration and weekly hours of tutoring were measured separately for several tutoring subjects. (2) Teacher-assigned grades, as well as achievement test results, were available as outcomes of private tutoring attendance and as control variables before the start of tutoring lessons. (3) At least some students provided data on the content of their tutoring lessons, differentiating, for example, between homework completion, preparation for tests in schools, and the instruction of learning strategies. (4) The data made it possible to check interaction effects between these characteristics of tutoring lessons and the characteristics of the students, namely, their prior achievement and their motivation to engage in private tutoring lessons.
In the remainder of this paper, we first discuss the theoretical reasons for why tutoring duration and intensity and specific tutoring content can be expected to affect academic achievement, and we present empirical research on this topic. Afterward, we present our data, methods, and empirical results and discuss the consequences of our results for parent counseling, education policy, and future research.

Private tutoring as additional learning time
Tutoring is generally believed to be effective because it gives students more time to learn (Kuan, 2011; Mischo & van Kessel, 2005). This argument is based on Carroll's (1963) model of school learning, which posits that learning is a function of the ratio of the time spent on learning to the time needed to learn. The time needed depends on cognitive factors and the quality of instruction, and the time spent depends on the amount of time that is available and the student's motivation to use that time.
Empirical studies have reached inconsistent conclusions regarding the amount of time available for learning during tutoring and the impact of that time on school achievement. There is some evidence of linear links (Choi & Park, 2016, in South Korea), missing links (Guill & Bos, 2014; Guill et al., 2020a, both in Germany; Smyth, 2008b, in Ireland), and a threshold amount of instruction time needed for successful learning (Cheo & Quah, 2005, in Singapore; Heinrich & Nisar, 2013, in the USA; Ömeroğulları et al., 2020, in Germany).
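The ratio described above can be written compactly as follows. This is only a shorthand for the verbal statement; the labels are ours and are not Carroll's original notation.

```latex
\text{degree of learning} \;=\; f\!\left(\frac{\text{time spent on learning}}{\text{time needed to learn}}\right)
```

In this reading, private tutoring is assumed to act on the numerator (more available time) and, if instruction quality is high, possibly on the denominator (less time needed).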
Conversely, there appears to be a saturation threshold: The positive effect of private tutoring decreases when the hours spent on private tutoring increase (Liu, 2012).
However, in most studies, the amount of tutoring is measured either by the number of months and years (Heinrich & Nisar, 2013; Ömeroğulları et al., 2020) that students have already attended private tutoring lessons (tutoring duration) or, in most cases, by the number of hours per week students spend in private tutoring (tutoring intensity). As both factors, tutoring duration and tutoring intensity, contribute to the additional learning time made available by private tutoring, they should both be considered when the effects of the amount of tutoring are modeled.
Furthermore, for students who have additional learning time in private tutoring classes, it is mostly unclear whether their total learning time (in and outside school) indeed increases. Some authors (Bray, 2009; Kenny & Faunce, 2004) suggested, based on anecdotal evidence, that tutored students pay less attention during regular school classes and thus spend less time actively learning in that context because they assume that tutoring will allow them to make up for what they have missed. Additionally, students might simply displace time that would otherwise be spent learning at home to private tutoring lessons. In both cases, private tutoring lessons do not increase the active learning time, or at least not as much as might be expected considering the amount of tutoring time. Only a few studies controlled for the learning time at home or at school when analyzing the effects of private tutoring. Kim and Hong (2018) found a positive relationship between private tutoring and academic achievement in South Korea even after controlling for self-study time, but they did not control for students' prior academic achievement. Smyth (2008a) also controlled for homework and study time at home but did not find any effect of private tutoring.
Coming back to Carroll's model, not all students are expected to profit equally from private tutoring lessons. Students who are more motivated use the learning time available during tutoring lessons as active learning time and are therefore more likely to improve their academic achievement. Indeed, Kuan (2011) found a positive interaction effect between tutoring attendance and students' motivation to attend tutoring, while others (Guill & Bonsen, 2010; Guill & Bos, 2014; Guill et al., 2020a) did not find evidence for an interaction between tutoring attendance and students' motivation to learn or their motivation for a specific subject. However, such a general measure of a student's motivation or interest in a specific subject might be too vague a measure of the student's motivation to engage in private tutoring lessons. In line with Carroll, we therefore expected to find a positive interaction between students' motivation to engage in private tutoring and the amount of learning time made available by private tutoring.
Furthermore, according to Carroll's model, more able students or students with higher prior knowledge should be able to profit more effectively from the learning time available as they have a more differentiated knowledge structure, which helps them to integrate new learning material. However, contrary to this claim, in some previous studies, a higher level of prior knowledge was associated with less success achieved through tutoring (Guill & Bonsen, 2010; Guill & Bos, 2014; Kuan, 2011). Guill et al. (2020a), who also found a negative interaction between private tutoring and prior grades, suggested that tutors and students invest more effort in the case of low-performing students if the risk of grade repetition is high.
Based on our adaptation of Carroll's model to private tutoring, we expected to find a positive relationship between academic achievement and private tutoring duration (Hypothesis 1a), between academic achievement and private tutoring intensity (Hypothesis 1b), and between academic achievement and the interaction of both private tutoring duration and intensity (Hypothesis 1c), at least when changes in out-of-school learning time were controlled for. We expected this effect to be stronger for students with higher prior academic achievement (Hypothesis 2a) and for students who were more motivated to attend private tutoring lessons (Hypothesis 2b).

Instructional foci of private tutoring lessons
More time-intensive or longer-term tutoring is not necessarily just "more of the same." Tutoring lessons may have very different instructional foci depending on the conceptions of the tutor and the tutees. Based on earlier descriptive findings on the content of tutoring lessons in Germany (Rudolph, 2002), we differentiated between four core areas of tutoring lessons.
(1) Tutoring may focus on homework completion. Either the tutees do their homework during private tutoring with the tutor's assistance or the tutor checks the tutee's homework. In both cases, the tutor assists the student to fulfill the basic requirements of school lessons. Furthermore, the student can be assured that his or her homework is correct and given confidence to present it in school. Both factors can improve the student's grades. However, if the tutor corrects the student's homework instead of focusing on the student's understanding of the subject matter, private tutoring will not result in improved academic achievement as measured by achievement test scores. Homework support from tutors that can be classified as controlling and interfering even has negative effects on students' homework-related study behavior (Guill et al., 2020b).
(2) Tutoring may focus on material currently being taught in the classroom and on test preparation. If the tutee lacks a deeper understanding of the subject matter, the tutor may train basic mathematical algorithms or grammatical rules, for example, which allow the tutee to master a substantial part of teacher-designed tests and, therefore, to reach a sufficient grade and to avoid grade repetition. In this case, tutoring may result in better grades although the effects on students' achievement in academic tests (which focus on a deeper understanding and cover a broader range of the curriculum) may be small or even nonexistent.
(3) In contrast, tutoring might also focus on filling gaps in prior knowledge (Thomas et al., 2006). Prior knowledge is essential for students to integrate new learning content, to revise, and to actively construct new knowledge structures. Prior knowledge is therefore a strong predictor of academic achievement (Helmke & Schrader, 2001). If a tutee struggles in school for a long time, his or her prior knowledge is no longer sufficiently elaborated to integrate new content. If tutoring focuses especially on improving prior knowledge, this might not immediately result in better grades but might become visible in achievement tests, which focus not only on current lesson content but also on a broader range of the curriculum.
(4) Last but not least, in the long run, students should ideally be able to master their learning process independently. To reach this goal, tutoring lessons may focus on developing study behavior and learning strategies that are requirements for self-regulated learning processes (Helmke & Schrader, 2001; Ramdass & Zimmerman, 2011). In this case, short-term effects on grades and test achievement might be low but effects on students' cognitive and metacognitive learning strategies can be expected.
While short-term tutoring most probably mainly focuses on short-term goals such as improving the grade of the next and maybe decisive test, long-term tutoring offers more time for long-term goals such as improving the tutee's prior knowledge and study behavior. To the best of our knowledge, however, no research so far has systematically looked at correlations between these various thematic priorities and the duration and intensity of private tutoring, nor at whether such priorities have a differential effect on successful learning. According to qualitative analyses, tutors are aware that tutoring may be more effective in the long run if it focuses on closing gaps in prior knowledge (Mischo & van Kessel, 2005), and tutors therefore sometimes take this approach (Thomas et al., 2006).
Summing up our argumentation, we expected private tutoring that puts a focus on homework completion and test preparation to improve mainly the students' grades (Hypotheses 3a and 3b), private tutoring that puts a focus on prior knowledge to improve mainly the students' test achievement (Hypothesis 3c), and private tutoring that puts a focus on study behavior and learning strategies to improve the students' cognitive and metacognitive learning strategies (Hypothesis 3d).
We expected to find this general pattern of results across all tutoring subjects. Nevertheless, in line with research on the effects of classroom instruction, which is usually subject-specific (Baumert et al., 2010), we believed that it would be useful to analyze the effects of the instructional foci of private tutoring separately for each subject. Grading practices differ between subjects and might influence the instructional foci of tutoring lessons as well as their effects. Similarly, achievement tests, such as those used in large-scale assessments, differ between subjects. They do not capture every aspect of the curriculum and might be differentially sensitive to the instructional foci of private tutoring lessons. We therefore expected subject-specific analyses to be more capable of revealing the effects of the instructional foci of private tutoring.

Sample
Our analyses relied on two consecutive waves of the KESS Study (Competencies and Attitudes of Students), namely KESS 7 at the beginning of grade 7 in 2005 (Bos et al., 2009) and KESS 8 at the end of grade 8 in 2007 (Bos & Gröhlich, 2010). A complete student cohort of about 14,000 students in the city of Hamburg, a metropolis in Northern Germany, took part in this study. Participation in the questionnaire was voluntary for students and their parents, whereas the achievement tests in various subjects were obligatory for the students. Students attending schools for children with special educational needs and students who did not provide information on private tutoring were excluded from further analyses. Therefore, our analytic sample consisted of N = 8510 students.

Private tutoring
After a screening question on ongoing private tutoring ("Are you currently attending private tutoring [in German: Nachhilfeunterricht]?"), the eighth-grade students reported for each subject, mathematics (M), English (E), German (both as a native language and as a second language: G), and learning strategies (LS), for how long they had been attending private tutoring and for how many hours per week they were currently attending private tutoring. Tutoring duration was measured by four categories: up to 3 months, 4 to 6 months, 7 to 9 months, and more than 9 months. Tutoring intensity was classified into three categories: up to 1 h per week, up to 2 h per week, and more than 2 h per week. Information on tutoring in German or German as a second language was integrated into one variable by using the larger value on the original variables.
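The coding described above could be sketched as follows. The integer codes, the dictionary names, and the handling of missing answers (None) are assumptions made for illustration; the text only specifies the category labels and the rule of taking the larger value for German.

```python
# Hypothetical integer codes for the response categories named in the text.
DURATION_LABELS = {
    1: "up to 3 months",
    2: "4 to 6 months",
    3: "7 to 9 months",
    4: "more than 9 months",
}
INTENSITY_LABELS = {
    1: "up to 1 h per week",
    2: "up to 2 h per week",
    3: "more than 2 h per week",
}


def combine_german(native, second_language):
    """Merge the two German tutoring variables by taking the larger code.

    None marks a missing answer (an assumption of this sketch); if both
    answers are missing, the combined value stays missing.
    """
    observed = [v for v in (native, second_language) if v is not None]
    return max(observed) if observed else None


# Example: 4 to 6 months in German, more than 9 months in German as a
# second language, so the combined variable takes the larger code.
combined = combine_german(2, 4)
```

Taking the larger value means that a student tutored in both variants is represented by the longer (or more intensive) of the two arrangements.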
Parents stated in the KESS 8 parents' questionnaire whether, and if so, for how long their child had been attending private tutoring in mathematics, English, German, or learning strategies in the past 2 years, namely in grade 7 or 8. Following Guill & Bos (2014), this information was dummy-coded to no (0) or any private tutoring (1) in the past for those students who did not mention any ongoing private tutoring.
The tutoring items did not explicitly focus on paid private tutoring but rather relied on an intuitive understanding of the German term "Nachhilfeunterricht", which mostly refers to paid private tutoring. The typical setting in Germany is one-to-one tutoring or tutoring in very small groups of not more than five students. The tutoring setting was not further differentiated in our study.
Concerning activities during tutoring lessons, students stated for one tutoring subject whether each of the following activities was done not at all, rarely, sometimes, or often: doing homework, checking homework (homework scale: α M = 0.51; α E = 0.73; α G = 0.65), concentration training and training of study techniques (scale on learning techniques: α M = 0.76; α E = 0.70; α G = 0.58), going over previous tests, preparing for future tests, training content from current lessons and (in the case of language tutoring) vocabulary training (scale on current subject material: α M = 0.72; α E = 0.55; α G = 0.80), and repetition of earlier lessons (item on prior knowledge). If the tutoring content could not be attributed to one subject, it was recoded to missing; unfortunately, this resulted in quite high missing rates (see 3.4). Furthermore, students stated their motivation to attend tutoring classes on a 3-item scale (e.g., "I get private tutoring lessons because I am not satisfied with my school achievement"; α = 0.80). Responses were given on a 4-point Likert scale and were reverse-coded (1 = does not apply at all to 4 = fully applies). Students estimated changes in their learning time at home since the start of their tutoring lessons (e.g., "Since I started to attend private tutoring, I have spent less [1]/ an equal amount of [2]/ more [3] time doing my homework at home"; α M = 0.77; α E = 0.79; α G = 0.73).
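The reliability coefficients reported above are Cronbach's α values. As a reminder of what this coefficient measures, a minimal sketch of the standard formula is given below; the function name and the toy data are ours, and the sketch is not the software that produced the reported values.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    `items` is a list of item-score lists, one inner list per item, with
    respondents in the same order in every inner list (complete cases only).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Scale score per respondent: sum over items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Toy two-item scale (e.g., doing homework, checking homework) for three
# hypothetical respondents on the 4-point frequency scale.
alpha = cronbach_alpha([[1, 2, 3], [1, 3, 2]])
```

With only two or three items per scale, as here, α tends to be low even when the items are reasonably correlated, which is one way to read the modest values for some of the scales.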

Outcome variables
Academic achievement was measured through subject-specific school grades and achievement tests. These two measures differ in their representation of academic achievement. Grades are based on students' performance in teacher-designed tests and during oral discussions in the classroom. They are influenced by student characteristics such as study habits, effort, or classroom behavior, but standardized test scores are not and are therefore more objective (Marsh et al., 2005). However, grades are the only feedback students get regarding their achievement and they are a decisive factor for grade repetition. The grades in mathematics, German, and English at the end of grade 8 were reported by the teachers and reverse-coded for analyses (1 = fail to 6 = very good). The mathematics achievement test was composed of different subdomains, for example, arithmetic, algebra, and geometry; for German, a reading test was administered. In English, students' knowledge of vocabulary, spelling, syntax, and grammar was assessed through a cloze test in which students had to replace the missing half of every fourth word in three different texts (Bos & Gröhlich, 2010). The achievement test scores were scaled via a two-parameter logistic IRT (item response theory) model. Weighted likelihood estimates (WLEs) were calculated for English, reading, and mathematics (Feddermann et al., 2019). To measure the students' learning strategies, a short version of the WLST 7-12 (Würzburg reading strategies knowledge test for grades 7 to 12; Schlagmüller & Schneider, 2007) was used in KESS 8, measuring the students' knowledge about the relative superiority of different reading strategies. Unfortunately, the instrument was not available in KESS 7.
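Two of the steps described above lend themselves to a short illustration: the reverse-coding of German school grades and the item response function of a two-parameter logistic model. The sketch below uses conventional parameter names (theta for ability, a for discrimination, b for difficulty) that are not taken from the KESS documentation, and it does not reproduce the estimation of the weighted likelihood estimates.

```python
import math


def reverse_code_grade(grade):
    """Map a German school grade (1 = very good ... 6 = fail) onto the
    reversed scale used in the analyses (1 = fail ... 6 = very good)."""
    if grade not in range(1, 7):
        raise ValueError("German grades run from 1 to 6")
    return 7 - grade


def p_correct_2pl(theta, a, b):
    """Probability of a correct response under a two-parameter logistic
    IRT model with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# A student whose ability equals the item difficulty has a 50% chance of
# solving the item, whatever the discrimination.
p_at_difficulty = p_correct_2pl(theta=0.3, a=1.4, b=0.3)
```

After reverse-coding, higher values mean better achievement for both grades and test scores, so regression coefficients for the two outcomes can be read in the same direction.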

Covariates
We controlled for students' prior knowledge by using the subject-specific midterm grades in the same school year (reverse-coded: 1 = fail to 6 = very good). In Germany, the school year is subdivided into two terms. Students are graded for their performance during the first term with midterm grades. Insufficient midterm grades are therefore often a reason to start tutoring.
Furthermore, we controlled for students' prior knowledge by using the test scores at the beginning of grade 7. We added the subject-specific average test score of each school as a covariate because students' learning outcomes are influenced by the ability composition of their classroom and school (Boonen et al., 2014). Basic cognitive abilities were measured in grade 7 by the figural analogies subscale of the revised cognitive abilities test for grades 4 to 12 (KFT 4-12 + R; Heller & Perleth, 2000). In the early 2000s, the Hamburg school system consisted of numerous tracks: Some students attended lower or intermediate secondary school tracks (Hauptschule or Realschule) or an integrated form of these two tracks (Integrierte Haupt-und Realschule, IHR). Others attended comprehensive schools (Gesamtschule). In both tracks, IHR and comprehensive school, students attended courses based on their abilities in the different school subjects. The academic track (Gymnasium) qualified students for university admission. For each track, we used a dummy variable. Subject-specific academic self-concept, work habits, and interest in mathematics were assessed at the beginning of grade 7. Academic self-concept scales (Marsh, 1990) were adapted from PISA (Programme for International Student Assessment) 2000 for the subjects mathematics (α = 0.92), German (α = 0.75), and English (α = 0.79; e.g., "In the school subject English, I learn quickly"). Work habits in mathematics (α = 0.85), German (α = 0.83), and English (α = 0.85) were assessed with five items each (e.g., "I try to participate in mathematics classes"; Trautwein, 2003). Interest in mathematics was measured with six items (e.g., "I like to calculate using large numbers"; α = 0.83; Schiefele, 1996) and performance anxiety with four items (α = 0.80; e.g., "I get frightened when the teacher says that we will write a test"; Valtin et al., 2000). 
Responses were given on a 4-point Likert scale (1 = completely disagree to 4 = completely agree). We further controlled for gender (male = 0, female = 1). Concerning the migration background, two dummy variables were coded, indicating whether one or both parents were born abroad (no = 0, yes = 1). Educational background was measured via parents' secondary school degree, that is, whether they had graduated from the academic track or not (no Abitur = 0, Abitur = 1) and via their higher education (no university degree = 0, university degree = 1). The socioeconomic status of the family was provided as the highest International Socio-Economic Index of Occupational Status (HISEI; Ganzeboom et al., 1992).

Analyses
Our analyses were conducted in two steps: (1) First, the effect of tutoring duration was analyzed with the data of all students, using the nontutored students as the reference group.
(2) Second, we used the subsample of tutored students in each subject and analyzed the effect of tutoring duration, intensity, and content. Furthermore, we checked the effect of the interaction terms between students' prior knowledge, tutoring duration, and tutoring intensity, as well as between students' tutoring motivation, tutoring duration, and tutoring intensity. Grades and test achievement were centered at the sample mean of tutored and nontutored students and tutoring motivation was centered at the sample mean of tutored students to ease the interpretation of interaction effects. We analyzed tutoring effects on grades and test achievement.
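The centering and the construction of interaction terms described above can be sketched as follows. The function names and the toy values are hypothetical, and leaving tutoring intensity uncentered is an assumption of this sketch, since the text only mentions centering for grades, test achievement, and tutoring motivation.

```python
from statistics import mean


def center(values, reference=None):
    """Subtract the mean of `reference` from each value.

    If no reference sample is given, the values are centered at their own
    mean. Passing a reference allows centering the tutored subsample at the
    mean of the full sample (tutored and nontutored students).
    """
    m = mean(values if reference is None else reference)
    return [v - m for v in values]


def product_term(x, z):
    """Element-wise product used as an interaction term in the regression."""
    return [xi * zi for xi, zi in zip(x, z)]


# Toy data: prior grades in the full sample and in the tutored subsample,
# plus the intensity code of each tutored student.
prior_grade_all = [2, 3, 4, 5, 3, 4]
prior_grade_tutored = [2, 3, 3]
intensity_tutored = [1, 2, 3]

grade_centered = center(prior_grade_tutored, reference=prior_grade_all)
grade_x_intensity = product_term(grade_centered, intensity_tutored)
```

With a centered moderator, the main effect of tutoring intensity refers to a student with average prior achievement rather than to a student with a (nonexistent) grade of zero.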
Due to the multilevel structure of our data with students nested in classrooms and schools, we used the "type = complex" command in Mplus 8 (Muthén & Muthén, 1998-2017) to avoid an underestimation of the standard errors of the coefficients. We report standardized regression coefficients (stdyx for continuous predictors and stdy for binary predictors).
For each subject and each dependent variable, we tested the effect of 13 tutoring characteristics (nine for the tutoring of learning strategies), resulting in a substantial risk of alpha inflation. Therefore, we also calculated corrected p values by applying the Bonferroni-Holm method (Hemmerich, 2019; Holm, 1979).
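The two standardizations mentioned above differ only in whether the predictor is standardized as well. The sketch below follows the usual definitions of these coefficients; it is not Mplus code, the function names are ours, and it does not reproduce the cluster-robust standard errors.

```python
def standardize_yx(b, sd_x, sd_y):
    """Fully standardized slope: change in y (in SD units) for a one-SD
    change in a continuous predictor x."""
    return b * sd_x / sd_y


def standardize_y(b, sd_y):
    """Outcome-standardized slope: change in y (in SD units) when a binary
    predictor changes from 0 to 1."""
    return b / sd_y


# Hypothetical unstandardized slope of 0.5 with SD(x) = 2 and SD(y) = 4.
beta_continuous = standardize_yx(0.5, sd_x=2.0, sd_y=4.0)
beta_binary = standardize_y(0.5, sd_y=4.0)
```

For a binary predictor such as a tutoring dummy, the outcome-standardized coefficient can be read directly as the group difference in standard deviations of the outcome.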
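For readers unfamiliar with the Bonferroni-Holm method, a minimal sketch of the step-down adjustment is given below. The function name and the example p values are illustrative only and are not taken from our results.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p values in the original order.

    The smallest p value is multiplied by m, the next by m - 1, and so on;
    adjusted values are capped at 1 and forced to be non-decreasing along
    the sorted order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Three hypothetical tests of tutoring characteristics.
adjusted = holm_adjust([0.01, 0.04, 0.03])
```

With 13 tests per subject and outcome, the smallest p value is multiplied by 13, so an uncorrected p value has to fall below roughly 0.05 / 13 for the first comparison to remain significant at the 5% level.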
As the covariate distribution of tutored and nontutored students showed a large area of common support (Ömeroğulları et al., 2020), propensity score matching was not conducted before the regression analyses.

Missing data
Using the procedure described in an earlier paper (Ömeroğulları et al., 2020), missing data in the dependent and independent variables were mostly handled using multiple imputation with the mice package (van Buuren & Groothuis-Oudshoorn, 2011) and miceadds (Robitzsch et al., 2016) in R. We performed multilevel imputation on the school level for test scores and grades (n = 10 datasets with 50 iterations). The percentage of missing data varied from 0.01 to 81.5% (the latter percentage is for the variables on work habits), and the average missing rate was 21.3% in the sample of tutored and nontutored students. Missing values were due to the nonparticipation of students in the questionnaire, the multimatrix design of the items, for example, on work habits, and individual nonresponse. The results of the statistical analysis were averaged over 10 imputed data sets according to Rubin's rules (1987). Missing data on tutoring items (duration, intensity, content, and motivation) were not imputed but were handled using full information maximum likelihood estimation (FIML; Enders, 2010) integrated in Mplus. Missing rates on these items were very high, ranging from a mean rate of 42.3% in mathematics to 53.9% in learning strategies, 60.1% in English, and 68.0% in German.
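Pooling according to Rubin's rules combines the within-imputation and between-imputation variance of an estimate. The following sketch shows the rule for a single coefficient; the function name and the three example estimates are hypothetical, and the actual pooling in our analyses was carried out by the statistical software.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, standard_errors):
    """Pool one coefficient over m imputed data sets.

    Returns the pooled estimate and its standard error, where the total
    variance is the mean squared standard error (within) plus
    (1 + 1/m) times the variance of the estimates (between).
    """
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need matching estimates and SEs from >= 2 data sets")
    pooled = mean(estimates)
    within = mean([se ** 2 for se in standard_errors])
    between = variance(estimates)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical coefficient estimated in three imputed data sets.
pooled_estimate, pooled_se = pool_rubin([0.1, 0.2, 0.3], [0.05, 0.05, 0.05])
```

The pooled standard error is larger than any single-data-set standard error whenever the estimates differ across imputations, which is how the uncertainty due to missing data enters the significance tests.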

Descriptive results
The proportion of students who reported that they were currently attending private tutoring was 18.6%. On a descriptive level, in all subjects, nontutored students outperformed tutored students before and after tutoring (see Table 1) regarding grades, test achievement, and general cognitive abilities. Mean grades before and after tutoring remained stable in both groups. On average, tutored students showed lower motivation in terms of work habits and interest, were found to attend all school tracks, and had the same socioeconomic status in mathematics and English tutoring and a slightly lower socioeconomic status in German and learning strategy tutoring than the nontutored students. Except for tutoring in mathematics, on average, the parents of nontutored students had lower academic degrees than those of tutored students. Girls attended private tutoring in mathematics more often than boys, while boys attended tutoring in German and learning strategies more often than girls. Attendance of English tutoring was the same for boys and girls.
In most cases, students attended private tutoring for up to 6 months and for not more than 2 h per week. They described their motivation to attend tutoring lessons as rather high and, on average, they did not change their learning time at home. The instructional foci were mostly homework and current and earlier lesson content and less often learning strategies. For more details, see Table 2.
The correlations between most of the variables were rather low (see Tables A.1 to A.4 in the supplemental online material), but achievement-related variables such as grades and test achievement had moderate to high correlations. Correlations between the scales on tutoring content and tutoring duration or tutoring intensity were all below |r| = .20.

Summary of the effects of private tutoring
Even after controlling for multiple covariates, tutored students showed significantly lower test achievement than nontutored students in all subjects (see Table 3). With regard to grades, students tutored for up to 6 months had lower grades in mathematics than nontutored students, but there was no relationship between tutoring and grades in the other subjects. Tutoring in learning strategies was not related to reading strategy knowledge.

Tutoring characteristics
Within the subsamples of tutored students, we found hardly any systematic effects (see Table 4 and Tables B.1 to B.6 in the supplemental online material). The models were sequenced as follows: We first tested for the main effects of private tutoring (PT) intensity and duration in separate models and in a common model (Model 1: Hypotheses 1a and 1b). Second, we tested for the main effects of instructional foci in separate models and a common model (Model 2: Hypotheses 3a-d). Third, we tested for the interaction between PT duration and PT intensity (Model 3: Hypothesis 1c), the interaction between PT duration/intensity and prior achievement (Model 4: Hypothesis 2a), and the interaction between PT duration/intensity and tutoring motivation (Model 5: Hypothesis 2b). In nearly all cases (for an exception see below) neither PT duration, nor PT intensity, nor PT motivation was related to grades or test achievement at the end of grade 8. No instructional focus of the tutoring lessons had any systematic relationship with the dependent variables.
Concerning mathematics test achievement (see Table B.2), we found a positive interaction between PT duration and PT intensity (β = 0.16), which means that the combination of long-term and intensive tutoring was especially effective (Hypothesis 1c), but the main effect of PT intensity was negative (β = − 0.12). Motivation to attend tutoring had a positive main effect (β = 0.15) after the interaction of motivation and PT intensity was controlled for (β = − 0.11, p > 0.05). Concerning end-of-year grades in English (see Table 4), we found a positive interaction between prior achievement and PT intensity (β = 0.19), which means that especially high-achieving students profited from more intensive private tutoring lessons.
Concerning private tutoring in German (see Table B.5), we found a negative main effect of tutoring intensity (β = − 0.10) in Model 1. Most interestingly, focusing on homework during private tutoring lessons was also negatively related to achievement in the reading test (β = − 0.46).
Concerning the control variables, as was to be expected, midterm grades were the strongest predictor of end-of-year grades. Test achievement in grade 7 predicted grades and test achievement in grade 8. The effects of the motivational variables were rather small and, in many cases, not significant, but this was in line with earlier findings. Family background did not have an effect either on grades or on test achievement. Girls had higher grades and test achievement in English and better reading strategy knowledge. There was no gender effect either on mathematics or on German.
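To make the logic of the interaction models (Models 3 to 5) concrete, the following is a minimal sketch of a moderated regression. It is not the analysis code used in this study. It fits y = b0 + b1·duration + b2·intensity + b3·(duration × intensity) by ordinary least squares and then computes the simple slopes of intensity at short and long duration. The function name `ols`, the data, the coding of both predictors as −1/0/1, and the coefficient values (0.1, −0.1, 0.2) are illustrative assumptions; covariates, missing-data handling, and significance tests are omitted.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))]
         for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * p for x, p in zip(A[r], A[col])]
    return [A[r][k] / A[r][r] for r in range(k)]

# Made-up, noise-free data on a 3 x 3 grid of centered predictor values.
levels = [-1.0, 0.0, 1.0]
grid = [(d, i) for d in levels for i in levels]

# Hypothetical "true" coefficients, chosen only for illustration.
B_DURATION, B_INTENSITY, B_INTERACTION = 0.1, -0.1, 0.2
y = [B_DURATION * d + B_INTENSITY * i + B_INTERACTION * d * i for d, i in grid]

# Design matrix: intercept, duration, intensity, duration x intensity.
X = [[1.0, d, i, d * i] for d, i in grid]
b0, b_dur, b_int, b_inter = ols(X, y)

# Simple slopes: the effect of intensity depends on the level of duration.
slope_intensity_short = b_int + b_inter * (-1.0)
slope_intensity_long = b_int + b_inter * 1.0
print(f"interaction = {b_inter:.2f}")
print(f"intensity slope, short duration = {slope_intensity_short:.2f}")
print(f"intensity slope, long duration = {slope_intensity_long:.2f}")
```

In this toy example, a negative coefficient for intensity combined with a positive interaction yields a negative slope of intensity for short-term tutoring and a positive slope for long-term tutoring. A coefficient pattern of this kind would be read in the same way.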

Summary of findings
In our data set, 18.6% of eighth-grade students were currently (i.e., in 2007) attending private tutoring. This proportion is in line with the findings of Hille et al. (2016), who reported that 19% of eighth-grade students attended paid private tutoring in the last 6 months of 2013. So, even though our tutoring items did not explicitly ask about paid private tutoring, this number implies that, in most cases, we captured paid private tutoring.
In line with several previous studies with the same and similar data sets (Guill & Bos, 2014; Guill et al., 2020a, b; Ömeroğulları et al., 2020), we did not find any positive effects of private tutoring on academic achievement, either on grades or on test achievement. Nontutored students outperformed tutored students both before and after several months of tutoring. This was the case although we controlled for numerous achievement-related covariates, systematically differentiating between tutored and nontutored students. Furthermore, in detailed analyses, we did not find any effects of tutoring intensity, tutoring duration, or students' motivation to attend tutoring. With the exception of a negative effect of a focus on homework during German tutoring on reading achievement, we did not find any effect of specific aspects of tutoring content. After alpha inflation was corrected, neither the homework effect nor any interaction effects between tutoring and students' characteristics reached statistical significance. Even before the correction of alpha inflation, the analyses of the interaction effects did not reveal a clear pattern.
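How a correction for alpha inflation can remove nominally significant effects is illustrated by the following minimal sketch. It uses the Holm-Bonferroni step-down procedure, which is an assumption made here for illustration: this section does not state which correction was applied. The function name and the p-values are hypothetical and are not taken from our tables.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one reject/retain decision per p-value (Holm step-down)."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda k: p_values[k])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest p-value is tested against alpha/m, the next against
        # alpha/(m-1), and so on; stop at the first non-rejection.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject

# Hypothetical p-values for four tests of one family of hypotheses.
p_values = [0.004, 0.030, 0.020, 0.600]
uncorrected = [p <= 0.05 for p in p_values]
corrected = holm_bonferroni(p_values)
print("uncorrected:", uncorrected)
print("Holm-corrected:", corrected)
```

In this made-up example, three of the four tests fall below 0.05 before correction, but only the smallest p-value survives the step-down procedure.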

Strengths and limitations
Given that, in sum, our findings are disillusioning, a detailed analysis of the specific strengths and limitations of our study and data analyses is important. A major strength of our study is that we were able to analyze a large longitudinal data set with substantial sample sizes of tutored students in several school subjects and even in learning strategies. We had information about tutoring duration and tutoring intensity. Hardly any studies up until now have measured the amount of private tutoring students receive in this much detail.
A further strength of our study is that the data set included variables on tutoring content. We are not aware of another large-scale study that has such a detailed description of tutoring content. However, we were unable to identify any effects of this tutoring content. On a methodological level, this might be because of extremely high missing data rates on tutoring content. Students were asked to choose only one tutoring subject to rate the tutoring content but often failed to follow these instructions and chose at least two subjects. In that case, we could not assign statements about tutoring content to any subject and therefore lost these data. In consequence, the estimates of the effects of tutoring content had high standard errors. Future studies using digital tools could technically force the students to restrict themselves to just one subject. Furthermore, the reliabilities of some tutoring content scales were rather low (< 0.60). In future studies, this problem could be solved by adding more items to each scale. However, on a theoretical level, one could discuss whether tutoring content is sufficient to describe tutoring quality. For example, homework support during tutoring can vary a lot, being either autonomy-supportive or controlling, and this has differential effects on students' study behavior (Guill et al., 2020b). Classroom instruction can be described by three dimensions of instructional quality: structure, cognitive activation (or challenge), and support. This model has been successfully adapted to characterize tutoring lessons (Guill et al., 2020a; Bäumer et al., 2011), but a valid measure of cognitive activation during tutoring is still missing. Additionally, future studies should measure the tutoring setting and the tutor's qualification as both aspects might influence tutoring quality.
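The suggestion above to lengthen scales with low reliability can be made concrete with the Spearman-Brown prophecy formula, which predicts the reliability of a scale lengthened by a factor n as nρ / (1 + (n − 1)ρ). The sketch below is illustrative only: it assumes that the added items are parallel to the existing ones, and the starting reliability of 0.55 and the target of 0.70 are hypothetical values, not estimates from our scales.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a scale is lengthened by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

def required_length_factor(reliability, target):
    """Lengthening factor needed to reach the target reliability."""
    return (target * (1 - reliability)) / (reliability * (1 - target))

current = 0.55  # hypothetical reliability below the 0.60 threshold
doubled = spearman_brown(current, 2)  # reliability with twice as many items
factor_for_070 = required_length_factor(current, 0.70)
print(f"doubled scale: {doubled:.2f}")
print(f"lengthening factor for 0.70: {factor_for_070:.2f}")
```

Under these assumptions, roughly doubling the number of items would lift a scale with a reliability of 0.55 to about 0.70.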
Based on Carroll's (1963) model of school learning, we identified students' motivation to engage in private tutoring as a moderator of tutoring effects, but we were unable to detect such an effect. Our measure of tutoring motivation focused on the decision to start private tutoring. However, a scale focusing on students' willingness to engage during tutoring lessons might be more helpful to capture the effects of student motivation. This willingness to engage should in turn be related to more general predictors of motivated behavior such as students' self-efficacy, their causal attributions, and their goals (Wigfield & Eccles, 2000).
As also implied by Carroll's model, we tried to control for changes in students' out-of-school learning time. However, this measure depends not only on the students' capacity to evaluate their current learning time but also on their capacity to evaluate the development of that time over a time span of several months and, therefore, might be of limited validity. Deriving changes in learning time from several measures of current learning time before and during tutoring would be a feasible alternative.
Furthermore, although we controlled for systematic differences in prior knowledge between tutored and nontutored students and also within the group of tutored students, there might also be systematic differences concerning cognitive and metacognitive learning strategies in favor of nontutored or less tutored students. As long as these are not controlled for, tutoring effects might be masked.
Information on students' learning environments outside school and tutoring lessons was limited. Students might regularly be tutored by an older sibling or parent. Theoretically, this support within the family might have an impact on students' achievement similar to that of private tutoring. However, even after controlling for the quality of parental homework support, Guill et al. (2020b) were not able to detect positive effects of private tutoring.
Last but not least, as in all studies using large-scale assessments to analyze the effects of private tutoring attendance, no data are available on short-term declines in academic achievement or motivation just before the decision was made to start private tutoring. Furthermore, smaller improvements within one grade, for example, C(−) to C(+), cannot be measured and general achievement tests might not be sensitive enough to capture small improvements.

Future research outlook
A more detailed diagnosis of students' achievement development when they attend private tutoring is desirable. This was provided by, for example, Mischo and Haag (2002), who used grades in teacher-designed tests instead of end-of-term grades. We still believe that it is important to measure tutoring quantity precisely as tutoring can only have an effect if it is attended for a minimum amount of time.
Concerning the problem of systematic differences between tutored and nontutored students that we did not control for, more experimental designs are desirable to evaluate the effect of private tutoring. However, students can hardly be randomly assigned to tutoring or nontutoring groups while ignoring the families' preferences. Therefore, giving tutoring vouchers to the experimental group, which is a random encouragement design (West et al., 2008), might be a worthwhile alternative.
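How data from such a random encouragement design could be analyzed is sketched below. The sketch is a simplified illustration under standard instrumental-variable assumptions, not a proposal for a specific analysis. Vouchers are assigned at random, the intention-to-treat (ITT) effect compares outcomes by voucher offer, and the Wald estimator divides the ITT effect by the voucher-induced difference in tutoring attendance to estimate the effect for students who attend tutoring only because of the voucher. All records, the variable names, and the built-in effect of 5 points are made up.

```python
from statistics import mean

# Hypothetical records: (voucher_offered, attended_tutoring, outcome).
# Outcomes are constructed as 50 + 5 * attended, so the built-in effect is 5.
records = (
    [(1, 1, 55.0)] * 6 + [(1, 0, 50.0)] * 4    # voucher group: 60% attend
    + [(0, 1, 55.0)] * 2 + [(0, 0, 50.0)] * 8  # control group: 20% attend
)

def group_mean(rows, voucher, column):
    """Mean of one column among students with the given voucher status."""
    return mean(r[column] for r in rows if r[0] == voucher)

# Intention-to-treat effect: outcome difference by random voucher offer.
itt = group_mean(records, 1, 2) - group_mean(records, 0, 2)
# First stage: how much the voucher raised tutoring attendance.
uptake = group_mean(records, 1, 1) - group_mean(records, 0, 1)
# Wald (instrumental-variable) estimate of the effect among compliers.
wald = itt / uptake
print(f"ITT = {itt:.2f}, uptake difference = {uptake:.2f}, Wald = {wald:.2f}")
```

Because only some families take up the offer, the ITT effect is smaller than the effect of attending tutoring itself. Rescaling by the difference in attendance recovers the built-in effect in this toy example.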
Research on tutoring content and quality is still in its infancy. As a next step, we propose that multiple perspectives should be integrated, that is, not only the students' but also the tutors' (or even observers') reports of tutoring quality. Given these limitations to our measures of tutoring motivation and students' out-of-school learning time, we do not feel it is appropriate to discuss the implications of our results for Carroll's model of school learning. However, as there are students who reduced their learning time at home, private tutoring should not only be conceptualized as additional learning time but also as differently spent learning time, that is, learning time spent with a tutor instead of alone. This is an additional reason to focus more on tutoring quality. Furthermore, the focus should be broadened to include more dependent variables. Although the improvement of academic achievement might be the main goal of attending private tutoring, tutoring may also have effects on students' motivation and their satisfaction with school and family life (Guill et al., 2020a). Given the existing variety of tutoring settings, all results should be validated with samples from different age cohorts, for additional school subjects, and, concerning the international context, in different cultural contexts.

Educational implications for parent counseling and education policy
To date, neither the current state of research (Guill et al., 2020a; Park et al., 2016) nor our analyses provide enough evidence to support a recommendation of private tutoring as a generally effective strategy to improve academic achievement. Even after several months of private tutoring, tutored students did not catch up with nontutored students. If the aim of private tutoring is for students to improve at school, students and parents should closely monitor whether the expected improvement really occurs. The granting of public subsidies for private tutoring with voucher programs and similar measures (Bray, 2009) cannot be supported by our findings and should be weighed up carefully against structured, free-of-charge tutoring programs that take place within school as these have been shown to be an effective measure to improve reading and mathematics achievement (Pellegrini et al., 2018; Slavin et al., 2011) independent of the parents' willingness and potential to pay for private tutoring lessons.