1 Introduction

The spread of the virus causing coronavirus disease 2019 (COVID-19) resulted in temporary school closures in many countries and regions, resulting in declines in student academic achievement.Footnote 1

Some studies have used meta-analysis and systematic review methods to show that the negative impact of the COVID-19 pandemic on academic performance has varied widely by country/region (Betthäuser et al., 2023; Donnelly & Patrinos, 2022). The negative impact of COVID-19 on students’ test scores has been qualitatively confirmed in several countries and regions, but its magnitude has been shown quantitatively to vary across them. For example, Betthäuser et al. (2023) conducted a meta-analysis using estimates from 42 previous studies in 15 countries and reported that the COVID-19 pandemic led to a reduction in students’ test scores of 0.14 SDs on average. They also confirmed that socioeconomically disadvantaged students were more likely than other students to experience COVID-19-related declines in test scores and that test scores declined more in middle-income countries than in high-income countries.

However, evidence on the impact of COVID-19 on students’ academic performance in Japan remains limited. Therefore, this study estimates the effects of the COVID-19 pandemic, which was accompanied by the declaration of a state of emergencyFootnote 2 and temporary school closures to prevent the outbreakFootnote 3, on math and Japanese language test scores for Grade 1–8 students in all public elementary and junior high schools in Amagasaki City, Japan. Amagasaki City belongs to a metropolitan area adjacent to Osaka (the municipality with the second largest population out of the 1718 municipalities in Japan) and has the 43rd largest population in Japan. Notably, Amagasaki City has many households with low socioeconomic status (SES). For example, as of 2019, 22.9% of households had received school attendance assistance (national average: 14.71%), and 3.95% of households had received public assistance (national average: 1.13%).Footnote 4 The average household income in Amagasaki in 2019 (about JPY 4.62 million) was also lower than the averages for Hyogo Prefecture (about JPY 5.58 million) and Japan as a whole (about JPY 5.61 million).


To compare test scores for the same school grade across different cohorts, we use the Amagasaki City achievement test, which is vertically and horizontally equated using item response theory (IRT; Embretson & Reise, 2013). We standardize the test scores to compare effect sizes across grades and with those reported in previous studies. In Amagasaki City, the COVID-19 school closure period (March 2–May 31, 2020) lasted 12 weeks, so the intervals between the end of the closures and the 2020 and 2021 achievement tests used in the analysis were 7 and 19 months, respectively.
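The equating was carried out by the private test provider, and its procedure is not documented here. As a hedged illustration of what placing two test forms on a common scale involves, the Python sketch below implements Rasch mean-mean linking through common anchor items, one standard equating technique. The anchor-item design, the function names, and the numbers are assumptions for illustration, not the provider's method.

```python
from statistics import mean
from typing import List, Sequence


def mean_mean_linking_constant(anchor_b_new: Sequence[float],
                               anchor_b_ref: Sequence[float]) -> float:
    """Shift that places the new form's Rasch scale on the reference scale.

    Both arguments hold difficulty estimates for the SAME anchor items,
    calibrated separately on the new form and on the reference form.
    """
    if len(anchor_b_new) != len(anchor_b_ref) or not anchor_b_new:
        raise ValueError("need the same non-empty set of anchor items")
    return mean(anchor_b_ref) - mean(anchor_b_new)


def link_abilities(theta_new: Sequence[float], constant: float) -> List[float]:
    """Under the Rasch model, linking is a pure shift of the ability scale."""
    return [theta + constant for theta in theta_new]


# Hypothetical anchors: the same items sit 0.2 logits higher on the reference scale.
c = mean_mean_linking_constant([-0.5, 0.0, 0.5], [-0.3, 0.2, 0.7])
print(round(c, 3), [round(t, 3) for t in link_abilities([0.0, 1.0], c)])
```

Because the Rasch model has a single location parameter per item, a constant shift suffices; models with discrimination parameters (e.g., 2PL) would also require a slope adjustment.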

We use the difference-in-differences (DID) method to estimate the impact of the COVID-19 pandemic on students’ math and Japanese language performance 7 and 19 months after the school closures. To estimate the impact 7 months after the closures, we compare changes in academic performance from 2019 to 2020 in the COVID-19 cohort (treatment group 1; students who experienced the COVID-19 pandemic between the two tests) with changes from 2018 to 2019 in the non-COVID-19 cohort (control group 1; students tracked over two tests taken before the pandemic). Furthermore, to analyze the impact 19 months after the school closures, we compare changes from 2019 to 2021 for the cohort that took the test twice after the school closures (treatment group 2) with changes from 2018 to 2020 (control group 2). However, because the 2020 scores of control group 2 already reflect the effect of the COVID-19 pandemic, we remove this effect by subtracting the 7-month effect estimated from treatment group 1 and control group 1. To facilitate interpretation, in the analysis 7 months after the school closures, we examine three grade groups separately: lower elementary (Grade Groups 1–3), upper elementary (Grade Groups 4–5), and junior high school (Grade Groups 6–7). Owing to a lack of data, only Grade Group 6 represents junior high school in the analysis 19 months after the school closures.
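The two DID contrasts described above can be sketched with group means as follows. This is a minimal illustration, assuming that the 7-month effect contained in the 2020 scores of control group 2 equals the effect estimated from treatment group 1 and control group 1; the function names and numbers are hypothetical, and the actual estimation uses the regression specification described in Section 5.

```python
def did(treat_before: float, treat_after: float,
        ctrl_before: float, ctrl_after: float) -> float:
    """2x2 difference-in-differences computed from group mean scores."""
    return (treat_after - treat_before) - (ctrl_after - ctrl_before)


def effect_19_months(t2_2019: float, t2_2021: float,
                     c2_2018: float, c2_2020: float,
                     effect_7_months: float) -> float:
    """19-month effect: T2 growth (2019->2021) minus corrected C2 growth (2018->2020).

    C2's 2020 score already contains (by assumption) the 7-month pandemic
    effect, so that effect is subtracted from C2's growth before differencing.
    """
    corrected_ctrl_growth = (c2_2020 - c2_2018) - effect_7_months
    return (t2_2021 - t2_2019) - corrected_ctrl_growth


# Hypothetical mean standardized scores (not the paper's data).
delta_7 = did(treat_before=0.05, treat_after=-0.10, ctrl_before=0.0, ctrl_after=0.0)
delta_19 = effect_19_months(0.05, -0.20, 0.0, -0.15, delta_7)
print(round(delta_7, 3), round(delta_19, 3))
```

Writing the correction this way makes the identifying assumption explicit: if the 7-month effect for control group 2 differed from the one estimated for treatment group 1, the 19-month estimate would inherit that discrepancy.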

We further perform quantile-DID analyses and DID analyses with interaction terms to check for heterogeneity in the effects across quantiles of test scores and across predetermined variables, including sex and pre-COVID-19 SES. We also conduct a triple-difference (DDD) estimation to identify heterogeneity in the effects between students in schools that substantially reduced athletic events after the school closures and students in other schools.
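A triple difference compares the DID estimate for school-grades that sharply reduced athletic events with the DID estimate for the remaining school-grades. The sketch below computes it from eight cell means, assuming a binary reduction indicator; the dictionary layout, names, and numbers are hypothetical, and in practice such an estimate would typically come from a regression with a three-way interaction and controls.

```python
from typing import Dict, Tuple

# (treated, after, reduced) -> mean standardized score; each flag is 0 or 1.
CellMeans = Dict[Tuple[int, int, int], float]


def did_within(means: CellMeans, reduced: int) -> float:
    """Cohort DID, holding the athletic-event-reduction flag fixed."""
    treat_change = means[(1, 1, reduced)] - means[(1, 0, reduced)]
    ctrl_change = means[(0, 1, reduced)] - means[(0, 0, reduced)]
    return treat_change - ctrl_change


def ddd(means: CellMeans) -> float:
    """Triple difference: DID where events were cut minus DID elsewhere."""
    return did_within(means, 1) - did_within(means, 0)


# Hypothetical cell means: the score drop is smaller where events were cut.
example = {
    (1, 0, 1): 0.0, (1, 1, 1): -0.05, (0, 0, 1): 0.0, (0, 1, 1): 0.0,
    (1, 0, 0): 0.0, (1, 1, 0): -0.15, (0, 0, 0): 0.0, (0, 1, 0): 0.0,
}
print(round(ddd(example), 3))
```

A positive DDD here would indicate a smaller pandemic-related decline in school-grades that cut athletic events, consistent with recovered class time; a value near zero would indicate no difference.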

There are four main findings. First, the negative effects of the COVID-19 pandemic were greater for math than for the Japanese language at both 7 and 19 months after the school closures. Specifically, math scores worsened considerably, by 0.133 SDs and 0.249 SDs at 7 and 19 months after the school closures, respectively. In comparison, Japanese language scores were not significantly affected 7 months after the school closures, although they decreased by 0.113 SDs at 19 months. Second, the negative effects on math scores did not differ by grade group, whereas Japanese language scores were negatively affected only in the lower grade groups. Third, math scores declined more in the lower quartile in all grades, whereas Japanese language scores declined more in the upper quartile, and only among elementary school students. Fourth, the negative effects of the COVID-19 pandemic varied little by student gender, pre-pandemic household SES, or the reduction in school athletic events after the school closures.

This study makes four significant contributions to the literature. First, it examines the impact of the COVID-19 pandemic on academic achievement using vertically and horizontally equated IRT-based tests that account for changes in test difficulty. In Japan, the Ministry of Education, Culture, Sports, Science, and Technology (MEXT, 2021a, 2021b) reported that the school closures did not affect the mean or variance of student achievement in the annual National Assessment of Academic Ability for sixth-grade and ninth-grade students. However, that study did not use IRT-equated tests, so its results may confound the impact of the COVID-19 pandemic with year-to-year changes in test difficulty. Our study removes the effects of such changes by using the IRT-equated test in Amagasaki City, which should allow a more accurate identification of the effects of the COVID-19 pandemic on students’ academic performance.

Second, we examine the effects of the COVID-19 pandemic on academic achievement for Grades 1–8 in public elementary and junior high schools. By standardizing math and Japanese language test scores by school grade at the time of the school closures, we compare effect sizes across grades in Amagasaki with those in existing studies. This comparison helps clarify whether the effect of the school closures differs by grade level, just as previous studies found that it differed by country/region.

Third, our study examines the medium- and long-term effects of the COVID-19 pandemic on academic achievement. Several existing studies have analyzed test data from 2021, more than a year after school closures (Asakawa & Ohtake, 2022; Blainey & Hannay, 2021; Kuhfeld et al., 2023). By comparing academic performance 7 and 19 months after the closures, we can provide insights into the duration of the impact of the COVID-19 pandemic on academic achievement.

Finally, to identify the channels through which the COVID-19 pandemic affected math and Japanese language achievement, we test whether the effect varies by test score quartile, student SES, and the magnitude of the reduction in school athletic events after the closures. No consensus has yet been reached on which achievement levels were most affected by the COVID-19 pandemic. For example, several studies have shown that the lower a student’s achievement level is, the greater the negative impact of the COVID-19 pandemic on academic achievement (Ardington et al., 2021; Asakawa & Ohtake, 2022; Kuhfeld et al., 2022; Schult et al., 2022a, 2022b). Other studies have shown that top-achieving groups are negatively affected (Contini et al., 2022; Gambi & De Witte, 2021). In addition, many previous studies have examined whether the effect of COVID-19 on student achievement differs according to student characteristics, such as gender or household SES, but the findings are inconsistent (gender: Birkelund & Karlson, 2023; Lichand et al., 2022; Schuurman et al., 2023; SES: Gore et al., 2021; Kuhfeld et al., 2022; Schult et al., 2022a, 2022b). Moreover, while many countries/regions reduced athletic events after school closures, no study has examined whether this measure helped recover lost class time. Therefore, we determine whether the effects of the COVID-19 pandemic on academic achievement differed according to achievement level and the reduction in athletic events. This heterogeneity analysis provides insights into the grade groups and subjects for which the achievement gap widened.
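Heterogeneity by achievement level can be illustrated with the simplest form of quantile DID, which differences the quartiles of the score distributions instead of their means. The sketch assumes the four inputs are standardized scores for one grade group and uses Python's `statistics.quantiles`; it is not necessarily the specification used in this study, which may instead rely on quantile regression with covariates.

```python
from statistics import quantiles
from typing import List, Sequence


def quartile_cuts(scores: Sequence[float]) -> List[float]:
    """25th, 50th, and 75th percentiles (inclusive linear interpolation)."""
    return quantiles(scores, n=4, method="inclusive")


def quantile_did(t_before: Sequence[float], t_after: Sequence[float],
                 c_before: Sequence[float], c_after: Sequence[float]) -> List[float]:
    """For each quartile cut: treated-cohort change minus control-cohort change."""
    tb, ta = quartile_cuts(t_before), quartile_cuts(t_after)
    cb, ca = quartile_cuts(c_before), quartile_cuts(c_after)
    return [(ta[i] - tb[i]) - (ca[i] - cb[i]) for i in range(3)]


# Hypothetical data: only the lowest-scoring treated students lose ground.
base = [float(i) for i in range(1, 21)]
treated_after = [x - 1.0 if x <= 5 else x for x in base]
print([round(v, 3) for v in quantile_did(base, treated_after, base, base)])
```

In this toy example the estimate is negative at the 25th percentile and zero at the median and 75th percentile, the kind of pattern that would indicate larger losses among lower achievers.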

This paper consists of seven sections. Section 2 reviews previous studies. Section 3 describes the school closures of elementary and junior high schools in Amagasaki City. Sections 4 and 5 explain the data and the estimation method, respectively. Section 6 presents the results of the empirical analysis. Section 7 concludes the paper.

2 Existing research

In existing research, the countries, grades, subjects, and timing and methods of tests vary from study to study. Here, we focus only on related studies comparable to ours. Specifically, we review the studies in which standardized test scores and national language or math test results were analyzed.

In Australia, where the first wave of school closures lasted only 8 weeks, Gore et al. (2021) used matching estimation to analyze the impact of the pandemic on the language and math scores of Grade 3 and 4 students in New South Wales 6–8 months after the school closures. They found that neither national language nor math scores were significantly affected.

In Germany, where school closures also lasted for 8 weeks, Schult et al. (2022a) estimated the effects of the COVID-19 pandemic on the national language and math scores of Grade 5 students in the state of Baden-Württemberg 5 months after school closures. They found a negative effect of −0.07 SDs for national language and −0.06 SDs for math. Using a pre-post analysis, Schult et al. (2022b) estimated the effects of school closures on the same cohort’s national language and math scores 17 months after the school closures. The authors found that the national language scores recovered slightly (−0.045 SDs), while math hardly did so (−0.063 SDs).

In the Netherlands, where schools were closed for 8 weeks in the first wave and a hybrid teaching period was implemented for 4 weeks, Haelermans et al. (2022) used DID to estimate the impact of these school closures on the national language and math scores of students in Grades 1–5 in the 2 to 3 months after the closures. They found the effects of the COVID-19 pandemic on national language and math scores to be negative (−0.096 to −0.190 SDs for the national language and −0.129 to −0.326 SDs for math). Additionally, when comparing effect sizes by grade level, the negative effect was greater for higher grades, especially for math.

In Belgium, where the first wave of school closures lasted 9 weeks, Maldonado and De Witte (2022) used a fixed effects model to estimate the impact of the COVID-19 pandemic on the national language and math scores of Grade 4 students in the Flemish region 1 month after the school closures. The results showed a negative effect of −0.19 SDs for the national language and −0.17 SDs for math. Gambi and De Witte (2021) used a fixed effects model to estimate the impact of school closures on the national language and math scores of Grade 6 students in the same region 13 months after the closures and found a negative impact of −0.14 SDs for the national language and −0.05 SDs for math. In addition, only the national language scores worsened further over time after the school closures.

Considering the first wave of 10-week school closures in the UK, Blainey and Hannay (2021) estimated the impact on the language and math scores of students in Grades 1–6 in the 4–7 months after the school closures by comparing post-closure test scores with those of the same cohort in the previous school year. Their results indicated a negative effect of −0.02 to −0.155 SDs for the national language and −0.02 to −0.09 SDs for math. When the effect sizes were compared by grade, the negative effects were particularly large for Grades 1–3 in the national language and for Grades 1, 3, and 5 in math. UK schools were closed again in a second wave from December 2020 to January 2021 for 10–14 weeks. Blainey and Hannay (2021) also estimated the impact on scores in January–April and April–July 2021 and found that both language and math scores were worse than in the 4–7 months after the first wave of school closures.

In Japan, the longest first-wave school closure period was 12 weeks. Asakawa and Ohtake (2022) used the DID method to estimate the effect on the math scores of Grade 4 and 5 students in Nara City at the time of the school closures and at three later time points (3, 7, and 10 months after the closures). The estimated effect was −0.14 SDs at the time of the school closures but recovered to −0.075 SDs at 3 months, 0.26 SDs at 7 months, and 0.295 SDs at 10 months after the closures. Furthermore, the higher the student’s grade at the time of the school closures, the faster academic achievement recovered after the closures.

In Italy, where the first wave of school closures lasted 15 weeks, Contini et al. (2022) used the DID method to estimate the impact on the math scores of Grade 2 students in the province of Torino 4 months after the closures and found a negative impact of −0.19 SDs. Bazoli et al. (2022) used coarsened exact matching for Grade 5, Grade 8, and Grade 13 students who were randomly sampled class by class from the SY2020–2021 National Test (INVALSI). They found a negative effect of −0.316 to −0.057 SDs for national language scores and −0.291 to −0.142 SDs for math scores. Furthermore, comparing effect sizes by grade level, the effects on national language and math scores were small among the youngest students, those in Grade 5.

In the U.S., where school closure periods varied widely across states and hybrid instruction was often used, Kuhfeld et al. (2023) used the data from a national test (the NWEA Measures of Academic Progress) administered in August–November 2020 to students in Grades 3–7. Pre-post analysis of the impact on language arts and math scores revealed that while the impact on language arts varied by grade (−0.024 to 0.045 SDs), the impact on math was negative for all grades (−0.181 to −0.11 SDs). Furthermore, in the following year, the effect sizes for language arts and math were −0.095 to −0.173 SDs and −0.213 to −0.262 SDs, respectively, indicating that academic achievement decreased for all grades. In both language arts and math, the younger the student was, the greater the deterioration in academic achievement. Kuhfeld et al. (2022) conducted a multilevel growth model analysis of students in Grades 3–7 who took the same test from August–November 2020, December 2020–March 2021, and March–June 2021. They showed that language arts and math scores decreased progressively over time after school closures. Moreover, the younger the students were, the greater the decreases in their scores in both subjects.

Table 1 and Fig. A1 summarize the above results by school grade. Considering the mean effect sizes by grade group, Table 1 shows that for the Japanese language, the negative impact of the COVID-19 pandemic was greater in the lower grade groups (Grades 1–3: −0.14 SDs; Grades 4–5: −0.073 SDs; Grades 6–7: −0.058 SDs). For math, the negative effect was also slightly greater in the lower grades, but the differences between grades were smaller than for the Japanese language (Grades 1–3: −0.147 SDs; Grades 4–5: −0.14 SDs; Grades 6–7: −0.129 SDs).

Table 1 Summary of previous studies

There are four possible reasons for these varying effects of the COVID-19 pandemic on academic performance. The first reason is that the duration of school closures varied among countries/regions. For example, previous studies showed that some countries had no school closures (e.g., Sweden). In contrast, others closed schools for less than 10 weeks (Australia: 8 weeks; Belgium: 9 weeks; Denmark: 4 weeks for Grades 1–5 and 8 weeks for Grades 6–9; Germany: 8 weeks; Netherlands: 8 weeks; Switzerland: 8 weeks), for 10–20 weeks (Japan: 12 weeks; Spain: 12 weeks; South Africa: approximately 10 weeks; Italy: 15 weeks), or for more than 20 weeks or shifted to a hybrid system (UK: 20–24 weeks; Brazil: 35 weeks; Mexico: 48 weeks; Colombia: approximately 68 weeks). These differences in the duration of COVID-19 school closures may have directly affected the magnitude of the effect on academic performance.

The second reason is that the interval between the end of school closures and test administration varied among countries/regions. For example, the test was administered within 3 months after the school closures in the Flemish region (Belgium), the Netherlands, and Nara City (Japan), whereas in New South Wales (Australia), it was first administered 6–8 months after the school closures (Asakawa & Ohtake, 2022; Engzell et al., 2021; Gambi & De Witte, 2021; Gore et al., 2021; Haelermans et al., 2022; Maldonado & De Witte, 2022). Asakawa and Ohtake (2022) and Jack et al. (2023) showed that academic performance recovered after in-person classes resumed. Thus, the longer the interval between the school closures and test administration, the smaller the estimated effect size may have been.

The third reason is that the school year starts in different months across countries/regions. Since most countries covered by previous studies begin the school year between July and September, they were already in the second half of the school year in March 2020, when the COVID-19 school closures started. In other countries, however, the closures started just after the beginning of the new school year (Australia and Colombia) or just before it (Japan). The new school year is associated with more difficult learning content and a change in classmates, so students need to adjust to a new environment. If school closures coincided with the start of a new school year, academic achievement may have declined more due to the pandemic.

The fourth reason is that in many countries and regions, the sudden school closures due to the COVID-19 pandemic limited the grades and regions of students available for study and the test subjects available for analysis. Several studies examined five or more school grades (Haelermans et al. (2022): Grades 1–5; Blainey and Hannay (2021, 2022a): Grades 1–6; Kuhfeld et al. (2023): Grades 4–8; Kuhfeld et al. (2022): Grades 4–8), while other studies used only one grade (Arenas & Gortazar, 2022; Contini et al., 2022; Gambi & De Witte, 2021; Gore et al., 2021; Schult et al., 2022a, 2022b; Vegas, 2022). Most studies used national language and math scores to measure academic performance, but effect sizes varied widely across countries and regions.Footnote 5 In Japan, MEXT (2021a, 2021b) showed, using cross-sectional data from the National Assessment of Academic Ability for Grades 6 and 9, that school closures did not lead to decreases in math and Japanese language achievement scores. Using panel data from April 2019 to March 2021, Asakawa and Ohtake (2022) showed that the COVID-19 pandemic caused a temporary decline in the math scores of Grade 4 and 5 public elementary school students in Nara City, but the students’ academic performance recovered to the preclosure level after 6 months. However, few studies have comprehensively analyzed the effects of the COVID-19 pandemic on academic achievement across grades.

3 Response to COVID-19 in elementary and junior high schools

3.1 Nationwide response to COVID-19

In Japan, measures to combat the spread of COVID-19 in elementary and junior high schools began in earnest in late February 2020. On February 25, 2020, MEXT stated that the decision to close schools to combat COVID-19 infection was left to the discretion of each local government. However, on February 27, 2020, Prime Minister Abe announced a nationwide simultaneous temporary school closure policy for elementary, junior high, high, and special-needs schools. On the same day, MEXT requested the temporary closure of all schools from March 2, 2020, until the start of the spring break. As COVID-19 continued to spread, a state of emergency was declared for seven prefectures on April 7 and extended nationwide on April 16. As a result, temporary school closures were extended, mainly in areas with serious infection levels, until May 31, when the emergency declaration was lifted. Specifically, the school closure rate for elementary, junior high, and high schools in Japan was approximately 99% as of March 16, 2020 (14 days after the start of the closures), 95% as of April 22 (1 month and 20 days after), and 88% as of May 11, 2020 (2 months and 10 days after).

After the COVID-19 school closures, many schools reduced the number of school events to compensate for class time lost during the closures. This reduction in the time spent implementing and preparing for events was expected to improve students’ academic performance by freeing up class time. For example, the School Questionnaire of the National Assessment of Academic Ability 2021 showed that school events were reconsidered in 94.4% of elementary schools and 93.6% of junior high schools nationwide and in 97.7% of elementary schools and 96.9% of junior high schools in Hyogo Prefecture (excluding designated cities), where Amagasaki City is situated. However, policies for reducing or canceling school events varied widely among schools and grades. Moreover, the National Assessment of Academic Ability does not provide objective data on which events were reduced or by how many hours.

3.2 Response to COVID-19 in Amagasaki City

Following the nationwide request for temporary school closures, Amagasaki City closed all schools from March 3 to May 31, 2020, with “dispersed school attendance once a week” during the last week of May, “dispersed school attendance” from June 1 to 12, and “regular school attendance” from June 15. Therefore, this study calculates the months since school closures in Amagasaki starting from June 2020.

During this period, Amagasaki City took the following measures based on a notice from MEXT. First, students were asked to refrain from going out and attending school during the school closure period. Second, schools were to maintain close contact with students staying at home due to the temporary school closures and with their parents. Third, in place of regular class periods, students were provided with textbook-based paper learning materials for self-study. However, the content and number of these handouts could differ among schools. In addition, educational materials and videos were provided through education board websites so that children could study by themselves using ICT terminals at home.Footnote 6

After the school closures, many school events were reduced or canceled in elementary and junior high schools in Amagasaki City. However, since most of the school events were not recorded as objective data, the reason why school events were reduced at each school and grade is largely unknown. Among them, athletic events, the only events for which Amagasaki City has objective data, showed large variations in the approach toward holding events, and the duration of these events decreased after the school closures. For example, some schools divided events by grade, drastically reduced event content, substituted an event with a regular physical education class, or canceled the event altogether. As a result, the amount of reduced time for athletic events varied among schools and grades, as did the increased class time due to the reduction in athletic events. Moreover, ICT-based learning environments were not established until April 2022. Specifically, the installation of wireless LANs in regular classrooms continued to progress slowly, and the distribution of one tablet per student for ICT-based learning was not completed until January to March 2022.

4 Data

The data used in this paper are individual-level data from the Amagasaki City Survey of Academic Achievement and Life Conditions. Since FY2018, Amagasaki City has independently conducted this survey among all students in the city from the first grade of elementary school to the second grade of junior high school. The survey consists of an academic achievement test and a questionnaire, administered at all elementary schools in December each year and at all junior high schools on a specific day in January. This study uses data from FY2018 through FY2021, the most recent data available.

The academic achievement tests are organized as follows. The tests are based on content common to all schools, and the paper tests are created by a private education provider. For students in Grades 1–6, the paper test covers two subjects (math and Japanese language) and takes 40 min. For students in Grades 7 and 8 in junior high school, a 45-min test is used for five subjects (Japanese, mathematics, science, social studies, and English). After administration, the paper tests were collected and scored by the private education provider that wrote the questions, which then equated the results based on IRT and provided them to Amagasaki City.

In addition to the achievement tests, we collected information from other administrative data, including questionnaires administered on the same day as the achievement tests and the Basic Resident Registers. The questionnaires were administered the same day as the paper test and concerned students and their lives. The Basic Resident Registration data contain a code that the local government uses to identify students’ identities and household information. Using a code that identifies the individual, we used other administrative information, such as data on public assistance and school attendance assistance.

The following restrictions apply to the data. First, they are limited to students who took the same-day achievement test. Second, the data are limited to students who resided in Amagasaki and attended public schools in the city. Third, we cannot use information on the school districts and addresses of students because the test data and the Basic Resident Registration data were provided based on students’ names and address data from Amagasaki City. Therefore, it is impossible to match these data with the number of new COVID-19 cases per school district or the macro data on school districts and addresses. As a result of these limitations, the maximum sample size available was 21,937.

Additionally, we use data on the schedule of athletic events from FY2019 to FY2021. Elementary and junior high schools in Amagasaki City are required to apply to the city at least two weeks before the scheduled date and time of an event, and once the application is made, the event must be held as requested. In 2019, before the school closures, all schools held athletic events with the participation of all students, but beginning in 2020, the duration and schedule differed among schools and grades due to COVID-19 measures. As a proxy for the reduction in school events after the school closures, we use the scheduled hours of athletic events for each school and grade level from 2019 to 2021.

We now define the variables used in the analysis and present their descriptive statistics. First, we describe the outcome variables: the IRT-based test scores for math and Japanese language. We use standardized test scores as outcome variables so that our estimates of the effects of the COVID-19 pandemic on math and Japanese language scores can be compared with those of existing studies. Test difficulty can be regarded as equivalent across years for the same grade level because the tests are designed based on IRT. Therefore, we standardize the math and Japanese language test scores separately for each pair of DID estimations: treatment group 1 (T1, 2019–2020) and control group 1 (C1, 2018–2019), and treatment group 2 (T2, 2019–2021) and control group 2 (C2, 2018–2020). Additionally, we standardize the test scores separately for each grade level to compare effect sizes across grades.Footnote 7
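The cell-wise standardization can be sketched as below. For illustration, we assume that the mean and SD are pooled over all records in a given DID pair and grade (both cohorts and both test years); the record layout, the pair labels, and the grade key (for example, grade at After = 0) are hypothetical choices rather than the study's documented procedure.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List


def standardize_by_cell(records: List[Dict]) -> List[Dict]:
    """Return copies of the records with a z-score added within each (pair, grade) cell.

    Each record is a dict with keys 'pair' (e.g. 'T1C1' or 'T2C2'),
    'grade', and 'score'. Assumption: mean and SD are pooled over
    every record that falls in the same cell.
    """
    cells = defaultdict(list)
    for rec in records:
        cells[(rec["pair"], rec["grade"])].append(rec["score"])
    moments = {key: (mean(vals), pstdev(vals)) for key, vals in cells.items()}

    out = []
    for rec in records:
        mu, sd = moments[(rec["pair"], rec["grade"])]
        z = (rec["score"] - mu) / sd if sd > 0 else 0.0  # guard a constant cell
        out.append({**rec, "z": z})
    return out


# Hypothetical records for one DID pair and two grades.
demo = [{"pair": "T1C1", "grade": 4, "score": s} for s in (10.0, 20.0)]
print([r["z"] for r in standardize_by_cell(demo)])
```

Standardizing within each grade puts all grades on a common SD metric, which is what allows the effect sizes to be compared across grades and with previous studies.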

Next, we define three predetermined variables. First, using the list of households eligible for school attendance assistance, we create a dummy for households receiving school attendance assistance (1 if the household receives school attendance assistance at After = 0 and 0 otherwise). Second, we create a dummy variable for living with one parent (1 if the student is in a single-parent household or one of the parents lives alone outside Amagasaki City at After = 0 and 0 otherwise). Third, we create a female dummy (1 if the student is female and 0 otherwise).

Moreover, to examine whether the reduction in athletic events by school and grade contributed to the recovery of academic achievement after the school closures, we first examine the distribution of the reduction in hours spent on athletic events. Fig. A4 shows a histogram of the difference between the scheduled hours of events in 2019 and in 2020–2021, and Fig. A5 shows the changes in scheduled hours of events from 2019 to 2021 by school and grade. These figures show two groups of schools and grades separated at a reduction of approximately 200 min in both FY2020 and FY2021. Specifically, we first calculate the difference in the scheduled duration of athletic events before and after the school closures for each school and grade. Then, we create two dummy variables, \({{Reduc}20}_{c,{sg}}\) and \({{Reduc}21}_{c,{sg}}\), which take a value of 1 for schools and grades that reduced their athletic events by more than 200 min from FY2019 to FY2020 and from FY2019 to FY2021, respectively, and 0 otherwise.
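The construction of the two dummies can be sketched as follows, assuming that scheduled minutes of athletic events are available for each school and grade in FY2019, FY2020, and FY2021; the data layout and the example schools are hypothetical.

```python
from typing import Dict, Tuple

THRESHOLD_MIN = 200  # cut-off between the two groups seen in Figs. A4 and A5

# (school, grade) -> {fiscal year: scheduled minutes of athletic events}
Schedule = Dict[Tuple[str, int], Dict[int, float]]


def reduction_dummies(schedule: Schedule) -> Dict[Tuple[str, int], Dict[str, int]]:
    """Reduc20 / Reduc21 = 1 if minutes fell by MORE than 200 from FY2019."""
    out = {}
    for key, mins in schedule.items():
        out[key] = {
            "Reduc20": int(mins[2019] - mins[2020] > THRESHOLD_MIN),
            "Reduc21": int(mins[2019] - mins[2021] > THRESHOLD_MIN),
        }
    return out


example = {
    ("School A", 3): {2019: 360, 2020: 120, 2021: 200},  # cut 240 and 160 min
    ("School B", 3): {2019: 300, 2020: 100, 2021: 100},  # cut exactly 200 min
}
print(reduction_dummies(example))
```

Because the definition says "more than 200 min", a reduction of exactly 200 min is coded 0, as for School B above.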

In Table 2, we show descriptive statistics by cohort, period, and grade group for the outcome variables used in this study. Table 3 shows descriptive statistics for the predetermined variables and athletic event reduction dummies for “Before” by cohort and grade group. To examine whether each variable differs between cohorts, the results of the balance test, that is, the p value of the t test, are also shown.

Table 2 Descriptive statistics of math and Japanese language (by cohort, period, and grade groups)
Table 3 Descriptive statistics of pre-determinant variables and athletic event reduction dummies (by cohort, period, and grade groups)

Table 2 shows that, before the school closures, treatment cohorts T1 and T2 scored higher than control cohorts C1 and C2. Specifically, in the "Before" period, which does not include the effects of the COVID-19 pandemic, the T1 and T2 cohorts scored higher than the C1 and C2 cohorts at the 10% significance level in both subjects for Grade Groups 1–3 and 6–7, and in math only for Grade Groups 4–5. The attrition rates were less than 1% for all cohorts because Amagasaki City fully resumed in-person classes after June 2020. In Table 3, the share of students in households receiving school attendance assistance is smaller in the treatment groups than in the control groups, whereas no obvious differences are observed in the other variables.

5 Estimation method

5.1 Main analysis

5.1.1 Impact 7 months after the school closure

Using the DID method, we compare the growth in standardized scores in math and Japanese language from 2019 to 2020 for COVID-19 group 1 (\({T}_{1}\)) with that from 2018 to 2019 for the non-COVID-19 group (\({C}_{1}\)). Figure 1 shows the cohorts and timing of the tests used in the estimation.

Fig. 1

Cohort and test timing for the DID analysis 7 months after school closure

To facilitate interpretation and to identify the impact of COVID-19 on academic performance by grade group, we divide students in both cohorts in Grades 1–7 at \({After}=0\) into three grade groups based on their grade at \({After}=0\) (lower elementary: Grades 1–3; upper elementary: Grades 4–5; and junior high school: Grades 6–7). The estimation equation for each grade group \(g\in \,\){Grades 1–3, Grades 4–5, Grades 6–7} can be written as follows:

$${Y}_{{it}}=\,{\alpha }_{i}+{\gamma \; COVID}{19}_{i}+{\lambda \; Afte}{r}_{t}+{\delta \; COVID}{19}_{i}* {Afte}{r}_{t}+{\nu }_{{sg}}+{\varepsilon }_{{it}}$$
(1)

Here, \({Y}_{{it}}\) is the standardized test score in math or Japanese language for student \(i\) in year \(t\in \{2018,\ldots ,2020\}\). \({{COVID}19}_{i}\) is a dummy variable that takes a value of 1 if student \(i\) is in treatment group \({T}_{1}\) and 0 if the student is in control group \({C}_{1}\). \({Afte}{r}_{t}\) is a dummy variable that takes a value of 1 for the 2020 test of \({T}_{1}\) and the 2019 test of \({C}_{1}\), and 0 for the test taken 1 year earlier. \({\alpha }_{i}\) and \({\nu }_{{sg}}\) are individual fixed effects and school-grade fixed effects at \({Afte}{r}_{t}=0\) (FY2019 for the \({T}_{1}\) cohort and FY2018 for the \({C}_{1}\) cohort), respectively. \({\varepsilon }_{{it}}\) denotes the error term. We also assume that \(E\left[{\varepsilon }_{{it}}|t\right]=0\).

Denoting G and c as the grade group and cohort, respectively, the ATT of the COVID-19 pandemic on standardized test scores in math and Japanese language for grade group \(g\), \({\delta }_{g,{ATT}}\), can be written as follows:

$$\begin{array}{l}{\delta }_{g,{ATT}}=\left\{E\left[{Y}_{{it}}|G=g,\,c={T}_{1},\,t=2020\right]-E\left[{Y}_{{it}}|G=g,\,c={T}_{1},\,t=2019\right]\right\}\\\qquad\;\;\, \,-\,\left\{E\left[{Y}_{{it}}|G=g,\,c={C}_{1},\,t=2019\right]-E\left[{Y}_{{it}}|G=g,\,c={C}_{1},\,t=2018\right]\right\}\end{array}$$
(2)

We estimate Eq. (1) separately for each grade group, average the estimates across grade groups for each subject, and compare the resulting effect sizes with those in existing studies.
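Because Eq. (1) has only two periods per student and includes individual fixed effects, the time-invariant terms (the COVID-19 dummy and the school-grade fixed effects at \({After}=0\)) are absorbed by \({\alpha }_{i}\); in a balanced panel, the fixed-effects estimate of \(\delta\) then equals the difference in mean within-student score changes, the sample analogue of Eq. (2). The sketch below illustrates this under the assumption of a balanced panel. The function names are hypothetical, and the averaging helper is unweighted by default because the text does not state how the grade-group estimates are weighted.

```python
from statistics import mean


def did_with_individual_fe(treated, control):
    """DID with individual fixed effects in a balanced two-period panel.

    treated, control: lists of (y_before, y_after) pairs, one per student.
    With two periods, the within estimator equals the first-difference
    estimator, i.e. mean change (treated) minus mean change (control).
    """
    change_t = mean(after - before for before, after in treated)
    change_c = mean(after - before for before, after in control)
    return change_t - change_c


def average_across_groups(att_by_group, weights=None):
    """Average grade-group ATTs; unweighted unless weights are supplied."""
    if weights is None:
        weights = {g: 1.0 for g in att_by_group}
    total = sum(weights[g] for g in att_by_group)
    return sum(att * weights[g] for g, att in att_by_group.items()) / total


# Usage with made-up standardized scores:
t1 = [(0.0, -0.1), (0.2, 0.0)]   # T1 students: 2019 score, 2020 score
c1 = [(0.0, 0.05), (0.1, 0.1)]   # C1 students: 2018 score, 2019 score
print(did_with_individual_fe(t1, c1))
```

Passing sample sizes as `weights` gives a sample-weighted average instead; which weighting matches the reported averages is not stated in the text.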

5.1.2 Impact 19 months after the school closure

Here, we compare the growth in standardized scores in math and Japanese language from 2019 to 2021 for COVID-19 group 2 (\({T}_{2}\)) with that from 2018 to 2020 for COVID-19 group 1 (\({T}_{1}\)). Unlike the analysis in Section 5.1.1, we restrict both cohorts to students in Grades 1–6 at \({After}=0\), since test scores are available only up to Grade 8 as of \({Afte}{r}_{t}=1\). Moreover, both cohorts experienced the COVID-19 pandemic (\({T}_{2}\): FY2020 and FY2021; \({T}_{1}\): FY2020). Therefore, we remove the FY2020 impact of COVID-19 from the \({T}_{1}\) cohort using grade-group ATTs estimated with Eq. (1). For this adjustment, the grade groups of the \({T}_{1}\) and \({C}_{1}\) cohorts are each one grade higher than those used in the analysis 19 months after the school closures (lower elementary: Grades 2–4; upper elementary: Grades 5–6; junior high school: Grade 7), because the interval between \({Afte}{r}_{t}=0\) and \({Afte}{r}_{t}=1\) is 2 years and FY2020, which includes the impact of COVID-19, is the second of these years. By doing so, we can treat the \({T}_{1}\) cohort as a non-COVID-19 group (control group 2, \({C}_{2}\)). The cohorts and timing of the tests used in the estimation are shown in Fig. 2.

Fig. 2

Cohort and test timing for the DID analysis 19 months after school closure

We also divide the students in both cohorts in Grades 1–6 at \({After}=0\) into three grade groups based on their grade at \({After}=0\) (lower elementary: Grades 1–3; upper elementary: Grades 4–5; and junior high school: Grade 6). Grade 7 is excluded from the analysis because test scores 19 months after the school closures are unavailable for that grade. Adding the superscript “L” to indicate the long-term effect of the COVID-19 pandemic, we can write the estimation equation for the effect 19 months after the school closures for each of the three grade groups \({g}^{L}\) \(\in \,\){Grades 1–3, Grades 4–5, Grade 6} as follows:

$${Y}_{{it}}^{L}={\alpha }_{i}^{L}+{\gamma }^{L}{COVID}{19}_{c}^{L}+{\lambda }^{L}{Afte}{r}_{t}^{L}+{\delta }^{L}{COVID}{19}_{c}^{L}* {Afte}{r}_{t}^{L}+{\nu }_{s{g}^{L}}^{L}+{\varepsilon }_{{it}}^{L}$$
(3)

Here, \({Y}_{{it}}^{L}\) is the standardized test score in math or Japanese language for student \(i\) in year \(t\in \{2018,\ldots ,2021\}\). \({{COVID}19}_{c}^{L}\) is a dummy variable that takes a value of 1 if student \(i\) is in treatment group \({T}_{2}\) and 0 if the student is in control group \({C}_{2}\). \({Afte}{r}_{t}^{L}\) is a dummy variable that takes a value of 1 for the 2021 test of \({T}_{2}\) and the 2020 test of \({C}_{2}\), and 0 for the test taken 2 years earlier. \({\alpha }_{i}^{L}\) and \({\nu }_{{sg}}^{L}\) are individual fixed effects and school-grade fixed effects, respectively, at \({Afte}{r}_{t}^{L}=0\) (FY2019 for the \({T}_{2}\) cohort and FY2018 for the \({C}_{2}\) cohort). \({\varepsilon }_{{it}}^{L}\) denotes the error term. We assume that \(E\left[{\varepsilon }_{{it}}^{L}|t\right]=0\).

The ATT of the COVID-19 pandemic on standardized test scores in math and Japanese language for grade group \({g}^{L}\) (\({\delta }_{{g}^{L},{ATT}}^{L}\)) can be written as follows:

$$\begin{array}{l}{\delta }_{{g}^{L},{ATT}}^{L}=\left\{E\left[{Y}_{{it}}|G={g}^{L},\,c={T}_{2},\,t=2021\right]-E\left[{Y}_{{it}}|G={g}^{L},\,c={T}_{2},\,t=2019\right]\right\}\\\qquad\;\;\;-\,\Big[\left\{E\left[{Y}_{{it}}|G={g}^{L},\,c={C}_{2},\,t=2020\right]-E\left[{Y}_{{it}}|G={g}^{L},\,c={C}_{2},\,t=2018\right]\right\}\\\qquad\;\;\;-\,\Big(\left\{E\left[{Y}_{{it}}|G={g}^{\prime },\,c={T}_{1},\,t=2020\right]-E\left[{Y}_{{it}}|G={g}^{\prime },\,c={T}_{1},\,t=2019\right]\right\}\\\qquad\;\;\;-\,\left\{E\left[{Y}_{{it}}|G={g}^{\prime },\,c={C}_{1},\,t=2019\right]-E\left[{Y}_{{it}}|G={g}^{\prime },\,c={C}_{1},\,t=2018\right]\right\}\Big)\Big]\\\qquad\;\;\;=\left\{E\left[{Y}_{{it}}|G={g}^{L},\,c={T}_{2},\,t=2021\right]-E\left[{Y}_{{it}}|G={g}^{L},\,c={T}_{2},\,t=2019\right]\right\}\\\qquad\;\;\;-\,\left\{E\left[{Y}_{{it}}|G={g}^{L},\,c={C}_{2},\,t=2020\right]-E\left[{Y}_{{it}}|G={g}^{L},\,c={C}_{2},\,t=2018\right]\right\}+\widehat{{\delta }_{{g}^{\prime },{ATT}}}\end{array}$$
(4)

Here, \(g{^\prime}\) \(\in \,\){Grades 2–4, Grades 5–6, Grade 7} denotes the grade group in T1 used to remove the effects of the COVID-19 pandemic from control group C2. Equation (3) is also estimated separately for each grade group. We then average the estimates for the three grade groups and compare the effect sizes with those in existing studies.
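The adjustment in Eq. (4) amounts to adding the 7-month ATT of the matching T1 grade group to the naive DID between T2 and C2, because C2's 2018–2020 change already contains the FY2020 pandemic effect. A minimal sketch, assuming balanced panels of (before, after) score pairs; the string labels in `G_PRIME` and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical labels: 19-month grade group g^L -> T1 grade group g' (one grade higher).
G_PRIME = {"1-3": "2-4", "4-5": "5-6", "6": "7"}


def mean_change(pairs):
    """Mean within-student change for (y_before, y_after) pairs."""
    return mean(after - before for before, after in pairs)


def att_19_months(t2_pairs, c2_pairs, delta_7m_by_gprime, g_long):
    """Eq. (4): naive DID of T2 vs C2, plus the estimated 7-month ATT for g'.

    t2_pairs: (2019, 2021) scores of T2 students in grade group g_long.
    c2_pairs: (2018, 2020) scores of C2 (= T1) students in grade group g_long.
    delta_7m_by_gprime: dict of 7-month ATTs estimated with Eq. (1) by g'.
    """
    naive = mean_change(t2_pairs) - mean_change(c2_pairs)
    # C2's change includes the (typically negative) FY2020 effect; adding it back
    # restores C2's counterfactual no-pandemic change.
    return naive + delta_7m_by_gprime[G_PRIME[g_long]]


# Usage with made-up numbers:
t2 = [(0.0, -0.3), (0.0, -0.1)]
c2 = [(0.0, -0.05), (0.0, -0.15)]
deltas = {"2-4": -0.1, "5-6": -0.2, "7": 0.0}
print(att_19_months(t2, c2, deltas, "1-3"))
```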

5.1.3 Assumptions for identification

To interpret \({\delta }_{g,{ATT}}\) and \({\delta }_{g,{ATT}}^{L}\) as causal effects of the COVID-19 pandemic on academic performance, we need to assume common trends and common shocks.

The common trend assumption cannot be verified directly because of insufficient data before FY2018. Following most existing studies, which conduct DID analyses using two cohorts over two periods (Contini et al., 2022; Engzell et al., 2021; Haelermans et al., 2022; Lichand et al., 2022), we therefore assume common trends between the T1 and C1 cohorts and between the T2 and C2 cohorts for each grade group when estimating the impact 7 and 19 months after the school closures.

Estimating the impact 19 months after the school closures requires two additional assumptions. First, the common trend for each grade group must hold for the C1 cohort as well as for the T2 and C2 (=T1) cohorts. This assumption is needed because the outcomes of the C2 cohort in FY2020 are post-COVID-19, so the impact of COVID-19 on the C2 cohort must be removed using the T1 and C1 cohorts. Second, the FY2020 ATT for the C2 cohort must be the same for all students within a grade group, because Eqs. (2) and (4) use the ATT 7 months after the school closures estimated at the grade-group level.

The common shock assumption requires that no institutional change other than the COVID-19 pandemic that affects the outcomes occurred for either the COVID-19 or the non-COVID-19 groups. In Japan, however, the MEXT revised the course of study in April 2020, adding 26.25 and 52.5 h of English instruction per year for Grades 3–4 and 5–6, respectively. The number of math and Japanese language classes, however, did not change. Because no other institutional changes affecting the outcomes were implemented between 2018 and 2021, the common shock assumption appears reasonable.

5.2 Analysis of heterogeneity of effects

5.2.1 Heterogeneity of effects across quartiles of test scores

Existing studies have shown that the negative impact of the COVID-19 pandemic on academic achievement was greater for students at lower academic levels (Ardington et al., 2021; Asakawa & Ohtake, 2022; Kuhfeld et al., 2022; Schult et al., 2022a, 2022b). In contrast, studies in several other countries and regions found that the negative impacts were greater for students at higher academic levels (Contini et al., 2022; Gambi & De Witte, 2021).

Therefore, we test the hypothesis that the lower a student's academic achievement before the school closures, the greater the negative impact of the COVID-19 pandemic and the slower the recovery of academic achievement, and we compare the results with those of existing studies. Specifically, for each subject, we use the quantile difference-in-differences (quantile-DID) approach used by Athey and Imbens (2006) to test whether the impact of COVID-19 on academic performance differs across quartiles of test scores.

The quantile-DID test is performed on quartiles (hereafter, QTs) of the outcome distribution after excluding the bottom and top 5% (1st QT: 0.05–0.275; 2nd QT: 0.275–0.5; 3rd QT: 0.5–0.725; 4th QT: 0.725–0.95). Following Eq. (1), the equation for estimating the effect of COVID-19 on standardized test scores 7 months after the closures in the \({q}^{{th}}\) quartile of the scores for each grade group \(g\) can be written as follows:

$${Y}_{{it}}={\alpha }_{i}\left(q\right)+\gamma \left(q\right){COVID}{19}_{i}+\lambda \left(q\right){Afte}{r}_{t}+\delta \left(q\right){COVID}{19}_{i}* {Afte}{r}_{t}\,+\,{\nu }_{{sg}}\left(q\right)+{\varepsilon }_{{it}}\left(q\right)$$
(5)

Denoting Q as the quartile of standardized test scores, the ATT of the COVID-19 pandemic on standardized test scores for grade group \(g\) and the \({q}^{{th}}\) quartile can be written as follows according to Eq. (2):

$$\begin{array}{c}{\delta }_{g,{ATT}}\left(q\right)=E\left[{Y}_{{it}}|G=g,\,c={T}_{1},Q=q,\,t=2020\right]-E\left[{Y}_{{it}}|G=g,\,c={T}_{1},\,Q=q,\,t=2019\right]\\ -\left\{E\left[{Y}_{{it}}|G=g,\,c={C}_{1},\,Q=q,\,t=2019\right]-E\left[{Y}_{{it}}|G=g,\,c={C}_{1},Q=q,\,t=2018\right]\right\}\,\end{array}$$
(6)
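Read literally, Eq. (6) compares within-band means of each cohort-year score distribution. The sketch below builds the four trimmed QT bands (5th to 95th percentile split into equal-width bands of 0.225) and evaluates Eq. (6) that way. It is an illustration under stated assumptions, not the changes-in-changes estimator of Athey and Imbens (2006): the mid-rank percentile rule `(rank + 0.5) / n`, the separate ranking of each cohort-year, and the function names are all hypothetical choices.

```python
from statistics import mean

# Trimmed quartile bands on the percentile scale: 0.05-0.275, ..., 0.725-0.95.
LO, HI, K = 0.05, 0.95, 4
WIDTH = (HI - LO) / K
QT_BANDS = [(LO + j * WIDTH, LO + (j + 1) * WIDTH) for j in range(K)]


def band_mean(scores, q):
    """Mean of scores whose mid-rank percentile lies in QT band q (0-based).

    The mid-rank percentile (rank + 0.5) / n is an assumption of this sketch.
    """
    lo, hi = QT_BANDS[q]
    ordered = sorted(scores)
    n = len(ordered)
    in_band = [v for r, v in enumerate(ordered) if lo <= (r + 0.5) / n < hi]
    return mean(in_band)


def quantile_did(t1_2019, t1_2020, c1_2018, c1_2019, q):
    """Eq. (6) evaluated with within-band means; each cohort-year ranked separately."""
    treated_change = band_mean(t1_2020, q) - band_mean(t1_2019, q)
    control_change = band_mean(c1_2019, q) - band_mean(c1_2018, q)
    return treated_change - control_change


# Usage: only the lowest 30% of T1's 2020 scores drop by 0.5 SDs.
base = [i / 10 for i in range(100)]
t1_after = [v - 0.5 if v < 3 else v for v in base]
print([round(quantile_did(base, t1_after, base, base, q), 3) for q in range(4)])
```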

Using the superscript “L” to indicate the long-term effect of the COVID-19 pandemic and following Eq. (3), the equation for estimating the effect of COVID-19 on test scores 19 months after the school closures in the \({q}^{{th}}\) quartile of standardized test scores for each grade group \({g}^{L}\) can be written as follows:

$${Y}_{{it}}^{L}=\,{\alpha }_{i}^{L}\left(q\right)+{\gamma }^{L}\left(q\right){COVID}{19}_{i}^{L}+{\lambda }^{L}\left(q\right){Afte}{r}_{t}^{L}+{\delta }^{L}\left(q\right){COVID}{19}_{i}^{L}* {Afte}{r}_{t}^{L}+{\nu }_{s{g}^{L}}^{L}\left(q\right)+{\varepsilon }_{{it}}^{L}\left(q\right)$$
(7)

Following Eq. (4), the ATT of the COVID-19 pandemic on standardized test scores for grade group \({g}^{L}\) and the \({q}^{{th}}\) quartile can be written as follows:

$$\begin{array}{c}{\delta }_{{g}^{L},{ATT}}^{L}\left(q\right)=\left\{E\left[{Y}_{{it}}|G={g}^{L},\,c={T}_{2},\,Q=q,\,t=2021\right]-E\left[{Y}_{{it}}|G={g}^{L},\,c={T}_{2},\,Q=q,\,t=2019\right]\right\}\\ -\,\Big[\left\{E\left[{Y}_{{it}}|G={g}^{L},\,c={C}_{2},\,Q=q,\,t=2020\right]-E\left[{Y}_{{it}}|G={g}^{L},\,c={C}_{2},\,Q=q,\,t=2018\right]\right\}\\ -\,\Big(\left\{E\left[{Y}_{{it}}|G={g}^{\prime },\,c={T}_{1},\,Q=q,\,t=2020\right]-E\left[{Y}_{{it}}|G={g}^{\prime },\,c={T}_{1},\,Q=q,\,t=2019\right]\right\}\\ -\,\left\{E\left[{Y}_{{it}}|G={g}^{\prime },\,c={C}_{1},\,Q=q,\,t=2019\right]-E\left[{Y}_{{it}}|G={g}^{\prime },\,c={C}_{1},\,Q=q,\,t=2018\right]\right\}\Big)\Big]\\ =\left\{E\left[{Y}_{{it}}|G={g}^{L},\,c={T}_{2},\,Q=q,\,t=2021\right]-E\left[{Y}_{{it}}|G={g}^{L},\,c={T}_{2},\,Q=q,\,t=2019\right]\right\}\\ -\,\left\{E\left[{Y}_{{it}}|G={g}^{L},\,c={C}_{2},\,Q=q,\,t=2020\right]-E\left[{Y}_{{it}}|G={g}^{L},\,c={C}_{2},\,Q=q,\,t=2018\right]\right\}\\ +\widehat{{\delta }_{{g}^{\prime },{ATT}}\left(q\right)}\end{array}$$
(8)

The standard errors of the parameters of interest, \(\delta \left(q\right)\) and \({\delta }^{L}\left(q\right)\), in Eqs. (5) and (7), respectively, are obtained from a nonparametric bootstrap with 300 iterations. In addition to the identification assumptions in Section 5.1.3, the quantile-DID analysis requires common trend assumptions for the same quartile \(q\) of T1 and C1 in the analysis 7 months after the school closures, and of T2, C2 (=T1), and C1 in the analysis 19 months after the school closures. We additionally assume that the distribution of unobserved variables does not change over time, while allowing it to differ between the treatment and control groups.
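A nonparametric bootstrap with 300 iterations can be sketched as follows. The text does not state the resampling unit, so this sketch assumes that students (each a pair of before/after scores) are resampled with replacement within each cohort; resampling schools instead would be a different, cluster-level design. `did_estimate` and `bootstrap_se` are hypothetical names, and any estimator taking the two resampled cohorts (for example, a quartile-specific DID) can be passed in.

```python
import random
from statistics import mean, stdev


def did_estimate(treated, control):
    """DID from (y_before, y_after) pairs: difference in mean within-student changes."""
    return (mean(after - before for before, after in treated)
            - mean(after - before for before, after in control))


def bootstrap_se(treated, control, estimator=did_estimate, reps=300, seed=2023):
    """Bootstrap standard error: resample students with replacement within cohorts."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    draws = []
    for _ in range(reps):
        t_star = rng.choices(treated, k=len(treated))
        c_star = rng.choices(control, k=len(control))
        draws.append(estimator(t_star, c_star))
    return stdev(draws)


# Usage with simulated data (made-up effect of -0.1 SDs):
gen = random.Random(1)
t = [(gen.gauss(0, 1), gen.gauss(-0.1, 1)) for _ in range(200)]
c = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)]
print(did_estimate(t, c), bootstrap_se(t, c))
```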

5.2.2 Heterogeneity of effects by pre-determinant covariates

Previous studies have shown that the impact of COVID-19 on academic achievement is highly heterogeneous, not only across countries and regions but also across individual characteristics. For example, some studies have demonstrated that children from lower-SES households experienced larger pandemic-related declines in academic achievement (Contini et al., 2022; Gore et al., 2021; Haelermans et al., 2022; Kuhfeld et al., 2023; Maldonado & De Witte, 2022). Moreover, Haelermans et al. (2022) showed that the pandemic lowered the academic performance of children from single-parent households more than that of children from dual-parent households. However, evidence on gender differences in the effects of COVID-19 has been mixed across countries and regions (Ardington et al., 2021; Arenas & Gortazar, 2022; Birkelund & Karlson, 2023; Contini et al., 2022; Hevia et al., 2022).

In line with these findings, we examine whether the impact of the COVID-19 pandemic on academic performance in Amagasaki City differs depending on students' prepandemic predetermined characteristics. Specifically, we estimate the following equation, which adds the prepandemic predetermined variables as interaction terms to the full-sample DID analysis defined in Eqs. (1) and (3):

$$\begin{array}{c}{Y}_{{it}}\,={\alpha }^{{Hetero}}+{\gamma }^{{Hetero}}{COVID}{19}_{c}+{\lambda }^{{Hetero}}{Afte}{r}_{t}+{\eta }^{{Hetero}}\,{{Interact}}_{i}\\ +\,{\delta }_{1}^{{Hetero}}{COVID}{19}_{c}* {Afte}{r}_{t}+\,{\delta }_{2}^{{Hetero}}{COVID}{19}_{c}* \,{{Interact}}_{i}\,+\,{\delta }_{3}^{{Hetero}}{Afte}{r}_{t}* \,{{Interact}}_{i}\\ +\,{\delta }_{4}^{{Hetero}}{COVID}{19}_{c}* {Afte}{r}_{t}* \,{{Interact}}_{i}\,+\,{\nu }_{{sg}}^{{Hetero}}+\,{\varepsilon }_{{it}}^{{Hetero}}\end{array}$$
(9)

\({{Interact}}_{i}\) denotes the prepandemic predetermined dummy variable interacted with the DID terms, namely, whether the student received school attendance assistance, whether the student was living with one parent, or whether the student was female. We exclude individual fixed effects from the estimation equation because these dummy variables are constant within individuals over the estimation period. In the analysis 19 months after the school closures, because control group C2 was affected by the COVID-19 pandemic, we add \(\widehat{{\delta }_{1}^{{Hetero}}}\), estimated for each grade group \(g^{\prime}\) \(\in \,\){Grades 2–4, Grades 5–6, Grade 7} using the T1 and C1 cohorts, to the results in Eq. (9).
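To see what \({\delta }_{1}^{{Hetero}}\) and \({\delta }_{4}^{{Hetero}}\) capture: in a fully saturated OLS regression of \({Y}_{{it}}\) on the COVID-19 dummy, the After dummy, a binary moderator, and all their two- and three-way interactions, \({\delta }_{1}\) equals the DID of cell means for the moderator = 0 group and \({\delta }_{4}\) equals the triple difference. Eq. (9) also includes school-grade fixed effects, so this equality is only a simplified sketch of the estimand, not the paper's estimator. Row layout and function names are hypothetical; the same logic applies to \({{Reduc}20}_{{sg}}\) in Eq. (10).

```python
from statistics import mean


def subgroup_did(rows, z):
    """DID of cell means for observations whose moderator equals z.

    rows: list of (covid19, after, moderator, y) tuples.
    """
    def cell(c, a):
        return mean(y for cc, aa, zz, y in rows if (cc, aa, zz) == (c, a, z))
    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))


def triple_difference(rows):
    """Return (delta1, delta4) in the saturated model without fixed effects."""
    rows = list(rows)
    d_z0 = subgroup_did(rows, 0)  # pandemic effect when the moderator is 0
    d_z1 = subgroup_did(rows, 1)  # pandemic effect when the moderator is 1
    return d_z0, d_z1 - d_z0


# Usage with one made-up cell mean per (covid19, after, moderator) cell:
rows = [(0, 0, 0, 0.0), (0, 1, 0, 0.1), (1, 0, 0, 0.0), (1, 1, 0, -0.1),
        (0, 0, 1, 0.0), (0, 1, 1, 0.1), (1, 0, 1, 0.0), (1, 1, 1, -0.3)]
print(triple_difference(rows))
```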

5.2.3 Heterogeneity of effects by athletic events reduction after COVID-19 school closure

In Amagasaki City, many school events, including athletic events, were reduced to compensate for class time lost due to the pandemic, as described in Section 3.2. Since most school events in Amagasaki were held before December, the month of the achievement test, the DID coefficient would understate the negative effect of the COVID-19 pandemic on academic achievement if academic performance recovered more in the schools and grades that reduced school events and used the time for classes.

To test whether the effects of the COVID-19 pandemic on academic performance differ between students in schools and grades that reduced athletic events by more than 200 min and other students, we perform a DDD estimation by interacting \({COVID}{19}_{c}* {Afte}{r}_{t}\) with the school- and grade-level athletic event reduction dummies, \({{Reduc}20}_{{sg}}\) and \({{Reduc}21}_{{sg}}\). These dummy variables take a value of 1 if athletic events in school \(s\) and grade \({g}\) were reduced by more than 200 min in 2020 and 2021, respectively, relative to the total time in 2019. Since the reduction in hours for athletic events varies across schools and grades, we allow the dummy variables to take different values for different grades, rather than grade groups, even within the same school. The estimation equation for the effect 7 months after the school closures is as follows:

$$\begin{array}{c}{Y}_{{it}}\,={\alpha }^{{DDD}}+{\gamma }^{{DDD}}{COVID}{19}_{c}+{\lambda }^{{DDD}}{Afte}{r}_{t}+{\eta }^{{DDD}}\,{{Reduc}20}_{{sg}}\\ +\,{\delta }_{1}^{{DDD}}{COVID}{19}_{c}* {Afte}{r}_{t}+\,{\delta }_{2}^{{DDD}}{COVID}{19}_{c}* {{Reduc}20}_{{sg}}\,+\,{\delta }_{3}^{{DDD}}{Afte}{r}_{t}* {{Reduc}20}_{{sg}}\\ +{\delta }_{4}^{{DDD}}{COVID}{19}_{c}* {Afte}{r}_{t}* {{Reduc}20}_{{sg}}\,+\,{\nu }_{{sg}}^{{DDD}}+\,{\varepsilon }_{{it}}^{{DDD}}\end{array}$$
(10)

We exclude individual fixed effects from Eq. (10) because \({{Reduc}20}_{{sg}}\) is constant within individuals over the estimation period. We assume that \(E\left[{\varepsilon }_{{it}}^{{DDD}}|t\right]=0\). The effect of the pandemic on students in schools and grades that reduced athletic events by more than 200 min from 2019 to 2020 is \({\delta }_{1}^{{DDD}}+{\delta }_{4}^{{DDD}}\), whereas the effect on other students is \({\delta }_{1}^{{DDD}}\). Thus, we present only \(\widehat{{\delta }_{4}^{{DDD}}}\), the difference in the effect between students in schools and grades with Reduc20 = 1 and those with Reduc20 = 0.

Next, the estimation equation after 19 months of the school closures is as follows:

$$\begin{array}{c}{Y}_{{it}}={\alpha }^{{DDD},L}+{\gamma }^{{DDD},L}{COVID}{19}_{c}+{\lambda }^{{DDD},L}{Afte}{r}_{t}+{\eta }_{1}^{{DDD},\,L}\,{{Reduc}20}_{{sg}}+{\eta }_{2}^{{DDD},L}\,{{Reduc}21}_{{sg}}\\ +{\delta }_{1}^{{DDD},L}{COVID}{19}_{c}* {Afte}{r}_{t}\,+\,{\delta }_{2}^{{DDD},L}{COVID}{19}_{c}* {{Reduc}20}_{{sg}}+{\delta }_{3}^{{DDD},L}{COVID}{19}_{c}* {{Reduc}21}_{{sg}}\\ +{\delta }_{4}^{{DDD},L}{Afte}{r}_{t}* {{Reduc}20}_{{sg}}+\,{\delta }_{5}^{{DDD},L}{Afte}{r}_{t}* {{Reduc}21}_{{sg}}\\ +{\delta }_{6}^{{DDD},L}{COVID}{19}_{c}* {Afte}{r}_{t}* {{Reduc}20}_{{sg}}+\,{\delta }_{7}^{{DDD},L}{COVID}{19}_{c}* {Afte}{r}_{t}* {{Reduc}21}_{{sg}}\,+\,{\nu }_{{sg}}^{{DDD},L}+\,{\varepsilon }_{{it}}^{{DDD},L}\end{array}$$
(11)

As in Eq. (4), because control group C2 was affected by the COVID-19 pandemic, we add \(\widehat{{\delta }_{1}^{{DDD}}}\), estimated from Eq. (10) for each grade group \(g^{\prime}\) \(\in \,\){Grades 2–4, Grades 5–6, Grade 7} using the T1 and C1 cohorts, to the estimated results from Eq. (11). We also exclude individual fixed effects from the estimation equation because \({{Reduc}20}_{{sg}}\) and \({{Reduc}21}_{{sg}}\) are constant within individuals over the estimation period. Moreover, we assume that \(E\left[{\varepsilon }_{{it}}^{{DDD},L}|t\right]=0\).

In Eq. (11), separate treatment effects are estimated for the FY2020 and FY2021 reductions in athletic events, so the estimated effect differs across the four reduction patterns (combinations of Reduc20 and Reduc21). Since only a few schools reduced events by fewer than 200 min in 2020 and by more than 200 min in 2021, we present only \(\widehat{{\delta }_{6}^{{DDD},L}}\), the difference between students in schools with Reduc20 = 1 and Reduc21 = 0 and those with Reduc20 = Reduc21 = 0, and \(\widehat{{\delta }_{6}^{{DDD},L}}+\widehat{{\delta }_{7}^{{DDD},L}}\), the difference between students in schools with Reduc20 = Reduc21 = 1 and those with Reduc20 = Reduc21 = 0.
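The mapping from the coefficients of Eq. (11) to the four reduction patterns can be sketched as below. The `adjustment` argument stands for the added \(\widehat{{\delta }_{1}^{{DDD}}}\) from Eq. (10); treating it as a common additive offset for every pattern is an assumption of this sketch, under which it cancels out of the two reported contrasts. Function names are hypothetical.

```python
def effects_by_pattern(d1, d6, d7, adjustment=0.0):
    """Implied 19-month effect for each (Reduc20, Reduc21) pattern in Eq. (11)."""
    base = d1 + adjustment  # effect with no reduction in either year
    return {
        (0, 0): base,
        (1, 0): base + d6,
        (0, 1): base + d7,
        (1, 1): base + d6 + d7,
    }


def reported_contrasts(effects):
    """The two contrasts presented in the text, both relative to (0, 0)."""
    ref = effects[(0, 0)]
    return effects[(1, 0)] - ref, effects[(1, 1)] - ref  # d6 and d6 + d7


# Usage with made-up coefficients:
effects = effects_by_pattern(-0.25, 0.05, 0.03, adjustment=-0.1)
print(reported_contrasts(effects))
```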

6 Estimation results

6.1 Results of the main analysis

Figure 3 and Table A1 present the estimated effects of the COVID-19 pandemic on standardized math and Japanese language test scores. The estimated coefficients 7 months after the school closures are plotted on the left-hand side of the figure and reported in columns (1)–(5) of the table, and those 19 months after the closures are plotted on the right-hand side of the figure and reported in columns (6)–(10).

Fig. 3

Main results: full-sample DID

The results indicate that the COVID-19 pandemic had a more severe negative impact on math than on Japanese language at both 7 and 19 months after the school closures. Specifically, averaged over all grades, math scores declined by 0.133 SDs and 0.249 SDs at 7 and 19 months after the school closures, respectively. In contrast, Japanese language scores were not significantly affected 7 months after the school closures but declined by 0.113 SDs 19 months after the closures.

By grade group, negative effects on math were observed in all grade groups, whereas the impact of the pandemic on Japanese language differed by grade. Specifically, the effects of the pandemic on math scores were as follows: Grades 1–3, −0.134 SDs at 7 months and −0.205 SDs at 19 months; Grades 4–5, −0.167 SDs at 7 months and −0.321 SDs at 19 months; and Grades 6–7, −0.086 SDs at 7 months and −0.225 SDs at 19 months. For Japanese language, only Grades 1–3 were negatively affected (approximately −0.2 SDs); surprisingly, no negative effects were observed in the other grades. For Grades 4–5, Japanese language test scores increased by 0.134 SDs 7 months after the school closures, but this significant positive effect was no longer observed at 19 months. For Grades 6–7, no significant effects were observed at either 7 or 19 months after the school closures.

A possible reason for the relatively large and long-lasting declines in math scores and in the Japanese language scores of lower-grade students is that lost regular in-person instruction in these subjects and grade groups is difficult to substitute. There are four alternatives to regular in-person instruction: online education by schools, home education through parental involvement, in-person extracurricular education, and online extracurricular education by cram schools and private tutors. In Amagasaki City, online education by schools and in-person extracurricular education were not available during the school closures, leaving home education by parents and online extracurricular education by cram schools and private tutors as the only alternatives. The learning efficiency of these alternatives may have varied by subject and grade, causing differences in the impact of the pandemic on test scores.

The heterogeneity in the efficiency of the alternative educational methods may stem from two factors. The first is the difference in the amount of practice required for learning retention. For math, previous studies show that extensive practice is required to establish math skills (Cooper & Sweller, 1987; Geary, 1995; Laski et al., 2013; Murnane, 1975; Ramirez et al., 2016). Math can be difficult to learn outside formal settings such as schools, whereas reading can be learned at home (Murnane, 1975). Moreover, because solving advanced math problems requires mastery of simpler methods, students who have not fully mastered elementary problem-solving methods may have difficulty learning and practicing on their own (Laski et al., 2013; Ramirez et al., 2016). Furthermore, the decline in academic achievement due to a lack of practice may be greater in math than in reading because developing math computation and spelling skills requires extensive practice, whereas understanding concepts in reading comprehension draws on experience rather than practice (Cooper & Sweller, 1987; Geary, 1995). For Japanese language, the curriculum in the lower grades includes units that require practice, such as spelling, and the share of units requiring less practice increases in the higher grades. Therefore, if the amount of practice in these areas was reduced during the school closures, Japanese language test scores would decline in the lower grade groups.

The second is the difference in the difficulty of parental involvement. The extant literature shows that parental involvement in children's learning contributed to the recovery of test scores after the COVID-19 pandemic in Chinese language but not in math (Wang et al., 2023), and that parental support for math learning at home requires training to acquire appropriate knowledge, appropriate experiences, positive beliefs, and self-efficacy (Panaoura, 2020). Our findings are consistent with these studies and suggest that compensating for the decline in test scores through learning at home was more difficult in math than in Japanese language.

Therefore, the smaller need for practice and the relative ease of parental involvement may explain why the Japanese language scores of upper elementary and junior high school students (Grade Groups 4–5 and 6–7) were not negatively affected. However, we cannot verify these mechanisms due to a lack of data.

6.2 Results of the heterogeneity of effects analysis

6.2.1 Heterogeneity of effects across test score quartiles

Figures 4 and 5 and Tables A2 and A3 show the quantile-DID estimates of the effects of the COVID-19 pandemic on standardized test scores in math and Japanese language. In each figure, the upper and lower panels show the effects of COVID-19 at 7 and 19 months after the closures, respectively. In each table, columns (1)–(5) and (6)–(10) show the effects 7 and 19 months after the closures, respectively.

Fig. 4

Quantile-DID results: Math

Fig. 5

Quantile-DID results: Japanese language

Figure 4 and Table A2 show that the negative impact of the pandemic on math test scores varied by score quartile in all grade groups. Specifically, Grade Groups 1–3 had negative and significant coefficients for the 1st–3rd QTs, and a positive and significant effect only for the 4th QT. The coefficients were more negative in the lower QTs and became more negative from 7 to 19 months after the school closures. For Grade Groups 4–5, all the coefficients were negative and significant, and they became more negative from 7 to 19 months after the school closures. For Grade Groups 6–7, the coefficient was negative and significant only for the 1st QT 7 months after the closures. At 19 months after the closures, Grade Group 6 had negative and significant coefficients for the 1st and 2nd QTs, and the negative coefficients for these QTs grew in magnitude from 7 to 19 months after the closures.

Figure 5 and Table A3 show that the negative impact of the COVID-19 pandemic on Japanese language scores varied by score quartile for Grades 1–5. Specifically, Grade Groups 1–3 had negative and significant coefficients in all QTs, with the upper QTs more negatively affected, and these coefficients did not change between 7 and 19 months after the school closures. For Grades 4–5, only the 4th QT had a negative and significant coefficient, which grew in magnitude from −0.085 SDs 7 months after the school closures to −0.338 SDs 19 months after the closures. However, this group showed positive and significant effects for the 1st–3rd QTs 7 months after the school closures and for the 1st QT only 19 months after the closures. In Grade 6 (Grade Groups 6–7 in the analysis 7 months after the school closures), no negative impact was observed in any test score quartile.

A possible reason why the decline in test scores varies across test score quartiles is that the difficulty of substituting home education by parents and online extracurricular education by cram schools and private tutors for regular in-person instruction differed across the score distribution. As noted in Section 6.1, math requires practice to raise scores. If the stock of pre-pandemic practice is smaller in the lower quartiles, students in those quartiles may also learn less efficiently through the alternative educational methods after the pandemic. For Japanese language, the alternative educational methods may have mitigated the decline in scores because the need for practice is low and home education by parents is more straightforward. However, if higher-QT elementary school students maintained their scores by attending cram schools and using private tutors, the shift of extracurricular education from in-person to online after the pandemic may have reduced their learning efficiency in Japanese language. Unfortunately, we cannot further elucidate these mechanisms due to a lack of data.

6.2.2 Heterogeneity of effects by pre-determinant covariates

Figures 6–8 and Tables A4–A6 show the DID coefficients with interaction terms for the predetermined variables: the school attendance assistance (SAA) receipt dummy, the living-with-one-parent dummy, and the female dummy.

Fig. 6

Heterogeneity of treatment effects across groups (students receiving school attendance assistance and others)

Fig. 7

Heterogeneity of effects across groups (students living with one parent and others)

Fig. 8

Heterogeneity of effects across groups (female students and male students)

Figures 6 and 7 and Tables A4 and A5 show no statistically significant differences in the impact of the COVID-19 pandemic on math and Japanese language scores by living conditions measured at After = 0, that is, by whether the student was receiving SAA or was living with one parent. However, Fig. 8 and Table A6 indicate that, in Grade 6 only, the negative impact of the pandemic on Japanese language scores was significantly smaller for female students than for male students. This finding suggests that, in the upper elementary grades, female students' Japanese language scores were less adversely affected by the COVID-19 pandemic than those of male students.

6.2.3 Results of effects of athletic events reduction after COVID-19 school closure

Before the analysis, Table A7 presents descriptive statistics for students in the schools and grades that reduced the duration of athletic events by 200 min or more in 2020 and 2021 and for other students. The table shows that students in schools and grades that reduced the duration of athletic events by more than 200 min had lower academic achievement in both treatment group 1 (T1, 2019–2020) and treatment group 2 (T2, 2019–2021). Moreover, in treatment group 1, higher percentages of students in schools and grades that reduced athletic events by more than 200 min were receiving SAA and were living with one parent. Thus, the schools and grades that substantially reduced the duration of their athletic events had relatively low academic performance and living standards before the pandemic. If the negative impact of the COVID-19 pandemic was greater in these schools and grades, the reduction in athletic events may have been self-selected. Therefore, we use DDD estimation to examine whether recovery from the negative effects of the pandemic differed between students in schools and grades that reduced the duration of athletic events by more than 200 min and students in other schools and grades.
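The comparison in Table A7 amounts to a pre-pandemic balance check between the two groups. A minimal sketch follows, assuming student-level records with a hypothetical 0/1 field `reduc20` and baseline covariates; the standardized mean difference (SMD) it reports is an illustrative addition and is not a statistic shown in Table A7.

```python
"""Sketch of a pre-pandemic balance check by athletic-event reduction status."""
from math import nan, sqrt
from statistics import mean, pvariance


def standardized_mean_difference(group1, group0):
    """(mean1 - mean0) / pooled SD, with pooled SD = sqrt((var1 + var0) / 2)."""
    pooled_sd = sqrt((pvariance(group1) + pvariance(group0)) / 2)
    if pooled_sd == 0:
        return nan  # undefined when both groups are constant
    return (mean(group1) - mean(group0)) / pooled_sd


def balance_table(records, covariates, flag="reduc20"):
    """Return {covariate: (mean if flag == 1, mean if flag == 0, SMD)}."""
    table = {}
    for cov in covariates:
        g1 = [r[cov] for r in records if r[flag] == 1]
        g0 = [r[cov] for r in records if r[flag] == 0]
        table[cov] = (mean(g1), mean(g0), standardized_mean_difference(g1, g0))
    return table


if __name__ == "__main__":
    # Hypothetical students: baseline math score and SAA receipt (1 = receiving).
    students = [
        {"reduc20": 1, "math_pre": -0.4, "saa": 1},
        {"reduc20": 1, "math_pre": -0.2, "saa": 1},
        {"reduc20": 1, "math_pre": 0.0, "saa": 0},
        {"reduc20": 0, "math_pre": 0.0, "saa": 0},
        {"reduc20": 0, "math_pre": 0.2, "saa": 1},
        {"reduc20": 0, "math_pre": 0.4, "saa": 0},
    ]
    for cov, (m1, m0, smd) in balance_table(students, ["math_pre", "saa"]).items():
        print(f"{cov}: reduced={m1:.3f}, other={m0:.3f}, SMD={smd:+.2f}")
```

A negative SMD for baseline scores together with a positive SMD for SAA receipt is the pattern that would raise the self-selection concern described above.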

Figure 9 and Table A8 show the differences in the effects of the COVID-19 pandemic between students in schools with Reduc20 = 1 and those with Reduc20 = 0, as estimated by the DDD analysis in Eqs. (10) and (11). In Fig. 9, the coefficients on the effects at 7 and 19 months after the school closures, estimated using Eqs. (10) and (11), are plotted on the left and right sides, respectively. To gauge the magnitude of these estimates, Fig. A6 also shows the treatment effects for students in schools with Reduc20 = 0, captured by the coefficient on COVID-19 × After, that is, \(\widehat{\delta}_{1}^{DDD}\) in Eq. (10) and \(\widehat{\delta}_{1}^{DDD,L}\) in Eq. (11).
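For the simplest case, the DDD comparison can be computed directly from cell means. A minimal sketch, assuming observations stored as hypothetical tuples `(covid, after, reduc, score)` with 0/1 indicators and no covariates or fixed effects: in that saturated case, OLS on the fully interacted model reproduces the eight cell means, so the coefficient on COVID × After equals the DID among Reduc = 0 students and the triple-interaction coefficient equals the difference between the two groups' DIDs. Eqs. (10) and (11) may contain additional controls, so this illustrates the identifying comparison rather than reproducing the paper's estimator.

```python
"""Triple-difference (DDD) sketch computed from 2x2x2 cell means.

Assumption: in the saturated model
    y = b0 + b1*COVID + b2*After + b3*Reduc + b4*COVID*After
        + b5*COVID*Reduc + b6*After*Reduc + b7*COVID*After*Reduc + e
without further controls, OLS fits the eight cell means exactly, so
    b4 = DID among Reduc = 0 students, and
    b7 = DID(Reduc = 1) - DID(Reduc = 0).
"""
from statistics import mean


def cell_mean(data, covid, after, reduc):
    """Mean score of observations in one (COVID, After, Reduc) cell."""
    ys = [y for c, a, r, y in data if (c, a, r) == (covid, after, reduc)]
    if not ys:
        raise ValueError(f"empty cell: covid={covid}, after={after}, reduc={reduc}")
    return mean(ys)


def did(data, reduc):
    """Treated cohort's pre-post change minus the control cohort's, within one Reduc group."""
    treated = cell_mean(data, 1, 1, reduc) - cell_mean(data, 1, 0, reduc)
    control = cell_mean(data, 0, 1, reduc) - cell_mean(data, 0, 0, reduc)
    return treated - control


def ddd(data):
    """Return (b4, b7): the Reduc = 0 effect and the Reduc = 1 minus Reduc = 0 difference."""
    base = did(data, 0)
    return base, did(data, 1) - base


if __name__ == "__main__":
    # Hypothetical data: growth 0.1, pandemic effect -0.2, and +0.05 extra for Reduc = 1.
    data = []
    for covid in (0, 1):
        for after in (0, 1):
            for reduc in (0, 1):
                mu = 0.1 * after - 0.2 * covid * after + 0.05 * covid * after * reduc
                # two observations per cell, scattered symmetrically around mu
                data += [(covid, after, reduc, mu + 0.01), (covid, after, reduc, mu - 0.01)]
    b4, b7 = ddd(data)
    print(f"effect for Reduc = 0: {b4:+.3f}; extra effect for Reduc = 1: {b7:+.3f}")
```

In practice the same coefficients, with standard errors, would come from a regression package; the cell-mean version only makes explicit which comparison b7 captures.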

Fig. 9

DDD results (students in schools that reduced athletic events by over 200 min and others)

Figure 9 and Table A8 show that, for all subjects, post-closure periods, and grade groups, there are no statistically significant differences at the 5% level in the impact of the COVID-19 pandemic on test scores between students in schools that reduced athletic events by more than 200 min and those in other schools. Moreover, Fig. A6 shows that these coefficients are smaller in magnitude than the baseline ATT for students in schools with Reduc20 = 0 at 7 months after the closures and with Reduc20 = Reduc21 = 0 at 19 months after the closures. Thus, we conclude that schools that drastically reduced the time spent on athletic events may have slightly improved their students' test scores at 19 months after the school closures, but any such improvement was not statistically significant and was not enough to counteract the negative effects of the pandemic.

7 Conclusions

We examined whether the COVID-19 pandemic affected the standardized math and Japanese language test scores of students in Grades 1–7 in all public elementary and junior high schools in Amagasaki using DID estimation.

The analysis compared the changes in academic performance from 2019 to 2020 for the cohort that experienced COVID-19 (treatment group 1) with the changes from 2018 to 2019 for the cohort that did not (control group 1). In addition, to analyze the impact at 19 months after the school closures, we compared the changes from 2019 to 2021 for the cohort that took the test twice after the school closures (treatment group 2) with the changes from 2018 to 2020 for the cohort that took the test only once after the closures (control group 2). Because control group 2 was affected by the school closures in 2020, we removed this effect by subtracting the difference between treatment group 1 and control group 1. To facilitate interpretation, we created three grade groups (lower elementary, upper elementary, and junior high school) and estimated the effects separately for each grade group. We also performed quantile-DID analyses and DID analyses with interaction terms to examine the heterogeneity of effects across test score quantiles and predetermined covariates. Finally, we conducted DDD estimation to identify differences in effects between students in schools that substantially reduced the duration of their athletic events after the school closures and students in other schools.
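The adjustment for control group 2 described above can be written as simple arithmetic on group-mean changes. A minimal sketch, assuming (as the text implies) that control group 2's 2018–2020 change contains the same 7-month closure effect as treatment group 1's 2019–2020 change; the function and argument names are hypothetical, and the regression in Eq. (11) may differ in its controls.

```python
def effect_at_19_months(d_t1: float, d_c1: float, d_t2: float, d_c2: float) -> float:
    """Sketch of the 19-month effect from mean score changes (illustrative names).

    d_t1: change 2019->2020, treatment group 1
    d_c1: change 2018->2019, control group 1
    d_t2: change 2019->2021, treatment group 2
    d_c2: change 2018->2020, control group 2 (includes the 7-month closure effect)
    """
    effect_7 = d_t1 - d_c1           # 7-month effect from the first DID
    clean_control = d_c2 - effect_7  # remove that effect from control group 2
    return d_t2 - clean_control      # DID against the purged control change


if __name__ == "__main__":
    # Hypothetical values: yearly growth 0.05, 7-month effect -0.1, 19-month effect -0.2.
    print(effect_at_19_months(d_t1=-0.05, d_c1=0.05, d_t2=-0.1, d_c2=0.0))
```

Equivalently, the 19-month effect is (d_t2 − d_c2) + (d_t1 − d_c1): the naive comparison of treatment group 2 with control group 2 understates the loss by exactly the 7-month effect, which is then added back.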

The results of the main DID estimation showed that, for all school grades, the negative effects of the COVID-19 pandemic were larger for math scores than for Japanese language scores at both 7 and 19 months after the school closures. Specifically, no significant effect was observed on Japanese language scores 7 months after the school closures, but these scores worsened by 0.113 SDs 19 months after the closures. Math scores worsened by 0.133 SDs and 0.249 SDs at 7 and 19 months after the school closures, respectively.

Across the three grade groups, the negative effects on math scores did not differ, whereas Japanese language scores were negatively affected in the lower grade groups. The analysis by test score quartile revealed that math scores decreased more in the lower quartiles in all grades, whereas Japanese language scores decreased more in the upper quartiles, but only for elementary school students. The persistence of the negative effects 19 months after the school closures, both in math for all public elementary and junior high school students (Grades 1–7) and in the Japanese language for the early elementary grades (Grades 1–3), could not be explained by gender, the pre-pandemic SES of the student's family, or the reduction in the duration of school athletic events after the school closures.

Compared with the effect sizes of the previous studies presented in Table 1, the negative effects 19 months after the school closures in Amagasaki City were almost twice as large for math (previous study: −0.139 SDs; Amagasaki: −0.251 SDs) and slightly smaller for the Japanese language (previous study: −0.088 SDs; Amagasaki: −0.062 SDs). Nevertheless, our finding of a greater negative effect in math than in the Japanese language is consistent with the findings of previous studies. By grade level, our findings are also consistent with those of the previous studies presented in Table 1: in Amagasaki, the magnitude of the negative effect on math scores was similar across grades, and the negative effect on Japanese language scores was larger in the lower grades at 7 months after the school closures. Specifically, in Amagasaki City, the negative effect on Japanese language scores in Grades 1–3 was greater than in extant studies, whereas, unlike in extant studies, no negative effect was observed in Grade Groups 4–5 or 6–7.

Amagasaki City experienced a greater deterioration in math scores due to the pandemic than did other municipalities in Japan. For example, Asakawa and Ohtake (2022) showed that in Nara City, Japan, math scores in Grades 4 and 5 had already recovered by 7 months after the school closures (0.05 SDs for Grade 4 and 0.46 SDs for Grade 5). In Amagasaki, by contrast, the negative impact for Grades 4–5 was still −0.167 SDs at the same point, leaving recovery behind that in Nara City by 0.217–0.393 SDs.

Although our analysis used the available information (gender, SAA status, and the number of parents living with the student before the COVID-19 pandemic, as well as the reduced hours of athletic events after the school closures) to examine the channels through which the effects of the COVID-19 pandemic on test scores persisted over the long term, it failed to identify these channels.

There are three further possible reasons why students' academic performance in Amagasaki City had not recovered even 19 months after the school closures. The first possible reason is that the negative effect of COVID-19 on the test scores of elementary and junior high school students in Amagasaki City may be greater than the Japan-wide effect on students in the same grades, because Amagasaki City is a municipality with one of the highest percentages of students with low SES. This explanation is consistent with studies showing that students with lower SES and sociocultural capital were more affected by COVID-19 (Gore et al., 2021; Schult et al., 2022a, 2022b). However, the test used in Amagasaki City is not comparable to those used in other municipalities or to the national achievement test, so we leave comparisons across municipalities for future research.

The second possible reason is that the class time lost due to the pandemic was not fully compensated for by online learning and extracurricular education. In Amagasaki City, schools were unable to fully make up for the class time lost during the 10 weeks (50 days) of COVID-19 school closures.Footnote 8 Although high-quality online education can compensate for lost instructional time and significantly mitigate the decline in academic achievement caused by COVID-19 (Clark et al., 2021), online learning was not introduced in Amagasaki City until April 2022.Footnote 9 This delay may have hindered the recovery of test scores. Moreover, the percentage of students who use extracurricular education (e.g., cram schools and private tutoring) is also low in Amagasaki City.Footnote 10 Given prior findings that math test scores fully recovered within a year in Nara City, where a high percentage of students use extracurricular education (Asakawa & Ohtake, 2022), and that attending a cram school improves math and Japanese language test scores (Abe et al., 2023), low use of extracurricular education may also have delayed the recovery of test scores in Amagasaki City. However, we do not have sufficient data to verify these explanations, and further analysis is needed.

A third possible reason is that the COVID-19 pandemic worsened students' mental health, resulting in a long-term decline in academic performance. In Japan, seven surveys conducted by the National Center for Child Health and Development (NCCHD) confirmed a long-term deterioration in students' mental health after the spread of COVID-19 (NCCHD, 2023). However, no evidence has yet shown that academic performance declined because of this pandemic-related deterioration in mental health. Unfortunately, we cannot examine this channel because the administrative data from Amagasaki City do not contain information on students' mental health comparable to that used in previous studies.

One possible caveat concerns the 2020 revision of the course of study. As discussed in Section 5.1.3, class time for math and the Japanese language did not change, although the increase in English classes may have affected students' academic performance. Additionally, the revision changed the curriculum and, with it, the study content. If test difficulty changed because of the curriculum revision, we cannot completely distinguish the effects of the COVID-19 pandemic on academic achievement from the effects of curriculum differences, even though the tests are IRT-based. Moreover, if the degree of change in difficulty differs across grades, it is difficult to accurately estimate the impact of the COVID-19 pandemic on academic achievement for each grade. However, although the revised course of study went into full effect in April 2020, at the same time as the COVID-19 pandemic, the changes were likely implemented in stages after their announcement in 2018. If parents adjusted their behavior in advance, the impact of the curriculum revision is expected to be small.

A future research direction based on this study would be to analyze the impact of the pandemic on learning attitudes (e.g., learning time). In Japan, several studies have examined the impact of COVID-19 school closures on learning time (Ikeda & Yamaguchi, 2021; Nishihata & Kobayashi, 2023). For example, using logs of online learning service use, Ikeda and Yamaguchi (2021) found that students decreased their learning time using these services only during the COVID-19 school closures. They also found that the decline in learning time was heterogeneous across students, with students who had accessed online learning services at home and students in higher-quality schools spending more time learning than other students. Nishihata and Kobayashi (2023) showed that students in schools with longer COVID-19 closures had less learning time and more screen time, and these effects were more pronounced for students from low-income families, students with lower academic achievement, and elementary school students in single-parent households. Due to differences in extracurricular education use, the impact of the pandemic on learning attitudes may differ between Amagasaki City and other municipalities. This issue is outside the scope of this study and will be the subject of future research.