1 Introduction

The COVID-19 pandemic caused by SARS-CoV-2, first reported at the end of 2019, has spread rapidly worldwide. Countries around the world have taken various measures, including quarantines, to contain its spread. However, three years later, COVID-19 has not yet been eliminated and has caused repeated outbreaks. In Japan, the disease spread quickly throughout the country after the first case was detected, and new cases were still occurring at the end of 2022. Studying the spread of infections can help in combating both current and future infectious diseases. Because COVID-19 is highly contagious, once it starts to spread, it spreads across regions almost simultaneously but never evenly, with some areas having more infected people than others at the same time.

Previous studies have suggested that population mobility may significantly affect COVID-19 infection rates. For example, a reduction in travel has been reported to affect three epidemiological outcome measures: the number of exported cases, the probability of an outbreak, and the time delay to an outbreak [1, 2]. Noda reported a correlation between the effective reproduction numbers estimated from the epidemic curves of the target prefectures (Tokyo, Osaka, Aichi, Hokkaido, Kanagawa and Kyoto) and behaviour patterns classified from mesh data using non-negative tensor factorization [3]. A correlation has also been reported between population influx data for Tokyo during the pandemic and the effective reproduction number when time delays are taken into account [4].

However, few studies have investigated how infection trends vary between regions across multiple waves of the pandemic. This study therefore investigates inter-regional trends in the COVID-19 pandemic in Japan. Because pandemics are likely to occur again, preparation is needed for pandemics in general, not only for COVID-19, and the findings could help inform the control of current and future infectious diseases.

2 Methodology

2.1 Definition of the indicator used in this study

In this study, we use open data on novel coronavirus infections provided by the Japanese Ministry of Health, Labour and Welfare [5]. First, the population-adjusted infection rate of each prefecture is calculated by dividing the number of infections in the prefecture by its population. The prefectural infection ratio \(T_{r}\) is then defined in Eq. (1) as this rate divided by the corresponding national rate:

$$\begin{aligned} T_r (t) = \frac{C_{r}(t)/P_r}{\sum _{r'} C_{r'}(t) / \sum _{r'} P_{r'}}, \end{aligned} \tag{1}$$

where \(C_{r}(t)\) and \(P_r\) are the number of positive COVID-19 tests in region r at time t (counted cumulatively within each wave, as described below) and the population of region r, respectively. As this study focuses on the prefectures of Japan, r denotes one of the 47 prefectures, and the sums in the denominator run over all of them. The parameter t denotes time. Hereafter, the prefectural infection ratio is abbreviated as PIR.

The PIR compares the population-adjusted infection rate of a prefecture with the national population-adjusted infection rate as a baseline. A PIR greater than 1.0 indicates that infection in the prefecture is more active than the national average, a PIR close to 1.0 indicates that infection in the prefecture follows the national trend, and a PIR less than 1.0 indicates that the infection situation in the prefecture is more moderate than at the national level. The number of infected individuals is calculated as the cumulative value within the target period of each pandemic wave, so the national cumulative number of infected persons is small in the early part of each target period. Because this national value appears in the denominator of the PIR, the PIR of a prefecture tends to be higher in the early part of each target period.
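To make Eq. (1) concrete, the following is a minimal Python sketch, assuming the daily counts of new positive tests for one wave are available as a NumPy array of shape (days, prefectures) starting on the first day of the wave. The function name `pir` and this array layout are hypothetical illustrations, not the study's code. Counts are accumulated from the start of the wave, as described above.

```python
import numpy as np


def pir(daily_cases: np.ndarray, population: np.ndarray) -> np.ndarray:
    """Prefectural infection ratio T_r(t) of Eq. (1) for a single wave.

    daily_cases: shape (n_days, n_regions); new positive tests per day,
        with day 0 being the first day of the wave (hypothetical layout).
    population: shape (n_regions,); population P_r of each region.
    Returns an array of shape (n_days, n_regions); days on which the
    national cumulative count is still zero are NaN.
    """
    # C_r(t): positives accumulated from the start of the wave
    cumulative = np.cumsum(daily_cases, axis=0).astype(float)
    population = np.asarray(population, dtype=float)
    # Population-adjusted infection rate of each region, C_r(t) / P_r
    regional_rate = cumulative / population
    # National baseline, sum_r C_r(t) / sum_r P_r, kept 2-D for broadcasting
    national_rate = cumulative.sum(axis=1, keepdims=True) / population.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = regional_rate / national_rate
    # Undefined where no national cases have been recorded yet
    return np.where(national_rate > 0, ratio, np.nan)
```

By construction, the population-weighted mean of \(T_r(t)\) over all regions equals 1 on every day with at least one national case, which makes a convenient sanity check on any implementation.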

2.2 Target period

Prior to the analysis, seven periods were defined as COVID-19 pandemic waves (Table 1). Although there is no clear delimitation of each wave, the boundaries between waves were determined with reference to the daily nationwide number of cases.

For the fifth and seventh waves, interference from the preceding waves cannot be ruled out, because each of these waves started before the national daily number of COVID-19 cases had dropped overall. The changeover between waves 4 and 5 and that between waves 6 and 7 occurred at similar times of the year. SARS-CoV-2, the virus that causes COVID-19, has continued to spread through a large number of mutations. First-wave data were excluded from this study because the number of infected persons in that wave was relatively small (Table 1).

Table 1 Periods of the COVID-19 pandemic waves

3 Results and discussion

For the seven COVID-19 pandemic waves up to the end of October 2022, the PIR is calculated daily and treated as time-series data. For each wave, the prefectures are ranked by the peak value of their PIR time series, and the top eight are extracted (Table 2). The changes in PIR differ in character from wave to wave, and no general rule applicable to all pandemic waves has yet been identified. Here, we report the results for the fourth and sixth waves, because they show distinct and contrasting characteristics.
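The ranking step can be sketched as follows, assuming the daily PIR values of one wave are stored as an array of shape (days, prefectures) and that "peak value" means the maximum daily PIR within the wave. The function `top_by_peak` is a hypothetical helper, not the authors' code.

```python
import numpy as np


def top_by_peak(pir_series: np.ndarray, names: list, k: int = 8) -> list:
    """Names of the k regions with the highest peak PIR in one wave.

    pir_series: shape (n_days, n_regions), daily PIR values (NaN allowed).
    names: region names in column order.
    """
    # Peak PIR of each region over the wave, ignoring undefined (NaN) days
    peaks = np.nanmax(pir_series, axis=0)
    # Rank by descending peak; a stable sort keeps column order for ties
    order = np.argsort(-peaks, kind="stable")
    return [names[i] for i in order[:k]]
```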

The eight prefectures extracted in descending order of peak PIR (Table 2) are then clustered hierarchically on the basis of two temporal features (Section 3.1). Ward's method, a type of agglomerative hierarchical clustering, was used, with the Euclidean distance between the PIR-derived features of the prefectures as the distance measure.
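A minimal sketch of this clustering step with SciPy is shown below, assuming each prefecture is represented by a row holding its two temporal features (days to peak and days to 75% of peak, as defined in Section 3.1). The helper name `ward_groups` is hypothetical, and the number of groups is supplied by hand here; the sketch does not reproduce how the dendrogram was cut in the study.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def ward_groups(features: np.ndarray, n_groups: int) -> np.ndarray:
    """Flat cluster labels (1..n_groups) from Ward's hierarchical clustering.

    features: shape (n_prefectures, 2), one row per prefecture, e.g.
        [days to peak, days to 75 % of peak].
    """
    # Ward's method: at each step merge the two clusters whose union gives
    # the smallest increase in within-cluster sum of squared Euclidean distances
    Z = linkage(features, method="ward", metric="euclidean")
    # Cut the dendrogram so that at most n_groups clusters remain
    return fcluster(Z, t=n_groups, criterion="maxclust")
```

Because both features are measured in days, they share a scale, so plain Euclidean distance on the raw values is a reasonable choice without standardisation.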

Table 2 The top eight prefectures in descending order of the PIR value of each COVID-19 pandemic wave

3.1 Fourth pandemic wave

From the PIR time-series data, two time-related features are extracted: the number of days from the start date of the fourth wave until the date of the peak PIR, and the number of days from the start date of the fourth wave until the first day on which the PIR exceeded an ad hoc threshold of 75% of the peak. To examine the order of infection by prefecture, it is necessary not only to identify the peak of infection but also to define the time at which infection became nearly widespread. This is because, if the PIR remains high for some time around the peak, infection activity may have begun well before the peak itself. There are various ways to define this date, but because the peak PIR varies between epidemic waves and prefectures, we define it relative to each prefecture's own peak, as the first day on which the PIR exceeds 75% of that peak. Figure 1 shows the two extracted quantities as a scatterplot. Figure 2 shows the progression of the fourth wave on a map of Japan.
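The two features can be extracted from one prefecture's daily PIR series as in the following sketch. It assumes day 0 is the start date of the wave, that NaN marks days on which the PIR is undefined, and that the peak PIR is positive; the function name `temporal_features` is hypothetical. The default `fraction=0.75` is the ad hoc threshold described above.

```python
import numpy as np


def temporal_features(pir_series, fraction: float = 0.75) -> tuple:
    """(days to peak, days to first exceed fraction * peak) for one region.

    pir_series: daily PIR values of one region, index 0 = first day of the
        wave; NaN marks days on which the PIR is undefined.
    Assumes the peak PIR is positive.
    """
    x = np.asarray(pir_series, dtype=float)
    # Horizontal axis of Fig. 1: day of the peak PIR (NaN days ignored)
    peak_day = int(np.nanargmax(x))
    # Vertical axis of Fig. 1: first day the PIR exceeds the threshold;
    # comparisons with NaN are False, so undefined days never qualify
    exceeded = x > fraction * x[peak_day]
    first_day = int(np.argmax(exceeded))
    return peak_day, first_day
```

Since the peak day itself always exceeds 75% of a positive peak, the second feature can never be larger than the first, so all points in a plot such as Fig. 1 lie on or below the diagonal.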

In the fourth wave, only Okinawa has two peaks; its data are therefore divided into an earlier and a later peak (Okinawa1 and Okinawa2), as shown in Fig. 1. Hierarchical clustering yields four groups: (1) Ibaraki, Chiba, Tokyo, and Saitama; (2) Miyagi and Okinawa1; (3) Hyogo and Osaka; and (4) Okinawa2. Accordingly, infection in the fourth wave can be assumed to have progressed broadly in the order of these groups. The prefectures in the earliest group are geographically close to one another. In the map in Fig. 2, darker colours indicate larger PIR values. The Kanto area stood out in March 2021, Miyagi and Okinawa exhibited larger PIRs in April 2021, and Osaka, Hyogo, and Nara did so in May 2021. The PIR values during the fourth wave are shown in Fig. 3.

Fig. 1

The peak period of infection in the fourth wave for each prefecture. The vertical axis indicates the number of days from the start date of the wave until the PIR reached 75% of its peak value, which indicates when the infection had become widespread. The horizontal axis indicates the number of days from the start date of the wave until the peak value. Time flows from the bottom left to the top right.

Fig. 2

Choropleth map of the PIR in the fourth wave. The spatial spread of COVID-19 over time is plotted from March to June 2021.

Fig. 3

Regional trends in the fourth wave. The top eight prefectures in descending order of peak PIR in the fourth pandemic wave are plotted. The vertical axis indicates the PIR defined in Eq. (1); a PIR greater than 1.0 indicates a more active state of infection in the region than the national average. Note that only Okinawa has two peaks.

3.2 Sixth pandemic wave

In the sixth pandemic wave, Okinawa has the highest peak PIR, 22.63, which is more than four times the peak value in Yamaguchi, the prefecture with the second-highest peak in Japan. Okinawa had shown an upward trend since May 2022. In Osaka, the PIR declined slightly in May and thereafter remained largely unchanged above 1, that is, above the national level. As for the fourth wave, a scatterplot of the temporal features and a map of infection trends were produced for the sixth wave.

Comparing Fig. 1 with Fig. 4 shows that the range of the horizontal axis is shorter in Fig. 4. This implies that, in the sixth wave, both the peak and the date of reaching 75% of the peak were concentrated near the beginning of the wave, even though the sixth wave itself lasted longer. After some initial regional differences, the PIRs changed only gradually. This indicates that the infection spread at an almost constant level relative to the national trend.

The initial situation of the sixth wave also differs from that of the fourth wave: in the sixth wave, infection appears to have started simultaneously in four different areas, namely Okinawa, the Chugoku District, the Kinki District, and the Kanto District. The map shows that these four areas were affected early in the wave, followed by their neighbouring prefectures (Fig. 5).

Fig. 4

The infection peak in the sixth wave for each prefecture. The vertical axis indicates the number of days taken from the wave start date to reach 75% of the peak value. The horizontal axis indicates the number of days taken from the wave start date to reach the peak value

Fig. 5

Choropleth map of the prefectural infection ratio in the sixth wave. The spatial spread of COVID-19 over time is plotted from January to June 2022.

3.3 Discussion

For the fourth wave, we examine the sequence of infection at a more local scale. The order of infection outbreaks in the prefectures adjacent to Miyagi and in Chiba and Ibaraki shows that the outbreaks occurred in ascending order of latitude (Fig. 6). This suggests that, in the fourth wave, the infection propagated overland from the Kanto area northwards to Miyagi through neighbouring prefectures.
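The ordering argument can be made explicit with a small check. The sketch below is not the authors' procedure: `latitude_order` is a hypothetical helper that takes one timing feature per prefecture (for example, days to 75% of peak) and the latitude of each prefecture, tests whether onset order is strictly ascending in latitude, and reports Spearman's rank correlation as a swapped-in summary statistic. The inputs in the check below are toy values, not data from the study.

```python
import numpy as np
from scipy.stats import spearmanr


def latitude_order(onset_days, latitudes):
    """Check whether regions became active in ascending order of latitude.

    onset_days: one timing feature per region (e.g. days to 75 % of peak).
    latitudes: latitude of each region, in the same order.
    Returns (strictly ascending?, Spearman rank correlation).
    """
    onset = np.asarray(onset_days, dtype=float)
    lat = np.asarray(latitudes, dtype=float)
    # Latitudes listed in order of onset; strictly ascending means a clean
    # south-to-north sequence
    in_order = lat[np.argsort(onset, kind="stable")]
    ascending = bool(np.all(np.diff(in_order) > 0))
    # Rank correlation quantifies how closely the two orderings agree
    rho, _ = spearmanr(onset, lat)
    return ascending, float(rho)
```

A strict check is sensitive to a single out-of-order prefecture, whereas the rank correlation degrades gracefully, so reporting both gives a fuller picture.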

In contrast, the pandemic reached Okinawa earlier than Kagoshima, the prefecture nearest to Okinawa, and Kagoshima was affected late in the infection sequence within the Kyushu region. This indicates that the infection was unlikely to have been transmitted southwards from the Kyushu region to Okinawa, and equally unlikely to have been transmitted northwards from Okinawa to the Kyushu region. Because Okinawa is surrounded by sea, it can be reached from other prefectures only by boat or by plane. The journey from Kagoshima takes approximately one day by boat but 1 h 20 min by plane; Naha City, the capital of Okinawa Prefecture, is approximately 660 km from Kagoshima City, the capital of Kagoshima Prefecture. Most visitors are therefore likely to travel to Okinawa by plane. An air passenger movement survey conducted at Naha Airport showed that passengers arriving there came from airports in major cities such as Tokyo, Osaka, and Fukuoka [7]. This suggests that, when analysing the influence of infection on Okinawa, linkages with prefectures containing large cities should be considered rather than geographically closer prefectures. However, the markedly highest PIR in Okinawa in the early part of the sixth wave is likely due to other factors, because infection was less prevalent in the large cities linked to Okinawa by air (Fig. 7).

Fig. 6

The peak period of infection in the fourth wave for the prefectures around Miyagi. The vertical axis indicates the number of days taken from the start date of the wave to reach 75% of the peak value. The horizontal axis indicates the number of days taken from the start date of the wave to reach the peak value

Fig. 7

Number of passengers per day arriving at Naha Airport. The figures are for one weekday and one holiday in October or November 2021. The horizontal axis indicates the prefecture of departure; only prefectures with at least one passenger are shown.

4 Summary and perspectives

This study examined the regional characteristics of COVID-19 transmission during past pandemic waves in Japan. The prefectural infection ratio (PIR) was defined to identify, over time, the regions with active infection, and the resulting time-series data were analysed using agglomerative hierarchical clustering. This allowed us to determine the temporal order of the regions in which the infection was active. We also found that the infection trends in Okinawa differed from those in other regions.

In future work, population influx data will also be used to investigate the relationship between wide-area human flows and infection trends [8].