1 Introduction

Urbanization is a global trend in which city populations increase significantly more than those in rural areas. To grapple with urbanization, the local governments adjacent to metropolitan cities are developing unused land for housing and specific zones as new residential areas, attracting more and more urban dwellers. A redeveloped city therefore has a mixture of residents: those who were born and raised there and feel a strong sense of interest and responsibility toward it, and new residents who have less attachment to the city.

If local communities were constructed to encourage cooperation among residents, social efficiency might increase, and local societies could be revitalized by autonomously resolving such issues as social welfare for children and seniors, the coexistence of residential and commercial areas, and the distribution of public goods and shared resources [1,2,3]. Such effects produced by the cooperative behavior of residents are called social capital [4]. Communities where new and existing residents cooperate can be formed through interactions among residents in employment, education, medical care, sports, entertainment, festivals, and other activities.

The COVID-19 pandemic has created much confusion and difficulty among people who must interact with each other. Since early 2020, people in several countries, including Japan, have faced various levels of prohibitions or restrictions on going out and meeting others except to satisfy the basic necessities of life. Even now (April 2022), many people still avoid newcomers in their own communities because they might trigger the spread of the infection [5]. Such behavioral restrictions further alienate people who have different lifestyles.

To both improve social capital and combat infectious diseases, we must understand where people interact during an epidemic. As a case study, we observed the interactions between people in and around Tsukuba City, which was designated by the Japanese government in 1988 as a destination for migration to reduce overcrowding in Tokyo. Tsukuba City currently has a population of about 250,000. Between 2000 and 2018, its government developed the city’s west side, resulting in about 20,000 newcomers moving to that new development area. The local government is committed to promoting interaction between new and existing residents.

In this study, we define new residents and existing residents based on the average age of the houses in the residential areas and measure the degree of separation between new and existing residents at various locations and facilities during the COVID-19 pandemic using smartphone location information. Parks, food courts, department stores, and other locations with low levels of separation were identified even during the epidemic. Low-separation places connect different communities, such as new and existing residents. During the pandemic, the spread of infection across different communities may be prevented by ensuring sturdy infection control measures in such places. During the post-outbreak period, by paying the high cost of both reducing infection risks and promoting resident interaction in these locations, a city’s social capital can be improved while reducing the number of infected people in its communities.

Subsequent sections are organized as follows. The next section describes the spatial big dataset used in this study. Section 3 describes the methodology used to measure the diversity of daytime visitors in each area and shows that, even before the COVID-19 pandemic, significant separation existed between new and existing residents in various locations in and around Tsukuba City. Section 4 measures the impact of the pandemic on visitor diversity. Section 5 provides a summary.

2 Spatial Big Data

We used three types of spatial data sets in and around Tsukuba City, bounded by a rectangle from \(35.9479^{\circ }N\) to \(36.2354^{\circ }N\) and from \(139.996^{\circ }E\) to \(140.172^{\circ }E\). The first dataset is landmark information called Points of Interest (POI). The second dataset is the date, time, location, and attribute information of visitors to the public places. The third dataset is visitor residential area information, including average housing age, savings and income classes.

The landmark information was collected from OpenStreetMap (OSM) in December 2021 [6]. Table 1 shows the 18 types of landmarks we focused on and their numbers. OpenStreetMap contains the typical location coordinates of these landmarks. We divided Japan into \(250 \times 250\) square meter grids according to the Japanese Industrial Standards (JIS X0410 regional mesh) [7]. We signified the location of particular landmarks by flags in a grid. The maps in this paper were created with OpenStreetMap.
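As an illustration of how a coordinate can be assigned to a grid, the following minimal sketch computes a 10-digit quarter-mesh code (roughly 250 m cells). It assumes the usual regional mesh subdivision (40' x 1° first-level cells, split 8 x 8, then 10 x 10, then 2 x 2 twice, numbered 1 to 4 from southwest to northeast). The function name `quarter_mesh_code` is hypothetical, and the paper does not describe how its own gridding was implemented.

```python
def quarter_mesh_code(lat: float, lon: float) -> str:
    """Return a 10-digit quarter (approx. 250 m) regional mesh code.

    Assumption: standard subdivision of the regional mesh; this is an
    illustrative sketch, not the authors' implementation.
    """
    lat_sec = lat * 3600.0            # latitude in arc-seconds
    lon_sec = (lon - 100.0) * 3600.0  # longitude offset from 100 deg E

    # First level: 40 arc-minutes of latitude by 1 degree of longitude.
    p, lat_sec = divmod(lat_sec, 2400.0)
    u, lon_sec = divmod(lon_sec, 3600.0)
    # Second level: 8 x 8 split (5' x 7.5').
    q, lat_sec = divmod(lat_sec, 300.0)
    v, lon_sec = divmod(lon_sec, 450.0)
    # Third level: 10 x 10 split (30" x 45", roughly 1 km).
    r, lat_sec = divmod(lat_sec, 30.0)
    w, lon_sec = divmod(lon_sec, 45.0)
    # Half mesh (roughly 500 m): 2 x 2 split, 1=SW, 2=SE, 3=NW, 4=NE.
    hn, lat_sec = divmod(lat_sec, 15.0)
    he, lon_sec = divmod(lon_sec, 22.5)
    half = int(hn) * 2 + int(he) + 1
    # Quarter mesh (roughly 250 m): another 2 x 2 split.
    qn = lat_sec // 7.5
    qe = lon_sec // 11.25
    quarter = int(qn) * 2 + int(qe) + 1

    return (f"{int(p):02d}{int(u):02d}{int(q)}{int(v)}"
            f"{int(r)}{int(w)}{half}{quarter}")


# Example: a point inside the study rectangle (coordinates chosen for illustration).
code = quarter_mesh_code(36.0825, 140.1117)
```

Dropping the last digit of such a code gives the enclosing half-mesh (roughly 500 m) cell, which is convenient when visit locations and residence areas are recorded at different resolutions.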

Table 1 Number of landmarks in and around Tsukuba City: Second and third columns represent Key and Value in OpenStreetMap

The date, time, location, and attribute information of visitors to public places in the Tsukuba City area were obtained from KDDI Location Data, an anonymously processed au smartphone location dataset of millions of Japanese people who allowed their data to be used [8]. au is the cell phone service brand of KDDI (one of Japan’s leading telecommunications operators), and its users show no regional bias. The dataset contains the dates and times on which the location information was recorded, the location information coarse-grained on a \(250 \times 250\) square meter grid, the time spent at that location, the estimated residence coordinates coarse-grained on a \(500 \times 500\) square meter grid [7], and anonymized age and gender. The home area of each individual at the \(500 \times 500\) square meter grid level was inferred as their most common location between 7 p.m. and 3 a.m. We used 231,736,464 position coordinates from April 1 to September 30, 2019 and 395,591,820 position coordinates from April 1 to September 30, 2020 of the people who visited Tsukuba City and its surroundings.
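The home-area rule described above can be sketched as follows. This is a minimal illustration that assumes each device's records are available as (timestamp, grid identifier) pairs and that "most common" means the highest record count; the function name `infer_home_grid` and the input layout are hypothetical and not taken from the dataset's documentation.

```python
from collections import Counter
from datetime import datetime


def infer_home_grid(records):
    """Return the most frequent grid among night-time records, or None.

    records: iterable of (timestamp: datetime, grid_id: str) for one device.
    Night-time is taken as 19:00 up to (but not including) 03:00.
    Assumption: each record counts equally (no weighting by dwell time).
    """
    night = Counter(
        grid for ts, grid in records
        if ts.hour >= 19 or ts.hour < 3
    )
    if not night:
        return None  # no night-time observations for this device
    return night.most_common(1)[0][0]


# Toy records for one device (grid identifiers are made up).
sample = [
    (datetime(2019, 4, 1, 20, 15), "home_grid"),
    (datetime(2019, 4, 2, 1, 30), "home_grid"),
    (datetime(2019, 4, 2, 12, 0), "office_grid"),
    (datetime(2019, 4, 2, 21, 45), "restaurant_grid"),
]
home = infer_home_grid(sample)
```

Weighting each record by the time spent at the location would be an equally plausible reading of the rule; the count-based version is used here only because it is the simplest.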

The residential area information was identified from a housing information dataset on 17.8 million real estate properties in Japan registered between January 2015 and December 2019 with the National Real Estate Information Network operated by At Home Co., Ltd. (used by over 58,000 real estate agencies as of December 1, 2020) [9]. The dataset contains 41,590 residences in and around Tsukuba City. We calculated, for each \(500 \times 500\) square meter grid, the average age of the homes on that grid as of January 2019.

We estimated the average level of savings and average annual income of households in each \(500 \times 500\) square meter grid using the 2013 estimated savings class and income class dataset constructed by Zenrin Marketing Solutions Co., Ltd. [10, 11]. This dataset contains the probability distributions of savings and the annual incomes in each grid. We calculated the mean of each from probability distributions.

3 Methods for Measuring Spatial Separation

We investigated the diversity of the visitors to each \(250 \times 250\) square meter grid between 9 a.m. and 5 p.m. by applying the spatial separation measure proposed by E. Moro et al. [12]. Residential location is a strong reflection of socioeconomic status (SES). We attribute to each smartphone that provided location information the owner’s age and gender, the average home age, the average household savings, and the average annual household income in the owner’s residential area. For each attribute (except gender), we grouped visitors into four equally sized quartile groups using the first quartile (Q1), the median, and the third quartile (Q3) of that attribute as cut points.

We extracted every visit that an individual made to a given place and that lasted for more than 10 minutes. To measure the spatial separation of attribute i at each place \(\alpha \) in and around the city, we computed proportion \(\tau ^{i}_{q \alpha }\) of the total time spent at that place by each quartile group q. Spatial separation \(S^{i}_{\alpha }\) is defined as a measure between 0 and 1:

$$\begin{aligned} S_{\alpha }^{i} = \frac{2}{3} \sum _{q} \left| \tau ^{i}_{q \alpha } - \frac{1}{4}\right| . \end{aligned} \tag{1}$$

\(S^{i}_{\alpha }=0\) means that the total time across all visitors spent at place \(\alpha \) is split evenly among the four quartile groups. In other words, the place is fully integrated with respect to visitor attribute i. By contrast, a place with \(S^{i}_{\alpha }=1\) is one that was visited exclusively by a single quartile group, so it has the highest level of separation for visitor attribute i.
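The quartile grouping and Eq. (1) can be illustrated with a short sketch. It assumes visits to one place are given as (attribute value, minutes) pairs; the helper names `quartile_group` and `separation`, the handling of values that fall exactly on a cut point, and the toy data are assumptions made for this example only.

```python
from bisect import bisect_right


def quartile_group(value, q1, median, q3):
    """Map an attribute value to a quartile group 0..3.

    Assumption: a value equal to a cut point goes to the upper group.
    """
    return bisect_right([q1, median, q3], value)


def separation(time_by_group):
    """Eq. (1): S = (2/3) * sum_q |tau_q - 1/4| for four quartile groups."""
    total = sum(time_by_group)
    if total <= 0:
        raise ValueError("no time recorded at this place")
    return (2.0 / 3.0) * sum(abs(t / total - 0.25) for t in time_by_group)


# Toy visits to one place: (average home age in years, minutes stayed).
visits = [(12.0, 45), (18.0, 30), (21.0, 8), (25.0, 60), (14.0, 20)]

time_by_group = [0.0, 0.0, 0.0, 0.0]
for home_age, minutes in visits:
    if minutes > 10:  # keep only visits longer than 10 minutes
        g = quartile_group(home_age, q1=15.0, median=19.6, q3=23.1)
        time_by_group[g] += minutes

s_place = separation(time_by_group)
```

The factor 2/3 normalizes the measure: when a single group accounts for all of the time, the sum of absolute deviations is 3/4 + 3 x 1/4 = 3/2, so S equals 1.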

For gender, we first standardized the total time spent by each gender q at place \(\alpha \) by the total time spent by that gender at all places. We then computed proportion \(\tau ^{i}_{q \alpha }\) of the total standardized time spent at place \(\alpha \) by people of gender q. Spatial separation \(S^{i}_{\alpha }\) of men and women is defined as a measure between 0 and 1:

$$\begin{aligned} S_{\alpha }^{i=gender} = \sum _{q} \left| \tau ^{i=gender}_{q \alpha } - \frac{1}{2}\right| . \end{aligned} \tag{2}$$
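A minimal sketch of the gender measure in Eq. (2), including the standardization step, is given below. The dictionary layout and the function name `gender_separation` are assumptions for illustration.

```python
def gender_separation(time_at_place, time_everywhere):
    """Eq. (2): S = sum_q |tau_q - 1/2| over the two genders.

    time_at_place:   dict gender -> total minutes spent at this place
    time_everywhere: dict gender -> total minutes spent at all places
    Each gender's time at the place is first divided by that gender's
    total time over all places, so that an imbalance in the overall
    sample does not by itself appear as separation.
    """
    standardized = {
        g: time_at_place[g] / time_everywhere[g] for g in time_at_place
    }
    total = sum(standardized.values())
    if total <= 0:
        raise ValueError("no time recorded at this place")
    return sum(abs(s / total - 0.5) for s in standardized.values())


# Women contribute twice as much time overall and twice as much here,
# so after standardization the place is perfectly balanced.
balanced = gender_separation(
    {"male": 10.0, "female": 20.0},
    {"male": 100.0, "female": 200.0},
)
# A place visited by one gender only.
exclusive = gender_separation(
    {"male": 10.0, "female": 0.0},
    {"male": 100.0, "female": 200.0},
)
```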

The statistical significance of spatial separation \(S^{i}_{\alpha }\) can be confirmed using the null hypothesis that people choose where to visit independently of their attributes. By randomly shuffling the visitors in a given dataset and preserving the total time spent at each place, we simulated a society that follows the null hypothesis.
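One way to realize such a shuffled society is sketched below. It assumes that "shuffling the visitors" means permuting the quartile-group labels across all visits city-wide while keeping each visit's place and duration, which leaves the total time at each place unchanged. This interpretation, the function names, and the toy data are assumptions, not details given in the text.

```python
import random


def separation(time_by_group):
    """Eq. (1) for four quartile groups."""
    total = sum(time_by_group)
    return (2.0 / 3.0) * sum(abs(t / total - 0.25) for t in time_by_group)


def shuffled_separation(visits, seed=0):
    """Separation per place after shuffling group labels across visits.

    visits: list of (place, group in 0..3, minutes).
    Each visit keeps its place and duration; only the attribute group
    is reassigned at random, so per-place total time is preserved.
    """
    rng = random.Random(seed)
    groups = [g for _, g, _ in visits]
    rng.shuffle(groups)

    totals = {}
    for (place, _, minutes), g in zip(visits, groups):
        totals.setdefault(place, [0.0] * 4)[g] += minutes
    return {place: separation(t) for place, t in totals.items()}


# Toy data: place "A" is visited only by group 0, place "B" only by group 3.
visits = [("A", 0, 30.0)] * 20 + [("B", 3, 30.0)] * 20
null_s = shuffled_separation(visits, seed=42)
```

Repeating the shuffle with different seeds gives a null distribution of S for each place, against which the observed value can be compared.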

Figure 1 shows the spatial distribution of the average housing age in and around Tsukuba City. Newer houses are clustered on the west side, indicating where new residents are living. We found a statistically significant spatial separation of new and existing residents among visitors to each place on weekends from April to September 2019. The housing age quartiles of these visitors’ residences are \(Q1=15\) years, median\(=19.6\) years, and \(Q3=23.1\) years. Fig. 2(a) shows the spatial distribution of separation \(S^{i}_{\alpha }\) for new and existing residents calculated using these quartiles. Fig. 2(b) shows the spatial distribution of the separation under the null hypothesis with randomly shuffled visitors. Notice that Fig. 2(a) has almost no cyan-colored places, which represent high diversity (low separation), compared to Fig. 2(b). Fig. 3 is the cumulative distribution of separation \(S^{i}_{\alpha }\). Under the null hypothesis for the 2019 weekends, the probability that \(S^{i}_{\alpha } > 0.09\) is \(1\%\). In the actual data, however, \(98.2\%\) of all the \(250 \times 250\) square meter grids have separation with \(S^{i}_{\alpha } > 0.09\), which is statistically significant with a p-value \(< 1\%\).

Fig. 1 Spatial distribution of housing ages: x- and y-axes represent east longitude and north latitude. Each plot represents a \(500 \times 500\) square meter grid

Fig. 2 Daytime spatial separation of new and existing residents on weekends in 2019. Each plot represents a \(250 \times 250\) square meter grid: (a) real data and (b) city-wide shuffled visits

Fig. 3 Cumulative distribution function of daytime spatial separation of new and existing residents: Black, red, and blue respectively represent the CDF in the normal period (April 1–Sep. 30, 2019), the outbreak period (April 1–June 18, 2020), and the post-outbreak period (June 19–Sep. 30, 2020). Solid and dashed lines represent the CDF for actual data and city-wide shuffled visits: (a) weekends and (b) weekdays (color figure online)

4 Spatial Separation Change due to Outbreak of COVID-19

4.1 Spatial Distribution of Separation

In Japan, the first COVID-19 infection was confirmed on January 15, 2020, and the number of new patients increased continuously. On April 7, 2020, the Japanese government declared a state of emergency in the greater Tokyo area, which includes Tsukuba City. The emergency declaration, which strongly urged citizens to stay home and avoid going out except for urgent or necessary reasons, was lifted on May 25, although the government continued to request that people avoid traveling across prefectural borders until June 18. Mizuno et al. used cell phone location information to measure, for each city in Japan, the percentage change in the number of people who left their homes relative to January 2020 [13, 14]. Tsukuba City recorded a decrease of between \(20\%\) and \(60\%\) through June 18; from then until the end of 2020, the decrease remained below \(20\%\). We measured the spatial separation for three periods: the normal period (April 1 to September 30, 2019), the outbreak period (April 1 to June 18, 2020), and the post-outbreak period (June 19 to September 30, 2020).

Figure 3 shows the cumulative distribution of the separation of new and existing residents. Separation significantly expanded on the weekends in the outbreak period. On weekdays, however, its expansion was limited. In other words, during the outbreak period, people avoided places frequented by people with different attributes on weekends, when their visits were flexible. In the post-outbreak period, separation returned to its normal period levels, although the number of visitors to various places had not recovered.

4.2 Separation for Various Attributes

We also calculated spatial separation by visitors’ residential savings and income class and their age and gender and present the mean values for each in Table 2. As with the spatial separation of new and existing residents, separation was significantly greater on weekends during the outbreak period. On weekdays, the increase in separation was limited. During the post-outbreak period, separation returned to normal period levels for all the given visitor attributes.

People not only stayed home but also reduced the distance they covered when they went out, presumably to prevent the spread of COVID-19. As shown in Table 2, the average weekend outing distance shrank to 5.5 km during the outbreak period, compared to 8.8 km during the normal period. In the post-outbreak period, the outing distance was 6.6 km; unlike separation, it had not recovered to normal period levels.

Since residences reflect socioeconomic status, spatial separation tends to increase as the outing distance from home decreases, although the correlation between distance and separation is weak. Fig. 4 shows the relationship between the average distance to a visitor’s home in each \(250 \times 250\) square meter grid and the separation of new and existing residents. Although the regression coefficient is not zero, the variance of separation conditioned on outing distance is very large. Fig. 5 shows the results of a Mann-Whitney U test to determine whether there is a difference in the mean value of separation for places with equal outing distances between the normal and outbreak periods. Separation is statistically significantly higher during the outbreak period than during the normal period at places where the outing distance exceeds 6 km (p value \(< 0.05\)). This result suggests that the increase in separation during the outbreak period was not solely caused by residents shortening the distances of their outings.
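The distance-conditioned comparison can be sketched as follows. This is a minimal pure-Python illustration that bins places by average outing distance and applies a two-sided Mann-Whitney U test (normal approximation with tie correction, no continuity correction) within each bin. The 1 km bin width, the function names, and the toy numbers are assumptions; the text does not state how places with "equal outing distances" were matched.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic of x, p value). Ties receive average ranks.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2.0))


def compare_by_distance(normal, outbreak, bin_km=1.0):
    """Test outbreak vs. normal separation within each distance bin.

    normal, outbreak: lists of (average distance to home in km, S).
    Returns {bin index: (U, p value)} for bins present in both periods.
    """
    bins = {}
    for label, data in (("normal", normal), ("outbreak", outbreak)):
        for dist, s in data:
            key = int(dist // bin_km)
            bins.setdefault(key, {"normal": [], "outbreak": []})[label].append(s)
    return {
        key: mann_whitney_u(g["outbreak"], g["normal"])
        for key, g in sorted(bins.items())
        if g["normal"] and g["outbreak"]
    }


# Toy data: places at 6 to 7 km with clearly higher S during the outbreak.
normal = [(6.2, s) for s in (0.30, 0.32, 0.35, 0.37, 0.40)]
outbreak = [(6.5, s) for s in (0.50, 0.52, 0.55, 0.57, 0.60)]
result = compare_by_distance(normal, outbreak)
```

For small samples per bin, an exact test would be preferable to the normal approximation used in this sketch.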

There are correlations between visitor attributes. For example, younger households live in areas with younger housing ages. Therefore, to determine which attributes are the main factors fueling separation among residents, all the attributes must be held equal except for the one being analyzed. Since the smartphone location dataset used in this study [8] does not contain a large enough sample size to control for all the given attributes, we analyzed the separation of the new and existing residents while controlling only for the age of visitors. Increasing the sample size and clarifying the contribution of each attribute to separation is a future task. As shown in Table 3, we also observed the separation of new and existing residents by age group and noticed an increase on weekends during the outbreak period for all ages. This means that at least some separation is independent of age.

Table 2 Mean of daytime spatial separation for various attributes: normal period is from April 1 to Sep. 30, 2019

Fig. 4 Relationship between average distance to visitor’s home and separation of new and existing residents: Dashed lines are regression lines: (a) normal period: weekends from April 1 to Sep. 30, 2019 and (b) outbreak period: weekends from April 1 to June 18, 2020

Fig. 5 Mann-Whitney U test for differences in separation of new and existing residents between normal and outbreak periods on weekends. X-axis represents average distance from visitor’s home. Red dots indicate a higher average value of separation during outbreak period than during normal period. Black dots show a higher average value in normal period, although not statistically significant (p value \(> 0.05\)) (color figure online)

Table 3 Separation of new and existing residents by age on weekends

4.3 Segregation by Landmarks

We investigated the diversity of visitors to each of the landmarked grids in Table 1 using separation measures. Figure 6 shows the relationship between the average distance to home for visitors and the separation of new and existing residents at each landmark. We identified similar trends to this figure in separation in the savings, income, age, and gender attributes. The dashed lines are the regression lines between the distance and separation calculated from all the grids as in Fig. 4. Grids with landmarks tend to have lower separation than the regression line, which represents the overall average trend. In particular, food courts, department stores, and theaters have low separation because such facilities are used by a variety of people.

During the outbreak period, we identified a decrease in the number of distant visitors and an increase in the separation at various landmarks. Parks show an exceptional decrease in separation, suggesting that they were used by a diverse population during the outbreak period. Even after the COVID-19 outbreak began, lower separation was observed in places with food courts and department stores than in other landmark locations.

Fig. 6 Relationship between average distance to home for visitors and separation of new and existing residents at each landmark: Dashed lines are regression lines between the distance and separation calculated from all grids in and around Tsukuba City, as in Fig. 4. Node color and size represent number of people who visited places where landmarks are located during daytimes for a given period: (a) normal period: weekends from April 1 to Sep. 30, 2019 and (b) outbreak period: weekends from April 1 to June 18, 2020

5 Conclusion

By distinguishing between new and existing residents based on the average age of the houses in their residential areas, we measured the degree of separation between them at various locations and facilities in the Tsukuba City area during the daytime based on smartphone location information. We also investigated separation by visitors’ residential savings and income class and their age and gender in each location.

Separation was observed in almost all the public places in Tsukuba City, even before the COVID-19 outbreak. During the outbreak, many public places and facilities were visited by fewer people, and yet their separation increased. On the other hand, separation lessened in parks, increasing opportunities for residents to interact. Even after the outbreak began, lower separation environments remained in places where food courts and department stores were located compared to other places. In the post-outbreak period, separation returned to its normal level.

In the post-outbreak period, we observed a slower recovery in outing distance relative to spatial separation, suggesting that people diversified their outings around their residences. Obtaining direct evidence of such outing diversification by analyzing each trip history is left for future work. Since there is a correlation between visitor attributes, we must also clarify which attribute is the main factor that generates separation in order to improve social capital.