Impacts of Social and Economic Factors on the Transmission of Coronavirus Disease 2019 (COVID-19) in China

This paper examines the role of various socioeconomic factors in mediating the local and cross-city transmissions of the novel coronavirus 2019 (COVID-19) in China. We implement a machine learning approach to select instrumental variables that strongly predict virus transmission among the rich exogenous weather characteristics. Our 2SLS estimates show that the stringent quarantine, massive lockdown and other public health measures imposed in late January significantly reduced the transmission rate of COVID-19. By early February, the virus spread had been contained. While many socioeconomic factors mediate the virus spread, a robust government response since late January played a determinant role in the containment of the virus. We also demonstrate that the actual population flow from the outbreak source poses a higher risk to the destination than other factors such as geographic proximity and similarity in economic conditions. The results have rich implications for ongoing global efforts in containment of COVID-19.


Introduction
Several clusters of patients with pneumonia of unknown cause were reported in late December 2019 in Wuhan, the capital city of Hubei Province, China. It was later identified to be caused by a new coronavirus (severe acute respiratory syndrome coronavirus 2, or SARS-CoV-2) (Zhu et al., 2020) and the disease is named by the World Health Organization (WHO) as Coronavirus Disease 2019 (COVID-19) 1 . Since mid January 2020, it has rapidly spread throughout China and other countries.
The first confirmed case outside Wuhan in China was reported on January 19 (Shenzhen) . Similar to SARS-CoV and MERS-CoV, the COVID-19 can be transmitted from person to person. As of January 30, a total of 9976 cases had been reported in at least 21 countries 2 . Early detection of COVID-19 importation and prevention of onward transmission are crucial to all areas at risk of importation from areas with active transmissions (Gilbert et al., 2020). To contain the virus, Wuhan was placed under lockdown with residents not allowed to leave the city on January 23. On the same or the next day, most public transportation stopped in Wuhan and other cities in Hubei province. In addition, most cities in Hubei province had adopted blockade measures for Class A infectious diseases by January 28 3 . Residents in those areas are highly encouraged to stay at home and not to attend any gathering activity. Besides, at least 207 cities in China have adopted similar though softer measures by February 12. People are frequently reminded by media and community workers to maintain safe social distance. Furthermore, many cities require that recent visitors to high COVID-19 risk areas quarantine themselves at home or in designated facilities for 14 days. Residents who have symptoms of fever or dry cough are required to report the situation to community, and are quarantined and treated in special medical facilities. Governments also trace and isolate contacts of those patients.
As multiple nations have implemented mandatary quarantine or even massive lockdown, such as South Korea, Italy and China, and more countries are expected to join as the coronavirus outbreak becomes a global pandemic, examining the influencing factors of the transmission of COVID-19 and the effectiveness of the large-scale quarantine measures in China not only adds to our understanding of the containment of COVID-19 but also provides insights into future prevention work against similar infectious diseases. In this paper, we estimate how the number of confirmed COVID-19 cases in a city is influenced by the number of new COVID-19 cases in the same city, other cities, and Wuhan in the preceding first and second weeks, respectively. China's large scale quarantine policy takes effect in late January. We also test whether this policy succeeds in delaying the spread of COVID-19 using a 14-day rolling window analysis between January 19 and February 15.
For infectious diseases, the number of infected people usually increases first before reaching a peak and then drops. This pattern implies that for a linear equation of new cases on the number of cases in the past, the unobserved determinants of new infections are serially correlated. The unobserved determinants of new cases may be serially correlated, also because they measure persistent factors, such as people's habit and government policy. Serial correlations in errors give rise to correlations between the lagged number of cases and the error term, and ordinary least square (OLS) estimator may be biased. Combining insights in Adda (2016) and the existing knowledge of the incubation period of COVID-19, we construct instrumental variables for the number of new COVID-19 cases in the preceding two weeks. Weather characteristics in the previous third and fourth weeks do not directly affect today's number of new COVID-19 cases after controlling for the number of new COVID-19 cases and weather conditions in the preceding first and second weeks. Therefore, our estimated impacts have causal interpretations. We use the Lasso method to select instrumental variables among eleven weather characteristics that have the highest predictive power for the average number of new confirmed cases during each of the past two weeks. Furthermore, we examine the moderator effects of socioeconomic factors on the transmission of COVID-19 in China, which include population flow out of Wuhan, the distance between cities, GDP per capita, number of doctors, and contemporaneous weather characteristics. We focus on the effect of population flows from the origin of the COVID-19 outbreak, because data on real time travel between cities have recently become available and we examine whether it can explain the disease spread, and that Wuhan is a major city and a transportation hub with significant population movements.
We find that the stringent quarantine and public health measures imposed in late January significantly decreased the transmission rate of COVID-19. By early February, the spread of the virus had been contained in China. While many socioeconomic factors moderate the spread of the virus, the actual population flow from the source poses a higher risk to the destination than other factors such as geographic proximity and similarity in economic conditions. Our analysis contributes to the existing literature in two aspects. First, our analysis is connected to the economics and epidemiologic literature on the influencing factors of and ways of preventing the spread of infectious diseases. Existing studies find that reductions in interpersonal contact from holiday school closings (Adda, 2016), reactive school closure (Litvinova et al., 2019), public transportation strikes (Godzinski and Suarez Castillo, 2019), strategic targeting of travelers from high-incidence locations (Milusheva, 2017), and paid sick leave to keep contagious workers at home (Barmby andLarguem, 2009, Pichler andZiebarth, 2017) can reduce the prevalence of influenza.
Additionally, literature find that epidemics spread faster during economic booms (Adda, 2016), increases in employment are associated with increased incidence of influenza (Markowitz et al., 2010), and growth in trade can significantly increase the spread of influenza (Adda, 2016) and HIV (Oster, 2012). Vaccination (Maurer, 2009, White, 2019 and sunlight exposure (Slusky and Zeckhauser, 2018) are also found effective in reducing the spread of influenza. In the case of COVID-19, the quarantine policy takes effect throughout China within few days in late January.
Thus, we directly compare the transmission effects in the pre and post February 1 subsamples to examine its effectiveness in preventing the transmission of COVID-19. Interestingly, within one week, population flow from Wuhan significantly increased the spread of virus in the early subsample, but decreased its transmission rate in the later subsample, suggesting that people have 3 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted March 17, 2020. . https://doi.org/10.1101/2020 been taking more cautious measures from high COVID-19 risk areas. Besides, following the previous literature on the moderator effect of economic development and environmental characteristics on virus transmission, our analysis further reveal that COVID-19 transmission is positively moderated by GDP per capita and negatively moderated by the number of doctors, while the effects of the environmental factors and population density are mixed.
Second, our paper complements the epidemiologic studies on the basic reproduction number of COVID-19. Using data on the first 425 COVID-19 patients by January 22, Li et al. (2020) estimate a basic reproduction number of 2.2. Based on time-series data on the number of COVID-19 cases in mainland China from January 10 to January 24, Zhao et al. (2020) estimate that the mean reproduction number ranges from 2.24 to 3.58. Given the short time since the beginning of the COVID-19 outbreak, more research in this area is required to assess the dynamics of transmission and its implications on how the COVID-19 outbreak will evolve McGoogan, 2020, Wu et al., 2020b). Several recent studies have examined the effect of population movement on the spread of COVID-19 (Zhan et al., 2020. Our analysis relies on spatially disaggregated data in a longer period, and the instrumental variable approach we use imposes fewer restrictions on the relationship between the unobserved determinants of new cases and the number of cases in the past. Using exogenous temperature, wind speed and precipitation in the preceding third and fourth weeks as the instruments, we find that in the January 19 -February 1 subsample, one new infection leads to 1.465 more cases within one week and the effect on the second week is not definitive.
While this causal estimate is slightly lower than the above epidemiologic studies, the rolling window analysis shows that the infection rate increases first in late January, then stabilizes and drops. Our paper also adds to the epidemiologic studies on effective ways of containing COVID-19. Using a stochastic transmission model that is parameterized to the COVID-19 outbreak, Hellewell et al.
(2020) find that highly effective contact tracing and case isolation is enough to control a new outbreak of COVID-19 within three months in most scenarios. Our empirical estimation based on real numbers of newly confirmed COVID-19 cases suggests that one new infection case can decrease the number of new cases by 0.097 (0.438) in the following first (second) week in the February 2-15 subsample, corroborating the effectiveness of China's large-scale quarantine policy. This paper is organized as follows. Section 2 introduces the empirical model. Section 3 discusses our data and the construction of key variables. The results are presented in Section 4. Section 5 concludes.

Empirical Model
Our analysis sample includes 303 prefecture-level cities in China. We exclude Wuhan from our analysis because the epidemic patterns in Wuhan are significantly different from those in other cities. Some confirmed cases in Wuhan contracted the virus through exposure to Huanan Seafood Wholesale Market, which is the most probable origin of the virus 4 . In other cities, infections arise 4 Li et al. (2020) document the exposure history of the first 425 cases. It is suspected that the initial cases were linked to the Huanan Seafood Wholesale Market in Wuhan. from human to human transmissions. The health care system in Wuhan faced the challenge of previously unknown virus infections in early January and became overwhelmed as the number of new cases increased exponentially from mid-January. These factors may cause severe delay and measurement error issues in the number of cases reported in Wuhan. We alleviate the measurement error problem using variations in the infections caused by exogenous changes in weather conditions which is discussed in detail in Section 3.2.
To model the spread of the virus, we consider simultaneously within city spread and between city transmissions, as in Adda (2016). The baseline model is where c is a city other than Wuhan, y ct is the number of new confirmed cases of COVID-19 in city c and date t. τ denotes the preceding first or second week. d cr is the distance between cities c and r in log, and r =c d −1 crȳ τ rt is the inverse distance weighted average of new infections in other cities. We include lagged y ct up to 14 days based on the estimates of the durations of the infectious period and the incubation period in the literature 5 . We take average of lagged y ct by week, asȳ τ ct = 1 7 7 s=1 y ct−7(τ −1)−s andz τ t = 1 7 7 s=1 z t−7(τ −1)−s . In addition, a notable feature of the COVID-19 epidemic is that it was originated in one city (Wuhan) and most of the early cases outside Wuhan can be traced to previous contacts with persons in Wuhan. To model how the virus spreads to other cities from its source, we include the number of new confirmed cases in Wuhan in the model (z τ t ). x ct includes weather controls, city, and day fixed effects 6 . ct is the error term. Standard errors are clustered within provinces. In a simplified model, we consider only within city transmissions, There are several reasons thatȳ τ ct ,ȳ τ rt andz τ t may be correlated with the error term ct . The unobserved determinants of new infections such as local residents' and government's preparedness are likely correlated over time, which causes correlations between the error term and the lagged dependent variables. Therefore, the coefficients are estimated by two-stage least squares in order to obtain consistent estimates. For the y ct equation (2), the instrumental variables include average temperature, maximum wind speed and precipitation for city c for the past three and four weeks.
The selection of weather variables as instruments is discussed in detail in Section 3.2. The timeline of key variables are displayed in Appendix Figure A.1. The primary assumption on the instrumental 5 The 2019-nCoV epidemic is still ongoing and the estimates are revised from time to time in the literature as new data become available. The current estimates include the following. The incubation period is estimated to be between 2 and 10 days (World Health Organization, 2020b), 5.2 days ), or 3 days (median, Guan et al. (2020)). The average infectious period is estimated to be 1.4 days (Wu et al., 2020a). 6 On February 12, cities in Hubei province include clinically diagnosed cases in the confirmed cases, in addition to cases that are confirmed by nucleic acid tests, which results in a sharp increase in the number of confirmed cases for cities in Hubei on February 12. The common effect on other cities is controlled for by the day fixed effect.

5
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 17, 2020. . https://doi.org/10.1101/2020 variables is that weather conditions before two weeks do not affect the likelihood that a person susceptible to the virus contracts the disease, conditional on weather conditions within two weeks.
On the other hand, they affect the number of other persons who have become infectious within the two-week window, because they may have contracted the virus earlier than two weeks. These weather variables are exogenous to the error term and affect the spread of the virus, and have been used by Adda (2016) to instrument flu infections 7 .
A main objective of this paper is to quantify the effect of various social and economic factors in mediating the transmission rates of the virus, which may help identify potential behavioral and socioeconomic risk factors for infections. For within city transmissions, we consider the mediating effect of population density, degree of economic development, number of doctors, and environmental factors such as temperature, wind and rain. To measure the spread of the virus from Wuhan, we include an estimate of the number of people traveling from Wuhan. The empirical model including these mediating variables is as follows, whereh kτ ct ,m kτ crt andm kτ c,Wuhan,t are the mediating factor for within city transmissions, between city transmissions and imported cases from Wuhan.

Variables
January 19, 2020 is the first day that COVID-19 cases were reported outside of Wuhan, so we collect the daily number of new cases of COVID-19 for 303 cities from January 19 to February 15.
All these data are reported by 32 provincial Health Commissions in China 8 . Figure 1 shows the time patterns of daily confirmed new cases in Wuhan, in Hubei province outside Wuhan, and in mainland China but outside Hubei province.
For the explanatory variables, we calculate the number of new cases of COVID-19 in the preceding first and second weeks for each city on each day. To estimate the impacts of new COVID-19 cases in other cities on a city's own COVID-19 new case, we first calculate the geographic distance between a city and all other cities using the latitudes and longitudes of the centroids of each city, and then calculate the weighted sum of the number of COVID-19 new cases in all other cities using the inverse of log distance between a city and each of the other cities as the weight.
7 Flu viruses are easier to survive in cold weather. Adverse weather conditions also limit outdoor activities which can decrease the chance of contracting the virus. For details, see Adda (2016) and Section 3.2.
8 Hong Kong and Macao are excluded from our analysis due to the lack of some socioeconomic variables.
6 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 17, 2020. .

Figure 1: Number of Daily New Confirmed Cases of COVID-19 in Mainland China
Since the COVID-19 outbreak starts from Wuhan, we also calculate the weighted number of COVID-19 new cases in Wuhan using the inverse of log distance as the weight. Furthermore, to explore the mediating impact of population flow from Wuhan, we collect the daily population flow index from Baidu that proxies for the total intensity of migration from Wuhan to other cities 9 .  Table 1 presents the summary statistics of these variables.
We rely on meteorological data to construct instrumental variables for the endogenous variables.
The National Oceanic and Atmospheric Administration (NOAA) provides average, maximum and minimum temperatures, air pressure, average and maximum wind speeds, precipitation, snowfall amount, and dew point for 362 weather stations at the daily level in China. To merge the meteorological variables with the number of new cases of COVID-19, we first calculate daily weather variables for each city on each day from 2019 December to 2020 February from station-level weather records following the inverse-distance weighting method. Specifically, for each city, we draw a circle of 100 km from the city's centroid and calculate the weighted average daily weather variables using 9 Baidu Migration, qianxi.baidu.com.

7
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 17, 2020. . https://doi.org/10.1101/2020.03.13.20035238 doi: medRxiv preprint CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 17, 2020. . 9 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 17, 2020. . https://doi.org/10. 1101/2020 stations within the 100 km circle 10 . We use the inverse of the distance between the city's centroid and each station as the weight. Second, we match the daily weather variables to the number of new cases of COVID-19 based on city name and date.

Selection of Instrumental Variables
The transmission rate of COVID-19 may be affected by many environmental factors. Humanto-human transmission of COVID-19 is mostly through droplets and contacts (National Health Commission of the PRC, 2020). Weather conditions such as rainfall, wind speed, and temperatures may play a key role in shaping infections via influencing social activities and virus transmissions.
For instance, more precipitation increases humidity that may weaken virus transmissions (Lowen and Steel, 2014). Both more rainfall and lower temperature may reduce social activities. Lower temperatures may also enhance the infectiousness of the virus in the environment . Higher wind speed and therefore ventilated air may disturb virus transmissions. Current new cases arise from contracting the virus in the past, and for the case of COVID-19, typically within two weeks. The extent of human-to-human transmission is determined by the number of people who already contracted the virus and the environmental conditions within two weeks. Conditional on the number of confirmed cases and environmental conditions within two weeks, it is plausible that weather conditions further in the past should not affect current new cases.
There are many potentially relevant weather characteristics, including daily average, maximum and minimum temperature, average and maximum wind speed, precipitation, air pressure, visibility, dew point, snow depth, and the presence of extreme weather. We use Lasso method to select environmental variables that have the highest predictive power for the average number of daily new confirmed cases during each of the past two weeks, with provincial and day fixed effects already included.The penalty parameters are given by the plugin method of Belloni et al. (2014). There are four variables selected, including average maximum wind speed and average precipitation during the past third and fourth weeks.We also include average temperature during the past third and fourth weeks because temperature is known to affect the transmissions of flu virus and is conjectured to affect COVID-19 as well . We then regress the endogenous variables on the one to four week lags of the instrumental variables, city, and date fixed effects. Table 2 shows that F-tests on the coefficients of the three and four week lags of the instrumental variables all strongly reject joint insignificance, which confirms that the selected instrumental variables are not weak.
The coefficients of the first stage regressions are reported in Table B.1 in the appendix.
10 The 100km circle is consistent with the existing literature. Most studies on the socioeconomic impacts of climate change have found that estimation results are insensitive to the choice of the cutoff distance (Zhang et al., 2017). We also used other cutoff distances (e.g., 150km) for robustness checks, and the main results remain unchanged. These results are available upon request.

10
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 17, 2020. .

Within City Transmission
Our sample starts from January 19, when the first COVID-19 case was reported outside Wuhan.
The sample spans four weeks in total and ends on February 15. We also estimate the model separately for each half sample (January 19 to February 1, and February 2 to February 15). The first two weeks witnessed the virus infections quickly spread throughout the country with every province reporting at least one confirmed case, and the number of cases also increased at increasing speed ( Figure 1). During these two weeks, the government response has become more robust. On January 20, COVID-19 was classified as a Class B statutory infectious disease and treated as a Class  . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 17, 2020. . 12 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 17, 2020. . https://doi.org/10.1101/2020 authorities and residents taking more protective measures in response to a higher perceived risk of contracting the disease. Releasing official information on newly confirmed cases at the daily level as well as exchanging information via social media throughout China may promote more timely actions to slowdown virus transmissions. Comparing containment effects over time, in the first half sample, one new infection leads to 1.966 more cases within a week, implying a fast growth in the number of cases. However, in the second half sample, the effect decreases to 0.490, suggesting that public health measures imposed in late January are effective in limiting a further spread of the virus.
Besides Wuhan, a large number of cases are also reported in other cities in Hubei province, where 6 of them reported over 1000 cumulative cases by February 15 11 . Their overstretched health care system exacerbates the concern over delayed reporting of confirmed cases in these cities. To mitigate such potential measurement errors in our estimates, we exclude all cities in Hubei province from the regression. The bottom panel of Table 3 reports the estimates. Comparing the IV estimates in columns (4) and (6), we observe that in the first two weeks, on average, one new case leads to 1.345 more cases in the following week, while the chain of infections is broken in the second half sample. in Wuhan and other cities. To alleviate this concern, we use a measure of the size of population flow from Wuhan to a destination city, which is constructed by multiplying the daily migration index on the population flow out of Wuhan (Figure 2) with the share of the flow that a destination city receives provided by Baidu (Figure 3). During days before January 25, we use the average destination shares between January 10 and January 24. For days on or after January 24, the average destination shares between January 25 and February 23 are used 12 . Table 4 reports the estimates from IV regressions of Eq.(1), and Table 5 reports the results where we exclude cities in Hubei province from the analysis. Column (4) of Table 4 indicates that in the first half sample, one new case leads to 1.465 more cases within one week, and the effect is not statistically significant between one and two weeks. In the second half sample, one new case actually decreases future number of new cases, which indicates that the responses by the 11 These cities are Xiaogan, Huanggang, Jingzhou, Suizhou, Ezhou, and Xiangyang. 12 The shares of top 100 destinations are available. The starting and ending dates of the average shares released by Baidu do not precisely match the period of the analysis sample.

13
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 17, 2020. . 14 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 17, 2020. .

15
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 17, 2020. . https://doi.org/10.1101/2020

16
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 17, 2020. . https://doi.org/10.1101/2020 government and the public effectively contain the risk of additional infections. The time varying patterns in local transmissions are evident using the rolling window analysis (Figure 4). The top left panel shows the estimated local transmission coefficients for various 14-day samples with the starting date indicated on the horizontal axis. After an initial increase in the local transmission rate, one case leads to fewer and fewer additional cases from late January onwards. Comparing Table 4 with Table 3, we observe that the results are not sensitive to the inclusion of terms on between-city transmissions.
As a robustness test, Table 5 reports the estimation results if the analysis sample does not include cities in Hubei province. Column (4) of Table 5 indicates that in the first half sample, one new case leads to 1.517 more cases within a week, and this becomes small and statistically not significant in the second half sample. Besides, in the second half sample, one new case decreases new infections by 0.724 between 1 and 2 weeks, which is larger than the estimate (0.438) with cities in Hubei province included. Overall, the spread of the virus has been effectively contained between For the between-city transmission from Wuhan, we observe that the population flow better explains the contagion effect than geographic proximity (Table 4). In the first half sample, one new case in Wuhan leads to more cases in other cities which receive more population flows from Wuhan within one week. In the second half sample, more arrivals from Wuhan one week earlier can still be a risk. A back of the envelope calculation indicates that one new case in Wuhan leads to 0.067 (0.120) more cases in the destination city per 10,000 travelers from Wuhan within one (two) week between January 19 and February 1 (February 2 and February 15) 13 . Note that while the effect is statistically significant, it should be interpreted in context. It was estimated that 15, 000, 000 people would travel out of Wuhan during the Lunar New Year holiday 14 . If all had gone to one city, this would have directly generated about 280.5 cases within two weeks. The risk of infection is likely very low for most travelers except for few who have previous contacts with sources of 13 It is estimated that 14,925,000 people traveled out of Wuhan in 2019 during the Lunar New Year holiday (http: //www.whtv.com.cn/p/17571.html). The sum of Baidu's migration index for population flow out of Wuhan during the 40 days around the 2019 Lunar New Year is 203.3, which means one index unit represents 0.000013621 travelers. The destination share is in percentage. With one more case in Wuhan, the effect on a city receiving 10,000 travelers from Wuhan is 0.00490 × 0.000013621 × 100 × 10000 = 0.067. 14 http://www.whtv.com.cn/p/17571.html 17 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 17, 2020. . infection, and person-specific history of past contacts may be an essential predictor for infection risk, in addition to the total number of population flows 15 .
Besides spillovers from Wuhan, a city may also be affected by infections nearby. We observe that in the second half sample, one new case in a neighboring city that is 100km away generates 0.099 more cases after between one and two weeks 16 . This indicates that by February, while within city transmissions are effectively contained, new infections caused by between-city transmissions can still be a risk. For cities outside Hubei province, the results are similar (Table 5), except that the transmission from Wuhan is not significant in the first half sample.

Social and Economic Mediating Factors
We investigate the impacts of many social, economic and environmental variables in mediating the transmission rates (Eq. (3)). For own city transmissions, we consider the mediating effects of population density, per capita GDP, number of doctors, contemporaneous temperature, wind speed and precipitation. For between-city transmissions, we consider the mediating effects of distance, density difference and per capita GDP difference. We also include a measure of population flows from Wuhan. Table 6 reports the estimation results of the IV regressions using the first half sample, and Table 7 reports the estimates for the second half sample.
Our main conclusions still hold with these mediating factors included. The effects of past infections are much smaller in the second half sample, suggesting that the virus outbreak has been contained. For the between-city transmissions, population flow from Wuhan poses a risk for new infections in other cities, with one week lag in the first half sample, and with two week lag in the second half sample. The effects are robust with the inclusion of other measures of proximity, such as those based on geography and economic similarity. Table 8 shows variables that affect transmissions within cities and whose coefficients are statistically significant at 0.1. In order to help compare across variables, we compute the changes in the variables needed in order for the coefficient of past infections on current cases to be reduced by 1. The effects of the environmental factors and population density are mixed. We also observe that the transmission rate is lower in cities with more doctors. Cities with higher per capita GDP have higher transmission rates, which may be due to more social interactions as a result of more economic activity 17 . However, until clearer mechanisms are identified and tested, this should be interpreted with caution.
15 From mid February, individual specific health codes such as Alipay Health Code and WeChat Health Code are being used in many cities to aid quarantine efforts. 16 0.099 = 0.456 × 1 log 100 . 17 Fogli and Veldkamp (2019) show that income is positively correlated with more closely connected social network, and a dense network spreads diseases faster.

18
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 17, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 17, 2020. . https://doi.org/10.1101/2020   CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 17, 2020. . https://doi.org/10.1101/2020  CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 17, 2020. . https://doi.org/10.1101/2020

22
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 17, 2020. . https://doi.org/10.1101/2020 At the time of writing, COVID-19 represents a very high risk at the global level, according to the WHO risk assessment (World Health Organization, 2020a). Countries other than China are reporting new confirmed cases, and increasingly many cases arise through community transmissions rather than being imported. Based on case data in China between January 19 and February 15, we show that public health measures adopted can effectively contain the virus outbreak, and newly available data on real-time population flows can be a valuable tool for risk assessment by public health authorities and the general public.
A key limitation of the paper is that we are not able to disentangle the effects from each of the stringent measures taken, as even within this four-week sampling period China enforced such a large number of densely timed policies to contain the virus spreading, often simultaneously in many cities. A second limitation is that these policies fall within each of the two sub-samples.
By the starting date of the official data release for confirmed infected cases throughout China, i.e. January 2020, a number of stringent measures were already taken, which prevent researchers to ideally examine and compare a sub-sample period during which no strict policies was enforced. Key knowledge gaps remain in the understanding of the epidemiological characteristics of COVID-19, such as individual risk factors for contracting the virus and infections from asymptotic cases. Data on the demographics and exposure history for those who have shown symptoms as well as those who have not will help facilitate these research.
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted March 17, 2020. . https://doi.org/10.1101/2020 A Data Figure A.1 illustrates the dependent variable, the instrumental variables, and the exclusion restriction that weather conditions earlier than two weeks from date t affect the number of people who are infectious within past two weeks, but do not directly affect the number of confirmed new cases at date t.   (1) and (2), the dependent variables are the numbers of newly confirmed COVID-19 cases in the own city in the preceding first and second weeks, respectively. In Columns (3) and (4) Weather conditions affect disease transmissions either directly if the virus can more easily survive and spread in certain environment, or indirectly by changing human behavior. In our sample period, higher wind speeds or more precipitation decrease the number of infections in the future. Higher temperatures increase local infections, but decrease infections in other cities.

26
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted March 17, 2020. . https://doi.org/10.1101/2020

27
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted March 17, 2020. . https://doi.org/10.1101/2020