Time trends and persistence in PM2.5 in 20 megacities: evidence for the time period 2018–2020

This work examines the degree of persistence in daily PM2.5 data for 20 megacities: Bangkok, Beijing, Mumbai, Calcutta, Canton, Dhaka, Delhi, Jakarta, London, Los Angeles, Mexico City, Moscow, New York, Osaka, Paris, Sao Paulo, Seoul, Shanghai, Tientsin, and Tokyo. The analysis is based on fractional integration techniques. Specifically, the differencing parameter is used to measure the degree of persistence in the series, which consist of daily measurements from January 1, 2018, to December 31, 2020. The estimated values of the differencing parameter lie in the interval (0, 1) in all cases, which allows us to conclude that the series are mean reverting and, therefore, that shocks have transitory effects.


Introduction
Human health, particularly that of people who live in cities, can be seriously affected by poor air quality. Recent studies on health and the environment have pointed out that suspended particles, especially the finest ones, are among the most harmful pollutants. Fine particles, also called particulate matter or PM2.5, can penetrate deep into the lungs and cause them to become inflamed, putting patients with heart and lung disease in serious danger. In turn, these particles can carry carcinogenic compounds that could be adsorbed on the surface of the lungs. All of this gives great importance to research on the dynamics of atmospheric pollution, specifically research that develops models for prediction purposes and thereby enables the design of air quality management policies.
Based on the above, the present paper investigates the time series properties of daily PM2.5 data in twenty megacities around the world and their evolution across time. We measure the degree of persistence to determine whether shocks to the series have permanent or transitory effects. However, instead of using classical methods, which are based on a "well-behaved" I(0) or short-memory error term, we consider the possibility of long memory, a feature very often observed in environmental and climatological data (Liu et al. 2014; Yaya et al. 2015; Knight et al. 2017; Li et al. 2017; Bai et al. 2019; Qi et al. 2019; Zhao et al. 2019; Gil-Alana and Lenti 2021; Gil-Alana et al. 2020a, b; Sakiru et al. 2021; etc.).
The term "megacity" is used to define a metropolitan area with more than 10 million inhabitants. Typically, these urban environments are made up of one, two, or more metropolitan areas that have been physically joined together (Rollandi 2012). However, sources differ on the number of inhabitants in megacities, mainly because the urban area is spread over territories that are divided among different political entities, and sometimes there is no clear definition of urban boundaries. A characteristic of megacities is that they are polycentric: they do not have a single centre, but within the same urban extension there are different areas with the capacity to attract economic, social, and political activities. Megacities are thus structured around different centralities. There are major differences between megacities located in developed countries and those in developing countries. In developed countries, conurbations are organised by extending infrastructure over territories that are incorporated in an orderly fashion. In megacities in developing countries, by contrast, conurbations are created through informal settlements, which have no planning or infrastructure. In fact, urban planning in these environments usually takes place after the consolidation of an urban area and is reduced to providing infrastructure for the new neighbourhood (Giglia 2001).
Responsible Editor: Ilhan Ozturk
On these occasions, urban management focuses on what is urgent rather than what is important. Governments are often more concerned with finding solutions to specific, immediate, and emerging problems (flooding, rubbish, water supply, etc.) than with solving the pollution problems of an area in the medium or long term (Cantos 2011).
One of the most important reasons for the increase in environmental problems is population growth. In order to meet the needs of the population, industrial development is necessary. But increasing industrialisation causes rapid consumption of natural resources. In addition, the waste produced by production and consumption has a negative impact on the environment. Another negative factor in population growth is unplanned urbanisation. As a consequence of unplanned urbanisation, pollution is increasing in urban centres. Air, water, and soil are polluted, negatively affecting living beings.
It is clear that the way megacities continue to produce and consume energy and goods will be crucial to their social, ecological, and economic survival. In this context, policies related to environmental sustainability will be key to addressing one of the great challenges facing humanity.
The concentration of the world's population in urban centres is a growing trend. According to the United Nations (UN), about half of the world's population (55%) now lives in urban areas; and by 2050, about two-thirds (68%) of all people are projected to reside in urban areas (World Population Prospects 2019). The UN points out that while in 1990 there were 10 megacities in the world, today their number has risen to 33, and they have grown from 7% of the world's population to 13%, a trend that will continue in the future (UN 2019).
Population growth in megacities is mainly determined by migration. Much of this migratory flow need not be international, since a large part of the world's population will move from rural to urban areas (UN 2021). In this context, our study takes as reference 20 megacities with different levels of industrialisation and development, located in different regions worldwide, for which reliable records on PM2.5 emissions are available. Table 1 displays the population of these megacities in 2020.
The rest of the paper is structured as follows: the "A review of the literature" section briefly reviews the literature on the topic. The methodology and modelling approach used in the paper are given in the third section. The data are presented in the fourth section. The empirical results are displayed in the fifth section, while the last section concludes the paper.

A review of the literature
Poor air quality can affect human health. Among the most harmful pollutants for health is particulate matter (PM2.5). In recent years, greater attention has been paid to particles smaller than 1 μm in diameter, known as the ultrafine fraction, which appear to have a greater potential for damage (Lippmann 1989; Ostro et al. 1999; Castillejos et al. 2000; EPA 2002; Morgenstern Xing et al. 2016; Eguchi et al. 2018; Amsalu et al. 2019; etc.). Suspended or fine particles can enter the lungs, causing inflammation and producing serious diseases. Suspended particles also often carry carcinogenic compounds that can be absorbed by the lungs. The main components of the particles are metals (lead, iron, vanadium, nickel, copper, platinum, and others), organic compounds, material of biological origin (viruses, bacteria, and animal and plant remains, such as pollen fragments), ions (sulphates, nitrate, and acidity) and reactive gases (ozone, peroxides, and aldehydes), and their core is often made up of pure elemental carbon (EPA 1999).
Air pollution is a global problem, affecting all countries regardless of their level of development. It can occur in indoor environments, where the main pollutant is tobacco, or in outdoor environments, in which case industry and heavy vehicle traffic are the main associated factors. The rapid growth of cities coupled with the lack of effective transportation planning can cause large and harmful levels of fine particulate matter (PM2.5) in the air (Montes de Oca et al. 2010).
Numerous institutions in different countries and several studies have analysed the connection between pollution and adverse health effects. Examples are the works by Schwartz and Marcus (1990), Anderson et al. (1996), Atkinson et al. (1999), Gardner and Dorling (1999), HEI (2002), EPA (2002), and WHO (2006). In the USA, Section 812 of the Clean Air Act Amendments requires the Environmental Protection Agency (EPA) to periodically evaluate the effects of the Clean Air Act on public health, the economy, and the environment. In this document, a series of measures are proposed to improve air quality, as well as to establish a detailed program for compliance and maintenance of national air quality standards. In this sense, it is clearly important to investigate the dynamics of air pollution in order to develop adequate models for prediction purposes and to design policies to manage air quality.
In more developed cities, particulate pollution from car traffic is a major problem. In many cases, these cities have grown without proper planning in many areas, such as transport flows. The lack of frequent air quality data makes it difficult to assess health impacts and, in some cases, for governments to take policies on increasing transport flows seriously. In this respect, it is of vital importance to define an air quality standard in order to protect citizens (Cohen et al. 1997; Rosales-Castillo et al. 2001; Magas et al. 2007; Molnár et al. 2007; Montes de Oca et al. 2010; Yuan et al. 2012; Steinle et al. 2013, 2015; Karagulian et al. 2019).
In this regard, the World Health Organization (WHO 2006) has established an annual limit value of 10 μg/m³ for the concentration of PM2.5 particles in the air. However, in some large cities, this value is practically doubled, with the consequent impact on morbidity and mortality. Recently, the WHO has established new guideline values for particulate matter concentrations in the air based on concentrations of particles smaller than 10 μm in diameter (PM10) and particles smaller than 2.5 μm in diameter (PM2.5), although it clarifies that PM2.5 values are preferable to PM10. This preference is based on the fact that PM10 has an important component of natural origin, especially in southern European cities, such as air intrusions from North Africa (Rodríguez et al. 2001; Escudero et al. 2005). In an urban atmosphere, however, the main contribution to PM2.5 comes from engine combustion, and PM2.5 has a less important natural component than PM10 (Ballester et al. 2007); it therefore seems, a priori, to be a more reliable indicator of anthropogenic activity. In addition, these fine particles penetrate deeper into the pulmonary alveoli, producing more adverse health effects than particles of a larger diameter, such as PM10 (De Kok et al. 2006). In another study, Linares and Díaz (2009) explain, for example, the association found between PM2.5 and children's emergency hospital admissions, compared with what had been detected with other pollutants. Similar evidence is also found in Bell et al. (2015), Patto et al. (2016), Nishikawa et al. (2021), and Ren et al. (2021), among many others.
In this article, we work with a long-memory perspective based on fractional integration. Long memory is a property of time series in which observations that are widely separated in time remain highly dependent. This methodology has been used in many areas of knowledge, for example in the economic-financial field (Gil-Alana and Moreno 2012; Abritti et al. 2017; Kalemkerian and Sosa 2020; Murialdo et al. 2020; Qiu et al. 2020) and in the environmental field (Bruneau et al. 2020; Gil-Alana et al. 2020a, b; Xayasouk et al. 2020; Yaya et al. 2020). In this latter area, the most recent study using fractional integration is that of Caporale et al. (2021), who investigate the statistical properties of daily PM10 in eight European capitals (Amsterdam, Berlin, Brussels, Helsinki, London, Luxembourg, Madrid, and Paris) during the period 2014-2020.

Methodology and modelization
We use techniques based on long-range dependence or long memory, which implies that the infinite sum of the autocovariances is infinite. Within this class of models, widely used in environmental studies, a very manageable one is based on the concept of fractional integration, which means that the number of differences required to render the series stationary and short memory, or I(0), may be a fractional real value. According to the definitions provided in Granger (1980, 1981), Granger and Joyeux (1980), and Hosking (1981), a covariance stationary process {x(t), t = 0, ±1, …} is said to be integrated of order d, denoted I(d), if it can be expressed as
$$(1 - L)^{d} x(t) = u(t), \qquad t = 0, \pm 1, \ldots, \tag{1}$$
where L is the lag operator, that is, L^k x(t) = x(t − k), and u(t) is an I(0) or short-memory process. Then, if d > 0, x(t) displays long memory (or long-range dependence) in the sense that observations are highly dependent even if they are far apart in time, and the higher the value of d, the higher the level of association between the observations.
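To make the fractional differencing operator concrete, the following minimal Python sketch expands (1 − L)^d as a power series in L using the standard binomial recursion and applies it to a series. The function names and the choice of setting pre-sample values to zero (a truncated filter) are illustrative assumptions, not taken from the paper.

```python
def frac_diff_weights(d, n):
    """First n coefficients pi_k of the expansion (1 - L)^d = sum_k pi_k L^k."""
    weights = [1.0]
    for k in range(1, n):
        # Binomial-series recursion: pi_k = pi_{k-1} * (k - 1 - d) / k
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(x, d):
    """Apply (1 - L)^d to x, treating values before the sample as zero (assumption)."""
    w = frac_diff_weights(d, len(x))
    return [sum(w[k] * x[t - k] for k in range(t + 1)) for t in range(len(x))]


if __name__ == "__main__":
    # d = 1 reduces to ordinary first differences; d = 0.5 gives slowly decaying weights.
    print(frac_diff_weights(1.0, 4))
    print(frac_diff_weights(0.5, 4))
```

For integer d the expansion is finite (d = 1 gives the usual first difference), while for fractional d every past observation receives a nonzero weight, which is the source of the long-memory behaviour.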
In the empirical application carried out in the "Empirical results" section, we allow for deterministic terms such as an intercept and/or a linear time trend. Thus, we consider the following regression model:
$$y(t) = \alpha + \beta t + x(t), \qquad t = 1, 2, \ldots, \tag{2}$$
where y(t) represents the observed data, α and β are the unknown coefficients, and x(t) is described by Eq. (1), so that x(t) is I(d). Thus, there are two relevant parameters here: β, the time trend coefficient, indicating the change in emissions per unit of time, and d, measuring the degree of persistence in the data.
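For a given value of d, the intercept and trend coefficients of the regression model with I(d) errors can be estimated by applying (1 − L)^d to both sides and running ordinary least squares on the filtered variables. The sketch below illustrates this idea under the assumption of a truncated filter with zero pre-sample values; the function names are hypothetical and this is not presented as the authors' exact procedure.

```python
def frac_diff(x, d):
    """Apply (1 - L)^d to x with pre-sample values set to zero (assumption)."""
    n = len(x)
    w = [1.0]
    for k in range(1, n):
        w.append(w[-1] * (k - 1 - d) / k)
    return [sum(w[k] * x[t - k] for k in range(t + 1)) for t in range(n)]


def trend_ols_given_d(y, d):
    """OLS estimates of (alpha, beta) in y(t) = alpha + beta*t + x(t), x(t) ~ I(d)."""
    n = len(y)
    y_f = frac_diff(y, d)                                       # filtered data
    z1 = frac_diff([1.0] * n, d)                                # filtered intercept
    z2 = frac_diff([float(t) for t in range(1, n + 1)], d)      # filtered trend
    # Normal equations for the two filtered regressors.
    s11 = sum(a * a for a in z1)
    s12 = sum(a * b for a, b in zip(z1, z2))
    s22 = sum(b * b for b in z2)
    s1y = sum(a * v for a, v in zip(z1, y_f))
    s2y = sum(b * v for b, v in zip(z2, y_f))
    det = s11 * s22 - s12 * s12
    alpha = (s22 * s1y - s12 * s2y) / det
    beta = (s11 * s2y - s12 * s1y) / det
    return alpha, beta


if __name__ == "__main__":
    # Hypothetical noiseless series with alpha = 2 and beta = 0.5.
    y = [2.0 + 0.5 * t for t in range(1, 51)]
    print(trend_ols_given_d(y, 0.4))
```

Because the filter is linear, a series that follows the trend exactly is recovered with the same coefficients for any d; with real data, the estimates and their precision depend on the value of d used in the filter.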
The estimation is conducted via the Whittle function in the frequency domain by means of implementing a particular version of the tests of Robinson (1994)
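The estimates in the paper come from a version of the tests of Robinson (1994). The sketch below is not that procedure: it is a simplified grid-search Whittle estimate of d in the frequency domain for the special case in which u(t) is white noise and there are no deterministic terms. All function names are hypothetical, and the periodogram is computed with a direct (slow) Fourier sum to keep the example self-contained.

```python
import cmath
import math


def periodogram(x):
    """Periodogram at the Fourier frequencies lambda_j = 2*pi*j/n, j = 1..n//2."""
    n = len(x)
    mean = sum(x) / n
    freqs, pgram = [], []
    for j in range(1, n // 2 + 1):
        lam = 2.0 * math.pi * j / n
        dft = sum((x[t] - mean) * cmath.exp(-1j * lam * t) for t in range(n))
        freqs.append(lam)
        pgram.append(abs(dft) ** 2 / (2.0 * math.pi * n))
    return freqs, pgram


def whittle_objective(freqs, pgram, d):
    """Concentrated Whittle objective for (1 - L)^d x(t) = u(t), u(t) white noise."""
    m = len(freqs)
    # Spectral shape of an I(d) process: g(lambda) = |2 sin(lambda/2)|^(-2d)
    log_g = [-2.0 * d * math.log(2.0 * math.sin(lam / 2.0)) for lam in freqs]
    scale = sum(i_j / math.exp(lg) for i_j, lg in zip(pgram, log_g)) / m
    return math.log(scale) + sum(log_g) / m


def whittle_estimate_d(freqs, pgram, grid):
    """Value of d on the grid that minimises the Whittle objective."""
    return min(grid, key=lambda d: whittle_objective(freqs, pgram, d))


if __name__ == "__main__":
    # Hypothetical check: a periodogram with the exact I(0.3) shape is recovered.
    freqs = [2.0 * math.pi * j / 200 for j in range(1, 101)]
    pgram = [(2.0 * math.sin(lam / 2.0)) ** (-0.6) for lam in freqs]
    grid = [i / 100 for i in range(100)]
    print(whittle_estimate_d(freqs, pgram, grid))
```

In practice one would call periodogram on the observed (or detrended) series and pass the result to whittle_estimate_d; a finer grid or a numerical optimiser could replace the grid search.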

Data
The series analysed correspond to the daily average air quality taken from the World Air Quality Index (WAQI) at https://aqicn.org/map/world/es/. All data have been converted using the United States Environmental Protection Agency (US EPA) standard. More specifically, we use data from January 1, 2018, to December 31, 2020, for the 20 megacities around the world: Bangkok, Beijing, Bombay, Calcutta, Canton, Dacca, Delhi, Jakarta, London, Los Angeles, Mexico City, Moscow, New York, Osaka, Paris, Sao Paulo, Seoul, Shanghai, Tientsin, and Tokyo. Table 2 shows the number of unavailable observations and their percentage with respect to the series for each city. In these cases, we have computed the arithmetic mean. We observe that the highest percentage of missing observations corresponds to Moscow (21.29%), though they are rather dispersed across the sample, not altering the overall evolution of the data.
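The text states that missing observations were handled with an arithmetic mean but does not specify which values enter that mean. The following sketch shows one possible reading, assumed here purely for illustration: each missing day is replaced by the mean of the nearest available observations before and after it. The function name and the use of None to mark missing values are hypothetical.

```python
def fill_missing_with_neighbour_mean(series):
    """Replace None entries by the mean of the nearest observed values on each side."""
    filled = list(series)
    n = len(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        # Nearest observed values before and after position i in the original series.
        prev_obs = next((series[j] for j in range(i - 1, -1, -1)
                         if series[j] is not None), None)
        next_obs = next((series[j] for j in range(i + 1, n)
                         if series[j] is not None), None)
        neighbours = [v for v in (prev_obs, next_obs) if v is not None]
        if neighbours:
            filled[i] = sum(neighbours) / len(neighbours)
    return filled


if __name__ == "__main__":
    # Hypothetical daily readings with two gaps.
    print(fill_missing_with_neighbour_mean([10.0, None, 20.0, None, None, 50.0]))
```

Other readings, such as using the overall sample mean of each city, are equally compatible with the description and would change only the body of the loop.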
The appendix shows the graphs that allow visualizing the behaviour of the concentration of PM 2.5 in each of the megacities studied. We can observe the elevated levels and variability of some of the megacities, such as Bangkok,

Empirical results
We consider in this section the model given by Eqs. (1) and (2), testing the null hypothesis
$$H_{o}: d = d_{o} \tag{3}$$
for any real value d_o. Thus, the model under the null becomes
$$y(t) = \alpha + \beta t + x(t), \qquad (1 - L)^{d_{o}} x(t) = u(t), \qquad t = 1, 2, \ldots \tag{4}$$
Across the tables, we report the values of d_o where the null hypothesis (3) cannot be rejected at the 95% level, along with the estimates of d based on a frequency domain version of the Whittle function. We display the estimates of d under three different scenarios: (i) no deterministic terms, i.e. imposing α = β = 0 a priori in (4), (ii) with an intercept or a constant (i.e. with β = 0 a priori), and (iii) with both the constant and the linear time trend freely estimated from the data.
In Tables 3 and 4, we suppose u(t) is a white noise process, while in Tables 5 and 6, weakly autocorrelated errors are permitted. We start by reporting the results based on white noise errors.
In Tables 5 and 6, we allow for weak autocorrelation. Here we employ the exponential spectral model of Bloomfield (1973), which accommodates autoregressive (AR) structures in a nonparametric way with a small number of parameters and, unlike the AR case, is stationary over its whole range of parameter values.
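As a hedged illustration of the Bloomfield (1973) model, the sketch below evaluates its spectral density, which is the exponential of a short cosine series, f(λ) = σ²/(2π) · exp(2 Σ τ_r cos(λ r)), and combines it with the I(d) factor |2 sin(λ/2)|^(−2d). The function names, argument names, and the parameter values in the usage example are assumptions made for illustration.

```python
import math


def bloomfield_spectrum(lam, tau, sigma2=1.0):
    """Bloomfield exponential spectral density at frequency lam.

    f(lam) = sigma2 / (2*pi) * exp(2 * sum_{r=1}^{m} tau[r-1] * cos(lam * r))
    """
    exponent = 2.0 * sum(t_r * math.cos(lam * r)
                         for r, t_r in enumerate(tau, start=1))
    return sigma2 / (2.0 * math.pi) * math.exp(exponent)


def fractional_bloomfield_spectrum(lam, d, tau, sigma2=1.0):
    """Spectrum of x(t) when (1 - L)^d x(t) = u(t) and u(t) is Bloomfield; 0 < lam <= pi."""
    return (2.0 * math.sin(lam / 2.0)) ** (-2.0 * d) * bloomfield_spectrum(lam, tau, sigma2)


if __name__ == "__main__":
    # Hypothetical parameters: one Bloomfield term and d = 0.3.
    for lam in (0.1, 1.0, math.pi):
        print(lam, fractional_bloomfield_spectrum(lam, 0.3, [0.4]))
```

With an empty tau the Bloomfield density is flat, which corresponds to the white noise case; each additional τ_r adds one cosine term, and the density remains well defined for any real values of the parameters.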
We observe in Table 5 that the time trend is now required in a number of cases (11 out of the 20 cases presented), and the time trend coefficient is negative in all of them, the largest in absolute value corresponding to Delhi (−0.1163), followed by Bombay (−0.0783). This is good news in the sense that it indicates a systematic decrease in emissions in these cities. The estimates of d are once again smaller than 1, but the values are now much lower. Stationary patterns (d < 0.50) are observed in 15 out of the 20 series, with values of d ranging from 0.06 (Tientsin) to 0.45 (Jakarta); for Bangkok, Dacca, and Delhi, the estimates of d are around 0.50; and Bombay and Calcutta display values of d significantly higher than 0.50, thus showing a nonstationary pattern. Nevertheless, in all cases, the estimated values of d are once again statistically significantly below 1, thus displaying mean reversion and implying transitory shocks. Thus, in the event of a negative exogenous shock that increases emissions in the cities, there is no need for strong actions, since the series will return by themselves in the long run to their original long-term projection. On the other hand, if the shock is positive, substantially reducing emissions, actions should be taken to maintain emissions at these lower levels.
Table 7 summarizes the results in terms of the degree of persistence measured by the fractional differencing parameter, d. The left column refers to the case of white noise errors, while the right-hand column refers to autocorrelation. If we look at the first five positions (referring to the lowest degrees of persistence), we observe that three cities appear in both scenarios: Shanghai (1st position with white noise errors and 5th under autocorrelation), Tientsin (2nd and 1st, respectively), and Tokyo (5th and 3rd).
On the other hand, focusing on the bottom places, with the highest levels of persistence, three cities share the last 4 places: Bangkok (20th and 17th), Calcutta (19th and 20th), and Bombay (18th and 19th). The lower values of d observed under the assumption of autocorrelation can be explained by the competition between the two structures (fractional differencing and Bloomfield autocorrelation) in describing the time dependence. Note that under the white noise structure, all the time dependence is captured exclusively by the order of integration d, while in the autocorrelated case, the order of integration competes with the Bloomfield structure in capturing that dependence.
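The link between d < 1 and transitory shocks can be illustrated with the impulse responses of an I(d) process, that is, the coefficients of (1 − L)^(−d), which measure the effect of a unit shock k periods later. The sketch below uses the standard recursion for these coefficients with white noise u(t); the function name is hypothetical, and the values of d in the usage example are the smallest and largest stationary estimates quoted above (0.06 and 0.45) plus the unit-root case d = 1 for comparison.

```python
def impulse_responses(d, horizon):
    """Coefficients psi_k of (1 - L)^(-d): effect at lag k of a unit shock in u(t)."""
    psi = [1.0]
    for k in range(1, horizon + 1):
        # Recursion: psi_k = psi_{k-1} * (k - 1 + d) / k
        psi.append(psi[-1] * (k - 1 + d) / k)
    return psi


if __name__ == "__main__":
    for d in (0.06, 0.45, 1.0):
        psi = impulse_responses(d, 365)
        # Effect of a shock after 1, 30, and 365 days.
        print(d, psi[1], psi[30], psi[365])
```

For d = 1 every coefficient equals 1, so a shock never dies out, whereas for any d < 1 the coefficients decay towards zero, more slowly the closer d is to 1.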
In Table 8, we compare the time trend coefficients under the (wrong) assumption of I(0) errors, i.e. imposing a priori that d = 0, (2nd column) with those obtained with d estimated from the data (3rd column). We observe that for seven cities, namely, Bangkok, Calcutta, Dacca, Mexico, Moscow, Sao Paulo, and Tientsin, the time trend coefficient is significant when d = 0 but insignificant when d is correctly estimated. For a couple of cities, Jakarta and Los Angeles, the coefficient is insignificant in the two scenarios, and for the remaining eleven cases, there is a reduction in the magnitude of β in six cities (Beijing, London, Osaka, Paris, Seoul, and Tokyo) and an increase in another five (Bombay, Canton, Delhi, New York, and Shanghai). Thus, erroneous conclusions can be obtained under the imposition of short memory or I(0) structures on the error term.
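The comparison between imposing d = 0 and using an estimated d can be mimicked on simulated data. The sketch below computes the t-ratio of the trend coefficient after filtering both sides of the regression by (1 − L)^d, first with d = 0 and then with the true d of a simulated series that has long memory but no trend. The simulated series, the seed, the value d = 0.4, and the function names are all hypothetical; the standard errors are the usual OLS ones on the filtered regression, which is an assumption of this sketch rather than the paper's test.

```python
import math
import random


def frac_filter(x, d):
    """Apply (1 - L)^d to x with pre-sample values set to zero (d < 0 integrates)."""
    n = len(x)
    w = [1.0]
    for k in range(1, n):
        w.append(w[-1] * (k - 1 - d) / k)
    return [sum(w[k] * x[t - k] for k in range(t + 1)) for t in range(n)]


def trend_t_ratio(y, d):
    """Return (beta_hat, t_ratio) for beta in y(t) = alpha + beta*t + x(t), x(t) ~ I(d)."""
    n = len(y)
    y_f = frac_filter(y, d)
    z1 = frac_filter([1.0] * n, d)
    z2 = frac_filter([float(t) for t in range(1, n + 1)], d)
    s11 = sum(a * a for a in z1)
    s12 = sum(a * b for a, b in zip(z1, z2))
    s22 = sum(b * b for b in z2)
    s1y = sum(a * v for a, v in zip(z1, y_f))
    s2y = sum(b * v for b, v in zip(z2, y_f))
    det = s11 * s22 - s12 * s12
    alpha = (s22 * s1y - s12 * s2y) / det
    beta = (s11 * s2y - s12 * s1y) / det
    # Residual variance and the OLS variance of beta in the filtered regression.
    ssr = sum((v - alpha * a - beta * b) ** 2 for v, a, b in zip(y_f, z1, z2))
    se_beta = math.sqrt(ssr / (n - 2) * s11 / det)
    return beta, beta / se_beta


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(300)]
    x = frac_filter(noise, -0.4)  # simulated I(0.4) series with no trend
    print("d imposed at 0:", trend_t_ratio(x, 0.0))
    print("d set to 0.4:  ", trend_t_ratio(x, 0.4))
```

When long memory is ignored, the OLS standard error of β is computed as if the errors were uncorrelated, so the t-ratio can look significant even when no trend is present.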

Conclusions
In this paper, we have looked at the statistical properties of PM2.5 in 20 megacities around the world. Using daily data from January 1, 2018, to December 31, 2020, and based on fractional integration, our results show that the estimated values of the differencing parameter lie in the interval (0, 1) in all cases, thus showing a mean reverting pattern and implying transitory effects of shocks. This result holds regardless of whether the I(0) error term is modelled as uncorrelated (white noise) or autocorrelated, though lower degrees of integration are found under the latter assumption. More importantly, the fact that the differencing parameter is significantly different from zero has important consequences for the estimation of the time trend coefficient, which measures the decrease in emissions per unit of time. In fact, some megacities that were supposed to display significant negative time trends under the I(0) specification are found to show insignificant coefficients with the long-memory approach. This is one important lesson obtained in this work. Thus, when examining time trends in environmental series, the potential presence of long memory is an issue that should be taken into account.
This work can be extended in several directions. First, the same type of analysis can be applied to other regions of the world in order to verify whether the long-memory property holds across regions and to check the decrease in the values across time. Second, other long-memory models, based on parametric, semiparametric, or even nonparametric methods, can be implemented to verify the results reported in this work. Third, the possibility of nonlinear structures or structural changes, still in the context of fractional integration, in the analysis of environmental data is another line of future research that is currently in progress.
In the appendix figures, the abscissa shows the monthly average from 2018 to 2020, and the ordinate shows the PM2.5 concentration in micrograms per cubic meter of air (μg/m³).