## Abstract

A non-homogeneous Poisson process is used to study the rate at which a pollutant’s concentration exceeds a given threshold of interest. An anisotropic spatial model is imposed on the parameters of the Poisson intensity function. The main contribution here is to allow change-points in time, since the data may behave differently in different time frames within a given observational period. Spatial anisotropy is also imposed on the vector of change-points in order to account for possible correlation between different sites. The parameters of the model are estimated using Bayesian inference via Markov chain Monte Carlo algorithms, in particular Gibbs sampling and Metropolis-Hastings. The different versions of the model are applied to ozone data from the monitoring network of Mexico City, Mexico, and the results obtained are analysed.


## References

Achcar JA, Fernández-Bremauntz AA, Rodrigues ER, Tzintzun G (2008) Estimating the number of ozone peaks in Mexico City using a non-homogeneous Poisson model. Environmetrics 19:469–485. https://doi.org/10.1002/env.890

Achcar JA, Rodrigues ER, Paulino CD, Soares P (2010) Non-homogeneous Poisson processes with a change-point: an application to ozone exceedances in Mexico City. Environ Ecol Stat 17:521–541

Achcar JA, Rodrigues ER, Tzintzun G (2011a) Using non-homogeneous Poisson models with multiple change-points to estimate the number of ozone exceedances in Mexico City. Environmetrics 22:1–12. https://doi.org/10.1002/env.1029

Achcar JA, Rodrigues ER, Tzintzun G (2011b) Using stochastic volatility models to analyse weekly ozone averages in Mexico City. Environ Ecol Stat 18:271–290. https://doi.org/10.1007/s10651-010-0132-1

Álvarez LJ, Fernández-Bremauntz AA, Rodrigues ER, Tzintzun G (2005) Maximum a posteriori estimation of the daily ozone peaks in Mexico City. J Agric Biol Environ Stat 10:276–290. https://doi.org/10.1198/108571105X5917

Barrios JM, Rodrigues ER (2015) A queueing model to study the occurrence and duration of ozone exceedances in Mexico City. J Appl Stat 42:214–230

Bell ML, McDermott A, Zeger SL, Samet JM, Dominici F (2004) Ozone and short-term mortality in 95 US urban communities, 1987–2000. J Am Med Assoc 292:2372–2378

Bell ML, Peng R, Dominici F (2005) The exposure-response curve for ozone and risk of mortality and the adequacy of current ozone regulations. Environ Health Perspect 114:532–536

Box GEP (1980) Sampling and Bayes’ inference in scientific modelling and robustness. J R Stat Soc Ser A 143:383–430

Castro-Morales FE, Gamerman D, Paez MS (2013) State space models with spatial deformation. Environ Ecol Stat 20:191–214. https://doi.org/10.1007/s10651-012-0215-2

Cox DR, Lewis PA (1966) Statistical analysis of series of events. Methuen, London

Cressie NA (1991) Statistics for spatial data. Wiley, Hoboken

Cruz-Juárez JA, Reyes-Cervantes H, Rodrigues ER (2016) Analysis of ozone behaviour in the city of Puebla–Mexico using non-homogeneous Poisson models with multiple change-points. J Environ Prot 7:1886–1903

de Jesús-Romo V, Rodrigues ER, Tzintzun G (2012) A Gibbs sampling algorithm to estimate the parameters of a volatility model: an application to ozone data. Spec Issue Air Pollut Appl Math 12A:2178–2190

Dias CTdS, Samaranayaka A, Manly B (2008) On the use of correlated beta random variables with animal population modelling. Ecol Model 215:293–300

Diggle PJ, Ribeiro PJ Jr (2007) Model-based geostatistics. Springer, New York

Galizia A, Kinney PL (1999) Long-term residence in areas of high ozone: association with respiratory health in a nationwide sample of nonsmoking adults. Environ Health 99:675–679

Gamerman D, Lopes HF (2006) Markov chain Monte Carlo: stochastic simulation for Bayesian inference, 2nd edn. Chapman and Hall, Boca Raton

Gauderman WJ, Avol E, Gilliland F, Vora H, Thomas D, Berhane K, McConnell R, Kuenzli N, Lurmann F, Rappaport E, Margolis H, Bates D, Peters J (2004) The effects of air pollution on lung development from 10 to 18 years of age. New Engl J Med 351:1057–1067. https://doi.org/10.1056/NEJMoa040610

Gelfand AE, Smith AFM (1990) Sampling-based approaches to calculating marginal densities. J Am Stat Assoc 85:398–409. https://doi.org/10.1080/01621459.1990.10476213

Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2013) Bayesian data analysis, 3rd edn. Chapman and Hall/CRC, Boca Raton

Gouveia N, Fletcher T (2000) Time series analysis of air pollution and mortality: effects by cause, age and socioeconomic status. J Epidemiol Community Health 54:750–755

Guardani R, Aguiar JL, Nascimento CAO, Lacava CIV, Yanagi Y (2003) Ground-level ozone mapping in large urban areas using multivariate analysis: application to the São Paulo Metropolitan Area. J Air Waste Manage Assoc 53:553–559

Gyarmati-Szabó J, Bogachev LV, Chen H (2011) Modelling threshold exceedances of air pollution concentrations via non-homogeneous Poisson process with multiple change-points. Atmos Environ 45:5493–5503

Huerta G, Sansó B (2007) Time-varying models for extreme values. Environ Ecol Stat 14:285–299. https://doi.org/10.1007/s10651-007-0014-3

Huerta G, Sansó B, Stroud JR (2004) A spatiotemporal model for Mexico City ozone levels. Appl Stat 53:231–248

Javits JS (1980) Statistical interdependencies in the ozone national ambient air quality standard. J Air Pollut Control Assoc 30:58–59. https://doi.org/10.1080/00022470.1980.10465918

Koop G, Potter SM (2009) Prior elicitation in multiple change-points models. Int Econ Rev 50:751–772

Lagona F, Maruotti A, Picone M (2011) A non-homogeneous hidden Markov model for analysis of multi-pollutant exceedances data. In: Dymarski P (ed) Hidden Markov models: theory and applications. InTech, Croatia, pp 207–222

Larsen LC, Bradley RA, Honcoop GL (1990) A new method of characterizing the variability of air quality-related indicators. In: Air and waste management association, international specialty conference, tropospheric ozone and the environment. Los Angeles, California Air and Waste Management Series, Pittsburgh, Penn., USA

Lawless JF (1982) Statistical models and methods for lifetime data. Wiley, Hoboken

Loomis D, Borja-Aburto VH, Bangdiwala SI, Shy CM (1996) Ozone exposure and daily mortality in Mexico City: a time series analysis. Health Effects Inst Res Rep 75:1–46

Majumdar A, Gelfand AE, Banerjee S (2005) Spatio-temporal change-point modeling. J Stat Plan Inference 130:149–166

Martins LC, de Oliveira Latorre MRD, Saldiva PHN, Braga ALF (2002) Air pollution and emergency rooms visit due to chronic lower respiratory diseases in the elderly: an ecological time series study in São Paulo, Brazil. J Occup Environ Med 44:622–627

NOM (2002) Modificación a la Norma Oficial Mexicana NOM-020-SSA1-1993. Diario Oficial de la Federación, 30 October 2002, Mexico (in Spanish)

NOM (2014) Norma Oficial Mexicana NOM-020-SSA1-2014. Diario Oficial de la Federación, 19 August 2014, second edition (in Spanish)

Paez MS, Gamerman D (2003) Study of the space-time effects in the concentration of airborne pollutants in the Metropolitan Region of Rio de Janeiro. Environmetrics 14:387–408

Paroli R, Pistollato Rosa M, Spezia L (2005) Non-homogeneous Markov mixture of periodic autoregressions for the analysis of air pollution in the Lagoon of Venice. In: Janssen J, Lenca P (eds) Proceedings of applied stochastic models and data analysis, Brest, France, 16–20 May, pp 1124–1132

Raftery AE (1989) Are ozone exceedance rates decreasing? Comment on the paper “Extreme value analysis of environmental time series: an application to trend detection in ground-level ozone” by R. L. Smith. Stat Sci 4:378–381

Raftery AE (1996) Hypothesis testing and model selection. In: Gilks W, Richardson S, Spiegelhalter DJ (eds) Markov chain Monte Carlo in practice. Chapman and Hall, Boca Raton, pp 163–187

Robert CP, Casella G (1999) Monte Carlo statistical methods. Springer, New York

Rodrigues E, Achcar JA (2013) Applications of discrete-time Markov chains and Poisson processes to air pollution modeling and studies. SpringerBriefs in Mathematics. Springer, New York

Rodrigues ER, Gamerman D, Tarumoto MH, Tzintzun G (2015a) A non-homogeneous Poisson model with spatial anisotropy applied to ozone data from Mexico City. Environ Ecol Stat 22:393–422

Rodrigues ER, Tarumoto MH, Tzintzun G (2015b) A non-homogeneous Markov chain model to study ozone exceedances in Mexico City. In: Nejadkoorki F (ed) Current air quality issues. InTech, Croatia, pp 375–394

Sahu SK, Gelfand AE, Holland DM (2007) High resolution space-time ozone modeling for assessing trends. J Am Stat Assoc 102:1221–1234

Sang H, Gelfand AE (2009) Hierarchical modeling for extreme values observed over space and time. Environ Ecol Stat 16:407–426

Schliep EM, Gelfand AE, Holland DM (2018) Alternating Gaussian process modulated renewal processes for modeling threshold exceedances and duration. Stoch Environ Res Risk Assess 32:401–417

Schmidt AM, Rodríguez MA (2010) Modelling multivariate counts varying continuously in space. In: Bernardo JM, Bayarri MJ, Berger JO, Dawid AP, Heckerman D, Smith AFM, West M (eds) Bayesian statistics 9. Oxford University Press, Oxford, pp 1–20

Shaddick G, Yan H, Salway R, Vienneau D, Kounali D, Briggs D (2013) Large-scale Bayesian spatial modelling of air pollution for policy support. J Appl Stat 40:777–794

Smith RL (1989) Extreme value analysis of environmental time series: an application to trend detection in ground-level ozone. Stat Sci 4:367–393

Smith AFM, Roberts GO (1993) Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods (with discussion). J R Stat Soc Ser B 55:3–23

Spiegelhalter DJ, Best NG, Carlin BP, Van der Linde A (2002) Bayesian measures of model complexity and fit (with discussion and rejoinder). J R Stat Soc Ser B 64:583–639

Szpiro AA, Sampson PD, Sheppard L, Lumley T, Adar SD, Kaufman JD (2010) Predicting intra-urban variation in air pollution concentrations with complex spatio-temporal dependencies. Environmetrics 21:606–631

Villaseñor-Alva JA, González-Estrada E (2010) On modelling cluster maxima with applications to ozone data from Mexico City. Environmetrics 21:528–540

WHO (2006) Air Quality Guidelines-2005. Particulate matter, ozone, nitrogen dioxide and sulfur dioxide. World Health Organization Regional Office for Europe, EU

Yang TE, Kuo L (2001) Bayesian binary segmentation procedure for a Poisson process with multiple change-points. J Comput Gr Stat 10:772–785

Zozolotto HC (2010) Aplicação de modelos de volatilidade estocástica em dados de poluição do ar de duas grandes cidades: Cidade do México e São Paulo. Master’s Dissertation, Universidade de São Paulo, Ribeirão Preto, Brazil (in Portuguese)

## Acknowledgements

The authors thank three anonymous reviewers for their careful reading of this work and for all their comments and suggestions, which helped to improve the presentation of the results. In particular, they thank one of them for drawing their attention to the work by Koop and Potter (2009). ERR and MHT were partially funded by Projects PAPIIT-IN102713 and IN102416 of the Dirección General de Apoyo al Personal Académico of the Universidad Nacional Autónoma de México, Mexico (DGAPA-UNAM). This work started and ended during ERR’s two recent sabbatical visits to the Department of Statistics of the University of Oxford, UK, which were also partially supported by DGAPA-UNAM. ERR is grateful to the Departments of Statistics of the University of Oxford, UK, and of the Universidade Estadual Paulista “Júlio de Mesquita Filho”, Campus Presidente Prudente, Brazil, for support and hospitality during the development of this work. MHT thanks the Instituto de Matemáticas of the Universidad Nacional Autónoma de México, Mexico, for support and hospitality.


## Additional information

Handling Editor: Bryan F. J. Manly.

## Appendix


In this appendix we present the values of the hyperparameters of the prior distributions of the parameters in the Poisson and spatial components of the model, as well as some computational details of their estimation. We also present the full conditional posterior distributions of the change-points; the expressions for the posterior distributions of the other parameters may be obtained as in Rodrigues et al. (2015a). Additionally, we give tables with the values of the DIC and the MLF, the estimated means of the multivariate normal distributions of \(\log (\varvec{\alpha })\) and \(\log (\varvec{\beta })\), and the Monte Carlo (MC) errors, together with the estimated correlation matrices of the change-points.

### A.1. Hyperparameters of the prior distributions of the parameters in the rate function and the spatial model

We would like to draw attention to the fact that the initial prior elicitation was very superficial and led to very flat uniform priors. These priors allowed physically unrealistic parameter values, at which the MCMC was observed to mix very poorly. On reflection, we revised the hyperparameters of our priors to restrict the distributions to more realistic values. Hence, the data played a role in informing the support of the prior distributions. This is not strictly Bayesian, but it made the analysis computationally more tractable. The prior distributions chosen for the several versions of the model are as follows.

#### Model without change-points

The hyperparameters of the prior distributions are as follows. The normal prior distributions of the components of the vector \(\varvec{\mu }^{\alpha }\) have mean 0.1 when data from stations FAC/EAC, TLA, MER, UIZ, PED, CUA and PLA are used. For stations SAG and MON/CHA the mean is −0.05, and for station TAH it is 0.2. The normal prior distributions of the coordinates of the vector \(\varvec{\mu }^{\beta }\) are more heterogeneous. Their means are −0.35, −0.1, 0.1, −0.15, 0.07 and −0.2 for data from stations FAC/EAC, SAG, MON/CHA, MER, UIZ and TAH, respectively. For stations TLA and PLA the mean is −0.25, and for stations PED and CUA it is −0.3. In all cases, the standard deviation is 0.1.
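The station-specific prior means above are easier to check when collected in one place. The following Python sketch transcribes them into dictionaries and draws one value of each coordinate of \(\varvec{\mu }^{\alpha }\) and \(\varvec{\mu }^{\beta }\) from its normal prior with standard deviation 0.1. The names `STATIONS`, `MEAN_MU_ALPHA`, `MEAN_MU_BETA` and `draw_prior_means` are illustrative only and are not taken from the original analysis.

```python
import random

# Prior means for the model without change-points, transcribed from the text.
STATIONS = ["FAC/EAC", "SAG", "MON/CHA", "TLA", "MER",
            "UIZ", "TAH", "PED", "CUA", "PLA"]

# mu^alpha: 0.1 everywhere except SAG and MON/CHA (-0.05) and TAH (0.2).
MEAN_MU_ALPHA = {s: 0.1 for s in STATIONS}
MEAN_MU_ALPHA.update({"SAG": -0.05, "MON/CHA": -0.05, "TAH": 0.2})

# mu^beta: station-specific means.
MEAN_MU_BETA = {
    "FAC/EAC": -0.35, "SAG": -0.1, "MON/CHA": 0.1, "MER": -0.15,
    "UIZ": 0.07, "TAH": -0.2, "TLA": -0.25, "PLA": -0.25,
    "PED": -0.3, "CUA": -0.3,
}
PRIOR_SD = 0.1  # common standard deviation of all these normal priors


def draw_prior_means(rng: random.Random) -> dict:
    """Draw (mu^alpha_i, mu^beta_i) for every station from N(mean, 0.1^2)."""
    return {
        s: (rng.gauss(MEAN_MU_ALPHA[s], PRIOR_SD),
            rng.gauss(MEAN_MU_BETA[s], PRIOR_SD))
        for s in STATIONS
    }


prior_draw = draw_prior_means(random.Random(1))
```

Keeping the hyperparameters in a single table-like structure makes it straightforward to confirm that every station has been assigned a value.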

The hyperparameters of the Pareto prior distribution of the parameter \(\psi _{r}\) are \(c=3\) and \(d=1\). The prior distribution of \(\psi _{a}\) is, as specified in Sect. 3.3, a uniform distribution on \((0, \pi /2)\).
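A minimal sketch of how these two priors can be simulated, assuming that the Pareto prior has shape \(c\) and scale \(d\) (density \(c d^{c}/x^{c+1}\) for \(x \ge d\)), which gives \(\psi _{r} \ge 1\). The function `aniso_distance` assumes the geometric-anisotropy convention of Diggle and Ribeiro (2007), in which the separation vector is expressed in axes rotated by \(\psi _{a}\) and its second rotated coordinate is shrunk by \(1/\psi _{r}\); the exact convention of Sect. 3.3 may differ. All function names are hypothetical.

```python
import math
import random


def draw_psi_a(rng: random.Random) -> float:
    """Anisotropy angle: uniform on (0, pi/2), as stated in the text."""
    return rng.uniform(0.0, math.pi / 2)


def draw_psi_r(rng: random.Random, c: float = 3.0, d: float = 1.0) -> float:
    """Anisotropy ratio from a Pareto(c, d) prior, ASSUMING shape c and scale d
    (density c * d**c / x**(c + 1) for x >= d). Inverse-CDF sampling uses
    P(X > x) = (d / x)**c, hence X = d * U**(-1 / c) with U uniform on (0, 1]."""
    u = 1.0 - rng.random()  # rng.random() is in [0, 1), so u is in (0, 1]
    return d * u ** (-1.0 / c)


def aniso_distance(p, q, psi_a: float, psi_r: float) -> float:
    """Distance between sites p and q under ASSUMED geometric anisotropy:
    express the separation vector in axes rotated by psi_a, divide the second
    rotated coordinate by psi_r, then take the Euclidean norm."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    u = dx * math.cos(psi_a) + dy * math.sin(psi_a)
    v = (-dx * math.sin(psi_a) + dy * math.cos(psi_a)) / psi_r
    return math.hypot(u, v)


rng = random.Random(2019)
psi_a, psi_r = draw_psi_a(rng), draw_psi_r(rng)
dist_example = aniso_distance((0.0, 0.0), (1.0, 2.0), psi_a, psi_r)
```

Under this parameterisation the prior mean of \(\psi _{r}\) is \(cd/(c-1) = 1.5\), and because \(\psi _{r} \ge 1\) the anisotropic distance never exceeds the Euclidean one.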

The prior distributions for \(\phi _{\alpha }\) and \(\phi _{\beta }\) are inverse gamma distributions IG(2.5, \(b_{\alpha }\)) and IG(3, \(b_{\beta }\)), respectively, where the hyperparameters \(b_{\alpha }\) and \(b_{\beta }\) are the solutions of the optimisation problem (6) corresponding to each particular \(\phi \). The values of the coordinates of the vectors \(\varvec{\sigma }_{\alpha }\) and \(\varvec{\sigma }_{\beta }\) are equal to 0.1.
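Because \(b_{\alpha }\), \(b_{\beta }\) and \(b_{\tau }\) come from problem (6), which is not reproduced here, the sketch below uses a hypothetical value `b_alpha = 1.0`. It assumes the usual shape/scale parameterisation of the inverse gamma, under which a draw is the reciprocal of a gamma variate with shape \(a\) and rate \(b\), and the prior mean is \(b/(a-1)\).

```python
import random


def draw_inverse_gamma(rng: random.Random, a: float, b: float) -> float:
    """One draw from IG(a, b), ASSUMING density
    b**a / Gamma(a) * x**(-a - 1) * exp(-b / x).
    If G ~ Gamma(shape=a, rate=b) then 1 / G ~ IG(a, b); random.gammavariate
    takes a shape and a *scale*, hence the scale 1 / b."""
    return 1.0 / rng.gammavariate(a, 1.0 / b)


def inverse_gamma_mean(a: float, b: float) -> float:
    """Prior mean b / (a - 1), which is finite only for a > 1."""
    if a <= 1:
        raise ValueError("the inverse gamma mean is infinite for a <= 1")
    return b / (a - 1)


b_alpha = 1.0  # HYPOTHETICAL: in the paper b_alpha solves optimisation problem (6)
rng = random.Random(42)
draws = [draw_inverse_gamma(rng, 2.5, b_alpha) for _ in range(20000)]
mc_mean = sum(draws) / len(draws)  # should be close to inverse_gamma_mean(2.5, 1.0)
```

With shape 2.5 the variance is finite, so the Monte Carlo average settles near \(b_{\alpha }/1.5\); with shape 3, as for \(\phi _{\beta }\) and \(\phi _{\tau }\), the prior mean is \(b/2\).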

As specified in Sect. 3.3, the prior distribution of \(\varvec{\alpha }\) is a log-normal distribution with parameters \(\varvec{\mu }^{\alpha }\) and \(\Sigma ^{\alpha }\). A similar prior is assigned to the vector \(\varvec{\beta }\), now with parameters \(\varvec{\mu }^{\beta }\) and \(\Sigma ^{\beta }\).
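The covariance matrices \(\Sigma ^{\alpha }\) and \(\Sigma ^{\beta }\) are defined in the main text. As an illustration only, the sketch below assumes \(\Sigma _{ij} = \sigma _{i}\sigma _{j}\exp (-d_{ij}/\phi )\), with \(d_{ij}\) the (possibly anisotropic) distance between stations \(i\) and \(j\), and simulates a log-normal vector by exponentiating \(\varvec{\mu } + L\varvec{z}\), where \(L\) is the Cholesky factor of \(\Sigma \) and \(\varvec{z}\) is standard normal. The exponential correlation, the distances and the function names are assumptions.

```python
import math
import random


def exponential_covariance(dist, sigma, phi):
    """ASSUMED covariance Sigma_ij = sigma_i * sigma_j * exp(-d_ij / phi)."""
    n = len(sigma)
    return [[sigma[i] * sigma[j] * math.exp(-dist[i][j] / phi) for j in range(n)]
            for i in range(n)]


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def draw_lognormal(rng, mu, cov):
    """Return exp(mu + L z) with z standard normal, so log(result) ~ N(mu, cov)."""
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [math.exp(m + sum(low[i][k] * z[k] for k in range(i + 1)))
            for i, m in enumerate(mu)]


# Hypothetical three-station example: distances, sigma_alpha = 0.1, phi_alpha = 10.
dist = [[0.0, 5.0, 12.0], [5.0, 0.0, 8.0], [12.0, 8.0, 0.0]]
cov = exponential_covariance(dist, [0.1, 0.1, 0.1], 10.0)
alpha = draw_lognormal(random.Random(1), [0.1, -0.05, 0.2], cov)
```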

All parameters were estimated using a sample of size 6000 drawn from three chains after a burn-in period of 10,000 iterations with values collected using a sampling interval of 10.
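The sampling scheme fixes the run length of each chain. The text does not say whether 6000 is the pooled sample or the sample per chain; the sketch below assumes it is pooled, so each chain contributes 2000 retained draws and runs \(10{,}000 + 10 \times 2000 = 30{,}000\) iterations (the figure would be 70,000 if 6000 draws were kept per chain). The function names are hypothetical.

```python
def iterations_per_chain(total_kept: int, n_chains: int, burn_in: int, thin: int) -> int:
    """Iterations each chain must run, ASSUMING total_kept is pooled over the
    chains, so each chain retains total_kept / n_chains thinned draws."""
    if total_kept % n_chains:
        raise ValueError("the retained draws do not split evenly across chains")
    return burn_in + thin * (total_kept // n_chains)


def kept_iterations(n_iter: int, burn_in: int, thin: int) -> range:
    """0-based indices of the kept iterations: every thin-th one after burn-in."""
    return range(burn_in + thin - 1, n_iter, thin)


n_iter = iterations_per_chain(6000, 3, 10_000, 10)  # 30000
kept = kept_iterations(n_iter, 10_000, 10)          # 2000 indices per chain
```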

#### Model with one change-point

The prior distributions of the parameters \(\psi _{a}\), \(\psi _{r}\), \(\phi _{\alpha }\) and \(\phi _{\beta }\) have the same hyperparameters as in the model without change-points. The new parameter \(\phi _{\tau }\) has an inverse gamma prior distribution IG(3, \(b_{\tau }\)), where \(b_{\tau }\) is also the solution of (6) corresponding to \(\phi _{\tau }\).

For the components of the vector \(\varvec{\mu }^{\alpha }_{1}\), the mean of the normal prior distribution is −0.27 for all stations except PLA, for which it is 0.27. For the coordinates of the vector \(\varvec{\mu }^{\alpha }_{2}\), the means of the normal prior distributions are equal to −0.27 for all stations. For the vectors \(\varvec{\mu }^{\beta }_{1}\) and \(\varvec{\mu }^{\beta }_{2}\), the means of the normal prior distributions of the coordinates vary from station to station. For the coordinates of \(\varvec{\mu }^{\beta }_{1}\), the mean is 0.7 for all stations except MON/CHA, where it is 0.09. For the coordinates of \(\varvec{\mu }^{\beta }_{2}\), the means are equal to 1.3 for all stations except MON/CHA and CUA, where they are 0.1 and 1.0, respectively. The values of the coordinates of the vectors \(\varvec{\sigma }_{\alpha }\) and \(\varvec{\sigma }_{\beta }\) are equal to 0.1.
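To make the role of the regime-specific parameters concrete, the following sketch evaluates the mean function of a Poisson process whose parameters change at the change-points. It assumes a Weibull-type rate \(\lambda _{j}(t) = (\alpha _{j}/\beta _{j})(t/\beta _{j})^{\alpha _{j}-1}\), with mean \(m_{j}(t) = (t/\beta _{j})^{\alpha _{j}}\), one of the forms commonly used in this literature, and a mean function that is continuous at each change-point; the rate used in the main text may differ. The sketch handles any number of change-points, so it covers both the one- and two-change-point versions. Function names are hypothetical.

```python
import bisect


def weibull_mean(t: float, alpha: float, beta: float) -> float:
    """m(t) = (t / beta)**alpha, the mean of the ASSUMED Weibull-type rate
    lambda(t) = (alpha / beta) * (t / beta)**(alpha - 1)."""
    return (t / beta) ** alpha


def changepoint_mean(t, taus, alphas, betas):
    """Expected number of exceedances in (0, t] with increasing change-points
    taus. Regime j (parameters alphas[j], betas[j]) applies on (tau_{j-1}, tau_j],
    with tau_0 = 0, and m(t) = m(tau_{j-1}) + m_j(t) - m_j(tau_{j-1}), which keeps
    the mean continuous at every change-point."""
    bounds = [0.0] + list(taus)
    j = bisect.bisect_left(taus, t)  # regime containing t (t == tau_j stays in j)
    total = 0.0
    for k in range(j):  # contributions of the complete earlier regimes
        a, b = alphas[k], betas[k]
        total += weibull_mean(bounds[k + 1], a, b) - weibull_mean(bounds[k], a, b)
    a, b = alphas[j], betas[j]
    return total + weibull_mean(t, a, b) - weibull_mean(bounds[j], a, b)


# Hypothetical one-change-point example on a daily time scale.
example = changepoint_mean(200.0, [100.0], [1.0, 1.0], [10.0, 20.0])  # 15.0
```

With \(\alpha _{1} = \alpha _{2} = 1\) the rates are constant, so the example simply accumulates 100/10 = 10 expected exceedances before the change-point and (200 − 100)/20 = 5 after it.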

Estimation of all parameters, including the change-points, was performed using a sample of size 6000 obtained from three chains after a burn-in period of 10,000 steps and using a sampling interval of 10 steps.

#### Model with two change-points

In the case of Model 2_1, the hyperparameters of the prior distributions of \(\psi _{a}\), \(\psi _{r}\), \(\phi _{\alpha }\), \(\phi _{\beta }\) and \(\phi _{\tau }\) are as in the version with one change-point. For the coordinates of the vector \(\varvec{\mu }^{\alpha }_{1}\), the means of the normal prior distributions are equal to 0.05 for all stations except SAG, MON/CHA and TAH, for which the mean is −0.05 (SAG and MON/CHA) and 0.15 (TAH). For the coordinates of the vector \(\varvec{\mu }^{\alpha }_{2}\), the mean is 0.15 for stations FAC/EAC and TLA; −0.05 for stations SAG and MON/CHA; 0.1 for stations MER, UIZ and CUA; 0.25 for stations TAH and PED; and 0.2 for station PLA. For the coordinates of \(\varvec{\mu }^{\alpha }_{3}\), the normal prior distributions have mean 0.15 for stations FAC/EAC, TLA and PLA; −0.05, −0.07, 0 and 0.25 for stations SAG, MON/CHA, PED and TAH, respectively; and 0.1 for stations MER, UIZ and CUA.

Consider now the coordinates of the vector \(\varvec{\mu }^{\beta }_{1}\). The normal prior distributions of the coordinates corresponding to stations FAC/EAC, TLA, SAG, MER and CUA have means −0.05, −1.05, 0.2, 0.05 and 0.1, respectively. For stations MON/CHA and TAH the means are equal to 0.09, and they are −0.1 for stations UIZ, PED and PLA. The normal prior distributions of the coordinates of the vector \(\varvec{\mu }^{\beta }_{2}\) have means −0.35 for stations FAC/EAC and TLA; 0.1 for stations SAG and MON/CHA; and −0.3 for stations TAH, PED and PLA. For stations MER, UIZ and CUA, the corresponding means are −0.15, 0.07 and −0.2. The coordinates of the vector \(\varvec{\mu }^{\beta }_{3}\) have normal prior distributions with means equal to −0.5 for stations FAC/EAC, TLA and PLA; 0.1 for stations SAG and MON/CHA; −0.3 for stations TAH and CUA; and −0.1, 0 and −0.6 for stations MER, UIZ and PED, respectively.

The hyperparameters of Models 2_2 and 2_3 are as in Model 2_1, with the exception of the coordinate of \(\varvec{\mu }^{\beta }_{3}\) corresponding to station MER, whose normal prior distribution has mean −0.25 instead of −0.1. The values of the coordinates of the vectors \(\varvec{\sigma }_{\alpha }\) and \(\varvec{\sigma }_{\beta }\) are equal to 0.1.

In all versions of the two change-point models, all parameters, including the change-points, were estimated using a sample of size 6000 drawn from three chains after a burn-in period of 10,000 steps using a sampling interval of 10.

### A.2. Full posterior distributions of the change-points

In this section we present the expressions for the full conditional posterior distributions from which the change-points were sampled. We denote by \(\varvec{\theta }_{(- x)}\) the vector of parameters \(\varvec{\theta }\) without the component *x*. Values are sampled through a Metropolis-Hastings step within the Gibbs sampling algorithm.

The posterior distributions of the change-points have contributions from the prior, \(P({\varvec{\tau }})\), and from the likelihood. The density \(P({\varvec{\tau }})\) is straightforward to simulate from but awkward to write down in closed form; it is defined by the simulation procedure given in Sect. 3.3. Hence,

and for \(j = 2, 3, \ldots , J\)

where we take \(N^{(i)}_{\tau ^{(i)}_{J+1}} = K_{i}\).
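As a hedged illustration of a Metropolis-Hastings step for a change-point whose prior can be simulated but not easily evaluated, one generic option is an independence sampler that proposes from the prior itself: the prior then cancels and the acceptance probability reduces to the likelihood ratio. The sketch below shows this for a single station with one change-point, assuming Weibull-type rates \(\lambda _{j}(t) = (a_{j}/b_{j})(t/b_{j})^{a_{j}-1}\) and a continuous mean function. It is not necessarily the authors' implementation; `log_lik`, `mh_step_tau`, the uniform prior and the data in the usage part are all hypothetical. With a spatial prior on the whole vector of change-points, the proposal would instead come from the conditional prior of one station's change-point given the others.

```python
import math
import random


def log_lik(times, T, tau, a1, b1, a2, b2):
    """Log-likelihood of a NHPP observed on (0, T] with one change-point tau,
    ASSUMING Weibull-type rates lambda_j(t) = (a_j / b_j) * (t / b_j)**(a_j - 1)
    and a mean function that is continuous at tau."""
    def log_rate(t, a, b):
        return math.log(a / b) + (a - 1) * math.log(t / b)

    def mean(t, a, b):
        return (t / b) ** a

    ll = sum(log_rate(t, a1, b1) if t <= tau else log_rate(t, a2, b2) for t in times)
    return ll - (mean(tau, a1, b1) + mean(T, a2, b2) - mean(tau, a2, b2))


def mh_step_tau(rng, tau, draw_from_prior, loglik):
    """Independence Metropolis-Hastings update whose proposal is the prior of tau.
    The prior then cancels, so the acceptance probability is
    min(1, L(tau') / L(tau)) and the prior density is never evaluated."""
    proposal = draw_from_prior(rng)
    log_ratio = loglik(proposal) - loglik(tau)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal
    return tau


# Hypothetical usage: one exceedance every 7 days, uniform prior on days 1..364.
days = [float(d) for d in range(5, 365, 7)]


def uniform_prior(r):
    return float(r.randint(1, 364))


def target(tau):
    return log_lik(days, 365.0, tau, 1.0, 2.0, 1.2, 1.5)


rng = random.Random(7)
tau = 180.0
for _ in range(500):
    tau = mh_step_tau(rng, tau, uniform_prior, target)
```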

### A.3. Some of the figures

In this section we give some of the figures discussed in the main text (Figs. 2, 3 and 4). First, we have the plots of the daily maximum ozone concentrations during the observational period at all stations used in the present work. Then, we have the plots of the differences between the accumulated observed and estimated means for all versions of the model. These are followed by the estimated rate functions for all stations under the selected model.
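The second set of plots compares the observed cumulative number of exceedances with the fitted mean. A minimal sketch of that diagnostic, assuming the fitted mean function is supplied by the caller as a callable (for example one built from posterior means of the parameters); the function name and the toy data are hypothetical.

```python
import bisect
from typing import Callable, List, Sequence


def observed_minus_expected(event_days: Sequence[float], grid: Sequence[float],
                            mean_fn: Callable[[float], float]) -> List[float]:
    """For each t in grid return N(t) - m_hat(t), where N(t) is the number of
    exceedances on days <= t and m_hat is the fitted mean function."""
    days = sorted(event_days)
    return [bisect.bisect_right(days, t) - mean_fn(t) for t in grid]


# Toy example with a hypothetical constant-rate fit m_hat(t) = 0.5 * t.
diffs = observed_minus_expected([1.0, 2.0, 2.0, 5.0], [0.0, 2.0, 5.0], lambda t: 0.5 * t)
```

Values that stay close to zero over the whole period indicate that the fitted mean tracks the accumulated counts well.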

### A.4. Further results

In this section some additional tables and results mentioned in the main text are presented. Table 3 gives the values of the DIC and log-MLF for all stations and all versions of the model with two change-points.
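For reference, the DIC of Spiegelhalter et al. (2002) can be computed from MCMC output as \(\mathrm{DIC} = \bar{D} + p_{D}\), where \(D(\varvec{\theta }) = -2\log L(\varvec{\theta })\) is the deviance, \(\bar{D}\) is its posterior mean and \(p_{D} = \bar{D} - D(\bar{\varvec{\theta }})\) is the effective number of parameters. The sketch below assumes the deviance has already been evaluated at each retained draw and at the posterior mean of the parameters; the function name and numbers are hypothetical.

```python
def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, pD): Dbar is the posterior mean deviance,
    pD = Dbar - D(theta_bar) and DIC = Dbar + pD (Spiegelhalter et al. 2002),
    with deviance D(theta) = -2 log L(theta)."""
    d_bar = sum(deviance_draws) / len(deviance_draws)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d, p_d


# Hypothetical deviance values for three retained draws.
dic_value, p_d = dic([10.0, 12.0, 14.0], 11.0)  # (13.0, 1.0)
```

When versions of the model are compared, smaller DIC values indicate a better balance between fit and complexity.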

Table 4 gives the estimated means \(\varvec{\mu }^{\alpha }\) and \(\varvec{\mu }^{\beta }\) of the multivariate normal distributions of \(\log (\varvec{\alpha })\) and \(\log (\varvec{\beta })\) for all stations, under the two-change-point version given by Model 2_2.

Table 5 gives the MC errors of the parameters present in the Poisson model when Model 2_2 is used.
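The text does not state how the MC errors were computed. One common estimator, sketched here as an assumption, is the batch-means standard error of a posterior mean: the retained draws are split into non-overlapping batches, and the standard deviation of the batch means is divided by the square root of the number of batches. The function name and default batch count are hypothetical.

```python
import math


def mc_error_batch_means(draws, n_batches: int = 20) -> float:
    """Batch-means Monte Carlo standard error of the posterior mean.
    Leftover draws that do not fill a whole batch are dropped."""
    size = len(draws) // n_batches
    if n_batches < 2 or size < 1:
        raise ValueError("need at least two non-empty batches")
    means = [sum(draws[k * size:(k + 1) * size]) / size for k in range(n_batches)]
    grand = sum(means) / n_batches
    var = sum((m - grand) ** 2 for m in means) / (n_batches - 1)
    return math.sqrt(var / n_batches)


se_example = mc_error_batch_means([0.0] * 10 + [1.0] * 10, n_batches=2)  # 0.5
```

Batches should be long compared with the autocorrelation of the chain, otherwise the estimate understates the true Monte Carlo error.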

Next we present the posterior correlation matrices of the vectors of first and second change-points when Model 2_2 is used: first \(\rho _{\varvec{\tau _{1}}}\), followed by \(\rho _{\varvec{\tau _{2}}}\).
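A posterior correlation matrix such as \(\rho _{\varvec{\tau _{1}}}\) can be estimated from the retained draws of the change-points across stations. The pure-Python sketch below computes the Pearson correlation matrix, where `samples[s][i]` is the draw of station \(i\)'s change-point at iteration \(s\); the function name and toy draws are hypothetical, and a station whose draws are all identical would make the correlation undefined.

```python
import math


def correlation_matrix(samples):
    """Pearson correlation matrix of MCMC draws; samples[s][i] is the draw of
    the change-point of station i at retained iteration s."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[i] for row in samples) / n for i in range(p)]
    cov = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in samples) / (n - 1)
            for j in range(p)] for i in range(p)]
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)]
            for i in range(p)]


# Toy draws for three stations over three iterations.
rho = correlation_matrix([[1.0, 2.0, 3.0], [2.0, 4.0, 1.0], [3.0, 6.0, 2.0]])
```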


## About this article

### Cite this article

Rodrigues, E.R., Nicholls, G., Tarumoto, M.H. *et al.* Using a non-homogeneous Poisson model with spatial anisotropy and change-points to study air pollution data.
*Environ Ecol Stat* **26**, 153–184 (2019). https://doi.org/10.1007/s10651-019-00423-6


### Keywords

- Anisotropic spatial model
- Bayesian inference
- Change-points
- Markov chain Monte Carlo algorithms
- Non-homogeneous Poisson process