1 Introduction

The study of exceedances of a specific threshold arises in many settings: time series of environmental measurements (Rodrigues & Achcar, 2013; Suarez-Sierra et al., 2022), data-traffic measurements in computer security (Hallgren et al., 2021), or the prevalence of tuberculosis in New York (Achcar et al., 2008) over certain time periods. In each case, a model is needed to establish whether contingencies designed to mitigate exceedances have worked correctly over an observed period, that is, whether there are moments in the time series that indicate a decrease, growth or steadiness in the rate at which exceedances are emitted. These moments appear as abrupt changes, represented in the model as change-points. One of the most commonly used methodologies for this purpose is based on Non-Homogeneous Poisson Processes (NHPP), in which one estimates the change-points that delimit the different regimes composing the time series, together with the parameters of the distribution of the exceedances within each regime. Most of the time, the prior distribution of the change-points is taken to be uniform, with hyperparameters at the extremes of the interval where the researcher approximately locates the change-point. From there, by a Bayesian mechanism, the joint distribution of the change-points is updated together with the rest of the parameters of each regime: the likelihood function of the observed exceedances and the aforementioned prior distribution are multiplied to obtain the posterior distribution. A Bayesian computational technique such as Markov chain Monte Carlo (MCMC) applied to the posterior distribution then yields estimates of the parameters and change-points of interest.

It is worth distinguishing, on the one hand, the time series of the measurements (TS1) and, on the other, the time series of the number of times these measurements exceed a threshold of interest; cumulatively, these counts generate a new time series (TS2) of the number of exceedances up to time t. The regimes and change-points detected in TS2 will be the same as those determined in TS1. Since TS2 is a counting time series, it is possible to fit a Non-Homogeneous Poisson Process (NHPP), with a time-dependent exceedance-emission intensity function. This last feature gives the model the advantage of not assuming equal distribution across regimes, as conventional time series models do. Furthermore, regardless of the random mechanism that generates the TS1 measurements, the number of change-points and their respective locations in TS1 can be found by means of TS2. Accordingly, the objective function is organized over the TS2 observations in the likelihood, and the parameters of the joint distribution function are those of the exceedance-emission rate function, which for the purposes of the present proposal has at most three parameters, as will be presented later in expression (1). That is, the complexity of the model is directly related to the NHPP exceedance-emission rate of the TS2 observations with respect to time, and not to the uncertainty of the TS1 series measurements.

One disadvantage in implementing the aforementioned models is the model execution time: the more change-points considered, the longer it takes to reach an optimum. On the other hand, the greater the number of change-points, the better the fit of the estimated to the observed values, with less loss of information. The latter can be seen by comparing the Deviance Information Criterion (DIC) of models with zero, one, two or more change-points in Suarez-Sierra et al. (2022). We therefore want a model that retains the same goodness of fit but automates the detection of change-points, in short run times. In the present work we make a new proposal that unifies the construction of the likelihood function from non-homogeneous Poisson processes, since this is a counting problem, with Bayesian computational methods to estimate the parameters of each regime and its change-points. Once the joint posterior distribution is obtained, it is combined with information-theoretic principles to penalize the integer and real parameters involved in the segmentation of the time series. All of the above, under the heuristics of a genetic algorithm, turns the penalized posterior distribution into the objective function. Hence the problem of resolving the segmentation uncertainty with the present proposal reduces to finding the penalized Maximum A Posteriori (MAP) estimator, which corresponds to a modification of the Minimum Description Length principle (MDL); as such, we will refer to it as the Bayesian-MDL.

The cumulative mean function, which uniquely determines the NHPP, provides the expected number of threshold exceedances up to a given time t. This mean function has an associated intensity function that indicates the time trend in each of the segments and the speed of emission of the threshold exceedances of interest. The parameters of these functions are estimated with the genetic algorithm, for which the starting conditions of the first generation of change-point configurations, or chromosomes, can be set. The latter are responsible for generating the offspring with the best characteristics. To guarantee this, the mutation distribution, mating dynamics and reordering between chromosomes are implemented and tested as in Li & Lund (2012). The execution time is also declared at the beginning of the algorithm and is determined by the number of generations. From each generation the best of the children, evaluated by the MDL, is kept; hence the chromosome with the lowest MDL across all generations is the best overall. In this type of algorithm, execution time is traded off against the quality of the solution. Another advantage of this model is that once the cumulative mean function is known, the distribution of the number of exceedances up to time t can be established explicitly, and from there different forecasting exercises can be performed.

It should also be established that, as per the characterization in Aminikhanghahi & Cook (2017), the procedure proposed here is of the off-line type: the whole dataset is considered at once, in a retrospective manner, and the conditions set for the genetic algorithm are evaluated to determine an optimal solution. At the end of its execution, the algorithm returns time segments whose observations are independent and identically distributed (iid) conditional on the parameters of each segment. Assuming that the observations within each segment determined by the proposed algorithm are iid may be wrong in the presence of correlation signals. However, Hallgren et al. (2021) comment that in the presence of outliers the assumption turns out to be robust, as in Fearnhead (2006) and here.

The presentation of this work is structured as follows. The second section reviews the background on NHPP-based methods for fitting the univariate cumulative mean function of the exceedances of a threshold of interest up to time t, together with the corresponding inferential analysis. The same section presents the scheme of the genetic algorithm, displaying the initialization, pairing and stopping conditions that guarantee an optimal solution. Section three shows the theoretical construction of the computational model for four different forms of the intensity function \(\lambda (t)\) proposed in the literature. Section four presents the results for three sets of simulated data and one set of real data. With the simulated data, the performance of the algorithm in determining the number of change-points, their locations and the number of generations needed to reach the optimal solution is shown. With the real dataset, we evaluate the ability of the algorithm to assess the impact of public policy on \(PM_{2.5}\) air pollution data in Bogotá between the years 2018 and 2020. Finally, section five presents the conclusions and future considerations.

2 Inferential analysis of the univariate non-homogeneous Poisson process

Based on the construction of the likelihood from a non-homogeneous Poisson process discussed in Cox et al. (1966), we have the following expression, from which we will construct the penalized MAP function. The developments shown in the following sections rest on the assumption that the true intensity function of the exceedances corresponds to one of the functions in (1). The process we consider is then a NHPP with intensity varying with respect to t,

$$\begin{aligned} \begin{array}{llll} \lambda ^{(W)}(t\mid \mathbf {\theta }) &=& (\alpha /\beta )(t/\beta )^{\alpha -1}, &\alpha ,\beta>0\\ \lambda ^{(MO)}(t\mid \mathbf {\theta }) &=& \frac{\beta }{t+\alpha }, &\alpha ,\beta>0\\ \lambda ^{(GO)}(t\mid \mathbf {\theta }) &=& \alpha \beta \exp (-\beta t), &\alpha ,\beta>0\\ \lambda ^{(GGO)}(t\mid \mathbf {\theta }) &=& \alpha \beta \gamma t^{\gamma -1}\exp (-\beta t^\gamma ), &\alpha ,\beta ,\gamma >0. \end{array} \end{aligned}$$
(1)

Each letter in the superscript of the left-hand term corresponds to the initial letter of the name of the distribution used as intensity function, as follows: Weibull (W) (Mudholkar et al., 1995; Ramirez et al., 1999), Musa-Okumoto (MO) (Musa & Okumoto, 1984), Goel-Okumoto (GO) and a generalization of the Goel-Okumoto model (GGO) (Goel & Okumoto, 1978), for which the mean cumulative function \(m(t\mid \mathbf {\theta })\) is defined respectively by:

$$\begin{aligned} \begin{array}{llll} m^{(W)}(t\mid \mathbf {\theta }) &=& (t/\beta )^{\alpha }, &\alpha ,\beta>0\\ m^{(MO)}(t\mid \mathbf {\theta }) &=& \beta \log \left( 1+\frac{t}{\alpha }\right) , &\alpha ,\beta>0\\ m^{(GO)}(t\mid \mathbf {\theta }) &=& \alpha [1-\exp (-\beta t)], &\alpha ,\beta>0\\ m^{(GGO)}(t\mid \mathbf {\theta }) &=& \alpha [1-\exp (-\beta t^\gamma )], &\alpha ,\beta ,\gamma >0. \end{array} \end{aligned}$$
(2)

In the first three cases, the vector of parameters is \(\theta =(\alpha ,\beta )\) and for the last one, \(\theta =(\alpha ,\beta , \gamma )\).
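
As an illustration, the four intensity functions in (1) and their cumulative means in (2) translate directly into code. The following is a minimal sketch in Python (the paper's computations use R); all function and argument names are our own choices, not the paper's.

```python
import math

# Intensity functions of (1) and cumulative means of (2).
# Parameters follow the paper: a = alpha, b = beta, g = gamma, all > 0.

def lambda_w(t, a, b):       # Weibull intensity
    return (a / b) * (t / b) ** (a - 1)

def m_w(t, a, b):            # Weibull cumulative mean
    return (t / b) ** a

def lambda_mo(t, a, b):      # Musa-Okumoto intensity
    return b / (t + a)

def m_mo(t, a, b):           # Musa-Okumoto cumulative mean
    return b * math.log(1 + t / a)

def lambda_go(t, a, b):      # Goel-Okumoto intensity
    return a * b * math.exp(-b * t)

def m_go(t, a, b):           # Goel-Okumoto cumulative mean
    return a * (1 - math.exp(-b * t))

def lambda_ggo(t, a, b, g):  # generalized Goel-Okumoto intensity
    return a * b * g * t ** (g - 1) * math.exp(-b * t ** g)

def m_ggo(t, a, b, g):       # generalized Goel-Okumoto cumulative mean
    return a * (1 - math.exp(-b * t ** g))
```

A quick numerical check that \(m'(t)=\lambda (t)\) for each pair confirms the correspondence between (1) and (2).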

The way the rate functions of the process are formulated shows that the observations within the same segment are independent and identically distributed, conditional on the parameters of the emission intensity function of that segment. Hence, this information is taken into account in the Bayesian-MDL so as to correspond to the growth, decrease or steadiness of exceedance emissions in each interval of the partition determined by the change-points. For example, as exposed in Rodrigues & Achcar (2013), in air pollution models the Weibull rate function shows a better fit than the GGO, both with and without change-points.

The present model determines the number of change-points and their specific location over the time series of interest. Since the data suggested to be studied with this type of model are exceedances of a specific threshold, the nature of the change-points will represent an abrupt variation in the rate of growth or decay of the exceedances in the cumulative mean function, which in turn will represent variations in the original time series.

2.1 Likelihood function with change-points

A change-point is defined as an instant where the structural pattern of a time series shifts (Aminikhanghahi & Cook, 2017; Truong et al., 2020). We assume the presence of J change-points, \(\{\tau _1, \tau _2, \ldots ,\tau _J\}\), such that the model parameters vary between segments \(\tau _{j-1}< t < \tau _j\), \(j = 1, 2, \ldots , J+1\), with \(\tau _0 = 0\) and \(\tau _{J+1} = T\). These changes can be attributed to environmental policies or legislation in a certain year, the suspension of some network station due to maintenance in a climate setting, macroeconomic policies from an economic standpoint, or the presence of a stimulus in a neuroscience context. On the other hand, assuming the exceedances of a given threshold within the regimes determined by the change-points can be modeled by a NHPP, the intensity function is,

$$\begin{aligned} \lambda (t \mid \theta ) = \left\{ \begin{array}{lll} \lambda (t\mid \theta _1), & 0\le t< \tau _1, &\\ \lambda (t\mid \theta _j), & \tau _{j-1}\le t<\tau _j, & j = 2,3,\ldots ,J,\\ \lambda (t\mid \theta _{J+1}), & \tau _J\le t\le T. & \end{array} \right. \end{aligned}$$
(3)

where \(\theta _j\) is the vector of parameters between the change-points \(\tau _{j-1}\) and \(\tau _j\), for \(j=2,\ldots ,J\), and \(\theta _1\) and \(\theta _{J+1}\) are the parameter vectors before the first and after the last change-point, respectively. With n observations, the functions for the means are (see, e.g., Rodrigues & Achcar (2013)),

$$\begin{aligned} m(t \mid \theta ) = \left\{ \begin{array}{lll} m(t\mid \theta _1), & 0\le t< \tau _1, &\\ m(\tau _1\mid \theta _1) + m(t\mid \theta _2) - m(\tau _1\mid \theta _2), & \tau _1\le t<\tau _2, &\\ m(t\mid \theta _{j+1}) - m(\tau _j\mid \theta _{j+1}) + \sum _{i=2}^{j}[m(\tau _i\mid \theta _i) - m(\tau _{i-1}\mid \theta _i)] + m(\tau _1\mid \theta _1), & \tau _{j}\le t<\tau _{j+1}, & j = 2,3,\ldots ,J, \end{array} \right. \end{aligned}$$
(4)

where \(\tau _{J+1}=T\). Here \(m(t\mid \theta _1)\) represents the average number of exceedances of the standard before the first change-point; \(m(\tau _1\mid \theta _1) + m(t\mid \theta _2)-m(\tau _1\mid \theta _2)\) is the average number of exceedances of the standard between the first change-point \(\tau _1\) and the second one \(\tau _2\), given that the vector of parameters \(\theta _2\) is known; and so on.
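
The piecewise construction in (4), which glues the per-regime means together so that the cumulative mean is continuous at each change-point, can be sketched as follows. The Weibull mean from (2) is used here as the per-regime mean, and the function names are ours.

```python
def m_w(t, a, b):
    # Weibull cumulative mean from (2), used as the per-regime mean
    return (t / b) ** a

def m_cp(t, taus, thetas):
    """Piecewise cumulative mean of expression (4).
    taus:   change-points [tau_1, ..., tau_J] (tau_0 = 0 implied)
    thetas: J+1 parameter tuples (alpha_j, beta_j), one per regime"""
    total, prev = 0.0, 0.0
    for j, tau in enumerate(taus):
        if t < tau:  # t falls in regime j
            return total + m_w(t, *thetas[j]) - m_w(prev, *thetas[j])
        # accumulate the full contribution of regime j and move on
        total += m_w(tau, *thetas[j]) - m_w(prev, *thetas[j])
        prev = tau
    J = len(taus)    # t is in the last regime
    return total + m_w(t, *thetas[J]) - m_w(prev, *thetas[J])
```

For \(\tau _1\le t<\tau _2\) the loop returns \(m(\tau _1\mid \theta _1)+m(t\mid \theta _2)-m(\tau _1\mid \theta _2)\), exactly the second branch of (4), and continuity at each \(\tau _j\) follows by construction.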

Let \(D=\{d_1,\ldots ,d_n\}\), where \(d_k\) (as in the case without change-points) is the time of occurrence of the \(k\)-th event (for example, the \(k\)-th time the maximum level of the environmental standard is exceeded), with \(k=1,2,\ldots ,n\). The likelihood function is then given by the expression below, where \(N_{\tau _i}\) denotes the number of exceedances before the change-point \(\tau _i\), with \(i=1,2,\ldots ,J\) (see Yang et al. (2020); Achcar et al. (2011)).

$$\begin{aligned} L({\textbf{D}}\mid \phi ) \propto&\left[ \prod _{i=1}^{N_{\tau _1}}\lambda (d_i\ \mid \ \theta _1) \right] e^{-m(\tau _1 \mid \theta _1)}\nonumber \\&\times \left[ \prod _{j=2}^{J} \left( \prod _{i=N_{\tau _{j-1}}+1}^{N_{\tau _j}}\lambda (d_i \mid \theta _j)\right) e^{-[m(\tau _j \mid \theta _j)-m(\tau _{j-1} \mid \theta _j )]} \right] \nonumber \\&\times \left[ \prod _{i=N_{\tau _{J}}+1}^{n}\lambda (d_i\mid \theta _{J+1})\right] e^{-[m(T\mid \theta _{J+1})-m(\tau _{J}\mid \theta _{J+1} )]}, \end{aligned}$$
(5)

Using expression (5), we infer the parameters \(\phi =(\theta , \tau )\), with \(\theta =(\theta _1,\ldots ,\theta _{J+1})\) and \(\tau =(\tau _1,\ldots ,\tau _J)\), using a Bayesian approach. This perspective consists of relating the prior distribution of the parameter \(\theta\), on which the intensity function \(\lambda (t\mid \theta )\) depends, to its posterior distribution after taking the observed information D into consideration. In Achcar et al. (2010), this method was applied and obtained results very close to the observed ones, attesting to the descriptive capacity of the model and the methodology used. In that work, the criterion used to select the model that best fits the data, together with graphical checks, was the MDL.

2.2 Detection of multiple change-points using genetic algorithm

2.2.1 MDL framework

Since finding J change-points implies identifying \(J+1\) regimes for the time series, or fitting \(J+1\) models with different parameters, statistical criteria have been used for this purpose in the literature. Some include the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), cross-validation methods, and MDL-based methods. For problems involving regime-shift detection, MDL methods usually provide superior empirical results. This superiority is probably due to the fact that both AIC and BIC apply the same penalty to all parameters, regardless of the nature of the parameter, whereas MDL methods can adapt the penalty to each parameter depending on its nature, be it continuous or discrete, bounded or not. In short, MDL defines the best-fitting model as the one that enables the best compression of the data, minimizing a penalized likelihood function. That is,

$$\begin{aligned} MDL=-\log _2(L_{opt})+P. \end{aligned}$$
(6)

Here \(-\log _2(L_{opt})\) is the amount of information, in bits, needed to encode the data under the fitted model, a quantity taken from information theory; more details can be found in Davis et al. (2006). \(L_{opt}\) is obtained by replacing the maximum likelihood estimator in the likelihood function (5). This will be explained in the next section.
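
In code, (6) is a one-liner; a hypothetical Python helper (the name is ours) that converts an optimized likelihood value into its code length in bits and adds the penalty:

```python
import math

def mdl(L_opt, P):
    """MDL score of (6): bits to encode the data under the fitted
    model, -log2(L_opt), plus the parameter penalty P."""
    return -math.log2(L_opt) + P
```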

Because of the above, it is possible to make a natural connection between the likelihood and the MDL objective function by means of the penalty P (see Davis et al. (2006)). The broad penalty methodology is summarized in three principles, as stated by Li & Lund (2012). The first is to penalize each real-valued parameter by the number of observations used to estimate it: if a parameter is estimated from k observations, its penalty is \(\frac{\log _2 k}{2}\). For this principle, it is important to take into consideration how the observations are arranged to calculate the parameter of interest, because this arrangement will be reflected in the penalty.

The second principle establishes how integer parameters, such as the number of change-points J and their locations \(\tau _{1},\ldots , \tau _{J}\), should be charged. The charge is calculated from the value of each of them. For example, the quantity J, which is bounded by the total number of observations T, is charged an amount of \(\frac{\log _2 T}{2}\). For each \(\tau _{j}\), since \(\tau _{j}<\tau _{j+1}\), the cost of its penalty will be \(\frac{\log _2 \tau _{j+1}}{2}\) for \(j=2,\ldots ,J\).

The last principle, mentioned in Li & Lund (2012), is the additivity principle: P is constructed as the sum of all the partial penalties mentioned above. The more parameters the model has, the higher P will be. However, if, despite adding parameters, the gain in \(\log _2(L_{opt})\) does not exceed the increase in the penalty P from the extra parameters, the simpler model is preferred. For the purposes of this paper, the following penalty function \(P_{\tau }(\theta )\) will be used for a fixed change-point configuration,

$$\begin{aligned} P_{\tau }(\theta )=R\sum _{j=1}^{J+1} \frac{\ln (\tau _j-\tau _{j-1})}{2} +\ln (J) + \sum _{j=2}^{J}\ln (\tau _j), \end{aligned}$$
(7)

where \(R=2\) or \(3\) depending on whether \(\theta =(\alpha , \beta )\) or \(\theta =(\alpha , \beta ,\gamma )\), i.e. whether \(\theta\) has two or three parameters. The first summand on the right-hand side indicates that each real-valued parameter (\(\alpha _j, \beta _j,\gamma _j\)) of the j-th regime is penalized by \(\frac{\ln (\tau _j-\tau _{j-1})}{2}\), and since there are \(J+1\) regimes, the sum runs from 1 to \(J+1\). The second summand is the penalty on the number of change-points, and the last one is the sum of the penalties of the individual change-points.
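
A direct transcription of the penalty (7), reading the first sum as running over the segment lengths \(\tau _j-\tau _{j-1}\) with \(\tau _0=0\) and \(\tau _{J+1}=T\); a Python sketch with hypothetical names:

```python
import math

def penalty(taus, T, R=2):
    """Penalty P_tau(theta) of (7).
    taus: change-points [tau_1, ..., tau_J], tau_0 = 0 and tau_{J+1} = T implied
    R:    number of real-valued parameters per regime (2 or 3)
    Requires J >= 1 because of the ln(J) term."""
    J = len(taus)
    bounds = [0.0] + list(taus) + [float(T)]
    # real-valued parameters: R per regime, each charged ln(segment length)/2
    p = R * sum(math.log(bounds[j] - bounds[j - 1]) / 2.0
                for j in range(1, J + 2))
    p += math.log(J)                          # number of change-points
    p += sum(math.log(t) for t in taus[1:])   # locations tau_2 .. tau_J
    return p
```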

2.2.2 Genetic algorithm scheme

As exposed in Li & Lund (2012), the total number of cases over which the MDL could be evaluated is \({T\atopwithdelims ()J}\), where T is the number of observations in the time series and J is the number of change-points. This number of parametric configurations makes exhaustive search computationally infeasible for an optimization algorithm that aims to choose the configuration minimizing the MDL. For this reason, we use a genetic algorithm that, by natural-selection criteria, establishes the best of the parameter configurations, which we will call chromosomes. Each chromosome is labeled \((J,\tau _{1},\ldots , \tau _{J})\), where the first component J stands for the number of change-points, located respectively at times \(\tau _{1},\ldots ,\tau _{J}\). What follows establishes how the genetic algorithm (GA) evaluates each chromosome while avoiding those that are probabilistically less optimal.

Let us now see how a complete generation is produced from an initial one of a given size (the size can be as small as a couple). For this purpose, suppose there are k individuals, or chromosomes, in the initial generation, set at random: each of the T observations in the time series is allowed to be a change-point, independently of all other change-points, with some probability, for example 0.06 as in Li & Lund (2012). The number of change-points of each chromosome in the initial generation then has a \(Binomial(n = T-1, p = 0.06)\) distribution.
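
The initialization just described can be sketched as follows (Python; names are ours): each interior time is flagged as a change-point with probability p, so the chromosome's J is binomial.

```python
import random

def random_chromosome(T, p=0.06, rng=random):
    """Draw one initial chromosome (J, tau_1, ..., tau_J): each of the
    T-1 interior times 1..T-1 is a change-point independently with
    probability p, so J ~ Binomial(T-1, p)."""
    taus = [t for t in range(1, T) if rng.random() < p]
    return (len(taus), *taus)
```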

Two chromosomes are taken from the current generation, one mother and one father, to produce a child of the next generation by a probabilistic combination of the parents. The principle of natural selection in this context is implemented by selecting the pair of chromosomes that best optimize expression (6), since this couple is considered to have the highest probability of having offspring. The chromosomes are therefore arranged from the least to the most likely to have children, and each individual of the generation is assigned a rank \(S_j\), with \(S_j \in \{1,\ldots ,k\}\): if \(S_j=k\), individual j is the one that best fits (6), and if \(S_j=1\), it is the one that fits (6) worst.

Once this ranking has been made for each of the chromosomes of the same generation, we proceed to establish the probability of selection using the following expression that simulates the principle of natural selection of the parents that will generate the next generation.

$$\begin{aligned} \frac{S_j}{\sum _{i=1}^{k}S_i} \end{aligned}$$
(8)

A mother chromosome is drawn from the k chromosomes according to the probabilities in (8). Among the remaining \(k-1\) chromosomes, the father is drawn under the same selection criterion. Suppose that the mother chromosome has m change-points, located at some \(\tau _1,\dots ,\tau _m\), i.e. with the parameter configuration \((m,\tau _1,\dots ,\tau _m)\). Similarly, suppose the father chromosome has the parametric configuration \((n,\delta _1,\dots ,\delta _n)\). A child of these parents can arise simply by joining the two chromosomes, i.e., the child chromosome will initially have the configuration \((m+n,\epsilon _1,\dots ,\epsilon _{m+n})\), where the \(m+n\) change-points comprise the mother's m and the father's n change-points.
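
The rank-based selection with probabilities (8) can be sketched as below (Python; names are ours). `ranks` holds \(S_1,\ldots ,S_k\), and individual j is drawn with probability \(S_j/\sum _i S_i\).

```python
import random

def select_parent(ranks, rng=random):
    """Draw an individual's index with probability S_j / sum(S_i),
    as in expression (8). ranks[j] is the rank S_{j+1} in 1..k."""
    total = sum(ranks)
    r = rng.random() * total      # uniform point on [0, total)
    acc = 0.0
    for j, s in enumerate(ranks):
        acc += s
        if r < acc:               # falls in individual j's slice
            return j
    return len(ranks) - 1         # numerical safety fallback
```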

After this, duplicated change-points are removed from the child \((m+n,\epsilon _1,\ldots , \epsilon _{m+n})\). From this last configuration, all or some change-points are kept. For this, Li & Lund (2012) use the dynamics of flipping a coin for each change-point in the child configuration: if heads comes up, the change-point is kept, otherwise it is removed. That is, a binomial distribution is used with probability parameter 1/2 and number of trials equal to the length of the deduplicated child configuration. All this aims for the offspring to keep traits of the parents without being exact replicas.

Each change-point in the child chromosome can then undergo a mutation; following the scheme of Li & Lund (2012), one of three alternatives may happen. With some random mechanism we generate the numbers \(-1\), 0 and 1, with respective probabilities 0.3, 0.4 and 0.3. If \(-1\) comes out, the change-point is shifted back by one unit; if 0 comes out, it stays at the time it is at; and if 1 comes out, it is shifted forward by one unit. Again, duplicates are eliminated. With this last procedure, the construction of Child 1 is finished. Children 2 up to k are generated in the same way. New parents are selected if chromosomes are duplicated within the same generation.
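
The crossover-and-mutation step can be sketched as follows (Python; names are ours, and the \(-1/0/+1\) shift probabilities 0.3/0.4/0.3 are our reading of Li & Lund (2012)).

```python
import random

def make_child(mom_taus, dad_taus, rng=random):
    """One child chromosome: union of the parents' change-points,
    a fair-coin flip to keep each point, then a -1/0/+1 shift per
    surviving point; duplicates are removed throughout."""
    pool = sorted(set(mom_taus) | set(dad_taus))     # join, drop duplicates
    kept = [t for t in pool if rng.random() < 0.5]   # coin flip per point
    mutated = set()
    for t in kept:
        u = rng.random()
        if u < 0.3:
            t -= 1                                   # shift one unit left
        elif u >= 0.7:
            t += 1                                   # shift one unit right
        if t >= 1:
            mutated.add(t)                           # set drops duplicates
    taus = sorted(mutated)
    return (len(taus), *taus)
```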

The generation process is repeated as many times as generations are desired. Indeed, one criterion for terminating the genetic algorithm is to fix the number of generations r; another is to stop once the objective function reaches a minimum or fails to improve after some given number of generations. The objective function we used was \(\ln P_\tau (\theta ) - \ln f(D\mid \theta ) - \ln f(\theta )\), and the optimization problem we intend to solve by means of the genetic algorithm takes the following form,

$$\begin{aligned} {\hat{\theta }}_{BAYESIAN-MDL} = argmin_{\theta ,\tau } \left( \ln P_\tau (\theta ) - \ln f_\tau (D\mid \theta ) - \ln f_\tau (\theta )\right) \end{aligned}$$
(9)

In other words, \({\hat{\theta }}_{BAYESIAN-MDL}\) is the argument that minimizes the objective function, and is therefore the optimal solution to the problem of finding the best configuration of change-points and the respective parameters of the regimes they determine.
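
Putting the pieces together, a bare-bones generation loop might look as follows (Python sketch; mutation is omitted for brevity and all names are ours). `fitness` stands for any implementation of the penalized objective, with smaller values better, and the best chromosome seen over all generations is kept, as described above.

```python
import random

def run_ga(fitness, T, k=50, n_gen=100, p=0.06, rng=random):
    """Minimal GA loop: random initial population, rank-based parent
    selection, union-and-coin-flip crossover, elitist tracking of the
    best (lowest-objective) chromosome across generations."""
    def random_chrom():
        return tuple(t for t in range(1, T) if rng.random() < p)

    pop = [random_chrom() for _ in range(k)]
    best = min(pop, key=fitness)
    for _ in range(n_gen):
        scored = sorted(pop, key=fitness, reverse=True)  # worst first
        ranks = list(range(1, k + 1))                    # rank k = fittest

        def pick():
            r = rng.random() * sum(ranks)
            acc = 0
            for j, s in enumerate(ranks):
                acc += s
                if r < acc:
                    return scored[j]
            return scored[-1]

        children = []
        for _ in range(k):
            mom, dad = pick(), pick()
            pool = sorted(set(mom) | set(dad))           # crossover: union
            child = tuple(t for t in pool if rng.random() < 0.5)
            children.append(child)
        pop = children
        gen_best = min(pop, key=fitness)
        if fitness(gen_best) < fitness(best):
            best = gen_best
    return best
```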

3 Methods and models

We are particularly interested in the likelihood function appearing in expression (9), in order to establish what we have called the corresponding Bayesian-MDL. Therefore, for any \(m (t \mid \theta )\) defined in expression (2) and its respective intensity function defined in (1), it follows that,

$$\begin{aligned} \begin{aligned} L({\textbf{D}}\mid \phi )&\propto e^{-m(\tau _1\mid \theta _1)} \prod _{i=1}^{N_{\tau _1}} \lambda (d_i\mid \theta _1)\\&\times \prod _{j=2}^{J} \left( e^{-[m(\tau _j\mid \theta _j)-m(\tau _{j-1} \mid \theta _j )]}\prod _{i=N_{\tau _{j-1}}+1}^{N_{\tau _j}} \lambda (d_i \mid \theta _j) \right) \\&\times e^{-[m(T \mid \theta _{J+1})-m(\tau _{J}\mid \theta _{J+1} )]} \prod _{i=N_{\tau _{J}}+1}^{n}\lambda (d_i\mid \theta _{J+1}) \end{aligned} \end{aligned}$$
(10)

since the product with respect to i only affects the functions \(\lambda (t,\theta )\) in each of the \(J+1\) regimes determined by the J change-points in the vector \(\tau =(\tau _1, \tau _2,\dots , \tau _J)\). Using that, for J change-points, \(\tau _{J+1} := T\) where T is the number of daily measurements, \(\tau _0=0\), \(N_0=0\) and \(m(0\mid \theta )=0\), expression (10) reduces to,

$$\begin{aligned} L({\textbf{D}}\mid \phi ) \propto \prod _{j=1}^{J+1} \left( e^{-[m(\tau _j\mid \theta _j)-m(\tau _{j-1} \mid \theta _j )]} \prod _{i=N_{\tau _{j-1}}+1}^{N_{\tau _j}} \lambda (d_i\mid \theta _j) \right) \end{aligned}$$
(11)

Taking the logarithm of (11) we have,

$$\begin{aligned} \log L(D \mid \phi )&= \left( \sum _{j=1}^{J+1} m(\tau _{j-1}\mid \theta _j)-m(\tau _j\mid \theta _j) \right) \nonumber \\&+\left( \sum _{j=1}^{J+1} \quad \sum _{i= N_{\tau _{j-1}}+1}^{ N_{\tau _j}} \log \lambda (d_i\mid \theta _j) \right) \nonumber \\&=\sum _{j=1}^{J+1} \left( m(\tau _{j-1}\mid \theta _j)-m(\tau _j\mid \theta _j)+\sum _{i= N_{\tau _{j-1}}+1}^{ N_{\tau _j}} \log \lambda (d_i\mid \theta _j) \right) \end{aligned}$$
(12)
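
Expression (12) translates directly into code; a Python sketch for the Weibull case (function names are ours):

```python
import math

def m_w(t, a, b):       # Weibull cumulative mean from (2)
    return (t / b) ** a

def lam_w(t, a, b):     # Weibull intensity from (1)
    return (a / b) * (t / b) ** (a - 1)

def log_lik(d, taus, thetas, T):
    """Log-likelihood (12).
    d:      event (exceedance) times
    taus:   change-points [tau_1, ..., tau_J]
    thetas: J+1 (alpha, beta) tuples, one per regime
    T:      end of the observation window"""
    bounds = [0.0] + list(taus) + [float(T)]
    ll = 0.0
    for j in range(len(thetas)):
        lo, hi = bounds[j], bounds[j + 1]
        ll += m_w(lo, *thetas[j]) - m_w(hi, *thetas[j])
        ll += sum(math.log(lam_w(x, *thetas[j])) for x in d if lo < x <= hi)
    return ll
```

A sanity check: with identical parameters in every regime, adding a change-point does not alter the value, since the telescoping sums in (12) cancel.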

Also, Bayes' theorem states that,

$$\begin{aligned} f(\theta \mid D)\propto L(D\mid \theta ) f(\theta ), \end{aligned}$$
(13)

where L is the likelihood function of the observations D given the parameter vector \(\theta\), and \(f(\theta )\) is the prior distribution of the parameter vector \(\theta\). Taking logarithms in (13), we have that,

$$\begin{aligned} \ln f(\theta \mid D)\propto \ln (L(D\mid \theta )) + \ln (f(\theta )) \end{aligned}$$
(14)

From (14), we start by establishing the form of the joint prior \(f(\theta )=f(\alpha , \beta , \tau _j)\) for the first three cases in (1). The expression for the last case is given in the appendix (A.2).

3.1 Prior distributions

If we take \(\alpha \sim Gamma(\phi _{11},\phi _{12})\), then,

$$\begin{aligned} f(\alpha ) = \frac{\phi _{11}^{\phi _{12}}}{\Gamma (\phi _{12})} \alpha ^{\phi _{12}-1}e^{-\phi _{11} \alpha } \end{aligned}$$

After applying logarithm we obtain,

$$\begin{aligned} \log f(\alpha )&= \log \left( \dfrac{\phi _{11}^{\phi _{12}}}{\Gamma (\phi _{12})} \alpha ^{\phi _{12}-1}e^{-\phi _{11} \alpha } \right) \nonumber \\&= \phi _{12}\log \phi _{11} - \log \Gamma (\phi _{12}) + (\phi _{12}-1)\log \alpha - \phi _{11}\alpha \nonumber \\&\propto (\phi _{12}-1)\log \alpha - \phi _{11}\alpha \end{aligned}$$
(15)

Similarly for \(\beta\), if we take \(\beta \sim Gamma(\phi _{21},\phi _{22})\), then,

$$\begin{aligned} \log f(\beta ) \propto (\phi _{22}-1)\log \beta - \phi _{21}\beta \end{aligned}$$
(16)

On the other hand, assume every time in the series can be chosen as a change-point \(\tau _j\) with the same probability, so that \(\tau _j \sim Uniform(1, T), \ j = 1, 2, \ldots , J\).

Then we have,

$$\begin{aligned} f(\tau _j) = \frac{1}{T-1} \end{aligned}$$
(17)

Taking logarithm we obtain,

$$\begin{aligned} \log f(\tau _j) = - \log (T-1) \end{aligned}$$
(18)

Assembling the joint function for \(\theta =(\alpha ,\beta , \tau _j)\) under the assumption of independence, we have,

$$\begin{aligned} \log f(\alpha ,\beta , \tau _j)&\propto (\phi _{12}-1)\log \alpha - \phi _{11}\alpha \nonumber \\&+ (\phi _{22}-1)\log \beta - \phi _{21}\beta - \log (T-1) \end{aligned}$$
(19)
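
The joint log-prior (19) is straightforward to evaluate; a Python sketch with hypothetical argument names:

```python
import math

def log_prior(alpha, beta, phi, T):
    """Log joint prior (19), up to an additive constant:
    Gamma priors on alpha and beta with hyperparameters
    phi = (phi11, phi12, phi21, phi22), and a discrete-uniform
    prior on the change-point location over 1..T."""
    p11, p12, p21, p22 = phi
    return ((p12 - 1) * math.log(alpha) - p11 * alpha
            + (p22 - 1) * math.log(beta) - p21 * beta
            - math.log(T - 1))
```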

Up to this point, the second summand on the right-hand side of (14) has been obtained. Next, the first summand on the right-hand side of (14) will be derived. This is done for the first case of (1); the other cases are given in the appendix (sections (A.2.1), (A.2.2), (A.2.3)).

3.1.1 Weibull intensity rate (W)

After taking the expressions for the intensity function \(\lambda ^{(W)}(t\mid \theta )\) and the cumulative mean function \(m^{(W)}(t\mid \theta )\) from (1) and (2) respectively, and replacing them in (12), we have,

$$\begin{aligned} \log L(D\mid \phi )&= \sum _{j=1}^{J+1} \left( m(\tau _{j-1}\mid \theta _j)-m(\tau _j\mid \theta _j)+\sum _{i= N_{\tau _{j-1}}+1}^{ N_{\tau _j}} \log \lambda (d_i\mid \theta _j) \right) \\&= \sum _{j=1}^{J+1} \left( \left( \dfrac{\tau _{j-1}}{\beta _j}\right) ^{\alpha _j}-\left( \dfrac{\tau _{j}}{\beta _j}\right) ^{\alpha _j} +\sum _{i= N_{\tau _{j-1}}+1}^{ N_{\tau _j}} \log \left( \dfrac{\alpha _j}{\beta _j} \left( \dfrac{d_i}{\beta _j}\right) ^{\alpha _j-1}\right) \right) \\&= \sum _{j=1}^{J+1} \left( \dfrac{\tau _{j-1}^{\alpha _j}-\tau _{j}^{\alpha _j}}{\beta _j^{\alpha _j}} + (N_{\tau _j}-N_{\tau _{j-1}})\left( \log (\alpha _j)- \alpha _j\log (\beta _j)\right) +(\alpha _j-1)\sum _{i= N_{\tau _{j-1}}+1}^{ N_{\tau _j}}\log (d_i)\right) \end{aligned}$$

Substituting the penalty (7), the Weibull log-likelihood derived above, and the log-prior (19) in the objective function of expression (9), we have that,

$$\begin{aligned} \ln P_{\tau }(\theta ) - \ln f_\tau (D\mid \theta ) - \ln f_\tau (\theta )&= 2\sum _{i=1}^{J+1}\dfrac{\ln (\tau _i-\tau _{i-1})}{2}+ \ln (J) + \sum _{i=2}^J\ln (\tau _i)\nonumber \\&-\sum _{j=1}^{J+1} \left( \dfrac{\tau _{j-1}^{\alpha _j}-\tau _{j}^{\alpha _j}}{\beta _j^{\alpha _j}} \right. \nonumber \\&\left. + (N_{\tau _j}-N_{\tau _{j-1}})\left( \ln (\alpha _j)- \alpha _j\ln (\beta _j)\right) \right. \nonumber \\&\left. + (\alpha _j-1)\sum _{i= N_{\tau _{j-1}}+1}^{ N_{\tau _j}}\ln (d_i)\right) \nonumber \\&-\sum _{j=1}^{J+1} \left( (\phi _{12}-1)\ln \alpha _j - \phi _{11}\alpha _j\right. \nonumber \\&\left. + (\phi _{22}-1)\ln \beta _j - \phi _{21}\beta _j\right) + J \ ln (T-1) \end{aligned}$$
(20)
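Expression (20) can likewise be evaluated numerically. Below is a minimal Python sketch (again an illustrative translation, not the paper's R implementation; the function name is ours, and it assumes \(J \ge 1\), since \(\ln (J)\) is undefined otherwise):

```python
import math

def bmdl(d, taus, alphas, betas, T, phi):
    """Bayesian-MDL objective of expression (20) for the Weibull rate.

    d    -- sorted exceedance times; taus -- interior change-points (J >= 1)
    phi  -- (phi11, phi12, phi21, phi22), Gamma hyperparameters of the priors
    """
    phi11, phi12, phi21, phi22 = phi
    J = len(taus)
    bounds = [0.0] + list(taus) + [T]
    # code-length (penalty) terms: sum of log segment lengths, log(J),
    # log of the interior change-point locations, and J * log(T - 1)
    val = sum(math.log(bounds[i] - bounds[i - 1]) for i in range(1, J + 2))
    val += math.log(J) + sum(math.log(t) for t in taus[1:]) + J * math.log(T - 1.0)
    for j in range(J + 1):
        a, b = alphas[j], betas[j]
        t0, t1 = bounds[j], bounds[j + 1]
        in_regime = [x for x in d if t0 < x <= t1]
        # minus the log-likelihood contribution of regime j
        val -= (t0 / b) ** a - (t1 / b) ** a
        val -= len(in_regime) * (math.log(a) - a * math.log(b))
        val -= (a - 1.0) * sum(math.log(x) for x in in_regime)
        # minus the Gamma log-prior terms for (alpha_j, beta_j)
        val -= (phi12 - 1.0) * math.log(a) - phi11 * a
        val -= (phi22 - 1.0) * math.log(b) - phi21 * b
    return val
```

For instance, with no exceedances, a single change-point at \(\tau _1 = 5\), \(T = 10\), \(\alpha _j = \beta _j = 1\) and all hyperparameters equal to 1, the value reduces term by term to \(2\ln 5 + \ln 9 + 14\).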

Each member of a generation has a Bayesian-MDL value, of which the smallest is chosen. This is done for every generation, so at the end there are as many Bayesian-MDL values as generations; the minimum among them corresponds to the solution of the problem of determining the change-points of the time series. We now assess the performance of the algorithm under four scenarios: three simulated and one with real data.

4 Results and discussion

In this section, we assess the performance of the algorithm for detecting multiple change-points. For this purpose we consider two datasets: the first one consists of simulated observations and the second of records of particulate matter smaller than 2.5 microns in diameter (\(PM_{2.5}\)) in the city of Bogotá, Colombia, collected daily during the period 2018-2020.

For the implemented experiments, only the Weibull mean cumulative function (20) was considered. The optimal values of the parameters \(\alpha\) and \(\beta\), 0.1 and 0.5 respectively, were estimated via the optim function of the statistical software R (R Core Team, 2022). For \(\alpha\) and \(\beta\) we used \(Gamma(\phi _{i1}, \phi _{i2}), \ i = 1, 2\) as prior distributions, as previously defined in Section 3.1.1; the optimal values of the hyperparameters, \(\phi _{11} = 1\), \(\phi _{12} = 2\), \(\phi _{21} = 3\) and \(\phi _{22} = 1.2\), were estimated using Markov chain Monte Carlo (MCMC) methods, so that the objective function takes the form of (19).

We start by analyzing the simulated data under three settings, described in detail below, and then proceed with the real data. For this task we used the statistical software R; the scripts and datasets can be shared on request.

4.1 Simulation study

To assess the performance of the algorithm on simulated data, we proceed under a scheme similar to the one proposed by Li & Lund (2012). Three settings were considered for the number of change-points: \(J = 1, 2\) and 3. Their locations \(\tau _1, \tau _2, \ldots , \tau _J\) were chosen for illustrative convenience and are presented in Table 1.

Taking into account that the length of the \(PM_{2.5}\) series for Bogotá for 2018-2020 was 1096, the same number of observations was simulated from a log-normal distribution with scale parameter \(\mu \in {\mathbb {R}}\) and shape parameter \(\sigma > 0\) (see expression (21)). The data were approximated by this distribution following the results obtained with the library fitdistrplus (Delignette-Muller & Dutang, 2015) in R and those in Rumburg et al. (2001).

$$\begin{aligned} f(x) = {\left\{ \begin{array}{ll} \dfrac{1}{x \sigma \sqrt{2 \pi }} \exp \left( -\dfrac{(\ln (x) - \mu )^2}{2 \sigma ^2} \right), &{} \hbox { }\ x > 0\\ 0, &{} x \le 0 \end{array}\right. } \end{aligned}$$
(21)

Recall that J change-points split the time series into \(J+1\) sub-series or regimes. For each J, every regime was generated by increasing the scale parameter \(\mu\) in increments of 0.5 units while the parameter \(\sigma\) was held constant at 0.32. The values of \(\mu\) and \(\sigma\) used to generate the \(J+1\) regimes, \(J \in \{1, 2, 3\}\), are presented in Table 2, while the graphical behavior of the series for the three settings of \(\tau _1, \tau _2, \ldots , \tau _J\) can be appreciated in Fig. 1, where the vertical dashed lines represent the change-points. Again for illustration, we defined as the threshold for possible exceedances the arithmetic mean of the 1096 simulations, \({\bar{X}} = \frac{1}{1096} \sum _{t = 1}^{1096} X_t\).
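This simulation scheme can be sketched in a few lines. The Python fragment below (the paper's experiments used R; the seed, the base value \(\mu = 1.0\) for the first regime and all variable names are illustrative assumptions, with only the 0.5 increments and \(\sigma = 0.32\) taken from the text) generates the two-change-point setting and builds the cumulative exceedance series TS2 from TS1:

```python
import numpy as np

rng = np.random.default_rng(2018)

n, taus = 1096, [365, 730]          # series length and change-points (second setting)
mus, sigma = [1.0, 1.5, 2.0], 0.32  # mu grows by 0.5 per regime; base value is an assumption

# TS1: log-normal observations with a regime-specific scale parameter mu
bounds = [0] + taus + [n]
ts1 = np.concatenate([
    rng.lognormal(mean=mus[j], sigma=sigma, size=bounds[j + 1] - bounds[j])
    for j in range(len(mus))
])

# Threshold: arithmetic mean of the n simulated values
threshold = ts1.mean()

# TS2: cumulative count of threshold exceedances up to each time t
exceed = ts1 > threshold
ts2 = np.cumsum(exceed)
```

By construction TS2 is a non-decreasing counting series, which is what the NHPP is fitted to.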

For each of the three settings, the number of change-points, the optimal Bayesian-MDL and the cumulative mean function \(m(t\mid \theta )\) were estimated. Every run of the genetic algorithm used 50 generations with 50 individuals each; the proportion of times used to generate the initial population was 6% and the mutation probability was 3%, as in Li & Lund (2012). These results are presented and analyzed in the following section.
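To make the search procedure concrete, the following stripped-down Python toy mirrors the stated settings (50 generations of 50 individuals, 3% mutation probability). The fitness function is a deliberate stand-in for the Bayesian-MDL, the sparse initialization only loosely reflects the paper's 6% proportion, and the operators are simplified relative to Li & Lund (2012); none of this is the paper's actual R implementation.

```python
import random

random.seed(7)

T = 1096
TRUE = (365, 730)  # stand-in optimum used by the toy fitness below

def fitness(cps):
    """Toy stand-in for the Bayesian-MDL (lower is better)."""
    return abs(len(cps) - 2) + sum(min(abs(c - t) for t in TRUE) for c in cps)

def crossover(a, b):
    """Child keeps each parent change-point with probability 1/2."""
    return sorted(t for t in set(a) | set(b) if random.random() < 0.5)

def mutate(cps, p=0.03):
    """With probability p, add a uniformly drawn time; with probability p, drop one."""
    s = set(cps)
    if random.random() < p:
        s.add(random.randrange(1, T))
    if s and random.random() < p:
        s.remove(random.choice(sorted(s)))
    return sorted(s)

# Initial population: each admissible time enters a chromosome with small
# probability, so chromosomes start sparse (a simplification of the paper's
# 6% initialization proportion).
pop = [sorted(t for t in range(1, T) if random.random() < 6 / T) for _ in range(50)]

best = min(pop, key=fitness)            # best chromosome seen so far
for _ in range(50):                     # 50 generations of 50 individuals
    parents = sorted(pop, key=fitness)[:25]          # truncation selection
    pop = [mutate(crossover(*random.sample(parents, 2))) for _ in range(50)]
    best = min([best] + pop, key=fitness)            # elitist bookkeeping
```

The per-generation minima traced by `best` correspond to the "one Bayesian-MDL per generation" bookkeeping described above, with the global minimum taken at the end.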

Table 1 Different Change-points simulations considered
Table 2 Settings for the time series regimes for different number of change-points
Fig. 1
figure 1

Simulated time series behavior

The results of implementing the proposed model in the three simulated schemes and on the real data set in the air-quality context are shown below. The performance on each data set is illustrated by means of Figs. 2, 4, 6 and 9, which should be interpreted as follows. The upper left graph represents the behavior of the observed cumulative mean number of overshoots (in black) compared with the estimated cumulative mean function \(m(t\mid {\hat{\theta }})\) (in red), together with the respective \(95\%\) confidence intervals of the estimate (in blue). The upper right plot represents the behavior of the BMDL across each of the 50 generations. The lower left plot is the histogram recording the number of times a change-point is repeated over the 50 generations. The lower right graph relates the change-points obtained in each of the 50 generations.

It is important to note that the simulations shown in Fig. 1 are intended to cover three of the most important schemes. In the first panel, with only one change-point (\(\tau _{1}=825\)), we show the case in which there is a significant disproportion between the observations to the left and to the right of the change-point. In the second scheme, with the two change-points \(\tau _1 = 365\) and \(\tau _2 = 730\), we exhibit the case where the points split the time series into segments of equal length. The last scheme seeks to establish the predictive ability of the algorithm when the change-points lie at the end of the time series, as in the scheme with locations \(\tau _1 = 548\), \(\tau _2 = 823\) and \(\tau _3 = 973\). In the latter scheme, note that all the change-points fall after the middle of the time series of interest.

4.1.1 First setting: one change-point

Fig. 2
figure 2

Genetic algorithm results - 1 change-point: (a) Confidence interval for fitted \(m(t\mid \theta )\) (blue lines), estimated cumulative average value of the number of exceedances at time t (red line) and number of exceedances observed at time t (black line). (b) BMDL calculated for each of the 50 generations. (c) Frequency of the number of times each of the change-points is observed in the 50 generations. The red lines indicate the real locations of the change-points. (d) Estimated number of change-points in each of the 50 generations. The dashed lines indicate the optimal estimated number of change-points

Fig. 3
figure 3

First setting - Mean fitting for the regimes

We start by considering the first configuration of Table 1, i.e. \(\tau _1 = 825\). Fig. 2 records the results of the genetic algorithm implementation. The top left plot shows the good fit of the observed cumulative overshoots to the estimated \(m(t\mid {\hat{\theta }})\) function, before and after \(\tau _1 = 825\). The estimated number of overshoots at \(\tau _1 = 825\) is 213.51 (in red) versus the observed value of 204 (in black). All observed values are contained within the \(95\%\) confidence bands (in blue).

On the other hand, according to the next plot (upper right panel), the minimum reached by the Bayesian-MDL was found around the 40th generation, taking a value of 797.27; this generation is marked explicitly in the lower right panel by the vertical dashed line, while the optimal number of change-points detected was \(J = 3\), as per the horizontal dashed line.

Despite the real value \(J = 1\) differing from the estimated one, in the last panel (lower left) the bar associated with the most frequent change-point is found in a neighborhood of \(\tau _1 = 825\); moreover, taking that real value as reference (vertical solid line), the estimates are on average two units above the optimum. In the same histogram it can be seen that, of the 50 repetitions (one per generation), 20 are very close to \(\tau _1 = 825\).

Fig. 3 shows, with the solid horizontal line, the sample mean associated with each regime defined by the estimated change-points. Thus the algorithm succeeds in capturing, over the whole series, the trend of the values in the presence of said points, even though the estimated J differs from its real value.

4.1.2 Second setting: two change-points

Fig. 4
figure 4

Genetic algorithm results - 2 change-points: (a) Confidence interval for fitted \(m(t\mid \theta )\) (blue lines), estimated cumulative average value of the number of exceedances at time t (red line) and number of exceedances observed at time t (black line). (b) BMDL calculated for each of the 50 generations. (c) Frequency of the number of times each of the change-points is observed in the 50 generations. The red lines indicate the real locations of the change-points. (d) Estimated number of change-points in each of the 50 generations. The dashed lines indicate the optimal estimated number of change-points

Fig. 5
figure 5

Second setting - Mean fitting for the regimes

We now consider the second setting of Table 1, \(\tau _1 = 365\) and \(\tau _2 = 730\), with the results of the genetic algorithm presented in Fig. 4. We start by analyzing the fit of the mean cumulative function (upper left panel) and move in a clockwise manner.

In panel (a) of Fig. 4 there is a good fit of \(m(t\mid {\hat{\theta }})\) (red line), whose values overlap with the real ones (black line), while the \(95\%\) confidence bands (blue lines) completely contain the real observations. Again, as in the previous case, these estimates grow in a stepwise fashion following the behavior of the actual values, with breaks defined by the change-points. The estimates of \(m(t\mid {\hat{\theta }})\) at the change-points were \(m(\tau _1\mid {\hat{\theta }}) = 18.54\) and \(m(\tau _2\mid {\hat{\theta }}) = 150.63\). That is, we estimate about 19 overshoots up to \(\tau _1\) and about 151 up to \(\tau _2\), compared with the observed values of 12 at \(\tau _{1}\) and 137 at \(\tau _{2}\).

Panel (b) of Fig. 4 presents the evolution of the Bayesian-MDL across the generations, reaching the minimum in generation 42 of the 50. Panel (d) of the same figure then shows that the chromosome of generation 42 suggests \(J = 4\) change-points, when in reality there are only 2. However, as can be seen in histogram (c) of the same figure, the highest frequencies over the 50 generations fall in very small neighborhoods of the true values of \(\tau _{1}\) and \(\tau _{2}\).

On the other hand, in Fig. 5 the solid horizontal line again represents the sample mean of each regime after estimating the change-points; as in the previous case, the algorithm allows a proper fit of the observed data and of the variations of the statistic of interest in the presence of said times, despite the estimate of J being above the real value.

4.1.3 Third setting: three change-points

Fig. 6
figure 6

Genetic algorithm results - 3 change-points: (a) Confidence interval for fitted \(m(t\mid \theta )\) (blue lines), estimated cumulative average value of the number of exceedances at time t (red line) and number of exceedances observed at time t (black line). (b) BMDL calculated for each of the 50 generations. (c) Frequency of the number of times each of the change-points is observed in the 50 generations. The red lines indicate the real locations of the change-points. (d) Estimated number of change-points in each of the 50 generations. The dashed lines indicate the optimal estimated number of change-points

Fig. 7
figure 7

Third setting - Mean fitting for the regimes

Finally, we consider the case of three change-points, located at \(\tau _1 = 169, \tau _2 = 413\) and \(\tau _3 = 1027\); the results of applying the genetic algorithm are presented in Fig. 6. In this case it is worth noting that, as seen in the time series depicted in Fig. 7, the three simulated changes appear back-to-back toward the end of the series, making it more difficult to capture their actual locations. This apparent difficulty is reflected in the fit of the function \(m(t\mid \theta )\) in panel (a) of Fig. 6, since the observations do not appear to be fully contained within the \(95\%\) confidence bands at the beginning of the series. The estimated values were \(m(\tau _1\mid {\hat{\theta }}) = 15.12\), \(m(\tau _2\mid {\hat{\theta }}) = 123.76\) and \(m(\tau _3\mid {\hat{\theta }}) = 250.67\), versus the actual observed 10, 109 and 244 respectively, showing high predictive ability of the model even in the presence of many changes at the end of the series of interest.

We now analyze the evolution of the Bayesian-MDL across the generations of the genetic algorithm (upper right panel): its behavior seems approximately constant along its path, with some decreases, and the optimal value was 584.13. This optimum was reached in generation 20, as noted by the vertical dashed line in the lower right panel; on the other hand, the optimal number of change-points was found to be \(J = 3\), according to the horizontal dashed line in panel (d), which equals the real number of change-points.

As in the other two cases, the estimated change-points cluster around the real ones, as per the histogram of Fig. 6 (lower left panel) and the values around the vertical solid lines representing the real optima. For the first two change-points, the most frequent values deviate from the real optimum by only one or two units on average, while the third one is captured by the algorithm with difficulty, which is consistent with the behavior of the mean cumulative function. Finally, Fig. 7 shows the fit of the sample mean for all the considered regimes; the trend associated with the change-points is not only captured by the algorithm, but the detected ruptures also lie in close neighborhoods of the real change-points, consistent with the results in the lower left panel of Fig. 6.

In addition to the three presented settings, in the third section of the appendix (A.3) the reader can find results regarding the performance of the algorithm with a larger number of change-points, i.e., \(J = 10, 20, \ \text {and} \ 50\), as the structure of the objective function (20) depends on it. Further experiments, presented in the first section of the appendix (A.1), were conducted with other available methods, namely Pruned Exact Linear Time (PELT) (Killick et al., 2012), Cumulative Sums (CUSUM) (Page, 1954) and the original method of Li & Lund (2012) on which this work is based, showing that the approach proposed here is robust to deviations from distributional assumptions.

4.2 Real data analysis

Here we applied the genetic algorithm exhaustively to the \(PM_{2.5}\) series for Bogotá in the period 2018-2020, with the settings described at the beginning of this section and objective function (19). As threshold we used the one established by the Colombian environmental norm, which specifies that the presence of this pollutant cannot exceed \(37 \mu g / m^3\). The series can be observed in Fig. 8.

Fig. 8
figure 8

Maximum \(PM_{2.5}\) daily measurements in Bogota, Colombia: January 1 2018 - December 31 2020. The red solid line represents the optimal mean by regime

The optimal chromosome was found to be (8, 400, 408, 445, 488, 627, 654, 661, 798), corresponding to \(J = 8\) change-points. The horizontal solid lines in Fig. 8 indicate the sample mean of every regime, so that the start of each one represents the presence of a change-point, except at the first and last times. It can also be observed that the first four change-points are very close and overlap. Table 3 shows the date corresponding to each change-point.

Table 3 Change-points with their respective dates

These exceedances occurred during the rainy season in Bogotá. If the dates in Table 3 are compared with the measurements in Fig. 8, the seasons with the most consistent peaks over time can be identified.

Regarding the remaining panels, the upper left one of Fig. 9 shows the fit of the cumulative mean function \(m(t\mid \theta )\) (black line) by means of the point averages obtained from the parameter vectors best ranked by the MDL (red line), together with their 95% confidence interval (blue lines).

Also, the corresponding optimal Bayesian-MDL was 778.44, as per the upper right panel, which shows the evaluation of this metric for the best chromosome of each of the 50 generations; the minimum is reached around the 45th generation (vertical dashed line in the lower right panel).

Finally, the histogram in the lower left panel shows the days of the time series that appear most frequently as change-points among the best chromosomes of the 50 generations analyzed.

Fig. 9
figure 9

Actual data on \(PM_{2.5}\) pollution in Bogota from 2018 to 2020. (a) Confidence interval for fitted \(m(t\mid \theta )\) (blue lines), estimated cumulative average value of the number of exceedances at time t (red line) and number of exceedances observed at time t (black line). (b) BMDL calculated for each of the 50 generations. (c) Frequency of the number of times each of the change-points is observed in the 50 generations. The red lines indicate the real locations of the change-points. (d) Estimated number of change-points in each of the 50 generations. The dashed lines indicate the optimal estimated number of change-points

The graph in Fig. 10 shows that before day 400, i.e., Monday, February 4, the rate of exceedances of the 37 \(\upmu\)g/\({\rm m}^3\) threshold had been decreasing sharply, but after it the highest emission rate was recorded for eight consecutive days. This high average rate is around 1.004, as shown in Table 4 for the second regime. In the third regime it decreased to an average rate of 0.9468; in the fourth it fell to 0.6748; and it is in the fifth regime that it reached its lowest value, 0.2090, before the intensity function rose again above 0.2 threshold exceedances per unit of time. The rate of the fifth regime runs from May 3 to September 19, 2019. Thus, it is evident that the fifth regime occurred before the COVID-19 pandemic lockdown was declared in Colombia; this reduction may therefore be the result of a public policy aimed at reducing \(PM_{2.5}\) emissions.

Fig. 10
figure 10

Rate function for \(PM_{2.5}\) by Days

As part of the 2010-2020 ten-year plan for air pollution control, policies were implemented on the use of emission control systems in cargo transport vehicles and motorcycles, as well as on the integrated public transport system (SITP, for its acronym in Spanish). The latter includes the replacement of old buses with internal combustion engines by electric or hybrid buses. In addition, a few days before the fifth regime, resolution 383 (see RES19 (2019)) was issued, declaring a yellow alert for particulate matter exceedances. Considering the first regime represented in Fig. 10, the rapid deceleration in the emission of threshold exceedances can also be seen as a consequence of this resolution. Such is the case of restrictions on transportation and the mobility sector, in addition to those aimed at the operations of industries using combustion processes, associated mainly with the burning of biomass and the use of fossil or liquid fuels.

Table 4 Minimum, Maximum and Mean for each Regime

5 Conclusions and future work

A solution has been presented for determining change-points in counting time series, particularly when an environmental standard of interest is exceeded. The need to specify a likelihood function for the measurements was bypassed by modeling the times of the series at which exceedances occurred through a non-homogeneous Poisson process; this specification allowed better estimation of the number of change-points, with lower variability, and better estimation of the parameters of the underlying generating process.

A family of such objective functions was derived using different definitions of the mean cumulative function of the non-homogeneous Poisson process, but the influence of choosing one or another on the estimation of the change-points and their locations remains to be studied.

The search method for the change-points was a genetic algorithm similar to the one proposed by Li & Lund (2012); nevertheless, there is still room for improvement in terms of the different operators available in the literature, as well as the use of other families of such algorithms.

The detection of change-points using a genetic algorithm and a Bayesian-MDL as selection criterion yielded results that agree with the public policies implemented in Bogotá, Colombia, both regarding the contamination alerts issued by the monitoring network, as well as the mobility restrictions due to the quarantine caused by the SARS-CoV-2 virus pandemic.

The good performance of the algorithm builds partly on the fact that in Achcar et al. (2010) and Achcar et al. (2011) the cumulative mean function of the contamination exceedances was already fitted, although at that time there was no computational technique for the automatic detection of change-points.

We hope that in the near future these methods will prove to be a useful tool for the government agencies in charge of measuring the effectiveness of the actions taken to reduce air pollution, and thus reduce its impacts both on the environment and on the health of the inhabitants.