Probabilistic Approach to Modelling, Identification and Prediction of Environmental Pollution

A general probabilistic model of the environmental pollution process, based on a semi-Markov process, is developed and presented in the paper. The semi-Markov approach uses prior information to predict the characteristics of a system; here it is applied to the assessment of environmental pollution. Methods and procedures are proposed for estimating the basic parameters of the environmental pollution process, namely the vector of initial probabilities and the matrix of probabilities of transitions between the process's states, and for identifying the distributions of the process's conditional sojourn times at the particular environmental pollution states together with their mean values. Next, formulae are given for predicting the main characteristics of the environmental pollution process, such as the limit values of transient probabilities and the mean total sojourn times at the particular states in a fixed time interval. Finally, the presented model and methods are applied to modelling, identification and prediction of the air pollution process generated by sulphur dioxide within an exemplary industrial agglomeration.


Introduction
Air pollution is defined as the presence of harmful, toxic substances or their mixtures whose high concentration in the atmosphere is detrimental to the quality of life and causes health risks. Transport, industrialisation, agriculture and the use of chemicals in everyday life have become the main sources of pollution in urban and industrial areas. Global and regional organisations as well as governments recognise the need to monitor and assess environmental quality, to establish its standards and limits, and to provide information to the public.
A new approach to environmental pollution assessment, based on the semi-Markov process, is proposed in this research. The semi-Markov process theory was developed by Lévy [31] and Smith [32]. These processes are used for modelling real systems and are commonly applied in queuing and reliability theory [33][34][35][36][37][38]. In this study, the modelling, identification and prediction procedures are adopted from [39] and [40,41], where they are applied to the operation processes of complex technical systems and to the consequences of critical infrastructure accidents for the marine environment, respectively.

In this paper, the semi-Markov model is considered a more flexible alternative to the traditional Markov model. In the Markov model, the conditional sojourn times at the particular states are assumed to have exponential distributions only. In the semi-Markov approach, the distributions of sojourn times do not have to be exponential, so the model can be used with any distribution of the process's sojourn times at the particular states. In this way, the semi-Markov approach gives a more realistic description of the process.

Modelling Environmental Pollution Process
It is assumed that the pollutant's concentration in the environment takes $v$, $v \in N$, different concentration states $s_1, s_2, \ldots, s_v$. Further, the environmental pollution process $S(t)$, $t \in \langle 0, +\infty)$, with the pollutant's concentration states from the set $\{s_1, s_2, \ldots, s_v\}$ is defined. Moreover, a semi-Markov model of the environmental pollution process $S(t)$ is assumed. Its random conditional sojourn time at the pollutant's concentration state $s_k$ while the next transition will be done to the state $s_l$, $k, l = 1, 2, \ldots, v$, $k \neq l$, is denoted by $\theta_{kl}$.
In the paper, a state transition means that the environmental pollution process shifts from one state to another. If the state remains the same, no transition takes place, because the process is still at the same pollutant's concentration state.
For example, there is no transition from state $s_1$ to state $s_1$; the process simply remains at $s_1$ until it shifts to another state. Hence, the environmental pollution process $S(t)$ is defined by the matrix of probabilities $p_{kl}$, $k, l = 1, 2, \ldots, v$, $k \neq l$, of the process $S(t)$ transitions between the pollutant's concentration states $s_k$ and $s_l$
$$[p_{kl}]_{v \times v} = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1v} \\ p_{21} & p_{22} & \cdots & p_{2v} \\ \vdots & \vdots & \ddots & \vdots \\ p_{v1} & p_{v2} & \cdots & p_{vv} \end{bmatrix}, \tag{1}$$
where by the formal agreement $p_{kk} = 0$ for $k = 1, 2, \ldots, v$.
Moreover, the environmental pollution process $S(t)$ is described by the matrix of conditional distribution functions of the sojourn times $\theta_{kl}$ at the state $s_k$ while its next transition will be done to the state $s_l$, $k, l = 1, 2, \ldots, v$, $k \neq l$,
$$[H_{kl}(t)]_{v \times v}, \tag{2}$$
where by the formal agreement $H_{kk}(t) = 0$ for $k = 1, 2, \ldots, v$.
The matrix (Eq. (2)) corresponds to the matrix of conditional densities of the sojourn times $\theta_{kl}$ of the environmental pollution process $S(t)$ at the state $s_k$ while its next transition will be done to the state $s_l$, $k, l = 1, 2, \ldots, v$, $k \neq l$,
$$[h_{kl}(t)]_{v \times v}, \tag{3}$$
where by the formal agreement $h_{kk}(t) = 0$ for $k = 1, 2, \ldots, v$.
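The model described above, a matrix of transition probabilities together with a matrix of conditional sojourn-time distributions, can be illustrated with a short simulation. The following is only a minimal sketch: the function name, the three-state example and all numerical values are hypothetical and are not taken from the paper.

```python
import random


def simulate_semi_markov(p, sojourn_samplers, initial_state, horizon, rng):
    """Simulate one realisation of S(t) until the time horizon is exceeded.

    p[k][l]                -- probability of transition from state k to state l (p[k][k] == 0)
    sojourn_samplers[k][l] -- callable rng -> conditional sojourn time theta_kl
    Returns a list of (state, entry_time, sojourn_time) tuples.
    """
    states = list(range(len(p)))
    path, t, k = [], 0.0, initial_state
    while t < horizon:
        # First draw the next state l from row k, then the sojourn time theta_kl.
        l = rng.choices(states, weights=p[k])[0]
        theta = sojourn_samplers[k][l](rng)
        path.append((k, t, theta))
        t += theta
        k = l
    return path


def exponential(rate):
    return lambda rng: rng.expovariate(rate)


def uniform(a, b):
    return lambda rng: rng.uniform(a, b)


# Hypothetical three-state example; sojourn times need not be exponential.
P = [[0.0, 0.7, 0.3],
     [0.6, 0.0, 0.4],
     [0.5, 0.5, 0.0]]
SAMPLERS = [[None, exponential(0.5), uniform(0.5, 1.5)],
            [exponential(0.2), None, uniform(1.0, 3.0)],
            [uniform(0.5, 1.5), exponential(1.0), None]]

path = simulate_semi_markov(P, SAMPLERS, 0, 50.0, random.Random(0))
```

Because the diagonal of the transition matrix is zero, two consecutive entries of `path` never share the same state, which mirrors the convention that remaining at a state is not a transition.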

Identification of Environmental Pollution Process
Prior to estimating the unknown parameters of the environmental pollution process $S(t)$, the kinds and the number $v$ of its states $s_1, s_2, \ldots, s_v$ should be fixed and defined. The identification of the environmental pollution process $S(t)$ is based on a number of its realisations. The matrix of realisations $n_{kl}$, $k, l = 1, 2, \ldots, v$, of the numbers of the process $S(t)$ transitions between the states $s_k$ and $s_l$ during the experimental time is fixed
$$[n_{kl}]_{v \times v}. \tag{4}$$
Taking into account the numbers given in the matrix (Eq. (4)), the matrix $[p_{kl}]_{v \times v}$, $k, l = 1, 2, \ldots, v$, of realisations of probabilities of the process $S(t)$ transitions between the states $s_k$ and $s_l$ during the experimental time is evaluated, where
$$p_{kl} = \frac{n_{kl}}{n_k} \ \text{for } k \neq l, \qquad p_{kk} = 0, \qquad k, l = 1, 2, \ldots, v, \tag{5}$$
and
$$n_k = \sum_{l \neq k} n_{kl}, \qquad k = 1, 2, \ldots, v, \tag{6}$$
is the realisation of the total number of transitions of the process $S(t)$ from the state $s_k$ during the experimental time.
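The estimation of the transition probabilities from the matrix of transition counts can be sketched as follows, assuming the relative-frequency estimator $p_{kl} = n_{kl} / n_k$ with $n_k$ the total number of transitions out of the state $s_k$. The function name and the counts in the example are hypothetical.

```python
def transition_probabilities(n):
    """Estimate p_kl = n_kl / n_k, where n_k is the sum of n_kl over l != k, and p_kk = 0."""
    v = len(n)
    p = [[0.0] * v for _ in range(v)]
    totals = []
    for k in range(v):
        n_k = sum(n[k][l] for l in range(v) if l != k)
        totals.append(n_k)
        if n_k == 0:
            continue  # state k was never left during the experiment; its row stays zero
        for l in range(v):
            if l != k:
                p[k][l] = n[k][l] / n_k
    return p, totals


# Hypothetical counts of transitions between three states.
counts = [[0, 8, 2],
          [6, 0, 4],
          [1, 3, 0]]
p_hat, n_totals = transition_probabilities(counts)
```

Rows belonging to states that were never left are kept as zeros rather than raising an error, since such states carry no information about outgoing transitions.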
Further, the hypotheses on the distribution functions of the process $S(t)$ conditional sojourn times $\theta_{kl}$, $k, l = 1, 2, \ldots, v$, $k \neq l$, at the state $s_k$ while the next transition is to the state $s_l$ are formulated and verified on the basis of their realisations $\theta_{kl}^{\gamma}$, $\gamma = 1, 2, \ldots, n_{kl}$.
In order to estimate the distribution parameters of the conditional sojourn times $\theta_{kl}$, $k, l = 1, 2, \ldots, v$, $k \neq l$, of the process $S(t)$ at its particular states, the empirical characteristics of their realisations at these states should be determined [39]. They include the realisation of the mean value
$$\bar{\theta}_{kl} = \frac{1}{n_{kl}} \sum_{\gamma=1}^{n_{kl}} \theta_{kl}^{\gamma} \tag{7}$$
and the numbers of realisations in the disjoint intervals $I_j$, $j = 1, 2, \ldots, r_{kl}$, that include the realisations
$$n_{kl}^{j} = \#\{\gamma : \theta_{kl}^{\gamma} \in I_j,\ \gamma \in \{1, 2, \ldots, n_{kl}\}\}, \qquad j = 1, 2, \ldots, r_{kl}. \tag{12}$$
In order to formulate and then to verify the nonparametric hypothesis on the form of the distribution of the environmental pollution process's conditional sojourn time $\theta_{kl}$, $k, l = 1, 2, \ldots, v$, $k \neq l$, at the state $s_k$ while the next transition is to the state $s_l$, on the basis of its realisations $\theta_{kl}^{\gamma}$, $\gamma = 1, 2, \ldots, n_{kl}$, the procedure adopted from [39] is applied as follows:
• to construct and to plot the realisation of the histogram of the environmental pollution process's conditional sojourn time $\theta_{kl}$ at the state $s_k$, defined by the formula
$$h_{n_{kl}}(t) = \frac{n_{kl}^{j}}{n_{kl}} \quad \text{for } t \in I_j, \tag{13}$$
• to compare the histogram $h_{n_{kl}}(t)$ with the graphs of the density functions given in Chapter 2 in [39], to select one of them and to formulate the following hypothesis $H$ on the unknown form of the distribution of the conditional sojourn time $\theta_{kl}$: the environmental pollution process's conditional sojourn time $\theta_{kl}$ at the state $s_k$ while the next transition is to the state $s_l$ has the distribution expressed with the density function $h_{kl}(t)$
• to join the intervals $I_j$ having the number $n_{kl}^{j}$ of realisations less than 4 with the neighbouring ones $I_{j+1}$ or $I_{j-1}$, so that the number of realisations is not less than 4 in all intervals
• to fix a new number of intervals $\bar{r}_{kl}$
• to determine new intervals $\bar{I}_j$, $j = 1, 2, \ldots, \bar{r}_{kl}$
• to calculate the realisation $u_{n_{kl}}$ of the statistic $U_{n_{kl}}$ for the new intervals [39]
• to fix the number $\bar{r}_{kl} - z - 1$ of degrees of freedom, where, according to [39], $z = 0$ for the chimney distribution function $H_{kl}(t)$, $z = 1$ for the exponential distribution function $H_{kl}(t)$, and $z = 2$ for the Gamma distribution function $H_{kl}(t)$
• to read the value $u_{\alpha}$ from tables of the $\chi^2$-Pearson's distribution for the fixed values of the significance level $\alpha$ and the number of degrees of freedom $\bar{r}_{kl} - z - 1$ such that the equality
$$P(U_{n_{kl}} > u_{\alpha}) = \alpha$$
holds
• to determine the acceptance and critical domains in the form of the intervals $\langle 0, u_{\alpha} \rangle$ and $(u_{\alpha}, +\infty)$, respectively
• to compare the critical value $u_{\alpha}$ read from tables of the $\chi^2$-Pearson's distribution with the obtained value $u_{n_{kl}}$ of the realisation of the statistic $U_{n_{kl}}$ and to decide on the formulated hypothesis $H$ in the following way: if the value $u_{n_{kl}}$ does not belong to the critical domain, i.e. when $u_{n_{kl}} \leq u_{\alpha}$, then the hypothesis $H$ is not rejected; otherwise, if the value $u_{n_{kl}}$ belongs to the critical domain, i.e. when $u_{n_{kl}} > u_{\alpha}$, then the hypothesis $H$ is rejected.
Finally, the mean values $M_{kl}$ of the conditional sojourn times $\theta_{kl}$ are determined as follows [39]:
$$M_{kl} = E[\theta_{kl}] = \int_{0}^{\infty} t\, h_{kl}(t)\, dt, \qquad k, l = 1, 2, \ldots, v,\ k \neq l. \tag{17}$$
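The joining of sparse intervals and the comparison with the critical value can be sketched in code. This is a sketch under stated assumptions rather than the exact procedure of [39]: the statistic is taken to be the textbook Pearson form $u = \sum_j (n_j - n p_j)^2 / (n p_j)$ with $p_j$ the hypothetical probability of the $j$-th interval, the last interval is assumed to extend to infinity so that the probabilities sum to one, and the function names, counts and the exponential rate are hypothetical.

```python
import math


def merge_sparse_intervals(edges, counts, min_count=4):
    """Join neighbouring intervals until each holds at least min_count realisations.

    Interval j is <edges[j], edges[j + 1]); len(edges) == len(counts) + 1.
    """
    edges, counts = list(edges), list(counts)
    j = 0
    while j < len(counts) and len(counts) > 1:
        if counts[j] >= min_count:
            j += 1
        elif j + 1 < len(counts):       # join I_j with I_{j+1}
            counts[j] += counts.pop(j + 1)
            del edges[j + 1]
        else:                           # last interval: join it with I_{j-1}
            counts[j - 1] += counts.pop(j)
            del edges[j]
            j -= 1
    return edges, counts


def pearson_statistic(edges, counts, cdf):
    """Textbook Pearson statistic for interval counts and a hypothesised distribution function."""
    n = sum(counts)
    u = 0.0
    for j, n_j in enumerate(counts):
        p_j = cdf(edges[j + 1]) - cdf(edges[j])
        u += (n_j - n * p_j) ** 2 / (n * p_j)
    return u


# Hypothetical interval counts; the last interval is open-ended.
edges = [0, 1, 2, 3, 4, math.inf]
counts = [10, 6, 3, 2, 1]
new_edges, new_counts = merge_sparse_intervals(edges, counts)

rate = 0.5                              # assumed parameter of the exponential hypothesis
u = pearson_statistic(new_edges, new_counts, lambda t: 1.0 - math.exp(-rate * t))

# Three intervals remain and z = 1 for the exponential hypothesis, so there is
# 3 - 1 - 1 = 1 degree of freedom; 3.841 is the tabulated chi-square value for alpha = 0.05.
u_alpha = 3.841
reject = u > u_alpha
```

The critical value is passed in as a constant read from tables, as in the text, so the sketch needs no statistical library.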

Prediction of Environmental Pollution Process
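It is assumed that the unknown parameters of the environmental pollution process $S(t)$ have been identified using the procedure given in "Sect. 2.2". Now, the main characteristics of the environmental pollution process $S(t)$ can be predicted. Namely, taking into account the formula for the total probability, the unconditional distribution functions of the sojourn times $\theta_k$, $k = 1, 2, \ldots, v$, at the particular states $s_k$ of the process $S(t)$ are determined by
$$H_k(t) = \sum_{l=1}^{v} p_{kl} H_{kl}(t), \qquad k = 1, 2, \ldots, v, \tag{18}$$
which correspond to the density functions given by
$$h_k(t) = \sum_{l=1}^{v} p_{kl} h_{kl}(t), \qquad k = 1, 2, \ldots, v. \tag{19}$$
Hence, the expected values $E[\theta_k]$ of the variables $\theta_k$ are given by
$$M_k = E[\theta_k] = \sum_{l=1}^{v} p_{kl} M_{kl}, \qquad k = 1, 2, \ldots, v, \tag{20}$$
where $p_{kl}$ are defined by Eq. (5) and $M_{kl}$ are defined by Eq. (17).
The limit values of the transient probabilities of the process $S(t)$ at its particular states are calculated according to the formula
$$p_k = \frac{\pi_k M_k}{\sum_{l=1}^{v} \pi_l M_l}, \qquad k = 1, 2, \ldots, v. \tag{21}$$
The probabilities $\pi_k$ satisfy the following system of equations:
$$[\pi_k] = [\pi_k][p_{kl}], \qquad \sum_{l=1}^{v} \pi_l = 1, \tag{22}$$
where $[\pi_k] = [\pi_1, \pi_2, \ldots, \pi_v]$ and $[p_{kl}]$ is given by Eq. (1).
The asymptotic distribution of the total sojourn time $\hat{\theta}_k$ at the state $s_k$, $k = 1, 2, \ldots, v$, of the process $S(t)$ in the time interval $\langle 0, \theta \rangle$, $\theta > 0$, is normal with the expected value
$$E[\hat{\theta}_k] = p_k \theta,$$
where $p_k$ are given by Eq. (21).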
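The prediction step can be sketched numerically, assuming the standard semi-Markov relations: $M_k = \sum_l p_{kl} M_{kl}$, the stationary vector $\pi = \pi P$ with $\sum_l \pi_l = 1$, the limit probabilities $p_k = \pi_k M_k / \sum_l \pi_l M_l$, and the expected total sojourn time $p_k \theta$. The helper names and the numbers are hypothetical, and the embedded chain is assumed to be irreducible so that the linear system has a unique solution.

```python
def solve_linear(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def limit_probabilities(p, m_cond):
    """Return (M, pi, limits) for transition matrix p and conditional mean sojourn times m_cond."""
    v = len(p)
    # Unconditional mean sojourn times: M_k = sum_l p_kl * M_kl.
    M = [sum(p[k][l] * m_cond[k][l] for l in range(v)) for k in range(v)]
    # Balance equations sum_l pi_l * (p_lk - delta_lk) = 0; the last one is
    # replaced by the normalisation sum_l pi_l = 1.
    a = [[p[l][k] - (1.0 if l == k else 0.0) for l in range(v)] for k in range(v)]
    b = [0.0] * v
    a[-1] = [1.0] * v
    b[-1] = 1.0
    pi = solve_linear(a, b)
    total = sum(pi[l] * M[l] for l in range(v))
    return M, pi, [pi[k] * M[k] / total for k in range(v)]


# Hypothetical three-state example.
P = [[0.0, 0.7, 0.3],
     [0.6, 0.0, 0.4],
     [0.5, 0.5, 0.0]]
M_COND = [[0.0, 2.0, 1.0],
          [5.0, 0.0, 3.0],
          [1.0, 4.0, 0.0]]
M, pi, limits = limit_probabilities(P, M_COND)

theta = 120.0                                   # length of the time interval
expected_total = [p_k * theta for p_k in limits]
```

A direct linear solve is used instead of iterating the chain because a simple power iteration need not converge when the embedded chain is periodic.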

Application-Preliminary Analysis of (Air) Environmental Pollution Process Generated by Sulphur Dioxide
Sulphur dioxide (SO₂) is an invisible gas with an unpleasant, pungent odour. It reacts easily with other substances to form harmful compounds, such as sulphuric acid, sulphurous acid and sulphate particles. In urbanised and industrial areas, sulphur dioxide is formed by burning coal in domestic fireplaces and by the combustion of fossil fuels containing sulphur or sulphur compounds. Flue gas is therefore the major anthropogenic source of sulphur dioxide in the air. Fossil fuels have different concentrations of sulphur and sulphur compounds: coal and oil may contain up to 3% of these substances, whereas natural gas may be completely free of them. Chemical reactions in the air transform SO₂ into sulphuric acid (H₂SO₄), which condenses into droplets, dissolves in the moisture of the air (drops of rain, snow, clouds) and, as so-called acid rain, reaches the surface of the earth as well as rivers, seas, oceans and other water areas.
Sulphur dioxide affects both health and the environment. It harms the human respiratory system, reduces lung function and makes breathing difficult. People with asthma and chronic lung diseases are therefore more sensitive to these effects than healthy individuals. SO₂ deposition causes the destruction of vegetation and the degradation of soils and building materials. Due to the harmful properties of SO₂, limit values of its ambient concentration corresponding to different levels of health concern are distinguished (Table 1). These values are also used as a component of the air quality indicators. The pollution levels presented in Table 1 correspond to the Polish ones published by the Chief Inspectorate for Environmental Protection.
In the experiment, the SO₂ concentration data come from the monitoring station AM3 located in Gdańsk-Nowy Port (Fig. 1) and are freely accessible at https://powietrze.gios.gov.pl/. The AM3 is one of nine stations belonging to the ARMAAG monitoring network of the Tri-City (Gdynia, Sopot and Gdańsk) agglomeration in Poland. This agglomeration is situated in Pomerania, the northern coastal part of Poland, and has a population of over 1 million people. The area is affected by pollution coming from industrial sectors as well as transport and domestic sources. Within the ARMAAG monitoring system, continuous measurements of the air quality (recorded every hour) are taken at several representative points where the concentrations of pollutants are the highest.

Modelling (Air) Environmental Pollution Process Generated by Sulphur Dioxide
Under the assumption that the sulphur dioxide concentration in the air changes in time, and taking into account the data from Table 1, $v = 9$ sulphur dioxide concentration states $s_k$, $k = 1, 2, \ldots, 9$, of the environmental pollution process $S(t)$ are arbitrarily distinguished (level 1 from Table 1 is divided into four sublevels expressed with the states $s_1$, $s_2$, $s_3$ and $s_4$, respectively). Then, according to Eqs. (1)-(3), the environmental pollution process $S(t)$ is expressed by the matrix of probabilities $[p_{kl}]_{9 \times 9}$ of transitions between the particular states and the matrix of distribution functions $[H_{kl}(t)]_{9 \times 9}$ or, equivalently, by the matrix of the corresponding density functions $[h_{kl}(t)]_{9 \times 9}$ of the conditional sojourn times at the particular states.

Identification of (Air) Environmental Pollution Process Generated by Sulphur Dioxide
The experiment was performed in Gdańsk-Nowy Port (Fig. 1) during a 120-day period (8th Oct 2019 to 4th Feb 2020), and the statistical data coming from the real realisation were collected and are given in Appendix. During this experiment, no realisations were observed at the states $s_6$, $s_7$, $s_8$ and $s_9$; therefore, the matrix of realisations $[n_{kl}]$, $k, l = 1, 2, \ldots, 5$, $k \neq l$, of the numbers of the process $S(t)$ transitions between the states $s_k$ and $s_l$ during the experimental time is fixed and expressed according to Eq. (4).
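The step from hourly observations to the transition counts $n_{kl}$ and the realisations of the conditional sojourn times can be sketched as follows. The sketch assumes that each hourly measurement has already been assigned a state index; the function name and the short sequence of states are hypothetical and are not the data from Appendix.

```python
from itertools import groupby


def transitions_and_sojourns(states):
    """Extract transition counts and conditional sojourn times from an hourly state sequence.

    Returns (n, theta): n[(k, l)] is the number of k -> l transitions and
    theta[(k, l)] is the list of sojourn times (in hours) at k before moving to l.
    """
    # Collapse the sequence into runs of identical states: (state, length in hours).
    runs = [(k, sum(1 for _ in group)) for k, group in groupby(states)]
    n, theta = {}, {}
    # The final run is skipped because its next state is not observed.
    for (k, hours), (l, _) in zip(runs, runs[1:]):
        n[(k, l)] = n.get((k, l), 0) + 1
        theta.setdefault((k, l), []).append(hours)
    return n, theta


# Hypothetical hourly sequence of state indices.
hourly_states = [2, 2, 2, 1, 1, 2, 4, 4, 4, 4, 2, 1]
n_kl, theta_kl = transitions_and_sojourns(hourly_states)
```

The first run may also be incomplete, since the process could have entered that state before the observation started; whether to keep it is a modelling choice that this sketch leaves to the user.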
Hence, according to Eq. (6), the realisations of the total numbers $n_k$ of the process $S(t)$ transitions from the state $s_k$, $k = 1, 2, \ldots, 5$, during the experimental time are obtained. Further, applying Eq. (5), the matrix $[p_{kl}]_{5 \times 5}$, $k, l = 1, 2, \ldots, 5$, $k \neq l$, of realisations of probabilities of the process $S(t)$ transitions between the states $s_k$ and $s_l$ during the experimental time is fixed.
Applying the procedure and formulae given in "Sect. 2.2", and based on the data given in Appendix, the empirical parameters of the conditional sojourn times $\theta_{kl}$, $k, l = 1, 2, \ldots, 5$, $k \neq l$, of the process $S(t)$ can be determined. The conditional sojourn time $\theta_{21}$ is used as an example of the application of this procedure. It has a sufficiently numerous set of realisations: it assumed $n_{21} = 143$ values presented in Appendix.
Using the procedure given in "Sect. 2.2" as well as the data given in Appendix and the empirical results for $\theta_{21}$, the hypotheses concerning the forms of the distributions of the environmental pollution process's conditional sojourn times $\theta_{kl}$, $k, l = 1, 2, \ldots, 5$, $k \neq l$, at the particular states may be verified. To do this, a sufficiently numerous set of realisations of these variables is needed, which means that the sets of realisations coming from the experiment should contain at least 30 elements (see Appendix). The conditional sojourn time $\theta_{21}$ has the most numerous set of realisations. The realisation of its histogram $h_{21}(t)$, defined by Eq. (13), is presented in Table 2 and illustrated in Fig. 2.
After comparing the realisation of the histogram $h_{21}(t)$ with the graphs of the density functions of the distributions distinguished in Chapter 2 in [39], the following hypothesis $H$ is formulated: the environmental pollution process's conditional sojourn time $\theta_{21}$ at the state $s_2$ when the next transition is to the state $s_1$ has the exponential distribution expressed with the density function given by Eq. (28). The unknown parameters of the hypothetical density function (Eq. (28)) are estimated using (4.13) in [39], which gives Eq. (29). Next, Eq. (29) is substituted into the hypothetical density function and the hypothesis $H$ is verified following the procedure given in "Sect. 2.2".
Table 2 The realisation of the histogram of the environmental pollution process's conditional sojourn time $\theta_{21}$
Proceeding in the analogous way, based on the data given in Appendix, the forms of the particular density functions $h_{kl}(t)$ of the environmental pollution process's conditional sojourn times $\theta_{kl}$, $k, l = 1, 2, \ldots, 5$, $k \neq l$, that have a sufficient number of realisations at the particular states are identified.
When there are fewer than 30 realisations of a conditional sojourn time $\theta_{kl}$, $k, l = 1, 2, \ldots, 5$, $k \neq l$, it is assumed that it has the empirical density function given by
$$h_{kl}(t) = \frac{1}{n_{kl}} \#\{\gamma : \theta_{kl}^{\gamma} \in I_j,\ \gamma \in \{1, 2, \ldots, n_{kl}\}\} \quad \text{for } t \in I_j,\ j = 1, 2, \ldots, r_{kl},$$
that complies with the corresponding empirical distribution function $H_{kl}(t)$ for $t \geq 0$, $k, l = 1, 2, \ldots, 5$, $k \neq l$ (the number of elements of a set is expressed with the symbol #). For instance, the environmental pollution process's conditional sojourn time $\theta_{24}$ assumed $n_{24} = 15$ values given in Appendix. The ordered sample of realisations of $\theta_{24}$ is 1, 1, 1, 1, 1, 2, 2, 3, 5, 6, 9, 14, 18, 21, 21. Thus, the conditional sojourn time $\theta_{24}$ has the empirical density function and the empirical distribution function determined by this sample. Proceeding in the analogous way, based on the data given in Appendix, it is assumed that the conditional sojourn times $\theta_{42}$, $\theta_{45}$, $\theta_{53}$ and $\theta_{54}$ also have empirical density functions.
When only the number of realisations of the process $S(t)$ is known and all these realisations are equal to an approximate value, it is assumed that such a conditional sojourn time $\theta_{kl}$, $k, l = 1, 2, \ldots, 5$, $k \neq l$, has the uniform distribution in the interval from this value minus its half to this value plus its half (Bogalecka, 2020). For instance, the environmental pollution process's conditional sojourn time $\theta_{35}$ assumed $n_{35} = 7$ values given in Appendix, all of them equal to 1. Thus, the conditional sojourn time $\theta_{35}$ has the uniform density function and the distribution function respectively given by
$$h_{35}(t) = \begin{cases} 1, & 0.5 \leq t \leq 1.5, \\ 0, & \text{otherwise}, \end{cases} \qquad H_{35}(t) = \begin{cases} 0, & t \leq 0.5, \\ t - 0.5, & 0.5 \leq t \leq 1.5, \\ 1, & t \geq 1.5. \end{cases}$$
Proceeding in the analogous way, based on the data given in Appendix, it is assumed that the conditional sojourn times $\theta_{13}$, $\theta_{15}$ and $\theta_{25}$ also have uniform density functions.
After accepting the density functions of the particular conditional sojourn times $\theta_{kl}$, $k, l = 1, 2, \ldots, 5$, $k \neq l$, of the process $S(t)$, Eq. (17) is applied to find their mean values $M_{kl} = E[\theta_{kl}]$. In other cases, when the statistical identification of the distributions of the environmental pollution process's conditional sojourn times at the particular states is not possible because of the lack of sufficient numbers of their realisations, the approximate empirical values of the mean values $M_{kl} = E[\theta_{kl}]$ of the conditional sojourn times at the particular states are calculated using Eq. (7). The results are given in the matrix $[M_{kl}]_{5 \times 5}$ (Eq. (48)).
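The three cases distinguished in this section (at least 30 realisations, fewer than 30 realisations, and realisations that are all equal) can be summarised in a short sketch. The function names are hypothetical, the order in which the cases are checked is an assumption, and the empirical distribution function is computed as a plain step function with the convention $H(t) = P(\theta < t)$, whereas the paper works with intervals $I_j$ that are not reproduced here.

```python
from bisect import bisect_left


def empirical_cdf(sample, t):
    """Fraction of realisations strictly smaller than t (assumed convention H(t) = P(theta < t))."""
    ordered = sorted(sample)
    return bisect_left(ordered, t) / len(ordered)


def choose_model(sample, min_size=30):
    """Return the kind of distribution assumed for a set of realisations, with its parameters."""
    if len(set(sample)) == 1:
        value = sample[0]
        # All realisations equal: uniform on the interval value -/+ half of the value.
        return "uniform", (value - value / 2, value + value / 2)
    if len(sample) < min_size:
        return "empirical", None
    return "parametric, to be verified by the chi-square test", None


theta_24 = [1, 1, 1, 1, 1, 2, 2, 3, 5, 6, 9, 14, 18, 21, 21]   # ordered sample quoted in the text
theta_35 = [1] * 7                                             # seven realisations, all equal to 1

mean_24 = sum(theta_24) / len(theta_24)    # empirical mean value of the sample
kind_24, _ = choose_model(theta_24)
kind_35, interval_35 = choose_model(theta_35)
```

For the quoted sample of $\theta_{24}$ the empirical mean is 106/15, approximately 7.07, and for $\theta_{35}$ the rule yields the interval from 0.5 to 1.5, whose midpoint 1 is the mean of the corresponding uniform distribution.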

Prediction of (Air) Environmental Pollution Process Generated by Sulphur Dioxide
The environmental pollution process $S(t)$ is identified in "Sect. 3.2". Now, its main characteristics may be predicted using the procedure presented in "Sect. 2.3". Applying Eq. (20) and considering Eqs. (24) and (48), the approximate mean values $M_k$, $k = 1, 2, \ldots, 5$, of the unconditional sojourn times $\theta_k$, $k = 1, 2, \ldots, 5$, can be evaluated; only the values that are not equal to 0 are presented. To find the limit values of the transient probabilities $p_k$, $k = 1, 2, \ldots, 5$, at the particular states of the process $S(t)$, the system of equations (Eq. (22)) has to be solved. The results (Eqs. (51)-(52)) are evaluated on the basis of the experiment and the real statistical data; therefore, these values may change and become more precise if the experiment lasts longer.
Moreover, the last results (Eqs. (51)-(52)) can play a practical role in the minimisation of air pollution caused by sulphur dioxide and in the mitigation of the resulting losses, which is the subject of future research.

Conclusion
The model of the environmental pollution process based on the semi-Markov process designed and presented in the paper is a novel approach. The procedure of its practical application is illustrated by the modelling, identification and prediction of the environmental pollution process caused by an air pollutant, namely sulphur dioxide. The proposed method makes it possible to establish the limit values of transient probabilities and the mean values of the total sojourn times at the particular pollution states indicating the concentration of the pollutant (Table 3). This is the first application of the method; therefore, the obtained results should be treated just as an illustration of the proposed method.
The developed general model of the environmental pollution process is a universal tool. It can be used for other environmental pollutants present in air, water or soil [42]. Moreover, the model allows two or more pollutants to be considered in parallel. Accordingly, the next stage of the research will consider the air pollution process generated jointly by SO₂, CO, NO₂, O₃, PM2.5 and PM10, which are commonly used to determine the air quality index (AQI) that is based on the concentrations of these pollutants and describes the air pollution levels.
Acknowledgements Acknowledgment is made to the Chief Inspectorate for Environmental Protection, Poland, for free access to sulphur dioxide concentration data used in the paper.
Funding This work was supported by the Gdynia Maritime University ("Monitoring and analysis of the impact of selected substances and materials in terms of environmental protection"-project grant no. WZNJ/2022/PZ/10).

Availability of Data and Material
The data that support the findings of this study are available free of charge and remain the property of the Chief Inspectorate for Environmental Protection, Poland (https://powietrze.gios.gov.pl/).

Ethics Approval
The author declares no ethical violation during the preparation of this manuscript.

Consent to Participate Not applicable.
Consent for Publication Not applicable.

Competing Interests
The author declares no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.