Stochastic Modeling of Indoor Air Temperature

Temperature is one of the main parameters describing thermal comfort and indoor air quality. In this paper we propose an approach, based on a modification of the continuous time random walk, to model the indoor air temperature. We perform a statistical analysis of the recorded time series, that allows us to point out the main statistical properties of the recorded variable. The obtained conclusions about the nature of the process lead to a continuous time random walk, that in contrast to the classical approach, models time dependence of the jumps distribution. Moreover, we show that the waiting times can be modeled by a tempered stable distribution, which yields a subdiffusive behavior in short times and diffusive behavior in longer times. Finally, by conducting a simulation study we illustrate possible applications of the presented approach in the thermal comfort monitoring and forecasting.

tic and visual comfort. A widely accepted definition of thermal comfort is that it is a state of mind that expresses satisfaction with the thermal environment [ASHRAE 55-2004, EN ISO 7730]. Thermal comfort is a psychological phenomenon, subjective feeling affected by the physical laws of heat transfer. Thermal comfort is influenced by numerous factors. They are usually divided in two groups: environmental and personal. The first group includes air temperature, radiant temperature, air velocity and humidity. The most important personal factors contributing to thermal comfort are: age, gender and health status of occupant, type of clothing (particularly its insulation), activity level (i.e. amount of physical work done) and human physiology resulting in metabolic heat. These factors may be independent of each other, but together they contribute to a thermal comfort.
Thermal environment has been the subject of much international research over many years because the occupants of the buildings frequently suffer from thermal discomfort and this is not a minor problem. Maintaining comfortable thermal conditions inside building is important for health and quality of life of its occupants, appropriate energy consumption profile, as well as ensuring optimum work performance (extremes in air temperature may have adverse effects on productivity). Although a number of modeling approaches exist which help in studying the problem [1], the individual differences in preferences for thermal conditions cause that achieving acceptable comfort level for all occupants is the difficult, if not an impossible, task.
The assessment of thermal environment is often derived for steady state conditions or for minor fluctuations of one or more of the variables [2]. However thermal conditions inside rooms are seldom steady, due to the interaction between building structure, meteorology, occupancy, and heat, ventilation and air conditioning (HVAC) system [3]. The thermal conditions vary throughout a space and also with time (during the day, night and seasons). Hence the real time indoor environment monitoring system based on continuous measurements of selected physical parameters has vast prospects for human thermal comfort assessment.
Temporal variability of temperature causes that the assessment of this parameter based on a single or several periodic measurements may be unrepresentative. Therefore continuous monitoring and statistical analysis of measurement results is preferred [4]. The statistical description of transient states is based on an analysis of time series [5]. The effectiveness of this method depends upon sample sizes, averaging times, and sampling strategies. For example, the statistical description of dynamic conditions can be based on short or long-time average values. Unfortunately, the long-time average value gives little or no information on the thermal episodes that occur infrequently. Measurements that are averaged over shorter periods allow for recognizing these episodes. However, they reflect temporary conditions that are not representative of longer time periods. Therefore, short-term measurements may cause large uncertainties, potentially leading to incorrect conclusions, unsatisfactory performance of HVAC systems, or unnecessary costs [6].
Considering the shortcomings of traditional short-term measurement strategy, we proposed another approach using longer measurement periods and statistical data analysis. In this work, we want to show that a stochastic model of time series of temperature is a source of information about the state of indoor air. This methodology is a continuation of the analysis of indoor air quality time series presented in [7], where a subdiffusive process was analyzed in the context of experimentally recorded data. In this paper we extend the analysis presented in [7] and, based on statistical analysis of the observed time series, propose to use a model that apart from the observed constant time periods accounts also for the seasonal behavior of the underlying time series. The model is based on a modification of the continuous time random walk scheme.
The standard random walk describes a motion of a particle that in each time step makes a jump randomly in one of the possible directions. However, both the random walk and it's continuous time limit-Brownian motion, being a milestone of the statistical thermodynamics, are not capable of explaining a transport phenomena in complex systems such as glasses, liquid crystals, polymers, living cells or ecosystems. In contrast, in the continuous random walk (CTRW) scheme, originally introduced in [8], the waiting times between the jumps are not constant, as in the standard random walk, but are random variables governed by some probability law. As a consequence, the CTRW model seems to be a natural description of transport in crowded environments and complex systems. Limit distribution of the CTRW scheme can be also formulated within the framework of the celebrated fractional Fokker-Planck equation [9]. On the other hand, CTRW-if properly scaled-in the limit leads to a continuous time subordinated processes described by a Langevin type equations, see e.g. [10,11]. Such processes are in the origin of rapidly developing in the recent years field of anomalous diffusion, see e.g. [12]. In this paper we propose to use the tempered stable distribution [13] as an appropriate for the description of the waiting times and, consequently, the subordinator. This distribution has many interesting properties, like e.g. finite moments of all orders. Moreover it stays close to the purely α-stable family. It is worth mentioning, that this class of distributions allows for modeling data that demonstrate subdiffusive behavior at short times and normal at longer times.
The paper is structured as follows: in Sect. 2 we describe in details the analyzed time series. Next, in Sect. 3 we introduce the proposed model, i.e., the continuous time random walk that takes into account the properties observed in the considered data such as the apparent constant time periods and the seasonality. Moreover, we present the main properties of the proposed system. In Sect. 4 we analyze the temperature time series and apply the introduced discrete model. Finally, Sect. 5 consists of the simulation study and the last section concludes the paper.

Description of the Analyzed Datasets
As it was mentioned above, temperature is an important, measurable parameter associated with thermal comfort. Therefore our attention was focused on an analysis of time series of this variable. The present study was restricted to conditions characteristic for a classroom. The analysis of measurement results covers three days in June 2012, although the more extensive data set is already available. The classroom is involved in a long term indoor air quality (IAQ) monitoring program. The three days were selected to represent indoor conditions, which could be considered as the baseline for the studied space. The classroom was not equipped with mechanical ventilation. Its windows were sealed. In course of measurement series, the windows were not occluded and the room was closed. Therefore, except for cleaning operations in the morning the indoor air was influenced exclusively by the outdoor conditions.
The measurement device was located in the central part of the room, at the height of about 1 m. The temperature data was recorded with 15 s time resolution. The measurement accuracy was ±0.1°C.
We plotted the recorded time series in Fig. 1. Looking at the chart one easily notices time dependence of the temperature values. It may be observed that the presented time series exhibits a very characteristic behavior. The first aspect of it is the well pronounced seasonality. The temperature rises during the day and it falls during the night, in cycles. The second characteristic feature of the time series are the noticeable time periods when the temperature has

Continuous Time Random Walk
The observed data is recorded in discrete setting for both time and scale. Hence, in order to describe the dynamics of the analyzed dataset, we use the continuous time random walk (CTRW) methodology [8]. Recall, that in the classical approach the CTRW process is defined as: where the counting process {L t } is given by with the sequence {T n } ∞ n=1 of nonnegative independent identically distributed (i.i.d.) random variables representing the waiting times. The sequence {X n } ∞ n=1 represents the jumps. The process {L t } is often referred to as the renewal process or, alternatively, as the counting process [14]. Moreover the sequences {T n } and {X n } are assumed to be independent.
In this paper we propose to use a modification of the classical CTRW scenario, in which the distribution of the jump sizes depends on the actual time of the jump, while the waiting times are governed by the tempered stable law. In the following we provide a motivation and a detailed description of the proposed model.
The analyzed dataset exhibits seasonality (see Fig. 1) and the jumps can take only two values a and −a (due to the sensitivity of the sensor). Therefore, we consider a binomial model with a particle jumping between two sites, where a jump length is equal to a. The probability of the jump is governed by a periodic function that depends on the actual time t and therefore the jumps X t for each t have the following form: Such specification is an analogy to the field induced CTRW, analyzed in [16], with p t = 1 2 (1 + f (t)) for f being a periodic function.
Further, the statistical analysis of the considered dataset (see Sect. 4) indicates that the waiting times can be described by the tempered stable distribution. Therefore we assume that the sequence {T n } constitutes a sample of independent random variables from the tempered stable distribution, i.e., for each n the random variable T n has the following Laplace transform [17]: for some parameters c, λ > 0 and 0 < α < 1. The tempered stable distribution appears to be an important alternative to the power law distributions in confined systems, due to the finite second moment. For the simplicity in the further analysis we assume the scale parameter c = 1. The tempered stable distribution is related to the α-stable one through the operation of 'tempering'. Namely, the probability density function (pdf) of tempered stable random variable can be expressed in the following form: where U is a totally skewed α-stable random variable and g U is it's pdf. As we observe, for λ = 0 the tempered stable distribution defined by the Laplace transform in (3) becomes the α-stable random variable U . The tempered stable distribution of waiting times was considered in [17][18][19] from both theoretical and practical point of view. Moreover the rich class of tempered stable distributions was examined in [13,[20][21][22][23]. Some interesting applications of the class of tempered stable distributions one can find also in [24,25].
To sum up, we assume that the analyzed dataset can be modeled by the process defined as: where tempered stable random variables, L t = max{n ∈ N : Θ n ≤ t} and the distribution of jumps X j is defined in (2). Observe that, in contrast to the classical CTRW assuming that the waiting times and the jumps are independent, here the distribution of the jump depends on the actual time.
Using the relation between the counting process {L t } and the waiting times vector {T n } we can find the distribution of the process {L t }: where F T (·) is a cumulative distribution function (cdf) for the tempered stable random variable T and F * k T (·) denotes the cdf of the convolution of k tempered stable distributions. Let us mention, the convolution of k tempered stable random variables with scale parameter c = 1 has tempered stable distribution with scale parameter k.
In Fig. 2 we plot the first and the second moment of the process {Y (t)} calculated from the simulated trajectories of the considered model with the parameters estimated from the analyzed dataset (for details see Sect. 4).

Fig. 2
The first and the second moment calculated from 10000 simulated trajectories of the CTRW model (see (5)) with parameters estimated from the analyzed dataset (for details see Sect. 4)

The Subordinated Process as a Limit of the CTRW Model
The CTRW model defined in (5) has the following limiting process [10]: where {B(t)} is the classical Brownian motion and {S(t)} is the inverse subordinator. In the case of the tempered stable distribution of waiting times in the corresponding CTRW model, the process {S(t)} is the inverse tempered stable subordinator, being a limit of the counting process {L t }, see (5). The inverse tempered stable subordinator is defined as follows, [26]: where {T (τ )} is a tempered stable Lévy process, i.e., a process with independent stationary increments having the Laplace transform given by: Process {Z(t)} defined in (6) is a special case of a system defined in [10]. It arises as a combination of the external process {X(τ )} which satisfies appropriate Langevin equation and inverse tempered stable subordinator. More precisely {Z(t)} can be expressed as follows: where {X(τ )} is a solution of the Langevin equation: Some properties of the inverse tempered stable process one can find in [27] but also in [17], where the inverse tempered stable subordinator was analyzed in comparison with the tempered stable Lévy process. Below, we recall a formula for the pdf of the inverse tempered stable subordinator [17], namely For large and small t the density g S(t) , (9), can be approximated using the following relation: where g T (x) (·) is a pdf of a tempered stable random variable with the Laplace transform given in (8). The above relation is the extension of the similar behavior of stable subordinator and its inverse. This property is fully examined in [15]. The process {Z(t)} defined in (6) is called the subdiffusion process with time-dependent force and was considered in [10] not only for the case of the tempered stable distribution but for all infinitely divisible distributions of the subordinator. The discrete analogue of the process {Z(t)} was also analyzed in [16], where the authors presented relation between the pdf of the process and the fractional Fokker-Planck formula. Namely, the pdf g Z(t) (·) of the subdiffusion process with time-dependent force satisfies the following fractional Fokker-Planck equation [16]: where the integro-differential operator Φ is defined as follows: for sufficiently smooth function g(·). In the above definition the function M(·) is called the memory kernel and in the case of the tempered stable distribution it has the following representation [28]: where E α,β (·) is the generalized Mittag-Leffler function defined as: .
Using Proposition 1 in [10] we can calculate the moments of the process {Z(t)}: In the special case when the function f (x) = 0 for all x ∈ R the above formulas reduce to [28]:

Temperature Data Analysis
According to our assumption of the CTRW scenario, we apply here a procedure adequate for such kind of processes. Namely, in the first step of our analysis we divide the data into two vectors. The first one, vector T , consists of the lengths of the constant time periods and corresponds to the waiting times T i in the CTRW definition (see Eq. (5)). On the other hand, removing the constant time periods from the dataset, i.e., considering only these time points for which a change in the values of the analyzed process occurs, leads to a vector Y t i corresponding to the jumps of the CTRW process. Precisely, differentiating the vector Y t i we obtain the sizes of the jumps X i , namely X i = Y t i+1 − Y t i . The vectors obtained from the analyzed dataset are plotted in Fig. 3.

Waiting Times
In the next step, we analyze the waiting times properties. According to the CTRW definition, the vector {T i } should form an independent, identically distributed sample. First, we check the independence assumption. We apply a simple visual test that is based on the autocorrelation function of both, the series and the squared series, for details see [29] or [30]. Recall, that the autocorrelation function of a time series is defined as the correlation between equally distant observations as a function of time lag. If there is no dependence in the analyzed data the autocorrelation values should be close to 0. Moreover, the plot should resemble the values expected for a white noise sequence. In Fig. 4 we plot the obtained autocorrelation functions together with the confidence intervals for a white noise. As can be observed, the calculated values are close to 0 and most of them lie within the white noise confidence intervals. Hence, we may conclude that the waiting times are independent.  Next, we check if the waiting times have the same distribution. Here we propose to use a simple visual stationarity test based on the behavior of the empirical second moment of the underlying time series {T i }, i.e. the C j statistic defined as follows: If the analyzed vector constitutes a sample with elements from the same distribution, then the C j statistic is a linear function with respect to j . More details of this visual test one can find in [31]. In Fig. 5 we present the behavior of the C j statistic calculated for the waiting times, i.e. the vector {T i }. On the basis of statistic C i the rigorous regime variance test can be also constructed. The null hypothesis for this tests is defined as follows: the quantiles of the squared time series do not change in time. The hypothesis is satisfied for example in the case of independent identically distributed random variables. For a detailed description of the test see [31]. In our case the obtained p-value is equal to 0.2, what confirms the hypothesis of the same distribution of the waiting times.
Further, we find the waiting times distribution. We consider two types of distributions, namely α-stable and tempered stable. Since the data consists of only integer values (due to the measurement methodology), in order to estimate the distribution parameters, we propose to use a method based on the tail behavior of the data. Namely, we fit a tail function corresponding to the tested distribution to the right empirical tail of the dataset. A similar method was used in [32,33] in the context of the α-stable distribution, as well as in [19] for the tempered stable case. If the data comes form the α-stable distribution, the right tail behaves like a power function t −α . On the other hand, if the sample follows the tempered stable law, then the right tail can be approximated by e −λt t −α , where α and α, λ are the parameters of the α-stable and tempered stable distributions, respectively. In the considered case the above functions are fitted to the right empirical tails of the sample using least squares method. In Fig. 6 we plot the empirical right tail (in double logarithmic scale) and the fitted functions corresponding to both distributions. As we observe, the empirical tail does not behave according to a power law. On the other hand, the tempered stable distribution yields much better fit than the α-stable one and replicates the empirical tail well. The estimated parameters of the tempered stable distribution are λ = 0.01274 and α = 0.2082.

Jumps
Finally, we analyze the distribution of the jump sizes. The recorded temperature values may jump in each time step upward or downward. Due to the measurement accuracy, the size of the jump is always equal to 0.1°C. However, as can be observed in Fig. 1, the probability of upward (or equivalently downward) jump varies with time. Based on these observations we assume that the jumps of the process are given by where p t is a time varying (periodic) function. In order to estimate the probability p t , we use two methods. In the first one we use nonparametric approach and estimate p t as the proportion of upward jumps within a given hour. Precisely, we divide the dataset into 24 subsets corresponding to each hour of the day. Next, for each hour, we calculate the number of upward jumps and divide it by the overall number of jumps within that hour. An advantage of such method is that it does not require an upfront specification of the form of p t . However, the results are only reliable, if there are many data points used in the estimation procedure. Here, even though we have recorded in total 17280 time points, the subsets consist of only from 11 observations (for 2 am) up to 47 observations (for 9 am).
In the second method we assume that the upward jump probability is given by a periodic function of the form p t = a 1 sin(2π/a 2 (t + a 3 )) + a 4 . In order to estimate the parameters a i , we use the maximum likelihood method, i.e. we maximize the function L (x, a 1 , a 2 , a 3 , a 4 is the observed sample and the possible values for x t are ±0.1. Observe that if x t = 0.1, then P (X t = x t |a 1 , a 2 , a 3 , a 4 ) = p t = a 1 sin(2π/a 2 (t +a 3 ))+a 4 . On the other hand, if x t = −0.1, then P (X t = x t |a 1 , a 2 , a 3 , a 4 ) = 1 − p t = 1 − a 1 sin(2π/a 2 (t + a 3 )) − a 4 . Therefore, using the indicator function 1 {·} , we may write that: a 1 , a 2 , a 3 , a 4 In practice the above likelihood function has to be maximized numerically. It should be noted, that the numerical maximization may lead to parameter estimates yielding a value of p t that is negative or higher than one. Hence, in order to avoid such situation, some constraints should be used in the procedure. Here, we assume that a 4 > 0, | a 4 a 1 | ≥ 1 and for a 1 > 0 a 4 −1 a 1 ≤ −1, while for a 1 < 0 a 4 −1 a 1 ≥ 1. The estimated parameters are provided in Table 1. Note that the obtained period of 5731 observations approximately corresponds to a daily periodicity, as there are 5760 observations each day. This is in compliance with the intuition and the visual investigation of Fig. 1.
The values of p t obtained using both proposed methods are plotted in Fig. 7. Observe that, although the curves are not identical, they preserve similar shape with a higher probability of an upward jump at the beginning of the day and lower in the late afternoon and night hours. This result is in compliance with the temperature properties apparent in Fig. 1. Analogously, there is an upward trend of temperature in the morning/early afternoon, which changes into a downward trend in the late afternoon.

Simulation Study
In this section we perform a simulation study using the proposed model and the parameters estimated in the previous section. First, in Fig. 8 we plot a sample simulated CTRW trajec- Next, we calculate the quantile lines, i.e., the curves describing the values that with a given probability will not be exceeded by the corresponding process, see [34] for details. The quantile lines of levels 10 %, 20 %, . . . , 90 % obtained using 10000 simulated trajectories are plotted in Fig. 9. Additionally, in the same figure we plot the recorded temperature values. Observe that the observed temperature does not exceed the bounds given by the 10 % and 90 % quantile lines.
Further, using 10000 simulated trajectories of the CTRW model (5), we plot the time evolution of the model distribution. The obtained probabilities P (Y (t) = x) are plotted in Fig. 10.
Finally, in order to illustrate how beneficial might be the simulations of the proposed model in the air quality monitoring, we calculate the temperature forecast for the next hour  Fig. 11. Moreover, we calculate the 10 % and 90 % confidence bounds for the forecast, i.e., the curves within which a future value should lie with 80 % probability. As can be observed in the figure, the actual measured values lie within the calculated confidence interval and the forecast visually resembles the actual values. Next, we perform a more rigorous validation. Namely, we calculate the mean squared error (MSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) of the forecast for the next 24 hours and the next hour. The obtained values are given in Table 2. The mean percentage error does not exceed 0.02 % for the 24 hours forecast and is no more than 0.002 % for the one hour forecast, so the temperature values are predicted quite accurately. In absolute terms the mean 24 hour error is about 0.4°C, while the mean hourly error about 0.04°C.

Conclusions
Measurements of temperature may be a source of information to characterize the transient state of the thermal environment. The temperature is extremely important to the occupant's perception of indoor air quality and thermal processes occurring inside room [35]. It reflects transient state of thermal environment because of an inherent link with building usage and surroundings [36]. The temperature in a building is dependent on outside temperature, sun loading, plus heating and cooling added by the HVAC system and other sources, e.g. occupants add heat to the room since the normal body temperature is much higher than the room temperature. Therefore, temperature value is the result of a range of interactions affected by seasonal and daily changes in meteorology and by the requirements of occupants varying in time and space [37]. In principle, two major kinds of information are useful for the indoor microclimate specialists. The first kind of information is applicable to the direct control of heat ventilation and air conditioning systems. The data which is useful for this purpose shall be acquired continuously and in real time. The other kind of information serves the assessment of the indoor thermal conditions in relatively long-term perspective. The evaluation shall be based on the time series of the measurement data. Therefore, in this paper we search for mathematical methods which would allow for the complex and quantitative assessment of indoor thermal conditions based on the time series of the measured values of physical/chemical parameters e.g. the temperature. In our opinion this is the way to provide the second kind of information.
In this paper we considered a stochastic system that allows for modeling such time series as the indoor temperature. We took into account the seasonal behavior but also the constant time periods observable in the indoor temperature measurements. More precisely, we proposed to use a generalized CTRW scheme in which the probability of jumps is given by a (periodic) function of time. Moreover we extended the classical model by introducing the tempered stable distribution of waiting times. For such a model we proposed a procedure to verify the model assumptions and to estimate the model parameters. This procedure can be divided into following steps: -divide the analyzed data into two vectors-first corresponding to the waiting times and second to the jump sizes, -check the assumption on the independence of waiting times using autocorrelation function, -check if the waiting times constitute sample from the same distribution by using tests based on the empirical second moment of the underlying time series, -estimate the parameters of the tempered stable distribution using the empirical tail behavior and check the fit of the theoretical tail, -estimate the parameters of the jump probabilities using the maximum likelihood method.
Finally, we have shown in the simulation study that fitting an appropriate model to the analyzed data set might be beneficial in analyzing, monitoring and forecasting the thermal environment indoors. It is worth mentioning, that the presented methods were used for the measurements over short periods that gives the greater reliability of the obtained results.