1 Introduction

It is widely recognized that the concentration of CO2 is an indicator of indoor air quality. Although this gas is not considered to pose serious health risks to occupants, experience indicates that elevated CO2 levels cause drowsiness, lethargy and a general sense that the air is stale. Researchers have demonstrated correlations between concentrations of this gas and reduced productivity as well as a decrease in overall performance (Seppanen et al. 1999; Seppanen and Fisk 2004; Apte et al. 2000a, b; Daisey et al. 2003; Mendell and Heath 2005; Haverinen-Shaughnessy et al. 2011). For these reasons, people are frequently interested in checking whether the observed CO2 concentrations exceed limit or guideline values. However, this type of interest is quite limited. It overlooks the fact that data on indoor CO2 concentration are much more informative. In particular, they may be employed as a source of information on the driving forces of the indoor microclimate.

The residential CO2 concentration is affected by many factors. The major ones are associated with the wide range of indoor sources of this compound, their type (e.g. metabolic or combustion) and their intensity (e.g. human activity). Also highly significant are the operation of the heating, ventilation and air conditioning (HVAC) system and the overall building maintenance. Other contributions come from ambient conditions and the interaction between the building and its surroundings. Essentially, all these factors vary in time, and in various ways they induce perturbations of CO2 concentration. Indeed, continuous measurements demonstrate considerable temporal variation of CO2 in indoor air.

Yet, the focus of this work is not the temporal variation of CO2 itself. We propose to go one step further and to focus on the dynamics of indoor CO2 concentration changes. While the variability of CO2 concentration in time reflects the influence of factors on indoor air, the dynamics of the changes may provide information on the character of their impact. This statement is fundamental to our work. The proposal to study the dynamics of CO2 variation, and to use it as a source of information on the type of collective impact of factors on indoor air, is the major novelty of this work. We focus on the collective impact because in practice it is difficult to separate the influences originating from individual factors.

The traditional temporal analysis of time series refers to the variation in the mean and other standard sample statistics such as variance, standard deviation and range. A major shortcoming of this approach is that these methods do not allow the dynamics of change to be studied. In this work, we propose a method which makes it possible to recognize the character of the dynamics of temporal variation of CO2 concentration. It is our original approach.

The method is based on three major assumptions. (1) The dynamics of indoor CO2 concentration is determined on the basis of a univariate time series of measurement data, which reflects the stochastic dynamics of the values observed over time. The term univariate time series means a set of well-defined data for a single quantity (CO2 concentration) measured at regular time intervals. The data should be collected over a relatively long time. (2) The dynamics maintains a common character over some period of time, i.e. within time intervals. The size of the intervals and their position in the time domain are unknown prior to the analysis. For this reason, the method should make it possible to identify segments of the time series that reflect one kind of dynamics of CO2 concentration. (3) The objective is to distinguish between three types of dynamics of indoor CO2 concentration, namely linear, sub-linear and super-linear. For this purpose we propose to apply the sample mean square displacement (MSD) analysis. More specifically, the individual categories of dynamics are indicated by the parameter β.

In this paper we demonstrate the feasibility of the approach for studying the temporal dynamics of CO2 concentration. In our future work the method will be applied to investigate other parameters of indoor air. Ultimately, we would like to be able to characterize the dynamics of the entire complex system that indoor air constitutes. This may become very useful in diagnosing the microclimate as well as in designing a proper one.

2 Methodology

In this section we present the major elements involved in analyzing the stochastic dynamics of the time series of indoor CO2 concentration. The key element of our proposal is to characterize the dynamics in time intervals which delimit adjacent, non-overlapping fragments of the time series. For those data segments, the dynamics may be described in a consistent manner, with a single value of an indicative parameter.

The main tool which we apply to describe the stochastic dynamics of a fragment of the time series is the time-averaged mean squared displacement (MSD). The choice is based on the fact that for samples which are realizations of processes with stationary increments, the MSD behaves like the power function \(\tau ^{\beta }\), for some parameter \(\beta >0\). The value of this parameter characterizes the stochastic dynamics of the process.

In this paper we are primarily interested in the problem of choosing a proper size of data segment for which the MSD behaves like the mentioned power function.

In order to solve this problem we introduce an information criterion. It takes into consideration the quality of the fit of the power function \(\tau ^{\beta }\) to the MSD, but it also penalizes an excessive number of observations involved in the calculations. On the basis of this criterion we can conclude how many observations should be taken in order to achieve a reliable estimate of β. In other words, we obtain the size of the time window which may be applied to the time series when calculating β. Hence, we obtain the time resolution adequate for analyzing the stochastic dynamics of the time series of an indoor air parameter.

2.1 Mean square displacement

The MSD for the sample \(\{X_{i},\,i=1,2,\ldots,n\}\) of length n with stationary increments is defined as follows (Burnecki and Weron 2010):

$$M_n(\tau )=\frac{1}{n-\tau }\sum _{k=1}^{n-\tau }\left( X_{k+\tau }-X_{k}\right) ^2,$$
(1)

and it is a function of \(\tau \), the time lag between observations. Note that the MSD defined in Eq. (1) is a random variable.
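As an illustration of Eq. (1), the sample MSD can be computed directly from an array of observations. The following is a minimal sketch, not the authors' implementation; the function name `sample_msd` and the argument `max_lag` are hypothetical and chosen only for this example.

```python
import numpy as np


def sample_msd(x, max_lag):
    """Time-averaged MSD of Eq. (1) for lags tau = 1, ..., max_lag."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(x)")
    taus = np.arange(1, max_lag + 1)
    # For each lag, average the squared differences X_{k+tau} - X_k
    # over the n - tau available pairs.
    msd = np.array([np.mean((x[tau:] - x[:-tau]) ** 2) for tau in taus])
    return taus, msd


# Usage: for the straight line x_k = k every difference equals tau,
# so the sample MSD is tau**2 for each lag.
taus, msd = sample_msd(np.arange(10.0), max_lag=3)
print(taus, msd)
```

The loop over lags keeps the correspondence with Eq. (1) explicit; for long series and many lags a vectorized or FFT-based variant may be preferable.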

In general, the MSD can be obtained either by averaging over an ensemble of particles, or by taking the temporal average over a single trajectory. The second approach is used in this paper. Examples of the use of the MSD can be found in Caspi et al. (2000), Golding and Cox (2006), He et al. (2008) and Lubelski et al. (2008).

One of the important properties of a stochastic process \(\{X_t\}\) which can be recognized by using the MSD is its dynamics. For a sample which is a realization of a process \(\{X_t\}\) with stationary increments, the MSD defined in (1) behaves like:

$$M_n(\tau )\sim \tau ^{\beta }.$$
(2)

We classify a process with \(\beta =1\) as exhibiting linear dynamics. If \(\beta <1\), the process is described as having sub-linear dynamics, and \(\beta >1\) indicates super-linear dynamics of the stochastic process. Examples of processes that exhibit non-linear dynamics can be found in Barkai et al. (2012), Burov et al. (2001), Hoefling and Franosch (2013) and Sokolov (2012).

The origin of nonlinear dynamics in a given system is often unknown (Guigas et al. 2007; Szymanski and Weiss 2009). However, (1) sub-linear (\(\beta <1\)) dynamics is usually observed in disordered systems which are in stable conditions (Cushman et al. 2011). Such systems possess memory and exhibit long-range correlations. System variables are typically described by Lévy-type statistics and heavy-tailed distributions. Under sub-linear dynamics, gross system changes are retarded by various factors and their interactions. More specifically, shifts of the system in one direction are quickly compensated by changes in the opposite one. (2) If \(\beta =1\), the sample MSD is proportional to time. This is characteristic of systems with stationary conditions and random fluctuations. Processes with this type of dynamics are local in space and do not have memory. The random variations can be described as a series of transitions characterized by finite length scales for each step and finite time scales between transitions. Mathematically, this represents a sequence of steps taken one after another, where each step follows a random direction which does not depend on that of the previous step (Markov property). One may attribute linear dynamics to neutral conditions in the system, i.e. the situation in which various influences are well balanced. (3) Super-linear growth of the MSD with time \((\beta >1)\) is observed when the system is accelerated, in other words, when it is unstable or in a transient state. Such conditions may be caused by imbalanced influences, e.g. resulting from active transport processes. From a statistical point of view, super-linear dynamics is typical for standard random walks with heavy-tailed jump length distributions. There are many examples of processes with non-linear dynamics. We should mention here Lévy walks (Froemberg and Barkai 2013; Godec and Metzler 2013; Magdziarz and Teuerle 2015), fractional Brownian motion (Deng and Barkai 2009; Jeon and Metzler 2012), the continuous time random walk (Barkai et al. 2000), the heterogeneous diffusion process (Cherstvy et al. 2013; Fulinski 2011) and scaled Brownian motion (Lim and Muniandy 2002). For additional examples we refer the reader to Harder et al. (1987), Iomin (2011), Jeon et al. (2013) and Meroz et al. (2010).

2.2 A method to select optimal sample size for MSD analysis

We assume that we are given a time series from which the deterministic trend and seasonal components have been removed.

First, for the window length m we calculate the sample MSD according to formula (1). More precisely, we divide the sample \(\{X_{i},\,i=1,2,\ldots,n\}\) of length n into non-overlapping sub-samples of length m, and for each sub-sample we calculate the MSD. As a result we obtain \(\left\lfloor n/m\right\rfloor \) MSD functions, one for each sub-sample. We call them \(M_{i,m}(\tau )\) for \(i=1,2,\ldots ,\left\lfloor n/m\right\rfloor \). Next, to each \(M_{i,m}(\tau )\) we fit the power function \(\tau ^{\beta ^m_i}\) by using the least squares method. As a result we obtain the parameter \(\beta ^m_i\) corresponding to each sub-sample. The fitting procedure is described in detail in Maciejewska et al. (2012).
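The windowed estimation described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Maciejewska et al. (2012): the fit is done by ordinary least squares on the logarithms, \(\log M=\beta \log \tau \), without a prefactor, and the lags are taken as \(\tau =1,\ldots ,\) `max_lag`. The function names and the argument `max_lag` are hypothetical.

```python
import numpy as np


def sample_msd(x, max_lag):
    """Sample MSD of Eq. (1) for lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    return np.array([np.mean((x[t:] - x[:-t]) ** 2)
                     for t in range(1, max_lag + 1)])


def fit_beta(msd_values):
    """Least-squares exponent of tau**beta fitted in log-log coordinates.

    Assumption: no prefactor is fitted, so the regression line passes
    through the origin of the (log tau, log MSD) plane. Needs at least
    two lags and strictly positive MSD values.
    """
    log_tau = np.log(np.arange(1, len(msd_values) + 1))
    log_msd = np.log(msd_values)
    return float(log_tau @ log_msd / (log_tau @ log_tau))


def window_betas(x, m, max_lag):
    """Exponent beta for each of the floor(n/m) non-overlapping windows."""
    x = np.asarray(x, dtype=float)
    n_windows = x.size // m
    return np.array([fit_beta(sample_msd(x[i * m:(i + 1) * m], max_lag))
                     for i in range(n_windows)])


# Usage: a straight line has MSD tau**2, so each window gives beta = 2.
betas = window_betas(np.arange(100.0), m=50, max_lag=10)
print(betas)
```

Observations left over after the last complete window are ignored, in line with the \(\left\lfloor n/m\right\rfloor \) count in the text.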

In order to check the quality of the fit, we introduce a measure of fitting accuracy. This measure is defined for \(i=1,2,3,\ldots ,\left\lfloor n/m\right\rfloor \) as the mean square error between \(\log (M_{i,m}(\tau ))\) and the function \(\log (\tau ^{\beta ^m_i})\):

$$e(i,m)=\frac{1}{k}\sum _{j=1}^k\left( \log (M_{i,m}(\tau _j))-\log (\tau _j^{\beta ^m_i})\right) ^2,$$
(3)

where \(\tau _1,\ldots,\tau _k\) are arguments for which the functions are calculated.

In order to find the optimal window length for MSD analysis, we introduce an information criterion. For a particular window length m we define \(IC_m\) as follows:

$$IC_m=\frac{\frac{1}{{\left\lfloor n/m\right\rfloor }}\sum _{i=1}^{\left\lfloor n/m\right\rfloor }e(i,m)}{\max _i\{e(i,m)\}}+\frac{m}{n}.$$
(4)

The criterion is a sum of two components. The first refers to the fitting error when using sub-samples of length m, and the second accounts for the size of the time window itself. The value of the criterion is small when the fitting error is small and when the time window is short. Hence, the minimum value of the statistic \(IC_m\) is considered to indicate the optimal window length in MSD calculations. We denote the optimal window length as \(m_{opt}\). The \(\beta ^{m_{opt}}\) calculated for time windows of that size may be taken into consideration when analyzing the stochastic dynamics of the time series of an indoor air parameter.
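Equations (3) and (4) can be combined into a short routine that scans candidate window lengths and returns the one with the smallest \(IC_m\). The sketch below is illustrative only. It assumes the same prefactor-free log-log fit as in Eq. (3) and lags \(\tau _j=j\) for \(j=1,\ldots ,\) `max_lag`; the set of candidate window lengths, the function names and `max_lag` are hypothetical choices, since the text does not specify them.

```python
import numpy as np


def sample_msd(x, max_lag):
    """Sample MSD of Eq. (1) for lags 1..max_lag."""
    return np.array([np.mean((x[t:] - x[:-t]) ** 2)
                     for t in range(1, max_lag + 1)])


def fit_error(x, max_lag):
    """Mean square error e(i, m) of Eq. (3) for one sub-sample x."""
    log_tau = np.log(np.arange(1, max_lag + 1))
    log_msd = np.log(sample_msd(x, max_lag))
    beta = log_tau @ log_msd / (log_tau @ log_tau)
    return float(np.mean((log_msd - beta * log_tau) ** 2))


def information_criterion(x, m, max_lag):
    """IC_m of Eq. (4): mean error over max error, plus m / n."""
    x = np.asarray(x, dtype=float)
    n = x.size
    errors = np.array([fit_error(x[i * m:(i + 1) * m], max_lag)
                       for i in range(n // m)])
    # Note: undefined if every window is fitted perfectly (max error 0).
    return float(errors.mean() / errors.max() + m / n)


def optimal_window(x, candidates, max_lag):
    """Candidate window length with the smallest IC_m, and all IC values."""
    ics = [information_criterion(x, m, max_lag) for m in candidates]
    return candidates[int(np.argmin(ics))], ics


# Usage on an ordinary random walk (illustrative data only).
rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal(2000))
candidates = [100, 200, 400]
m_opt, ics = optimal_window(walk, candidates, max_lag=10)
```

Because the first term of Eq. (4) never exceeds 1 and the second equals \(m/n\), each value of \(IC_m\) lies between \(m/n\) and \(1+m/n\), which gives a quick sanity check on any implementation.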

3 Experimental

Measurements of carbon dioxide concentration were performed in a lecture room with an amphitheatric layout. The room dimensions are 19 m × 8 m × (4–2.9) m. It has only one external wall, which is fitted with large, openable windows. Although mechanical ventilation is available, air exchange takes place predominantly via natural ventilation. Teaching hours extend from 7:30 to 21:00. Classes are held on all working days and on the majority of weekends (part-time studies). Teaching blocks are typically 1.5 h long with breaks of 15 min in between. Although designed for 90 students, the lecture room is hardly ever occupied to that extent. In the examined period, the number of attendees changed considerably within a single day as well as from day to day.

CO2 measurements were performed in the central part of the room at a height of about 1 m. The measuring device was separated from the direct influence of the emission sources (students and the teacher).

The monitoring was carried out with an instrument dedicated to continuous measurements and data logging. It is based on an NDIR sensor and offers the following measuring characteristics: measuring range 0–5000 ppm, accuracy 50 ppm + 3 % of the measured value, and measurement data resolution 1 ppm. This level of performance may currently be considered standard in indoor air quality studies.

The measurement results were recorded with time resolution of 15 s. In this work we analyzed the data collected during eight consecutive days, between 6th and 14th June 2012.

4 Analysis of simulated data

In this section we test whether the proposed methodology recovers the true dynamics of an exemplary time series. We present the results of applying the methods presented in Sect. 2 to simulated data with known characteristics. The simulated time series was characterized by a time-varying β. More specifically, we assumed that the simulated data consisted of subsamples of equal length, with the parameter β constant within every subsample and changing between consecutive subsamples. Because of the periodicity observed in the real-world data from Sect. 5, we made the parameter β periodic over subsequent subsamples of our simulated data. Nevertheless, our method is capable of coping with a nonperiodic case as well.

In order to obtain the required data we use the fractional Brownian motion (fBM), since it is a process whose MSD behaves asymptotically like a power law function, see Eq. (2), with the power law exponent controlled by the model’s parameter. Namely, the fBM is a zero-mean Gaussian process \(\{B_H(t)\}\) defined as follows (Mandelbrot and Van Ness 1968):

$$B_H(t) = \int _{-\infty}^{\infty }\;\left((t-u)_{+}^{H-1/2}- (-u)_{+}^{H-1/2}\right) dB(u),\quad t\ge 0,$$

where \(\{B(t)\}\) is the classical Brownian motion, \((x)_{+}=\max (x,0)\), and the parameter \(H\in (0,1)\) is called the Hurst exponent. Importantly, the second moment of the process \(\{B_H(t)\}\) is \(\left\langle B_H(t)^2\right\rangle =\sigma ^2t^{2H}\) (for some parameter \(\sigma >0\)), and its sample MSD follows the same power law; therefore \(\beta =2H\).
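The intermediate step behind \(\beta =2H\) can be written out explicitly. It relies only on the standard fact that fBM has stationary increments, so that an increment over a lag \(\tau \) has the same distribution as \(B_H(\tau )\); taking the expectation of Eq. (1) term by term then gives

$$\left\langle \left( B_H(t+\tau )-B_H(t)\right) ^2\right\rangle =\left\langle B_H(\tau )^2\right\rangle =\sigma ^2\tau ^{2H},\qquad \left\langle M_n(\tau )\right\rangle =\frac{1}{n-\tau }\sum _{k=1}^{n-\tau }\sigma ^2\tau ^{2H}=\sigma ^2\tau ^{2H}.$$

Comparing the exponent of \(\tau \) with Eq. (2) yields \(\beta =2H\), so that \(H<1/2\), \(H=1/2\) and \(H>1/2\) correspond to sub-linear, linear and super-linear dynamics, respectively.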

The simulation was performed according to the algorithm presented in Stoev and Taqqu (2004). We simulated 12,000 observations. This set consisted of \(24\) subsets of length 500 \((k=500)\) and was characterized by a varying parameter H. Namely, the entire data set was

$$\left( \left( B_{H(1)}(i)\right) _{i=1}^{k},\left( B_{H(2)}(i)\right) _{i=k+1}^{2k},\ldots ,\left( B_{H(24)}(i)\right) _{i=23k+1}^{24k}\right) ,$$

with the parameter \(H(\cdot )\) being periodic:

$$H(j)=\left\{ \begin{array}{l} 0.2 \quad{\mathrm {for}} \;mod(j,4)=1, \\ 0.8 \quad{\mathrm {for}} \;mod(j,4)=2, \\ 0.6 \quad{\mathrm {for}} \;mod(j,4)=3,\\ 0.3 \quad{\mathrm {for}} \;mod(j,4)=0.\\ \end{array} \right.$$
(5)

It is worth noting that the increments of our data are stationary only within the subsamples of length 500 on which the parameter H is constant. Therefore our simulated data are of a different type than the so-called multifractal fBM considered in Ralchenko and Shevchenko (2010).
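A data set with the structure of Eq. (5) can be generated in a few lines. The sketch below does not reproduce the algorithm of Stoev and Taqqu (2004); as a simpler substitute it draws fractional Gaussian noise by Cholesky factorization of its autocovariance matrix and accumulates it. Two further assumptions are made for illustration: \(\sigma =1\), and every segment is generated independently and restarted from its own origin. All function names are hypothetical.

```python
import numpy as np


def fgn_autocov(n, hurst):
    """Autocovariance of unit-variance fractional Gaussian noise, lags 0..n-1."""
    k = np.arange(n, dtype=float)
    return 0.5 * ((k + 1) ** (2 * hurst)
                  - 2 * k ** (2 * hurst)
                  + np.abs(k - 1) ** (2 * hurst))


def fbm_segment(n, hurst, rng):
    """One fBM path of length n (Cholesky method, not Stoev and Taqqu)."""
    gamma = fgn_autocov(n, hurst)
    lags = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    chol = np.linalg.cholesky(gamma[lags])      # Toeplitz covariance matrix
    increments = chol @ rng.standard_normal(n)  # correlated Gaussian noise
    return np.cumsum(increments)


def hurst_schedule(j):
    """Periodic Hurst exponent H(j) of Eq. (5), j = 1, 2, ..."""
    return {1: 0.2, 2: 0.8, 3: 0.6, 0: 0.3}[j % 4]


def simulate_piecewise_fbm(n_segments=24, k=500, seed=0):
    """Concatenate n_segments independent fBM paths of length k."""
    rng = np.random.default_rng(seed)
    return np.concatenate([fbm_segment(k, hurst_schedule(j), rng)
                           for j in range(1, n_segments + 1)])


# Usage: the default arguments give 24 * 500 = 12,000 observations.
data = simulate_piecewise_fbm()
```

The Cholesky approach costs \(O(k^3)\) per segment, which is acceptable for \(k=500\); for much longer segments an FFT-based generator is the usual choice.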

The trajectory of the simulated data is presented in Fig. 1. On subsequent intervals of length 500, we observe trajectories of fBM with the parameter H given by formula (5). The effect of the time-varying parameter H is visible in Fig. 2, where the amplitude of the increments of the presented trajectory varies over time.

The simulated data were analyzed using the approach described in Sect. 2. The goal was to select the optimal time window size for MSD analysis. In Fig. 3 we present the information criterion plotted versus the time window length. The results indicate that the optimal window length is \(m_{opt}=502\) observations. This value almost replicates the true value (k = 500) for the simulated data.

As the final step we attempted to reconstruct the values of the parameter β in consecutive time windows of length \(m_{opt}\). In Fig. 4 we plot the estimated values of the parameter β together with the true values. We find almost perfect agreement. This result confirms that the approach proposed in Sect. 2 can be applied to such data. In the next section we apply our tools to real-world data.

Fig. 1 A trajectory of simulated data considered in Sect. 4

Fig. 2 The increments of the trajectory presented in Fig. 1. One can observe the typical periodic change of variance between subsequent subsamples of length 500

Fig. 3 The values of the information criterion for different window sizes. The minimum value of the criterion is obtained at \(m=502\), which indicates the optimal window size

Fig. 4 The comparison between true and estimated values of the parameter β on consecutive windows of observations (of length \(m_{opt}=502\)) for simulated data

Fig. 5 Time series of CO2 concentration

5 Analysis of carbon dioxide time series

The recorded time series of CO2 concentration is displayed in Fig. 5. We analyzed the stochastic dynamics of these data.

In the initial step, we performed data transformations which allowed us to obtain a process with stationary increments. To this end, all deterministic components observed in the data had to be removed. First, a polynomial trend was fitted to the data by using the least squares method. The proper order of the polynomial was chosen on the basis of the mean square error. After removing the deterministic polynomial, one should also eliminate the deterministic seasonality. Using the periodogram we identified the dominant cycle, which was equal to 5,760 points, i.e. the number of data points in a single day. We fitted the data with a function which is a sum of sine waves. As before, the proper number of sinusoidal functions was chosen by using the mean square error.
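The two-stage removal of deterministic components can be sketched as follows. This is an illustration rather than the authors' code: the polynomial order and the number of sine waves are passed in as fixed arguments instead of being selected by the mean square error, and the sine waves are assumed to be harmonics of the identified period, each fitted as a sine and cosine pair. The function name and its arguments are hypothetical.

```python
import numpy as np


def remove_trend_and_season(y, poly_order, n_harmonics, period):
    """Subtract a least-squares polynomial trend, then a sum of sinusoids."""
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size, dtype=float)

    # Step 1: polynomial trend; time is rescaled to keep the fit well conditioned.
    ts = (t - t.mean()) / t.std()
    trend = np.polyval(np.polyfit(ts, y, poly_order), ts)
    detrended = y - trend

    # Step 2: seasonal part as harmonics of the given period (assumption).
    columns = []
    for h in range(1, n_harmonics + 1):
        phase = 2.0 * np.pi * h * t / period
        columns += [np.sin(phase), np.cos(phase)]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, detrended, rcond=None)
    seasonal = design @ coef

    return detrended - seasonal, trend, seasonal


# Usage on synthetic data: a linear trend plus one sinusoid of period 20.
t = np.arange(1000)
y = 400 + 0.05 * t + 10 * np.sin(2 * np.pi * t / 20)
resid, trend, seasonal = remove_trend_and_season(
    y, poly_order=1, n_harmonics=2, period=20)
```

For the CO2 record described in the text, `period` would be 5760, the number of 15 s samples in one day. Since the two components are fitted one after the other, as in the text, a small part of the trend may remain in the residual.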

After these transformations the data exhibited behavior consistent with a process with stationary increments. The stationarity was checked using the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the differenced time series. The respective plots are displayed in Fig. 6. The values of the ACF and PACF tend to 0 for larger lags, indicating that the differenced time series is stationary.

Next, following the same steps as in Sect. 4, we determined the optimal window length. The results presented in Fig. 7 indicate that the optimal window length is 693 observations, which is equal to 3 h 5 min, approximately.

Fig. 6 Stationarity check of the differenced CO2 time series

Fig. 7 The values of the information criterion for different window sizes. The minimum value of the criterion is obtained at \(m=693\) (approximately 3 h 5 min), which indicates the optimal window size

Having calculated the optimal window length, we estimated the mean squared displacement in subsequent non-overlapping intervals, each consisting of 693 observations. The estimated values of the β parameter together with the respective error bars are plotted in Fig. 8. The i-th error bar in Fig. 8 is calculated using the root-mean-square error \(\sqrt{e(i,693)}\) related to the fitting accuracy given by Eq. (3). The errors are in general small; therefore the resulting estimates of the parameter β in consecutive non-overlapping intervals can be regarded as valid.

The obtained results show that the dynamics of CO2 concentration in indoor air was quite complex. First, all types of dynamics were recognized in the analyzed time series, namely sub-linear (\(\beta <0.9\)), linear (\(0.9<\beta <1.1\)) and super-linear (\(\beta >1.1\)). Secondly, periodic changes of the dynamics over time were observed. Namely, a period of sub-linear dynamics was typically followed by one of linear dynamics, next super-linear dynamics appeared, and via a subsequent time interval of linear dynamics the system returned to the sub-linear one. In a simpler case, sub-linear and linear dynamics were observed alternately. We examined the periodicity of the \(\beta \) time series by analyzing the autocorrelation function, as shown in Fig. 9. From it, the dominant cycle was identified, equal to 8.25. It is worth noting that the calculated cycle is close to \(\frac{24}{3.08} \approx 7.8\). From this fact we may conclude that the estimated characteristic is periodic with a cycle approximately equal to one day.
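The classification thresholds and the autocorrelation check described above can be expressed compactly. The sketch below is illustrative: values exactly equal to 0.9 or 1.1 are assigned to the linear class, which is an assumption because the text leaves the boundaries open, and the dominant cycle is taken as the integer lag with the largest sample autocorrelation, which need not be how the reported non-integer value was obtained. All function names are hypothetical.

```python
import numpy as np


def classify_dynamics(betas, lower=0.9, upper=1.1):
    """Label each beta as sub-linear, linear or super-linear."""
    labels = []
    for b in betas:
        if b < lower:
            labels.append("sub-linear")
        elif b > upper:
            labels.append("super-linear")
        else:  # boundary values are treated as linear (assumption)
            labels.append("linear")
    return labels


def sample_acf(x, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    denom = d @ d
    return np.array([1.0] + [d[:-lag] @ d[lag:] / denom
                             for lag in range(1, max_lag + 1)])


def dominant_lag(x, max_lag):
    """Positive lag at which the sample ACF is largest."""
    acf = sample_acf(x, max_lag)
    return int(np.argmax(acf[1:]) + 1)


# Usage on a synthetic beta sequence that repeats every four windows.
betas = np.tile([0.4, 1.0, 1.6, 1.0], 6)
labels = classify_dynamics(betas[:4])
cycle = dominant_lag(betas, max_lag=8)
```

In this synthetic example the sequence cycles through sub-linear, linear, super-linear and linear windows, mirroring the pattern described in the text, and the largest autocorrelation occurs at the repetition length of four windows.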

As displayed in Fig. 8 for the examined eight-day data set, CO2 concentration in indoor air most frequently exhibited nonlinear dynamics. Sub-linear dynamics was typically associated with the nighttime. In these conditions we may view the air as a stable system. Positive changes of CO2 concentration are quickly compensated by negative ones. The driving forces of gas motion are largely inactive, as are the CO2 sources. Super-linear dynamics was usually observed during midday. In this case, we may describe the air as being unstable or in a transient state. Once initiated, changes of CO2 concentration are further accelerated, irrespective of their direction. The driving forces of gas motion are active and not balanced, and the same holds for CO2 delivery and removal processes. Linear dynamics was usually recognized in periods between the nighttime and midday, but sometimes it extended over the entire daytime. Under linear dynamics, changes of CO2 concentration are neither retarded nor accelerated. The driving forces of gas motion are active but balanced, as are CO2 delivery and removal processes. The air is in a neutral state.

Fig. 8 Estimated values of the β parameter in consecutive windows of observations together with the respective error bars (see Sect. 5 for details of the error bar calculations)

Fig. 9 Autocorrelation function of the β estimates depicted in Fig. 8

6 Conclusions

In this work we proposed a method which makes it possible to recognize the character of the stochastic temporal dynamics of CO2 concentration in indoor air.

The method focuses on the time series of measured CO2 concentrations and utilizes MSD analysis as a tool to characterize the dynamics. More specifically, the β parameter is applied as the descriptor of the dynamics.

The method was verified by demonstrating high accuracy of reproducing the known stochastic dynamics of the simulated data set.

We also showed that the approach was successful in studying the dynamics of CO2 concentration based on real measurement data. In the examined time series, segments characterized by linear, sub-linear and super-linear dynamics were identified. Their occurrence in time could be associated with the daily cycle of change of the collective influence of factors on indoor air.

Next, we intend to apply the method to investigate the occurrence of various types of CO2 concentration dynamics in the long term (a year or longer) in various enclosed spaces. Further, the study utilizing this approach will be extended to other parameters of indoor air. Ultimately, we would like to be able to characterize the dynamics of the entire complex system that indoor air constitutes. We believe this will become useful for understanding and designing a proper microclimate in buildings. Another important area of application is the diagnostics of the indoor environment.