1 Introduction

Methane is a potent greenhouse gas (GHG), trapping more than 80 times as much heat as carbon dioxide over its first 20 years in the atmosphere [12, 3]. According to the Canadian federal government’s official greenhouse gas inventory, methane accounts for 13% of Canada’s GHG emissions; about 43% of Canada’s methane emissions are sourced from oil and gas operations and 29% from agricultural (e.g. cattle) activities [4, 5]. Just under 20% of global warming can be attributed to methane [6]. Despite worldwide COVID-19 pandemic shutdowns, the methane concentration still grew in 2021, as shown in Fig. 1 [7].

Fig. 1
figure 1

Global monthly methane mean concentration from July 1983 to December 2021 [7]

There have been many policies and activities targeted at reducing methane emissions in recent years. For example, the Global Methane Pledge, launched at COP26 in November 2021, aims to reduce global methane emissions to 30% below 2020 levels by 2030 [8]. Canada has committed to this pledge. Alberta has also committed to a methane emission reduction objective for the oil and gas sector, positioning itself as a leader in emission reduction efforts [9, 10]. In September 2021, the Canadian government set a target to reduce methane emissions to 45% below 2012 levels by 2025. This target is part of Canada’s efforts to address climate change, in line with its commitments under the Paris Agreement. To achieve it, the Canadian government has implemented various measures, including regulations and industry-specific initiatives focused in particular on the oil and gas sector. Alberta [9] has reported that oil and gas methane emissions dropped by 44% between 2014 and 2021, although uncertainties remain with respect to potential underestimation of methane emission volumes [11]. Beyond the 45% target, the Government of Canada has committed to reducing methane emissions from the upstream oil and gas industry by 75% from 2012 levels by 2030, whereas the Government of Alberta is considering a 75–80% reduction from 2014 levels by 2030.

Given the trends in GHG emissions and atmospheric concentrations, tools are needed to predict the evolution of methane air concentrations through time, both to establish a baseline and to verify that reductions are occurring. Accurate methane prediction models are therefore required to project how methane emissions link to industry activities and to provide informative, timely analysis, allowing governments and people in affected areas to take prompt action to reduce emissions.

Methane concentration prediction is a time series problem. Traditional time series analysis and forecasting methods, for example the autoregressive integrated moving average (ARIMA) method, use a univariate time series to make predictions without accounting for ambient climate variables [12, 13]. However, it is well known that methane concentration is affected by many factors, such as temperature and wind speed and direction; wind speed and direction reflect the atmospheric pressure distribution. In recent years, machine learning–based models for predicting GHG emissions have been acknowledged as both efficient and reasonably accurate [14,15,16,17,18,19]. Numerous prediction models of methane emissions have been developed using data obtained from dairy cows [20], sheep and dietary factors [21, 22], rice paddies [23, 24], and other natural systems such as agricultural soil [14, 25], wetlands [26], and inland freshwater bodies, i.e. ponds, reservoirs, and lakes [15, 16]. Despite this work, there remains no consensus on the best method, or on the type of additional data (e.g. temperature and wind conditions) that can be used to improve predictions of methane concentration time series.

Several machine learning approaches have been used to model methane concentration [14, 19]. Among those studies, multiple variable linear regression has been used to analyze the percentage of the variance of emissions associated with production volume [27]. Support vector machines (SVMs) have been used for forecasting methane emissions from landfills [28]. Long short-term memory (LSTM) neural networks have been used to predict emissions from soil [14]. The Prophet model has been used for forecasting air pollution time series data [17]. There has been considerable progress in developing neural network methods for air pollution modelling [29, 30]. Athira et al. [31] applied recurrent neural networks (RNNs) to predict air quality, and Ma et al. [21, 32] used LSTM networks to forecast particulate matter and carbon dioxide emissions. LSTMs, owing to their “memory” and ability to capture long-term dependencies, have been shown to provide strong performance for predicting data trends, but no studies so far have used them to build models that predict methane emissions from multivariate data.

Here, we examine the application of LSTM for analysis of the time series tied to multivariate climate data, in particular, temperature and wind speed and direction, to predict methane concentration in the air. Methane concentration, temperature, wind speed, and direction data in the Fort McMurray area, Alberta, are explored using LSTMs. The seven adjacent air quality monitoring station datasets in the Wood Buffalo area of Alberta, Canada, displayed in Fig. 2, were used for the analysis.

Fig. 2
figure 2

Wood Buffalo monitoring sites [33]. The data used here is from the following stations: Anzac, Bruderheim, Buffalo Viewpoint, Fort McMurray—Athabasca Valley, Fort McMurray—Patricia McInnes, Lower Camp, Mildred Lake

2 Materials and Methods

2.1 Data Description

The air quality data used to train and test the machine learning models are multivariate, discrete-time series data: records of multiple variables measured sequentially at discrete, contiguous time points.

The quality of the observation data used for training is one of the most critical factors for the successful performance of a machine learning prediction model. In the province of Alberta, air quality is monitored in airshed regions by industry, communities, and Alberta Environment and Parks [9]. In 1992, Canada established the National Pollutant Release Inventory (NPRI), to which industries, businesses, and facilities are required to report releases and disposals of substances [34]. These data help the government monitor emissions and air quality and regulate environmental policies [35, 9]. This study uses multivariate data from seven permanent monitoring stations, listed in Table 1, with data spanning dates from 2010 to 2021. All seven stations belong to the Wood Buffalo Environmental Association (WBEA) [36]. The WBEA operates the largest airshed in the largest municipality and runs what is considered Canada’s most integrated and intensive air and terrestrial monitoring program [36]. The WBEA monitors the air in the Regional Municipality of Wood Buffalo (RMWB) 24 h a day, 365 days a year.

Table 1 Monitoring stations’ methane concentration date range and the number of data points (Alberta Air Data Warehouse).

The Continuous Emission Monitoring Systems (CEMS) Code of Alberta sets standards and requirements for installing, operating, maintaining, and certifying continuous emission monitoring systems [37, 10, 35]. The code has been in effect since 1998 and was most recently revised in 2021, with the intention of ensuring effective and standardized measurement, recording, and reporting of specified emissions and other parameters [9]. The quality assurance and quality control actions taken for each dataset collected in the Alberta Airsheds support the quality of the time series data fed to the machine learning models. In addition, the WBEA works closely with Alberta Environment and Parks (AEP) to ensure every standard is followed, enhancing the overall quality of both the air and terrestrial monitoring data.

Table 1 lists a detailed date range, number of data points, and statistical parameters for methane data from each station. The corresponding number of data points of temperature, wind speed, and wind direction are the same as the methane data points. In this study, we focus on examining the deep feature relationships between different combinations of climate variables among these seven monitoring stations.

We use hourly air quality data from the Alberta Air Data Warehouse for the seven stations shown in Fig. 2. The dataset contains several variables: methane concentration, outdoor air temperature, wind speed, and wind direction. Table 2 provides detailed information about the input variables, units, and measurement methods. The wind speed, \(U\), is preprocessed by converting it into latitudinal (x) and longitudinal (y) direction components:

$$U_\text{x}=U\cos\phi\;\text{and}\;U_\text{y}=U\sin\phi,$$

where the wind direction \(\phi\) is expressed in radians. An example of the input variable profiles for the Anzac monitoring station for 2020 is shown in Fig. 3 (the other six stations’ data are displayed in the “Supplementary Information” section). In our analysis, all data points are used.
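This conversion can be sketched in a few lines of Python; the function name and the assumption that the raw wind direction is reported in degrees are illustrative:

```python
import math

def wind_components(speed, direction_deg):
    """Decompose a wind vector into x and y components.

    A sketch of the preprocessing described above; speed in m/s and
    direction in degrees are assumed units.
    """
    phi = math.radians(direction_deg)  # the formula expects radians
    return speed * math.cos(phi), speed * math.sin(phi)

# e.g. a 10 m/s wind at 60 degrees
ux, uy = wind_components(10.0, 60.0)  # ux ≈ 5.0, uy ≈ 8.66
```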

Table 2 Description of input variables
Fig. 3
figure 3

Time series plots of methane, temperature, and wind speed components (x and y) for the Anzac monitoring station in Northern Alberta [35]

In another example, Fig. 4 illustrates methane concentration (in ppm) versus temperature through time, in 1-year increments, for the Bruderheim station. In some years, the data show a tendency for higher methane concentrations in winter (lower temperatures) and lower concentrations in summer (higher temperatures), although in some years methane concentration peaks occur in warmer weather (see 2010–2011 and 2012–2013, for example). A large fraction of the data sits at a concentration of about 2 ppm, with less data above 3 ppm. Some years show many peaks above 3 ppm (e.g. 2014–2015 and 2015–2017), but after 2017 the data are mainly concentrated below 3 ppm, with a tendency for flatter profiles.

Fig. 4
figure 4

Methane concentration (in ppm) versus temperature (in °C) for the Bruderheim station through time.

2.2 Long Short-Term Memory

Long short-term memory (LSTM) neural networks are an improvement on the recurrent neural network (RNN) approach, designed to avoid the vanishing and exploding gradient problems [38, 39]. As a variant of the RNN, an LSTM consists of a chain of repeating modules. Each module contains exclusive “cell states” that preserve long-term dependencies throughout the model training process [40]. Compared with a standard RNN, the LSTM neural network performs better when training on long time sequences. Each LSTM storage unit consists of one memory cell \({C}_{{\text{t}}}\) and three gates: the forget gate \({f}_{{\text{t}}}\), the input gate \({i}_{{\text{t}}}\), and the output gate \({o}_{{\text{t}}}\). The state of the memory cell \({C}_{{\text{t}}}\) is jointly controlled by the three gates [41, 42]. Formally, the LSTM network can be formulated as:

$${f}_{{\text{t}}}= \sigma \left({W}_{{\text{f}}}\bullet \left[{h}_{{\text{t}}-1}, {x}_{{\text{t}}}\right]+ {b}_{{\text{f}}}\right),$$
$${i}_{{\text{t}}}= \sigma \left({W}_{{\text{i}}}\bullet \left[{h}_{{\text{t}}-1},{x}_{{\text{t}}}\right]+ {b}_{{\text{i}}}\right),$$
$$\widetilde{{C}_{{\text{t}}}}={\text{tanh}}\left({W}_{{\text{C}}}\bullet \left[{h}_{{\text{t}}-1},{x}_{{\text{t}}}\right]+ {b}_{{\text{C}}}\right),$$
$${C}_{{\text{t}}}= {f}_{{\text{t}}}\odot {C}_{{\text{t}}-1}+ {i}_{{\text{t}}}\odot \widetilde{{C}_{{\text{t}}}},$$
$${o}_{{\text{t}}}= \sigma \left({W}_{{\text{o}}}\bullet \left[{h}_{{\text{t}}-1},{x}_{{\text{t}}}\right]+ {b}_{{\text{o}}}\right), {\text{and}}$$
$${h}_{{\text{t}}}= {o}_{{\text{t}}}\odot {\text{tanh}}\left({C}_{{\text{t}}}\right).$$

At the start, an input \({x}_{{\text{t}}}\) at time t is fed to the network. The forget gate \({f}_{{\text{t}}}\) then decides which information from the previous output \({h}_{{\text{t}}-1}\) is discarded or kept. Next, the input gate \({i}_{{\text{t}}}\) decides which state entries will be updated. From the outputs of the forget gate and the input gate, together with a vector of new candidate values \(\widetilde{{C}_{{\text{t}}}}\) generated by a \({\text{tanh}}\) layer, the updated cell state \({C}_{{\text{t}}}\) is determined. The result \({h}_{{\text{t}}}\) is obtained by passing the new cell state through a \({\text{tanh}}\) function and multiplying element-wise by the output gate \({o}_{{\text{t}}}\), itself a sigmoid neural network layer. The weights \({W}_{{\text{X}}}\;(X \in [f, i, C, o])\) act on the concatenated previous hidden state and current input, \({b}_{{\text{X}} }(X \in \left[f, i, C, o\right])\) are the bias vectors, and \(\sigma\) is the sigmoid function given by \(\sigma = \frac{{e}^{{\text{x}}}}{{e}^{{\text{x}}}+1}\). Here, “\(\odot\)” is the Hadamard product (an element-wise product) and “ + ” represents pointwise addition.
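As a sketch of these equations (not the implementation used in the study), a single LSTM cell step can be written directly with NumPy; the dictionary-of-gates layout and the toy dimensions are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM cell step following the gate equations above.

    W and b hold the weights W_f, W_i, W_C, W_o and biases b_f, b_i,
    b_C, b_o keyed by gate name; shapes are illustrative.
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    C_tilde = np.tanh(W["C"] @ z + b["C"])   # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde       # new cell state (Hadamard products)
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(C_t)                 # new hidden state
    return h_t, C_t

# toy dimensions: 3 input features, 4 hidden units
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.standard_normal((n_hid, n_hid + n_in)) * 0.1 for k in "fiCo"}
b = {k: np.zeros(n_hid) for k in "fiCo"}
h, C = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
```

Since the output gate lies in (0, 1) and tanh in (−1, 1), every entry of the hidden state is bounded by 1 in magnitude.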

2.3 Rolling Window

The collected raw data form a single consecutive sequence that the model cannot take as input directly. To deal with this, the rolling window method was used to preprocess the data before feeding it into the model. To predict the time series value at time t + 1, the model needs not only the value at time t but also the values at times t − 1, t − 2, …, t − ∆t, where ∆t is defined as the lookback window size. A smaller lookback size may not give prediction performance equivalent to a larger one; however, the larger the lookback size, the greater the computational work, noise, and complexity [21, 32, 43]. In our approach, the lookback window size is optimized. For sequential data such as time series, where subsets of samples are unlikely to be independent, we use blocked time series cross-validation. In this approach, two margins are used: first, a gap is added between the training set and the validation set so that no lag values or “history” interfere with the validation [44]; second, a gap of size equal to the lookback window is placed between iterations (the data sets are organized in the rolling window structure) to help prevent the model from memorizing patterns from previous samples [44].
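A minimal sketch of the rolling-window preprocessing, assuming a univariate series for simplicity (variable names are illustrative):

```python
def rolling_windows(series, lookback):
    """Split a 1-D series into (window, next value) training pairs.

    Each sample holds the previous `lookback` values and the value the
    model should predict at the next step.
    """
    X, y = [], []
    for t in range(lookback, len(series)):
        X.append(series[t - lookback:t])  # values at t-Δt … t-1
        y.append(series[t])               # value to predict at t
    return X, y

X, y = rolling_windows([1, 2, 3, 4, 5, 6], lookback=3)
# X[0] == [1, 2, 3] is used to predict y[0] == 4
```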

For each individual station, 80% of its data is used for training, 10% retained for testing, and the remaining 10% reserved for cross-validation. For training, testing, and validation, the data from the stations were kept separate. Detailed information about how the data was fed into the model is illustrated in Fig. 5. In each epoch, the model is trained on batches of data with the data object shape of [lookback window size, 1] until the model has traversed all the training data.
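The chronological 80/10/10 split with lookback-sized gaps between blocks can be sketched as follows; the exact placement of the margins in the study may differ, and the index arithmetic here is an illustrative assumption:

```python
def blocked_split(n, lookback, train=0.8, test=0.1):
    """Chronological index split into train/test/validation blocks.

    A gap of `lookback` indices is left between consecutive blocks so
    that no sample's history leaks across the block boundary.
    """
    i_train = int(n * train)
    i_test = int(n * (train + test))
    return (range(0, i_train),                # 80% training
            range(i_train + lookback, i_test),  # 10% testing, after a gap
            range(i_test + lookback, n))        # 10% validation, after a gap

tr, te, va = blocked_split(100, lookback=5)
```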

Fig. 5
figure 5

Rolling window data structure for ML algorithms

3 Experimental Design

3.1 Proposed Architecture

The objective is to select the hyperparameter combination that yields the best forecasting model. Figure 6 presents the flowchart for the multivariate prediction of the air quality data. The proposed framework consists of three main parts: data preprocessing; neural network training and optimization; and multivariate prediction. The data preprocessing stage extracts and validates the time series data and then formats it with the rolling window before feeding it into the neural network models. In the second stage, the neural network is trained repeatedly with different hyperparameter combinations to find the one with the best forecasting performance, yielding a model ready for forecasting.

Fig. 6
figure 6

Flowchart of the proposed architecture

Taking a closer look at each layer, Fig. 7 shows the topology of the proposed multivariate LSTM network. First, the climate-variable inputs are extracted from the raw dataset and validated by removing invalid data, indicated by abnormal or erroneous observations such as sudden temperature spikes. Two processes were used to detect invalid data. The Z-score method measures how many standard deviations a data point lies from the mean. Although the Z-score is one of the most efficient ways to detect anomalies, the mean and standard deviation on which it relies are themselves highly affected by outliers, so its results need to be validated. To deal with this, an isolation forest, a binary tree–based unsupervised machine learning method, is used to verify the flagged points and ensure that the mean and standard deviation are not distorted by outliers. Second, the feature datasets are reformed into the rolling window structure with a given window size, where the window size represents how much history is contained within each training step. Third, the datasets are normalized and fed to the LSTM layers; each input has its own LSTM layer. Last, the results from each input are concatenated and fed to the final dense layer to generate the prediction.
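The Z-score screening step can be sketched with the standard library; the threshold of 2 standard deviations and the toy data are illustrative, and in the study the flagged points are additionally cross-checked with an isolation forest (e.g. a tree-based detector such as scikit-learn's `IsolationForest`):

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=3.0):
    """Return indices of points more than `threshold` standard
    deviations from the mean.

    Note the caveat from the text: a large outlier inflates both the
    mean and the standard deviation, which is why a second, tree-based
    check is used in the study.
    """
    mu, sd = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) > threshold * sd]

temps = [1.9, 2.0, 2.1, 2.0, 1.9, 2.1, 2.0, 45.0]  # one spurious spike
zscore_outliers(temps, threshold=2.0)  # → [7] (the 45.0 spike)
```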

Fig. 7
figure 7

Multivariate LSTM Architecture

3.2 Hyperparameters Optimization

An LSTM neural network contains parameterized functions whose settings directly affect the quality of its predictions. A grid search over the hyperparameters was used for the optimization; the grid points are listed in Table 3. Once the hyperparameters yielding the best results are found, the model is trained for making predictions. The pseudo-code for the proposed multivariate LSTM structure is given in Algorithm 1, which is divided into three parts reflecting the flowchart described above.
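The grid search amounts to an exhaustive loop over all combinations. In this sketch the grid values are hypothetical placeholders for the points in Table 3, and `evaluate` stands in for training the LSTM and returning a validation error:

```python
from itertools import product

# Hypothetical search space; the actual grid points are those in Table 3.
grid = {
    "lookback": [6, 12, 24],
    "units": [32, 64],
    "learning_rate": [1e-3, 1e-4],
}

def grid_search(evaluate):
    """Evaluate every hyperparameter combination and keep the one with
    the lowest validation error."""
    best, best_score = None, float("inf")
    for combo in product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        score = evaluate(params)  # train + validate the model
        if score < best_score:
            best, best_score = params, score
    return best, best_score

# toy objective: pretend lookback 12 with the smallest learning rate validates best
best, score = grid_search(lambda p: abs(p["lookback"] - 12) + p["learning_rate"])
```

In practice each `evaluate` call is expensive (a full training run), which is why the grid is kept coarse.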

Table 3 Hyperparameters for model optimization
Algorithm 1
figure a

Pseudo-code for procedures of proposed prediction framework

3.3 Measures of Error

The root mean squared error (RMSE), one of the most common scoring rules to evaluate the accuracy of the model, is calculated from:

$${\text{RMSE}}=\sqrt{\frac{{\sum }_{{\text{i}}=1}^{{\text{n}}}{\left({Y}_{{\text{i}}}- {P}_{{\text{i}}}\right)}^{2}}{n}},$$

where \({Y}_{{\text{i}}}\) is the actual value of the methane concentration, \({P}_{{\text{i}}}\) is the predicted value, and \(n\) is the number of data points. The RMSE measures the average magnitude of the error and gives high weight to large errors; the smaller the RMSE value, the more accurate the prediction. The mean absolute error (MAE) is another criterion for evaluating model performance, showing the average offset from the actual observations, given by:

$${\text{MAE}}=\frac{1}{n}\sum_{{\text{t}}=1}^{{\text{n}}}\left|{Y}_{{\text{t}}}- {P}_{{\text{t}}}\right|.$$

In the equation, \({Y}_{{\text{t}}}\) is the actual value while \({P}_{{\text{t}}}\) is the predicted value.
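Both measures can be computed in a few lines; the values below are illustrative, and MAE is taken as the standard mean absolute error, in the same ppm units as the results reported later:

```python
import math

def rmse(y, p):
    # root mean squared error: penalizes large errors more heavily
    return math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y))

def mae(y, p):
    # mean absolute error: average absolute offset from the observations
    return sum(abs(yi - pi) for yi, pi in zip(y, p)) / len(y)

y_obs = [2.0, 2.1, 1.9, 2.2]   # observed methane, ppm (illustrative)
y_hat = [2.0, 2.0, 2.0, 2.0]   # predicted, ppm
# rmse(y_obs, y_hat) ≈ 0.1225, mae(y_obs, y_hat) ≈ 0.1
```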

4 Results and Discussion

Figures 8 and 9 show the predicted values of each climate combination and the observed (raw) methane concentration values for the Lower Camp and Mildred Lake stations, respectively (the other station results are displayed in the “Supplementary Information” section). The x-axis represents time, and the y-axis is the methane concentration in parts per million (ppm). The blue line shows the raw data, whereas the red line shows the univariate model prediction (obtained from training with the methane data alone); this is the Methane-only model. The green line shows the methane concentration prediction when the model is trained with both the methane and temperature data; this is the Methane + Temp model. The purple line shows the predicted methane concentration when the model is trained with both the methane concentration and the wind speed and direction data; this is the Methane + Wind model. Finally, the orange line shows the methane concentration predictions when the model is trained with the methane concentration, temperature, and wind data; this is the Methane + Temp + Wind model.

Fig. 8
figure 8

Lower Camp station predicted value versus the observed (raw) methane concentration

Fig. 9
figure 9

Mildred Lake station predicted value versus the observed (raw) methane concentration

Table 4 lists the MAE and RMSE evaluations for the different models and monitoring stations in the training and validation phase. In general, the Methane + Temp model outperforms the Methane + Wind model with respect to both MAE and RMSE, and in some cases the Methane + Temp + Wind model outperforms the Methane + Temp model. For example, in the training and validation phases, the MAE for the Anzac station is 0.00245 ppm for the Methane-only model, 0.00207 ppm for the Methane + Temp model, 0.00322 ppm for the Methane + Wind model, and 0.00168 ppm for the Methane + Temp + Wind model. In the other cases, the Methane + Temp model performed better than the Methane + Temp + Wind model. The RMSE is also generally lowered for the Methane + Temp models in the training and validation phases, although not universally: for the Fort McMurray-Athabasca Valley station, the RMSE is 0.000410 for the Methane-only model, rising to 0.000412 for the Methane + Wind model and 0.000439 for the Methane + Temp model.

Table 4 MAE and RMSE of different ambient climate variable combinations evaluated in the training and validation phase

Table 5 lists the MAE and RMSE values for forecasting on data the models had not seen before (the forecasting phase). The results exhibit trends similar to those of the training and validation phases. The MAE for the prediction of the Fort McMurray-Patricia McInnes station data is 0.0478 ppm for the Methane-only model, 0.0486 ppm for the Methane + Wind model, 0.0457 ppm for the Methane + Temp model, and 0.0464 ppm for the Methane + Temp + Wind model. The RMSE for this station is 0.109 for the Methane-only model, declining to 0.107 for the Methane + Temp model; when the wind parameters are included in the training set (the Methane + Temp + Wind model), the RMSE is larger. Overall, the models trained with both methane concentration and temperature data achieve the best predictive performance in most cases.

Table 5 MAE and RMSE of different ambient climate variable combinations evaluated in the forecasting phase

Comparing the forecasting error against the variance of the methane concentration data for the monitoring stations listed in Table 1, it becomes clear that data with a higher variance does not necessarily yield higher training errors. For example, the Bruderheim station has a variance of 0.1217 yet an RMSE (training and validation) of only 0.000277, whereas Buffalo Viewpoint has a relatively low variance of 0.0357 but a higher RMSE of 0.00122. This provides evidence that the variance of the input data does not control the forecasting performance.

The results demonstrate that adding temperature data to the methane concentration training set provides a better predictive model than including the wind parameters. One reason could be that, as the temperature rises, a given number of moles of methane occupies a greater volume (per the ideal gas law); thus, the higher the temperature, the greater the amount of methane that would leak from a system containing methane. In addition, the higher the temperature, the lower the solubility of methane in water and oil. In oil and gas operations where methane is dissolved in the produced water and oil (in the form of solution gas, for example), a higher temperature releases a larger amount of methane from this water and oil. Alberta is a large oil-producing province holding the third-largest oil reserves globally (as of March 2022), which implies that there are industry sources of methane emissions. The wind parameters (speed and direction), in contrast, do not improve the predictive performance of the model when included in the training data. This is likely because the wind is somewhat random in nature and lacks a direct physical link to methane concentration; although stronger winds should dilute methane and lower its concentration, the results do not demonstrate that adding wind to the training set improves the predictive capabilities of the model over training on the methane data alone.

5 Conclusions

A multivariate LSTM-based machine learning model to forecast methane concentrations has been evaluated. The results suggest that adding temperature and wind speed and direction to the methane concentration training dataset may either help or harm the prediction performance of the machine learning model. On the one hand, when temperature is added to the training dataset, the ability of the model to predict methane concentration improves over using the methane concentration data alone. On the other hand, the use of wind speed and direction led to less accurate predictions. The results also reveal that the models provide different degrees of performance depending on the station; the machine learning model does not perform uniformly across stations. Another observation is that the MAE for the forecasting phase is roughly an order of magnitude higher than that of the training and validation phase, and the RMSE exhibits similar trends. The results further suggest that the variance of the data does not control forecasting performance. Limitations of the study arise from the LSTM method itself: potential overfitting, non-optimal choice of hyperparameters, and detection of anomalous data. Although attention was paid to detecting issues arising from these limitations, future work will expand on understanding why the model did not perform uniformly across all stations, and will examine other machine learning methods that may improve the ability to predict methane concentrations when integrated with temperature and wind speed data.