Introduction

To fulfil its commitments under the Paris Agreement, Canada introduced the Greenhouse Gas Pollution Pricing Act in June 2018. This federal carbon pricing system serves as a “backstop” mechanism that applies when provincial emission pricing schemes fail to meet federal standards. The Pan-Canadian charge of $20 per tonne (t) came into effect on April 1, 2019, and rose by $10/t each year, reaching $50/t in 2022. The $50/t emission price (in constant 2019 dollars) is equivalent to the social cost of carbon (SCC) currently used by the federal government. Upon full implementation in 2022, greenhouse gas (GHG) emissions are expected to decline by 80–90 million tonnes across Canada (Environment and Climate Change Canada 2018). Furthermore, the government has announced its intention to increase the carbon price by $15/t per year starting in 2023, rising to $170/t in 2030 (Footnote 1).
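As a quick arithmetic check of the price path described above, the following Python sketch rebuilds the schedule from its stated starting value and annual increments. The helper name is hypothetical, and the sketch simply assumes one price per calendar year, in the dollars-per-tonne figures quoted in the text.

```python
def carbon_price_path():
    """Hypothetical helper: federal carbon price ($/t) by year, 2019-2030,
    rebuilt from the path in the text ($20/t in 2019, +$10/t per year to
    $50/t in 2022, then +$15/t per year)."""
    path = {}
    price = 20  # $20/t in 2019
    for year in range(2019, 2031):
        path[year] = price
        # +$10/t per year up to 2022, +$15/t per year from 2023 onwards
        price += 10 if year < 2022 else 15
    return path

prices = carbon_price_path()
print(prices[2022], prices[2030])  # 50 170
```

The two printed values reproduce the $50/t figure for 2022 and the announced $170/t for 2030.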

At full implementation, the actual GHG reduction realized in various industries will depend on their specific marginal abatement costs (MACs) for GHGs. Since GHG emission intensities and the associated MACs differ across industries, the abatement level undertaken will differ accordingly. What is the MAC for each industry? Which industries will reduce pollution to avoid the carbon levy and which industries will find the $50/t cost relatively cheaper than the cost of emission reduction? Based on the estimated MACs, what is the extent of abatement each industry would achieve at the $50/t carbon price? To provide some reasoned answers to these important questions, we estimate the MACs of GHGs across many sectors in Canada using the shadow cost approach.

The shadow price of a pollutant measures the opportunity cost of reducing pollution (the quantity of the undesirable output) at the observed mix of desirable and undesirable outputs (Färe et al. 1993). Accordingly, it can be interpreted as the opportunity cost of an incremental emission reduction in terms of forgone desirable output: the higher the shadow cost, the higher the opportunity cost of achieving an additional reduction in undesirable outputs (pollutants). In the context of GHG emissions, the shadow cost measures the MAC of GHGs and can be regarded as the notional market price of GHGs. The MAC is therefore expected to be higher in industries where GHG emissions are low relative to output, reflecting the increasing difficulty of reducing a marginal unit of pollution.

Estimated shadow costs of GHGs are useful for a range of analyses. First, they support ex ante policy evaluation: the shadow cost can serve as a reference price when introducing a market-based scheme or, alternatively, as the benchmark penalty rate for emissions. The shadow price can also be used to determine the expected reduction under a given emission charge or, conversely, the emission charge required to achieve a specific reduction target. Second, firms could use such estimates as a benchmark cost in investment project evaluations that account for pollution costs (Dang and Mourougane 2014) (Footnote 2). Third, shadow marginal abatement costs are used to estimate environmentally adjusted total factor productivity (Nanere et al. 2007; Gu et al. 2019; Rodríguez et al. 2018).

It must be noted that the shadow cost approach differs from the SCC approach, which relies on integrated assessment models (Nordhaus 2017). The SCC approach is used in the USA and Canada to appraise relevant policies. Canada adopted the estimates provided by the US government with some modifications and currently uses an SCC of $50/t CO2 (in 2019 dollars). Estimating the SCC involves forecasting socio-economic and emissions trajectories up to 300 years into the future, which makes the estimates inherently uncertain. A related issue is the choice of an appropriate discount rate for deriving the present value of future damages.

As Stern and Stiglitz (2021) note, the SCC underestimates the true social cost of emissions, resulting in either inaction or too little action. To this effect, the 2017 High-Level Commission on Carbon Prices, chaired by Stiglitz and Stern, emphasized the resource-cost approach, that is, the cost of achieving a given emission reduction target. The resource-cost approach has been adopted by the UK, France, and the World Bank (Footnote 3). In the same spirit as the resource-cost approach, which uses actual data to estimate MACs, this study adopts the shadow cost approach.

The theoretical model used as the basis for estimating the shadow MAC is the environmental production technology. This production technology includes desirable and undesirable outputs and can be fully characterized by directional distance functions (Chambers et al. 1998) or by the Shephard output or input distance functions (Färe et al. 2005; Färe and Primont 1995; Swinton 1998; Hailu and Veeman 2000; Vardanyan and Noh 2006). These distance functions determine the mapping rule used to approximate the MAC and are discussed in greater detail in the “Environmental production technology” section. The shadow price can be derived by exploiting the duality between distance functions and revenue or cost functions. The shadow cost estimates measure the value of forgone income, or the incremental cost, resulting from a reduction in undesirable outputs.

Approximating MACs using the concept of shadow price is widespread in the literature. This is primarily due to the relatively modest data requirements of the distance function approach, namely input and output quantities. Additionally, the distance function approach offers great flexibility regarding the level of application, ranging from cross-country and cross-regional analyses to comparisons across plants within an industry (Dang and Mourougane 2014). For instance, Maradan and Vassiliev (2005) estimated shadow prices of CO2 in 76 developed and developing countries. Similarly, Boussemart et al. (2017) estimated shadow costs of CO2 worldwide, covering 119 developed and developing countries.

Considerable work has been done on estimating MACs for various industries or regions in China. For instance, Wei et al. (2013) estimated MACs for thermal power plants in China; Wang et al. (2017) estimated MACs for plants in the Chinese iron and steel industry; Lee and Zhang (2012) estimated MACs for Chinese manufacturing industries; Tang et al. (2016) estimated MACs for Chinese regions; and Du et al. (2015) and Ma and Hailu (2016) estimated MACs for Chinese provinces. More recently, Wu et al. (2021) estimated MACs for two major pollutants at the provincial level and analysed their drivers over time, while Xue et al. (2022) provided provincial estimates of shadow MACs for the industrial sector. In the US context, a number of studies have estimated MACs for coal power plants (Färe et al. 2005; Vardanyan and Noh 2006; Lee and Zhou 2015; Coggins and Swinton 1996). Matsushita and Yamane (2012) provided estimates for the Japanese electricity generation sector, and Gu et al. (2019) provided plant-level estimates for Canadian manufacturing sectors. Overall, the shadow cost method of estimating MACs has become an important tool for approximating the market price of GHGs.

Following Cuesta et al. (2009) and Cuesta and Zofío (2005), and unlike all the above-mentioned studies, we adopt the hyperbolic output distance function (HODF) and use stochastic frontier econometric methods to estimate the parameters needed to approximate the shadow MAC for aggregate GHGs (Footnote 4).

We prefer this approach to the conventional distance functions. More specifically, the Shephard output and input distance functions measure performance radially, in terms of the ability to expand all outputs or contract all inputs equi-proportionately. Similarly, the directional output distance function measures performance in terms of reducing the undesirable output and expanding the desirable output along a unit directional vector. Thus, they treat desirable and undesirable outputs symmetrically. What is needed is a distance function that treats them asymmetrically, and the HODF meets this requirement (Cuesta et al. 2009).

Our study is the first to provide estimates of the MAC of GHGs for Canadian industries using the HODF. Previous Canadian MAC estimates, by Dang and Mourougane (2014) at the country level and by Gu et al. (2019) at the plant level for the manufacturing sector, are based on the ODF method and do not account for inefficiency. Overall, the adoption of the HODF is a notable strength of our study. Furthermore, the adoption of a parametric method based on time-varying and time-invariant panel stochastic frontier models provides some advantages. As noted by Ma et al. (2019, p. 6), unlike the nonparametric, or Data Envelopment Analysis (DEA), approach, a parametric distance function is differentiable, which facilitates the estimation of shadow prices and makes the results easier to interpret.

We use industry- and sector-level data on goods-producing industries, which account for about 70 per cent of national GHG emissions in Canada, for the period 2009–2015. Our results suggest that the overall average estimate (across two estimation methods and all industries) is about $130/t. However, most sectoral MACs are lower than the $50/t carbon price set by the federal government. More specifically, most of the large-emitting industries (accounting for about 90 per cent of total GHG emissions in our data) have shadow MACs below the $50/t benchmark, while manufacturing industries with low GHG intensities typically have shadow MACs above $100/t, which pulls the overall average upward. This means that the $50/t carbon levy would induce substantial GHG abatement among large polluters, since reducing emissions is less costly for them than paying the emission charge.

The rest of the paper is organized as follows. “The theoretical framework” section outlines the theoretical models and the derivation of the shadow price. “Empirical specifications, data and the estimation method” section presents the empirical work, with separate subsections on the selection of the model specification, the description of the data and the estimation method. “Results” section outlines the estimation results, followed by the conclusions.

The theoretical framework

Environmental production technology

The environmental production technology summarizes the joint production of desirable and undesirable (pollution) outputs. Denoting the input vector by \(x\), and a desirable and an undesirable output by \(y\) and \(b\), respectively, the production possibility set is a compact set defined formally as \(P(x)=\{(y,b):(x,y,b)\in T\}\), where \(T=\{(x,y,b): x \text{ can produce } (y,b)\}\) is the underlying technology. The environmental production technology satisfies two key axioms: (i) null-jointness of desirable (good) and undesirable (bad) outputs; and (ii) weak disposability of the bad output together with free disposability of the desirable output (Chung et al. 1997; Färe et al. 2005). Null-jointness implies that producing good outputs inevitably leads to the production of bad outputs (pollution). Formally, if \((y,b)\in P(x)\) and \(b=0\), then \(y=0\).

Weak disposability of bad outputs means that reducing bad outputs, with inputs held constant, requires a reduction in desirable outputs, thus capturing the idea that reducing bad outputs is costly (Chung et al. 1997; Färe et al. 2005). This assumption is considered most appropriate in the context of GHG emissions, since they are difficult to dispose of (Boussemart et al. 2017). Formally, if \((y,b)\in P(x)\), then \((\theta y,\theta b)\in P(x)\) for all \(0\le\theta\le 1\). That is, a reduction in the bad output by the fraction \(1-\theta\) is feasible only if it is accompanied by at least a proportionate reduction in desirable outputs. This assumption is critical for the computation of MACs. Free (strong) disposability of desirable outputs, on the other hand, implies that producers can reduce the production of desirable outputs without incurring extra costs. Formally, free disposability requires that if \((y,b)\in P(x)\) and \(y'\le y\), then \((y',b)\in P(x)\).

As mentioned earlier, the environmental production technology can be fully characterized by output and input distance functions, each of which embodies its own mapping rule that, in turn, plays an important role in the computation of shadow costs (Zhou et al. 2015). In the Shephard output distance function (ODF), the mapping rule is based on the maximal proportional expansion of outputs required to move a technically inefficient output vector onto the frontier of \(P(x)\), while keeping inputs constant. Formally, the ODF is given as:

$$D_{0}\left(x,y,b\right)=\inf\left\{\delta:\left(x,\frac{y}{\delta},\frac{b}{\delta}\right)\in T\left(x,y,b\right);\ \delta\in\Re^{+}\right\}.$$
(1)

The ODF is homogeneous of degree one in outputs (Dang and Mourougane 2014); that is, for \(\lambda>0\), \(D_{0}(x,\lambda y,\lambda b)=\lambda D_{0}(x,y,b)\). Accordingly, setting \(\lambda=1/y\) gives \(\left(\frac{1}{y}\right)D_{0}(x,y,b)=D_{0}\left(x,1,\frac{b}{y}\right)\). Furthermore, the ODF is non-increasing in inputs and undesirable outputs, but non-decreasing in desirable outputs (Färe et al. 2005). It is also important to note that the ODF equals the output-based radial measure of technical efficiency. Accordingly, its value is greater than zero but less than or equal to one, and it equals one for an efficient input–output vector.

In the Shephard input distance function (IDF) the mapping rule is based on the maximal proportional contractions of inputs, given the same quantities (vector) of outputs (Hailu and Veeman 2000). Formally, the input distance function (IDF) is given as:

$${D}_{i} \left( x,y,b \right)=\mathrm{sup} \left\{ \gamma :\left(\frac{x}{\gamma } ,y,b \right)\in T\left( x,y,b \right); \gamma \in {\mathfrak{R}}^{+} \right\}.$$
(2)

By definition, the IDF is the reciprocal of the input-based measure of technical efficiency (Ma and Hailu 2016). The IDF is homogeneous of degree one in inputs and is non-decreasing in the input and undesirable output vectors, while it is non-increasing in desirable outputs. Interested readers can find further details on the IDF in Hailu and Veeman (2000).

The directional output distance function (DODF), on the other hand, measures the simultaneous contraction of undesirable outputs and expansion of desirable outputs along a directional vector. In other words, the DODF is defined for a specific directional vector \(g=(g_{y},g_{b})\), which determines its mapping rule. Formally, the DODF is given as:

$$\overrightarrow{{D}_{O} } \left( x,y,b; {g}_{y} , {g}_{b} \right)=\mathrm{sup} \left\{\beta :\left( x,y+ \beta {g}_{y} ,b+ \beta {g}_{b} \right)\in T\left( x,y,b\right)\right\}$$
(3)

where \(g_{y}\) and \(g_{b}\) are the components of the directional vector. A commonly used directional vector is \(g=(g_{y},g_{b})=(1,-1)\). The DODF takes values greater than or equal to zero. Unlike the ODF, the DODF is not homogeneous of degree one in outputs; its counterpart is the translation property, given formally as:

$$\overrightarrow{{D}_{O} } \left(x,y+\rho {g}_{y},b+\rho {g}_{b};{g}_{y} , {g}_{b}\right)=\overrightarrow{{D}_{O} } \left( x,y,b; {g}_{y} , {g}_{b} \right)-\rho \forall \rho \in {\mathfrak{R}}^{+}$$
(4)

These two properties (homogeneity and translation) are critical in making the distance functions amenable to estimation. The DODF is non-decreasing in inputs and bad outputs but non-increasing in good outputs. Chung et al. (1997) provide an interesting comparison between the ODF and the DODF, showing that the two are closely related; more specifically, under constant returns to scale and with the directional vector set equal to the observed outputs, \(g=(y,b)\), \(\mathrm{DODF}=(1/\mathrm{ODF})-1\) or, equivalently, \(\mathrm{ODF}=1/(1+\mathrm{DODF})\). Accordingly, the technical efficiency normally measured by the ODF could also be measured by \(1/(1+\mathrm{DODF})\).
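To make the ODF–DODF relationship concrete, here is a minimal Python sketch that converts between the two scores under the conditions stated above. The function names are hypothetical; the sketch only encodes the algebraic identity and says nothing about how either score is estimated.

```python
def odf_from_dodf(dodf):
    """ODF = 1 / (1 + DODF); a DODF of 0 means the unit is on the frontier."""
    if dodf < 0:
        raise ValueError("DODF must be non-negative")
    return 1.0 / (1.0 + dodf)

def dodf_from_odf(odf):
    """DODF = 1/ODF - 1, defined for 0 < ODF <= 1."""
    if not 0 < odf <= 1:
        raise ValueError("ODF must lie in (0, 1]")
    return 1.0 / odf - 1.0

# A unit that could expand both outputs by 25% along g = (y, b) has ODF = 0.8
print(odf_from_dodf(0.25))  # 0.8
```

Because the mapping is monotone, ranking units by the ODF or by the DODF gives the same ordering under these conditions.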

Now consider the hyperbolic ODF (HODF), which replaces the restrictive homogeneity of degree one with the “almost homogeneous” assumption. Like the ODF, its value ranges between zero and one, and it satisfies the monotonicity properties of the ODF. However, unlike the ODF, it treats desirable and undesirable outputs asymmetrically (Cuesta et al. 2009; Cuesta and Zofío 2005; Duman and Kasman 2018). On the other hand, the HODF is similar to the DODF in that it scales the good and the bad simultaneously and moves production in the north-western direction (Vardanyan and Noh 2006, p. 180). It represents the maximum expansion of the desirable output and equi-proportionate contraction of the undesirable output for a given amount of inputs (Cuesta et al. 2009, p. 2234). Figure 1 illustrates the environmental production technology and the relationships among the various distance functions.

Fig. 1 Environmental output set and iso-revenue lines under various mapping rules. Source: Vardanyan and Noh (2006, p. 180)

Formally, the HODF is defined as:

$${D}_{H}\left(x,y,b\right)=\inf\left\{\eta:\left(x,\frac{y}{\eta},b\eta\right)\in T\left(x,y,b\right);\ \eta\in{\mathfrak{R}}^{+}\right\}$$
(5)
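The two mapping rules in Eqs. (1) and (5) can be contrasted numerically. The sketch below assumes a hypothetical one-input output set, \(P(x)=\{(y,b): 0\le y\le\sqrt{xb},\ 0\le b\le x\}\), chosen only because it satisfies null-jointness and weak disposability and has closed-form distance functions; it is not the technology estimated in this study. The infimum in each definition is found by bisection: the ODF divides both outputs by \(\delta\), whereas the HODF divides \(y\) by \(\eta\) and multiplies \(b\) by \(\eta\). The helper names are ours.

```python
import math

def in_toy_set(x, y, b):
    """Hypothetical output set P(x) = {(y, b): 0 <= y <= sqrt(x*b), 0 <= b <= x}."""
    return 0 <= b <= x and 0 <= y <= math.sqrt(x * b)

def infimum_scale(is_feasible, lo=1e-9, hi=1e9, iters=200):
    """Bisection for the smallest scale s with is_feasible(s) True.

    Assumes lo is infeasible, hi is feasible, and feasibility is
    upward-closed on [lo, hi] (true for both rules on this toy set)."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint suits the wide bracket
        if is_feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi

x, y, b = 4.0, 1.0, 1.0
# Shephard ODF, Eq. (1): both outputs divided by delta
odf = infimum_scale(lambda d: in_toy_set(x, y / d, b / d))
# HODF, Eq. (5): y divided by eta, b multiplied by eta; eta <= x/b keeps b*eta feasible
hodf = infimum_scale(lambda e: in_toy_set(x, y / e, b * e), hi=x / b)
print(round(odf, 4), round(hodf, 4))  # 0.25 0.63
```

For this toy set the closed forms are \(D_{0}=\max\{y^{2}/(xb),\,b/x\}\) and, at interior points such as the one used here, \(D_{H}=(y^{2}/(xb))^{1/3}\); the latter satisfies \(D_{H}(x,\mu y,b/\mu)=\mu D_{H}(x,y,b)\), in line with the almost-homogeneity property.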

According to Cuesta et al. (2009) and Cuesta and Zofío (2005), a function \(D_{H}(x,y,b)\) is almost homogeneous of degrees \(k_{1}\), \(k_{2}\), \(k_{3}\) and \(k_{4}\) if \(D_{H}(\mu^{k_{1}}x,\mu^{k_{2}}y,\mu^{k_{3}}b)=\mu^{k_{4}}D_{H}(x,y,b)\) for all \(\mu>0\). The almost homogeneity of the environmental production technology is based on the assumption that \((k_{1},k_{2},k_{3},k_{4})=(0,1,-1,1)\), so that \(D_{H}\left(x,\mu y,\frac{b}{\mu}\right)=\mu D_{H}(x,y,b)\) for all \(\mu>0\). Consequently, setting \(\mu=1/y\) implies that \(D_{H}(x,1,by)=\left(\frac{1}{y}\right)D_{H}(x,y,b)\).

Cuesta et al. (2009) further show that a differentiable function \(D_{H}(x_{1},\dots,x_{J};y_{1},\dots,y_{M};b_{1},\dots,b_{N})\) is almost homogeneous if \(k_{1}\sum_{j=1}^{J}\frac{\partial D_{H}}{\partial x_{j}}x_{j}+k_{2}\sum_{m=1}^{M}\frac{\partial D_{H}}{\partial y_{m}}y_{m}+k_{3}\sum_{n=1}^{N}\frac{\partial D_{H}}{\partial b_{n}}b_{n}=k_{4}D_{H}\). Imposing \((k_{1},k_{2},k_{3},k_{4})=(0,1,-1,1)\) and dividing both sides by \(D_{H}\) yields \(\sum_{m=1}^{M}\frac{\partial\ln D_{H}}{\partial\ln y_{m}}-\sum_{n=1}^{N}\frac{\partial\ln D_{H}}{\partial\ln b_{n}}=1\), where \(y_{m}\) are the desirable outputs and \(b_{n}\) are the undesirable outputs. For a single desirable output \(y\) and a single undesirable output \(b\), this simply implies that \(\frac{\partial\ln D_{H}}{\partial\ln y}-\frac{\partial\ln D_{H}}{\partial\ln b}=1\).
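A quick numerical check of almost homogeneity with \((k_{1},k_{2},k_{3},k_{4})=(0,1,-1,1)\) may help. The sketch assumes a hypothetical single-input function of the form \(D_{H}=y\,h(x,by)\), which is almost homogeneous by construction; both `toy_dh` and the checker `almost_homogeneous` are illustrative names, not part of the estimation.

```python
import math

def toy_dh(x, y, b):
    """Hypothetical D_H of the form y * h(x, b*y); any such form satisfies
    D_H(x, mu*y, b/mu) = mu * D_H(x, y, b)."""
    return y * (b * y) ** 0.3 / x ** 0.5

def almost_homogeneous(f, x, y, b, mus, tol=1e-12):
    """Check f(x, mu*y, b/mu) == mu * f(x, y, b) for each mu in mus."""
    base = f(x, y, b)
    return all(math.isclose(f(x, m * y, b / m), m * base, rel_tol=tol)
               for m in mus)

print(almost_homogeneous(toy_dh, 4.0, 2.0, 0.5, [0.5, 1.7, 3.0]))  # True
# Normalising with mu = 1/y: D_H(x, 1, b*y) equals D_H(x, y, b) / y
print(math.isclose(toy_dh(4.0, 1.0, 0.5 * 2.0), toy_dh(4.0, 2.0, 0.5) / 2.0))  # True
```

For this toy function, \(\partial\ln D_{H}/\partial\ln y=1.3\) and \(\partial\ln D_{H}/\partial\ln b=0.3\), so their difference is one, as the derivative condition requires.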

Under constant returns to scale, the HODF is equal to the square root of the input distance function or the square root of the inverse of the output distance function (Färe et al. 2002). Given the relationship between the ODF and the DODF shown above, this also allows the HODF to be related to the directional output distance function.

Derivation of shadow cost

The derivation of shadow costs is based on the duality between the revenue function and the output distance function. Therefore, shadow cost estimates based on an ODF measure the value of forgone income resulting from the reduction in undesirable outputs. Assuming a single desirable and a single undesirable output, revenue is given as \({p}_{y}y+{p}_{b}b\), where \({p}_{y}\) is the market price of the desirable output and \({p}_{b}\) is the shadow price of the undesirable output. The shadow price of the undesirable output is derived by solving

$${\mathrm{max}}_{\left(y,b\right)}\mathcal{L}\equiv {p}_{y}y+{p}_{b}b+\lambda \left[{D}_{0}\left(x,y,b\right)-1\right]$$
(6)

which yields:

$${p}_{b} = {p}_{y}\left[\left(\frac{ \partial {D}_{0} \left( x,y,b \right)}{\partial b}\right)/\left(\frac{ \partial {D}_{0} \left( x,y,b \right)}{\partial y}\right)\right]$$
(7)

Recalling that the output distance function is non-increasing in b and non-decreasing in y, we deduce that the value of \({p}_{b}\) must be negative. The term in the square brackets in Eq. (7) is the marginal rate of transformation between undesirable and desirable outputs.
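Once the partial derivatives are known, the shadow-price formula in Eq. (7) is a one-line computation. A minimal sketch follows; the derivative values are hypothetical and chosen only to show the sign logic, and the function name is ours.

```python
def shadow_price_from_gradient(p_y, dD_db, dD_dy):
    """Eq. (7): p_b = p_y * (dD/db) / (dD/dy).

    With D non-increasing in b (dD_db <= 0) and non-decreasing in y
    (dD_dy > 0), the result is non-positive."""
    if dD_dy <= 0:
        raise ValueError("dD/dy must be positive")
    return p_y * dD_db / dD_dy

print(shadow_price_from_gradient(1.0, -0.02, 0.5))  # -0.04
```

The bracketed ratio here, \(-0.02/0.5\), is the marginal rate of transformation: one more unit of \(b\) abated costs 0.04 units of \(y\) at the margin.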

The dual to the HODF, on the other hand, is somewhat different, as shown by Färe et al. (2002): it is the “return to the dollar”, given as \(\frac{{p}_{y}y+{p}_{b}b}{{\varvec{w}}{\varvec{x}}}\), where \({\varvec{w}}{\varvec{x}}\) is the cost of inputs, w is a vector of input prices and x is the vector of inputs. Thus, the shadow price is computed from the following maximization problem:

$${\mathrm{max}}_{\left(y,b\right)}\mathcal{L}\equiv \frac{{p}_{y} y+{p}_{b} b}{{\varvec{w}}{\varvec{x}}}+\lambda \left[{D}_{H}\left(x,y,b\right)-1\right]$$
(8)

which yields:

$${p}_{b} = {p}_{y}\left[\left(\frac{ \partial {D}_{H} \left( x,y,b \right)}{\partial b}\right)/\left(\frac{ \partial {D}_{H} \left( x,y,b \right)}{\partial y}\right)\right]$$
(9)

under the assumption that inputs are held constant. Note that this result does not apply to the “super” hyperbolic output distance function (Cuesta et al. 2009), which allows inputs to adjust. Equation (9) shows that the formula used to compute the shadow cost from the HODF is similar to the one derived from the ODF under the relevant assumptions.

Limitations of the theoretical framework

It is important to note that the theoretical framework of distance functions has limitations, primarily because the approach relies crucially on assumptions about the production technology. Murty et al. (2012) argue that the free disposability of pollution, which treats pollution as an input in the IDF, and the weak disposability assumption of ODFs may generate unacceptable implications for trade-offs among inputs, outputs, and pollution. To overcome some of these implications, Murty et al. (2012) propose a by-production framework. Specifically, their approach considers two separate production technologies, one for the intended output and the other for pollution, but it is very difficult to apply empirically. As a result, their approach has gained little traction in the empirical literature. On the other hand, Ma et al. (2019) note that the weakly disposable formulation of the parametric distance function, which we adopt in this study, is less controversial than its nonparametric counterparts. For directional distance functions, the key shortcoming is the use of arbitrary directional vectors (Ma et al. 2019).

Empirical specifications, data and the estimation method

Model specification

Which theoretical approach should be adopted to specify the empirical model? The literature review by Zhou et al. (2014) indicates that estimates of shadow costs depend greatly on the estimation method. Ma and Hailu (2016) show that differences in input/output coverage may explain the considerable heterogeneity in abatement cost estimates. Most empirical works use ODFs, with a few exceptions based on IDFs (Ma and Hailu 2016; Lee et al. 2002).

In their exploratory study comparing various approaches, Vardanyan and Noh (2006) concluded that no single technique is superior to all others. As a general principle, however, they indicated that the Shephard ODF/IDF is more suitable for estimating shadow prices under less stringent regulations that permit, for instance, an increase in CO2 emissions. In contrast, the DODF is more appropriate for modelling situations in which producers may increase one output while simultaneously reducing the other, as is expected under more stringent environmental regulations such as a mandatory reduction of CO2 emissions.

Estimating the parameters requires imposing restrictions, such as the homogeneity restriction for the ODF and the translation property for the DODF. These lead to differences in functional form: the DODF has a quadratic form with variables measured in levels, whereas the ODF is translog (Ma and Hailu 2016). Vardanyan and Noh (2006) point out that both the homogeneity of degree one assumption required for the ODF and the quadratic specification of the DODF are restrictive. Besides the restrictiveness of the quadratic form, the arbitrary choice of directional vector is considered another shortcoming of the DODF.

In the hyperbolic ODF (HODF), the restrictive homogeneity assumption is replaced by the less restrictive “almost homogeneous” assumption (Vardanyan and Noh 2006). Unlike the ODF, the HODF treats desirable and undesirable outputs asymmetrically (Cuesta et al. 2009; Duman and Kasman 2018). Thus, the HODF is similar to the DODF in that it simultaneously scales the good and the bad in the north-western direction (Vardanyan and Noh 2006, p. 180). However, it does so without requiring an arbitrarily chosen directional vector, as the DODF does. The functional form of the HODF is translog (Cuesta and Zofío 2005; Cuesta et al. 2009), so it does not rely on the restrictive quadratic form used for the DODF. These desirable features make the translog HODF attractive for approximating shadow costs. As a result, our study is based on the HODF functional form, which also distinguishes it from existing estimates of the MACs of GHG abatement in Canada. In the next section, we formally present the translog ODF and illustrate the “almost homogeneous” restrictions required to obtain the HODF results.

The Translog ODF and the “almost homogeneous” restriction

The Translog output distance function is given as:

$$\begin{aligned} \ln D_{j} \left( {x_{j} ,y_{j} ,b_{j} } \right) = & \;\alpha_{0} + \mathop \sum \limits_{k = 1}^{K} \alpha_{k} \ln x_{kj} \\ & \; + \frac{1}{2}\mathop \sum \limits_{k = 1}^{K} \mathop \sum \limits_{{k^{\prime} = 1}}^{K} \alpha_{{kk^{\prime}}} \ln x_{kj} \ln x_{{k^{\prime}j}} + \beta_{1} \ln y_{j} \\ & \; + \frac{1}{2} \beta_{2} \left( {\ln y_{j} } \right)^{2} + \gamma_{1} \ln b_{j} + \frac{1}{2} \gamma_{2} \left( {\ln b_{j} } \right)^{2} \\ & \; + \mathop \sum \limits_{k = 1}^{K} \delta_{k} \ln x_{kj} \ln y_{j} + \mathop \sum \limits_{k = 1}^{K} \eta_{k} \ln x_{kj} \ln b_{j} + \mu \ln y_{j} \ln b_{j} + \nu_{j} . \\ \end{aligned}$$
(10)

where j indexes the industry and k indexes the inputs, also assuming that \(\alpha_{{k^{\prime}k}} = \alpha_{{kk^{\prime}}}\). To relate the shadow price given in Eq. (9) to the Translog HODF, we replace \(\frac{{\partial D_{j} \left( {x_{j} ,y_{j} ,b_{j} } \right) }}{{ \partial b_{j} }} = \frac{{\partial \ln D_{j} \left( {x_{j} ,y_{j} ,b_{j} } \right) }}{{\partial \ln b_{j} }} \cdot \frac{{D_{j} \left( {x_{j} ,y_{j} ,b_{j} } \right) }}{{b_{j} }}\) and \(\frac{{\partial D_{j} \left( {x_{j} ,y_{j} ,b_{j} } \right)}}{{ \partial y_{j} }} = \frac{{\partial \ln D_{j} \left( {x_{j} ,y_{j} ,b_{j} } \right)}}{{\partial \ln y_{j} }} \cdot \frac{{D_{j} \left( {x_{j} ,y_{j} ,b_{j} } \right) }}{{y_{j} }}\). Substituting these expressions in Eq. (9), we obtain:

$$p_{b,j} = p_{y,j}\left(\frac{\frac{\partial \ln D_{j}\left(x_{j},y_{j},b_{j}\right)}{\partial \ln b_{j}}\cdot\frac{D_{j}\left(\cdot\right)}{b_{j}}}{\frac{\partial \ln D_{j}\left(x_{j},y_{j},b_{j}\right)}{\partial \ln y_{j}}\cdot\frac{D_{j}\left(\cdot\right)}{y_{j}}}\right) = p_{y,j}\left(\frac{y_{j}}{b_{j}}\right)\left(\frac{\varepsilon_{bj}}{\varepsilon_{yj}}\right)$$
(11)

where \(\varepsilon_{bj}\) and \(\varepsilon_{yj}\) are the elasticities of the distance function with respect to the undesirable and the desirable outputs, respectively.
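Eq. (11) can be evaluated directly once the elasticities are available. The sketch below assumes, for illustration, that \(y\) is real GDP in millions of dollars (so \(p_{y}=1\)) and that \(b\) is measured in kilotonnes of CO2e, as in Table 1; the ratio then comes out in M$/kt and is rescaled to $/t. The sector values, the elasticities, the function name, and the convention of reporting the MAC as the absolute value of \(p_{b}\) are all hypothetical choices for this example.

```python
def shadow_price_per_tonne(gdp_musd, ghg_kt, eps_b, eps_y, p_y=1.0):
    """Eq. (11): p_b = p_y * (y / b) * (eps_b / eps_y), returned in $/t.

    Assumes y is real GDP in millions of dollars (p_y = 1) and b is
    emissions in kilotonnes CO2e; 1 M$/kt = 1,000 $/t."""
    if eps_y == 0:
        raise ValueError("eps_y must be non-zero")
    p_b_musd_per_kt = p_y * (gdp_musd / ghg_kt) * (eps_b / eps_y)
    return 1_000.0 * p_b_musd_per_kt  # negative when monotonicity holds

# Hypothetical sector: 20,000 M$ of GDP, 10,000 kt CO2e,
# elasticities eps_b = -0.02 and eps_y = 0.98 (so eps_y - eps_b = 1)
mac = -shadow_price_per_tonne(20_000, 10_000, -0.02, 0.98)
print(round(mac, 2))  # 40.82
```

The output-to-emissions ratio \(y/b\) enters multiplicatively, which is why, holding the elasticity ratio fixed, less GHG-intensive sectors show higher shadow MACs.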

The “almost homogeneity” of outputs is given by \(D \left( { x,y,b } \right)/ y = D \left( { x,1,by } \right)\) (Cuesta et al. 2009; Duman and Kasman 2018). Taking the logarithms of both sides, we obtain \(\ln D \left( { x,y, b} \right) - \ln y = \ln D\left( {x,1,by} \right) \Rightarrow - \ln y = \ln D\left( {x,1,by} \right) - \ln D \left( { x,y, b} \right)\).

The parameters of this specification are estimated by, first, imposing the “almost homogeneity” restrictions in order to transform the specification in Eq. (10) into that of \(\ln D_{H}\left(x_{j},1,b_{j}y_{j}\right)\) and, second, treating the unobserved \(\ln D_{j}\left(x_{j},y_{j},b_{j}\right)\) as a one-sided error term, denoted by \(u_{j}\), which measures inefficiency. In other words, we estimate:

$$\begin{aligned} - \ln y_{j} = & \quad \alpha_{0} + \mathop \sum \limits_{k = 1}^{K} \alpha_{k} \ln x_{kj} + \frac{1}{2}\mathop \sum \limits_{k = 1}^{K} \mathop \sum \limits_{{k^{\prime} = 1}}^{K} \alpha_{{kk^{\prime}}} \ln x_{kj} \ln x_{{k^{\prime}j}} \\ & \quad + \gamma_{1} \ln y_{j} + \frac{1}{2} \beta_{2} \left( {\ln y_{j} } \right)^{2} + \gamma_{1} \ln b_{j} + \frac{1}{2} \gamma_{2} \left( {\ln b_{j} } \right)^{2} + \mathop \sum \limits_{k = 1}^{K} \delta_{k} \ln x_{kj} \ln y_{j} \\ & \quad + \mathop \sum \limits_{k = 1}^{K} \eta_{k} \ln x_{kj} \ln b_{j} + \mu \ln y_{j} \ln b_{j} + \nu_{j} - u_{j} . \\ \end{aligned}$$
(12)

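The restrictions \(\beta_{2}=\gamma_{2}=\mu\), \(\delta_{k}=\eta_{k}\) for all \(k\), and \(\beta_{1}-\gamma_{1}=1\) are the conditions under which the translog in Eq. (10) satisfies “almost homogeneity”, that is, \(\frac{\partial\ln D_{H}}{\partial\ln y}-\frac{\partial\ln D_{H}}{\partial\ln b}=1\) at every data point.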
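The restrictions can be traced to the single-output almost-homogeneity condition. A short sketch of the algebra, assuming the condition must hold at every data point, differentiates the translog in Eq. (10):

$$\begin{aligned} \frac{\partial \ln D_{j}}{\partial \ln y_{j}} &= \beta_{1} + \beta_{2}\ln y_{j} + \sum_{k=1}^{K}\delta_{k}\ln x_{kj} + \mu \ln b_{j}, \\ \frac{\partial \ln D_{j}}{\partial \ln b_{j}} &= \gamma_{1} + \gamma_{2}\ln b_{j} + \sum_{k=1}^{K}\eta_{k}\ln x_{kj} + \mu \ln y_{j}, \\ \frac{\partial \ln D_{j}}{\partial \ln y_{j}} - \frac{\partial \ln D_{j}}{\partial \ln b_{j}} &= (\beta_{1}-\gamma_{1}) + (\beta_{2}-\mu)\ln y_{j} + (\mu-\gamma_{2})\ln b_{j} + \sum_{k=1}^{K}(\delta_{k}-\eta_{k})\ln x_{kj}. \end{aligned}$$

For the last expression to equal one for all values of \(x_{kj}\), \(y_{j}\) and \(b_{j}\), the constant must equal one and every slope must vanish, which gives \(\beta_{1}-\gamma_{1}=1\), \(\beta_{2}=\gamma_{2}=\mu\), and \(\delta_{k}=\eta_{k}\) for each input \(k\).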

We estimate the parameters using a stochastic frontier model, which allows for two error components, the statistical noise \(\nu_{j}\) and the one-sided inefficiency term \(u_{j}\), imposing all of the above restrictions except the last one a priori. Under these restrictions, all parameters are identifiable except \(\beta_{1}\), which is computed from the estimated value of \(\gamma_{1}\) as \(\beta_{1}=1+\gamma_{1}\). Since the parametric approach does not impose the monotonicity condition a priori, it is important to check whether it is satisfied at each data point. If the monotonicity condition is satisfied, the ratio \(\varepsilon_{b}/\varepsilon_{y}\) in Eq. (11) is negative.
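The elasticities entering Eq. (11) follow directly from the restricted translog. The sketch below computes them at a single data point, recovers \(\beta_{1}=1+\gamma_{1}\), and applies the sign test. The coefficient values and the data point are hypothetical, not estimates from this study, and the function names are ours.

```python
def restricted_elasticities(ln_x, ln_y, ln_b, gamma1, gamma2, eta):
    """Output elasticities of the translog in Eq. (10) after imposing
    beta1 = 1 + gamma1, beta2 = gamma2 = mu and delta_k = eta_k."""
    beta1 = 1.0 + gamma1                           # recovered after estimation
    beta2 = mu = gamma2                            # almost-homogeneity restrictions
    cross = sum(e * v for e, v in zip(eta, ln_x))  # sum_k eta_k * ln x_k
    eps_y = beta1 + beta2 * ln_y + cross + mu * ln_b
    eps_b = gamma1 + gamma2 * ln_b + cross + mu * ln_y
    return eps_y, eps_b

def monotone(eps_y, eps_b):
    """Monotonicity holds when eps_b / eps_y < 0."""
    return eps_y != 0 and eps_b / eps_y < 0

# Hypothetical coefficients and a hypothetical point (logs of two inputs, y, b)
eps_y, eps_b = restricted_elasticities([0.5, 1.2], 2.0, 1.0,
                                       gamma1=-0.05, gamma2=0.01,
                                       eta=[0.004, -0.002])
print(round(eps_y - eps_b, 12), monotone(eps_y, eps_b))  # 1.0 True
```

Under the restrictions \(\varepsilon_{y}-\varepsilon_{b}=1\) holds identically, so the sign test reduces to checking \(-1<\varepsilon_{b}<0\) at each data point.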

Data

This study uses a panel data set covering 30 industries (sectors) over seven years (2009–2015), all compiled from Statistics Canada. More specifically, data on capital inputs, measured as the geometric end-year net stock of non-residential capital by sector (in 2012 dollars), come from Statistics Canada Table 36-10-0096-01. Data on labour inputs, measured in hours worked, and on real GDP, measured in 2012 dollars, are both sourced from Statistics Canada Table 36-10-0480-01. Data on GHG emissions and energy inputs by sector are sourced from Tables 38-10-0097-01 and 38-10-0096-01, respectively.

Table 1 lists the industries (sectors) included in the data set together with their 2015 values of the outputs, GHG emissions (kt CO2e) and real GDP (M$), and of the three inputs: energy (TJ), capital (M$), and labour (thousands of hours worked). Total GHG emissions of the industries included in this study were 491 Mt CO2e in 2015, about 70 per cent of national emissions (714 Mt CO2e). One of the major GHG contributors excluded from the data is the transportation sector, which accounts for 25 per cent of national emissions. Within the group of industries covered in this study, the top ten polluters, including crop production, animal production, oil and gas extraction, and utilities (mainly electricity generation), account for nearly 90 per cent of the group’s total GHG emissions. While these industries also account for 77 per cent of total energy use, they produce only 40 per cent of the group’s GDP (Table 1).
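A one-line check of the coverage share quoted above, using only the 2015 totals given in the text:

```python
covered_mt, national_mt = 491, 714  # Mt CO2e in 2015, from the text
share = covered_mt / national_mt
print(f"{share:.1%}")  # 68.8%
```

The result rounds to the "about 70 per cent" stated in the text.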

Table 1 GDP, GHG emissions, energy, capital, and labour in 2015, by sector

As the data show, production in these top ten industries is both energy- and GHG-intensive. Under the environmental production technology, the higher a sector's GHG intensity, the lower its expected shadow marginal abatement cost. The data therefore suggest that MACs of GHGs are lower in these top ten industries, which in turn implies that these industries are more likely to undertake abatement when emission pricing policies are introduced, because the emission price is more likely to exceed their (lower) MACs.

Figure 2 ranks the industries by GHG intensity. Only four of the ten most GHG-intensive industries are manufacturing industries: paper manufacturing, non-metallic mineral product manufacturing (primarily cement), primary metal manufacturing, and petroleum and coal product manufacturing. Agriculture, utilities, mining, and oil and gas extraction make up the rest. Given the low GHG intensity of many manufacturing industries, we expect them to have higher MACs. As a result, these industries are less likely to undertake emission reductions unless emission prices are very high.
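The ranking in Fig. 2 is by GHG intensity, that is, emissions per unit of real GDP. A minimal sketch, assuming intensity is measured as kt CO2e per M$ of real GDP (equivalently, tonnes per $1,000) and using hypothetical sector values:

```python
from typing import Dict, List, Tuple


def ghg_intensity(ghg_kt: float, gdp_musd: float) -> float:
    """kt CO2e per M$ of real GDP, which equals tonnes CO2e per $1,000."""
    return ghg_kt / gdp_musd


def rank_by_intensity(data: Dict[str, Tuple[float, float]]) -> List[str]:
    """Sector names ordered from most to least GHG-intensive.

    data maps sector name -> (GHG emissions in kt CO2e, real GDP in M$).
    """
    return sorted(data, key=lambda name: ghg_intensity(*data[name]), reverse=True)


# Hypothetical values, not taken from Table 1.
sectors = {
    "hypothetical utility": (5000.0, 1000.0),     # 5.0 t per $1,000
    "hypothetical electronics": (100.0, 2000.0),  # 0.05 t per $1,000
    "hypothetical cement": (2000.0, 500.0),       # 4.0 t per $1,000
}
print(rank_by_intensity(sectors))
# ['hypothetical utility', 'hypothetical cement', 'hypothetical electronics']
```

Under the environmental production technology described above, the shadow MACs would be expected to follow roughly the reverse of this ordering.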

Fig. 2 Greenhouse gas emission intensity by industry

Estimation method

Shadow costs can be estimated with several methods: nonparametric (data envelopment analysis) methods, parametric but non-stochastic (mathematical programming) methods, and parametric (econometric) methods. While the non-stochastic methods ensure that the monotonicity conditions are enforced at all data points, they do not account for statistical noise and require a large number of observations to generate reliable results. The econometric approach, on the other hand, does not impose this condition a priori. One should therefore check whether the results violate the monotonicity condition and exclude cases in which it is violated.

As discussed above, the data used in this study form a panel of 30 cross-sectional units observed over 7 years. We therefore focus on the panel data stochastic frontier model:

$$y_{jt} = \beta_{0} + \mathop \sum \limits_{k = 1}^{K} \beta_{k} x_{kjt} + v_{jt} - s u_{jt} .$$
(13)

where \(s = 1\) for production frontiers and \(s = -1\) for cost frontiers; \(y_{jt}\) is the dependent variable; \(x_{kjt}\) are the explanatory variables; \(v_{jt}\) is the idiosyncratic error; and \(u_{jt} \ge 0\) is the inefficiency term. The index k denotes the regressors, j the cross-section units, and t the time period. Note that the vector \(x_{kjt}\) includes all the variables indicated in Eq. 12.
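The role of the sign \(s\) can be seen by simulating Eq. (13). The sketch below draws \(v_{jt}\) from a normal distribution and \(u_{jt}\) from a half-normal distribution (the distributional assumptions stated for Models 1 and 2) and confirms that the composed error is negative on average for a production frontier and positive for a cost frontier. The coefficient values and regressors are arbitrary illustrations.

```python
import random
from typing import List, Sequence


def simulate_frontier(beta0: float, betas: Sequence[float], x_rows: Sequence[Sequence[float]],
                      sigma_v: float, sigma_u: float, s: int = 1, seed: int = 0) -> List[float]:
    """Draw y_jt = beta0 + sum_k beta_k x_kjt + v_jt - s * u_jt for each row of regressors."""
    rng = random.Random(seed)
    ys = []
    for x in x_rows:
        v = rng.gauss(0.0, sigma_v)          # symmetric noise
        u = abs(rng.gauss(0.0, sigma_u))     # half-normal inefficiency, u >= 0
        ys.append(beta0 + sum(b * xk for b, xk in zip(betas, x)) + v - s * u)
    return ys


rows = [[1.0, 2.0]] * 5000
deterministic = 0.5 + 0.3 * 1.0 + 0.2 * 2.0   # frontier value for every row
for s in (1, -1):
    ys = simulate_frontier(0.5, [0.3, 0.2], rows, sigma_v=0.1, sigma_u=0.5, s=s)
    mean_error = sum(y - deterministic for y in ys) / len(ys)
    print(s, round(mean_error, 2))  # roughly -0.4 for s=1 and +0.4 for s=-1
```

The magnitude of the mean shift is close to the half-normal mean \(\sigma_{u}\sqrt{2/\pi}\), about 0.4 here.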

In its early development, the panel data stochastic frontier framework treated technical inefficiency simply as individual heterogeneity captured by the individual effect term (Schmidt and Sickles 1984; Battese and Coelli 1988). This implies that technical inefficiency is time invariant. Because this approach attributes all individual heterogeneity to inefficiency, it cannot capture technical efficiency that evolves over time. It is therefore important to consider methods that allow for time variation. Cornwell et al. (1990), Kumbhakar (1990), and Battese and Coelli (1995) proposed various specifications that allow for a time-varying inefficiency term.

We employ the time-varying specification proposed by Battese and Coelli (1992), which parameterizes inefficiency using

$$u _{jt} = \exp \left\{ { - \eta \left( { t - T_{j} } \right)} \right\}u_{j}$$
(14)

where \(T_{j}\) is the last period in the \(j\)th panel and η is the decay parameter. If η is not statistically different from zero, the inefficiency term is time invariant and equal to \(u_{j}\). When \(\eta > 0\), the degree of inefficiency decreases over time, and when \(\eta < 0\), it increases over time. Because \(t = T_{j}\) is the last period, inefficiency in the last period for industry \(j\) serves as the base level. Thus, if \(\eta > 0\), inefficiency decays towards the base level, and if \(\eta < 0\), it rises towards the base level. However, this approach does not separate individual-specific effects from the inefficiency effect. To address this and to account for unobserved fixed effects, we include industry dummies in the estimation model, in the spirit of standard panel data econometrics; we refer to this specification as Model 1.
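Eq. (14) can be evaluated directly to see how the sign of η shapes the inefficiency path. In this sketch the base level \(u_{j}\) and the values of η are hypothetical; the periods match the 2009–2015 sample.

```python
import math
from typing import List, Sequence


def inefficiency_path(u_j: float, eta: float, periods: Sequence[int]) -> List[float]:
    """u_jt = exp(-eta * (t - T_j)) * u_j, with T_j the last period of panel j."""
    T_j = max(periods)
    return [math.exp(-eta * (t - T_j)) * u_j for t in periods]


years = list(range(2009, 2016))
for eta in (0.1, 0.0, -0.1):
    path = inefficiency_path(0.2, eta, years)
    print(eta, [round(u, 3) for u in path])
# eta > 0: inefficiency falls towards u_j = 0.2 in 2015
# eta = 0: constant at 0.2 (time-invariant case)
# eta < 0: inefficiency rises towards 0.2 in 2015
```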

In addition, for comparison with Model 1, our preferred model, we estimate the time-invariant model of Battese and Coelli (1988), referred to as Model 2. Model 2 attributes all individual effects to the inefficiency term, but it allows us to include time dummies. Both models are estimated by maximum likelihood. The likelihood function assumes a half-normal distribution for the inefficiency term and a normal distribution for the idiosyncratic error: \(v_{jt} \sim N\left( {0,\sigma_{v}^{2} } \right)\) and \(u_{jt} \sim N^{ + } \left( { 0, \sigma_{u}^{2} } \right)\).
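To make these distributional assumptions concrete, the sketch below writes out the log-likelihood of the standard cross-sectional normal/half-normal frontier, with composed error \(\varepsilon = v - s u\), \(\sigma^{2} = \sigma_{u}^{2} + \sigma_{v}^{2}\) and \(\lambda = \sigma_{u}/\sigma_{v}\). It is a simplified illustration only: the Battese and Coelli panel estimators behind Models 1 and 2 build the likelihood panel by panel (and, for Model 1, include the decay parameter η), so this is not the exact function maximized in the paper.

```python
import math
from typing import Iterable


def log_norm_cdf(z: float) -> float:
    """log of the standard normal CDF via erfc (adequate for moderate z, not extreme left tails)."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def frontier_loglik(residuals: Iterable[float], sigma_u: float, sigma_v: float, s: int = 1) -> float:
    """Sum of log f(eps) for eps = v - s*u, v ~ N(0, sigma_v^2), u ~ N+(0, sigma_u^2).

    f(eps) = (2 / sigma) * phi(eps / sigma) * Phi(-s * eps * lam / sigma).
    """
    sigma = math.hypot(sigma_u, sigma_v)
    lam = sigma_u / sigma_v
    total = 0.0
    for eps in residuals:
        total += (0.5 * math.log(2.0 / math.pi) - math.log(sigma)
                  - 0.5 * (eps / sigma) ** 2
                  + log_norm_cdf(-s * eps * lam / sigma))
    return total
```

As \(\sigma_{u} \to 0\) the expression collapses to the ordinary normal log-likelihood, which is a quick sanity check on the implementation.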

Results

We obtained estimates under both the time-varying and time-invariant efficiency specifications, imposing the theoretical restrictions on the parameters a priori. Since the decay parameter of the efficiency term (\(\eta\)) is statistically significant, the time-varying model is preferred. The time-varying model also fits the data better, as reflected in its considerably higher log-likelihood and Wald chi-square statistics. Nonetheless, we discuss results from both models, which are presented in Table 2.

Table 2 Estimation results

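The statistical significance of the inefficiency term is assessed by testing whether \(\sigma_{s}^{2}\) and \(\gamma\) are statistically different from zero, where, in the Battese and Coelli parameterization, \(\sigma_{s}^{2} = \sigma_{u}^{2} + \sigma_{v}^{2}\) and \(\gamma = \sigma_{u}^{2}/\sigma_{s}^{2}\). These correspond to the estimated parameters labelled lnsigma2 and lgtgamma, respectively.Footnote 5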
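The labels lnsigma2 and lgtgamma indicate that the variance parameters are estimated on transformed scales, which keeps \(\sigma_{s}^{2}\) positive and \(\gamma\) inside (0, 1). Assuming lnsigma2 is \(\ln\sigma_{s}^{2}\) and lgtgamma is the logit of \(\gamma\) (a convention similar to the parameterization reported by Stata's `xtfrontier`), the underlying variances can be recovered as in this sketch. The input values are hypothetical, not the Table 2 estimates.

```python
import math
from typing import Dict


def recover_variances(lnsigma2: float, lgtgamma: float) -> Dict[str, float]:
    """Back-transform ln(sigma_s^2) and logit(gamma) into the variance components.

    sigma_s^2 = sigma_u^2 + sigma_v^2 and gamma = sigma_u^2 / sigma_s^2.
    """
    sigma_s2 = math.exp(lnsigma2)
    gamma = 1.0 / (1.0 + math.exp(-lgtgamma))   # inverse logit
    return {
        "sigma_s2": sigma_s2,
        "gamma": gamma,
        "sigma_u2": gamma * sigma_s2,          # inefficiency variance
        "sigma_v2": (1.0 - gamma) * sigma_s2,  # noise variance
    }


print(recover_variances(-1.5, 2.0))  # hypothetical estimates
```

A \(\gamma\) close to 1 indicates that most of the composed-error variance is attributable to inefficiency rather than noise.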

As noted earlier, satisfying the monotonicity condition at each data point is an important requirement for the validity of the results. We verified this condition by checking whether the shadow costs computed at each data point are negative. The kernel density plot of the shadow MACs computed from both models' estimates (Fig. 3) shows that the condition is satisfied in all cases. The distribution also shows that the MACs estimated with time-varying efficiency are relatively larger, with some extremely large values. This is also evident in Fig. 5, which compares the estimated average shadow MACs.
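A kernel density curve such as those in Fig. 3 can be computed in a few lines. The kernel and bandwidth used for the figure are not reported here; this sketch assumes a Gaussian kernel with the normal-reference (Silverman) rule-of-thumb bandwidth \(h = 1.06\,\hat\sigma\,n^{-1/5}\), and the MAC values are hypothetical.

```python
import math
import statistics
from typing import List, Optional, Sequence


def gaussian_kde(sample: Sequence[float], grid: Sequence[float],
                 bandwidth: Optional[float] = None) -> List[float]:
    """Gaussian kernel density estimate of `sample`, evaluated at each point of `grid`."""
    n = len(sample)
    if bandwidth is None:
        bandwidth = 1.06 * statistics.stdev(sample) * n ** (-1 / 5)
    scale = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [scale * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)
            for x in grid]


macs = [5.5, 12.0, 20.0, 35.0, 48.0, 60.0, 120.0, 300.0]  # hypothetical $/t values
grid = [float(x) for x in range(0, 401, 50)]
for x, d in zip(grid, gaussian_kde(macs, grid)):
    print(f"{x:6.0f} {d:.5f}")
```

Because MACs are strictly positive and strongly right-skewed, a Gaussian kernel assigns some probability mass below zero; estimating the density of log MACs is one common way to avoid this.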

Fig. 3 Kernel density curves of shadow MACs

To identify the extreme cases, we refer to Fig. 4, a scatter plot comparing the estimated MACs for each industry and year. As Fig. 4 shows, in three years the shadow MACs of the computer and electronic product manufacturing industry are exceptionally high. This industry has the lowest GHG intensity (see Fig. 2) and, as a result, generally has the highest shadow MAC. These outlying MACs are therefore consistent with theoretical expectations. The graph also shows that the MACs estimated with the two models are generally similar, except in these three years for this industry. To recap the difference between the two, Model 1 captures industry-specific effects while also allowing the inefficiency term to vary over time. This feature makes it the preferred model, and estimates from Model 2 are presented as a robustness check. The close agreement between the two sets of estimates indicates that the results are robust.

Fig. 4 Scatter plot of shadow MACs by industry and year

The calculated shadow costs are shown in Table 3 of the Appendix, and the average shadow MACs over the period are depicted in Fig. 5. The estimated shadow MACs are ranked almost entirely in the reverse order of GHG intensity, with the most GHG-intensive sectors having the lowest MACs. The estimates from Model 1, the time-varying efficiency model, yield shadow MACs ranging from a low of $5.50/t in animal production to a high of $993.55/t in computer and electronic product manufacturing. The corresponding values from the time-invariant Model 2 are $3.40/t and $635.35/t, respectively. Averaging the two computed MACs for each sector gives shadow MACs ranging from $4.45/t to $814.45/t. The vertical line in Fig. 5 marks the $50/t carbon charge. The average MAC estimates are below $50/t in 16 of the 30 sectors covered, as indicated under the horizontal cut-off line in Fig. 5. These 16 sectors account for about 98 per cent of total GHG emissions in our data set, so their decisions on reducing GHG emissions carry considerable weight in the national outcome.
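The averaging and the comparison with the $50/t charge can be reproduced mechanically. The sketch below checks the averaged endpoints quoted above and then applies the $50/t cut-off to a small hypothetical set of sectors; the sector values and emission weights are placeholders, not the Table 3 estimates.

```python
from typing import Dict, List, Tuple


def average_mac(mac_model1: float, mac_model2: float) -> float:
    """Simple average of the Model 1 and Model 2 shadow MACs ($/t)."""
    return (mac_model1 + mac_model2) / 2.0


def sectors_below_price(sectors: Dict[str, Tuple[float, float, float]],
                        price: float = 50.0) -> Tuple[List[str], float]:
    """Sectors whose average MAC is below `price`, and their share of total emissions.

    sectors maps name -> (MAC from Model 1, MAC from Model 2, GHG emissions in kt).
    """
    below = [name for name, (m1, m2, _) in sectors.items() if average_mac(m1, m2) < price]
    total = sum(ghg for _, _, ghg in sectors.values())
    covered = sum(sectors[name][2] for name in below)
    return sorted(below), covered / total


print(round(average_mac(5.50, 3.40), 2), round(average_mac(993.55, 635.35), 2))  # 4.45 814.45

hypothetical = {
    "sector A": (5.50, 3.40, 9000.0),
    "sector B": (993.55, 635.35, 100.0),
    "sector C": (60.0, 30.0, 900.0),
}
print(sectors_below_price(hypothetical))  # (['sector A', 'sector C'], 0.99)
```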

Fig. 5 Comparison of estimated shadow MACs by sectors

How do our estimates compare with those available for Canada? A Statistics Canada study of the manufacturing sector over 2004–2014 estimated average shadow prices of $390/t for GHGs and $453/t for CO2, measured in 2012 Canadian dollars (Gu et al. 2019). Unlike our estimation, the directional output distance function in that study assumed production on the efficient frontier, so their estimates do not account for inefficiency. This assumption may be one reason why their estimates are much higher than ours: as demonstrated in section two, the closer the production mix is to the frontier, the higher the shadow cost, owing to the limited opportunity for substitution. Moreover, their shadow cost estimate for methane is very high, implying that estimates based on aggregate GHGs are even higher for industries in which methane accounts for a significant share of GHG emissions. Dang and Mourougane (2014), applying the same method as Gu et al. (2019) to a panel of OECD countries, reported a shadow cost of $245/t (in 2005 US dollars at PPP) for the Canadian economy, while Brandt et al. (2014), in another OECD study, reported a shadow cost of US$130/t (in 2008 US dollars). Overall, the discrepancies among these estimates can largely be explained by differences in the mapping rules used.

Conclusions

This study estimates the MACs of reducing GHG emissions across 30 industries in Canada using the shadow cost approach. Specifically, adopting a HODF, employing stochastic frontier estimation, and using industry-level data on GHG emissions caused by energy consumption, we obtain an average shadow MAC of $130/t. By comparing the sector-level estimates with the federal carbon levy of $50/t, one can identify the industries that would undertake GHG abatement to avoid paying the federal carbon tax. For instance, in GHG-intensive industries such as electric utilities and non-conventional oil extraction, the estimated MACs are much lower than the $50/t levy. Furthermore, these low-MAC sectors account for about 98 per cent of total GHG emissions and 94 per cent of total energy use in the industries studied. The important policy implication is that national GHG emissions are expected to fall significantly under the current federal carbon pricing scheme, because the major emitters also face relatively low marginal costs of reducing emissions.

On the other hand, some manufacturing industries have shadow MACs substantially higher than $100/t. These high-MAC industries are unlikely to undertake any abatement, since simply paying the carbon levy is the less expensive alternative for them. Because their emissions account for a negligible share of national emissions, they carry relatively little weight in the overall emission outcome. Furthermore, the revenue generated by the carbon levy on these industries may be relatively small because of their low emissions.

Nevertheless, these manufacturing industries will individually face higher overall costs of doing business because of emission charges. For trade-exposed sectors and firms, the emission charge will lead to competitive disadvantages, particularly where trade competitors are located in countries that lack any stringent or meaningful carbon pricing policy.

In addition, our framework shows how improvements in technical efficiency could reduce GHG emissions while the production frontier remains unchanged. In other words, our estimates of shadow MACs, based on an environmental production technology, rely entirely on projecting the inefficient production mix onto the frontier; the MACs therefore represent the notional marginal cost of abatement under the existing production technology. The model does not account for technological progress, which would shift the production possibility frontier itself. Similarly, the framework does not account for the substitution of cleaner alternatives for GHG-intensive inputs; that is, the mapping rule does not allow producers to reduce energy intensity or switch to clean energy sources. Our predictions of how different sectors might respond to the emission pricing policy, based on the estimated shadow MACs, therefore pertain to the short run. In the long run, technological change will be an important vehicle for GHG abatement.

Finally, as noted before, our estimates are based on industry-level data, as dictated by data availability, and the policy implications drawn apply at that level. Ideally, plant-level estimation would provide a much better understanding of how individual firms within a given sector respond to the carbon tax. Overall, obtaining plant-level MAC estimates over a longer period would considerably improve our understanding of GHG abatement activity in Canada, and this is an important avenue for future research.