1 Introduction

The short-term orientation of executive pay is a fundamental shortcoming of compensation practices. Former US Treasury Secretary Geithner stated that

This financial crisis had many significant causes, but executive compensation practices were a contributing factor. Incentives for short-term gains overwhelmed the checks and balances meant to mitigate against the risk of excess leverage...Companies should seek to pay top executives in ways that are tightly aligned with the long-term value and soundness of the firm.Footnote 1

A fundamental recommendation made by the Treasury in the same press release is that “compensation should be structured to account for the time horizon of risks.” Past crises have shown that if firms do not account for short-term changes to firm performance in their compensation contracts, this can have severe consequences.

Despite recent policy changes, compensation practices continue to be heavily criticized in the media.Footnote 2 Thus, this paper addresses short-termism in executive compensation. I focus specifically on distributional heterogeneity in the time horizon of performance–pay elasticities, using yearly total compensation and accumulated wealth. The main question I address is whether pay at the top of the conditional distribution is more short-term oriented.

The research question comes from the criticism that executives benefit excessively from short-term changes in firm value (Bebchuk and Fried 2004; Edmans et al. 2017a). Graham et al. (2005) find that 78% of executives are willing to sacrifice long-term firm value to outperform the market’s expectations. I test if the relation of executive pay to short-term and long-term firm and industry performance is heterogeneous across the conditional yearly compensation and total wealth distribution, using the Method of Moments–Quantile Regression (MM–QR) (Machado and Santos Silva 2019). I also allow for asymmetric response of pay to positive and negative short-term firm performance in a second specification, as in the asymmetric benchmarking literature (Garvey and Milbourn 2006; Campbell and Thompson 2015; Daniel et al. 2019).

I employ a panel quantile regression methodology developed by Machado and Santos Silva (2019) to account for endogeneity driven by risk preferences and other latent personality traits, assuming they are time-constant and executive-firm specific. The strength of the estimator is that it accounts for unobserved average and distributional heterogeneity with executive-firm fixed effects, which is not the case for most other panel quantile regression estimators (Machado and Santos Silva 2019). It also gives direct inference on the significance of distributional effect heterogeneity, which I use to test the hypotheses.

The literature hitherto has identified short-termism as a problem (Narayanan 1985; Bebchuk and Stole 1993; Edmans et al. 2019; Marinovic and Varas 2019), but has not systematically assessed distributional differences in the time horizon of executive compensation. I find significant distributional heterogeneity of short-term and long-term performance–pay relations.Footnote 3 Total yearly compensation, a flow measure of pay, is more sensitive to short-term firm performance in the left tail of the conditional distribution and more sensitive to long-term firm performance in the right tail. By contrast, total wealth, a stock measure of pay, is quantitatively, though not always significantly, more sensitive to short-term firm performance in the right tail of the conditional distribution and more sensitive to long-term firm performance in the left tail. This suggests there are weaker incentives to invest in long-term projects for conditionally wealthier executives (Edmans et al. 2017b). Putting this into context, Gopalan et al. (2014) find that firms react to higher stock returns by increasing the duration of compensation; however, this can also be to retain talent.

Past literature has suggested asymmetric benchmarking as a possible driver of managerial skimming, which would be so if the asymmetry is stronger in the right tail of the conditional distribution (Garvey and Milbourn 2006; Bizjak et al. 2008; Daniel et al. 2019). When allowing for asymmetric short-term performance–pay elasticities, the degree of asymmetry from negative short-term firm performance is very similar across the distribution. This makes asymmetric benchmarking an unlikely mechanism driving differences in conditional pay.

The results of this study show the importance of carefully implementing stock-based pay as an incentive, if its aim is to induce the executive to maximize long-term firm value.

The paper proceeds as follows. Section 2 discusses related literature and develops hypotheses. Section 3 describes the data. Section 4 discusses the empirical methodology and its application. Section 5 presents the results. Section 6 discusses the potential mechanisms and policy.

2 Related literature and hypotheses

I build an empirical model, including four main variables, to explain executive pay, the dependent variable. In a similar framework, Hallock et al. (2010) find distributional differences in performance–pay elasticities for CEOs, using conditional quantile regressions (Koenker and Bassett 1978), ranging from 0.07 at the first decile, to 0.15 at the 9th decile. Following Hallock et al. (2010), my empirical model allows for heterogeneous performance–pay relations across the conditional distribution of compensation.

Firms optimally pay executives to maximize long-term firm value, which captures all relevant outcomes of executives’ behavior, e.g., changes in growth or profit, and restructuring (Jensen 2001; Edmans et al. 2012, 2017b). Managers may not act in the best interest of the firm, and may instead aim to increase their own wages or reputation with short-term-oriented actions (Narayanan 1985). Bebchuk and Stole (1993) argue that asymmetric information between managers and shareholders can lead to sub-optimal investment. Examples of short-term behaviors are forgoing positive-NPV projects that sacrifice short-term performance, undertaking negative-NPV projects that boost short-term performance, M&A announcements, and stock repurchases with free cash (Edmans et al. 2017b, 2019). Bizjak et al. (1993) and Cadman et al. (2013) show that long-term equity is used more frequently in industries where short-term performance is an unreliable predictor of long-term performance.

The distributional model also tests for differences in short-term and long-term firm performance–pay elasticities. I include short-term firm value to model managerial actions that have a short-term effect on firm value (Narayanan 1985; Bebchuk and Stole 1993), which also captures luck and productivity changes. I argue that industry and macroeconomic controls capture most other factors influencing firm value over the business cycle. While Gopalan et al. (2014) directly measure the duration of executive pay with a weighted average of vesting periods for pay components, I complement this by assessing how different pay measures correlate with long-term and short-term stock performance. It is possible that higher short-run performance also increases the value of long-term pay, which is an unintended consequence of this kind of pay, if executives can cash in on short-run changes to their equity holdings once they have vested. Supporting this conjecture, Edmans et al. (2019) provide empirical evidence that vesting equity provides an incentive for executives to engage in behavior that sacrifices long-term firm value.

I test whether incentives for short-term and long-term firm performance differ across the distribution. If the following hypotheses are not rejected, this would suggest that short-termism is a greater problem when pay is (conditionally) greater. This would support the arguments above.

Hypothesis 1

The short-term firm performance–pay relation is positive and increases with the conditional pay quantile.

Hypothesis 2

The long-term firm performance–pay relation is positive and decreases with the conditional pay quantile.

In an optimal contract, an executive’s variable compensation is positively correlated with firm performance, and exogenous measures that are also correlated with firm performance are used to filter luck (Holmstrom 1979; Edmans et al. 2012; Edmans and Gabaix 2016). Other studies show mixed evidence on relative evaluation (Frydman and Jenter 2010), and that managers are rewarded positively for external forces affecting firm performance (Bertrand and Mullainathan 2001).

Here, I re-explore the hypothesis that managers in the lower-tail of the distribution are more likely to be benchmarked against the industry, and receive higher pay when economic conditions are good (Bizjak et al. 2008). If firms adjust pay upwards when the industry is performing better, holding own firm performance constant, this is often used to retain executives (Bizjak et al. 2008; Campbell and Thompson 2015). This can be explained in equilibrium by the manager’s outside option increasing if other firms link their manager’s pay to their firm performance. I include long-term and short-term average industry shareholder wealth as the two other variables of interest. I also control for macroeconomic indicators, which are other exogenous factors potentially correlated with firm performance.

I test whether benchmarking against long- and short-term industry performance is distributionally heterogeneous. Support for Hypotheses 3 and 4 would corroborate the findings of Bizjak et al. (2008).

Hypothesis 3

The short-term industry performance–pay relation is positive and decreases in the conditional pay quantile.

Hypothesis 4

The long-term industry performance–pay relation is positive and decreases in the conditional pay quantile.

Garvey and Milbourn (2006) estimate that for a CEO at the mean, the performance–pay relation is between 25 and \(45\%\) lower when a change in firm performance due to luck is negative than when it is positive. Addressing a potential mechanism, Bizjak et al. (2008) find that asymmetric benchmarking of yearly compensation is used to retain CEOs, and is not strongly associated with poor corporate governance. If a firm engages in such behavior, a CEO can threaten to leave. In a comprehensive study testing robustness of this asymmetry, Daniel et al. (2019) find no significant interaction between bad luck and the pay benchmark in the majority of specifications for US firms.

I re-explore this mechanism and test for distributional heterogeneity of the short-term firm performance–pay asymmetry, from the perspective of when pay is granted, using total yearly compensation. This leads to the fifth and sixth hypotheses:

Hypothesis 5

Total compensation is more sensitive to positive than to negative short-term changes in firm value.

Hypothesis 6

The reduction to the short-term firm performance–pay relation, when performance is negative, increases with the conditional total compensation quantile.

If these hypotheses are both supported, especially Hypothesis 6, then asymmetric performance benchmarking is one possible reason for higher conditional pay. If only Hypothesis 5 is supported, and there is no evidence for distributional heterogeneity, then asymmetric benchmarking is not a driver of managerial skimming.

3 Data

I use compensation data from an 11-year unbalanced panel of executives in the C-suite of publicly listed firms for 34 countries over the years 2003–2013, provided by BoardEx. Most observations come from the USA, UK, Western Europe, and Scandinavia. The unit of observation is the pay of an executive i, in a firm f, in an industry s, at year t. There are 143 executives who switch firms within the observation period in the final sample, which is the difference between the 6939 executive-firm matches and the 6796 executives in the data altogether.Footnote 4 Matching firm financial data is from the ORBIS database provided by Bureau van Dijk. Executives are included in the final sample if there is at least one non-missing observation of total compensation.

The main dependent variables of the study are total compensation and total wealth. Total compensation consists of salary and cash bonus plus the grant-date value of newly emitted equity-linked compensation (such as stock options, restricted stock awards), and long-term incentive plans (restricted bonuses) awarded each year, as used by Fernandes et al. (2013). Total compensation measures the grant-date opportunity cost to the shareholders of the executive’s pay package (Fernandes et al. 2013). Executives accumulate stock holdings and other equity-linked pay. Firm performance can affect executive utility through wealth to a larger degree than yearly total compensation (Frydman and Jenter 2010). Edmans and Gabaix (2016) show that incentives for executives are larger when using total wealth to proxy utility. I use total wealth as an outcome variable, which is the sum of the estimated market value of an executive’s cumulative holdings of stock-related pay, in-the-money options, and long-term incentive plans for an executive (Fernandes et al. 2013).Footnote 5 Note that options exercised and stock sold by an executive disappear from total wealth.

Firm value is a widely used proxy of firm performance and managerial effort (Jensen 2001; Edmans et al. 2017b). Firms’ market value of equity at year end is used to generate the main independent variables (Jensen and Murphy 1990; Bertrand and Mullainathan 2001).Footnote 6 I control for macroeconomic indicators, GDP per capita, GDP-growth in percent, provided by the World Bank and inflation as the percentage change in average consumer prices, provided by the International Monetary Fund. These serve to control for time-variant country-level heterogeneity, although there may still be residual variance not captured here. Controls for age and age squared of an executive are included, in line with Bertrand and Mullainathan (2001). Since the estimator accounts for executive-firm fixed effects and age, additionally including tenure would be collinear. Executive-firm fixed effects, discussed below, control for unobserved average and distributional differences in wealth or total compensation that are time-constant.

Although other studies control for firm size (Murphy 1999; Garvey and Milbourn 2006), growth potential using the market-to-book ratio (DeVaro et al. 2017), and leverage (DeVaro et al. 2017), I have chosen explicitly not to include these controls, as they can likely cause biased estimates of coefficients. Including control variables that are simultaneously determined with the outcome variable of interest by the independent variables leads to this bias (Angrist and Pischke 2008; Swanquist and Whited 2018).Footnote 7

3.1 Measuring short-term and long-term performance

Short-term and long-term firm and industry performance is identified using the band-pass filter (Christiano and Fitzgerald 2003), and is based on the theory of business cycles (Burns and Mitchell 1947). This separates the performance, proxied by market value of a firm f or industry s at time t into a trend component Trend, and a cyclical component, Shock, so that

$$\begin{aligned} Market\ Value_{ft}=Shock_{ft}+Trend_{ft}. \end{aligned}$$
(1)

This filter is also used by DeVaro et al. (2017) to identify the effects of firm shocks on executive compensation in a different setting. I apply it to the time series of the market value of each firm to generate the shocks of year-end market value for each firm, and to the time series of the mean year-end shareholder wealth of the industry.

As data are yearly, I separate stochastic cycles with periods of two to eight years from the trend. This filtering method is in accordance with Burns and Mitchell (1947), who define business cycles as stochastic cycles in business data between 1.5 and 8 years. The time period to measure short-term changes in firm value is also in line with compensation practices, where yearly bonuses are “short term,” and pay withheld for longer than one year, normally 3–5 years, is “long.” Long-term pay aims to remove such productivity cycles from compensation plans. Figure 1 shows the application of the filter to a firm, and an industry, from my sample. The figures show that it works to identify stochastic changes in performance, with mean zero. The shock and trend for each firm, \(Shock_{ft}\) and \(Trend_{ft}\), and each industry, \(Shock_{st}\) and \(Trend_{st}\), serve as the proxies for short-term and long-term firm and industry performance.
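As an illustration of the decomposition in Eq. (1), the following minimal sketch splits a yearly series into a cyclical and a trend component. It is not the Christiano and Fitzgerald (2003) approximation itself: it assumes an idealized frequency-domain band-pass applied to a linearly detrended series, and the function name `bandpass_decompose` and the example values are hypothetical. With yearly data the shortest observable period is two years, so keeping periods between two and eight years amounts to keeping all frequencies of at least 1/8 cycles per year.

```python
import numpy as np

def bandpass_decompose(values, max_period=8.0):
    """Split a yearly series into (shock, trend) so that shock + trend == values.

    Assumption: idealized FFT band-pass after removing a linear drift,
    used here as a stand-in for the Christiano-Fitzgerald filter.
    """
    y = np.asarray(values, dtype=float)
    n = y.size
    t = np.arange(n)
    # Remove a linear drift before moving to the frequency domain.
    slope, intercept = np.polyfit(t, y, 1)
    detrended = y - (intercept + slope * t)
    # Frequencies in cycles per year (sampling interval of one year).
    freqs = np.fft.rfftfreq(n, d=1.0)
    spectrum = np.fft.rfft(detrended)
    # Keep cycles with period <= max_period; this also drops the zero frequency.
    keep = freqs >= 1.0 / max_period
    shock = np.fft.irfft(np.where(keep, spectrum, 0.0), n=n)
    # Everything that is not cyclical, including the drift, is the trend.
    trend = y - shock
    return shock, trend

# Hypothetical year-end market values for one firm over 11 years.
market_value = np.array([100., 112., 105., 120., 131., 90., 98., 115., 128., 122., 140.])
shock, trend = bandpass_decompose(market_value)
```

Because the zero frequency is removed, the cyclical component has mean zero by construction, and the two components add up to the original series as in Eq. (1).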

As short-term performance is measured by yearly cycles of firm value, it is not driven entirely by factors exogenous to the firm, such as luck (DeVaro et al. 2017), but also captures factors endogenous to the firm, such as short-term-oriented behavior (Edmans et al. 2019), and yearly productivity changes. Productivity is likely to fall if managers pay more attention to public relations and to actions that focus on short-term stock price manipulation than to operations (Peng and Röell 2014). The shock variable measures “short-term performance,” and the trend component measures “long-term performance.”

Table 1 describes the main variables of interest, using all observations from the final sample. An advantage of the data set is that it tracks executives over a long time frame. The final sample covers 34 countries, which extends the generality of the findings to countries outside the USA, the central focus of the literature hitherto.

Fig. 1 Business cycle of the firm with BvD ID GB03194476, and of the banking sector, showing the time series of market value, and the cyclical and trend components from the band-pass filter, removing cycles from 2 to 8 years and accounting for drift

Table 1 Summary statistics

4 Empirical framework

4.1 Application of method of moments–quantile regression

I aim to tackle numerous endogeneity concerns in my empirical application. Past research shows that the level of executive pay can be driven by selection of more talented managers into contracts with higher pay, through assortative matching (Gabaix and Landier 2008; Tervio 2008). More talented managers likely cause better firm performance as well. Estimating the conditional variance of pay is potentially confounded by managers’ risk preferences. Confident managers are more likely to undertake risky investments with free cash (Malmendier and Tate 2005), close M&A deals (Malmendier and Tate 2008), and hold more own-company stock (Malmendier and Tate 2005). Thus, latent preferences and personality traits, such as risk tolerance and confidence, likely cause more volatile firm performance and pay, and even structurally different portfolios. I assume here that these preferences and traits are more or less time-constant (Cobb-Clark and Schurer 2012; Bernile et al. 2017; Schildberg-Hörisch 2018). These endogeneity concerns are addressed by the MM–QR estimator, outlined in this section, which was developed by Machado and Santos Silva (2019).

The MM–QR uses estimates of the conditional mean and the conditional scale function to estimate regression quantiles (Machado and Santos Silva 2019). This makes it computationally easy to estimate a model with a large number of individual-specific fixed effects in a quantile regression framework. I estimate around 6939 executive-firm fixed effects. I identify the response of the compensation \(Y_{ifst}\), of an executive i, in firm f, in industry s, at time t, to performance variables that vary at the firm level, or at the level of the industry to which the firm belongs. Firm and industry performance measures, and all control variables, are summarized for now by \(X_{fst}\) for firm-level variables and \(X_{st}\) for industry-level variables, written together as \(X_{(f)st}\). The response of compensation to performance, summarized by the coefficient vector \(\beta \), is allowed to depend on the position of pay in the conditional distribution, which is modeled by unobserved noise \(U_{ifst}\), distributed uniformly on the interval [0, 1] and clustered at the executive-firm level. The aim is to estimate

$$\begin{aligned} \text {log}\ Y_{ifst}=X_{(f)st}'\beta (U_{ifst}),\quad i=1,\ldots ,n. \end{aligned}$$
(2)

However, standard quantile regression methods do not deal with the panel structure of the data. This poses a problem for identification of \(\beta \), if there is time-constant unobserved heterogeneity at the individual level, affecting both firm performance and compensation. Accounting for this by including individual intercepts, \(\alpha _{ifs}\), gives the pay relation at the \(\tau \)’th quantile as

$$\begin{aligned} Q_{\text {log}\ Y_{ifst}} (\tau |X_{(f)st})=\alpha _{ifs}+X_{(f)st}'\beta q(\tau ), \end{aligned}$$
(3)

where \(q(\tau )=F^{-1}_U(\tau )\). However, including a large number of individual-specific intercepts in the quantile regression is computationally burdensome. Further, variance estimates of other covariates may be increasingly large in proportion to the number of fixed effects (Koenker 2004). This is especially problematic if the panel is short, since standard errors for individual effects will be large.Footnote 8

A second potential source of unobserved heterogeneity is in the conditional variance of pay. If the conditional variance of pay depends on time-constant unobserved factors, not accounting for these can bias estimates of the conditional variance, if they are correlated with independent and dependent variables. The method applied here accounts for unobserved differences in the conditional mean and the conditional variance of pay, using a location-scale model developed by Machado and Santos Silva (2019).

I assume in the analysis that the location and scale functions are known, to specify the empirical model of the relation between pay and performance with control variables as

$$\begin{aligned} \text {log}\ Y_{ifst} = \alpha _{ifs} + X'_{(f)st}\beta +\sigma (\delta _{ifs}+X'_{(f)st}\gamma )\varepsilon _{ifst} \end{aligned}$$
(4)

where \(\sigma \) is the scale function. I assume the scale function to be linear in covariates. Here, regressors may only affect the distribution of the response variable through known location and scale functions (Koenker and Bassett 1982). However, heteroskedasticity may not be linear, but can be multiplicative (Godfrey 1978; Koenker and Bassett 1982). In this case, the scale-shift at a quantile q is not linear in covariates but quadratic in covariates (Koenker 2005).Footnote 9 Thus, results should be taken with some caution, as they do not account for second or higher order moments of performance. Most studies of performance–pay do not include polynomials of performance. I do not include polynomials to be in line with the literature, and keep results comparable. I estimate

$$\begin{aligned} {\hat{Q}}_{\text {log}\ Y_{ifst} }(\tau |X_{(f)st}) = ({\hat{\alpha }}_{ifs}+{\hat{\delta }}_{ifs}{\hat{q}}(\tau )) + X'_{(f)st}({\hat{\beta }}+{\hat{\gamma }}{\hat{q}}(\tau )). \end{aligned}$$
(5)

The point estimate of the coefficient of interest l, at the \(\tau \)’th quantile is

$$\begin{aligned} {\hat{\beta }}_l(\tau ,X_{(f)st})={\hat{\beta }}_l+{\hat{q}}(\tau ){\hat{\gamma }}_l. \end{aligned}$$
(6)

The scale parameter \({\hat{\gamma }}\) estimates the distributional heterogeneity.

The main variables enter the estimations below in logarithmic form, but logarithmic notation is omitted here for brevity. The average estimated coefficients \({\hat{\beta }}\) in the MM–QR procedure are obtained by using OLS of time-demeaned independent and dependent variables, regressing \((Y_{ifst}-\frac{\sum _t Y_{ifst}}{T})\) on \((X_{(f)st}-\frac{\sum _t X_{(f)st}}{T})\). Then, the location shift, which is the standard fixed effect from a within regression, \({\hat{\alpha }}_{ifs}\), is predicted from the above estimation of \({\hat{\beta }}\), \({\hat{\alpha }}_{ifs}=\frac{1}{T}\sum _t(Y_{ifst}-X'_{(f)st}{\hat{\beta }})\). The residuals are \({\hat{R}}_{ifst}= Y_{ifst}-{\hat{\alpha }}_{ifs}-X'_{(f)st}{\hat{\beta }}\). The scale parameter, \({\hat{\gamma }}\), is estimated by regressing the time-demeaned absolute value of residuals \((|{\hat{R}}_{ifst}|-\frac{\sum _t|{\hat{R}}_{ifst}|}{T})\) on \(X_{(f)st}\).Footnote 10 The part of conditional variance that is time-constant and unobserved is estimated by \({\hat{\delta }}_{ifs}=\frac{1}{T}\sum _t(|{\hat{R}}_{ifst}|-X'_{(f)st}{\hat{\gamma }})\). The quantile \(q(\tau )\) is then estimated by

$$\begin{aligned} \min _{q}\sum _i\sum _t\rho _{\tau }\bigg ({\hat{R}}_{ifst}-\bigg ({\hat{\delta }}_{ifs}+X'_{(f)st}{\hat{\gamma }}\bigg )q\bigg ) \end{aligned}$$

to obtain estimates of quantiles \({\hat{q}}(\tau )\) in the data, where \(\rho \) is the check-function (Machado and Santos Silva 2019).
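The estimation steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used for the results: the function names (`group_means`, `check_loss`, `mmqr_sketch`) are hypothetical, the scale regression uses time-demeaned regressors (a within regression, which is an assumption about the step described above), and \({\hat{q}}(\tau )\) is found by brute-force evaluation of the check-function objective at its breakpoints, which is valid because the objective is convex and piecewise linear in q. No standard errors or bias correction are computed.

```python
import numpy as np

def group_means(a, g, n_groups):
    """Average the rows of a within each group index in g."""
    sums = np.zeros((n_groups,) + a.shape[1:])
    np.add.at(sums, g, a)
    counts = np.bincount(g, minlength=n_groups).astype(float)
    return sums / counts.reshape((-1,) + (1,) * (a.ndim - 1))

def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (u < 0))

def mmqr_sketch(y, X, ids, taus):
    """Return (beta, gamma, {tau: beta + q(tau) * gamma}) for panel data."""
    _, g = np.unique(ids, return_inverse=True)
    g = g.ravel()
    n_groups = g.max() + 1
    y_bar, X_bar = group_means(y, g, n_groups), group_means(X, g, n_groups)
    X_dm = X - X_bar[g]
    # Location: within (fixed effects) OLS gives beta, then alpha and residuals.
    beta = np.linalg.lstsq(X_dm, y - y_bar[g], rcond=None)[0]
    alpha = y_bar - X_bar @ beta
    resid = y - alpha[g] - X @ beta
    # Scale: within regression of |residuals| gives gamma, then delta.
    abs_r = np.abs(resid)
    abs_bar = group_means(abs_r, g, n_groups)
    gamma = np.linalg.lstsq(X_dm, abs_r - abs_bar[g], rcond=None)[0]
    delta = abs_bar - X_bar @ gamma
    scale = delta[g] + X @ gamma
    # Quantile: the objective is minimised at one of the breakpoints resid / scale.
    with np.errstate(divide="ignore", invalid="ignore"):
        candidates = resid / scale
    candidates = candidates[np.isfinite(candidates)]
    coefs = {}
    for tau in taus:
        losses = [check_loss(resid - scale * q, tau).sum() for q in candidates]
        q_tau = candidates[int(np.argmin(losses))]
        coefs[tau] = beta + q_tau * gamma
    return beta, gamma, coefs
```

The brute-force search is quadratic in the number of observations, so it only suits small illustrative samples; it is chosen here because it maps one-to-one onto the minimization problem stated above.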

As the estimation procedure above shows, parameter estimates \({\hat{\gamma }}\) are net of the executive-firm fixed effects \({\hat{\alpha }}_{ifs}+{\hat{q}}(\tau ){\hat{\delta }}_{ifs}\). Interpretations of point estimates at a quantile \({\hat{q}}(\tau )\) do not depend on time-constant individual characteristics, such as talent or risk preferences. This is an advantage of the estimation procedure that most other quantile regression methodologies do not offer, and it is simple to implement. Further, standard errors are clustered via bootstrapping, to account for serial correlation of compensation.

One potential problem of the empirical application is the fixed-T asymptotic bias of the estimated scale parameter \({\hat{\gamma }}\) and quantile \({\hat{q}}(\tau )\) when the number of individuals n relative to the panel length T, n/T, is large (Machado and Santos Silva 2019). This is because the MM–QR estimator assumes that asymptotically, the number of individuals is small compared to the panel length, or as \((n,T)\rightarrow \infty \), \(n=o(T)\). Parameters of average effects, however, remain consistent in short panels with large n. Machado and Santos Silva (2019) show in Theorem 4 that it is possible to remove the bias by using a jackknife. I therefore bias-correct point estimates in main results using the split-panel jackknife method (Dhaene and Jochmans 2015).Footnote 11

This method estimates two scale parameters of half panels, by splitting total executive-year observations N into odd and even years, \(N_{odd}\) and \(N_{even}\). The half-panels have the same (or very similar) number of individuals n as the full panel, but only half as many time periods, allowing us to identify the bias from having a small T by using the sample-size-weighted differences of full-panel and half-panel estimates. MM–QR estimations are run on both half-panels separately, and, assuming the amount of bias is proportionate to the number of observations N, the scale parameter from MM–QR, \({\hat{\gamma }}_{MM{-}QR}\), is corrected accordingly. The corrected scale parameter is \({\hat{\gamma }}_{JK}=2{\hat{\gamma }}_{MM{-}QR}-{\hat{\gamma }}_{odd}\frac{N_{odd}}{N}-{\hat{\gamma }}_{even}\frac{N_{even}}{N}\). The estimated quantiles, \({\hat{q}}(\tau )\), are bias-corrected analogously, as \({\hat{q}}_{JK}(\tau )=2{\hat{q}}_{\mathrm{MM{-}QR}}(\tau )-{\hat{q}}_{odd}(\tau )\frac{N_{odd}}{N}-{\hat{q}}_{even}(\tau )\frac{N_{even}}{N}\). Thus, if the scale parameter or quantile is over-estimated (under-estimated) when the panel becomes shorter, it is corrected downward (upward). The bias-corrected point estimates of the coefficient of interest l, at the \(\tau \)’th quantile are

$$\begin{aligned} {\hat{\beta }}_l^{JK}(\tau ,X_{(f)st})={\hat{\beta }}_l+{\hat{q}}_{JK}(\tau ){\hat{\gamma }}_{l,JK}. \end{aligned}$$
(7)
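The correction formulas above reduce to simple arithmetic once the full-panel and half-panel estimates are available. The sketch below assumes those estimates have already been computed; the function names and the numbers in the usage lines are hypothetical and serve only to show the direction of the correction.

```python
import numpy as np

def split_panel_jackknife(full, odd, even, n_odd, n_even):
    """Apply 2*full - odd*N_odd/N - even*N_even/N, with N = N_odd + N_even."""
    n = n_odd + n_even
    return (2.0 * np.asarray(full, dtype=float)
            - np.asarray(odd, dtype=float) * n_odd / n
            - np.asarray(even, dtype=float) * n_even / n)

def bias_corrected_coef(beta, gamma_jk, q_jk):
    """Bias-corrected coefficient at a quantile: beta + q_JK(tau) * gamma_JK."""
    return np.asarray(beta, dtype=float) + q_jk * np.asarray(gamma_jk, dtype=float)

# Hypothetical estimates: the half-panels overstate the scale parameter,
# so the corrected value lies below the full-panel estimate.
gamma_jk = split_panel_jackknife(full=1.0, odd=1.2, even=1.2, n_odd=500, n_even=500)
q_jk = split_panel_jackknife(full=-1.5, odd=-1.6, even=-1.6, n_odd=500, n_even=500)
coef = bias_corrected_coef(beta=0.04, gamma_jk=gamma_jk, q_jk=q_jk)
```

If the half-panel estimates coincide with the full-panel estimate, the correction leaves it unchanged, which is a quick sanity check on the weights.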

5 Results

5.1 Total compensation

To test Hypotheses 1–4, I estimate regression quantiles using the MM–QR. The dependent variable is the logarithm of 1 + total compensation, \(\text {log}\ Y_{ifst}\), for an executive i, in firm f, in industry s, at time t. Performance measures are the logarithm of 1 + transformed short-term firm and industry performance, \(\text {log}\ Shock_{ft}\) and \(\text {log}\ Shock_{st}\),Footnote 12 the logarithm of 1 + long-term firm and industry performance, \(\text {log}\ Trend_{ft}\) and \(\text {log}\ Trend_{st}\), macroeconomic controls \(Z_{ft}\) outlined in the Data section, and year indicators.Footnote 13

I estimate the performance–pay elasticity with a log-log model in the main specification. Here, one assumes that managerial actions affect firm value proportionately to firm size, and that resulting bonuses relate proportionately to firm value. This is especially realistic if pay is also equity-linked, which is the case here (Edmans et al. 2017b). For example, a corporate restructure will likely increase the %-performance of the firm. On the other hand, perquisites, such as buying a private jet, may only reduce $-performance. In the estimation, the fixed effect that is identified is an executive-firm pair fixed effect, as outlined above in the empirical methodology. If an executive switches firms, another fixed effect is estimated. This accounts for unobserved, time-constant heterogeneity of executive-firm matches in the average level, and conditional variance of pay. I estimate

$$\begin{aligned} {\hat{Q}}_{\text {log}\ Y_{ifst}}(\tau |\cdot )= & {} ({\hat{\alpha }}_{ifs}+{\hat{\delta }}_{ifs}{\hat{q}}(\tau ))+(\text {log}\ Shock_{ft} \nonumber \\&+\text {log}\ Shock_{st}+\text {log}\ Trend_{ft}+\text {log}\ Trend_{st}\nonumber \\&+Z'_{ft}+\psi _t)({\hat{\beta }}+{\hat{\gamma }}{\hat{q}}(\tau )) \end{aligned}$$
(8)

using an unbalanced panel, after winsorizing the sample at the 1st and 99th percentiles.Footnote 14 Summary statistics of the winsorized sample are shown in Table 5 of Appendix. Results of the same estimation for the unwinsorized sample are shown in Table 6 of Appendix. I deal with serial correlation of pay by clustering standard errors via bootstrap (Parente and Santos Silva 2016). Even if there is intra-cluster correlation, estimates of quantile regression are also consistent under certain conditions (Parente and Santos Silva 2016). I resample from the regression sample by firm-executive cluster, with 200 replications. Location, scale and point estimates of coefficients of interest at the 10th, 30th, 50th, 70th, and 90th conditional quantiles are reported in Panel A of Table 2. In panel B, I show results from the split-panel jackknife bias correction (Dhaene and Jochmans 2015).
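The cluster resampling described above can be sketched as follows. This is an assumption-laden illustration rather than the code behind Table 2: `cluster_bootstrap_se` and its `estimator` argument are hypothetical names, and any function mapping `(y, X, ids)` to a coefficient vector can be passed in. Whole executive-firm clusters are drawn with replacement, and a cluster drawn twice is relabelled so that it enters as two separate clusters.

```python
import numpy as np

def cluster_bootstrap_se(estimator, y, X, ids, n_reps=200, seed=0):
    """Bootstrap standard errors, resampling whole clusters with replacement."""
    rng = np.random.default_rng(seed)
    clusters = np.unique(ids)
    rows_by_cluster = {c: np.flatnonzero(ids == c) for c in clusters}
    draws = []
    for _ in range(n_reps):
        sampled = rng.choice(clusters, size=clusters.size, replace=True)
        rows = np.concatenate([rows_by_cluster[c] for c in sampled])
        # Relabel clusters so that repeated draws keep separate fixed effects.
        new_ids = np.concatenate(
            [np.full(rows_by_cluster[c].size, j) for j, c in enumerate(sampled)]
        )
        draws.append(estimator(y[rows], X[rows], new_ids))
    # Standard deviation of the estimates across bootstrap replications.
    return np.std(np.asarray(draws), axis=0, ddof=1)
```

Resampling clusters rather than single observations keeps each executive-firm time series intact, which is what allows the standard errors to reflect serial correlation of pay within a match.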

Table 2 Winsorized MM–QR of log total compensation on log of firm and industry performance measures

Hypothesis 1 asks if the short-term firm performance–pay relation is positive and increases with the conditional pay quantile. Testing Hypothesis 1 in Table 2, the point estimates of interest belong to the variable \(\text {Log}\ Shock_{f}\). Evidence from both the winsorized and unwinsorized sample (shown in Table 6 of Appendix) points in the same direction. Both location and scale parameters are estimated precisely in columns one and two. The location parameters reported in column 1 are from a standard fixed effects estimator. The location function shows that, on average, a 1% increase in short-term firm value is associated with 0.04% more total compensation. This elasticity is also quantitatively similar at the conditional median.
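As a worked reading of this log-log coefficient, and ignoring the 1 + shift inside the logarithms as an approximation, an elasticity of 0.04 maps proportional changes in short-term firm value into proportional changes in total compensation as follows (the 10% case is a hypothetical change added purely for illustration):

$$\begin{aligned} 1.01^{0.04}-1\approx 0.0004=0.04\%,\qquad 1.10^{0.04}-1\approx 0.0038=0.38\%. \end{aligned}$$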

Now turning to the heterogeneity of the short-term firm performance–pay relation, the scale parameter is negative, showing that conditionally higher paid managers have a lower short-term firm performance–pay sensitivity, which decreases from 0.07 to 0.02, from the 10th to the 90th conditional quantiles. In panel B, the bias-corrected results find larger decreases in the elasticity, ranging from 0.08 at the 10th percentile to 0 at the 90th percentile. This rejects Hypothesis 1 using total compensation as a measure of pay.

In the unwinsorized sample, in Table 6 of Appendix, short-term firm performance–pay sensitivities are larger overall, which is to be expected, since compensation data are very right-skewed. The distributional heterogeneity is also significant, is qualitatively the same, and quantitatively larger, which is in line with the results from the winsorized regressions.

Results in Tables O.1 and O.4 of Online Appendix show that distributional heterogeneity of the short-term performance pay elasticity is robust to a log-level model, for both winsorized and unwinsorized data. Shock values in the log-level model are not scaled by firm size, so the pay-performance sensitivity is a %–$ relation. The results imply that short-term performance–pay elasticity is smaller in the right tail of the conditional distribution. They are against Hypothesis 1, from the perspective of granting compensation, as total compensation is a flow measure. The findings are naturally dependent on this specific measure of pay.

Hypothesis 2 explores if the long-term firm-performance pay sensitivity is positive and decreases with the conditional pay quantile. The coefficient of interest is Log \(Trend_f\) in Table 2. The elasticity of total compensation to long-term firm performance is 0.21 at the median (column 5). Both location and scale parameters are estimated moderately precisely, and the positive scale parameter shows that predicted earnings respond more to long-term changes in performance at the right tail. The elasticity is about 23% higher at the 90th percentile than the 10th percentile, and the difference is statistically significant at the 10% level. This evidence rejects the premise of Hypothesis 2, in which the sensitivity of pay to performance is decreasing in the conditional total compensation quantile.

Regarding the robustness of results, the bias-corrected results in panel B are quantitatively similar. The unwinsorized regressions in Table 6 of Appendix reveal, however, no significant heterogeneity across the conditional distribution. The log-level specification in Tables O.1 and O.4 reveals significant heterogeneity in the same direction as the winsorized log-log specification. The long-term firm performance-total compensation relation is increasing in the conditional quantile in three out of the four tested specifications, and in no case does it go in the opposite direction. The evidence is in line with the notion that higher conditional total compensation is more strongly benchmarked against long-term firm value. It is also worth noting that short-term performance–pay elasticities are smaller than long-term elasticities across the distribution in the winsorized sample, but this reverses in the raw data.

These results speak somewhat against the interpretation that greater conditional total compensation results from short-term managerial actions or managerial skimming, when compensation is granted (Edmans et al. 2019). A potential explanation of the short-term performance–pay relation is that pay is associated with firms’ liquidity. For example, in the financial crisis, firms also cut bonuses of non-managerial employees, even though these employees do not affect overall firm performance to a large degree (Efing et al. 2018). Results are also in line with the story that boards take the long-term firm performance into account when granting compensation.

Hypothesis 3 questions whether the short-term industry performance–pay relation is positive and decreases in the quantile. I find no significant relation between industry short-term performance and total compensation in the winsorized results in Table 2. The estimate of the location parameter is close to zero. The results from unwinsorized regressions in Table 6 suggest that there is a negative relation between short-term industry performance and pay, but this could be driven by outliers in the data. The log-level specifications in Tables O.1 and O.4 of Online Appendix show similar results. Firms do not appear to use short-term industry benchmarking, but I cannot rule it out. This could be due to the measurement of industry shocks, which may not capture special groups of peers used for relative evaluation (Bizjak et al. 2008).

Hypothesis 4 asks if the long-term industry performance–pay relation is positive, and decreases in the conditional pay quantile. The coefficient of interest belongs to the industry trend in Table 2. The location and scale parameters are imprecisely estimated. This is in line with the findings of the literature, in which there is mixed evidence that firms use industry performance benchmarking and also select peers in special groups (Edmans et al. 2017b). I cannot entirely rule out long-term industry benchmarking, as estimates are noisy, and again the unwinsorized regressions suggest a negative correlation between long-term industry performance and compensation.

I next test Hypothesis 5, which asks whether short-term firm performance–pay sensitivity is the same for positive and negative short-term firm performance. I also test Hypothesis 6, which predicts that the size of the reduction to the short-term firm performance–pay relation, when performance is negative, increases with the conditional total compensation quantile. This specification allows for different performance–pay sensitivities for positive and negative short-term firm performance. I estimate this using a regression analogous to the one above, with an added interaction term with \({\mathbf {I}}\{S_{f}<0\}_{ft}\), indicating negative short-term firm performance in firm f, at year t, and run

$$\begin{aligned} {\hat{Q}}_{\text {log}\ Y_{ifst}}(\tau |\cdot )= & {} ({\hat{\alpha }}_{ifs}+{\hat{\delta }}_{ifs}{\hat{q}}(\tau ))+(\text {log}\ Shock_{ft} \nonumber \\&+{\mathbf {I}}\{S_{f}<0\}_{ft}\times \text {log}\ Shock_{ft} + \text {log}\ Shock_{st}\nonumber \\&+\text {log}\ Trend_{ft} +\text {log}\ Trend_{st}+Z'_{ft}+\psi _t)({\hat{\beta }}+{\hat{\gamma }}{\hat{q}}(\tau )). \end{aligned}$$
(9)
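A minimal sketch of how a specification of this form could be estimated follows. It assumes the stepwise MM–QR procedure described by Machado and Santos Silva (2019): a within regression for the location parameters, a within regression of absolute residuals for the scale parameters, and a weighted quantile of the standardized residuals for \({\hat{q}}(\tau )\). All function names are hypothetical, year effects and controls are passed as ordinary columns, no standard errors are computed, and this is not the code used for the tables.

```python
import numpy as np

def group_means(group, a):
    """Group mean of each column of the 2-D array a, repeated for every row."""
    _, inv = np.unique(group, return_inverse=True)
    counts = np.bincount(inv)
    cols = [np.bincount(inv, weights=a[:, j]) / counts for j in range(a.shape[1])]
    return np.column_stack(cols)[inv]

def within_ols(y, X, group):
    """Fixed-effects (within) slopes and the implied group intercepts."""
    Xd = X - group_means(group, X)
    yd = y - group_means(group, y[:, None])[:, 0]
    slopes = np.linalg.lstsq(Xd, yd, rcond=None)[0]
    intercepts = group_means(group, (y - X @ slopes)[:, None])[:, 0]
    return slopes, intercepts

def weighted_quantile(z, w, tau):
    """A minimizer of sum_i w_i * rho_tau(z_i - q) for positive weights w."""
    order = np.argsort(z)
    cum = np.cumsum(w[order])
    return z[order][np.searchsorted(cum, tau * cum[-1])]

def mmqr(y, X, group, taus):
    """Return location slopes, scale slopes and quantile-specific slopes."""
    beta, alpha = within_ols(y, X, group)               # location function
    resid = y - alpha - X @ beta
    gamma, delta = within_ols(np.abs(resid), X, group)  # scale function
    sigma = delta + X @ gamma
    if np.any(sigma <= 0):
        raise ValueError("this sketch assumes a positive fitted scale")
    z = resid / sigma                                   # standardized residuals
    coefs = {t: beta + gamma * weighted_quantile(z, sigma, t) for t in taus}
    return beta, gamma, coefs

def asymmetric_design(log_shock_f, negative, other):
    """Columns: firm shock, I{S_f < 0} x firm shock, then remaining regressors."""
    return np.column_stack([log_shock_f, negative * log_shock_f, other])
```

In this sketch, the second column of `asymmetric_design` carries the interaction, so its location and scale slopes measure how the short-term elasticity changes when short-term firm performance is negative.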

Turning to estimation results in Table 3, the location parameters for the coefficients of interest, Log \(Shock_f\), and the interaction with the indicator for negative short-term performance, are precisely estimated. The elasticity between total compensation and short-term firm performance, at the mean and median, is 0.05, in case performance is positive, but reduces to zero, in case firm performance is negative. The estimate of the scale function for the interaction term is positive for the bias-corrected point estimate, but very close to zero. Thus, executives are, quantitatively, equally well insured for bad performance across the conditional distribution, and the lower tail gains more from positive short-term performance.
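The elasticities for positive and negative short-term performance implied by Eq. (9) can be written out as follows. The subscripts \(S\) and \(S\times I\), and the symbols \({\hat{\varepsilon }}^{+}\) and \({\hat{\varepsilon }}^{-}\), are labels introduced here for illustration; they denote the coefficients on the firm shock and on its interaction with the negative-performance indicator.

$$\begin{aligned} {\hat{\varepsilon }}^{+}(\tau )&= {\hat{\beta }}_{S}+{\hat{\gamma }}_{S}\,{\hat{q}}(\tau ), \\ {\hat{\varepsilon }}^{-}(\tau )&= {\hat{\varepsilon }}^{+}(\tau )+{\hat{\beta }}_{S\times I}+{\hat{\gamma }}_{S\times I}\,{\hat{q}}(\tau ). \end{aligned}$$

With an elasticity of 0.05 for positive performance and roughly zero for negative performance at the median, the interaction terms sum to about \(-0.05\) there. A scale estimate \({\hat{\gamma }}_{S\times I}\) close to zero means that this gap is roughly constant across \(\tau \).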

This is significant evidence for an asymmetry in pay for positive and negative short-term firm performance, which supports Hypothesis 5. However, the asymmetry is quantitatively similar across the conditional distribution, rejecting Hypothesis 6. If anything, the asymmetry from negative shocks becomes smaller.

Regarding robustness of results, the unwinsorized results in Table 7 are similar quantitatively and qualitatively, although here the degree of asymmetry is much smaller. The elasticity at the median in case of positive short-term firm performance is 0.22 (not significant), which reduces by 0.028 to about 0.19, in case short-term firm performance is negative. This supports the interpretation that actions and events that affect firm performance proportionately correlate asymmetrically with compensation. The asymmetry is not driven by actions and events that only affect the dollar change in firm value to a small degree, as the interaction is not significant in any log-level models in Online Appendix.

A potential confound of the short-term performance–pay relation is executives leaving the firm, or being fired if they perform poorly. Campbell and Thompson (2015) find that asymmetry in performance–pay for good and bad firm performance is likely used as a retention device, as the asymmetry is stronger when labor market conditions are favorable for executives. This means the executive’s outside option is higher. Even if some managers are fired, we should still expect to observe asymmetry in short-term performance–pay sensitivity.

Table 3 Winsorized MM–QR of log total compensation on firm and industry performance measures with asymmetry

5.2 Executive wealth

The long-term development of accumulated pay is harder for the board to predict, since once pay is granted, it is harder to renegotiate and can be strategically influenced by the executive. Firms grant executives stock-related pay over a successive number of periods that eventually vest after at least three years for stock at the median or four years for stock options (Edmans et al. 2019). The stock-related pay holdings comove mechanically with the stock price, which is not the case for salary and cash bonus. This justifies a second test of the hypotheses using total wealth as the dependent variable as a proxy for executive utility. I test Hypotheses 1–4 again, using total wealth, the value of all accumulated equity-linked and deferred compensation, as the dependent variable and measure of pay. Results are shown in Table 4 and in Table 8 for unwinsorized regressions.

Hypothesis 1 asks again if the short-term firm performance-wealth relation is positive, and increases with the conditional wealth quantile. Testing Hypothesis 1, the coefficient of interest belongs to \(\text {Log}\ Shock_{f}\). Column 1 shows an average increase in executive wealth by 0.05% for an increase in short-term firm value of 1%. The scale parameter in column 2 is not statistically significant, especially the bias-corrected estimate in Panel B. The point estimate suggests that the short-term performance-wealth sensitivity is increasing in the distribution, but this is imprecisely estimated. The point estimates for the short-term firm performance–pay elasticity increase substantially from the 10th percentile (0.03) to the 90th percentile (0.07). This is an economically significant increase of over 100% in the short-term firm performance–pay elasticity. This result is in contrast to using total compensation as the measure of pay, and is in line with Hypothesis 1.
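The size of the change across the distribution follows directly from the two reported point estimates:

$$\begin{aligned} \frac{0.07-0.03}{0.03}\approx 1.33, \end{aligned}$$

that is, the elasticity at the 90th percentile is roughly 133% larger than at the 10th percentile.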

The results from log-level models in Tables O.3 and O.6 also support this finding. There is a significant and positive scale parameter estimated for \(\text {Log}\ Shock_{f}\) in the winsorized regression. In Table O.6, the scale parameter has the same sign, and the point estimates of \(\text {Log}\ Shock_{f}\) are positive and increase with the quantile.

An explanation for these findings is that stock-related pay in total wealth is very sensitive to the firm value. Further implications of this finding are also important and can be inferred from the literature. If stock-related pay is about to vest in a given month, executives have an incentive to announce M&A transactions or share buy-backs, which can send the stock price soaring (Edmans et al. 2019). A prime example cited by Edmans et al. (2019) was the Bazaarvoice acquisition of PowerReviews in June 2012, which saw executives cashing in US$90 million in stock after the stock went above $20. Executives reportedly knew that the aim of the M&A transaction was to eliminate the primary competitor from the market. An antitrust lawsuit followed and the stock price declined to $7. This anecdote shows that the value of long-term incentives reacts to short-term behavior, in line with my findings. Higher conditional wealth is more strongly associated with short-term performance. This is in line with the concern that short-term behavior and stock price manipulation are mechanisms managers use to increase their own pay.

Since equity also vests, and the sale of equity is endogenous to the short-term stock price, executive wealth may be negatively correlated with the short-term firm performance for some observations. If the short-term firm value is down, executives may prefer to keep equity that is vested. If there is a jump in short-term firm performance due to a stock buy-back or an announced M&A transaction, they will sell equity that has vested (Edmans et al. 2019). This can reduce the coefficient size. Another reason is that winsorizing the data also introduces measurement error, causing attenuation bias.

Testing Hypothesis 2 again regarding long-term firm performance, the coefficient of interest belongs to \(\text {Log}\ Trend_{f}\). In Panel A of Table 4, the elasticity at the median (and mean, shown by the location parameter) is 0.75. Total wealth increases by 0.75% for a 1% increase in long-term firm performance. The mean and median long-term firm performance–pay elasticity is much larger using executive wealth (0.75), than using total compensation (0.21) as the dependent variable. This finding is in line with the empirical and theoretical literature on executive compensation, which shows that elasticities are generally larger when using total wealth as the dependent variable or utility measure (Edmans et al. 2017b). Further, results show a decrease in the correlation between long-term firm performance and executive wealth in upper quantiles for log-log models. The elasticity decreases significantly using the bias-corrected estimates in Panel B by 0.14 from the 10th to the 90th percentile. Short-term pay drives the wealth of managers comparatively more in the right tail of the conditional distribution.

This result is not supported by log-level models in Tables O.3 and O.6, which test incentives for a dollar change in firm value. However, M&A transactions and share buybacks are more likely to affect firm value in a proportionate manner, rendering the log-log specification a more appropriate model. The results show that long-term incentives are present and sizable for executives across the distribution, despite the heterogeneity. The concern about short-term behavior is supported when looking at the results in Table 8, and considering that some executives sell vesting equity when firm value increases.

I test Hypothesis 3 in Tables 4 and 8 using total wealth. The location parameter for short-term industry performance is noisily estimated, but point estimates show a positive correlation between executive pay and industry shocks in the lower tail. There is also significant heterogeneity, with the elasticity decreasing from 0.09 at the 10th percentile to 0.035 at the median. This is in line with Bizjak et al. (2008), who show that benchmarking of executives with respect to industry performance is more likely for executives receiving below median pay, as they have a competitive outside option if the industry is performing well, holding firm performance constant. However, one must be careful comparing results directly, as estimates are of the conditional distribution.

Positive industry benchmarking in the short term is also supported by log-level models in Tables O.3 and O.6 of Online Appendix. One potential mechanism is that executives may receive more pay in the form of stock-related pay if the industry is performing better, with firm performance held constant. When testing Hypothesis 4, parameters are noisily estimated and if anything negative, but far from significant.

Table 4 Winsorized MM–QR of log total wealth on log of firm and industry performance measures

6 Discussion

This paper is concerned with distributional differences in the time horizon of executive compensation. The literature hitherto has identified short-termism as a problem, but not systematically shown whether higher (conditional) pay is associated with more short-term, or long-term-oriented incentives. The quantile regression framework with fixed effects used, MM–QR, allows a systematic analysis of this increasingly important question.

The findings show that performance pay is more short-term-oriented in the lower tail of the distribution when using total compensation. This is potentially driven by risk-sharing, as firms must reduce bonuses in bad times to maintain liquidity (Efing et al. 2018). By contrast, performance pay is more short-term oriented in the upper tail of the distribution when total wealth is used to measure pay. This could be because yearly compensation is easier for the board to measure and control. It can be easily adjusted by the firm to account for short-run and long-run performance in each period. Compensation committees can adjust bonuses and amount of options and performance stock granted, but accumulated long-term pay, mostly consisting of stock-related pay, cannot be easily adjusted once granted. Once awarded, there is more potential for executives to strategically increase their payout, which may not coincide with maximizing long-term business performance.

Stronger wealth sensitivity to short-term shocks is potentially driven by a ratchet effect of long-term incentive plans, found by Bebchuk et al. (2010). Some firms pay a fixed value of stock options or fixed value of performance stock, which means the number of stocks granted is adjusted according to the last month’s stock price. The calculation of the number of shares or options is usually based on averages of around 30 trading days, which is not enough to eliminate fluctuations that occur on a yearly basis. A low short-term firm value at the grant date would result in a larger number of shares being granted with fixed value, e.g., in the case of performance shares, and possibly also restricted stock. This gives greater leverage to the executive when firm value rises again. In the case of stock options, the same effect occurs if the share price is relatively low before allocation and therefore a lower strike price is set, or the executive receives a larger number of options. Supporting this mechanism, Yermack (1997) finds that stocks show negative abnormal returns before granting stock options, and positive abnormal returns after granting.
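The mechanics of a fixed-value grant can be illustrated with a short sketch. The grant value and prices are hypothetical numbers chosen for the arithmetic, not data from the sample, and the function names are introduced here for illustration.

```python
def shares_from_fixed_value(grant_value, reference_price):
    """Number of shares granted when the plan fixes the dollar value of the grant."""
    return grant_value / reference_price

def value_after_recovery(grant_value, reference_price, later_price):
    """Value of the granted shares once the stock trades at later_price."""
    return shares_from_fixed_value(grant_value, reference_price) * later_price

# Hypothetical grant of $1,000,000, valued when the stock trades at $50.
normal = value_after_recovery(1_000_000, 50.0, 50.0)
# Same grant value, but the reference price is temporarily depressed to $40.
depressed = value_after_recovery(1_000_000, 40.0, 50.0)
print(normal, depressed)  # 1000000.0 1250000.0
```

In this example, a temporary 20% dip in the reference price raises the number of shares from 20,000 to 25,000, so the holding is worth 25% more once the price returns to its earlier level.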

A potential confound of the findings in this paper is reverse causation. It is possible for higher wages to change performance; however, few studies have attempted to show a causal direction from pay to firm performance (Edmans et al. 2017b). Another problem is that contracts are not observed in my study, and are endogenous (Edmans and Gabaix 2016). Without knowing the contracts, the study cannot infer how much of the results is driven by strategic manipulation of stock prices and other short-term behaviors, and how much reflects an efficient market. Thus, the policy implications must be taken with caution. It is also possible under certain conditions for pay to be more strongly related to the short-term stock performance, when there is sufficient ambiguity about manipulation (Peng and Röell 2014).

Long-term compensation can only be amended ex-post if there are contractual means to restrict the payout to certain conditions, or regain already granted stock-related pay in certain cases. In the US, claw-back clauses that can recover pay in case of fraud are written in law, but these cannot prevent short-term behavior that is legal, such as investment choice.Footnote 15

Theory suggests multiple mechanisms to deal with the concerns about short-termism. An optimal contract with concerns about stock price manipulation after vesting relates current pay to all past performance dates of firm performance, conditions vesting on performance, and not on a fixed date, and can also shift some vesting into retirement (Edmans et al. 2012; Marinovic and Varas 2019). Lengthening the vesting period into retirement is costly for the executive near career end, as this exposes the executive to more risk.

A practical implementation of this would be to link the payout value of stock-related pay to the average price during the entire holding period. Since stock-related pay is often granted in rolling windows of three or four years, this would not cause a large disadvantage to executives who do increase long-term firm value, and would reduce the opportunity to game the incentive pay system. Since firms have limited liability, the stock would not be paid out in case of bankruptcy.
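A small sketch can contrast the two payout rules. The price paths and share count are hypothetical and serve only to illustrate the proposal; the function names are likewise introduced here.

```python
def payout_at_vesting_price(shares, prices):
    """Payout valued at the last observed price of the holding period."""
    return shares * prices[-1]

def payout_at_average_price(shares, prices):
    """Payout valued at the average price over the entire holding period."""
    return shares * sum(prices) / len(prices)

# Hypothetical yearly prices over a four-year holding period.
spike_at_vesting = [10.0, 10.0, 10.0, 20.0]  # price jumps just before vesting
steady_growth = [10.0, 12.0, 14.0, 16.0]     # value rises gradually

for path in (spike_at_vesting, steady_growth):
    print(payout_at_vesting_price(1000, path), payout_at_average_price(1000, path))
# 20000.0 12500.0
# 16000.0 13000.0
```

Under the vesting-price rule, the late spike pays more than steady growth in this example. Under the averaging rule the ranking reverses, which is the sense in which averaging reduces the gain from a short-lived price increase around the vesting date.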

There is always discretion involved in choosing which system to use to incentivize executives. In reality, this is determined by a market equilibrium, making it potentially difficult for a single firm to implement policy recommendations. The results, together with the growing body of evidence, cast some doubt on the effectiveness of long-term incentive plans in preventing managerial skimming.