1 Introduction

In studies explaining firm outcomes with governance variables, researchers commonly assume that these variables exhibit very little time variation. However, unless there is time variation in the data, there are limited ways to strengthen a claim of a causal effect, for example by using a within-firm estimator. We report power properties of the within-firm estimator for different degrees of time variation, lengths of time series and sampling frequencies of ownership and board independence. Our results help researchers determine whether they have sufficient power in a given empirical setting and how to increase it.

Typical research questions in finance and accounting consider how the controlling ownership stakes of different types of investors affect firm decisions (Banerjee and Homroy 2018; Larrain and Francisco 2013), how outcomes like cost of capital or performance depend on different degrees of conflict of interest among controlling stakeholders (Bertrand et al. 2002; Lin et al. 2011), or how board independence relates to performance or CEO turnover (Coles et al. 2008; Graham et al. 2020). Establishing a causal relationship between governance and these outcomes is very difficult because, among many other reasons, accounting for unquantifiable confounding factors is rarely possible. For example, a significant relationship between a governance variable and performance may not truly exist, but may appear as a result of unobservable firm characteristics (Lins 2003; Bennedsen and Nielsen 2010). In a survey of the board literature, Hermalin and Weisbach (2003) argue that most board research as of that time had failed to address the endogeneity in board composition. More recent survey and editorial articles continue to highlight the problem of reliably addressing endogeneity in governance research (Adams 2017; Edmans and Holderness 2017). The state-of-the-art best practice employs empirical designs based on natural experiments. However, they are rare, or, if present, the setting is generally imperfect and requires additional strengthening tests that are often based on time variation (Atanasov and Black 2016, 2020).

From an empirical design perspective, the problem of insufficient time variation in ownership is exacerbated by the limitations of vendor-constructed databases that do not provide all ownership links and require extensive pre-processing and verification (Holderness 2009; Dlugosz et al. 2006), and/or the necessity to hand collect data. Board related variables bring fewer data processing challenges, but they are only easily available for listed firms and similarly may change slowly over time (Black et al. 2017). To facilitate research centered on governance variables with low time variation and/or limited availability, our solution consists of methodological guidance based on simulations and verified using real data from a variety of sources (including a rare granular ownership data source). Our simulations generate artificial data series that satisfy a number of constraints to imitate the underlying economic processes behind governance variables. The simulation results provide guidance on the amount of time variation required for sufficient statistical power of within-firm estimators, depending on a variety of institutional features. Our findings are useful to researchers in at least three ways. First, when a project requires the hand collecting of governance data with low variation, by following our simulation results a researcher can efficiently collect non-consecutive data points to ensure sufficient time variation. For example, if a researcher has hand collected two time observations of ownership (board independence) data four years apart for a sample of firms of interest and less than 60% (55%) of them change between the two periods, she could decide whether it is worthwhile collecting one additional time observation to increase the power properties of her tests. This additional time period would reduce the required proportion of changing firms to around 42% (45%) to detect statistical significance if it was present. 
Second, suppose a researcher uses one of the widely available databases, for example board data from BoardEx or Execucomp, or any of the corporate ownership products of Bureau van Dijk (now part of Moody’s), Thomson Reuters (now Refinitiv) or Standard & Poor’s. She can use our results to check whether the amount of time variation in her sample of interest is sufficient. This is important because, if she finds no significant relationship, knowing whether this may be due to the lack of time variation helps her determine the choice of additional tests to employ in the project. Alternatively, if time variation is sufficient, this knowledge helps strengthen her conclusion that a relationship is unlikely to exist. Third, if a researcher decides not to use within-firm estimators and substantiates this decision based on the lack of time variation in the data, our benchmark results help add a degree of formality to this reasoning.

To show the usefulness of our simulation findings and as a proof of concept, we replicate regressions from two existing studies—Lin et al. (2013) and Coles et al. (2008). Lin et al. (2013) examine how the divergence between cash flow and control rights of the controlling owner affects the proportion of public debt a firm holds. Coles et al. (2008) analyze the relationship between board independence, complexity of expertise required by a firm’s board and Tobin’s Q. Both studies originally do not use a within-firm estimator for their baseline results, motivated by a lack of time variation. We show modifications to their design, guided by our simulation results, where time variation is sufficient, and the more reliable within-firm estimator is able to detect statistically significant relationships as a result of the improved power. For example, in the case of Lin et al. (2013) using firms from the same jurisdiction as in the original paper with ten consecutive time observations (whereas they have on average four per firm) allows the within-firm estimator to detect statistical significance. Importantly, if having ten consecutive time observations is prohibitive, we show that sampling five years apart and having four time observations would be sufficient to find statistically significant results. The conclusions change when time variation is especially low (for example, for US firms). In this case neither more time observations per firm, nor sampling with gaps allows the detection of significance even though it exists.

Lastly, we provide evidence of the different patterns of time variation in governance variables. We employ commonly used data sources (BoardEx, Capital IQ, Thomson Reuters, CSMAR) plus a unique database (ICO by Statistics Canada) that provides a long time series, complete corporate structures and exact direct ownership stakes for all global multinationals that have any subsidiary operating in Canada. We document that the degree of time variation in ownership and board independence differs substantially depending on jurisdiction of control, listed status, family ownership and ownership structure complexity. Therefore, the stability of governance variables cannot always be taken for granted and should be tested for the particular dataset in question, as we do here. For example, the time variation in board independence is greater for non-listed than listed firms and for Anglo-SaxonFootnote 1 than US firms. Time variation in ownership is indeed modest for multinational firms controlled from Anglo-Saxon jurisdictions, and it is likely the case for Continental Europe; however, we find a considerable time variation in the population of Canadian firms. We further find that there is greater time variation in the ownership of family versus non-family-controlled firms belonging to pyramid structures, regardless of their control jurisdiction. In addition, the prevalence of complex pyramidal corporate structures is greater among family firms. Although the distinct features of family owners have been studied extensively (e.g., Almeida and Wolfenzon 2006; Anderson and Reeb 2003; Belenzon et al. 2019), the fact that they tend to adjust the ownership structures of their firms more frequently is a new finding. Across size groupings, unsurprisingly, firms belonging to larger structures change more frequently than in smaller structures. This greater variability for family firms is preserved when grouping by (i.e., controlling for) size and jurisdiction.

We further document the following patterns of time variation: (1) the averages of ownership and board independence change little over time; (2) the average proportion of firms that change over two consecutive years is relatively small for ownership and larger for board independence of listed firms; (3) extended periods of stability are common for ownership and less so for board independence of listed firms, and it takes several years for a considerable proportion of firms to leave a period of stability;Footnote 2 (4) the standard deviation of ownership over time is greater for family firms and pyramids in univariate and multivariate analyses; and (5) ANOVA and persistence regressions confirm the differing patterns of stability by family control, jurisdiction and ownership structure complexity.

The importance of considering corporate ownership and board independence in empirical research cannot be overstated. Corporate ownership is central to our understanding of firm decisions like growth (Belenzon et al. 2019), innovation (Aghion et al. 2013), mergers and acquisitions (Basu et al. 2009; Boateng et al. 2017), internationalization (Singla et al. 2017) and financing (Lin et al. 2011, 2013). Ownership and board independence shape incentives, preferences, and short-term versus long-term orientation of managers and boards in listed firms (Siegel and Choudhury 2012; Banerjee and Homroy 2018; Ellis et al. 2020) and of owner-managers in family firms (Deephouse and Jaskiewicz 2013); tip the balance of power among managers, and majority and minority owners (Aguilera and Crespi-Cladera 2016; Guo and Masulis 2015); allow for unique capabilities that different owners and directors bring to their firms (Rabbiosi et al. 2019; Edmans and Holderness 2017; Kim and Starks 2015), etc. The importance of quantifying governance variables more precisely (including ownership and board independence) is underscored by the ever-rising use of Environmental, Social and Governance (ESG) considerations in investment decision-making. ESG considerations are already formally regulated in the UK and across Europe and are gradually gaining importance in the US (Eccles and Klimenko 2019; Mooney 2018).Footnote 3 In turn, the presence of sizable controlling stakes versus more diversified holdings of different types of institutional investors has implications for monitoring incentives, shareholder activism and adjustments in executive pay structures, which ultimately affect firm decision making (Azar et al. 2018; Flammer and Bansal 2017; Lardon et al. 2019).

The finance and accounting literature abounds with studies that exploit key strategic changes to answer questions about firm outcomes. These settings often involve ownership structure adjustments (M&As, reorganizations, vertical and horizontal integration, new product and market entry) and board composition changes (independence, minority representation, etc.). Our results open the way for time-variation-based methods of analysis where ownership or board independence is a key variable of interest. We help researchers understand the reason behind a finding of no significance of a within-firm estimator, highlight settings where time variation is likely higher and show researchers how to increase the power of within-firm estimators without necessarily having to collect a large number of time observations. This way we answer the call in several recent editorials and review articles highlighting the importance of addressing endogeneity if the quality of business research is to be improved (Semadeni et al. 2014; Reeb et al. 2012; Abdallah et al. 2015; Holmes et al. 2018).

The rest of the study is organized as follows: Sect. 2 outlines the background and justifies our simulation design; Sect. 3 describes our data sources and replication regressions; Sect. 4 consists of the detailed analysis of time variation in ownership and board independence variables; and Sect. 5 concludes.

2 Background and simulation design

Corporate ownership and board composition governance variables exhibit little time variation, which limits their potential for clean causal inference. For example, the earliest ownership studies constructed hand-collected samples which were typically static because data was not available in electronic form even for listed firms (La Porta et al. 1999; Claessens et al. 2000; Faccio and Lang 2002, among many).Footnote 4 More recently, electronic sources improved coverage and technological advancements have allowed construction of time series datasets, but they have rarely been exploited to address endogeneity. For example, Villalonga and Amit (2009, p. 3083) explain: “Because these variables exhibit very little time-series variation, we abstain from using firm fixed effects,” while Banalieva and Eddleston (2011, p. 1065) acknowledge that “the standard fixed effects estimation is infeasible”. More recently, Pronobis and Schaeuble (2020, footnote 21) write “A closer look to the dataset reveals that our foreign ownership variables are relatively sticky. That makes it difficult to get a sufficient number of observations for which we can identify changes in foreign ownership.” On this basis, they acknowledge the subsequent lack of power: “Therefore, the results provided by estimating our change model have only limited explanatory power”. Similar examples abound in the literature on board independence. Coles et al. (2008) argue that including firm fixed effects is not appropriate in their setting because most of the variation in board size arises in the cross-section instead of in the time series. Choi et al. (2007) report that their results on the value of outside directors disappear once they introduce firm fixed effects. Black and Kim (2012) highlight the advantages of their Korean data set, as this provides enough time variation in the variable “outside directors” to make within-estimation feasible. Wintoki et al. (2012, p. 591) acknowledge that “[b]oard structure is highly persistent [, which] can reduce the power of any panel data estimator”. More recently, Frye et al. (2021) refer to the stickiness of board structure and in all their specifications they only use industry fixed effects, rather than a within-firm estimator.

Faced with this lack of time variation, a few of the papers employ a random effects estimation which does not rely exclusively on the time dimension of the data. However, it is not appropriate to consider this estimator as an alternative to fixed effects because the underlying assumption of the random effects estimator is exogeneity.Footnote 5 Empirically, Black et al. (2014) perform extensive testing and reject the equivalence of the fixed effects and random effects estimators in the context of multi-country governance studies. The fixed effects and first difference estimators are the ones alleviating time invariant endogeneity.Footnote 6 As Reeb et al. (2012, p. 214) highlight: “Evidence of causal relation with unit level fixed effects can be quite compelling […].” Bliese et al. (2019, p. 9) go as far as to refer to the within-firm estimator as the “gold standard to which results from other analytic options are compared”.

There is growing scientific consensus that the most reliable causal inference methodologies are shock-based designs often referred to as quasi-natural experiments (Atanasov and Black 2016, 2020). These approaches often require at least two time points—before and after the shock—for a difference-in-difference (DiD) design (for example, Aguilera et al. 2017; Liu et al. 2015). Importantly, even when an instrument and/or a shock is available, the design may be imperfect and require additional strengthening measures to rule out alternative explanations (Atanasov and Black 2020). For example, one could include additional covariates and/or firm fixed effects in a DiD design to get closer to satisfying the parallel trends assumption. Furthermore, routinely required tests to support causal inference findings are placebo tests to show that a significant finding around the shock disappears around a different date. However, the absence of significance of the placebo test could be a mechanical consequence of the lack of time variation in the key explanatory variable and cannot be used as supporting evidence for the DiD findings.

Given the small amount of time variation in ownership and board independence and/or limited data availability, some researchers have opted to use data with gaps. For example, Franks et al. (2012) focus on family control at two points in time ten years apart, Basu et al. (2017) study blocks of any kind of owners five years apart, while Lin et al. (2011, 2012) construct ultimate ownership for four points in time two and three years apart. Wintoki et al. (2012) sample board independence every two years, while Boone et al. (2007) and Linck et al. (2008) sample every three years. There is no formal guidance in the literature on the gap length that ensures sufficient time variation in the variable of interest. Even in cases where electronic databases make uninterrupted time series readily available, extensive pre-processing and cleaning may preclude the researcher from examining every single consecutive observation. In addition, in the case of ownership, these series may not reflect the complete set of ownership links, and additional data-collection is likely to be necessary.Footnote 7

The first article to employ the fixed effects estimator in the study of managerial ownership and firm performance, using consecutive observations from a widely available database, found no significance (Himmelberg et al. 1999). However, one more recent extension (Kim and Lu 2011) and a comprehensive in-depth analysis (Fabisik et al. 2021), using a longer time period with greater time variation as a result of stock-based executive compensation, identified a significant relationship. Similarly, Graham et al. (2020) document that although there is large persistence in board independence, over longer horizons there are significant within-firm changes in board structure. This invites the question of the amount of time variation sufficient to identify a relationship, if it exists, as well as the number of observations necessary to detect it when data is not readily available, both of which we address in this work.

The time variation in most governance variables is shaped by the regulatory and institutional environment in which firms operate. In the case of ownership, the values naturally cluster at important threshold points: simple majority, supermajority and 100%. For board independence, different jurisdictions have gradually introduced legal minimums for listed firms. In addition, firm charters determine the frequency of replacing board members. There are also differences in terms of the range of values governance variables can take. For example, cash flow rights can be very close to 0, for firms held by multiple layers of subsidiaries, all the way to 1, for wholly owned firms. Board independence of listed firms is most likely to vary between 0.5 and 0.95. All these differences are paramount in both parts of our empirical design. In this section, we simulate separately the time variation in ownership and board independence as close as possible to their real-life characteristics, whereas in section four, we capture these differences by performing statistical analysis of real data from multiple angles (averages over time, cross-sectional and time-series standard deviation, year-to-year variation, persistence, etc.).

We begin by modelling the time series evolution of ownership that reflects its specific nature: being a particularly stable (“sticky”) variable, changing in a step-wise fashion and clustering in ranges with economic importance (for example, slightly above 50%). We employ transition probability matrices for this purpose.Footnote 8 A transition matrix \(T\) contains the probabilities \({p}_{rc}\) of moving from state \(r\) (along the rows) to state \(c\) (along the columns). The states are ownership ranges that reflect important cutoffs used for significant decisions in corporate charters, for example, absolute majority (50% + 1 vote) or supermajority at two thirds and three quarters. We pick the minimum number of states that capture the important cutoffs and the typical patterns of ownership stakes as summarized in Faccio and Lang (2002), Claessens et al. (2000) and Holderness (2009): \(own<0.05\); \(0.05\le own<0.5\); \(0.5\le own<0.51\); \(0.51\le own<0.6\); \(0.6\le own<0.75\); \(0.75\le own<0.9\); \(0.9\le own<1\); \(own=1\). For board independence, there are less likely to be common institutional cut-offs, because mandated minimums are jurisdiction-specific, not always codified, and often under a comply-or-explain regime. Thus, our simulations use data-driven quartiles (as in Graham et al. 2020), where the transition probabilities and bin cutoffs are determined by real-life global data in the BoardEx and CSMAR databases.Footnote 9
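As a minimal sketch of the eight ownership states listed above, the following Python snippet maps an ownership stake to its state. Only the cutoffs come from the text; the names `OWN_CUTOFFS` and `ownership_state` are hypothetical and are not taken from the study.

```python
from bisect import bisect_right

# Boundaries between the eight ownership states listed in the text.
# The last state (index 7) contains only wholly owned firms (own = 1).
OWN_CUTOFFS = [0.05, 0.5, 0.51, 0.6, 0.75, 0.9, 1.0]


def ownership_state(own):
    """Return the 0-based state index (0..7) for a stake in [0, 1]."""
    if not 0.0 <= own <= 1.0:
        raise ValueError("ownership stake must lie in [0, 1]")
    # bisect_right counts the cutoffs that are <= own, which is exactly
    # the index of the half-open interval [lower, upper) containing own.
    return bisect_right(OWN_CUTOFFS, own)


# One illustrative stake per state, from dispersed holdings to full ownership.
example_stakes = (0.02, 0.30, 0.505, 0.55, 0.70, 0.80, 0.95, 1.0)
states = [ownership_state(v) for v in example_stakes]
```

A stake of exactly 0.5 falls in the third state (index 2), consistent with the interval \(0.5\le own<0.51\).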

We impose a structure on the ownership transition matrix generated in each iteration of the simulations satisfying a minimum number of rules consistent with stylized facts we already know from existing work and the nature of ownership data: (1) rows sum up to 1 (general property of transition probabilities); (2) the diagonal elements are very high (reflecting stickiness); (3) probabilities along the two off-diagonals (superdiagonal and subdiagonal) are relatively higher (reflecting the greater chance of moving to an adjacent state; for example, the probability of moving from bin 3 (\(0.5\le own<0.51\)) to bins 2 or 4 is higher than to other bins); and (4) the probabilities along the last column are also relatively higher (reflecting a tendency towards wholly owned subsidiaries (Nicodano and Regis 2019)).Footnote 10 This structure would hold for any one of the three controlling ownership measures used in the literature: ultimate cash flow rights, control rights or the ratio between the two (often referred to as wedge; for example in Faccio et al. 2011 and Lin et al. 2011, 2012 and 2013). It is likely that the ultimate cash flow rights variable exhibits greater time variation than control rights, because any rearrangement of the corporate structure will affect it, while control rights are based on a threshold of control and will only change if that threshold is reached. In this case, the wedge variable will co-vary with ultimate cash flow rights. Of course, there may be cases where both ultimate cash flow rights and control rights change to the same degree, whereby the wedge may appear more stable. To accommodate research applications with any of these ownership measures and a variety of settings from different jurisdictions where ownership may vary to a different degree over time, we define two degrees of stringency of the regularity constraints (2)–(4) above: one for a case of relatively higher time variation and another for lower time variation.
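The four regularity constraints can be illustrated with a short Python sketch. It is not the authors' procedure: the study draws a random matrix in each iteration, whereas this sketch builds one deterministic matrix, and the parameter values (`stay=0.90`, `adjacent_weight=4.0`, `last_col_weight=3.0`) as well as the function names are assumptions chosen only for illustration.

```python
import random


def make_transition_matrix(n_states=8, stay=0.90,
                           adjacent_weight=4.0, last_col_weight=3.0):
    """Row-stochastic matrix with a heavy diagonal (rule 2), heavier
    adjacent states (rule 3) and a heavier last column (rule 4)."""
    matrix = []
    for r in range(n_states):
        weights = []
        for c in range(n_states):
            if c == r:
                weights.append(0.0)      # diagonal is set separately below
                continue
            w = 1.0                      # baseline weight for distant states
            if abs(c - r) == 1:
                w = max(w, adjacent_weight)
            if c == n_states - 1:
                w = max(w, last_col_weight)
            weights.append(w)
        total = sum(weights)
        # Spread the probability of leaving the state over the other states.
        row = [(1.0 - stay) * w / total for w in weights]
        row[r] = stay                    # rule 1: the row now sums to one
        matrix.append(row)
    return matrix


def simulate_path(matrix, start_state, n_periods, rng):
    """Simulate one firm's sequence of states over n_periods."""
    states = list(range(len(matrix)))
    path = [start_state]
    for _ in range(n_periods - 1):
        current = path[-1]
        path.append(rng.choices(states, weights=matrix[current])[0])
    return path


T = make_transition_matrix()
path = simulate_path(T, start_state=2, n_periods=10, rng=random.Random(42))
```

Lowering `stay` gives a higher time variation regime, and raising it gives a lower one, which is one simple way to mimic the two degrees of stringency described above.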

For board independence we define the data-driven quartile bins as follows: \(board\; indep<0.5\); \(0.5\le board\; indep<0.65\); \(0.65\le board\; indep<0.8\); and \(0.8\le board\; indep<0.95\). These cutoffs are based on the actual distribution of board independence in the BoardEx database for global listed and non-listed firms and in the CSMAR database for Chinese listed firms. Transition probabilities and bin widths by jurisdiction, listed status and state ownership are given in the online appendix. In the context of board independence, time variation will depend on the regulatory and institutional setting in different countries with respect to listed, private and state-owned firms. For example, the China Securities Regulatory Commission (CSRC) mandates that as of June 30, 2003, a minimum of one third of all board directors of a listed firm should be independent. By contrast, among Anglo-Saxon countries and most of Europe, listing rules require that more than half of directors be independent (Papadopoulos 2019). To reflect this variety in the legal range of levels board independence can assume and thereby its scope for change, we again adopt a low and high time variation regime for the simulations of board independence.

The simulation analysis follows the standard power calculation framework outlined in Murphy et al. (2014). We generate artificial data to assess the power of hypothesis tests in a firm-fixed effects specification when using different time-series lengths and frequency of sampling, under the absence or presence of time-variant endogeneity. Our strategy is to generate pseudo-random samples by a process following a theoretical relationship of interest. Our focus is determining whether tests of statistical significance of the coefficient of interest are able to detect this relationship (by rejecting the null hypothesis). Intuitively, since our artificial data was generated under the alternative hypothesis, we should reject the null hypothesis most of the time. Conventionally, a test with a power of 80% or above is considered adequate (see Murphy et al. 2014).

More specifically, the true model underlying our generated data takes the form:

$$ y_{it} = \alpha_{i} + \gamma_{t} + \beta X_{it} + \lambda W_{it} + \varepsilon_{it} \tag{1} $$

where \(\beta\) is the primary coefficient to be estimated,

\(X_{it}\) is artificially generated ownership/board independence data based on a random transition probability matrix \(T\) following the minimal structure described above,

\({W}_{it}\) is artificially generated data for a generic control variable (like size, leverage, tangibility, etc.) that is correlated directly with \(X\) and with \(y\) via the error term \(\varepsilon \), where \({W}_{it}=\pi {X}_{it}+{\varepsilon }_{it}+{\xi }_{it}\). We choose relatively high levels for \(\pi \) and \(\lambda \) of 0.9 to generate a conservative case of high time-varying endogeneity.Footnote 11 In the set of results under time-varying exogeneity, \(\pi =0\) and \(\lambda =0\).

\({y}_{it}\) is a hypothetical outcome variable (e.g., firm value, operating performance, public–private debt ratio, foreign direct investment, export intensity, innovation, risk-taking, employee turnover, etc.),

\({\alpha }_{i}\) are firm-specific effects generated as \({\alpha }_{i}\sim IIDN\left(0,1\right)\),

\({\gamma }_{t}\) are time-specific effects,

\({\varepsilon }_{it}\) and \({\xi }_{it}\) are Gauss-Markov disturbances generated as \({\varepsilon }_{it}\sim IIDN\left(0,1\right)\) and \({\xi }_{it}\sim IIDN\left(0,1\right)\).
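The data-generating process defined by (1) and the expression for \({W}_{it}\) can be sketched in Python as follows. Several elements are assumptions made only to keep the example self-contained. First, \(X\) follows a simple sticky process (a redraw from a uniform distribution with probability `p_change`) instead of the transition-matrix process used in the study. Second, the time effects are drawn from a standard normal distribution, which the text does not specify. Third, the function name `generate_panel` is hypothetical.

```python
import random


def generate_panel(n_firms, n_years, beta, pi=0.9, lam=0.9,
                   p_change=0.1, seed=0):
    """Return rows (firm, year, y, x, w) generated from model (1).

    Setting pi=0 and lam=0 corresponds to the time-varying exogeneity case.
    """
    rng = random.Random(seed)
    # Time effects gamma_t; their distribution is an assumption of this sketch.
    gamma = [rng.gauss(0, 1) for _ in range(n_years)]
    rows = []
    for i in range(n_firms):
        alpha = rng.gauss(0, 1)            # firm effect alpha_i ~ N(0, 1)
        x = rng.random()                   # initial level of the sticky X
        for t in range(n_years):
            if t > 0 and rng.random() < p_change:
                x = rng.random()           # step-wise change in X
            eps = rng.gauss(0, 1)          # disturbance of the outcome
            xi = rng.gauss(0, 1)           # disturbance of the control
            w = pi * x + eps + xi          # W shares eps with y
            y = alpha + gamma[t] + beta * x + lam * w + eps
            rows.append((i, t, y, x, w))
    return rows


panel = generate_panel(n_firms=5, n_years=4, beta=0.6)
```

Because \(W\) contains \({\varepsilon }_{it}\), it is correlated with the error of the outcome equation whenever it enters the model.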

Under time-varying endogeneity we also allow the explanatory variable to be related to \({\varepsilon }_{it}\) through the transition probabilities \(p\left({X}_{rcit}|\psi ,{f}_{t}\right)\). We follow Bazzi et al. (2017). The parameter vector \(\psi \) contains all static parameters that govern the transition probabilities, while \({f}_{t}\) captures the dynamic elements that depend on \({\varepsilon }_{it}\). Making \(X\) and \(W\) dependent on the disturbance term of the outcome and correlated with each other is realistic and creates correlation with the error, which violates the zero conditional mean assumption and leads to biased estimates.Footnote 12

We simulate data under (1) using three different values for beta, \(\beta =0.4, 0.6, 0.8\), for two transition matrices reflecting high and low time variation in \(X\). We then estimate regression model (1) on a subsample of the generated data using different gaps and lengths of time. The null and alternative hypotheses are:

$$ H_{0}:\beta = 0 $$
$$ H_{a}:\beta \ne 0 $$
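To make explicit why low time variation reduces the power of this test, consider a simplified version of model (1) with a single regressor, no time effects and homoskedastic, serially uncorrelated errors with variance \(\sigma_{\varepsilon }^{2}\). These simplifying assumptions are made here for illustration only and are not the specification used in the simulations. Demeaning by firm removes \({\alpha }_{i}\), and the variance of the resulting within-firm estimator, conditional on \(X\), is inversely proportional to the within-firm variation in \(X\):

$$ y_{it} - \bar{y}_{i} = \beta \left( X_{it} - \bar{X}_{i} \right) + \left( \varepsilon_{it} - \bar{\varepsilon }_{i} \right) $$
$$ \mathrm{Var}\left( \hat{\beta } \mid X \right) = \frac{\sigma_{\varepsilon }^{2}}{\sum_{i} \sum_{t} \left( X_{it} - \bar{X}_{i} \right)^{2}} $$

where a bar denotes the firm-level average over time. When \(X\) rarely changes within firms, the denominator is small, the standard error is large, and the t-statistic tends to fall short of the critical value even when \(\beta \ne 0\).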

Table 1 reports the proportion of times out of 1000 iterations that the null hypothesis is (correctly) rejected. If we find a rejection rate for hypothesis H0 of at least 80%, the amount of time variation is sufficient for statistical power purposes (shown in bold). In Panel A of Table 1 we simulate data that mimics a situation in which the researcher has available consecutive data points for 10, 15 and 20 years shown along the x-axis. The different values of the theoretical beta coefficient in (1), i.e., \(\beta \) = 0.4, 0.6 and 0.8, are given along the y-axis. The reported results are based on a conservatively small sample of 500 firms, given the limitations of hand-collecting data. The first and third sections summarize the results where strict exogeneity holds; i.e., \(cov\left({X}_{it},{\varepsilon }_{it}\right)=0\) in (1), and the second and fourth sections summarize the results where ownership is endogenous; i.e., \(cov\left({X}_{it},{\varepsilon }_{it}\right)\ne 0\) in (1) and \(cov\left({W}_{it},{\varepsilon }_{it}\right)\ne 0\). Results for the low (high) time variation case are at the left (right) of each table.
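The rejection-rate calculation can be sketched in Python for the exogenous case with a single regressor. This is a simplified illustration, not the authors' code. The sticky process for \(X\) (controlled by `p_change`), the conventional non-clustered standard errors, the normal critical value of 1.96 and all function names are assumptions.

```python
import math
import random


def simulate_panel(n_firms, n_years, beta, p_change, rng):
    """Exogenous panel: y = alpha_i + gamma_t + beta * x + eps, sticky x."""
    gamma = [rng.gauss(0, 1) for _ in range(n_years)]
    xs, ys = [], []
    for _ in range(n_firms):
        alpha, x = rng.gauss(0, 1), rng.random()
        x_row, y_row = [], []
        for t in range(n_years):
            if t > 0 and rng.random() < p_change:
                x = rng.random()           # step-wise change to a new level
            x_row.append(x)
            y_row.append(alpha + gamma[t] + beta * x + rng.gauss(0, 1))
        xs.append(x_row)
        ys.append(y_row)
    return xs, ys


def two_way_demean(m):
    """Remove firm and time means from a balanced firm-by-year matrix."""
    n, t = len(m), len(m[0])
    row_mean = [sum(r) / t for r in m]
    col_mean = [sum(m[i][j] for i in range(n)) / n for j in range(t)]
    grand = sum(row_mean) / n
    return [[m[i][j] - row_mean[i] - col_mean[j] + grand for j in range(t)]
            for i in range(n)]


def within_t_stat(xs, ys):
    """t-statistic of beta from the two-way within (fixed effects) estimator."""
    xd, yd = two_way_demean(xs), two_way_demean(ys)
    n, t = len(xs), len(xs[0])
    sxx = sum(v * v for r in xd for v in r)
    if sxx < 1e-12:
        return 0.0                         # no within variation: not identified
    sxy = sum(a * b for rx, ry in zip(xd, yd) for a, b in zip(rx, ry))
    b_hat = sxy / sxx
    ssr = sum((yv - b_hat * xv) ** 2
              for rx, ry in zip(xd, yd) for xv, yv in zip(rx, ry))
    dof = (n - 1) * (t - 1) - 1            # observations minus effects and slope
    return b_hat / math.sqrt(ssr / dof / sxx)


def power(n_firms, n_years, beta, p_change, n_iter=200, crit=1.96, seed=0):
    """Share of iterations in which H0: beta = 0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_iter):
        xs, ys = simulate_panel(n_firms, n_years, beta, p_change, rng)
        if abs(within_t_stat(xs, ys)) > crit:
            rejections += 1
    return rejections / n_iter


high_variation = power(100, 5, beta=0.8, p_change=0.30, n_iter=100)
low_variation = power(100, 5, beta=0.8, p_change=0.02, n_iter=100)
```

Comparing `high_variation` and `low_variation` against the 80% threshold mirrors, in a stylized way, the comparison between the right-hand and left-hand sides of the table.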

Table 1 Simulation results

We find that for the simulations exploiting consecutive time observations (10, 15 and 20), whether based on relatively high or low time variation, if there was a true relationship between \(X\) and \(y\), it should be uncovered by hypothesis tests under time-varying exogeneity of \(X\) (first and third sections of Panel A Table 1). However, under time-varying endogeneity (second and fourth sections of Panel A Table 1) a weak theoretical relationship (\(\beta \) = 0.4) cannot be detected for any time series length. In the case of \(\beta \) = 0.6 and low time variation, power is insufficient for ownership, but for board independence, 20 years of data overcomes this.Footnote 13

Since in many cases collecting ten consecutive years of data may not be practical, we repeat our simulations using different time-series lengths and frequency of sampling. In Panels B and C we consider data with gaps—every 2–5 years—(along the y-axis) for two, three or four time observations (along the x-axis). For example, the coordinate (two time obs, every three years) is consistent with collecting data in 2010 and 2013. The different values for the beta coefficient in Eq. (1) are represented vertically. The first section of Panels B and C shows results where strict exogeneity holds, whereas the second section shows results for the presence of time-varying endogeneity.
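The sampling schemes with gaps can be written down as a small Python helper. The function names `sample_years` and `subsample` are hypothetical; the example reproduces the coordinate (two time obs, every three years) from the text.

```python
def sample_years(first_year, n_obs, gap):
    """Years retained when keeping n_obs observations spaced gap years apart."""
    if n_obs < 1 or gap < 1:
        raise ValueError("n_obs and gap must be positive integers")
    return [first_year + k * gap for k in range(n_obs)]


def subsample(series_by_year, first_year, n_obs, gap):
    """Keep only the sampled years from a {year: value} mapping."""
    return {year: series_by_year[year]
            for year in sample_years(first_year, n_obs, gap)}


# Two time observations collected every three years, starting in 2010.
years = sample_years(2010, n_obs=2, gap=3)

# A hypothetical ownership series for one firm, observed annually.
ownership = {2010: 0.51, 2011: 0.51, 2012: 0.51, 2013: 0.60}
sampled = subsample(ownership, 2010, n_obs=2, gap=3)
```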

In Panel B (ownership), for the high time variation case (right-hand side), we find that the 80% power threshold is reached for a theoretical \(\beta \) = 0.8 and any number of time points and gaps in sampling, while for \(\beta \) = 0.4, two or three time observations less than four years apart are no longer sufficient. The results are somewhat stronger in the right-hand side of Panel C (board independence), with additional cases of sufficient power for two time observations every three years under \(\beta \) = 0.6, as well as three time observations every three years under \(\beta \) = 0.4. For low time variation (left-hand side), in Panel B (ownership), power is only sufficient for \(\beta \) = 0.8, with four time observations sampled three years apart or more. In the left-hand side of Panel C (board independence), power is a lot higher—being sufficient in all cases under \(\beta \) = 0.8 and for three and four time observations under \(\beta \) = 0.6 and even under \(\beta \) = 0.4 for four time observations every three years or more. The results in Panel B show that collecting ownership data with gaps does not generally lead to good power properties when time variation is low, except when the theoretical relationship is strong and there are at least four time observations. By contrast, in the case of board independence (Panel C), even for weaker theoretical strength under relatively low time variation, sampling every three years or more and having at least four time observations provides sufficient power.

In the right-hand sections of Panels B and C we present results under time-varying endogeneity. We find that endogeneity leads to worse power properties than under exogeneity for both the low and high time variation cases. For ownership (Panel B), power is only sufficient under \(\beta \) = 0.8, high time variation and three or more time observations, while for board independence (Panel C), we see cases of sufficient power for low time variation and \(\beta \) = 0.6. We caution that the power measure is not as informative in the presence of endogeneity since the t-statistic at its basis has a biased numerator.

To sum up, if a relationship between \(X\) and \(y\) exists, consecutive data of ownership or board independence covering at least ten years should detect it when using a within-firm estimator under exogeneity. Under time-varying endogeneity, however, a weaker theoretical relationship cannot be detected even with 20 years of data and large time variation. When data are collected with gaps and time-varying endogeneity is present, hypothesis tests are more likely to detect a relationship with ownership if there are at least three time observations, the theoretical relationship is relatively strong and time variation is high. For board independence, detection is more likely with at least three time observations even for a less strong theoretical relationship and even when time variation is low.
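The logic of such a power calculation can be illustrated with a small Monte Carlo sketch. It is not the simulation design behind Table 1: the data-generating process (a sticky regressor that is redrawn with probability `p_change` each period, a normally distributed firm effect, strictly exogenous noise) and all names and default parameters are assumptions chosen for illustration only.

```python
import math
import random


def within_t_stat(x, y):
    """t-statistic of the slope after demeaning x and y within each firm.

    x and y are lists of per-firm lists (firms by time observations).
    Returns None when x has essentially no within-firm variation.
    """
    xd, yd = [], []
    for xi, yi in zip(x, y):
        mx, my = sum(xi) / len(xi), sum(yi) / len(yi)
        xd += [v - mx for v in xi]
        yd += [v - my for v in yi]
    sxx = sum(v * v for v in xd)
    if sxx < 1e-12:
        return None
    beta_hat = sum(a * b for a, b in zip(xd, yd)) / sxx
    rss = sum((b - beta_hat * a) ** 2 for a, b in zip(xd, yd))
    dof = len(xd) - len(x) - 1  # observations minus firm effects minus slope
    return beta_hat / math.sqrt(rss / dof / sxx)


def simulated_power(beta, p_change, n_firms=200, n_obs=4, n_sims=200, seed=0):
    """Share of simulated samples in which H0: beta = 0 is rejected at 5%."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x, y = [], []
        for _ in range(n_firms):
            alpha = rng.gauss(0.0, 1.0)   # time-invariant firm effect
            level = rng.random()          # sticky regressor, e.g. ownership
            xi = []
            for t in range(n_obs):
                if t > 0 and rng.random() < p_change:
                    level = rng.random()  # the firm changes in this period
                xi.append(level)
            x.append(xi)
            y.append([alpha + beta * v + rng.gauss(0.0, 1.0) for v in xi])
        t_stat = within_t_stat(x, y)
        if t_stat is not None and abs(t_stat) > 1.96:
            rejections += 1
    return rejections / n_sims


print(simulated_power(beta=0.8, p_change=0.05))  # low time variation
print(simulated_power(beta=0.8, p_change=0.50))  # high time variation
```

In this stylized setting, raising the share of firms that change between observations raises the rejection rate for a given true coefficient, which is the mechanism the panels document.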

In light of our simulation results, we can revisit some existing work where time variation can be deduced. Donelli et al. (2013) report explicitly an average proportion of 6–7% of firms changing each year over a 20-year period for their Chilean data. This degree of time variation clearly corresponds to our low variation case (lower left corner of section two of Panel A Table 1). In their Table 8 they report various outcome regressions, whereby ownership changes are significant only when firm fixed effects are not included. There could be two possible explanations: (1) either time-invariant endogeneity leads to bias in the OLS estimator and erroneously shows significance, or (2) the relationship indeed exists but the low time variation in their data prevents them from detecting it with a within-firm estimator. Our simulation results show lack of power for the same 20-year period, with theoretical relationship strength below \(\beta \) = 0.8 and low time variation, as is the case in their data, and therefore provide support for explanation 2. Thus, the conclusion of Donelli et al. (2013) regarding a weak theoretical relationship between ownership and real outcomes is supported by our work.

In Panels A (ownership) and C (board independence) of Fig. 1 we summarize the simulation results under time-varying endogeneity and sampling data with gaps to produce a relationship between a measure of time variation (the proportion of firms changing between one time point and the next) and statistical power. These graphs show the predicted marginal probabilities of rejecting H0 based on a logit specification using the data from all simulations.Footnote 14 The logit model regresses power on frequency of sampling, the proportion of firms that change between the first and next period, true beta, number of time observations, and starting year of sampling.Footnote 15 We show statistical power (along the y axis) as a function of the proportion of firms that change between two time periods (along the x axis) for three different strengths of the theoretical relationship being tested, sampling gaps and number of time observations. Suppose a researcher has collected two time observations five years apart for a sample of firms, as in Basu et al. (2017). She can compute the proportion of firms that change between the two points in time; then she can compare it to the required 63% (for \(\beta \) = 0.8) or 86% (for \(\beta \) = 0.6) for sufficient statistical power of at least 80% (bottom row of Panel A of Fig. 1). If she observes lower proportions, she could decide to collect an extra time observation, which reduces the required range to 40% and 60% respectively.
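The check described for the researcher with two time observations can be sketched as follows. The firm identifiers and ownership values are hypothetical, and the function name `proportion_changing` is an assumption; the thresholds are the ones quoted above for two time observations five years apart.

```python
def proportion_changing(first, second, tol=0.0):
    """Share of firms whose value differs by more than tol between two dates.

    first and second map firm identifiers to the governance variable.
    Only firms observed at both dates are counted.
    """
    firms = first.keys() & second.keys()
    changed = sum(1 for f in firms if abs(second[f] - first[f]) > tol)
    return changed / len(firms)


# Hypothetical ownership stakes observed five years apart.
own_first = {"A": 0.51, "B": 1.00, "C": 0.30, "D": 0.75}
own_second = {"A": 0.51, "B": 0.80, "C": 0.45, "D": 0.75}

share = proportion_changing(own_first, own_second)

# Required shares from the text: 63% for beta = 0.8 and 86% for beta = 0.6.
required = {0.8: 0.63, 0.6: 0.86}
sufficient = {beta: share >= needed for beta, needed in required.items()}
print(share, sufficient)
```

With half of the hypothetical firms changing, neither threshold is met, so in this example the researcher would consider collecting an additional time observation.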

Fig. 1 Statistical power as a function of time variation under time-variant endogeneity of ownership and board independence

Two of the studies we surveyed allow us to deduce the proportion of observations that change in their sample but only over the whole period of analysis: in Lin et al. (2011), 21% of firms exhibit changes in ownership over a 13-year period, while in Lin et al. (2013) the proportion is 39% over a 10-year period. In the first paper, the authors collect ownership data four years apart. To see how these degrees of time variation map with our results, in Panel B of Fig. 1 we present a subset for sampling every four years as a function of the proportion of firms that change between the first and last year of data (instead of the following year, as in Panels A and C). We note that Lin et al. (2011) opt for difference regressions not on their full data but only on the observations that do change. The 13 years of data sampled every four years corresponds to the right-most graph in Panel B. For 17% of firms changing (close to the 21% in Lin et al. (2011)) between the first and last period, sufficient power is only present for a theoretical beta coefficient of 0.8. Therefore, a presumed lack of power on a full first difference estimation in their case is consistent with a relatively lower theoretical beta.

For board independence under endogeneity (Panel C of Fig. 1) we show sampling every two years, consistent with the design in Wintoki et al. (2012), and every four years for contrast. For low theoretical strength and two time periods power is very low, but for four time observations we find that power is enough once 45% of the firms change between the first and next sampling period. For higher theoretical strength this proportion drops to 10%. We examine these findings further in the next section, where we replicate two influential studies where ownership and board independence are the key variables of interest.

Figure 1 may help researchers in several ways. Even when consecutive years of data are easily available in electronic form from data vendors, careful empirical design will benefit from comparing the degree of time variation in the data (relevant to a research question of interest) to our benchmark results. For example, if the researcher finds no significant relationship and the amount of variation in their data is low relative to our benchmark results, the decision not to use within-firm estimators can be substantiated with more formality. When data collection is a hurdle, for example when board data for private firms is not readily available, starting with two non-consecutive time observations of data and measuring the proportion of firms that change allows the researcher to decide whether collecting an additional time observation of data is worthwhile. Furthermore, a researcher starting a new project could look up an existing study that uses data with similar characteristics (jurisdiction, family control, listed status, etc.), check the proportion of firms that change in that study, and, coupled with our power graphs, choose a sampling time gap that provides sufficient statistical power.

3 Data and replicating regressions

Next, we check whether our simulation findings play out in real-life data by replicating two highly cited studies focusing on ownership (Lin et al. 2013) and board independence (Coles et al. 2008). We use several data sources for ownership and board independence data. Our most comprehensive source (ICO) provides a long time series coverage of detailed controlling ownership for a large number of firms. It differs from most of the popular ownership databases in the following ways. First, it provides the complete chain of ownership links from any firm operating in Canada (above a certain size thresholdFootnote 16) to its ultimate owner. This disclosure is mandated every year under the Corporations Returns Act (CRA). By contrast, the ownership databases compiled by Bureau van Dijk (BvD), such as Osiris, Orbis, Amadeus and Fame, update their historical data only if new information becomes available, which means that they have uneven sampling gaps by design. We summarize all differences between our data source and BvD in the online appendix. Importantly, CRA covers multinationals as long as they have a subsidiary in Canada, which allows us to observe all cross-border and foreign ownership links that lead to the ultimate controlling owner, and therefore we have a large number of global companies. One of the most valuable features of this data source, which we exploit in the next section, is that it covers all non-listed firms, as opposed to the typical database coverage, where only some non-listed firms self-select to provide disclosure. We have extensive coverage for Anglo-Saxon firms (19,082, of which 1369 are listed) and somewhat less from Continental Europe (6269, of which 308 are listed) and Rest of the World (3019, of which 137 are listed); refer to Fig. 2 for full coverage of the data and further splits by family control and ownership structure complexity.
We perform an extensive hand-collection of family control status, apply a verification algorithm with multiple alternative sources and exploit the long time series to detect inconsistencies and outliers. Last, we compute three ownership variables—cash flow rights of the ultimate owner, control rights of the ultimate owner and the ratio between the two—using computational tools from graph theory and formalized in Almeida et al. (2011). To ensure greater representativeness for the rest of the world, we augment all analyses with the CSMAR database covering all listed firms in China and offer further splits by state ownership.

Fig. 2 Data coverage of the ICO database. This figure shows the coverage of the ICO data for productive firms only (excluding financials, holding companies and charities – exact definition in Appendix 1). The same figure for non-productive firms is in the online appendix.

The raw direct ownership stakes we start with come from the Intercorporate Ownership Database (ICO) compiled by Statistics Canada for the period 1995–2019. The CRA specifies a penalty of fines and/or prison for failure to disclose ownership information. In ICO, all firms are assigned to structures referred to as “enterprises” based on common control.Footnote 17 The two groups of firms (domestic Canadian and multinationals) differ in that the domestic dataset includes a large majority of private and small firms, while the multinational (MNC) dataset represents firms that are larger and more likely to be listed. For the replicating regressions we are limited to listed firms for which there is financial data (in either Capital IQ or Thomson One) and for which we can construct an outcome variable (public-to-private debt ratio and firm performance as proxied by Tobin’s Q) plus the control variables in each of the two studies we replicate. We compute three ownership variables: the ultimate cash flow rights of the controlling owner (ucfr), the control rights of the controlling owner (cr) and the ratio between the two (wedge). \(ucfr_{i}\) is the proportion of one unit of disbursement from firm \(i\) that is received by the controlling owner, while \(cr_{i}\) is the critical control threshold, which is shown to be equivalent to the concept of the weakest link (as used in La Porta et al. 1999, Claessens et al. 2000, and Faccio and Lang 2002) when cross-shareholdings and multiple links are absent but can also be computed for more complex structures. We follow Almeida et al. (2011) in the construction of ucfr and cr. In the remainder of the text, for brevity, we use the term “ownership” synonymously with any one of the three variables. Full definitions of ownership and board independence variables are given in Appendix 1, while summary statistics of all variables used in the two replications are in Appendix 2.
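For intuition, the two ownership measures can be computed by hand in the simplest case of a single chain of control without cross-shareholdings. The sketch below is not the Almeida et al. (2011) algorithm, which also handles more complex structures; the firm names, stakes and function names are hypothetical, and the recursion assumes an acyclic structure.

```python
def ultimate_cash_flow_rights(firm, owner, stakes):
    """Sum over all ownership paths from owner to firm of the product of stakes.

    stakes maps (holder, held) pairs to direct ownership fractions.
    Assumes there are no cross-shareholdings (the structure is acyclic).
    """
    total = 0.0
    for (holder, held), stake in stakes.items():
        if held != firm:
            continue
        if holder == owner:
            total += stake
        else:
            total += stake * ultimate_cash_flow_rights(holder, owner, stakes)
    return total


def weakest_link(chain, stakes):
    """Smallest direct stake along a single chain [owner, ..., firm]."""
    return min(stakes[(a, b)] for a, b in zip(chain, chain[1:]))


# Hypothetical two-layer pyramid: family -> HoldCo -> OpCo.
stakes = {("family", "HoldCo"): 0.6, ("HoldCo", "OpCo"): 0.5}

ucfr = ultimate_cash_flow_rights("OpCo", "family", stakes)  # 0.6 * 0.5
cr = weakest_link(["family", "HoldCo", "OpCo"], stakes)     # min(0.6, 0.5)
print(ucfr, cr)
```

In this example the owner receives 30 cents of each unit disbursed by the operating firm while controlling it through a weakest link of 50%, which illustrates how pyramiding separates cash flow rights from control rights.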

We collect board independence data from BoardEx, which is the most comprehensive board composition database, covering more than 20,000 companies globally. We separate firms into four jurisdiction groups in descending order of coverage: US (12,559, of which 7342 are listed), Anglo-Saxon (7228, of which 5258 are listed), Continental Europe (3532, of which 2865 are listed) and Rest of the World (5716, of which 4993 are listed). Again, we augment the East-Asian coverage by analyzing board independence of all Chinese listed companies in the CSMAR database (2613, of which 1000 are state-owned). For the illustrative regression on board independence, we use the same data as in the original study—US listed firms—and retrieve the required financial variables from Compustat.

We begin with a quasi-replication of Lin et al. (2013). They study the choice between public and private debt among Western European and East-Asian companies. Lin et al. (2013) use consecutive time observations for a 10-year period (2001–2010), whereas in two earlier related studies (Lin et al. 2011, 2012) they construct ultimate ownership for four points in time that are two and three years apart. Although Lin et al. (2013) have a 10-year period, their unbalanced panel provides on average four time observations per firm (9783 firms and 43,273 observations in their Table 4, which contains the specification we use). Our ownership data cover 25 years and allow us to select firms that have a sufficient number of consecutive time observations so that we are able to examine different time series properties.

We adopt their specificationFootnote 18:

$$ debt\; choice = \gamma_{t} + \eta_{j} \gamma_{t} + \beta_{1} wedge + \beta_{2} zscore + \beta_{3} wedge*zscore + \mathop \sum \limits_{k = 4}^{10} \beta_{k} X_{k} + \mathop \sum \limits_{q = 1}^{n} \theta_{q} ind_{q} + \varepsilon $$
(2)

Our extensions employ the within-firm estimator:

$$ debt\; choice = \alpha_{i} + \gamma_{t} + \eta_{j} \gamma_{t} + \beta_{1} wedge + \beta_{2} zscore + \beta_{3} wedge*zscore + \mathop \sum \limits_{k = 4}^{10} \beta_{k} X_{k} + \varepsilon $$
(3)

where we suppress firm-year subscripts for brevity; \(debt\; choice\) is the ratio of public to private debt; \(\alpha_{i}\) are firm fixed effects, which we use in our extensions but which are not present in Lin et al. (2013); \(\gamma_{t}\) are year effects; \(\eta_{j} \gamma_{t}\) are country × year effects; \(X_{k}\) is a vector of control variables: cash-flow rights, leverage, tangibility, size, profitability and Tobin’s Q, all defined exactly as in the original paper; and \({ind}_{q}\) are industry fixed effects, which are absorbed by the firm fixed effects in our extensions.
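The absorption of the industry effects follows from the within transformation that underlies Eq. (3). As an illustrative sketch, write the firm and year subscripts \(i\) and \(t\) explicitly, let \(y_{it}\) denote debt choice, collect the regressors of Eq. (3) in a vector \(W_{it}\) with coefficient vector \(\beta\), and leave the year and country × year effects aside for clarity. This notation is an assumption made only for the illustration. Subtracting firm-level time averages (denoted by bars) gives:

$$ y_{it} - \bar{y}_{i} = \beta^{\prime} \left( W_{it} - \bar{W}_{i} \right) + \left( \varepsilon_{it} - \bar{\varepsilon}_{i} \right) $$

Any term that is constant within a firm, such as \(\alpha_{i}\) or the industry effect of a firm that does not switch industry, equals its own time average and drops out. By the same argument, a firm whose wedge never changes has a demeaned wedge of zero in every year and supplies no within-firm variation for identifying \(\beta_{1}\).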

We use a sample of similar but not identical firms because the data source in Lin et al. (2013) is a proprietary database (ORBIS by BvD—refer to the online appendix for the differences in precision between the BvD family of databases and ICO). However, we apply the exact same regression specification as in their Table 4, given in Eq. (2) above, plus within-firm estimators with different frequencies of sampling, given in Eq. (3). We show results for three jurisdictional samples in Table 2: All firms (Panel A), Continental Europe and East Asian firms—CEEA (Panel B) and US & UK firms (Panel C). We examine how the results may differ for firms not analyzed in Lin et al. (2013)—US firms, which exhibit lower time variation in ownership than those in Western Europe and East-Asia.

Table 2 Illustrative regression results

Column (1) of Panels A, B and C in Table 2 reports the same OLS estimator used by Lin et al. (2013), while in column (2) we show the within-firm estimator using the full data. In columns (3) and (4) of Panels B and C we limit the sample to contain four time observations with sampling frequency every two and every five years. To maintain comparability between the two sampling frequencies in Panels B and C, we keep the number of time observations the same, which means that in the five-years-apart case we need at least 16 years of data (including the first and last year).

When using the full data with an OLS specification with industry dummies (column (1) of Panels A, B and C), we confirm the findings of Lin et al. (2013) that the use of public debt is lower when firms have a higher wedge and z-score, but the interaction of the two counteracts this effect. However, in column (2) of Panel A, using the within-firm estimator results in lack of significance. This could be attributed to the low time variation in ownership of US firms, which represent around half of the sample.

The within-firm estimator in column (2) of Panels A, B and C only detects significance for the sub-sample of CEEA firms. At the bottom of Panels B and C in Table 2 we report the proportion of firms that change between the first and second sampling period. In column (3) Panel B Table 2 only 14.6% of CEEA firms change and the wedge effect is not detected, but five years apart (column (4)), 35.7% of the firms change and the effect is revealed.

For US & UK firms (Panel C of Table 2), however, the potential effect of the wedge cannot be detected by the within-firm estimator (columns (2), (3) and (4)). The low degree of time variation for the US & UK sample is evident in that only 10.8% of firms change in two consecutive years, while the proportion increases slightly to 13.4% and 17.1% respectively for two and five years apart. Both sets of results in Panels B (CEEA firms) and C (US & UK firms) are consistent with our simulation findings. In particular, having ten years of data is sufficient to detect an effect if it exists and is relatively strong (second section of Panel B Table 1); however, sampling five years apart with four time observations requires at least 24% of firms to change (left-most graph on the bottom row of Panel A Fig. 1).

Next, we replicate the regressions in Table 5 of Coles et al. (2008). They study how firm complexity, which requires a variety of rich expertise from the board, affects firm value as measured by Tobin’s Q. Their specification is:

$$ Q = \gamma_{t} + \beta_{1} insider\; frac + \beta_{2} outsiders + \beta_{3} advice + \beta_{4} advice*outsiders + \mathop \sum \limits_{k = 5}^{11} \beta_{k} X_{k} + \mathop \sum \limits_{q = 1}^{n} \theta_{q} ind_{q} + \varepsilon $$
(4)

Our extensions employ the within-firm estimator:

$$ Q = \alpha_{i} + \gamma_{t} + \beta_{1} insider\; frac + \beta_{2} outsiders + \beta_{3} advice + \beta_{4} advice*outsiders + \mathop \sum \limits_{k = 5}^{11} \beta_{k} X_{k} + \varepsilon $$
(5)

where we suppress firm-year subscripts for brevity; \(Q\) is Tobin’s Q; \(\alpha_{i}\) are firm fixed effects, which we use in our extensions but which are not present in Coles et al. (2008); \(\gamma_{t}\) are year effects; \(insider\; frac\) = 1 − board independence; \(outsiders\) is the log of the number of independent directors; \(advice\) is a dummy variable equal to 1 if the firm-year observation ranks above the median by the first principal component of firm complexity based on the number of segments, size and leverage; \({X}_{k}\) is a vector of control variables: R&D dummy, standard deviation of returns, profitability plus its lag, intangible assets and CEO ownership, all defined as in the original paper; and \({ind}_{q}\) are industry fixed effects, which are absorbed by the firm fixed effects in our extensions. Here we require three time observations four years apart, which means a minimum of nine years of data plus one for lagged profitability.

We begin by mimicking the number of observations in Coles et al. (2008) in Panel D of Table 2. We are unable to replicate their number of firms, since it is not reported in the original study. We show that the within-firm estimator only detects the significance on insider fraction for the subset of firms with high time variation in column (4). This implies that the sample Coles et al. (2008) were working with likely had low time variation. Next, we exploit our full sample, which is larger than that of Coles et al. (2008), despite the fact that we limit the time period to ten years as is the case in their paper. This is due to the better coverage in BoardEx, which begins in 1999, while the Execucomp data in Coles et al. (2008) covers the period 1992–2001, but is sparser. The OLS and within-firm estimators in columns (1) and (2) of Panel E Table 2 detect the effect of the proportion of insiders as found by Coles et al. (2008) in their Table 5 column (3). When we sample with gaps using three time observations, power decreases but is still enough to detect the effect in column (3) given that 34.8% of firms change between the first and next sampling period (consistent with the dashed line in the top right graph of Panel C in Fig. 1, which requires more than 30% of firms changing). In column (4), however, 41.2% of firms changing four years apart no longer provides sufficient power, as the bottom right graph of Panel C in Fig. 1 requires more than 45% of firms to change.

Our replicating regressions verify that the within-firm estimator can detect a relationship between a sticky variable of interest and an outcome even in a short panel, as long as the sampling is performed as far apart as necessary to capture sufficient time variation. This is important, because within-firm estimators (even if imperfect) are more reliable in establishing causality than alternative approaches in the absence of shocks or valid instruments.Footnote 19

4 Empirical analysis

Last, we perform six types of time variation analysis to investigate how the four variables we consider map along our simulation findings (Table 1 and Fig. 1) depending on jurisdiction, listed status, family or state control and ownership structure complexity. Three of the analyses are of descriptive nature (averages over time, year-to-year variation and stable regimes), while the other three analyses are regression-based (analysis of variance, regressions showing the determinants of the time-series standard deviation of ownership and board independence and persistence regressions). We show additional sub-sample results along all six types of analyses in the online appendix.

The granular ownership data in ICO with a long time dimension allows for the most detailed sample splits, the dominance of state-owned firms in China provides an additional interesting dimension, while BoardEx shows the time variation patterns in board independence by jurisdiction and listed status. The composition of the ICO database is presented in Panel A of Fig. 2. The group of MNCs controlled from Anglo-Saxon countries is large and provides relative jurisdictional homogeneity. Therefore, our baseline analysis focuses on the two largest groups: Canadian firms and Anglo-Saxon MNCs. We present results for the two remaining groups—Continental Europe and Rest of the World—in the accompanying online appendix (the right-most two rectangles of each graph in Panel A of Fig. 2).Footnote 20 Anglo-Saxon countries are united by a type of capitalism characterized by a lower degree of government intervention and greater reliance on free market mechanisms (Esping-Andersen 1990). We focus on productiveFootnote 21 firms that are part of either a pyramid or a group corporate structure.Footnote 22,Footnote 23 This approach may appear too restrictive at first glance, but we emphasize that the granularity of the data means that firms which usually would be classified as stand-alone, here fall in the group or pyramid type. In particular, listed or large private firms, even if non-family-controlled, almost always have subsidiaries (often wholly owned and organized in flat structures in the case of US or UK control) and therefore will be classified as either group or pyramid (the bottom two groups in Panel B of Fig. 2). Our definition of a stand-alone firm applies to small firms without any links to other legal entities. Stand-alone firms exhibit almost no time variation in ownership (results available in the online appendix). 
While all member firms of a corporate structure are included in the calculation of ultimate cash flow rights, in the subsequent analyses we are only interested in productive firms, because the non-productive ones are most often shell companies or individual trusts without any business activity. This still leaves in the analysis the productive firms that may have an ultimate owner that is a financial institution.

These filtering steps have a balancing role in that the differences in size, industry and listed status between Canadian firms and Anglo-Saxon MNCs become less pronounced. In particular, the variety of sizes and industry composition represented in the structures are not significantly different between the Canadian and Anglo-Saxon MNC samples (industry and size splits are presented in the online appendix). The proportions of listed firms remain different, whereby 21% of the Canadian enterprises contain a listed firm, while 55% of the Anglo-Saxon MNCs do.Footnote 24

Starting with the total counts of productive firms in the database shown at the bottom of each graph in Panel A of Fig. 2 (78,128 non-listed and 2472 listed), we focus on the pyramids and groups with Canadian (28,116) and Anglo-Saxon (17,620) control. After dropping the firms that disappear from the database in 2006 due to a higher size threshold for mandatory disclosure, we have 6033 Canadian and 4848 Anglo-Saxon firms split by family control and listed status in Panel B of Fig. 2.

In Fig. 3 we show anecdotal examples of the time variation in ucfr for two firms each from the Canadian (Panel A) and Anglo-Saxon MNC (Panel B) subsamples. The two Canadian firms are Celestica Inc. and Indigo Inc., two publicly listed firms that are part of the G. W. Schwartz group. The steep drop in ucfr in 2001 for Indigo occurs when it is acquired by G. W. Schwartz. It is initially added to the group structure with a direct stake of 0.44, which later varies as Indigo moves further down the layers of pyramiding. Similarly, the stake in Celestica, which is a subsidiary of the investment holding company arm of the group (Onex Corp.), varies as the intermediate firms in the chain of control up to the ultimate owner are reshuffled.

Fig. 3 Examples of ownership changes over time

In Panel B of Fig. 3 we show another two listed firms controlled by US-based MNCs. The first is Sears Canada, controlled consecutively by Sears, Roebuck and Co. and ESL Investments Inc. (the investment arm of hedge fund investor Eddie Lampert) until its bankruptcy in 2017. The second is Kronos Inc., controlled by Contran Inc., the private holding company of the late Dallas billionaire Harold Simmons. We see that both firms maintain a relatively stable ucfr slightly above 0.5 in the period 1997–2004.

These examples illustrate typical economic processes behind the time variation in ownership. We now turn to a detailed analysis of the patterns of these changes for different groups of firms over time.

In our analysis a firm is considered to be family-controlled if its ultimate owner is reported as a single entity describing an individual, a family, a group of related individuals or a group of related families, as in Claessens et al. (2002).Footnote 25 We take great care to assign family status with as much precision as we can by verifying the data in at least five other sources (as described in the accompanying online appendix).Footnote 26

We study the patterns of time variation in ownership across jurisdictions, family and non-family status, and ownership structure complexity. These groupings are guided by the extensive literature on family firms, ownership structures and institutions. Existing literature reveals a number of reasons why family owners make ownership change decisions differently than non-family ones. For example, inheritance planning is only relevant for family owners and therefore some changes in the ownership structure of family firms are motivated by the distribution of wealth among descendants (Villalonga and Amit 2009; Tsoutsoura 2015). Family owners have a longer-term investment horizon (Bertrand and Schoar 2006) and are more risk-averse (Faccio et al. 2011), which affects the timing and size of their ownership stake decisions. Importantly, family business groups are likely to be organized as pyramids (Almeida and Wolfenzon 2006) or employ other control enhancing mechanisms (Masulis et al. 2011).

Board independence may vary more in mature markets with stricter regulatory or institutional shareholder scrutiny. Among 26 countries which undertook corporate governance reforms in the beginning of the 2000s, only three mandated a minimum board independence threshold, while most employed a comply-or-explain model (Kim and Lu 2013). At the same time, the range of values board independence can take depends directly on the current legal minimum in the respective jurisdiction, with one third being most predominant in emerging markets versus one half in developed economies (Papadopoulos 2019). In addition, non-listed firms are likely not to be subject to any minimum board independence requirements, while state-owned firms may have even stricter limits (for example 90% in Sweden and 80% in Vietnam (OECD 2018)).

4.1 Summary statistics and standard deviation regressions

We begin the analysis of time variation of ownership and board independence with a summary graph of yearly averages across all firms, which we show in Fig. 4. In Panel A, the cash flow rights of the ultimate owner exhibit great stability. Starting from the bottom, the group of Canadian family firms are held with the lowest stake on average, which changes the most year to year. The average ownership stake of the non-family Canadian and family Anglo-Saxon MNC groups is always close to 0.9 and fluctuates somewhat. The group with the least time variation in its average is that of Anglo-Saxon MNC non-family firms. In addition, the average of this group is always above 0.95, which suggests that the vast majority of these firms are wholly owned. The top line in Panel B shows very low variation in the average board independence of US listed firms, with no discernible adjustment following the Sarbanes–Oxley Act. On the other hand, the rest of the Anglo-Saxon countries display the gradual increase in board independence consistent with the implementation of legal minimums or comply-or-explain codes. The majority of non-listed firms start being covered by BoardEx in 2007 and show much lower average levels of independence, around 0.5, with slightly more adjustments over time for the US subsample. Overall, Fig. 4 points to little time variation in time averages within groupings but different degrees of time variation across type of firm and jurisdiction. Corresponding graphs for control rights and wedge are shown in the online appendix.

Fig. 4 Cross-section over time. The ownership sample in Panel A consists of productive firms for which we have data for at least 12 years, while for board independence in Panel B we require a minimum of five years of data. The coverage of BoardEx for non-listed firms is very limited before 2007.

This broad view of time variation does not tell us what proportion of the observations are responsible for the movements in the average, nor whether the movements have a common source for all firms or are firm-specific; we analyze these possibilities in the following sections.

In Table 3 we present three types of summary statistics: overall means, cross-sectional standard deviations (equal to the standard deviation of the time averages for all firms) and average time standard deviations (computed over time and then averaged across firms) of ownership and board independence by subsamples. Not surprisingly, the average ownership (standard deviation) in a pyramid is lower (higher) than in a group. However, even for pyramids the means are relatively high, which means that the typical firm is wholly owned (Panel A of Table 3). For Chinese listed firms in Panel B the means are much lower, reflecting the absence of wholly owned firms. The time standard deviations are always lower than the cross-sectional ones, reflecting the sticky nature of the variables. In Panel C of Table 3 mean board independence is higher for listed than non-listed firms for all jurisdictions, while the cross-sectional standard deviation is much higher for non-listed firms. Time standard deviations are always low as expected for a stable variable. In China, both private and state-owned firms have mean board independence very close to the legal minimum, with both cross-sectional and time standard deviations also very low, suggesting that within-firm estimators are unlikely to detect effects in Chinese data. Tests for differences in variances by subgroups are almost always statistically significant (shown in the online appendix). Corresponding tables of summary statistics and tests for differences in variances for control rights and wedge are available in the online appendix.
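The three types of summary statistics can be sketched as follows. The panel values and the function name `variation_summary` are hypothetical. The use of the sample standard deviation and of a pooled mean over all firm-year observations are assumptions; the text does not state which conventions underlie Table 3.

```python
from statistics import mean, stdev


def variation_summary(panel):
    """panel maps each firm to its list of values over time."""
    time_means = [mean(values) for values in panel.values()]
    time_sds = [stdev(values) for values in panel.values()]
    return {
        # Mean over all firm-year observations (assumed definition).
        "overall_mean": mean(v for values in panel.values() for v in values),
        # Standard deviation of the firms' time averages.
        "cross_sectional_sd": stdev(time_means),
        # Standard deviation over time per firm, averaged across firms.
        "average_time_sd": mean(time_sds),
    }


# Hypothetical ownership stakes for three firms over three years.
panel = {
    "A": [1.0, 1.0, 1.0],
    "B": [0.4, 0.4, 0.7],
    "C": [0.9, 0.9, 0.9],
}
summary = variation_summary(panel)
print(summary)
```

In this hypothetical panel only one firm changes, so the average time standard deviation is well below the cross-sectional one, the pattern expected for a sticky variable.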

Table 3 Summary statistics

To test whether time variation is statistically significantly different in a multivariate setting, in Table 4 we present regressions of the time-series standard deviation of ownership and board independence on indicators for the different groupings in Table 3. The dependent variable is constructed over 4-year rolling windows for each firm. We confirm that family-controlled firms have significantly higher ownership time-series standard deviation. Across jurisdictions, Canadian firms exhibit statistically significantly higher standard deviation of ownership than Anglo-Saxon MNCs. The greater time variability of family firms is maintained through different size groupings (Panel B of Table 4). Board independence of listed firms has statistically significantly higher time-series standard deviation across jurisdictions (Panel C). The size decile analysis in Panel D reveals that the lower time variability in board independence for US listed firms is driven by the two smallest deciles.
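A rolling-window dependent variable of this kind could be built roughly as follows. This is a sketch under assumptions: the function name `rolling_time_sd`, the sample standard deviation and the toy series are illustrative, and the exact construction used for Table 4 is not shown in the text.

```python
from statistics import stdev

def rolling_time_sd(series, window=4):
    """Sample SD of each consecutive `window`-year slice of one firm's series."""
    return [stdev(series[i:i + window]) for i in range(len(series) - window + 1)]

# Hypothetical firm that is wholly owned for four years, then drops to 80%.
ownership = [1.0, 1.0, 1.0, 1.0, 0.8, 0.8]
sds = rolling_time_sd(ownership)
```

Each element of `sds` would then be one firm-window observation of the dependent variable, to be regressed on the group indicators.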

Table 4 Regressions of time-series standard deviation of ownership and board independence

Overall, the summary statistics and test results in Tables 3 and 4 suggest that the amount of time variation in ownership and board independence differs across jurisdictions, family control and listed status.

4.2 Analysis of year-to-year variation

In the next two sections we examine how our granular data maps onto the generalized simulation results in Fig. 1, which relate power to the proportion of firms that change between the first and next (last) period. We start with year-to-year changes and the firms responsible for these changes. In Table 5 Panel A (Panel B) we report the proportion of Canadian (Anglo-Saxon MNC) firms that change in each year. We use two definitions of change: any change > 0 and changes > 0.05 (as in Donelli et al. 2013), and we report non-family and family firms separately (see footnote 27). We also analyze the yearly proportions of changes by industry and size, but for brevity they are only reported in the online appendix. Corresponding tables with very similar results for control rights and wedge are also shown in the online appendix.
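The two definitions of change can be made concrete with a short sketch. The panel below is hypothetical, the function names are illustrative, and a balanced panel (every firm observed in every year) is assumed for simplicity.

```python
# Hypothetical balanced panel: firm -> ownership stake by year (fractions of 1).
panel = {
    "A": [1.00, 1.00, 1.00, 1.00],
    "B": [0.60, 0.60, 0.50, 0.50],
    "C": [0.30, 0.32, 0.40, 0.41],
    "D": [0.75, 0.75, 0.75, 0.75],
}

def share_changing(panel, threshold=0.0):
    """For each pair of consecutive years, the share of firms whose
    absolute change in ownership exceeds `threshold`."""
    n_years = len(next(iter(panel.values())))
    shares = []
    for t in range(1, n_years):
        changed = sum(abs(s[t] - s[t - 1]) > threshold for s in panel.values())
        shares.append(changed / len(panel))
    return shares

def share_ever_changing(panel, threshold=0.0):
    """Share of firms with at least one qualifying change over the whole period."""
    changed = sum(
        any(abs(s[t] - s[t - 1]) > threshold for t in range(1, len(s)))
        for s in panel.values()
    )
    return changed / len(panel)

any_change = share_changing(panel)          # definition 1: any change > 0
large_change = share_changing(panel, 0.05)  # definition 2: changes > 0.05
ever = share_ever_changing(panel)
```

The stricter threshold filters out the small adjustments of firm C in the first and last year pairs, so the yearly shares under the second definition can never exceed those under the first.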

Table 5 Year-to-year variation in ownership

In both Panels A and B of Table 5 we note the lack of systematic spikes or drops in the proportion of firms that change in particular years or periods. The only exception is a jump in the period 2005–2006, which coincides with the changes in disclosure rules and size thresholds in CRA.

Table 5 Panel A (B) tells us that at a minimum of two years of data (that allows the computation of a change) on average we have only 13.8% (7.1%) of Canadian (Anglo-Saxon) family firms, and only 11.1% (4.3%) of Canadian (Anglo-Saxon) non-family firms, exhibiting a change in ownership. The numbers for changes greater than 5% are comparable to the results in Donelli et al. (2013) for Chile.

Over 17 years, the proportion of firms that change at least once becomes 60.6% (45%) and 36.9% (22.8%) respectively.

In both Panels C and D of Table 5 we see a very different picture for board independence. The minimum proportion of firms that change their board independence in any year is 45%, with the overall proportion of firms that change at least once becoming 100% over the entire period. Therefore, we uncover a large difference in the time variation of ownership and board independence when it comes to the proportion of firms changing each year.

Consider the findings in Table 5 relative to the generalized simulation results in Fig. 1. The minimum average proportion of firms that change ownership between two consecutive calendar years among all subsamples is 4.3%, which accumulates to 22.8% for the full period. Distributing this accumulated change equally over the period implies a rough estimate at year 13 of 21%, which would provide sufficient statistical power for four time observations and \(\beta = 0.8\), corresponding to the right-most graph of Panel B of Fig. 1. For the other subsamples time variation is higher, and they would provide sufficient power even for a shorter time series: the 13.8% average year-to-year proportion of changing firms for the Canadian family subsample would translate to 42.1% in year 9 and have sufficient power for \(\beta\) close to 0.6.

Realistically, in empirical research involving sticky data it is much more likely to have a few time observations and be in the situation where the proportion of firms that account for the firm-specific time variation is relatively low. This highlights the local nature of an estimator based on time variation for a persistent variable and necessitates a closer look at the changing firms. We examine the kind of firms that change by industry and size (results in the online appendix). The industry and size splits reveal that the most changeable firms are larger and that in all categories family firms change more than non-family firms.

4.3 Analysis of stable regimes

Finally, we note that the year-to-year changes in the previous section are calendar year specific. Therefore, in Table 6 we introduce the concept of a stable regime that is calendar year independent. We follow DeAngelo and Roll (2015), who examine the stability of capital structures in US listed firms over long horizons. A stable regime is defined as a period of t years, where ownership does not change outside a predetermined bandwidth. For example, in Fig. 3 we see that Indigo is in a stable regime where no changes occur to its wholly owned status from 1997 until 2001, then it enters another stable regime at ~ 45% until 2006 and subsequently it experiences changes every year. Indigo will then be part of the proportion of firms that are in a stable regime for at least five years, while for Celestica, the longest stable regime is three years (~ 65% in 2005–2007).

Table 6 Stable regimes in ownership

We examine minimum lengths of stable regimes from 3 to 16 years and changes of 0%, 5% and 10%. For example, when we require the change to be greater than 10% for the stable regime to end, Celestica will be in one for nine years (1999–2007). However, when we require the change in ownership to be less than 5% or 0%, Celestica is never in a stable regime. We limit the analysis to firms that have at least 12 years of data to make the computed proportions in each year be driven only by changes and not by the number of available firms in the sample.
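One way to operationalize this definition is sketched below. It assumes that a regime is a run of consecutive years in which the range of ownership (maximum minus minimum) stays within the bandwidth; the exact rule used for Table 6 may differ, and the two firms are hypothetical series, not the Indigo or Celestica data from Fig. 3.

```python
def longest_stable_regime(series, bandwidth=0.0):
    """Length in years of the longest run of consecutive observations
    whose range (max - min) does not exceed `bandwidth`."""
    best = 0
    for start in range(len(series)):
        lo = hi = series[start]
        for end in range(start, len(series)):
            lo, hi = min(lo, series[end]), max(hi, series[end])
            if hi - lo > bandwidth:
                break  # the regime that began at `start` has ended
            best = max(best, end - start + 1)
    return best

def share_in_stable_regime(panel, min_years, bandwidth=0.0):
    """Share of firms whose longest stable regime lasts at least `min_years`."""
    stable = sum(
        longest_stable_regime(s, bandwidth) >= min_years for s in panel.values()
    )
    return stable / len(panel)

# Hypothetical firms (fractions of 1), one observation per year.
panel = {
    "X": [1.0] * 5 + [0.45] * 5 + [0.40, 0.28, 0.15],
    "Y": [0.70, 0.62, 0.65, 0.65, 0.65, 0.58, 0.50, 0.43],
}

x_exact = longest_stable_regime(panel["X"])        # zero bandwidth
x_wide = longest_stable_regime(panel["X"], 0.10)   # 10% bandwidth
y_exact = longest_stable_regime(panel["Y"])
share_5y = share_in_stable_regime(panel, min_years=5)
```

Widening the bandwidth can only lengthen regimes, which is why the proportions of stable firms are weakly higher under the 10% definition than under the 0% one.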

In Panel A of Table 6 we report the proportion of Canadian non-family firms with stable ownership regimes. Comparing the overall reduction in stability between Canadian non-family (Panel A) and family firms (Panel B), we note that the stability of family firms drops faster. From the second row of Panel A (Panel B) we note that although 98.2% (99.1%) of non-family (family) firms do not change their ownership structure within three years, by year 7 we see a decrease to 89.9% (84.4%), and this proportion further decreases to 67.2% (56.9%) by year 12 and 63.0% (51.6%) by year 16.

When we relax our definition of stable regime by increasing the bandwidth to 10%, we still find a clear reduction in the proportion of firms that follow stable regimes as time increases. The family effect is evident across jurisdictions, i.e., ownership stability dissipates faster for family firms.

The median and mean number of years in a stable regime reported in the last two columns of Table 6 also show that ownership stability is lower for family firms in both the Canadian and Anglo-Saxon MNC sample, as is board independence stability for Anglo-Saxon versus US firms.

Row four of each panel in Table 6 reports the year-to-year decrease in stability under the 0.05 bandwidth, with the years of the biggest drop shown in bold. Using this metric, we find yet again that the biggest drop happens sooner for family firms (year 8) than for non-family firms (year 12) in the Canadian sample. In Panels E and F of Table 6 we note that for board independence the largest drop in stability for US firms happens in year 9 (equal to the drop in year 10), while for Anglo-Saxon firms it is in year 5. Both groups begin with very high proportions of three-year stability, just as in the case of ownership, but the reductions are much larger, and by year 16 the proportion of firms still in a stable regime has dropped to 0 in row 1, whereas for ownership that proportion stays around or above 50% (Panels A–D of Table 6).

The stable regimes analysis shows that time variation accrues faster for some kinds of firms (Canadian family firms in the case of ownership and Anglo-Saxon firms for board independence). While within seven years close to 90% of non-family firms remain in stable ownership regimes, five more years reduces this proportion to two thirds. Stable regime tables for control rights and wedge are presented in the online appendix.

4.4 Analysis of variance and persistence regressions

In Table 7 we show a disaggregation of the variation in ownership and board independence by firm fixed effects, year fixed effects, industry-year, industry-size deciles and firm-time interactions, following DeAngelo and Roll (2015) and Lemmon et al. (2008). The first column shows the adjusted R-squared of each model, while the remaining columns report the relative proportion of variation explained by each set of fixed effects. Exact regression specifications and methodological details are presented in the online appendix. The proportion of explained variation in ownership accounted for by firm-time interactions is substantial in cash flow rights: 0.1325 for Canadian non-family firms and 0.1756 for family firms; 0.1814 for Anglo-Saxon MNC non-family firms and 0.2808 for Anglo-Saxon MNC family firms. The results are similar for control rights, wedge and board independence. For comparison, industry-time interactions are ten times less important for all groups. The analysis-of-variance results tell us that over a longer period of 25 years the firm-specific time-varying component of ownership is important, accounting for just under a fifth of all explained variation for family Canadian firms and close to a third for family Anglo-Saxon MNC firms. Therefore, the within-firm estimator should not be dismissed on the grounds of lack of firm-specific time variation. In contrast, the time-series variation common to all firms, the industry-specific time variation and the variation captured by industry-size deciles are all negligible. Existing studies often argue that using industry, time, size deciles and their interactions is a way to address the inability to use firm fixed effects. Our results, however, show that this approach does not come close to capturing the variation accounted for by firm fixed effects.
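The logic of attributing variation to different sources can be illustrated with a deliberately simplified two-way decomposition. This sketch is not the specification behind Table 7, which is given in the online appendix and also includes industry-year and industry-size effects. It assumes a balanced panel and splits the total sum of squares into a firm component, a common year component and a residual that stands in for firm-specific time variation. All names and data are hypothetical.

```python
def two_way_shares(panel):
    """Shares of the total sum of squares due to firm means, year means
    and the firm-specific time-varying residual (balanced panel only)."""
    firms = list(panel)
    n_firms, n_years = len(firms), len(panel[firms[0]])
    grand = sum(sum(panel[f]) for f in firms) / (n_firms * n_years)
    firm_mean = {f: sum(panel[f]) / n_years for f in firms}
    year_mean = [sum(panel[f][t] for f in firms) / n_firms for t in range(n_years)]

    ss_total = sum((x - grand) ** 2 for f in firms for x in panel[f])
    ss_firm = n_years * sum((firm_mean[f] - grand) ** 2 for f in firms)
    ss_year = n_firms * sum((m - grand) ** 2 for m in year_mean)
    # Residual after removing firm and year means: firm-specific time variation.
    ss_resid = sum(
        (panel[f][t] - firm_mean[f] - year_mean[t] + grand) ** 2
        for f in firms
        for t in range(n_years)
    )
    return {
        "firm": ss_firm / ss_total,
        "year": ss_year / ss_total,
        "firm_specific_time": ss_resid / ss_total,
    }

# Hypothetical balanced panel (fractions of 1).
panel = {
    "A": [1.00, 1.00, 1.00, 1.00],
    "B": [0.60, 0.60, 0.50, 0.50],
    "C": [0.30, 0.32, 0.40, 0.41],
}
shares = two_way_shares(panel)
```

In a balanced panel the three shares add up to one, so a non-trivial `firm_specific_time` share signals variation that a within-firm estimator with year effects could exploit.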

Table 7 Analysis of variance

The persistence regressions in Table 8 follow Graham et al. (2020), where ownership and board independence are regressed on their initial level. The statistically significant and large coefficients confirm the stability we have documented in the previous analyses, as well as the fact that it differs across groups. For example, in Panel A we see that cash flow rights for Canadian non-family firms change by between 0.65 and 0.7 for a unit change in the initial value, while the corresponding figure for family firms is only between 0.44 and 0.47. The initial level coefficients for board independence in Panel B similarly show different degrees of stability across jurisdictions. The lower persistence for US listed firms is in line with the time-series standard deviation results in Table 4.
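The idea of a persistence coefficient can be sketched with a bivariate least-squares slope of a later value on the initial value. This is only an illustration: it omits any controls or fixed effects that the Table 8 specifications may include, and the data and the helper `ols_slope` are hypothetical.

```python
def ols_slope(x, y):
    """Slope of a bivariate least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx

# Hypothetical firms: initial ownership and ownership some years later.
initial = [1.0, 0.6, 0.3, 0.8]
later = [0.9, 0.6, 0.5, 0.7]
persistence = ols_slope(initial, later)
```

A slope near one would indicate that later levels track initial levels closely, while a slope nearer zero would indicate that the initial level says little about where a firm ends up.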

Table 8 Persistence regressions

4.5 Summary of empirical analysis

In the last part of this study we show that ownership and board independence exhibit different time variation patterns across subgroups defined by jurisdiction and type of control. In year averages we find that both variables vary little for most subgroups (Fig. 4), and the standard deviation over time is lower than that computed cross-sectionally (Table 3). The year-to-year analysis shows that the proportion of firms that change between any two consecutive calendar years is different across subgroups (Table 5), with dramatically higher year-to-year variation for board independence than ownership. On the other hand, the stable regimes are calendar time independent and indicate that a relatively shorter period is required for time variation in ownership to accrue for Canadian family firms and longer for non-family firms and Anglo-Saxon MNCs, as well as for board independence of Anglo-Saxon versus US firms (Table 6). We confirm these descriptive findings in multivariate analyses. In particular, the time-series standard deviation of ownership is higher for family firms and pyramid structures, but lower for Anglo-Saxon firms (Table 4). In terms of listed status, the time-series standard deviation of board independence of listed non-US firms is higher than that of non-listed ones. Variance decomposition and persistence regressions (Tables 7 and 8) similarly demonstrate the different patterns of time variation revealed in the descriptive findings. Therefore, we confirm that the time variation in ownership and board independence exhibits different patterns across subgroups of firms, with the subsamples with higher time variation corresponding to sufficient power in the simulation results.

5 Conclusion

Recent methodological and editorial guidance calls for improving causal inference and ruling out alternative explanations in all sub-disciplines of business research. This can be done either through a source of exogenous variation or through careful and exhaustive tests that convincingly support the baseline results. Very often these approaches rely on time variation. For example, existing studies where controlling ownership or board independence is an explanatory variable often find it impossible to employ within-firm estimators (that address time-invariant endogeneity) because of the lack of time variation. Based on theoretically and empirically grounded rules, we simulate artificial data reflecting the evident stickiness of such variables in existing work and analyze the power properties of within-firm estimators under different degrees of time variation. The results provide guidance on key elements of the empirical design of governance research. Responding to common challenges, such as difficulty of data collection, we derive relationships between lengths of time and gaps in data collection on the one hand and statistical power on the other. Our results are useful for a variety of research settings in any jurisdiction and can be employed as a benchmark to assess whether time variation is sufficient to detect a relationship if it exists.

When data needs to be hand collected or extensively pre-processed, a researcher may begin by examining the proportion of firms that change between two time points. Our simulations suggest that for a range of strengths of the theoretical relationship of interest, when sampling ownership data three years apart, a minimum of between 25 and 55% of firms in the data should change between two neighboring time points for sufficient statistical power. For board independence, sampling data two years apart requires between 15 and 50% of firms changing.
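A first-pass check of this kind could look as follows. The sketch assumes a balanced panel sampled every `gap` years and compares the share of changing firms with the 25% and 55% bounds quoted above for ownership data sampled three years apart. The function names, the verdict labels and the data are hypothetical, and the bounds are indicative only because the required share depends on the strength of the relationship of interest.

```python
LOWER, UPPER = 0.25, 0.55  # range quoted in the text for a three-year gap

def share_changing_with_gap(panel, gap, threshold=0.0):
    """Sample every `gap`-th year and return, for each pair of neighboring
    sampled points, the share of firms whose value changed by more than
    `threshold`."""
    n_years = len(next(iter(panel.values())))
    sampled = range(0, n_years, gap)
    shares = []
    for prev, curr in zip(sampled, sampled[1:]):
        changed = sum(abs(s[curr] - s[prev]) > threshold for s in panel.values())
        shares.append(changed / len(panel))
    return shares

def power_screen(share):
    """Rough verdict for one share of changing firms."""
    if share < LOWER:
        return "below range"
    if share < UPPER:
        return "depends on effect strength"
    return "above range"

# Hypothetical seven-year balanced panel (fractions of 1).
panel = {
    "A": [1.00] * 7,
    "B": [0.60, 0.60, 0.50, 0.50, 0.50, 0.50, 0.50],
    "C": [0.30, 0.30, 0.30, 0.30, 0.30, 0.40, 0.40],
    "D": [0.75] * 7,
}
shares = share_changing_with_gap(panel, gap=3)
verdicts = [power_screen(s) for s in shares]
```

Running the screen on a pilot subsample before full data collection would indicate whether a wider sampling gap or a longer series is likely to be needed.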

Even when a project uses an electronic data source, which does not present data collection challenges, our findings are useful in judging whether a finding of no significance is due to lack of power as opposed to ownership or board independence being unrelated to the outcome of interest.

We illustrate the simulation findings with quasi-replications of seminal studies of ownership (Lin et al. 2013) and board independence (Coles et al. 2008). In the case of ownership, a unique granular database (ICO) allows the exact computation of variables for consecutive years over a relatively longer time period and a large number of firms. For board independence, we use the most popular source with the widest coverage (BoardEx). Based on our replications, we confirm existing results that firms controlled by an Anglo-Saxon jurisdiction often have insufficient variation in ownership for statistical power purposes; however, non-Anglo-Saxon firms exhibit more time variation that allows the use of within-firm estimators. For board independence of US listed firms, a relationship is detectable only if the time series is sufficiently long or, for a shorter series with gaps, if the proportion of firms changing is sufficient. We further establish that there are indeed different degrees of time variation in governance variables across type of control, jurisdiction and complexity of ownership structure, which supports the usefulness of the simulation findings. When researchers use our findings for empirical design decisions and data collection strategies, they should condition these choices on the specific characteristics of their data as we do here.

Beyond methodological guidance addressing a particular problem with governance data, our findings open new avenues for future research. The fact that the amount of time variation in ownership and board independence differs across jurisdictions, listed status, family control, etc., invites a natural question about the drivers of this greater time variation (see footnote 28). At least three possible theoretical explanations can be explored. Greater decision-making flexibility for single/concentrated versus multiple/dispersed owners (Bhaumik et al. 2010) is consistent with more frequent adjustments in governance mechanisms in response to arising technological and competitive opportunities or regulatory and commodity pricing shocks (Graham et al. 2020). The joint pursuit of business and personal objectives by decision makers can manifest in governance changes associated with personal inheritance and tax planning (Carney et al. 2014; Tsoutsoura 2015) or reputational concerns (Belenzon et al. 2019; Deephouse and Jaskiewicz 2013). If certain types of controllers are more likely to expropriate outside shareholders, they may opportunistically adjust their own exposure to the cash flows generated by a firm they control in the expectation of future negative performance (Bertrand et al. 2002).

Our generalized simulation results can be used to revisit many of the research questions in the existing literature, where adding one more time observation or sampling further apart can allow time-variation-based methods to be employed. Further, by being able to exploit the time dimension in the data, many new research projects are more likely to become viable. In terms of new avenues of research where time variation in ownership can be useful, consider the adoption of peer-to-peer technology that has democratized access to startup and secondary equity financing. This recent phenomenon can inform the long-lasting debate on whether business groups or other forms of concentrated corporate control are compensating for institutional weakness or hampering innovation and growth (Khanna and Yafeh 2007; Morck 2005). Researchers can revisit this question by studying the effect of changes in ownership patterns in industries that are relatively less costly for new entrants (software services, call centers) in jurisdictions with different degrees of institutional weakness, on firm-level innovation, growth and value added. Similarly, time-varying ownership data can be used to test whether the advent of blockchain technology, and the unprecedented degree of transparency of business transactions (including in ownership stakes) it allows, reduce the incentives to maintain complex ownership structures (Yermack 2017).

In terms of board independence, we know from Graham et al. (2020) that it exhibits long-term dynamics for US listed firms. Given regulatory and institutional constraints within other jurisdictions and different incentives for private as opposed to listed firms, further theories of bargaining and/or dynamic contracting become testable in data with manageable time series lengths.