Review of Quantitative Finance and Accounting, Volume 36, Issue 4, pp 491–516

The advantages of using quarterly returns for long-term event studies

Authors

  • Ronald Bremer
    • Department of Information Systems and Quantitative Studies, Rawls College of Business, Texas Tech University
  • B. G. Buchanan
    • Department of Finance, Albers School of Business, Seattle University
  • Philip C. English II
    • Department of Finance and Real Estate, Kogod School of Business, American University
Original Research

DOI: 10.1007/s11156-010-0191-2

Cite this article as:
Bremer, R., Buchanan, B.G. & English, P.C. Rev Quant Finan Acc (2011) 36: 491. doi:10.1007/s11156-010-0191-2

Abstract

The main purpose of this paper is to explore the low power and methodological problems that continue to plague long-term event study research. We investigate long-term tests (up to 2 years) performed on non-overlapping quarterly time frames as a solution. Components of commonly employed characteristic-based matching processes are examined as the source of low power. Single “best” matching firms do not statistically match their event firms at the time of the event and are vastly inferior to matching with portfolios. A modified market mean method, which uses the securities continuously traded during the calendar event period, is shown to be well specified, to have comparable power and to avoid the costs of more complex matching methodologies. Contrary to popular perception, increased power derives from the decreased variance in comparison returns, not from an increased covariance between comparison firm returns and event firm returns. The tests are easy to implement, well-specified and have higher power when based on quarterly versus monthly data.

Keywords

Long-horizon performance studies; Matching characteristics

JEL Classification

C100, C150, G120, G140

1 Introduction

While short-term event studies are used to assess the reaction of stock market participants to new information over a period of days or months, long-term event studies are employed to assess the impact of changes in firm organization, financial structure or activities over a period of years (typically 1–5 years). Thus, short-term event studies, often seen as tests of market efficiency, focus on the reduction of asymmetric information about the nature of the firm or changes in the nature of the firm. Long-term event studies, by contrast, attempt to assess the impact of such events as changes in firm organization, financial structure or activities on the realized stock market performance of the firm’s shares or long-term performance of investment recommendations.1 In essence, short-term event studies focus on the impact of new information on the current expectation of future returns while long-term event studies focus on the ultimate effect of the changes transmitted in the information release on future returns.2

In order to assess whether the event does indeed result in a change in firm performance and therefore stock price performance, a comparison must be made between the firm’s performance after the event and what that performance would have been had there been no event. Current long-term event study techniques, as recommended by Ikenberry et al. (1995), Barber and Lyon (1997) and Lyon et al. (1999), select firms that have characteristics that are believed to be “similar” to the firms that are the subject of the event and then compare the stock market performance of the matched firms to that of the announcing firms. We refer to those firms that have made information announcements as event or treatment firms and those firms selected for comparison as match [matching] or control firms and to this category of techniques as characteristic-based matching. As pointed out by Chou et al. (2006, 2010), Dichev and Piotroski (2001), Ang and Zhang (2004) and Byoun (2004), there is a lack of consensus on the best approach to select the comparison firm returns and to ascertain whether the abnormal returns generated using a particular firm or set of firms are statistically detectable at generally accepted levels.3

In this paper, we propose and investigate tests for abnormal returns on quarterly non-overlapping intervals that can be used for longer periods (up to 2 years) alone or in conjunction with the current long-term tests. The tests are simple to perform, the test statistic structure is similar to that of commonly used long-term tests, they are well specified under fairly general conditions, they have reasonable power properties and are less sensitive to the biases noted for the traditional long term tests using monthly data.

We use a simulation-based empirical design to investigate several factors affecting the specification and power of the characteristic-based matching design applied to quarterly buy-and-hold abnormal returns. We demonstrate that the added complexity of implementing characteristic-based matching gains researchers little, as does using monthly returns instead of quarterly returns. By using repeated sample sizes of 200, we guarantee comparability across techniques and control for the sample size effect on the level of power. We then use the same levels of simulated abnormal return for each technique to investigate the power characteristics of the different matching criteria in order to determine which performs best.

We find appropriate Type I error rates for the proposed tests using any of the characteristic-based matching techniques but determine that the power of the test is higher for tests that (1) match using portfolios rather than individual firms; (2) employ matching criteria that produce a larger set of “similar” firms through the use of less restrictive matching conditions, and (3) use quarterly data rather than monthly data. We investigate the source of the increased power through inspection of the variance–covariance attributes of the differing techniques and conclude that the higher power of the portfolio techniques is attributable to the lower variation in control firm returns caused by diversification when using comparison portfolios. The reduction in abnormal return variation due to the portfolio effect more than outweighs the beneficial effects of any potential increase in the covariance of event-firm returns vis-à-vis comparison firm returns due to using a more “similar” single firm match.

Our results suggest that researchers wishing to achieve appropriate Type I error rates and simultaneously maximize the power of test statistics on buy-and-hold abnormal returns4 (BHAR) can do so by using a matching technique that compares event firm returns to the return on an equally weighted portfolio of all firms with returns for the event period and by employing quarterly data. We refer to this as the modified market mean return technique. This procedure is relatively costless in its construction and minimizes the variance of matching firm returns while the corresponding decrease in covariance is still large enough to have an effect on the power of the test.

The paper proceeds as follows: Sect. 2 details the argument for characteristic-based matching and current matching techniques used in long-term event studies and in conjunction with the proposed sequence of tests. Section 3 outlines the data and simulation methodology. Simulation results are discussed in Sect. 4. Section 5 concludes our findings.

2 Characteristic-based matching: theory and practice

2.1 Theory

Long-term event study techniques are widely employed to study a variety of topics in finance, and characteristic-based matching, initially investigated in Barber and Lyon (1997), has become the dominant technique employed by researchers in this area.5 The test statistic in such studies is:
$$ TS = \frac{\overline{R_{e} - R_{c}}}{\sqrt{\text{Var}\left(R_{e} - R_{c}\right)/n}} \tag{1} $$
where \(R_{e}\) is the return to the event firm, \(R_{c}\) is the control or matching firm [portfolio] return, the difference between the two is the abnormal return, \( \overline{R_{e} - R_{c}} \) is the average abnormal return over the \(n\) event firms and \(\text{Var}(R_{e} - R_{c})\) is the variance of the abnormal return estimated as the variance of the difference between event firm returns and matching firm returns whether matching with a single firm or a portfolio of firms. In long-term tests \(R_{e}\) and \(R_{c}\) (or the return for each firm composing the portfolio) are typically compounded over 1, 3 or 5 years using the formula
$$ \prod_{i} \left(1 + R_{ij}\right) - 1 \tag{2} $$
where the product is taken over all months \(i\) that compose the desired time frame and \(R_{ij}\) is the return for firm \(j\) at month \(i\) after the event.6

Conducting a long-term event study involves a host of decisions about how to estimate the returns in Eq. (1) and, therefore, the abnormal returns and the variance of abnormal returns. Chief amongst these decisions are: (1) the method for calculating the returns themselves; (2) the selection process to determine the control firm or firms whose returns will proxy for the event firm returns had there been no event; and (3) the time frame to inspect for evidence of a change in event firm returns.7 Barber and Lyon (1997) address the first decision by showing that the practice of using short-term returns and compounding them to gain long-term returns yields mis-specified test statistics. Consequently, they recommend a practice wherein buy-and-hold returns are used to estimate both the realized return to the event firms and the realized return to the control firm or set of firms. We follow that practice herein. The second decision is also investigated in Barber and Lyon (1997) and has been a primary focus of the long-term event-study literature since then. The selection of the appropriate control firm return affects both abnormal return estimation and the significance of tests used to assess the statistical significance of that estimation through the mechanism of estimated abnormal return variance. It is upon this selection that we focus in this paper.

The estimated variance of abnormal returns will be a function of three things: (1) the variance of the event firm returns (which is the same regardless of the process used to select control firms); (2) the variance of the matching firm returns and (3) the covariance between the event firm returns and matching firm or portfolio returns. Transforming Eq. (1), we have:
$$ TS = \frac{\overline{R_{e} - R_{c}}}{\sqrt{\left(\text{Var}\left(R_{e}\right) + \text{Var}\left(R_{c}\right) - 2\,\text{Cov}\left(R_{e}, R_{c}\right)\right)/n}} \tag{3} $$

The implicit premise of the matching techniques employed in characteristic-based matching, as in Barber and Lyon (1997), is that “better” matches result in an increased covariance which in turn results in a decreased variance for the abnormal return and a higher test statistic. While it is certainly true that an increase in the covariance term can result in a decrease in the variance, the selection of the appropriate matching firms also affects the abnormal return estimate directly through the differencing of the returns and through the portion of abnormal return variance attributable to variation in the control firm returns. We include an investigation of the differing roles that variation in control firm returns, covariance of control firm and event firm returns and the difference between event firm return and control firm return play as part of our exploration of the power properties of long-term event study test statistics applied to non-overlapping quarterly or monthly time frames.
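The three components in Eq. (3) can be inspected separately. The sketch below computes them from paired returns and, as an illustration of why a portfolio match can lower the control variance, evaluates the variance of an equally weighted portfolio under the simplifying assumption (not made in the text) that all m member firms share a common return variance and a common pairwise correlation. All names are hypothetical.

```python
from statistics import mean


def sample_cov(x, y):
    """Sample covariance with an n - 1 divisor."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def variance_components(event, control):
    """Split Var(Re - Rc) into the three terms of Eq. (3)."""
    var_e = sample_cov(event, event)
    var_c = sample_cov(control, control)
    cov_ec = sample_cov(event, control)
    return {
        "var_event": var_e,
        "var_control": var_c,
        "cov": cov_ec,
        "var_abnormal": var_e + var_c - 2.0 * cov_ec,
    }


def equal_weighted_portfolio_variance(sigma2, rho, m):
    """Variance of an equally weighted portfolio of m firms.

    Assumes every firm has variance sigma2 and every pair of firms has
    correlation rho; this is an illustrative simplification.
    """
    return sigma2 * (1.0 / m + (1.0 - 1.0 / m) * rho)
```

Under these assumptions the control variance falls from `sigma2` for a single firm (m = 1) toward `rho * sigma2` as m grows, which is the diversification effect the variance term in Eq. (3) responds to.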

The ability to correctly statistically detect the subsequent change in performance, known as the power of the test statistic,8 is generally believed to depend directly on the comparability of the match. Generally speaking, the power of the test statistic used in characteristic-based abnormal return estimation will be a function of three things: (1) the number of firms in the sample (2) the matching process itself; and (3) the non-centrality parameter (induced abnormal return). A test with low power is undesirable because it will lead the researcher to conclude that events are statistically insignificant when in fact they are significant. If a researcher is comparing well-specified tests, those with higher power are preferred. If two techniques have comparable levels of power and are both well-specified, researchers should use the one that requires the fewest assumptions, imposes the least cost in terms of computation and implementation, and has the strongest underlying statistical or economic foundation. Additionally, the specification and power of the tests over non-overlapping time frames is important if one is to detect a reliable timing pattern of abnormal returns and thus know not only that an abnormal return has occurred but when it has occurred.

2.2 Practice

Since the validation of characteristic-based matching by Barber and Lyon (1997), long-term event studies have appeared regularly in finance journals. Table 1 provides the frequency of long-term event studies using each of several abnormal return estimation techniques as published in the Journal of Finance, Journal of Financial Economics, Journal of Business, Review of Financial Studies and Journal of Financial and Quantitative Analysis. Across the 1997 to September of 2006 time period, ninety-nine articles were published containing long-run abnormal return estimation in the analysis.9 Characteristic-based matching techniques are those that match event firms with non-event firms based on event firm characteristics and then use the return of the matching firms or portfolio of firms as an estimate of the expected return for the event firms. Abnormal returns are then the mean difference in event firm returns and matching firm returns. Representative examples of these techniques are found in Prevost and Rao (2000), McConnell et al. (2001), Datta et al. (2001), Mitchell and Stafford (2000), Carpenter and Remmers (2001), and Clarke et al. (2004). The matching may occur only on characteristics at the time of the event, as in Barber and Lyon (1997), or may reoccur during the event period in the variant known as calendar time matching as in Mandelker (1974) and Jaffe (1974).
Table 1

The use and prevalence of long-run abnormal returns estimation

Technique                      1997  1998  1999  2000  2001  2002  2003  2004  2005  2006
Characteristic-based matching     2     5    10    11     8     7    10    11     5     2
Estimation-period methods         0     1     0     4     2     0     0     1     0     2
Both                              0     2     1     3     2     5     2     1     2     0
Total                             2     8    11    18    12    12    12    13     7     4

Table entries are the total number of articles appearing in the Journal of Finance, Journal of Financial Economics, Journal of Business, Review of Financial Studies and Journal of Financial and Quantitative Analysis employing the indicated long-run abnormal return estimation techniques. Characteristic-based matching techniques are those that calculate abnormal returns by comparing event firm event period returns to returns from a set of firms that have been matched on event firm characteristics. The techniques classified below as characteristic-based matching techniques are those developed by and attributed to Mandelker (1974), Jaffe (1974), Mitchell and Stafford (2000), Barber and Lyon (1997) and variations thereof (e.g. Lyon et al. 1999). Estimation-period methods are those techniques that utilize data from before the event to develop a model of expected returns after the event. The techniques classified below as estimation-period methods would include all those investigated by Kothari and Warner (1997) as developed and employed by numerous authors, most notably Brown and Warner (1980). Market model returns, market adjusted returns and mean adjusted returns are examples of estimation-period based techniques

Estimation-period based techniques include those techniques initially validated in Brown and Warner (1980) such as market adjusted returns and market model abnormal returns. Numerous researchers have modified and extended the initial estimation-period based techniques in the intervening 27 years. Kothari and Warner (1997) provide a detailed review and thoroughly investigate the properties of estimation-period based techniques for long-run abnormal return estimation.

It is readily apparent from Table 1 that the currently preferred techniques for estimating long-run abnormal returns are those that rely on characteristic-based matching. Of the 99 studies published during this time period, 71 employ one or more forms of characteristic-based matching as the sole technique for estimating abnormal returns. Only ten studies during this time period exclusively used estimation-period based abnormal returns. Much of the motivation for the more involved estimation techniques and modifications to the basic t-test methodology is linked with the consequences of compounding and/or accumulating abnormal returns over multiple time frames where the effect of the changes is then explored for a long time period (e.g. monthly compounded returns over a period of years). Our proposed tests are for non-overlapping short time frames (quarterly compounded returns over a period of years) so if the statistical properties of the basic t test and characteristic matching methodology can be verified, the simplicity of the method strongly supports this methodology.

The standard characteristic-based matching approach is predicated upon the idea that two or more firms that are “similar” at a given point in time should have similar performance in the future ceteris paribus. For example, when a firm undergoes a structural change, any resulting change in its stock performance vis-à-vis its matched counterpart is then attributed to the structural change or to superior performance.10

The current practice in long-term event studies is to look at the broad question of whether there is a significant amount of cumulative and/or compounded abnormal return at a relatively small number of time frames after the event.11 The most common time frames are 1, 3 and 5 years. Size and book-to-market are the overwhelmingly popular matching criteria. The most common modifications are the addition of industry matching criteria implemented using SIC classification, the inclusion of a fourth factor, lagged return, to capture momentum as suggested by Carhart (1997), or both. The timing of abnormal returns within the time frame chosen by the researcher (years, quarters, months, etc.) is not generally investigated. Reliably inferring this pattern from the results of the sequence of long-term tests currently employed is not possible since the tests are not independent and have no direct information with respect to this pattern. By accumulating abnormal returns over increasingly longer periods, the detection of abnormal returns becomes more difficult if the effects are transitory and the biases investigated by Barber and Lyon (1997) are introduced. We propose and investigate tests for abnormal returns on shorter (quarterly) non-overlapping intervals that can be used for longer time frames (years) in conjunction with the current long-term tests. The timing of abnormal returns can be directly inferred from this sequence of tests. By using the current long-term event study techniques, such as those suggested in Barber and Lyon (1997), the researcher eliminates the biases but at the expense of power and ability to determine the timing of abnormal returns within the time frame investigated.12

The characteristic-based matching technique advocated by Barber and Lyon (1997) and Lyon et al. (1999) is an iterative process. For the matching firm candidate(s), the researcher first selects one or more control firms in the same Standard Industrial Classification (SIC) code as that of the event firms. This set of potential matching firms is then matched to the event firms by the market value of event firm equity to incorporate the size effect and then by the book-to-market ratio to incorporate a long-term performance characteristic noted by Fama and French (1993).13 If a single matching firm is being employed, the return of the matching firm with the closest match in each of the above categories iteratively is used as the referent return to determine the extent of abnormal performance. Since it is unlikely that an exact matching firm will be found, single-firm industry matching may occur at the one-, two- or three-digit SIC level. For size and book-to-market matching, tolerance levels vary by study but generally range from within ±10 to ±30%.14 For example, Lyon et al. (1999) first identify firms with market value of equity between 70 and 130% of the market value of equity of the sample firm (a tolerance level of 30%). From that set of firms, they choose the firm with the book-to-market ratio closest to that of the sample firm. Frequently, this process is iterative in that the researcher first searches for matches within the same 4-digit SIC classification that meet the tolerance requirements. Failing to find a match in that classification, the SIC requirement is relaxed to 3-digit matching and a match is selected using the same tolerance levels for size, book-to-market or both. Failing to match at the 3-digit level is followed by inspection of the 2-digit SIC matches and so on. If there are no firms within the tolerance levels within the SIC matching classifications, the return of the firm with the closest matching characteristics is used as the control return. This is the process used by Prevost and Rao (2000) and McConnell et al. (2001). One desirable trait for any testing procedure is that it be simple and the preceding process is not. Accordingly, when implementing a single firm match technique, we determine the matching firm at the time of the event and the return on this date for this firm is used for \(R_{c}\) from Eq. (1) for all subsequent quarters or months.
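The size-then-book-to-market step described above for Lyon et al. (1999) might be sketched as follows. The dictionary keys (`mve` for market value of equity, `btm` for book-to-market ratio) and the function name are hypothetical, and the sketch leaves out the SIC screening and the fallback used when no firm lies inside the size band.

```python
def match_single_firm(event, candidates, size_tolerance=0.30):
    """Pick the candidate inside the size band with the closest book-to-market.

    With size_tolerance=0.30 the band runs from 70% to 130% of the event
    firm's market value of equity. Returns None when no candidate falls
    inside the band (the fallback step is not modeled here).
    """
    lower = event["mve"] * (1.0 - size_tolerance)
    upper = event["mve"] * (1.0 + size_tolerance)
    in_band = [c for c in candidates if lower <= c["mve"] <= upper]
    if not in_band:
        return None
    return min(in_band, key=lambda c: abs(c["btm"] - event["btm"]))


# Hypothetical firms for illustration only.
event_firm = {"id": "E", "mve": 100.0, "btm": 0.50}
pool = [
    {"id": "A", "mve": 80.0, "btm": 0.90},
    {"id": "B", "mve": 120.0, "btm": 0.55},
    {"id": "C", "mve": 200.0, "btm": 0.50},  # outside the 70-130% size band
]
best = match_single_firm(event_firm, pool)
```

Here firm C has the closest book-to-market ratio but is excluded by the size band, so the match falls to firm B.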

If a portfolio of matching firms is being employed for comparison purposes, then all non-event firms are generally sorted into portfolios (usually deciles) based on a characteristic believed to be part of the return generating process and event firm returns are then compared to the average return for the portfolio of which the event firm would have been a member. This may be done for multiple characteristics to create what is implicitly assumed to be a better match between event firm return and control portfolio return-generating processes. For example, all non-event firm returns may be sorted into size-based portfolios (generally deciles) then the size-based portfolios are further sorted into book-to-market based portfolios (usually quintiles) as in Mitchell and Stafford (2000). Different sorting levels (deciles, quintiles, quartiles) are employed in different studies as are different sorting criteria (see Carpenter and Remmers (2001) and Clarke et al. (2004)). Analogous to our implementation for a single firm match, when implementing a portfolio method, the portfolio of matching firms is found on the event quarter or month and the mean return for all firms in the portfolio for this quarter or month is used for \(R_{c}\) in Eq. (1) for all subsequent quarters or months.

The differing permutations of sorting techniques, tolerance levels and iterative selection criteria are legion with little direction in the literature as to which techniques are well-specified over various time periods and which have better levels of power.15 We investigate the characteristic-based matching process itself as the source of low power in event study test statistics on quarterly and monthly time frames. We first analyze the power of the test statistic while employing characteristic-based matches at the individual firm level and then at the portfolio level. By using common sample sizes, the same set of event firms for the tests and abnormal returns calibrated to make differing period comparisons valid, we provide insight into which techniques are well-specified and have relatively higher power in direct comparison to each other.

3 Simulation construction

We employ the Center for Research in Security Prices (CRSP) database for stock price performance data and Standard and Poor’s Compustat database for quarterly book value of equity. We use monthly and quarterly data from the fourth quarter of 1971 to the fourth quarter of 2003 for all NYSE/AMEX/NASDAQ firms as covered by CRSP and Compustat. In the first stage of the simulation we find, for each firm, the best matching firm or the portfolio of matching firms for each of the matching criteria studied. For a particular quarter, all firms with data for that quarter and the quarter before are used. Using the largest possible set of firms as possible best matches or portfolio members is consistent with prevailing practice regardless of the metric used to measure abnormal returns or the test statistic employed to determine statistical significance.

The second stage of the simulation constitutes the power study. In order to ensure that we can assess long-term performance, we eliminate firms from the eligible event sampling set if they do not have at least nine continuous quarters of data: the event quarter and eight quarters after the event quarter. While this biases the sample towards larger firms and introduces a small survivorship bias to the sampling space, we believe that the biases that result from this aspect of the technique are consistent with current practices. As a result of their failure to survive or lack of data availability, the firms eliminated are not likely to be included in a long-term event study but are typically included as matching candidates.16 Since we use all firms with nine quarters of data for the entire time period from 1972 to 2003, we have the population of firms eligible for an event study spanning an eight-quarter event window for those years.

Since the initial verification of short-term event study test statistics by Brown and Warner (1980, 1985), block resampled bootstrap analysis has been used to verify both the Type I error rates and power curves of alternative event study empirical designs. We employ a block resampled bootstrap process similar to the Brown and Warner (1980, 1985) process but using the characteristic-based matching firm methodology as in Barber and Lyon (1997) and Lyon et al. (1999) with additional characteristic-based matching techniques added to reflect alternative methods such as those recommended by Mitchell and Stafford (2000), Carpenter and Remmers (2001) and Clarke et al. (2004).17

In order to simulate a long-term event study, we randomly select event firms while ensuring that the firms selected have data available for nine quarters beginning with the randomly selected event date. We block resample 350 event firm samples of size 200. We replicate this process 10 times for a total of 3,500 samples comprised of 200 event firms each. We apply several permutations of the matching routines advocated by Barber and Lyon (1997) and Lyon et al. (1999) and others in order to determine the ex ante set of matching firms available for estimation of the power of the test.

The matching criteria we employ include matching by Standard Industrial Classification (SIC) code at different digit levels, matching on the size of the firm, as measured by the market value of equity, matching on the ratio of the book value of equity to the market value of equity and matching with a momentum factor, the previous period quarterly return. We investigate several of the most popular currently employed designs for assessing abnormal returns and some natural extensions selected to illustrate the source of increased power.18 The designs can be loosely classified as single-firm matching designs or portfolio-comparison designs.

In the single-firm matching design, we investigate among others the representative model employed in McConnell et al. (2001) and Prevost and Rao (2000).19 With this design, researchers first find all firms with the same 4-digit SIC code. The firm with the closest market value of equity then becomes the matching firm if that value is within ±25% of event firm market value. If there are no firms satisfying that criterion with the same 4-digit SIC, firms with the same 3-digit SIC are next examined and the closest match within ±25% of event firm market value of equity is selected. If there are no qualifying 3-digit SIC matches, the process is repeated iteratively for 2-digit, 1-digit and no-digit SIC matching. If there are still no matches, the firm with the closest market value of equity regardless of SIC classification is used. An equivalent matching design using book value of equity in the place of market value is also investigated. To isolate the effect of the SIC portion of the design, the designs which find the closest match (using either market value or book value of equity) under 4 digit, 3 digit, 2 digit, 1 digit, and no SIC digits matching are also investigated. The final single-firm matching design we investigate is that used by Datta et al. (2001).20 Here, the firm with the smallest total absolute percentage difference of the total market value of the firm, book value of equity and prior quarterly return is found and used as the matching firm.
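The SIC cascade described above can be sketched as a loop that relaxes the number of matching SIC digits from four down to zero and then falls back to the closest firm by size. The sketch assumes SIC codes are stored as 4-character strings and that firms are dictionaries with hypothetical keys `sic` and `mve`; neither detail comes from the paper.

```python
def sic_cascade_match(event, candidates, tolerance=0.25):
    """Return (matching firm, SIC digits matched) for one event firm.

    At each SIC level (4, 3, 2, 1, 0 leading digits) the closest firm by
    market value of equity is accepted if it lies within +/- tolerance of
    the event firm's market value. If no level yields a match, the closest
    firm by market value is returned with digits set to None.
    """
    def size_gap(firm):
        return abs(firm["mve"] - event["mve"])

    for digits in (4, 3, 2, 1, 0):
        prefix = event["sic"][:digits]
        qualifying = [
            c for c in candidates
            if c["sic"][:digits] == prefix
            and size_gap(c) <= tolerance * event["mve"]
        ]
        if qualifying:
            return min(qualifying, key=size_gap), digits

    # Fallback: closest market value regardless of SIC or tolerance.
    return min(candidates, key=size_gap), None


# Hypothetical firms for illustration only.
event_firm = {"id": "E", "sic": "3571", "mve": 100.0}
pool = [
    {"id": "A", "sic": "3571", "mve": 140.0},  # same 4-digit SIC, outside +/-25%
    {"id": "B", "sic": "3577", "mve": 110.0},  # same 3-digit SIC, inside +/-25%
    {"id": "C", "sic": "2000", "mve": 101.0},  # different industry
]
match, level = sic_cascade_match(event_firm, pool)
```

In this example the 4-digit level has no qualifying firm, so the search relaxes to three digits and selects firm B even though firm C is closer in size.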

In the portfolio-comparison design, we investigate a host of variations. The first designs look at firms within 10, 20 or 30% of either market value or book value of equity and with 4 digit, 3 digit, 2 digit, 1 digit or no SIC digit matching. Three sequential sets of steps (rules) are then considered based on these tolerance levels and the level of SIC digit matching. Let the triple term (A, B, C) represent the firms with A matching SIC digits, within B% of the market value and within C % of book value of equity. For example, (3, 20, 30) represents firms within 3 matching SIC digits, within 20% of the market value and within 30% of book value of equity. The first sequence rule (Rule 1) examines the groups defined by (4,10,10), then by (4,10,20), then (4,10,30), (4,20,10) … (0,30,30). The matching portfolio is the one resulting from the first rule in the sequence which results in 1 or more firms in the portfolio. Abnormal returns are then calculated as the difference between buy-and-hold returns to the event firms and buy-and-hold returns to the equally-weighted portfolio of the matching firms whose membership would have included the event firm.

The second sequence rule (Rule 2) looks at groups of firms defined by (4,10,10), (4,20,10), (4,30,10), (4,10,20) … (0,30,30). The third sequence rule (Rule 3) looks at groups of firms defined by (4,10,10), (4,20,20), (4,30,30), (3,10,10) … (0,30,30). To assess the effect of the portfolios with a larger number of firms, we include variations in which the comparison portfolio has a minimum number of matches gathered while going through the above process. For example, when the method contains the label “n > 59”, the matching portfolio is the one resulting from the first rule in the sequence which results in 60 or more firms in the portfolio. The difference between Rule 1 and Rule 2 lies in the order of the matching criteria. Where Rule 1 first matches within SIC classification by size then by book-to-market ratio, Rule 2 first matches by book-to-market ratio then by size. Rule 3 parallels Rule 1 but relaxes the tolerance levels iteratively by equal amounts (e.g. ±10% on each category simultaneously).
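The three sequence rules can be written out as ordered lists of (A, B, C) triples, with the matching portfolio taken from the first triple that yields enough firms. The sketch below follows the triple definition given above (A matching SIC digits, within B% of market value, within C% of book value of equity). The function names, the dictionary keys `sic`, `mve` and `bve`, and the storage of SIC codes as 4-character strings are hypothetical.

```python
SIC_DIGITS = (4, 3, 2, 1, 0)
TOLERANCES = (10, 20, 30)


def rule_sequence(rule):
    """Ordered (A, B, C) triples examined under Rule 1, 2 or 3."""
    if rule == 1:  # C (book value) relaxes fastest, then B (market value)
        return [(a, b, c) for a in SIC_DIGITS for b in TOLERANCES for c in TOLERANCES]
    if rule == 2:  # B (market value) relaxes fastest, then C (book value)
        return [(a, b, c) for a in SIC_DIGITS for c in TOLERANCES for b in TOLERANCES]
    if rule == 3:  # both tolerances relax together
        return [(a, t, t) for a in SIC_DIGITS for t in TOLERANCES]
    raise ValueError("rule must be 1, 2 or 3")


def first_portfolio(event, candidates, rule, min_firms=1):
    """Return (firms, triple) for the first triple with at least min_firms matches."""
    for digits, mv_tol, bv_tol in rule_sequence(rule):
        firms = [
            c for c in candidates
            if c["sic"][:digits] == event["sic"][:digits]
            and abs(c["mve"] - event["mve"]) <= mv_tol / 100 * event["mve"]
            and abs(c["bve"] - event["bve"]) <= bv_tol / 100 * event["bve"]
        ]
        if len(firms) >= min_firms:
            return firms, (digits, mv_tol, bv_tol)
    return [], None
```

Setting `min_firms=60` would correspond to the “n > 59” variants, in which the search continues down the sequence until the portfolio holds at least 60 firms.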

A second set of portfolio-comparison designs, of which Mitchell and Stafford (2000),21 Carpenter and Remmers (2001)22 and Clarke et al. (2004)23 are representative, is also investigated. Following Mitchell and Stafford (2000), we construct comparison portfolios by sorting firms first into deciles on size, then those deciles into quintiles based on book-to-market. Abnormal returns are then calculated as the difference between buy-and-hold returns to the event firms and buy-and-hold returns to the equally-weighted portfolio of the matching firms whose membership would have included the event firm. Likewise, we follow Carpenter and Remmers (2001) by performing the same operation using size (deciles) and prior period return (quintiles) in place of book-to-market value. Properties of various extensions of these designs are investigated wherein at the first stage the groups are formed by breaking the market value distribution into either 10, 20 or 25 equal groups and in the second stage each of these groups is further broken into 5, 10 or 20 additional groups based on either book-to-market value or prior period return. We also follow Clarke et al. (2004) by performing the same process with size (deciles) then book-to-market (quintiles) then prior period return (quintiles) to create the reference portfolios. Finally, we use a modified market mean of all firms with return data available for the entire 9 quarters as the comparison return.
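A sequential two-way sort of the kind described above (size deciles, then book-to-market quintiles within each decile) might be sketched as follows. The rank-based splitting into near-equal groups, the function names and the keys `id`, `mve` and `btm` are assumptions of this sketch; the paper does not state how breakpoints or ties are handled.

```python
def rank_bins(items, key, n_bins):
    """Split items into n_bins groups of near-equal size by ascending key."""
    ordered = sorted(items, key=key)
    n = len(ordered)
    return [ordered[i * n // n_bins:(i + 1) * n // n_bins] for i in range(n_bins)]


def sequential_sort(firms, n_size=10, n_second=5, second_key="btm"):
    """Map each firm id to its (size group, second-characteristic group) cell."""
    cells = {}
    size_groups = rank_bins(firms, lambda f: f["mve"], n_size)
    for s, size_group in enumerate(size_groups):
        second_groups = rank_bins(size_group, lambda f: f[second_key], n_second)
        for b, cell in enumerate(second_groups):
            for firm in cell:
                cells[firm["id"]] = (s, b)
    return cells


# Hypothetical universe of 100 firms for illustration only.
universe = [
    {"id": i, "mve": float(i), "btm": float((i * 7) % 10)}
    for i in range(100)
]
cell_of = sequential_sort(universe)
```

The control return for an event firm would then be the equally weighted mean return of the firms sharing its cell. Passing `second_key` as a prior-period-return field would give the size and momentum variant, and other values of `n_size` and `n_second` give the finer sorts mentioned above.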

Finally, following Savickas (2003), we induce an event on the monthly return with a random draw from a uniform distribution on an interval equal to the indicated effect size ±0.25%. The quarterly BHAR is calculated from the 3 monthly returns with the simulated effect added when appropriate. This ensures that the mean effect is equal to the indicated effect size for that time period and incorporates variance around that mean to more accurately represent the market reaction to an event and the evolution of performance changes across time. The test statistic employed is the difference between the average buy-and-hold return on the event firms and the average buy-and-hold return on the characteristic-based matching firm or equal-weighted portfolio of firms. The power curve for the characteristic-based matching routines is then determined by identifying the ratio of instances in which the two-tailed test statistic correctly identifies instances of induced abnormal return to the total number of tests for differing levels of induced abnormal return.
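The simulation steps in this paragraph can be sketched as follows. The sketch assumes the shock is added to each monthly return as a separate uniform draw and that `effect` is expressed per month; the text does not spell out how a quarterly effect size is split across months, so that mapping is left to the caller. All function names are illustrative.

```python
import random


def quarterly_bhr(monthly_returns):
    """Compound monthly simple returns into one buy-and-hold return."""
    total = 1.0
    for r in monthly_returns:
        total *= 1.0 + r
    return total - 1.0


def induce_effect(monthly_returns, effect, rng, half_width=0.0025):
    """Add a uniform draw from [effect - half_width, effect + half_width] to each month."""
    return [r + rng.uniform(effect - half_width, effect + half_width)
            for r in monthly_returns]


def rejection_rate(p_values, alpha=0.05):
    """Share of simulated samples whose two-tailed p-value is below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)
```

With no induced effect, `rejection_rate` estimates the Type I error rate; with a nonzero effect it estimates one point on the power curve.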

4 Results

Characteristic-based matching routines are typically employed where the control group is composed of either a single matching firm or a portfolio of matching firms. Table 2 summarizes the power estimates for 10 different matching criteria over the eight quarters after the event for an event study sample size of 200 (approximately the average sample size for long-term event studies in the literature).24 Panel A contains the results when there is no induced abnormal return (the Type I error rate). Consistent with prior studies, the characteristic-based matching routines generally have appropriate Type I error rates (an error rate of ~0.05 when alpha = 0.05). This is true whether the comparison return is that of a single matching firm or a portfolio of matching firms. Panels B through F present the summary evidence on these techniques for induced abnormal returns varying from 1 to 9% quarterly. Similar to the findings of Barber and Lyon (1996) and Lyon et al. (1999), the power levels are quite low for all the matching methods for all levels of abnormal return examined, reaching generally accepted levels when there is an induced 5% quarterly abnormal return for the entire eight quarters in those cases in which portfolio matching is employed.25
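Table 2

Type I error rates and power estimates

Columns 1 to 8 are quarters after the event; Mean, Median and SD are summary statistics across the eight quarters; Ref. is the technique indicator defined in the table notes.

Panel A: Type I error rate

| Technique | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Mean | Median | SD | Ref. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Single firm techniques | | | | | | | | | | | | |
| Seq. TV ±25% | 0.044 | 0.045 | 0.049 | 0.048 | 0.049 | 0.051 | 0.043 | 0.047 | 0.047 | 0.048 | 0.003 | 1 |
| Seq. B/M ±25% | 0.053 | 0.047 | 0.047 | 0.048 | 0.049 | 0.052 | 0.043 | 0.045 | 0.048 | 0.048 | 0.003 | |
| Min. Abs. Diff. | 0.048 | 0.048 | 0.048 | 0.050 | 0.047 | 0.051 | 0.051 | 0.043 | 0.048 | 0.048 | 0.003 | 2 |
| Portfolio comparison techniques | | | | | | | | | | | | |
| Rule 1 | 0.045 | 0.047 | 0.051 | 0.046 | 0.047 | 0.051 | 0.051 | 0.049 | 0.048 | 0.048 | 0.002 | |
| Rule 2 | 0.048 | 0.050 | 0.049 | 0.048 | 0.054 | 0.043 | 0.047 | 0.053 | 0.049 | 0.049 | 0.003 | |
| Rule 3 | 0.045 | 0.042 | 0.045 | 0.048 | 0.044 | 0.041 | 0.043 | 0.043 | 0.044 | 0.044 | 0.002 | |
| TV 10, B/M 5 | 0.046 | 0.053 | 0.039 | 0.051 | 0.051 | 0.044 | 0.050 | 0.035 | 0.046 | 0.048 | 0.006 | 3 |
| TV 10, Ret 5 | 0.047 | 0.049 | 0.047 | 0.049 | 0.050 | 0.052 | 0.049 | 0.046 | 0.049 | 0.049 | 0.002 | 4 |
| Ret 20, B/M 20, TV 10 | 0.053 | 0.042 | 0.047 | 0.047 | 0.052 | 0.044 | 0.049 | 0.047 | 0.048 | 0.047 | 0.004 | 5 |
| Mod. market mean | 0.047 | 0.049 | 0.049 | 0.045 | 0.055 | 0.051 | 0.047 | 0.045 | 0.049 | 0.048 | 0.003 | |

Panel B: Power, effect size 1%

| Technique | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Mean | Median | SD | Ref. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Single firm techniques | | | | | | | | | | | | |
| Seq. TV ±25% | 0.061 | 0.070 | 0.068 | 0.066 | 0.065 | 0.056 | 0.058 | 0.064 | 0.064 | 0.065 | 0.005 | 1 |
| Seq. B/M ±25% | 0.069 | 0.067 | 0.067 | 0.068 | 0.060 | 0.065 | 0.057 | 0.060 | 0.064 | 0.066 | 0.004 | |
| Min. Abs. Diff. | 0.055 | 0.064 | 0.061 | 0.059 | 0.053 | 0.059 | 0.059 | 0.051 | 0.058 | 0.059 | 0.004 | 2 |
| Portfolio comparison techniques | | | | | | | | | | | | |
| Rule 1 | 0.057 | 0.057 | 0.066 | 0.053 | 0.054 | 0.061 | 0.061 | 0.063 | 0.059 | 0.059 | 0.005 | |
| Rule 2 | 0.065 | 0.062 | 0.068 | 0.057 | 0.066 | 0.056 | 0.057 | 0.065 | 0.062 | 0.064 | 0.005 | |
| Rule 3 | 0.062 | 0.064 | 0.059 | 0.065 | 0.055 | 0.057 | 0.054 | 0.064 | 0.060 | 0.061 | 0.004 | |
| TV 10, B/M 5 | 0.063 | 0.073 | 0.069 | 0.069 | 0.065 | 0.057 | 0.062 | 0.067 | 0.066 | 0.066 | 0.005 | 3 |
| TV 10, Ret 5 | 0.061 | 0.070 | 0.074 | 0.064 | 0.060 | 0.061 | 0.055 | 0.070 | 0.064 | 0.063 | 0.006 | 4 |
| Ret 20, B/M 20, TV 10 | 0.063 | 0.059 | 0.062 | 0.058 | 0.066 | 0.066 | 0.061 | 0.066 | 0.063 | 0.063 | 0.003 | 5 |
| Mod. market mean | 0.074 | 0.077 | 0.077 | 0.078 | 0.073 | 0.084 | 0.077 | 0.077 | 0.077 | 0.077 | 0.003 | |

Panel C: Power, effect size 3%

| Technique | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Mean | Median | SD | Ref. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Single firm techniques | | | | | | | | | | | | |
| Seq. B/M ±25% | 0.168 | 0.177 | 0.181 | 0.171 | 0.149 | 0.175 | 0.157 | 0.164 | 0.168 | 0.170 | 0.011 | |
| Min. Abs. Diff. | 0.143 | 0.159 | 0.153 | 0.147 | 0.136 | 0.149 | 0.145 | 0.144 | 0.147 | 0.146 | 0.007 | 2 |
| Portfolio comparison techniques | | | | | | | | | | | | |
| Rule 1 | 0.155 | 0.168 | 0.183 | 0.165 | 0.154 | 0.161 | 0.154 | 0.167 | 0.163 | 0.163 | 0.010 | |
| Rule 2 | 0.166 | 0.169 | 0.178 | 0.168 | 0.165 | 0.165 | 0.161 | 0.174 | 0.168 | 0.167 | 0.005 | |
| Rule 3 | 0.167 | 0.177 | 0.186 | 0.174 | 0.160 | 0.171 | 0.161 | 0.178 | 0.172 | 0.173 | 0.009 | |
| TV 10, B/M 5 | 0.249 | 0.257 | 0.257 | 0.247 | 0.226 | 0.231 | 0.237 | 0.255 | 0.245 | 0.248 | 0.012 | 3 |
| TV 10, Ret 5 | 0.265 | 0.264 | 0.273 | 0.256 | 0.241 | 0.254 | 0.241 | 0.260 | 0.257 | 0.258 | 0.011 | 4 |
| Ret 20, B/M 20, TV 10 | 0.251 | 0.242 | 0.256 | 0.234 | 0.215 | 0.233 | 0.233 | 0.256 | 0.240 | 0.238 | 0.014 | 5 |
| Mod. market mean | 0.282 | 0.294 | 0.287 | 0.276 | 0.261 | 0.282 | 0.285 | 0.293 | 0.283 | 0.284 | 0.011 | |

Panel D: Power, effect size 5%

| Technique | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Mean | Median | SD | Ref. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Single firm techniques | | | | | | | | | | | | |
| Seq. TV ±25% | 0.359 | 0.367 | 0.372 | 0.364 | 0.340 | 0.343 | 0.349 | 0.377 | 0.359 | 0.362 | 0.014 | 1 |
| Seq. B/M ±25% | 0.360 | 0.369 | 0.368 | 0.358 | 0.347 | 0.354 | 0.345 | 0.364 | 0.358 | 0.359 | 0.009 | |
| Min. Abs. Diff. | 0.322 | 0.344 | 0.330 | 0.322 | 0.308 | 0.331 | 0.325 | 0.331 | 0.327 | 0.328 | 0.010 | 2 |
| Portfolio comparison techniques | | | | | | | | | | | | |
| Rule 1 | 0.356 | 0.377 | 0.383 | 0.362 | 0.355 | 0.361 | 0.353 | 0.378 | 0.366 | 0.362 | 0.012 | |
| Rule 2 | 0.363 | 0.380 | 0.392 | 0.375 | 0.357 | 0.383 | 0.373 | 0.397 | 0.378 | 0.378 | 0.014 | |
| Rule 3 | 0.369 | 0.392 | 0.392 | 0.386 | 0.362 | 0.380 | 0.367 | 0.400 | 0.381 | 0.383 | 0.014 | |
| TV 10, B/M 5 | 0.581 | 0.611 | 0.602 | 0.599 | 0.567 | 0.586 | 0.583 | 0.626 | 0.594 | 0.593 | 0.019 | 3 |
| TV 10, Ret 5 | 0.617 | 0.627 | 0.629 | 0.615 | 0.592 | 0.608 | 0.608 | 0.642 | 0.617 | 0.616 | 0.015 | 4 |
| Ret 20, B/M 20, TV 10 | 0.570 | 0.580 | 0.592 | 0.573 | 0.546 | 0.561 | 0.545 | 0.601 | 0.571 | 0.572 | 0.020 | 5 |
| Mod. market mean | 0.614 | 0.623 | 0.628 | 0.619 | 0.606 | 0.618 | 0.623 | 0.639 | 0.621 | 0.621 | 0.010 | |

Panel E: Power, effect size 7%

| Technique | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Mean | Median | SD | Ref. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Single firm techniques | | | | | | | | | | | | |
| Seq. TV ±25% | 0.591 | 0.591 | 0.598 | 0.602 | 0.570 | 0.573 | 0.580 | 0.607 | 0.589 | 0.591 | 0.014 | 1 |
| Seq. B/M ±25% | 0.585 | 0.592 | 0.594 | 0.594 | 0.573 | 0.585 | 0.586 | 0.594 | 0.588 | 0.589 | 0.007 | |
| Min. Abs. Diff. | 0.555 | 0.563 | 0.557 | 0.566 | 0.534 | 0.559 | 0.535 | 0.568 | 0.555 | 0.558 | 0.013 | 2 |
| Portfolio comparison techniques | | | | | | | | | | | | |
| Rule 1 | 0.607 | 0.623 | 0.615 | 0.597 | 0.593 | 0.605 | 0.595 | 0.622 | 0.607 | 0.606 | 0.012 | |
| Rule 2 | 0.601 | 0.614 | 0.623 | 0.619 | 0.600 | 0.622 | 0.621 | 0.634 | 0.617 | 0.620 | 0.011 | |
| Rule 3 | 0.627 | 0.635 | 0.627 | 0.623 | 0.621 | 0.629 | 0.634 | 0.652 | 0.631 | 0.628 | 0.010 | |
| TV 10, B/M 5 | 0.875 | 0.885 | 0.877 | 0.871 | 0.854 | 0.879 | 0.864 | 0.899 | 0.876 | 0.876 | 0.013 | 3 |
| TV 10, Ret 5 | 0.887 | 0.894 | 0.890 | 0.897 | 0.880 | 0.891 | 0.886 | 0.909 | 0.892 | 0.891 | 0.009 | 4 |
| Ret 20, B/M 20, TV 10 | 0.852 | 0.864 | 0.863 | 0.861 | 0.836 | 0.855 | 0.851 | 0.865 | 0.856 | 0.858 | 0.010 | 5 |
| Mod. market mean | 0.878 | 0.884 | 0.888 | 0.891 | 0.867 | 0.882 | 0.882 | 0.895 | 0.883 | 0.883 | 0.009 | |

Panel F: Power, effect size 9%

| Technique | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Mean | Median | SD | Ref. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Single firm techniques | | | | | | | | | | | | |
| Seq. TV ±25% | 0.785 | 0.788 | 0.777 | 0.784 | 0.763 | 0.776 | 0.777 | 0.788 | 0.780 | 0.781 | 0.008 | 1 |
| Seq. B/M ±25% | 0.774 | 0.783 | 0.787 | 0.783 | 0.769 | 0.776 | 0.776 | 0.792 | 0.780 | 0.780 | 0.008 | |
| Min. Abs. Diff. | 0.764 | 0.761 | 0.759 | 0.767 | 0.731 | 0.752 | 0.736 | 0.761 | 0.754 | 0.760 | 0.013 | 2 |
| Portfolio comparison techniques | | | | | | | | | | | | |
| Rule 1 | 0.807 | 0.815 | 0.815 | 0.794 | 0.799 | 0.802 | 0.793 | 0.826 | 0.806 | 0.805 | 0.012 | |
| Rule 2 | 0.809 | 0.806 | 0.818 | 0.804 | 0.799 | 0.809 | 0.807 | 0.828 | 0.810 | 0.808 | 0.009 | |
| Rule 3 | 0.823 | 0.832 | 0.822 | 0.831 | 0.815 | 0.819 | 0.828 | 0.841 | 0.826 | 0.826 | 0.008 | |
| TV 10, B/M 5 | 0.979 | 0.983 | 0.983 | 0.981 | 0.977 | 0.984 | 0.973 | 0.988 | 0.981 | 0.982 | 0.005 | 3 |
| TV 10, Ret 5 | 0.982 | 0.982 | 0.987 | 0.983 | 0.984 | 0.985 | 0.982 | 0.986 | 0.984 | 0.984 | 0.002 | 4 |
| Ret 20, B/M 20, TV 10 | 0.973 | 0.977 | 0.976 | 0.975 | 0.969 | 0.977 | 0.975 | 0.979 | 0.975 | 0.976 | 0.003 | 5 |
| Mod. market mean | 0.978 | 0.981 | 0.981 | 0.981 | 0.978 | 0.978 | 0.980 | 0.985 | 0.980 | 0.981 | 0.002 | |
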

Table entries are Type I error rates (Panel A) and power estimates for varying levels of induced abnormal return (Panels B–F). Abnormal returns were induced following Savickas (2003). Power is defined as a test’s ability to detect abnormal performance when it is present and is measured as the percentage of incidences in which the test statistic is significant at the alpha = 5% level for a two-tailed test. Boldface type in Panels B–E indicates the highest level of power. The final column entries indicate the following commonly used techniques: (1) Prevost and Rao (2000), McConnell et al. (2001); (2) Datta et al. (2001); (3) Mitchell and Stafford (2000); (4) Carpenter and Remmers (2001); and (5) Clarke et al. (2004)

Single firm techniques are presented in the first section wherein the comparison firm is the firm satisfying the matching criteria as specified in the first column. Seq. indicates a technique in which the SIC classification iterates from matching at the highest level (4-digit match) to the lowest level (0-digit match) if no firm matches under the specified tolerance levels as in Prevost and Rao (2000) and McConnell et al. (2001). Minimum absolute difference follows Datta et al. (2001) and is the closest firm with the lowest summed minimum absolute value difference between total value, book to market ratio and prior period return

The second section presents portfolio techniques where the matches are selected using the following rules

Rule 1: Using all firms with a 4-digit SIC match, select the firms within ±10% of total market value. From this set of firms, select the firms with book-to-market value ratio within ±10% of the event firm book-to-market value. If there are no firms in this set, expand the book-to-market requirement to ±20% then to ±30%. If there are no firms satisfying these criteria, expand the total market value requirement to ±20% and redo the book-to-market value requirements in the preceding manner. If there are no firms within ±20% of total value, select the firms within ±30% of total value and redo the book-to-market matching as previously specified. If no firms are found using this process in the 3-digit SIC set of firms, use the 2-digit SIC matches and repeat the process. If there are still no firms satisfying the criteria, employ 1-digit and 0-digit matching in iteration using the preceding process at each of the SIC matching levels

Rule 2: Same as Rule 1 with initial matching on closest book-to-market value ratio and subsequent matching on total value

Rule 3: Same as Rule 1 but tolerance levels are increased symmetrically (e.g. ±10% of total value and ±10% of book-to-market value followed by ±20% for each then 30% for each)

Finally, the last section includes portfolio based comparison techniques in which matching portfolios are constructed by sorting the potential matching firms in order of the criteria contained in the first three columns. Table entries in these columns are the number of sorted portfolios. Those first portfolios are then sorted on the next criteria and that number is reported in the table entry. For example, the first row indicates that the returns of all stocks were first sorted into ten portfolios based on total value and then the total value portfolios were then sorted into twenty portfolios based on the members’ book to market value ratio for a total of 200 referent portfolios. The abnormal return for each event stock is the difference between its realized return and the mean return for the portfolio of which it would have been a member. The final portfolio technique uses a modified market mean where the market is composed of all stocks having returns for the entire event period

In all cases, techniques employing matching portfolios of firms have higher power levels than single firm control samples selected using any matching criteria. At low levels of induced abnormal return (<3%, Panel B), the raw difference between power levels is not high but at higher levels of induced abnormal return (3% and greater, Panels C through F) that difference becomes marked. At lower levels of induced abnormal return, the modified market mean model outperforms all other techniques though the difference with respect to the other portfolio techniques is not marked. At a higher level of induced abnormal return, power is slightly higher with the portfolio sorting techniques advocated by Mitchell and Stafford (2000) and Carpenter and Remmers (2001) though no technique is clearly dominant.

In order to ascertain the source of the increase in power, we investigate several possibilities. The underlying foundation of single firm characteristic-based matching is the belief that the reduction in abnormal return variance is primarily due to an increase in the covariance of the return between the matching firm and the event firm. If this covariance is sufficiently large (the firms are sufficiently “similar” in a correlation sense), then the variance of the abnormal return will be markedly reduced and increased power will result. Table 3 presents a summary of the power statistics for the single firm characteristic-based matching techniques in Table 2 and several others. If single firm matching performs as generally believed then the covariance of firms that are “closer” matches should be higher and therefore the power should be higher as well absent any bias in estimated abnormal return. Panel A presents specification and power statistics for the basic sequential methods as well as for methods allowing a gradual relaxation of the SIC requirement used to capture industry level effects. As is readily apparent, all of the single firm techniques are well-specified and the power levels for the differing single firm techniques are all similar to each other. The covariance between the matching firm returns and the event firm returns is almost identical for all the alternative specifications. There is little evidence of any effect from industry matching and, where there is an effect, it does not follow the expected pattern. In the case in which the match is determined as the closest total value match within SIC specification, the pattern is the opposite of what would be expected. The modified market mean is included for comparison purposes and it is readily apparent from the columns indicating relative power that the modified market mean dominates the single firm techniques by at least 16.9% and as much as 36.8% on average.
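The argument in this paragraph rests on the standard identity Var(event − comparison) = Var(event) + Var(comparison) − 2 Cov(event, comparison). Because the event firm variance is the same whichever comparison is used, only the last two terms differ across techniques. The sketch below evaluates that part using the Var and Cov entries reported in Table 3 for two rows; the function name is illustrative.

```python
def comparison_contribution(var_comparison, cov_with_event):
    """Part of Var(event - comparison) that depends on the comparison return."""
    return var_comparison - 2.0 * cov_with_event


# Var and Cov values as reported in Table 3, Panel A.
single_firm = comparison_contribution(0.111, 0.028)   # Seq. TV ±25% row
market_mean = comparison_contribution(0.015, 0.014)   # Mod. market mean row

print(round(single_firm, 3), round(market_mean, 3))
```

On these figures the single-firm match adds to the abnormal return variance, while the modified market mean slightly reduces it, even though its covariance with event firm returns is the lower of the two.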
Table 3

Single firm techniques

Panel A: Summary of power statistics

Columns Type I through 9% are means across the eight quarters; %, Raw, Var and Cov form the comparison of average differences with the modified market mean (MM); Ref. is the technique indicator defined in Table 2.

| Single firm tech. | Type I | 1% | 3% | 5% | 7% | 9% | % | Raw | Var | Cov | Ref. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Seq. TV ±25% | 0.047 | 0.064 | 0.164 | 0.359 | 0.589 | 0.780 | −31.0 | −0.178 | 0.111 | 0.028 | 1 |
| SIC 0 closest TV | 0.052 | 0.069 | 0.173 | 0.369 | 0.598 | 0.788 | −28.3 | −0.169 | 0.111 | 0.020 | |
| SIC 1 closest TV | 0.047 | 0.064 | 0.164 | 0.355 | 0.583 | 0.775 | −31.3 | −0.181 | 0.110 | 0.023 | |
| SIC 2 closest TV | 0.048 | 0.064 | 0.163 | 0.351 | 0.575 | 0.768 | −31.9 | −0.185 | 0.111 | 0.025 | |
| SIC 3 closest TV | 0.047 | 0.059 | 0.145 | 0.317 | 0.541 | 0.741 | −36.8 | −0.208 | 0.116 | 0.027 | |
| SIC 4 closest TV | 0.050 | 0.065 | 0.155 | 0.328 | 0.543 | 0.736 | −34.3 | −0.203 | 0.111 | 0.028 | |
| Seq. B/M ±25% | 0.048 | 0.064 | 0.168 | 0.358 | 0.588 | 0.780 | −30.7 | −0.177 | 0.103 | 0.029 | |
| SIC 0 closest B/M | 0.051 | 0.072 | 0.181 | 0.371 | 0.597 | 0.782 | −27.1 | −0.168 | 0.112 | 0.023 | |
| SIC 1 closest B/M | 0.049 | 0.069 | 0.177 | 0.365 | 0.587 | 0.781 | −28.6 | −0.173 | 0.114 | 0.026 | |
| SIC 2 closest B/M | 0.050 | 0.070 | 0.180 | 0.378 | 0.612 | 0.798 | −26.8 | −0.161 | 0.114 | 0.028 | |
| SIC 3 closest B/M | 0.051 | 0.072 | 0.210 | 0.462 | 0.726 | 0.893 | −16.9 | −0.096 | 0.089 | 0.021 | |
| SIC 4 closest B/M | 0.052 | 0.071 | 0.182 | 0.377 | 0.605 | 0.790 | −26.7 | −0.164 | 0.103 | 0.029 | |
| Min. abs. diff. | 0.048 | 0.058 | 0.147 | 0.327 | 0.555 | 0.754 | −36.1 | −0.201 | 0.112 | 0.031 | 2 |
| Mod. market mean | 0.049 | 0.077 | 0.283 | 0.621 | 0.883 | 0.980 | NA | NA | 0.015 | 0.014 | |

Panel B: Selected biases (average bias in event period return)

| Size decile | Seq. TV ±25% | Seq. B/M ±25% | Min. abs. diff. | Mod. market mean |
|---|---|---|---|---|
| 1 | 0.022*** | −0.006*** | 0.050*** | −0.052*** |
| 2 | 0.010*** | 0.008*** | 0.046*** | −0.012*** |
| 3 | 0.001 | 0.009*** | 0.040*** | −0.001 |
| 4 | −0.001 | 0.009*** | 0.047*** | 0.005*** |
| 5 | 0.000 | 0.013*** | 0.046*** | 0.015*** |
| 6 | −0.003 | 0.007*** | 0.047*** | 0.018*** |
| 7 | −0.002 | 0.006*** | 0.055*** | 0.021*** |
| 8 | −0.001 | −0.001 | 0.053*** | 0.023*** |
| 9 | −0.003*** | −0.009*** | 0.048*** | 0.018*** |
| 10 | 0.004*** | −0.008*** | 0.043*** | 0.019*** |
| Average | 0.003 | 0.003 | 0.048 | 0.005 |


Table entries in Panel A are average Type I error rates and power levels for the 8 quarters succeeding the simulated event, the percentage difference between the average for each indicated technique and the average power level for the modified market mean technique, the raw difference between the average power level for each technique and the average level for the market model, the variance of the comparison firm returns with the event firm returns (Var) and the covariance of comparison firm returns with event firm returns (Cov). The final column indicators for prevalent techniques are as defined in Table 2. Panel B contains the average difference (bias) between the event firm returns and the comparison firm returns during the event period segmented by size decile and the average of those averages across the size deciles. The average difference is calculated as the average event firm return minus the comparison firm return for each event-firm comparison-firm pair in each sample, then averaged across the entire event period for each sample and then across the samples. Asterisks in the adjacent column indicate statistical significance for a t-test of equality of the average difference to zero
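The two comparison columns can be checked from the reported means. The sketch below assumes that "%" is the average, across the five induced effect sizes, of the per-level percentage difference from the modified market mean, and that "Raw" is the average of the per-level raw differences. This reading is inferred from the reported numbers rather than stated in the notes, and the function name is illustrative.

```python
def relative_power(technique, benchmark):
    """Average percentage and raw power differences across effect sizes."""
    pct = [(t / b - 1.0) * 100.0 for t, b in zip(technique, benchmark)]
    raw = [t - b for t, b in zip(technique, benchmark)]
    return sum(pct) / len(pct), sum(raw) / len(raw)


# Mean power at 1%, 3%, 5%, 7% and 9% as reported in Table 3, Panel A.
MOD_MARKET_MEAN = [0.077, 0.283, 0.621, 0.883, 0.980]
SEQ_TV = [0.064, 0.164, 0.359, 0.589, 0.780]

pct, raw = relative_power(SEQ_TV, MOD_MARKET_MEAN)
print(round(pct, 1), round(raw, 3))
```

Under that assumption the Seq. TV ±25% row comes out at roughly −31.0 and −0.178, in line with the table entries.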

While the denominator of the test statistic is influenced by the variance–covariance structure of event firm and matching firm returns, the numerator is affected by any bias induced by using the returns of one as an estimate of the return of the other. Panel B presents the average bias in the returns of matching and event firms segmented by size decile for several selected techniques. The smallest biases occur with the sequential value matching techniques but all the methods contain some amount of bias. In conclusion, for single firm matching techniques, the matching criteria appear to make little difference in the power of the test statistic since the variance of matching firm returns, covariance terms and biases are similar regardless of the technique chosen. The dominance of the modified market mean technique over these techniques appears to arise from the low variance of the matching portfolio returns rather than from an increased covariance. This suggests that using a portfolio return as the comparison return should improve the power of the test.

Table 4 contains specification and power characteristics for three sequential selection rules and three different minimum portfolio sizes. By using selection rules identical or similar to those presented in the prior table, we can control for the “quality” of the match while simultaneously incorporating the potential portfolio effect suggested by the evidence in Table 3. In effect, Rule 1 is the rule used in Prevost and Rao (2000) but the matching return is the average return to all firms satisfying the tolerance limit rules instead of just the firm whose characteristics are closest to those of the event firm. With the addition of portfolio size requirements (n > 19 and n > 59), the rules (1 through 3) are applied as previously mentioned but are iterated until the minimum portfolio size indicated is achieved. Thus, the first section contains results wherein the process iterates under each rule until the criteria are satisfied regardless of the number of firms satisfying the criteria. The average portfolio sizes under the three rules are 1.85, 1.82 and 3.52 firms, respectively. The next section, labeled n > 19, does the same but the process continues until the matching portfolio has at least 20 firms in it. The average portfolio sizes under the three rules are 28.25, 29.73 and 37.28 firms, respectively. Finally, the section labeled n > 59 iterates until the matching portfolio contains at least 60 firms. The average portfolio sizes under the three rules are 84.57, 84.91 and 90.33 firms, respectively. By performing the analysis with common rules but differing portfolio sizes, we can isolate the effect of using portfolios as a basis for comparison from the effects of the selection criteria themselves. While all the methods presented are well-specified, the power results here reinforce the results of the prior table. The increase in power caused by using larger portfolios with the same rules is clearly evident, as is the source of that increase. Higher power is achieved primarily through the decrease in comparison metric variance and not as a result of increased covariance between the matching firm returns and event firm returns. It is still the case that the modified market mean outperforms the other techniques but that dominance is minimal for the largest comparison portfolios, which have similar matching firm variance and covariance.26
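The iteration with a minimum portfolio size can be sketched as below. The sketch assumes each firm is a dictionary with hypothetical keys `sic` (a four-character string), `tv` and `bm`, that tolerances are symmetric percentage bands around the event firm's values, and that `sequence` is the ordered list of (SIC digits, total value tolerance, book-to-market tolerance) steps for the chosen rule.

```python
def first_portfolio(event, candidates, sequence, min_size=1):
    """Return the first step of the sequence, and its matches, with at least min_size firms."""
    for sic_digits, tv_tol, bm_tol in sequence:
        matches = [
            c for c in candidates
            if c["sic"][:sic_digits] == event["sic"][:sic_digits]      # industry match
            and abs(c["tv"] / event["tv"] - 1.0) <= tv_tol / 100.0     # size band
            and abs(c["bm"] / event["bm"] - 1.0) <= bm_tol / 100.0     # book-to-market band
        ]
        if len(matches) >= min_size:
            return (sic_digits, tv_tol, bm_tol), matches
    return None, []   # no step in the sequence produced enough firms
```

Setting `min_size=20` or `min_size=60` corresponds to the n > 19 and n > 59 variants; the comparison return is then the average return of the firms in `matches`.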
Table 4

Sequential selection techniques

Panel A: Summary of power statistics

Columns Type I through 9% are means across the eight quarters; %, Raw, Var and Cov form the comparison of average differences with the modified market mean (MMM).

| Sequential techniques | Type I | 1% | 3% | 5% | 7% | 9% | % | Raw | Var | Cov |
|---|---|---|---|---|---|---|---|---|---|---|
| Rule 1 | 0.048 | 0.059 | 0.163 | 0.366 | 0.607 | 0.806 | −31.2 | −0.169 | 0.087 | 0.032 |
| Rule 2 | 0.049 | 0.062 | 0.168 | 0.378 | 0.617 | 0.810 | −29.3 | −0.162 | 0.085 | 0.032 |
| Rule 3 | 0.044 | 0.060 | 0.172 | 0.381 | 0.631 | 0.826 | −28.8 | −0.155 | 0.082 | 0.032 |
| Rule 1, n > 19 | 0.050 | 0.061 | 0.210 | 0.512 | 0.810 | 0.958 | −12.1 | −0.049 | 0.025 | 0.022 |
| Rule 2, n > 19 | 0.047 | 0.061 | 0.214 | 0.519 | 0.814 | 0.957 | −12.6 | −0.047 | 0.025 | 0.022 |
| Rule 3, n > 19 | 0.049 | 0.062 | 0.212 | 0.521 | 0.813 | 0.959 | −11.8 | −0.046 | 0.025 | 0.022 |
| Rule 1, n > 59 | 0.051 | 0.078 | 0.268 | 0.603 | 0.871 | 0.976 | −1.7 | −0.010 | 0.018 | 0.017 |
| Rule 2, n > 59 | 0.049 | 0.078 | 0.274 | 0.606 | 0.875 | 0.977 | −1.1 | −0.007 | 0.018 | 0.017 |
| Rule 3, n > 59 | 0.050 | 0.074 | 0.267 | 0.598 | 0.870 | 0.976 | −3.0 | −0.012 | 0.018 | 0.017 |
| Mod. market mean | 0.049 | 0.077 | 0.283 | 0.621 | 0.883 | 0.980 | NA | NA | 0.015 | 0.014 |


Table entries are average Type I error rates and power levels for the 8 quarters succeeding the simulated event, the percentage difference between the average for each indicated technique and the average power level for the modified market mean technique, the raw difference between the average power level for each technique and the average level for the market model, the variance of the comparison firm returns with the event firm returns (Var) and the covariance of comparison firm returns with event firm returns (Cov). The final column indicators for prevalent techniques are as defined in Table 2

Table 5 concludes the detailed examination of matching strategies by presenting the Type I error rates, power level characteristics and biases for the portfolio sorting based techniques. Once again, all the variations are well-specified and have similar levels of power. While the modified market mean still outperforms them, the difference is small in an absolute sense (raw differences of 0.029 or less) and less than 9.8% in a relative sense. With a broad array of different sorting techniques used to arrive at comparison portfolios, the variance and covariance of the matching firm returns are similar across all the techniques, suggesting that no single sorting technique is better than any other. The highest average power levels occur with the modified market mean model, which is also the least costly technique to implement. As anticipated, these power results appear to arise from the very low comparison firm variance, which is more than enough to offset the reduced covariance between event firm returns and control portfolio returns.27
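Table 5

Sorted portfolio techniques

Panel A: Summary of power statistics

Columns Type I through 9% are means across the eight quarters; %, Raw, Var and Cov form the comparison of average differences with the modified market mean (MMM); Ref. is the technique indicator defined in Table 2. Sort labels list the criteria in the order TV, B/M, Ret with the table entry for each criterion, except for the final sorted row, which is ordered Ret, B/M, TV.

| Portfolio techniques | Type I | 1% | 3% | 5% | 7% | 9% | % | Raw | Var | Cov | Ref. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TV 10, B/M 20 | 0.051 | 0.066 | 0.235 | 0.565 | 0.857 | 0.974 | −8.8 | −0.029 | 0.027 | 0.027 | |
| TV 5, B/M 20 | 0.049 | 0.059 | 0.231 | 0.567 | 0.863 | 0.977 | −10.6 | −0.029 | 0.025 | 0.025 | |
| TV 4, B/M 20 | 0.051 | 0.062 | 0.231 | 0.571 | 0.861 | 0.978 | −9.7 | −0.028 | 0.024 | 0.024 | |
| TV 10, B/M 10 | 0.050 | 0.061 | 0.234 | 0.572 | 0.859 | 0.975 | −9.8 | −0.029 | 0.025 | 0.025 | |
| TV 5, B/M 10 | 0.050 | 0.063 | 0.237 | 0.574 | 0.865 | 0.976 | −8.9 | −0.026 | 0.024 | 0.024 | |
| TV 4, B/M 10 | 0.050 | 0.066 | 0.243 | 0.586 | 0.870 | 0.978 | −7.1 | −0.020 | 0.023 | 0.023 | |
| TV 10, B/M 5 | 0.046 | 0.066 | 0.245 | 0.594 | 0.876 | 0.981 | −6.6 | −0.016 | 0.024 | 0.024 | 3 |
| TV 5, B/M 5 | 0.047 | 0.065 | 0.249 | 0.596 | 0.878 | 0.980 | −6.4 | −0.015 | 0.023 | 0.023 | |
| TV 4, B/M 5 | 0.050 | 0.066 | 0.249 | 0.599 | 0.879 | 0.981 | −6.0 | −0.014 | 0.023 | 0.022 | |
| TV 10, Ret 20 | 0.050 | 0.063 | 0.237 | 0.583 | 0.869 | 0.978 | −8.5 | −0.023 | 0.024 | 0.024 | |
| TV 5, Ret 20 | 0.052 | 0.062 | 0.236 | 0.585 | 0.877 | 0.980 | −8.5 | −0.021 | 0.022 | 0.022 | |
| TV 4, Ret 20 | 0.051 | 0.063 | 0.242 | 0.594 | 0.881 | 0.983 | −7.4 | −0.016 | 0.021 | 0.021 | |
| TV 10, Ret 10 | 0.049 | 0.064 | 0.245 | 0.600 | 0.882 | 0.983 | −6.7 | −0.014 | 0.022 | 0.022 | |
| TV 5, Ret 10 | 0.049 | 0.065 | 0.247 | 0.601 | 0.886 | 0.983 | −6.2 | −0.012 | 0.021 | 0.020 | |
| TV 4, Ret 10 | 0.050 | 0.064 | 0.242 | 0.598 | 0.886 | 0.984 | −6.9 | −0.014 | 0.020 | 0.020 | |
| TV 10, Ret 5 | 0.049 | 0.064 | 0.257 | 0.617 | 0.892 | 0.984 | −5.1 | −0.006 | 0.021 | 0.020 | 4 |
| TV 5, Ret 5 | 0.052 | 0.067 | 0.256 | 0.611 | 0.889 | 0.984 | −4.6 | −0.007 | 0.020 | 0.020 | |
| TV 4, Ret 5 | 0.048 | 0.062 | 0.251 | 0.604 | 0.889 | 0.984 | −6.5 | −0.011 | 0.020 | 0.019 | |
| Ret 20, B/M 20, TV 10 | 0.048 | 0.063 | 0.240 | 0.571 | 0.856 | 0.975 | −9.0 | −0.028 | 0.029 | 0.029 | 5 |
| Mod. market mean | 0.049 | 0.077 | 0.283 | 0.621 | 0.883 | 0.980 | NA | NA | 0.015 | 0.014 | |

Panel B: Selected biases (average bias)

| Size decile | Seq. TV ±25% | Seq. B/M ±25% | Ret 20, B/M 20, TV 10 | Mod. market mean |
|---|---|---|---|---|
| 1 | 0.029*** | 0.032*** | 0.017*** | −0.052*** |
| 2 | 0.011*** | 0.011*** | 0.007*** | −0.012*** |
| 3 | 0.005*** | 0.004*** | 0.003 | −0.001 |
| 4 | 0.000 | 0.000 | −0.001 | 0.005*** |
| 5 | 0.000 | 0.000 | −0.004* | 0.015*** |
| 6 | −0.002 | −0.003** | −0.005*** | 0.018*** |
| 7 | −0.004*** | −0.004*** | −0.005*** | 0.021*** |
| 8 | −0.003*** | −0.003*** | −0.004*** | 0.023*** |
| 9 | −0.003*** | −0.003*** | −0.004*** | 0.018*** |
| 10 | −0.002*** | −0.002*** | −0.003*** | 0.019*** |
| Average | 0.003 | 0.003 | 0.000 | 0.005 |
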

Table entries are average Type I error rates and power levels for the 8 quarters succeeding the simulated event, the percentage difference between the average for each indicated technique and the average power level for the modified market mean technique, the raw difference between the average power level for each technique and the average level for the market model, the variance of the comparison firm returns with the event firm returns (Var) and the covariance of comparison firm returns with event firm returns (Cov). The final column indicators for prevalent techniques are as defined in Table 2

The selection of comparison firms in portfolio-based techniques also results in a bias between event firms and their comparison portfolio counterparts (see Panel B). The bias is generally smaller than the bias for the single firm matching techniques but is still statistically detectable in most size deciles. The portfolio techniques result in biases wherein smaller event firms have a positive bias (smallest size decile = 1) and larger firms have a negative bias (largest size decile = 10).

By examining the degree of statistical similarity between firms and their matched counterparts, we can determine if the matching-based routines provide matches that are statistically similar at the time of formation. If the event firms and matching firms are not statistically similar, it is an error to classify them as being the same. We refer to this as a classification error.28 By classifying event firm and matching firms as being the same if the matching firms satisfy arbitrary tolerance bands, measurement error is introduced into the process of determining abnormal future performance. To the extent that firms that are classified as having the same characteristics are in fact different, their return generating processes would also be expected to differ. In summary, the evidence in Tables 3 and 5, Panel B of both, indicates that characteristic-based matching methodologies that rely on a single firm or a portfolio of firms as the comparison firm have as their foundation matching and event firms that are not statistically similar in many respects including prior period stock return. This bias is worse for single-firm techniques. This suggests that single firm matching contains a greater classification error and is potentially inferior to portfolio matching.29

The foundation of characteristic-based matching lies in the argument that the matched firms and the event firms are “similar” at the time of the event (Kothari and Warner 2005) with respect to the matching characteristics. Differences in stock price performance post event are then argued to be differences caused by the event. The power of the test used to detect these differences is fundamentally related to the variance of the forecast error, which is in turn related to the variance of the forecasted return and the covariance of that return with the return to the event firms. Figure 1 summarizes the evidence in Tables 3, 4 and 5 by presenting average power for the portfolio sorting techniques, the single firm techniques and the modified market mean technique. Portfolio techniques are well-specified and dominate single-firm techniques at all levels. The evidence suggests that the effect on power of a reduction in the variance of the comparison metric returns gained by using portfolio returns rather than single firm returns more than offsets the decline in covariance (Fig. 2). This evidence indicates that researchers wishing to achieve appropriate Type I error rates and higher power should use portfolio based comparison techniques, the simplest of which is the modified market mean technique, where the comparison portfolio is the equally-weighted portfolio of firms with returns for the entire event period.
https://static-content.springer.com/image/art%3A10.1007%2Fs11156-010-0191-2/MediaObjects/11156_2010_191_Fig1_HTML.gif
Fig. 1

Average power. This figure summarizes the evidence in Tables 3, 4 and 5 by presenting average power for the portfolio sorting techniques, the single firm techniques and the modified market mean technique. The three lines represent single firm, sorted portfolio techniques and the modified market mean technique. Induced abnormal return is reported on a quarterly basis. Power is measured by identifying the ratio of instances in which the two-tailed test statistic correctly identifies instances of induced abnormal return to the total number of tests for differing levels of induced abnormal return

https://static-content.springer.com/image/art%3A10.1007%2Fs11156-010-0191-2/MediaObjects/11156_2010_191_Fig2_HTML.gif
Fig. 2

Variance-covariance contributions. This figure presents the average contribution of matching firm variance, the average covariance of event firm returns with matching firm returns and the average combination of the two to the denominator of the test statistic for each indicated family of techniques (single firm, sorted portfolio and modified market mean). Since event firm variance is constant for any given sample, the bars represent the relative increase (decrease) in abnormal return variance and consequently the decrease (increase) in the value of the test statistic employed under each technique
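The modified market mean comparison described before the figures, an equal-weighted mean over firms with returns for the entire event period, can be sketched as follows. The sketch assumes returns are held in a dictionary keyed by a hypothetical firm identifier, with `None` marking a missing period; that representation is a choice made here for illustration.

```python
from statistics import mean


def modified_market_mean(returns_by_firm, n_periods):
    """Equal-weighted mean return per period over firms with a complete return history."""
    complete = [
        r for r in returns_by_firm.values()
        if len(r) == n_periods and all(x is not None for x in r)
    ]
    # Firms with any missing period are dropped from every period, not just the gap.
    return [mean(r[t] for r in complete) for t in range(n_periods)]
```

The abnormal return for an event firm in a given quarter is then its own return minus the corresponding element of this list.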

In order to validate our tests and to provide a method for discerning the timing of event effects within a long time frame, we perform a sample long-term event study using event dates extracted from Thomson Financial Corporation’s SDC Global New Issues database for public new issues of straight debt and privately placed preferred stock.30 Examples representing both categories of events from 1982 to 2005 are presented to illustrate the proposed sequence of tests. The announcements are for straight debt issued in public markets (13,316 announcements) and preferred stock that is privately placed (84 announcements). For each example, a firm is considered an event firm, with quarter 0 being the quarter in which the earliest announcement date associated with an event occurs. Event firms are included only if the firm has return data for the next eight quarters.
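The sample construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the input layout (a list of announcement tuples and a dictionary of quarterly returns), the running quarter index, and the function names `quarter_index` and `build_event_sample` are all assumptions made for the example.

```python
from datetime import date


def quarter_index(d: date) -> int:
    """Map a calendar date to a running quarter count (hypothetical convention)."""
    return d.year * 4 + (d.month - 1) // 3


def build_event_sample(announcements, returns, horizon=8):
    """Assign quarter 0 to each event and keep firms with a full post-event record.

    announcements: iterable of (event_id, firm_id, announcement_date)
    returns:       dict firm_id -> {quarter_index: quarterly return}
    Returns a dict event_id -> (firm_id, quarter 0 index).
    """
    # Quarter 0 is the quarter of the earliest announcement date for the event.
    earliest = {}
    for event_id, firm_id, d in announcements:
        if event_id not in earliest or d < earliest[event_id][1]:
            earliest[event_id] = (firm_id, d)

    sample = {}
    for event_id, (firm_id, d) in earliest.items():
        q0 = quarter_index(d)
        firm_returns = returns.get(firm_id, {})
        # Require return data for each of the next `horizon` quarters.
        if all(q in firm_returns for q in range(q0 + 1, q0 + horizon + 1)):
            sample[event_id] = (firm_id, q0)
    return sample
```

Because the quarterly tests are independent, the filter on a complete eight-quarter record is a choice made for comparability with prior studies rather than a requirement of the method (see footnote 16).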

Table 6 summarizes, for the event quarter and the eight quarters after the event, the mean abnormal return and the p-value for the two-tailed test of nonzero abnormal returns. Three matching techniques are used: analogs of the single-firm matching of McConnell et al. (2001) and Prevost and Rao (2000), the portfolio method of Mitchell and Stafford (2000), and the modified market mean technique suggested by the power analysis. The abnormal return estimates and the p-values for the usual long-term tests based on these three methods are given for the 1-year and 2-year time frames and for the second year (quarters 5–8). Several things are of immediate note. First, sample size has the anticipated effect: larger sample sizes lead to lower cutoff levels. Second, there are very different, readily discernible patterns in the abnormal returns over the succeeding eight quarters. For straight debt public issues, the abnormal returns are positive and statistically significant for the entire eight subsequent quarters. The conclusions of the long-term tests are the same for all three methods, indicating significant positive abnormal returns for the first year, for the second year and for the 2-year interval.
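Table 6

Event study examples

Columns Q0 to Q8 report the quarters after the event; columns 1–4, 5–8 and 1–8 report the long term tests. The final column gives the technique number shown on the right hand side of the table.

Panel A: Public straight debt issue

| Technique | Stat. | Q0 | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | 1–4 | 5–8 | 1–8 | No. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Single firm tech.: Seq. TV ±25% | MAR | 0.024 | 0.019 | 0.012 | 0.013 | 0.011 | 0.010 | 0.010 | 0.010 | 0.010 | 0.220 | 0.086 | 1.750 | 1 |
| | p-Val | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.002 | 0.002 | 0.001 | 0.000 | 0.000 | 0.000 | |
| Portfolio tech.: 10 TV, 5 B/M | MAR | 0.028 | 0.023 | 0.016 | 0.017 | 0.015 | 0.014 | 0.014 | 0.014 | 0.014 | 0.210 | 0.091 | 1.749 | 3 |
| | p-Val | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
| Portfolio tech.: Mod. mkt. mean | MAR | 0.021 | 0.016 | 0.009 | 0.01 | 0.008 | 0.007 | 0.007 | 0.007 | 0.007 | 0.267 | 0.137 | 1.91 | |
| | p-Val | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.002 | 0.002 | 0.003 | 0.001 | 0.000 | 0.000 | 0.000 | |

Panel B: Private placement preferred stock

| Technique | Stat. | Q0 | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | 1–4 | 5–8 | 1–8 | No. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Single firm tech.: Seq. TV ±25% | MAR | 0.029 | −0.031 | −0.022 | −0.022 | −0.077 | −0.032 | −0.079 | −0.050 | −0.066 | 0.041 | −0.108 | −0.020 | 1 |
| | p-Val | 0.538 | 0.404 | 0.595 | 0.543 | 0.032 | 0.466 | 0.029 | 0.195 | 0.078 | 0.429 | 0.011 | 0.748 | |
| Portfolio tech.: 10 TV, 5 B/M | MAR | 0.052 | −0.007 | 0.001 | 0.002 | −0.053 | −0.009 | −0.056 | −0.027 | 0.043 | −0.041 | −0.176 | −0.240 | 3 |
| | p-Val | 0.212 | 0.789 | 0.965 | 0.945 | 0.058 | 0.791 | 0.059 | 0.353 | 0.152 | 0.361 | 0.002 | 0.041 | |
| Portfolio tech.: Mod. mkt. mean | MAR | 0.042 | −0.017 | −0.009 | −0.008 | −0.063 | −0.019 | −0.066 | −0.037 | −0.053 | 0.004 | −0.151 | −0.093 | |
| | p-Val | 0.198 | 0.507 | 0.778 | 0.739 | 0.010 | 0.549 | 0.013 | 0.135 | 0.040 | 0.928 | 0.001 | 0.085 | |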

Table entries indicate the average abnormal return (MAR) for the event quarter and each quarter after the event, and the p-value of the test statistic, for the single firm technique of Prevost and Rao (2000), the portfolio method of Mitchell and Stafford (2000) and the modified market mean approach. The corresponding long term test is also performed for 1 year after the event (Quarters 1–4), the second year after the event (Quarters 5–8) and the 2 years after the event (Quarters 1–8). Panel A contains the results for 13,316 straight debt issues obtained from Thomson Financial Securities Data Corp New Issues Database and panel B contains the results for 84 private placement preferred stock issues from the same source
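The quarterly entries for the modified market mean approach can be illustrated with a short sketch. It is an illustration under stated assumptions, not the authors' implementation: the data layout and function names are hypothetical, the event firm is not removed from the comparison portfolio, and the p-value uses a large-sample normal approximation because the exact test statistic is not restated in this section.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def modified_market_mean(returns, window):
    """Equally weighted return per quarter over firms with data for the whole window.

    returns: dict firm_id -> {quarter_index: quarterly return}
    window:  iterable of quarter indices covering the event period
    """
    window = list(window)
    eligible = [r for r in returns.values() if all(q in r for q in window)]
    return {q: mean(r[q] for r in eligible) for q in window}


def abnormal_returns(events, returns, horizon=8):
    """Collect abnormal returns by event-time quarter k = 0..horizon.

    events: list of (firm_id, quarter 0 index); each event firm is assumed
    to have returns for its full window.
    """
    by_quarter = {k: [] for k in range(horizon + 1)}
    for firm_id, q0 in events:
        window = range(q0, q0 + horizon + 1)
        benchmark = modified_market_mean(returns, window)
        for k, q in enumerate(window):
            by_quarter[k].append(returns[firm_id][q] - benchmark[q])
    return by_quarter


def two_tailed_test(ars):
    """Mean abnormal return and two-tailed p-value (normal approximation)."""
    mar = mean(ars)
    z = mar / (stdev(ars) / sqrt(len(ars)))
    return mar, 2.0 * (1.0 - NormalDist().cdf(abs(z)))
```

Calling `two_tailed_test` on each list returned by `abnormal_returns` yields one (MAR, p-value) pair per quarter, mirroring the layout of the Q0 to Q8 columns. No sorting or matching step is needed, which is the source of the low implementation cost noted in the text.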

For private placements of preferred stock, the initial quarter sees a positive abnormal return, albeit not statistically significant. Significant negative abnormal returns are detected at quarters 4, 6 and 8 after the event. None of the long-term tests find a significant abnormal return in the first year, but they all detect a significant negative abnormal return in the second year. Only the portfolio technique detects a significant abnormal return over the 2-year time frame. A conventional long-term study with a 2-year horizon would examine only the 1-year and 2-year time frames, not the second year in isolation. Such a study would conclude that there are no significant abnormal returns in the first year and, under the portfolio technique only, significant negative abnormal returns over the 2-year time frame. The individual quarter tests sharpen the picture of when the negative abnormal returns occurred.

In both cases, the quarterly returns methodology employing the modified market mean approach performs similarly to the more commonly employed techniques, but at significantly lower cost than the more complex techniques. Researchers can avoid the biases introduced by compounding daily or monthly returns in long-term event studies, and the subjectivity of matching routines, by using quarterly returns and the suggested modified market mean approach.

5 Conclusion

Tests performed on non-overlapping quarterly time frames for periods up to 2 years are investigated as a way to enhance the information in long-term event studies. The proposed tests are simple to implement, are well-specified and have good power characteristics. We document that the single firm characteristic-based matching techniques have low power. We also observe that a portfolio of all stocks with data for the entire event period serves as well for comparison as more complicated selection strategies. We conclude that characteristic-based matching results in low power because of a lack of statistical comparability between event firms and matching firms, which arises from the lack of statistical similarity at the point of matching and across time. Our results suggest that popular methods are well-specified when performed on quarterly time intervals, even out to 2 years. Those methods that use portfolio returns for comparison have higher levels of power than those that use single firms. The results also suggest that the cost of using sorting techniques can be avoided by using the portfolio of all stocks with returns for the event period as the comparison portfolio (the modified market mean). This technique has appropriate Type I error rates, high levels of power, and low implementation costs. We conclude from this evidence that long-term return analysis is best performed using the stocks of all firms with quarterly returns for the event period as the comparison portfolio.

Footnotes
1

Long-term event studies may also be used to test market efficiency, particularly in those instances where the relationship between long-term stock price performance and short-term announcement effects is investigated.

 
2

Palmon et al. (2009) trace the earliest event study work back to Dolley (1933).

 
3

Additionally, the tests can have specification problems due to skewness in the distribution of abnormal returns and biases due to new listing, survivorship, overlapping-horizons and portfolio rebalancing. Most of these problems are caused by or amplified by the accumulating and/or compounding of the abnormal returns over long time frames. A number of modifications to the t test based on a characteristic based matching methodology have been proposed in the literature. Kothari and Warner (2005) present a comprehensive summary of these methods.

 
4

The buy-and-hold abnormal return (BHAR) approach is another name for the characteristic-based matching method. The BHAR approach is described by Mitchell and Stafford (2000) as “the average multiyear return from a strategy of investing in all firms that complete an event and selling at the end of a prespecified holding period versus a comparable strategy using otherwise similar nonevent firms”.

 
5

We review and discuss the prevalence of these techniques as applied in Table 1 and in the following section.

 
6

We report only the results using quarterly data. Monthly tests have consistently lower power for a fixed level of induced abnormal return. Monthly results are available upon request.

 
7

It is of interest to note that there is an epistemological issue that is largely ignored in the design of long-term event studies. In a short term event study focusing on the information impact of firm news, a relatively short time period, often a single day, is generally believed to be sufficient to assess if the news did indeed have an impact. In long-term event studies, there has been no work to the best of our knowledge to determine an appropriate time period for long-term abnormal return assessment. The selection of an appropriate time frame is left up to the investigator with long-term abnormal return estimation regularly reported as anything from a single year out to 5 years.

 
8

Power is a test’s ability to detect abnormal performance when it is present and is measured as one minus the probability of a type II error.

 
9

The Journal of Business published 12, the Review of Financial Studies published seven, the Journal of Financial and Quantitative Analysis published 17, the Journal of Financial Economics published 36 and the Journal of Finance published 27.

 
10

Fama (1998) argues for a calendar time approach and against the characteristic-based matching approach because of the systematic errors that arise when imperfect expected return proxies are compounded over long horizons. Mitchell and Stafford (2000), in their study of the long-term impact of mergers, seasoned equity offerings and share repurchases, claim that measuring long-term abnormal performance with mean BHARs in conjunction with bootstrapping is not an adequate methodology because it assumes independence of multiyear event-firm abnormal returns. In contrast, Loughran and Ritter (2000) argue that the calendar time portfolio approach has low power to detect abnormal performance because it averages over months of “hot” and “cold” event activity.

 
11

To the best of our knowledge, existing research provides no guidance as to the appropriate length of time over which to examine for long-term abnormal returns nor does theoretical work typically specify the appropriate length of time. This is in stark contrast to short-term event studies which propose short time frames for the incorporation of new information as a result of market efficiency.

 
12

The problem is similar to that of investigating autocorrelation in time series. Overall tests for the presence of autocorrelation are common, but they are rarely used without a more detailed look at the pattern of the autocorrelations at the individual lags. Separate tests and methodologies are used for the two different dimensions of the problem.

 
13

An additional iteration may be added to accommodate the momentum factor noted by Carhart (1997) in which control firms are matched to the prior 1-period returns of the event firms.

 
14

Tolerance levels vary from study to study with no clearly dominant choices. In many instances, tolerance levels are not symmetric (e.g. 10% on size but 30% on book-to-market). A comprehensive list of published articles and their tolerance levels for the Journal of Finance, Journal of Financial Economics, Review of Financial Studies, Journal of Business and Journal of Financial and Quantitative Analysis for the 1997–2006 period is available from the authors upon request. Generally, if no match is found within the set defined by the tolerance bands the closest match is selected by relaxing first the tolerance requirements then the level of SIC matching iteratively until no SIC match is required and the firm with the closest criteria match is used.

 
15

Barber and Lyon (1997), Lyon et al. (1999), Mitchell and Stafford (2000) and others all investigate various aspects of some of the test statistics but not the large set of alternative specifications generated in different empirical applications.

 
16

Because the tests are independent, in the application of our proposed sequence of tests, an event firm can be included in only the quarters for which data is available. We exclude such a process here to ensure comparability to prior studies.

 
17

Brown and Warner (1980) investigate characteristic-based matching in short-term event studies where event firms were matched to control firms on beta.

 
18

The papers cited as being representative do not represent the exhaustive list of all papers employing each particular technique. As previously mentioned, there are a myriad of modifications to basic long-term return estimation. These techniques are representative of the vast majority in basic form.

 
19

Indicated simultaneously within the tables on the right hand side with the number 1.

 
20

Indicated within the tables on the right hand side with the number 2.

 
21

Indicated within the tables on the right hand side with the number 3.

 
22

Indicated within the tables on the right hand side with the number 4.

 
23

Indicated within the tables on the right hand side with the number 5.

 
24

Monthly versions of the tables are available upon request. The evidence in them shows that for a given size of persistent effect, quarterly tests are well-specified and have power equal to or greater than monthly tests.

 
25

The induced abnormal returns are reported on a quarterly basis. Thus, a 5% induced abnormal return is equivalent to an annual difference in returns of 21.6%.

 
26

While we have not included the bias statistics results here, they are available upon request. It is interesting to note that when the bias is calculated for the differing portfolios at each stage of the sequential matching process, there is no discernible pattern in sign, size or statistical significance. As previously mentioned, if characteristic-based matching works as hypothesized, the bias should be smaller for early matches and larger for later matches, with statistically significant differences from zero occurring later in the matching process. This lack of pattern suggests that the covariance does not increase because the firms are not actually “similar” in a statistical sense, even though they satisfy the ad hoc matching rules created by researchers.

 
27

Barber and Lyon (1997) find that the market mean is mis-specified when using yearly BHARs at and beyond 1-year. By using quarterly data, we provide a mechanism for testing out to 2 years that is well-specified and has a relatively high level of power.

 
28

We are grateful to Stephen Brown for the discussion that led us to refer to this as an error in classification or categorization.

 
29

Our findings from this and prior tables also have relevance to the calendar-time matching technique. Regardless of rebalancing, the first-quarter results presented here are the best that can be achieved by rebalancing. Since the first-quarter results in Table 2 are still dominated by portfolio comparison techniques, using a portfolio control sample is also preferable to the much more costly calendar-time portfolio technique.

 
30

Spiess and Affleck-Graves (1999) examine long-run performance of debt issues; Krishnan and Laux (2005) and Howe and Lee (2006) examine preferred stock long-run performance; Yang and Lau (2010) examine long-run performance of Yankee stock offerings; and Chen et al. (2009) examine the long-run reaction to share repurchases.

 

Acknowledgments

We would like to thank Stephen Brown, Kim Sawyer, Ted Moore, Jeff Netter, LeRoy Brooks, Terry Shevlin, Rob Brown, Steven Mann, and participants at the 2006 Financial Management Association Meetings and 2007 Southern Finance Association Meetings for their valuable suggestions and conversations regarding this paper. Any remaining errors are our own.

Copyright information

© Springer Science+Business Media, LLC 2010