Sup-ADF-style bubble-detection methods under test

In this paper, we analyze the capacity of supremum augmented Dickey–Fuller (SADF), generalized SADF (GSADF), and of several heteroscedasticity-adjusted sup-ADF-style tests for detecting and date-stamping financial bubbles. Our Monte Carlo simulations find that the majority of the sup-ADF-style tests exhibit substantial size distortions, when the data-generating process is subject to leverage effects. Moreover, the sup-ADF-style tests often have low empirical power in identifying a (flexible and empirically relevant) rational stock-price bubble, recently proposed in the literature. In a simulation study, we compare the effectiveness of two real-time bubble date-stamping procedures (Procedures 1 and 2), both based on variants of the backward SADF (BSADF) test. While Procedure 1 (predominantly) provides better estimates of the bubbles’ origination and termination dates than Procedure 2, the first procedure frequently stamps non-existing bubbles. In an empirical application, we use NASDAQ data covering a time-span of 45 years and find that the bubble date-stamping outcomes of both procedures are sensitive to the data frequency chosen by the econometrician.


Introduction
In a series of influential articles, Phillips, Wu, and Yu (2011; PWY hereafter) and Phillips et al. (2014, 2015a) developed unit-root testing for explosiveness in time-series data. In the wake of this work, the most prominent testing procedures, the sup augmented Dickey-Fuller (SADF) test and its generalized version (the GSADF test), have been applied in a plethora of empirical studies, in which data explosiveness is interpreted as indicating an asset-price bubble. Along this line of argument, the studies aim at detecting speculative bubbles in various types of financial markets, for example in stock markets (Homm and Breitung 2012), commodity markets (Long et al. 2016), as well as in housing (Pan 2019; Hu and Oxley 2018a) and currency markets (Bettendorf and Chen 2013; Hu and Oxley 2017). Meanwhile, the popularity of the SADF and GSADF tests has been enhanced further by the fact that both testing procedures have become standard routines in econometric software packages such as EViews and R (Caspi 2017).
In simulation experiments, Phillips, Shi, and Yu (2015a; PSY hereafter) demonstrate that SADF and GSADF tests have high discriminatory power when artificial stock-price data are generated under periodically collapsing Evans bubbles (Evans 1991). While the rational Evans bubble has become a benchmark specification in the theoretical and empirical literature, Rotermann and Wilfling (2014, 2018) elaborate two theoretical properties of the Evans bubble that appear irreconcilable with real-world stock-price dynamics. (i) The Evans bubble always collapses completely within one trading unit, implying that stock-price volatility also collapses abruptly within one period. (ii) After a crash, the Evans bubble necessarily reverts to the same expected value, a phenomenon for which there is no justification. By contrast, Rotermann and Wilfling (2018) propose an alternative rational bubble specification, in the form of a lognormal-mixture process, which is able to generate more realistic, stochastically deflating bubble trajectories. In the subsequent sections, we will make extensive use of this bubble specification.
In this article, we investigate the capacity of various sup-ADF-style testing procedures for detecting and date-stamping financial bubbles. Besides the original SADF and GSADF tests introduced by PWY and PSY, we consider the heteroscedasticity-adjusted variants (bootstrapped and sign-based SADF and GSADF tests), as established in Harvey, Leybourne, Sollis, and Taylor (2016;HLST) and Harvey, Leybourne, and Zu (2020;HLZ). In various ways, we extend and modify the size, power, and date-stamping analyses from the above-mentioned articles. Our investigation has the following major findings. (i) As a prominent equity-market volatility asymmetry, we consider the leverage effect, according to which negative shocks often have a relatively larger impact on volatility than positive shocks. To capture such heteroscedasticity, we study the effects of a threshold GARCH (TGARCH) volatility structure on the empirical size of the sup-ADF-style tests and find that several of them reveal considerable size distortions. (ii) We generate artificial stock-price data under the above-mentioned rational Rotermann-Wilfling bubble specification. Our simulations show that the sup-ADF-style tests often have low empirical power under this realistic bubble model.
(iii) In a simulation study, we contrast two real-time bubble date-stamping strategies-both based on variants of the backward SADF (BSADF) test-namely (1) the procedure established in PSY, and (2) a methodology using the sign-based test statistic of HLZ. For our simulated bubble settings, we find that the first strategy (predominantly) outperforms the second by yielding more accurate estimates of the bubbles' origination and termination dates. However, the first procedure shows a pronounced tendency to date-stamp non-existing bubbles. (iv) In an empirical application, we apply the two bubble date-stamping strategies to NASDAQ data and compare the date-stamping results when using monthly versus daily observations. Our findings reveal that the dating strategies are sensitive to the practitioner's choice of data frequency. We explain this phenomenon by the structural changes in the data-generating process that come along with the frequency shift.
The remainder of the paper is organized as follows. Section 2 briefly reviews the essentials of the present-value stock-price model and the rational Rotermann-Wilfling bubble specification. Section 3 recapitulates the sup-ADF-style tests and analyzes their properties via Monte Carlo simulation. Section 4 investigates the date-stamping procedures, and Sect. 5 concludes.

Present-value model, rational bubbles, and explosiveness
PWY and PSY motivate their SADF and GSADF testing procedures on the basis of the well-known present-value stock-price model with constant expected returns (Campbell et al. 1997). Within this framework, the date-$t$ stock price $P_t$ is given by the Euler equation

$$P_t = \frac{1}{1+r}\, E_t\left(P_{t+1} + D_{t+1}\right), \tag{1}$$

where $E_t(\cdot)$ denotes the conditional expectation operator and $D_{t+1}$ the dividend payment between $t$ and $t+1$. The constant $r > 0$ is the discount rate, often referred to as the required rate of return, which is just sufficient to compensate investors for the riskiness of the stock. The first-order expectational difference equation (1) can be solved routinely by repeatedly substituting future prices forward. The entire class of solutions to Eq. (1) is given by

$$P_t = \sum_{i=1}^{\infty} \frac{1}{(1+r)^i}\, E_t\left(D_{t+i}\right) + B_t, \tag{2}$$

where $B_t$ is any stochastic process satisfying the submartingale property

$$E_t\left(B_{t+1}\right) = (1+r)\, B_t. \tag{3}$$

The quantities $P_t^f = P_t - B_t$ and $B_t$ in Eqs. (2) and (3) are called 'fundamental stock price' and 'rational bubble', respectively.
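The step from the Euler equation to the solution class can be checked in two lines. The following is only a sketch: it uses the law of iterated expectations, the standard present-value form of the fundamental price, and the submartingale property of the bubble; the finite horizon $K$ is an auxiliary symbol introduced here.

```latex
% Forward substitution of the Euler equation, K times:
P_t = \sum_{i=1}^{K} \frac{E_t(D_{t+i})}{(1+r)^{i}} + \frac{E_t(P_{t+K})}{(1+r)^{K}} .

% Verification that P_t = P_t^f + B_t solves the Euler equation, where
% P_t^f = \sum_{i \ge 1} (1+r)^{-i} E_t(D_{t+i}) and E_t(B_{t+1}) = (1+r) B_t:
\frac{E_t\left(P_{t+1}^f + B_{t+1} + D_{t+1}\right)}{1+r}
  = \sum_{i=1}^{\infty} \frac{E_t(D_{t+i})}{(1+r)^{i}} + B_t
  = P_t .
```

The bubble term survives because the last term of the first line need not vanish as $K \to \infty$; setting it to zero singles out the fundamental solution.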
Besides the constituting submartingale property (3), any rational (stock-price) bubble should satisfy two additional theoretical properties, as pointed out by Diba and Grossman (1988a, b). (i) Rational bubbles cannot start from zero, and (ii) negative bubbles are ruled out, as t → ∞. The most frequently applied rational, parametric specification satisfying all these properties is the Evans (1991) bubble. However, this bubble reveals a major empirical shortcoming in that it always bursts entirely, from one trading unit to the next. These abrupt bursts not only entail unrealistic stock-price trajectories, but also incompatible volatility dynamics (Rotermann and Wilfling 2014). In Sect. 3, we consider the bubble specification suggested by Rotermann and Wilfling (2018). This bubble model-a mixture of two lognormal processes-(i) generates realistic trajectories and stock-price volatility paths and (ii) satisfies the rationality condition (3) plus the two Diba-Grossman conditions mentioned above.
For our stock-price simulations in Sect. 3, we need to specify the fundamental stock-price process $\{P_t^f\}$. To this end, we adopt the frequently encountered assumption that dividends follow a random walk with drift,

$$D_t = \mu + D_{t-1} + e_t, \tag{5}$$

where $\{e_t\}_{t=1}^{\infty}$ is an i.i.d. Gaussian white-noise process with mean 0 and variance $\sigma_e^2$ (see, inter alia, Homm and Breitung 2012). Taking conditional expectations, and adding them as in Eq. (2), we obtain the fundamental stock price as

$$P_t^f = \frac{1+r}{r^2}\,\mu + \frac{1}{r}\,D_t, \tag{6}$$

which, after inserting Eq. (5) into (6) and rearranging the terms, yields

$$P_t^f = \frac{\mu}{r} + P_{t-1}^f + \frac{e_t}{r}, \tag{7}$$

showing that the fundamental stock price $P_t^f$ also follows a random walk with drift.

[Fig. 1 panel titles: ι² = 0.02, π = 0.87, α = 0.91; ι² = 0.02, π = 0.985, α = 0.998; ι² = 0.005, π = 0.85, α = 0.88; ι² = 0.005, π = 0.96, α = 0.99]

At this stage, some comments are in order on the interrelation between the concepts 'explosiveness in asset prices' and 'existence of a bubble'. In our framework, consisting of Eqs. (1)-(7), the fundamental stock price $P_t^f$ constitutes a (non-explosive) I(1) process. Therefore, in view of Eqs. (2) and (3), if we find empirical evidence of explosive behavior in the stock-price process (2), we can attribute this explosiveness to the rational bubble. This profound conclusion, however, hinges crucially on the specific assumptions of our framework, and is far from being generally valid. To illustrate, let us consider two alternative model setups. (i) A situation in which the fundamental stock price (for whatever economic reason) follows an explosive process. (ii) An extended rational valuation framework with stochastic discount factors (instead of our constant required rate of return $r$). It is straightforward to verify that, under both scenarios, explosiveness in stock prices constitutes neither a necessary nor a sufficient condition for deducing the existence of a bubble.
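A small simulation sketch may help to fix ideas about the price process P_t = P^f_t + B_t. It rests on several assumptions that are not transcribed from the paper: the discount rate is taken as r = 1/ψ − 1, and the Rotermann-Wilfling bubble is assumed to have the two-regime form B_t = [α/(ψπ)] B_{t−1} u_t with probability π and B_t = [(1 − α)/(ψ(1 − π))] B_{t−1} u_t otherwise, with u_t lognormal with unit mean and log-variance ι². This form is inferred from the mean growth and deflation factors quoted later in the text. The function name `simulate_prices` and its arguments are hypothetical; the default values are the baseline parameters reported in Sect. 3.

```python
import numpy as np

def simulate_prices(T, mu=0.0, sigma2_e=0.4476, D0=1.6942, B0=10.1925,
                    psi=0.9840, iota2=0.0061, pi=0.9595, alpha=0.9675, seed=0):
    """Simulate P_t = P_t^f + B_t (illustrative sketch, assumed bubble form)."""
    rng = np.random.default_rng(seed)
    r = 1.0 / psi - 1.0                       # assumption: psi = 1 / (1 + r)

    # Dividends: random walk with drift, D_t = mu + D_{t-1} + e_t
    e = rng.normal(0.0, np.sqrt(sigma2_e), size=T)
    D = D0 + np.cumsum(mu + e)

    # Fundamental price implied by the random-walk dividends
    P_f = (1.0 + r) / r**2 * mu + D / r

    # Assumed two-regime bubble with unit-mean lognormal shocks u_t
    u = np.exp(np.sqrt(iota2) * rng.standard_normal(T) - iota2 / 2.0)
    grow = rng.random(T) < pi                 # True: bubble keeps inflating
    factor = np.where(grow,
                      alpha / (psi * pi),
                      (1.0 - alpha) / (psi * (1.0 - pi)))
    B = B0 * np.cumprod(factor * u)

    return P_f + B, P_f, B
```

Under the assumed form, the regime-weighted mean growth factor equals 1/ψ, so the bubble satisfies the submartingale property by construction while deflating stochastically rather than bursting to a fixed value.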

SADF and GSADF tests
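The SADF and GSADF tests for explosiveness (applied to the time series $\{y_t\}_{t=0}^{T}$) rest on well-defined sequences of $t$-statistics (ADF statistics) of the parameter $\theta$, estimated from the empirical specification

$$y_t = c + \theta\, y_{t-1} + \sum_{i=1}^{k} b_i\, \Delta y_{t-i} + \varepsilon_t, \tag{8}$$

where $c$ is an intercept, the $b_i$ are the coefficients of the lagged differences, $k$ is the transient lag order, $\Delta$ denotes the first-difference operator, and $\varepsilon_t$ is an i.i.d. error term.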
The objective is to test the unit-root null hypothesis H 0 : θ = 1 versus the right-tailed alternative of explosiveness, H 1 : θ > 1. For characterizing the respective sequences of ADF statistics, which are needed to formally represent the ultimate SADF and GSADF test statistics, we consider subsamples over the time domain {0, 1, . . . , T } as fractions of the original sample. For this purpose, let the fractions (i) r 0 , (ii) r 1 , and (iii) r 2 , respectively, denote (i) the (fractional) width of the smallest subsample (used to initialize the computation of the test statistic), (ii) the (fractional) starting point of a subsample, and (iii) the (fractional) endpoint of a subsample.
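The right-tailed ADF statistic described above can be sketched with plain OLS. The snippet below is a minimal illustration, assuming a regression of y_t on a constant, y_{t−1}, and k lagged differences; the function name `adf_stat` and the degrees-of-freedom correction are choices made here, not details taken from the paper.

```python
import numpy as np

def adf_stat(y, k=0):
    """t-statistic for H0: theta = 1 in
    y_t = c + theta * y_{t-1} + sum_{i=1..k} b_i * dy_{t-i} + eps_t."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    dy = np.diff(y)                            # dy[j] = y[j+1] - y[j]

    Y = y[k + 1:]                              # regressand y_t, t = k+1, ..., n-1
    cols = [np.ones(n - k - 1), y[k:-1]]       # constant and y_{t-1}
    for i in range(1, k + 1):
        cols.append(dy[k - i:n - 1 - i])       # lagged difference dy_{t-i}
    X = np.column_stack(cols)

    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    s2 = resid @ resid / (X.shape[0] - X.shape[1])
    se_theta = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return (beta[1] - 1.0) / se_theta          # large positive values suggest explosiveness
```

Because the alternative is θ > 1, the null is rejected for large positive values of the statistic, in contrast to the usual left-tailed unit-root test.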
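Using this notation, PWY define the SADF test statistic as the sup-ADF statistic from repeated estimation of the empirical regression (8) on a forward-expanding sample sequence. Specifically, the authors take the minimal sample window width $r_0$ as given, set the subsample starting point $r_1 = 0$, and let the subsample endpoint $r_2$ range between $r_0$ and 1. Denoting the ADF statistic for a subsample running from $r_1$ to $r_2$ by $\mathrm{ADF}_{r_1}^{r_2}$, they define the SADF test statistic as

$$\mathrm{SADF}(r_0) = \sup_{r_2 \in [r_0, 1]} \mathrm{ADF}_{0}^{r_2}. \tag{9}$$

The GSADF test, suggested by PSY with the goal of improving the detection capacity under multiple stock-price bubbles, essentially pursues the same idea as the SADF test, but processes more subsamples to estimate the ADF regression (8). In contrast to the SADF variant, the GSADF test allows the fractional starting point $r_1$ to range between 0 and $r_2 - r_0$, implying a double recursive subsample structure. The corresponding test statistic is defined as

$$\mathrm{GSADF}(r_0) = \sup_{\substack{r_2 \in [r_0, 1] \\ r_1 \in [0,\, r_2 - r_0]}} \mathrm{ADF}_{r_1}^{r_2}. \tag{10}$$

PSY derive the asymptotic null distributions of the SADF and GSADF test statistics on the basis of the prototypical model with weak (local-to-zero) intercept form,

$$y_t = a\, T^{-\eta} + \theta\, y_{t-1} + \varepsilon_t, \tag{11}$$

with constants $a$ and $\eta > 1/2$. Under the null hypothesis of a unit root ($\theta = 1$), the limiting distributions of the test statistics are given by

$$\mathrm{SADF}(r_0) \xrightarrow{d} \sup_{r_2 \in [r_0, 1]} \frac{\tfrac{1}{2}\, r_2 \left[ W(r_2)^2 - r_2 \right] - W(r_2) \int_0^{r_2} W(r)\, dr}{r_2^{1/2} \left\{ r_2 \int_0^{r_2} W(r)^2\, dr - \left[ \int_0^{r_2} W(r)\, dr \right]^2 \right\}^{1/2}} \tag{12}$$

and

$$\mathrm{GSADF}(r_0) \xrightarrow{d} \sup_{\substack{r_2 \in [r_0, 1] \\ r_1 \in [0,\, r_2 - r_0]}} \frac{\tfrac{1}{2}\, r_w \left[ W(r_2)^2 - W(r_1)^2 - r_w \right] - \left[ W(r_2) - W(r_1) \right] \int_{r_1}^{r_2} W(r)\, dr}{r_w^{1/2} \left\{ r_w \int_{r_1}^{r_2} W(r)^2\, dr - \left[ \int_{r_1}^{r_2} W(r)\, dr \right]^2 \right\}^{1/2}}, \tag{13}$$

where $r_w = r_2 - r_1$ is the fractional window width and $W(\cdot)$ denotes the standard Wiener process.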
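The forward-expanding and double-recursive window structures can be sketched as follows. This is an illustrative brute-force implementation, not the authors' code: it assumes a lag order of k = 0, and the function names and the lower bound of four observations per window are choices made here.

```python
import numpy as np

def adf_stat(y):
    """Dickey-Fuller t-statistic for theta = 1 (constant, no lagged differences)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    Y = y[1:]
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    s2 = resid @ resid / (len(Y) - 2)
    return (beta[1] - 1.0) / np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

def sadf(y, r0):
    """Sup of ADF statistics over windows [0, r2], r2 in [r0, 1]."""
    T = len(y)
    w0 = max(int(np.floor(r0 * T)), 4)         # smallest window, in observations
    return max(adf_stat(y[:end]) for end in range(w0, T + 1))

def gsadf(y, r0):
    """Sup of ADF statistics over windows [r1, r2], r1 in [0, r2 - r0]."""
    T = len(y)
    w0 = max(int(np.floor(r0 * T)), 4)
    return max(adf_stat(y[start:end])
               for end in range(w0, T + 1)
               for start in range(0, end - w0 + 1))
```

Since the GSADF windows include every SADF window, the GSADF statistic can never fall below the SADF statistic on the same sample, which is consistent with the larger GSADF critical values reported later. The double loop costs on the order of T² regressions, so this sketch is only practical for moderate T.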

Heteroscedasticity adjustments
Recently, HLST and HLZ have addressed the issue of bubble detection under heteroscedasticity. We briefly review their testing procedures, which both refer to non-stochastic unconditional heteroscedastic patterns.

Wild-bootstrap SADF and GSADF tests
HLST suggest using a wild-bootstrap re-sampling scheme in order to obtain a sizecontrolled SADF testing procedure, in the presence of non-stationary (unconditional) volatility. Their wild-bootstrap algorithm consists of the following five steps.
Step 1. Generate a sequence $\{w_t\}_{t=2}^{T}$ of independent $N(0, 1)$ random variables and construct the following series of bootstrap innovations:

$$\varepsilon_t^* = w_t\, \Delta y_t, \quad t = 2, \ldots, T.$$

Step 2. Construct the bootstrap sample as the partial sums

$$y_t^* = \sum_{j=2}^{t} \varepsilon_j^*, \quad t = 2, \ldots, T.$$

Step 3. For a subsample running from $r_1 = 0$ to $r_2$, consider the $t$-value for $\theta^*$ in the fitted OLS regression

$$y_t^* = \hat{c}^* + \hat{\theta}^*\, y_{t-1}^* + \hat{\varepsilon}_t^*,$$

and denote this bootstrap ADF statistic by $\mathrm{ADF}_{r_1 = 0}^{r_2\,*}$. Compute the bootstrap SADF test statistic (denoted by $\mathrm{SADF}^*$) in the usual way:

$$\mathrm{SADF}^* = \sup_{r_2 \in [r_0, 1]} \mathrm{ADF}_{r_1 = 0}^{r_2\,*}.$$

Step 4. Repeat Steps 1 to 3 $B$ times to generate a sample of bootstrapped SADF statistics, $\{\mathrm{SADF}_b^*\}_{b=1}^{B}$. Use this sample to approximate the cumulative distribution function of $\mathrm{SADF}^*$, denoted by $G_T^*(\cdot)$.

Step 5. To conduct the wild-bootstrap SADF test at the significance level $\alpha$, compute the SADF test statistic from Eq. (9), using the original sample $\{y_t\}_{t=1}^{T}$, and reject the unit-root null hypothesis if the SADF statistic exceeds the $(1 - \alpha)$ quantile of $G_T^*$.

We denote the wild-bootstrap SADF testing procedure by SADF bootstrap, in order to distinguish it from the original SADF test described in Sect. 3.1. In our subsequent analysis, we also apply the bootstrap algorithm to the original GSADF test and denote this variant by GSADF bootstrap. We emphasize that, due to the re-sampling scheme, the critical values of the SADF bootstrap and GSADF bootstrap tests are always adapted to the specific trajectory $\{y_t\}_{t=1}^{T}$ that is to be tested for explosiveness. Therefore, the critical values of the bootstrapped tests, obtained from $\{y_t\}_{t=1}^{T}$, are not universally applicable to other trajectories.
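The five steps above can be sketched compactly. The code below is an illustration under stated assumptions rather than the HLST implementation: the bootstrap series is started at zero, the ADF regression contains a constant and no lagged differences, and the names `wild_bootstrap_sadf`, `sadf`, and `adf_stat` are hypothetical.

```python
import numpy as np

def adf_stat(y):
    """Dickey-Fuller t-statistic for theta = 1 (constant, no lagged differences)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    Y = y[1:]
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    s2 = resid @ resid / (len(Y) - 2)
    return (beta[1] - 1.0) / np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

def sadf(y, r0):
    T = len(y)
    w0 = max(int(np.floor(r0 * T)), 4)
    return max(adf_stat(y[:end]) for end in range(w0, T + 1))

def wild_bootstrap_sadf(y, r0, B=499, alpha=0.05, seed=0):
    """Return (SADF statistic, bootstrap critical value, rejection decision)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    boot = np.empty(B)
    for b in range(B):
        w = rng.standard_normal(len(dy))               # Step 1: N(0, 1) multipliers
        eps_star = w * dy                              # Step 1: bootstrap innovations
        y_star = np.concatenate([[0.0], np.cumsum(eps_star)])  # Step 2 (assumed start at 0)
        boot[b] = sadf(y_star, r0)                     # Step 3: bootstrap SADF
    crit = np.quantile(boot, 1.0 - alpha)              # Step 4: empirical (1 - alpha) quantile
    stat = sadf(y, r0)                                 # Step 5: statistic on original data
    return stat, crit, bool(stat > crit)
```

Each call performs B full SADF computations on series of the same length as the data, which illustrates why the bootstrap variants are computationally demanding and why their critical values are specific to the trajectory under test.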

Sign-based SADF and GSADF tests
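As a further heteroscedasticity adjustment, HLZ propose the sign-based GSADF test under deterministically time-varying (unconditional) volatility. The idea is to compute the GSADF statistic from Eq. (10) not from the directly observed series $\{y_t\}_{t=1}^{T}$, but rather from the transformed series

$$C_t = \sum_{j=2}^{t} \mathrm{sign}(\Delta y_j), \quad t = 2, \ldots, T, \tag{14}$$

where $\mathrm{sign}(x)$ is the sign function, which equals $-1$ for $x \le 0$ and $1$ for $x > 0$. By definition, $\{C_t\}_{t=2}^{T}$ is the series of cumulated signs of the first differences. For a subsample running from $r_1$ to $r_2$, we consider the $t$-value for $\vartheta$ in the fitted OLS regression

$$\Delta C_t = \hat{c} + \hat{\vartheta}\, C_{t-1} + \hat{u}_t, \tag{15}$$

denote it by $\mathrm{sADF}_{r_1}^{r_2}$, and define the sign-based GSADF statistic as

$$\mathrm{GSADF}^{\text{sign-based}}(r_0) = \sup_{\substack{r_2 \in [r_0, 1] \\ r_1 \in [0,\, r_2 - r_0]}} \mathrm{sADF}_{r_1}^{r_2}. \tag{16}$$

In our analysis below, we also consider sign-based SADF tests (denoted by SADF sign-based) by setting $r_1 = 0$ in Eq. (16).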
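The sign transformation itself is a one-liner. The sketch below (function name `cumulated_signs` is hypothetical) builds the cumulated-sign series and illustrates why it is insensitive to deterministic volatility shifts: rescaling each increment by a positive factor leaves every sign unchanged.

```python
import numpy as np

def cumulated_signs(y):
    """Cumulated signs of first differences; sign(x) = -1 for x <= 0, +1 for x > 0."""
    dy = np.diff(np.asarray(y, dtype=float))
    signs = np.where(dy > 0, 1.0, -1.0)
    return np.cumsum(signs)

# Small example: increments are (1, 0, -0.5, 1.5), so the signs are (+1, -1, -1, +1).
y = np.array([1.0, 2.0, 2.0, 1.5, 3.0])
C = cumulated_signs(y)

# Rescaling the increments by a positive, time-varying volatility factor
# changes the level series but not the cumulated signs.
vol = np.linspace(1.0, 5.0, 4)
y_scaled = np.concatenate([[1.0], 1.0 + np.cumsum(np.diff(y) * vol)])
C_scaled = cumulated_signs(y_scaled)
```

The sup-type statistics are then computed on the transformed series in the same way as on the original data.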

Monte Carlo study
We first approximate asymptotic critical values of the SADF, SADF sign-based, GSADF, and GSADF sign-based tests. In a second step, we analyze the empirical size and power of the tests.
We approximate the asymptotic critical values of the four test statistics via Monte Carlo simulation. In contrast to PSY, we do not make use of the asymptotic null distributions from Eqs. (12) and (13), the simulation of which requires an approximation of the Wiener process. Instead, we simulate the critical values by restricting the data-generating process to the prototypical specification in Eq. (11) with parameters θ = 1 and a = η = 1, and use the sample size T = 5000. (The parameter values for a and η are taken from PSY.) With these settings, we approximate the asymptotic critical values by simulating 100,000 replications of the SADF and SADF sign-based statistics, and 12,500 replications of the GSADF and GSADF sign-based statistics.
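The simulation of critical values can be sketched at a much smaller scale than in the paper. The code below assumes standard normal errors, a lag order of k = 0, and illustrative defaults for T and the number of replications; the function names are hypothetical, and the resulting numbers are not meant to reproduce Table 1.

```python
import numpy as np

def adf_stat(y):
    """Dickey-Fuller t-statistic for theta = 1 (constant, no lagged differences)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    Y = y[1:]
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    s2 = resid @ resid / (len(Y) - 2)
    return (beta[1] - 1.0) / np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

def sadf(y, r0):
    T = len(y)
    w0 = max(int(np.floor(r0 * T)), 4)
    return max(adf_stat(y[:end]) for end in range(w0, T + 1))

def simulate_sadf_critical_values(T=100, r0=0.1, reps=500, a=1.0, eta=1.0,
                                  quantiles=(0.90, 0.95, 0.99), seed=0):
    """Quantiles of the SADF statistic under the unit-root null (theta = 1)."""
    rng = np.random.default_rng(seed)
    stats = np.empty(reps)
    for i in range(reps):
        eps = rng.standard_normal(T)               # assumption: N(0, 1) errors
        y = np.cumsum(a * T ** (-eta) + eps)       # y_t = a*T^(-eta) + y_{t-1} + eps_t
        stats[i] = sadf(y, r0)
    return dict(zip(quantiles, np.quantile(stats, quantiles)))
```

Replacing `sadf` by a GSADF routine, or applying the statistic to the cumulated-sign series, would give the other three sets of critical values; the much higher cost of the double recursion is one plausible reason for using fewer GSADF replications.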
The upper block of Table 1 reports our asymptotic critical values of the SADF and GSADF tests. The values share two features with their analogs from the PSY simulations via the asymptotic null distributions from Eqs. (12) and (13). (i) The critical values of both test statistics increase with a decrease in minimal window size r 0. (ii) The GSADF critical values always exceed their SADF counterparts. In most cases, our asymptotic critical values are slightly larger than those reported by PSY, yielding more conservative rejections of the null hypothesis. In our analysis below, we prefer our critical values from Table 1 to those provided by PSY. The lower block of Table 1 displays our asymptotic critical values of the SADF sign-based and GSADF sign-based tests, which exhibit the same qualitative features as those from the upper block.

Table 1 note: Large-sample critical values are approximated by simulating the data-generating process from Eq. (11) with θ = 1 and a = η = 1, and using Eq. (14). The sample size is T = 5000. The numbers of replications are 100,000 for the SADF and SADF sign-based tests, and 12,500 for the GSADF and GSADF sign-based tests.

Table 2: Sizes of SADF, SADF sign-based, GSADF, GSADF sign-based tests in finite samples when using asymptotic critical values. Note: Sizes are obtained by simulating data (under the null hypothesis) from the specification in Eq. (11) with parameters a = η = θ = 1, and Eq. (14). Using 10,000 replications, the sizes are computed via the (asymptotic) 95% critical values from Table 1.

In order to check the validity of our asymptotic critical values when applied to finite samples, we simulate sizes for the four tests. For this purpose, we generate data from the prototypical process in Eq. (11) under the null hypothesis (with parameters a = η = θ = 1) for the finite-sample sizes T ∈ {100, 200, 400, 800, 1600}.
On the basis of 10,000 replications, we compute simulated sizes as the fractions of replications for which the tests erroneously reject the null of a unit root in favor of the alternative, thus indicating explosiveness. We report the results in Table 2 for a nominal size of 5% (i.e., we use the 95% critical values from Table 1). For the SADF, SADF sign-based, and the GSADF tests, we do not find substantial size distortions. By contrast, the GSADF sign-based test appears to be slightly oversized when applying asymptotic critical values in finite samples (in particular, for T ∈ {100, 200, 400}). The latter finding is in line with HLZ, who demonstrate that the convergence of finite-sample critical values to their asymptotic counterparts is rather slow for the GSADF sign-based test. To be on the safe side, we compute finite-sample critical values for both sign-based tests (SADF sign-based, GSADF sign-based), using the sample sizes T ∈ {100, 200, 400, 800, 1600}. Table 3 displays the finite-sample critical values, which we use in our analysis below to ensure correctly sized sign-based tests.

Table 3 note: Finite-sample critical values are approximated by simulating the data-generating process from Eq. (11) with parameters a = η = θ = 1, and Eq. (14). The numbers of replications are 100,000 and 12,500 for the SADF sign-based and GSADF sign-based tests, respectively.
The process specification in Eq. (11) assumes homoscedastic errors and ignores conditional heteroscedasticity, a well-documented phenomenon in all types of financial markets. To assess the impact on the SADF and GSADF testing procedures, PSY address the sizes of both tests under unit-root processes with GARCH errors. Using standard GARCH(1, 1) errors as in Bollerslev (1986), they do not find critical size distortions. However, the standard GARCH(1, 1) specification does not account for volatility asymmetries, such as the highly relevant leverage effect, according to which negative stock-market shocks tend to exert a larger impact on volatility than positive shocks (e.g., Black 1976; Christie 1982; Schwert 1989). Therefore, we modify the PSY size analysis and consider the unit-root DGP from Eq. (11) with $a = \eta = \theta = 1$ under (threshold) TGARCH(1, 1) errors,

$$\varepsilon_t = \sigma_t\, z_t, \quad z_t \sim \text{i.i.d. } N(0, 1), \tag{17}$$

$$\sigma_t^2 = \omega + \gamma\, \varepsilon_{t-1}^2 + \phi\, \varepsilon_{t-1}^2\, I(\varepsilon_{t-1} < 0) + \beta\, \sigma_{t-1}^2, \tag{18}$$

where $I(\cdot)$ denotes the indicator function, which takes on the value 1 if the market is shocked by bad news ($\varepsilon_{t-1} < 0$), and is 0 in the case of good news (Zakoïan 1994). The first two rows of Table 4 display the simulated sizes of the SADF and GSADF tests under TGARCH(1, 1) errors for the nominal size of 5% and the sample sizes T ∈ {100, 200, 400, 800, 1600} on the basis of 10,000 replications. We set the TGARCH(1, 1) parameters from Eq. (18) to ω = 0.4387, γ = 0, β = 0.9319, φ = 0.1306, so as to coincide with the maximum likelihood estimates obtained from monthly observations of the NASDAQ price-dividend ratio, covering the (relatively) tranquil period between January 1988 and December 1994. Evidently, the SADF and GSADF tests exhibit substantial size distortions under volatility asymmetry in the form of TGARCH(1, 1) errors.

Table 4 note: Sizes are obtained by simulating data (under the null hypothesis) from the specification in Eq. (11) with parameters a = η = θ = 1 under TGARCH errors, as described in Eqs. (17) and (18).
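A unit-root series with threshold GARCH errors can be simulated as sketched below. The recursion is written in variance form, with γ as the symmetric ARCH coefficient and φ as the additional response to negative shocks; this assignment of roles is an assumption made for illustration (it is the reading under which γ = 0 and φ = 0.1306 produce a leverage effect). The function name and the choice of the starting variance are likewise hypothetical.

```python
import numpy as np

def simulate_tgarch_unit_root(T, omega=0.4387, gamma=0.0, beta=0.9319, phi=0.1306,
                              a=1.0, eta=1.0, seed=0):
    """Unit-root series y_t = a*T^(-eta) + y_{t-1} + eps_t with TGARCH(1, 1) errors."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(T)
    h = np.empty(T)                                   # conditional variances
    eps = np.empty(T)

    # Assumed start: unconditional variance when z_t is symmetric around zero
    h[0] = omega / (1.0 - gamma - beta - phi / 2.0)
    eps[0] = np.sqrt(h[0]) * z[0]

    for t in range(1, T):
        bad_news = 1.0 if eps[t - 1] < 0 else 0.0     # indicator I(eps_{t-1} < 0)
        h[t] = (omega + gamma * eps[t - 1] ** 2
                + phi * eps[t - 1] ** 2 * bad_news
                + beta * h[t - 1])
        eps[t] = np.sqrt(h[t]) * z[t]

    y = np.cumsum(a * T ** (-eta) + eps)
    return y, eps, h
```

With γ = 0, only negative shocks feed into next-period variance, so volatility rises after bad news and decays geometrically at rate β otherwise. Empirical sizes under this error structure follow by applying a test to many such series and recording the rejection frequency.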
Rows 3-6 of Table 4 display the sizes of the four heteroscedasticity-adjusted tests under TGARCH(1, 1) errors. We first recall that the four tests-on the grounds of their construction in the HLST and HLZ articles-adjust for unconditional (but not for conditional) heteroscedasticity patterns in the data-generating processes. However, Rows 3, 4 of Table 4 indicate that the SADF sign-based and GSADF sign-based tests operate well in controlling the actual sizes under TGARCH errors. By contrast, the SADF bootstrap and GSADF bootstrap tests (Rows 5, 6 in Table 4) exhibit substantial undersizing. These tests-for which, in line with HLST, we use B = 499 bootstrap replications-reject the null hypothesis of a unit-root too rarely (as opposed to the SADF and GSADF tests from Rows 1-2, which reject the null too often).

Empirical power
We now address the empirical power properties of the SADF, SADF sign-based , GSADF and GSADF sign-based tests. We distinguish between homoscedastic and TGARCH(1, 1) stock-price trajectories.

Homoscedastic trajectories
We simulate 10,000 stock-price series from the present-value equation $P_t = P_t^f + B_t$, with the fundamental stock price $P_t^f$ evolving according to Eqs. (5) and (6), and the rational bubble $B_t$ following the lognormal mixture from Eq. (4). The involved parameters are set equal to the estimates obtained in Rotermann and Wilfling (2018), who fit the bubble specification (4) to monthly NASDAQ observations between January 1990 and October 2013, applying a particle-filter technique. Specifically, we use the parameter values $\mu = 0$, $\sigma_e^2 = 0.4476$, $D_0 = 1.6942$, $B_0 = 10.1925$, $\psi = 0.9840$, $\iota^2 = 0.0061$, $\pi = 0.9595$ and $\alpha = 0.9675$.

Table 5 note: [...] Table 4. $B_t$ is generated as in Eq. (4). The remaining parameters are set to $\mu = 0$, $D_0 = 1.6942$, $B_0 = 10.1925$, $\psi = 0.9840$, $\iota^2 = 0.0061$, $\pi = 0.9595$ and $\alpha = 0.9675$. For the power calculation, we use 10,000 replications, the 95% critical values from Table 1 for the SADF and GSADF tests, and the finite-sample critical values from Table 3 for the SADF sign-based, GSADF sign-based tests. For the SADF adjusted, GSADF adjusted tests in Block (b), we simulated critical values (not reported). For the SADF bootstrap, GSADF bootstrap tests, we set B = 499 in Step 4 of the HLST wild-bootstrap algorithm.
Block (a) of Table 5 ('Homoscedastic trajectories') reports the results of our power analysis for the sample sizes T ∈ {100, 200, 400, 800, 1600}. We use the 95% critical values from Table 1 for the SADF, GSADF tests, the 95% finite-sample critical values from Table 3 for the SADF sign-based, GSADF sign-based tests, B = 499 bootstrap replications for the SADF bootstrap, GSADF bootstrap tests, and 10,000 Monte Carlo replications. Our analysis yields six major findings.
(i) The power of the GSADF test exceeds the power of the SADF test-except for T = 1600, where both tests have power equal to 1. (ii) By contrast, the power of the SADF sign-based test always (slightly) exceeds the power of the GSADF sign-based test. (iii) The SADF sign-based and GSADF sign-based tests have higher power than the original SADF, GSADF tests for T = 100, and in the SADF-case also for T = 200. In all other cases, the sign-based tests exhibit lower power. (iv) The power of all six tests increases with increase in sample size. (v) The power of the first four tests (SADF, GSADF, SADF sign-based , GSADF sign-based ) is extremely low for T = 100, 200 and improves only moderately for T = 400. The tests perform satisfactorily for T = 800. For T = 1600, the four tests identify the bubble in (almost) each of the 10,000 simulated stock-price series. (vi) The power values of the SADF bootstrap , GSADF bootstrap tests unambiguously fall below the power values of all other tests in the five scenarios. In anticipation of our subsequent analysis, we emphasize that this clear-cut result also holds under TGARCH(1, 1) trajectories, as shown in Block (b) of Table 5. Again, in each scenario, the SADF bootstrap , GSADF bootstrap tests exhibit the (by far) lowest power values among all 6 tests analyzed. We note that the computation of the 20 SADF bootstrap , GSADF bootstrap power values in Table 5 is extremely time-consuming. 6 Thus, owing to (1) sizing problems, (2) low power values, and (3) computational burdens, we exclude the SADF bootstrap , GSADF bootstrap tests from our further analysis.
The Blocks (a) of Table 6 ('Homoscedastic trajectories') report the power of the remaining four sup-ADF-style tests, when some model parameters are singly (or, as π and α, jointly) varied, while all other parameters are held constant at their baselevels given in the Note of Table 6 (ceteris-paribus analysis). Explicitly, we let (i) the dividend drift μ range between 0 and 0.003 (Settings 1-4), (ii) the discount factor ψ range between 0.975 and 0.990 (Settings 5-8), and (iii) the probability π range between 0.35 and 0.95 (Settings 9-12). In the case of the π -variation, we simultaneously adjust the parameter α, so that the mean bubble growth factor from Eq. (4), α/(ψπ), remains constant at 1.06. We conduct our analysis with 10,000 Monte Carlo replications for each parameter setting, and the sample size T = 400. 7 The Blocks (a) in Table 6 ('Homoscedastic trajectories') yield the following four findings. (i) The SADF and GSADF tests have higher power than their sign-based counterparts in 10 of 12 settings (the two exceptions are the Settings 8 and 12). (ii) A variation in the dividend drift μ (ceteris paribus) does not substantially affect the power of either test. This result is not surprising for two reasons. First, the SADF and GSADF tests are based on Eq. (8), which captures the effects of the dividend drift. Second, the computation of the sign-based test statistics rests on the series {C t } from Eq. (14), which-by construction-is independent of μ. (iii) The power of the four tests decreases dramatically with an increase in discount factor ψ. For instance, for ψ = 0.99 (Setting 8), the tests only detect 9.73%, 12.35%, 11.00%, and 9.67% of the simulated bubbles, respectively. An explanation may be that-with an increasing discount factor ψ-the bubble's positive (mean) growth factor from Eq. (4), α/(ψπ), decreases, rendering the detection of explosiveness more difficult. 
(iv) Increasing π-probabilities (Settings 9-12) entail substantial decreases in the power of the SADF and GSADF tests, while the impact on the power of the SADF sign-based, GSADF sign-based tests appears ambiguous. A plausible explanation of the SADF and GSADF power reduction could be as follows. Recall that π represents the likelihood of ongoing bubble growth at the constant rate of 6% (which we achieve by an appropriate adjustment of α). For π = 0.95, α = 0.9975 (Setting 12), both tests only detect 8.04% and 9.72% of the bubbles. Prima facie, this finding seems counter-intuitive, since we would expect the tests to exhibit higher power when the probability of bubble inflation (bubble growth) increases. Our explanation of this phenomenon rests on the fact that the joint variation of the parameters π and α keeps the bubble inflation rate, given by α/(ψπ) − 1, stable at the 6% level. However, at the same time, this variation increases the mean bubble deflation rate in Eq. (4), given by (1 − α)/[ψ(1 − π)] − 1, in absolute value, namely from −0.0111 to −0.9492. Evidently, neither the SADF nor the GSADF test is capable of coping with these opposing effects on bubble dynamics.

Table 6 note: [...] Table 4. $B_t$ is generated as in Eq. (4). The basic set of parameters is $\mu = 0$, $D_0 = 1.6942$, $B_0 = 10.1925$, $\psi = 0.9840$, $\iota^2 = 0.0061$, $\pi = 0.9595$, $\alpha = 0.9675$. The parameters μ, ψ, π, α are then singly (or, as π and α, jointly) varied as indicated. The remaining parameters are held constant at their basic values. For the power calculations, we set T = 400, r 0 = 0.1, use 10,000 replications for each parameter setting, the 95% critical values from Table 1 for the SADF, GSADF tests, and the finite-sample critical values from Table 3 for the SADF sign-based, GSADF sign-based tests. For the SADF adjusted, GSADF adjusted tests under TGARCH trajectories, we simulated critical values (not reported).

Panel (a) of Fig. 2 multiplies the homoscedastic power values from Table 6 by 10,000, so that the values now represent the number of bubbles correctly detected by the tests in each parameter setting. Four findings are striking. (i) The bubble-detection capacity of the GSADF test only slightly outperforms that of the SADF test. For the sign-based tests, the reverse is true, with the SADF sign-based tests slightly outperforming the GSADF sign-based tests. (ii) In 10 of 12 parameter settings, all four tests detect fewer than 6,000 (of 10,000) bubbles per setting, that is, more than 40% of the existing bubbles remain undetected. (iii) Pertaining to the sign-based tests, the bubble-detection capacity is even poorer. The SADF sign-based, GSADF sign-based tests detect fewer than 6,000 bubbles in 11 of 12 settings (and in 10 of 12 settings, even fewer than 3,100 bubbles). (iv) Only in 2 of 12 settings do the SADF and GSADF tests identify more than 8,000 bubbles, while the sign-based tests detect more than 8,000 bubbles in only 1 of 12 settings.
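The opposing inflation and deflation effects discussed under the π-variation can be checked with a few lines of arithmetic. The sketch below evaluates both rates at the two ends of the π-range. The value of α paired with π = 0.35 is an assumption: it is chosen so that α/π equals the value 1.05 implied by π = 0.95, α = 0.9975; the function name is hypothetical.

```python
def bubble_rates(pi, alpha, psi=0.9840):
    """Mean inflation and deflation rates of the bubble for given (pi, alpha, psi)."""
    inflation = alpha / (psi * pi) - 1.0
    deflation = (1.0 - alpha) / (psi * (1.0 - pi)) - 1.0
    return inflation, deflation

# Upper end of the pi-range (values stated in the text)
infl_high, defl_high = bubble_rates(0.95, 0.9975)

# Lower end of the pi-range; alpha = 1.05 * 0.35 is an assumed pairing
infl_low, defl_low = bubble_rates(0.35, 1.05 * 0.35)
```

Under these inputs the deflation rate moves from roughly −0.0111 to roughly −0.9492, matching the figures in the text, while the inflation rate is identical at both ends. With ψ = 0.9840 it evaluates to about 0.067, slightly above the 6% quoted in the text.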
Block (b) of Table 5 reports the power of the SADF, GSADF tests and their sign-based counterparts for the sample sizes T ∈ {100, 200, 400, 800, 1600} under TGARCH trajectories. Since the conventional SADF and GSADF tests exhibit substantial size distortions under TGARCH heteroscedasticity (see Sect. 3.3.1), we follow the lines of HLZ (p. 11) and size-correct both tests infeasibly. To this end, we simulate finite-sample critical values using the data-generating process from Eq. (11) with a = η = θ = 1 and TGARCH(1, 1) errors. We do not report the critical values here (they are available upon request), but denote the infeasibly size-corrected tests by SADF adjusted and GSADF adjusted . Comparing the power values from Block (b) in Table 5 with those from Block (a), we find qualitatively similar features under TGARCH as under homoscedastic trajectories. However, for the sample sizes T ∈ {400, 800, 1600} the tests under TGARCH have lower power than their counterparts under homoscedasticity.
The latter finding is further strengthened by Table 6, in which Block (b) displays the power values under TGARCH(1, 1) stock-price trajectories for T = 400, r 0 = 0.1, when some of the model parameters are varied. Panel (b) of Fig. 2 ('TGARCH(1,1) trajectories') shows the numbers of bubbles detected by the respective tests. Visual inspection of both panels in Fig. 2 reveals that the bubble-detection rates are often substantially lower under TGARCH than under homoscedastic stock-price trajectories. However, Panel (b) shows that the SADF sign-based and GSADF sign-based tests outperform their (size-adjusted) counterparts in 10 of 12 parameter settings (exceptions are the settings 5 and 6).

Bubble date-stamping
The aim of this section is threefold. In Sect. 4.1, we review bubble date-stamping procedures, which are based on the sup-ADF-style tests from Sect. 3. In Sect. 4.2, we investigate the performance of these date-stamping procedures in a simulation study, using a completely specified data-generating process with known TGARCH(1, 1)-heteroscedasticity. In Sect. 4.3, we apply the procedures to NASDAQ data, for which the DGP-and in particular the heteroscedasticity pattern-is completely unknown.

Date-stamping procedures
In addition to mere bubble detection, PSY propose a date-stamping procedure for estimating the (fractional) origination and termination dates (denoted by $r_e$ and $r_f$) of bubbles in real time. 9 The underlying idea rests on a recursive test procedure called the backward SADF (BSADF) test. The BSADF test follows the same principle as the SADF test, but processes the sample in the reverse direction. The test proceeds in two steps. (i) It computes a sequence of ADF statistics using a series of samples, in which each individual sample has the same fixed endpoint $r_2$, while the starting point $r_1$ ranges between 0 and $r_2 - r_0$. (ii) The BSADF test statistic is then defined as the sup-value of the ADF statistics computed in Step (i):
$$BSADF_{r_2}(r_0) = \sup_{r_1 \in [0,\, r_2 - r_0]} ADF_{r_1}^{r_2}. \tag{19}$$
In order to test for explosiveness in the time-series process at date $t = \lfloor T r_2 \rfloor$ ($\lfloor \cdot \rfloor$ is the floor function), the $BSADF_{r_2}(r_0)$ statistic is compared to a critical value obtained from Monte Carlo simulation. Letting $r_2$ range between $r_0$ and 1, PSY (i) define the origination date $\lfloor T r_e \rfloor$ of a bubble as that point in time with the first chronological observation, at which the BSADF statistic exceeds the critical value, and (ii) suggest estimating the origination date $\lfloor T r_e \rfloor$ via
$$\hat{r}_e = \inf_{r_2 \in [r_0,\, 1]} \left\{ r_2 : BSADF_{r_2}(r_0) > scv_{r_2}^{\beta_T} \right\}, \tag{20}$$
where $\beta_T$ denotes the significance level and $scv_{r_2}^{\beta_T}$ is the $100(1 - \beta_T)\%$ critical value of the SADF test statistic based on $\lfloor T r_2 \rfloor$ observations. 10 In a similar vein, the termination date $\lfloor T r_f \rfloor$ of the bubble is defined as the point in time with the first chronological observation, at which the BSADF statistic falls below the critical value. Additionally, assuming a minimal time lag of $\delta \log(T)$ observations to exist between the origination and the termination date of the bubble, PSY propose estimating the termination date $\lfloor T r_f \rfloor$ via
$$\hat{r}_f = \inf_{r_2 \in [\hat{r}_e + \delta \log(T)/T,\, 1]} \left\{ r_2 : BSADF_{r_2}(r_0) < scv_{r_2}^{\beta_T} \right\}. \tag{21}$$
We note that the parameter $\delta$ in Eq. (21) controls for the minimal duration of the bubble.
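The two BSADF steps and the first-crossing rules for the origination and termination dates can be illustrated with a minimal Python sketch. It is not the PSY implementation: the function names (`adf_stat`, `bsadf`, `date_stamp`) are hypothetical, the ADF regression uses an intercept and no lag augmentation (an assumption made for brevity), sample fractions are replaced by integer observation counts, and the critical value is a placeholder rather than a simulated one.

```python
import numpy as np

def adf_stat(y):
    """DF t-statistic for b in dy_t = a + b*y_{t-1} + e_t (no lag augmentation; an assumption)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])

def bsadf(y, end, min_win):
    """Steps (i)-(ii): sup of ADF statistics over windows y[start:end] with fixed endpoint."""
    return max(adf_stat(y[start:end]) for start in range(0, end - min_win + 1))

def date_stamp(stats, cv, min_len):
    """Literal reading of the two first-crossing rules.

    Origination: first index with stats > cv.
    Termination: first index at least min_len later with stats < cv.
    Returns (None, None) if no crossing occurs; the termination is None
    if the statistic never falls below cv after the minimal duration.
    """
    stats = np.asarray(stats, dtype=float)
    cv = np.broadcast_to(cv, stats.shape)
    above = np.flatnonzero(stats > cv)
    if above.size == 0:
        return None, None
    start = int(above[0])
    below = np.flatnonzero(stats[start + min_len:] < cv[start + min_len:])
    end = start + min_len + int(below[0]) if below.size else None
    return start, end

# Illustration on a simulated random walk (no bubble by construction).
rng = np.random.default_rng(0)
y = np.cumsum(rng.standard_normal(120))
min_win = 20                                   # stands in for floor(T * r0)
stats = [bsadf(y, end, min_win) for end in range(min_win, len(y) + 1)]
# cv=1.5 is a placeholder, not a simulated critical value; the returned
# indices refer to positions in `stats`, i.e. they are offset by min_win.
print(date_stamp(stats, cv=1.5, min_len=5))
```

Two caveats apply. First, `date_stamp` only returns the first stamped episode; date-stamping several bubbles would require restarting the search after each termination date. Second, under this literal reading every exceedance opens an episode of at least `min_len` observations, whereas the empirical analysis in Sect. 4.3 reports exceedance runs shorter than the minimal duration separately as short-lived periods of explosiveness.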
Equations (19)-(21) illustrate that PSY date-stamping ultimately rests on Dickey-Fuller test statistics, which may lead to unreliable conclusions under heteroscedastic data. In view of this, we additionally consider a modified PSY dating strategy that uses the HLZ sign-based test statistic from Sect. 3.2.2, and which we denote by PSY sign-based. To this end, we first define the sign-based BSADF test statistic $BSADF^{\text{sign-based}}_{r_2}(r_0)$ (Eq. (22)). We then estimate the origination date $\lfloor T r_e \rfloor$ of a bubble via
$$\hat{r}_e = \inf_{r_2 \in [r_0,\, 1]} \left\{ r_2 : BSADF^{\text{sign-based}}_{r_2}(r_0) > sbcv_{r_2}^{\beta_T} \right\}, \tag{23}$$
where $\beta_T$ is the significance level and $sbcv_{r_2}^{\beta_T}$ the $100(1 - \beta_T)\%$ critical value of the SADF sign-based test statistic based on $\lfloor T r_2 \rfloor$ observations. Similarly, we estimate the associated termination date $\lfloor T r_f \rfloor$ via
$$\hat{r}_f = \inf_{r_2 \in [\hat{r}_e + \delta \log(T)/T,\, 1]} \left\{ r_2 : BSADF^{\text{sign-based}}_{r_2}(r_0) < sbcv_{r_2}^{\beta_T} \right\}. \tag{24}$$
We note that HLZ discuss the PSY sign-based dating strategy from Eqs. (22)-(24) and find that the procedure may be unable to estimate the termination date consistently. To overcome this drawback, they propose an alternative procedure that provides consistent estimates of the bubble's origination and termination dates. However, this refined methodology (i) is only able to date-stamp one bubble, and (ii) constitutes an ex-post rather than a real-time strategy. We therefore do not consider it in our analysis.
9 PWY establish a predecessor date-stamping methodology, similar to the PSY procedure. However, the PWY procedure may be inconsistent in the presence of multiple bubbles (see PSY, pp. 1044-1045).
10 PSY point out that the significance level $\beta_T$ may depend on the sample size T and shrink to zero as T → ∞.
A third obvious date-stamping strategy would consist of embedding the bootstrap ADF-statistic from Sect. 3.2.1 into Eqs. (19)-(21) (PSY bootstrap). However, besides the low power values reported in Table 5, the computational burden associated with the bootstrap tests renders the PSY bootstrap date-stamping procedure hard to apply in practice (cf. Footnote 6). We therefore do not present computational results for the PSY bootstrap procedure. 11

Simulation study
We analyze the impact of conditional heteroscedasticity on the performance of PSY date-stamping from Eqs. (19)-(21) and compare the results with the PSY sign-based strategy from Eqs. (22)-(24). We consider the data-generating process suggested in Phillips and Shi (2018), but modify it by imposing TGARCH(1, 1) errors.
11 Recently, Phillips and Shi (2020) propose a new bootstrap date-stamping procedure, which is computationally less demanding than PSY bootstrap based on HLST and is implemented in the R package psymonitor. This new procedure is, however, not practically amenable to our simulation study in Sect. 4.2. As shown in Figs. 2 and 3 of Phillips and Shi (2020), this date-stamping procedure typically requires subjective judgmental decisions (via eyeballing) on the estimated bubble termination and origination dates, which is impossible to execute in an investigation with 10,000 trajectories. We readdress this issue in Sect. 4.3.
Our date-stamping simulation provides the following main findings. (i) Only on rare occasions does the PSY date-stamping procedure fail to stamp any bubble (Row 'No bubble' in Table 7), while, irrespective of the collapse pattern, this occurs more frequently under the PSY sign-based procedure. (ii) Both procedures routinely stamp more than one bubble (Row '> 1 bubble'). For the PSY procedure, this rate of erroneous stamping increases from 20.33% to 39.18% with increasing β values (i.e., with less abrupt collapse patterns). By contrast, this rate decreases from 24.57% to 5.88% with increasing β values for the PSY sign-based procedure.
(iii) In 2 of 3 settings (β = 0.5, 0.9), the PSY sign-based procedure outperforms the PSY procedure in detecting the correct number of exactly one bubble (Row 'One bubble'). (iv) On average, both procedures overestimate the origination date (Row 'Mean ($\hat{r}_e - r_e$)'). 13 This result is stable across the distinct collapse patterns. (v) The PSY procedure exhibits an increasing bias in estimating the termination date, when it comes to less abrupt collapse patterns (Row 'Mean ($\hat{r}_f - r_f$)'). By contrast, the PSY sign-based strategy exhibits a large bias in estimating the termination date in case of a sudden bubble collapse (β = 0.1).
Overall, the PSY procedure outperforms the PSY sign-based procedure in estimating the origination and termination dates of a bubble (in terms of smaller biases) in 5 of 6 settings. However, the PSY procedure often erroneously stamps more than one bubble, especially under the realistic scenarios of disturbing and smooth bubble collapses. 14
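The outcome categories reported in Table 7 ('No bubble', 'One bubble', '> 1 bubble') can be illustrated with a short Python sketch that classifies a single simulated trajectory. The counting rule used here is an assumption, since the text does not spell it out: a bubble is taken to be a maximal run of consecutive statistics above the critical value that lasts at least `min_len` observations. The function names `exceedance_runs` and `classify` are hypothetical.

```python
import numpy as np

def exceedance_runs(stats, cv):
    """Return (start, stop) pairs of maximal runs with stats > cv (stop is exclusive)."""
    above = np.asarray(stats, dtype=float) > np.broadcast_to(cv, np.shape(stats))
    padded = np.concatenate(([False], above, [False]))
    # Positions where the exceedance indicator switches on or off.
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(changes[0::2].tolist(), changes[1::2].tolist()))

def classify(stats, cv, min_len):
    """Label a trajectory by the number of runs lasting at least min_len observations."""
    bubbles = [(s, e) for s, e in exceedance_runs(stats, cv) if e - s >= min_len]
    if not bubbles:
        label = "No bubble"
    elif len(bubbles) == 1:
        label = "One bubble"
    else:
        label = "> 1 bubble"
    return label, bubbles

# Toy sequence of test statistics with one long and one short exceedance run.
toy_stats = [0, 0, 2, 2, 2, 0, 2, 0]
print(classify(toy_stats, cv=1.0, min_len=3))
print(classify(toy_stats, cv=1.0, min_len=1))
```

Applying `classify` to every simulated trajectory and tabulating the labels would give rates of the kind discussed above. Mean deviations of the estimated dates from their true values would then be computed only over trajectories labeled 'One bubble', in line with Footnote 13.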

Empirical analysis
In this section, we date-stamp bubbles in the NASDAQ stock market. We use Thomson Reuters Datastream, which provides daily and monthly observations of the NASDAQ composite dividend yields. The data cover the time-span 2 January 1973-29 December 2017 (T = 540 monthly, T = 11739 daily observations). We obtain the price-dividend ratio as the inverse of the dividend yield. In order to estimate the termination dates via Eqs. (21) and (24), the date-stamping procedures require an assumption regarding the minimal (fractional) bubble duration δ log(T)/T. We impose a minimal duration of 6 months (180 days), implying the (approximate) values δ = 2.2 (monthly observations) and δ = 44.2 (daily observations). In contrast to Sect. 4.2, the data-generating process underlying the NASDAQ is completely unknown. In particular, we know neither (i) the true number of bubbles (0, 1, 2, . . .), nor (ii) the DGP's true heteroscedasticity pattern. A well-documented stylized empirical fact is that financial time series are typically subject to time-varying (and, most likely, overlapping) unconditional and conditional heteroscedasticity patterns of unknown form (e.g., Schwert 1989; HLST; Reher and Wilfling 2016). For example, for the NASDAQ data we may find, among other forms of heteroscedasticity, significant TGARCH effects over certain sampling periods, but not over others (see Footnote 5).
13 In computing the 'Mean deviations' of the estimated origination and termination dates from their true values, we only used those trajectories, for which the procedures stamped exactly one bubble.
14 We also analyzed the PSY date-stamping procedure using infeasibly size-adjusted critical values. Compared to the original (non-adjusted) PSY strategy, we do not find substantial differences in bubble-date estimation accuracy. However, under the infeasibly size-adjusted PSY procedure the rate of erroneously stamping more than a single bubble is considerably lower, irrespective of the collapse pattern.
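As a check on the reported δ values, the following sketch solves the minimal-duration condition δ log(T) = d for δ, where d is the minimal duration in observations. The base of the logarithm is not stated in the text; the sketch assumes base 10, because this choice reproduces the reported values (the natural logarithm would give roughly 0.95 for the monthly sample). The function name `delta_for_min_duration` is hypothetical.

```python
import math

def delta_for_min_duration(min_obs, T):
    """Solve min_obs = delta * log10(T) for delta (base-10 log is an assumption)."""
    return min_obs / math.log10(T)

delta_monthly = delta_for_min_duration(6, 540)       # 6 months, T = 540
delta_daily = delta_for_min_duration(180, 11739)     # 180 days, T = 11739
print(round(delta_monthly, 1), round(delta_daily, 1))  # prints: 2.2 44.2
```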
In this context, we recall that the bootstrap (HLST) and sign-based (HLZ) bubble-detection tests from Sect. 3 only adjust for specific patterns of unconditional heteroscedasticity. Under (conditional) TGARCH heteroscedasticity, however, both tests may exhibit substantial power problems. This latter issue becomes important when it comes to the econometrician's selection of the appropriate bubble date-stamping procedure (PSY, PSY sign-based, and conceivably, PSY bootstrap). Since the NASDAQ-DGP is subject to heteroscedasticity of unknown form, there is, a priori, no econometric justification for preferring one of the three date-stamping procedures over the others. In our subsequent date-stamping analysis, we again discard the PSY bootstrap procedure due to computational infeasibility and apply PSY and PSY sign-based. A further methodological issue pertains to the econometrician's choice of the data frequency. We therefore contrast the NASDAQ date-stamping results for monthly observations with those for daily observations.

Monthly data
We start our date-stamping analysis with the monthly NASDAQ sample (Fig. 4). As the training period, we use the first 47 (out of 540) observations. As described in Sect. 4.1, we investigate explosive behavior in the price-dividend ratio via the intersections of the BSADF and BSADF sign-based test statistics with their corresponding 95% critical values (obtained from 2,000 Monte Carlo replications). The PSY date-stamping procedure, analyzed in the upper panel of Fig. 4, stamps (i) three short-lived periods of explosiveness (shorter than 6 months), indicated by vertical dashed lines, and (ii) four potential bubble periods, indicated by gray shaded areas. The final three of these four periods can be ascribed to (i) the bull market prior to Black Monday in October 1987, (ii) the dotcom bubble with its crash starting at the beginning of 2000, and (iii) the short-term stock-market recovery after the Lehman Brothers insolvency in September 2008. 15 The latest period in particular, stamped during the subprime mortgage crisis, reflects a feature of the PSY procedure that is also reported by other authors: sometimes the PSY procedure identifies collapse periods rather than bubble periods (see, inter alia, Hu and Oxley 2018b). The lower panel of Fig. 4 displays the monthly date-stamping results for the PSY sign-based procedure. The procedure only detects one bubbly period (gray shaded area), lasting from July 1997 to December 1997. Additionally, the procedure marks two short-lived periods of explosiveness (vertical, dashed lines) around (i) May 1987, and (ii) May 1998, the lengths of which fall below our imposed minimal bubble duration of six months. Obviously, the PSY sign-based procedure only slightly signals the bull market prior to the Black Monday crash in 1987, and completely fails to stamp (i) the principal part of the dotcom bubble during the years 1999 and 2000, and (ii) the market recovery after the Lehman Brothers insolvency in autumn 2008.

Daily data
In Fig. 5, we analyze the performance of the two date-stamping procedures for the daily NASDAQ observations, where our training period consists of the first 312 (out of 11739) observations. The PSY date-stamping procedure in the upper panel again identifies a number of short-lived periods of explosiveness. However, since we impose a minimal bubble duration of 180 days (δ = 44.2), the PSY procedure now only identifies three bubbly periods under the daily data. In contrast to monthly data, the collapse period during the subprime mortgage crisis now remains unstamped. Evidently, the PSY-stamped bubbles under daily observations can differ considerably in the origination and/or termination dates from their analogs under monthly data. Concretely, the first bubble in the upper panel of Fig. 5 already starts on 24 March 1983 under daily data, but not before June 1983 under monthly data in the upper panel of Fig. 4 (i.e., nearly 2.5 months later). Similarly, the second bubble under daily data starts on 24 January 1986, but, under monthly data, not before April 1986. On the other hand, the dotcom bubble ends in January 2001 under monthly observations, but already on 6 October 2000 when using daily data. Thus, for the dotcom bubble, the different estimates of the termination and origination dates imply a bubble that lasts 5 months longer under monthly observations.
The daily PSY sign-based NASDAQ date-stamping is displayed in the lower panel of Fig. 5. Obviously, the BSADF sign-based test statistics exceed the critical values over the entire sample, signaling a permanent NASDAQ bubble over 44 years. This clear-cut result for the PSY sign-based procedure again documents an apparent sensitivity of the date-stamping procedures to data frequency shifts. A frequency shift is typically associated with differing econometric properties of the underlying DGP, for example in terms of altering (i) degrees-of-explosiveness (over certain subsamples), and/or (ii) (local) heteroscedasticity patterns. The differing DGP properties can then induce divergent date-stamping results.
We end by noting that the econometrician's imposition of a minimal bubble duration (6 months/180 days in our analysis) at the outset of the date-stamping procedure turns out to be crucial. In Figs. 4 and 5, we mark a number of short-lived periods of explosiveness not recognized as bubbles, due to our arbitrary setting of δ = 2.2 and δ = 44.2 for monthly and daily observations, respectively. The practitioner will frequently be confronted with the judgmental and subjective question of whether a (relatively) short-lived sequence of BSADF statistics exceeding the critical values (i) is to be interpreted as a bubbly period, or (ii) is to be viewed as a statistical artifact. 16

Conclusion
This paper investigates the performance of SADF, GSADF, and several heteroscedasticity-adjusted sup-ADF-style tests in detecting financial bubbles. We address (i) the empirical size of the tests under typical financial-market volatility asymmetries (like the leverage effect), and (ii) the empirical power of the tests in detecting a class of rational bubbles, as proposed by Rotermann and Wilfling (2018). Our Monte Carlo simulations find that the majority of the tests exhibit substantial size distortions when the data-generating process is subject to leverage effects. 17 Moreover, the sup-ADF-style tests often have low empirical power in identifying the (flexible) Rotermann-Wilfling (2018) bubble. As shown in Panels (a) and (b) of Fig. 2, in 21 of our 24 scenarios (= 87.5%), more than 40% of the existing bubbles remain undetected by the sup-ADF-style tests. In addition, we investigate the performance of two real-time bubble date-stamping procedures (PSY and PSY sign-based) in a Monte Carlo simulation. While the PSY procedure outperforms the PSY sign-based strategy in terms of bubble-date estimation accuracy, it frequently stamps non-existing bubbles. Finally, we apply both date-stamping procedures to monthly and daily NASDAQ data over a