Discussion of ‘Detecting possibly frequent change-points: Wild Binary Segmentation 2 and steepest-drop model selection’

We congratulate the author for this interesting paper, which introduces a novel method for the data segmentation problem that works well in a classical change point setting as well as in a frequent jump situation. Most notably, the paper introduces a new model selection step based on finding the 'steepest drop to low levels' (SDLL). Since the new model selection step requires a complete (or at least relatively deep) solution path ordering the change point candidates according to some measure of importance, a new recursive variant of Wild Binary Segmentation (Fryzlewicz in Ann Stat 42:2243–2281, 2014, WBS), named WBS2, has been proposed for candidate generation.


Theoretical properties
One of the main strengths of the proposed methodology, possibly due to the SDLL, is that it can work well both in a change point regime and in a frequent jump regime: In a change point regime the minimum distance to the next change, δ_i := min(η_i − η_{i−1}, η_{i+1} − η_i), is reasonably large, while the magnitude of the change f'_i is bounded from above and can be small (it may even tend to zero as T → ∞). In a frequent jump regime δ_i is small (related to outlier detection) and the corresponding jumps f'_i necessarily need to be large to be detectable. In both situations, an adaptation of Lemma 1 of Wang et al. (2018) shows that no consistent estimator of the change point locations exists when σ^{−2} min_{1≤i≤N} δ_i (f'_i)^2 < log(T). While WBS2.SDLL is shown numerically to perform well in both regimes, the paper does not provide a theoretical underpinning of this good behaviour, in the sense that only a linear-time change point setting, with δ_T := min_i δ_i of the same order as the sample size T, is considered: Such an assumption is not necessary for consistent change point detection and, moreover, it excludes models such as extreme.teeth (ET) and extreme.extreme.teeth (EET), which are reasonably considered as belonging to the frequent jump regime with δ_T ≤ 5.
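To illustrate the detectability condition numerically, the following sketch evaluates σ^{−2} min_i δ_i (f'_i)^2 against log(T) for a hypothetical frequent-jump signal; the numbers (T, δ, h, σ) are our illustrative choices, not taken from the paper.

```python
import numpy as np

# Hypothetical frequent-jump illustration (our numbers, not the paper's):
# a teeth-like signal of length T with a jump of size h every delta points.
T = 1000
delta = 5        # delta_i: distance between neighbouring change points
h = 2.0          # |f'_i|: magnitude of every jump
sigma = 1.0      # noise standard deviation

# Detectability quantity from the adaptation of Lemma 1 of Wang et al. (2018):
# consistent localisation is impossible when this falls below log(T).
snr = sigma**-2 * delta * h**2

print(snr, np.log(T), snr > np.log(T))   # 20.0 > log(1000) ≈ 6.91, so detectable
```

Even with δ_i as small as 5, sufficiently large jumps keep the signal-to-noise quantity above the log(T) boundary, which is the sense in which frequent jumps remain detectable.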
In the future, it will be very exciting to see which theoretical framework will help us to better understand the performance of statistical procedures that aim at handling both regimes simultaneously.
In addition, the best currently available results for the localisation rate attained by WBS, as well as the requirement on the magnitude of changes for their detection, are sub-optimal when δ_T/T → 0 (see Proposition 3.4 of Cho and Kirch (2020)). Baranowski et al. (2019) and Wang et al. (2018) suggest modifications of WBS that alleviate this sub-optimality at the cost of introducing additional tuning parameters, such as a threshold or an upper bound on the length of the random intervals. However, even in these papers, the assumptions are formulated in terms of min_i δ_i · min_i (f'_i)^2, which does not reflect that the strength of multiscale procedures lies in their ability to handle data sets containing both small changes with long distances to neighbouring change points and large changes with shorter distances (see, e.g., the mix model). Cho and Kirch (2020) consider multiscale change point situations by working with min_i δ_i (f'_i)^2 in the theoretical investigation of a more systematic moving sum (MOSUM)-type procedure for candidate generation.
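The distinction between the two formulations can be seen on a small numeric example in the spirit of the mix model (the numbers below are our illustration): one small change with long distances to its neighbours and one large change with a short distance.

```python
import numpy as np

# Hypothetical mix-style configuration (illustrative numbers, not the paper's):
deltas = np.array([200.0, 8.0])    # delta_i: distances to neighbouring changes
jumps = np.array([0.5, 4.0])       # |f'_i|: jump magnitudes

# Assumption formulated separately over scales and magnitudes:
separate = deltas.min() * (jumps**2).min()    # min_i delta_i * min_i (f'_i)^2

# Assumption formulated per change point:
per_change = (deltas * jumps**2).min()        # min_i delta_i (f'_i)^2

print(separate, per_change)   # 8 * 0.25 = 2.0  vs  min(50, 128) = 50.0
```

The separate formulation pairs the shortest distance with the smallest jump, a combination that never occurs in this configuration, and so understates the difficulty threshold by a factor of 25 here; the per-change formulation captures that each individual change is well detectable.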

SDLL with alternative candidate generation methods
As already pointed out by the author, both components of the proposed algorithm, i.e., candidate generation and model selection, can be used in combination with other methods. For example, in Cho and Kirch (2020), a version of WBS2 has been adopted as a candidate generation method for the localised pruning method proposed for model selection. We will now show that deterministic candidate generation methods, such as the multiscale MOSUM procedure (Chan and Chen, 2017; Cho and Kirch, 2020), can be used with SDLL. Our first tentative attempt at generating a complete solution path of candidates, with a reasonable measure of importance attached, is described in Section 3 below. Based on some initial simulation results reported in Table 3.1, we conclude that deterministic candidate generation methods can be a good alternative and that this approach merits further research. Such a deterministic method will always yield the same result when applied to the same data set, whereas WBS-based methods can produce different outcomes in different runs (as observed in Cho and Kirch (2020) on array comparative genomic hybridization data sets). In particular, WBS-based results are reproducible only if the seed of the random number generator is also reported. In Section 4.1 of the present paper, the use of a 'median' of several runs is proposed to reduce this problem, which clearly comes at the cost of additional computation time.
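The reproducibility point can be made concrete with a minimal sketch (our illustration, not the actual WBS2 implementation): the random intervals underlying WBS-type candidate generation coincide across runs only when the seed is fixed.

```python
import random

# Minimal sketch of random interval generation as used conceptually in
# WBS-type methods (our simplification, not the WBS2 code).
def draw_intervals(T, M, seed=None):
    rng = random.Random(seed)
    # M random sub-intervals (s, e) with 0 <= s < e < T
    return [tuple(sorted(rng.sample(range(T), 2))) for _ in range(M)]

run1 = draw_intervals(100, 5, seed=42)
run2 = draw_intervals(100, 5, seed=42)   # same seed: identical intervals
run3 = draw_intervals(100, 5)            # unseeded: generally differs per run

print(run1 == run2)   # True
```

A deterministic candidate generator has no such seed dependence, which is precisely the advantage noted above.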

MOSUM-candidate generation and some simulations
Many of the methods included in the comparative simulation studies of the present paper have been designed for the change point regime, with their default parameters chosen accordingly, e.g., to save computation time. For example, the algorithm referred to as 'MOSUM' in the present paper, implemented in the R package mosum (Meier et al., 2019a), has a tuning parameter that relates to the smallest δ_T permitted, and its default value is set at 10, which we consider a reasonable lower bound for a change point problem. Also, the default choice of the parameter α ∈ [0, 1], which stems from change point testing and sets a threshold for candidate generation in the algorithm, is somewhat conservative (α = 0.1) and not very meaningful in the frequent jump regime. Moving away from the change point regime, we set the minimum bandwidth as small as possible in generating the bandwidth set G, and also set a more liberal threshold with α = 0.9. With these choices, MOSUM shows much better performance than that reported in the present paper; see Table 3.1 below.
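For readers unfamiliar with the detector itself, the following is a minimal sketch of a symmetric moving sum statistic with a single bandwidth G (our simplified version with known σ; the mosum package implements refined variants with variance estimation and the α-dependent threshold):

```python
import numpy as np

# Minimal sketch of a symmetric MOSUM statistic (our simplification):
# M_k(G; X) = |sum_{t=k+1..k+G} X_t - sum_{t=k-G+1..k} X_t| / (sigma * sqrt(2G)).
def mosum_stat(x, G, sigma=1.0):
    x = np.asarray(x, dtype=float)
    c = np.concatenate(([0.0], np.cumsum(x)))   # cumulative sums for O(1) windows
    k = np.arange(G, len(x) - G + 1)            # admissible time points
    right = c[k + G] - c[k]                     # sum over the right window
    left = c[k] - c[k - G]                      # sum over the left window
    return k, np.abs(right - left) / (sigma * np.sqrt(2 * G))

# Piecewise-constant toy signal with a single change at t = 50.
x = np.concatenate([np.zeros(50), 3 * np.ones(50)])
k, stats = mosum_stat(x, G=10)
print(k[np.argmax(stats)])   # → 50: the statistic peaks at the change point
```

The bandwidth G plays the role of the smallest permitted δ_T discussed above: changes closer together than G blur into one window, which is why a small minimum bandwidth is essential in the frequent jump regime.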
Additionally, we explore the possibility of deterministic candidate generation based on moving sum statistics for a given set of bandwidth pairs (G_l, G_r) ∈ G × G: At each scale (G_l, G_r), we identify all k which locally maximise |M_k(G_l, G_r; X)|, and then aggregate the MOSUM statistics generated at multiple scales into a single solution path. Referring to the methodology combining Algorithm 1 with SDLL as MOSUM.SDLL, Table 3.1 shows the results from applying WBS2.SDLL, MOSUM.SDLL (both with λ = 0.9) and MOSUM (with the aforementioned choice of parameters) to ET and EET, summarised over 1000 realisations. All methods perform better for EET than for ET since the signal-to-noise ratio σ^{−2} min_i δ_i (f'_i)^2 is greater for the former (see also Section 1 above).
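The multiscale idea can be sketched as follows; this is our reading of the scheme, and the asymmetric statistic, the local-maximisation window, and the aggregation by importance are our assumptions rather than the exact details of Algorithm 1.

```python
import numpy as np

# Asymmetric MOSUM statistic with bandwidth pair (Gl, Gr): contrast of the
# left-window mean (length Gl) and right-window mean (length Gr) at k
# (our sketch; scaling and details differ from Algorithm 1).
def mosum_asym(x, Gl, Gr, sigma=1.0):
    x = np.asarray(x, dtype=float)
    c = np.concatenate(([0.0], np.cumsum(x)))
    k = np.arange(Gl, len(x) - Gr + 1)
    diff = np.abs((c[k + Gr] - c[k]) / Gr - (c[k] - c[k - Gl]) / Gl)
    return k, diff / (sigma * np.sqrt(1.0 / Gl + 1.0 / Gr))

# Keep each k whose statistic is maximal among its +/- window neighbours
# and exceeds a threshold (window choice is our assumption).
def local_maximisers(k, stats, window, thresh=0.0):
    out = []
    for i in range(len(stats)):
        lo, hi = max(0, i - window), min(len(stats), i + window + 1)
        if stats[i] > thresh and stats[i] == stats[lo:hi].max():
            out.append((int(k[i]), float(stats[i])))
    return out

# Toy signal with changes at t = 30 and t = 40.
x = np.concatenate([np.zeros(30), 2 * np.ones(10), np.zeros(30)])
candidates = []
for Gl, Gr in [(5, 5), (5, 10), (10, 5)]:      # bandwidth pairs from G x G
    k, s = mosum_asym(x, Gl, Gr)
    candidates += local_maximisers(k, s, window=min(Gl, Gr))

# Aggregate across scales: order candidates by their attached importance,
# yielding a deterministic solution path that SDLL-type selection could consume.
candidates.sort(key=lambda c: -c[1])
print(sorted({c for c, _ in candidates}))   # → [30, 40]
```

Because the bandwidth pairs and the scan order are fixed, two runs on the same data produce the same path, in contrast to the randomly drawn intervals of WBS-type methods.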