Abstract
The characteristic dynamic distortions that slightly-aggregating sparse signals induce in a long time series of i.i.d. pure noise are summarized into a much shorter recurrence time process of a chosen extreme event. We first employ the Kolmogorov–Smirnov statistic to compare the empirical recurrence time distribution with the null geometric distribution, which holds when no signal is present in the original time series. The power of this hypothesis test depends on the degree of aggregation of the sparse signals, which ranges from completely randomly distributed singletons to batches of various sizes spread over the entire temporal span. We demonstrate that the Kolmogorov–Smirnov statistic captures the dynamic distortions due to slightly-aggregating sparse signals better than Tukey’s Higher Criticism statistic does, even when the batch size is as small as five. Second, after confirming the presence of signals in the time series, we apply the hierarchical factor segmentation (HFS) algorithm, again based on the recurrence time process, to compute focal segments that contain a significantly higher intensity of signals than the remaining temporal regions. In a computer experiment with a fixed number of signals, the focal segments identified by the HFS algorithm have a signal intensity many times that of the remaining regions, and this ratio also depends critically on the degree of aggregation of the sparse signals. This ratio information can facilitate better sensitivity, equivalent to a smaller false discovery rate, if the signal-discovering protocol implemented within the computed focal regions differs from that used outside them. We also numerically compute the specificity as the total number of signals contained in the computed collection of focal regions, which indicates the inherent difficulty of the task of sparse signal discovery.
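The first step described above, comparing the empirical recurrence time distribution of an extreme event with its null geometric distribution, can be illustrated with a minimal sketch. It is not the authors' implementation. The extreme event (exceeding a fixed threshold of 2.0), the signal model (a mean shift `mu` added to standard normal noise), the batch placement, and all function names are assumptions made for illustration only.

```python
import math
import random


def recurrence_times(series, threshold):
    """Gaps between successive time points at which the series exceeds threshold."""
    hits = [t for t, x in enumerate(series) if x > threshold]
    return [b - a for a, b in zip(hits, hits[1:])]


def geometric_cdf(k, p):
    """P(T <= k) for a geometric waiting time on {1, 2, ...} with event rate p."""
    return 1.0 - (1.0 - p) ** k


def ks_geometric(gaps, p):
    """Kolmogorov-Smirnov distance between the empirical CDF of gaps and Geometric(p)."""
    if not gaps:
        raise ValueError("need at least one recurrence time")
    n = len(gaps)
    counts = {}
    for g in gaps:
        counts[g] = counts.get(g, 0) + 1
    cum, worst = 0, 0.0
    # Both CDFs are step functions that jump only at integers, so checking
    # every integer up to the largest gap is enough to find the supremum.
    for k in range(1, max(gaps) + 1):
        cum += counts.get(k, 0)
        worst = max(worst, abs(cum / n - geometric_cdf(k, p)))
    return worst


def simulate(n, n_signals, batch_size, mu, rng):
    """N(0,1) noise with about n_signals entries shifted by mu, placed in batches.

    Hypothetical signal model: batch starts are uniform at random, and
    overlapping batches are merged, so the realized count can be slightly lower.
    """
    signal_at = set()
    for _ in range(n_signals // batch_size):
        start = rng.randrange(n - batch_size + 1)
        signal_at.update(range(start, start + batch_size))
    return [rng.gauss(0.0, 1.0) + (mu if t in signal_at else 0.0) for t in range(n)]


def demo(seed=0):
    """KS distance from the null geometric law for singletons and batches of five."""
    rng = random.Random(seed)
    threshold = 2.0  # assumed definition of the extreme event
    p_null = 0.5 * math.erfc(threshold / math.sqrt(2.0))  # P(N(0,1) > threshold)
    result = {}
    for batch_size in (1, 5):
        series = simulate(50000, 500, batch_size, 2.0, rng)
        result[batch_size] = ks_geometric(recurrence_times(series, threshold), p_null)
    return result


if __name__ == "__main__":
    print(demo())
```

Calling `demo()` returns the Kolmogorov–Smirnov distance for each batch size. Turning these distances into a test would additionally require a null reference distribution, for example from repeated simulations of pure noise, which this sketch omits.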
References
Abramovich F, Benjamini Y (1996) Adaptive thresholding of wavelet coefficients. Comput Stat Data Anal 22:351–361
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 57:289–300
Cai T, Jin J, Low M (2007) Estimation and confidence set for sparse normal mixtures. Ann Stat 35:2421–2449
Chang L-B, Goswami A, Hsieh F, Hwang C-R (2013) An invariance for the large sample empirical distribution of waiting time between successive extremes. Under review for a special volume on stochastic calculus. In: Hwang CR et al (ed) (2013) Festschrift in honor of Professor S. R. Srinivasa Varadhan on the occasion of his 70th birthday, Academia Sinica, Taipei, Taiwan
Donoho D, Jin J (2004) Higher criticism for detecting sparse heterogeneous mixtures. Ann Stat 32:962–994
Donoho D, Jin J (2008) Higher criticism thresholding: optimal feature selection when useful features are rare and weak. Proc Natl Acad Sci 105:14790–14795
Efron B (2004) Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. J Am Stat Assoc 96:1151–1160
Fushing H, Hwang CR, Lee HC, Lan YC, Horng SB (2006) Testing and mapping non-stationarity in animal behavioral processes: a case study on an individual female bean weevil. J Theor Biol 238:805–816
Fushing H, Chen SC, Pollard KS (2009) A nearly exhaustive search for CpG islands on whole chromosome. Int J Biostat 5, Article 14
Fushing H, Chen S-C, Hwang C-R (2010a) Non-parametric decoding on discrete time series and its application in bioinformatics. Stat Biosci 2:18–40
Fushing H, Chen SC, Lee HJ (2010b) Computing circadian rhythmic patterns and beyond: a new non-Fourier analysis. Comput Stat 24:409–430
Fushing H, Chen SC, Lee HJ (2010c) Statistical computations on biological rhythms I: dissecting variable cycles and measuring phase shifts in activity event time series. J Comput Graph Stat 19:221–239
Fushing H, Ferrer E, Chen SC, Chow SM (2010d) Dynamics of dyadic interaction I: exploring non-stationarity of intra- and inter-individual affective processes via hierarchical segmentation and stochastic small-world networks. Psychometrika 75:351–372
Fushing H, Chen SC, Hwang C-R (2012) Discovering stock dynamics through multidimensional volatility-phases. Quant Financ 12:213–230
Hall P, Jin J (2008) Properties of higher criticism under strong dependence. Ann Stat 36:381–402
Jeng XJ, Cai T, Li H (2010) Robust identification of sparse segments in ultra-high dimensional data analysis. J Am Stat Assoc 105:1156–1166
Jin J (2007) Proportion of nonzero normal means: universal oracle equivalences and uniformly consistent estimates. J R Stat Soc Ser B 70:461–493
Jin J, Cai T (2007) Estimating the null and the proportion of non-null effects in large scale multiple comparison. J Am Stat Assoc 102:496–506
Kac M (1947) On the notion of recurrence in discrete stochastic processes. Bull Am Math Soc 53:1002–1010
Tukey J (1989) Higher criticism for individual significance in several tables or parts of tables. Princeton University, Princeton (Internal working paper)
Additional information
This research is supported in part by the NSF under Grant DMS 1007219 (co-funded by Cyber-enabled Discovery and Innovation (CDI) program).
About this article
Cite this article
Chen, SC., Fushing, H. & Hwang, CR. Discovering focal regions of slightly-aggregated sparse signals. Comput Stat 28, 2295–2308 (2013). https://doi.org/10.1007/s00180-013-0407-8
DOI: https://doi.org/10.1007/s00180-013-0407-8