Analysis of the observed and intrinsic durations of gamma-ray bursts with known redshift

The duration distribution of 408 GRBs with measured both duration $T_{90}$ and redshift $z$ is examined. Mixtures of a number of distributions (standard normal, skew-normal, sinh-arcsinh, and alpha-skew-normal) are fitted to the observed and intrinsic durations using the maximum log-likelihood method. The best fit is chosen via the Akaike information critetion. The aim of this work is to assess the presence of the presumed intermediate GRB class, and to provide a phenomenological model more appropriate than the common mixture of standard Gaussians. While $\log T^{obs}_{90}$ are well described by a truly trimodal fit, after moving to the rest frame the statistically most significant fit is unimodal. To trace the source of this discrepancy, 334 GRBs observed only by $Swift$/BAT are examined in the same way. In the observer frame, this results in a number of statistically plausible descriptions, being uni- and bimodal, and with the number of components ranging from one to three. After moving to the rest frame, no unambiguous conclusions may be put forward. It is concluded that the size of the sample is not big enough to infer reliably GRB properties based on a univariate statistical reasoning only.


Introduction
Gamma-ray bursts (GRBs) are the most powerful explosions known in the Universe, with an emission peak in the 200-500 keV region, and the total isotropic energy released of the order 10 51 -10 54 ergs (for recent reviews, see  Mészáros & Rees 2015). They are also one of the most distant astronomical objects discovered, with the highest known redshift of z ∼ 9.4 measured for GRB090429B (Cucchiara et al. 2011). Mazets et al. (1981) first pointed out hints for a bimodal distribution of T b (taken to be the time interval within which fall 80 − 90% of the measured GRB's intensity) drawn for 143 events detected in the KONUS experiment. Kouveliotou et al. (1993) also found a bimodal structure in the log T 90 distribution of 222 events from CGRO/BATSE, based on which GRBs are commonly divided into short (T 90 < 2 s) and long (T 90 > 2 s) classes, where T 90 is the time interval from 5% to 95% of the accumulated fluence. While generally short GRBs are of merger origin (Nakar 2007) and long ones come from collapsars (Woosley & Bloom 2006), this classification is imperfect due to a large overlap in duration distributions of the two populations (Lü et al. 2010;Bromberg, Nakar & Piran 2011;Bromberg et al. 2013;Shahmoradi 2013;Shahmoradi & Nemiroff 2015;Tarnopolski 2015c). Horváth (1998) and Mukherjee et al. (1998) independently discovered a third peak in the duration distribution in the BATSE 3B catalog, located between the short and long groups, and the statistical existence of this intermediate class was claimed to be supported (Horváth 2002) with the use of BATSE 4B data. Interestingly, using clustering techniques, Chattopadhyay et al. (2007) established the optimal number of classes to be three, too. Also in Swift/BAT data evidence for a third component in log T 90 was announced (Horváth et al. 2008;Zhang & Choi 2008;Huja, Mészáros &Řípa 2009;Horváth et al. 2010;Zitouni et al. 2015). Other datasets, i.e. RHESSI (Řípa et al. 2009) and BeppoSAX (Horváth 2009), are both in agreement with earlier results regarding the bimodal distribution, and the detection of a third component was established on a lower, compared to BATSE and Swift, significance level. Thence, four different satel-lites provided hints about the existence of a third class of GRBs.
Those conclusions were based on the finding that a mixture of three standard Gaussians (a 3-G) is a better fit than a mixture of two Gaussians (a 2-G). This is not surprising, because adding parameters to a nested model always results in a better fit (in the sense of a lower χ 2 or a higher maximum log-likelihood L) due to more freedom given to the model to follow the data. The important questions are whether this improvement is statistically significant, can the three components be related to physically distinct classes, and whether the model is an appropriate one-is there a model that is a better fit? (See Tarnopolski 2015a,b for a discussion.) However, even quantifying the relative improvement via p-values 1 is not a definite detection of another physical class of astronomical objects. All of the post-BATSE 3B fits were bimodal, not trimodal, even if comprised of three components. The third peak in the BATSE 3B sample (Horváth 1998) was smeared out with the BATSE 4B catalog when more data was gathered (see Fig. 5 in Zitouni et al. 2015). It was suggested by Zitouni et al. (2015) that the duration distribution corresponding to the collapsar scenario might not necessarily be symmetrical because of a non-symmetrical distribution of envelope masses of the progenitors. Specifically, it was shown by Tarnopolski (2015a) that the log T 90 distribution of GRBs detected by Fermi is also bimodal for several binnings. Moreover, a number of intrinsically skewed distributions were fitted to the data of BATSE, Swift and Fermi (Tarnopolski 2015b), and it was found that mixtures of two skewed components follow the data at least as good (BATSE and Swift), or better (Fermi) than a conventionally used 3-G, and that they are bimodal as well (in the sense of having two local maxima; Schilling, Watkins & Watkins 2002). Generally, n-modality is commonly associated with n populations underlying a distribution. Hence, the existence of an intermediate GRB class is unlikely.
The analysis of the observed durations was performed by many authors, as reviewed above. However, the intrinsic duration-the one in the rest frame-of a GRB is affected by its cosmological distance, and is 1 If one has two fits with χ 2 (ν 1 ) and χ 2 (ν 2 ), then their difference, ∆χ 2 , is distributed like χ 2 (∆ν), where ∆ν is the difference in the degrees of freedom (see Appendix A in Band et al. 1997, andHorváth 1998). Alternatively, if one uses the log-likelihood to assess the goodness of fit, then twice their difference, 2(L 1 − L 2 ), is distributed like χ 2 (∆ν). If a p-value associated with either of the two versions of χ 2 (∆ν) does not exceed the significance level α, one of the fits (with lower χ 2 or higher L) is statistically better than the other (Horváth 2002). It is crucial to note that these methods may be applied to nested models only.
shorter than the observed one: Considering the median redshift of long GRBs,z long ≈ 2, it is evident that GRBs with T obs 90 6 s have an intrinsic duration generally smaller than 2 s, which makes them short ones. Note that the classification of short GRBs is the same in both the observer and rest frames. The analysis of the T int 90 distribution was performed rarely due to a small number of GRBs with measured redshift: Zhang & Choi (2008) examined 95, Huja, Mészáros &Řípa (2009) analyzed 130, and Zitouni et al. (2015 investigated 248 Swift GRBs. While Zhang & Choi (2008) focused on the apparent bimodality, and Huja, Mészáros &Řípa (2009) did not translate the observed durations to the rest frame, Zitouni et al. (2015) found that a 3-G follows the Swift data better than a 2-G (in observer as well as in the rest frame; see their Figs. 6 and 7). However, in both frames the distributions were bimodal, yet apparently skewed, and hence the existence of an intermediate class is still unlikely. The plausible explanation of this phenomenon is that there are two GRB classes with intrinsically nonsymmetrical duration distributions.
The aim of this article is to perform a statistical analysis of the GRBs with measured redshift in order to test against the existence of the intermediate GRB class. Mixtures of various distributions (standard Gaussians, skew-normal, sinh-arcsinh and alpha-skew-normal) are applied to verify whether the statistical significance of a three-Gaussian fit might by challenged by a mixture of skewed distributions with only two components. Both the observed and intrinsic durations are examined.
This article is organized as follows. In Sect. 2 the dataset, fitting methods and the properties of the examined distributions are described as outlined by Tarnopolski (2015b). In Sect. 3 the study of the sample of all GRBs with measured redshift is presented. This is followed by an analysis of Swift GRBs with known redshift in Sect. 4. Section 5 is devoted to discussion, and in Sect. 6 concluding remarks are given.

Dataset
A sample of 408 GRBs with measured both the observed durations T obs 90 and redshifts z is used 2 . It contains 334 GRBs detected by Swift, constituting the second sample examined herein. The sample of all GRBs consists of 386 long GRBs and 22 short ones. The latter all come from Swift observations, except one that was detected by HETE (GRB040924). A scatter plot of the data on a redshift-logarithm of duration plane is drawn in Fig. 1. The median redshifts for short and long GRBs are equal toz short = 0.72 andz long = 1.76, respectively. The intrinsic durations are calculated according to Eq. (1). Distributions of the log T 90 for the observed and intrinsic durations are examined hereinafter, and are displayed in Fig. 2 for the sample of all GRBs.

Fitting method
Two standard fitting techniques are commonly applied: χ 2 fitting and maximum likelihood method (ML). For the first, data needs to be binned, and despite various binning rules are known (e.g. Freedman-Diaconis, Scott, Knuth etc.), they still leave place for ambiguity, as it might happen that the fit may be statistically significant on a given significance level for a number of binnings Koen & Bere 2012;Tarnopolski 2015a). The ML method is not affected by this issue and is therefore applied herein. However, for display purposes, the binning was chosen based on the Freedman-Diaconis rule.
Having a distribution with a probability density function (PDF) given by f = f (x; θ) (possibly a mixture), where θ = {θ i } p i=1 is a set of p parameters, the log-likelihood function is defined as are the datapoints from the sample to which a distribution is fitted. The fitting is performed by searching a set of parametersθ for which the log-likelihood is maximized (Kendall & Stuart 1973). When nested models are considered, the maximal value of the log-likelihood function, L max ≡ L p (θ), increases when the number of parameters p increases.
For nested as well as non-nested models, the Akaike information criterion (AIC) (Akaike 1974;Burnham & Anderson 2004;Liddle 2007;Tarnopolski 2015b) may be applied. The AIC is defined as A preferred model is the one that minimizes AIC. The formulation of AIC penalizes the use of an excessive number of parameters, hence discourages overfitting. It prefers models with fewer parameters, as long as the others do not provide a substantially better fit. The expression for AIC consists of two competing terms: the first measuring the model complexity (number of free parameters) and the second measuring the goodness of fit (or more precisely, the lack of thereof). Among candidate models with AIC i , let AIC min denote the smallest. Then, where ∆ i = AIC i − AIC min , can be interpreted as the relative (compared to AIC min ) probability that the i-th model minimizes the AIC. 3 The AIC is suitable when N/p is large, i.e. when N/p > 40 (Burnham & Anderson 2004, see also references therein). When this condition is not fulfilled, a second order bias correction is introduced, resulting in a small-sample version of the AIC, called AIC c : The relative probability is computed similarly to when AIC is used, i.e. Eq. (4) is valid when one takes ∆ i = AIC c,i − AIC c,min . Thence, It is important to note that this method allows to choose a model that is best among the chosen ones, but does not allow to state that this model is the best among all possible. Hence, the probabilities computed by means of Eq. (6) are the relative, with respect to a model with AIC c,min , probabilities that the data is better described by a model with AIC c,i . What is essential in assessing the goodness of a fit in the AIC method is the difference, ∆ i = AIC c,i − AIC c,min , not the absolute values of the AIC c,i . 4 If ∆ i < 2, then there is substantial support for the i-th model, and the proposition that it is a more proper description is highly probable. If 2 < ∆ i < 4, then there is strong support for the i-th model. When 4 < ∆ i < 7, there is considerably less support, and models with ∆ i > 10 have essentially no support (Burnham & Anderson 2004;Biesiada 2007).

Distributions and their properties
In nearly all researches conducted so far on the GRB duration distribution, three components were found to 3 Relative probabilities of the models normalized to unity are called the Akaike weights, w i . In Bayesian language, Akaike weight corresponds to the posterior probability of a model (under assumption of different prior probabilities; Biesiada 2007). 4 The AIC value contains scaling constants coming from the loglikelihood L. One might conisder ∆ i = AIC c,i − AIC c,min a rescaling transformation that forces the best model to have ∆ min = 0, and so ∆ i are free of such scaling constants (Burnham & Anderson 2004).
describe the observed distribution statistically better than a mixture of two components. However, in all previous analyses a mixture of standard (non-skewed) Gaussians was fitted. This might possibly lead to erroneous conclusions, as describing a non-symmetrical distribution by a mixture of symmetrical components will eventually lead to overfitting (some of the twocomponent skewed distributions considered below are characterized by fewer free parameters than a standard three-Gaussian). Moreover, Zitouni et al. (2015) suggested that the duration distribution of long GRBs might not necessarily be symmetrical because of a nonsymmetrical distribution of envelope masses of the progenitors. Since McBreen et al. (1994) observed that the distribution of log T 90 may be in form of a mixture of standard Gaussians, many authors followed this approach and also restrained the analysis to nonskewed normal distributions Kouveliotou et al. 1996;Horváth 1998Horváth , 2002Horváth et al. 2008;Zhang & Choi 2008;Horváth et al. 2008;Horváth 2009;Huja, Mészáros &Řípa 2009;Huja & Rípa 2009;Horváth et al. 2010;Koen & Bere 2012;Barnacka & Loeb 2014;Tarnopolski 2015a). Therefore, in light of the suggestion of Zitouni et al. (2015) that the T 90 distributions underlying the two well-established GRB classes (Kouveliotou et al. 1993;Woosley & Bloom 2006;Nakar 2007) may not be symmetrical (Tarnopolski 2015a), the following distributions are considered herein.

Study of the complete z sample
The biggest number of free parameters among the examined PDFs is p = 14 in the case of a 3-SAS. Combined with N = 408 for all GRBs, or N = 334 for the Swift subsample, this implies N/p < 40. Therefore, in what follows the AIC c is used instead of the AIC.
First, the sample of 408 GRBs with measured both redshift and duration is examined. The PDFs, given by Eq. (7)-(10), with k = 2 or 3, are fitted to the log T 90 distributions using the ML method from Sect. 2.2. Next, the AIC c is calculated according to Eq. (5). The best fit among the examined is the one that yields the smallest AIC c .
The results of the fitting procedure applied to the observed durations are gathered in Table 1, and the fits, in graphical form, are displayed in Fig. 3. Contrary to previous reasearches, all of the three-component PDFs (3-G, 3-SN, and 3-SAS, where the support for the latter is weak) are trimodal, and the third peak is located in the area of the presumed intermediate GRB class, i.e. within the range 2−10 s. To assess its significance more easily, the AIC c and relative probabilities are plotted in Fig. 4. The PDF with minimal AIC c is a conventional 3-G, and the second best fit is a 3-SN, with a relative probability of 90.6%. A 2-SN, however, has substantial support, too, due to ∆ 2−SN = 1.393. The remaining two-component fits (2-G and 2-SAS), as well as a 1-ASN, yield a strong support having 2 < ∆ i < 4, but the evidence is weaker than for the former three models. The remaining, 3-SAS and 2-ASN, have considerably less or no support.
The picture revealed by the rest frame duration distribution, T int 90 , is different. As displayed in Fig. 5, the 3-SN and 3-SAS are also trimodal, and the 3-G, with the durations being systematically shifted leftwards comparing with the observed durations, lost its third peak, leaving a bimodal distribution with a prominent shoulder in the area of the presumed intermediate GRBs. The parameters of the fits are gathered in Table 2. A remarkably different picture, compared to the result of the T obs 90 analysis, follows from the AIC c plot in Fig. 6. It turns out that the intrinsic durations are best described by a conventional 2-G; the second best model is a 3-G, having a relative probability of 17.1%. The other models have considerably less or almost no support. This suggests that the intrinsic T 90 distribution may be indeed bimodal.
To conclude this Section, the T obs 90 distribution is possibly trimodal, and in the rest frame, due to the properties of Eq. (2), it turns into a bimodal. Table 1 Parameters of the fits for the observed durations. Label corresponds to labels from Fig. 3 and 4

. The smallest
AICc is marked in bold, and p is the number of parameters in a model.

Label
Dist    Fig. 6 The same as Fig. 4, but for intrinsic durations.

Study of the Swift subsample
The reason for examinig separately a smaller sample of 334 GRBs detected only by Swift is the fact that the T 90 distributions and other features, e.g. sensitivity in different energy bands, are detector dependent (e.g., Horváth et al. 2006;Horváth 2009;Huja, Mészáros &Řípa 2009;Horváth et al. 2010;Bromberg et al. 2013;Zitouni et al. 2015;Tarnopolski 2015b,c), and thus the sample examined in the previous Section 3 might be biased. The majority of GRBs with known redshift comes from Swift, and hence one might consider the detections made by other satellites a contamination that falsifies the outcome. Therefore, it is desired to investigate a sample in which all observations were made by the same instrument. The analysis is performed in the same way as it was done in Sect. 3. Again, the observed durations are examined first. A striking difference is that all of the fits are at most bimodal, and unimodal when a 2-G, 1-ASN or 2-ASN is considered. The parameters of the fits are gathered in Table 3, and the fitted curves are displayed in Fig. 7. The uni-or bimodality is consistent with previous analyses performed on more complete samples of Swift GRBs (Horváth et al. 2008;Huja, Mészáros & Rípa 2009;Zitouni et al. 2015;Tarnopolski 2015b), and the curves for three-component fits show a prominent shoulder on the left-hand side of the peak related to long GRBs.
The AIC indicates that the distribution of log T obs 90 is best described by a 3-G. The next two best fits, a 1-ASN and a 2-SN, have a ∆ i < 2, and hence yield strong support in their favor. Next, a 2-G has a relative probability of 24.3% of being a more proper model. The remaining models have considerably less support. It follows that in case of the observed durations one cannot discern reliably the best description among a one-or two-component PDFs, what is also consistent with the previous analyses, as the Swift detection rate is heavily biased towards long GRBs (the ratio of short to long GRBs is < 1 : 14), hence the sample is strongly dominated by long GRBs. Hence, combined with the relatively low number of redshift-equipped GRBs, it appears that due to this domination is unambiguous, in terms of modality, classification of the T int 90 distribution is uncertain.
When the intrinsic durations are considered, there appear some trimodal fits (see Fig. 9). Surprisingly, the model with the lowest AIC is a bimodal 2-ASN, while the second best fit is achieved by a (also bimodal) 3-G, having a relative probability of 13.9%. The remaining models have signinificantly less support (compare with Table 4 and Fig. 10). The 2-ASN consists of a bimodal and a unimodal component. The former consists of two peaks with comparable height, and is visually very symmetric. The latter is skewed, with its mode placed near the peak of the bimodal component that corresponds to long GRBs. Hence, the overall role of the unimodal component is to rescale the bimodal one in a nonlinear way in order to follow the data. The structure of this fit is unusual and unexpected, as in the previous samples the 2-ASN model did not perform very well, being one of the worst fits.

Discussion
Generally, inferring an existence or lack of thereof, based on statistical evidence, must be done with care. Having samples with limited size adds difficulty to such an assessment, as in small samples there is more room for statistical fluctuations that might obscure the global picture. Previous researches, cited in Sect. 1, mostly imply that a 3-G fit is a better descriptive model than a 2-G. Nevertheless, the fits achieved were bimodal, indicating the presence of only two GRB classes (Tarnopolski 2015a). A remarkable exception was the BATSE 3B dataset (Horváth 1998), where the third peak had a negligible probability of 10 −4 to be a chance occurence. It turnt out, however, that a bigger dataset obtained by the same instrument did not reveal its presence anymore (Horváth 2002). Examining the observed, instead of intrinsic, durations might also cast doubts on the reality of the observed phenomenon. Having that in mind, it is tempting to state that the intermediate GRB class is unlikely to be a real class based on the analysis of 408 GRBs with known both T 90 and redshift. This statement could be justified with the results presented in Sect. 3, where the two best models to describe the log T obs 90 distribution were trimodal, but after moving to the rest frame, the most plausible description was provided with a conventional 2-G. It may appear that the intrinsic durations should trace the physical context of the GRBs more appropriately.
On the other hand, the GRB characteristics are not only sample-dependent, as showed above, but also detector-dependent (e.g., Horváth et al. 2006;Horváth 2009;Huja, Mészáros &Řípa 2009;Horváth et al. 2010;Bromberg et al. 2013;Zitouni et al. 2015;Tarnopolski 2015b,c). Thereofore, lacking a dataset numerous enough for the statistics to provide a convincing proof, one may only claim evidence, or its lack, in a specific sample under consideration (see also Tarnopolski 2015a,b). To get rid of the detectordependency, only 334 GRBs as detected by Swift were examined. The outcome of this analysis, shown in Table 3 Parameters of the fits for the observed Swift durations. Label corresponds to labels from Fig. 7 and 8. The smallest AICc is marked in bold, and p is the number of parameters in a model.

Label
Dist   Sect. 4, is surprisingly inconsistent with the one from a bigger sample in Sect. 3, being at the same time consistent with previous analyses performed on a bigger sample of Swift GRBs-the obtained fits are all uni-or bimodal, and the one with the lowest AIC c is a bimodal 3-G; the next best fits were a unimodal 1-ASN and a bimodal 2-SN. Both yield strong evidence in their favor, so it is not possible to unambiguously infer the number of components, or even the modality of the Swift sample 5 . After moving to the rest frame, the problems are not solved, especially that the best fit now is a 2-ASN. The problem with this distribution is that it consists of a bimodal component [see Fig. 9 (h)], with locations of its peaks in agreement with the groups of short and long GRBs. It seems like the role of the second component here is to merely adjust the height of the fit. The second best fit, a bimodal 3-G, has a relative probability of 13.9%. While this is a statistically valid result, meaning that among the exmined distributions the 2-ASN is best balanced between the goodness of fit and the number of parameters, from the physical point of view, regarding the knowledge about GRBs, this result is an unrealistic one, as the short and long GRBs are known to stem from different progenitors, mergers and collapsars, respectively. Even after dismissing the 2-ASN, differentiating between a 3-G and 1-ASN is not possible in the framework of the AIC c , as these two models yield ∆ i = 0.535. Hence, the currently available redshift distribution unfortunately does not allow to infer the existence of the intermediate GRBs class reliably, likely due to the smallness of the sample.

Conclusions
The research conducted so far on different samples of GRB duration distributions indicate that a 3-G follows the data better than a 2-G. However, even with three components, the fitted distribution is usually bimodal, implying two physical classes. Because a twocomponent mixture of skewed distributions was shown to be a statistically better fit, in case of Fermi/GBM observations, than the commonly applied 3-G (Tarnopolski 2015b), in this paper the same approach was undertaken to investigate the modality and goodness of fit in case of GRBs with measured both redshift and duration. The reason for this is that in the rest frame the effects of cosmological factors are mostly elliminated, hence it is expected that it will provide an insight into the properties of GRBs.
It was found that in a sample of 408 GRBs with known redshift, the best fits-3-G and 3-SN-are trimodal (in the sense of having three local maxima), but after moving to the rest frame, a (unimodal) 2-G yielded considerably stronger support than any other examined distribution. However, this sample is dominated by detections made by Swift/BAT (334 events, ≈ 82% of the total number of GRBs with measured z), and hence this finding might be affected by the fact that GRB properties are detector-dependent. Therefore, the Swift/BAT subsample was also examined, and it turnt out that it is not possible to reliably infer the best fit within the chosen information-theoretic framework (AIC c in this work). This may be caused by the smallness of the sample, and so the solution is to, hopefully, repeat the analysis in the future on a wider GRB sample (see also Zhang & Xie 2007). Because the mathematical model of the observed as well as intrinsic durations is still lacking, the physical interpretation of the results obtained herein is limitted. The distribution of intrinsic durations, being systematically shifted towards shorter values, while may be believed to trace the properties of GRB population more accurately, is also affected by statistical fluctuations. Considering the Swift subsample, the distributions are strongly dominated by long GRBs, what might cause introduction of biases in the analysis undertaken.