Introduction

The brain is a system of billions of neurons designed to perceive and respond to external stimuli in a highly nonlinear and complex way. Reaction times (RTs), broadly defined as the time lapse between the presentation of a stimulus and a response to it, have proven to be a ubiquitous metric, extensively used in experimental psychology, cognitive neuroscience, psychophysiology and behavioural neuroscience to explain the mechanisms supporting higher- and lower-order cognitive processes. Research suggests there is a tight coupling between neuronal RTs and behavioural RTs (see Palmer et al. 2007; Galashan et al. 2013; Mukamel and Fried 2012; Levakova et al. 2015) and that neural processing, signal transmission, and decision processes are bundled within that time lapse (see Voelker et al. 2017; see Figure 1 in Commenges and Seal (1986) for a representation of the association between stimulus, neuronal RTs, and behavioural RTs). The importance of investigating RTs has been demonstrated via computational, human- and animal-based behavioural approaches. From a computational perspective, for example, it has been shown that behavioural RTs can be used as biomarkers for characterising neurological diseases (Maia and Kutz 2017). From a human behavioural perspective, evidence shows that RTs can indeed allow differentiation between certain brain diseases. For example, Jahanshahi et al. (1993) showed that patients with Parkinson’s disease, Huntington’s disease and cerebellar disease exhibit different average RTs, and Osmon et al. (2018) demonstrated the clinical advantage of examining the distributions of RTs from patients with Attention-Deficit/Hyperactivity Disorder and from neurotypical participants. Animal-based research has shown how and when specific sets of neurons fire in response to selected stimuli under highly controlled experimental settings (e.g. Sun et al. 2019; Luna et al. 2005; Múnera et al. 2012; Veit et al. 2014).

Various distributions, including the Inverse-Gaussian (or Wald), Weibull, Log-Normal, Weibull-Gaussian, and Gamma, have been used to fit behavioural and neural reaction time data. For example, Palmer et al. (2007) used the Weibull distribution, Leiva et al. (2015) used the Birnbaum–Saunders distribution, and Seal et al. (1983) used the Gamma distribution. Other distributions have recently been proposed for modelling RT data. For example, Tejo et al. (2018, 2019) and Martínez-Flórez et al. (2020) have put forward the shifted Birnbaum–Saunders, shifted Gamma, and Exponential-Centred Skew-Normal distributions, respectively, as good fits to RTs. Similarly, Osmon et al. (2018) fitted the participants’ RTs with the three-parameter Johnson’s SU distribution, and Foroni et al. (2020) found the four-parameter Sinh-Arcsinh distribution gave the best fit for simple choice RTs. However, the distribution most commonly used to fit RT data is the Exponential Gaussian (also known as the Exponentially Modified Gaussian or Ex-Gaussian distribution; here referred to as EG). This is a three-parameter distribution that can fit the data’s location, scale and rightward exponential shape. The EG hence shows a positive skew, which is the canonical shape of both neuronal and behavioural RTs [e.g. see Figure 4 in Hanes and Schall (1996), Figure 2 in Hauser et al. (2018), and Figure 2 in Mormann et al. (2008) for neuronal RT shapes, and Figure 7 in Osmon et al. (2018) and Figures 3C, 4C, and 5B in Fischer and Wegener (2018) for behavioural RT shapes]. Although flexible enough to fit the typical positively skewed distribution of RTs, the EG cannot fit RTs that exhibit normal-like or even negative skew. Normal-like and negatively skewed shapes have been reported for rate RTs (reciprocal RTs, i.e. 1/RT) [see Figure 2A in Harris et al. (2014)] and recognition (go/no-go) RTs [see Figure 2 in Limongi et al. (2018)], respectively.

In this article, a four-parameter distribution called the Generalised Exponential Gaussian is proposed. Like the EG distribution, the proposed distribution can fit positively skewed RT shapes, but it has the additional advantage of also fitting Gaussian-like and negatively skewed RT shapes. The article unfolds as follows: first, statistical arguments in favour of asymmetric distributions are presented; second, the properties of the EG are outlined; third, the details of the proposed distribution are described; fourth, the results of a computer simulation examining the properties of the proposed distribution are reported; fifth, the proposed distribution is illustrated via four data sets; sixth, generalised additive models for location, scale and shape (GAMLSS) are briefly presented, as this is the only existing regression framework suitable for modelling data in a fully distributional fashion; and finally, the statistical and practical implications of the proposed distribution are discussed in relation to GAMLSS.

Asymmetric distributions as better alternatives to model non-normal data

Most statistical analyses featured in published research rely on techniques that assume, among other things, that data follow a normal distribution. In practice, however, data tend to follow non-normal shapes [see Bono et al. (2017)]. When faced with non-normal distributions, the common approach is to transform the numeric variables. Although transformations can indeed be successful, they bring challenges in relation to the interpretability of the new metric and back-transformation (Marmolejo-Ramos et al. 2015; Pek et al. 2018; Vélez et al. 2015). Specifically, it is not always possible to find a back-transformation that enables interpretation of the parameter estimate, and this issue is more pronounced when there are several variables with different transformations [see Azzalini and Capitanio (1999)]. Distributions more flexible than the traditional Gaussian have been proposed to overcome these challenges. These newly proposed distributions enable data with different degrees of asymmetry and kurtosis to be tackled. Some of these distributions are the Skew-Normal [SN; Azzalini (1985)], Power-Normal [PN; Durrans (1992); Pewsey et al. (2012)], and Skew-Normal Alpha-Power [here called SNAP for short; Martínez-Flórez et al. (2014)].

The SN model is defined by the probability density function (PDF)

$$\begin{aligned} \varphi (z;\lambda )=2\phi (z)\Phi (\lambda z), \quad z\in {\mathbb{R}}, \end{aligned}$$
(1)

where \(\phi\) and \(\Phi\) are the density and distribution functions, respectively, of the standard normal distribution, and \(\lambda\) is a skewness parameter. This distribution is denoted by \(Z\sim {SN(\lambda )}\) and, in addition to the work by Azzalini (1985), it has been extensively studied by Henze (1986), Chiogna (1998), Gómez et al. (2007) and Pewsey et al. (2012).

Alternatively, Lehmann (1953) proposed a family of distributions with distribution function \(F(z;\alpha )=\{F(z)\}^{\alpha }\), where F is a distribution function with \(z\in {\mathbb{R}}\), and \(\alpha \in \mathbb{Z}^+\). In general terms, this distribution is generated from the distribution of the maximum of the sample. This model is known in the literature as Lehmann’s alternative model and is widely discussed by Gupta and Gupta (2008).

The PN model, denoted by \(Z\sim PN(\alpha )\) and introduced by Durrans (1992), has the following PDF:

$$\begin{aligned} f(z,\alpha )=\alpha \phi (z)\left\{ \Phi (z)\right\} ^{\alpha -1},\quad z \in {\mathbb{R}}, ~~\alpha \in \mathbb{R}^+, \end{aligned}$$
(2)

where \(\phi\) and \(\Phi\) are the density and distribution functions, respectively, of the standard normal distribution and \(\alpha\) is a shape parameter. This distribution has multiple applications in cases where data cannot be handled through normal distributions and, instead, data present high or low asymmetry and/or kurtosis.
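As a quick numerical check of Eq. (2), the PN density can be coded directly. The sketch below (Python with SciPy; the function names are ours, introduced for illustration) verifies that the density integrates to one, collapses to the standard normal when \(\alpha =1\), and has the closed-form CDF \(\Phi (z)^\alpha\) of a Lehmann-type alternative.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def pn_pdf(z, alpha):
    """Power-normal density (Eq. 2): alpha * phi(z) * Phi(z)**(alpha - 1)."""
    return alpha * norm.pdf(z) * norm.cdf(z) ** (alpha - 1)

def pn_cdf(z, alpha):
    """Power-normal CDF: Phi(z)**alpha."""
    return norm.cdf(z) ** alpha

# alpha > 1 pushes probability mass to the right; alpha < 1 to the left
```

Because the CDF is simply \(\Phi (z)^\alpha\), the density above is its exact derivative, which makes the integral-to-one check immediate.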

Martínez-Flórez et al. (2014) proposed the SNAP distribution (\(SNAP(\lambda ,\alpha )\)), a more flexible extension of the previous two distributions. This distribution not only contains the SN and PN distributions as special cases, but it also includes the normal distribution. The PDF of the SNAP distribution is as follows:

$$\begin{aligned} f(z;\lambda ,\alpha )=\alpha \phi _{SN}(z;\lambda )\left\{ \Phi _{SN}(z;\lambda )\right\} ^{\alpha -1}, \quad z, \lambda \in {\mathbb{R}}, ~~\alpha \in \mathbb{R}^+. \end{aligned}$$
(3)

The SN, PN, and SNAP distributions illustrate the versatility of asymmetric and generalised distributions for fitting non-Gaussian data (Table 1 provides a summary of key aspects of these distributions). Although it could be argued that these distributions provide good-enough fits to RT data, they have not been investigated in the context of RT experiments. The EG distribution, however, is the most commonly used distribution to fit RTs obtained in (neuro)psychological experiments [see Dawson (1988)].

Table 1 Summary of the asymmetric distributions skew-normal (SN), power-normal (PN), and alpha-power skew-normal (SNAP)

The Ex-Gaussian distribution and its statistical properties

This probability model was introduced by Hohle (1965) through the convolution of the PDFs of two independent random variables, one Normal and one Exponential. The distribution was conceived between 1956 and 1963 (Christie and Luce 1956) while searching for a model that could represent the disjunctive structure of RTs; that is, the RT is composed of an exponentially distributed decision time plus a variable motor time. After studying the cumulative distribution function (CDF) of the RT for different intensities of auditory stimuli, McGill (1963) concluded that the variable had an exponential form with similar time constants. This led to the assumption that at least one component of the total RT had an exponential distribution. Given that the time constants implied by the curves appeared to be almost independent of stimulus intensity, McGill (1963) assumed that this component was the time required for the motor response, while the other component was assumed to be the time taken to make a decision. Specifically, the EG’s exponential component has a constant average response \(\tau\) (Luce 1986; McGill 1963) while the remaining part of the EG follows a normal distribution \(N(\mu ,\sigma ^{2})\) (Hohle 1965). These statistical properties of the EG distribution are described in more detail in “Appendix” [see also section 13.3.2.1 in Rigby et al. (2020)].

The generalised exponential-Gaussian distribution

Based on the distribution of the sample’s maximum, Lehmann (1953) proposed the family of distributions \(\mathbb{F}_F=\{F(x)\}^\alpha\), where \(F(\cdot )\) is an absolutely continuous distribution function and \(\alpha\) is a rational number. In the context of hydrology, Durrans (1992) extended this model to the family of distributions \(\mathbb{G}_F=\{F(x)\}^\alpha\), where \(F(\cdot )\) is a distribution function and \(\alpha \in {\mathbb{R}}^+\). This family of distributions has PDF \(g(x)=\alpha f(x)\{F(x)\}^{\alpha -1}\), where \(f=F'\) is the density of F, and is known as the distribution of fractional order statistics [see also Stigler (1977)]. When \(F=\Phi\) (i.e. the standard Normal CDF), it is known as the generalised normal distribution. This is a flexible distribution that can be conceived of as an extreme-value-type distribution, given that \(\alpha \in {\mathbb{R}}^+\) generalises the integer exponent arising from the distribution of the maximum. Also, this distribution can fit data with positive or negative skews. Finally, when \(\alpha =1\), the original distribution’s PDF \(f(\cdot )\) is recovered.

Following the above-mentioned work by Durrans (1992), a generalisation of the EG distribution \(f_{EG}(\cdot )\), based on the distribution of the maximum, is proposed by considering an \(\alpha\)-fractional order statistic where \(\alpha \in {\mathbb{R}}^+\). That is, the goal is to extend the EG distribution by incorporating a new parameter that controls the skewness and kurtosis of the distribution, i.e. its shape. As the new distribution attains skewness and kurtosis values above and below those possible for the original \(f_{EG}(\cdot )\), it is hence much more flexible than the traditional EG distribution in accommodating skewness and kurtosis.

According to the results given by Durrans (1992) and Pewsey et al. (2012), the exponential extension of the EG distribution is given by the following PDF:

$$\begin{aligned} \varphi (x)&= \frac{\alpha }{\tau }e^{-\frac{x-\mu }{\tau }+\frac{\sigma ^2}{2\tau ^2}}\Phi \left( \frac{x-\mu }{\sigma } -\frac{\sigma }{\tau }\right) \nonumber \\&\quad \left[ \Phi \left( \frac{x-\mu }{\sigma }\right) -e^{-\frac{x-\mu }{\tau } +\frac{\sigma ^2}{2\tau ^2}}\Phi \left( \frac{x-\mu }{\sigma }-\frac{\sigma }{\tau }\right) \right] ^{\alpha -1} \end{aligned}$$
(4)

such that \(\alpha \in {\mathbb{R}}^+\) is a shape parameter that controls skewness and \(\tau\) regulates kurtosis; \(\mu\) and \(\sigma\) are the location and scale parameters, respectively (such that \(\sigma >0\) and \(-\infty<\mu <\infty\)). This distribution is called from here on the Generalised Exponential Gaussian (GEG) distribution and is denoted by \(X\sim GEG(\tau ,\mu ,\sigma ,\alpha )\). Note that when \(\alpha =1\), the GEG reduces to the EG distribution. On the other hand, when \(\tau \rightarrow 0\), then \(\mathbb{E}(X)\rightarrow \mu , ~~Var(X)\rightarrow \sigma ^{2},~~skew\rightarrow 0\quad \text {and}\quad kurt\rightarrow 3\); i.e. \(EG(\tau ,\mu ,\sigma ^{2})\rightarrow N(\mu ,\sigma ^{2})\). Additionally, when \(\tau \rightarrow 0\) and \(\alpha \ne 1\), the GEG distribution converges to the PN distribution; i.e. \(GEG(\tau ,\mu ,\sigma ,\alpha )\rightarrow PN(\mu ,\sigma ,\alpha )\). Figure 1 displays some of the shapes the GEG distribution can take.

Fig. 1

Probability density (first row) and cumulative distribution functions (second row) for values of \(\tau ,\mu ,\sigma\), and \(\alpha\) of the GEG distribution. Three negatively skewed, symmetric, and positively skewed shapes are shown in the first, second, and third columns respectively

The CDF of the random variable \(X\sim GEG(\tau ,\mu ,\sigma ,\alpha )\) is given by:

$$\begin{aligned} F_{GEG}(x)=\left[ \Phi \left( \frac{x-\mu }{\sigma }\right) -e^{-\frac{x-\mu }{\tau } +\frac{\sigma ^2}{2\tau ^2}}\Phi \left( \frac{x-\mu }{\sigma }-\frac{\sigma }{\tau }\right) \right] ^{\alpha } \end{aligned}$$
(5)

To generate a random variable with a GEG distribution, the inverse transform method can be applied to a uniform random variable U on (0, 1). So, letting

$$\begin{aligned} u=[F_{EG}(x; \tau ,\mu ,\sigma )]^\alpha , \end{aligned}$$

then

$$\begin{aligned} u^{1/\alpha }=F_{EG}(x; \tau ,\mu ,\sigma ) \end{aligned}$$

from where it follows that

$$\begin{aligned} x=F_{EG}^{-1}(u^{1/\alpha }; \tau ,\mu ,\sigma ), \end{aligned}$$

where \(F_{EG}^{-1}(\cdot ;\tau ,\mu ,\sigma )\) is the inverse function of the EG distribution with parameters \(\tau ,\mu ,\sigma\), which is available in the R packages emg and gamlss.dist (via the function ‘exGAUS’).
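The same recipe is available in Python, where SciPy's `exponnorm` supplies \(F_{EG}^{-1}\) once its shape parameter is matched as \(K=\tau /\sigma\). A minimal sketch (the function name `rgeg` is ours):

```python
import numpy as np
from scipy.stats import exponnorm

def rgeg(n, tau, mu, sigma, alpha, seed=None):
    """Draw n GEG(tau, mu, sigma, alpha) variates via x = F_EG^{-1}(U^{1/alpha}).
    exponnorm is the EG with shape K = tau/sigma, location mu and scale sigma."""
    u = np.random.default_rng(seed).uniform(size=n)
    return exponnorm.ppf(u ** (1.0 / alpha), tau / sigma, loc=mu, scale=sigma)
```

Since \(F_{GEG}=F_{EG}^{\alpha}\) and \(F_{EG}\in (0,1)\), values of \(\alpha >1\) lower the CDF and hence shift probability mass to the right; \(\alpha <1\) does the opposite.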

The survival, inverse risk, and the hazard functions of the GEG distribution are:

$$\begin{aligned} S_{GEG}(x)&= 1-\left[ \Phi \left( \frac{x-\mu }{\sigma }\right) -e^{-\frac{x-\mu }{\tau }+\frac{\sigma ^2}{2\tau ^2}}\Phi \left( \frac{x-\mu }{\sigma }-\frac{\sigma }{\tau }\right) \right] ^{\alpha },\nonumber \\&\quad r_{GEG}(t)=\frac{f_{GEG}(x)}{F_{GEG}(x)}=\alpha r_{EG}(t), \end{aligned}$$
(6)
$$\begin{aligned} h_{GEG}(t)&= \alpha \frac{e^{-\frac{x-\mu }{\tau }+\frac{\sigma ^2}{2\tau ^2}}\Phi \left( \frac{x-\mu }{\sigma }-\frac{\sigma }{\tau }\right) \left[ \Phi \left( \frac{x-\mu }{\sigma }\right) -e^{-\frac{x-\mu }{\tau }+\frac{\sigma ^2}{2\tau ^2}}\Phi \left( \frac{x-\mu }{\sigma }-\frac{\sigma }{\tau }\right) \right] ^{\alpha -1}}{\tau \left( 1-\left[ \Phi \left( \frac{x-\mu }{\sigma }\right) -e^{-\frac{x-\mu }{\tau }+\frac{\sigma ^2}{2\tau ^2}}\Phi \left( \frac{x-\mu }{\sigma }-\frac{\sigma }{\tau }\right) \right] ^{\alpha } \right) } \end{aligned}$$
(7)

where \(r_{EG}(\cdot )\) is the inverse risk function of the EG distribution defined in Eq. (11) in “Appendix”. This entails that the inverse risk function of the GEG distribution is directly proportional to that of the EG distribution, so the two functions increase or decrease over the same intervals. Some properties of the exponentiated generalised class of distributions can be found in Cordeiro et al. (2013). Details regarding the GEG distribution’s moments, log-likelihood function, score function, and information matrices are presented in “Appendix”.

A simulation-based assessment of the generalised exponential-Gaussian distribution

A simulation was carried out to investigate the maximum likelihood estimates (MLEs) of the GEG’s parameters across several data-generating process (DGP) scenarios. While \(\mu\) and \(\sigma\) were set at 0 and 1, respectively (as no major impact of location and scale on the quality of the estimates was expected), \(\tau\) and \(\alpha\) were varied such that \(\tau \in \{0.5,\, 1.25\}\) and \(\alpha \in \{0.75,\, 1.75,\, 2.75\}\). These parameter values were assessed across small and large sample sizes such that \(n\in \{50,\, 100,\, 200,\, 400,\, 800,\, 1600\}\). Each of the resulting 36 DGPs was replicated 1000 times.

For each setting, the empirical bias, root mean squared error (rMSE), and coverage probability were estimated (see Table 2). Coverage probabilities were computed for 95% Wald confidence intervals, using the inverse of the observed Fisher information as the asymptotic covariance matrix for all four distributional parameters \(\mu\), \(\sigma\), \(\tau\) and \(\alpha\). For the bias and the rMSE, the median value across the 1000 simulations in each DGP was estimated. The empirical coverage probabilities were obtained by averaging over the replications. Because the numerical estimation of the MLE occasionally failed for small sample sizes, the number of converged optimisations for each DGP is also reported.
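The estimation step itself can be sketched as direct numerical optimisation of the GEG log-likelihood, \(\log f_{GEG}=\log \alpha +\log f_{EG}+(\alpha -1)\log F_{EG}\). The Python sketch below is an illustration only, not the authors' implementation: the function names, the log-reparameterisation keeping \(\tau ,\sigma ,\alpha\) positive, and the moment-based starting values are all our own choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import exponnorm

def geg_nll(theta, x):
    """Negative GEG log-likelihood; theta holds (log tau, mu, log sigma, log alpha).
    f_EG and F_EG are taken from exponnorm with shape K = tau/sigma."""
    tau, mu, sigma, alpha = np.exp(theta[0]), theta[1], np.exp(theta[2]), np.exp(theta[3])
    K = tau / sigma
    logf = exponnorm.logpdf(x, K, loc=mu, scale=sigma)
    logF = exponnorm.logcdf(x, K, loc=mu, scale=sigma)
    return -np.sum(np.log(alpha) + logf + (alpha - 1.0) * logF)

def fit_geg(x):
    """Crude ML fit from moment-based starting values.
    Returns (tau, mu, sigma, alpha, OptimizeResult)."""
    start = np.array([np.log(x.std() / 2), x.mean(), np.log(x.std() / 2), 0.0])
    res = minimize(geg_nll, start, args=(x,), method="Nelder-Mead",
                   options={"maxiter": 10000, "maxfev": 10000,
                            "xatol": 1e-8, "fatol": 1e-10})
    t, m, s, a = res.x
    return np.exp(t), m, np.exp(s), np.exp(a), res
```

A derivative-free optimiser is used here for simplicity; the paper's simulations rely on the analytic score function and information matrices given in the Appendix.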

The results indicate that, as expected from the standard asymptotic consistency and normality of MLEs, all criteria improve as the sample size increases, regardless of the specific parameter setting (i.e. the bias and the rMSE decrease while the coverage approaches the nominal level of 95%). It is also evident that larger values of \(\tau\) and/or \(\alpha\) are associated with decreasing statistical performance; thus, larger sample sizes are required to obtain reliable estimates.

Table 2 Behaviour of the MLE for selected parameters of the GEG distribution

Illustration of the generalised exponential-Gaussian distribution via published data sets

In this section, the versatility of the GEG distribution is illustrated via four data sets in which motor or neuronal RTs are featured.

  • ADHD’s simple RTs in Osmon et al. (2018) [here ADHD data set]. Osmon et al. (2018) obtained three different types of RTs from 27 neurotypical participants and 28 participants diagnosed with attention-deficit/hyperactivity disorder (ADHD). The RTs obtained during the simple RT task are featured in this study. In this task, participants had to press the same centrally located key when a stimulus appeared, regardless of where it appeared on the computer screen (right or left side). As each participant completed a fixed number of 120 trials and there were 28 ADHD participants, a total of 3360 trials were obtained.

  • Monkey S’s RTs in a reaching task in Kuang et al. (2016) [here M.S. data set]. The goal of this study was to investigate the neuronal activity in the posterior parietal cortex in two rhesus monkeys while they performed centre-out hand reaches under either normal or prism-reversed viewing conditions. All the trials in all 107 sessions for the ‘normal right’ condition from monkey S were used. There were between 19 and 62 trials across sessions and the median number of trials across sessions was \(40 \pm 14.82\). This gives a total of 3996 trials and RTs. After removing RTs < 50 ms, 3980 RTs remained. It is important to acknowledge that the lower limit of 50 ms is somewhat arbitrary and there are no agreed rules in the monkey literature on what the lower and upper boundaries should be. However, considering usual visual information processing latencies (ventral stream: \(\sim\)50 ms; dorsal stream: \(\sim\)100 ms), imposing a 50 ms constraint seems to be a minimal requirement.

  • Crows’ RTs in a visual task in Veit et al. (2014) [here C.D. data set]. This study aimed to investigate the neuronal correlates of visual working memory in four trained carrion crows. The experimental set-up required crows to remember a visual stimulus for later comparison while the activity of neurons in the nidopallium caudolaterale (a higher association brain area functionally akin to the prefrontal cortex in monkeys) was recorded. Veit et al. (2014) reported a histogram of the RTs of 162 visually responsive neurons (i.e. neurons from the four crows for which RTs could be estimated) in Figure 4B in their paper.

  • Synchronised cortical state and neuronal RTs [here S.S.N. data set]. Fazlali et al. (2016) investigated the link between spontaneous activity in the locus coeruleus (a key neuromodulatory nucleus in the brainstem) and synchronised/desynchronised states in the vibrissal barrel somatosensory cortex (BC) in Wistar rats. One of the analyses looked at neuronal responses in the BC during the two cortical states. The authors found that neuronal RTs in the BC were faster during the desynchronised than during the synchronised cortical state (in their study neuronal RTs were defined as the first time bin exceeding background activity by three standard deviations). The distribution of the BC’s neuronal RTs in the synchronised cortical state are featured in this study. Figure 6B and the section ‘Reduced response latency in desynchronised state’ in Fazlali et al. (2016) provide details for this data set.

Table 3 reports the goodness-of-fit of the GEG and other distributions to these data sets (to improve numerical stability, all data were divided by a constant factor of 100). The results indicated that the SNAP and SN distributions provided the best fits in the C.D. and S.S.N. data sets, respectively, and that the GEG distribution gave the best fit in the remaining two data sets (bearing in mind that the lower the AIC and/or BIC, the better the fit). Note that, while distributions with more parameters have an inherent advantage in fitting the data, both AIC and BIC penalise model complexity, so valid comparisons can be made across distributions with different numbers of parameters.

Although the ECSN (Exponential-Centred Skew-Normal) distribution gave the second-best fit in the C.D. data set, this result is not reliable given difficulties in convergence, leaving second place to the GEG distribution. In the case of the S.S.N. data set, the LN and GEG distributions gave the second- and third-best fits, respectively.

Table 3 also shows that the NO distribution tended to give poor fits (i.e. very high AIC and/or BIC) due to its inability to fit asymmetric data and its definition on the whole real line, which does not match the non-negativity of RTs. Such a result reinforces the claim that methods that assume normality in the response variable (e.g. ordinary least squares, the t-test and ANOVA) are not suitable for analysing and modelling RT data. The results for the LN distribution also indicate that a logarithmic transformation is usually not enough to make the distribution of RTs adhere to a normal law. That the EXP distribution gave the highest AIC and/or BIC suggests that having just one parameter (rate, or inverse scale, in the case of this distribution) limits this distribution’s flexibility to match the shape of RT data. Overall, the results thus suggest the GEG distribution provides a good fit to these behavioural and neuronal RT data sets (see Fig. 2). A future study should aim to fit several suitable distributions to a much larger collection of real-life neuronal and behavioural data sets to obtain a fine-grained picture of which distributions tend to give the best fit across data sets of comparable characteristics.
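The AIC/BIC comparison follows the usual recipe, \(AIC=2k-2\hat{\ell }\) and \(BIC=k\log n-2\hat{\ell }\), where \(\hat{\ell }\) is the maximised log-likelihood and k the number of fitted parameters. A sketch of the procedure with three of the candidate distributions available in SciPy (the helper name `compare_fits` is ours, introduced for illustration):

```python
import numpy as np
from scipy import stats

def compare_fits(x, candidates):
    """ML-fit each candidate scipy distribution and return {name: (AIC, BIC)}.
    Lower values indicate a better fit, as in Table 3."""
    n = len(x)
    scores = {}
    for name, dist in candidates.items():
        params = dist.fit(x)                      # maximum likelihood estimates
        loglik = dist.logpdf(x, *params).sum()
        k = len(params)                           # shape + loc + scale parameters
        scores[name] = (2 * k - 2 * loglik, k * np.log(n) - 2 * loglik)
    return scores
```

For a strongly right-skewed sample, the EG (`exponnorm`) should beat the Normal (`norm`) on both criteria, mirroring the pattern in Table 3.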

Fig. 2

Empirical CDFs of four real-life data sets and four fitted theoretical CDFs. The data distributions are represented by black dots (eCDF). Note that the NO distribution tends to miss the tails of the data (e.g. in data sets M.S. and C.D.) and in other cases it misses the data locations (e.g. in the C.D. data set). Note there is a trade-off between interpretability and fit (i.e. accuracy and flexibility) that requires careful consideration when selecting a distribution to model data. GEG = four-parameter Generalised Exponential-Gaussian distribution; NO = two-parameter Normal distribution; G = two-parameter Gamma distribution; SN = three-parameter Skew-Normal distribution. The x axis represents RTs (divided by 100 to improve numerical stability). See Table 3 for the results of the fits

Table 3 Parameter and goodness-of-fit estimates of 10 distributions fitted to four data sets

GAMLSS in a nutshell: a regression framework for distributional modelling

In the regression context, it is traditional to investigate the effects of independent variables (IVs) on the mean of the dependent variable (DV) and this is achieved via ordinary least squares regression (also known as linear models, LM). Improvements on the LM approach have been reflected in the generalised linear model (GLM) by replacing the required normal distribution of the response variable with the exponential family of distributions (e.g. the Gamma distribution). Although GLM is more flexible than LM, both focus on the effects of the IVs on the DV’s mean. However, even if LM and GLM could also model the effects of the IVs on the DV’s standard deviation, the findings would be limited to the data’s location and scale parameters. Additionally, LM allows only a linear relationship between continuous IVs and DVs, and GLM assumes a linear relationship between the transformed response in terms of the link function and IVs. Generalised additive models (GAM), however, allow modelling such a relationship by using non-parametric (smooth) functions on the numeric IVs. Generalised additive models for location, scale and shape (GAMLSS) is the only existing regression framework that encompasses all these regression methods (Stasinopoulos et al. 2018; Kneib et al. 2021). It also has the extra property of allowing modelling of the effects of IVs on the DV’s location, scale and shape (i.e. skewness and kurtosis) via over 100 statistical distributions, thus enabling a comparison between many different models and proper distributional analysis [Rigby et al. (2020); see also Fig. 3].

Fig. 3

Cumulative distribution functions (CDF) illustrating different normal (left plot) and non-normal distributions (right plot). Left plot (differences/similarities in location and scale): black and red CDFs have similar location and similar scale; blue and black/red CDFs have similar location and different scale; blue and green CDFs have different location and similar scale; green and black/red CDFs have different location and different scale. In all these cases, LM and GLM would be able to identify similarities/differences in location only (i.e. mean values); that is, standard techniques are good for detecting shifts but not shapes of distributions. Right plot (different types of shapes): black and blue CDFs represent distributions with positive skew, green and red CDFs represent distributions with negative skew, and the grey CDF represents a uniform distribution. The dotted grey horizontal line cuts through the distributions’ medians

Statistically speaking, a GAMLSS model assumes \(Y_i \sim D(\varvec{\theta })\), where \(Y_1,\ldots ,Y_n\) are n independent observations with probability (density) function \(f_Y(y_i|{\varvec{\theta }})\) conditional on up to four distribution parameters, each of which can be a function of the IVs. For \(k=1,2,3,4\), let \(g_k(\cdot )\) be a known monotonic link function relating a distribution parameter to a predictor \(\varvec{\eta }_k\), such that

$$\begin{aligned} g_k(\varvec{\theta }_k)=\varvec{\eta }_k={\mathbf{X}}_k \varvec{\beta }_k+\sum _{j=1}^{J_k}s_{kj}({\mathbf{x}}_{kj}) \end{aligned}$$

where \(\mathbf{X}_k\) is a known design matrix, \({\beta }_k=(\beta _{k1}, \ldots , \beta _{k J_{k}^{'}} )^{\top }\) is a parameter vector of length \(J_{k}^{'}\), \(s_{kj}\) is a smooth non-parametric function of variable \(X_{kj}\) and the terms \(\mathbf{x}_{kj}\) are vectors of length n, for \(k=1,2,3,4\) and \(j=1,\ldots ,J_k\). Here, \(\varvec{\theta }\) \(=(\tau , \mu , \sigma , \alpha )\), D represents the GEG distribution, and each of the GEG distribution parameters can be modelled as linear or smooth functions of the IVs.

In a nutshell, GAMLSS is thus an interpretable, flexible and sophisticated framework for data modelling. It is interpretable in that it is conceived in the well-known regression framework; it is flexible in that it allows modelling of the response variable via several candidate statistical distributions; and it is sophisticated in that numeric and categorical IVs can be subjected to cutting-edge smoothing algorithms. Thus, GAMLSS could be an educated analytical approach to respond to the current lack of statistical sophistication and rigour permeating research in neuroscience (Nieuwenhuis et al. 2011).

To illustrate the potential of GAMLSS for the analysis of RT data, the data set shown in Figure 2F in Schledde et al. (2017) was examined via GAMLSS. In that study, the authors recorded from neurons in the motion-sensitive area MT of monkeys and investigated the latency of neuronal responses to stimulus changes under different conditions of attention. Two monkeys were engaged in a change-detection paradigm and the neuron under investigation responded to a speed change either with spatial attention directed to its receptive field or away from it (i.e. attend-in and attend-out), and with feature attention directed towards the motion domain or towards the colour domain (i.e. speed and colour tasks). The authors found that the response latency of the neurons differed significantly depending on the attentional condition, with both spatial and feature attention influencing the shape of the distribution of latencies.

Such data can be represented via the regression model ‘\(RT \sim A * T\)’, where RT denotes the neuronal latencies, A and T are the 2-level categorical variables attention (attend-in and attend-out) and task (colour and speed), respectively, and ‘\(*\)’ stands for main effects and interactions. That model is equivalent to a \(2\times 2\) ANOVA design and is traditionally assessed via LM. The design therefore implies four conditional distributions whose ECDFs are shown in Figure 2F in Schledde et al. (2017). There can be several analytical options, but for illustration purposes only a main-effects-with-no-random-effects model was considered. A GAMLSS modelling of the conditional distributions via the 10 probability distributions considered in Table 3 indicated that while the G distribution gave the best fit for one conditional distribution, the NO distribution fitted the three remaining conditional distributions best. A marginal distributional modelling (i.e. using all the RTs) showed the EG distribution gave the best fit (the GEG distribution being the second-best fit). Two versions of the regression model shown above were considered: a model in which the DV was modelled via the NO distribution and a model in which the DV was modelled via the EG distribution. While the NO model corresponds to the classic LM, the EG model is achievable only via GAMLSS. In both cases only the location parameter was investigated and RTs were divided by 100 to improve numerical stability. The results showed that the EG model (AIC = −73.86) provided a better fit to the data than the NO model (AIC = −70.09).
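A stripped-down version of that NO-versus-EG comparison can be sketched as follows. Note the data below are simulated stand-ins, not the Schledde et al. (2017) latencies, and only the location parameter depends on the design, as in the text; the variable names and effect sizes are our own illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import exponnorm, norm

rng = np.random.default_rng(0)
n = 400
A = rng.integers(0, 2, n)                 # attention: attend-in vs attend-out
T = rng.integers(0, 2, n)                 # task: colour vs speed
X = np.column_stack([np.ones(n), A, T])   # main-effects design matrix
beta_true = np.array([2.0, 0.3, 0.2])
rt = X @ beta_true + exponnorm.rvs(2.0, loc=0.0, scale=0.1, size=n, random_state=1)

# NO model: ordinary least squares is the Gaussian MLE for the location
beta_no, *_ = np.linalg.lstsq(X, rt, rcond=None)
sigma_no = np.sqrt(np.mean((rt - X @ beta_no) ** 2))
ll_no = norm.logpdf(rt, loc=X @ beta_no, scale=sigma_no).sum()
aic_no = 2 * 4 - 2 * ll_no                # 3 betas + sigma

# EG model: the same linear predictor on the Gaussian-component location
def nll_eg(theta):
    beta, log_sigma, log_K = theta[:3], theta[3], theta[4]
    return -exponnorm.logpdf(rt, np.exp(log_K), loc=X @ beta,
                             scale=np.exp(log_sigma)).sum()

start = np.concatenate([beta_no, [np.log(sigma_no), 0.0]])
fit = minimize(nll_eg, start, method="Nelder-Mead",
               options={"maxiter": 20000, "maxfev": 20000, "fatol": 1e-10})
aic_eg = 2 * 5 + 2 * fit.fun              # 3 betas + sigma + K
```

Because the residuals are genuinely skewed, the EG model should achieve the lower AIC, echoing the pattern reported for the real data.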

Discussion and conclusion

The GEG distribution was proposed as a candidate statistical model of behavioural and neuronal RTs, and its statistical properties were described and examined via simulations. Given that the GEG is a four-parameter distribution, it can readily adopt the non-normal shapes typically found in RT data, as was exemplified via real-life data sets. It is common practice to apply non-linear transformations to RT data to meet parametric assumptions, that is, to approximate normality or improve symmetry. However, the GEG distribution and the other distributions considered here enable working with the original shape of the data, thereby sidestepping unnecessary non-linear transformations. The following paragraphs discuss statistical and practical implications of using the GEG distribution within a GAMLSS framework for the analysis of neuronal and behavioural RTs.

Some statistical graphics aspects relating to GAMLSS

Commenges and Seal (1986) argue that explaining the relationship between neuronal RTs and behavioural RTs in well-controlled experiments depends on the statistical methods used for the data analysis. The GEG distribution is well suited to characterising the distribution of both types of RTs conditioned on the specific variables manipulated in an experiment. However, the explanatory power of the GEG can only be appreciated when this distribution is used within a distributional modelling approach such as the one briefly described above: GAMLSS (Stasinopoulos et al. 2017). A key step in the distributional modelling of data, though, is the use of statistical graphical techniques that allow the distribution of the data to be examined. Traditionally, bar plots have been used for this purpose, but they do not allow the shape of the data to be visualised; boxplots and violin plots are better techniques. However, empirical cumulative distribution function (ECDF) plots are the optimal approach for investigating the shape of data and are instrumental in comparing vectors of data. An example of ECDFs representing neuronal RTs can be seen in Figure 4A in Mormann et al. (2008). Indeed, Mormann et al. (2008)’s study is an excellent example of how neuronal RTs can provide insight into the neuronal firing associated with brain connectivity and stimuli.
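The ECDF comparison described above can be sketched in a few lines of Python (simulated RT vectors; an illustrative sketch, not code or data from the cited studies):

```python
# Sketch with simulated data: ECDFs for comparing the shapes of two RT
# vectors, as in the ECDF plots discussed in the text.
import numpy as np

def ecdf(x):
    """Return the sorted values and the ECDF evaluated at them."""
    xs = np.sort(x)
    return xs, np.arange(1, len(xs) + 1) / len(xs)

rng = np.random.default_rng(7)
rt_a = rng.normal(0.45, 0.05, 300) + rng.exponential(0.05, 300)  # faster, lighter tail
rt_b = rng.normal(0.50, 0.05, 300) + rng.exponential(0.12, 300)  # slower, heavier tail

xa, Fa = ecdf(rt_a)
xb, Fb = ecdf(rt_b)
# Plotting (xa, Fa) against (xb, Fb): a rightward shift of one ECDF
# indicates slower RTs; a shallower rise indicates a heavier right tail.
```

Because the whole conditional distribution is drawn, ECDFs expose location, scale, and tail differences between conditions at once, which is exactly the information a distributional model such as GAMLSS exploits.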

A hypothetical scenario where GAMLSS can be used

The explanatory and predictive power of statistical distributions can only be fully achieved within the GAMLSS framework. The following lines depict a hypothetical experiment in which GAMLSS could be used to model RTs via the GEG distribution to better understand neuronal activity. A study conceptualised from the two-stream model of higher-order visual processing (Goodale and Milner 1992) measures extracellular RTs in neurons specialised in processing the shape of visual stimuli. The study’s goal is to characterise neuronal RTs conditioned on the IVs of interest in the experiment; of particular interest is the presentation time of the stimuli. Hence, single- and multiple-neuron recordings are performed in cortical visual areas V1, V2, V3, and V4 from a small sample of neurotypical human adult participants. The researchers define neuronal RTs as the time lapse between the presentation of the stimulus and the moment the neuron generates an action potential. Since each neuron ‘sees’ all stimuli, RT distributions per neuron are obtained for all trials and stimuli. Further suppose that data from a reasonable number of neurons in each V area are obtained (e.g. 10 neurons per area per participant). The stimuli consist of equal numbers of two-dimensional round and angular shapes of equal size and colour (e.g. all black). The task consists of saccading from a fixation point to selected coordinates on a computer screen where the shapes are shown individually and randomly at a fixed interstimulus interval but at three different presentation times (e.g. stimuli are exposed for 10, 30, and 50 ms; i.e. each image is seen three times in total).

A traditional LM or GLM analysis of the data could be conceived as \(RT \sim V * T * S\), such that RT, V, T, and S stand for the resulting marginal distribution of neuronal RTs, the four V cortical areas, the three presentation times, and the two types of shapes, respectively; and \(*\) stands for the main effects and interactions. The results would inform whether, for example, there are differences in mean RTs among the four V areas, the three presentation times, and/or the two types of shapes. The model would also indicate, for example, whether there is an interaction between V and T such that differences in mean RTs between V areas occur only at certain presentation times. The other two two-way interactions and the three-way interaction could also be investigated. The issue with modelling such data via LM or GLM is that the findings are limited to mean RTs. Regardless of the main or interaction effects on the mean RTs, there may also be effects on the RTs’ variability (i.e. the RTs’ scale). Neither LM nor GLM can detect potential effects of the IVs on the RTs’ scale. GAMLSS, on the other hand, can examine the effects that the IVs have on the RTs’ location parameter (as LM and GLM do) but can also determine whether the same IVs (or a subset of them) affect the response’s variability (see Fig. 3). Importantly, while LM can model the DV via the Normal distribution only and GLM can do so via distributions from the Exponential family, GAMLSS can use those and any other statistical distribution implemented in the gamlss.dist R package (Rigby et al. 2020), or any distribution implemented so that it can be used within the GAMLSS framework [see Roquim et al. (2021) and the ‘RelDists’ package for examples of this approach].
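The point that mean-only models can miss scale effects can be illustrated with a small simulation (Python; hypothetical data, with Levene's test standing in for a proper GAMLSS scale submodel):

```python
# Sketch with simulated data: why mean-only modelling can miss scale
# effects. Two conditions share the same mean RT but differ in
# variability; a comparison of means sees nothing, whereas a test on
# the scale detects the difference. A GAMLSS scale submodel captures
# exactly this kind of effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
cond_a = rng.normal(0.50, 0.03, 200)  # same location, small scale
cond_b = rng.normal(0.50, 0.09, 200)  # same location, large scale

t_p = stats.ttest_ind(cond_a, cond_b, equal_var=False).pvalue  # location test
scale_p = stats.levene(cond_a, cond_b).pvalue                  # scale test
# The t-test typically finds no location difference, while the Levene
# test flags the difference in spread that an LM on the mean would miss.
print(t_p, scale_p)
```

In GAMLSS the same idea is expressed by giving the scale parameter its own predictor (e.g. a formula for \(\sigma\)), so location and variability effects are estimated jointly rather than in separate ad hoc tests.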

Modelling the data of this hypothetical experiment via GAMLSS would allow an understanding of the trial-to-trial RT signature of single and multiple neurons conditioned on the stimuli and tasks they are faced with. Furthermore, it could be inferred that such signature RT distributions (location, scale and shape) should be able to differentiate between healthy and unhealthy neurons according to task demands (e.g. visual tasks, auditory tasks, type of stimuli). If there is a predictive goal, trees and forests for distributional regression could be used [see Schlosser et al. (2019) and the R package disttree], since these blend algorithmic modelling with GAMLSS modelling.

Applied implications of the GAMLSS modelling

The previous example illustrates how neuronal RTs can be used to explain the processing of stimuli in networks of brain areas. An actual example of the value of investigating neuronal RTs is provided in Figures 5D and 5E in Yoshor et al. (2007). Those figures show an increasing trend in neuronal RTs (Figure 5D) and time to peak (Figure 5E) from zone 1 (areas for shape and colour processing) to zone 4 (areas for object and face processing) in the visual cortex; in particular, there was a mean RT difference between neurons in zone 1 and zone 4. In a similar vein, Mormann et al. (2008) found that parahippocampal, entorhinal, hippocampal, and amygdalar neuronal average RTs were in accord with neuroanatomical evidence. Also, Lin et al. (2018) found increases in RT average and variability from the calcarine fissure (a low-order primary sensory cortex) to the fusiform gyrus (a higher-order association area), areas within the ventral pathway of face processing. Such analytical approaches contribute to better characterising brain areas and their functions by placing neurons along a sensory-motor spectrum (DiCarlo and Maunsell 2005). Very likely, a more sophisticated analysis of this type of data via GAMLSS would further the current understanding of neuronal activation in those areas.

Future challenges

Although using the GEG distribution for the analysis of RTs offers a promising avenue for new research, some aspects still need to be investigated. It was argued that the GEG is a four-parameter distribution flexible enough to capture changes in location, scale, and shape. The simulation study reported above indicated that the GEG’s shape parameters, that is, those governing skewness and kurtosis, require large samples to yield reliable estimates. Future work should investigate these parameters in more depth. One way to do so is via transformed moment skewness and kurtosis plots [see section 16.1.2 in Rigby et al. (2020)]. These types of plot enable the investigation of the regions of possible combinations of transformed moment skewness and transformed moment kurtosis, so that the flexibility of the GEG can be compared with that of other continuous distributions in terms of moment skewness and kurtosis.
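The sample-size issue for shape estimates can be illustrated with a small simulation (Python; simulated ex-Gaussian-like data, using plain sample skewness rather than the transformed moment measures discussed above):

```python
# Sketch with simulated data: shape quantities of a skewed distribution
# need large samples. Sample skewness of ex-Gaussian-like draws is
# re-estimated many times at two sample sizes; the spread of the
# estimates shrinks markedly as n grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

def skew_spread(n, reps=300):
    """SD of the sample-skewness estimate across `reps` simulated samples."""
    est = [stats.skew(rng.normal(0, 1, n) + rng.exponential(1, n))
           for _ in range(reps)]
    return np.std(est)

spread_small, spread_large = skew_spread(50), skew_spread(2000)
# Skewness estimates at n = 50 scatter far more than at n = 2000,
# which is why reliable shape estimation demands large samples.
print(spread_small, spread_large)
```

The same logic applies with more force to kurtosis, whose sample estimator is even more variable, consistent with the simulation results summarised in the text.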

Conclusion

In summary, distributional modelling of neuronal RTs enables fine-grained temporal profiles of brain areas and networks. The GEG has been proposed as a suitable distribution to fit RT data and its use in neuronal and behavioural statistical modelling will contribute to forging the link between neurometrics and psychometrics. Data sets featured in this article and related R codes are available at https://cutt.ly/IWJeSkO.