A Six Parameters Beta Distribution with Application for Modeling Waiting Time of Muslim Early Morning Prayer

The beta distribution is a well-known and widely used distribution for modeling and analyzing lifetime data, owing to its interesting characteristics. In this paper, a six-parameter beta distribution is introduced as a generalization of the two-parameter (standard) and four-parameter beta distributions. This distribution is closed under scaling and exponentiation, has a reflection symmetry property, and has several well-known distributions as special cases, such as the two- and four-parameter beta, generalized modification of the Kumaraswamy, generalized beta of the first kind, power function, Kumaraswamy power function, Minimax, exponentiated Pareto, and generalized uniform distributions. Its moments about the origin, moment generating function, incomplete moments, and mean deviations are derived. The maximum likelihood estimation method is used to estimate its parameters and is applied to six different simulated data sets from this distribution, in order to check the performance of the estimation method through the mean squared errors of the estimated parameters computed for different simulated sample sizes. Finally, two real-life data sets, representing the waiting period of Muslim worshipers from the time of entering the mosque until the actual start of the Alfajir prayer in two different mosques, are used to illustrate the usefulness and flexibility of this distribution, which provides a better fit than the gamma, exponential, four-parameter beta, and generalized beta of the first kind distributions.


Introduction
Owing to its interesting characteristics, the beta distribution is one of the best-known continuous distributions, with a wide range of applications in various fields, such as reliability and production quality control. Its flexible shape allows it to model a wide range of natural and empirical phenomena. Its domain, the interval from zero to one, adds another interesting characteristic: the distribution can be regarded as a probability distribution of probabilities, such as fractions of time, measurements whose values (or relative values) all lie between zero and one, or the random behavior of percentages and fractions. This is especially useful when nothing is known about the probability in question, since the distribution can then be used to represent all probabilities. Another area in which the beta distribution is used to represent possible values of probabilities, or a distribution of probabilities, is Bayesian analysis, where it is widely used as a prior distribution. In fact, together with the rectangular/uniform and normal distributions, it is one of the three distributions commonly employed within the framework of Bayesian analysis of continuous variables, Sheskin [1, p. 397]. Data mining methods and techniques need information about prior probabilities, and the beta distribution is a candidate for such situations; see Shi [2] and Olson and Shi [3] for further details. For an extensive reference on the beta distribution, see Johnson et al. [4, p. 210-275].
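The probability density function (pdf) of the four-parameter beta distribution, Johnson et al. [5, p. 210], is given by

$$f(t;\alpha,\beta,a,b)=\frac{(t-a)^{\alpha-1}(b-t)^{\beta-1}}{B(\alpha,\beta)\,(b-a)^{\alpha+\beta-1}},\qquad a<t<b, \tag{1}$$

where the parameters α, β, a and b satisfy α > 0, β > 0, a and b are real numbers such that a < b, B(α, β) is the beta function, Abramowitz and Stegun [6, p. 258], defined by

$$B(\alpha,\beta)=\int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt=\frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}, \tag{2}$$

and Γ(α) is the gamma function, Abramowitz and Stegun [6, p. 255], defined by

$$\Gamma(\alpha)=\int_0^\infty t^{\alpha-1}e^{-t}\,dt. \tag{3}$$

The form of the beta distribution most widely used in the literature is the pdf given by

$$f(x;\alpha,\beta)=\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)},\qquad 0<x<1. \tag{4}$$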

Annals of Data Science (2021) 8(1):57-90
This two-parameter form is sometimes called the standard beta distribution; it is obtained from (1) by the transformation x = (t − a)/(b − a). One direction of research on the beta distribution is the generalization of the form given by (4), in order to make it even more flexible and able to cover a wider variety of shapes.
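As a small numerical illustration of the relation between the four-parameter and the standard beta densities, the following sketch evaluates both and checks that they agree after the change of variable x = (t − a)/(b − a), once the Jacobian 1/(b − a) is taken into account. The function names and the parameter values are hypothetical and chosen only for illustration; only the Python standard library is assumed.

```python
import math

def beta_fn(alpha, beta):
    """Beta function B(alpha, beta), computed through log-gamma for stability."""
    return math.exp(math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta))

def beta4_pdf(t, alpha, beta, a, b):
    """Four-parameter beta pdf on (a, b):
    (t-a)^(alpha-1) (b-t)^(beta-1) / (B(alpha,beta) (b-a)^(alpha+beta-1))."""
    if not (a < t < b):
        return 0.0
    num = (t - a) ** (alpha - 1) * (b - t) ** (beta - 1)
    return num / (beta_fn(alpha, beta) * (b - a) ** (alpha + beta - 1))

def beta_std_pdf(x, alpha, beta):
    """Standard (two-parameter) beta pdf on (0, 1)."""
    if not (0.0 < x < 1.0):
        return 0.0
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / beta_fn(alpha, beta)

# Hypothetical parameter values, for illustration only.
alpha, beta, a, b = 2.0, 3.0, 5.0, 15.0
t = 8.0
x = (t - a) / (b - a)                          # change of variable to (0, 1)
lhs = beta4_pdf(t, alpha, beta, a, b)
rhs = beta_std_pdf(x, alpha, beta) / (b - a)   # Jacobian dx/dt = 1 / (b - a)
print(lhs, rhs)
```

The two printed values coincide, which is the sense in which the standard form is obtained from the four-parameter form.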
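Armero and Bayarri [7] introduced the Gauss hypergeometric distribution, with parameters p, q, r and λ, as a generalization of the beta distribution when they studied a Bayesian queuing theory problem, with the following pdf

$$f(x;p,q,r,\lambda)=\frac{x^{p-1}(1-x)^{q-1}(1+\lambda x)^{-r}}{B(p,q)\,F_{(2,1)}(r,p;p+q;-\lambda)},\qquad 0<x<1,$$

where p > 0, q > 0, −∞ < r < ∞, λ > −1, and F_{(2,1)} is the generalized hypergeometric function defined for non-negative integers n and m by

$$F_{(n,m)}(a_1,\dots,a_n;b_1,\dots,b_m;z)=\sum_{k=0}^{\infty}\frac{(a_1)_k\cdots(a_n)_k}{(b_1)_k\cdots(b_m)_k}\,\frac{z^k}{k!},$$

and (a)_k is defined by

$$(a)_k=\frac{\Gamma(a+k)}{\Gamma(a)}. \tag{7}$$

Gordy [8] introduced the confluent hypergeometric distribution, with parameters p, q and s, with pdf given by

$$f(x;p,q,s)=\frac{x^{p-1}(1-x)^{q-1}e^{-sx}}{B(p,q)\,F_{(1,1)}(p;p+q;-s)},\qquad 0<x<1,$$

where p > 0, q > 0 and −∞ < s < ∞.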
Pathan et al. [9] introduced a five-parameter generalization of the beta distribution, which they called the generalized beta distribution. Its pdf is expressed in terms of Humbert's confluent hypergeometric function Φ1(.) given in Srivastava and Manocha [10, p. 58, Eq. (36)]; two of its parameters are positive, one lies in [0, 1), and the remaining ones are real numbers. They also derived expressions for its distribution function and moments.
Ng et al. [11] studied the properties, and evaluated the prediction level, of a six-parameter generalized beta distribution model.
The pdf of the six-parameter beta distribution (SPBD), f(x; a, b, α, β, A, B), is given by (9), where B(α, β) is the beta function defined by (2). We will write f(x) instead of f(x; a, b, α, β, A, B) for simplicity. We have the following proposition.
Proposition 1 The function f defined by (9) is a pdf, with its cumulative distribution function (CDF) F given by (10).
Proof The positivity of the terms in (9) on its domain implies that f is non-negative. Integrating f over its domain and using a transformation to a new variable z, we get (10). We note that F can also be written in the form (13), in terms of I(z; α, β), the regularized incomplete beta function, Abramowitz and Stegun [6, p. 263], defined by

$$I(z;\alpha,\beta)=\frac{1}{B(\alpha,\beta)}\int_0^z t^{\alpha-1}(1-t)^{\beta-1}\,dt.$$

□
Definition of the SPBD The rv X is said to have an SPBD with parameters a, b, α, β, A and B, written X ∼ SPBD(a, b, α, β, A, B), if its pdf is given by (9) or, equivalently, its CDF is given by (10) or (13). Figure 1 shows plots of the pdf of the SPBD for some parameter values, indicating that this distribution can take many different shapes.

Boundaries and Some Limits of the pdf
Let us study the behavior of the pdf of the SPBD(a, b, α, β, A, B) at certain points. At the boundary points of its domain, the limits of f are obtained from (9) at one boundary and, similarly, at the other.

Series Expansion
Proposition 2 The function f given by (9) can be written as the series expansion (15).
Proof On the domain of f, the binomial series expansion, Abramowitz and Stegun [6, p. 14], can be applied to the power terms in (9), which gives the expansions (16) and (17). Hence, substituting (16) and (17) into the function f given by (9), we get (15). □
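The binomial series step used in this kind of proof can be checked numerically. The sketch below is a generic illustration, not the expansion (15) itself: it compares a partial sum of the series (1 − u)^c = Σ_j (−1)^j C(c, j) u^j, valid for |u| < 1 and real c, with the closed form. The names `gen_binom` and `binomial_series`, and the values of u and c, are hypothetical.

```python
def gen_binom(c, j):
    """Generalized binomial coefficient C(c, j) for real c and integer j >= 0."""
    out = 1.0
    for i in range(j):
        out *= (c - i) / (i + 1)
    return out

def binomial_series(u, c, terms=60):
    """Partial sum of (1 - u)**c = sum_j (-1)**j * C(c, j) * u**j, for |u| < 1."""
    return sum((-1) ** j * gen_binom(c, j) * u ** j for j in range(terms))

# Hypothetical values: a non-integer exponent, as arises from a shape parameter.
u, c = 0.3, 1.5
approx = binomial_series(u, c)
exact = (1 - u) ** c
print(approx, exact)
```

For a non-integer exponent the series is infinite, so in practice it is truncated; the number of terms needed grows as |u| approaches 1.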

The Mode
For x in the domain of f, the derivative of the pdf of the SPBD with respect to x can be written as the product of f(x) and a second factor. Therefore, ∂f(x)/∂x = 0 is equivalent to either f(x) = 0, which is discussed in Sect. 3.1 above, or to Eq. (20), whose coefficients are denoted by c1, c2 and c3. Let us discuss the real roots of (20) according to the following cases.
When c1 = 0 and c2 ≠ 0, (20) has a single root, from which the root in terms of x is obtained. Otherwise, the real roots of (20) in terms of x exist when c2² − 4c1c3 ≥ 0. Since ∂²f(x_i)/∂x², for i = 1, 2, and 3, is not easy to evaluate, an empirical evaluation is needed to see at which point x_i there is a local maximum, in order to determine the mode of the SPBD.
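The case analysis above can be sketched in code. The block below assumes, from the discriminant condition c2² − 4c1c3 ≥ 0, that (20) has the quadratic form c1·z² + c2·z + c3 = 0 in an auxiliary variable z; the function name `real_roots` and the variable z are hypothetical, and the mapping from z back to x is not reproduced here.

```python
import math

def real_roots(c1, c2, c3):
    """Real roots of c1*z**2 + c2*z + c3 = 0, following the case analysis:
    a single root when c1 == 0 and c2 != 0, and two (possibly equal) roots
    when c1 != 0 and the discriminant c2**2 - 4*c1*c3 is non-negative."""
    if c1 == 0:
        if c2 == 0:
            return []                 # degenerate: no equation left to solve
        return [-c3 / c2]             # linear case, single root
    disc = c2 * c2 - 4 * c1 * c3
    if disc < 0:
        return []                     # no real roots
    s = math.sqrt(disc)
    return sorted([(-c2 - s) / (2 * c1), (-c2 + s) / (2 * c1)])

# Hypothetical coefficients, for illustration only.
print(real_roots(0, 2, -4))
print(real_roots(1, -3, 2))
print(real_roots(1, 0, 1))
```

Each candidate root still has to be mapped back to x and compared through the value of the pdf, since the roots alone do not say which one is a local maximum.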

Quantile Function
Let 0 < p < 1. The quantile function Q of the rv X ∼ SPBD(a, b, α, β, A, B) is the inverse of its CDF and can be found, using (13), in terms of I^{-1}, the inverse of the regularized incomplete beta function.
In particular, the median of X, Med(X), is given by Q(1/2). Table 1 presents the parameter values and domain ranges of some selected SPBD data sets, which have different shapes and domain ranges and will be used for our simulation study in Sect. 5, as well as for computing certain statistics of the SPBD later in this section, while Fig. 2 shows the plots of the quantile functions of these SPBD data sets.
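Since the quantile function relies on the inverse of the regularized incomplete beta function, a minimal sketch of how I(z; α, β) and its inverse might be evaluated is given below. It uses composite Simpson's rule and bisection from the Python standard library, and it assumes α ≥ 1 and β ≥ 1 so that the integrand is bounded; all function names are hypothetical. The change of variable that maps the result back to the SPBD scale through (13) is not reproduced here.

```python
import math

def beta_fn(alpha, beta):
    """Beta function via log-gamma."""
    return math.exp(math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta))

def reg_inc_beta(z, alpha, beta, n=2000):
    """Regularized incomplete beta I(z; alpha, beta) by composite Simpson's rule.
    Sketch only: assumes alpha >= 1 and beta >= 1 (bounded integrand), n even."""
    if z <= 0.0:
        return 0.0
    if z >= 1.0:
        return 1.0
    h = z / n
    f = lambda t: t ** (alpha - 1) * (1 - t) ** (beta - 1)
    total = f(0.0) + f(z)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3 / beta_fn(alpha, beta)

def inv_reg_inc_beta(p, alpha, beta, tol=1e-10):
    """Inverse of I(.; alpha, beta) at p by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if reg_inc_beta(mid, alpha, beta) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Median of a standard Beta(2, 2) variate (symmetric about 1/2).
median_std = inv_reg_inc_beta(0.5, 2.0, 2.0)
print(median_std)
```

Bisection is slow but robust because I(z; α, β) is increasing in z; a production implementation would more likely call a library routine for the inverse incomplete beta function.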

Lemma 2
The cases of this lemma, (1) through (9), each start from an rv with a well-known distribution and define the rv X through a transformation of it. The starting distributions include the standard uniform distribution, U(0, 1); a log-logistic distribution with parameters δ and γ, Johnson et al. [4, p. 151]; the generalized uniform distribution, Tiwari et al. [18]; the beta distribution with parameters 1 and c; a two-parameter exponential distribution, Johnson et al. [5, p. 494]; and a two-parameter logistic distribution, Johnson et al. [4, p. 115]. In each case, the rv X ∼ SPBD(a, b, α, β, A, B). The proof of cases (2) through (9) follows the same lines as the proof of (1).
We may note that the SPBDs stated in Cases 2, 5, 6, 8, and 9 are all special cases of the generalized beta of the first kind distribution (see Case 4 of Sect. 3.7). □

Proof
The result follows using (13). Propositions 4 and 5 below can be proved along the same lines as the proof of Proposition 3. □

Exponentiation Property
Proposition 4 (The SPBD is closed under exponentiation) Let the rv

Reflection Symmetry Property
Proposition 5 Let the rv X ∼ SPBD(a, 1, α, β, A, B) and let the rv

Order Statistics
Let X_1, X_2, …, X_n be a random sample of size n from SPBD(a, b, α, β, A, B), with pdf f and CDF F, and let X_{1:n}, X_{2:n}, …, X_{n:n} be their order statistics. Then, for i = 1, 2, 3, …, n, the pdf of the i-th order statistic X_{i:n}, f_{i:n}(x), is given for x in the domain of f by

$$f_{i:n}(x)=\frac{n!}{(i-1)!\,(n-i)!}\,f(x)\,[F(x)]^{i-1}\,[1-F(x)]^{n-i}.$$

Hence, using a series identity for the powers of F(x) together with (13), the pdf of the rv X_{i:n} can be written in expanded form.

Moments about the Origin
Let k = 1, 2, 3, …; then the moment of order k about zero of the rv X ∼ SPBD(a, b, α, β, A, B) is E(X^k), the integral of x^k f(x) over the domain of f. Evaluating this integral gives (24), which is expressed in terms of F_{(2,1)}, the regularized hypergeometric function, Virchenko et al. [20], with (a)_m as defined by (7). Note that the case A = 0 leads to a simpler expression.

Mean and Variance
Using (24) with k = 1, we obtain the mean of X ∼ SPBD(a, b, α, β, A, B), Eq. (25), and the variance follows from Var(X) = E(X²) − [E(X)]². Table 2 presents the mean, median, mode and variance of the selected SPBD data sets that are given in Table 1.

The Moment Generating Function
Similarly, the moment generating function of the rv X ∼ SPBD(a, b, α, β, A, B), M_X(t) = E(e^{tX}), can be obtained along the same lines.

Harmonic Mean
The harmonic mean of X ∼ SPBD(a, b, α, β, A, B) is obtained along the same lines as the moments of X.

Incomplete Moments
The k-th incomplete moment of X ∼ SPBD(a, b, α, β, A, B), I(z, k), is defined by

$$I(z,k)=\int_{-\infty}^{z}x^{k}f(x)\,dx.$$

Using the form of the pdf of X given in (15) and integrating term by term, we obtain (26).

Mean Deviations
The mean deviation of X ∼ SPBD(a, b, α, β, A, B) about its mean μ = E(X), MD(μ), is given by

$$MD(\mu)=E|X-\mu|=\int|x-\mu|\,f(x)\,dx,$$

which can be found, Cordeiro et al. [21], to be

$$MD(\mu)=2\mu F(\mu)-2I(\mu,1).$$

Hence, MD(μ) for the rv X ∼ SPBD(a, b, α, β, A, B) is obtained using (13) and (26), where μ is given from (25).
Similarly, the mean deviation of X about its median m, MD(m), is given by

$$MD(m)=E|X-m|=\mu-2I(m,1).$$

Probability Weighted Moments
The probability weighted moment of order (s, r) of X ∼ SPBD(a, b, α, β, A, B) is given by E[X^s F(X)^r], the integral of x^s [F(x)]^r f(x) over the domain of f. Using a series expansion of [F(x)]^r together with (24), this moment can be evaluated.

Renyi Entropy
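Let us compute the Renyi entropy as a measure of variation of the uncertainty of the rv X ∼ SPBD(a, b, α, β, A, B). For γ > 0 such that γ ≠ 1, the Renyi entropy of X is

$$I_R(\gamma)=\frac{1}{1-\gamma}\,\log\int f^{\gamma}(x)\,dx.$$

First, we note that f^γ(x) can be expressed in terms of the pdf of another SPBD with the same A and B. It follows, using (24), that the integral above can be evaluated.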

Lorenz and Bonferroni Curves
For 0 < π < 1, the Lorenz curve, L(π), and the Bonferroni curve, B(π), for the rv X ∼ SPBD(a, b, α, β, A, B), are given by, respectively,

$$L(\pi)=\frac{I(Q(\pi),1)}{\mu}\qquad\text{and}\qquad B(\pi)=\frac{L(\pi)}{\pi},$$

where Q(π) is the quantile function of the rv X at π, and I(z, k) is the incomplete moment of the rv X. Therefore, using (25) and (26), we obtain L(π) as (27) and, similarly, B(π).
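To make the roles of the quantile function and the first incomplete moment concrete, the sketch below evaluates a Lorenz curve as I(Q(p), 1)/μ and a Bonferroni curve as L(p)/p, which are the usual definitions and are assumed here. It does so for the power function density c·x^(c−1) on (0, 1), one of the special cases named in this paper, because its quantile and mean have simple closed forms; it is not the SPBD expression (27). All names are hypothetical.

```python
def incomplete_moment(pdf, z, k, lower=0.0, n=2000):
    """I(z, k): integral of x**k * pdf(x) from `lower` to z (composite Simpson, n even)."""
    h = (z - lower) / n
    g = lambda x: x ** k * pdf(x)
    total = g(lower) + g(z)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lower + i * h)
    return total * h / 3

# Power function distribution on (0, 1) with hypothetical shape c = 2.
c = 2.0
pdf = lambda x: c * x ** (c - 1)
quantile = lambda p: p ** (1 / c)      # inverse of the CDF x**c
mean = c / (c + 1)

def lorenz(p):
    """Lorenz curve L(p) = I(Q(p), 1) / mean."""
    return incomplete_moment(pdf, quantile(p), 1) / mean

def bonferroni(p):
    """Bonferroni curve B(p) = L(p) / p."""
    return lorenz(p) / p

print(lorenz(0.25), bonferroni(0.25))
```

For this density the Lorenz curve has the closed form p^((c+1)/c), which gives a direct check on the numerical integration.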

Parameters Estimation of the SPBD
The maximum likelihood estimation (MLE) method will be used for estimating the parameters of the SPBD. Let x_1, x_2, …, x_n be a random sample from SPBD(a, b, α, β, A, B), as given by (9). The log-likelihood function is the sum of log f(x_i) over the sample, and setting its partial derivatives with respect to the six parameters equal to zero gives the likelihood equations (28)-(33). Since Eqs. (28)-(33) are not easy to solve explicitly, a numerical technique, such as the Newton-Raphson method or any other well-known optimization algorithm, see Shi et al. [22], may be employed, or a well-known software package, such as maxLik, Henningsen and Toomet [23], or GAMLSS, Stasinopoulos and Rigby [24], may be used to find the MLE of the parameters of the SPBD.
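The Newton-Raphson idea can be illustrated on a much simpler problem than the six likelihood equations. The sketch below is a stand-in, not the SPBD system (28)-(33): it fits the single shape parameter c of the power function density c·x^(c−1) on (0, 1), one of the special cases named in this paper, for which the MLE also has the closed form −n/Σ log x_i, so the iteration can be verified. The data values, the starting point and the function name are hypothetical.

```python
import math

def newton_raphson_power_mle(xs, c0=1.0, tol=1e-12, max_iter=100):
    """Newton-Raphson for the MLE of c in the density c * x**(c - 1), 0 < x < 1.
    Log-likelihood: n*log(c) + (c - 1)*sum(log x); score: n/c + sum(log x);
    second derivative: -n / c**2."""
    n = len(xs)
    s = sum(math.log(x) for x in xs)
    c = c0
    for _ in range(max_iter):
        score = n / c + s
        hessian = -n / c ** 2
        step = score / hessian
        c -= step                      # Newton update: c_new = c - score / hessian
        if abs(step) < tol:
            break
    return c

# Hypothetical sample on (0, 1), for illustration only.
xs = [0.9, 0.8, 0.7, 0.95, 0.6]
c_hat = newton_raphson_power_mle(xs)
closed_form = -len(xs) / sum(math.log(x) for x in xs)
print(c_hat, closed_form)
```

In the six-parameter case the score becomes a vector and the second derivative a matrix, and the support itself depends on the parameters, which is one reason a package such as maxLik is a practical choice.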

A Simulation Study
In order to examine the performance of the MLE method given in Sect. 4, we perform a simulation study. The bias and the mean squared error (MSE) of the estimates are the principal measures of performance.
The statistical software R and the Absoft Pro Fortran compiler are employed for computing. The maxLik package of the statistical software R is used mainly for computing the MLEs, see Henningsen and Toomet [23] for details of this package, while the Absoft Pro Fortran is used for other needed computations.
The six miscellaneous SPBD models given in Table 1, which have different pdf shapes and variable ranges, are used to simulate data sets, and for each data set the bias and the MSE of the MLEs of the model parameters are computed for different simulated sample sizes. The sample sizes taken are 25, 50, 100, 300, 500, and 1000. In each situation, the parameters of, say, the first of the six SPBD models given in Table 1 are estimated from 5000 random variates generated from the given SPBD model, and the sample mean, bias, variance, and MSE of each parameter estimate are computed, with

$$\mathrm{MSE}(\hat\theta)=\mathrm{Var}(\hat\theta)+\left[\mathrm{Bias}(\hat\theta)\right]^{2}.$$

This procedure is repeated for each sample size, and then for each SPBD model. Table 3 shows the bias of the estimated parameters of the different simulated SPBD data sets for each sample size, while Table 4 presents the MSE of the estimated parameters of the different simulated SPBD data sets for each sample size. Both Tables 3 and 4 show, for each of the SPBD model parameters, that the bias and MSE decrease as the sample size increases. Figure 3 shows the MSE plots of the estimated parameters for the six simulated SPBD data sets, which show graphically, for each of the SPBD model parameters, that the MSE decreases as the sample size increases. Hence, since the MSE decreases as the sample size increases, we may conclude that the MLE method seems to have high efficiency when the sample size becomes large. Table 5 shows the actual values and the MLE parameter values (as the average values over the 5000 replications) of the different simulated SPBD data sets, and Fig. 4 shows their corresponding pdf plots.
Table 3 The bias of the estimated parameters of the simulated SPBD data sets for each sample size n
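The performance measures used in the simulation can be sketched as follows. The function takes the estimates of one parameter over the replications and returns their mean, bias, variance and MSE, with the MSE formed as variance plus squared bias. The divisor N in the variance is an assumption made here (with it, the MSE equals the average squared error exactly); the function name and the numbers are hypothetical and are not results from the paper.

```python
def bias_var_mse(estimates, true_value):
    """Mean, bias, variance (divisor N, an assumption) and MSE = Var + Bias**2
    of a list of replicated estimates of a single parameter."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - true_value
    var = sum((e - mean_est) ** 2 for e in estimates) / n
    mse = var + bias ** 2
    return mean_est, bias, var, mse

# Hypothetical replicated estimates of a parameter whose true value is 2.0.
estimates = [1.8, 2.1, 2.4, 1.9]
mean_est, bias, var, mse = bias_var_mse(estimates, 2.0)
print(mean_est, bias, var, mse)
```

In a full study this function would be applied to each of the six parameters, for each model and each sample size, with the list holding the estimates from all replications.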
In conclusion, the simulation indicates that the MLE method is appropriate and can be used to estimate the parameters of the SPBD models.
We will refer to this data set as the within street mosque data. Table 6 presents some statistics of the observed mosque data sets. Using both mosque data sets, the MLE method was employed to estimate the parameters of the SPBD model for each. Table 7 shows the observed and predicted frequencies, the model parameter estimates, and the chi-square goodness-of-fit test for the SPBD, gamma, exponential, four-parameter beta, and generalized beta of the first kind distributions, as well as the likelihood ratio test (LRT) for the nested models of the SPBD, namely the four-parameter beta and the generalized beta of the first kind distributions. Figure 5 illustrates the histograms and the fitted pdfs for both the main street and within street mosque data sets. For the main street data set, the p values of the chi-square goodness-of-fit test for the gamma, exponential, four-parameter beta, and generalized beta of the first kind distributions are all smaller than 0.05, whereas the p value for the SPBD model equals 0.9488, so the SPBD performs better than all these distributions. For the within street mosque data set, the chi-square goodness-of-fit p value of the generalized beta of the first kind distribution equals 0.23087, indicating that this distribution can fit the data; nevertheless, the SPBD model performs better in this case, since its p value equals 0.96088. Since the p values of the chi-square goodness-of-fit test for the gamma, exponential, and four-parameter beta distributions are smaller than 0.05, the SPBD performs better than these distributions as well.
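The chi-square goodness-of-fit comparison rests on the statistic Σ (O − E)²/E computed from observed and predicted class frequencies. A minimal sketch is given below; the degrees-of-freedom rule (number of classes minus one minus the number of estimated parameters) is the usual convention and is assumed here rather than taken from the paper. The frequencies are hypothetical and are not the mosque data.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)**2 / E over the classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def gof_degrees_of_freedom(n_classes, n_estimated_params):
    """Usual convention (assumed): classes - 1 - number of estimated parameters."""
    return n_classes - 1 - n_estimated_params

# Hypothetical observed and predicted class frequencies.
observed = [12, 30, 28, 10]
expected = [10.0, 32.0, 26.0, 12.0]
stat = chi_square_statistic(observed, expected)
print(stat)
```

A small statistic relative to its degrees of freedom gives a large p value, which is the sense in which a p value close to 1 indicates a close fit.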
Next, the p values of the likelihood ratio test (LRT) for the nested models of the SPBD, namely the four-parameter beta and the generalized beta of the first kind distributions, are less than 0.05, indicating that the SPBD performs statistically better for both the main street and within street data sets. These findings indicate that the SPBD outperforms the gamma, exponential, four-parameter beta, and generalized beta of the first kind distributions and provides the best fit for both the main street and within street mosque data sets.
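The likelihood ratio comparison of nested models can be sketched as follows. The statistic is twice the difference between the maximized log-likelihoods of the full and the nested model, and it is referred to a chi-square distribution whose degrees of freedom equal the number of parameters fixed in the nested model. To stay within the Python standard library, the tail probability below uses the closed-form finite sum that holds for even degrees of freedom only. The log-likelihood values and the degrees of freedom are hypothetical and are not taken from Table 7.

```python
import math

def chi2_sf_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), for even df only:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!"""
    if df <= 0 or df % 2:
        raise ValueError("this sketch supports positive even df only")
    if x <= 0:
        return 1.0
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def likelihood_ratio_test(loglik_nested, loglik_full, df):
    """LRT statistic 2 * (l_full - l_nested) and its chi-square p value."""
    stat = 2.0 * (loglik_full - loglik_nested)
    return stat, chi2_sf_even_df(stat, df)

# Hypothetical maximized log-likelihoods; df = 2 restricted parameters.
stat, p_value = likelihood_ratio_test(-105.0, -100.0, 2)
print(stat, p_value)
```

A p value below 0.05 leads to rejecting the nested model in favor of the full one, which is how the comparison with the four-parameter beta and the generalized beta of the first kind is read.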

Summary
A new six-parameter beta distribution is introduced, which has a more flexible shape and a wider bounded domain than the two-parameter (standard) and four-parameter beta distributions, and some of its various shapes are given to show its flexibility. Its boundaries, limits, mode, quantiles, reliability and hazard functions, Renyi entropy, and Lorenz and Bonferroni curves are studied. This distribution is closed under scaling and exponentiation, has a reflection symmetry property, and has several well-known distributions as special cases, such as the two- and four-parameter beta, generalized modification of the Kumaraswamy, generalized beta of the first kind, power function, Kumaraswamy power function, Minimax, exponentiated Pareto, and generalized uniform distributions. Its order statistics and moment generating function are derived, together with its moments, consisting of the mean, variance, moments about the origin, harmonic mean, incomplete moments, probability weighted moments, and mean deviations. The maximum likelihood estimation method is used for estimating its parameters and is applied to six different simulated data sets of this distribution having different pdf shapes, in order to check the performance of the estimation method through the mean squared errors of the estimated parameters computed for different simulated sample sizes. These errors are shown to decrease as the sample size increases, indicating that the MLE method is appropriate and can be used to estimate the parameters of the SPBD models. Finally, two real-life data sets, representing the waiting period of Muslim worshipers from the time of entering the mosque until the actual start of the Alfajir prayer in two different mosques, are used to show the usefulness and flexibility of this distribution in application to real-life data. The MLE method was employed with these data sets to estimate the parameters of the SPBD, gamma, exponential, four-parameter beta, and generalized beta of the first kind distributions. The chi-square goodness-of-fit test for these distributions, as well as the LRT for the nested models of the SPBD, namely the four-parameter beta and the generalized beta of the first kind distributions, were then applied, and the p values of these tests show that the SPBD statistically outperforms the other stated distributions.

Table 7 Observed and predicted frequencies, model parameter estimates and goodness of fit for the mosque data sets