Bootstrap confidence intervals of C_Npk for type-II generalized log-logistic distribution

This paper deals with the construction of confidence intervals for a process capability index (proposed by Chen and Pearn in Qual Reliab Eng Int 13(6):355–360, 1997) using the bootstrap method and simulation techniques. It is assumed that the quality characteristic follows the type-II generalized log-logistic distribution introduced by Rosaiah et al. in Int J Agric Stat Sci 4(2):283–292, 2008. Different bootstrap confidence intervals for the process capability index are discussed. The maximum likelihood method is used to obtain the estimators of the parameters. Monte Carlo simulation is applied to find the coverage probabilities and average widths of the bootstrap confidence intervals. The results are illustrated with real data sets.


Introduction
In the present era, the term "process capability index" finds frequent space in the statistical quality control literature. If one is keen to study whether an ongoing production process is moving according to predefined specifications, the process capability index is the right technique to choose, as it helps in monitoring and analyzing process quality and productivity. Statistical quality control refers to the use of statistical techniques in monitoring and maintaining the standards of products and services, and a quality evaluation procedure such as process capability analysis helps the manufacturer to achieve consumer quality expectations. It is an effective measure to gauge the quality of a production process. Process capability analysis is used to determine whether the process capability of a supplier conforms to a customer's specifications, by applying an expression called the process capability index (PCI) to a controlled process. The PCI is one such tool to measure process quality at given specifications. As this method is simple and transparent, and its underlying assumptions are not complicated compared with conventional methods, PCI methods became popular. The first PCI was developed by Juran (1974); later, different PCIs originated under the assumption that the underlying distribution is normal, viz. Vännman (1995), Aslam et al. (2013), etc. As pointed out by Kane (1986) and Gunter (1989a, b, c, d), the quality characteristic may not be normal on many occasions, and assuming normality in such cases could lead to inaccurate and unreliable results. Clements (1989) introduced two PCIs, C_p and C_pk, for non-normal data by relaxing the normality assumption and brought in the concept of quantile-based PCIs, which were later developed further by Vännman (1995). In a similar way, Kane (1986) developed a PCI without the normality assumption. Distinguished statisticians have made efforts in developing different PCI methods, viz. Chan et al.
(1988), Pearn and Chen (1995), Chen and Pearn (1997), Wood (2005), Chen et al. (2008), Wu and Liang (2010), Perakis (2010) and Kashif et al. (2017). Peng (2010a, b) developed parametric lower confidence limits of quantile-based PCIs and also studied PCIs for processes with skewed distributions. Similar developments can be seen in Kantam et al. (2010) and Wararit and Somchit (2012) for the half-logistic distribution, and in Rao et al. (2015) for the inverse Rayleigh and log-logistic distributions. The main aim behind the development of PCIs is to give an indication of whether the quality process is moving in line with predefined standards. These standards can be determined by setting a lower specification limit (LSL) and an upper specification limit (USL). In the traditional approach, the quality process is assumed to be normally distributed with mean μ and standard deviation σ, and the PCI C_pk is given by

C_pk = min{(USL − μ)/3σ, (μ − LSL)/3σ}.  (1)

The sample mean x̄ and standard deviation s derived from a random sample X_1, X_2, …, X_n of size n are used to estimate the unknown parameters μ and σ; hence,

Ĉ_pk = min{(USL − x̄)/3s, (x̄ − LSL)/3s}.  (2)

Clements (1989) suggested that if the process characteristic is drawn from a non-normal distribution, the PCI C_pk can be constructed for any distribution as

C_pk = min{(USL − M)/(U_p − M), (M − LSL)/(M − L_p)},  (3)

where U_p, L_p and M are, respectively, the 99.865th, 0.135th and 50th percentiles of the concerned distribution. Another method, proposed by Chen and Pearn (1997) for a process from a non-normal distribution, is

C_Np(u, v) = (d − u|F^(−1)(p_2) − m|) / (3[((F^(−1)(p_3) − F^(−1)(p_1))/6)^2 + v(F^(−1)(p_2) − T)^2]^(1/2)),  (4)

where F^(−1)(q) is the qth quantile, i.e., P(X < F^(−1)(q)) = q, p_1 = 0.00135, p_2 = 0.5, p_3 = 0.99865, d = (USL − LSL)/2, m = (USL + LSL)/2 and T is the target value; from (4), with u = 1 and v = 0, we have

C_Npk = (d − |F^(−1)(0.5) − m|) / ((F^(−1)(0.99865) − F^(−1)(0.00135))/2).  (5)

As described above, many PCI methods have been developed so far, and among the most widely used are C_p and C_pk developed by Kane (1986). In this paper, we study the process capability index C_Npk when the quality process follows TGLLD. The rest of the article is organized as follows.
Introduction of TGLLD and the estimation of its parameters using the ML method are given in the "Type-II generalized log-logistic distribution" section. In the "Bootstrap confidence intervals" section, the bootstrap confidence intervals are determined for the PCI proposed by Chen and Pearn (1997). In the "Simulation study" section, simulated results for small-sample comparison are tabulated. Finally, the benefit of the PCI so developed for TGLLD is demonstrated with an example in the "Illustrative example" section.
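The quantile-based index in (5) is straightforward to compute once a quantile function is available. The following minimal Python sketch (function and variable names are illustrative, not from the paper) evaluates C_Npk for any distribution supplied through its quantile function; for a normal process the 0.00135–0.99865 quantile range equals 6σ, so the index reduces to the classical C_pk.

```python
from statistics import NormalDist

def c_npk(quantile, lsl, usl):
    """Quantile-based index C_Npk of Chen and Pearn (1997), i.e. C_Np(u=1, v=0).

    `quantile` maps q in (0, 1) to the q-th quantile F^(-1)(q) of the process.
    """
    d = (usl - lsl) / 2.0       # half-width of the specification interval
    m = (usl + lsl) / 2.0       # mid-point of the specification interval
    median = quantile(0.5)
    # half of the "natural" 0.00135-0.99865 process spread
    spread = (quantile(0.99865) - quantile(0.00135)) / 2.0
    return (d - abs(median - m)) / spread

# For a normal process the 0.00135-0.99865 range is 6*sigma, so
# C_Npk coincides (up to quantile rounding) with min((USL-mu)/3s, (mu-LSL)/3s).
nd = NormalDist(mu=10.0, sigma=1.0)
print(c_npk(nd.inv_cdf, lsl=4.0, usl=16.0))   # approximately 2.0
```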

Type-II generalized log-logistic distribution
Log-logistic distribution (LLD) has proven its importance in quality control, and different authors have developed properties of, and acceptance sampling plans for, the LLD. The cumulative distribution function (CDF) of the LLD is

F(t) = (t/σ)^λ / (1 + (t/σ)^λ),  t > 0,

where σ is the scale parameter and λ is the shape parameter. The practical pertinence of the generalized log-logistic distribution (GLLD) in diverse sectors attracted various authors to develop extensions for effective and wide use of the log-logistic distribution, viz. Rosaiah et al. (2006, 2007). One such extension is the type-II generalized log-logistic distribution (TGLLD) introduced by Rosaiah et al. (2008); its cumulative distribution function (CDF) is

F(t) = 1 − [1 + (t/σ)^λ]^(−θ),  t > 0, σ, λ, θ > 0.  (6)

It may be noted that the distribution given in (6) is defined through the reliability-oriented generalization of the log-logistic distribution. In short, we call this the type-II generalized log-logistic distribution [the type-I generalized (exponentiated) log-logistic distribution is given by Rosaiah et al. (2006)]. The corresponding probability density function (PDF) is given by

f(t) = (λθ/σ)(t/σ)^(λ−1)[1 + (t/σ)^λ]^(−(θ+1)),  t > 0,  (7)

where σ is the scale parameter, and λ and θ are shape parameters. The three-parameter TGLLD will be denoted by TGLLD(σ, λ, θ). If θ = 1, then Eq. (7) becomes the log-logistic distribution, and if λ = 1, then TGLLD becomes the Pareto type-II distribution. Since the log-logistic distribution is also a survival model, as exemplified by many authors in the past, and a series system of independent components with a common log-logistic lifetime distribution for each component leads to this generalization, we are motivated to study some inferential aspects of the distribution of such a series system. As not much work is reported on such a model, we attempt some theoretical and applied inferential problems for the type-II generalized log-logistic model.
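The CDF (6) and PDF (7) are easy to code directly. The short sketch below (illustrative names) implements both and checks the stated special case that θ = 1 recovers the ordinary log-logistic CDF.

```python
def tglld_cdf(t, sigma, lam, theta):
    """CDF of the type-II generalized log-logistic distribution, Eq. (6)."""
    return 1.0 - (1.0 + (t / sigma) ** lam) ** (-theta)

def tglld_pdf(t, sigma, lam, theta):
    """PDF of TGLLD, Eq. (7)."""
    z = (t / sigma) ** lam
    return (lam * theta / sigma) * (t / sigma) ** (lam - 1) * (1.0 + z) ** (-(theta + 1))

# theta = 1 recovers the ordinary log-logistic CDF (t/sigma)^lam / (1 + (t/sigma)^lam)
t, sigma, lam = 2.0, 1.5, 3.0
lld = (t / sigma) ** lam / (1.0 + (t / sigma) ** lam)
assert abs(tglld_cdf(t, sigma, lam, theta=1.0) - lld) < 1e-12
```

Setting lam = 1 likewise gives the Pareto type-II CDF 1 − (1 + t/σ)^(−θ), the other special case noted above.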
The model provided more accurate results, especially when the data were examined for quality characteristics. Rao et al. (2012a, b) developed reliability test plans for this distribution. The reliability function and hazard (failure rate) function of the type-II generalized log-logistic distribution are, respectively, given by

R(t) = [1 + (t/σ)^λ]^(−θ),  t > 0,

h(t) = (λθ/σ)(t/σ)^(λ−1) / [1 + (t/σ)^λ].

Let t_1, t_2, …, t_n be a random sample of size n drawn from TGLLD(σ, λ, θ); then the likelihood function L of the sample is

L = ∏_{i=1}^{n} (λθ/σ)(t_i/σ)^(λ−1)[1 + (t_i/σ)^λ]^(−(θ+1)).

The log-likelihood function is

ln L = n ln λ + n ln θ − n ln σ + (λ − 1) Σ_{i=1}^{n} ln(t_i/σ) − (θ + 1) Σ_{i=1}^{n} ln[1 + (t_i/σ)^λ].

The log-likelihood equations to obtain the MLEs of σ, λ and θ are obtained by equating the partial derivatives to zero:

∂ln L/∂σ = −nλ/σ + (θ + 1)(λ/σ) Σ_{i=1}^{n} (t_i/σ)^λ / [1 + (t_i/σ)^λ] = 0,

∂ln L/∂λ = n/λ + Σ_{i=1}^{n} ln(t_i/σ) − (θ + 1) Σ_{i=1}^{n} (t_i/σ)^λ ln(t_i/σ) / [1 + (t_i/σ)^λ] = 0,

∂ln L/∂θ = n/θ − Σ_{i=1}^{n} ln[1 + (t_i/σ)^λ] = 0.

These equations admit no closed-form solution and are solved numerically. Let the parameter vector of TGLLD be represented by Θ = (σ, λ, θ); by the invariance property of the MLE, ξ̂_q = ξ_q(Θ̂) becomes the maximum likelihood estimator (MLE) of the quantile ξ_q, where Θ̂ = (σ̂, λ̂, θ̂) is the MLE of Θ. Hence, the MLE of the proposed PCI is

Ĉ_Npk = C_Npk(Θ̂) = (d − |ξ_{p_2}(Θ̂) − m|) / ((ξ_{p_3}(Θ̂) − ξ_{p_1}(Θ̂))/2).

Therefore, Ĉ_Npk is a real-valued function of the quantiles at p_1, p_2 and p_3.
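As a rough illustration of the numerical solution, note that the score equation for θ gives θ in closed form for fixed (σ, λ), namely θ = n / Σ ln[1 + (t_i/σ)^λ], so θ can be profiled out and the remaining two parameters found by search. The Python sketch below uses a crude grid search purely for illustration; the paper does not specify its optimizer, and the function names, grids and seed here are hypothetical.

```python
import math
import random

def neg_loglik(params, data):
    """Negative log-likelihood of TGLLD(sigma, lam, theta) for a sample."""
    sigma, lam, theta = params
    if min(sigma, lam, theta) <= 0:
        return math.inf
    n = len(data)
    s = sum(math.log(1.0 + (t / sigma) ** lam) for t in data)
    return -(n * math.log(lam) + n * math.log(theta) - n * math.log(sigma)
             + (lam - 1) * sum(math.log(t / sigma) for t in data)
             - (theta + 1) * s)

def profile_mle(data, sigma_grid, lam_grid):
    """Crude grid search; theta is profiled out via its score equation
    theta_hat = n / sum(log(1 + (t_i/sigma)^lam))."""
    n = len(data)
    best = None
    for sigma in sigma_grid:
        for lam in lam_grid:
            theta = n / sum(math.log(1.0 + (t / sigma) ** lam) for t in data)
            ll = -neg_loglik((sigma, lam, theta), data)
            if best is None or ll > best[0]:
                best = (ll, sigma, lam, theta)
    return best[1:]

# simulate TGLLD data via the inverse CDF: t = sigma*((1-u)^(-1/theta) - 1)^(1/lam)
random.seed(1)
sigma0, lam0, theta0 = 1.0, 4.0, 4.0
data = [sigma0 * ((1 - random.random()) ** (-1 / theta0) - 1) ** (1 / lam0)
        for _ in range(500)]
grid = [0.6 + 0.05 * i for i in range(20)]
est = profile_mle(data, grid, [3.0 + 0.25 * i for i in range(10)])
print(est)   # (sigma_hat, lam_hat, theta_hat), limited by the grid resolution
```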
The qth quantile of TGLLD with parameters Θ = (σ, λ, θ) is obtained by inverting the CDF (6):

ξ_q = σ[(1 − q)^(−1/θ) − 1]^(1/λ),  0 < q < 1.

The bootstrap technique is considered for small-sample comparison, as it may be difficult to find the mathematical form of the sampling distribution of Ĉ_Npk.
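The quantile formula can be verified by a round trip through the CDF, as in this short sketch (illustrative names):

```python
def tglld_quantile(q, sigma, lam, theta):
    """q-th quantile of TGLLD, from inverting F(t) = 1 - (1 + (t/sigma)^lam)^(-theta)."""
    return sigma * ((1.0 - q) ** (-1.0 / theta) - 1.0) ** (1.0 / lam)

# round-trip check against the CDF at the three quantile levels used by C_Npk
sigma, lam, theta = 1.0, 4.0, 4.0
for q in (0.00135, 0.5, 0.99865):
    t = tglld_quantile(q, sigma, lam, theta)
    cdf = 1.0 - (1.0 + (t / sigma) ** lam) ** (-theta)
    assert abs(cdf - q) < 1e-12
```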

Bootstrap confidence intervals
Bootstrap sampling is a method of drawing samples (with replacement) from the underlying probability distribution. Efron (1982) introduced this computationally intensive, computer-based simulation method for estimating the parameters under consideration. Estimates of the PCI are determined through bootstrap confidence intervals. As stated by Efron and Tibshirani (1993), a minimum of 1000 bootstrap resamples should be considered to obtain reasonably accurate confidence interval estimates. Among many other methods, three types of bootstrap confidence interval developed by Efron and Tibshirani are considered in this study, viz. the standard bootstrap (SB) confidence interval, the percentile bootstrap (PB) confidence interval and the bias-corrected percentile bootstrap (BCPB) confidence interval. Let t_1, t_2, …, t_n be a random sample of size n drawn from a quality process following TGLLD; then t*_1, t*_2, …, t*_n is a bootstrap sample of size n drawn with replacement from the original sample. Using this bootstrap sample, the bootstrap estimate of Ĉ_Npk, denoted by Ĉ*_Npk, can be obtained. For B bootstrap samples, we obtain B bootstrap estimates Ĉ*_Npk and arrange them in ascending order, i.e., Ĉ*_Npk(1), Ĉ*_Npk(2), …, Ĉ*_Npk(B), which forms an empirical bootstrap distribution of Ĉ_Npk. Here, we take B = 10,000 bootstrap samples.
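The resampling scheme can be sketched as follows. The paper's estimator requires refitting the MLE and recomputing Ĉ*_Npk on each resample; for brevity, this illustrative sketch substitutes a toy statistic (the sample median) so that only the resampling mechanics are shown, and its names and data are hypothetical.

```python
import random

def bootstrap_distribution(data, statistic, B=1000, seed=0):
    """Draw B with-replacement resamples of the same size as `data` and
    return the statistic values sorted in ascending order."""
    rng = random.Random(seed)
    n = len(data)
    stats = [statistic([rng.choice(data) for _ in range(n)]) for _ in range(B)]
    stats.sort()
    return stats

# toy statistic standing in for C_Npk(theta_hat*): the (upper) sample median
data = [1.2, 0.8, 1.5, 0.9, 1.1, 1.3, 0.7, 1.0, 1.4, 1.6]
boot = bootstrap_distribution(data, lambda s: sorted(s)[len(s) // 2], B=1000)
assert len(boot) == 1000 and boot == sorted(boot)
```

The ordered list `boot` plays the role of the empirical bootstrap distribution Ĉ*_Npk(1), …, Ĉ*_Npk(B) from which all three intervals below are read off.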

Standard bootstrap (SB) confidence interval
Here, the PCI estimate under study is Ĉ_Npk; the jth bootstrap sample estimator of Ĉ_Npk is

Ĉ*_Npk(j) = C_Npk(Θ̂*(j)),  j = 1, 2, …, B,

where Θ̂*(j) is the jth bootstrap estimator of Θ = (σ, λ, θ). Hence, the sample average and standard deviation of the bootstrap estimates are obtained as

C̄*_Npk = (1/B) Σ_{j=1}^{B} Ĉ*_Npk(j)  and  S* = [(1/(B − 1)) Σ_{j=1}^{B} (Ĉ*_Npk(j) − C̄*_Npk)^2]^(1/2),

and the 100(1 − α)% standard bootstrap (SB) confidence interval for C_Npk is

CI_SB = [C̄*_Npk − z_{1−α/2} S*, C̄*_Npk + z_{1−α/2} S*],

where z_{1−α/2} is the (1 − α/2)th quantile of the standard normal distribution.


Percentile bootstrap (PB) confidence interval

From the ordered bootstrap estimates Ĉ*_Npk(1) ≤ Ĉ*_Npk(2) ≤ ⋯ ≤ Ĉ*_Npk(B), the 100(1 − α)% percentile bootstrap (PB) confidence interval for C_Npk is obtained from the (α/2)th and (1 − α/2)th empirical quantiles:

CI_PB = [Ĉ*_Npk(B·α/2), Ĉ*_Npk(B·(1 − α/2))].
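A minimal sketch of the SB and PB intervals, operating on an already-computed list of bootstrap estimates (names illustrative; the stand-in bootstrap values below are synthetic, not from the paper):

```python
import math
import statistics
from statistics import NormalDist

def sb_interval(boot, alpha=0.05):
    """Standard bootstrap CI: bootstrap mean +/- z_{1-alpha/2} * bootstrap SD."""
    mean = statistics.fmean(boot)
    sd = statistics.stdev(boot)                 # divisor B - 1
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return mean - z * sd, mean + z * sd

def pb_interval(boot, alpha=0.05):
    """Percentile bootstrap CI: the (B*alpha/2)-th and (B*(1-alpha/2))-th
    ordered bootstrap values (1-based ranks, clamped to valid indices)."""
    b = sorted(boot)
    B = len(b)
    lo = b[max(0, math.ceil(B * alpha / 2) - 1)]
    hi = b[min(B - 1, math.ceil(B * (1 - alpha / 2)) - 1)]
    return lo, hi

boot = [0.9 + 0.001 * j for j in range(1000)]   # synthetic ordered C*_Npk values
print(sb_interval(boot))
print(pb_interval(boot))
```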

Bias-corrected percentile bootstrap (BCPB) confidence interval
As the name indicates, the complete bootstrap distribution obtained from a sample may be shifted above or below the expected value; this shift is the bias. The third method, the bias-corrected percentile bootstrap (BCPB) confidence interval, is introduced to correct this potential bias. We use the ordered distribution of Ĉ*_Npk to compute the probability

p_0 = Pr(Ĉ*_Npk ≤ Ĉ_Npk),

where Ĉ_Npk is the estimate obtained from the original sample. Then, the following are determined:

1. The bias-correction factor Z_0 = Φ^(−1)(p_0), where Φ(·) is the standard normal cumulative distribution function.
2. The percentiles P_L = Φ(2Z_0 + z_{α/2}) and P_U = Φ(2Z_0 + z_{1−α/2}), computed using the Z_0 value.

Hence, the 100(1 − α)% bias-corrected percentile bootstrap (BCPB) confidence interval for C_Npk is given by

CI_BCPB = [Ĉ*_Npk(P_L·B), Ĉ*_Npk(P_U·B)],

where Ĉ*_Npk(r) is the rth ordered value of the B bootstrap estimators of Ĉ_Npk. To determine the performance of the above three confidence intervals, we consider their estimated coverage probabilities and average widths. The probability that the true value of C_Npk is covered by the 100(1 − α)% bootstrap confidence interval is called the "coverage probability," and it is obtained for each of the methods discussed above. In addition, the average width of the bootstrap confidence interval is calculated based on 5000 different trials. The performance of the confidence intervals CI_SB, CI_PB and CI_BCPB, based on their estimated coverage probabilities and average widths, is studied through simulation.
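The BCPB steps translate directly into code. In this sketch (illustrative names, synthetic bootstrap values), a symmetric bootstrap distribution centred on the original estimate gives p_0 ≈ 0.5, hence Z_0 ≈ 0, and the interval collapses to the PB interval, as expected.

```python
import math
from statistics import NormalDist

def bcpb_interval(boot, point_estimate, alpha=0.05):
    """Bias-corrected percentile bootstrap CI from ordered bootstrap values."""
    b = sorted(boot)
    B = len(b)
    nd = NormalDist()
    # step 0: proportion of bootstrap values at or below the original estimate
    p0 = sum(1 for x in b if x <= point_estimate) / B
    p0 = min(max(p0, 1.0 / B), 1.0 - 1.0 / B)   # keep inv_cdf finite
    # step 1: bias-correction factor
    z0 = nd.inv_cdf(p0)
    # step 2: corrected percentile levels
    p_l = nd.cdf(2 * z0 + nd.inv_cdf(alpha / 2))
    p_u = nd.cdf(2 * z0 + nd.inv_cdf(1 - alpha / 2))
    lo = b[max(0, math.ceil(B * p_l) - 1)]
    hi = b[min(B - 1, math.ceil(B * p_u) - 1)]
    return lo, hi

# symmetric synthetic bootstrap distribution centred on the estimate 1.3995
boot = [0.9 + 0.001 * j for j in range(1000)]
print(bcpb_interval(boot, point_estimate=1.3995))
```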

Simulation study
The present section gives the results of a simulation study evaluating the three bootstrap confidence intervals of the process capability index C_Npk of TGLLD. With the scale parameter fixed at 1 and shape parameter values 4, 5, 6 and 7, we consider sample sizes n = 10, 15, 20, 25, 30 and set the lower and upper specification limits as 1 and 29, respectively. B = 10,000 bootstrap samples of size n are generated from the original sample, and the exercise is repeated 5000 times. Using the three methods, i.e., SB, PB and BCPB, 95% confidence intervals are obtained. The average width (the difference between the upper and lower confidence limits), along with the bias and MSE, is calculated to compare the simulation results, which are presented in Tables 1, 2, 3 and 4. The criteria for comparison are a lower average width and a higher coverage probability.
It is noticed from the results given in Tables 1, 2, 3 and 4 that the average width falls as the sample size grows, indicating that moderately large samples give better results. Comparing average widths, the BCPB method records lower values than the SB and PB methods, following the order BCPB < PB < SB. The average width of all the methods shows an upward trend when the shape parameter rises from 4 to 7; a similar pattern is observed when the other shape parameter rises from 3.5 to 5. From the coverage probabilities recorded in Tables 1, 2, 3 and 4 for all three methods, a rising pattern is observed as the shape parameter increases from 4 to 7. The SB method records estimated coverage probabilities above the nominal confidence level (0.95), higher than those of the BCPB and PB methods, and the pattern observed is PB < BCPB < SB. In the BCPB method, these probabilities come closer to the confidence level (0.95) as the shape parameter rises from 4 to 7. When the sample size is increased to 30, the bias and MSE record their lowest values when both shape parameters equal 4.


Illustrative example

The proposed methods are illustrated with a real data set of runoff amounts. The fitted density superimposed on the data gives a rough indication that the TGLLD is a good fit, as shown in Fig. 1, and the goodness of fit is further emphasized by the Q-Q plot, also displayed in Fig. 1. The maximum likelihood estimates of the three-parameter TGLLD for the runoff amounts are 0.7616, 2.6602 and 1.772; the Kolmogorov–Smirnov test finds that the maximum distance between the data and the fitted TGLLD is 0.0657 with p value 0.9999. The high p value shows that the TGLLD model fits this non-normal data set well. Meanwhile, the maximum likelihood estimates of the two-parameter TGLLD for the runoff amounts are 2.6602 and 1.772; here, the Kolmogorov–Smirnov test finds a maximum distance of 0.2526 with p value 0.1891.
Therefore, the two-parameter TGLLD also provides a reasonably good fit for the runoff amounts.
The bootstrap confidence intervals and widths of C_Npk and C_pk for this example are given in Table 5. The numerical example shows that the interval width is considerably larger with the traditional C_pk method than with the bootstrap approach for C_Npk. Moreover, among the three bootstrap methods, BCPB performs better than the other two, in agreement with the simulation results.

Conclusions
In this article, we constructed bootstrap confidence intervals of the process capability index C_Npk (proposed by Chen and Pearn (1997)) by applying a simulation technique, assuming that the underlying distribution is TGLLD. Bootstrap confidence intervals are constructed using three methods, i.e., SB, PB and BCPB. The ML method of estimation is used to estimate the parameters under study. The performance of these methods is compared through the average width, coverage probability, bias and MSE derived from the simulation results. When both average width and coverage probability are considered as the performance criteria, the BCPB method gives better results than the other two methods.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.