1 Introduction

Population-based mass screening programs are often involved with a two-stage procedure for the identification of disease of interest. In such a two-stage procedure, the attendees are screened with a first-line tool followed by the provision of confirmatory diagnosis for those with positive results in the second stage. In colorectal cancer (CRC) screening program, a stool-based examination such as fecal immunological test (FIT) is used as the first-line method. Attendees with positive FIT are then referred for the confirmatory examination by using colonoscopy at the second stage. During this two-stage process, the waiting time (WT) for confirmatory diagnosis plays a central role in the balance between the design and scale of a screening program and the limited clinical capacity of offering confirmatory procedure. As a long WT may result in the delayed diagnosis for cancer and shortening WT may impose tremendous burden on the confirmatory procedure, the determination on the scale of providing screening service should thus take into account the impact of WT due to referral for FIT-positive subjects and clinical capacity. Such a two-stage procedure of population-based screening has been implemented in Taiwan and also in other countries including South Korea, Japan, Australia, Italy, Spain, Netherlands, the UK, the USA, and so on (Rabeneck et al., 2020).

As population-based screening commences from the uptake of screening for healthy and asymptomatic population, the arrival rate of attendees is thus the first key factor affecting the number of test-positive subjects and the WT. In addition to the scale of population enrollment, the change in the positive criteria or the cutoff of screening test will also affect the number of subjects requiring for referral and the WT for confirmatory diagnosis given the cap of clinical capacity. The pattern of WT thus plays an important role in the design for population-based screening program in terms of the optimal resource allocation in the context of two-stage screening process.

One of the statistical models considered here for modelling WT is the Phase-type time distributions introduced by Neuts (1974) and modified by Fackrell (2009). This approach provides the strength to elucidate the underlying dynamic of hidden phases to account for progressive multi-phase transitions. One of a subclass that has been extensively utilized in the healthcare industry is the Coxian phase-type (CPH) distribution for the length of hospitalization. Marshall et al. applied the CPH distributions to classify the hospital length of stay of elderly patients into hidden phases on which the allocation of medical resources and costs can be based (Marshall et al. 2007). To further understand the process of hospitalization, Marshall et al. incorporated the Bayesian belief networks with CPH distributions to take into account the effect of patient information on hospital length of stay (Marshall and McClean 2003). Under this context, Gordon et al. further modelled multiple patient transitions between hospital and community by using the mixture of conditional CPH distributions for assessing the efficacy of providing alternative care in the community in preventing hospital readmission (Gordon et al. 2016).

It is therefore of great interest to model the WT for FIT-positive subjects waiting for the referral of confirmatory diagnosis with colonoscopy using the CPH model as indicated above. In addition, the use of the CPH model enables one to assess how the WT is affected by relevant postulated factors such as demographic features, type of institution, geographic areas, and prevalent screen or subsequent screen. Taking such information into account will shed light on the sources of heterogeneity of WT on the dynamic of transition between hidden phases.

On applying the CPH model for the WT of the two-stage process, we are faced with two statistical complications inherited from the context of population-based CRC screening program. First, the WT distribution is strongly determined by the arrival process among those who have the uptake of population-based screening for CRC. Second, it may also be affected by positive rate and the compliance with the referral to confirmatory diagnosis among those with positive results. To solve the first issue, a Poisson Queue process and its regression models are proposed to embed the arrival process of attending CRC screening into the CPH model. The second is related to the mixture type of the Queue process. The CPH model is a special case of hyper-exponential model. It is thus intuitive to consider the application of hypo-exponential model as the referral of participants with positive test will suffer from the non-compliance issue. From the methodological viewpoint, the CPH model has to be modified to accommodate both hyper-exponential and hypo-exponential models. To solve this issue, we proposed a Poisson hurdle regression model (Mullahy 1986) embedded into the CPH model.

Given these issues in the context of two-stage process for population-based screening program, we developed a unifying model to incorporate the three interlinked processes: the Poisson Queue process to model the arrival and the uptake of screening, the hurdle process to model non-compliance to confirmatory diagnosis, and the CPH process to model the most important outcome, the WT among compliers. We applied the proposed queue hurdle Coxian phase-type model (abbreviated as QH-CPH) to estimate the arrival rate, the rate of not complying with confirmatory diagnosis, and the WT and to assess how relevant factors affect three processes in a systematic framework. This novel approach not only takes the arrival rate of eligible screenee into account, but also determines the factors affecting non-compliance of confirmatory colonoscopy and further depicts the dynamic of hidden phases of WT according to relevant postulated factors.

The rest of article are organized as follows. Section 2 provides an overview of Taiwanese nationwide CRC screening program, which motivates our development of the QH-CPH model. An illustration on the application of the proposed QH-CPH model to Taiwanese population-based two-stage screening for CRC with FIT as the first line screening tool and colonoscopy as confirmatory diagnosis is also provided. Section 3 specifies the model description of Poisson Queue process, hurdle model, and CPH distribution, and unifies these three processes into the QH-CPH to fit into the complex two-stage screening data. Section 4 gives a series of simulations on the bias and mean square error (MSE) varying with the number of enrollees with the scenario similar to Taiwanese biennial CRC screening program with FIT test. Section 5 demonstrates the results of arrival rate, positive rate, compliance rate and WT based on the QH-CPH model on the application of Taiwanese nationwide CRC screening data. We further simulated the entire dynamic processes from the arrival rate of being served by FIT until the completion of colonoscopy with a continuous Queue process making allowance for non-compliance with the Poisson hurdle regression model and hidden states of WT using the CPH model. Section 6 discusses the novelty on the methodology and the application of the QH-CPH model, the limitation, and the conclusion about how useful it is for evaluation of the impacts of different demands for colonoscopy attributed to different pre-determined cutoffs of screening test (i.e. FIT) that determines positive rate and also compliance with and WT for colonoscopy.

2 Motivation: two-stage population-based CRC screening

The development of a systematic framework for modeling the WT was motivated by the data on Taiwanese nationwide CRC screening program between 2004 and 2009. Details on the implementation of the FIT-based CRC screening program have been described in our previous study (Chiu et al. 2015). In brief, the nationwide program provided biennial FIT as the first-line test to subjects aged 50 to 69 years and was served in 25 municipalities of Taiwan covering a total of 5,417,699 eligible residents. During the study period, 1,160,895 of the eligible population attended the program, giving a 21.4% coverage rate. Attendees with positive result determined by a FIT level higher than the cutoff value of 20 μg hemoglobin/g feces (Chen et al. 2007) were referred for confirmatory colonoscopic diagnosis in order to detect CRC at their early stage.

For the purpose of assessing the effect of cutoff level measured by the standard unit of μg hemoglobin/g feces (Chen et al. 2007), subjects screened by the stool-based test that cannot be transformed to the standard unit were excluded from our study. We thus enrolled 1,013,183 subjects yielding 1,232,145 FIT measurements during the study period for the following analysis. Given a total of 26,720,380 eligible FIT screens during the study period, the mean annual arrival rate was thus 4.61%. Among the FIT measurements, 45,305 of them were positive, giving a 3.7% positive rate. For those with positive FIT, 33,983 (75%) complied with the referral and received confirmatory colonoscopy as the second stage for CRC screening.

Based on the empirical data on two-stage screening process, the impact of cutoff value for the first-line test (i.e. FIT) on positive rate and the number of referrals can be evaluated. In the scenario using a low cutoff of 10 μg hemoglobin/g feces, the positive rate is increased to around 7% with 86,250 positive tests. On the contrary, a higher cutoff of 30 μg hemoglobin/g feces results in a 3% positive rate and 36,963 positive tests. It is apparent that the number of positive tests is influenced by the cutoff value and further affects the subsequent procedure of confirmatory diagnosis including compliance and WT given a limited clinical capacity. As mentioned above, the Taiwanese nationwide CRC screening program are faced with a high demand of over one million eligible participants, which in turn yields the high demand for the referral of positive FIT for confirmatory colonoscopy. It is thus of great interest to assess the influence of cutoff values of first-line test on WT through a systematic approach.

3 The QH-CPH model for CRC screening data

3.1 Model description for the components of QH-CPH model

For the purpose of modeling the arrival rate for eligible screenees, non-compliance with colonoscopy (non-complier), and the WT for undergoing colonoscopy (complier) with a systematic approach, we applied the Poisson distribution to model the arrival rate, the hurdle model to identify factors affecting non-compliance, and the CPH distribution to identify the underlying dynamic hidden phases of the WT among compliers. The three interlinked processes were then integrated to develop a novel QH-CPH model. The framework of the proposed QH-CPH model in the context of the two-stage screening process is depicted in Fig. 1 with the details on each of three process elaborated as follows.

Fig. 1
figure 1

Queue hurdle Coxian phase-type (QH-CPH) model. FIT( +) consists of positive fecal immunochemical tests (FIT) and negative FIT tests with family history

3.1.1 Poisson queue process

As depicted in Fig. 1, a population-based screening program starts from the attendance of eligible subjects. The Poisson process for the arrivals of the screening program can be expressed by

$$f\left( {Z_{i} ;\nu } \right) = \nu^{{Z_{i} }} \exp \left( { - \nu } \right)/Z_{i} !,\;\;\nu > 0,$$
(1)

where Zi = 1 for subjects participating in CRC screening and Zi = 0 for non-participant, i = 1, …, N (= 26,720,380) denoting the ith subject eligible for attending FIT tests during the study period. The Poisson rate parameter, v, corresponds to the mean arrival rate of screenees per person-year. As the number of subjects participating in the CRC screening program varied with season, sex and age, a Poisson regression model can be applied to incorporating these covariates and capturing the heterogeneity in the Poisson rate parameter, v, as follows.

$$\log \left( \nu \right) = {{\varvec{\upalpha}}}^{^{\prime}} {\mathbf{x}},$$
(2)

where x is the vector of covariates with the corresponding vector of regression coefficient α.

3.1.2 Hurdle model

Following the attendance of screening program, subjects with positive FIT result may or may not comply with referral for confirmatory diagnosis. To accommodate the non-complier of undergoing colonoscopy, a hurdle model was applied. The hurdle model consisted of two components. The hurdle part with a logistic regression model for the binary response captures the compliant behavior, and the non-hurdle part with a truncated Poisson regression model for the count greater than one models the response (WT) among the compliers. Covariates relevant with the compliant or the queue for the confirmatory colonoscopy including sex, age, geographic area, type of screening units, level of urbanization, and round of  screening were incorporated in the regression models.

The probability mass function (p.m.f) for the truncated Poisson process given the count greater than one is written by

$$P\left( {Y_{j} = y_{j} |Y_{j} > 0} \right) = \left( {\tau_{j} t_{j}^{w} } \right)^{{y_{j} }} \exp \left( { - \tau_{j} t_{j}^{w} } \right)/\left[ {y_{j} !\left\{ {1 - \exp \left( { - \tau_{j} t_{j}^{w} } \right)} \right\}} \right],$$
(3)

where Yj is the total number of subjects receiving colonoscopy among the screen-positive subjects in the jth (= 1328) category aggregated by covariates. Denote the referral status of the ith screen-positive subject in the jth category by a binary variable, Yij, which is 1 for those receiving confirmatory colonoscopy and 0 otherwise. The WT for the subject, \(T_{ij}^{w}\), is defined for the period from the date of positive FIT to the date of colonoscopy for Yij = 1. The total number of screenees with positive FIT receiving colonoscopy, Yj, is thus derived by \(\mathop \sum \nolimits_{i = 1}^{{N_{j} }} Y_{ij}\), where Nj is the total number of tests in the jth category of covariates with the total WT for colonoscopy, \(T_{j}^{w}\), derived by \(\mathop \sum \nolimits_{i = 1}^{{N_{j} }} T_{ij}^{w}\). Given the observed total WT in the jth category of covariates, tjw, and the mean rate of subjects with positive FIT receiving colonoscopy, τj, the expected number of receiving colonoscopy, ηj, is thus \(\tau_{j} t_{j}^{w}\), as represented in formula (3).

Let pj denote the probability of refusing to undergo colonoscopy (non-complier) for subjects with positive FIT for the jth category of covariate. For Yj = 0, the p.m.f of the hurdle model can be derived by \({\text{P}}\left( {Y_{j} = 0} \right) = p_{j}^{{N_{j} }}\). For the compliers (Yj > 0), the p.m.f. is written as \(P\left( {Y_{j} = y_{j} } \right) = \left( {1 - p_{j} } \right)^{{y_{j} }} \left\{ {1 - \exp \left( { - \eta_{j} } \right)} \right\}^{ - 1} \eta_{j}^{{y_{j} }} \exp \left( { - \eta_{j} } \right)/y_{j} !\).

To identify the effect of relevant covariates (xj) such as sex, age at screening, geographic areas, type of screening units, level of urbanization, and round of screening on the non-compliance behavior and the WT for colonoscopy, we applied a logistic regression model for pj as follows

$$p_{j} = {\text{exp}}\left( {{\mathbf{x}}_{j} {\varvec{\gamma}}} \right)/\left( {1 + {\text{exp}}\left( {{\mathbf{x}}_{j} {\varvec{\gamma}}} \right)} \right),$$
(4)

and a Poisson regression model for ηj as follows

$$\log \left( {\eta_{j} } \right) = \log \left( {t_{j}^{w} } \right) + {\varvec{x}}_{j} {\varvec{\beta}}.$$
(5)

The log-likelihood function for the proposed hurdle-Poisson model is thus written by

$$\begin{aligned} \ln L = \log \left\{ {\prod {\left( {p_{j} } \right)^{{N_{j} }} \left( {1 - p_{j} } \right)^{{y_{j} }} \left( {1 - \exp \left( { - \eta_{j} } \right)} \right)^{ - 1} \eta_{j}^{{y_{j} }} \exp \left( { - \eta_{j} } \right)/y_{j} !} } \right\} \\ = \log \left\{ {L_{1} \left( {{\varvec{\upgamma}}} \right)} \right\} + \log \left\{ {L_{2} \left( {{\varvec{\upbeta}}} \right)} \right\}, \\ \end{aligned}$$
(6)

which is substituted by formula (4) and (5).

3.1.3 CPH distribution

For the complier, the time from the date with positive FIT result to the date of confirmatory colonoscopy, WT, can be categorized into hidden phases by using a CPH distribution and treating the receiving of confirmatory colonoscopy as an absorbing state. Following the notation of Marshall et al. (2007), the CPH model describes the time to absorption for a finite Markov chain in continuous time over the phases {1, 2,…, k, k + 1}. This Markov chain of CPH has one absorbing phase (k + 1)th state, and k transient phases (1, …, k). The process of interest starts only from the first transient phase. Applying the CPH model to the WT data, the transient phases represent the hidden types of subjects with long or short WT for colonoscopy and the absorbing phase is the occurrence of compliance with colonoscopy for a subject. Let random variable T represent the WT to receive confirmatory diagnosis, which is a CPH process specified as follows. The infinitesimal generator for the CPH process is specified by a block-matrix form as follows

$${\varvec{R}} = \left( {\begin{array}{*{20}c} {\varvec{Q}} & {\varvec{q}} \\ 0 & 0 \\ \end{array} } \right).$$

The submatrix Q is defined by

$${\varvec{Q}} = \left[ {\begin{array}{*{20}c} { - \left( {\lambda_{1} + \mu_{1} } \right)} & {\lambda_{1} } & 0 & \cdots & 0 & 0 \\ 0 & { - \left( {\lambda_{2} + \mu_{2} } \right)} & {\lambda_{2} } & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & { - \mu_{k} } \\ \end{array} } \right],$$

and q is specified by

$${\varvec{q}} = \left( {\mu_{1} ,\mu_{2} , \ldots ,\mu_{k} } \right)^{T} .$$

The submatrix Q is a transition rates matrix restricted to the transient phases and q is a vector of transition rates from a series transient phases to the absorbing phase. The parameters depicting the CPH process contained in submatrix Q and vector q is defined as follows. The transition rate from the transient phase s to s + 1 is λs. The probability of transition between successive transient states is thus

$$P(X\left( {t + \Delta t} \right) = s + 1 |X\left( t \right) = s) = \lambda_{s} \Delta t + o\left( {\Delta t} \right)\;\;{\text{for}}\;\;s = 1, \ldots k - 1.$$
(7)

The parameter μs depicts the rate of absorbing to phase k + 1 from transient phase s. The probability of absorbing for the transient state s is written as

$$P(X\left( {t + \Delta t} \right) = k + 1 |X\left( t \right) = s) = \mu_{s} \Delta t + o\left( {\Delta t} \right)\;\;{\text{for}}\;\;s = 1, \ldots ,k.$$
(8)

On applying the CPH model to the waiting process of confirmatory colonoscopy for subjects with positive FIT, μs represents the referral rate for the sth type of hidden WT classification and λs is the rate of transition between the hidden types of waiting procedure such as short and long waiting.

To ensure that the absorption occurs in a finite time with probability one, every non-absorbing state is transient, so the matrix R can be divided into blocks and let the submatrix Q contain no transitions of the absorbing state. Due to the absorption occurs in a finite time with probability one, the process with transition intensity matrix Q is an honest Markov process. As the differential equation either by forward or backward Kolmogorov equations have the same unique solution to an honest Markov process (Cox and Miller 1965), the formal solution of transition probability matrix can be derived by

$${\mathbf{P}}\left( {\varvec{t}} \right) = {\text{exp}}\left( {{\varvec{R}}t} \right)$$
(9)
$$= \mathop \sum \limits_{n = 0}^{\infty } {\varvec{R}}^{n} \frac{{t^{n} }}{n!}.$$
(10)

For finite matrix R, that is when the number of states of the process is finite, the series (10) is convergent and (9) is the unique solution of both forward and backward equations. As a result, the probability density function (p.d.f) of CPH is the derivative of the transition probabilities specified by (9) as follows:

$$f_{c} \left( t \right) = \frac{d}{dt}P_{1k} \left( t \right),$$
(11)

where P1k(t) is the (1, k) element of transition probability matrix P(t). The p.d.f of (11) represents the probability of entering the absorbing from the 1st phase at time t. The WT in each phase and the marginal WT are thus derived as follows. Let Ts represent the length of WT in phase s, where Ts ~ exp(λs + μs), the moment-generating function (MGF) is given by \(M_{{T_{s} }} \left( \theta \right) = \left( {\lambda_{s} + \mu_{s} } \right)/\left( {\lambda_{s} + \mu_{s} - \theta } \right)\) with the expected value of E(Ts) = 1/(λs + μs). For the marginal mean of WT in the system, T, it can be calculated by the MGF as follows

$$M_{T} \left( \theta \right) = \mathop \int \limits_{ - \infty }^{\infty } \exp \left( {t\theta } \right)dP_{1k} \left( t \right)$$
$$E\left( T \right) = M_{T}^{^{\prime}} \left( 0 \right).$$

3.2 The QH-CPH model

With the integration of the three models specified above, the QH-CPH distribution for WT in the context of population-based screening program starting from attendance, positive FIT results, compliance for referral, and the WT for receiving colonoscopy can be expressed by a systematic approach as follows

$$f\left( {z,x,y,t;\nu ,\pi ,p,\lambda ,\mu } \right) = \exp \left( { - \nu } \right)\left\{ {\nu \left( {1 - \pi } \right)^{1 - x} \left[ {\pi p^{1 - y} \left\{ {\left( {1 - p} \right)f_{C} \left( t \right)} \right\}^{y} } \right]^{x} } \right\}^{z} ,$$
(12)

where v is the mean arrival rate of screenees per person-years, which comes from the Poisson queue model (Z = 1 is participant and Z = 0 is non-participant). The Poisson regression can be used to take covariates into account as in (2). Parameter π is the positive rate for FIT with X = 1 for positive result and X = 0 for negative result based on the pre-determined cutoff value of FIT. The probability of refusing to undergo colonoscopy is captured by p with Y = 0 for the non-complier of colonoscopy and Y = 1 for complier. The parameter p can be estimated by using the hurdle model. The logistic regression form was adopted to identify significant covariates by transforming regression coefficients estimated from the binary part of the hurdle regression model (see Supplementary Table 1) into a new continuous covariate (noncompliance score, score1). The continuous noncompliance score was then divided into a binary variable by using the median value of score1 (1.6711) associated with the noncompliance probability as follows

$${\text{logit}}\left( p \right) = \beta_{0} + \beta_{1} \times G_{1} ,$$

\(G_{1} = \left\{ {\begin{array}{*{20}c} {1,\;\;\;if\;\;score_{1} \ge - 1.6711 } \\ {0,\;\;\;if\;\;score_{1} < - 1.6711} \\ \end{array} } \right..\)

For the marginal WT of receiving colonoscopic exam, T, the p.d.f is fc(t) derived from the CPH distribution as elaborated by (11). The proportional hazards regression form was applied to transition/referral rates of the CPH model by using the regression coefficients estimated from the non-hurdle part as score2 (WT score) (Supplementary Table 1), which was also transformed into a binary variable by using the median value of score2 (3.4869) as follows

$$\lambda_{s} = \lambda_{0s} {\text{exp}}\left( {\gamma \times G_{2} } \right),$$
$$\mu_{s} = \mu_{0s} {\text{exp}}\left( {\gamma \times G_{2} } \right),$$

\(G_{2} = \left\{ {\begin{array}{*{20}c} {1,\;\;\;if\;\;score_{2} \ge - 3.4869} \\ {0,\;\;\;if\;\;score_{2} < - 3.4869} \\ \end{array} } \right..\).

Note that the continuous score1 (noncompliance score) and the continuous score2 (WT score) were calculated from the regression coefficients of hurdle (binary) part and non-hurdle part of the Poisson hurdle model, respectively, which are shown in Supplementary Table 1. The higher the score1, the more likely not to undergo colonoscopy. The higher the score2, the shorter WT to undergo colonoscopy. For example, a 67-year-old woman dwelling in non-metropolitan area of eastern Taiwan attended FIT screening for the first time at hospital, then the score1 according to all estimates of regression coefficients (see below) was −0.0762 (=-2.3532+0.0283+0.2309 +0.4809+0.1946+0.2103+1.1320). Once she underwent colonoscopy, the score2 was −4.0683 (= −4.2421−0.0560+0.2298).

3.3 Criteria for model selection

After constructing the likelihood function of the QH-CPH distribution in (12), the Bayesian information criterion (BIC), which is widely applied to likelihood-based model (Schwarz 1978), was used for model selection. In our analyses, we have to determine what is the optimal number of phases for the WT and what are significant covariates responsible for arrival rate, positive rate, and compliance rate. As parameters need to be estimated increase, the overfitting might occur. As a result, the penalty term using BIC was introduced to deal with this problem. After testing in each stage, the most appropriate model was determined with the minimum BIC score.

4 Simulation study

We set up a scenario similar to Taiwanese population-based two-stage screening for CRC between 2004 and 2009, which was a biennial screening program with 4.5% of arrival rate (per year), 3.7% of positive rate, 75% of compliance rate, and a 6-year follow-up period. Given this base scenario, a two-phase QH-CPH model (the optimal model identified by BIC and presented in the Results of the following Sect. 5) was used to assess the influence of the number of eligible subjects on the performance of estimated marginal WT given the preset scenario. For comparison, we assume that the manpower of colonoscopists would increase along with the expanded compliers and vice versa along with the reduced compliers so the clinical capacity could be affordable to have a confirmatory colonoscopy in 37 waiting days. The point estimate, bias, variance, and MSE of estimated marginal WT for the scenarios with a series of number of eligible subjects were derived from 100 repetitions.

Table 1 demonstrates that the estimated results on the marginal WT were close to the true value of 37 days derived from the two-phase QH-CPH model, which declined from 39.1 to 38.3 days as the number of eligible subjects increased. Both the biases and variances reduced in parallel with the increase in the number of eligible subjects. Large variation (Variance: 44.5) for the scenario with 4500 eligible subjects was noted and reduced greatly (Variance: 0.30) when the eligible subjects reached up to 450,000. When sample sizes increased from 4500 to 450,000, a tremendous decrease in the MSE from 48.9 to 1.72 was observed. Nonetheless, the bias of estimated marginal WT was small when the number of eligible subjects reached up to 10,000 (Bias: 1.6). For the scenarios with more than 80,000 eligible subjects, the MSE became stable. Accordingly, a minimal of 80,000 eligible subjects was required for the derivation of stable estimate on the marginal WT by using the two-phase QH-CPH model.

Table 1 Results of simulation study for the two-phase queue hurdle Coxian phase-type (QH-CPH) model based on 100 repetitions

4.1 Software used for estimation and simulation

All estimations were used by SAS 9.4 software with interactive matrix language (IML), which supports user-defined functions for non-linear optimization. We used the Newton–Raphson (NLPNRA) algorithm to derive maximum likelihood estimations (MLEs). In the part of simulation, the data was simulated by using SAS 9.4 software with the SAS DATA step.

5 Results

5.1 Empirical results on Taiwanese two-stage population-based screening for CRC

The empirical data on WT of Taiwanese CRC screening program show a positively skewed distribution with a long tail, representing an extremely long WT for confirmatory colonoscopy among some FIT-positive cases (Supplementary Fig. 1). The model selection criteria show that the two-phase QH-CPH model was the most feasible model with minimum BIC value. Compared with the one-phase QH-CPH model, the BIC for two-phase QH-CPH model was reduced by 1,324. Although the three-phase QH-CPH had a lower BIC value by 15 compared with the two-phase model, the estimated referral rate of the moderate waiting phase (μ2) was 0, suggesting the identifiability issue between μ2 and the transition rate from the short waiting phase to moderate waiting phase (λ1).

On the implementation of screening program, most screenees participated in the FIT screening program during the period between March and September and the rate was low from October to February. The two-phase QH-CPH model with the consideration of such a season effect was thus used to depict the empirical WT of Taiwanese CRC screening program. The two-phase QH-CPH model classified the states for receiving confirmatory colonoscopy into short waiting phase and long waiting phase (Table 2 and Supplementary Fig. 2) with the corresponding mean WT of 33 days and 152 days, respectively. The marginal mean WT was estimated as 37 days.

Table 2 Estimated results of two-phase queue hurdle Coxian phase-type (QH-CPH) models

In addition to the season effect, the arrival rate might vary with sex and age as well. These effects were thus incorporated into the two-phase QH-CPH model using a proportional hazards regression form. Table 3 shows the estimated results by incorporating these covariates into the proposed two-phase QH-CPH model. The comparison of BIC values between models suggest that the model including season, sex, and age (Model B in Table 3) was better than the model with only season as the covariate (BIC reduction by 93,137) and that with season and sex (Model A in Table 3, BIC reduction by 22,151). The estimated results based on Model B show that male and young subjects had a lower arrival rate despite a common positive rate (3.7%), non-compliance probability (26.3%), and marginal WT (37 days). It is very interesting to note that the referral rate was almost five times greater in the short waiting phase (0.0299) compared with the long waiting phase (0.0066). Around 15% subjects were in a dilemma referred to undergo colonoscopy and were trapped in the long waiting phase.

Table 3 The comparison of estimated results derived from a series of nested QH-CPH models with two-phases

Apart from the variation on arrival rates, compliance rates with colonoscopy and the median WT may vary with specific characteristics as well. For the Model C (shown in Table 3) taking into account these characteristics in the form of noncompliance score (score1) derived from the hurdle part of Poisson hurdle model based on Supplementary Table1 (as described in Sect. 3.2), there was a decrease in BIC value by 581 compared with the model considering only the heterogeneity in arrival rate (Model B). This suggests that the large variation of complying with colonoscopy was attributed to noncompliance score (score1). To further accommodate the variation of the WT captured by WT score (score2) calculating from non-hurdle part of the Poisson hurdle model based on Supplementary Table1 (as described in Sect. 3.2), we extended Model C to assess whether the WT score (score2) had more impact on the transition rate of undergoing colonoscopy with short waiting phase (μ1), or on the transition rate of undergoing colonoscopy with long waiting phase (μ2), or on the transition rate from short waiting phase to long waiting phase (λ1). The estimated results along with BIC values for a series of extended models are listed in Supplementary Table 2. The differences of BIC were 289 and 296 compared to both models with score2 related to λ1 and μ2, respectively. The results of model selection specified in Supplementary Table 2 show the model with score2 related to μ1 was the best fitted model (Model D in Table 3), representing the high heterogeneity was noted in the transition rate of undergoing colonoscopy with short waiting phase (μ1). Therefore, Table 3 shows that this final model (Model D) was much more interpretable than the previous one (Model C) due to the reduction of BIC by 287. Based on the estimated results of this best fitted model (Model D), the probability of not complying with colonoscopy was 21% in low score1 group and 31% in high score1 group. Among compliers, the mean WT in short waiting phase were 36 days and 29 days in low score2 group and high score2 group, respectively, while the mean WT in longer waiting phase was 151 days. The corresponding marginal mean WT were 40 days and 33 days for the low and high score2 group, respectively.

Based on the estimated results of Model D in Table 3, we predicted the transition probabilities between short and long waiting phase by different WTs among compliers (Supplementary Fig. 3). The probability of staying in short waiting phase (\({\text{P}}_{11}\)) declined over time. Subjects with lower score2 had longer WT in short waiting phase compared with those with higher score2 given the same probability of staying in short waiting phase (black and gray solid line, Supplementary Fig. 3). The transition probability from short waiting phase to long waiting phase (P12) was small with ignorable difference between these two groups (black and gray dotted line, Supplementary Fig. 3). The transition probability of undergoing colonoscopy (P13) increased over time, because attendees received colonoscopic exam eventually. Under the same transition probability to undergo colonoscopy, the low score2 group had longer WT than the high score2 group (black and gray dash line, Supplementary Fig. 3).

5.2 Projection of WT by different scenarios

Based on the estimated results, Table 4 shows how the WT was affected by three parameters, arrival rate, positive rate, and compliance rate by using a simulation study with the QH-CPH model under the context of Taiwanese CRC screening program. Here, arrival rate was re-parameterized annual coverage rate for a better understanding and application to screening policies. In Table 4, we first set a 3% positive rate and 70% compliance rate for a total of 26,720,380 eligible FIT tests equivalent to the clinical capacity during the period of 2004–2009 in Taiwan (third column, Table 4). For the annual coverage rate increased from 40 to 80%, the mean WT for colonoscopy prolonged from 40 to 80 days, respectively. Under the same scenario with the positive rate rose from 3 to 7% (fourth column, Table 4), the mean WT ranged from 94 to 188 days. The mean WT not only increased with coverage rate, but also lengthened with higher compliance rate.

Table 4 Predicted waiting time for colonoscopy among overall subjects, high WT group, and low WT group given the joint effect of coverage rate, compliance rate, and positive rate

Table 4 also shows the impact of coverage rate, compliance rate, and positive rate on the predicted WTs among subjects with high and low WT score. The larger the compliance rate, the more incremental WT in the low WT score group. For example, it would increase by 8 waiting days for the low WT score group (47 days) compared with the high WT score group (39 days) under 60% coverage rate, 3% positive rate, and 50% compliance rate. The corresponding figure increased to 15 days (85 days vs. 70 days for low and high WT score, respectively) with a 90% compliance rate.

5.3 The referral policy of WT for colonoscopy

Based on the sensitivity analysis depicted above, the results derived from QH-CPH model provide a quantitative assessment for the impact of various scenarios of the screening policy on WT given a predetermined clinical capacity. The Taiwanese national CRC screening policy has been revised since the year 2010 to increase the accessibility and coverage rate of the program. After then, the volume of screenees expanded rapidly with an increased (7.3%) positive rate. As a result, a decline in the compliance rate with the prolongation in WT was expected.

Given a tolerable WT of three months, the screenees were preferably recruited in a gradually rolling out manner with the consideration of the scale of eligible population in Taiwanese CRC screening program with a two-year program to reach an 80% coverage rate given a 7% positive rate and a 70% compliance rate. Based on the simulation study provided in Table 4, the desirable WT of three months can be achieved by altering the cutoff of screening test and also the compliance rate with confirmatory diagnosis. For example, if the cutoff value of FIT positive was shifted from 30 μg hemoglobin/g feces to 10 μg hemoglobin/g feces, the positive rate would increase from 3 to 7%, resulting in a prolonged overall WT by 2.35-fold (from 40 to 94 days) given 40% coverage rate and 70% compliance rate.

6 Discussion

The proposed QH-CPH model here succeeded in linking three stochastic processes together as a unified framework, including the arrival rate of attending screen, referral rate due to the positivity of screening test, and the compliance with colonoscopy in relation to WT for colonoscopy. Because those who had FIT positive would be referred to undertake colonoscopy examination within three months in our nationwide policy which has already considered the capacity of colonoscopy there must be other significant factors affecting the WT when he/she delayed the schedule for colonoscopy examination. That’s why we still need to take significant covariates into account that were incorporated into the transformed score after these compliers were classified into two group (short-wait and long-wait) by using the two-phases CPH model.

The merits of this study not only make contribution to developing a new statistical method for modeling the joint effects of the three crucial parameters inherited from the context of the screening program but also provide a series of projections of WT by different scenarios to facilitate the planning of population-based screening program. The further incorporation of relevant covariates including personal attributes and seasonal and geographical variations is to elucidate the heterogeneity of the proposed QH-CPH model. For the application of the proposed QH-CPH model to population-based screening program with the characteristics of queue and non-response to colonoscopy, the findings gave a clue to explore the causes dominating the differences with respect to provider factors such as the implementation of screening program and medical resources and population factors such as the knowledge and attitude toward CRC screening and medical interventions. The results of covariates included in the QH-CPH model also provide more insight on the facilitation for the referral of positive FIT. Because the costs in terms of administrative and medical resources are also required in each phase of screening activities, the results of the QH-CPH model can shed light on the allocation of these limited resources with efficiency.

One of the limitations of the proposed QH-CPH model is related to whether the homogeneous processes used in the arrival rate, the hurdle part, and the CPH process can account for the heterogeneity of empirical scenario. To reduce this concern, relevant covariates have been incorporated into each process by using proportional hazards regression forms. However, such a heterogeneity may be explained beyond these covariates. The introduction of random effect parameters to each process may provide a solution but the likelihood function would become computationally intractable. This becomes the subject of ongoing research.

In conclusion, we developed a new QH-CPH model to quantify the three articulated stochastic processes from the uptake of screening using the queue process, the compliance with the referral of positive results of screenees to have confirmatory diagnosis using the hurdle model, and the CPH model to identify hidden phases during the WT for undergoing colonoscopy for the referrals. Given the limited clinical resources, the development of this new model not only provides a new insight into the underlying process affecting the WT for early detection of CRC, but also helps clinicians or hospital managers improve the quality of service and provide health policy-makers for making the optimal decisions.