1 Background of the Study

Cancer is the name given to a collection of related diseases. Cancer can start almost anywhere in the human body, which is made up of trillions of cells. It is one of the leading causes of death in the world and represents a tremendous burden on patients, families and societies [1].

There were 12.7 million new cancer cases worldwide, of which 5.6 million occurred in developed countries and 7.1 million in developing countries. The corresponding estimates for total cancer deaths were 7.6 million: 2.8 million in developed countries and 4.8 million in developing countries. There were an estimated 4.9 million new cases of and 0.266 million global deaths from cervical cancer, the latter accounting for 7.5% of all female cancer deaths. Cervical cancer is one of the leading causes of death in the world and represents a tremendous burden on patients, families and societies. It is estimated that over one million women worldwide currently have cervical cancer; most of them have not been diagnosed or have no access to treatment that could cure them or prolong their lives [2, 3].

Survival data is a term used to describe data that measure the time to the occurrence of a given event of interest. In this study the outcome of interest is the survival time of cervical cancer patients from the day of diagnosis. One of the major aims of this analysis was to assess the survival of women with cervical cancer using various parametric frailty models. Kaplan and Meier made one important development in non-parametric methods [4]. Non-parametric methods work well for homogeneous samples, but they do not determine whether certain variables are related to the survival times [5]. The Cox PH model is restricted by the assumption that hazards are proportional with time-fixed covariates; this may not be appropriate in many situations, in which case modifications such as the stratified Cox model or the Cox model with time-dependent variables are required [6]. The study subjects (cervical cancer patients) came from clustered communities, and hence their survival data may be correlated at the regional level. In this study, shared frailty models were explored on the assumption that cervical cancer patients within the same cluster (region) share similar risk factors. The frailty term is common to all individuals in a cluster and is responsible for creating dependence between event times [7]. Although the Cox regression model is the most commonly employed technique in survival analysis, parametric models do have a number of benefits [8].

2 Methodology

2.1 Study Population and Setting

The population of this study was all cervical cancer patients registered at the Oncology Center of Tikur Anbessa Specialized Hospital from 2011–2015. All data were carefully reviewed from the registration log book and the patients’ registration cards; if any inadequate information was encountered, it was checked against the patient’s file, and the record was excluded from the analysis if the information proved to be inadequate. A total of 907 women with cervical cancer from all regions of Ethiopia were considered.

The response variable for this study is the survival time of women with cervical cancer, measured in months. Death of the patient was the event of interest. Patients who were lost to follow-up or who had not died by the end of the study were censored. The following variables were considered for their influence on the survival time of cervical cancer patients: age, smoking status, recurrence, stage of the disease, treatment taken, cycles of chemotherapy, number of sexual partners, number of children, aim of radiotherapy, history of abortion, HIV status, family history, age at marriage and age at first birth.

2.2 Hazard Function

The hazard function \(h(t)\) is the instantaneous potential per unit time for failing at time t, given that the individual has survived up to time t [9]:

$$\begin{aligned} h(t)=\frac{f(t)}{S(t)}=-\frac{d}{dt}\ln S(t) \end{aligned}$$
(1)

where f(t) and S(t) are the probability density function and the survival function at time t, respectively.
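As a quick numerical illustration of Eq. (1), the sketch below checks that \(f(t)/S(t)\) and \(-\frac{d}{dt}\ln S(t)\) agree for a Weibull distribution. The Weibull shape and scale values and the helper names are illustrative assumptions, not quantities estimated in this study.

```python
import math

# Illustrative Weibull parameters (assumed, not estimated from the study data)
SHAPE, SCALE = 1.5, 30.0

def survival(t):
    """S(t) = exp(-(t / scale)^shape)."""
    return math.exp(-((t / SCALE) ** SHAPE))

def density(t):
    """f(t) = -dS/dt for the Weibull distribution."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1) * survival(t)

def hazard_ratio_form(t):
    """h(t) = f(t) / S(t)."""
    return density(t) / survival(t)

def hazard_log_derivative_form(t, eps=1e-5):
    """h(t) = -d/dt ln S(t), approximated by a central difference."""
    return -(math.log(survival(t + eps)) - math.log(survival(t - eps))) / (2 * eps)

# Evaluate both forms at a few follow-up times (months)
for t in (6.0, 26.0, 60.0):
    print(t, hazard_ratio_form(t), hazard_log_derivative_form(t))
```

The two columns printed for each time should agree up to the error of the finite-difference approximation.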

2.3 Kaplan–Meier Product Method

The Kaplan–Meier estimator incorporates information from all of the available observations, both censored and uncensored, by treating survival to any point in time as a series of steps defined by the observed survival and censoring times [4]. The Kaplan–Meier estimate of the survival function at time t is given by:

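$$\begin{aligned} \hat{S}(t)=\prod _{j=1}^{k}\left( \frac{R(t_{(j)})-d_{(j)}}{R(t_{(j)})}\right) ,\quad \hbox {for }\; t_{(k)}\le t<t_{(k+1)},\quad k=1,2,\ldots ,r \end{aligned}$$
(2)

where \(t_{(1)}<\cdots <t_{(r)}\) are the ordered distinct event times, \(R(t_{(j)})\) is the number of patients at risk just before \(t_{(j)}\) and \(d_{(j)}\) is the number of deaths at \(t_{(j)}\).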
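The product in Eq. (2) can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the study; the function name `kaplan_meier` and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t_(j), S_hat(t_(j)))] at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if death was observed, 0 if the patient was censored
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv, curve = 1.0, []
    for tj in event_times:
        # R(t_(j)): patients still under observation just before t_(j)
        at_risk = sum(1 for t in times if t >= tj)
        # d_(j): deaths observed exactly at t_(j)
        deaths = sum(1 for t, e in zip(times, events) if t == tj and e == 1)
        surv *= (at_risk - deaths) / at_risk
        curve.append((tj, surv))
    return curve

# Hypothetical toy data in months; not taken from the study
toy_times = [6, 10, 10, 15, 20, 26, 30, 42, 60, 60]
toy_events = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0]
for tj, s in kaplan_meier(toy_times, toy_events):
    print(f"t = {tj:>2} months, S_hat = {s:.3f}")
```

Censored patients contribute to the risk set up to their censoring time but never to the death count, which is how the estimator uses both kinds of observation.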

2.4 Log-rank Test

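Whether there is a real difference in survival between groups can only be assessed, with any degree of confidence, by using statistical tests [10]. The weighted log-rank statistic comparing group 1 with the remaining patients is

$$\begin{aligned} Q=\frac{\left[ \sum \nolimits _{i=1}^{m} w_i \left( d_{1i} -\hat{e}_{1i} \right) \right] ^{2}}{\sum \nolimits _{i=1}^{m} w_i^2 \hat{V}_{1i} }\sim \chi _{k-1}^2 \end{aligned}$$
(3)

where the sums run over the m distinct event times, \(d_{1i}\) and \(\hat{e}_{1i}\) are the observed and expected numbers of deaths in group 1 at the ith event time, \(\hat{V}_{1i}\) is the estimated variance of \(d_{1i}\), \(w_i\) is a weight (\(w_i=1\) gives the log-rank test) and k is the number of groups compared, so that the two-group form shown has one degree of freedom.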
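A minimal sketch of Eq. (3) for two groups with \(w_i=1\) is given below. It assumes the usual hypergeometric expressions for the expected deaths and their variance; the function name and the toy data are hypothetical, and the p-value uses the one-degree-of-freedom chi-square tail.

```python
import math

def logrank_two_groups(times, events, groups):
    """Log-rank statistic Q (weights w_i = 1) comparing group 1 with group 0."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    obs_minus_exp, variance = 0.0, 0.0
    for ti in event_times:
        n = sum(1 for t in times if t >= ti)                         # at risk, both groups
        n1 = sum(1 for t, g in zip(times, groups) if t >= ti and g == 1)
        d = sum(1 for t, e in zip(times, events) if t == ti and e == 1)
        d1 = sum(1 for t, e, g in zip(times, events, groups)
                 if t == ti and e == 1 and g == 1)
        obs_minus_exp += d1 - n1 * d / n                             # d_1i - e_1i
        if n > 1:                                                    # V_1i (hypergeometric)
            variance += n1 * (n - n1) * d * (n - d) / (n ** 2 * (n - 1))
    q = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(q / 2.0))  # upper tail of chi-square with 1 df
    return q, p_value

# Hypothetical toy data (months); group 1 could be, e.g., an advanced-stage group
toy_times = [6, 10, 12, 15, 20, 26, 30, 42, 55, 60]
toy_events = [1, 1, 1, 1, 0, 1, 1, 0, 1, 0]
toy_groups = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(logrank_two_groups(toy_times, toy_events, toy_groups))
```

A large Q (small p-value) indicates that the observed deaths in group 1 differ from what would be expected if both groups shared one survival curve.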

2.5 Cox PH Regression Model

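Non-parametric methods do not control for covariates and require categorical predictors. The Cox proportional hazards (PH) regression model is a broadly applicable and the most widely used method of survival analysis [11]. Under this model, the estimated hazard ratio comparing two covariate profiles \(X\) and \(X^{*}\) is

$$\begin{aligned} \widehat{HR} =\frac{h_0 (t)\exp \big (\hat{\beta }'X\big )}{h_0 (t)\exp \big (\hat{\beta }' X^{*}\big )}=\exp \big (\hat{\beta }'(X-X^{*})\big ) \end{aligned}$$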
(4)

2.6 Parametric Frailty Models

The concept of frailty provides a suitable way to introduce random effects into the model to account for association and unobserved heterogeneity [12]. Models constructed in terms of group-level frailties are referred to as ‘shared’ frailty models because observations within a subgroup share unmeasured ‘risk factors’ that prompt them to exit earlier than other subgroups [13]. A random effect describes excess risk or frailty for distinct categories, such as individuals or families, over and above any measured covariates. Individuals in the same group (cluster) are recognized to be more similar than individuals in different clusters; frailty or random effect models therefore try to account for correlations within groups [14]. The inverse Gaussian distribution was introduced as a frailty distribution alternative to the gamma distribution [15].

2.7 The Gamma Frailty Distribution

The gamma distribution has been widely applied as a mixture distribution. From a computational and analytical point of view, it fits very well as a mixture distribution to failure data. The most common reason for using the gamma distribution is its mathematical convenience. This is due to the simplicity of the derivative of the Laplace transform, meaning that traditional maximum likelihood procedures can be used for parameter estimation [15]. The density of a gamma-distributed frailty \(Z\) with mean 1 and variance \(\theta \) is given by

$$\begin{aligned} f_Z (z) =\frac{z^{\frac{1}{\theta }-1} \exp \left( -z\big /\theta \right) }{\theta ^{\frac{1}{\theta }}\Gamma \left( \frac{1}{\theta }\right) },\quad \theta >0 \end{aligned}$$
(5)

2.8 The Inverse Gaussian Frailty Distribution

The inverse Gaussian (inverse normal) distribution was introduced as a frailty distribution alternative to the gamma distribution by Hougaard [16]. The probability density function of an inverse Gaussian distributed frailty with mean 1 and variance \(\theta >0\) is given by

$$\begin{aligned} f(z)=\frac{1}{\sqrt{2\pi \theta }}z^{-3/2}\exp \left( -\frac{\left( z-1 \right) ^{2}}{2\theta z} \right) \end{aligned}$$
(6)
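The two frailty densities can be checked numerically. The sketch below assumes the usual one-parameter parameterisation in which both frailty distributions have mean 1 and variance \(\theta \); the value \(\theta =0.5\), the integration grid and all function names are illustrative assumptions.

```python
import math

def gamma_frailty_pdf(z, theta):
    """Gamma density with mean 1 and variance theta (shape 1/theta, scale theta)."""
    k = 1.0 / theta
    return z ** (k - 1.0) * math.exp(-z / theta) / (theta ** k * math.gamma(k))

def inverse_gaussian_frailty_pdf(z, theta):
    """Inverse Gaussian density with mean 1 and variance theta."""
    return ((2.0 * math.pi * theta) ** -0.5 * z ** -1.5
            * math.exp(-(z - 1.0) ** 2 / (2.0 * theta * z)))

def moments(pdf, theta, upper=40.0, steps=200_000):
    """Midpoint-rule estimates of total mass, mean and variance on (0, upper)."""
    h = upper / steps
    mass = mean = second = 0.0
    for i in range(steps):
        z = (i + 0.5) * h
        w = pdf(z, theta) * h
        mass += w
        mean += z * w
        second += z * z * w
    return mass, mean, second - mean ** 2

THETA = 0.5  # illustrative frailty variance (an assumption, not a study estimate)
for name, pdf in [("gamma", gamma_frailty_pdf),
                  ("inverse Gaussian", inverse_gaussian_frailty_pdf)]:
    mass, mean, var = moments(pdf, THETA)
    print(f"{name}: mass = {mass:.4f}, mean = {mean:.4f}, variance = {var:.4f}")
```

Fixing the frailty mean at 1 is what makes the baseline hazard identifiable, so \(\theta \) alone measures the between-cluster heterogeneity.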

To investigate the effect of the candidate covariates on the survival time of women with cervical cancer, we first carried out univariable parametric frailty analyses by fitting a separate model for each candidate covariate. Covariates that were found to be significant in the univariable analysis were included in the multivariable analysis.

The multivariable parametric frailty models were fitted by assuming the Exponential, Weibull, Log-logistic and Log-normal distributions for the baseline hazard function. They included the twelve covariates that were significant in the univariable analysis, namely age, smoking history, cancer recurrence, stage, cycles of chemotherapy, sexual partner, family planning, family history, abortion history, HIV status, age at first marriage and age at first birth. Aim of radiotherapy, treatment taken and number of children were excluded because they were not significant in the univariable analysis.

3 Results

Of all 907 cervical cancer patients, 349 (38.48%) died. The overall median survival time was 26 months, while the minimum observed event time was 6 months and the maximum was 60 months (Table 3).

The AIC value of the Log-normal Inverse-Gaussian model, 1159.95, was the minimum among the AIC values of all the models, indicating that it was the most efficient model for describing the cervical cancer patients’ dataset (Table 1).
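The selection rule can be sketched as follows, assuming the common definition AIC = −2 log L + 2p, where p is the total number of estimated parameters. The model labels, log-likelihoods and parameter counts below are made up purely for illustration and are not the values behind Table 1.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 log L + 2 p."""
    return -2.0 * log_likelihood + 2.0 * n_params

# Hypothetical (log-likelihood, number of parameters) pairs; NOT study results
candidates = {
    "model A": (-600.0, 20),
    "model B": (-560.0, 22),
    "model C": (-575.0, 22),
}

scores = {name: aic(*fit) for name, fit in candidates.items()}
best = min(scores, key=scores.get)  # the smallest AIC is preferred
for name, value in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: AIC = {value:.2f}")
print("selected:", best)
```

Because the penalty term grows with p, a model with more parameters is preferred only if it improves the log-likelihood by more than one unit per extra parameter.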

Table 1 AIC values of the parametric frailty models

Analysis based on the Log-normal Inverse-Gaussian model showed that age, smoking history, stage, family history, abortion history, HIV status, age at first marriage and age at first birth were significant.

The output of the Log-normal Inverse-Gaussian model showed that being in the age groups 51–60, 61–70 and > 70, smoking cigarettes, having stage III or IV disease, having a family history of cervical cancer, having a previous abortion and living with HIV/AIDS significantly shortened the survival time of women with cervical cancer, whereas marrying after the age of 20 and not having given birth by the end of the study prolonged it.

The acceleration factors for patients in the age groups 51–60 and 61–70 were estimated to be 0.56 and 0.49, respectively, indicating that patients in these age groups had shorter survival times than patients in the 20–30 age group. Similarly, the acceleration factor for patients in the age group > 70 was estimated to be 0.67, implying that women aged 20–30 had longer survival times than women aged over 70. The acceleration factors for patients with stage III and IV disease were estimated to be 0.56 and 0.35, respectively. This implies that stage I patients had longer survival times than stage III and IV patients.

The acceleration factor for women who had a family history of cervical cancer was 0.63, which indicates that their survival time was shorter than that of patients without a family history. The acceleration factor for patients who had a history of abortion was estimated to be 0.62 with a very small P-value (\(P~=~0.00\)), which indicates that patients with no history of abortion survived longer than patients with a history of abortion. Patients not living with HIV/AIDS survived longer than patients living with HIV/AIDS (acceleration factor 0.69). The acceleration factor for women who married at the age of 21–25 was estimated to be 5.72 with a small P-value (\(P=0.01\)); this indicates that women who married at a later age had longer survival times than women who married at the age of \(\le \) 15 years. The acceleration factor for women who first gave birth at the age of 16–20 was estimated to be 1.22 with a small P-value (\(P=0.01\)); this indicates that women who had their first baby at the age of 16–20 had longer survival times than women who had their first baby at the age of \(\le \) 15 years. The acceleration factor for women who had not given birth by the end of the study was estimated to be 2.95 (\(P=0.002\)), indicating that these women had longer survival times than women who had their first baby at the age of \(\le \) 15 years (Table 2).
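To make the interpretation of these factors concrete: in an accelerated failure time model, an acceleration factor \(\phi =\exp (\beta )\) rescales the time axis, so every survival quantile in the comparison group is \(\phi \) times the corresponding quantile in the reference group (conditional on the same frailty). The sketch below applies three of the factors reported above to a hypothetical reference median of 30 months; that reference value and the function names are assumptions for illustration only.

```python
import math

def acceleration_factor_from_coef(beta):
    """phi = exp(beta) for a coefficient estimated on the log-time scale."""
    return math.exp(beta)

def scaled_survival_time(reference_time, acceleration_factor):
    """Time by which the comparison group reaches the same survival fraction."""
    return acceleration_factor * reference_time

REFERENCE_MEDIAN = 30.0  # hypothetical median (months) in each reference group

# Acceleration factors as reported in the text, each versus its own reference group
reported = [("stage III vs stage I", 0.56),
            ("stage IV vs stage I", 0.35),
            ("no birth by study end vs first birth at <= 15", 2.95)]
for label, phi in reported:
    median = scaled_survival_time(REFERENCE_MEDIAN, phi)
    print(f"{label}: phi = {phi}, implied median = {median:.1f} months")
```

A factor below 1 therefore compresses survival time and a factor above 1 stretches it, which is the reading used throughout this section.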

Table 2 Summary results for the final Log-normal Inverse-Gaussian frailty model

To check the adequacy of the baseline hazard, the following plots were used: for the exponential, \(-\log \hat{S}(t)\) against time; for the Weibull, \(\log (-\log \hat{S}(t))\) against the logarithm of time; for the log-logistic, \(\log \left( \frac{\hat{S}\left( t \right) }{1-\hat{S}\left( t \right) }\right) \) against the logarithm of time; and for the log-normal, \(\Phi ^{-1}\left[ 1-\hat{S}\left( t \right) \right] \) against the logarithm of time, where \(\Phi \) is the standard normal distribution function. If the plot is approximately linear, the given baseline distribution is appropriate for the dataset (Fig. 1). The plot for the log-normal was more linear than the other plots.

Fig. 1 Graphical evaluation of the Exponential, Weibull, Log-logistic and Log-normal assumptions
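The four diagnostic transformations can be compared numerically as well as by eye. The sketch below uses the exact survival curve of a log-normal distribution with assumed parameters in place of a Kaplan–Meier estimate, and measures linearity with the Pearson correlation; the parameter values, the grid of survival probabilities and the helper names are all illustrative assumptions.

```python
import math
from statistics import NormalDist

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Exact log-normal survival curve (assumed parameters); real use would plug in
# Kaplan-Meier estimates S_hat(t) at the observed event times instead.
MU, SIGMA = math.log(26.0), 0.8
std_normal = NormalDist()
surv = [i / 20 for i in range(1, 20)]                        # S(t) = 0.05, ..., 0.95
times = [math.exp(MU + SIGMA * std_normal.inv_cdf(1 - s)) for s in surv]
log_t = [math.log(t) for t in times]

checks = {
    "exponential": (times, [-math.log(s) for s in surv]),
    "Weibull": (log_t, [math.log(-math.log(s)) for s in surv]),
    "log-logistic": (log_t, [math.log(s / (1 - s)) for s in surv]),
    "log-normal": (log_t, [std_normal.inv_cdf(1 - s) for s in surv]),
}
linearity = {name: abs(pearson_r(x, y)) for name, (x, y) in checks.items()}
for name, r in linearity.items():
    print(f"{name}: |r| = {r:.6f}")
```

When the data really are log-normal, only the log-normal transformation is exactly linear; with real data the same comparison is made on estimated survival probabilities, so some scatter is expected.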

The Cox-Snell residuals are one way to investigate how well the model fits the data. The residuals of the fitted log-normal model, estimated by maximum likelihood, were plotted against their estimated cumulative hazard function (Fig. 2). If the model fits the data, the plot of the cumulative hazard function of the residuals against the Cox-Snell residuals should be approximately a straight line through the origin with slope 1. The plot for the log-normal baseline distribution follows such a line, suggesting that it is appropriate for the survival time data of the cervical cancer patients.

Fig. 2 Cox-Snell residual plot of the log-normal distribution for the survival time of patients
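The logic of this check can be sketched in code. The Cox-Snell residual of patient i is \(r_i=-\log \hat{S}(t_i\mid x_i)\); if the fitted model is correct, the residuals behave like a (censored) sample from the unit exponential distribution, so their Nelson–Aalen cumulative hazard plotted against the residuals has slope close to 1. The example below uses idealised residuals (exact unit-exponential quantiles with no censoring and no ties) rather than residuals from the fitted model; the function names are hypothetical.

```python
import math

def nelson_aalen(residuals, events):
    """Cumulative hazard of the residuals at each uncensored residual (no ties)."""
    order = sorted(range(len(residuals)), key=lambda i: residuals[i])
    n, cum, points = len(residuals), 0.0, []
    for rank, i in enumerate(order):
        if events[i] == 1:
            cum += 1.0 / (n - rank)   # one event among the (n - rank) still at risk
            points.append((residuals[i], cum))
    return points

def slope_through_origin(points):
    """Least-squares slope of cumulative hazard on residual, forced through 0."""
    return sum(r * h for r, h in points) / sum(r * r for r, _ in points)

# Idealised residuals: unit-exponential quantiles, all events observed (assumption)
N = 200
residuals = [-math.log(1 - (i - 0.5) / N) for i in range(1, N + 1)]
events = [1] * N

slope = slope_through_origin(nelson_aalen(residuals, events))
print(f"slope = {slope:.3f}")
```

A slope far from 1, or marked curvature in the plotted points, would suggest that the assumed baseline distribution does not describe the data.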

A quantile–quantile plot was made to check whether the accelerated failure time model provided an adequate fit to the data by comparing two different groups of patients. We checked the adequacy of the model by comparing the groups of patients that differed significantly, defined by family history, smoking status, HIV status and abortion history. The plots appear to be approximately linear for all four covariates (Fig. 3). Therefore, the accelerated failure time model appears to be appropriate for describing the survival time of the cervical cancer dataset.

Fig. 3 Quantile–quantile plot to check the adequacy of the accelerated failure time model

4 Discussion

The main purpose of this study was to assess the determinants of the survival time of cervical cancer patients using a dataset obtained from Tikur Anbessa Specialized Hospital. The candidate models were compared using the AIC, where the model with the minimum AIC is accepted as the best [10]. Accordingly, the Log-normal Inverse-Gaussian model was selected to describe the cervical cancer dataset. This finding is in accordance with other studies [17,18,19,20,21,22] with regard to the Log-normal Inverse-Gaussian model.

The number of cervical cancer cases was not related to the population size of the regions of Ethiopia during the study period. The number of cases from the other regions of the country depended on their distance from Addis Ababa: the farther the region, the fewer the cases, because patients may not have had the opportunity to travel and be diagnosed. Patients who live relatively near the hospital, such as those in Addis Ababa, Oromiya and Amhara, are able to go to the hospital and be diagnosed, whereas those who live far from the hospital cannot reach it easily for many reasons, such as transportation cost, accommodation, and job- and family-related issues. Similar findings were reported in [23, 24].

Older age was significantly associated with a shorter time to death from cervical cancer. This implies that survival time was longest in the youngest women and decreased with increasing age. This finding is consistent with other studies [25,26,27,28], which reported that increasing age is related to poorer survival of cervical cancer patients.

The results of this study suggest that smoking was a significant predictor of the survival time of the patients. Non-smokers had longer survival times than smokers. This finding is consistent with other studies [29,30,31].

The stage of the disease also had a significant association with the survival time of patients. Stage I and stage II patients had longer survival times than stage III and IV patients. This is because early-stage cervical cancer can more readily be cured by the available cancer treatments. This result is consistent with [29, 30].

Family history of cervical cancer is another factor that significantly predicts the survival of the patients. In this study, family history was significantly associated with the survival time of cervical cancer patients. This is consistent with another study [32], which reported that women who have a mother or sister diagnosed with cervical cancer have a greater risk of developing this cancer than women without any family history of cervical cancer.

History of abortion also had a significant association with the survival time of cervical cancer patients. Women who had no history of abortion had longer survival times than those who had a history of abortion. This result is consistent with [33], which reported that previous abortion plays a role in cervical cancer. In addition, a Danish study by Ibrahim et al. [34] found a significantly positive trend between having had an abortion and death from cervical cancer.

The results of this study indicated that HIV/AIDS status was a significant predictor of the survival time of cervical cancer patients at Tikur Anbessa Specialized Hospital (TASH). Women living with HIV/AIDS had shorter survival times than women who were free from HIV. This may be because HIV infection lowers CD4 counts, making these patients more likely to die. Somers [35] and Abel et al. [36] found a significantly positive trend between HIV infection and death from cervical cancer.

The results of the study revealed that having a first baby at an early age (\(\le 15\) years) has a significant association with the survival time of cervical cancer patients. Women who had their first child at an early age had shorter survival times. A similar finding was reported in [37], where having a first baby before the age of 16 carried a higher risk than having a first baby after the age of 25. This result is also in accordance with other studies [38, 39].

5 Conclusions

This study used the survival time dataset of cervical cancer patients who started their cancer treatment from 2003–2007, with the aim of assessing the determinants of time to death of women with cervical cancer at Tikur Anbessa Specialized Hospital. Of the total of 907 women who started cancer treatment, about 38.48% had died by the end of the study. The estimated median survival time of the women was 26 months.

To model the determinants of the survival time of cervical cancer patients, the Cox PH model was used first. Parametric frailty models were then fitted because the proportional hazards assumption of the Cox model was violated. Different parametric frailty models with different baseline distributions were applied. Based on the AIC, the Log-normal Inverse-Gaussian model fitted the survival time dataset of cervical cancer patients better than the other parametric frailty models.

The results of the Log-normal Inverse-Gaussian model showed that age, smoking status, stage, family history, abortion history, HIV status, age at first marriage and age at first birth were significant predictors of the survival time of patients at Tikur Anbessa Specialized Hospital. Of these, marrying after the age of 20 and not having given birth by the end of the study prolonged the time to death of cervical cancer patients. In contrast, the age classes 51–60, 61–70 and > 70, smoking cigarettes, stage III and IV disease, family history of cervical cancer, abortion history, living with HIV and having a first baby at an early age (\(\le \) 15 years) significantly shortened the time to death of women with cervical cancer.