Methods
Data collection and utilization
The 2013 Nigeria Demographic and Health Survey (NDHS) dataset was used to fit the models. Data were collected using a multi-stage cluster sampling technique. Before the survey, Nigeria was demarcated into smaller units, known as enumeration areas (EAs), which served as clusters. The demarcation took state boundaries into account so that clusters were not merged across states. A specified number of clusters was allocated to rural and urban areas across the country, and respondents were selected from each cluster. The current study used the individual recode data, which contain information provided by women of childbearing age (15–49 years). Further information about the sampling strategy is available on the data originator's website [15].
Data management
The outcome variable of interest was fertility, measured as the number of children ever born (CEB), obtained from a total sample of 38,948 women. The data were weighted and the clustering effect was adjusted for in all count models; the skewness test and the descriptive summaries of CEB used unweighted data (Additional file 1). To examine the correlation between CEB and the background characteristics of women, a pairwise correlation test with Bonferroni correction [17] was conducted for each region. Twelve variables were used for the model fit: residence, women's educational level, religion, ethnicity, wealth index, contraceptive use, currently residing with partner, number of other wives, age at first sex, husband's educational level, women's working status and husband's/partner's age. All of these independent variables were retained for the North Central and North West. For the South East, South South and South West, residing with partner, number of other wives and partner's educational level were removed, and for the North East, women's working status was additionally excluded, owing to collinearity. All analyses were performed using Stata 15.0 at the 0.05 level of significance.
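The pairwise screening step can be sketched in Python as below. This is a minimal illustration, not the authors' Stata code: it assumes Pearson correlations and approximates each p-value with Fisher's z-transformation and a normal tail, which is reasonable for samples as large as the NDHS (an exact test of Pearson's r uses the t distribution with n − 2 degrees of freedom). The Bonferroni step multiplies each p-value by the number of pairwise tests and caps it at 1. The function names, variable names and toy data are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, via Fisher's z and a normal tail."""
    if abs(r) >= 1.0:
        return 0.0  # perfect correlation: atanh is undefined, p-value is 0
    z = math.atanh(r) * math.sqrt(n - 3)
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni_pairwise(columns):
    """Pairwise correlations with Bonferroni-adjusted p-values.

    columns maps a variable name to a list of values (all the same length).
    Returns {(name_a, name_b): (r, adjusted_p)} with adjusted_p = min(1, m * p),
    where m is the number of pairwise tests.
    """
    pairs = list(combinations(columns, 2))
    m = len(pairs)
    n = len(next(iter(columns.values())))
    result = {}
    for a, b in pairs:
        r = pearson_r(columns[a], columns[b])
        result[(a, b)] = (r, min(1.0, m * approx_p_value(r, n)))
    return result


# Hypothetical toy data for one region (not NDHS values)
data = {
    "ceb": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "edu_years": [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
    "contraceptive_use": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
}
for pair, (r, p_adj) in bonferroni_pairwise(data).items():
    print(pair, round(r, 3), round(p_adj, 3))
```

With three variables there are m = 3 tests, so each raw p-value is tripled before being compared with 0.05.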
Generalized linear models
Poisson model
Poisson regression is the most common technique for modelling count data. It assumes that the mean and variance are equal (equidispersion). Its probability mass function is given as:
$$\Pr(Y = y_i \mid \mu) = \frac{e^{-\mu}\mu^{y_i}}{y_i!}, \quad y_i = 0, 1, 2, \ldots$$
(1)
where \(y_i\) denotes the count response, that is, the number of children ever born, and \(\mu\) is its mean [18, 19].
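A minimal Python sketch of equation (1), evaluated on the log scale with `math.lgamma` to avoid overflow in \(y_i!\). The function names and the mean of 3.5 are illustrative assumptions. In a Poisson regression each woman's mean would typically be linked to her covariates as \(\mu_i = \exp(x_i'\beta)\); here \(\mu\) is simply supplied. The final lines check numerically that the mean and variance coincide.

```python
import math


def poisson_logpmf(y, mu):
    """Log of equation (1): -mu + y*log(mu) - log(y!)."""
    return -mu + y * math.log(mu) - math.lgamma(y + 1)


def poisson_pmf(y, mu):
    """Equation (1): Pr(Y = y | mu) = exp(-mu) * mu**y / y!."""
    return math.exp(poisson_logpmf(y, mu))


def poisson_loglik(ys, mus):
    """Log-likelihood of observed counts ys given fitted means mus (one per woman)."""
    return sum(poisson_logpmf(y, mu) for y, mu in zip(ys, mus))


# Equidispersion check for a hypothetical mean of 3.5 children
mu = 3.5
probs = [poisson_pmf(y, mu) for y in range(100)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(round(mean, 6), round(var, 6))
```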
Negative binomial model
The negative binomial (NB) distribution is a two-parameter distribution that arises as a mixture of the Poisson and Gamma distributions (Gamma–Poisson mixture). By relaxing the assumption that the mean equals the variance, it accounts for unobserved heterogeneity in count data [19, 20, 21, 22]. Its probability mass function is given as:
$$\Pr(y_i \mid \mu, \alpha) = \frac{\Gamma(\alpha^{-1} + y_i)}{\Gamma(\alpha^{-1})\,\Gamma(y_i + 1)} \left(\frac{\alpha^{-1}}{\alpha^{-1} + \mu}\right)^{\alpha^{-1}} \left(\frac{\mu}{\mu + \alpha^{-1}}\right)^{y_i}$$
(2)
The mean and variance of the negative binomial distribution are E[y | µ, α] = µ and V[y | µ, α] = µ(1 + αµ), where α > 0 is the dispersion parameter and µ > 0. The Poisson is the limiting case as α → 0, and the geometric distribution is the special case α = 1 [19].
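The sketch below codes equation (2) in Python with r = α⁻¹ and checks numerically that the variance equals µ(1 + αµ). The function names and the values µ = 3 and α = 0.5 are hypothetical; the infinite sum is truncated at 2000, where the remaining probability is negligible for these values.

```python
import math


def nb_pmf(y, mu, alpha):
    """Equation (2): negative binomial pmf with mean mu and dispersion alpha > 0."""
    r = 1.0 / alpha  # alpha^{-1} in equation (2)
    log_p = (math.lgamma(r + y) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu))
             + y * math.log(mu / (mu + r)))
    return math.exp(log_p)


def moments(pmf, upper=2000):
    """Mean and variance of a count pmf, summed numerically over 0..upper."""
    probs = [pmf(y) for y in range(upper + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


# Hypothetical values: mu = 3 children, alpha = 0.5
mean, var = moments(lambda y: nb_pmf(y, 3.0, 0.5))
print(mean, var, 3.0 * (1 + 0.5 * 3.0))  # variance should match mu * (1 + alpha * mu)
```

Setting `alpha = 1` gives (1/(1 + µ))(µ/(1 + µ))^y, the geometric distribution, and shrinking `alpha` toward 0 recovers Poisson probabilities.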
Zero-inflated models
The zero-inflated Poisson (ZIP) model assumes that the data arise from two processes: a Poisson process that generates counts, some of which may be zero (sampling zeros), and a binary process (logit or probit) that generates additional zeros (structural zeros) [23]. For a response \(y_{i}\), the ZIP probability mass function has two components:
$$\Pr(y_i \mid \mu_i) = \begin{cases} p_i + (1 - p_i)\exp(-\mu_i), & y_i = 0 \\ \dfrac{(1 - p_i)\exp(-\mu_i)\,\mu_i^{y_i}}{y_i!}, & y_i \ge 1 \end{cases}, \quad 0 \le p_i \le 1$$
(3)
The outcome variable \(y_{i}\) is a non-negative integer, \(\mu_{i}\) is the expected Poisson count for the ith individual and \(p_{i}\) is the probability of an extra (structural) zero.
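To make the two processes concrete, here is a small Python sketch of equation (3); the function name and the values µ = 2 and p = 0.3 are hypothetical. It also shows that zero inflation alone induces over-dispersion: the ZIP mean is (1 − p)µ, while its variance is (1 − p)µ(1 + pµ).

```python
import math


def zip_pmf(y, mu, p):
    """Equation (3): zero-inflated Poisson pmf.

    mu is the Poisson mean and p the probability of a structural (extra) zero.
    """
    poisson = math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))
    if y == 0:
        return p + (1 - p) * poisson  # structural zeros plus Poisson (sampling) zeros
    return (1 - p) * poisson


# Hypothetical values: mu = 2, p = 0.3
probs = [zip_pmf(y, 2.0, 0.3) for y in range(100)]
mean = sum(y * q for y, q in enumerate(probs))
var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
print(probs[0], math.exp(-2.0))  # Pr(0) under ZIP vs. plain Poisson
print(mean, var)  # the variance (1 - p) * mu * (1 + p * mu) exceeds the mean
```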
The zero-inflated negative binomial (ZINB) model extends the ZIP to account for both over-dispersion and excess zeros. For a dependent variable \(y_{i}\) with many zeros, the ZINB probability mass function is given as:
$$\Pr(y_i \mid \mu_i, \alpha) = \begin{cases} p_i + (1 - p_i)(1 + \alpha\mu_i)^{-\alpha^{-1}}, & y_i = 0 \\ (1 - p_i)\dfrac{\Gamma\left(y_i + \frac{1}{\alpha}\right)(\alpha\mu_i)^{y_i}}{y_i!\,\Gamma\left(\frac{1}{\alpha}\right)(1 + \alpha\mu_i)^{y_i + \frac{1}{\alpha}}}, & y_i \ge 1 \end{cases}, \quad 0 \le p_i \le 1$$
(4)
where α ≥ 0 is an over-dispersion parameter [22].
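A corresponding Python sketch of equation (4), again with a hypothetical function name and hypothetical values (µ = 2, α = 0.5, p = 0.2), evaluated on the log scale. With p = 0 it reduces to the negative binomial distribution, and its mean is (1 − p)µ.

```python
import math


def zinb_pmf(y, mu, alpha, p):
    """Equation (4): zero-inflated negative binomial pmf (alpha > 0, 0 <= p <= 1)."""
    inv_a = 1.0 / alpha
    if y == 0:
        # structural zeros plus negative binomial zeros (1 + alpha*mu)^(-1/alpha)
        return p + (1 - p) * (1 + alpha * mu) ** (-inv_a)
    log_nb = (math.lgamma(y + inv_a) - math.lgamma(y + 1) - math.lgamma(inv_a)
              + y * math.log(alpha * mu)
              - (y + inv_a) * math.log(1 + alpha * mu))
    return (1 - p) * math.exp(log_nb)


# Hypothetical values: mu = 2, alpha = 0.5, p = 0.2
probs = [zinb_pmf(y, 2.0, 0.5, 0.2) for y in range(500)]
mean = sum(y * q for y, q in enumerate(probs))
print(sum(probs), mean)  # total probability and the mean (1 - p) * mu
```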
Hurdle models
In the hurdle Poisson (HP) model, the first part is a hurdle at zero, which accommodates fewer or more zeros than the Poisson model assumes, and the second part governs the positive outcomes through a zero-truncated Poisson distribution [2, 19, 23]. For a response \(y_{i}\), the HP probability distribution is given as:
$$\Pr(y_i = 0) = 1 - p, \quad 0 \le p \le 1$$
$$\Pr(Y = y_i) = \frac{p}{1 - \exp(-\mu_i)} \cdot \frac{\exp(-\mu_i)\,\mu_i^{y_i}}{y_i!}, \quad \mu_i > 0;\ y_i = 1, 2, \ldots$$
(5)
where µ is the mean of the untruncated Poisson distribution. When \((1 - p) > \exp(-\mu)\), the data contain more zeros than the Poisson model implies; when \((1 - p) < \exp(-\mu)\), they contain fewer.
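The hurdle Poisson can be sketched in Python as follows (hypothetical function name and values). The positive counts follow a zero-truncated Poisson, so the Poisson probabilities are divided by Pr(Y > 0) = 1 − exp(−µ); `math.expm1` computes this accurately for small µ. The loop contrasts a case with more zeros than the Poisson implies (1 − p > exp(−µ)) with one that has fewer.

```python
import math


def hurdle_poisson_pmf(y, mu, p):
    """Hurdle Poisson pmf: Pr(0) = 1 - p; positive counts follow a zero-truncated Poisson."""
    if y == 0:
        return 1 - p
    log_pois = -mu + y * math.log(mu) - math.lgamma(y + 1)
    # Divide by Pr(Y > 0) = 1 - exp(-mu) so that the positive part sums to p
    return p * math.exp(log_pois) / (-math.expm1(-mu))


mu = 2.0
for p in (0.6, 0.95):  # hypothetical hurdle probabilities
    p0 = hurdle_poisson_pmf(0, mu, p)
    kind = "more" if p0 > math.exp(-mu) else "fewer"
    print(f"p={p}: Pr(0)={p0:.3f} vs Poisson {math.exp(-mu):.3f} -> {kind} zeros")
```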
The hurdle negative binomial (HNB) is used when the hurdle model is appropriate and the data exhibit over-dispersion [19, 24]. The HNB model is given as:
$$\Pr(y = 0) = 1 - p, \quad 0 \le p \le 1$$
$$\Pr(Y = y) = \frac{p}{1 - \left(\frac{r}{\mu + r}\right)^{r}} \cdot \frac{\Gamma(y + r)}{\Gamma(r)\,y!} \left(\frac{\mu}{\mu + r}\right)^{y} \left(\frac{r}{\mu + r}\right)^{r}, \quad r, \mu > 0;\ y = 1, 2, \ldots$$
(6)
Here µ and µ(1 + µ/r) are the mean and variance of the untruncated negative binomial distribution underlying the positive counts, with r corresponding to 1/α in equation (2); smaller values of r therefore indicate greater over-dispersion [22].
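A Python sketch of equation (6), with a hypothetical function name and hypothetical values, where r plays the role of α⁻¹. For µ = 3, r = 2 and p = 0.7, the mean of the hurdle distribution works out to pµ/(1 − (r/(µ + r))^r) = 2.1/0.84 = 2.5.

```python
import math


def hurdle_nb_pmf(y, mu, r, p):
    """Equation (6): hurdle negative binomial pmf with r = 1/alpha."""
    if y == 0:
        return 1 - p
    log_nb = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
              + y * math.log(mu / (mu + r))
              + r * math.log(r / (mu + r)))
    # Normalise by Pr(Y > 0) = 1 - (r / (mu + r))**r of the untruncated NB
    return p * math.exp(log_nb) / (1 - (r / (mu + r)) ** r)


# Hypothetical values: mu = 3, r = 2, p = 0.7
probs = [hurdle_nb_pmf(y, 3.0, 2.0, 0.7) for y in range(1000)]
mean = sum(y * q for y, q in enumerate(probs))
print(sum(probs), mean)
```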
Model assessment and evaluation
Model selection was based on the maximum likelihood estimates of the model parameters, using the log-likelihood and two information criteria (IC): the Akaike (AIC) and Bayesian (BIC) information criteria. A lower IC value indicates a better-fitting model [25, 26]. An IC difference greater than 10 indicates that the model with the smaller IC is superior, a difference of 4 to 10 suggests moderate superiority of one model over the other, and a difference of less than 4 indicates that the competing models are indistinguishable [26].
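The criteria follow the standard definitions AIC = −2 ln L + 2k and BIC = −2 ln L + k ln n, where ln L is the maximised log-likelihood, k the number of estimated parameters (including α for the negative binomial variants and the parameters of both parts of the zero-inflated and hurdle models) and n the sample size. The Python sketch below applies the difference thresholds from the text; the function names, log-likelihoods, parameter counts and sample size are hypothetical, not results from this study.

```python
import math


def aic(loglik, k):
    """Akaike information criterion: -2 lnL + 2k (k = number of estimated parameters)."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian information criterion: -2 lnL + k ln(n) (n = sample size)."""
    return -2.0 * loglik + k * math.log(n)


def compare_ic(ic_a, ic_b):
    """Interpret an IC difference using the thresholds given in the text."""
    diff = abs(ic_a - ic_b)
    if diff < 4:
        return "indistinguishable"
    better = "A" if ic_a < ic_b else "B"
    strength = "superior" if diff > 10 else "moderately superior"
    return f"model {better} {strength}"


# Hypothetical log-likelihoods (not results from this study)
n = 1000
ll_poisson, k_poisson = -2050.0, 13
ll_nb, k_nb = -2030.0, 14  # NB estimates one extra parameter, alpha
print(compare_ic(aic(ll_poisson, k_poisson), aic(ll_nb, k_nb)))
print(compare_ic(bic(ll_poisson, k_poisson, n), bic(ll_nb, k_nb, n)))
```

Because BIC penalises each parameter by ln n rather than 2, it favours the more parsimonious model more strongly than AIC once n exceeds about 7.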