Study area
The study was conducted in Ethiopia, a developing country whose economy is heavily dependent on agriculture. The total population of the country was projected to be 102.3 million in 2016, with 83% living in rural areas [19]. Contraceptive use is the main method of preventing teenage pregnancy, but only 5.2% of married and sexually active teenagers in Ethiopia use a modern method of contraception [10].
Data source and study population
We used the birth data set of the 2016 EDHS for this study. The data set was accessed from the Measure DHS website (http://www.measuredhs.com). The 2016 EDHS employed a two-stage stratified cluster sampling technique. In the first stage, enumeration areas were selected. An enumeration area is a geographic area consisting of a convenient number of dwelling units that served as a counting unit for the census. In the second stage, 28 households per enumeration area were selected using equal-probability systematic selection [23].
To capture community-level effects, 645 clusters (enumeration areas) were included in this study. A total of 2679 women aged 20 to 24 years who were interviewed about age at first birth at the time of the survey were included. This age group was selected because these women had been fully exposed to the risk of pregnancy before age 20; women older than 24 years were excluded because they are more likely to suffer from memory lapse and event omission than younger women.
Important variables were selected by referring to the DHS recode-6 manual and the questionnaire at the end of the 2016 EDHS report. Details of household selection, validation procedures and data quality assurance are available elsewhere [23].
Study variables
Outcome variable
Teenage pregnancy
Teenage pregnancy is a composite binary outcome variable that refers to a woman's experience of pregnancy before age 20 years. It was derived by combining age at first birth with the age of the woman at the time of the survey. It was coded as 0 = no pregnancy before age 20 (including those who had their first pregnancy at age 20 or later, or who had never been pregnant) and 1 = pregnancy experienced before the 20th birthday.
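The coding rule for the outcome can be sketched as follows. This is a minimal illustration rather than the authors' actual code: the function name is hypothetical, and it assumes that age at first birth is recorded in completed years and is missing (None) for women who have never given birth.

```python
from typing import Optional


def teenage_pregnancy(age_at_first_birth: Optional[int], current_age: int) -> int:
    """Return 1 if the first birth occurred before age 20, otherwise 0."""
    # The study population is restricted to women aged 20-24 at the survey.
    if not 20 <= current_age <= 24:
        raise ValueError("study population is restricted to women aged 20-24")
    # Assumption: None means the woman has never given birth.
    if age_at_first_birth is None:
        return 0
    if age_at_first_birth > current_age:
        raise ValueError("age at first birth cannot exceed current age")
    return 1 if age_at_first_birth < 20 else 0
```

Restricting the sample to women aged 20 to 24 means every respondent has already passed her 20th birthday, so the indicator is fully observed for all of them.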
Independent variables
In this study, variables at two levels (individual and community) were considered as independent variables.
Individual level variables
Age at first marriage
Age at first marriage is defined as the age at which the respondent began living with her first spouse/partner. It was coded into three categories: “married before age 15”, “married at age 15-17” and “not married before 18”. The last category includes those who married at age 18 or later and those who had never married.
Sexual experience
Sexual experience was encoded into three categories: “active before age 15”, “active at age 15-17” and “active at the age of 18 and above”.
Educational status of women
This variable has the categories “no education”, “primary”, “secondary” and “higher” in the 2016 EDHS. In the current study, because of the small number of cases in the higher category, it was merged with the secondary level, giving three categories: no education, primary, and secondary or above.
Employment status of women
In the EDHS, employment status was collected as “no job” or as one of a list of different jobs. For this study, the listed jobs were merged and a dichotomous variable was generated with the categories “employed” and “not employed”, regardless of the type of employment.
Media exposure
In the 2016 EDHS, exposure to media was measured for both women and their partners by whether they watched television (TV), listened to the radio and read a newspaper at least once a week. Newspaper reading was not included in the current study because, according to the 2016 EDHS, most women (95%) in Ethiopia have no exposure to print media [10]. Consequently, a new variable was generated by combining the other two media sources (TV and radio). The categories are “Both at least once a week”, “Either at least once a week” and “No access at least once a week”.
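The combination of the two media sources into one three-category variable can be illustrated with a short sketch. The function name and the short return codes are hypothetical; they stand for the "both", "either" and "no access" categories described above.

```python
def media_exposure(watches_tv_weekly: bool, listens_radio_weekly: bool) -> str:
    """Combine weekly TV and radio exposure into one three-level variable."""
    # Count how many of the two sources are used at least once a week.
    sources = int(watches_tv_weekly) + int(listens_radio_weekly)
    if sources == 2:
        return "both"    # both at least once a week
    if sources == 1:
        return "either"  # exactly one of the two at least once a week
    return "none"        # neither source at least once a week
```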
Wealth index
The 2016 EDHS categorized the wealth index into national-level wealth quintiles (from lowest to highest) [10]. This variable was derived from household assets to assess cumulative household wealth. In the data set, the categories of the wealth index were Poorest, Poorer, Middle, Richer and Richest. In this study, a new variable with the categories “Poor”, “Middle” and “Rich” was generated by merging poorest with poorer and richest with richer.
Religion
In the 2016 EDHS, religion has the subcategories Orthodox, Muslim, Protestant, Catholic, traditional followers and others. Because the first three were far more frequent than the other religions, they were coded separately. Given the small number of Catholic and traditional religion followers, these were merged into the others category.
Community level variables
Community-level variables were generated by aggregating the individual-level data to the cluster level, except for place of residence and geographical region, which were taken as they are. In the 2016 EDHS, place of residence was one of the characteristics used in designing the sample to give population and health indicators at the national level. Region is the geographical location in which the population lives and directly reflects community characteristics. It indicates the 11 administrative regional states of Ethiopia, namely Tigray, Afar, Amhara, Oromiya, Somali, Benishangul-Gumuz, Southern Nations, Nationalities and Peoples (SNNP), Gambela, Harari, Addis Ababa and Dire Dawa. All of them were coded separately.
Other community-level variables were obtained by aggregating the characteristics of individual women within clusters. Each was computed as the proportion of women in the cluster who fell into the subcategory of interest for a given variable. Since the aggregate values of the generated variables have no meaning at the individual level, they were categorized into groups based on the national median values. Median values were used to categorize clusters as high or low because none of the aggregated variables was normally distributed. The same procedure was applied to all aggregate variables. Community contraception use was generated from the proportion of women in the cluster who had ever used family planning. It shows the overall family planning utilization in the community. Community unmet need for family planning (supply) was created from the proportion of women with unmet need for family planning in each cluster. Community educational status was generated from the proportion of educated women in each cluster. It shows the overall female educational attainment in the community. Community poverty status was created from the proportion of poor women within each cluster. Community media exposure was produced from the individual responses on exposure to radio and TV.
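The aggregation procedure can be sketched in a few lines. This is an illustrative sketch only: the function name and input layout are hypothetical, and treating values strictly above the median as "high" is an assumption, since the text does not say how clusters exactly at the median were classified.

```python
from collections import defaultdict
from statistics import median


def community_level(records):
    """Classify clusters as 'high' or 'low' on an aggregated characteristic.

    records: iterable of (cluster_id, has_characteristic) pairs, one per woman.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for cluster_id, has_characteristic in records:
        totals[cluster_id] += 1
        positives[cluster_id] += int(has_characteristic)
    # Proportion of women with the characteristic in each cluster.
    proportions = {c: positives[c] / totals[c] for c in totals}
    # Median of the cluster proportions serves as the cut-off.
    cutoff = median(proportions.values())
    # Assumption: strictly above the median counts as "high".
    return {c: ("high" if p > cutoff else "low") for c, p in proportions.items()}
```

For example, three clusters with proportions 1.0, 0.5 and 0.0 have a median of 0.5, so only the first is classified as "high" under this rule.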
Statistical analysis
Multilevel regression analysis
In data with a nested structure like that of the EDHS, individual observations are correlated to some degree within a cluster because of the characteristics they share. When this correlation at the upper level is ignored and only the individual-level characteristics are considered, the assumption of independence between observations may be violated. This results in biased parameter estimates and generally leads to underestimated standard errors, spuriously significant results and, accordingly, incorrect conclusions about effect sizes [21]. In contrast, modeling group-to-group variation simultaneously with individual-to-individual variation has several advantages. It allows us to focus on the importance of both community and individual effects on individuals' health outcomes. By using the clustering information, it enables us to obtain statistically efficient estimates of the regression coefficients [21, 22]. Therefore, to estimate the mixed effects (fixed effects for both the individual- and community-level factors and a random effect for the between-cluster variation), a two-level mixed-effect logistic regression analysis was used in this study. The log odds of teenage pregnancy were modeled in the following form:
$$ \log \left[\frac{\uppi_{\mathrm{ij}}}{1-{\uppi}_{\mathrm{ij}}}\right]={\upbeta}_0+{\upbeta}_1{\mathrm{X}}_{\mathrm{ij}}+{\upbeta}_2{\mathrm{Z}}_{\mathrm{ij}}+{\mathrm{u}}_{\mathrm{j}} $$
where i and j index the level 1 (individual) and level 2 (community or cluster) units, respectively; X and Z refer to individual- and community-level variables, respectively; πij is the probability of pregnancy for the ith teenager in the jth community; the β's are the fixed coefficients, so that every one-unit increase in X or Z (a set of explanatory variables) is associated with a corresponding change in the log odds of teenage pregnancy; β0 is the intercept, the log odds of teenage pregnancy in the absence of the influence of the predictors; and uj is the random effect (the effect of the community on a teenager's decision to become pregnant) for the jth community.
The clustered nature of the data and the within- and between-community variation were taken into account by allowing each community to have its own intercept (β0 + uj), while the coefficients (β) were treated as fixed. The amount of community-level variation was expressed as the intra-class correlation coefficient (ICC), computed as \( \mathrm{ICC}=\frac{\updelta^2{\mathrm{u}}_0}{\updelta^2{\mathrm{u}}_0+\frac{\uppi^2}{3}} \), where \( \frac{\uppi^2}{3} \) (with π here the mathematical constant, not the probability πij) denotes the variation within a cluster (individual level) and \( \updelta^2{\mathrm{u}}_0 \) is the variation between clusters (communities).
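To make the link between the linear predictor and the probability concrete, the following sketch evaluates the model equation for a single individual. It is an illustration only: the function names are hypothetical, the model is reduced to one individual-level and one community-level covariate, and any coefficient values passed in are made up rather than estimates from this study.

```python
import math


def log_odds(beta0, beta1, x, beta2, z, u_j):
    """Linear predictor of the two-level random-intercept logistic model."""
    return beta0 + beta1 * x + beta2 * z + u_j


def probability(log_odds_value):
    """Inverse logit: convert log odds into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds_value))
```

Because the model is linear on the log-odds scale, a one-unit increase in x multiplies the odds by exp(beta1) regardless of the values of z and u_j, which is why the fixed effects are reported as odds ratios.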
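The ICC formula can be evaluated directly once the between-cluster variance is known. The sketch below is a minimal illustration with a hypothetical function name; the variance passed in would come from the fitted model, and no value from this study is assumed here.

```python
import math


def icc_logistic(between_cluster_variance: float) -> float:
    """ICC for a two-level logistic model with a random intercept."""
    if between_cluster_variance < 0:
        raise ValueError("a variance cannot be negative")
    # The individual-level variance of the standard logistic distribution
    # is fixed at pi^2 / 3 (about 3.29).
    individual_variance = math.pi ** 2 / 3
    return between_cluster_variance / (between_cluster_variance + individual_variance)
```

An ICC of 0 means that communities do not differ at all, while a between-cluster variance equal to π²/3 would attribute half of the total variation to differences between communities.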
We analyzed the data using STATA 12 (StataCorp, Texas, USA). Because the sampling procedure of the 2016 EDHS was complex, the selection procedure deviates from the assumption of simple random sampling. To account for this, the data were declared as complex survey data and the variables carrying the survey design information were specified before analysis. Moreover, sampling weights were applied when computing descriptive statistics such as frequencies and proportions, to adjust for the non-proportional allocation of the sample to strata (urban and rural dwellings) and regions during the survey process.
Various model diagnostics were performed for the final model. Because several factors might modify each other's effect on teenage pregnancy, interaction effects were checked, and no interaction between the covariates was found. Multicollinearity (correlation of predictors with each other) was checked using variance inflation factors (VIF); no variable had a VIF greater than 5, indicating the absence of significant collinearity among the explanatory variables. Akaike's Information Criterion (AIC) was used to choose the model that best explains the data, and the model with the lowest AIC value was selected. Goodness of fit was checked using the Hosmer-Lemeshow statistic, which was non-significant (prob > chi2 = 0.1270), indicating that the model fits the data reasonably well. The predictive ability of the model (model accuracy) was evaluated using the receiver operating characteristic (ROC) curve; the area under the curve was 85.03%, indicating that the model was good at correctly differentiating subjects who had experienced pregnancy from those who had not [24].
Statistical significance of associations was tested using the Wald statistic, and p-values less than 0.05 were considered statistically significant. The results for fixed effects were presented as adjusted odds ratios (AOR) with their 95% confidence intervals (95% CI).
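The effect of sampling weights on a descriptive proportion can be illustrated as follows. This is a sketch under stated assumptions, not the analysis code: the function name is hypothetical, and it assumes the usual DHS convention in which the stored sample weight must be divided by 1,000,000 before use. It covers only the point estimate; design-based standard errors also require the strata and cluster information.

```python
def weighted_proportion(outcomes, raw_weights, scale=1_000_000):
    """Weighted proportion of a 0/1 outcome.

    outcomes: sequence of 0/1 values, one per respondent.
    raw_weights: stored sample weights (assumed to be scaled by `scale`).
    """
    weights = [w / scale for w in raw_weights]
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(w * y for w, y in zip(weights, outcomes)) / total_weight
```

With equal weights the result reduces to the ordinary sample proportion; unequal weights shift it toward respondents who represent more of the population.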
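Two of the quantities used in these diagnostics are simple enough to compute by hand, as the following sketch shows. The function names are hypothetical, and the AUC is computed here by the pairwise-comparison definition (the share of positive-negative pairs in which the positive case receives the higher predicted score), which may differ in implementation from the routine used in the statistical package.

```python
def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike's Information Criterion; lower values indicate a better trade-off."""
    return 2 * n_parameters - 2 * log_likelihood


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation of the two groups.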
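The conversion from a fitted coefficient to an adjusted odds ratio, its confidence interval and a Wald p-value can be sketched as below. This is an illustrative sketch with a hypothetical function name; it assumes a normal approximation for the Wald statistic, and the coefficient and standard error would come from the fitted model.

```python
import math


def wald_summary(beta: float, se: float, z_crit: float = 1.959963984540054):
    """Odds ratio, 95% CI and two-sided Wald p-value for one coefficient."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = beta / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {
        "aor": math.exp(beta),
        "ci_low": math.exp(beta - z_crit * se),
        "ci_high": math.exp(beta + z_crit * se),
        "z": z,
        "p_value": p_value,
    }
```

A 95% confidence interval for the odds ratio that excludes 1 corresponds to a Wald p-value below 0.05, so the two reporting conventions agree.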