1 Introduction

Malaria is a life-threatening disease caused by parasites that are transmitted to people through the bites of infected female Anopheles mosquitoes. About 3.2 billion people, almost half of the world’s population, are at risk of malaria [1]. According to the 2015 WHO report, there were 214 million new cases of malaria worldwide in 2015 (range 149–303 million). The African Region accounted for most global cases of malaria (88%), followed by the South-East Asia Region (10%) and the Eastern Mediterranean Region (2%) [2].

It has been estimated that approximately 68% of the Ethiopian population live in areas below 2000 m of altitude and are therefore considered to be at risk of malaria. This makes malaria the number one health problem in Ethiopia, with an average of 5 million cases per year. Plasmodium falciparum and Plasmodium vivax are the dominant malaria parasites in Ethiopia [3, 4]. The disease causes 70,000 deaths each year and accounts for 17% of outpatient visits to health institutions [5].

Anemia, defined as a low blood hemoglobin concentration, is a public health problem that affects low, middle and high income countries. It has significant adverse health consequences as well as adverse impacts on social and economic development. Anemia may result from a number of causes, the most significant contributor being iron deficiency. Approximately 50% of cases of anemia are considered to be due to iron deficiency, but the proportion probably varies among population groups and areas according to local conditions. Other causes of anemia include acute and chronic conditions such as malaria, cancer, tuberculosis and HIV [6].

The malaria parasites enter the blood after an infective mosquito bite and infect red blood cells. At the end of the infection cycle, the infected red blood cells rupture. This process lowers the number of red blood cells and can, at a severe stage, cause severe anemia [1]. On the African continent, where Plasmodium falciparum is the most prevalent human malaria parasite, anemia is responsible for about half of the malaria-related deaths [7]. Knowing the relation between malaria and anemia could therefore contribute greatly to the development of prevention strategies. Although several studies have examined the prevalence of anemia and malaria and the association between them, little has been done to model the two jointly and to show the relationship between them. This study therefore jointly models the prevalence of malaria and anemia using a bivariate probit model.

2 Methodology

2.1 Data

The study was conducted on patients visiting Alaba Health Center, Alaba District, Southern Ethiopia, for medical examination from November to December 2016. The data was obtained from 384 study participants.

2.2 Bivariate Probit Model (BPM)

The bivariate probit model is a joint model for two binary dependent variables whose disturbances are assumed to be correlated. It generalizes the index function model from one latent variable to two latent variables that may be correlated.

Let \(Y_1^*\) and \(Y_2^*\) be two latent variables. A latent variable is a variable that is incompletely observed. Latent variables can be introduced into binary outcome models as an index of an unobserved propensity for the event of interest to occur [8].

Define the unobserved latent variables

$$\begin{aligned}&Y_1^*=X_1 \beta _1 +\varepsilon _1 \end{aligned}$$
(1)
$$\begin{aligned}&Y_2^*=X_2 \beta _2 +\varepsilon _2 \end{aligned}$$
(2)

where \(\varepsilon _1 \) and \(\varepsilon _2 \) are jointly normally distributed with means zero, variances one, and correlation \(\rho \):

$$\begin{aligned} \left\{ {{\begin{array}{l} {\varepsilon _1 } \\ {\varepsilon _2 } \\ \end{array} }{|}X} \right\} \sim N\left( {\left[ {{\begin{array}{l} 0 \\ 0 \\ \end{array} }} \right] ,\left[ {{\begin{array}{ll} 1&{} \rho \\ \rho &{} 1 \\ \end{array} }} \right] } \right) \end{aligned}$$
(3)

Then the bivariate probit model specifies the observed outcomes to be

$$\begin{aligned} Y_1 =\left\{ {{\begin{array}{ll} {1,}&{} {\hbox {if}\,\,Y_1^*>0} \\ {0,}&{} {\hbox {otherwise}} \\ \end{array} }} \right. \end{aligned}$$
(4)
$$\begin{aligned} Y_2 =\left\{ {{\begin{array}{ll} {1,}&{} {\hbox {if}\,\,Y_2^*>0} \\ {0,}&{} {\hbox {otherwise}} \\ \end{array} }} \right. \end{aligned}$$
(5)
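The data-generating process in Eqs. (1)–(5) can be sketched in a few lines of Python. This is only an illustration under assumed values: the coefficients, the correlation, the sample size and the single covariate are hypothetical and are not taken from the study data.

```python
import numpy as np

def simulate_bivariate_probit(X1, X2, beta1, beta2, rho, rng):
    """Draw (Y1, Y2) from the latent-variable model of Eqs. (1)-(5)."""
    n = X1.shape[0]
    cov = np.array([[1.0, rho], [rho, 1.0]])           # error covariance, Eq. (3)
    eps = rng.multivariate_normal(np.zeros(2), cov, size=n)
    y1_star = X1 @ beta1 + eps[:, 0]                    # latent index, Eq. (1)
    y2_star = X2 @ beta2 + eps[:, 1]                    # latent index, Eq. (2)
    # Observed outcomes are indicators of a positive latent index, Eqs. (4)-(5)
    return (y1_star > 0).astype(int), (y2_star > 0).astype(int)

rng = np.random.default_rng(0)
n = 5000                                                # hypothetical sample size
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
beta1 = np.array([-0.5, 0.8])                           # hypothetical coefficients
beta2 = np.array([0.2, 0.4])
y1, y2 = simulate_bivariate_probit(X, X, beta1, beta2, rho=0.5, rng=rng)
sample_corr = np.corrcoef(y1, y2)[0, 1]                 # correlation of the binary outcomes
```

Only the binary outcomes `y1` and `y2` would be observed in practice; the latent indices and the errors are not.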

Under the bivariate probit model, the joint probability that both outcomes occur can be written as

$$\begin{aligned} P\left( {Y_1 =1,Y_2 =1|X_1 ,X_2 } \right) ={\Phi }_2 \left( {X_1^{\prime } \beta _1 ,X_2^{\prime } \beta _2 ,\rho } \right) \end{aligned}$$
(6)

2.2.1 Estimation of Parameter Coefficients

The coefficients (\(\beta _1 , \beta _2\,\hbox {and}\,\rho \)) can be estimated by maximum likelihood.

First, define \(q_1 =2y_1 -1\) and \(q_2 =2y_2 -1\). Thus \(q_i =1\) if \(y_i =1\) and \(q_i =-1\) if \(y_i =0\), for \(i=1, 2\).

The probabilities that enter the likelihood function are

$$\begin{aligned} P\left( {Y_1 =y_1 ,Y_2 =y_2 |X_1 ,X_2 } \right) ={\Phi }_2 \left( {q_1 X_1^{\prime } \beta _1 ,q_2 X_2^{\prime } \beta _2 ,q_1 q_2 \rho } \right) \end{aligned}$$
(7)
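As a numerical check on Eq. (7), the four cell probabilities can be evaluated with the bivariate normal CDF in SciPy and verified to sum to one. The index values `w1`, `w2` (standing for \(X_1^{\prime } \beta _1\) and \(X_2^{\prime } \beta _2\)) and the value of `rho` below are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

def cell_probability(y1, y2, w1, w2, rho):
    """P(Y1=y1, Y2=y2) from Eq. (7), with w1 = X1'beta1 and w2 = X2'beta2."""
    q1, q2 = 2 * y1 - 1, 2 * y2 - 1          # sign variables: +1 if y=1, -1 if y=0
    r = q1 * q2 * rho                         # sign-adjusted correlation
    cov = np.array([[1.0, r], [r, 1.0]])
    return multivariate_normal.cdf([q1 * w1, q2 * w2], mean=[0.0, 0.0], cov=cov)

w1, w2, rho = 0.3, -0.2, 0.5                  # hypothetical index values and correlation
probs = {(a, b): cell_probability(a, b, w1, w2, rho) for a in (0, 1) for b in (0, 1)}
total = sum(probs.values())                   # should be 1 up to numerical error
marginal_y1 = probs[(1, 1)] + probs[(1, 0)]   # should equal the univariate probit P(Y1=1)
univariate_y1 = norm.cdf(w1)
```

Summing over the second outcome recovers the univariate probit probability \(\Phi (X_1^{\prime } \beta _1 )\), which is why the univariate probit is the special case of a single equation.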

The likelihood function is given as

$$\begin{aligned}&L=\prod P\left( {Y_1 =y_1 ,Y_2 =y_2 } \right) =\prod {\Phi }_2 \left( {q_1 X_1^{\prime } \beta _1 ,q_2 X_2^{\prime } \beta _2 ,q_1 q_2 \rho } \right) \end{aligned}$$
(8)
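$$\begin{aligned}&L=\prod \left\{ P\left( {Y_1 =1,Y_2 =1} \right) ^{y_1 y_2 }P\left( {Y_1 =1,Y_2 =0} \right) ^{y_1 (1-y_2 )}\right. \nonumber \\&\quad \left. \times \,P\left( {Y_1 =0,Y_2 =1} \right) ^{(1-y_1 )y_2 }P\left( {Y_1 =0,Y_2 =0} \right) ^{(1-y_1 )(1-y_2 )}\right\} \end{aligned}$$
(9)

where the exponents select, for each observation, the probability of the cell that was actually observed.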

Consider the latent variables and the observed outcomes in Eqs. (1), (2), (4) and (5). From these equations we obtain the following relations.

$$\begin{aligned}&Y_1 =1 \,\, \hbox {if} \,\, \varepsilon _1> -X_1 \beta _1 \,\, \hbox {and} \,\, Y_1 =0 \,\, \hbox {if} \,\, \varepsilon _1 \le -X_1 \beta _1 \end{aligned}$$
(10)
$$\begin{aligned}&Y_2 =1 \,\, \hbox {if} \,\, \varepsilon _2> -X_2 \beta _2 \,\, \hbox {and} \,\, Y_2 =0 \,\, \hbox {if} \,\, \varepsilon _2 \le -X_2 \beta _2 \end{aligned}$$
(11)

Since each observation contributes the probability of its observed cell, the likelihood can be written as

$$\begin{aligned} L=\prod \left\{ {{\begin{array}{l} {P\left( {\varepsilon _1> -X_1 \beta _1 ,\varepsilon _2> -X_2\beta _2 } \right) ^{y_1 y_2 }\,P\left( {\varepsilon _1<-X_1 \beta _1 ,\varepsilon _2> -X_2 \beta _2 } \right) ^{(1-y_1 )y_2 }} \\ {\;\;\times \,P\left( {\varepsilon _1 >-X_1 \beta _1 , \,\varepsilon _2<-X_2 \beta _2 } \right) ^{y_1 (1-y_2 )}\,P\left( {\varepsilon _1<-X_1 \beta _1 ,\,\varepsilon _2 <-X_2 \beta _2 } \right) ^{(1-y_1 )(1-y_2 )}} \\ \end{array} }} \right\} \end{aligned}$$
(12)

Then the log likelihood is

$$\begin{aligned}&\ln L=\sum \ln P\left( {Y_1 =y_1 ,Y_2 =y_2 } \right) =\sum \ln {\Phi }_2 \left( {q_1 X_1^{\prime } \beta _1 ,q_2 X_2^{\prime } \beta _2 ,q_1 q_2 \rho } \right) \end{aligned}$$
(13)
$$\begin{aligned}&\ln L=\sum \left\{ {{\begin{array}{l} {y_1 y_2 \ln P\left( {\varepsilon _1> -X_1 \beta _1 ,\varepsilon _2> -X_2 \beta _2 } \right) +(1-y_1 )y_2 \ln P\left( {\varepsilon _1<-X_1 \beta _1 ,\varepsilon _2>-X_2 \beta _2 } \right) } \\ {\;\;+\,y_1 (1-y_2 )\ln P\left( {\varepsilon _1 >-X_1 \beta _1 , \,\varepsilon _2<-X_2 \beta _2 } \right) +(1-y_1 )(1-y_2 )\ln P\left( {\varepsilon _1<-X_1 \beta _1 , \,\varepsilon _2 <-X_2 \beta _2 } \right) } \\ \end{array} }} \right\} \end{aligned}$$
(14)
$$\begin{aligned}&\ln L=\sum \left\{ {{\begin{array}{l} {y_1 y_2 \ln {\Phi }_2 \left( {X_1^{\prime } \beta _1 ,X_2^{\prime } \beta _2 ,\rho } \right) +(1-y_1 )y_2 \ln {\Phi }_2 \left( {-X_1^{\prime } \beta _1 ,X_2^{\prime } \beta _2 ,-\rho } \right) } \\ {\;\;+\,y_1 (1-y_2 )\ln {\Phi }_2 \left( {X_1^{\prime } \beta _1 ,-X_2^{\prime } \beta _2 ,-\rho } \right) +(1-y_1 )(1-y_2 )\ln {\Phi }_2 \left( {-X_1^{\prime } \beta _1 ,-X_2^{\prime } \beta _2 ,\rho } \right) } \\ \end{array} }} \right\} \end{aligned}$$
(15)

So we can get the estimator for the coefficients by maximizing the log likelihood function.
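To make the maximization step concrete, the following is a minimal sketch of maximum likelihood estimation for an intercept-only bivariate probit on simulated data. It is not the estimation routine used in the study: the data, true values, optimizer (Nelder-Mead) and the tanh reparameterization of \(\rho \) are all assumptions made for illustration. With only intercepts, the log-likelihood depends on the data through the four cell counts.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal, norm

def phi2(a, b, r):
    """Bivariate standard normal CDF with correlation r."""
    return multivariate_normal.cdf([a, b], mean=[0.0, 0.0], cov=[[1.0, r], [r, 1.0]])

def neg_mean_loglik(theta, counts):
    """Negative mean log-likelihood for an intercept-only bivariate probit."""
    b1, b2, rho = theta[0], theta[1], np.tanh(theta[2])   # tanh keeps rho in (-1, 1)
    n = sum(counts.values())
    ll = 0.0
    for (y1, y2), c in counts.items():
        q1, q2 = 2 * y1 - 1, 2 * y2 - 1
        p = phi2(q1 * b1, q2 * b2, q1 * q2 * rho)          # cell probability
        ll += c * np.log(max(p, 1e-300))                   # guard against log(0)
    return -ll / n

# Simulated data with hypothetical true values b1 = 0.3, b2 = -0.2, rho = 0.5
rng = np.random.default_rng(1)
eps = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=2000)
y1 = (0.3 + eps[:, 0] > 0).astype(int)
y2 = (-0.2 + eps[:, 1] > 0).astype(int)
counts = {(a, b): int(np.sum((y1 == a) & (y2 == b))) for a in (0, 1) for b in (0, 1)}

res = minimize(neg_mean_loglik, x0=np.zeros(3), args=(counts,), method="Nelder-Mead",
               options={"xatol": 1e-5, "fatol": 1e-9, "maxiter": 2000})
b1_hat, b2_hat, rho_hat = res.x[0], res.x[1], np.tanh(res.x[2])
```

With covariates, the same objective would be summed over observations (or over distinct covariate patterns) rather than over four cells, and a gradient-based optimizer would usually be preferred.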

2.2.2 Seemingly Unrelated Bivariate Probit Model

This study also considers the seemingly unrelated bivariate probit model. This model is used when two equations are to be estimated and the dependent variable of one of them appears as an explanatory variable in the other.

The latent variable for the seemingly unrelated bivariate probit model is given as:

$$\begin{aligned}&Y_1^*=X_1 \beta _1 +\varepsilon _1 \end{aligned}$$
(16)
$$\begin{aligned}&Y_2^*=\gamma Y_1 +X_2 \beta _2 +\varepsilon _2 \end{aligned}$$
(17)

The estimation procedure is the same as for the bivariate probit model, except that the term \(\gamma Y_1 \) is included in the second equation.
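For concreteness, the probabilities entering the likelihood of this model can be written in the same form as Eq. (7). The expression below is the standard form for a bivariate probit with \(Y_1\) as a regressor in the second equation; it is stated here as an illustration, using the sign variables \(q_1 =2y_1 -1\) and \(q_2 =2y_2 -1\) defined in Sect. 2.2.1.

$$\begin{aligned} P\left( {Y_1 =y_1 ,Y_2 =y_2 |X_1 ,X_2 } \right) ={\Phi }_2 \left( {q_1 X_1^{\prime } \beta _1 ,\;q_2 \left( {\gamma y_1 +X_2^{\prime } \beta _2 } \right) ,\;q_1 q_2 \rho } \right) \end{aligned}$$

Setting \(\gamma =0\) recovers the bivariate probit probabilities of Eq. (7).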

2.2.3 Marginal Effects

After estimating the parameters, we consider the marginal effects of the covariates [10, 11]. A marginal effect measures the change in the probability of the outcome when the value of one regressor changes, holding all the other regressors constant at some value [9]. With \(\varphi \) and \({\Phi }\) denoting the univariate standard normal density and distribution function, the marginal effect on the joint probability of a regressor that enters equation \(i\) only is given by

$$\begin{aligned} \frac{\partial {\Phi }_2 \left( {X_1^{\prime } \beta _1 ,X_2^{\prime } \beta _2 ,\rho } \right) }{\partial X_i }= \varphi \left( {X_i^{\prime } \beta _i } \right) {\Phi } \left( {\frac{X_j^{\prime } \beta _j -\rho X_i^{\prime } \beta _i }{\sqrt{1-\rho ^{2}}}} \right) \beta _i , \quad i =1, 2,\; j\ne i \end{aligned}$$
(18)
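The derivative of \({\Phi }_2\) with respect to its first argument, \(\varphi (w_1 )\,{\Phi }\big ((w_2 -\rho w_1 )/\sqrt{1-\rho ^{2}}\big )\) with \({\Phi }\) the univariate standard normal CDF, is a standard result and can be checked numerically. The sketch below compares it with a central finite difference; the values of `w1`, `w2` and `rho` are hypothetical, and \({\Phi }_2\) is evaluated through a one-dimensional integral so that the check is deterministic.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def phi2(w1, w2, rho):
    """Bivariate standard normal CDF written as a one-dimensional integral."""
    s = np.sqrt(1.0 - rho**2)
    integrand = lambda z: norm.pdf(z) * norm.cdf((w2 - rho * z) / s)
    value, _ = quad(integrand, -np.inf, w1, epsabs=1e-10, epsrel=1e-10)
    return value

def dphi2_dw1(w1, w2, rho):
    """Analytic derivative of Phi_2 with respect to its first argument."""
    return norm.pdf(w1) * norm.cdf((w2 - rho * w1) / np.sqrt(1.0 - rho**2))

w1, w2, rho, h = 0.3, -0.2, 0.5, 1e-3          # hypothetical values
numeric = (phi2(w1 + h, w2, rho) - phi2(w1 - h, w2, rho)) / (2 * h)
analytic = dphi2_dw1(w1, w2, rho)
```

Multiplying `analytic` by the coefficient of a regressor in the first equation gives that regressor's marginal effect on \(P(Y_1 =1,Y_2 =1)\) at the chosen index values.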

The marginal effect of a categorical variable shows how the probability changes as the variable changes from 0 to 1, after controlling for the other variables in the model.
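For a 0/1 covariate, an average marginal effect can be computed as the mean difference in predicted probability when the covariate is switched from 0 to 1 for every observation. The sketch below does this for a single probit equation; the design matrix and coefficients are hypothetical and the function name is introduced only for this illustration.

```python
import numpy as np
from scipy.stats import norm

def average_discrete_effect(X, beta, col):
    """Mean change in P(Y=1) when the dummy in column `col` goes from 0 to 1."""
    X_one, X_zero = X.copy(), X.copy()
    X_one[:, col] = 1.0                       # everyone in the category
    X_zero[:, col] = 0.0                      # everyone in the reference category
    return float(np.mean(norm.cdf(X_one @ beta) - norm.cdf(X_zero @ beta)))

rng = np.random.default_rng(2)
n = 1000                                      # hypothetical sample size
X = np.column_stack([
    np.ones(n),                               # intercept
    rng.integers(0, 2, size=n),               # 0/1 covariate of interest
    rng.normal(size=n),                       # another covariate, held at observed values
]).astype(float)
beta = np.array([-0.2, 0.5, 0.3])             # hypothetical probit coefficients
ame = average_discrete_effect(X, beta, col=1)
```

The result is a difference in probabilities, so multiplying by 100 expresses it in percentage points.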

3 Results

3.1 Descriptive Statistics

First, summary statistics were computed for each variable in the study to give an overview of the data. The frequency and percentage of each category are shown in Table 1.

Table 1 The frequency (percentage) of each variable included in the study

3.2 Chi Square Test of Association

Before fitting the probit models, the association between each response variable (malaria and anemia) and each explanatory variable was checked using a chi-square test, to see whether the selected variables are related to the responses. The results are shown in Table 2.

Table 2 Chi-square test of association between the responses (malaria and anemia) and each explanatory variable

As Table 2 shows, sex, age and education level are associated with malaria, and sex and education level are associated with anemia. Marital status is associated with neither malaria nor anemia, and age is not associated with anemia.
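A chi-square test of association of this kind can be run on a cross-tabulation of one response against one explanatory variable. The sketch below uses SciPy on a hypothetical 2 × 2 table; the counts are invented for illustration and are not the study data.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical cross-tabulation: rows = two groups, columns = (positive, negative)
table = np.array([[30, 70],
                  [50, 50]])

# correction=False gives the plain Pearson chi-square statistic
chi2_stat, p_value, dof, expected = chi2_contingency(table, correction=False)
```

A small p-value indicates that the row and column variables are associated; for these made-up counts the statistic is about 8.33 on 1 degree of freedom.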

3.3 Univariate Analysis

A univariate probit regression is fitted for malaria and for anemia to see the effect of each explanatory variable after adjusting for the other explanatory variables. The results obtained from the univariate models are discussed below. The first category of each variable is taken as the reference category.
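A single-equation probit fit can be sketched by maximizing the probit log-likelihood directly. The example below uses simulated data with hypothetical coefficients and a general-purpose optimizer; it illustrates the technique and is not the software or data used in the study.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def probit_negloglik(beta, X, y):
    """Negative probit log-likelihood: -sum log Phi(q * X beta), q = 2y - 1."""
    q = 2 * y - 1
    return -np.sum(norm.logcdf(q * (X @ beta)))

rng = np.random.default_rng(3)
n = 5000                                               # hypothetical sample size
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
beta_true = np.array([-0.3, 0.8])                      # hypothetical true coefficients
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

res = minimize(probit_negloglik, x0=np.zeros(2), args=(X, y), method="BFGS")
beta_hat = res.x                                       # maximum likelihood estimates
```

Categorical explanatory variables would enter `X` as dummy columns, with the reference category omitted.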

3.3.1 Univariate Analysis for Malaria

The prevalence of malaria is modeled as a function of the explanatory variables to identify the factors that affect it. The chi-square test of association found sex, age and education level to be significant. All the variables were then fitted in a probit model, and the results are shown in Table 3.

Table 3 The parameter estimates and standard errors of the univariate probit model for malaria

Table 3 shows that sex, age, education level and marital status are significant determinants of the prevalence of malaria. Marital status was not significant in the test of association, but it becomes significant when fitted together with the other variables.

3.3.2 Univariate Analysis for Anemia

The prevalence of anemia is modeled as a function of the explanatory variables to identify the risk factors. The chi-square test of association found sex and education level to be significant. All the variables were then fitted in a probit model, and the results are shown in Table 4.

Table 4 The parameter estimates and standard errors of the univariate probit model for anemia

Table 4 shows that malaria, sex and education level are significant determinants of anemia, whereas age and marital status are not. Sex and education level therefore significantly affect the prevalence of anemia whether considered alone or together with the other variables.

3.4 Bivariate Analysis for Malaria and Anemia

A bivariate probit model is used to jointly model the prevalence of malaria and anemia. The data were first fitted with the bivariate probit model and then with the seemingly unrelated bivariate probit model, so that the model that best fits the data can be chosen. Table 5 shows the results of the bivariate probit model.

Table 5 The parameter estimates and standard errors of the bivariate probit model for malaria and anemia

As Table 5 shows, sex, age, education level and marital status are significant in explaining the prevalence of malaria, while sex and education level are significant in explaining the prevalence of anemia. The average marginal effects were also calculated; because they measure changes in probability, they are expressed in percentage points. Females have a 7.95 percentage point lower probability of being infected with malaria than males. People in the age groups 31–45, 46–60 and above 60 have, respectively, a 32.26, 45.44 and 49.06 percentage point lower probability of being infected with malaria than those in the age group 15–30. People with primary and secondary education have, respectively, a 21.16 and 15.97 percentage point lower probability of being infected with malaria than illiterate people. Married and divorced people have, respectively, a 9.8 and 24.55 percentage point higher probability of being infected with malaria than single individuals. For anemia, females have an 8.15 percentage point higher probability of being anemic than males. People who can read and write and those with primary, secondary and higher education have, respectively, a 16.37, 14.9, 26.5 and 29.12 percentage point lower probability of being anemic than illiterate people.

The overall fit of the model was checked with the Wald test [Wald \(\chi ^{2}\)(22) = 63.72, p value < 0.0001], which shows that the model is significant. The presence of correlation was also tested. The likelihood ratio test rejects the null hypothesis of zero correlation [\(\chi ^{2}\)(1) = 12.8119, p value = 0.0003]. This means that the correlation between malaria and anemia is significant.

The data were also modeled using the seemingly unrelated bivariate probit model, including only the variables that were significant for each response. The results are shown in Table 6.

Table 6 The parameter estimates and standard errors of the seemingly unrelated bivariate probit model for malaria and anemia

As Table 6 shows, all the variables (age, sex, education level and marital status) are significant for malaria. For anemia, malaria, sex and education level are significant; that is, they have an impact on the prevalence of anemia. The average marginal effects, expressed in percentage points, were also calculated. Females have an 8.6 percentage point lower probability of being infected with malaria than males. People in the age groups 31–45, 46–60 and above 60 have, respectively, a 30.44, 43.84 and 48.82 percentage point lower probability of being infected with malaria than those in the age group 15–30. People with primary and secondary education have, respectively, a 20.48 and 15.89 percentage point lower probability of being infected with malaria than illiterate people. Married and divorced people have, respectively, a 10.27 and 36.47 percentage point higher probability of being infected with malaria than single individuals. For anemia, people with malaria have a 46.62 percentage point higher probability of being anemic than malaria-free individuals. Females have a 10.44 percentage point higher probability of being anemic than males. People who can read and write and those with secondary and higher education have, respectively, an 11.17, 16.15 and 20.28 percentage point lower probability of being anemic than illiterate people.

The overall fit of the model is significant [Wald \(\chi ^{2}\)(23) = 75.28, p value < 0.0001]. The likelihood ratio test fails to reject the null hypothesis of zero correlation [\(\chi ^{2}\)(1) = 0.8791, p value = 0.3484]. This means that the correlation between malaria and anemia is not significant in this model.
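The reported p-values can be reproduced from the test statistics and their degrees of freedom using the chi-square survival function. The sketch below takes the statistics quoted in the text as inputs; the variable names are chosen only for this illustration.

```python
from scipy.stats import chi2

# Likelihood ratio tests of zero correlation (1 degree of freedom each)
p_lr_bivariate = chi2.sf(12.8119, df=1)   # bivariate probit model
p_lr_sur = chi2.sf(0.8791, df=1)          # seemingly unrelated bivariate probit model

# Wald tests of overall fit
p_wald_bivariate = chi2.sf(63.72, df=22)
p_wald_sur = chi2.sf(75.28, df=23)
```

The first likelihood ratio p-value rounds to 0.0003 and the second to about 0.348, in line with the values reported; both Wald p-values are far below 0.0001.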

4 Discussion

The findings of this study show that sex, age, education level and marital status are significant for malaria in both the univariate and the bivariate models. The study showed that males have a higher risk of malaria than females. Another study in Ethiopia found the same result [12].

The results of this study also showed a higher risk of malaria among younger people than among older ones. This result is consistent with other studies: studies done in Ethiopia found a high prevalence of malaria in younger age groups [12, 13].

In our study, education level is significantly associated with malaria. A study done in Kenya also found that greater malaria risk is associated with a lower level of education [14]. This study also showed that marital status is significantly associated with malaria, with divorced people having the highest chance of being infected. This result is not consistent with other studies [15].

Malaria, sex and education level are significantly associated with anemia in both the univariate and the bivariate models. According to this study, people with malaria have a higher risk of being anemic than malaria-free people. Another study in Ethiopia also found that malaria-infected individuals have a higher risk of being anemic than malaria-free individuals [16].

According to this study, females have a higher risk of anemia than males. A study done in south India also found that females have the highest risk of anemia [17], and another study in Ethiopia found a consistent result [18]. We also found an association between anemia and education level: illiterate people have the highest chance of anemia. Two studies done in India also found that people with a lower education level have the highest risk of anemia [19, 20].

5 Conclusion

First, a chi-square test was conducted to assess the association between each of the two responses (malaria and anemia) and the explanatory variables. The results show that sex, age and education level are associated with malaria, and that sex and education level are associated with anemia. A univariate probit model was then fitted for both malaria and anemia to see the effect of each explanatory variable after adjusting for the others. The results show that sex, age, education level and marital status are significant for malaria, and that malaria, sex and education level are significantly associated with anemia.

The two responses were also modeled jointly using both the bivariate probit model and the seemingly unrelated bivariate probit model. The results of the bivariate probit model show that sex, age, education level and marital status are significantly associated with malaria, and that sex and education level are significantly associated with anemia. The average marginal effects show that males, people aged 15–30, illiterate people and divorced people have the highest chance of being infected with malaria, and that females and illiterate people have the highest chance of being anemic. The likelihood ratio test shows a significant positive correlation between malaria and anemia. The results of the seemingly unrelated bivariate probit model show that the variables significant for malaria are the same as in the bivariate probit model, while for anemia, malaria, sex and education level are significant. The average marginal effects again show that males, people aged 15–30, illiterate people and divorced people have the highest chance of being infected with malaria. Females and illiterate people have the highest chance of being anemic, and people with malaria have a higher chance of becoming anemic than people who are not infected. In this model, the likelihood ratio test shows no significant correlation between malaria and anemia.