The value of mHealth for managing chronic conditions


Chronic conditions place a high cost burden on the healthcare system and deplete the quality of life for millions of Americans. Digital innovations such as mobile health (mHealth) technology can be used to provide efficient and effective healthcare. In this research we explore the use of mobile technology to manage chronic conditions such as diabetes and hypertension. There is ample empirical evidence in the healthcare literature showing that patients who use mHealth observe improvement in their health. However, an analytical study that quantifies the benefit of using mHealth is lacking. The benefit of using mHealth depends on many factors such as the current health condition of the patient, pattern of disease progression, frequency of measurement and intervention, the effectiveness of intervention, and the cost of measuring. Stochastic modeling is a suitable approach to take these factors into consideration to evaluate the benefit of mHealth. In this paper, we model the disease progression with the help of a Markov chain and quantify the benefits of measuring and intervention taking into consideration the above-mentioned factors. We compare two different modes for measuring and intervention, mHealth mode and conventional office visit mode, and evaluate the impact of these factors on health outcome.



  1. We also tried approximating the k-step transition matrix by \(P_{r}P_{w}^{k-1}\). However, this approach was not accurate, especially when the absorption time is not a multiple of k.


The first author would like to thank Dr. R. A. Ramanujan, Diabetic Care Associates, Binghamton, NY, for sharing his experience in developing and using a well-integrated mHealth application to treat over 2000 of his chronic disease patients. We would like to thank the reviewers for their constructive comments. The order of authors is alphabetical.

Correspondence to Saligrama Agnihothri.


Appendix A: Markov model for gaps in intervention

In Section 3 we assume that there is an intervention every time a patient measures. This assumption can be relaxed by expanding the state space. Assume that there is an intervention only when the patient is in state 2. We add a second dimension to the state space that denotes whether there is an intervention. The state at period n is given by (Xn, Yn), where Xn ∈ {1, 2, 3} as before, Yn = 0 if there is no intervention, and Yn = 1 if there is an intervention. Since there is no intervention in states 1 and 3, Yn = 0 whenever Xn ∈ {1, 3}. That is, P(Xn+1, 1 | Xn) = 0 for Xn+1 ∈ {1, 3} and Xn ∈ {1, 2, 3}. Hence, we have 4 possible states: (1, 0), (2, 0), (2, 1), and (3, 0).

Then the patient’s immediate transition over the next base period follows the transition probability matrix Pm, given by

$$\begin{array}{@{}rcl@{}} P_{m} = \left[\begin{array}{cccccccccc} 1- a & 0 & a & 0\\ b & 1- b - a & 0 & a \\ \beta_{m} b & \frac{1 - \beta_{m} b - e_{m} a}{2} & \frac{1 - \beta_{m} b - e_{m} a}{2} & e_{m} a \\ 0 & 0 & 0 & 1 \end{array}\right]. \end{array} $$

In Eq. 14 (the matrix Pm above), we accommodate the possibility that the intervention for a patient who just got sick and transitioned to state 2 happens over multiple base periods, especially if the intervention is not effective. If the patient stays in (2, 1), they continue to receive intervention and therefore increase their chance of getting back to state 1 (from which they originally transitioned). In reality, not everyone in state 2 might be continuously eligible for such interventions, and hence we also let them transition to state (2, 0), which means no more interventions as long as they are in state 2. We have arbitrarily chosen a 50-50 split between (2, 0) and (2, 1); this can be varied.

In case measurements are taken only every k periods, we reuse the second dimension of the state space: let it denote either that there is an intervention or that there is going to be one at the next measurement period. Hence, we can still do with just 4 possible states: (1, 0), (2, 0), (2, 1), and (3, 0). State (2, 1) denotes that the patient receives an intervention when they measure (in the case of the Pm matrix) or will receive an intervention when they next measure at the kth period (in the case of the Pw matrix). Pr in this case is simply Pm for one period and the following modified Pw for the remaining k − 1 periods,

$$\begin{array}{@{}rcl@{}} P_{w} = \left[\begin{array}{cccccccccc} 1- a & 0 & a & 0\\ b & 1- b - a & 0 & a \\ b & \gamma (1- b - a) & (1-\gamma) (1- b - a) & a \\ 0 & 0 & 0 & 1 \end{array}\right]. \end{array} $$

In Eq. 15 (the matrix Pw above), states (2, 0) and (2, 1) are treated the same with respect to transitions to states 1 and 3. The self-transition probability of state (2, 0) is also unchanged, implying that these patients do not receive any interventions as long as they are in state 2. Patients who move to state (2, 1) will get an intervention if they measure in the next period. However, with probability γ(1 − b − a) they go to state (2, 0). If we set γ to 0, then all patients in regular mode who are in state 2 will get an intervention in the next period.
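As a rough numerical check of the two matrices in this appendix, the following Python sketch builds Pm and Pw on the four states (1, 0), (2, 0), (2, 1), (3, 0), confirms that each row sums to one, and solves for the expected number of periods until absorption. The function names and the parameter values in the demo (a = 0.2, b = 0.1, em = 0.5, βm = 1.5, γ = 0.5) are illustrative assumptions, not values taken from the paper.

```python
def mhealth_matrix(a, b, e_m, beta_m):
    """P_m on states (1,0), (2,0), (2,1), (3,0), with the 50-50 split of the text."""
    stay = (1 - beta_m * b - e_m * a) / 2
    return [
        [1 - a, 0.0, a, 0.0],
        [b, 1 - b - a, 0.0, a],
        [beta_m * b, stay, stay, e_m * a],
        [0.0, 0.0, 0.0, 1.0],
    ]


def waiting_matrix(a, b, gamma):
    """Modified P_w: no intervention takes effect while the patient waits."""
    stay = 1 - b - a
    return [
        [1 - a, 0.0, a, 0.0],
        [b, stay, 0.0, a],
        [b, gamma * stay, (1 - gamma) * stay, a],
        [0.0, 0.0, 0.0, 1.0],
    ]


def expected_absorption_times(P):
    """Solve (I - Q) t = 1 over the transient states by Gauss-Jordan elimination.

    The last state of P is assumed to be the only absorbing state.
    """
    n = len(P) - 1
    M = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(n)] + [1.0]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


if __name__ == "__main__":
    # Illustrative parameter values only.
    Pm = mhealth_matrix(0.2, 0.1, 0.5, 1.5)
    Pw = waiting_matrix(0.2, 0.1, 0.5)
    print("rows sum to 1:", all(abs(sum(row) - 1) < 1e-12 for row in Pm + Pw))
    print("expected periods to absorption from (1,0), Pm:",
          expected_absorption_times(Pm)[0])
    print("expected periods to absorption from (1,0), Pw:",
          expected_absorption_times(Pw)[0])
```

Because states (2, 0) and (2, 1) have identical transition probabilities to states 1 and 3 under Pw, the expected absorption time under Pw does not depend on γ; the split only matters once Pm is applied at a measurement period.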

Appendix B: Distribution of first passage time

In this section we present the distribution of the first passage time for a healthy (State 1) patient to the absorbing state. We first restate a result pertaining to the distribution of the absorption time for general birth and death processes, from Theorem 1.2 of [57]:

Theorem 4

Consider a discrete birth and death chain with transition kernel P on the state space {0, … , d} started at 0, where d is an absorbing state, and suppose that the other birth probabilities \(p^{*}_{i}, 0\leq i \leq d-1\), and death probabilities \(q_{i}^{*}, 1\leq i \leq d-1\), are positive. If \(p^{*}_{i-1}+q^{*}_{i} < 1\) for 1 ≤ i ≤ d, then the absorption time starting from state 0 has the probability generating function

$$\begin{array}{@{}rcl@{}} f(u) = \prod\limits_{j = 0}^{d-1}\frac{(1-\theta_{j})u}{1-\theta_{j} u}, \end{array} $$

where −1 ≤ θj < 1 are the d non-unit eigenvalues of P.

Note that Theorem 4 implies that the absorption time starting from state 0 can be expressed as the sum of d independent geometric random variables each with parameter 1 − θj, since this sum has a probability generating function of the form of f.

We begin by finding the distribution of the absorption time for a Markov chain following transition matrix Pw, by calculating the eigenvalues of Pw and applying Theorem 4. If a + b < 1, then the random time to absorption starting from state 1, denoted as \(\tilde {T}(1)\), has a distribution of the form

$$\begin{array}{@{}rcl@{}} P(\tilde{T}(1)=n) &=& \frac{(1-p_{1})(1-p_{2})(p_{2}^{n-1}-p_{1}^{n-1})}{p_{2}-p_{1}},\\ &&{}\text{ for } n\geq 2 \text{ and } p_{1} \neq p_{2}, \end{array} $$
$$\begin{array}{@{}rcl@{}} &&{}=\frac{(n-1)(1-p_{1})(1-p_{2})p_{2}^{n-1}}{p_{1}},\\ &&{}\text{ for } n\geq 2 \text{ and } p_{1} = p_{2}, \end{array} $$

where p1 and p2 are the two non-unit eigenvalues of Pw, so that 1 − p1 and 1 − p2 are the parameters of the two geometric random variables in Theorem 4:

$$\begin{array}{@{}rcl@{}} p_{1} = 1 - \frac{1}{2}(2a+b+\sqrt{b(4a+b)}), \end{array} $$
$$\begin{array}{@{}rcl@{}} p_{2} = 1 - \frac{1}{2}(2a+b-\sqrt{b(4a+b)}). \end{array} $$

With this result, we can easily derive the distribution of the absorption time starting from state 1 for patients using mHealth mode or office visits, since these processes are Markov chains with transition probabilities of a similar form as Pw. The distribution of the absorption time for a patient using mHealth mode following transition matrix Pm is given by replacing a by \(e_{m}a\) and b by \(\beta_{m}b\) in Eqs. 16 and 17. The distribution of the absorption time for a patient using office visits following transition matrix \(\tilde {P}\) defined in Eq. 3 is given by replacing a by \(a(e_{r}(k-1)/k + e_{m}/k)\) and b by \(b(\beta_{r}(k-1)/k + \beta_{m}/k)\) in Eqs. 16 and 17.
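The closed form in this appendix can be checked numerically. The sketch below is a minimal illustration, assuming the three-state chain with worsening probability a (from state 1 to 2 and from state 2 to 3) and improvement probability b (from state 2 to 1). It takes θ1 and θ2 to be the two non-unit eigenvalues of Pw, as in Theorem 4, and compares the resulting probability mass function with one obtained by propagating the state distribution step by step. All function names are hypothetical.

```python
import math


def absorption_pmf_closed_form(a, b, n):
    """P(T = n) for the chain started in state 1, written as the sum of two
    geometric random variables with parameters 1 - theta1 and 1 - theta2."""
    if n < 2:
        return 0.0
    root = math.sqrt(b * (4 * a + b))
    theta1 = 1 - (2 * a + b + root) / 2  # smaller non-unit eigenvalue of P_w
    theta2 = 1 - (2 * a + b - root) / 2  # larger non-unit eigenvalue of P_w
    scale = (1 - theta1) * (1 - theta2)
    if math.isclose(theta1, theta2):
        return (n - 1) * scale * theta1 ** (n - 2)
    return scale * (theta2 ** (n - 1) - theta1 ** (n - 1)) / (theta2 - theta1)


def absorption_pmf_by_iteration(a, b, n_max):
    """Return a list pmf with pmf[n] = P(T = n) for n = 0..n_max, obtained by
    propagating the probabilities of being in state 1 or state 2."""
    in_1, in_2 = 1.0, 0.0
    pmf = [0.0]
    for _ in range(1, n_max + 1):
        pmf.append(in_2 * a)  # absorbed on this transition
        in_1, in_2 = in_1 * (1 - a) + in_2 * b, in_1 * a + in_2 * (1 - a - b)
    return pmf


if __name__ == "__main__":
    a, b = 0.2, 0.1  # illustrative values
    reference = absorption_pmf_by_iteration(a, b, 10)
    for n in range(1, 11):
        print(n, absorption_pmf_closed_form(a, b, n), reference[n])
```

The mHealth and office-visit cases follow by passing the rescaled values of a and b to the same functions.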

Appendix C: Proofs of results in Section 3

Proof of Lemma 1

We construct the coupling. We generate a sequence of uniform random variables, denoted by {Un}. Given Xn and Un, we define Xn+1 by

$$\begin{array}{@{}rcl@{}} X_{n + 1}&=&2 \text{ if } X_{n}= 1 \text{ and } 0\leq U_{n} \leq A,\\ &=&1 \text{ if } X_{n}= 1 \text{ and } A\leq U_{n} \leq 1,\\ X_{n + 1}&=&3 \text{ if } X_{n}= 2 \text{ and } 0\leq U_{n} \leq A,\\ &=&2 \text{ if } X_{n}= 2 \text{ and } A\leq U_{n} \leq 1-B,\\ &=&1 \text{ if } X_{n}= 2 \text{ and } 1-B\leq U_{n} \leq 1. \end{array} $$

Given Yn, we generate Yn+1 in a similar fashion:

$$\begin{array}{@{}rcl@{}} Y_{n + 1}&=&2 \text{ if } Y_{n}= 1 \text{ and } 0\leq U_{n} \leq A^{\prime},\\ &=&1 \text{ if } Y_{n}= 1 \text{ and } A^{\prime}\leq U_{n} \leq 1,\\ Y_{n + 1}&=&3 \text{ if } Y_{n}= 2 \text{ and } 0\leq U_{n} \leq A^{\prime},\\ &=&2 \text{ if } Y_{n}= 2 \text{ and } A^{\prime}\leq U_{n} \leq 1-B^{\prime},\\ &=&1 \text{ if } Y_{n}= 2 \text{ and } 1-B^{\prime}\leq U_{n} \leq 1. \end{array} $$

We proceed by induction. Assume that Xn ≥ Yn; then we must show that Xn+1 ≥ Yn+1. We divide our analysis into subcases depending on the values of Xn, Yn, and Un, and show that in each subcase, Xn+1 ≥ Yn+1. Note that 0 ≤ A′ ≤ A ≤ 1 − B′ ≤ 1 − B ≤ 1. The inequality A ≤ 1 − B′ is true because 1 − B′ − A ≥ 1 − B′ − A′ ≥ 0, where the first inequality follows from the fact that A ≥ A′, and the second inequality follows from the fact that 1 − B′ − A′ is a probability in the P′ transition matrix. Therefore, there are five possibilities for the value of Un:

  • Case I: 0 ≤ U ≤ A′.

  • Case II: A′ ≤ U ≤ A.

  • Case III: A ≤ U ≤ 1 − B′.

  • Case IV: 1 − B′ ≤ U ≤ 1 − B.

  • Case V: 1 − B ≤ U ≤ 1.

If Xn = 1 and Yn = 1, then Xn+1 = 2 and Yn+1 = 2 in Case I, Xn+1 = 2 and Yn+1 = 1 in Case II, Xn+1 = 1 and Yn+1 = 1 in Case III, Xn+1 = 1 and Yn+1 = 1 in Case IV, and Xn+1 = 1 and Yn+1 = 1 in Case V.

If Xn = 2 and Yn = 1, then Xn+1 = 3 and Yn+1 = 2 in Case I, Xn+1 = 3 and Yn+1 = 1 in Case II, Xn+1 = 2 and Yn+1 = 1 in Case III, Xn+1 = 2 and Yn+1 = 1 in Case IV, and Xn+1 = 1 and Yn+1 = 1 in Case V.

If Xn = 2 and Yn = 2, then Xn+1 = 3 and Yn+1 = 3 in Case I, Xn+1 = 3 and Yn+1 = 2 in Case II, Xn+1 = 2 and Yn+1 = 2 in Case III, Xn+1 = 2 and Yn+1 = 1 in Case IV, and Xn+1 = 1 and Yn+1 = 1 in Case V.

In all cases, it is true that Xn+1 ≥ Yn+1. □
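The coupling in this proof can be illustrated by simulation. In the minimal sketch below, both chains are driven by the same uniform draw in every period, and we check that the X path never falls below the Y path. The function names and the parameter values (A = 0.2, B = 0.1, A′ = 0.1, B′ = 0.3, chosen so that 0 ≤ A′ ≤ A ≤ 1 − B′ ≤ 1 − B ≤ 1) are assumptions made for illustration.

```python
import random


def next_state(state, u, worsen, improve):
    """One transition of the 3-state chain driven by the uniform draw u."""
    if state == 1:
        return 2 if u <= worsen else 1
    if state == 2:
        if u <= worsen:
            return 3
        return 2 if u <= 1 - improve else 1
    return 3  # state 3 is absorbing


def coupled_paths(A, B, A_prime, B_prime, x0, y0, n_steps, seed=0):
    """Generate {X_n} and {Y_n} from one shared sequence of uniforms {U_n}."""
    rng = random.Random(seed)
    xs, ys = [x0], [y0]
    for _ in range(n_steps):
        u = rng.random()  # the same U_n drives both chains
        xs.append(next_state(xs[-1], u, A, B))
        ys.append(next_state(ys[-1], u, A_prime, B_prime))
    return xs, ys


if __name__ == "__main__":
    xs, ys = coupled_paths(0.2, 0.1, 0.1, 0.3, x0=2, y0=1, n_steps=30)
    print("X path:", xs)
    print("Y path:", ys)
    print("X_n >= Y_n for all n:", all(x >= y for x, y in zip(xs, ys)))
```

A simulation of this kind only illustrates the ordering on sampled paths; the case analysis in the proof is what establishes it for every value of Un.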

Proof of Theorem 1

We first prove that Tm(s) is non-increasing in s. Let {Xn} be the sequence of states visited by a patient using mHealth mode with X0 = 2, and similarly let {Yn} be the sequence of states visited by a patient using mHealth mode with Y0 = 1. The conditions of Lemma 1 are met by setting P = P′ = Pm, and so there is a coupling of {Xn} and {Yn} such that Xn ≥ Yn for all n.

From this observation, we are able to conclude that:

  • The X process must reach the absorbing state no later than the Y process. Thus, for any given path {Un}, the time to absorption is at least as high in the Y process as in the X process, and so the average time to absorption is at least as high as well. For the sake of completeness, we prove this statement formally below; we omit the corresponding argument for the other claims since it is almost identical. Let X be defined on probability space Ω with measure PX and Y be defined on probability space Ω with measure PY. Then Lemma 1 gives a coupling of X and Y such that for all n, Xn(ω) ≥ Yn(ω) and PX(ω) = PY(ω). Define \(\tilde {T}_{X}(\omega )\) to be the random time to absorption in X and \(\tilde {T}_{Y}(\omega )\) to be the random time to absorption in Y. Then we need to show that

    $$\begin{array}{@{}rcl@{}} {\int}_{\Omega}\tilde{T}_{X}(\omega) d P_{X}(\omega) \leq {\int}_{\Omega}\tilde{T}_{Y}(\omega) d P_{Y}(\omega). \end{array} $$

    This inequality holds since for all ω ∈ Ω, \(\tilde {T}_{X}(\omega ) \leq \tilde {T}_{Y}(\omega )\) and PX(ω) = PY(ω) by our coupling construction.

  • Since R(s) is decreasing in s, for every period n, the Y process yields higher utility than the X process. Furthermore, the average time until absorption is higher for the Y process than the X process. Thus, the total utility over the patient’s lifetime is higher for the Y process compared to the X process as well.

To prove the remaining claims in the theorem, we repeat the same argument. For instance, to show that (Tm(s), vm(s)) are decreasing in em, consider two values of em, denoted as \(e_{m}^{(1)}\) and \(e_{m}^{(2)}\), such that \(e_{m}^{(1)}> e_{m}^{(2)}\). Create two corresponding processes {Xn} and {Yn} where X0 = Y0 = s, and where the transition probabilities for the process X follow the Pm matrix with \(e_{m} = e_{m}^{(1)}\) (denote this matrix as \(P_{m}^{(1)}\)), and the transition probabilities for the process Y follow the Pm matrix with \(e_{m} = e_{m}^{(2)}\) (denote this matrix as \(P_{m}^{(2)}\)). Then we can couple {Xn} and {Yn} according to Lemma 1 by setting \(P=P_{m}^{(1)}\) and \(P^{\prime }=P_{m}^{(2)}\), ensuring that Xn ≥ Yn for all n. The argument proceeds in exactly the same way as above: since the state is always at least as high in the X process as in the corresponding Y process, the average time to absorption and the average lifetime utility are higher for the Y process than for the X process.

We omit the proofs for the remaining claims since they are identical to the proofs we have outlined. □

Proof of Theorem 2

We proceed with a proof similar to that of Theorem 1. Let Xn represent the nth period health state of a patient using regular mode, and let Yn represent the nth period health state of a patient using mHealth mode. Set X0 = Y0 = s. If n is a multiple of k, then the transition matrix used in the next step of the X process is Pr, and we can apply Lemma 1 by setting P = Pr and P′ = Pm. Otherwise, if n is not a multiple of k, the transition matrix used in the next step of the X process is Pw, and we can apply Lemma 1 by setting P = Pw and P′ = Pm. In either case, there exists a coupling such that Xn ≥ Yn for all n.

Thus, we conclude that Tm(s) ≥ Tr(s) since the time to absorption is always greater in process Y than process X, and must be greater in expectation as well. Denote \(\tilde {T}_{X}\) to be the random absorption time in the X process, \(\tilde {T}_{Y}\) to be the random absorption time in the Y process, \(\tilde {v}(\{X_{n}\},s)\) to be the lifetime utility of a patient traversing the X process, starting in state s, and \(\tilde {v}(\{Y_{n}\},s)\) to be the lifetime utility of a patient traversing the Y process, starting in state s. We append “∼” to the utility function v to emphasize that these quantities are random and dependent on the path of X and Y, and are different from the average utility functions (vm(s), vr(s)). The expressions for \(\tilde {v}(\{X_{n}\},s)\) and \(\tilde {v}(\{Y_{n}\},s)\) are

$$\begin{array}{@{}rcl@{}} \tilde{v}(\{X_{n}\},s) = \sum\limits_{n = 0}^{\tilde{T}_{X}}{R(X_{n})} - \lfloor \tilde{T}_{X} / k \rfloor C_{r}, \end{array} $$
$$\begin{array}{@{}rcl@{}} \tilde{v}(\{Y_{n}\},s) = \sum\limits_{n = 0}^{\tilde{T}_{Y}}{R(Y_{n})} - \tilde{T}_{Y} C_{m}. \end{array} $$

Note that \({\sum }_{n = 0}^{\tilde {T}_{X}}{R(X_{n})}\leq {\sum }_{n = 0}^{\tilde {T}_{Y}}{R(Y_{n})}\) since \(\tilde {T}_{X}\leq \tilde {T}_{Y}\), Xi ≥ Yi for all i, and R(s) is decreasing in s. From this observation and our assumption that kCm ≤ Cr, we conclude that \(\tilde {v}(\{X_{n}\},s)\leq \tilde {v}(\{Y_{n}\},s)\). For each possible path, the lifetime utility is higher in the Y process than the X process, and must be higher on average as well. □

Proof of Theorem 3

From Eq. 13,

$$\begin{array}{@{}rcl@{}} \dfrac {\mathrm{d}g}{\mathrm{d}a} = - \frac{b (1 - 2 a + a^{2} e) (k + e -1)}{(1 - a)^{2} a^{2} (1 - e) e (k -1)}. \end{array} $$

The equation dg/da = 0 has two solutions:

$$\begin{array}{@{}rcl@{}} \frac{k+e-1 - \sqrt{(1 - e) (k+e-1)^{2}}}{e (k+e-1)} \end{array} $$

and

$$\begin{array}{@{}rcl@{}} \frac{k+e-1 + \sqrt{(1 - e) (k+e-1)^{2}}}{e (k+e-1)}. \end{array} $$

Since k + e − 1 > 0, these are the roots \((1 \mp \sqrt{1-e})/e\) of the factor \(1 - 2a + a^{2}e\) in the numerator of dg/da. Clearly, \(\frac {k+e-1 + \sqrt {(1 - e) (k+e-1)^{2}}}{e (k+e-1)} \ge 1\) since e ≤ 1. Since \(0 \le \frac {k+e-1 - \sqrt {(1 - e) (k+e-1)^{2}}}{e (k+e-1)} \le 1\), the smaller root is the value of a we want. □
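As a quick sanity check of the two stationary points, the sketch below evaluates them as written in the proof and confirms that both are roots of the factor 1 − 2a + a²e, with the smaller root in [0, 1] and the larger root at least 1. The function names and the test values of e and k are illustrative assumptions.

```python
import math


def stationary_points(e, k):
    """The two solutions of dg/da = 0 as written in the proof (0 < e <= 1, k >= 1)."""
    s = k + e - 1
    root = math.sqrt((1 - e) * s * s)
    return (s - root) / (e * s), (s + root) / (e * s)


def numerator_factor(a, e):
    """The factor 1 - 2a + a^2 e whose roots make dg/da vanish."""
    return 1 - 2 * a + a * a * e


if __name__ == "__main__":
    for e, k in [(0.25, 2), (0.5, 4), (0.75, 10)]:  # illustrative values
        low, high = stationary_points(e, k)
        print(e, k, low, high,
              numerator_factor(low, e), numerator_factor(high, e))
```

Neither root depends on k, which the simplified form (1 ∓ √(1 − e))/e makes explicit.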

Appendix D: Justifying \(\tilde P\) approximation

In this section we run numerical experiments to show that the office visit process for a patient is well approximated by a Markov chain with transition matrix \(\tilde {P}\) defined in Eq. 3. To do so, for a given set of parameters (a, b, er, k) we calculate the average absorption time using Tr(1) defined in Eq. 5, and compare it to the average absorption time calculated using simulation. For the latter, in every run we simulate a Markov process starting in state 1 where the transition probability follows Pr in each kth period, and follows Pw otherwise, and track the number of transitions needed until the Markov chain reaches the absorbing state 3. This process is repeated 500,000 times and we report the average number of transitions until absorption. We remark that a similar analysis can be performed to check the accuracy of Nr in Eq. 4, but the results are similar and omitted for brevity.

We run five different experiments by choosing different parameters in each setting:

  • Experiment I: a = 0.2, b = 0.1, er = 0.75.

  • Experiment II: a = 0.2, b = 0.1, er = 0.5.

  • Experiment III: a = 0.2, b = 0.1, er = 0.25.

  • Experiment IV: a = 0.8, b = 0.1, er = 0.5.

  • Experiment V: a = 0.1, b = 0.8, er = 0.5.

For all five experiments, we vary k from 1 to 10 and report the average absorption time obtained both by simulation and from Eq. 5. The results of these experiments are shown in Figs. 7, 8, 9, 10 and 11.
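The comparison described in this appendix can be sketched as follows. This is a minimal illustration under several assumptions that are not spelled out in the appendix: the office-visit matrix Pr is taken to scale the worsening probability by er and the improvement probability by a hypothetical factor beta_r (default 1); an office visit is assumed to govern every kth transition; and the approximating chain is taken to mix the visit and non-visit probabilities with weights 1/k and (k − 1)/k, which is our reading of the approximation since Eq. 3 and Eq. 5 are not reproduced here. The sketch uses far fewer runs than the 500,000 reported in the text.

```python
import random


def mean_absorption_time(a, b):
    """Expected periods to reach state 3 from state 1 in the 3-state chain with
    worsening probability a (1 -> 2 and 2 -> 3) and improvement probability b (2 -> 1)."""
    return (2 * a + b) / (a * a)


def approximate_time(a, b, e_r, k, beta_r=1.0):
    """Mean absorption time of the averaged chain (assumed form of the approximation)."""
    a_mix = a * ((k - 1) / k + e_r / k)
    b_mix = b * ((k - 1) / k + beta_r / k)
    return mean_absorption_time(a_mix, b_mix)


def simulate_time(a, b, e_r, k, beta_r=1.0, runs=20000, seed=0):
    """Average number of transitions to absorption when every kth transition
    follows P_r and all other transitions follow P_w."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        state, n = 1, 0
        while state != 3:
            visit = (n + 1) % k == 0  # assumed timing of office visits
            worsen = e_r * a if visit else a
            improve = beta_r * b if visit else b
            u = rng.random()
            if state == 1:
                state = 2 if u <= worsen else 1
            elif u <= worsen:
                state = 3
            elif u > 1 - improve:
                state = 1
            n += 1
        total += n
    return total / runs


if __name__ == "__main__":
    a, b, e_r = 0.2, 0.1, 0.75  # parameters of Experiment I
    for k in range(1, 11):
        print(k, simulate_time(a, b, e_r, k), approximate_time(a, b, e_r, k))
```

For k = 1 every transition follows Pr, so the averaged chain is exact and the two columns should agree up to simulation noise; for larger k any gap reflects both the approximation and the assumptions listed above.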

Fig. 7. Experiment I, a = 0.2, b = 0.1, er = 0.75

Fig. 8. Experiment II, a = 0.2, b = 0.1, er = 0.5

Fig. 9. Experiment III, a = 0.2, b = 0.1, er = 0.25

Fig. 10. Experiment IV, a = 0.8, b = 0.1, er = 0.5

Fig. 11. Experiment V, a = 0.1, b = 0.8, er = 0.5

We are mainly interested in Experiments I, II, and III. For these experiments we set a and b, respectively the probability of a patient’s health condition worsening or improving in the absence of physician intervention, to a realistic level. We vary er and k since different values for these two parameters will affect the frequency with which we make transitions according to Pr instead of Pw, and the magnitude of the discrepancy in the transition probabilities between Pr and Pw. Overall, our approximation works well in these experiments. By assuming that the office visit process follows a Markov chain with transition matrix \(\tilde {P}\), we do not lose much in overall accuracy. The maximum error between the simulated absorption time and the approximated absorption time is 3.4%, and the average error across all the settings tested in Experiments I, II, and III is 1.6%.

Experiments IV and V represent extreme cases where, in the absence of treatment, either (i) it is likely that the patient’s condition worsens quickly, or (ii) it is likely that the patient’s condition improves quickly. Although the parameters may be unrealistic, we nevertheless report them to test the limits of our approximation. For Experiment IV, the maximum error between the simulated absorption time and the approximated absorption time is 13.5%, while the average error among all the settings tested in Experiment IV is 8.5%. One explanation for the higher error is that in this setting, when k is large, it is likely that in the simulation the patient reaches the absorbing state before even the first intervention at period k is reached. In the approximation, however, it is assumed that the patient can get an intervention in every period with probability 1/k, and so the approximation is likely to overestimate the absorption time. In any case, the error is unlikely to have any practical impact since the magnitudes of the absorption time are small to begin with. For instance, the maximum error between simulated absorption time and approximated absorption time occurs when the simulated absorption time is 2.97 periods and the approximated absorption time is 3.375 periods.

For Experiment V, the maximum error between the simulated absorption time and the approximated absorption time is 8%, while the average error among all the settings tested in Experiment V is 1.5%. Overall the approximation works well, except for one instance when k = 2. When k is small, it is possible that having an intervention once every k periods is more beneficial than having a 1/k chance of having an intervention every period. In the latter case, the patient has a risk of not receiving an intervention within k periods. However, we remark that this setting is largely irrelevant to the analysis of mHealth mode; there is little point in continuously monitoring the patient when a is small and b is large, since the patient is usually healthy even without intervention.

Agnihothri, S., Cui, L., Delasay, M. et al. The value of mHealth for managing chronic conditions. Health Care Manag Sci 23, 185–202 (2020).

