Inverse optimization for assessing emerging technologies in breast cancer screening

Annals of Operations Research

Abstract

Identifying the optimal screening strategies for breast cancer, the second leading cause of female cancer deaths in the US, is a major societal problem creating much controversy. The optimal screening strategies significantly depend on the sensitivity and specificity of the screening modality used. While the current state-of-the-art screening technology is mammography, its sensitivity or specificity may increase over time, or mammography may be replaced by another technology such as tomosynthesis in the near future. The purpose of this study is to identify the optimal use of the next generation of breast cancer screening modalities, whose sensitivity and specificity in clinical practice are either yet unknown or keep improving over time. Contrary to the prior literature that focuses on finding the optimal screening policy for given sensitivity and specificity values, we take an inverse optimization approach and focus on finding the range of sensitivity and specificity values, for which a given screening policy is optimal. To replicate breast cancer progression in the US population under various screening policies, we develop a parametric Partially Observable Markov Chain (POMC) model, which accounts for unobservable and age-specific disease progression, age-specific mortality, and the possibility of detecting cancer without a screening exam (either via self-detection or a clinical breast exam). We then formulate a nonlinear program (NLP) to identify the range of sensitivity and specificity values that optimize a particular screening policy. We show that this NLP is nonconvex for some parameter values, and hence difficult to solve. We prove several structural properties of the model, and by exploiting these properties, we propose a complete solution algorithm for this problem. We use real data in our numerical analysis and show that with the current technology, biennial breast cancer screening is slightly better than annual screening for the average-risk population. 
We also find that an improvement only in sensitivity (but not in specificity) will not change the current optimal policy. Furthermore, we characterize the lost potential quality-adjusted life years (QALYs) due to suboptimal practice, and show that biennial screening is more robust than annual screening in the sense that it results in fewer lost QALYs due to choosing a suboptimal screening policy. Given that the design of multicenter clinical trials may be prohibitively expensive and lengthy, our findings may be especially valuable to policymakers in deciding about the optimal use of an emerging breast cancer screening modality, and adapting a new technology in different settings.

Acknowledgements

The author thanks Jagpreet Chhatwal, Qiushi Chen, Chelsea C. White III, Jeff Pavelka, Anthony Bonifonte, Sait Tunc, and the anonymous reviewers for their suggestions and insights, which have improved this manuscript.

Author information

Corresponding author

Correspondence to Turgay Ayer.

Appendix: Proofs of the Analytical Results

Proof of Proposition 1

We prove this by induction. Basis: \(V_{\pi^{i}_{T}}(b,x,y) = \sum_{s \in S^{PO}}b(s) r_{T}(s)\). Now, we will present \(V_{\pi^{i}_{t+1}}(\tau[b,a,o],x,y)\) in terms of the α-vectors. Substituting the value of τ[b,a,o] from (2) into \(V_{\pi ^{i}_{t+1}}(\tau[b,a,o],x,y) =\sum_{s' \in S^{PO}}\tau[b,a,o](s')\alpha _{\pi^{i}_{t+1}}(s',x,y)\), we get

$$\begin{aligned} V_{\pi^{i}_{t+1}}(\tau[b,a,o],x,y) = & \left\{ \begin{array}{ll} \sum_{s' \in S^{PO}} \bigg(\frac{\sum_{s \in S^{PO}}b(s)K_{t}^{a}(o|s)P_{t}^{(a,o)}(s'|s)}{\sum_{s \in S^{PO}}b(s)K_{t}^{a}(o|s)} \bigg)\alpha_{\pi^{i}_{t+1}}(s',x,y) \\ \quad \textrm{ if } a=W \textrm{ or } a=E \textrm{ and } o=E-,\\ \sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0)\alpha_{\pi^{i}_{t+1}}(s',x,y) \\ \quad \textrm { if } a=E \textrm{ and } o=E+.& \end{array} \right. \end{aligned}$$
(17)

In (17), \(\sum_{s \in S^{PO}}b(s)K_{t}^{a}(o|s)\) does not depend on s′, hence we can move it out of the summation. Also, changing the order of summation, we obtain the following:

$$\begin{aligned} &V_{\pi^{i}_{t+1}}(\tau[b,a,o],x,y) = \left\{ \begin{array}{l@{\quad}l} \frac{\sum_{s \in S^{PO}}b(s)K_{t}^{a}(o|s)\sum_{s' \in S^{PO}}P_{t}^{(a,o)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x,y)}{\sum_{s \in S^{PO}}b(s)K_{t}^{a}(o|s)} \\ \quad \textrm{ if } a=W \textrm{ or } a=E \textrm { and } o=E-,\\ \sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0)\alpha_{\pi^{i}_{t+1}}(s',x,y) \\ \quad \textrm{ if } a=E \textrm{ and } o=E+.& \end{array} \right. \end{aligned}$$
(18)

Now, we substitute the value of \(V_{\pi^{i}_{t+1}}(\tau[b,a,o],x,y) \) to obtain \(V_{\pi^{i}_{t}}(b,x,y)\) in terms of α-vectors.

Case 1: \(d^{i}_{t}=E\)

From (4), we know that

$$\begin{aligned} V_{\pi^{i}_{t}}(b,x,y)&= b(0)x\bigg(r_{t}(0,E,E-)+V_{\pi^{i}_{t+1}}(\tau [b,E,E-],x,y)\bigg) \\ &\quad{}+ b(0)\big(1-x\big)\bigg(r_{t}(0,E,E+)+V_{\pi^{i}_{t+1}}(\tau [b,E,E+],x,y)\bigg) \\ &\quad{}+\sum_{s=1}^{2}b(s) \bigg[ \big(1-y\big)\bigg(r_{t}(s,E,E-)+ V_{\pi^{i}_{t+1}}(\tau[b,E,E-],x,y) \bigg)\bigg] \\ &\quad{}+\sum_{s=1}^{2}b(s)yR_{t}(s) \\ & = \sum_{s \in S^{PO}}b(s) \Bigg[ K_{t}^{E}(E-|s)\bigg(r_{t}(s,E,E-)+ V_{\pi^{i}_{t+1}}(\tau[b,E,E-],x,y) \bigg)\Bigg] \\ &\quad{}+ b(0)K_{t}^{E}(E+|0)\bigg(r_{t}(0,E,E+)+V_{\pi^{i}_{t+1}}(\tau [b,E,E+],x,y)\bigg) \\ &\quad{}+\sum_{s=1}^{2}b(s)K_{t}^{E}(E+|s)R_{t}(s), \end{aligned}$$
(19)

where (19) follows from replacing x with \(K_{t}^{E}(E-|0)\) and y with \(K_{t}^{E}(E+|s)\) when s∈{1,2}, and the fact that \(\mathbb{K}_{t}^{E}\) is a stochastic matrix.

Substituting the values of \(V_{\pi^{i}_{t+1}}(\tau[b,E,E-],x,y)\) and \(V_{\pi^{i}_{t+1}}(\tau[b,E,E+],x,y)\) from (18) we obtain

$$\begin{aligned} V_{\pi^{i}_{t}}(b,x,y)&= \sum_{s \in S^{PO}}b(s) K_{t}^{E}(E-|s)\Bigg[r_{t}(s,E,E-) \\ &\quad{}+ \bigg(\frac{\sum_{s \in S^{PO}}b(s)K_{t}^{E}(E-|s)\sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x,y)}{\sum_{s \in S^{PO}}b(s)K_{t}^{E}(E-|s)} \bigg)\Bigg] \\ &\quad{}+ b(0)K_{t}^{E}(E+|0)\bigg(r_{t}(0,E,E+)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0)\alpha_{\pi^{i}_{t+1}}(s',x,y)\bigg) \\ &\quad{}+\sum_{s=1}^{2}b(s)K_{t}^{E}(E+|s)R_{t}(s) \end{aligned}$$
(20)
$$\begin{aligned} &= \sum_{s \in S^{PO}}b(s)K_{t}^{E}(E-|s)\Bigg[r_{t}(s,E,E-) + \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x,y)\Bigg] \\ &\quad{}+ b(0)K_{t}^{E}(E+|0)\Bigg[r_{t}(0,E,E+) +\sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0) \alpha_{\pi^{i}_{t+1}}(s',x,y) \Bigg] \\ &\quad{}+\sum_{s=1}^{2}b(s)K_{t}^{E}(E+|s)R_{t}(s) \end{aligned}$$
(21)
$$\begin{aligned} & = b(0)x \Bigg[r_{t}(0,E,E-)+\sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|0)\alpha_{\pi^{i}_{t+1}}(s',x,y)\Bigg] \\ &\quad{}+b(0)\big(1-x\big)\Bigg[r_{t}(0,E,E+)+\sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0) \alpha_{\pi^{i}_{t+1}}(s',x,y)\Bigg] \\ &\quad{}+\sum_{s=1}^{2}b(s) \big(1-y\big) \Bigg[r_{t}(s,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x,y)\Bigg] \\ &\quad{}+\sum_{s=1}^{2}b(s)yR_{t}(s) \end{aligned}$$
(22)
$$\begin{aligned} & = b(0)x \Bigg[r_{t}(0,E,E-)-r_{t}(0,E,E+)\Bigg] \\ &\quad{}+b(0)\Bigg[r_{t}(0,E,E+)+\sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0) \alpha_{\pi^{i}_{t+1}}(s',x,y)\Bigg] \\ &\quad{}+\sum_{s=1}^{2}b(s) \big(1-y\big) \Bigg[r_{t}(s,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x,y)\Bigg] \\ &\quad{}+\sum_{s=1}^{2}b(s)yR_{t}(s) \end{aligned}$$
(23)

where (21) follows from changing the order of summation, rearranging the terms, and canceling the identical terms in the numerator and denominator; (22) follows from replacing \(K_{t}^{E}(E-|0)\) with x, \(K_{t}^{E}(E+|s)\) when s∈{1,2} with y, and the fact that \(\mathbb{K}_{t}^{E}\) is a stochastic matrix; and (23) follows because \(P_{t}^{(E,E-)}(s'|0) =P_{t}^{(E,E+)}(s'|0)\) for all \(t \in T\) by Assumption 3.

Case 2: \(d^{i}_{t}=W\)

Similar to Case 1, substituting the value of \(V_{\pi^{i}_{t+1}}(\tau [b,W,o],x,y)\) from (18) into (4), we obtain

$$\begin{aligned} &V_{\pi^{i}_{t}}(b,x,y) \\&\quad= \sum_{ s \in S^{PO}, o \in\varTheta_{W} } b(s)K_{t}^{W}(o|s) \Bigg[r_{t}(s,W,o) \\ &\quad\quad{}+ \bigg( \frac{\sum_{s \in S^{PO}}b(s)K_{t}^{W}(o|s)\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x,y)}{\sum_{s \in S^{PO}}b(s)K_{t}^{W}(o|s)} \bigg) \Bigg] \\ &\quad =\sum_{ s \in S^{PO}, o \in\varTheta_{W} } b(s)K_{t}^{W}(o|s) r_{t}(s,W,o) \\ &\quad\quad{}+ \sum_{ o \in\varTheta_{W} } \left(\frac{\sum_{s \in S^{PO}}b(s)K_{t}^{W}(o|s)\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x,y)}{\sum_{s \in S^{PO}}b(s)K_{t}^{W}(o|s)} \right) \sum_{s \in S^{PO}}b(s)K_{t}^{W}(o|s) \end{aligned}$$
(24)
$$\begin{aligned} &\quad =\sum_{ s \in S^{PO}, o \in\varTheta_{W} } b(s)K_{t}^{W}(o|s) r_{t}(s,W,o) \\ &\quad\quad{}+ \sum_{ o \in\varTheta_{W} } \sum_{s \in S^{PO}}b(s)K_{t}^{W}(o|s)\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x,y) \end{aligned}$$
(25)
$$\begin{aligned} &\quad =\sum_{s \in S^{PO}}b(s) \sum_{o \in \varTheta_{W}}K_{t}^{W}(o|s)\Bigg[r_{t}(s,W,o) +\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|s) \alpha_{\pi^{i}_{t+1}}(s',x,y) \Bigg], \end{aligned}$$
(26)

where (24) follows from changing the order of summation and rearranging the terms, (25) follows from canceling the terms in the numerator and denominator, and (26) follows from simple algebra.  □
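The recursion established by Proposition 1 can be sketched numerically. The block below is a minimal illustration, not the author's implementation: it assumes time-invariant parameters, a single transition matrix shared by every (action, observation) pair, and made-up rewards. The names `alpha_vectors`, `value`, and every constant are hypothetical.

```python
# Hypothetical, time-invariant parameters for a 3-state sketch
# (0 = healthy, 1 = in situ, 2 = invasive). None of these numbers are from the paper.
STATES = (0, 1, 2)
P = [[0.97, 0.015, 0.005],   # P(s'|s), assumed shared by every (action, observation)
     [0.00, 0.900, 0.080],   # rows may sum to less than 1: death leaves S^PO
     [0.00, 0.000, 0.950]]
K_W_POS = [0.02, 0.05, 0.20]          # K^W(W+|s), assumed values
R_W = {"W-": [1.0, 1.0, 1.0],         # r(s, W, W-)
       "W+": [0.995, 1.0, 1.0]}       # r(s, W, W+)
R_E_NEG = [1.0, 1.0, 1.0]             # r(s, E, E-)
R_E_POS_0 = 0.99                      # r(0, E, E+): reward after a false positive
R_LUMP = {1: 30.0, 2: 20.0}           # R(s): lump sum after a true positive
R_TERMINAL = [10.0, 6.0, 3.0]         # r_T(s)


def expected_next(s, alpha_next):
    """Sum over s' of P(s'|s) * alpha_{t+1}(s')."""
    return sum(P[s][s2] * alpha_next[s2] for s2 in STATES)


def alpha_vectors(policy, x, y):
    """Backward recursion for the alpha-vector of a fixed policy.

    policy: sequence of 'E' / 'W' decisions for epochs t = 0..T-1.
    x: specificity K^E(E-|0); y: sensitivity K^E(E+|s) for s in {1, 2}.
    Returns the alpha-vector at epoch 0 as a list indexed by state.
    """
    alpha = list(R_TERMINAL)                      # basis: alpha_T(s) = r_T(s)
    for decision in reversed(policy):
        nxt = [expected_next(s, alpha) for s in STATES]
        if decision == "E":
            # State 0: the form reached in (23); states 1, 2: miss or detect.
            new = [x * (R_E_NEG[0] - R_E_POS_0) + R_E_POS_0 + nxt[0]]
            new += [(1 - y) * (R_E_NEG[s] + nxt[s]) + y * R_LUMP[s]
                    for s in (1, 2)]
        elif decision == "W":
            # The form reached in (26): average over the observations W-, W+.
            new = [(1 - K_W_POS[s]) * (R_W["W-"][s] + nxt[s])
                   + K_W_POS[s] * (R_W["W+"][s] + nxt[s]) for s in STATES]
        else:
            raise ValueError("decision must be 'E' or 'W'")
        alpha = new
    return alpha


def value(belief, policy, x, y):
    """V_pi(b, x, y) as the inner product of b and the alpha-vector."""
    return sum(b * a for b, a in zip(belief, alpha_vectors(policy, x, y)))
```

For instance, `value([0.98, 0.015, 0.005], ["E", "W", "E", "W"], 0.9, 0.8)` evaluates an alternating screen/wait policy under these assumed numbers. Because the value is an inner product with the belief, it is linear in b for any fixed policy.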

Proof of Lemma 1

We prove the general inequality case; the proof for the strict inequality in Part (a) is very similar, with the only exception that the basis in the induction changes. By Proposition 1, \(V_{\pi^{i}_{t}}(b,x_{2},y_{2}) \geq V_{\pi ^{i}_{t}}(b,x_{1},y_{1})\) if \(\sum_{s \in S^{PO}}b(s)\alpha_{\pi ^{i}_{t}}(s,x_{2},y_{2}) \geq\sum_{s \in S^{PO}}b(s)\alpha_{\pi ^{i}_{t}}(s,x_{1},y_{1})\). Therefore, it is sufficient to show that \(\alpha _{\pi^{i}_{t}}(s,x_{2},y_{2})\geq\alpha_{\pi^{i}_{t}}(s,x_{1},y_{1})\) for all \(s \in S^{PO}\) whenever \(x_{2} \geq x_{1}\) and \(y_{2} \geq y_{1}\). We prove this by induction as follows. Basis: \(\alpha_{\pi^{i}_{T}}(s,x_{2},y_{2})= r_{T}(s) \geq\alpha_{\pi^{i}_{T}}(s,x_{1},y_{1}) = r_{T}(s)\). Suppose the assertion holds for \(\alpha_{\pi^{i}_{t+1}}\). Then, we need to show this for the induction step. Note that when \(d^{i}_{t} = W\), this follows directly from Proposition 1 because of the induction hypothesis. When \(d^{i}_{t} = E\) and s=0, the assertion can be proven as follows:

$$\begin{aligned} \alpha_{\pi^{i}_t}(0,x_2,y_2)&=r_{t}(0,E,E+) + \sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0) \alpha_{\pi^{i}_{t+1}}(s',x_2,y_2) \\ &\quad{} + x_2 \Bigg[r_{t}(0,E,E-)-r_{t}(0,E,E+)\Bigg] \end{aligned}$$
(27)
$$\begin{aligned} &\geq r_{t}(0,E,E+) + \sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0) \alpha_{\pi^{i}_{t+1}}(s',x_1,y_1) \\ &\quad{}+ x_1 \Bigg[r_{t}(0,E,E-)-r_{t}(0,E,E+)\Bigg] \\ &=\alpha_{\pi^{i}_t}(0,x_1,y_1) \end{aligned}$$
(28)

where (28) follows because of the induction hypothesis and from the assumption that \(r_{t}(0,E,E-)-r_{t}(0,E,E+) \geq 0\).

On the other hand, when \(d^{i}_{t} = E\) and s∈{1,2}, the assertion can be proven as follows:

$$\begin{aligned} &\alpha_{\pi^{i}_t}(s,x_2,y_2) \\ &\quad=\big(1-y_2\big) \Bigg[r_{t}(s,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x_2,y_2)\Bigg] +y_2 R_{t}(s) \\ &\quad=\big(1-y_1\big) \Bigg[r_{t}(s,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x_2,y_2)\Bigg] + y_1 R_{t}(s) \\ &\quad\quad{}+ \big(y_2-y_1\big) \Bigg[ R_{t}(s)- \bigg(r_{t}(s,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x_2,y_2)\bigg)\Bigg] \end{aligned}$$
(29)
$$\begin{aligned} &\quad\geq\alpha_{\pi^{i}_t}(s,x_1,y_1) \\ &\quad\quad{}+\big(y_2-y_1\big) \Bigg[ R_{t}(s)- \bigg(r_{t}(s,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x_2,y_2)\bigg)\Bigg] \end{aligned}$$
(30)
$$\begin{aligned} &\quad\geq\alpha_{\pi^{i}_t}(s,x_1,y_1), \end{aligned}$$
(31)

where (29) follows from rearranging the terms, (30) follows from the induction hypothesis, and (31) follows from (9). □

Proof of Lemma 2

This proof follows directly from the proof of Lemma 1, where we show that \(\alpha_{\pi^{i}_{t}}(s,x,y)\) is nondecreasing in x and y for all \(s \in S^{PO}\), \(t \in T\), and \(\pi^{i}_{t} \in\varPi _{t}\). □
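The monotonicity of the α-vectors can be checked numerically on a small grid. The sketch below is only an illustration under stated assumptions: all parameters are invented, a single transition matrix is shared by every (action, observation) pair, and the reward while waiting is taken to be independent of the observation, so the sum over observations in the W case collapses. The lump sums are chosen so that condition (9) and \(r_{t}(0,E,E-) \geq r_{t}(0,E,E+)\) hold. `alpha_vectors` and `is_nondecreasing` are hypothetical names.

```python
from itertools import product

# Hypothetical parameters (not from the paper); states 0 = healthy, 1, 2 = cancer.
STATES = (0, 1, 2)
P = [[0.97, 0.015, 0.005],
     [0.00, 0.900, 0.080],
     [0.00, 0.000, 0.950]]
R_CONT = 1.0                  # r(s, E, E-) and r(s, W, o), assumed equal
R_FALSE_POS = 0.99            # r(0, E, E+), no larger than r(0, E, E-)
R_LUMP = {1: 30.0, 2: 20.0}   # R(s), large enough for condition (9) to hold
R_TERMINAL = [10.0, 6.0, 3.0]


def alpha_vectors(policy, x, y):
    """Alpha-vector at the first epoch of a fixed policy of 'E'/'W' decisions."""
    alpha = list(R_TERMINAL)
    for decision in reversed(policy):
        nxt = [sum(P[s][s2] * alpha[s2] for s2 in STATES) for s in STATES]
        if decision == "E":
            new = [x * (R_CONT - R_FALSE_POS) + R_FALSE_POS + nxt[0]]
            new += [(1 - y) * (R_CONT + nxt[s]) + y * R_LUMP[s] for s in (1, 2)]
        else:  # 'W': observation-independent reward, so sum_o K^W(o|s) = 1
            new = [R_CONT + nxt[s] for s in STATES]
        alpha = new
    return alpha


def is_nondecreasing(policy, grid, tol=1e-12):
    """Check alpha(s, x2, y2) >= alpha(s, x1, y1) whenever x2 >= x1, y2 >= y1."""
    points = list(product(grid, grid))
    for (x1, y1), (x2, y2) in product(points, repeat=2):
        if x2 >= x1 and y2 >= y1:
            low = alpha_vectors(policy, x1, y1)
            high = alpha_vectors(policy, x2, y2)
            if any(h < l - tol for l, h in zip(low, high)):
                return False
    return True
```

Calling `is_nondecreasing(p, [0.0, 0.25, 0.5, 0.75, 1.0])` for every `p` in `product("EW", repeat=4)` exercises all sixteen four-epoch policies. A grid check of this kind illustrates the lemma but does not replace the induction argument.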

Proof of Proposition 2

We would like to show that the feasible region of the NLP given in (5) is nonconvex in y, i.e., \(V_{\pi^{1}_{t}}(b,x,y) \leq V_{\pi ^{2}_{t}}(b,x,y)\) for all \(y \in\{\underline{y}, \overline{y}\}\), but there exists \(\dot{y} \in[\underline{y},\overline{y}]\) such that \(V_{\pi ^{1}_{t}}(b,x,\dot{y}) \geq V_{\pi^{2}_{t}}(b,x,\dot{y})\). Let \(\pi ^{i}_{t+1}\) be any fixed policy (i.e., one that does not depend on b) in \(\varPi_{t+1}\), and \(\varPi_{t}=\{\pi^{1}_{t}, \pi^{2}_{t}\}\) where \(\pi^{1}_{t}=\{E, \pi^{i}_{t+1}\}\), and \(\pi^{2}_{t}=\{W, \pi^{i}_{t+1}\}\). That is, \(\pi ^{1}_{t}\) and \(\pi^{2}_{t}\) are identical except for the action taken at time t. From Proposition 1, we know that \(V_{\pi ^{i}_{t}}(b,x,y) =\sum_{s \in S^{PO}}b(s)\alpha_{\pi^{i}_{t}}(s,x,y)\), hence it is sufficient to show that \(\sum_{s \in S^{PO}}b(s) (\alpha _{\pi^{1}_{t}}(s,x,y)- \alpha_{\pi^{2}_{t}}(s,x,y) )\) is negative for all \(y \in\{\underline{y}, \overline{y}\}\) but positive for some \(\dot {y} \in[\underline{y}, \overline{y}]\). Let \(\alpha_{\pi ^{i}_{t}}(x,y)=\big[\alpha_{\pi^{i}_{t}}(s,x,y)\big]\). From Proposition 1, we have

$$\begin{aligned} & \alpha_{\pi^{1}_t}(x,y)-\alpha_{\pi^{2}_t}(x,y) \\ &= \left[ \begin{array}{c} x \bigg(r_{t}(0,E,E-)-r_{t}(0,E,E+)\bigg)+ \bigg(r_{t}(0,E,E+)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0) \alpha_{\pi^{i}_{t+1}}(s',x,y)\bigg)\\ \big(1-y\big) \bigg(r_{t}(1,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|1)\alpha_{\pi^{i}_{t+1}}(s',x,y)\bigg) +yR_{t}(1)\\ \big(1-y\big) \bigg(r_{t}(2,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|2)\alpha_{\pi^{i}_{t+1}}(s',x,y)\bigg) +yR_{t}(2) \end{array} \right] \\ &\quad{}-\left[ \begin{array}{c} \sum_{o \in\varTheta_{W}}K_{t}^{W}(o|0)\Bigg(r_{t}(0,W,o)+\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|0)\alpha_{\pi^{i}_{t+1}}(s',x,y)\Bigg)\\ \sum_{o \in\varTheta_{W}}K_{t}^{W}(o|1)\Bigg(r_{t}(1,W,o)+\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|1)\alpha_{\pi^{i}_{t+1}}(s',x,y)\Bigg)\\ \sum_{o \in\varTheta_{W}}K_{t}^{W}(o|2)\Bigg(r_{t}(2,W,o)+\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|2)\alpha_{\pi^{i}_{t+1}}(s',x,y)\Bigg) \end{array} \right] \\ &= \left[ \begin{array}{c} x \bigg(r_{t}(0,E,E-)-r_{t}(0,E,E+)\bigg)+ \bigg(r_{t}(0,E,E+)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0) \alpha_{\pi^{i}_{t+1}}(s',x,y)\bigg)\\ \big(1-y\big) \bigg(r_{t}(1,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|1)\alpha_{\pi^{i}_{t+1}}(s',x,y)\bigg) +yR_{t}(1)\\ \big(1-y\big) \bigg(r_{t}(2,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|2)\alpha_{\pi^{i}_{t+1}}(s',x,y)\bigg) +yR_{t}(2) \end{array} \right] \\ &{}-\left[ \begin{array}{c} r_{t}(0,W,W-)+\sum_{s' \in S^{PO}}P_{t}^{(W,W-)}(s'|0)\alpha_{\pi ^{i}_{t+1}}(s',x,y)\\ r_{t}(1,W,W-)+\sum_{s' \in S^{PO}}P_{t}^{(W,W-)}(s'|1)\alpha_{\pi ^{i}_{t+1}}(s',x,y)\\ r_{t}(2,W,W-)+\sum_{s' \in S^{PO}}P_{t}^{(W,W-)}(s'|2)\alpha_{\pi ^{i}_{t+1}}(s',x,y) \end{array} \right] = \end{aligned}$$
(32)
$$\begin{aligned} & \left[ \begin{array}{@{\!\!}c@{\!\!}} x (r_{t}(0,E,E-)-r_{t}(0,E,E+))+ (r_{t}(0,E,E+) - r_{t}(0,W,W-) )\\ (1-y) (r_{t}(1,E,E-) - r_{t}(1,W,W-) ) +y (R_{t}(1) -r_{t}(1,W,W-)- \sum_{s' \in S^{PO}}P_{t}^{(W,W-)}(s'|1)\alpha_{\pi^{i}_{t+1}}(s',x,y))\\ (1-y) (r_{t}(2,E,E-) - r_{t}(2,W,W-) ) +y (R_{t}(2) - r_{t}(2,W,W-)-\sum_{s' \in S^{PO}}P_{t}^{(W,W-)}(s'|2)\alpha_{\pi^{i}_{t+1}}(s',x,y)) \end{array} \right] \end{aligned}$$
(33)
$$\begin{aligned} &= \left[ \begin{array}{c} (x-1) du^{D}\\ y \ell_{t}(1, x, y)\\ y \ell_{t}(2, x, y) \end{array} \right] \end{aligned}$$
(34)

where (32) and (33) follow from Assumptions 2, 3, and the fact that both \(\pi^{1}_{t}\) and \(\pi ^{2}_{t}\) are fixed policies and do not depend on b, and (34) follows from Assumption 2 and Definition 1. Note that \(du^{D} \geq 0\), \((x-1)du^{D} \leq 0\) because \(0 \leq x \leq 1\), and \(\ell_{t}(1,x,y)\) is nonincreasing in y by Lemma 2. Therefore, the values of \(du^{D}\), \(y\ell_{t}(1,x,y)\), and \(y\ell_{t}(2,x,y)\) could be such that

$$\sum_{s \in S^{PO}}b(s)\bigg(\alpha_{\pi^{1}_t}(s,x,y)- \alpha_{\pi ^{2}_t}(s,x,y) \bigg)=b\left[ \begin{array}{c} (x-1) du^{D}\\ y \ell_{t}(1, x, y)\\ y \ell_{t}(2, x, y) \end{array} \right]$$

is negative for all \(y \in\{\underline{y}, \overline{y}\}\) but positive for some \(\dot{y} \in[\underline{y}, \overline{y}]\), which makes the problem nonconvex. □
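A small worked illustration of how such a sign pattern can arise, using purely hypothetical numbers rather than values from the model: take \(\underline{y}=0\), \(\overline{y}=0.5\), \(b(2)=0\), and suppose that \(b(0)(1-x)du^{D}=0.01\) and \(b(1)\ell_{t}(1,x,y)=0.2(1-2y)\), which is nonincreasing in y. Writing \(g(y)\) for the difference \(\sum_{s \in S^{PO}}b(s)\big(\alpha_{\pi^{1}_{t}}(s,x,y)-\alpha_{\pi^{2}_{t}}(s,x,y)\big)\), we get

$$\begin{aligned} g(y) &= -0.01 + 0.2\,y\,(1-2y), \\ g(0) &= g(0.5) = -0.01 < 0, \qquad g(0.25) = -0.01 + 0.2 \cdot 0.25 \cdot 0.5 = 0.015 > 0. \end{aligned}$$

Under these assumed values, the set of y for which \(\pi^{2}_{t}\) is at least as good as \(\pi^{1}_{t}\) contains both endpoints but not the interior point 0.25, so it is not an interval.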

Proof of Lemma 3

We prove this by induction. For the basis step, it is easy to see that \(\alpha_{\pi^{i}_{T}}(s,\overline{x},y)-\alpha_{\pi ^{i}_{T}}(s,\underline{x},y)=0\), as \(\alpha_{\pi^{i}_{T}}(s,x,y)=r_{T}(s)\) for any s, x, and y. Assume that the assertion holds for the decision epoch t+1. At time t, there are two possible cases: the decision is either W or E.

Case 1: \(d^{i}_{t}=W\)

$$\begin{aligned} &\alpha_{\pi^{i}_t}(s,\overline{x},y)-\alpha_{\pi^{i}_t}(s,\underline {x},y) \\ &\quad=\sum_{o \in\varTheta_{W}}K_{t}^{W}(o|s)\Bigg[r_{t}(s,W,o)+\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',\overline{x},y)\Bigg] \\ &\quad\quad{}-\sum_{o \in\varTheta_{W}}K_{t}^{W}(o|s)\Bigg[r_{t}(s,W,o)+\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',\underline {x},y)\Bigg] \\ &\quad=\sum_{o \in\varTheta_{W}}K_{t}^{W}(o|s)\sum_{s' \in\{1,2\} }P_{t}^{(W,o)}(s'|s)\Bigg[\alpha_{\pi^{i}_{t+1}}(s',\overline {x},y)-\alpha_{\pi^{i}_{t+1}}(s',\underline{x},y)\Bigg] \end{aligned}$$
(35)
$$\begin{aligned} &\quad=0, \end{aligned}$$
(36)

where (35) follows by canceling out \(\sum_{o \in \varTheta_{W}}K_{t}^{W}(o|s)r_{t}(s,W,o)\) and from Assumption 4, and (36) follows from the induction hypothesis.

Case 2: \(d^{i}_{t}=E\)

$$\begin{aligned} &\alpha_{\pi^{i}_t}(s,\overline{x},y)-\alpha_{\pi^{i}_t}(s,\underline {x},y) \\ &\quad=\big(1-y\big) \Bigg[r_{t}(s,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',\overline {x},y)\Bigg] +yR_{t}(s) \\ &\quad\quad{}-\big(1-y\big) \Bigg[r_{t}(s,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',\underline {x},y)\Bigg]-yR_{t}(s) \\ &\quad=\big(1-y\big)\Bigg[ \sum_{s' \in \{1,2\}}P_{t}^{(E,E-)}(s'|s) \bigg( \alpha_{\pi^{i}_{t+1}}(s',\overline {x},y) - \alpha_{\pi^{i}_{t+1}}(s',\underline{x},y) \bigg)\Bigg] \end{aligned}$$
(37)
$$\begin{aligned} &\quad=0, \end{aligned}$$
(38)

where (37) follows from Assumption 4 and simple algebra, and (38) follows from the induction hypothesis.  □
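The role of Assumption 4 in this argument can be illustrated numerically. The sketch below assumes, consistent with the restriction of the sums to \(s' \in \{1,2\}\) in (35) and (37), that Assumption 4 rules out transitions from the cancer states back to state 0. All numbers, the shared transition matrix, the observation-independent reward under W, and the names `alpha_vectors` and `cancer_state_gap` are hypothetical.

```python
# Hypothetical parameters (not from the paper); states 0 = healthy, 1, 2 = cancer.
STATES = (0, 1, 2)
P_ASSUMPTION_4 = [[0.97, 0.015, 0.005],
                  [0.00, 0.900, 0.080],   # no return from cancer to state 0
                  [0.00, 0.000, 0.950]]
P_VIOLATED = [[0.97, 0.015, 0.005],
              [0.05, 0.850, 0.080],       # state 1 can regress to state 0
              [0.00, 0.000, 0.950]]
R_CONT = 1.0                  # r(s, E, E-) and r(s, W, o), assumed equal
R_FALSE_POS = 0.99            # r(0, E, E+)
R_LUMP = {1: 30.0, 2: 20.0}   # R(s)
R_TERMINAL = [10.0, 6.0, 3.0]


def alpha_vectors(policy, x, y, P):
    """Alpha-vector at the first epoch of a fixed policy of 'E'/'W' decisions."""
    alpha = list(R_TERMINAL)
    for decision in reversed(policy):
        nxt = [sum(P[s][s2] * alpha[s2] for s2 in STATES) for s in STATES]
        if decision == "E":
            new = [x * (R_CONT - R_FALSE_POS) + R_FALSE_POS + nxt[0]]
            new += [(1 - y) * (R_CONT + nxt[s]) + y * R_LUMP[s] for s in (1, 2)]
        else:  # 'W' with observation-independent reward
            new = [R_CONT + nxt[s] for s in STATES]
        alpha = new
    return alpha


def cancer_state_gap(policy, y, P, x_low=0.5, x_high=1.0):
    """alpha(s, x_high, y) - alpha(s, x_low, y) for the cancer states s = 1, 2."""
    low = alpha_vectors(policy, x_low, y, P)
    high = alpha_vectors(policy, x_high, y, P)
    return [high[s] - low[s] for s in (1, 2)]
```

With `P_ASSUMPTION_4`, `cancer_state_gap(("E", "E", "E"), 0.8, P_ASSUMPTION_4)` returns zeros, matching the lemma. With `P_VIOLATED`, the gap for state 1 becomes positive because specificity now enters through the value of returning to state 0.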

Proof of Proposition 3

Let \(0 \leq\underline{x} \leq\overline{x} \leq1\). We want to show that:

$$\begin{aligned} &\bigg(V_{\pi^{1}_t}(b,\overline{x},y) - V_{\pi^{2}_t}(b,\overline {x},y)\bigg)- \bigg(V_{\pi^{1}_t}(b,\underline{x},y) - V_{\pi ^{2}_t}(b,\underline{x},y)\bigg) \\ &\quad=\bigg(V_{\pi^{1}_t}(b,\overline{x},y)-V_{\pi^{1}_t}(b,\underline {x},y)\bigg)- \bigg(V_{\pi^{2}_t}(b,\overline{x},y)- V_{\pi ^{2}_t}(b,\underline{x},y)\bigg) \\ &\quad\geq0 . \end{aligned}$$
(39)

Note that by Lemma 3 for any \(\pi^{i}_{t} \in\varPi_{t}\), \(b \in B\), and \(\overline{x}, \underline{x} \in[0,1]\):

$$\begin{aligned} V_{\pi^{i}_t}(b,\overline{x},y)-V_{\pi^{i}_t}(b,\underline{x},y) &= \sum _{s \in S^{PO}} b(s) \bigg(\alpha_{\pi^{i}_{t}}(s,\overline{x},y) - \alpha_{\pi^{i}_{t}}(s,\underline{x},y)\bigg)\\ &=b(0)\bigg(\alpha_{\pi^{i}_{t}}(0,\overline{x},y) - \alpha_{\pi ^{i}_{t}}(0,\underline{x},y)\bigg). \end{aligned}$$

Then, we can rewrite (39) as follows:

$$\begin{aligned} &\bigg(V_{\pi^{1}_t}(b,\overline{x},y) - V_{\pi^{2}_t}(b,\overline {x},y)\bigg)- \bigg(V_{\pi^{1}_t}(b,\underline{x},y) - V_{\pi ^{2}_t}(b,\underline{x},y)\bigg) \end{aligned}$$
(40)
$$\begin{aligned} &\quad=b(0)\Bigg[\bigg(\alpha_{\pi^{1}_{t}}(0,\overline{x},y) - \alpha_{\pi ^{1}_{t}}(0,\underline{x},y)\bigg) - \bigg(\alpha_{\pi ^{2}_{t}}(0,\overline{x},y) - \alpha_{\pi^{2}_{t}}(0,\underline {x},y)\bigg)\Bigg] \end{aligned}$$
(41)

Now, we will express \(\alpha_{\pi^{i}_{t}}(0,\overline{x},y) - \alpha _{\pi^{i}_{t}}(0,\underline{x},y)\) for a given policy \(\pi^{i}_{t}=\{ d^{i}_{t}, d^{i}_{t+1}, \ldots, d^{i}_{T}\}\) in terms of the input parameters (i.e., rewards, transition probabilities, and observation probabilities).

Case 1: \(d^{i}_{t}=E\)

From Proposition 1, we have

$$\begin{aligned} \alpha_{\pi^{i}_t}(0,\overline{x},y)-\alpha_{\pi^{i}_t}(0,\underline {x},y)&=\overline{x} \Bigg[r_{t}(0,E,E-)-r_{t}(0,E,E+) \Bigg] \\ &\quad{}+\Bigg[r_{t}(0,E,E+)+\sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0) \alpha_{\pi^{i}_{t+1}}(s',\overline {x},y)\Bigg] \\ &\quad{}-\underline{x} \Bigg[r_{t}(0,E,E-)-r_{t}(0,E,E+)\Bigg] \\ &\quad{}-\Bigg[r_{t}(0,E,E+)+\sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0) \alpha_{\pi^{i}_{t+1}}(s',\underline {x},y)\Bigg] \\ & = \bigg(\overline{x}-\underline{x}\bigg) \Bigg[r_{t}(0,E,E-)-r_{t}(0,E,E+)\Bigg] \\ &\quad{}+ \sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0) \bigg(\alpha_{\pi^{i}_{t+1}}(s',\overline {x},y) -\alpha_{\pi^{i}_{t+1}}(s',\underline{x},y)\bigg) \end{aligned}$$
(42)
$$\begin{aligned} & = \bigg(\overline{x}-\underline{x}\bigg) \Bigg[r_{t}(0,E,E-)-r_{t}(0,E,E+)\Bigg] \\ &\quad{}+ P_{t}^{(E,E+)}(0|0) \bigg(\alpha_{\pi^{i}_{t+1}}(0,\overline{x},y) -\alpha_{\pi^{i}_{t+1}}(0,\underline{x},y)\bigg), \end{aligned}$$
(43)

where (42) follows from rearranging the terms and (43) follows from Lemma 3.

Case 2: \(d^{i}_{t}=W\)

Again, from Proposition 1, we have

$$\begin{aligned} &\alpha_{\pi^{i}_t}(0,\overline{x},y)-\alpha_{\pi^{i}_t}(0,\underline {x},y) \\ &\quad=\sum_{o \in\varTheta_{W}}K_{t}^{W}(o|0)\Bigg[r_{t}(0,W,o)+\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|0)\alpha_{\pi^{i}_{t+1}}(s',\overline{x},y)\Bigg] \\ &\quad\quad{}-\sum_{o \in\varTheta_{W}}K_{t}^{W}(o|0)\Bigg[r_{t}(0,W,o)+\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|0)\alpha_{\pi^{i}_{t+1}}(s',\underline {x},y)\Bigg] \\ &\quad=\sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0)\Bigg[\alpha_{\pi ^{i}_{t+1}}(s',\overline{x},y)-\alpha_{\pi^{i}_{t+1}}(s',\underline {x},y)\Bigg] \end{aligned}$$
(44)
$$\begin{aligned} &\quad=P_{t}^{(E,E+)}(0|0)\Bigg[\alpha_{\pi^{i}_{t+1}}(0,\overline {x},y)-\alpha_{\pi^{i}_{t+1}}(0,\underline{x},y)\Bigg] \end{aligned}$$
(45)

where (44) follows from the assumption that \(P_{t}^{(W,W-)}(s'|0)= P_{t}^{(W,W+)}(s'|0)= P_{t}^{(E,E+)}(s'|0)\) and algebraic simplification and (45) follows from Lemma 3.

Then, combining the results of Case 1 and Case 2, we get

$$\begin{aligned} \alpha_{\pi^{i}_t}(0,\overline{x},y)-\alpha_{\pi^{i}_t}(0,\underline {x},y) = & \left\{ \begin{array}{ll} \big(\overline{x}-\underline{x}\big) \big[r_{t}(0,E,E-)-r_{t}(0,E,E+)\big] + P_{t}^{(E,E+)}(0|0) \big(\alpha_{\pi^{i}_{t+1}}(0,\overline{x},y) -\alpha_{\pi^{i}_{t+1}}(0,\underline{x},y)\big) & \textrm{if } d^{i}_t=E,\\ P_{t}^{(E,E+)}(0|0)\big[\alpha_{\pi^{i}_{t+1}}(0,\overline{x},y)-\alpha _{\pi^{i}_{t+1}}(0,\underline{x},y)\big] & \textrm{if } d^{i}_{t}=W. \end{array} \right. \end{aligned}$$

Expanding this recursive equation, we obtain the following:
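As a sketch of this expansion, unrolling the recursion from \(t\) to \(T\) gives the form below. It assumes that the terminal difference \(\alpha_{\pi^{i}_{T+1}}(0,\overline{x},y)-\alpha_{\pi^{i}_{T+1}}(0,\underline{x},y)\) is zero, which is an assumption made for this illustration. The indicator \(\mathbb{1}\{\cdot\}\) and the indices \(\tau\) and \(k\) are notation introduced here, and the empty product (for \(\tau=t\)) is taken to equal 1:

$$\begin{aligned} \alpha_{\pi^{i}_t}(0,\overline{x},y)-\alpha_{\pi^{i}_t}(0,\underline{x},y) = \big(\overline{x}-\underline{x}\big) \sum_{\tau=t}^{T} \mathbb{1}\{d^{i}_{\tau}=E\} \big[r_{\tau}(0,E,E-)-r_{\tau}(0,E,E+)\big] \prod_{k=t}^{\tau-1} P_{k}^{(E,E+)}(0|0). \end{aligned}$$

Each period in which the policy screens contributes one reward-difference term, weighted by the probability of remaining in state 0 up to that period. Waiting periods contribute only through the product.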

Then,

$$\begin{aligned} &\bigg(V_{\pi^{1}_t}(b,\overline{x},y) - V_{\pi^{2}_t}(b,\overline {x},y)\bigg)- \bigg(V_{\pi^{1}_t}(b,\underline{x},y) - V_{\pi ^{2}_t}(b,\underline{x},y)\bigg) \end{aligned}$$
(46)
$$\begin{aligned} &\quad=b(0)\Bigg[\bigg(\alpha_{\pi^{1}_{t}}(0,\overline{x},y) - \alpha_{\pi ^{1}_{t}}(0,\underline{x},y)\bigg) - \bigg(\alpha_{\pi ^{2}_{t}}(0,\overline{x},y) - \alpha_{\pi^{2}_{t}}(0,\underline {x},y)\bigg)\Bigg] \end{aligned}$$
(47)
(48)
$$\begin{aligned} &\quad \geq0 \end{aligned}$$
(49)

where (47) follows from (41), (48) follows from the recursive expansion, and (49) follows from the fact that \(\overline{x} \geq\underline{x}\), Assumption 1, and Definition 2.  □
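The monotonicity argument can be checked numerically with a minimal Python sketch of the Case 1 and Case 2 recursion. The function name `alpha_gap`, the argument names, and all numerical values are hypothetical and chosen only for illustration. The sketch assumes a zero terminal difference, a nonnegative reward gap \(r_{t}(0,E,E-)-r_{t}(0,E,E+)\), and that the "more aggressive" policy screens in every period in which the other policy screens. These are working assumptions about Assumption 1 and Definition 2, not statements of them.

```python
from typing import Sequence


def alpha_gap(decisions: Sequence[str],
              reward_gap: Sequence[float],
              p_stay: Sequence[float],
              dx: float) -> float:
    """Backward recursion for alpha(0, x_hi, y) - alpha(0, x_lo, y).

    decisions[k]  : 'E' (exam) or 'W' (wait) in period t + k
    reward_gap[k] : r(0,E,E-) - r(0,E,E+) in period t + k (hypothetical values)
    p_stay[k]     : P^{(E,E+)}(0|0) in period t + k (hypothetical values)
    dx            : x_hi - x_lo
    """
    gap = 0.0  # assumed terminal difference (an assumption of this sketch)
    # Walk backward from the last period T to the first period t.
    for d, g, p in zip(reversed(decisions), reversed(reward_gap), reversed(p_stay)):
        # Only an exam period adds the (x_hi - x_lo) * reward-gap term.
        screening_term = dx * g if d == "E" else 0.0
        gap = screening_term + p * gap
    return gap


# Hypothetical inputs for a four-period horizon.
reward_gap = [0.01, 0.01, 0.01, 0.01]
p_stay = [0.99, 0.99, 0.99, 0.99]
dx = 0.1

annual = ["E", "E", "E", "E"]      # screens in every period
biennial = ["E", "W", "E", "W"]    # screens in a subset of those periods

gap_annual = alpha_gap(annual, reward_gap, p_stay, dx)
gap_biennial = alpha_gap(biennial, reward_gap, p_stay, dx)

# Bracketed term of (47), up to the factor b(0) >= 0.
print(gap_annual - gap_biennial)
```

With nonnegative reward gaps and \(\overline{x} \geq\underline{x}\), every extra exam period adds a nonnegative term, so the printed difference is nonnegative for these inputs, in line with (49).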

Proof of Corollary 1

Let \(\pi^{*}_{t}\) be the optimal policy at \((x,y)\) and let \(\overline{x} \geq x \geq0\). Let \(\pi^{i}_{t}\) be an arbitrary policy from the policy set \(\varPi_{t}\) such that \(\pi^{i}_{t}\) is less aggressive than \(\pi^{*}_{t}\). Assume, for contradiction, that \(\pi^{i}_{t}\) is uniquely optimal at \((\overline{x},y)\). Then \(V_{\pi^{i}_{t}}(b,\overline{x},y) > V_{\pi^{*}_{t}}(b,\overline{x},y)\). However, from Proposition 3, we have

$$\begin{aligned} V_{\pi^{*}_t}(b,\overline{x},y)-V_{\pi^{i}_t}(b,\overline{x},y) & \geq V_{\pi^{*}_t}(b,{x},y) - V_{\pi^{i}_t}(b,{x},y) \\ &\geq0, \end{aligned}$$
(50)

where (50) follows because \(\pi^{*}_{t}\) is the optimal policy at \((x,y)\). Then, \(V_{\pi^{i}_{t}}(b,\overline{x},y) \leq V_{\pi^{*}_{t}}(b,\overline{x},y)\), which contradicts our assumption and completes the proof. □

Proof of Theorem 1

Let \(\varPi_{t}=\{ \pi^{i}_{t} | i \in I=\underline{I} \cup\overline{I}\}\) such that \(\underline{I}\) indexes policies that are less aggressive than \(\pi^{*}_{t}\) and \(\overline{I}\) indexes policies that are more aggressive than \(\pi^{*}_{t}\). Also, let \(0 \leq x_{1} \leq x_{2} \leq 1\) such that \(V_{\pi^{*}_{t}}(b,x_{1},y) \geq V_{\pi^{i}_{t}}(b,x_{1},y)\) and \(V_{\pi ^{*}_{t}}(b,x_{2},y) \geq V_{\pi^{i}_{t}}(b,x_{2},y)\) for all \(\pi^{i}_{t} \in \varPi_{t}\). Then for any \(x\) such that \(x_{1} \leq x \leq x_{2}\), we want to show that \(V_{\pi^{*}_{t}}(b,x,y) \geq V_{\pi^{i}_{t}}(b,x,y)\) for all \(\pi ^{i}_{t} \in\varPi_{t}\). Suppose we arbitrarily select a policy \(\pi^{i}_{t} \neq\pi^{*}_{t}\) from \(\varPi_{t}\). If \(i \in\underline{I}\) (i.e., \(\pi^{i}_{t}\) is less aggressive than \(\pi^{*}_{t}\)), then from Proposition 3, we have

$$\begin{aligned} V_{\pi^{*}_t}(b,{x},y) - V_{\pi^{i}_t}(b,{x},y) &\geq V_{\pi ^{*}_t}(b,x_1,y) - V_{\pi^{i}_t}(b,x_1,y) \\ &\geq0, \end{aligned}$$
(51)

where (51) follows because \(\pi^{*}_{t}\) is feasible at \((x_{1},y)\).

On the other hand, if \(i \in\overline{I}\) (i.e., \(\pi^{i}_{t}\) is more aggressive than \(\pi^{*}_{t}\)), then again from Proposition 3, we have

$$\begin{aligned} &V_{\pi^{i}_t}(b,{x},y) - V_{\pi^{*}_t}(b,{x},y)\leq V_{\pi ^{i}_t}(b,x_2,y)-V_{\pi^{*}_t}(b,x_2,y)\leq0, \end{aligned}$$
(52)

which again follows because \(\pi^{*}_{t}\) is feasible at \((x_{2},y)\). Then, \(V_{\pi^{*}_{t}}(b,{x},y) - V_{\pi^{i}_{t}}(b,{x},y) \geq0\) for any \(x_{1} \leq x \leq x_{2}\) and \(\pi^{i}_{t} \in\varPi_{t}\), which completes the proof. □

About this article

Cite this article

Ayer, T. Inverse optimization for assessing emerging technologies in breast cancer screening. Ann Oper Res 230, 57–85 (2015). https://doi.org/10.1007/s10479-013-1520-3
