Inverse optimization for assessing emerging technologies in breast cancer screening

Ayer, Turgay

doi:10.1007/s10479-013-1520-3

Published: 15 January 2014

Annals of Operations Research, volume 230, pages 57–85 (2015)

Abstract

Identifying the optimal screening strategies for breast cancer, the second leading cause of female cancer deaths in the US, is a major societal problem creating much controversy. The optimal screening strategies significantly depend on the sensitivity and specificity of the screening modality used. While the current state-of-the-art screening technology is mammography, its sensitivity or specificity may increase over time, or mammography may be replaced by another technology such as tomosynthesis in the near future. The purpose of this study is to identify the optimal use of the next generation of breast cancer screening modalities, whose sensitivity and specificity in clinical practice are either yet unknown or keep improving over time. Contrary to the prior literature that focuses on finding the optimal screening policy for given sensitivity and specificity values, we take an inverse optimization approach and focus on finding the range of sensitivity and specificity values, for which a given screening policy is optimal. To replicate breast cancer progression in the US population under various screening policies, we develop a parametric Partially Observable Markov Chain (POMC) model, which accounts for unobservable and age-specific disease progression, age-specific mortality, and the possibility of detecting cancer without a screening exam (either via self-detection or a clinical breast exam). We then formulate a nonlinear program (NLP) to identify the range of sensitivity and specificity values that optimize a particular screening policy. We show that this NLP is nonconvex for some parameter values, and hence difficult to solve. We prove several structural properties of the model, and by exploiting these properties, we propose a complete solution algorithm for this problem. We use real data in our numerical analysis and show that with the current technology, biennial breast cancer screening is slightly better than annual screening for the average-risk population. 
We also find that an improvement only in sensitivity (but not in specificity) will not change the current optimal policy. Furthermore, we characterize the lost potential quality-adjusted life years (QALYs) due to suboptimal practice, and show that biennial screening is more robust than annual screening in the sense that it results in fewer lost QALYs due to choosing a suboptimal screening policy. Given that the design of multicenter clinical trials may be prohibitively expensive and lengthy, our findings may be especially valuable to policymakers in deciding about the optimal use of an emerging breast cancer screening modality, and adapting a new technology in different settings.

Acknowledgements

The author thanks Jagpreet Chhatwal, Qiushi Chen, Chelsea C. White III, Jeff Pavelka, Anthony Bonifonte, Sait Tunc, and the anonymous reviewers for their suggestions and insights, which have improved this manuscript.

Author information

Authors and Affiliations

H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA
Turgay Ayer

Corresponding author

Correspondence to Turgay Ayer.

Appendix: Proofs of the Analytical Results

Proof of Proposition 1

We prove this by induction on t. Basis: $V_{\pi^{i}_{T}}(b,x,y) = \sum_{s \in S^{PO}}b(s) r_{T}(s)$. Induction hypothesis: suppose that $V_{\pi^{i}_{t+1}}(b,x,y) = \sum_{s \in S^{PO}}b(s)\alpha_{\pi^{i}_{t+1}}(s,x,y)$ for every belief b. We first express $V_{\pi^{i}_{t+1}}(\tau[b,a,o],x,y)$ in terms of the α-vectors. Substituting the value of τ[b,a,o] from (2) into $V_{\pi ^{i}_{t+1}}(\tau[b,a,o],x,y) =\sum_{s' \in S^{PO}}\tau[b,a,o](s')\alpha _{\pi^{i}_{t+1}}(s',x,y)$, we get

$$\begin{aligned} V_{\pi^{i}_{t+1}}(\tau[b,a,o],x,y) = & \left\{ \begin{array}{ll} \sum_{s' \in S^{PO}} \bigg(\frac{\sum_{s \in S^{PO}}b(s)K_{t}^{a}(o|s)P_{t}^{(a,o)}(s'|s)}{\sum_{s \in S^{PO}}b(s)K_{t}^{a}(o|s)} \bigg)\alpha_{\pi^{i}_{t+1}}(s',x,y) \\ \quad \textrm{ if } a=W \textrm{ or } a=E \textrm{ and } o=E-,\\ \sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0)\alpha_{\pi^{i}_{t+1}}(s',x,y) \\ \quad \textrm { if } a=E \textrm{ and } o=E+.& \end{array} \right. \end{aligned}$$

(17)

In (17), $\sum_{s \in S^{PO}}b(s)K_{t}^{a}(o|s)$ does not depend on s′, hence we can move it out of the summation. Also, changing the order of summation, we obtain the following:

$$\begin{aligned} &V_{\pi^{i}_{t+1}}(\tau[b,a,o],x,y) = \left\{ \begin{array}{l@{\quad}l} \frac{\sum_{s \in S^{PO}}b(s)K_{t}^{a}(o|s)\sum_{s' \in S^{PO}}P_{t}^{(a,o)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x,y)}{\sum_{s \in S^{PO}}b(s)K_{t}^{a}(o|s)} \\ \quad \textrm{ if } a=W \textrm{ or } a=E \textrm { and } o=E-,\\ \sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0)\alpha_{\pi^{i}_{t+1}}(s',x,y) \\ \quad \textrm{ if } a=E \textrm{ and } o=E+.& \end{array} \right. \end{aligned}$$

(18)

Now, we substitute the value of $V_{\pi^{i}_{t+1}}(\tau[b,a,o],x,y) $ to obtain $V_{\pi^{i}_{t}}(b,x,y)$ in terms of α-vectors.

Case 1: $d^{i}_{t}=E$

From (4), we know that

$$\begin{aligned} V_{\pi^{i}_{t}}(b,x,y)&= b(0)x\bigg(r_{t}(0,E,E-)+V_{\pi^{i}_{t+1}}(\tau [b,E,E-],x,y)\bigg) \\ &\quad{}+ b(0)\big(1-x\big)\bigg(r_{t}(0,E,E+)+V_{\pi^{i}_{t+1}}(\tau [b,E,E+],x,y)\bigg) \\ &\quad{}+\sum_{s=1}^{2}b(s) \bigg[ \big(1-y\big)\bigg(r_{t}(s,E,E-)+ V_{\pi^{i}_{t+1}}(\tau[b,E,E-],x,y) \bigg)\bigg] \\ &\quad{}+\sum_{s=1}^{2}b(s)yR_{t}(s) \\ & = \sum_{s \in S^{PO}}b(s) \Bigg[ K_{t}^{E}(E-|s)\bigg(r_{t}(s,E,E-)+ V_{\pi^{i}_{t+1}}(\tau[b,E,E-],x,y) \bigg)\Bigg] \\ &\quad{}+ b(0)K_{t}^{E}(E+|0)\bigg(r_{t}(0,E,E+)+V_{\pi^{i}_{t+1}}(\tau [b,E,E+],x,y)\bigg) \\ &\quad{}+\sum_{s=1}^{2}b(s)K_{t}^{E}(E+|s)R_{t}(s), \end{aligned}$$

(19)

where (19) follows from replacing x with $K_{t}^{E}(E-|0)$ and y with $K_{t}^{E}(E+|s)$ when s∈{1,2}, and the fact that $\mathbb{K}_{t}^{E}$ is a stochastic matrix.

Substituting the values of $V_{\pi^{i}_{t+1}}(\tau[b,E,E-],x,y)$ and $V_{\pi^{i}_{t+1}}(\tau[b,E,E+],x,y)$ from (18) we obtain

$$\begin{aligned} V_{\pi^{i}_{t}}(b,x,y)&= \sum_{s \in S^{PO}}b(s) K_{t}^{E}(E-|s)\Bigg[r_{t}(s,E,E-) \\ &\quad{}+ \bigg(\frac{\sum_{s \in S^{PO}}b(s)K_{t}^{E}(E-|s)\sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x,y)}{\sum_{s \in S^{PO}}b(s)K_{t}^{E}(E-|s)} \bigg)\Bigg] \\ &\quad{}+ b(0)K_{t}^{E}(E+|0)\bigg(r_{t}(0,E,E+)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0)\alpha_{\pi^{i}_{t+1}}(s',x,y)\bigg) \\ &\quad{}+\sum_{s=1}^{2}b(s)K_{t}^{E}(E+|s)R_{t}(s) \end{aligned}$$

(20)

$$\begin{aligned} &= \sum_{s \in S^{PO}}b(s)K_{t}^{E}(E-|s)\Bigg[r_{t}(s,E,E-) + \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x,y)\Bigg] \\ &\quad{}+ b(0)K_{t}^{E}(E+|0)\Bigg[r_{t}(0,E,E+) +\sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0) \alpha_{\pi^{i}_{t+1}}(s',x,y) \Bigg] \\ &\quad{}+\sum_{s=1}^{2}b(s)K_{t}^{E}(E+|s)R_{t}(s) \end{aligned}$$

(21)

$$\begin{aligned} & = b(0)x \Bigg[r_{t}(0,E,E-)+\sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|0)\alpha_{\pi^{i}_{t+1}}(s',x,y)\Bigg] \\ &\quad{}+b(0)\big(1-x\big)\Bigg[r_{t}(0,E,E+)+\sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0) \alpha_{\pi^{i}_{t+1}}(s',x,y)\Bigg] \\ &\quad{}+\sum_{s=1}^{2}b(s) \big(1-y\big) \Bigg[r_{t}(s,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x,y)\Bigg] \\ &\quad{}+\sum_{s=1}^{2}b(s)yR_{t}(s) \end{aligned}$$

(22)

$$\begin{aligned} & = b(0)x \Bigg[r_{t}(0,E,E-)-r_{t}(0,E,E+)\Bigg] \\ &\quad{}+b(0)\Bigg[r_{t}(0,E,E+)+\sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0) \alpha_{\pi^{i}_{t+1}}(s',x,y)\Bigg] \\ &\quad{}+\sum_{s=1}^{2}b(s) \big(1-y\big) \Bigg[r_{t}(s,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x,y)\Bigg] \\ &\quad{}+\sum_{s=1}^{2}b(s)yR_{t}(s) \end{aligned}$$

(23)

where (21) follows from changing the order of summation, rearranging the terms, and canceling the identical terms in the numerator and denominator; (22) follows from replacing $K_{t}^{E}(E-|0)$ with x, $K_{t}^{E}(E+|s)$ when s∈{1,2} with y, and the fact that $\mathbb{K}_{t}^{E}$ is a stochastic matrix; and (23) follows because $P_{t}^{(E,E-)}(s'|0) =P_{t}^{(E,E+)}(s'|0)$ for all t≤T by Assumption 3.

Case 2: $d^{i}_{t}=W$

Similar to Case 1, substituting the value of $V_{\pi^{i}_{t+1}}(\tau [b,W,o],x,y)$ from (18) into (4), we obtain

$$\begin{aligned} &V_{\pi^{i}_{t}}(b,x,y) \\&\quad= \sum_{ s \in S^{PO}, o \in\varTheta_{W} } b(s)K_{t}^{W}(o|s) \Bigg[r_{t}(s,W,o) \\ &\quad\quad{}+ \bigg( \frac{\sum_{s \in S^{PO}}b(s)K_{t}^{W}(o|s)\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x,y)}{\sum_{s \in S^{PO}}b(s)K_{t}^{W}(o|s)} \bigg) \Bigg] \\ &\quad =\sum_{ s \in S^{PO}, o \in\varTheta_{W} } b(s)K_{t}^{W}(o|s) r_{t}(s,W,o) \\ &\quad\quad{}+ \sum_{ o \in\varTheta_{W} } \left(\frac{\sum_{s \in S^{PO}}b(s)K_{t}^{W}(o|s)\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x,y)}{\sum_{s \in S^{PO}}b(s)K_{t}^{W}(o|s)} \right) \sum_{s \in S^{PO}}b(s)K_{t}^{W}(o|s) \end{aligned}$$

(24)

$$\begin{aligned} &\quad =\sum_{ s \in S^{PO}, o \in\varTheta_{W} } b(s)K_{t}^{W}(o|s) r_{t}(s,W,o) \\ &\quad\quad{}+ \sum_{ o \in\varTheta_{W} } \sum_{s \in S^{PO}}b(s)K_{t}^{W}(o|s)\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x,y) \end{aligned}$$

(25)

$$\begin{aligned} &\quad =\sum_{s \in S^{PO}}b(s) \sum_{o \in \varTheta_{W}}K_{t}^{W}(o|s)\Bigg[r_{t}(s,W,o) +\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|s) \alpha_{\pi^{i}_{t+1}}(s',x,y) \Bigg], \end{aligned}$$

(26)

where (24) follows from changing the order of summation and rearranging the terms, (25) follows from canceling the terms in the numerator and denominator, and (26) follows from simple algebra. □
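The recursion established in (23) and (26) can be evaluated numerically by a backward pass over a fixed policy. The following is a minimal sketch under simplifying assumptions that are not part of the source: parameters are time-homogeneous, a single transition matrix `p` is shared by every action-observation pair, and all numbers in `EXAMPLE` are hypothetical placeholders rather than the calibrated data used in the paper. The names `alpha_vectors`, `policy_value`, and `EXAMPLE` are introduced here for illustration only.

```python
import numpy as np


def alpha_vectors(policy, x, y, params):
    """Backward recursion for the alpha-vectors of a fixed policy.

    policy : sequence of "E" (exam) or "W" (wait), one entry per epoch t < T
    x, y   : specificity and sensitivity of the screening exam
    params : dict of hypothetical, time-homogeneous model inputs
        p       3x3 transition matrix (assumed shared by all (a, o) pairs)
        r_neg   r(s, E, E-) for s = 0, 1, 2
        r_pos   r(0, E, E+), a scalar
        r_wait  r(s, W, o), 3x2 array with columns o = W-, W+
        k_wait  K^W(o|s), 3x2 row-stochastic array
        lump    R(s), lump-sum reward on detection (entry 0 is unused)
        r_final terminal reward r_T(s)
    """
    p = np.asarray(params["p"], dtype=float)
    r_neg = np.asarray(params["r_neg"], dtype=float)
    r_pos = float(params["r_pos"])
    r_wait = np.asarray(params["r_wait"], dtype=float)
    k_wait = np.asarray(params["k_wait"], dtype=float)
    lump = np.asarray(params["lump"], dtype=float)

    alpha = np.asarray(params["r_final"], dtype=float)  # basis: alpha_T = r_T
    for action in reversed(policy):
        cont = p @ alpha  # sum over s' of P(s'|s) * alpha_{t+1}(s')
        if action == "E":
            new = np.empty(3)
            # State 0, as in the s = 0 terms of (23).
            new[0] = x * (r_neg[0] - r_pos) + r_pos + cont[0]
            # States 1 and 2, as in the s in {1, 2} terms of (23).
            new[1:] = (1.0 - y) * (r_neg[1:] + cont[1:]) + y * lump[1:]
        elif action == "W":
            # All states, as in (26).
            new = (k_wait * (r_wait + cont[:, None])).sum(axis=1)
        else:
            raise ValueError(f"unknown action: {action!r}")
        alpha = new
    return alpha


def policy_value(belief, policy, x, y, params):
    """Value of the policy at a belief: the inner product of b and alpha."""
    return float(np.dot(belief, alpha_vectors(policy, x, y, params)))


# Hypothetical inputs; cancer states cannot return to state 0.
EXAMPLE = {
    "p": [[0.98, 0.015, 0.005], [0.0, 0.90, 0.10], [0.0, 0.0, 1.0]],
    "r_neg": [1.0, 0.9, 0.8],
    "r_pos": 0.97,
    "r_wait": [[1.0, 0.98], [0.9, 0.9], [0.8, 0.8]],
    "k_wait": [[0.95, 0.05], [0.8, 0.2], [0.6, 0.4]],
    "lump": [0.0, 15.0, 8.0],
    "r_final": [20.0, 10.0, 5.0],
}

if __name__ == "__main__":
    belief = np.array([0.99, 0.008, 0.002])
    print(policy_value(belief, ["E", "W", "E"], 0.90, 0.85, EXAMPLE))
```

Because the value is linear in the belief once the α-vectors are known, comparing two fixed policies at many beliefs requires only one backward pass per policy.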

Proof of Lemma 1

We prove the weak-inequality case; the proof of the strict inequality in Part (a) is very similar, the only difference being the basis of the induction. By Proposition 1, $V_{\pi^{i}_{t}}(b,x_{2},y_{2}) \geq V_{\pi ^{i}_{t}}(b,x_{1},y_{1})$ if $\sum_{s \in S^{PO}}b(s)\alpha_{\pi ^{i}_{t}}(s,x_{2},y_{2}) \geq\sum_{s \in S^{PO}}b(s)\alpha_{\pi ^{i}_{t}}(s,x_{1},y_{1})$. Therefore, it is sufficient to show that $\alpha _{\pi^{i}_{t}}(s,x_{2},y_{2})\geq\alpha_{\pi^{i}_{t}}(s,x_{1},y_{1})$ for all $s \in S^{PO}$ whenever $x_{2} \geq x_{1}$ and $y_{2} \geq y_{1}$. We prove this by induction as follows. Basis: $\alpha_{\pi^{i}_{T}}(s,x_{2},y_{2})= r_{T}(s) = \alpha_{\pi^{i}_{T}}(s,x_{1},y_{1})$, so the weak inequality holds with equality. Suppose the assertion holds for $\alpha_{\pi^{i}_{t+1}}$; we need to show that it holds for $\alpha_{\pi^{i}_{t}}$. Note that when $d^{i}_{t} = W$, this follows directly from Proposition 1 because of the induction hypothesis. When $d^{i}_{t} = E$ and s=0, the assertion can be proven as follows:

$$\begin{aligned} \alpha_{\pi^{i}_t}(0,x_2,y_2)&=r_{t}(0,E,E+) + \sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0) \alpha_{\pi^{i}_{t+1}}(s',x_2,y_2) \\ &\quad{} + x_2 \Bigg[r_{t}(0,E,E-)-r_{t}(0,E,E+)\Bigg] \end{aligned}$$

(27)

$$\begin{aligned} &\geq r_{t}(0,E,E+) + \sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0) \alpha_{\pi^{i}_{t+1}}(s',x_1,y_1) \\ &\quad{}+ x_1 \Bigg[r_{t}(0,E,E-)-r_{t}(0,E,E+)\Bigg] \\ &=\alpha_{\pi^{i}_t}(0,x_1,y_1) \end{aligned}$$

(28)

where (28) follows from the induction hypothesis, from $x_{2} \geq x_{1}$, and from the assumption that $r_{t}(0,E,E-)-r_{t}(0,E,E+) \geq 0$.

On the other hand, when $d^{i}_{t} = E$ and s∈{1,2}, the assertion can be proven as follows:

$$\begin{aligned} &\alpha_{\pi^{i}_t}(s,x_2,y_2) \\ &\quad=\big(1-y_2\big) \Bigg[r_{t}(s,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x_2,y_2)\Bigg] +y_2 R_{t}(s) \\ &\quad=\big(1-y_1\big) \Bigg[r_{t}(s,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x_2,y_2)\Bigg] + y_1 R_{t}(s) \\ &\quad\quad{}+ \big(y_2-y_1\big) \Bigg[ R_{t}(s)- \bigg(r_{t}(s,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x_2,y_2)\bigg)\Bigg] \end{aligned}$$

(29)

$$\begin{aligned} &\quad\geq\alpha_{\pi^{i}_t}(s,x_1,y_1) \\ &\quad\quad{}+\big(y_2-y_1\big) \Bigg[ R_{t}(s)- \bigg(r_{t}(s,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',x_2,y_2)\bigg)\Bigg] \end{aligned}$$

(30)

$$\begin{aligned} &\quad\geq\alpha_{\pi^{i}_t}(s,x_1,y_1), \end{aligned}$$

(31)

where (29) follows from rearranging the terms, (30) follows from the induction hypothesis, and (31) follows from (9). □
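As a worked special case of the induction step, take $t=T-1$ and $d^{i}_{t}=E$, so that $\alpha_{\pi^{i}_{t+1}}(s',x,y)=r_{T}(s')$ does not depend on x or y. Substituting this into the expressions above, and using the same condition invoked for (31), the two differences reduce to

$$\begin{aligned} \alpha_{\pi^{i}_{t}}(0,x_{2},y_{2})-\alpha_{\pi^{i}_{t}}(0,x_{1},y_{1}) &= \big(x_{2}-x_{1}\big)\Big[r_{t}(0,E,E-)-r_{t}(0,E,E+)\Big] \geq 0, \\ \alpha_{\pi^{i}_{t}}(s,x_{2},y_{2})-\alpha_{\pi^{i}_{t}}(s,x_{1},y_{1}) &= \big(y_{2}-y_{1}\big)\Bigg[R_{t}(s)-r_{t}(s,E,E-)-\sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|s)r_{T}(s')\Bigg] \geq 0, \quad s \in \{1,2\}. \end{aligned}$$

In this one-step case the specificity x affects only the cancer-free state and the sensitivity y affects only the cancer states, which is the pattern the general induction propagates backward.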

Proof of Lemma 2

This proof follows directly from the proof of Lemma 1, where we show that $\alpha_{\pi^{i}_{t}}(s,x,y)$ is nondecreasing in x and y for all $s \in S^{PO}$, $t \leq T$, and $\pi^{i}_{t} \in\varPi _{t}$. □

Proof of Proposition 2

We would like to show that the feasible region of the NLP given in (5) can be nonconvex in y, i.e., that $V_{\pi^{1}_{t}}(b,x,y) \leq V_{\pi ^{2}_{t}}(b,x,y)$ for all $y \in\{\underline{y}, \overline{y}\}$, but there exists $\dot{y} \in[\underline{y},\overline{y}]$ such that $V_{\pi ^{1}_{t}}(b,x,\dot{y}) \geq V_{\pi^{2}_{t}}(b,x,\dot{y})$. Let $\pi ^{i}_{t+1}$ be any fixed policy (i.e., one that does not depend on b) in $\varPi_{t+1}$, and $\varPi_{t}=\{\pi^{1}_{t}, \pi^{2}_{t}\}$ where $\pi^{1}_{t}=\{E, \pi^{i}_{t+1}\}$, and $\pi^{2}_{t}=\{W, \pi^{i}_{t+1}\}$. That is, $\pi ^{1}_{t}$ and $\pi^{2}_{t}$ are identical except for the action taken at time t. From Proposition 1, we know that $V_{\pi ^{i}_{t}}(b,x,y) =\sum_{s \in S^{PO}}b(s)\alpha_{\pi^{i}_{t}}(s,x,y)$, hence it is sufficient to show that $\sum_{s \in S^{PO}}b(s) (\alpha _{\pi^{1}_{t}}(s,x,y)- \alpha_{\pi^{2}_{t}}(s,x,y) )$ is negative for all $y \in\{\underline{y}, \overline{y}\}$ but positive for some $\dot {y} \in[\underline{y}, \overline{y}]$. Let $\alpha_{\pi ^{i}_{t}}(x,y)=\big[\alpha_{\pi^{i}_{t}}(s,x,y)\big]$. From Proposition 1, we have

$$\begin{aligned} & \alpha_{\pi^{1}_t}(x,y)-\alpha_{\pi^{2}_t}(x,y) \\ &= \left[ \begin{array}{c} x \bigg(r_{t}(0,E,E-)-r_{t}(0,E,E+)\bigg)+ \bigg(r_{t}(0,E,E+)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0) \alpha_{\pi^{i}_{t+1}}(s',x,y)\bigg)\\ \big(1-y\big) \bigg(r_{t}(1,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|1)\alpha_{\pi^{i}_{t+1}}(s',x,y)\bigg) +yR_{t}(1)\\ \big(1-y\big) \bigg(r_{t}(2,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|2)\alpha_{\pi^{i}_{t+1}}(s',x,y)\bigg) +yR_{t}(2) \end{array} \right] \\ &\quad{}-\left[ \begin{array}{c} \sum_{o \in\varTheta_{W}}K_{t}^{W}(o|0)\Bigg(r_{t}(0,W,o)+\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|0)\alpha_{\pi^{i}_{t+1}}(s',x,y)\Bigg)\\ \sum_{o \in\varTheta_{W}}K_{t}^{W}(o|1)\Bigg(r_{t}(1,W,o)+\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|1)\alpha_{\pi^{i}_{t+1}}(s',x,y)\Bigg)\\ \sum_{o \in\varTheta_{W}}K_{t}^{W}(o|2)\Bigg(r_{t}(2,W,o)+\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|2)\alpha_{\pi^{i}_{t+1}}(s',x,y)\Bigg) \end{array} \right] \\ &= \left[ \begin{array}{c} x \bigg(r_{t}(0,E,E-)-r_{t}(0,E,E+)\bigg)+ \bigg(r_{t}(0,E,E+)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0) \alpha_{\pi^{i}_{t+1}}(s',x,y)\bigg)\\ \big(1-y\big) \bigg(r_{t}(1,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|1)\alpha_{\pi^{i}_{t+1}}(s',x,y)\bigg) +yR_{t}(1)\\ \big(1-y\big) \bigg(r_{t}(2,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|2)\alpha_{\pi^{i}_{t+1}}(s',x,y)\bigg) +yR_{t}(2) \end{array} \right] \\ &{}-\left[ \begin{array}{c} r_{t}(0,W,W-)+\sum_{s' \in S^{PO}}P_{t}^{(W,W-)}(s'|0)\alpha_{\pi ^{i}_{t+1}}(s',x,y)\\ r_{t}(1,W,W-)+\sum_{s' \in S^{PO}}P_{t}^{(W,W-)}(s'|1)\alpha_{\pi ^{i}_{t+1}}(s',x,y)\\ r_{t}(2,W,W-)+\sum_{s' \in S^{PO}}P_{t}^{(W,W-)}(s'|2)\alpha_{\pi ^{i}_{t+1}}(s',x,y) \end{array} \right] = \end{aligned}$$

(32)

$$\begin{aligned} & \left[ \begin{array}{@{\!\!}c@{\!\!}} x (r_{t}(0,E,E-)-r_{t}(0,E,E+))+ (r_{t}(0,E,E+) - r_{t}(0,W,W-) )\\ (1-y) (r_{t}(1,E,E-) - r_{t}(1,W,W-) ) +y (R_{t}(1) -r_{t}(1,W,W-)- \sum_{s' \in S^{PO}}P_{t}^{(W,W-)}(s'|1)\alpha_{\pi^{i}_{t+1}}(s',x,y))\\ (1-y) (r_{t}(2,E,E-) - r_{t}(2,W,W-) ) +y (R_{t}(2) - r_{t}(2,W,W-)-\sum_{s' \in S^{PO}}P_{t}^{(W,W-)}(s'|2)\alpha_{\pi^{i}_{t+1}}(s',x,y)) \end{array} \right] \end{aligned}$$

(33)

$$\begin{aligned} &= \left[ \begin{array}{c} (x-1) du^{D}\\ y \ell_{t}(1, x, y)\\ y \ell_{t}(2, x, y) \end{array} \right] \end{aligned}$$

(34)

where (32) and (33) follow from Assumptions 2, 3, and the fact that both $\pi^{1}_{t}$ and $\pi ^{2}_{t}$ are fixed policies and do not depend on b, and (34) follows from Assumption 2 and Definition 1. Note that $du^{D} \geq 0$, $(x-1)du^{D} \leq 0$ because $0 \leq x \leq 1$, and $\ell_{t}(1,x,y)$ is nonincreasing in y by Lemma 2. Therefore, the values of $du^{D}$, $y\ell_{t}(1,x,y)$, and $y\ell_{t}(2,x,y)$ could be such that

$$\sum_{s \in S^{PO}}b(s)\bigg(\alpha_{\pi^{1}_t}(s,x,y)- \alpha_{\pi ^{2}_t}(s,x,y) \bigg)=b\left[ \begin{array}{c} (x-1) du^{D}\\ y \ell_{t}(1, x, y)\\ y \ell_{t}(2, x, y) \end{array} \right]$$

is negative for all $y \in\{\underline{y}, \overline{y}\}$ but positive for some $\dot{y} \in[\underline{y}, \overline{y}]$, which makes the problem nonconvex. □
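A toy numerical instance shows how the sign pattern described above can arise. This is only an illustrative sketch: the function `value_gap`, the linear form chosen for $\ell_{t}(s,x,y)$, and every number below are hypothetical and are not taken from the paper. The only property borrowed from the proof is that $\ell_{t}$ is nonincreasing in y.

```python
def value_gap(y, x=0.9, belief=(0.9, 0.06, 0.04), du=0.2, intercept=2.0, slope=2.0):
    """Inner product of the belief with [(x - 1) du, y * ell, y * ell].

    ell is a hypothetical stand-in for ell_t(s, x, y): linear and
    nonincreasing in y, and taken to be the same for s = 1 and s = 2.
    """
    ell = intercept - slope * y
    return belief[0] * (x - 1.0) * du + y * (belief[1] + belief[2]) * ell


y_lo, y_hi = 0.05, 0.95
y_mid = 0.5 * (y_lo + y_hi)  # a convex combination of the two endpoints

for y in (y_lo, y_mid, y_hi):
    gap = value_gap(y)
    label = "exam policy preferred" if gap > 0 else "wait policy preferred"
    print(f"y = {y:.2f}: gap = {gap:+.4f} ({label})")
```

In this instance the gap is negative at both endpoints and positive at their midpoint, so the set of sensitivities for which the wait policy is at least as good is not an interval.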

Proof of Lemma 3

We prove this by induction. For the basis step, it is immediate that $\alpha_{\pi^{i}_{T}}(s,\overline{x},y)-\alpha_{\pi ^{i}_{T}}(s,\underline{x},y)=0$, as $\alpha_{\pi^{i}_{T}}(s,x,y)=r_{T}(s)$ for any s, x, and y. Assume that the assertion holds for the decision epoch t+1. At time t, there are two possible cases: the decision is either W or E.

Case 1: $d^{i}_{t}=W$

$$\begin{aligned} &\alpha_{\pi^{i}_t}(s,\overline{x},y)-\alpha_{\pi^{i}_t}(s,\underline {x},y) \\ &\quad=\sum_{o \in\varTheta_{W}}K_{t}^{W}(o|s)\Bigg[r_{t}(s,W,o)+\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',\overline{x},y)\Bigg] \\ &\quad\quad{}-\sum_{o \in\varTheta_{W}}K_{t}^{W}(o|s)\Bigg[r_{t}(s,W,o)+\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',\underline {x},y)\Bigg] \\ &\quad=\sum_{o \in\varTheta_{W}}K_{t}^{W}(o|s)\sum_{s' \in\{1,2\} }P_{t}^{(W,o)}(s'|s)\Bigg[\alpha_{\pi^{i}_{t+1}}(s',\overline {x},y)-\alpha_{\pi^{i}_{t+1}}(s',\underline{x},y)\Bigg] \end{aligned}$$

(35)

$$\begin{aligned} &\quad=0, \end{aligned}$$

(36)

where (35) follows by canceling out $\sum_{o \in \varTheta_{W}}K_{t}^{W}(o|s)r_{t}(s,W,o)$ and from Assumption 4, and (36) follows from the induction hypothesis.

Case 2: $d^{i}_{t}=E$

$$\begin{aligned} &\alpha_{\pi^{i}_t}(s,\overline{x},y)-\alpha_{\pi^{i}_t}(s,\underline {x},y) \\ &\quad=\big(1-y\big) \Bigg[r_{t}(s,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',\overline {x},y)\Bigg] +yR_{t}(s) \\ &\quad\quad{}-\big(1-y\big) \Bigg[r_{t}(s,E,E-)+ \sum_{s' \in S^{PO}}P_{t}^{(E,E-)}(s'|s)\alpha_{\pi^{i}_{t+1}}(s',\underline {x},y)\Bigg]-yR_{t}(s) \\ &\quad=\big(1-y\big)\Bigg[ \sum_{s' \in \{1,2\}}P_{t}^{(E,E-)}(s'|s) \bigg( \alpha_{\pi^{i}_{t+1}}(s',\overline {x},y) - \alpha_{\pi^{i}_{t+1}}(s',\underline{x},y) \bigg)\Bigg] \end{aligned}$$

(37)

$$\begin{aligned} &\quad=0, \end{aligned}$$

(38)

where (37) follows from Assumption 4 and simple algebra, and (38) follows from the induction hypothesis. □

Proof of Proposition 3

Let $0 \leq\underline{x} \leq\overline{x} \leq1$. We want to show that:

$$\begin{aligned} &\bigg(V_{\pi^{1}_t}(b,\overline{x},y) - V_{\pi^{2}_t}(b,\overline {x},y)\bigg)- \bigg(V_{\pi^{1}_t}(b,\underline{x},y) - V_{\pi ^{2}_t}(b,\underline{x},y)\bigg) \\ &\quad=\bigg(V_{\pi^{1}_t}(b,\overline{x},y)-V_{\pi^{1}_t}(b,\underline {x},y)\bigg)- \bigg(V_{\pi^{2}_t}(b,\overline{x},y)- V_{\pi ^{2}_t}(b,\underline{x},y)\bigg) \\ &\quad\geq0 . \end{aligned}$$

(39)

Note that by Lemma 3 for any $\pi^{i}_{t} \in\varPi_{t}$, b∈B, and $\overline{x}, \underline{x} \in[0,1]$:

$$\begin{aligned} V_{\pi^{i}_t}(b,\overline{x},y)-V_{\pi^{i}_t}(b,\underline{x},y) &= \sum _{s \in S^{PO}} b(s) \bigg(\alpha_{\pi^{i}_{t}}(s,\overline{x},y) - \alpha_{\pi^{i}_{t}}(s,\underline{x},y)\bigg)\\ &=b(0)\bigg(\alpha_{\pi^{i}_{t}}(0,\overline{x},y) - \alpha_{\pi ^{i}_{t}}(0,\underline{x},y)\bigg). \end{aligned}$$

Then, we can rewrite (39) as follows:

$$\begin{aligned} &\bigg(V_{\pi^{1}_t}(b,\overline{x},y) - V_{\pi^{2}_t}(b,\overline {x},y)\bigg)- \bigg(V_{\pi^{1}_t}(b,\underline{x},y) - V_{\pi ^{2}_t}(b,\underline{x},y)\bigg) \end{aligned}$$

(40)

$$\begin{aligned} &\quad=b(0)\Bigg[\bigg(\alpha_{\pi^{1}_{t}}(0,\overline{x},y) - \alpha_{\pi ^{1}_{t}}(0,\underline{x},y)\bigg) - \bigg(\alpha_{\pi ^{2}_{t}}(0,\overline{x},y) - \alpha_{\pi^{2}_{t}}(0,\underline {x},y)\bigg)\Bigg] \end{aligned}$$

(41)

Now, we will express $\alpha_{\pi^{i}_{t}}(0,\overline{x},y) - \alpha _{\pi^{i}_{t}}(0,\underline{x},y)$ for a given policy $\pi^{i}_{t}=\{ d^{i}_{t}, d^{i}_{t+1}, \ldots, d^{i}_{T}\}$ in terms of the input parameters (i.e., rewards, transition probabilities, and observation probabilities).

Case 1: $d^{i}_{t}=E$

From Proposition 1, we have

$$\begin{aligned} \alpha_{\pi^{i}_t}(0,\overline{x},y)-\alpha_{\pi^{i}_t}(0,\underline {x},y)&=\overline{x} \Bigg[r_{t}(0,E,E-)-r_{t}(0,E,E+) \Bigg] \\ &\quad{}+\Bigg[r_{t}(0,E,E+)+\sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0) \alpha_{\pi^{i}_{t+1}}(s',\overline {x},y)\Bigg] \\ &\quad{}-\underline{x} \Bigg[r_{t}(0,E,E-)-r_{t}(0,E,E+)\Bigg] \\ &\quad{}-\Bigg[r_{t}(0,E,E+)+\sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0) \alpha_{\pi^{i}_{t+1}}(s',\underline {x},y)\Bigg] \\ & = \bigg(\overline{x}-\underline{x}\bigg) \Bigg[r_{t}(0,E,E-)-r_{t}(0,E,E+)\Bigg] \\ &\quad{}+ \sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0) \bigg(\alpha_{\pi^{i}_{t+1}}(s',\overline {x},y) -\alpha_{\pi^{i}_{t+1}}(s',\underline{x},y)\bigg) \end{aligned}$$

(42)

$$\begin{aligned} & = \bigg(\overline{x}-\underline{x}\bigg) \Bigg[r_{t}(0,E,E-)-r_{t}(0,E,E+)\Bigg] \\ &\quad{}+ P_{t}^{(E,E+)}(0|0) \bigg(\alpha_{\pi^{i}_{t+1}}(0,\overline{x},y) -\alpha_{\pi^{i}_{t+1}}(0,\underline{x},y)\bigg), \end{aligned}$$

(43)

where (42) follows from rearranging the terms and (43) follows from Lemma 3.

Case 2: $d^{i}_{t}=W$

Again, from Proposition 1, we have

$$\begin{aligned} &\alpha_{\pi^{i}_t}(0,\overline{x},y)-\alpha_{\pi^{i}_t}(0,\underline {x},y) \\ &\quad=\sum_{o \in\varTheta_{W}}K_{t}^{W}(o|0)\Bigg[r_{t}(0,W,o)+\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|0)\alpha_{\pi^{i}_{t+1}}(s',\overline{x},y)\Bigg] \\ &\quad\quad{}-\sum_{o \in\varTheta_{W}}K_{t}^{W}(o|0)\Bigg[r_{t}(0,W,o)+\sum_{s' \in S^{PO}}P_{t}^{(W,o)}(s'|0)\alpha_{\pi^{i}_{t+1}}(s',\underline {x},y)\Bigg] \\ &\quad=\sum_{s' \in S^{PO}}P_{t}^{(E,E+)}(s'|0)\Bigg[\alpha_{\pi ^{i}_{t+1}}(s',\overline{x},y)-\alpha_{\pi^{i}_{t+1}}(s',\underline {x},y)\Bigg] \end{aligned}$$

(44)

$$\begin{aligned} &\quad=P_{t}^{(E,E+)}(0|0)\Bigg[\alpha_{\pi^{i}_{t+1}}(0,\overline {x},y)-\alpha_{\pi^{i}_{t+1}}(0,\underline{x},y)\Bigg] \end{aligned}$$

(45)

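where (44) follows from the assumption that $P_{t}^{(W,W-)}(s'|0)= P_{t}^{(W,W+)}(s'|0)= P_{t}^{(E,E+)}(s'|0)$, the cancellation of the reward terms, and the fact that the observation probabilities $K_{t}^{W}(o|0)$ sum to one over $o \in\varTheta_{W}$, and (45) follows from Lemma 3.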

Then, combining the results of Case 1 and Case 2, we get

$$\begin{aligned} \alpha_{\pi^{i}_t}(0,\overline{x},y)-\alpha_{\pi^{i}_t}(0,\underline{x},y) = \begin{cases} \big(\overline{x}-\underline{x}\big) \big[r_{t}(0,E,E-)-r_{t}(0,E,E+)\big] + P_{t}^{(E,E+)}(0|0) \big(\alpha_{\pi^{i}_{t+1}}(0,\overline{x},y) -\alpha_{\pi^{i}_{t+1}}(0,\underline{x},y)\big) & \textrm{if } d^{i}_{t}=E,\\ P_{t}^{(E,E+)}(0|0)\big[\alpha_{\pi^{i}_{t+1}}(0,\overline{x},y)-\alpha_{\pi^{i}_{t+1}}(0,\underline{x},y)\big] & \textrm{if } d^{i}_{t}=W. \end{cases} \end{aligned}$$

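Expanding this recursive equation from epoch $t$ through epoch $T$ contributes one term proportional to $\overline{x}-\underline{x}$ for each epoch at which $\pi^{i}_{t}$ prescribes an examination, weighted by the product of the preceding probabilities $P^{(E,E+)}(0|0)$ of remaining in state 0.

Then,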

$$\begin{aligned} &\bigg(V_{\pi^{1}_t}(b,\overline{x},y) - V_{\pi^{2}_t}(b,\overline {x},y)\bigg)- \bigg(V_{\pi^{1}_t}(b,\underline{x},y) - V_{\pi ^{2}_t}(b,\underline{x},y)\bigg) \end{aligned}$$

(46)

$$\begin{aligned} &\quad=b(0)\Bigg[\bigg(\alpha_{\pi^{1}_{t}}(0,\overline{x},y) - \alpha_{\pi ^{1}_{t}}(0,\underline{x},y)\bigg) - \bigg(\alpha_{\pi ^{2}_{t}}(0,\overline{x},y) - \alpha_{\pi^{2}_{t}}(0,\underline {x},y)\bigg)\Bigg] \end{aligned}$$

(47)

(48)

$$\begin{aligned} &\quad \geq0 \end{aligned}$$

(49)

where (47) follows from (41), (48) follows from the recursive expansion, and (49) follows from the fact that $\overline{x} \geq\underline{x}$, Assumption 1, and Definition 2. □
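The recursive expansion invoked for step (48) can be sketched as follows. This is a reconstruction obtained by unrolling the case-wise recursion above, not the author's displayed equation. The indicator $\mathbb{1}\{\cdot\}$, the index symbols $k$ and $j$, and the convention that an empty product equals one are notation introduced here for illustration.

$$\begin{aligned} \alpha_{\pi^{i}_t}(0,\overline{x},y)-\alpha_{\pi^{i}_t}(0,\underline{x},y) &= \big(\overline{x}-\underline{x}\big) \sum_{k=t}^{T} \mathbb{1}\{d^{i}_{k}=E\} \big[r_{k}(0,E,E-)-r_{k}(0,E,E+)\big] \prod_{j=t}^{k-1} P_{j}^{(E,E+)}(0|0) \\ &\quad{}+ \prod_{j=t}^{T} P_{j}^{(E,E+)}(0|0) \big(\alpha_{\pi^{i}_{T+1}}(0,\overline{x},y)-\alpha_{\pi^{i}_{T+1}}(0,\underline{x},y)\big). \end{aligned}$$

Assuming the terminal vector $\alpha_{\pi^{i}_{T+1}}$ is the same for every policy, the last term cancels when the expressions for $\pi^{1}_{t}$ and $\pi^{2}_{t}$ are subtracted, and (47) becomes

$$\begin{aligned} b(0)\big(\overline{x}-\underline{x}\big) \sum_{k=t}^{T} \Big(\mathbb{1}\{d^{1}_{k}=E\}-\mathbb{1}\{d^{2}_{k}=E\}\Big) \big[r_{k}(0,E,E-)-r_{k}(0,E,E+)\big] \prod_{j=t}^{k-1} P_{j}^{(E,E+)}(0|0). \end{aligned}$$

Every summand is nonnegative if the bracketed reward difference is nonnegative and $\pi^{1}_{t}$ examines at every epoch at which $\pi^{2}_{t}$ does, which is presumably what Assumption 1 and Definition 2 provide.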

Proof of Corollary 1

Let $\pi^{*}_{t}$ be the optimal policy at $(x,y)$ and let $\overline{x} \geq x \geq0$. Let $\pi^{i}_{t}$ be an arbitrary policy from the policy set $\varPi_{t}$ that is less aggressive than $\pi^{*}_{t}$. Assume, for the sake of contradiction, that $\pi^{i}_{t}$ is uniquely optimal at $(\overline{x},y)$. Then $V_{\pi^{i}_{t}}(b,\overline{x},y) > V_{\pi^{*}_{t}}(b,\overline{x},y)$. However, from Proposition 3, we have

$$\begin{aligned} V_{\pi^{*}_t}(b,\overline{x},y)-V_{\pi^{i}_t}(b,\overline{x},y) & \geq V_{\pi^{*}_t}(b,{x},y) - V_{\pi^{i}_t}(b,{x},y) \\ &\geq0, \end{aligned}$$

(50)

where (50) follows because $\pi^{*}_{t}$ is the optimal policy at $(x,y)$. Then, $V_{\pi^{i}_{t}}(b,\overline{x},y) \leq V_{\pi^{*}_{t}}(b,\overline{x},y)$, which contradicts our assumption and completes the proof. □

Proof of Theorem 1

Let $\varPi_{t}=\{ \pi^{i}_{t} | i \in I=\underline{I} \cup\overline{I}\}$ such that $\underline{I}$ indexes policies that are less aggressive than $\pi^{*}_{t}$ and $\overline{I}$ indexes policies that are more aggressive than $\pi^{*}_{t}$. Also, let $0 \leq x_{1} \leq x_{2} \leq 1$ such that $V_{\pi^{*}_{t}}(b,x_{1},y) \geq V_{\pi^{i}_{t}}(b,x_{1},y)$ and $V_{\pi^{*}_{t}}(b,x_{2},y) \geq V_{\pi^{i}_{t}}(b,x_{2},y)$ for all $\pi^{i}_{t} \in \varPi_{t}$. Then for any $x$ such that $x_{1} \leq x \leq x_{2}$, we want to show that $V_{\pi^{*}_{t}}(b,x,y) \geq V_{\pi^{i}_{t}}(b,x,y)$ for all $\pi^{i}_{t} \in\varPi_{t}$. Suppose we arbitrarily select a policy $\pi^{i}_{t} \neq\pi^{*}_{t}$ from $\varPi_{t}$. If $i \in\underline{I}$ (i.e., $\pi^{i}_{t}$ is less aggressive than $\pi^{*}_{t}$), then from Proposition 3, we have

$$\begin{aligned} V_{\pi^{*}_t}(b,{x},y) - V_{\pi^{i}_t}(b,{x},y) &\geq V_{\pi ^{*}_t}(b,x_1,y) - V_{\pi^{i}_t}(b,x_1,y) \\ &\geq0, \end{aligned}$$

(51)

where (51) follows because $\pi^{*}_{t}$ is feasible at $(x_{1},y)$.

On the other hand, if $i \in\overline{I}$ (i.e., $\pi^{i}_{t}$ is more aggressive than $\pi^{*}_{t}$), then again from Proposition 3, we have

$$\begin{aligned} &V_{\pi^{i}_t}(b,{x},y) - V_{\pi^{*}_t}(b,{x},y)\leq V_{\pi ^{i}_t}(b,x_2,y)-V_{\pi^{*}_t}(b,x_2,y)\leq0, \end{aligned}$$

(52)

which again follows because $\pi^{*}_{t}$ is feasible at $(x_{2},y)$. Then, $V_{\pi^{*}_{t}}(b,{x},y) - V_{\pi^{i}_{t}}(b,{x},y) \geq0$ for any $x_{1} \leq x \leq x_{2}$ and $\pi^{i}_{t} \in\varPi_{t}$, which completes the proof. □
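The interval property established in Theorem 1 can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the paper's algorithm or data: the three-epoch horizon, the reward gaps, the transition probabilities, the intercepts, and all function names are hypothetical. It uses the recursion from the proof of Proposition 3 to compute how fast each policy's value grows in $x$, assumes the terminal value does not depend on $x$ (so each value function is affine in $x$), and checks on a grid that the points at which a chosen policy is optimal form one unbroken run.

```python
# Hypothetical illustration of Theorem 1; every number below is made up.
REWARD_GAP = [1.0, 1.0, 1.0]  # r_k(0,E,E-) - r_k(0,E,E+) per epoch, assumed >= 0
STAY_PROB = [0.9, 0.9, 0.9]   # P_k^{(E,E+)}(0|0) per epoch
B0 = 1.0                      # belief mass b(0) on state 0

# Policies as decision strings (E = exam, W = wait), least to most aggressive.
# Intercepts stand in for the x-independent part of each value function.
INTERCEPT = {"WWW": 0.0, "WWE": -0.3, "WEE": -0.9, "EEE": -1.845}


def exam_slope(policy):
    """Slope of alpha(0, x, y) in x, via the backward recursion of Proposition 3."""
    slope = 0.0  # terminal value assumed not to depend on x
    for k in reversed(range(len(policy))):
        gain = REWARD_GAP[k] if policy[k] == "E" else 0.0
        slope = gain + STAY_PROB[k] * slope
    return slope


def value(policy, x):
    """Value of a policy at x; affine in x under the assumptions above."""
    return INTERCEPT[policy] + B0 * exam_slope(policy) * x


def optimal_indices(target, n_grid=101):
    """Grid indices i (x = i / (n_grid - 1)) where `target` is at least as good as every policy."""
    hits = []
    for i in range(n_grid):
        x = i / (n_grid - 1)
        if all(value(target, x) >= value(p, x) for p in INTERCEPT):
            hits.append(i)
    return hits


def is_contiguous(indices):
    """True when the indices form one unbroken run, i.e. a discretised interval."""
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))


if __name__ == "__main__":
    region = optimal_indices("WEE")
    print(region[0] / 100, region[-1] / 100, is_contiguous(region))
```

Because the policies in this toy set are nested, a more aggressive policy always has a larger slope, which is the increasing-differences property of Proposition 3; the contiguity check is then the grid analogue of the theorem's conclusion.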

About this article

Cite this article

Ayer, T. Inverse optimization for assessing emerging technologies in breast cancer screening. Ann Oper Res 230, 57–85 (2015). https://doi.org/10.1007/s10479-013-1520-3


Published: 15 January 2014
Issue Date: July 2015
DOI: https://doi.org/10.1007/s10479-013-1520-3
