
Deep Learning of Markov Model-Based Machines for Determination of Better Treatment Option Decisions for Infertile Women

  • Original Article · Reproductive Sciences

Abstract

In this technical article, we propose ideas that we have been developing on how machine learning and deep learning techniques can potentially assist obstetricians/gynecologists in better clinical decision-making, using treatment options for infertile women, in combination with mathematical modeling in pregnant women, as examples.


References

  1. McDonnell J, Goverde AJ, Rutten FF, Vermeiden JP. Multivariate Markov Chain analysis of the probability of pregnancy in infertile couples undergoing assisted reproduction. Hum Reprod. 2002;17(1):103–6.

  2. Fiddelers AA, Dirksen CD, Dumoulin JC, van Montfoort A, Land JA, Janssen JM, et al. Cost-effectiveness of seven IVF strategies: results of a Markov decision-analytic model. Hum Reprod. 2009;24(7):1648–55.

  3. Rao ASRS, Diamond MP. Role of Markov modeling approaches to understand the impact of infertility treatments. Reprod Sci. 2017;11:1538–43.

  4. Hsieh MH, Meng MV, Turek PJ. Markov modeling of vasectomy reversal and ART for infertility: how do obstructive interval and female partner age influence cost effectiveness? Fertil Steril. 2007;88(4):840–6.

  5. Olive DL, Pritts EA. Markov modeling: questionable data in, questionable data out. Fertil Steril. 2008;89(3):746–7.

  6. Bartlett MS. Some evolutionary stochastic processes. J Roy Stat Soc Ser B. 1949;11:211–29.

  7. Kimura M. Solution of a process of random genetic drift with a continuous model. Proc Natl Acad Sci U S A. 1955;41(3):144–50.

  8. Kimura M. Some problems of stochastic processes in genetics. Ann Math Stat. 1957;28(4):882–901.

  9. Lantz B. Machine learning with R: expert techniques for predictive modeling to solve all your data analysis problems. 2nd ed. Birmingham: Packt Publishing; 2015.

  10. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. 2nd ed. Springer Series in Statistics. New York: Springer; 2009.

  11. Bandyopadhyay S, Pal SK. Classification and learning using genetic algorithms: applications in bioinformatics and web intelligence. Natural Computing Series. Berlin: Springer; 2007.

  12. Jordan MI, Mitchell TM. Machine learning: trends, perspectives, and prospects. Science. 2015;349(6245):255–60.

  13. Skansi S. Introduction to deep learning: from logical calculus to artificial intelligence. Undergraduate Topics in Computer Science. Cham: Springer; 2018.

  14. Goodfellow I, Bengio Y, Courville A. Deep learning. Adaptive Computation and Machine Learning. Cambridge: MIT Press; 2016.

  15. Chen XW, Lin X. Big data deep learning: challenges and perspectives. IEEE Access. 2014;2:514–25.

  16. Miller DD, Brown EW. Artificial intelligence in medical practice: the question to the answer? Am J Med. 2018;131(2):129–33.

  17. Dukkipati A, Ghoshdastidar D, Krishnan J. Mixture modeling with compact support distributions for unsupervised learning. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN); 2016. p. 2706–13.

  18. Van Messem A. Support vector machines, a robust prediction method with applications in bioinformatics. In: Rao ASRS, Rao CR, editors. Principles and methods for data science. Handbook of Statistics, vol. 43. Amsterdam: Elsevier/North-Holland; 2020.

  19. Abarbanel HDI, Rozdeba PJ, Shirman S. Machine learning: deepest learning as statistical data assimilation problems. Neural Comput. 2018;30(8):2025–55.

  20. Apolloni B, Bassis S. The randomness of the inferred parameters: a machine learning framework for computing confidence regions. Inf Sci. 2018;453:239–62.

  21. Martínez AM, Webb GI, Chen S, Zaidi NA. Scalable learning of Bayesian network classifiers. J Mach Learn Res. 2016;17: Paper No. 44, 35 pp.

  22. Nielsen F. What is… an information projection? Not Am Math Soc. 2018;65(3):321–4.

  23. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res. 2012;13:281–305.

  24. Bouveyron C, Latouche P, Mattei PA. Bayesian variable selection for globally sparse probabilistic PCA. Electron J Stat. 2018;12(2):3036–70.

  25. Veloso de Melo V, Banzhaf W. Automatic feature engineering for regression models with machine learning: an evolutionary computation and statistics hybrid. Inf Sci. 2018;430–431:287–313.

  26. Vidyasagar M. Machine learning methods in the computational biology of cancer. Proc R Soc Lond Ser A Math Phys Eng Sci. 2014;470(2167):20140081, 25 pp.

  27. Athey S, Imbens G. Recursive partitioning for heterogeneous causal effects. Proc Natl Acad Sci U S A. 2016;113(27):7353–60.

  28. Lee J, Wu Y, Kim H. Unbalanced data classification using support vector machines with active learning on scleroderma lung disease patterns. J Appl Stat. 2015;42(3):676–89.

  29. Kalidas Y. Machine learning algorithms, applications and practices in data science. In: Rao ASRS, Rao CR, editors. Principles and methods for data science. Handbook of Statistics, vol. 43. Amsterdam: Elsevier/North-Holland; 2020.

  30. Govindaraju V, Rao CR, editors. Machine learning: theory and applications. Handbook of Statistics, vol. 31. Amsterdam: Elsevier/North-Holland; 2013. xxiv+525 pp.

  31. Bishop CM. Model-based machine learning. Philos Trans R Soc Lond Ser A Math Phys Eng Sci. 2013;371(1984):20120222, 17 pp.

  32. Freno BA, Carlberg KT. Machine-learning error models for approximate solutions to parameterized systems of nonlinear equations. Comput Methods Appl Mech Eng. 2019;348:250–96.

  33. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436–44.

  34. Erhan D, Bengio Y, Courville A, Manzagol PA, Vincent P, Bengio S. Why does unsupervised pre-training help deep learning? J Mach Learn Res. 2010;11:625–60.

  35. Bengio Y, Lamblin P, Popovici D, Larochelle H. Greedy layer-wise training of deep networks. In: Schölkopf B, Platt J, Hoffman T, editors. Advances in Neural Information Processing Systems 19 (NIPS’06); 2007. p. 153–60.

  36. Yaron G, Yair H, Omri B, Guy N, Nicole F, Dekel G, et al. Identifying facial phenotypes of genetic disorders using deep learning. Nat Med. 2019;25:60–4.

  37. Komorowski M, Leo AC, Badawi O, Gordon AC, Faisal AA. The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nat Med. 2018;24:1716–20.

  38. Cherkassky M. Application of machine learning methods to medical diagnosis. Chance. 2009;22(1):42–50.

  39. Varun LK, Ryan S, David E. Holographic diagnosis of lymphoma. Nat Biomed Eng. 2018;2:631–2.

  40. Murthy KR, Singh S, Tuck D, Varadan V. Bayesian item response theory for cancer biomarker discovery. In: Rao ASRS, Rao CR, editors. Integrated population biology and modeling. Handbook of Statistics, vol. 40. Amsterdam: Elsevier/North-Holland; 2019.

  41. Kurmukov A, Dodonova Y, Zhukov LE. Machine learning application to human brain network studies: a kernel approach. In: Models, algorithms, and technologies for network analysis. Springer Proceedings in Mathematics & Statistics, vol. 197. Cham: Springer; 2017. p. 229–49.

  42. Saha A, Dewangan C, Narasimhan H, Sampath S, Agarwal S. Learning score systems for patient mortality prediction in intensive care units via orthogonal matching pursuit. In: Proceedings of the 13th International Conference on Machine Learning and Applications (ICMLA); 2014. p. 93–8.

  43. He JY, Wu X, Jiang YG, Peng Q, Jain R. Hookworm detection in wireless capsule endoscopy images with deep learning. IEEE Trans Image Process. 2018;27(5):2379–92.

  44. Gurve D, Krishnan S. Deep learning of EEG time-frequency representations for identifying eye states. Adv Data Sci Adapt Anal. 2018;10(2):1840006, 13 pp.

  45. Carneiro G, Nascimento JC, Freitas A. The segmentation of the left ventricle of the heart from ultrasound data using deep learning architectures and derivative-based search methods. IEEE Trans Image Process. 2012;21(3):968–82.

  46. Rueda A, Krishnan S. Clustering Parkinson’s and age-related voice impairment signal features for unsupervised learning. Adv Data Sci Adapt Anal. 2018;10(2):1840007, 24 pp.

  47. Ustun B, Rudin C. Supersparse linear integer models for optimized medical scoring systems. Mach Learn. 2016;102(3):349–91.

  48. Agarwal S, Niyogi P. Generalization bounds for ranking algorithms via algorithmic stability. J Mach Learn Res. 2009;10:441–74.

  49. Tu C. Comparison of various machine learning algorithms for estimating generalized propensity score. J Stat Comput Simul. 2019;89(4):708–19.

  50. Patel H, Thakkar A, Pandya M, Makwana K. Neural network with deep learning architectures. J Inf Optim Sci. 2018;39(1):31–8.

  51. Polson NG, Sokolov V. Deep learning: a Bayesian perspective. Bayesian Anal. 2017;12(4):1275–304.

  52. Jiequn H, Arnulf J, Weinan E. Solving high-dimensional partial differential equations using deep learning. Proc Natl Acad Sci U S A. 2018;115(34):8505–10.

  53. Agarwal N, Bullins B, Hazan E. Second-order stochastic optimization for machine learning in linear time. J Mach Learn Res. 2017;18: Paper No. 116, 40 pp.

  54. Sirignano J, Spiliopoulos K. DGM: a deep learning algorithm for solving partial differential equations. J Comput Phys. 2018;375:1339–64.

  55. Ye JC, Han Y, Cha E. Deep convolutional framelets: a general deep learning framework for inverse problems. SIAM J Imaging Sci. 2018;11(2):991–1048.

  56. Pan S, Duraisamy K. Data-driven discovery of closure models. SIAM J Appl Dyn Syst. 2018;17(4):2381–413.

  57. Mahsereci M, Hennig P. Probabilistic line searches for stochastic optimization. J Mach Learn Res. 2017;18: Paper No. 119, 59 pp.

  58. Schwab C, Zech J. Deep learning in high dimension: neural network expression rates for generalized polynomial chaos expansions in UQ. Anal Appl (Singap). 2019;17(1):19–55.

  59. Srivastava N, Salakhutdinov R. Multimodal learning with deep Boltzmann machines. J Mach Learn Res. 2014;15:2949–80.

  60. Salakhutdinov R, Hinton G. An efficient learning procedure for deep Boltzmann machines. Neural Comput. 2012;24(8):1967–2006.

  61. Jiang B, Wu TY, Zheng C, Wong WH. Learning summary statistic for approximate Bayesian computation via deep neural network. Stat Sin. 2017;27(4):1595–618.

  62. Deng Y, Bao F, Deng X, Wang R, Kong Y, Dai Q. Deep and structured robust information theoretic learning for image analysis. IEEE Trans Image Process. 2016;25(9):4209–21.

  63. Baldi P, Sadowski P, Lu Z. Learning in the machine: random backpropagation and the deep learning channel. Artif Intell. 2018;260:1–35.

  64. Chao D, Zhu J, Zhang B. Learning deep generative models with doubly stochastic gradient MCMC. IEEE Trans Neural Netw Learn Syst. 2018;29(7):3084–96.

  65. Poggio T, Smale S. The mathematics of learning: dealing with data. Not Am Math Soc. 2003;50(5):537–44.


Acknowledgments

We thank the following individuals, listed in alphabetical order by last name, for their very valuable comments: Medina Jackson-Browne (Brown University, Providence), N.V. Joshi (Indian Institute of Science, Bangalore), K. Praveen (Microsoft, Irvine), and P. Sashank (CEO, Exactco, Hyderabad).

Author information

Corresponding author

Correspondence to Arni S.R. Srinivasa Rao.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

Machine Learning Algorithm for Fertility Treatment Outcome (MLAFTO)

Let x({c, d, g, o}) be the probability that an infertile woman with characteristics ∗, where ∗ = 1, 2, …, k × l × m × n, who is on the ith ovulation induction (OI) treatment or IVF for i = 1, 2, …, p, conceives or delivers a live birth.

We compute x values for all ∗ combinations.

We compute \( \underset{i}{\max}\left(\underset{\ast }{\max }x\right) \) and \( \underset{\ast }{\max}\left(\underset{i}{\max }x\right) \). Once the maximum probabilities are computed, ranking these probabilities over the various ∗ and i provides the relative chances of conceiving and delivering a live birth. The AI component then matches these combinations of ∗ and i with a new couple who comes to the clinic and suggests their probability of conceiving and delivering a live birth (see Appendix 2 for computation of the probabilities and descriptions of the max functions).
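To make the matching and ranking step concrete, the following is a minimal Python sketch, not the authors' implementation; the probability table, the characteristic labels, and the function names are hypothetical placeholders standing in for the x values computed as in Appendix 2.

```python
# Hypothetical sketch of the MLAFTO matching/ranking step (illustrative only).
# 'prob_table' maps (background characteristics, treatment index) to an estimated
# probability of conceiving and delivering a live birth, computed as in Appendix 2.
from typing import Dict, Tuple

Profile = Tuple[str, str, str, str]   # e.g. ("c5", "d4", "g1", "o1")

def best_treatment_per_profile(prob_table: Dict[Tuple[Profile, int], float]):
    """For each background profile *, take the maximum over treatments i (max_i x)."""
    best: Dict[Profile, Tuple[int, float]] = {}
    for (profile, treatment), p in prob_table.items():
        if profile not in best or p > best[profile][1]:
            best[profile] = (treatment, p)
    return best

def rank_options_for_new_couple(prob_table, new_profile: Profile):
    """Match a new couple's profile B_N to the stored profiles and rank treatments."""
    options = [(t, p) for (profile, t), p in prob_table.items() if profile == new_profile]
    return sorted(options, key=lambda tp: tp[1], reverse=True)

# Made-up numbers: two profiles, three treatments (i = 1, 2, 3).
prob_table = {
    (("c5", "d4", "g1", "o1"), 1): 0.21,
    (("c5", "d4", "g1", "o1"), 2): 0.34,
    (("c5", "d4", "g1", "o1"), 3): 0.18,
    (("c1", "d3", "g6", "o2"), 1): 0.12,
    (("c1", "d3", "g6", "o2"), 2): 0.15,
    (("c1", "d3", "g6", "o2"), 3): 0.29,
}

print(best_treatment_per_profile(prob_table))
print(rank_options_for_new_couple(prob_table, ("c5", "d4", "g1", "o1")))
```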

Appendix 2

Computation of Probabilities Through Markov Chains

In this Appendix, we propose a Markov Chain-based approach to computing the probabilities of conception and of delivering a live birth under various treatment options.

Suppose we want to compute the probability of conception, and then of delivering a baby, for an infertile woman with a combination of background variables, say {c5, d4, g1, o1}, and with the OIi or IVF treatments explained in the paper. Let B1 be the set of all infertile women with background variables {c5, d4, g1, o1} who will be on the OIi or IVF treatment options. Let \( {x}_i{\left({B}_1\right)}_{jc}^{(T)} \) be the probability that an infertile woman in state j with characteristics {c5, d4, g1, o1}, on the ith ovulation induction (OI) treatment or IVF for i = 1, 2, …, p, conceives within T time steps. Let \( {x}_i{\left({B}_1\right)}_{jb}^{(T)} \) be the corresponding probability of delivering a baby, and let \( {x}_i{\left({B}_1\right)}_{cb} \) be the probability that a baby is born to a woman in B1 given that she has conceived. These three probabilities can be computed using the formulas below:

$$ {x}_i{\left({B}_1\right)}_{jc}^{(T)}=\frac{\underset{s\in {B}_1}{\int }{W}_{i,T}^{j\to c}(s)\, ds}{\underset{s\in {B}_1}{\int }{W}_i^j(s)\, ds} \qquad \left({A}_{2.1}\right) $$
$$ {x}_i{\left({B}_1\right)}_{jb}^{(T)}=\frac{\underset{s\in {B}_1}{\int }{W}_{i,T}^{j\to b}(s)\, ds}{\underset{s\in {B}_1}{\int }{W}_i^j(s)\, ds} \qquad \left({A}_{2.2}\right) $$
$$ {x}_i{\left({B}_1\right)}_{cb}=\frac{\underset{s\in {B}_1}{\int }{W}_{i,T}^{c\to b}(s)\, ds}{\underset{s\in {B}_1}{\int }{W}_i^c(s)\, ds} \qquad \left({A}_{2.3}\right) $$

where \( {W}_{i,T}^{j\to c}(s) \) indicates that the sth infertile woman in state j, who is on the ith treatment, conceives within T time steps, and \( {W}_i^j(s) \) indicates that the sth infertile woman in state j is on the ith treatment. \( \underset{s\in {B}_1}{\int }{W}_{i,T}^{j\to c}(s)\, ds \) is the total number of infertile women in the set B1 on the ith treatment who have moved from state j to state c, and \( \underset{s\in {B}_1}{\int }{W}_i^j(s)\, ds \) is the total number of women in the set B1 in state j who are on the ith treatment. If B2 is another set of infertile women with different background variables, say B2 = {c1, d3, g6, o2}, with OIi or IVF treatments, then we can compute the corresponding transition probabilities by formulas analogous to (A2.1)–(A2.3).

The probability of transition from state c to state b does not depend on \( \underset{s\in {B}_1}{\int }{W}_i^j(s)\, ds \) but only on \( \underset{s\in {B}_1}{\int }{W}_i^c(s)\, ds \), so the random variable governing the transition between these two states, say Y, obeys the Markov property. Moreover, the transition probability matrix Pi(B1) over the states {j, c, b} for the set of infertile women B1 who are on the ith treatment can be written as:

$$ {P}_i\left({B}_1\right)=\begin{array}{c|ccc} & j & c & b \\ \hline j & {x}_i{\left({B}_1\right)}_{jj} & {x}_i{\left({B}_1\right)}_{jc} & {x}_i{\left({B}_1\right)}_{jb} \\ c & {x}_i{\left({B}_1\right)}_{cj} & {x}_i{\left({B}_1\right)}_{cc} & {x}_i{\left({B}_1\right)}_{cb} \\ b & {x}_i{\left({B}_1\right)}_{bj} & {x}_i{\left({B}_1\right)}_{bc} & {x}_i{\left({B}_1\right)}_{bb} \end{array} $$

where xi(B1)jj + xi(B1)jc = 1, xi(B1)cc + xi(B1)cb = 1, and xi(B1)bb = 1. Here xi(B1)jb = 0 due to the Markov property, whereas xi(B1)cj = xi(B1)bj = xi(B1)bc = 0 because the transitions c → j, b → j, and b → c are impossible. Similarly, we will compute:

$$ {P}_i\left({B}_{\ast}\right)=\begin{array}{c|ccc} & j & c & b \\ \hline j & {x}_i{\left({B}_{\ast}\right)}_{jj} & {x}_i{\left({B}_{\ast}\right)}_{jc} & {x}_i{\left({B}_{\ast}\right)}_{jb} \\ c & {x}_i{\left({B}_{\ast}\right)}_{cj} & {x}_i{\left({B}_{\ast}\right)}_{cc} & {x}_i{\left({B}_{\ast}\right)}_{cb} \\ b & {x}_i{\left({B}_{\ast}\right)}_{bj} & {x}_i{\left({B}_{\ast}\right)}_{bc} & {x}_i{\left({B}_{\ast}\right)}_{bb} \end{array} $$

for ∗ = 1, 2, …, k × l × m × n. Let \( {W}_i^j\left({B}_{\ast}\right) \) be the number of infertile women (in state j) with background characteristics B∗ who are on the ith treatment, and let \( {W}^j\left({B}_{\ast}\right) \) be the total number of infertile women with background characteristics B∗, such that:

$$ {W}^j\left({B}_{\ast}\right)=\bigcup \limits_{i=1}^p{W}_i^j\left({B}_{\ast}\right) $$
$$ \bigcap \limits_{i=1}^p{W}_i^j\left({B}_{\ast}\right)=\varnothing \left(\mathrm{empty}\ \mathrm{set}\right). $$

Once Pi(B∗) is computed based on a certain design of the sample population, the sizes of \( {W}_i^j\left({B}_{\ast}\right) \) are not changed when computing probabilities using (A2.1)–(A2.3). That is, the matrix Pi(B∗) is not updated with newer women who start treatment after the designed time interval.
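The following is a minimal sketch, assuming the integrals over B1 have already been reduced to counts from such a designed sample; the counts and variable names are made up, and the block only illustrates how (A2.1)–(A2.3) and the structure of Pi(B1) fit together.

```python
import numpy as np

# Minimal sketch of (A2.1)-(A2.3) and P_i(B_1), assuming the integrals over B_1
# have already been reduced to counts. All numbers are made up.
n_state_j = 200   # women in B_1 in state j on the ith treatment
n_j_to_c  = 60    # of those, women who conceived within T time steps
n_c_to_b  = 45    # of those, women who went on to deliver a live birth

x_jc = n_j_to_c / n_state_j   # (A2.1)
x_cb = n_c_to_b / n_j_to_c    # (A2.3): conditional on having conceived
x_jb = 0.0                    # direct j -> b transition excluded by the model

# Transition matrix over the states (j, c, b): rows sum to 1,
# c -> j, b -> j, and b -> c are impossible, and b is absorbing.
P_i_B1 = np.array([
    [1.0 - x_jc, x_jc,       x_jb],
    [0.0,        1.0 - x_cb, x_cb],
    [0.0,        0.0,        1.0 ],
])

assert np.allclose(P_i_B1.sum(axis=1), 1.0)
print(P_i_B1)
```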

Two maximization functions are used: \( \underset{i}{\max}\left\{{x}_i{\left({B}_{\ast}\right)}_{jc}\right\} \) and \( \underset{\ast }{\max}\left\{{x}_i{\left({B}_{\ast}\right)}_{jc}\right\} \).

The function \( \underset{i}{\max}\left\{{x}_i{\left({B}_{\ast}\right)}_{jc}\right\} \) gives the maximum of the probability values for women with background characteristics B∗ across all the treatments, which is obtained as:

$$ \max \left\{\frac{\underset{s\in {B}_{\ast }}{\int }{W}_{1,T}^{j\to c}(s)\, ds}{\underset{s\in {B}_{\ast }}{\int }{W}_1^j(s)\, ds},\frac{\underset{s\in {B}_{\ast }}{\int }{W}_{2,T}^{j\to c}(s)\, ds}{\underset{s\in {B}_{\ast }}{\int }{W}_2^j(s)\, ds},\dots \right\} \qquad \left({A}_{2.4}\right) $$

Through the expression (A2.4), we obtain k × l × m × n maximum values, where each maximum value represents the maximum probability of conceiving for an infertile woman from a particular set of background characteristics, together with the treatment type for which this maximum is attained. Similarly, we can construct \( \underset{i}{\max}\left\{{x}_i{\left({B}_{\ast}\right)}_{cb}\right\} \). The function \( \underset{\ast }{\max}\left\{{x}_i{\left({B}_{\ast}\right)}_{jc}\right\} \) gives the maximum probability of conceiving among women on the ith treatment across the different background characteristics, which is obtained as:

$$ \max \left\{\frac{\underset{s\in {B}_1}{\int }{W}_{i,T}^{j\to c}(s)\, ds}{\underset{s\in {B}_1}{\int }{W}_i^j(s)\, ds},\frac{\underset{s\in {B}_2}{\int }{W}_{i,T}^{j\to c}(s)\, ds}{\underset{s\in {B}_2}{\int }{W}_i^j(s)\, ds},\dots \right\} \qquad \left({A}_{2.5}\right) $$
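As a toy illustration (not from the paper), if the probabilities xi(B∗)jc are arranged in an array with one row per background profile ∗ and one column per treatment i, then (A2.4) corresponds to row-wise maxima and (A2.5) to column-wise maxima; all numbers below are invented.

```python
import numpy as np

# Toy illustration of (A2.4) and (A2.5): rows index background profiles (*),
# columns index treatments (i); entries are made-up conception probabilities.
x_jc = np.array([
    [0.21, 0.34, 0.18],    # profile B_1
    [0.12, 0.15, 0.29],    # profile B_2
    [0.25, 0.22, 0.20],    # profile B_3
])

best_over_treatments = x_jc.max(axis=1)     # (A2.4): max over i for each profile *
best_treatment_index = x_jc.argmax(axis=1)  # which treatment attains each maximum
best_over_profiles   = x_jc.max(axis=0)     # (A2.5): max over * for each treatment i

print(best_over_treatments, best_treatment_index, best_over_profiles)
```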


See also Fig. 2 for the disjoint property, noted above, of infertile women within each set of background characteristics.

Fig. 2. Disjoint sets of infertile women across all the treatment options within the background characteristics (B∗)

Result: The total number of infertile women with background characteristics B∗ can be written as the union of disjoint sets of women across all the treatment options, i.e.,

$$ {W}^j\left({B}_{\ast}\right)=\bigcup \limits_{i=1}^p{W}_i^j\left({B}_{\ast}\right), \qquad \text{equivalently}\qquad \underset{s\in {B}_{\ast }}{\int }{W}^j(s)\, ds=\sum \limits_{i=1}^p\underset{s\in {B}_{\ast }}{\int }{W}_i^j(s)\, ds \qquad \left({A}_{2.6}\right) $$

Appendix 3

Machine Learning Versus Deep Learning in Computing Probabilities of Conception and Delivery

Suppose a new infertile woman with background characteristics {BN} is interested in starting one of the available treatments, OIi or IVF. Let us consider how machine learning techniques are applied to decide which of the treatments will give the maximum chance of conception and of delivering a baby. Prior to the decision-making process on treatment options for this woman, suppose that the probabilities of conception and delivery were previously computed through the MLAFTO explained in Appendix 1 and the matrices Pi(B∗) for all ∗ in Appendix 2. The data used for these two computations are usually predetermined or pre-designed, i.e., the time frame and other design aspects of the data were well defined and are free of data-related errors. MLAFTO matches the new infertile woman's characteristic set BN with the sets {B∗ : ∗ = 1, 2, …, k × l × m × n}. Let {By} be the set that matches the new woman's characteristics, such that {By} − {BN} = ∅ (the null set). The corresponding values of

x(By)jc and x(By)cb

are considered the chances of conception and of delivery for the new woman who comes to the clinic.

Note that the success or failure data of the woman with {BN} is not used in the computation of Pi(B∗) for any ∗, which is the key feature of a machine learning type of algorithm.

If each treatment trial of a woman, whether or not she conceives, is considered as one time step of treatment (one treatment cycle), and the duration from conception to whether or not a baby is delivered is considered as one time step of pregnancy (one pregnancy cycle), and if \( x{\left({B}_y\right)}_{jc}^{(n)} \) and \( x{\left({B}_y\right)}_{cb}^{(m)} \) denote the corresponding n-step and m-step (cycle) probabilities, then by the Markov property we have

$$ x{\left({B}_y\right)}_{jc}^{(n)}\times x{\left({B}_y\right)}_{cb}^{(m)}=x{\left({B}_y\right)}_{jb}^{\left(n+m\right)} $$

When another infertile woman with background characteristics {BM} comes to the clinic to decide which type of treatment is needed for a successful delivery, the previously computed transition probability matrices Pi(B∗) that were used in matching the woman with {BN} are not updated with her success or failure information. In this sense, the matrices Pi(B∗) are static when we use machine learning algorithms; they are not influenced by new data generated on newer infertile women who come to the clinic after Pi(B∗) is constructed.

Once an infertile woman with background characteristics {BM} walks into the clinic, if deep learning techniques are implemented to predict the probabilities of conceiving (say, y(BM)jc) and of delivery (say, y(BM)cb), then the computation of these probabilities differs from the machine learning techniques. Each time a new infertile woman with {BM} comes to the clinic for treatment purposes, instead of the matching procedure with the existing static model explained above, deep learning involves reconstructing the transition probability matrices Pi(B∗), for i = 1, 2, …, p, for conceiving and delivery with whatever data are available prior to the arrival of the woman with {BM}. The rest of the computational procedures explained in Appendix 2 remain the same. Deep learning techniques usually delay the output because Pi(B∗) is reconstructed each and every time a new infertile woman comes to the clinic.
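To illustrate the distinction (a schematic sketch, not the authors' software; the record format and class names are hypothetical), the "machine learning" regime below freezes the probability table computed from the design data, while the "deep learning" regime recomputes it from all accumulated records before every prediction:

```python
# Schematic contrast between the two regimes described above (illustrative only).
def build_probability_table(records):
    """records: iterable of (profile, treatment, conceived: bool) -> x(B_*)_jc estimates."""
    counts = {}
    for profile, treatment, conceived in records:
        total, success = counts.get((profile, treatment), (0, 0))
        counts[(profile, treatment)] = (total + 1, success + int(conceived))
    return {key: s / t for key, (t, s) in counts.items()}

class StaticPredictor:            # "machine learning" regime: table frozen at design time
    def __init__(self, design_records):
        self.table = build_probability_table(design_records)
    def predict(self, profile, treatment):
        return self.table.get((profile, treatment))

class RefittingPredictor:         # "deep learning" regime: rebuilt before every prediction
    def __init__(self, design_records):
        self.records = list(design_records)
    def observe(self, record):
        self.records.append(record)   # new outcomes feed back into later estimates
    def predict(self, profile, treatment):
        return build_probability_table(self.records).get((profile, treatment))
```

In this sketch the refitting predictor pays the cost of rebuilding the table on every call, which mirrors the output delay mentioned above.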

General introductions to machine learning techniques, motivations, and key ideas, as explained across a variety of research areas, can be found in [9,10,11,12,13]. Specific ideas related to deep learning techniques have also been well developed [14], deep learning techniques and applications have been summarized [15], and an overview of the importance of machine learning algorithms in medicine can be found in [16]. As explained in our article, machine learning and deep learning techniques broadly use the same data for the same specific goals, but their approaches to handling the data and models distinguish them from each other. Statistical thinking has contributed to several aspects of machine learning, for example, in developing computationally intensive data classification algorithms, methods for data search and matching probabilities, data mining techniques, model classification and model fitting algorithms, and combinations of all these (see, e.g., [17,18,19,20,21,22,23,24,25,26,27,28,29]); for a collection of articles related to statistical methods in machine learning, see [30]. Model-based machine learning methods have been developed [31], and the construction of coefficients in a regression model can benefit from machine learning methods [32].

Deep learning techniques, instead of focusing on model-based approaches, assist in understanding intricate structures of large data sets and the various interlinkages between these data sets [33]. The importance of unsupervised pre-training for the structural architecture, and hypotheses for testing the design effects of such experiments, are well studied [34, 35]. Deep learning and machine learning techniques can also assist with questions related to health informatics, disease detection, item response theories, and bioinformatics research [36,37,38,39,40,41]. There are also successful deep learning methods that score patients in the intensive care unit (ICU) for severity and predict mortality without using any model-based assumptions in the scoring systems [42], as well as other medical applications, for example, detection of worms through endoscopy [43], ophthalmology studies [44], cardiovascular studies [45], Parkinson's disease data [46], and medical scoring systems [47]. Deep learning procedures involving various levels of abstraction for ranking system models can be found in [48, 49]; applications to mathematical models, parameter computations, and stability of algorithms are found in [50,51,52,53,54,55,56].

Statistical and stochastic modeling principles have been applied in deep learning algorithms to strengthen object search capabilities or to improve model fitting under uncertainty [32, 57, 58]. Boltzmann machines assist in a deeper understanding of the data by linking layer-level structured data and then estimating model parameters through maximum likelihood methods [59, 60]. Random backpropagation and backpropagation methods help in forming stochastic transition matrices and in computing quicker search algorithms for higher-dimensional stochastic matrices; literature related to backpropagation can be found in several places, for example [61,62,63,64]. A survey of statistical learning algorithms and their performance evaluations can be found in [65].

Appendix 4

Theorems

Theorem A.1: When Wj is the total number of infertile women (state j) whose data are used in the machine learning algorithm, and δ ∈ [1, klmn] and α ∈ [1, p] are treated as continuous indices for the background characteristics and treatment options, respectively, then

$$ \frac{1}{pklmn}\left[\underset{\delta =1}{\overset{klmn}{\int }}\underset{\alpha =1}{\overset{p}{\int }}\frac{W_{\alpha}^{j\to c}\left({B}_{\delta}\right)}{W_{\alpha}^j\left({B}_{\delta}\right)}\, d\alpha\, d\delta +\underset{\delta =1}{\overset{klmn}{\int }}\underset{\alpha =1}{\overset{p}{\int }}\frac{W_{\alpha}^{c\to b}\left({B}_{\delta}\right)}{W_{\alpha}^j\left({B}_{\delta}\right)}\, d\alpha\, d\delta \right]\le 1 $$

Proof: We have,

$$ {W}^j\left({B}_{\delta}\right)={\int}_1^p{W}_{\alpha}^j\left({B}_{\delta}\right)\, d\alpha \qquad \left({A}_{5.1}\right) $$

and

$$ {W}^j={\int}_1^{klmn}{\int}_1^p{W}_{\alpha}^j\left({B}_{\delta}\right)\, d\alpha\, d\delta \qquad \left({A}_{5.2}\right) $$

Note that,

$$ \frac{W_1^{j\to c}\left({B}_{\delta}\right)}{W_1^j\left({B}_{\delta}\right)}+\frac{W_2^{j\to c}\left({B}_{\delta}\right)}{W_2^j\left({B}_{\delta}\right)}+\dots +\frac{W_p^{j\to c}\left({B}_{\delta}\right)}{W_p^j\left({B}_{\delta}\right)}\le p\dots ..\dots .\left({A}_{5.3}\right) $$

and

$$ \frac{W_1^{c\to b}\left({B}_{\delta}\right)}{W_1^j\left({B}_{\delta}\right)}+\frac{W_2^{c\to b}\left({B}_{\delta}\right)}{W_2^j\left({B}_{\delta}\right)}+\dots +\frac{W_p^{c\to b}\left({B}_{\delta}\right)}{W_p^j\left({B}_{\delta}\right)}\le p\dots ..\dots .\left({A}_{5.4}\right) $$

From the inequality (A5.3), we obtain

$$ {\int}_1^{klmn}{\int}_1^p\frac{W_{\alpha}^{j\to c}\left({B}_{\delta}\right)}{W_{\alpha}^j\left({B}_{\delta}\right)} d\alpha d\delta \le pklmn\dots ..\dots .\left({A}_{5.5}\right) $$

From the inequality (A5.4), we obtain

$$ {\int}_1^{klmn}{\int}_1^p\frac{W_{\alpha}^{c\to b}\left({B}_{\delta}\right)}{W_{\alpha}^j\left({B}_{\delta}\right)} d\alpha d\delta \le pklmn\dots ..\dots .\left({A}_{5.6}\right) $$

The required result is deduced from the two inequalities (A5.5) and (A5.6).

Theorem A.2: For continuous α and δ, we have

$$ \left({\int}_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta \right)\left({\int}_{\delta }{W}_{\alpha}^b\left({B}_{\delta}\right) d\delta \right)\le {\left({\int}_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right) d\delta \right)}^2 $$

Proof: We know,

$$ \frac{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right)\, d\delta}{\int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right)\, d\delta}=\begin{cases}0 & \text{if no infertile woman with }\alpha \text{ conceives}\\ 1 & \text{if every infertile woman with }\alpha \text{ conceives}\\ \theta \in \left(0,1\right) & \text{if at least one, but not every, woman with }\alpha \text{ conceives}\end{cases} $$
$$ \frac{\int_{\delta }{W}_{\alpha}^b\left({B}_{\delta}\right)\, d\delta}{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right)\, d\delta}=\begin{cases}0 & \text{if no conceived woman with }\alpha \text{ delivers}\\ 1 & \text{if every conceived woman with }\alpha \text{ delivers}\\ \gamma \in \left(0,1\right) & \text{if at least one, but not every, conceived woman with }\alpha \text{ delivers}\end{cases} $$

These imply,

$$ 0\le \frac{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta}{\int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right) d\delta}\le 1\dots ..\dots .\left({A}_{5.7}\right) $$
$$ 0\le \frac{\int_{\delta }{W}_{\alpha}^b\left({B}_{\delta}\right) d\delta}{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta}\le 1\dots ..\dots .\left({A}_{5.8}\right) $$

From (A5.7) and (A5.8), we can deduce the required result: (A5.7) gives \( \int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right)\, d\delta \le \int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right)\, d\delta \), (A5.8) together with (A5.7) gives \( \int_{\delta }{W}_{\alpha}^b\left({B}_{\delta}\right)\, d\delta \le \int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right)\, d\delta \), and multiplying these two bounds yields the stated inequality.
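A quick numerical sanity check of this inequality with made-up counts (purely illustrative; the array sizes and ranges are arbitrary) can be done as follows, using the fact that deliveries never exceed conceptions and conceptions never exceed the number of women in state j:

```python
import numpy as np

# Numerical sanity check of the Theorem A.2 inequality with made-up counts.
rng = np.random.default_rng(0)

n_profiles = 8                                                  # role of the delta index
w_j = rng.integers(50, 200, size=n_profiles)                    # women in state j
w_c = np.minimum(w_j, rng.integers(0, 200, size=n_profiles))    # conceived <= state j
w_b = np.minimum(w_c, rng.integers(0, 200, size=n_profiles))    # delivered <= conceived

lhs = w_c.sum() * w_b.sum()
rhs = w_j.sum() ** 2
assert lhs <= rhs
print(lhs, "<=", rhs)
```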

Theorem A.3: Let f : A → ℝ+ and g : B → ℝ+, where A is the set of fractions in (A5.7) and B is the set of all fractions in (A5.8); then f and g are defined only at the adherent points of A and B, respectively.

Proof: Note that,

$$ \min \left({\int}_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta \right)=\min \left({\int}_{\delta }{W}_{\alpha}^b\left({B}_{\delta}\right) d\delta \right)=0 $$

and

$$ \max \left({\int}_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta \right)=\left({\int}_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right) d\delta \right) $$

Two sets A and B are constructed from (A5.7) and (A5.8) as

$$ A=\left\{0,\frac{1}{\int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right) d\delta},\frac{2}{\int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right) d\delta},\dots, 1\right\}\dots ..\dots .\left({A}_{5.9}\right) $$
$$ B=\left\{0,\frac{1}{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta},\frac{2}{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta},\dots, 1\right\}\dots ..\dots .\left({A}_{5.10}\right) $$

From the elements of the set A as in (A5.9), f is not defined on the open subintervals

$$ \left(0,\frac{1}{\int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right) d\delta}\right),\left(\frac{1}{\int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right) d\delta},\frac{2}{\int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right) d\delta}\right),\dots $$

and from the elements of the set B as in (A5.10), g is not defined on the open subintervals

$$ \left(0,\frac{1}{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta}\right),\left(\frac{1}{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta},\frac{2}{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta}\right),\dots . $$

Hence, f and g are defined only at the adherent points of A and B.


Cite this article

Srinivasa Rao, A.S., Diamond, M.P. Deep Learning of Markov Model-Based Machines for Determination of Better Treatment Option Decisions for Infertile Women. Reprod. Sci. 27, 763–770 (2020). https://doi.org/10.1007/s43032-019-00082-9
