Deep Learning of Markov Model-Based Machines for Determination of Better Treatment Option Decisions for Infertile Women



In this technical article, we propose ideas we have been developing on how machine learning and deep learning techniques can assist obstetricians and gynecologists in clinical decision-making, using the treatment options of infertile women, in combination with mathematical modeling of pregnancy, as examples.


Fig. 1.


References

  1. McDonnell J, Goverde AJ, Rutten FF, Vermeiden JP. Multivariate Markov chain analysis of the probability of pregnancy in infertile couples undergoing assisted reproduction. Hum Reprod. 2002;17(1):103–6.

  2. Fiddelers AA, Dirksen CD, Dumoulin JC, van Montfoort A, Land JA, Janssen JM, et al. Cost-effectiveness of seven IVF strategies: results of a Markov decision-analytic model. Hum Reprod. 2009;24(7):1648–55.

  3. Rao ASRS, Diamond MP. Role of Markov modeling approaches to understand the impact of infertility treatments. Reprod Sci. 2017;11:1538–43.

  4. Hsieh MH, Meng MV, Turek PJ. Markov modeling of vasectomy reversal and ART for infertility: how do obstructive interval and female partner age influence cost effectiveness? Fertil Steril. 2007;88(4):840–6.

  5. Olive DL, Pritts EA. Markov modeling: questionable data in, questionable data out. Fertil Steril. 2008;89(3):746–7.

  6. Bartlett MS. Some evolutionary stochastic processes. J Roy Stat Soc Ser B. 1949;11:211–29.

  7. Kimura M. Solution of a process of random genetic drift with a continuous model. Proc Natl Acad Sci U S A. 1955;41(3):144–50.

  8. Kimura M. Some problems of stochastic processes in genetics. Ann Math Stat. 1957;28(4):882–901.

  9. Lantz B. Machine learning with R: expert techniques for predictive modeling to solve all your data analysis problems. 2nd ed. Birmingham: Packt Publishing; 2015.

  10. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. 2nd ed. Springer Series in Statistics. New York: Springer; 2009.

  11. Bandyopadhyay S, Pal SK. Classification and learning using genetic algorithms: applications in bioinformatics and web intelligence. Natural Computing Series. Berlin: Springer; 2007.

  12. Jordan MI, Mitchell TM. Machine learning: trends, perspectives, and prospects. Science. 2015;349(6245):255–60.

  13. Skansi S. Introduction to deep learning: from logical calculus to artificial intelligence. Undergraduate Topics in Computer Science. Cham: Springer; 2018.

  14. Goodfellow I, Bengio Y, Courville A. Deep learning. Adaptive Computation and Machine Learning. Cambridge: MIT Press; 2016.

  15. Chen XW, Lin X. Big data deep learning: challenges and perspectives. IEEE Access. 2014;2:514–25.

  16. Miller DD, Brown EW. Artificial intelligence in medical practice: the question to the answer? Am J Med. 2018;131(2):129–33.

  17. Dukkipati A, Ghoshdastidar D, Krishnan J. Mixture modeling with compact support distributions for unsupervised learning. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN); 2016. p. 2706–13.

  18. Van Messem A. Support vector machines: a robust prediction method with applications in bioinformatics. In: Rao ASRS, Rao CR, editors. Principles and methods for data science. Handbook of Statistics, vol. 43. Amsterdam: Elsevier/North-Holland; 2020.

  19. Abarbanel HDI, Rozdeba PJ, Shirman S. Machine learning: deepest learning as statistical data assimilation problems. Neural Comput. 2018;30(8):2025–55.

  20. Apolloni B, Bassis S. The randomness of the inferred parameters: a machine learning framework for computing confidence regions. Inf Sci. 2018;453:239–62.

  21. Martínez AM, Webb GI, Chen S, Zaidi NA. Scalable learning of Bayesian network classifiers. J Mach Learn Res. 2016;17, Paper No. 44, 35 pp.

  22. Nielsen F. What is… an information projection? Not Am Math Soc. 2018;65(3):321–4.

  23. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res. 2012;13:281–305.

  24. Bouveyron C, Latouche P, Mattei PA. Bayesian variable selection for globally sparse probabilistic PCA. Electron J Stat. 2018;12(2):3036–70.

  25. Veloso de Melo V, Banzhaf W. Automatic feature engineering for regression models with machine learning: an evolutionary computation and statistics hybrid. Inf Sci. 2018;430–431:287–313.

  26. Vidyasagar M. Machine learning methods in the computational biology of cancer. Proc R Soc Lond Ser A Math Phys Eng Sci. 2014;470(2167):20140081, 25 pp.

  27. Athey S, Imbens G. Recursive partitioning for heterogeneous causal effects. Proc Natl Acad Sci U S A. 2016;113(27):7353–60.

  28. Lee J, Wu Y, Kim H. Unbalanced data classification using support vector machines with active learning on scleroderma lung disease patterns. J Appl Stat. 2015;42(3):676–89.

  29. Kalidas Y. Machine learning algorithms, applications and practices in data science. In: Rao ASRS, Rao CR, editors. Principles and methods for data science. Handbook of Statistics, vol. 43. Amsterdam: Elsevier/North-Holland; 2020.

  30. Govindaraju V, Rao CR, editors. Machine learning: theory and applications. Handbook of Statistics, vol. 31. Amsterdam: Elsevier/North-Holland; 2013. xxiv+525 pp.

  31. Bishop CM. Model-based machine learning. Philos Trans R Soc Lond Ser A Math Phys Eng Sci. 2013;371(1984):20120222, 17 pp.

  32. Freno BA, Carlberg KT. Machine-learning error models for approximate solutions to parameterized systems of nonlinear equations. Comput Methods Appl Mech Eng. 2019;348:250–96.

  33. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436–44.

  34. Erhan D, Bengio Y, Courville A, Manzagol PA, Vincent P, Bengio S. Why does unsupervised pre-training help deep learning? J Mach Learn Res. 2010;11:625–60.

  35. Bengio Y, Lamblin P, Popovici D, Larochelle H. Greedy layer-wise training of deep networks. In: Schölkopf B, Platt J, Hoffman T, editors. Advances in Neural Information Processing Systems 19 (NIPS'06); 2007. p. 153–60.

  36. Yaron G, Yair H, Omri B, Guy N, Nicole F, Dekel G, et al. Identifying facial phenotypes of genetic disorders using deep learning. Nat Med. 2019;25:60–4.

  37. Komorowski M, Leo AC, Badawi O, Gordon AC, Faisal AA. The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nat Med. 2018;24:1716–20.

  38. Cherkassky M. Application of machine learning methods to medical diagnosis. Chance. 2009;22(1):42–50.

  39. Varun LK, Ryan S, David E. Holographic diagnosis of lymphoma. Nat Biomed Eng. 2018;2:631–2.

  40. Murthy KR, Singh S, Tuck D, Varadan V. Bayesian item response theory for cancer biomarker discovery. In: Rao ASRS, Rao CR, editors. Integrated population biology and modeling. Handbook of Statistics, vol. 40. Amsterdam: Elsevier/North-Holland; 2019.

  41. Kurmukov A, Dodonova Y, Zhukov LE. Machine learning application to human brain network studies: a kernel approach. In: Models, algorithms, and technologies for network analysis. Springer Proceedings in Mathematics and Statistics, vol. 197. Cham: Springer; 2017. p. 229–49.

  42. Saha A, Dewangan C, Narasimhan H, Sampath S, Agarwal S. Learning score systems for patient mortality prediction in intensive care units via orthogonal matching pursuit. In: Proceedings of the 13th International Conference on Machine Learning and Applications (ICMLA); 2014. p. 93–8.

  43. He JY, Wu X, Jiang YG, Peng Q, Jain R. Hookworm detection in wireless capsule endoscopy images with deep learning. IEEE Trans Image Process. 2018;27(5):2379–92.

  44. Gurve D, Krishnan S. Deep learning of EEG time-frequency representations for identifying eye states. Adv Data Sci Adapt Anal. 2018;10(2):1840006, 13 pp.

  45. Carneiro G, Nascimento JC, Freitas A. The segmentation of the left ventricle of the heart from ultrasound data using deep learning architectures and derivative-based search methods. IEEE Trans Image Process. 2012;21(3):968–82.

  46. Rueda A, Krishnan S. Clustering Parkinson's and age-related voice impairment signal features for unsupervised learning. Adv Data Sci Adapt Anal. 2018;10(2):1840007, 24 pp.

  47. Ustun B, Rudin C. Supersparse linear integer models for optimized medical scoring systems. Mach Learn. 2016;102(3):349–91.

  48. Agarwal S, Niyogi P. Generalization bounds for ranking algorithms via algorithmic stability. J Mach Learn Res. 2009;10:441–74.

  49. Tu C. Comparison of various machine learning algorithms for estimating generalized propensity score. J Stat Comput Simul. 2019;89(4):708–19.

  50. Patel H, Thakkar A, Pandya M, Makwana K. Neural network with deep learning architectures. J Inf Optim Sci. 2018;39(1):31–8.

  51. Polson NG, Sokolov V. Deep learning: a Bayesian perspective. Bayesian Anal. 2017;12(4):1275–304.

  52. Jiequn H, Arnulf J, Weinan E. Solving high-dimensional partial differential equations using deep learning. Proc Natl Acad Sci U S A. 2018;115(34):8505–10.

  53. Agarwal N, Bullins B, Hazan E. Second-order stochastic optimization for machine learning in linear time. J Mach Learn Res. 2017;18, Paper No. 116, 40 pp.

  54. Sirignano J, Spiliopoulos K. DGM: a deep learning algorithm for solving partial differential equations. J Comput Phys. 2018;375:1339–64.

  55. Ye JC, Han Y, Cha E. Deep convolutional framelets: a general deep learning framework for inverse problems. SIAM J Imaging Sci. 2018;11(2):991–1048.

  56. Pan S, Duraisamy K. Data-driven discovery of closure models. SIAM J Appl Dyn Syst. 2018;17(4):2381–413.

  57. Mahsereci M, Hennig P. Probabilistic line searches for stochastic optimization. J Mach Learn Res. 2017;18, Paper No. 119, 59 pp.

  58. Schwab C, Zech J. Deep learning in high dimension: neural network expression rates for generalized polynomial chaos expansions in UQ. Anal Appl (Singap). 2019;17(1):19–55.

  59. Srivastava N, Salakhutdinov R. Multimodal learning with deep Boltzmann machines. J Mach Learn Res. 2014;15:2949–80.

  60. Salakhutdinov R, Hinton G. An efficient learning procedure for deep Boltzmann machines. Neural Comput. 2012;24(8):1967–2006.

  61. Jiang B, Wu TY, Zheng C, Wong WH. Learning summary statistic for approximate Bayesian computation via deep neural network. Stat Sin. 2017;27(4):1595–618.

  62. Deng Y, Bao F, Deng X, Wang R, Kong Y, Dai Q. Deep and structured robust information theoretic learning for image analysis. IEEE Trans Image Process. 2016;25(9):4209–21.

  63. Baldi P, Sadowski P, Lu Z. Learning in the machine: random backpropagation and the deep learning channel. Artif Intell. 2018;260:1–35.

  64. Chao D, Zhu J, Zhang B. Learning deep generative models with doubly stochastic gradient MCMC. IEEE Trans Neural Netw Learn Syst. 2018;29(7):3084–96.

  65. Poggio T, Smale S. The mathematics of learning: dealing with data. Not Am Math Soc. 2003;50(5):537–44.



Acknowledgments

We thank the following individuals, in alphabetical order of their last name, for very valuable comments: Medina Jackson-Browne (Brown University, Providence), N.V. Joshi (Indian Institute of Science, Bangalore), K. Praveen (Microsoft, Irvine), and P. Sashank (CEO, Exactco, Hyderabad).

Author information

Correspondence to Arni S.R. Srinivasa Rao.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Appendix 1

Machine Learning Algorithm for Fertility Treatment Outcome (MLAFTO)

Let x({c, d, g, o}) be the probability that an infertile woman with background characteristics indexed by ∗, for ∗ = 1, 2, …, k × l × m × n, who receives the ith ovulation induction (OI) treatment or IVF, for i = 1, 2, …, p, conceives or delivers.

We compute the x values for all combinations of ∗ and i.

We compute \( \underset{i}{\max}\left(\underset{\ast }{\max }x\right) \) and \( \underset{\ast }{\max}\left(\underset{i}{\max }x\right) \). Once the maximum probabilities are computed, ranking these probabilities over the various ∗ and i provides the relative chances of conceiving and delivering a live birth. The algorithm then matches these combinations of ∗ and i with the new couple who come to the clinic and suggests their probability of conceiving and delivering a live birth (see Appendix 2 for computing the probabilities and for descriptions of the max functions).
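The two-stage maximization above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the probabilities x are synthetic random stand-ins for values that would be estimated from clinic data as in Appendix 2, and the sizes k, l, m, n, p are hypothetical.

```python
import itertools
import random

# Hypothetical sizes: k*l*m*n background combinations and p treatments.
k, l, m, n, p = 2, 2, 2, 2, 3
combos = list(itertools.product(range(k), range(l), range(m), range(n)))

random.seed(0)
# x[(combo, i)]: synthetic probability of conceiving/delivering for
# background combination `combo` under treatment i.
x = {(c, i): random.random() for c in combos for i in range(p)}

# Inner/outer maxima: max over backgrounds for each treatment, and
# max over treatments for each background combination.
max_over_star = {i: max(x[(c, i)] for c in combos) for i in range(p)}
max_over_i = {c: max(x[(c, i)] for i in range(p)) for c in combos}

# Ranking the per-combination maxima gives the relative chances of
# conceiving and delivering a live birth across background profiles.
ranking = sorted(max_over_i.items(), key=lambda kv: -kv[1])

# A new couple is matched to their background combination and offered
# the treatment attaining the maximum probability for that combination.
def best_treatment(combo):
    return max(range(p), key=lambda i: x[(combo, i)])
```

Note that both orders of maximization agree on the overall best (∗, i) pair; the ranking is what distinguishes the relative chances across profiles.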

Appendix 2

Computation of Probabilities Through Markov Chains

In this appendix, we propose a Markov chain-based approach to computing the probabilities of conception and of delivering a live birth under various treatment options.

Suppose we want to compute the probability of conception, and then of delivering a baby, for an infertile woman with a combination of background variables, say {c5, d4, g1, o1}, and with the OIi or IVF treatments explained in the paper. Let B1 be the set of all infertile women with background variables {c5, d4, g1, o1} who will be on the OIi or IVF treatment options. Let \( {x}_i{\left({B}_1\right)}_{jc}^{(T)} \) be the probability that an infertile woman in state j with characteristics {c5, d4, g1, o1}, on the ith ovulation induction (OI) treatment or IVF for i = 1, 2, …, p, conceives within T time steps. Let \( {x}_i{\left({B}_1\right)}_{jb}^{(T)} \) be the corresponding probability of delivering a baby, and let xi(B1)cb be the probability of a baby being born to a woman in B1 given that she has conceived. These three probabilities can be computed using the formulas below:

$$ {x}_i{\left({B}_1\right)}_{jc}^{(T)}=\frac{\underset{s\in {B}_1}{\int }{W}_{i,T}^{j\to c}\,ds}{\underset{s\in {B}_1}{\int }{W}_i^j\,ds}\kern2em \left({A}_{2.1}\right) $$
$$ {x}_i{\left({B}_1\right)}_{jb}^{(T)}=\frac{\underset{s\in {B}_1}{\int }{W}_{i,T}^{j\to b}\,ds}{\underset{s\in {B}_1}{\int }{W}_i^j\,ds}\kern2em \left({A}_{2.2}\right) $$
$$ {x}_i{\left({B}_1\right)}_{cb}=\frac{\underset{s\in {B}_1}{\int }{W}_{i,T}^{c\to b}\,ds}{\underset{s\in {B}_1}{\int }{W}_i^c\,ds}\kern2em \left({A}_{2.3}\right) $$

where \( {W}_{i,T}^{j\to c}(s) \) indicates that the sth infertile woman in state j on the ith treatment conceives within T time steps, and \( {W}_i^j(s) \) denotes the sth infertile woman in state j on the ith treatment. \( \underset{s\in {B}_1}{\int }{W}_{i,T}^{j\to c}(s)\,ds \) is the total number of infertile women in the set B1 on the ith treatment who have moved from state j to state c, and \( \underset{s\in {B}_1}{\int }{W}_i^j(s)\,ds \) is the total number of women in the set B1 on the ith treatment who are in state j. Let B2 be another set of all infertile women with different background variables, say B2 = {{c1, d3, g6, o2} with OIi or IVF treatments}; then we can compute the corresponding transition probabilities by formulas of the same type as (A2.1)–(A2.3).
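Since the integrals over s ∈ B1 are simply head counts of women in the cohort, (A2.1)–(A2.3) can be sketched as ratios of counts. The function name and the numbers below are hypothetical, chosen only to illustrate the arithmetic.

```python
# Count-based sketch of (A2.1)-(A2.3): each probability is the ratio of
# "women who made the transition" to "women who started in the state".
def transition_probabilities(n_j, n_j_to_c, n_j_to_b, n_c, n_c_to_b):
    """Estimate x_jc^(T), x_jb^(T) and x_cb for one cohort and treatment.

    n_j      : women observed in state j (infertile, on the ith treatment)
    n_j_to_c : of those, women who conceived within T time steps
    n_j_to_b : of those, women recorded as delivering within T time steps
    n_c      : women observed in state c (conceived)
    n_c_to_b : of those, women who delivered a baby
    """
    return n_j_to_c / n_j, n_j_to_b / n_j, n_c_to_b / n_c

# Hypothetical cohort: 200 women in state j, 58 conceived, 41 delivered.
x_jc, x_jb, x_cb = transition_probabilities(200, 58, 0, 58, 41)
```

A second cohort B2 would reuse the same function with its own counts; only the data change, not the formulas.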

The probability of transition from state c to state b does not depend on \( \underset{s\in {B}_1}{\int }{W}_i^j(s)\,ds \) but only on \( \underset{s\in {B}_1}{\int }{W}_i^c(s)\,ds \), so the random variable responsible for the transition between these two states, say Y, obeys the Markov property. Moreover, the transition probability matrix Pi(B1) for the set of infertile women B1 over the states {j, c, b}, for women on the ith treatment, can be written (rows and columns ordered j, c, b) as:

$$ {P}_i\left({B}_1\right)=\left[\begin{array}{ccc}{x}_i{\left({B}_1\right)}_{jj}& {x}_i{\left({B}_1\right)}_{jc}& {x}_i{\left({B}_1\right)}_{jb}\\ {}{x}_i{\left({B}_1\right)}_{cj}& {x}_i{\left({B}_1\right)}_{cc}& {x}_i{\left({B}_1\right)}_{cb}\\ {}{x}_i{\left({B}_1\right)}_{bj}& {x}_i{\left({B}_1\right)}_{bc}& {x}_i{\left({B}_1\right)}_{bb}\end{array}\right] $$

where xi(B1)jj + xi(B1)jc = 1, xi(B1)cc + xi(B1)cb = 1, and xi(B1)bb = 1. Here xi(B1)jb = 0 by the Markov property, whereas xi(B1)cj = xi(B1)bj = xi(B1)bc = 0 because the transitions c → j, b → j, and b → c are impossible. Similarly, we compute:
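The constraints just listed pin down the whole matrix from the two free entries xjc and xcb. A minimal sketch (hypothetical values, plain nested lists):

```python
# Build P_i(B_1) over the ordered states (j, c, b) with the stated
# constraints baked in: rows sum to 1, x_jb = 0, and the reverse
# transitions c->j, b->j, b->c are impossible (b is absorbing).
def build_transition_matrix(x_jc, x_cb):
    return [
        [1.0 - x_jc, x_jc, 0.0],    # from j: remain infertile or conceive
        [0.0, 1.0 - x_cb, x_cb],    # from c: remain pregnant or deliver
        [0.0, 0.0, 1.0],            # from b: delivered (absorbing state)
    ]

# Hypothetical entries, e.g. estimated via (A2.1) and (A2.3).
P = build_transition_matrix(x_jc=0.29, x_cb=0.71)
```

The same constructor serves every B∗ and every treatment i; only the two estimated entries differ between cohorts.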

$$ {P}_i\left({B}_{\ast}\right)=\left[\begin{array}{ccc}{x}_i{\left({B}_{\ast}\right)}_{jj}& {x}_i{\left({B}_{\ast}\right)}_{jc}& {x}_i{\left({B}_{\ast}\right)}_{jb}\\ {}{x}_i{\left({B}_{\ast}\right)}_{cj}& {x}_i{\left({B}_{\ast}\right)}_{cc}& {x}_i{\left({B}_{\ast}\right)}_{cb}\\ {}{x}_i{\left({B}_{\ast}\right)}_{bj}& {x}_i{\left({B}_{\ast}\right)}_{bc}& {x}_i{\left({B}_{\ast}\right)}_{bb}\end{array}\right] $$

for ∗ = 1, 2, …, k × l × m × n. Let \( {W}_i^j\left({B}_{\ast}\right) \) be the number of infertile women (state j) with background characteristics B∗ who are on the ith treatment, and let Wj(B∗) be the total number of infertile women with background characteristics B∗, such that:

$$ {W}^j\left({B}_{\ast}\right)=\bigcup \limits_{i=1}^p{W}_i^j\left({B}_{\ast}\right) $$
$$ \bigcap \limits_{i=1}^p{W}_i^j\left({B}_{\ast}\right)=\varnothing \left(\mathrm{empty}\ \mathrm{set}\right). $$

Once Pi(B∗) is computed based on a given design of the sample population, the sizes of \( {W}_i^j\left({B}_{\ast}\right) \) are not changed when computing the probabilities in (A2.1)–(A2.3). That is, the matrix Pi(B∗) is not updated with newer women who start treatment after the designed time interval.

We use two max functions, \( \underset{i}{\max}\left\{{x}_i{\left({B}_{\ast}\right)}_{jc}\right\} \) and \( \underset{\ast }{\max}\left\{{x}_i{\left({B}_{\ast}\right)}_{jc}\right\} \).

The function \( \underset{i}{\max}\left\{{x}_i{\left({B}_{\ast}\right)}_{jc}\right\} \) gives the maximum of the probability values for women with background characteristics B∗ across all the treatments, obtained as:

$$ \max \left\{\frac{\underset{s\in {B}_{\ast }}{\int }{W}_{1,T}^{j\to c}(s)\,ds}{\underset{s\in {B}_{\ast }}{\int }{W}_1^j(s)\,ds},\frac{\underset{s\in {B}_{\ast }}{\int }{W}_{2,T}^{j\to c}(s)\,ds}{\underset{s\in {B}_{\ast }}{\int }{W}_2^j(s)\,ds},\dots \right\}\kern2em \left({A}_{2.4}\right) $$

Through the expression (A2.4), we obtain k × l × m × n maximum values, where each maximum value represents the maximum probability of conceiving for an infertile woman with a particular set of background characteristics, together with the treatment type for which this maximum is attained. Similarly, we can construct \( \underset{i}{\max}\left\{{x}_i{\left({B}_{\ast}\right)}_{cb}\right\} \). The function \( \underset{\ast }{\max}\left\{{x}_i{\left({B}_{\ast}\right)}_{jc}\right\} \) gives the maximum probability of conceiving among women on the ith treatment across the different background characteristics, obtained as:

$$ \max \left\{\frac{\underset{s\in {B}_1}{\int }{W}_{i,T}^{j\to c}(s)\,ds}{\underset{s\in {B}_1}{\int }{W}_i^j(s)\,ds},\frac{\underset{s\in {B}_2}{\int }{W}_{i,T}^{j\to c}(s)\,ds}{\underset{s\in {B}_2}{\int }{W}_i^j(s)\,ds},\dots \right\}\kern2em \left({A}_{2.5}\right) $$


See also Fig. 2, which illustrates this disjoint property of the infertile women within each set of background characteristics.

Fig. 2.

Disjoint sets of infertile women across all the treatment options within the background characteristics (B)

Result: The total number of infertile women with background characteristics B∗ can be written as the union of disjoint sets of women across all the treatment options; in terms of counts,

$$ \underset{s\in {B}_{\ast }}{\int }{W}^j(s)\,ds=\sum \limits_{i=1}^p\underset{s\in {B}_{\ast }}{\int }{W}_i^j(s)\,ds\kern2em \left({A}_{2.6}\right) $$

Appendix 3

Machine Learning Versus Deep Learning in Computing Probabilities of Conception and Delivery

Suppose a new infertile woman with background characteristics {BN} is interested in starting one of the available treatments, OIi or IVF. Let us examine how machine learning techniques are applied to decide which treatment gives the maximum chance of conception and of delivering a baby. Prior to the decision-making process for this woman, suppose that the probabilities of conception and delivery were previously computed through MLAFTO, explained in Appendix 1, and through Pi(B∗) for all ∗, as in Appendix 2. The data used for these two computations is usually predetermined or pre-designed; i.e., the time frame and other design aspects of the data were well defined and free of data-related errors. MLAFTO matches the new infertile woman's characteristic set {BN} with the sets {B∗ : ∗ = 1, 2, …, k × l × m × n}. Let {By} be the set that matches the new woman's characteristics, so that {By} − {BN} = ∅ (the null set). The corresponding values of

x(By)jc and x(By)cb

are taken as the chances of conception and of delivery for the new woman who has come to the clinic.

Note that the success or failure data of the woman with {BN} is not used in the computation of Pi(B∗) for any ∗, which is the key feature of a machine learning type of algorithm.

If each treatment trial, whether or not the woman conceives, is considered as one time step of treatment (one treatment cycle), and the duration from conception to whether or not a baby is delivered is considered as one time step of pregnancy (one pregnancy cycle), and if \( x{\left({B}_y\right)}_{jc}^{(n)} \) and \( x{\left({B}_y\right)}_{cb}^{(n)} \) denote the corresponding n-step (n-cycle) probabilities, then by the Markov property we have

$$ x{\left({B}_y\right)}_{jc}^{(n)}\times x{\left({B}_y\right)}_{cb}^{(m)}=x{\left({B}_y\right)}_{jb}^{\left(n+m\right)} $$
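Multi-step probabilities of this kind come out of matrix powers. The sketch below (illustrative entries, NumPy for the matrix algebra) checks the general Chapman-Kolmogorov factorization P^(n+m) = P^n P^m for a chain of the shape given in Appendix 2; the product of the jc and cb entries in the identity above is the j → c → b path contribution inside that factorization.

```python
import numpy as np

# Illustrative transition matrix over states (j, c, b), with b absorbing.
P = np.array([
    [0.7, 0.3, 0.0],   # state j: remain infertile or conceive
    [0.0, 0.4, 0.6],   # state c: remain pregnant or deliver
    [0.0, 0.0, 1.0],   # state b: delivered (absorbing)
])
n, m = 3, 2
Pn = np.linalg.matrix_power(P, n)
Pm = np.linalg.matrix_power(P, m)
Pnm = np.linalg.matrix_power(P, n + m)

# Chapman-Kolmogorov: the (n+m)-step matrix factors through n and m steps.
assert np.allclose(Pn @ Pm, Pnm)

# The j -> c -> b path term never exceeds the full (n+m)-step j -> b
# probability; it is one summand of the factorized entry.
assert Pn[0, 1] * Pm[1, 2] <= Pnm[0, 2] + 1e-12
```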

When another infertile woman with background characteristics {BM} comes to the clinic to decide which type of treatment will be needed for a successful delivery, the previously computed transition probability matrices Pi(B∗), which were used in matching the woman with {BN}, are not updated with the success or failure information of the woman with {BN}. In this sense, the matrices Pi(B∗) are static when we use machine learning algorithms: they are not influenced by new data generated on infertile women who come to the clinic after the Pi(B∗) were constructed.

Once an infertile woman walks into the clinic with background characteristics {BM}, if deep learning techniques are implemented to predict the probabilities of conceiving (say, y(BM)jc) and of delivery (say, y(BM)cb), then the computation of these probabilities differs from the machine learning approach. Each time a new infertile woman with {BM} comes to the clinic for treatment, instead of the matching procedure against the static model explained above, deep learning involves reconstructing the transition probability matrices Pi(B∗), for i = 1, 2, …, p, for conception and delivery with whatever data are available prior to the arrival of the woman with {BM}. The rest of the computational procedure explained in Appendix 2 remains the same. Deep learning techniques usually delay the output because the Pi(B∗) are reconstructed each time a new infertile woman comes to the clinic.
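The contrast between the two regimes can be sketched for a single matrix entry. The class name, counts, and outcomes below are hypothetical; the point is only that the "static" estimate is frozen at design time, while the "re-estimated" one folds each earlier walk-in patient's outcome back into the counts before the next prediction.

```python
# Sketch of the static (machine learning) vs. re-estimated (deep
# learning, in this article's sense) regimes for the entry x_jc.
class ConceptionEstimate:
    def __init__(self, n_j, n_j_to_c):
        self.n_j, self.n_j_to_c = n_j, n_j_to_c

    def x_jc(self):
        return self.n_j_to_c / self.n_j

    def update(self, conceived):
        # Fold a newly observed patient outcome back into the counts.
        self.n_j += 1
        self.n_j_to_c += int(conceived)

# Static regime: estimated once from the designed sample, then frozen.
static = ConceptionEstimate(200, 58)
frozen_x_jc = static.x_jc()

# Re-estimating regime: updated with outcomes of three earlier
# walk-in patients (hypothetical) before the next woman arrives.
adaptive = ConceptionEstimate(200, 58)
for outcome in [True, False, True]:
    adaptive.update(outcome)
updated_x_jc = adaptive.x_jc()
```

The re-estimation cost on every arrival is the source of the output delay mentioned above.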

General introductions to machine learning techniques, their motivations, and the key ideas developed across a variety of research areas can be found in [9,10,11,12,13]. Specific ideas related to deep learning techniques are also well developed [14], deep learning techniques and applications have been summarized [15], and an overview of the importance of machine learning algorithms in medicine can be found in [16]. As explained in our article, machine learning and deep learning techniques broadly use the same data for the same specific goals, but their ways of handling the data and the models distinguish them from each other. Statistical thinking has contributed to several aspects of machine learning, for example, in developing computationally intense data classification algorithms, methods for data search and matching probabilities, data mining techniques, model classification and model fitting algorithms, and combinations of all these (see, e.g., [17,18,19,20,21,22,23,24,25,26,27,28,29]); for a collection of articles related to statistical methods in machine learning, see [30]. Model-based machine learning methods are available [31], and the construction of coefficients in a regression model can benefit from machine learning methods [32].

Deep learning techniques, instead of focusing on model-based approaches, assist in understanding the intricate structure of large data sets and the various interlinkages between them [33]. The importance of unsupervised pre-training for the structural architecture, and hypothesis testing of the design effects of such experiments, are well studied [34, 35]. Deep learning and machine learning techniques can also assist with questions in health informatics, disease detection, item response theories, and bioinformatics research [36,37,38,39,40,41]. There are also successful deep learning methods that score patients in the intensive care unit (ICU) for severity and predict mortality without using any model-based assumptions in the scoring system [42], as well as other medical applications, for example, detection of worms through endoscopy [43], ophthalmology studies [44], cardiovascular studies [45], Parkinson's disease data [46], and medical scoring systems [47]. Deep learning procedures involving various levels of abstraction for ranking system models can be found in [48, 49]; applications to mathematical models, parameter computations, and the stability of algorithms are found in [50,51,52,53,54,55,56].

Statistical and stochastic modeling principles have been applied in deep learning algorithms to strengthen object search capabilities and to improve model fitting under uncertainty [32, 57, 58]. Boltzmann machines assist in a deeper understanding of the data by linking layer-level structured data and then estimating model parameters through maximum likelihood methods [59, 60]. Random backpropagation and backpropagation methods help in forming stochastic transition matrices and in computing quicker search algorithms for higher-dimensional stochastic matrices; literature related to backpropagation can be found in several places, for example, [61,62,63,64]. A survey of statistical learning algorithms and their performance evaluations can be found in [65].

Appendix 4


Theorem A.1: When Wj is the total number of infertile women (state j) whose data are used in the machine learning algorithm, and δ ∈ [1, klmn] and α ∈ [1, p] are treated as continuous indices for the background characteristics and treatment options, then

$$ \frac{1}{pklmn}\left[\underset{\delta =1}{\overset{klmn}{\int }}\underset{\alpha =1}{\overset{p}{\int }}\frac{W_{\alpha}^{j\to c}\left({B}_{\delta}\right)}{W_{\alpha}^j\left({B}_{\delta}\right)}\,d\alpha\, d\delta +\underset{\delta =1}{\overset{klmn}{\int }}\underset{\alpha =1}{\overset{p}{\int }}\frac{W_{\alpha}^{c\to b}\left({B}_{\delta}\right)}{W_{\alpha}^j\left({B}_{\delta}\right)}\,d\alpha\, d\delta \right]\le 2 $$

Proof: We have,

$$ {W}^j\left({B}_{\delta}\right)={\int}_1^p{W}_{\alpha}^j\left({B}_{\delta}\right)\,d\alpha\kern2em \left({A}_{5.1}\right) $$


$$ {W}^j={\int}_1^{klmn}{\int}_1^p{W}_{\alpha}^j\left({B}_{\delta}\right)\,d\alpha\, d\delta\kern2em \left({A}_{5.2}\right) $$

Note that,

$$ \frac{W_1^{j\to c}\left({B}_{\delta}\right)}{W_1^j\left({B}_{\delta}\right)}+\frac{W_2^{j\to c}\left({B}_{\delta}\right)}{W_2^j\left({B}_{\delta}\right)}+\dots +\frac{W_p^{j\to c}\left({B}_{\delta}\right)}{W_p^j\left({B}_{\delta}\right)}\le p\kern2em \left({A}_{5.3}\right) $$

$$ \frac{W_1^{c\to b}\left({B}_{\delta}\right)}{W_1^j\left({B}_{\delta}\right)}+\frac{W_2^{c\to b}\left({B}_{\delta}\right)}{W_2^j\left({B}_{\delta}\right)}+\dots +\frac{W_p^{c\to b}\left({B}_{\delta}\right)}{W_p^j\left({B}_{\delta}\right)}\le p\kern2em \left({A}_{5.4}\right) $$

From inequality (A5.3), we obtain

$$ {\int}_1^{klmn}{\int}_1^p\frac{W_{\alpha}^{j\to c}\left({B}_{\delta}\right)}{W_{\alpha}^j\left({B}_{\delta}\right)}\,d\alpha\, d\delta \le pklmn\kern2em \left({A}_{5.5}\right) $$

and from inequality (A5.4),

$$ {\int}_1^{klmn}{\int}_1^p\frac{W_{\alpha}^{c\to b}\left({B}_{\delta}\right)}{W_{\alpha}^j\left({B}_{\delta}\right)}\,d\alpha\, d\delta \le pklmn\kern2em \left({A}_{5.6}\right) $$

The required result is deduced from the two inequalities (A5.5) and (A5.6).
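A discrete analogue of the bounding step can be checked numerically. The counts below are hypothetical, and finite sums over strata and treatments stand in for the integrals: since every fraction in (A5.3)–(A5.4) lies in [0, 1], each sum is bounded by p times the number of strata, mirroring (A5.5)–(A5.6).

```python
# Numeric analogue of (A5.3)-(A5.6) with hypothetical counts.
p, strata = 3, 16                      # treatments, background strata

# counts[d][a] = (W_a^j, W_a^{j->c}, W_a^{c->b}) for stratum d, treatment a;
# transition counts never exceed the number of women in state j.
counts = [[(100 + d + a, 30 + a, 20 + a) for a in range(p)]
          for d in range(strata)]

sum_jc = sum(w_jc / w_j for row in counts for (w_j, w_jc, w_cb) in row)
sum_cb = sum(w_cb / w_j for row in counts for (w_j, w_jc, w_cb) in row)

assert sum_jc <= p * strata            # analogue of (A5.5)
assert sum_cb <= p * strata            # analogue of (A5.6)
```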

Theorem A.2: For continuous α and δ, we have

$$ \left({\int}_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta \right)\left({\int}_{\delta }{W}_{\alpha}^b\left({B}_{\delta}\right) d\delta \right)\le {\left({\int}_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right) d\delta \right)}^2 $$

Proof: We know,

$$ \frac{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right)\,d\delta}{\int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right)\,d\delta}=\begin{cases}0 & \text{if no infertile woman with } \alpha \text{ conceives}\\ 1 & \text{if every woman with } \alpha \text{ conceives}\\ \theta \in \left(0,1\right) & \text{if some but not all women with } \alpha \text{ conceive}\end{cases} $$
$$ \frac{\int_{\delta }{W}_{\alpha}^b\left({B}_{\delta}\right)\,d\delta}{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right)\,d\delta}=\begin{cases}0 & \text{if no conceived woman with } \alpha \text{ delivers}\\ 1 & \text{if every conceived woman with } \alpha \text{ delivers}\\ \gamma \in \left(0,1\right) & \text{if some but not all conceived women with } \alpha \text{ deliver}\end{cases} $$

These imply,

$$ 0\le \frac{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right)\,d\delta}{\int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right)\,d\delta}\le 1\kern2em \left({A}_{5.7}\right) $$
$$ 0\le \frac{\int_{\delta }{W}_{\alpha}^b\left({B}_{\delta}\right)\,d\delta}{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right)\,d\delta}\le 1\kern2em \left({A}_{5.8}\right) $$

From (A5.7) and (A5.8), we deduce the required result.
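Theorem A.2 can be checked with hypothetical counts, using finite sums over strata in place of the integrals: since every delivering woman first conceived and every conceiving woman was in state j, the counts satisfy W^b ≤ W^c ≤ W^j, which forces the product inequality.

```python
# Numeric check of Theorem A.2's inequality with hypothetical counts,
# one entry per background stratum for a fixed treatment alpha.
W_j = [120, 95, 210, 80]   # women in state j, per stratum
W_c = [40, 30, 77, 15]     # of those, women who conceived
W_b = [28, 22, 60, 9]      # of those, women who delivered

# Nesting of the states: b-count <= c-count <= j-count in every stratum.
assert all(b <= c <= j for j, c, b in zip(W_j, W_c, W_b))

# (sum W^c)(sum W^b) <= (sum W^j)^2, the discrete form of Theorem A.2.
assert sum(W_c) * sum(W_b) <= sum(W_j) ** 2
```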

Theorem A.3: Let f : A → ℝ+ and g : B → ℝ+, where A is the set of fractions in (A5.7) and B is the set of all fractions in (A5.8); then f and g are defined only at the adherent points of A and B, respectively.

Proof: Note that,

$$ \min \left({\int}_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta \right)=\min \left({\int}_{\delta }{W}_{\alpha}^b\left({B}_{\delta}\right) d\delta \right)=0 $$


$$ \max \left({\int}_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta \right)=\left({\int}_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right) d\delta \right) $$

Two sets A and B are constructed from (A5.7) and (A5.8) as

$$ A=\left\{0,\frac{1}{\int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right)\,d\delta},\frac{2}{\int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right)\,d\delta},\dots, 1\right\}\kern2em \left({A}_{5.9}\right) $$
$$ B=\left\{0,\frac{1}{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right)\,d\delta},\frac{2}{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right)\,d\delta},\dots, 1\right\}\kern2em \left({A}_{5.10}\right) $$

From the elements of the set A as in (A5.9), f is not defined at open subintervals,

$$ \left(0,\frac{1}{\int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right) d\delta}\right),\left(\frac{1}{\int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right) d\delta},\frac{2}{\int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right) d\delta}\right),\dots $$

and from the elements of the set B as in (A5.10), g is not defined at open subintervals

$$ \left(0,\frac{1}{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta}\right),\left(\frac{1}{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta},\frac{2}{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta}\right),\dots . $$

Hence, f and g are defined only at the adherent points of A and B.


About this article


Cite this article

Srinivasa Rao, A.S., Diamond, M.P. Deep Learning of Markov Model-Based Machines for Determination of Better Treatment Option Decisions for Infertile Women. Reprod. Sci. (2020).



Keywords

  • Machine learning
  • State spaces
  • AI in medicine