
Deep Learning of Markov Model-Based Machines for Determination of Better Treatment Option Decisions for Infertile Women

  • Original Article · Reproductive Sciences

Abstract

In this technical article, we propose ideas that we have been developing on how machine learning and deep learning techniques can potentially assist obstetricians/gynecologists in better clinical decision-making, using treatment options for infertile women, in combination with mathematical modeling in pregnant women, as examples.


References

  1. McDonnell J, Goverde AJ, Rutten FF, Vermeiden JP. Multivariate Markov Chain analysis of the probability of pregnancy in infertile couples undergoing assisted reproduction. Hum Reprod. 2002;17(1):103–6.

  2. Fiddelers AA, Dirksen CD, Dumoulin JC, van Montfoort A, Land JA, Janssen JM, et al. Cost-effectiveness of seven IVF strategies: results of a Markov decision-analytic model. Hum Reprod. 2009;24(7):1648–55.

  3. Rao ASRS, Diamond MP. Role of Markov modeling approaches to understand the impact of infertility treatments. Reprod Sci. 2017;11:1538–43.

  4. Hsieh MH, Meng MV, Turek PJ. Markov modeling of vasectomy reversal and ART for infertility: how do obstructive interval and female partner age influence cost effectiveness? Fertil Steril. 2007;88(4):840–6.

  5. Olive DL, Pritts EA. Markov modeling: questionable data in, questionable data out. Fertil Steril. 2008;89(3):746–7.

  6. Bartlett MS. Some evolutionary stochastic processes. J Roy Stat Soc Ser B. 1949;11:211–29.

  7. Kimura M. Solution of a process of random genetic drift with a continuous model. Proc Natl Acad Sci U S A. 1955;41(3):144–50.

  8. Kimura M. Some problems of stochastic processes in genetics. Ann Math Stat. 1957;28(4):882–901.

  9. Lantz B. Machine learning with R: expert techniques for predictive modeling to solve all your data analysis problems. 2nd ed. Birmingham: Packt Publishing; 2015.

  10. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. 2nd ed. Springer Series in Statistics. New York: Springer; 2009.

  11. Bandyopadhyay S, Pal SK. Classification and learning using genetic algorithms: applications in bioinformatics and web intelligence. Natural Computing Series. Berlin: Springer; 2007.

  12. Jordan MI, Mitchell TM. Machine learning: trends, perspectives, and prospects. Science. 2015;349(6245):255–60.

  13. Skansi S. Introduction to deep learning: from logical calculus to artificial intelligence. Undergraduate Topics in Computer Science. Cham: Springer; 2018.

  14. Goodfellow I, Bengio Y, Courville A. Deep learning. Adaptive Computation and Machine Learning. Cambridge: MIT Press; 2016.

  15. Chen XW, Lin X. Big data deep learning: challenges and perspectives. IEEE Access. 2014;2:514–25.

  16. Miller DD, Brown EW. Artificial intelligence in medical practice: the question to the answer? Am J Med. 2018;131(2):129–33.

  17. Dukkipati A, Ghoshdastidar D, Krishnan J. Mixture modeling with compact support distributions for unsupervised learning. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN); 2016. p. 2706–13.

  18. Van Messem A. Support vector machines, a robust prediction method with applications in bioinformatics. In: Rao ASRS, Rao CR, editors. Principles and methods for data science. Handbook of Statistics, vol. 43. Amsterdam: Elsevier/North-Holland; 2020.

  19. Abarbanel HDI, Rozdeba PJ, Shirman S. Machine learning: deepest learning as statistical data assimilation problems. Neural Comput. 2018;30(8):2025–55.

  20. Apolloni B, Bassis S. The randomness of the inferred parameters: a machine learning framework for computing confidence regions. Inf Sci. 2018;453:239–62.

  21. Martínez AM, Webb GI, Chen S, Zaidi NA. Scalable learning of Bayesian network classifiers. J Mach Learn Res. 2016;17: Paper No. 44, 35 pp.

  22. Nielsen F. What is… an information projection? Not Am Math Soc. 2018;65(3):321–4.

  23. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res. 2012;13:281–305.

  24. Bouveyron C, Latouche P, Mattei PA. Bayesian variable selection for globally sparse probabilistic PCA. Electron J Stat. 2018;12(2):3036–70.

  25. Veloso de Melo V, Banzhaf W. Automatic feature engineering for regression models with machine learning: an evolutionary computation and statistics hybrid. Inf Sci. 2018;430–431:287–313.

  26. Vidyasagar M. Machine learning methods in the computational biology of cancer. Proc R Soc Lond Ser A Math Phys Eng Sci. 2014;470(2167):20140081, 25 pp.

  27. Athey S, Imbens G. Recursive partitioning for heterogeneous causal effects. Proc Natl Acad Sci U S A. 2016;113(27):7353–60.

  28. Lee J, Wu Y, Kim H. Unbalanced data classification using support vector machines with active learning on scleroderma lung disease patterns. J Appl Stat. 2015;42(3):676–89.

  29. Kalidas Y. Machine learning algorithms, applications and practices in data science. In: Rao ASRS, Rao CR, editors. Principles and methods for data science. Handbook of Statistics, vol. 43. Amsterdam: Elsevier/North-Holland; 2020.

  30. Govindaraju V, Rao CR, editors. Machine learning: theory and applications. Handbook of Statistics, vol. 31. Amsterdam: Elsevier/North-Holland; 2013. xxiv+525 pp.

  31. Bishop CM. Model-based machine learning. Philos Trans R Soc Lond Ser A Math Phys Eng Sci. 2013;371(1984):20120222, 17 pp.

  32. Freno BA, Carlberg KT. Machine-learning error models for approximate solutions to parameterized systems of nonlinear equations. Comput Methods Appl Mech Eng. 2019;348:250–96.

  33. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436–44.

  34. Erhan D, Bengio Y, Courville A, Manzagol PA, Vincent P, Bengio S. Why does unsupervised pre-training help deep learning? J Mach Learn Res. 2010;11:625–60.

  35. Bengio Y, Lamblin P, Popovici D, Larochelle H. Greedy layer-wise training of deep networks. In: Schölkopf B, Platt J, Hoffman T, editors. Advances in Neural Information Processing Systems 19 (NIPS’06); 2007. p. 153–60.

  36. Yaron G, Yair H, Omri B, Guy N, Nicole F, Dekel G, et al. Identifying facial phenotypes of genetic disorders using deep learning. Nat Med. 2019;25:60–4.

  37. Komorowski M, Leo AC, Badawi O, Gordon AC, Faisal AA. The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nat Med. 2018;24:1716–20.

  38. Cherkassky M. Application of machine learning methods to medical diagnosis. Chance. 2009;22(1):42–50.

  39. Varun LK, Ryan S, David E. Holographic diagnosis of lymphoma. Nat Biomed Eng. 2018;2:631–2.

  40. Murthy KR, Singh S, Tuck D, Varadan V. Bayesian item response theory for cancer biomarker discovery. In: Rao ASRS, Rao CR, editors. Integrated population biology and modeling. Handbook of Statistics, vol. 40. Amsterdam: Elsevier/North-Holland; 2019.

  41. Kurmukov A, Dodonova Y, Zhukov LE. Machine learning application to human brain network studies: a kernel approach. In: Models, algorithms, and technologies for network analysis. Springer Proceedings in Mathematics & Statistics, vol. 197. Cham: Springer; 2017. p. 229–49.

  42. Saha A, Dewangan C, Narasimhan H, Sampath S, Agarwal S. Learning score systems for patient mortality prediction in intensive care units via orthogonal matching pursuit. In: Proceedings of the 13th International Conference on Machine Learning and Applications (ICMLA); 2014. p. 93–8.

  43. He JY, Wu X, Jiang YG, Peng Q, Jain R. Hookworm detection in wireless capsule endoscopy images with deep learning. IEEE Trans Image Process. 2018;27(5):2379–92.

  44. Gurve D, Krishnan S. Deep learning of EEG time-frequency representations for identifying eye states. Adv Data Sci Adapt Anal. 2018;10(2):1840006, 13 pp.

  45. Carneiro G, Nascimento JC, Freitas A. The segmentation of the left ventricle of the heart from ultrasound data using deep learning architectures and derivative-based search methods. IEEE Trans Image Process. 2012;21(3):968–82.

  46. Rueda A, Krishnan S. Clustering Parkinson’s and age-related voice impairment signal features for unsupervised learning. Adv Data Sci Adapt Anal. 2018;10(2):1840007, 24 pp.

  47. Ustun B, Rudin C. Supersparse linear integer models for optimized medical scoring systems. Mach Learn. 2016;102(3):349–91.

  48. Agarwal S, Niyogi P. Generalization bounds for ranking algorithms via algorithmic stability. J Mach Learn Res. 2009;10:441–74.

  49. Tu C. Comparison of various machine learning algorithms for estimating generalized propensity score. J Stat Comput Simul. 2019;89(4):708–19.

  50. Patel H, Thakkar A, Pandya M, Makwana K. Neural network with deep learning architectures. J Inf Optim Sci. 2018;39(1):31–8.

  51. Polson NG, Sokolov V. Deep learning: a Bayesian perspective. Bayesian Anal. 2017;12(4):1275–304.

  52. Jiequn H, Arnulf J, Weinan E. Solving high-dimensional partial differential equations using deep learning. Proc Natl Acad Sci U S A. 2018;115(34):8505–10.

  53. Agarwal N, Bullins B, Hazan E. Second-order stochastic optimization for machine learning in linear time. J Mach Learn Res. 2017;18: Paper No. 116, 40 pp.

  54. Sirignano J, Spiliopoulos K. DGM: a deep learning algorithm for solving partial differential equations. J Comput Phys. 2018;375:1339–64.

  55. Ye JC, Han Y, Cha E. Deep convolutional framelets: a general deep learning framework for inverse problems. SIAM J Imaging Sci. 2018;11(2):991–1048.

  56. Pan S, Duraisamy K. Data-driven discovery of closure models. SIAM J Appl Dyn Syst. 2018;17(4):2381–413.

  57. Mahsereci M, Hennig P. Probabilistic line searches for stochastic optimization. J Mach Learn Res. 2017;18: Paper No. 119, 59 pp.

  58. Schwab C, Zech J. Deep learning in high dimension: neural network expression rates for generalized polynomial chaos expansions in UQ. Anal Appl (Singap). 2019;17(1):19–55.

  59. Srivastava N, Salakhutdinov R. Multimodal learning with deep Boltzmann machines. J Mach Learn Res. 2014;15:2949–80.

  60. Salakhutdinov R, Hinton G. An efficient learning procedure for deep Boltzmann machines. Neural Comput. 2012;24(8):1967–2006.

  61. Jiang B, Wu TY, Zheng C, Wong WH. Learning summary statistic for approximate Bayesian computation via deep neural network. Stat Sin. 2017;27(4):1595–618.

  62. Deng Y, Bao F, Deng X, Wang R, Kong Y, Dai Q. Deep and structured robust information theoretic learning for image analysis. IEEE Trans Image Process. 2016;25(9):4209–21.

  63. Baldi P, Sadowski P, Lu Z. Learning in the machine: random backpropagation and the deep learning channel. Artif Intell. 2018;260:1–35.

  64. Chao D, Zhu J, Zhang B. Learning deep generative models with doubly stochastic gradient MCMC. IEEE Trans Neural Netw Learn Syst. 2018;29(7):3084–96.

  65. Poggio T, Smale S. The mathematics of learning: dealing with data. Not Am Math Soc. 2003;50(5):537–44.


Acknowledgments

We thank the following individuals, listed in alphabetical order by last name, for their very valuable comments: Medina Jackson-Browne (Brown University, Providence), N.V. Joshi (Indian Institute of Science, Bangalore), K. Praveen (Microsoft, Irvine), and P. Sashank (CEO, Exactco, Hyderabad).

Author information

Corresponding author

Correspondence to Arni S.R. Srinivasa Rao.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

Machine Learning Algorithm for Fertility Treatment Outcome (MLAFTO)

Let x({c, d, g, o}) be the probability that an infertile woman with characteristics ∗, where ∗ = 1, 2, …, k × l × m × n, who is on the ith ovulation induction (OI) treatment or IVF for i = 1, 2, …, p, conceives or delivers a live birth.

We compute x values for all ∗ combinations.

We compute \( \underset{i}{\max}\left(\underset{\ast }{\max }x\right) \) and \( \underset{\ast }{\max}\left(\underset{i}{\max }x\right) \). Once the maximum probabilities are computed, ranking these probabilities over the various ∗ and i provides the relative chances of conceiving and delivering a live birth. The AI component then matches these combinations of ∗ and i with a new couple who comes to the clinic and suggests their probability of conceiving and delivering a live birth (see Appendix 2 for computation of the probabilities and descriptions of the max functions).
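To make the matching and ranking step concrete, the following is a minimal Python sketch, not the authors' implementation; the probability table, the characteristic labels, and the function names are hypothetical placeholders standing in for the x values computed as in Appendix 2.

```python
# Hypothetical sketch of the MLAFTO matching/ranking step (illustrative only).
# 'prob_table' maps (background characteristics, treatment index) to an estimated
# probability of conceiving and delivering a live birth, computed as in Appendix 2.
from typing import Dict, Tuple

Profile = Tuple[str, str, str, str]   # e.g. ("c5", "d4", "g1", "o1")

def best_treatment_per_profile(prob_table: Dict[Tuple[Profile, int], float]):
    """For each background profile *, take the maximum over treatments i (max_i x)."""
    best: Dict[Profile, Tuple[int, float]] = {}
    for (profile, treatment), p in prob_table.items():
        if profile not in best or p > best[profile][1]:
            best[profile] = (treatment, p)
    return best

def rank_options_for_new_couple(prob_table, new_profile: Profile):
    """Match a new couple's profile B_N to the stored profiles and rank treatments."""
    options = [(t, p) for (profile, t), p in prob_table.items() if profile == new_profile]
    return sorted(options, key=lambda tp: tp[1], reverse=True)

# Made-up numbers: two profiles, three treatments (i = 1, 2, 3).
prob_table = {
    (("c5", "d4", "g1", "o1"), 1): 0.21,
    (("c5", "d4", "g1", "o1"), 2): 0.34,
    (("c5", "d4", "g1", "o1"), 3): 0.18,
    (("c1", "d3", "g6", "o2"), 1): 0.12,
    (("c1", "d3", "g6", "o2"), 2): 0.15,
    (("c1", "d3", "g6", "o2"), 3): 0.29,
}

print(best_treatment_per_profile(prob_table))
print(rank_options_for_new_couple(prob_table, ("c5", "d4", "g1", "o1")))
```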

Appendix 2

Computation of Probabilities Through Markov Chains

In this Appendix, we propose a Markov Chain-based approach to computing the probabilities of conception and of delivering a live birth under various treatment options.

Suppose we want to compute the probability of conception, and then of delivering a baby, for an infertile woman with a combination of background variables, say {c5, d4, g1, o1}, and with the OIi or IVF treatments explained in the paper. Let B1 be the set of all infertile women with background variables {c5, d4, g1, o1} who will be on the OIi or IVF treatment options. Let \( {x}_i{\left({B}_1\right)}_{jc}^{(T)} \) be the probability that an infertile woman in state j with characteristics {c5, d4, g1, o1}, on the ith ovulation induction (OI) treatment or IVF for i = 1, 2, …, p, conceives within T time steps. Let \( {x}_i{\left({B}_1\right)}_{jb}^{(T)} \) be the corresponding probability of delivering a baby, and let \( {x}_i{\left({B}_1\right)}_{cb} \) be the probability that a baby is born to a woman in B1 given that she has conceived. These three probabilities can be computed using the formulas below:

$$ {x}_i{\left({B}_1\right)}_{jc}^{(T)}=\frac{\underset{s\in {B}_1}{\int }{W}_{i,T}^{j\to c}(s)\, ds}{\underset{s\in {B}_1}{\int }{W}_i^j(s)\, ds} \qquad \left({A}_{2.1}\right) $$
$$ {x}_i{\left({B}_1\right)}_{jb}^{(T)}=\frac{\underset{s\in {B}_1}{\int }{W}_{i,T}^{j\to b}(s)\, ds}{\underset{s\in {B}_1}{\int }{W}_i^j(s)\, ds} \qquad \left({A}_{2.2}\right) $$
$$ {x}_i{\left({B}_1\right)}_{cb}=\frac{\underset{s\in {B}_1}{\int }{W}_{i,T}^{c\to b}(s)\, ds}{\underset{s\in {B}_1}{\int }{W}_i^c(s)\, ds} \qquad \left({A}_{2.3}\right) $$

where \( {W}_{i,T}^{j\to c}(s) \) indicates that the sth infertile woman in state j, who is on the ith treatment, conceives within T time steps, and \( {W}_i^j(s) \) indicates that the sth infertile woman in state j is on the ith treatment. \( \underset{s\in {B}_1}{\int }{W}_{i,T}^{j\to c}(s)\, ds \) is the total number of infertile women in the set B1 on the ith treatment who have moved from state j to state c, and \( \underset{s\in {B}_1}{\int }{W}_i^j(s)\, ds \) is the total number of women in the set B1 in state j who are on the ith treatment. If B2 is another set of infertile women with different background variables, say B2 = {c1, d3, g6, o2}, with OIi or IVF treatments, then we can compute the corresponding transition probabilities by formulas analogous to (A2.1)–(A2.3).

The probability of transition from state c to state b does not depend on \( \underset{s\in {B}_1}{\int }{W}_i^j(s)\, ds \) but only on \( \underset{s\in {B}_1}{\int }{W}_i^c(s)\, ds \), so the random variable governing the transition between these two states, say Y, obeys the Markov property. Moreover, the transition probability matrix Pi(B1) over the states {j, c, b} for the set of infertile women B1 who are on the ith treatment can be written as:

$$ {P}_i\left({B}_1\right)=\begin{array}{c|ccc} & j & c & b \\ \hline j & {x}_i{\left({B}_1\right)}_{jj} & {x}_i{\left({B}_1\right)}_{jc} & {x}_i{\left({B}_1\right)}_{jb} \\ c & {x}_i{\left({B}_1\right)}_{cj} & {x}_i{\left({B}_1\right)}_{cc} & {x}_i{\left({B}_1\right)}_{cb} \\ b & {x}_i{\left({B}_1\right)}_{bj} & {x}_i{\left({B}_1\right)}_{bc} & {x}_i{\left({B}_1\right)}_{bb} \end{array} $$

where xi(B1)jj + xi(B1)jc = 1, xi(B1)cc + xi(B1)cb = 1, and xi(B1)bb = 1. Here xi(B1)jb = 0 due to the Markov property, whereas xi(B1)cj = xi(B1)bj = xi(B1)bc = 0 because the transitions c → j, b → j, and b → c are impossible. Similarly, we will compute:

$$ {P}_i\left({B}_{\ast}\right)=\begin{array}{c|ccc} & j & c & b \\ \hline j & {x}_i{\left({B}_{\ast}\right)}_{jj} & {x}_i{\left({B}_{\ast}\right)}_{jc} & {x}_i{\left({B}_{\ast}\right)}_{jb} \\ c & {x}_i{\left({B}_{\ast}\right)}_{cj} & {x}_i{\left({B}_{\ast}\right)}_{cc} & {x}_i{\left({B}_{\ast}\right)}_{cb} \\ b & {x}_i{\left({B}_{\ast}\right)}_{bj} & {x}_i{\left({B}_{\ast}\right)}_{bc} & {x}_i{\left({B}_{\ast}\right)}_{bb} \end{array} $$

for ∗ = 1, 2, …, k × l × m × n. Let \( {W}_i^j\left({B}_{\ast}\right) \) be the number of infertile women (in state j) with background characteristics B∗ who are on the ith treatment, and let \( {W}^j\left({B}_{\ast}\right) \) be the total number of infertile women with background characteristics B∗, such that:

$$ {W}^j\left({B}_{\ast}\right)=\bigcup \limits_{i=1}^p{W}_i^j\left({B}_{\ast}\right) $$
$$ \bigcap \limits_{i=1}^p{W}_i^j\left({B}_{\ast}\right)=\varnothing \left(\mathrm{empty}\ \mathrm{set}\right). $$

Once Pi(B∗) is computed based on a certain design of the sample population, the sizes of \( {W}_i^j\left({B}_{\ast}\right) \) are not changed when computing probabilities using (A2.1)–(A2.3). That is, the matrix Pi(B∗) is not updated with newer women who start treatment after the designed time interval.
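The following is a minimal sketch, assuming the integrals over B1 have already been reduced to counts from such a designed sample; the counts and variable names are made up, and the block only illustrates how (A2.1)–(A2.3) and the structure of Pi(B1) fit together.

```python
import numpy as np

# Minimal sketch of (A2.1)-(A2.3) and P_i(B_1), assuming the integrals over B_1
# have already been reduced to counts. All numbers are made up.
n_state_j = 200   # women in B_1 in state j on the ith treatment
n_j_to_c  = 60    # of those, women who conceived within T time steps
n_c_to_b  = 45    # of those, women who went on to deliver a live birth

x_jc = n_j_to_c / n_state_j   # (A2.1)
x_cb = n_c_to_b / n_j_to_c    # (A2.3): conditional on having conceived
x_jb = 0.0                    # direct j -> b transition excluded by the model

# Transition matrix over the states (j, c, b): rows sum to 1,
# c -> j, b -> j, and b -> c are impossible, and b is absorbing.
P_i_B1 = np.array([
    [1.0 - x_jc, x_jc,       x_jb],
    [0.0,        1.0 - x_cb, x_cb],
    [0.0,        0.0,        1.0 ],
])

assert np.allclose(P_i_B1.sum(axis=1), 1.0)
print(P_i_B1)
```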

Two maximization functions are used: \( \underset{i}{\max}\left\{{x}_i{\left({B}_{\ast}\right)}_{jc}\right\} \) and \( \underset{\ast }{\max}\left\{{x}_i{\left({B}_{\ast}\right)}_{jc}\right\} \).

The function \( \underset{i}{\max}\left\{{x}_i{\left({B}_{\ast}\right)}_{jc}\right\} \) gives the maximum of the probability values for women with background characteristics B∗ across all the treatments, which is obtained as:

$$ \max \left\{\frac{\underset{s\in {B}_{\ast }}{\int }{W}_{1,T}^{j\to c}(s)\, ds}{\underset{s\in {B}_{\ast }}{\int }{W}_1^j(s)\, ds},\frac{\underset{s\in {B}_{\ast }}{\int }{W}_{2,T}^{j\to c}(s)\, ds}{\underset{s\in {B}_{\ast }}{\int }{W}_2^j(s)\, ds},\dots \right\} \qquad \left({A}_{2.4}\right) $$

Through the expression (A2.4), we obtain k × l × m × n maximum values, where each maximum value represents the maximum probability of conceiving for an infertile woman from a particular set of background characteristics, together with the treatment type for which this maximum is attained. Similarly, we can construct \( \underset{i}{\max}\left\{{x}_i{\left({B}_{\ast}\right)}_{cb}\right\} \). The function \( \underset{\ast }{\max}\left\{{x}_i{\left({B}_{\ast}\right)}_{jc}\right\} \) gives the maximum probability of conceiving among women on the ith treatment across the different background characteristics, which is obtained as:

$$ \max \left\{\frac{\underset{s\in {B}_1}{\int }{W}_{i,T}^{j\to c}(s)\, ds}{\underset{s\in {B}_1}{\int }{W}_i^j(s)\, ds},\frac{\underset{s\in {B}_2}{\int }{W}_{i,T}^{j\to c}(s)\, ds}{\underset{s\in {B}_2}{\int }{W}_i^j(s)\, ds},\dots \right\} \qquad \left({A}_{2.5}\right) $$
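As a toy illustration (not from the paper), if the probabilities xi(B∗)jc are arranged in an array with one row per background profile ∗ and one column per treatment i, then (A2.4) corresponds to row-wise maxima and (A2.5) to column-wise maxima; all numbers below are invented.

```python
import numpy as np

# Toy illustration of (A2.4) and (A2.5): rows index background profiles (*),
# columns index treatments (i); entries are made-up conception probabilities.
x_jc = np.array([
    [0.21, 0.34, 0.18],    # profile B_1
    [0.12, 0.15, 0.29],    # profile B_2
    [0.25, 0.22, 0.20],    # profile B_3
])

best_over_treatments = x_jc.max(axis=1)     # (A2.4): max over i for each profile *
best_treatment_index = x_jc.argmax(axis=1)  # which treatment attains each maximum
best_over_profiles   = x_jc.max(axis=0)     # (A2.5): max over * for each treatment i

print(best_over_treatments, best_treatment_index, best_over_profiles)
```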


See also Fig. 2 for the disjoint property, noted above, of infertile women within each set of background characteristics.

Fig. 2. Disjoint sets of infertile women across all the treatment options within the background characteristics (B∗)

Result: The total number of infertile women with background characteristics B∗ can be written as the union of disjoint sets of women across all the treatment options, i.e.,

$$ {W}^j\left({B}_{\ast}\right)=\bigcup \limits_{i=1}^p{W}_i^j\left({B}_{\ast}\right), \qquad \text{equivalently}\qquad \underset{s\in {B}_{\ast }}{\int }{W}^j(s)\, ds=\sum \limits_{i=1}^p\underset{s\in {B}_{\ast }}{\int }{W}_i^j(s)\, ds \qquad \left({A}_{2.6}\right) $$

Appendix 3

Machine Learning Versus Deep Learning in Computing Probabilities of Conception and Delivery

Suppose a new infertile woman with background characteristics {BN} is interested in starting one of the available treatments, OIi or IVF. Let us consider how machine learning techniques are applied to decide which of the treatments will give the maximum chance of conception and of delivering a baby. Prior to the decision-making process on treatment options for this woman, suppose that the probabilities of conception and delivery were previously computed through the MLAFTO explained in Appendix 1 and the matrices Pi(B∗) for all ∗ in Appendix 2. The data used for these two computations are usually predetermined or pre-designed, i.e., the time frame and other design aspects of the data were well defined and are free of data-related errors. MLAFTO matches the new infertile woman's characteristic set BN with the sets {B∗ : ∗ = 1, 2, …, k × l × m × n}. Let {By} be the set that matches the new woman's characteristics, such that {By} − {BN} = ∅ (the null set). The corresponding values of

x(By)jc and x(By)cb

are considered the chances of conception and of delivery for the new woman who comes to the clinic.

Note that the success or failure data of the woman with {BN} is not used in the computation of Pi(B∗) for any ∗, which is the key feature of a machine learning type of algorithm.

If each treatment trial of a woman, whether or not she conceives, is considered as one time step of treatment (one treatment cycle), and the duration from conception to whether or not a baby is delivered is considered as one time step of pregnancy (one pregnancy cycle), and if \( x{\left({B}_y\right)}_{jc}^{(n)} \) and \( x{\left({B}_y\right)}_{cb}^{(m)} \) denote the corresponding n-step and m-step (cycle) probabilities, then by the Markov property we have

$$ x{\left({B}_y\right)}_{jc}^{(n)}\times x{\left({B}_y\right)}_{cb}^{(m)}=x{\left({B}_y\right)}_{jb}^{\left(n+m\right)} $$

When another infertile woman with background characteristics {BM} comes to the clinic to decide which type of treatment is needed for a successful delivery, the previously computed transition probability matrices Pi(B∗) that were used in matching the woman with {BN} are not updated with her success or failure information. In this sense, the matrices Pi(B∗) are static when we use machine learning algorithms; they are not influenced by new data generated on newer infertile women who come to the clinic after Pi(B∗) is constructed.

Once an infertile woman with background characteristics {BM} walks into the clinic, if deep learning techniques are implemented to predict the probabilities of conceiving (say, y(BM)jc) and of delivery (say, y(BM)cb), then the computation of these probabilities differs from the machine learning techniques. Each time a new infertile woman with {BM} comes to the clinic for treatment purposes, instead of the matching procedure with the existing static model explained above, deep learning involves reconstructing the transition probability matrices Pi(B∗), for i = 1, 2, …, p, for conceiving and delivery with whatever data are available prior to the arrival of the woman with {BM}. The rest of the computational procedures explained in Appendix 2 remain the same. Deep learning techniques usually delay the output because Pi(B∗) is reconstructed each and every time a new infertile woman comes to the clinic.
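To illustrate the distinction (a schematic sketch, not the authors' software; the record format and class names are hypothetical), the "machine learning" regime below freezes the probability table computed from the design data, while the "deep learning" regime recomputes it from all accumulated records before every prediction:

```python
# Schematic contrast between the two regimes described above (illustrative only).
def build_probability_table(records):
    """records: iterable of (profile, treatment, conceived: bool) -> x(B_*)_jc estimates."""
    counts = {}
    for profile, treatment, conceived in records:
        total, success = counts.get((profile, treatment), (0, 0))
        counts[(profile, treatment)] = (total + 1, success + int(conceived))
    return {key: s / t for key, (t, s) in counts.items()}

class StaticPredictor:            # "machine learning" regime: table frozen at design time
    def __init__(self, design_records):
        self.table = build_probability_table(design_records)
    def predict(self, profile, treatment):
        return self.table.get((profile, treatment))

class RefittingPredictor:         # "deep learning" regime: rebuilt before every prediction
    def __init__(self, design_records):
        self.records = list(design_records)
    def observe(self, record):
        self.records.append(record)   # new outcomes feed back into later estimates
    def predict(self, profile, treatment):
        return build_probability_table(self.records).get((profile, treatment))
```

In this sketch the refitting predictor pays the cost of rebuilding the table on every call, which mirrors the output delay mentioned above.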

General introductions to machine learning techniques, motivations, and key ideas, as explained across a variety of research areas, can be found in [9,10,11,12,13]. Specific ideas related to deep learning techniques have also been well developed [14], deep learning techniques and applications have been summarized [15], and an overview of the importance of machine learning algorithms in medicine can be found in [16]. As explained in our article, machine learning and deep learning techniques broadly use the same data for the same specific goals, but their approaches to handling the data and models distinguish them from each other. Statistical thinking has contributed to several aspects of machine learning, for example, in developing computationally intensive data classification algorithms, methods for data search and matching probabilities, data mining techniques, model classification and model fitting algorithms, and combinations of all these (see, e.g., [17,18,19,20,21,22,23,24,25,26,27,28,29]); for a collection of articles related to statistical methods in machine learning, see [30]. Model-based machine learning methods have been developed [31], and the construction of coefficients in a regression model can benefit from machine learning methods [32].

Deep learning techniques, instead of focusing on model-based approaches, assist in understanding intricate structures of large data sets and the various interlinkages between these data sets [33]. The importance of unsupervised pre-training for the structural architecture, and hypotheses for testing the design effects of such experiments, are well studied [34, 35]. Deep learning and machine learning techniques can also assist with questions related to health informatics, disease detection, item response theories, and bioinformatics research [36,37,38,39,40,41]. There are also successful deep learning methods that score patients in the intensive care unit (ICU) for severity and predict mortality without using any model-based assumptions in the scoring systems [42], as well as other medical applications, for example, detection of worms through endoscopy [43], ophthalmology studies [44], cardiovascular studies [45], Parkinson's disease data [46], and medical scoring systems [47]. Deep learning procedures involving various levels of abstraction for ranking system models can be found in [48, 49]; applications to mathematical models, parameter computations, and stability of algorithms are found in [50,51,52,53,54,55,56].

Statistical and stochastic modeling principles have been applied in deep learning algorithms to strengthen object search capabilities or to improve model fitting under uncertainty [32, 57, 58]. Boltzmann machines assist in a deeper understanding of the data by linking layer-level structured data and then estimating model parameters through maximum likelihood methods [59, 60]. Random backpropagation and backpropagation methods help in forming stochastic transition matrices and in computing quicker search algorithms for higher-dimensional stochastic matrices; literature related to backpropagation can be found in several places, for example [61,62,63,64]. A survey of statistical learning algorithms and their performance evaluations can be found in [65].

Appendix 4

Theorems

Theorem A.1: When Wj is the total number of infertile women (state j) whose data are used in the machine learning algorithm, and δ ∈ [1, klmn] and α ∈ [1, p] are treated as continuous indices for the background characteristics and treatment options, respectively, then

$$ \frac{1}{pklmn}\left[\underset{\delta =1}{\overset{klmn}{\int }}\underset{\alpha =1}{\overset{p}{\int }}\frac{W_{\alpha}^{j\to c}\left({B}_{\delta}\right)}{W_{\alpha}^j\left({B}_{\delta}\right)}\, d\alpha\, d\delta +\underset{\delta =1}{\overset{klmn}{\int }}\underset{\alpha =1}{\overset{p}{\int }}\frac{W_{\alpha}^{c\to b}\left({B}_{\delta}\right)}{W_{\alpha}^j\left({B}_{\delta}\right)}\, d\alpha\, d\delta \right]\le 1 $$

Proof: We have,

$$ {W}^j\left({B}_{\delta}\right)={\int}_1^p{W}_{\alpha}^j\left({B}_{\delta}\right)\, d\alpha \qquad \left({A}_{5.1}\right) $$

and

$$ {W}^j={\int}_1^{klmn}{\int}_1^p{W}_{\alpha}^j\left({B}_{\delta}\right)\, d\alpha\, d\delta \qquad \left({A}_{5.2}\right) $$

Note that,

$$ \frac{W_1^{j\to c}\left({B}_{\delta}\right)}{W_1^j\left({B}_{\delta}\right)}+\frac{W_2^{j\to c}\left({B}_{\delta}\right)}{W_2^j\left({B}_{\delta}\right)}+\dots +\frac{W_p^{j\to c}\left({B}_{\delta}\right)}{W_p^j\left({B}_{\delta}\right)}\le p\dots ..\dots .\left({A}_{5.3}\right) $$

and

$$ \frac{W_1^{c\to b}\left({B}_{\delta}\right)}{W_1^j\left({B}_{\delta}\right)}+\frac{W_2^{c\to b}\left({B}_{\delta}\right)}{W_2^j\left({B}_{\delta}\right)}+\dots +\frac{W_p^{c\to b}\left({B}_{\delta}\right)}{W_p^j\left({B}_{\delta}\right)}\le p\dots ..\dots .\left({A}_{5.4}\right) $$

From the inequality (A5.3), we obtain

$$ {\int}_1^{klmn}{\int}_1^p\frac{W_{\alpha}^{j\to c}\left({B}_{\delta}\right)}{W_{\alpha}^j\left({B}_{\delta}\right)} d\alpha d\delta \le pklmn\dots ..\dots .\left({A}_{5.5}\right) $$

From the inequality (A5.4), we obtain

$$ {\int}_1^{klmn}{\int}_1^p\frac{W_{\alpha}^{c\to b}\left({B}_{\delta}\right)}{W_{\alpha}^j\left({B}_{\delta}\right)} d\alpha d\delta \le pklmn\dots ..\dots .\left({A}_{5.6}\right) $$

The required result is deduced from the two inequalities (A5.5) and (A5.6).

Theorem A.2: For continuous α and δ, we have

$$ \left({\int}_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta \right)\left({\int}_{\delta }{W}_{\alpha}^b\left({B}_{\delta}\right) d\delta \right)\le {\left({\int}_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right) d\delta \right)}^2 $$

Proof: We know,

$$ \frac{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right)\, d\delta}{\int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right)\, d\delta}=\begin{cases}0 & \text{if no infertile woman with }\alpha \text{ conceives}\\ 1 & \text{if every infertile woman with }\alpha \text{ conceives}\\ \theta \in \left(0,1\right) & \text{if at least one, but not every, woman with }\alpha \text{ conceives}\end{cases} $$
$$ \frac{\int_{\delta }{W}_{\alpha}^b\left({B}_{\delta}\right)\, d\delta}{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right)\, d\delta}=\begin{cases}0 & \text{if no conceived woman with }\alpha \text{ delivers}\\ 1 & \text{if every conceived woman with }\alpha \text{ delivers}\\ \gamma \in \left(0,1\right) & \text{if at least one, but not every, conceived woman with }\alpha \text{ delivers}\end{cases} $$

These imply,

$$ 0\le \frac{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta}{\int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right) d\delta}\le 1\dots ..\dots .\left({A}_{5.7}\right) $$
$$ 0\le \frac{\int_{\delta }{W}_{\alpha}^b\left({B}_{\delta}\right) d\delta}{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta}\le 1\dots ..\dots .\left({A}_{5.8}\right) $$

From (A5.7) and (A5.8), we can deduce the required result: (A5.7) gives \( \int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right)\, d\delta \le \int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right)\, d\delta \), (A5.8) together with (A5.7) gives \( \int_{\delta }{W}_{\alpha}^b\left({B}_{\delta}\right)\, d\delta \le \int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right)\, d\delta \), and multiplying these two bounds yields the stated inequality.
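A quick numerical sanity check of this inequality with made-up counts (purely illustrative; the array sizes and ranges are arbitrary) can be done as follows, using the fact that deliveries never exceed conceptions and conceptions never exceed the number of women in state j:

```python
import numpy as np

# Numerical sanity check of the Theorem A.2 inequality with made-up counts.
rng = np.random.default_rng(0)

n_profiles = 8                                                  # role of the delta index
w_j = rng.integers(50, 200, size=n_profiles)                    # women in state j
w_c = np.minimum(w_j, rng.integers(0, 200, size=n_profiles))    # conceived <= state j
w_b = np.minimum(w_c, rng.integers(0, 200, size=n_profiles))    # delivered <= conceived

lhs = w_c.sum() * w_b.sum()
rhs = w_j.sum() ** 2
assert lhs <= rhs
print(lhs, "<=", rhs)
```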

Theorem A.3: Let f : A → ℝ+ and g : B → ℝ+, where A is the set of fractions in (A5.7) and B is the set of all fractions in (A5.8); then f and g are defined only at the adherent points of A and B, respectively.

Proof: Note that,

$$ \min \left({\int}_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta \right)=\min \left({\int}_{\delta }{W}_{\alpha}^b\left({B}_{\delta}\right) d\delta \right)=0 $$

and

$$ \max \left({\int}_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta \right)=\left({\int}_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right) d\delta \right) $$

Two sets A and B are constructed from (A5.7) and (A5.8) as

$$ A=\left\{0,\frac{1}{\int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right) d\delta},\frac{2}{\int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right) d\delta},\dots, 1\right\}\dots ..\dots .\left({A}_{5.9}\right) $$
$$ B=\left\{0,\frac{1}{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta},\frac{2}{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta},\dots, 1\right\}\dots ..\dots .\left({A}_{5.10}\right) $$

From the elements of the set A as in (A5.9), f is not defined on the open subintervals

$$ \left(0,\frac{1}{\int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right) d\delta}\right),\left(\frac{1}{\int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right) d\delta},\frac{2}{\int_{\delta }{W}_{\alpha}^j\left({B}_{\delta}\right) d\delta}\right),\dots $$

and from the elements of the set B as in (A5.10), g is not defined on the open subintervals

$$ \left(0,\frac{1}{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta}\right),\left(\frac{1}{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta},\frac{2}{\int_{\delta }{W}_{\alpha}^c\left({B}_{\delta}\right) d\delta}\right),\dots . $$

Hence, f and g are defined only at the adherent points of A and B.


Cite this article

Srinivasa Rao, A.S., Diamond, M.P. Deep Learning of Markov Model-Based Machines for Determination of Better Treatment Option Decisions for Infertile Women. Reprod. Sci. 27, 763–770 (2020). https://doi.org/10.1007/s43032-019-00082-9
