Improving process algebra model structure and parameters in infectious disease epidemiology through data mining

Journal of Intelligent Information Systems

Abstract

Computational models are increasingly used to assist decision-making in public health epidemiology, but arriving at the best model is a complex task: many components interact, and variation in parameter values can produce radically different dynamics. The modelling process can be enhanced through the use of data mining techniques. Here, we demonstrate this by applying association rules and clustering techniques to two stages of modelling: identifying pertinent structures in the initial model creation stage, and choosing optimal parameters to match that model to observed data. This is illustrated through application to the study of the circulating mumps virus in Scotland, 2004-2015.

Acknowledgments

We thank the anonymous reviewers for critical support and review of the manuscript.

Author information

Corresponding author

Correspondence to Dalila Hamami.

Appendices

Appendix A: Additional figures

Figures 8, 9 and 10 show K-means clustering applied to confirmed mumps cases, plotted by age, MMR status and sex respectively. In each case the clustering is not meaningful for the model: both clusters contain similar age groups and similar distributions of MMR status and sex.

Fig. 8 K-means clustering applied to mumps confirmed cases OneYearOneBoardVaccStatus: Clusters vs Age

Fig. 9 K-means clustering applied to mumps confirmed cases OneYearOneBoardVaccStatus: Clusters vs MMR Status

Fig. 10 K-means clustering applied to mumps confirmed cases OneYearOneBoardVaccStatus: Clusters vs Sex
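
As a rough illustration of the kind of check behind Figs. 8 to 10, the sketch below runs a plain K-means (Lloyd's algorithm) on a handful of made-up records and then compares the mean age in each cluster. Everything here is an assumption for illustration only: the records, the two features (a case count and an age), the choice of k = 2 and the starting centroids are hypothetical and are not taken from the mumps dataset or from the tooling used in the article.

```python
def kmeans(points, init_centroids, max_iter=100):
    """Plain Lloyd's algorithm. Returns (labels, centroids)."""
    centroids = list(init_centroids)  # copy so the caller's list is untouched
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centroids[k])))
            for pt in points
        ]
        if new_labels == labels:  # converged: no assignment changed
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def mean_age_by_cluster(points, labels, age_index=1):
    """Mean of the age feature within each cluster."""
    ages = {}
    for pt, lab in zip(points, labels):
        ages.setdefault(lab, []).append(pt[age_index])
    return {lab: sum(v) / len(v) for lab, v in ages.items()}


# Hypothetical toy records: (case_count, age). Not real surveillance data.
RECORDS = [(2, 18), (3, 25), (4, 21), (3, 30), (2, 19),
           (40, 20), (42, 27), (38, 19), (41, 29), (39, 22)]

# Deterministic start: first and last record as initial centroids.
LABELS, CENTROIDS = kmeans(RECORDS, [RECORDS[0], RECORDS[-1]])
MEAN_AGES = mean_age_by_cluster(RECORDS, LABELS)
```

In this toy setting the two clusters separate on the case count, while their mean ages are nearly the same, which is the pattern described above as "not meaningful" for an age-structured model. A real analysis would normally scale the features first and use an established implementation rather than this hand-written loop.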

Appendix B: Model

    // Rates are per day: the model derives months as floor(time/30).
    mu = 0.000037;
    beta1 = 0.7;
    beta2 = 0.9;
    beta3 = 0.4;
    mu1 = 0.0000021;
    mu2 = 0.0000028;
    mu3 = 0.000025;
    alpha = 0.05;
    gamma = 0.143;
    lambda = 0.07;
    tau = 0.00034;
    delta = tau/2;
    sizeOutside = 110000;
    sizeLocal = 5300000;
    location world : size = 5200000, type = compartment;
    location Local in world : size = sizeLocal, type = compartment;
    location Outside in world : size = sizeOutside, type = compartment;
    thigh = 4;
    tlow = 9;
    month = floor(time/30);
    // season_time switches between beta1/beta2 (value 1) and beta3 (value 0).
    season_time = 1 - H(((month - 12*floor(month/12)) - tlow) * (thigh - (month - 12*floor(month/12))));
    // Total population in Local.
    N = (S1@Local + E@Local + I@Local + R@Local + S2@Local + MMR1@Local + MMR2@Local);

Kinetic Laws

    kineticLawOf BIRTH1: mu1 * N;
    kineticLawOf BIRTH2: mu2 * N;
    kineticLawOf BIRTH3: mu3 * N;
    kineticLawOf MMR1_S2: MMR1@Local * tau;
    kineticLawOf MMR2_S2: MMR2@Local * delta;
    kineticLawOf Death_MMR1: mu * MMR1@Local;
    kineticLawOf Death_MMR2: mu * MMR2@Local;
    kineticLawOf immigration: lambda/10000;
    kineticLawOf S1_E: (beta1 * S1@Local * I@Local)/N * (season_time) + (1-season_time) * (beta3 * S1@Local * I@Local)/N;
    kineticLawOf S2_E: (beta2 * S2@Local * I@Local)/N * (season_time) + (1-season_time) * (beta3 * S2@Local * I@Local)/N;
    kineticLawOf E_I: alpha * E@Local;
    kineticLawOf I_R: gamma * I@Local;
    kineticLawOf Death_S1: mu * S1@Local;
    kineticLawOf Death_I: mu * I@Local;
    kineticLawOf Death_E: mu * E@Local;
    kineticLawOf Death_S2: mu * S2@Local;
    kineticLawOf Death_R: mu * R@Local;

Species

    S1 = (BIRTH1,1) >> S1@Local + (S1_E,1) << S1@Local + Death_S1 << S1@Local;
    S2 = (S2_E,1) << S2@Local + Death_S2 << S2@Local + (MMR2_S2,1) >> S2@Local + (MMR1_S2,1) >> S2@Local;
    E = (S1_E,1) >> E@Local + (S2_E,1) >> E@Local + (E_I,1) << E@Local + Death_E << E@Local;
    I = (E_I,1) >> I@Local + (I_R,1) << I@Local + Death_I << I@Local + immigration[Outside -> Local] (.) I + (S1_E,1) (.) I + (S2_E,1) (.) I;
    R = (I_R,1) >> R@Local + Death_R << R@Local;
    MMR1 = (BIRTH2,1) >> MMR1@Local + (MMR1_S2,1) << MMR1@Local + Death_MMR1 << MMR1@Local;
    MMR2 = (BIRTH3,1) >> MMR2@Local + (MMR2_S2,1) << MMR2@Local + Death_MMR2 << MMR2@Local;

Model component

    S1@Local[1100000] <*> S2@Local[0] <*> E@Local[0] <*> I@Local[20] <*> R@Local[3218600] <*> MMR1@Local[273541] <*> MMR2@Local[250000] <*> I@Outside[10000]
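
To make the dynamics encoded by the kinetic laws easier to follow, here is a minimal Python sketch of one possible deterministic (ODE) reading of the model, integrated with a simple forward Euler step. Several things are assumptions rather than part of the original: the ODE interpretation itself, the Heaviside convention H(x) = 1 for x > 0 and 0 otherwise, the step size, and all Python names (`season_time`, `derivatives`, `simulate`, `P`, `Y0`). Parameter values and initial amounts are copied from the listing above. Under the assumed Heaviside convention, `season_time` is 0 for months 5 to 8 of each 12-month cycle (so `beta3` applies) and 1 otherwise (so `beta1` and `beta2` apply).

```python
import math

# Parameter values copied from the Appendix B listing (rates per day).
P = dict(mu=0.000037, beta1=0.7, beta2=0.9, beta3=0.4,
         mu1=0.0000021, mu2=0.0000028, mu3=0.000025,
         alpha=0.05, gamma=0.143, lam=0.07, tau=0.00034)
P["delta"] = P["tau"] / 2
T_HIGH, T_LOW = 4, 9


def heaviside(x):
    # Assumption: H(x) = 1 for x > 0 and 0 otherwise.
    return 1.0 if x > 0 else 0.0


def season_time(t):
    """Seasonal switch from the listing; t is in days, months are 30 days."""
    month = math.floor(t / 30)
    m = month - 12 * math.floor(month / 12)  # month within the 12-month cycle
    return 1.0 - heaviside((m - T_LOW) * (T_HIGH - m))


def derivatives(t, y):
    """Right-hand side for [S1, S2, E, I, R, MMR1, MMR2, I_outside]."""
    s1, s2, e, i, r, mmr1, mmr2, _i_out = y
    n = s1 + e + i + r + s2 + mmr1 + mmr2  # N: total population in Local
    s = season_time(t)
    inf1 = (s * P["beta1"] + (1 - s) * P["beta3"]) * s1 * i / n  # S1_E
    inf2 = (s * P["beta2"] + (1 - s) * P["beta3"]) * s2 * i / n  # S2_E
    imm = P["lam"] / 10000  # constant immigration of infecteds into Local
    mu = P["mu"]
    return [
        P["mu1"] * n - inf1 - mu * s1,                          # S1
        P["tau"] * mmr1 + P["delta"] * mmr2 - inf2 - mu * s2,   # S2
        inf1 + inf2 - P["alpha"] * e - mu * e,                  # E
        P["alpha"] * e - P["gamma"] * i - mu * i + imm,         # I (Local)
        P["gamma"] * i - mu * r,                                # R
        P["mu2"] * n - P["tau"] * mmr1 - mu * mmr1,             # MMR1
        P["mu3"] * n - P["delta"] * mmr2 - mu * mmr2,           # MMR2
        -imm,                                                   # I (Outside)
    ]


def simulate(y0, days, dt=0.5):
    """Forward Euler integration; returns the state after `days` days."""
    y, t = list(y0), 0.0
    for _ in range(int(round(days / dt))):
        dy = derivatives(t, y)
        y = [v + dt * d for v, d in zip(y, dy)]
        t += dt
    return y


# Initial amounts from the model component line, in the order used above.
Y0 = [1100000, 0, 0, 20, 3218600, 273541, 250000, 10000]
```

Calling `simulate(Y0, 365)` would give the state after roughly one year under these assumptions. Forward Euler is used only to keep the sketch dependency-free; a stochastic simulation or a proper ODE solver may behave differently, particularly given the small initial number of infected individuals.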

Cite this article

Hamami, D., Atmani, B., Cameron, R. et al. Improving process algebra model structure and parameters in infectious disease epidemiology through data mining. J Intell Inf Syst 52, 477–499 (2019). https://doi.org/10.1007/s10844-017-0476-1

  • DOI: https://doi.org/10.1007/s10844-017-0476-1
