Complementing real datasets with simulated data: a regression-based approach

  • M. A. Ortiz-Barrios
  • J. Lundström
  • J. Synnott
  • E. Järpe
  • A. Sant’Anna


Activity recognition in smart environments is essential for ensuring the wellbeing of older residents. By tracking activities of daily living (ADLs), a person’s health status can be monitored over time. Accurate activity classification, however, must cope with the fact that each person performs ADLs differently and in homes with different layouts. One possible solution is to collect large amounts of data to train a supervised classifier. Data collection in real environments, however, is very expensive and cannot cover every possible variation in how ADLs are performed. A more cost-effective solution is to generate a variety of simulated scenarios and synthesize large amounts of data. Simulated data, though, can differ considerably from real data. This paper therefore proposes the use of regression models to better approximate real observations from simulated data. To this end, ADL data from a smart home were first compared with equivalent ADLs performed in a simulator. The comparison considered the number of events per activity, the number of events per sensor type per activity, and activity duration. Different regression models were then assessed for estimating real data from simulated data. The results showed that simulated data can be transformed to approximate real data with a coefficient of determination of R² = 97.03%.
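As a rough illustration of the idea in the abstract, the sketch below fits a regression that maps a simulated quantity (here, activity duration) to its real counterpart and then computes the coefficient of determination. It is a minimal sketch under stated assumptions: the abstract does not give the authors' model forms or data, so a simple linear least-squares fit stands in for the models they assessed, the function names `fit_line` and `r_squared` are hypothetical, and the numbers are made up purely for demonstration.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up illustrative durations in seconds (NOT data from the paper):
# one value per activity instance, simulated vs. observed in a real home.
simulated = [30.0, 45.0, 60.0, 90.0, 120.0]
real = [41.0, 55.0, 78.0, 110.0, 149.0]

# Fit the mapping simulated -> real, then score it on the same data.
a, b = fit_line(simulated, real)
predicted = [a + b * s for s in simulated]
r2 = r_squared(real, predicted)

print(f"real ≈ {a:.2f} + {b:.2f} * simulated, R^2 = {r2:.4f}")
```

The same pattern would apply to the other quantities compared in the study, such as the number of events per activity. A non-linear model could replace `fit_line` without changing how R² is computed.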


Activity recognition · Activity duration · Regression analysis · Non-linear models · Determination coefficient · Quantile-quantile plots



The authors wish to acknowledge support from the REMIND Project, funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 734355.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Authors and Affiliations

  1. Department of Industrial Management, Agroindustry and Operations, Universidad de la Costa CUC, Barranquilla, Colombia
  2. Convergia Consulting, Halmstad, Sweden
  3. School of Computing, Computer Science Research Institute, Ulster University, Belfast, UK
  4. Department of Intelligent Systems and Digital Design, Halmstad University, Halmstad, Sweden
