Complementing real datasets with simulated data: a regression-based approach


Activity recognition in smart environments is essential for ensuring the wellbeing of older residents. By tracking activities of daily living (ADLs), a person’s health status can be monitored over time. Nonetheless, accurate activity classification must overcome the fact that each person performs ADLs differently and in homes with different layouts. One possible solution is to collect large amounts of data to train a supervised classifier. Data collection in real environments, however, is very expensive and cannot capture every possible variation in how ADLs are performed. A more cost-effective solution is to generate a variety of simulated scenarios and synthesize large amounts of data. However, simulated data can differ considerably from real data. Therefore, this paper proposes the use of regression models to better approximate real observations from simulated data. To achieve this, ADL data from a smart home were first compared with equivalent ADLs performed in a simulator. The comparison considered the number of events per activity, the number of events per sensor type per activity, and activity duration. Different regression models were then assessed for estimating real data from simulated data. The results showed that simulated data can be transformed with a coefficient of determination (R²) of 97.03%.
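The abstract describes the approach only at a high level: a regression model maps a simulated measurement (for example, activity duration) onto its real-world counterpart, and the fit is scored with R². The Python sketch below illustrates that idea under stated assumptions. It compares an ordinary least-squares line with one possible non-linear option, a power model fitted on log-transformed values, and computes R² = 1 - SS_res/SS_tot for each. The function names, the choice of a power model, and the toy data (generated from an exact power law) are hypothetical and are not taken from the paper.

```python
import math


def fit_linear(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def fit_power(x, y):
    """Hypothetical non-linear model y = c * x**k, fitted by OLS on
    ln(y) = ln(c) + k*ln(x). Requires strictly positive x and y."""
    a, k = fit_linear([math.log(v) for v in x], [math.log(v) for v in y])
    return math.exp(a), k


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Toy data (hypothetical): simulated vs. real activity durations in seconds,
# generated from an exact power law so the expected coefficients are known.
sim = [30.0, 45.0, 60.0, 90.0, 120.0, 180.0]
real = [2.0 * s ** 0.9 for s in sim]

a, b = fit_linear(sim, real)
c, k = fit_power(sim, real)

# Score both models on the original (untransformed) scale.
r2_lin = r_squared(real, [a + b * s for s in sim])
r2_pow = r_squared(real, [c * s ** k for s in sim])
print(f"linear R2={r2_lin:.4f}, power R2={r2_pow:.4f}")
```

Fitting the power model on log-transformed values is a convenience: it minimizes squared error on the log scale rather than the original scale, so a direct non-linear least-squares fit could yield slightly different coefficients on noisy data.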


Figs. 1–8




The authors wish to acknowledge support from the REMIND project, funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 734355.

Author information



Corresponding author

Correspondence to M. A. Ortiz-Barrios.



About this article


Cite this article

Ortiz-Barrios, M.A., Lundström, J., Synnott, J. et al. Complementing real datasets with simulated data: a regression-based approach. Multimed Tools Appl 79, 34301–34324 (2020).



Keywords

  • Activity recognition
  • Activity duration
  • Regression analysis
  • Non-linear models
  • Determination coefficient
  • Quantile-quantile plots