Improving the Quality of User Generated Data Sets for Activity Recognition
It is widely appreciated that progress in the development of data driven approaches to activity recognition is being hampered by the lack of large scale, high quality, annotated data sets. To address this, the Open Data Initiative (ODI) was conceived as a means of creating shared resources for the collection and sharing of open data sets. As part of this process, an analysis was undertaken of data sets collected using a smart environment simulation tool. A noticeable difference was found in the data produced during the first 1–2 cycles of users generating data. Further analysis demonstrated the effect this had on the development of activity recognition models, with a decrease in performance for both support vector machine and decision tree based classifiers. The outcome of the study has led to a strategy to ensure that an initial training phase is considered prior to full scale collection of the data.
Keywords: Activity recognition, Open data sets, Data validation, Data driven classification
Invest Northern Ireland partially supported this project under the Competence Centre Program Grant RD0513853 – Connected Health Innovation Centre.