K-means DTW Barycenter Averaging: a clustering analysis of COVID-19 cases and deaths on the Brazilian federal units

  • Regular Paper
International Journal of Data Science and Analytics

Abstract

A challenge faced while monitoring the COVID-19 pandemic in Brazil is the identification of patterns of incidence and mortality, which can help prioritize interventions to avoid excessive disease transmission and associated deaths. This study aimed to identify epidemiological patterns in the evolution of the pandemic among Brazilian federal units (states). The proposed methodology combines non-hierarchical k-means clustering with dynamic time warping (DTW), which measures distances between time series, and then uses the DTW Barycenter Averaging (DBA) algorithm to compute cluster centroids for time series of variable lengths. The dataset consists of time series of the number of new cases and deaths per epidemiological week, and of the cumulative number of cases and deaths up to each epidemiological week, for each of the 27 Brazilian federal units. Six groups of Brazilian federal units were formed based on the similarities between the prevalence and incidence curves. The results indicate that the method efficiently characterizes both COVID-19 cases and mortality rates.
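To make the combination of k-means, DTW, and DBA concrete, the following is a minimal Python sketch and not the authors' implementation (the reference list points to R packages for the actual analysis). It assumes a squared-difference local cost, an unconstrained warping window, initialization from the first k series, and a single DBA refinement pass per k-means iteration. The names `dtw_path`, `dba_update`, and `kmeans_dba` are hypothetical and introduced here only for illustration.

```python
from math import inf


def dtw_path(a, b):
    """Return (cost, path) for the DTW alignment of sequences a and b.

    Assumption: the local cost is the squared difference, and no
    warping-window constraint is applied. The sequences may differ
    in length. The path is a list of (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the end of both sequences to recover the alignment.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (D[i - 1][j - 1], i - 1, j - 1),
            (D[i - 1][j], i - 1, j),
            (D[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return D[n][m], path


def dba_update(centroid, members):
    """One DBA pass: align every member to the centroid with DTW, then
    replace each centroid point by the mean of the values aligned to it."""
    buckets = [[] for _ in centroid]
    for s in members:
        _, path = dtw_path(centroid, s)
        for ci, si in path:
            buckets[ci].append(s[si])
    return [sum(b) / len(b) for b in buckets]


def kmeans_dba(series_list, k, n_iter=10):
    """k-means with DTW as the distance and DBA as the centroid update.

    Assumption: the first k series serve as initial centroids.
    """
    centroids = [list(s) for s in series_list[:k]]
    labels = [0] * len(series_list)
    for _ in range(n_iter):
        # Assignment step: nearest centroid under DTW.
        labels = [
            min(range(k), key=lambda c: dtw_path(centroids[c], s)[0])
            for s in series_list
        ]
        # Update step: refine each non-empty cluster's centroid with DBA.
        for c in range(k):
            members = [s for s, lab in zip(series_list, labels) if lab == c]
            if members:
                centroids[c] = dba_update(centroids[c], members)
    return labels, centroids


# Toy weekly curves of different lengths (illustrative values, not study data).
toy = [
    [0, 1, 2, 1, 0],
    [5, 6, 7, 6, 5],
    [0, 0, 1, 2, 1, 0],
    [5, 6, 6, 7, 6, 5],
]
labels, centroids = kmeans_dba(toy, k=2)
```

In this toy run the two low curves share one label and the two high curves share the other, even though the series within each pair have different lengths. That tolerance to length and timing differences is the property that motivates using DTW and DBA instead of a pointwise Euclidean mean.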

Acknowledgements

This article is the result of part of the project entitled “Avaliação dos efeitos das desigualdades sociais na pandemia COVID-19 em país de baixa e média renda” (Assessment of the effects of social inequalities during the COVID-19 pandemic in low- and middle-income countries), carried out by the Center for Integration of Data and Knowledge for Health, linked to the Gonçalo Moniz Institute, Oswaldo Cruz Foundation (CIDACS/IGM-FIOCRUZ). This project was funded by Health Data Research United Kingdom (HDR-UK) and received support from the Bill and Melinda Gates Foundation.

Funding

This project was funded by Health Data Research United Kingdom (HDR-UK) and received support from the Bill and Melinda Gates Foundation.

Author information

Contributions

All authors played essential roles throughout the development of this work, actively contributing from the initial planning phase of the study to the completion of the article’s writing. Jonatas Silva do Espirito Santo and Jackson Santos da Conceição were responsible for creating the figures, in collaboration with Lilia Carolina Carneiro da Costa, Rosemeire Leovigildo Fiaccone, and Anderson Ara, who drafted the manuscript. The critical review of the article and final approval were conducted by Maria Yury Ichihara, Marcos Ennes Barreto, and all other authors.

Corresponding author

Correspondence to Jonatas Silva do Espirito Santo.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

About this article

Cite this article

do Espirito Santo, J.S., da Conceição, J.S., da Costa, L.C.C. et al. K-means DTW Barycenter Averaging: a clustering analysis of COVID-19 cases and deaths on the Brazilian federal units. Int J Data Sci Anal (2024). https://doi.org/10.1007/s41060-024-00542-9

  • DOI: https://doi.org/10.1007/s41060-024-00542-9
