Accelerated Sequential Data Clustering

  • Original Research
Journal of Classification

Abstract

Data clustering is an important task in data mining. In many real applications, clustering algorithms must respect the order of the data, which gives rise to the problem of clustering sequential data. For instance, analyzing the movement pattern of an object and detecting community structure in a complex network both involve sequential data clustering. The constraint that clusters form continuous regions prevents earlier clustering algorithms from being applied directly to the problem. A dynamic programming algorithm was previously proposed to address this issue, and it returns the optimal sequential data clustering. However, it does not scale, which limits its practicality. This paper revisits that solution and enhances it with a greedy stopping condition, which halts the algorithm’s search when the optimal solution has likely been found. Experimental results on multiple datasets show that the resulting algorithm is much faster than the original solution while the optimality gap is negligible.
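
To make the idea concrete, the following is a minimal sketch and not the authors' algorithm. It assumes one-dimensional data and a sum-of-squared-errors objective, and it splits a sequence into k contiguous clusters by dynamic programming. The `patience` parameter is a hypothetical stand-in for a greedy stopping condition: each scan over candidate split points stops after that many consecutive non-improving candidates. The paper's actual condition is not given in this preview, and all function and parameter names here are illustrative.

```python
from itertools import accumulate
from math import inf


def segment_sse(prefix, prefix_sq, j, i):
    """Squared error of values[j:i] around its mean, in O(1) via prefix sums."""
    total = prefix[i] - prefix[j]
    return (prefix_sq[i] - prefix_sq[j]) - total * total / (i - j)


def sequential_clusters(values, k, patience=None):
    """Split `values` into k contiguous clusters with low total squared error.

    patience=None scans every split point (exact dynamic program).
    patience=p stops a scan after p consecutive non-improving split points
    (hypothetical greedy rule, assumed for illustration; may miss the optimum).
    Returns (total_sse, start_indices_of_clusters).
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")
    prefix = [0.0, *accumulate(values)]
    prefix_sq = [0.0, *accumulate(v * v for v in values)]

    # best[m][i]: lowest cost found for values[:i] split into m clusters.
    # split[m][i]: start index of the last cluster in that solution.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    for i in range(1, n + 1):
        best[1][i] = segment_sse(prefix, prefix_sq, 0, i)

    for m in range(2, k + 1):
        for i in range(m, n + 1):
            misses = 0
            # The last cluster is values[j:i]; grow it leftwards from one point.
            for j in range(i - 1, m - 2, -1):
                cost = best[m - 1][j] + segment_sse(prefix, prefix_sq, j, i)
                if cost < best[m][i]:
                    best[m][i], split[m][i] = cost, j
                    misses = 0
                else:
                    misses += 1
                    if patience is not None and misses >= patience:
                        break  # greedy stop: assume the best split was seen

    # Walk the stored split points back from the end of the sequence.
    starts, i = [], n
    for m in range(k, 0, -1):
        i = split[m][i]
        starts.append(i)
    return best[k][n], starts[::-1]


data = [1, 1, 1, 10, 10, 10, 20, 20, 20]
exact = sequential_clusters(data, 3)
greedy = sequential_clusters(data, 3, patience=2)
```

With `patience=None` the inner loop visits every split point, so the work grows quadratically with the sequence length for each additional cluster. A small `patience` shortens each scan at the risk of a suboptimal split, which is the kind of speed versus optimality trade-off the abstract describes.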

Data Availability

The datasets analyzed during the current study are available from PhysioNet: the Abdominal and Direct Fetal ECG Database (https://physionet.org/content/adfecgdb/1.0.0/), the Brno University of Technology ECG Signal Database (https://physionet.org/content/but-pdb/1.0.0/), the MIT-BIH Arrhythmia Database (https://physionet.org/content/mitdb/1.0.0/), and the QT Database (https://physionet.org/content/qtdb/1.0.0/).

Notes

  1. This can be achieved by subtracting \(\text{Mean}(X)\) from each data point \(x_i\) in \(X\).

  2. Usage: pip install accelerated-sequence-clustering.

  3. https://physionet.org/content/adfecgdb/1.0.0/

  4. https://physionet.org/content/but-pdb/1.0.0/

  5. https://physionet.org/content/mitdb/1.0.0/

  6. https://physionet.org/content/qtdb/1.0.0/

  7. https://www.math.uwaterloo.ca/tsp/world/grpoints.html

Author information

Contributions

All authors contributed to the study’s conception and design, material preparation, data collection, and analysis. The first draft of the manuscript was written by Reza Mortazavi, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Reza Mortazavi.

Ethics declarations

Ethical Conduct

Ethical conduct is not applicable for this article.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mortazavi, R., Enayati, E. & Basiri, A. Accelerated Sequential Data Clustering. J Classif (2024). https://doi.org/10.1007/s00357-024-09472-4

  • DOI: https://doi.org/10.1007/s00357-024-09472-4
