Hybrid Hierarchical Clustering Algorithm Used for Large Datasets: A Pilot Study on Long-Term Sleep Data

  • Conference paper
Precision Medicine Powered by pHealth and Connected Health (ICBHI 2017)

Part of the book series: IFMBE Proceedings (IFMBE, volume 66)

Abstract

Clustering is a widely used analysis technique in modern science, where datasets are often unlabeled and contain hidden dependencies and relations between their elements. This study proposes a new hybrid hierarchical clustering method suitable for large datasets. It is based on a combination of simple, effective methods. The proposed method was tested and compared with a widely used agglomerative clustering method. Two groups of datasets were used for testing. The first group contains data derived from real biomedical data and relates to the practical problem of identifying sleep stages. The second group consists of large, artificially generated datasets. Computation time, memory consumption, and mutual information were compared.
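The abstract does not say which simple methods are combined, so the following is only an illustrative sketch of one common hybrid scheme, not the authors' algorithm. It assumes a k-means pre-partition into many small micro-clusters, followed by agglomerative merging of the micro-cluster centroids (centroid linkage), and it scores the result against known labels with mutual information. All function names, parameters, and the synthetic data are hypothetical.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means (assumed first stage); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)

    def assign():
        return [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assign()


def agglomerate(centroids, sizes, n_clusters):
    """Merge the two closest groups (centroid linkage) until n_clusters remain.

    Returns a dict mapping each micro-cluster index to a final cluster id.
    """
    groups = {i: ([i], centroids[i], sizes[i]) for i in range(len(centroids))}
    while len(groups) > n_clusters:
        a, b = min(
            ((i, j) for i in groups for j in groups if i < j),
            key=lambda ij: math.dist(groups[ij[0]][1], groups[ij[1]][1]),
        )
        (ma, ca, na), (mb, cb, nb) = groups[a], groups.pop(b)
        merged = tuple((na * x + nb * y) / (na + nb) for x, y in zip(ca, cb))
        groups[a] = (ma + mb, merged, na + nb)
    mapping = {}
    for new_id, (members, _, _) in enumerate(groups.values()):
        for m in members:
            mapping[m] = new_id
    return mapping


def hybrid_cluster(points, k_micro, n_clusters):
    """k-means micro-clusters first, then hierarchical merging of their centroids."""
    centroids, micro = kmeans(points, k_micro)
    counts = Counter(micro)
    sizes = [max(counts[c], 1) for c in range(k_micro)]  # avoid zero weights
    mapping = agglomerate(centroids, sizes, n_clusters)
    return [mapping[m] for m in micro]


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


# Hypothetical synthetic data: three well-separated 2-D Gaussian blobs.
data_rng = random.Random(1)
points, truth = [], []
for t, (cx, cy) in enumerate([(0.0, 0.0), (20.0, 20.0), (40.0, 0.0)]):
    for _ in range(300):
        points.append((data_rng.gauss(cx, 1.0), data_rng.gauss(cy, 1.0)))
        truth.append(t)

labels = hybrid_cluster(points, k_micro=30, n_clusters=3)
mi = mutual_information(truth, labels)
print(f"mutual information with the true labels: {mi:.4f} nats")
```

The expensive pairwise merging step here runs on 30 centroids rather than on all 900 points, which is the usual motivation for a hybrid scheme on large datasets. Running time and memory use of the two stages could be compared with the standard library's `time.perf_counter` and `tracemalloc`.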

Acknowledgements

This research has been supported by the project "Temporal context in analysis of long-term non-stationary multidimensional signal", registration number 17-20480S, of the Grant Agency of the Czech Republic.

Author information

Corresponding author

Correspondence to V. Gerla.

Ethics declarations

The authors declare that they have no conflict of interest.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Gerla, V., Murgas, M., Mladek, A., Saifutdinova, E., Macas, M., Lhotska, L. (2018). Hybrid Hierarchical Clustering Algorithm Used for Large Datasets: A Pilot Study on Long-Term Sleep Data. In: Maglaveras, N., Chouvarda, I., de Carvalho, P. (eds) Precision Medicine Powered by pHealth and Connected Health. ICBHI 2017. IFMBE Proceedings, vol 66. Springer, Singapore. https://doi.org/10.1007/978-981-10-7419-6_1

  • DOI: https://doi.org/10.1007/978-981-10-7419-6_1

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-7418-9

  • Online ISBN: 978-981-10-7419-6

  • eBook Packages: Engineering, Engineering (R0)
