
Dynamic clustering of interval data based on hybrid \(L_q\) distance

  • Leandro Carlos de Souza
  • Renata Maria Cardoso Rodrigues de Souza (corresponding author)
  • Getúlio José Amorim do Amaral
Regular Paper

Abstract

Dynamic clustering partitions the data into clusters and defines a prototype for each cluster. A distance measure quantifies the closeness between instances and prototypes. In the interval data literature, existing distances depend only on the interval bounds, so the information inside the intervals is ignored. This paper proposes new distances that exploit the information inside the intervals. It also presents a mapping of intervals to points that preserves their spatial location and internal variation. Building on the well-known \(L_q\) distance for point data, we formulate a new hybrid distance for interval data that admits a weighted formulation of the hybridization. Hence, we propose a Hybrid \(L_q\) distance, a Weighted Hybrid \(L_q\) distance, and an adaptive version of the Hybrid \(L_q\) distance for interval data. Experiments with synthetic and real interval data sets illustrate the usefulness of the hybrid approach for improving dynamic clustering of interval data.
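The abstract contrasts the proposed hybrid distances with conventional distances that depend only on interval bounds. As a point of reference, the following is a minimal Python sketch of that conventional baseline, not of the authors' Hybrid \(L_q\) distance (whose formula is not given here): an \(L_q\) distance computed on the lower and upper bounds, used inside a dynamic clustering loop that alternates an allocation step and a representation step. The function names `lq_bounds` and `dynamic_clustering`, the criterion (sum of \(d_q^q\) to the cluster prototype), and the restriction to q = 1 or q = 2 (where the optimal prototype has a closed form: the coordinate-wise median or mean of the bounds) are all assumptions of this sketch.

```python
import random
from statistics import mean, median

Interval = tuple[float, float]  # (lower, upper) bound of one interval variable


def lq_bounds(x: list[Interval], y: list[Interval], q: float = 2.0) -> float:
    """Bound-based L_q distance between two interval vectors of equal length.

    Only the lower and upper bounds enter the formula (hypothetical baseline,
    not the paper's hybrid distance).
    """
    total = sum(abs(a - c) ** q + abs(b - d) ** q for (a, b), (c, d) in zip(x, y))
    return total ** (1.0 / q)


def dynamic_clustering(data: list[list[Interval]], k: int, q: int = 2,
                       max_iter: int = 50, seed: int = 0):
    """Dynamic clustering sketch minimising sum of lq_bounds(x, prototype) ** q."""
    if q not in (1, 2):
        raise ValueError("this sketch only has closed-form prototypes for q = 1 or q = 2")
    rng = random.Random(seed)
    # Initial prototypes: k distinct objects drawn at random.
    prototypes = [list(x) for x in rng.sample(data, k)]
    # Mean minimises squared bound differences (q = 2); median minimises absolute ones (q = 1).
    center = mean if q == 2 else median
    labels = None
    for _ in range(max_iter):
        # Allocation step: assign each object to its nearest prototype.
        new_labels = [min(range(k), key=lambda c: lq_bounds(x, prototypes[c], q))
                      for x in data]
        if new_labels == labels:
            break  # partition is stable, so the criterion can no longer decrease
        labels = new_labels
        # Representation step: recompute each prototype bound by bound.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if not members:
                continue  # keep the previous prototype for an empty cluster
            prototypes[c] = [
                (center(m[j][0] for m in members), center(m[j][1] for m in members))
                for j in range(len(members[0]))
            ]
    return labels, prototypes


if __name__ == "__main__":
    data = [[(1.0, 2.0)], [(1.1, 2.2)], [(10.0, 12.0)], [(10.5, 11.8)]]
    labels, prototypes = dynamic_clustering(data, k=2, q=2)
    print(labels, prototypes)
```

Because only bounds enter `lq_bounds`, two intervals with identical bounds are at distance zero however their values are distributed inside them, which is the limitation the hybrid approach described above is meant to address.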

Keywords

\(L_q\) distance; Symbolic data analysis; Clustering; Data models

Notes

Acknowledgements

The authors would like to thank CNPq and CAPES (Brazilian Agencies) for their financial support.


Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  • Leandro Carlos de Souza (1)
  • Renata Maria Cardoso Rodrigues de Souza (2), corresponding author
  • Getúlio José Amorim do Amaral (3)
  1. Departamento de Computação, DC/UFERSA, Mossoró, Brazil
  2. Centro de Informática, Universidade Federal de Pernambuco, Recife, Brazil
  3. Departamento de Estatística, Centro de Ciências Exatas, Universidade Federal de Pernambuco, Recife, Brazil
