Quantitative methods of standardization in cluster analysis: finding groups in data

Journal of Radioanalytical and Nuclear Chemistry

Abstract

The aim of this paper is to evaluate the impact of three standardization methods (z-score, log10 and improved min–max) on determining the number of clusters in a dataset of 146 archaeological ceramic fragments, in which the mass fractions of chemical elements were determined by instrumental neutron activation analysis (INAA). The standardized data showed a tendency towards clustering that was not observed in the non-standardized data. All three standardization methods indicated the presence of three groups in the dataset. Evaluation of the quality of these clusters by means of internal validation indexes showed that the best overall performance was obtained with the log10 transformation. This transformation also performed well in terms of compactness, while the improved min–max method performed better in terms of separability.
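The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' procedure: the data are synthetic, the function names are hypothetical, plain min–max rescaling stands in for the "improved min–max" method (which the abstract does not define), a basic k-means is used as the clustering step, and the mean silhouette width is used as one example of an internal validation index.

```python
import numpy as np

def z_score(X):
    """Center each column to mean 0 and scale it to unit sample standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def log10_transform(X):
    """Base-10 logarithm; assumes strictly positive mass fractions."""
    return np.log10(X)

def min_max(X):
    """Plain min-max rescaling to [0, 1] (a stand-in, NOT the paper's improved variant)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

def kmeans(X, k, n_iter=100, seed=0):
    """Basic Lloyd-style k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster becomes empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def silhouette(X, labels):
    """Mean silhouette width (Rousseeuw 1987) using Euclidean distances."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:          # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = dist[i, same].sum() / (same.sum() - 1)   # mean distance within own cluster
        b = min(dist[i, labels == c].mean()          # mean distance to nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Synthetic "mass fractions": three groups, three elements on very different scales.
rng = np.random.default_rng(42)
medians = np.array([[10.0, 200.0, 3.0], [40.0, 50.0, 9.0], [90.0, 500.0, 1.0]])
X = np.vstack([m * rng.lognormal(0.0, 0.1, size=(20, 3)) for m in medians])

methods = {"raw": lambda A: A, "z-score": z_score,
           "log10": log10_transform, "min-max": min_max}
results = {}
for name, transform in methods.items():
    Xt = transform(X)
    results[name] = silhouette(Xt, kmeans(Xt, k=3))

for name, score in results.items():
    print(f"{name:8s} mean silhouette = {score:.3f}")
```

The scores printed here depend entirely on the synthetic data and the random seed, so they say nothing about the ceramic dataset; the sketch only shows how the same clustering and validation steps can be repeated under different standardizations.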



Corresponding author

Correspondence to Casimiro S. Munita.

About this article

Cite this article

Nogueira, A.L., Munita, C.S. Quantitative methods of standardization in cluster analysis: finding groups in data. J Radioanal Nucl Chem 325, 719–724 (2020). https://doi.org/10.1007/s10967-020-07186-6

  • DOI: https://doi.org/10.1007/s10967-020-07186-6
