Abstract
This paper evaluates the impact of three standardization methods (z-score, log10 and improved min–max) on determining the number of clusters in a dataset of 146 archaeological ceramic fragments whose elemental mass fractions were determined by instrumental neutron activation analysis (INAA). The standardized data showed a tendency towards clustering that was not observed in the non-standardized data. The standardization methods indicated the presence of three groups within the dataset. Evaluation of the quality of these clusters by means of internal validation indexes showed that the best performance was obtained with the log10 transformation. This transformation also performed well in terms of compactness, while the improved min–max performed better in terms of separability.
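As a rough illustration of the kind of column-wise standardization compared in the abstract, the sketch below applies a z-score, a log10 transformation and a min–max rescaling to a single variable. It is a minimal sketch under stated assumptions: the function names and the sample values are hypothetical, and because the abstract does not give the formula of the improved min–max, the plain min–max rescaling to [0, 1] is shown as a stand-in rather than the authors' method.

```python
import math
import statistics


def z_score(column):
    """Center on the mean and scale by the sample standard deviation."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(x - mean) / sd for x in column]


def log10_transform(column):
    """Base-10 logarithm; requires strictly positive values."""
    return [math.log10(x) for x in column]


def min_max(column):
    """Plain min-max rescaling to [0, 1].

    Assumption: this is the textbook form, NOT the paper's improved variant,
    whose formula is not stated in the abstract.
    """
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical mass fractions for one element (illustrative values only).
sample = [10.0, 100.0, 1000.0]

z = z_score(sample)
logged = log10_transform(sample)
scaled = min_max(sample)
```

Each function works on one variable at a time, so in a multi-element dataset it would be applied to every element column separately before any clustering step.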
Cite this article
Nogueira, A.L., Munita, C.S. Quantitative methods of standardization in cluster analysis: finding groups in data. J Radioanal Nucl Chem 325, 719–724 (2020). https://doi.org/10.1007/s10967-020-07186-6
DOI: https://doi.org/10.1007/s10967-020-07186-6