Abstract
When dealing with spectral images, i.e. images in which each pixel is characterized by a full spectrum, standard segmentation methods such as k-means or hierarchical clustering may be either inapplicable or inappropriate; one reason is the multi-GB size of such data sets, which makes computations very expensive. In the present contribution, we propose an approach to spectral image segmentation that combines hierarchical clustering and spatial constraints. On the one hand, spatial constraints make it possible to implement an algorithm that obtains a segmentation in a reasonable computation time and with a certain level of robustness with respect to the signal-to-noise ratio, since the prior knowledge injected by the spatial constraint partially compensates for the increase in noise level. On the other hand, hierarchical clustering provides a statistically sound and well-known framework in which the instrument noise model can be accurately incorporated. In terms of applications, this segmentation problem is encountered particularly in the study of ancient materials, which benefits from the wealth of information provided by the acquisition of spectral images. In the last few years, data collection has been considerably accelerated, enabling the characterization of samples with a high dynamic range in both the spatial dimensions and the composition, and leading to an average size of a single data set in the tens of GB range. Hence we also considered computational and memory complexity when developing the proposed algorithm. Within this application domain, we illustrate the proposed algorithm on an X-ray fluorescence spectral image collected on a ca. 100 Myr old fossil fish, as well as on simulated data to assess the sensitivity of the results to the noise level. For such experiments, the lower sensitivity to noise simultaneously leads to an increase in the spatial definition of the collected spectral image, thanks to the faster acquisition time, and to a reduction in the potentially harmful radiation dose density to which the samples are subjected.
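To make the idea of combining agglomerative clustering with a spatial constraint concrete, the following is a minimal Python sketch and not the authors' implementation (which, according to the contributions statement, was written in R). It assumes a tiny image stored as a NumPy array of shape (height, width, channels), merges only 4-connected neighbouring regions, and uses a hypothetical \(\chi ^2\)-type dissimilarity between region mean spectra; the function names `chi2_dissimilarity` and `constrained_clustering` are illustrative. The search for the best pair is deliberately naive and would not scale to multi-GB data, and the sketch keeps the spatial constraint until the end rather than switching to unconstrained merging.

```python
import numpy as np


def chi2_dissimilarity(sum_a, n_a, sum_b, n_b, eps=1e-12):
    """Chi-square-type distance between two regions (assumed form, for illustration).

    sum_a, sum_b: summed spectra of each region; n_a, n_b: pixel counts.
    """
    mean_a = sum_a / n_a
    mean_b = sum_b / n_b
    pooled = (sum_a + sum_b) / (n_a + n_b)  # pooled mean spectrum as variance proxy
    weight = n_a * n_b / (n_a + n_b)        # Ward-like size weighting
    return weight * np.sum((mean_a - mean_b) ** 2 / (pooled + eps))


def constrained_clustering(image, n_clusters):
    """Greedy agglomeration in which only 4-connected regions may merge."""
    h, w, _ = image.shape
    flat = image.reshape(h * w, -1)
    labels = np.arange(h * w).reshape(h, w)
    sums = {i: flat[i].astype(float) for i in range(h * w)}
    sizes = {i: 1 for i in range(h * w)}
    neighbours = {i: set() for i in range(h * w)}

    # Build the 4-neighbour adjacency graph of the pixel grid.
    for r in range(h):
        for c in range(w):
            i = r * w + c
            if c + 1 < w:
                neighbours[i].add(i + 1)
                neighbours[i + 1].add(i)
            if r + 1 < h:
                neighbours[i].add(i + w)
                neighbours[i + w].add(i)

    while len(sums) > n_clusters:
        # Naive search over all adjacent region pairs for the cheapest merge.
        _, a, b = min(
            (
                (chi2_dissimilarity(sums[p], sizes[p], sums[q], sizes[q]), p, q)
                for p in neighbours
                for q in neighbours[p]
                if p < q
            ),
            key=lambda t: t[0],
        )
        # Merge region b into region a and rewire the adjacency graph.
        sums[a] += sums.pop(b)
        sizes[a] += sizes.pop(b)
        for nb in neighbours.pop(b):
            neighbours[nb].discard(b)
            if nb != a:
                neighbours[nb].add(a)
                neighbours[a].add(nb)
        labels[labels == b] = a
    return labels


if __name__ == "__main__":
    # Two noise-free halves with different two-channel spectra.
    img = np.zeros((4, 4, 2))
    img[:, :2, 0] = 10.0
    img[:, 2:, 1] = 10.0
    print(constrained_clustering(img, 2))
```

Restricting candidate merges to adjacent regions is what keeps the number of pairs to evaluate small compared with unconstrained hierarchical clustering, where every pair of clusters is a candidate.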
References
Alfeld M, Janssens K. Strategies for processing mega-pixel x-ray fluorescence hyperspectral data: a case study on a version of Caravaggio’s painting Supper at Emmaus. J Anal At Spectrom. 2015;30(3):777–89.
Ambroise C, Govaert G. Convergence of an EM-type algorithm for spatial clustering. Pattern Recogn Lett. 1998;19:919–27.
Bergamaschi A, Medjoubi K, Messaoudi C, Marco S, Somogyi A. Mmx-i: data-processing software for multimodal x-ray imaging and tomography. J Synchrotron Radiat. 2016;23(3):783–94.
Bertrand L, Cotte M, Stampanoni M, Thoury M, Marone F, Schöder S. Development and trends in synchrotron studies of ancient and historical materials. Phys Rep. 2012;519(2):51–96. https://doi.org/10.1016/j.physrep.2012.03.003.
Bertrand L, Robinet L, Thoury M, Janssens K, Cohen SX, Schöder S. Cultural heritage and archaeology materials studied by synchrotron spectroscopy and imaging. Appl Phys A Mater Sci Process. 2012;106(2):377–96. https://doi.org/10.1007/s00339-011-6686-4.
Bertrand L, Thoury M, Anheim E. Ancient materials specificities for their synchrotron examination and insights into their epistemological implications. J Cult Herit. 2013;14(4):277–89.
Bertrand L, Thoury M, Gueriau P, Anheim É, Cohen S. Deciphering the chemistry of cultural heritage: Targeting material properties by coupling spectral imaging with image analysis. Accounts Chem Res. 2021. https://doi.org/10.1021/acs.accounts.1c00063.
Bonnet N, Herbin M, Vautrot P. Multivariate image analysis and segmentation in microanalysis. Scanning Microsc. 1997;11:1–21.
Breiman L, Friedman J, Stone CJ, Olshen RA. Classification and regression trees. New York: Taylor & Francis; 1984.
Calinski T, Harabasz J. A dendrite method for cluster analysis. Commun Stat. 1974;3:1–27.
Cleveland W, Grosse E, Shyu WM. Statistical models in S, chap. 8: local regression models. New York: Wadsworth & Brooks; 1992.
Davesne D, Gueriau P, Dutheil D, Bertrand L. Exceptional preservation of a cretaceous intestine provides a glimpse of the early ecological diversity of spiny-rayed fishes (acanthomorpha, teleostei). Sci Rep. 2018;8:8509.
Everitt BS, Landau S, Leese M, Stahl D. Cluster analysis. 5th ed. New York: Wiley; 2010.
Fiske LD, Katsaggelos AK, Aalders MCG, Alfeld M, Walton M, Cossairt O. A data fusion method for the delayering of x-ray fluorescence images of painted works of art. In: 2021 IEEE International Conference on Image Processing (ICIP), 2021;3458–3462. https://doi.org/10.1109/ICIP42928.2021.9506300.
Grabowski B, Masarczyk W, Głomb P, Mendys A. Automatic pigment identification from hyperspectral data. J Cult Herit. 2018;31:1–12.
Gueriau P, Bernard S, Farges F, Mocuta C, Dutheil DB, Adatte T, Bomou B, Godet M, Thiaudière D, Charbonnier S, et al. Oxidative conditions can lead to exceptional preservation through phosphatization. Geology. 2020;2:2.
Gueriau P, Jauvion C, Mocuta C. Show me your yttrium, and I will tell you who you are: implications for fossil imaging. Palaeontology. 2018;61(6):981–90.
Gueriau P, Mocuta C, Bertrand L. Cerium anomaly at microscale in fossils. Anal Chem. 2015;87(17):8827–36.
Gueriau P, Mocuta C, Dutheil D, Cohen S, Thiaudière D, Charbonnier S, Clément G, Bertrand L. Trace elemental imaging of rare earth elements discriminates tissues at microscale in flat fossils. PLoS One. 2014;9(1):e86946.
Gueriau P, Réguer S, Leclercq N, Cupello C, Brito P, Jauvion C, Morel S, Charbonnier S, Thiaudière D, Mocuta C. Visualizing mineralization processes and fossil anatomy using synchronous synchrotron X-ray fluorescence and X-ray diffraction mapping. J R Soc Interface. 2020;17(169):20200216. https://doi.org/10.1098/rsif.2020.0216.
Lance GN, Williams WT. A general theory of classificatory sorting strategies: II. Clustering systems. Comput J. 1967;10(3):271–7. https://doi.org/10.1093/comjnl/10.3.271.
Lebart L. Programme d’agrégation avec contrainte. Cahiers de L’analyse des Données. 1978;3:275–87.
Mihalić IB, Fazinić S, Barac M, Karydas AG, Migliori A, Doračić D, Desnica V, Mudronja D, Krstić D. Multivariate analysis of pixe+ xrf and pixe spectral images. J Anal At Spectrom. 2021;36(3):654–67.
Milligan G, Cooper M. An examination of procedures for determining the number of clusters in a data set. Psychometrika. 1985;50:159–79.
Pouyet E, Rohani N, Katsaggelos AK, Cossairt O, Walton M. Innovative data reduction and visualization strategy for hyperspectral imaging datasets using t-sne approach. Pure Appl Chem. 2018;90(3):493–506.
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2020). https://www.R-project.org/
Rand WM. Objective criteria for the evaluation of clustering methods. J Am Stat Assoc. 1971;66(336):846–50.
Rodriguez MA, Kotula PG, Griego JJ, Heath JE, Bauer SJ, Wesolowski DE. Multivariate statistical analysis of micro-X-ray fluorescence spectral images. Powder Diffr. 2012;27(2):108–13.
Sciutto G, Oliveri P, Prati S, Quaranta M, Bersani S, Mazzeo R. An advanced multivariate approach for processing X-ray fluorescence spectral and hyperspectral data from non-invasive in situ analyses on painted surfaces. Anal Chim Acta. 2012;752:30–8.
Solé VA, Papillon E, Cotte M, Walter P, Susini J. A multiplatform code for the analysis of energy-dispersive X-ray fluorescence spectra. Spectrochim Acta B. 2007;62:63–8.
Vekemans B, Janssens K, Vincze L, Aerts A, Adams F, Hertogen J. Automated segmentation of \(\mu\)-xrf image sets. X-Ray Spectrom. 1997;26(6):333–46.
Vogt S, Maser J, Jacobsen C. Data analysis for X-ray fluorescence imaging. J Phys IV. 2003;104:617–22.
Ward JH. Hierarchical grouping to optimize an objective function. J Am Stat Assoc. 1963;58:236–44.
Webb S. The microanalysis toolkit: X-ray fluorescence image processing software. In: AIP Conference Proceedings, vol. 1365. American Institute of Physics 2011; pp. 196–199
Acknowledgements
We thank S. Charbonnier, G. Clément, N.-E. Jalil, Didier B. Dutheil (MNHN, Paris), A. Tourani (Cadi Ayyad University, Marrakesh), P.M. Brito (Rio de Janeiro State University, Rio de Janeiro), F. Khaldoune, H. Bourget and B. Khalloufi for organizing and/or participating in the field work that collected the fossil. This field expedition to Morocco was supported by the Muséum national d’Histoire naturelle through the “ATM Biodiversité actuelle et fossile” and by UMR 7207 CR2P. We acknowledge Synchrotron SOLEIL for provision of beamtime, and C. Mocuta and D. Thiaudière for assistance at the DiffAbs beamline. The authors would also like to thank the peers who reviewed the manuscript for their constructive comments and advice, which helped us improve the presentation of our results.
Author information
Contributions
This work arose from discussions between GC, SXC and AG. SXC proposed the exploitation of spatial constraints and the use of \(\chi ^2\) as an adapted dissimilarity measure for XRF spectra. GC proposed the heuristic rule to stop applying the spatial constraint to the segmentation. AG proposed a version of the \(\chi ^2\) metric that is consistent between the spatially constrained initial steps and the unconstrained agglomerative steps, so that the Lance and Williams formulae could be used in the latter part. SXC and AG implemented the algorithm and the representations of its results in R. SXC proposed and implemented the color matching in the segmentation representation and the zero noise models. PG performed all the experimental measurements and interpretations on the fossil, and oriented the algorithm design to ensure that the results are valuable for the practitioner. All authors contributed to the writing of this manuscript.
Ethics declarations
Conflict of interest statement
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Celeux, G., Cohen, S.X., Grimaud, A. et al. Hierarchical Clustering of Spectral Images with Spatial Constraints for the Rapid Processing of Large and Heterogeneous Data Sets. SN COMPUT. SCI. 3, 194 (2022). https://doi.org/10.1007/s42979-022-01074-4
DOI: https://doi.org/10.1007/s42979-022-01074-4