1 Introduction

Depositional environments encompass the interplay of physical, chemical, and biological processes that govern sediment deposition [1, 2]. Fluvial environments, shaped by river systems, exhibit inherent dynamism reflected in variations in flow velocity, sediment load, and channel morphology [3]. This dynamism leaves distinct signatures within fluvial sedimentary archives, making them invaluable for deciphering sedimentary processes and reconstructing past depositional environments [4].

Classic approaches, including lithofacies analysis and grain size distributions (GSDs), provide a foundation for characterizing transport and depositional processes. However, lithofacies analysis, which classifies rocks by composition, texture, and structure [5], can be limited by equifinality: similar lithofacies can be formed by different depositional processes. For instance, cross-bedded sandstone could form within a braided river channel or a crevasse splay [3, 6].

To refine interpretations, sedimentologists integrate lithofacies associations (i.e., a recurring set of lithofacies formed within a distinct depositional environment or sub-environment) with complementary techniques [7]. Paleocurrent analysis reveals fluvial flow patterns [5], while architectural element analysis aids in deciphering channel forms [3]. These combined approaches help overcome equifinality and mitigate the risk of oversimplifying complex depositional systems. Traditionally, lithofacies associations are identified through qualitative analysis. However, complex, multivariate datasets associated with lithofacies characteristics can obscure associations and make qualitative assessments subject to interpretation bias.

GSDs are fundamental for characterizing the transport and depositional processes of fluvial systems, and their polymodal nature reflects the mix of these processes [8,9,10]. Although traditional methods such as log-normal distributions and probability density functions (PDFs) aid in GSD analysis, they can oversimplify multimodal distributions or yield inaccurate statistical parameters [11, 12]. Principal component analysis (PCA) offers a powerful approach to enhance the identification and interpretation of lithofacies associations. By reducing the dimensionality of sedimentological datasets, PCA helps reveal patterns that might be obscured in traditional qualitative characterization. Furthermore, PCA has the potential to refine the analysis of GSDs within lithofacies, aiding in differentiating depositional processes. When samples are plotted on their leading principal components (PCs), clusters can be observed that represent potential lithofacies associations. However, applying PCA directly to PDFs or log-normal coefficients masks distinctions in the original GSD, while standard PCA is hindered by the compositional constraints of GSDs [13, 14].

Several studies [10, 12, 15, 16] demonstrate the effectiveness of compositional data analysis (CoDA) in addressing the limitations of traditional GSD analysis in fluvial environments. CoDA, through techniques such as the centered log-ratio (clr) transformation, overcomes the closure constraint inherent in GSDs. This allows objective interpretation using multivariate statistical analysis [12] and a more comprehensive analysis of the entire GSD than traditional methods permit [10]. CoDA thus provides a more objective framework for interpreting GSDs and characterizing depositional processes in fluvial settings.

1.1 Compositional data and grain size distribution (GSD)

Compositional data represent the relative proportions or percentages of the parts that make up a whole [17]; the sum of all components is fixed at 100% (a sum constraint). The values of one component therefore depend on the values of the others, which results in spurious correlations. All these properties apply to grain size data. First, GSDs contain relative information about the proportions of distinct size fractions in a sample; consequently, any change to one proportion of the distribution induces a reciprocal change in all remaining proportions. Second, grain size fractions are presented as percentages, that is, nonnegative values that sum to a constant (100%).
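The effect of the sum constraint can be illustrated numerically. The sketch below is not part of the study: it uses synthetic, independently generated masses for three hypothetical size fractions and shows that merely converting them to percentages introduces negative correlations that were absent in the raw values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, independent raw masses (g) of three size fractions in 500 samples.
raw = rng.lognormal(mean=0.0, sigma=0.5, size=(500, 3))

# Closure: convert each sample to percentages that sum to 100.
closed = 100.0 * raw / raw.sum(axis=1, keepdims=True)


def mean_offdiag_corr(x):
    """Mean of the off-diagonal entries of the correlation matrix of x."""
    corr = np.corrcoef(x, rowvar=False)
    off = corr[~np.eye(corr.shape[0], dtype=bool)]
    return off.mean()


raw_corr = mean_offdiag_corr(raw)
closed_corr = mean_offdiag_corr(closed)
print(f"mean correlation, raw masses:  {raw_corr:+.2f}")
print(f"mean correlation, percentages: {closed_corr:+.2f}")
```

With independent inputs the raw correlations scatter around zero, whereas the closed percentages are pushed toward negative values simply because the parts must sum to a constant.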

PCA assumes that the variables are independent, an assumption that the sum constraint violates. PCA can therefore amplify spurious correlations and produce misleading results. PCA also assumes that the variables are measured on the same scale, whereas compositional data may be expressed on different scales (e.g., weight percentages), which biases the results. Finally, PCA of raw compositional data can be difficult to interpret, as the principal components may not correspond to sedimentary processes. To overcome these problems, the CoDA approach [18] transforms the compositional data into a log-ratio space where PCA can be applied.

CoDA is based on sub-compositional coherence [19]. In a multimodal GSD, each grain size fraction or mode is a sub-composition. Log-ratios are the logarithms of the ratio of two parts of a composition [20]. For instance, if a composition has p parts (grain size fractions), the log-ratio transformation operates in a (p–1)-dimensional space [18, 21]. Log-ratios remove the compositional constraint of GSDs, allowing a comprehensive characterization of the GSDs through unconstrained PCA. A prerequisite for this method is that all data values are greater than zero.
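As a minimal sketch of the centered log-ratio idea, the function below takes the logarithm of each part and subtracts the mean of the logs, which is equivalent to dividing each part by the geometric mean of the composition. The function name `clr` and the three-part sample are illustrative assumptions, not the implementation used in the study; how zeros are replaced before the transform is left to the analyst.

```python
import numpy as np


def clr(composition):
    """Centered log-ratio transform of one or more compositions (one per row)."""
    x = np.asarray(composition, dtype=float)
    if np.any(x <= 0):
        raise ValueError("clr requires strictly positive parts; handle zeros first")
    log_x = np.log(x)
    # Subtracting the row mean of the logs equals dividing by the geometric mean.
    return log_x - log_x.mean(axis=-1, keepdims=True)


# Hypothetical sample: gravel, sand, and mud fractions in percent.
sample = np.array([10.0, 70.0, 20.0])
z = clr(sample)
print(z)
```

The transformed values sum to zero for every sample, and the result is the same whether the fractions are given as percentages or as proportions, since only ratios between parts enter the calculation.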

This study presents a methodology that integrates compositional data analysis (CoDA) and principal component analysis (PCA) to enhance the characterization of depositional processes within the Shendi Formation. The approach helps overcome the limitations of qualitative analysis by applying the centered log-ratio (clr) transformation and PCA to identify patterns within the grain size distributions (GSDs) of the lithofacies, improving the interpretability of GSDs and potentially revealing links between principal components and depositional processes. Moreover, the study investigates the correlations between PCA-derived lithofacies associations and environmental factors to potentially distinguish the fluvial sub-environments.

1.2 Geological setting of the Shendi Formation

In central Sudan, the Upper Shendi Formation is a key geological unit that defines the sedimentary landscape of the Shendi-Atbara basins and shapes the region into plateaus and mesas crowned by a flat regional planation surface [22]. The formation is marked by ferruginous sandstones and oolitic ironstones [23] and is topped by the Hudi Chert Formation and the Wadi Awataib Conglomerate [24]. The surrounding terrain is marked by Quaternary and older wadi and eolian sediments, punctuated by Pleistocene river terraces along the Nile [22, 25]. The Shendi Formation consists of heterogeneous clastic sedimentary rocks (pebble conglomerates, intraformational conglomerates, ferruginous quartzose sandstones, mudstones, and oolitic ironstones) [24, 26] deposited in a fluviatile environment within a graben structure [27]. The depositional environment was characterized by arid and torrid climatic conditions, marked by gypsum and halite indicative of evaporitic processes. Correlations with the Albian-Cenomanian Omdurman and Wadi Milk Formations suggest phases of humid and warm climate, supported by the presence of wood fossils and vertebrate remains [28,29,30]. These studies offer solely qualitative insights, and the fluvial sub-environments, lithological characteristics, and climatic conditions of the Shendi Formation have yet to be investigated quantitatively.

2 Material and methods

2.1 Field and laboratory measurement

Vertical sedimentary sections of the Shendi Formation (Fig. 1) were described in the Bagrawiyah-Umm Ali and Al Musawwarat regions using lithological parameters such as mineralogical composition, texture, and physical sedimentary structures [31] (Fig. 2). Samples were collected from lower to upper stratigraphic levels, preferably from sandstone beds. Lithofacies were defined based on textural and structural characteristics and named following Miall’s facies code [32]; facies were grouped into facies associations based on their distribution pattern. Representative facies samples were subjected to mechanical size analysis to determine their grain size distributions (GSDs). Approximately 40 g of each sample was treated with hydrogen peroxide (30% w/v) to remove organic matter and disaggregate any cementing materials. Samples were then immersed in water, gently disaggregated using a rubber-tipped pestle in a porcelain mortar (to minimize grain breakage), and agitated using a mechanical shaker for 48 h. Following agitation, samples were oven-dried at a low temperature (40°C) to prevent alteration of clay minerals. Dry sieving was conducted using a calibrated sieve shaker with a sieve stack ranging from −1 to 4 phi (covering gravel to fine sand fractions). Sieving duration was standardized to 15 min per sample to ensure consistent results. The weight retained on each sieve was measured and recorded for subsequent analysis. Statistical parameters (mean size, sorting, skewness, kurtosis) were determined following the graphic method [33] using GRADISTAT v. 9.1 [34].
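For readers who wish to reproduce graphic statistics outside GRADISTAT, the sketch below assumes that the graphic method corresponds to the Folk and Ward logarithmic graphic measures, computed from phi percentiles of a cumulative percent-coarser curve. This is an assumption made for illustration; the function name `folk_ward` and the synthetic curve are hypothetical and do not represent the study data.

```python
from math import erf, sqrt

import numpy as np


def folk_ward(phi, cum_coarser_pct):
    """Graphic mean, sorting, skewness, and kurtosis (phi units) from a cumulative curve."""
    phi = np.asarray(phi, dtype=float)
    cum = np.asarray(cum_coarser_pct, dtype=float)
    if np.any(np.diff(cum) <= 0):
        raise ValueError("cumulative percentages must be strictly increasing")
    if cum[0] > 5 or cum[-1] < 95:
        raise ValueError("curve must span the 5th to 95th percentiles")
    # Linear interpolation of the phi value at each required percentile.
    p = {q: float(np.interp(q, cum, phi)) for q in (5, 16, 25, 50, 75, 84, 95)}
    mean = (p[16] + p[50] + p[84]) / 3
    sorting = (p[84] - p[16]) / 4 + (p[95] - p[5]) / 6.6
    skewness = ((p[16] + p[84] - 2 * p[50]) / (2 * (p[84] - p[16]))
                + (p[5] + p[95] - 2 * p[50]) / (2 * (p[95] - p[5])))
    kurtosis = (p[95] - p[5]) / (2.44 * (p[75] - p[25]))
    return mean, sorting, skewness, kurtosis


# Synthetic check: a normal distribution in phi space (mean 2 phi, sd 1 phi).
phi_grid = np.arange(-2.0, 6.0001, 0.05)
cum_pct = np.array([50.0 * (1.0 + erf((v - 2.0) / sqrt(2.0))) for v in phi_grid])
mean, sorting, skewness, kurtosis = folk_ward(phi_grid, cum_pct)
print(mean, sorting, skewness, kurtosis)
```

For a sieve stack that covers only part of the distribution, the tail percentiles cannot be interpolated and the function raises an error rather than extrapolating.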

Fig. 1

Geological sketch of the Shendi Formation in the central Sudan vicinity, with the geographical location of the studied sedimentary sections (modified after GRAS 2005)

Fig. 2

A compilation of the representative vertical sedimentary profiles of the Bagrawiyah-Umm Ali and Al Musawwarat regions of the Shendi Basin. The profiles depict various lithofacies, indicated by grain size and sedimentary structures. Gm = massive conglomerate, St = trough cross-bedded sandstone, Sp = planar cross-bedded sandstone, Sh = horizontally bedded sandstone, Sm = massive sandstone, Sr = rippled sandstone, Sl = laminated sandstone, Fm = massive mudstone, and Fl = laminated silt and clay stones

2.2 Log ratio transformation and principal component analysis

Compositional data represent proportions of a whole and are inherently constrained to a simplex rather than Euclidean sample space [18]. This constraint complicates the direct application of PCA and CA. The statistical parameters of the GSD, D5, D95, Md, and the grain size fractions were subjected to clr transformation following [18] before PCA, which was conducted in Past 4.03 [35]. The clr transformation establishes a one-to-one mapping between the simplex and Euclidean space while retaining interpretable geometric relationships of the data [36], hence facilitating the application of PCA and CA and simplifying the interpretation of results [37]. PCA was performed following a standard procedure described in many texts [38,39,40]. First, the data were standardized by centering each variable on its mean and scaling it to unit variance, and the covariance matrix was computed to capture the variances and covariances among variables. The eigenvalues and eigenvectors of the covariance matrix were then calculated to identify the principal components (PCs), with the eigenvectors representing the directions of maximum variance and the eigenvalues indicating the magnitude of variance along them. Eigenvalues were sorted in descending order, and eigenvectors were ranked accordingly to prioritize the PCs that explain the most variance. Each PC represents a specific proportion of the total variance in the GSD data; PC1 captures the largest portion, and each subsequent component captures the largest share of the remaining variance. The resulting PCs helped interpret sediment transport processes, depositional environments, and the key variables driving lithofacies differentiation.
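The PCA steps described above (standardization, covariance matrix, eigen-decomposition, ranking) can be sketched with NumPy as follows. This is an illustrative reconstruction on synthetic data, not the Past 4.03 routine used in the study; the function name `pca_eig` and the demo matrix are assumptions.

```python
import numpy as np


def pca_eig(data):
    """PCA by eigen-decomposition of the covariance matrix of standardized data."""
    x = np.asarray(data, dtype=float)
    # Center each variable on its mean and scale it to unit variance.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    cov = np.cov(z, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = z @ eigvecs  # sample coordinates on the PCs
    explained = eigvals / eigvals.sum()  # proportion of variance per PC
    return eigvals, eigvecs, scores, explained


# Synthetic demo: 30 samples, 5 variables, two of them correlated.
rng = np.random.default_rng(1)
demo = rng.normal(size=(30, 5))
demo[:, 1] += 0.8 * demo[:, 0]
eigvals, eigvecs, scores, explained = pca_eig(demo)
print(np.round(100 * explained, 1))
```

Because the data are standardized first, the covariance matrix equals the correlation matrix and the eigenvalues sum to the number of variables, which is the setting in which an eigenvalue threshold of one is meaningful.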

Determining an appropriate number of PCs for subsequent analyses is a challenge, as there is no universally accepted criterion for this decision [41]. The selection is often influenced by the specific objectives of the application and the relevant domain knowledge. One commonly used method is Kaiser’s rule [38], which suggests retaining only those PCs whose eigenvalues exceed one. In this study, we combined Kaiser’s criterion (Fig. 3) with a scree plot and the explained variance percentages (Table 1), further supported by our understanding of the depositional history of the studied region. This led to the selection of the first three PCs, which have eigenvalues greater than 1 and cumulatively account for ~90.1% of the total variance in the dataset.
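The selection logic can be sketched as follows: count the eigenvalues above the Kaiser threshold and report the cumulative explained variance alongside. The eigenvalues below are hypothetical values chosen for illustration and are not those of Table 1; `select_components` is an assumed helper name.

```python
import numpy as np


def select_components(eigenvalues, threshold=1.0):
    """Number of PCs above the threshold, with per-PC and cumulative variance (%)."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    explained_pct = 100.0 * ev / ev.sum()
    cumulative_pct = np.cumsum(explained_pct)
    n_keep = int(np.sum(ev > threshold))
    return n_keep, explained_pct, cumulative_pct


# Hypothetical eigenvalues for illustration only; not the values of Table 1.
eigenvalues = [3.2, 1.5, 1.1, 0.6, 0.4, 0.2]
n_keep, explained_pct, cumulative_pct = select_components(eigenvalues)
print(n_keep, np.round(cumulative_pct, 1))
```

In practice the count returned by the threshold is a starting point that is then checked against the scree plot and against geological plausibility, as described above.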

Fig. 3

Scree plot illustrating the eigenvalues of principal components derived from the dataset. The red line indicates Kaiser’s criterion, which suggests retaining principal components with eigenvalues greater than 1

Table 1 Summary of principal component analysis results

2.3 The CM diagram

The CM diagram [42] is used to establish the relationships between sediment textures and depositional processes. Passega [42] defined M and C as the median and the one-percentile of the cumulative GSD, respectively. These values can readily be obtained as the grain diameters (in mm or µm) belonging to the 50 and 95 percentiles of the cumulative distribution functions [16].
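Reading M and C off a cumulative curve amounts to interpolating the grain diameter at a chosen cumulative percentage. The sketch below interpolates on a logarithmic diameter scale and leaves the coarse percentile as a parameter, since its exact choice depends on the convention adopted; the sieve data and the helper name `percentile_diameter` are hypothetical.

```python
import numpy as np


def percentile_diameter(diam_mm, cum_finer_pct, p):
    """Grain diameter (mm) at cumulative percent finer p, interpolated on a log scale."""
    d = np.asarray(diam_mm, dtype=float)
    c = np.asarray(cum_finer_pct, dtype=float)
    if np.any(np.diff(d) <= 0) or np.any(np.diff(c) <= 0):
        raise ValueError("diameters and cumulative percentages must strictly increase")
    if not c[0] <= p <= c[-1]:
        raise ValueError("percentile lies outside the measured cumulative curve")
    return float(np.exp(np.interp(p, c, np.log(d))))


# Hypothetical sieve openings (mm) and cumulative weight percent finer.
sieve_mm = [0.063, 0.125, 0.25, 0.5, 1.0, 2.0]
cum_finer = [2.0, 10.0, 40.0, 80.0, 96.0, 100.0]

M = percentile_diameter(sieve_mm, cum_finer, 50)  # median diameter
C = percentile_diameter(sieve_mm, cum_finer, 95)  # coarse percentile (assumed here)
print(f"M = {M:.3f} mm, C = {C:.3f} mm")
```

Each sample then contributes one (M, C) point to the diagram, typically plotted on logarithmic axes.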

2.4 The workflow of the analysis

Figure 4 illustrates the systematic approach used to characterize lithofacies and transport and depositional processes. The approach begins with the decomposition of the multimodal GSD, extracting statistical parameters and grain size fractions from mud to gravel. Compositional data analysis (CoDA) is then applied to the grain size fractions, using the centered log-ratio (clr) transformation to address the data closure problem [13] so that the compositional nature of the data does not skew the analysis. The clr transformation precedes principal component analysis (PCA), which reduces the dataset’s dimensionality and highlights the principal components that capture sediment transport and depositional dynamics [10, 43]. In parallel, CM diagrams were constructed by plotting the median grain size (M) against the coarsest percentile (C) to discern sediment transport mechanisms [44]. Together, these techniques form a framework for interpreting lithofacies associations and the sediment transport and depositional processes within the studied geological context.

Fig. 4

Flowchart of the analysis methodology

3 Results and discussion

3.1 Lithofacies analysis

The analyzed deposits consist predominantly of sandstone lithofacies, while conglomerate and mudstone lithofacies are subordinate (Figs. 2 and 5). Most deposits are intensely iron-cemented and reddish to dark brownish to almost black. Ten lithofacies were defined (Table 2; Fig. 6): matrix-supported massive conglomerate (Gmm), planar cross-stratified conglomerate (Gp), trough cross-bedded sandstone (St), planar cross-bedded sandstone (Sp), horizontally bedded sandstone (Sh), massive sandstone (Sm), rippled sandstone (Sr), laminated sandstone (Sl), massive mudstone (Fm), and laminated silt and clay stones (Fl).

Fig. 5

A The relative abundance of the ten lithofacies reported in the studied sections. B Textural classification of the samples in the studied sections, based on [46]; upper-case letters indicate largest proportion; lower-case indicate qualifiers; brackets indicate ‘slightly’, e.g. (g)mS is slightly gravelly, muddy sand

Table 2 Summary of the lithofacies types reported in the studied sections of the Shendi Formation in Al Musawwarat and Bagrawiyah-Umm Ali regions

Fig. 6

Collection of representative field photographs showing various lithofacies. Annotations indicate the lithofacies code adopted from [47]. Trough cross-bedded sandstone (St) in 2 and 3, horizontally bedded sandstone (Sh) in 4, planar cross-bedded sandstone (Sp) in 5, ripple laminated sandstone (Sr) in 6, massive mudstone (Fm) in 7, and laminated sandstone (Sl) and laminated siltstone and mudstone (Fl) in 8

3.2 CoDA, PCA, and Lithofacies association

PCA applied to the clr-transformed grain size data yielded three principal components (PC1 to PC3) (Fig. 7) that explained ~91.86% of the data variance. These three PCs represent three distinct depositional processes: the dominant process is indicated by PC1, which explains ~57.3% of the variance; the less dominant process by PC2, which accounts for ~19.2%; and the least dominant process by PC3, which explains ~15.3%. PC1 showed a significant positive correlation with the fine fractions (i.e., very fine and fine sand) and a moderate positive correlation with clay and medium sand (Fig. 7A). PC2 correlates significantly with the clay fraction (Fig. 7B), whereas PC3 correlates with the gravel and medium sand fractions (Fig. 7C).

Fig. 7

A–C Correlation of PC1, PC2, and PC3 with the grain size fractions (clr = centered log-ratio transformed)

As the grain size fractions that correlate positively with PC1, PC2, and PC3 each correspond to a different group of lithofacies, these PCs indicate three different lithofacies associations (LFA1-3) and, in turn, depositional processes (Fig. 8). LFA1 was dominated by channel bar lithofacies, LFA2 by channel lag/channel base lithofacies, and LFA3 by overbank lithofacies. LFA1 and LFA2 had relatively high abundance in the Al Musawwarat area, and LFA3 dominates in the Bagrawiyah-Umm Ali area (Fig. 5A and B). The fluvial channel facies association is primarily made up of conglomerate lag deposits, massive conglomerate (Gm), matrix-supported massive conglomerate (Gmm), planar cross-stratified gravel (Gp), trough cross-bedded sandstone (St), stratified sandstones (Sh), planar cross-bedded sandstone (Sp), massive sandstone (Sm), and ripple laminated sandstone (Sr). LFA3 deposits are composed of structureless mudstone (Fs), laminated mudstone (Fl), and horizontally laminated sandstone (Sl). Subsequently, three depositional processes can be described.

Fig. 8

The CM diagram illustrates the relations between the lithofacies associations (LFA1-LFA3) and their corresponding depositional processes, modified after [48]. The graph plots the median grain size (M) against the coarsest percentile (C), with data points color-coded by lithofacies association. The light grey shape represents the depositional processes when the C and M percentiles are expressed in phi (ɸ), while the smaller dark grey shape applies when C and M are expressed in mm, as in this instance

The first process mainly influenced the sand fractions and is manifested in LFA1 with the St, Sp, Sh, Sm, and Sr facies. The sand grains of these lithofacies are transported as bedload by traction currents or by intermittent suspension (saltation). High bedload sedimentation rates contribute to the formation of channel bars, which subsequently obstruct the downstream flow, inducing erosion upstream. Concurrently, downstream areas, characterized by reduced flow velocity and the presence of eddies, become sediment sinks. This dynamic equilibrium between erosion on the upstream side of the bars and deposition on the downstream side facilitates the longitudinal migration of the bars over time. Sediment transport over the bars gives rise to secondary bedforms, including planar beds, dunes, and ripples.

Two-dimensional (2D) dunes occur at low flow speeds, and their migration gives rise to planar cross-bedded facies (Sp). Three-dimensional (3D) dunes have curved lee faces and oval scour pits at the lee foot, producing trough cross-bedded facies (St) [32, 49, 50]. Sh facies indicate deposition under upper plane bed conditions, at the transition from the subcritical to the supercritical flow regime [32], and the recurrence of thin Sh units in the studied sections indicates recurring dynamic events (i.e., flash floods). Coarse-grained Sh may form if the flow is not competent to form coarse-grained dunes; however, it is rarely preserved owing to the erosion of coarse-grained sediments by subsequent flow [32, 51]. Sr facies are produced by the migration of climbing ripples in the lower flow regime [52,53,54]. The Sm facies are compositionally identical to the structured facies and may develop through the entrainment of sandy floodplain material [55].

The second process (LFA2) primarily influenced the clay fraction, manifested in the Fm and Fl facies. Fine-grained sediments (i.e., mud, silt, and very fine sand) are transported as suspended load in streams and deposited on floodplains. The Fm facies are suggested to form under low flow conditions as deposits of standing water [32, 56]. Fl may be deposited from suspension and weak traction currents. The relatively low occurrence of Fm facies in most of the Al Musawwarat sections can be attributed to poor preservation conditions in braided river environments, where frequent channel migration triggers rapid reworking of Fm facies bodies and gives rise to intraformational conglomerate facies.

The third process (LFA3) influenced the gravel fractions, represented by the Gmm and Gt facies. The unchannelized basal gravel facies can be transported and deposited in cohesive bedload mode during high flow events [24, 45, 49, 51]. The gravel facies with erosive bases can be associated with channel bar formation [57]. Erosion of the riverbed by the dynamic flow scours the bottom surface, which characterizes the gravel facies and coarse St; gravel and coarser sediment are left behind as the flow becomes incompetent to transport them further, forming channel lag deposits.

Our findings on the Shendi Formation align with and broaden the interpretations of Cretaceous sandstones in North Africa, which show diverse fluvial lithofacies characteristics. In Sudan, the Gedaref, Omdurman, and Bentiu Formations encompass sandstones with various structures representing a braided depositional environment [58,59,60]. Similarly, the Lower Cretaceous Nubian Formation in Egypt shows fine- to very fine-grained subarkosic arenites [61]. In Libya, the Cretaceous sandstone displays features of braided and meandering river lithofacies, indicating alternating braided and high-sinuosity patterns in the fluvial channels [62, 63]. These studies highlight the variability in lithofacies and depositional environments of Cretaceous sandstones across the region, emphasizing the importance of detailed sedimentological characterization for refining the understanding of the Cretaceous fluvial system. While previous studies characterized the fluvial depositional environments in terms of varying channel dynamics and overbank processes, the integrated CoDA-PCA approach applied to the Shendi Formation represents an advancement. By quantifying subtle variations within GSDs, we were able to determine the optimal number of lithofacies associations and relate them to specific depositional processes (channel bar, overbank, and high-energy flow). This interpretation contrasts with more generalized identification of facies associations and enables the identification of more nuanced sub-environments. Integrating CoDA-PCA into future studies of similar formations could further enhance our understanding of regional fluvial system dynamics.

Although the integration of CoDA and PCA helped quantify the lithofacies characterization within the Shendi Formation, it is important to acknowledge limitations that may affect the interpretation and generalization of the findings. First, the samples, while strategically chosen, may not fully capture the complex variability of a large fluvial system. Further studies should incorporate more extensive sampling and consider the potential differential preservation of the lithofacies. Second, while PCA uncovers distinct patterns, direct assignment of the main PCs to specific depositional processes requires caution, and sedimentary criteria are essential to validate the PCs and their interpretations. It is also important to consider the influence of grain size sorting, post-depositional changes, and diagenetic processes, as well as the possibility of overlapping signatures from multiple processes; these factors may require additional analyses (e.g., paleocurrent data, provenance studies) to consolidate the interpretations. Finally, although the lithofacies associations were valuable in characterizing the depositional processes, caution is required when extending these findings beyond the study area, as variations in geological settings and depositional environments in other regions can result in different associations; thoughtful interpretation and validation against additional data are therefore crucial.

4 Conclusion

  • This study demonstrates the efficacy of integrating compositional data analysis (CoDA) with principal component analysis (PCA) to enhance the characterization of depositional processes within the fluvial Shendi Formation. By combining the clr transformation with PCA, this method overcomes the limitations of traditional approaches that rely on the qualitative identification of lithofacies associations. The PCA revealed three principal components, from which three different lithofacies associations were identified, supporting the identification of the depositional processes of the Shendi Formation, including those related to channel bar dynamics, overbank deposition, and high-energy flow events.

  • The findings of this research illustrate the potential of the combined CoDA-PCA approach for enhancing the interpretation of fluvial depositional systems. Its ability to discern variations within grain size distribution datasets offers a powerful tool for further investigations of fluvial formations.

  • The deposition of the Shendi Formation took place within a dynamic fluvial environment characterized by variations in flow energy, channel migration, and periodic overbank flooding. The dominant sandstone lithofacies suggest active bedload transport, while the finer-grained facies reflect periods of lower energy deposition.

  • The insights gained from this study, along with the efficacy of the integrated CoDA-PCA approach, have the potential to contribute to the understanding of fluvial depositional processes within other geological formations.