Permafrost-thaw lake development in Central Yakutia: sedimentary ancient DNA and element analyses from a Holocene sediment record

In Central Yakutia (Siberia) livelihoods of local communities depend on alaas (thermokarst depression) landscapes and the lakes within. Development and dynamics of these alaas lakes are closely connected to climate change, permafrost thawing, catchment conditions, and land use. To reconstruct lake development throughout the Holocene we analyze sedimentary ancient DNA (sedaDNA) and biogeochemistry from a sediment core from Lake Satagay, spanning the last c. 10,800 calibrated years before present (cal yrs BP). SedaDNA of diatoms and macrophytes and microfossil diatom analysis reveal lake formation earlier than 10,700 cal yrs BP. The sedaDNA approach detected 42 amplicon sequence variants (ASVs) of diatom taxa, one ASV of Eustigmatophyceae (Nannochloropsis), and 12 ASVs of macrophytes. We relate diatom and macrophyte community changes to climate-driven shifts in water level and mineral and organic input, which result in variable water conductivity, in-lake productivity, and sediment deposition. We detect a higher lake level and water conductivity in the Early Holocene (c. 10,700–7000 cal yrs BP) compared to other periods, supported by the dominance of Stephanodiscus sp. and Stuckenia pectinata. Further climate warming towards the Mid-Holocene (7000–4700 cal yrs BP) led to a shallowing of Lake Satagay, an increase of the submerged macrophyte Ceratophyllum, and a decline of planktonic diatoms. In the Late Holocene (c. 4700 cal yrs BP–present) stable shallow water conditions are confirmed by small fragilarioid and staurosiroid diatoms dominating the lake. Lake Satagay has not yet reached the final stage of alaas development, but satellite imagery shows an intensification of anthropogenic land use, which in combination with future warming will likely result in a rapid desiccation of the lake.


Introduction
Alaases are thermokarst depressions with meadows (Soloviev 1959) and residual thaw lakes (alaas lakes) which constitute the typical alaas landscape (Bosikov 1991;Crate et al. 2017). Approximately 16,000 alaases are located in the Central Yakutian lowland, covering a total area of 4410 km 2 , corresponding to approximately 17-30% of the total land area of the region (Bosikov 1991). Early Holocene warming in Central Yakutia (Biskaborn et al. 2021) led to thermokarst formation due to degradation of the ice wedge complex (Bosikov 1991;Séjourné et al. 2015). Alaases develop in a weakly sloping catchment as closed or semi-closed thermokarst depressions on top of permafrost. The formation of alaases has three main stages: dyede, tyympy, and a residual lake in the final alaas stage (Bosikov 1991). Dyede means "rounded lake with low but steep banks, including centered polygons in the bottom". As a term, it was introduced by Soloviev (1959) to characterize the initial stage of alaas development. Tyympy is the stage of lake enlargement when underground ice melt results in a drastic increase in the volume of water, and a flattening of the bottom of the basin by sedimentary deposits (Bosikov 1991). In the transition from dyede to tyympy stages, the formation of a depression with lower elevation occurs. The main source of water for alaas lakes is snowmelt in spring and summer precipitation. In the transition of tyympy to the final alaas stage, after partial lake water evaporation, a meadow vegetation appears on their banks. These grass-covered alaas basins and the availability of freshwater in remnant lakes make them ideal settlement areas in the harsh continental climate of Central Yakutia. Alaas lakes are the only sources of water supply for 1/5 of the population of Yakutia (Desyatkin 2004). Development of alaases is a natural process (Soloviev 1959;Bosikov 1991) but intensified agriculture can eventually cause the irreversible loss of alaas lake systems (Desyatkin et al. 2021). Freshwater availability is already a serious concern in Central Yakutia .
It is important to understand the response of alaas lakes and their ecological condition to a warming climate in the past in order to infer their potential development under continued warming and intensified human usage. The composition of aquatic communities, derived from diatoms and macrophytes, in combination with biogeochemical proxies, derived from X-ray fluorescence (XRF) data, are valuable indicators to reconstruct aquatic diversity changes throughout time and the influence of catchment-related environmental factors.
Important bioindicators of lake-water quality are diatoms (class Bacillariophyceae) and macrophytes (Zhao et al. 2021). Diatoms are one of the most commonly used bioindicators and serve as key indicator organisms in boreal regions (Palagushkina et al. 2012). Diatom assemblages have been shown to be sensitive to hydrochemical parameters, water depth, nutrients, and catchment vegetation around the lake, but also indirectly to atmospheric changes such as air temperature and precipitation . Microfossil records of diatoms have been used to reconstruct past diatom compositions in Yakutian lake sediments explored by light-microscopy (Pestryakova et al. 2012;Biskaborn et al. 2016) and sedimentary ancient DNA (sedaDNA) metabarcoding for northern lakes (Stoof-Leichsenring et al. 2020). Although submergent macrophytes are valuable indicators of water and nutrient level fluctuations, the sparse fossil record limits investigations on past macrophyte communities, while sedaDNA suitably archives macrophyte DNA from arctic and alpine areas (Alsos et al. 2018;) which facilitates paleo-reconstructions. Thus, both diatom and macrophyte sedaDNA records are promising proxies for Central Yakutian lake development.
XRF-derived element ratios and carbon and nitrogen geochemistry are valuable proxies for sedimentary characteristics and biogeochemical properties in thermokarst-lake sediments helping to decipher catchment-related changes (Bouchard et al. 2011;Biskaborn et al. 2013Biskaborn et al. , 2019Vyse et al. 2020).
In this study we examine an alaas lake's development through time, exemplified by a sediment core from Lake Satagay, Central Yakutia. We use sedaDNA proxies, targeting diatoms and macrophytes, and XRF data, to reconstruct alaas lake development during the last 10,800 years. We aim to answer the following research questions: (1) How did the aquatic communities of Lake Satagay, exemplified by diatoms and macrophytes, change during the Early, Mid, and Late Holocene? (2) How did catchment-related changes affect lake development and aquatic communities? (3) How is the development of an alaas lake connected to external factors?

Study area
Lake Satagay (N 63.07816°, E 117.99806°) is located in Central Yakutia (Sakha Republic), East Siberia, on the Lena-Viluy terrace, within the catchment of the River Viluy, 805 km west from Yakutsk (Fig. 1a). Permafrost in this region is continuous, with a thickness of c. 400 to 700 m (Fedorov et al. 2018). The climate is characterized as extremely continental with warm summers and very cold, dry winters with a mean annual air temperature ranging between − 11.5 and − 7.5 °C (Fedorov et al. 2018).
In its current state, Lake Satagay is a shallow thermokarst lake within an alaas system (Fig. 1b). There are numerous other alaas landforms close by, with some of them currently used for agriculture (haymaking and cattle and horse pastures). The estimated total area of the alaas system including the lake, wetland, and surrounding grassland is c. 2.5 km 2 . Lake surface area is c. 1.7 km 2 , estimated from an ESRI © satellite image.

Fieldwork
A sediment core and surface water samples were collected during the Expedition "Chukotka-Yakutia 2018", which was organized and conducted jointly by the Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research (AWI) and the North-Eastern Federal University (NEFU) (Kruse et al. 2019). In early August 2018, the lake's maximum depth was 1.8 m, with a low water transparency of 0.56 m, and high dissolved organic carbon (DOC) concentration of 29.9 mg/L. Sediment core Fig. 1 Lake Satagay study site. a Overview map of eastern Siberia, with evergreen and deciduous needle-leaf forest classification (merged together from forest classes of the European Space Agency (ESA) Climate Change Initiative (CCI) Landcover) and the position of the Lake Satagay indicated (black star). b Zoomed in map of the Lake Satagay with the position of the core indicated (white star). Background satellite image source: ESRI, Maxar, GeoEye, Earthstar Geographics, CNES/ Airbus DS, USDA, USGS, AeroGRID, IGN, and the GIS User Community EN18224-4, with a diameter of 6 cm and a length of 121 cm, was retrieved using an UWITEC gravity corer with hammer modification from the deepest measured point of the lake (Fig. 1b). Surface water was analyzed in situ with a WTW Multi 340i for pH, conductivity, and oxygen values. Samples from surface water were passed through a cellulose-acetate filter (0.45 µm pore size), and hydrochemical analyses were performed at the laboratories at AWI Potsdam, Germany. Water-chemistry parameters from field measurements together with laboratory results revealed alkaline water with moderate specific conductivity (Table S1).

Sediment core subsampling
Subsampling of the core was performed in a cleaned and cooled chamber at the German Research Centre for Geosciences (GFZ), Potsdam. Core EN18224-4 lacks visible lamination, mostly being a homogeneous, brown color, with only one layer where a brighter shade is present between 42 and 36 cm depth. The core was subsampled at 1 cm intervals and sediment slices were separated into samples for the different analyses, including, for this study, sedaDNA, diatom morphology, and geochemistry.

Dating
The chronology for EN18224-4 is based on radiocarbon age dating of ten bulk sediment samples, spread equally across the core. It indicates continuous sedimentation throughout the last c. 10,800 cal yrs BP covered by the sediment core. A detailed description of the age-depth modelling approach can be found in Glückler et al. (2022). In this study, the mean value of the modelled chronologies per depth, calibrated to calendar years BP, is used. We refer to Walker et al. (2012) for the Holocene zonation, but with the integration of regional climate condition of Central Yakutia (Biskaborn et al. 2012;Ulrich et al. 2017) and the results of this study: Early Holocene (10,700-7000 cal yrs BP) Mid-Holocene (7000-4700 cal yrs BP) and Late Holocene (4700 cal yrs BP-present).

Sedimentology
The sedimentological analytics were performed in the sediment laboratory at AWI Potsdam. For measurement of total organic carbon (TOC), 111 samples, and for total nitrogen (TN) measurement, 17 samples were freeze-dried and homogenized by grinding in a planetary mill. TOC was subsequently measured via loss on ignition in all 111 samples of the sediment core, using a soliTOC cube analyzer (Elementar Analysensysteme GmbH, Germany). Total inorganic carbon (TIC) and TOC are given in weight percentage (wt%). Nitrogen was measured from the 17 freezedried samples spread across the core, using a TruMac CNS elemental analyzer (LECO, Germany). Following Meyers (2002), wt% of TOC and TN are used to calculate the C/N ratio, multiplied by the ratio of atomic weights of N-C (1.167) to yield C/N atomic ratios (TOC/TN atomic ).

XRF scanning
To get high-resolution data on elemental concentrations for the sediment core, one half was scanned using an Avaatech X-Ray Fluorescence Spectrometer (XRF) Core Scanner at the Bundesanstalt für Geowissenschaften und Rohstoffe (BGR), Berlin, Germany. Twenty-four elements were measured to obtain the chemical composition of the core. We performed a 2 mm step scan using 10 kV at 750 µA, 16 s count time without filter for light elements, and 30 kV at 1500 µA with Pd-Thick filter for heavier elements. For this study, we used nine elements (Fe, Mn, Sr, Rb, Si, Al, Ti, Br, and Zr).

Microfossil diatom preparation
Diatom slide preparation for microscopic investigations followed a standard procedure (Battarbee et al. 2001) and was conducted in a dedicated diatom laboratory at AWI Potsdam. Ten samples of 0.5-1.0 g of sediment were freeze-dried to remove all organic material. Samples were cooked at 90 °C in 30% hydrogen peroxide under a fume hood for 7-8 h. Then, hydrochloric acid was added to stop the reaction and remove remaining carbonates. Diatom slides were fixed on a glass slide using a hot plate and Naphrax resin. Diatom counting was performed using a Zeiss Axioscope A1 light microscope with a Plan-Apochromat 100 × /1.4 Oil Ph3 objective at a magnification of 1000x. A minimum of 500 diatom valves were counted for each sample (Pestryakova et al. 2012). Diatom-valve concentration was estimated using a formula from Pestryakova et al. (2016). The number of diatom valves in 1 g of sediment equals the quotient of bcde divided by fgh, where b is dilution; c is the number of rows in the cover slips at 100 × magnification; d is the number of fields in a row; e is the number of counted valves, and divisors are: f dry sediment; g drop volume (Battarbee cup), and h number of viewed fields of view. Diatom identification was supported by using established literature (Krammer andLange-Bertalot, 1986-1991;Lange-Bertalot 1993;Lange-Bertalot and Metzeltin 1996), and an online database was used for synonyms (http:// www. algae base. org). Habitat classification is based on Pestryakova et al. (2012Pestryakova et al. ( , 2016Pestryakova et al. ( , 2018.

SedaDNA approach
SedaDNA was obtained from 65 core sediment samples. DNA extractions were performed following Epp et al. (2019) using either the Dneasy PowerSoil DNA Isolation Kit (Qiagen, Germany) with 250 mg sediment input, or the Dneasy PowerSoil Max DNA Isolation Kit (Qiagen, Germany) with about 1.0-2.4 g sediment input. Samples were extracted twice to realize a total sediment input of 500 mg. A few samples needed to be extracted four times (4 × 250 mg) to gain enough DNA for polymerase chain reaction (PCR) amplification. For each extraction batch one blank was added to control for putative contamination in the DNA extraction chemicals.
PCRs for diatom and plant metabarcoding were run in three replicates along with No Template Controls (NTCs) to control chemical contamination of PCR chemicals. PCR products were amplified in a Biometra ThermoCycler (Biometra, Germany) at initial denaturation at 95 °C for 5 min, followed by PCR cycles with denaturation at 95 °C for 30 s, annealing at 50 °C for 30 s and elongation at 68 °C for 30 s. For the amplification of diatom DNA, we used 50 cycles, while 40 cycle PCRs were run for the amplification of plant DNA. Each PCR product had been checked by running gel electrophoresis in 2% agarose gel and displaying bands in UV (Jena Analytics).
Purification of PCR products for each sample was done with MinElute (Qiagen, Germany), following the manufacturer's protocol. The first pool contained 3 × 61 PCR products from plant DNA amplifications. The second pool contained 3 × 65 PCR products from diatom DNA. Pooled samples were prepared with a PCR-free library (metafast protocol) and sequenced in paired-end mode (2 × 150 bp) on an Illumina NextSeq 500 platform (Illumina Inc.) at the external sequencing service (Genesupport, Fasteris SA, Switzerland).

Sequence-data processing and filtering
After Illumina sequencing, data were processed using OBITools3 (https:// git. metab arcod ing. org/ obito ols/ obito ols3). A general description of the bioinformatic pipeline is given in Boyer et al. (2015). To align forward and reverse reads, data were combined using the "obi alignpairedend" command, and to remove unpaired reads "obi grep" was used. To allocate reads to the corresponding combinations of primer tags the command "obi ngsfilter" was used. Grouping and counting of identical sequences was performed using "obi uniq". Denoising data using "obi annotate" removed sequences with < 10 counts and < 10 length. The taxonomic assignment was conducted by using different databases. Diatom and plant EMBL reference databases were built by applying "obi ecopcr" with diatom and plant primers, respectively, on the EMBL Nucleotide Sequence Database (Release 143, April 2020). Ecopcr results were then converted with "build_ref_db" into an ecopcr database that "obi ecotag" uses for taxonomic classification.
Plant DNA sequences were matched against the EMBL Nucleotide Sequence Database and the Arctic and Boreal vascular plant and bryophytes database (Sønstebø et al. 2010;Willerslev et al. 2014;Soininen et al. 2015) for plants. Only 100% matches with either the Arctic or EMBL database were considered. If Arctic and EMBL matches both showed 100%, the taxonomic name from the Arctic database was selected. Results from the OBITools pipeline of the diatom dataset were further filtered at the level of amplicon sequence variants (ASVs). The taxonomic assignment was based on 98-100% similarity to at least one entry of the EMBL reference database. ASVs not belonging to Bacillariophyta (diatoms) were discarded. ASVs were assigned "Scientific names" (to the best possible assignment which can vary between family, genus, or species level). Final count data were resampled (https:// github. com/ Stefa nKruse/ R_ Raref action) to the minimum number of counts (n = 2050, derived from 65 samples) to normalize the dataset prior to subsequent statistical analyses. This process was repeated 100 times and the mean value calculated. The final dataset includes 422 ASVs, which are grouped into 43 unique "Scientific names".
Results from OBITools3 pipeline of the plant dataset were further filtered and aggregated at the best scientific name level. Terrestrial plants were excluded and aquatic plant data were resampled to the minimum number of counts (n = 1653, derived from 61 samples). This process was repeated 100 times and the mean value calculated. The final dataset consists of 14 unique taxa, including ten submergent plant types and two emergent macrophytes. In addition, four cyanobacterial taxa were identified with the plant metabarcoding approach and were kept in the dataset.

Statistical analyses
All statistical analyses and visualizations were completed in R v. 3.6.1 (R Core Team 2019). Elemental compositional data from XRF scanning are given in counts per second (cps). Single element values were transformed with a centered-log ratio (clr) transformation, while for ratios of elements, an additive-log ratio (alr) transformation was applied (R package "compositions"; van den Boogaart and Tolosana-Delgado 2006). Ages were extrapolated using "na. approx" (R package "stats") for the high-resolution XRF data, in order to run pairwise correlation and PCA. Two gaps in the TOC and TIC data were linearly interpolated using the "approx" function (R package "stats"). Pearson correlation analysis was performed on sedimentological and geochemical variables to test for correlation between the elements and the element ratio data with TOC, TIC, and the TOC/TN atomic ratio.
Bar plots of taxa were produced for sedaDNA and the microscopic diatom datasets using the strat. plot function from the rioja R-package (Juggins 2019). Relative abundance from resampled counts of sedaDNA diatoms and macrophytes as well as of diatoms was calculated for CONISS clustering and evaluated by the "broken stick" model in R (packages "vegan", Oksanen et al. 2015, and "rioja"; Juggins 2019). The number of zones was defined by a dispersion value higher than the random brokenstick model.
Principal component analysis (PCA) was used to illustrate the compositional change in a combined dataset, including of diatoms and macrophytes, the impact of selected sedimentological and geochemical as explanatory variables for community changes. Therefore, log-transformed relative proportions for sedaDNA data and selected clr and alr transformed XRF and square-rooted TOC and TIC values were plotted via "envfit" onto the PCA ordination calculated with the "rda" function. RDA (redundancy analysis) using the function "rda" was applied to calculate the single and unique (using the function "Condition") proportion of explained variance. Additionally, we calculated the proportion of all significant explanatory variables and by using the function "ordistep" (direction = forward selection) an automatic search that maximizes the R 2 in order to select the best explanatory variables. The function "RsquareAdj" calculates the adjusted R using a permutation approach. All steps of the ordination analyses were performed with the "vegan" R-package (Oksanen et al. 2015).

Sedimentological and geochemical temporal stages of the lake
The sediment core EN18224-4 is characterized by a stable sedimentation rate (mean of 0.14 ± 0.09 mm year −1 ), with one exception in the period between 4500 and 5500 cal yrs BP, where the sedimentation rate increases to 0.42 mm year −1 .
TOC gradually increases during the Early Holocene (from 17.7 to 39.5 wt%) and from 6000 cal yrs BP to recent it stays high (mean value 43.3 wt%; Fig. 2), except between 5000 and 4300 cal yrs BP, when TOC values abruptly decrease to 21.8 wt% (at c. 4900 cal yrs BP). TIC in the Early Holocene from c. 10,500 to 9700 cal yrs BP is relatively high (mean 0.9 wt%), then decreases gradually and stays relatively low in the Mid-Holocene (mean of 0.08 wt%). The highest TIC is between c. 4900 and 4700 cal yrs BP in the Mid-Holocene.
Values of TOC/TN atomic range from 8.4 to 14.9. In the Early Holocene, the TOC/TN atomic maximum is 14.9 (at 10,789 cal yrs BP) and the minimum is 10.8 (at c. 9000 cal yrs BP). Values are stable in the Mid-Holocene, ranging from 11.6 to 10.5. One peak occurs in the Late Holocene (13.0; c. 4900 cal yrs BP) before it abruptly decreases to 8.4 (c. 4200 cal yrs BP).
The Sr/Rb shows two peaks at c. 5100 cal yrs BP (5.1) and at c. 4740 cal yrs BP (5.3). An initial period of increase runs from c. 10,830 cal yrs BP to 8860 cal yrs BP. After this period, the Sr/ Rb of c. 4.0 gradually decreases down to 1.6 at around 6140 cal yrs BP. The ratio shifts to a second period of increase until c. 4740 cal yrs BP and then decreases until 600 cal yrs BP. The Mn/Fe increases in the Early Holocene and through the Mid-Holocene towards a maximum of − 1.6 in the Late Holocene at c. 3790 cal yrs BP. Then Mn/Fe decreases to a minimum of − 4.4 at c. 820 cal yrs BP. The Si/Al has lowest values of around 2.6 in the Early Holocene between 10,830 to 10,280 cal yrs BP. The Si/Ti is rather stable with a mean around 0.6. Relatively higher values occur only between 4970 and 4640 cal yrs BP. The Br shows a peak at c. 9870 cal yrs BP (− 0.2) and a second at c. 6960 cal yrs BP (− 0.4). The Zr is high at around 0.2 in the period between c. 10,830 and 9790 cal yrs BP, while its overall mean is − 0.36. Fig. 2 Vertical profile of selected XRF elements and ratios (centered-log ratio (clr) and additive-log ratio (alr) transformed and smoothed) and concentrations of total organic carbon (TOC (wt%)) and total inorganic carbon (TIC (wt%)) and distribution of TOC/TN atomic ratio over time. Grey dashed lines divide the core into Early, Mid, and Late Holocene.

Diatom composition derived from the sedaDNA metabarcoding approach
Our raw data comprises a total of 13,028,150 counts and 2311 ASVs. After filtering for 98-100% similarity to the reference database, 9,895,410 counts remained. Of these, 90.05% is assigned to Bacillariophyceae (diatoms; 8,991,139 counts), 9.93% to Eustigmatophyceae (Nannochloropsis; 982,918 counts), and 0.01% to Heteropediaceae (900 counts) and Kryptoperidiniaceae (86 counts), while 367 counts were sequenced from blanks and NTCs. As diatoms and Nannochloropsis are phylogenetically placed within the group of Stramenopiles, Nannochloropsis was kept. We discarded Heteropediaceae, Kryptoperidiniaceae, blanks, and NTCs in a further filtering step. The final dataset contains 9,894,057 reads (75.95% of raw total reads) and aggregates to 422 unique ASVs. Rarefaction to minimal counts (n = 2050) decreased the dataset to a total count of 129,150. After filtering and resampling, two samples were discarded. 422 ASVs were grouped into 43 taxa names, of which 42 taxa cover different levels of taxonomy within Bacillariophyceae, and one taxon of Eustigmatophyceae (Fig. 3).
CONISS clustering divided the diatom record into two zones. Zone I covers the period between 10,700 and c. 5800 cal yrs BP, and thus spans from the Early Holocene to the first half of the Mid-Holocene. Zone II stretches from the second half of the Mid-Holocene (c. 5800 cal yrs BP) through the whole Late Holocene until the present day. Diatom richness varies from 30 taxa that were detected in the Early Holocene, to 23 in the Mid-Holocene (7000-5000 cal yrs BP), and 31 taxa in the Late Holocene.
In general, Pseudostaurosira elliptica (27.8%), Stephanodiscus (23.5%), and Nannochloropsis (20.5%) are dominant within zone I, whereas unclassified Staurosira (13.5%) is subdominant. However, in the Early Holocene (10,700-c. 7600 cal yrs BP) there are some interchangeable shifts of the ascendancy between dominant and subdominant taxa. After an abrupt extinction of Stephanodiscus (no reads), Pseudostaurosira elliptica, unclassified Staurosira, Ulnaria, and Pinnularia brebissonii alternately share dominance. In zone II, a shared dominance of Pseudostaurosira elliptica and Nannochloropsis is indicated at the end of the Mid-Holocene, while unclassified Staurosira remains subdominant. Stephanodiscus ASVs indicate a relatively small occurrence in zone II. Zone II displays a drastic decrease of planktonic diatom abundance when compared to zone I.

Diatom composition derived from the microscopic approach
The number of counted diatom valves in each of ten microscopic samples ranges from 517 to 722 valves/ sample (mean = 576). Mean diatom concentration per 1 g of sediment is 1.41 × 10 6 . The highest concentration (3.59 × 10 6 per 1 g of dry sediment) is found in a Mid-Holocene sample (48 cm depth, c. 5000 cal yrs BP), whereas the lowest concentration (0.19 × 10 6 per 1 g of dry sediment) is from an Early Holocene sample (c. 9300 cal yrs BP). Diatom-valve concentration per 1 g of dry sediment is higher in the Mid-Holocene (mean 2.83 × 10 6 ) compared to the Late Holocene (0.69 × 10 6 ) and the Early Holocene (1.28 × 10 6 ). In total, 38 diatom taxa were detected, including seven taxa identified to genus level and 31 to species level (Fig. 4). Taxonomically, the final fossil diatom dataset includes 16 families, 23 genera, and 31 species.
The most prominent species are epiphytic (84% of the whole dataset): Staurosira venter (corresponding to 27%), Pseudostaurosira brevistrata (26%), and S. construens (13%). The most abundant planktonic diatom, Fragilaria capucina, comprises 8% of the whole dataset, while other representatives such as Aulacaseira ambigua, Cyclotella radiosa, Fig. 4 Relative abundance (%) of diatom taxa in the assemblages gained from morphological identification. Total species richness and valve concentration are given on the right-hand side before the Holocene zonation for the core (EH: Early Holocene, MH: Mid-Holocene, LH: Late Holocene). Blue bars indicate planktonic diatoms, green bars indicate epiphytic diatoms, and brown bars indicate benthic diatoms. Note the different scales of relative abundance and Stephanodiscus hantzschii, comprise only 2% of the dataset. Benthic species comprise only 6% of the whole dataset. Stephanodiscus hantzschii was detected only in samples from the early and Mid-Holocene.
A CONISS clustering on aquatic plants and cyanobacteria data separates the core into four zones (Fig. 5). Zone I (10,700-9300 cal yrs BP) is dominated by Potamogeton perfoliatus and Stuckenia pectinata as well as the occurrence of cyanobacteria Planktothrix and Dolichospermum crissum. Zone II (c. 9300 to 7200 cal yrs BP) is characterized by Potamogeton, Stuckenia pectinata, and Ceratophyllum demersum, which is the predominant type from 7200 cal yrs BP up to recent times. In zone III (c. 7200-2700 cal yrs BP) only occasionally are Potamogeton, Menyanthes and Nymphaeaceae found in the aquatic plant community. In zone IV (c. 2700 cal yrs BP to present), in addition to the dominant Ceratophyllum type, Myriophyllum sibiricum and Potamogeton praelongus reach higher abundances.

Diatom and macrophyte communities and its relationship with sedimentological and geochemical data
The PCA biplot of the first and second principal components (PC) explain 31.5% and 14.3%, respectively, of the variance in the combined sedaDNA dataset of diatoms, including Nannochloropsis and macrophytes (Fig. 6). The ordination separates Mid-Holocene from Early and Late Holocene samples. Variables of environmental data show high association of Mn/ Fe, TOC and TIC along the PC1 axis and Zr along the PC2 axis. A dominance of Ceratophyllum demersum is associated with high Mn/Fe ratios, whereas a  Table S2.

Discussion
Lake deepening in the early Holocene (10,700-7000 cal yrs BP) We infer that the thermokarst processes leading to the formation of Lake Satagay were likely initiated in the late glacial and that the lake already existed at the beginning of the Early Holocene. This is supported by the presence of typically cold-adapted freshwater diatoms and submergent aquatic plants detected in the oldest sample (10,700 cal yrs BP) of the sediment core (Fig. 3). In addition, the presence of algal cells in non-pollen palynomorphs also suggests freshwater conditions already in the Early Holocene . This is in accordance with other paleolimnological studies from the Viluy region, which infer that thermokarst development in Central Yakutia began either during the Allerød oscillation (c. 13,500-12,500 cal yrs BP; Andreev 2000), or since the late glacial (c. 12,000 cal yrs BP; Crate et al. 2017). Given the existence of pioneering small benthic fragilarioid taxa, the lake was shallow at the very beginning of the Early Holocene and would thus freeze to the bottom. Similar conditions are seen from the initial formation of other thermokarst lakes in the region (Biskaborn et al. 2012). Biskaborn et al. (2012) suggest that a combination of small benthic fragilarioid taxa together with low TOC and low diatom concentration is indicative of a colder lake environment with extended seasonal ice cover. Lake Satagay's existence and shallow depth in the Early Fig. 6 Biplot showing the results of the principal component analysis (PCA) performed on the most dominant sedaDNA diatom taxa and Nannochloropsis, indicating their relation to physicochemical variables within Holocene zones (numbers next to sample points refer to their age in cal yrs BP) Holocene (c. 10,700-10,300 cal yrs BP) are further indicated by a positive correlation (r = 0.55; Figure  S3) between TOC and Br such as for the glacial Lake Ilirney in Chukotka (Vyse et al. 2020) both suggesting low productivity. In addition, relatively low Si/ Al ratios indicate low productivity (Vyse et al. 2020), with relatively high detrital input indicated by Zr values (Biskaborn et al. 2021). These conditions of low productivity and relatively high detrital input fit well to the initial dyede stage of alaas lake development (Bosikov 1991;Fedorov 2022).
Our data suggest deepening of Lake Satagay during the Early Holocene (c. 10,300-10,000 cal yrs BP). At this time, a deepening in the dyede stage is possible as the upper layer of the ice complex thaws with high potential for thaw subsidence . We assume that the high values of the Sr/Rb ratio detected in the Lake Satagay record during this time could indicate terrestrial input of larger grainsize particles (Biskaborn et al. , 2021. The main evidence for a deeper lake during this time is given by the dominant planktonic diatom ASVs of Stephanodiscus and Aulacoseira granulata that occurred in highest abundance in the Early Holocene (Fig. 3). Stephanodiscus hantzschii, for example, has been verified as an indicator species for alaas lakes and is known to occur in magnesium-rich and highly mineralized waters (Pestryakova et al. 2012). The maximum growth rate of Stephanodiscus occurs at c. 5 ± 6 °C water temperature in nutrient-rich conditions under ice (van Donk & Kilham 1990). The occurrence of planktonic and under-ice taxa indicate that Lake Satagay was thus deep enough not to freeze down to the bottom completely, as is known to happen to shallow thermokarst lakes during long and cold winters (Pestryakova et al. 2012). Concurrent with Stephanodiscus ASVs, are ASVs of the planktonic microalgae Nannochloropsis. Nannochloropsis limnetica, for example, is known from freshwater/brackish habitats (Krienitz et al. 2000) and can survive under ice (Fietz et al. 2005)

and outcompete
Stephanodiscus (Marzetz et al. 2017). However, the morphological dataset does not fully support the findings of the genetic dataset. In particular, Stephanodiscus hantzschii is not the dominant taxon in the Early Holocene samples; Pseudostaurosira brevistrata is instead. Further, the submergent macrophyte Potamogeton perfoliatus, which grows at a water level optimum between 2.5 and 3 m, supports the inference of deeper water conditions in Lake Satagay during the Early Holocene.
More evidence of a prevailing input of terrestrial plant material is provided by the relatively high TOC/TN atomic ratio in this period (Meyers and Teranes 2002;Biskaborn et al. 2013). A pollen-based vegetation reconstruction by Glückler et al. (2022) reveals open woodlands with extensive grasslands, while charcoal particles reveal increased wildfire disturbances until 8500 cal yrs BP. Potentially, any post-fire thermokarst activity may have contributed to this increase of erosional input (Jones et al. 2015). In addition, the occurrence of warmer periods known from the detection of Pinus pollen in lake sediment cores from Central Yakutia at about 9600 cal yrs BP (Andreev et al. 1992) likely accelerated thermokarst processes in the Early Holocene and Lake Satagay seems to have shifted to a deeper tyympy stage lake after c. 10,000 cal yrs BP until 8000 cal yrs BP.
With rising temperatures after c. 8000 cal yrs BP (Andreev et al. 1992) the lake was likely exposed to intensified evaporation in dry summers. A putative lake shallowing at the end of the Early Holocene along with a high specific conductivity regime in the lake are supported by the presence of the submergent Stuckenia pectinata. This taxon grows in shallow water (0.2-2.0 m; Kipriyanova et al. 2017) and forms large and dense shrubs in brackish lakes with high levels of water conductivity (Herzschuh et al. 2010).
High lake productivity in the Mid-Holocene (7000-4700 cal yrs BP) Mid-Holocene warming was marked by the establishment of modern-like taiga vegetation after c. 7000 cal yrs BP (Müller et al. 2009). During the Holocene Thermal Maximum (HTM) in Central Yakutia (c. 7000-5000 cal yrs BP) an acceleration of thermokarst processes is assumed . Similar warmer temperatures are reported from Lake Temje between c. 6700 and 4800 cal yrs BP (Nazarova et al. 2013). Warmer HTM temperatures most likely led to a reduction in the ice-covered periods of Lake Satagay, as supported by higher values of the commonly used proxy for lacustrine redox conditions Mn/Fe and increased lake productivity indicated by Br (Vyse et al. 2020).
In the transition from the Early Holocene to the Mid-Holocene both diatom approaches reveal a compositional change. In the metabarcoding data, Cymbellaceae, Epithemia, and Navicula radiosa occur. High abundances of Stauroneis phoenicentron and Navicula radiosa, also seen in the microscopic approach, are indicative of an assumed intermediate lake specific conductivity (100-500 µS cm −1 ; Pestryakova et al. 2018) throughout this period. We assume the lower specific conductivity and lake shallowing resulted in a significant loss of Stephanodiscus and its displacement by the highly productive phytoplankton Nannochloropsis (Krienitz et al. 2000;Marzetz et al. 2005) at the beginning of the Mid-Holocene. A scarce occurrence of Stephanodiscus has also been attributed to lake shallowing in another study from Sysy-Kyuele, a thermokarst lake in northern Yakutia (Biskaborn et al. 2012). Also, the increase of larger epiphytic diatoms, such as Gomphonema, Ulnaria, and Fragilaria dilatata is known from another Mid-Holocene microfossil record from Lake Boguda (Pestryakova et al. 2012). The detection of larger diatoms, peak values of biogenic silica, Si/Ti and Si/Al ratios, and total valve concentration (2.8 × 10 6 g −1 ) can be indicative of a generally high primary productivity. High primary productivity of diatoms is thus likely to increase the sedimentation rate which peaks between c. 5500 and 4500 cal yrs BP in Lake Satagay. Similarly, biological diversification and increased bioproduction is indicated from Lake Sysy Kyuele at around 5700 cal yrs BP (Biskaborn et al. 2012).
The diversity of macrophytes is dominated by Ceratophyllum demersum during the Mid-Holocene. Interestingly, Ceratophyllum demersum is known to reduce water mixing and increase water clarity through allelopathy (Gross et al. 2003). This phenomenon is characterized by the excretion of substances inhibiting algae growth, which might have caused the abrupt disappearance of Nannochloropsis between c. 6000 and 5000 cal yrs BP. Also, algae among the nonpollen palynomorph data decrease after c. 5100 cal yrs BP , while pine trees spread rapidly during that time, forming a dense forest which increases water consumption and limits water supply to the lake, causing continuous lake shallowing. Furthermore, forest cover leads to a high allochthonous organic input to the lake, leading to higher DOC concentrations than e.g. in tundra environments (Duff et al. 1998;Dulias et al. 2017). High DOC in the lake considerably reduces water transparency that impacts the diatom composition (Pienitz and Smol 1993;Saulnier-Talbot et al. 2003;Huang et al. 2020).
Towards the end of the Mid-Holocene there is a pronounced drop of TOC at c. 4900 cal yrs BP, which could be connected to a high sedimentation rate ). An increase in sedimentation rate could be caused by the erosion of an activated thaw slump during activity triggered by permafrost thaw (Sejourne et al. 2015). Thaw slumping occurred in the tyympy stage may lead to the transition from this stage to the next, because it leads to the filling of a lake basin with sediment. This gradually transforms any remaining steep shoreline of the tyympy into the gentle slopes of the resulting alaas basin. Our results show a peak of TIC and Sr/Rb erosional input proxies indicating lake-basin infilling with sediments and therefore further shallowing of the lake. Moreover, in shallow lakes the limited water depth is another important factor regulating diatom composition (Dulias et al. 2017). Shallow lakes are prone to freeze in winter (Pestryakova et al. 2012). Diatom composition in this extreme condition is mentioned above and in Biskaborn et al. (2012). Indeed, the low depth could have caused the significant decrease of planktonic diatoms and Nannochloropsis after c. 4700 cal yrs BP.
Stabilization of a shallow lake system in the Late Holocene (4700 cal yrs BP-present) The Late Holocene is characterized by a general cooling of the climate. Paleolimnological data from Central Yakutia indicate a decline in temperature after about 4700 cal yrs BP (Nazarova et al. 2013). Our data indicate an overall lower lake productivity compared to the Mid-Holocene. The overall reduction of thermokarst processes due to this cooling trend and the presence of macrophytes may be responsible for the oligotrophic water conditions during the Late Holocene. In addition, previous lake shallowing in the warmer Mid-Holocene and cooler conditions in the Late Holocene, which potentially allowed the lake to freeze down to the bottom, resulted in a change from planktonic to smaller benthic diatoms.
Similar findings of an abrupt decrease in planktonic diatoms, such as Aulacoseira italica (c. 4200 cal yrs BP), are recorded from another lake named Satagay in the Viluysky District by microscopic diatom inspection (Pestryakova et al. 2012).
Both approaches of diatom analyses in our study reveal that there were similar lake conditions at the beginning of the Early Holocene and in the Late Holocene, as shown by the occurrence of smaller fragilarioid types. These include Staurosira venter, S. construens, and Pseudostaurosira brevistrata. Small benthic fragilarioid taxa prefer oligotrophic-neutral to alkaline lakes and can withstand rather unstable environmental conditions, such as caused by annual freezing to the bottom or by flooding, which are common characteristics in permafrost-affected taiga lakes in Central Yakutia . Prevailing abundance of epiphytic diatoms in the reconstructed diatom composition of the Late Holocene may hint at decreasing lake depth and also at decreasing light availability by the increase of allochthonous DOC due to forest-cover establishment in the lake catchment.
Ceratophyllum demersum is the only macrophyte type at the beginning of the Late Holocene, warmer conditions since c. 2500 cal yrs BP facilitated the growth of Myriophyllum sibiricum and Potamogeton. This community shift is likely linked to further lake shallowing during the Late Holocene. Other study sites in Central Yakutia indicate a development towards the final alaas stage at about 6350-5870 cal yrs BP , resulting in a smaller residual lake in the deepest part of the alaas basin. To date, Lake Satagay has not yet reached the final alaas stage with alaas basin drying. The Sr/ Rb ratio, since the beginning of the Late Holocene, has decreased towards the present, suggesting reduced thermokarst erosion and thaw-slumping activity similar to processes that are described in Biskaborn et al. (2013) for a thermokarst lake in Northern Yakutia.

Anthropogenic usage of Lake Satagay
Alaas landscapes have a strong cultural relevance, and the alaas basins in particular are subjected to human activities (Pestryakova et al. 2012;Desyatkin et al. 2021). "Satagay" in the Sakha language means "unsettled" or "unresolved". On satellite images from ESRI Satellite ArcGIS/World Imagery (dated to 29/07/2017) we detected no development of grassland around Lake Satagay and thus no visible active agricultural use of the lake so far. However, the existence of several paths through the forest indicate that the lake is used, either as a drinking water source, or as a fishing and/or hunting lake. Since Lake Satagay is located in a basin, the lack of a natural outflow contributes to waterlogging, decreasing its chances of drying out and developing into a final alaas. However, the satellite imagery revealed a 1 km long canal that was dug from the eastern end of Lake Satagay alaas. Most likely this canal is used to control the water level of the neighboring Lake Mongus and could artificially cause accelerated drying of Lake Satagay. Studies from alaases in Central Yakutia, east of the Lena River (Lena-Amga interfluve), reveal that lake cessation takes c. 1500-2000 years , allowing for the higher evaporation of that region (Kumke et al. 2007).

Conclusions
The multi-proxy record from Lake Satagay exemplifies a climate-dependent thermokarst and alaas lake development in Central Yakutia since the Early Holocene. The paleolimnological reconstructions are mainly based on sedimentary ancient DNA (sedaDNA) targeting diatoms and Nannochloropsis, as well as macrophytes to characterize the turnover of aquatic communities during different climatic periods throughout the Holocene. Along with biological proxies, XRF and sedimentological data were used to infer catchment-related and lake-internal geochemical changes. Both biological and geochemical data complement each other and support the assignment of the following lake developmental stages of Lake Satagay: initial lake formation during the deglacial, lake deepening caused by rising temperatures in the Early Holocene, lake shallowing and high productivity during the Mid-Holocene, and stable shallow conditions during the cooler Late Holocene. The lake developmental stages coincide with known alaas stages and suggest that Lake Satagay has not yet reached the final stage of alaas lake development.
Rising temperatures, recent anthropogenic impact and artificial draining of the lake might, however, accelerate desiccation. Eventually, this may lead to the loss of Lake Satagay and thus an important freshwater source for the local population, while being unable to serve as an agricultural alaas for haymaking and livestock farming. Since rapidly changing environmental conditions currently affect the whole boreal zone, many more alaas lakes may be experiencing similar developments. To sustain these landscapes for the local communities, it is important to understand long-term lake development in similar lake systems to allow better future management of these unique lake types.