An integrative study of species distribution modelling and conservation genetics: Magnolia in Hispaniola

On the island of Hispaniola, five endemic species of Magnolia occur, all of which are threatened with extinction. Little is known about their distribution and genetic health, hampering targeted conservation actions. The objective of this study is to assess the potential distribution and the genetic health of the Magnolias of Hispaniola, to inform concrete guidelines for effective conservation management. Using species distribution modelling (SDM), we predict habitat suitability for the Magnolias of Hispaniola by analysing 21 variables, describing climate and landscape features, on 635 occurrences. We genotyped 417 individuals using 16 microsatellite markers, to test for genetic structure and degree of inbreeding. The SDM and genetic data confirm the recognition of the four studied Magnolia species. The known individuals of the three Dominican Magnolias are structured into five populations which show ample genetic diversity and little inbreeding overall. For conservation management, we propose to focus on exploration using the SDM results, and protection and reinforcement using the genetic and occurrence data. The genetic results guide prioritization of species and populations. The SDM results guide spatial prioritization. Installing and/or protecting habitat corridors between populations, starting with the two species with the lowest genetic diversity and relatively nearby populations, is recommended as a durable conservation strategy. Meanwhile, reinforcement efforts can be undertaken to artificially increase gene flow for which we appoint sink and source population pairs, using the genetic data.


Introduction
Tropical montane cloud forests (henceforth TMCFs) comprise 0.26% of the Earth's surface, yet they are home to a disproportionately high number of species (Bubb et al. 2004). This habitat is characterized by heavy rainfall, high species endemism and low resilience to disturbance (Hamilton et al. 1995). Despite their importance of harbouring high numbers of (endemic) biodiversity, TMCFs face similar threats as many other tropical forest habitats (Bubb et al. 2004). While progress on the sustainable conservation of TMCFS is being made (López- Barrera et al. 2017), many challenges still need to be overcome (Hamilton et al. 1995). For example, Wilson and Rhemtulla (2018) showed that larger and more intact cloud forests contained more tree species, but small remnant forest patches still contribute significantly to biodiversity. They concluded that the small remnant forest patches are essential building-blocks for reserve design aimed at conserving landscape-level tree diversity. Yet, rising anthropogenic pressures are altering forest species compositions and ecosystem functioning (Malhi et al. 2014). To be effective, management mitigation and restoration actions need, amongst others, information on the current and potential distribution of high-quality areas, as well as on the current state of remnant populations, information which is often unavailable (Toledo-Aceves et al. 2021).
One area that exemplifies the situation of TMCFs and the challenge to their conservation is the island of Hispaniola, the second largest Caribbean Island. It is divided in two countries, Haiti and the Dominican Republic (henceforth DR). Haiti is one of the most deforested countries on earth, with less than 1% of its original primary forests remaining (Hedges et al. 2018), yet the DR suffers likewise from forest loss and degradation (Sangermano et al. 2015a). Nonetheless, Hispaniola remains a biodiversity hotspot due to its high species endemism, housing for example about 6000 endemic plant species (Maunder et al. 2008;  (Castillo 2016), which are all threatened to some degree (Rivers et al. 2016) and consisting of only a few (known) populations each (Fig. 1). The Magnolias of Hispaniola are at great risk of extinction, with one of the most prevalent factors being habitat loss due to timber exploitation and land conversion (Rivers et al. 2016;Castillo et al. 2018;Veltjen et al. 2019). As these species reside in TMCFs, their survival is intricately linked to the preservation of this habitat. Because Magnolia is an eye-catching genus due to its beautiful flowers, aromatic leaves and ornamental use (Fig. 2), it serves as an umbrella species for conservation (Roberge and Angelstam 2004), safeguarding both other species co-occurring in the TMCFs, and the habitat itself.
To construct effective conservation strategies, several sources of information are essential: identification of the most important threats, a clear delineation of species identity, sufficient knowledge on the species' habitat requirements and their current and potential geographical distribution, and a quantification of the genetic diversity of remnant populations (Kramer and Havens 2009;IUCN 2017). Ideally, all this information is known and allows one to confidently and correctly define 'management units', i.e., groups of individuals that are clustered together for conservation management purposes to contain the genetic diversity necessary to ensure evolvability in the light of changing environments and local adaptation (Weckworth et al. 2018). While some data on the geographical distribution of the Magnolias of Hispaniola are available (Castillo et al. 2018;Veltjen et al. 2019), their potential and actual distribution remain unclear. Similar doubts exist regarding the genetic diversity and structure of the currently known populations. Veltjen et al. (2019) assessed the population structure and genetic health for all extant Caribbean Magnolia species, including four out of five Magnolia species from Hispaniola (Magnolia emarginata was not assessed as individuals of this Haitian species have only been rediscovered by Eladio M. Fernández Ensemble-based distribution map for the Dominican Republic using "vote counting". Habitat suitability ranges from zero to fifteen, representing the number of ensemble analyses that labelled a pixel as suitable (i.e., TSS-value > 0.7). Forest 2000 layer is derived from Sangermano et al. (2015b)

Fig. 3
Ensemble-based distribution map for the Magnolia species from Hispaniola and the Dominican Republic using weighted averaging. Habitat suitability ranges from zero to one, representing the weighted mean TSS-value for each pixel Table 1 Sample information of the three Magnolia species from the Dominican Republic.
Species: taxonomy according to Howard (1948). in June 2022 after being lost to science for 97 years (Nowshin 2022)). However, the Veltjen et al. (2019) study used only two localities per species, with 20 individuals genotyped for each locality, whilst much more localities and individuals are currently known. This allowed to deduct a general genetic pattern for the Caribbean Magnolias (i.e., little inbreeding, yet high genetic structuring), but species-specific recommendations could not be presented as the data were too limited. To provide clarity on the species delimitation, distribution and genetic diversity of the known remnant populations of the Magnolias of Hispaniola, we integrate in this study species distribution modelling and conservation genetics techniques. Species distribution modelling (SDM) is widely used to address questions in conservation biology, ecology, and evolution (Guisan and Thuiller 2005;Guillera-Arroita et al. 2015). It allows to predict the species' (potential) distributions and the habitat suitability, based on species occurrence data and environmental information. Conservation genetics can resolve fragmented population structure and identify species or populations at risk as a result of reduced genetic diversity (Frankham et al. 2015). Compared to Veltjen et al. (2019), we significantly expand here the sampling of individuals, number of localities and SSR markers, to more comprehensively and accurately quantify the genetic diversity of M. domingensis, M. hamorii and M. pallescens (Medina et al. 2006;Wang et al. 2021). By integrating the results of the SDM and the comprehensive conservation genetic analyses, we formulate concrete conservation actions that maximize the genetic diversity in the long term. In doing so, we strive to maintain healthy populations with enough evolvability to make them flexible and resistant to future disturbances.
We addressed the following objectives regarding the distribution and genetic health of the Magnolias of Hispaniola. Using SDM, we aimed to (1) predict suitable Magnolia habitat for conservation efforts and to guide explorations for new populations. Furthermore, using conservation genetics, we aimed to (2) assess the population structure to infer conservation units, and (3) quantify genetic diversity, enabling us to make statements about the genetic health of the populations. Finally, we integrated results of both disciplines to (4) formulate concrete conservation management actions.

Species distribution modelling
Data preparation: firstly, occurrence data needed to be rarefied over a predefined rarefying distance to minimize spatial autocorrelation. Available occurrence data are often not gathered using systematic surveying methods, but include opportunistic data and typically do not represent a random sampling across the species' range (Phillips et al. 2009). Occurrence data are thus often biased in terms of sampling effort. Without rarefaction, this bias will cause predictions to overestimate habitat suitability in environments similar to the more intensely sampled areas . Secondly, the SDM algorithms needed information about the absence of species in the region, which typically are only rarely available (Zurell 2020). To tackle this lack of reliable absence data, "pseudo-absences" (also referred to as "background data") are generated from a predefined background area (Barbet-Massin et al. 2012). The selection of a proper prevalence ratio (i.e., the number of pseudoabsences relative to the number of presences) is the subject of ongoing debate, but mainly depends on the SDM algorithms applied (Liu et al. 2019). Therefore, we tested different pseudo-absence selection strategies in accordance with Lobo et al. (2010).
Model fitting and evaluation: as algorithms have different underlying assumptions and extrapolate to unsampled environments in a different manner, it has been proposed to combine predictions across multiple modelling methods. This "ensemble modelling" enhances predictive performance and reduces uncertainty (e.g. Hao et al. 2020) in model forecasts. We built and evaluated the performance of the modelling algorithms by cross-validating the data. This model fitting and evaluation strategy divides the data in two subsets: a training set, which trains the model, and an independent test set, which tests its predictive accuracy. Finally, an ensemble of SDM models that agrees with a predefined minimum predictive accuracy was used to obtain final predictions of Magnolia habitat suitability and potential geographic distribution across the study area.
Despite our focus on Magnolias from the Dominican Republic, SDM was executed on Hispaniola as a whole (i.e., including Haiti) as delineating study areas based on ecologically meaningless criteria such as country borders risk resulting in spurious predictions of habitat suitability (Bystriakova et al. 2012).

Occurrence data and predictor variables
Occurrence data of four of the five species from Hispaniola were obtained from Veltjen (2020), Castillo et al. (2018), herbarium records (Online Resource 1) and pers. comm. of Joel Timyan (Société Audubon Haiti). Occurrence data were retained if they had a spatial resolution of ≤ 1" or ~ 0.0083333°. No occurrence data of M. emarginata were considered because there were no recent nor sufficiently precise locality data at the time of our analyses. Similarly, no occurrence data of M. domingensis from Haiti were included. This led to a presence-only dataset of 635 occurrences. This dataset was subsequently rarefied with a rarefying distance of 1 km, reducing it to 30 independent occurrences. Because a minimum of five independent occurrences per species are necessary after rarefying (Broennimann et al. 2012), we opted for an explicit "genus-level"-approach (Phillips et al. 2017;Stas et al. 2020) and performed SDM on all remaining independent occurrences.
Predictor variables were obtained from ENVIREM (Title and Bemmels 2018), CHELSA (Karger et al. 2017) and SEDAC (Venter et al. 2016(Venter et al. , 2018. The included environmental variables describe temperature, precipitation, aridity, evapotranspiration, and topography, and one variable describing cumulative human pressures on the environment. To identify strongly correlated variables, we used the "raster.cor.matrix" and "raster.cor.plot" function in the R package ENMTools (Warren et al. 2010). The Pearson correlation coefficient "r" was used to determine which variables were most strongly correlated (|r| > 0.7). Of the correlated variables, we selected the most ecophysiologically meaningful ones in accordance with Mod et al. (2016) (see Online Resource 2). All variables were obtained with a spatial resolution of 1 km 2 (~ 0.0083333°) or less.

Model fitting and evaluation
The different pseudo-absence selection strategies resulted in 15 ensemble models predicting the suitability of Hispaniola for the focal Magnolia species. Each ensemble model consisted of eight different modelling algorithms, as implemented in the ensemble model-ling framework of the package "sdm" (Naimi and Araújo 2016) in R (R Core Team 2020): two regression-based models (Generalised Linear Model (GLM) & Multivariate Adaptive Regression Splines (MARS)), five machine-learning models (Classification and Regression Trees (CART), Boosted Regression Trees (GBM), Maximum Entropy (MaxEnt), Random Forest (RF) & Support Vector Machine (SVM)) and one likelihood-based model (Maxlike). Each SDM algorithm ran with default settings (Naimi and Araújo 2016). Pseudo-absences were sampled at random in background areas defined as either concentric zones around the presence data (using four different radii (i.e., 10 km, 25 km, 50 km & 75 km)) or across the whole of Hispaniola (i.e., no radius restriction) sensu VanDerWal et al. (2009). Pseudoabsences were not selected from pixels containing Magnolia presences. Pixels correspond with the spatial resolution of the predictor variables (i.e., 1 km 2 ). Three prevalence ratios were used to sample pseudo-absences: neutral prevalence (1:1), twice the number of presences (1:2) and five times as many (1:5). Prevalence ratios were kept low to account for the low number of (independent) Magnolia occurrences, as proposed by Liu et al. (2019). This led to a total of 15 (5 × 3) predictions of Magnolia habitat suitability (on a scale of 0 to 1), which were combined into ensemble predictions. Each individual model was calibrated using a 10-fold cross-validation with 80 − 20% random split of the presence data to serve as training data for each replicate. Model evaluation is given by the True Skill Statistic (TSS), since it is considered the most optimal measure for the performance of predictive models (Allouche et al. 2006). TSS values range from − 1 to + 1, with values of zero or less indicating a performance that is no better than random, while a value of + 1 suggests a perfect model capacity to discriminate between suitable and unsuitable areas. Interpretation of the TSS values follows Allouche et al. (2006). For each of the 15 ensemble models, only "good performing" algorithms, characterized by a TSS-value of > 0.7, were kept and ensemble habitat suitabilities were obtained using simple averaging of the good performing algorithm outcomes.
To obtain a single consensus prediction of likely Magnolia occurrence across Hispaniola, three different methods were used to summarize the 15 ensemble model outcomes above. First, the 15 Magnolia habitat suitability predictions were transformed from continuous suitabilities (0 to 1) to discrete predicted presences versus absences, using the habitat suitability threshold maximizing model TSS as cut-off and setting the suitability for pixels below this threshold to zero. Three final distribution maps were created. One obtained through "vote counting" pixels in every presence-absence prediction, thus creating a gradient from 0 (area was never suitable) to 15 (area was always indicated as suitable). A second by employing a weighted averaging procedure, combining the 15 habitat suitabilities into a single habitat suitability map, weighing individual models by their TSS values (better models contribute more strongly to the final prediction). A third by performing a similar standard "unweighted" averaging. These latter two methods resulted in a continuous prediction of habitat suitability between 0 and 1. An additional ensemble distribution map, specifically for the DR, was established from the "vote counting" distribution map of Hispaniola. Here, we added an additional layer of forest cover from the year 2000, obtained from Sangermano et al. (2015b).

Plant material and DNA extraction
Individuals of Magnolia domingensis, M. hamorii and M. pallescens in the DR were sampled in 2015 and 2021 by haphazard sampling (Ward and Jasieniuk 2009;Veltjen et al. 2019). In total, 417 individuals were genotyped, sampled at 12 distinct localities. For each locality sampled in 2015, one herbarium specimen serves as a morphological voucher for its species identification. Sample information and a map of the sampling locations are given in Table 1; Fig. 1, respectively.
DNA isolation was performed on dried leaf tissue according to a modified cetyltrimethylammonium bromide (CTAB) extraction protocol (Doyle and Doyle 1987), with MagAttract Suspension G solution (Qiagen, Germantown, USA) mediated cleaning (Larridon et al. 2015). A Qubit® 2.0 Fluorometer (Thermo Fisher Scientific, Massachusetts, USA) and Nanodrop 2000 Spectrophotometer (Thermo Fisher Scientific) were used to assess DNA quantity and quality, respectively.

Microsatellite testing and genotypification
Twenty-five microsatellite or Simple Sequence Repeat (SSR) markers from Veltjen et al. (2019) were selected, based on the SSR marker being polymorphic in at least one of the three Magnolia species from the DR. From the localities sampled in 2015 (i.e., BAR, ROD, CAC, COR, CAS, SAL and MON in Table 1), a selection of 24 individuals was used to reevaluate the performance of these microsatellite markers (Online Resource 3). The 24 individuals were chosen to include as much variation as possible on a spatial scale and to have 260/230 and 260/280 optical density (OD) ratios approaching 2. Performance testing was done using simplex (i.e., one microsatellite marker per PCR) three-primer PCRs (Vartia et al. 2014). A reaction contained 2 × QIA Multiplex PCR Master Mix (Qiagen), 5ng/µl DNA, 0.025 µM for each forward primer, 0.1 µM for each reverse primer and 0.1 µM for each specified dye. The selected primer pairs were labelled with a fluorescent dye FAM, NED, PET and VIC, which were linked to their respective universal tail T3, Hill, Neo and M13. PCRs ran on a volume of 5 µl under the following conditions: an initial activation step of 15 min at 95 °C, followed by 35 cycles of 30 s at 94 °C, 90 s at 57 °C and 90 s at 72 °C; and a final extension for 10 min at 72 °C. An ABI 3730XL fragment analyser (Thermo Fisher Scientific) with a GeneScanTM 500 LIZ™ ladder (Thermo Fisher Scientific) was used to quantify the PCR products. The results of these simplex tests were analysed in Geneious v.8.1.9 (http://www.geneious.com, Kearse et al. 2012) using the microsatellite plugin. An SSR marker was included in the subsequent multiplex design and final genotyping if it could be unambiguously scored for all tested species and populations, and if it was polymorphic. Multiplex pools were designed using Multiplex Manager (Holleley and Geerts 2009). PCR conditions and peak calling occurred cf. the simplex testing above. In total, four multiplex pools allowed unambiguous genotypification of the test-individuals, whereafter we genotyped the full dataset of 417 individuals. In total we evaluated 25 SSR markers of which 17 were considered qualitative and unambiguous upon constructing the final dataset.
To account for human and technical flaws such as genotyping errors and null alleles, we ran MICROCHECKER v.2.2.3 (Van Oosterhout et al. 2004) and ML-NullFreq (Kalinowski and Taper 2006). Null alleles are alleles that do not produce a functional end-product or have a mutation in the primer region which may inhibit PCR amplification (De Meeûs 2018;Frankham et al. 2015). MICROCHECKER and ML-NullFreq ran for 1000 and 100,000 repetitions, respectively. No markers were removed. MICROCHECKER labelled five markers with potential null alleles: MA40_045, MA42_001, MA42_059, MA42_126 and MA42_472. However, they were only highlighted in locality MON. ML-NullFreq indicated high values (> 0.1) for MA42_001 and MA42_059 in locality MON and for MA42_203, MA42_241 and MA42_397 in locality TON. TON is a locality with a small sample size; hence, we did not consider these as a true indication for null alleles. Similarly, as null alleles were only found in MON and in no other locality, no markers were removed from the subsequent downstream analyses.
To assure random sampling of the genome, we examined the dataset for the presence of linkage disequilibrium (LD) using the program GENEPOP v.4.7.5 (Raymond and Rousset 1995;Rousset 2008). GENEPOP ran with the dememorization number set to 10,000, with 1000 batches and with 50,000 iterations per batch. Following Waples (2015), both uncorrected and (sequential Bonferroni) corrected p-values were considered. GENEPOP revealed that one marker, MA40_045, showed significant LD with three markers after sequential Bonferroni correction: MA42_231 in ROD, MA40_282 in BAR and MA42_472 in ROD, SAL, and MON. Hence, MA40_045 was discarded to guarantee independent sampling of the genome with respect to other microsatellite markers, bringing the final number of markers to a total of 16 (see Online Resource 4).

Population structure and genetic diversity
To infer the number of conservation units, we assessed the population structure in three ways: STRUCTURE analyses, DAPC analyses and genetic differentiation measurements. In STRUCTURE v.2.3.4 (Pritchard et al. 2000), analyses were executed using four datasets: one with all three focal species and three datasets for each species. STRUCTURE ran under the following conditions: 100 000 Markov chain Monte Carlo (MCMC) replicates after an initial burn-in of 100 000, correlated allelic frequencies and the admixture model. Correlated allele frequencies were selected, given the little phylogenetic differentiation (Veltjen et al. 2022) and conservation genetic differentiation (Veltjen et al. 2019) between the three species. The number of groups (K) was set from one to twenty for the general dataset (all species included), which allowed all 12 sampling localities to be retrieved as a separate population and to account for the possibility of undetected substructures. K was set from one to ten for the species-specific datasets. Each value of K was run ten times for each dataset. A visualization of the results was obtained from Structure Harvester Web v.0.6.94 (Earl and vonHoldt 2012). The optimal K-value was selected based on the ΔK statistic, following Evanno et al. (2005), and the mean maximum likelihood (Mean LnK or − LnK ). Mean LnK proved valuable as ΔK cannot retrieve K = 1 as the most optimal K-value. Visualisation of the STRUCTURE barplots was done with DISTRUCT v.1.1 (Rosenberg 2004). DAPC (Discriminant Analysis of Principal Components) analyses were executed in R using the package "adegenet" (Jombart and Ahmed 2011) on the same datasets as the STRUCTURE analyses. Using the "find.cluster" function, we retained the maximum number of PCs and selected groups on the basis of the Bayesian Information Criterion (BIC). The number of Principal Component Analysis (PCA) eigenvalues was determined using 1000 cross-validation replicates. The PCA of the DAPC was set according to the lowest Mean Squared Error (MSE). All discriminant functions (i.e., DA eigenvalues) were kept. Genetic differentiation measurements comprised: pairwise F ST (Weir and Cockerham 1984), G ST (Nei and Chesser 1983), G' ST (Hedrick 2005) and Jost's D (D J ; Jost 2008). The values and their confidence intervals were calculated for each population and locality using the R package diveRsity (Keenan et al. 2013). Interpretation of the F ST -values follows Hartl and Clark (1997): <0.05 is considered "little", 0.05-0.15 "moderate", 0.15-0.25 "great" and > 0.25 "very great" genetic differentiation.
To characterize the genetic health, we quantified the genetic diversity with diversity statistics for each locality and population as defined in Table 1 and the population structure analyses, respectively. Genetic diversity was estimated by quantifying the following statistics in GenAlEx v.6.503 Smouse 2006, 2012): number of alleles per locus (A), number of private alleles (A P ), observed heterozygosity (H O ), expected heterozygosity (H E ) and the number of polymorphic loci (P). Moreover, allelic richness (A R ) and the inbreeding coefficient (F IS ) were calculated in FSTAT v.2.9.4 (Goudet 2000). A R−X represents the allelic richness rarefied to X number of individuals. To quantify if F IS significantly differed from zero, deviations from Hardy-Weinberg proportions (HWP) were tested in GENEPOP with 2-tailed exact tests for every marker × population combination. If possible, a complete enumeration was executed (Louis and Dempster 1987). Otherwise, MCMC chains were employed with 200 batches and 50 000 iterations (Guo and Thompson 1992).

Species distribution modelling
Out of 39 candidate predictor variables, 21 were retained because they were uncorrelated and considered (more) ecophysiologically meaningful (Online Resource 2). During model evaluation, no ensemble models were discarded as all 15 models had a model accuracy of TSS ≥ 0.7. Values for model performance across prevalence ratios and pseudo-absence sampling distance range from 0.800 to 0.993 ( Table 2). These values exceed 0.7, and therefore indicate good predictive models (Allouche et al. 2006). Model performance for each of the 15 ensemble predictions can be found in Table 2. TSS-values correlated positively with increasing pseudo-absence sampling distance as well as higher prevalence ratios. The three final SDM maps based on vote counting, weighted, and unweighted averaging can be found in Fig. 3 and Online Resource 5. The additional ensemble distribution map for the DR can be found in Fig. 4.

Population structure and genetic diversity
For all four STRUCTURE analyses ΔK and mean LnK plots are presented in Online Resource 6. The ΔK of the STRUCTURE analysis of the entire dataset is three, yet with visual inspection on the mean LnK plot, K = 4 still increases the likelihood. In the species datasets the ΔK for DOM, HAM and PAL is two, confirmed by visual inspection of the mean LnK for DOM and PAL. For HAM, however, there is no large difference in mean likelihood for K = 1 (-5229.80) and K = 2 (-5132.83), thus not leading to a clear asymptotic shape in the mean LnK plot. Hence, we consider K = 1 to be the optimal number of genetic groups for HAM. Representative replicate barplots for the optimal K groups are in Fig. 5. In the full dataset (Fig. 5a) all individuals are assigned with great confidence to one of the four clusters, visible as each barplot having clearly more than 70% coloured in that of one genetic group, viz. M. domingensis (yellow), M. hamorii (pink) or one of the two populations of M. pallescens (green). For the analyses per species, M. domingensis was further subdivided in two populations (Fig. 5b), which roughly followed the two localities (i.e., BAR and ROD), yet 8 individuals had an unclear assignment or were assigned to the locality where they were not sampled (see Online Resource 7). The 8 trees that were assigned to the other locality in more than 70% of the runs have a DBH ranging from 4.7 to 19.9. Figure 6 visualizes the DAPC analysis of the full dataset. The find.clusters function for this dataset showed that the lowest, significant drop in BIC score was at K = 4. Cross-validation for K = 4, resulted in 40 PCAs being retained for the DAPC analyses, in combination with DA = 3. The first axis shows a clear distinction between BAH and the three remaining populations, which are subsequently separated by the second axis. The results from the three other DAPC analyses can be found in Online Resource 7. Considering all four DAPC analyses, multiple, unambiguous genetic clusters were obtained for the full dataset and for PAL.
Pairwise F ST , G ST , G' ST and Jost's D (D J ) values can be found in Table 3 and Online Resource 8. Values between populations range from 0.081 to 0.271 for F ST , from 0.042 to 0.158 for G ST , from 0.149 to 0.550 for G' ST and from 0.081 to 0.378 for D J . Confidence intervals (CIs) of the pairwise indices can be found in Online Resource 9.
Diversity statistics for each population, locality and microsatellite marker are compiled in Online Resource 10. The mean values for the populations and localities can be found in Table 4. One population and two localities, namely EBV, ENT and MON, showed significant deviations from HWP for seven, four and one out of 16 markers, respectively, which resulted in an F IS that was significantly deviating from zero. Based on the three different types of analyses for genetic structure, five populations were defined: BAR, ROD, BAH, EBV and VAL.

Potential Magnolia distributions across Hispaniola
The ensemble SDM predicted the habitat of Magnolia across Hispaniola with great accuracy, as evidenced by the high model evaluation statistics obtained. Moreover, all 66 additional occurrences from the sampling effort in 2021 (which were not included in model fitting) fell within highly suitable predicted areas. Similarly, herbarium occurrence data that  Table 1 Table 3   were deemed too unprecise for the SDM analyses, were often situated in highly suitable habitats as well (green to dark green areas in Fig. 3).
Regions of high habitat suitability for the Magnolias of Hispaniola co-occur with the numerous mountain ranges on the islands, which are annotated in Fig. 1. While we cannot rule out the possibility that our genus-level approach lacks species-specific nuances in Magnolia habitat requirements, we recommend future expeditions in search for undocumented Magnolia populations, to specifically focus on the areas of highest predicted habitat suitability for surveys, summarized in the following sentences. Mountain ranges with the most promising areas of detecting new Magnolia populations in the Dominican Republic are the Cordillera Central, the Sierra de Bahoruco and the Sierra de Neiba (Figs. 1 and 3). Possible remnant populations could be found in the Cordillera Septentrional and the Sierra Martín García, an isolated extension of the Sierra de Neiba. For Haiti, areas of interest reside in the mountain ranges Massif de la Hotte and Massif de la Selle, the Haitian counterpart of the Sierra de Bahoruco. However, previous expeditions to the Massif de la Selle did not find any hitherto unknown Magnolia species (pers. comm. Joel Timyan, Société Audubon Haiti). This can be the result of the species no longer being present in this area due to the severe deforestation in the mountain range itself, exacerbated by its relative proximity to the capital Port-au-Prince. To a lesser degree, possible areas of interest can be found in the Massif du Nord, Montagnes Noires and Chaîne des Matheux in central and northern Haiti (Figs. 1  and 3). Due to an extensive history of deforestation (Hedges et al. 2018) and the absence of  dark green patches of suitable habitat in northern Haiti (Fig. 3) it was deemed unlikely that Magnolia populations still occurred here.
Knowledge about the potential habitat to guide survey designs for rare species is one of the most valuable uses of distribution modelling in ecology and conservation (McCune 2016). It also provides policy makers and on-the-ground conservationists a guideline for spatial prioritisation of conservation efforts (Villero et al. 2017). Concretely, it can serve as the basis for reinforcement efforts and the designation of habitat corridors; and allows to search for the true or full extent of populations more efficiently ). Habitat corridors have successfully been applied by Baruah et al. (2019), Adhikari et al. (2012) and Liu et al. (2018).
Conservation Units and Priorities. By integrating the results of the SDM (Fig. 3, Online Resource 5) and conservation genetic analyses (Figs. 5 and 6; Table 3, Online Resource 6-9), we propose to treat the five genetic Magnolia populations of the DR (i.e. BAR, ROD, BAH, EBV and VAL) as three conservation units (CU), following the morphospecies of Howard (1948).
On the one hand, because no unthreatened, related Magnolia species were included in this research, we can only make statements about the genetic diversity being healthy, low or high when comparing the localities and populations with each other (Spielman et al. 2004;Väli et al. 2008). On the other hand, we can also compare with diversity statistics generated in other studies, however, to be treated with caution. The genetic diversity of the five genetic populations appears 'ample' compared to other conservation genetic studies on (threatened) Magnolia species (Chávez-Cortázar et al. 2021;Aldaba Núñez et al. 2021) and only one out of five populations shows genetic signatures of inbreeding, with a low deviation from zero (Table 4) (Table 4) and occurrence data (Fig. 3, Online Resource 5).
For M. domingensis the STRUCTURE (Fig. 5b) and DAPC (Online Resource 7) analyses on the species-specific dataset indicate two populations with a few migrant/relict trees in each of the localities, while these analyses on the full dataset (Figs. 5a and 6) and the SDM map (Fig. 3) only predict one. Additionally, the genetic differentiation statistics indicate intermediate values (Table 3) and there is a notable number of private alleles that sets these localities apart (Table 4).The genetic diversity of BAR is average across the Magnolias of the DR (Table 4). In contrast, ROD exhibits the worst genetic health among the five genetic populations (Table 4), with the lowest values for any A-statistic (i.e. A-, A R -and A P -values). However, both M. domingensis populations have low and non-significant F IS -values. As ROD exhibits low genetic diversity, genetic substructuring is not pronounced and the localities are linked by suitable habitat, we consider M. domingensis to be one CU, whereby we recommend reversing the ongoing genetic differentiation and low genetic diversity in ROD. This approach allows genetic rescue effects (i.e., a decrease in population extinction probability owing to gene flow; Bell et al. (2019)) between these populations to negate a further decline in genetic diversity (Ingvarsson 2001;Whiteley et al. 2014). Interestingly, due to an increment in sample size compared to Veltjen et al. (2019), we no longer see LD for population ROD. This was attributed to a recent bottleneck and a subsequent lack of random mating to restore linked loci (Slatkin 2008). Now, we can elucidate that this was a random sampling error, but see Waples (2015).
By integrating the genetic structuring results of M. ekmanii from Veltjen et al. (2019) with the SDM analyses (Fig. 3), we see a similar pattern as described for M. domingensis, namely, the species being split genetically in two populations. Based on the results of the SDM-analyses, this disjunction is unexpected considering that both populations are connected by a mountain range with highly suitable potential habitat. Further sampling efforts in the area could resolve this issue. Like the M. domingensis case described above, we treat M. ekmanii as a single CU, considering its genetic substructure and lower genetic health. Noteworthy, new occurrences of M. ekmanii were found relatively recently in the region of Bois Pangnol (easternmost point of the M. ekmanii occurences on Fig. 1; pers. comm. Joel Timyan, Société Audubon Haiti), an isolated location given the geographic distance between other Magnolias in Hispaniola. As these were not included in the genetic analysis, we cannot make inferences regarding their genetic structure or health. We recommend including this new locality in future conservation genetic studies.
According to the STRUCTURE and DAPC analyses (Figs. 5 and 6, Online Resource 7), M. pallescens is divided in two genetic groups: a northern population (EBV) in Ébano Verde Scientific Reserve and a southern one (VAL) in Valle Nuevo National Park (Fig. 3). Between EBV and VAL, we see an F ST -value of 0.147 and a D J of 0.154 (Table 3), which are considered moderate genetic fixation and allelic differentiation, respectively, following the quantification of Hartl and Clark (1997). This indicates that gene flow between the two populations has been low or absent for a significant amount of evolutionary time, allowing the populations to vary in allelic composition and frequency. When we compare infraspecific F ST -and D J -values with other studies on island endemic populations of trees, we find similar results. For example, F ST = 0.000-0.229 for Pinus caribaea var. bahamensis, an endemic from the Bahaman archipelago (Sanchez et al. 2014). Recently, new individuals have been found in the suitable habitat patch between EBV and VAL, Reserva Cientifica Las Neblinas (henceforth RCLN), a previously unknown population of Magnolia pallescens (Figs. 1 and 3). Three historical herbarium vouchers (García 1002 [JBSD], García 1184 [JBSD] and Veloz 3237 [JBSD]) already implied a (historical) presence of M. pallescens in this patch, thus bridging the two populations geographically. These vouchers were excluded from the SDM analysis because of their imprecise occurrence data. In terms of genetic diversity, the population summary statistics for EBV and VAL indicate average genetic health when compared with other Magnolia populations from the DR (Table 4) Interestingly, although the EBV population has a significant F IS -value, no significant F IS -values are found for its two localities. One potential explanation could be that a Wahlund effect was artificially created by merging the two localities (De Meeûs 2018). The Wahlund effect reduces heterozygosity in a population due to population substructuring (Wahlund 1928). The opposite situation from EBV is visible in VAL. Here, the VAL population does not have a significant F IS -value, yet they are significant for two out of the four localities of VAL. We attribute this to unrepresentative sampling across a small spatial scale. In terms of the other genetic diversity statistics, both M. pallescens populations, EBV and VAL, have average values. When comparing these results with Veltjen et al. (2019), no substantial differences are visible, aside from the significant F IS -value for EBV. Because of the recently discovered intermediate population and the significant F IS -value of population EBV and localities ENT and MON (population VAL), we propose to treat M. pallescens as one CU and hence actively aim to reverse ongoing genetic differentiation between the two populations. .
Magnolia hamorii has only one genetic population, BAH, situated in the Natural Protected Landscape Miguel Domingo Fuerte. The population, which represents the species, is most genetically differentiated from any other population as evidenced by the DAPC plot (Fig. 6), the D J -values (Table 3) and the high number of private alleles (A PP ; Table 4). Geographically, the population is also isolated the most, both in distance as well as occurring in a distinct different mountain chain, as compared to M. pallescens and M. domingensis, that reside in different parts of the same mountain chain. Many of the genetic diversity indices (A, H E ) of M. hamorii are the highest among the Magnolias of the DR (Table 4). Its F IS -value of 0.047 (Table 3) is the second highest among all populations, but non-significant and unproblematic as such. We consider M. hamorii as one CU since the sampled localities have one continuous modelled suitable habitat (Fig. 3) and there is no detectable genetic substructure (Table 3; Figs. 5a and 6).
The pattern of limited gene flow, leading to high genetic structuring, and little inbreeding is similar to other Magnolia studies such as those of Aldaba Núñez et al. (2021) and Veltjen et al. (2019). Inbreeding is expected to be minimal as the flowers of Magnolia are protogynous (Thien 1974) and promote outcrossing (Tamaki et al. 2009). However, geitonogamy has been reported in other Magnolia species (Bernhardt and Thien 1987;Hirayama et al. 2005;Tamaki et al. 2008) and regardless of selfing, inbreeding patterns can occur from mating with relatives e.g. in small and/or fragmented populations. The strong inbreeding depression in the early life stages of trees ensures that adult plants result solely from outcrossing (Sorensen 1999), increasing outcrossing rates and maintaining genetic diversity (Petit and Hampe 2006). High genetic diversity is desirable as their sessile lifestyle allows evolution of locally adapted ecotypes. Additionally, genetic diversity is "stored" in their great longevity, allowing genetic variants to circulate a long time within the populations (Alberto et al. 2013). This allows for potential reinforcement of genetically degraded populations (Aitken et al. 2008).
The results we report on the genetic diversity and habitat availability strongly nuance the Red List assessment categories of the studied species (Rivers et al. 2016) and confirm publications such as Vellend and Geber (2005) and Rivers et al. (2014). A first example: M. ekmanii has a relatively large potential habitat in Massif de la Hotte, exceeding that of M. domingensis (Fig. 3), which has comparable low values for its genetic diversity (Table 4 and Veltjen et al. 2019), while both are considered Critically Endangered (Rivers et al. 2016;Wheeler 2015). Hence with this information, M. domingensis can be considered in more dire need of urgent conservation action than M. ekmanii. A second example: M. pallescens and M. hamorii are both considered as Endangered, yet, when aligning the geographic results and genetic results we see an inverse relationship. Magnolia pallescens has the highest (potential) habitat, yet average genetic diversity with inbreeding, while M. hamorii has a smaller (potential) habitat, yet with higher genetic diversity and no inbreeding. This second example stresses that nor the current (estimated) geographic extent and knowledge on population trends are indicative of the true genetic diversity of the species, which relies on demographic history over a longer time.
Guidelines for conservation management. To ensure a sustainable future for the Magnolias of Hispaniola, we propose to centralize conservation strategies around three main actions: exploration, protection, and reinforcement. Preferably for all the Magnolia species of Hispaniola, yet, as indicated in the previous paragraph, when resources are limited, following the defined priorities based on integrated genetic and SDM data (i.e. 1) M. domingensis & M. ekmanii, 2) M. pallescens and 3) M. hamorii).
Further exploration of the contemporary distribution of the Magnolias of Hispaniola should use Fig. 3 as a guideline. We propose the following strategies to search for new Magnolia individuals/populations, with decreasing priority. Initially, the highly suitable areas between the populations of M. domingensis and M. ekmanii should be explored, followed by nearby highly suitable, mountainous areas in the Massif de la Hotte, Sierra de Bahoruco and Cordillera Central. Subsequently, a search for remnant populations of M. emarginata and/or M. domingensis in Massif du Nord and Montagnes Noires. Finally, areas with no prior indication of Magnolia populations such as Sierra de Neiba, Sierra Martín García and Chaîne des Matheux should be further explored. Already, an exploration between the populations of M. domingensis has already been set up (Ramón Castillo Torres, pers. comm.).
Protection aims to conserve and expand the species' habitat, TMCFs. Where possible, the Magnolias' attractive properties can be used as an argument to expand or form new protected areas. In other words, use their potential as flagship species to draw attention to unprotected areas on the one hand and their potential as umbrella species to conserve not only the Magnolias, but TMCFs and their diversity on the other hand. Given that an adequate amount of genetic diversity is available within the conservation units, the focus should be on the expansion of protected areas. Ideally, these areas should strive to connect forest fragments in the landscape and the different populations as such. These habitat corridors (Christie and Knowles 2015) are especially important for the populations of M. ekmanii and M. domingensis, should no natural corridor exist. When successful, gene flow between populations should increase genetic diversity. In the case of M. hamorii, its population is under threat by the expansion of coffee plantations and tourism by the government. The latter has largely been negated by the Fundación Progressio (pers. comm. Ramón Elías Castillo Torres, Fundación Progressio). The major conservation action for M. hamorii therefore constitutes reducing future threats to the population.
Although protection is considered the most important goal in the long term, reinforcement is essential to cross this gap during which sufficient protection is established. Reinforcement aims to preserve and enhance genetic diversity and population sizes. Our interpretation of "reinforcement" follows Seddon et al. (2014): "The release of an organism into an existing population of conspecifics to enhance population viability." Since gene flow between populations is limited, illustrated by the lower genetic health of ROD and a significant inbreeding value in EBV, we propose to execute reinforcement efforts between populations (Brichieri-Colombi and Moehrenschlager 2016). Here, populations BAR for Magnolia domingensis and VAL and RCLN for Magnolia pallescens will function as source populations of genetic diversity and ROD and EBV as their respective receptive sink populations. The benefits of minimizing loss of genetic diversity due to inbreeding and fragmentation outweighs the undoing of potential local adaptations. We encourage conservation practitioners to raise seedlings in nurseries and plant them close/within the sink population as randomly distributed as possible. Moreover, new populations could be established in suitable Magnolia habitat in the vicinity of known populations, preferably in an area secure from large scale threats, such as the area between ROD and BAR (M. domingensis). Nurseries should avoid hybridization and potential subsequent outbreeding depression (Keller et al. 2000;Tamaki et al. 2017) at any cost. Propagation efforts in nurseries are ongoing for the Magnolias of the DR, with mixed results (Castillo et al. 2018). New M. pallescens individuals are being planted in degraded areas near VAL; and also in nearby gardens, green areas of businesses etc. around EBV by the goodwill of the people. In the latter case, we advise to use individuals from VAL and RCLN as source material. For M. hamorii and M. domingensis, 1690 and 130 new individuals, respectively, have recently been planted with more on the way (Ramón Castillo Torres, pers. comm.). We emphasize the importance of collecting seeds from multiple localities and from as many different trees as possible across a large spatial scale. This ensures representative subsampling of the genetic diversity within the conservation unit (Oldfield and Newton 2012). To ensure and maximally extrapolate success, we propose to instate a monitoring programme to keep track of the survival rates of the seedlings and mature trees. This allows a quick and swift respond to new threats (Godefroid et al. 2011).

Conclusion
The SDM analyses suggest that the potential distribution is restricted to the various mountain ranges on Hispaniola. We indicate the mountains of the Montagnes Noires as a priority area to search for more remnant populations of the highly endangered M. emarginata and M. domingensis in Haiti. The Sierra de Neiba is the most promising area, without prior known Magnolia occurrences, to search for new populations in the Dominican Republic. Highly suitable habitat is present between known populations of M. domingensis, M. ekmanii and M. pallescens. These patches are highlighted for future reinforcement efforts should no natural habitat corridor exist. The Magnolias of the DR consist of five genetic populations, representing three species. We recommend each species to be treated as its own conservation unit (CU). The pattern of little inbreeding for Caribbean Magnolia species seems to persist in our expanded sampling. Population "Ébano Verde" and localities "Entrance" and "Montellano" exhibit minimal, but significant, inbreeding. Out of the three DR Magnolia species, M. domingensis is highlighted as it contained the lowest genetic diversity, the smallest number of (known) individuals, the lowest number of sampling localities and the lowest amount of potential habitat in its vicinity. For conservation management, we propose to focus on three actions: exploration, protection, and reinforcement. The main recommendation includes protection and instatement of habitat corridors between populations of M. domingensis and M. ekmanii, whereby the SDM results will serve as a guideline for spatial prioritization.