Climate change impacts on migration of Pinus koraiensis during the Quaternary using species distribution models

Clarifying the influences of paleoclimate changes on the disjunct distribution formation of plants allows a historical and mechanical understanding of current vegetation and biodiversity. This study investigated the influences of paleoclimate changes on the present disjunct distribution formation of Pinus koraiensis (Korean pine) using species distribution modeling. A species distribution model (SDM) was built using maximum entropy principle algorithms (MaxEnt), data from 152 occurrences of the species, and four bioclimatic variables at 2.5 arcminute (approximately 5 km) spatial resolution. The simulation revealed the excellent fit of the MaxEnt model performance, with an area under the curve (AUC) value of 0.922 and continuous Boyce index (BCI) value of 0.925 with fivefold cross-validation. The most important climatic factor was the minimum temperature of the coldest month. Suitable habitats for the species ranged between − 30.1 and − 4.1 °C. Projected suitable habitats under the Last Glacial Maximum (approximately 22,000 years ago [ka BP]: LGM) period showed wide distributions in eastern China, the southern part of the Korean Peninsula, and the Japanese Archipelago. After the mid-Holocene (approximately 6 ka BP), the suitable habitats expanded northwards in continental regions and retreated from both north and southwest of Japan. This eventually formed disjunct suitable habitats in central Japan. An increase in temperature after the LGM period caused the migration of P. koraiensis toward new, suitable habitats in continental Northeast Asia, while species in the Japanese Archipelago retreated, forming the present disjunct distributions.


Introduction
The factors that contribute to the establishment of geographical disjunct distributions of plant species have long captured the interest of botanists and ecologists (e.g., Wood 1972;Wen 1999;Qian and Ricklefs 2000;Richard 2006). Climatic changes over the glacial and interglacial periods during the Quaternary period (over the past 2.6 million years) have caused the expansion and retraction of plant habitats, local extinctions, and the development of present-day disjunct distribution patterns (Hewitt 1996;Comes and Kadereit 1998;Svenning et al. 2008;Chen et al. 2012;Shitara et al. 2018). Accordingly, elucidating the processes whereby the disjunct distributions of plant species develop in relation to paleoclimate change provides an important context for understanding the development of present-day vegetation and biodiversity .
Northeast Asia is composed of three landmass elements: continental regions, such as the eastern to northeastern parts of China and around the southern parts of Russian Far East; peninsulas extending south from the continent, such as the Korean Peninsula; and the East Asian Islands running parallel to the continent, such as the Japanese Archipelago (Fig. 2a). Cool and cold temperate forests are among the forest types that are most representative in these regions and are characterized by higher plant species richness than other forests at the same latitude, such as those in North America and Europe (Latham and Ricklefs 1993;Qian and Ricklefs 2000).
In these forest zones, some of the main canopy tree species such as Pinus koraiensis Siebold et Zucc. (Korean pine), Picea sect. Picea, and Betula davurica are widely distributed in continental regions (Kolbek 2003;Krestov et al. 2006) and are also distributed in the Japanese Archipelago, where they occur only in a few montane regions, showing present-day disjunct patterns of geographical distribution (e.g., Katsuki et al. 2004;Aizawa et al. 2012;Shitara et al. 2018). Macro-fossil and pollen fossil records indicate that these species were widely distributed in the Japanese Archipelago during the Last Glacial Maximum (LGM; between ca. 27 and 19 thousand years ago [ka BP]; Clark et al. 2009), and rapidly reduced from most areas in response to the global warming that occurred thereafter (e. g., Tsukada 1983;Morita 2000;Takahara et al. 2010). However, how and why post-LGM climate changes have influenced these distribution shifts remains controversial.
Under such circumstances, species distribution models (SDMs) are effective modern tools that quantitatively determine which environmental variables are most strongly correlated with species distributions and can be used to project suitable and vulnerable habitats under climate change scenarios (Pearson and Dawson 2003;Elith 2006;Phillips et al. 2006;Svenning et al. 2011). However, the uncertainty regarding the former habitats of species projected by SDMs using paleoclimatic data should ideally be verified by evidence such as that provided by macrofossil records (Svenning et al. 2011;Varela et al. 2011;Roberts and Hamann 2015).
P. koraiensis can be considered to be ideal for investigating the possible impacts of climate on the development of disjunct species distributions, because numerous macro-fossil records of this species dating back to the LGM have been found in the Japanese Archipelago (Miki 1956;Tsukada 1983;Morita 2000). This species occurs mainly in the Amur and Maritime provinces of the Russian Far East, northeast China, the Korean Peninsula, and the Japanese Archipelago (Okitsu and Momohara 1997;Potenko H Velikov 1998). However, although this species is widely distributed in continental Northeast Asia (Potenko and Velikov 1998;Kolbek et al. 2003), it rarely grows naturally in the Japanese Archipelago, where it is primarily confined to relatively small areas in the central mountainous regions of Honshu and Shikoku, thus indicating its present-day disjunct distribution (Okitsu and Momohara 1997;Aizawa and Kaneko 2009;Fig. 2a). Macro-fossil records indicate that this species was widely distributed in the Japanese Archipelago during the LGM, but rapidly retreated from most areas thereafter (Okitsu and Momohara 1997;Morita 2000;Okitsu 2002). Moreover, few previous studies have shown that P. koraiensis is strongly affected by climate change (Xu and Yan 2001;He et al. 2005;Zhang et al. 2014;Ahn et al. 2015;Belyanin and Belyanina 2019).
The aim of this study was to identify the climatic factors that have influenced the distribution of P. koraiensis and determine how paleoclimate change has affected the distribution and development of the present-day disjunct distribution of P. koraiensis in Northeast Asia. To this end, we initially constructed an SDM using the present distribution of P. koraiensis, and identified the climatic variables that were strongly correlated with the species distributions. Based on these relationships, habitats considered suitable for the species were projected under paleoclimate conditions in the LGM and mid-Holocene (ca. 6 ka BP) to examine the distribution patterns of P. koraiensis in the past. Additionally, to verify the accuracy of the model, we overlapped the projected suitable habitats at the LGM with the distribution locations of the macro-fossil records for P. koraiensis in the Japanese Archipelago from the LGM period.

Target species
P. koraiensis is an evergreen coniferous tree species that grows to heights exceeding 30 m. This is one of the dominant canopy tree species in the cool and cold temperate conifer and mixed conifer-broad-leaved forests in Northeast Asia. The seeds of this species are an important source of food for red squirrels (Sciurus vulgaris), Siberian chipmunks (Eutamias sibiricus), Eurasian nuthatches (Sitta europaea), and Eurasian nutcrackers (Nucifraga caryocatactes), which also function as agents of seed dispersal (Miyaki 1987;Hutchins et al. 1996). In this regard, the Eurasian nutcracker, which occasionally caches seeds in the soil, is particularly important (Hutchins et al. 1996). The timber of P. koraiensis is mainly used for construction purposes, and is edible; the seeds are a valuable source of local income (Thomas and Farjon 2013). In addition, mixed deciduous broad-leaved and P. koraiensis forests provide important habitats for wildlife, including the Siberian tiger (Panthera tigris altaica; Thomas and Farjon 2013). In recent years, however, the distribution range of this species has been markedly reduced owing to logging activities (Thomas and Farjon 2013). In the Japanese archipelago, although P. koraiensis is included as a ''lower risk species'' in the Japanese Red Data Book (Environment Agency of Japan 2012), the populations of this species are quite small compared to that of Abies veitchii, Picea jezoensis var. hondoensis, and Tsuga diversifolia which are common species in the subalpine zone of Honshu. In addition, a P. koraiensisdominated forest is very rare (Okitsu and Momohara 1997).

Study area
The study area (120-150°E and 30-60°N) covers the actual distribution of P. koraiensis. The climate of the Asian continent is regulated by moist oceanic winds, with Asian monsoons during the summer and dry continental winds during the winter (Krestov et al. 2006;Nakamura et al. 2007;Li et al. 2015). In contrast, continental islands, including those of the Japanese Archipelago, are affected by a humid oceanic climate and heavy precipitation throughout the year (Nakamura et al. 2007).

Collection of species occurrence data
In this study, data on the occurrence of P. koraiensis in continental Northeast Asia were obtained to cover the distribution range of the actual species from various sources (specimens with recorded location and distribution maps; Kharkevicz 1989;Aizawa et al. 2012;Zhang et al. 2014;Chang and Kim 2015;Kan et al. 2015;Okitsu et al. 2016;and Lyu et al. 2017). Recently, data from the Global Biodiversity Information Facility (GBIF; https://www.gbif.org/) have been frequently adopted as a source of species occurrence data. Unfortunately, as a result of our search on the occurrence of this species, we found that there are many spatial biases in the Korean Peninsula, that is, there are many occurrence data in South Korea, but very few in other areas, particularly in North Korea for the present time (as of April 23, 2021). Moreover, there are errors in some locations in the Japanese Archipelago. For instance, in GBIF, we found some occurrence points of P. koraiensis that are recorded outside of the species' native range. Under such circumstances, we decided not to use the occurrence data of GBIF in this study, because these spatial biases and location errors potentially distort species distribution modeling (Hortal et al. 2008;Phillips 2008;Beck et al. 2014).
The sources collected for this study do not specify location latitudes and longitudes; specimen data and distribution maps were georeferenced for digitalization of the occurrence points using the georeferencing tools available in ArcGIS version 10.6. Species occurrence data for the Japanese Archipelago were collected for the years ranging from 2015 to 2017 by conducting field surveys using a global positioning system. To reduce the effects of spatial autocorrelations, if more than two occurrences were recorded for a pixel (2.5 arcminute, ca. 5 km), which is the same size as the resolution of WorldClim used as the environmental variables in this study, only one presence record was used for the construction of the models. As a result, we obtained data on 152 occurrences in total, including secondary sources and the Japanese field survey (Figs. 1,2a). Furthermore, 10,000 background points as absence were randomly selected, which is recommended for regression-based modeling and to sufficiently sample the background calibration environment (Guevara et al. 2018).

Collection of climate data
Only climatic factors were used to generate our SDMs, because the aim of this study was to assess the impact of paleoclimate change on the distribution of P. koraiensis at a large scale. It has been emphasized in several studies that climate is the most important factor for plant distribution on a large spatial scale with coarse resolution (Matsui et al. 2004;Higa et al. 2013;Ahn et al. 2015;Tang et al. 2018). For instance, Matsui et al. (2004) suggested that their model was projected for the distribution of Fagus crenata using four climate variables, namely, topography, soil, surface geology, slope, and aspect with a 1 km spatial resolution across Japan, and the contribution of the four climate variables reached 92.6%. Moreover, according to an SDM study projected with variables for climate, topography, soil, surface geology, slope, aspect, river, and geological age, for 7 tree species with 1 km spatial resolution in Japan, the relative importance of non-climatic variables were only 7.0-22.3% (Higa et al. 2013). For the reason that the spatial scale is lager and resolution of our model is higher than those in previous studies, we consider that the influence of non-climatic variables in determining the distribution is lower. In addition, there are no paleo-non-climatic variables such as terrain and soil in this study area to predict past suitable habitats for the present time. Thus, we did not include the nonclimatic variables in our models.
We downloaded 19 bioclimatic variables (2.5 arcminute resolution) related to the present climate  from the WorldClim 1.4 dataset website (https://www.worldclim.org/; see also Hijmans et al. 2005) covering the present distribution range of the species and neighboring areas. Projecting of suitable habitats for the LGM (ca. 22 ka BP) and mid-Holocene (ca. 6 ka BP), we used two general circulation models (GCMs) for the climate scenarios for the LGM and mid-Holocene to account for variation and uncertainty in model predictions: The Community Climate System Model Version 4 (CCSM4: Gent et al. 2011) and the Model for Interdisciplinary Research on Climate Earth System Model (MIROC: Watanabe et al. 2011) were downloaded from WorldClim 1.4 (resolution: 2.5 arcminute). Both GCMs have high resolution, and have been frequently used in many studies, given that they are considered to have high accuracy when applied to Northeast Asia and Japanese Archipelago scales (Chen et al. 2012;Sakaguchi et al. 2012;Worth et al. 2013;Kimura et al. 2014;Tsuyama et al. 2014).

SDM construction
First 12 bioclimatic variables, which could be interpreted physiologically or ecologically, were selected from 19 bioclimatic variables as candidate variables for model construction (Supplementary Information  Table S1). The 12 selected bioclimatic variables are critical for plant growth and survival (Tanaka et al. 2006(Tanaka et al. , 2012. Temperatures of the cold and dry seasons indicate extreme cold, which affects the upper and northern range limits of plants. The temperatures of the warm and wet seasons indicate heat during the growing season. Precipitation during the cold and dry seasons indicates snow cover in cold regions and water  supply in warm regions. Snow cover influences plant survival by facilitating insulation and snow pressure during winter and a source of water during spring (Tanaka et al. 2012). Precipitation during the warm and wet seasons indicates the growing season and water supply.
Second, we selected the most suitable bioclimatic variables to explain species distribution among the 12 bioclimatic variables based on the following statistical climatic variable selection methods (see Tang et al. 2017Tang et al. , 2018. To avoid deterioration of the prediction accuracy due to multicollinearity, correlation analysis was performed using the 152 occurrence points and the 10,000 background points within the study area, and bioclimatic variables with high multicollinearity (Pearson correlation coefficient (r) C|0.80|) were grouped based on the collinearity (Table S1), resulting in four groups of variables, each with three variables (Table S2): Group A (temperature variables for warm and wet season: MaWM (Bio5), MeWtQ (Bio8), MeWaQ (Bio10)), Group B (temperature variables for cold and dry season: MiCM (Bio6), MeDQ (Bio9), MeCQ (Bio11)), Group C (precipitation variables for warm and wet season: PWtM (Bio13), PWtQ (Bio16), PWaQ (Bio18)), and Group D (precipitation variables for cold and dry season: PDM (Bio14), PDQ (Bio17), PCQ (Bio19)). Furthermore, we created all the possible combinations of 12 bioclimatic variables that contained at least two variables and excluded the combination that included more than two variables within the same group of highly correlated variables. Using this procedure, the number of candidate combinations of the variables was 243. Moreover, the variance inflation factors (VIFs) of all combinations were calculated using vif function of usdm package (Naimi et al. 2014) in R Version 4.0.3 (R Development Core Team 2020), and the combinations with VIF values [ 5 were excluded. Using this procedure, the number of final candidate combinations of the variables was 221. To select the best combination of variables, we first set the beta-multiplier at 0 and constructed all 221 models using maximum entropy principle algorithms (MaxEnt; version 3.4; https:// biodiversityinformatics.amnh.org/open_source/maxent/) on the basis of the 152 occurrence points and above the 10,000 background points in all 221 combinations of variables. The main merit of MaxEnt is that it can be applied to presence-only data and provides excellent predictive performance for this type of data (Elith et al. 2006;Hernandez et al. 2006). The best combination of variables, which is the lowest AICc value combination of variables, was selected using the corrected Akaike information criterion (AICc; Hurvich and Tsai 1989; Table S3). As a result, the best combination of variables was the minimum temperature of the coldest month (MiCM; Bio6), mean temperature of the warmest quarter (MeWaQ; Bio10), precipitation of the warmest quarter (PWaQ; Bio18), and precipitation of the coldest quarter (PCQ; Bio19; Table 1). The four variables are often used as major and important variables for SDMs of plant species' distributions in Northeast Asia, because the variables can be described and discussed biologically or ecologically (e.g., Tanaka et al. 2012;Kimura et al. 2014;Tsuyama et al. 2014). We simulated suitable P. koraiensis habitats MaxEnt based on given occurrences and environmental variables (Phillips et al. 2006). Different betamultipliers were assessed from 1 to 3.5, in 0.5-unit steps, and the best model allocated set the best betamultiplier value of 1.5, based on AICc.

Model evaluation
To assess the model performance, we employed k-fold cross-validation. In this study, fivefold partitioning was used to construct the testing (20%) and training data (80%) from occurrence records. Each model was assessed using the area under the curve (AUC) of receiver operating characteristic (ROC) analysis (Fig. 1). The mean of the 5 MaxEnt predictions was calculated and was mapped using QGIS version 3.10.
Additionally, we assessed model performance using the continuous Boyce index (CBI; Boyce et al. 2002). CBI only requires presences and measures the extent to which the predicted suitability differs from a random distribution of observed presences. This is consistent with a Spearman correlation with values ranging from -1 to ? 1 (Boyce et al. 2002;Hirzel et al. 2006). The threshold for obtaining a map of absence and presence was applied to the maximum sensitivity plus specificity logistic threshold. This threshold method is available and robust for presence/ absence data as well as presence-only data, because it can be used with presence-only data when random points are used instead of true absences (Liu et al. 2013. Therefore, this threshold has been frequently used in many studies. The probability of presence was mapped as a continuous value between the threshold and 1. MaxEnt estimates the importance of each variable for suitable habitats based on two criteria: percent contribution and permutation importance. Percent contribution represents the relative contribution of each environmental variable to the MaxEnt estimate (Phillips et al. 2006). The permutation importance of each variable is calculated based on the fact that each environmental variable on training presence and background data is permuted randomly (Phillips et al. 2006), that is, high values of permutation importance indicate that the accuracy of the estimate is reduced considerably by changing the permutation of the variable (Phillips et al. 2006;Phillips 2010). The relationships between habitat suitability for a species and bioclimatic variables were also examined using the response curves generated by MaxEnt. The series of work processes in this study were summarized as an ODMAP checklist (Table S4) with reference to the standard protocol described by Zurell et al. (2020).

Projection of the suitable habitats at the Last Glacial Maximum and mid-Holocene
Suitable habitats in the LGM and mid-Holocene has been projected using the two GCMs (CCSM4 and MIROC). The coastlines of the LGM projecting results were masked using a landmass shapefile from Assis et al. (2018). Moreover, the area of suitable habitat (in km 2 ) over the threshold for all models at each time slice was calculated using the ''areaPolygon'' function in the geosphere R package (Hijmans 2019).
Validation of suitable habitats using fossil records from the LGM To verify the accuracy of the projected LGM suitable habitats, we collected the macro-fossil records for Japan during the LGM from different literature sources, thereby obtaining fossil records for 19 occurrence locations (Table S5). The data for both suitable habitats and macro-fossil records of P. koraiensis in the LGM were mapped using QGIS version 3.10.

Model performance and present suitable habitats
The overall mean test AUC was 0.922 ± 0.003, and the mean CBI was 0.925 ± 0.05 on the fivefold crossvalidation. These values were considered excellent fits (Swets 1988;Boyce et al. 2002;Thuiller et al. 2003). Habitats considered suitable for P. koraiensis (projected using the maximum sensitivity plus specificity logistic threshold value of 0.1468) were identified in the mid to northern regions of the Korean Peninsula, the Russian Far East (the northern limit is near the Amur River), a part of northeast China, central Honshu, and some western regions of the Japanese Archipelago (Fig. 2b). The total area of suitable habitats over the maximum sensitivity plus specificity logistic threshold value (0.1468) was 806,228 km 2 , 692,856 km 2 in continental Northeast Asia, and 113,372 km 2 in the Japanese Archipelago ( Table 2).
The habitats under high suitability (0.5-1) accounted for 28.6% (200,094 km 2 in continental Northeast Asia and 30,783 km 2 in the Japanese Archipelago) of the total area of suitable habitats (Table 2) and were intensively distributed in the northern part of the Korean Peninsula and central Honshu of the Japanese Archipelago. Suitable habitats in continental Northeast Asia were found to be widely and continuously distributed and corresponded well with the actual occurrences of the species. In contrast, suitable habitats were shown to be discontinuously distributed in the Japanese Archipelago. Moreover, in large parts of eastern Hokkaido, northern Honshu, and Sakhalin, areas considered to be suitable habitats are currently devoid of P. koraiensis (Fig. 2b) and are accordingly identified as ''empty habitats'' for this species (Armonies and Reise 2003;Tsuyama et al. 2014;Tang et al. 2017). The results obtained for the percent contribution of climatic variables revealed that the most important bioclimate variable in the model was the minimum temperature of the coldest month (MiCM), followed by precipitation of the warmest quarter (PWaQ), mean temperature of the warmest quarter (MeWaQ), and precipitation of the coldest quarter (PCQ). Based on permutation importance, MiCM was found to be the most important variable, followed by MeWaQ, PWaQ, and PCQ ( Table 1). The response curve showed that P. koraiensis was most probable when MiCM was between -30.1 and -4.1°C (Fig. 3a). In PWaQ, the species exceeded the threshold when it reached 306.3 mm and was the highest at 866.2 mm (Fig. 3b). The species was most probable where MeWaQ was between 9.9 and 24.1°C and PCQ was between 16.2 and 460.0 mm (Fig. 3c, d).

Suitable habitats during the last glacial maximum and mid-Holocene
The projections for P. koraiensis during the LGM, obtained using both GCMs assessed in the present study (CCSM4 and MIROC), indicated that suitable habitats were distributed on the continental shelf located on the eastern side of eastern China, south of the Korean Peninsula, and on the Japanese Archipelago, excluding most regions of Hokkaido (Fig. 4a, b). However, in the CCSM4, suitable habitats were distributed in the coastal regions of Russian Far East and the southern part of Hokkaido, and these areas were not predicted to be suitable habitats in the MIROC (Fig. 4a, b). The area of suitable habitats over the threshold in the LGM was slightly wider than that LGM (CCSM4) 822,283 417,767 404,515 427,966 232,167 195,799 394,316 185,599 208,716 LGM (MIROC) 1024,271 464,239 560,032 693,473 330,632 362,841 330,797 133,606 197,191 of the present suitable habitats, but the area of habitats with high suitability in the Japanese Archipelago was approximately 4.3 (MIROC) to 6 (CCSM4) times larger than it is in the present (Table 2).
LGM macrofossil records collected from the Japanese Archipelago were obtained from 19 locations in Honshu and the northern part of Kyushu (Fig. 4c, d). Seventeen locations of the macro-fossil records overlapped with the suitable habitats over the threshold in CCSM4, and 16 overlapped with the habitats in MIROC (Table 3). No fossils were found in the areas of southern Hokkaido, the Kii Peninsula, Shikoku, and the southern part of Kyushu, where suitable habitats were projected to have been distributed. During the mid-Holocene, suitable habitats projected under both scenarios were found to approximate the present suitable habitats (Fig. 5), but the areas were narrower than the present suitable habitats (Table 2). In particular, in MIROC, the area of habitats with high suitability (over 0.5) in continental Northeast Asia was only 40,071 km 2 (Table 2).

Discussion
The effect of climate factors on the present distribution of P. koraiensis In previous studies, the relationship between the distribution changes of P. koraiensis and climate change was examined at local scales only (Xu and Yan 2001;Ahn et al. 2015; Belyanin and Belyanina 2019). In contrast, in our study we investigated the distribution shifts of P. koraiensis with post-LGM climate change in the whole of Northeast Asia, using the SDM approach. Our model identified MiCM (i.e., cold temperatures in winter) as being the most important climatic factor contributing to shifts in the distribution of P. koraiensis (Table 1), which is consistent with previous regional SDM results for the species (Ahn et al. 2015). In our results, the suitability of P. koraiensis exceeded the threshold of predicted suitability in the range of MiCM between -30.1 and -4.1°C (Fig. 3a). This climatic range approximately corresponded to the actual latitudinal distribution range of P. koraiensis (Figs. 1, 6e). Thus, MiCM appears to be the main climatic factor that influences the latitudinal distribution of P. koraiensis.
Precipitation appears to have affected the distribution of P. koraiensis, mainly in the more central regions of continental Northeast Asia and part of the Japanese Archipelago. Our response curves showed that the suitability of P. koraiensis was below the threshold at PWaQ 306.3 mm, which is the second most important variable in the percent contribution (Table 1; Fig. 3b). The eastern edge of northeastern China, the coastal areas of Russian Far East, the Korean Peninsula, and the Japanese Archipelago have high year-round precipitation, because these regions are influenced by maritime air masses in the summer, that is, the Asian summer monsoon. However, in the inner regions (western regions) of continental Northeast Asia, which are not influenced by summer monsoons, periods of extreme drought often occur (Li et al. 2015). Our results suggest that P. koraiensis is less likely to grow in the inner regions of continental Northeast Asia with low precipitation. Moreover, the PCQ response curve showed declining habitat suitability where the PCQ exceeded 200 mm (Fig. 3d). This suggests that excessive precipitation or snowfall in the coldest months decreases the suitability of the habitat. This phenomenon appears on the Sea of Japan side of the Japanese Archipelago (Fig. 6t). Excellence of our model performance (the AUC = 0.922 and the CBI = 0.925) shows that the inclusion of only climate variables at this spatial extent and resolution is warranted. Including additional variables such as terrain and soil might only modestly increase model performance.
Distribution shifts of P. koraiensis in continental Northeast Asia from the LGM Our models based on two GCMs for the LGM showed that suitable P. koraiensis habitats retreated to the continental shelf in eastern China, the Korean Peninsula in continental Northeast Asia during the LGM (Fig. 4a, b). This finding supports the phylogeographical study by Aizawa et al. (2012), who speculated that the Korean Peninsula, Changbai Mountains in northeast China, and the Japanese Archipelago played roles as refugia for the species during the LGM. This retreat was considered to have been caused by a decline in MiCM (Fig. 6a, b). Bao et al. (2015) indicated that, in addition to these regions, the Xiaoxingan region was a further LGM refugium. Our models did not identify any areas of suitable habitat in this region during the LGM (Fig. 3a, b), which may have been attributable to the coarse resolution of our model. This region may have served as an area comprising ''microrefugia'', which are locations characterized by complex topographic features that support the existence of populations of species during periods of range contraction due to unfavorable climate conditions (Bao et al. 2015;Hylander et al. 2015). To resolve this issue, although there are no detailed paleo-edaphic data yet,  Values in bold font indicate the predicted suitability over the threshold ([ 0.1468). Italic values indicate the predicted suitability over the high threshold ([ 0.5) biologically relevant paleoenvironmental data such as topography, soil types, and land cover with finer scale resolution will be developed, and higher quality and resolution SDM should be constructed in the future. Our models predicted that the suitable habitat expanded northward in continental Northeast Asia in post-LGM (Fig. 5), which is consistent with previous studies (Potenko and Velikov 1998;Aizawa et al. 2012;Bao et al. 2015;Belyanin and Belyanina 2019). This expansion is assumed to have occurred in response to an increase in MiCM during in this period due to the development of the East Asian summer monsoon after the LGM (Fig. 6c- Kan 2004;Wu et al. 2016). The northern limit of suitable P. koraiensis habitat during the mid-Holocene was considerably farther north than that during the LGM (approximately 1200 km based on the MIROC scenario). It is considered that this northward expansion was facilitated, at least, by the dispersal activities of the Eurasian Nutcracker, which has been estimated to carry the seeds of P. koraiensis over distances of 4 to 5 km (Changhu and Yueqi 1996;Hutchins et al. 1996;Changhu 2002).
Distribution shifts and disjunct distribution formation of P. koraiensis in the Japanese archipelago from the LGM Predicting past suitable habitats using SDM is immensely useful for discussing the past distribution of a target species, but there is uncertainty in this single approach as well as the inherent limitations of climate reconstructions (Svenning et al. 2008(Svenning et al. , 2011. In this regard, assessment of both SDMs and macrofossil records reduces uncertainty of the projection by SDMs and allows for more confidence in the interpretation than based on a single approach alone. Our models based on two GCMs for the LGM, showed that Hatched areas indicate suitable habitats based on model thresholds for Pinus koraiensis under the different climate scenarios P. koraiensis had wider and more continuous distribution in the Japanese Archipelago than the present, and corresponded with the distribution of the species macro-fossils (Fig. 4c, d). These findings suggest that climatic conditions of large areas of the Japanese Archipelago during the LGM, apart from Hokkaido, were suitable for the growth of P. koraiensis.
By comparing the suitable habitats projected using the two GCMs for the LGM, we found that those projected in the CCSM4 scenario were distributed in the southern part of Hokkaido in the Japanese Archipelago, whereas according to the MIROC scenario, there were no habitats suitable for P. koraiensis in this region. The number of overlaps between macrofossil records and suitable habitats was almost the same (Table 3). However, there is no evidence, notably provided by macro-fossil records, to indicate that P. koraiensis was distributed in Hokkaido during the LGM. Accordingly, in this study, we conclude that the MIROC scenario is more realistic than that projected by the CCSM4 scenario for this region.
The area of suitable habitat has become narrower and has caused a disjunct distribution since the mid-Holocene (Table 2; Fig. 5). Furthermore, this finding was consistent with previous studies based on pollen fossil records, indicating that the distribution of coniferous forests, including those of Pinaceae, declined rapidly in northern Honshu from ca. 15 to 10 ka BP (Takahara et al. 2010;Ooi 2016). Our permutation importance and the climate distribution maps indicated that changes in MiCM caused this contraction in the range of suitable P. koraiensis habitats, especially in Honshu of the Japanese Archipelago (Table 1; Fig. 6c-e). Moreover, PCQ (in cool temperate regions indicating the accumulation of snowfall) increased along the Sea of Japan side of Honshu after the Tsushima warm current began to flow into the Sea of Japan ca. 10 ka BP (Fig. 6r-t ;Ohba 2006). Generally, Pinus is relatively susceptible to snow breakage (Jalkanen and Konôpka 1998), and it therefore appears that an increase in PCQ after the LGM influenced the retreat of P. koraiensis on the Sea of Japan side of northern Honshu.
In contrast, in our models, a vast area of empty habitat of the species was found in the southern part of Sakhalin, eastern Hokkaido, and northern Honshu in the present (Fig. 2). This indicates that a reduction in size and fragmentation of suitable habitat after the LGM may have led to the extinction of the species in northern Honshu and hampered its northward shift. Northward migration may also have been prevented by the post-LGM disappearance of the land bridge between Hokkaido and northern Honshu (Tsugaru Strait). Currently, the depth of the Tsugaru Strait is between 130 and 140 m, whereas during the LGM, the sea level is estimated to have fallen by 120-130 m (Fairbanks 1989;Ohshima 1990;Matsui et al. 1998;Stanford et al 2011). Even if a land bridge was formed in the Tsugaru Strait during the LGM, it is considered to have existed for only a relatively short period (Yashima and Miyauchi 1990;Ohba 2006;Stanford et al. 2011). Since the end of the LGM (ca. 17 ka BP), there has been an increase in the rate of sea level rise in the Oyashio Current flowing in the Sea of Japan across the Tsugaru Strait between the Sea of Japan side of northern Honshu and southwestern Hokkaido (Ohba 2006;Isobe 2020). However, it is estimated that there was no substantial rise in temperature until at least ca. 14.6 ka BP (Cuffey and Clow 1997), at which time, the Tsugaru Strait appears to have already been formed. Hence, even if the climate of Hokkaido became suitable for the growth of P. koraiensis after the LGM, the species would not have been able to migrate from Honshu to Hokkaido.

Conclusion
In this study, we modeled the current suitable habitat of P. koraiensis in Northeast Asia and projected suitable habitats under the LGM and mid-Holocene scenarios. In addition, we verified the accuracy of our model by overlapping the predicted habitats with macro-fossil records for the LGM. We conclude that the MiCM has been a major climatic factor influencing the shift in the distribution and establishment of the disjunct distribution of P. koraiensis. It can be speculated that fluctuations in this temperature influenced the shift in the distribution of many other cool and cold temperate species in Northeast Asia during the Quaternary period (Tsuyama et al. 2014;Shitara et al. 2018;Tang et al. 2018). Additionally, our results indicate that there is currently an empty habitat in the Japanese Archipelago and Sakhalin, which has remained uncolonized. Both climate change post LGM and geological events such as strait and land bridge formations would have influenced the establishment of empty habitats and the current disjunct distribution of P. koraiensis.