1 Introduction

Dynamic downscaling with a regional ocean-climate model (RCM) is a powerful tool that provides regional climate information at higher spatial and temporal resolutions than those of global reanalyses and climate models, and it is particularly useful for resolving processes related to complex coastal terrain (e.g., Seo et al. 2014; Teufel et al. 2017; Zhang et al. 2020; Oh and Sushama 2021). Considerable work has been conducted to improve RCM-based climate information by developing more sophisticated dynamical and physical schemes (e.g., Giorgi et al. 2012; Lindstedt et al. 2015; Komkoua Mbienda et al. 2021). However, it is paramount to prescribe high-quality initial and lateral boundary conditions in the RCM (e.g., Suh and Oh 2015; Rocheta et al. 2020) because global reanalysis and climate model simulations are used as the initial and lateral boundary forcings (e.g., Eyring et al. 2016; O’Neill et al. 2016). Therefore, to provide a reliable RCM-based future climate scenario associated with global warming, a highly reliable global climate model simulation is required as the driving forcing.

Significant efforts have been made to improve global climate model simulations. In the 1990s, the World Climate Research Programme (WCRP) promoted a set of experiments known as the Coupled Model Intercomparison Project (CMIP), with the aim of better understanding past climate changes and for making projections and uncertainty estimates about the future (e.g., Meehl et al. 2000; Annan and Hargreaves 2011; Taylor et al. 2012). The sixth phase of CMIP (CMIP6; Eyring et al. 2016), which has recently progressed, features updates to the parameterization schemes and the addition of new physical processes. It also has a somewhat higher resolution than CMIP5, which advanced our understanding of regionally heterogeneous climate warming (e.g., Taylor et al. 2012; Sun et al. 2020). CMIP6 also contains the Scenario Model Intercomparison Project (ScenarioMIP; O’Neill et al. 2016; O’Neill et al. 2020), which produces projections for new sets of emissions and land use scenarios based on Shared Socioeconomic Pathways (SSPs; Riahi et al. 2017). In this respect, the CMIP6 models provide the opportunity to investigate the climate system and perform dynamic downscaling under new scenarios.

Numerous studies have examined the performance of CMIP6 models on global and regional scales (e.g., Eyring et al. 2016; Kim et al. 2020; Lee et al. 2021; Planton et al. 2021; Tang et al. 2021; Xie et al. 2022). CMIP6 models have been reported to realistically reproduce mean and extreme climates compared to observations, and their performances have improved compared to those of previous phase CMIP models (e.g., Kim et al. 2020; Xie et al. 2022; Fan et al. 2022). These model evaluation studies have been mainly conducted in the atmospheric fields connected with the international regional-atmosphere climate model project known as the COordinated Regional climate Downscaling EXperiment (CORDEX; Oh et al. 2014; Torres-Alavez et al. 2021), and model performances have often been evaluated based on habitable land areas rather than the ocean region (e.g., Kim et al. 2020; Xie et al. 2022; Fan et al. 2022).

However, the ocean covers approximately 71% of the Earth’s surface, and it plays a key role in controlling climate change and is a very efficient carbon sink that absorbs 23% of CO2 emissions (e.g., Dobush et al. 2022). It is well known that North Pacific (NP) climate variabilities, such as the Pacific Decadal Oscillation (PDO), are closely linked to the climate over East Asia and North America through large-scale circulation changes (e.g., Nishikawa et al. 2021). The current global climate change trend will result in substantial oceanographic warming over the NP ocean by the end of the twenty-first century (e.g., IPCC 2014; Alexander et al. 2018), and NP coastal communities are facing challenges in establishing plans to mitigate the impact of climate change on their socioeconomic activities. In addition, the Northwest Pacific (NWP) is characterized by a complex local circulation and large variability, and it is mainly influenced by major ocean currents (such as the Kuroshio Current, Tsushima Current, East Korean Warm Current, and Yellow Sea Warm Current) (e.g., Seo et al. 2014). Unfortunately, the ability of the CMIP6 models to reproduce these complex climate variabilities in the NWP ocean is limited owing to their coarse horizontal resolution (Giorgi et al. 2012; Teufel et al. 2017; Zhang et al. 2020), and they thus require dynamic downscaling in this region. Therefore, it is necessary to first quantitatively evaluate how the available CMIP6 models perform with respect to dynamic downscaling in the NP and NWP regions.

The present study aims to thoroughly quantify the performance of 17 CMIP6 models over the NP and NWP oceans for 1979–2014. A particular emphasis is placed on evaluating the models for annual mean climatology using the quantified Performance Score (PS). In addition, the long-term trends of ocean temperature for the historical period are quantified and compared, because it is important to conduct accurate simulations of the substantial oceanographic warming that is predicted to occur in the future climate. This study lays the foundation for conducting dynamic downscaling over the NP and NWP oceans in modeling historical and future climate scenarios. As such, this study provides baseline information for selecting the optimal global climate models for dynamic downscaling in these regions to apply in RCM simulations.

The remainder of this paper is organized as follows. Section 2 briefly describes the data and the methods used in this study. Section 3 presents the quantified model performance based on physical quantities and long-term ocean temperature trends. A summary and discussion are then presented in Sect. 4.

2 Data and Methods

2.1 Data

The 17 CMIP6 models currently provide all the boundary variables required for running the RCM from the Earth System Grid Federation (ESGF) website at https://esgf-node.llnl.gov/search/cmip6/ (see Table 1), and these required boundary variables are summarized in Table 2. This study uses monthly mean sea surface temperature, near-surface air temperature, precipitation, and near-surface eastward and northward components of wind data from 17 CMIP6 models for the period 1979–2014 (Table 1). The variables above correspond to the surface boundary forcing used to drive the RCM. Note that the CMIP6 simulations used in this study are from historical experiments of atmosphere–ocean coupled models.

Table 1 The CMIP6 models used in this study
Table 2 Summary of surface and lateral boundary variables required to drive regional ocean climate model (RCM). Note that sea surface temperature (tos) is identical to the first layer of the sea water potential temperature (thetao) in most CMIP6 models

The ocean heat content in the upper 2000 m represents the energy absorbed by the ocean through the surface and lateral boundaries, and it is evaluated using the equation for the ocean heat content as follows,

$$\text{Ocean heat content}\left(x,y\right)={\int }_{-2000}^{0}{C}_{p0}{\rho }_{0}Tdz$$
(1)

where \({C}_{p0}\) is the seawater heat capacity, as defined by IOC et al. (2015); \({\rho }_{0}\) is the reference density calculated by the first-year annual mean of temperature and salinity; and \(T\) is the Conservative Temperature. An increasing ocean heat content indicates ocean warming, which varies regionally (Fox-Kemper et al. 2021; Garcia-Soto et al. 2021). The distribution of the ocean heat content and its trend need to be considered in the evaluation, especially when conducting dynamic downscaling, because the long-term regional heat content trend is influenced mainly by advective heat flux (Tian et al. 2016).
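As an illustration of Eq. (1), the following is a minimal sketch of the vertical integration for a single water column, not the authors' implementation. The function name, the trapezoidal rule, and the constant `rho0=1025.0` are assumptions made for this example (the study derives the reference density from the first-year annual mean temperature and salinity); `cp0` is set to the TEOS-10 value of approximately 3991.868 J kg⁻¹ K⁻¹.

```python
def ocean_heat_content(depths_m, temps_degc, cp0=3991.868, rho0=1025.0,
                       max_depth=2000.0):
    """Approximate Eq. (1) for one column: integrate cp0 * rho0 * T over depth.

    depths_m   : increasing positive depths of the temperature levels (m)
    temps_degc : temperature at each level (deg C)
    rho0       : reference density (kg/m^3); a constant here is an assumption
    Returns the heat content in J/m^2 between the first level and max_depth.
    """
    integral = 0.0
    for k in range(len(depths_m) - 1):
        z_top, z_bot = depths_m[k], depths_m[k + 1]
        if z_top >= max_depth:
            break
        t_top, t_bot = temps_degc[k], temps_degc[k + 1]
        if z_bot > max_depth:
            # Interpolate the temperature linearly to the integration limit.
            frac = (max_depth - z_top) / (z_bot - z_top)
            t_bot = t_top + frac * (t_bot - t_top)
            z_bot = max_depth
        # Trapezoidal rule for this layer.
        integral += 0.5 * (t_top + t_bot) * (z_bot - z_top)
    return cp0 * rho0 * integral
```

Note that the sketch starts integrating at the first available level, so a dataset whose shallowest level is at 1 m omits the top metre; applying it at every horizontal grid point would give the field OHC(x, y).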

Although seawater salinity, X and Y velocities, and the sea surface height above the geoid are also required to drive an RCM, this study focuses on evaluating the physical quantity of ocean temperature and its long-term trends in the context of global warming. In addition, although some CMIP6 models provide large ensembles of up to approximately 31 members, this study analyzes the performance of a single member of each model to conduct a fair comparison, and the first member, r1i1p1f1, is typically used.

The 5th generation of the European Centre for Medium-Range Weather Forecasts global reanalysis (ERA5; Hersbach et al. 2020) data for the satellite-era historical period (1979–2014) is used as the reference data to evaluate the surface boundary variables of the CMIP6 models. This reanalysis has a higher spatial (30 km) and temporal (hourly) resolution than previous reanalysis datasets (e.g., ERA-Interim; Dee et al. 2011; NCEP/DOE reanalysis; Kalnay et al. 2002). This implies that ERA5 data are potentially more suitable for evaluating regional climate variability. Numerous studies have used ERA5 data as a reference for the reproducibility of global climate models with near-surface variables (e.g., Kim et al. 2020; Li et al. 2021; Oh and Sushama 2021). To evaluate the estimated ocean heat content and vertical profile of seawater potential temperature, the Institute of Atmospheric Physics (IAP) ocean temperature data for the period 1979–2014 are used (e.g., Cheng et al. 2017). The IAP data have a 1° × 1° horizontal resolution, a monthly temporal resolution, and 41 vertical levels from 1 to 2000 m.

As shown in Table 1, the horizontal resolutions of the CMIP6 models differ from one another. To summarize the multi-model ensemble statistics and conduct a one-to-one model comparison, we interpolated the surface variables of all models into a common 1° × 1° grid using bilinear remapping. Similarly, the surface variables of ERA5 data at a resolution of approximately 30 km are interpolated into a 1° × 1° grid and then used in the evaluation. Furthermore, the three-dimensional ocean temperature is vertically interpolated into standard depth levels from the World Ocean Atlas (WOA; Boyer et al. 2018) and then horizontally interpolated into a common 1° × 1° grid.
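The horizontal remapping step can be sketched for a single target point as follows. This is an illustrative, pure-Python bilinear interpolation on a regular latitude–longitude grid; the function names are hypothetical, and the sketch ignores longitude wrap-around, land masks, and curvilinear ocean grids, all of which an operational remapping tool would need to handle.

```python
from bisect import bisect_right


def _cell_and_weight(axis, x):
    """Return the index of the cell containing x and the fractional position in it."""
    if not axis[0] <= x <= axis[-1]:
        raise ValueError("target coordinate lies outside the source grid")
    k = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
    weight = (x - axis[k]) / (axis[k + 1] - axis[k])
    return k, weight


def bilinear(field, lats, lons, lat, lon):
    """Bilinearly interpolate field[j][i] (at lats[j], lons[i]) to (lat, lon).

    lats and lons must be strictly increasing.
    """
    j, wy = _cell_and_weight(lats, lat)
    i, wx = _cell_and_weight(lons, lon)
    # Interpolate along longitude on the two bounding latitude rows ...
    south = field[j][i] * (1 - wx) + field[j][i + 1] * wx
    north = field[j + 1][i] * (1 - wx) + field[j + 1][i + 1] * wx
    # ... then along latitude between the two rows.
    return south * (1 - wy) + north * wy
```

Evaluating `bilinear` at every node of the common 1° × 1° grid would produce the remapped field for one model and one time step.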

2.2 Evaluation Matrix

The model evaluations conducted in this study mainly focus on model performance in the NP (Lat.: -20° to 65°, Lon.: 98° to 284°) and NWP (Lat.: 15° to 58°, Lon.: 113° to 165°) oceans (see Fig. 1) in terms of the physical quantities and long-term trends of annual mean climatology. The performance of each model is evaluated using the root-mean-square difference (RMSD) and Taylor skill score (TSS; Taylor 2001), and their respective equations are as follows,

Fig. 1 Spatial distribution of annual mean ocean heat contents in the upper 2000 m (ohc), sea surface temperature (tos), near-surface air temperature (tas), precipitation (pr), and near-surface eastward (uas) and northward (vas) components of wind over the North Pacific (NP) ocean (Lat.: -20° to 65°, Lon.: 98° to 284°) for the period 1979–2014. The boxed area in each sub-plot indicates the Northwest Pacific (NWP) ocean (Lat.: 15° to 58°, Lon.: 113° to 165°). The left and right panels indicate the references (IAP data for “ohc” and ERA5 for the others) and CMIP6 multi-model ensemble (Ens.), respectively

$$RMSD=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left({M}_{i}-{R}_{i}\right)}^{2}}$$
(2)
$$TSS=\frac{{(1+Correlation)}^{4}}{4\left(SDR+\frac{1}{SDR}\right)^{2}}$$
(3)

In Eq. (2), n is the total number of ocean grid points in the analysis domain, and \({M}_{i}\) and \({R}_{i}\) denote the model and reference values at the ith grid point, respectively. In Eq. (3), SDR is the ratio of the spatial standard deviation of the model to that of the reference data, and the Correlation is calculated as follows,

$$Correlation=\frac{\sum_{i=1}^{n}({M}_{i}-\overline{M })({R}_{i}-\overline{R })}{\sqrt{\sum_{i=1}^{n}{({M}_{i}-\overline{M })}^{2}}\sqrt{\sum_{i=1}^{n}{({R}_{i}-\overline{R })}^{2}}}$$
(4)

where \(\overline{M }\) and \(\overline{R }\) represent the mean values of the model and the reference for the ocean areas in the analysis domain, respectively. The TSS quantifies the similarity between the model and reference data with respect to the distribution and amplitude of the spatial pattern. The relative RMSD and TSS values of each model are calculated based on their median values, and the combined metric of the relative RMSD and TSS is used to visually separate the CMIP6 models into superior and inferior model groups.
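To make Eqs. (2)–(4) concrete, the following sketch computes the RMSD, the spatial correlation, and the TSS for two flattened lists of ocean grid values. The function names are illustrative, and the statistics are unweighted, as in the equations as written; whether the study applies any area weighting is not stated, so none is assumed here.

```python
import math


def rmsd(model, ref):
    """Root-mean-square difference, Eq. (2)."""
    n = len(model)
    return math.sqrt(sum((m - r) ** 2 for m, r in zip(model, ref)) / n)


def _sum_sq_dev(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def correlation(model, ref):
    """Spatial (Pearson) correlation, Eq. (4)."""
    m_bar = sum(model) / len(model)
    r_bar = sum(ref) / len(ref)
    cov = sum((m - m_bar) * (r - r_bar) for m, r in zip(model, ref))
    return cov / math.sqrt(_sum_sq_dev(model) * _sum_sq_dev(ref))


def taylor_skill_score(model, ref):
    """Taylor skill score, Eq. (3); equals 1 for a perfect pattern match."""
    corr = correlation(model, ref)
    # Ratio of spatial standard deviations (model / reference).
    sdr = math.sqrt(_sum_sq_dev(model) / _sum_sq_dev(ref))
    return (1 + corr) ** 4 / (4 * (sdr + 1 / sdr) ** 2)
```

A field that differs from the reference only by a constant offset has a TSS of 1 but a non-zero RMSD, which illustrates why the two measures are used together.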

Model performance depends on the variables, the analysis regions, and the evaluation measures used (e.g., Taylor 2001; Eyring et al. 2016; Lee et al. 2021). The evaluation measure is also important to make a fair and standard comparison (Taylor 2001). In this study, we quantify the performance of models using the PS based on the normalized RMSD and TSS for the analyzed regions. The PS equation is as follows,

$$PS=\frac{1}{Nvar}{\sum }_{i=1}^{Nvar}\left(\frac{(1-{NRMSD}_{i})+{NTSS}_{i}}{2}\right)\times 100$$
(5)

where Nvar is the number of evaluation variables (i.e., the sea surface temperature, near-surface air temperature, precipitation, near-surface eastward and northward components of wind, and ocean heat content), and NRMSD and NTSS are the RMSD and TSS normalized by the Min–Max Normalization method, which linearly transforms the original range of the data while maintaining the relationships among the original data (e.g., Patro and Sahu 2015). All of the scaled evaluation measures range from 0 to 1. The perfect values of NRMSD and NTSS are 0 and 1, respectively. In the numerator in the parentheses of Eq. (5), subtracting NRMSD from 1 converts its perfect value to 1, as for NTSS. The sum of the two scaled measures is then divided by 2 so that its maximum value is 1. Subsequently, we calculate an equal-weighted average over all variables in each analysis region. To aid interpretation, these values are multiplied by 100, giving PS values for each model that range from 0 to 100. The PS value has the advantage that it reflects overall model performance, as it is based on the RMSD and TSS of all variables considered in the evaluation. Therefore, PS is used to quantify the model performance over the analysis region.
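The PS calculation in Eq. (5) can be sketched as follows. The sketch assumes that the Min–Max normalization is applied across the models separately for each variable (and region); the text does not spell out the normalization set, so this is an assumption, and the function and argument names are hypothetical.

```python
def min_max(values):
    """Min-Max normalization: linearly rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; normalization is undefined")
    return [(v - lo) / (hi - lo) for v in values]


def performance_scores(rmsd_by_var, tss_by_var):
    """Compute PS (Eq. 5) for every model.

    rmsd_by_var / tss_by_var map a variable name to a list with one value
    per model, using the same model order for every variable.
    Assumption: normalization is done across models, per variable.
    """
    variables = list(rmsd_by_var)
    n_models = len(rmsd_by_var[variables[0]])
    totals = [0.0] * n_models
    for var in variables:
        nrmsd = min_max(rmsd_by_var[var])
        ntss = min_max(tss_by_var[var])
        for k in range(n_models):
            # (1 - NRMSD) makes 1 the perfect value for both measures.
            totals[k] += ((1 - nrmsd[k]) + ntss[k]) / 2
    # Equal-weighted average over variables, scaled to 0-100.
    return [100 * total / len(variables) for total in totals]
```

Under this assumption, a model scores 100 only if it has both the lowest RMSD and the highest TSS among the models for every variable.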

3 Results

3.1 Evaluation of Physical Quantity for Surface and Lateral Boundary Forcing Variables

Figure 1 shows the spatial distribution of the annual mean ocean heat content, sea surface temperature, near-surface air temperature, precipitation, and near-surface eastward and northward components of wind from the references and the CMIP6 multi-model ensemble over the NP ocean for the period 1979–2014. The NWP ocean is presented in each subplot as a boxed area. Compared with the references, the CMIP6 multi-model ensemble reasonably reproduces the spatial distributions of the surface and lateral boundary forcing variables. That is, the latitudinal gradients of ocean heat content and the related temperatures, the heavy precipitation in the intertropical convergence zone (ITCZ) and East Asian monsoon region, and the location and strength of the trade winds and westerlies are realistically reproduced, with spatial correlations of 0.85–0.99. Regionally, the simulated ocean heat contents have warm biases of +0.4 to +2.0 × 10¹⁰ J/m² in the NP ocean compared to the reference, with particularly large warm biases in the NWP ocean. Both sea surface and near-surface air temperatures tend to be slightly overestimated by approximately +0.5 to +1.5 °C in the ITCZ, whereas opposite biases of a similar magnitude are found in the NWP ocean. For precipitation, relatively large biases ranging from -3 to +3 mm/day are found in the ITCZ, and relatively small biases ranging from -0.5 to +0.5 mm/day are found in the NWP ocean. Similarly, the simulated trade winds are relatively weak, whereas the strength of the westerlies is well simulated compared with the reference. These results are consistent with those of previous studies showing that CMIP multi-model ensembles have positive or negative regional biases in the surface and lateral boundary variables for the historical climate but reasonably reproduce their spatial distribution (e.g., Kim et al. 2020; Zhang et al. 2020; Fan et al. 2022).

Figure 2 shows the box and whisker plots of the RMSD for the surface and lateral boundary forcing variables simulated by the CMIP6 models in the NP and NWP oceans. The RMSD ranges from 0.5 to 2.5, depending on the variable and region analyzed. In the NP ocean, the model spread is relatively larger for the ocean heat content than for the other variables. In addition, relatively lower performances are found for precipitation due to the large biases in the ITCZ precipitation zone (Figs. 1g and h). In the NWP ocean, however, the simulated temperature performance is lower and the model spread is larger than for the other variables. The model resolution plays a critical role in simulating local temperature, particularly in the NWP ocean, which has complex coastal terrain, and this can result in large between-model diversity (e.g., Nishikawa et al. 2021; Oh and Sushama 2021). The results obtained here imply that the simulated temperature performance needs to be adequately considered when selecting CMIP6 models for dynamic downscaling in this region.

Fig. 2 Box and whisker plot of the root-mean-square difference (RMSD) for the climatologies of ocean heat contents in the upper 2000 m (ohc, ×10¹⁰ J/m²), sea surface temperature (tos, °C), near-surface air temperature (tas, °C), precipitation (pr, mm/day), and near-surface eastward (uas, m/s) and northward (vas, m/s) components of wind simulated by the 17 CMIP6 models in the a) North Pacific (NP) and b) Northwest Pacific (NWP) oceans for the period 1979–2014. Only the ocean grid is used to compute the RMSD, and the IAP for ohc and ERA5 for other variables are used as a reference. A six-number summary of the box plot is also shown; minimum score (Min.), 25th percentile (Q1), median (M), 75th percentile (Q3), maximum score (Max.), and outliers (circle). The median value is also presented in each sub-box plot

The relative RMSDs of the CMIP6 models for the surface and lateral boundary variables are compared in Fig. 3 using the median values of each sub-box plot shown in Fig. 2. In general, the model performance depends on the variable and region analyzed. For instance, the CMCC model series shows a relatively better performance for temperatures than for the other variables, whereas the ACCESS model series shows a relatively good performance for the near-surface eastward component of wind, and the UKESM1-0-LL model shows a relatively better performance in the NP ocean but a lower performance in the NWP ocean. Overall, the EC-Earth3 and CNRM model series perform relatively better in the NP and NWP oceans than the other model series.
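One way to express an error measure relative to the multi-model median is sketched below. The exact formula used for the relative RMSD is not given in the text; the form (value − median) / median × 100, which is common in multi-model portrait diagrams, is assumed here, and the function name is hypothetical.

```python
from statistics import median


def relative_to_median(values):
    """Express each model's value as a percentage deviation from the multi-model median.

    Assumed form: 100 * (value - median) / median. For RMSD, negative
    results then indicate a model that performs better than the median model.
    """
    med = median(values)
    return [100 * (v - med) / med for v in values]
```

For example, RMSDs of 1.0, 2.0, and 4.0 for three models would map to -50 %, 0 %, and +100 % under this definition.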

Fig. 3 Diagram of relative root-mean-square differences (RMSDs) for the North Pacific (NP, Lat.: -20° to 65°, Lon.: 98° to 284°) and Northwest Pacific (NWP, Lat.: 15° to 58°, Lon.: 113° to 165°) oceans in the 1979–2014 climatologies of ocean heat contents in the upper 2000 m (ohc, ×10¹⁰ J/m²), sea surface temperature (tos, °C), near-surface air temperature (tas, °C), precipitation (pr, mm/day), and near-surface eastward (uas, m/s) and northward (vas, m/s) components of wind simulated by the 17 CMIP6 models. Only the ocean grid is used to compute the RMSD, and the IAP for ohc and ERA5 for other variables are used as a reference

A scatter plot based on the combined metric of the relative RMSD and TSS is shown in Fig. 4, and the model performance is classified linearly for all variables and analysis regions. The results indicate that the combined use of RMSD and TSS enables appropriate classification of the models’ performances. This result shows that for temperature-related variables, the model performance for the magnitude of a physical quantity is of relatively greater importance for determining the driving model, as all CMIP6 models reproduce their spatial pattern well. For other variables, the model performance for spatial patterns is also important in determining the driving model.

Fig. 4 Scatter plot using the relative root-mean-square difference (RMSD, y-axis, unit: %) and Taylor Skill Score (TSS, x-axis, unit: %) matrix for (a–f) North Pacific (NP) and (g–l) Northwest Pacific (NWP) oceans in the 1979–2014 climatologies of ocean heat contents in the upper 2000 m (ohc), sea surface temperature (tos), near-surface air temperature (tas), precipitation (pr), and near-surface eastward (uas) and northward (vas) components of wind simulated by the CMIP6 models. Only the ocean grid is used to compute the RMSD and TSS, and the IAP for “ohc” and ERA5 for other variables are used as a reference. Note that the y-axis for relative RMSD is upside down; therefore, the closer the circle to the upper right, the better the performance

Table 3 summarizes the performance of the 17 CMIP6 models based on the PS value calculated from the normalized RMSD and TSS (see Sect. 2.2). In the NP ocean, the EC-Earth3-Veg model provides the best performance, with the highest PS value of 85.89 among the 17 CMIP6 models. The second- and third-best models are the EC-Earth3 and EC-Earth3-Veg-LR models, with PS values of 84.97 and 84.22, respectively. These results show that the EC-Earth3 model series is a good choice for dynamic downscaling in the NP ocean. However, in the NWP ocean, the CNRM-CM6-1 and CNRM-ESM2-1 models are the best and second-best models, with PS values of 76.97 and 76.69, respectively. The MRI-ESM2-0 and EC-Earth3-Veg models are also appropriate for this ocean, with PS values of 75.77 and 75.67, respectively. This result implies that different global climate models can be recommended as the driving forcing for dynamic downscaling depending on the area analyzed.

Table 3 Performances of the 17 CMIP6 models for the North Pacific (NP) and Northwest Pacific (NWP) oceans. PS is the performance score calculated using a combination of the normalized root-mean-square difference (RMSD) and Taylor skill score (TSS). All oceanic and atmospheric variables were used to calculate the PS

The sensitivity of the models’ performances to the variables used to compute the PS value is shown in Fig. 5. In general, the PS values of the EC-Earth3 model series are higher, and their spread is smaller, than those of the other models for both the NP and NWP oceans (Figs. 5a and b). This indicates that the performance of this model series is relatively insensitive to the variables used and is consistently superior. In contrast, the PS values of the CMCC model series are widely spread according to the variables used. For example, when only temperature variables are considered in computing the PS value, the CMCC model series performs well compared to the other CMIP6 models, but its performance is dramatically reduced when other boundary-forcing variables are included. The quantified model performance as a function of the variables considered is summarized in Tables 4 and 5 for the NP and NWP oceans, respectively. Overall, the EC-Earth3 model series performs well in most sensitivity tests within the NP ocean. When considering only wind, the UKESM1-0-LL and ACCESS-ESM1-5 models could be good choices for dynamic downscaling in this ocean.

Fig. 5 The sensitivity of the 17 CMIP6 models’ performances as a function of the variables used to compute the performance score (PS) for North Pacific (NP) and Northwest Pacific (NWP) oceans. The spread of PS for each model is also presented

Table 4 The performances of 17 CMIP6 models for the North Pacific (NP) ocean as a function of the variables used to compute the performance score (PS)
Table 5 Same as Table 4 except for the Northwest Pacific (NWP) ocean

In the NWP ocean, the models’ performances show a significant dependency on the variable considered. The results show that the MRI-ESM2-0 and CNRM-CM6-1 models are good choices for dynamic downscaling in this ocean considering atmospheric and oceanic variables, respectively, but the CMCC-ESM2 model is the best choice when considering only temperatures for dynamic downscaling in this ocean, and the ACCESS-ESM1-5 model may be the best choice when only wind is considered. These results therefore indicate that the selection of the CMIP6 model for use in dynamic downscaling in the NWP ocean will vary depending on the goal of the study conducted.

3.2 Evaluation of Long-Term Ocean Temperature Trend

In the context of continued future oceanographic warming, it is critical that the long-term ocean temperature and the related heat content trends simulated by the CMIP6 models for the historical period are accurate, because the trend simulated by the RCM is likely to follow that of its driving forcing. In this subsection, we compare the ocean heat content, sea surface temperature, and near-surface air temperature trends between the 17 CMIP6 models and the reference.

Figure 6 shows the spatial distribution of the long-term annual mean near-surface-air temperature, sea surface temperature, and ocean heat content trends from the references and the CMIP6 multi-model ensemble over the NP ocean for the period 1979–2014. The western Pacific warming and eastern Pacific cooling for all three variables are observed in the references (Figs. 6a, c, and e). This result is consistent with those of previous studies using different observation datasets (Maher et al. 2018; Li et al. 2019). Spatially opposite temperature trends are closely associated with changes in circulation. For example, Maher et al. (2018) found a strengthening of the equatorial undercurrent in response to strengthened winds, which brought cooler water to the surface of the eastern Pacific and an increase in the shallow Pacific overturning cells, thereby resulting in the input of additional heat into the subsurface western Pacific. In addition, wind acceleration increases the subsequent transport of heat toward the western Pacific. This strengthening of the wind circulation can be primarily attributed to the cold tongue mode rather than the impact of the El Niño-Southern Oscillation (Li et al. 2019).

Fig. 6 Same as Fig. 1, but for the annual mean near-surface air temperature (tas), sea surface temperature (tos), and ocean heat content (ohc) trends for the period 1979–2014

The CMIP6 multi-model ensemble fails to capture the eastern Pacific cooling in all three variables. It instead shows warming trends throughout the NP ocean, with the exception of slight cooling in the ocean heat content around the Philippine Sea. These warming trends are larger at high latitudes and in the near-surface air temperature than at low latitudes and in the ocean temperatures. The failure of the CMIP6 multi-model ensemble to capture the eastern Pacific cooling could result from its failure to reproduce the observed strengthening of the trade winds, which is known to have brought cooler water to the surface of the eastern Pacific during the historical period (not shown). This limitation of the ability of CMIP models to reproduce the spatial pattern of long-term trends, especially those of precipitation and wind variables, has been reported in previous studies (e.g., Lee et al. 2019; Vicente-Serrano et al. 2022). However, Lee et al. (2019) reported that CMIP models can reproduce regionally averaged temperature trends, depending on the region analyzed.

The time series of the observed and simulated regionally averaged annual mean near-surface air temperature, sea surface temperature, and ocean heat content over the NP and NWP oceans for the period 1979–2014 are shown in Fig. 7. Their long-term trends are also summarized in Table 6. With respect to the regional averages, all three variables in the reference gradually increased over the 36-year period in the NP and NWP oceans (see the thick black line in Fig. 7), and these warming trends (i.e., 0.47–0.65 W m⁻², 1.66–3.88 °C per century) were statistically significant (Table 6). Stronger warming trends in the near-surface air temperature (i.e., 2.16–3.88 °C per century), compared to the sea surface temperature (i.e., 1.66–2.47 °C per century), are apparent in both the NP and NWP oceans. In addition, the NWP ocean tends to have more robust warming trends (i.e., 2.47–3.88 °C per century) than the NP ocean (i.e., 1.66–2.16 °C per century).

Fig. 7 Time series of the regionally averaged annual mean near-surface air temperature (tas), sea surface temperature (tos), and ocean heat contents (ohc) over the North Pacific (NP) and Northwest Pacific (NWP) oceans for the period 1979–2014

Table 6 Summary of observed and simulated regionally averaged annual mean ocean heat content (ohc), sea surface temperature (tos), and near-surface air temperature (tas) trends over the North Pacific (NP) and Northwest Pacific (NWP) oceans for the period of 1979–2014. Note that the “ohc” trend unit represents the change rate in the total heat energy contained in a unit area
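The trend units used above can be illustrated with a short sketch. It assumes an ordinary least-squares linear fit to annual means, which is a common choice but is not stated explicitly in the text, and it omits the significance test; the function names and the 365.25-day year are assumptions made for this example.

```python
SECONDS_PER_YEAR = 365.25 * 86400.0  # assumed mean year length


def linear_trend(years, values):
    """Ordinary least-squares slope of values against years (units per year)."""
    n = len(years)
    x_bar = sum(years) / n
    y_bar = sum(values) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
    sxx = sum((x - x_bar) ** 2 for x in years)
    return sxy / sxx


def temperature_trend_per_century(years, temps_degc):
    """Temperature trend in deg C per century."""
    return 100.0 * linear_trend(years, temps_degc)


def ohc_trend_w_per_m2(years, ohc_j_per_m2):
    """OHC trend as a heat flux: (J/m^2 per year) / (s per year) = W/m^2."""
    return linear_trend(years, ohc_j_per_m2) / SECONDS_PER_YEAR
```

Dividing the heat content trend by the number of seconds in a year is what turns a change in stored energy per unit area into the equivalent mean heating rate in W m⁻².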

Overall, the CMIP6 multi-model ensemble reasonably captures the regionally averaged physical quantities of the warming trends of the three variables over time compared to the reference (see the thick purple line in Fig. 7). In addition, it reproduces the characteristics of the relative magnitude of the observed warming trends according to the variables and regions analyzed. However, it shows more substantial warming (1.5–2.0 times) than the reference, except for the ocean heat content in the NP ocean (see the second row in Table 6).

Most of the CMIP6 models reproduce the observed warming trends well, although their magnitudes differ. As shown in Fig. 7f, there is significant diversity in the ability of CMIP6 models to simulate the ocean heat content compared to the other variables, and this could be attributable to the model resolution. The NWP region contains complex coastal areas, which may significantly affect simulations of the seawater potential temperature in deep layers because the prescribed seabed topography differs depending on the model resolution (e.g., de la Vara et al. 2020).

The relative errors of the long-term trends simulated by the 17 CMIP6 models compared to the reference are further examined in Fig. 8. The blue color indicates that the warming trend of the CMIP6 model is smaller than that in the reference. Note that the EC-Earth3-Veg-LR model shows a negative trend in ocean heat content in the NWP ocean (see Table 6). The relative errors, even in a single model, depend on the variable and the region analyzed. For example, most of the CMIP6 models underestimate warming in the ocean heat content in the NP ocean, but overestimate warming in the other variables in this ocean. Some models, such as the MPI-ESM1-2-HR and MRI-ESM2-0 models, underestimate the warming trend in the ocean heat content in the NP ocean but overestimate it in the NWP ocean.

Fig. 8 Relative errors in the ocean heat content (ohc, W m⁻²), sea surface temperature (tos, °C per century), and near-surface air temperature (tas, °C per century) trends simulated by the 17 CMIP6 models over the North Pacific (NP) and Northwest Pacific (NWP) oceans for the period 1979–2014, compared to the reference. The blue color indicates that the warming trend of the CMIP6 model is smaller than that in the reference. Note that EC-Earth3-Veg-LR shows a negative “ohc” trend over the NWP

When considering the warming levels of the three variables, the EC-Earth3 and UKESM1-0-LL models show the largest deviations from the warming levels in the reference for both the NP and NWP oceans. In contrast, the MPI-ESM1-2-HR and MRI-ESM2-0 models show relatively smaller deviations from the warming levels of the reference over these regions. It is of note that the MRI-ESM2-0 model also provides a good physical quantity performance in the NWP ocean (see Table 3), and this model could thus be a good choice for dynamic downscaling in the NWP ocean. However, the result suggests that a better performance in terms of physical quantity does not necessarily translate into a better performance in terms of the trend. For instance, the EC-Earth3-Veg and CNRM-CM6-1 models, which respectively provide the highest-performing physical quantities based on PS values in the NP and NWP oceans (see Table 3), show moderate performance with respect to their warming trends. This result implies that it is necessary to carefully consider various factors when selecting a CMIP6 model as the driving forcing in dynamic downscaling.

4 Summary and Discussion

This study quantitatively evaluates the performance of CMIP6 models as driving forcing for dynamic downscaling in the NP and NWP oceans in terms of their abilities to reproduce physical quantities and long-term trends. The 17 CMIP6 models evaluated are those for which all the surface and lateral boundary variables required for running the RCM were available from the ESGF website as of April 2022 (see Tables 1 and 2). The driving variables, i.e., the ocean heat content, sea surface temperature, near-surface air temperature, precipitation, and near-surface eastward and northward components of wind, are compared to the ERA5 and IAP data for the period 1979–2014. A particular emphasis is placed on the model performance for annual mean climatology using the PS value based on normalized RMSD and TSS. Furthermore, in view of ocean warming, the long-term trends of the regionally averaged near-surface air temperature, sea surface temperature, and ocean heat content over the NP and NWP oceans are examined.

Compared with the references, the CMIP6 multi-model ensemble reasonably reproduces the spatial distributions of the physical quantities of the surface and lateral boundary variables, with a spatial correlation of 0.85–0.99. However, the performance of a single CMIP6 model depends significantly on the variable and region analyzed, particularly in terms of the physical magnitude. Overall, the RMSDs of the EC-Earth3 and CNRM model series are relatively lower in the NP and NWP oceans compared to the other model series. Of the 17 CMIP6 models, the EC-Earth3-Veg and CNRM-CM6-1 models show the best performances in terms of the PS values (85.89 and 76.97) for the NP and NWP oceans, respectively. In particular, the EC-Earth3 model series is less sensitive to the driving variables used in computing the PS value for the NP ocean, which suggests that this model series could be a good choice as the driving forcing for dynamic downscaling in this ocean. In the NWP ocean, the model performance shows a significant dependency on the variable considered in computing the PS value. This implies that selecting the appropriate CMIP6 model as the driving forcing for dynamic downscaling in this ocean depends on the research perspective.
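The spatial correlation and RMSD referred to above can be sketched in Python as follows. This is a minimal sketch, not the procedure used in this study: the cosine-latitude area weighting, the function names, and the synthetic fields are all assumptions made for illustration, and the PS value itself (which combines normalized RMSD and TSS) is not reproduced here.

```python
import numpy as np

def area_weights(lat_deg, n_lon):
    """Normalized cos(latitude) weights on a regular lat-lon grid (assumed)."""
    w = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones((1, n_lon))
    return w / w.sum()

def pattern_stats(model, ref, weights):
    """Area-weighted centered pattern correlation and RMSD of two 2-D fields."""
    m_anom = model - np.sum(weights * model)
    r_anom = ref - np.sum(weights * ref)
    cov = np.sum(weights * m_anom * r_anom)
    m_std = np.sqrt(np.sum(weights * m_anom**2))
    r_std = np.sqrt(np.sum(weights * r_anom**2))
    corr = cov / (m_std * r_std)
    rmsd = np.sqrt(np.sum(weights * (model - ref) ** 2))
    return corr, rmsd

# Synthetic, illustrative fields only (not data from the study).
lat = np.linspace(0.0, 60.0, 13)
n_lon = 24
ref = (28.0 - 0.3 * lat)[:, None] * np.ones((1, n_lon))
model = ref + 1.0  # same spatial pattern with a uniform 1-degree warm bias

weights = area_weights(lat, n_lon)
corr, rmsd = pattern_stats(model, ref, weights)
```

In this synthetic case the pattern correlation is 1 while the RMSD equals the uniform bias, which illustrates why a model can reproduce the spatial distribution well and still differ from the reference in physical magnitude.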

In the trend analysis, the CMIP6 multi-model ensemble reasonably captures the regionally averaged warming trends, although its warming is stronger than that of the reference by a factor of 1.5–2.0. Both the MPI-ESM1-2-HR and MRI-ESM2-0 models perform relatively well in the NP and NWP oceans compared to the other models. In particular, the MRI-ESM2-0 model shows a high performance for the physical quantities in the NWP ocean (see Table 3), and it could thus be a good choice for use in dynamic downscaling in the NWP ocean.

However, a better performance in terms of physical quantity does not necessarily imply a better performance with respect to long-term trends. It should be noted that the PS value used in this study could be sensitive to the selection of variables and regions. Therefore, when conducting dynamic downscaling in the NP and NWP oceans, users need to make a subjective decision as to which model to employ, based on their specific research needs. In addition, dynamic downscaling should rely on multi-model experiments forced by independent driving models rather than on a single model. This study is meaningful in that it provides objective information, and thus saves time and computing resources, for researchers to construct a more systematic ensemble experiment and perform dynamic downscaling for the NP and NWP oceans.

Investigating the vertical profile of ocean warming trends along the RCM boundary helps in evaluating global climate models because substantial heat exchange with the surrounding region occurs through the boundary of the RCM, especially for the NWP. Preliminary results for the vertical warming trend profile along the southern boundary are shown in Fig. 9a. Both the ACCESS-CM2 and ACCESS-ESM1-5 models show a shallowing of the thermocline and an increase in water temperature in the intermediate layer (500–1,000 m). This vertical structure, which differs from that of the reference, is problematic when driving an RCM. An extensive model spread is also found, especially in the upper 1,000 m along the eastern boundary of the NWP ocean (Fig. 9b). Therefore, further studies are necessary to comprehensively evaluate the performance of the vertical profiles.

Fig. 9

Vertical profiles of long-term ocean temperature trends from reference and CMIP6 models along the (a) southern and (b) eastern boundaries of the Northwest Pacific (NWP) for the period 1979–2014
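A vertical trend profile of the kind shown in Fig. 9 could be computed along the lines of the following Python sketch, which fits a linear trend independently at each depth level of a boundary-averaged temperature series. The array layout (years by depth levels), the function name, the depth levels, and the synthetic values are assumptions for illustration only and do not represent the processing used in this study.

```python
import numpy as np

def trend_profile(years, temp):
    """Linear trend at each depth level.

    temp has shape (n_years, n_depths); the result has shape (n_depths,)
    and is expressed in temperature units per year.
    """
    t = years - years.mean()  # centered time axis for a well-conditioned fit
    # np.polyfit fits each column of a 2-D y independently;
    # row 0 of the returned coefficients holds the slopes.
    return np.polyfit(t, temp, 1)[0]

years = np.arange(1979, 2015)  # 1979-2014 inclusive

# Synthetic, illustrative values only (not data from the study).
depths = np.array([0.0, 500.0, 1000.0])  # m
base = np.array([25.0, 8.0, 4.0])        # degC at each depth in 1979
rates = np.array([0.020, 0.005, 0.001])  # degC per year at each depth
temp = base[None, :] + rates[None, :] * (years - 1979)[:, None]

slopes = trend_profile(years, temp)
slopes_per_century = slopes * 100.0
```

Applying the same function to the reference and to each CMIP6 model along a given boundary would give profiles that can be compared level by level.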