Performance of MPAS-A and WRF in predicting and simulating western North Pacific tropical cyclone tracks and intensities

Performances of the Model for Prediction Across Scales-Atmosphere (MPAS-A) in predicting and the Weather Research and Forecasting (WRF) model in simulating western North Pacific (WNP) tropical cyclone (TC) tracks and intensities have been compared. Parallel simulations of the same historical storms that made landfall over southern China, namely, TCs Hope (1979), Gordon (1989), Koryn (1993), Imbudo (2003), Dujuan (2003), Molave (2009), Hato (2017) and Mangkhut (2018), were carried out using WRF and MPAS-A, with initial conditions (and, for WRF, lateral boundary conditions as well) taken from ERA-interim. For MPAS-A, the model was integrated using a standard 60-to-3-km variable-resolution global grid mesh and also on 160-to-2-km grids customized to cover the TC tracks with the highest resolution mesh. The WRF model was integrated using a 15-km/3-km nested domain. No TC bogus scheme was applied when initializing the MPAS-A and WRF simulations. It was found that while TC tracks were reasonably captured by the two models configured variously, the storm intensities were underestimated in general. Given MPAS-A runs were initial value predictions whereas WRF runs were dynamically downscaled from ERA-interim, the finding that MPAS-A has comparable (or slightly better) performance as (than) WRF is noteworthy. To further examine the sensitivity of the MPAS-A TC forecasts to the initial data, additional experiments were carried out for TCs Molave and Hope using ERA5 reanalysis as initial conditions. The ERA5 initialized runs showed significant (slight) improvement in intensity (track) evolution, suggesting that the underestimated TC intensity is likely related to inferior representation of storms in the ERA-interim initial fields. Furthermore, additional runs using another customized 60-to-2-km mesh showed a reasonable improvement in capturing the TC tracks, suggesting that the track forecast accuracy of MPAS-A in TC can be sensitive to the grid resolution in the coarsest part of the variable-resolution mesh used.


Introduction
During the TC peak season, western North Pacific (WNP) tropical cyclones (TCs) bring strong winds, extreme precipitation and high storm surges to the coastal cities in southern China (Chen et al. 2020), posing great threats to public safety and economic activity. Super typhoons Hato (2017) and Mangkhut (2018), for instance, caused at least 23 and 6 casualties, respectively, and hundreds of injuries during their landfalls in southern China. In general, Hong Kong is exposed to more damages if TCs make landfall to its west (e.g. Hato 2017) compared to those making landfall to its east (e.g. Usagi 2013), as northern mountains can block northerly winds to some extent in the latter scenario, but not in the former. Therefore, how accurate TC tracks and intensities are predicted or simulated by numerical weather prediction (NWP) models is of great importance for TC landfall risk measure and the following timely safety precautions.
To achieve that, high-resolution models are required for resolving the fine-scale TC dynamics (Gentry and Lackmann 2010;Davis et al. 2010); at the same time, a relatively large simulation domain may be needed for avoiding lateral boundary effects (Landman et al. 2005) and capturing the large-scale circulations (Laprise et al. 2012). Since a highresolution global (presumably non-hydrostatic) model Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s00704-020-03444-5. integration is extremely computationally intensive, a workaround practice is to use a high-resolution limited-area model (e.g. the Advanced Research Weather Research and Forecasting; hereafter WRF) to dynamically downscale coarse-resolution global model forecasts or reanalysis data (e.g. Cha and Wang 2013). However, this poses many issues, including artefacts introduced by abrupt resolution transitions at the lateral boundaries (e.g. Park et al. 2014;Hashimoto et al. 2016), inconsistencies in dynamics and physics between the regional and global models Giorgi 2006) and insufficient forcing of the large-scale state in the interior of regional model domain (in case of no spectral nudging) (e.g. von Storch et al. 2000;Cha et al. 2011), etc. The Model for Prediction Across Scales-Atmosphere (MPAS-A) is a global, fully compressible, non-hydrostatic atmospheric model developed with the basis of WRF model but spatially discretized on a C-grid staggered unstructured Voronoi mesh using finite-volume formulation (Skamarock et al. 2012). By using such unstructured spherical centroidal Voronoi tessellations (SCVTs), local refinement is achieved via variable-resolution mesh such that the horizontal resolution would exhibit gradual transition from coarser region to finer region (e.g. Lui et al. 2020). As a global NWP model, MPAS-A can be integrated forward in time, given the initial conditions, without any lateral boundary forcing. In contrast, regular latitude-longitude grids are used and local refinement is carried out through domain nesting in WRF framework. Besides, WRF needs to constantly update the lateral boundary conditions over time using dynamical downscaling. Despite the studies by Hagos et al. (2013) and Park et al. (2014) in comparison of these two refinement approaches based on idealized simulations, limited studies were conducted to compare the two models in a real-world context. Thus, this study aims at comparing the performance of MPAS-A and WRF in reproducing the tracks and intensities of historical WNP TCs.
In the official MPAS-A version, several pre-generated "standard" quasi-uniform and circularly refined variableresolution meshes are available. Simulations using the standard variable-resolution meshes have been shown to provide comparable performance to those using uniform meshes (Sakaguchi et al. 2015). Both settings also give reasonable results in capturing TC characteristics such as track and intensity (e.g. Huang et al. 2017Huang et al. , 2019 and for "climatological" studies (e.g. Davis et al. 2016;Michaelis et al. 2019). However, the major shortcoming for the standard variableresolution meshes lies in the lack of flexibility for customizing meshes based on users' specific needs (e.g. resolution, shape, span and number of refinement regions and degree of resolution variations). In light of this, a variable-resolution mesh is customized in this study, specifically for simulating WNP TCs that move northwestward and make landfall over southern China while trying to save computational cost. To the authors' knowledge, this is the first study to employ such customized variable-resolution mesh in MPAS-A for simulating TCs. Thus, this study also attempts to demonstrate the feasibility of utilizing customized meshes under the MPAS-A framework.
The remainder of this paper is organized as follows. The model configurations of both MPAS-A and WRF models, the experimental settings and the tracking methods are given in Section 2. The TC tracks and intensities' evolution and the related error analysis in the parallel experiments are compared and included in Section 3. Moreover, results from the additional experiments for studying MPAS-A's sensitivity to the choice of initial conditions as well as grid resolution are also given in this section. Finally, conclusions are given in Section 4.

Model configurations and experimental set-up
The global model used in this study is MPAS-A version 6.1 from NCAR (Skamarock et al. 2018). MPAS-A's dynamical core solves a set of fully compressible nonhydrostatic equations discretized horizontally on a C-grid staggered unstructured Voronoi mesh using finite-volume formulation and vertically in a generalized hybrid geometric-height coordinate. In our study, MPAS-A was run on a standard 60-to-3-km mesh with a circular refinement region (see Fig. 1a), and a customized 160-to-2-km mesh with an oval-shaped refinement region oriented in a SE-to-NW direction (see Fig. 1b), with 55 vertical levels reaching a model top of 30 km (~12 hPa). The refinement regions of these two meshes cover the South China Sea, southern China and WNP to different extents. The 160-to-2-km mesh was generated using the ClusterTech Platform for Atmospheric Simulation (CPAS; www.cpas. earth) version 0.3.2. CPAS v0.3.2 employed a mesh generation algorithm in which initial grid points (i.e. Voronoi cell vertices) were generated by a modified Frontal-Delaunay algorithm (Engwirda 2017) and bisection techniques. The positions of the points were then further optimized by Lloyd's method (Du et al. 1999;Ringler et al. 2008;Lui et al. 2020) with respect to a mesh density function (which depends on the mesh specification) until they converged to a SCVT. The 160-to-2-km mesh was customized specifically for simulating WNP TCs that make landfall over southern China. Its resolution within the finest region is approximately 3 km, while the mesh is slightly further refined by boosting the regions with large orographic variance and along the coastline to 2 km (not shown). The physical parameterizations used in this study include the new Tiedtke cumulus scheme (Zhang and Wang 2017), the WSM6 cloud physics scheme (Hong and Lim 2006), the Noah land surface scheme (Tewari et al. 2004), the Yonsei University (YSU) planetary boundary layer (PBL) scheme , the Monin-Obukhov surface layer (Janjić 2001) and the RRTMG scheme for long-and short-wave radiation (Iacono et al. 2008). Furthermore, WRF version 4.0 was utilized for comparisons with MPAS-A (see Chen et al. 2020). The model physics employed in WRF experiments are identical to those in MPAS-A experiments. The WRF experiment consists of two domains where a 3-km domain was nested in a 15kmdomain (Fig. 1c) and with 55 vertical levels reaching a model top of 10 hPa. The convection scheme was switched off in the 3-km domain in WRF experiments. Concerning the "scale-awareness" of convection parameterization, although a convection-permitting suite of physics is available in MPAS-A, we adopted code modification following a similar approach in WRF to disable convection parameterization for cells with spacings smaller than 9 km in MPAS-A experiments, for a fair comparison with WRF. Model configurations are summarized in Table 1.
Regarding the static data fields, the Moderate Resolution Imaging Spectroradiometer (MODIS) land use data and the Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010) topographical data were used, whereas the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-interim (0.75°× 0.75°) (Dee et al. 2011) was used as the initial conditions in MPAS-A and WRF experiments. The same reanalysis data were also used as forcing data to update lateral boundary conditions and sea surface temperature every 6 h in WRF experiments. Nudging and TC bogus were switched off in WRF experiments; currently no TC bogus scheme is available for MPAS-A. Apart from ERA-interim, a recently released high spatial resolution reanalysis (ERA5; Hersbach et al. 2018) was also used in additional sensitivity experiments (see Section 3). This reanalysis dataset has a higher spatial resolution (31-km horizontal resolution and 137 vertical levels) than ERA-interim (79-km horizontal resolution and 60 vertical levels). Note that without TC bogus, TCs could be initialized with weaker intensity, thereby potentially resulting in weaker beta drift in the experiments; still, this allows a fair evaluation on vortex representation in ERA-interim and ERA5 datasets.
Eight historical storms, namely, Hope (1979), Gordon (1989), Koryn (1993), Imbudo (2003, Dujuan (2003), Molave (2009), Hato (2017 and Mangkhut (2018), were simulated. These TCs were selected because they generally moved northwestward and made landfall over southern China. Moreover, their intensities and impacts (e.g. storm surge) on the coastal region were relatively strong. The initial and final time for these simulations is summarized in Table 2. The first 12 h were treated as spin-up and excluded from our analyses.

TC tracking method
The vortices were tracked by the Geophysical Fluid Dynamics Laboratory (GFDL) vortex tracker (Davis et al. 2016). Since the GFDL vortex tracker does not support the native MPAS-A meshes, the MPAS-A outputs were first mapped onto a regular 0.025°× 0.025°grid (~2.78 km) by barycentric interpolation as inputs for GFDL vortex tracker. Using Barnes analysis, the vortex centre was located through searching for the average of the maximum or minimum of the tracking parameters, including relative vorticity at 10 m, 850 hPa and 700 hPa levels; mean sea level pressure (MSLP); geopotential height at 850 and 700 hPa levels; winds at 10 m, 850 hPa and 700 hPa levels; as well as the geopotential thickness in the 500-to-850 hPa, 200-to-500 hPa and 200-to-850 hPa layers (Davis et al. 2016;Biswas et al. 2018). To keep tracking the storms, a radial pressure gradient exceeding 0.0015 hPa km −1 must exist in any azimuthal direction, and the mean tangential component of wind at 850 hPa within 225 km of the position fix must be cyclonic and larger than 1.5 m s −1 . Additionally, the distance between the position fix obtained from MSLP and the 850 hPa relative vorticity must not exceed 400 km, and the average translational speed must be smaller than 60 kt (1 11 km hr. −1 ). The outputs were then evaluated against the post-storm best estimates of the tracks and intensities from the Hong Kong Observatory (HKO) "best" track data available in International Best Track Archive for Climate Stewardship Table1 Model configurations used in WRF and MPAS-A experiments in this study. Note that convection scheme is disabled for grids smaller than 9 km in MPAS-A and in the WRF 3-km domain

Contents
WRF MPAS-A  (Schreck et al. 2014). The direct position error is the great-circle distance from the simulated position to the best-track position and can be further decomposed into the along-track and cross-track error components, relative to the direction of the TC motion in the best track. The along-track (cross-track) error is positive when the simulated position is ahead (to the right) of the best-track position.

Results and discussions
3.1 Performance in track and intensity evolution Figure 2 shows the observed best tracks from HKO and simulated tracks in the 15-km and 3-km domains of WRF experiments as well as 60-to-3-km and 160-to-2-km MPAS-A Fig. 1 MPAS-A a 60-to-3 km, b 160-to-2-km global variableresolution meshes (units: km) and c WRF nested grids at 15-and 3km resolution for domains 1 and 2, respectively experiments for the eight historical storms. It was found that the storm tracks were reasonably captured by the two models with various configurations. For Hope, both MPAS-A and WRF can predict the observed west-northwestward motion for the first 2 days (Fig. 2a). Except in the MPAS-A 60-to-3km simulation, to the south of Taiwan, the MPAS-A 160-to-2km and WRF simulations showed an over-predicted northward deflection. After 3 days, all simulated tracks can barely capture the pronounced southward deflection during the landfall on the coast of southern China. For Dujuan, MPAS-A and WRF tracks deviated more northward than the observed westnorthwestward translation on the first day (Fig. 2e). The northward deviation caused the simulated vortices to make direct landfall on the southern Taiwan mistakenly later on the next day and led to a northward bias in the subsequent tracks. For Koryn, both models can reasonably produce the northwestward translation and landfall on Luzon Island on the first day (Fig. 2c). However, during landfall, all simulated tracks showed an erroneous southward deflection, as opposed to the observed northward deflection. Afterwards, the simulated vortices moved slower and more southward than in the best track. For Hato and Mangkhut, both models can reasonably capture the northwestward translation over the entire simulation periods ( Fig. 2g and h). However, the translational speed was significantly underestimated. Finally, except for Molave in the MPAS-A 160-to-2-km simulation, both MPAS-A and WRF can reproduce the observed tracks rather well for Gordon, Imbudo and Molave (see Fig. 2b, d, f). Time series of direct position, along-track and cross-track errors for the model simulated storms in comparison to the best track data are shown in Figs. S1, S2 and S3, respectively. Except the large position errors resulted from the overestimated northward deviation for Hope, the wrongly predicted southward deflection for Koryn and the underestimated translational speed for Hato and Mangkhut, the simulated tracks generally had a direct position error less than 200 km (see Fig. S1). Due to weaker-thanobserved TC speeds, significant negative along-track errors were found in both model simulations for Hato and Mangkhut (see Fig. S2g and S2h). Resulted from the wrongly predicted northward (southward) deflection, significant positive (negative) cross-track error and negative along-track error were found in both model simulations for Hope (Koryn) (see Figs. S2a, S2c, S3a and S3c). Figure 3 shows the observed sea level pressure from HKO and simulated intensity in the 15-km and 3-km domains of WRF experiments as well as in the 60to-3-km, 160-to-2-km MPAS-A experiments for the eight historical storms. The simulated intensities for all storms were under-predicted and its evolution was poorly captured in both MPAS-A and WRF. We reckon that such underestimation might be attributed to the weak intensity in the initial conditions and partly responsible for the aforementioned discrepancies in simulated tracks compared to the best track data. The overall performance in simulating storm tracks between MPAS-A and WRF was evaluated by examining their evolution of mean direct position errors for all TC cases in the course of 72 h (Fig. 4a). Within the first 36 h, the track performance between MPAS-A and WRF was comparable, having a mean position error less than 150 km. However, at later times MPAS-A tended to show a better predictability in TC tracks than WRF. For example, at T + 60 h, the 60-to-3-km and 160-to-2-km MPAS-A experiments had a position error of 163 km and 178 km on average, respectively, which was smaller than that in WRF 15 km (201 km) and WRF 3 km (209 km). Within 72 h, MPAS-A produced a mean position error less than 180 km. Overall, it is suggested that MPAS-A performs better than WRF in capturing the TC tracks, with the former using the 60-to-3-km resolution mesh being the best. Regarding the storm intensity, sea level pressure of TC centre was generally overestimated in model simulations, with biases of up to 44.5 hPa, 44.3 hPa, 48.1 hPa and 43.1 hPa, respectively, for 60-to-3-km and 160-to-2km MPAS-A simulations and 15-km domain and 3-km domain of WRF simulations (see Fig. 4b). The intensity errors generally became smaller near landfalls. The overall performance for predicting TC intensity of WRF and MPAS-A was comparable, except perhaps at later times (i.e. from T + 60 to T + 72 h). Furthermore, MPAS-A gave slightly smaller underestimation in storm intensity than WRF in general. Finally, 60-to-3-km and 160-to-2-km MPAS-A had comparable skills in simulating intensity evolution. Given MPAS-A runs were initial value predictions whereas WRF runs were dynamically downscaled from ERA-interim, the finding that MPAS-A has comparable (or slightly better) performance in track and intensity evolution as (than) WRF is particularly noteworthy.

Underestimation in TC speeds
Firstly, for Koryn, at 06 UTC 26 Jun 1993, the centre in 60-to-3-km MPAS-A experiments was predicted at 17. 9°N, 120.3°E , and the position error was 30.7 km compared to the observed location (18.1°N, 120.1°E). After 6 h, the predicted centre (18.1°N, 119.1°E) deviated significantly from the observed location (19.0°N, 118.6°E), thereby leading to a position error of 113 km. The along-track error and cross-track error grew, respectively, from − 30.5 km to − 97.9 km and from − 3.17 km to − 56.6 km, suggesting that there was a slow and southward bias. Similar bias was also observed in the 160to-2-km MPAS-A counterparts (see Fig. S1c). Secondly, for Hato, at 00 UTC 22 Aug 2017, the predicted centres were at 20.2°N, 121.0°E and 19.9°N, 120.5°E in the 60-to-3-and 160-to-2-km MPAS-A experiments, respectively. Compared to the observed location (20.0°N, 120.1°E), they had a position error of 96.6 km and 43.3 km, respectively. Six hours later, while the cross-track errors only showed relatively small changes, the along-track errors grew significantly from − 41.7 km to − 127 km (from − 35.0 km to − 160 km) in the 60-to-3-km (160-to-2-km) MPAS-A simulations (see Fig.  S1g). Lastly, for Mangkhut, at 06 UTC 15 Sep 2018, the 60to-3-km (160-to-2 km) experiment predicted that centre was located at 18.4°N, 120.4°E (18.6°N, 120.7°E) and had a position error of 74.7 km (106 km) compared to observed location at 18.5°N, 119.7°E. After 6 h, while the crosstrack errors only showed relatively small changes, the alongtrack errors grew significantly from − 70.5 km to − 142 km (from − 88.0 km to − 157 km) in the 60-to-3-km (160-to-2 km) MPAS-A simulations (see Fig. S1h).
To shed light on why MPAS-A simulations showed underestimated translation in these cases, we compared the predicted steering flows in MPAS-A 60-to-3-km simulations with ERA-interim reanalysis. Figure 5 shows 500 hPa geopotential height and 700 hPa-to-300 hPa vertically averaged winds in the MPAS-A 60-to-3-km simulations, ERAinterim reanalysis, as well as their differences between MPAS-A and ERA-interim. In general, MPAS-A can reasonably capture the location of the WNP subtropical high as well as the environmental flow pattern (see Fig. 5a, b, d, e, g and h). However, the subtropical high and the environmental flow seemed to be slightly weaker in MPAS-A predictions, compared to ERA-interim reanalysis. By taking their differences, the southeasterly wind to the northeast of the TCs was found to be significantly underestimated in the MPAS-A simulations (Fig. 5c, f and i). To the northeast (southwest) of the underestimated southeasterly wind, an anomalous low (high) was found, corresponding to anomalous positive (negative) vorticity. Furthermore, the anomalous northwesterly winds in NE quadrant were found to be stronger than the anomalous southeasterly winds in SW quadrant, suggesting a weaker beta drift in MPAS-A simulations. Such weaker beta drift was likely due to the underestimated storm intensity. Overall, the weaker southeasterly steering flow led to a weaker TC translation in MPAS-A simulations than observation. Note that in order to discriminate what portion of track error do the steering flow and beta drift contribute individually, experiments with TC intensity initialized more realistically (e.g. by using a TC bogus scheme or better initial conditions) might be needed in future studies.

Underestimation in storm intensity
To examine the TC intensity biases in MPAS-A simulations, additional 160-to-2-km MPAS-A experiments were carried out using 0.75°× 0.75°ERA5 reanalysis as initial conditions.

MPAS-A 60-to-3-km (blue) and 160-to-2 km (red) experiments and the 15-km domain (orange) and 3-km domain (green) in the WRF experiments
Note that here a coarser resolution (0.75°) ERA5 data was used as initial conditions for a fair comparison with ERAinterim initialized runs, though the ERA5 data can be at a much higher spatial resolution (0.25°). Also note that although the intensity was under-predicted for all eight TCs, only Hope, Molave and Hato were selected for additional ERA5 initialized runs. It is because these three TCs moved across Luzon Strait without making a landfall (except at the end of their lifetime), thereby considerably eliminating the orographic effect. Moreover, as these TCs were also selected for additional runs in Section 3.4, the results in the two sections could therefore be compared.
The tracks and intensities evolution for the ERA5 initialized runs are presented in Fig. 6. When switching from ERAIN to ERA5 as initial conditions, the deepening of the central pressure was found to be enhanced significantly (up tõ 40 hPa) and thus the intensity evolution was improved for Hope, though the intensity remained to be largely underpredicted (slightly over-predicted) before (after) 12 UTC 1 Aug 1979 (Fig. 6b). For Molave, the intensity evolution was well captured by the ERA5 initialized run, despite some relatively small discrepancies (Fig. 6d). For Hato, the intensification was well reproduced in the first 36 h (Fig. 6f). Until then (i.e. 03 UTC 23 Aug 2017), it continued to intensify in MPAS-A simulation, in contrast to weakening in observation. This over-predicted intensification can be attributed to the weak translational motion of Hato in MPAS-A simulation, thereby delaying its time to make landfall by~6 h and mistakenly allowing it to further intensify in the northern South China Sea. Regarding the track performance, ERA-interim and ERA5 initialized runs were comparable for Hope and Molave (see Fig. 6a and c). However, the predicted track for Hato showed significant improvement with ERA5 as initial conditions (Fig. 6e). The reduced northward deflection of Hato near the Luzon Strait may be related to weaker orographic interaction due to a stronger and faster TC in ERA5initialized simulations (e.g. Yeh and Elsberry 1993). Furthermore, it is noted that the underestimation in translation was slightly reduced, which was likely due to the increased TC intensity and stronger beta effect. However, the translational speed was still underestimated because of the weaker background steering flows in MPAS-A predictions as Fig. 4 Time series of a direct position errors (units: km), and b minimum sea level pressure (units: hPa) averaged for the TCs from the MPAS-A 60-to-3-km (blue) and 160-to-2-km (red) experiments and the 15-km domain (orange) and 3-km domain (green) in the WRF experiments, from lead time of 0 to 72 h. The vertical error bars indicate the range of errors (i.e. maximum and minimum) in each set of TC experiments. The sample size for the time series at each lead time is shown at the bottom mentioned in Section 3.2. Overall, using ERA5 as initial conditions, storm intensity was significantly improved, whereas the track performance was slightly improved.
To compare the TC representation in ERA5 and ERAinterim data for Hope, Molave and Hato, we examined the 850 hPa winds at the initial times (i.e. 12 UTC 29 Jul 1979, 12 UTC 16 Jul 2009 and 00 UTC 21 Aug 2017, respectively) for the two datasets (see Fig. 7). The 850 hPa wind patterns and the locations of vortices were very similar in both reanalyses. However, it was found that the vortices' eyes were tighter, and the cyclonic winds were stronger in ERA5 compared to ERA-interim (see Fig. 7a, b, d, e, g and h), consistent with the deeper initial TC central pressure in ERA5 than ERAinterim for these three cases (see Fig. 6b, d and f). Furthermore, it was observed that the TC horizontal wind structure was better resolved in ERA5 data than in ERAinterim data. By taking the difference of the two datasets, it was found that the 850 hPa cyclonic winds can be weaker in ERA-interim than in ERA5 data. The magnitude of the differences in 850 hPa winds can be as large as~12 m s −1 in the NW quadrant of Molave and SE quadrant of Hato, or even1 6 m s −1 in the NE quadrant of Hope (see Fig. 7c, f and i).
To fully utilize the high spatial resolution of ERA5, additional simulations were conducted using 0.25°× 0.25°ERA5 as initial conditions. The initial vortex central pressure and the predicted intensity evolution were comparable between the 0.25°and 0.75°ERA5 initialized runs (see Fig. 8b, d and f). Better resolved storm structure in 0.25°ERA5 data seemed to enhance the intensification of vortices in the MPAS-A simulations, although it was over-predicted compared with observation. Regarding the track performance in 0.25°ERA5 initialized runs, the northward bias was reduced for Hope ( Fig. 8a) but increased for Molave (Fig. 8c), whereas no significant changes were found in Hato's track (Fig. 8e). To conclude, the underestimated TC intensity in ERA-interim initialized MPAS-A simulations was likely related to the inferior representation of the TCs in the initial fields in ERAinterim, in comparison to ERA5 reanalysis. Another sets of model simulations were conducted to examine more systematically the sensitivity of MPAS-A to initial conditions (ERAinterim vs ERA5) in simulating on TC tracks and intensities (details reported elsewhere). Results suggested the ERA5initialized runs showed significant improvement in TC intensities as well as their evolution when compared to ERAinterim initialized simulations (see Fig. S4), which were attributed to aforementioned better representation of vortices in ERA5 fields.

Difference in predicted steering flows
It is also noticed that the overall track performance of 60-to-3km MPAS-A simulations was slightly better than the 160-to-2-km ones (see Fig. 4a). Ringler et al. (2011) found that the simulation error was primarily dependent on the grid resolution in the coarsest part of a mesh, based on idealized shallow water simulations using the "official" variable resolution meshes. Motivated by that, additional ERA-interim initialized runs were conducted for Hope, Molave and Hato using another customized 60-to-2-km mesh (see Fig. S5) to examine whether an increased resolution in the coarsest part of the mesh can improve the prediction of large-scale flows and hence the TC track. It was observed that the 60-to-2-km simulations generally showed reasonable improvement in capturing the TC tracks as shown in Fig. 9. The northward bias in the track of Hope (Molave) was reduced slightly (significantly) in 60-to-2-km experiments compared to 160-to-2-km counterparts (see Fig. 9a and c), whereas Hato's track showed no significant change (Fig. 9e). Regarding the simulated intensity evolution, no significant improvement can be found (Fig. 9b, d and f).
For Molave, the tracks simulated in 160-to-2-km and 60to-2-km MPAS-A experiments showed significant difference after 12 UTC 17 Jul 2009, where the former track deviated northward and the latter track was in accord with the "best" track. In the light of the above difference, we compared the mean 700-to-300 hPa winds and 500 hPa geopotential height averaged for the period when Molave was passing the Luzon Strait and approaching the coast of southeastern China (i.e. Fig. 6 a, c, e The tracks and b, d, f minimum sea level pressure (units: hPa) for Hope, Molave and Hato, respectively, from the HKO best track data (black), the 0.75°ERA-interim initialized (red) and the 0.75°ERA5 initialized (purple) 160-to-2-km MPAS-A experiments from 12 UTC 17 Jul 2009 to 18 UTC 18 Jul 2009), between the 60-to-2-km and 160-to-2-km MPAS-A experiments (Fig. 10). Although the mean large-scale flows were weaker than those in ERA-interim near Taiwan and Luzon Strait, the MPAS-A experiments can reasonably capture their patterns (see Fig. 10a and b; not shown for 160-to-2-km experiment). By taking the difference between the two MPAS-A experiments, the 60-to-2-km experiment showed a stronger northeasterly component in the steering flow in the northern Luzon Strait (Fig. 10c). Such a flow difference was responsible for correcting the northward bias in Molave's track when passing the Luzon Strait, hence the improved track performance in the 60-to-2-km experiment. Overall, the above suggests that the grid resolution in the coarsest part of the mesh can also play a role in predicting the TC steering flow. In particular, the latter can be dictated by the larger scale circulation (such as the trough-ridge structure over East Asia; see Fig. 10a). It is likely that the 60-to-2-km experiment could better capture regional circulation features compared to the 160-to-2-km counterpart, thus resulting in better track forecast accuracy in the MPAS-A model framework.

Conclusions
We compared the performance of MPAS-A and WRF models in capturing the WNP TC tracks and intensities' evolution of eight historical TCs, namely, Hope (1979), Gordon (1989), Koryn (1993), Imbudo (2003), Dujuan (2003, Molave (2009), Hato (2017 and Mangkhut (2018), using ERAinterim as initial conditions (and lateral boundary conditions as well for WRF). For MPAS-A simulations, two variableresolution global grid meshes were used for each TC case, namely, a standard 60-to-3-km and a customized 160-to-2- In general, both MPAS-A and WRF simulations can reasonably reproduce the TC tracks compared to the HKO best tracks data. Track performance for Gordon, Imbudo and Molave was relatively promising, while notable northward biases were found in Hope's and Dujuan's tracks, whereas southward bias was found in Koryn's track. Moreover, the translational speed was found to be lower than observation for Koryn, Hato and Mangkhut. Despite the relatively satisfactory performance on simulated tracks, the simulated TC intensity was significantly underestimated, and their evolution was poorly captured in both models. For predictions of 36 h ahead, the simulated tracks of MPAS-A and WRF simulations were comparable, in which the mean position errors were both smaller than 150 km. However, with a lead time of 36 to 72 h, MPAS-A predictions began showing a smaller mean position error than WRF simulations. Concerning the intensity evolution, both MPAS-A and WRF were comparable for predictions within a lead time of 60 h. For lead time up to 60 h or more, the intensity error is smaller in MPAS-A predictions than in WRF, which was likely related to the better track performance in MPAS-A as compared to WRF. The above assessment was based on a relatively small number of TC events, and due to the difference of "grid configurations" between MPAS-A and WRF, an utterly fair comparison is generally impossible. Apparently, more TC cases are needed in future studies to ascertain the above findings. Nevertheless, given MPAS-A simulations were initial value predictions whereas WRF simulations were dynamically downscaled from reanalysis fields, the finding that MPAS-A has comparable (or slightly better) performance as (than) WRF is noteworthy. Furthermore, the 160-to-2-km simulations were almost as good as 60-to-3-km simulations, though the latter had a slightly better track performance. Based on the aforementioned promising findings, we consider our study as successful demonstration of using customized variable-resolution meshes for high-resolution regional/local forecasts using MPAS-A.
It is also worth mentioning that the customized 160-to-2km mesh has 433,154 cells, which is roughly half of the number of cells of the standard 60-to-3-km mesh (835,586 cells). This significantly reduces the required computational resources, which is also an important factor to consider in weather forecast other than accuracy. Besides, there are other advantages for customized meshes over standard meshes. Firstly, in contrast to spherical refinement region in standard MPAS-A meshes, our mesh generation provides users the freedom to customize the "domains" of interest (i.e. refinement regions). The users can select as many refinement regions as needed, and these regions can also be of arbitrary shapes. Secondly, it provides flexibility to users for setting the grid resolution in the finest and coarsest parts of the mesh, as well as the width of the transition zone in between. The scaling in grid resolution can be as high as 160:2 in our study. Lastly, the resolution can be boosted for regions with large orographic variations and along coastlines (not discussed in this study), which may improve the representation of local circulations (e.g. orographic flow and land-sea breezes). We believe that through increasing the number of optimizations and re-designing the refinement zones, the mesh quality as well as the model predictability can be further improved.
We investigated the underestimation in TC translation for Koryn, Hato and Mangkhut by comparing the predicted steering flows in MPAS-A simulations with ERA-interim reanalysis data. It was found that MPAS-A reasonably reproduced the location of WNP subtropical high as well as the steering flow but underestimated their strengths, compared to ERA-interim reanalysis. The weaker southeasterly steering flow resulted in a weaker TC translation in MPAS-A simulations. We are aware that the vortices in our experiments were initialized with weak pressure fields. Apart from the major error in predicting the large-scale steering flow, such initial weak intensity could also lead to track error (e.g. weak beta drift). However, a TC bogus scheme is currently unavailable in MPAS-A; we reckon that there is a need for future MPAS-A studies to initialize TCs with a bogus scheme, in order to have a more quantitative picture about the physical mechanism behind and a satisfactory understanding of the track error (e.g. what portion of track error do the steering and beta drift account for individually). Currently, we are developing a vortex initialization scheme in MPAS-A's SCVT framework based on the KBR filter (Kurihara et al. 1993(Kurihara et al. , 1995 for future studies on this issue. We have also examined the underestimation in storm intensity in MPAS-A simulations by conducting additional simulations for TCs Hope, Molave and Hato using the 0.75°× 0.75°ERA5 reanalysis as initial conditions. The ERA5 initialized simulations showed significant improvement in intensity evolution, though the intensity remained under-predicted for Hope and over-predicted for Hato, whereas the track performance was slightly improved. By comparing the low-level Fig. 9 As in Fig. 6, but from the 0.75°ERA-interim initialized 160-to-2-km (red) and 60-to-2-km (cyan) MPAS-A experiments winds at the initial times between two datasets, the wind patterns and the locations of vortices were found to be very similar. However, ERA5 was found to have a more detailed and better representation of the vortices over ERA-interim, including tighter vortices' eyes (i.e. smaller radius of maximum wind), stronger low-level cyclonic winds at the eyewalls and a more detailed horizontal wind structure. This finding suggests that the underestimated TC intensity in MPAS-A simulations can be attributed to the inferior representation of the storms in the initial fields in ERA-interim. However, as ERA5 reanalysis was just publicly released recently, more comprehensive studies are required to compare its difference with another reanalysis, e.g. ERA-interim.
Finally, to further study the impact of the grid resolution in the coarsest part of the MPAS-A meshes on their prediction of the large-scale flow, we have conducted additional simulations using another customized 60-to-2-km mesh, which has similar specification to the 160-to-2-km mesh. The 60-to-2km simulations showed reasonable improvement in capturing the TC tracks but no significant improvement in intensity evolution, in comparison to the 160-to-2-km simulations. For instance, the northward bias in Molave's track, as found in 160-to-2-km simulations, was significantly reduced when switching to the 60-to-2-km mesh. By examining the difference in their predicted steering flows, an anomalous northeasterly flow was found near the northern Luzon Strait in the 60-to-2-km simulation, thereby correcting Molave's northward bias. Such anomalous steering flow, likely controlled by other large-scale circulation features, was responsible for the improved track performance in 60-to-2-km experiment. This finding suggests that the prediction of the steering flow and the resulting TC track prediction can be sensitive to the grid resolution in the coarsest part of the variableresolution mesh used in MPAS-A simulations. Further study is needed to investigate more quantitatively the sensitivity of simulated steering flow to the grid resolution in MPAS-A framework.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. Fig. 10 500 hPa geopotential height (shading; units: m) and 700-to-300 hPa vertically averaged winds (see scale arrow at bottom right), from a the ERAinterim reanalysis, b 60-to-2-km MPAS-A experiment and c the difference between 60-to-2-km and 160-to-2-km MPAS-A experiment. Results were further averaged from 12 UTC 17 Jul 2009 to 18 UTC 18 Jul 2009 (i.e. from T + 12 to T + 42 for simulation of Molave)