1 Introduction

The Bay of Bengal (BoB) in the North Indian Ocean (NIO) is an important ocean basin for the formation and propagation of Tropical Cyclones (TCs) as it has some of the warmest Sea Surface Temperature (SST) in the Tropics. TC related impacts in the BoB are exacerbated by the funnel-shaped semi-enclosed structure of the basin which makes it prone to storm surge events induced by the TC intense winds. Only 6% of the global TCs form over the NIO basin (Singh and Roxy 2022). Despite TC frequency and intensity in the BoB being considerably lower than in the Western North Pacific (WNP) or North Atlantic (NA), TCs do pose a significant threat to the lives and properties of people in the regions surrounding the BoB, because of the large population densities. In fact, in the last 300 years, 20 out of the 23 deadliest cyclones (with fatalities greater than 10,000) occurred over the Bay of Bengal (Mohanty et al. 2015; WMO 2011). More recently, Nargis caused an estimated death toll well above 100,000 in Myanmar (Webster 2008). Unlike the other tropical basins, the Bay of Bengal (as well as the Arabian Sea [AS]) has two cyclone seasons, with a distinct bimodal distribution in the seasonal cycle of TC frequency. One peak is in the pre-monsoon transition period (April–May) and another peak is in the post-monsoon transition period (October–November) (Li et al. 2013). In this paper, we use the term super cyclone ratio to refer to the ratio of the super cyclone frequency (with super cyclones defined as those having a maximum sustained windspeed ≥ 60 m s−1, Li et al. 2021) to the total TC frequency. Interestingly, the pre-monsoon and post-monsoon transition periods are very different in terms of TC intensification; during the pre-monsoon and post-monsoon periods, respectively, the BoB experiences the highest and lowest super cyclone ratio (Li et al. 2021) in the world. 
However, both the annual cyclone frequency and the average intensity are higher in the post-monsoon season than in the pre-monsoon season, even though the super cyclone ratio is higher in the pre-monsoon season (Li et al. 2013; Vissa et al. 2013).

It has long been known that environmental variables, such as SST, mid-tropospheric moisture content, low level relative vorticity and conditional instability favour the genesis of TCs, while high vertical windshear inhibits TC formation (Gray 1968; Palmen 1948). SST is an important factor in the genesis of TCs, but it alone cannot describe the cyclogenesis process. Like the genesis frequency, the SSTs also exhibit a bimodal seasonal distribution during the pre-monsoon and post-monsoon periods in the BoB. The relation between SST and TC frequency is not straightforward. The SST is higher in the pre-monsoon season than in the post-monsoon, but the largest number of cyclones is observed in the post-monsoon season (Yanase et al. 2012). Both in the BoB and AS, the observed large-scale environmental conditions that influence the genesis and intensification of TCs are found to be different during the pre-monsoon and post-monsoon seasons (Li et al. 2013; Murakami et al. 2017). Li et al. (2013) investigated what environmental parameters modulate the bimodal characteristics of BoB TCs and they argued that the higher mid-tropospheric moisture in the post-monsoon season leads to a higher number of TCs in that season than in the pre-monsoon season.

Global Climate Models (GCMs) can represent how the large-scale circulation will change in the future and what effect that could have on TC activity during the pre-monsoon and post-monsoon seasons in the BoB. It would be helpful to know whether the pre-monsoon will continue to be the season with the higher super cyclone ratio in the future. As a prerequisite for using GCMs to evaluate the effect of global warming on TC activity, it is crucial to assess how reliably the models simulate TC activity in the present climate. Only a small number of previous studies have used high-resolution GCMs or compared atmosphere-only and coupled GCMs over the BoB (Murakami et al. 2013; Roberts et al. 2020a,b). Vishnu et al. (2019) evaluated the performance of two Regional Climate Models (RCMs) for the simulation of TCs in the NIO. The two RCMs use different model dynamics and physical process parameterization techniques while coupling the atmosphere to the land surface. In their research, both models showed reasonable skill in capturing the observed climatological temporal and spatial variability of cyclogenesis in that region. In terms of intensity, although both models reasonably simulated the observed relationship between the TC minimum Mean Sea Level Pressure (MSLP) and TC maximum lifetime windspeed, the TC intensities were underestimated (in terms of both the MSLP and windspeed) in the models. RCMs are forced at their boundaries by large-scale environmental conditions taken either from GCMs or from reanalyses (Marbaix et al. 2003; Camargo et al. 2016). One limitation of RCMs is that it is difficult to perform future experiments due to the spurious interaction between the model solution and lateral boundary conditions (the driving fields from GCMs) (Marbaix et al. 2003).

A recent study by Roberts et al. (2020a) examined the impact of resolution on TC simulations for the atmosphere-only GCMs developed by the PRIMAVERA (PRocess-based climate sIMulation: AdVances in high-resolution modelling and European climate Risk Assessments) project. They found that both the TC frequency and intensity increase as the horizontal resolution is refined towards 25 km. Their investigation also showed that the geographical distribution and structure of TCs improve with increasing model horizontal resolution. They provided a global overview and then focused their analysis of simulated TCs on the North Atlantic, northwestern Pacific and eastern Pacific regions. The extent to which PRIMAVERA GCMs can accurately simulate TC activity in the BoB has not yet been studied and there is a need to focus on the BoB region specifically. Moreover, the analysis in Roberts et al. (2020a) was limited to the atmosphere-only models. How the atmosphere–ocean coupling in the coupled versions of the PRIMAVERA GCMs affects the TC simulation in the BoB remains unexplored. Whilst some research has been carried out to explore the large-scale environmental parameters, no study has examined the large-scale environmental conditions associated with TC genesis in the PRIMAVERA models. The existing literature on PRIMAVERA GCMs (both atmosphere-only and coupled) has not explored the ways in which biases in the large-scale environmental conditions might contribute to the biases in the TC activity. This study aims to address these gaps by assessing the performance of the PRIMAVERA GCMs (both atmosphere-only and coupled versions) in simulating the BoB TC activity and associated large-scale environmental conditions.

The overarching research question that the current study seeks to address is how well the GCMs in PRIMAVERA represent TC activity and the environmental factors influencing the formation and intensification of TCs, compared to the observations, during the pre-monsoon and post-monsoon seasons over the BoB. To answer this, we evaluate the ability of GCMs (both the atmosphere-only and coupled versions) to simulate the observed TC activity (frequency and intensity) in the Bay of Bengal. This study aims to address the following related research questions: How well do GCMs simulate the TC seasonal cycle? How do the TC frequency and intensity vary during the pre-monsoon and post-monsoon seasons? What biases in the environmental factors lead to biases in the pattern of the TC seasonal cycle during the pre-monsoon and post-monsoon seasons? What is the difference in performance between the atmosphere-only and ocean–atmosphere coupled models? This paper provides an understanding of the possible causes of model biases. We also compare the performance of the atmosphere-only and coupled versions of the models and briefly discuss whether the biases differ between them.

The rest of the paper is organized as follows. In Sect. 2, we describe the datasets, models, TC detection algorithm, Genesis Potential Index (GPI; Emanuel and Nolan 2004; Camargo et al. 2007b), which is a product of four large-scale environmental conditions (defined later), and the methodology used in this study. Section 3 presents the results obtained from the analysis of PRIMAVERA GCMs, including the climatology of the simulated TC activity (TC frequency and intensity), a possible explanation of biases in the large-scale environment that might be responsible for the biases in TC activity, and a brief evaluation of the atmosphere–ocean coupling. Section 4 gives a discussion and some conclusions drawn from the findings of the analysis.

2 Data and methods

2.1 Datasets

(a) Reanalysis

We use the ERA5 reanalysis dataset (Hersbach et al. 2020) provided by the European Centre for Medium-Range Weather Forecasts (ECMWF), available at 6-hourly intervals. These data have a horizontal resolution of 31 km (0.25°) globally. Currently, monthly averaged output datasets are publicly available from 1979 onwards (Hersbach et al. 2020).

To investigate the large-scale environmental factors influencing the genesis of TCs, 6-hourly absolute vorticity at 850 hPa, relative humidity at 600 hPa, SST, and zonal and meridional wind at 850 hPa are collected from the ERA5 reanalysis dataset for the period 1979–2018. These reanalysis data are compared with the GPI and the large-scale environmental variables associated with TC genesis in the models. Although TCs are identified and tracked in the ERA5 reanalysis data, we use the best track data for TC tracks and intensities.

(b) IBTrACS

To determine the position (latitude and longitude), date, time and maximum sustained windspeed of TCs from genesis throughout their lifetime, we use the International Best Track Archive for Climate Stewardship (IBTrACS) data (Levinson et al. 2010; Knapp et al. 2010), maintained by the National Centers for Environmental Information (NCEI) at 6-h temporal intervals, as the observational reference. The number of TCs obtained from the model simulations is compared against the observed IBTrACS dataset.

IBTrACS collects the position and intensity information during the lifetime of each cyclone from multiple agencies around the world and combines them into a single dataset. It produces the dataset from post-season TC reanalyses that draw on all available operational data sources: satellite, surface and aircraft reconnaissance (Kruk et al. 2010; Knapp et al. 2010). In this study, we use the best track data for which the source agency in IBTrACS is JTWC.

(c) PRIMAVERA

In our study, we use six high-resolution (~ 25 km) GCMs developed by the PRIMAVERA project following the HighResMIP (Haarsma et al. 2016) protocol to investigate how tropical cyclones form and intensify. The six high-resolution GCMs are CMCC-CM2 (Cherchi et al. 2019), CNRM-CM6.1 (Voldoire et al. 2019), EC-Earth3P (Haarsma et al. 2020), ECMWF-IFS (Roberts et al. 2018), HadGEM3-GC3.1 (Roberts 2019) and MPI-ESM1.2 (Gutjahr et al. 2019). Detailed information on the models and their horizontal grid spacings is provided in Table 1.

Table 1 Summary of the properties of PRIMAVERA models, their horizontal resolutions and the number of available ensemble members in HighResMIP highresSST-present (atmosphere-only) and hist-1950 (coupled) simulations as used in this study

A major advantage of using the high-resolution models in PRIMAVERA is that they better capture the observed TC activity compared to their low-resolution counterparts (Roberts et al. 2020a). TCs are analysed in both atmosphere-only and coupled models in the present climate for the period 1950–2014 (with the exception that two of the HadGEM3-GC3.1 ensemble members have a time period of 1979–2014). We use as many ensemble members as available (information provided in Table 1) when producing the TC climatology for the climate models to reduce the uncertainty associated with the internal variability of the ensemble members in each model. We use both the atmosphere-only (forced with historical observed SST) and atmosphere–ocean coupled versions of the six PRIMAVERA models.

There is uncertainty in the observed interannual variability and trends of TCs due to the relatively short observational record of TCs on climate time scales over the last 100 years (Landsea 2007; Landsea et al. 2009). In addition, it is still not certain whether the agencies consistently record storms such as tropical depressions, monsoon depressions and subtropical cyclones in their best track data archives (Hodges et al. 2017). GCMs are able to complement the limited observational record available (Murakami et al. 2017). We use a total of 15 ensemble members (for atmosphere-only models) and 11 ensemble members (for atmosphere–ocean coupled models) for the six PRIMAVERA models over 65 years in the present climate (1950–2014). This is far greater than the length of the period covered by the ERA5 data from 1979 to 2020 (42 years). Thus, the sample size of TCs increases in the climate model data, making the statistics more reliable. Additionally, IBTrACS observations have temporal biases. In general, the TCs in IBTrACS have much shorter lifetimes than the tracks in the GCMs: the model tracks contain both the precursor and post-TC stages and are therefore longer than the observed ones (Hodges et al. 2017).

In addition to increasing the sample size of TCs, we want to assess whether the GCMs are suitable for simulating future TC activity in the BoB. We will have confidence in the GCMs’ ability to simulate BoB TCs well in the future climate if our evaluation shows that they simulate TC activity in the BoB effectively in the present climate.

2.2 Tropical cyclone detection and tracking algorithm

TCs in the simulations are identified and tracked directly using the TRACK algorithm (Hodges et al. 2017).

In the TC feature-tracking, all tropical disturbances in the Northern Hemisphere (NH) are tracked first. At this stage, all systems are tracked using their 6-hourly vertically-averaged (800, 700 and 600 hPa levels) relative vorticities. Triangular truncation is applied to retain wavenumbers in the range 6–63, which is termed T63 resolution. This spectral filtering helps to remove the vorticity noise present at the smallest spatial scales. The off-grid vorticity maxima exceeding \(5 \times 10^{-5}\ \mathrm{s}^{-1}\) at each timestep in the NH are identified and then initialised into tracks using a nearest neighbour approach. These tracks are then refined by minimising a cost function for track smoothness subject to adaptive constraints for displacement distance and track smoothness. Only systems having a minimum lifetime of 2 days are retained as tracks for the next step. In the next stage of the TRACK approach, identification methods are applied to differentiate between TCs and other tropical systems. TCs are isolated from all the other tracked systems by applying the following identification criteria for the intensity and warm core structure: (1) the 850 hPa vorticity at T63 resolution must exceed \(6 \times 10^{-5}\ \mathrm{s}^{-1}\); (2) to infer the warm core structure, the difference between the T63 lower (850 hPa) and upper (200 hPa) tropospheric vorticities is used, and this must be greater than \(6 \times 10^{-5}\ \mathrm{s}^{-1}\); (3) each level from 850 to 200 hPa is required to have a vorticity centre as an indication of coherent vertical structure in the TC; (4) criteria (1) to (3) must be satisfied for a minimum of 1 day over the ocean; and (5) the genesis location of the TC needs to be within 0–30°N. Other intensity variables, the 10 m windspeed and Mean Sea Level Pressure (MSLP), are then added to the tracks (Hodges et al. 2017).
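As an illustration only, the identification criteria (1) to (5) above can be sketched in Python as a filter applied to one already-tracked system. This is not the TRACK code: the function name, the argument layout (one value per 6-hourly track point) and the reading of "1 day" as four consecutive 6-hourly points are assumptions made for this sketch, and the spectral filtering and track construction are assumed to have been done beforehand.

```python
import numpy as np

VORT_850_MIN = 6e-5   # s^-1, criterion (1)
WARM_CORE_MIN = 6e-5  # s^-1, criterion (2): 850 hPa minus 200 hPa vorticity


def is_tropical_cyclone(vort850, vort200, has_centre_all_levels, over_ocean,
                        genesis_lat, min_steps=4):
    """Apply criteria (1)-(5) to a single track (hypothetical helper).

    All sequence arguments hold one value per 6-hourly track point.
    min_steps=4 assumes that "1 day" means four consecutive points.
    """
    vort850 = np.asarray(vort850, dtype=float)
    vort200 = np.asarray(vort200, dtype=float)
    ok = (
        (vort850 > VORT_850_MIN)                          # criterion (1)
        & ((vort850 - vort200) > WARM_CORE_MIN)           # criterion (2)
        & np.asarray(has_centre_all_levels, dtype=bool)   # criterion (3)
        & np.asarray(over_ocean, dtype=bool)              # over-ocean part of (4)
    )
    # Criterion (4): longest run of consecutive points meeting (1)-(3).
    longest = run = 0
    for flag in ok:
        run = run + 1 if flag else 0
        longest = max(longest, run)
    # Criterion (5): genesis latitude within 0-30 degrees N.
    return bool(longest >= min_steps and 0.0 <= genesis_lat <= 30.0)
```

The 2-day minimum lifetime is applied at the earlier tracking stage and is therefore not repeated in this filter.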

2.3 The large‑scale environmental processes associated with TC genesis

Pioneered by Gray (1979), a number of genesis indices have been developed which empirically relate the large-scale environmental variables linked to the genesis of TCs with the TC genesis frequency. One such widely used index is the Genesis Potential Index (GPI) (Emanuel and Nolan 2004; Camargo et al. 2007a, b) which is a product of four large-scale climatological variables:

$$\mathrm{GPI}=\text{Term}\,1 \times \text{Term}\,2\times \text{Term}\,3\times \text{Term}\,4 \tag{1}$$

where \(\text{Term}\,1 = \left| 10^{5} \eta \right|^{3/2}\), \(\text{Term}\,2 = \left( 1 + 0.1V_{\text{shear}} \right)^{-2}\), \(\text{Term}\,3 = \left( \frac{H}{50} \right)^{3}\) and \(\text{Term}\,4 = \left( \frac{V_{\text{pot}}}{70} \right)^{3}\).

Here, \(\eta\) is the absolute vorticity at 850 hPa in \(10^{-5}\ \mathrm{s}^{-1}\), \(H\) is the relative humidity at 600 hPa in percent, \(V_{\text{shear}}\) is the magnitude of the vector difference between the 200 and 850 hPa wind fields in \(\mathrm{m\,s}^{-1}\) and \(V_{\text{pot}}\) is the potential intensity (the theoretical maximum intensity that a TC can reach in its lifetime) in \(\mathrm{m\,s}^{-1}\). The maximum TC potential intensity (PI) is defined as (Bister and Emanuel 2002),

$$V_{\text{pot}}^{2}= C_{p}\left(T_{s}- T_{0}\right)\frac{T_{s}}{T_{0}}\frac{C_{k}}{C_{D}} \left(\ln\theta_{e}^{*}-\ln\theta_{e}\right) \tag{2}$$

In Eq. (2), \({C}_{p}\) is the heat capacity at constant pressure, \({T}_{s}\) is the temperature at the ocean surface, \({T}_{0}\) is the mean outflow temperature, \({C}_{k}\) is the exchange coefficient for enthalpy, \({C}_{D}\) is the drag coefficient, \({\theta }_{e}^{*}\) is the saturation equivalent potential temperature at the ocean surface and \({\theta }_{e}\) is the equivalent potential temperature at the boundary layer.
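For readers who want to reproduce the index, Eqs. (1) and (2) can be written as two short Python functions. This is a minimal sketch and not the code used in this study. It assumes that the vorticity is supplied in s⁻¹ (so that the factor 10⁵ in Term 1 rescales it) and that the surface and outflow quantities in Eq. (2) are already known; the function names, the default heat capacity of 1005 J kg⁻¹ K⁻¹ and the default exchange-coefficient ratio of 1 are illustrative assumptions.

```python
import math


def genesis_potential_index(abs_vort, rel_hum, v_shear, v_pot):
    """GPI from Eq. (1).

    abs_vort : absolute vorticity at 850 hPa (assumed here to be in s^-1)
    rel_hum  : relative humidity at 600 hPa (percent)
    v_shear  : 200-850 hPa vertical wind shear magnitude (m s^-1)
    v_pot    : potential intensity (m s^-1)
    """
    term1 = abs(1e5 * abs_vort) ** 1.5
    term2 = (1.0 + 0.1 * v_shear) ** -2
    term3 = (rel_hum / 50.0) ** 3
    term4 = (v_pot / 70.0) ** 3
    return term1 * term2 * term3 * term4


def potential_intensity(t_s, t_0, theta_e_star, theta_e,
                        ck_over_cd=1.0, c_p=1005.0):
    """Potential intensity (m s^-1) from Eq. (2).

    Temperatures are in kelvin. ck_over_cd and c_p are illustrative
    defaults chosen for this sketch, not values taken from the paper.
    """
    v_squared = (c_p * (t_s - t_0) * (t_s / t_0) * ck_over_cd
                 * (math.log(theta_e_star) - math.log(theta_e)))
    # Guard against a negative right-hand side in stable conditions.
    return math.sqrt(max(v_squared, 0.0))
```

Because the four terms are multiplied, the relative contribution of each term to a GPI bias can be examined by replacing one model term at a time with its reanalysis counterpart.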

Although much improvement has been made in terms of model horizontal resolution and physics, GCMs still poorly resolve the number of TCs in the model simulations when we use TC detection algorithms (trackers). TC frequency in the models is sensitive to the choice of trackers (Bourdin et al. 2022). These detection algorithms use certain thresholds related to intensity and structure which can be model sensitive. Low resolution models in particular underestimate the frequency of TCs. GCMs better simulate the large-scale environments than TC characteristics (properties of TCs) (Camargo et al. 2016). GPI can capture the main elements of the seasonal variability of TC frequency in every ocean basin (including BoB) based on observations (Camargo et al. 2007b; Emanuel 2013; Li et al. 2013) and can be used in conjunction with the TC detection algorithms for detecting the pattern of TC seasonal cycles in GCMs (Camargo et al. 2007a). GPI may help explain the shape of the seasonal cycle of the TC frequency in the models. However, GPI cannot capture the actual TC frequency in a month and the annual cycle of GPI is smoother than that of the TC frequency (for the annual cycle of the number of TCs, we observe a sharpness of the curve during the peak cyclone seasons) (Yang et al. 2021). In this research, we use GPI to explain physically the biases in the seasonal cycle of TC frequency in GCMs. First, we show that GPI can explain the variations of the annual cycle of BoB TC frequency in the observations. Because we find that TC and GPI are correlated in the observations in the BoB, we assume that GPI encapsulates the key parameters driving TC frequency in the BoB. For this reason, we use GPI as a tool to understand first if the environment is responsible for the biases in the seasonal cycle of TC frequency in GCMs and, if so, which environmental parameters in GPI are causing the bias. 
Note that, while GPI can help explain the basin-wide genesis, it cannot explain the intensity or tracks of TCs (Emanuel 2013).

We examine GPI and its components in the area covering the main development region (MDR) of the BoB. Main development regions are the source regions for around 50% or more of TC activity in any basin during the peak tropical cyclogenesis months (Waters et al. 2012). In some cases the MDR GPI is more useful than the basin-wide GPI, and the MDR-averaged GPI is able to explain the seasonal cycles of storm frequency in all basins that result from the favourable TC genesis conditions in the respective region (Bruyère et al. 2012; Li et al. 2013). We calculate the box-averaged climatological monthly GPI and its components for our analysis. The region covered by the box (5°–15°N and 80°–95°E) is the MDR for the TCs in the BoB and contains about 80% of the genesis locations of TCs for this region. Like the GPI and its components, the TC frequency in the models is also estimated for the MDR in the BoB. Any cyclone with a genesis location within the MDR is included in the TC frequency.
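The box averaging and the MDR genesis test described above could be implemented along the following lines. This is a sketch under stated assumptions rather than the code used here: the function names are hypothetical, the field is assumed to be on a regular latitude–longitude grid with latitude and longitude as the last two dimensions, and the optional cosine-latitude weighting is an assumption because the text does not say whether the box average is area-weighted. Missing values are not handled.

```python
import numpy as np

MDR_LAT = (5.0, 15.0)   # degrees N, from the box defined in the text
MDR_LON = (80.0, 95.0)  # degrees E, from the box defined in the text


def in_mdr(lat, lon):
    """True if a genesis location falls inside the BoB MDR box."""
    return (MDR_LAT[0] <= lat <= MDR_LAT[1]
            and MDR_LON[0] <= lon <= MDR_LON[1])


def mdr_box_mean(field, lats, lons, weight_by_area=False):
    """Average a (..., lat, lon) field over the MDR box.

    weight_by_area=True applies cos(latitude) weights (an assumption).
    """
    field = np.asarray(field, dtype=float)
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    lat_mask = (lats >= MDR_LAT[0]) & (lats <= MDR_LAT[1])
    lon_mask = (lons >= MDR_LON[0]) & (lons <= MDR_LON[1])
    box = field[..., lat_mask, :][..., :, lon_mask]
    zonal_mean = box.mean(axis=-1)  # average over longitude first
    if weight_by_area:
        weights = np.cos(np.deg2rad(lats[lat_mask]))
    else:
        weights = np.ones(int(lat_mask.sum()))
    return np.average(zonal_mean, axis=-1, weights=weights)
```

Applied to a field with a leading month dimension, `mdr_box_mean` returns one box-averaged value per month.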

The magnitude of TC frequency varies more from basin to basin than does the magnitude of GPI in observations. Although a larger size of the basin is responsible for a greater variation in the scale of TC number (e.g., in the western North Pacific) for some basins, the relationships between GPI and the average number of cyclones are inconsistent amongst some basins in the observations (Camargo et al. 2007b). Similarly, it is possible that there is an inconsistency in the relationships between GPI and TC frequency amongst the models we analyse.

We use several ensemble members (Table 1) to estimate the cyclone frequency in the models, while we use just one member for the calculation of GPI and its components. For each model, we calculate the ensemble-averaged TC frequency using all the ensemble members that are available for that model. The GPI can be estimated robustly from a single member because it is based on large-scale environmental fields. We confirmed this for the HadGEM3-GC3.1 model, for which we calculated the climatological monthly GPI and its components first using just one ensemble member and then using the several ensemble members that were available; the two approaches made no difference to the result. The TC frequency, in contrast, requires several members because TCs are rare events in GCM simulations and a single member may not provide a large enough sample.
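A minimal sketch of the ensemble-averaged climatological monthly TC frequency is given below. The function names and the input layout (a list of genesis months plus the number of simulated years for each member) are assumptions for illustration. Each member is normalised by its own length before averaging, which is one reasonable choice when members cover different periods; the averaging actually used in this study may differ.

```python
import numpy as np


def monthly_frequency(genesis_months, n_years):
    """Climatological number of TCs per year for each calendar month.

    genesis_months : month (1-12) of genesis for every TC in one member
    n_years        : number of simulated years in that member
    """
    months = np.asarray(genesis_months, dtype=int)
    if months.size and (months.min() < 1 or months.max() > 12):
        raise ValueError("months must lie in 1..12")
    counts = np.bincount(months - 1, minlength=12)
    return counts / float(n_years)


def ensemble_mean_frequency(members):
    """Average the monthly frequency over ensemble members.

    members : iterable of (genesis_months, n_years) pairs, one per member
    """
    per_member = [monthly_frequency(m, n) for m, n in members]
    return np.mean(per_member, axis=0)
```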

Because we are mostly interested in the seasonal variations (especially during the pre- and post-monsoon seasons) rather than the total number of TCs, we recalibrate the TC intensity categories in the models as described in the next section.

2.4 Windspeed calibration

In our research, the Saffir–Simpson scale is used to classify cyclones on the basis of the intensities of their maximum lifetime sustained winds and minimum lifetime MSLPs. Table 2 provides information about how the cyclones are grouped into tropical storms and various stronger categories in terms of their windspeed and MSLP ranges in the Saffir–Simpson scale.

Table 2 Classification of Saffir-Simpson storm intensity categories depending on their minimum lifetime MSLP and maximum lifetime 1-min sustained windspeed (Roberts et al. 2020a) and the percentile range corresponding to each category as used in this research

The typical width of the TC eye-wall (where maximum wind speed is found) is approximately 10 km. The horizontal grid-spacing (~ 25 km) of the high-resolution GCMs is still insufficient to capture the maximum intensity occurring along the eye-wall, and even the current state-of-the-art high-resolution GCMs underestimate the intensity (Moon et al. 2020).

As intensities are severely underestimated in all climate models, we calibrate the maximum sustained windspeed of TCs in the models with the IBTrACS data to facilitate comparison to observations. Another reason for the calibration is to understand whether GCMs can capture the fact that the pre-monsoon season has a higher ratio of intense cyclogenesis to total TC frequency compared to that in the post-monsoon season. Despite the fact that models are unable to represent the correct distribution of observed TC intensity (not shown), we assume that their proportions of TCs with relatively higher or lower intensities can be compared with similar proportions in the observations, allowing us to determine whether they can capture the fact that higher super cyclone (Categories 4 and 5 in the Saffir–Simpson scale) ratio occurs during the pre-monsoon (in the same proportion as in the observations). To demonstrate that and/or to explain why models fail to capture this feature, we recalibrate the TC intensity. The calibration technique simply adjusts the value of the maximum wind. To do the calibration, as a first step, we estimate the percentile values corresponding to the lower and upper bounds (in terms of the TC windspeed) of each category in the Saffir–Simpson scale for the IBTrACS data (Table 2). Then we find the respective TC maximum intensities in each model that correspond to the same percentile values in the IBTrACS (Table 3). We then set these model-specific intensities to be the new adjusted boundaries for each intensity range. One thing to be noted is that the calibration only determines the windspeed range corresponding to each category in the Saffir-Simpson scale for the models in relation to the IBTrACS data. So, the calibration technique only affects the category of model TCs based on intensity and the model TC frequency is unaffected by this. The seasonal cycle of TC frequency will be the same regardless of applying the calibration technique or not.
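The percentile-matching steps above can be sketched as follows. This is an illustrative implementation, not the code used in this study: the function names are hypothetical, the category bounds are passed in by the caller rather than hard-coded, and the percentile of a bound is taken here as the fraction of observed lifetime-maximum windspeeds lying below it, which is one of several possible conventions.

```python
import numpy as np


def calibrated_thresholds(obs_winds, model_winds, obs_thresholds):
    """Map observed category bounds onto a model's windspeed distribution.

    obs_winds      : lifetime-maximum windspeeds from the best track data
    model_winds    : lifetime-maximum windspeeds from one model
    obs_thresholds : category lower bounds in the observed scale (ascending)
    """
    obs = np.asarray(obs_winds, dtype=float)
    model = np.asarray(model_winds, dtype=float)
    adjusted = []
    for bound in obs_thresholds:
        # Percentile of this bound in the observed distribution.
        pct = 100.0 * np.mean(obs < bound)
        # Model windspeed sitting at the same percentile.
        adjusted.append(float(np.percentile(model, pct)))
    return adjusted


def categorise(winds, thresholds):
    """Category index per storm: 0 is below the first bound, 1 the next."""
    return np.searchsorted(np.asarray(thresholds, dtype=float),
                           np.asarray(winds, dtype=float), side="right")


def intense_ratio(winds, threshold):
    """Fraction of storms with windspeed at or above the given bound."""
    return float(np.mean(np.asarray(winds, dtype=float) >= threshold))
```

Only the category boundaries change; the number of storms in `model_winds` is untouched, which is why the seasonal cycle of TC frequency is unaffected by the calibration. Computing `intense_ratio` separately for pre-monsoon and post-monsoon storms with the calibrated bound gives a model counterpart of the super cyclone ratio.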

Table 3 The lower and upper bounds of windspeed limits to classify storm categories according to the Saffir-Simpson scale obtained for each model after windspeed calibration in the respective model with IBTrACS

Moon et al. (2020) constructed a horizontal-resolution dependent windspeed adjustment factor for GCMs using a more realistic asymmetric wind field profile of the TC. To test this alternative calibration method in our study, we multiplied the lifetime TC maximum intensity of the PRIMAVERA GCMs having horizontal grid-spacings of 25 km and 50 km by the adjustment factors 1.29 and 1.365 (derived by Moon et al. 2020) respectively (not shown). However, we find lower windspeed values following this technique than the aforementioned calibration technique. So, throughout most of the paper, we will describe the results found by incorporating the first calibration method mentioned above which uses IBTrACS wind speed percentiles.

As another alternative, we classify the cyclones based on their minimum lifetime MSLPs (similar to the classification using the maximum sustained windspeed but without any calibration) (Sect. 3.1.1). We did not find major differences between the TC seasonal cycles estimated using windspeeds and MSLPs, so in this paper we discuss only the windspeed-based results.

3 Results

3.1 Tropical cyclone activity in observations and models and comparison between the atmosphere-only and coupled versions of the models over the BoB

3.1.1 Relation between TC frequency and GPI

First, we evaluate how well all the models (both the atmosphere-only and coupled versions) in PRIMAVERA represent the observed TC activity (frequency and intensity), with a particular focus on the difference between the pre- and post-monsoon seasons. Additionally, we use the empirical index GPI as a tool to assess the TC climatology in observations and model simulations. We explore if GPI has the capability to reproduce the seasonal cycle of the cyclone frequency first in the observations and then in the models. Figure 1 shows the seasonal variations of the number of TCs per year by category according to Saffir–Simpson scale from IBTrACS and the climatological monthly GPI values from ERA5. Figure 2 represents the seasonal variations of the calibrated (Sect. 2.4) TC frequency by category according to the adjusted Saffir–Simpson wind speed scale and the climatological monthly GPI values for the atmosphere-only models. Figure 3 shows the same information as Fig. 2 for the coupled versions of the models.

Fig. 1 Seasonal cycles of the number of cyclones per year (during the period 1979–2020) and GPI (during the period 1979–2018) in the BoB for the observational data. The grey, blue, green, yellow, orange and red colours in the bars correspond to the monthly cyclone frequency of the tropical storms and Categories 1, 2, 3, 4 and 5 respectively according to the Saffir–Simpson scale. The climatological monthly number of cyclones per year is estimated using the IBTrACS data. The overlaid black solid curve represents the seasonal cycle of the climatological GPI and is calculated using the ERA5 reanalysis data. The purple numbers represent the super cyclone ratios in pre-monsoon (left) and post-monsoon (right) seasons. The left and right vertical axes are for the number of cyclones per year and GPI values respectively

Fig. 2 Seasonal cycles of the number of cyclones per year and GPI in the BoB for (a) CMCC-CM2, (b) CNRM-CM6.1, (c) EC-Earth3P, (d) ECMWF-IFS, (e) HadGEM3-GC3.1 and (f) MPI-ESM1.2 atmosphere-only models in PRIMAVERA in the present climate (1950–2014). The grey, blue, green, yellow, orange and red colours in the bars correspond to the monthly cyclone frequency of the tropical storms and Categories 1, 2, 3, 4 and 5 respectively according to the wind-speed-calibrated Saffir–Simpson scale (Sect. 2.4). The overlaid grey and black solid curves represent the seasonal cycles of the climatological GPI using the ERA5 reanalysis data and climatological model GPI respectively. The left and right vertical axes are for the number of cyclones per year and GPI values respectively

Fig. 3 Seasonal cycles of the number of cyclones per year and GPI in the BoB for (a) CMCC-CM2, (b) CNRM-CM6.1, (c) EC-Earth3P, (d) ECMWF-IFS, (e) HadGEM3-GC3.1 and (f) MPI-ESM1.2 coupled models in PRIMAVERA in the present climate (1950–2014). The grey, blue, green, yellow, orange and red colours in the bars correspond to the monthly cyclone frequency of the tropical storms and Categories 1, 2, 3, 4 and 5 respectively according to the wind-speed-calibrated Saffir–Simpson scale (see Sect. 2.4). The overlaid grey and black solid curves represent the seasonal cycles of the climatological GPI using the ERA5 reanalysis data and climatological model GPI respectively. The left and right vertical axes are for the number of cyclones per year and GPI values respectively

From Figs. 1, 2 and 3, it is apparent that the GPI captures the seasonal variation of the TC frequency over the Bay of Bengal in both the observations (Fig. 1) and most of the models (Figs. 2 and 3). For some models in Figs. 2 and 3, there is an inconsistency between the magnitude of GPI and the genesis frequency in a particular season. Perhaps the most striking feature is that all the atmosphere-only (Fig. 2) and coupled (Fig. 3) models, except the HadGEM3-GC3.1 model in Fig. 2e, capture the double peaks of the Bay of Bengal TCs seen in the observations (Fig. 1) in the pre-monsoon and post-monsoon seasons (if we ignore the fact that, compared to the observations, all models have a higher number of cyclones per year in the first few months of the year, with these cyclones having relatively low intensity).

However, if we compare Figs. 1 and 2b, it is evident that the difference in cyclone frequency between the post-monsoon and pre-monsoon seasons is reduced in the CNRM-CM6.1 atmosphere-only model (Fig. 2b) compared to the observations (Fig. 1). The difference in the magnitude of the monthly GPIs between the post-monsoon and pre-monsoon seasons is also smaller in the CNRM-CM6.1 atmosphere-only model (Fig. 2b) compared to the observations (Fig. 1). The same holds for the CNRM-CM6.1 coupled model (Fig. 3b).

For the HadGEM3-GC3.1 model, both in the atmosphere-only and coupled versions (Figs. 2e and 3e respectively), the cyclone frequency in the pre-monsoon season is quite low compared to observations while the post-monsoon season is more similar to observations, and so the difference in the TC frequency between the post-monsoon and pre-monsoon seasons is very large. The magnitude of pre-monsoon GPI is also the lowest for this model among all the models. The MPI-ESM1.2 model (Figs. 2f and 3f) performs very poorly: its monthly cyclone frequency is quite low compared to the observations (especially for the coupled version) and is also very low compared to the other five models. The track density (the mean number of tracks per month through a 4° cap at each point on a common grid) in the models shows variability compared to the observations (Figs. S1 [atmosphere-only] and S2 [coupled] in the supplementary information file).

For the atmosphere-only models in Fig. 2, it is clear that the relationship between the annual mean climatological GPI and annual mean climatological TC frequency is unique to each model. For example, in CNRM-CM6.1 (Fig. 2b), there is a positive change in the climatological GPI while a negative change is found in TC frequency relative to the values in the CMCC-CM2 model (Fig. 2a). This is also true for the coupled models (Fig. 3): each coupled model has its own distinct relationship between climatological GPI and TC frequency. These results are in line with a previous study by Camargo et al. (2007a), who investigated the relationship between the climatological GPI and climatological TC frequency for several atmosphere-only GCMs. They also found that, for TCs formed over the NIO, a larger climatological GPI in one model compared to others does not necessarily lead to a larger climatological TC frequency in that model.

By comparing the atmosphere-only and coupled models, we see that except for the HadGEM3-GC3.1 model, TC frequency in January decreases for all the coupled models compared to the atmosphere-only version of the respective models. The total annual-mean TC frequency is less in all coupled models (except for the HadGEM3-GC3.1 coupled model in which the annual frequency increases slightly). The difference in the cyclone frequency between the post-monsoon and pre-monsoon peaks decreases for all coupled models in comparison to the respective atmosphere-only versions.

There is a phase shift in the seasonal cycles of GPI for most of the coupled GCMs compared to their atmosphere-only counterparts. All coupled models (except the MPI-ESM1.2 model) have the peak GPI in October during the post-monsoon, compared to November for observations and most atmosphere-only models. All coupled models (except the HadGEM3-GC3.1) also have the pre-monsoon peak GPI in May, as in the observations, whereas half of the atmosphere-only models have this peak in April. In terms of the TC frequency, during the pre-monsoon season, the peak is in May for the MPI-ESM1.2 coupled model while in the MPI-ESM1.2 atmosphere-only model the peak is in April. Apart from this model, all other models have the same peak months for the TC frequency during the pre-monsoon and post-monsoon.

When we compare the atmosphere-only models (Fig. 2) with their coupled counterparts (Fig. 3), we also find that GPI and TC frequency change in opposite directions in some models (suggesting that processes other than the environmental parameters used in GPI play a role, with ocean feedback being one candidate) and in the same direction in the rest of the models. A positive change in the climatological GPI of the coupled version compared to the respective atmosphere-only counterpart of a model does not necessarily correspond to a positive change in climatological TC frequency in the coupled version, and vice versa.

Overall, we argue that the model GPI represents well the combined effect of the four large-scale environmental processes associated with the genesis of the cyclones in both the atmosphere-only and coupled versions of the models. Further details evaluating the relationship between the model GPI and model TC number (with storms classified according to the Saffir-Simpson scale based on their windspeeds and MSLPs) are provided in the supplementary information file (Figs. S3 and S4). Figure S1 in the supplementary information file shows that GPI captures a large fraction of the variation of the seasonal cycle of the BoB TCs during the pre-monsoon and post-monsoon seasons in the PRIMAVERA GCMs (both the atmosphere-only and coupled versions) when we use calibrated windspeeds. Figure S2 presents an equivalent figure but for the intensity classified based on MSLPs, and we find the same result as in Figure S1.

3.1.2 TC intensity

We now contrast model TC intensity in the pre-monsoon (April–May) and post-monsoon (October–December) seasons. As already mentioned, the super cyclone ratio is higher in the pre-monsoon period over the Bay of Bengal (Li et al. 2019) than in the post-monsoon period in observations (Fig. 1). We now investigate if this characteristic remains true in the models. Even though the atmosphere-only and coupled HadGEM3-GC3.1 models have low TC frequencies in the pre-monsoon season in comparison with the post-monsoon season, and the MPI-ESM1.2 atmosphere-only model (Fig. 2f) underestimates both TC frequency and intensity compared to the observations, after calibration all atmosphere-only and coupled models have a larger super cyclone ratio in the pre-monsoon season, as in the observations (Table 4). To our knowledge, this is the first evaluation of the super cyclone ratio in the BoB in GCMs.

Table 4 Super cyclone ratios (after applying the calibration technique) in the six PRIMAVERA (both atmosphere-only and coupled) GCMs during the pre-monsoon and post-monsoon seasons

In Table 4, if we compare how well the atmosphere-only and coupled models simulate the TC intensity (after calibration), we find lower super cyclone ratios in all of the coupled models (except HadGEM3-GC3.1 in the post-monsoon) compared to their atmosphere-only counterparts during the pre-monsoon and post-monsoon seasons. Interestingly, in the MPI-ESM1.2 coupled model (Fig. 3f), June has category 5 cyclones (unlike the atmosphere-only MPI-ESM1.2 [Fig. 2f]), and June and July contain all the super cyclones formed in a year. The TC intensity captured by this model (before windspeed calibration) is also very low compared to the other five models (not shown). In most cases, without calibration the models cannot produce super cyclones in the BoB, and even where they do, the super cyclone ratio is low (see supplementary information file Table S2).

Another noteworthy feature of Table 4 is that all models (except the atmosphere-only EC-Earth3P) after calibration underrepresent the inter-seasonal contrast (between the pre- and post-monsoon) in the super cyclone ratio compared to that in the observations (Fig. 1). That is, the ratio of the pre-monsoon to the post-monsoon super cyclone ratio is less than the observed inter-seasonal contrast of ~ 3 (see the purple numbers in Fig. 1). In general, coupling further reduces this contrast (except in the coupled MPI-ESM1.2 model, mainly because this model cannot capture super cyclones during the post-monsoon at all, even after the calibration).
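The two quantities compared in Table 4 can be illustrated with a minimal sketch. It assumes the super cyclone threshold of 60 m s−1 quoted in the Introduction and takes the inter-seasonal contrast to be the pre-monsoon ratio divided by the post-monsoon ratio. The function names and the wind values are hypothetical and are not taken from the PRIMAVERA data.

```python
SUPER_CYCLONE_THRESHOLD = 60.0  # m/s, threshold quoted in the Introduction


def super_cyclone_ratio(peak_winds):
    """Fraction of storms whose lifetime-maximum wind reaches the threshold."""
    if not peak_winds:
        return float("nan")
    n_super = sum(1 for w in peak_winds if w >= SUPER_CYCLONE_THRESHOLD)
    return n_super / len(peak_winds)


def inter_seasonal_contrast(pre_winds, post_winds):
    """Pre-monsoon super cyclone ratio divided by the post-monsoon ratio."""
    post_ratio = super_cyclone_ratio(post_winds)
    if post_ratio == 0:
        return float("inf")  # no post-monsoon super cyclones at all
    return super_cyclone_ratio(pre_winds) / post_ratio


# Hypothetical lifetime-maximum winds (m/s), invented for illustration only.
pre_monsoon = [25.0, 62.0, 40.0, 65.0]                # 2 of 4 are super cyclones
post_monsoon = [30.0, 61.0, 45.0, 35.0, 50.0, 28.0]   # 1 of 6 is a super cyclone

pre_ratio = super_cyclone_ratio(pre_monsoon)
post_ratio = super_cyclone_ratio(post_monsoon)
contrast = inter_seasonal_contrast(pre_monsoon, post_monsoon)
```

In this invented sample the post-monsoon season has more storms but a smaller share of super cyclones, so the contrast is greater than one. A season with no super cyclones in the denominator makes the contrast unbounded, which is the situation described for the coupled MPI-ESM1.2 model.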

3.2 Climatological seasonal variability of GPI and its related large-scale environment

Here, we further explore the following question in greater detail: do higher values of GPI in a given month of the year lead to higher TC frequency and/or intensity?

3.2.1 Relation between seasonal cycles of fractional TC frequency and GPI

Figure 4 shows the seasonal cycle of the fractional TC frequency (the number of monthly TCs divided by the total over all months, Fig. 4b), GPI (Fig. 4a), and the GPI components (Fig. 4c–f) for all the atmosphere-only models as well as the observations. We use fractional TC frequency to account for the large differences in annual-mean TC frequency, which we know cannot be explained by GPI (see above). If we compare Fig. 4a, b, it is evident that GPI cannot explain the actual fractional TC frequency in the models. Rather, GPI only explains the relative size of the pre-monsoon and post-monsoon TC peaks (especially during the post-monsoon period): it correctly represents the relative magnitude of fractional TC frequency in the pre-monsoon compared to the post-monsoon season. Table 5 compares the relative magnitude of seasonal peaks of cyclone frequency and GPI with observations and supports these findings. Table 5 shows that, in most of the models (both atmosphere-only and coupled) and relative to the observations, a larger ratio of pre-monsoon to post-monsoon climatological average GPI predicts a larger ratio of pre-monsoon to post-monsoon climatological average TC frequency, and a smaller GPI ratio predicts a smaller TC frequency ratio. This table also shows that in all models except the CNRM-CM6.1 (atmosphere-only) model, the ratios of GPI and TC frequency are less than one, i.e. the models have higher values in the post-monsoon, as in the observations. Below, we explore which environmental parameters are responsible for the differences in GPI and fractional TC frequency between the pre-monsoon and post-monsoon peaks.
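As an illustration of the two diagnostics used here, the sketch below computes a fractional TC frequency and a pre-monsoon to post-monsoon ratio from twelve monthly values. It assumes the season definitions of Sect. 3.1.2 (April–May and October–December) and assumes that the seasonal ratio is taken between seasonal means; the exact averaging behind Table 5 is not spelled out in this section. The function names and monthly counts are hypothetical.

```python
import numpy as np

# Month indices (0 = January). Assumed seasons: pre-monsoon April-May,
# post-monsoon October-December.
PRE_MONSOON = [3, 4]
POST_MONSOON = [9, 10, 11]


def fractional_frequency(monthly_counts):
    """Monthly TC counts divided by the total over all twelve months."""
    counts = np.asarray(monthly_counts, dtype=float)
    if counts.shape != (12,):
        raise ValueError("expected one value per calendar month")
    total = counts.sum()
    if total == 0:
        raise ValueError("no cyclones in the climatology")
    return counts / total


def pre_to_post_ratio(monthly_values):
    """Ratio of the pre-monsoon mean to the post-monsoon mean."""
    values = np.asarray(monthly_values, dtype=float)
    return values[PRE_MONSOON].mean() / values[POST_MONSOON].mean()


# Hypothetical climatological cyclones per year for each month (invented).
tc_per_year = [0.1, 0.0, 0.0, 0.2, 0.6, 0.3, 0.2, 0.1, 0.3, 0.9, 1.0, 0.5]

frac = fractional_frequency(tc_per_year)
ratio = pre_to_post_ratio(tc_per_year)
ratio_from_frac = pre_to_post_ratio(frac)
```

Because the fractional frequency only rescales the monthly counts by a constant, the seasonal ratio is the same whether it is computed from raw counts or from fractions. This is why the ratio can be compared across models with very different annual totals.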

Fig. 4

Comparison of the seasonal cycle of a GPI, b fractional TC frequency, c RH term, d absolute vorticity term, e vertical windshear term and f PI term in GPI of all the atmosphere-only models with ERA5 reanalysis

Table 5 Pre-monsoon and post-monsoon seasons average GPI and TC frequency ratios in the six PRIMAVERA (both atmosphere-only and coupled) GCMs

We qualitatively assess which term contributes the most to the model bias in the pre-monsoon and post-monsoon for the BoB TCs. From Fig. 4, it is clear that the CNRM-CM6.1 model has the largest bias in GPI (Fig. 4a), which occurs in the pre-monsoon season. One notable contribution to this bias is the RH term of GPI (Fig. 4c). The second largest bias in GPI occurs in the MPI-ESM1.2 model (Fig. 4a), which also has a large RH bias in the pre-monsoon season (similar to CNRM-CM6.1) and, to a lesser extent, in the post-monsoon season (Fig. 4c). EC-Earth3P also tends to overestimate GPI from May to December (Fig. 4a). In terms of the TC frequency (Fig. 4b), the largest bias occurs for HadGEM3-GC3.1 during the pre-monsoon season and for CNRM-CM6.1 during the post-monsoon season.

The role of each component of GPI in the biases of TC seasonal variation in the coupled models is largely similar to that in the atmosphere-only models (see supplementary information file Fig. S5).

3.2.2 TC intensity

We also analyse the PI, which is one component of GPI and which predicts the maximum intensity that any TC could theoretically reach at a given location and time. Analysing the PI may help to explain the possible intensification of TCs (Emanuel 2000; Kossin and Camargo 2009). We examine the relationship between PI and the super cyclone ratio in the models. Our hypothesis is that, for storms in the BoB basin, the pre-monsoon season has lower GPI and higher PI, while the post-monsoon season has higher GPI and lower PI. As can be seen from Fig. 4a (GPI) and 4f (PI), our hypothesis appears to be true except for the CNRM-CM6.1 GPI (in which case the difference in GPI between the post-monsoon and pre-monsoon is very small).

The relationship between PI and the super cyclone ratio in the coupled models is fairly similar to that in the atmosphere-only models (see Fig. S5 in the supplementary information file).

3.3 Contributions of large‑scale environmental processes in GPI to model biases

3.3.1 Atmosphere-only models

Section 3.2.1 is a qualitative assessment, but here we present a quantitative assessment of the terms in GPI contributing the most to the model biases in the pre-monsoon and post-monsoon BoB TCs. Li et al. (2013) explored the relative contributions of each term in the GPI to quantitatively assess which term contributes the most to the bimodal feature of the BoB TCs. They used the National Centers for Environmental Prediction (NCEP)-National Center for Atmospheric Research (NCAR) reanalysis for the period 1981–2009. They found that the combined effect of enhanced vertical wind shear, vorticity and SST counteract the effect of increased RH in the monsoon season and inhibit TC formation (Fig. 3 in Li et al. 2013). They argued that RH is responsible for the difference in the seasonal variation of the climatological TC frequency during the pre-monsoon and post-monsoon periods. We carry out a similar analysis but using the ERA5 data for the period 1979–2020. We reach the same conclusions made by Li et al. (2013) (not shown). We here further explore the relative contributions of each individual genesis variable. This will help us to understand the relationship between tropical cyclogenesis and the large-scale environmental conditions and the mechanisms of how the environment influences TC genesis over the BoB in the models.

We now quantitatively examine in detail the independent contributions by the four terms in GPI to the model biases for all the atmosphere-only models. Each panel in Fig. 5 represents the monthly relative contribution of each term in GPI to the total GPI bias for each particular atmosphere-only model, with ERA5 taken as a reference. For example, the relative change in Term1 of GPI is given by Eq. (3).

Fig. 5

The climatological monthly contribution of the relative difference of each term in GPI to the model biases and their sum for the a CMCC-CM2, b CNRM-CM6.1, c EC-Earth3P, d ECMWF-IFS, e HadGEM3 and f MPI-ESM1.2 (atmosphere-only) models in the Bay of Bengal. Different colour bars correspond to different environmental variable terms in the GPI. The black dashed line is the sum of the biases of the four terms in GPI and the red solid line is the actual GPI bias (relative difference) in the model

$$\delta \mathrm{Term1}_{rel} = \frac{\mathrm{Term1}_{Model} - \mathrm{Term1}_{ERA5}}{\mathrm{Term1}_{ERA5}} \tag{3}$$

The results in Fig. 5 can be interpreted quantitatively as follows. For example, in Fig. 5b, the relative difference in the RH term for CNRM-CM6.1 is around 1 in April, and in Fig. 4c the GPI RH term for ERA5 is nearly 1 in the same month. This means that, for this month, the model RH term is nearly twice the RH term in the ERA5 reanalysis.
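The interpretation above can be checked with a short numerical sketch of Eq. (3). It assumes, as stated in the conclusions, that GPI is a product of its four terms; the term values below are invented for illustration and the function name is hypothetical. The sketch also compares the sum of the four relative differences with the relative difference of the product, since these two quantities are not identical in general.

```python
import numpy as np


def relative_difference(model, reference):
    """Eq. (3): (model - reference) / reference, element-wise."""
    model = np.asarray(model, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return (model - reference) / reference


# Hypothetical values of the four GPI terms for one month (invented).
# Order: vorticity, RH, PI, wind shear.
era5_terms = np.array([1.0, 1.0, 2.0, 0.5])
model_terms = np.array([0.9, 2.0, 2.2, 0.6])

term_bias = relative_difference(model_terms, era5_terms)

# Assumption: GPI is the plain product of its four terms.
gpi_bias = relative_difference(model_terms.prod(), era5_terms.prod())
sum_of_term_biases = term_bias.sum()

# Exact identity linking the two: prod(1 + delta_i) - 1.
gpi_bias_from_terms = (1.0 + term_bias).prod() - 1.0
```

Here the RH term has a relative difference of 1, meaning the model value is twice the reference value. The summed term biases (1.2) differ from the bias of the product (1.376) because of cross terms between the individual relative differences; the two agree closely only when all term biases are small.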

In the pre-monsoon season, the RH term (green bars) contributes the most to the GPI bias in CNRM-CM6.1 and MPI-ESM1.2 while the vorticity term dominates in HadGEM3 (Fig. 5). Due to the increase of RH term (which could be linked to an increased GPI) in the CNRM-CM6.1 during the pre-monsoon, it is possible therefore that this model produces almost the same magnitude of GPI during the pre-monsoon and post-monsoon seasons (Fig. 4a). The negative RH bias (which is small in magnitude) in the post-monsoon for the CNRM-CM6.1 model might also contribute to the magnitude of GPI being almost the same during the pre-monsoon and post-monsoon seasons. For the HadGEM3 (Fig. 5e), strong negative vorticity bias in the pre-monsoon is responsible for a reduction in GPI during this season.

In the post-monsoon season, the windshear term (blue bars) contributes significantly and positively to the GPI bias in all models (Fig. 5a–f). The positive GPI windshear term (inverse of actual windshear) bias is likely to increase the model fractional TC frequency during the post-monsoon season compared to the observations in all models. In observations, vertical windshear in the BoB is lower during the pre-monsoon and post-monsoon seasons than during the monsoon season, which accounts for the much lower TC frequency during the monsoon relative to the pre-monsoon and post-monsoon (Li et al. 2013; Roose et al. 2022). Interestingly, EC-Earth3P produces very little bias during both the pre-monsoon and post-monsoon periods although there are notable biases in other months. CMCC-CM2 and ECMWF-IFS also have small biases during the pre-monsoon and post-monsoon (with both models having negligible biases during the pre-monsoon season).

3.3.2 Coupled models

Figure 6a–f shows the monthly relative contribution of each term in GPI to the total GPI bias for the coupled versions of the six PRIMAVERA models with ERA5 taken as reference. As for the atmosphere-only models presented in Fig. 5, analysis of Fig. 6 reveals that the vertical windshear term contributes the most to the model GPI biases in the coupled GCMs during the post-monsoon period. Also like their atmosphere-only counterparts (Fig. 5), the environmental term in the GPI that contributes the most to the model GPI bias in the pre-monsoon season differs across the coupled models. It is interesting that the PI term in GPI makes a substantial contribution to the GPI bias during the pre-monsoon for the models in Fig. 6 (except the CMCC-CM2 and ECMWF-IFS models). Whether the PI bias is positive or negative in a particular model, the PI term bias in the coupled version is always much stronger than the PI bias in the atmosphere-only counterpart of the respective model (except for the CMCC-CM2 model).

Fig. 6

The climatological monthly contribution of the relative difference of each term in GPI to the model biases and their sum for the a CMCC-CM2, b CNRM-CM6.1, c EC-Earth3P, d ECMWF-IFS, e HadGEM3 and f MPI-ESM1.2 (coupled) models in the Bay of Bengal. Different colour bars correspond to different environmental variable terms in the GPI. The black dashed line is the sum of the biases of the four terms in GPI and the red solid line is the actual GPI bias (relative difference) in the respective model

3.4 Evaluation of large-scale environmental factors influencing TC genesis

We want to explore the possible causes of the model biases in GPI components discussed above. For this reason, we investigate the geographical distribution of the various related large-scale environmental variables and then try to link them to the GPI component biases in the models. We perform this analysis separately for the atmosphere-only and coupled models.

We carry out a specific analysis for each of the six PRIMAVERA models. The models we discuss most are the CNRM-CM6.1, EC-Earth3P, HadGEM3 and MPI-ESM1.2 models, as these have comparatively larger biases (in terms of GPI and fractional TC frequency) than the other two models (especially for the atmosphere-only versions).

As discussed in Sect. 3.3, different GPI components tend to contribute the most to the GPI biases in different models. However, the possible causes responsible for a particular environmental term bias are consistent across the models.

3.4.1 Atmosphere-only models

3.4.1.1 RH bias in the pre-monsoon season

For both the CNRM-CM6.1 and MPI-ESM1.2 atmosphere-only models, during the pre-monsoon, the RH term in the GPI contributes the most (and positively) to the model GPI bias. As shown in Fig. 7, for both models, in the BoB box of MDR, positive RH biases (Fig. 7a, c) and positive precipitation biases are observed (Fig. 7b, d) during the pre-monsoon season, suggesting GPI biases might arise from the wrong representation of convection over the Maritime Continent, albeit with slightly different underlying mechanisms. In addition, we find that in the CNRM-CM6.1 (Fig. 7b), there is a low-level convergence bias over the MDR which coincides with the area of positive precipitation anomaly over the same area in the model during this season. For the MPI-ESM1.2, there is a cyclonic low-level circulation around the western edge of the box, which together with the positive rainfall bias over the Bay of Bengal, could suggest a too-early onset of the monsoon.

Fig. 7

Spatial distributions of model biases in the a 600 hPa RH (contours) (CNRM-CM6.1 atmosphere-only), b 850 hPa wind (vectors) and precipitation (contours) (CNRM-CM6.1 atmosphere-only) c 600 hPa RH (contours) (MPI-ESM1.2 atmosphere-only), d 850 hPa wind (vectors) and precipitation (contours) (MPI-ESM1.2 atmosphere-only) in the pre-monsoon season

3.4.1.2 Vorticity bias in the pre-monsoon season

For atmosphere-only HadGEM3-GC3.1, vorticity is the dominant term leading to a negative GPI bias in the pre-monsoon season. We find negative biases in TC frequency, GPI and the vorticity term in GPI during the pre-monsoon season. Figure 8a, b present the geographical distributions of the low level (850 hPa) vorticity component in GPI during the pre-monsoon season for ERA5 reanalysis and HadGEM3-GC3.1 model bias, respectively. From the figure, it is apparent that there is positive vorticity in the MDR over the BoB for the ERA5 data (Fig. 8a). However, the HadGEM3-GC3.1 vorticity bias is negative in that region (Fig. 8b), which would cause less seeding and/or growth of cyclonic depressions that might otherwise develop into TCs at a later stage.

Fig. 8

Spatial distributions of the a Vorticity component at 850 hPa in GPI (contours) for observations (ERA5), b Vorticity component at 850 hPa in GPI (contours) for HadGEM3-GC3.1 (atmosphere-only) model bias, c 850 hPa level wind for observations (ERA5) and d 850 hPa level wind for HadGEM3-GC3.1 (atmosphere-only) model bias during the pre-monsoon season. In c and d, the contours represent the windspeeds

The 850 hPa wind bias (Fig. 8d) indicates that the negative vorticity bias (Fig. 8b) is mostly due to a negative shear vorticity (Bell and Keyser 1993) bias in the area of the BoB MDR. In ERA5 (Fig. 8c), it is apparent from the wind vectors that along the southern edge of the MDR box the zonal gradient of the meridional component of the wind (dv/dx) is positive and the meridional gradient of the zonal component of the wind (du/dy) is negative towards the north in this box. For the model biases (Fig. 8d), in the box over the BoB, both the dv/dx and –du/dy terms become smaller, which eventually leads to a negative vorticity bias (according to equation [1]).
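The argument above rests on how the two wind gradients combine into vorticity. Equation (1) is not reproduced in this section, so the sketch below assumes the standard definition of vertical relative vorticity, dv/dx − du/dy, on a regular Cartesian grid (spherical metric terms are ignored). The function name and the idealised wind field are hypothetical.

```python
import numpy as np


def relative_vorticity(u, v, dx, dy):
    """Vertical relative vorticity dv/dx - du/dy on a regular Cartesian grid.

    u and v are 2-D arrays indexed [y, x]; dx and dy are spacings in metres.
    """
    dv_dx = np.gradient(v, dx, axis=1)
    du_dy = np.gradient(u, dy, axis=0)
    return dv_dx - du_dy


# Hypothetical idealised flow: solid-body cyclonic rotation u = -w*y, v = w*x,
# for which dv/dx = w > 0, du/dy = -w < 0 and the vorticity is 2*w everywhere.
omega = 1.0e-5                      # s^-1
spacing = 100_000.0                 # metres
x = np.arange(-5, 6) * spacing
y = np.arange(-5, 6) * spacing
X, Y = np.meshgrid(x, y)            # arrays indexed [y, x]
u = -omega * Y
v = omega * X

zeta_ref = relative_vorticity(u, v, dx=spacing, dy=spacing)

# A circulation with both gradients halved has half the vorticity, so the
# difference from the reference (the "bias") is negative everywhere.
zeta_weak = relative_vorticity(0.5 * u, 0.5 * v, dx=spacing, dy=spacing)
zeta_bias = zeta_weak - zeta_ref
```

In this idealised case both dv/dx and −du/dy are positive in the reference field, matching the sign pattern described for ERA5. Shrinking both terms produces a negative vorticity bias even though the weaker circulation is still cyclonic.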

3.4.1.3 Vertical wind shear bias in the post-monsoon season

Figure 9a represents the spatial distribution of the model bias in the magnitude of the actual vertical windshear between 200 and 850 hPa levels for the CNRM-CM6.1 (atmosphere-only) model in the post-monsoon season. This figure shows a negative windshear bias in the MDR box over the BoB. We also find the same negative bias in windshear but of different magnitudes for the EC-Earth3P, HadGEM3-GC3.1 and MPI-ESM1.2 (atmosphere-only) models over the same area (see Fig. S6 in the supplementary information file). To explain the positive windshear term bias, we investigated the geographical distribution of both the upper level (200 hPa) and lower level (850 hPa) wind fields and found that it is mainly the bias in the upper-level wind that leads to a windshear bias in the model. For the sake of brevity, we only show the results for CNRM-CM6.1 (atmosphere-only) but we find that the cause of the vertical windshear bias is consistent across all the atmosphere-only models analysed. Figure 9b, c compare the spatial distribution of the upper level (200 hPa) wind in the post-monsoon season for the CNRM-CM6.1 (atmosphere-only) model (Fig. 9c) with the ERA5 reanalysis (Fig. 9b). We find that for ERA5, the easterly wind is prevalent during this season near the equator and in the BoB box covering the MDR. The upper-level easterlies are weaker in the model compared to ERA5, resulting in a westerly bias in the BoB box (Fig. 9c). This is true for the pre-monsoon windshear biases in the models as well (not shown) in some of the models (HadGEM3-GC3.1 [atmosphere-only] and CNRM-CM6.1 [coupled]) where GPI windshear biases are large and positive.
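The link between a westerly bias aloft and a negative shear bias can be made concrete with a small sketch. It assumes that vertical wind shear is the magnitude of the vector wind difference between 200 and 850 hPa, which is a common definition; the exact formulation used for the GPI term is given earlier in the paper and is not reproduced here. The function name and the wind values are hypothetical.

```python
import numpy as np


def shear_magnitude(u200, v200, u850, v850):
    """Magnitude of the 200-850 hPa vector wind difference (m/s)."""
    du = np.asarray(u200, dtype=float) - np.asarray(u850, dtype=float)
    dv = np.asarray(v200, dtype=float) - np.asarray(v850, dtype=float)
    return np.hypot(du, dv)


# Hypothetical box-mean winds (m/s), invented for illustration.
# Negative u is easterly. The low-level wind is identical in both cases.
u850, v850 = -2.0, 0.0
ref_u200, ref_v200 = -12.0, 0.0   # reference: stronger upper-level easterlies
mod_u200, mod_v200 = -8.0, 0.0    # model: weaker upper-level easterlies

ref_shear = shear_magnitude(ref_u200, ref_v200, u850, v850)
mod_shear = shear_magnitude(mod_u200, mod_v200, u850, v850)

u200_bias = mod_u200 - ref_u200      # positive value means a westerly bias
shear_bias = mod_shear - ref_shear   # negative value means too little shear
```

With easterlies at both levels, weakening the upper-level easterlies appears as a westerly bias at 200 hPa and brings the two wind vectors closer together, so the shear magnitude falls. Since the GPI windshear term varies inversely with the actual shear, a negative shear bias corresponds to a positive bias in that term.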

Fig. 9

Spatial distributions of the a Model bias in the magnitude of the actual vertical wind shear between 200 and 850 hPa for CNRM-CM6.1 (atmosphere-only), b Upper level (200 hPa) winds (both speed and direction) for observations (ERA5), and c Model bias in the upper level (200 hPa) winds (both speed and direction) for CNRM-CM6.1 (atmosphere-only) model during the post-monsoon season. In a, the contours represent the magnitude of the vertical wind shear bias in the model, while in b and c, the shaded contours and barbs represent the windspeed and wind vectors at 200 hPa respectively

The origin of the negative vertical windshear bias is also consistent across all the analysed coupled models (EC-Earth3P, HadGEM3-GC3.1 and MPI-ESM1.2) and the cause of the windshear bias is also exactly the same as in the atmosphere-only models. The other two PRIMAVERA models (CMCC-CM2 and ECMWF-IFS [both the atmosphere-only and coupled versions]) have similar windshear bias and the explanation for the windshear bias is the same as in these analysed models.

The analyses of the windshear biases for the models (both atmosphere-only and coupled) other than the CNRM-CM6.1 (atmosphere-only) model are documented in Figures S6 and S7 of the supplementary information file.

3.4.1.4 Wind shear biases and weaker-than-normal Walker circulation

During the post-monsoon season, there is significant positive low-level vorticity over the BoB box in the reanalysis because there is a cyclonic circulation in the box (Fig. 10a). In Fig. 10b, there is a northeasterly low-level wind bias (wind coming from the land with little moisture in it) accompanied by a negative precipitation bias in the box over the BoB covering the MDR.

Fig. 10

Spatial distributions of the a Precipitation (contours) overlapped by 850 hPa winds (vectors) for observations (ERA5) and b CNRM-CM6.1 (atmosphere-only) model precipitation bias (contours) overlapped by 850 hPa wind bias (vectors) in the post-monsoon

From Figs. 9 and 10, the most striking feature is that the low-level easterly (Fig. 10b) and upper-level westerly wind (Fig. 9c) biases around the equator are associated with a positive rainfall bias to the west and a negative rainfall bias to the east in the equatorial Indian Ocean (Fig. 10b). Thus, a weakening of the normal equatorial Indian Ocean Walker Cell happens in the model simulation. We also observe a weaker-than-normal Walker circulation in the coupled version of CNRM-CM6.1 over the equatorial Indian Ocean (not shown). The weakening of the upper-level easterlies in the CNRM-CM6.1 leads to a negative windshear bias in the equatorial Indian Ocean (vertical wind shear bias in the post-monsoon season in Sect. 3.4.1).

A similar weakening of the Walker cell is observed in the HadGEM3 (atmosphere-only) model as well, but over the NIO only instead of the whole equatorial Indian Ocean region as in the CNRM-CM6.1 (atmosphere-only) model. For the EC-Earth3P and MPI-ESM1.2 (atmosphere-only) models, no such weakening of the atmospheric circulation cell (especially in terms of the precipitation bias) is observed in the post-monsoon.

3.4.2 Coupled models

3.4.2.1 SST biases in coupled models in contrast to their atmosphere-only counterparts

In this subsection, we analyse the role of atmosphere–ocean coupling on GPI biases. We try to explain the differences in TC frequency between the atmosphere-only and coupled models and assess the role of the TC-induced SST cooling. The TC-induced SST cooling is the result of the upper ocean vertical mixing (entrainment of the deep cooler water into the ocean mixed layers), upwelling by Ekman pumping (caused by TC wind stress) and heat transport to the atmosphere and can have a significant impact on the mean SST (Vincent et al. 2012, 2013).

Figures 11a–d compare the SSTs in the ERA5 reanalysis (Fig. 11a, c) with the CNRM-CM6.1 coupled model SST biases (Fig. 11b, d) during the pre-monsoon and post-monsoon seasons. For the ERA5, the pre-monsoon season (Fig. 11a) has higher SSTs (by about 1 °C) than the post-monsoon season (Fig. 11c) in the box over the BoB covering the MDR area. For the CNRM-CM6.1 coupled model, both during the pre-monsoon (Fig. 11b) and post-monsoon (Fig. 11d), almost the entire domain has a negative bias (except for some of the regions east of Somalia during the post-monsoon). Surprisingly, in the post-monsoon, we observe a positive SST bias on the western side and an intense negative bias on the eastern side of the equatorial Indian Ocean (a condition very similar to a Positive Indian Ocean Dipole [PIOD] phenomenon) for this model (Fig. 11d). When ocean–atmosphere coupling is added in the coupled CNRM-CM6.1, the anomalous low-level easterlies (Fig. 10b) seen in the atmosphere-only CNRM-CM6.1 cause SSTs to become warmer to the west and cooler to the east of the equatorial Indian Ocean. This SST anomaly further intensifies the original circulation and precipitation anomalies (Fig. 10b) and the system acts like a positive feedback loop. The situation appears to result in a Bjerknes feedback in the equatorial Indian Ocean in the coupled CNRM-CM6.1 model during the post-monsoon.

Fig. 11

Spatial distributions of the a SST (contours) for the ERA5 during pre-monsoon overlapped by 850 hPa winds (vectors), b SST model bias (contours) for coupled CNRM-CM6.1 overlapped by 850 hPa wind bias (vectors) during pre-monsoon, c ERA5 SST (contours) overlapped by 850 hPa winds (vectors) during post-monsoon and d coupled CNRM-CM6.1 SST model bias (contours) overlapped by 850 hPa wind bias (vectors) during the post-monsoon

Over the BoB, we see a reduction in SSTs of around 0.5 °C in both seasons, with the negative bias being slightly stronger in the post-monsoon than in the pre-monsoon.

Similar results are found when analysing SSTs for all the other five PRIMAVERA coupled models in the MDR box (not shown). When comparing the coupled and atmosphere-only models, a reduction in the SST occurs in almost all the coupled models over the BoB box both during the pre-monsoon and post-monsoon seasons. And as already discussed in Sect. 3.1.1, for most models, fewer storms are observed in the coupled models than in their atmosphere-only counterparts. This might happen due to the reduction in the SST mean state in the coupled versions of the models compared to the atmosphere-only counterparts.

3.4.2.2 Analyses of the model biases in the PI term

In Fig. 6, the PI term in GPI seems to dominate the other components of GPI when investigating model biases in the pre-monsoon season (except in the CMCC-CM2 and ECMWF-IFS models). Biases in PI are also larger in the coupled versions compared to the respective atmosphere-only models (Fig. 5). If we compare Fig. 12 (pre-monsoon PI bias) with Fig. 11b (pre-monsoon SST bias for the same model), then we see that the PI bias is positive while the SST bias is negative in the area of the MDR over the BoB. After also evaluating the relationship for the other three models (not shown), we find that the signs of the SST and PI biases are the same for the HadGEM3-GC3.1 and MPI-ESM1.2 models but opposite to each other for the CNRM-CM6.1 and EC-Earth3P models during the pre-monsoon season. We conclude that analysing SST alone is not sufficient to explain the model biases in PI. Relative SST with regard to the tropical mean might potentially be more important (Vecchi and Soden 2007).
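To make the notion of relative SST concrete, the sketch below subtracts an area-weighted tropical-mean SST from the local SST. The 30° S to 30° N band, the cosine-latitude weighting, the function name and the SST values are all assumptions made for illustration; the section does not specify how the tropical mean would be defined, and land or missing points are not handled.

```python
import numpy as np


def relative_sst(sst, lat, tropics=(-30.0, 30.0)):
    """SST minus the area-weighted mean over a tropical latitude band.

    sst is indexed [lat, lon]; lat is in degrees north. The band limits
    are an assumed choice, not one taken from the paper.
    """
    sst = np.asarray(sst, dtype=float)
    lat = np.asarray(lat, dtype=float)
    in_band = (lat >= tropics[0]) & (lat <= tropics[1])
    weights = np.cos(np.deg2rad(lat[in_band]))
    zonal_mean = sst[in_band].mean(axis=1)
    tropical_mean = np.average(zonal_mean, weights=weights)
    return sst - tropical_mean


# Hypothetical SST field (deg C) on a tiny grid, invented for illustration.
lat = np.array([-20.0, 0.0, 20.0, 40.0])
sst = np.array([[27.0, 28.0],
                [29.0, 30.0],
                [28.0, 29.0],
                [18.0, 19.0]])

rel = relative_sst(sst, lat)

# A spatially uniform cold bias of 0.5 deg C leaves relative SST unchanged.
rel_cooled = relative_sst(sst - 0.5, lat)
```

The example shows that a uniform cooling changes absolute SST everywhere but not relative SST. This is one possible reason, offered here only as an illustration, why the sign of an SST bias need not match the sign of the PI bias.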

Fig. 12

Spatial distribution of model bias in the PI term of GPI for CNRM-CM6.1 (coupled) model in the pre-monsoon season. The contours represent the magnitude of the GPI PI bias in the model

4 Conclusion and discussions

Accurately forecasting the TC activity in a warming climate for the BoB is of vital importance because of its huge socio-economic impact associated with the potential for widespread damage in this region. In this study, the aim was to assess the performance of the high-resolution PRIMAVERA GCMs and explore how GCMs simulate the seasonal variations of TC frequency compared to the observations in the BoB.

Our analysis shows that all models (except the atmosphere-only HadGEM3-GC3.1) can simulate the variations in the annual cycle of the TC frequency reasonably well in the BoB. One major finding is that almost all the models (both the atmosphere-only and coupled versions) can capture the bimodal distribution of TC seasonal cycle during the pre-monsoon and post-monsoon seasons after windspeed calibration. Similar to the observations, most models have a lower number of TCs during the pre-monsoon season than in the post-monsoon season. Regarding TC intensity, most models can capture the observed characteristics of a higher super cyclone ratio in the pre-monsoon season compared to the post-monsoon season. All models (except the HadGEM3-GC3.1 in post-monsoon) have higher super cyclone ratios in their respective atmosphere-only versions compared to the coupled models both during the pre-monsoon and post-monsoon. The inter-seasonal contrast in the super cyclone ratios between the pre-monsoon and post-monsoon reduces in all models (except the atmosphere-only EC-Earth3P model) when compared to the observations. Ocean–atmosphere coupling further reduces this contrast in almost all the models. However, all these findings are somewhat limited by the fact that we use the calibrated wind intensities here rather than the TC intensities simulated by the models. Nevertheless, we stress that this recalibration of the intensity does not affect the fact that the models correctly represent the formation of a higher proportion of the (relatively) most intense TCs in pre-monsoon than in post-monsoon, despite lower TC frequencies in this season.
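As a small illustration of the super cyclone ratio discussed above, the sketch below applies the 60 m s−1 threshold from the Introduction to a list of lifetime-maximum wind speeds. The function name and the wind values are hypothetical; they are not data from this study, and the paper itself uses calibrated wind intensities.

```python
SUPER_CYCLONE_THRESHOLD = 60.0  # m/s, threshold quoted in the Introduction

def super_cyclone_ratio(lifetime_max_winds):
    """Fraction of TCs whose lifetime-maximum wind reaches the threshold."""
    winds = list(lifetime_max_winds)
    if not winds:
        return float("nan")  # ratio is undefined when there are no TCs
    n_super = sum(1 for w in winds if w >= SUPER_CYCLONE_THRESHOLD)
    return n_super / len(winds)

# Hypothetical lifetime-maximum winds (m/s), for illustration only.
pre_monsoon = [35.0, 62.0, 48.0, 70.0]
post_monsoon = [30.0, 40.0, 45.0, 61.0, 38.0, 52.0, 33.0, 44.0]

pre_ratio = super_cyclone_ratio(pre_monsoon)    # 2 of 4 storms
post_ratio = super_cyclone_ratio(post_monsoon)  # 1 of 8 storms
```

The example mirrors the qualitative behaviour described in the text: the season with fewer storms can still have the higher super cyclone ratio.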

The GPI is a product of four large-scale environmental conditions that can help determine the potential for tropical cyclogenesis. Our results show that GPI represents well the relative magnitude of the fractional TC frequency during the pre-monsoon and post-monsoon seasons in the models and observations. While GPI can capture the pattern in the seasonal cycle of TC frequency, it cannot capture actual TC frequency. Each model (both atmosphere-only and coupled) has its own relationship between the magnitude of climatological GPI and TC frequency in the BoB, and this relationship varies considerably from model to model. For example, in the CNRM-CM6.1 model, climatological TC frequency is almost the same as in the CMCC-CM2 model but the magnitude of climatological GPI is almost 4 times higher than in the CMCC-CM2 model in May. In the EC-Earth3P model, TC frequency is nearly half of that in the CMCC-CM2 model and the magnitude of climatological GPI is almost double that in the CMCC-CM2 model during the same month. The MPI-ESM1.2 model simulates an extremely small number of TCs per year, even though its GPI is comparable to the observations. This model also has weaker storms compared to the other models. Consistent with our result for MPI-ESM1.2, Camargo et al. (2007a) found that, for several other models, GPI fails to capture the actual number of TCs in an individual basin. It would be very interesting to know why MPI-ESM1.2 fails to simulate TCs properly; however, this is beyond the scope of our research. The weaker structure of TCs or the lower number of cyclones in the MPI-ESM1.2 model might be due to a lack of key processes in the model, or to limitations in model physics, model dynamics, or physics-dynamics coupling. Future work is needed to discover the actual reasons. Camargo et al. (2007a) suggested that the variations in the climatology of the number of TCs across the models are primarily related to the variations in the dynamics of the simulated storms in the models rather than the variations in the simulated large-scale environmental conditions associated with the formation of TCs (as represented by the GPI).
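For readers who want to see how the four terms combine, the sketch below codes the widely used Emanuel and Nolan (2004) form of the index, as also given by Camargo et al. (2007a). We assume here that this form matches the one defined in the methods of this paper; the function names and dictionary keys are ours and purely illustrative.

```python
import numpy as np

def gpi_terms(abs_vorticity, rel_humidity, potential_intensity, wind_shear):
    """Return the four multiplicative GPI terms (assumed Emanuel-Nolan form).

    abs_vorticity: low-level absolute vorticity (s^-1)
    rel_humidity: mid-tropospheric relative humidity (%)
    potential_intensity: potential intensity (m/s)
    wind_shear: magnitude of the vertical wind shear (m/s)
    """
    vort = np.abs(1e5 * np.asarray(abs_vorticity, dtype=float)) ** 1.5
    rh = (np.asarray(rel_humidity, dtype=float) / 50.0) ** 3
    pi = (np.asarray(potential_intensity, dtype=float) / 70.0) ** 3
    # The windshear term is an inverse function of shear: less shear, larger term.
    shear = (1.0 + 0.1 * np.asarray(wind_shear, dtype=float)) ** -2.0
    return {"vorticity": vort, "RH": rh, "PI": pi, "windshear": shear}

def gpi(abs_vorticity, rel_humidity, potential_intensity, wind_shear):
    """GPI as the product of the four terms."""
    terms = gpi_terms(abs_vorticity, rel_humidity, potential_intensity, wind_shear)
    return terms["vorticity"] * terms["RH"] * terms["PI"] * terms["windshear"]

# Reference state in which every term is close to one, and the same state
# with 10 m/s of shear, which reduces the windshear term to a quarter.
gpi_reference = gpi(1e-5, 50.0, 70.0, 0.0)
gpi_sheared = gpi(1e-5, 50.0, 70.0, 10.0)
```

Because the index is a pure product, a bias in any single term scales the whole index, which is why the term-by-term bias analysis in this paper is informative.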

We have documented model biases and investigated their possible causes. We examine the biases in the pattern of seasonal variations of TC frequency and compare them with the biases in the large-scale environmental conditions in GPI for all analysed models during the pre-monsoon and post-monsoon seasons. We find that CNRM-CM6.1 (atmosphere-only) model has the highest bias in GPI among all the models. Our analysis reveals that during the pre-monsoon season, the RH term in GPI contributes the most to the model bias both for the CNRM-CM6.1 and MPI-ESM1.2 (atmosphere-only) models. The pre-monsoon positive RH term bias in these atmosphere-only models might be caused by an early monsoon onset because the models have a positive rainfall bias over the Bay of Bengal. However, in Figs. 5a (CNRM-CM6.1) and 5d (MPI-ESM1.2), we also see a small positive GPI windshear term (which is the inverse of the actual vertical windshear) anomaly in the models. In monsoon circulations, we would expect an increased environmental vertical windshear (which would lead to a decrease in the GPI windshear term), but in the CNRM-CM6.1 model a reduction in this variable occurs instead. This unexpected finding needs further investigation, though it may be related to overall negative upper-level wind speed biases (see below). For the HadGEM3-GC3.1 (atmosphere-only) model, the vorticity term in GPI has the largest contribution to the change in GPI during the pre-monsoon season. This vorticity bias contributes negatively to GPI bias in this model during the pre-monsoon season and is caused mostly by a negative shear vorticity component.
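One simple way to rank the contribution of each term to a GPI bias exploits the fact that GPI is a product, so the logarithm of the model-to-observed GPI ratio is the sum of the logarithms of the term ratios. The sketch below illustrates this idea only; it is not necessarily the attribution method used for the figures in this paper, and the function name and all term values are hypothetical.

```python
import math

def log_bias_contributions(model_terms, obs_terms):
    """Split log(GPI_model / GPI_obs) into additive per-term contributions.

    Both arguments map term names to positive term values.
    """
    contributions = {
        name: math.log(model_terms[name] / obs_terms[name]) for name in obs_terms
    }
    total = sum(contributions.values())
    return contributions, total

# Hypothetical term values, for illustration only.
obs_terms = {"vorticity": 1.0, "RH": 1.0, "PI": 1.0, "windshear": 1.0}
model_terms = {"vorticity": 0.9, "RH": 2.0, "PI": 1.1, "windshear": 1.2}

contributions, total = log_bias_contributions(model_terms, obs_terms)
largest = max(contributions, key=lambda name: abs(contributions[name]))
```

In this made-up case the RH term dominates the positive GPI bias and the vorticity term contributes negatively, analogous in sign to the behaviour described above for different models.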

This study has found that for all models (both atmosphere-only and coupled), positive biases in the windshear term contribute the most to the GPI biases in the post-monsoon season. The reason for this positive windshear term bias (which corresponds to lower windshear than observed) is weaker upper-level easterlies (a westerly bias) in the models compared to the observations, and this is consistent across all the atmosphere-only and coupled models in PRIMAVERA. In the CNRM-CM6.1 model (both atmosphere-only and coupled), along with the upper-level westerly wind bias, we also find a lower-level easterly wind bias around the equator. These biases are associated with a positive rainfall bias to the west and a negative rainfall bias to the east of the equatorial Indian Ocean and appear to be associated with a weaker-than-normal Walker circulation in this region for this model. A similar weaker-than-normal Walker circulation is found over the NIO in the HadGEM3-GC3.1 (atmosphere-only) model. We do not find this Walker cell bias (especially in terms of the precipitation bias) in any of the other models analysed. One of the limitations of our study is that we do not explain the reason for the weakening of upper-level easterlies during the post-monsoon season that contributes to the windshear term biases in the models. Exploring the cause of the upper-level wind biases is beyond the scope of our research, but it is an important question for future work. Because the upper-level westerly bias is associated with a lower-level easterly bias and a positive rainfall bias to the west and a negative rainfall bias to the east in the equatorial Indian Ocean in some models, we hypothesize that the upper-level wind biases might arise from problems in the coupling between convection and circulation in the tropics in the models.

For the coupled GCMs, the PI term in GPI is found to have the largest contribution to the model GPI biases in the pre-monsoon season (for most of the models). We find that SST alone cannot explain these PI biases. Further work is required to investigate the other fields used to calculate PI. When the ocean and atmosphere exchange fluxes in the coupled models, this ocean–atmosphere coupling reduces the inter-seasonal contrast in cyclone frequency and super cyclone ratio (i.e. coupling increases the biases in terms of the super cyclone ratio) [due to TC-Ocean negative feedback (Vincent et al. 2012, 2013)]. This has implications for studies of the future climate using coupled models. We need to be cautious when we use coupled models to project TC activity in a future climate and to understand climate impacts on TC activity. Introducing coupling might either increase or decrease the biases in TC activity in future simulations when compared to the present climate.

In the pre-monsoon season, lower magnitudes of GPI and higher magnitudes of PI are found, while in the post-monsoon season higher GPI and lower PI are found, for the BoB in all models except for the CNRM-CM6.1 atmosphere-only GPI. These results are consistent with the findings in observations and models that TC frequency is higher in the post-monsoon season but the super cyclone ratio is higher in the pre-monsoon season, in agreement with Li et al. (2013).

When comparing the performance of the atmosphere-only and ocean–atmosphere coupled models, lower SST is found in most of the coupled models over the BoB. This likely reduces the energy that the warm ocean surface supplies, through heat transport, to sustain and intensify TCs, so both TC frequency and intensity are lower in most of the coupled models.

This is the first study that has documented the model bias in the large-scale environmental conditions for PRIMAVERA GCMs over the BoB. The understanding of the biases gained here could be used to help improve predictions of TCs in the BoB by GCMs. This study also provides an understanding of the role of atmosphere–ocean coupling in the simulation of BoB TCs. However, one possible limitation of this study is that GPI may not be a fully suitable diagnostic for analysing the large-scale environments associated with the TC seasonal cycle in the models. In this paper, we have not proven a causal link between GPI and TC frequency in GCMs. Instead, we assume that the relation that was derived from the observations remains valid in the model world, and we are encouraged in making that assumption by the fact that fractional TC frequency and GPI have consistent biases in the pre-monsoon and post-monsoon seasons.
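The comparison of fractional quantities can be illustrated with a short sketch. We assume here that "fractional" means each month's share of the annual total, and we take the pre-monsoon (April–May) and post-monsoon (October–November) definitions from the Introduction. The function names and the monthly values are hypothetical and are not results from this study.

```python
import numpy as np

PRE_MONSOON = [3, 4]    # zero-based month indices for April and May
POST_MONSOON = [9, 10]  # zero-based month indices for October and November

def fractional_cycle(monthly):
    """Each month's share of the annual total (assumed definition)."""
    monthly = np.asarray(monthly, dtype=float)
    return monthly / monthly.sum()

def seasonal_fraction(monthly, months):
    """Share of the annual total that falls in the given months."""
    return float(fractional_cycle(monthly)[months].sum())

# Hypothetical monthly climatologies (January to December).
tc_counts = [0, 0, 0, 1, 2, 1, 0, 0, 1, 3, 3, 1]
gpi_values = [0.5, 0.5, 1.0, 2.0, 4.0, 2.0, 1.0, 1.0, 2.0, 5.0, 4.0, 1.0]

tc_pre = seasonal_fraction(tc_counts, PRE_MONSOON)
tc_post = seasonal_fraction(tc_counts, POST_MONSOON)
gpi_pre = seasonal_fraction(gpi_values, PRE_MONSOON)
gpi_post = seasonal_fraction(gpi_values, POST_MONSOON)
```

Normalising by the annual total removes the model-dependent scaling between GPI magnitude and TC counts, so only the shape of the seasonal cycle is compared.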

In this paper, we have limited our analysis to TC activity and associated large-scale environmental conditions simulated in GCMs for the present climate. In a future paper, we will explore how TC activity might change in a future warming climate in the BoB and we will discuss those projections in light of the biases described in this study.