The Covid-19 pandemic has significantly impacted public health and the economy globally since 2019, and the situation is still not entirely under control at the time of writing this paper. In India, the first wave of Covid-19 infections began in March 2020 and lasted for 8 months, with a peak in mid-September. The second wave, which started in mid-March 2021, rapidly turned severe and took a serious toll on public health. The infection spread during the second wave was alarmingly rapid and reached its peak within 3 months, as compared to the 6 months required to reach the peak during the first wave. The numbers of severe cases and fatalities were also higher during the second wave. Such rapid spread was completely unexpected, especially in the light of continued restrictions on people’s movement from February to April 2021 in India. Finally, the third wave started in late December 2021 and peaked within a month, in January 2022, as shown in Fig. 1.

Fig. 1 Reported active cases, cumulative cases, and deaths in India

Almost all known models (Xu and Li 2020; Thakur et al. 2020; Asad et al. 2020; Agrawal et al. 2021; Renardy et al. 2020; Cacciapaglia et al. 2020) were significantly off the mark in terms of their ability to predict the trends in infection count, critical cases, and fatality, including ours (Barat et al. 2021). We were successful in predicting the timeline of the second wave in the Indian city of Pune (Footnote 1) several months before the wave started; however, our prediction of the amplitude of peak cases was lower than reality. In an ex post facto analysis, we examined several factors to explain the virulence, peak amplitude, and rapid spread of the second wave. They include the characteristics of the B.1.617.2 (Delta) variant (WHO 2022) relative to the B.1.1.7 (Alpha) variant, the sudden opening of all public places, the increased frequency of super-spreading events and mass gatherings, and increased noncompliance with Covid appropriate behaviour. However, the extent to which each of these factors contributed individually to the overall dynamics of the second wave and its severe impacts needs to be established.

Similarly, while the increased infectiousness of the Omicron variant (WHO 2022) was the principal driving factor for the third wave, the individual contributions of other factors, such as the apparently reduced severity of the variant, the fraction of the population vaccinated, vaccine efficacy against Omicron, and the natural immunity of the population, are, to the best of our knowledge, yet to be explored in the literature. It is worth noting that many models, including the oft-quoted SUTRA model (Agrawal et al. 2021) and other models (Mandal et al. 2021; Choudhary and Priyanka 2021; Kavitha et al. 2021; Mohan et al. 2022), attempted to predict the timing and the amplitude of the peak case numbers as well as the expected burden on the public health infrastructure during the third wave. These models use advanced statistical methods and AI techniques. However, all had to re-adjust their predictions every 7–10 days, especially during the upward trajectory of the third wave.

It appears then that we still lack a clear understanding of how to model the effects of new and emerging variants (the term variant refers to a Variant of Concern or a Variant of Interest (WHO 2022) throughout this paper) on the progression of the pandemic in a geographical area of interest, including the interplay of factors such as vaccine efficacy and public compliance with administrative interventions. We believe that such understanding is important to control future waves, especially those triggered by new variants, that may impact society’s ability to return to a (new) socio-economic normal.

In this paper, we focus on understanding the complex interplay of different key drivers and their collective contribution to a surge in infections through a detailed ex post facto analysis. This paper also investigates and discusses a set of key factors that should be critically analysed for better prediction of future waves and their impact on public health.

We conducted this ex post facto analysis using a comprehensive parameterized digital twin that extends our earlier digital twin of Pune (Barat et al. 2021) and by critically revisiting the actual situation in Pune from January 2021 to November 2021 (essentially the period that covers the entire second wave). Conceptually, we combined a data-centric approach with a domain model-centric approach accompanied by rigorous simulation-based experimentation. Our data-centric approach helps to analyse real data about various key indicators of the pandemic, which include the reported cases, critical cases and reported deaths in Pune during the various phases of a wave. The domain model-centric approach is based on a fine-grained simulatable model that captures individuals, places and virus characteristics along with a candidate set of non-pharmaceutical interventions to control the pandemic.

The purposive city digital twin captures the established facts about all variants; people and their demographic characteristics; different places in a city and the possible mixing patterns in those places; non-pharmaceutical interventions; vaccines and their efficacy; and other epidemiological characteristics such as waning immunity and reinfection possibilities. The digital twin also captures the uncertainties around these factors via a set of configurable parameters. Simulation-led experimentation helps to explore various possibilities (i.e., explore uncertainty) through parametrization and provides a means of comparison with real, recorded situations. Systematic simulation-led explorations focusing on multiple candidate factors provide an understanding of how different factors contribute to the wave at different points in time. In particular, a parameterized digital twin can help us predict the effects of future variants with prescribed characteristics.

The rest of the paper is structured as follows: Section 2 provides a brief overview of the state of the art in predicting the evolution of the Covid-19 pandemic. Section 3 presents our digital twin-based approach. Section 4 illustrates a wide range of simulations that we conducted using the city digital twin to understand the emergence of the second wave in Pune using the available information. Section 5 synthesizes simulation results with the reported data to develop hypotheses about Covid-19 dynamics and the various influencing factors. Section 6 validates the hypotheses on the third wave of the pandemic in Pune. Section 8 concludes by highlighting the learnings from the \(300+\) simulation experiments that we conducted as part of this study.

Survey of the State of the Art

A majority of the predictive models for pandemics are fundamentally coarse-grained. They adopt one of two techniques to predict the future: (a) statistical modelling supported by historical data (Agrawal et al. 2021; Mohan et al. 2022; Zhu and Chen 2021), including those based on AI (Fayyoumi et al. 2020); or (b) compartmental models (e.g., the SEIR model) that capture epidemiological understanding or domain knowledge in the form of differential or similar evolution equations (He et al. 2020; Pandey et al. 2020; López and Rodo 2021). While coarse-grained models are usually computationally efficient and explainable (being based on well-understood mathematical techniques), they have several shortcomings, which are particularly relevant in the context of predicting the progress of a pandemic. Chiefly, these coarse-grained models ignore the heterogeneity of the population in terms of age, comorbidity, and socio-economic factors, which manifests in a wide variance of individual behaviours (Barat et al. 2021; Kerr et al. 2021), as they focus on the aggregated movement of the population from one cohort to another. Importantly, coarse-grained models fail to comprehend micro-causality and emergent behaviour in a cohort, e.g., super-spreader events from social gatherings.
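To make the notion of a coarse-grained compartmental model concrete, the following is a minimal sketch of a textbook SEIR model advanced with explicit Euler steps. It is not taken from any of the cited models; the function names, the step size, and the rate values used in the usage note are illustrative assumptions. Note how the whole population is reduced to four aggregate counts, which is exactly the loss of heterogeneity discussed above.

```python
def seir_step(s, e, i, r, beta, sigma, gamma, dt=1.0):
    """Advance a closed-population SEIR model by one explicit Euler step.

    beta: transmission rate, sigma: rate of leaving the exposed state,
    gamma: recovery rate (all per day; values are assumptions of the caller).
    """
    n = s + e + i + r
    new_exposed = beta * s * i / n * dt    # S -> E
    new_infectious = sigma * e * dt        # E -> I
    new_recovered = gamma * i * dt         # I -> R
    return (
        s - new_exposed,
        e + new_exposed - new_infectious,
        i + new_infectious - new_recovered,
        r + new_recovered,
    )


def simulate_seir(s0, e0, i0, r0, beta, sigma, gamma, days, dt=1.0):
    """Return the list of (S, E, I, R) states, including the initial one."""
    state = (s0, e0, i0, r0)
    trajectory = [state]
    for _ in range(int(days / dt)):
        state = seir_step(*state, beta, sigma, gamma, dt)
        trajectory.append(state)
    return trajectory
```

For instance, simulate_seir(9990, 0, 10, 0, beta=0.3, sigma=0.2, gamma=0.1, days=100) yields a single aggregate epidemic curve; every individual in a compartment is treated identically, so age, comorbidity, and place-specific mixing cannot be expressed without adding further compartments.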

In addition to these generic limitations, coarse-grained models that rely purely on historical data are vulnerable to both internal and external threats to validity (Winter 2000). External validity becomes a prominent concern during the early phase of a new variant, as one needs to rely on data collected from, for instance, an altogether different geographical region. For example, the data collected from South Africa are used for predicting possible infection trends in other countries that have different vaccines, vaccine coverage, demographic details, and so on. Internal validity is a concern for infection prediction because the observed cases in a given area are not an accurate representation of reality: observations depend not only on actual infection but also on the ratio of asymptomatic cases and the scale of random testing. For example, analysis of the infection spread of Omicron based on the observed data might lead to inaccurate interpretation, as asymptomatic cases are considerably high for Omicron, testing uptake is considerably low, and case reporting is a universal concern due to the lower severity and the wide, and largely undocumented, use of home-testing facilities.

To overcome the limitations of coarse-grained models, fine-grained agent-based models have been employed as a competing approach (Kerr et al. 2021; Ferguson et al. 2020; Wolfram 2020; Silva et al. 2020; Cuevas 2020). The key objective of these models is to capture the behaviour of micro-elements such as people, households, and places (e.g., workplace, school, shops) to predict macroscopic indicators such as the number of infected cases, cases that need medical infrastructure, deaths, and so on. Fine-grained models need to make a trade-off between richness and scale. The richness includes the ability to represent the heterogeneity of the people, households, and places at a fine-grained level to take the model closer to the real context, i.e., city, state or country. Many of the agent-based models (Silva et al. 2020; Cuevas 2020) consider high-level classifications of these entities as cohorts, where each cohort is internally represented using aggregated equations, and represent the whole system as a connected network of a limited number of cohorts. These models address scalability by aggregation and can estimate the impacts of fine-grained interventions, such as the impact on infection spread when all shops and/or offices are closed. However, they exhibit limitations similar to those of coarse-grained models in comprehending emergent behaviour and micro-causalities of pandemic dynamics. Covasim (Kerr et al. 2021), on the other hand, uses an agent-based model to capture the individualistic behaviours of a wide range of micro-elements and their interactions. It linearly scales down the population (in the order of 10\(^{3}\)) to make the simulation manageable. From a richness perspective, it captures demographic variations in the population, a wide range of places and interventions. However, the progression of infection in an exposed person and the combined effect of a specific variant and vaccine on the individual are encoded as predefined equations within person agents. This limits the ability to understand the interplay of the effect of a vaccine and the characteristics of a variant on an individual.

Our approach is similar to Covasim in that we use agents to represent people with demographic heterogeneity and professions, household archetypes, place archetypes, and interventions. In addition, and importantly, we consider variants and vaccines as agents in order to observe the emergent transmission, transition, severity, and fatality for an individual with a specific age, gender, comorbidity, vaccination status, and infection history who is infected with a specific variant.

Simulation-Based Experiments

Prior Work

Fig. 2 Predicted indicators based on our earlier experimentation and observed indicators

During the first wave (March–November 2020) in India, we constructed a parameterized city digital twin (Barat et al. 2021) to explore the efficacy of non-pharmaceutical interventions such as movement restrictions and relaxations, testing, and mask usage. Technically, a fine-grained agent modelling technique was leveraged to faithfully represent four aspects of interest that were believed to be the key influencers of the Covid-19 infection spread during the first wave: (a) epidemiological aspects of the Covid-19 virus (i.e., transmission and phases of infection); (b) demographic heterogeneity and comorbidity of the citizens in a city; (c) places of interest where people interact with each other; and (d) non-pharmaceutical interventions devised by the local administration (restrictions on free movement, face mask adoption, social distancing, testing, quarantine strategy, and contact tracing). We further contextualized the city digital twin for Pune by configuring the demographic details and comorbidity distribution among Pune’s citizens, household structures and prototypical areas that reflect socio-economic characteristics of the city, various professions and their movements, and prototypical places, such as offices, factories, schools, markets, and places of worship.

The Pune-specific digital twin was validated by comparing the simulated set of key indicators (KIs) of pandemic dynamics, which include patients requiring O2, critical cases and deaths, with the real data collected from March to June 2020. Our trend analyses, starting from July 2020, closely resemble how the first wave unfolded in terms of KIs and the timeline of the peak in Pune, as shown in Fig. 2. In our study published in January 2021, we also predicted the possibility of a second wave in Pune between March and April 2021 (ref. Figure 19 of Barat et al. (2021)). While our predictions about the second wave and its timeline matched closely with reality, we were wrong in predicting the magnitude of the second wave peak, as shown in Fig. 2: our prediction was much milder than reality (around 70% of the first wave peak versus around 130% of the first wave peak).

Modelling Issues

Fig. 3 Mind map of possible factors and associated uncertainties

Fig. 4 Qualitative analysis of possible influence

In the first wave, the key dynamics of the pandemic were primarily governed by one specific variant (i.e., Alpha), and the uncertainties were thus limited to the demographic factors (e.g., age and comorbidity) and people’s compliance with Covid Appropriate Behaviours (CAB). Moreover, the uncertainty related to CAB was considerably lower, as the situation was completely new to the population and very few people violated CAB. After the first wave, however, several uncertainties and new behavioural patterns emerged along multiple dimensions, as illustrated in Fig. 3. People started violating administrative interventions and CAB norms such as the use of face masks and social distancing. Testing uptake varied significantly with time and space across cities. Strict home quarantine and institutional quarantine requirements were gradually lifted, and city administrators closed several institutional quarantine facilities due to low utilization.

Fig. 5 Our extended digital twin and approach for risk-free experimentation

In addition to the existing uncertainties, several new types of uncertainties arose and became prominent. Vaccine-related uncertainties, such as effectiveness, delay in developing protection and loss of effectiveness, turned out to be points of major concern. The emergence of new variants and the loss of natural immunity are other types of uncertainties that are still not fully understood. More interestingly, these uncertain factors have been seen to influence the key indicators in a non-linear way over time, as shown in Fig. 4. For example, an effective vaccine and a less severe variant (e.g., as in the case of the Omicron variant) lead to fewer critical cases, and the experience of such a trend makes people reluctant to comply with administrative and social norms. In the longer run, such noncompliance and low testing uptake influence the number of infections, which in turn can influence detected and critical cases after a delay. None of the models, including our earlier city digital twin (Barat et al. 2021), considered such factors while modelling the pandemic during its first wave. Therefore, the models became inaccurate for the purpose of predicting the second and subsequent waves.

Upon ex post facto reflection on what went wrong during the second wave, most models attempted to account for the additional factors and uncertainties by adjusting the rates and the time delays associated with the movement of people from one category of infection to another in the model, such as susceptible to exposed. In contrast, we tried to incorporate those additional factors and interrelationships in a fundamental way by extending our city digital twin, using agents and agent behaviours, with variants, vaccines and their efficacy, different types of public gatherings, people’s movement to and from another city with a new Covid variant, and loss of immunity. Such a modelling capability not only helps to understand the situation of a system (e.g., a city) for a known set of variants and situations but is also useful for experimenting with different hypothetical scenarios, such as the consequences of a new mutation, loss of immunity, and the effectiveness of booster doses.

Extensions to Address New Uncertainties

We extended our earlier digital twin along all four interrelated dimensions, as highlighted in Fig. 5. We extended the population characteristics dimension to include immunity from vaccination and infection. The places & movements dimension was extended to capture the in- and outflow of people from and to a city. This is an important consideration when a new variant spreads for the first time in a locality or city. We also introduced new places of interest along with their characteristics, e.g., public gardens and bars, as they became prominent places that people started visiting after the first wave. The non-pharmaceutical interventions (NPIs) and compliance dimension was extended to include vaccines & their adoption, varying levels of compliance and testing uptake. In our earlier digital twin, we assumed \(100\%\) compliance with home and institutional quarantine rules, but we found that such strict compliance is rarely seen in reality. Thus, we parameterized it to explore a range of noncompliance scenarios. Vaccines, one of the key interventions at the moment, are introduced as agents. Any number of vaccine agents with varying efficacy (in terms of reducing infection, severity, and fatality probability), together with an associated configurable delay after administration of a dose, can be specified and experimented with using our digital twin. Vaccine adoption in a city is modelled via a parameterized administration rate. Multiple vaccines with different characteristics (e.g., Covishield and Covaxin (Rather et al. 2021)) can be set to be introduced to the population from a specific day of a month, with a set of criteria on age (e.g., \(60+\) or \(45+\)), comorbidity (e.g., diabetes and hypertension) and profession (e.g., medical professionals). Their dose intervals can also be configured, such as 28 days for Covaxin, 90 days for Covishield, and 270 days for the third dose, based on the vaccination policies (Footnotes 2 and 3).
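A parameterized vaccine specification of the kind described above might be sketched as follows. This is an illustration only and not the implementation of our digital twin: the class name VaccineSpec, its field names, and the helper second_dose_due are hypothetical, and any efficacy or delay values passed to it are placeholders chosen by the experimenter. Only the dose intervals (28 days for Covaxin, 90 days for Covishield) come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VaccineSpec:
    """Hypothetical container for the configurable vaccine parameters."""
    name: str
    dose_interval_days: int       # gap between first and second dose
    protection_delay_days: int    # days after a dose before protection applies
    infection_reduction: float    # fractional reductions in [0, 1]
    severity_reduction: float
    fatality_reduction: float
    min_age: int = 0              # simple age-based eligibility criterion

    def is_eligible(self, age: int) -> bool:
        return age >= self.min_age

    def infection_risk_multiplier(self, days_since_dose: Optional[int]) -> float:
        """Return 1.0 (no protection) if undosed or still within the delay."""
        if days_since_dose is None or days_since_dose < self.protection_delay_days:
            return 1.0
        return 1.0 - self.infection_reduction


def second_dose_due(vaccine: VaccineSpec, days_since_first_dose: int) -> bool:
    """True once the configured dose interval has elapsed."""
    return days_since_first_dose >= vaccine.dose_interval_days
```

A scenario could then instantiate, say, VaccineSpec("Covaxin", dose_interval_days=28, protection_delay_days=21, infection_reduction=0.4, severity_reduction=0.6, fatality_reduction=0.7, min_age=45), where every numeric value other than the 28-day interval is an assumed placeholder to be varied across experiments.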

As for the epidemiological aspect, our new digital twin can specify and introduce new or hypothetical variants using a parameterized agent specification. A configured variant agent represents specific infectivity, severity, and fatality characteristics along with other properties such as the probability of immune escape. One can introduce a new variant to the population of a city by specifying a start date and a possible rate of introduction through movement into and out of other cities.
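The variant specification can be pictured along the same lines. The sketch below is a simplified illustration under stated assumptions: the names VariantSpec and imported_cases are hypothetical, and the rate of introduction is reduced to a constant number of imported infections per day from the start date onwards, which is a simplification of the movement-driven introduction described above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class VariantSpec:
    """Hypothetical container for the configurable variant parameters."""
    name: str
    infectivity: float             # relative transmissibility (assumed scale)
    severity: float                # probability-like severity parameter
    fatality: float                # probability-like fatality parameter
    immune_escape: float           # probability of evading prior immunity
    start_date: date               # first day the variant can enter the city
    imported_cases_per_day: int    # assumed constant rate of introduction


def imported_cases(variant: VariantSpec, day: date) -> int:
    """Number of infections seeded from outside the city on a given day."""
    return variant.imported_cases_per_day if day >= variant.start_date else 0
```

Varying start_date and imported_cases_per_day across simulation runs is one way to explore when, and how quickly, a variant might have entered a city.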

The overall situation in a city is essentially the interplay of all aspects from each of the four dimensions. We observe possible situations by contextualizing our digital twin with city-specific information and known facts about vaccines, variants, interventions, and compliance with them, using digital twin simulation as shown in Fig. 5. Situations emerge through agent interactions. For example, a person with specific demographic characteristics, comorbidity, vaccine doses, and infection history can move around across different places within and across cities. As a result, she may get exposed to a specific variant with a certain probability. The susceptible-to-exposed transition of a person (target) depends on the duration and frequency of proximal contact with an infectious person (source), the infection history of the target person, and the variant characteristics. Essentially, the susceptible-to-exposed transition dynamics are an interplay between two person agents, place agents, and a variant agent (of target person agent). In particular, the infectivity of a variant, the duration and proximity of contact between source and target agents, and the characteristics of the place, i.e., open vs closed, define the probability of infection spread in a place. Further, the progression of the infection in an exposed person (i.e., exposed to infectious, infectious to asymptomatic, mildly symptomatic or severe, respective state to recovered or dead) and the possible degree of criticality of that person depend on the characteristics of the person, the infection and vaccine histories, as well as the infecting variant. Thus, the progression dynamics are the interplay among a person agent, a vaccine agent, and a variant agent, and they depend on: (a) the age and comorbidity of an individual; (b) vaccine effectiveness for that person; (c) immunity developed due to earlier infection; and (d) the characteristics of the variant. Other interventions and compliance with them also influence the overall dynamics.
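The dependence of the susceptible-to-exposed transition on infectivity, contact duration, proximity, and place type can be illustrated with a simple hazard-style formula. The functional form below (one minus an exponential of an accumulated hazard), the open_air_discount parameter, and all argument names are assumptions made for illustration; they are not the equations used in our digital twin.

```python
import math
import random


def exposure_probability(variant_infectivity, contact_minutes, proximity_factor,
                         place_is_open, target_susceptibility=1.0,
                         open_air_discount=0.5):
    """Probability that one contact moves the target from susceptible to exposed.

    target_susceptibility can encode vaccine- or infection-induced protection
    (1.0 = fully susceptible, 0.0 = fully protected). All scales are assumed.
    """
    if contact_minutes <= 0:
        return 0.0
    place_factor = open_air_discount if place_is_open else 1.0
    hazard = (variant_infectivity * proximity_factor * place_factor
              * target_susceptibility * contact_minutes / 60.0)
    return 1.0 - math.exp(-hazard)


def is_exposed(rng, **contact):
    """Draw a stochastic outcome for one contact using the given random.Random."""
    return rng.random() < exposure_probability(**contact)
```

With this form, a longer contact, a closed place, or a more infectious variant each raises the probability, while prior immunity (a lower target_susceptibility) lowers it; the stochastic draw in is_exposed is what allows rare super-spreading outcomes to emerge across many agents.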

Movement-related restrictions on a person limit mixing. Testing helps in isolating infected persons from the susceptible population. However, noncompliance with quarantine increases mixing among household members and close contacts. Multiple simulations with varying parameters help to comprehend the complexities that are illustrated in Fig. 4. Over multiple experiments, we explored five key factors to understand the causality of the unexpected second wave in Pune: (a) movements, i.e., the opening of places such as schools, public gardens, restaurants & bars; (b) violation of Covid appropriate behaviour, social norms and strict home quarantine norms; (c) new variants; (d) vaccines and their efficacy; and (e) waning immunity.

Table 1 Parameters and their uncertainties

Simulation-Based Experiments

Fig. 6 Reported data and known facts

Our prior study (Barat et al. 2021) predicted that a second wave in Pune could start from March 2021 and was expected to continue until mid-April, with 70% of the critical case load observed during the first wave. While our prediction for the timeline of the second wave matched reality, the peak was much higher than predicted, as shown in Fig. 2. We conducted a thorough, systematic ex post facto analysis of possible infection spread dynamics to understand how the new factors and their associated uncertainties might have influenced the overall situation that resulted in such a significant deviation. The key objective of our analysis is not just to identify what might have gone wrong in our model but also to understand the key factors that should be analysed critically for better prediction of future waves.

As discussed in Sect. 3 and depicted in Figs. 3 and 4, more than 20 critical variables along seven broad dimensions might have influenced the Key Indicators (KIs) of the pandemic in a given city. The KIs include detected cases, critical cases, and deaths. To understand the possible influences of these critical factors on the KIs in quantitative terms, we conducted several experiments by contextualizing our extended digital twin with precisely known facts (i.e., known knowns (Pawson et al. 2011)) and facts that were known with a certain degree of uncertainty (i.e., known unknowns (Pawson et al. 2011)). The known known factors, such as vaccine criteria and adoption in Pune, are incorporated as an extended model in the digital twin, and the known unknown factors are added as ranges of possible values, where the ranges are defined based on available information from authentic sources and domain knowledge from epidemiological, demographic, and administrative perspectives. The list of such factors, with possible ranges and the guiding principles for determining those ranges, is summarized in Table 1.

Experimental Setup

An ex post facto analysis of multiple variables to understand their influence on infection dynamics by observing real data is a classic sensitivity analysis or polynomial regression problem, for which linear programming and other optimization techniques are candidate approaches. However, the stochasticity of the constituent elements of a city (i.e., people, places, variants, interventions and compliance with them) and the large number of unknown parameters with a wide range of possibilities make such approaches ineffective. Moreover, these parameter values do not change simultaneously; they are relevant at different points in time (e.g., vaccine adoption starts from a specific date for a specific group of people, restrictions get imposed from some other day, and a new variant enters Pune on yet another day), and often they become effective after an uncertain time delay, for instance, the effectiveness of a vaccine after a dose. The large number of variables and their wide range of variation over time make this problem cumbersome to solve using conventional optimization techniques. We therefore adopted a systematic simulation-based experimentation technique that explores the relevant variables over multiple simulation runs. The simulation results were compared with reported official data pertaining to detected cases, critical cases and deaths to understand the overall dynamics and possible influences.

Table 2 Phases for experimentations


The search space, i.e., the parameter values specified in Table 1, is explored through iterative simulation of the digital twin. To make the iterations manageable without compromising the precision of the analysis, we divided the time from January 2021 to November 2021 into five phases and interpreted the reported data along those phases, as shown in Fig. 6. Each phase has specific characteristics, as summarized in Table 2. Therefore, only a limited subset of parameters is relevant for exploring a phase, as illustrated in Fig. 7. Essentially, this prunes out combinations that probably did not occur. For example, the effect of vaccines was negligible during Phase 1 and Phase 2, as fewer than 5% of the people in Pune had been vaccinated with at least one dose. Therefore, exploring vaccine efficacy during Phase 1 might not be a pragmatic consideration.
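The pruning effect of the phase-wise exploration can be sketched as follows. The parameter names, candidate values, and phase-to-parameter mapping below are invented for illustration (the actual parameters and ranges are those of Table 1 and Fig. 7); the point is only that restricting each phase to its relevant parameters shrinks the number of scenarios to simulate.

```python
from itertools import product

# Hypothetical parameter ranges; real ranges are listed in Table 1.
PARAMETER_RANGES = {
    "mask_compliance": [0.3, 0.5, 0.7],
    "quarantine_violation": [0.2, 0.5],
    "delta_infectivity_multiplier": [1.5, 2.0],
    "first_dose_efficacy": [0.3, 0.5],
}

# Hypothetical mapping from phase to the parameters worth exploring in it.
RELEVANT_BY_PHASE = {
    "phase_1": ["mask_compliance", "quarantine_violation"],
    "phase_3": list(PARAMETER_RANGES),
}


def scenarios_for_phase(phase, baseline):
    """Yield one parameter dictionary per combination relevant to the phase.

    Parameters not explored in this phase keep their baseline value.
    """
    names = RELEVANT_BY_PHASE[phase]
    for values in product(*(PARAMETER_RANGES[name] for name in names)):
        scenario = dict(baseline)
        scenario.update(zip(names, values))
        yield scenario
```

In this toy setting, Phase 1 requires 3 × 2 = 6 scenarios instead of the 24 combinations of the full grid, because the variant and vaccine parameters are held at their baseline values.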

Multiple iterations focusing on a subset of parameters from Table 1 and the correlation of simulated values with actual reported values of the key indicators (KIs) helped to derive possible parameter values (i.e., hypothesis about a possible value of a parameter). Values derived from a specific phase are further analysed in the context of subsequent phases to prove or disprove the hypothesis under consideration. In addition, we also defined and simulated several anti-hypotheses to establish influences and understand overall dynamics.

Fig. 7 Phases for experimentation

Our hypotheses and anti-hypotheses are proved and disproved, respectively, through 345 scenario evaluations. To eliminate the threat to validity of simulation results and the possibility of considering extreme emergent situations of stochastic behaviours in a simulation run, we repeated each scenario 5 times.
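Repeating each of the 345 scenarios 5 times gives 345 × 5 = 1725 simulation runs in total. The following sketch shows one way such repetitions could be averaged and compared against reported data; the function names, the seeding scheme, and the use of root-mean-square error as the comparison metric are assumptions for illustration, and simulate stands for any stochastic simulator that returns one value of a key indicator per day.

```python
import random
from statistics import mean


def averaged_runs(simulate, params, repeats=5, base_seed=0):
    """Run a stochastic simulator several times and average day by day.

    simulate(params, rng) must return a list with one value per day.
    Each repetition gets its own seeded random.Random for reproducibility.
    """
    runs = [simulate(params, random.Random(base_seed + k)) for k in range(repeats)]
    return [mean(day_values) for day_values in zip(*runs)]


def rmse(simulated, reported):
    """Root-mean-square error between a simulated and a reported series."""
    if len(simulated) != len(reported):
        raise ValueError("series must have the same length")
    squared = sum((s - r) ** 2 for s, r in zip(simulated, reported))
    return (squared / len(reported)) ** 0.5
```

Averaging over repetitions dampens the influence of a single extreme stochastic trajectory, and a distance such as rmse gives a simple way to rank candidate parameter values against the reported series.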

Table 3 Predicted values and justifications
Fig. 8 Simulation results with best-suited parameter values

Fig. 9 Hidden indicators that influence pandemic dynamics

Synthesis of Simulation-Based Experiments

Synthesis of 1725 simulation runs with varying parameter values (from Table 1) guided by known facts for all five phases (as illustrated in Fig. 7) and critical comparison with the real data (as shown in Fig. 7) helped to derive likely values of the parameters along seven dimensions that are highlighted in Fig. 3. Derived values of these parameters along with a justification and our understanding of the overall dynamics for these parameters are summarized in Table 3. Predicted trends of KIs for derived parameter values in comparison with actual reported data are shown in Fig. 8. We argue that understanding the dynamics of a pandemic requires approximate understanding of a set of meaningful indicators that are not amenable to easy measurement, and we term these as Hidden Indicators (HIs). In this case, the distribution of existing variants, active infection, and seroprevalence level of a city serve as HIs. Detected cases are not an approximate representation of active infection as they differ significantly as a function of the testing uptake. Similarly, measuring the seroprevalence level of a city by knowing cumulative detected cases or death count is not appropriate as the number of detected cases is associated with testing while death is associated with the fatality rate of individual variants. The estimated values of HIs for the second wave are shown in Fig. 9.
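Why detected cases are a poor proxy for active infection can be seen from a small arithmetic sketch. The decomposition below into symptomatic and asymptomatic infections, each with its own testing uptake, is an assumption made for illustration, as are the function name and all parameter names.

```python
def detected_cases(active_infections, symptomatic_fraction,
                   uptake_symptomatic, uptake_asymptomatic):
    """Expected number of detected cases for a given level of active infection.

    uptake_* are the assumed fractions of each group that get tested.
    """
    symptomatic = active_infections * symptomatic_fraction
    asymptomatic = active_infections - symptomatic
    return symptomatic * uptake_symptomatic + asymptomatic * uptake_asymptomatic
```

Under these assumptions, 1000 active infections with uptakes of 0.8 and 0.2 and 2000 active infections with uptakes of 0.4 and 0.1 (both with half of the infections symptomatic) lead to the same expected count of about 500 detected cases, so the detected count alone cannot distinguish between the two situations.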

Analysis of the Phases

Known facts from Phase 1 indicate that the numbers of detected cases and critical cases were going down despite movement relaxations (Fig. 6e) such as the opening of schools, shops, offices and other places, neglect of CABs such as the use of face masks and social distancing, social & public gatherings, and quarantine norms. Two possibilities emerge: (a) seroprevalence reached a saturation level for the possible contacts allowed at that time and the infectivity associated with the Alpha variant, and (b) no other variant coexisted in a significant proportion. Phase 2 is an interesting transition phase. Although cases started increasing from mid-February 2021, the rise was not significant enough until March 15 (as demarcated with a red dotted line in Fig. 6) to suggest the existence of a new variant. Moreover, there was no update from WHO (WHO 2022) about any new variant at that point in time. Therefore, the surge in infections and critical cases until March 15 was attributed to administrative relaxations and noncompliance with CABs. The later part of Phase 2, i.e., after March 15, is when the situation could no longer be explained by movement relaxations and noncompliance with Covid norms alone. Our simulations indicate that the prevalence of the Delta variant was very low during Phase 1 and might have increased rapidly towards the end of February, possibly introduced from adjacent cities through people’s in- and outflow. Therefore, the effect of Delta compensated for the downward trend in Alpha-driven infections due to seroprevalence-related saturation in Phase 1.

Table 4 An overview of selected anti-hypothesis and rejection justification

Our analysis also indicates that Delta became dominant by mid-March, as shown in Fig. 9a. Movement relaxations, violation of Covid norms and violation of home quarantine norms, along with the significant presence of the Delta variant and supported by low vaccine adoption during Phase 2, explain the situation. The high infectivity of the Delta variant played a crucial role in the infection spread, and noncompliance with administrative interventions was the next most significant contributing factor. We explored several other possibilities (anti-hypotheses) as principal causes for the observed surge in Phase 2: (a) opening up of places, lifting of movement restrictions, and noncompliance with social and quarantine norms; (b) higher infectivity, severity, and fatality of the Delta variant; (c) existence of the Delta variant earlier than predicted; and (d) faster and greater waning of vaccine-induced immunity. Simulation-based explanations for rejecting these anti-hypotheses are summarized in Table 4. This understanding explains the discrepancy between our prior prediction (Barat et al. 2021) and reality. Essentially, we considered all factors except the emergence and dominance of the Delta variant while conducting our earlier experiments (Footnote 4).

Phase 3 is another critical phase of the second wave. The reported numbers of detected cases, critical cases, and deaths in Pune were rising rapidly despite the lockdown, high testing uptake, and an increasing number of vaccinated individuals, as shown in Fig. 6d. These trends indicate the possibility of a delay in implementing a strict lockdown or administrative interventions, noncompliance with home and institutional quarantine norms, high infectivity of the Delta variant, poor protection from partial vaccination, and a delay in building adequate immunity after vaccination. Our explorations, which were aimed at explaining the situation, led to the following combination of factors: (a) Delta being the dominant variant, (b) a possible 7–10 day delay in implementing strict administrative interventions (imposed from April 4 in Pune; Footnote 5), (c) violation of home quarantine norms by more than 50% of the people, thus spreading infection to household members, and (d) vaccine-induced immunity taking 2–3 weeks to develop to adequate levels after the administration of a dose. The explorations also indicate that the effectiveness of both vaccines, i.e., Covishield and Covaxin, might be below 50% after the first dose. For this phase, the dominance of the Delta variant was the most critical factor, followed by noncompliance with home quarantine norms.

Descending trends of detected and critical cases in Phase 4 indicate two possibilities: (a) saturation of the seroprevalence level for the infectivity of the Delta variant and the movements allowed for the Pune population, and (b) efficacy of the vaccines. Our experiments suggest that the former slowed down infection transmission during Phase 4 and the latter contributed to the sharp reduction of critical cases in Pune. Phase 5 was the time when vaccine efficacy was adequate and compensated for the high infectivity and virulence of Delta. High effectiveness resulted in fewer critical cases and deaths. Detected cases fell due to two factors: (a) a high seroprevalence level, and (b) reduced testing uptake. The latter is an indirect consequence of lower severity (due to the vaccine effect).

Fig. 10 Analysis on cumulative detected cases and deaths

Analysis of Factors Influencing the Indicators

Synthesis of reported data about KIs, along with simulation-led derivations of KIs and HIs, helps to understand the possible influences of various factors and the associated micro-dynamics. Notably, in our experiments the actual number of active infections during the second wave is computed to be lower than during the first wave, as shown in Fig. 9b. However, reported detected cases (authentic data from the Pune city administration) were found to increase rapidly from March 15 to May 15, as shown in Fig. 10. The key reason for such a rapid rise in detected cases is not a rise in actual infections at the same rate, but rather the increase in testing uptake, which was 6–8 times higher than the peak testing uptake during the first wave. Such high testing uptake can, to an extent, be attributed indirectly to the panic associated with the higher severity of the Delta variant (the dominant variant by that time).
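
The dependence of detected cases on testing uptake can be illustrated with a toy calculation. The sketch below is not part of the digital twin; it assumes, purely for illustration, that tests are spread uniformly over the population and that every test of an infected person is positive. The function name and all numbers are hypothetical.

```python
def expected_detected_cases(active_infections: int, population: int, tests: int) -> float:
    """Expected positives under the toy assumptions stated in the lead-in."""
    # Multiply before dividing so the illustrative numbers stay exact.
    return tests * active_infections / population


# Hypothetical city of one million people (not Pune's actual population).
POPULATION = 1_000_000

# Hypothetical "first wave": more active infections, fewer tests.
first_wave = expected_detected_cases(
    active_infections=100_000, population=POPULATION, tests=5_000
)

# Hypothetical "second wave": fewer active infections, seven times the tests
# (inside the 6-8x range reported in the text).
second_wave = expected_detected_cases(
    active_infections=80_000, population=POPULATION, tests=35_000
)

print(f"first wave detected:  {first_wave:.0f}")
print(f"second wave detected: {second_wave:.0f}")
```

Under these assumptions the hypothetical second wave reports more detected cases even though it has fewer active infections, which is the qualitative effect described for Pune.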

Fig. 11 Analysis of critical cases

Fig. 12 Actual and predicted situation during the third wave in Pune

Fig. 13 Understanding of hidden factors of third wave

Therefore, we argue that the number of detected cases in a city is not a good indicator for assessing the state of the Covid pandemic; it can be misleading in some situations. The number of critical cases, on the other hand, is a better indicator of the state of the pandemic in a city, but it is influenced by several factors. For example, the otherwise unexplainable rise of critical cases from March 15 to May 15 in Pune can be clarified if active infections, the distribution of the Delta variant in Pune, and the characteristics of the Delta variant are approximately predicted, and vaccine status is precisely known, as illustrated in Fig. 11a. In this case, Delta becoming dominant from mid-March and remaining prominent until mid-May is the key factor. Low vaccine coverage, limited efficacy of the first dose and the delay in developing vaccine-induced immunity were the next most significant contributing factors, followed by the delay in imposing a strict lockdown and noncompliance with home quarantine guidelines.

Death is arguably a more definite indicator among the existing KIs, but understanding the situation of a city based on the death count is also difficult, as it depends on critical cases and several hidden indicators, including the distribution of variants and the fatality rate of those variants. Moreover, deaths are also indirectly associated, to an extent, with the healthcare facilities available in a city. Therefore, relying on one or two indicators over a short time window to predict the future course of a pandemic or to understand its dynamics might be inappropriate in complex situations. Here, we considered multi-criteria evaluation along multiple key indicators and hidden indicators over an extended time horizon to understand the dynamics of a wave and a pandemic.

Table 5 Waves in Pune and their characteristics

Exploring 3rd Wave Dynamics

To understand the third wave dynamics, we introduced the Omicron variant, which caused the third wave, into the digital twin. We modelled an Omicron agent as a variant that is more infectious than Delta (as reported in Kumar et al. (2022)), less severe and less fatal than the Alpha variant, and capable of bypassing vaccination-induced immunity to an extent. We also configured movements for places based on the real situation in Pune during the third wave and added new interventions as they emerged. Our predictions of KIs and HIs, along with actual data, are shown in Figs. 12 and 13. Key observations from real data sources and simulation-led observations for all three waves are summarized in Table 5.

For the third wave, we estimated active infections to be double and critical cases to be one third of those in the second wave. The high number of active infections can be attributed to the high infectivity of Omicron and its immune bypass capability, the opening up of public places, the lifting of movement restrictions, and significantly reduced compliance with home quarantine norms, as shown in Table 5. However, the detected cases shown in Fig. 12c and the average weekly count in Table 5 do not reflect such high numbers, primarily because testing uptake was low compared to the second wave, as shown in Fig. 12a. Critical cases and deaths are computed to be low, as shown in Fig. 12d and e, respectively, due to the compounded effects of the low severity and fatality of the Omicron variant, high vaccine coverage and vaccine efficacy (as computed from the second wave analysis). In particular, the key reasons for the smaller number of critical cases, despite the high number of predicted active cases, are: (a) low severity of Omicron, (b) dominance of the Omicron variant as shown in Fig. 13a, (c) vaccine effectiveness and significant vaccination coverage within the eligible population (100% received one dose and 75% received both doses, as shown in Fig. 12b), and (d) the lower propensity of the healthy unvaccinated population below 18 years to develop severe symptoms. This observation (anecdotally) matches the real situation: severe cases during the third wave in Pune were mostly unvaccinated people or partially vaccinated people with comorbidity.

Wider Application and Limitations

Although all experiments presented in this paper are exclusively for Pune, the modelling parameters used in our city digital twin are exhaustive, which would make our model applicable to most cities in India and elsewhere, modulo the necessary contextualization. City-specific contextualization requires two types of information: parameters needed to faithfully represent a city, and parameter ranges needed to conduct ex post facto analysis. Contextualizing a digital twin for a city requires information describing the unique characteristics of the city, including its population, age distribution, a high-level understanding of comorbidity (as available for Pune; Footnote 6), and socio-economic characteristics (such as those available for Pune; Footnote 7). Parameters for ex post facto analysis require data as indicated in Table 1.

A key limitation of our approach is the sheer amount of detail required to contextualize a city digital twin. As we observed at the onset of the Delta-driven wave, this can make any ABM (not necessarily just ours) difficult to construct or tweak in time-critical situations. Therefore, an ABM may need to be supported by a low-order (and possibly lower fidelity) model that can be used to tune the ABM rapidly in the event of a clear divergence from the field data (Paranjape et al. 2022). To the best of our knowledge, a rigorous, methodical approach for designing and operating such a companion model has yet to be derived.

Concluding Discussion

In this paper, we presented a systematic simulation-based experimentation study to understand the factors that influence the key indicators of the Covid-19 pandemic. The study critically reflected on the official Covid-19 pandemic data reported by Pune city authorities to date to derive useful insights about possible influences and pandemic dynamics. These insights help justify the possible causes of the perplexing situations observed during the second and third waves. They also identify parameters that should be critically evaluated to predict future waves and assess their spread and impact. Our experiments indicate that the characteristics of the dominant variant are the principal contributing factor for infection spread. Variants with higher infectivity can potentially lead to a wave, as seen in the last three major waves. People’s movements, along with the characteristics of the variant, play a major role in intensifying or slowing down the pace at which a wave unfolds in a city during its initial phase. As summarized in Table 5, the first wave took around 160 days to peak in Pune. In comparison, the second wave took 91 days, and the third wave took only 31 days to peak. The key reasons for such differences are twofold: (a) increased infectivity of the dominant variant, i.e., Omicron is more infective than Delta, which is more infective than the Alpha variant, and (b) relaxation of movement-related restrictions and reduced adherence to Covid appropriate behaviour from the first wave to the second and then to the third, as shown in Table 5.
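
The time-to-peak figures quoted above can be compared directly. The short calculation below uses only the three durations cited from Table 5 (about 160, 91, and 31 days); the variable and function names are ours and serve illustration only.

```python
# Days from onset to peak in Pune, as quoted in the text from Table 5.
days_to_peak = {"first": 160, "second": 91, "third": 31}


def speedup(slower_wave: str, faster_wave: str) -> float:
    """How many times faster the second-named wave reached its peak."""
    return days_to_peak[slower_wave] / days_to_peak[faster_wave]


print(f"second vs first: {speedup('first', 'second'):.1f}x faster")  # 1.8x
print(f"third vs second: {speedup('second', 'third'):.1f}x faster")  # 2.9x
print(f"third vs first:  {speedup('first', 'third'):.1f}x faster")   # 5.2x
```

Each successive wave therefore peaked roughly two to three times faster than the one before it, and the third wave peaked about five times faster than the first.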

Detailed analysis of simulation results indicates that movement restrictions impacting open spaces (i.e., lockdown) can by themselves at best delay the onset of a wave, and only to some extent. Instead, appropriate control measures that reduce the mixing of people in closed places (as opposed to open places) and limit household infections lead to better results. Therefore, isolation in the form of strict home/institutional quarantine backed by effective testing contributes the most toward reducing infection spread. Wearing a face mask in indoor places helps to curb the infection spread to a large extent. Seroprevalence level, a hidden indicator, plays a role in deciding the peak value of active infection, but only when the dominant variant is not capable of bypassing immunity; the Omicron variant, which could bypass immunity to an extent, showed this during the third wave. Infectivity and severity of a variant have a complex relationship with other factors that influence how quickly a variant can become dominant and impact the healthcare system in a city. A variant with low infectivity and low severity disappears exponentially from the community with time, as we observed for Alpha vs Delta (shown in Fig. 9). A variant with low infectivity and high severity disappears much faster than a variant with low infectivity and low severity, as a severely infected person is likely to go for testing and subsequent isolation. A variant with high infectivity and low severity (e.g., the Omicron variant) quickly becomes dominant, as shown for Omicron and Delta in Fig. 13a. A variant with high infectivity and high severity can potentially be the most dangerous; however, high severity leads to early detection, which limits the impact significantly. Such a situation can be controlled through early testing, contact tracing and strict isolation. Therefore, a variant with high infectivity and high severity is unlikely to survive for long. A variant’s ability to bypass immunity is another characteristic that needs to be considered carefully, as it contributes significantly towards the magnitude of the peak of a wave.
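
The observation that a less infectious variant fades out roughly exponentially once a more infectious one circulates can be illustrated with a standard two-variant replacement model. This is a simplification introduced here for illustration, not the agent-based model used in this paper: it assumes that both variants grow exponentially and differ only by a constant per-day growth-rate advantage. The initial share and the advantage below are hypothetical values.

```python
import math


def new_variant_share(initial_share: float, growth_advantage: float, day: float) -> float:
    """Share of infections caused by the more infectious variant on a given day.

    Assumes a constant per-day growth-rate advantage (hypothetical parameter).
    """
    odds = initial_share / (1.0 - initial_share) * math.exp(growth_advantage * day)
    return odds / (1.0 + odds)


# Hypothetical values: the new variant starts at 1% of infections and grows
# 0.1 per day faster than the incumbent variant.
INITIAL_SHARE = 0.01
GROWTH_ADVANTAGE = 0.1

# Day on which the two variants are equally common under this model.
crossover_day = math.log((1.0 - INITIAL_SHARE) / INITIAL_SHARE) / GROWTH_ADVANTAGE

for day in range(0, 121, 30):
    new_share = new_variant_share(INITIAL_SHARE, GROWTH_ADVANTAGE, day)
    print(f"day {day:3d}: new variant {new_share:.3f}, old variant {1.0 - new_share:.3f}")
print(f"crossover after about {crossover_day:.0f} days")
```

In this model the old variant's share equals 1 / (1 + odds), so once the new variant dominates it shrinks approximately as exp(-growth_advantage × day), i.e., exponentially.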

Vaccination is the most critical factor in controlling the severity of infection and subsequent fatality, modulo the extent and quality of healthcare available. We saw its impact during the later stage of the second wave and throughout the third wave, as vaccine adoption in Pune had reached a significantly high level by then. However, reduced severity typically leads to reduced testing and lax compliance with isolation norms, as witnessed during the third wave. These two factors might lead to greater infection spread, thus putting at risk those with comorbidities and/or without vaccine protection.

The rapid recession of the third wave has led to an equally quick return to normalcy. With a sizeable fraction of the population yet to receive full vaccination cover, there is speculation about a further wave. Based on simulation results, we surmise that a major surge in infections is possible only if: a) a new variant characterized by moderate to high infectivity, moderate severity, and immunity-bypassing capability emerges, and/or b) a significant chunk of the population becomes susceptible due to immunity waning over time. The latter can cause a wave even in the absence of the former, as most of the variants are likely to persist for a while and can cause a surge if seroprevalence levels drop significantly. As predicting the possibility of a new variant is out of the scope of our study, this paper makes no prediction about future waves. However, it urges administrators and policy makers to be vigilant about Covid-19 mutations with specific characteristics and the possibility of waning immunity while defining a safer return to new normalcy.
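
The two surge conditions above can be related through the effective reproduction number. The sketch below uses the textbook homogeneous-mixing approximation R_eff = R0 × (1 − protected fraction), which is far coarser than the digital twin. The exponential waning curve, the immune-escape factor, and every numeric value are hypothetical assumptions chosen only to illustrate the argument.

```python
import math


def immune_fraction(initial_fraction: float, waning_rate_per_day: float, day: float) -> float:
    """Immune share of the population after exponential waning (assumed form)."""
    return initial_fraction * math.exp(-waning_rate_per_day * day)


def effective_reproduction_number(r0: float, immune: float, immune_escape: float = 0.0) -> float:
    """R_eff under homogeneous mixing.

    immune_escape is the share of existing immunity that a variant bypasses.
    """
    protected = immune * (1.0 - immune_escape)
    return r0 * (1.0 - protected)


# Hypothetical parameters, not estimates for Pune or any real variant.
R0 = 3.0
INITIAL_IMMUNE = 0.8
WANING_RATE = 0.005  # per day

# Baseline: high immunity, no escape.
r_now = effective_reproduction_number(R0, immune_fraction(INITIAL_IMMUNE, WANING_RATE, 0))
# Condition (b): immunity wanes for 60 days, same variant.
r_waned = effective_reproduction_number(R0, immune_fraction(INITIAL_IMMUNE, WANING_RATE, 60))
# Condition (a): a new variant bypasses half of the existing immunity.
r_escape = effective_reproduction_number(R0, INITIAL_IMMUNE, immune_escape=0.5)

for label, value in [("baseline", r_now), ("after waning", r_waned), ("immune escape", r_escape)]:
    print(f"{label:14s} R_eff = {value:.2f} -> {'surge possible' if value > 1 else 'no surge'}")
```

With these illustrative numbers the baseline stays below the threshold of 1, while either waning immunity or an immunity-bypassing variant alone pushes R_eff above it, mirroring the "and/or" in the argument above.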