Influenza is a common, highly contagious viral respiratory disease. Annually it affects 5 to 15% of the world's population, causing considerable morbidity and mortality in all age groups [1]. Influenza vaccines have been available for more than half a century. For optimal efficacy, vaccine strain compositions are updated regularly to counter "antigenic drift" that occurs progressively from season to season as a consequence of immune selection, so that the vaccine antigens are as close as possible to the circulating wild-type antigens. Current inactivated vaccines comprise preparations of virus from two subtypes of influenza A (H1N1 and H3N2) and one of influenza B. Purification of these trivalent vaccines leaves mainly viral haemagglutinin (HA) and neuraminidase (NA) glycoproteins. The haemagglutination-inhibiting (HI) antibodies generated in response to stimulation by an exposure to HA prevents infection by disrupting the binding of the virus to host receptors. The concentration of HI antibodies in the blood (HI titre) is measured using a specific immunological assay [2].

Despite the extensive use of the HI assay in the annual approval process of inactivated vaccines [3, 4] and in the evaluation of new seasonal or pandemic influenza vaccines, limited attempts have been made to use HI as a means to predict influenza vaccine efficacy. Based notably on the observations made in a seminal paper by Hobson et al [5], a HI titre of 1:40 is generally accepted to be associated with a 50% reduction in the risk of illness in a susceptible population [6], and can be referred to as the 50% protective titre (50% PT).

Recently, Gilbert et al. [7] used logistic regression to analyze the relationship between HI titre and vaccine efficacy but only as an illustrative example with data coming from one of the first clinical trials ever performed [8].

Better understanding of the relationship between HI titre and protection against illness may help evaluate vaccine efficacy when only immunological data are available. Pandemic vaccines offer a good illustration of circumstances in which an immune correlate is potentially useful for the assessment of vaccine efficacy [9]. More generally speaking, correlates of protection are valuable in any situation where practical issues or resource limitations prevent the direct estimation of vaccine efficacy.

Beyond the specific case of influenza, statistical validation of surrogate endpoints has generated extensive literature [1013]. Recently, Qin et al [14] developed a framework for the identification of different levels of correlates of protection adapted to the context of vaccination. Several applications of this methodology exist for drugs in the literature (see e.g. Molenberghs et al [15]), but only a few can be found for vaccines using either the results of a single clinical trial [7, 16, 17] or simulated data [18].

Here we describe the development of a model, using a meta-analytical approach, that relates protection against laboratory-confirmed influenza to HI titre.

The methodological problems raised by the development of this model can be divided in three categories. The first category is related to the nature of the relation between HI titre and protection against influenza. This relation is unlikely to be of linear form and the preventive role of HI antibodies must be separated from other factors that influence the occurrence of influenza illness. The second stems from the geographic and temporal variation that affects not only virus circulation but also possibly the level of protection conferred by HI titres. Assessing such variations requires the use of a meta-analytical approach with datasets collected over time in vaccinated and unvaccinated populations. We therefore considered a nonlinear hierarchical model with random effects associated with all parameters to be estimated. The third type of methodological problem is directly linked to the nature of data available to perform this estimation. They were collected from articles published in the medical literature and in which data are presented for a limited number of HI titre intervals. The model developed therefore accounts for this interval-censorship.


A model for estimating the relation between HI titre and protection against influenza

A simple model for one study, no censorship and no covariates

Influenza illness is the result of a complex process involving the risk of being in contact with an infectious individual, the risk that this contact leads to infection and finally the risk that infection results in illness. Our objective is not to model this whole process in detail but to focus on the protection afforded to an individual by the level of humoral HI antibodies.

The model starts with a baseline risk (0 ≤ λ ≤ 1) that an individual develops influenza in absence of any HI-related protection. To estimate the risk that an individual develops influenza (P(y j = 1)) in presence of HI antibodies, this baseline risk is combined with a function defining the contribution of HI titre to the individual's protection (0 ≤ π (T j , θ) ≤ 1, where Tj is the HI titre and θ is the associated vector of parameters). More specifically:


To fully characterize this estimation, the functional form associated with π (T j , θ), hereafter referred to as the HI-protection curve, needs to be specified. π (T j , θ) will be a flexible and smooth increasing function. In accordance with Dunning [16] it is specified as a two-parameter inverse logit function (θ ={α, β}) applied to log-transformed HI titre values. The original model is then further modified to make its parameters more directly interpretable, leading to the following equation:


The parameter α is closely linked to the 50% protective titre (50% PT) first identified by Hobson et al [5]: ∀θ, T j = e απ (T j , θ) = 0.5. The parameter α can therefore be interpreted as a location parameter for the HI protection curve.

The parameter β is also easily interpretable since it is directly related to the slope of π (T j , θ) for , i.e. it describes the steepness of the HI protection curve.

A random-effects model with uncensored data with covariates

The inclusion of multiple datasets (i = 1, ..., I) enables to account for the heterogeneity likely to impact the HI protection curve. This was done using a hierarchical model, considering random effects for the parameters of the curve (α i , β i ). The datasets considered for these random effects correspond to observations made for one virus strain in one study.

Part of this heterogeneity may be explained by relevant covariates. We tested the impact of several covariates by adding in some models a vector of binary variables Xi with associated parameters α c , β c . This representation limits the analysis to covariates whose values are common to all subjects of a given dataset and are consistent with the type of data considered.

Finally, the baseline risks λ i , which are unrelated to the HI protection curve, will be treated as independent dataset-specific parameters. No assumption was therefore made on the variation of λ i across studies. Similarly to random effects, we consider baseline risks to be specific to datasets corresponding to observations made for one influenza strain in a study.

The probability to be estimated becomes


Where θ i = {α i , β i , α c , β c }.

A random-effects model with interval-censored data with covariates

Censorship does not modify the structure of the random-effects model but rather necessitates an additional stage for its estimation. In a Bayesian framework, censorship can be accounted for using data augmentation techniques [19]. The basic idea is to consider the unknown Tij values as latent variables and to use sampling to impute their possible values given their underlying distribution and the interval to which they are known to belong . With this approach, Tij are in fact treated as additional parameters. We considered here that the log-transformed HI titres are normally distributed (with mean μ i and standard deviation σ i ) and specific to each dataset considered in the analysis. Provided that censorship is non-informative, the relationship between HI titres and clinical protection continues to be estimated using equation (3).

Estimation methods

The nature of the model considered led us to adopt a Bayesian approach to perform the estimation of its parameters. Markov Chain Monte Carlo (MCMC) methods were implemented using the Bayesian software package WinBUGS [20]. The corresponding directed acyclic graph is presented Figure 1. Posterior summary statistics were based on 3 Markov chains of 20,000 lengths after a burn-in period of 20,000 iterations. Convergence was assessed using Gelman-Rubin statistics [21] as well as the iteration history and kernel densities.

Figure 1
figure 1

Directed acyclic graph of the interval-censored model with covariates. Square boxes represent fixed quantities, white circles stochastic nodes, grey circles logical nodes, solid arrows stochastic dependencies and dashed arrows deterministic dependencies. Bold lines corresponds to nodes associated with data.

We assumed the two random parameters of the HI protection curve to be normally distributed (). For dataset-specific and other parameters, we selected the following non-informative prior distributions:

Our choice of non-informative priors is quite usual. It is however worth mentioning that for the baseline risks λ i , we specifically considered Jeffreys prior for the Beta distribution which can be regarded as less informative than the alternative choice λ i ~ Beta(1,1) (see e.g. Gupta and Nadarajah [22]).

The comparison of the estimation of the HI protection curve obtained with the different models tested was mainly performed using the Deviance Information Criterion (DIC) [23]. This criterion, an Akaike-like criterion for Bayesian models, assesses the goodness-of-fit of a model using posterior mean deviance. To compare models with and without covariates, we also used the 95% credible interval of the parameters associated with covariates.


To estimate HI-related protection, we reviewed the literature to identify datasets combining information on HI antibody titre and the occurrence of influenza. Relevant publications from 1945 to 2006 were identified in the Medline and Embase databases using the search terms: influenza, immune correlate, protection, vaccination, vaccine, immunogenicity, protective efficacy, serological surrogate.

We identified 36 articles, 21 of which were excluded based on inclusion of non-target populations (children in 9 studies; pregnant women in 1 study); insufficient data to allow quantification of the link between HI titre and influenza protection (6 studies); vaccination using live vaccines (3 studies); previously reported data (2 articles).

Fifteen articles were thus retained: 6 challenge studies, 5 clinical trials and 4 cohort studies (Table 1). In challenge studies, healthy adult volunteers were randomised to receive vaccine or placebo and then were exposed to a fixed dose of influenza virus after a pre-challenge serum HI titre assessment. In clinical trials, participants were randomized to receive vaccine or placebo before the influenza season, and had at least a pre-season serum HI titre assessment. In cohort studies, all the participants included in the cohort had at least a pre-season serum HI titre assessment, performed after immunization for vaccinated participants. In all clinical trials and cohort studies, the occurrence of influenza was observed during the influenza season following the HI titre assessment. Most study subjects were adults younger than 60 years, and only one study included elderly adults (60+ years) [24].

Table 1 Characteristics of studies included in the analyses

Several studies reported data for both vaccinated and non-vaccinated populations and HI titres corresponding to different vaccine strains. We split this information into different datasets to consider homogeneous groups of subjects in terms of HI titre information and vaccination status. Thus, the 15 articles provided 37 datasets presented Table 1 for a total of 5889 observations and 1304 influenza cases. All except 39 influenza cases in one study [24] were laboratory confirmed using paired serology (four fold change) or virus isolation.

These data enabled us to test the effect of circulating strain (A or B), vaccine exposure (yes or no), study design (clinical trial, challenge study or cohort study) and diagnostic method (laboratory-confirmed clinical diagnosis, serological diagnosis and clinical diagnosis).

HI titres are measured using a two-fold serial dilution assay, which determines the highest dilution of serum which still inhibits haemagglutination [2]. So while the concentration of HI antibodies can be considered to be a continuous variable, and will be treated as such in our analysis, the assay only provide a limited number of possible results, creating the first level of censorship. A second level of censorship results from how the data are published. In most of the cases, data are reported for a limited number of HI titre intervals and not for each possible result of the HI assay.


Gelman-Rubin statistics and kernel densities (Figure 2) confirm that the markov chains and the burn-in period are sufficiently long for the posterior statistics to be meaningful. The results presented Table 2 show a strong and positive relationship between HI titre and the occurrence of influenza: β is significantly different from 0 and positive whatever the model considered. These results are reinforced by the overall good fit to the data illustrated Figure 3 for the ALL model (all available data included and no covariate).

Table 2 Impact of vaccination status and vaccine strain on the estimation of HI-related protection:
Figure 2
figure 2

Gelman-Rubin diagnostic plot (a) and kernel densities (b) (ALL model). GR plot: blue corresponds to within-chain variability, green to between chain variability and red to their ratio.

Figure 3
figure 3

Estimated (Y-axis) versus observed (X-axis) number of influenza cases for the 37 datasets considered in the analysis (ALL model). Red corresponds to vaccinees, blue to non vaccines, grey to unknown status, circles to type A strain and triangles to type B.

The impact of covariates related to vaccination status and virus strain are important for the interpretation of the HI protection curve. In both cases, their impact is not significant: DIC does not decrease or even increase when {α c , β c } are added to the model and none of these parameters significantly differ from 0. The same curve therefore seems to apply regardless of the type of virus strain considered and regardless of the vaccination status of the subjects. This latter result is of particular importance as the absence of significant differences between vaccinated and non vaccinated subjects corresponds to a condition set by Prentice [10] for defining an accurate surrogate endpoint.

We also estimated the HI protection curve considering only the data reported by Hobson et al [5] (HOB model). The 50% level of protection is obtained with this model for a log-transformed HI titre is 3.38 [95% CI:1.5;5.2] corresponding to 1:29 in the natural scale, which is consistent with the 50% protection titre identified by Hobson and his co-workers (1:18-1:36) but remains higher to that obtained in the ALL model (1:17 in the natural scale). The credible interval is also much wider in the HOB model [1:5;1:195] than in the ALL model [1:10;1:29], which indirectly supports the use of a meta-analytical approach to obtain more robust estimates and to distinguish uncertainty from heterogeneity.

The standard deviations reported in Table 2 provide insight on the importance of the heterogeneity of the HI protection curve across studies. Significant variations can be observed: the 25% percentile associated with the variation of the 50% PT from one study to another can be calculated to correspond to a HI titre of 1:10 and the 75% percentile to a HI titre of 1:30. Similarly, the steepness of the HI protection curve, as measured by β, ranges from 1.05 to 1.55.

Finally, we tested covariates corresponding to the conditions in which the observations on influenza were collected (Table 3): study design (DES model) and diagnostic method (DIAG model). None of these covariates improved data fit as measured by the DIC criterion or the significance of the corresponding parameters.

Table 3 Impact of study design and diagnostic method on the estimation of HI-related protection

Overall, our results support the conclusion that the best representation of the relationship between HI titres and protection against influenza is obtained when combining all available data without any covariate (ALL model). The corresponding HI protection curve is presented Figure 4.

Figure 4
figure 4

Estimated probability of protection according to the level of HI titre. (All Model - Posterior Mean value and 95% credible interval).


Although the association between serum HI antibody titre and protection against clinical influenza is well established [25, 26], the precise nature of the relationship has drawn little attention. To address this, we developed a model to provide an estimate of the level of protection associated with any HI titre level for use to predict influenza vaccine efficacy. Using HI data from multiple studies, the model showed a positive and significant relationship between immunogenicity and clinical protection. The relationship was notably found to be similar for vaccinated and unvaccinated subjects which is in accordance with Prentice criterion for a good surrogate marker [10].

Our results challenge the usual approach of defining protection based on the identification of a single threshold. The slope of the protection curve in all of the models we tested favoured a progressive increase in protection with increasing HI titre rather than a discrete threshold that could be applied to each individual. The incremental increase in clinical protection is however particularly important at titres of up to 1:100 (which includes the commonly used 1:40 threshold). Additional benefits become marginal beyond 1:150, which concords with the result derived by De Jong et al [25] who estimated the median 90% PT for influenza vaccines to be 1:192. Our analysis therefore supports their conclusion that developing influenza vaccines capable of reducing the number of poor or low responders would be clinically beneficial.

The comparison between our reference case combining all available data (ALL model) and the model based only on data reported by Hobson et al (HOB model) provides a good illustration of the added value of the analysis presented here. The results of HOB model are consistent with the results reported in the original paper in terms of 50%PT. Our model therefore does not contradict previous results, but provides an enhanced representation of the relationship between HI titre and protection against influenza. In addition, the large confidence intervals associated with HOB model highlights the need to combine a large number of observations to get an accurate representation of this relationship and to account for the heterogeneity across studies. This is precisely the added value of the meta-analytical approach proposed here. This meta-analysis confirms the existence of diversity likely to affect the interpretation of the results of a single study. This heterogeneity can be explained both by the conditions in which the subjects were exposed to influenza (e.g. time between HI titre assessment and occurrence of influenza) or by the design of the studies (e.g. laboratory methods).

It has to be mentioned that to simplify model specification, we considered random effects capturing jointly the heterogeineity across studies and across virus strains in the same study. A more detailed representation would have been possible notably to be able to consider specifically these two levels of heterogeneity.

Overall, the different covariates considered did not improve the fit to the data either measured by the DIC criterion or by the significance of the corresponding parameters. This result holds not only for vaccination status but also for the type of viral strain, study design and diagnostic method. This supports the idea of the large applicability of the HI protection curve derived in this analysis.

Some similarities can be found between our results for HI assay and a statistical Surrogate of Protection (SoP) defined by Qin et al [14] i.e. an immunological measurement characterized by a relationship with the end point of interest (in this case laboratory-confirmed influenza) that is similar in vaccinees and non vaccinees. One important point is that no attempt was made to establish a causal relationship but only to identify a statistical link. it can for instance be argued that HI antibodies only relate to the humoral immune response and therefore neglects the role played by cell-mediated immunity [27]. However, the coexistence of different biological mechanisms does not preclude the identification of a statistical link with one specific measurement. It only requires that this link is not improperly interpreted as sufficient to explain the complex biological mechanisms that trigger the protection against an infectious disease.

The main application of a surrogate of protection is predicting vaccine efficacy. Although our analysis was at this stage only focused on the derivation of an HI protection curve, this is clearly an important next step. The HI protection curve can also be used for comparing vaccines characterized by different immunological profiles. This can be seen as an improvement over the use of standard criteria such as seroprotection rates (i.e. percentage of subjects with a HI titre above the 1:40 threshold for protection). As pointed out by Nauta et al. [28], such criteria may be misleading for this type of comparison if the HI protection is better described as a curve than using a threshold approach.

The reliability of our model depends directly on the quality of data used for the estimation and calculation phases. As we used published data, there were some limitations that could not be overcome. The data were acquired and published over a period of many years, and the studies involved heterogeneous populations, different study designs, and in some cases inadequate or no description of randomization procedures. The HI test itself changed over time and is also subject to inter-laboratory variability [29]. Other differences noted were: vaccine composition and dosage, case definition, interval between vaccination and antibody titration, assay method. The selected datasets also had limited information on the status of confounding factors such as pre-vaccine antibody level titre, influenza vaccination history, prevalence of co-morbidities, nutritional deficiencies, chronic exposure to stress or drugs that could affect the immune system [30]. The lack of available information on baseline covariates such as previous history of influenza disease or vaccination is another limitation. Antigenic similarity between vaccine strains and circulating strains, which vary with time, even during a single season, plays a key role in vaccine efficacy. While our use of results from 15 studies may have partially overcome this heterogeneity, reported immunogenicity results generally correspond to vaccine strains having a good match with circulating strains. This problem of matching is particularly critical for pandemic vaccines, and the direct application of our results to vaccines developed in the context of pandemic preparedness should be considered very cautiously. It is also important to stress than H5N1 vaccines require a different HI test than the one used for seasonal vaccines [31].

To further evaluate and establish this model, an important development will be to perform a similar analysis using data that includes detailed information at the individual level with a virological diagnosis of influenza (most cases considered in this analysis were serologically confirmed). The accuracy of such an analysis will however ultimately depend on the number of influenza cases considered. We believe that our use of over 1000 influenza cases in establishing our model as well as the consideration of study heterogeneity can be seen as the major strength of the analysis performed.

Finally our analysis was exclusively focused on the case of influenza. However, the question raised by the identification of a good correlate of protection is applicable to all vaccine-preventable diseases. Meta-analytical approaches have been extensively used for validating surrogate endpoints for therapeutic drugs [15], but the number of applications to vaccines remains very limited. The approach used here, which relies on published information to access a large number of cases (the "price to pay" in terms of data quality being here censorship), could be easily adapted to other vaccine-preventable diseases.


The model developed enables us to specify the relationship between HI antibody titres and clinical protection against influenza while accounting for heterogeneity among studies. This relationship appears consistently positive and similar irrespective of vaccination status or viral strain and could be used to predict the efficacy of inactivated influenza vaccines when only immunogenicity data are available.