Background

It is estimated that one quarter of the world population is infected with (TB). Although the disease is preventable and treatable, about one and a half million people die annually from it, effectively placing TB as the first infectious cause of death. Due to person to person infection and treatment mismanagement, (MDR) TB continues to emerge, increasing the complexity in treatment and thus potentially worsening the transmission rate. There is a growing awareness that TB can be effectively fought only working globally, starting from countries like India, where the infection is endemic [1].

Once a person is diagnosed with TB, one of the most critical issues is the duration of the therapy, because of the high costs involved, the increased chances of non-compliance (which increase the probability of developing an MDR strain), and the time the patient is still infectious to others. One exciting possibility to shorten the duration of the therapy are novel host-reaction therapies (HRT), as an adjuvant for antibiotic therapy. Typical endpoints in the clinical trials for HRTs are time to sputum culture conversion, and incidence of recurrence. While for the first it is in some cases possible to have a statistically powered evidence for efficacy in a phase II clinical trial, recurrence almost always requires a phase III clinical trial with thousands of patients involved, and huge costs.

The in silico trials for tuberculosis vaccine development (STriTuVaD) project is an EU funded, multidisciplinary consortium testing the RUTI vaccine in a Phase IIb clinical trial. RUTI® antitubercular vaccine, provided by Archivel Farma S.L, is a polyantigenic liposomal vaccine containing fragments of Mycobacterium tuberculosis cells, currently being developed as therapeutic vaccine in patients with pulmonary tuberculosis. The vaccine, shown to be one of the most advanced therapeutic vaccines against drug sensitive TB and MDR-TB, has already been studied in healthy volunteers and for the prevention of active TB in patients with latent TB [2].

To help in this development, we extend Universal Immune System Simulator (UISS) [3, 4] to include the relevant determinants of such clinical trial, we establish its predictive accuracy against the individual patients recruited in the trial, use it to generate digital patients, predict their response to the host-reaction therapy being tested, and combine them to the observations made on physical patients using a new in silico-augmented clinical trial approach that uses a Bayesian adaptive design. This approach, where found effective could drastically reduce the cost of innovation in this critical sector of public healthcare.

To reproduce biological the diversity of the subjects to be simulated, an appropriate strategy for the generation of libraries of digital patients is developed by identifying a vector of features involving both biological and pathophysiological parameters, facilitating the personalisation of the digital patient.

In this paper we sketch the strategy we adopt to generate the cohort of digital patients, and show some preliminary results about the dynamics of TB on a subset of these patients. First, we briefly describe UISS and its extension to TB.

Extending UISS to track TB

We will briefly describe here the UISS computational framework and its extension to model tuberculosis, UISS-TB. The interested reader can find more detail in [5].

UISS is a multi-agent framework for the simulation of the immune system dynamics that can be extended to track specific diseases and related treatments. Unlike classical top-down approaches, where mean behaviours are modelled through systems of differential equations [6,7,8], agent based models and multi-agent systems track individual entities. It is the interactions between these entities that can give rise to global nonlinear behaviours. UISS has been developed as a multi-scale computer simulator of the immune system, as it takes into account both cellular and molecular entities and processes.

UISS has a proven track record, for instance it has been used for modelling the effects of a vaccine against the onset of mammary carcinoma [9, 10] and consequent lung metastases [11]; for the initial stages of atherosclerosis [12], for melanoma [3]; more recently, in the study of multiple sclerosis [4, 13] and for testing the efficacy of citrus-derived adjuvants for influenza vaccines and human papilloma virus [14, 15]. For its use within STriTuVaD, we have extended UISS to include TB dynamics along with the artificial immunity induced by vaccination strategies as presented in [5].

In order to depict individuals, a vector of features comprising biological and pathophysiological parameters has been identified. The list of parameters, their relative range and units are displayed in Table  1.

Table 1 Vector of 22 features for individualising virtual patients

Methods

In order to create an in silico patient, one needs to provide a single value for each feature. These values could be taken from individual physical patients; however, if a cohort of digital patients is to be produced, one should have a mechanism for producing as many different input vectors as needed, that are biological/physiological plausible. Formally, this requires the characterisation of the joint distribution of the inputs in the population. We have compiled typical values and standard deviations for each feature, providing a way to generate plausible values for each component at a time. Proceeding in this way would neglect the biological correlations between features and thus would not guarantee a physiologically plausible input vector. Hence, we must take into account these correlations. Given that we have 22 input variables, we should specify \(22 \times 21/2 = 231\) correlations. Using relevant literature [16, and references therein] and expert opinion, we have qualified these correlations, determining that all correlations are positive, but the correlation of IL-10 with the rest of the features.

Formalising in silico profile generation

In theory, one could elicit the joint distribution of the features vector, i.e. describe mathematically how each feature relates to the others in a space of 22 dimensions; but this would be not only extremely difficult, but also time consuming and data demanding. Our approach is to rely on current mathematical biology consensus and use a Gaussian to represent the population distribution. The additional advantage of using this approach will be discussed in the next section.

Formally, we say that the vector \({\varvec{f}}\) = \(\left\{ {f_{1},\dots ,f_{d}}\right\}\) follows a d-variate Gaussian distribution with joint probability density function,

$$\begin{aligned} \hbox{N}_{d} ({\varvec{f}} | \boldsymbol{\mu }, {\Sigma }) = \frac{|\Sigma |^{-1/2}}{(2 \pi )^{d/2}} \exp \left[ {- \frac{1}{2} \left( {{\varvec{f}} - \boldsymbol{\mu }}\right) ' \Sigma ^{-1} \left( {{\varvec{f}} - \boldsymbol{\mu }}\right) }\right] , \end{aligned}$$

with mean \(\boldsymbol{\mu } = \left\{ {\mu _{1},\dots ,\mu _{d}}\right\}\) and covariance matrix,

$$\begin{aligned} \Sigma = \left( {\begin{matrix}\sigma ^2_{1} &{} \sigma _{12} &{} \ldots &{} \sigma _{1d} \\ \sigma _{21} &{} \sigma ^2_{2} &{} \ldots &{} \sigma _{2d} \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ \sigma _{d1} &{} \sigma _{d2} &{} \ldots &{} \sigma ^2_{d} \end{matrix}}\right) \,, \end{aligned}$$

where,

$$\begin{aligned} {{\,\mathrm{Cov}\,}}\left( {x_i, x_j}\right) = \sigma _{ij} \, \text {related to the correlations by} \, {{\,\mathrm{Cor}\,}}\left( {x_i, x_j}\right) = \rho _{ij} = \frac{\sigma _{ij}}{\sqrt{\sigma ^2_i\sigma ^2_j}}\,. \end{aligned}$$

So, if we are able to elicit a measure of correlation between two inputs, we can calculate their covariance.

The elements in the diagonal, \(\sigma ^2_i\) are the marginal variances of each element, \(f_i\), and \(\mu _i\) the corresponding marginal mean. As mentioned above, we already have compiled a list with these values, so we have elicited values for \(\boldsymbol{\mu }\) and the diagonal elements of \(\Sigma\), \(\sigma ^2_i\).

Cohort generation

Once \(\boldsymbol{\mu }\) and \(\Sigma\) have been elicited, generating an in silico profile is a relatively trivial task: one must sample a point in the 22-dimensional space, consistent with \(\hbox{N}_{22} ({\varvec{f}} | {\boldsymbol{\mu }}, {\Sigma })\). However, we can exploit the properties of the Gaussian distribution to produce a cohort consistent with some specific characteristics. Say, for instance, that our target population has a particular range of BL, we would like then to produce digital patients consistent with that specific profile. Formally, let \(f_1\) represent BL and \({\varvec{f}} _{-1} = \left\{ {f_{2},\dots ,f_{22}}\right\}\), the rest of the features; we would like to sample from \(\hbox{N}_{21} ({\varvec{f}} _{-1} | {f_1, \boldsymbol{\mu }}, {\Sigma })\), ie the conditional distribution of the rest of the features, given that BL has a specific value. This is a standard procedure, which can be readily implemented.

We can go further and sort the list of features according to either their importance in determining the profile of a patient, or to the precision of their elicited mean, variance and covariance, and then proceed to sample from the conditional distributions. In general, let \({\varvec{f}} _s\) denote the vector of features with pre-specified values, so that \({\varvec{f}} = \left\{ {{\varvec{f}} _s, {\varvec{f}} _r}\right\}\), \({\varvec{f}} _s \in {\mathbb{R}}^{d-q}\), where \({\varvec{f}} _r \in {\mathbb{R}}^q\) is the vector of free features.

The conditional distribution, \(p ({{\varvec{f}} _r}| {{\varvec{f}} _s = {\varvec{a}}})= \hbox{N}_{q} ({\varvec{f}} _r | {\boldsymbol{\nu }}, {\Omega })\) with

$$\begin{aligned} \boldsymbol{\nu } = \boldsymbol{\mu } _r + \Sigma _{rs} \Sigma _{ss}^{-1} ({\varvec{a}} - \boldsymbol{\mu } _s) \quad \text {and} \quad \Omega = \Sigma _{rr} - \Sigma _{rs} \Sigma _{ss}^{-1} \Sigma _{sr}, \end{aligned}$$

where

$$\begin{aligned} \Sigma = \left( {\begin{matrix} \Sigma _{ss} &{} \Sigma _{sr} \\ \Sigma _{rs} &{} \Sigma _{rr} \end{matrix}}\right) \quad \text {with sizes} \quad \left( {\begin{matrix} (d-q) \times (d-q) &{} (d-q) \times q \\ q \times (d-q) &{} q \times q \end{matrix}}\right) . \end{aligned}$$

\(\Omega\) the Schur complement of \(\Sigma _{rr}\) in \(\Sigma\). Judicious choice of \({\varvec{f}} _s\) and \({\varvec{f}} _r\) enables sampling sequentially, e.g. from least to most important feature.

Results

We created an R script [17] for the generation of digital patents, available from the corresponding author upon request. We report results from three groups of 15 patients with different profiles, each with fixed (Age, BMI and MtbSputum) to roughly represent different profiles in the population and initial bacterial load. Profile 1 has (35, 21.4, 15), Profile 2 (45, 28.2, 502), and Profile 3 (55, 31.8, 910), the full set of values can be obtained from the Additional file 1. These can be used as input to the UISS-TB web interface, available from www.strituvad.eu (accessed on 28/07/20), by selecting the Tuberculosis disease model, hence accessible to any user with a conventional computer and access to the internet.

The GUI panel displays default values and admissible ranges for the vector of features parameters. Once the specific vector of features is completed, the user can click on the Submit button and a unique identification simulation number is assigned. The user can check the simulation status by clicking on the check status button, after selecting the appropriate simulation id. When the simulation is complete, the user can visualise results of immune system dynamics. In our case, the progression of each patient was simulated 50 times for 1 year, with levels of the various species recorded every 600 seconds. The data from each patient requires roughly 100 MB of disk storage.

We use the total (Ab) to exemplify some characterisation of the output; e.g. Fig. 1 shows the total Ab count for one simulation of the 15 patients in Profile 1. In order to characterise the mean behaviour, we average the 50 repetitions per patient. Figure 2 depicts the median and quartiles for a selection of patients (columns) for each profile (rows). It is clear there is an increased variability around the main and secondary peaks; while levels consistently fall back to nought after roughly 16 days (3500 h). The distribution of time at the peak level is illustrated in Fig. 3, it occurs consistently within 112–116 days for all profiles, while Profile 3 shows a slightly increased variability.

Fig. 1
figure 1

Profile 1 antibodies count. Time traces of the antibodies count for the 15 virtual patients in Profile 1, using only one out of the 50 simulations

Fig. 2
figure 2

Average antibodies count. Time traces of the average antibodies count for a sample of 3 virtual patients from each profile. The count has a main peak roughly at 4.5 hrs regardless of the profile

Fig. 3
figure 3

Time at peak. Distribution of time at peak antibodies count by patient and profile and the distribution of the average by profile

Conclusions

UISS-TB is a state-of-the-art agent based model capable of tracking the dynamics of TB infection in humans. Individual digital patients are defined by a vector features, known to be fundamental in TB infection dynamics and normally measured clinically, hence often readily available.

Discussion

In order to produce virtual cohorts of patients, we propose a sequential approach based on a characterisation of the distribution of these features in the population of interest; the approach allows to fix any combination of features, enabling mimicking patient selection criteria, thus yielding a method for setting up augmented in silico clinical trials.