Structure-Based Prediction of Anti-infective Drug Concentrations in the Human Lung Epithelial Lining Fluid

Obtaining pharmacologically relevant exposure levels of antibiotics in the epithelial lining fluid (ELF) is of critical importance to ensure optimal treatment of lung infections. Our objective was to develop a model for the prediction of the ELF-plasma concentration ratio (EPR) of antibiotics based on their chemical structure descriptors (CSDs). EPR data were obtained by aggregating ELF and plasma concentrations from historical clinical studies investigating antibiotics and associated agents. An elastic net regularized regression model was used to predict EPRs based on a large number of CSDs. The model was tuned using leave-one-drug-out cross validation, and the predictions were further evaluated using a test dataset. EPR data of 56 unique compounds were included. A high degree of variability in EPRs both between and within drugs was apparent. No trends related to study design or pharmacokinetic factors could be identified. The model explained 80% of the variability after correction for within-drug variability (R²_WDV), and predictions for 78.6% of drugs were within a 3-fold difference from the observations. Key CSDs were related to molecular size and lipophilicity. When predicting EPRs for a test dataset the R²_WDV was 75%. This model is of relevance to inform dose selection and optimization during the development of antibiotic agents targeting lung infections.


INTRODUCTION
Hospital- or ventilator-acquired pneumonia (HAP, VAP) is associated with high mortality (1). Therefore, reaching efficacious effect-site concentrations of antibiotics is essential for successful treatment (2) and to suppress the emergence of resistance (3,4). For the majority of lung infections, the site of infection is the epithelial lining fluid (ELF). In order to reach the ELF, antibiotics need to pass from the lung capillary into the interstitial space and subsequently move across the alveolar wall epithelium (Fig. 1). The alveolar membrane acts as a semi-permeable barrier due to the presence of tight junctions (5) and drug transporter proteins (6). This results in ELF concentrations that may be several-fold lower or higher than the corresponding plasma concentration, as summarized by Kiem et al. (7) and Rodvold et al. (8). Therefore, the consideration of ELF concentrations during development of antibiotic agents for lung infections is of considerable importance.
The quantification of antibiotic concentrations in ELF is challenging. Bronchoalveolar lavage (BAL) is currently the most frequently used procedure (9) for such measurements. Limitations of this procedure include the indirect method of quantifying drug concentrations, the burden for volunteers due to its invasive nature, and the possibility of obtaining only single time-point samples, among additional methodological limitations (10). A more recent and promising technique for measuring ELF concentrations is bronchoscopic microsampling (BMS), which allows for repeated measurement of concentrations over time (11). Considerable variability has been reported for ELF concentrations not only between subjects but also within subjects (7,8). Such variation can be related to the aforementioned methodological issues, the drug-distribution related pharmacokinetics (PK), and potentially other physiological, disease-related factors such as edema and potential effects of inflammation on membrane permeability.
Electronic supplementary material: The online version of this article (doi:10.1007/s11095-015-1832-x) contains supplementary material, which is available to authorized users.
Although conducting lung PK studies is currently the standard for evaluating pulmonary exposure of antibiotics, in silico approaches to predict partitioning of antibiotics into the ELF space based on chemical structure properties would be of considerable relevance. Such predictive models could inform the design of, or potentially replace, complex and burdensome clinical lung PK studies. As such, predictive models can support clinical dose selection studies.
The prediction of various PK properties, including partition coefficients into various tissues, is an important and widely explored field. Indeed, previously developed models allow prediction of such partition coefficients in various tissues (12,13) based on both drug-specific physico-chemical properties and system-specific properties related to, for instance, tissue composition (14,15). However, when there is a gap in knowledge of active transport processes, such as for the alveolar barrier or the blood-brain barrier, these approaches provide poor predictions. In such cases, data-driven approaches can be useful, because they do not require strong mechanistic understanding. The relevance of such empirical, data-driven modeling for predicting partitioning across the blood-brain barrier has already been widely demonstrated (13,16-19).
Data-driven models for drug distribution aim to relate drug-specific chemical properties derived from their structure to the PK property of interest, and may be referred to as quantitative structure-property relationship (QSPR) models. These chemical descriptors are either properties directly derived from the molecular structure, e.g. number of nitrogen atoms, or, properties like log P that can be predicted using well-established equations. Subsequently, statistical modelling approaches can be applied to construct predictive models that associate these chemical descriptors to the PK property of interest.
Predictive QSPR models have often been based on ordinary least squares linear regression. However, such approaches deal poorly with a large number of highly correlated chemical descriptors. Moreover, in many cases, the predictors outnumber the observations, which leads to over-fitting and poor generalizability. One important statistical modeling approach that addresses this limitation by imposing a penalty on the size of coefficients is penalized (regularized) regression, as implemented by lasso regression and ridge regression, which use the λ1 and λ2 penalties, respectively. Both are shrinkage methods that aim to prevent over-parametrization due to correlations of predictors by shrinking regression coefficients towards zero. While the λ1 penalty encourages regression coefficients to become exactly zero (i.e. variable selection), the λ2 penalty encourages highly correlated variables to have similar regression coefficients (i.e. grouping), resulting in small but non-zero coefficients. Another regression method is partial least squares regression (20). Partial least squares regression and ridge regression behave similarly, except that ridge regression can be considered slightly more flexible, and therefore more powerful (20). More recently, elastic net regression has been proposed as a combination of the λ1 and λ2 penalties, thereby combining lasso and ridge regression. Here, the total amount of shrinkage is determined by both λ1 and λ2. Their values can be tuned using various methods such as the bootstrap or cross validation. Often, the lasso penalty λ1 is parameterized by s, the ratio of the L1-norm of the penalized coefficient vector to that of the unpenalized coefficient vector, bounded between 0 and 1. A higher value of s reflects a lower λ1 penalty, since the sum of the absolute coefficient values is closer to its unpenalized maximum. Setting λ2 to 0 performs lasso regression whereas setting s to 1 performs ridge regression.
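The penalty structure described above can be written compactly. The following is a sketch of the commonly used (naive) elastic net criterion in the notation of this section; the response vector y, descriptor matrix X and coefficient vector β are symbols introduced here for illustration, and scaling conventions for the penalties differ between software implementations.

```latex
\hat{\beta}(\lambda_1,\lambda_2)
  = \arg\min_{\beta}\;
    \lVert y - X\beta \rVert_2^2
    + \lambda_2 \lVert \beta \rVert_2^2
    + \lambda_1 \lVert \beta \rVert_1 ,
\qquad
s = \frac{\lVert \hat{\beta}(\lambda_1,\lambda_2) \rVert_1}
         {\lVert \hat{\beta}(0,\lambda_2) \rVert_1},
\quad 0 \le s \le 1 .
```

With λ2 = 0 only the L1 term remains (lasso), and with s = 1 the L1 constraint is inactive so that only the L2 term remains (ridge).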
The objective of this paper is to develop an elastic net QSPR model for structure-based prediction of EPRs of anti-infective agents and associated agents (e.g. β-lactamase inhibitors) based on literature-reported values for lung and plasma drug concentrations. The developed model can be used to provide quantitative understanding of effect-site concentrations for antibiotic drug development targeting lung infections.

MATERIALS AND METHODS
This analysis was performed as follows: i) original publications reporting ELF concentrations were collected and relevant data was extracted; ii) an exploratory analysis of the EPRs was performed evaluating the effect of various factors not related to chemical descriptors; iii) an elastic net model was trained based on the chemical descriptors of identified antibiotics and associated drugs; iv) the optimal model was evaluated using a test dataset not used for model development.

Data Collection, Extraction and Curation Procedure
The model training dataset was based on two previously reported systematic reviews of clinical studies quantifying concentrations of anti-infective agents (antibiotics, antifungals and associated agents such as β-lactamase inhibitors) in plasma and ELF (7,8). The original publications included in these two systematic reviews were considered to represent a complete overview of available literature on ELF lung concentrations reported for anti-infective agents up to 2011. Potentially the training dataset could have been extended slightly further by searching for studies of drugs that were not used for the treatment of lung infections. However, we had some concerns about including compounds that are structurally completely different as this could introduce bias to the predictions of EPRs of anti-infective agents; the primary application area of the model. Therefore no other unrelated agents were included in the model training dataset.
The external model evaluation dataset was based on: i) the results of a lung PK study for imipenem and MK-7655 to which some of the co-authors contributed (21) and ii) additional lung PK studies of anti-infective agents identified in the literature for the period between 2011 and 2014 and which were not already included in the training dataset. The following PubMed search query was used to identify additional relevant studies: (ELF or "epithelial lining fluid") and antibiotic and ("2011/10/08"[PDat] : "2015/01/01"[PDat]) NOT (murine or mice) NOT Review[ptyp] NOT "in vitro".
The extraction of data from the original publications (7,8) proceeded as follows. First, all individual publications from literature were systematically retrieved. Subsequently, we collected the following data for each paper: i) individual or mean paired concentrations or AUC values for plasma and ELF; ii) the method of measurement of lung concentrations (e.g. BAL or BMS); iii) details of the study design (dose, time of measurement, route of administration); iv) measurement of either total or unbound drug concentration; v) disease state of the subject (healthy volunteer, patient with lung-disease, patient without lung-disease); vi) the number of subjects on which the ELF/ plasma observations were based. In all cases, individual observations were used if available.
The plasma and ELF concentration data included concentrations that were below the lower limit of quantification (LLOQ). Different scenarios were identified for either the plasma or the ELF observations, or both, being below LLOQ. First, if both the plasma and ELF concentrations were below LLOQ, then the observations were omitted from the analysis. Second, if only either the plasma or the ELF concentration was below the LLOQ, then LLOQ/4 was imputed for the missing concentration; LLOQ/4 was chosen as a conservative estimate because the concentration was expected to be closer to zero than to the LLOQ. Third, if only either the plasma or the ELF concentration was below the LLOQ and the LLOQ was unknown for that study, then an LLOQ of 0.1 mg/L was assumed and LLOQ/4 was imputed for the missing concentration. The value of 0.1 mg/L was considered a realistically low value based on the observed distribution of concentrations. We evaluated the impact of this imputation strategy by training one model on the data with below-LLOQ observations excluded and one model on the data with LLOQ imputation, and using both models for prediction within the same dataset, which featured only above-LLOQ data. The impact of LLOQ imputation was quantified in terms of RMSE. If the RMSE were substantially higher for the model with LLOQ imputation than for the model with LLOQ exclusion, this would indicate that LLOQ imputation biases the above-LLOQ predictions.
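The three below-LLOQ scenarios described above can be sketched as a small function. This is an illustrative sketch only: the function name `curate_pair`, the use of `None` to mark a below-LLOQ value, and the constant names are assumptions made here and are not taken from the original analysis scripts.

```python
from typing import Optional, Tuple

DEFAULT_LLOQ = 0.1  # mg/L, assumed when a study does not report its LLOQ (scenario 3)
IMPUTE_DIVISOR = 4  # below-LLOQ values are replaced by LLOQ/4, as stated in the text


def curate_pair(plasma: Optional[float], elf: Optional[float],
                lloq: Optional[float] = None) -> Optional[Tuple[float, float]]:
    """Return a curated (plasma, ELF) pair in mg/L, or None if the pair is omitted.

    None as an input concentration marks a value reported as below the LLOQ
    (hypothetical encoding chosen for this sketch).
    """
    if plasma is None and elf is None:
        return None  # scenario 1: both below LLOQ, observation omitted
    limit = DEFAULT_LLOQ if lloq is None else lloq
    imputed = limit / IMPUTE_DIVISOR  # scenarios 2 and 3: impute LLOQ/4
    return (imputed if plasma is None else plasma,
            imputed if elf is None else elf)
```

For instance, `curate_pair(2.0, None, lloq=0.2)` keeps the plasma value and imputes 0.05 mg/L for the ELF concentration, whereas `curate_pair(None, None)` drops the pair.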
Plasma concentrations that were reported as total concentrations were converted to the unbound concentration by multiplication with their fraction unbound, which was obtained for the majority of drugs from the DrugBank database (22). For the ELF concentrations we assumed that protein binding plays a negligible role since protein concentrations in the ELF are much lower than in plasma (23), an assumption also made by other investigators (7).
After this curation procedure we calculated the EPRs, which were subsequently transformed using the natural logarithm in order to obtain a more symmetric distribution of the ratios suitable for regression analysis.
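The conversion to unbound plasma concentration and the log-transformation of the ratio can be illustrated as follows. This is a minimal sketch; the function name `log_epr` and its argument names are hypothetical, and ELF concentrations are used as reported, in line with the negligible-binding assumption stated above.

```python
import math


def log_epr(elf_conc: float, plasma_conc: float,
            fraction_unbound: float = 1.0, plasma_is_total: bool = True) -> float:
    """Natural-log ELF-plasma concentration ratio (EPR).

    Total plasma concentrations are scaled by the fraction unbound;
    plasma values already reported as unbound are used unchanged.
    """
    unbound_plasma = plasma_conc * fraction_unbound if plasma_is_total else plasma_conc
    return math.log(elf_conc / unbound_plasma)
```

As a usage example, an ELF concentration of 2 mg/L with a total plasma concentration of 10 mg/L and a fraction unbound of 0.2 gives an EPR of 1, i.e. a log EPR of 0.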

Generation of Chemical Descriptors
For each of the identified drugs in the training and test datasets we computed a unique set of chemical descriptors using the R package rcdk (24,25) that provides an interface to the widely used chemistry development kit (CDK) software platform (26). Molecular structures were described using SMILES, as obtained from the PubChem database. Based on the SMILES molecular structure of each drug, all available chemical descriptors within the CDK platform were generated. Subsequently descriptors that had equal values, a correlation of 1, or had only 2 unique values were removed from the descriptor dataset. Descriptors with only 2 unique values were removed to support the leave-one-drug out cross validation.
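The descriptor pruning step can be sketched in a few lines. This is not the authors' R code: the function names are hypothetical, "equal values" is interpreted here as a descriptor that is constant across all drugs, and of any pair with a correlation of 1 the first descriptor encountered is kept (both are assumptions).

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def filter_descriptors(table, tol=1e-9):
    """Drop descriptors that are constant, have only 2 unique values,
    or are perfectly correlated with a descriptor that was already kept.

    table maps a descriptor name to its list of values (one per drug).
    """
    kept = {}
    for name, values in table.items():
        if len(set(values)) <= 2:
            continue  # constant, or only 2 unique values
        if any(abs(pearson(values, other) - 1.0) < tol for other in kept.values()):
            continue  # correlation of 1 with a retained descriptor
        kept[name] = values
    return kept
```

Removing descriptors with only 2 unique values avoids the situation in which leaving out one drug turns a descriptor into a constant column within a cross-validation fold.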

Exploratory Analysis of Lung Concentrations
Visualizations were generated to assess the change in EPRs in relation to disease state (healthy, patient with lung disease, patient without lung disease), steady state PK, and in relation with the time after dose.

Development and Evaluation of Elastic Net Model
R (version 3.1.2) was used to perform all data manipulations and visualization. The R package elasticnet (27) together with the machine learning wrapper package caret were used to fit the elastic net models (28).
The dataset included both individual measurements and mean values. If individual values were reported we included these in the dataset. If however only mean values were reported, i.e. based on data obtained from several patients at one time point, we included these values instead. To account for the difference between either single individual observations or single mean observations, weighting based on the number of observations available was implemented.
The optimal tuning parameters of the elastic net model (s and λ2) were determined using an adapted version of leave-one-out cross validation (LOOCV). In this adapted version of LOOCV, all data for one drug were iteratively removed from the dataset, an elastic net model was fitted to the remaining data, and the ELF/plasma ratio was subsequently predicted for the left-out drug. This process was repeated for all combinations of λ2={0, 1e-04, 1e-03, 5e-03, 1e-02, 5e-02, 1e-01, 2.5e-01, 5e-01} and s={0.0001, 0.001, 0.01, 0.05, 0.10, 0.15, (..), 1.00}. The RMSE of the individually observed versus typical predicted ELF/plasma ratios obtained after cross-validation was computed for each set of tuning parameters. The set of tuning parameters with the lowest RMSE was selected to fit the full dataset using the elastic net model.
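The leave-one-drug-out procedure can be illustrated with a deliberately simplified stand-in model. The sketch below is an assumption-laden illustration, not the elastic net used in this work: it uses a one-descriptor ridge regression with a single penalty `lam` in place of the two-parameter (s, λ2) elastic net, and all function names are hypothetical.

```python
import math


def fit_ridge_1d(xs, ys, lam):
    """Stand-in model: one-descriptor ridge regression (NOT the elastic net)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / (sxx + lam)  # the L2 penalty shrinks the slope towards zero
    return lambda x: my + slope * (x - mx)


def lodo_rmse(rows, lam, fit=fit_ridge_1d):
    """Leave-one-drug-out RMSE.

    rows is a list of (drug, descriptor_value, log_epr) tuples. All rows of
    one drug are held out together, the model is fitted to the remaining
    drugs, and the held-out observations are predicted.
    """
    drugs = sorted({drug for drug, _, _ in rows})
    squared_errors = []
    for held_out in drugs:
        train = [(x, y) for drug, x, y in rows if drug != held_out]
        predict = fit([x for x, _ in train], [y for _, y in train], lam)
        squared_errors += [(y - predict(x)) ** 2
                           for drug, x, y in rows if drug == held_out]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def tune(rows, grid):
    """Return the grid value with the lowest leave-one-drug-out RMSE."""
    return min(grid, key=lambda lam: lodo_rmse(rows, lam))
```

Holding out all observations of a drug at once, rather than single observations, means that the cross-validated error reflects prediction for a compound the model has never seen, which is the intended use of the model.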
Since the elastic net model only considered chemical descriptors, within-drug variability (WDV) related to other factors cannot be predicted. Thus, a theoretical upper limit (lower than 1) for the R², i.e. the R²_lim, can be defined from C_obs,i,a, the ith observation of the ath drug, and C_obs,mean,a, the mean observation for the ath drug. Subsequently, the WDV-corrected R² (R²_WDV) was defined using C_pred,a, the prediction for the ath drug. Here, R²_WDV thus represents the proportion of between-drug variability that can be predicted.
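One way to write out R²_lim and R²_WDV that is consistent with the symbol definitions above is sketched below. This is an assumption rather than necessarily the authors' exact formulation: the overall mean of all observations, written here as \bar{C}_{obs}, and the ratio form of R²_WDV are introduced for illustration.

```latex
R^2_{lim} = 1 -
  \frac{\sum_{a}\sum_{i}\left(C_{obs,i,a} - C_{obs,mean,a}\right)^2}
       {\sum_{a}\sum_{i}\left(C_{obs,i,a} - \bar{C}_{obs}\right)^2},
\qquad
R^2_{WDV} = \frac{1}{R^2_{lim}}
  \left[ 1 -
  \frac{\sum_{a}\sum_{i}\left(C_{obs,i,a} - C_{pred,a}\right)^2}
       {\sum_{a}\sum_{i}\left(C_{obs,i,a} - \bar{C}_{obs}\right)^2}
  \right]
```

Under this form, a model that predicts every drug's mean observation exactly (C_pred,a = C_obs,mean,a) attains R²_WDV = 1, even though its uncorrected R² equals R²_lim and is below 1.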
Using the trained elastic net model we predicted the EPRs for the test dataset and computed the RMSE and R² values of the obtained predictions. The relative importance of the descriptors was calculated by sequentially fitting a linear model to the observed EPR data, for each of the predictors. From these models, the statistical significances of the slopes being different from zero were calculated for each of the predictors, and scaled between 0 and 100 so that the maximum value of 100 signifies the strongest statistical significance between the descriptor and EPR.
Finally, to assess the appropriateness of the rcdk descriptors, we repeated this modeling procedure for another set of descriptors computed by the open source software PaDEL-Descriptor (29). Descriptors that had equal values, a correlation of 1, or had only 2 unique values were removed from the descriptor dataset. Descriptors with only 2 unique values were removed to support the leave-one-drug-out cross validation. Additionally, any descriptors that were included in the final list of rcdk descriptors were removed from the PaDEL set of descriptors, so that the PaDEL descriptor set would be maximally different from the rcdk descriptor set.

RESULTS
These observations were associated with 1981 underlying paired observations, i.e. when considering mean values based on several individual observations. A total of 97 different studies were included. A more detailed overview of dataset composition and original references is provided in Table S1 and a table of the raw log-transformed EPR values is provided as Table S2. For the test dataset which was used for model evaluation we identified 5 drugs not included in the training dataset: imipenem (21), MK-7655, a β-lactamase inhibitor (21), arbekacin (30), GSK2251052 (31) and tedizolid (32).
Of all concentrations in the training dataset, 4 and 13.5% of plasma and ELF concentrations, respectively, were either not measured or were below the LLOQ. For missing ELF concentrations, 34% of the data also had missing plasma concentrations and were therefore not included in the analysis. There were no instances of plasma concentrations being below the LLOQ while the corresponding ELF concentrations were above the LLOQ. For the test dataset there were no missing or below-LLOQ observations. For all drugs, chemical descriptors were derived. A total of 145 descriptors were used. An overview of the correlation structure of the different descriptors is provided in Fig. 2. This figure illustrates the challenge of dealing with multiple highly correlated descriptors. An overview of the chemical structures of the 56 antibiotics for which these descriptors were derived is provided in Figure S1. For the set of PaDEL descriptors, 919 descriptors were used, none of which were included in the aforementioned 145 rcdk descriptors.

Exploratory Analysis
First we evaluated the distribution of observed EPRs in the training dataset as depicted in Fig. 3, stratifying by antibiotic class. From this figure the large within- and between-antibiotic variability in EPRs becomes clear. Some grouping according to class (in color) was observed.
The effect of disease state on EPRs was explored for a subset of antibiotics (n=11) for which ratios in more than one disease state were available (Fig. 4, top). Although differences between disease states for different antibiotics are apparent, no clear consistent trend was found.
Regarding pharmacokinetic factors affecting EPR, we recorded whether steady state had been reached. Here, we distinguished between studies involving a single administration (i.e. not reaching steady state) and studies in which repeated dosing or a prolonged infusion was used (i.e. steady state can be assumed). In the latter case the EPRs are expected to have reached equilibrium and hence to be closer to their true partition coefficient. Again for 11 antibiotics we identified drugs for which both steady state and non-steady state studies were available. Although in a few cases, as may be expected, an increased EPR was found at steady state (rifampicin, clarithromycin), this trend was not consistent across the different antibiotics (Fig. 4, bottom).
Finally we explored the effect of time of measurement after dosing while stratifying across steady state or non-steady state conditions, if available (Figure S2). Here too, trends of increasing ratios over time were theoretically expected, but were not clearly observed.

Model Development
First the optimal set of tuning parameters for the elastic net model was identified based on the lowest RMSE value obtained after leave-one-drug-out cross validation (LOOCV) (Fig. 5). The optimal LOOCV metrics were an RMSE of 1.36 and an R²_WDV of 0.54 (Table II). For the test dataset the metrics are more prone to fluctuation because of the small number of compounds in that dataset (n=5); nevertheless, the R²_WDV was 75% for the test dataset. The final model (available as supplementary Rdata file) with the tuned parameters resulted in an R²_WDV of 0.80, i.e. it explained 80% of the predictable variability, and the RMSE was 1.08 (Table II). The observed and predicted EPRs are depicted in Fig. 6. Here, the vertical grey lines indicate the observed range of EPRs. Figure 7 shows the model residuals (difference between observed and predicted) as a function of the predicted EPR. No bias can be seen in the residuals for very low or very high EPR values. The percentage of drugs within a 2- and 3-fold difference from the observations was 57.1 and 78.6%, respectively (Table II).
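The fold-difference metric reported above can be computed on the log scale, where a k-fold difference corresponds to an absolute difference of at most ln(k). The sketch below is illustrative; the function name is hypothetical, and it assumes one observed value (for example the per-drug mean log EPR) and one predicted value per drug.

```python
import math


def percent_within_fold(observed_log, predicted_log, fold):
    """Percentage of drugs whose predicted EPR lies within `fold`-fold of the
    observed EPR, with both inputs given as natural-log EPRs."""
    limit = math.log(fold)
    hits = sum(abs(o - p) <= limit for o, p in zip(observed_log, predicted_log))
    return 100.0 * hits / len(observed_log)
```

With a small number of drugs the metric moves in large steps: for 5 drugs, each additional drug inside the limit changes the percentage by 20 points.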
The variable importance plot (Fig. 8) shows the relative importance of different chemical descriptors for the prediction of the EPR, for the 20 most important descriptors. The meaning of these descriptors is described in Table I. Most important were descriptors related to molecular size (MDEC) (33), which concern molecular distances between carbons. Other important descriptors were related to lipophilicity (XlogP, MlogP), or carbon connectivity in molecules (khs and C3SP3).
Although ionizability is a potentially relevant descriptor from a mechanistic point of view, it was not included in our model because i) it was not included in the CDK package, thereby complicating the application of our model by others, and ii) its inclusion was not expected to substantially improve model performance given the large number of descriptors already included.
We imputed concentrations of 0.1 mg/L for ELF and plasma observations that were below LLOQ. The purpose was to avoid biasing the model towards only the higher concentration range, which could impair the ability to predict low EPR values well. However, a downside of this approach is that the choice of imputation value can bias the above-LLOQ predictions. To investigate the extent of this bias, we compared the predictions of two models: one model was fitted to a dataset with below-LLOQ-imputed observations, and a second model was fitted to a dataset with below-LLOQ-excluded observations. Both of these models were used to predict EPR values in a dataset with only above-LLOQ data. The model fitted with imputed values had an RMSE of 0.97 while the model with only above-LLOQ data had an RMSE of 1.01, indicating that the imputation of 0.1 mg/L for below-LLOQ values does not bias the predictions for the rest of the data to any relevant extent.

Model Validation
The final elastic net model was used to generate predictions for 5 antibiotics not included in the training dataset (Fig. 6). These predictions had an R²_WDV of 0.75 and RMSE of 0.70 (Table II), which indicates quite reasonable predictive performance, and which is consistent with the R²_WDV obtained from the LOOCV. The separately trained model with the PaDEL-derived set of descriptors had a slightly higher R²_WDV for the full dataset, when compared to the R²_WDV of the rcdk-derived descriptor set (Table II). However, the R²_WDV obtained from the LOOCV was identical between the models trained on the two sets of descriptors (Table II), highlighting that the two descriptor sets are equally good in predicting the EPR of a new compound not included in the training data. The percentage of drugs within a 2- and 3-fold difference from the observations was 40 and 80%, respectively. Of note, no significance should be attributed to differences in the 2-fold and 3-fold percentages between the training and test datasets, because the test dataset only contains 5 compounds, making these values sensitive to random variation.

DISCUSSION
We provide the first in silico prediction model for EPRs based on chemical descriptors from a relatively large dataset of antibiotics with potentially a variety of mechanisms (e.g. passive diffusion and active transport). The predictions obtained for both training and test datasets indicate that a considerable amount of between-antibiotic EPR variability could be explained by the use of this chemical descriptor-based model.  Previous reviews on ELF concentrations (7,8) were focused on a more qualitative evaluation, outlining EPR trends between different antibiotic classes. Indeed, such class-based comparisons are relevant based on the visualizations in this article. However, in order to make an impact in antibiotic drug development, there is a significant need for more quantitative models that can predict tissue site concentrations more accurately and in such a way to further support and inform the development of novel antibiotics.
Given the retrospective nature of the dataset analyzed, there exist a multitude of factors that may potentially influence observed EPRs. Firstly, no clear effects on within-antibiotic variability of factors such as disease state or pharmacokinetic properties could be identified, i.e. no clear and consistent trends were observed. While we certainly expect that such factors may influence within-antibiotic variability, their contribution was limited based on the observed data. Other factors that differ both between and within different antibiotics are: i) errors in the applied scaling to unbound plasma concentrations of part of the data, ii) differences in alveolar and bronchial ELF concentrations, iii) various study design related factors such as the wide variation in time of measurement and dosing strategy, and finally iv) inherent differences between BAL and BMS assays. Nonetheless, for the various antibiotics multiple studies were pooled, allowing for potentially more unbiased estimates of typical lung exposures while fitting the elastic net models.
The ten most important descriptors were related to molecular size, as described by molecular distance edge descriptors between primary, secondary, tertiary and quaternary carbons (MDEC24, MDEC12, MDEC34), carbon binding connectivity (khs.ssssC, C3SP3), sulfur atom binding (khs.ssS), lipophilicity (MLogP, XLogP), and the number of carboxylic acid groups (nAcid). Descriptors such as molecular size, lipophilicity and acidity are expected to be related to the passive diffusion process. However, a large number of other descriptors were also seen to have a relevant (>20%) relative importance (Fig. 8). Potentially such descriptors may have also helped in explaining the active transport process.
We imputed LLOQ values at a value of 0.1 mg/L, mainly for ELF concentrations, as this was the commonly observed threshold across ELF studies. We aimed to include these LLOQ values in order to prevent bias towards predicting higher EPRs for drugs that actually resulted in low EPR values below the LLOQ. However, a further decrease in the imputation value led to further decreases in R² for the above-LLOQ values, i.e. a bias towards predicting the higher concentrations. As such the choice of 0.1 appeared to result in the best balance between bias towards either lower or higher EPR values.
Given that current mechanistic understanding of alveolar membrane transport is limited, we aimed at developing a statistical model for structure-based prediction of EPRs with good predictive properties, but which at its core remains empirical. Nonetheless, the identified predictors provide insights into the relative importance of various physico-chemical properties on a global level. The use of a regularized regression modelling approach allowed the evaluation of a large set of chemical descriptors while appropriately managing the risk of model over-fitting, which is a major concern in such modelling exercises. However, interpreting the model may be challenging because of the large number of regression coefficients, which may be considered a limitation of this model. In Fig. 6 a few of the antibiotics exhibit substantial deviations from the mean observed EPRs. However, experimental BAL studies are themselves associated with considerable uncertainties, as shown in our analysis. Such deviations may therefore partly reflect inherent variability in the data from these BAL studies rather than being solely attributable to model misspecification, which provides some justification for the use of our in silico model. Moreover, when performing dose selection studies, only very large deviations from plasma concentrations are of relevance, i.e. minor deviations will not negatively impact the design of these studies or the selection of optimal dose levels.
For development of the QSPR model we have chosen to include a major part of the collected compounds for model development and only a limited set of compounds for an external evaluation, where we observed reasonable predictive performance. How well will this model perform when used to predict the EPR for a new antibiotic? Based on the model evaluation performed we expect reasonable performance for compounds with some similarity to the structural scaffolds of the various drug classes included in the model development. However, for compounds with radically different structural features, our model may not yield informative predictions. Even in the case of such compounds our model may still be beneficial, although confirmatory in vitro or in vivo experiments may be warranted.
How can this model now be used in the drug development process of anti-infective agents for lung infections? Practically, our model can be applied in a straightforward fashion for prediction of EPRs of new anti-infective agents. First, the descriptor values can be computed using the rcdk R package, and subsequently the predictions can be generated using our final model, included as an Rdata file in the Electronic supplementary material to this paper. An example script is also included in the Electronic supplementary material to demonstrate the application of the QSPR model. Conceptually, the model developed in this study could be of considerable relevance to inform and optimize the design of lung PK studies (34), since such studies are methodologically complex and burdensome with respect to obtaining samples. Secondly, in combination with a straightforward population PK model (35) accounting for plasma PK and its inter-individual variability, clinical study designs can be simulated in order to identify the optimal probability of target attainment based on the ELF concentration rather than the plasma concentration. Finally, again in combination with a population PK model, our model may be of relevance for screening or generating confirmatory evidence for antibiotics currently already used to treat lung infections off label, but for which no formal lung PK studies have been conducted.

ACKNOWLEDGMENTS AND DISCLOSURES
This study was performed within the framework of Dutch Top Institute Pharma, PKPD PLATFORM 2.0 (project number D2-501). This work was carried out on the Dutch national e-infrastructure with the support of SURF Foundation. The authors do not have any conflict of interest to report.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.