Introduction

All internal body surfaces are bathed with fluids that contain a variety of molecules, some of which are involved in innate immunity defense. Innate immunity forms the first line of defense against microbial invasion, and well-established mediators of this system include antimicrobial peptides [1]. Lipids are also ubiquitously present in body fluids [2], and there is increasing evidence that selected lipids have antimicrobial activity [3, 4] and form a lipid-mediated arm of the innate immunity response. We have shown that cholesteryl esters have antimicrobial properties and contribute to the innate immunity of secretions of the airway mucosa [5, 6]. Developing an optimized separation method in order to characterize the role of individual lipids in innate immunity would be a significant advance to this field. Methods requiring minimal sample handling with high recovery for subsequent functional assays are essential. Current methods for cholesteryl ester analysis include HPTLC [7], TLC-GC [8], and ES-MS/MS [9, 10]. A high-performance liquid chromatography (HPLC) protocol for cholesteryl ester analysis has been described by Cullen et al. [11]. However, this protocol is suboptimal because triglycerides are first removed, preventing complete lipid analysis and thus adding additional steps. To meet our interest in characterizing antimicrobial lipids, we developed a reversed-phase HPLC (rpHPLC) one-step protocol for total lipid extracts that is suitable for both analytical and preparative scale work, but in this method, cholesteryl esters were incompletely resolved [5]. Because of the evidence of an antimicrobial role for cholesteryl esters [5, 6], optimization of the analytical protocol became of paramount importance.

The use of information processing techniques, in particular artificial neural networks (ANN), has proved valuable for a variety of separation methods [1214]. When combined with experimental design techniques, ANN quickly optimizes the separation conditions and shortens analysis time, and does so without knowledge of the physical or chemical properties of the analytes [15]. In terms of applications, a limited number of studies have utilized ANN for optimizing HPLC experimental conditions [1517]. Others have used a hybrid artificial neural network–genetic algorithm (ANN-GA) approach to improve separation methods for isolated compounds [18] and for routine pesticide analysis [19]. However, the incorporation of ANN-based methods for investigating the separation of complex human biological fluids is lagging, particularly with regards to lipids. Herein, we describe the first use of an ANN-GA approach for characterizing cholesteryl esters in human body secretions.

Experimental

Chemicals and reagents

HPLC grade acetonitrile, reagent grade alcohol, water, and chloroform were obtained from Fisher Scientific (New Jersey, USA). HPLC grade dichloromethane was obtained from EMD Chemicals Inc. (Darmstadt, Germany). Cholesterol, cholesteryl arachidonate, cholesteryl linoleate, cholesteryl palmitate, tri-palmitin (a highly hydrophobic triglyceride with similar elution times as cholesteryl esters in rpHPLC), cholesteryl stearate, and heptadecanoic acid (internal standard for milk lipid extraction) were obtained from Sigma-Aldrich (Missouri, USA).

Lipid standards

A mixture of cholesterol, cholesteryl arachidonate, cholesteryl linoleate, cholesteryl palmitate, tri-palmitin, and cholesteryl stearate, each at 1 mg/mL, was prepared in dichloromethane.

Lipid extraction from human milk

Human milk from three different donors was purchased from Mother’s Milk Bank, Denver, CO. Lipids were extracted according to Bligh and Dyer [20] using 10 µL aliquots for rpHPLC analysis and 20 µL aliquots for ESI/MS/MS.

Chromatography

Under the control of Chromeleon® software (version 6.60 SP2), solutions of standards and extracts re-dissolved in dichloromethane were manually injected (1–3 μL/injection, 20 μL injection loop) onto a reversed-phase column (Dionex Acclaim PolarAdvantage II, a silica-based column with a proprietary amide-embedded ligand, 150 mm × 2.1 mm ID, 3 μm particle size) preceded by a guard column (Dionex Acclaim PolarAdvantage II, 2.1 × 10 mm, 5-μm particle size) in a temperature-controlled compartment (Dionex model TCC-100 column oven). The column had been previously equilibrated in the selected eluant and was eluted (Dionex P680 low-pressure quaternary pump with degasser) at specified flow rates. The eluant was passed through an evaporative light-scattering detector (ELSD, Alltech model 800). For fraction collection, the ELSD was bypassed and collected fractions were dried in a stream of nitrogen and stored at −20 °C for further analysis.

Mass spectral analysis

Cholesteryl esters in collected fractions were identified by ES/MS and MS/MS according to Duffin et al. [9] and as previously reported [5].

Experimental design

The influence of flow rate, ethanol and methanol content in the mobile phase (acetonitrile), and column temperature on cholesteryl ester separation was investigated with a fractional factorial design where the number of experiments of a full factorial design is reduced by a number p according to 2 kp. The choice of mobile phase components and column temperature were based on preliminary observations. Moreover, the selection of flow rate was based on studies showing its effect on peak shape [21] and resolution [22]. The eight randomized experiments (run in triplicate for a total of 24 runs) and acquired data are highlighted in Table 1. Data were analyzed in JMP (SAS Institute, USA) statistical software.

Table 1 Summary of factors and responses for screening runs

Derringer’s desirability function

Cholesteryl ester separation was evaluated for the total number of peaks, the retention time of the last eluting peak, and the sum of all peak resolutions, as described in Novotna et al. [15]. Each parameter was transformed into a desirability variable (d) using Derringer’s two-sided transformation [23]:

$${d_i}= \left\{ {\begin{array}{*{20}{c}} {\left[ {\frac{{{Y_i} - {Y_{i*}}}}{{{c_i} - {Y_{i*}}}}} \right]^s} & {{Y_{i*}} \leqslant {Y_i} \leqslant {c_i}} \\{\left[ {\frac{{{Y_i} - Y_i^*}}{{{c_i} - Y_i^*}}} \right]^t} & {{c_i} < {Y_i} \leqslant Y_i^*} \\0 & {{Y_i} < {Y_{i*}}\quad {\hbox{or}}\quad {Y_i} > Y_i^*} \\\end{array} } \right. .$$
(1)

where Y i is the measured response, Y i* the minimum acceptable value, Y i * the maximum acceptable value, and c i is the target value of the measured response. Values of d i range from 0 to 1, with 1 being the most desirable. A measured response below the minimum acceptable value or above the maximum acceptable value produces the unacceptable result d i  = 0. Furthermore, the exponents s and t adjust the importance of the target value relative to the minimum and maximum values, respectively.

For the transformation of the total number of peaks, the target value was set at 6 to match the number of analytes in the standard mixture. The minimum acceptable value was set at 1, while the maximum acceptable value was set at 8 to accommodate lipid oxidation or degradation that might result in additional peaks. The exponents s and t were both set at 3 to assign more weight to values closer to the target value of 6. For the transformation of the retention time of the last eluting peak, a target value of 25 min was chosen to accommodate future studies of biological fluids and high throughput analysis. The maximum value was set at 100 min, and s was set to 1 and t to 3 to match the preference for a shorter run.

Before the sum of all peak resolutions was transformed into a desirability value, resolution between consecutive peaks was first calculated as:

$$ R = \frac{{2\left( {{t_2} - {t_1}} \right)}}{{{w_1} + {w_2}}} $$
(2)

where t and w indicate the retention time and associated baseline peak widths of the consecutive peaks, respectively. To arrive at a target value, resolutions between consecutive peaks were evaluated to allow for the adequate spacing of additional cholesteryl esters or other non-polar compounds like triglycerides [11]. The resolution between the first and second peaks, cholesterol and cholesteryl arachidonate, was set at 9 to avoid interference from diglycerides and phospholipids, which may elute in the same region [5]. The resolution between each of the remaining consecutive peaks was set at 4. All peak resolutions were summed, resulting in an overall target value of 25. The minimum acceptable value was set at 1, the maximum acceptable value was set at 28, s was set at 1, and t was set at 3 so that the desirability would increase moderately toward 25, but decline more rapidly thereafter. The individual desirabilities of the three parameters were subsequently merged into one single numeric value D (response) as follows:

$$ D = {\left( {{d_p} \times {d_{\rm t} } \times {d_r}} \right)^{1/3}} $$
(3)

where d p is the transformation of number of peaks, d t the transformation of the retention time of the last peak, and d r the transformation of the sum of all peak resolutions.

Artificial neural network–genetic algorithm approach

Figure 1 shows the ANN-GA strategy employing Levenberg–Marquardt backpropagation with a sigmoid transfer function. ANNs are computational modeling tools defined by structures comprising interconnected adaptive processing elements that perform parallel computations for data processing and interpretation [24]. A collection of processing units communicates by activation of neurons over a number of weighted connections [25]:

$$ \xi_j^l = \sum\limits_{i = 1}^{{N_{l - 1}}} {w_{ij}^l} x_i^{l - 1} $$
(4)

where \( {\xi_j} \) is the net effect and \( w_{ij}^l \) is a connection weight. In a given layer l, a standard neuron, j, integrates the signals, x j , impinging upon it and producing the net effect. Input and output layers are present and symbolize data inputs into the neural network and response of the network to the inputs, respectively.

Fig. 1
figure 1

Schematic of the hybrid artificial neural network–genetic algorithm (ANN-GA) method

Genetic optimization has proven beneficial in the determination of efficient neural network structure through a population of individuals, which evolves toward optimum solutions through defined genetic operators (selection, crossover, and mutation) [26, 27]. The GA used here operated according to a general two-step process: (1) initialization of the population and evaluation of each member of the initial population and (2) reproduction until a stopping condition was met. A population of random binary strings, each of which represented a specific network architecture and set of training parameters, were defined. The fitness of each trained network was calculated according to:

$$ {\rm Fitness} = {W_1}{f_{\rm{e}}}\left( {\widehat{e}} \right) + {W_2}{f_{\rm{t}}}\left( {\widehat{t}} \right) $$
(5)

where f e(ê) is the error between real output and ANN output, f t(\( \hat{t} \)) is the training time of ANN, and W 1 and W 2 are appropriate values of weight. According to the observed fitness, the GA method selected a new group of strings, ostensibly representing the parents of the next generation with an assigned probability of reproduction.

For this study, a two-point crossover approach (one that calls for two points to be selected on the parent organism string) was employed. Mutation ensured that the probability of searching regions in the experimental space was never zero and prevented loss of genetic material through reproduction and crossover. The above process was repeated until the best string that gave the maximum fitness or minimum mean square error (MSE) was chosen. The MSE was calculated as:

$$ {\rm MSE} = \frac{{{\rm SSE} }}{n} $$
(6)

where SSE is the sum of squares error given by:

$$ {\rm SSE} = {\hbox{Su}}{{\hbox{m}}_{\left( {i = 1\;{\rm{to}}\;n} \right)\left\{ {{w_i}{{\left( {{y_i} - {f_i}} \right)}^2}} \right\}}} $$
(7)

and y i is the observed data value, f i the predicted value from the fit, and w i the weighting applied to each data point, usually w i  = 1. For objective appreciation, synaptic weights and bias were set at 0.01 and 0.1, respectively. GA configuration values included a maximum generation of 100, a crossover probability of 0.5, and a mutation probability of 0.01. All data were cross-checked to ensure accuracy and validity and analyzed via JMP (SAS Institute, USA) and MATLAB 7.8 (The Math Works, USA).

Results and discussion

Experimental design

From the experimental design matrix (Table 1), eight experiments (performed in triplicate) were analyzed in randomized order with the mean response and standard error reported. Factorial ANOVA results confirmed good agreement between measured and predicted values, with the summary of fit reported at r 2 = 0.96. The general ANOVA calculations for the linear model are presented in Table 2. Prob > F represents the significance probability for the F ratio, which states that if the null hypothesis is true, a larger F statistic would only occur due to random error. Significance probabilities of ≤0.05 are evidence of at least one significant regression factor in the model. Close examination of Prob > F revealed that percent ethanol has the greatest significant single effect (<0.001) on the model response, D. Ethanol also exhibited a significant interactive effect when combined with flow rate (<0.001).

Table 2 ANOVA calculations for the linear model

ANN-GA prediction and validation

Transformed data obtained from the fractional factorial design was used as the study data set for ANN-GA modeling. This data set was randomly divided with a 90% split between the training (70%) and test sets (30%), and the additional 10% set aside for validation. The training data set was used for model fitting in computing the gradient and updating network weights and biases. The optimal ANN architecture (Fig. 2), realized in 18 generations, included three hidden neurons and a MSE of 0.112. Typically, the smaller the MSE value, the smaller is the difference between the predicted results and the actual data, and thus the better the prediction performance was realized for the neural model. At this point, the network had the best generalization ability. The test set was then employed to check and verify the trained neural network. A plot of experimental responses versus peak responses predicted by ANN-GA for training and verification of the optimum architecture, which exhibited excellent model fitting (r 2 = 0.97, n = 10) is presented in Fig. 3. Poor agreement would be indicative of overlearning. However, the level of correlation exhibited in this study confirms that the optimal neural network selected adequately modeled the response (D) for the six analyte standards.

Fig. 2
figure 2

Schematic of the optimized ANN architecture realized in this study

Fig. 3
figure 3

Plot of experimental response versus the response predicted by the ANN-GA model for training and verification

From the data patterned by the ANN-GA model, a response surface was generated for the two interactive factors (percent ethanol and flow rate), with the largest value on the response surface corresponding to the greatest distribution of associated peaks. Software-generated ANN simulation runs were obtained as a result of examining the response surface, with five validation conditions that resulted in maximum predicted response values chosen for experimental verification. A summary of the modeled factors and responses are presented in Table 3. As shown, successful validation was achieved, with experimental responses in good agreement (<6.2% discrepancy difference) with model-predicted responses. Modeled conditions from validation run 1 were used to compare peak separation before and after ANN-GA optimization as shown in the representative chromatographs presented in Fig. 4a, b, respectively. In Fig. 4b, six discernible and fully resolved peaks are noticeable, including the two previously co-eluting peaks, cholesteryl palmitate and tripalmitin. The average retention time of the last peak for this separation was 38.0 min. As this criteria was considered less critical, the trade-off between retention time and peak resolution was deemed acceptable.

Table 3 Summary of factors and responses for validation runs
Fig. 4
figure 4

Reversed-phase HPLC chromatograms of cholesteryl ester analysis before (a) and after (b) ANN-GA optimization employing validation run 1. Three micrograms each of cholesterol (C), cholesteryl arachidonate (Ca), cholesteryl linoleate (Cl), cholesteryl palmitate (Cp), tri-palmitin (Tp), and cholesteryl stearate (Cs) were injected

Considering that a satisfactory peak separation was achieved with a limited number of HPLC runs, this study confirms the suitability of the ANN-GA model in method optimization for rpHPLC. In contrast, manipulation of parameters via trial and error did not lead to a substantially improved separation of cholesteryl esters after more than 50 runs, and hence, this approach was abandoned (data not shown). Previously, ANN has been used successfully to separate structurally similar compounds, namely, indinavir and lactone, using HPLC [28], and complex mixtures of neuropeptides [15].

To assess whether the improved method (validation run 1) would be applicable to complex biological fluids, we subjected extracted human milk lipids to rpHPLC using the same experimental conditions dictated by the ANN-GA model. Choosing a lipid-rich fluid provided a robust test of the ANN-GA model. Since fatty acids and other lipids in human milk may interfere with cholesteryl ester separation [29], we included a water gradient (15% to 0% H2O, 16.5% to 19.4% reagent alcohol with acetonitrile in the first 5 min of the separation) before the isocratic gradient used for validation run 1. This water gradient had previously been successful in separating free fatty acids in our laboratory (unpublished data).

Figure 5 shows representative chromatograms of three different lipid analyses from human milk before (Fig. 5a, isocratic gradient with 50% alcohol/50% acetonitrile) and after method optimization (Fig. 5b, gradient according to validation run 1). The numbers in the chromatogram indicate fractions that were collected in a subsequent run for further ESI/MS and MS/MS analysis. Even with this lipid-rich complex fluid, the optimized method designed for cholesteryl ester analysis produced an improved separation of non-polar lipids as reflected in peak shape and baseline separation. Specifically, 14 discernible peaks in the region of less polar and non-polar lipids of which only two returned to the baseline were recorded using the pre-optimization method, while 17 distinct peaks in this region of which seven returned to the baseline were recorded after ANN-GA optimization.

Fig. 5
figure 5

Reversed-phase HPLC chromatograms of lipid extract from human milk before (a) and after (b) ANN-GA optimization employing validation run 1. Lipid extract from 10 μL of human milk, spiked with 40 μg heptadecanoic acid as internal standard, was injected. Fraction numbers that were further analyzed by ES/MS and MS/MS are indicated (see Table 4 and ESM Table S1)

ESI/MS and MS/MS analysis of the collected fractions revealed an overall improved separation of the four cholesteryl esters that had been employed for modeling (Table 4). A complete listing of masses and cholesteryl esters found in the collected fraction is given in Electronic supplementary materials (ESM) Table S1. The complexity of the identified cholesteryl esters is in agreement with previous reports on cholesteryl ester content in human milk [30], though we were able to identify additional cholesteryl esters not previously described, possibly due to our one-step separation approach. The former study employed preparative TLC prior to methylation and gas liquid chromatography to identify the constituent fatty acids of milk cholesteryl esters. Carroll and Rudel [31] established an HPLC method in which cholesterol and ten cholesteryl esters were separated, of which seven lipids were partially and four were fully resolved. However, this method required two identical columns connected in series [32]. This effectively increased peak resolution by increasing column length. Further peak separation of complex biological fluids with our current one protocol would be also expected with increasing the column length. Others have achieved a combination of partial and complete separation of up to 14 cholesteryl esters and cholesterol, but pre-purification to remove triglycerides by hydrolysis or silica gel-60 column chromatography was required to remove interfering components [11, 33]. In contrast, here, we have shown an rpHPLC method that improves separation of cholesteryl esters in a complex biological fluid without a pre-purification step on a single column, which will accelerate studies on the biological role of individual cholesteryl esters in mucosal secretions.

Table 4 Elution profile of selected cholesteryl esters extracted from human milk in rpHPLC before and after ANN-GA modeling

Conclusion

This study provides guidance for the development and application of experimental design methodology and ANN-GA modeling tools for rapid optimization of the HPLC method used to separate cholesteryl esters. This approach optimally determined a set of conditions in which cholesteryl esters were fully separated in standard mixtures and better resolved in a complex biological fluid in a timely fashion. This method improvement will facilitate studies to unveil the biological function of cholesteryl esters in innate immunity. Moreover, as a general-purpose optimization approach, ANN-GA modeling will likely prove useful for a wide range of method optimizations concerning complex biological samples.