Background

Urinary tract infections (UTIs) are the most common form of bacterial infection both in the general population and in hospital patients, accounting for nearly 25% of all infections [1]. UTIs are much more common in females than in males. It is estimated that 40–50% of women will develop a UTI during their lives, and approximately 33% of women will have recurrent acute uncomplicated UTI [2]. Common primary bacterial uropathogens are Escherichia coli, Staphylococcus saprophyticus, Enterococcus spp., Proteus mirabilis, and Klebsiella pneumoniae, while the most common secondary uropathogens are Staphylococcus aureus, Klebsiella oxytoca, Pseudomonas aeruginosa, Streptococcus agalactiae and the fungal pathogen Candida spp. [3,4,5,6]. Approximately 60–80% of all uncomplicated bacterial UTIs are caused by E. coli. Researchers have recognized that urine is not sterile and have confirmed the importance of resident bacterial flora (urinary microbiota) in the lower urinary tract. The resident urinary microbiota is mostly composed of Lactobacillus gasseri, Corynebacterium coyleae, Actinobaculum schaalii, Aerococcus urinae, Gardnerella vaginalis, Streptococcus anginosus, Streptococcus epidermis, Actinomyces neuii and Bifidobacterium spp. [7, 8].

In clinical microbiology laboratories, the methods most often used to identify microorganisms are still based on cultivation on different culture media [9]. Despite advances in genomics and proteomics, urine culture is still the gold standard for the diagnosis of UTIs. Urine samples containing more than 10^5 CFU/mL of a single microbial species usually indicate clinical relevance. However, these cultivation-oriented methods have significant shortcomings. The first limitation is the time required for the cultivation of microorganisms and their subsequent identification [10]. Standard incubation times range from 12 to 24 h in order to enable reliable detection of uropathogens [11]. The second limitation is the requirement for fresh urine samples. Some of these limitations may result in overall negative urine cultures in up to 80% of cases in many microbiology laboratories [12]. Unfortunately, the wide variety of sampling methods and inappropriate specimen transport are major causes of pre-analytical errors [13].

Various methods have been used for detection of microorganisms in clinical microbiology [14,15,16]. For fast screening of urine samples, flow cytometry (such as the Sysmex analyser) has been used. However, a urine flow cytometer is not able to identify bacteria [17, 18]. Genomic methods relying on DNA analysis, such as SeptiFast, FilmArray or GeneXpert, are being used; however, they are still not approved by the FDA for UTI identification [14]. The use of real-time PCR methods for the identification of uropathogens has been proven feasible [19], but it is limited in scope. Techniques using DNA sequencing regularly show higher sensitivity than the standard urine culture test. For this reason, bacterial identification relying on sequencing of the 16S rRNA genes is becoming a method of choice for detection of uropathogens in urine samples [20, 21].

The field of proteomics also offers methods for microbial identification, mass spectrometry (MS) being the most prominent one. MS platforms used include matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) based analysis, which produces a characteristic spectrum called a peptide mass fingerprint (PMF), and the less frequently used liquid chromatography tandem mass spectrometry (LC-MS/MS) based peptide sequencing. LC-MS/MS depends on initial isolation of bacterial colonies from urine and their subsequent cultivation [22, 23], while MS-based analysers are claimed to be able to process samples or swabs directly. Today, MS-based analysers are in routine use, such as the Bruker BioTyper (Bruker Daltonics) and VITEK MS Plus (bioMérieux), both detecting an MS1 spectral fingerprint consisting of the most abundant proteins present in a wide array of microorganisms [24,25,26]. The US Food and Drug Administration (FDA) has issued regulatory approval for using a MALDI-TOF mass spectrometry-based platform for routine identification of pathogenic microbes from human specimens in clinical microbiology laboratories [23, 27]. This instrument is coupled with dedicated software and a database so that it can compare the recorded MS1 spectra with the mass spectra of known microorganisms stored in the database. However, MALDI-TOF MS has its limitations and does not allow identification of microorganisms at the species level, nor does it perform well when more than one species or strain is present in the sample [28,29,30]. Furthermore, in order to obtain reliable results, samples have to be cultured on selective agar, and a single microbial colony is then used to identify an organism. To bypass the time-consuming and selective cultivation stage, culture-independent methods have been developed [17, 31,32,33,34]. More recently, there has been growing interest in mass spectrometry based proteomic analyses performed directly on urine samples, thus skipping the cultivation stage [35, 36].
Ideally, metaproteomic analysis should provide sufficient numbers of strain-specific peptides for microbial identification at the genus, species and even strain level. It could also be applied to urine samples containing more than one species, and could even detect potential biomarkers used for non-invasive monitoring of human diseases [37,38,39,40].

Methods

Urine samples collection and storage

Urine specimens were collected from the Centre for Clinical Microbiology and Hospital Infections, University Hospital Dubrava, with the only exclusion criterion being antimicrobial therapy. During the period from October to December 2016, a total of 2993 urine specimens were received from patients for whom a urinary culture analysis was requested (Additional file 2: Table S1). The samples were collected from patients according to the instructions for collecting urine by the midstream clean-catch technique [41].

Urine culture test

The microorganisms were identified by routine microbiology methods [42]. Aliquots of the urine specimens were inoculated onto MacConkey agar and blood agar plates using a 1 µl calibrated loop and incubated aerobically at 37 °C for 18 to 48 h, according to the standard operating procedure at the Centre for Clinical Microbiology and Hospital Infections, University Hospital Dubrava. Single colonies were counted to determine the bacterial concentration. Infections with more than 10^5 CFU/mL were considered clinically significant.
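The colony count obtained with a calibrated loop converts to a concentration by dividing by the inoculated volume. The following is a minimal sketch of that arithmetic and of the significance criterion stated above; the function names are hypothetical and are not part of the laboratory's standard operating procedure.

```python
def cfu_per_ml(colony_count: int, loop_volume_ul: float = 1.0) -> float:
    """Estimate CFU/mL from colonies counted on a plate inoculated
    with a calibrated loop of the given volume (in microlitres)."""
    if colony_count < 0:
        raise ValueError("colony count cannot be negative")
    if loop_volume_ul <= 0:
        raise ValueError("loop volume must be positive")
    # 1 mL = 1000 µL, so scale the count up to one millilitre.
    return colony_count * 1000.0 / loop_volume_ul


def is_clinically_significant(colony_count: int,
                              loop_volume_ul: float = 1.0,
                              threshold_cfu_per_ml: float = 1e5) -> bool:
    """Apply the 'more than 10^5 CFU/mL' criterion used in the text."""
    return cfu_per_ml(colony_count, loop_volume_ul) > threshold_cfu_per_ml
```

With a 1 µl loop, each colony corresponds to 1000 CFU/mL, so more than 100 colonies on the plate are needed to exceed the threshold.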

Samples for genomics and proteomics analysis

From the samples that tested positive on the urine culture test (a total of 1571), 16 samples were randomly selected that matched the following criteria: (a) more than 10^5 CFU/mL and (b) more than 30 ml of urine. All sixteen urine samples (with their corresponding laboratory reports) were stored at − 20 °C and used for further genomic and proteomic analyses.
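The selection step can be sketched as filtering on the two criteria and then drawing at random without replacement. This is only an illustration: the record type, field names and the optional seed are assumptions, and the text does not state how the random draw was actually performed.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class UrineSample:
    # Hypothetical record; field names are illustrative only.
    sample_id: str
    cfu_per_ml: float
    volume_ml: float


def select_samples(samples: Sequence[UrineSample],
                   n: int = 16,
                   min_cfu_per_ml: float = 1e5,
                   min_volume_ml: float = 30.0,
                   seed: Optional[int] = None) -> List[UrineSample]:
    """Randomly pick n samples exceeding both the CFU/mL and volume limits."""
    eligible = [s for s in samples
                if s.cfu_per_ml > min_cfu_per_ml and s.volume_ml > min_volume_ml]
    if len(eligible) < n:
        raise ValueError(f"only {len(eligible)} eligible samples, need {n}")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(eligible, n)
```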

Genomic analysis

DNA extraction

Frozen samples were thawed at room temperature and homogenised. Bacterial genomic DNA was extracted using the Maxwell 16 Cell DNA Purification Kit on the Maxwell 16 research instrument (Promega, Madison) according to the manufacturer’s instructions. The concentration of DNA was determined using a Nano-Drop spectrophotometer (Shimadzu Biotech).

16S rRNA sequencing and bioinformatics analysis

Extracted DNA was sent to Next Generation Sequencing Service Provider (MR DNA, Texas, USA). Sequencing was performed on an Illumina MiSeq platform using paired-end sequencing protocol. Amplicons of the 16S rRNA gene were generated using primers targeting V3 and V4 variable regions of the ribosomal RNA. A 30-cycle PCR reaction was performed using the HotStarTaq Plus Master Mix Kit (Qiagen, USA). Microbiome bioinformatic analysis was performed using QIIME 2 (Quantitative Insights Into Microbial Ecology) software package version 2018.4 [43]. Paired-end raw sequences were demultiplexed and quality filtered using the q2-demux plugin followed by de-noising with DADA2 [44]. First 7 bases of forward and reverse reads were trimmed, forward reads were truncated to 290 bases, and reverse reads to 240 bases. Taxonomy was assigned to obtained amplicon sequence variants using the q2-feature-classifier [45] which relies on classify-sklearn naive Bayes taxonomy classifier and Greengenes v. 13_8 from which 99% OTUs reference sequences were trimmed to variable regions 3 and 4 [46]. Amplicons were analysed using the QIIME 2 (version 2017.4).
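The trimming and truncation settings described above can be illustrated with a short sketch. It mimics the filtering semantics documented for DADA2 (reads shorter than the truncation length are discarded, and the leading bases are then removed), but it is not the DADA2 or QIIME 2 code itself. It also assumes that the truncation lengths of 290 and 240 refer to positions in the untrimmed read, which the text does not specify.

```python
from typing import Optional

# Settings taken from the text; the dictionary layout is illustrative.
FORWARD = {"trim_left": 7, "trunc_len": 290}
REVERSE = {"trim_left": 7, "trunc_len": 240}


def trim_and_truncate(read: str, trim_left: int, trunc_len: int) -> Optional[str]:
    """Return the read cut to [trim_left, trunc_len), or None if the read
    is shorter than trunc_len and would therefore be discarded."""
    if trim_left < 0 or trunc_len <= trim_left:
        raise ValueError("need 0 <= trim_left < trunc_len")
    if len(read) < trunc_len:
        return None
    return read[trim_left:trunc_len]
```

Under these assumptions a retained forward read is 283 bases long and a retained reverse read is 233 bases long.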

Proteomics

Sample preparation

For each sample, a homogenized aliquot of 10 ml urine sample was centrifuged at 1000 g at room temperature for 1 min (Additional file 1: Figure S1). Insoluble sediment was discarded, and supernatant was transferred to a new tube and centrifuged at 16,000 g at 4 °C for 5 min. The supernatant was discarded, and the bacterial pellet was re-suspended in a buffer (25 mM NH4HCO3, pH 7.8). The pellet was homogenized on vortex and centrifuged at 16,000 g, at 4 °C for 5 min. This procedure was designed to “wash out” mainly excess human cells and it was repeated three times. Proteins were extracted from the bacterial pellet using 100 µL of bacterial protein extraction reagent B-PER (Thermo-Pierce, USA). Following the manufacturer’s protocol, sample was incubated at room temperature for 15 min and subsequently heated at 100 °C in a water bath for 2 min. Insoluble cellular debris was removed by centrifugation at 16,000 g at 4 °C for 5 min. Finally, supernatant with soluble proteins contained in B-PER solution was ready for the next step in proteomics sample preparation.

In solution digestion

Protein sample contained in B-PER (70 µL) was mixed with 2 µL of trypsin solution (1 mg/mL, Merck, Germany). The in-solution digestion was carried out at 37 °C on a thermoshaker (500 rpm) for 18 h (overnight).

Peptide fractionation

After 18 h of in-solution trypsin digestion, fractionation was performed using the Agilent Bravo automated liquid handling platform (96-channel tip head) and AssayMAP SCX cartridges according to the manufacturer’s instructions and fractionation protocol (application note 5991-3602EN). SCX cartridges were primed with 400 mM ammonium formate/1% formic acid/25% acetonitrile (ACN), equilibrated with 1% formic acid/25% ACN, loaded with samples, and eluted sequentially using 40 mM ammonium formate/25% ACN (pH 3.5 and 4.0), 40 mM ammonium acetate/25% ACN (pH 4.5, 5.0 and 5.5) and 100 mM ammonium hydroxide/25% ACN (pH 9.5). A total of six fractions were thus collected from each processed sample using a pH-modulated stepwise elution method.

MALDI-TOF/TOF mass spectrometry analysis

For sample analysis, 1 µl of 5 mg/mL α-CHCA (α-cyano-4-hydroxycinnamic acid) matrix solution was mixed with 1 µl of each sample fraction (six fractions per sample). From the resulting solution, 1 µl was spotted onto the Opti-TOF MALDI 384 target plate (AB Sciex). After drying at room temperature, spotted samples were analysed using a 4800 Plus MALDI-TOF/TOF mass spectrometer (Applied Biosystems Inc., Foster City, USA) equipped with a 200 Hz, 355 nm Nd:YAG laser. MS spectra were acquired over a mass range of 800–4500 m/z. Peptide fragmentation was performed at a collision energy (CID) of 1 kV in positive ion reflector mode, using nitrogen as the collision gas. For each sample, up to 20 of the most intense peaks in the MS spectra were selected for MS/MS analysis. Approximately 1000 single shots were accumulated from different positions for MS analysis, and 2000 shots were recorded for the subsequent fragment ion spectra. Internal calibration was performed using trypsin autolysis fragments. MS and MS/MS spectra were acquired using the 4000 Series Explorer software v 3.5.3 (AB Sciex).
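The precursor selection rule (up to 20 of the most intense MS peaks within the acquired mass range) can be sketched as follows. This is an illustration only, not the logic of the instrument software; the function name and the (m/z, intensity) tuple layout are assumptions.

```python
from typing import Iterable, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def select_precursors(peaks: Iterable[Peak],
                      max_precursors: int = 20,
                      mz_min: float = 800.0,
                      mz_max: float = 4500.0) -> List[Peak]:
    """Keep peaks inside the acquired m/z window and return the most
    intense ones, strongest first, up to max_precursors."""
    in_range = [(mz, inten) for mz, inten in peaks if mz_min <= mz <= mz_max]
    # Sort by intensity, highest first; ties are broken by lower m/z.
    in_range.sort(key=lambda p: (-p[1], p[0]))
    return in_range[:max_precursors]
```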

Analysis of proteomics data

Mascot (version 2.1, Matrix Science, UK) was used to identify peptides and to search for matching proteins in the NCBI “nr” database (20140312) with the taxonomy filter set to Proteobacteria (11838333 sequences), Firmicutes (5487348 sequences) and Homo sapiens (276468 sequences). Search parameters for the MS and MS/MS database searches were as follows: parent ion mass tolerance of 0.3 Da, fragment ion mass tolerance of 0.5 Da, trypsin digestion with a maximum of one missed cleavage per peptide, and methionine oxidation as a variable modification. Trypsin specificity was set to cleavage C-terminal to lysine and arginine unless the next residue is proline. Qualitative data analysis was performed with MASCOT at a 95% confidence level, with the significance threshold adjusted to keep the false discovery rate below 5%. In Mascot reports, a minimum score of 48 was used.
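The cleavage rule used in the search (cut C-terminal to lysine or arginine unless the next residue is proline, allowing at most one missed cleavage) can be illustrated with a small in silico digestion. This is a sketch of the rule only, not Mascot's implementation, and the function names are hypothetical.

```python
from typing import List


def cleavage_sites(protein: str) -> List[int]:
    """Positions after which trypsin cuts: after K or R, unless followed by P."""
    return [i + 1
            for i in range(len(protein) - 1)
            if protein[i] in "KR" and protein[i + 1] != "P"]


def tryptic_peptides(protein: str, max_missed_cleavages: int = 1) -> List[str]:
    """List peptides from N- to C-terminus, including those that span
    up to max_missed_cleavages uncut sites."""
    if not protein:
        return []
    bounds = [0] + cleavage_sites(protein) + [len(protein)]
    peptides = []
    for start in range(len(bounds) - 1):
        # A peptide with k missed cleavages spans k + 1 consecutive fragments.
        last = min(start + 1 + max_missed_cleavages, len(bounds) - 1)
        for end in range(start + 1, last + 1):
            peptides.append(protein[bounds[start]:bounds[end]])
    return peptides
```

For the toy sequence MKRPAKG, the K-R bond is cut but the R-P bond is not, so the fully cleaved peptides are MK, RPAK and G.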

Results and discussion

Urine culture test

All samples that underwent proteomic and genomic analyses were benchmarked against the standard urine culture test that accompanied each sample (Additional file 2: Table S2). Among the 16 clinical samples analysed, 13 were classified as monobacterial infections and 3 were classified as polymicrobial (at least two identified uropathogens). Thirteen samples showed the presence of Gram-negative bacteria and only three the presence of Gram-positive bacteria. Regarding the taxonomic diversity of the samples analysed, according to standard tests, there were 7 different bacterial species in total, belonging to 4 genera (Additional file 2: Table S3).

Effect of storage time and temperature on bacteria in urine samples

Guidelines for the collection and storage of urine specimens differ for different diagnostic purposes. Urine samples should therefore be collected and stored with the exact diagnostic procedures to be carried out in mind. In our study, short-term storage (up to 4 weeks) of urine at − 20 °C proved to be a good choice for the preservation of bacteria in collected samples. Long-term storage (for more than 3 months) at − 80 °C led to biomass loss, most likely because prolonged freezing increased bacterial cell fragility, leading to a greater extent of cell disruption during centrifugation (unpublished observations).

Identification of microorganisms using genomics

16S rRNA sequencing results

The identification of bacterial taxa is shown in Table 1. The lowest taxonomic level at which assignment was possible is shown as the result of genomic identification. Table 1 provides the following information: sample number, conventional urine culture result, DNA concentration and 16S rRNA gene sequencing result.

Table 1 Identification results based on conventional urine culture and 16S rRNA gene sequencing

What stands out in this table is a disparity in the taxonomic identification obtained through 16S rRNA gene sequencing: bacteria were identified at the genus level in 44% of cases and at the family level in 56% of cases, while identification at the species level is usually lacking.

It is apparent that the Klebsiella spp. (UR1-UR3) and Enterobacter spp. (UR13-UR14) identifications are difficult to compare due to the different levels of taxonomy assignment by the method [47], while there is a significant positive correlation between the conventional and genomics methods for the other results. A possible explanation for this difficulty might be related to bacterial nomenclature, taxonomy and very high sequence identity. Furthermore, genomic 16S rRNA-based analysis was not informative at the genus and/or species level in the family Enterobacteriaceae [48]. There was a surprising difference between the standard test and the genomics results in sample UR5. The standard urine culture test indicated Enterococcus faecalis as a single uropathogen in this sample, while 16S rRNA sequencing indicated a polymicrobial mixture without the genus Enterococcus listed. There are two possible explanations for this disparity: contamination of the urine sample during collection [49], which would likely cause an error in the genomics test, or a false-positive Enterococcus result from the standard culture-based urine test.

Method for proteomics-based identification of uropathogens

The present study was undertaken to assess the potential of bottom-up proteomics for identification of pathogens directly from the urine samples of patients with UTIs. The results obtained were benchmarked against the reference results (standard urine tests), and 16S rRNA gene sequencing (genomics) was used for arbitration in cases where proteomics gave results that differed from the standard urine test.

Sample preparation

For the proteomic analysis, a minimum concentration of 10^5 CFU/mL and a volume of 5 mL of fresh urine sample or urine stored in the refrigerator up to 4 weeks were used. In this preliminary study, we investigated and compared the preparation of samples stored at − 20 °C and − 80 °C. We based our decision on the optimal storage temperature on visual inspection of the pellets during centrifugation. In urine samples stored at − 80 °C, bacterial cells were lost, and the pellet was deemed insufficient for downstream analysis. Samples stored at − 20 °C, on the other hand, showed abundant biomass, but this biomass proved challenging to wash. The reason for this could be cell aggregation, probably auto-aggregation, especially since blood was present in the tested samples [32]. Furthermore, good separation of bacterial cells from other materials that can be present in urine, such as yeast cells, epithelial cells, leukocytes, erythrocytes, mucus, urinary casts, and different types of crystals, depends on centrifugation speed [32, 49]. Moreover, at high speed the pellet will likely be rich in cell debris. Consequently, damaged cells will be washed off during the sample preparation process. Pellet volume was identified as an important element that influenced the success of positive protein identification. Microbial biomass had to be visible to the naked eye after the washing steps. The obtained pellet biomass can be seen in Additional file 1: Figure S2.

Previous studies have considered the impact of ultra-sonication on microorganisms as a way to improve sample preparation [32, 50, 51]. In our research, protein extraction using B-PER worked for both Gram-negative and Gram-positive bacteria, so there was no need for additional mechanical methods of cell rupture. In the reviewed literature, no data were found on the efficiency of protein digestion in the presence of B-PER. To our knowledge, no other group has reported the use of trypsin in B-PER solution.

Peptide fractionation

During a preliminary study, we found that the amount of data we could get from one sample spot was insufficient. Thus, to overcome this obstacle we used peptide fractionation. We hypothesised that peptide fractionation would help to enrich the low-abundance peptides (Additional file 1: Figure S4).

MALDI-TOF/TOF mass spectrometry results

Protein identifications and data analysis

While BioTyper and Vitek use reference databases to identify and classify microorganisms according to their mass spectral fingerprint, we relied on peptide ion fragments from MS/MS scans and MASCOT protein search results, which were translated into MASCOT-based uropathogen identification ranks. For this purpose, we combined the MASCOT score with a peptide count and wrote a simple Python script that ranks the organisms suspected to be in the sample based on the probability of their proteins being detected. The first step was protein identification from tryptic peptides using the MASCOT search engine [52]. This provided us with both a score and the number of queries matched for proteins belonging to one or more organisms. The Mascot score is a statistical score for how well the generated spectra match the database protein sequence [52, 53]. Put plainly, a higher score indicates a more confident protein match, while the number of queries matched indicates the number of spectra that were matched to this protein. Although it is not unusual for a portion of peptides to be scanned multiple times, overall, the greater the score and the greater the number of queries matched, the greater the probability of a true positive match. Therefore, we combined these two measures into a “summa score”, simply by summing all individual peptide scores for a given protein match. Proteins and their respective taxa were ordered by this “summa score” in descending order, and the highest scoring taxon was taken as the most likely uropathogen identification. Table 2 compares the results of this analysis with the standard urine culture test. A summarized report on the bacterial proteins identified by MASCOT is given in Additional file 2: Table S4.
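The ranking described above can be sketched in a few lines of Python. This is not the authors' script, which is not reproduced in the text; it is a minimal reconstruction under stated assumptions: each peptide hit is a (taxon, protein, peptide score) tuple, the summa score is the sum of peptide scores per protein, and the taxon of the top-ranked protein is reported. Breaking ties by the number of queries matched is an added assumption.

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

PeptideHit = Tuple[str, str, float]           # (taxon, protein, peptide score)
RankedProtein = Tuple[str, str, float, int]   # (taxon, protein, summa score, queries)


def rank_by_summa_score(peptide_hits: Iterable[PeptideHit]) -> List[RankedProtein]:
    """Sum peptide scores per protein and order proteins by that sum, descending."""
    summa = defaultdict(float)
    queries = defaultdict(int)
    for taxon, protein, score in peptide_hits:
        summa[(taxon, protein)] += score
        queries[(taxon, protein)] += 1
    # Highest summa score first; ties go to the protein with more matched queries.
    keys = sorted(summa, key=lambda k: (-summa[k], -queries[k]))
    return [(taxon, protein, summa[(taxon, protein)], queries[(taxon, protein)])
            for taxon, protein in keys]


def most_likely_taxon(peptide_hits: Iterable[PeptideHit]) -> Optional[str]:
    """Return the taxon of the top-ranked protein, or None if there are no hits."""
    ranked = rank_by_summa_score(peptide_hits)
    return ranked[0][0] if ranked else None
```

In this sketch, human protein hits would need to be filtered out by the caller before ranking if only bacterial identifications are wanted.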

Table 2 MALDI-TOF/TOF analysis with MASCOT identification of uropathogens

The proteins ordered by summa score are listed in Additional file 3: Table S1. The minimum significant MASCOT summa score obtained across all samples was 53, while the maximum reported score was 830. A total of 382 peptides were reported for all 16 samples. Most of these peptides (71%) belong to bacterial proteins. Although we expected the majority of proteins to be ribosomal, we identified a rather small percentage of ribosomal proteins (8%). In our case, the proteins with the highest scores were membrane proteins, including outer membrane porin protein C, peptidoglycan-associated lipoprotein (PAL) and murein lipoprotein (MLP). This interesting result might be associated with the use of B-PER [54]. Considering all monobacterial samples, direct identification was reliable for the genera Klebsiella (3 samples), Proteus (4 samples), Enterococcus (2 samples), Enterobacter (1 sample) and Citrobacter (1 sample). Overall, this simple proteomics approach achieved 87% agreement with the standard urine test for monobacterial samples.

These results are very encouraging since pathogenic species were correctly identified at the genus level using a relatively small number of identified bacterial proteins per sample, and in the absence of unique peptides. Although our results indicate that proteomics-based identification with a small number of proteins is feasible, high-throughput setup yielding more spectra and retrieving larger fractions of proteomes would be more favourable.

Microbial identification in polymicrobial cultures

To investigate polymicrobial cultures (UR11, UR15 and UR16), we compared the results obtained from the conventional urine culture, 16S rRNA gene sequencing and proteomics (Additional file 2: Table S5). Our previous experience with the MALDI-TOF/TOF mass spectrometer indicated that bacterial identification in polymicrobial urine samples using this platform for proteomics has some limitations. As reported previously by other authors, MALDI-TOF MS identification of polymicrobial cultures directly from urine samples did not provide reliable results [17, 49]. Therefore, bacterial identification at the strain level is still regarded as a challenge. Some of the underlying factors that compromise the sensitivity of this method in bacterial identification are: sample impurities (human proteins), low abundance of bacterial proteins in the sample [55], insufficient coverage of urinary bacterial species in the databases, shared peptide sequences among proteins from different taxa [38], and the possibility of generating insufficient data from a single MS injection per sample [39]. Bottom-up tandem MS, accompanied by ever-growing proteomics and genomics databases and data processing through a wide range of bioinformatics tools, has made polymicrobial identification feasible [30, 36], but it still remains in the domain of experimental research and far from clinical practice.

Human proteins versus contamination

Normal human urine of a healthy individual contains over 2000 proteins [56, 57], while over 5000 proteins can be found when the urinary tract is under inflammation [33]. Due to low protein concentration, urine is a difficult proteomic sample to work with [58].

We recorded 29% of human proteins in our samples, of which 33% were found to be repetitive (Additional file 2: Table S6). The most abundant of these repeated human proteins were classified as haemoglobin subunits (alpha and beta-globin), apolipoprotein and uromodulin. We did not find any evidence of epithelial cells from the urinary or vaginal tract, or any biomarkers.

As can be seen from Additional file 1: Figure S3, first two fractions cover more than 50% of the total number of proteins. Furthermore, Additional file 1: Figure S4 shows a quantitative overview of bacterial and human proteins of each sample. In terms of future work, it would be interesting to consider two-dimensional fractionation to increase bacterial proteome coverage and enhance the ratio of bacterial vs human proteins.

Limitations and future direction

With regard to the research method, the major limitation identified by this study is the small number of identified proteins per sample. As in many proteomic analyses for bacterial identification, the work was largely limited to monomicrobial specimens with high CFU/mL concentrations, because results had to be compared with those of standard urine culture tests, which have their own inherent drawbacks. This study lays the groundwork for future research. A possible future direction is to target lower-abundance proteins in order to improve proteome identification. Switching to a high-throughput platform such as ESI could solve this issue. Furthermore, to increase the number of identified proteins, a possible solution could be the use of double peptide fractionation or the FASP (filter-aided sample preparation) method. To improve bacterial identification, we are developing bioinformatics software based on natural language processing. Urine is clinically underutilized and has much greater potential for the development of non-invasive tests and techniques. The proteomics approach and direct sample analysis have the potential to provide a broader clinical picture that could bring us closer to precision medicine.

Conclusion

The main goal of the current study was to establish a procedure for analysis of uropathogens by proteomics. The procedure was tested using MALDI-TOF/TOF mass spectrometry directly on urine specimens. This study has shown that identification of bacteria from a native urine sample, without a prior culturing step, depends on storage conditions, the sample preparation method, and data analysis. Overall, the results of this study demonstrate that mass spectrometry based proteomics can effectively identify different uropathogens directly from fresh or cold-stored human urine samples, without a cultivation step. The direct approach was able to provide reliable identification of bacteria at the genus level in monobacterial samples, despite the inherent limitations of the mass spectrometry platform used. In the case of polymicrobial urine samples, the direct approach using the methods described here did not allow for unambiguous identification.