1 Introduction

COVID-19 first appeared in the city of Wuhan (China) in December 2019 and spread rapidly around the world. SARS-CoV-2 is a coronavirus very similar to earlier viruses such as SARS-CoV and MERS-CoV (Zhou et al., 2020). The main problem with SARS-CoV-2 is its high transmissibility, which has increased with the emergence of new variants such as Omicron (Mannar et al., 2022; Golcuk et al., 2021). Data from the World Health Organization show more than 496 million cases worldwide and over 6 million deaths (WHO, 2022).

The symptoms associated with COVID-19 are very diverse, affecting the respiratory, circulatory, digestive, endocrine and nervous systems, which makes the disease very difficult to treat (Liu et al., 2020b; Niazkar et al., 2020; Richardson et al., 2020; Zhou et al., 2020). The effect of certain drugs, such as glucocorticoids, is being studied with promising results for the treatment of the disease, although, owing to the complexity of the illness, drug-disease interactions are still incompletely understood (Spick et al., 2022). Although much is still unknown about the molecular aspects of the disease, important advances have been made in studying the metabolism of COVID-19 patients, mainly at the serological level (Bruzzone et al., 2020a; Costanzo et al., 2022; Ghini et al., 2022; Li et al., 2022; Shen et al., 2020; Thomas et al., 2020; Wu et al., 2020; Xu et al., 2021; Zhang et al., 2021). In urine samples from COVID-19 patients, alterations in certain biomarkers, such as glucose, kynurenines and proteins, have been found to reflect the metabolic alterations caused by the disease and can help to assess its severity (Bi et al., 2022; Dewulf et al., 2022; Liu et al., 2020a; Morell-Garcia et al., 2021).

The main objective of our work is to study the metabolic profiles of patients diagnosed with COVID-19 by positive RT-PCR testing and to identify the metabolic changes associated with the infection. We used 1H NMR as the analytical technique because it is especially useful for analyzing complex biological samples, such as urine. We worked with urine samples for several reasons. First, urine is a biological fluid that collects the final products of metabolism, concentrated by the kidneys. Second, samples can be obtained non-invasively. Moreover, urine has been extensively studied by NMR, so there is a lot of information about its composition, as well as the metabolic implications of the different compounds present in it (Bezabeh et al., 2019; Bouatra et al., 2013; Bruzzone et al., 2021; Duarte et al., 2014; Khakimov et al., 2020; Reo, 2002; Tynkkynen et al., 2019).

We have compared a cohort of urine samples from COVID-19 patients with samples from healthy people, applying chemometric techniques that allowed us to find the metabolites most strongly altered during infection (Li et al., 2009; Verboven & Hubert, 2005; Verboven et al., 2012). Based on these metabolites, we have built mathematical models that can distinguish urine samples of COVID-19 patients from samples of healthy people.

These results show that important alterations in metabolism are taking place during the infection, similar to alterations seen in cases of type II diabetes mellitus, as well as alterations in products generated by the microbiota.

2 Materials and methods

2.1 Patient recruitment and sample preparation

Patient recruitment and sampling procedures were performed in accordance with the Declaration of Helsinki and applicable local regulatory requirements and laws and after approval from the Ethics Committee of the Hospital Universitario de San Juan (Alicante, Spain). Written informed consent was obtained from each participant before being included in this study. Urine samples of patients with COVID-19 and another viral chronic disease were obtained from the Hospital de la Marina Baixa (Vila Joiosa, Alicante, Spain). The first set of COVID-19 patient samples (n = 20) was collected in September, October, and November 2021, when the Delta variant was predominant in Spain. The second set of samples (n = 14) was collected during February and May 2022, when the dominant virus variant was Omicron. The healthy control samples were obtained from the Alicante University personnel (Table 1).

Table 1 Population and clinical data of the people whose urine has been analyzed

Human urine samples (first pass, morning) were collected from volunteers in 120 mL sterile urine specimen cups. Upon receipt (typically within 1 h of collection), all samples were stored at −20 °C. Prior to analysis, the samples were thawed at room temperature for 30 min and centrifuged at 12,000 rpm for 5 min.

2.2 1H NMR acquisition and data processing parameters

500 µL of urine were placed in a 5 mm NMR tube with 50 µL of D2O containing 0.75% 3-(trimethylsilyl)propionic-2,2,3,3-d4 acid sodium salt (TSP) and 0.1% sodium azide. The spectra were referenced to TSP at 0.00 ppm. All 1H NMR experiments were performed on a Bruker Avance 400 MHz spectrometer equipped with a 5 mm HBB13C TBI probe with an actively shielded Z-gradient. 1D solution-state 1H NMR experiments were acquired with a recycle delay of 2 s, 32,768 time domain points, and an acquisition time of 2.556 s. The number of scans was 1024 and the experiments were carried out at 298 K. Spectra were apodized by multiplication with an exponential decay producing a 0.3 Hz line broadening in the transformed spectrum. The 1H NMR spectra were normalized and reduced to ASCII files using TopSpin (Bruker) and aligned using icoshift (version 1.0; available at www.models.kvl.dk) (Savorani et al., 2010). All 1H NMR spectra processing was performed in MATLAB (The MathWorks, Natick, MA). The regions of water (4.60–4.95 ppm), urea (5.60–6.00 ppm) and the extreme high and low fields (< 0.5 ppm and > 10 ppm) were removed. After the elimination of these regions, the spectra were binned into 0.04 ppm buckets (MacKinnon et al., 2019).
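The region removal and bucketing described above can be illustrated with a minimal sketch. It is written in Python for illustration only; the study performed this step in MATLAB. The function name `bin_spectrum`, its parameters, and the rule of dropping buckets that lie entirely inside an excluded region are assumptions, not the authors' implementation.

```python
import math


def bin_spectrum(ppm, intensity, width=0.04, low=0.5, high=10.0,
                 excluded=((4.60, 4.95), (5.60, 6.00))):
    """Sum intensities into fixed-width ppm buckets, skipping excluded regions.

    Returns a list of (bucket_center_ppm, summed_intensity) pairs.
    Default limits follow the text: spectrum kept between 0.5 and 10 ppm,
    water (4.60-4.95 ppm) and urea (5.60-6.00 ppm) removed, 0.04 ppm buckets.
    """
    # Small tolerance guards against floating-point noise in the division.
    n_buckets = math.ceil((high - low) / width - 1e-9)
    sums = [0.0] * n_buckets
    for x, y in zip(ppm, intensity):
        # Discard the extreme high- and low-field ends of the spectrum.
        if x < low or x >= high:
            continue
        # Discard points that fall inside an excluded region.
        if any(a <= x <= b for a, b in excluded):
            continue
        idx = min(int((x - low) / width), n_buckets - 1)
        sums[idx] += y
    buckets = []
    for i, total in enumerate(sums):
        left = low + i * width
        right = left + width
        # Assumption: buckets lying entirely inside an excluded region are dropped.
        if any(a <= left and right <= b for a, b in excluded):
            continue
        buckets.append((left + width / 2, total))
    return buckets
```

The function treats each spectrum point independently, so it works whether the ppm axis is stored in ascending or descending order. Buckets that only partly overlap an excluded region are kept and contain the sum of their non-excluded points.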

2.3 Statistical analysis

The data obtained from the NMR spectrometer were imported into MATLAB (MathWorks, USA). The metabolites were identified in the one-dimensional spectra using the Human Metabolome Database (HMDB).

The 1H NMR profiles form a matrix that was analyzed using an unsupervised method, ROBPCA (Verboven & Hubert, 2005), and a supervised method, the PLS-LDA algorithm (Li et al., 2009, 2018). PLS-LDA groups the data according to a mathematical model. It allows us to know whether the samples are assigned to the correct group and to determine which variables are important for that classification. The statistical parameters R2Y and R2X were used to assess the quality of the model. We applied Pareto scaling in the PLS-LDA analysis, using 3 components. For variable selection, we used SPA (Subwindow Permutation Analysis) on the binned spectra (Li et al., 2010, 2018). This method is based on PLS-LDA models and their prediction errors.
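As an illustration of the Pareto scaling pretreatment mentioned above, the following minimal Python sketch mean-centers each variable (column) and divides it by the square root of its standard deviation. It covers only the pretreatment step, not PLS-LDA itself; the function name `pareto_scale` and the use of the sample standard deviation (n − 1 denominator) are assumptions and may differ from the library used in the study.

```python
import math


def pareto_scale(matrix):
    """Pareto-scale a samples-by-variables matrix given as a list of rows.

    Each column is mean-centered and divided by the square root of its
    standard deviation. Constant columns are set to zero.
    Requires at least two rows (samples).
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    scaled = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        mean = sum(column) / n_rows
        # Assumption: sample variance with n - 1 in the denominator.
        variance = sum((v - mean) ** 2 for v in column) / (n_rows - 1)
        # sqrt(sqrt(variance)) is the square root of the standard deviation.
        denom = math.sqrt(math.sqrt(variance))
        for i in range(n_rows):
            scaled[i][j] = (column[i] - mean) / denom if denom > 0 else 0.0
    return scaled
```

Compared with unit-variance scaling, dividing by the square root of the standard deviation reduces the dominance of intense signals while keeping the data structure closer to the original measurements, which is a common reason for choosing it with NMR data.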

The LIBRA library for unsupervised analysis (ROBPCA) was downloaded from https://github.com/mwgeurts/libra. The LIBPLS library for supervised analysis (PLS-LDA) was downloaded from http://www.libpls.net/. Both libraries were used with MATLAB 2021b.

3 Results

Urine samples were analyzed by 1H NMR as detailed in the Materials and Methods section. A first inspection of the representative spectra obtained allows us to differentiate some samples from others, as can be seen in Fig. S1 (SUPPLEMENTARY MATERIAL). These spectra have been obtained with the raw data, without performing any processing. The most important differences are in the signals assigned as citric acid, ketone bodies, TMAO, sugar region, hippuric acid, and formic acid.

To confirm these preliminary results, we analyzed all available samples by 1H NMR. On this occasion, each sample was analyzed individually. The spectra were analyzed using robust principal component analysis (ROBPCA), an unsupervised technique (Verboven & Hubert, 2005) (Fig. 1A). In the score plot, we can observe a first separation between the samples, which is generated by the loadings of principal component 1 (PC1) and principal component 2 (PC2) (Fig. 1B). The classification of the samples is determined by the positive or negative value of the loadings. In Fig. 1B, loadings with positive values indicate a higher concentration of these metabolites in the COVID-19 patient samples. Conversely, negative values correspond to metabolites that are at lower concentration in COVID-19 patients. The variables selected in the model correspond to those observed in the first experiment.

Fig. 1 ROBPCA score plots for urine spectrum data for COVID-19 patients (blue circles) and healthy controls (red diamonds)

Once it was verified that the separation between the group of COVID-19 patients and the group of healthy people is possible based on the 1H NMR spectra of urine with an unsupervised method (Robust PCA), we proceeded to use the supervised method PLS-LDA (Li et al., 2009, 2018). With PLS-LDA we were able to obtain an excellent classification of the samples (Fig. 2).

Fig. 2 A First two components of the PLS-LDA model score plots for urine 1H NMR spectra for COVID-19 patients (blue circles) and healthy controls (red diamonds). B Prediction of COVID-19 or healthy status by the PLS-LDA model using urine 1H NMR spectra, 3 LVs and Pareto scaling as data pretreatment. C PLS-LDA tpLoading in pseudospectrum format for the urine 1H NMR spectra of COVID-19 patients and healthy controls. The intensity of the peaks (positive or negative) in the pseudospectrum indicates the most significant spectral regions in the PLS-LDA model. The most significant peaks are identified in the pseudospectrum

The spectra were analyzed with a supervised chemometric technique (PLS-LDA) (Li et al., 2009; Tang et al., 2014). With PLS-LDA we obtained results that allow us to classify the samples into two groups: COVID-19 patients and healthy people (Fig. 2). The classification is achieved by identifying which spectral signals differ between the two groups (Fig. 3). The classification is supported by the values of the statistical parameters. In this study, the statistical parameters are very clear: R2Y was 75.99, the error was 0, sensitivity was 1, specificity was 1 and AUC was 1, using Pareto-scaled data. The classification is based on signals from citric acid, ketone bodies, TMAO, the sugar region, hippuric acid and formic acid, as identified in the preliminary tests using the unsupervised ROBPCA method (Fig. 1). In Fig. 2C, loadings with negative values indicate signals that are higher in the samples of COVID-19 patients. These signals correspond to ketone bodies, TMAO, the sugar region (glucose), and formic acid. We can also identify the phenylalanine and tyrosine signals as more intense in the group of COVID-19 patients. It is also very interesting to observe the signals that have decreased or disappeared in the spectra of COVID-19 patients compared to the spectra of healthy people. These signals correspond to citric acid, creatine/creatinine, glycine, and hippuric acid. It is important to note that in many of the samples from COVID-19 patients the levels of citric acid are almost undetectable. The same happens with hippuric acid.

To confirm that the PLS-LDA model was correct, new samples were collected from COVID-19 patients and from patients with a chronic viral disease, and the PLS-LDA model was used to classify them. The 1H NMR spectra were entered into the model. We can see in Fig. 3 that the COVID-19 patient samples are correctly classified. These samples were collected during February and May, when the Omicron variant was dominant, while the other samples were collected when this variant was not dominant. Samples from patients with another chronic viral disease present metabolite profiles in the 1H NMR spectrum similar to those of healthy people (Fig. 3).
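For readers less familiar with the classification statistics quoted above, the following minimal Python sketch shows how sensitivity, specificity and AUC are commonly computed from true labels, predicted labels and classifier scores. The function names and the toy labels are hypothetical and are not the study data; the AUC here is the rank-based (Mann-Whitney) formulation, which may differ in implementation from the routine used by the authors.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores, positive=1):
    """Rank-based AUC.

    Probability that a randomly chosen positive sample receives a higher
    score than a randomly chosen negative sample; ties count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = COVID-19 patient, 0 = healthy control.
labels = [1, 1, 0, 0]
predicted = [1, 1, 0, 0]
model_scores = [0.9, 0.8, 0.2, 0.1]
print(sensitivity_specificity(labels, predicted))
print(auc(labels, model_scores))
```

A sensitivity, specificity and AUC of 1 together mean that every sample in the evaluated set was assigned to its correct group and that every patient scored above every control.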

Fig. 3 Samples plotted in the PLS-LDA component space, with the separating line shown. Training data set: COVID-19 patients (red diamonds) and healthy controls (blue circles). Test data set: COVID-19 patients (black diamonds) and patients with a chronic viral disease (green circles)

Working with 1H NMR spectra involves a point-by-point analysis across the entire spectrum. This can confuse the model because the peaks that correspond to each molecule are not delimited. To avoid this problem, we binned the spectra (Materials and Methods section), integrating the signal in divisions of 0.04 ppm (MacKinnon et al., 2019). This reduces each spectrum to only 239 points. We again applied the unsupervised method of analysis (ROBPCA) (Fig. S2 SUPPLEMENTARY MATERIAL) and again obtained a good separation between samples from COVID-19 patients and healthy people. Applying a supervised method, PLS-LDA, we obtained the same classification of the samples. In addition, because binning reduces the data in each spectrum, it is easier to use a variable selection method, such as SPA. This method allows us to select the most important variables in the classification (Fig. S3 SUPPLEMENTARY MATERIAL). According to SPA, the most important signals to distinguish COVID-19 patients from healthy people are those of citric acid and hippuric acid (Li et al., 2010). The signals of these organic acids practically disappear in COVID-19 patients (boxplots for these organic acids are shown in Fig. S4 SUPPLEMENTARY MATERIAL).

4 Discussion

The kidney plays a fundamental role in the body, as it allows the elimination of waste and/or harmful substances. The kidney concentrates these substances in the urine for elimination. Therefore, urine is a fluid that collects the end products of many metabolic processes in the body (Duarte et al., 2014). It offers invaluable information on the state of the organism at a given moment or on the organism's response to a certain situation, such as SARS-CoV-2 infection (Alvarez-Belon et al., 2020). The kidney is also a very important organ in intermediary metabolism because it participates in fundamental metabolic pathways. Considering both roles of the kidney, that of a filtering agent and that of a critical metabolic center, together with the liver, allows us to understand the metabolic alterations caused by SARS-CoV-2 infection. To understand the metabolic changes caused by COVID-19, we have to evaluate each of the metabolites altered by the disease. Figure 2C shows the importance of the different signals for obtaining a good classification and separation of the samples of COVID-19 patients and healthy people. Signals below zero are dominant in urine samples from COVID-19 patients, while positive signals are higher in healthy people. In 1H NMR, the intensity of a signal is proportional to the number of protons that contribute to it. That is, the more intense the signal, the greater the number of protons of that molecule in the sample, and hence the higher its concentration. Therefore, the PLS-LDA multivariate analysis method generates a tpLoadings profile that indicates the importance of each 1H NMR signal in the mathematical model. The mathematical model is what allows us to identify a sample as being from a COVID-19 patient or from a healthy person.
The signals corresponding to ketone bodies (acetoacetate, 3-hydroxybutyrate, and acetone), TMAO, glucose, and the amino acids phenylalanine and tyrosine are more intense in COVID-19 patients than in healthy people. In urinalysis, elevated levels of ketones, glucose and proteins have been found in COVID-19 patients (Liu et al., 2020a; Morell-Garcia et al., 2021). Previous studies on urinary proteomics of COVID-19 patients also indicate that they present aminoaciduria (Dewulf et al., 2022). This increase in ketone bodies would be in line with the increase in these substances found in the blood (Bruzzone et al., 2020a). Ketosis is generated in the liver because there has been a metabolic shift to a state of gluconeogenesis. When this gluconeogenic pathway is activated in the liver and kidney, all available precursors are used for de novo glucose synthesis (gluconeogenesis). One of the most important precursors is oxaloacetate (Nelson & Michael, 2014). This compound initiates the Krebs cycle by condensing with acetyl-CoA, which comes from the oxidative decarboxylation of pyruvate or from the β-oxidation of fatty acids, to form citrate. The lack of oxaloacetate causes the concentration of acetyl-CoA to increase, and this acetyl-CoA is directed toward the formation of ketone bodies. In addition, these ketone bodies will compensate for the lack of glucose in different tissues, such as the central nervous system and muscle (Nelson & Michael, 2014). We must also consider that the amino acids phenylalanine and tyrosine are ketogenic amino acids, which would be mobilized to serve as a source of acetyl-CoA. However, the presence of glucose in the urine makes us consider another, more serious metabolic situation, since ketosis is relatively normal in metabolism (Nelson & Michael, 2014) and should not produce an increase in ketone bodies or glucose in the urine.
The presence of glucose in the urine, together with the increase in ketone bodies, would indicate a situation of type II diabetes. Glucose transporters in the renal tubules are unable to recover glucose from the urine because the excess glucose saturates the transporters (Mather & Pollock, 2011; Schetz et al., 2010; Wen et al., 2021). It is also important to point out that the presence of ketone bodies in urine could be due to the role that these compounds play as molecular signals and in protection against kidney damage. β-Hydroxybutyrate has an anti-inflammatory effect and suppresses oxidative stress (Rojas-Morales, 2021; Rojas-Morales et al., 2020).

In different studies based on analyses of COVID-19 patients, it could be deduced that metabolism was shifting toward this situation of ketosis and type II diabetes, or a certain resistance to insulin (Bruzzone et al., 2020a; Ghini et al., 2022). This metabolic alteration would be supported by the increase in the concentration of TMAO (trimethylamine N-oxide). TMA (trimethylamine) is produced by the bacterial flora from dietary metabolites such as choline and L-carnitine, which are present in animal products (Hoyles et al., 2018). TMA is transported to the liver, where it is oxidized to TMAO by hepatic flavin monooxygenases (FMO1 and FMO3). An increased concentration of TMAO in the body has been associated with health problems such as inflammatory processes, risk of arteriosclerosis, neurological disorders, and kidney damage, among others. However, the mechanism underlying this damage is still unknown (Barrea et al., 2018; Brial et al., 2018; Chhibber-Goel et al., 2017; Coras et al., 2019; Dove et al., 2012; Farmer et al., 2021; Gatarek & Kaluzna-Czaplinska, 2021; Janeiro et al., 2018; Koeth et al., 2013). One of the characteristics of COVID-19 is the appearance of inflammatory processes, which could be leading to increased levels of TMAO in the urine of COVID-19 patients, as occurs with other inflammatory processes (Tang et al., 2015; Yang et al., 2019). Kidney damage is also associated with an increased concentration of TMAO (Tang et al., 2015).

Another compound that is also elevated in COVID-19 patients is formate. Formate is synthesized from precursors such as choline or methanol, but it is mainly obtained from the amino acid serine (Pietzke et al., 2020). In some COVID-19 patients, we have found very high levels of formate. Formate is used as a marker of exposure to environmental contaminants in workplaces, such as methanol. However, COVID-19 patients have not been exposed to these environments, so the elevated formate levels could be due to the massive production of amino acids during sarcopenia caused by the infection or to kidney damage that affects osmoregulation (Gil et al., 2018; Liesivuori et al., 1992; Pietzke et al., 2020). In addition, formate induces changes in energy metabolism, increasing AICAR levels and AMPK activity (Oizel et al., 2020), which would cause an increase in intracellular glucose generation via gluconeogenesis and glycogenolysis. However, there are still many unknowns about the role of formate in the body. Sarcopenia is one of the most serious problems associated with COVID-19 (Anker et al., 2021; Morley et al., 2020; Pleguezuelos et al., 2021; Wang et al., 2020), and it may be causing the large decrease in creatine/creatinine seen in COVID-19 patients compared with healthy people. Losses of muscle mass and strength pose a serious health risk (Isoyama et al., 2014; Newman et al., 2006). The loss of muscle mass would be caused by the breakdown of muscle proteins to use amino acids as an energy source, as well as glucose precursors in hepatic and renal gluconeogenesis. The situation of type II diabetes caused by the infection would aggravate this process of sarcopenia, as the necessary amount of glucose does not enter the muscle cells. The difference in creatine/creatinine values between COVID-19 patients and healthy people is key in the PLS-LDA model, indicating the severity of muscle loss during the disease.
Creatine is generated in the muscle as a reserve of phosphate bonds, by the synthesis of phosphocreatine. Creatinine is a derivative of creatine that is formed by a non-enzymatic reaction (Nelson & Michael, 2014).

In healthy people, we find differences with respect to COVID-19 patients in other metabolites, such as acetic acid, citric acid, and hippuric acid. Acetic acid is one of the short-chain fatty acids (SCFAs) that are synthesized in the gastrointestinal tract by the microbiota, particularly in the colon. They are produced by the fermentation of dietary fiber (Canani et al., 2011). Propionate and butyrate are used as an energy source by colonocytes, while acetate is excreted in the urine (Boets et al., 2017; Pomare et al., 1985). In COVID-19 patients, there is a decrease in this metabolite in urine samples. This change probably reflects changes in the diet of COVID-19 patients following their hospital admission, although we cannot rule out negative effects of SARS-CoV-2 infection on the intestinal flora. These SCFAs also have anti-inflammatory effects (Boets et al., 2017), so a decrease in their concentration may also be contributing to the state of inflammation caused by the infection.

Citric acid virtually disappears in COVID-19 patients. This metabolite is produced by the condensation of oxaloacetate with acetyl-CoA to initiate the Krebs cycle (Nelson & Michael, 2014). Its metabolic function is to be the beginning of the Krebs cycle to oxidize acetyl-CoA and obtain energy. It is also the precursor for fatty acid synthesis in a metabolic situation of excess sugars (Nelson & Michael, 2014). However, the metabolic situation in COVID-19 patients is different since the body is in a gluconeogenic situation. Using all possible precursors for glucose synthesis is a priority for cells. As we have discussed, oxaloacetate is one of those glucose precursors (Nelson & Michael, 2014). The kidney, along with the liver, is a gluconeogenic organ, which would explain the lack of citric acid in the urine. The citric acid in urine has the function of binding Ca2+ (Moe & Preisig, 2006) to prevent the formation of kidney stones since it prevents the formation of calcium oxalate, which would form the stones (Tiselius et al., 1993). Multivariate analysis of 1H NMR spectra indicates that this metabolite is the most important for identifying COVID-19 patients (Figs. 2 and S3). All healthy people have citric acid, while COVID-19 patients show an absence of this metabolite. In other words, we would not only be observing a metabolic change due to this absence of citric acid, but it could also be a fast and non-invasive method to detect the infection (Seker et al., 2009).

Glycine is also a glucogenic amino acid, so its significant decrease in samples from COVID-19 patients compared to healthy people would also support the hypothesis of this gluconeogenic situation in the body due to type II diabetes caused by the infection. COVID-19 patients suffer an enormous loss of muscle mass due to the breakdown of muscle proteins. Amino acids are used in the muscle as a source of energy to compensate for the lack of glucose (Nelson & Michael, 2014).

Hippuric acid is produced in the liver as a metabolite from the detoxification of benzoic acid (Irwin et al., 2016). Therefore, the decrease in this metabolite could be indicating that this detoxification process is not working properly, which would cause liver damage. In other words, the infection itself could be affecting the liver's detoxifying capacity, which would explain the liver damage observed in COVID-19 patients (Bruzzone et al., 2020a). These results should be confirmed by further investigations focused on hepatic metabolism. On the other hand, hippuric acid also plays a role in the kidney, preventing the formation of stones (Atanassova & Gutzow, 2013), so its decrease in urine could be associated with a decrease in citrate.

Another possibility would be that COVID-19 patients have severe malnutrition that activates gluconeogenesis and ketogenesis. This could produce an increase in ketone bodies in blood and urine. This increase in blood ketone bodies has been observed in COVID-19 patients (Bruzzone et al., 2020b), although it has not been associated with malnutrition. On the other hand, malnutrition cannot explain the increase in urine glucose in COVID-19 patients. Therefore, the most plausible hypothesis would be insulin resistance caused by SARS-CoV-2.

Although samples from COVID-19 patients were collected at different times, when the predominant variant was Delta in one case and Omicron in the other, this does not seem to cause differences in the results of the multivariate analysis, as they are always classified in the COVID-19 patient group (Figs. 1, 2, 3). A lipidomics study has also shown that there are no differences in lipid profiles in patients infected with different virus variants (Farley et al., 2022).

5 Conclusion

Our work points to urine as a relevant element in the rapid diagnosis of COVID-19, since we have observed that there are important changes in its composition in severe COVID-19. Furthermore, these changes in the presence and/or absence of certain metabolites are indicating some of the metabolic changes caused by the infection. These changes would affect various organs, such as the muscle, the liver, and the kidney, in addition to affecting, with great probability, the intestinal microbiota.

According to our results, the possibility of using the absence of citrate or hippuric acid as a potential detection method for COVID-19 seems promising to identify which patients will develop severe disease. Although in our research we have observed that the model distinguishes between COVID-19 patients and other chronic viral disease patients, it would be interesting to study whether it is also possible to differentiate COVID-19 patients from other patients with acute viral diseases.

However, before these results can be used as a diagnostic test for COVID-19, it would be necessary to expand the number of samples analyzed. Our aim is to try to obtain more samples and confirm our results. On the other hand, the metabolic alterations that COVID-19 patients present, and which are reflected in the metabolites present in urine, help to better understand the disease.