Introduction

Analysis of an array of hormones or metabolites may be useful for the clinician faced with a patient with a multitude of nonspecific symptoms that lead to a large differential of diagnoses, for screening of multiple diseases at once, for a patient on hormonal replacement therapy [1, 2], or for any patient where a complete picture of production and metabolism of a hormonal pathway is required [3]. Measurement of steroid hormones in urine can be an essential component to the diagnosis of hormone-related disorders [4, 5]. Some patients may require sampling over multiple days to determine monthly variations or effects of change in treatment. For hormones with known circadian or pulsatile fluctuations, a representation of the entire day is essential [6]; however, collection and storage of a 24-h urine can be cumbersome for patients [7, 8].

Mass spectrometry technology, both liquid chromatography tandem mass spectrometry (LC–MS/MS) for water-soluble compounds and gas chromatography tandem mass spectrometry (GC–MS/MS) for non-polar compounds, is now routinely used to measure arrays of steroid hormones and organic acids because of its high assay sensitivity, accuracy with small volumes, and ability to evaluate multiple analytes at the same time [9, 10]. This methodology allows for a complete profile of urine reproductive hormonal metabolites and multiple organic acids with high resolution of closely related structures [10].

Collection of urine on filter paper, which can then be dried and stored at room temperature until received by the laboratory [11, 12, 13], offers a significant advance in patient convenience and may improve patient adherence. If multiple samples are collected throughout the day, there is potential to capture both the diurnal variation of hormones and the full range of daily hormonal production [11, 14]. Our prior proof-of-concept papers demonstrated that, in our laboratory, dried urine yields measures equivalent to those from liquid urine for cortisol, cortisone, and the cortisol metabolites α-tetrahydrocortisol, β-tetrahydrocortisol, and tetrahydrocortisone [14], as well as for estrone, estradiol, α-pregnanediol, and β-pregnanediol [11]. Others have found similar results between liquid and dried urine for organic acids [12, 15, 16]. The urinary analytes in this study included female and male reproductive hormones, 6-hydroxymelatoninsulfate, and a number of organic acids, which have a multitude of uses (Additional file 1: Table S1). Inclusion of multiple metabolites from the estrogen and androgen pathways allows for a complete picture of estrogen and testosterone production and metabolism [17, 18].

The primary goal of this study was to confirm that measurement of the urinary profile of reproductive hormones, 6-hydroxymelatoninsulfate, and an array of organic acids extracted from dried urine collected on filter paper as analyzed by tandem mass spectrometry would provide results in agreement with measurements from liquid urine. The secondary aim was to demonstrate that measurement of reproductive hormones in a collection of four dried urine samples over a 15-h span throughout the day would accurately reflect the measurements of these hormones in a 24-h urine collection.

Methods

Study populations

A prospective observational study was conducted using urine collected from healthy adult volunteers who agreed to participate in validation of urine analyses. This first study population included 26 individuals who provided data on hormonal measures (cortisol and cortisol metabolites, reproductive hormones, and 6-hydroxymelatoninsulfate) to compare samples from the 4-spot urine collection method to a 24-h urine collection. A subset of these individuals (n = 18) had data available to compare measures from dried versus liquid urine. As cortisol and cortisol metabolites [14] and α- and β-pregnanediol [11] were validated in previous analyses, only the following hormones were included in this analysis: estrone (E1), estradiol (E2), estriol (E3), 2-hydroxyestrone (2OHE1), 2-hydroxyestradiol (2OHE2), 4-hydroxyestrone (4OHE1), 16-hydroxyestrone (16OHE1), 2-methoxyestrone (2-methoxyE1), testosterone (T), epitestosterone (EpiT), 5α-dihydrotestosterone (DHT), androsterone, etiocholanolone, 5α-androstanediol, 5β-androstanediol, dehydroepiandrosterone (DHEA), and 6-hydroxymelatoninsulfate. The data were collected between February and November of 2015, and informed consent was obtained from all participants.

The second analysis included 20 individuals whose deidentified data were drawn from the larger databank of 144,561 laboratory visits. Each of these samples included a single first-morning urine collection to compare results of dried versus liquid urine for the following organic acids: homovanillic acid (HVA), vanillylmandelic acid (VMA), kynurenic acid, xanthurenic acid, methylmalonic acid (MMA), pyroglutamic acid, 5-hydroxyindoleacetic acid (5-HIAA), and β-hydroxyisovaleric acid (Hiv).

All volunteers in both study populations reported no medical problems and were not pregnant. Individuals were not excluded based on current or recent use of any hormonal medications, as the goal was only to compare measurement values for differing methodologies. Eighty percent of women in the first study population and all women in the second study population were premenopausal.

Sample collection

The 4-spot method involves urine samples collected at home at four times during the day: (1) the first urine of the day, (2) 2 h after awakening, (3) in the afternoon (approximately 4 PM), and (4) before bed (approximately 10 PM). Participants collected samples by completely saturating 2 × 3 inches of filter paper (Whatman Body Fluid Collection Paper; Sigma-Aldrich, St. Louis, MO, USA) with urine. The paper was left exposed at room temperature for 24 h to dry. The stability of analytes in dried urine at room temperature for as long as 84 days has previously been demonstrated by this laboratory [11]. Dried samples were stored at −80 °C until analyzed. Reproductive hormones were assessed in all four samples collected, while only the first morning sample was used for 6-hydroxymelatoninsulfate and the organic acid tests.

During the same day, all liquid urine samples for the 24-h collection were added to a low-density polyethylene container (ES Robbins, Muscle Shoals, AL, USA) with approximately 1 g of boric acid (Sigma-Aldrich, St. Louis, MO, USA) and kept refrigerated for the duration of the collection. The four dried urine samples removed a total of about 8 mL of urine from the 24-h collection. This was considered negligible and was not accounted for. The total volume of urine from 24-h collections was measured, and an aliquot was frozen and stored at −80 °C until tested.

Urine reproductive hormone analysis

The urinary steroid hormones were analyzed using proprietary in-house CLIA (Clinical Laboratory Improvement Amendments) approved assays on the Agilent 7890/7000B GC–MS/MS (Agilent Technologies, Santa Clara, CA, USA). All reagents, unless otherwise noted, were purchased from Sigma-Aldrich (St. Louis, MO, USA). A 600 µL aliquot of liquid urine was taken from the sample collection, and the equivalent of approximately 600 µL of urine was extracted from the filter paper using 2 mL of 100 mM ammonium acetate adjusted to a pH of 5.9. These aliquots of the conjugated hormones were transferred to a C18 solid phase extraction (SPE) column (UCT LLC, Bristol, PA, USA), eluted using methanol, and the eluate was dried under nitrogen at 40 °C.

The conjugated hormones were then hydrolyzed from their glucuronide and sulfate forms to free forms using enzymes from Helix pomatia (Sigma-Aldrich, St. Louis, MO, USA) in acetate buffer (55 °C, 90 min). The enzymatic reaction was quenched with sodium hydroxide and the hormones extracted with ethyl acetate. The ethyl acetate extracts were dried under nitrogen at 40 °C. The analytes were derivatized using a mixture of 100 µL acetonitrile (ACN) and 50 µL bis(trimethylsilyl)trifluoroacetamide (Sigma-Aldrich, St. Louis, MO, USA) for 30 min at 70 °C. Internal standards (Steraloids, Newport, RI, USA) were added prior to ethyl acetate extraction, and the percentage recovery from all assays was greater than 90%. Derivatized extract (1.6 µL) was injected into the GC–MS/MS. Samples and controls were analyzed along with a standard curve spanning the expected range of concentrations. Instrument conditions for the oven were an initial temperature of 130 °C increasing to 200 °C at 25 °C/min, then to 230 °C at 4.3 °C/min, and finally to 290 °C at 25 °C/min. Multiple reaction monitoring transitions for ion mass > ion product of fragmentation can be found in Table 1. Creatinine was measured using a conventional colorimetric (Jaffe) method, after initial extraction from the filter paper. The average inter-assay coefficient of variation was 6.7% for creatinine. In addition to expressing the measures per mg of creatinine to correct for variations in filter paper saturation and hydration status, a secondary equation was applied to reduce bias related to the effects of age, sex, weight, and height on creatinine excretion [19].
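As a rough check on the oven program described above, the time spent in each heating ramp follows directly from the stated temperatures and rates. The sketch below is illustrative only: the function names are hypothetical, and because the text gives no isothermal hold times, none are assumed, so the result is the ramp time rather than the full run time.

```python
# Ramp segments as stated in the text: (start °C, end °C, rate °C/min).
# Hold times are not reported in the source, so none are included here.
RAMPS = [
    (130.0, 200.0, 25.0),
    (200.0, 230.0, 4.3),
    (230.0, 290.0, 25.0),
]


def ramp_minutes(start_c, end_c, rate_c_per_min):
    """Minutes needed to heat from start_c to end_c at a constant rate."""
    if rate_c_per_min <= 0:
        raise ValueError("ramp rate must be positive")
    return (end_c - start_c) / rate_c_per_min


def total_ramp_minutes(ramps):
    """Sum of all ramp times; excludes any isothermal holds."""
    return sum(ramp_minutes(start, end, rate) for start, end, rate in ramps)


for start, end, rate in RAMPS:
    print(f"{start:.0f} -> {end:.0f} °C: {ramp_minutes(start, end, rate):.2f} min")
print(f"total ramp time: {total_ramp_minutes(RAMPS):.2f} min")
```

Under these assumptions the three ramps take about 2.8, 7.0, and 2.4 min, so the slow middle ramp accounts for most of the roughly 12 min of heating.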

Table 1 Multiple reaction monitoring transitions for ion mass > ion product of fragmentation for urinary reproductive hormones

Urine 6-hydroxymelatoninsulfate and organic acid analysis

The hydrophilic analytes were assessed by LC–MS/MS using proprietary in-house CLIA approved assays. For the 6-hydroxymelatoninsulfate assay, a 30 µL aliquot was taken from the methanol elution of both the liquid urine collection and the waking sample dried urine collected on filter paper. This extract was then reconstituted in 130 µL of deionized water. For the organic acids, a 100 µL aliquot of liquid urine was taken and an equivalent amount was extracted from the waking sample dried urine filter paper using 250 µL of water with the addition of 50 µL 100 mM ammonium acetate (Sigma-Aldrich, St. Louis, MO, USA) adjusted to a pH of 5.9 and 2% formic acid.

For 6-hydroxymelatoninsulfate, 20 µL was injected into an ultra-performance liquid chromatography (UPLC) (Waters Corporation, Milford, MA, USA) column with a Waters™ tandem quadrupole mass spectrometer detector (TQD). The sample was eluted from a 1.8 µm, 2.1 × 50 mm pentafluorophenyl (PFP) column (Agilent Technologies, Santa Clara, CA, USA) using a gradient of 95% 0.001% formic acid in 5% ACN to 45% 0.001% formic acid in 55% ACN. For the organic acids, 5 µL was injected into the Waters™ UPLC column with TQD. These analytes were eluted from a 1.6 µm, 2.1 × 50 mm Luna Omega PS C18 column (Phenomenex, Torrance, CA, USA) using a gradient of 99.9% 0.2% formic acid in 0.1% ACN to 73% 0.2% formic acid in 27% ACN. Multiple reaction monitoring transitions for ion mass > ion product of fragmentation for urine 6-hydroxymelatoninsulfate and the organic acids are listed in Table 2. The same creatinine corrections of the measures used for the reproductive hormones were also used for 6-hydroxymelatoninsulfate and the organic acids.
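The first step of the creatinine correction, expressing each analyte per mg of creatinine, can be sketched as a simple ratio. The function name and all numeric values below are hypothetical and chosen only for illustration; the secondary adjustment for age, sex, weight, and height [19] is not reproduced here because its equation is not given in the text.

```python
def per_mg_creatinine(analyte_ng_per_ml, creatinine_mg_per_ml):
    """Express an analyte concentration as ng per mg creatinine (ng/mg-Cr)."""
    if creatinine_mg_per_ml <= 0:
        raise ValueError("creatinine concentration must be positive")
    return analyte_ng_per_ml / creatinine_mg_per_ml


# Hypothetical example: the same urine extracted from a fully saturated
# filter paper strip and from a half-saturated strip. Both the analyte and
# creatinine scale with the amount of urine recovered, so the ratio is stable.
full_strip = per_mg_creatinine(analyte_ng_per_ml=12.0, creatinine_mg_per_ml=1.0)
half_strip = per_mg_creatinine(analyte_ng_per_ml=6.0, creatinine_mg_per_ml=0.5)
print(full_strip, half_strip)
```

This illustrates why the ratio is insensitive to how much urine the paper absorbed, which is the motivation the text gives for reporting values per mg of creatinine.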

Table 2 Multiple reaction monitoring transitions for ion mass > ion product of fragmentation for urine 6-hydroxymelatoninsulfate and the organic acids

Statistical methods

A sample size of 18 individuals provides a power of greater than 80% to detect an intraclass correlation coefficient (ICC) of at least 0.6 with an alpha of 0.05 [20]. The statistical analyses were performed using SAS/STAT® software, Version 9.3 (SAS Institute Inc., Cary, NC, USA) and generated 2-sided p-values.

Variables are described as mean ± standard deviation if normally distributed and as median (interquartile range (IQR)) if the distribution was skewed. Student’s t test (for normally distributed variables) or the Wilcoxon rank-sum test (for skewed variables) was used to determine differences between men and women. Spearman correlation coefficients (ρ) were used to determine interclass associations between variables.
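For readers unfamiliar with the Spearman coefficient, it is the Pearson correlation computed on ranks rather than on raw values. The following is a minimal standard-library sketch, not the SAS procedure used in the study; the function names are hypothetical, and ties are handled by assigning average ranks.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))
```

Because only the ordering matters, a measure that rises monotonically with another gives ρ = 1 even when the relationship is not linear, which is why the coefficient suits skewed hormone data.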

Consistency between liquid and dried urine measures, and between the 4-spot and 24-h collection methodologies, was assessed using intraclass correlation coefficients (ICCs). ICCs, which range from 0 to 1 with proximity to 1 indicating better agreement, assess agreement of a measure between two differing methodologies within individuals [21]. Skewed variables were log transformed to approximate a normal distribution prior to assessing ICCs. As 4-spot (ng/mg-Cr) and 24-h (µg/d) measures were expressed in differing units, sex-specific Z-scores ([individual measurement − mean]/standard deviation) were created to standardize the measures and allow for direct comparison. Comparisons of differences between measures within an individual were assessed using signed-rank tests (for skewed variables) or paired t-tests (for normally distributed variables). Because the hypotheses of this paper were intrinsically correlated, no adjustments were made for multiple comparisons.
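The standardization and agreement steps described above can be sketched as follows. This is an illustration under stated assumptions, not the study's SAS code: the text does not say which ICC form was used, so a one-way random-effects ICC for two measurements per subject is assumed here, and the function names and example values are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """(value - mean) / sample standard deviation within one group.

    For sex-specific Z-scores, call this separately for men and for women.
    """
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def icc_oneway(method_a, method_b):
    """One-way random-effects ICC for paired measurements (assumed form).

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW) with k = 2 methods.
    Sample estimates can fall slightly below 0 when agreement is poor.
    """
    n, k = len(method_a), 2
    if n != len(method_b) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    pairs = list(zip(method_a, method_b))
    grand_mean = mean([v for pair in pairs for v in pair])
    subject_means = [mean(pair) for pair in pairs]
    # Between-subject and within-subject mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for pair, m in zip(pairs, subject_means) for v in pair
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical values for one analyte in one sex, in differing units.
four_spot_ng_per_mg_cr = [10.0, 14.0, 18.0, 25.0, 31.0]
urine_24h_ug_per_day = [22.0, 30.0, 41.0, 52.0, 70.0]
icc = icc_oneway(z_scores(four_spot_ng_per_mg_cr), z_scores(urine_24h_ug_per_day))
print(f"ICC on Z-scores: {icc:.2f}")
```

Converting both series to Z-scores first removes the unit difference, so the ICC reflects whether individuals hold the same relative position under each collection method.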

Results

Study populations

Characteristics of the first study population (n = 26) are shown in Table 3. All of these individuals (58% female; 100% Caucasian) had data available for comparison of 4-spot versus 24-h urine samples for male and female reproductive hormones. A subset (10 female, 8 male) also had measurements to compare liquid versus dried urine samples for both the reproductive hormones and 6-hydroxymelatoninsulfate. Characteristics of the second study population (n = 20; 75% female; 80% Caucasian/10% Hispanic/10% Asian-Pacific Islander), data from whom were used to compare a single first morning collection of liquid versus dried urine samples for the organic acid tests, are provided in Table 4.

Table 3 Age and 24-h measures of reproductive hormones and 6-hydroxymelatoninsulfate of the first study population
Table 4 Age and measures of organic acids from an early morning spot urine collection of the second study population

Agreement between liquid and dried urine measures

For the majority of analytes, there was excellent agreement (ICC ≥ 0.90) between the liquid and dried measures (Table 5). The exceptions were VMA (ICC = 0.79) and pyroglutamic acid (ICC = 0.75), which still had good agreement. Similarly, for the majority of analytes, there was no systematic directionality to the difference in the dried urine compared to the liquid urine. However, estrone and estriol were consistently higher when measured in liquid urine, while some of the organic acids (VMA, kynurenic acid, pyroglutamic acid, and β-hydroxyisovaleric acid) were more concentrated in the dried urine sample (Table 5). Representative interclass correlations (Spearman) between the two methods are shown in Fig. 1 and Fig. 2 (the rest are available in Additional file 1: Figures S1 and S2).

Table 5 Comparison of liquid versus dried urine analytes (n = 18)
Fig. 1

Correlations between the liquid versus dried measurements for select urine steroid hormones. The remainder are available in Additional file 1: Figure S1. Reported correlation coefficients are Spearman correlations. Cr: creatinine; DHEA: dehydroepiandrosterone

Fig. 2

Correlations between the liquid versus dried measurements for select urine organic acids. The remainder are available in Additional file 1: Figure S2. Reported correlation coefficients are Spearman correlations. Cr: creatinine; Hiv: β-hydroxyisovaleric acid

Agreement between the DUTCH 4-spot and 24-h urine collection for hormonal measures

Measurements of reproductive hormones in urine samples collected four times throughout the day were comparable to the gold standard of a 24-h urine collection (Table 6). As the measures from the 4-spot urine collection are reported in ng/mg-Cr and the 24-h urine measures are reported in µg/d, sex-specific Z-scores were created for direct comparison of the two methodologies. There was excellent consistency (ICC > 0.9) for the majority of the analytes and good consistency for the remainder (estriol, 5α-androstanediol, and 5β-androstanediol) (Table 6). There was no systematic directionality to the differences between any of the Z-scores (Table 6). Representative interclass (Spearman) correlations between the analytes are shown in Fig. 3 (the remainder are shown in Additional file 1: Figure S3).

Table 6 Comparison of the urinary hormone profile using the 4-spot (DUTCH) or 24-h urine collection method (n = 26)
Fig. 3

Correlations between the 24-h urine collection and 4-spot (DUTCH) urine collection measurements. The remainder are available in Additional file 1: Figure S3. Reported correlation coefficients are Spearman correlations. Cr: Creatinine, DHEA: dehydroepiandrosterone, DUTCH: Dried Urine Testing for Comprehensive Hormones

A sensitivity analysis to verify the need for creatinine correction was done by calculating ICCs for the agreement between sex-specific Z-scores from the 24-h measures and the 4-spot assay without correction for creatinine. Without the creatinine correction, the ICCs were all lower than those observed with the correction, with the exception of estriol, whose ICC was slightly higher (by 0.04). The difference in ICC (corrected minus uncorrected) ranged from −0.04 to 0.15 and averaged 0.07. An example of the interclass correlations (Spearman) between the 24-h measures and the 4-spot assay with and without the creatinine corrections is shown in Additional file 1: Figure S4.

Discussion

This study demonstrated the feasibility of accurately measuring multiple (up to 32) analytes in dried urine samples collected on filter paper using assays that conform to CLIA criteria. All measurements from dried urine demonstrated at least good agreement with measures from liquid urine, and the majority (83%) demonstrated excellent agreement, with intraclass correlations greater than 0.9. For most analytes, neither loss nor excess concentration occurred during the sample drying or laboratory extraction process. In addition, the reproductive steroid hormones, which are usually evaluated from a 24-h collection due to their pulsatile release [22], were well represented by the 4-spot dried urine collection: agreement of the 4-spot measurement with the 24-h gold standard measure was at least good for all steroid metabolites and excellent (ICC > 0.9) for the majority (82%). There were no systematic differences between the relative amount of hormone collected by either methodology. The 4-spot dried urine (DUTCH) methodology allows for efficient, accurate assessment of numerous urine metabolites using a convenient collection method, while avoiding the need for a full 24-h liquid urine collection.

The 4-spot dried urine collection has previously been shown by our group to be representative of select 24-h measures of steroidal hormones [11]: measurements of urinary α-pregnanediol, β-pregnanediol, estrone, and estradiol with this method are representative of both 24-h urine collections and serum hormone concentrations. In fact, repeated assessments over a month demonstrated that the dried urine collections could accurately recreate the changes observed in serum during the menstrual cycle [11]. In addition, not only does the 4-spot urine method accurately represent a 24-h urine collection for urinary cortisol, cortisone, and cortisol metabolites, but it can also be used to represent the expected diurnal pattern observed with salivary measures, if each of the four collections is considered individually [14].

Previous studies have validated the usefulness of high throughput GC–MS/MS for urine steroid profiling of more than 30 metabolites; however, this was done in liquid urine [5]. Conversely, measurement of organic acids in spot dried urine samples using filter paper has already been well validated as a technique for screening neonates for metabolic disorders [15, 23, 24] and for screening for neuroblastoma [25], with similar recoveries obtained from liquid and dried urine. Dried urine has also been used to measure other urine analytes of interest, such as sodium and potassium, with similarly high levels of stability [26]. As with others who used either filter paper or cotton swabs [27], we found that dried urine results agree with liquid urine results, reduce the burden on patients, and have good stability over time [11]. Still, agreement between 24-h urine collections and early-morning, single spot urine collections for hormonal analysis is often poor [28]. This study now extends our prior findings [11, 14] to show that the increase to four spot urines spaced throughout the waking hours provides better coverage of the hormonal output for all the male and female reproductive hormones and metabolites, resulting in strong agreement with 24-h urine measures.

There are some caveats that must be considered when interpreting our results. During the urine collection, there may have been differences in saturation of the filter paper. Expressing the analyte concentration per mg of creatinine is designed to address this, while also correcting for hydration. The method used for creatinine adjustment also accounts for differences in creatinine excretion related to body size. This does rely on accurate self-reporting of age, height, and weight by the participants, so an estimate of expected creatinine excretion can be made. This may lead to the introduction of inaccuracies due to misreporting of individual characteristics; however, our statistical sensitivity analyses indicated that the DUTCH measurements were in better agreement with the 24-h urine results with the application of the creatinine corrections (see Additional file 1: Figure S4), with the ICC for some metabolites increasing by as much as 0.15 and raising the level of agreement with 24-h measures from good to excellent.

One of the limitations of this 4-spot dried urine method is that reference ranges are laboratory-specific and non-standardized. Still, interpretation of values above and below these reference ranges should be similar to that of other assays. In this study, there was some loss of hormone for estrone and estriol with the filter paper methodology, which may be related to differences in extraction efficiency between the steroid conjugates and creatinine, loss during the drying process, or incomplete saturation of the filter paper. Still, the difference was less than 12% of the total and would be compensated for by an adjustment of the reference range. In addition, we have previously shown that the dried urine measure of estrone has clinical utility because it is representative of serum estrogen measurements [11]. A number of organic acids plus DHT were more concentrated in the dried urine samples, on average. This may be due to differences in extraction efficiency, a matrix effect, or analyte concentration during the drying process; however, this difference represented less than 10% of the sample. Fortunately, due to the high level of agreement between the Z-scores from the DUTCH methodology and 24-h collections, laboratory reference ranges should account for these differences. Another issue is that there are known genetic differences in glucuronidation of testosterone that may impact relative metabolism and urinary concentrations of testosterone and epitestosterone [29], and this may mean urinary androgen measures are not fully representative of production rates in a small percentage of individuals. There are also genetic differences in the enzymes that metabolize estrogens, potentially shifting the ratio of 2-hydroxylation to 16-hydroxylation metabolites [30], but these differences may be clinically relevant and indicative of cancer risk [31].

Our study population had limited diversity and was primarily Caucasian, so these results will need to be replicated in more diverse study populations, as we could not make inferences regarding any potential differences in results that may have been caused by differences in race.

The methodology of a 4-spot urine collection on filter paper followed by GC–MS/MS or LC–MS/MS offers some advantages. Steroid hormones in urine dried on filter paper have been shown, both by us [11] and by others [12], to remain stable for extended periods of time, up to one year [13], even at ambient temperature. Concentrations of organic acids are also stable on dried filter paper for weeks [32, 33]. Mass spectrometry assays, which are now the gold standard for measurement of steroid hormones in blood and urine [10], allow for the use of small sample volumes with excellent sensitivity and accuracy along with simultaneous measurement of a relatively large number of analytes. In combination with chromatography, either gas or liquid, mass spectrometry provides precise separation of closely related molecules [34] by their chemical and physical properties. GC–MS/MS does not exclude any lipophilic steroids, and so a run will contain all excreted steroids [9]. The use of the H. pomatia enzymes adds to the accuracy of the quantification of the hormone conjugates, as these enzymes include both a sulfatase and a glucuronidase. GC–MS/MS does require an additional extraction and derivatization step, but workflows can be optimized to maximize throughput. A 24-h urine collection may be difficult for some patients to complete, especially if they are not able to remain at home for an entire day, are disabled, or are incontinent. This methodology removes that barrier and provides the ability to measure multiple hormones at once with a noninvasive collection method, obtaining a complete picture of both production and clearance of the major steroidal hormones.

A multitude of uses, both in research and in clinical scenarios, could be envisioned for assays that are able to measure multiple steroid hormones and organic acids in conveniently collected urine samples on filter paper. For example, the full range of hormonal changes in individuals related to disruption of the natural sleep cycle could be evaluated simultaneously. It is already known that the peak 6-sulfatoxymelatonin, as representative of melatonin, is lower in people working the night shift [35]; a full appreciation of the urinary steroid profile in individuals who work at night could add to this prior research. Similarly, urine profiling may help to fully define the changes expected in genetic syndromes of steroidogenesis [5, 36] and errors of metabolism [16]. Dried urine samples may be of particular benefit in screening neonates for organic acid disorders [15, 16], and there is recent interest in a possible association of organic acids with neuropsychiatric disorders [37, 38, 39]. The ability to look at a full urine profile can provide a more integrated view of the patient; for example, patients using oral contraceptives often have higher xanthurenic acid with concurrent pyridoxine deficiency [40], both of which would be observable using dried urine analysis. A greater understanding of the full effect of changes in hormonal concentrations and metabolites or important clinical subgroups could be determined for both exogenous use of hormones and for exposure to endocrine disrupting compounds like bisphenol A [41, 42]. Urine hormone profiling might also be used to fully describe age-related changes, for example through puberty or menopause.

Conclusions

Mass spectrometry allows for the assessment of a full hormone profile in a small volume of urine such that an expanded view of both hormone production and clearance can be observed. In addition, results from dried urine are in strong agreement with those obtained from liquid urine. In combination with four spot urine collections on filter paper collected throughout the waking hours, we have shown that it is possible to accurately represent a 24-h urine collection. This technology may be useful to the clinician wishing to perform a large series of tests on patients to narrow the differential diagnosis, for those monitoring hormonal therapy or evaluating the menstrual cycle, or for those who need to reduce the burden of collection for their patients. This four-spot, dried urine method allows for assessment of both diurnal patterns [14] as well as total daily production, allowing for a comprehensive evaluation of adrenal and reproductive hormones and other urine metabolites.