Background

Owing to their pharmacological activity and curative effects, traditional Chinese medicines (TCMs) are increasingly popular not only in China but also around the world. Ensuring the effective and safe use of TCMs is therefore an important issue. Given the complex composition of TCMs, holistic quality control methods involving fingerprint technology and multi-component determination are crucial [1,2,3,4]. However, these technologies cannot be implemented without reference standards (RSs), which places great pressure on both providers and users. First, the high price of RSs significantly increases the cost of TCM analysis. Second, some TCM compounds are difficult to extract, isolate, and purify, while others are unstable or toxic, all of which complicates the supply of RSs. Furthermore, because these compounds occur at low levels in TCMs, preparing RSs requires large quantities of TCMs and organic solvents, which is not eco-friendly.

The substitute RS method has been developed as a feasible solution to the problems discussed above. In this method, one or a few physical RSs, together with several constant eigenvalues and algorithms, are used for the qualitative or quantitative determination of one or more other compounds [5,6,7,8]. Qualitative substitute RS methods include the relative retention time (RRT) technique [9,10,11,12], the extractive reference substance (ERS) method [8,9,10,11], the linear calibration using two reference substances (LCTRS) approach [13,14,15], and the photodiode array (PDA) spectrum method [16,17,18]. Quantitative methods include the relative correction factor method [9,10,11,12] and the quantitative ERS technique [9,10,11]. These methods not only promote the application of multi-component determination and fingerprint analysis for the quality control of medicines but have also proven to be more economical and simpler [13,14,15,16,17,18,19,20,21,22,23,24,25]. However, the substitute RS method still has some problems when used in the holistic quality control of medicines. In particular, the qualitative analysis of chromatographic peaks is the most critical and challenging issue. For this purpose, the RRT and ERS methods have been adopted by several pharmacopoeias, such as the Chinese Pharmacopoeia and the European Pharmacopoeia. Yet the RRT method suffers from large retention time (tR) deviations and poor column durability. In the ERS method, the reference chromatogram is obtained on only one chromatographic column, so the actual chromatogram may differ from the reference chromatogram when a column of a different brand or type is used. Consequently, scholars have studied the selectivity of reversed-phase columns [26], classified columns [27, 28], and proposed column selection systems [29, 30] to solve the problem of blind column selection. Nonetheless, the large prediction deviation of the RRT method has not yet been fundamentally solved.

Compared with the RRT method, the LCTRS method can reduce the deviation of tR prediction [13,14,15]. However, improving the prediction accuracy of tR remains a challenge, especially for compounds of different types or for columns with large differences in retention behavior, which may even reverse the elution order of peaks [18]. The PDA method may, to some extent, solve the problems of large deviation and reversed peak order. However, it is difficult to share data effectively or evaluate data objectively across laboratories, owing to the lack of a uniform PDA data exchange format among different brands of chromatography workstations [16, 17].

To solve these problems, we introduced the concept of the digital reference standard (DRS) in our previous study [31]. In the present study, a strategy for the holistic quality control of TCMs based on the DRS analyzer is proposed, using a phenolic acid extract of Salvia miltiorrhiza as an example. The DRS analyzer is algorithm software developed to support the chromatographic algorithms of RRT and LCTRS, a similarity algorithm for PDA spectra, and combinations of these algorithms. It is also a multi-dimensional database that stores all the original HPLC chromatogram and PDA spectrum data acquired during method establishment. These data are useful not only for calculation by the software but also for searching and comparison of chromatographic data by users; they ultimately enable column recommendation and improve the reproducibility and accuracy of the holistic quality control method. The phenolic acid extract of S. miltiorrhiza is an extract of Salviae Miltiorrhizae Radix (Danshen in Chinese), a popular TCM. Salviae Miltiorrhizae Radix is also used as a dietary supplement in other Asian countries, as well as in Europe and America. The design, algorithms, application, and characteristics of the DRS analyzer are discussed in this study. In addition, a series of fingerprint quality control methods involving 11 compounds of the phenolic acid extract of S. miltiorrhiza was developed based on the DRS method.

Methods

Chemicals and reagents

The phenolic acid extract of S. miltiorrhiza was obtained from the National Institutes for Food and Drug Control (NIFDC, Beijing, China). RSs of Sodium Danshensu, Salvianolic acid D, and Lithospermic acid were purchased from Shanghai Yuanye Bio-Technology (Shanghai, China). Reference standards of Protocatechuic aldehyde, Caffeic acid, Rosmarinic Acid, Salvianolic acid B, Salvianolic acid H/I, Salvianolic acid E, Salvianolic acid L, and Salvianolic acid Y were obtained from NIFDC (Beijing, China).

Ethanol, which was analytical grade, was purchased from Sinopharm Chemical Reagent (Shanghai, China). Acetonitrile, methanol, phosphoric acid, and formic acid, which were chromatographic grade, were purchased from Fisher Scientific (Pittsburgh, PA, USA). Deionized water was prepared by a Milli-Q system (Millipore, Bedford, USA).

Instruments and chromatographic conditions

Chromatographic analysis was performed on an Agilent 1260 high-performance liquid chromatograph equipped with a diode array detector (DAD) and the ChemStation workstation for online control and offline analysis (Agilent, Santa Clara, CA, USA). Twenty-two columns (Table 1) from seven manufacturers were randomly selected. For DRS method research, at least ten columns from three manufacturers are recommended.

Table 1 Information of columns

Mobile phase A was 0.1% formic acid in water, and mobile phase B was 0.1% formic acid in acetonitrile. The gradient elution program was as follows: 20–21.5% B for 0–30 min, 21.5–25% B for 30–35 min, 25–40% B for 35–45 min, 40–95% B for 45–50 min, 95–90% B for 50–53 min, and 90–25% B for 53–60 min. The detection wavelength was 288 nm, and UV-Vis absorption spectra (190–600 nm) were collected. The column temperature was 30 °C, the flow rate was 1 ml/min, and the injection volume was 10 µl.
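As an illustration only, the gradient program above can be written as a list of breakpoints and evaluated by linear interpolation. The sketch below assumes that each segment is a linear ramp between the stated compositions; the names `GRADIENT` and `percent_b` are hypothetical and are not part of the DRS analyzer.

```python
from bisect import bisect_right

# (time in min, %B) breakpoints transcribed from the elution program above.
# Assumption: %B changes linearly within each segment.
GRADIENT = [(0, 20.0), (30, 21.5), (35, 25.0), (45, 40.0),
            (50, 95.0), (53, 90.0), (60, 25.0)]


def percent_b(t_min: float) -> float:
    """Return the programmed %B at time t_min (0-60 min)."""
    times = [t for t, _ in GRADIENT]
    if not times[0] <= t_min <= times[-1]:
        raise ValueError("time outside the 0-60 min program")
    # Index of the segment that contains t_min (last segment for t_min == 60).
    i = min(bisect_right(times, t_min) - 1, len(GRADIENT) - 2)
    (t0, b0), (t1, b1) = GRADIENT[i], GRADIENT[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0, 15, 40, 60):
        print(f"{t:>2} min: {percent_b(t):.2f}% B")
```

Encoding the program as data rather than prose makes it easy to check that consecutive segments join at the same composition.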

Preparation of sample and reference standard solution

The solvent used to dissolve and store the sample was a 25% ethanol-water solution adjusted to pH 2.0 with formic acid. The phenolic acids were relatively stable under this condition.

An appropriate amount (more than 16 mg) of the phenolic acid extract of S. miltiorrhiza and 10 ml of the solvent described above were placed in a conical flask, shaken, and filtered through a 0.22 µm membrane before use.

Appropriate amounts of the 11 RSs, namely sodium Danshensu, protocatechuic aldehyde, caffeic acid, salvianolic acid D, salvianolic acid E, salvianolic acid H/I, rosmarinic acid, lithospermic acid, salvianolic acid B, salvianolic acid L, and salvianolic acid Y, were dissolved in the solvent described above to obtain the reference standard solution.

Software development

Data format

The DRS analyzer supports the NetCDF (ANDI) data format [32], which is used for exchanging and reading chromatography and spectrometry data. Spectral data from the PDA detector use an extended ANDI format [18]. HPLC instrument vendors such as Agilent and Waters support PDA spectrum exchange in the extended ANDI format in their chromatography workstations through macros or software upgrades.

Program design

The DRS analyzer was developed in C++ using the Model-View-Controller (MVC) framework. It supports chromatographic algorithms, a PDA spectrum algorithm, and combinations of these algorithms. The chromatographic algorithms include the RRT method, which uses one RS, and the LCTRS method, which uses two RSs. RRT is the ratio of the tR of the analyte to that of the reference compound and serves as the reference value for calculating the tR of an analyte. Like RRT, StR is a reference value; however, it is not a ratio but the arithmetic average of the tR of the same compound on different HPLC systems under the same chromatographic conditions [14]. There is a linear relationship between tR and StR for all compounds [14], as shown in Fig. 1. In the LCTRS method, the tR and StR of the two RSs are substituted into a linear equation [formula (1)] to calculate the tR of the analyte [14]. The similarity algorithm for PDA spectra is the cosine method [33].
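The cosine method mentioned above can be illustrated with a minimal sketch. It assumes that both spectra are absorbance vectors sampled on the same wavelength grid; the function name and the absorbance values are hypothetical and do not reproduce the implementation in the DRS analyzer.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Cosine of the angle between two absorbance vectors on the same wavelength grid."""
    if len(spec_a) != len(spec_b):
        raise ValueError("spectra must share the same wavelength grid")
    dot = sum(a * b for a, b in zip(spec_a, spec_b))
    norm_a = math.sqrt(sum(a * a for a in spec_a))
    norm_b = math.sqrt(sum(b * b for b in spec_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a spectrum with zero signal has no defined similarity")
    return dot / (norm_a * norm_b)


# Hypothetical absorbance values, for illustration only.
reference = [0.10, 0.45, 0.80, 0.35, 0.05]
scaled = [2 * a for a in reference]  # same spectral shape, higher concentration

if __name__ == "__main__":
    print(round(cosine_similarity(reference, scaled), 6))
```

Because the cosine depends only on the direction of the vectors, a spectrum and a scaled copy of it give a similarity of 1, so the measure compares spectral shape rather than concentration.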

Fig. 1 Linear relationship between tR (Inertsil ODS-3) and StR. No. 1 to 11 represented Sodium Danshensu, Protocatechuic aldehyde, Caffeic acid, Salvianolic acid D, Salvianolic acid E, Salvianolic acid H/I, Rosmarinic acid, Lithospermic acid, Salvianolic acid B, Salvianolic acid L, and Salvianolic acid Y, respectively

In addition, the software is a multi-dimensional database that stores all the original HPLC chromatogram and PDA spectrum data acquired during method establishment, and column recommendation can be realized based on these data. Column recommendation is based on correlation, which differs from the existing recommendation methods based on causation [14, 27,28,29,30].

$${t}_{R\,\mathrm{col}\,i}=a\times {St}_{R}+b.$$
(1)
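Formula (1) can be illustrated with a short sketch. It assumes that the subscript in formula (1) denotes the tR on column i, and that a and b are obtained by solving formula (1) for the two reference compounds measured on that column. The RRT prediction is included for comparison. All function names and numerical values are hypothetical.

```python
def lctrs_fit(st_r_ref1, tr_ref1, st_r_ref2, tr_ref2):
    """Solve formula (1) for a and b from the two reference standards."""
    if st_r_ref1 == st_r_ref2:
        raise ValueError("the two reference compounds need different StR values")
    a = (tr_ref2 - tr_ref1) / (st_r_ref2 - st_r_ref1)
    b = tr_ref1 - a * st_r_ref1
    return a, b


def lctrs_predict(st_r, a, b):
    """Predicted tR of an analyte on the current column, formula (1)."""
    return a * st_r + b


def rrt_predict(rrt, tr_ref):
    """RRT method: predicted tR = RRT x measured tR of the single reference."""
    return rrt * tr_ref


if __name__ == "__main__":
    # Hypothetical values (min): StR from the database, tR measured on a new column.
    a, b = lctrs_fit(st_r_ref1=5.0, tr_ref1=6.0, st_r_ref2=40.0, tr_ref2=44.5)
    print(f"a = {a:.3f}, b = {b:.3f}")
    print(f"LCTRS prediction for StR = 20.0: {lctrs_predict(20.0, a, b):.2f} min")
    print(f"RRT prediction for RRT = 0.5:    {rrt_predict(0.5, 44.5):.2f} min")
```

The two-point calibration corrects both the slope and the offset between columns, whereas the RRT method corrects only a proportional shift.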

Results

Optimization of HPLC conditions and method validation

The chromatographic conditions were investigated, including the separation achieved with methanol versus acetonitrile, the difference between phosphoric acid and formic acid, and the influence of column temperature. The gradient elution program and flow rate were also optimized. The selected conditions gave good resolution, symmetrical peak shapes, and a reasonable analysis time. Chromatograms of the samples were collected on 22 columns under the optimized conditions. Representative chromatograms and spectra are shown in Figs. 2 and 3. The peaks were identified using the RSs, UV-Vis spectra, and mass spectra.

Fig. 2 Representative HPLC chromatogram of sample on Column 3 (Inertsil ODS-3). No. 1 to 11 represented the same compounds as Fig. 1

Fig. 3 Representative UV-Vis spectra of the sample

Method validation experiments were performed on the Agilent Zorbax SB C18 column. The precision (n = 6), stability (12 h, n = 6), and repeatability (n = 6) were tested. The RSDs of the tR and of the peak areas of the 11 peaks were all less than 3%, thus meeting the requirements of fingerprint analysis.
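For reference, the RSD used as the acceptance criterion can be computed as in the sketch below. It assumes the sample standard deviation (n − 1 denominator); the replicate values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation in percent: 100 x sample SD / mean."""
    return 100 * stdev(values) / mean(values)


# Hypothetical tR values (min) of one peak in six replicate injections.
tr_replicates = [12.01, 12.03, 11.98, 12.02, 12.00, 11.99]

if __name__ == "__main__":
    rsd = rsd_percent(tr_replicates)
    print(f"RSD = {rsd:.2f}%  (criterion: < 3%)")
```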

Initialization for the DRS method

Since columns 1 to 17 could effectively separate the 11 peaks of the sample, the data from these columns were used to initialize the model in the steps shown in Fig. 4. The first step was data import. The chromatographic data and the corresponding spectral data of the samples on columns 1 to 17 were imported into the software, and integration operations such as adding and deleting peaks were performed. The chromatographic data were in ANDI format, with the file name extension “.cdf”. The spectral data were in extended ANDI format, with the file name extension “.nc”. The PDA data were optional. The second step was peak assignment. The names of the 11 compounds were entered into the software, and the corresponding peaks on the 17 columns were then matched one-to-one with the compounds (red box in Fig. 5). The third step was setting the chromatographic qualitative method, taking LCTRS as an example. The tR window of each peak was set to 1 min; a peak could be identified if its tR deviation was ≤ the tR window. In this study, peak 1 and peak 9 were selected as the two reference compounds (green box in Fig. 5); it is recommended to select peaks at or close to the first and last peaks. Because spectral data were available in the present study, the fourth step was to establish a spectral qualitative method. As shown in the blue box in Fig. 5, the synthesized spectrum was selected as the spectral matching method, and the similarity threshold was set to 0.95.
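The settings chosen in the third and fourth steps can be summarized in a small sketch of the identification decision. The sketch assumes that, when spectral data are used, a peak must satisfy both the tR window and the similarity threshold; the software may combine the two criteria differently. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QualitativeMethod:
    tr_window_min: float = 1.0           # tR window set in the third step
    similarity_threshold: float = 0.95   # spectral threshold set in the fourth step
    use_spectrum: bool = True            # PDA data are optional


def is_identified(method: QualitativeMethod, predicted_tr: float,
                  measured_tr: float, similarity: Optional[float] = None) -> bool:
    """Return True if the peak passes the chromatographic (and spectral) criteria."""
    within_window = abs(measured_tr - predicted_tr) <= method.tr_window_min
    if not method.use_spectrum or similarity is None:
        return within_window
    # Assumption: both criteria must hold when a spectrum is available.
    return within_window and similarity >= method.similarity_threshold


if __name__ == "__main__":
    method = QualitativeMethod()
    print(is_identified(method, predicted_tr=20.0, measured_tr=20.6, similarity=0.98))
    print(is_identified(method, predicted_tr=20.0, measured_tr=21.2, similarity=0.99))
```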

Fig. 4 Flow chart of method initialization

Fig. 5 Method initialization in the software. Red box: assign the peaks; green box: set the qualitative method (chromatography); blue box: set the qualitative method (spectrum)

Optimization and evaluation of DRS method

Selection of reference compound

Since the selection of the reference compound can significantly affect the accuracy of tR calculation by the RRT and LCTRS methods, optimization was needed. According to our previous studies [14, 34], the general principles for selecting reference compounds for the RRT and LCTRS methods are as follows: the tR coverage of the reference compounds should be 50–100%, and their non-linear deviation should be small enough. The tR coverage reflects the relative position of the reference compounds between the first and last compounds. The coverage is calculated by formula (2) for the LCTRS method and by formula (3) for the RRT method. Because the overall quality control method involves many marker compounds, a large amount of calculation is still required, even when following the above principles, to obtain the optimal reference compounds for a sample under given chromatographic conditions.

$$\text{Coverage of }{t}_{R}=\frac{{t}_{R2}-{t}_{R1}}{{t}_{Rlast}-{t}_{Rfirst}}.$$
(2)

tR2 is tR (or StR) of second reference compound; tR1 is tR (or StR) of first reference compound; tRlast is tR (or StR) of last compound; tRfirst is tR (or StR) of first compound [14].

$$\text{Coverage of }{t}_{R}=\frac{{t}_{Rreference}-{t}_{Rfirst}}{{t}_{Rlast}-{t}_{Rfirst}}.$$
(3)

tRreference is tR of reference compound; tRlast is tR of the last compound; tRfirst is tR of the first compound [34].

In the present study, the 11 marker compounds gave a total of 55 possible reference compound pairs, of which about 20 had a tR coverage of more than 50%. The software’s method optimization function provided the top 10 reference compound pairs with the highest accuracy, as shown in Table 2. The reference compound pair of peak 1 and peak 9 gave a tR deviation (average deviation of 11 peaks on 17 columns) of 0.304 min and an identification rate of 99.5%, ranking 9th. The best pair was peak 3 and peak 9, with a tR deviation of 0.258 min and an identification rate of 99.5%. The optimal combination thus reduced the deviation by 0.046 min.
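The enumeration of candidate pairs can be sketched as follows. The 55 pairs follow from choosing 2 of 11 compounds, and the coverage filter applies formula (2). The StR values below are hypothetical placeholders, so the number of pairs passing the filter is not expected to match the figure reported above; ranking by accuracy, as done by the software, is not reproduced here.

```python
from itertools import combinations


def coverage(tr_ref1, tr_ref2, tr_first, tr_last):
    """tR coverage of a reference pair, formula (2)."""
    return (tr_ref2 - tr_ref1) / (tr_last - tr_first)


# Hypothetical StR values (min) for peaks 1-11, for illustration only.
ST_R = {1: 4.0, 2: 6.5, 3: 9.0, 4: 14.0, 5: 17.5, 6: 21.0,
        7: 25.0, 8: 29.0, 9: 36.0, 10: 41.0, 11: 44.0}

# All unordered pairs of the 11 marker compounds.
pairs = list(combinations(sorted(ST_R), 2))
first, last = ST_R[1], ST_R[11]

# Keep only the pairs whose coverage is at least 50%.
candidates = [(p, q) for p, q in pairs
              if coverage(ST_R[p], ST_R[q], first, last) >= 0.5]

if __name__ == "__main__":
    print(len(pairs), "pairs in total")
    print(len(candidates), "pairs with coverage >= 50% (hypothetical data)")
```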

Table 2 Top 10 best reference compound pairs

Adjustment of tR window

A smaller tR window makes the method more accurate but reduces the number of applicable columns. The optimal tR window could be determined from the statistical results in the software’s method optimization function. According to Table 3, which shows the average tR deviation of each peak on 17 columns, the average tR deviation of peaks 1 to 10 was less than 0.3 min, whereas that of peak 11 was 0.6 min. Therefore, a tR window of 0.8 min might be appropriate to cover the tR deviation of all peaks.

Table 3 Average tR deviation of different compounds

To verify this value, different tR windows were set; the tR deviations (average deviation of 11 peaks) and identification rates on different columns are summarized in Table 4 and Fig. 6. The windows of 0.3 min and 0.5 min were so narrow that the identification rate was less than 93% and fewer than 53% of the columns were available. With windows of 1.5 min and 2.0 min, the identification rate and the proportion of available columns exceeded 99% and 94%, respectively; however, such wide windows carry a risk of misjudgment. The windows of 0.8 min and 1.0 min were near the inflection point and offered a good balance between accuracy and applicability. Finally, 0.8 min was selected.

Table 4 Average tR deviation and identification rate on different columns with different tR window

Fig. 6 Trend of tR deviation and identification rate with different tR window
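The trade-off examined in Table 4 and Fig. 6 can be illustrated by sweeping the tR window over a set of deviations. The deviations below are hypothetical and only show how the identification rate rises as the window widens; they are not the values measured in this study.

```python
def identification_rate(deviations, window):
    """Percentage of peaks whose absolute tR deviation is within the window."""
    hits = sum(1 for d in deviations if abs(d) <= window)
    return 100 * hits / len(deviations)


# Hypothetical tR deviations (min) of ten peaks, for illustration only.
deviations = [0.05, 0.12, 0.20, 0.28, 0.35, 0.45, 0.60, 0.75, 0.95, 1.40]

if __name__ == "__main__":
    for window in (0.3, 0.5, 0.8, 1.0, 1.5, 2.0):
        rate = identification_rate(deviations, window)
        print(f"window {window:.1f} min: {rate:.0f}% identified")
```

A wider window always identifies at least as many peaks, which is why the rate alone cannot be used to choose the window; the risk of assigning a neighboring peak grows with it.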

A separate tR window can be set for each peak. For example, a window of 0.8 min could be set for peak 11 and 0.5 min for the other peaks. Smaller tR windows were used for the other peaks in this study, which further improved the accuracy of the method and reduced the misjudgment rate.

When the PDA spectrum qualitative function was available, the tR window could be widened. In the current study, it was set to 1.5 min according to the results in Table 4. In our previous studies, the tR window was set to 0.5 min [13], 0.6 min and 1.2 min [14], 0.3 min [15], and 0.7 min [18]. Therefore, when only the chromatographic qualitative function is used, a tR window of 0.5 to 1.0 min is recommended; when the PDA spectrum function is also available, it can be widened to 0.5–1.5 min.

Comparison of different methods

The software provides four methods for peak identification: the RRT method, the LCTRS method, RRT combined with the PDA method, and LCTRS combined with the PDA method. The conditions of the four methods, optimized as described in the sections “Selection of reference compound” and “Adjustment of tR window”, are shown in Table 5.

Table 5 Conditions of different methods

Taking column 15 (Waters Sunfire C18) as an example, Fig. 7a and b show the results of the RRT method and of LCTRS combined with the PDA method, respectively. The peak identification results in the red box indicate that Salvianolic acid B was incorrectly identified as Salvianolic acid L by the RRT method. Meanwhile, the peaks of Salvianolic acid L and Salvianolic acid Y could not be identified because of the large tR deviation. In contrast, LCTRS combined with the PDA method accurately identified all peaks. The green box shows the tR deviation of each peak and the PDA similarity, the blue box shows the linear fitting results of tR, and the yellow box shows the PDA spectrum results. This case suggests that LCTRS combined with the PDA method is superior to the RRT method.

Fig. 7 Comparison of the RRT method and the LCTRS + PDA method on column 15 (Waters Sunfire C18). a The result of the RRT method, b The result of the LCTRS + PDA method (red box: qualitative analysis result of peaks; green box: information table; blue box: linear regression result; yellow box: spectrum result)

The results for columns 1 to 17 obtained with the four optimized methods are summarized in Table 6. In terms of the number of positive columns (tR deviation ≤ tR window and/or PDA similarity ≥ similarity threshold), LCTRS combined with the PDA method was the best, with the smallest average tR deviation, the highest identification rate, and the largest number of available columns. When only the chromatographic algorithm was used, LCTRS ranked the highest.

Table 6 Comparison of different methods (17 columns for method establishment)

Sample tests

Because the Salvianolic acid D and Salvianolic acid E peaks overlapped in the chromatograms on columns 18–22, these columns were used for sample testing rather than method establishment. Sample testing included three steps. First, the chromatographic and spectral data were imported, and the peaks were integrated. Second, the reference compounds (peak 3 and peak 9) were assigned in the sample chromatogram. Third, the results were obtained after running the method. The sample test results were displayed in the same way as in Fig. 7, including the qualitative results of peaks, the qualitative result table, the linear fitting results, and the spectra. The peak qualitative results of the four methods on column 21 [Agilent TC-C18(2)] are shown in Fig. 8. Figure 8a shows the results of the RRT method, which had the smallest tR deviation of 0.110 min. Nevertheless, the Salvianolic acid B peak was unidentified, and the Salvianolic acid L and Salvianolic acid Y peaks were incorrectly identified. Figure 8b shows the results of the LCTRS method, which had the second smallest tR deviation of 0.280 min. The Salvianolic acid L peak was correctly identified, but the Salvianolic acid Y peak was incorrectly identified. RRT combined with the PDA method (Fig. 8c) and LCTRS combined with the PDA method (Fig. 8d) gave the same identification results: both correctly identified the Salvianolic acid L and Salvianolic acid Y peaks. Of these two, LCTRS combined with the PDA method had the smaller tR deviation, 0.293 min. Table 7 summarizes the results of the four methods on the five columns. The RRT method remained the worst, with the lowest identification rate of 72.7%, whereas LCTRS combined with the PDA method remained optimal, with a smaller tR deviation of 0.240 min and the highest identification rate of 80.0%.

Fig. 8 Results of sample tests on column 21 [Agilent TC-C18(2)]. a The result of the RRT method, b The result of the LCTRS method, c The result of the RRT + PDA method, d The result of the LCTRS + PDA method

Table 7 Comparison of different methods on five unknown columns, regardless of Salvianolic acid D and Salvianolic acid E

Column recommendation by database

In the development of an HPLC analysis method, a large amount of chromatographic data on different columns is generally collected. However, the legal standard method provides only the column type, such as C18; the column brand and related chromatograms are not given. Nevertheless, these data are valuable, and the differences between more useful data (for example, those with better separation, shorter separation time, smaller tR deviation, or lower column cost) and ordinary data are also meaningful. Therefore, based on the idea of big data, these data were stored as part of the DRS and used for column recommendation.

Positive and negative columns were defined for column recommendation. Positive columns are columns on which all peaks can be effectively separated and identified. Negative columns are columns on which some peaks cannot be separated or identified. In this study, the 11 compounds could not all be effectively separated on column 21; therefore, this column was considered a negative column for all four methods (Fig. 8). Column 15 was a positive column for LCTRS combined with the PDA method (Fig. 7b) but a negative column for the RRT method because of the large retention time deviation of certain compounds (Fig. 7a). For better reproducibility of the analysis method, future studies should choose positive columns rather than negative ones. For columns that are on neither the positive nor the negative list, the results, chromatographic data, and PDA spectra are also meaningful and can be used to upgrade and improve the DRS method. The positive and negative columns differ for different medicines, different chromatographic conditions, and even different peak identification methods for the same medicine. The lists of positive and negative columns for the phenolic acid extract of S. miltiorrhiza for the four methods are shown in Table 8; more detailed information is available in the software database.
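The definition of positive and negative columns can be expressed as a simple rule, sketched below under the assumption that the identification outcome of every compound on every column is already available for a given method. The function name, column labels, and outcomes are hypothetical.

```python
def classify_columns(results):
    """Split columns into positive (all peaks identified) and negative lists.

    results maps a column name to {compound name: identified (True/False)}.
    """
    positive, negative = [], []
    for column, peaks in results.items():
        if all(peaks.values()):
            positive.append(column)
        else:
            negative.append(column)
    return positive, negative


# Hypothetical outcomes for one identification method, for illustration only.
example = {
    "Column A": {"Salvianolic acid B": True, "Salvianolic acid L": True},
    "Column B": {"Salvianolic acid B": True, "Salvianolic acid L": False},
}

if __name__ == "__main__":
    positive, negative = classify_columns(example)
    print("positive:", positive)
    print("negative:", negative)
```

Because the outcome depends on the identification method, the same column can appear on the positive list for one method and on the negative list for another, so the classification would be run once per method.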

Table 8 Column recommendations for different methods

Discussion

In the current study, the offline version of the DRS analyzer was used. To make data updating and sharing more convenient, an online version should be developed in the future. The DRS is expected to develop toward big data, on the basis of which artificial intelligence could be introduced. In addition, specifications and guidelines for DRS should be studied to ensure its authenticity, accuracy, and reliability.

Conclusions

To the best of our knowledge, the present study is the first to develop a DRS strategy. A series of fingerprint quality control methods for the phenolic acid extract of S. miltiorrhiza was developed based on the DRS analyzer, involving the RRT method, the LCTRS method, RRT combined with the PDA spectrum method, and LCTRS combined with the PDA spectrum method. In addition, a column database for the samples was established. The results showed that LCTRS combined with the PDA spectrum method was optimal. The results also demonstrated that the DRS analyzer could accurately identify 11 compounds in the samples using only one or two physical RSs. The strategy significantly reduced the analysis cost and ensured the accuracy and reproducibility of the analysis method.

The DRS strategy adopted in this study has the following advantages. (1) The software processes data automatically, replacing the complex manual calculation required by the RRT and LCTRS methods, thus saving time and avoiding calculation mistakes. (2) The results are objective and consistent, avoiding the subjectivity of manual identification in the RRT, ERS, and LCTRS methods. (3) The chromatographic and spectral data formats supported by the software are universal and compatible with mainstream chromatography workstations; therefore, the method can be easily popularized and applied. (4) It is compatible with a variety of substitute RS methods (such as the RRT, ERS, and LCTRS methods) and supports chromatographic algorithms, spectrum algorithms, and combinations of these algorithms, so that the methods complement one another. (5) The DRS analyzer uses the idea of big data to recommend columns for different medicines, different chromatographic conditions, and different peak identification methods (such as the RRT and LCTRS methods) for the same medicine.

In summary, the DRS strategy can effectively reduce the cost of RSs and achieve higher accuracy and reproducibility than a single substitute RS method. Moreover, it is automated, intelligent, objective, accurate, eco-friendly, universal, and shareable, and thus represents a promising and feasible method for the overall quality control (such as fingerprint analysis and simultaneous multi-component determination) of TCMs and herbal medicines on different chromatographic columns.