Introduction

Andrographis paniculata (Burm. f.) Wall. ex Nees, also known as “Xiyanping” in Chinese, is one of the medicinal plants currently being tested for activity against the novel coronavirus (SARS-CoV-2) that causes coronavirus disease 2019 (COVID-19) in Thailand, China, and other countries (Sa-ngiamsuntorn et al. 2021; Enmozhi et al. 2020; Murugan et al. 2020; Shi et al. 2020; Cai et al. 2020). In general, A. paniculata is indicated for the relief of common cold symptoms associated with nasal congestion, such as sore throat, muscle aches, mucus, and stomach ache (Pholphana et al. 2004), and it also shows anti-inflammatory (Shen et al. 2000) and immunity-related activity (Puri et al. 1993; Calabrese et al. 2000; Chowdhury et al. 2012). Furthermore, andrographolide, a compound in A. paniculata, has been reported to have broad-spectrum antiviral properties (Gupta et al. 2017). Enmozhi et al. (2020) evaluated andrographolide from A. paniculata as a potential inhibitor of the main protease of SARS-CoV-2 (Mpro) through in silico studies including molecular docking, target analysis, toxicity prediction, and absorption, distribution, metabolism, and excretion (ADME) prediction. Moreover, Cai et al. (2020) reported that A. paniculata could reduce inflammation in COVID-19 patients and improve symptoms such as cough, fever, and rales in the lungs. It is recommended in the National Health Council (NHC) guidelines for treating patients in the progressive stage of COVID-19 (critical cases).

Therefore, plant materials of A. paniculata should be tested to confirm that the content of active andrographolide is within an acceptable level. The Thai Herbal Pharmacopoeia (Department of Medical Sciences, Ministry of Public Health 2019) specifies that powder of the aerial part of A. paniculata should contain not less than 1.0% (w/w) of andrographolide. However, the content of the active compound differs significantly depending on the part of the plant collected, geographic origin, season, and harvest time (Hossain et al. 2014). It has been reported that the amount of bioactive compound in A. paniculata was greatest at 110 days postharvest and was lower at 130 days, after the pre-blossom stage of the plant (Sharma and Sharma 2013). In some situations, the herb business faces problems with the regular supply and quality of A. paniculata raw materials, both of which are essential for a commercial enterprise. The purchase price of raw materials is usually settled on the basis of physical appearance (piece form, color, and contaminants) and on trust between the buyer and seller regarding the time of harvest and the presence of substantial amounts of bioactive compounds. However, quality control during collection and processing can be variable, resulting in lost value for downstream products. The quantitative analysis of bioactive A. paniculata components is generally based on extraction followed by analysis to estimate the amount of diterpene lactone using methods such as TLC (Rajani et al. 2000), LC (Jain et al. 2000), and HPLC (Chen et al. 2007; Xu et al. 2008; Sharma et al. 2012). However, these methods have limitations: they are time-consuming, they destroy the sample through the use of chemicals, and they need several types of scientific equipment. The analysis is therefore costly, and the complex procedures require experienced analysts.
The current research therefore focused on quality control upstream in the process, specifically on the raw material, using NIR spectroscopy to determine the amount of active ingredients in A. paniculata samples and thus address these problems.

The application of NIR to analyze the bioactive compounds of herbs has started to spread worldwide, especially in Asian countries such as China, Korea, Japan, and Thailand. Research has addressed both the quantitative analysis of active ingredients and quality analysis to identify genuine herbs that command high prices, based on the level of active substances categorized by geographical origin. In our previous studies, we reported the application of moving window partial least squares regression (MWPLSR) to the quantitative determination of total curcuminoids in turmeric rhizome by NIR spectroscopy (Kasemsumran et al. 2010). The second-derivative pretreated NIR spectral range of 2040–2486 nm was selected to build the model for predicting total curcuminoids in turmeric powder samples, which gave a standard error of prediction (SEP) of 1.003% w/w and a ratio of prediction to deviation (RPD) of 4.857. We extended our study of NIR applications to pharmaceutical analysis by reporting an NIR assay of curcumin in 170 capsules of turmeric herbal medicines (Kasemsumran et al. 2014). Some works, including our studies, are summarized in Table 1 (Kasemsumran et al. 2010, 2014, 2017; Ren and Chen 1999; Schulz et al. 1999; Luypaert et al. 2003; Chen et al. 2009; Lee et al. 2014; McGoverin et al. 2010; Chan et al. 2007; Wang et al. 2007; Lim et al. 2012; Tahir et al. 2020; Liu et al. 2018; Tanaka et al. 2008; Kim et al. 2014). In addition to those works, one study analyzed the amount of andrographolides in Chinese herb samples of A. paniculata and developed a calibration model with good efficiency and accurate prediction (Lai et al. 2018). However, the different parts of the A. paniculata plant were not taken into account in that study.

Table 1 Literature review of NIR studies in plants, herbs, and herbal products

The objective of this research is to investigate the efficiency of NIR technology for two tasks: classification, as a selection process for A. paniculata with a high content of bioactive compound, and quantification of andrographolide (AP1) and dehydroandrographolide (AP3) in A. paniculata. A total of 170 samples from different sources and plant parts were used in this study, since the quantity of bioactive compounds varies between plant parts of A. paniculata. The method used the long-wavelength NIR region of 1000–2500 nm to measure the diffuse reflectance spectra of A. paniculata, with partial least squares-discriminant analysis (PLS-DA) and partial least squares regression (PLSR) used to construct the discriminant model and the quantitative models, respectively.

Partial least squares regression (PLSR)

PLSR is widely employed for quantitative analysis (Kasemsumran et al. 2014; Schulz et al. 1999; Luypaert et al. 2003; Chen et al. 2009; McGoverin et al. 2010; Chan et al. 2007; Wang et al. 2007; Lim et al. 2012; Tanaka et al. 2008; Kim et al. 2014). It is a powerful method in which the y-variable (reference value) is taken into account by balancing the X- and y-information: factors are constructed as linear combinations of the original spectral data (x-values), and only these factors are employed in the regression equation. PLS regression thus reduces the dimensionality of the spectral data, so that only the part of the x-variation most relevant to y is used in the regression for predicting y. In this way, an efficient PLS calibration model can be obtained.
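
As an illustration of how such factors are built, the following is a minimal sketch of PLS1 using the NIPALS algorithm, written with the Python standard library only. It is not the Unscrambler implementation used in this study; the function names (pls1_fit, pls1_predict) and the tiny data set are hypothetical and serve only to show the weight, score, loading, and deflation steps.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def column(matrix, j):
    """Column j of a list-of-rows matrix."""
    return [row[j] for row in matrix]


def pls1_fit(X, y, n_factors):
    """Fit a PLS1 model (single y-variable) by NIPALS on mean-centred data."""
    n_vars = len(X[0])
    x_mean = [sum(column(X, j)) / len(X) for j in range(n_vars)]
    y_mean = sum(y) / len(y)
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                              # y residuals
    factors = []
    for _ in range(n_factors):
        # Weights: direction in x-space with maximal covariance with y.
        w = [dot(column(E, j), f) for j in range(n_vars)]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:  # nothing left in X that co-varies with y
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                          # scores
        tt = dot(t, t)
        p = [dot(column(E, j), t) / tt for j in range(n_vars)]  # x-loadings
        q = dot(f, t) / tt                                      # y-loading
        # Deflation: remove the part of X and y explained by this factor.
        E = [[v - ti * pj for v, pj in zip(row, p)] for row, ti in zip(E, t)]
        f = [v - q * ti for v, ti in zip(f, t)]
        factors.append((w, p, q))
    return x_mean, y_mean, factors


def pls1_predict(model, x):
    """Predict y for one new spectrum x by repeating the deflation steps."""
    x_mean, y_mean, factors = model
    e = [a - m for a, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in factors:
        t = dot(e, w)
        y_hat += q * t
        e = [a - t * pj for a, pj in zip(e, p)]
    return y_hat


# Hypothetical two-variable example in which y = 2*x1 - x2 + 3 exactly.
X_demo = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 7.0]]
y_demo = [3.0, 6.0, 4.0, 8.0, 6.0]
model_demo = pls1_fit(X_demo, y_demo, n_factors=2)
print(pls1_predict(model_demo, [6.0, 2.0]))
```

With as many factors as there are independent x-variables, PLS1 reproduces ordinary least squares, which is why this noise-free example is predicted almost exactly; with real spectra far fewer factors than wavelengths are retained.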

Partial least squares-discriminant analysis (PLS-DA)

PLS-DA is based on the PLSR algorithm. It is suitable for the classification of high-dimensional NIR spectral data for diverse purposes (Kasemsumran et al. 2017). It fits and predicts a y-variable consisting of {0/1}-coded class-membership indicators, and the number of factors is optimized during model development.

For quality control of herbal samples using NIR spectroscopy, the support of advanced chemometric methods is vital for achieving an efficient calibration model for the analyte.

Experimental

Materials

Figure 1 shows the form of a typical A. paniculata plant. For the current study, 170 samples of A. paniculata of known origin were used. Samples were collected over two years from four parts of each plant: aerial parts, stems, leaves, and stems mixed with leaves (Fig. 2). They were obtained from farmland and herbal drugstores in ten provinces of Thailand, namely Bangkok, Nakhon Pathom, Nonthaburi, Prajeenburi, Phetchaburi, Phetchabun, Mahasarakham, Suphanburi, and Sa Kaeo. Fresh samples were heated at 50 °C for 10 h, then ground in a mechanical grinder (Cyclotec, model 1093 FOSS, Hillerod, Denmark) and passed through a 1 mm sieve.

Fig. 1 Plant of A. paniculata (Burm. f.) Nees in a plantation at Nakhon Pathom province, Thailand

Fig. 2 Parts of dried A. paniculata plant sold commercially and used in this study

FT-NIR spectral acquisition

Each powdered sample (1.00 ± 0.02 g) was placed in a separate glass vial and tightly pressed using stainless-steel seal equipment. All samples were packed to approximately the same depth of 1.2 cm above the vial bottom. The sample vial was placed on the vial measuring tray for NIR measurement as shown in Fig. 3, and data were collected using a Fourier transform near-infrared (FT-NIR) spectrophotometer (NIR-Flex Solid, Buchi, Switzerland) in the scan range of 10,000–4000 cm−1. Triplicate measurements were made for each sample at a spectral resolution of 8 cm−1 with 128 scans per measurement. Reflectance and wavenumber were converted to absorbance and wavelength (1000–2500 nm), respectively, prior to data analysis to simplify NIR band assignment.
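
The unit conversion mentioned above can be sketched as follows. The wavenumber-to-wavelength relation is exact; the use of log10(1/R) as apparent absorbance is the usual convention for diffuse reflectance and is an assumption here, since the formula applied by the instrument software is not stated. The function names are hypothetical.

```python
import math


def wavenumber_to_wavelength_nm(wavenumber_cm1):
    """Convert a wavenumber in cm^-1 to a wavelength in nm (1e7 / wavenumber)."""
    return 1.0e7 / wavenumber_cm1


def reflectance_to_absorbance(reflectance):
    """Convert fractional reflectance R (0 < R <= 1) to apparent absorbance.

    Assumes the common log10(1/R) convention.
    """
    if not 0.0 < reflectance <= 1.0:
        raise ValueError("reflectance must lie in the interval (0, 1]")
    return math.log10(1.0 / reflectance)


# The scan limits quoted in the text map onto the 1000-2500 nm window.
print(wavenumber_to_wavelength_nm(10000.0))  # 1000.0
print(wavenumber_to_wavelength_nm(4000.0))   # 2500.0
```

Note that a constant step in wavenumber does not give a constant step in wavelength, so the data points are more widely spaced at the long-wavelength end after conversion.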

Fig. 3 NIR measurements for powder samples of A. paniculata

Bioactive compound extraction and quantification using HPLC

The powders scanned by NIR were then extracted with methanol (AR Grade; ACI Labscan): each sample was placed in a cellulose thimble (30 mm × 100 mm) and extracted with 150 ml of methanol for 210 min in a Soxhlet system. The extract was then concentrated using a rotary evaporator (Model R-210/215; Buchi; Denmark) and completely dried in an oven. Each dried extract was redissolved in methanol (HPLC grade; ACI Labscan) and made up to volume. The resultant liquid was passed through a syringe filter and collected in a tinted vial for HPLC analysis.

Bioactive compound quantification was done using HPLC according to Xu et al. (2008) by injecting a 10 μl sample into a column (Inert Sustain C18, 4.6 mm ID × 150 mm, 5 μm; HPLC Shimadzu model LC-20A) with methanol:water (55:45) as the mobile phase at a flow rate of 1.0 ml/min at 30 °C, with UV detection at 223 nm. The standard curve was compiled from the average of duplicates of standard andrographolide (purity ≥ 98%; Sigma-Aldrich) and dehydroandrographolide (purity ≥ 98%; Sigma-Aldrich). Five calibration standards were freshly prepared by diluting the stock solutions with mobile phase in appropriate quantities. The calibration range was 10–300 μg/ml for both andrographolide (AP1) and dehydroandrographolide (AP3). Regression of peak area against AP1 and AP3 concentration over the calibration range gave a correlation coefficient of r = 0.9999. The moisture content of all powder samples was also determined, and the results were used to calculate concentrations on a dry basis.
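
The calculation from peak area to content on a dry basis can be sketched as below. This is an illustration under stated assumptions, not the authors' worksheet: the calibrator levels, peak areas, final extract volume, and moisture value are all hypothetical numbers, and the function names are invented for the example.

```python
def fit_line(x, y):
    """Least-squares line y = slope * x + intercept, plus Pearson r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / (sxx * syy) ** 0.5


def content_dry_basis(peak_area, slope, intercept, extract_volume_ml,
                      sample_mass_g, moisture_percent):
    """Analyte content in % w/w of dry matter from one HPLC peak area."""
    conc_ug_per_ml = (peak_area - intercept) / slope         # from the standard curve
    analyte_g = conc_ug_per_ml * extract_volume_ml * 1.0e-6  # micrograms to grams
    dry_mass_g = sample_mass_g * (1.0 - moisture_percent / 100.0)
    return 100.0 * analyte_g / dry_mass_g


# Hypothetical five-point standard curve within the 10-300 ug/ml range.
conc = [10.0, 50.0, 100.0, 200.0, 300.0]   # ug/ml
area = [20.0, 100.0, 200.0, 400.0, 600.0]  # arbitrary detector units
slope, intercept, r = fit_line(conc, area)

# Hypothetical sample: 1.00 g powder, 10 % moisture, extract made up to 50 ml.
print(content_dry_basis(400.0, slope, intercept, 50.0, 1.00, 10.0))
```

Expressing the result on a dry basis removes the effect of differing residual moisture between powders, so that contents from different plant parts and sources are comparable.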

Statistical analysis of mean quantity from different parts of A. paniculata

The quantities of AP1 and AP3 determined by HPLC in the 120 collected samples were subjected to one-way analysis of variance, and sample means were compared using Duncan’s new multiple range test at α = 0.05.
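
For readers unfamiliar with the first step of this comparison, the F statistic of a one-way analysis of variance can be computed as in the sketch below. Only the F ratio is shown; the p-value and Duncan’s new multiple range test are not implemented. The group values are hypothetical and are not data from this study.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the individual values around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical contents (% dry weight) for three plant parts.
demo_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(one_way_anova_f(demo_groups))
```

A large F ratio indicates that the differences between plant-part means are large relative to the scatter within each part, which is the condition under which a post hoc test such as Duncan’s is then used to identify which means differ.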

Calibration model development for discrimination and quantitative analysis of AP1 and AP3 content in A. paniculata using NIR spectral data

The NIR models were constructed using Unscrambler version 9.8 (CAMO AS; Trondheim, Norway) with three forms of the spectra: (1) uncorrected spectra; (2) spectra corrected using multiplicative scattering correction (MSC); and (3) spectra corrected using second derivatives (2D) based on the Savitzky-Golay method (polynomial order = 2, number of smoothing points = 5). Each form was combined with the PLS technique using the PLS1 algorithm. The data for all 170 samples were divided into a calibration set of 120 samples and a validation set of the remaining 50 samples. Full cross-validation was used to find the optimum number of PLS factors (F), taken as the number giving the lowest standard error of cross-validation (SECV), for the models used for discrimination and quantitative determination.
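
The two pretreatments can be sketched as follows, using the Python standard library only. This is an illustration, not the Unscrambler routines: the second derivative uses the tabulated 5-point quadratic Savitzky-Golay coefficients (2, −1, −2, −1, 2)/7 and simply drops the two points at each end, and MSC regresses each spectrum on the mean spectrum of the set. The handling of end points and the choice of reference spectrum are assumptions.

```python
def savgol_second_derivative(spectrum, step=1.0):
    """Second derivative by a 5-point, second-order Savitzky-Golay filter.

    The two points at each end, which lack a full window, are dropped.
    """
    coeffs = (2.0, -1.0, -2.0, -1.0, 2.0)  # divided by 7 below
    result = []
    for i in range(2, len(spectrum) - 2):
        window = spectrum[i - 2:i + 3]
        result.append(sum(c * a for c, a in zip(coeffs, window)) / (7.0 * step ** 2))
    return result


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum of the set."""
    n_points = len(spectra[0])
    reference = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_points)]
    ref_mean = sum(reference) / n_points
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_points
        # Fit s = intercept + slope * reference by least squares.
        slope = sum((r - ref_mean) * (a - s_mean)
                    for r, a in zip(reference, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(a - intercept) / slope for a in s])
    return corrected


# A parabola has a constant second derivative of 2.
print(savgol_second_derivative([float(x * x) for x in range(7)]))

# Two hypothetical spectra differing only by offset and scale coincide after MSC.
base = [1.0, 2.0, 4.0, 3.0]
print(msc([base, [2.0 * a + 1.0 for a in base]]))
```

The derivative removes additive baseline offsets and sharpens overlapping bands, whereas MSC removes offset and slope differences caused by light scattering between powder samples, which is why the two are compared as alternatives.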

A. paniculata material was selected on the requirement that the content of AP1, the major bioactive compound, be not less than 1.0%. For rapid selection, the classification model was built using PLS-DA. The NIR spectra of samples with an AP1 content lower than 1.0% (out of specification) were assigned a y-value of “0”, and those with an AP1 content of 1.0% or higher (in specification) were assigned a y-value of “1”. The classification PLS model was then built as a regression for predicting y. A sample with a predicted value lower than 0.5 was classified as out of specification (OUT), whereas a sample with a predicted value greater than or equal to 0.5 was classified as meeting the requirement (PASS-1). Furthermore, individual PLS calibration models were developed for the quantitative determination of the actual AP1 and AP3 contents in A. paniculata.
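
The coding and decision rule just described can be written compactly as below. The sketch covers only the class coding, the 0.5 cut-off, and a percentage of correct classification; the PLS regression that produces the predicted values is not included. Reading %CC as the percentage of correctly classified samples is an assumption, and the function names and example values are hypothetical.

```python
SPEC_LIMIT = 1.0  # % AP1 required by the specification
CUTOFF = 0.5      # discriminant line on the PLS-DA predicted value


def code_class(ap1_percent):
    """Dummy y-variable for PLS-DA: 1 if AP1 >= 1.0 %, otherwise 0."""
    return 1 if ap1_percent >= SPEC_LIMIT else 0


def assign_class(predicted_y):
    """Map a continuous PLS-DA prediction to PASS-1 or OUT."""
    return "PASS-1" if predicted_y >= CUTOFF else "OUT"


def percent_correct(actual_codes, predicted_values):
    """Percentage of samples whose assigned class matches the coded class."""
    hits = sum(1 for code, y_hat in zip(actual_codes, predicted_values)
               if (y_hat >= CUTOFF) == (code == 1))
    return 100.0 * hits / len(actual_codes)


# Hypothetical HPLC contents and hypothetical PLS-DA predictions.
codes = [code_class(v) for v in (2.3, 0.4, 1.0, 0.8)]
print(codes)                                          # [1, 0, 1, 0]
print(percent_correct(codes, [0.9, 0.1, 0.4, 0.2]))   # 75.0
```

Because the model predicts a continuous value, the distance of a prediction from 0.5 also gives an informal indication of how clearly a sample falls on one side of the specification.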

Results and discussion

Quantitative analysis of AP1 and AP3 in A. paniculata using HPLC

Figure 4 shows the chromatogram of an extracted sample having the highest amount of AP1 and less AP3, with peak retention times of 5.31 min and 14.39 min, respectively. Therefore, the major active compound found in A. paniculata plants was AP1.

Fig. 4 HPLC chromatogram of extracted A. paniculata sample consisting of andrographolide (AP1) and dehydroandrographolide (AP3)

For the 120-sample calibration set, Table 2 shows the amounts of AP1 and AP3 extracted from the different parts of A. paniculata plants. The leaves contained the highest proportion of AP1. The aerial parts and leaves contained the highest proportions of AP3, with no significant difference between the AP3 levels of these two parts. Therefore, the leaves are the most important part to use as raw material for extracting bioactive compounds, since they contained the highest proportions of both AP1 and AP3. In addition, the results indicated that the commercial samples used in this study may have been harvested between the vegetative stage and no more than 50% flowering (Chen et al. 2007).

Table 2 Comparative analysis using one-way analysis of variance of mean quantities of AP1 and AP3 in different parts of A. paniculata

All samples (in both the calibration and validation sets) contained AP1 and AP3 in the ranges 0.151–3.608% and 0.572–1.990% by dry weight, respectively. Table 3 shows the distribution of quantities for the samples in the calibration and validation sets. The histograms in Fig. 5 indicate that the amounts of AP1 and AP3 were approximately normally distributed in both sets, although the distribution of AP1 deviated more from the normal curve than that of AP3. This indicated that the plant parts contained widely differing quantities of AP1, giving a higher standard deviation (SD) of 0.900–0.902% dry weight than for AP3. The AP3 distribution was more sharply peaked (less flat), with a lower SD (0.336–0.338% dry weight), indicating that the selection of the raw material can be more rapid. Considering only the quantity of AP1, selecting leaves would be best to maximize production. However, after grinding, it is difficult to distinguish this part from the rest of the material. Based on these results, it was necessary to develop a method to quantify the contents by applying NIR spectroscopy.

Table 3 Distribution of andrographolide (AP1) and dehydroandrographolide (AP3) in A. paniculata sample sets

Fig. 5 Distributions of andrographolide and dehydroandrographolide contents

NIR spectral data of A. paniculata powder

Figure 6 illustrates the standard normal variate (SNV) transforms of three mean original NIR spectra in the 1000–2500 nm region: A. paniculata powder sample (dot line), standard AP1 (solid line), and standard AP3 (dash line). Informative bands due to the NIR-active functional groups (Fig. 4) of AP1 and AP3 can be seen in the NIR spectrum of the A. paniculata sample, as follows. Two bands at 1180 and 1380 nm are the second overtones of C–H alkene and C–H methyl-alicyclic hydrocarbon, respectively (Workman and JrL 2007). Two broad bands around 1430 and 1940 nm are largely due to the combination of O–H symmetric and antisymmetric stretching modes of water and the combination mode of the O–H stretching and deformation vibrations of water, respectively (Maeda et al. 1995). The water bands come from the residual moisture in the powder sample. A small peak around 1630 nm arises from the first overtone of C–H alkene (ethylidene). One broad band around 1660–1800 nm is assigned to the C–H and O–H of alicyclic hydrocarbon and to C–H methylene. Individual peaks at 1940 nm (O–H bending) and 1950 nm (O–H and C–H combination) were found in the NIR spectra of the AP3 and AP1 standards, respectively. However, these two peaks did not appear in the NIR spectrum of the A. paniculata sample because they were covered by the broad NIR band due to water. The bands at 2060–2200 and 2270–2500 nm were associated with the combination band of C–H alkene and O–H, and with the combination band of C–H stretching and CH2 deformation of the alicyclic hydrocarbon of the AP1 and AP3 standards, respectively (Workman and JrL 2007).
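
For reference, the SNV transform used to put the three spectra on a common scale can be sketched as follows. Each spectrum is centred on its own mean and divided by its own standard deviation; the use of the sample (n − 1) standard deviation is an assumption, and the example values are hypothetical.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample SD (n - 1 denominator)
    if s == 0:
        raise ValueError("a flat spectrum cannot be SNV-scaled")
    return [(a - m) / s for a in spectrum]


# Hypothetical absorbance values; an offset and a scale factor leave SNV unchanged.
raw = [1.0, 2.0, 4.0]
shifted = [2.0 * a + 3.0 for a in raw]
print(snv(raw))
print(snv(shifted))
```

Because SNV is applied to each spectrum independently, it allows a powder sample and pure standards with very different absolute absorbance levels to be overlaid in one figure.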

Fig. 6 Standard normal variate (SNV) of three mean FT-NIR spectra in the 1000–2500 nm region of A. paniculata powder sample (dot line), standard AP1 (solid line), and standard AP3 (dash line)

Figure 7A shows the second-derivative (2D) NIR absorbance spectra of all powder samples, calculated using the Savitzky-Golay method (polynomial order = 2, number of smoothing points = 5). The 2D treatment made it possible to separate overlapping bands and thus to understand better the relation between light absorption and molecular structure described above. Figure 7B illustrates the SNV of the two mean NIR spectra in the region of 1000–2500 nm of A. paniculata powder with AP1 content below and above 1.0%. The intensity of the water bands remains constant between the two mean spectra despite the difference in AP1 content (Fig. 7B). On the other hand, the intensity of the bands in the 2060–2400 nm region changes with the AP1 content of the samples (Fig. 7B). This region is relatively rich in the combination bands of C–H and O–H and of C–H alicyclic hydrocarbon (Workman and JrL 2007). Nevertheless, the mean NIR spectra are very similar and cannot easily be classified by inspection. Therefore, the PLS-DA method was applied in this study to classify A. paniculata samples with high (≥ 1.0%) and low (< 1.0%) amounts of AP1.

Fig. 7 2D-NIR spectra of the calibration powder samples measured in 150 glass vials in the range 1000–2500 nm (a), and standard normal variate (SNV) of the two mean 2D-NIR spectra in the region of 1000–2500 nm of A. paniculata with low AP1 content (dash line) and high AP1 content (solid line) (b)

Results of NIR calibration model for discrimination analysis of A. paniculata based on AP1 content

The classification models were developed using the differently pretreated spectra. The classification performances of the PLS-DA models were compared, and their statistical results are reported in Table 4. The results show that the PLS-DA method classified the samples successfully both with and without spectral pretreatment. The %CC value obtained from the PLS-DA method was 100% (Figure S1). The standard errors of calibration (SEC) and validation (SEV) are the important factors that express the classification performance in this study. The best PLS-DA model for the selection of A. paniculata based on AP1 content was developed using 2D spectra over the whole region of 1000–2500 nm with 6 PLS factors. It gives the best classification performance of 100% (CC) with the lowest SEC and SEV values. It appears that the 2D pretreatment enhances spectral differences across the whole region in which bands related to differences in AP1 content appear. Figure 8 plots the PLS-DA predicted value (y-axis) against the assigned actual value for each class (x-axis) for the validation samples, using the best PLS-DA model.

Table 4 Statistical results of PLS-DA models for selection of A. paniculata samples based on the AP1 content using the entire NIR wavelength region

Fig. 8 PLS-DA model built with the 2D-NIR spectra for classification of A. paniculata samples according to the AP1 content (class 1: Pass, class 0: Out, discriminant line: 0.5)

Results of NIR calibration model for determination of AP1 and AP3 contents

Table 5 shows the calibration statistics of the models for quantitative analysis of AP1 and AP3 in A. paniculata based on the entire range of NIR absorption spectra. The best NIR calibration model for the prediction of AP1 content was generated from the 2D spectra with 7 PLS factors. The 2D pretreatment enhances the spectral differences that are related to variation in AP1 content. This model provided the lowest standard error of calibration (SEC = 0.191%) with the highest correlation coefficient (R = 0.977). Full cross-validation gave the lowest standard error of cross-validation (SECV = 0.248%), while testing with the independent sample set gave a low standard error of validation (SEV) of 0.238%.

Table 5 Summary results for PLS calibration models of AP1 and AP3 in A. paniculata powder samples for the entire NIR wavelength region

Using the calibration dataset, the best result for determining the AP3 content was obtained from the NIR spectra without pretreatment with 11 PLS factors, giving the lowest SEC of 0.123% and the highest correlation coefficient of R = 0.931. Full cross-validation gave the lowest SECV of 0.144%, while the independent sample set also gave a low SEV of 0.152%. Spectral pretreatment was unnecessary in model development for the prediction of AP3 content, perhaps because the amount and standard deviation of AP3 in the samples are lower than those of AP1 (Table 3). Spectral optimization by the pretreatment methods over the entire wavelength region may have a positive effect for AP1 because of its higher quantity and variation.
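
The error statistics quoted for both models can be computed as in the sketch below. The formulas are not given in this paper, so the definitions here are assumptions based on one common convention: the standard error is the bias-corrected standard deviation of the residuals (called SEC, SECV, or SEV depending on whether the predictions come from calibration, cross-validation, or the independent validation set), R is the Pearson correlation, and RPD is the SD of the reference values divided by the standard error. The example values are hypothetical.

```python
from statistics import mean, stdev


def standard_error(reference, predicted):
    """Bias-corrected standard deviation of the residuals (predicted - reference)."""
    residuals = [p - r for r, p in zip(reference, predicted)]
    return stdev(residuals)  # sqrt(sum((e - bias)^2) / (n - 1))


def correlation(reference, predicted):
    """Pearson correlation coefficient between reference and predicted values."""
    mr, mp = mean(reference), mean(predicted)
    sxy = sum((r - mr) * (p - mp) for r, p in zip(reference, predicted))
    sxx = sum((r - mr) ** 2 for r in reference)
    syy = sum((p - mp) ** 2 for p in predicted)
    return sxy / (sxx * syy) ** 0.5


def rpd(reference, predicted):
    """Ratio of the SD of the reference values to the standard error."""
    return stdev(reference) / standard_error(reference, predicted)


# Hypothetical HPLC reference values and NIR predictions (% dry weight).
hplc = [1.0, 2.0, 3.0, 4.0]
nir = [1.1, 1.9, 3.1, 3.9]
print(standard_error(hplc, nir), correlation(hplc, nir), rpd(hplc, nir))
```

Comparing the standard error with the SD of the reference values is useful because a model for a narrowly distributed analyte such as AP3 can show a small absolute error and still have a lower correlation coefficient than a model for a widely distributed analyte such as AP1.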

Figure 9A shows the scatter plot of the AP1 contents predicted by the best NIR calibration model, built from the 2D absorbance spectra (y-axis), against the contents determined by HPLC (x-axis). Similarly, Fig. 9B shows the scatter plot of the AP3 contents predicted by the best NIR calibration model, built from the whole range of non-pretreated NIR data (y-axis), against the actual contents analyzed by HPLC (x-axis). Both plots revealed a good straight-line relationship.

Fig. 9 Scatter plots between actual AP1 (a) and AP3 (b) contents in A. paniculata powder samples detected using HPLC (x-axis) and predicted contents using the selected NIR calibration model (y-axis)

Conclusions

The modeling results obtained in this study demonstrate that NIR spectroscopy is a promising tool for identifying A. paniculata with high AP1 content and for determining the contents of AP1 and AP3 in A. paniculata during material selection. PLS-DA using the entire wavelength region of 1000–2500 nm and spectra pretreated with the 2D method succeeded in discriminating A. paniculata materials that are in specification (AP1 ≥ 1%) from those that are out of specification. An efficient PLS-DA model was thus obtained, with 100% CC for the validation set. Furthermore, PLS models with good performance were developed for the determination of AP1 and AP3 contents in A. paniculata. The best PLS calibration model for the quantitative determination of AP1 content, built using 2D pretreated spectra over the whole region of 1000–2500 nm, yielded predictions with a low error of 0.24%. For the quantitative analysis of AP3 content, the PLS calibration developed using the original spectra over the entire region yielded good results with an error of 0.15%. Additionally, the leaves of A. paniculata are the most suitable raw material because they have the highest content of bioactive compounds. NIR technology is therefore an effective method that offers rapid analysis without extraction and without damaging the samples, and it can facilitate the quality control of A. paniculata raw materials in the pharmaceutical industry. This means that the quality of raw materials can be controlled for all samples. Consequently, the risk of substandard herbal products can be reduced.