Introduction

Natural products from land, sea, and rivers, such as plants, microorganisms, insects, and other animals, have historically proven their value as a source of molecules with therapeutic potential. Plants in particular, along with microorganisms, still represent an important arsenal for new drug identification (Mandova 2022). Several studies emphasize that natural products and their derivatives play an important role in the drug discovery and development process, covering a significant percentage of approved drugs worldwide (Harvey et al. 2015; Newman and Cragg 2020). A detailed analysis of new drugs approved by the Food and Drug Administration (FDA) between 1981 and 2019 revealed that 1,059 drugs are either natural products, direct derivatives, or synthetic drugs with the pharmacophoric group of active secondary metabolites, which corresponds to about 56.1% of the 1,881 drugs approved within this period. Among them, the antitumor area ranks first, with the launch of 172 anticancer drugs of natural origin, which represents 69.6% of the total drugs launched for this pathology, followed by 94 antibacterial (58%) and 70 antiviral drugs (37.6%) (Newman and Cragg 2020).

Between 1990 and 2000, the pharmaceutical industry focused its interest on the discovery of natural product-based drugs, a period known as “Green Eldorado”. However, important challenges were identified at the time, such as difficulties in the isolation and/or purification of bioactive compounds, accessibility and harvesting of starting material, species identification, low yields of the active compounds of interest for initial pharmacological evaluation, and governmental policies, among others (David et al. 2015). Given this scenario, in the early 2000s, most major pharmaceutical companies shifted their focus back to discovering new drugs from libraries of synthetic compounds. Such libraries are comparatively easier to produce and replenish, and show good compatibility with high-throughput screening platforms. However, the number of new drugs coming to market has since followed a downward trend, accompanied by a resurgence of scientific interest in drug discovery from natural sources, despite its known challenges (Atanasov et al. 2015).

To measure the success of natural products in drug discovery, several investigators studied the relationships among natural products, marketed drugs, and synthetically prepared small molecule libraries. When comparing structural similarities, Grabowski and Schneider (2007) identified more than one thousand scaffolds in the natural product library that were not exemplified in any of the other sets of compounds studied. Feher and Schmidt (2003) used principal components analysis to map the chemical diversity space of the three classes of compounds, natural products, molecules from combinatorial synthesis and drug molecules. This research verified that marketed drugs and natural products cover a much larger volume of the diversity space than do combinatorial compounds. Thus, combinatorial compound libraries are much less diverse than those of natural products.
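As a rough illustration of the kind of principal components analysis described above, the sketch below projects a descriptor matrix onto its first two principal components with NumPy and compares how widely two compound classes spread in the reduced space. The descriptor values are randomly generated placeholders, not real data, and the helper names (pca_project, spread) are hypothetical. The spread measure used here, the product of per-axis standard deviations, is only a simple proxy for occupied volume and is not the metric used by Feher and Schmidt (2003).

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the first principal components (via SVD)."""
    Xc = X - X.mean(axis=0)              # center each descriptor
    std = Xc.std(axis=0)
    std[std == 0] = 1.0                  # guard against constant descriptors
    Xs = Xc / std                        # scale descriptors to unit variance
    _, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = Xs @ Vt[:n_components].T    # coordinates in PC space
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

def spread(scores):
    """Crude proxy for occupied volume: product of per-axis standard deviations."""
    return float(np.prod(scores.std(axis=0)))

# Placeholder data (assumption): rows are compounds, columns are descriptors.
rng = np.random.default_rng(0)
natural = rng.normal(0.0, 2.0, size=(200, 5))        # broadly distributed class
combinatorial = rng.normal(0.0, 0.5, size=(200, 5))  # narrowly distributed class

X = np.vstack([natural, combinatorial])
scores, explained = pca_project(X)

np_spread = spread(scores[:200])   # rows belonging to the first class
cc_spread = spread(scores[200:])   # rows belonging to the second class
print(f"explained variance ratio: {explained}")
print(f"spread ratio (class 1 / class 2): {np_spread / cc_spread:.1f}")
```

With real data, the descriptor matrix would hold computed molecular properties for each compound, and all classes must be projected with the same components so that their spreads are comparable.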

Different methodological approaches are used to identify active natural products in plants, such as ethnopharmacology, chemosystematics, molecular ecology, and computational tools (Albuquerque et al. 2014). Nowadays, chemoinformatics has major applications in natural products research, where it is used to identify and optimize bioactive compounds. In the drug discovery area, chemoinformatics has helped to save billions in costs and to shorten the preclinical and clinical phases. To date, the discovery process of more than 70 commercialized drugs has included a computational method, showing the relevance of natural products databases (Chen et al. 2017; Gómez-García and Medina-Franco 2022).

The application of high-throughput screening of large sample sets or libraries has become central to leading discoveries in industries and research institutions (Newman 2017). The screening of natural products can provide greater structural diversity than standard synthetic chemistry and offers unique opportunities for finding novel low molecular weight lead compounds (Bindsei et al. 2001). A natural products library contributes both to the discovery of therapeutic agents and the identification of starting points for chemical optimization through medicinal chemistry. What often distinguishes the industry leaders in drug discovery and development from their competitors is the quality of the compound libraries and the accessibility that they have to the information within those libraries.

This review aims to present important points for building and managing physical natural product libraries of plant origin as a strategy for biochemical screening and drug discovery. In addition, this review intends to introduce crucial topics that are less commonly discussed in other studies, such as the pre-treatment of plant material and processes that can directly influence the phytochemical composition of the samples that will compose the library. Taking all aspects together, building a natural product library might be challenging. However, many examples of success demonstrate that biodiversity brings great opportunities to foster drug discovery by exploring the best of natural product extracts, fractions or compounds.

Search Strategy

The data in this research were collected using Science Direct, PubMed, and American Chemical Society databases, as well as the Brazilian scientific databases, such as Scielo. The search included articles published between 1992 and 2023 in internationally recognized journals that cover the following areas of research on medicinal plants and natural products: Biological and Pharmacological Activities, Natural Products Chemistry, Analytical Studies and Chemoinformatics. In addition, scientific guidelines and websites of scientific content of recognized institutions, such as the European Medicines Agency (EMA) and the Convention on Biological Diversity (CBD) were used. The main keywords used for the search were Natural Products, Natural Product Libraries, Convention on Biological Diversity, and Brazilian Biodiversity.

Discussion

Regulatory Aspects of Biodiversity

Natural product libraries generally comprise plants, marine invertebrates and/or microorganism extracts, as well as secondary metabolites, that are often collected in megadiverse regions. Many of these regions are located at least partially in the tropics or subtropics. The access and use of these biological resources must be mutually agreed between the country willing to use the resource and the country of origin of these species, which by the CBD is considered to have sovereign rights over them (Quinn 2012).

The regulations defining the need for benefit sharing between users of biological materials and their countries of origin are complex. They are framed by the United Nations 1992 Convention on Biological Diversity and the Nagoya Protocol, as well as by recent developments concerning benefit sharing linked to the use of marine genetic resources (Atanasov et al. 2021). Concerns about biodiversity loss and the recognition of its importance in supporting human life motivated the conception of the CBD, established at the United Nations Conference on Environment and Development (UNCED), held in Rio de Janeiro in June 1992. The CBD was an important milestone for the international discussion on issues related to the environment. The convention is based on three equally important and complementary pillars: the conservation of biological diversity, the sustainable use of its components, and the fair and equitable sharing of benefits derived from the use of genetic resources (MMA 2000). As a supplementary agreement to the CBD, the Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from Their Utilization (ABS) came into effect in October 2014. According to the Convention on Biological Diversity (2022), it is an international agreement that aims at the fair and equitable sharing of the benefits arising from the utilization of genetic resources. At present, the CBD and Nagoya Protocol on ABS have been ratified by sixty-seven countries, agreed upon by sixty-four, approved by three and accepted by five.

In Brazil, a legal landmark (Law 13.123/15) was published on May 20, 2015 and entered into force on November 17, 2016. The National System for the Management of Genetic Resources and Associated Traditional Knowledge (SisGen), according to Ordinance SECEX/CGEN No. 1 (2017), was created to facilitate compliance with the legislation and assist the Genetic Heritage Management Council (CGen). This legal landmark, whose scope is more comprehensive than the previous legislation (Medida Provisória 2,186–16/2001), involves research, technological development, and economic exploitation of finished products and reproductive materials from access to genetic heritage and associated traditional knowledge. The construction of the new legislation was complex, considering the different interests and visions among diverse sectors of civil society, represented by academia, the industrial sector, and holders of associated traditional knowledge, as well as the different sectors of government (Silva and Oliveira 2018).

The replacement of the previous authorization by the current registry in SisGen, which can be completed during research and technological development involving genetic heritage and associated traditional knowledge, significantly reduced the bureaucracy of research and development in Brazil and is, consequently, one of the most positive changes in the Law. However, foreign researchers will be able to access Brazilian native biodiversity only if they are associated with public or private Brazilian scientific and technological research institutions, which must take responsibility for registering the activity (Silva and Oliveira 2018). An activity conducted by foreign researchers without due association with a Brazilian institution is illegal.

Although ABS laws have their particularities, the majority include the need for formal contracts between the parties and approval by the government of the country that has the sovereign rights to the genetic resources. Negotiations can be time-consuming and require a good knowledge of the commercialization pathway, risks and rewards from all sides involved. It may take several years to negotiate appropriate benefit-sharing and access agreements (Quinn 2012).

Brazilian Bioactive Metabolites

Brazil has an extremely rich biodiversity, with approximately 20% of all known living species globally, which are found in different biomes. Six continental Brazilian biomes are defined: Amazon Forest, Cerrado, Caatinga, Atlantic Forest, Pantanal and Pampa (Valli et al. 2018). In addition, the Brazilian coastline of 7,367 km is home to three marine ecosystems and twelve large hydrographic regions, known today as the Blue Amazon (Castro et al. 2017).

For many years, Brazil has been ranked first among the 17 megadiverse countries, with 22% of the total land plants in the world (Savi et al. 2019). At present, the “Flora e Funga do Brasil” project recognizes 52,350 species (native, naturalized and cultivated), 5,028 of which are Algae, 1,617 Bryophytes, 1,412 Ferns and Lycophytes, 121 Gymnosperms and 36,002 Angiosperms (Flora e Funga do Brasil 2020). Despite the immense chemical and biological resources, few examples of natural products from Brazilian biodiversity have reached the world market of medicines, cosmetics and nutraceuticals (Valli et al. 2018).

Pilocarpine (1), an imidazole alkaloid isolated from the leaves of plants in the genus Pilocarpus, Rutaceae, is a successful example of a compound from Brazilian biodiversity used in important pharmaceutical applications, specifically in the control of elevated intraocular pressure and presbyopia (Andreazza et al. 2015; Wolffsohn et al. 2023). The commercial production of this pharmacologically active natural product is derived entirely from P. microphyllus Stapf ex Wardlew. Another successful example was the discovery of bradykinin (2), a peptide isolated from Bothrops jararaca venom with vasodilating action, which led to the development of the synthetic analog captopril (3), the first angiotensin-converting enzyme inhibitor used worldwide to treat human hypertension (Camargo et al. 2012).

figure b

Some herbal medicines from the Brazilian biodiversity were fully developed in Brazil. In 2005, Ache Laboratórios Farmacêuticos launched Acheflan®, the first phytomedicine with 100% national research and development, produced from the volatile oil of Varronia curassavica Jacq., Boraginaceae (Syns. Cordia curassavica (Jacq.) Roem. & Schult., and C. verbenacea DC.), a plant found on the coast of the Atlantic Forest, which presents anti-inflammatory, antimicrobial, and healing properties. α-Humulene (4) is the anti-inflammatory active ingredient found in the leaves (Martim et al. 2021).

figure c

A plant native to the Brazilian savanna “Cerrado”, Stryphnodendron adstringens (Mart.) Coville, Fabaceae, is traditionally used in wound healing and known as “barbatimão”. The pharmaceutical company Apsen Farmacêutica developed Fitoscar®, an ointment based on the S. adstringens dry extract indicated as a healing agent for several types of skin lesions. The dry bark extract of the species is also incorporated into liquid soaps due to its antiseptic property. Three classes of active compounds have been described: phenolic acids, flavonoids, and tannins (condensed tannins). The condensed tannins are the compounds found in the greatest proportions in this species (Ribeiro et al. 2022).

Another Brazilian native plant commonly found in the Atlantic Forest is Mikania glomerata Spreng., Asteraceae, traditionally known as “guaco”. The concentrated extract of “guaco” composes syrups and oral solutions that are widely used in the treatment of respiratory diseases, due to its bronchodilator, anti-inflammatory, and antispasmodic activities (Pasqua et al. 2019; Ormond et al. 2022). The Brazilian legislation recommends the identification of coumarin (5) as a chemical marker in M. glomerata, although other metabolites in the extract, such as dihydrocoumarin, caurenoic acid, syringaldehyde, and o-coumaric acid, act in synergism to promote its therapeutic efficacy. The root extracts of Carapichea ipecacuanha (Brot.) L.Andersson (syn. Cephaelis ipecacuanha (Brot.) Willd.), Rubiaceae, also native to the Atlantic Forest, present expectorant activity and constitute the syrup “Melagrião”® in association with the extract of the leaves of M. glomerata (Rout et al. 2000).

figure d

There are several research groups in Brazil that focus on exploring this rich biodiversity rationally. One of these is the Nucleus of Bioassays, Biosynthesis and Ecophysiology of Natural Products (NuBBE) research group, which has been involved in the latest advances in natural product chemistry, including the search for biologically active compounds from plants from different biomes and endophytic fungi (Pilon et al. 2017).

NuBBE was one of the first Brazilian natural products chemistry research groups involved in the foundation of the Virtual Institute of Biodiversity, BIOTA-FAPESP, an ongoing successful program in the state of São Paulo, Brazil, and now a biodiversity program recognized worldwide (Valli et al. 2013).

Notable compounds from Brazilian biodiversity related to NuBBE research include the cytotoxic clerodane diterpene casearin X (6) from Casearia sylvestris Sw., Salicaceae (Ferreira et al. 2016), the anxiolytic erythrina alkaloid (+)-erythravine (7) from the medicinal plant Erythrina mulungu Benth., Fabaceae (Valli et al. 2013), and (-)-spectaline (8), a rare piperidine alkaloid from Senna spectabilis (DC.) H.S. Irwin & Barneby, Fabaceae, which acts as a potent acetylcholinesterase inhibitor (Viegas et al. 2005; Freitas et al. 2018).

figure e

Faced with the greatest diversity of plants in the world, with an immense repository of unknown species, the examples of herbal medicines and bioactives from Brazilian biodiversity species show an enormous pharmacological, chemical, and economic potential to be explored. Therefore, the strategy of creating libraries based on natural products from this rich ecosystem can be an important source of new bioactive compounds for drug discovery.

Natural Products Libraries

Several modern approaches, including in silico studies such as molecular modeling, virtual screening, combinatorial chemistry, and organization of natural product libraries, have been used to improve the search for new therapeutic agents (Najmi et al. 2022). Natural product libraries are composed of flora or fauna samples, such as plants, marine organisms, and terrestrial or marine microorganisms. Once collected, samples need to be processed to result in available compounds, which allows their screening in several biological targets in vitro or testing them in phenotypical assays (Quinn 2012).

The process of establishing a natural product screening library for use in high-throughput screening (HTS) can begin with biological screening of crude extracts to identify a bioactive "hit" extract, which can subsequently be fractionated to isolate active natural products (Atanasov et al. 2021). Sometimes it is also possible to conclude that the crude extract is more biologically active than its fractions or the isolated compounds (Ekasari et al. 2022; Pourhadi et al. 2022). This cooperative effect observed in extracts is due to synergistic interactions of metabolites, that is, the combination of compounds presents greater therapeutic action than a compound individually (Geary 2013; Malongane et al. 2017).

These collections may be available commercially or through collaboration and partnership programs. To accelerate hit or lead compound discovery, accessing or purchasing libraries of extracts, fractions, and compounds from nature’s biodiversity may be an effective strategy instead of constructing a new one from scratch. The following subtopics discuss the types of natural products libraries and their characteristics.

Crude Extract Libraries

To develop a library of viable natural product extracts for HTS assays, extraction protocols must be developed so that the obtained extract represents the metabolic diversity of the source organism. In addition, the protocol must provide a good sample yield at acceptable cost and with reduced sample processing time (Wilson et al. 2020). Several available extraction methods (maceration, ultrasound-assisted solvent extraction, percolation, Soxhlet extraction, pressurized solvent extraction, and reflux extraction) can be used to obtain extracts for biological screening (Dias et al. 2021; Gori et al. 2021; Duru et al. 2022; Lin et al. 2022).

One of the great advantages of screening a crude extract library is the initial lower production cost in comparison to the generation of fractions or pure compound libraries, and the shorter time spent for obtaining and assembling it. However, some challenges for those who acquire these libraries need to be highlighted, as they are complex mixtures with variable polarity, stability, and solubility, in addition to having pigments that can interfere with HTS assay performance.

The US National Cancer Institute (NCI) developed a prominent example of a natural resource bioprospecting program. The NCI natural products collections are among the largest and most diverse in the world, containing more than 230,000 extracts derived from plant, marine, and microbial organisms that have been collected from different regions around the world. These libraries are available to the research community for the screening of extracts and the isolation of bioactive natural products (Thornburg et al. 2018). The French National Chemical Library (ChembioFrance 2023) offers a diverse collection of 15,000 original natural extracts. This large collection is available in frozen 96-well plates, in standard solutions. The whole collection is available for screening, and it is possible to select the extracts from one to several botanic families. The Brazilian collection BIOPROS Extracts Library (BEL) has 800 extracts from native species of the Atlantic Forest biome (Almeida et al. 2021). As with ChembioFrance, these extracts can be requested in the framework of scientific collaborations, through simple and fast procedures legally protected by a material transfer agreement (MTA).
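To make the 96-well plate format mentioned above concrete, the following minimal sketch maps a running sample index to a plate number and well identifier, and estimates how many plates a collection of a given size would occupy. It assumes the standard 8 × 12 layout (rows A to H, columns 1 to 12), row-by-row filling, and that every well holds a sample; in practice some wells are often reserved for controls, so real plate counts may be higher. The function names are hypothetical.

```python
import string

ROWS = string.ascii_uppercase[:8]        # 'A'..'H'
N_COLS = 12
WELLS_PER_PLATE = len(ROWS) * N_COLS     # 96

def well_position(sample_index):
    """Map a 0-based sample index to (plate_number, well_id), filling row by row."""
    if sample_index < 0:
        raise ValueError("sample_index must be non-negative")
    plate, offset = divmod(sample_index, WELLS_PER_PLATE)
    row, col = divmod(offset, N_COLS)
    return plate + 1, f"{ROWS[row]}{col + 1}"

def plates_needed(n_samples):
    """Number of plates required to hold n_samples (ceiling division)."""
    return -(-n_samples // WELLS_PER_PLATE)

print(well_position(0))        # first well of the first plate
print(well_position(96))       # first well of the second plate
print(plates_needed(15000))    # plates for a 15,000-extract collection
```

Under these assumptions, a 15,000-extract collection would fill 157 plates, and an 800-extract collection would fill 9.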

In addition to partnership programs such as NCI, ChembioFrance and Biopros, several companies commercialize extract libraries ready for biological screening. GreenPharma (2022) offers several types of NP libraries, such as the extract library GPEL. The company draws on know-how in ethnobotany, botany, pharmacology, pharmacognosy, organic chemistry, and analytical chemistry to present a variety of extracts. The GPEL library contains 200 plant species from 187 genera, corresponding to 80 families. Caithness Biotechnologies (2022) commercializes an extract library with a unique focus on plants with a record of use in traditional medicines to maximize the potential 'hit' rate. Its Phytotitre library is a collection of 800 plant extracts representing 367 plant species and 304 genera.

Fractionated Extracts Libraries

To reduce the intrinsic complexity of an extract, a second approach is to generate libraries of semi-purified fractions for screening. Crude extracts can be fractionated using conventional separation techniques, such as liquid–liquid partitioning, classical or preparative column chromatography, or even solid phase extraction (Grkovic et al. 2020). Another separation technique that has recently gained popularity in natural products chemistry is supercritical fluid chromatography (Kaplitz et al. 2022). Fractionation methods can be adjusted so that sub-fractions preferentially contain compounds with drug-like properties, for example, with moderate hydrophilicity. Such approaches can increase the number of hits compared with crude extracts and enable more efficient follow-up of promising hits (Wagenaar 2008). Several large natural product fraction libraries have been established. One of the largest also belongs to NCI, which recently launched the NCI Program for Natural Products Discovery (NPNPD) Prefractionated Library, with a goal of generating 1,000,000 fractionated samples for HTS assays (Thornburg et al. 2018). The Griffith Institute for Drug Discovery (2022) also has a rich arsenal of fractions in its collection, derived from Australian plants, fungi, and marine invertebrates. Pre-fractionated extracts are considerably less complex than crude extracts and typically show better screening performance and enhanced biological activity, because active components that are present only as minor metabolites in the crude extract become concentrated. Pre-fractionated extracts also facilitate the differentiation of active compounds from cytotoxic compounds and agonists from antagonists (Butler et al. 2014). In addition, they enable rapid isolation and structural elucidation of biologically active natural products.

Pure Natural Compounds Libraries

Based on advances in chromatographic and spectroscopic techniques, it is now possible to identify new or commercially unavailable bioactive compounds and rapidly purify them for pharmaceutical, cosmetic, agrochemical, and food additive assays.

A pure natural products library comprises a structurally diverse set of natural products with known structures and physicochemical properties, as well as high purity (Butler et al. 2014). Screening this type of library is attractive to groups that are not interested in biologically guided isolation but still wish to access the chemical diversity of natural products. One of the biggest commercial players in this area is Curia (2022), which has assembled a collection of nearly 300,000 samples derived from marine and land microorganisms and plants that are available in a screening-ready format. AnalytiCon Discovery (2022) has isolated about 20,000 different natural products and currently offers libraries with about 5,000 isolated compounds.

Selleckchem.com (2022) sells an exclusive library with 2,658 isolated natural compounds and crude extracts for HTS screening. Selleckchem has advantages over most companies that sell libraries, such as sharing rich documentation with structure, validated NMR and HPLC to ensure high purity, in addition to source description, biological activity, and even mechanism of action for some compounds. Table 1 presents a curation of natural products libraries available commercially and through partnership programs.

Table 1 Commercially and publicly available natural products libraries

Although it may seem attractive to generate large numbers of pure compounds and pre-fractionated samples for screening, considerable effort and monetary investment are required to generate, store, and perform quality control analysis of these samples (Butler et al. 2014). A cost–benefit analysis indicates that pre-fractionation provides a significant benefit in screening relative to the investment in enrichment of the screening set. The cost associated with preparing a natural product library of pure compounds is significantly greater than the effort to pre-fractionate the extracts (Quinn 2012).

Currently, the construction of internal natural product libraries, whether of extracts, fractions or isolated compounds, is not a practice commonly performed by pharmaceutical companies. In the case of isolated compound libraries, this is even more evident, owing to the long time between isolation and elucidation of the active natural structure. It is also directly related to the resupply challenge for hit-to-lead and preclinical studies. The decision whether a natural product library should be accessed commercially or built exclusively for internal use depends mainly on the strategic plan of each pharmaceutical company. These library facilities have been implemented in academic environments and biotechnology companies to track new drug leads. This tracking is established through partnerships or commercialization of these hits. Partnerships among industries, universities and research institutes are desirable, especially when it comes to innovation. In developed countries, it is already a consolidated practice. An example of this, regarding large natural product libraries, is the Natural Products Discovery Institute (NPDI), which internalized the previous libraries from Merck and Schering-Plough (Wilson et al. 2020). Alternatively, pharmaceutical companies access NP libraries commercially.

Some commercially available libraries present limitations, such as the absence of chemical characterization studies and, in some cases, the absence of the species names that compose the libraries. Such information is important for fostering the research process, and even for the decision of which libraries will be acquired, thus avoiding the potential overlap of species and genera. The pharmaceutical industry demands agility in the drug discovery steps, so the development of internal natural product libraries compatible with screening assays (HTS) may accelerate the early discovery stage of the species with biological activity interest. The following topics will address basic points and principles for building libraries based on natural products from plant origin for HTS purposes. The overview of the main steps is illustrated in Fig. 1.
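The species and genus overlap mentioned above can be checked with simple set operations once species lists are available for the candidate libraries. The sketch below is a minimal illustration under stated assumptions: both species lists are hypothetical, names are assumed to be clean "Genus species" binomials without author citations, and the function names are invented for this example.

```python
def genus_of(binomial):
    """Return the genus (first word) of a 'Genus species' name."""
    return binomial.split()[0]

def taxonomic_overlap(library_a, library_b):
    """Compare two species lists and report shared species and shared genera."""
    species_a, species_b = set(library_a), set(library_b)
    genera_a = {genus_of(name) for name in species_a}
    genera_b = {genus_of(name) for name in species_b}
    return {
        "shared_species": sorted(species_a & species_b),
        "shared_genera": sorted(genera_a & genera_b),
    }

# Hypothetical species lists for two candidate libraries (illustration only).
lib_a = ["Mikania glomerata", "Casearia sylvestris", "Senna spectabilis"]
lib_b = ["Mikania laevigata", "Casearia sylvestris", "Erythrina mulungu"]

overlap = taxonomic_overlap(lib_a, lib_b)
print(overlap["shared_species"])   # species present in both lists
print(overlap["shared_genera"])    # genera represented in both lists
```

Real species lists would first need synonym resolution against an accepted taxonomy, since the same plant may appear under different names in different catalogs.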

Fig. 1 Overview of the main steps for building natural product libraries from plants. (1) Land plants are the starting point for natural sources selection in this article. (2) Botanical identification and herbarium registration are essential requirements for all subsequent steps. (3) Legal agreements must be mutually agreed concerning diversity conservation and benefit sharing. (4) Sample collection. (5) Raw material pre-treatment: for plants, drying and grinding processes are essential to guarantee the material conservation. (6) Stored pre-treated raw material maintains compound integrity. (7) Preparation of crude plant extracts, preferably using solvents and extraction techniques in the context of green chemistry. (8) Fractionation of crude extract. (9) Isolation and identification of secondary metabolites in crude extracts and fractions. (10) Biological assay plates containing crude extracts, fractions, and isolated natural products.

Collection

The collection of plant material, botanical identification, and herbarium registration are essential requirements and the basis for all subsequent steps. It is extremely important to document information about the raw material, such as: the collection site including geographic coordinates and date; collector identification and institution; collection method; along with relevant characteristics of the habitat and taxonomic knowledge of species. Such information increases the probability that the collector will find the exact location again if further collection is required. These data are also useful to understand possible phytochemical differences between collections of the same species. In addition, the recording of all these data is crucial for the establishment of a database for traceability purposes, as well as for conserving and understanding biological diversity. There are two main approaches for the selection of candidate species to be collected. It is possible to rely on ethnopharmacological knowledge, generated from the traditional use of the species, or to carry out random collections. Both approaches have benefits and challenges.

The collection approach based on empirical knowledge has a successful track record. For instance, several active constituents, including berberine, morphine, and picroside present in the species Berberis aristata Sims., Berberidaceae, Papaver somniferum L., Papaveraceae, and Picrorhiza kurroa Royle ex Benth., Plantaginaceae, respectively, have been isolated through this approach. However, if the access to species is not aligned with the protection conventions discussed above, legal problems may arise with ethnic groups or with the country of origin of the traditional knowledge (Najmi et al. 2022).

The random approach is favorable when plant species from a region of high biodiversity need to be tracked. This strategy increases the chance of finding new species with unexpected biological activities and new constituents. This approach resulted, for example, in the discovery of taxol and camptothecin, isolated from Taxus brevifolia Nutt., Taxaceae, and Camptotheca acuminata Decne., Nyssaceae, respectively (Oberlies and Kroll 2004; Wani and Horwitz 2014). Both represent successful cases from the National Cancer Institute natural product library screening program. Although the random approach offers a good chance of success, the strategy has weaknesses since it does not provide any prior information about the biological activity of the selected species (Najmi et al. 2022).
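The collection metadata listed earlier in this section could be captured in a structured record so that every sample in the library remains traceable to its voucher and collection site. The sketch below is one possible layout, not a standard schema: the class name, field names, and all example values (voucher number, coordinates, collector, institution) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, asdict
from datetime import date

@dataclass(frozen=True)
class CollectionRecord:
    species: str
    family: str
    voucher_id: str            # herbarium registration number
    latitude: float            # decimal degrees, south is negative
    longitude: float           # decimal degrees, west is negative
    collected_on: date
    collector: str
    institution: str
    method: str
    habitat_notes: str = ""

    def __post_init__(self):
        # Reject coordinates outside the valid geographic range.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude must be between -90 and 90 degrees")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude must be between -180 and 180 degrees")

# Hypothetical example values (illustration only).
record = CollectionRecord(
    species="Mikania glomerata",
    family="Asteraceae",
    voucher_id="HYPOTHETICAL-0001",
    latitude=-23.55,
    longitude=-46.63,
    collected_on=date(2023, 1, 15),
    collector="J. Doe",
    institution="Example University",
    method="manual leaf harvest",
    habitat_notes="forest edge, partial shade",
)

row = asdict(record)   # plain dict, ready to store in a database or CSV file
print(row["species"], row["voucher_id"])
```

Making the record immutable (frozen) reflects the traceability goal: once a collection event is registered, corrections should be made by creating a new record rather than silently editing the old one.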

Raw Material Pretreatment

After the collection and identification of the species, the plant material must be pre-treated by drying and grinding, processes that can directly influence the chemical composition of the extracts. Drying removes water from both the surface and the interior of the plant material. It aims to stop the metabolic processes of plant tissues, avoiding possible degradation of bioactive compounds as well as the development of fungi and bacteria during storage. Several drying methods have been developed, aiming to improve the quality of the dried plant material, the retention of active compounds, volatile or not, energy conservation, and process efficiency.
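Since drying is about removing water, a simple way to follow the process is to weigh the material before and during drying. The sketch below computes moisture content on a wet basis, that is, the mass lost on drying as a percentage of the fresh mass, and checks whether successive weighings have stabilized. The function names, the example masses, and the 0.01 g tolerance are assumptions made for illustration; acceptable tolerances depend on the balance and the protocol in use.

```python
def moisture_content_wet_basis(fresh_mass_g, dry_mass_g):
    """Percent moisture relative to the fresh (wet) mass."""
    if fresh_mass_g <= 0:
        raise ValueError("fresh mass must be positive")
    if not 0 <= dry_mass_g <= fresh_mass_g:
        raise ValueError("dry mass must be between 0 and the fresh mass")
    return 100.0 * (fresh_mass_g - dry_mass_g) / fresh_mass_g

def reached_constant_mass(masses_g, tolerance_g=0.01):
    """True when the last two weighings differ by no more than tolerance_g."""
    if len(masses_g) < 2:
        return False
    return abs(masses_g[-1] - masses_g[-2]) <= tolerance_g

# Hypothetical weighings of one sample during drying, in grams.
weighings = [100.0, 45.0, 30.5, 30.0, 30.0]

if reached_constant_mass(weighings):
    moisture = moisture_content_wet_basis(weighings[0], weighings[-1])
    print(f"moisture content (wet basis): {moisture:.1f}%")
```

Recording this value alongside the drying method makes it easier to compare batches of the same species that were dried under different conditions.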

Drying can be performed outdoors, in semi-open environments with air circulation or in rooms with heating methods, where it is possible to define the temperature and air pressure range (Thamkaew et al. 2021). In addition to these methods of raw material conservation, modern and effective alternatives can be used on a laboratory scale, such as convection drying, microwave vacuum drying and molecular drying, known as lyophilization (Krakowska-Sieprawska et al. 2022).

Hot-air oven drying (temperature range of 40–60 °C) is the most common drying method used in plant studies in lab-scale experiments (Shaw et al. 2016). The major advantage of hot-air drying is the capacity to control process parameters such as drying temperature, drying time, and air velocity. These parameters can be adjusted to achieve the desired material properties. However, care is needed because high temperatures can lead to aroma and pigment degradation and to the evaporation of volatile actives. Another major disadvantage of hot-air drying is the high energy consumption (Thamkaew et al. 2021).

Convection drying is one in which a current of drying agent (dry gas, in most cases, air) flows around the plant material bringing heat and diminishing moisture. This method is more often used on an industrial scale and its main advantage is the possibility of obtaining a relatively cheap product. However, its disadvantage is the combination of long drying time and high temperature which can, in some cases, lead to the degradation of thermolabile bioactive compounds (Krakowska-Sieprawska et al. 2022).

Sun drying is the oldest drying method and is still used to dry medicinal plants and aromatic herbs in most tropical or sub-tropical countries. During the process, fresh plants are exposed directly to sunlight (Janjai and Bala 2012). Studies show that sun drying causes color degradation and a decrease in volatile components in plants when compared to hot-air drying or shade drying (Omidbaigi et al. 2007; Hassanpouraghdam et al. 2010). Sun drying can also cause damage to the epidermal surface and glandular trichomes (Alara et al. 2018).

Shade drying is another method of drying plants. Ventilated air is heated using solar energy before passing through the plants housed in a room with ventilation, low humidity (22–27%) and no direct exposure to sunlight. This method offers advantages due to its ability to preserve light-sensitive substances and minimize light-induced chemical reactions such as oxidation (Thamkaew et al. 2021). For many types of plants, studies have observed that shade drying preserves essential oil content, bioactive compounds, and coloration better than other methods such as hot-air drying, sun drying, microwave drying and freeze-drying (Omidbaigi et al. 2007; Hassanpouraghdam et al. 2010; Ebadi et al. 2015).

Two other drying methods that have been widely reported are freeze-drying and microwave drying. Lyophilization enables a low operating temperature, preserving the aroma of leaves, chemical composition and extraction yield in species such as Mentha spicata L. and Ocimum basilicum L., both from the Lamiaceae family, as well as Coriandrum sativum L., Apiaceae (Antal et al. 2011; Pirbalouti et al. 2013; Ghasemi Pirbalouti et al. 2017). Microwaves, on the other hand, provide shorter drying times than convective drying, resulting in a greater retention of phenolic compounds and flavonoid content, as observed in the leaves of Gynura pseudochina (L.) DC., Asteraceae (Sukadeetad et al. 2018).

Natural products libraries are composed of an arsenal of different plant species. It is therefore necessary to consider the anatomical characteristics of each species when choosing the drying method and defining the drying parameters. The milling process also influences the extraction efficiency. It is recommended that the particle size of the plant material be around 0.2 mm. This degree of fragmentation increases the contact surface of the sample with the extracting solvent, increasing the homogeneity of the sample and the reproducibility of the extraction (Krakowska-Sieprawska et al. 2022). Gião et al. (2009) studied the effect of particle size on the extent of extraction of phenolic compounds in three important medicinal plants: agrimony (Agrimonia spp. L., Rosaceae), sage (Salvia officinalis L., Lamiaceae), and savory (Satureja spp. L., Lamiaceae). An increase in the antioxidant capacity of the extracts was verified when smaller particle sizes were used.
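The link between particle size and contact surface can be illustrated with a simple geometric estimate. The sketch below assumes idealized spherical particles and a placeholder density of 1.0 g/cm³; neither assumption comes from the cited studies, and real milled plant material is irregular and porous. Under these assumptions, the surface area per gram is 6/(density × diameter), so it grows in inverse proportion to particle diameter.

```python
def specific_surface_area(diameter_mm: float, density_g_cm3: float = 1.0) -> float:
    """Surface area per gram (cm^2/g) for idealized spherical particles.

    Assumption: spheres of uniform diameter; the default density is a
    placeholder, not a measured value for any plant material.
    """
    if diameter_mm <= 0 or density_g_cm3 <= 0:
        raise ValueError("diameter and density must be positive")
    diameter_cm = diameter_mm / 10.0
    # Area / mass for a sphere = (pi d^2) / (rho * pi d^3 / 6) = 6 / (rho * d)
    return 6.0 / (density_g_cm3 * diameter_cm)


if __name__ == "__main__":
    for d in (2.0, 1.0, 0.5, 0.2):
        print(f"{d:.1f} mm -> {specific_surface_area(d):.0f} cm^2/g")
```

In this idealized picture, grinding from 2 mm down to 0.2 mm multiplies the solvent-accessible surface per gram by ten.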

Storage and Sample Quality

For a natural products library, it is essential to have a certain amount of intact pre-treated raw material stored, since compound degradation can occur in extracts and fractions which are stored for a long time. Fortunately, current strategies for collecting biota samples demand much less material than was previously required by screening programs. Dried and ground plant samples can keep the compounds' integrity, allowing new extract preparations without the need of new collections. The raw material must be stored, individually, in properly sealed and identified containers and their location must be recorded in a database system. These containers must be placed in a controlled humidity environment since high humidity can result in contamination by fungi or other microorganisms.
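A minimal sketch of how the container and location records described above might be represented in a database layer is given below. All field names, identifiers and the `Inventory` class are hypothetical and chosen only for illustration; a production system would typically sit on top of a LIMS or relational database.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RawMaterialRecord:
    """One sealed container of dried, ground plant material (hypothetical schema)."""
    container_id: str       # label on the sealed container
    sample_id: str          # collection / voucher identifier
    species: str
    plant_part: str
    collection_date: date
    storage_location: str   # e.g. room / shelf / box
    mass_g: float


class Inventory:
    """In-memory stand-in for the database that tracks container locations."""

    def __init__(self):
        self._records = {}

    def register(self, record: RawMaterialRecord) -> None:
        # Each container is stored individually, so IDs must be unique.
        if record.container_id in self._records:
            raise ValueError(f"duplicate container: {record.container_id}")
        self._records[record.container_id] = record

    def locate(self, sample_id: str) -> list:
        """Return the storage locations of all containers holding a sample."""
        return sorted(
            r.storage_location
            for r in self._records.values()
            if r.sample_id == sample_id
        )


if __name__ == "__main__":
    inv = Inventory()
    inv.register(RawMaterialRecord(
        "C-0001", "S-001", "Taxus brevifolia", "bark",
        date(2023, 5, 2), "Room A / Shelf 3 / Box 12", 250.0,
    ))
    print(inv.locate("S-001"))
```

Keeping the container, not the species, as the unit of record makes it straightforward to retrieve material for a new extract preparation without a new collection.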

For the storage of extracts and fractions, there are two commonly practiced techniques: dry conditions and samples in solution. In the first case, the dry sample must be stored in Eppendorf or Falcon conical tubes, kept between -20 and 20 °C within a nitrogen or low humidity atmosphere (Quinn 2012). In the second technique, samples in solution must be kept preferably in 96-, 384- or 1,536-well microplates at -20 °C (Mishra et al. 2008). This is the most common practice used by companies that sell libraries.

High-throughput screening assays are generally conducted on samples dissolved in dimethyl sulfoxide (DMSO), an aprotic solvent with intermediate polarity that is capable of dissolving a variety of polar and non-polar compounds (Najmi et al. 2022). A challenge for this type of storage is the hygroscopic nature of DMSO, which can lead to precipitation of the sample as water is absorbed from the air (Waybright et al. 2009). Sample oxidation may also occur, in addition to precipitation due to freezing and thawing cycles. If screening samples are stored under dry conditions, efficient resolubilization is particularly important. Waybright et al. (2009) found that the addition of glycerol, a nonvolatile cosolvent, simplifies solubilization of samples. A solvent combination of DMSO/glycerol/water (45:45:10, v/v) would avoid changes in water content in samples, though it did not allow high stock concentrations.
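As a worked example of the 45:45:10 (v/v) ratio, the following sketch computes the volume of each component for a given total volume. The function name is hypothetical, and the calculation assumes volumes are additive on mixing, which is only approximately true for DMSO/water mixtures.

```python
def cosolvent_volumes(total_ul: float, ratio=(45, 45, 10)) -> dict:
    """Split a total volume (in microliters) into DMSO/glycerol/water parts.

    Assumption: ideal (additive) mixing; the default ratio is the
    45:45:10 v/v mixture mentioned in the text.
    """
    if total_ul <= 0:
        raise ValueError("total volume must be positive")
    names = ("DMSO", "glycerol", "water")
    parts = sum(ratio)
    return {name: total_ul * r / parts for name, r in zip(names, ratio)}


if __name__ == "__main__":
    # 1 mL of storage solvent
    print(cosolvent_volumes(1000))
```

For 1,000 µL of storage solvent this gives 450 µL DMSO, 450 µL glycerol and 100 µL water.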

Extraction

Plants contain a wide variety of compounds with differences in their physicochemical properties, so there is no single solvent that solubilizes all of them. Therefore, plant material extraction can be performed using a variety of polar and non-polar solvents. In general, the chemical classes of the metabolites present in an extract can be anticipated from the polarity of the solvent used. Lipophilic compounds (low polarity constituents) such as oils, steroids, fatty acids, hydrocarbons, and low polarity terpenoids are extracted in nonpolar solvents such as hexane and ether, while compounds with medium polarity, such as phenolics and alkaloids, are usually present in ethyl acetate and chloroform extracts. Highly oxygenated and high polarity compounds, such as saponins, flavonoids, and glycosidic alkaloids, are generally obtained in aqueous, hydroalcoholic, ethanolic and methanolic extracts (Najmi et al. 2022).
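The solvent-to-class relationship described above can be captured as a small lookup table, for example to annotate extracts in a library database. The sketch below only transcribes the examples given in this paragraph; the band names, dictionary layout and function name are hypothetical, and the mapping is a rule of thumb rather than a prediction of extract composition.

```python
# Polarity bands and example compound classes as listed in the text
# (after Najmi et al. 2022). Keys and structure are illustrative.
POLARITY_BANDS = {
    "low": {
        "solvents": ("hexane", "ether"),
        "classes": ("oils", "steroids", "fatty acids", "hydrocarbons",
                    "low polarity terpenoids"),
    },
    "medium": {
        "solvents": ("ethyl acetate", "chloroform"),
        "classes": ("phenolics", "alkaloids"),
    },
    "high": {
        "solvents": ("water", "ethanol/water", "ethanol", "methanol"),
        "classes": ("saponins", "flavonoids", "glycosidic alkaloids"),
    },
}


def expected_classes(solvent: str):
    """Return (polarity band, typical compound classes) for a solvent."""
    key = solvent.strip().lower()
    for band, info in POLARITY_BANDS.items():
        if key in info["solvents"]:
            return band, info["classes"]
    raise KeyError(f"solvent not covered by this sketch: {solvent!r}")


if __name__ == "__main__":
    print(expected_classes("Ethyl acetate"))
```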

The extraction method can strongly influence the chemical composition, and thus the biological activity of the extract. Therefore, the selection of the extraction method, including the solvent, needs to be carefully considered. Extraction methods developed to increase extraction efficiency include the use of ultrasound (Dias et al. 2021), microwave pressure (Lin et al. 2022), supercritical fluid (Kaplitz et al. 2022), ionic liquids (Tang et al. 2012), and deep eutectic solvents (Duru et al. 2022). Although these newer methods can be useful, a simple method such as extraction with slightly aqueous EtOH using gentle agitation captures a broad range of bioactive molecules and is usually more than adequate for biological screening purposes. Another popular extraction solvent for screening libraries is ethyl acetate, which extracts less polar material compared to ethanol, resulting in cleaner extracts of mid-polarity compounds (Butler et al. 2014).

Extraction conditions for bioactive compounds of interest can be subsequently optimized upon re-extraction or scale-up studies. For production feasibility, the range of safe solvents is more restricted to ensure patient and consumer safety. According to the European Medicines Agency (EMA 2021), the scientific guideline ICH Q3C (R8) recommends the use of less toxic solvents and describes levels considered to be toxicologically acceptable for some residual solvents. Beyond ICH Q3C, many institutions and companies have issued their own solvent toxicity guidelines. Prat et al. (2014) compared several of these guides and compiled them into a table. Marco et al. (2019) discussed the use of safer solvents and auxiliaries in the context of green chemistry and presented a comparison of these guidelines that classifies solvents into six categories: “recommended”, “recommended or problematic?”, “problematic”, “problematic or hazardous?”, “hazardous” and “highly hazardous”.

The distinction between hazardous and highly hazardous is a source of endless discussions, complicated by the fact that only one pharmaceutical company (Sanofi) has a published list of “banned” solvents (Prat et al. 2014). Three solvents were, for example, classified as highly hazardous: hexane, nitromethane and pentane. Other solvents were classified as hazardous, such as diethyl ether, chloroform, dichloromethane, among others (Prat et al. 2014; Marco et al. 2019).

Among the recommended solvents are water, ethanol, isopropyl alcohol, n-butanol, ethyl acetate, isopropyl acetate, and sulfolane. The use of organic solvents and other reagents should be avoided when possible. When this is not possible, these substances should be innocuous and controlled.
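As an illustration of how such a classification might be applied when planning an extraction protocol, the sketch below groups a list of candidate solvents by category. It contains only the solvents named in this section with the categories stated here; the function and variable names are hypothetical, and any real use should consult the full published guides.

```python
# Categories as stated in the text (Prat et al. 2014; Marco et al. 2019).
# Only solvents explicitly named in this section are included.
SOLVENT_CATEGORY = {
    "water": "recommended",
    "ethanol": "recommended",
    "isopropyl alcohol": "recommended",
    "n-butanol": "recommended",
    "ethyl acetate": "recommended",
    "isopropyl acetate": "recommended",
    "sulfolane": "recommended",
    "diethyl ether": "hazardous",
    "chloroform": "hazardous",
    "dichloromethane": "hazardous",
    "hexane": "highly hazardous",
    "nitromethane": "highly hazardous",
    "pentane": "highly hazardous",
}


def screen_solvents(solvents):
    """Group solvents by category; unknown names go under 'not listed'."""
    grouped = {}
    for name in solvents:
        category = SOLVENT_CATEGORY.get(name.strip().lower(), "not listed")
        grouped.setdefault(category, []).append(name)
    return grouped


if __name__ == "__main__":
    protocol = ["Hexane", "Ethyl acetate", "Ethanol", "Acetone"]
    for category, names in screen_solvents(protocol).items():
        print(f"{category}: {', '.join(names)}")
```

A check of this kind flags, for instance, that a classical hexane defatting step falls in the “highly hazardous” category and may need a substitute before scale-up.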

Fractionation and Isolation

Building a library of fractions from natural extracts presents several advantages, including the removal or separation of highly polar and nonpolar materials, concentration of minor metabolites, and separation of undesirable compounds. There are different prefractionation methods, ranging from near-complete purification to rough polarity separation into a small number of fractions. Prefractionation involves the separation of crude extracts by solid phase extraction (SPE), column chromatography, high performance liquid chromatography (HPLC), liquid–liquid partitioning, or some combination of the above to obtain fractions containing simpler mixtures (Quinn 2012). Appleton et al. (2007) have taken the approach of using reverse-phase C18 HPLC to collect four fractions starting from the solvent front peak. Eldridge et al. (2002) performed an organic extraction followed by silica flash chromatography and an aqueous extract followed by C18 flash chromatography. Subsequent HPLC collected forty fractions from each of the four flash fractions from the organic extract. Regarding the aqueous extract, it was passed through C18 and polyamide, and the single aqueous flash fraction was submitted to HPLC, resulting in forty fractions. The HPLC fractions that contained quantifiable compounds, approximately 60% of the total, constituted the library. This process led to a library of 36,000 fractions containing one to five compounds per well.

The NCI Program for Natural Product Discovery (NPNPD) Prefractionated Library may have the largest portfolio of fractions based on natural products in the world. A library of over 125,000 extracts was pre-fractionated using SPE, with each crude extract separated into seven fractions, resulting in a library of > 1,000,000 fractions (Thornburg et al. 2018; Martínez-Fructuoso et al. 2023). Another approach is to purify as many components of an extract as possible and to determine the structure of each compound. This approach has two main advantages: 1) the isolated compounds can be treated the same as any other compound in a library; and 2) an immediate assessment of the compound’s potential can be made during hit evaluation, thus eliminating the time delay between hit extract identification and isolation of the hit natural product (Abel et al. 2002).

MEGAbolite®, a project initiated by AnalytiCon Discovery, in collaboration with Aventis, generated a collection of 4,000 pure NPs using high-throughput profiling, isolation, and structure elucidation technologies. The isolation and characterization of compounds were performed independently of the biological activity. Pre-fractionation was performed by flash chromatography, isolation by semi-preparative HPLC and structural elucidation by NMR. The non-redundancy within the collection was secured by characterizing each compound by its Liquid Chromatography coupled to Mass Spectrometry (LCMS) profile. To date, 60% of the compounds generated by the project (2,400 compounds) have undergone HTS against nine different targets. Pure natural product libraries showed higher hit rates than synthetic compound libraries in five out of nine assays. This finding suggests that the use of natural products as sources of new drugs would be the ideal approach for modern drug screening (Bindsei et al. 2001).

From the perspective of biologically guided isolation, Grkovic et al. (2020) developed an automated, high-capacity, and high-throughput procedure for the rapid isolation and identification of biologically active natural products from a pre-fractionated library. The semipreparative HPLC method uses 1 mg of the primary hit fraction and produces 22 subfractions in an assay-ready format. Following screening, all active fractions are analyzed by NMR, LCMS and Fourier transform infrared spectroscopy (FTIR), and the active principle structural classes are elucidated. These approaches have the potential to significantly reduce the time for identification of bioactive natural products, optimize screening costs, and enable faster outcomes for evaluating and identifying natural products in HTS (Bindsei et al. 2001). The biggest disadvantage of this approach is that minor compounds can be overlooked and, therefore, never reach a screening set (Quinn 2012). From an economic point of view, an optimized sample preparation in the form of isolated pure compound includes significant investments before screening. However, the overall process from screening to a validated lead is faster and less expensive when pure natural products libraries are used as primary raw materials and not crude extracts.

Analytical Techniques

Recent advances such as high-performance computer hardware, extensive storage, accessible software and affordable online education have democratized the use of Artificial Intelligence in many sectors and research areas. The last decades have introduced natural language processing and machine learning algorithms, two subfields of Artificial Intelligence, to tackle natural products drug discovery challenges and open up opportunities (Saldívar-Gonzáles et al. 2022).

Scientists have strategized different approaches to reduce redundant natural crude extracts with the early chemical profiling of untargeted natural products. They have notably prioritized natural crude extracts using analytical chemistry techniques, i.e., gas/liquid chromatography, nuclear magnetic resonance spectroscopy, mass spectrometry, and combinations thereof (Saldívar-Gonzáles et al. 2022).

The increasing data digitization has enabled the implementation of mathematical and statistical methods. The field of chemometrics has leveraged the multivariate statistical analysis of data from the aforementioned techniques and from the optical radiation (i.e., infrared, visible and ultraviolet) for the rapid identification of known and unknown bioactive natural products from natural crude extracts (Saldívar-Gonzáles et al. 2022).

Artificial Intelligence and Machine Learning algorithms have gradually been integrated into different stages of natural products drug discovery, both to assist in discovering and elucidating bioactive structures and to capture the molecular patterns of these privileged structures for molecular design and target selectivity.

Applications powered by Machine Learning, a subfield of Artificial Intelligence, have accelerated the discovery of new natural chemical candidates. The inclusion of new Artificial Intelligence technologies started at the turn of the twenty-first century with the encoding of natural product structures into computer-readable formats and the generation of chemical space visualization methods to manage and interpret the many naturally occurring compounds present in publicly available databases (Saldívar-Gonzáles et al. 2022).

In the 2010s, the development of Machine Learning models, i.e., regressions and classifications, to predict the biological activity/property of natural products has pushed candidates towards more advanced stages of drug development. It is worth noting that many predictions might inadvertently discard several bioactive natural products due to their striking physicochemical and structural differences with the model training sets. The limitations of these predictive models, also known as the applicability domains, are not systematically identified. Future improvements of predictive Machine Learning models should include an understanding of the scope and limitations of the available data (Saldívar-Gonzáles et al. 2022).
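One common way to make an applicability domain explicit is to check how similar a query compound is to the model's training set before trusting a prediction. The sketch below is a simplified illustration and not the method of any cited study: fingerprints are represented as plain sets of hypothetical feature indices (in practice they would come from a cheminformatics toolkit), similarity is the Tanimoto coefficient, and the 0.3 threshold is an arbitrary placeholder.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def in_applicability_domain(query, training_set, threshold=0.3) -> bool:
    """True if the query's nearest training compound is similar enough.

    Assumption: nearest-neighbor similarity is used as the domain
    criterion; the threshold is illustrative and must be tuned per model.
    """
    nearest = max((tanimoto(query, fp) for fp in training_set), default=0.0)
    return nearest >= threshold


if __name__ == "__main__":
    # Hypothetical fingerprints: each integer stands for one structural feature.
    training = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 5})]
    familiar = frozenset({1, 2, 3})      # shares most features with training
    unusual = frozenset({7, 8, 9})       # no overlap with training
    print(in_applicability_domain(familiar, training))
    print(in_applicability_domain(unusual, training))
```

Flagging out-of-domain natural products for experimental follow-up, rather than silently scoring them, is one way to avoid discarding structurally unusual but potentially bioactive compounds.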

Obtaining precise information on the complete set of small chemicals or metabolomes of complex natural extracts that are primarily obtained from plants or microorganisms is a challenging task that requires sophisticated and advanced analytical methods. Advances in analytical instrumentation used in natural products research, associated with computational tools, have allowed the application of 'omics' approaches, such as metabolomics, in natural products-based drug discovery (Liu and Locasale 2017).

Metabolomics was developed as an approach to analyze multiple metabolites simultaneously in biological samples. Enabled by technological developments in chromatography and spectrometry, metabolomics was first applied in other areas of research, such as biomedical and agricultural sciences (Harvey et al. 2015). Metabolomics can provide accurate information about metabolite composition in extracts, thereby helping to prioritize natural products for isolation, accelerate dereplication, and discover unknown analogues and new scaffolds. Metabolomics can also detect differences between metabolite compositions in various physiological states of producer organisms and thus provide a means of monitoring the production of target molecules during production processes (Atanasov et al. 2021).

With the significant development that is occurring in metabolomics for biology and natural product research, dereplication analysis is gaining more and more importance from both targeted and untargeted analytical perspectives. Dereplication is the process of testing sample mixtures that are active in screening to differentiate the novel compounds from active known substances (Atanasov et al. 2021). Another important outcome of dereplication is the identification of multiple extracts or fractions that contain the same active component or biological profile. Dereplication aims to identify known compounds before proceeding to isolation (Quinn 2012).

Due to the intrinsic chemical diversity of natural products, a single analytical technique is not enough for a comprehensive analysis of a complex metabolome, requiring the use of multiple technologies. For metabolite identification, extracts are analyzed by nuclear magnetic resonance (NMR) spectroscopy, high-resolution mass spectrometry (HRMS), or respective combined methods involving liquid chromatography (LC) and gas chromatography (GC). Liquid chromatography coupled to mass spectrometry is the most used technique and can separate numerous isomers present in botanical extracts (Wolfender et al. 2015). Due to its high sensitivity, HRMS detection is the gold standard for qualitative and quantitative metabolite profiling. Historically, the first databases were developed for electron ionization mass spectrometry (EI-MS) because this technique results in reproducible spectral patterns on different mass spectrometers. More recently, the use of high-resolution instruments has significantly improved dereplication procedures, allowing the use of generic databases through molecular formula searches. Various detection strategies have been employed, using MS (HRMS) and MS/MS (Wolfender et al. 2015).

NMR analysis of extracts is simple and reproducible, and provides direct quantitative information and detailed structural information, although it has relatively low sensitivity, meaning that it generally enables profiling only of major constituents. This flexible technique is used both directly for metabolomics of unfractionated extracts and for structural characterization of compounds and fractions obtained with appropriate separation methods, most often LC (Atanasov et al. 2021). After obtaining the data, the process of unequivocally identifying the metabolites can also be challenging. The determination of molecular mass and formula, combined with cross-searching in the literature or in structural natural products databases, is very helpful in this process.
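The mass-based cross-search described above is usually expressed as a tolerance in parts per million (ppm) between the observed and the theoretical m/z. The sketch below shows that calculation against a tiny in-memory list; the compound names and m/z values are hypothetical placeholders, and the 5 ppm tolerance is an illustrative choice that depends on the instrument.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_masses(observed_mz: float, candidates: dict, tol_ppm: float = 5.0) -> list:
    """Return names of database entries within the ppm tolerance.

    `candidates` maps a compound name to its theoretical m/z for the
    same adduct as the observed ion (an assumption of this sketch).
    """
    return sorted(
        name
        for name, mz in candidates.items()
        if abs(ppm_error(observed_mz, mz)) <= tol_ppm
    )


if __name__ == "__main__":
    # Hypothetical library entries, for illustration only.
    library = {
        "compound_A": 349.1183,
        "compound_B": 854.3383,
        "compound_C": 349.1500,
    }
    print(match_masses(349.1190, library))
```

An accurate mass match alone does not distinguish isomers, which is why it is typically combined with MS/MS or NMR data before a compound is declared known.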

To accelerate the identification of bioactive NPs in extracts, several useful platforms have been created for metabolite identification, such as the Dictionary of Natural Products, which encompasses all structures reported with links to their biological sources (Wolfender et al. 2019). Another example is the Global Natural Products Social Molecular Networking (GNPS) platform developed in the Dorrestein laboratory, an open-access knowledge base for community-wide organization and sharing of raw, processed or identified tandem mass spectrometry (MS/MS) data (Wang et al. 2016). The METLIN platform includes a high-resolution MS/MS database with a fragment similarity search function. The platform was designed to address the characterization of known and unknown molecules (Guijas et al. 2018). Other databases and in silico tools such as CSI:FingerID (Compound Structure Identification) and Input Output Kernel Regression (IOKR) can be used to search available fragment ion spectra, as well as to generate predicted spectra of fragment ions not present in current databases (Aksenov et al. 2017). Other free chemical structure databases are ChemSpider, PubChem, and Chemical Abstracts Service (CAS), which together contain more than 259 million structures and provide physicochemical information that has been compiled from hundreds of data sources (Aksenov et al. 2017).
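Fragment similarity searches and molecular networking both rest on a score that compares two MS/MS spectra. A plain cosine score over binned peaks is sketched below to illustrate the idea. This is a simplification and not the implementation used by GNPS or METLIN (GNPS, for instance, uses a modified cosine that also accounts for precursor mass differences); the function names and the 0.01 m/z bin width are assumptions for illustration.

```python
import math


def bin_spectrum(peaks, bin_width: float = 0.01) -> dict:
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(spec_a, spec_b, bin_width: float = 0.01) -> float:
    """Cosine similarity between two spectra; 0 = no shared peaks, 1 = identical.

    Note: fixed bins can split peaks that fall near a bin edge, a known
    limitation of this simple approach.
    """
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical fragment spectra as (m/z, relative intensity) pairs.
    query = [(105.03, 40.0), (286.11, 100.0), (509.20, 25.0)]
    reference = [(105.03, 35.0), (286.11, 100.0), (569.22, 10.0)]
    print(f"cosine = {cosine_score(query, reference):.3f}")
```

In a molecular network, spectra whose pairwise score exceeds a chosen cutoff are connected by an edge, so structurally related metabolites tend to cluster together.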

Databases are key tools for investigating the incredibly diverse chemistry that is associated with natural products. With modern computer power, increasingly sensitive and faster analytical techniques, and a better understanding of algorithms it is becoming easier to understand the data that are being obtained from global MS inventories. All these tools together contribute to the precise identification of the chemical composition of plant extracts.

Perspectives and Future Directions

Research with natural products presents challenges for drug discovery, such as access and use of biological resources, technical barriers to screening, isolation, phytochemical characterization, and resupply. Therefore, different approaches were presented as possible for a natural product-based drug discovery process, as well as strategies used to assemble and manage natural product libraries of plant origin. We expect the curation of commercially available natural products libraries presented in the article, as well as all comments with a pharmaceutical industry perspective, will contribute to the discovery of new bioactive natural products.

Conclusions

Natural products have historically proven their value as a source of molecules with therapeutic potential. This review has presented several studies emphasizing that natural products and their derivatives play an important role in the drug discovery and development process. The introduction of high-throughput screening and the miniaturization of bioassays have created a need to optimize natural product samples to better suit these new technologies. Although natural product screening is an old and very successful practice, the efficiency of tracking new drug leads is tied to advances in creating high-quality libraries with the ability to extract, fractionate, isolate, identify, and rapidly replenish pharmacologically active compounds.

For natural products libraries from plants, the collection of plant material, botanical identification, herbarium registration, and mutually agreed upon terms with each participating source country (when biota are collected internationally) are essential requirements and the basis for all subsequent steps. Analytical techniques are fundamental for the construction and follow-up of natural products libraries. For the construction of crude extract libraries, the use of hyphenated techniques such as LC–MS, GC–MS and LC–NMR determines the extracts' fingerprint and guarantees non-replication and non-redundancy in the species that make up the collection. In addition, the techniques can be used in sample quality control during storage, stability, and solubility studies. For the construction of libraries of fractions and isolated natural compounds, the use of chromatographic methods allows the isolation and structural elucidation of the compounds.

This review has presented some of the recent advances in those technologies and has shown how, when applied in combination, they can facilitate an efficient process for building natural product-based libraries for HTS. We hope that this review will help researchers to consider all of these aspects when creating or acquiring screening libraries and that some of the technologies described here will contribute to the discovery of novel bioactive natural products.