Introduction

Actinobacteria are a group of aerobic, spore-forming Gram-positive bacteria belonging to the order Actinomycetales. The GC content of their DNA ranges from just under 50% (e.g., Hoyosella and Tropheryma) to over 70% (e.g., Frankia and Streptomyces), as recorded from 16S ribosomal cataloging and DNA-rRNA pairing studies (Korn-Wendisch & Schneider, 1992). The group now represents one of the largest taxonomic groups recognized within the domain Bacteria (Ventura et al., 2007). The name Actinobacteria originates from the Greek “aktis” (a ray) and “mykes” (fungus), reflecting features shared with both bacteria and fungi (Das & Khosla, 2009), but the group has sufficiently distinct features to place it in the kingdom Bacteria.

Actinobacteria are generally aerobic; however, some genera are facultatively or obligately anaerobic. They are chemoheterotrophs that use diverse energy sources and complex polymers. Most Actinobacteria are free-living in a wide range of natural habitats such as water (Ma et al., 2009), soil (Elbendary et al., 2018), the greatest depths of the ocean (Pathom-Aree et al., 2006), Antarctica (Rego et al., 2019), and desert soil (Busarakam et al., 2016). Actinobacteria have been isolated from all layers of soil, but their numbers gradually decrease with increasing depth (Takahashi & Omura, 2003). In addition, some are pathogens of humans (Könönen & Wade, 2015), animals (Ertaş et al., 2005), and plants (Lerat, 2009). Actinobacteria have different morphological features that range from relatively simple rods and cocci to a complex mycelial organization similar to that of eukaryotes. They were once considered an intermediate group between bacteria and fungi but are now recognized as prokaryotic organisms. Actinobacteria are Gram stain-positive or Gram stain-variable. The cell wall of Actinobacteria is rigid, which maintains the shape of the cell and prevents bursting at high osmotic pressure (Colquhoun et al., 1998). It is formed of different complex compounds such as peptidoglycan, teichoic acid, teichuronic acid, and polysaccharides (Davenport et al., 2000). Interest in the identification of Actinobacteria has grown and continues to grow, as it drives further exploration of new natural compounds. In this review, we highlight the procedures for identifying rare Actinobacteria and manipulating their biosynthetic gene clusters (Fig. 1).

Fig. 1

Identification of bioactive rare Actinobacteria. The flow chart shows the steps for identifying Actinobacteria at the genus level and for identifying their biosynthetic genes. It also gives different options for manipulating these genes and hence monitoring the biosynthetic capability of the Actinobacteria.

Taxonomy of Actinobacteria

According to the first edition of Bergey’s Manual of Systematic Bacteriology, Actinobacteria belonged to the order Actinomycetales and were subdivided into 4 families: Streptomycetaceae, Actinomycetaceae, Actinoplanaceae, and Mycobacteriaceae. The taxonomy of Actinobacteria has evolved considerably over time as information has accumulated. In the second edition of Bergey’s Manual of Systematic Bacteriology, Actinobacteria were treated separately in the fifth volume. Phylum Actinobacteria is divided into 6 classes: Actinobacteria, Acidimicrobiia, Coriobacteriia, Nitriliruptoria, Rubrobacteria, and Thermoleophilia. Class Actinobacteria is subdivided into 16 orders: Actinomycetales, Actinopolysporales, Bifidobacteriales, Catenulisporales, Corynebacteriales, Frankiales, Glycomycetales, Jiangellales, Kineosporiales, Micrococcales, Micromonosporales, Propionibacteriales, Pseudonocardiales, Streptomycetales, Streptosporangiales, and Incertae sedis.

Bergey’s Manual of Systematics of Archaea and Bacteria lists the phylum Actinobacteria as including 5 classes, 19 orders, 50 families, and 221 genera. However, many novel taxa continue to be discovered, so this listing is certainly incomplete. The class Actinobacteria and the fundamental taxonomic ranks above the genus level were proposed exclusively on the basis of 16S rRNA gene sequence-based groupings and taxon-specific 16S rRNA gene signature sequences. This represented a clear change in the classification of Actinobacteria above the genus level, as it showed that previous classifications based on form and function did not reflect natural relationships. Actinobacteria have been assigned the rank of a phylum because the phylogenetic depth of the lineage, judged by its branching position in 16S rRNA gene trees, resembles that of existing phyla (Barka et al., 2016).

Basics of Actinobacteria identification

Originally, classical approaches for the identification of Actinobacteria were based on morphological observations, chemotaxonomy, and physiological criteria. Morphological observations were used to identify an unknown strain to the genus level and included the presence of aerial mycelium, the color of the substrate mycelium, the color of the aerial mycelium, the ornamentation of spores, and the production of soluble pigments (Barka et al., 2016; Shirling & Gottlieb, 1966; Messaoudi et al., 2015; Amin et al., 2017a). Among chemotaxonomic criteria, the diaminopimelic acid (DAP) isomer present in the cell wall is one of the most important properties of Actinobacteria. Determining whether the DAP isomer is the LL (levo) form or the meso form is usually sufficient to characterize the Actinobacteria groups (Messaoudi et al., 2015; Amin et al., 2017a; Hasegawa et al., 1983).

A wide range of physiological characteristics has been evaluated, including carbohydrate utilization profile, nitrogen source utilization profile, degradation or hydrolysis of numerous substrates, and sensitivity to various inhibitors (Shirling & Gottlieb, 1966; Messaoudi et al., 2015).

The identification of Actinobacteria via traditional methods based on phenotypic characteristics is not as accurate as genotypic methods. 16S rRNA gene sequence analysis has been recognized as a powerful tool for the identification of poorly described, rarely isolated, or phenotypically aberrant strains and can lead to a unique phylogenetic analysis of newly isolated strains (Heuer et al., 1997; Monciardini et al., 2002; Busti et al., 2006). However, partial sequencing of the 16S rRNA gene is not yet sufficient to complete a comprehensive suprageneric classification of Actinobacteria, and it cannot always distinguish between closely related species or even genera (Colquhoun et al., 1998). An updated taxonomy of the phylum Actinobacteria based on 16S rRNA trees was recently described (Korn-Wendisch & Schneider, 1992). That updated classification removed the taxonomic ranks of subclasses and suborders, raising the former subclasses and suborders to the ranks of classes and orders, respectively (Barka et al., 2016). The phylum is consequently divided into six separate classes: Actinobacteria, Acidimicrobiia, Coriobacteriia, Nitriliruptoria, Rubrobacteria, and Thermoleophilia. The class Actinobacteria contains 16 orders, including the previously proposed orders Actinomycetales and Bifidobacteriales (Hasegawa et al., 1983). The order Actinomycetales is now restricted to the members of the family Actinomycetaceae (Barka et al., 2016). Several genera have been identified by 16S rRNA gene sequence analysis, such as Streptomyces, Micromonospora, Kribbella, Actinomadura, and Saccharopolyspora (Patel et al., 2004).

Definition of rare Actinobacteria

Rare Actinobacteria are defined as types of Actinobacteria that are difficult to isolate. Molecular tools indicate that the so-called rare Actinobacteria are relatively abundant in various habitats and that they can be retrieved in large numbers if a suitable isolation method is available. We believe that exploring rare Actinobacteria, which are difficult to isolate, will yield chemically diverse active compounds (Donadio et al., 2002). Genera belonging to this group include Actinomadura, Actinoplanes, Amycolatopsis, Actinokineospora, Acrocarpospora, Actinosynnema, Catenuloplanes, Cryptosporangium, Dactylosporangium, Kibdelosporangium, Kineosporia, Kutzneria, Microbispora, Microtetraspora, Nocardia, Nonomuraea, Planomonospora, Planobispora, Pseudonocardia, Saccharomonospora, Saccharopolyspora, Saccharothrix, Streptosporangium, Spirilliplanes, Thermomonospora, Thermobifida, Virgosporangium, Micromonospora, and some uncommon species of Streptomyces (Lazzarini et al., 2000; Mazza et al., 2003).

Rare Actinobacteria habitats

Although soil is the main habitat of rare Actinobacteria, they can also be isolated from different ecological niches. The population and types of rare actinomycetes in each ecosystem are affected by various factors such as soil type, pH, humus content, and humus type (Tiwari & Gupta, 2013). An Egyptian research group selectively isolated rare Actinobacteria genera including Actinomadura, Actinoplanes, and Micromonospora from soil samples in Egypt (Abd-allah et al., 2012). Another study reported the isolation of rare Actinobacteria from shallow water sediments of the Trondheim Fjord (Norway), including the genera Micromonospora, Actinocorallia, Actinomadura, Knoellia, Glycomyces, Nocardia, Nocardiopsis, Nonomuraea, Pseudonocardia, Rhodococcus, and Streptosporangium (Bredholdt et al., 2007). The biodiversity of rare Actinobacteria in water was also reported from Lake Baikal, where isolates belonging to the genus Micromonospora were found (Terkina et al., 2002). Additionally, rare Actinobacteria inhabit extreme ecological niches such as caves with low temperatures, high relative humidity, low amounts of organic nutrients, and high mineral concentrations. Isolates belonging to the genera Nocardia and Micromonospora were obtained from El Gola cave, Sinai, Egypt (Mansour, 2003). Moreover, Nocardia altamirensis was isolated from the Altamira Cave, Cantabria, Spain (Jurado et al., 2008). The extreme desiccation of hyper-arid deserts is often accompanied by high temperature, low water activity, and intense radiation (Bull, 2011). Amin et al. reported the isolation of the genera Micromonospora and Kribbella from the Sinai Desert, Egypt (Tolba et al., 2013).

Isolation of rare Actinobacteria

Isolation of rare Actinobacteria by conventional dilution plate methods is difficult. They require complicated procedures for isolation, preservation, and cultivation because they are usually masked by fast growers such as other bacteria, fungi, and common Streptomyces (Lazzarini et al., 2000). New isolation methodologies were therefore developed that focus on physical and chemical pretreatment of samples prior to dilution plating. Many pretreatments are used, such as dry heat, phenol treatment (Hayakawa et al., 2004), sucrose gradient centrifugation (Yamamura et al., 2003), and sodium dodecyl sulfate treatment. These treatments eliminate non-filamentous bacteria from the samples and suppress fungal growth, which in turn promotes the growth of slow-growing rare Actinobacteria (Stanek et al., 2011). Appropriate selective media containing macromolecules such as casein, chitin, and humic acid are important for promoting the growth of rare Actinobacteria and suppressing bacterial and fungal contaminants. It has been confirmed that adding antibacterial and antifungal antibiotics such as anisomycin, cycloheximide, gentamicin, kanamycin, nalidixic acid, novobiocin, nystatin, penicillin, pimaricin, polymyxin, rifampicin, streptomycin, tunicamycin, and vancomycin to the isolation media promotes the selection of rare Actinobacteria (Hong et al., 2009).

Role of rare Actinobacteria in antibiotic production

In 2018, the World Health Organization declared the worldwide occurrence of antimicrobial resistance a great challenge to public health (Roca et al., 2015; Tillotson, 2018). Pathogens resistant to nearly all known antibiotics have emerged and spread (Yong et al., 2009), creating an urgent need to search for new antibiotics. The emergence of multidrug resistance among bacteria including Staphylococcus aureus, members of the ESKAPE pathogens, and, more recently, extensively drug-resistant Mycobacterium tuberculosis is a major worldwide public health threat (Pfaller et al., 1998; Vajs et al., 2017). Numerous bioactive compounds isolated from Actinobacteria have been reported to inhibit multidrug-resistant pathogens such as vancomycin-resistant Enterococci, methicillin-resistant Staphylococcus aureus, Shigella dysenteriae, Klebsiella sp., Escherichia coli, and Pseudomonas aeruginosa (Selvameenal et al., 2009; Severin et al., 2014). It is therefore highly important to explore new antibiotics in order to combat multidrug-resistant pathogens. We believe that exploring the biosynthetic potential of rare Actinobacteria will reveal novel structures with useful biological activities (Koehn & Carter, 2005; Baltz, 2006; Pelaez, 2006; Bull & Stach, 2007; Dancer, 2004).

Rare Actinobacteria isolates are a rich source of antibacterial agents, anti-parasitics, antifungal agents, herbicides, pesticides, anticancer, and immunosuppressive agents and enzymes (Takahashi & Omura, 2003; Magarvey et al., 2004; Singh & Barrett, 2006; Hacene et al., 2000). Different genera of Actinobacteria produce valuable bioactive molecules such as rifamycins from Amycolatopsis mediterranei (Solanki et al., 2008), erythromycin from Saccharopolyspora erythraea (Oliynyk et al., 2007), teicoplanin from Actinoplanes teichomyceticus (Somma et al., 1984), vancomycin from Amycolatopsis orientalis (Lazzarini et al., 2000), lupinacidins from Micromonospora lupini (Igarashi et al., 2007), neorustmicin from M. carbonacea (Yun-yang et al., 2007), rifamycin S from M. rifamycinica (Huang et al., 2009), erythromycin from Actinopolyspora sp. (Huang et al., 2009), and roseoflavin from Streptomyces davawensis (Grill et al., 2008).

The antibiotics produced by Actinobacteria are divided into numerous classes based on their structure, such as aminoglycosides (e.g., streptomycin and kanamycin), ansamycins (e.g., rifampin) (Floss & Yu, 1999), anthracyclines (e.g., doxorubicin) (Kremer et al., 2001), β-lactams (e.g., cephalosporins) (Kollef, 2009), macrolides (e.g., erythromycin), and tetracyclines. Antibiotics, including those produced by rare Actinobacteria, act through various mechanisms such as inhibition of cell wall synthesis (vancomycin), cell membrane damage (polyenes), inhibition of DNA and RNA synthesis (quinolones and fluoroquinolones), inhibition of protein synthesis (aminoglycosides, macrolides, and tetracyclines), and inhibition of essential enzymes required for folate metabolism (trimethoprim and sulfonamides).

Factors affecting antibiotic production

There are many natural products still to be discovered from rare Actinobacteria (Bérdy, 2012). The search for valuable antibiotics begins with screening unusual rare Actinobacteria to detect the best source of novel bioactive metabolites, followed by optimization of culture conditions for maximum antimicrobial compound production, antibiotic assay, chemical characterization, and identification of the antibiotic substances. The nature of the antibiotics produced by Actinobacteria depends upon the species, the strain, and culturing conditions such as cell density, pH, incubation period, carbon sources, and nitrogen sources. The ability of Actinobacteria cultures to form antibiotics is not a fixed property but can be greatly increased or completely lost under different conditions of nutrition and cultivation. Cell density is an important factor in attaining the highest antimicrobial yield. Some studies have reported that the optimum pH range for antimicrobial metabolite production by Actinobacteria is 6–7 (Hamid et al., 2015; Amin et al., 2018; Amin et al., 2017b; Pharm, 2010; Ahmad et al., 2017), depending on the strain. The incubation period of Actinobacteria greatly affects the yield of antibiotics. The highest level of antimicrobial agent production was recorded after 6 to 8 days of incubation (Liang et al., 2008). Carbon sources are essential components of the culture media. Several reports showed that optimum antimicrobial agent production depends upon the type and concentration of the carbon sources used in the culture media, such as starch, glucose, maltose, fructose, glycerol, and molasses (Amin et al., 2017a; Amin et al., 2017b; Pharm, 2010; Taurino et al., 2011; Wang et al., 2017; Abdelwahed et al., 2012). The type and amount of nitrogen sources such as ammonium sulfate, ammonium nitrate, ammonium chloride, peptone, soya bean meal, and yeast extract also greatly influence antimicrobial production by Actinobacteria (Amin et al., 2017a; Amin et al., 2017b; Ahmad et al., 2017; Taurino et al., 2011; Wang et al., 2017; Abdelwahed et al., 2012).

Crude antibiotic substances are extracted after the optimized fermentation step. The crude substance is purified using paper or thin-layer chromatography and high-performance liquid chromatography, which separate the metabolites according to their retention factor or retention time. Further physicochemical analysis, such as infrared analysis, mass spectrometry (Tiwari & Gupta, 2012), and nuclear magnetic resonance spectroscopy (Bérdy, 2012), is important for identifying the antimicrobial agents.

Manipulating the genes that encode the enzymes of the biosynthetic pathways is considered a promising alternative approach for redesigning antibiotic structures, both to create new activities and to overcome microbial resistance to current drugs. Functional analysis of the biosynthetic genes is crucial for such approaches. In addition, the development of genetic manipulation methodologies and of more genetically amenable heterologous hosts supports the expression of antibiotic biosynthetic genes (Sánchez et al., 2002).

Polymerase chain reaction screening for genes that encode the enzymes responsible for antibiotic production and studying their phylogeny and biotechnological manipulation of these genes are valuable tools for drug discovery (Amos et al., 2015; Okami & Hotta, 1988; Walsh, 2002). Bioinformatics tools are crucial for analyzing huge genomic and proteomic data that will help in the field of drug discovery and detecting novel antibiotics (Bérdy, 2012; Amos et al., 2015).

Molecular approaches for manipulating antibiotic biosynthetic clusters

Screening of non-ribosomal peptide synthetase and polyketide synthase genes

Non-ribosomal peptides (NRPs) and polyketides (PKs), assembled by non-ribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs), respectively, are the two major classes of secondary metabolites with diverse chemical structures and a valuable source of pharmaceutically important molecules. Recent advances in genomics and genome sequencing have shown that the potential of Actinobacteria to produce molecules of pharmacological interest has been greatly underestimated. Full genome sequencing showed that there are three biosynthetic pathways: peptides manufactured by conventional ribosomal assembly, NRPS metabolites, and polyketides. Non-ribosomal peptide synthetases and polyketide synthases are groups of enzymes encoded by genes responsible for the production of important antibiotic groups. The presence of NRPS and PKS genes in an Actinobacteria strain is closely related to its biosynthetic potential. The NRPS mechanism was first described in 1971 during research on gramicidin S and tyrocidine biosynthesis. NRPSs are modularly organized, with each module responsible for the incorporation of a specific amino acid. Each module consists of at least three core domains, each catalyzing a specific reaction in the incorporation of a monomer (Lipmann et al., 1971). First, the adenylation (A) domain selects the cognate amino acid and activates it by transforming it into an aminoacyl adenylate. The thiolation or peptidyl carrier protein (PCP) domain covalently binds the activated monomer to the synthetase through a phosphopantetheinyl arm. The condensation (C) domain catalyzes the formation of a peptide bond between the amino acids linked to two adjacent modules. A dedicated loading module, carrying only the A and PCP domains, is the first module of the NRPS, whereas a termination module containing a thioesterase (TE) domain, which releases the peptide from the synthetase, concludes the assembly line (Lipmann et al., 1971).

Many NRPSs feature secondary specialized domains within modules that allow residue modifications. Epimerization (E) domains lead to D-isomer forms of amino acids; methylation (M), oxidation (Ox), reduction (R), formylation (F), and heterocyclization (Cy) domains enable NRPSs to biosynthesize an impressive number of diversified peptides with broad biological activities that cannot be produced by the classical ribosomal machinery (Felnagle et al., 2008). A PKS consists of modules of at least three core domains: an acyltransferase (AT) domain, which selects the suitable extender unit and transfers it to the acyl carrier protein (ACP) domain, where a thioester bond is formed that fixes the growing polyketide to the synthase, and a ketosynthase (KS) domain. The KS domain is responsible for the condensation between the extender unit present on the ACP domain of the same module and the polyketide intermediate bound to the ACP domain of the preceding module. Additional secondary domains such as ketoreductase (KR), oxidation (Ox), dehydratase (DH), methyltransferase (MT), enoylreductase (ER), and methylation (M) domains modify the growing polyketide molecule. Type II PKSs often feature a cyclase (Cy) domain leading to the formation of aromatic structures. The last module possesses a thioesterase (TE) domain catalyzing the release of the final product from the enzyme (Meurer et al., 1997; Moore & Hertweck, 2002).
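The modular NRPS layout described above can be made concrete with a small data model. The following Python sketch is an illustration only: the names `Module`, `check_assembly_line`, and `predicted_peptide`, as well as the toy three-module synthetase and its substrates, are hypothetical and are not taken from any annotation tool or from a real gene cluster. It simply checks that a list of modules follows the pattern of a loading module (A and PCP), elongation modules (C, A, and PCP), and a final module carrying a TE domain.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Module:
    """One NRPS module: its domains (in order) and the amino acid its A domain selects."""
    domains: List[str]
    substrate: str


def check_assembly_line(modules: List[Module]) -> List[str]:
    """Return a list of layout problems; an empty list means the layout fits the text."""
    if not modules:
        return ["no modules given"]
    problems = []
    first = set(modules[0].domains)
    # Loading module: A and PCP domains, but no condensation domain.
    if not {"A", "PCP"} <= first or "C" in first:
        problems.append("loading module should carry A and PCP but no C domain")
    # Every later module needs the three core domains C, A, and PCP.
    for number, module in enumerate(modules[1:], start=2):
        missing = {"C", "A", "PCP"} - set(module.domains)
        if missing:
            problems.append(f"module {number} lacks core domain(s): {sorted(missing)}")
    # The assembly line ends with a thioesterase that releases the peptide.
    if "TE" not in modules[-1].domains:
        problems.append("last module lacks a TE domain")
    return problems


def predicted_peptide(modules: List[Module]) -> str:
    """Join the substrates in module order (the colinearity assumption)."""
    return "-".join(module.substrate for module in modules)


# Hypothetical three-module synthetase used only for illustration.
toy_nrps = [
    Module(["A", "PCP"], "Phe"),
    Module(["C", "A", "PCP", "E"], "Pro"),
    Module(["C", "A", "PCP", "TE"], "Val"),
]
print(check_assembly_line(toy_nrps))
print(predicted_peptide(toy_nrps))
```

The sketch assumes strict colinearity between module order and peptide sequence, which is a simplification; iterative or non-linear NRPSs would need a richer model.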

Important antibiotics are produced by NRPS gene clusters such as amphomycin produced by Streptomyces canus (Yang et al., 2014), cephamycin from Streptomyces clavuligerus (Alexander & Jensen, 1998), daptomycin from Streptomyces roseosporus (Miao et al., 2006), and teicoplanin from Actinoplanes teichomyceticus (Somma et al., 1984). Antibiotics produced by PKS gene clusters are rifamycin from Amycolatopsis mediterranei (Stratmann et al., 1999), tetracycline from Streptomyces rimosus (Petković et al., 2006), and actinorhodin from Streptomyces coelicolor. Some antibiotics are hybrid from both NRPS and PKS gene clusters such as pristinamycin IIA from Streptomyces pristinaespiralis (Voelker & Altaba, 2001) and virginiamycin from Streptomyces virginiae (Pulsawat et al., 2007).

Several PCR assays using diverse primers have been used to target PKS and NRPS genes, which is useful for selecting potent strains with diverse biosynthetic potential arising from different associations of modules (Amos et al., 2015). Recently, a research group studied the biosynthetic capability of Micromonospora sp. Rc5, isolated from Egyptian soils, via NRPS and PKS PCR assays. They used eight pairs of primers and demonstrated that the PKS gene clusters in Micromonospora sp. Rc5 have low similarity to related PKS sequences in a database. The results suggested that these distinct clusters are probably responsible for the production of different bioactive molecules (Amin et al., 2017c). Another study explored the biosynthetic potential of European soil samples using newly designed primers in NRPS and PKS PCR assays. The results revealed a surprising number of phylogenetically divergent NRPS and PKS sequences belonging to rare Actinobacteria such as Actinospica, Catenulispora, and Nonomuraea. The authors also suggested that these NRPS and PKS sequences may encode novel bioactive compounds (Amos et al., 2015). They noted that the limitations of NRPS and PKS PCR assays are the specificity of the primer designs for underexplored taxa and the great sequencing effort needed to discover all the gene clusters in these soils (Amos et al., 2015).

Actinobacteria genomic libraries

A genomic library is a collection of the total genomic DNA from a single organism, digested with a restriction enzyme to cut the DNA into fragments of a specific size. The fragments are then inserted into a vector. Genomic libraries are commonly used for sequencing applications. Fosmid vectors are similar to cosmids, a type of hybrid plasmid containing a lambda phage cos sequence, but use the F-plasmid origin of replication and partitioning mechanisms to allow cloning of large DNA fragments. Fosmids can hold DNA inserts of up to 40 kb in size. Fosmids contain several main functional elements: OriV (origin of replication), the sequence from which replication of the plasmid DNA starts in the recipient cell; the tra-region (transfer genes), genes coding for the F-pilus and the DNA transfer process; antibiotic resistance genes as a selectable marker; and the lambda phage cos site sequence for packaging the insert DNA integrated within the fosmid into phage particles (Hall, 2004).

A fosmid library is prepared by extracting the genomic DNA from the target organism. The DNA must be sheared into fragments of approximately 35 kb in size and cloned into the fosmid vector (Hall, 2004). The ligation mix is then packaged into phage particles, the DNA is transfected into a bacterial host such as Escherichia coli, and the bacterial cells are grown on Luria-Bertani medium containing the antibiotic used as a selection marker. Only bacterial clones carrying a fosmid can grow on the medium, owing to the antibiotic resistance genes carried on the fosmid. Fosmids may be convenient for constructing libraries from complex genomes. Fosmids have high structural stability and have been found to maintain human DNA effectively even after 100 generations of bacterial growth (Shizuya et al., 1992).
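A practical question when planning such a library is how many clones are needed to capture a given sequence. A common estimate is the Clarke-Carbon relation, N = ln(1 − P) / ln(1 − f), where f is the insert size divided by the genome size and P is the desired probability that any given sequence is present. The sketch below applies it; the 35 kb insert size comes from the text, whereas the 8 Mb genome size and the 99% probability are assumptions chosen only for illustration, and the function name `clones_needed` is hypothetical.

```python
import math


def clones_needed(insert_bp: float, genome_bp: float, probability: float) -> int:
    """Estimate library size with the Clarke-Carbon relation N = ln(1-P) / ln(1-f)."""
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    if not 0 < insert_bp < genome_bp:
        raise ValueError("insert must be positive and smaller than the genome")
    fraction = insert_bp / genome_bp  # f: share of the genome held by one clone
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


# Assumed values for illustration: 35 kb inserts, an 8 Mb genome, P = 0.99.
print(clones_needed(35_000, 8_000_000, 0.99))
```

The relation assumes random, unbiased cloning of fragments; real libraries with cloning bias may need more clones than this estimate suggests.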

Fosmid libraries have been constructed for a variety of organisms, including bacteria (Fitz-Gibbon et al., 1997), fungi (Magrini et al., 2004), plants (Meyer et al., 2008), animals (Zhang et al., 2007), and humans. Fosmid libraries can be successfully used to capture and express many functional genes such as those associated with antibiotic resistance (Udikovic-Kolic et al., 2014; Amos et al., 2014). In addition, fosmid libraries are used to evaluate the diversity of biosynthetic gene clusters and to discover new bioactive compounds (Feng et al., 2011; Kallifidas et al., 2012). Similar studies described the construction of Actinobacteria fosmid libraries containing distinct clones with PKS genes (Parsley et al., 2011). Moreover, the screening of distinct NRPS and PKS genes in fosmid libraries constructed from environmental DNA from soil has been reported (Amos et al., 2015).

It was demonstrated that only 40% of the genes from the genomes of 32 prokaryotes could be detected when expressed in E. coli (Gabor et al., 2004). The study also revealed significant differences in the predicted expression modes between distinct taxonomic groups of organisms. Another study showed that E. coli, Pseudomonas putida, and Streptomyces lividans differed in their abilities to express heterologous gene clusters (Martinez et al., 2004). Our previous work on the biosynthetic NRPS and PKS genes using a fosmid library indicated that it was a successful way to capture the genes. However, no gene expression was recorded. We concluded that the biosynthetic gene clusters integrated into the fosmids were incomplete. We recommend the use of bacterial vectors with a large DNA insert capacity so that complete gene clusters can be expressed.

Whole-genome sequencing

Metagenomics is a novel strategy for identifying bacterial diversity, and next-generation DNA sequencing technologies have increased scientific interest in understanding the microbial diversity inhabiting different environments. Several new methods for DNA sequencing were developed in the mid to late 1990s and were implemented in commercial DNA sequencers by the year 2000. These “next-generation” sequencing methods include reversible dye terminators (Illumina sequencing) (Bentley et al., 2008), massively parallel signature sequencing (Brenner et al., 2000), 454 pyrosequencing (Margulies et al., 2005), polony sequencing (Shendure et al., 2005), sequencing by oligonucleotide ligation detection (Mardis, 2008), Ion Torrent sequencing by synthesis (Rusk, 2010), single-molecule real-time sequencing by synthesis (Eid et al., 2009), and DNA nanoball sequencing (Drmanac et al., 2010). An important application of next-generation sequencing is whole-genome sequencing. Whole-genome sequencing is a powerful tool for genomics research and the most comprehensive method for analyzing the genome, including determining the sequence of individual genes, clusters of genes or operons, full chromosomes, or entire genomes of any organism. It reduces sequencing costs and produces large volumes of data (Bentley, 2006). Whole-genome sequencing has been applied to the genomes of humans, livestock (Eck et al., 2009), plants (Goff et al., 2002), and microbes (Qiao et al., 2012). It provides a high-resolution, base-by-base view of the genome; identifies potential causative variants for follow-on studies of gene expression and regulation mechanisms; and delivers large volumes of data in a short amount of time to support the assembly of novel genomes.

Illumina genome sequencing

Illumina (Bennett, 2004) is an American company whose sequencing method is based on reversible dye terminator technology and engineered polymerases. The method was developed at Solexa, a company founded by Shankar Balasubramanian and David Klenerman in 1998 and later acquired by Illumina. The massively parallel sequencing technology invented in 1997 by Pascal Mayer and Laurent Farinelli was also acquired and is now implemented in Illumina’s HiSeq genome sequencers.

High-throughput HiSeq genome sequencers are used for whole-genome sequencing of numerous microbial, plant (Li et al., 2012), human, and animal genomes (Eck et al., 2009). The company provides a line of products and services that serve the sequencing, genotyping, and gene expression markets. Its tools allow researchers to perform genetic tests and provide medical information based on genomics and proteomics. Illumina sequencing technologies obtain the sequences of many strands at once, so researchers can sequence DNA and RNA much more quickly and cheaply than with the previously used Sanger sequencing; the technology has thus revolutionized the study of genomics and molecular biology (Pettersson et al., 2009).

In Illumina sequencing (Bentley, 2006; Bennett, 2004; Bennett et al., 2005), the process starts with the fragmentation of purified DNA into short fragments, sequenced as 100–150-bp reads, by enzymatic digestion or temperature treatment. The small DNA fragments are ligated to adapters, short molecular tags that act as reference points during amplification, sequencing, and analysis. One end of a single DNA molecule is then attached to the flow cell surface, which carries sequences complementary to the adapters at defined positions. The DNA fragments subsequently bend over and hybridize to complementary adapters, creating a “bridge” that serves as the template for the synthesis of their complementary strands by DNA polymerases.

After the amplification step, a flow cell with more than 40 million clusters is produced, wherein each cluster is composed of approximately 1000 clonal copies of a single template molecule. The templates are sequenced in a massively parallel fashion using a DNA sequencing-by-synthesis approach that employs reversible terminators with removable fluorescent moieties and special DNA polymerases that can incorporate these terminators into growing oligonucleotide chains. The terminators are labeled with fluorophores of 4 different colors to distinguish among the different bases at a given sequence position; a computer determines which base was added from the wavelength of the fluorescent tag and records this for every spot on the chip (Bentley, 2006; Bennett, 2004; Bennett et al., 2005).
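The base-determination step can be illustrated with a deliberately simplified sketch. It assumes that each sequencing cycle yields one fluorescence intensity per color channel for a cluster and that the brightest channel identifies the incorporated base. The channel order (A, C, G, T), the intensity values, and the function names `call_base` and `call_read` are all hypothetical; production base callers also correct for effects such as signal crosstalk between channels and loss of synchrony within a cluster, which this sketch ignores.

```python
from typing import Sequence

# Assumed channel order for this illustration only.
CHANNELS = ("A", "C", "G", "T")


def call_base(intensities: Sequence[float]) -> str:
    """Return the base whose color channel shows the highest intensity."""
    if len(intensities) != len(CHANNELS):
        raise ValueError("expected one intensity per channel (A, C, G, T)")
    brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
    return CHANNELS[brightest]


def call_read(cycles: Sequence[Sequence[float]]) -> str:
    """Call one base per cycle and join the calls into a read."""
    return "".join(call_base(cycle) for cycle in cycles)


# Made-up intensities for three cycles of a single cluster.
example_cycles = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.1, 0.8, 0.2),
    (0.0, 0.7, 0.1, 0.1),
]
print(call_read(example_cycles))
```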

Bioinformatics analysis and genome assembly

The production of raw sequence data is only the beginning of its detailed bioinformatics analysis (Severin et al., 2014). Many new methods for sequencing and for correcting sequencing errors have been developed.

Occasionally, raw reads provided by the sequencer are accurate and precise over only part of their length. Using the entire read may lead to artifacts in downstream analyses such as genome assembly, single nucleotide polymorphism calling, or gene expression assessment. Two classes of trimming programs have been introduced, based on either window-based or running-sum algorithms (Del Fabbro et al., 2013). Trimming programs such as Trimmomatic are widely used to trim raw reads and remove adapters (Bolger et al., 2014). Samtools (Li et al., 2009) and BWA-MEM (Li & Durbin, 2009) have been used for quality filtering of the reads and for assembling the genome.
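The two classes of trimming algorithms mentioned above can be sketched as follows. This is a simplified illustration, not the implementation of Trimmomatic or any other published tool: the function names, the window size of 4, and the quality threshold of 20 are assumptions chosen for the example. Both functions take a list of per-base Phred quality scores and return the number of bases to keep from the 5' end.

```python
def sliding_window_trim(quals, window=4, min_mean=20):
    """Window-based trimming: scan from the 5' end and cut at the start of
    the first window whose mean quality drops below min_mean."""
    for start in range(len(quals) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < min_mean:
            return start
    return len(quals)


def running_sum_trim(quals, threshold=20):
    """Running-sum trimming: walk from the 3' end, accumulate
    (threshold - quality), and cut where the sum reaches its maximum."""
    running = 0
    best = 0
    keep = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += threshold - quals[i]
        if running < 0:
            break  # good bases now outweigh the bad tail
        if running > best:
            best = running
            keep = i
    return keep


# Made-up quality scores: a good 5' region followed by a poor 3' tail.
quals = [30, 32, 31, 30, 28, 25, 12, 10, 8, 5]
print(sliding_window_trim(quals), running_sum_trim(quals))
```

On this toy read the window-based variant is more conservative, because it cuts as soon as a window containing low-quality bases is encountered, whereas the running-sum variant removes only the low-quality tail.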

Genome assembly refers to the process of taking a large number of short DNA sequences and putting them back together to create a representation of the original chromosomes from which the DNA originated. Automated sequencing machines produce millions of short DNA sequences, called reads, each of which can be up to 1000 nucleotides long. A genome assembly algorithm works by aligning all the reads to one another and detecting the regions where they overlap. Overlapping reads are then merged, and the process is repeated (Paszkiewicz & Studholme, 2010).
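The overlap-and-merge principle described above can be sketched with a toy greedy assembler. This is only an illustration under strong assumptions: reads are error-free, come from one strand, and no read is fully contained in another. The function names and the minimum overlap of 3 bases are hypothetical, and production assemblers use far more elaborate graph-based methods.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b,
    or 0 if that overlap is shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no remaining overlaps: these are the contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


# Three made-up reads that tile a 13-base sequence.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))
```

When the reads do not all overlap, the function returns several sequences, which correspond to separate contigs.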

Genome assembly is a very difficult computational problem, made more difficult because many genomes contain large numbers of identical sequences, known as repeats. These repeats may be thousands of nucleotides long, and some occur at many different positions, particularly in the large genomes of plants and animals. The resulting genome is called a draft genome sequence; it is produced by joining the sequenced contigs in the correct order and orientation and then linking them to create scaffolds (Paszkiewicz & Studholme, 2010).
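The step from contigs to a scaffold can be illustrated with a short sketch. It assumes that the order, orientation, and approximate gap size between contigs are already known (for example from paired reads or a reference genome); how they are estimated is outside the scope of the sketch. The contig names, the layout, and the gap sizes are hypothetical, and gaps are written as runs of N, a common convention for unknown bases.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_scaffold(contigs, layout, gap_char="N"):
    """Join contigs into one scaffold.

    contigs: dict mapping contig name to sequence.
    layout:  list of (name, strand, gap_after) tuples giving the order,
             the orientation ("+" or "-"), and the estimated gap length
             that follows each contig.
    """
    parts = []
    for name, strand, gap_after in layout:
        seq = contigs[name]
        if strand == "-":
            seq = reverse_complement(seq)  # flip contigs on the other strand
        parts.append(seq)
        parts.append(gap_char * gap_after)  # unknown bases between contigs
    return "".join(parts)


# Made-up contigs and layout.
contigs = {"c1": "ATGC", "c2": "GGTA"}
layout = [("c1", "+", 3), ("c2", "-", 0)]
print(build_scaffold(contigs, layout))
```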

Scaffolds are larger DNA fragments positioned along the physical map of the chromosome. In addition, contigs obtained from genome sequencing can be assembled on the basis of the most similar reference genome in the database to fill the gaps (Darling et al., 2011). Several tools are used in this process, such as Mauve Aligner 2.4 (Rissman et al., 2009) and the CAP3 assembly program for contig assembly (Huang & Madan, 1999).
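The idea of ordering contigs against a reference genome can be sketched in a few lines. This toy version places each contig by an exact string match on either strand, which is a deliberate simplification: tools such as Mauve rely on sequence alignment that tolerates mismatches and rearrangements. The function name `order_contigs`, the reference sequence, and the contigs are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def order_contigs(contigs, reference):
    """Order contigs by where they match the reference exactly.

    Returns (placed, unplaced): placed is a list of (name, strand, position)
    sorted by position; unplaced lists contigs with no exact match.
    """
    placed, unplaced = [], []
    for name, seq in contigs.items():
        pos, strand = reference.find(seq), "+"
        if pos == -1:
            # Try the opposite strand before giving up.
            pos, strand = reference.find(reverse_complement(seq)), "-"
        if pos == -1:
            unplaced.append(name)
        else:
            placed.append((pos, name, strand))
    placed.sort()
    return [(name, strand, pos) for pos, name, strand in placed], unplaced


# Made-up reference; toy contigs are cut from it for illustration.
reference = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
contigs = {
    "c1": reference[20:30],
    "c2": reverse_complement(reference[10:18]),
    "c3": reference[0:9],
    "c4": "TTTTTTTTTT",
}
print(order_contigs(contigs, reference))
```

The regions of the reference that lie between consecutive placed contigs indicate where the gaps in the draft assembly are located.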

Genome annotation

Genome annotation is the process of identifying the elements of the genome and attaching biological information to these elements (Stein, 2001). After genome assembly, gene annotation is required to determine the structural and functional identity of the genes (Kisand & Lettieri, 2013). Automatic annotation tools attempt to do this by computer analysis, in contrast to manual annotation, which relies on human expertise. Three online automated annotation systems are popular. Rapid Annotation using Subsystem Technology (RAST) is an automated service for annotating complete or nearly complete bacterial genomes; it provides high-quality annotations for genomes across the phylogenetic tree (Aziz et al., 2008). The Integrated Microbial Genomes (IMG) system, which is based on BLASTp, serves as a community resource for comparative analysis of publicly accessible genomes in a wide-ranging integrated context and contains both draft and complete microbial genomes (Markowitz et al., 2009). The Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP), developed at the National Center for Biotechnology Information (NCBI), combines gene prediction algorithms with homology-based methods and annotates both complete genomes and draft genomes comprising multiple contigs (Tatusova et al., 2016).
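The structural side of annotation, that is, locating candidate genes, can be illustrated with a minimal open reading frame (ORF) finder. This sketch is not how RAST, IMG, or PGAAP work internally; it is a simplified stand-in that scans only the forward strand, accepts only ATG as a start codon, and uses a hypothetical minimum length of 3 codons. Real gene predictors also consider the reverse strand, alternative start codons, and statistical models of coding sequence.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Find ORFs on the forward strand in all three reading frames.

    Returns a list of (start, end, frame) tuples, where seq[start:end]
    runs from the start codon through the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOPS:
                # Count codons before the stop; keep ORFs that are long enough.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Made-up sequence containing one short ORF.
seq = "CCATGAAACCCGGGTAACC"
for start, end, frame in find_orfs(seq):
    print(start, end, frame, seq[start:end])
```

Functional annotation would then compare the translated ORFs with databases of known proteins, which is the homology-based part of the pipelines described above.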

In 2018, a research group reported the annotation of two rare Actinobacteria genomes derived from Illumina whole-genome sequencing, mainly using the PGAAP system. The authors suggested that this annotation opens the door to highlighting significant biosynthetic gene clusters in rare Actinobacteria (Amin et al., 2019).

Identification of biosynthetic gene clusters by antiSMASH

The genes encoding the biosynthetic pathways responsible for the production of secondary metabolites are usually clustered together on the chromosome in biosynthetic gene clusters (BGCs). Genome mining of BGCs has recently become an important method for identifying new molecules and has led to the detection of many novel compounds. A variety of computational tools have been developed to analyze specific classes of secondary metabolites (Weber et al., 2015).

In 2011, a tool called the antibiotics and Secondary Metabolite Analysis SHell (antiSMASH) was introduced as a Web server for the genomic identification and analysis of BGCs of any type, thus facilitating rapid genome annotation of a wide range of bacterial and fungal strains (Blin et al., 2013). Although antiSMASH can annotate a broad range of secondary metabolite chemical structures, its annotation is still limited to peptides and polyketides encoded by modular assembly lines; the annotation of compounds formed by cyclization and tailoring reactions remains limited. A strategy that considers multiple possible end products can be applied to overcome this limitation. This is important for avoiding the rediscovery of known compounds in drug discovery and for the comparative analysis of unknown and known gene clusters (Weber et al., 2015).

Bioinformatics analysis using antiSMASH 3.0 predicts secondary metabolite gene clusters in rare Actinobacteria. Amin et al. mined the whole-genome sequence of Micromonospora sp. Rc5, isolated from the Egyptian desert, using the antiSMASH server. This study revealed 33 potential secondary metabolite gene clusters, including PKS, NRPS, hybrid polyketide synthase, terpene, lantipeptide, saccharide, siderophore, bacteriocin, arylpolyene, and unidentified clusters (Amin et al., 2019). Another study, reporting the annotation of the draft genome sequence of Micromonospora sp. DSW705, used antiSMASH analysis to predict 3 PKS gene clusters, 1 NRPS gene cluster, and 3 hybrid PKS/NRPS gene clusters responsible for the synthesis of the antitumor compound rakicidin (Komaki et al., 2016).

Conclusion

Rare Actinobacteria are a promising source of antibiotics against multidrug-resistant pathogens. Conventional and molecular methods are valuable tools for identifying rare Actinobacteria. However, further investigations, including DNA-DNA hybridization and additional chemotaxonomic and biochemical tests, are required to identify isolates to the species level. Molecular approaches for the identification of biosynthetic gene clusters are useful for detecting the antimicrobial potential of rare Actinobacteria with uncommon biochemical pathways, which will help in the development of novel bioactive metabolites. The current review provides information that may help to control antimicrobial drug resistance and enhance health care in Egypt and worldwide. In addition, it introduces promising methodologies to support drug discovery research in Egypt.