Introduction

Marine diatoms are an important group of eukaryotic phytoplankton that dominate phytoplankton communities in upwelling regions and at high latitudes (Benoiston et al. 2017). Diatoms are commonly fed to the larvae of aquatic animals in the aquaculture industry. The main diatom genera used as food in the shrimp aquaculture industry are Chaetoceros, Thalassionema, Closterium and Thalassiosira. The genus Chaetoceros is well represented in marine plankton worldwide in terms of diversity, abundance and distribution. The genus includes more than 225 recognized species and about 376 validly published names (Rines and Theriot 2003; Guiry 2020). Chaetoceros is characterized by the presence of setae, which are long, hollow silicate spine-like projections protruding from the valve surface (Assmy et al. 2008; Kooistra et al. 2010). It is an ecologically important genus of marine planktonic diatoms found in coastal and upwelling regions (Jensen and Moestrup 1998). Moreover, Chaetoceros has been used for biofuel production because of its high growth rates and high lipid yield (Spaulding and Edlund 2008). Its utility stems in part from its small cell size and high n-3 polyunsaturated fatty acid (PUFA) content; these properties also make Chaetoceros an important source of lipids and fatty acids for marine fish, bivalves and crustaceans (Xu et al. 1993; Zhou et al. 2007). Among Chaetoceros species, C. muelleri and C. gracilis have been cultivated for use as food for Litopenaeus vannamei larvae (Sangha et al. 2000). Rotifers fed C. gracilis showed increased viability, larger size and low ciliate contamination (Knu 2004). Freeze-dried C. muelleri is commercially available as feed for shrimp, sea cucumbers and oysters and can be used in green water techniques in fish larviculture (e.g., https://algae.proviron.com). Previous research has indicated that C. muelleri shows antimicrobial activity against Staphylococcus aureus, Escherichia coli and Candida albicans (Mendiola et al. 2007). C. gracilis has the ability to produce high-quality fatty acids for the lipid industry (Rika Partiwi et al. 2009).

Chaetoceros cells have a silica cell wall, called the frustule, composed of two valves. Diatoms have traditionally been identified microscopically on the basis of morphological characters such as frustule shape (Rines and Hargraves 1988). Although there are several detailed taxonomic descriptions of the microstructure of their silica frustules, diatoms remain time-consuming and difficult to identify. New approaches (e.g. the combined use of morphological and molecular tools) have recently revealed cryptic diversity within the genus (Gaonkar et al. 2018).

DNA barcoding is a fast, accurate and standardized method for species-level identification that uses short DNA sequences. It has become an effective tool for assessing global biodiversity patterns and permits non-taxonomic biologists to diagnose species that are challenging to identify (Siddall et al. 2009). Diatoms are an ideal model group for developing DNA barcoding methods that provide easy-to-use, standardized and fast identification tools. Nuclear 18S rRNA is the sequence most widely used in phylogenetic analyses of diatoms (Evans et al. 2007; Gaonkar et al. 2018). Random amplified polymorphic DNA (RAPD) markers are also highly useful for the study of populations within species because of their low cost and the fact that they do not require large sample sizes to generate preliminary results (Godhe et al. 2006).

In this study, the morphological and molecular taxonomic characteristics of Chaetoceros isolated from the coast of Eastern Thailand were investigated through microscopic observations, DNA barcoding using RAPD-PCR techniques, and molecular phylogenetic approaches using 18S rDNA. In addition, we present NMR patterns as well as chemical information on Chaetoceros.

Materials and methods

Diatom propagation and culturing system

Chaetoceros were obtained from three different algae laboratories: the Center of Excellence for Marine Biotechnology (CEMB), Faculty of Science, Chulalongkorn University, which originally isolated Chaetoceros from natural sea water in Angsila, Chon Buri province (13°20′22.5″ N and 100°55′26.5″ E) (hereafter referred to as Chaetoceros CEMB); the Institute of Marine Science, Burapha University (BIM), which originally isolated Chaetoceros from natural sea water in Bang Saen, Chon Buri (13°16′10.0″ N and 100°55′20.6″ E) (hereafter referred to as Chaetoceros BIM); and Chanthaburi Coastal Fisheries Research and Development (CHAN), Department of Fisheries, Ministry of Agriculture and Cooperatives, which originally isolated Chaetoceros from natural sea water in Laemsing, Chanthaburi province (12°31′46.1″ N and 102°03′02.0″ E) (hereafter referred to as Chaetoceros CHAN).

Diatom culture procedures were carried out under sterile conditions to avoid contamination. The diatom samples from the laboratory collections were purified using the centrifugation and streaking method. The samples were washed 12 times with RO water (30 ppt salinity), each wash followed by centrifugation at 2000 × g for 1–2 min and removal of the supernatant. Each diatom strain was then aseptically streaked onto a sterile plate containing F/2 medium (Guillard 1975) prepared with water at 30 ppt salinity and 1.2–1.5% agar. The plate was incubated at 25 °C until diatom colonies appeared. A single colony was transferred into a test tube containing 10 mL of fresh medium. Diatom propagation was first carried out in a 100-mL flask and then scaled up to 500 mL in F/2 medium (Guillard 1975) with water at 30 ppt salinity. The room was maintained at 25 °C under continuous illumination at a low light intensity (1000 µmol m⁻² s⁻¹). To accelerate growth, diatom cultures were vigorously aerated until they reached the stationary phase. Diatom cultures were then centrifuged at 12,000 × g for 1–2 min, and the supernatants were discarded. The cells were kept at −20 °C until DNA extraction.

Morphological observations

Monoclonal cultures of diatom strains were identified to the genus or species level by morphological features observed with light and scanning electron microscopes. For light microscopy, diatom cultures were treated with concentrated hydrochloric acid (1:1, sample:HCl). The mixture was washed with distilled water five times and left to sediment for 2 h to prepare cleaned frustules. The slides were examined by light microscopy under a 100× oil immersion objective lens (model BX53, Olympus). For scanning electron microscopy, diatom cells were placed on glass plates coated with 0.5% Alcian blue, fixed with 2.5% glutaraldehyde in 0.1 M phosphate-buffered saline (PBS) for 1 h and then washed with distilled water. Cells were further fixed under dark conditions with 1% osmium tetroxide in 0.1 M PBS for 1 h and then washed with distilled water. The samples were dehydrated in a graded ethanol series (70%, 80%, 90%, 95% and 100%) and dried using the critical point drying method. Finally, the samples were mounted onto stubs and sputter-coated with 99.99% pure gold. The specimens were examined under a scanning electron microscope (LEO1450VP, Zeiss, Oberkochen, Germany) with a 205-nm resolution at 10 kV. Morphological comparisons followed the procedures described in previous studies. Cell size measurements (length, width and setae length, in µm) of Chaetoceros were taken.

Genomic DNA extraction

Fifty nanograms of diatom cells were placed in a prechilled microcentrifuge tube containing 500 μL of the extraction buffer (100 mM Tris–HCl, 100 mM EDTA, and 250 mM NaCl; pH 8.0). High molecular weight genomic DNA (gDNA) was extracted from the washed diatom cells using the phenol–chloroform-proteinase K method (Sambrook and Russell 2001). The DNA pellet was resuspended in 100 μL of TE buffer (10 mM Tris–HCl, pH 8.0 and 0.1 mM EDTA). The DNA solution was incubated at 37 °C for 1–2 h and kept at 4 °C until used.

RAPD-PCR analysis

Polymerase chain reaction (PCR) amplification was performed in 25-µL volumes containing 50 ng of genomic DNA; 2.5 µL of 10X Taq DNA polymerase buffer; 200 µM of each dNTP; 0.2 µM primer (UBC101, 5′-GCGCCTGGAG-3′, or OPB01, 5′-GTTTCGCTCC-3′) [the University of British Columbia pool (University of British Columbia, Canada) and Operon (Operon Technologies, Inc., USA)]; 2.0 mM MgCl2; and 1.0 unit of Dynazyme™ Taq DNA polymerase (FINNZYMES, Finland). PCR thermocycling conditions were as follows: initial denaturation at 94 °C for 3 min, followed by 40 cycles of denaturation at 94 °C for 15 s, annealing at 36 °C for 1 min and extension at 72 °C for 1.30 min; and final extension at 72 °C for 7 min. Twelve microliters of each PCR product were electrophoresed on a 2.0% (w/v) agarose gel using a 1-kb DNA ladder (Promega) for size comparison before being stored at −20 °C.

PCR amplification, gene cloning and sequencing

The 18S rDNA locus from the genomic DNA of each diatom was amplified using universal diatom 18S rDNA-specific primers (Ki et al. 2009) (forward AT18F01, 5′-TACCTGGTTGATCCTGCCAGTAG-3′ and reverse AT18R01, 5′-GCTTGATCCTTCTGCAGGTTCACC-3′). PCR amplification was carried out in 25-µL volumes containing 50 ng of genomic DNA; 2.5 µL of 10X Taq DNA polymerase buffer; 200 µM of each dNTP; 0.2 µM of each primer; 2.0 mM MgCl2; and 1.0 unit of Dynazyme™ II Hot Start Taq DNA polymerase (FINNZYMES, Finland). PCR thermocycling conditions were as follows: initial denaturation at 94 °C for 3 min, followed by 35 cycles of denaturation at 94 °C for 30 s, annealing at 50 °C for 1 min and extension at 72 °C for 2.30 min; and final extension at 72 °C for 7 min. The PCR products were electrophoresed on a 1.5% (w/v) agarose gel using a 1-kb DNA ladder (Promega) for size comparison before being stored at −20 °C.
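The two thermocycling programs described in the Methods can be written down as simple records, which makes their programmed hold times easy to compare. The following Python sketch is illustrative only: the class and field names are hypothetical, ramp times between temperatures are ignored, and it assumes that the extension times written as "1.30 min" and "2.30 min" denote 1 min 30 s and 2 min 30 s.

```python
from dataclasses import dataclass


@dataclass
class PcrProgram:
    """Hold times (seconds) of a three-step PCR program; names are illustrative."""
    initial_denaturation_s: int
    cycles: int
    denaturation_s: int
    annealing_s: int
    extension_s: int
    final_extension_s: int

    def hold_time_min(self) -> float:
        """Total programmed hold time in minutes (temperature ramps not included)."""
        per_cycle = self.denaturation_s + self.annealing_s + self.extension_s
        total_s = (self.initial_denaturation_s
                   + self.cycles * per_cycle
                   + self.final_extension_s)
        return total_s / 60


# RAPD-PCR: 94 C 3 min; 40 x (94 C 15 s, 36 C 1 min, 72 C 1 min 30 s); 72 C 7 min
# (extension time is an assumed reading of "1.30 min")
rapd = PcrProgram(180, 40, 15, 60, 90, 420)

# 18S rDNA PCR: 94 C 3 min; 35 x (94 C 30 s, 50 C 1 min, 72 C 2 min 30 s); 72 C 7 min
# (extension time is an assumed reading of "2.30 min")
rdna_18s = PcrProgram(180, 35, 30, 60, 150, 420)

print(rapd.hold_time_min(), rdna_18s.hold_time_min())
```

Under these assumptions the programmed hold times come to about 2 h for the RAPD program and 2.5 h for the 18S rDNA program; actual run times would be longer because of ramping.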

Successful amplification products were purified, cloned and unidirectionally sequenced (AITBiotech Pte. Ltd., Singapore). Nucleotide sequences were analyzed against molecular reference databases using the BLAST® algorithm (Basic Local Alignment Search Tool) (www.ncbi.nlm.nih.gov/BLAST); similarity was considered significant when the probability (E) value recovered was less than 10⁻⁴ (Altschul et al. 1990).
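The significance criterion above can be applied to a list of BLAST hits in a few lines. The sketch below is a minimal Python illustration, not the procedure used in this study: it assumes the hits were exported in the standard 12-column tabular format, in which the second, third and eleventh columns hold the subject identifier, percent identity and E-value. The example hit lines are invented for illustration and are not results from this work.

```python
E_VALUE_CUTOFF = 1e-4  # significance threshold stated in the text


def significant_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Return (subject_id, percent_identity, e_value) for hits with E < cutoff.

    Assumes tab-separated lines in the 12-column BLAST tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id = fields[1]
        percent_identity = float(fields[2])
        e_value = float(fields[10])
        if e_value < cutoff:
            hits.append((subject_id, percent_identity, e_value))
    return hits


# Hypothetical hit lines (made-up identifiers and values, for illustration only)
example = [
    "query1\tsubjectA\t99.0\t1790\t18\t0\t1\t1790\t1\t1790\t0.0\t3200",
    "query1\tsubjectB\t85.0\t40\t6\t0\t1\t40\t1\t40\t0.5\t30.1",
]
print(significant_hits(example))
```

Only the first hypothetical hit passes the filter; the second has an E-value above the cutoff and is discarded.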

DNA sequence characteristics and phylogenetic analyses

Intraspecific variation in Chaetoceros was investigated by comparing the DNA similarities and genetic distances of 18S rDNA sequences. Multiple alignments were performed with each dataset using the ClustalW algorithm (Thompson et al. 1994). The aligned sequences were trimmed at each end to the same length, and obvious base errors that were only found in single strands were manually removed. Identical positions of the aligned sequences were used. The corrected pairwise (p-) genetic distances were calculated using the Kimura 2-parameter model (MEGA 10.0, Tamura et al. 2007).
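For readers unfamiliar with the Kimura 2-parameter correction, the distance between two aligned sequences is d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of sites showing transitions and transversions, respectively. The Python sketch below illustrates this formula for a single pair of aligned sequences; it is not the MEGA implementation used in this study, the function name is hypothetical, and skipping sites with gaps or ambiguous bases is an assumption about gap handling.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: ignore gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; all other changes are transversions
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Toy example: one transition in ten compared sites
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

Because the correction accounts for multiple substitutions at the same site, the corrected distance is slightly larger than the raw proportion of differing sites.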

An 18S rDNA phylogeny was constructed using the unweighted pair group method with arithmetic mean (UPGMA) and Maximum Likelihood algorithms (MEGA 10.0, Tamura et al. 2007) based on the Kimura two-parameter distance matrix; 1000 bootstrap replicates were performed to assess the reliability of the topology. The similarity between GenBank sequences and the sequences obtained in our study was also compared. A total of 12 rDNA sequences from Chaetoceros were used in the analyses. The 18S rDNA sequences were retrieved from eight Chaetoceros species (Supplementary Fig. 1).
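The unweighted pair group method with arithmetic mean repeatedly joins the two closest clusters and recomputes distances to the new cluster as a size-weighted average. The Python sketch below shows only this clustering step on a small distance matrix; it is not the MEGA implementation, it does not cover Maximum Likelihood inference or bootstrapping, and the taxon names and distances are invented for illustration.

```python
def upgma(labels, dist):
    """Cluster taxa by UPGMA.

    labels: list of taxon names.
    dist:   symmetric matrix (list of lists) of pairwise distances.
    Returns the tree topology as nested tuples.
    """
    n = len(labels)
    # cluster id -> (subtree, number of taxa in the cluster)
    clusters = {i: (labels[i], 1) for i in range(n)}
    # pairwise distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # join the two closest clusters
        i, j = min(d, key=d.get)
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        # distance from the new cluster to each remaining cluster is the
        # size-weighted mean of the distances from its two parts
        new_d = {}
        for k in clusters:
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            new_d[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree


# Hypothetical taxa and distances (not data from this study)
taxa = ["taxon_A", "taxon_B", "taxon_C", "taxon_D"]
matrix = [
    [0.00, 0.02, 0.20, 0.22],
    [0.02, 0.00, 0.21, 0.23],
    [0.20, 0.21, 0.00, 0.04],
    [0.22, 0.23, 0.04, 0.00],
]
print(upgma(taxa, matrix))
```

In this toy matrix the two closest pairs are joined first and the two resulting clusters are joined last, giving a tree with two sister pairs.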

Sample preparation for NMR analysis

Approximately 10 mg of methanol crude extract was dissolved in 600 µL of deuterated acetone, which included the internal standard tetramethylsilane (TMS). The mixture was sonicated for 5 min and centrifuged at 1400 × g for 5 min. Next, 550 µL of supernatant was collected with a pipette and placed in a 5-mm NMR tube. All experiments (1D and 2D experiments) were performed using a Bruker Avance III HD 400 MHz NMR spectrometer, equipped with a 5-mm BBFO probe (Double Resonance Broadband Observe with 19F probe) at 25 °C. The 1H-NMR spectra of Chaetoceros extract were collected using the following parameters: pulse program zg30; relaxation delay 1 s; pulse width 8.90 µs; number of scans 64; sweep width 18 ppm; and center of spectrum 6.50 ppm. The 13C-NMR spectrum of Chaetoceros extract was collected using the following parameters: pulse program zgpg30; relaxation delay 2 s; pulse width 7.50 µs; number of scans 8900; sweep width 240 ppm; and center of spectrum 100 ppm. Parameters of the 2D J-resolved NMR were as follows: pulse program jresgpprqf; relaxation delay 2 s; 128 increments; 16 transients; sweep width 18 ppm; and center of spectrum for 1H was 8.0 ppm in both dimensions. The parameters for HMBC were as follows: pulse program hmbcetgpl3nd; relaxation delay 2 s; 512 increments; 84 transients; sweep width and center of spectrum for 1H were 20.0 and 9.0 ppm, respectively. Sweep width and center of spectrum for 13C were 240.0 and 100.0 ppm, respectively.
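The sweep width and center of spectrum together define the observed chemical-shift window, and the sweep width in ppm converts to hertz by multiplying by the observe frequency in MHz. The short Python sketch below works this out for the 1H acquisition; the function name is hypothetical, and the nominal 400 MHz value is used as an assumption for the exact 1H observe frequency of the instrument.

```python
def spectral_window(center_ppm, sweep_width_ppm, observe_mhz):
    """Return (low_ppm, high_ppm, sweep_width_hz) for an NMR acquisition.

    1 ppm corresponds to observe_mhz hertz, so the sweep width in Hz is
    the sweep width in ppm multiplied by the observe frequency in MHz.
    """
    low_ppm = center_ppm - sweep_width_ppm / 2
    high_ppm = center_ppm + sweep_width_ppm / 2
    sweep_width_hz = sweep_width_ppm * observe_mhz
    return low_ppm, high_ppm, sweep_width_hz


# 1H experiment from the text: center 6.50 ppm, sweep width 18 ppm
# (400.0 MHz is the nominal frequency, assumed here)
print(spectral_window(6.50, 18, 400.0))
```

With these settings the 1H window spans roughly −2.5 to 15.5 ppm, which covers the TMS reference at 0 ppm as well as the downfield chlorophyll signals discussed in the Results.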

Results and discussion

Growth pattern of Chaetoceros

Chaetoceros CHAN had the greatest cell numbers between day 3 (183 × 10⁴ cells/mL) and day 6 (192 × 10⁴ cells/mL) during the culture period (Supplementary Fig. 2). The growth of Chaetoceros CHAN and Chaetoceros BIM increased exponentially during days 2–3, and that of Chaetoceros CEMB increased exponentially during days 3–5. At day 6, the cell numbers of Chaetoceros CEMB and Chaetoceros BIM were 168 × 10⁴ and 150 × 10⁴ cells/mL, respectively.
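Cell densities such as these are commonly summarized as a specific growth rate, µ = ln(N₂/N₁)/(t₂ − t₁). This rate was not reported in the study; the Python sketch below simply illustrates the standard formula using the two Chaetoceros CHAN densities quoted above, read as 183 × 10⁴ and 192 × 10⁴ cells/mL. The function name is hypothetical.

```python
import math


def specific_growth_rate(n1, n2, t1_days, t2_days):
    """Specific growth rate (per day) between two cell-density readings."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("cell densities must be positive")
    if t2_days <= t1_days:
        raise ValueError("t2 must be later than t1")
    return math.log(n2 / n1) / (t2_days - t1_days)


# Chaetoceros CHAN, day 3 to day 6 (densities in cells/mL, as quoted in the text)
mu_chan = specific_growth_rate(183e4, 192e4, 3, 6)
print(f"{mu_chan:.3f} per day")
```

The resulting rate is close to zero (about 0.016 per day), consistent with a culture that has left the exponential phase and is approaching a plateau between days 3 and 6.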

Morphological characterization of Chaetoceros

The most abundant phytoplankton in the Central Gulf of Thailand are diatoms and blue-green algae (Kajonwattanakul et al. 2008). The Chaetoceros isolates in this study originated from the Gulf of Thailand: Chaetoceros CEMB and Chaetoceros BIM were isolated from Chon Buri province, and Chaetoceros CHAN was isolated from Chanthaburi province.

Light microscope images are shown in Figs. 1a–c. The three Chaetoceros isolates were small (ca. 5 µm). They were delicate and nearly square or rectangular in girdle view, with the pervalvar axis longer than the apical axis. Traditionally, the identification of diatoms at the species level has been based on morphological features determined with the aid of light microscopy, including the morphology of the colonies, the shape and dimensions of cells, the thickness and direction of setae, the number and shape of chloroplasts and the presence and morphology of resting spores. However, other features that can only be resolved with electron microscopy, such as the fine structure of valves and setae and the location and number of rimoportulae, are now considered important (Sunesen et al. 2008). To minimize possible misidentifications, we used both scanning electron microscopy and light microscopy.

Fig. 1

Light (a–c) and scanning electron (d–l) microscope images of Chaetoceros CEMB (a, d, g and j), Chaetoceros CHAN (b, e, h and k) and Chaetoceros BIM (c, f, i and l); the higher magnification shows the shape of entire cells (scale bars 5 µm)

Scanning electron microscopy showed that the cells of Chaetoceros CEMB, CHAN and BIM were usually solitary with flat or slightly convex valves (Figs. 1d–l). The setae were straight and narrow in diameter and arose from the poles of the cells. The surfaces of the frustules or cell walls were smooth and rectangular in girdle view. Chaetoceros CEMB cells were shallow rectangular, whereas those of Chaetoceros CHAN and Chaetoceros BIM were square to rectangular. Setae length (18.37 ± 7.41 μm) and transapical axis (4.66 ± 1.25 μm) were significantly greater in Chaetoceros CEMB than in Chaetoceros CHAN and Chaetoceros BIM (Table 1). Chaetoceros is a centric diatom with lightly silicified frustules. Each frustule possesses four long, thin spines or setae. The setae link the frustules together to form colonies of several cells. Frustules can usually be seen in girdle view. Distinguishing Chaetoceros species is difficult using a light microscope. The form of the chains, the size of the apical axis and valve shape are some of the most important morphological characters for recognizing species in this genus (Lee et al. 2014). All isolates were confirmed to be Chaetoceros based on observation of their morphological features with a scanning electron microscope (Table 2).

Table 1 Morphological characters for differentiating the Chaetoceros isolates in this study
Table 2 Comparison of the morphological features of Chaetoceros CEMB, CHAN and BIM by scanning electron microscopy

Cells of C. debilis are roughly rectangular in girdle view and connected in spiraling chains. The basal part of the setae is distinct, and setae extend outward from the spiral. Valves are flat or slightly convex (although the spines make them appear concave). Apertures are narrowly oval and sometimes slightly constricted in the middle. Cell diameter ranges from 8 to 40 µm; although the species is distributed worldwide, it mainly occurs in cold waters (Hasle and Syvertsen 1996). Species similar to C. debilis include C. curvisetus, which forms spirals wider in diameter; in addition, the apertures are larger and widely oval in C. curvisetus (Guiry and Guiry 2012; Tas and Hernández-Becerril 2017). The cells of C. gracilis and C. muelleri have an apical axis of 6–8 and 8–9 μm and a transapical axis of 3–6 and 6–7 μm, respectively, while C. calcitrans has a cell diameter of 2–5 µm (Olenina et al. 2006). C. debilis is larger than C. gracilis and C. muelleri. Light and scanning electron microscope observations showed that Chaetoceros CEMB may be a different species from Chaetoceros CHAN and Chaetoceros BIM. Nevertheless, for trained diatomologists, light microscopy is still faster and more reliable for diatom identification in a mixed sample.

C. gracilis and C. calcitrans are extensively used as food sources for rearing prawn larvae (Seraspe et al. 2014). C. gracilis is the phytoplankton species most commonly used in bivalve mollusk and fish hatcheries (Helm et al. 2004). Their effectiveness stems in part from their small size and n-3 HUFA content.

Nucleotide sequences and 18S rDNA phylogeny of Chaetoceros

The 18S rDNA sequences of Chaetoceros were obtained by gene cloning and unidirectional sequencing. The 18S rDNA sequence of Chaetoceros CEMB was 1794 bp in length (Accession number MW513719.1) and was similar to that of C. muelleri (E-value = 0.0, identity = 99%). The 18S rDNA sequence of Chaetoceros CHAN was 1788 bp in length (Accession number MW513720.1) and was similar to that of C. gracilis (E-value = 0.0, identity = 99%). The 18S rDNA sequence of Chaetoceros BIM was 1789 bp in length (Accession number MW513721.1) and was similar to that of C. gracilis (E-value = 0.0, identity = 99%). The 18S rDNA sequences of the three Chaetoceros isolates showed high similarity (Supplementary Fig. 3).

The BLAST analysis revealed high similarity between the Chaetoceros sequences obtained in our study and GenBank sequences. We characterized partial nuclear 18S rDNA sequences of the three Chaetoceros isolates and compared them with available DNA sequences (12 sequences) obtained from GenBank (www.ncbi.nlm.nih.gov) (Fig. 2). Chaetoceros CHAN and BIM clustered in the same clade as C. gracilis, and Chaetoceros CEMB was distinct from the others. This result was consistent with morphological data suggesting that Chaetoceros CEMB had significantly larger setae and apical axes than Chaetoceros CHAN and BIM. Any lack of complete consistency between molecular and morphological identification may stem from morphological shifts that occur between environmental and cultured specimens. Thus, species identification both before and after culture might be required to ensure the accuracy of identification (Kesici et al. 2013).

Fig. 2

Maximum Likelihood phylogeny inferred from partial 18S rDNA sequences of Chaetoceros CEMB (Accession number MW513719.1), CHAN (Accession number MW513720.1) and BIM (Accession number MW513721.1) (black dots) and Chaetoceros 18S rDNA sequences (12 diatom taxa) from GenBank; 1000 bootstrap replicates were performed to assess the reliability of the topology. A partial 18S rDNA sequence of Chlorella vulgaris (Accession number X13688.1) was used as the outgroup

Chaetoceros is a diverse genus of marine diatoms. Although the morphology of many members of the genus has been well described, molecular taxonomic studies of Chaetoceros are scarce. However, recent approaches that combine morphological and molecular tools have revealed cryptic diversity within the genus (Gaonkar et al. 2018). The 18S and 28S rDNA phylogenies might provide suitable markers for resolving the species-level taxonomy of Chaetoceros (Oh et al. 2010).

RAPD profiles of Chaetoceros

Amplified fragments 300–2000 bp in size were obtained using RAPD-PCR analysis with UBC101 and OPB01 primers (Fig. 3). A dendrogram based on the RAPD-PCR bands was created. In the study of the interpopulational variability of the three Chaetoceros culture populations, the selection of the RAPD primers was based on the quantity, intensity and repeatability of the amplified fragments. These amplified fragments ranged in size from 50 to 2200 bp. A total of 80 fragments were identified, and 34 of these fragments (42.5%) were polymorphic. The average number of fragments per primer was relatively high. The percentage of polymorphic bands was 33.33%, 60.00% and 30.43% for Chaetoceros CHAN, Chaetoceros CEMB and Chaetoceros BIM, respectively (Table 3). DNA barcoding requires molecular loci that are variable enough to discriminate species and a molecular reference database for comparison. The similarity or divergence of the molecular sequence of an unknown organism to a vouchered reference sequence in the database is used for species identification. DNA barcoding of environmental samples involves the extraction of DNA from a pooled sample, PCR amplification of a target locus, cloning of the resulting PCR products, sequencing and analysis. With DNA barcoding techniques, even morphologically similar strains can be identified at the species level. These molecular phylogenetic analyses also enable the rapid, convenient and accurate classification of diatoms and have thus contributed considerably to studies of diatom diversity.
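Percentages of polymorphic bands are typically derived from a band presence/absence matrix scored from the gel, and pairwise band sharing can then serve as input for a dendrogram. The Python sketch below illustrates both calculations under stated assumptions: a band is counted as polymorphic when it is present in some but not all lanes, the Jaccard coefficient is used as one possible similarity measure (the coefficient used in this study is not stated), and the matrix is invented for illustration rather than scored from Fig. 3.

```python
def polymorphic_percentage(band_matrix):
    """Percentage of scored bands that are absent from at least one lane.

    band_matrix: rows are lanes (samples), columns are band positions,
    1 = band present, 0 = band absent.
    """
    n_bands = 0
    n_polymorphic = 0
    for column in zip(*band_matrix):
        if not any(column):
            continue  # position with no band in any lane is not scored
        n_bands += 1
        if not all(column):
            n_polymorphic += 1
    return 100.0 * n_polymorphic / n_bands


def jaccard_similarity(row_a, row_b):
    """Shared bands divided by bands present in either lane."""
    shared = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return shared / either if either else 0.0


# Hypothetical presence/absence scores for three lanes and five band positions
bands = [
    [1, 1, 0, 1, 1],
    [1, 0, 1, 1, 1],
    [1, 1, 1, 1, 0],
]
print(polymorphic_percentage(bands))
print(jaccard_similarity(bands[0], bands[1]))
```

A similarity matrix built this way can be converted to distances (1 − similarity) and clustered to obtain a dendrogram of the kind described above.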

Fig. 3

A 2.0% agarose gel showing RAPD patterns of Chaetoceros using UBC101 (left) and OPB01 (right). Lanes 1 and 2, Chaetoceros CHAN; Lanes 3 and 4, Chaetoceros CEMB; Lanes 5 and 6, Chaetoceros BIM; M is the 100-bp DNA marker

Table 3 Number of RAPD fragments and polymorphic products obtained in the analysis of three Chaetoceros populations

RAPD-PCR has been used for the molecular characterization and identification of 17 samples of Sargassum spp. (Ho et al. 1995). A 450-bp fragment generated using OPA13 was detected in 12 of 17 samples of Sargassum. This fragment was present in profiles from Turbinaria (Sargassaceae). This study showed that RAPD-PCR is useful for discriminating Sargassum samples and developing fingerprints for them. PCR–RFLP analysis has been used to resolve the species-level differences of 18 isolates of Chaetoceros Ehrenberg (Bacillariophyceae) by targeting the rbcL region of chloroplast DNA, which encodes the Rubisco large subunit (Toyoda et al. 2011). RAPD patterns for the species-level differences of Chaetoceros have not been reported to date. Molecular identification appears to be relatively effective for diatom identification given the similar efficacies of molecular and morphological identification in this study. However, more work is needed to optimize morphological and molecular approaches for diatom identification.

Identification of metabolites extracted from Chaetoceros by NMR spectroscopy

The 1H-NMR spectra of the methanol extracts from all Chaetoceros isolates showed similar characteristic chemical shift peaks. A total of 27 metabolites were clearly identified based on comparison with previous research, a free NMR database (The Human Metabolome Database, HMDB) and a commercial NMR database (Bruker AssureNMR). The 1H-NMR spectra shown in Fig. 4 contain different groups of metabolites, including amino acids, sugars, carboxylic acids, fatty acids, vitamins and carotenoids. The peaks corresponding to the structures of each metabolite are summarized in the Supplementary data.

Fig. 4

1H-NMR spectra of Chaetoceros methanol extract in acetone-D6: (a) Chaetoceros CEMB, (b) Chaetoceros BIM and (c) Chaetoceros CHAN, where 1, glutamate; 2, proline; 3, alanine; 4, isoleucine; 5, methionine; 6, choline; 7, glycine; 8, cholesterol; 9, palmitic acid; 10, oleic acid; 11, linoleic acid; 12, α-linolenic acid; 13, arachidic acid; 14, glucose; 15, sucrose; 16, myo-inositol; 17, fucoxanthin; 18, astaxanthin; 19, lutein; 20, zeaxanthin; 21, violaxanthin; 22, chlorophyll c1; 23, chlorophyll a; 24, glutamine; 25, valine; 26, leucine; and 27, stearic acid

The characteristic chemical shifts of eight amino acids and sugars were observed around the region 4.10–1.98 ppm, which correspond to the -CH2- protons of amino acids, and 1.48–0.96 ppm, which correspond to the -CH- and -CH3 protons of amino acids (Azizan et al. 2018; Ma et al. 2019; Iglesias et al. 2019). The peaks around 5.20–3.82 ppm correspond to the -CH- protons of glucose and sucrose, and the peaks around 3.82–3.67 ppm correspond to the -CH2- protons of glucose and sucrose (Richter and Berger 2013). The representative proton signals of fucoxanthin (olefinic-H), astaxanthin, lutein and zeaxanthin were observed at a chemical shift around 7.01–6.10 ppm (Zailanie and Purnomo 2017; Shumilina et al. 2020; Otaka et al. 2016; Iwai et al. 2008). The identifications of these carotenoids have been confirmed by 2D-NMR (HMBC and JRES); the JRES spectrum showed the singlet signals of fucoxanthin and astaxanthin at 2.01 and 1.98 ppm, respectively, which correspond to the methyl groups (Supplementary Fig. 4) (Subramanian et al. 2015). The correlation between the proton and carbon signals in the HMBC spectrum is consistent with the results of previous studies (Azizan et al. 2018) (Supplementary Figs. 5–6). The signals of chlorophyll a and chlorophyll c1 observed around 9.77–8.20 ppm correspond to the -NH- protons of chlorophyll structures.

1H-NMR and 2D-NMR spectra of the crude extract revealed signals for six fatty acids, including palmitic acid, oleic acid, linoleic acid, α-linolenic acid, arachidic acid and stearic acid (Fig. 5). The characteristic peaks were similar to the results of previous studies (Roswanda et al. 2017; Otto et al. 2014; Singer et al. 1996). The correlation between proton and carbon signals observed around 1.73–1.29 ppm corresponded to arachidic acid, palmitic acid and stearic acid. Other valuable metabolites, such as myo-inositol, cholesterol and choline, were detected by 1D and 2D-NMR spectroscopy. The results indicated that further purification was not required for the identification of some major and minor small metabolites by NMR spectroscopy.

Fig. 5

Expanded 1H-NMR spectra of Chaetoceros methanol extract with the peak area (ratio) compared with the 0.03% TMS peak area in acetone-D6: (a) Chaetoceros CEMB, (b) Chaetoceros BIM and (c) Chaetoceros CHAN, where 10, oleic acid; 11, linoleic acid; 12, α-linolenic acid; 13, arachidic acid; 22, chlorophyll c1; 23, chlorophyll a; 24, glutamine; 25, valine; 26, leucine; and 27, stearic acid

Lipids play an important role in larval growth and survival. Eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) are considered essential fatty acids because they are integral components of the plasma membrane and marine fish larvae cannot synthesize them from α-linolenic acid 18:3 (n-3). Macrobrachium rosenbergii also lacks the ability to synthesize linolenic acid and linoleic acid (D’Abramo and Sheen 1993) and has a limited ability to elongate and desaturate short-chain n-3 and n-6 polyunsaturated fatty acids (e.g., C18) to long-chain polyunsaturated fatty acids (e.g., C20) (Reigh and Stickney 1989). Thus, marine fish larvae must acquire PUFAs through their diet of zooplankton (e.g., rotifers and crustaceans) enriched in these nutrients. Increasing the PUFA content of zooplankton before feeding larval fish and shrimp is a regular practice in the aquaculture industry (Apt and Behrens 1999).

The supply of EPA and DHA traditionally produced by marine fisheries will be insufficient to meet market demand in the food industry. Consequently, a sustainable alternative source is urgently required. Moreover, EPA and DHA as n-3 supplements can potentially be used as an adjuvant for cardiac issues associated with coronavirus disease 2019 (COVID-19) (Oliver et al. 2020). Under adaptive laboratory semi-continuous culture conditions, the smaller C. gracilis cells can accumulate relatively high levels of EPA (41.5% of total fatty acids) and fucoxanthin (Tachihana et al. 2020). In this study, unsaturated fatty acids and fucoxanthin were detected in Chaetoceros. These results suggest that high biomass production together with high unsaturated fatty acid and fucoxanthin contents might be achieved in Chaetoceros CHAN and BIM by semi-continuous culture at a low dilution rate with sufficient nutrients.

Conclusions

This study demonstrates the potential for DNA barcoding, coupled with microscopic observation and NMR characterization, for assessing Chaetoceros biodiversity. The RAPD barcodes, 18S rDNA sequences and NMR profiles of the three diatom isolates from this study can be used to identify Chaetoceros species when morphological differences are ambiguous.