Background

Coronavirus disease 2019 (COVID-19) was first reported in Wuhan on December 12, 2019; it is caused by a novel coronavirus that has similarities with bat coronaviruses (SL-CoVZC45 and CoVZXC21) (Wu et al. 2020; Koyama et al. 2020). The novel RNA virus mutates rapidly, and new variants have spread around the world (Mercatelli and Giorgi 2020). As of July 7, 2020, several clades of SARS-CoV-2 variants had been identified. Clade GR was the most frequently observed worldwide, whereas clade L, the first observed clade of SARS-CoV-2, had the smallest proportion among all clades. The variants found on the Asian continent were dominated by clade O (Hamed et al. 2020).

As of early October 2020, Indonesia had confirmed a total of 291,182 positive COVID-19 cases. The percentage of deaths in Indonesia was 3.72% (10,856 deaths), which is higher than the global death trend (2.98%) (WHO 2020). The surge of COVID-19 cases in Indonesia is suspected to be related to mutations in the Spike Glycoprotein (S) gene, which mediates binding of the virus to the ACE-2 (Angiotensin-Converting Enzyme 2) receptor in human hosts (Hoffmann et al. 2020). Several in silico studies in Indonesia currently address the immunoinformatic aspects (Ansori et al. 2020b; Nidom et al. 2020), molecular docking (Parikesit and Nurdiansyah 2020a, b), and genetic variants and mutations (Ansori et al. 2020a, b; Nidom et al. 2020) of the SARS-CoV-2 genome from Indonesian COVID-19 cases. However, microRNA (miRNA) studies of SARS-CoV-2 are rarely conducted in Indonesia. Therefore, we used an in silico approach to predict and profile miRNAs from Indonesian samples.

MicroRNA is a non-coding RNA that has a vital role in the gene expression of an organism. MicroRNA dysregulation can cause changes in expression patterns and gene mutations leading to disease severity (Winter et al. 2009; Krol et al. 2010; Mendell and Olson 2012; Finnegan and Pasquinelli 2013; Fernandez et al. 2017). Alterations of miRNA function can also change genetic variation, which can be identified by detecting target miRNAs from miRNA sequences (Cammaerts et al. 2015; Martin-Guerrero et al. 2015; Fernandez et al. 2017). Additionally, pathogenicity and viral RNA replication are mediated by the direct binding of host miRNA to viral RNA. The binding of miRNA to viral RNA induces genetic dysregulation in the host. By mimicking host cellular miRNAs, viruses affect host regulatory pathways and directly change host transcriptomes (Skalsky and Cullen 2010; Trobaugh and Klimstra 2017).

Related research on Asian SARS-CoV-2 isolates has typically predicted miRNAs through bioinformatics analysis, using a variety of methods. Sarma et al. (2020) compared isolates from China with those from several other countries to study miRNA interaction between humans and SARS-CoV-2. They applied VMir to identify potential pre-miRNAs and MatureBayes to identify and predict mature miRNAs. In their study, 22 significant potential miRNAs were predicted from five SARS-CoV-2 genome sequences. Another comparative study, of the SARS-CoV genome with SARS-CoV-2 genomes from 24 countries, was conducted by Khan et al. (2020), who found 24 predicted miRNAs. They used miRNAFold software to predict pre-miRNAs from hairpin miRNAs, MFE (Minimum Free Energy) miRNA analysis software to find secondary structures, and MatureBayes to predict miRNAs. Other studies, however, predict miRNAs directly without first determining pre-miRNAs. That approach can be performed with the web-based analysis software miRDB to detect the targeted miRNAs. This software detects mature miRNAs that affect gene regulation directly, with functional annotation (Fulzele et al. 2020; Haddad and Al-Zyoud 2020). Fulzele et al. (2020) examined the role of SARS-CoV-2 miRNA targets in relation to human functional genes through KEGG pathway analysis and Gene Ontology analysis in DIANA-miRPath v.3.0, based on target prediction scores above 90 from miRDB. They compared isolates from several Asian countries with Wuhan isolates. The study shows that seven target miRNAs have significant scores from miRNA 558 SARS-CoV-2 isolates. Li et al. (2020) found that miR-16-2-3p, miR-6501-5p, and miR-618 were significantly expressed in peripheral blood genome sequences from COVID-19 patients. Their differential expression analysis used miRDeep2 for filtering against the reference genome.

In this study, we used SARS-CoV-2 samples from Indonesia that were obtained from several public genome databases and compared them with Wuhan samples to predict miRNA targets using a bioinformatics pipeline. This research aimed to find the target host miRNAs of SARS-CoV-2 and their role in the pathogenesis of COVID-19 in Indonesian cases by applying bioinformatics analysis.

Methods

SARS-CoV-2 sequence data

All SARS-CoV-2 data were obtained from GISAID (https://www.gisaid.org/), NCBI (https://ncbi.nlm.nih.gov), and the National Genomics Data Center (NGDC), which is part of the China National Center for Bioinformation (https://bigd.big.ac.cn/gwh/). In total, we included 39 SARS-CoV-2 samples from Indonesia and 37 samples from Wuhan. Information on all samples from both groups can be found in Supplementary data Tables 1 and 2. All data were accessed on September 13, 2020.

Predicting miRNA target

MiRDB was used to annotate and predict target human mature miRNAs against SARS-CoV-2 sequences. MiRDB was run using code from gbnegrini/mirdb-custom-target-search (https://github.com/gbnegrini/mirdb-custom-target-search), which is based on the web-based version of the software (http://mirdb.org/custom.html). We employed the mirTarget2 algorithm, which is based on a Support Vector Machine, to compute the miRNA target prediction score [30]. The threshold for the miRNA target prediction score used in our work is 80. Target miRNAs detected by miRDB are based on 38,589 mature miRNAs from the miRBase version 22.1 database (http://www.mirbase.org/). ChromosPros Version 2.1.9 software was used for quality control and trimming of the SARS-CoV-2 sequence data.
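The score cutoff described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Prediction` record, the sample identifiers, and the scores are hypothetical and do not reflect the actual output format of miRDB or of the mirdb-custom-target-search code, and treating the threshold of 80 as inclusive is also an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class Prediction(NamedTuple):
    """One predicted miRNA hit (hypothetical record layout)."""
    sample_id: str  # hypothetical genome identifier
    mirna: str      # mature miRNA name, e.g. "hsa-miR-6844"
    score: float    # target prediction score


SCORE_THRESHOLD = 80.0  # value from the text; inclusive cutoff is an assumption


def mirnas_per_sample(predictions: Iterable[Prediction],
                      threshold: float = SCORE_THRESHOLD) -> Dict[str, Set[str]]:
    """Group the miRNAs whose score passes the threshold by sample."""
    kept: Dict[str, Set[str]] = defaultdict(set)
    for p in predictions:
        if p.score >= threshold:
            kept[p.sample_id].add(p.mirna)
    return dict(kept)


def presence_count(per_sample: Dict[str, Set[str]], mirna: str) -> int:
    """Count the samples in which a given miRNA was retained."""
    return sum(1 for found in per_sample.values() if mirna in found)


# Made-up records, for illustration only.
example = [
    Prediction("sample_A", "hsa-miR-6844", 91.0),
    Prediction("sample_A", "hsa-miR-4531", 62.5),
    Prediction("sample_B", "hsa-miR-6844", 80.0),
]
per_sample = mirnas_per_sample(example)
```

Presence counts of this kind, computed separately for each sample group, are what a presence/absence comparison between groups would take as input.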

Statistical analysis

We performed a series of chi-square tests to identify significant miRNAs that can differentiate the Indonesia and Wuhan isolates. Each test captures the difference in the proportion of samples in which a miRNA is present in the two groups. Only miRNAs with a p value < 0.05 were considered significant. The corresponding cross table for each miRNA in both groups was also generated to give a clear perspective on this comparative analysis. The whole research flow can be seen in Fig. 1.
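As a minimal sketch of the chi-square comparison described above, the following Python function computes the Pearson statistic for a single 2x2 presence/absence table and its p value for one degree of freedom. The text does not state which software was used or whether a continuity correction was applied, so the uncorrected statistic here is an assumption. The example counts are reconstructed from percentages reported in this paper for hsa-miR-4778-5p (48.72% of 39 Indonesia samples, about 19, and 100% of 37 Wuhan samples) and are illustrative rather than a reproduction of the original computation.

```python
from math import erfc, sqrt
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the table

                 present  absent
        group 1     a        b
        group 2     c        d

    Returns (chi2, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Counts reconstructed from reported percentages (an assumption, see above):
# Indonesia 19 present / 20 absent, Wuhan 37 present / 0 absent.
chi2, p_value = chi_square_2x2(19, 20, 37, 0)
significant = p_value < 0.05  # significance rule stated in the text
```

Repeating this call for each predicted miRNA gives the series of tests; the sketch applies no multiple-testing adjustment because the text does not mention one.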

Fig. 1 Research process diagram flow

miRNA gene ontology analysis

The miRNA targets were analyzed for their pathways using the web-based software DIANA-miRPath v3.0 (http://www.microrna.gr/miRPathv3) based on the biological process subcategory in Gene Ontology (Vlachos et al. 2015). DIANA-miRPath v3.0 applies the DIANA-microT-CDS algorithm, which predicts miRNA targets in the CDS (Coding Sequence) and 3′ UTR regions of mRNA with high accuracy (Paraskevopoulou et al. 2013). The whole research process is depicted in Fig. 1.

Result

In total, 76 SARS-CoV-2 genome samples were included in this study: 39 sequences from Indonesia and 37 sequences from Wuhan. Each sequence is complemented with GISAID metadata, including age, gender, clade, and mutation data, to describe the SARS-CoV-2 profile. The ages of patients infected with SARS-CoV-2 in Indonesia ranged from 17 to 83 years, with a mean of 50.79 years and a standard deviation of 17.88. Meanwhile, Wuhan patients who contracted COVID-19 ranged from 21 to 65 years old, with a mean age of 48.7 years and a standard deviation of 21.23. The gender profile of COVID-19 patients in Indonesia is 74.36% men and 25.64% women. Males and females composed 45.95% and 54.05% of the Wuhan population, respectively. In both countries, SARS-CoV-2 infection was observed in people between the ages of 20 and 40 years, as seen in the age distribution stratified by gender (Fig. 2). Based on this figure, COVID-19 cases in both countries are often found in patients over 40 years of age. Based on the t-test analysis, the gender-related age distribution did not show any significant difference between the Indonesian and Wuhan populations.

Fig. 2 Age distribution from SARS-CoV-2 metadata of Indonesia and Wuhan samples, stratified by gender

Based on GISAID metadata, five distinct SARS-CoV-2 clades were identified in the Indonesian population and two in the Wuhan population, as shown in Fig. 3. Clades GH and L accounted for 51.3% and 41% of the clades found in the Indonesian population, respectively. Meanwhile, in Wuhan, clade L accounts for almost all COVID-19 cases: it makes up 94.6% of the Wuhan population, with clade S accounting for the remaining 5.4%. No significant difference was found for clade L, which is mainly present in both the Indonesian and Wuhan populations. Across all SARS-CoV-2 clades, 72 mutation variants were found in Indonesia and 24 in Wuhan. The Spike D614G and NSP12 P323L variants dominated SARS-CoV-2 mutations in the Indonesian sample population, with both showing the same frequency of 30.56%. The NSP2 N92H and Spike N856K variants dominated the samples from Wuhan, with frequencies of 25% and 12.5%, respectively.

Fig. 3 SARS-CoV-2 clades in the Indonesian (a) and Wuhan (b) populations, based on GISAID

The chi-square test yielded five significant miRNAs that differentiate the Indonesia and Wuhan samples. Table 1 lists these significant miRNAs with the corresponding p values. The full contingency tables for these miRNAs can be seen in Table 2. hsa-miR-4778-5p (p value<0.001) and hsa-miR-4531 (p value=0.001) were the two miRNAs with a clear contrast between the two groups. They were predicted in less than half of the genome samples from Indonesia (48.72% and 25.64%, respectively). In contrast, they were predicted in the majority of Wuhan samples, with the former present in 100% of samples. The other three miRNAs, hsa-miR-6844 (p value=0.011), hsa-miR-627-5p (p value=0.023), and hsa-miR-3674 (p value=0.027), were consistently predicted in both sample groups despite slight differences in proportion. More than 90% of Indonesia samples were predicted to have these three miRNAs, whereas they were detected in less than 85% of Wuhan samples. Furthermore, all samples from Indonesia were predicted to contain hsa-miR-6844. Among these five significant miRNAs, only hsa-miR-6844 is associated with the ORF1ab gene.

Table 1 Significant miRNA
Table 2 Contingency table of Significant miRNAs

In the subsequent analysis, the five significant miRNAs were analyzed with DIANA-miRPath v3.0 to obtain the related biological pathways. Table 3 shows the four pathways obtained in Gene Ontology for the five significant miRNAs of interest. Each of the associated biological pathways has a p value less than 6.5e-10 and is associated with at least three significant miRNAs, based on the distribution of probability densities from the hypergeometric test (Cao and Zhang 2014; Tomczak et al. 2018). The most significant biological pathway according to this analysis is Cellular Nitrogen Compound Metabolic Process (GO:0034641) (p value <1e-325). This biological pathway includes 500 related genes, of which almost half are associated with hsa-miR-4778-5p. This particular miRNA is also associated with the majority of genes in each of the other pathways.
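To illustrate the hypergeometric test mentioned above, the sketch below computes the upper-tail probability of seeing at least a given number of miRNA target genes inside one pathway by chance. It is a generic textbook formulation, not the internal implementation of DIANA-miRPath; the function name, parameter names, and example numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(overlap: int, universe: int,
                       pathway_size: int, targets: int) -> float:
    """Hypergeometric upper tail P(X >= overlap).

    universe      total number of genes considered
    pathway_size  genes annotated to the pathway
    targets       genes targeted by the miRNA(s)
    overlap       targeted genes that fall inside the pathway
    """
    if not (0 <= pathway_size <= universe and 0 <= targets <= universe):
        raise ValueError("pathway_size and targets must lie within the universe")
    upper = min(pathway_size, targets)
    if overlap > upper:
        return 0.0
    total = comb(universe, targets)
    # comb(n, k) is 0 when k > n, so impossible overlaps contribute nothing.
    tail = sum(
        comb(pathway_size, i) * comb(universe - pathway_size, targets - i)
        for i in range(max(overlap, 0), upper + 1)
    )
    return tail / total


# Toy numbers, for illustration only: all 5 targets fall in a 5-gene pathway
# drawn from a 10-gene universe.
p_toy = enrichment_p_value(overlap=5, universe=10, pathway_size=5, targets=5)
```

Exact integer arithmetic with `math.comb` avoids the underflow that floating-point factorials would hit for large gene sets, although a p value far below the smallest representable float would still be returned as 0.0.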

Table 3 Gene Ontology Pathway of miRNA SARS-CoV-2 Indonesia vs Wuhan

Discussion

The comparison of proportions for the five significant miRNAs in our data, as listed in Table 1, clearly shows differences in characteristics between the two sample groups. We observed that these five miRNAs are included in a list of 873 common miRNAs targeting the SARS-CoV-2 genome, which were found via in silico analysis by Fulzele et al. (2020) and Khan et al. (2020). They used SARS-CoV-2 isolates from various countries, including those from Wuhan and other Asian countries, excluding Indonesia. However, we found that none of these five miRNAs was significant in their results. Therefore, more thorough investigations of these locally unique miRNAs are needed. Aside from COVID-19, our significant miRNAs have been reported by previous studies on other diseases. Most of them have associations with several cancers (Cummins et al. 2006; Vaz et al. 2010; Jima et al. 2010; Persson et al. 2011).

Gene Ontology analysis shows four significant pathways related to the role of hsa-miR-4778-5p, hsa-miR-4531, and hsa-miR-6844 in the biological process (Table 3). Cellular Protein Modification Process (GO:0006464) indicates that our significant miRNAs play roles in post-translational modification (PTM). The reduction of phosphorylation, one of the PTM mechanisms, is associated with NSP12 mutation (Sun et al. 2020), which was found in the majority of our Indonesian samples. NSP12 is well known to be associated with ORF1ab (Yoshimoto 2020), and one of the significant miRNAs in our findings, hsa-miR-6844, is also related to ORF1ab. Therefore, NSP12 mutation may have an association with this miRNA, and our result is consistent with the previous study. The other two biological processes, Cellular Nitrogen Compound Metabolic Process (GO:0034641) and Biosynthetic Process (GO:0009058), have a role in the inflammatory process during SARS-CoV-2 infection (Valko et al. 2007; Arisan et al. 2020; Bouhaddou et al. 2020; Hu et al. 2020b; Abu-Farha et al. 2020). Lastly, Fc-epsilon Receptor Signaling Pathway (GO:0038095) plays an important role in the immunological reaction. A preliminary proteomics study (Kothapalli et al. 2020) with clinical samples related to lung injury caused by COVID-19 showed the downregulation of several proteins in the Fc-epsilon Receptor Signaling Pathway. Further research is needed to understand the role of our predicted miRNA targets in the inflammatory process and immune system in COVID-19 based on this gene ontology analysis.

The results of our miRNA target prediction analysis indicate some differences between SARS-CoV-2 found in Indonesia and Wuhan. Differences within the same virus are commonly caused by either mutation (subtle genetic changes) or recombination (major genetic changes) (Wimmer and Goldbach 1992). In this preliminary study, however, we focus only on the mutations that may cause the differences in SARS-CoV-2 characteristics between these two origins. Supplementary data Tables 3 and 4 show the mutations found in each sample from both origins. In our Indonesian samples, the three most common mutations are Spike D614G, NSP12 P323L, and N3 Q57H. These differ from the three most common mutations in Wuhan samples, namely NSP2 N92H, Spike N856K, and NS8 L84S. The effect of the Spike D614G mutation, as the most common mutation, on the infectivity of SARS-CoV-2 is well reported in several recent studies (Plante et al. 2020; Mercatelli and Giorgi 2020; Korber et al. 2020; Mohammad et al. 2020; Yurkovetskiy et al. 2020; Franco-Muñoz et al. 2020; Jackson et al. 2020). This mutation is highly associated with the infection rate in human lung cells and colon cells (Yurkovetskiy et al. 2020). Additionally, the combination of Spike D614G and NSP12 P323L in all detected SARS-CoV-2 variants indicates the importance of these two mutations in terms of transmission and pathogenicity rate (Hartley et al. 2020; Kannan et al. 2020; Mutlu et al. 2020). A study of cases from Vietnam, which also found D614G and P323L to be the most common mutations, suggests a high probability that the viruses in their samples are of European origin, as these two mutations are highly dominant in European samples (Nguyen et al. 2020). Unlike the previous two mutations, the N3 Q57H mutation is discussed in only a few studies (Wang et al. 2020; Soratto et al. 2020; Hassan et al. 2020). This mutation was found in 70% of the United States and Singapore samples (Wang et al. 2020).

In addition to the mutation factor, we also included clade analysis to compare the characteristics of SARS-CoV-2 from Indonesia and Wuhan. Clades are associated with phylogenetic analysis, which categorizes variants of a virus by lineage. The SARS-CoV-2 clade affects the diversity of miRNA profiles in a region, including the Indonesian and Wuhan samples in the current study. Among the Indonesian SARS-CoV-2 samples, the majority were grouped into clades GH (51.3%) and L (41%), with the rest categorized as clades G, O, and GR. Among the Wuhan samples, on the other hand, only one sample was classified as clade S, while the rest were classified as clade L, which is dominantly represented in Asian samples (Mercatelli and Giorgi 2020). Interestingly, clade GH, the most common group in the Indonesian samples, has been largely detected in the Americas (Hu et al. 2020a; Mercatelli and Giorgi 2020). This clade is derived from the mutations D614G and Q57H, which are commonly found in the Indonesian samples. Currently, clades GH and GR are the most commonly observed clades in all SARS-CoV-2 sequenced samples globally (Mercatelli and Giorgi 2020). From the temporal analysis, it is believed that GH clade counts peaked in May 2020 (Alm et al. 2020).

Conclusion

In the present study, we compared the predicted miRNAs from two groups of SARS-CoV-2 samples, from Indonesia and Wuhan (China). This comparative analysis detected various miRNA targets with different proportions in the two groups. However, our statistical analysis identified only five significant miRNAs that were associated with SARS-CoV-2 infection. Further analysis of the biological pathway prediction indicates four pathways that are related to the majority of the significant miRNAs. The differences in miRNA proportions between the two groups also correspond to disparities in other SARS-CoV-2 characteristics, such as clade and mutation. This in silico research can serve as an initial foundation for further research, and our significant miRNA targets can be considered as potential biomarkers or treatments for COVID-19 cases in Indonesia. In future studies, we will expand our samples and compare them with several ASEAN countries to validate our significant miRNA targets. We will also conduct in vitro and in vivo research to study the expression and role of the significant target miRNAs in the severity of COVID-19.