Introduction

MicroRNAs are of interest to many research groups because of their extensive regulatory interaction networks and potential to serve as biomarkers (Condrat et al. 2020). Although they do not code for proteins, they orchestrate gene expression regulation by binding to mRNAs and preventing further protein synthesis. Thanks to the ability to bind to the transcripts of many genes, miRNAs are indirectly involved in numerous cellular processes, such as apoptosis or proliferation, which are crucial for the functioning of cells, tissues, and organisms (Fabian et al. 2010; Filipowicz et al. 2008; He and Hannon 2004). The development of modern technologies has allowed for in-depth analysis of microRNA functioning, including large-scale profiling of entire miRNAomes in different species and tissues, both in physiological and pathological conditions (Chen et al. 2014; Chirayil et al. 2018; Lagos-Quintana et al. 2002; Pawlina et al. 2017; Ropka-Molik et al. 2018).

Currently, the greatest analytical possibilities are offered by high-throughput next-generation sequencing (NGS) technologies. These enable the generation of a large amount of sequence data in a relatively short time, without the need for prior knowledge of the analysed sequences. Therefore, they provide not only the possibility of identifying new microRNAs, which is extremely important for expanding the state of knowledge and better characterisation of the examined tissue and species, but also allow for large-scale expression analysis. One of the available NGS technologies is provided by Illumina, called sequencing by synthesis (SBS). It supports massively parallel base-by-base sequencing using fluorescently labelled dNTPs, which are incorporated into growing DNA strands (Bentley et al. 2008).

The use of next-generation sequencing to identify miRNAs not only offers a wide range of analytical possibilities but also entails certain technical requirements, including adequate computer and human resources. Knowledge of the biology of these sequences and bioinformatics is also required. The large number of available programs and ready-made scripts for the identification and analysis of microRNA expression using various analytical algorithms offer a wide range of choices (Lukasik et al. 2016), but can also lead to additional discrepancies in the obtained results. These programs are characterised by a different analytical approach, a different form of interface, often not very user-friendly, and, most importantly, varied sensitivity and specificity. This is directly reflected in the number of identified sequences and the percentage of false positive and negative results.

Therefore, choosing the most suitable program to analyse a given type of data can often involve difficulties. The main and superior criterion is obtaining the most reliable results. In addition, however, a user may be also interested in the scope of analyses offered within a given program and, from a technical point of view, hardware requirements, the speed of calculations, and the accessibility of the form of generated results. Therefore, the selection of the optimal program may increase the credibility and comparability with other results available in the world literature as well as the repeatability of obtained results.

Thus, the aim of this study was to compare the results generated by three bioinformatic programs dedicated to microRNA sequence identification based on NGS data, followed by empirical validation of chosen miRNA sequences with the RT-qPCR technique, which is very often the method of choice for the NGS data validation (Everaert et al. 2017; Gurgul et al. 2017; Pawlina-Tyszko et al. 2020).

Material and methods

Comparative bioinformatics analysis

In the first stage, the results of next-generation sequencing of microRNA libraries obtained in the previous research projects (GEO accession numbers: GSE87901, GSE148302, GSE231434) (Niwińska et al. 2022; Pawlina et al. 2017; Pawlina-Tyszko et al. 2020) were subjected to quality analysis in order to select the best, most representative samples with phred quality > 20, using the FastQC program (Andrews 2010). Low-quality reads and adapter trimming as well as length filtering (18–25 nt) were carried out with the Trimmomatic software (Bolger et al. 2014). Then, based on the database of available software for microRNA analysis—Tools4miRs (Lukasik et al. 2016)—three programs suitable for known miRNA identification and prediction of novel sequences were selected. The main criterion was their common usage, according to the number of their citations in the world literature. An additional selection criterion was the long-term support of the authors, which ensures the development of the software and performance improvements, as well as bug fixes. The miRNA analysis was performed using the latest version of the microRNA sequence reference database miRBase 22.1 (Griffiths-Jones et al. 2008; Kozomara and Griffiths-Jones 2011) and the same reference genome for each tool (EquCab2.0, Sscrofa11.1, Bos_taurus_UMD_3.1). Default analysis settings were used along with the minimum number of reads set to 2. The obtained results were compared in terms of the number of known and new microRNAs identified, the calculation time, and hardware requirements.

Then, 21 microRNAs were selected for further validation using the qRT-PCR method in the next step. Because of differences in the number of identified microRNAs between individual programs, the selection of microRNAs for validation was dictated by an attempt to reflect the observed proportions as best as possible. Therefore, 11 miRNAs identified by miRDeep2 software were selected, including four known and seven new ones, as this program predicted the greatest number of novel sequences among the three programs analysed. Seven miRNAs profiled by sRNAtoolbox-sRNAbench were chosen for validation, including five known and two new ones, as this program identified the greatest number of known microRNAs of the three programs tested. Seven microRNAs identified by UEA sRNA Workbench were selected, including three known and four novel ones, as this software identified similar numbers of known and new microRNAs, intermediate compared to the other two programs. Moreover, the selection criteria included the priority for miRNAs with a read count ≥ 50, identified uniquely by one tool and having positive scores assigned by miRDeep2 and UEA sRNA Workbench. In case there were no high-quality sequences identified solely by one tool, alternative miRNAs detected by two programs were selected. The chosen miRNAs differentially identified by the tested programs were subjected to a pathway analysis using miRNet 2.0 analytics platform (Chang et al. 2020). Moreover, four miRNAs identified by all three programs were chosen as positive controls (miR-375, miR-215-5p, miR-708-5p, miR-381-3p).

Validation with RT-qPCR

RNA samples selected in the previous step were reverse transcribed using the miRCURY LNA RT kit (Qiagen) according to the manufacturer’s protocol. qPCR was performed using the miRCURY LNA SYBR Green PCR Kit (Qiagen) and commercially available and custom-made primers (Online Resource 1) specific for selected microRNAs (miRCURY LNA miRNA PCR Assay; Qiagen), according to the standard protocol. Each reaction was performed in duplicate, on a Quant Studio 7 Flex instrument (Thermo Fisher Scientific). For each pair of primers, a negative control was included without the addition of a template (NTC). The melt curve analysis, including the additional analysis of ambiguous results using uMELT Quartz software (Dwight et al. 2011), was performed to confirm the specificity of the primers used. In order to assess the efficiency of the conducted reactions, the standard curve method was used for each pair of primers. U6 snRNA and 5S rRNA were used as endogenous controls for expression normalisation. In addition, each sample was run with primers specific for cel-miR-39-3p, which is not expressed in mammals, to further confirm the correct performance of the reaction kit used and other primers. The obtained results were analysed using the ΔΔCt calculation method (Pfaffl 2001) to determine the expression level relative to the endogenous control. Pearson correlation coefficients between the NGS and qPCR results were calculated with the devtools R package (Wickham et al. 2022).

Results

Comparative bioinformatics analysis

The initial analysis of the miRNA-seq data from the previous projects (Niwińska et al. 2022; Pawlina et al. 2017; Pawlina-Tyszko et al. 2020) allowed us to select a total of 30 samples of the highest quality: 10 from horse skin tissue, 10 from pig brain tissue, and 10 from cow mammary gland tissue. Then, three common, renowned miRNA calling programs were chosen using Tools4miRs database (Lukasik et al. 2016), namely miRDeep2 (v2.0.1.2) introduced in 2012 (Friedländer et al. 2012), sRNAtoolbox-sRNAbench introduced in 2014 (formerly miRanalyzer) (Aparicio-Puerta et al. 2019, 2022; Barturen et al. 2014), and UEA sRNA Workbench (v4.7) introduced in 2012 (Stocks et al. 2012, 2018).

These programs differ in their interface and the requirement of user advancement. miRDeep2 is a software package written in Perl programming language. It allows the identification and profiling of known miRNA sequences as well as the prediction of potentially new miRNAs in animals, based on deep sequencing data (Friedländer et al. 2012). It is a command-line package, which requires some knowledge on running this type of software. Moreover, basic bioinformatics knowledge is needed to pre-process sequences and set miRNA identification parameters. The analysis using this program takes on average several dozen minutes per sample depending on the available hardware.

sRNAtoolbox-sRNAbench allows microRNA profiling in multiple animal and plant species. It identifies known miRNA sequences, predicts novel miRNAs, and characterizes the expression profile of miRNA variants—isomiRs. It is available as a web tool, standalone version, and Docker container (Aparicio-Puerta et al. 2019, 2022; Barturen et al. 2014). The online version of sRNAtoolbox-sRNAbench does not require specialized IT knowledge; however, basic knowledge on trimming, quality and mapping of NGS reads is required. Since it is based on an external server, it is independent of the user’s hardware resources, and the analysis takes a few minutes per sample.

The UEA sRNA workbench is a downloadable software package. It allows complete analysis of single or multiple small RNA datasets from both plants and animals, profiling small RNA expression patterns and identifying novel microRNAs. It can be run on Windows, Linux, and MAC OS PCs with a java plugin (Stocks et al. 2012, 2018). Basic knowledge on microRNA characteristics and pre-processing of NGS results is required to prepare input sequences and set parameters. The analysis takes several minutes per sample if minimal recommended hardware requirements are met.

The number and type of generated output files differed between the tested programs. UEA sRNA Workbench generated a csv file with reads processing statistics and a csv file with details on identified sequences. Graphic files showing the length distribution of the identified microRNAs (.png) and secondary structures of their precursor sequences (.jpg) were also created (Online Resources 2 and 3). The miRDeep2 output embraced the main result table in csv, html, and bed formats, several run logs and statistics, as well as several separate files with details on the identified known and novel miRNAs in bed and fa formats. The identified stem-loop sequences with their characteristics and mapping details were graphically visualised in pdf format (Online Resource 4). sRNAtoolbox-sRNAbench generated several dozen output files including statistics and details on identified hairpins, known and novel mature sequences, and sequence variants (isomiRs), in txt and fa formats. Several graphs visualising, e.g. top microRNAs, mapping stats, and mapped reads, were in png format (Online Resources 5 and 6).

Further analysis showed that UEA sRNA Workbench identified on average 181 known microRNAs per sample, followed by miRDeep2 (280) and sRNAtoolbox-sRNAbench (397). The number of predicted novel sequences ranged from 6 new miRNAs per sample on average (sRNAtoolbox-sRNAbench) through 131 (UEA sRNA Workbench) to 212 (miRDeep2). This was also reflected in the number of known and novel microRNAs identified for all samples tested (Table 1). When it comes to the software-unique sequences, UEA sRNA Workbench profiled 26 known and 567 novel miRNAs; miRDeep2 67 known and 2825 new sequences; while sRNAtoolbox-sRNAbench 134 known and 39 novel microRNAs (Table 1). The number of known miRNAs common for all three analysed tools amounted to 617, whereas the number of common novel microRNAs was equal to seven. Detailed data on the number of miRNAs common for two out of three programs are shown in Table 1 and Fig. 1.

Table 1 Data on the number of known and new microRNAs identified by the analysed programs in the 30 tested samples
Fig. 1
figure 1

Venn diagram showing the number of identified known (A) and predicted novel (B) microRNAs, unique for each software and the common ones

qPCR validation results

Out of the 21 analysed microRNAs, bta-miR-574-specific primers were excluded from further testing since the reaction efficiency and NTC results were incorrect. The analysis of the melting curve of the other primers revealed high specificity, as one product was obtained for each of them. The negative controls used, i.e. negative without template (NTC) and cel-miR-39-3p, did not give detectable reaction products. The final analysis of the relative expression level showed the expression of all tested microRNAs, including the positive controls (Table 2 and Online Resource 7).

Table 2 Table showing details of the qPCR-validated microRNAs. Information is provided for each of the analysed program whether a tested microRNA was identified, and if yes, whether as known (xxx-miR-yyy) or new

In summary, all the selected 11 miRNAs identified by miRDeep2, seven determined by sRNAtoolbox-sRNAbench, and seven identified by UEA sRNA Workbench were confirmed by qPCR (Table 3). The KEGG pathway analysis allowed identification of 66 significantly enriched pathways (FDR ≤ 0.1) for miRDeep2 (including 10 unique ones), 89 for sRNAtoolbox-sRNAbench (27 unique), and 54 for UEA sRNA Workbench (3 unique) (Online Resource 8).

Table 3 Summary table of the RT-qPCR validation results

In the next step, we calculated correlation coefficients for the investigated miRNAs between the qPCR results and read counts outputted by each of the programs to assess the significance of different methods of reporting the miRNA read number. Moreover, since custom-made primers were used for some miRNAs, we performed additional reactions with alternative custom 31 and custom 47 assays. This approach was applied to gain insight into the potential influence of qPCR primers and assay design on the validation results.

For miRDeep2, four strong and significant correlations (p-value ≤ 0.05) were identified, namely for two investigated miRNAs, hsa-miR-375 (R = 0.95) and bta-miR-3432a (R = 0.92); and for two positive controls, hsa-miR-375 (cow samples; R = 0.74) and hsa-miR-708-5p (cow samples; R = 0.86). The use of the alternative custom 47 assay caused a substantial increase in the correlation coefficient from 0.09 to 0.59, and a decrease of p-value from 0.79 to 0.07. Three significant correlations (p-value ≤ 0.05) were revealed for sRNAtoolbox-sRNAbench: one strong for the examined hsa-miR-451a (R = 0.91) and two for the positive controls hsa-miR-375 (cow samples; R = 0.67) and hsa-miR-708-5p (cow samples; R = 0.92). Among the tested miRNAs identified by UEA sRNA workbench, three showed the moderate correlation level (p-value ≤ 0.05): hsa-miR-223-3p (R = 0.65), custom 31 (R = 0.67), and cfa-miR-485 (R = 0.68), while another three were strongly correlated (p-value ≤ 0.05): hsa-miR-375 (R = 0.95), bta-miR-3432a (R = 0.97), and tgu-miR-30b-3p (R = 0.78). For the alternative custom 31 assay, a decrease in the correlation coefficient was observed from 0.67 to − 0.42 accompanied by the loss of statistical significance (p-value = 0.27). Of the positive controls, the following two were moderately correlated (p-value ≤ 0.05): hsa-miR-375 (cow samples, R = 0.63) and hsa-miR-375 (pig samples, R = 0.66). When it comes to miRNAs called by two programs, their correlation coefficients and p-values were similar (Table 4).

Table 4 Data on the results of the correlation analysis between the qPCR and miRNA-seq results for the investigated assays

In the case of known miRNAs which were annotated by sRNAtoolbox-sRNAbench, the calculations were also performed for separate isomiRs, including the isomiR with the same sequence as the used assay, to elucidate the influence of different miRNA sequences and length variants on qPCR validation results. As a result, one statistically significant (p-value ≤ 0.05) moderate correlation for the exact isomiR of hsa-miR-433-3p (R = 0.69) and one strong correlation for the exact isomiR of the positive control hsa-miR-708-5p (R = 0.89) were revealed. Correlation coefficients and statistical significance increased for two out of 11 miRNAs for which this approach was applied (hsa-miR-100-5p, hsa-miR-433-3p) (Table 4). The results obtained for the other isomiRs showed different levels of correlation and significance, including sequences differing in one nucleotide and those with different expression levels (Fig. 2 and Online Resource 9).

Fig. 2
figure 2

Graphic representation of results of the single isomiR correlation analysis for hsa-miR-223-3p and hsa-miR-708-5p assays. The assay and canonical miRNA sequences are bolded, while isomiR sequences are given below. Red nucleotides denote those different compared to the canonical sequence. Grey blocks represent the lack of a nucleotide compared to the canonical sequence. “%” stands for the share of a given variant in the whole miRNA read count. Yellow coloured cells highlight p-value ≤ 0.05

Discussion

MicroRNA expression profiling is a commonly used analysis in many types of experiment. It provides valuable information on the functioning of the cellular machinery of various tissues and species, as well as its disorders in various disease states. In-depth characterization of the miRNA profile is enabled by next-generation sequencing technology. There is a large number of miRNA identification programs allowing for the annotation of already known microRNAs and prediction of potentially new ones. Since they are based on different algorithms and may apply different identification criteria, they may give different results. Therefore, in this study, we empirically evaluated the performance of three programs for microRNA analysis with the RT-qPCR method. However, it should be noted that other software may perform differently than the ones analysed in this research.

We ran each tool with the default analysis settings since we assumed that this approach is taken by most users. The exceptions involved the exclusion of sequences with only one read count to decrease the likelihood of the presence of sequencing artefacts. Moreover, the miRNA sequence length was set to 18–25 nt, since this is the length range in which most miRNAs fall (Filipowicz et al. 2005; Pawlina et al. 2017; Pawlina-Tyszko et al. 2020). Additionally, longer sequences could increase the number of false positives (Fromm et al. 2022). Nevertheless, it cannot be ruled out that the adjustment of analysis parameters could boost algorithm performance. The performance may also be influenced by the technology of library preparation. The analysed libraries were prepared with ligation-based kits and total RNA input. Libraries based on enriched small RNAs input and/or ligation-free kits utilising poly(A) tailing and template switching could affect the sequencing output differently.

The performed comparison analysis showed differences in the number of known miRNAs and potentially new sequences between individual programs. Each of them also identified a number of unique microRNA sequences (Table 1 and Fig. 1). The qPCR validation confirmed the expression of all selected microRNAs in the tested samples, which proves the high reliability of each of the programs used (Tables 2 and 3). To shed some light on possible functional implications of different performance of the investigated algorithms, we carried out a pathway enrichment analysis for the differentially identified miRNAs. The analysis revealed numerous biological pathways, of which most were common for the tools. On the other hand, similarly to the miRNA profiling results, a number of program-specific pathways were defined (Online Resource 8). This emphasises a possible influence of software choice on the results of in silico functional analysis and, as a result, conclusions drawn. However, further investigation is needed to state if these program-specific microRNAs are always associated with specific biological processes.

Next, we calculated correlation coefficients between the qPCR results and read counts of the examined microRNAs outputted by the analysed algorithms to determine if there is a superior one. In this step, we included qPCR results for additional miRNAs identified by all the algorithms. These served as positive controls not only for qPCR reactions but also for the correlation coefficient calculations since they were identified irrespectively of the ones used in this study program. Significant correlations were identified for two of the six analysed positive controls, by each tested program (Table 4). When it comes to the software unique miRNAs, two of 11 miRDeep2 miRNAs, one of seven sRNAtoolbox-sRNAbench miRNAs, and six of seven UEA sRNA Workbench microRNAs had significant correlation coefficients. Thus, UEA sRNA Workbench showed the highest number of significantly correlated miRNAs. However, since this was true only for its unique miRNAs and was not observed for the positive controls, it is difficult to determine with certainty if it is the result of better algorithm performance or the coincidental choice of better correlating miRNAs for the qPCR validation.

In general, it was noticeable that if a miRNA was detected by at least two programs, its correlation coefficient and p-value were similar; i.e. if a microRNA was strongly and significantly correlated, it was so for all the tools (Table 4). This suggests that there may exist some miRNAs which pose a difficulty for identification algorithms. Similar observations were made by Everaert and colleagues on the RNA-seq data. They stated that such a situation takes place for genes that are typically low expressed (Everaert et al. 2017). This remains in agreement with our results; however, it may not be the whole truth for miRNAs. The correlation analysis results obtained in this study for the high-quality and probability miRNAs identified by all the programs and serving as positive controls were significantly correlated for only two of six miRNAs, and significant and insignificant coefficients were noted for sequences with read counts below 100 as well as those above 1000. Thus, other factors could also play substantial roles. One of them may be the afore-mentioned library preparation protocol. It may influence the obtained results introducing library bias associated with the preference of a ligation enzyme for some miRNAs over others (Fuchs et al. 2015). Moreover, primers or assays used for qPCR validation itself may be in some cases the source of variability and discrepancies, as was implicated by the results of qPCR with alternative primers. This may be true especially for unique miRNAs and custom-made assays, which are not wet-lab validated by a manufacturer.

Furthermore, the ubiquitous expression of isomiRs that is microRNA length and sequence variants might be involved. Numerous studies revealed a wide repertoire of isomiRs for many miRNAs. They are very often composed of several dozen different variants with different expression levels (Neilsen et al. 2012; Pawlina et al. 2017), which may vary with statistical significance between tissues, developmental stages, or treatments (Bizuayehu et al. 2012; Fernandez-Valverde et al. 2010; Guo et al. 2016; Loher et al. 2014; Pawlina et al. 2017; Telonis et al. 2015). To shed some light on the potential involvement and significance of these miRNA variants for qPCR validation, we took a closer look at their length and sequence characteristics and the share of the total pool. To this end, we calculated correlation coefficients between the qPCR data and read counts as well as the abundance of separate isomiRs. First, an analysis was performed for the isomiR with the exact same sequence as the used PCR primer assay, assuming the best hybridisation efficiency. This approach caused an increase of correlation coefficients for two of 11 examined miRNAs, namely hsa-miR-100-5p and hsa-miR-433-3p; however, only the hsa-miR-433-3p result was statistically significant (Table 4).

In the next step, we carried out the same analysis for other isomiRs detected for these miRNAs (Fig. 2 and Online Resource 9). This revealed quite a complex picture. First of all, 3′ isomiRs were the most common alteration, which remains in agreement with other research (Loher et al. 2014; Pawlina et al. 2017). We also observed that even variants three nucleotides longer than the used assay can be significantly correlated (hsa-miR-433-3p), so they seem to be quite well tolerated by the PCR assays. Moreover, correlation coefficients may vary for the same length isomiRs but with a different nucleotide added, which may be exemplified by hsa-miR-708-5p for which two-nucleotide 3′ additions were detected (Fig. 2 and Online Resource 9). They were statistically significant with strong and similar correlations except for the “GA” addition, which had the lowest statistics. When it comes to 5′ variants, although they are rarer than 3′ ones and usually shorter (one nucleotide addition or deletion), they were also significantly correlated for many analysed miRNAs (e.g. hsa-miR-433-3p, hsa-miR-100-5p, hsa-miR-223-3p). Furthermore, internal single nucleotide substitutions did not seem to substantially affect PCR performance, since their presence did not rule out significant correlations (e.g. hsa-miR-100-5p and hsa-miR-30c-5p), implying that the assays may bear a level of tolerance for at least some substitutions.

In general, the picture which emerges from this analysis is that the isomiRs composition influence on qPCR results and their correlation with NGS results may be profound and very complex. This may be reflected by hsa-miR-223-3p, for which a moderately significant correlation was established between UEA sRNA Workbench and qPCR results. This miRNA was composed of 13 isomiRs in the analysed samples. The single-isomiR analysis revealed significant correlations with the qPCR results for only three of them (Online Resource 9). The hsa-miR-100-5p analysis showed that the whole miRNA profile correlations between the NGS analysis programs and qPCR results were not significant. However, 21 of 232 detected isomiRs had significant correlation coefficients. On the other hand, hsa-miR-451a was characterised by a strong and significant correlation between the whole miRNA profile NGS results and qPCR results, while 12 of the detected 13 isomiRs were strongly and significantly correlated. The obtained results suggest that if the correlation of the whole miRNA profile NGS results with qPCR results is very strong and statistically significant, the same might be noted for most of the detected isomiRs. When it comes to moderately correlated and less-significant or insignificant miRNAs, there may exist some isomiRs for which significant correlations may be identified if the isomiR profile is broad.

In order to further investigate possible causes of such diversified results for isomiRs, we looked at the number of reads of single isomiRs, assuming that the likelihood of binding of primer sequences to the most abundant variants might be higher. Of the 11 investigated isomiR profiles, the expression level of the most abundant isomiR was strongly and significantly correlated for three miRNAs (hsa-miR-451a, hsa-miR-708-5p, and hsa-miR-375-3p); however, there were other, less-numerous isomiRs with stronger correlations (Fig. 2 and Online Resource 9).

Interestingly, during the analysis of the positive control results, we noticed that two assays examined in samples from two different species (the cow and the pig) performed differently. The correlation analysis results for hsa-miR-375-3p and hsa-miR-708-5p in the cow samples revealed moderately and strongly significant correlations for all the analysed programs (except for UEA sRNA Workbench results for hsa-miR-708-5p). The same assays were not significantly correlated in the pig samples, except for hsa-miR-375-3p UEA sRNA Workbench results and its two separate isomiRs (Fig. 2 and Online Resource 9). Thus, we looked at the isomiR profiles of these miRNAs in the tested samples to see if they may underlie these differences, since isomiR profiles may be even gender and population specific (Loher et al. 2014). However, the profile did not visibly differ in the case of hsa-miR-375-3p and for hsa-miR-708-5p minor differences in the number of identified isomiRs were observed (data not shown). Therefore, it remains to be determined with a higher number of samples and assays if these results are species dependent, tissue dependent, or associated with other factors.

To sum up, factors influencing concordance between miRNA-seq and qPCR data may be very numerous. Some of these may dominate over others and their combination may create unique signatures responsible for correlation results of single miRNAs. Apart from the library protocol bias, the expression of miRNA variants (isomiRs), their abundance, length, sequence composition, and profile characteristics may account in the case of microRNAs. Investigated species could also play a role, although further research is needed to confirm this influence and elucidate its exact nature. The performed analysis implies that the influence of these factors is virtual even for different miRNA-seq identifying algorithms in the case of high-quality and likelihood miRNAs detected by at least two of the analysed programs. This may also be true for other software; however, it remains to be elucidated. Furthermore, additional features that were not assessed here may be of vital importance. Similar observations and conclusions were made for RNA-sequencing analysis workflows also validated with RT-qPCR (Everaert et al. 2017). Finally, we cannot exclude the possibility that miRNAs chosen for the validation favour accurate quantification by NGS or qPCR. The performed analysis would also benefit in the future from an increase in the number of qPCR-quantified miRNAs, single isomiRs, different species and tissues as well as library preparation protocols, used assays, and validation techniques to gain a more comprehensive view.

The most prominent differences between the analysed algorithms were noted for the number of predicted potentially new sequences. sRNAtoolbox-sRNAbench identified 39 while miRDeep2 as many as 2825 novel miRNA candidates. Although these numbers would probably change when fine tuning the algorithm analysis settings, it is rather unlikely that they would even out. All the new sequences unique for the investigated programs, selected for qPCR validation in this study, were confirmed to be expressed, which implies that they may be true novel miRNAs. However, since they constituted only a small portion of all predicted new sequences, this does not guarantee the authenticity of the other ones. Moreover, these results become especially important in the light of the recent analysis performed by Fromm and colleagues who, as a result, postulated that the limits of human miRNA annotation had been met (Fromm et al. 2022). According to the current 22.1 version of the miRBase, 2654 human mature miRNAs are deposited. When it comes to the species investigated here, 1025, 690, and 475 mature miRNAs are currently deposited for the cow, horse, and pig, respectively. These numbers are far from being close to the human data; thus, there still might be a pool of new microRNAs to be discovered in these species, especially in the scarcely investigated tissues and developmental stages. Nevertheless, caution and thorough analysis are advised when identifying novel miRNAs since their repertoire is not limitless.

The results obtained in this study shed some light on the performance of three algorithms for miRNA-seq data analysis and their concordance with RT-qPCR validation. Coenye recently suggested that the added value of qPCR validation is low in the case of high-quality RNA-seq data, highly expressed genes, and those with large differences in expression levels (Coenye 2021). On the other hand, it may be still necessary if these terms are not met. We suggest that in the case of microRNA data, other factors may also be involved and their thorough and detailed characterisation necessitates further, more extensive studies. Whatever the validation decision, miRNA-seq results should be quality controlled and carefully inspected. On the other hand, it should be taken into account that qPCR validation results with moderate concordance with miRNA-seq results are not necessarily the sign of poor NGS quality but may stem from the intrinsic nature of miRNA sequences, the investigated research material, and applied technologies.

Conclusions

All programs differed slightly in the number of annotated known miRNAs and substantially in the number of predicted new sequences. Of note, each algorithm identified a set of unique miRNAs. All the analysed programs showed a good concordance with RT-qPCR expression data and none of them outperformed the others in terms of the number of confirmed sequences. Interestingly, the highest number of significantly correlated unique miRNAs was identified by UEA sRNA Workbench and the cause of this needs to be additionally confirmed. Careful analysis and interpretation are warranted when assessing RT-qPCR validation results since many different factors and miRNA features may influence them, apart from sequencing quality.