Background

Pear is a member of the Rosaceae family and the Amygdaloideae subfamily [1], and is one of the most important temperate fruit trees globally, with a cultivation history spanning more than 3,000 years [2, 3]. At present, 22 species and 5,000 accessions of pear have been described, including 5 major domesticated species cultivated for fruit production: Pyrus communis, P. pyrifolia, P. bretschneideri, P. ussuriensis, and P. sinkiangensis [4].

Most cultivated pears are diploid (2n = 34), and the pear genome is highly heterozygous and contains many repetitive sequences. The genome of the important oriental pear variety ‘Dangshansuli’ (P. bretschneideri) was sequenced and assembled using Illumina HiSeq technology combined with a BAC-by-BAC strategy [5]. Subsequently, the western variety ‘Bartlett’ (P. communis) was sequenced with Roche 454 sequencing technology [6]. In recent years, several more pear reference genomes have been published owing to the rapid development of sequencing technologies [7,8,9,10,11,12,13]. These developments have led to the generation of large amounts of transcriptome and population re-sequencing data, which allow key genes responsible for important agronomic traits to be mined and the domestication history of pears to be studied [14, 15]. At present, pear genome and re-sequencing data have been collected in the Genome Database for Rosaceae (GDR), but transcriptome data are lacking. Therefore, there is an urgent need for a database that can effectively integrate, analyze and disseminate pear multi-omics data, and provide a platform for researchers to quickly access and use these resources. Such databases are already available for a variety of plants, such as bayberry and pineapple [16, 17]. We therefore drew on the strengths of these databases and constructed the Pear Genomics Database (PGDB). In this study, a total of nine genome sequences, 35 transcriptome datasets, and re-sequencing data from 30 pear accessions were collected. We also included commonly used tools, such as BLAST, JBrowse and phylogenetic tree building, in the PGDB, which will facilitate the future development of pear functional genomics and molecular biology.

Database construction and content

The PGDB collected and processed data on genome sequences, annotation, expression, synteny, and re-sequencing, which are stored in a MySQL database server (5.7.34). The web interface is built mainly with the Twitter Bootstrap front-end framework, based on HTML5 (HyperText Markup Language 5), CSS (Cascading Style Sheets) and JavaScript, and allows users to link various levels of information, query the data and generate results. Data downloads are handled by PHP (7.4.21) scripts. The website is served by the Apache web server (2.4.48) running on the Linux (CentOS 7.6) operating system (Fig. 1).

Fig. 1
Overview of the PGDB website architecture

Genome assemblies and functional annotations

The PGDB collected information on 9 pear genomes, including ‘Dangshansuli’ v1.0 (P. bretschneideri), ‘Dangshansuli’ v1.1 (P. bretschneideri), ‘Cuiguan’ (P. pyrifolia), ‘Zhongai 1’ [(P. ussuriensis × communis) × spp.], ‘Shanxi Duli’ (P. betulifolia), ‘Nijisseiki’ (P. pyrifolia), ‘Bartlett’ v1.0 (P. communis), ‘Bartlett’ v2.0 (P. communis) and ‘d’Anjou’ (P. communis) (Additional file 1: Table S1). We kept the ID of each gene in the database consistent with the gene ID in the original GFF annotation file. Gene Ontology (GO) [18] and InterPro [19] annotations were generated using InterProScan (v5.53-87.0) [20]. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were annotated with the bi-directional best hit (BBH) method of KAAS [21], using 40 plant species as references (Additional file 2: Table S2).
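
The core idea behind a bi-directional best hit can be illustrated with a short sketch. This is not the KAAS implementation, which applies additional score thresholds; the function names, gene IDs and scores below are hypothetical and serve only to show the reciprocal-best-hit logic.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` maps (query, subject) pairs to a similarity score,
    for example a BLAST bit score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(forward, reverse):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Hypothetical pear genes searched against hypothetical reference genes,
# and the reverse search, with made-up scores.
forward = {
    ("pearA", "refX"): 250.0,
    ("pearA", "refY"): 90.0,
    ("pearB", "refX"): 120.0,
}
reverse = {
    ("refX", "pearA"): 245.0,
    ("refX", "pearB"): 118.0,
    ("refY", "pearA"): 88.0,
}
pairs = bidirectional_best_hits(forward, reverse)
```

In this toy input only pearA and refX select each other as best hits, so pearB receives no assignment even though it has a hit to refX.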

Transcription factors

Transcription factors (TFs) and transcriptional regulators (TRs) modulate the expression of target genes, which in turn are involved in several important life processes, such as growth and development, secondary metabolism and abiotic stress responses [22, 23]. We predicted TFs and TRs with the iTAK program, which is based on a set of consensus domain assignment rules [24]. TFs and TRs were detected in the aforementioned 9 pear genomes, with the numbers of TFs and TRs, respectively, given in parentheses: ‘Dangshansuli’ v1.0 (2433, 563), ‘Cuiguan’ (2705, 616), ‘Shanxi Duli’ (2670, 628), ‘Zhongai 1’ (2409, 570), ‘Bartlett’ v1.0 (2541, 624), ‘Bartlett’ v2.0 (1910, 481), ‘Nijisseiki’ (2544, 625), ‘d’Anjou’ (2666, 703) and ‘Dangshansuli’ v1.1 (3517, 475) (Additional file 3: Table S3). The MYB and NAC families were the most numerous. These families include widely known key factors regulating development and stress responses [25, 26].

Synteny data

We identified synteny blocks and homologous gene pairs among the 9 pear genomes. The protein sequences of each genome were aligned against those of the other genomes and against themselves using BLASTP (E-value ≤ 1e-10). The MCScanX [27] software was then run with default parameters to determine the synteny blocks and homologous gene pairs from the BLASTP results.
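
The E-value cutoff can be illustrated with a small filter over tabular BLAST output. This is a minimal sketch that assumes the standard 12-column tabular layout (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score); the text does not state which output format was used, and the function name and example records are hypothetical.

```python
def filter_hits(lines, max_evalue=1e-10):
    """Keep tabular BLAST hits whose E-value is at most `max_evalue`.

    Returns a list of (query_id, subject_id, evalue) tuples.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])  # column 11 holds the E-value (assumed layout)
        if evalue <= max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept


# Two made-up hits: one strong, one weak.
example = [
    "geneA\tgeneB\t98.5\t300\t4\t0\t1\t300\t1\t300\t1e-150\t540",
    "geneA\tgeneC\t35.0\t80\t50\t2\t10\t90\t5\t85\t0.002\t38.1",
]
hits = filter_hits(example)
```

Only the first record passes the 1e-10 threshold; the second would be excluded before synteny detection.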

Marker data

The Krait tool [28] was used to mine simple sequence repeat (SSR) resources in the nine pear genomes. A total of 386,779 SSR markers were identified and divided into five categories, from dinucleotide to hexanucleotide repeats, with minimum repeat numbers of 6, 5, 4, 4 and 4, respectively. Primer3 [29], as implemented in Krait, was used to design SSR primers with the following parameters: polymerase chain reaction (PCR) product size of 100–300 bp; primer length of 20–25 bases, with an optimum of 22 bases; annealing temperature of 50–60 °C; and GC content of 40–60%, with an optimum of 50%. Default values were retained for all other parameters. In addition, 579 pairs of SSR markers were collected from the published literature [30,31,32,33,34,35].
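
The repeat thresholds can be made concrete with a regular-expression sketch that finds perfect SSRs. This is an illustration only and not Krait's search algorithm; the function names are hypothetical, and imperfect or compound repeats are ignored.

```python
import re

# Minimum repeat counts per motif length (di- to hexanucleotide), as in the text.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. not ATAT or AA)."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(sequence):
    """Return sorted (start, end, motif, repeats) tuples, 1-based inclusive."""
    sequence = sequence.upper()
    found = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # counted under its shorter unit, or a homopolymer
            repeats = (match.end() - match.start()) // motif_len
            found.append((match.start() + 1, match.end(), motif, repeats))
    return sorted(found)


ssrs = find_ssrs("GG" + "AT" * 6 + "CC")
```

Here `ssrs` holds a single dinucleotide SSR, (AT)6 at positions 3 to 14. The same sequence with only five AT copies yields no result because it falls below the dinucleotide threshold of six.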

Transcriptomic data

The PGDB included transcriptomic data from seven key stages of fruit development in the following 5 cultivars: ‘Hosui’ (P. pyrifolia), ‘Yali’ (P. bretschneideri), ‘Kuerlexiangli’ (P. sinkiangensis), ‘Nanguoli’ (P. ussuriensis) and ‘Starkrimson’ (P. communis) [36]. The RNA-seq reads were mapped to the reference genome using SOAPaligner [37]. Transcript abundance was quantified with in-house Perl scripts as reads per kilobase of exon model per million mapped reads (RPKM). In addition, the results were converted to bigWig format using deepTools [38] and loaded into JBrowse.
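
RPKM normalizes a gene's read count by its exon length in kilobases and by the sequencing depth in millions of mapped reads. The sketch below shows the calculation; it is not the in-house Perl code, and the function name and example numbers are hypothetical.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    RPKM = read_count * 1e9 / (exon_length_bp * total_mapped_reads)
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# A made-up gene: 500 reads, 2,000 bp of exons, 20 million mapped reads in the library.
value = rpkm(500, 2000, 20_000_000)
```

For this example, 500 reads over 2 kb of exon sequence in a library of 20 million mapped reads gives an RPKM of 12.5.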

Utility

Database content

The homepage of the PGDB is composed of three main parts. The top navigation bar provides quick links to each module, including ‘Tools’, ‘JBrowse’, ‘Species’, ‘Download’, and ‘About’. The middle part contains a brief introduction to the database and quick links to the ‘Tools’ and ‘Species’ modules. The bottom part includes the website’s launch date and other information.

Available tools

Search

The ‘Search’ page provides two retrieval modules (Fig. 2a). In the ‘Quick searching’ module, users can search for detailed gene annotations by selecting the cultivar genome and entering the gene ID. The results page includes the sequences (gene, CDS, and protein), functional annotations (GO, KEGG and InterPro) and any homologous genes (Fig. 2b). Users can select which information is displayed by choosing among the drop-down box options. Users can also retrieve genomic sequences, which are extracted with Bedtools [39], by entering reference genome coordinates. The results can be viewed online or downloaded for local storage. The ‘Sequence fetch’ module provides a batch search function for gene, CDS, and protein sequences.
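
Coordinate-based retrieval depends on the coordinate convention. The BED format used by Bedtools is 0-based and half-open, whereas coordinates are often entered as 1-based inclusive positions. The sketch below shows the conversion under the assumption that users enter 1-based inclusive coordinates; the text does not state which convention the PGDB form expects, and the function names are hypothetical.

```python
def to_bed_interval(start, end):
    """Convert 1-based inclusive coordinates to a 0-based half-open BED interval."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end


def fetch_region(sequence, start, end):
    """Return the subsequence covering 1-based inclusive positions start..end."""
    bed_start, bed_end = to_bed_interval(start, end)
    return sequence[bed_start:bed_end]


# Positions 3 to 6 of a toy sequence.
region = fetch_region("ACGTACGT", 3, 6)
```

Positions 3 to 6 of the toy sequence correspond to the BED interval (2, 6) and return "GTAC".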

Fig. 2
Search page of the PGDB. (a) Search for genetic information and sequences. (b) Genetic information search results, including gene details, GO ①, KEGG ② and InterPro ③ functional annotations, sequences ④, and homologous genes ⑤

Gene expression

The ‘Gene Expression’ page provides a search function for genes with annotated RPKM values. Users can find this function in the navigation bar or in the ‘Tools’ module in the middle section of the home page. The results are presented as line or bar charts, drawn with ECharts [40], that display RPKM values at different developmental stages of pear fruit. The query results can be browsed online or downloaded for in-depth analysis.

Synteny

On this page, comparative genomic information between different pear varieties is provided to facilitate quick retrieval of genomic collinearity and homologous gene pairs (Fig. 3a). In the ‘Synteny Block’ module, users can obtain synteny blocks by selecting the pear genome and chromosome of choice. The top half of the results page contains an image, rendered with Highcharts, showing the quantitative relationship between synteny blocks of the query and compared genomes. The bottom half of the page lists complete synteny block information (block ID, location, source, E-value) (Fig. 3b). Clicking on a synteny block links to detailed information on the homologous gene pairs within that block (Fig. 3c). In the ‘Synteny Image’ module, synteny images can be constructed between the chromosomes of any two genomes and downloaded for further study (Fig. 3d).

Fig. 3
Synteny page of the PGDB. (a) Querying synteny blocks between genomes and drawing synteny images. (b) The synteny blocks in the query and compared chromosomes. (c) The genes contained in each synteny block. (d) The collinear image drawn online

BLAST

This page provides a BLAST tool for sequence alignment, implemented with ViroBlast [41]. Nucleotide and amino acid sequence similarity searches can be performed through a user-friendly input-output interface. We provide three types of query databases: genomic sequences, CDS sequences and protein sequences (Fig. 4a, b). Nucleotide sequence databases can be searched with a nucleotide query using BLASTN or with a protein query using TBLASTN, and protein sequence databases can be searched with a nucleotide query using BLASTX or with a protein query using BLASTP. In addition, users can choose TBLASTX, which translates both the nucleotide query and the nucleotide database into protein sequences before comparison.
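
The choice of BLAST program follows from the types of the query and the database. The lookup below summarizes the standard pairing; the dictionary and function names are hypothetical and are not part of ViroBlast or the PGDB code.

```python
# (query type, database type) -> applicable BLAST programs
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "nucleotide"): ["tblastn"],
    ("protein", "protein"): ["blastp"],
}


def programs_for(query_type, database_type):
    """Return the BLAST programs that accept this query/database combination."""
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            "types must be 'nucleotide' or 'protein', got %r and %r"
            % (query_type, database_type)
        )


# A protein query against the genome or CDS databases requires TBLASTN.
choice = programs_for("protein", "nucleotide")
```

For example, a protein query can be searched against the genomic or CDS databases only with TBLASTN, whereas a nucleotide query against the same databases can use either BLASTN or TBLASTX.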

Fig. 4
The BLAST and the JBrowse tools available in the PGDB. (a) BLAST page to search for regions of similarity between sequences. (b) The BLAST results page. (c) Visualization of genomic regions using the Genome browser. (d) Detailed information about a single region

SSR markers

The PGDB provides a query page for two types of SSR markers: those predicted from the genomes and those reported in the literature. Users can search for marker data by entering SSR IDs or selecting specific items. Submitting the search criteria returns detailed information, including variety, SSR ID, scaffold, motif, type, repeat, start, end, and length. In addition, for genomic SSR markers, clicking an SSR ID displays primer details such as the forward sequence, reverse sequence, melting temperature (Tm), GC content and product size.

Phylogenetic tree building

This page provides a simple and quick tool for constructing phylogenetic trees. Users input FASTA-formatted sequences, which are aligned with MAFFT (v7.158) [42]. IQ-TREE, which uses a stochastic algorithm to infer phylogenetic trees by maximum likelihood, is then used to build the tree from the aligned sequences [43, 44]. Both the aligned sequence file and the NWK file containing the phylogenetic tree can be downloaded. Finally, the tree is displayed with the Phylo.io [45] tool.
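
The two-step workflow (alignment, then maximum-likelihood inference) could be driven from a script roughly as follows. This is a minimal sketch that only assembles the command lines: the function name, the file paths, the `--auto` option and the `iqtree` executable name are assumptions for illustration, and the options actually used by the PGDB are not stated in the text.

```python
def build_tree_commands(fasta_path, aligned_path):
    """Assemble MAFFT and IQ-TREE command lines for a FASTA input.

    Returns (mafft_cmd, iqtree_cmd, tree_path). MAFFT writes the alignment to
    standard output, so the caller must redirect it to `aligned_path`.
    IQ-TREE writes the tree to `<alignment>.treefile` by default.
    """
    mafft_cmd = ["mafft", "--auto", fasta_path]  # --auto is an assumed choice
    iqtree_cmd = ["iqtree", "-s", aligned_path]  # executable name may differ by version
    tree_path = aligned_path + ".treefile"
    return mafft_cmd, iqtree_cmd, tree_path


# Hypothetical file names.
mafft_cmd, iqtree_cmd, tree_path = build_tree_commands("input.fasta", "aligned.fasta")

# To run the pipeline where both tools are installed, one could use, for example:
#   import subprocess
#   with open("aligned.fasta", "w") as out:
#       subprocess.run(mafft_cmd, stdout=out, check=True)
#   subprocess.run(iqtree_cmd, check=True)
```

Keeping command construction separate from execution makes the sketch testable without either program installed.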

Transcription factor

This page provides a search function for predicted TF and TR families in the 9 pear genomes. The search form allows users to retrieve the family of a gene by entering its gene ID, or to obtain the complete list of genes in a family by entering the family name. We also provide a list of 94 families at the bottom of the search page for reference.

Genome browser

The genome browser is an important tool for the visualization of high-throughput sequencing data. JBrowse [46] is a genome browser based on HTML5 and JavaScript, with a fully dynamic AJAX interface. We collected genome and annotation information for 9 pear varieties, as well as genome re-sequencing data for 30 pear cultivars, which were mapped to the ‘Dangshansuli’ v1.0 genome [1, 47, 48]. In addition, we mapped transcriptome data from five pear cultivars at seven developmental stages to the nine pear reference genomes. These data can all be viewed in JBrowse. On the left-hand side of the genome browser, the ‘Available Tracks’ panel lists all displayable tracks. After the tracks to display are chosen, the information appears in a window on the right-hand side (Fig. 4c). Clicking on different parts of a sequence displays detailed information and allows users to browse gene sequences, structures and annotations (Fig. 4d).

Other options

The ‘Species’ page contains a brief introduction to the 9 pear genomes available and provides links to the relevant literature. The ‘About’ module contains three parts: the ‘Download’ page allows users to download genomic information, including FASTA files of the genome assembly, gene, CDS, and protein sequences, and gene structure data in the GFF format. The ‘Link’ page provides quick links to other plant-related databases and resources. The ‘Contact’ option allows users to contact the administrators of the PGDB.

Conclusion

The PGDB currently includes genomic, transcriptomic and re-sequencing data for pear, presented through a practical, user-friendly platform. It can help researchers quickly retrieve, browse and analyze multi-omics data, and promote in-depth studies and the development of pear omics.