Bioinformatics Resource Manager: a systems biology web tool for microRNA and omics data integration
The Bioinformatics Resource Manager (BRM) is a web-based tool developed to facilitate identifier conversion and data integration for Homo sapiens (human), Mus musculus (mouse), Rattus norvegicus (rat), Danio rerio (zebrafish), and Macaca mulatta (macaque), as well as perform orthologous conversions among the supported species. In addition to providing a robust means of identifier conversion, BRM also incorporates a suite of microRNA (miRNA)-target databases upon which to query target genes or to perform reverse target lookups using gene identifiers.
BRM has the capability to perform cross-species identifier lookups across common identifier types, directly integrate datasets across platform or species by performing identifier retrievals in the background, and retrieve miRNA targets from multiple databases simultaneously and integrate the resulting gene targets with experimental mRNA data. Here we use workflows provided in BRM to integrate RNA sequencing data across species to identify common biomarkers of exposure after exposure of human lung cells and zebrafish to benzo[a]pyrene (BAP). We further use the miRNA Targets workflow to determine the role of experimentally measured miRNAs as regulators of BAP toxicity and identify the predicted functional consequences of miRNA-target regulation in our system. The output from BRM can easily and directly be uploaded to freely available visualization tools for further analysis. From these examples, we were able to identify an important role for several miRNAs as potential regulators of BAP toxicity in human lung cells associated with cell migration, cell communication, cell junction assembly and regulation of cell death.
Overall, BRM provides bioinformatics tools to assist biologists having minimal programming skills with analysis and integration of high-content omics data from various transcriptomic and proteomic platforms. BRM workflows were developed in Java and other open-source technologies and are served publicly using Apache Tomcat at https://cbb.pnnl.gov/brm/.
Keywords: Bioinformatics, MicroRNA, Systems biology, Genomics, Zebrafish
Abbreviations
BRM: Bioinformatics Resource Manager
HBEC: Human bronchial epithelial cells
NCBI: National Center for Biotechnology Information
There is an increasing need for bioinformatics tools to assist biologists having minimal programming skills with analysis and integration of high-content omics data from various transcriptomic and proteomic platforms. The Bioinformatics Resource Manager (BRM) is a web-based tool developed to facilitate identifier conversion and data integration for Homo sapiens (human), Mus musculus (mouse), Rattus norvegicus (rat), Danio rerio (zebrafish), and Macaca mulatta (macaque), as well as perform orthologous conversions among the supported species. BRM is particularly focused on reducing data fragmentation throughout these processes: users upload full tables of data, and BRM either appends new columns directly to those tables or integrates full tables based on common (or converted) identifiers.
Biological insight relies on the interpretation of annotated data. Often annotations need to be converted from one identifier to another or carried over to an orthologous annotation for some downstream tasks. DAVID provides functionality for converting identifiers within a species but lacks the ability to look up orthologous genes. BioMart integrates internal and external data to convert identifiers and provide orthologous gene information for model organisms. The functionality of these web-based conversion tools, like BRM, relies on user-provided gene lists, although DAVID and BioMart lack the ability to merge identifier conversions with existing datasets. BRM also allows users to integrate data tables based on (1) string matching for tables that include common identifier types or (2) identifier conversion using National Center for Biotechnology Information (NCBI), UniProt and Ensembl databases to allow for integration of tables without common identifier types (e.g. cross-species integration, gene-to-protein integration). Other tools, such as GeneWeaver, allow for identifier mapping within the context of their data analysis pipeline and tools for functional genomics. While BRM will also perform these functions within the context of BRM workflows, it allows users to simply update their omics tables with new metadata and biomolecular identifiers for use in any data analysis or software programs of interest.
In addition to providing a robust means of identifier conversion, BRM also incorporates a suite of microRNA (miRNA)-target databases upon which to query target genes or to perform reverse target lookups using gene identifiers. miRNAs are small (~22 nucleotide) non-coding RNAs that function as post-transcriptional regulators of gene expression. They typically interact with targets through sequence complementarity in the 3’ UTR, making it possible to computationally predict miRNA gene targets. Several tools exist to link miRNAs to gene targets, including both computationally predicted miRNA target databases and databases with experimentally validated targets (reviewed by Singh 2017). Available databases in BRM for miRNA target prediction include TargetScan, microRNA.org, and MicroCosm, as well as the validated miRNA target database miRTarBase. Each of these databases also allows searching for miRNA targets and performing reverse target queries based on gene ID. However, for input, many existing miRNA database interfaces are limited to single miRNA queries, with the exception of microRNA.org, which allows a comma-separated list of multiple identifiers. Further, the user will again have to perform table merges to align respective miRNAs into their gene result tables. Where miRNA names are inconsistent, a user may have to use miRBase to verify conversions or use a dedicated tool like miRiadne to convert miRNA identifiers between miRBase versions 10 through 21. Instead, BRM allows users to integrate predicted targets from databases directly into the experimental tables they have uploaded into BRM as input. BRM also integrates miRBase versions to convert user miRNAs to their most recent version before querying miRNA databases to ensure successful searches.
The BRM miRNA-target query allows users to retrieve targets from multiple databases simultaneously and integrate the resulting gene targets with experimental mRNA data. By utilizing multiple databases, a single search not only yields results from all available databases but also allows a user to select more confident predictions by requiring targets to be present in multiple databases. Other available tools either allow simultaneous query of multiple miRNA target databases for human only, such as mirDIP 4.1, or let users integrate predicted targets from a single database with mRNA data, such as miRTrail. In addition, BRM’s miRNA workflows populate the missing identifier fields that typically result from merging multiple target identification resources, providing users with more comprehensive output to accurately compare across multiple prediction tools.
Construction and content
BRM is a web application implemented in Java and Extensible Hypertext Markup Language. The front-end of BRM relies on PrimeFaces, an implementation of the JavaServer Faces specification, to build user interface components. Data sources are maintained as flat files to facilitate database updates and are held in memory at runtime to accelerate ID conversion and lookups across data resources, which keeps BRM responsive even with fairly large user queries. BRM has been developed as an independent web tool, rather than on a tool-development platform such as Galaxy, to allow flexibility to meet specific development requirements and maintain a straightforward, easy-to-use interface for the biological research community. BRM allows users to upload data directly into a simple web interface and provides several comprehensive workflows, which users can run independently for specific tasks or sequentially to move data seamlessly through multiple tasks. Maintaining BRM in this way allows us to optimize functionality and ensure consistency for users over time. Further, BRM is easily extended by its developers and has the ability to scale beyond the current data to accommodate additional tools, functionality, biomolecular identifiers and species.
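The flat-file, in-memory design described above can be illustrated with a minimal sketch. It is written in Python for brevity rather than in BRM's Java, and it is not BRM's code: the file layout, the column names and the identifiers (GENEA, ENSGTOY0001, and so on) are invented for illustration. The idea is simply that a tab-separated resource is read once, indexed by one identifier column, and then queried without touching the disk again.

```python
import csv
import io
from collections import defaultdict

# Toy flat file. Column names and identifiers are hypothetical, not BRM data.
FLAT_FILE = (
    "entrez\tsymbol\tensembl\n"
    "1001\tGENEA\tENSGTOY0001\n"
    "1002\tGENEB\tENSGTOY0002\n"
)


def build_index(handle, key_column):
    """Read a tab-separated file once and index its rows by one column."""
    index = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        index[row[key_column]].append(row)
    return index


def convert(index, identifier, target_column):
    """Return every target_column value for an identifier (empty if unknown)."""
    return [row[target_column] for row in index.get(identifier, [])]


# Build the index once (at startup, in a server setting), then reuse it.
symbol_index = build_index(io.StringIO(FLAT_FILE), "symbol")
print(convert(symbol_index, "GENEA", "ensembl"))
```

A list is kept per key because one identifier may map to several records; an unknown identifier yields an empty list rather than an error.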
BRM maintains local copies of NCBI’s Gene resource, Ensembl, and UniProt for identifier conversions. miRNA reference data is aggregated from Microcosm, TargetScan, MicroRNA, and miRTarBase, with missing gene information added using MyGene.info. miRBase is used for miRNA name conversion, accession numbers, and mature sequence data. Each data resource has an associated backup process that facilitates validation, database updates, and backfilling of missing identifiers across resources.
Utility and discussion
BRM incorporates common tasks across highly relevant species to facilitate the integration and analysis of high-throughput data. The BRM web tool is organized into four workflows: 1) Add Identifiers, 2) Integrate Tables, 3) miRNA Targets and 4) miRNA Convert, which let biological researchers perform complex bioinformatics tasks through a simple web interface. Users can retrieve annotations and cross-reference gene and protein identifiers for several species, including human, macaque, mouse, rat and zebrafish, and identify miRNA targets for human, mouse and zebrafish. Further, BRM allows datasets to be uploaded as tab-separated (.txt) files with columns in any order and will maintain the structure and content of user-provided data during queries. This allows users to easily incorporate additional content into their datasets to perform comparisons across species and platforms (e.g. transcriptomics and proteomics; microarray and RNA sequencing (RNASeq); in vitro and in vivo). BRM also provides a tool for directly integrating datasets across platform or species by performing identifier retrievals in the background. The BRM ‘miRNA Targets’ and ‘miRNA Convert’ workflows allow users to quickly identify miRNA gene targets from multiple databases, integrate miRNA and mRNA datasets based on target predictions, and retrieve current miRNA annotations for metadata from older platforms.
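The behaviour described for uploaded tables (columns in any order, original structure preserved, new identifiers appended) can be sketched as follows. This is an illustrative Python outline under stated assumptions, not BRM's implementation: the function name, the column names and the toy symbol-to-Entrez mapping are all hypothetical.

```python
def add_identifier_column(header, rows, source_column, mapping, new_column):
    """Append new_column to every row, leaving existing columns untouched.

    The source column is located by name, so column order does not matter.
    Identifiers without a mapping receive an empty cell.
    """
    position = header.index(source_column)
    new_header = header + [new_column]
    new_rows = [row + [mapping.get(row[position], "")] for row in rows]
    return new_header, new_rows


# Toy table and mapping, invented for illustration.
header = ["log2fc", "symbol", "qvalue"]
rows = [["1.8", "GENEA", "0.01"], ["-0.7", "GENEX", "0.03"]]
symbol_to_entrez = {"GENEA": "1001"}

new_header, new_rows = add_identifier_column(
    header, rows, "symbol", symbol_to_entrez, "entrez"
)
for line in [new_header] + new_rows:
    print("\t".join(line))
```

Rows that cannot be converted are kept with a blank cell, so the output has the same number of rows as the input and can be re-opened in the same downstream tools.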
This tool integrates disparate data tables based on identifiers contained within the tables uploaded. Users have the ability to integrate data across species or platform (e.g. gene and protein data) without common identifiers in the tables. After uploading data, the user may select up to three identifier columns from each table upon which to perform the merge operation. Identifiers between tables can be compared using string equality, which performs a simple exact match, or conversions of identifiers within or across species can be performed. The output from this tool can be limited to a particular species as well as limited to just the intersection of the two input tables. Another important aspect of the data integration tool is that all user-provided data is maintained in the merge and the output includes a full integration of both tables based on the features chosen (see example in Cross-Species Data Integration below).
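As a rough illustration of the two matching modes (exact string equality versus identifier conversion) and of the intersection option, consider the Python sketch below. It is an assumption-laden simplification, not BRM's algorithm: it joins on a single identifier column rather than up to three, the ortholog table and gene names are invented, and when the intersection option is off it keeps only unmatched rows from the first table.

```python
def integrate(left, right, left_key, right_key, convert=None, intersection_only=True):
    """Join two lists of dict rows on an identifier column.

    convert maps a left identifier into the right table's identifier space
    (for example a hypothetical ortholog lookup); None means exact string match.
    Column names are assumed to be distinct between the two tables.
    """
    right_by_id = {}
    for row in right:
        right_by_id.setdefault(row[right_key], []).append(row)
    right_columns = list(right[0]) if right else []

    merged = []
    for row in left:
        key = row[left_key] if convert is None else convert.get(row[left_key])
        matches = right_by_id.get(key, [])
        if matches:
            merged.extend({**row, **match} for match in matches)
        elif not intersection_only:
            # Keep the unmatched row and pad the right-hand columns with blanks.
            merged.append({**row, **{column: "" for column in right_columns}})
    return merged


# Toy tables and ortholog map, invented for illustration.
human = [
    {"human_symbol": "GENEA", "human_log2fc": "1.8"},
    {"human_symbol": "GENEB", "human_log2fc": "-1.1"},
]
zebrafish = [{"zf_symbol": "genea", "zf_log2fc": "2.3"}]
orthologs = {"GENEA": "genea"}

shared = integrate(human, zebrafish, "human_symbol", "zf_symbol", convert=orthologs)
everything = integrate(
    human, zebrafish, "human_symbol", "zf_symbol",
    convert=orthologs, intersection_only=False,
)
print(shared)
print(len(everything))
```

All user-provided columns from both tables survive the merge, which mirrors the point made above that the output is a full integration rather than a list of matched identifiers.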
miRNA target prediction
Predicted gene targets from Microcosm, MicroRNA, and TargetScan, as well as experimentally validated gene targets from miRTarBase, can be queried using mature miRNA names. Mature miRNA names are converted to their current miRBase name during the search process. Target genes include identifiers for Entrez, gene symbol, and Ensembl gene and can optionally be appended to miRNA target prediction results. Gene target results can be limited to any combination of the databases and can be limited based on database overlap, e.g. require hits from at least 2 of the 4 selected databases. The workflow can optionally merge experimental data based on gene identifiers that match the predicted targets. Results include gene targets, database overlaps, respective scores from predictive databases, accession numbers for the stem-loop and mature miRNA, and the mature RNA sequence.
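The database-overlap filter (for example, requiring hits from at least 2 of the 4 selected databases) amounts to counting, for each target gene, how many databases report the miRNA-gene pair. A minimal Python sketch follows; the database names come from the text, but the miRNA name, the genes and the predictions themselves are invented, and the function is a hypothetical stand-in rather than BRM's code.

```python
from collections import defaultdict

# Toy predictions as (miRNA, gene) pairs per database. Not real database content.
PREDICTIONS = {
    "Microcosm": {("miR-toy-1", "GENEA"), ("miR-toy-1", "GENEB")},
    "MicroRNA": {("miR-toy-1", "GENEA")},
    "TargetScan": {("miR-toy-1", "GENEA"), ("miR-toy-1", "GENEC")},
    "miRTarBase": {("miR-toy-1", "GENEB")},
}


def targets_with_overlap(predictions, mirna, min_databases):
    """Return {gene: supporting databases} for targets found in enough databases."""
    support = defaultdict(set)
    for database, pairs in predictions.items():
        for name, gene in pairs:
            if name == mirna:
                support[gene].add(database)
    return {
        gene: sorted(databases)
        for gene, databases in support.items()
        if len(databases) >= min_databases
    }


hits = targets_with_overlap(PREDICTIONS, "miR-toy-1", min_databases=2)
for gene in sorted(hits):
    print(gene, len(hits[gene]))
```

Raising `min_databases` trades sensitivity for confidence: with a threshold of 3, only GENEA remains in this toy example.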
A reverse lookup, starting from gene identifiers as targets, can also be performed to return mature miRNA names. Multiple gene ID types may be used from the input table to ensure successful translation.
To facilitate analyses across tools, it may be necessary to convert miRNA identifiers to their most current miRBase version. This workflow, given a tab-delimited table, will accept one column as the defined miRNA and append its most recent version as the final column in the output. The output and conversion of identifiers can be restricted to a given species.
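The shape of this conversion can be sketched in a few lines of Python. The sketch is hypothetical throughout: the alias table stands in for miRBase history and contains invented names, species restriction is approximated by the three-letter name prefix (such as hsa- or dre-), and names without a known newer form are assumed to pass through unchanged.

```python
# Toy alias table standing in for miRBase name history. Names are invented.
CURRENT_NAME = {"hsa-toy-1": "hsa-toy-1-5p"}


def append_current_names(lines, mirna_column, current_name, species_prefix=None):
    """Append the current miRNA name as the final column of a tab-delimited table.

    Rows whose miRNA does not start with species_prefix are dropped when a
    prefix is given. Unknown names are copied through unchanged (an assumption).
    """
    header, *body = [line.rstrip("\n").split("\t") for line in lines]
    position = header.index(mirna_column)
    output = ["\t".join(header + ["current_name"])]
    for fields in body:
        name = fields[position]
        if species_prefix is not None and not name.startswith(species_prefix):
            continue
        output.append("\t".join(fields + [current_name.get(name, name)]))
    return output


table = ["mirna\tfold_change", "hsa-toy-1\t2.0", "dre-toy-7\t0.5"]
human_only = append_current_names(table, "mirna", CURRENT_NAME, species_prefix="hsa-")
all_rows = append_current_names(table, "mirna", CURRENT_NAME)
print("\n".join(human_only))
```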
Cross-species data integration
miRNA target prediction and data integration
In order to identify miRNAs predicted to regulate genes significantly altered by BAP exposure in human cells, we utilized the reverse look-up feature (gene-to-miRNA query) of the miRNA Targets workflow in BRM. A tab-delimited (.txt) file of genes differentially expressed (q < 0.05) by BAP in human bronchial epithelial cells (HBEC) was uploaded to the miRNA Targets workflow (Additional file 2). Predicted miRNAs were restricted to those identified in all 4 of the 4 target databases, meaning that the miRNA-gene target relationship was predicted by every data source: Microcosm, MicroRNA, TargetScan and miRTarBase. The miRNA from this analysis associated with the most target interactions in the dataset was hsa-miR-124-3p, which was connected to 27 gene targets regulated by BAP. miR-124-3p was recently found to be overexpressed in smokers at increased risk of cardiovascular disease and elevated in HepaRG cells after BAP exposure.
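The logic of this reverse look-up, followed by ranking miRNAs by the number of supported targets, can be outlined in Python. The sketch uses invented miRNA and gene names and toy predictions, so it reproduces the shape of the analysis only, not its results; the function name is hypothetical.

```python
from collections import Counter


def reverse_lookup(predictions, genes, min_databases):
    """Count, per miRNA, the input genes it targets with enough database support."""
    genes = set(genes)
    support = {}
    for database, pairs in predictions.items():
        for mirna, gene in pairs:
            if gene in genes:
                support.setdefault((mirna, gene), set()).add(database)
    return Counter(
        mirna
        for (mirna, _gene), databases in support.items()
        if len(databases) >= min_databases
    )


# Toy predictions, invented for illustration.
COMMON = {("miR-toy-1", "GENEA"), ("miR-toy-1", "GENEB"), ("miR-toy-2", "GENEA")}
TOY_PREDICTIONS = {
    "Microcosm": set(COMMON),
    "MicroRNA": set(COMMON),
    "TargetScan": set(COMMON),
    "miRTarBase": {("miR-toy-1", "GENEA"), ("miR-toy-1", "GENEB")},
}

# Genes standing in for a differentially expressed gene list.
counts = reverse_lookup(TOY_PREDICTIONS, ["GENEA", "GENEB", "GENEZ"], min_databases=4)
print(counts.most_common(1))
```

Here miR-toy-2 is dropped because its single interaction is supported by only three of the four toy databases, which is the same effect the 4-of-4 restriction has in the analysis above.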
BRM provides easy-to-follow workflows to assist biological researchers with complex bioinformatics tasks required for integration of disparate data types (e.g. cross-species and cross-platform), with specific tools for miRNA target prediction and conversion. Previous versions of the BRM software provided similar tools in a client-server application [25, 26]; however, compatibility with multiple operating systems (Windows vs Mac) and evolving support software (Java Runtime Environment) resulted in several versions to support and maintain. In this new version, we have converted several of the old tools, such as the identifier conversion and miRNA target query, into seamless web interfaces without the need to download software or remember login information. We have also updated the workflows to simplify multiple steps through identifier conversions that happen in the background. Here, we provide example datasets and workflows for utilizing the BRM data integration tool to identify common biomarkers in humans and zebrafish after exposure to a ubiquitous environmental contaminant, BAP. BRM integrated the two RNAseq data tables from human and zebrafish utilizing the cross-species functionality without requiring any common identifiers. Further, BRM maintained the content and structure of the uploaded files during the integration for direct use in downstream visualization tools for interpretation. The BRM miRNA Targets workflow was also utilized to identify the potential functional consequences of miRNA regulation by BAP in human lung cells; this involved target prediction for experimentally measured miRNAs and integration of predicted targets with differentially expressed mRNA collected in parallel. The resulting output included a list of high-confidence predicted targets for miRNAs regulated by BAP that were relevant to our experimental system and could be directly uploaded into other freely available software tools for additional analysis.
Overall, BRM allows for efficient processing and integration of multiple data types within a single tool and provides users the ability to effectively mine complex data.
Pacific Northwest National Laboratory is a multi-program national laboratory operated by Battelle for the U.S. Department of Energy under Contract DE-AC05-76RL01830.
This project was supported by the National Institute of Environmental Health Sciences Superfund Research Program P42 ES016465 and T32ES07060. The funding body did not play any role in the design of the study, writing of the manuscript, and collection, analysis and interpretation of data.
Availability of data and materials
All data generated or analyzed during this study are included in this published article and its Additional files.
JB participated in the design of the software, created the tutorials and drafted the manuscript. AP, EP, and JB developed BRM’s database structure and content. AP, DL, and JB developed BRM’s user-interface and workflow strategy. YC and MM analyzed and interpreted RNAseq data and tested the user interface. RLT carried out the molecular and biological studies and participated in the experimental design and data interpretation. KW and EP guided the development of BRM and revised versions of the manuscript. ST participated in software design, assisted in drafting the manuscript, and directed molecular and biological studies and data interpretation. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
- 4. Agarwal V, Bell GW, Nam J-W, Bartel DP. Predicting effective microRNA target sites in mammalian mRNAs. eLife. 2015;4:101.
- 15. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2018.
- 16. Xin J, Mark A, Afrasiabi C, Tsueng G, Juchler M, Gopal N, et al. High-performance web services for querying gene and variant annotation. Genome Biol. 2016;17:91.
- 18. Wang YE, Kutnetsov L, Partensky A, Farid J, Quackenbush J. WebMeV: a cloud platform for analyzing and visualizing cancer genomic data. Cancer Res. 2017;77:e11–4.
- 24. Nadiminty N, Tummala R, Lou W, Zhu Y, Shi X-B, Zou JX, et al. MicroRNA let-7c is downregulated in prostate cancer and suppresses prostate cancer growth. Das GM, editor. PLoS One. 2012;7:e32832.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.