MS-Helios: a Circos wrapper to visualize multi-omic datasets
Advances in high-resolution mass spectrometry facilitate the identification of hundreds of metabolites, thousands of proteins and their post-translational modifications. This remarkable progress poses a challenge to data analysis and visualization, requiring methods to reduce dimensionality and represent the data in a compact way. To provide a more holistic view, we recently introduced circular proteome maps (CPMs). However, the CPM construction requires prior data transformation and extensive knowledge of the Perl-based tool, Circos.
We present MS-Helios, an easy-to-use command line tool with multiple built-in data processing functions that allows non-expert users to construct CPMs or, more generally, circular plots with a non-genomic basis. MS-Helios automatically generates the data and configuration files needed to create high-quality, publication-ready circular plots with Circos. We showcase the software on large-scale multi-omic datasets to visualize global trends and/or to contextualize specific features.
MS-Helios provides the means to easily map and visualize multi-omic data in a comprehensive way. The software, datasets, source code, and tutorial are available at https://sourceforge.net/projects/ms-helios/.
command line interface
circular proteome map
Innovative high-throughput technologies, such as microarrays, next-generation sequencing, and mass spectrometry (MS), have greatly advanced our understanding of biological systems. With these readily available, cost-effective, and comprehensive data acquisition methods, systems biology is undergoing a transition from single-omic to multi-omic data analysis. However, integrating and visualizing thousands of multi-omic molecular profiles poses new challenges to systems biology. To date, most multi-omic analysis methods rely on clustering, correlation, or dimensionality reduction methods, e.g., principal component analysis, to transform the data prior to visualization.
To provide a holistic and integrated view, we recently introduced circular proteome maps (CPMs), which visualize sample features in a circular plot in a proteome-centric way. Circular plots allow one to visualize high-dimensional data and feature relationships in an intuitive and aesthetic way, relying on well-known plot types, e.g., histograms, scatter plots, and line plots [5, 6]. In addition, data tracks provide the means to contextualize specific features over multiple omic levels. The gold standard software to build circular plots is Circos, a command line based Perl program with a steep learning curve. Multiple R packages and tools are available to ease the construction and visualization of circular plots [6, 7, 8, 9, 10]. These tools are either built for genomic data or map other data sources to a genomic basis; none of them considers multi-omic data integration or visualization with a non-genomic basis.
To ease the construction of circular plots with a non-genomic basis, we developed a Circos wrapper termed MS-Helios. MS-Helios is a command line tool that allows for fast prototyping, data exploration, and easy generation of high-quality, publication-ready figures.
MS-Helios is a Java (1.8.0_121) desktop application with a command line interface (CLI). The CLI is built with the Apache Commons CLI library (1.3.1) to support GNU- and POSIX-like option syntax. Default parameters for MS-Helios and Circos (≥ 0.67–5) are set in Java property files. The built-in normalization and transformation methods use the Apache Commons Mathematics library (126.96.36.199).
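As an illustration of how defaults held in Java property files can be combined with user-supplied settings, the following is a minimal sketch using only `java.util.Properties`. The class name `DefaultsSketch` and the key names `plot.type`, `track.r0`, and `track.r1` are hypothetical and chosen for illustration; the actual property names used by MS-Helios are not given in the text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class DefaultsSketch {

    // Hypothetical built-in defaults; the real MS-Helios keys may differ.
    static final String BUILT_IN =
            "plot.type=histogram\n"
          + "track.r0=0.80r\n"
          + "track.r1=0.95r\n";

    /**
     * Loads defaults first, then layers overrides on top.
     * Keys absent from the overrides fall back to the defaults.
     */
    static Properties load(String defaultsText, String overridesText) {
        try {
            Properties defaults = new Properties();
            defaults.load(new StringReader(defaultsText));
            // The constructor argument acts as the fallback table.
            Properties effective = new Properties(defaults);
            effective.load(new StringReader(overridesText));
            return effective;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties p = load(BUILT_IN, "plot.type=scatter\n");
        System.out.println(p.getProperty("plot.type")); // user override
        System.out.println(p.getProperty("track.r0"));  // built-in default
    }
}
```

Layering one `Properties` object over another keeps the built-in defaults untouched, so a user only needs to specify the values that differ.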
MS-Helios reads input files with a user-defined field delimiter, given as a regular expression, e.g., comma, tab, or space. Input files must be in data matrix format, i.e., the first row contains the sample names and the first column the feature identifiers. In the stepwise construction, the first dataset defines the ideogram order and the initial feature coordinates, whereas subsequent datasets are added as data tracks. Each ideogram represents a sample, and its end coordinate corresponds to the sample size. MS-Helios provides various built-in functions to cluster, transform, normalize, sort, and filter the input data. A naïve algorithm clusters ideogram features by sample occurrence. Cluster segments can be highlighted by Circos brewer colors and/or grid lines. To assign a sample specificity score to a feature, we implemented Shannon entropy; the score is associated with the sample in which the feature has its highest value. Sample-wise normalization methods include z-score, scaling to [0, 1], and division by the minimum, maximum, mean, standard deviation, or sum. Each data track can be sorted in ascending or descending order to restructure the ideograms and their data tracks. To highlight specific features, MS-Helios supports a top-hat and a percentile filter over samples, applied by setting a threshold in the Circos rules configuration.
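The processing steps described above can be sketched in plain Java as follows. This is an illustrative sketch, not the MS-Helios source: the class and method names are hypothetical, and the text does not specify the exact entropy formula, so the sketch assumes non-negative abundances converted to proportions, log base 2, and a z-score based on the sample standard deviation (n − 1).

```java
import java.util.ArrayList;
import java.util.List;

class MatrixSketch {

    /** Splits each non-blank line on a delimiter regex; row 0 = sample names, column 0 = feature ids. */
    static List<String[]> parse(String text, String delimiterRegex) {
        List<String[]> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (!line.trim().isEmpty()) {
                rows.add(line.split(delimiterRegex));
            }
        }
        return rows;
    }

    /** Shannon entropy (bits) of one feature across samples; assumes non-negative values. */
    static double shannonEntropy(double[] values) {
        double sum = 0;
        for (double v : values) sum += v;
        double h = 0;
        for (double v : values) {
            if (v > 0) {
                double p = v / sum;
                h -= p * (Math.log(p) / Math.log(2));
            }
        }
        return h;
    }

    /** Index of the largest value, i.e. the sample the score is associated with. */
    static int argMax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) best = i;
        }
        return best;
    }

    /** Z-score of one sample column; yields NaN for fewer than two values or zero variance. */
    static double[] zScore(double[] column) {
        double mean = 0;
        for (double v : column) mean += v;
        mean /= column.length;
        double ss = 0;
        for (double v : column) ss += (v - mean) * (v - mean);
        double sd = Math.sqrt(ss / (column.length - 1));
        double[] z = new double[column.length];
        for (int i = 0; i < column.length; i++) z[i] = (column[i] - mean) / sd;
        return z;
    }

    public static void main(String[] args) {
        List<String[]> rows = parse("feature,liver,heart\nP1,8,0\nP2,3,3\n", ",");
        String[] header = rows.get(0);
        for (String[] row : rows.subList(1, rows.size())) {
            double[] v = new double[row.length - 1];
            for (int i = 1; i < row.length; i++) v[i - 1] = Double.parseDouble(row[i]);
            // header[0] is the feature-id column, so sample names start at index 1.
            System.out.printf("%s entropy=%.3f top=%s%n",
                    row[0], shannonEntropy(v), header[argMax(v) + 1]);
        }
    }
}
```

Under these assumptions, a feature detected in only one sample has an entropy of zero (maximally specific), whereas a feature with equal abundance in all samples reaches the maximum, log2 of the number of samples.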
MS-Helios supports several Circos data track plot types, including histogram, scatter, and line plots, as well as wedge highlights. The Circos configuration is specific to each plot type, with parameters set for optimal visualization of large-scale data. To ease graphical post-processing of Circos plots, MS-Helios allows users to partition the output by sample. Each construction step is stored in the MS-Helios file by serialization. MS-Helios writes Circos configuration and data files, as well as mapping files, into an output folder.
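To make the generated files more concrete, the sketch below writes lines in the standard Circos text formats: a karyotype line of the form `chr - ID LABEL START END COLOR` for each sample ideogram, and data track lines of the form `ID START END VALUE`. How MS-Helios assigns coordinates internally is not detailed in the text; the sketch assumes that each feature present in a sample occupies one unit interval and that absent features are encoded as `NaN`. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class CircosFilesSketch {

    /** Number of features present in a sample (assumption: NaN marks an absent feature). */
    static int sampleSize(double[] values) {
        int n = 0;
        for (double v : values) {
            if (!Double.isNaN(v)) n++;
        }
        return n;
    }

    /** One karyotype line: the ideogram spans 0..size, where size is the sample size. */
    static String karyotypeLine(String sample, int size, String color) {
        return String.format(Locale.ROOT, "chr - %s %s 0 %d %s", sample, sample, size, color);
    }

    /** Data track lines: each present feature gets the next unit interval on the ideogram. */
    static List<String> trackLines(String sample, double[] values) {
        List<String> lines = new ArrayList<>();
        int pos = 0;
        for (double v : values) {
            if (Double.isNaN(v)) continue; // skip features not observed in this sample
            lines.add(String.format(Locale.ROOT, "%s %d %d %.4f", sample, pos, pos + 1, v));
            pos++;
        }
        return lines;
    }

    public static void main(String[] args) {
        double[] liver = {1.5, Double.NaN, 2.0};
        System.out.println(karyotypeLine("liver", sampleSize(liver), "blue"));
        for (String line : trackLines("liver", liver)) {
            System.out.println(line);
        }
    }
}
```

Sample names are used directly as ideogram identifiers here, so they must not contain whitespace; a real implementation would need to sanitize them and keep a mapping back to the original names.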
Protein and transcript expression in juvenile Sus scrofa organs
MS-Helios enables users to build circular plots with a non-genomic basis for the exploration of high-dimensional multi-omic data without requiring any prior knowledge of Circos. MS-Helios implements the most useful Circos plot types and can readily be extended to other plot types. Our datasets demonstrate the aesthetics and power of circular plots to highlight intra- and inter-sample variation in feature abundance.
We thank K. Overmyer for fruitful discussions.
This work was supported by funds from the National Science Foundation (DBI 0701846) and the National Institutes of Health grants (P41 GM108538) and (R35 GM11810).
Availability of data and materials
Project name: MS-Helios.
Project home page: https://sourceforge.net/projects/ms-helios/
Operating system(s): Platform independent.
Programming language: Java.
Other requirements: Java 1.8 or higher, Circos 0.67–5 or higher.
License: Apache 2.0.
Any restrictions to use by non-academics: no.
HM designed and implemented the software. HM and JJC wrote the manuscript. Both authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.