Manhattan++: displaying genome-wide association summary statistics with multiple annotation layers
Over the last 10 years, there have been more than 3300 genome-wide association studies (GWAS). Almost every GWAS provides a Manhattan plot, either as a main figure or in the supplement. Several software packages can generate a Manhattan plot, but all are limited in how far they can annotate gene names, allele frequencies, and variants with a high impact on gene function, and in the additional information and flexibility they offer. Furthermore, a conventional Manhattan plot cannot distinguish a locus identified through a single variant with a very significant p-value from a locus containing multiple variants with very similar p-values, which appear to lie in a haplotype block.
Here we present a software tool written in R that generates a transposed Manhattan plot and annotates it with additional features such as variant consequence and minor allele frequency, addressing these limitations. The software also lets the user choose how and where the annotations are displayed. It can be downloaded from the CRAN repository and from the GitHub project page.
We present a major step up from existing conventional Manhattan plot generation tools. We hope that this form of display, together with the added annotations, will give readers more insight from the new Manhattan++ plot.
Keywords: Manhattan plot, GWAS, Meta-analysis, R, Software, CRAN
GWAS: Genome-wide association study
MAF: Minor allele frequency
Mb: Million base pairs
A Manhattan plot, which shows the statistical significance of association as -log10(p-value) on the y-axis against chromosomal position on the x-axis, is a good way of displaying millions of genetic variants in one figure. One can easily spot regions of the genome that cross a particular significance threshold, and it is easy to identify regions that can be taken forward for replication. Several software packages (QQMAN, GWAMA, IGV, https://genome.sph.umich.edu/wiki/Code_Sample:_Generating_Manhattan_Plots_in_R, SNPEVG) come bundled with a plotting feature or a small R script that can generate a Manhattan plot. These scripts generate the plot, but because the plot carries no further information (gene-name annotations, or an indication of how significant the low-frequency and high-impact consequence variants in the GWAS are), the Manhattan plot is losing its importance in more recent GWAS publications. Moreover, with the availability of large cohorts (e.g. UK Biobank) and the power to detect more loci crossing the genome-wide significance threshold (over 500 in the recent blood pressure GWAS [5]), annotating gene names manually on a Manhattan plot is a tedious, time-consuming process. Another drawback of the conventional plot is that it cannot show how many variants are hidden behind a single visible dot. To make room for the ever-increasing number of discovered loci, researchers have started transposing the Manhattan plot [6, 7, 8, 9, 10, 11], which leaves more space to display gene names. The Manhattan++ software tool reads genome-wide summary statistics on millions of variants and generates a transposed Manhattan++ plot with user-defined annotations such as gene names, allele frequencies, variant consequences and summary statistics of loci of interest.
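The idea of counting how many variants sit behind one plotted point can be illustrated with a minimal sketch. It is not the Manhattan++ implementation (which is written in R); the function names, the tuple layout of the input, and the bin sizes are all assumptions chosen for illustration. Each variant is assigned to a grid cell defined by chromosome, a position bin and a -log10(p-value) bin, and the variants in each cell are counted. The 5e-8 value used at the end is the conventional genome-wide significance threshold, not a value taken from the text above.

```python
import math
from collections import Counter


def neg_log10(p):
    """Return -log10(p), the significance scale used in a Manhattan plot."""
    return -math.log10(p)


def bin_variants(variants, pos_bin_size=3_000_000, log10p_bin_size=1.0):
    """Count variants per (chromosome, position bin, -log10(p) bin) cell.

    `variants` is an iterable of (chrom, pos, pvalue) tuples.
    Both bin sizes are hypothetical defaults, not package settings.
    """
    counts = Counter()
    for chrom, pos, p in variants:
        cell = (
            chrom,
            pos // pos_bin_size,
            int(neg_log10(p) // log10p_bin_size),
        )
        counts[cell] += 1
    return counts


# Made-up summary statistics: three nearby variants with similar p-values
# on chromosome 1, and one isolated variant on chromosome 2.
variants = [
    (1, 1_200_000, 2e-9),
    (1, 1_250_000, 3e-9),
    (1, 1_300_000, 5e-9),
    (2, 7_500_000, 2e-12),
]
cells = bin_variants(variants)

# Conventional genome-wide significance threshold on the plotted scale.
genome_wide = neg_log10(5e-8)
```

In this toy input the three chromosome 1 variants fall into the same cell, so a count of 3 is recorded for it, whereas a conventional plot would show them as one overlapping dot.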
Relevant columns in the configuration file for the software
Cells with one variant are black.
Cells with one variant flagged as high consequence (conseq) are light pink.
Cells with one variant with MAF below the threshold are green.
Cells with one variant with MAF below the threshold and flagged as high consequence are dark magenta.
Cells with 2 or more variants are blue.
Cells with 2 or more variants, at least one flagged as high consequence, are pink.
Cells with 2 or more variants, at least one with MAF below the threshold, are red.
Cells with 2 or more variants, at least one with MAF below the threshold and at least one flagged as high consequence, are cyan.
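The eight color rules above amount to a lookup on three yes/no properties of a cell: whether it holds a single variant, whether any variant has MAF below the threshold, and whether any variant carries the conseq flag. The sketch below encodes that reading. It is an illustration only, not the package's code: the function name, the dictionary keys `maf` and `conseq`, and the default threshold of 0.05 are assumptions.

```python
def cell_color(variants, maf_threshold=0.05):
    """Return the color of a cell according to the rules listed above.

    `variants` is a list of dicts with keys 'maf' (float) and
    'conseq' (bool). The 0.05 default threshold is a hypothetical value.
    Returns None for an empty cell.
    """
    if not variants:
        return None

    single = len(variants) == 1
    low_maf = any(v["maf"] < maf_threshold for v in variants)
    high_conseq = any(v["conseq"] for v in variants)

    # Key order: (single variant?, any low MAF?, any conseq flag?)
    palette = {
        (True, False, False): "black",
        (True, False, True): "light pink",
        (True, True, False): "green",
        (True, True, True): "dark magenta",
        (False, False, False): "blue",
        (False, False, True): "pink",
        (False, True, False): "red",
        (False, True, True): "cyan",
    }
    return palette[(single, low_maf, high_conseq)]
```

For example, a cell holding one common variant without the flag maps to black, while a cell holding one rare unflagged variant and one common flagged variant maps to cyan, because the two conditions may be met by different variants in a multi-variant cell.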
Here we present the Manhattan++ software, which is a major step up from existing tools and addresses the limitations highlighted above. The code is customizable, and because it is open source, the community can add features in future. We recognize that there are existing scripts that generate a Manhattan plot, but only a handful of them annotate the plot, and then with a minimal level of detail (Additional file 1: Supplementary Note, Table S1); none can perform the tasks we have implemented in this software. Most existing scripts generate a graph in landscape orientation, which does not leave enough room for the ever-increasing number of discovered GWAS loci. A limitation of our method is that the plot takes one full A4 page of the journal to display; however, with more researchers reading publications online, the figure is highly web readable and is also useful for poster presentations. This software adds a lot of information to the existing Manhattan plot, and we hope that readers will be able to derive more information by looking at the Manhattan++ plot.
Availability and requirements
Project name: manhplot.
Project home page: https://github.com/cgrace1978/manhplot
Operating system(s): Platform independent.
Programming language: R (>= 3.4.0).
Other requirements: R dependencies (ggplot2, reshape2, ggrepel, gridExtra).
Any restrictions to use by non-academics: None.
License: GPL (>= 2).
CG wrote the software in R. AG wrote the Perl Utility. MF & HW provided valuable input for feature enhancements. AG, MF & HW wrote the manuscript. All authors read and approved the final manuscript.
This work was supported by BHF, European Commission (LSHM-CT-2007-037273, HEALTH-F2-2013-601456), the Wellcome Trust (201543/B/16/Z), Wellcome Trust core award (090532/Z/09/Z, 203141/Z/16/Z), BHF Centre of Research Excellence and TriPartite Immunometabolism Consortium [TrIC]-NovoNordisk Foundation (NNF15CC0018486). Computation used the Oxford Biomedical Research Computing (BMRC) facility, a joint development between the Wellcome Centre for Human Genetics and the Big Data Institute supported by Health Data Research UK and the NIHR Oxford Biomedical Research Centre. Financial support was provided by the Wellcome Trust Core Award Grant Number 203141/Z/16/Z. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. Funding bodies played no role in the design of the study, in the collection, analysis, and interpretation of data, or in writing the manuscript.
Ethics approval and consent to participate
Consent for publication
- 5. Evangelou E, Warren HR, Mosen-Ansorena D, Mifsud B, Pazoki R, Gao H, et al. Genetic analysis of over one million people identifies 535 novel loci for blood pressure. bioRxiv. 2017.
- 11. Cortes A, Dendrou CA, Fugger L, McVean G. Systematic classification of shared components of genetic risk for common human diseases. bioRxiv. 2018.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.