CircularLogo: A lightweight web application to visualize intra-motif dependencies

Ye, Zhenqing; Ma, Tao; Kalmbach, Michael T.; Dasari, Surendra; Kocher, Jean-Pierre A.; Wang, Liguo

doi:10.1186/s12859-017-1680-2

Software
Open access
Published: 22 May 2017

Volume 18, article number 269, (2017)

BMC Bioinformatics

Zhenqing Ye¹,
Tao Ma²,
Michael T. Kalmbach¹,
Surendra Dasari¹,
Jean-Pierre A. Kocher¹ &
Liguo Wang¹,² (ORCID: orcid.org/0000-0003-2072-4826)

Abstract

Background

The sequence logo has been widely used to represent DNA or RNA motifs for more than three decades. Despite its intelligibility and intuitiveness, the traditional sequence logo is unable to display the intra-motif dependencies and therefore is insufficient to fully characterize nucleotide motifs. Many methods have been developed to quantify the intra-motif dependencies, but fewer tools are available for visualization.

Result

We developed CircularLogo, a web-based interactive application that visualizes not only the position-specific nucleotide consensus and diversity but also the intra-motif dependencies. Applying CircularLogo to HNF6 binding sites and tRNA sequences demonstrated its ability to show intra-motif dependencies and intuitively reveal biomolecular structure. CircularLogo is implemented in JavaScript and Python based on the Django web framework. The program’s source code and user’s manual are freely available at http://circularlogo.sourceforge.net. The CircularLogo web server can be accessed from http://bioinformaticstools.mayo.edu/circularlogo/index.html.

Conclusion

CircularLogo is an innovative web application that is specifically designed to visualize and interactively explore intra-motif dependencies.

Background

Many DNA and RNA binding proteins recognize their binding sites through specific nucleotide patterns called motifs. Motif sites bound by the same protein do not necessarily have the same sequence but typically share consensus sequence patterns. Several methods have been developed to statistically model the position-specific consensus and diversity of nucleotide motifs using the position weight matrix (PWM) or position-specific scoring matrix (PSSM) [1, 2]. These mathematical representations are usually visualized using sequence logos, which depict the consensus and diversity of each motif residue as a stack of nucleotide symbols. The height of each symbol within the stack indicates its relative frequency, and the total height of the stack is scaled to the information content of that position [3, 4].
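As a rough illustration of how stack heights follow from nucleotide frequencies, the sketch below computes the information content of one position and the height of each symbol. It assumes a uniform background and omits the small-sample correction that logo tools may apply; the function names are hypothetical and are not taken from any logo software.

```python
import math

def information_content(freqs):
    """Information content in bits: log2(4) minus the Shannon entropy.

    Assumes a uniform background and no small-sample correction.
    """
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(4) - entropy

def symbol_heights(freqs):
    """Height of each symbol: relative frequency times information content."""
    ic = information_content(freqs)
    return {base: p * ic for base, p in freqs.items()}

# Made-up frequencies for a single motif position.
position = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}
print(information_content(position))
print(symbol_heights(position))
```

A fully conserved position reaches the maximum of 2 bits, whereas a position with four equally frequent nucleotides has a stack height of 0.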

Traditional PWM and PSSM models assume statistical independence between the nucleotides of a motif. However, this assumption is not completely justified, and accumulated evidence indicates the existence of intra-motif dependencies [5,6,7,8]. For example, an analysis of wild-type and mutant Zif268 (EGR-1) zinc fingers, using microarray binding experiments, suggested that the nucleotides within a transcription factor binding site (TFBS) should not be treated independently [5]. In addition, intra-motif dependencies were also revealed by a comprehensive experiment examining the binding specificities of 104 distinct DNA binding proteins in mouse [8]. Taking intra-motif dependencies into consideration could substantially improve the accuracy of de novo motif discovery [9]. Therefore, many statistical methods have been developed to characterize intra-motif dependencies, including the generalized weight matrix model [10], the sparse local inhomogeneous mixture model (Slim) [11], the transcription factor flexible models based on hidden Markov models (TFFMs) [12], the binding energy model (BEM) [13], and the inhomogeneous parsimonious Markov model (PMM) [14]. However, the most commonly used visualization tools, such as WebLogo [3] and Seq2Logo [15], are incapable of displaying these intra-motif dependencies.

Only a handful of tools, such as CorreLogo, enoLOGOS, and ELRM, are capable of visualizing positional dependencies [16,17,18]. CorreLogo depicts mutual information from DNA or RNA alignments using three-dimensional sequence logos generated via VRML and JVX. However, CorreLogo’s three-dimensional graphs are difficult to interpret because of the excessively complex and distorted perspective associated with the third dimension. ELRM generates static graphs to visualize intra-motif dependencies. ELRM splits up “base features” and “association features” and fails to comprehensively integrate nucleotide diversities and dependencies. In addition, ELRM is limited to measuring dependence with its own built-in method. Similar to ELRM, enoLOGOS represents the dependency between different positions using a matrix plot underneath the nucleotide logo. While pLogo allows users to visualize correlations to a particular nucleotide position, it fails to provide an overall view of intra-motif dependencies [4]. Finally, all of these tools lack the functionality for users to explore and interpret the data in an interactive fashion.

In this study, we developed CircularLogo, an interactive web application, which is capable of simultaneously displaying position-specific nucleotide frequencies and intra-motif dependencies. CircularLogo uses an open-standard, human-readable, flexible and programming language independent JSON (JavaScript Object Notation) data format to describe various properties of DNA motifs. Other commonly used motif formats such as MEME, TRANSFAC, and JASPAR can be easily converted into JSON format.

Implementation

JSON-Graph specifications of nucleotide motif representation

We used the JSON-Graph format to describe nucleotide motifs in order to make them intelligible and malleable. The schema of the JSON-Graph format is illustrated below:

The contents within the two curly braces describe a DNA or RNA motif. Specifically, the “id” keyword specifies the name of the motif. The “background” keyword designates the nucleotide frequencies (in the order of A, T, C and G) of the relevant genomic background. For example, when studying motifs in the human genome, these frequencies are computed from the human reference genome as the background distribution. By default, they are set to 0.25, representing equal frequencies. The “pseudocounts” keyword represents the extra nucleotides added to each position of the motif to avoid division-by-zero errors in small data sets; these are set to 0.25 for each nucleotide by default. The “nodes” section describes various properties of motif residues using the following keywords: a) the “index” keyword specifies the sequential order (anticlockwise) of nucleotide stacks; b) the “label” keyword denotes the identity of each nucleotide stack; c) the “bit” keyword refers to the information content calculated for each nucleotide stack; d) the “base” keyword indicates the four nucleotides sorted incrementally by their corresponding frequencies, as designated by the “freq” keyword. The “links” section describes the pairwise dependencies between nucleotide stacks using the following keywords: a) the “source” and “target” keywords denote the start and end positions of the linked nucleotide stacks; b) the “value” keyword indicates the width of the link, which is proportional to the strength of dependence between the two linked positions.
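The keywords described above can be assembled into a small motif description, sketched here in Python. The exact nesting, the value types (for instance, lists versus objects for “background” and “pseudocounts”) and all numbers are assumptions made for illustration; the CircularLogo user’s manual remains the authoritative description of the format. The helper `make_node` is hypothetical.

```python
import json

def make_node(index, label, bit, freqs):
    """Build one "nodes" entry with bases sorted by increasing frequency."""
    ordered = sorted(freqs.items(), key=lambda item: item[1])
    return {
        "index": index,
        "label": label,
        "bit": bit,  # placeholder information content, not computed here
        "base": [base for base, _ in ordered],
        "freq": [freq for _, freq in ordered],
    }

# A made-up two-position motif; every value is illustrative only.
motif = {
    "id": "example_motif",
    "background": [0.25, 0.25, 0.25, 0.25],    # order A, T, C, G
    "pseudocounts": [0.25, 0.25, 0.25, 0.25],  # order A, T, C, G
    "nodes": [
        make_node(0, "1", 0.98, {"A": 0.80, "T": 0.10, "C": 0.05, "G": 0.05}),
        make_node(1, "2", 0.15, {"A": 0.40, "T": 0.30, "C": 0.20, "G": 0.10}),
    ],
    "links": [{"source": 0, "target": 1, "value": 2.5}],
}

motif_json = json.dumps(motif, indent=2)
print(motif_json)
```

Because the result is plain JSON, dependencies calculated by any external method can be written into the “links” section without changing the rest of the file.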

CircularLogo web server

The CircularLogo web application uses the NGINX (https://www.nginx.com/) web server with the uWSGI (https://pypi.python.org/pypi/uWSGI) gateway interface to handle multiple concurrent client requests. The application is hosted on Amazon Elastic Compute Cloud (Amazon EC2).

Measure intra-motif dependencies using χ² statistic

We implemented two metrics to calculate the dependence between a pair of nucleotide positions: mutual information and the χ² statistic. The χ² statistic is widely used to test the independence of two categorical variables, and the corresponding Q score is a natural measure of dependency between two events that quantifies their co-occurrence, as follows. Let us assume that a DNA motif is l nucleotides long and is built from N sequences. For two given positions i and j within the motif (1 ≤ i ≤ l, 1 ≤ j ≤ l, i ≠ j), the observed di-nucleotide frequency is denoted as O_ij, which can be obtained by counting di-nucleotide combinations from the N input sequences. The expected di-nucleotide frequency is represented as E_ij. The χ² statistic score is then calculated as:

$$ Q=\sum_{k=1}^{m}\frac{\left(O_{ij}^{k}-E_{ij}^{k}\right)^{2}}{E_{ij}^{k}},\qquad Q\sim \chi^{2}\left(m-1\right),\quad m=16,\quad O_{ij}\in \left[AA, AT, AC, AG,\dots \right] $$

Here, m is the total number of di-nucleotides (4² = 16).
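A minimal sketch of this calculation for one pair of positions is given below. The text does not state how the expected counts are obtained, so the sketch assumes they are the product of the two single-position counts divided by N (the usual independence expectation) and leaves out pseudocounts; `chi_square_q` is a hypothetical name, not a function of CircularLogo.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

def chi_square_q(seqs, i, j):
    """Q statistic for positions i and j (0-based) of aligned sequences.

    Assumption: expected counts come from the product of the marginal
    counts at the two positions; pseudocounts are not applied.
    """
    n = len(seqs)
    observed = Counter((s[i], s[j]) for s in seqs)
    left = Counter(s[i] for s in seqs)
    right = Counter(s[j] for s in seqs)
    q = 0.0
    for a, b in product(BASES, repeat=2):  # the m = 16 di-nucleotides
        expected = left[a] * right[b] / n
        if expected > 0:  # skip cells that cannot occur under independence
            q += (observed[(a, b)] - expected) ** 2 / expected
    return q

# Toy alignment in which the second position always copies the first.
coupled = ["AA", "CC", "GG", "TT"] * 5
print(chi_square_q(coupled, 0, 1))
```

Skipping cells with a zero expected count is one way to avoid division by zero; adding pseudocounts, as the “pseudocounts” keyword suggests, is another.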

Measure intra-motif dependencies using mutual information

The second built-in approach to measure dependence is the mutual information. This metric quantifies the mutual dependence between two discrete random variables X (X = [A, C, G, T]) and Y (Y = [A, C, G, T]) and it is defined as:

$$ I\left(X;Y\right)=\sum_{y\in Y}\sum_{x\in X} p\left(x,y\right)\log\left(\frac{p\left(x,y\right)}{p(x)\,p(y)}\right) $$

Here, x (x ∈ [A, C, G, T]) and y (y ∈ [A, C, G, T]) represent nucleotides at two nucleotide stacks X and Y, respectively; p(x) and p(y) denote the nucleotide frequencies of x and y, and p(x, y) denotes the frequency of the dinucleotide xy formed from X and Y. The significance of the dependency between two positions was evaluated using Chebyshev’s inequality. For example, if the observed mutual information is K standard deviations larger than that expected from a random background model, then P ≤ 1/K².
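The mutual information and the Chebyshev bound can be sketched as follows. The logarithm base is not given in the formula, so base 2 (bits) is assumed here; how the background mean and standard deviation are estimated is also not specified, so they are passed in as plain numbers. Both function names are hypothetical.

```python
import math
from collections import Counter

def mutual_information(seqs, i, j):
    """Mutual information in bits between positions i and j (0-based)."""
    n = len(seqs)
    joint = Counter((s[i], s[j]) for s in seqs)
    px = Counter(s[i] for s in seqs)
    py = Counter(s[j] for s in seqs)
    mi = 0.0
    for (x, y), count in joint.items():  # unobserved pairs contribute 0
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi

def chebyshev_p_bound(observed, background_mean, background_stdev):
    """Upper bound P <= 1/K^2, with K the deviation in standard deviations."""
    k = (observed - background_mean) / background_stdev
    if k <= 1:
        return 1.0  # the bound is uninformative at or below one stdev
    return 1.0 / k ** 2

coupled = ["AA", "CC", "GG", "TT"] * 5
print(mutual_information(coupled, 0, 1))
print(chebyshev_p_bound(5.0, 1.0, 1.0))
```

Chebyshev’s inequality makes no assumption about the shape of the background distribution, so the resulting bound is conservative.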

HNF6 motif analysis

HNF6 ChIP-exo data were obtained from ArrayExpress (accession number E-MTAB-2060; http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-2060/) and processed with MACE [19], and HNF6 binding sites were extracted. The 5549 65-nucleotide sequences (20 upstream nucleotides + the 25-nucleotide HNF6 binding site + 20 downstream nucleotides) were published at https://sourceforge.net/projects/circularlogo/files/test/. All sequences were aligned by the HNF6 motif, which spans position 29 to position 36.

tRNA sequence analysis

A total of 1114 tRNA sequences were downloaded from the RFAM database [20] in the RFAM ‘seed’ alignment format (accession # RF00005; https://correlogo.ncifcrf.gov/ccrnp/trnafull.html). After excluding sequences with gaps in the alignment, 291 sequences were used as the final dataset to generate the circular logo of tRNA (https://sourceforge.net/projects/circularlogo/files/test/). Mutual information was used as the metric to measure intra-motif dependencies. The weakest 33% of links were filtered out.
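Filtering by rank can be sketched as below, using link records with the “source”, “target” and “value” keywords of the JSON-Graph format. The rounding rule (truncating the number of dropped links) and the function name `drop_weakest_links` are assumptions; the text does not say how ties or fractional counts are handled.

```python
def drop_weakest_links(links, fraction=0.33):
    """Remove the given fraction of links with the smallest "value".

    The number of links to drop is truncated to an integer, and the
    surviving links keep their original order.
    """
    n_drop = int(len(links) * fraction)
    order = sorted(range(len(links)), key=lambda k: links[k]["value"])
    dropped = set(order[:n_drop])
    return [link for k, link in enumerate(links) if k not in dropped]

# Nine made-up links with values 9, 8, ..., 1.
links = [{"source": 0, "target": k, "value": float(10 - k)} for k in range(1, 10)]
kept = drop_weakest_links(links)
print(len(kept), min(link["value"] for link in kept))
```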

Synthesized DNA fragments of splice sites and branch-points for analysis

We synthesized DNA fragments by concatenating the 5′ donor site (16 bp), the branch-point (21 bp) and the 3′ acceptor site (16 bp) to represent the splicing motif. Briefly, a total of 59,359 predefined, high-confidence human branch-points were downloaded from the supplementary data of Mercer et al. [21]. We excluded introns with multiple branch-points, small introns (<1 kb) and introns with a small gap (≤25 bp) between the branch-point and the acceptor site. For each of the remaining introns, we first extracted the 6 bp upstream and 10 bp downstream of the 5′ donor site. Second, we extracted a 21 bp DNA sequence encompassing the branch-point by extending 10 bp both upstream and downstream of the branch-point. Third, we extracted the 10 bp upstream and 6 bp downstream of the 3′ acceptor site. Finally, we concatenated these three DNA sequences in the order “5′ donor site–branch-point–3′ acceptor site” to form a 53 bp DNA fragment. We used a final set of 10,316 DNA fragments to generate the circular logo (https://sourceforge.net/projects/circularlogo/files/test/).
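The construction of one 53 bp fragment can be sketched with string slicing. The coordinate convention is an assumption made for this example: `donor` is the 0-based index of the first intronic base, `branch` the index of the branch-point nucleotide, and `acceptor` the index of the first base of the downstream exon. The function names and the toy sequence are hypothetical.

```python
def passes_filters(donor, branch, acceptor):
    """Keep introns of at least 1 kb with a branch-to-acceptor gap above 25 bp."""
    intron_length = acceptor - donor
    gap = acceptor - branch
    return intron_length >= 1000 and gap > 25

def splicing_fragment(seq, donor, branch, acceptor):
    """Concatenate donor (16 bp), branch-point (21 bp) and acceptor (16 bp)."""
    if donor < 6 or branch < 10 or acceptor + 6 > len(seq):
        raise ValueError("not enough flanking sequence")
    donor_part = seq[donor - 6:donor + 10]          # 6 bp up, 10 bp down
    branch_part = seq[branch - 10:branch + 11]      # 10 bp either side
    acceptor_part = seq[acceptor - 10:acceptor + 6] # 10 bp up, 6 bp down
    fragment = donor_part + branch_part + acceptor_part
    if len(fragment) != 53:
        raise ValueError("unexpected fragment length")
    return fragment

# Toy sequence; real input would be genomic sequence around each intron.
seq = "".join("ACGT"[k % 4] for k in range(1300))
if passes_filters(50, 1100, 1150):
    print(splicing_fragment(seq, 50, 1100, 1150))
```

With this layout the branch-point nucleotide sits at index 26 of the fragment, in the centre of the 21 bp middle segment.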

Results

Circular nucleotide motif

Unlike traditional sequence logos, which display motif residues on a two-dimensional Cartesian coordinate system (with the x-axis denoting the position of residue stacks and the y-axis denoting the information content), CircularLogo visualizes motifs using a polar coordinate system that facilitates the display of pairwise intra-motif dependencies with linked ribbons (Fig. 1). Since traditional PWM or PSSM representations do not preserve intra-motif dependency information, we use JSON-Graph as the main input format for CircularLogo. When the input file is in JSON-Graph format with pre-calculated nucleotide frequencies and dependencies, CircularLogo simply transforms this file into a pictorial representation. In addition, CircularLogo also accepts a FASTA-format motif representation as input. In this scenario, CircularLogo transforms the FASTA information into JSON-Graph format by calculating the intra-motif dependencies using the built-in χ² statistic or mutual information metric, and determines the height of each nucleotide stack in the same way as WebLogo [3]. In brief, CircularLogo generates a sector for each motif position and draws the nucleotide stack within that sector based on the information content and relative frequencies of the nucleotides. All sectors are arranged into a circular layout. The width of the linking ribbons indicates the strength of dependency between each pair of nucleotide positions.
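The circular arrangement can be illustrated with a little geometry. This is only a sketch of the idea: it assumes equal-width sectors laid out anticlockwise from a start angle of π/2, and it is not the rendering code of CircularLogo; `sector_angles` and `polar_to_xy` are hypothetical names.

```python
import math

def sector_angles(n_positions, start=math.pi / 2):
    """Return (begin, end) angles in radians for equal-width sectors.

    Angles increase anticlockwise from the assumed start angle.
    """
    width = 2 * math.pi / n_positions
    return [(start + k * width, start + (k + 1) * width)
            for k in range(n_positions)]

def polar_to_xy(radius, angle):
    """Convert polar coordinates to Cartesian drawing coordinates."""
    return radius * math.cos(angle), radius * math.sin(angle)

# Mid-point of each sector for a 65-position motif on a unit circle.
for begin, end in sector_angles(65)[:3]:
    print(polar_to_xy(1.0, (begin + end) / 2))
```

A ribbon between two positions could then be drawn between the inner edges of the two corresponding sectors, with its width scaled by the link “value”.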

CircularLogo allows users to interactively adjust a variety of parameters, explore intra-motif dependencies and fine-tune the appearance of the final output. For example, any nucleotide in the genome has a certain level of dependency on its immediate neighbors. Such dependencies are considered background noise since they are unlikely to be biologically meaningful. CircularLogo automatically filters out weak links according to a user-specified p-value, and also provides a slider bar that lets users filter links interactively.

Nucleotide dependencies within HNF6 motif

HNF6 (also known as ONECUT1) is a transcription factor that regulates the expression of genes involved in a variety of cellular processes. The exact protein-DNA binding boundaries of HNF6 in the mouse genome were previously defined by our group [19]. A total of 5549 binding sites, each 25 nucleotides long, were used to explore the intra-motif dependencies. Each binding site was also extended by 20 nucleotides up- and downstream in order to estimate the background dependency level. Pairwise dependencies between all 65 positions are displayed in Fig. 1a. As expected, dependencies between positions within the HNF6 binding site (i.e. nucleotides between the 29th and 36th positions) were much higher than those of the flanking regions (Fig. 1b). Figure 1c shows background links relating to node 5 (i.e. the 5th position of the input DNA sequence). Figure 1d shows dependencies related to node 33 within the HNF6 binding site after spurious links were removed.

Nucleotide dependencies within tRNAs

Transfer RNA (tRNA) is involved in translating messenger RNA (mRNA) into an amino acid sequence. Its typical cloverleaf secondary structure is composed of the D-loop, anticodon loop, variable loop and TΨC loop, as well as four base-paired stems between these loops (Fig. 2a). The nucleotides within stems are less conserved than those of loops, but base pairings within stems are required for structural stability. Thus we expect higher positional dependencies between nucleotides within stems than between those within loops. We used CircularLogo, with mutual information as the measure of dependence, to generate the tRNA circular motif. After filtering out weak links (lower 33%), we observed four apparent clusters of connected links corresponding to the four stems (Fig. 2b). Compared with the motif logo generated by enoLOGOS (http://www.benoslab.pitt.edu/cgi-bin/enologos/enologos.cgi) using the same dataset, CircularLogo provided a more intuitive view of intra-dependencies within the four stems (Fig. 2c). Figure 2b also shows that nucleotides within the three loops (D-loop, anticodon loop, and TΨC loop) exhibited much higher sequence conservation than nucleotides located in stems, suggesting that the loops are the main functional domains of tRNA. For example, the D-loop is the recognition site of aminoacyl-tRNA synthetase, an enzyme involved in amino-acylation of the tRNA molecule [22, 23], and the TΨC loop is the recognition site of the ribosome.

Nucleotide dependencies between splicing sites and branch site in eukaryotic introns

Splicing is a critical step during pre-mRNA processing, in which introns are removed and exons are joined together by the spliceosome complex. Eukaryotic genes contain three splicing motifs that are essential for successful intron excision: an almost invariant 5′-splice site (donor site), a 3′-splice site (acceptor site) and the branch site, which is about 20–50 bp upstream of the acceptor site. Generally, two successive biochemical reactions are involved in spliceosomal splicing. First, a specific branch-point nucleotide within the intron, defined during spliceosome assembly, performs a nucleophilic attack on the 5′-splice donor site to form a lariat intermediate. Second, the released 5′-exon attacks the 3′-splice acceptor site to excise the lariat structure and join the adjacent exons [24]. Recently, Mercer et al. identified 59,359 high-confidence human branch-points using a high-throughput sequencing technique [21]. These reliable sites provide a great opportunity to investigate how those elements interact with each other. We extracted the motif DNA sequences (see Implementation section) and explored their nucleotide dependencies using CircularLogo with the χ² statistic approach (Fig. 3). After filtering out weak links, we found strong dependencies among the three sites (donor site, branch-point and acceptor site). In addition, CircularLogo further revealed interactions between the polypyrimidine tract and the two splice sites (donor site and acceptor site).

Discussion

New statistical models and experimental approaches are being developed for measuring intra-motif dependency. CircularLogo uses a plain-text, JSON-Graph-formatted file to describe DNA/RNA motifs, which enables users to generate a customized JSON-Graph file containing positional dependencies pre-calculated by methods of their choice.

When raw sequences are given to CircularLogo, it provides two approaches (the χ² statistic and mutual information) for measuring positional dependency. Both of these methods, although commonly used, are biased and unable to quantify dependencies between highly conserved nucleotide stacks (e.g. invariable sites) [6, 25]. This problem could be addressed by users providing as many sequences as possible in order to capture the low-frequency variants at those highly conserved sites. This is feasible owing to genome-wide, high-throughput screening technologies. For example, researchers usually identify tens of thousands of potential TFBSs using ChIP-seq or other similar technologies. After retrieving the potential TFBSs from ChIP-seq data, a researcher can align them using the predicted DNA motif and give the final alignment file as input to CircularLogo. We recommend that a FASTA input file contain at least 25 sequences.

It is worth noting that the χ² statistic and mutual information are two different measures of dependence, each suited for use under different conditions. Essentially, the χ² statistic measures the co-occurrence of nucleotides at two different positions. Hence, the χ² method is suited for measuring dependency between two conserved (i.e. less variable) positions, but it has limited power to measure dependency between two highly variable positions, where the dinucleotide frequencies are close to background (i.e. 1/16) and the χ² statistic approaches 0. In contrast, mutual information measures the reduction in uncertainty about nucleotide frequencies at one position, given some knowledge of nucleotide frequencies at another position. For a pair of highly conserved positions that are dominated by particular nucleotides, the information content of each position and the mutual information between them approach 0 bits. Hence, mutual information is suited for measuring dependency between two highly variable positions.

Conclusions

Visualization is key for efficient data exploration and effective communication in scientific research. CircularLogo is an innovative tool that offers a panoramic view of DNA or RNA motifs while taking intra-site dependencies into consideration. We demonstrated the utility and practicality of this tool using examples in which CircularLogo was able to depict complex dependencies within motifs and reveal biomolecular structure (such as stem structures in tRNA) in an effective manner.

Abbreviations

BEM: Binding energy model
JSON: JavaScript Object Notation
JVX: JavaView geometry file
MACE: Model-based analysis of ChIP-exo
MEME: Multiple EM for motif elicitation
MI: Mutual information
PMM: Inhomogeneous parsimonious Markov model
PSSM: Position-specific scoring matrix
PWM: Position weight matrix
TFBS: Transcription factor binding site
TFFMs: Transcription factor flexible models
VRML: Virtual reality modeling language

References

1. Stormo GD. DNA binding sites: representation and discovery. Bioinformatics. 2000;16:16–23.
2. Boeva V. Analysis of Genomic Sequence Motifs for Deciphering Transcription Factor Binding and Transcriptional Regulation in Eukaryotic Cells. Front Genet. 2016;7:24.
3. Crooks GE, Hon G, Chandonia J-M, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–90.
4. O’Shea JP, Chou MF, Quader SA, Ryan JK, Church GM, Schwartz D. pLogo: a probabilistic approach to visualizing sequence motifs. Nat Methods. 2013;10:1211–2.
5. Bulyk ML, Johnson PLF, Church GM. Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Res. 2002;30:1255–61.
6. Eggeling R, Gohr A, Keilwagen J, Mohr M, Posch S, Smith AD, et al. On the value of intra-motif dependencies of human insulator protein CTCF. PLoS ONE. 2014;9:e85629.
7. Man TK, Stormo GD. Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res. 2001;29:2471–8.
8. Badis G, Berger MF, Philippakis AA, Talukder S, Gehrke AR, Jaeger SA, et al. Diversity and complexity in DNA recognition by transcription factors. Science. 2009;324:1720–3.
9. Grau J, Posch S, Grosse I, Keilwagen J. A general approach for discriminative de novo motif discovery from high-throughput data. Nucleic Acids Res. 2013;41:e197.
10. Zhou Q, Liu JS. Modeling within-motif dependence for transcription factor binding site predictions. Bioinformatics. 2004;20:909–16.
11. Keilwagen J, Grau J. Varying levels of complexity in transcription factor binding motifs. Nucleic Acids Res. 2015;43:e119.
12. Mathelier A, Wasserman WW. The Next Generation of Transcription Factor Binding Site Prediction. PLoS Comput Biol. 2013;9:e1003214.
13. Zhao Y, Ruan S, Pandey M, Stormo GD. Improved models for transcription factor binding site identification using nonindependent interactions. Genetics. 2012;191:781–90.
14. Eggeling R, Roos T, Myllymäki P, Grosse I. Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data. BMC Bioinformatics. 2015;16:375.
15. Thomsen MCF, Nielsen M. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion. Nucleic Acids Res. 2012;40:W281–7.
16. Bindewald E, Schneider TD, Shapiro BA. CorreLogo: an online server for 3D sequence logos of RNA and DNA alignments. Nucleic Acids Res. 2006;34:W405–11.
17. Yang C, Chang C-H. Exploring comprehensive within-motif dependence of transcription factor binding in Escherichia coli. Sci Rep. 2015;5:17021.
18. Workman CT, Yin Y, Corcoran DL, Ideker T, Stormo GD, Benos PV. enoLOGOS: a versatile web tool for energy normalized sequence logos. Nucleic Acids Res. 2005;33:W389–92.
19. Wang L, Chen J, Wang C, Uusküla-Reimand L, Chen K, Medina-Rivera A, et al. MACE: model based analysis of ChIP-exo. Nucleic Acids Res. 2014;42:e156.
20. Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR. Rfam: an RNA family database. Nucleic Acids Res. 2003;31:439–41.
21. Mercer TR, Clark MB, Andersen SB, Brunck ME, Haerty W, Crawford J, Taft RJ, Nielsen LK, Dinger ME, Mattick JS. Genome-wide discovery of human splicing branchpoints. Genome Res. 2015;25:290–303.
22. Smith D, Yarus M. Transfer RNA structure and coding specificity. I. Evidence that a D-arm mutation reduces tRNA dissociation from the ribosome. J Mol Biol. 1989;206:489–501.
23. Hardt WD, Schlegl J, Erdmann VA, Hartmann RK. Role of the D arm and the anticodon arm in tRNA recognition by eubacterial and eukaryotic RNase P enzymes. Biochemistry. 1993;32:13046–53.
24. Lee Y, Rio DC. Mechanisms and regulation of alternative pre-mRNA splicing. Annu Rev Biochem. 2015;84:291–323.
25. Paninski L. Estimation of entropy and mutual information. Neural Comput. 2003;15:1191–253.

Acknowledgements

Not applicable

Funding

This work was partly supported by the Mayo Clinic Center for Individualized Medicine. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Availability and requirements

CircularLogo (http://circularlogo.sourceforge.net/) is implemented in Python and Django and is released under the GNU General Public License (GPLv2). The CircularLogo web server (http://bioinformaticstools.mayo.edu/circularlogo/index.html) is hosted on Amazon Elastic Compute Cloud and uses the NGINX web server with the uWSGI gateway interface to handle multiple concurrent client requests. Local installation of CircularLogo on Linux, Mac OS X and Windows systems requires Python 2.7.10 (https://www.python.org/downloads/release/python-2710/) and these modules: Django (https://www.djangoproject.com/), biopython (https://github.com/biopython/biopython.github.io/), numpy (http://www.numpy.org/) and scipy (https://www.scipy.org/). The source code and datasets analyzed during the current study are available at: https://sourceforge.net/projects/circularlogo/files/.

Authors’ contributions

LW and JPK conceived the study. ZY and TM implemented CircularLogo software and performed the analysis. MK built CircularLogo web server. LW, ZY, SD and JPK wrote the manuscript. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable

Ethics approval and consent to participate

Not applicable

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Authors and Affiliations

Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
Zhenqing Ye, Michael T. Kalmbach, Surendra Dasari, Jean-Pierre A. Kocher & Liguo Wang
Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN, USA
Tao Ma & Liguo Wang

Corresponding author

Correspondence to Liguo Wang.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Ye, Z., Ma, T., Kalmbach, M.T. et al. CircularLogo: A lightweight web application to visualize intra-motif dependencies. BMC Bioinformatics 18, 269 (2017). https://doi.org/10.1186/s12859-017-1680-2

Received: 16 November 2016
Accepted: 11 May 2017
Published: 22 May 2017
DOI: https://doi.org/10.1186/s12859-017-1680-2
