TCC-GUI: a Shiny-based application for differential expression analysis of RNA-Seq count data
Differential expression (DE) analysis is a fundamental step in the analysis of RNA-Seq count data. We previously developed an R/Bioconductor package (called TCC) for this purpose. While this package has the unique feature of a built-in robust normalization method, its use has so far been limited to R users. There is thus a need for a way for non-R users to perform DE analysis with TCC.
Here, we present a graphical user interface for TCC (called TCC-GUI). Non-R users need only a web browser to use it (https://infinityloop.shinyapps.io/TCC-GUI/). TCC-GUI is implemented in R and encapsulated in a Shiny application. It contains all the major functionalities of TCC, including DE pipelines with robust normalization and simulation data generation under various conditions. It also contains (i) tools for exploratory analysis, including a useful score termed average silhouette that measures the degree of separation of compared groups, (ii) visualization tools such as a volcano plot and a heatmap with hierarchical clustering, and (iii) a reporting tool using R Markdown. By virtue of the Shiny-based GUI framework, users can obtain results simply by mouse navigation. The source code for TCC-GUI is available at https://github.com/swsoyee/TCC-GUI under the MIT license.
Keywords: RNA-Seq, Bioinformatics, Differential expression analysis, Shiny app
Abbreviations
DEG: differentially expressed gene
FDR: false discovery rate
GUI: graphical user interface
Ngene: total number of genes
NR: number of replicates
PCA: principal component analysis
TCC: Tag Count Comparison
RNA-Seq is a common technique used to obtain gene expression data. A major application of RNA-Seq data is to identify differentially expressed genes (DEGs) between different groups or conditions [2, 3]. To date, many methods have been developed for this purpose [4, 5, 6, 7, 8, 9], most of them implemented as R/Bioconductor packages [10, 11]. We previously developed an R/Bioconductor package named TCC, whose main characteristic is a robust normalization procedure originally proposed by Kadota et al. It can provide accurate differential expression (DE) results, especially when the numbers of up- and down-regulated DEGs in one of the groups are extremely biased. However, because TCC can be used only from within R, non-R users need an alternative way to perform DE analysis with it.
Here, we present a graphical user interface (GUI) for TCC, named TCC-GUI. Using the Shiny framework, it enables non-R users to operate the package and adjust parameters easily in order to view the DE results. Users need only a modern web browser. Unlike the original TCC, and like many Shiny apps, TCC-GUI provides plenty of visualization tools: principal component analysis (PCA) for exploratory analysis, a volcano plot to view the DE results, and so on. While making highly customizable figures is not a trivial task even for experienced R users, TCC-GUI lets users do so interactively in real time.
Data simulation (Step 0)
Similar to the original TCC, TCC-GUI can generate simulation data under various conditions in Step 0. The generated data can, of course, be used as input for DE analysis within TCC-GUI, as well as in other tools. The “hypoData” provided as a sample dataset in Step 1 is essentially the same as that generated in Step 0 with almost default settings (except for the proportion of assigned DEGs in individual groups): the total number of genes was 10,000 (Ngene = 10,000), 20% of the genes were DEGs (PDEG = 0.2), the number of groups was 2 (two-group comparison; G1 vs. G2), the levels of DE (fold-change; FC) for individual groups were fourfold (i.e., FCG1 = 4 and FCG2 = 4), the numbers of replicates (NR) were NRG1 = 3 and NRG2 = 3, and the proportions of assigned DEGs (P) were PG1 = 0.9 and PG2 = 0.1. Thanks to the GUI, users can see the number of DEGs assigned to individual groups in real time. Simulation data for three-group comparison, such as those used in Tang et al., can also be generated in Step 0.
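The DEG counts implied by the hypoData settings above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function name `deg_allocation` is hypothetical and is not part of TCC or TCC-GUI, and rounding to whole genes is an assumption.

```python
def deg_allocation(n_gene, p_deg, group_props):
    """Return (total DEGs, DEGs assigned to each group, non-DEGs).

    n_gene      -- total number of genes (Ngene)
    p_deg       -- proportion of genes that are DEGs (PDEG)
    group_props -- proportion of DEGs assigned to each group (PG1, PG2, ...)
    """
    n_deg = round(n_gene * p_deg)
    # Assumption: counts are rounded to the nearest whole gene.
    per_group = [round(n_deg * p) for p in group_props]
    return n_deg, per_group, n_gene - n_deg


# hypoData-like settings: Ngene = 10,000, PDEG = 0.2, PG1 = 0.9, PG2 = 0.1
n_deg, per_group, n_non_deg = deg_allocation(10_000, 0.2, [0.9, 0.1])
print(n_deg, per_group, n_non_deg)
```

With these settings, 2000 of the 10,000 genes are DEGs, of which 1800 are assigned to G1 and 200 to G2.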
Exploratory analysis (Step 1)
A unique feature of TCC-GUI is the average silhouette (AS) score, which objectively estimates the degree of group separation [19, 20]. The silhouette was originally proposed for the interpretation and validation of cluster analysis. A silhouette measures how well a sample is classified when it is assigned to a cluster, based on both the tightness of the clusters and the separation between them. Since silhouette scores are calculated for individual samples, the AS value is obtained by taking the mean across all samples.
We recently demonstrated that the AS value can be used to estimate the degree of group separation. It ranges from −1.0 to 1.0, and a higher AS value indicates a higher degree of group separation (i.e., a higher percentage of DEGs). In the case of hypoData with PDEG = 0.2, TCC-GUI outputs AS = 0.246 (see the left side of Fig. 2). For data that contain no or few DEGs (i.e., PDEG close to 0.0), the AS value would be around zero. Although the AS values can be calculated independently of SC, they also provide a relevant measure of the degree of separation between the groups of interest (e.g., G1 vs. G2) in SC results.
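To make the definition concrete, the following Python sketch computes the AS value for samples with known group labels. It is a minimal illustration rather than the TCC-GUI implementation: the function names are hypothetical, and Euclidean distance is an assumption made here for simplicity (the distance measure used by TCC-GUI is not stated in this section). For each sample, a is the mean distance to the other samples in its own group, b is the smallest mean distance to the samples of any other group, and the silhouette is (b − a) / max(a, b).

```python
import math


def euclidean(x, y):
    # Assumption: Euclidean distance between two expression profiles.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def average_silhouette(samples, labels, dist=euclidean):
    """Mean silhouette over all samples, given their group labels."""
    if len(set(labels)) < 2:
        raise ValueError("at least two groups are required")
    n = len(samples)
    scores = []
    for i in range(n):
        # Distances from sample i to every other sample, keyed by group.
        by_group = {}
        for j in range(n):
            if j != i:
                d = dist(samples[i], samples[j])
                by_group.setdefault(labels[j], []).append(d)
        own = by_group.get(labels[i])
        if not own:
            # A group with a single sample gets a silhouette of 0.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d)
                for g, d in by_group.items() if g != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


# Toy one-dimensional "expression profiles" for four samples.
profiles = [[0.0], [1.0], [10.0], [11.0]]
well_separated = average_silhouette(profiles, ["G1", "G1", "G2", "G2"])
mixed = average_silhouette(profiles, ["G1", "G2", "G1", "G2"])
print(well_separated, mixed)
```

In this toy example the labelling that matches the structure of the data yields an AS value close to 1, whereas the labelling that cuts across it yields a negative value.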
TCC Computation (Step 2)
This step includes data normalization and DEG identification. It provides several analysis pipelines that can be selected by changing options in the parameter setting panel (see Step 2 in Additional file 2). They include the iterative edgeR pipeline (the default), the iterative DESeq2 pipeline, and the original edgeR and DESeq2 pipelines.
The DE results appear in the “Result Table” panel after the operation ends. While the main output is a p-value that indicates the degree of DE between the compared groups, other information, such as adjusted p-values (i.e., q-values) and log-ratio (M) values, is also provided. The user can download the complete DE results and the TCC-normalized data as CSV files. The user can also extract any subset of genes by filtering on a column of interest, using the boxes at the bottom of the table. For example, the box of the “M Value” column shows the range of log-ratios (= log2(G2/G1)) as [−6.63, 6.48]. The user can extract genes that are at least twofold higher in G2 than in G1 by setting the range appropriately, for example to [1.0, 7.0].
TCC-GUI also provides the R code used internally to run TCC. Researchers can learn which functions are used internally, use this code as a template, and obtain reproducible results.
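The range filter on the “M Value” column described above amounts to keeping genes whose log-ratio falls within a closed interval. The Python sketch below illustrates this with made-up values; the gene names, mean expression values, and function names are hypothetical, and computing M directly from per-group means is a simplifying assumption.

```python
import math


def m_value(mean_g1, mean_g2):
    # Log-ratio as defined in the text: M = log2(G2 / G1).
    return math.log2(mean_g2 / mean_g1)


def filter_by_m(m_values, lower, upper):
    """Keep genes whose M value lies within [lower, upper]."""
    return {g: m for g, m in m_values.items() if lower <= m <= upper}


# Hypothetical per-group mean expression values (G1, G2).
means = {
    "gene_a": (10.0, 40.0),  # fourfold higher in G2 -> M = 2
    "gene_b": (20.0, 20.0),  # unchanged             -> M = 0
    "gene_c": (80.0, 10.0),  # eightfold lower in G2 -> M = -3
    "gene_d": (5.0, 10.0),   # twofold higher in G2  -> M = 1
}
m_values = {g: m_value(g1, g2) for g, (g1, g2) in means.items()}

# A lower bound of 1.0 corresponds to "at least twofold higher in G2".
selected = filter_by_m(m_values, 1.0, 7.0)
print(sorted(selected))
```

Because M is on a log2 scale, a lower bound of 1.0 selects genes that are at least twofold higher in G2, and a lower bound of 2.0 would select genes that are at least fourfold higher.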
Visualization (Step 3)
Based on the definition of FDR, the 1320 genes satisfying 10% FDR are theoretically composed of 1320 × 0.1 = 132 non-DEGs, while the remaining 1320 × 0.9 = 1188 are true DEGs. Similarly, the 1841 genes satisfying 40% FDR are composed of 1841 × 0.4 = 736.4 non-DEGs and 1841 × 0.6 = 1104.6 true DEGs. Although genes satisfying arbitrarily defined FDR cut-offs are usually defined as DEGs, an increase in the number of genes obtained by loosening the FDR cut-off does not necessarily indicate an increase in the number of true DEGs. In TCC-GUI, the number of genes satisfying various FDR thresholds can be seen interactively in the “MA Plot Parameters” panel. In addition, information on FDR cut-offs in increments of 0.05 is provided in tabular form. This information, as well as the AS value, helps to estimate how many true DEGs are included in the input data.
TCC-GUI provides a volcano plot as another way to visualize the DE result of two-group data. In contrast to MA plots, which are constructed by plotting the M values (Y-axis) vs. A values (X-axis), the volcano plot shows M values on the X-axis and statistical significance as −log10(p-value) on the Y-axis. Many users will be interested in genes located in the upper left or upper right areas of the plot. The user can, of course, change the colors and the cut-off values for both axes and see the number of genes satisfying both cut-offs. In the case of hypoData with default settings, the user will see 1374 genes down-regulated and 283 genes up-regulated in G2. This is quite reasonable because the hypoData contains 1800 genes down-regulated and 200 genes up-regulated in G2.
TCC-GUI also provides two other visualization tools for more general purposes: “Heatmap” and “Expression Level”. “Heatmap” is a graphical representation of data in which the individual count values contained in a matrix are represented as pseudo-colors. Hierarchical clustering is usually applied together with the heatmap, enabling users to interpret the overall picture of expression patterns with ease. “Expression Level” can be used to visualize expression patterns for genes of interest. It is useful, for example, for visualizing the expression patterns of top-ranked DEGs obtained from comparisons of two or more groups.
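The FDR arithmetic above can be reproduced with a short Python sketch. The function name `expected_composition` is hypothetical; the gene counts (1320 at 10% FDR and 1841 at 40% FDR) are the ones quoted in the text.

```python
def expected_composition(n_genes, fdr):
    """Expected (true DEGs, non-DEGs) among n_genes called at a given FDR."""
    non_degs = n_genes * fdr
    return n_genes - non_degs, non_degs


true_10, false_10 = expected_composition(1320, 0.10)
true_40, false_40 = expected_composition(1841, 0.40)

print(round(true_10, 1), round(false_10, 1))
print(round(true_40, 1), round(false_40, 1))

# Loosening the cut-off adds 521 genes to the list, yet the expected
# number of true DEGs does not increase.
print(true_40 < true_10)
```

The expected number of true DEGs is about 1188 at 10% FDR and about 1104.6 at 40% FDR, so the longer list is expected to contain fewer true DEGs.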
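The volcano plot coordinates and the counting of genes that pass both cut-offs can be sketched in Python as follows. This is an illustration only: the cut-off values (|M| ≥ 1 and p ≤ 0.05), the example genes, and the function names are assumptions made here and are not the TCC-GUI defaults.

```python
import math
from collections import Counter


def volcano_point(m, p):
    # X-axis: M value; Y-axis: -log10(p-value).
    return m, -math.log10(p)


def classify(m, p, m_cut=1.0, p_cut=0.05):
    """Label a gene by its position relative to both cut-offs."""
    x, y = volcano_point(m, p)
    if y < -math.log10(p_cut):
        return "not selected"   # below the significance line
    if x <= -m_cut:
        return "down in G2"     # upper left area
    if x >= m_cut:
        return "up in G2"       # upper right area
    return "not selected"       # significant, but small fold-change


# Hypothetical (M value, p-value) pairs.
genes = [(-2.0, 0.001), (1.5, 0.01), (3.0, 0.5), (0.2, 1e-6), (-1.2, 0.04)]
counts = Counter(classify(m, p) for m, p in genes)
print(counts)
```

Since M is defined as log2(G2/G1), genes in the upper left area are down-regulated in G2 and those in the upper right area are up-regulated in G2.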
Report (Step 4)
Although TCC-GUI has the option to export results after every step, some users may prefer the output merged into a single file. Like other sophisticated GUI-based tools, such as PIVOT, TCC-GUI supports this functionality in the final Step 4.
Representative analysis of a real count dataset
Here, we demonstrate a representative analysis of a real count dataset, available at the ReCount website. The dataset consisted of 36,536 genes × 21 liver samples. Bottomly et al. studied the expression levels of two common inbred mouse strains used in neuroscience research, i.e., 10 C57BL/6J and 11 DBA/2J samples. TCC-GUI displayed the results of the two-group comparison: (i) the AS value was 0.187 in Step 1, (ii) 22,604 low-count genes were filtered out (i.e., 36,536 − 22,604 = 13,932 genes were used as input for TCC computation in Step 2), and (iii) 1530 genes satisfying 10% FDR (i.e., PDEG = 10.98%) were detected as DEGs after TCC computation in Step 2. These values were exactly the same as those described in Zhao et al. A series of screenshots for this analysis is given in Additional file 3.
TCC-GUI is a browser-based application for DE analysis of RNA-Seq data. It enables non-R users to run the TCC package without installation. In addition to the functionalities originally implemented in TCC, TCC-GUI provides plenty of interactive visualization functions. The powerful built-in functions should also satisfy experienced R users.
While development is complete from the end-user perspective, the R code used internally is still cluttered. Moreover, the GUI in Step 4 still needs further improvement. These refinements are desirable in the near future.
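The figures reported for this dataset are mutually consistent, as the following Python check shows. The variable names are illustrative; all numbers are taken from the text above.

```python
total_genes = 36_536      # genes in the original count matrix
filtered_genes = 22_604   # low-count genes removed before Step 2
degs_at_10pct_fdr = 1530  # genes satisfying 10% FDR

genes_used = total_genes - filtered_genes
p_deg_percent = round(100 * degs_at_10pct_fdr / genes_used, 2)

print(genes_used, p_deg_percent)
```

The 13,932 genes retained after filtering and the 1530 detected DEGs give PDEG = 10.98%, matching the reported value.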
WS developed the TCC-GUI software and drafted the manuscript. JS provided critical comments as the first author of the TCC paper. KS designed the study, supervised the critical discussion, and revised the paper. KK conceived the software, substantially revised the paper, and led the project to completion. All authors read and approved the final manuscript.
We would like to thank Editage (http://www.editage.jp) for English language editing.
The authors declare that they have no competing interests.
Availability of data and materials
All data in this study were included in this article.
Consent for publication
Ethics approval and consent to participate
This study was supported by JSPS KAKENHI [Grant Numbers JP15K06919 and JP18K11521]. The funding body had no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 3. Ohde T, Morita S, Shigenobu S, Morita J, Mizutani T, Gotoh H, Zinna RA, Nakata M, Ito Y, Wada K, Kitano Y, Yuzaki K, Toga K, Mase M, Kadota K, Rushe J, Lavine LC, Emlen DJ, Niimi T. Rhinoceros beetle horn development reveals deep parallels with dung beetles. PLoS Genet. 2018;14(10):e1007651.
- 10. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2016. https://www.R-project.org/.
- 11. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5(10):R80.
- 13. Chang W, Cheng J, Allaire JJ, Xie Y, McPherson J. Shiny: Web Application Framework for R. 2018. R package version 1.2.0.
- 16. Su W. TCC-GUI Online version. 2019. https://infinityloop.shinyapps.io/TCC-GUI/. Accessed 9 Jan 2019.
- 17. Su W. TCC-GUI GitHub page. 2019. https://github.com/swsoyee/TCC-GUI. Accessed 9 Jan 2019.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.