Introduction

Histologic tissue analysis is vital for investigating disease states, understanding pathophysiological mechanisms and guiding diagnostics. Recent technological developments in digital and computational pathology have enabled automated large-scale histopathology analyses [1,2,3,4]. The expansion of digital pathology has been fueled especially by deep learning-based workflows [5,6,7,8]. While end-to-end approaches focus on direct clinically or diagnostically actionable outputs, pathomics mines histopathology data through large-scale extraction of explainable, quantitative color or geometric features (e.g., circularity) from histological structures identified by semantic segmentation [9,10,11,12,13,14]. This approach is similar to molecular omics approaches and aims to better understand morphology by generating morphometric features for relevant tissue structures, allowing exploratory analyses [15]. The extracted features could be integrated into clinical decision-making, e.g., for patient risk stratification [16] or outcome prediction [17, 18]. Pathomics data can be generated at comparatively low cost relative to other omics methods, enabling broad implementation in many research groups. This makes pathomics analyses especially interesting for biomedical researchers performing histological analyses, but the datasets can be challenging for established conventional omics workflows because of large outlier variability and missingness caused by inconsistent occurrences of the analyzed structures. In addition, biomedical researchers who mostly perform tissue-based analyses often lack the specific coding skills needed for analyzing pathomics data and streamlining time-intensive data curation processes [19]. For these reasons, we have developed an R Shiny application, tRigon (Toolbox foR InteGrative (path-)Omics data aNalysis), to make exploratory pathomics data analyses more open, accessible and feasible for researchers and clinicians.
While tRigon was mainly designed for its application to pathomics data, it is also suitable for analysis of other high- or low-dimensional data such as molecular omics or medical datasets.

Implementation

tRigon is a Shiny application [20] built in the R framework [21] and is available both on CRAN (https://cran.r-project.org/web/packages/tRigon) and GitLab (https://git-ce.rwth-aachen.de/labooratory-ai/trigon). It includes various functions such as descriptive statistics, statistical tests and visualizations for analyzing large and complex datasets (Fig. 1). tRigon was tested on Windows, Linux and macOS.

Fig. 1 Overview of the available tRigon functions with their respective appearance in the user interface (UI)

Pathomics datasets typically consist of multiple .csv files, for example generated by our previously published framework for large-scale histomorphometry (FLASH) [9]. The datasets include structural morphometric measurements (e.g., diameter, area or shape descriptors) for major histological compartments and structures. For large human cohorts or animal experiments, these datasets can be challenging to analyze. Furthermore, the data need to be integrated with additional metadata. For human specimens, all tissue pieces on a slide (e.g., two biopsy cores) typically belong to the same case and share the same clinical information, whereas some slides from animal experiments contain samples from multiple experimental conditions, e.g., multiple specimens from various animals or a diseased specimen and its internal or contralateral control tissue on the same slide. Additionally, pathomics data can be analyzed on the specimen level (e.g., a single human pathology case) or with single-structure resolution.

tRigon can aggregate large numbers of pathomics files and combine them with metadata containing other (e.g., clinical) information on the analyzed samples. Depending on the desired analysis, the application offers human- or animal-type data workflows and supports specimen- or structure-level calculations.
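The aggregation step can be pictured with a minimal sketch. tRigon itself is implemented in R; the Python code below is only an illustration and does not reproduce its implementation. The column names `specimen_id`, `diagnosis` and `glom_area`, the function `aggregate_features`, and the use of the median as the specimen-level summary are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import median


def aggregate_features(rows, metadata, feature, level="structure"):
    """Join structure-level measurements with specimen metadata.

    rows:     list of dicts, one per measured structure (hypothetical layout)
    metadata: dict mapping specimen_id -> dict of metadata columns
    level:    "structure" keeps every row; "specimen" returns one median per specimen
    """
    joined = []
    for row in rows:
        meta = metadata.get(row["specimen_id"])
        if meta is None:
            continue  # no metadata for this specimen: skip rather than guess
        joined.append({**row, **meta})

    if level == "structure":
        return joined
    if level != "specimen":
        raise ValueError("level must be 'structure' or 'specimen'")

    per_specimen = defaultdict(list)
    for row in joined:
        value = row.get(feature)
        if value is not None:  # missing measurements do not enter the median
            per_specimen[row["specimen_id"]].append(value)

    return [
        {"specimen_id": sid, **metadata[sid], feature: median(values)}
        for sid, values in per_specimen.items()
    ]


# Toy data (invented for illustration only)
rows = [
    {"specimen_id": "S1", "glom_area": 10.0},
    {"specimen_id": "S1", "glom_area": 14.0},
    {"specimen_id": "S2", "glom_area": 20.0},
    {"specimen_id": "S2", "glom_area": None},
    {"specimen_id": "S3", "glom_area": 5.0},  # specimen without metadata
]
metadata = {"S1": {"diagnosis": "A"}, "S2": {"diagnosis": "B"}}

structure_level = aggregate_features(rows, metadata, "glom_area")
specimen_level = aggregate_features(rows, metadata, "glom_area", level="specimen")
```

In this sketch, the structure-level result keeps one row per measured structure, whereas the specimen-level result collapses each specimen to a single summary value.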

For the aggregated feature files or the users' own loaded datasets, tRigon provides a toolbox of different analytical methods, i.e., statistics, data visualizations and machine learning algorithms (Table 1). Each analysis tool is a tab in the application with an easily understandable user interface (Figs. 2, 3, 4, 5, 6, 7). tRigon users can tailor all functions to their specific needs by choosing from various statistical tests, distribution plots, machine learning methods and output style options. To handle heterogeneous datasets effectively, missingness is automatically reported in the application, non-normally distributed features are supported by multiple non-parametric tests, and outliers can be scaled in plots accordingly. Additionally, the application includes a help section with instructions and common pitfalls. All processed data, generated plots and computed statistical tests can be downloaded if desired. To enable reproducible analyses across user sessions and to keep a record of results, tRigon can generate and save markdown-based .html reports including all relevant inputs (e.g., chosen features and group column, plot selection, etc.) and outputs for each task (Table 1). A full example analysis is provided in the supplementary material (Additional file 9: Table S1–S3 and Additional file 9: Figs. S1–S4).
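As an illustration of what a per-feature missingness report can contain, the following Python sketch counts missing values for each selected feature. It is not tRigon's R code; the function name `missingness_report`, the feature names, and the convention of treating `None`, absent keys and NaN as missing are assumptions for this example.

```python
import math


def missingness_report(rows, features):
    """Return {feature: (n_missing, fraction_missing)} for a list of row dicts."""
    n_rows = len(rows)
    report = {}
    for feature in features:
        n_missing = 0
        for row in rows:
            value = row.get(feature)  # an absent key counts as missing
            if value is None or (isinstance(value, float) and math.isnan(value)):
                n_missing += 1
        fraction = n_missing / n_rows if n_rows else 0.0
        report[feature] = (n_missing, fraction)
    return report


# Toy data (invented): a structure that was not detected yields no measurement
rows = [
    {"glom_area": 10.0, "tubule_diameter": 3.1},
    {"glom_area": None, "tubule_diameter": 2.9},
    {"glom_area": float("nan"), "tubule_diameter": 3.3},
    {"tubule_diameter": 3.0},
]
report = missingness_report(rows, ["glom_area", "tubule_diameter"])
```

Reporting both the count and the fraction makes it easier to judge whether a feature is usable for a given cohort before any test is run.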

Table 1 tRigon functions with explanations

Fig. 2 User interfaces of the a load/process data and b descriptive statistics tabs

Fig. 3 User interface of the a plotting tab. b Example box plot and c example ridgeline plot with logarithmic scale set to “on”

Fig. 4 User interface of the a descriptive statistics tab and b example output for the 100-times bootstrapped comparisons of medians with 95% confidence intervals for the feature “glom_tuft_shape_circularity” stratified by histopathological diagnoses in the AC_B cohort. Additional selectable tests include pairwise Wilcoxon-rank test and Kruskal–Wallis test
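The idea behind a bootstrapped comparison of medians, as mentioned in the caption of Fig. 4, can be sketched as follows. This Python example assumes a simple percentile bootstrap of the difference in medians between two groups; the source does not state which bootstrap variant tRigon uses, so the function `bootstrap_median_difference`, its parameters and the toy values are hypothetical.

```python
import random
from statistics import median


def bootstrap_median_difference(group_a, group_b, n_boot=100, alpha=0.05, seed=0):
    """Percentile bootstrap interval for median(group_a) - median(group_b).

    Returns (observed_difference, lower_bound, upper_bound).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    differences = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping the group sizes
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        differences.append(median(resample_a) - median(resample_b))
    differences.sort()

    # Simple percentile cut-offs on the sorted bootstrap differences
    lower_index = int((alpha / 2) * n_boot)
    upper_index = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    observed = median(group_a) - median(group_b)
    return observed, differences[lower_index], differences[upper_index]


# Toy circularity values for two diagnosis groups (invented numbers)
group_a = [0.80, 0.82, 0.85, 0.88, 0.90, 0.91]
group_b = [0.50, 0.55, 0.58, 0.60, 0.62, 0.65]
observed, lower, upper = bootstrap_median_difference(group_a, group_b)
```

If the resulting interval excludes zero, the medians of the two groups differ at the chosen level under the assumptions of this sketch. With only 100 resamples the interval bounds are coarse; more resamples give smoother estimates at higher computational cost.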

Fig. 5 User interface of the a clustering tab. Features to be clustered can be selected, as well as the number of clusters and whether data points should be assigned to a group based on a grouping column in the metadata

Fig. 6 User interface of the a feature importance tab. Features can be selected to perform random forest- or recursive feature-based importance analysis for classification and regression tasks. b Example feature importance plots showing mean decrease accuracy and mean decrease gini for the selected features and dependent variable

Fig. 7 User interface of the a correlation tab. Features can be selected to perform single- or multiple correlation showing a single correlation plot as an example output. b Example multiple correlation visualized as a correlation matrix
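How a correlation matrix over several features is assembled can be illustrated with a short Python sketch. The source does not specify which correlation coefficient tRigon computes; Spearman's rank correlation is assumed here because it suits non-normally distributed features. The functions `average_ranks`, `pearson`, `spearman` and `correlation_matrix` and the toy feature names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_matrix(columns, method=spearman):
    """Return a nested dict {feature_a: {feature_b: coefficient}}."""
    names = list(columns)
    return {a: {b: method(columns[a], columns[b]) for b in names} for a in names}


# Toy features (invented): diameter rises and circularity falls with area
columns = {
    "area": [1, 2, 3, 4, 5],
    "diameter": [1, 4, 9, 16, 25],
    "circularity": [0.9, 0.8, 0.7, 0.6, 0.5],
}
matrix = correlation_matrix(columns)
```

Because the rank transform only preserves ordering, the monotonic but non-linear relation between `area` and `diameter` in the toy data still yields a coefficient of 1 in this sketch.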

In addition to running tRigon locally via the R console, the application is freely available online in the ShinyApps.io workspace (https://labooratory.shinyapps.io/tRigon), although the free tier is limited to 1 GB of random-access memory (RAM). Therefore, users are advised to process and analyze computationally expensive files such as large pathomics datasets locally.

Results

Nine datasets from different platforms were acquired to demonstrate the effectiveness, versatility, and limitations of tRigon. Five of those are pathomics datasets, including four human kidney cohorts and one animal experiment for 2,8-dihydroxyadenine crystal nephropathy, a mouse model for diet-induced tubulointerstitial fibrosis and scarring [22]. The human kidney datasets consist of two in-house datasets, one of biopsies (AC_B) and one of nephrectomies (AC_N) [9], as well as the freely available Kidney Precision Medicine Project (KPMP) [23] and Human BioMolecular Atlas Program (HuBMAP) [24] datasets containing kidney biopsies and nephrectomies. Furthermore, we analyzed freely available aggregated specimen-level pathomics data from a recent study on breast cancer, replicating their results (Additional file 9: Table S4 and Additional file 9: Figs. S5–S9) [14]. In total, the four human pathomics datasets include 3,287 instance-level files with a total file size of 312.7 MB, while the 2,8-dihydroxyadenine crystal nephropathy pathomics dataset consists of 9 files with a total file size of 13.0 MB. The aggregated breast cancer histomics data file has a size of 7.55 MB. Furthermore, three freely available non-pathomics medical datasets [25,26,27] with a total file size of 4.62 MB from the Teaching of Statistics in the Health Sciences (TSHS) Resources Portal were included.

Computation time was evaluated in two hardware settings, representing a low- and a high-resource setting, using three datasets of different sizes (Table 2). Setting A refers to running the application on a hybrid tablet-notebook (Intel Pentium CPU 1.60 GHz with 8 GB RAM), while setting B refers to running tRigon on a workstation (Intel Xeon Gold 6128 CPU 3.40 GHz, 128 GB RAM). In general, running tRigon on the workstation was faster, but computation times were still short and performance was smooth in setting A, even for large datasets (Table 2). Regardless of hardware, tRigon was especially fast for statistical analysis (summary statistics, pairwise Wilcoxon-rank tests, and correlations) and visualizations (distribution plots, scatter plots, and correlation matrices). As expected, processing data frames and running machine learning algorithms remained the more time-consuming operations (Table 2).

Table 2 tRigon runtime based on data frame size and computational setting

Discussion

tRigon is a user-friendly Shiny application for high-throughput, simple and reproducible analysis of high-dimensional data including pathomics datasets.

An obvious limitation of tRigon is that it is not designed to generate pathomics data. This means it cannot be used to directly investigate whole slide images, and users must use other software for this purpose. However, there are tools available that allow researchers, in some instances even without coding experience, to perform such analyses [28,29,30,31]. Another limitation is that tRigon is not designed as a full-scale statistical program, i.e., in-depth statistical analyses need to be performed with dedicated tools. However, the app allows new functionalities to be added, so its set of analytical tools may grow in the future.

Conclusion

With tRigon, users can easily and effectively summarize or correlate features, visualize distributions, statistically test hypotheses, implement machine learning algorithms and cluster data. Markdown reports can help users with documenting each analysis step. tRigon can further accelerate pathomics research and facilitate creating valuable readouts for large (path-)omics datasets. We will continuously update and expand tRigon in the future.