Introduction

In the age of big data, scientists are increasingly looking toward large datasets to elucidate patterns and trends of interest. Similarly, new research in the biomedical sciences is centred around the use of “-omes”, a suffix referring to some totality, and the study of these objects is termed ‘omics, such as genomics, transcriptomics, and proteomics, among others. There is a growing desire to leverage the benefits of each ‘omics technology through their integration. This field, known as multi-omics, has the ability to discover novel causal mediators of disease and insights into biology missed by individual single-omics studies. This review will introduce common types of ‘omics and applications of multi-omics to the discovery of causal mediators of atherosclerotic cardiovascular disease (CVD), as well as challenges and opportunities for the field moving forward.

The Diversity of ‘Omics Technologies

There is substantial diversity in the types of ‘omics, but their common objective is to comprehensively characterise structural or functional features of a class of biological molecules in a cell, tissue, or organism. ‘Omics can be conceptualised as a gradient of layers spanning the heritable foundations of disease to more direct measures of environmental exposures on biological processes (Fig. 1). In such a model, it can be intuitively understood that adding information from intermediate layers can fill in gaps between underlying biology and disease. Therefore, integration of single-omics data can provide a deeper mechanistic understanding of pathology enabling the discovery of causal mediators that can lead to personalised or population-level trends of interest.

Fig. 1

The complex and interconnected relationship of different types of ‘omics in cardiovascular disease. Each ‘omics layer has unique properties, and the layers span from heritable DNA variants to more direct measures of environmental exposure, such as metabolomics. Accordingly, each type of ‘omics data provides a piece of the puzzle, such that integrating multiple types of ‘omics can improve disease prediction, provide better causal inference, and elucidate biological mechanisms. Created with BioRender.com

Genome

Since the first drafts of the human genome, the adoption of genotyping arrays and next-generation sequencing has enabled researchers to characterise genetic variation across the whole genome as opposed to single genes [1, 2]. Costs have traditionally limited population-level studies to common variation, but a rapid decrease in the cost of whole-genome or -exome sequencing now enables a more complete characterisation that includes rare genetic variation. Genome-wide association studies (GWAS) identify associations between genotype and phenotype by statistically modelling the relationship between the number of alleles for each genetic variant and a phenotype across all study participants [3]. To date, thousands of genetic loci have been associated with complex traits and diseases, including subclinical atherosclerosis [4], blood lipids [5, 6], stroke [7], and coronary artery disease [8,9,10]. Loci identified in GWAS include genes for known drug targets, such as PCSK9 or HMGCR for low-density lipoprotein (LDL) [5]. Nevertheless, due to linkage disequilibrium, whereby two or more variants are correlated because of the non-random association of alleles, an associated genetic variant may not be causal for disease, and pinpointing causal genes at a locus is necessary for biological interpretation. For instance, the 1p13 locus is associated with myocardial infarction and contains several plausible candidate genes (CELSR2, PSRC1, and SORT1), but functional follow-up implicated a non-coding variant decreasing expression of SORT1 and increasing very low-density lipoprotein secretion by disrupting transcription factor binding [11]. GWAS summary association statistics from individual studies and national biobanks [12,13,14] are often publicly deposited in repositories such as GWAS Catalog [15].
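
As a minimal sketch of the single-variant association test and of linkage disequilibrium described above, the following Python example regresses a quantitative phenotype on allele count and computes the squared correlation between two variants. It assumes an additive model with no covariates, and it approximates linkage disequilibrium from genotype dosages rather than phased haplotypes. The function names and all data values are hypothetical and chosen only for illustration; real GWAS software additionally adjusts for covariates and population structure.

```python
from math import sqrt


def allelic_association(dosages, phenotype):
    """Regress a quantitative phenotype on allele count (0, 1, or 2) for one variant.

    Returns (beta, standard_error), where beta is the estimated change in
    phenotype per additional copy of the counted allele.
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, phenotype))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    # Residual variance with n - 2 degrees of freedom (slope and intercept).
    sse = sum((y - intercept - beta * g) ** 2 for g, y in zip(dosages, phenotype))
    standard_error = sqrt(sse / (n - 2) / sxx)
    return beta, standard_error


def ld_r2(dosages_a, dosages_b):
    """Squared Pearson correlation between the dosages of two variants."""
    n = len(dosages_a)
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    return cov * cov / (var_a * var_b)


# Made-up data for six participants (illustrative only).
causal_variant = [0, 0, 1, 1, 2, 2]
tag_variant = [0, 1, 1, 1, 2, 2]  # correlated with the causal variant
phenotype_values = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]

beta_causal, se_causal = allelic_association(causal_variant, phenotype_values)
beta_tag, se_tag = allelic_association(tag_variant, phenotype_values)
r2 = ld_r2(causal_variant, tag_variant)
```

In this toy example the tag variant is also associated with the phenotype purely because it is correlated with the causal variant, which is why an association signal alone cannot identify the causal variant or gene at a locus.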

Epigenome

Epigenomics studies reversible modifications on deoxyribonucleic acid (DNA) or DNA-associated proteins that alter DNA accessibility and gene transcription. Examples include DNA methylation measured using bisulfite sequencing, DNA-binding proteins and histone modifications measured using chromatin immunoprecipitation sequencing (ChIP-seq), and chromatin accessibility measured using assay for transposase-accessible chromatin sequencing (ATAC-seq). Although reversible, epigenetic markers are inherited by daughter cells during mitosis. In this manner, epigenetics serves an essential role by enabling the diversity of specialised cell types necessary for multicellular life from a shared DNA sequence. However, aberrant epigenetics can also lead to dysregulated gene expression and disease. Methylation at various sites has been associated with hemorrhagic stroke [16], atherosclerosis [17], and coronary heart disease [18]. Besides DNA methylation, enzymes involved in histone deacetylation have been shown to play a role in endothelial-mesenchymal transition and in reducing atherosclerotic plaque volume [19], as well as in delaying senescence in vascular smooth muscle cells [20]. Unlike genetic variation, epigenetic markers are not fixed and act as heritable markers of past biological or environmental exposures, such as smoking [21] and air pollution [22]; accordingly, it is difficult to discern whether epigenetic changes occur as a cause or consequence of disease. Large-scale efforts, such as Roadmap Epigenomics (http://www.roadmapepigenomics.org/) and the International Human Epigenome Consortium (http://ihec-epigenomes.org/), have developed public maps of epigenome modifications in various tissues [23].

Transcriptome

Transcriptomics refers to the characterisation of ribonucleic acid (RNA) transcripts in terms of quantitative changes to levels or qualitative changes, such as isoform switching through alternative splicing or post-transcriptional chemical modifications [24, 25]. Microarray-based platforms provide cost-effective transcriptome profiling but are limited to relative quantification and transcripts with prior sequence knowledge, while RNA-sequencing has gained popularity by providing more comprehensive information about the transcriptome such as discovery of novel genes, alternative transcripts, and allele-specific expression. The central dogma traditionally conceptualises protein-encoding messenger RNA (mRNA) as an intermediate molecule between DNA and functional proteins, such that expression changes provide immediate information regarding cellular states and transitions between them [26, 27]. Moreover, mRNA levels are associated with disease states, such as higher MCEMP1 in stroke [28], higher QSOX1 and PLBD1 in left ventricular dysfunction [29], higher SPP1 and matrix metalloproteinases in atherosclerotic plaques [30], or lower SLC25A20 and PDK4 after cardioversion in atrial fibrillation [31]. Despite not encoding functional proteins, non-coding RNA have an important role in physiology as well. In support of their role in disease, a survey of long non-coding RNA (lncRNA) across 49 human tissues identified 1432 trait-associated lncRNA [32]. Indeed, one of the most significant genetic loci associated with coronary artery disease is 9p21 with functional studies implicating the lncRNA, ANRIL, as its causal gene. Elucidating its mechanism has proven challenging due to the circularisation of transcripts and isoform-specific properties [33, 34]. 
The risk haplotype increases the proliferation of vascular smooth muscle cells [33, 35] and alters lysophospholipid metabolism [36], while in vitro ANRIL knockdown confers an atherogenic phenotype with increased adhesion and proliferation [34, 37]. Likewise, smaller non-coding RNA, such as microRNA (miRNA), circular RNA (circRNA), and Piwi-interacting RNA (piRNA), are also differentially expressed in atherosclerotic vascular disease [38,39,40]. A survey of miRNA found enrichment for loci near genetic variants associated with the risk of coronary artery disease, and functional follow-up on miR-128 demonstrated its ability to target ABCA1 and LDLR to regulate HDL and LDL cholesterol levels [41]. Large-scale efforts, such as Genotype-Tissue Expression (GTEx) and Gene Expression Omnibus, have accelerated the public availability of RNA-sequencing data, enabling extensive characterisation of transcriptomics across various tissues and cell types [42, 43].

Proteome

Proteomics characterises protein features in terms of quantitative changes to levels or qualitative post-translational changes, but the dynamic nature and diverse modifications of proteins make studying the complete range of proteomic variation very challenging. Post-translational modifications (PTMs), such as glycosylation, phosphorylation, ubiquitination, and proteolysis [44], are often implicated in disease, such that 16–21% of disease-associated genetic variants overlap with known post-translational modifications [45]. Current proteomic technologies can be broadly characterised as mass spectrometry-based, antibody-based, or aptamer-based, but the broad dynamic range of protein abundance remains a hurdle for comprehensive and scalable characterisation of the proteome, and this is an area of active development [46]. Still, proteins have the advantage of being functionally closer to phenotypes and reflecting some environmental factors, such as smoking [47] or diet [48, 49], making them well-suited for predicting disease risk [50,51,52] and yielding insights into mechanistic pathways [53, 54]. In addition, proteins are translatable to a broad range of therapeutic modalities for drug discovery, but such applications carry the additional burden of establishing a causal role in disease.

Metabolome

Metabolomics encompasses small organic molecules, such as lipids, amino acids, and carbohydrates, that are often intermediate- or end-products of biological processes and are measured using variations of mass spectrometry or nuclear magnetic resonance spectroscopy. Accordingly, metabolites serve as comprehensive indicators of physiology that reflect the collective state of endogenous ‘omics and environmental factors, such as genomics, microbiome, and diet [55, 56]. Clinical laboratory tests traditionally measure blood or urine metabolites, such as creatinine, urea, or glucose, to diagnose disease. Similarly, metabolites can inform future states, such as predicting incident stroke [57] or CVD [58]. Furthermore, metabolites can also have downstream effects as signalling molecules, cofactors, active metabolites from prodrugs, or through other mechanisms. Indeed, some metabolites play a direct role in biological processes, such as peripheral serotonin increasing insulin secretion and reducing thermogenesis in brown adipose tissue [59,60,61] or a set of metabolites mediating the cardioprotective effects of the Mediterranean diet [62]. In this regard, observational associations between metabolites and disease require careful interpretation as they may reflect three scenarios: causal mediators of disease, consequences of the disease, or consequences of confounding variables. Yet another challenge of metabolomics is the diversity of potential chemicals, many of which are uncharacterised or unknown, but reference databases, such as the Human Metabolome Database (HMDB), seek to continually aggregate information about the human metabolome as it grows [63].

Microbiome

The microbiome encompasses communities of microorganisms, such as bacteria, viruses, and fungi, inhabiting human skin and mucosa. Sequencing of 16S ribosomal RNA, which is conserved but contains hypervariable regions across bacteria, allows the classification of bacterial species within a sample. Although there is considerable inter-individual variability in microbial composition, host genetics plays a smaller role compared to environmental factors, such as diet or medications [64,65,66,67,68]. Despite the prevalence of pathogenic microorganisms, most microbiota exist in a commensal or mutualistic relationship with humans. Accordingly, studies have implicated the microbiome and microbial-derived compounds in diseases such as obesity [69], steatohepatitis [70], and coronary artery disease [71]. For instance, the beneficial effects of fasting on blood pressure may be partly mediated by the enhanced capacity of the gut microbiome to produce propionate, a short-chain fatty acid with anti-diabetic, anti-inflammatory, and anti-hypertensive properties [66, 72, 73]. Microbiota-changing interventions have shown promise for certain conditions, such as Clostridioides difficile infection, but translating other findings and elucidating cause-and-effect remains challenging due to the confounding effect of environmental factors on microbiome dynamics [64, 74]. Relatedly, the scale is challenging as there are collectively more microbial genes than human genes [64]. Here, MetaHIT and Human Microbiome Project seek to catalogue the diversity of microbial gene sequences and build reference maps for human microbiomes [75,76,77].

Multi-omics Technologies and the Discovery of Novel Causal Mediators

Although individual ‘omics layers allow for comprehensive analyses, multi-omics is becoming increasingly popular due to synergies between layers. The orthogonal information can help establish a chain of causality in molecular events that would not be possible with a single technology and can paint a more complete view of disease [78, 79]. As a result, multi-omics analyses provide broad advantages for improved predictive power, causal inference, and insights into biological mechanisms.

Predictive Power

Multi-omic profiling can benefit individual patients by improving the detection of subclinical disease and the prediction of incident events. A longitudinal study of 109 volunteers collected multi-omics data, such as genomics, metabolomics, and proteomics. Multi-omics discovered clinically relevant findings for patients, such as pre-symptomatic B-cell lymphoma or dilated cardiomyopathy, and identified insights missed by individual ‘omics, such as distinct glucose dysregulation mechanisms underlying clusters of patients progressing from pre-diabetes to diabetes mellitus [80••]. More generally, a comparison of transcriptomic and proteomic markers found a combination of cardiac proteins and muscle-enriched miRNA were better able to differentiate myocardial infarction among patients with acute chest pain [81]. Similarly, another multi-omic study identified a molecular signature for predicting high-risk atherosclerotic plaques and incident cardiovascular disease, though further studies would be needed to establish causality [82]. Although causality is not necessary to develop predictors for a disease, it is a prerequisite for understanding pathways in biological systems and identifying therapeutic targets.

Causal Inference

Correlations between biomarkers and disease are limited by inherent biases such as reverse causality or confounding, and these biases also arise between biomarkers in different ‘omics layers in multi-omics studies. Among ‘omics, genetic variants are protected against reverse causation due to being fixed at conception, such that disease-associated genetic variants can be assumed to precede disease. Furthermore, genetic variants are randomly inherited at conception, such that a genetic variant associated with a biomarker can be thought of as a proxy for that biomarker that is independent of other confounding traits, akin to a natural randomised trial for that biomarker [83, 84]. Mendelian randomisation (MR) is a technique that leverages these unique properties of genetic variants to estimate the causal effects of risk factors on disease, including risk factors measured through transcriptomics [42, 85], proteomics [86, 87], metabolomics [55], or DNA methylation [88]. Indeed, MR studies have demonstrated roles for CSF-1 and CXCL12 in coronary artery disease [89], PLGF in coronary heart disease [90], MCP-1 in stroke and atherosclerotic plaque vulnerability [91, 92], and SCARA5 and TNFSF12 as protective against cardioembolic stroke [93•]. When integrating genomics in multi-omics, it is important to perform colocalisation to distinguish whether a shared causal genetic variant underlies each ‘omics trait or whether distinct genetic variants cause each ‘omics trait independently but are correlated due to linkage disequilibrium [94,95,96]. One such MR and colocalisation framework, heterogeneity in dependent instruments (HEIDI), was used to propose a regulatory mechanism mediating the association of LIPA with coronary artery disease through DNA methylation disrupting an enhancer region of LIPA in addition to a known coding variant (rs1051338) that decreases lysosomal acid lipase levels and activity in lysosomes [94].
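
To make the logic of MR concrete, the sketch below computes per-variant Wald ratios and a fixed-effect inverse-variance weighted (IVW) estimate from summary statistics. IVW is one commonly used MR estimator and is not necessarily the one applied in the studies cited above. The sketch assumes independent variants that are valid instruments and ignores uncertainty in the variant-exposure effects; the function names and all numbers are hypothetical.

```python
from math import sqrt


def wald_ratios(beta_exposure, beta_outcome):
    """Per-variant causal estimates: variant-outcome effect / variant-exposure effect."""
    return [by / bx for bx, by in zip(beta_exposure, beta_outcome)]


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted estimate and its standard error.

    Equivalent to a weighted regression of variant-outcome effects on
    variant-exposure effects through the origin, with weights 1 / se_outcome^2.
    """
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, sqrt(1.0 / denominator)


# Made-up summary statistics for three independent variants (illustrative only).
beta_biomarker = [0.10, 0.20, 0.40]  # effect of each variant on the biomarker
beta_disease = [0.05, 0.10, 0.20]    # effect of each variant on disease risk
se_disease = [0.01, 0.01, 0.02]      # standard errors of the disease effects

ratios = wald_ratios(beta_biomarker, beta_disease)
causal_effect, causal_se = ivw_estimate(beta_biomarker, beta_disease, se_disease)
```

Here every variant implies the same ratio, so the pooled estimate equals that ratio. In real data, heterogeneity between the per-variant ratios is one signal that some instruments may violate the MR assumptions.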

Mechanistic Insights

For patients with suspected genetic disorders, the addition of other ‘omics, such as transcriptomics, to clinical genetic testing can help reclassify variants of uncertain significance [97] and elucidate pathogenic mechanisms [98]. Identifying the pathogenic variant underlying the disease may enable more personalised therapies for rare genetic disorders [99]. In terms of common diseases, multi-omics data can be used to prioritise causal genes from disease-associated loci in GWAS. The polygenic priority score (PoPS) is a method that prioritises causal genes by combining both locus-based features, such as nearest genes and genetic variants affecting gene expression, and similarity-based features, such as cell-specific expression and pathways. This flexible method can incorporate multiple types of data, such as gene expression, protein–protein interactions, and biological pathways, to inform the likelihood that a gene is causal at a disease-associated locus [100]. However, the extent to which the addition of multi-omics data can improve the determination of causal genes relative to picking the closest genes remains controversial [101, 102].

Network and pathway analyses help interpret interdependencies and create models from multi-omics studies. Biological pathways are interactions among molecules that lead to a certain product or physiological change. ReactomeGSA tests for the enrichment of disease-associated biomarkers across expert-curated biological pathways to identify shared or divergent pathways across different types of ‘omics data [103]. Pathway maps are essential to form reliable and informative priors, and other options include the Kyoto Encyclopaedia of Genes and Genomes (KEGG), WikiPathways, and Gene Ontology [104,105,106]. Multiple genes and pathways can be connected to form networks, which can be visualised with biological elements represented as nodes connected through edges using tools such as GeneMania, Search Tool for the Retrieval of Interacting Genes/Proteins (STRING), or Cytoscape [107,108,109]. Downstream analysis of such networks can identify central drivers of a disease state, such as fibronectin-1 and alpha-2-macroglobulin in advanced-stage calcific aortic valve disease based on a network of transcriptome and proteome data [82]. Additionally, nodes can be supplemented with causal genes based on GWAS loci to identify gene modules in the network that cause diseases, such as cholesterol metabolism, innate immunity, and growth factor signalling in a study of coronary artery disease [110]. Finally, multivariable MR and related techniques can also be used to build causal networks of biomarkers or identify causal mediators between risk factor and disease, such as IGFBP-3 mediating the beneficial effects of metabolically favourable adiposity, insulin mediating the relationship between telomere length and coronary heart disease, or metabolites mediating effects of gene expression on disease [111,112,113,114].
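
As a rough illustration of pathway enrichment testing, the sketch below implements a simple over-representation analysis using a one-sided hypergeometric test. This is a generic textbook approach and is not a description of the specific methods implemented in ReactomeGSA or the other tools named above. The function name and the gene identifiers are placeholders.

```python
from math import comb


def enrichment_p_value(hits, pathway, background):
    """One-sided hypergeometric p-value for over-representation.

    Probability of observing at least as many pathway members among the
    hits as were seen, if the hits were drawn at random from the background.
    """
    background = set(background)
    hits = set(hits) & background
    pathway = set(pathway) & background
    n_background = len(background)
    n_pathway = len(pathway)
    n_hits = len(hits)
    overlap = len(hits & pathway)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_hits - i)
        for i in range(overlap, min(n_pathway, n_hits) + 1)
    )
    return tail / comb(n_background, n_hits)


# Placeholder identifiers: 20 measured genes, a 5-gene pathway, and
# 5 disease-associated genes of which 3 fall in the pathway.
background_genes = {f"G{i}" for i in range(1, 21)}
pathway_genes = {"G1", "G2", "G3", "G4", "G5"}
associated_genes = {"G1", "G2", "G3", "G10", "G11"}

p_value = enrichment_p_value(associated_genes, pathway_genes, background_genes)
```

In practice, the same test would be repeated across many pathways and across each ‘omics layer, so the resulting p-values need correction for multiple testing before shared or divergent pathways are interpreted.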

Challenges and Limitations of Multi-omics Technologies

Despite promising applications of multi-omics technologies, from identifying driver mutations in cancer to better understanding CVD, it is a nascent field with several challenges before insights can be translated broadly into clinical settings [78].

First, multi-omics datasets present a challenge for the development of robust statistical methodology and study design. Studies can vary on population structure, sample ascertainment, batch effects, and accuracy of different ‘omics or different technologies. A study integrating genomics with two proteomic platforms found 25% of proteins had spurious protein quantitative trait loci (QTL), genetic variants associated with protein levels, due to coding variants overlapping with the epitope binding site used by the platform [90]. Furthermore, researchers must contend with high-dimensionality and heterogeneity of feature spaces. Dimensionality can be reduced using feature selection or transforming into low-dimensional representations [115, 116], and there are active efforts to develop novel integrative tools for both unsupervised and supervised settings [117,118,119,120]. Similarly, as exemplified by GWAS, interpreting results is challenging when hundreds of significant associations are identified and underlying biology is incompletely understood.
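
One common way to transform high-dimensional features into a low-dimensional representation is principal component analysis (PCA); the sketch below extracts the first principal component of standardised features by power iteration. PCA is used here only as a familiar example and is not necessarily the approach taken in the cited work. The function name and data are hypothetical, and the sketch assumes no missing values and no zero-variance features.

```python
from math import sqrt


def first_principal_component(rows, iterations=100):
    """Return (loadings, scores) for the first principal component.

    `rows` holds one list of feature values per sample. Every feature is
    z-scored first, so features with zero variance must be removed beforehand.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    sds = [
        sqrt(sum((row[j] - means[j]) ** 2 for row in rows) / (n - 1))
        for j in range(p)
    ]
    z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in rows]
    # Correlation matrix of the standardised features.
    corr = [
        [sum(z[i][a] * z[i][b] for i in range(n)) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    # Power iteration: repeated multiplication converges to the leading eigenvector.
    loadings = [1.0 / sqrt(p)] * p
    for _ in range(iterations):
        product = [sum(corr[a][b] * loadings[b] for b in range(p)) for a in range(p)]
        norm = sqrt(sum(x * x for x in product))
        loadings = [x / norm for x in product]
    # Project each sample onto the component to get one score per sample.
    scores = [sum(z[i][j] * loadings[j] for j in range(p)) for i in range(n)]
    return loadings, scores


# Made-up values: four samples, with columns standing in for a transcript,
# a protein, and a metabolite measured on different scales.
samples = [
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 3.0],
    [3.0, 6.0, 6.0],
    [4.0, 8.0, 2.0],
]
loadings, scores = first_principal_component(samples)
```

Standardising each feature before decomposition keeps a layer measured on a larger numeric scale from dominating the component, which matters when concatenating heterogeneous ‘omics layers.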

Second, there are challenges regarding the generalisability of novel discoveries. For instance, there is a well-documented need for increased diversity in genomics studies [121, 122], but broader representation needs to be addressed across all ‘omics layers to ensure generalisability across populations [123,124,125,126]. Greater diversity can even lead to novel insights into disease biology, as a GWAS of coronary artery disease identified ancestry-specific genetic risk loci and an enrichment of genes expressed in the adrenal cortex among Japanese individuals [10]. Similarly, studies should carefully consider sex differences, as the rates of many diseases differ between sexes, which may be partially explained by sex hormones or other biological differences [127,128,129]. For instance, a study of atherosclerotic plaques found differences in gene regulatory networks across sexes where male samples expressed immune and inflammation genes while female samples expressed genes related to phenotype switching in smooth muscle cells [129].

Third, cell or tissue type can be an important consideration, particularly for transcriptomic and epigenomic data, due to variations in gene expression across tissues. Ideally, studies should be performed in the context of a tissue relevant to the disease though data availability can be a limitation. However, even at the level of a single tissue, differences in cell composition can often be responsible for heterogeneity in gene expression [42]. Several efforts are underway to create a public map of genetic variation associated with gene expression at a single-cell resolution through consortia such as GTEx and eQTLGen [130, 131].

Future Directions in the Field

Despite these challenges, multi-omics remains a promising field with ongoing research efforts to address current shortcomings. The availability of multi-omics data will continue to grow with their adoption in population studies such as the UK Biobank [12], Million Veterans Program [132], All of Us [133], Taiwan Biobank [134], China Kadoorie Biobank [135], and INTERVAL [136]. Interestingly, efforts to build genetic predictors may eventually allow multi-omics features to be imputed in any study based on genomics alone [136, 137]. In parallel, technological development and cost reduction will lead to new discoveries through the maturation of other ‘omics including wearables [80••], imaging [138], and non-coding transcriptomics [139,140,141]. Likewise, emerging concepts, such as the role of exosomes in cell-to-cell communication, will broaden the context of ‘omics data and deepen our understanding of circulating membrane-bound biomarkers with potential applications to novel diagnostics and therapeutics [142, 143].
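
The idea of imputing an ‘omics feature from genotype can be sketched as a weighted sum of allele dosages, where the weights would come from a genetic predictor trained in a reference study that measured both genotype and the feature. This is a simplified sketch and not the method of any cited study; the function name, variant identifiers, and weights are all hypothetical.

```python
def predict_from_genotype(weights, dosages, intercept=0.0):
    """Predict a molecular feature as a weighted sum of allele dosages.

    `weights` maps variant identifiers to per-allele effects learned in a
    reference study. Variants missing from `dosages` are skipped, which
    treats them as contributing nothing (an assumption of this sketch).
    """
    return intercept + sum(
        weight * dosages[variant]
        for variant, weight in weights.items()
        if variant in dosages
    )


# Hypothetical predictor weights for one circulating protein.
protein_weights = {"variant_1": 0.5, "variant_2": -0.2, "variant_3": 0.1}

# Allele dosages for one participant; variant_3 was not genotyped.
participant_dosages = {"variant_1": 2, "variant_2": 1}

predicted_level = predict_from_genotype(protein_weights, participant_dosages)
```

Such predictions capture only the genetically determined component of a feature, so they cannot reflect the environmental influences that directly measured ‘omics layers record.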

Integrating multi-omics will be particularly informative in studies of single-cell data, as it will be needed to characterise the full spectrum of cell states and factors driving cell determination [144]. A recent study that integrated proteomic and transcriptomic data from immune cells identified activated T-cells and macrophages in atherosclerotic plaques and a greater proportion of exhausted T-cells in symptomatic patients, though further research is needed to establish whether exhaustion occurs before or after the cardiovascular event [145]. Longitudinal study designs in multi-omics can aid in resolving the order of molecular events [146], and they have recently been used to resolve the temporal fate of immune cell states during the course of COVID-19 and discover inflammatory signatures preceding fatal outcomes [147, 148]. Longitudinal sampling can also identify patterns within and between ‘omics layers that vary over time. To this end, the availability of longitudinal data and development of integrative multi-omics methods for time-series data will be important moving forward [149, 150].

As the field matures and datasets grow, standardised procedures for data collection, mechanisms for data sharing, and ontologies for harmonisation are important to build a centralised repository for multi-omics data in CVD [151]. The cardiovascular disease database (C/VD) curated peer-reviewed multi-omics datasets across multiple species and associated metadata for CVD-related traits [152]. As a case study, the authors integrated differential expression of circulating proteins, miRNA, and metabolites in coronary artery disease to build an interactome network that involved subclusters of molecular entities related to lipid metabolism, PPAR signalling, inflammation, and extracellular matrix interactions.

Finally, translating discoveries is the most important step if the goal is improving patients’ lives. In this regard, multi-omics is expected to aid in the development of therapeutics and the prediction of clinical outcomes. Efforts to connect multi-omics data with drug databases, such as DrugBank [153], or medical and insurance claims records can help identify novel therapeutic targets or repurposing opportunities [154]. Similarly, multi-omic profiling in randomised controlled trials or epidemiological studies can reveal on-target and off-target effects of pharmaceutical compounds [155,156,157], such as the potential effect of renin–angiotensin–aldosterone system modulators on HER2 that mediates their protective role in kidney disease [158]. Although it is still early, advanced statistical models trained on multi-omics data may further improve the prediction of complex phenotypes beyond current best practices. For instance, a machine learning model trained on microbiome, blood biochemistry, and clinical parameters accurately predicted postprandial glucose levels [159]. In a small randomised crossover trial of participants with newly diagnosed type 2 diabetes, a dietary intervention following the model’s recommendations resulted in improved glucose levels relative to a Mediterranean diet [160•]. However, any predictive model requires special considerations to avoid harm and exacerbation of health inequities, including interpretability of the model, availability of multi-omic data in clinical settings, generalisability and independent replication, and thresholds for sensitivity and specificity in the context of the disease.

Conclusions

Multi-omics is a promising emerging analytical methodology with the potential to improve the lives of many populations. While there remain challenges and limitations to multi-omics, similar to its independent ‘omics components, ongoing technological and/or statistical developments are enhancing this field. Multi-omics has proven to be a useful tool in the discovery of novel causal mediators of disease. By adopting a more comprehensive view of biology and disease, multi-omics has already started to shape the direction of big data analyses in biomedical research, including that on atherosclerotic cardiovascular disease.