Introduction

The combination of data from multiple sources into databases that allow integration, complex aggregation and discriminant analyses defines so-called Big Data: extensive datasets that are broadly informative for a large number and variety of persons.

Big Data are characterized by the large and rapidly increasing amount of data (Volume), their multiple sources (e.g., clinical studies, registries, small databases, administrative databases, patient-reported outcomes, genomic profiles and environmental parameters) (Variety), the speed of data accumulation (Velocity) and their ability to truly represent a specific context (Veracity) [1].

Analyses of these data may unveil patterns, trends and associations, and define reference models in aggregations of persons [2]. As data digitalization and information technology (IT) spread and improve in performance [3,4,5], the use of Big Data is increasing steeply, progressively becoming a reference for many typical processes in medicine, such as the identification of the appropriate therapeutic choice by tailoring therapeutic options, the evaluation of short- and mid-term, procedure-related or unrelated, risks of adverse events, and the definition of prognosis. In this sense, Big Data may form the basis of precision medicine, as the factors impacting event occurrence become progressively available for the single subject and may improve the effectiveness of cardiovascular therapies [6].

The place of Big Data is, however, far from well determined. Figure 1 offers a graphical synthesis of the present and future of Big Data.

Fig. 1

Big Data elaboration allows improvement of risk management and of diagnostic or therapeutic strategies in cardiac surgery. The future challenge will be its practical application in healthcare

Big Data: sources and analysis

The key points in the application of Big Data are clinical usefulness and the balance between the costs of sophisticated data analyses and the expected and actual benefits (i) to patients, in terms of quality of care, outcomes and risk prediction, and (ii) to operators, in terms of quality and safety of processes, from diagnosis to the choice of therapeutic options, in a perspective of resource saving.

Regarding sources, the Electronic Health Record (EHR) is the main source of Big Data. Administrative data, which are commonly employed for billing purposes in fee-for-service health systems, may prove helpful for a large spectrum of analytic goals and may fuel risk modeling for clinical and economic purposes [7]. Several approaches are used to enable data aggregation from EHRs and to facilitate their contribution to Big Data. The most relevant limitations of the use of these databases are the risk of misclassification and the impact of missing data [8].
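
As an illustration of the kind of aggregation involved, the hypothetical Python sketch below links an EHR extract to administrative claims on a shared patient identifier and quantifies missing data; file names and column names are invented for the example and do not refer to any real system.

    import pandas as pd

    # Hypothetical extracts; file names and columns are illustrative only
    ehr = pd.read_csv("ehr_extract.csv")        # e.g., patient_id, diagnoses, labs
    claims = pd.read_csv("claims_extract.csv")  # e.g., patient_id, billing codes, costs

    # Deterministic linkage on a shared identifier
    linked = ehr.merge(claims, on="patient_id", how="inner")

    # Quantify missing data, one of the main limitations of these sources
    missing_fraction = linked.isna().mean().sort_values(ascending=False)
    print(missing_fraction.head(10))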

Digital imaging significantly contributes to Big Data generation. Today, almost all medical images are stored as pixels or voxels [9], which can be processed by software aimed at improving data quality and diagnostic accuracy [10].
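
As a minimal sketch of voxel-level processing, assuming the volume has already been read into a NumPy array (no specific imaging library or clinical pipeline is implied), intensity windowing and normalization could look like this:

    import numpy as np

    # Assume a 3-D volume of voxel intensities already loaded from storage
    volume = np.random.randint(-1000, 2000, size=(64, 256, 256)).astype(np.float32)

    def window_and_normalize(vox, center=300.0, width=600.0):
        """Clip voxel intensities to a window and rescale them to [0, 1]."""
        lo, hi = center - width / 2, center + width / 2
        vox = np.clip(vox, lo, hi)
        return (vox - lo) / (hi - lo)

    prepared = window_and_normalize(volume)
    print(prepared.shape, prepared.min(), prepared.max())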

Finally, OMICS datasets, i.e., genomic, proteomic, transcriptomic, epigenomic, and metabolomic data, are readily available in digital and structured forms that allow the recognition of patterns useful for grouping procedures; they are additional important sources of Big Data [11].

In cardiovascular medicine, the contribution of OMICS is enormous and holds a very high potential for Big Data generation and subsequent analysis. A valid example of OMICS data analysis is the characterization of myocardial molecular profiles of cardiac surgical patients [12,13,14]. The identification of variants by OMICS techniques allows association studies, which may prove useful for risk stratification and outcome prediction in the context of precision medicine [15,16,17,18,19], using appropriate analytic systems [20,21,22].
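
A hedged sketch of a single-variant association test of the kind alluded to here, regressing a postoperative outcome on genotype dosage with one clinical covariate; the data are synthetic and the variable names are purely illustrative.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 500
    genotype = rng.integers(0, 3, size=n)          # 0/1/2 copies of a risk allele
    age = rng.normal(65, 10, size=n)               # illustrative clinical covariate
    logit = -3 + 0.4 * genotype + 0.02 * (age - 65)
    outcome = rng.binomial(1, 1 / (1 + np.exp(-logit)))   # synthetic adverse event

    # Logistic regression of the outcome on genotype, adjusted for age
    X = sm.add_constant(np.column_stack([genotype, age]))
    model = sm.Logit(outcome, X).fit(disp=False)
    print(model.summary(xname=["const", "genotype", "age"]))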

When it comes to the analysis of Big Data, classic statistics is limited. Independent storage systems, immediate access and relational database procedures are crucial for Big Data analyses. Artificial intelligence (AI) is particularly useful: it is conceived as a computational and mathematical framework that allows “machines” to perform learning, problem-solving, pattern recognition, reasoning and planning in a way that resembles “human thinking”. It therefore handles uncertainty and prediction well beyond the extrapolation of regression models, validated in subsets of the general population (validation samples), to approximate prediction in the general population.

The ability of such technology to process decisions independently, recognize errors, and readjust decision or prediction models and processes is the main characteristic of AI, which is based on machine learning (ML) and deep learning (DL).

ML is the building of a mathematical model from sample data, which is in turn used to generate predictions or decisions. ML algorithms can be supervised, unsupervised or reinforcement-based [23]. DL is based on artificial neural networks with representation learning, designed to mimic human cognition. DL can automate predictive analysis by modeling data in a non-linear way. The advantage of non-linear modeling is a better ability to identify and interpret complex characteristics [24], arranged in a hierarchy of increasing complexity and abstraction [25]. DL is used for image evaluation, such as cardiac magnetic resonance scans; this requires adequate skills and systems [26, 27].
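
To make the supervised/unsupervised distinction concrete, the illustrative scikit-learn sketch below fits a supervised classifier to labeled outcomes and an unsupervised clustering model to the same synthetic features; nothing here refers to a real registry or published model.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.cluster import KMeans
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(1)
    X = rng.normal(size=(400, 6))     # synthetic patient features
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int)

    # Supervised learning: labeled outcomes guide the model
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print("supervised accuracy:", clf.score(X_te, y_te))

    # Unsupervised learning: structure is sought without labels
    clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print("cluster sizes:", np.bincount(clusters))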

Logistic regression (LR) is a classic classification algorithm that computes a linear combination of the input variables and passes it through the sigmoid function to output a probability.
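
In formula form (the standard logistic model, not tied to any specific score discussed here), the output probability is the sigmoid of the linear combination of the inputs:

    p(y = 1 \mid x_1, \dots, x_k) = \sigma(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)
                                  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}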

Neurons in artificial neural networks (ANN) compute a linear combination of the output values of the neurons in the preceding layer, pass it through a sigmoid function, and output a value to the neurons of the next layer [28].
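
A minimal NumPy sketch of the forward pass just described, with invented layer sizes and randomly initialized weights (no trained model is implied):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(2)
    x = rng.normal(size=8)                 # input features for one patient

    # One hidden layer of 5 neurons and a single output neuron
    W1, b1 = rng.normal(size=(5, 8)), np.zeros(5)
    W2, b2 = rng.normal(size=(1, 5)), np.zeros(1)

    hidden = sigmoid(W1 @ x + b1)          # each neuron: linear combination + sigmoid
    output = sigmoid(W2 @ hidden + b2)     # value passed on as the network output
    print(output.item())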

Predictive models that evaluate the influence of covariates on outcomes make it possible to identify the patients in whom an intervention is likely to be successful. However, when non-randomized diagnostic or therapeutic strategies are analyzed, dissimilar groups may be compared, exposing patients to subsequent complications [2].

The present of Big Data in cardiac surgery and the road ahead

While procedural safety and outcomes have improved over the years for the majority of elective procedures, cardiac surgery is facing patients of increasing complexity, whose survival has been prolonged by refined cardiological, pneumological and oncological therapies. Changes in this clinical context require rethinking clinical risk assessment and management, as well as redefining the optimal timing of surgical options.

So far, cardiac surgery has relied on logistic models to estimate the risk of events, including mortality associated with cardiac surgery, as essential components of the routine clinical management of cardiac surgical patients. The EuroSCORE II and the Society of Thoracic Surgeons (STS) score are the logistic models most commonly employed for cardiac surgery-related risk stratification. Nevertheless, the accuracy of these estimates is debated, especially in subsets of patients [29]. AI applied to Big Data has the potential to change this paradigm, from a theoretical, average risk prediction to simulations in single patients, weighting tailored therapeutic options and managing risks, and finally employing sustainable options to improve outcomes.

This is the most appealing application of the new concept of the metaverse, an aggregate of Big Data, information technology and AI converging to generate novel approaches to handling reality by navigating virtual near-futures in clinical contexts, hopefully yielding time savings, fewer errors, greater precision, less operator-related variability, and lower costs and human effort, while prolonging life with reasonable quality.

To this end, Big Data and AI have been applied in seminal studies in cardiac surgery in the fields of myocardial revascularization, valvular heart disease and end-stage heart failure (Table 1). Table 2 shows the ongoing trials focused on the application of Big Data-derived analyses in cardiac surgery.

Table 1 Artificial intelligence in cardiac surgery and cardiovascular diseases
Table 2 Ongoing trials on the application of Big Data and derived analysis techniques to cardiac surgery

Myocardial revascularization

In the field of surgical myocardial revascularization, structured data from EHRs managed by the Society of Thoracic Surgeons, the American College of Cardiology (National Cardiovascular Data Registry), and the American Heart Association represent a significant source for data analysis. Clinical records and imaging data, analyzed and interpreted using AI approaches (e.g., neural networks), may complement traditional approaches based on clinical registries and regression models [9, 30].

One of the major challenging research tasks to date is to evaluate whether percutaneous coronary intervention (PCI) is superior to coronary artery bypass grafting (CABG) in specific clinical contexts. Randomized controlled trials (RCTs) on myocardial revascularization have been used to answer this question and to develop tools for identifying patients who may benefit from one therapeutic option over the other, or from a combination of both (hybrid revascularization), with minimized clinical risk and expenditure. Although RCTs are very effective in controlling treatment selection bias, they perform poorly in evaluating subgroups, suffering from inadequate statistical power in subsets and from possible post-hoc biases. Observational, non-randomized data gathered from registries and large multi-source databases may be closer to the real world, representing the large majority of patients, and may therefore translate better to single patients when AI-based analytic modalities are used. On the other hand, these data sources may be affected by weaker data control and potential biases in outcome definition and assignment [31]. Weintraub and colleagues [32, 33] compared the effectiveness of different myocardial revascularization strategies by linking the American College of Cardiology Foundation (ACCF) National Cardiovascular Data Registry and the Society of Thoracic Surgeons (STS) Adult Cardiac Surgery Database to claims data from the Centers for Medicare and Medicaid Services. They demonstrated that real-world mortality at 1 year was not significantly different from that anticipated by commonly used scores, while long-term survival was higher in patients receiving CABG than in patients who underwent PCI. The analysis of these data required probabilistic matching to identify patients across databases, adjustment for clinical covariates with inverse probability weighting, and correction for residual confounding by means of sensitivity analyses.
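
A simplified sketch of the inverse probability weighting step described above, using a propensity model for treatment assignment on synthetic data; the probabilistic matching and sensitivity-analysis steps of the original work are not reproduced, and the treatment/outcome variables are invented.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(3)
    n = 2000
    X = rng.normal(size=(n, 4))                                   # synthetic clinical covariates
    treated = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))         # e.g., CABG vs PCI assignment
    outcome = rng.binomial(1, 1 / (1 + np.exp(-(-1 - 0.5 * treated + X[:, 1]))))

    # Propensity score: probability of treatment given measured covariates
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]

    # Inverse probability weights balance the two groups on measured covariates
    w = np.where(treated == 1, 1 / ps, 1 / (1 - ps))
    risk_treated = np.average(outcome[treated == 1], weights=w[treated == 1])
    risk_control = np.average(outcome[treated == 0], weights=w[treated == 0])
    print("weighted risk difference:", risk_treated - risk_control)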

A valid example of the cost-effective applicability of AI to Big Data in the field of myocardial revascularization is the ASCERT study (ACCF and STS Database Collaboration on the Comparative Effectiveness of Revascularization Strategies) [34], in which two revascularization strategies (PCI versus CABG) were evaluated in patients with stable ischemic heart disease by linking large converging clinical and administrative databases to obtain data from 86,244 patients undergoing CABG and 103,549 patients undergoing PCI. These figures are much larger than any achieved by RCTs. Interestingly, the authors found that patients undergoing CABG had better outcomes than those undergoing PCI, but at the expense of higher costs, allowing calculation of the incremental cost-effectiveness ratio, expressed as cost per quality-adjusted life-year gained.
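
For illustration, the incremental cost-effectiveness ratio is simply the difference in mean cost divided by the difference in mean quality-adjusted life-years; the figures below are invented and do not come from ASCERT.

    # Hypothetical figures, for illustration only
    cost_cabg, cost_pci = 65_000.0, 50_000.0      # mean cost per patient
    qaly_cabg, qaly_pci = 7.2, 6.8                # mean quality-adjusted life-years

    icer = (cost_cabg - cost_pci) / (qaly_cabg - qaly_pci)
    print(f"ICER: {icer:,.0f} per QALY gained")   # 37,500 per QALY in this example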

The role of patient-related characteristics in determining the outcomes of myocardial revascularization strategies is highlighted in the large study by Hlatky et al. [35], in which the authors demonstrated lower long-term mortality with CABG than with PCI in an unselected group of patients drawn from the general population of those undergoing these procedures, with outcomes substantially modified by factors such as diabetes, smoking habit, heart failure, and peripheral artery disease.

The fundamental study by Weintraub et al., linking the STS database to that of the Centers for Medicare and Medicaid Services, showed that CABG offers a long-term survival advantage in stable patients older than 65 years with multivessel coronary artery disease.

While RCTs remain the only form of evidence accepted for medical guidelines, studies based on Big Data and AI have added knowledge on the comparative effectiveness of the two therapeutic strategies. The place of Big Data analysis therefore remains to be precisely determined.

Valvular heart diseases and cardiac imaging

The identification of significant valvular disease has the potential to clarify its etiology and/or to reveal it as a consequence of ventricular failure, with or without dilatation, which may contribute to defining the prognosis and to identifying elements that trigger worsening heart failure. Echocardiography is widely used for the assessment of cardiac structure and function, with diagnostic accuracy and reliability depending on the operator's skill, experience and expertise. In contexts such as high-volume, busy environments and emergencies, machines enriched with technology that analyzes live imaging in real time, continuously comparing it with a pool of reference images, may help optimize the imaging protocol, identify diseases or patterns of abnormality, detect trends over time and evaluate the stability of specific measurements. These are intriguing perspectives that may accompany the application of AI in the field of echocardiography. For instance, in mitral regurgitation, recognition of increasing severity of the valvular insufficiency, ventricular dilatation and reduction in chamber shortening, atrial dilatation, and worsening myocardial function assessed by means of semi-automated, relatively load-independent parameters may parallel the detection of clinical changes and even anticipate overt changes in the symptoms and signs of heart failure (HF).

In echocardiography, automated recognition of views and identification of structures may be considered an initial step toward semi-assisted diagnostic studies. To this end, AI-based technology is a key element, as convolutional neural networks may be employed to identify key reference points on images, and then features specific to diagnostic patterns. In experimental studies of algorithm-based supervised machine learning [37], identification of normal patterns, deviations from normal patterns or specific pathologies has been possible in more than 9 of 10 evaluated cases [36]. A further step in AI-assisted echocardiography may be the identification of deviations from physiology or of overt pathological conditions, such as left ventricular hypertrophy, hypertrophic cardiomyopathy [38, 39], ventricular dilatation and reduced chamber and myocardial function [40]. With 2-D speckle-tracking technology, accurate semi-automated quantification of volumes and systolic function may be performed at the bedside in a few minutes or even seconds [41, 42]. In the context of valvular disease associated with ventricular dysfunction, an important role is played by quantification of contractile reserve, which impacts prognosis. Wall motion quantification is relevant in this respect, with AI-operated diagnostic modalities pushing its accuracy as high as 85% [43, 44], helping in the evaluation of ventricular contractile reserve and aiding decision making on the best management of valvular disease.
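
As a hedged sketch of the kind of convolutional network that could be trained to classify echocardiographic views or patterns (the architecture, input size and number of classes are arbitrary; no published model is reproduced), in PyTorch:

    import torch
    import torch.nn as nn

    # Tiny illustrative CNN: grayscale echo frame in, view-class scores out
    model = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(32 * 16 * 16, 5),        # e.g., 5 hypothetical standard views
    )

    frames = torch.randn(8, 1, 64, 64)     # a batch of 8 synthetic 64x64 frames
    scores = model(frames)                 # unnormalized class scores per frame
    print(scores.shape)                    # torch.Size([8, 5])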

Assessment of mitral valve regurgitation severity may be aided by automated processes based on deep learning machines [45, 46]. With regard to aortic valve disease, identification of trends over time in ascending aorta and aortic root dimensions, and in ventricular dimensions and shortening, relies on the reproducibility of imaging in specific views and may provide important information impacting preclinical and, in a timely fashion, clinical decision making. Moreover, aortic annulus sizing is an important target of quantification, as transcatheter procedures for aortic valve replacement are increasing steeply [47].

Beyond echocardiography, the potential revolution of AI may be even more applicable and profound in more standardized diagnostic processes such as those of nuclear medicine, computed tomography and magnetic resonance imaging, which suffer much less between-subject variability due to body size [48, 49].

HF and mechanical circulatory support: selection of patients, prediction of adverse events and technology development

Heart transplantation (HTx) is the gold-standard therapy in advanced HF, defined as the persistence of symptoms and significant physical limitation despite optimized pharmacological and standard non-pharmacological therapy, associated with recurrent hospitalizations and the need for escalation therapy, including inotropes [50]. However, the number of available organs does not match the number of patients in need, who slide toward deterioration of their clinical condition and require timely intervention. Left ventricular assist devices (LVADs) may help as a bridge to HTx, to candidacy or to decision, or even as destination therapy in subjects with advanced or terminal heart failure [51, 52]. These patients, suffering from end-stage HF, present unique challenges: frailty [53], end-organ damage, risk of acute decompensation, and high short-term mortality. Events such as cardiogenic shock carry a worse prognosis. The therapeutic strategy offers multiple options, with multiple devices that can be employed in different phases of the clinical course [54].

Because of the high risk of surgery and the patients' characteristics, the prediction of peri-procedural adverse events and of long-term complications is a critical issue when planning LVAD implantation, impacting benefit and quality of life relative to costs [55], and healthcare sustainability. Data from registries and discriminant statistics are commonly used to identify potentially life-threatening conditions impacting prognosis and the duration of hospital stay after LVAD implantation [56].

However, the classic approach to risk prediction is based on statistical methods that imply a proportional and statistically significant contribution of several variables, in a context where hazard is not proportional over time [57]. With a different approach, AI and Big Data may simulate the effects of decisions and their interactions with an uncertain environment [58], facilitating more than the prediction of specific pre-defined events.

Prediction of complications after LVAD implantation is relevant to the sustainability of LVAD procedures. AI has been employed to recognize drive-line infections using a photographic database as the source [59] and to identify clusters of variables that predict right ventricular failure, bleeding, infection and pump failure due to pump thrombosis [60].

While statistically based risk models have shown suboptimal ability to predict mortality risk after LVAD implantation, ML applied to Interagency Registry for Mechanically Assisted Circulatory Support (INTERMACS) data from 2006 to 2016 (16,120 patients in total), with bootstrapping (1000 replications) in the testing set, improved discrimination for 90-day mortality from 0.707 [0.683–0.730] to 0.740 [0.717–0.762] and for 1-year mortality from 0.691 [0.673–0.710] to 0.714 [0.695–0.734] (all p < 0.001). The net reclassification rate was up to 49% for 90-day mortality and 37% for 1-year mortality. These findings support the concept that ML may increase the performance of a risk model for durable LVAD mortality compared with a logistic regression-based algorithm [60]. Because the continuous blood flow of LVADs is associated with an increased risk of complications, such as gastrointestinal bleeding, pump speed may be modulated to generate pulsatile flow. More importantly, the pump speed of the LVAD may be controlled within a single beat to optimize systolic versus diastolic assistance [61]. Diastolic versus systolic modulation of pump speed may impact flow pulsatility and diastolic assistance, reducing external myocardial work [62]. These mathematical models are limited in their applicability because the need for pressure feedback signals from the cardiovascular system requires suitable integrated long-term pressure sensors. To date, novel AI-based controllers, relying on real-time deep convolutional neural networks, are being tested to estimate left ventricular preload from LVAD flow analysis within a sensorless adaptive control system, trained and evaluated across several cross-validation settings and physiologic situations in different patients and conditions, resulting in accurate preload estimation (root mean squared error of 0.84 mmHg, reproducibility coefficient of 1.56 mmHg, coefficient of variation of 14.44%, and bias of 0.29 mmHg in the testing dataset) [63]. The system was able to use LVAD data to estimate preload and prevent ventricular suction and pulmonary congestion [64].
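
A simplified sketch of the kind of bootstrapped discrimination comparison reported above, contrasting a logistic regression with a gradient-boosted model on synthetic data; the INTERMACS analysis itself is not reproduced, and the outcome variable is invented.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(4)
    n = 5000
    X = rng.normal(size=(n, 8))                          # synthetic predictors
    logit = X[:, 0] + 0.5 * X[:, 1] ** 2 - 1.5
    y = rng.binomial(1, 1 / (1 + np.exp(-logit)))        # synthetic 90-day mortality

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    ml = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

    # Bootstrap the test set to obtain a distribution of AUCs for each model
    aucs = {"LR": [], "ML": []}
    for _ in range(1000):
        idx = rng.integers(0, len(y_te), len(y_te))
        if len(np.unique(y_te[idx])) < 2:
            continue                                     # skip degenerate resamples
        aucs["LR"].append(roc_auc_score(y_te[idx], lr.predict_proba(X_te[idx])[:, 1]))
        aucs["ML"].append(roc_auc_score(y_te[idx], ml.predict_proba(X_te[idx])[:, 1]))

    for name, vals in aucs.items():
        print(name, round(np.mean(vals), 3), np.round(np.percentile(vals, [2.5, 97.5]), 3))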

Conclusions

After several years of intense research yielding a great number of scientific publications, a major gap remains between the practical application of AI to Big Data and the RCTs that guide real-world practice, in large part because AI and Big Data have yet to become controlled research tools on a large scale. Data quality control, missing data, privacy, potential conflicts of interest among a variety of stakeholders, the costs of the required technology, and the need for high-performing information technology are still barriers to the routine and wide use of Big Data in cardiac surgery research and clinical practice.

Nevertheless, in the context of the enormous resource re-allocation due to the COVID-19 pandemic, which significantly reduced research output in other fields of medicine, Big Data and AI may prove to be relevant tools.

The role of Big Data science in hypothesis generation is beyond doubt, but it should be considered a complementary means of obtaining evidence [65].

However, the crucial match on the usefulness of Big Data and AI in the near future will also be played on the side of productivity, simulation, augmented reality aiding diagnostic and clinical decision making, communication with patients, and the generation of precision medicine. These features have the potential to go well beyond the knowledge generated by RCTs, which prove or disprove specific hypotheses by using strict enrollment criteria to homogenize the study population. Hence, we all need to become familiar with these concepts and tools for the future, which is not that far away.