1 Introduction

From the start of the 20th century, the financial sector has made consistent investments in researching price prediction and market dynamics models [1]. For the purpose of forecasting stock price trends, conventional quantitative approaches rely on historical time series price data [2]. In recent years, leveraging models to analyze financial time series has become essential for managing market risks and making informed investment decisions [3].

An escalating quantity of progressively advanced models is being introduced in research works to address the inherent intricacies of time series data within specific domains. Notably, stock market data is quite complex, as it is characterized by a multi-dimensional, volatile, and dynamically evolving nature. Furthermore, stock market data displays interconnections with various external factors, such as macroeconomic events and news disseminated by media sources. Consequently, appropriately integrating these factors is of paramount importance when developing predictive models that yield satisfactory levels of accuracy.

Autoregressive models [4,5,6,7,8,9,10] are proficient in addressing prediction tasks that take into account temporal auto-correlation, but fall short in adequately accounting for the multivariate nature of the data and the intricacies of nonlinear feature interactions. On the other hand, machine learning and deep learning models tailored for temporal data, exemplified by long short-term memory models and their various adaptations [11,12,13,14,15,16,17,18,19,20,21], are capable of mitigating these limitations. However, their utility is typically confined to the analysis of a single data source, thereby diminishing their effectiveness in highly volatile and challenging-to-predict domains. Therefore, it is advisable to explore ensemble-based and hybrid combination methods [22,23,24,25,26,27,28,29,30,31,32] as they hold greater promise compared to other approaches. These methods can encompass multiple data sources and harness the diversity of multiple predictors, offering a more comprehensive solution to address the challenges of data analysis in such complex and dynamic domains.

A significant drawback of many works in the literature is that they are limited to the analysis of a single source of data (or modality). Some studies highlighted that the analysis of financial news in addition to stock prices may play a key role in stock market prediction [33, 34]. For this reason, researchers are focusing on devising new and more sophisticated ways to integrate different relevant sources of data that may impact stock prices, resulting in more accurate models.

However, several methods that consider multiple sources of data address them in isolation, relying on simple combination strategies to perform a joint analysis. Moreover, the specialized terminology and scarcity of labeled data in the financial industry exacerbate the difficulty of accurately performing sentiment analysis, making general-purpose text-based models insufficient. To this end, language models fine-tuned on financial textual data provide new and exciting opportunities for the integration of accurate textual analysis in stock market prediction models [35].

In this paper, we propose a multimodal deep learning model for stock market trend prediction that consists of two branches: a FinBERT branch, which specializes in the analysis of the textual content of financial news and accurately models market sentiment, and an LSTM branch, which captures temporal market dynamics in complex multivariate data, including stock prices and technical indicators. Our deep fusion approach allows us to effectively leverage multiple modalities, leading to improved generalizability, reduced bias, and increased efficiency compared to single-modality approaches.

The main contributions of our paper can be summarized as follows:

  • We propose a deep fusion model architecture for stock market trend prediction that seamlessly considers and integrates multiple modalities (stock prices, technical indicators, news headlines) in a joint feature space with multiple specialized branches, empowering the model with a more comprehensive understanding of patterns and complex nonlinear relationships in stock market dynamics that leads to the extraction of more robust and trustworthy next-day trend predictions;

  • We devise an end-to-end optimization and hyperparameter tuning workflow which allows us to identify and select highly effective configurations for each branch of the fusion model, resulting in a competitive stock prediction performance tailored to the characteristics of a specific stock under analysis;

  • We perform an extensive evaluation of 12 real-world stocks from different sectors in two different evaluation periods and with different market conditions (uptrend, downtrend). This evaluation encompasses two analytical perspectives: model accuracy and portfolio performance in a realistic simulation, where model predictions are leveraged for practical automated trading decisions. Our experimental results show that our approach can outperform state-of-the-art methods for stock market prediction.

The paper is structured as follows. Section 2 summarizes relevant works for stock market prediction. Section 3 describes our proposed method in detail. Section 4 discusses our experimental settings and the results obtained in our experiments. Section 5 wraps up the paper and discusses relevant directions for future work.

2 Related work

In this section, we review relevant works pertaining to time series prediction and forecasting, with particular focus on stock market analysis.

2.1 Autoregressive models

Autoregressive models are recognized for their ability to characterize associations across multiple time steps for a target feature through the learning of coefficients. One prominent autoregressive technique is the autoregressive integrated moving average (ARIMA) model [4]. Renowned for its efficacy in short-term prediction tasks, ARIMA forecasts the future value of a variable by linearly combining past values and errors, following the application of differencing operations to render the time series stationary. The methodology outlined in [5], for instance, employs an ARIMA model to predict coronavirus cases using Johns Hopkins epidemiological data. Prophet [6] stands out as another prevalent autoregressive forecasting method, grounded in an additive model that accommodates nonlinear trends through yearly, weekly, and daily variations. This approach also addresses seasonality and holiday effects, demonstrating robustness to outliers and missing data. Prophet aims at enhanced configurability and user-friendliness in comparison with ARIMA. Vector autoregression (VAR) [7] is another noteworthy autoregressive approach that extends beyond predictive tasks for single variables. This method concurrently learns coefficients for multiple variables, considering their temporal correlations. A noteworthy investigation in [8] highlights its effectiveness in forecasting tasks embedded within a spatiotemporal context. In the realm of stock market applications, recent research [9] introduces a moving average heterogeneous autoregressive (MAT-HAR) model, treating thresholds as a moving average-generated, time-varying parameter. This model is employed to forecast the monthly realized volatility of the US stock market. Another study [10] applies univariate ARIMA models to the Amman Stock Exchange. Despite their effectiveness in numerous applications, autoregressive models exhibit certain limitations. Besides being limited to the analysis of single variables or modalities, their simplicity makes them incapable of capturing nonlinear relationships between multiple variables, which are frequently encountered in real-world multivariate data.

2.2 Machine learning and deep learning models

Machine learning and deep learning models tailored for temporal data, such as long short-term memory (LSTM) [36] models and their variations, constitute an advancement over autoregressive models due to their capacity to effectively analyze multivariate data and handle nonlinear feature interactions. In the research by [12], recurrent neural network (RNN) models featuring long short-term memory units are proposed to predict pollutant particle levels at multiple time horizons. In the domain of stock market analysis, [13] employs an LSTM model to predict the next-day closing price of the S&P 500 index, utilizing nine predictors selected from fundamental market data, macroeconomic data, and technical indicators. The authors in [14] introduce an LSTM-based model architecture for forecasting air leaks, assessing its potential within the healthcare sector. The work in [15] devises a tensor decomposition approach for feature extraction, where predictive clustering trees are used for forecasting, and their performance is compared to LSTM models. A comparative investigation of deep neural networks with LSTM networks for stock market analysis is presented by [16], focusing on daily and weekly movements of the Indian BSE Sensex index. Another work by [11] empirically analyzed LSTM networks leveraging a diverse set of real-world datasets, and identified that such models are quite effective in predicting stock market prices. The study in [17] conducts a comparative analysis involving LSTM, gated recurrent unit (GRU), and drop-GRU models in the context of power consumption forecasting, demonstrating the satisfactory performance of the devised models in this application. Combinations of GRU and convolutional neural networks (CNN) have also been explored. For instance, the GRU-CNN model proposed in [37] has been shown to be effective for stock market prediction. A decision support system reinforced with LSTM for swing trading is proposed in [18], where predictions and reports that incorporate forecasted values of company stock for the next 30 days are extracted, alongside technical indicators. In the research by [19], bidirectional and stacked LSTM predictive models are benchmarked against shallow neural networks and simplified forms of LSTM networks, with analyses conducted on publicly available stock market data.

The work in [20] demonstrates that LSTM networks combined with bidirectional gated recurrent units (BiGRU) can accurately predict the closing price of the stock market, offering a more competitive performance than simpler models. In [21], a bidirectional LSTM model (Bi-LSTM), proposed for the first time for speech recognition tasks [38], is adopted and optimized by particle swarm optimization (PSO), giving rise to a PSO-Bi-LSTM approach to predict useful long-, mid-, and short-term investment strategies. A CNN-LSTM model complemented by an attention mechanism was proposed in [39]. Dilated convolutions have been explored in [40] and have shown great success in extracting multi-scale patterns at different time granularities. A common limitation of these approaches is their confinement to the analysis of a single data type or modality, which constrains their effectiveness in the presence of highly volatile phenomena that depend on multiple factors.

2.3 Ensemble-based and hybrid models

Ensemble-based and combination methodologies involving hybrid models offer a robust approach to address these intricacies, encompassing the utilization of multiple data sources and the amalgamation of various predictors. An AI platform, as proposed by [22], leverages four machine learning ensemble methods, namely neural network regression ensemble, support vector regression ensemble, boosted regression trees, and random forest. The best ensemble method for a given stock is selected through a cross-validation evaluation. In [23], a fusion network is proposed to extract text and numerical information for stock price prediction, with the addition of an attention mechanism to improve the overall model performance.

A stacking ensemble approach for predicting stock closing prices is proposed in [24], where a competitive performance is obtained when contrasted with conventional machine learning ensemble models such as random forest, AdaBoost, and gradient boosting machines. A stacking approach was also explored in [41] with joint consideration of news headlines, multivariate time series data, and multiple base models as predictors. Authors in the work by [25] propose a hybrid forecasting model for stock prices that integrates various deep learning models, specifically, CNN-LSTM [42], GRU-CNN [37], and ensemble models. The work in [26] introduces a hybrid model denoted as PCA-EMD-LSTM, which combines principal component analysis, empirical mode decomposition, and LSTM for predicting stock market trends in Thailand. The hybrid model proposed in [27] utilizes decomposition techniques, multi-factor analysis, and attention-based LSTM to forecast stock market price trends in four major Asian countries. In [28], a hybrid method for analyzing stock markets is introduced, which combines an autoencoder-based feature extraction network with a temporal convolutional model architecture and a temporal clustering optimization algorithm utilizing the KL (Kullback–Leibler) divergence. The approach in [43] employs a CNN model to perform sentiment classification and integrates it with an LSTM analyzing technical indicators, showing that the joint consideration of both aspects leads to improved predictions. A deep learning approach is proposed in [29], where future stock prices are predicted by a blending ensemble learning model that combines two recurrent neural networks followed by a fully connected neural network. The authors in [30] conduct an analysis of the collective sentiment’s significance on popular S&P 500 stocks and assess its efficacy in investment decision-making. A study in [31] presents a framework based on LSTM and convolutional neural networks to predict the closing prices of Tesla and Apple, utilizing historical data collected over the past two years. Two stock trading decision methods have been applied in [32]: nested reinforcement learning (Nested RL) using three deep reinforcement learning models, and a weighted random selection with confidence (WRSC) strategy. The results show that their approach outperforms baselines, enhancing portfolio management for higher profits at the same risk level.

3 Method

In this section, we describe our method in detail, focusing on its workflow. The proposed method involves a multimodal deep learning approach that combines information from historical stock prices and statistical indicators with news headlines. The model employs long short-term memory (LSTM) networks and bidirectional encoder representations from transformers (BERT) models to capture both quantitative and qualitative information. For the text component, we leverage FinBERT to conduct sentiment analysis. A fusion layer combines the two modalities and yields the final next-day stock trend prediction. A graphical representation of the method is shown in Fig. 1.

3.1 Data gathering, preprocessing, and fusion

Our method aims to comprehensively analyze stock prices, statistical trading indicators, and news headlines. To achieve this goal, we obtain data from two Application Programming Interfaces (APIs). The initial API, referred to as Yahooquery,Footnote 1 is employed for retrieving historical stock prices, serving as an unofficial substitute for the obsolete Yahoo Finance API. We input stock tickers in string format, and the API provides all historical data available for a specified stock within a given date range. The data obtained from the API comprises the opening price, high and low values, adjusted closing prices, and the daily observed volume. Subsequently, we employ the TA-lib Python libraryFootnote 2 to incorporate computed statistical indicators into the extracted stock data. These statistical indicators, widely used in technical analysis, encompass the exponential moving average (12-day, 26-day), moving average convergence/divergence (MACD), parabolic SAR, Bollinger bands (upper band, middle band, and lower band), and stochastic (Slow k, Slow d). Before model training, all numerical data undergoes min-max normalization. In our study, we do not perform feature selection to identify a subset of suitable features for each stock. Although some features may be more relevant than others for a given stock under analysis, the adopted deep learning architecture should, in principle, automatically learn feature influence via gradient descent optimization. Specifically, weak features will be characterized by small weights with a vanishing effect in the deeper layers of the network and a tendency to be discarded for prediction. In contrast, relevant features will lead to strong/high activation values that influence the prediction significantly.
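
To make this step concrete, the following sketch (assuming the yahooquery, TA-Lib, and scikit-learn Python packages) retrieves daily price data for one ticker, computes the technical indicators listed above, and applies min-max normalization. The ticker and date range are illustrative and are not the exact retrieval settings used in our experiments.

```python
# Hedged sketch of the data gathering and preprocessing step (illustrative settings).
import talib
from sklearn.preprocessing import MinMaxScaler
from yahooquery import Ticker

# Daily OHLCV history for an example ticker and date range.
df = Ticker("AAPL").history(start="2019-01-01", end="2022-09-20").reset_index()

close, high, low = df["close"].values, df["high"].values, df["low"].values

# Technical indicators used as additional features (TA-Lib default periods unless noted).
df["ema_12"] = talib.EMA(close, timeperiod=12)
df["ema_26"] = talib.EMA(close, timeperiod=26)
df["macd"], _, _ = talib.MACD(close)                            # MACD line
df["sar"] = talib.SAR(high, low)                                # parabolic SAR
df["bb_up"], df["bb_mid"], df["bb_low"] = talib.BBANDS(close)   # Bollinger bands
df["slow_k"], df["slow_d"] = talib.STOCH(high, low, close)      # stochastic oscillator

# Drop warm-up rows with undefined indicators and min-max normalize numerical features.
# Column names may differ slightly across yahooquery versions.
df = df.dropna()
feature_cols = [c for c in df.columns if c not in ("symbol", "date")]
df[feature_cols] = MinMaxScaler().fit_transform(df[feature_cols])
```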

As a secondary API, we utilize the end-of-day historical financial data (EODHD) API to fetch news headlines based on a given stock ticker. For instance, a query with the “aapl” ticker as input yields data on news articles, including posting time, titles, article content, URL links, as well as tagged symbols and tickers. To maintain focus on reliable news sources, we exclusively retain articles sourced from Yahoo Finance, discarding those from other origins.
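
A hedged sketch of the news retrieval and filtering step is given below; the endpoint constant, query parameter names, and the reliance on the article link to identify Yahoo Finance as the source are assumptions, since the exact EODHD request format is specified by the provider's documentation rather than in this paper.

```python
# Hypothetical sketch of fetching news headlines for a ticker from the EODHD API
# and retaining only Yahoo Finance articles, as described above.
import requests

NEWS_ENDPOINT = "https://eodhd.com/api/news"  # placeholder; check the EODHD documentation


def fetch_yahoo_finance_headlines(ticker, api_token, date_from, date_to):
    params = {
        "s": ticker,              # e.g., "AAPL.US" (format assumed)
        "from": date_from,        # e.g., "2021-07-01"
        "to": date_to,            # e.g., "2021-12-31"
        "api_token": api_token,
        "fmt": "json",
    }
    articles = requests.get(NEWS_ENDPOINT, params=params, timeout=30).json()
    # Keep only articles whose link points to Yahoo Finance (source filter assumption).
    return [a for a in articles if "finance.yahoo.com" in a.get("link", "")]
```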

3.2 Long short-term memory (LSTM)

Long short-term memory (LSTM) neural networks represent a category of recurrent neural networks (RNN) extensively applied in the analysis of time series data, owing to their ability to capture long-term dependencies within sequential data [44]. The utility of LSTM models lies in their capacity to discern and forecast patterns in time series data, which makes them valuable for predictive tasks. LSTMs address the challenge of vanishing and exploding gradients encountered in traditional RNNs [45, 46] by introducing memory cells to replace recurrent nodes. A distinguishing feature of a memory cell is its internal state, facilitating the flow of gradients across multiple time steps without vanishing or exploding [44].

Each memory cell comprises multiple nodes referred to as gates. The data from the current time step, together with the hidden state from the preceding time step, is fed into these LSTM gates. Subsequently, three fully connected layers compute the values associated with the input, forget, and output gates. A sigmoid activation function is applied to these values to yield the final output, constrained within a (0, 1) range.

An input node undergoes computation through a tanh activation function. In essence, the gates modulate the significance of the information passed to the model at distinct time steps. The input gate gauges the proportion of the input node’s value to be added to the current internal state of the cell. The forget gate determines whether the prevailing value of the cell should be retained or discarded. Finally, the output gate decides if the memory cell should contribute to the output of the ongoing time step.

Assuming the presence of d inputs, h hidden units, and a batch size of n, the input is defined as \(\textbf{X}_t \in \mathbb {R}^{n \times d} \), and the hidden state of the previous time step is defined as \(\textbf{H}_{t-1} \in \mathbb {R}^{n \times h} \). The gates at time step t are defined as follows: the input gate is \(\textbf{I}_t \in \mathbb {R}^{n \times h}\), the forget gate is \(\textbf{F}_t \in \mathbb {R}^{n \times h}\), and the output gate is \(\textbf{O}_t \in \mathbb {R}^{n \times h}\). Formally, they are calculated as:

$$\begin{aligned} \textbf{I}_t&= \sigma (\textbf{X}_t \textbf{W}_{xi} + \textbf{H}_{t-1} \textbf{W}_{hi} + \textbf{b}_i), \end{aligned}$$
(1)
$$\begin{aligned} \textbf{F}_t&= \sigma (\textbf{X}_t \textbf{W}_{xf} + \textbf{H}_{t-1} \textbf{W}_{hf} + \textbf{b}_f),\end{aligned}$$
(2)
$$\begin{aligned} \textbf{O}_t&= \sigma (\textbf{X}_t \textbf{W}_{xo} + \textbf{H}_{t-1} \textbf{W}_{ho} + \textbf{b}_o), \end{aligned}$$
(3)

where \(\textbf{W}_{xi}, \textbf{W}_{xf}, \textbf{W}_{xo} \in \mathbb {R}^{d \times h}\) and \(\textbf{W}_{hi}, \textbf{W}_{hf}, \textbf{W}_{ho} \in \mathbb {R}^{h \times h}\) are weight parameters, and \(\textbf{b}_i, \textbf{b}_f, \textbf{b}_o \in \mathbb {R}^{1 \times h}\) are bias parameters.
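
The gate computations of Eqs. (1)–(3) can be transcribed directly into code; the minimal NumPy sketch below assumes \(\textbf{X}_t\) of shape (n, d), \(\textbf{H}_{t-1}\) of shape (n, h), and weight and bias shapes as defined above.

```python
# Direct NumPy transcription of Eqs. (1)-(3): the input, forget, and output gates.
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_gates(X_t, H_prev, W_xi, W_hi, b_i, W_xf, W_hf, b_f, W_xo, W_ho, b_o):
    I_t = sigmoid(X_t @ W_xi + H_prev @ W_hi + b_i)  # input gate, Eq. (1)
    F_t = sigmoid(X_t @ W_xf + H_prev @ W_hf + b_f)  # forget gate, Eq. (2)
    O_t = sigmoid(X_t @ W_xo + H_prev @ W_ho + b_o)  # output gate, Eq. (3)
    return I_t, F_t, O_t
```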

The incorporation of LSTM cells equips the model with the ability to address intricate temporal patterns in multivariate data, enabling the capture of nonlinear and enduring relationships among various features and timestamps. This capability is leveraged to allow the model to discern resilient patterns within historical data, encompassing statistical indicators, and facilitate the extraction of relationships between stock prices and other descriptive features.

3.3 FinBERT

BERT (bidirectional encoder representations from transformers) is a complex deep neural network model for natural language processing (NLP). BERT achieved state-of-the-art results in various NLP tasks such as text classification, question answering, and named entity recognition. BERT uses a transformer architecture that allows capturing long-range dependencies and context in text data, making it highly effective for tasks involving understanding and processing human language. The high accuracy documented in several research works supports the adoption of BERT as a versatile model for many different NLP tasks [47]. Among them, BERT is often used to extract contextual embedding vectors from text, which can be adopted for subsequent downstream tasks. However, the performance of the model is strictly related to the pertinence of the dataset used to train the model. While using pre-trained general-purpose language models may be a practical solution to avoid expensive training costs, it may result in a poor representation of topic-specific textual content [48]. To overcome this limitation, we leverage FinBERT [35], a language model specialized for financial data analysis, which obtained the highest scores on FiQA sentiment scoring and Financial PhraseBank benchmarks, outperforming other popular large language models including GPT-4 [49].

The model architecture consists of multiple stacked transformer layers, which allow the model to capture complex contextual representations. Each layer features a self-attention mechanism, which computes the weighted sum of values (V) based on queries (Q) and keys (K):

$$\begin{aligned} \text {Attention}(Q, K, V) = \text {softmax}\left( \frac{QK^T}{\sqrt{d_k}}\right) V \end{aligned}$$
(4)

The model adopts multiple attention heads, which can be formalized as:

$$\begin{aligned} \text {MultiHead}(Q, K, V) = \text {Concat}(\text {head}_1, \text {head}_2, \ldots , \text {head}_h) W^O, \end{aligned}$$
(5)

where \(\text {head}_i = \text {Attention}(QW_i^Q, KW_i^K, VW_i^V)\).

The output of each transformer can be computed as:

$$\begin{aligned} \text {LayerOutput} = \text {LayerNorm}(x + \text {MultiHead}(x) + \text {FFN}(x)), \end{aligned}$$
(6)

where FFN is a simple feed-forward neural network, and \({LayerNorm}(x) = \frac{x - \mu }{\sigma }\) is the layer normalization, with \(\mu \) and \(\sigma \) being the mean and standard deviation, respectively.
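
As a minimal illustration of Eq. (4), the scaled dot-product attention computed inside each head can be written in a few lines of NumPy (single example, no masking or batching).

```python
# Scaled dot-product attention, Eq. (4): softmax(Q K^T / sqrt(d_k)) V.
import numpy as np


def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (len_q, len_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted sum of the values
```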

To prevent catastrophic forgetting, FinBERT applies three state-of-the-art techniques: slanted triangular learning rates, discriminative fine-tuning, and gradual unfreezing.

FinBERT takes an initial BERT model trained on BookCorpus and Wikipedia, and fine-tunes it on the TRC2-financial corpus, a subset of Reuters’ TRC2, which consists of 1.8M news articles published by Reuters between 2008 and 2010. Subsequently, FinBERT is fine-tuned on the Financial PhraseBank corpus, which consists of 4845 English sentences from financial news found on the LexisNexis database, annotated by 16 people with a background in finance and business [50].

FinBERT extracts sentiment scores for all news headlines gathered for a specific stock on a given day. It returns a positive, neutral, and negative score for each news headline. For textual data, we remove stopwords, punctuation marks, and square brackets, and convert text to lowercase, in order to reduce noise and focus on meaningful words. Initially, we obtain a summary of the day consisting of two values: the sum of positive scores and the sum of negative scores. Subsequently, the largest of the two scores determines if the day is overall positive or negative. Based on this information, we select the most representative news headline, i.e., the one with the largest positive or negative score, and extract its embedding vector representation.Footnote 3
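
A possible implementation of this daily selection step is sketched below with the Hugging Face transformers library; the ProsusAI/finbert checkpoint identifier and the lowercase label names are assumptions, since the paper does not prescribe a specific FinBERT distribution.

```python
# Hedged sketch: score each headline with a pre-trained FinBERT checkpoint and
# select the most representative headline of the day, as described above.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "ProsusAI/finbert"  # assumed publicly available FinBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)


def score_headlines(headlines):
    inputs = tokenizer(headlines, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)
    labels = [model.config.id2label[i] for i in range(probs.shape[1])]
    return [dict(zip(labels, p.tolist())) for p in probs]


def select_representative_headline(headlines):
    scores = score_headlines(headlines)
    pos_sum = sum(s.get("positive", 0.0) for s in scores)
    neg_sum = sum(s.get("negative", 0.0) for s in scores)
    dominant = "positive" if pos_sum >= neg_sum else "negative"
    best = max(range(len(headlines)), key=lambda i: scores[i].get(dominant, 0.0))
    return headlines[best]
```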

We note that sentiment scores extracted by FinBERT are used exclusively for news selection. Separately, we fine-tune FinBERT with the financial news in our dataset, and we replace the output layer with a single-unit dense layer (to predict uptrend/downtrend directly) and optimize it considering different hyperparameter configurations (see Table 2). The best-performing configuration is identified based on accuracy using a validation set.

Afterward, in the proposed model architecture, we remove the classification layer used during optimization and exploit the embedding vector representation of the news for subsequent fusion.

3.4 Multimodal fusion

Our novel multimodal fusion approach is tailored for next-day stock market trend prediction. This branch of the model is responsible for fusing the two data modalities: time series and text. In more detail, the model incorporates time series data processed through LSTM (long short-term memory), a type of recurrent neural network renowned for its effectiveness in handling sequential data, and text embeddings processed through FinBERT, a model specifically tailored for the analysis of financial text data. The primary objective of this approach is to enhance prediction accuracy and robustness by fusing information from different data sources or modalities.

The structural layout of the model is visually depicted in Fig. 1, illustrating how the two data modalities are seamlessly integrated. This model leverages multimodal learning, which empowers the model with a more comprehensive understanding of the underlying patterns and relationships within the data. This, in turn, can result in an improved ability to withstand unexpected market fluctuations and enhance prediction resilience [51]. This observation is substantiated by prior research conducted across various applications.

Fig. 1 Overview of the proposed multimodal fusion model for stock market prediction

The fusion of these data modalities is achieved through a specific process involving a concatenation layer and a series of dense layers.

The temporal granularity of data processed by the different model branches is aligned. For each day, the LSTM model processes a single multivariate data instance containing stock prices and technical indicators. Likewise, the FinBERT model leverages the most representative news headline of the day (as explained in the previous subsection). Since the downstream task of interest is next-day stock trend prediction, a daily time granularity is appropriate, and it allows us to train models efficiently considering a large time frame. Both model branches generate a vector embedding which is subsequently provided to the concatenation layer and results in the vertical concatenation of the two vector embeddings.

In more detail, for FinBERT, the preprocessing phase involves tokenizing the input text for BERT, adding special tokens like [CLS] and [SEP]. The embeddings generated by BERT offer contextual representations for each token, capturing nuanced contextual relationships. These BERT embeddings are then merged with the LSTM embeddings by concatenating them with the input sequences, allowing the multimodal model to leverage the rich contextual information from both BERT and LSTM. Notably, the configuration of these layers is customized to suit the dataset’s characteristics and the particular prediction task at hand. This strategy for model architecture optimization was proven to be beneficial in [52]. Details of the architecture’s optimization and tuning are shown in Table 3, which contains information on the various hyperparameters and configurations considered in the optimization process. We conduct a tuning process leveraging AAPL, TSLA, and MSFT stocks to identify an effective model architecture configuration (layers, number of neurons, etc.).
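
A minimal Keras sketch of the resulting fusion architecture is given below; the input dimensions and layer sizes are illustrative placeholders (loosely inspired by the configurations discussed in Section 4.1) rather than the exact tuned values reported in Tables 1, 2, and 3.

```python
# Hedged sketch of the multimodal fusion architecture: an LSTM branch over the
# numerical window, a dense branch over the FinBERT headline embedding, a
# concatenation layer, dense fusion layers, and a sigmoid next-day trend output.
from tensorflow.keras import Model, layers

TIME_STEPS, N_FEATURES, BERT_DIM = 30, 14, 768  # illustrative dimensions

# LSTM branch: stock prices and technical indicators.
ts_in = layers.Input(shape=(TIME_STEPS, N_FEATURES), name="time_series")
x = layers.LSTM(500, return_sequences=True)(ts_in)
x = layers.LSTM(100)(x)
x = layers.BatchNormalization()(x)

# FinBERT branch: embedding of the day's most representative headline.
txt_in = layers.Input(shape=(BERT_DIM,), name="finbert_embedding")
y = layers.Dense(80, activation="relu")(txt_in)
y = layers.Dense(36, activation="relu")(y)

# Fusion: concatenation followed by dense layers and a single sigmoid unit.
z = layers.Concatenate()([x, y])
z = layers.Dense(40, activation="relu")(z)
z = layers.Dense(60, activation="relu")(z)
out = layers.Dense(1, activation="sigmoid", name="next_day_trend")(z)

model = Model(inputs=[ts_in, txt_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

The model can then be trained by passing the two aligned inputs, e.g., model.fit([X_time_series, X_embeddings], y_trends, ...).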

Table 1 Details on hyperparameter tuning: LSTM branch
Table 2 Details on hyperparameter tuning: FinBERT branch
Table 3 Details on hyperparameter tuning: Multimodal Fusion branch

To this end, LSTM and FinBERT base models are optimized independently, and are then combined in the multimodal architecture, which is further optimized. The hyperparameters used in this process are reported in Tables 1, 2, and 3, respectively. The values considered in the hyperparameter optimization stage are motivated by works in the literature providing effective heuristics. Specifically, for the learning rate, the authors in [53] suggest starting from a default value of 0.01 and experimenting with a decreasing factor (negative power of 10), where \(10^{-6}\) is considered an extremely small value. Dropout and batch normalization have been widely recognized as beneficial regularization techniques to reduce overfitting in neural networks. As for the dropout rate, works in [54] and [55] have shown that values below 0.5 should be preferred, in order to avoid the removal of too many neurons that would cause an under-learning phenomenon. The number of LSTM and dense layers, as well as the number of neurons in each layer, also plays an important role, where multiple layers generally allow the model to learn higher-level representations. Although their ideal values can be domain specific, a restricted set of multiples of 2 (layers) and 20 (neurons) usually provides good candidates [56]. Models are optimized on the training data (first \(75\%\) of days), and the best-performing architecture for the base models and for the multimodal branch is then selected to conduct actual experiments with all stocks.

Our study’s priority is a diverse and comprehensive evaluation of model performance. Considering that all data modalities are required for evaluating the proposed multimodal model, and that financial news is unavailable before 2021, we focus on two representative evaluation periods: uptrend (June to December 2021) and downtrend (January to September 2022). This choice allows us to showcase and discuss model performance in different market conditions. For each day in the evaluation period, models are trained up to the previous day and predict the stock trend for the current day, leading to a sliding window fine-tuning and evaluation approach. For the LSTM branch, the complete set of hyperparameters optimized in our experiments is shown in Table 1. The complete set of hyperparameters optimized for the FinBERT branch is shown in Table 2. To this end, instead of using grid search, we apply the Keras Tuner to help us select the best hyperparameter values. As for the multimodal fusion branch, the complete set of hyperparameters is shown in Table 3.
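
As an illustration of this tuning step, the sketch below uses Keras Tuner on the LSTM branch; the searched ranges, input shape, and tuner settings are placeholders and do not correspond exactly to the values listed in Tables 1, 2, and 3.

```python
# Hedged sketch of hyperparameter selection with Keras Tuner (placeholder ranges).
import keras_tuner as kt
import tensorflow as tf


def build_lstm_branch(hp):
    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(hp.Int("units", min_value=100, max_value=500, step=100),
                             input_shape=(30, 14)),          # illustrative window/features
        tf.keras.layers.Dropout(hp.Choice("dropout", [0.1, 0.2, 0.3, 0.4])),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    lr = hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model


tuner = kt.RandomSearch(build_lstm_branch, objective="val_accuracy",
                        max_trials=20, project_name="lstm_branch_tuning")
# tuner.search(X_train, y_train, validation_data=(X_val, y_val), epochs=10)
# best_model = tuner.get_best_models(num_models=1)[0]
```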

Once the model combines the information from these two data modalities and undergoes optimization, it is ready to make predictions regarding the next-day trends in the stock market. The model’s prediction is represented by a single neuron with a sigmoid activation function, a common choice for binary classification tasks. This neuron outputs a value between 0 and 1, serving as an indicator of the likelihood or probability of a specific stock’s next-day trend. A value closer to 1 suggests a higher probability of a positive trend, while a value closer to 0 indicates a higher likelihood of a negative trend in the stock’s performance.

The dataset used in our work and the implementation of the proposed approach are publicly available at the following online repository: https://github.com/rcorizzo/finbert-lstm.

4 Experiments

In this section, we provide a description of the experimental setup employed in our study. It encompasses the configuration of hyperparameters used for the various models, and the evaluation metrics used in our evaluation. Subsequently, we discuss the outcomes of our experiments in relation to the accuracy achieved by the different models. Lastly, we focus our attention on portfolio analysis, where we examine the consequences of utilizing model predictions as triggers for buying and selling decisions in real market scenarios.

4.1 Setup

In our experiments, we compare our results with popular baselines. For baselines, suitable values for hyperparameter tuning were chosen based on configurations reported as effective in the original papers or following the above-discussed rationale used for our proposed model, performing grid search using solely historical training data. For conciseness, in the following, we report the final best-performing configurations adopted in the experimental analysis:

  • ARIMA [57]: ARIMA, an abbreviation for autoregressive integrated moving average, is a forecasting approach characterized by three constituent components, each of which is controlled by a specific parameter. These components include the count of autoregressive terms denoted as (p), the quantity of nonseasonal differencing steps required to achieve stationarity marked as (d), and the number of lagged forecast errors integrated into the prediction equation represented by (q). The comprehensive ARIMA model can be formally expressed as follows:

    $$\begin{aligned} y'_{t} = c + \phi _{1}y'_{t-1} + \cdots + \phi _{p}y'_{t-p} + \theta _{1}\varepsilon _{t-1} + \cdots + \theta _{q}\varepsilon _{t-q} + \varepsilon _{t}, \end{aligned}$$
    (7)

    where \(y'_t\) represents a differenced time series, subject to differencing operations multiple times. The predicted values on the right-hand side encompass both past \(y'_t\) values and previous prediction errors. We use this widely known autoregressive method to forecast the closing price of each stock for the following day, using historical price data. The key aspect of this approach is to identify price trends by relying solely on the target variable of interest. The prediction is then transformed into a binary format, i.e., either an uptrend or a downtrend, by comparing the predicted value (next day) with the most recently observed closing price of the stock (current day). To automatically determine the best configuration for the parameters (p, d, q) based on historical training data, we use the Auto-ARIMA implementation provided in “pmdarima.”Footnote 4

  • GBTs [58]: Gradient boosted trees (GBTs) are a competitive ensemble method within machine learning algorithms. In essence, GBTs blend “weak” machine learning models, such as decision trees, into a more resilient and precise ensemble machine learning model. The fundamental principle of gradient boosting involves iteratively enhancing the model’s predictions by training it to minimize the residual error of previous base models. Concretely, an incremental training strategy is employed, where one tree is acquired in each iteration. For every individual example, its prediction is generated by each model, and the final prediction is computed as the summation of the scores provided by all models. To be more precise, the objective function governing the optimization of tree boosting can be formally expressed as follows:

    $$\begin{aligned} {\hbox {obj}}^{(t)}= & {} \sum _{i=1}^{n} l(y_i, \hat{y}_i^{(t)}) + \sum _{i=1}^{t}\Omega (f_{i}) \end{aligned}$$
    (8)
    $$\begin{aligned}= & {} \sum _{i=1}^n l(y_{i}, {\hat{y}}_{i}^{(t-1)} + f_{t}(x_{i})) + \Omega (f_{t}) + {\textrm{c}}, \end{aligned}$$
    (9)

    where l represents a differentiable convex loss function quantifying the disparity between the prediction, denoted as \(\hat{y_i}\), and the target value, \(y_i\). Additionally, \(\Omega \) is used to impose a penalty on the model’s complexity through a regularization term, which serves the purpose of mitigating overfitting. Within our methodology, this model is harnessed for the purpose of identifying underlying patterns within the multi-dimensional feature space and uncovering nonlinear associations among price data, statistical indicators, and future price trends. Specifically, we adopt the implementation of Gradient Boosted Trees provided by scikit-learn, accessible at the following link.Footnote 5 We proceed to train the model using the following hyperparameter configuration: \(\{n\_estimators=50, learning\_rate=1.0, max\_depth=10\}\). For all other parameters, we adhere to the default configuration recommended by the method’s documentation in scikit-learn.

  • LSTM [36]: The model is optimized via gradient descent using the Adam optimizer and the binary cross-entropy loss. The base LSTM model considered consists of two LSTM layers (500 and 100 units, respectively), a dropout layer, and a dense layer with a single neuron and sigmoid activation function to predict next-day trends.

  • Polarity [41]: In this work, we utilize sentiment analysis predictions acquired through the EODHD Financial Data API.Footnote 6 The sentiment analysis encompasses four distinct categories: polarity, negative, neutral, and positive, with each score ranging from -1 to 1. Upon obtaining sentiment scores for all news articles gathered about a specific stock on a given day, we proceed to consolidate them and derive a binary label serving as the global indicator for that stock on that particular day. Specifically, if the cumulative polarity score is above a 0.4 threshold, we assign the label “1”; otherwise, it is designated as “0.” The underlying rationale for this binary classification is rooted in the idea that positive media coverage could potentially signify an imminent uptrend. In contrast, negative media coverage implies uncertainty and skepticism, which may lead to a downtrend in the stock’s performance.

  • Bi-LSTM [38]: The model is optimized via gradient descent using the Adam optimizer and the binary cross-entropy loss. The base model consists of two LSTM layers (500 units). Subsequently, a bidirectional LSTM layer (128 units) with a dropout rate of 0.8 is employed to enhance the model’s ability to learn complex patterns bidirectionally, while dropout regularization helps mitigate overfitting. Finally, a dense layer with a single unit and sigmoid activation is added to produce output predictions.

  • CNN Seq2Seq [42]: The model is a sequence-to-sequence architecture for time series data. The encoder starts with a 1D convolutional layer (128 filters, ReLU activation), followed by max pooling. An LSTM layer (128 units) captures temporal dependencies with 0.8 dropout regularization. A RepeatVector layer prepares the encoded representation. In the decoder, dilated convolutional layers replace standard ones: a dilated convolutional layer (128 filters, ReLU activation, dilation rate 2) is followed by max pooling. An LSTM layer (128 units) decodes temporal information with a dropout rate of 0.8. Finally, a time-distributed dense layer (single unit, sigmoid activation) generates output predictions for each time step.

  • Attention-CNN-LSTM [39]: The model architecture includes an encoder–decoder structure for time series data. The encoder starts with a 1D convolutional layer (128 filters, ReLU activation) for feature extraction, followed by max pooling. An LSTM layer (128 units) captures temporal dependencies with 0.8 dropout regularization. An attention mechanism enhances performance by focusing on relevant information, concatenated with encoder output for enriched representation. The decoder includes a similar convolutional layer, max pooling, and LSTM for decoding. Dropout regularization is applied again. Finally, a time-distributed dense layer (sigmoid activation) generates output predictions for each time step. This architecture integrates convolutional and LSTM layers, dropout regularization, and attention for effective time series processing and prediction.

  • Dilated CNN Seq2Seq [40]: The adopted model is a sequence-to-sequence architecture for time series data. The encoder consists of a 1D convolutional layer with 128 filters and ReLU activation, followed by max pooling. An LSTM layer with 128 units and dropout rate of 0.8 captures temporal dependencies, and a RepeatVector layer prepares the encoded representation for each time step. In the decoder, dilated convolutional layers replace standard convolutional layers. The first layer has 128 filters, ReLU activation, and a dilation rate of 2 for broader contextual information. Max pooling retains relevant features. An LSTM layer with 128 units decodes temporal information, with dropout rate of 0.8. Finally, a time-distributed dense layer with a single unit and sigmoid activation generates output predictions for each time step.

  • CNN-LSTM [42]: CNNs are effective for learning from time series data, with 1D convolutional layers filtering noise and extracting features. Causal convolution ensures influence only from previous time steps. RNNs excel in sequential learning tasks. We compare our model with CNN-LSTM, which combines a 1D CNN with an LSTM and features a convolutional layer, an LSTM layer, batch normalization, dropout, and a dense layer. Various model variants were explored to identify optimal parameters: hidden layers (1 and 2), neurons (64 and 128), batch sizes (32 and 64), and dropout rates (0.2 and 0.5). The best-performing CNN-LSTM has a convolutional layer with 32 filters of size 3, causal padding, and ReLU activation, an LSTM with 128 units and tanh activation, followed by batch normalization, dropout (rate 0.2), and a dense layer with ReLU activation.

  • GRU-CNN [37]: The GRU-CNN model combines GRU and 1D CNN, offering simpler training and improved performance. Parameters are similar to CNN-LSTM. A key difference lies in the arrangement of RNN and CNN layers. GRU-CNN is composed of a GRU layer (128 units, tanh activation), a 1D convolutional layer (32 filters, size 3, stride 1, causal padding, ReLU activation), global max pooling, batch normalization, a dense layer (10 units, ReLU activation), dropout (rate 0.2), and a dense layer (prediction window size, ReLU activation). The GRU layer returns a sequence, whereas global max pooling retains significant features and reduces dimensionality.

  • FinBERT-LSTM (Proposed): The best-performing LSTM branch was obtained with two LSTM layers with 500 and 450 hidden units, interleaved by a normalization layer, followed by a batch normalization layer, and a dense representation layer of size 72. The best-performing FinBERT branch, following its standard architecture, was obtained with a normalization layer followed by two dense layers of size 80 and 36, each interleaved by a normalization layer. The best-performing architecture following the fusion of the two modalities was obtained with two dense layers of size 40 and 60, interleaved by normalization layers.

For a quantitative evaluation of our results, we adopt conventional classification metrics such as Precision, Recall, and F1-Score, defined as:

$$ {\text {Precision}} = \frac{T_p}{T_p+F_p}; \ \ \ {\text {Recall}} = \frac{T_p}{T_p + F_n}; \ \ F1-{\text {Score}} = 2 \times \frac{{\text {Precision}} \times {\text {Recall}}}{{\text {Precision}} + {\text {Recall}}}, $$

where \(T_p\) is the number of true positives, \(F_p\) is the number of false positives, and \(F_n\) is the number of false negatives. Specifically, we adopt micro-averaged variants of Precision, Recall, and F1-Score, i.e., metrics are computed globally by considering each element of the label indicator matrix as a label.
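
For reference, these micro-averaged metrics can be computed with scikit-learn as follows (the prediction vectors are illustrative):

```python
# Micro-averaged Precision, Recall, and F1-Score on illustrative trend labels.
from sklearn.metrics import precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # observed next-day trends (1 = uptrend, 0 = downtrend)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions

precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="micro")
print(f"Precision={precision:.2f}, Recall={recall:.2f}, F1={f1:.2f}")
```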

4.2 Model accuracy

In our analysis of different models’ performance on stock trend prediction, we aim to provide a broad evaluation and showcase model performance on a general set of real-world stocks with great diversity. To this end, we employ data from 12 real-world stocks in different sectors: Communication Services (ATVI, NFLX), Consumer Discretionary (SBUX, TSLA), Information Technology (NVDA, AAPL), Real Estate (AMT, PLD), Financials (NDAQ, SCHW), Healthcare (JNJ, BIO). One relevant perspective for results analysis is the model’s classification accuracy on one-day-ahead stock trend prediction. In the following, we describe summarized results by financial sector. A more fine-grained discussion is provided in the Appendix.

4.2.1 Communication services

For ATVI (Activision Blizzard), the best-performing model in uptrend market conditions is Polarity with an F1-Score of 0.63. In downtrend conditions, the proposed model performs best with an F1-Score of 0.53. The worst-performing models are GRU-CNN in downtrend (0.28) and Dilated CNN Seq2Seq in uptrend conditions (0.38). For NFLX (Netflix) in uptrend conditions, the best model is GBTs with an F1-Score of 0.54. In downtrend predictions, GBTs and ARIMA perform best with an F1-Score of 0.50. The worst models are Bi-LSTM (0.38) for uptrends and CNN-LSTM, CNN Seq2Seq (0.39) for downtrends.

4.2.2 Consumer discretionary

For SBUX (Starbucks), the best models for uptrends are ARIMA and GBTs with an F1-Score of 0.52. For downtrends, the best is Dilated CNN Seq2Seq (0.57). The worst models are GRU-CNN (0.37) for uptrends and CNN-LSTM (0.37) for downtrends. For TSLA (Tesla), the best uptrend models are Dilated CNN Seq2Seq, CNN Seq2Seq, and LSTM with an F1-Score of 0.56. For downtrends, Polarity performs best (0.55). The worst models are GRU-CNN (0.27) in uptrend and CNN-LSTM (0.31) in downtrend conditions.

4.2.3 Information technology

For NVDA (Nvidia), the best-performing uptrend model is Bi-LSTM with an F1-Score of 0.50. For downtrends, LSTM and Polarity lead with an F1-Score of 0.50. The worst-performing model in both conditions is CNN-LSTM (0.35 for uptrend and downtrend). For AAPL (Apple), Polarity is the best for uptrend with an F1-Score of 0.55. ARIMA leads in downtrend predictions with an F1-Score of 0.55. The worst models are ARIMA (0.40) for uptrends and GRU-CNN (0.41) for downtrends.

4.2.4 Real estate

For AMT (American Tower), the best uptrend model is GBTs with an F1-Score of 0.51. For downtrends, GBTs also lead with 0.54. Several models tie for the worst performance in uptrends (F1-Score of 0.40), while Bi-LSTM (0.40) is the worst for downtrends. For PLD (Prologis), ARIMA is the best model for uptrends with an F1-Score of 0.59. For downtrends, GRU-CNN performs best (0.60). The worst models are Attention-CNN-LSTM (0.48) for uptrends and Polarity (0.30) for downtrends.

4.2.5 Financials

For NDAQ (Nasdaq), ARIMA is the best for uptrends with an F1-Score of 0.58. GBTs perform best for downtrends (0.52). The worst model is Polarity (0.39 for uptrends and 0.36 for downtrends). For SCHW (Charles Schwab), GBTs is the best model for uptrends with an F1-Score of 0.55. For downtrends, GBTs lead with 0.53. The worst models are CNN-LSTM (0.36) for uptrends and Attention-CNN-LSTM, Bi-LSTM (0.37) for downtrends.

4.2.6 Healthcare

For JNJ (Johnson & Johnson), the best uptrend model is GBTs with an F1-Score of 0.56. ARIMA leads in downtrend predictions with an F1-Score of 0.54. The worst models are Polarity (0.49) for uptrends and GRU-CNN (0.38) for downtrends. For BIO (Bio-Rad Laboratories), the best-performing model for uptrends is GRU-CNN with an F1-Score of 0.56. For downtrends, GRU-CNN again leads with 0.56. The worst models are Attention-CNN-LSTM, Polarity (0.40) for uptrends and Polarity (0.26) for downtrends.

Analyzing the performance of different models for each stock provides insights into their effectiveness in predicting stock trends. Considering base models performance with different stocks, GBTs emerge as the most robust approach across all stocks, followed by LSTM, Polarity, and ARIMA. We also observe that Polarity performs particularly well in predicting TSLA stock trends, while LSTM shows competitive performance in AAPL predictions. However, experimental results in Tables 4, 5 and 6 show that different models exhibit varying performance in capturing uptrends and downtrends in stock prices.

In uptrend market conditions, GBTs consistently achieve the highest F1-Scores across multiple stocks, indicating their effectiveness in capturing upward price movements. ARIMA and the proposed model also demonstrate a competitive performance in predicting uptrends for certain stocks. However, convolutional neural network-based models such as CNN-LSTM and GRU-CNN consistently underperform other approaches across various stocks and market conditions.

In downtrend market conditions, however, the proposed model outperforms other models, as it consistently achieves the highest F1-Scores across various stocks, showcasing strength in capturing downward price movements. Overall, considering both uptrend and downtrend predictions, the proposed model emerges as the top-performing approach to capture the overall stock market trend. It excels in predicting both upward and downward price movements, representing a robust choice for stock trend analysis. GBTs, ARIMA, Polarity, LSTM, Bi-LSTM, CNN Seq2Seq, Attention-CNN-LSTM, Dilated CNN Seq2Seq, CNN-LSTM, and GRU-CNN can, in some cases, also provide a competitive performance, although their level of reliability significantly varies across different stocks.

One possible explanation for the generally low performance of CNN-based models is the noisy/highly fluctuating nature of financial data, as well as the lack of spatial dependencies in the data, which makes CNN-based models ineffective. On the other hand, models that focus on temporal dependencies (LSTM, ARIMA), models that are more robust to noise (GBTs), and models that consider a holistic combination of data modalities (Proposed) appear more effective in predicting stock trends. Another relevant observation from a bias-variance perspective is that ARIMA models are typically characterized by low variance, while GBTs are flexible enough to capture complex patterns without overfitting excessively, leading to more accurate predictions than CNN-based models.

It is important to note that the performance of these models may be influenced by changing macroeconomic conditions and concept drift in the considered time frames. The upward trajectory of stock prices in June 2021 can be attributed to a confluence of economic factors. Firstly, the global economy was in a recovery phase after the severe economic downturn triggered by the COVID-19 pandemic. Governments and central banks across the world had implemented a series of monetary and fiscal policies to stimulate economic growth, which boosted investor confidence. Secondly, companies began releasing favorable earnings reports during this period. Positive financial performance exceeding market expectations can act as a catalyst for stock price appreciation as it instills investor optimism. Thirdly, the persistently low interest rates in 2021 made stocks an attractive investment option due to the relatively higher potential returns when compared to low-yield fixed-income securities. Furthermore, sectors such as technology and growth stocks exhibited considerable appeal to investors, with their higher growth potential contributing to increased stock prices. Lastly, speculative trading activities, epitomized by events like the GameStop and AMC short squeezes, generated significant retail investor participation and volatility, influencing stock price movements.

The decline in stock prices during 2022 can be attributed to a combination of economic and market factors. First and foremost, economic conditions, particularly the concerns over rising inflation and interest rates, were central to the reduced attractiveness of stocks. The expectation of increasing inflation and interest rates led investors to consider alternative investments that offered better protection against eroding purchasing power and higher fixed returns. Secondly, geopolitical events, such as trade disputes and international conflicts, introduced uncertainty into the market, undermining investor sentiment. Geopolitical tensions can result in market volatility, which has a detrimental impact on stock prices. Additionally, corporate earnings played a pivotal role in driving down stock prices. Companies reporting weaker-than-expected earnings, often compounded by supply chain disruptions and increased production costs, faced downward pressure on their stock prices. Moreover, central bank policies, including potential interest rate hikes, can lead to stock market declines, as higher borrowing costs and reduced liquidity negatively affect equity valuations. Finally, market sentiment, determined by factors like fear, uncertainty, and pessimism, significantly influenced stock market performance in 2022, contributing to the overall decline in stock prices.

Table 4 One-day-ahead stock prediction performance in Uptrend (July 1, 2021 to Dec 31, 2021—left) and Downtrend (Jan 1, 2022 to Sep 20, 2022—right) market conditions in terms of Precision, Recall, and F1-Score (micro average) with all analyzed methods and stocks (ATVI, NFLX, SBUX, TSLA)
Table 5 One-day-ahead stock prediction performance in Uptrend (July 1, 2021 to Dec 31, 2021—left) and Downtrend (Jan 1, 2022 to Sep 20, 2022—right) market conditions in terms of Precision, Recall, and F1-Score (micro average) with all analyzed methods and stocks (NVDA, AAPL, AMT, PLD)
Table 6 One-day-ahead stock prediction performance in Uptrend (July 1, 2021 to Dec 31, 2021—left) and Downtrend (Jan 1, 2022 to Sep 20, 2022—right) market conditions in terms of Precision, Recall, and F1-Score (micro average) with all analyzed methods and stocks (NDAQ, SCHW, BIO, JNJ)

To validate the statistical significance of our results, we adopt Wilcoxon signed rank tests on all pairwise combinations of methods across multiple executions, obtained considering the average F1-Score with different stocks. Based on the results reported in Table 7, we can infer that in downtrend market conditions, the proposed method outperforms 9 out of 10 baselines in terms of F1-Score, and 6 of 10 comparisons are statistically significant. In the single case where one baseline (GBTs) outperforms the proposed method, this comparison is not statistically significant (p-value of 0.3556). In uptrend market conditions, the proposed method outperforms 6 out of 10 baselines in terms of F1-Score, and 3 of these 6 comparisons are statistically significant. In the 4 other cases where baselines (ARIMA, LSTM, GBTs, and Polarity) outperform the proposed method, none of the comparisons are statistically significant (p-values: 0.7508, 0.0606, 0.0606, and 0.2366).
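
For completeness, each pairwise comparison can be reproduced with SciPy as sketched below; the paired per-stock F1-Scores are placeholders rather than values from Table 7.

```python
# Wilcoxon signed rank test on paired per-stock F1-Scores of two methods (placeholder data).
from scipy.stats import wilcoxon

f1_proposed = [0.53, 0.50, 0.55, 0.52, 0.54, 0.51]
f1_baseline = [0.48, 0.47, 0.50, 0.49, 0.46, 0.50]

stat, p_value = wilcoxon(f1_proposed, f1_baseline)
print(f"Wilcoxon statistic={stat:.3f}, p-value={p_value:.4f}")
```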

Table 7 Average performance of different methods across multiple stocks (F1-Score) and statistical analysis with Wilcoxon signed rank tests (p-value) comparing all pairwise combinations of methods
Table 8 Simulated portfolio gains (ATVI stock) in uptrend (left) and downtrend (right): absolute (USD) and relative (percentage) with respect to the initial investment, with all models and different Max Shares configurations (1, 5, 10)
Table 9 Simulated portfolio gains (NFLX stock) in uptrend (left) and downtrend (right): absolute (USD) and relative (percentage) with respect to the initial investment, with all models and different Max Shares configurations (1, 5, 10)
Table 10 Simulated portfolio gains (SBUX stock) in uptrend (left) and downtrend (right): absolute (USD) and relative (percentage) with respect to the initial investment, with all models and different Max Shares configurations (1, 5, 10)
Table 11 Simulated portfolio gains (TSLA stock) in uptrend (left) and downtrend (right): absolute (USD) and relative (percentage) with respect to the initial investment, with all models and different Max Shares configurations (1, 5, 10)
Table 12 Simulated portfolio gains (NVDA stock) in uptrend (left) and downtrend (right): absolute (USD) and relative (percentage) with respect to the initial investment, with all models and different Max Shares configurations (1, 5, 10)
Table 13 Simulated portfolio gains (AAPL stock) in uptrend (left) and downtrend (right): absolute (USD) and relative (percentage) with respect to the initial investment, with all models and different Max Shares configurations (1, 5, 10)
Table 14 Simulated portfolio gains (AMT stock) in uptrend (left) and downtrend (right): absolute (USD) and relative (percentage) with respect to the initial investment, with all models and different Max Shares configurations (1, 5, 10)
Table 15 Simulated portfolio gains (PLD stock) in uptrend (left) and downtrend (right): absolute (USD) and relative (percentage) with respect to the initial investment, with all models and different Max Shares configurations (1, 5, 10)
Table 16 Simulated portfolio gains (NDAQ stock) in uptrend (left) and downtrend (right): absolute (USD) and relative (percentage) with respect to the initial investment, with all models and different Max Shares configurations (1, 5, 10)

4.3 Portfolio analysis

In our experiments, we begin with a budget of 10,000 USD and allocate the entire amount to purchase as many shares as possible at the prevailing market price on the first day. The “Max Shares” parameter, with values of 1, 5, or 10, determines the number of shares to be bought or sold each day based on the trend predictions generated by the different models. When a model predicts a downtrend for the following day, we sell the desired number of shares at the current market price to mitigate potential losses. This action increases the available USD balance in the portfolio, which can later be reinvested in purchasing additional shares. Conversely, if a model predicts an uptrend and the available USD balance is positive, we use that balance to buy a number of shares that is equal to or less than the desired number.

In cases where a model predicts an uptrend, but there is no available USD balance, we choose to hold onto all the previously purchased shares. At the conclusion of the simulation, all shares held in the portfolio are sold at their respective market prices. The absolute value of the portfolio in USD at this point is used to calculate the gains or losses relative to the initial budget of 10,000 USD.
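
For clarity, the following minimal Python sketch illustrates the simulation logic described above. The function name, the optional per-transaction fee parameter, and all variable names are illustrative and do not correspond to the identifiers used in our public code implementation.

def simulate_portfolio(prices, predictions, max_shares, budget=10_000.0, fee=0.0):
    """prices[t]: closing price on day t; predictions[t]: trend predicted
    for day t+1 ('up' or 'down'); fee: optional per-transaction cost (assumed)."""
    # Initial allocation: spend the whole budget at the first day's price.
    shares = int(budget // prices[0])
    cash = budget - shares * prices[0]
    for price, trend in zip(prices, predictions):
        if trend == "down" and shares > 0:
            # Sell up to max_shares shares at the current price to mitigate losses.
            sold = min(max_shares, shares)
            shares -= sold
            cash += sold * price - fee
        elif trend == "up" and cash > 0:
            # Reinvest the available balance in at most max_shares shares.
            bought = min(max_shares, int(cash // price))
            if bought > 0:
                shares += bought
                cash -= bought * price + fee
        # If an uptrend is predicted but no balance is available, hold all shares.
    # At the end of the simulation, the remaining shares are sold at the last price.
    final_value = cash + shares * prices[-1]
    return final_value - budget, 100.0 * (final_value - budget) / budget

Running such a simulation with max_shares set to 1, 5, and 10 would mirror the three “Max Shares” configurations reported in Tables 8 to 19.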

In our simulation, we do not consider trading fees, as some brokers, such as Charles Schwab, do not apply them to online transactions and profit from different revenue streams. A custom trading fee can easily be incorporated using our public code implementation. In our experiments, we perform two separate trading simulations over the two time frames covered in our performance analysis: uptrend (July 1, 2021 to Dec 31, 2021) and downtrend (Jan 1, 2022 to Sep 20, 2022). The findings of our portfolio analysis are outlined in Tables 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 and 19, while Figs. 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and 13 offer time series visualizations depicting the closing prices of each stock over time, alongside the buy and sell signals generated by all models. To maintain conciseness in our discussion, we present the plot for the most effective model, as determined by the portfolio results.

Examining the time series plots provides valuable insights into the market conditions of the various stocks over the overall time frame of our analysis (both uptrend and downtrend). For ATVI, Fig. 2 reveals that the stock started at \(\$95\), declined until the middle of the evaluation time frame (with a bottom at \(\$55\)), then rose to \(\$82.5\), after which the price retraced to about \(\$75\), lower than on the first day of the evaluation. Moving to NFLX, Fig. 3 reveals that this stock observed a significant downtrend throughout the entire evaluation time frame, starting at over \(\$525\) per share and ending at \(\$210\). SBUX exhibited a pattern of uncertainty similar to ATVI (Fig. 4), with a starting price of \(\$112.5\) and a fast uptrend phase up to \(\$126\), followed by a strong downtrend that led to a bottom price of \(\$70\) and a subsequent sideways phase. The stock recouped its value in the final uptrend phase, ending at \(\$92\), higher than on the first evaluation day. For TSLA, Fig. 5 reveals that the stock started at \(\$225\) and observed a sideways phase followed by an uptrend phase, reaching a maximum price above \(\$400\). Subsequently, the stock retraced within the \(\$300-\$375\) channel and then trended down with lows at \(\$250\) and \(\$200\), recouping to \(\$300\) toward the end of the evaluation time frame. In the case of NVDA (Fig. 6), the stock exhibited an initial value of \(\$200\) and experienced an uptrend, reaching a peak of \(\$340\) around the midpoint of the evaluation period. Afterward, it encountered fluctuations and declined to \(\$280\) before recovering to \(\$300\), higher than its value on the first evaluation day. AAPL (Fig. 7) displayed a substantial initial uptrend, starting at approximately \(\$135\) per share and reaching \(\$180\). Following this, fluctuations led to a dip to \(\$130\), with a subsequent recovery to \(\$150\), slightly higher than its value on the initial evaluation day. AMT showed a similar pattern of uncertainty (Fig. 8), with an initial price of \(\$270\) followed by a rapid uptrend phase up to \(\$305\). A strong downtrend then resulted in a bottom price of \(\$230\), leading to a subsequent sideways phase. Toward the end of the evaluation period, the stock recovered its value during the final uptrend phase, ending at \(\$245\), higher than its value on the first evaluation day. In the case of PLD (Fig. 9), the stock displayed an initial value of \(\$120\) and experienced an uptrend, reaching a peak of \(\$172\) around the midpoint of the evaluation period. Subsequently, it encountered fluctuations and declined to \(\$110\) before recovering to \(\$112\), lower than its value on the first evaluation day. NDAQ (Fig. 10) showed a significant initial uptrend, commencing at around \(\$58\) per share and reaching \(\$73\). Fluctuations then caused a dip to \(\$47.5\), followed by a recovery to \(\$61\), a value similar to that of the first evaluation day. Examining SCHW (Fig. 11), the stock started at \(\$74\) and underwent an uptrend, peaking at \(\$95\) around the midpoint of the evaluation period. Following this, fluctuations led to a decline to \(\$60\) before a recovery to \(\$74\), a value similar to that on the first evaluation day. JNJ exhibited a comparable pattern of uncertainty (Fig. 12), with an initial price of \(\$165\) followed by a rapid uptrend phase up to \(\$180\). A significant downtrend then ensued, resulting in a bottom price of \(\$155\), followed by a subsequent sideways phase. Toward the end of the evaluation period, the stock recovered its value during the final uptrend phase, ending at \(\$169\), higher than its value on the first evaluation day. Lastly, BIO (Fig. 13) displayed a significant initial uptrend, starting at approximately \(\$650\) per share and reaching \(\$825\). Fluctuations then caused a dip to \(\$710\), followed by a retracement to \(\$450\), significantly lower than its value on the first evaluation day.

In analyzing model profitability, the findings presented in Tables 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 and 19 indicate that the proposed multimodal approach outperforms the individual base models. This observation is particularly significant because it shows that the accuracy rankings of the different models, discussed in the previous section, do not necessarily align with their profitability. A notable example of this phenomenon is AAPL (see Table 13), where the proposed model achieves a lower F1-Score than the best-performing ARIMA base model (see Table 4), yet obtains the largest gains, averaging \(13.49\%\) in the downtrend time frame, notably higher than the \(-9.75\%\) obtained with ARIMA.

This phenomenon can be explained by the fact that binary performance metrics, such as the F1-Score, assign equal weight to correctly predicting the trend on every day, regardless of the magnitude of the price change occurring within that 24-hour time frame. As a result, a model that correctly predicts the trend on days characterized by significant price volatility can identify advantageous buying and selling opportunities, whereas a model that correctly forecasts the trend on a larger number of days on which prices are relatively stable may not generate a comparable level of profitability. This underscores the importance of incorporating additional considerations, such as price variability, when evaluating the performance and profitability of stock trend prediction models.
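
To make this concrete, the following toy example, built on entirely assumed returns and predictions, shows two models with identical day-level accuracy but very different outcomes under a simple long-only strategy.

import numpy as np

returns = np.array([0.001, -0.002, 0.08, 0.001, -0.09])  # assumed daily price changes
true_trend = (returns > 0).astype(int)                    # 1 = uptrend, 0 = downtrend

model_a = np.array([1, 1, 1, 0, 0])  # correct on the two volatile days (+8% and -9%)
model_b = np.array([1, 0, 0, 1, 1])  # correct only on the quiet days

for name, preds in [("A", model_a), ("B", model_b)]:
    accuracy = (preds == true_trend).mean()
    # Toy strategy: hold the stock only on days predicted as an uptrend, otherwise stay in cash.
    growth = np.prod(np.where(preds == 1, 1.0 + returns, 1.0))
    print(f"Model {name}: accuracy = {accuracy:.0%}, return = {growth - 1:+.1%}")

Both models are 60% accurate on these assumed data, yet model A gains roughly +7.9% while model B loses roughly 8.8%, because only model A predicts the correct sign on the high-volatility days.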

The superiority of the proposed multimodal approach is also evident in its performance on uptrends in NFLX and downtrends in NFLX, SBUX, TSLA, AAPL, NDAQ, and BIO. It is worth noting that our model is particularly conservative in a downtrend phase, resulting in better capital preservation when compared with a buy-and-hold strategy. In the following, we focus on significant examples where this phenomenon is observed.

For NFLX (see Table 9), the proposed multimodal approach achieves an average gain of \(19.42\%\), surpassing the \(14.22\%\) of the second-ranked Polarity model in uptrend market conditions. In the case of SBUX (see Table 10), the proposed multimodal approach achieves an average gain of \(7.11\%\) in downtrend market conditions, surpassing the \(6.75\%\) of the second-ranked Dilated CNN Seq2Seq model. In the case of TSLA (see Table 11), the proposed model stands out in downtrend market conditions as the only model with a positive average gain, exceeding CNN-LSTM, the second-best model for TSLA, by an impressive margin of \(13.75\%\). In the case of AAPL (see Table 13), the proposed model is again the best-performing model, with a favorable average gain that exceeds the second-ranked Dilated CNN Seq2Seq model by a notable margin of \(12.38\%\). Similarly, in the case of NDAQ (see Table 16), the proposed multimodal approach achieves an average gain of \(3.92\%\), outperforming the second-ranked Polarity model, which obtains an average gain of \(1.36\%\). For BIO (see Table 19), the proposed multimodal approach achieves an average gain of \(-3.75\%\), outperforming the second-ranked Attention-CNN-LSTM, which obtains an average gain of \(-4.44\%\).

In our analysis, it is noteworthy that in the uptrend time frame our proposed model outperforms the buy-and-hold strategy for two stocks (ATVI, NFLX). Furthermore, in the downtrend time frame, our model outperforms the buy-and-hold strategy for nine stocks (NFLX, SBUX, NVDA, AAPL, PLD, NDAQ, SCHW, JNJ, and BIO).
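
For reference, the buy-and-hold benchmark in this comparison can be summarized as follows, under the simplifying assumption that the whole budget is invested on the first day and liquidated on the last day, ignoring the small cash remainder left after buying an integer number of shares: \(n = \lfloor B / p_1 \rfloor\), \(G_{\text{BH}} = n\,(p_T - p_1)\), and \(G_{\text{BH}}^{\%} = 100 \cdot G_{\text{BH}} / B\), where \(B = 10{,}000\) USD is the initial budget and \(p_1\) and \(p_T\) are the closing prices on the first and last day of the simulation.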

From a theoretical viewpoint, the observed results reveal that analyzing multiple perspectives (such as stock prices, technical indicators, and sentiment in news headlines) through multimodal data fusion yields a model that requires stronger evidence before predicting a change in stock market trend. In contrast, baseline models that focus mostly on the temporal aspects of the data appear more sensitive. This explains why such baseline models may have an advantage in an uptrend phase, whereas our proposed model behaves more conservatively. As a consequence, our multimodal fusion model has more limited exposure in an uptrend phase but provides better capital preservation than the others in a downtrend phase.

In general, the superior performance observed for the proposed model can be attributed to its ability to integrate diverse data sources, extract complex and high-level features, capture nonlinear relationships, and learn in an end-to-end manner, leading to reduced bias and improved generalizability. These advantages enable the proposed multimodal model to provide more accurate and robust predictions compared to simpler models that might only utilize a single type of data or simpler feature extraction techniques.

Fig. 2 Stock prices with buy and sell signals during the evaluation period from July 1, 2021, to Sep 20, 2022 (ATVI stock)

Fig. 3 Stock prices with buy and sell signals during the evaluation period from July 1, 2021, to Sep 20, 2022 (NFLX stock)

Fig. 4 Stock prices with buy and sell signals during the evaluation period from July 1, 2021, to Sep 20, 2022 (SBUX stock)

Fig. 5 Stock prices with buy and sell signals during the evaluation period from July 1, 2021, to Sep 20, 2022 (TSLA stock)

Fig. 6 Stock prices with buy and sell signals during the evaluation period from July 1, 2021, to Sep 20, 2022 (NVDA stock)

Fig. 7 Stock prices with buy and sell signals during the evaluation period from July 1, 2021, to Sep 20, 2022 (AAPL stock)

Another interesting perspective is provided by analyzing the impact of the “Max Shares” parameter in our experiments. In the analysis of the various stocks during both uptrends and downtrends, several trends emerge, although these patterns are not consistent across all cases. During uptrends, multiple stocks, such as ATVI, NFLX, and SBUX, tend to perform better with a “Max Shares” value of 1, while AAPL, AMT, PLD, NDAQ, JNJ, and BIO achieve their best results with a “Max Shares” value of 10.

However, during downtrends, the best choice of the “Max Shares” value varies widely among the different stocks, with some favoring a value of 10 (e.g., ATVI, AAPL, SCHW) and others performing better with a value of 1 (e.g., NFLX, TSLA, NVDA, AMT, PLD, NDAQ, SCHW, JNJ, BIO). The performance of specific models, including CNN-LSTM, GRU-LSTM, LSTM, Polarity, ARIMA, and GBTs, also varies depending on the specific stock and market conditions. These patterns suggest that the relationship between “Max Shares” and model performance is stock-specific and influenced by unique price behaviors, highlighting the need for individualized analysis and tailored strategies for each stock.

Table 17 Simulated portfolio gains (SCHW stock) in uptrend (left) and downtrend (right): absolute (USD) and relative (percentage) with respect to the initial investment, with all models and different Max Shares configurations (1, 5, 10)
Table 18 Simulated portfolio gains (JNJ stock) in uptrend (left) and downtrend (right): absolute (USD) and relative (percentage) with respect to the initial investment, with all models and different Max Shares configurations (1, 5, 10)
Table 19 Simulated portfolio gains (BIO stock) in uptrend (left) and downtrend (right): absolute (USD) and relative (percentage) with respect to the initial investment, with all models and different Max Shares configurations (1, 5, 10)
Fig. 8 Stock prices with buy and sell signals during the evaluation period from July 1, 2021, to Sep 20, 2022 (AMT stock)

Fig. 9 Stock prices with buy and sell signals extracted by all models during the evaluation period from July 1, 2021, to Sep 20, 2022 (PLD stock)

Fig. 10 Stock prices with buy and sell signals during the evaluation period from July 1, 2021, to Sep 20, 2022 (NDAQ stock)

Fig. 11 Stock prices with buy and sell signals during the evaluation period from July 1, 2021, to Sep 20, 2022 (SCHW stock)

Fig. 12 Stock prices with buy and sell signals during the evaluation period from July 1, 2021, to Sep 20, 2022 (JNJ stock)

Fig. 13 Stock prices with buy and sell signals during the evaluation period from July 1, 2021, to Sep 20, 2022 (BIO stock)

5 Conclusion

In this article, we propose a novel multimodal deep learning method for financial time series forecasting. Our method addresses the primary challenge of next-day trend prediction in the financial sector through the joint exploitation of text and time series data. To this end, our model consists of a BERT-based branch fine-tuned on financial news and an LSTM branch that captures useful temporal patterns. Our extensive experiments on real-world stock market datasets showed that our proposed method is competitive with popular baselines in both uptrend and downtrend market conditions.

Our portfolio analysis showed that our method can be fruitfully adopted in a trading scenario, yielding positive gains in an uptrend phase and preserving capital in a downtrend phase, outperforming the other baselines as well as a buy-and-hold strategy. Possible limitations of our work include the reliance on a single source of news headlines and the fact that correlations between stocks are not exploited when predicting a single stock. In future research, we will study the exploitation of additional data modalities for next-day stock market trend prediction. Moreover, we will investigate the adoption of deep learning methods such as attention and graph convolution for this analytical task.