1 Introduction

Over the years, research on asset pricing has progressively evolved and alluded to the plausibility of the Capital Asset Pricing Model (CAPM) introduced by Treynor (1961, 1962) and further independently developed by Sharpe (1964), Lintner (1965) and Mossin (1966). Building on the pioneering model of Markowitz (1952) on modern portfolio theory and diversification, CAPM provides a persuasive approach to estimating expected return, giving credence to the market risk premium as the cardinal explanatory factor. This is in addition to the presence of a riskless asset and a systematic risk factor (beta). However, successive attempts at asset pricing have questioned the validity of CAPM due to ample empirical anomalies discovered by proponents of behavioural (or sentiment) factors such as Gibbons (1982), Lee et al. (1991), Baker & Wurgler (2006), Kumar & Lee (2006), Tetlock (2007), Hyde & Sherif (2010), and Kahneman & Tversky (2013). To this end, this study explores the inclusion of sentiment measures as a risk factor in asset pricing.

The sequential findings of Fama & French (1993, 1995, 1996) on the three-factor model provide the foremost statistical adjustment to the CAPM, with the introduction of size and value premiums as additional risk factors. Using the difference in returns between high and low book-to-market stocks and between small and big stocks, their approach shows that value and size help to explain the cross-sectional sensitivity of average stock returns to common risk factors. Meanwhile, the aptness of the three-factor model has continued to generate debate in subsequent research on asset pricing, despite the significant support it has enjoyed. He et al. (1996) document that the three-factor model accounts for only a small proportion of the cross-sectional variation of stock returns. Griffin (2002) also criticises the three-factor model as generally country-specific, with little explanatory power across countries, while other studies (Cakici et al. 2013; Hanauer & Linhart 2015) report that the model is inconsistent and inapplicable in markets other than the developed markets. Furthermore, Petkova (2006) observes that the three-factor model loses its ability to predict the cross-section of returns when innovations are included as part of the variables in the model. Moreover, Carhart (1997) opines that the inability of the three-factor model to recognise the winner-loser effect led to the development of the four-factor model. Hence, in addition to the size and value premiums proposed by Fama & French (1993), Carhart (1997) introduced monthly momentum (MOM) as an additional risk factor. Chen & Fang (2009) show that the four-factor model explains portfolio returns better than the three-factor model.

While many studies favour the use of factor models in asset pricing, a stream of research questions the over-reliance and absolute use of factor models. In their opinion, factor models may not adequately account for the discount or premium observed in prices of some classes of assets, particularly those listed in emerging markets. These studies (Lee & Gan 2006; Humpe & Macmillan 2009; Yaya & Shittu 2010; Kasman et al. 2011; Patel 2012; Ahmadi 2016) posit that apart from the factor models, macroeconomic variables such as interest rate, money supply, inflation, trade balance and unemployment also have varying significant impact on the pricing of assets. However, the outcomes of the studies have remained unclear about the most suitable fundamental proxy and the surge in number of pricing anomalies has offered reasons for additional factors. As a result, recent studies have introduced human psychology into the dynamics of stock markets. These research efforts have brought much popularity to the significance of behavioural finance, that is human sentiments, in explaining market outcomes.

Sakariyahu et al. (2021) classify sentiment-based studies into five categories. The first comprises market-based sentiment studies such as Baker & Wurgler (2007), Chen (2012), He et al. (2019) and Paterson et al. (2023). These studies demonstrate that market indices (e.g., trade volume, dividend, and liquidity) have informational content and provide signals to noise traders, who upon acting on these signals significantly disrupt stock price behaviour. The second is the media-based category, whose proponents believe that social media threads often inspire noise trading, which consequently impacts price direction. Proponents of this category include Rao & Srivastava (2012), Oliveira et al. (2013), Uhl (2014) and Dosumu (2023). The third category of sentiment studies uses internet-based sentiment measures (Da et al. 2014; Zhang et al. 2017; Trichilli et al. 2020; Sakariyahu et al. 2023). These studies document that irrational investors are propelled by the outcomes of online search facilities. The fourth category comprises non-fundamental sentiment measures, which use non-economic events such as politics, weather, or religion to explain asset prices. The advocates of these measures include Levy & Yagil (2011), Białkowski et al. (2012), Chang et al. (2012) and Goetzmann et al. (2014). Lastly, there is a category of researchers who use survey-based measures of sentiment. These studies argue that consumer or market surveys contain adequate information that highlights investors’ expectations of the market. The adherents include Chen (2011), Finter et al. (2012), Dalika (2014), and Salhin et al. (2016).

In contributing to the ongoing efforts on behavioural finance, we investigate whether the inclusion of sentiments as a risk factor in a model of asset returns would increase the forecast power. The introduction of a new sentiment variable creates a distinction from other sentiment studies. It is widely acknowledged in the literature that investor sentiment has no perfect or definitive measure; nonetheless, this study is motivated by the reality that persistent market anomalies largely reflect symptoms of human sentiments (Shleifer & Summers 1990; Sias et al. 2001). Thus, using laggards to leaders as a new proxy for sentiment, we propose that irrational investors express emotional apathy towards a particular stock, sector, or market when the proportion of stocks declining in value outnumbers the value-advancing stocks. This trading pattern eventually disrupts expected market standards by creating artificial price imbalance, in form of either over-escalating security prices during bullish periods or extremely suppressing them during bearish periods (Changsheng & Yongfeng 2012; Bathia & Bredin 2018). To better understand the predictive power of sentiments on stock returns, we examine the extant factor variables (such as risk premium, the size factor, the value factor, and the momentum factor) along with the index we create for investor sentiment.

Furthermore, we use a principal component analysis to form a raw sentiment index by incorporating some generic sentiment variables from the literature, such as liquidity, as measured by market turnover (Pan & Poteshman 2006), dividend premium (Baker & Wurgler 2006) and consumer confidence indicator (Lemmon & Portniaguina 2006). To distinguish between market-imposed sentiment and economic-cycle induced sentiment, this study further constructs a second sentiment index that removes economic cycle variations, following the approach of Bathia & Bredin (2018). For instance, liquidity and the ratio of lagging to leading stocks may fluctuate due to national or economic reasons. We therefore extract a clean sentiment index that is free from economic cycles. Specifically, we regress each of the sentiment proxies on the percentage growth in the industrial production index, consumer price index, broad money supply and base lending rate. The residuals from these regressions constitute a cleaner sentiment index. Our findings reveal that the sentiment index significantly predicts excess return in many of the portfolios formed. The inclusion of our sentiment index in the risk factors produces incremental abnormal returns, suggesting that the sentiment-augmented models forecast better than the extant risk factors.

Interestingly, despite the vast number of studies documenting sentiments, none has investigated our research direction, to the best of our knowledge. This study therefore focuses on the UK stock market by constructing and estimating alternative models to the extant asset pricing models. Our choice of the UK market is informed by its status in the assembly of developed markets. By trading volume, the UK stock market ranks first in Europe and second in the world (Federation of Exchanges, 2018). Thus, using UK data provides a yardstick to gauge the models in this study from the perspective of a developed market and further serves as alternative performance evaluation to similar research endeavour in other developed markets. Finally, to accentuate our findings, we propose a methodological substitute to the parametric tests of asset pricing, using Hansen & Jagannathan (1997) non-parametric model performance technique. This technique assesses the suitability of the models used in this study, thus providing answers to issues surrounding robustness of our findings.

The other parts of this study are organized as follows. Section 2 provides the literature review while Section 3 explains the methodology. Section 4 shows the empirical models and assumptions underpinning the study’s objectives. Section 5 presents the outcome of the preliminary analysis. Section 6 shows the empirical findings and Section 7 concludes the study with recommendations.

2 Literature review

A growing body of research shows that stock price variation can be generally attributed to the activities of two kinds of investors: arbitrageurs (rational investors) and noise traders (not fully rational investors) (Ross 1976; Shleifer & Summers 1990; Sias et al. 2001; Gagnon & Karolyi 2010; Ramiah et al. 2015). Arbitrageurs are tactical market participants who specialize in taking advantage of market inefficiencies by wielding different investment strategies. They are highly knowledgeable in diagnosing arbitrage opportunities and, because they often have access to huge resources, they take immediate actions which significantly affect stock price direction (Ross 1976). Essentially, arbitrageurs are research-oriented traders and are versed in technical or fundamental analysis. Due to the complexity of their strategies and the speed at which efficient markets readjust stock prices back to equilibrium, the goal of arbitrage is to expeditiously trade in assets whose prices do not reflect their true fundamental values, thus exploiting the discrepancies to earn risk-free profits. Although arbitrage is a by-product of market mispricing, arbitrageurs are important traders whose investment strategies provide liquidity to the stock market.

While arbitrageurs are typically skilled at their practice (Gagnon & Karolyi 2010), noise-traders, on the other hand, are not fully rational investors, often without experience or professional knowledge of trading but whose motive for trading is premised on illogical and invalid information (Sias et al. 2001). As the name suggests, a noise trader is a typical market novice who reacts to noise from the market, particularly high trade volume, and causes the market to digress from normal expected trading patterns, notwithstanding the actions of other rational investors. Although the impact of noise trading on price movement is not elusive in the finance literature (De Long et al. 1991; Black 1986; Sias et al. 2001), the severity of the impact on stock markets still remains a subject of debate.

The advent of online stock trading apps (e.g., Robinhood) has also contributed to the recent surge in noise trading. A large category of traders using these apps are those who follow volume and price signals, and they form the bulk of aggregate trading for the day (Kim et al. 2020). Considering the growing proportion of irrational trading relative to well-informed trading, studies have shown that previous stock market crises were connected to the roles of noise traders (De Long et al. 1989; Lee & Rui 2002; Scruggs 2007). Hence, in a market where noise traders outnumber arbitrageurs, it is possible for arbitrage opportunities to exist, as prices in the market may not reflect all available information. Conversely, an efficient market with an equal proportion of arbitrageurs and noise traders will prevent arbitrage opportunities because the rivalry between them will quickly return stock prices to their intrinsic values. However, given the funding restrictions usually faced by arbitrageurs, which limit the capital that could otherwise be used to exploit price inefficiencies, arbitrage opportunities are unlikely to disappear immediately. This is often referred to as limits to arbitrage (Shleifer & Vishny 1997) and tends to undermine the plausibility of the efficient market hypothesis. Hence, more studies are motivated to model asset prices from a behavioural perspective, given that traditional models and fundamental factors do not exhaustively explain erratic movements in stock prices.

The theory of noise trading introduced by Black (1986) and subsequently developed by Trueman (1988) set the pace for the recognition of behavioural finance (or investors’ sentiments) in asset pricing. The theory emphasises that sentiments arising from the noise around financial markets make market observations imperfect. Black (1986) specifically notes that noise trading, which can be attributed to market uncertainty, provides information to investors on market liquidity and surreptitiously creates an arbitrage opportunity for moving prices of risky assets back to their fundamental values. Although Fama (1998) opines that pricing anomalies only exist by chance, largely because of arbitraging and methodological imperfections, Brown & Cliff (2004) and Baker & Wurgler (2006) have demonstrated that the motives of market arbitrageurs are quite dissimilar to those of other stock market participants and, by extension, that the unpredictability of changes in investors’ sentiments tends to create adverse cross-sectional variations in stock returns, thus limiting the activities of arbitrageurs. This continuous debate motivates numerous studies to channel attention towards the key drivers of stock prices.

Testing for the time-horizon (short and long term) impact of sentiment on assets in two different environments, Ling et al. (2010) examine the relation between investor sentiment and returns in public and private markets in the US. Applying vector autoregressive (VAR) models to capture the short-run dynamics, they show that investor sentiment has a positive relationship with returns in the short run, with a larger and more consistent magnitude of returns in public markets than in private markets. Over the long horizon, they find that a negative relationship exists between investor sentiment and returns, with prices consistently reverting to their fundamental values. Their study concludes that private markets appear to be more persistently characterised by sentiment-induced mispricing than public markets. In another related study, Brown & Cliff (2005) explore the link between asset valuation and investor sentiment in the US using survey-based sentiment proxies. Their study documents that market mispricing errors are positively related to investor sentiment and that asset valuation is affected by sentiment. They provide evidence that, over a multi-year horizon, future returns are negatively related to sentiment.

Also using survey data for sentiment, Ho & Hung (2009) examine the importance of investor sentiment in asset pricing. Using the monthly equity data of the New York Stock Exchange (NYSE) and the American Stock Exchange (AMEX) from the Center for Research in Security Prices (CRSP) and COMPUSTAT datasets for the period from July 1964 to December 2005, they assess whether sentiment proxies could improve the impacts of the risk-factor models (size, value, liquidity, and momentum effects) on risk-adjusted returns of individual stocks. Their result shows that in the conditional CAPM, the size effect becomes less significant, and that it becomes insignificant in other models. Meanwhile, they also reveal that sentiment-augmented models outperform the extant factor models in capturing stock anomalies and explaining the dynamics of expected stock returns, thus concluding that investor sentiment plays a significant role in asset pricing.

In a similar study, Bathia & Bredin (2018) explain the importance of investor sentiment measures in conditional asset pricing models. Using monthly data for the period January 1980 to December 2014, they specifically test whether incorporating sentiment measures such as IPO first-day returns, IPO volume, closed-end fund discount, equity fund flow, equity put-call ratio, dividend premium and change in margin debt could improve the performance of risk factors on risk-adjusted returns of U.S. individual stocks. Their findings disclose a significant impact of these measures on the asset pricing models.

Assessing the impact of investor sentiment in the oil and gas industry, Zhu et al. (2020) document significant cross-sectional effects of sentiment on returns in the industry. Using financial statement data of common stocks listed on the NYSE, AMEX, and NASDAQ, they show that several anomalies (specifically 13 out of 15) are inherent in the oil and gas industry. Furthermore, their analysis reveals that investor sentiment has a significantly positive impact on four of these prominent capital market anomalies. They therefore conclude that investor sentiment helps to explain asset valuation. Liang et al. (2017) develop a framework to explain the effect of sentiment on asset pricing in the Chinese stock market. Using sentiment proxies such as investors’ limited attention, anchoring, and other macroeconomic variables, they find and conclude that retail investors are often forced to pay more cognitive loss due to their insensitivity to market sentiment. They provide evidence that a bullish market due to a higher level of investor rationality increases stock demand and thus pushes prices higher.

In another China-related study, Xu & Green (2013) adopt monthly data from the Shanghai and Shenzhen stock markets from January 1997 to December 2007 to study the impact of investor sentiment on stock returns in China. Using the three-factor Fama–French model as a benchmark, their study distinguishes between positive and normal sentiments to explain the mispricing of returns. They show that the inclusion of sentiment factors such as turnover, the advances/declines ratio and the dividend premium reduces the impact of the benchmark factors, suggesting that the Fama–French three-factor model does not comprehensively explain asset pricing in China. Their findings further reveal that positive and normal sentiments differ in effect and that sentiment appears to affect smaller companies more than larger ones. They therefore conclude that investor sentiment is a vital factor for explaining pricing anomalies in the Chinese market. Da et al. (2014) use information about households in the US to construct sentiment proxies. Splitting the information obtained into fear and economic attitudes, they predict the impact of household sentiments on short-term return reversals and volatility. Their results show that the sentiment proxies broadly predict aggregate market returns with a mean reversal. They conclude that the market exhibits sentiment-induced temporary mispricing with a large effect on stocks that are susceptible to investor sentiments.

Decomposing investor sentiment into call and put, Yang & Zhang (2013) provide evidence of the effect of sentiment on asset pricing. By combining the two sentiment measures in a conditional framework, their study shows that a sentiment-augmented asset pricing model could explain some anomalies in the stock market. Das et al. (2015) also explore the role of investor sentiment in institutional trading behaviour in the REIT market and its subsequent impact on asset pricing. Splitting the data period into a pre-global-financial-crisis period, a crisis period and a post-crisis period, they test two alternative theories (flight to liquidity and style investing) to evaluate how sentiment induces trading behaviour in the real estate market. For the three periods, their findings show that investor sentiment plays a major role in the movement of capital around the real estate market, thus lending credence to the two theories they adopt.

As prior studies have noted, investor sentiment can influence stock market behaviour and, by extension, asset pricing. Despite the expansive research on sentiments, the findings have varied across sentiment measures and stock markets. This motivates further efforts to examine the role of sentiments in pricing anomalies. Introducing a fresh sentiment proxy in this study is a novelty in the literature; we anticipate that our measure would be subsequently adopted as a systematic risk factor in asset pricing.

3 Data and variables

This study assesses the economic importance of noise trading, that is, investors’ sentiments, in asset pricing. To establish the significance of the sentiment factor and other risk factors, we use data from different sources and cover the period from January 1993 to December 2020. We download data relating to price and market capitalisation of all UK listed firms (FTSE All-Share) from the London Stock Exchange (LSE) and www.investing.com, while data relating to book values are downloaded from Datastream and Bloomberg. In constructing the portfolios, we use only non-financial companies with positive book-to-market values, thus arriving at a total of 325 companies within the sample period. For quality control, before merging the data, we examined the measures and standards adopted by these sources for data classification. For instance, we observe that the three sources have similar data designs for market and macroeconomic indices, which supports consistency in our data structure. The data composition and sources are described in the following subsections.

3.1 Risk factors

Data on the risk-free rate (Rft), monthly portfolio returns (Rpt), market return (Rmt) and market risk premium (Rmt−Rft) are sourced from Datastream and Bloomberg. Based on specific characteristics such as size (SMBt), value (HMLt) and momentum (UMDt), we also obtain data on both equally-weighted (EW) and value-weighted (VW) test portfolios from the Xfi Centre for Finance and Investment at the University of Exeter, which updates the observations in Gregory et al. (2013). For the test assets, we use the excess returns of 5 book-to-market and 5 size portfolios, then the excess returns of 10 book-to-market and 10 size portfolios.

3.2 Sentiment factors

Several indices have been adopted for investor sentiment in the literature, with some indirect observations. To complement the extant measures, we propose the ratio of lagging to leading stocks as a new sentiment measure. This proxy, together with other sentiment measures such as liquidity (Baker & Wurgler 2007), dividend premium (Simões Vieira 2011) and consumer confidence indicator (Zouaoui et al. 2011), is used to form two composite indices of sentiment. The new sentiment proxy (the ratio of lagging to leading stocks) is first introduced as a risk factor into the pricing model; the other sentiment variables are later introduced into the model as a composite risk index, after constructing both raw and clean sentiment indices. Data for these variables are obtained from Datastream.
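
As an illustration of the new proxy, the ratio can be sketched for a single month as the number of declining stocks divided by the number of advancing stocks. This is a minimal sketch under our own assumptions: the function name `laggards_to_leaders`, the treatment of unchanged stocks (ignored), and the handling of months with no advancing stock (returns `None`) are hypothetical choices, not a definition taken from the data provider.

```python
from typing import List, Optional


def laggards_to_leaders(monthly_returns: List[float]) -> Optional[float]:
    """Ratio of declining stocks (laggards) to advancing stocks (leaders).

    monthly_returns holds one return per stock for a single month.
    Stocks with a zero return are counted in neither group (assumption).
    Returns None when no stock advanced, to avoid dividing by zero.
    """
    laggards = sum(1 for r in monthly_returns if r < 0)
    leaders = sum(1 for r in monthly_returns if r > 0)
    if leaders == 0:
        return None
    return laggards / leaders


# Illustrative numbers only, not sample data.
bearish_month = laggards_to_leaders([-0.02, -0.01, 0.03, -0.04])  # 3 laggards, 1 leader
balanced_month = laggards_to_leaders([0.01, -0.01, -0.02, 0.02])  # 2 laggards, 2 leaders
```

A value above one indicates a month in which declining stocks outnumber advancing stocks.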

3.3 Macroeconomic factors

We observe that the extant sentiment indices suffer from the vicissitudes of the economy, as they are highly correlated with macroeconomic cycles. Hence, in constructing the clean sentiment index mentioned earlier, we obtain data on macroeconomic variables and regress each of the above sentiment proxies on them to generate residuals. The residuals of these sentiment proxies are later used in a second principal component analysis, whose output we refer to as the clean sentiment index. Data for macroeconomic variables such as industrial production index (IPI), consumer price index (CPI), broad money supply (M3) and base lending rate (BLR) are obtained from Bloomberg and Datastream.
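
The residual step can be sketched as an ordinary least squares regression of one sentiment proxy on the macroeconomic variables, keeping the residuals as the cleaned proxy. The sketch below assumes that the proxy is the dependent variable and that an intercept is included; the helper name `orthogonalise` and the simulated series are hypothetical and stand in for the actual data.

```python
import numpy as np


def orthogonalise(proxy: np.ndarray, macro: np.ndarray) -> np.ndarray:
    """Residuals from an OLS regression of one sentiment proxy on macro variables.

    proxy has shape (T,); macro has shape (T, k). An intercept is added here.
    """
    X = np.column_stack([np.ones(len(proxy)), macro])
    coef, *_ = np.linalg.lstsq(X, proxy, rcond=None)
    return proxy - X @ coef


# Simulated stand-ins for growth in IPI, CPI, M3 and BLR (illustration only).
rng = np.random.default_rng(0)
T = 120
macro = rng.normal(size=(T, 4))
proxy = 0.5 * macro[:, 0] + rng.normal(size=T)  # synthetic sentiment proxy
clean_proxy = orthogonalise(proxy, macro)
```

By construction, the residuals have zero mean and are uncorrelated with each macroeconomic regressor in the sample, which is the sense in which the cleaned proxy is free from economic-cycle variation.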

4 Empirical models

In examining the significance of sentiment measures in asset pricing, we begin our model specification with the single-factor model (capital asset pricing model), then we introduce the Fama-French three-factor model and the Carhart four-factor model. Finally, we integrate sentiment factors into the four-factor model, starting with the basic sentiment variable (ratio of laggards to leaders), then the raw and clean sentiment indices, which collectively capture liquidity, dividend premium and consumer confidence indicator, the latter index after controlling for macroeconomic variation. An empirical expression of the various models used in this study is shown in the following subsections.

4.1 The risk factor models

Our empirical outlook is founded on the standard capital asset pricing model. The model is mathematically expressed as:

$$R_{pt} - R_{ft} = R_{ft} + \beta [R_{mt} - R_{ft}] + \epsilon_{pt}$$
(1)

Rpt is the return on portfolio p for month t; Rft is the risk-free rate of return; the difference between the two is the excess return α; Rmt is the return of a well-diversified market index; Rmt−Rft is referred to as the market risk premium and εpt is the residual term.

In addition to the single factor model, Fama & French (1993) initiated two risk factors to capture size and value. The equation is expressed as follows:

$$R_{pt} - R_{ft} = R_{ft} + \beta [R_{mt} - R_{ft}] + \beta_{p,SMB} SMB_{t} + \beta_{p,HML} HML_{t} + \epsilon_{pt}$$
(2)

SMBt and HMLt refer to the size and value factors of the portfolio respectively, at a particular month.

Augmenting the three-factor model above, Carhart (1997) proposed an additional risk factor, the ‘winner minus loser’ factor, to capture momentum effect. The four-factor model is expressed as:

$$R_{pt} - R_{ft} = R_{ft} + \beta [R_{mt} - R_{ft}] + \beta_{p,SMB} SMB_{t} + \beta_{p,HML} HML_{t} + \beta_{p,UMD} UMD_{t} + \epsilon_{pt}$$
(3)

UMDt refers to the momentum factor at a particular month.

For the sake of simplicity, we wrap up the above risk factors into a single composite equation called risk factor variables (RFV). This is shown below:

$$R_{pt} - R_{ft} = R_{ft} + \beta [RFV_{t}] + \epsilon_{pt}$$
(4)

where RFVt encloses each of the above risk factors at a particular month.
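
To make the nesting of these specifications concrete, the sketch below estimates the single-factor, three-factor and four-factor regressions by ordinary least squares on simulated data. It is a minimal illustration under our own assumptions: the intercept is estimated freely, the helper `fit_factor_model` is hypothetical, and the simulated factor series and coefficients are not estimates from the sample.

```python
import numpy as np


def fit_factor_model(excess_ret: np.ndarray, factors: np.ndarray):
    """OLS time-series regression of portfolio excess returns on risk factors.

    Returns a tuple (intercept, slopes, r_squared).
    """
    X = np.column_stack([np.ones(len(excess_ret)), factors])
    coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
    resid = excess_ret - X @ coef
    total_ss = ((excess_ret - excess_ret.mean()) ** 2).sum()
    r_squared = 1.0 - (resid ** 2).sum() / total_ss
    return coef[0], coef[1:], r_squared


rng = np.random.default_rng(42)
T = 336  # monthly observations, January 1993 to December 2020
# Simulated columns: market risk premium, SMB, HML, UMD (illustration only).
factors = rng.normal(0.0, 0.04, size=(T, 4))
true_betas = np.array([1.0, 0.3, -0.2, 0.1])
excess = 0.001 + factors @ true_betas + rng.normal(0.0, 0.01, size=T)

# Each model adds columns to the previous one, so the regressions are nested.
results = {
    name: fit_factor_model(excess, factors[:, :k])
    for name, k in [("CAPM", 1), ("FF3", 3), ("Carhart4", 4)]
}
```

Because the models are nested, the in-sample R-squared cannot fall as factors are added; whether the additional factor is statistically or economically meaningful is a separate question.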

4.2 Sentiment-augmented risk factors

The next set of models incorporates our sentiment proxies into the Carhart (1997) four-factor model, one at a time. These proxies are the basic sentiment variable, the raw sentiment index generated from the principal component analysis (PCA) and the clean sentiment index, also generated from PCA after controlling for macroeconomic factors.

The model which embeds the basic sentiment variable BSV is shown below:

$$R_{pt} - R_{ft} = R_{ft} + \beta [RFV_{t}] + \beta_{p,BSV} BSV_{t} + \epsilon_{pt}$$
(5)

where BSVt refers to the basic sentiment factor at a particular month.

4.3 Constructing sentiment index using principal component analysis (PCA)

Principal Component Analysis (PCA) is a statistical technique that transforms a set of linearly correlated observations onto orthogonal dimensions while preserving the inherent features of the observations. Following Baker & Wurgler (2006) and Chen & Sherif (2016), this study computes a principal component analysis (PCA) that generates a first-stage sentiment index capturing the common component in the basic sentiment variable (BSV), consumer confidence indicator (CCI), dividend premium (DP) and liquidity (LIQ). Furthermore, both the eigenvalues and the variances of the four components are calculated.

$$PCA_{1} = \beta_{11} (X_{1}) + \beta_{12} (X_{2}) + \cdots + \beta_{1P} (X_{P})$$
(6)

where

$$PCA_{1} = \beta^{T} (X)$$
(7)

On the first principal component, PCA1 represents the subject’s score; β1P stands for the regression coefficient (or weight) for the observed variable P, and XP represents the subject’s score on the observed variable P. In calculating the first principal component, the weights are chosen so that the component captures the highest variance in the observations. To prevent arbitrarily large values for the weights (β11, β12, ..., β1P), which would inflate the variance of PCA1, this study restricts the weights so that their sum of squares is 1.
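
The unit-norm restriction and the link between eigenvalues and explained variance can be sketched with an eigendecomposition of the correlation matrix. This is an illustrative sketch only: the function `principal_components` is hypothetical, the four simulated series stand in for BSV, CCI, DP and LIQ, and we assume the proxies are standardised before the analysis.

```python
import numpy as np


def principal_components(X: np.ndarray):
    """PCA on the correlation matrix of the columns of X.

    Returns standardised data Z, eigenvalues in descending order,
    unit-norm weight vectors (as columns) and each component's variance share.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return Z, eigvals, eigvecs, eigvals / eigvals.sum()


# Four simulated proxies sharing one common driver (illustration only).
rng = np.random.default_rng(1)
common = rng.normal(size=200)
X = np.column_stack([common + rng.normal(scale=0.8, size=200) for _ in range(4)])

Z, eigvals, weights, shares = principal_components(X)
pca1 = Z @ weights[:, 0]  # scores on the first principal component
pca2 = Z @ weights[:, 1]  # scores on the second principal component
```

With four standardised variables the eigenvalues sum to four, so each component's share of variance is its eigenvalue divided by four, and the variance of the first component's scores equals the largest eigenvalue.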

$$PCA_{2} = \beta_{21} (X_{1}) + \beta_{22} (X_{2}) + \cdots + \beta_{2P} (X_{P})$$
(8)

To eliminate macroeconomic variations that may be rooted in the raw sentiment proxies, this study further generates a clean sentiment index by obtaining the residuals of each sentiment proxy after regressing it on the macroeconomic variables (IPI, CPI, M3 and BLR). Again, this study calculates both the eigenvalues and the variances of the four components for the clean sentiment index. In a similar pattern, the second principal component is produced with an orthogonal transformation that keeps it uncorrelated with the first principal component while capturing the next greatest possible variance. The outcomes of the principal component analysis are shown in Table 1 below.

Table 1 Results of first and second stage principal component analyses

4.3.1 Raw sentiment-index model

The results of the PCA show that the first two principal components are well suited to construct a raw sentiment index. They have eigenvalues of 1.8685 and 1.2775 and variances of 0.4671 and 0.3194 respectively. Hence, the first two principal components explain about 79 percent of the overall variance for all variables. The output of the raw sentiment index is provided below:

$$RSI = (0.4671/0.7865)\;Component_{1} + (0.3194/0.7865)\;Component_{2}$$
(9)
$$Component_{1} = - 0.3276BSV + 0.4749CCI - 0.5874DP + 0.5675LIQ$$
(10)
$$Component_{2} = 0.6943BSV + 0.5209CCI - 0.3271DP - 0.3736LIQ$$
(11)
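
The weighting in Eq. (9) can be checked with a few lines of arithmetic. The sketch below uses only the variance shares and loadings reported in Eqs. (9) to (11); the variable names are ours, and the implied loading of each proxy on RSI follows by substituting Eqs. (10) and (11) into Eq. (9).

```python
# Variance shares and loadings as reported in Eqs. (9)-(11).
variance_share = {"Component1": 0.4671, "Component2": 0.3194}
loadings = {
    "Component1": {"BSV": -0.3276, "CCI": 0.4749, "DP": -0.5874, "LIQ": 0.5675},
    "Component2": {"BSV": 0.6943, "CCI": 0.5209, "DP": -0.3271, "LIQ": -0.3736},
}

# Each component is weighted by its share of the variance explained jointly.
total_share = sum(variance_share.values())
weights = {name: share / total_share for name, share in variance_share.items()}

# Implied loading of each proxy on RSI after substituting the components.
rsi_loadings = {
    proxy: sum(weights[comp] * loadings[comp][proxy] for comp in weights)
    for proxy in loadings["Component1"]
}

for proxy, value in rsi_loadings.items():
    print(f"{proxy}: {value:.4f}")
```

The two weights are roughly 0.59 and 0.41 and sum to one, so RSI is a variance-weighted average of the two retained components.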

The raw sentiment index is then integrated into the Carhart (1997) four-factor model as follows:

$$R_{pt} - R_{ft} = R_{ft} + \beta [RFV_{t}] + \beta_{p,RSI} RSI_{t} + \epsilon_{pt}$$
(12)

Here RSIt refers to the raw sentiment factor at a particular month.

4.3.2 Clean sentiment-index model

The results of the clean sentiment index also favour the first two principal components. They have eigenvalues of 1.620 and 1.079 and variances of 0.405 and 0.270 respectively, hence explaining about 67.5 percent of the overall variance for all variables. The output of the clean sentiment index is shown below:

$$CSI = (0.405/0.675)\;Component_{1} + (0.270/0.675)\;Component_{2}$$
(13)
$$Component_{1} = - 0.071BSV + 0.547CCI - 0.635DP + 0.541LIQ$$
(14)
$$Component_{2} = 0.876BSV + 0.298CCI - 0.141DP - 0.351LIQ$$
(15)
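
As a worked illustration, the weights in Eq. (13) reduce to 0.405/0.675 = 0.6 and 0.270/0.675 = 0.4. Substituting Eqs. (14) and (15) into Eq. (13) then gives the implied loading of each proxy on the clean index. This expansion is our own arithmetic from the reported coefficients and assumes the components enter CSI exactly as written in Eq. (13):

$$CSI = 0.6\;Component_{1} + 0.4\;Component_{2}$$

$$CSI = [0.6( - 0.071) + 0.4(0.876)]BSV + [0.6(0.547) + 0.4(0.298)]CCI + [0.6( - 0.635) + 0.4( - 0.141)]DP + [0.6(0.541) + 0.4( - 0.351)]LIQ$$

$$CSI = 0.3078BSV + 0.4474CCI - 0.4374DP + 0.1842LIQ$$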

This index is incorporated in the model as follows:

$$R_{pt} - R_{ft} = R_{ft} + \beta [RFV_{t}] + \beta_{p,CSI} CSI_{t} + \epsilon_{pt}$$
(16)

where CSIt refers to the clean sentiment factor at a particular month.

4.3.3 Performance of asset pricing models

It is widely acknowledged in the literature that estimates produced by asset pricing models are only approximations of reality. Therefore, as a robustness test, it is logical to compare the performance of the competing asset pricing models used for estimation, to find out which model provides the best estimates. Several techniques have been employed for this task; however, the measure proposed by Hansen & Jagannathan (1997) (H–J) has enjoyed dominance in the finance literature. The H–J distance is a statistical measure that compares the economic performance of a set of competing models, diagnosing misspecification and identifying the most suitable among them. Hence, the H–J distance can be best described as a measure revealing the maximum pricing error associated with a portfolio. It is measured as:

$$\delta = \min \left\| m - m^{*} \right\|,$$
(17)

where $m$ represents the fitted values and $m^{*}$ represents the actual values.

Hansen & Jagannathan (1997) also show that, for a proposed asset pricing model, the distance equals the maximum pricing error over portfolio payoffs of unit norm:

$$\delta = \max \left| \pi(\xi) - \pi^{y}(\xi) \right|,\;\left\| \xi \right\| = 1$$
(18)

where $\pi(\xi)$ and $\pi^{y}(\xi)$ are the asset prices implied by the respective asset pricing models. Chen & Sherif (2016) further show that the H–J distance measures the mean square distance between the fitted and actual values. Mathematically, the H–J minimum distance can be denoted as $\varepsilon(m - m^{*})$ such that:

$$m - m^{*} = [m(\pi)R - 1]\,E(RR)^{-1} (R)$$
(19)

We further break down the above equation into two parts such that:

$$[\varepsilon [m(\pi )R - 1]] = \alpha$$
(20)
$$E(RR)^{ - 1} = S$$
(21)

where R represents excess returns and S is the inverse of the second-moment matrix of returns.
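
To make the two parts in Eqs. (20) and (21) concrete, the sketch below computes a sample H–J distance as the square root of the quadratic form of the pricing errors $\alpha$ weighted by $S$. This is the standard sample form of the measure rather than the study's own code. It assumes gross returns, so that the pricing error is the sample mean of $mR$ minus 1, matching the $-1$ in Eq. (19). The function name `hj_distance` and the data are hypothetical.

```python
import numpy as np


def hj_distance(m, R):
    """Sample Hansen-Jagannathan distance.

    m: (T,) realisations of the candidate stochastic discount factor
    R: (T, N) gross returns on N test assets
    """
    T = R.shape[0]
    alpha = (m[:, None] * R).mean(axis=0) - 1.0   # pricing errors E[mR] - 1
    S = np.linalg.inv(R.T @ R / T)                # inverse second-moment matrix
    return float(np.sqrt(alpha @ S @ alpha))


# Synthetic example: five assets and a constant discount factor.
rng = np.random.default_rng(2)
R = 1.0 + rng.normal(0.005, 0.04, size=(120, 5))
m = np.full(120, 0.995)
delta = hj_distance(m, R)
print(delta)
```

A model with a smaller distance prices the test assets with a smaller maximum error, which is the sense in which the competing models are ranked.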

5 Preliminary analysis

In this study, the explanatory variables are classified into three groups: factor variables, sentiment variables and macroeconomic variables. Tables 2, 3 and 4 show the descriptive statistics for the explanatory variables and the portfolios, reporting the mean, standard deviation, minimum and maximum. Table 2, which summarises the explanatory variables over the observed monthly periods, shows that among the factor variables (CAPM, SMB, HML, UMD), UMD has the highest descriptive values. The monthly momentum effect (UMD) has a mean of 0.01, a standard deviation of 0.049, a minimum of −0.250 and a maximum of 0.160. By contrast, the monthly figures for size (SMB) and value (HML) show a similar mean (0.002) and standard deviation (0.034) but different minimum and maximum values. The results suggest that the average excess return associated with the momentum factor is much higher than that of the other factors, which perhaps justifies its introduction as a risk factor in the pricing model. Among the sentiment variables, liquidity has the highest mean (18.34), minimum (16.85) and maximum (19.57), while the consumer confidence indicator has the highest standard deviation (8.12). We therefore interpret the results to mean that, on average, market turnover in the UK is much higher than the other sentiment variables used in the study. This is not unexpected given the volume of transactions executed in the UK stock market each month. For the macroeconomic variables, broad money supply shows the highest mean, minimum and maximum values (14.08, 13.12 and 14.81), while the basic lending rate has the highest standard deviation (2.44). The results in Tables 3 and 4, which capture the descriptive statistics for the portfolios, are reported in similar fashion.

Table 2 Summary statistics for the explanatory variables
Table 3 Summary statistics for the 5 BTM and size portfolios
Table 4 Summary statistics for the 10 BTM and size portfolios
Table 5 Pairwise correlation of explanatory variables

Table 5 shows the correlation coefficients for the explanatory variables. We report cross-relationships among the intersecting components: risk factors, sentiment measures and macroeconomic variables. Our main interest is in the sign and size of the relationships between the sentiment variables and the other variables. First, across the sentiment measures (BSV, CCI, DP and LIQ) and risk factors (CAPM, SMB, HML and UMD), the correlation coefficients are low but significant. We interpret this to imply that, despite the significant interaction among the proxies, the cross-relationships are not strong enough to cause severe mutual impact. Hence, these variables can still be used as explanatory variables in the model without further adjustment. Interestingly, across the factor variables and macroeconomic variables, only HML and CPI show a significant negative correlation (−0.13). Other variables within the same intersection show very low and insignificant correlation coefficients. Meanwhile, the sentiment variables and macroeconomic variables show strong and significant relationships. Including these variables in the models without further modification could produce spurious results. We therefore extract clean sentiment indices by regressing the sentiment measures on each of the macroeconomic variables and taking the residuals. The residuals are then cross-correlated with the macroeconomic variables and show very weak relationships. Hence, the residuals are used in a principal component analysis (PCA), whose components represent the new sentiment factors. Nevertheless, we still run the raw sentiment index alongside the other models for the sake of statistical comparison.
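
The orthogonalisation and PCA steps described above can be sketched as follows. This is a hedged illustration under stated assumptions: the sentiment proxies are regressed jointly on all macroeconomic variables (the text may instead intend separate regressions), the PCA is run on the correlation matrix of the residuals, and the index weights the first two components by their shares of explained variance. The function names `orthogonalise` and `pca_index` are hypothetical and the data are synthetic.

```python
import numpy as np


def orthogonalise(sentiment, macro):
    """Residuals from regressing each sentiment proxy on the macro variables.

    sentiment: (T, P) raw sentiment proxies
    macro:     (T, M) macroeconomic variables
    """
    T = sentiment.shape[0]
    X = np.column_stack([np.ones(T), macro])
    coef, *_ = np.linalg.lstsq(X, sentiment, rcond=None)
    return sentiment - X @ coef


def pca_index(data, n_components=2):
    """Variance-weighted combination of the leading principal components."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    eigval, eigvec = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    order = np.argsort(eigval)[::-1][:n_components]   # largest eigenvalues first
    share = eigval[order] / eigval.sum()              # variance explained by each
    scores = z @ eigvec[:, order]
    return scores @ (share / share.sum()), share


# Synthetic data: four proxies that partly load on three macro variables.
rng = np.random.default_rng(1)
T = 200
macro = rng.normal(size=(T, 3))
sentiment = macro @ rng.normal(size=(3, 4)) + rng.normal(size=(T, 4))

resid = orthogonalise(sentiment, macro)
index, share = pca_index(resid)
print(share)
```

By construction the residuals are uncorrelated with the macroeconomic regressors, which is the property the cross-correlation check in the text verifies.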

6 Empirical findings

We begin our empirical analysis with the regression outputs in Tables 6, 7, 8 and 9. The tables show the estimates of alpha and the t-statistics obtained for each variant of the pricing models. As this study employs six models, each table reports a pair of regression estimates (alpha value and t-statistic) for each of the six models against the individual portfolios. For clarity, alpha values are reported in the first column, while the t-statistics, in parentheses, appear in the second column. Below each regression output, the Gibbons et al. (1989) GRS test statistic is reported. The GRS test evaluates the null hypothesis that the alpha values for each model are jointly equal to zero; the p-values of the GRS test are reported collectively. For a model to be considered efficient, the p-value of the GRS statistic must be insignificant, so that the null is not rejected. The regression output is extended to report the mean adjusted R², the mean alpha value and the mean standard error of alpha. Following the approach of Gregory et al. (2013), the magnitudes of the mean adjusted R² for the six regression models are compared. As stated earlier, apart from adjusted R², several measures are employed in the literature to identify the most suitable model for asset pricing. However, the technique proposed by Hansen & Jagannathan (1997) (H–J) focuses on the behavioural patterns of prediction and improves the accuracy of asset pricing models. Hence, to better understand the strength of the models, this study also adopts the H–J test to evaluate the economic performance of the five models (see Footnote 1).
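
For readers unfamiliar with the GRS test, the sketch below computes the statistic in the finite-sample form given in standard textbooks, using maximum-likelihood covariance estimates (divided by T). It is not the study's own implementation. The function name `grs_statistic` and the data are hypothetical. Under the null and normally distributed errors, the statistic follows an F distribution with N and T − N − K degrees of freedom, from which the p-value is read.

```python
import numpy as np


def grs_statistic(excess_ret, factors):
    """GRS F-statistic for H0: all N intercepts are jointly zero.

    excess_ret: (T, N) excess returns on N test portfolios
    factors:    (T, K) factor realisations
    """
    T, N = excess_ret.shape
    K = factors.shape[1]
    X = np.column_stack([np.ones(T), factors])
    coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
    alpha = coef[0]                                   # (N,) estimated intercepts
    resid = excess_ret - X @ coef
    sigma = resid.T @ resid / T                       # residual covariance (MLE)
    mu = factors.mean(axis=0)
    omega = np.atleast_2d(np.cov(factors, rowvar=False, bias=True))  # factor covariance (MLE)
    quad_alpha = alpha @ np.linalg.solve(sigma, alpha)
    quad_mu = mu @ np.linalg.solve(omega, mu)
    return (T - N - K) / N * quad_alpha / (1.0 + quad_mu)


# Synthetic data: five portfolios with zero true alpha, then a shifted copy.
rng = np.random.default_rng(3)
T, N, K = 240, 5, 4
factors = rng.normal(0.005, 0.04, size=(T, K))
betas = rng.normal(1.0, 0.3, size=(K, N))
returns_null = factors @ betas + rng.normal(0.0, 0.01, size=(T, N))

stat_null = grs_statistic(returns_null, factors)
stat_shifted = grs_statistic(returns_null + 0.05, factors)  # adds alpha = 0.05
print(stat_null, stat_shifted)
```

A large statistic (small p-value) rejects the null of jointly zero alphas, so a model "passes" the test when the statistic is small and the p-value insignificant.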

Table 6 Regression output for 5 book to market portfolios
Table 7 Regression output for 5 size portfolios
Table 8 Regression output for 10 book to market portfolios
Table 9 Regression output for 10 size-portfolios

We start our interpretation with the regression outputs in Table 6 for the 5 BTM portfolios. The results for the equally and value-weighted portfolios reveal similar indicators. The Fama-French 3-factor model and the raw sentiment index fail the GRS test for the weighted portfolios: their p values are significant, so the null hypothesis that the alpha estimates of the two models (3F and RSI) are jointly zero is rejected.

CAPM, 4F, 4F_BSV and 4F_CSI pass the GRS test in both the equally and value-weighted portfolios. The results also show that including the basic sentiment variable in the model gradually magnifies the mean alpha, implying that excess return is improved by including the sentiment variable in the model specification. The adjusted R² falls from 5.30 for the CAPM to 3.70 for the 4F_CSI model, and the H–J error decreases from 0.032 for the CAPM to 0.011 for the clean sentiment index. We therefore infer that our sentiment measure improves excess returns in this portfolio and, based on the H–J estimate, that the model with the clean sentiment index performs better. Similar patterns appear in the value-weighted book-to-market portfolio.

Akin to the findings on the 5 BTM portfolios, Table 7 shows the regression outputs for the 5 size portfolios. Five models pass the GRS test; the only exception is the 4F RSI, which has a significant p value and thus fails the test. The five other models (CAPM, 3F, 4F, 4F BSV and 4F CSI) pass the GRS test, suggesting that their alpha estimates cannot be jointly distinguished from zero. Surprisingly, for both the equally and value-weighted portfolios, the mean alphas of the five models revert. For instance, in the equally weighted portfolios, the mean alpha is 0.82 for the CAPM, falls to 0.66 under the 3F model and rises to 0.80 when the clean sentiment index is introduced. The mean adjusted R² of the two sets of portfolios also changes when the sentiment variable is included. To illustrate, the mean adjusted R² for the equally-weighted portfolio is 5.37 under the CAPM and 4.47 under the 4F CSI; for the value-weighted size portfolio, it is 5.06 under the CAPM and 5.01 under the 4F CSI. Also, the H–J distance error declines from 0.031 to 0.011 under the equally-weighted portfolio, while the values reported for the value-weighted portfolio are 0.026 and 0.060. By implication, the use of both raw and clean sentiment indices in the model specification produced efficient results, with an excess return higher than those of other models.

Table 8 covers the regression outputs for the 10 BTM portfolios. For both the equally and value-weighted constructs, three models (3F, 4F BSV and 4F CSI) pass the GRS test, with insignificant p values. The introduction of the sentiment variable does not significantly improve the estimates of the mean alpha, but the adjusted R² of the sentiment-augmented models decreases steadily from 5.66 for the CAPM to 5.23 for the 4F CSI model. The same holds for the H–J error. The output for the 10 size portfolios in Table 9 shows an almost identical pattern, but with only the 4F BSV passing the GRS test for both the equally and value-weighted portfolios. It produces a higher p value, indicating that its alpha estimates are not jointly distinguishable from zero. In summary, the regression outputs suggest that including sentiment measures as risk factors significantly improves the extant models.

7 Robustness: NYSE composite index

The previous estimations were based on portfolios formed within the UK All Share Index. To further test the accuracy of our model, we formed portfolios using the NYSE composite index. Specifically, we use six portfolios formed on size and BTM. Table 10 reports the ability of the six asset pricing models to describe the cross-sectional excess returns of these portfolios. For both the equally and value-weighted portfolios, the findings show patterns similar to the previous outputs. The sentiment variable marginally increases the mean alphas and gradually reduces the adjusted R² across the six models. Despite the significant impact of the sentiment variable, these outcomes are weakened because all the models fail the GRS test, producing significant p values (Tables 11 and 12).

Table 10 Regression output for 6 size and book to market portfolios using NYSE composite index
Table 11 Regression Output for 6 Size and Momentum using alternative sentiment proxies
Table 12 Robustness: 2SLS regression output for 6 size and momentum using VIX as an alternative sentiment proxy

8 Robustness: alternative sentiment proxies

We further consider alternative sentiment proxies and create a new set of raw and clean sentiment indices for model estimation. We use the turnover ratio (calculated as trading volume divided by market value) (Baker & Wurgler 2007) and the investors intelligence index (Brown & Cliff 2005). Repeating the methodological approach described above and testing the model on 6 portfolios formed on size and momentum within the UK All Share Index, our findings in Table 11 show that the sentiment variable gradually magnifies the mean alpha. Overall, the estimation results reveal that the sentiment-augmented models perform better than the traditional factor models. Additionally, we use the VIX as an alternative sentiment measure and apply two-stage least squares regression (Table 12). The results do not differ significantly from the main estimation, suggesting that the sentiment-augmented models perform better.

9 Conclusion and policy implications

In recent times, studies have revealed a link between the activities of noise traders and stock price disruption. The motivation stems from the evidence that the sentiments which permeate stock markets reflect investors' illusions about market information. Moreover, ample evidence from the use of experimental psychology in finance (behavioural finance) shows that human biases and cognition affect the behaviour of stock prices and could trigger prolonged market anomalies. Furthermore, the absence of a single definitive sentiment measure has increased research interest in employing various tools for predictive purposes. Motivated by these growing efforts and by the fact that traditional pricing methods do not aptly account for erratic stock price movements, we contribute to the current literature by conceptualising what noise trading might represent and then investigating the dynamics of noise trading and investor sentiment in asset pricing. In estimating the predictive ability of sentiment constructs as an alternative or complementary risk factor, we first document the impact of extant asset pricing models, such as the capital asset pricing model (CAPM), the Fama-French 3-factor model and the Carhart 4-factor model, on the excess returns of a variety of portfolios formed in different patterns and drawn from firms within the UK. Secondly, the study incorporates sentiment measures into these models to discern whether such measures increase or decrease excess returns. The main concern was to generate the intercept and coefficients for each portfolio and to examine whether the intercept terms for each model are jointly significantly different from zero.

To generate convincing estimates, the study conducted a correlation analysis of the variables to verify commonality of components. Hence, instead of using only the basic sentiment model we constructed, the study also includes, in different model specifications, raw and clean sentiment indices derived from a principal component analysis (PCA) of the variables. The first PCA explains about 79% of the overall variance for all variables, while the second PCA explains about 67.5% of the variation. Following the inclusion of the iterated sentiment variable in different pricing models, the findings indicate that sentiment-augmented estimates are statistically significant in explaining the excess returns of the majority of the portfolios and, in fact, outperform the traditional asset pricing models. Our results are further corroborated by the Hansen & Jagannathan (1997) non-parametric test, which substantiates the inclusion of sentiment measures in our asset pricing models. More importantly, our results are consistent with the positions of past studies on investor sentiment, such as Baker & Wurgler (2007), Joseph et al. (2011) and Bathia & Bredin (2018). Hence, our basic sentiment measure, the ratio of laggards to leaders, communicates a strong market (or price) signal to noise traders. The ratio can therefore serve as a valid proxy for investor sentiment, which in turn can be used to predict excess or abnormal returns.

Our findings provide significant indications for future academic and practical attempts at asset pricing. The study shows that sentiment is a vital systematic risk factor that must not be ignored by analysts, regulators and other stakeholders when analysing the return characteristics of any asset or portfolio. We show that the four-factor model is inadequate and needs a sentiment factor to improve asset pricing. The study therefore concludes that extant asset pricing models may not be sufficient to capture market or price anomalies, as the activities of noise traders, exemplified by sentiment, have significant implications for the behaviour of stock markets.