Introduction

In the past several decades, housing prices have skyrocketed in major Chinese cities. The annual growth rate of real Chinese national housing prices could be as high as 16.7% from 2006 to 2010 according to Wu et al. (2014), and the annual growth rate of housing prices in first-tier cities of China was as high as 13% over 2003–2013 according to Fang et al. (2016). While some scholars (e.g., Chow & Niu, 2015; Tan et al., 2020; Wang & Zhang, 2014; Wu et al., 2016) have documented how these housing price appreciations are driven by fundamental factors of demand and supply, a large part of these price appreciations remains unexplained.

Advances in behavioral finance offer one potential explanation for this housing market fever in China: investor sentiment (or market sentiment). Building on the notion of “animal spirits” from Pigou (1927) and Keynes (1936), Shiller (2000) proposes that the “irrational exuberance” of investors leads them to rely on many psychological factors for asset valuations. These early studies laid the foundations for research on investor sentiment in the stock market and many other financial markets (e.g., Baker & Wurgler, 2006, 2007). In behavioral finance models, investor sentiment, independent of market fundamentals, is believed to play a role in price determination and could induce systematic biases in the beliefs of some investors. Many studies have shown that market sentiment greatly affects stock returns (Baker & Wurgler, 2006, 2007; Da et al., 2015; Huang et al., 2015).

In view of the important role of market sentiment in real estate markets, a growing literature examines the effect of market sentiment on housing returns and dynamics, with the majority of studies focusing on commercial real estate markets (including REIT markets) in developed economies. Examples include Clayton and MacKinnon (2002, 2000), Gallimore and Gray (2002), Clayton et al. (2009), Lin et al. (2009), Ling et al. (2014), Jin et al. (2014), Das et al. (2015), and Letdin et al. (2021), among others. All these studies on commercial real estate markets and REIT markets highlight the significant role of market sentiment in the valuation of real estate properties.

However, empirical examinations of market sentiment in Chinese residential housing markets are limited. More formally, housing market sentiment could be defined as a misguided belief about housing price appreciations, which cannot be justified by the current economic information set available to housing market participants, as in Ling et al. (2015). As emphasized in Hui and Wang (2014) and Hui et al. (2017), the private (versus commercial) housing sector stands out as an ideal “victim” of investor sentiment for several reasons. First, market participants in private housing markets are mostly individuals and households with limited knowledge and information, who are hence more susceptible to sentiment. Second, the illiquidity of housing properties leads to high segmentation of the market and asymmetry of information. Among all housing sectors, the private housing market has the highest liquidity, which makes it resemble the stock market. Lastly, short-selling restrictions render mispricing in the private housing market hard to eliminate. As the real estate market is the mainstay of this emerging economy and housing composes a large part of household asset holdings, examining the impacts of investor sentiment on the private housing market is imperative, as emphasized in Case and Shiller (2003), Shiller (2010) and Case et al. (2012).

In this paper, we try to fill this gap by applying sentiment analysis techniques to housing markets in China, intending to examine how housing sentiments affect local housing markets in this emerging economy. To achieve this goal, we first construct city-specific housing sentiment indices for 18 major Chinese cities and then conduct a series of empirical analyses to examine the sentiment impacts on local housing returns. More specifically, we construct three sentiment proxies representing the local housing market liquidity and speculative behaviors from a massive second-hand transaction dataset and then use the recursive look-ahead-bias-free implementation of the partial least squares (PLS) method to extract a local housing sentiment index from these three proxies for each city considered. A panel regression of housing return forecasting is then examined with local housing sentiments as an explanatory variable. To investigate the persistence of sentiment impacts, we also estimate impulse responses of cumulative housing returns to sentiment shocks over different return horizons.

For sentiment construction, the first two proxies are the median time on the market of sold housing units and the turnover rate of listed housing units in a given month, serving as two housing market liquidity measures. These two housing market liquidity measures resemble the turnover rate of the stock market in Baker and Wurgler (2006). For the stock market, Baker and Stein (2004), Baker and Wurgler (2006), and Baker and Wurgler (2007) suggest that turnover, or more generally liquidity, can serve as a sentiment index. For the real estate market, Clayton et al. (2008) and Ling et al. (2015) point out that increased liquidity is the amplifying channel of a pricing-sentiment spiral in the housing market. The third sentiment proxy is constructed as a small-house return premium over large houses. This small-house return premium measures speculative investment in Chinese housing markets: speculative investors in China tend to over-invest in smaller housing units because of their lower capital requirements and higher liquidity, and hence smaller housing units appreciate faster in cities with more speculative investors.

With these three local housing sentiment proxies, we first orthogonalize each of these proxies against a set of fundamental variables and then implement the recursive look-ahead-bias-free estimation of the PLS method by Kelly and Pruitt (2013, 2015) and Huang et al. (2015) to construct the local housing market sentiment for each city in our sample. Compared with traditional dimension reduction methods such as principal component analysis (PCA), PLS filters out irrelevant common components and summarizes the most useful information in these proxies to construct a single composite sentiment index with the largest covariance with the target series to be predicted. To remove the potential look-ahead bias of the in-sample estimation of PLS, we recursively conduct the look-ahead-bias-free implementation of PLS with an expanding window scheme to obtain a look-ahead-bias-free estimate of the housing sentiment sequence.

The panel housing return forecasting regressions show that local housing sentiment indices have robust predictive powers for future housing market returns, while the impulse response estimations indicate a salient pattern of short-run underreaction and long-run overreaction. Further analysis shows that local housing sentiment impacts are asymmetric in that local housing sentiments only have significant impacts on housing returns when the market sentiment is below the sample average. In addition, the sentiment effects are much stronger and more significant for cities with a relatively inelastic housing supply. Housing sentiments exhibit strong inertia and are positively correlated with past short-term cumulative housing returns, implying that housing sentiments have a backward-looking property and there exists a significant feedback effect between housing returns and sentiments.

The documented sentiment impacts are comparable to what has been documented for developed housing markets. As shown in the results section, a one-standard-deviation increase in local housing sentiments will cause the annual housing return to increase by about 1.68% in China, while this effect is approximately 5.58% for the U.S. housing market in Soo (2018) and approximately 3.73% in Bork et al. (2020).Footnote 1 These results indicate that the housing markets in China are also susceptible to irrational sentiments, with arbitrage and speculation prevalent in this emerging market.

To verify that the recursive look-ahead-bias-free implementation of the PLS method does not lead to “false detection of predictability”, we conduct a placebo test in which three randomly generated sentiment proxies are drawn from the standard normal distribution and then summarized into a “placebo” sentiment index by the same recursive look-ahead-bias-free procedure. We find no significant predictive power of this “placebo” sentiment for future housing returns. The placebo test indicates that the significant predictability of our extracted sentiment index is not generated mechanically by the recursive look-ahead-bias-free implementation of the PLS method.

Our main regression results are also robust to alternative sentiment construction methods, such as the scaled PCA method by Huang et al. (2022) and the traditional PCA method, and they continue to hold for the sample period before COVID-19. In addition, an alternative sentiment index based on the number of newly listed properties (instead of the turnover rate as a proxy) shows consistently significant predictive power for future housing returns.

Housing market sentiment is particularly difficult to measure compared with other financial markets for several reasons. First, as stated in Soo (2018), typical sentiment proxies for the stock market, such as closed-end fund discounts, mutual fund flows, and dividend premiums, do not have straightforward counterparts for the housing market. Second, housing markets are highly segregated geographically. The housing market conditions for different cities vary substantially even within a given region during the past housing cycle (Ferreira & Gyourko, 2012). Third, transactions in the housing market usually take several months or even years to complete. Hence, the impact of sentiments on housing market outcomes may take a long time to unfold.

In this paper, we follow the studies on market-based (indirect) stock market sentiments to construct housing market sentiments (Baker & Wurgler, 2006; Da et al., 2015; Huang et al., 2015). The two housing market liquidity measures in our sentiment indices mimic the turnover rate of the stock market. The speculation measure of local housing markets could serve as a counterpart to first-day returns on IPOs, which measure investor enthusiasm in the stock market. More importantly, thanks to the richness of the transaction dataset, we can construct a sentiment index for each city considered, capable of reflecting heterogeneous local housing market conditions. In addition, the impulse response analysis of cumulative housing returns to sentiment shocks over different return horizons enables us to examine how the sentiment impacts unfold over time.

Compared with the textual approach in Soo (2018), which develops housing sentiments for 34 cities across the United States by quantifying the qualitative tone of local housing newspaper articles, our market-based proxies may be a delayed reflection of market sentiments. However, these transaction-based proxies provide a good complement to the textual sentiments when representative local housing newspapers are not available. Compared with survey-based methods of quantifying market sentiments, such as those in Bork et al. (2020) and Ling et al. (2015), which use household survey responses to questions about buying conditions for houses in the U.S., our transaction-based sentiment index is an indirect measure of market sentiment.Footnote 2 But given that surveys regarding Chinese customers are limited, our transaction-based sentiment index is a good alternative to those survey-based measures.

Moreover, even though internet usage is common nowadays, an aggregate search index, such as the mortgage default risk index based on Google searches of Chauvet et al. (2016) for the U.S. market, cannot provide localized sentiment measures given the heterogeneous properties of local housing markets. For Chinese real estate markets, Zheng et al. (2016) use Google search counts to construct city-specific confidence indices for 35 Chinese cities from 2005 to 2013. The confidence index constructed by Zheng et al. (2016) is calculated as the ratio of the count of positive entries to the total count of both positive and negative entries regarding the targeted real estate market. For the sample period considered in this paper (2016 to 2020) and the sample period of Zheng et al. (2016), Google search has been blocked, and hence Google search entries may not serve as a good indicator of confidence or investor sentiment. The sentiment indices constructed in this paper are based on market transaction outcomes, which are the results of demand and supply factors,Footnote 3 and could serve as a complement to the methods listed above, especially the city-specific confidence index by Zheng et al. (2016).

The contribution of this paper is threefold. First, we contribute to a growing literature evaluating the effect of market sentiments on housing markets by analyzing a representative sample of cities in China. Among existing studies on housing sentiments regarding developing markets (Hui & Wang, 2014; Hui et al., 2017, 2018; Lam & Hui, 2018; Soo, 2018; Zhou, 2018), most construct a sentiment index for a single city, such as Shanghai or Hong Kong, and examine the relationship between sentiments and housing returns in that city, and therefore lack a representative dataset. Aided by massive transaction data, we construct city-specific housing sentiment indices for 18 major cities in China, which are more representative of Chinese housing markets. This panel data structure of our sentiment indices provides more reliable and comprehensive empirical findings on the relationship between local housing sentiments and returns. This paper confirms that investor sentiment is an important factor in Chinese housing markets, suggesting arbitrage and speculation are prominent in this emerging market.

Second, we provide more evidence on the role of local housing supply conditions in analyzing housing market sentiment impacts. Existing studies have shown that supply-side conditions greatly affect housing prices and that constraints on housing supply can explain large differences in housing price dynamics across regions (Glaeser & Ward, 2009; Glaeser et al., 2006, 2008; Green et al., 2005; Saiz, 2010). The role of housing supply may be more significant in Chinese housing markets, as indicated by Liu (2014) and Wang et al. (2012). However, supply-side conditions have been overlooked in the empirical examinations of housing sentiment impacts on local housing returns (except Zheng et al., 2016). In this paper, we take advantage of supply elasticity estimates from Liu et al. (2019) to study whether local supply conditions alter market sentiment impacts. Our results show that housing markets in cities with relatively inelastic housing supply are more sensitive to local housing sentiments than those in cities with elastic housing supply. For cities with a much more elastic housing supply, the impacts of market sentiments on returns are much smaller or even negligible. The finding that local housing supply elasticities play an important role in the sentiment effects highlights the necessity of considering local market conditions in sentiment analysis.

Third, we also contribute to the literature on China’s high housing price puzzle. Many theoretical and empirical works document and explain why the housing price level or growth rate is so high in China (Chen & Wen, 2017; Fang et al., 2016; Glaeser et al., 2017; Wei et al., 2012; Wu et al., 2016). Many of these works debate the existence of a bubble in Chinese housing markets by analyzing the fundamentals underlying this emerging market. The findings in this paper provide some evidence on this issue from the perspective of nonfundamental market sentiments. Our local housing sentiment indices are shown to affect local housing returns significantly after controlling for many fundamental variables, indicating that nonfundamental sentiments also add to the housing market fever of China. The short-run underreaction and long-run overreaction pattern of housing sentiments implies that there exist irrational investors in the Chinese housing market, exaggerating the turmoil of local housing markets in the short run. We also document that the expectations of Chinese real estate investors are backward-looking and that there exists a significant feedback effect between housing returns and market sentiments. This pricing-sentiment spiral could enhance the ongoing market fever of Chinese housing markets and potentially extend the length and magnitude of housing market cycles in China.

In brief, our empirical analysis enriches the literature on the role of sentiments in the housing market and provides better insights into the housing market dynamics, which may help policymakers to stabilize and improve the functioning of the housing markets. The rest of this paper is organized as follows. In "Data and Housing Price Index" section, we introduce our transaction data and construct second-hand housing price indices. Descriptions of sentiment proxies and the construction of city-level housing sentiment indices are discussed in "Housing Sentiment Index" section. "Local Housing Sentiments and Returns" section studies the relationship between housing market sentiments and returns. "Robustness Checks" section provides some robustness checks on pre-COVID-19 estimation, alternative sentiment construction methods, alternative sentiment proxies, and the placebo test. "Conclusions" section concludes.

Data and Housing Price Index

Transaction Data

In this paper, we use a massive second-hand housing transaction dataset to construct the housing price index and the sentiment index.Footnote 4 The private (residential) transaction records are retrieved from Lianjia.com, one of the most popular second-hand housing trading platforms in China. As a real estate brokerage firm, Lianjia helps housing sellers to list their properties online and provides brokerage services for potential sellers and buyers. Once a transaction (sale) has been concluded, Lianjia records the sale and posts its information in the Transacted-Units section on its website. Each transaction record includes detailed information on the characteristics of the sold housing unit: listing date, listing price, transaction date, transaction price, and project name (xiao qu), as well as unit attributes, such as floor area, layout, decoration, orientation, etc.

We collect all the transaction records from this Transacted-Units section on Lianjia.com for each city in our sample. In total, we obtain 1,799,272 transaction records from January 2015 to October 2020 covering 18 major cities in China. Table 1 lists all the cities and their corresponding sample periods. The earliest date of the transaction data is January 2015 for Shanghai and Tianjin. In the following, we first use the transaction data to construct the Hedonic Housing Price Index (HPI) and then derive local housing sentiment using the recursive look-ahead-bias-free implementation of the PLS method for each city. Since 2016, a large-scale purchase restriction policy and a series of tightening policies have been introduced to fight against speculative housing investment and rapid housing price growth.Footnote 5 Hence, our analysis is conducted under a general tightening policy environment.

Table 1 Transaction data and sentiment sample periods

Housing Price Index Based on a Hedonic Model

The first step of our empirical study is to construct a monthly quality-adjusted housing price index (HPI) for each city in our sample. We use a hedonic regression to build the HPI. The hedonic model was introduced into the housing market literature by Rosen (1974). According to the theory, housing prices contain a part that is driven by attributes of the house itself (area, layout, orientation, and so on). Following this idea, once the contribution of these attributes is separated from the price, the remaining part is driven purely by supply and demand factors.

Accordingly, our HPI construction is based on the following equation:

$$\mathrm{ln}\left({P}_{ilt}\right)=\alpha +\sum\nolimits_{t=2}^{T}{\beta }_{t}{D}_{t}+\gamma {X}_{ilt}+{\pi }_{l}+{\varepsilon }_{ilt},$$
(1)

where \({P}_{ilt}\) represents the trading price per square meter for housing unit \(i\) in the project (xiao qu) \(l\) on transaction date \(t\). \({X}_{ilt}\) denotes a vector of unit attributes, including floor area, elevator, number of bedrooms, floor level, orientation, and decoration type (fancy, simple, or other). We also control for the project-fixed effect \({\pi }_{l}\). Note that a project in Chinese cities is a very small geographic unit, similar to a Census block in the U.S. The project-fixed effects control for neighborhood amenities, including the school district, crime rate, traffic service, distance to CBD, and so on. The error term is denoted by \({\varepsilon }_{ilt}\).

A series of month dummies, \({D}_{t}\), are included in the regression to capture the time variation in housing prices net of quality changes. The value of HPI for month \(t\) is then calculated as \(\exp\left({\beta }_{t}\right)\times 100\), and the value is set to 100 for the base month. We apply the hedonic regression for each city in our sample and obtain 18 city-specific monthly HPI series from January 2015 to October 2020. It is worth noting that the final HPI series is an unbalanced panel with the earliest date in January 2015 for Shanghai and Tianjin.
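The mapping from the estimated month effects to the index can be illustrated with a minimal sketch. It assumes the coefficients \({\beta }_{t}\) from Eq. (1) have already been estimated; the coefficient values and the function names `hpi_from_month_effects` and `monthly_returns` are hypothetical, and the simple percentage-change definition of the return is an assumption rather than a definition taken from the text.

```python
import math

def hpi_from_month_effects(betas):
    """Convert month-dummy coefficients (months 2..T) into index levels.

    The base month has no dummy, so its effect is 0 and its index is 100.
    """
    effects = [0.0] + list(betas)
    return [100.0 * math.exp(b) for b in effects]

def monthly_returns(hpi):
    """Month-over-month percentage change of the index (assumed definition)."""
    return [100.0 * (hpi[t] / hpi[t - 1] - 1.0) for t in range(1, len(hpi))]

# Hypothetical coefficients for months 2, 3, and 4 of one city.
betas = [0.010, 0.025, 0.020]
hpi = hpi_from_month_effects(betas)
returns = monthly_returns(hpi)

for month, level in enumerate(hpi, start=1):
    print(f"month {month}: HPI = {level:.2f}")
print("returns (%):", [round(r, 3) for r in returns])
```

Because the unit attributes and project-fixed effects are absorbed elsewhere in Eq. (1), the index moves only with the month effects, which is what makes it quality-adjusted.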

Housing Sentiment Index

In this section, we use second-hand transaction data to construct our housing sentiment indices. We first introduce our sentiment proxies and then illustrate the recursive look-ahead-bias-free procedure of the PLS method from which the housing sentiment index is constructed. Finally, we display some correlation analysis between our sentiment indices and several confidence indices from official sources.

Sentiment Proxies

In the construction of sentiment proxies, we largely follow the approach of Zhou (2018). We construct three sentiment proxies based on our trade-by-trade data. The first proxy is MedianIntv, the natural logarithm of the median holding period of sellers, i.e., the median number of days on the market of sold housing units in a given month. Intuitively, a lower level of MedianIntv would indicate a relatively higher market sentiment.

The second proxy is Turnover, the ratio of the housing area sold to the total housing area for sale in a given month. To construct this proxy, for each month, we divide the total floor area transacted in that month by the floor area of all listed properties, including transacted houses and available-for-sale houses (those that have been listed but not yet sold as of the current month). Note that this Turnover ratio is a quasi-turnover ratio of the housing market. The real turnover rate of the housing market is the ratio of trading volume to the total housing stock, as in Ling et al. (2014). Usually, the real turnover rate of the housing market is small, since most housing units are not for sale. By definition, this Turnover proxy measures the liquidity of listed properties in the housing market, and it is usually much larger than the real turnover ratio of the housing market.
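As an illustration of the two liquidity proxies defined above, the following minimal sketch computes MedianIntv and Turnover for a single city-month. The input numbers and the function names `median_intv` and `turnover` are hypothetical and are not taken from the Lianjia data.

```python
import math
from statistics import median

def median_intv(days_on_market):
    """Natural log of the median days on the market of units sold in a month."""
    return math.log(median(days_on_market))

def turnover(sold_areas, unsold_listed_areas):
    """Floor area sold in the month divided by the floor area of all listings.

    The denominator includes both the units sold in the month and the units
    that are listed but not yet sold as of that month.
    """
    sold = sum(sold_areas)
    return sold / (sold + sum(unsold_listed_areas))

# Hypothetical records for one city-month.
days = [12, 30, 45, 20, 60]            # days on the market of sold units
sold = [89.0, 120.5, 60.0]             # floor areas (square meters) sold
unsold = [75.0, 140.0, 95.5, 110.0]    # floor areas listed but not yet sold

print(f"MedianIntv = {median_intv(days):.3f}")
print(f"Turnover   = {turnover(sold, unsold):.3f}")
```

A shorter median time on the market lowers MedianIntv while a larger share of listed area being sold raises Turnover, so the two proxies move in opposite directions as liquidity improves.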

Essentially, constructed in the same spirit as measures in Clayton et al. (2008), Ling et al. (2014), and Ling et al. (2015), MedianIntv and Turnover measure the liquidity of real estate markets. Unlike MedianIntv, Turnover should be positively correlated with the housing market liquidity since a liquid housing market should be associated with a shorter time on the market and a high turnover rate. As the correlation matrix in Table 9 of Appendix A shows, Turnover is significantly negatively correlated with MedianIntv. Meanwhile, Turnover exhibits a positive correlation with the contemporaneous housing return \({R}_{t}\) while MedianIntv exhibits a negative correlation with \({R}_{t}\). These correlation coefficients provide some support for the positive (negative) correlation between Turnover (MedianIntv) and market liquidity.

The third proxy is SMB, the small-house return premium over large houses. Each year we calculate the quintile breakpoints of the transacted houses’ floor areas and then divide houses into five groups according to the latest breakpoints. Small (big) houses are those in the first (fifth) quintile. Then we use the Hedonic method in "Housing Price Index Based on a Hedonic Model" section to construct a housing price index for small houses and big houses, respectively. In the end, SMB is equal to the small-house return minus the big-house return (in percentage).
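The size grouping and the premium can be sketched as follows. This is only an illustration under stated assumptions: the floor areas and the two index series are hypothetical, the quintile breakpoints use the default interpolation of Python's `statistics.quantiles`, and returns are taken as simple percentage changes of group-level price indices that are assumed to have been estimated already with the hedonic method.

```python
from statistics import quantiles

def size_groups(areas):
    """Split floor areas at the quintile breakpoints.

    Returns the four breakpoints, the first-quintile (small) units,
    and the fifth-quintile (big) units.
    """
    cuts = quantiles(areas, n=5)
    small = [a for a in areas if a <= cuts[0]]
    big = [a for a in areas if a > cuts[3]]
    return cuts, small, big

def pct_returns(index):
    """Month-over-month percentage change of an index series."""
    return [100.0 * (index[t] / index[t - 1] - 1.0) for t in range(1, len(index))]

def smb(small_index, big_index):
    """Small-house return minus big-house return, in percentage points."""
    return [s - b for s, b in zip(pct_returns(small_index), pct_returns(big_index))]

# Hypothetical floor areas (square meters) of transacted units in one year.
areas = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
cuts, small, big = size_groups(areas)

# Hypothetical hedonic price indices for the small and big groups.
small_index = [100.0, 102.0, 103.0]
big_index = [100.0, 101.0, 103.0]
smb_series = smb(small_index, big_index)

print("small units:", small, "big units:", big)
print("SMB (%):", [round(v, 3) for v in smb_series])
```

A positive SMB value means small units appreciated faster than big units in that month, and a negative value means the opposite.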

Zhou (2018) states that over-optimistic investors tend to buy big houses and push up prices for these large houses. Hence, the proxy SMB, which measures the return premium of smaller houses, is expected to be negatively correlated with sentiments according to Zhou (2018). However, in contrast to this argument, speculative investors in the Chinese housing market tend to over-invest in smaller houses for several reasons. First of all, the financial pressure of speculating in small houses is relatively low because of lower total costs and hence less capital requirement. Secondly, the risk in investing in small houses is much lower as a flipper of small houses could more easily find a buyer in the market because of the higher liquidity and the dominant demand for small houses. Thirdly, the potential return on investing in small houses is higher since the price appreciation tends to be higher for smaller houses.

For the Singapore housing market, Fu and Qian (2014) also find that short-term speculators typically target smaller units as they require less capital and are easier to sell. Furthermore, some online news reports also reveal that small housing units, the primary target of speculators, are always sold out even during periods of a market downturn.Footnote 6 Some speculators were personally interviewed and admitted that “they invest in small houses because of the high return-on-investment ratio”.Footnote 7 In short, from the point of view of speculating behavior, the prices for smaller housing units appreciate faster in cities with more speculative investors, and hence SMB should be positively correlated with housing sentiments. We will examine the correlation between SMB and housing sentiment in the next section.

Zhou (2018) uses two more proxies, namely the newly-opened housing construction area and the housing sector investment, to construct the housing sentiment for Shanghai. In this paper, we choose to construct the housing sentiment using MedianIntv, Turnover, and SMB only, for three reasons. Firstly, since we are constructing sentiments for 18 Chinese cities, data for the other two proxies are not available for some cities in our sample. Secondly, this paper tries to construct a market outcome-based sentiment. Newly-opened housing construction and housing investment reflect sentiments of the supply side, which have been reflected in market outcomes to some extent. So, omitting these two proxies in the construction of the sentiment will not result in substantial information loss. Finally, housing investment and construction usually take several years to complete. So, sentiments from the supply side may not be synchronous with those based on market outcomes. Hence, omitting the housing investment and construction variables and relying on the three transaction-based proxies leaves us with a more synchronous sentiment measure.

At the end of this section, we display some summary statistics for these three proxies. To save space, Table 2 only shows the average values of these three proxies for Beijing, Shanghai, Guangzhou, and Shenzhen, respectively. For example, the average median time on the market of second-hand properties in Beijing is 23.54 days (exp(3.159)) in 2016, while this value increases to 70.39 days (exp(4.25)) in 2020. Also, the turnover rate of listed second-hand properties decreased from 46% to 29% during the same period for Beijing. The average small-house return premium is about 0.24% for Beijing in 2016, decreasing to a negative value of -0.17% in 2020.

Table 2 Housing sentiment proxies from 2016 to 2020

Housing Sentiment Index

With the three proxies above, we then construct sentiment indices for the 18 cities in our sample. First, all raw proxies are standardized to have zero means and unit standard deviations. Second, to eliminate the impact of the business cycle, we regress the standardized proxies on some macroeconomic variables and obtain the residuals. Following Zhou (2018), we choose four macroeconomic variables: the Purchasing Managers’ Index (\(PMI\)), the growth rate of the consumer price index (\(CPI\)), the growth rate of M2 (\(M2G\)), and the difference between the yield of AA-grade corporate bonds and AAA-grade corporate bonds (\(Default\)). The residuals from these regressions contain sentiment information that is orthogonal to business cycle factors.

Third, to smooth out jumps, we apply a three-month moving average to the residuals, as in Huang et al. (2015). The smoothed residuals are called “clean” proxies. Fourth, considering that the clean proxies may have a lag in reflecting the sentiment of market entities, we need to choose between the current value and the lagged value of each proxy, as in Baker and Wurgler (2006). In detail, we first perform principal component analysis on \({MedianIntv}_{t}\), \({MedianIntv}_{t-1}\), \({Turnover}_{t}\), \({Turnover}_{t-1}\), \({SMB}_{t}\), and \({SMB}_{t-1}\) to extract their first principal component. Then the correlation between the first component and each of the six variables is calculated to determine whether to choose the current value or the lagged value. Finally, for each proxy, the value that is more correlated with the first component is chosen as the final sentiment proxy.
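The first three preprocessing steps can be sketched for a single proxy as follows. This is a simplified illustration, not the exact procedure: it orthogonalizes against one macroeconomic variable only (the text uses four), standardizes with the population standard deviation, and uses a trailing three-month window; all three choices, the input numbers, and the function names are assumptions made for this sketch.

```python
from statistics import mean, pstdev

def standardize(x):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    m, s = mean(x), pstdev(x)
    return [(v - m) / s for v in x]

def residualize(y, x):
    """Residuals from an OLS regression of y on a constant and one regressor x."""
    my, mx = mean(y), mean(x)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [b - intercept - slope * a for a, b in zip(x, y)]

def moving_average(x, window=3):
    """Trailing moving average; the first window - 1 observations are dropped."""
    return [sum(x[t - window + 1:t + 1]) / window for t in range(window - 1, len(x))]

# Hypothetical monthly series for one city.
raw_turnover = [0.46, 0.44, 0.41, 0.43, 0.38, 0.35, 0.36, 0.33]
pmi = [50.1, 50.4, 49.8, 50.2, 49.5, 49.9, 50.0, 49.4]

z = standardize(raw_turnover)              # step 1: standardize
resid = residualize(z, pmi)                # step 2: remove the macro component
clean = moving_average(resid, window=3)    # step 3: smooth out jumps

print("clean proxy:", [round(v, 3) for v in clean])
```

With several macroeconomic variables, the second step becomes a multiple regression, but the residuals play the same role: they are the part of the proxy that the business cycle variables do not explain.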

Fifth, we conduct the recursive look-ahead-bias-free implementation of the PLS approach of Kelly and Pruitt (2013, 2015) and Huang et al. (2015) to construct a look-ahead-bias-free housing sentiment index for each city in our sample.

Specifically, at month \(t\), for each sentiment proxy \({proxy}_{i}\) (MedianIntv, Turnover, or SMB), we run the following first-stage regression:

$${proxy}_{i,s-1}={\pi }_{i,0}+{\pi }_{i,1}{Return}_{s}+{u}_{i,s-1},s\le t,$$
(2)

in which \({proxy}_{i,s-1}\) denotes the lagged sentiment proxy (proxy \(i\) at time \(s-1\)) and \({Return}_{s}\) denotes the housing return (in percentage) at time \(s\). To obtain the loadings for month \(t\), we only use data up to month \(t\) in the above regression. The latest return (\({Return}_{t}\)) used on the right-hand side is the return from time \(t-1\) to \(t\). The latest sentiment proxy used on the left-hand side is \({proxy}_{i,t-1}\). Thus, the first-stage coefficient estimates \({\widehat{\pi }}_{i,1}\) are in the time \(t\) information set since they use monthly returns \(\{{Return}_{2},\dots ,{Return}_{t}\}\) and monthly proxies \(\{{proxy}_{i,1},\dots ,{proxy}_{i,t-1}\}\).

In the second-stage regression, for month \(t\), we run the cross-sectional regression as follows:

$${proxy}_{i,t}=\alpha +{S}_{t}^{PLS}{\widehat{\pi }}_{i1}+{v}_{i,t},i=1,2,\dots ,N.$$
(3)

In the above regression, the independent variable is the set of loadings \({\widehat{\pi }}_{i1}\) estimated in Eq. (2). The slope estimate \({S}_{t}^{PLS}\) obtained from this regression is the look-ahead-bias-free sentiment estimate for month t.

For the sentiment estimate \({S}_{t+1}^{PLS}\) at month t + 1, we re-estimate the first-stage regressions after expanding the first-stage sample to \(\{s: s\le t+1\}\) and re-estimate the second-stage regression for period t + 1. By recursively implementing the above procedure with an expanding estimation window, we obtain a recursive look-ahead-bias-free estimate of the housing sentiment sequence \(\{{S}_{t}^{PLS}, t={t}_{0},\dots ,T\}\).

Note that the first available sentiment estimate is for date \({t}_{0}\), which is later than the start of the proxy and housing return sample. This sample loss arises because we need an initial training sample \(\{s: s\le {t}_{0}\}\) to estimate the first-stage regression for month \({t}_{0}\). In the empirical implementation, we use the first twelve months as the initial training sample, and hence our final sentiment sample starts one year later than the transaction data, as exhibited in Table 1.
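The recursive two-stage procedure in Eqs. (2) and (3) can be sketched as follows for a single city. This is a minimal illustration rather than the authors' implementation: `ols_slope` and `recursive_pls` are hypothetical names, months are indexed from 0, and the first estimate is assumed to fall on the twelfth sample month.

```python
def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

def recursive_pls(proxies, returns, train=12):
    """proxies: list of N series; returns: one series; all of length T.
    Position 0 is the first sample month. Returns {t: S_t} with 0-based t."""
    T = len(returns)
    sentiment = {}
    for t in range(train - 1, T):
        # First stage, Eq. (2): regress proxy_{i,s-1} on Return_s for s <= t,
        # so only returns up to t and proxies up to t-1 enter the loadings.
        loadings = [ols_slope(returns[1:t + 1], p[:t]) for p in proxies]
        # Second stage, Eq. (3): regress proxy_{i,t} on the loadings across
        # the N proxies; the slope is the sentiment estimate for month t.
        sentiment[t] = ols_slope(loadings, [p[t] for p in proxies])
    return sentiment
```

Because each pass of the loop slices the data at month t, the estimate for a given month does not change when later observations are appended, which is the sense in which the index is free of look-ahead bias.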

We display the housing sentiment index (in red) and the hedonic HPI returns (in blue) for selected cities in Fig. 1. As Fig. 1 shows, the trend of housing sentiments is similar to that of housing returns, especially in Beijing, Guangzhou, and Shenzhen. This co-movement pattern indicates a positive sentiment-return relationship. We also calculate the contemporaneous correlation between the sentiment index (\({S}_{t}^{PLS}\)) and the housing return \({R}_{t}\), as shown in Table 9 in Appendix A. The correlation coefficient (0.327) is close to the correlation of 0.26 that Zhou (2018) reports between her sentiment indices and returns for the Shanghai housing market.

Fig. 1

Housing sentiments and returns. Notes. The blue line represents hedonic HPI returns (in percentage) and the red line represents housing sentiments

Correlation with Proxies and Confidence Indices

To confirm the reliability of our sentiment indices, we first calculate the correlations between \({S}_{t}^{PLS}\) and these three sentiment proxies. As Table 9 in Appendix A shows, the housing sentiment \({S}_{t}^{PLS}\) is positively correlated with Turnover and negatively correlated with MedianIntv. These results are consistent with the findings on the positive correlation between market liquidity and sentiments in the literature. To further investigate the correlation between local sentiment and proxies, we calculate their correlation for each city and display the results in Table 3. Results in Table 3 verify the positive correlation between sentiment and Turnover and the negative correlation between sentiment and MedianIntv.

Table 3 Correlation of local housing sentiments with three sentiment proxies

Interestingly, in contrast to Zhou (2018), we find that 12 out of 18 cities show a positive correlation between sentiment and SMB. Zhou (2018) argues that SMB should be negatively correlated with housing sentiment. However, this is not the case for Chinese housing markets. As mentioned above, speculative investors in Chinese housing markets tend to over-invest in smaller houses, and hence prices for smaller housing units appreciate faster in cities with more speculative investors. The proxy SMB, which reflects housing investors’ speculative behavior, should therefore be positively correlated with housing sentiments. Our empirical results in Table 3 verify the prevalence of speculative investors in Chinese housing markets. Shanghai, Nanjing, Hefei, Dalian, Tianjin, Guangzhou, Hangzhou, Wuhan, Jinan, Shenzhen, Suzhou, and Changsha all show significantly positive correlations between local housing sentiments and the small-house return premium. These findings indicate that speculative behaviors are prevalent in Chinese housing markets, and over-optimistic investors are not buying bigger houses but instead over-investing in smaller housing units.

We also compare our sentiment index with some official indices for Beijing and Shanghai, the two largest cities in China. Since housing buyers include both consumers and investors (Han, 2013; Miller & Pandher, 2008), we consider consumer confidence indices and investor confidence indices. As Table 10 of Appendix A shows, our housing sentiment index of Shanghai is significantly positively correlated with the Housing Boom Index of China (\({HB}_{CN}\)) but negatively correlated with the Investor Confidence Index for Economic Policy (\(I{C}_{C{N}_{P}}\)) and the Investor Confidence Index of China (\(I{C}_{CN}\)). The sentiment index of Beijing exhibits positive but insignificant correlations with the Consumer Confidence Index of Beijing (\(C{C}_{BJ}\)) and the Consumer Expectation Index of Beijing (\(C{E}_{BJ}\)). While these results provide some support for the reliability of our second-hand transaction-based housing market sentiment index, they also reveal that housing sentiment may exhibit characteristics distinct from those of macroeconomic indices.

Local Housing Sentiments and Returns

Predicting Returns with Local Housing Sentiments

In this section, we examine the relationship between local housing sentiments and housing market returns. The basic regression equation is set as follows:

$${R}_{i,t+1}=\alpha +\beta {S}_{it}^{PLS}+\gamma {Spring}_{t}+\eta {Autumn}_{t}+\theta {Macros}_{t}+\delta {Fundamentals}_{it}+{\pi }_{i}+{\lambda }_{t}+{\varepsilon }_{it},$$
(4)

where \({R}_{i,t+1}\) denotes the housing market return of city \(i\) at time \(t+1\), calculated as the log return (in percentage) of the HPI constructed in "Housing Price Index Based on a Hedonic Model" section. \({S}_{it}^{PLS}\) is the housing sentiment of city \(i\) at time \(t\). \({Spring}_{t}\) and \({Autumn}_{t}\) are seasonality dummy variables, standing for the cold and hot seasons for Chinese housing markets, respectively.Footnote 8 \({Macros}_{t}\) refers to macroeconomic control variables that vary only across time and not across cities, including the Purchasing Managers Index (\({PMI}_{t}\)), the growth rate of M2 (\({M2G}_{t}\)), and the yield spread between AA-grade corporate bonds and AAA-grade corporate bonds (\({Default}_{t}\)).

We also control for some city-specific economic fundamentals (\({Fundamentals}_{it}\)), including the growth rate of the consumer price index (\({CPI}_{it}\)), the growth rate of the population (\({population}_{it}\)), the unemployment rate (\({unemployment}_{it}\)), the growth rate of the average income of on-the-job employees (\({income}_{it}\)), the growth rate of the average monthly per-square-meter rents (\({rents}_{it}\)), and the house price-to-rent ratio (\({PRR}_{it}\)).Footnote 9 City-fixed effects (\({\pi }_{i}\)) are included in the regression, and time-fixed effects (\({\lambda }_{t}\)) are also controlled for in the last specification in the result table. All these control variables are obtained from the Wind dataset. Finally, \({\varepsilon }_{it}\) denotes the regression residual.

Table 4 presents the results from Eq. (4) with standard errors clustered at the city level while Table 11 in Appendix A provides the detailed results of this regression. From Column (1) to Column (6), control variables are added step by step. In the first column, we only control for seasonality dummies (\({Autumn}_{t}\) and \({Spring}_{t}\)) besides the sentiment variable and the city-fixed effects. In the second column, we further control for time-varying macroeconomic variables. City-specific fundamental controls are added in Column (3) and lagged macroeconomic and fundamental variables are further included in Column (4) of Table 4. In Column (5), we also include the first lag of housing returns in consideration of the persistence of housing returns. In the last column of Table 4, time-fixed effects are controlled for and hence these time-varying macroeconomic variables and seasonality dummies are omitted.

Table 4 Predicting future returns by local housing sentiments

As shown in Table 11 in Appendix A, we can see that housing returns exhibit a salient seasonality pattern (with lower returns in Autumn) and decrease with the default risk (\({Default}_{t}\)). Other macroeconomic variables’ impacts are not persistent across different specifications. For fundamental controls, the unemployment rate and the inflation rate display significant negative impacts on housing returns according to the last three columns. Other fundamental controls’ impacts are not significant or persistent across different specifications.

The most important finding from Tables 4 and 11 is that the local housing sentiment has a significant (at the 5% significance level) positive impact on future housing returns. This significant impact is consistent across different specifications, although with different magnitudes. Our conclusions are mainly based on Column (6) of Tables 4 and 11, which includes time-fixed effects. Column (6) shows that housing sentiments can strongly predict future housing returns even when we control for time-fixed effects. In detail, a one-standard-deviation increase in housing sentiments is associated with an increase in the future monthly return of approximately 0.14%. This result indicates that the annual housing return will increase by about 1.68% (0.14% × 12) in China given a one-standard-deviation positive shock to housing sentiments.

Soo (2018) also finds that news media sentiment positively predicts quarterly housing returns in the U.S. housing markets. That study shows that for every one percent increase in four quarters of accumulated lagged sentiments, the future quarterly price appreciation is approximately 0.93%. With a standard deviation of news sentiment of 1.5, a positive one-standard-deviation news sentiment shock implies an annual housing appreciation of 5.58% (1.5 × 0.93% × 4). The housing sentiment impact on Chinese markets estimated in this paper is comparable to, but smaller than, that inferred from Soo (2018). More recently, Bork et al. (2020) use household survey responses from the University of Michigan consumer surveys to construct a housing sentiment index for the U.S. housing market. The housing sentiment constructed by Bork et al. (2020) has a standard deviation of 0.08 and is estimated to affect the quarterly housing price growth rate with a coefficient of 11.67. So, a one-standard-deviation increase in housing sentiments leads to a 3.73 (11.67 × 0.08 × 4) percentage point increase in annual housing appreciation. Our estimate of the housing sentiment impact on the Chinese housing markets is closer to the estimate from Bork et al. (2020).
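The three annualized figures compared above follow from the same linear scaling, which the short sketch below reproduces. The helper name `annualized_impact` is hypothetical, and the calculation assumes simple multiplication by the number of periods per year, with no compounding, as in the text.

```python
def annualized_impact(coef, shock_size, periods_per_year):
    """Annual return effect (in percentage points) of a sentiment shock,
    scaled linearly from the per-period coefficient with no compounding."""
    return coef * shock_size * periods_per_year

# This paper: 0.14% per month for a one-standard-deviation shock.
china = annualized_impact(0.14, 1.0, 12)
# Soo (2018): 0.93% per quarter per one percent of sentiment, s.d. of 1.5.
soo = annualized_impact(0.93, 1.5, 4)
# Bork et al. (2020): coefficient of 11.67 per quarter, s.d. of 0.08.
bork = annualized_impact(11.67, 0.08, 4)

print(round(china, 2), round(soo, 2), round(bork, 2))
```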

We offer one explanation for the disparity as well as the similarity among these sentiment impacts. Soo (2018) draws the conclusion based on city-specific sentiment indices from media tone in local newspaper articles for 34 U.S. cities from 2000 to 2013, while Bork et al. (2020) construct an aggregate housing sentiment index for the U.S. housing market based on the University of Michigan consumer surveys for a much longer period of 1975 to 2017. In contrast, the housing sentiment indices constructed in this paper are based on market transaction outcomes for 18 major Chinese cities covering the period of 2016 to 2020. Within this sample period, Chinese housing markets were under strict regulation policies, such as purchasing restrictions on multi-house owners. However, even given these strict regulations, the estimated annual housing sentiment impact is still as high as 1.68%, indicating that Chinese housing markets may also be highly susceptible to irrational sentiment, as the U.S. market is. As Baker and Wurgler (2006) suggest, assets with limited arbitrage are more easily influenced by sentiment. This comparable sentiment impact on housing returns in China under a stringent regulatory environment also indicates that there may be considerable arbitrage and speculative behavior in the Chinese housing market.

Underreaction and Overreaction

Further, it is meaningful to explore whether the transaction-based sentiment captures fundamental information or just market sentiments. According to behavioral asset-pricing theories, if the sentiment captures investors’ behavioral biases, the cumulative response curve of the asset price should exhibit short-run underreaction and long-run reversal. Liu et al. (2019) show that the effect of sentiments on stock returns exhibits the pattern of short-run underreaction and long-run reversal in China’s stock market. In this section, we explore whether the same pattern appears in the housing market.

Following Liu et al. (2019), we adopt the local projection method of Jordà (2005) to estimate cumulative impulse responses of housing returns to local housing sentiment indices. As shown by Jordà (2005), this local projection method is more robust to misspecifications than conventional vector-autoregression (VAR) for estimating impulse responses. Specifically, we consider the following multiple-horizon predictive regressions:

$$\mathrm{log}\left({P}_{i,t+b}\right)-\mathrm{log}\left({P}_{it}\right)={\alpha }_{b}+{\beta }_{b}{S}_{it}^{PLS}+{\delta }_{b}{Control}_{it}+{\pi }_{i}+{\lambda }_{t}+{\varepsilon }_{it},$$
(5)

where the horizon \(b\) ranges from 1 to 12 months, so that the return window ends between month \(t+1\) and month \(t+12\). \({P}_{it}\) represents the second-hand HPI constructed in "Housing Price Index Based on a Hedonic Model" section. \(\mathrm{log}\left({P}_{i,t+b}\right)-\mathrm{log}\left({P}_{it}\right)\) measures the cumulative logarithmic return (in percentage) for city \(i\) from \(t\) to \(t+b\), denoted by \({R}_{i,\left[t, t+b\right]}\), for \(b=1,\dots ,12.\) The cumulative impulse response of housing prices to \({S}_{it}^{PLS}\) is captured by \({\beta }_{b}\) at different horizons, which can be separately estimated by ordinary least squares. As in Column (6) of Table 4, we control for city-fixed effects (\({\pi }_{i}\)), time-fixed effects (\({\lambda }_{t}\)), and some fundamental variables including \({CPI}_{it}\), \({population}_{it}\), \({unemployment}_{it}\), \({income}_{it}\), \({rents}_{it}\), \({PRR}_{it}\), and their lagged terms in the estimation (see "Predicting Returns with Local Housing Sentiments" section for variable definitions).Footnote 10
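The horizon-by-horizon estimation in Eq. (5) can be illustrated with the stripped-down sketch below. It is not the authors' estimation code: the names `two_way_demean` and `local_projection` are hypothetical, the fundamental controls are omitted for brevity, and the panel is assumed to be balanced so that city and time fixed effects can be removed exactly by the two-way within transformation.

```python
def two_way_demean(panel):
    """Within transformation x_it - mean_i - mean_t + grand mean for a
    balanced panel stored as panel[i][t]."""
    n, T = len(panel), len(panel[0])
    row = [sum(r) / T for r in panel]
    col = [sum(panel[i][t] for i in range(n)) / n for t in range(T)]
    grand = sum(row) / n
    return [[panel[i][t] - row[i] - col[t] + grand for t in range(T)]
            for i in range(n)]

def local_projection(log_price, sentiment, max_h=12):
    """Returns {b: beta_b}: the slope of the cumulative log return (in
    percent) from t to t+b on S_it, with city and time fixed effects."""
    n, T = len(log_price), len(log_price[0])
    betas = {}
    for b in range(1, max_h + 1):
        # Cumulative return over b months, one regression per horizon.
        y = [[100 * (log_price[i][t + b] - log_price[i][t])
              for t in range(T - b)] for i in range(n)]
        x = [[sentiment[i][t] for t in range(T - b)] for i in range(n)]
        yd, xd = two_way_demean(y), two_way_demean(x)
        sxy = sum(xd[i][t] * yd[i][t]
                  for i in range(n) for t in range(T - b))
        sxx = sum(xd[i][t] ** 2 for i in range(n) for t in range(T - b))
        betas[b] = sxy / sxx
    return betas
```

Plotting the returned coefficients against the horizon gives a cumulative impulse response curve. Standard errors, which are needed to judge significance at each horizon, are left out of this sketch.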

Table 5 reports the estimated housing sentiment coefficients at different return horizons. We find that the cumulative impulse response gradually increases, peaks at around 5 months, and reverts to a lower and statistically insignificant level at longer horizons. Figure 2 plots the housing sentiment coefficients \({\beta }_{b}\) from Table 5 against the time horizon. The salient pattern of short-run underreaction and long-run overreaction is similar to what Liu et al. (2019) document for the Chinese stock market. The significant short-run impacts and insignificant long-run impacts indicate that sentiment indeed leads to mispricing in China’s housing markets in the short run. The lack of permanent effects of housing sentiments on cumulative returns in the long run indicates that the housing sentiments contain no fundamental information. This result is consistent with the existence of irrational traders in the Chinese housing market, as predicted by standard behavioral asset-pricing models such as Barberis et al. (1998), De Long et al. (1990), Daniel et al. (1998), and Hong and Stein (1999).

Table 5 Multi-Horizon Return Regressions
Fig. 2

Sentiments and future multi-horizon returns. Notes. This figure depicts the coefficient of sentiment \({\beta }_{b}\) for horizon \(b\) ranging from 1 to 12 months in regression (5). A dot on the curve indicates that the coefficient \({\beta }_{b}\) is significant at that horizon; the absence of a dot indicates that it is insignificant

Asymmetric Effects of Sentiments

In this section, we decompose our sentiment \({S}_{it}^{PLS}\) into a positive part and a negative part to investigate potential asymmetric housing sentiment impacts. We define \({S}_{it}^{pos}=max\left\{{S}_{it}^{PLS},0\right\}\) as the positive sentiment, and \({S}_{it}^{neg}=min\left\{{S}_{it}^{PLS},0\right\}\) as the negative sentiment.Footnote 11 Then \({S}_{it}^{PLS}\) in Eq. (5) is replaced by \({S}_{it}^{pos}\) and \({S}_{it}^{neg}\) while the control variables remain the same. The empirical results are displayed in Table 6. Table 6 shows that \({S}_{it}^{neg}\) has significantly positive impacts on cumulative returns at short horizons (b = 1, 3, 6) while \({S}_{it}^{pos}\) has no significant coefficient at any horizon. These results indicate that local housing sentiments have an asymmetric impact on housing returns, with only negative sentiments having significant positive effects on future housing returns.
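The decomposition is a simple element-wise split, sketched below with made-up sentiment values; the function name `split_sentiment` is hypothetical.

```python
def split_sentiment(s):
    """Return (S_pos, S_neg) with S_pos = max(S, 0) and S_neg = min(S, 0)."""
    pos = [max(v, 0.0) for v in s]
    neg = [min(v, 0.0) for v in s]
    return pos, neg

# Illustrative values only, not estimates from the paper.
s = [0.8, -0.3, 0.0, -1.2, 0.5]
s_pos, s_neg = split_sentiment(s)
```

Since the two parts sum back to the original series, the single-coefficient version of Eq. (5) is the special case in which both parts carry the same slope, so different estimated slopes are what signal asymmetry.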

Table 6 Predicting returns by negative and positive sentiments

Our findings differ from those of Zhou (2018), who shows that high positive sentiments are followed by low housing returns. On the one hand, our conclusions are based on panel data analysis for 18 cities rather than for Shanghai only. The broad set of cities makes our findings more representative of the Chinese housing markets. On the other hand, this different pattern may be related to our sample period. Zhou (2018) constructs a trade-based sentiment index for Shanghai from January 2010 to May 2015, a period in which housing prices experienced several rounds of upswing. In contrast, our sample covers the period of January 2016 to October 2020, during which a series of tightening policies were introduced to curb the rapid growth of housing prices. The tightening policy trend and the global COVID-19 epidemic may have resulted in a generally low level of housing sentiment relative to the period before 2016.

Sentiments and Housing Supply Elasticities

Given that housing supply elasticity is one of the key determinants of housing prices (Glaeser et al., 2006; Liu, 2014), we investigate potential heterogeneous sentiment effects across cities with different housing supply elasticities in this section. The supply elasticity estimates are obtained from Liu et al. (2019), who construct housing supply elasticities for 282 cities in China by combining natural geographical constraints, cultivated land protection constraints, and floor area ratio regulations of local housing markets. These elasticity measures show that housing supply in most Chinese cities is inelastic overall and exhibits cross-sectional heterogeneity.

Based on these elasticity estimates, we divide cities into two groups, one with a relatively elastic housing supply and the other with a relatively inelastic housing supply. More specifically, we define a dummy variable, \({ElastHigh}_{i}\), which takes the value of 1 if the elasticity of city \(i\) is greater than the 75th percentile of the elasticities of the 282 cities given by Liu et al. (2019), and the value of 0 otherwise. In our sample, Shanghai, Xiamen, and Hefei belong to the supply elastic group (\({ElastHigh}_{i}=1\)). Note that Shanghai has a relatively high supply elasticity, which could be related to its looser land use regulations and flat terrain (Liu et al., 2019).

We then include an interaction term between \({S}_{it}^{PLS}\) and \({ElastHigh}_{i}\) in Eq. (5). The results are given in Table 7. For b = 1, the coefficient on the interaction term is significantly negative, which means that the local housing sentiment has a stronger impact on returns in cities with a relatively inelastic housing supply than in cities with a more elastic housing supply. The heterogeneous sentiment effects also persist at longer horizons (b = 3, 6, 9). These significant heterogeneous sentiment impacts on housing returns across cities with varying supply elasticities are consistent with the findings in Zheng et al. (2016), which show that the confidence index based on Google search also has a larger impact on housing appreciation in cities with a relatively inelastic housing supply.
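The construction of the elasticity dummy and the interaction regressor can be sketched as follows. The function names and all numbers are hypothetical, and the percentile is computed with Python's "inclusive" interpolation rule, which is an assumption, since the exact percentile convention is not stated in the text.

```python
import statistics

def elast_high(city_elasticity, national_elasticities):
    """Dummy equal to 1 if a city's elasticity exceeds the 75th percentile
    of the national distribution, and 0 otherwise."""
    q75 = statistics.quantiles(national_elasticities, n=4,
                               method="inclusive")[2]
    return {city: int(e > q75) for city, e in city_elasticity.items()}

def interaction(sentiment, dummy):
    """Interaction regressor S_it * ElastHigh_i for one city's series."""
    return [v * dummy for v in sentiment]

# Made-up numbers for illustration only (not the published estimates).
national = [0.5, 1.0, 1.5, 2.0, 2.5]
cities = {"CityA": 2.4, "CityB": 0.9}
dummies = elast_high(cities, national)
city_a_term = interaction([0.2, -0.1], dummies["CityA"])
```

For cities with a dummy of zero the interaction term vanishes, so the coefficient on sentiment alone measures the effect in inelastic-supply cities and the interaction coefficient measures the difference for elastic-supply cities.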

Table 7 Interactive effects of housing supply elasticities and sentiments

The finding that local housing supply elasticities play an important role in the sentiment effects highlights the necessity of considering local market conditions in sentiment analysis. For cities with a much more elastic housing supply, the impacts of market sentiments on returns are much smaller or even negligible. This may be because cities with an elastic housing supply can adjust supply easily when market sentiment is expected to warm up. However, the supply elasticity is not time-invariant. According to Liu et al. (2019), natural geographical constraints, cultivated land protection constraints, and floor area ratio regulations are the three main factors determining local supply elasticities. While the natural physical landscape is hard to change and land protection constraints are not allowed to be loosened, local governments could loosen regulations on floor area ratios in land development to alleviate the impacts of housing sentiments.

Predicting Sentiments with Returns

So far, we have shown that the transaction-based local housing sentiment has a positive effect on future housing returns, exhibits the pattern of short-run underreaction and long-run overreaction, and has an asymmetric impact on housing returns. In addition, the market sentiment effects are much stronger and more significant for cities with a relatively inelastic housing supply.

Intuitively, sentiment may be affected by past housing returns. Soo (2018) shows that past price appreciations predict higher media sentiment for the U.S. housing markets. In this section, we investigate the determinants of housing sentiments by running the regression as follows:

$${S}_{i,t+1}^{PLS}=\alpha +\beta {R}_{it}+\gamma {Control}_{it}+\delta {S}_{it}^{PLS}+{\mu }_{i}+{\lambda }_{t}+{\varepsilon }_{it},$$
(6)

where \({S}_{i,t+1}^{PLS}\) denotes the sentiment of city \(i\) in month \(t+1\), and \({R}_{it}\) denotes the housing return of city \(i\) in month \(t\). We control for \({S}_{it}^{PLS}\), \({rents}_{it}\), \({PRR}_{it}\), \({CPI}_{it}\), \({population}_{it}\), \({unemployment}_{it}\), \({income}_{it}\) (and their first lags), city-fixed effects as well as time-fixed effects in Eq. (6).

Table 8 reports the estimation results. Columns (1)-(4) show that the housing return \({R}_{it}\) is positively correlated with future sentiment \({S}_{i,t+1}^{PLS}\), consistent with Soo (2018). Higher past housing returns foster higher market sentiments. The results in Column (4) indicate a significant positive serial correlation in housing sentiments. These results indicate that the expectations of Chinese real estate investors are backward-looking and that there exists a significant feedback effect between housing returns and market sentiments. This feedback effect could result in a pricing-sentiment spiral, as defined by Ling et al. (2015), in which the dynamic interplay between sentiment and house price appreciation creates a self-reinforcing spiral. The pricing-sentiment spiral could potentially extend the length and magnitude of housing market cycles in China and hence greatly amplify the ongoing housing market fever in China.

Table 8 Predicting sentiments with past returns

Moreover, we find negative but insignificant effects of rent growth rates (\({rents}_{it}\)) and price-rent ratios (\({PRR}_{it}\)) on local housing sentiments as shown in Table 8. To further explore how persistently the past accumulated returns can push up local sentiments, we replace \({R}_{it}\) with \({R}_{i,\left[t-b,t\right]}\) in Eq. (6). The last three columns of Table 8 report the estimates for 3-month, 6-month, and 9-month return horizons, respectively. The results show that only the past 1-month return significantly predicts higher market sentiments, while past cumulative returns over longer horizons (i.e., 3-month, 6-month, and 9-month) seem to exert no significant impacts on housing sentiments. The most recent past returns (1-month) appear to exert the greatest impact on housing sentiment, similar to findings in Soo (2018).

Robustness Checks

In this section, we conduct several robustness checks on our main regression results. Firstly, to remove the potential shock of COVID-19 on the Chinese real estate market, we exclude the sample period after November 2019 and re-estimate our main regression. Secondly, we use alternative approaches (scaled PCA and PCA) to construct the sentiment index and re-estimate the main regression. Thirdly, we substitute an alternative proxy of sentiment, the new listings of second-hand properties, for turnover and then re-construct our sentiment index and re-estimate the main regression. Finally, to confirm the reliability of the recursive look-ahead-bias-free implementation of the PLS approach, we conduct a placebo test following Kelly and Pruitt (2013).

Pre-COVID-19 Sub-sample Estimation

COVID-19 broke out in November 2019 and was considered to have caused a huge negative shock to the Chinese economy and especially to the real estate market. To remove the potential effect of COVID-19, we discard the data after November 2019 and re-estimate our main empirical regression. Using the sample from January 2016 to October 2019, Panel A of Table 12 in Appendix B shows that a one-standard-deviation increase in local housing sentiment is associated with a future monthly return increase of approximately 0.14%, which is almost the same as that from the full sample estimation in "Predicting Returns with Local Housing Sentiments" section.

Alternative Approaches of Sentiment Construction: scaled PCA and PCA

To further prove the robustness of the predictability of local housing sentiment on future returns, we consider two alternative sentiment construction methods, i.e., the scaled PCA by Huang et al. (2022) and the traditional PCA method. For both alternative methods, we first follow the first four steps as described in "Housing Sentiment Index" section to get the final sentiment proxies (MedianIntv, Turnover, and SMB). And then, for the PCA method, we extract the first principal component of these three proxies to derive the PCA housing sentiment \({S}_{it}^{PCA}\).

For the scaled PCA method, after we get the final sentiment proxies following the first four steps, for month t, we run the following regression:

$${Return}_{s}={\delta }_{i,t}+{\gamma }_{i,t}{proxy}_{i,s-1}+{u}_{i,s},s\le t,$$
(7)

by using information only up to month t. We then apply PCA to the scaled proxies (\({\widehat{\gamma }}_{1,t}MedianIntv\), \({\widehat{\gamma }}_{2,t}Turnover\), \({\widehat{\gamma }}_{3,t}SMB\)) to extract the first component as the housing sentiment \({S}_{t}^{sPCA}\).Footnote 12 By recursively estimating the above predictive regression (7) and applying PCA to the scaled proxies with an expanding window scheme, we obtain a recursive look-ahead-bias-free scaled PCA estimate of the housing sentiment sequence \(\{{S}_{t}^{sPCA}, t={t}_{0},\dots ,T\}\). As with the recursive implementation of PLS, we use the first twelve months as the initial training sample, so the constructed scaled PCA housing sentiment sequences have sample periods that are one year shorter.
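The recursive scaled PCA construction can be sketched as follows for a single city. This is an illustrative simplification, not the authors' code: the function names are hypothetical, the proxies are assumed to be standardized within each expanding window before scaling, and the first principal component is computed by power iteration on the covariance matrix of the scaled proxies.

```python
import math

def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx

def standardize(x):
    """Demean a series and scale it to unit (population) standard deviation."""
    m = sum(x) / len(x)
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / len(x))
    return [(v - m) / sd for v in x]

def first_pc_scores(columns, iters=500):
    """First principal component scores via power iteration on the
    covariance matrix of the demeaned columns."""
    k, n = len(columns), len(columns[0])
    means = [sum(c) / n for c in columns]
    d = [[v - m for v in c] for c, m in zip(columns, means)]
    cov = [[sum(d[a][t] * d[b][t] for t in range(n)) / n for b in range(k)]
           for a in range(k)]
    w = [1.0 / math.sqrt(k)] * k
    for _ in range(iters):
        w = [sum(cov[a][b] * w[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return [sum(w[a] * d[a][t] for a in range(k)) for t in range(n)]

def recursive_spca(proxies, returns, train=12):
    """Returns {t: S_t} (0-based t) using only data up to month t."""
    T = len(returns)
    sentiment = {}
    for t in range(train - 1, T):
        window = [standardize(p[:t + 1]) for p in proxies]
        # Eq. (7): regress Return_s on proxy_{i,s-1} for s <= t.
        gammas = [ols_slope(z[:t], returns[1:t + 1]) for z in window]
        scaled = [[g * v for v in z] for g, z in zip(gammas, window)]
        sentiment[t] = first_pc_scores(scaled)[-1]
    return sentiment
```

One practical caveat: the sign of a principal component is arbitrary, so a real implementation would need a sign convention to keep the index comparable across the expanding windows; this sketch does not impose one.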

As the correlation matrix in Table 9 of Appendix A shows, the alternative scaled PCA sentiment index \({S}_{it}^{sPCA}\) is significantly positively correlated with the original sentiment \({S}_{it}^{PLS}\) (with a correlation coefficient of 0.491). The PCA sentiment \({S}_{it}^{PCA}\) is also significantly positively correlated with \({S}_{it}^{PLS}\), but with a much smaller correlation coefficient of 0.233. We re-estimate our main regression by replacing the sentiment estimates with \({S}_{it}^{sPCA}\) and \({S}_{it}^{PCA}\); the results are summarized in Panel B (scaled PCA) and Panel C (PCA) of Table 12 in Appendix B, respectively. As the results show, housing sentiment indices derived from both alternative methods still exert a significant impact on future housing returns, although the impacts are smaller and less significant than those from the PLS method.

To briefly sum up, the predictability of the local housing sentiment we document is robust to the choice of the sentiment index construction method. However, our results do suggest that the PLS method seems to do a better job at constructing a predictive sentiment index than scaled PCA and PCA in our research setting.

Alternative Proxy: New listings

Lowry (2003) suggests that the number of initial public offerings (IPOs) reflects stock market sentiment. Similarly, in the housing market, a larger number of new listings could indicate a higher market sentiment. In this section, we replace the raw proxy \({Turnover}_{it}\) with \({NewListings}_{it}\), the number of new listings at month \(t\) in city \(i\).Footnote 13 Then we apply the same proxy cleaning procedures to MedianIntv, NewListings, and SMB, and conduct the recursive look-ahead-bias-free implementation of PLS to construct a look-ahead-bias-free sentiment index \({S}_{it}^{{PLS}^{\prime}}\) for each city in our sample.

As the correlation matrix in Table 9 of Appendix A shows, NewListings is significantly positively correlated with Turnover (with a correlation coefficient of 0.118) but negatively correlated with MedianIntv (with a coefficient of -0.122). The alternative sentiment index \({S}_{it}^{{PLS}^{\prime}}\) is significantly positively correlated with the original \({S}_{it}^{PLS}\) (with a correlation coefficient of 0.466). We display the main regression results using this alternative sentiment index \({S}_{it}^{{PLS}^{\prime}}\) in Panel D of Table 12 in Appendix B. The coefficient on \({S}_{it}^{{PLS}^{\prime}}\) (0.178 and significant at the 5% level) is close to the 0.137 from the main estimation in Column (6) of Table 4. This result suggests that our findings on the significant predictability of housing returns by housing sentiment are robust to the alternative proxy NewListings.

Placebo Test

Following Kelly and Pruitt (2013), we conduct a placebo test to confirm that our estimation procedure does not generate sentiment predictability mechanically. Intuitively, we can test our method by using simulated random sentiment known to have no true return forecasting ability. In detail, we conduct the placebo test as follows. First, we generate three random series from the standard normal distribution (with zero mean and unit variance) as “fake” raw proxies for each city. Second, we apply the five-step approach described in "Housing Sentiment Index" section to these randomly generated proxy series to construct a “placebo” look-ahead-bias-free PLS sentiment index \({S}_{it}^{Placebo}\). Finally, we examine the predictability of this “placebo” sentiment on future returns. Finding no significant predictive power for this “placebo” sentiment would be evidence that the look-ahead-bias-free implementation of PLS does not yield “false predictability” mechanically. As the results in Table 13 of Appendix C show, this “placebo” sentiment exhibits no consistent predictability on future housing returns.Footnote 14 This test confirms that the recursive look-ahead-bias-free implementation of PLS does not mechanically yield a sentiment with significant predictability.
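The logic of the placebo test can be illustrated with the simplified single-city sketch below. It departs from the procedure described above in ways that should be read as assumptions: the function names are hypothetical, the proxy cleaning steps are skipped, the predictive regression has no controls or fixed effects, and the t-statistic uses conventional rather than clustered standard errors.

```python
import math
import random

def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx

def slope_and_t(x, y):
    """OLS slope and its conventional (homoskedastic) t-statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = ols_slope(x, y)
    resid = [(b - my) - slope * (a - mx) for a, b in zip(x, y)]
    se = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    return slope, slope / se

def pls_index(proxies, returns, train=12):
    """Recursive two-stage PLS index {t: S_t} using data up to month t."""
    out = {}
    for t in range(train - 1, len(returns)):
        loadings = [ols_slope(returns[1:t + 1], p[:t]) for p in proxies]
        out[t] = ols_slope(loadings, [p[t] for p in proxies])
    return out

def placebo_test(returns, n_proxies=3, train=12, seed=0):
    """Build a PLS index from pure-noise proxies and regress next-month
    returns on it. Returns (slope, t-statistic)."""
    rng = random.Random(seed)
    T = len(returns)
    fake = [[rng.gauss(0.0, 1.0) for _ in range(T)]
            for _ in range(n_proxies)]
    s = pls_index(fake, returns, train)
    months = [t for t in sorted(s) if t + 1 < T]
    return slope_and_t([s[t] for t in months],
                       [returns[t + 1] for t in months])
```

Repeating `placebo_test` over many seeds and counting how often the t-statistic exceeds conventional critical values gives a rough check on whether the recursive procedure produces spurious predictability.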

Conclusions

This paper investigates the relationship between local housing sentiments and returns in Chinese housing markets. We construct monthly city-level sentiment indices for 18 Chinese cities from January 2016 to October 2020 by using a massive second-hand transaction dataset through a recursive look-ahead-bias-free implementation of the PLS method. These local housing sentiment indices are based on two housing market liquidity proxies and a small-house return premium measure. Empirically, we find that local housing sentiments can significantly predict future housing market returns. The sentiment impact is comparable with estimates for the U.S. housing market in the literature, implying that the Chinese housing markets are also susceptible to irrational sentiments even under a stringent regulatory environment.

Furthermore, we document a salient pattern of short-run underreaction and long-run overreaction in the sentiment effects. Further analysis shows that local housing sentiment impacts are asymmetric, and that housing returns in cities with a relatively inelastic housing supply are more sensitive to local market sentiments. Last but not least, we show that the expectations of Chinese real estate investors are backward-looking and that there exists a significant feedback effect between housing returns and market sentiments.

Our major findings are robust to alternative sentiment construction methods and alternative sentiment proxy choice, and consistent for the sub-sample before COVID-19. Our empirical analysis enriches the literature on the role of sentiments in the housing market and provides better insights into the housing market dynamics. This paper’s findings can provide references for policymakers to stabilize and improve the functioning of the housing market.