1 Introduction

Since B. Mandelbrot identified the fractal structure of price fluctuations in asset markets in 1963 Mandelbrot (1963), statistical physicists have been investigating the economic mechanism through which a fractal structure emerges. Power laws are an important characteristic of fractal structures. For example, some studies found that the size distribution of asset price fluctuations follows a power law Mantegna and Stanley (2000), Mizuno et al. (2003). In addition, the firm size distribution (e.g., the distribution of sales across firms) has also been shown to follow a power law Stanley et al. (1995), Axtell (2001), Okuyama et al. (1999), Mizuno et al. (2012), Clementi and Gallegati (2016). The power law exponent associated with firm size distributions has been close to one over the last 30 years in many countries Mizuno et al. (2002), Fujimoto et al. (2011). The case in which the exponent equals one is special in that it is the critical point between the oligopolistic phase and the pseudo-equal phase Aoyama (2010). If the power law exponent is less than one, a finite number of top firms occupy a dominant share of the market even if there are infinitely many firms.

One of the most important issues along this line of research is the power laws associated with asset price fluctuations, and several models describing asset price dynamics have been proposed Richmond et al. (2013), Abergel (2016). In particular, asset price bubbles have been regarded as an important research topic. For example, the PACK and LPPL models simulated the price fluctuations of a stock during bubble periods Sornette (2004), Takayasu et al. (2010). In economics and finance, an asset price bubble is defined as the deviation of the price of an asset from its fundamental value. However, it is not easy to obtain information about fundamentals. For example, economists often state that the fundamental stock value of a company equals the present discounted value of the dividends delivered by the company in the years to come. It is hard to get a reliable estimate of future dividends; therefore, it is next to impossible to estimate fundamental stock prices, and without accurate information on fundamentals, bubbles cannot be detected. This is a serious issue for policy-makers, like governments and central banks, since the emergence and the burst of bubbles often lead to intolerable economic disasters, such as financial crises.

The purpose of this paper is to propose a method to detect stock price bubbles in a timely manner. Our basic idea is closely related to a method we proposed to detect bubbles in real estate prices Ohnishi et al. (2012). In that study, we looked for houses that are similar in various respects, including location, size, and age. We argued that houses with similar attributes can be regarded as having similar fundamental values; therefore, the prices of these houses should be similar if there are no bubbles in the housing market. Based on this idea, we looked at the distribution of prices for houses with similar attributes, showing that it is close to a log-normal distribution during normal periods but has a heavy upper tail during bubble periods. In the present paper, we apply this idea to stock markets to detect stock price bubbles.

In this paper, we use a dataset compiled by the Thomson Reuters Corporation that covers the daily market capitalization and annual income statements of all the firms listed in NASDAQ and the Shanghai stock exchange (SSE) from 1990 to 2015. This period includes the 2000 dot-com and 2007 Shanghai bubbles. We focus on the distribution of market capitalization in NASDAQ and SSE during these bubbles. Kaizoji et al. showed that the upper tail of the stock price distribution in the Tokyo stock exchange grew fat during the dot-com bubble period Kaizoji (2006a, b). However, to investigate firm size in stock markets accurately, not only the price but also the number of outstanding shares must be taken into consideration because share consolidations and splits often occur in the market.

The rest of the paper is organized as follows. Section 2 examines the power law of the market capitalization distribution using an expansion of the Castillo and Puig test Fujimoto et al. (2011), Malevergne et al. (2011), Del Castillo and Puig (1999), Hisano and Mizuno (2010). In Sect. 3, we observe that the power law index fluctuates around one, depending on economic conditions, and tends to become smaller during bubble periods. In Sect. 4, we find that net assets are the fundamental most strongly reflected in market capitalization for firms listed in NASDAQ during non-bubble periods. The distribution of the price-to-book ratio (PBR, P/B Ratio), which is defined as market capitalization divided by net assets, became fatter during the bubble periods. These results suggest that speculative money is excessively concentrated on specific stocks during bubble periods. Section 5 concludes the paper.

2 Distribution of market capitalization

First, we show the cumulative distribution of market capitalization for the firms listed in a stock market. Fig. 1 shows the distribution in NASDAQ on December 31, 1997 in a log-log plot. The upper tail of the distribution lies on a straight line, which indicates a power law function, whereas the distribution deviates from the straight line and is close to a log-normal distribution when the market capitalization is smaller than about \(2.7 \times 10^8\) dollars. The upper tail is described by

$$P_{>}(x) \propto x^{-\mu} \quad {\rm{for}} \quad x \geq x_0$$
(1)

where x is the market capitalization, \(\mu\) is a power law index, and \(x_0\) is a threshold. The index estimated with the maximum likelihood method is \(\mu =1.0\) (Fig. 1). Such a power law with \(\mu =1.0\) is called Zipf’s law.
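
The maximum likelihood estimate mentioned above can be sketched as follows. For a pure power law tail as in Eq. (1), the estimator has the closed form \(\hat{\mu} = n / \sum_i \ln (x_i / x_0)\) over the \(n\) observations with \(x_i \ge x_0\) (often called the Hill estimator). The function names and the synthetic sample are illustrative assumptions and are not taken from the dataset used in this paper.

```python
import math
import random


def pareto_tail_index(values, x0):
    """Maximum likelihood (Hill) estimate of mu for the tail x >= x0."""
    tail = [x for x in values if x >= x0]
    log_sum = sum(math.log(x / x0) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("need observations strictly above x0")
    return len(tail) / log_sum


def sample_pareto(n, mu, x0, rng):
    """Inverse-transform sampling from P_>(x) = (x / x0) ** (-mu)."""
    return [x0 * (1.0 - rng.random()) ** (-1.0 / mu) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic market capitalizations (hypothetical, not the paper's data).
    caps = sample_pareto(20000, mu=1.0, x0=2.7e8, rng=rng)
    print("estimated mu:", pareto_tail_index(caps, 2.7e8))
```

On a synthetic sample drawn with \(\mu = 1\), the estimate should lie close to one, with a standard error of roughly \(\mu / \sqrt{n}\).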

Malevergne et al. (2011) expanded the likelihood ratio test between the exponential distribution and the truncated normal distribution introduced by Del Castillo and Puig (1999). Taking logarithms maps a power law to an exponential distribution and a log-normal distribution to a normal distribution, so the expanded test takes as the null hypothesis that, beyond a threshold, the upper tail of a distribution follows a power law, against the alternative that the upper tail follows a log-normal distribution beyond the same threshold. This test is known as the uniformly most powerful unbiased test. By conducting it, they determined whether the upper tail follows a log-normal or a power law distribution and thereby detected the threshold.

The arrow in Fig. 1 marks the threshold between a power law distribution and a log-normal distribution when the significance level of Malevergne’s test is set to 0.1. Above this threshold, the upper tail of the market capitalization distribution is well approximated by the power law function. On the other hand, the range of small market capitalization follows the log-normal distribution, as shown by the dashed curve in Fig. 1.
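
The logic of testing a power law tail against a log-normal tail can be illustrated with a simplified Monte Carlo stand-in. It is not the exact likelihood ratio procedure of Malevergne et al. (2011). The sketch relies on the coefficient of variation of the log-excesses \(y = \ln (x / x_0)\), which is one for an exponential distribution (power law tail) and below one for a normal distribution truncated from below (log-normal tail). The function names, the number of simulations, and the synthetic samples are assumptions made for illustration.

```python
import math
import random


def coefficient_of_variation(y):
    """Standard deviation divided by mean (population form)."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    return math.sqrt(var) / mean


def tail_cv(values, x0):
    """CV of y = ln(x / x0) over the tail x >= x0, and the tail size."""
    y = [math.log(x / x0) for x in values if x >= x0]
    if len(y) < 2 or sum(y) == 0.0:
        raise ValueError("need at least two observations above x0")
    return coefficient_of_variation(y), len(y)


def power_law_p_value(values, x0, n_sim=500, seed=0):
    """One-sided Monte Carlo p-value for H0: the tail above x0 is a power law.

    The CV is scale invariant, so the null is simulated with unit-rate
    exponentials; small observed CVs count as evidence for a log-normal tail.
    """
    rng = random.Random(seed)
    observed, n = tail_cv(values, x0)
    hits = 0
    for _ in range(n_sim):
        sim = [rng.expovariate(1.0) for _ in range(n)]
        if coefficient_of_variation(sim) <= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    lognormal = [rng.lognormvariate(0.0, 0.5) for _ in range(2000)]
    print("p-value, log-normal sample:", power_law_p_value(lognormal, 1.0))
```

In a threshold search, one would repeat this for increasing candidate values of \(x_0\) and keep the smallest one at which the power law is no longer rejected at the chosen significance level (0.1 in this paper).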

Fig. 1

Cumulative distribution of market capitalization for listed firms in NASDAQ on December 31, 1997. Dashed straight line and curve, respectively, express a power law distribution \(P_{>}(x) \propto x^{-1}\) and a log-normal distribution with the standard deviation of market capitalizations. Arrow indicates \(x_0 = 2.7 \times 10^8\) dollars in Eq. (1)

3 Power law index of market capitalization distribution

The upper tail of the market capitalization distribution becomes fatter if speculative money is concentrated on specific stocks. Such concentration of money tends to occur during bubble periods. Indeed, the power law index became smaller during the 2000 dot-com and 2007 Shanghai bubbles.
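
To see why a smaller index signals concentration, assume, as an idealization, that the tail above \(x_0\) is exactly Pareto with the index \(\mu\) of Eq. (1), so that its density is \(p(x) = \mu x_0^{\mu} x^{-\mu - 1}\). The mean size of the firms in the tail is then

$$\begin{aligned} \langle x \rangle = \int_{x_0}^{\infty} x \, \mu x_0^{\mu} x^{-\mu - 1} \, dx = \frac{\mu}{\mu - 1} \, x_0 \quad {\rm{for}} \quad \mu > 1, \end{aligned}$$

and the integral diverges for \(\mu \le 1\). As \(\mu\) falls toward one, the mean is therefore increasingly dominated by the few largest firms.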

The means of market capitalization for all the listed firms in NASDAQ and SSE were, respectively, about \(1.94 \,\times\, 10^9\) dollars on March 9, 2000 and \(3.92 \,\times\, 10^9\) dollars on December 4, 2007 during their bubble periods. On the other hand, the means were, respectively, about \(1.44\, \times\, 10^9\) and \(3.04 \times 10^9\) dollars on March 14, 2011, during a non-bubble period. The means were thus higher during the bubble periods.

Why does the mean increase during bubble periods? One possibility is that only a few firms increased their market capitalization drastically and thereby raised the mean over all the listed firms. The black lines in Fig. 2 show the market capitalization distributions for firms listed on the NASDAQ during the 2000 dot-com bubble (Fig. 2a) and on the SSE during the 2007 Shanghai bubble (Fig. 2b). The gray lines express the distributions on March 14, 2011, during a non-bubble period. In the dot-com bubble case, the cumulative probability on the vertical axis at which the black line intersects the gray line is \(P_{>}(x = 2 \times 10^9) \approx 0.1\). That is, the market capitalization of the top 10 % of firms was higher in 2000 than in 2011, whereas that of the bottom 90 % was lower in 2000 than in 2011. The same characteristic was observed in the Shanghai bubble case (Fig. 2b), where the cumulative probability at the crossing point is \(P_{>}(x = 1.2 \times 10^{10}) \approx 0.04\). These results suggest that speculative funds concentrate on a very small set of stocks, leading to stock price bubbles.
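
The crossing point of two cumulative distributions can be located with a short routine such as the following sketch, which compares the upper quantiles of a bubble-period sample with those of a normal-period sample. The function name, the grid of probability levels, and the toy numbers are hypothetical and only illustrate the idea.

```python
def crossing_probability(bubble, normal, levels=100):
    """Largest tail probability k / levels up to which every bubble-period
    upper quantile exceeds the corresponding normal-period quantile."""
    b = sorted(bubble, reverse=True)
    n = sorted(normal, reverse=True)
    crossing = 0.0
    for k in range(1, levels + 1):
        # Index of the firm located at tail probability k / levels.
        ib = max(0, len(b) * k // levels - 1)
        i_n = max(0, len(n) * k // levels - 1)
        if b[ib] <= n[i_n]:
            break
        crossing = k / levels
    return crossing


if __name__ == "__main__":
    # Toy market capitalizations in arbitrary units (not real data).
    bubble = [100, 50, 3, 2, 2, 1, 1, 1, 1, 1]
    normal = [20, 10, 8, 6, 5, 4, 3, 3, 2, 2]
    print("mean (bubble):", sum(bubble) / len(bubble))
    print("mean (normal):", sum(normal) / len(normal))
    print("crossing:", crossing_probability(bubble, normal, levels=10))
```

In the toy sample, only the two largest firms are bigger in the bubble period, yet they are enough to raise the overall mean, which mirrors the mechanism described above.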

The concentration of speculative money changes the slope of the distribution. Fig. 3 shows the NASDAQ composite index, the SSE composite index, and the time series of the power law index in NASDAQ and SSE. The power law index, which fluctuated around one depending on the economic conditions, fell significantly when each composite index started to rise during the dot-com and 2007 Shanghai bubble periods and remained small until each composite index steadied after the bubble burst.

Fig. 2

Cumulative distributions of market capitalization for listed firms: a in NASDAQ on March 9, 2000 and on March 14, 2011 and b in SSE on December 4, 2007 and on March 14, 2011

Fig. 3

a Power law index in NASDAQ and NASDAQ composite index and b Power law index in SSE and SSE composite index. Black and gray lines, respectively, express power law and composite indexes. Dashed lines indicate that the power law index is one

4 Fundamental-adjusted market capitalization in NASDAQ

In economics, a financial bubble is defined by the gap between market and fundamental prices. We show that the gap expanded during the 2000 dot-com bubble period.

First, we look for a firm’s fundamentals that are mainly reflected in its market capitalization during non-bubble periods. We chose the following key financial variables as candidates for firm fundamentals: total assets, net assets, total revenue, operating income, net income, operating cash flow, and number of employees. We then investigated the correlation between market capitalization and each key financial variable during non-bubble periods. Table 1 lists the Kendall and Pearson correlation coefficients for firms listed on NASDAQ in 1997 and 2004. The correlation coefficients between market capitalization and net assets are the largest. However, the other correlation coefficients are also high, suggesting the possibility of spurious correlation. To cope with this problem, we conduct random forest regression with the market capitalization of individual firms as the dependent variable and the financial variables as independent variables.
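
The two correlation measures can be computed as in the following minimal sketch. The Kendall coefficient is implemented in its simplest form (tau-a, which ignores ties); which variant underlies Table 1 is not stated here, so this choice is an assumption, as are the function names and the toy numbers.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def kendall_tau_a(xs, ys):
    """Kendall rank correlation (tau-a): concordant minus discordant pairs,
    divided by the total number of pairs."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Toy values: a monotone but nonlinear relation between two variables.
    net_assets = [1, 2, 3, 4, 5]
    market_cap = [1, 4, 9, 16, 25]
    print("Kendall:", kendall_tau_a(net_assets, market_cap))
    print("Pearson:", pearson(net_assets, market_cap))
```

Because the Kendall coefficient depends only on ranks, it is less sensitive than the Pearson coefficient to the few extremely large firms in a heavy-tailed sample.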

We report the variable importance obtained from random forests, which is hardly affected by multicollinearity even when the data exhibit it. Random forests can be used to rank the importance of the explanatory variables in a regression by estimating the explained variable using regression trees built without part of the explanatory variables. We set market capitalization as the explained variable and the financial variables as the explanatory variables. Fig. 4a displays the time series of the importance of the financial variables from 1995 to 2013 in NASDAQ. Net assets are the most important variable in all years except 2011, and the other financial variables are unimportant in most years. Fig. 4b shows the importance of the financial variables from 2000 to 2013 in SSE, where, in contrast, both net assets and net income are critical. These results suggest that the high correlations observed between market capitalization and the other financial variables are likely to be spurious.
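
How a random forest ranks explanatory variables can be illustrated with a toy, pure-Python sketch. The implementation and the importance measure used for Fig. 4 are not specified here, so the sketch assumes out-of-bag permutation importance: each tree is grown on a bootstrap sample with a random subset of variables tried at each split, and the importance of a variable is the average increase in out-of-bag squared error when its values are shuffled. All names, parameters, and the synthetic data are hypothetical.

```python
import random


def sse(ys):
    """Sum of squared deviations from the mean."""
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)


def fit_tree(X, y, depth, mtry, rng, min_leaf=5):
    """Grow a regression tree: a leaf is a float, a node is
    (feature, threshold, left_subtree, right_subtree)."""
    if depth == 0 or len(y) < 2 * min_leaf:
        return sum(y) / len(y)
    best = None
    for f in rng.sample(range(len(X[0])), mtry):  # random subset per split
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if len(left) >= min_leaf and len(right) >= min_leaf:
                score = sse(left) + sse(right)
                if best is None or score < best[0]:
                    best = (score, f, t)
    if best is None:
        return sum(y) / len(y)
    _, f, t = best
    lo = [i for i, row in enumerate(X) if row[f] <= t]
    hi = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            fit_tree([X[i] for i in lo], [y[i] for i in lo], depth - 1, mtry, rng, min_leaf),
            fit_tree([X[i] for i in hi], [y[i] for i in hi], depth - 1, mtry, rng, min_leaf))


def predict(tree, row):
    """Follow the splits until a leaf (a float) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def mse(tree, X, y):
    return sum((predict(tree, r) - yi) ** 2 for r, yi in zip(X, y)) / len(y)


def forest_importance(X, y, n_trees=20, depth=4, mtry=None, seed=0):
    """Out-of-bag permutation importance of each column of X (lists of floats)."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    mtry = mtry or max(1, p // 3)
    gains = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:
            continue
        tree = fit_tree([X[i] for i in boot], [y[i] for i in boot], depth, mtry, rng)
        Xo, yo = [X[i] for i in oob], [y[i] for i in oob]
        base = mse(tree, Xo, yo)
        for f in range(p):
            col = [r[f] for r in Xo]
            rng.shuffle(col)  # break the link between variable f and y
            Xp = [r[:f] + [v] + r[f + 1:] for r, v in zip(Xo, col)]
            gains[f] += (mse(tree, Xp, yo) - base) / n_trees
    return gains
```

For example, with `X` holding one row of financial variables per firm and `y` holding the corresponding market capitalizations (possibly log-transformed), `forest_importance(X, y)` returns one score per variable, and the variable with the largest score is the one most strongly reflected in market capitalization. For real data, an established library implementation would be preferable to this toy version.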

Table 1 Kendall and Pearson correlation coefficients between market capitalization and each key financial variable in 1997 and 2004 in NASDAQ

Fig. 4

Importance of financial variables: a from 1995 to 2013 in NASDAQ and b from 2000 to 2013 in SSE. Asterisks, filled square, +, unfilled triangle, diamond, filled circle, and filled triangle are, respectively, total assets, net assets, total revenue, operating income, net income, operating cash flow, and number of employees

Next, we introduce fundamental-adjusted market capitalization. In NASDAQ, since net assets are the only main fundamental reflected in market capitalization, we introduce market capitalization adjusted by net assets. In finance, net asset-adjusted market capitalization is called the PBR (or P/B Ratio):

$$\begin{aligned} PBR_i (t) = \frac{x_i (t)}{A_i (t)}, \end{aligned}$$
(2)

where \(x_i (t)\) is the market capitalization of firm \(i\) on the settlement day in year \(t\), and \(A_i (t)\) is its net assets in year \(t\). Fig. 5 shows the distributions of PBR (1997) in the pre-bubble period, of PBR (1999) in the bubble period, and of PBR (2004) in the post-bubble period in NASDAQ. The distribution became fat during the bubble period and returned to its former position after the bubble burst.
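
Eq. (2) and the comparison of upper tails can be sketched as follows. Skipping firms with non-positive net assets is an assumption made for this illustration, and the function names and toy numbers are hypothetical.

```python
def price_to_book(market_caps, net_assets):
    """PBR_i = x_i / A_i as in Eq. (2).

    Firms with non-positive net assets are skipped (an assumption of this
    sketch, since the ratio is not meaningful for them).
    """
    return [x / a for x, a in zip(market_caps, net_assets) if a > 0]


def tail_fraction(ratios, level):
    """Empirical P_>(level): share of firms whose PBR exceeds `level`."""
    return sum(r > level for r in ratios) / len(ratios)


if __name__ == "__main__":
    # Toy values in arbitrary units (not real data).
    caps = [10.0, 40.0, 90.0, 8.0]
    assets = [5.0, 10.0, 10.0, 0.0]
    pbr = price_to_book(caps, assets)
    print("PBR:", pbr)
    print("share with PBR > 3:", tail_fraction(pbr, 3.0))
```

Evaluating `tail_fraction` at the same levels for different years gives a simple way to check whether the upper tail of the PBR distribution has become fatter.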

Next, we focus on SSE. Unlike in NASDAQ, market capitalization in SSE reflects two financial variables: net assets and net income (Fig. 4b). When the net income of a firm is close to zero, its market capitalization divided by net income becomes extremely large. The upper tail of the distribution of divided market capitalization therefore responds sensitively to the fluctuation of net income. In future work, we will propose market capitalization adjusted by both net assets and net income.

Fig. 5

Cumulative distributions of PBR in NASDAQ in (filled triangle) 1997, (filled diamond) 1999, and (unfilled square) 2004

5 Conclusion

We showed the distributions of market capitalization in NASDAQ and SSE. The upper tails of the distributions follow a power law. The power law index, which fluctuates around one depending on the economic conditions, became small during the 2000 dot-com and 2007 Shanghai bubble periods, suggesting that speculative money was excessively concentrated on a very small set of stocks, leading to stock price bubbles.

In economics and finance, a stock price bubble is defined by the gap between firm sizes in the stock market and in real economies. We used market capitalization and financial variables to estimate the firm sizes in stock markets and real economies. Using the variable importance of random forests for market capitalization and financial variables, we found that net assets are the variable most strongly reflected in market capitalization for NASDAQ firms. For such firms, PBR is defined as market capitalization divided by net assets. The PBR distribution also became fat during the dot-com bubble period. This result means that the gap between firm sizes in asset markets and in real economies widened during the bubble period. Monitoring the PBR distribution may therefore be a useful tool for policy-makers, like governments and central banks, to detect stock price bubbles. Note that changes in the PBR distribution can be monitored at daily or even higher frequencies, so that policy-makers will be able to evaluate the risk of asset price bubbles almost on a real-time basis.

Both net assets and net income are greatly reflected in market capitalization for SSE firms. Market capitalization divided by net income becomes extremely large when the net income is close to zero. Therefore, the upper tail of the distribution of divided market capitalization responds sensitively to the fluctuation of net income. In future work, we will propose market capitalization that is adjusted by both net assets and net income to investigate the 2007 Shanghai bubble.

Although market capitalization is made public every day, financial variables are usually announced only quarterly or annually. This difference in timescale complicates the estimation of daily gaps between firm sizes in stock markets and real economies. In other future work, we will nowcast the key financial variables on a daily basis.