1 Introduction

The high degree of financial integration during the Classical Gold Standard is well established in the literature. Obstfeld and Taylor (2005) point to the role of the Gold Standard in driving a convergence in interest rates across countries and increases in capital flows. They refer to the period between 1870 and the First World War as when ‘the first age of globalization sprang forth’ (Obstfeld and Taylor 2005, p. 25). O'Rourke (2002, p. 2) notes that this was ‘the period that saw the largest decline ever in inter-continental barriers to trade and factor mobility’. Various other studies have shown that integration was high when measured by the efficiency of international arbitrage (Canjels et al. 2004), capital account openness (Quinn 2003), capital exports (Esteves 2006, 2011) and movements in sovereign bond markets (Volosovych 2011; Mauro et al. 2002).

However, while studies have considered various aspects of integration during this period, relatively little is known about the integration of stock markets generally. Nonetheless, as noted in Bekaert and Mehl (2019), more recent studies have challenged the view that financial globalization was largely driven by debt and foreign direct investment, opening a broader discussion on integration in other segments of financial markets.

This paper studies stock market integration across eight markets during the Classical Gold Standard, using monthly data that capture both the most industrialized countries and emerging markets. In the first instance, the data are used to identify a ‘global component’ of stock markets. The co-movement of the stock markets with this global component gives us a measure of the integration of each of the eight individual markets. While this method has been used in studies of more recent stock market data (Pukthuanthong and Roll 2009) and in measures of integration in other markets (Volosovych 2011; Ciccarelli and Mojon 2010), this is, to my knowledge, the first paper to do so for stock markets in this period. These co-movements are then related to historical events to understand the pattern of integration in individual markets.

This paper is novel in being the first to use such a broad geographic sample to study stock market integration during the Gold Standard period and relate it to historical events. Existing studies focusing on the Gold Standard period deal largely with the integration between two or three stock markets (for instance, Campbell and Rogers 2017; Stuart 2017, 2018). Some studies use long datasets reaching from the present day back to the Gold Standard. These use data on four or five markets during the Gold Standard, but their focus is generally to obtain an overall composite measure of integration rather than to explain the integration of individual markets and the discussion generally focusses on more recent developments (Goetzmann et al. 2005; Quinn and Voth 2008; Bastidon et al. 2018; Bekaert and Mehl 2019). As such, the developments in individual markets during the Gold Standard are not considered or placed within the context of historical events. There is therefore little comparable literature for this study.

The paper is also unusual in using a balanced sample of stock markets for the entire sample. Since existing studies use data from fewer countries for the time under review here, they generally add markets over time, which makes it difficult to interpret changes in the level of integration. As an example, consider the integration of two markets measured in each of two years using the average pairwise correlation. Suppose that the integration between these two markets is 0.20 in the first and second year. Assume furthermore that in the second year, data become available on a third market with a pairwise correlation of 0.30 with each of the other markets. Now measured integration – the average pairwise correlation – increases from the first to the second year, but this is only because of the addition of the third market to the sample. In contrast, when a balanced panel is used, changes in measured integration are not due to the inclusion of additional markets.
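The arithmetic of this example can be checked with a short sketch. The market labels A, B and C and the helper function are hypothetical and serve only to illustrate the point; the correlations are those assumed in the paragraph above.

```python
def average_pairwise_correlation(corr):
    """Mean of the pairwise correlations, given a dict keyed by market pairs."""
    return sum(corr.values()) / len(corr)

# Year 1: only the two original markets (labelled A and B here) are observed.
year1 = {("A", "B"): 0.20}
# Year 2: a third market C enters, with a correlation of 0.30 with A and with B.
# The correlation between A and B itself is unchanged.
year2 = {("A", "B"): 0.20, ("A", "C"): 0.30, ("B", "C"): 0.30}

avg1 = average_pairwise_correlation(year1)
avg2 = average_pairwise_correlation(year2)
print(round(avg1, 3), round(avg2, 3))
```

Measured integration rises from 0.20 to (0.20 + 0.30 + 0.30)/3, or about 0.27, although the relationship between the two original markets has not changed.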

Having considered the integration of individual markets, we compute a composite measure of integration by taking the cross-sectional average of the individual measures of stock-market integration and relate changes in this composite measure to financial crises. The findings broadly support those in the literature that integration increased in the years before the First World War. However, the paper adds to the literature by showing that the findings are robust to (a) the way in which the stock indices have been compiled and (b) the methodology used. A further contribution of this paper is to study how financial crises have impacted on the overall level of integration during the Gold Standard.

Finally, there is frequent discussion in the literature about the level of integration during the Gold Standard compared to recent times. Some studies find that integration exhibited a ‘J-shaped’ or ‘swoosh’ trend whereby integration at the end of the twentieth century was higher than in the late nineteenth century (Volosovych 2011; Bekaert and Mehl 2019), while others argue that integration or capital mobility during the Gold Standard was similar to or higher than in recent decades (Baldwin and Martin 1999; Bordo and Murshid 2006; Bastidon et al. 2018). However, as noted above, these studies often use unbalanced panels to compare integration among a narrow set of markets during the Gold Standard with a much broader set of markets more recently. This paper adds to that discussion by comparing the level of integration in stock markets during the Gold Standard with that observed today using the same markets.

There are five main findings. First, all stock markets co-move with the global component, albeit to varying extents. Thus, there was integration across a broad sample of stock markets during this period.

Second, the pattern of integration in individual countries coincides with important historical events in these countries. In particular, large country-specific political upheavals, such as the revolution in Russia in 1905 and the devolutionist Home Rule movement in Ireland, and large financial crises, such as the Australian crisis of 1893 and the crisis in France in 1882, appear to reduce the integration of exchanges in the affected countries. International crises, particularly the 1907 financial crisis, are also apparent in the data. However, there is less evidence of widespread contagion during these episodes than is observed in the more recent data for the same countries.

Third, the overall level of integration increases during the Gold Standard, particularly in the first half of the sample period. However, reflecting the more idiosyncratic nature of shocks, the role of financial crises for overall integration is less clear. Nonetheless, there is tentative evidence that integration is reduced by financial crises.

Fourth, the level of integration of the stock markets studied during the Gold Standard was much lower than among the same eight markets in the last 35 years.

Fifth, the results are robust to the use of alternative stock price measures, alternative formulations of the global component and an alternative measure of integration.

The paper is structured as follows: The next section motivates the exchanges used in the study, discusses the data and presents some descriptive statistics. Section 3 discusses the choice of methodology. Section 4 presents the results both in terms of the integration of the individual exchanges and overall integration. Section 5 sets out the various robustness checks carried out. Section 6 concludes.

2 Exchanges and data

2.1 Exchanges

Monthly data from eight stock exchanges are collected from secondary sources for the period 1879–1914. The eight exchanges represent Australia, Belgium, France, Ireland, Germany, Russia, the UK and the US. To the author’s knowledge, this is the broadest geographical sample of monthly data that are available for the sample period considered here.Footnote 1 Moreover, the exchanges represent both industrialized and emerging market economies, which experienced interesting and important economic and political developments during the period. On the basis of GDP, the sample includes 6 of the 13 largest economies in 1900, a result that is more or less consistent through the nineteenth century.Footnote 2 Moreover, measures of market size calculated by Rajan and Zingales (2003) suggest that in 1913, several of the exchanges used here were large in international comparison (Table 1). By market capitalization as a percentage of GDP, the UK, Belgium and France are the third, fourth and fifth largest in a sample of 23 exchanges, only surpassed by exchanges in Cuba and Egypt where GDP was very low. Moore (2010) also provides data on the absolute size of market capitalization for 12 exchanges, by which count, the UK, US, France and Germany are the four largest exchanges in 1913. Finally, Rajan and Zingales (2003) also measure size by the number of listed firms on the exchange per million of population. By this measure, four of the ten largest exchanges (Belgium, Australia, UK and Germany) are included in the sample used here.

Table 1 Market size, international comparison, 1913

Moreover, Australia experienced large capital inflows during the period, in a manner that marked it as an emerging market economy. British capital investment in Australia grew rapidly through the 1880s, reaching more than 10% of Australian GDP in 1888 (Quinn and Turner 2020, p. 78). This investment spilled into a property market already fuelled by a marriage boom, rising population and urbanization, and into which the domestic banking system was already lending heavily (Hickson and Turner 2002). The unwinding of the bubble, which began in 1888 but reached full force in the early 1890s, saw British capital inflows slow as the Baring crisis took hold, and even reverse in 1893. As noted by Merrett (1989, p. 60), although other countries experienced crises in the 1890s, the extent of the crisis in Australia was ‘unmatched elsewhere’: GDP per capita did not return to its 1888 level until the 1900s.

Russia also experienced strong capital inflows followed by a financial crisis between 1899 and 1902, and subsequently the revolution in 1905. Rapid industrialization in the 1890s was interrupted when foreign capital inflows into government bonds and industrial securities declined in the aftermath of the Baring crisis. A rescue package was largely successful, but Lychakov (2020) notes that the effect was to re-distribute income and wealth from workers to capitalists. The revolution of 1905 began with labor strikes that are usually attributed to poor living conditions for workers (Korelin et al. 2005) and resulted in some constitutional reforms. Lenin later noted that the 1905 revolution was ‘The Great Dress Rehearsal’, without which the 1917 revolution would have been impossible (Ascher 1994).

Although Ireland was not independent at the time, the devolutionist ‘Home Rule’ movement generated several political crises there during this period. Hickson and Turner (2005) identify an idiosyncratic slowdown in the Irish stock market after 1897.Footnote 3 They note that it was generally expected by the late 1890s that Home Rule would take place, leading to increasing uncertainty among investors. This was exacerbated by signs that the new regime, when it came to power, might follow redistributive policiesFootnote 4 and the fact that populist nationalism generally perceived banks to be working against the national interest (Ollerenshaw 1997).

In addition to these idiosyncratic shocks, there were several international crises. Neal and Weidenmier (2002) identify large international financial crises in 1890, 1893 and 1907. The Baring crisis in 1890 originated in the deterioration of the economy in Argentina, which spilled over into the London market and then onwards. The 1893 crisis originated in the United States with a banking crisis combined with a currency crisis created by the Silver Purchase Act of 1890. Various studies argue that Australia, Italy and Germany (Bordo and Eichengreen 1999) and the Netherlands, Austria and Switzerland (Neal and Weidenmier 2002) were affected by it. Overall, however, the international spillovers from these crises appear to have been less pronounced than those of the 1907 crisis. That crisis originated from the Bank of England refusing to discount US bills in the wake of the San Francisco earthquake and was sparked by the failure of the Knickerbocker Trust. This crisis was very serious and its effects widespread: contagion has been identified as ‘nearly universal in Europe’ by Neal and Weidenmier (2002, p. 30).

There are therefore several interesting idiosyncratic and common shocks against which to consider the integration of stock markets during the period.

2.2 Data

The data sources and descriptions are provided in the Data Appendix. Naturally, our preference is to use indices that are as similarly constructed as possible. Where more than one index is available for an exchange, preference is given to indices of capital gains (price changes exclusive of dividends),Footnote 5 those with the broadest possible sectoral coverage and weighted by market capitalization.

Nonetheless, there are several ways in which the indices may differ. First, the timing at which the data are recorded is not uniform: observations may be month-end values or monthly averages. Moreover, due to time differences, trading will take place at different times, and therefore the issue of whether opening or closing prices are collected becomes relevant. However, in historical data, intra-day (or indeed daily) data are generally not available.Footnote 6 Second, the weighting is not always uniform. For instance, for Russian data, market capitalization is not available and so it is not possible to calculate a market-capitalization weighted index. Third, cross-listings are present in the indices. To the extent possible, series that capture domestic firms are used; however, this is not possible in all cases. Fourth, in the case of France, a blue-chip index is preferred to a broader index, since the broader index suffers from survivorship bias.

Some of these issues will tend to lower measured co-movements between series (narrower sectoral coverage, different weighting), while others (cross-listings) will tend to increase them. However, differences in index compilation are common in both historical and more recent studies.Footnote 7 As a robustness check, in Sect. 5 I show that using alternative measures of some of the markets does not materially impact the main results.

Average monthly percentage returns, variances and correlations for each market over the entire sample period are presented in Table 2. Average percentage returns are highest in Germany and the US, and lowest in France, Ireland and the UK. US returns exhibit the highest variance over the full sample, followed by the Russian and German indices. The Australian, Irish and UK returns have the lowest variance. The lower panel of Table 2 presents pairwise correlation coefficients for the returns. The highest correlation coefficients are between the UK and Belgian, Irish and US returns, and the Belgian and German returns, all of which are above 0.30, while Australia and Russia generally have low pairwise correlations.

Table 2 Descriptive statistics, 1879–1914

What drives the individual pairwise correlations? In some cases, this is likely to be cultural, legal and institutional ties. Perhaps unsurprisingly, the pairwise correlation between the Irish and UK exchanges is particularly high. The close relationship between these two markets has been documented by Stuart (2018). The integration of ‘provincial’ UK stock exchanges with London has also been studied (Campbell et al. 2016).

Correlations may also be driven by different sectoral representation across countries and indices. Figure 1 presents data on the sectoral composition of exchanges in 1900 by market capitalization. Information on the sectoral composition is not available for all indices and the data are primarily from Moore (2010), augmented with additional information where possible on the specific indices used in this study.Footnote 8 This limitation should therefore be borne in mind when interpreting the figure. It is evident that the sectoral distribution is not uniform across exchanges. For example, the Australian series, which has generally low pairwise correlations with the other series, is comprised entirely of manufacturing listings. In contrast, the figure is just 37% in the Belgian exchange, which has the second largest share of manufacturing. Similarly, the Belgian and German exchanges have a relatively high pairwise correlation. This may reflect the fact that both have a low weight on transport but relatively heavy weights on manufacturing, finance and resources.

Fig. 1 Sectoral composition of indices. Sources: Moore (2010) except: Australia (from Lamberton 1958), Ireland (from Grossman et al. 2014) and Belgium (from Annaert et al. 2012). Sectoral data for Ireland are only provided for finance and railways (here included as transport); the remainder is listed as ‘other’. For Belgium, the breakdown is for industrials (here included as manufacturing), mining (here included as resources) and finance and transport. The exact figure is reported for finance, while approximations of 20% are provided for the remaining sectors.

On the other hand, the French exchange has generally low pairwise correlations with most other exchanges, except the German exchange. Both of these exchanges appear to have similar weights on finance and transport which may drive their pairwise correlation. However, the French exchange does not have a particularly unusual composition overall compared to the other exchanges. It therefore seems possible that the blue-chip nature of the French exchange may be resulting in lower correlations. I consider the impact of using a blue-chip index further in Sect. 5.

3 Methodology

3.1 Measuring integration

In this paper, I apply a global component methodology which has been used in various strands of the literature on integration (e.g., Pukthuanthong and Roll 2009; Volosovych 2011; Ciccarelli and Mojon 2010; Gerlach and Stuart 2021). The methodology used here draws most heavily on the first two of these papers. Specifically, I calculate the global component as the first principal component of the data (Volosovych 2011) and then regress the returns of each index on the global component. Integration is then measured using the r-squareds from these regressions (Pukthuanthong and Roll 2009).

In standard principal component analysis, the components are calculated by obtaining the eigenvectors and eigenvalues of the variance–covariance matrix. The eigenvectors are sorted by decreasing eigenvalues to obtain the weightings. These weightings are then multiplied by the underlying series – in this case, the returns series. As a result, eight principal components are obtained.Footnote 9 The first principal component is the series obtained by multiplying the eigenvector associated with the largest eigenvalue by the underlying series and is therefore the component that captures the most variance.
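The steps described above can be sketched as follows. This is a minimal illustration on synthetic data, not the estimation code used in the paper: the function name, the random returns and the 36-month sample length are assumptions, and the columns are standardized first, in line with the use of standardized data noted in Sect. 3.2.

```python
import numpy as np

def principal_components(returns):
    """Eigenvalues (descending), loadings and component series for a T x N array."""
    # Standardize each column to zero mean and unit variance.
    z = (returns - returns.mean(axis=0)) / returns.std(axis=0, ddof=1)
    cov = np.cov(z, rowvar=False)             # N x N variance-covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # re-sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = z @ eigvecs                  # weightings multiplied by the underlying series
    return eigvals, eigvecs, components

# Synthetic stand-in for eight monthly return series over one three-year window.
rng = np.random.default_rng(0)
common = rng.standard_normal((36, 1))                   # hypothetical shared shock
returns = 0.6 * common + rng.standard_normal((36, 8))

eigvals, loadings, pcs = principal_components(returns)
share = eigvals / eigvals.sum()        # proportion of variance captured by each component
global_component = pcs[:, 0]           # first principal component
```

With eight series, eight components are obtained, and the variance of the first component equals the largest eigenvalue.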

Once the global component is estimated, the r-squared is stored from a regression of the return on each exchange on the global component. That is:

$$r_{i,t} = \alpha + \beta \times r_{t}^{g} + e_{t} \tag{1}$$

where \(r_{i,t}\) is the return on exchange i at time t and \(r_{t}^{g}\) is the global component. Since there is evidence that over the sample period the variance–covariance matrix of the returns changed,Footnote 10 I use a rolling window. Thus, the global component computed for the three-year period January 1879 to December 1881 is included in Eq. (1) and the measure of integration is reported for 1880. The window is then rolled forward one year, and the process repeated.Footnote 11

This analysis is conducted for each of the eight stock price series and two measures of integration are then calculated: the individual level of integration of each market is measured using the r-squared from these regressions, and the mean of the individual r-squareds at each point in time is the measure of overall integration.Footnote 12
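A minimal sketch of the rolling procedure is given below, under stated assumptions: the data are synthetic, the function names are hypothetical, each window is 36 months long and is rolled forward by 12 months, and the returns are standardized within each window. It illustrates the mechanics only and is not the code behind the reported results.

```python
import numpy as np

def first_pc(z):
    """First principal component of standardized data z (T x N)."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(z, rowvar=False))
    return z @ eigvecs[:, -1]      # eigh sorts ascending, so the last column is the largest

def r_squared(y, x):
    """R-squared from an OLS regression of y on a constant and x, as in Eq. (1)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def rolling_integration(returns, window=36, step=12):
    """Individual r-squareds (one row per window) and their cross-sectional mean."""
    rows = []
    for start in range(0, returns.shape[0] - window + 1, step):
        w = returns[start:start + window]
        z = (w - w.mean(axis=0)) / w.std(axis=0, ddof=1)   # standardize within the window
        g = first_pc(z)                                    # global component for this window
        rows.append([r_squared(z[:, i], g) for i in range(z.shape[1])])
    individual = np.array(rows)
    return individual, individual.mean(axis=1)

# Synthetic stand-in: 36 years of monthly returns for eight exchanges.
rng = np.random.default_rng(1)
T, N = 36 * 12, 8
returns = 0.5 * rng.standard_normal((T, 1)) + rng.standard_normal((T, N))
individual, overall = rolling_integration(returns)
```

With 36 years of data, this yields 34 windows. In this standardized setting, the cross-sectional mean of the r-squareds in a window equals the share of variance accounted for by the first principal component in that window.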

3.2 Discussion of global component methodology

The literature on historical stock market integration provides many methods for measuring integration and co-movements in data. These include the r-squared from the regression of returns on one index on those of another (Campbell and Rogers 2017), average pairwise rolling correlations of a set of returns (Goetzmann et al. 2005; Quinn and Voth 2008), factor models (Bekaert and Mehl 2019), multivariate GARCH models (Stuart 2017, 2018) and network analysis (Bastidon et al. 2018; Bastidon et al. 2019).

To the author’s knowledge, the global component methodology has not previously been applied to historical stock market data. One question is why this is the case. The broader sample in this paper compared to other studies of the same period means that this is a particularly interesting methodology to use. In a study with data on three or four exchanges, the concept of a ‘global component’ is poorly defined, whereas with the eight exchanges included here it becomes possible to estimate it more precisely. Nonetheless, the number of series is somewhat smaller than that used in Pukthuanthong and Roll (2009), who use 17 series compared to 8 here.

This raises the question of whether the smaller number of series used here could lead to one exchange having an outsize impact on the global component. In principal component analysis, this is possible if one series has a particularly large variance compared to the others. For this reason, standardized data are used for the analysis.Footnote 13

Furthermore, to avoid any suspicion that a high co-movement between an individual series and the global component is driven by that series having a heavy weight in the global component, in Sect. 5 I carry out a robustness check in which I calculate the global components separately for each exchange, using the other seven series only. Overall, this has little impact on the results. Thus, it appears that the global component methodology works well on the data presented here.
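The leave-one-out check described above can be sketched as follows. The data are synthetic and the function names are hypothetical; the sketch only shows how a global component can be built for each exchange from the other seven series.

```python
import numpy as np

def first_pc(z):
    """First principal component of standardized data z (T x N)."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(z, rowvar=False))
    return z @ eigvecs[:, -1]

def leave_one_out_r2(returns):
    """R-squared of each series on a global component built from the other series."""
    z = (returns - returns.mean(axis=0)) / returns.std(axis=0, ddof=1)
    out = []
    for i in range(z.shape[1]):
        others = np.delete(z, i, axis=1)        # drop exchange i from the panel
        g = first_pc(others)                    # global component of the remaining series
        rho = np.corrcoef(z[:, i], g)[0, 1]
        out.append(rho ** 2)                    # in a univariate regression, R^2 = corr^2
    return np.array(out)

# Synthetic stand-in for one three-year window of eight return series.
rng = np.random.default_rng(2)
returns = 0.6 * rng.standard_normal((36, 1)) + rng.standard_normal((36, 8))
loo_r2 = leave_one_out_r2(returns)
```

Because exchange i has no weight in its own global component, a high r-squared here cannot be a mechanical result of that exchange's loading.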

A second question is the use of monthly data in this study compared to Pukthuanthong and Roll’s (2009) study of more recent stock market integration, which uses daily data. The concept of a ‘global component’ has been used to measure integration in other markets besides stock returns and using data at a similar or lower frequency compared to that used here. In particular, Ciccarelli and Mojon (2010) employ several methods to calculate a global component of (monthly) inflation rates across countries in the period since 1960. In a historical setting, Gerlach and Stuart (2021) use a similar methodology to study co-movements in annual inflation rates between countries during the Gold Standard. Volosovych (2011) calculates a global component of 15 countries’ monthly sovereign bond yields over a sample period beginning in 1875. Finally, in the absence of daily data, historical studies of stock market integration are generally carried out on monthly or even annual data.

However, exactly how the global component is calculated varies across studies. Therefore, in Sect. 5 I calculate the global component using two alternative methodologies (Pukthuanthong and Roll 2009; Ciccarelli and Mojon 2010). Moreover, in Sect. 5 I also compare the results using the global component methodology to the average pairwise rolling correlations, which is one of the more widely used methods in the historical stock market literature.
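For comparison, the average pairwise rolling correlation can be sketched as below. The window length of 36 months, the 12-month step and the synthetic data are assumptions chosen to mirror the rolling window in Sect. 3.1; the function names are hypothetical.

```python
import numpy as np

def average_pairwise_correlation(window_returns):
    """Mean of the off-diagonal correlations among the columns of a T x N array."""
    corr = np.corrcoef(window_returns, rowvar=False)
    upper = corr[np.triu_indices_from(corr, k=1)]   # each pair counted once
    return upper.mean()

def rolling_average_correlation(returns, window=36, step=12):
    """Average pairwise correlation in each rolling window."""
    starts = range(0, returns.shape[0] - window + 1, step)
    return np.array([average_pairwise_correlation(returns[s:s + window]) for s in starts])

# Synthetic stand-in: 36 years of monthly returns for eight exchanges.
rng = np.random.default_rng(4)
returns = 0.5 * rng.standard_normal((432, 1)) + rng.standard_normal((432, 8))
avg_corr = rolling_average_correlation(returns)
```

With eight series, each window averages over 28 distinct pairs.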

4 Results

4.1 Factor loadings and principal components

Principal component analysis provides additional insights into the nature of co-movements in the series during this time. First, Fig. 2 shows the average proportion of the variance accounted for by each principal component over the sample period. Although it rises somewhat over the sample period, the first principal component generally accounts for approximately 20 to 40 per cent of the variance in the series. This is lower than the proportion of variance the first principal component accounts for in the more recent data since 1985 for the same exchanges (over 90 per cent) (see Sect. 4.4 and the Data Appendix for a discussion of more recent data).Footnote 14

Fig. 2 Principal components, average proportion of variance accounted for, 1880–1912

Second, the factor loadings, or weights, used to calculate the principal components are often negative, indicating that at least one exchange moves in a different direction from the global component (Fig. 3). It is relatively rare for more than one country to have a negative loading at the same time. However, the first principal component loads positively onto all eight series less than a third of the time (30 per cent). For instance, around the episodes of international contagion identified by Neal and Weidenmier (2002) in the early 1890s and around the crisis of 1907, usually one or two exchanges have a negative loading. This implies that the full sample of countries was not affected by the crisis and that it was limited in geographical scope.

Fig. 3 Number of series on which first principal component loads negatively, 1880–1913

In contrast, similar analysis carried out on recent data for the same countries indicates that almost always the first principal component loads positively on all series. Thus, the nature of shocks has changed: many of the shocks during the Gold Standard were not so much ‘global’ as specific to one or two countries. Overall, these stylized facts from the principal component analysis suggest that shocks were more idiosyncratic during the Gold Standard than they are today.
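Counting negative loadings requires a sign convention, since an eigenvector is only defined up to a change of sign. The sketch below assumes that the first eigenvector is oriented so that its loadings sum to a positive number; this convention, the function name and the synthetic data are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def count_negative_loadings(window_returns):
    """Number of series on which the first principal component loads negatively."""
    z = (window_returns - window_returns.mean(axis=0)) / window_returns.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.cov(z, rowvar=False))
    loadings = eigvecs[:, -1]            # eigenvector of the largest eigenvalue
    if loadings.sum() < 0:               # assumed convention: orient towards the majority
        loadings = -loadings
    return int((loadings < 0).sum())

# Synthetic window: seven series follow a common shock, one moves against it.
rng = np.random.default_rng(5)
common = rng.standard_normal((36, 1))
signs = np.array([1, 1, 1, 1, 1, 1, 1, -1])
window = common * signs + 0.1 * rng.standard_normal((36, 8))
n_negative = count_negative_loadings(window)

# Synthetic window in which all eight series follow the common shock.
aligned = common + 0.1 * rng.standard_normal((36, 8))
n_negative_aligned = count_negative_loadings(aligned)
```

In the first synthetic window one series loads negatively, while in the second none does.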

4.2 Which exchanges were most integrated?

With this in mind, we turn to Fig. 4, which shows the 5-year moving average of the r-squareds from the individual country level regressions in Eq. (1). In many cases, large financial and political events appear to impact the integration of the individual exchanges. Perhaps unsurprisingly, the UK exchange, as the largest exchange during this sample period, is among the most highly integrated throughout most of the sample. The US, which was growing in size and influence as a financial center, shows a marked increase in integration over the sample. The French market was highly integrated at the start of the sample period, but there is rapid disintegration in the wake of the crash of 1882. This was the worst crisis experienced by the Paris stock market in the nineteenth century, requiring an emergency loan from the Banque de France to avoid a closure (White 2007). The level of integration does not begin to recover until the late 1890s.

Fig. 4 Measured integration by country, centred moving average, 1880–1912

The global component generally explained the least variation in the Australian returns, where the r-squared is always below 0.15. However, the level of integration of the Australian exchange is generally increasing in the second half of the 1880s, as capital flowed into the economy, particularly the booming property market. As the bubble unwinds the level of integration declines and trends downwards until at least the mid-1900s, in line with Quinn and Turner’s (2020) assessment that the effects of the Australian bubble were long lasting.

The level of integration of the Russian market rose from the late 1880s, before leveling out in the early 1890s when capital flows slowed in the wake of the international crises in the early 1890s. However, integration rises again, and sharply, in the late-1890s as the economy industrialized and capital flowed into the country, reaching a peak during the financial crisis between 1899 and 1902. There is a rapid decline thereafter, which is compounded in the wake of the Revolution of 1905.

The integration of the Irish market increases rapidly in the first half of the sample, reaching a peak in the late 1890s, just when, according to Hickson and Turner (2005), increasing uncertainty around the Irish political situation gave rise to an idiosyncratic decline in the Irish stock market. The uncertainty around the Home Rule movement remained unresolved until after the First World War and the level of integration generally declines for the remainder of the sample.

Turning to large, international crises, given the three-year window, the effects of the Baring crisis in 1890 and the crisis of 1893 are difficult to disentangle. Furthermore, with the exception of Germany, the US and Ireland, the level of integration of most exchanges appears relatively stable. The crisis of 1907, sparked by the failure of the Knickerbocker Trust in the US, is a more interesting episode. Neal and Weidenmier (2002) find that this crisis had widespread spillovers, particularly in Europe. The effect is an increase in measured integration in France, Russia, Germany, the UK and Belgium. In contrast, the US, which had effectively been excluded from the London discount market in the year prior to the crisis, becomes less integrated over this period. Neal and Weidenmier (2002) identify this as an example of what can happen when a country is excluded from an interdependent system when its financial needs are greatest: attempts to shelter the rest of the international system from an idiosyncratic shock in one financial center are likely to prove fruitless.

4.3 Overall integration during the Gold Standard

The overall level of integration is measured by the cross-sectional average of the eight individual r-squareds from the regression in Eq. (1) in each window. The results are presented in Fig. 5. Two points are of note. First, following an initial decline, the average r-squared increases through to the end of the nineteenth century. Thus, the first part of the classical Gold Standard era was one during which markets became more integrated. Second, after reaching a high level around the turn of the century, integration levelled out or increased only marginally. Overall, it appears that the benefits of the Gold Standard for integration were largely realized in the first half of the sample period.

Fig. 5 Measures of overall integration, 1880–1912

Compared to the existing literature, Goetzmann et al. (2005) find a similar decline in integration in the early 1880s, while those authors and Bekaert and Mehl (2019) find that integration subsequently increased almost continuously until the First World War. Quinn and Voth (2008) also find an increase in integration from the start of their sample period in the 1890s until 1913, although in their study integration peaks in the early 1900s. Overall, although the patterns are not identical, the results are broadly similar in finding an increase in integration in the years before 1913.

The large international crises identified by Neal and Weidenmier (2002) in 1890, 1893 and 1907 are not very marked in the overall pattern of integration. This may be because, as noted, idiosyncratic shocks were more important then than now. Moreover, even when there were spillovers, markets were affected in different ways, as we have seen with the failure of the Knickerbocker Trust in 1907. However, the measure of integration does appear to be somewhat more volatile around these periods.

To consider the role of financial crises in more detail, in addition to the number of series on which the first principal component loads negatively at each point in time, Fig. 3 shows the number of countries experiencing a crisis (banking, currency or twin crisis) in a given year. This indicator is sourced from Bordo et al. (2001), who describe their measures in detail (Bordo et al. 2001, p. 55). Specifically, a banking crisis is defined as occurring if financial distress leads to “the erosion of most or all of aggregate banking system capital”. Currency crises are defined as “a forced change in parity, abandonment of a pegged exchange rate, or an international rescue”. Moreover, the authors construct an index of exchange market pressure,Footnote 15 with a currency crisis occurring when this index exceeds a critical threshold. From Fig. 3 it appears that there is some correlation between this crisis indicator and negative loadings: periods of negative loadings cluster around the financial crises in the early 1890s and 1907 and the depression in the early- to mid-1880s.

I begin by estimating a univariate regression of overall integration on the crisis indicator, and find it is significant with a negative sign, suggesting that financial crises reduce integration (Table 3).Footnote 16 This is in line with the evidence in Fig. 3 that usually only one or two countries experience negative factor loadings in a given window. However, other factors could also impact stock market co-movements, including trade openness, inflation rates and government surpluses (see Volosovych (2011) for a discussion).Footnote 17 I therefore next regress the measure of integration on the crisis indicator and this set of control variables. The coefficient on the crisis indicator is still significant and negative (Table 3). While this provides some evidence for the role of financial crises in driving the pattern of integration/disintegration during the period, it is noted that the result is not completely robust to the use of other measures of the global component, as shown in Sect. 5.
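As an illustration only, the two regressions described above can be sketched as follows. The sketch assumes ordinary least squares with conventional standard errors, which may differ from the estimator behind Table 3, and it runs on synthetic data: the arrays `crisis`, `controls` and `integration` are hypothetical placeholders, not the series used in this paper.

```python
import numpy as np

def ols(y, X):
    """OLS with an intercept; returns coefficients and conventional t-statistics."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, beta / np.sqrt(np.diag(cov))

# Synthetic stand-ins (assumptions): one observation per rolling window.
rng = np.random.default_rng(0)
n_windows = 33
crisis = rng.integers(0, 4, n_windows).astype(float)  # countries in crisis
controls = rng.normal(size=(n_windows, 3))            # e.g. openness, inflation, surplus
integration = (0.5 - 0.05 * crisis + 0.02 * controls[:, 0]
               + rng.normal(scale=0.05, size=n_windows))

# Univariate regression, then the regression with the control variables added.
beta_uni, t_uni = ols(integration, crisis)
beta_multi, t_multi = ols(integration, np.column_stack([crisis, controls]))
```

Because the rolling windows overlap, the residuals in such a regression are likely to be serially correlated, so the conventional t-statistics in this sketch should be read with caution.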

Table 3 Regressions of overall integration on crisis indicator

4.4 How integrated were stock markets?

Whether the level of integration during the Gold Standard was high or low is difficult to judge in isolation. There is a discussion in the literature regarding the relative levels of integration during the Gold Standard and the more recent period. For instance, Baldwin and Martin (1999) argue that capital mobility was perhaps higher during the classical gold standard than in recent decades. Bordo and Murshid (2006) draw the same conclusion while Bastidon et al. (2018) argue that financial market integration exceeded that of the late nineteenth century only in the 10 years prior to the publication of their study. In contrast, Volosovych (2011) and Bekaert and Mehl (2019) find that integration exhibited a ‘J-shaped’ or ‘swoosh’ trend whereby integration at the end of the twentieth century was higher than in the late-nineteenth century periods, with a decline in integration in between.

Therefore, I next compare the level of integration during the Gold Standard obtained above with more recent data. For this analysis I collect data for the same set of countries from the OECD for the period 1985 to 2020 (the same length as the sample for the Gold Standard above).Footnote 18 Russian data is not available prior to 1997 and is therefore excluded from this analysis.Footnote 19

Using all three methodologies, the level of integration, as measured by the average r-squareds, ranges from 0.56 to 0.93 (Fig. 6). Thus, the level of integration is significantly higher in more recent times than it was during the Gold Standard across all measures. Indeed, the average r-squareds in the recent data are approximately twice as large as those during the Gold Standard. This result suggests stock market integration exhibited a ‘J-shaped’ or ‘swoosh’ pattern similar to the findings of Volosovych (2011) and Bekaert and Mehl (2019).

Fig. 6 Overall integration, 1985–2020, excluding Russia

5 Robustness

5.1 Weighting, stock choice and cross-listings

To check that these results are not driven by the specific measure of stock prices chosen, I first test alternative measures which are available for the various stock markets and compare the average r-squareds from the resulting estimates of Eq. (1). Table 4 reports the correlation of the resulting average r-squareds with those obtained using the baseline series, while Fig. 7 shows the various measures of integration obtained using these alternative series.

Table 4 Correlation of average r-squareds, baseline and alternative stock price series

Fig. 7 Alternative stock price series, overall integration, 1880–1912

In the first instance, I consider the Irish data since price-weighted and unweighted series are both available from the same underlying dataset as the baseline, market-capitalization weighted series. Interestingly, the correlation between the new measure of integration and the baseline measure is never less than 0.91 (Table 4).

Unweighted indices are also available for Belgium and France from the same sources as the baseline specification. When all three of these series (including for Ireland) are substituted into the analysis, the overall findings are unchanged (Fig. 7), although the correlation coefficient between the average r-squareds is somewhat lower at 0.64 (Table 4).

Alternative price-weighted data are available for the UK and US, although these series differ from those used in the baseline specification in more ways than the weighting.Footnote 20 These additional differences will tend to reduce the correlation. Nonetheless, Fig. 7 shows that the general trend remains the same, although the correlation coefficients in this case are also about 0.64 (Table 4).

Finally, I include the most diverse set of data series possible in the analysis (the price-weighted data for the UK, US and Ireland and the unweighted data for Belgium and France, as well as the baseline data for Germany and Russia).Footnote 21 Changing all five series like this leads to a lower correlation coefficient, at 0.39. However, the overall pattern of integration during the period, as illustrated in Fig. 7, is broadly the same. Overall, it seems that the weighting of the series is not driving the results.

Weighting is not the only way in which series can differ. Some series include a narrower set of firms than others. For instance, the French index is a blue-chip index. To understand what the effect of this might be, I use a blue-chip index calculated by Campbell et al. (2019) for the UK. The results are presented in the lower panel of Table 4 and Fig. 7. The correlation coefficients are in the region of 0.97 and above, and the overall pattern of integration is generally unchanged.

Finally, cross-listings are likely to cause co-movement between indices. In the baseline specification, I have used indices focused on the domestic market where possible. However, Campbell et al. (2019) also provide a broad index of all listings on the London market.Footnote 22 In the first instance, I include this in the model as before. The correlation is always in excess of 0.90 (Table 4). Next, I look at the loading of the first principal component on the UK series when the broader and baseline (narrower) indices are used (Fig. 8). The evolution of the factor loadings through the 3-year windows is similar: as might be expected, the broader series generally has a marginally higher factor loading, but this is relatively constant throughout the sample period. There is a period in the late 1880s when the factor loading turns negative for both series, and it takes a little longer for the broad index to regain positive loadings. However, overall, the mean difference between the two series is less than 0.03. This suggests that cross-listings are not significantly altering the main findings.

Fig. 8 Factor loadings on UK returns using narrow (baseline) and broad index

Overall, therefore, the choice of weighting and the general composition of the index do not have a significant impact on the broad pattern of integration during the Gold Standard.

5.2 Rolling correlations

To ensure that the global component methodology is not driving the results, I next compare my results with rolling average pairwise correlations, which are used in the existing literature to calculate integration (Goetzmann et al. 2005; Quinn and Voth 2008). Specifically, I calculate pairwise correlations for every pair of series on the same rolling 3-year window as used in the earlier analysis, resulting in 28 correlation coefficients in each of the 33 windows. The average of these 28 correlation coefficients is calculated and presented in Fig. 5 (read off the right-hand axis). Overall, the pattern is very similar to the baseline measure of integration, and the correlation coefficient is 0.83. In terms of the role of crises, results from regressions of the rolling correlation on the crisis indicator are included in Table 3 and suggest that the crisis indicator is statistically significant.
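A minimal sketch of this calculation is given below, on synthetic returns rather than the data used in this paper. The function name, the one-factor data-generating process and the 35-year panel length are assumptions chosen for illustration; with eight series there are 8 × 7 / 2 = 28 distinct pairs, and a 36-month window rolled forward 12 months at a time over 35 years gives 33 windows.

```python
import numpy as np
from itertools import combinations

def rolling_mean_pairwise_corr(returns, window=36, step=12):
    """Average of all pairwise correlations within each rolling window."""
    n_obs, n_series = returns.shape
    pairs = list(combinations(range(n_series), 2))
    averages = []
    for start in range(0, n_obs - window + 1, step):
        corr = np.corrcoef(returns[start:start + window], rowvar=False)
        averages.append(np.mean([corr[i, j] for i, j in pairs]))
    return np.array(averages), len(pairs)

# Synthetic monthly returns (assumption): 35 years, 8 markets, one common factor.
rng = np.random.default_rng(1)
n_months, n_markets = 35 * 12, 8
common = rng.normal(size=(n_months, 1))
returns = 0.5 * common + rng.normal(size=(n_months, n_markets))

avg_corr, n_pairs = rolling_mean_pairwise_corr(returns)
```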

Overall, it seems that the methodology to capture the global component is not driving the main results of the paper.

5.3 Alternative calculations of the global component

Existing studies use various methods to calculate the ‘global component’. I therefore calculate the global component in two alternative ways. The first draws on Pukthuanthong and Roll (2009), who use ‘out-of-sample’ principal components as the global component, while the second draws on Ciccarelli and Mojon (2010), who calculate the global component of inflation rates across countries in the period since 1960 using several methods, including the unweighted average. Although on the surface these methods may appear quite different, all three are essentially (weighted) averages of the data.Footnote 23

To calculate ‘out-of-sample’ principal components, Pukthuanthong and Roll (2009) use a lagged variance–covariance matrix to calculate weightings and apply these to current data. Thus, the weightings (eigenvectors) computed from the variance–covariance matrix for the three-year period January 1879 to December 1881 are applied to the returns in the period January 1882 to December 1884. The window is then rolled forward one year, and the process repeated.
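The procedure can be sketched as follows, again on synthetic data. The function names and the simulated returns are assumptions for illustration only; the sketch uses the fact that the r-squared from a univariate regression with an intercept equals the squared correlation between the two series.

```python
import numpy as np

def first_eigenvector(block):
    """Eigenvector for the largest eigenvalue of the variance-covariance matrix."""
    vals, vecs = np.linalg.eigh(np.cov(block, rowvar=False))  # eigenvalues ascending
    return vecs[:, -1]

def average_r2(block, g):
    """Mean r-squared from regressing each column on g with an intercept."""
    return float(np.mean([np.corrcoef(block[:, i], g)[0, 1] ** 2
                          for i in range(block.shape[1])]))

def out_of_sample_integration(returns, window=36, step=12):
    """Weights from the lagged window are applied to the current window."""
    results = []
    for start in range(window, returns.shape[0] - window + 1, step):
        weights = first_eigenvector(returns[start - window:start])
        current = returns[start:start + window]
        results.append(average_r2(current, current @ weights))
    return np.array(results)

# Synthetic monthly returns (assumption): 35 years, 8 markets, one common factor.
rng = np.random.default_rng(2)
returns = 0.5 * rng.normal(size=(420, 1)) + rng.normal(size=(420, 8))
oos_integration = out_of_sample_integration(returns)
```

The sign of an eigenvector is arbitrary, but this does not affect the r-squareds, which depend only on the squared correlation. The first window is lost to the lag, so this sketch yields three fewer estimates than an in-sample calculation on the same panel.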

The averages of the r-squareds from Eq. (1), using the first out-of-sample principal component and the cross-sectional mean as the global component, are also presented in Fig. 5.Footnote 24 The pattern is similar to that obtained when the standard calculation of the first principal component is used. The pairwise correlation coefficient between the three measures is never less than 0.83. Overall, it seems that all three measures of the global component yield a similar conclusion regarding the path of integration.

Turning to the role of crises, results for regressions of these measures of integration on the crisis indicator are included in Table 3. For the most part, the crisis indicator is not significant, although the p-value in a univariate regression of integration measured using the cross-sectional mean on the crisis indicator is 0.051. These results suggest that it is difficult to draw firm conclusions on the role of financial crises for overall integration.

5.4 Global component calculated separately for each country

To check that the results are not driven by one country being heavily weighted in the global component, Pukthuanthong and Roll (2009) propose that separate principal components be calculated excluding each index in each window. For instance, if we are measuring the level of integration of the US market, the principal component is calculated using the variance–covariance matrix and returns of the other seven markets only. I next employ this method for all three measures of the global component. A separate global component, denoted \(r_{i,t}^{g}\), is calculated for each market, with the effect that Eq. (1) is now written:

$$r_{i,t} = \alpha + \beta \times r_{i,t}^{g} + e_{t}$$
(2)

The average r-squareds are also presented in Fig. 5 (dashed line). Compared to the average r-squared calculated using \(r_{t}^{g}\), it is notable that while the overall level of integration is much lower, the pattern of integration does not change. Indeed, the correlation between the average r-squareds using this method and the baseline method is 0.98. Overall, it appears that estimating the global component in this manner does not have a significant impact on the overall pattern of integration, although it does markedly affect the absolute level of measured integration.
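For a single window, the difference between the two approaches can be sketched as below. The helper names and the synthetic returns are assumptions for illustration; in the leave-one-out case the global component for market i is the first principal component of the remaining seven series.

```python
import numpy as np

def first_pc(block):
    """First principal component scores of a (T x N) block of returns."""
    centred = block - block.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(centred, rowvar=False))  # ascending order
    return centred @ vecs[:, -1]

def window_average_r2(block, leave_one_out):
    """Average r-squared of each market on its global component."""
    r2 = []
    for i in range(block.shape[1]):
        # Drop market i before extracting the component if requested.
        others = np.delete(block, i, axis=1) if leave_one_out else block
        g = first_pc(others)
        r2.append(np.corrcoef(block[:, i], g)[0, 1] ** 2)
    return float(np.mean(r2))

# Synthetic 3-year window (assumption): 36 months, 8 markets, one common factor.
rng = np.random.default_rng(3)
block = 0.5 * rng.normal(size=(36, 1)) + rng.normal(size=(36, 8))

r2_full = window_average_r2(block, leave_one_out=False)
r2_loo = window_average_r2(block, leave_one_out=True)
```

When a market's own returns enter the component, part of the measured co-movement is mechanical, which is one reason the leave-one-out level would typically be expected to be lower.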

5.5 Additional principal components

Selecting the number of principal components to use to capture the ‘global components’ is arbitrary. In contrast to Volosovych (2011), Pukthuanthong and Roll (2009) use the first 10 principal components when measuring integration, which account for approximately 90 per cent of the variation in their returns series. Figure 2 shows that to capture a similar proportion of the variance, six or seven of the eight principal components would have to be used. This raises the danger of overfitting. I therefore next include both the second and third principal components in Eq. (1) to test whether this affects the measured integration. The average adjusted r-squared from this model is included in Fig. 9. For comparison, Fig. 9 also includes the adjusted r-squared from the baseline model. Overall, the estimated level of integration rises when additional principal components are added; however, the pattern of integration over the period remains remarkably unchanged.
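The trade-off can be illustrated with a short sketch on a synthetic window; the function names and simulated data are assumptions, not the paper's code. It computes the cumulative share of variance explained by the ordered principal components and the average adjusted r-squared when the first k components are used as regressors.

```python
import numpy as np

def pc_scores(block, k):
    """Scores of the first k principal components and cumulative variance shares."""
    centred = block - block.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(centred, rowvar=False))
    order = np.argsort(vals)[::-1]                  # largest eigenvalue first
    share = np.cumsum(vals[order]) / vals.sum()
    return centred @ vecs[:, order[:k]], share

def adjusted_r2(y, X):
    """Adjusted r-squared from an OLS regression of y on X with an intercept."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def average_adjusted_r2(block, k):
    scores, _ = pc_scores(block, k)
    return float(np.mean([adjusted_r2(block[:, i], scores)
                          for i in range(block.shape[1])]))

# Synthetic 3-year window (assumption): 36 months, 8 markets, one common factor.
rng = np.random.default_rng(4)
block = 0.5 * rng.normal(size=(36, 1)) + rng.normal(size=(36, 8))

_, cumulative_share = pc_scores(block, 8)
adj_r2_one = average_adjusted_r2(block, 1)
adj_r2_three = average_adjusted_r2(block, 3)
adj_r2_all = average_adjusted_r2(block, 8)
```

Using all eight components reproduces each series exactly, so the fit becomes perfect by construction and carries no information about integration; this is the overfitting concern in its extreme form.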

Fig. 9 Overall integration, including 1st, 2nd and 3rd principal components

6 Conclusions

This paper examines the integration of stock markets during the classical Gold Standard, using monthly data on indices in Australia, Belgium, France, Germany, Ireland, Russia, the UK and the US. To my knowledge, it is the most comprehensive comparative study of the behavior of stock return integration during this period. Moreover, it is the first to do so using a methodology to capture ‘global components’ of stock returns rather than rolling correlations or a factor model.

Overall, the results indicate that all the exchanges co-moved with the global component, although the level of integration varied across exchanges and through time. Variations in the integration of individual markets through time often appear to relate to large financial or political crises in individual countries, for instance, political crises in Russia and Ireland and financial crises in Australia and France. Moreover, the movements in the wake of the 1907 crisis fit with the existing historical narrative of contagion. Nonetheless, testing whether the overall level of integration is affected by financial crises yields only tentative evidence that, reflecting the more idiosyncratic nature of shocks at the time, crises reduced integration. Finally, a comparison with the level of integration in more recent data for the same countries makes clear that the stock markets were substantially less integrated during the Classical Gold Standard than today. The results appear robust to a number of checks.