1 Introduction

The proliferation of cryptocurrencies has led to strategic considerations in modern asset portfolio management. For example, how would portfolio and fund managers combine traditional asset classes with cryptocurrencies to form their portfolios, given the peculiar return distribution characteristics of the latter? The return distribution characteristics of cryptocurrencies include aggravated market volatility, price deviations from fundamental value, and the formation of bubbles and crashes (Arjoon & Bhatnagar, 2017; Litimi et al., 2016; Economou et al., 2015; Mobarek et al., 2014; Yao et al., 2014). However, portfolio analysis following the standard mean-variance approach proposed by Markowitz (1952) cannot capture the higher moments in the return distribution of cryptocurrencies.

The objective of this study is two-fold. First, we examine the convergence (or divergence) of return characteristics using the Euclidean distance measure. This measure is a popular cluster analysis method used to determine the similarity among objects within the same group or the difference between objects in two distinct groups. Specifically, we construct the two-moment mean-variance distance measure following Markowitz (1952) and a four-moment distance measure (including higher moments, i.e. skewness and kurtosis). We demonstrate (i) how uncertainty of market news and regulatory events affects the Euclidean distance, (ii) the implications of convergence (or divergence) over time for efficient frontiers following Markowitz (1952), and (iii) a successful trading strategy based on the Euclidean distance. Second, we extend the distance analysis for cryptocurrencies by developing a Bitcoin cosine similarity measure to capture the behaviours of other cryptocurrencies (i.e. alt-coins) relative to Bitcoin. This extends and complements the existing literature on financial distance analysis. While the Euclidean distance measure only considers the overall similarity in a ‘group’, the cosine similarity measure provides greater insights into how individual alt-coins move compared to Bitcoin in a pairwise setting.

The present literature on mean-variance convergence has largely shown that convergence has increased in both equity (Eun & Lee, 2010) and real estate (Liow, 2015) markets. Interestingly, this convergence is more a result of country factors than industry factors, and stock prices do not converge as much as their volatility (Apergis et al., 2014). However, the constituents underpinning the indices used to construct the Euclidean distance are not homogeneous, thus making it more difficult to achieve financial integration and enforce the law of one price. This may also be the case for Bitcoin and alt-coins. Bengtsson and Gustafsson (2023) find that cryptocurrencies are generally heterogeneous in their return determinants but there are still substitution effects among them. The law of one price can still hold if convergence occurs in the cryptocurrency market.

Moreover, several studies have documented a strong dependence structure among cryptocurrencies. Figà-Talamanca et al. (2021a) found that the joint behaviour of cryptocurrency markets (as represented by the top four coins) can be described by two dynamic common factors. In another study, the authors tested for potential common regimes among the top four cryptocurrencies and reported at most three common states (Figà-Talamanca et al., 2021b). More interestingly, they show that properly modelling this characteristic could form the basis of lucrative trading strategies. Meanwhile, major cryptocurrencies also demonstrate significant dependence in the tails of their return distributions, as well as transmission of tail risks. However, dependence in the right tail is much stronger than that in the left, which may help explain the recent growing popularity of cryptocurrencies (Nguyen et al., 2020). In addition, according to that study, an equal-weighted crypto portfolio is sufficient for diversification, even with transaction costs. Finally, Yaya et al. (2019) confirm cointegration between Bitcoin and other coins both before and after the substantial market crash of 2017–2018; however, the relationship is generally weaker after the crash.

As the investor psychology literature suggests, financial market participants are susceptible to errors and cognitive biases arising from their emotions and volatile beliefs. This is especially true when users' sentiments (Chuen et al., 2017), or what users feel at a point in time about expected market prices (Cheah et al., 2018), drive cryptocurrency prices. Therefore, uncertainty due to market news and regulatory events plays a major role in the convergence or divergence of cryptocurrencies over time, all the more so because this market is not well regulated and typically involves very low transaction costs.

The contributions of this study are the following: First, apart from using distance metrics such as Euclidean distance and Bitcoin cosine similarity, we apply the \(\ell ^{2}\)-normalisation feature scaling technique used in machine learning. As large-scale variables can distort the distance calculation, the \(\ell ^{2}\)-normalisation will remove the scaling effect of any dominant feature (moment) from the distance calculation. This reveals the true distance representation of each feature in the respective distance measures, which is an improvement over Eun and Lee (2010) and Liow (2015).

Second, we introduce the Bitcoin cosine similarity measure to better understand the behaviour of individual alt-coins towards Bitcoin. In addition to the Euclidean distance, cosine similarity is another (dis)similarity metric used in machine learning algorithms in the computer science and business analytics literature. The Bitcoin cosine similarity measure extends and complements the Euclidean distance, as the latter only shows the magnitude of the difference, without information about the direction of market movements. By contrast, the Bitcoin cosine similarity compares each alt-coin with Bitcoin in a pairwise setting, and the direction of movement in this similarity provides further insight into how each alt-coin behaves with respect to Bitcoin. We use Bitcoin as the benchmark against alt-coins because it is widely regarded as the most popular and well-known cryptocurrency.
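To illustrate the distinction between magnitude and direction drawn above, the sketch below applies the standard cosine similarity (the cosine of the angle between two vectors) and the Euclidean distance to hypothetical four-moment vectors. It assumes that the Bitcoin cosine similarity uses this standard definition on the moment vectors; the vectors and the helper names `cosine_similarity` and `euclidean_distance` are illustrative assumptions, not the study's data or code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical (mean, std, skewness, kurtosis) vectors, for illustration only.
btc = [0.02, 0.04, -0.5, 3.0]
alt_scaled = [2 * v for v in btc]   # same direction, twice the magnitude
alt_mirror = [-v for v in btc]      # opposite direction

for name, alt in [("scaled", alt_scaled), ("mirror", alt_mirror)]:
    print(name, euclidean_distance(btc, alt), cosine_similarity(btc, alt))
```

An alt-coin whose moment vector is a scaled copy of Bitcoin's has a cosine similarity of 1 despite a non-zero Euclidean distance, whereas a mirror image has a cosine similarity of -1; the Euclidean distance registers only how far apart the vectors are, not whether they point the same way.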

Third, for both distance metrics, we examine the impact of all four moments of the return distribution by adding the third and fourth moments, skewness and kurtosis, to the mean (first moment) and variance (second moment). Previous studies, such as Conrad et al. (2013) and Harvey et al. (2010), have shown that skewness and kurtosis are closely related to a security's future returns; therefore, incorporating them in our analysis yields a more robust measure and a more comprehensive understanding of cryptocurrency returns. The addition of these higher moments can help create more accurate tools for forecasting expected cryptocurrency returns.

Finally, we show that uncertainty due to market news and regulatory events is an important determinant of the mean-variance Euclidean distance and the cosine similarity measure. Cryptocurrency indices have been developed to record cryptocurrency market trends and dynamics. For instance, Shah et al. (2021) develop a cryptocurrency price index that incorporates market price movements, size and momentum factors, while Lucey et al. (2022) introduce cryptocurrency uncertainty indices that measure the impact of market news and regulatory events. We show that the uncertainty indices have high explanatory power for the distance measures.

Our findings reveal that both the mean-variance Euclidean distance and the Bitcoin cosine similarity measure converged over time during the 2018 cryptocurrency market crash, whereas before and after the crash, these measures diverged over time. The results also indicate that convergence (or divergence) over time leads to a shift in the efficient frontier. More specifically, a higher degree of divergence over time would lead to greater diversification benefits. Finally, uncertainty due to market news and regulatory events is shown to influence the convergence (or divergence) of the Euclidean distance and Bitcoin cosine similarity measures.

2 Financial convergence in cryptocurrency markets

The convergence concept originated in the economic growth literature (Liow, 2015; Barro & Sala-i-Martin, 1992) and has been extended to equity (Eun & Lee, 2010) and international public property markets (Liow, 2015). Convergence can be understood as the tendency or process of two or more entities becoming more similar in their behaviours over time. For example, Barro and Sala-i-Martin (1992) observe economic convergence between poor and rich countries/regions, where poorer countries/regions grow faster and converge towards richer ones in terms of gross domestic product (GDP) per capita. In this study, financial convergence refers to the convergence of two or more financial markets/assets. For instance, Eun and Lee (2010) report substantial convergence in returns and risks among 17 developed international equity markets, while Liow (2015) documents divergence, with real estate markets becoming more dissimilar over time, among 12 developed global public property markets. In both theory and practice, convergence is often attained through increasing integration among countries/markets (e.g. Eun & Lee, 2010).

Convergence implies lower entry barriers and, consequently, allows capital to flow more easily to where it can be used most effectively. For example, developing countries and emerging markets offer excellent investment opportunities and attract large cash inflows because of their abundant room for growth. This would help developing countries catch up with their more developed counterparts. In addition, if two comparable financial securities are mispriced relative to one another, arbitrageurs will transfer their funds from the overpriced instrument to the underpriced one, which should bring their prices back in line and restore parity. In other words, this relationship is governed and enforced by the law of one price, whereby market forces ensure that comparable securities behave similarly. The action of arbitrageurs helps maintain and improve market efficiency over time, which is consistent with the adaptive market hypothesis (AMH) proposed by Lo (2004). Conversely, major shifts or economic shocks could lead to financial divergence between markets. Hence, market efficiency could decrease following financial bubbles, crashes and crises.

In a highly integrated financial market, price convergence is rapid if the law of one price holds. According to the efficient markets hypothesis (EMH) proposed by Fama (1970), the law of one price can be observed if the arbitrage process is unrestricted. However, neither equity markets nor international public property markets are unregulated. Some commodity assets, such as gold, suffer from large bid-ask spreads and are highly regulated (for example, through tax on capital gains). Bitcoin and alt-coins, by contrast, belong to a distinct asset class. These cryptocurrencies are not issued by any central authority (i.e. they are decentralised), are largely unregulated, and are relatively inexpensive to transact. Therefore, any price deviation could be arbitraged away relatively easily and quickly, ensuring that the law of one price is observed.

The AMH, as proposed by Lo (2004), postulates that absolute efficiency is impossible and instead offers a concept of efficiency based on the degree of relative efficiency. Consequently, the AMH is a helpful framework for tracking the evolution of cryptocurrency prices (Urquhart & McGroarty, 2016; Urquhart et al., 2015; Ghazani & Araghi, 2014; Manahov & Hudson, 2014; Urquhart & McGroarty, 2014; Urquhart & Hudson, 2013; Neely et al., 2009) and allows market conditions, such as market crashes, economic and political crises and bubbles (Cretarola & Figà-Talamanca, 2021), and regulatory regimes to shift (Kim et al., 2011). As most cryptocurrencies are not backed by any meaningful or tangible commodity, unlike conventional fiat currencies, the price of these digital assets is largely driven by user sentiment (Chuen et al., 2017), or what users feel at a point in time about expected market prices (Cheah et al., 2018), which, in turn, is affected by cognitive biases and errors arising from the emotions and volatile beliefs of cryptocurrency market participants.

The emergence of cryptocurrencies requires closer empirical scrutiny for several reasons. First, the fragility of cryptocurrency markets can potentially destabilise traditional financial markets. For example, as a distinct class of speculative investment assets (Corbet et al., 2018), these cryptocurrencies are prone to frequent, large, and discontinuous price jumps due to market sentiment swings (Katsiampa, 2017; Fry & Cheah, 2016; Cheah & Fry, 2015). If these cryptocurrencies are held as a debt-financed speculative asset, the contagion effect of a sudden collapse in cryptocurrency prices will trigger margin calls, and the negative effects of the price collapse in cryptocurrency markets will spill over to debt markets (Giudici & Polinesi, 2021; Baur et al., 2018).

Cryptocurrencies are the most volatile and risky asset class (Mužić & Gržeta, 2022), but their volatility is driven mostly by investor sentiment rather than by changes in fundamentals (Chuen et al., 2017). Furthermore, cryptocurrencies (particularly Bitcoin) react quickly, within the same day, to macroeconomic news announcements. However, unlike other assets, the trading volume of Bitcoin tends to remain stable, and its volatility often increases in response to positive news, although its returns are more sensitive to negative news (Mužić & Gržeta, 2022). Valuation is very different for cryptocurrencies than for conventional assets because of the unique features of this new asset class. For example, many cryptocurrencies, most notably Bitcoin, have a fixed supply limit, unlike fiat currencies. Unlike stocks and bonds, these digital currencies also do not generate cash flows, which invalidates the use of the discounted cash flow model for valuation purposes (Chuen et al., 2017). More importantly, the underlying blockchain technology of cryptocurrencies and its potential for innovative applications play a crucial role in their valuation.

Second, cryptocurrency is clearly distinct from traditional securities, suggesting that it can offer diversification benefits in portfolio formation. Notably, cryptocurrencies tend to have low transaction costs and to outperform mainstream assets in terms of returns (Chuen et al., 2017). Therefore, cryptocurrencies could also be used for speculative purposes in addition to serving as a medium of exchange or alternative currency (Baur et al., 2018). Interestingly, the cryptocurrency market has a very low correlation with traditional assets; therefore, cryptocurrencies can be used for diversification to enhance the efficient frontier and investor utility (Chuen et al., 2017). Furthermore, this low correlation is observed in both normal and turbulent periods (Baur et al., 2018). Although Bitcoin was able to reduce the risk of stock and bond portfolios in major economies during the pandemic, its role changed in many countries other than the US (Huang et al., 2021). Kurka (2019) also reported negligible connectedness between cryptocurrencies and conventional instruments but warned of the occasional spillover of significant shocks between them. Despite its diversification benefits, Bitcoin cannot yet be classified as a safe-haven asset like gold (Kyriazis, 2020). It can be used as an effective hedge for oil and equity, but it does not offer the full hedging benefits of gold. Nevertheless, Bitcoin could still be used as an effective hedging tool for portfolios containing gold, given its low or negative correlation with gold.

According to Andrianto and Diputra (2017), adding some cryptocurrencies (e.g. Bitcoin, Ripple and Litecoin) to traditional assets (e.g. equity, ETFs, commodities and foreign currency) helps improve portfolio performance by offering a wider range of allocation choices that can reduce portfolio risk (i.e. standard deviation). In addition, other cryptocurrencies (e.g. XEM, DOGE, XLM and USDT) have also been found to be useful for hedging and diversification purposes (Huynh et al., 2020). The optimal weight for crypto assets should be between 5 and 20%, depending on investors' risk tolerance (Andrianto & Diputra, 2017). Dyhrberg (2016) found that Bitcoin is ideal for risk-averse investors in anticipation of negative shocks to the market and that it could be used as a hedging asset against market-specific risk. Katsiampa (2017) discovered that Bitcoin prices contain both short- and long-run conditional risk components. Omanović et al. (2020) show that during the COVID-19 pandemic, the cryptocurrency market recovered much faster than conventional financial instruments, which may partly explain the important role of the crypto asset class in portfolio management.

Third, it is worth noting that the effects of cross-diversification are mutually beneficial to traditional and cryptocurrency portfolios. These diversification gains can be obtained both by adding cryptocurrencies to traditional portfolios and by incorporating traditional assets into cryptocurrency portfolios. Moreover, adding multiple cryptocurrencies to a portfolio can lead to greater diversification benefits. Mensi et al. (2019) reported that a combination of Bitcoin and Ether offers the best results in terms of risk and hedging effectiveness in the medium and long terms, whereas Bitcoin combined with Monero is the best choice in the short term. Despite the advantages of including cryptocurrencies in portfolios, better performance often requires more sophisticated techniques that account for estimation errors, since such advanced models perform better in out-of-sample tests of risk and risk-adjusted returns, even when short-selling constraints and transaction costs are taken into account (Platanakis & Urquhart, 2019). With the recent launch of Bitcoin futures on the CBOE and CME, Bitcoin is seen to provide the wider investment community with the prospect of diversifying or hedging their portfolios. These recent instances of financial widening and deepening can give portfolio and fund managers more alternatives for reducing the risks of their highly leveraged cryptocurrency portfolios. For example, if cryptocurrencies are held as part of a highly leveraged but inadequately diversified portfolio, sudden negative shocks in the cryptocurrency markets would not only cause heavy losses in the debt markets but also jeopardise the performance of the pension funds or managed funds that hold these portfolios.

3 Data and methodology

First, we discuss the construction of the Euclidean return-risk distance measure and the implications for efficient frontier formation. Next, we expand this distance measure to include skewness and kurtosis, and introduce the Bitcoin cosine similarity measure. Finally, we examine the role of uncertainty in determining convergence or divergence in the Euclidean distance measure with regard to the centroid of cryptocurrency markets.

3.1 Data

In this study, we select the top 20 cryptocurrencies in the Yahoo Finance ranking on 30 June 2021. On that date, the global cryptocurrency market capitalization (market cap hereafter) was $1.39 trillion, and these top 20 cryptocurrencies accounted for 88.6% of the total market cap, with Bitcoin commanding 46.3% and alt-coins the remaining 42.3%. From the top 20 cryptocurrencies, we removed three pegged tokens, namely the stablecoins USD Coin (USDC) and USD Tether (USDT) and Wrapped Bitcoin (wBTC), because they are pegged to either the US dollar or Bitcoin and thus do not display any meaningful return distribution moments of their own. In particular, wBTC could bias the results towards convergence because of its similar characteristics to Bitcoin. The final dataset includes 17 cryptocurrencies, which account for 82% of the total market cap and represent the global cryptocurrency market well.

Table 1 provides the full list of cryptocurrencies in this study and their respective market caps. We downloaded daily cryptocurrency price data for the period 31 December 2013 to 30 June 2021 from the CoinGecko database. From these data, we calculated the monthly mean return, standard deviation, skewness and kurtosis of each cryptocurrency. It is important to note that only a few cryptocurrencies have a relatively long data history, as most were introduced at different times during the sample period. For example, the second-largest cryptocurrency, Ether, has been available only since 6 August 2015.
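A minimal sketch of this step, using only the Python standard library, is given below. It assumes simple (not logarithmic) daily returns, population moment estimators and raw (non-excess) kurtosis; the text does not state these choices, so they are assumptions. Each return is assigned to the calendar month of its end date, which is one way a price on 31 December 2013 can supply the first January 2014 return. The function names and prices are hypothetical.

```python
import math
from collections import defaultdict
from datetime import date


def simple_returns(prices):
    """Daily simple returns P_t / P_{t-1} - 1 from date-sorted (date, price) pairs."""
    return [(d1, p1 / p0 - 1.0) for (_, p0), (d1, p1) in zip(prices, prices[1:])]


def four_moments(rets):
    """Mean, standard deviation, skewness and raw kurtosis (population estimators)."""
    n = len(rets)
    mu = sum(rets) / n
    m2 = sum((r - mu) ** 2 for r in rets) / n
    m3 = sum((r - mu) ** 3 for r in rets) / n
    m4 = sum((r - mu) ** 4 for r in rets) / n
    if m2 == 0:
        # Skewness and kurtosis are undefined when all returns are identical.
        return mu, 0.0, float("nan"), float("nan")
    sd = math.sqrt(m2)
    return mu, sd, m3 / sd ** 3, m4 / m2 ** 2


def monthly_moments(prices):
    """Group daily returns by the calendar month of their end date; compute four moments."""
    by_month = defaultdict(list)
    for d, r in simple_returns(prices):
        by_month[(d.year, d.month)].append(r)
    return {ym: four_moments(rs) for ym, rs in sorted(by_month.items())}


# Hypothetical daily closing prices, for illustration only.
prices = [(date(2013, 12, 31), 100.0), (date(2014, 1, 1), 110.0),
          (date(2014, 1, 2), 99.0), (date(2014, 1, 3), 108.9)]
print(monthly_moments(prices))
```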

We employ the Cryptocurrency Uncertainty Index (UCRY) proposed by Lucey et al. (2022) as a proxy for uncertainty. We constructed the two monthly UCRY indices by averaging the weekly values within each month. The UCRY Policy and UCRY Price indices capture uncertainty in regulatory changes and market movements, respectively, and both are computed from topical news coverage. Specifically, the weekly value of each index depends on the number of news articles containing the relevant keywords, drawn from an extensive newspaper and news-feed database for each week. A greater number of articles containing these keywords suggests more discussion and, consequently, more uncertainty. Each index captures a specific type of uncertainty that could correspond to a particular group of market participants: informed investors pay more attention to policy changes, whereas less informed investors pay more attention to price changes. Because cryptocurrency markets are largely driven by sentiment, these uncertainty indices can help explain collective market behaviour, especially the tendency towards convergence or divergence among cryptocurrencies.
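The monthly aggregation of the weekly index values can be sketched as follows. It assumes each weekly observation is assigned to the calendar month of its date stamp; the text does not say how weeks that straddle two months are treated, so this is an assumption, as are the function name and the index values.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_average(weekly):
    """Average (week_date, value) observations within each calendar month.

    Each week is assigned to the month of its date stamp (an assumption).
    """
    buckets = defaultdict(list)
    for week_date, value in weekly:
        buckets[(week_date.year, week_date.month)].append(value)
    return {ym: mean(values) for ym, values in sorted(buckets.items())}


# Hypothetical weekly UCRY Policy values, for illustration only.
weekly_policy = [(date(2021, 5, 3), 100.0), (date(2021, 5, 10), 104.0),
                 (date(2021, 5, 17), 108.0), (date(2021, 5, 24), 104.0),
                 (date(2021, 6, 7), 110.0)]
print(monthly_average(weekly_policy))
```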

Table 1 Top 20 cryptocurrencies based on the global market cap as of 30 June 2021

3.2 Constructing Euclidean distance measure

We employ the Euclidean distance measure, the most popular (dis)similarity measure in cluster analysis, to analyse the return distribution of cryptocurrencies. The Euclidean distance is relatively easy to understand because it measures the straight-line length between two points. It is intuitive because it corresponds to how 'distance' is measured in everyday life, and it is therefore widely used in the machine learning literature. The Euclidean, Manhattan and Chebyshev distance measures are all special cases of the Minkowski distance, which is given as:

$$ MinDis=\left( \sum _{i=1}^{n}|x_{i}-y_{i}|^{p}\right) ^{\frac{1}{p}}, $$

where p is the order of the Minkowski distance. When \(p=1\) and \(p=2\), the Minkowski distance yields the Manhattan and Euclidean distances, respectively. In the limit \(p\rightarrow \infty \), the Minkowski distance becomes the Chebyshev distance, which is given as:

$$ CheDis=\max _{i=1}^{n}|x_{i} - y_{i}|. $$

For robustness, we also examine the empirical results based on the Manhattan and Chebyshev distances, in addition to the Euclidean distance.
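The three distances can be computed from a single Minkowski function, as in the sketch below; the Chebyshev case is handled explicitly as the \(p\rightarrow \infty \) limit. The function name and vectors are illustrative.

```python
import math


def minkowski(x, y, p):
    """Minkowski distance of order p >= 1; p = math.inf gives the Chebyshev distance."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)
    if p < 1:
        raise ValueError("p must be at least 1 for a valid distance")
    return sum(d ** p for d in diffs) ** (1.0 / p)


# Illustrative four-dimensional points.
x = [0.0, 0.0, 0.0, 0.0]
y = [3.0, 4.0, 0.0, 1.0]
print(minkowski(x, y, 1))         # Manhattan
print(minkowski(x, y, 2))         # Euclidean
print(minkowski(x, y, math.inf))  # Chebyshev
print(minkowski(x, y, 50))        # large finite p, close to Chebyshev
```

A large finite order (here p = 50) gives a value very close to the Chebyshev distance, consistent with the limit described above.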

The four-moment distance measures incorporate skewness and kurtosis in addition to the mean and standard deviation of cryptocurrency returns. Higher moments provide further information on the shape of the return distribution, which has numerous practical applications, such as asset pricing (Hwang & Satchell, 1999) and portfolio selection (Harvey et al., 2010). In addition, higher moments could be used to predict cryptocurrency returns and, subsequently, to formulate trading strategies (Jia et al., 2021). Information from higher moments can also help risk managers develop effective risk management strategies. As skewness and kurtosis measure return asymmetry and tail thickness, respectively, risk managers, investors and portfolio managers can use these four-moment distance measures to better inform their investment strategies.

There are three steps involved in constructing this measure. First, given the different scales among the mean, standard deviation, skewness, and kurtosis, we normalise these variables. Second, for any given time period, we calculate the centroid for our basket of cryptocurrencies. Finally, we measure the Euclidean distance from each cryptocurrency to the centroid. The mean distance of all assets demonstrates their (dis)similarity in return distribution.

For two n-dimensional observations \(X=(x_{1},x_{2},\dots ,x_{n})\) and \(Y=(y_{1},y_{2},\dots ,y_{n})\), the Euclidean distance between these two points, \(d(X,Y)\), is defined as the square root of the sum of squared differences between their coordinates:

$$\begin{aligned} d(X,Y)=\sqrt{(X-Y)^{\top }(X-Y)}=\sqrt{\sum _{i=1}^{n}(x_{i}-y_{i})^{2}}. \end{aligned}$$
(1)

If \(d(X,Y)\) is small (large), the two points are close to (far apart from) each other and hence share more (fewer) similarities in their characteristics. In this study, the Euclidean distance measure is used to analyse the four key moments of the return distribution of cryptocurrencies.

3.2.1 Data normalisation

Although the four moments are on different scales, they all contribute to the Euclidean distance. Without normalisation, variables with large scales dominate the distance calculation. Data normalisation is a popular technique in cluster analysis and machine learning for ensuring that all features are on the same scale. The \(\ell ^{2}\)-normalisation is a standard technique based on the square root of the sum of squared vector values. For a vector \(X=(x_{1},x_{2},\dots ,x_{n})\), the \(\ell ^{2}\)-norm is calculated as the square root of the sum of its squared values:

$$\begin{aligned} \left\| X\right\| _{2}:=\sqrt{x_{1}^{2}+\cdots +x_{n}^{2}}. \end{aligned}$$
(2)

To normalise X, we apply

$$\begin{aligned} \hat{x}_{i}=\frac{x_{i}}{\left\| X\right\| _{2}}, \end{aligned}$$
(3)

where \(i=1,\dots ,n\). Then the \(\ell ^{2}\)-norm of the new vector \(\hat{X}=(\hat{x}_{1},\dots ,\hat{x}_{n})\) is exactly 1, that is

$$\begin{aligned} \left\| \hat{X}\right\| _{2}=\sqrt{\hat{x}_{1}^{2}+\cdots +\hat{x}_{n}^{2}}=1. \end{aligned}$$
(4)

The \(\ell ^{2}\)-norm can be calculated with machine learning packages in Python or R, and this normalisation is an improvement over the technique in Eun and Lee (2010), which is not scale invariant and consequently suffers from scale-dominance bias. By contrast, \(\ell ^{2}\)-normalisation re-scales the variables so that no single moment dominates, ensuring a stable and reliable Euclidean distance measure while still preserving the correlations among the variables.
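The sketch below applies Eqs. (2) and (3) to one month of moment data. Because the text does not spell out the direction, it assumes each moment (a feature column) is divided by its \(\ell ^{2}\)-norm across the cryptocurrencies observed that month; this is the direction that removes scale differences between moments and leaves the correlations between moments unchanged. The data and function name are hypothetical.

```python
import math


def l2_normalise_columns(rows):
    """Divide each moment (column) by its l2-norm across cryptocurrencies (rows).

    Assumption: rows are the cryptocurrencies observed in one month and columns
    are (mean, std, skewness, kurtosis); each column ends up with l2-norm 1.
    """
    n_cols = len(rows[0])
    norms = [math.sqrt(sum(r[k] ** 2 for r in rows)) for k in range(n_cols)]
    if any(nk == 0 for nk in norms):
        raise ValueError("an all-zero column cannot be l2-normalised")
    return [[r[k] / norms[k] for k in range(n_cols)] for r in rows]


# Hypothetical monthly moments for three coins: kurtosis dwarfs the mean in scale.
month = [[0.010, 0.05, -0.3, 4.0],
         [0.020, 0.08, 0.5, 9.0],
         [-0.005, 0.04, 0.1, 3.5]]
z = l2_normalise_columns(month)

# Scale invariance: multiplying the kurtosis column by 100 leaves its
# normalised values unchanged up to floating-point rounding.
rescaled = l2_normalise_columns([r[:3] + [100.0 * r[3]] for r in month])
print(max(abs(a[3] - b[3]) for a, b in zip(z, rescaled)))
```

If scikit-learn is used instead, note that `sklearn.preprocessing.normalize` normalises each row (sample) by default (`axis=1`); the column-wise variant assumed here corresponds to `axis=0`.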

3.2.2 Centroid Euclidean distance measure

The Euclidean distance for each cryptocurrency is measured from the centroid. The first step is to calculate the centroid of the entire basket of cryptocurrencies, which serves as the reference point for the Euclidean distances. Because the centroid is recalculated for each measurement period, its position changes over time.

For any system in n-dimensional space, the centroid is the average of all points along each coordinate direction. For a system with \(N_{t}\) cryptocurrencies in time period t, the coordinates of cryptocurrency i, \(X_{it}\), are \((x_{it1},x_{it2},x_{it3},x_{it4})\), where \(x_{it1},x_{it2},x_{it3},x_{it4}\) represent the first four normalised moments of its return distribution (i.e. mean, standard deviation, skewness and kurtosis). The centroid of this system is given as

$$\begin{aligned} C_{t}=(c_{t1},c_{t2},c_{t3},c_{t4})=\left( \frac{1}{N_{t}}\sum _{i=1}^{N_{t}}x_{it1},\frac{1}{N_{t}}\sum _{i=1}^{N_{t}}x_{it2},{\frac{1}{N_{t}}}{\sum _{i=1}^{N_{t}}x_{it3}},{\frac{1}{N_{t}}}{\sum _{i=1}^{N_{t}}x_{it4}}\right) , \end{aligned}$$
(5)

where \(t=1,\dots ,T\) and T denotes the final period. Notably, the number of cryptocurrencies may differ from one period to another because of their different starting dates throughout the sample. For example, Bitcoin started around May 2010, whereas Solana started much later, in April 2020. Here, \(c_{tk}=\frac{1}{N_{t}}\sum _{i=1}^{N_{t}}x_{itk}\) is the average of the kth moment across all \(N_{t}\) cryptocurrencies in the system.

We define the Centroid Euclidean distance measure (\(ED_{t}\)) for a system of \(N_{t}\) cryptocurrencies at time period t as the average distance between each cryptocurrency and its centroid. \(ED_{t}\) is given as:

$$\begin{aligned} ED_{t}:=\frac{1}{N_{t}}\sum _{i=1}^{N_{t}}d(X_{it},C_{t}),\,\,\,i=1,\dots ,N_{t};\,\,t=1,\dots ,T. \end{aligned}$$
(6)
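Eqs. (5) and (6) translate directly into code. The sketch assumes the moment vectors have already been normalised and simply uses whichever \(N_{t}\) cryptocurrencies are present in a given month; the function names and points are illustrative.

```python
import math


def centroid(points):
    """Eq. (5): coordinate-wise average of the N_t moment vectors in one month."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def euclidean(a, b):
    """Eq. (1): straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid_euclidean_distance(points):
    """Eq. (6): ED_t, the mean distance from each cryptocurrency to the centroid."""
    c = centroid(points)
    return sum(euclidean(p, c) for p in points) / len(points)


# Hypothetical moment vectors for four coins in one month.
pts = [[1.0, 0.0, 0.0, 0.0], [-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0], [0.0, -1.0, 0.0, 0.0]]
half = [[v / 2 for v in p] for p in pts]  # the same system, contracted by half
print(centroid_euclidean_distance(pts), centroid_euclidean_distance(half))
```

Halving every coordinate halves \(ED_{t}\), so a falling \(ED_{t}\) corresponds to the system contracting around its centroid.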

It is important to note that using Euclidean distance offers a distinct advantage. It does not require any asset pricing model or synthetic variable, and thus can be regarded as model-free and highly robust (Eun & Lee, 2010). It is also noteworthy that the measure presented here is a noticeable improvement over the technique employed by Eun and Lee (2010), as our approach utilises the feature scaling technique via data normalisation and extends the measure to cover higher moments of the return distribution.

In addition to the mean Euclidean distance measure of the entire system in Eq. (6), we also examine the mean distance for each individual return characteristic. The kth moment’s mean distance over time period t (\(D_{tk}\)) is given as

$$\begin{aligned} D_{tk}=\frac{1}{N_{t}}\sum _{i=1}^{N_{t}}\left| x_{itk}-c_{tk}\right| ,\quad i=1,\dots ,N_{t};\quad t=1,\dots ,T,\,\,k=1,2,3,4. \end{aligned}$$
(7)

We use absolute differences because a distance measure must be non-negative.
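Eq. (7) for a single moment can be sketched as below. Python indexing is zero-based, so `k = 0` corresponds to the first moment (the mean), written \(k=1\) in the text; the function name and data are hypothetical.

```python
def moment_mean_distance(points, k):
    """D_tk from Eq. (7): mean absolute deviation of moment k from the centroid's kth coordinate.

    `points` holds one moment vector per cryptocurrency for a single month;
    `k` is a zero-based index (0 = mean, 1 = std, 2 = skewness, 3 = kurtosis).
    """
    n = len(points)
    c_k = sum(p[k] for p in points) / n
    return sum(abs(p[k] - c_k) for p in points) / n


# Hypothetical moment vectors for two coins in one month.
pts = [[1.0, 2.0, 0.0, 3.0], [3.0, 2.0, 0.0, 5.0]]
print([moment_mean_distance(pts, k) for k in range(4)])
```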

3.3 Bitcoin Euclidean distance measure

To compare the group distance method with a pairwise method, we also compute the Euclidean distance with Bitcoin in place of the centroid. As one of the most dominant cryptocurrencies, Bitcoin is a natural alternative reference point to the centroid. We calculated the Euclidean distance measure using Bitcoin and found that the Bitcoin Euclidean distance was more volatile than the Centroid Euclidean distance, although the results obtained using the two measures were consistent. The centroid is the group average of the mean return, risk, skewness and kurtosis of the cryptocurrencies; it therefore reflects the collective values of the group and becomes less sensitive to extreme movements in any single cryptocurrency as more cryptocurrencies are included in its calculation.

The definition of the Bitcoin Euclidean distance is similar to that of the Centroid Euclidean distance. We define the Bitcoin Euclidean distance measure (\(BED_{t}\)) for the \(N_{t}-1\) alt-coins in time period t as the average distance between each alt-coin and Bitcoin. \(BED_{t}\) is then given as

$$\begin{aligned} BED_{t}:=\frac{1}{N_{t}-1}\sum _{i=1,\,i\ne B}^{N_{t}}d(X_{it},X_{Bt}),\,\,\,t=1,\dots ,T, \end{aligned}$$
(8)

where \(X_{Bt}\) represents Bitcoin's first four normalised moments of the return distribution.
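A sketch of the Bitcoin Euclidean distance, averaging over the alt-coins as the definition describes, follows. The ticker keys (including `"BTC"` for Bitcoin) and vectors are hypothetical. Because Bitcoin's distance to itself is zero, including Bitcoin in the sum and dividing by \(N_{t}\) instead would scale the result by \((N_{t}-1)/N_{t}\).

```python
import math


def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bitcoin_euclidean_distance(moments, benchmark="BTC"):
    """Average Euclidean distance from each alt-coin's moment vector to Bitcoin's.

    `moments` maps a ticker to its normalised moment vector for one month;
    the key "BTC" for Bitcoin is a hypothetical naming choice.
    """
    btc = moments[benchmark]
    alts = [v for name, v in moments.items() if name != benchmark]
    if not alts:
        raise ValueError("at least one alt-coin is required")
    return sum(euclidean(v, btc) for v in alts) / len(alts)


# Hypothetical normalised moment vectors for one month.
month = {"BTC": [0.5, 0.5, 0.5, 0.5],
         "ETH": [0.5, 0.5, 0.5, -0.5],
         "ADA": [0.5, -0.5, 0.5, 0.5]}
print(bitcoin_euclidean_distance(month))
```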

3.4 Temporal evolution of Euclidean distance and implications of Euclidean distance on investment opportunity in the mean-variance framework

If the mean Euclidean distance of the system shows a downward trend, the system contracts and the return characteristics of cryptocurrencies become more similar to one another. Conversely, if the distance shows an upward trend, the system expands and the return characteristics of different cryptocurrencies become more dissimilar. Greater dissimilarity may be attributable to a single cryptocurrency or to a small subset of cryptocurrencies. Regardless of whether it stems from an expansion of the whole system or from such a subset, any increase in dissimilarity can highlight a potential trading opportunity, and trading strategies can be developed without identifying the contributors to the dissimilarity. In Sect. 5, we present a successful trading strategy that does not rely on identifying these contributors.

The mean risk-return distance for international equity indices has been shown to converge over time (Eun & Lee, 2010). We investigate whether cryptocurrencies exhibit similar behaviour, especially when higher moments are included. To test for convergence, we calculate the monthly Euclidean distance for the 17 cryptocurrencies over the 90 months from January 2014 to June 2021. We do not use a fixed historical window to calculate the monthly distances. Instead, the distance value for a month is based on all daily returns in that month, consistent with Eun and Lee (2010). We compute this for the distance of each individual moment and its normalised version, as well as for the two-moment (return and risk) and four-moment combinations. For the most comprehensive four-moment model, we test the convergence or divergence of the mean Euclidean distance of the system using the following regression:

$$\begin{aligned} ED_{t}=\beta _{0}+\beta _{1}\cdot \text {Time}+\varepsilon _{t},\quad \text {Time}=1,\dots ,90. \end{aligned}$$
(9)

We calculate Newey–West heteroskedasticity- and autocorrelation-consistent (HAC) t-statistics with six lags. For a more comprehensive analysis, this regression is estimated for the entire sample period and also for sub-samples. For the break points, external events may be more meaningful than break points determined endogenously from the data. We therefore use the first 'Crypto Winter'Footnote 5 to define the break points in our sample. This was a long period of continuous and substantial devaluation following the first Bitcoin crash, covering 1 December 2017 to 28 February 2019. Accordingly, we split the entire sampling period (denoted t) into three sub-periods: (i) before (\(t_1\)), (ii) during (\(t_2\)) and (iii) after (\(t_3\)) the 'Crypto Winter'. We use the same sub-periods throughout the remainder of this study for consistency. Interestingly, the start of this crash coincides with the introduction of Bitcoin futures on the Chicago Board Options Exchange (Apergis et al., 2021). Strictly, there is a gap of several days between the introduction of Bitcoin futures (10 December 2017) and the start of the crash (16 December 2017), which should not affect the monthly Euclidean distance. Before interpreting the trend estimates, we apply the Augmented Dickey–Fuller (ADF) test, without a time trend, to the regression residuals. If the residuals do not have a unit root, the regression is not spurious and its results should be reliable.
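The pure-Python sketch below estimates the trend regression in Eq. (9) and computes a Newey–West t-statistic for the slope, using six lags and Bartlett weights. The function name and the synthetic series are hypothetical. No small-sample degrees-of-freedom correction is applied, so values may differ slightly from packaged implementations. The ADF check on the residuals is not reimplemented here; a library routine such as `statsmodels.tsa.stattools.adfuller` could be applied to them.

```python
import math

def trend_regression_nw(y, lags=6):
    """OLS of y on a constant and Time = 1, ..., T (Eq. (9)) with a
    Newey-West (Bartlett kernel) t-statistic for the slope.

    Returns (beta0, beta1, t_beta1). No small-sample correction is applied.
    """
    T = len(y)
    x = [(1.0, float(t)) for t in range(1, T + 1)]

    # X'X, its closed-form 2x2 inverse, and the OLS coefficients
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(2)] for i in range(2)]
    det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
    inv = [[xtx[1][1] / det, -xtx[0][1] / det],
           [-xtx[1][0] / det, xtx[0][0] / det]]
    xty = [sum(r[i] * yt for r, yt in zip(x, y)) for i in range(2)]
    beta = [inv[i][0] * xty[0] + inv[i][1] * xty[1] for i in range(2)]
    u = [yt - (beta[0] + beta[1] * r[1]) for r, yt in zip(x, y)]

    # Long-run covariance of the scores g_t = x_t * u_t with Bartlett weights
    g = [(r[0] * ut, r[1] * ut) for r, ut in zip(x, u)]
    s = [[0.0, 0.0], [0.0, 0.0]]
    for lag in range(lags + 1):
        w = 1.0 - lag / (lags + 1)
        for t in range(lag, T):
            for i in range(2):
                for j in range(2):
                    term = g[t][i] * g[t - lag][j]
                    if lag > 0:
                        term += g[t - lag][i] * g[t][j]
                    s[i][j] += w * term

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    # Sandwich covariance (X'X)^-1 S (X'X)^-1; the slope variance is v[1][1]
    v = matmul(matmul(inv, s), inv)
    return beta[0], beta[1], beta[1] / math.sqrt(v[1][1])


# Hypothetical 90-month distance series: a downward trend plus alternating noise
ed = [0.06 - 0.0002 * t + 0.001 * (-1) ** t for t in range(1, 91)]
b0, b1, t1 = trend_regression_nw(ed, lags=6)
```

A negative and significant `t1` would indicate convergence. Running the same function on the observations of \(t_1\), \(t_2\) and \(t_3\) separately gives the sub-period tests.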

The search for a general trend is largely motivated by the need to identify financial convergence or divergence as a result of the varying behaviour of cryptocurrencies across different market conditions such as bullish and bearish markets (Rubbaniy et al., 2022). Our sample period lends itself very well to investigating different market conditions because the first and third sub-periods are bullish whereas the second sub-period is bearish due to the ‘Crypto Winter’. The Euclidean distance measure is a suitable technique for studying the relationship between cryptocurrencies via convergence and divergence phenomena. In this case, regressions and trendlines are effective tools for analysing the general tendency of cryptocurrencies to converge or diverge.

The convergence (or divergence) of the Euclidean distance has important implications for investors and fund managers. Eun and Lee (2010) point out that a decreasing risk-return distance has a negative impact on the investment opportunity set for equity indices. We test whether the mean Euclidean distance and investment opportunities also show a positive relationship in cryptocurrency markets, using the three sub-periods \(t_1\), \(t_2\) and \(t_3\) defined above. To compare performance across the three sub-samples, we use the daily returns of each individual cryptocurrency to compute portfolio returns, volatility and Sharpe ratios.

3.5 Bitcoin cosine similarity measure

To gain further insight into the direction in which alt-coins move relative to Bitcoin, we use the cosine similarity measure with Bitcoin, rather than the centroid, as the reference point. Bitcoin is the most established cryptocurrency. As of 30 June 2021, it accounted for 46.31% of the global cryptocurrency market capitalisation and 57.13% of the total market capitalisation in our sample. We therefore examine whether other cryptocurrencies are becoming more similar or dissimilar to Bitcoin in terms of their return characteristics. From the investors' point of view, holding cryptocurrencies with different characteristics provides additional diversification benefits during portfolio formation.

We use the Bitcoin cosine similarity measure because it is a powerful tool for analysing pairwise similarity. The Euclidean distance, by contrast, is less intuitive and can be misleading in pairwise comparisons. Consider three cryptocurrencies: A (4% return and 3% risk), B (8% return and 6% risk) and C (8% return and 1% risk). The return-risk Euclidean distance between A and B is 5%, the same as that between B and C (see Fig. 1). However, the return earned per 1% of risk taken (a simplified Sharpe ratio) is 1.33% for both A and B but 8% for C. Hence, despite the identical Euclidean distances, A and B are quite similar because they have the same Sharpe ratio, whereas B and C are markedly different. The pairwise Euclidean distance can therefore be misleading, because it captures only the magnitude of the difference and not its direction. Moreover, the Sharpe ratio analogy cannot be extended to a Euclidean distance involving all four moments. To overcome this limitation, we use the Bitcoin cosine similarity measure to estimate the similarity in return characteristics between Bitcoin and each other cryptocurrency.

Fig. 1

Limitation of pairwise Euclidean distance measure. A, B and C are three hypothetical cryptocurrencies with given returns (%, horizontal axis) and risks (%, vertical axis)
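The arithmetic behind Fig. 1 can be reproduced in a few lines. The coin labels and values are the hypothetical ones from the text, and the "Sharpe ratio" here is the simplified return-per-unit-risk ratio, with no risk-free rate.

```python
import math

# Hypothetical coins from Fig. 1 as (return %, risk %)
coins = {"A": (4.0, 3.0), "B": (8.0, 6.0), "C": (8.0, 1.0)}

d_ab = math.dist(coins["A"], coins["B"])  # sqrt(4^2 + 3^2) = 5
d_bc = math.dist(coins["B"], coins["C"])  # sqrt(0^2 + 5^2) = 5

# Simplified Sharpe ratio: return earned per 1% of risk
ratio = {name: r / s for name, (r, s) in coins.items()}
# A and B: 4/3 = 8/6 = 1.33; C: 8/1 = 8
```

The two distances are equal, yet the ratios separate B from C. This is the limitation that motivates a direction-based measure.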

Given two n-dimensional vectors \(A=(a_{1},\dots ,a_{n})\) and \(B=(b_{1},\dots ,b_{n})\), the cosine similarity \(\cos (\theta )\) is defined as

$$\begin{aligned} \cos (\theta )=\frac{A\cdot B}{\left\| A\right\| _{2}\left\| B\right\| _{2}}=\frac{\sum _{i=1}^{n}a_{i}b_{i}}{\sqrt{\sum _{i=1}^{n}a_{i}^{2}}\sqrt{\sum _{i=1}^{n}b_{i}^{2}}}, \end{aligned}$$
(10)

where \(\theta \) is the angle between vectors A and B, and \(\left\| \cdot \right\| _{2}\) is the \(\ell ^{2}\)-norm defined earlier. By construction, the cosine similarity ranges from \(-1\) to 1. As its value moves from \(-1\) to 1, the angle between the two vectors shrinks from 180\(^{\circ }\) to 0\(^{\circ }\). Vectors A and B thus range from perfect dissimilarity (pointing in opposite directions) to perfect similarity (pointing in the same direction). Intermediate values indicate various degrees of similarity or dissimilarity, and a value of 0 corresponds to orthogonal vectors.
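A minimal Python sketch of Eq. (10) and the implied angle is given below. The helper names are hypothetical. The example vectors show the three reference cases of 0, 90 and 180 degrees.

```python
import math

def cosine_similarity(a, b):
    """Eq. (10): dot product divided by the product of the two l2-norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def angle_degrees(a, b):
    """Angle theta between a and b; clamping guards against rounding outside [-1, 1]."""
    c = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.degrees(math.acos(c))

same = cosine_similarity((1.0, 2.0), (2.0, 4.0))        # 1: same direction, 0 degrees
orthogonal = cosine_similarity((1.0, 0.0), (0.0, 1.0))  # 0: 90 degrees
opposite = cosine_similarity((1.0, 2.0), (-1.0, -2.0))  # -1: 180 degrees
```

Only the direction of the vectors matters. Scaling either vector by a positive constant leaves the cosine similarity unchanged.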

There are two major differences between Pearson's correlation (colloquially, the correlation coefficient) and the Bitcoin cosine similarity index between cryptocurrencies and Bitcoin. First, the cosine similarity index responds to shifts in the level of return characteristics that leave the correlation coefficient unchanged. It is therefore a more useful measure of the relationship between cryptocurrencies and Bitcoin. Second, unlike the correlation coefficient, the cosine similarity index moves in line with the Sharpe ratio.

For example, consider the return-risk context for two cryptocurrencies: Bitcoin (return 0.4, risk 0.2) and Ether (return 0.04, risk 0.02). Both cryptocurrencies have a Sharpe ratio of 2, and the cosine similarity index and correlation coefficient between them are both 1. Now suppose the return and risk of Ether are each increased by 0.04, to 0.08 and 0.06 respectively. Ether's Sharpe ratio falls to 1.33 and the cosine similarity index falls to 0.984, whereas the correlation coefficient remains 1. The correlation coefficient therefore fails to capture this shift towards dissimilarity. Furthermore, with only two moments the correlation coefficient can only take the values \(-1\) or 1, and it is undefined when a cryptocurrency's two moments are equal, which is not informative. Thus, although the Bitcoin cosine similarity also ranges from \(-1\) to 1, it differs from Pearson's correlation. The Pearson correlation of two variables x and y is defined as

$$\begin{aligned} Corr(x,y) =\frac{\sum _{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})}{\sqrt{\sum _{i=1}^{n}(x_{i}-\bar{x})^{2}}\sqrt{\sum _{i=1}^{n}(y_{i}-\bar{y})^{2}}}, \end{aligned}$$
(11)

where \(\bar{x}\) and \(\bar{y}\) are the mean values of x and y, respectively. The correlation coefficient in Eq. (11) is therefore the cosine similarity between the centred vectors \(x-\bar{x}\) and \(y-\bar{y}\). For the Bitcoin cosine similarity index, however, centring is not meaningful. Here \(\bar{x}\) and \(\bar{y}\) would be averages taken across the different moments of the return distribution (mean, risk, skewness and kurtosis), which are distinct quantities.
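The Bitcoin and Ether example can be checked numerically. The sketch below also expresses Eq. (11) as the cosine of the mean-centred vectors. The helper names are hypothetical, and the "Sharpe ratio" is the simplified return/risk ratio used in the text.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def pearson(a, b):
    # Eq. (11) is the cosine similarity of the mean-centred vectors
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - a_bar for x in a], [y - b_bar for y in b])

btc = (0.4, 0.2)       # (return, risk)
eth = (0.04, 0.02)
eth_up = (0.08, 0.06)  # return and risk each raised by 0.04

for label, e in (("before", eth), ("after", eth_up)):
    sharpe = e[0] / e[1]
    print(label, round(sharpe, 3), round(cosine(btc, e), 3), round(pearson(btc, e), 3))
```

Up to rounding, the printed values match those quoted in the text: 2, 1 and 1 before the shift, and 1.333, 0.984 and 1 after it. With two-element vectors, the centred vectors always point in the same or opposite direction, so the correlation is stuck at \(\pm 1\).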

Suppose we have \(N_{t}-1\) alt-coins along with Bitcoin in the system at time t. Let \(CS(X_{Bt},X_{it})\) denote the cosine similarity between Bitcoin \(X_{Bt}=(x_{Bt1},x_{Bt2},x_{Bt3},x_{Bt4})\) and another cryptocurrency \(X_{it}=(x_{it1},x_{it2},x_{it3},x_{it4})\). Then following Eq. (10), we have

$$\begin{aligned} CS(X_{Bt},X_{it})=\cos (\theta _{Bit})=\frac{X_{Bt}\cdot X_{it}}{\left\| X_{Bt}\right\| _{2}\left\| X_{it}\right\| _{2}}=\frac{\sum _{j=1}^{n=4}x_{Btj}x_{itj}}{\sqrt{\sum _{j=1}^{n=4}x_{Btj}^{2}}\sqrt{\sum _{j=1}^{n=4}x_{itj}^{2}}}, \end{aligned}$$
(12)

where \(x_{\cdot t1},x_{\cdot t2},x_{\cdot t3},x_{\cdot t4}\) represent the first four normalised moments, \(i=1,\dots ,N_{t}-1\) and \(t=1,\dots ,T\). We refer to this measure as the Bitcoin cosine similarity of cryptocurrency \(X_{i}\). The cosine similarity differs from the Euclidean distance in that the former relates to the overall direction of the return characteristics, whereas the latter relates to the magnitude of the geometric distance.
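The sketch below applies Eq. (12) to every alt-coin in one period. It assumes the four normalised moments are already available per ticker; the function name, tickers and values are hypothetical. Bitcoin itself is skipped, so the result has \(N_{t}-1\) entries.

```python
import math

def bitcoin_cosine_similarity(moments, btc="BTC"):
    """Eq. (12) for one period t: cosine similarity between Bitcoin's vector of
    four normalised moments and each alt-coin's. Returns {ticker: CS}."""
    x_b = moments[btc]
    norm_b_sq = sum(v * v for v in x_b)
    result = {}
    for name, x_i in moments.items():
        if name == btc:
            continue  # only the N_t - 1 alt-coins are compared with Bitcoin
        dot = sum(b * v for b, v in zip(x_b, x_i))
        result[name] = dot / math.sqrt(norm_b_sq * sum(v * v for v in x_i))
    return result


# Hypothetical normalised moments (mean, risk, skewness, kurtosis) for one month
month = {
    "BTC": (2.0, 1.5, 0.5, 1.0),
    "ETH": (4.0, 3.0, 1.0, 2.0),   # a scaled copy of Bitcoin's vector: CS = 1
    "XRP": (1.0, 2.5, -0.5, 3.0),  # a different shape: CS < 1
}
cs = bitcoin_cosine_similarity(month)
```

ETH's moments are larger than Bitcoin's in magnitude, yet its similarity is 1. This illustrates that the measure captures direction rather than magnitude.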

We analyse the evolution of the Bitcoin cosine similarity over time for all 16 alt-coins in the system. If the measure shows an upward (downward) trend converging to 1 (\(-1\)), the cryptocurrency’s return characteristics converge to (diverge from) Bitcoin’s. Following the analysis based on Euclidean distance, we tested the convergence (or divergence) of the cosine similarity for the four-moment model using the following regression:

$$\begin{aligned} CS_{i}=\beta _{0}+\beta _{1}\cdot \text {Time}+\varepsilon _{t},\,\,\,\text {Time}=1,2,\dots ,T_{i}, \end{aligned}$$
(13)

where \(T_{i}\) is the maximum period for cryptocurrency i. We perform the ADF test and calculate Newey–West heteroskedasticity- and autocorrelation-consistent t-statistics. As for Eq. (9), the regression is estimated for the complete sample and for the three sub-periods \(t_1\), \(t_2\) and \(t_3\) defined in Sect. 3.4.

4 Results

This section presents the key empirical results of this study. First, we discuss the results of the convergence (or divergence) in the mean-variance Euclidean distance and the four-moment Euclidean distance measures. Second, we demonstrate the diversification effect of convergence (or divergence) on efficient frontiers. Third, we report the relationship between Euclidean distance and uncertainty due to market news and regulatory events. Finally, we present the results for Bitcoin cosine similarity over time.

Table 2 Monthly mean Euclidean distance

4.1 Centroid Euclidean distance results

Table 2 shows the descriptive statistics for the monthly mean Euclidean distance of the 17 cryptocurrencies during the sample period. In addition, we present the four moments of the return distribution (original and normalised) as well as the two Euclidean distance measures (mean-variance and mean-variance-skewness-kurtosis).

Before normalisation, the mean and standard deviation of cryptocurrency returns range from 46.48 to 265.95 and from 27.39 to 165.02, respectively. After normalisation, these ranges tighten to 1.90 to 2.70 and 1.08 to 1.85, respectively. Table 3 reports the Pearson correlation coefficients for the average distances of the four moments. These correlation coefficients are identical before and after normalisation because of the normalisation-invariance property of Pearson's correlation. The correlations between the distances of the first two moments (return and risk) and between those of the higher moments (skewness and kurtosis) are relatively high, at 0.46 and 0.68, respectively. In contrast, the distance correlations between lower and higher moments are much lower (and even negative), suggesting that higher moments behave differently from lower ones. Consequently, incorporating higher moments adds information not captured by the lower moments. It is therefore more comprehensive and informative to analyse the four-moment distance in addition to the return-risk distance. In our analysis, both measures lead to consistent results in terms of the general trends of distance over time, which makes the results more reliable. However, the two measures may differ in other situations, which could have interesting implications (for investment strategies, for example) and requires further research. With a standard deviation of 2.35%, the four-moment distance is slightly more volatile than the return-risk distance (2.10% standard deviation). Figure 2 plots the distance over time for each of the four normalised moments.
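The invariance claim can be illustrated with the sketch below. It assumes that the normalisation acts as a positive affine rescaling of each moment's distance series; the exact normalisation is not restated in this section, and a non-linear transformation would not in general preserve Pearson's correlation. The data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical monthly distances for two moments (illustrative only)
dist_return = [0.05, 0.08, 0.06, 0.09, 0.04]
dist_risk = [0.03, 0.05, 0.04, 0.05, 0.02]

# Positive affine rescaling, standing in for the normalisation step
norm_return = [10.0 * v + 1.0 for v in dist_return]
norm_risk = [3.0 * v - 2.0 for v in dist_risk]

r_before = pearson(dist_return, dist_risk)
r_after = pearson(norm_return, norm_risk)  # equal to r_before up to rounding
```

A negative scale factor would flip the sign of the correlation. The invariance therefore relies on the rescaling preserving order.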

Table 3 Pearson correlation for Euclidean distance of the four moments
Fig. 2

Euclidean distance for the four moments over time. This figure shows the mean Euclidean distance for the four normalised moments over time

An increasing (decreasing) Euclidean distance over time indicates a growing (shrinking) difference in the return characteristics of cryptocurrencies, as they become more dissimilar (similar) to one another. In other words, an increasing (decreasing) distance shows an expansion (contraction) of the system and a divergence from (convergence to) the centroid of the cryptocurrencies. Figure 3 depicts the evolution over time of both the four-moment and return-risk distances. The four-moment distance shows an upward trend (divergence) in \(t_1\), a clear downward trend (convergence) in \(t_2\) and an upward trend (divergence) thereafter. To validate these results, we repeat the analysis using the return-risk distance (see Fig. 3b). Like the four-moment distance, the return-risk distance shows an overall downward trend, although it is slightly less pronounced. It also shows an upward trend (divergence) during \(t_1\), a clear downward trend (convergence) during \(t_2\) and a clear upward trend (divergence) during \(t_3\). This agreement with the four-moment distance makes the results more reliable. Moreover, the convergence during the crash and the divergence afterwards are consistent with Yaya et al. (2019), who found a weaker relationship among cryptocurrencies after the 2018 crash.

Figure 3 shows four peaks in both the two-moment and four-moment distance measures, in May 2014, March 2015, July 2016 and October 2017, each coinciding with a Bitcoin-specific event. Specifically, Bitcoin's price began to rise significantly on 19 May 2014 on the back of eBay's potential adoption and BitPay's record-setting fundraising campaign.Footnote 6 On 10 March 2015, the Bitcoin start-up 21 announced $116 million in all-star backing.Footnote 7 On 9 July 2016, Bitcoin underwent its second halving event,Footnote 8 and on 9 October 2017, the Bitcoin price suffered an unexpected flash crash on a popular cryptocurrency index.Footnote 9 Because these events were specific to Bitcoin and occurred during the early years of cryptocurrency, they had the potential to drive Bitcoin away from the centroid. An increase in Bitcoin's distance from the centroid would raise the average distance of the system much more strongly in the early days, when few cryptocurrencies existed. As more cryptocurrencies are launched over time, the influence of each cryptocurrency on the centroid diminishes. Hence, the greater number of peaks in the earlier years can be attributed to both Bitcoin-specific events and the smaller number of cryptocurrencies in existence at the time.

Fig. 3

Centroid Euclidean distance over time. This figure shows the mean centroid Euclidean distance (four-moment and return-risk) over time. The regression lines are estimated by Eq. (9)

Figure 4 compares three Minkowski distance measures: Manhattan (\(p=1\)), Euclidean (\(p=2\)) and Chebyshev (\(p=\infty \)). The three measures display a similar pattern over time. Because the Minkowski distance is non-increasing in p, the Manhattan distance (blue line) lies above the Euclidean distance (orange line), which in turn lies above the Chebyshev distance. The similarity of the patterns suggests that our results are robust to the choice of distance measure. In the machine learning literature, the Manhattan distance has been found to be more useful for high-dimensional vectors, where the Euclidean distance performs slightly worse (Aggarwal et al., 2001). Because the Euclidean distance is easy to interpret and we consider only four moments, it is well suited for our purpose.
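A small sketch of the three Minkowski distances compared in Fig. 4 follows, using hypothetical moment vectors for one coin and the centroid. It illustrates the ordering \(d_{1}\ge d_{2}\ge d_{\infty }\), which holds for any pair of vectors.

```python
def minkowski(a, b, p):
    """Minkowski distance between two moment vectors; p = inf gives Chebyshev."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

# Hypothetical normalised moment vectors for one coin and the centroid
coin = (2.5, 1.0, -1.0, 3.0)
centroid = (2.0, 2.0, 1.0, 3.0)

manhattan = minkowski(coin, centroid, 1)             # 0.5 + 1 + 2 + 0 = 3.5
euclidean = minkowski(coin, centroid, 2)             # sqrt(0.25 + 1 + 4) ≈ 2.29
chebyshev = minkowski(coin, centroid, float("inf"))  # largest gap = 2
```

Averaging each distance over all coins preserves this ordering. The Manhattan curve therefore sits on top in Fig. 4 and the Chebyshev curve at the bottom.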

Table 4 displays the regression results based on Eq. (9) for both the four-moment and return-risk Euclidean distances, for the entire sample period and for the sub-samples. In all cases, the ADF test rejects the null hypothesis of a unit root at the 1% level, suggesting that the regression estimates are robust. For the four-moment distance, the downward trend is significant at the 1% level for the entire sample, indicating convergence in return characteristics among the cryptocurrencies over time. By contrast, the sub-sample after the crash shows significant divergence, although only at the 10% level. The results for the return-risk Euclidean distance are similar to the four-moment results, even though the downward trend for the entire sample is not statistically significant. However, the downward trend during the crash and the upward trend afterwards are both significant at the 1% level. Finally, the diagnostic tests in Table 4 (as well as in Tables 5, 7 and 8) show that the residuals are not always normally distributed. This is consistent with the non-normality that is a stylised fact of cryptocurrency returns (Zhang et al., 2018; López-Martín et al., 2022). In most cases, however, the residuals are homoscedastic and uncorrelated.

Fig. 4

Three Minkowski distance measures: the Manhattan distance when \(p=1\); the Euclidean distance when \(p=2\); the Chebyshev distance when \(p=\infty \)

Table 4 Results of convergence/divergence test based on centroid Euclidean distance using Eq. (9)

4.2 Bitcoin Euclidean distance results

In this section, we present the BED results. Figure 5 shows the evolution over time of both the four-moment and return-risk BED, and Table 5 reports the corresponding convergence test results. In general, the BED follows the same trends as the centroid Euclidean distance: convergence for the whole sample period and sub-period 2, and divergence for sub-periods 1 and 3. However, the BED tends to fluctuate more than the centroid-based distance. As a single asset, Bitcoin has a more volatile return distribution than the centroid, which averages across all cryptocurrencies.

Fig. 5

Bitcoin Euclidean distance over time. This figure shows the mean Bitcoin Euclidean distance (four-moment and return-risk) over time. The regression lines are estimated by Eq. (9)

Table 5 Results of convergence/divergence test based on Bitcoin Euclidean distance using Eq. (9)

4.3 Euclidean distance and investment opportunity in the mean-variance framework

We divide the complete sample into the three sub-periods defined in Sect. 3.4. For \(t_1\), \(t_2\) and \(t_3\), the mean four-moment distances are 0.0638, 0.0420 and 0.0480, and the mean return-risk distances are 0.0353, 0.0245 and 0.0290, respectively. For both distance measures, sub-period \(t_1\) has the highest value and \(t_2\) the lowest.

Fig. 6

Efficient frontier with all the cryptocurrencies. This figure shows the efficient frontiers of three periods: \(t_1\) before 2017–11 (I), \(t_2\) 2017–12 to 2019–02 (II), and \(t_3\) 2019–03 to 2021–06 (III). The ‘+’ sign represents the minimum volatility portfolio and the star the maximum Sharpe ratio portfolio

Fig. 7

Efficient frontier with six cryptocurrencies available from 2015–12–01. These six cryptocurrencies are BTC, ETH, DOGE, XRP, LTC and XLM, the same set used in Sect. 5. This figure shows the efficient frontiers of three periods: \(t_1\) before 2017–11 (I), \(t_2\) 2017–12 to 2019–02 (II), and \(t_3\) 2019–03 to 2021–06 (III). The '+' sign represents the minimum volatility portfolio and the star the maximum Sharpe ratio portfolio

Figure 6 presents the efficient frontiers for these sub-periods, denoted I–III respectively. Frontier I is the most efficient, because its portfolios offer the highest expected return for a given level of risk (volatility) and the lowest risk for a given expected return. Frontier II is the least efficient. Frontier I also has the highest maximum Sharpe ratio (4.398), followed by III (2.353) and II (0.783). Overall, the ranking of portfolio performance is consistent with the ranking of the Euclidean distance. This implies that superior investment opportunities in cryptocurrencies are strongly associated with a higher distance among them. The most likely reason is that more dissimilar return characteristics allow better diversification. This finding is consistent with that of Eun and Lee (2010), who report a significant positive relationship between the investment opportunity set and the Euclidean distance measure.
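The diversification intuition can be sketched with two assets. The function and the volatilities below are hypothetical, and return correlation is used only as a simple stand-in for "dissimilarity"; the paper's distance is built from moments, not correlations. The closed-form global minimum-variance weight shows that less similar assets allow lower portfolio risk.

```python
import math

def min_vol_two_assets(s1, s2, rho):
    """Global minimum-volatility portfolio of two assets (short sales allowed).

    s1, s2: volatilities; rho: return correlation. Returns (w1, volatility).
    """
    cov = rho * s1 * s2
    w1 = (s2 ** 2 - cov) / (s1 ** 2 + s2 ** 2 - 2 * cov)
    w2 = 1.0 - w1
    var = (w1 * s1) ** 2 + (w2 * s2) ** 2 + 2 * w1 * w2 * cov
    return w1, math.sqrt(var)

# Hypothetical volatilities; a lower correlation stands in for more dissimilar coins
for rho in (0.9, 0.5, 0.0):
    w1, vol = min_vol_two_assets(0.8, 1.0, rho)
    print(f"rho={rho:.1f}  w1={w1:.3f}  min volatility={vol:.3f}")
```

The minimum attainable volatility falls as the correlation falls, which shifts the left end of the frontier outward. This mirrors the better frontiers observed in the high-distance sub-periods.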

Several factors may drive the efficient frontier results in Fig. 6. We consider two: the number of cryptocurrencies and the high returns during the early days. To control for the number of cryptocurrencies, we repeat the efficient frontier analysis using only the six cryptocurrencies available from 2015–12–01, namely BTC, ETH, DOGE, XRP, LTC and XLM. In other words, we use the same cryptocurrencies as in the first sub-period across all three sub-periods. The results, shown in Fig. 7, differ little from our earlier results: a higher distance is still associated with better efficient frontiers. For the three sub-periods, the mean four-moment distances are 0.102, 0.065 and 0.070, and the mean return-risk distances are 0.061, 0.047 and 0.041, respectively. Sub-period \(t_3\) has a slightly lower return-risk distance than sub-period \(t_2\), despite its strong upward (divergence) trend (results available upon request). This is because \(t_3\)'s mean return-risk distance is pulled down by an ultra-low value immediately after \(t_2\), which itself has a downward trend. The four-moment distance is less affected by this and therefore gives slightly more reliable results in this case. Thus, as in the original analysis of all cryptocurrencies, a higher distance and an upward (divergence) trend tend to be associated with better efficient frontiers.

The high returns of the early days coincide with the way the sub-periods are defined. The three sub-periods correspond to before, during and after the 'Crypto Winter', when cryptocurrencies underwent regimes of high, low and high returns, respectively. A low-return regime or bear market tends to strengthen the relationship among financial assets (Butler & Joaquin, 2002; Campbell et al., 2002; Ang & Bekaert, 2004), which reduces diversification opportunities. Consistent with this, the second sub-period has the smallest distance, that is, the strongest association among cryptocurrencies. It therefore offers the fewest diversification opportunities and has the worst efficient frontier.

Given their benefits in portfolio management, cryptocurrencies as an emerging asset class (Andrianto & Diputra, 2017; Huynh et al., 2020; Omanović et al., 2020) have increasingly become indispensable for risk-loving portfolio managers to consider. This is especially true during adverse market conditions such as the COVID-19 pandemic. Holding several cryptocurrencies can benefit a portfolio (Mensi et al., 2019), whether the portfolio consists entirely of cryptocurrencies or mixes cryptocurrencies with traditional assets. However, specific tools are needed for portfolio and fund managers to achieve optimal results in managing cryptocurrency-related portfolios (Platanakis & Urquhart, 2019). One such tool is the Euclidean distance, which can inform how portfolios are formed. For example, when the Euclidean distance increases, the efficient frontier is expected to shift favourably. Portfolio and fund managers can then adjust the weights of individual cryptocurrencies to the new optimal allocation. This avoids the sub-optimal holdings that result when shifts in the efficient frontier are ignored.

In addition to portfolio management, Euclidean distance analysis can be useful for trading strategies such as pair trading. This popular form of relative value arbitrage (Gatev et al., 2006) pairs closely related securities that tend to move together. The idea is that if two such securities diverge sufficiently, one can reasonably expect them to revert to their usual relationship after some time. To exploit this behaviour, once sufficient divergence has occurred, traders simultaneously buy the undervalued asset and short-sell the overvalued one in expectation of their eventual convergence. If cryptocurrencies are closely related to one another, pair trading is a promising strategy. The pivotal element of pair trading is the divergence measure used to set the threshold that triggers the opening of trades. The Euclidean distance can serve this purpose: traders can time their pair trades by monitoring the convergence or divergence of individual alt-coins relative to Bitcoin and the centroid.
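The sketch below shows one way a distance-based trigger threshold might look. The rule (a trailing mean plus k trailing standard deviations), the window length, k and the data are all hypothetical illustrations. This is not the trading strategy presented in Sect. 5, and an exit rule, such as closing when the distance reverts towards its mean, is omitted.

```python
import statistics

def pair_trade_triggers(distance, window=20, k=2.0):
    """Hypothetical entry rule: return the indices t at which a pair's distance
    exceeds its trailing mean by more than k trailing standard deviations."""
    triggers = []
    for t in range(window, len(distance)):
        history = distance[t - window:t]
        mu = statistics.fmean(history)
        sd = statistics.pstdev(history)
        if sd > 0 and distance[t] > mu + k * sd:
            triggers.append(t)
    return triggers

# Hypothetical daily distance of one alt-coin from Bitcoin: stable, then a jump
series = [1.0, 1.1] * 10 + [3.0, 1.05]
print(pair_trade_triggers(series))  # [20]
```

Only the jump at index 20 crosses the threshold. After that, the jump itself inflates the trailing standard deviation, so the return to 1.05 does not trigger again.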

4.4 Euclidean distance and UCRY indices

Next, we hypothesise that uncertainty arising from market news and regulatory events is an important driver of the Euclidean distance. We examine the Pearson correlation matrix of the UCRY indices and the return-risk and four-moment Euclidean distance measures, as well as their evolution over time. In addition, we regress the Euclidean distance measure on the UCRY indices as follows:

$$\begin{aligned} ED_{t}=\beta _{0}+\beta _{1}\cdot \text {UCRY}_{t}+\varepsilon _{t}, \end{aligned}$$
(14)

where \(t=1,\dots ,T\) denotes the monthly period. We estimate the model for the entire sampling period (T = 90) as well as for sub-periods \(t_2\) and \(t_3\).

Table 6 Correlation of UCRY indices and Euclidean distance

Table 6 shows the Pearson correlation matrix of the UCRY indices and the mean Euclidean distance (return-risk and four-moment) over the entire sampling period and over sub-periods \(t_2\) and \(t_3\). In these three periods, the two UCRY indices are almost perfectly correlated (\(>0.98\)), while the two distance measures are closely related (0.768, 0.670 and 0.755, respectively). For the full period, all correlation coefficients between the UCRY indices and the Euclidean distance measures are low, ranging from \(-0.037\) to 0.162. In sub-periods \(t_2\) and \(t_3\), however, the relationship becomes much stronger. In particular, during sub-period \(t_2\), the correlation between the UCRY indices and the mean return-risk (four-moment) distance reaches 0.76 (0.39). A positive correlation indicates that greater uncertainty (higher UCRY indices) is associated with a higher distance. Since distance can be viewed as a measure of dispersion and volatility, more uncertainty is associated with more volatility. This suggests that the distance is more closely related to uncertainty during the convergence sub-period \(t_2\). Figure 8 shows that, compared with the four-moment distance, the return-risk distance behaves more similarly to the two UCRY indices, especially after 2018. Before 2018, both distance measures fluctuated strongly.

Fig. 8

UCRY indices and Euclidean distance over time. This figure shows the time evolution of the UCRY indices and the mean Euclidean distance (return-risk and four-moment). The dashed lines represent the two cut-off points: 2017–11 and 2019–02

Table 7 reports the regression results for the Euclidean distance on the UCRY indices. In most cases, the indices have statistically significant positive effects on the distance measures, suggesting that higher uncertainty leads to higher volatility. The explanatory power of the indices increases markedly after 2018. This finding is particularly interesting because the break point at the start of 2018 coincides with the bursting of the 2018 cryptocurrency bubble. Following this event, cryptocurrencies came under closer scrutiny and stricter regulation. This could explain why the impact of events was less severe before 2018 but substantial afterwards. Moreover, cryptocurrency events appear more relevant to the return-risk distance than to the four-moment distance, suggesting that they influence the lower moments more than the higher ones. In line with the AMH, the variation in the Euclidean distance between cryptocurrencies over time is driven by uncertainty about regulations and market movements. Changing market conditions therefore play a significant role in determining this variation.

Table 7 Regression results for Euclidean distance on UCRY indices according to Eq. (14)

The reported effect of the UCRY indices on Euclidean distance has substantial implications for regulators and policy makers. Although cryptocurrency markets are currently regarded as largely unregulated, governments can still exert some influence on them if the need arises. Some jurisdictions have already adopted specific regulations on taxation and initial coin offerings, as well as outright bans. As higher uncertainty can lead to higher volatility, regulators should carefully consider the potential impact of such regulations and policies on cryptocurrency markets before committing to a given piece of legislation or policy. Cryptocurrency prices are mostly driven by emotions and sentiment (Cheah et al., 2018), so cryptocurrency price shocks could potentially be transmitted to traditional financial markets. It is therefore crucial that regulators and policy makers consider how retail and institutional participants will perceive the information content of news about regulatory and policy changes affecting cryptocurrency markets. A practical way for national governments to dampen the effects of such news is to time their announcements carefully. Announcing new policies during periods of reduced trading activity can help reduce the likelihood of overreaction. For example, releasing news at the end of the trading week can help minimise excessive trading, because many traders then have more time to consider and digest its impact over the weekend.

The decisive role of the UCRY indices in determining Euclidean distance also has substantial implications for traders and investors. Traders can deploy distance measures as tools to enhance trading and portfolio management, so identifying the key drivers of Euclidean distance, and thus anticipating changes in it more reliably, can give them an advantage. If the UCRY indices are likely to rise, signalling greater uncertainty in the near future, traders can expect cryptocurrencies to diverge from each other. This divergence could present a pair trading opportunity for speculative traders, who can place limit orders in advance to open trades as soon as their chosen cryptocurrency pairs meet the Euclidean distance criterion. Conversely, if the UCRY indices are likely to fall, traders should continue to monitor the market and wait for the next trading opportunity. In terms of portfolio management, an increase in the UCRY indices that leads to an increase in the Euclidean distance could signal a shift towards a superior efficient frontier as better diversification benefits become available. Investors could therefore take the opportunity to rebalance their portfolios, or to construct a new portfolio if they have not yet done so. By contrast, an expected decrease in the UCRY indices indicates a lack of profitable investment opportunities; investors may then choose to stay out of the market and wait for a better time to enter, or adjust their positions accordingly.

4.5 Bitcoin cosine similarity

If the Bitcoin cosine similarity of a cryptocurrency shows an upward (downward) trend towards 1 (\(-1\)), the cryptocurrency’s return characteristics converge to (diverge from) Bitcoin’s return characteristics. Figure 9 shows the Bitcoin cosine similarity over time for all 16 alt-coins, together with the respective regression lines based on Eq. (13). The measure can fluctuate considerably, especially in the short term, which we attribute to its construction: it uses Bitcoin rather than the centroid of the overall market as the reference point. Because Bitcoin is a single asset, it is relatively more volatile than the centroid, which aggregates several assets; it is therefore unsurprising that the Bitcoin cosine similarity measure is relatively more volatile. The same phenomenon can be observed with the Bitcoin Euclidean distance, which is more volatile than the centroid-based Euclidean distance. Given the short-term volatility of Bitcoin-based measures, it may be more meaningful to focus on long-term trends.
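The comparison with Bitcoin can be sketched as follows, assuming each coin is summarised by a four-moment vector (mean, standard deviation, skewness, kurtosis) and that the \(\ell^2\)-normalisation rescales each moment across coins by its \(\ell^2\) norm. Both the normalisation detail and the helper names are assumptions for illustration, not the study's exact construction. Cosine similarity then compares only the direction of each alt-coin's vector with Bitcoin's, which is why it is bounded between \(-1\) and 1.

```python
import math


def l2_normalise_columns(vectors):
    """Divide each feature by its l2 norm across all coins.

    ASSUMPTION: this column-wise scaling stands in for the study's
    l2-normalisation; it keeps the four moments on comparable scales.
    """
    n_features = len(next(iter(vectors.values())))
    norms = [math.sqrt(sum(v[j] ** 2 for v in vectors.values())) for j in range(n_features)]
    return {
        coin: [v[j] / norms[j] if norms[j] > 0 else 0.0 for j in range(n_features)]
        for coin, v in vectors.items()
    }


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def bitcoin_cosine_similarity(moments, base="BTC"):
    """Similarity of every alt-coin's normalised moment vector to Bitcoin's (hypothetical helper)."""
    normed = l2_normalise_columns(moments)
    return {coin: cosine_similarity(vec, normed[base])
            for coin, vec in normed.items() if coin != base}
```

Because Bitcoin is the single reference vector, any change in Bitcoin's own moments shifts every alt-coin's score at once, which is consistent with the extra volatility of Bitcoin-based measures noted above.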

Table 8 reports the regression estimates for the Bitcoin cosine similarity measure. Over the entire sample period, eight cryptocurrencies exhibit a statistically significant upward trend and none of the alt-coins shows a clear downward trend, suggesting that most major alt-coins converge to Bitcoin in terms of return characteristics over time. In the sub-period analysis, we are mainly interested in sub-periods \(t_2\) and \(t_3\), as these contain data on most of the cryptocurrencies. For sub-period \(t_1\), we have meaningful data for only six cryptocurrencies, none of which shows a significant trend; we therefore omit these results from Table 8. During sub-period \(t_2\), five alt-coins show an upward trend (three of them statistically significant) and five show a downward trend (two of them statistically significant). During sub-period \(t_3\), eight alt-coins show a statistically significant downward trend. Only three alt-coins show an upward trend, but these are relatively new (introduced in 2020), and the ADF test fails to reject the null hypothesis of a unit root for two of them, so their trends may not be reliable. Overall, the results for Bitcoin cosine similarity are highly consistent with those for Euclidean distance: many top alt-coins (e.g. ETH, BNB, ADA, XRP, BCH, LTC, XLM and ETC) converge to Bitcoin before 2019–02 and diverge afterwards. As an extension of the Euclidean distance measure, the Bitcoin cosine similarity measure gives additional insight into the convergence (or divergence) of return characteristics at the level of individual alt-coins, and this information could be useful in constructing a diversified cryptocurrency portfolio.
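The sub-period trend estimates can be sketched as below, assuming Eq. (13) is a linear time trend \(S_t = a + b\,t + e_t\) fitted separately within each sub-period, with the break points (2017–11 and 2019–02 in Fig. 9) supplied as positions in the series. The function names are hypothetical, the t-statistic uses ordinary OLS standard errors, and the 1.96 cut-off is a large-sample approximation rather than the study's stated criterion.

```python
import math


def trend_fit(series):
    """Fit s_t = a + b*t + e_t with t = 0, 1, ..., n-1; return (b, t_stat of b)."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three observations")
    t_bar = (n - 1) / 2
    s_bar = sum(series) / n
    stt = sum((t - t_bar) ** 2 for t in range(n))
    sts = sum((t - t_bar) * (s - s_bar) for t, s in enumerate(series))
    b = sts / stt
    a = s_bar - b * t_bar
    sse = sum((s - a - b * t) ** 2 for t, s in enumerate(series))
    se_b = math.sqrt(sse / (n - 2) / stt)
    if se_b == 0:
        return b, (math.copysign(math.inf, b) if b != 0 else 0.0)
    return b, b / se_b


def subperiod_trends(series, breaks):
    """ASSUMPTION: fit a separate trend in each segment delimited by the break indices."""
    edges = [0, *breaks, len(series)]
    return [trend_fit(series[lo:hi]) for lo, hi in zip(edges[:-1], edges[1:])]


def classify(b, t_stat, critical=1.96):
    """Label a trend as significantly up, significantly down, or insignificant."""
    if abs(t_stat) < critical:
        return "insignificant"
    return "up" if b > 0 else "down"
```

Before trusting a fitted trend, a unit-root check such as `adfuller` from `statsmodels.tsa.stattools` can be run on each segment, in the spirit of the ADF caveat above.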

Fig. 9

Bitcoin cosine similarity over time. This figure shows the time evolution of Bitcoin cosine similarity for all 16 non-Bitcoin cryptocurrencies. The regression lines are based on Eq. (13). The two dashed lines represent the two break points: 2017–11 and 2019–02

Table 8 Regression results for Bitcoin cosine similarity based on Eq. (13)

5 Application: a trading strategy based on centroid Euclidean distance

Section 4.3 shows a positive relationship between the investment opportunity set and Euclidean distance: for a given level of risk, the portfolio on the efficient frontier associated with a higher Euclidean distance may offer a higher return. We can therefore use the centroid Euclidean distance to generate a trading signal. To generate a daily signal, we calculate a rolling-window Euclidean distance instead of the monthly Euclidean distance used earlier.

Here we illustrate a trading strategy based on the centroid Euclidean distance. First, we construct a cryptocurrency portfolio of six cryptocurrencies: BTC, ETH, DOGE, XRP, LTC and XLM, chosen because data for all six are available from 1 December 2015. Because the centroid is sensitive to the number of cryptocurrencies used, we do not start at the beginning of our dataset, as only four cryptocurrencies are available from January 2014. Next, we calculate the centroid Euclidean distance of the portfolio using a 30-day rolling window: the daily prices from Day \(i-31\) to Day \(i-1\), which yield 30 daily returns, are used to generate the trading signal for Day i. We consider two portfolio allocations: (i) equal weights for each asset and (ii) minimum variance portfolio weights. The trading rule is as follows. If the centroid Euclidean distance exceeds a pre-determined threshold, denoted by \(\alpha \), we invest in the portfolio, because a high distance indicates that the portfolio offers a higher return for a given level of risk. Otherwise, we do not invest, or, if we hold an existing position, we liquidate it and exit the market, because a low distance indicates that it may not be the optimal time to invest. We assume that no risk-free asset is available, so the estimated trading performance is conservative: in practice, performance could be even better because funds could be invested in risk-free assets whenever the distance is too low to trade.
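The rule can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the study's implementation: the centroid distance is taken here as the average Euclidean distance of each coin's (mean, standard deviation) point to the centroid of all points, computed from simple daily returns without rescaling; the equal-weight portfolio is rebalanced daily; transaction costs are ignored; and cash earns zero, matching the no-risk-free-asset assumption. All function names are hypothetical, the default \(\alpha = 0.0075\) is taken from Fig. 10, and the minimum variance allocation is not shown.

```python
import math
import statistics


def simple_returns(prices):
    """Daily simple returns from a list of prices."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices[:-1], prices[1:])]


def centroid_distance(window_prices):
    """Mean Euclidean distance of each coin's (mean, std) return point to the centroid.

    ASSUMPTION: two-moment return-risk space, unscaled; the study's definition may differ.
    """
    points = []
    for prices in window_prices.values():
        r = simple_returns(prices)
        points.append((statistics.fmean(r), statistics.stdev(r)))
    cx = statistics.fmean(m for m, _ in points)
    cy = statistics.fmean(s for _, s in points)
    return statistics.fmean(math.hypot(m - cx, s - cy) for m, s in points)


def signal_distance(prices, i, window=30):
    """Distance for Day i from prices on Days i-window-1 .. i-1 (no Day-i information)."""
    start = i - window - 1
    if start < 0:
        raise ValueError("not enough history before day i")
    return centroid_distance({c: p[start:i] for c, p in prices.items()})


def threshold_strategy(prices, alpha=0.0075, window=30):
    """Return (strategy, long_only) daily returns for the distance-threshold rule.

    The equal-weight portfolio is held on Day i only if the distance exceeds alpha;
    otherwise the position is in cash earning zero (no risk-free asset).
    """
    coins = list(prices)
    n_days = len(prices[coins[0]])
    strategy, long_only = [], []
    for i in range(window + 1, n_days):
        # ASSUMPTION: equal weights rebalanced daily, no transaction costs.
        day_ret = statistics.fmean(prices[c][i] / prices[c][i - 1] - 1.0 for c in coins)
        long_only.append(day_ret)
        invested = signal_distance(prices, i, window) > alpha
        strategy.append(day_ret if invested else 0.0)
    return strategy, long_only
```

The key design point is the slice `p[start:i]`: for a 30-day window it covers Days \(i-31\) to \(i-1\), so the Day-i decision never uses Day-i prices.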

Fig. 10

Portfolio performance. The threshold level is \(\alpha =0.0075\). All indices start at 1. ‘Port_ret’ refers to the long-only strategy; ‘Strategy’ refers to the distance-based trading strategy; ‘ew’ refers to equal weight; and ‘mvp’ refers to the minimum variance portfolio

Table 9 compares the investment statistics of the long-only and distance-based trading strategies. Compared with the long-only strategy, the distance-based strategy reduces volatility and maximum drawdown while increasing the Sharpe and Calmar ratios. This simple strategy is therefore useful for longer-term investments and can naturally be used to generate stop-loss signals. Interestingly, our trading strategy outperforms the benchmark for both the minimum variance portfolio and the naive equal-weighted portfolio (see Figs. 10 and 11), which is consistent with Nguyen et al. (2020), who confirm that a basic equal-weighted portfolio is sufficient to benefit investors.

Table 9 Return-risk statistics against threshold level \(\alpha \)
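The statistics compared in Table 9 can be computed along the following lines. This is a minimal sketch assuming 365 periods per year (cryptocurrency markets trade every day), a zero risk-free rate, the geometric annualised return in both the Sharpe and Calmar ratios, and maximum drawdown measured on the cumulative index; Table 9 may use different conventions (for example, an arithmetic mean in the Sharpe ratio). Function names are hypothetical.

```python
import math
import statistics


def max_drawdown(returns):
    """Largest peak-to-trough fall of the cumulative index, as a positive fraction."""
    level, peak, mdd = 1.0, 1.0, 0.0
    for r in returns:
        level *= 1.0 + r
        peak = max(peak, level)
        mdd = max(mdd, 1.0 - level / peak)
    return mdd


def performance(returns, periods_per_year=365):
    """Annualised return and volatility, Sharpe ratio (zero risk-free rate) and Calmar ratio.

    ASSUMPTION: geometric annualisation over 365 daily periods.
    """
    n = len(returns)
    growth = math.prod(1.0 + r for r in returns)
    ann_return = growth ** (periods_per_year / n) - 1.0
    ann_vol = statistics.stdev(returns) * math.sqrt(periods_per_year)
    mdd = max_drawdown(returns)
    return {
        "ann_return": ann_return,
        "ann_vol": ann_vol,
        "sharpe": ann_return / ann_vol if ann_vol > 0 else math.nan,
        "max_drawdown": mdd,
        "calmar": ann_return / mdd if mdd > 0 else math.nan,
    }
```

Applying `performance` to the strategy and long-only return series gives a side-by-side comparison; days spent in cash contribute zero returns, which is how the rule lowers both volatility and drawdown.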
Fig. 11

Negative returns of the equal-weight and minimum variance portfolios. \(\alpha =0.0075\). ‘Port_ret’ refers to the long-only strategy; ‘Strategy’ refers to the distance-based trading strategy; ‘ew’ refers to equal weight; ‘mvp’ refers to the minimum variance portfolio

6 Conclusion

Machine and deep learning are often used to predict cryptocurrency prices (see, for example, Zoumpekas et al., 2020; Zhang et al., 2021; Kim et al., 2021). These approaches are frequently described as ‘black boxes’; in this study, we use some of the powerful tools inside these ‘black boxes’ to analyse the return characteristics of the cryptocurrency market. We first employ two distance measures used in machine learning, namely the Euclidean distance and the Bitcoin cosine similarity, to analyse the convergence and divergence in the return characteristics of major cryptocurrencies. We extend the Euclidean distance measure by incorporating higher moments (skewness and kurtosis) using the \(\ell ^{2}\)-normalisation approach. The findings from our new distance measure, together with the Bitcoin cosine similarity, provide new insights into the cryptocurrency financial convergence literature. Both measures confirm that, in terms of return characteristics, top cryptocurrencies converge only during the 2018 crash and begin to diverge afterwards.

We also find a significant, positive relationship between the mean Euclidean distance and investment opportunities in the cryptocurrency universe. Notably, the Euclidean distance is a strong indicator of meaningful shifts in the efficient frontier, which has clear practical implications for traders and investors constructing their portfolios. Building on this, we propose a profitable trading strategy based on the Euclidean distance. Moreover, we shed light on the underlying determinants of the divergence after the 2018 crash. This divergence could be attributed to the increasing scrutiny and the growing number of market and policy events after the first major Bitcoin/cryptocurrency crash in 2018. A major implication is that, despite their different agendas and objectives, both market participants and regulators must watch for significant external events with a high potential to disrupt markets and push individual cryptocurrencies away from their long-term fundamental relationships. Future research should therefore focus on different types of price and policy events and their respective effects on the cryptocurrency market.

This study has several limitations. First, our distance measures, the Euclidean distance and the Bitcoin cosine similarity, do not directly quantify the added benefits of diversification; future studies could develop a specific tool to help traders observe the impact of a percentage change in divergence on the risk-adjusted expected return. Second, we consider only regulatory and market uncertainty events as potential drivers of distance, without considering other intervening variables. Future research could investigate the specific intervening and moderating variables that link uncertainty around market news and regulatory announcements to distance measures. Finally, our use of daily observations limits our ability to investigate intraday trading dynamics. Researchers could use high-frequency intraday data to develop a more sensitive tool for traders, given the rapid changes in cryptocurrency markets.