1 Introduction

During the last two years we have witnessed the creation of a large number of cryptocurrencies. In 2018 this burst was mainly fueled by the opportunity generated by the initial coin offering (ICO) mechanism, used by companies as a new channel to fund innovation. Furthermore, this burst follows the surge of new business models based on blockchain and associated digital tokens and crypto-money. The most dynamic period in the cryptocurrency market has been, so far, the beginning of 2018, on which this study focuses. At the time of writing (September 2018) the cryptocurrency market capitalization was floating around 200 billion USD, down from the 800 billion USD reached in January 2018 (https://coinmarketcap.com/). This market comprises thousands of currencies, only a few of which have significant capitalization. In particular, five currencies, namely Bitcoin (BTC), Bitcoin Cash (BCH), Ethereum (ETH), Litecoin (LTC) and Ripple (XRP), have been dominating the market during the last few years with a share of capitalization consistently above 70%. Overall, there are 15 currencies with capitalization over 1 billion USD, more than 60 with capitalization over 100 million USD and about 800 with capitalization over 1 million USD. This is a new and confused market characterized by large volatilities, by quick increases in the value of some currencies at the time of their release and, often, a rapid decrease of the value afterwards until failure. This is a market strongly echoed in social media with great expectations, quick shifts of sentiment, strong beliefs and harsh disputes.

In the literature, there have been some studies of correlations in cryptocurrency markets highlighting the non-normal statistics of correlations between price fluctuations (Gkillas et al. 2018) and their relations with fiat currencies (Szetela et al. 2016). Social media and Twitter sentiment signals have been used to attempt nowcasting and forecasting for some of these currencies (Kim et al. 2016; Kaminski 2014). The main focus, so far, has been on Bitcoin with little published research on other cryptocurrencies.

In this paper, I investigate how cryptocurrency prices collectively behave and how the price behaviour is related to the sentiment expressed through Twitter and StockTwits (https://stocktwits.com/) messages that refer explicitly to the related currency. The main question is whether this market has a characteristic structure, where the major cryptocurrencies are located within this structure and what role the minor cryptocurrencies play in shaping it. I also study the influence of social sentiment and its interplay with prices. This is done by looking at the entire market (1944 cryptocurrencies recorded during the first 6 months of 2018) instead of concentrating on a few ‘important’ currencies only. I intentionally study the whole market even if most of the capitalization is retained by a few currencies and most of the other currencies play a marginal economic role. From a naive perspective, one would have expected a priori to observe minor currencies being driven by the behaviour of the major ones, in a similar way to the dynamics of stock prices, which tend to cluster around the leading firms of the relevant sector (Aste et al. 2010; Song et al. 2012; Musmeci et al. 2014). Surprisingly, this is not what happens in the cryptocurrency market. Indeed, this work uncovers signals revealing that these marginal currencies play a statistically significant role in the collective dynamics of prices and their interplay with social sentiment. Therefore, they should not be excluded a priori from the investigation and their role with respect to the major currencies must be studied in detail. This opens new challenges for investment strategies and risk management, which must handle a very large number of variables and cannot be limited to the study of a few influential factors.

In this market, both prices and sentiment data are noisy with large volatility; for this reason, in this paper dependency and causality are quantified mainly using rank statistics and topology, reducing in this way the effect of noisy outliers. Special attention is devoted to statistically validating dependency and causality links by using non-parametric permutation tests and by assessing the effect of the validation threshold on the resulting structure. Results are also cross-tested by comparing the overall structural properties of the networks, discarding the null hypothesis that they might be the expression of random spurious links. This study uncovers a complex structure of interrelations where prices and sentiments influence each other both within a given currency and across currencies. To my knowledge, this is the first attempt to understand the dependency and causality structure in this market.

The structure of the cryptocurrency market as unveiled in this work is unavoidably specific to the period investigated, which has been a very special and dramatic period. In this respect, this paper presents a unique picture of a very interesting period of the cryptocurrency market. Although the cryptocurrency market had already changed significantly by the time the revision of this paper was finished, some aspects, such as the intrinsic nonlinearity in the interactions and the role of ‘minor’ variables in the whole system, will remain significant for this market as well as for other systems in the digital economy. Furthermore, this paper contributes to the study of these systems by introducing several general and rigorous methodologies to handle dependency and causality in these noisy and nonlinear systems, composed of a large number of variables and often supported by a small number of observations. These novel methodologies have broad applicability to the study of the digital economy and complex systems in general.

The paper is organised as follows. Section 2 describes the dataset. Section 3 describes the methodology adopted for quantifying dependency and causality, their representation as networks and the statistical validation procedure. Results are presented in Sect. 4, where the properties of dependency and causality networks for both sentiment and prices and their interplay are described in detail. Section 5 provides a detailed discussion of the results with special attention to their statistical significance. Conclusions and perspectives are outlined in Sect. 6.

2 Data

Prices and Twitter sentiment data of 1944 cryptocurrencies traded during the period from the beginning of January 2018 (02/01/2018) to the middle of June 2018 (14/06/2018) are analyzed. In the dataset, four major currencies, namely BTC, LTC, ETH and XRP, had records starting earlier, respectively from 01/09/2014, 01/09/2014, 07/08/2015 and 21/01/2015. The number of currencies simultaneously present at any time during the period Jan-June 2018 is reported in Fig. 1. This number is not constant because new currencies are introduced over time and others fail and cease to be traded in the market. Often they do not disappear, but their capitalisation becomes negligible and their price becomes constant, and they are therefore excluded from the dataset. The largest number of currencies present at the same time was 1301, recorded at the end of January 2018. The number then gradually decreased to 471 at the end of the observation period. The peak at the end of January 2018 reflects the popularity of ICOs, which indeed peaked in that period. Prices have been obtained from Cryptocompare (https://www.cryptocompare.com/) whereas sentiment is provided by PsychSignal [11]. The sentiment signal is computed from natural language processing of Twitter and StockTwits (https://stocktwits.com/) messages that refer explicitly to the related currency. Messages are classified as positive, negative or unclassified depending on the words contained and their context. The analysed signal is the number of messages in each category, referred to as volume. In this work, only the relative changes in positive and negative volumes are considered; they are treated as separate signals and unclassified volumes are ignored. Original data are hourly, though in the following analysis they have been transformed into daily signals by aggregating prices into the average daily price and by aggregating volumes into the total daily volume. This aggregation process reduces noise. Similar results are obtained with different aggregation criteria.
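The hourly-to-daily aggregation described above can be sketched as follows. This is only an illustration under stated assumptions: the column names `price`, `pos_volume` and `neg_volume` are hypothetical, the data are synthetic, and the actual pipeline used for the paper is not specified in the text.

```python
import numpy as np
import pandas as pd

def to_daily(hourly: pd.DataFrame) -> pd.DataFrame:
    """Aggregate hourly records into daily signals.

    Assumes a DatetimeIndex and the hypothetical columns
    'price', 'pos_volume' and 'neg_volume'.
    """
    return hourly.resample("D").agg(
        {"price": "mean", "pos_volume": "sum", "neg_volume": "sum"}
    )

# Two days of synthetic hourly data, for illustration only.
index = pd.date_range("2018-01-02", periods=48, freq=pd.Timedelta(hours=1))
hourly = pd.DataFrame(
    {
        "price": np.arange(48, dtype=float),
        "pos_volume": np.ones(48, dtype=int),
        "neg_volume": np.zeros(48, dtype=int),
    },
    index=index,
)
daily = to_daily(hourly)  # average daily price, total daily volumes
```

Prices are averaged and message volumes are summed, matching the two aggregation rules stated in the text.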

Fig. 1 Number of currencies simultaneously present during the period Jan–Jun 2018

3 Methodology

I investigated collective movements of currency prices and currency sentiment by computing Kendall cross-correlations (Kendall 1938) and non-parametric transfer entropy (Schreiber 2000; Tungsong et al. 2017) of daily log-returns, \(\log Price(t)- \log Price(t-1)\) (the difference of the logarithm of the price between a day and the previous one), and daily changes of the logarithm of the number of messages classified as positive or negative, for instance, for positive sentiment, \(\log (\text{Number of messages with positive sentiment on day } t) - \log (\text{Number of messages with positive sentiment on day } t-1)\). The choice of log-returns for prices is standard in the financial literature (Campbell et al. 1997). Differencing makes the series stationary and the logarithm reduces the effects of non-normal variations. In contrast, the choice of the log-variation of sentiment volume is here mainly motivated by the convenience of treating both variables in the same way. Tests show that using the volume variations instead of their log-variations gives overall similar outcomes.
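As a minimal sketch of the transformation just described, the same function can be applied to a daily price series and to a daily positive-sentiment volume series. The numbers are made up for illustration, and the sketch assumes strictly positive inputs; how days with zero messages were handled is not stated in the text.

```python
import numpy as np

def log_changes(values):
    """Return log(v[t]) - log(v[t-1]) for a series of positive values."""
    values = np.asarray(values, dtype=float)
    return np.diff(np.log(values))

# Hypothetical daily price and positive-sentiment message counts.
prices = [100.0, 110.0, 121.0]
positive_volume = [20, 10, 40]

price_returns = log_changes(prices)               # daily log-returns
sentiment_changes = log_changes(positive_volume)  # daily log-volume changes
```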

I estimated the dependency structure by computing Kendall’s \(\tau \) correlation coefficients (Kendall 1938). It has been verified that comparable results are obtained by using Pearson or Spearman correlations. Nonetheless, Kendall correlations are a more appropriate analytical tool for the kind of data investigated in this work. Indeed, the statistics of both sentiment and price log-variations are non-normal, making a rank estimate more reliable for establishing dependency than its Pearson counterpart; furthermore, the time series are short, making the Kendall estimate preferable to the Spearman one (Kendall 1938; Pozzi et al. 2008a, b).

Correlations were computed between pairs of variables by using all available days where both variables had observations. Only correlations between pairs of variables with more than 20 common observations are considered. Correlations are validated non-parametrically using a permutation test that compares the observed correlation coefficients with a null (non-correlated) hypothesis generated by randomly shuffling time entries in the series. Observed correlations are considered ‘valid’ only if they deviate from the mean of the random ones by at least three standard deviations [i.e. Z score larger than 3 (Wilks 1932)]. Note that this validation criterion is non-parametric and, therefore, robust also in the present case where correlations do not follow the statistical distribution assumed in standard tests (Kendall et al. 1946).
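The permutation validation described above can be sketched as follows. This is an illustrative sketch, not the code used in the paper: the number of permutations (`n_perm`), the random seed, the use of `NaN` to mark missing days and the synthetic data are all assumptions.

```python
import numpy as np
from scipy.stats import kendalltau

def kendall_z_score(x, y, n_perm=500, min_obs=20, seed=0):
    """Kendall tau and its permutation Z score on common observations.

    Returns (nan, nan) unless more than `min_obs` common observations exist.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    common = ~(np.isnan(x) | np.isnan(y))  # days where both series exist
    x, y = x[common], y[common]
    if x.size <= min_obs:
        return np.nan, np.nan
    rng = np.random.default_rng(seed)
    tau_observed, _ = kendalltau(x, y)
    # Null distribution: shuffle the time entries of one series.
    null = np.array(
        [kendalltau(rng.permutation(x), y)[0] for _ in range(n_perm)]
    )
    z = (tau_observed - null.mean()) / null.std(ddof=1)
    return tau_observed, z

# Synthetic, strongly dependent pair of series (illustration only).
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = x + 0.5 * rng.normal(size=100)
tau, z = kendall_z_score(x, y, n_perm=200)
is_valid = z > 3  # validation criterion used in the text
```

The Z score is computed against the empirical null generated by shuffling, so no distributional form is assumed for the correlation coefficient itself.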

The dependency structure was analyzed in terms of its topological properties (the validated links structure). For this purpose, the network’s adjacency matrix \(A_{i,j}\) is defined as a matrix with \(A_{i,j}=1\) when the corresponding correlation has \(Z>3\) and it is computed from more than 20 observations; \(A_{i,j}=0\) otherwise.
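A small sketch of how the adjacency matrix defined above could be assembled from a matrix of permutation Z scores and a matrix counting common observations. The input values are invented, and removing self-links from the diagonal is an assumption made here so that degrees count other currencies only.

```python
import numpy as np

def validated_adjacency(z_scores, n_common, z_threshold=3.0, min_obs=20):
    """A[i, j] = 1 if Z > z_threshold and more than min_obs observations."""
    z_scores = np.asarray(z_scores, dtype=float)
    n_common = np.asarray(n_common)
    adjacency = ((z_scores > z_threshold) & (n_common > min_obs)).astype(int)
    np.fill_diagonal(adjacency, 0)  # assumption: no self-links
    return adjacency

# Hypothetical Z scores and common-observation counts for three currencies.
z_scores = np.array([[np.nan, 4.2, 1.0],
                     [4.2, np.nan, 5.0],
                     [1.0, 5.0, np.nan]])
n_common = np.array([[100, 50, 50],
                     [50, 100, 15],
                     [50, 15, 100]])
A = validated_adjacency(z_scores, n_common)
```

In this toy case the link between the second and third currency is dropped despite its large Z score, because it rests on only 15 common observations.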

I computed all combinations of correlations within and across the variables: (1) cross correlations of log-price returns; (2) cross correlations of log-volume sentiment changes (for both positive and negative sentiment); (3) the combined cross correlations between price and sentiment log changes (for positive sentiment only).

I also investigated weighted betweenness centrality and closeness measures (Newman 2008) for each node in the validated correlation networks. The weight of an edge (ij) between currency ‘i’ and currency ‘j’ was associated to the relative correlation \(\tau _{i,j}\) as \(w_{i,j} = 1-\tau _{i,j}^2\). Therefore, uncorrelated nodes are connected with edges with cost equal to 1 and perfectly correlated or anti-correlated nodes have zero-cost connection.
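The weighted centrality measures can be sketched with NetworkX, using the edge cost \(w_{i,j} = 1-\tau _{i,j}^2\) defined above. The three tickers and their correlations are hypothetical, and the use of NetworkX is an assumption: the text does not say which software was used.

```python
import networkx as nx

def weighted_centralities(validated_taus):
    """Betweenness and closeness on a validated correlation network.

    `validated_taus` maps a pair (i, j) to its Kendall tau, for validated
    links only. Edge cost is 1 - tau**2, as defined in the text.
    """
    graph = nx.Graph()
    for (i, j), tau in validated_taus.items():
        graph.add_edge(i, j, weight=1.0 - tau**2)
    betweenness = nx.betweenness_centrality(graph, weight="weight")
    closeness = nx.closeness_centrality(graph, distance="weight")
    return betweenness, closeness

# Hypothetical validated correlations among three currencies.
taus = {("A", "B"): 0.8, ("B", "C"): 0.7, ("A", "C"): 0.1}
betweenness, closeness = weighted_centralities(taus)
```

In this toy network the cheapest path between A and C passes through B, because the two strongly correlated links cost less in total than the weak direct link, so B obtains the largest betweenness and closeness.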

Causality was studied by estimating transfer entropy, computed by means of a non-parametric histogram methodology using 4 equally spaced bins [see Tungsong et al. (2017)]. Transfer entropies were computed for log-price returns and log-volume positive sentiment changes. A validated transfer entropy network was constructed in an analogous way to the validated correlation networks, by keeping links generated from time-series combinations longer than 40 days and with a transfer entropy permutation-test Z score larger than 3. Transfer entropy measures the reduction in uncertainty about the value of a given variable provided by the knowledge of the previous values of another variable, discounting the information from the past of the variable itself. In this case, I tested the causal effect of positive sentiment on next-day prices and, conversely, the causal effect of prices on next-day positive sentiment across all currencies. I also compared transfer entropy results with the Granger causality approach that uses linear regression (Granger 1969, 1980). The outcomes of the two methods are overall consistent and here only the results for the non-parametric method, which obtains a larger number of validated causal links, are reported. It must be noted that, in the linear case, when variables follow a multivariate normal distribution, the transfer entropy method is equivalent to the well-known Granger causality approach (Barnett et al. 2009). However, it is clear that the dataset under investigation does not follow a multivariate normal distribution and, therefore, the non-parametric transfer entropy approach must be adopted. The fact that a larger number of valid links is obtained with the non-parametric transfer entropy approach reinforces the point that this system of variables must be properly described with non-normal multivariate statistics. For the non-parametric histogram approach, different binnings have been used, observing that results are affected by the choice of the bins but that overall outcomes are consistent over a range of bins from 3 to 6.
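A minimal histogram estimator of transfer entropy with equally spaced bins is sketched below. It is an illustration only, not the implementation of Tungsong et al. (2017): the one-day lag, the base-2 logarithm, the binning of each series over its own range and the synthetic data are assumptions made here.

```python
import numpy as np

def discretize(series, n_bins=4):
    """Map a series to labels 0..n_bins-1 using equally spaced bins."""
    series = np.asarray(series, dtype=float)
    edges = np.linspace(series.min(), series.max(), n_bins + 1)
    return np.digitize(series, edges[1:-1])  # interior edges only

def transfer_entropy(source, target, n_bins=4):
    """Histogram estimate (in bits) of the transfer entropy source -> target.

    TE = sum p(x_next, x_past, s_past)
             * log2[ p(x_next | x_past, s_past) / p(x_next | x_past) ]
    """
    s = discretize(source, n_bins)
    x = discretize(target, n_bins)
    x_next, x_past, s_past = x[1:], x[:-1], s[:-1]
    joint = np.zeros((n_bins, n_bins, n_bins))
    np.add.at(joint, (x_next, x_past, s_past), 1.0)
    joint /= joint.sum()
    p_past_pair = joint.sum(axis=0)     # p(x_past, s_past)
    p_target_pair = joint.sum(axis=2)   # p(x_next, x_past)
    p_past = joint.sum(axis=(0, 2))     # p(x_past)
    i, j, k = np.nonzero(joint)
    p = joint[i, j, k]
    ratio = p * p_past[j] / (p_past_pair[j, k] * p_target_pair[i, j])
    return float(np.sum(p * np.log2(ratio)))

# Synthetic example: the target copies the source with a one-step delay.
rng = np.random.default_rng(0)
source = rng.normal(size=500)
target = np.roll(source, 1)
te_forward = transfer_entropy(source, target)   # source drives target
te_backward = transfer_entropy(target, source)  # no real influence
```

Even for the unrelated direction the estimate is slightly positive because of finite-sample bias, which is one reason a permutation Z score is needed before a link is accepted.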

Under normality assumptions a Z score larger than 3 would imply rejection of the null hypothesis with a p value below about 0.13%. In this paper, \(Z>3\) is used as a threshold to eliminate noise from the correlations. However, this threshold on the Z score is not directly associated with a p value for rejection of the null hypothesis. Indeed, in this case, the p value is affected by the fact that statistics are not normal and samples are small. A precise testing of statistical significance is beyond the purposes of this paper. However, it is crucial to establish whether the uncovered structures reflect dependencies and causalities among the variables or are just picking up random spurious interactions from a large number of possibilities on very noisy data. To this purpose I also tested validation at \(Z>6\) which, under normality assumptions, would imply rejection of null hypotheses with p value below \(10^{-9}\). Outcomes from \(Z>6\) were consistent with the analysis with \(Z>3\), but networks become extremely sparse, to the point that the transfer entropy network becomes largely disconnected into small clusters and isolated nodes. I, therefore, also looked at the similarity between the various networks using the network from cross-correlation of log-price returns as a structure-template. The hypothesis tested in this case was that significant structural similarity is incompatible with random networks.
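The p values quoted for the two thresholds follow from the one-sided tail of the standard normal distribution, as this short check illustrates. It applies only under the normality assumption, which, as noted above, does not hold for these data.

```python
import math

def one_sided_normal_p_value(z):
    """Upper-tail probability P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

p_at_3 = one_sided_normal_p_value(3.0)  # about 1.35e-3, i.e. roughly 0.13%
p_at_6 = one_sided_normal_p_value(6.0)  # about 9.9e-10, i.e. below 1e-9
```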

4 Results

4.1 Price–price and sentiment–sentiment cross-correlation validated networks

Fig. 2 Complementary cumulative degree distribution (Probability \((k > x)\)) for the validated Kendall cross-correlation networks constructed from (a) the cross correlations of log-price returns and (b) the cross correlations of log-volume sentiment changes for both positive and negative sentiments. The degrees of Bitcoin (BTC), Bitcoin Cash (BCH), Ethereum (ETH), Litecoin (LTC) and Ripple (XRP) are indicated explicitly with symbols

Fig. 3 Closeness and betweenness-centrality complementary cumulative probability distributions computed over the validated networks using weights \(w_{i,j} = 1-\tau _{i,j}^2\)

I first computed the validated networks from the cross correlations of: (1) log-prices; (2) positive sentiment log-volume variations; (3) negative sentiment log-volume variations. These are symmetric matrices of size \(1944\times 1944\) with ones on the diagonal. I observed predominantly positive correlations, with the average correlation between log-price variations being equal to 0.40, the average correlation between positive sentiment log-volume variations being equal to 0.18 and the average correlation between negative sentiment log-volume variations being equal to 0.22.

I computed the degree distribution by considering for each currency i the number of other currencies j with which it shares a statistically validated correlation (\(k_i=\sum _j A_{i,j}\)). The valid correlation networks are sparse, with the network from price log-return correlations having 15% of valid links and an average degree of 300.7. In contrast, the positive and the negative sentiment volume networks have, respectively, average degrees equal to 16.3 and 10.7. All networks have one connected giant component, a few small clusters and several isolated nodes. The sizes of the giant components are, respectively, 1216, 730 and 564 for the price, positive sentiment and negative sentiment networks. Results for the complementary cumulative degree distributions (Probability(\(k_i > x\))) are reported in Fig. 2a, b for the three networks. In the figures the degrees of Bitcoin (BTC), Bitcoin Cash (BCH), Ethereum (ETH), Litecoin (LTC) and Ripple (XRP) are indicated with symbols. A summary of the results for the major currencies is reported in Table 1. Note that in the price network these major cryptocurrencies have high degrees, between 800 and 900, ranking in the top 10% of highly connected nodes; they are, therefore, hubs within the connected component. Conversely, these currencies have relatively low degrees in the sentiment networks, with a number of connections between 10 and 50, ranking below 50% in the positive sentiment network and just above 50% in the negative sentiment network.
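The degree \(k_i=\sum _j A_{i,j}\) and the complementary cumulative distribution Probability(\(k_i > x\)) can be computed from a validated adjacency matrix as in the following sketch. The four-node star network is a toy example, not data from the paper.

```python
import numpy as np

def degree_ccdf(adjacency):
    """Return degrees, the distinct degree values x, and P(k > x)."""
    adjacency = np.asarray(adjacency)
    degrees = adjacency.sum(axis=1)  # k_i = sum_j A[i, j]
    xs = np.unique(degrees)
    ccdf = np.array([(degrees > x).mean() for x in xs])
    return degrees, xs, ccdf

# Toy star network: node 0 is a hub linked to three peripheral nodes.
star = np.array([[0, 1, 1, 1],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0]])
degrees, xs, ccdf = degree_ccdf(star)
```

Here one node in four has degree larger than 1, so the distribution takes the value 0.25 at x = 1 and drops to zero at the hub degree x = 3.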

To better understand the relative positioning within the cryptocurrency market, also with respect to the weighting of the correlations, I computed closeness and betweenness-centrality distributions. These weighted measures, computed over the validated networks, are reported in Fig. 3. One can observe that, for the closeness, the relative ranking of the five major cryptocurrencies is similar to the one observed for the degree distribution; conversely, the betweenness centrality places all major cryptocurrencies in medium or peripheral rankings.

4.2 Price-sentiment validated correlation network

From now on I consider only positive volume sentiment. This choice is to simplify computation and description of the results. I investigated the Kendall cross correlations between log variation of positive sentiment volume and log variations of price. This is an asymmetric \(1944\times 1944\) matrix representing a bipartite undirected network.

The diagonal elements of this matrix are the correlations between positive sentiment and price for each currency. Among the five major cryptocurrencies I observe correlations on the diagonal of: 0.09 BTC, 0.07 BCH, 0.11 ETH, 0.10 LTC and 0.05 XRP. Except for BCH and XRP they are all statistically validated with \(Z> 3\) and series length over 20 points (BCH and XRP have instead \(Z=1.1\) and 1.7, respectively). Overall, only 1% of currency log-price variations have a valid correlation with their own log positive sentiment volume variations; they have mostly positive correlations but there are a few with negative valid correlations as well.

Fig. 4 In-degree and out-degree complementary cumulative distributions for the validated Kendall cross correlation network between log variations of price of one currency and log variation of positive sentiment volume of another. The ‘impacted’ distribution is counting the number of valid links with other currencies whose positive sentiment is affected by the currency price. The ‘impacting’ distribution is counting the number of valid links with other currencies whose price is affected by the currency positive sentiment

The off-diagonal elements, \(\tau _{i,j}\) with \(i\not =j\), of this matrix are non-symmetric (\(\tau _{i,j}\not =\tau _{j,i}\)). They represent, respectively: \(\tau _{i,j}\), the correlation of positive sentiment of currency i with price of currency j; \(\tau _{j,i}\), the correlation of positive sentiment of currency j with price of currency i. Here two kinds of degrees must be distinguished: (1) the ‘impacting’ degree, which is the sum of the valid entries over the columns (\(Ig_i=\sum _{j}A_{i,j}\)); (2) the ‘impacted’ degree, which is the sum of the valid elements over the rows (\(Id_j=\sum _{i}A_{i,j}\)). Note that, in the literature, these degrees are commonly referred to as in-degree and out-degree (Newman 2008); however, in this case the underlying implicit representation of the graph as a directed graph can be misleading, implying some sort of causality that is not measured here (it will be measured with transfer entropy as reported in the next section). The ‘impacting’ degree of a given currency i counts the number of valid links with other currencies j whose price is affected by the positive sentiment of currency i. Conversely, the ‘impacted’ degree of a given currency i counts the number of valid links with other currencies j whose sentiment is affected by the price of currency i. It turns out that the off-diagonal part of this matrix has 0.2% validated entries. The average degree is 3.1 for both impacting and impacted degrees. The degree distributions are reported in Fig. 4. One can observe that the distribution of the impacting degree has fatter tails than that of the impacted degree, indicating that large variations of sentiment of a given currency are more influential on other currencies' price variations than large changes in a currency's price are on other currencies' sentiment. Given that the average degree is the same for both distributions, this implies that, conversely, small variations of sentiment of a given currency are more influential on other currencies' price variations than small changes in a currency's price are on other currencies' sentiment. In particular, one can observe that changes in Bitcoin sentiment are correlated above the validation threshold with changes in the prices of almost eighty other currencies, whereas changes in Bitcoin price have valid correlation links to only ten other currencies' sentiment changes. A summary of the results for the major currencies is reported in the left columns of Table 1.
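The two degrees can be read directly from the asymmetric validated matrix, as in this small sketch with an invented 3 × 3 example. Rows are taken to index the currency whose sentiment is used and columns the currency whose price is used, following the definition of \(\tau _{i,j}\) above.

```python
import numpy as np

def impacting_impacted_degrees(adjacency):
    """Impacting (Ig_i) and impacted (Id_j) degrees of an asymmetric matrix.

    The diagonal (a currency with itself) is excluded, since the degrees
    count links with other currencies only.
    """
    a = np.array(adjacency, dtype=int)  # copy, the input is left untouched
    np.fill_diagonal(a, 0)
    impacting = a.sum(axis=1)  # Ig_i = sum_j A[i, j]
    impacted = a.sum(axis=0)   # Id_j = sum_i A[i, j]
    return impacting, impacted

# Hypothetical validated sentiment-price links among three currencies.
links = np.array([[1, 1, 1],
                  [0, 0, 0],
                  [0, 1, 0]])
impacting, impacted = impacting_impacted_degrees(links)
```

Both degree sequences sum to the total number of valid off-diagonal entries, which is why the two average degrees necessarily coincide even when the shapes of the two distributions differ.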

It must be stressed that correlation is not causality and from the previous results one cannot conclude what is the cause and what is the effect. For this purpose other kinds of measures must be used, as I do in the next section with transfer entropy.

Table 1 Summary of results for the five major currencies. From left, the first column reports the Z validation threshold. The following reports the currency tickers. Then the following three columns report the degree in the valid cross correlation networks for prices, positive sentiment and negative sentiment. The following two columns report respectively the impacting and impacted degree for the positive sentiment - price valid correlation network. Finally, the last four columns report degrees in the valid transfer entropy network

4.3 Price-sentiment transfer entropy causality network

To quantify causal relations between sentiment and price in the cryptocurrency market, I computed the non-parametric transfer entropy between log variations of positive sentiment volume and log variations of price, and vice versa. These are two \(1944\times 1944\) asymmetric matrices representing bipartite directed networks.

The diagonals of these matrices report, respectively, the causal influence of sentiment over price and the causal influence of price over sentiment for each currency. As for the correlations, only the valid entries (over 40 common observations and \(Z>3\)) are retained. I observed that the overall information flow (the difference between the transfer entropies from sentiment to price and from price to sentiment) is positive, indicating that, for each currency, more information is transferred from past price to future sentiment than the contrary. However, only about 2% of currencies have valid causality relations, with 19 currencies having stronger causal influence of price over sentiment and, conversely, 11 other currencies having stronger causal influence of sentiment over price. Interestingly, none of the five major currencies has valid internal price-sentiment causality in either direction.

Fig. 5 Complementary cumulative degree distributions for the validated transfer entropy network. (a) ‘Impacting’ distribution: number of other currencies influenced by a given currency. (b) ‘Impacted’ distribution: number of other currencies influencing a given currency. The plots report both the validated transfer entropy network for prices causing sentiment and the network for sentiment causing prices

The off-diagonal elements estimate the causal influence of sentiment in currency i on the price of currency j and, conversely, the causal influence of price in currency i on the sentiment of currency j. These matrices are sparse with only about 0.3% valid entries (about 10,000 causality links). Here I observed that the overall information flow is in the direction from sentiment to price, indicating that the past sentiment of other currencies influences the future price of a given currency more than past prices influence future sentiment. Conversely, the number of validated causality links is 13,179 for prices causing sentiment and 10,352 for sentiment causing prices. The price causing sentiment network has average degree 6.8 and one giant component with 1023 elements. Similarly, the sentiment causing price network has average degree 5.3 and one giant component with 1018 elements. The degree distributions of the causality networks are shown in Fig. 5. As in the previous case, two distributions are reported: the ‘impacting’ one, which counts the other currencies on which a given currency exerts a valid causal influence, and the ‘impacted’ one, which counts the other currencies that exert a valid causal influence on a given currency. These two degrees are computed for both the price causing sentiment and the sentiment causing price networks. One observes that the five major currencies are spread in a central region of the ranking with respect to the other currencies, with Bitcoin sentiment being among the most impactful on other currency prices but with Bitcoin price being the least impacted by other currency sentiment.

A summary of the results for the major currencies is reported in the last three columns of Table 1. One can indeed see that BTC positive sentiment is causing prices in 15 other currencies, whereas the sentiment of only 8 other currencies is causing BTC price. Note also that ETH positive sentiment is the most impacted by other currencies' prices and LTC price is caused by the largest number of other currencies' positive sentiment. Finally, BCH causality is driven by sentiment much more than by prices.

I analyzed whether the relative position of a currency in the price network has an effect on the relation between this currency and sentiment. To this end I looked at the top 25% most central currencies in the price cross correlation network in terms of weighted betweenness centrality. The transfer entropies of price causing sentiment and sentiment causing prices were then computed for these currencies, and the number of causal relations was compared with that for the bottom 25% most peripheral currencies in the price cross correlation network. Results show that central currencies have ten times more causality links than their peripheral counterparts. Indeed, the top 20% central currencies already account for 50% of total causality links. Intriguingly, the signal is larger for sentiment causing prices than for prices causing sentiment.

4.4 Network significance from the comparison between price and sentiment networks

The analyzed data are very noisy, they follow non-normal distributions and millions of relations between variables were tested. Spurious dependency and causality relations are certainly present. What must be tested is whether the unveiled structural properties are real features of the system or only spurious consequences of noise and randomness. To this purpose I first tested different levels of validation from \(Z>2\) to \(Z>6\), verifying that the results are consistent and persistent for different validation thresholds. Some of these results for \(Z>6\) are reported in the bottom part of Table 1. Note that, within normal statistics assumptions, \(Z>6\) would correspond to p values below \(10^{-9}\), and nonetheless some of the results previously reported, especially for the price cross correlation network, are still retrieved. However, at this threshold, the transfer entropy network no longer has a giant component, with the largest cluster having only 36 elements and the average degree being 0.1. Overall, this analysis at large Z thresholds gives some confidence but still provides inconclusive answers about the significance of the results. Indeed, the non-normality of the statistics can strongly affect the corresponding statistics of the Z score, with a sizeable likelihood of spurious results even at these threshold levels.

I, therefore, decided to adopt a different approach and, instead of trying to statistically validate each network, I cross-validated results by comparing metrics from networks built from unrelated signals, namely, the price, the positive and the negative sentiment. I argue that if, for instance, the network from sentiment correlations has significantly similar properties to the network from price correlations, it is highly unlikely that the two represent random spurious correlations. This was done by comparing the degree centrality (degree of each vertex) of the various networks at different validation thresholds. Spearman correlation was used to quantify the similarity between these measures. Results are reported in Table 2, where one can see that there are large and statistically significant correlations (t test p values smaller than \(10^{-45}\)) between all networks analyzed in this paper at all levels of validation thresholding from \(Z>3\) to \(Z>6\). Note that similarities between the correlation networks tend to increase with the thresholding value up to \(Z^* = 4\) and then decrease afterwards. In contrast, the similarities with the combined transfer entropy network have maxima at \(Z^*=3\). The increase of similarity with \(Z^*\) in the correlation networks is a consequence of the reduction in the noise, and the decrease is instead a consequence of the reduction in statistics. In the table, results from the sentiment-price network are not included to avoid confusion and also because they are less significant, given that the network is already built from the two signals. Yet, results are well in line with the ones reported in Table 2, with correlations ranging between 45 and 90%.
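The comparison of degree centralities between two networks can be sketched with a Spearman rank correlation, as below. The degree values are invented for illustration and `scipy.stats.spearmanr` is an assumed choice of tool; its p value is based on a t distribution, in line with the t test mentioned above.

```python
import numpy as np
from scipy.stats import spearmanr

def degree_similarity(degrees_a, degrees_b):
    """Spearman correlation between the node degrees of two networks.

    Both inputs must list the same currencies in the same order.
    """
    rho, p_value = spearmanr(degrees_a, degrees_b)
    return rho, p_value

# Hypothetical degrees of five currencies in two validated networks.
price_degrees = np.array([850, 300, 120, 40, 5])
sentiment_degrees = np.array([30, 22, 9, 3, 1])
rho, p_value = degree_similarity(price_degrees, sentiment_degrees)
```

Because only ranks enter the statistic, the two networks can have very different densities, as the price and sentiment networks do, and still be compared on the same footing.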

Table 2 Spearman correlations between degree centralities in the dependency and causality networks from prices and sentiment signals

5 Discussion

The first and most important comment concerning this work is that data are very noisy. Price data have a slightly stronger signal than sentiment ones but in both cases noise is predominant. Nonetheless, the presence of a significant structural organization both in the correlations and in the transfer entropy is demonstrated.

Concerning the correlation analytics, this paper shows that price-price dependencies have larger correlations, but sentiment-sentiment and sentiment-price dependencies also show valid and positive correlations. Not surprisingly, Bitcoin and the other four major currencies have strong dependency ties with the prices of a vast number of other currencies. More surprisingly, in the sentiment dependency network these major cryptocurrencies are, in contrast, not highly connected. This is also reflected in the closeness and centrality measures, which place all major currencies in non-central positions in the network, with the sole exception of the closeness measure for the price network. The sentiment-price correlation network also reflects mainly positive dependencies, with major currencies having only average or slightly above-average degrees. The exception is the dependency between Bitcoin sentiment and other currencies' prices, which instead reveals very strong dependency connections.

The transfer entropy has a lower fraction of valid links. This is mainly because this measure requires estimating a joint probability distribution over three variables, which is hard to estimate well with the short time series in this dataset. Nonetheless, I observe a sizable fraction of valid causality links, with most information flowing from prices to sentiment within each currency but from sentiment to price when the cross-effect of one currency on another is considered. Interestingly, in terms of the number of valid links, I observe a larger number of causality links for prices causing sentiment than for sentiment causing prices. This indicates that the causality of sentiment over price carries a larger amount of information but also a larger amount of noise, and is therefore validated only at higher transfer entropy values.
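
To make the estimation difficulty concrete, the following is a minimal sketch of a plug-in transfer entropy estimate with a permutation-based Z score. It assumes a lag of one step, quantile binning into three states and a shuffled-source null model; these choices, and all function names, are illustrative assumptions and not necessarily the estimator or validation procedure used in this paper. The three variables involved are the present of the target, the past of the target and the past of the source.

```python
import numpy as np

def discretize(series, n_bins=3):
    """Map a series to integer states using its own quantiles as bin edges."""
    series = np.asarray(series, dtype=float)
    edges = np.quantile(series, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(series, edges)

def joint_entropy(*columns):
    """Plug-in Shannon entropy (in nats) of the joint state of the columns."""
    stacked = np.column_stack(columns)
    _, counts = np.unique(stacked, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def transfer_entropy(x, y, n_bins=3, lag=1):
    """TE from x to y: H(y_t | y_past) - H(y_t | y_past, x_past)."""
    xd, yd = discretize(x, n_bins), discretize(y, n_bins)
    y_now, y_past, x_past = yd[lag:], yd[:-lag], xd[:-lag]
    return (joint_entropy(y_now, y_past) - joint_entropy(y_past)
            - joint_entropy(y_now, y_past, x_past)
            + joint_entropy(y_past, x_past))

def permutation_z_score(x, y, n_perm=200, seed=0, **kwargs):
    """Z score of the observed TE against a shuffled-source null model."""
    rng = np.random.default_rng(seed)
    observed = transfer_entropy(x, y, **kwargs)
    null = np.array([transfer_entropy(rng.permutation(x), y, **kwargs)
                     for _ in range(n_perm)])
    return (observed - null.mean()) / null.std()

if __name__ == "__main__":
    # Synthetic example (assumption): y follows x with a one-step delay.
    rng = np.random.default_rng(1)
    x = rng.normal(size=500)
    y = np.empty(500)
    y[0] = 0.0
    y[1:] = x[:-1] + 0.1 * rng.normal(size=499)
    print("TE x -> y:", transfer_entropy(x, y))
    print("TE y -> x:", transfer_entropy(y, x))
    print("Z score x -> y:", permutation_z_score(x, y))
```

With three states per variable the joint distribution already has 27 cells, so a series of a few hundred points leaves only a handful of observations per cell. This is why such estimates are noisy on short time series and why comparison against a null model is needed before a link is accepted.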

The comparison of causality between the central nodes of the prices network and the peripheral ones, regarding the effect of sentiment over prices and of prices over sentiment, shows that currencies that are central to the system in terms of price behavior are also the ones that most strongly influence sentiment in the whole system. This is an interesting finding, also in light of the results in Pozzi et al. (2013), which uncovered the great difference between central and peripheral vertices in terms of investment performance and risk. Note that the centre of the prices correlation network contains the five major currencies; however, they are not the ones mainly responsible for the causality effect.

It has already been stressed that only statistically validated dependency and causality links are considered, which provides some confidence that weak noisy links are removed. However, the statistics are not normal, and in this system there are almost four million possible relations between variables, some of which might turn out to be validated merely as the effect of random fluctuations. I argued that the proof that the results are, overall, robust and do not merely report incidental spurious relations must be sought in the similarity of metrics of networks extracted with different methodologies (Kendall correlations or Transfer Entropy) from different signals (prices or sentiment). In this respect, the strong correlations reported in Table 2 are a good indication that these systems have a consistent structural organization, with prices and sentiments influencing each other in a significant way.

6 Conclusions

This study demonstrates that the current cryptocurrency market has a complex structure. Major, highly capitalized cryptocurrencies and minor, low-capitalized ones are interlocked in this complex structure, with major currencies playing central roles only in the price dependency network. Sentiment and prices are interconnected, and they show both dependency and causality, mainly between different currencies.

Social sentiment plays a very important role in this market, with Bitcoin sentiment correlating with other currencies' prices even more than with its own price, and with validated causal measures showing that sentiment is more influential on price than the contrary.

An unexpected outcome of this research is that minor low-capitalised currencies play a very important role in moving the market sentiment and consequently significantly affect the prices of the highly capitalised currencies as well. This is a fundamental difference from traditional markets, where the driving economic factors are typically reflected in the dependency and causality structure (Aste et al. 2010; Song et al. 2012; Musmeci et al. 2014). The fact that economically irrelevant variables can influence the whole structure of the system is, however, a typical feature of complex systems, where the system cannot be understood from the analysis of its parts in isolation (Aste and Di Matteo 2010). This indicates that the study of cryptocurrencies, and more generally of the digital economy, requires the development of tools beyond traditional approaches, using instruments from the science of complex systems.

Cryptocurrencies are increasingly traded and are becoming part of mainstream investment choices. From risk-management and investment perspectives, the present investigation reveals that the overall market dynamics are dominated by noise, large volatility and large failure rates. This is, therefore, a highly risky domain where most of the traditional risk management and asset allocation instruments are likely to be ineffective. Complex system science (Aste and Di Matteo 2010) can guide us in the development of new tools for modelling, managing risk and designing investment strategies for these markets and the new digital economy.

This paper is a first attempt to explore the vast and intricate field of cryptocurrency markets. My efforts have been mostly dedicated to performing a statistically rigorous investigation of the whole market using innovative tools such as network measures, non-linear quantification of dependency and causality, and non-parametric validation techniques. The results are robust despite the very challenging task of inferring, from short time series, non-linear interrelations in a very large multivariate system.

These are extremely dynamic systems that change continuously. My analysis is limited to a short period of time, and the system has already changed between the time it was analysed and the publication of this paper. This is an unavoidable reality in these systems, and the contribution of this paper is not primarily about the specific properties of the cryptocurrency market during the period investigated but about some general facts, such as the influence of minor currencies on the whole system, that are likely to persist in the future and to be characteristic of other systems as well. Further, an important contribution of this paper is the introduction of a set of rigorous, innovative methodologies for the study of systems composed of a very large set of variables with non-linear interactions and a small number of available observations. This is a very general challenge common to most socio-economic and complex systems, where the methods introduced in this paper can be conveniently adopted in the future.

Much more must be done in the future. For instance, the negative sentiment has been neglected, for simplicity, in the study of the interactions between prices and sentiment. It is, however, clear that it plays a very important role, which appears to be not trivially related to that of the positive sentiment. Also, many choices have been made, such as the Z statistics validation threshold, the use of the log-variation of sentiment volumes, and the choice of considering all currencies rather than only the few with relevant market share. Different choices produce different results. In this investigation I verified that the overall reported results are robust and are retrieved similarly when adopting different choices. However, a more extensive and systematic study is necessary.