1 Introduction

Bitcoin is the pioneer of the increasingly popular cryptocurrencies and a unique example of a large-scale, sustainable payment system in which all financial transactions are publicly available (Whitepaper, Nakamoto 2008). It is issued not by a central authority such as a government, bank or organization, but by cryptographic protocols running on a distributed network. In this system, users exchange bitcoins pseudo-anonymously. Some special users, called miners, are the only ones who can mint bitcoins: they solve cryptographic puzzles by contributing high computing power and, in return, earn bitcoins as a reward for their proof of work. So far, the economic literature on bitcoin is quite limited. Among researchers of various backgrounds, Fergal and Martin (2013) and Ron and Shamir (2013) successfully drew attention to the analytical aspects of the information contained in the blockchain. Because of its still relatively low acceptance in the foreign exchange market and its poor performance as a store of value, the question of whether bitcoin can be considered a currency has attracted some attention in the academic world. Trust in this currency, however, rests not on belief in a central monetary authority but on computer technology and cryptography.

This paper focuses on three aspects of blockchain-based, open-source financial data. First, we restructured a database already extracted from the blockchain (Kondor et al. 2014) to obtain a bird’s-eye view of bitcoin’s daily transaction patterns and to understand its economic growth over time. Second, we carried out a descriptive analysis of the daily transaction count and bitcoin volume in that database to understand some of their most interesting and informative statistical distributions. Finally, we investigated the rank distribution of some distinct transactions and their descriptive statistics to extract topological features of the network.

The blockchain is one of the revolutionary databases that have evolved over the last decade. It stores information in a decentralized computing system, and once stored, the data inside it cannot practically be altered or manipulated. It is transparently accessible to all users of the database, who can view all the information published in the blockchain. In the case of bitcoin, the financial information stored inside the blockchain is a ledger of bitcoin amounts together with the senders’ and receivers’ addresses; this design introduced the world’s first successfully implemented, fully digital cryptocurrency. It solved two real-world problems, “double spending” (Diego and Giovanni 2017) and the “duplication problem”, and offered an alternative way to establish a fully functional financial system based on a virtual currency.

2 Data set

The data used for this paper were downloaded from the website of the Hungarian bitcoin research group (http://www.vo.elte.hu/bitcoin). Their reconstructed database comprises transaction data (sending and receiving bitcoins) with the sending and receiving addresses extracted from the blockchain, covering the period from January 2009 to February 2018. The data are provided on the website as text files. To store them efficiently, the research group mapped some of the extracted blockchain parameters to randomly generated numbers. Long hash strings were mapped to numeric indicators; for example, BlockID starts from 0, representing the genesis block (the first block of the bitcoin blockchain), and ends at 501418, the last block available for download at the cutoff date in February 2018.

For our research purposes, we further restructured the data. The resulting structure is shown in Fig. 1.

Fig. 1 Structure of the final reconstructed database, which gives the summed bitcoin volume of the inputs and outputs of the transactions recorded from Jan 2009 to Feb 2018, together with the UTC block timestamp

After reconstructing the database, we fixed the time unit to one day, covering the period from 1 January 2009, the month in which the bitcoin blockchain was initiated, to the cutoff date of 8 February 2018. For each day, we summed the transaction count and the input volume of bitcoin over all transactions. In each block, every bitcoin transaction follows one of two input–output relationships in terms of bitcoin volume: input = output, or input = output + transaction fees. For this reason, we summed the bitcoin volume of the inputs of each transaction, which represents the actual volume of bitcoin passed from one transaction to another. Our database excludes the miners’ bitcoin-generation transactions, called coinbase transactions, which have no inputs; these were filtered out into a separate database that was later merged into our analysis.
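The daily aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `aggregate_daily`, the `(timestamp, input volume)` record layout and the sample values are hypothetical, and coinbase transactions are assumed to have been removed beforehand.

```python
from collections import defaultdict
from datetime import datetime, timezone


def aggregate_daily(transactions):
    """Sum the transaction count and input volume per UTC day.

    `transactions` is an iterable of (unix_timestamp, input_volume_btc)
    pairs. This layout is an assumption made for the sketch; coinbase
    transactions (no inputs) are assumed to be filtered out already.
    """
    daily = defaultdict(lambda: [0, 0.0])
    for timestamp, input_volume in transactions:
        # Convert the block timestamp to a UTC calendar day.
        day = datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()
        daily[day][0] += 1             # transaction count
        daily[day][1] += input_volume  # summed input volume in BTC
    return {day: (count, volume) for day, (count, volume) in sorted(daily.items())}


# Hypothetical sample records (not taken from the real database).
sample = [
    (1453334400, 1.5),   # 2016-01-21 00:00:00 UTC
    (1453338000, 0.25),  # 2016-01-21 01:00:00 UTC
    (1453420800, 3.0),   # 2016-01-22 00:00:00 UTC
]
daily_totals = aggregate_daily(sample)
```

Summing inputs rather than outputs counts the transaction fee as part of the transferred volume, in line with the relation input = output + transaction fees.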

Samples of the final data used in our analysis are shown in Tables 1 and 2.

Table 1 Sample of the final data for the daily transaction count and bitcoin volume of regular transactions, compiled from Jan 2009 to Feb 2018
Table 2 Sample of the final data for the daily transaction count and bitcoin volume of mining (coinbase) transactions, compiled from Jan 2009 to Feb 2018

Table 3 shows a sample of the price data, which cover the 9 years since the bitcoin genesis block was published at the beginning of January 2009. We downloaded the market price data from the blockchaininfo website (Market-price 2018) and used them in our analysis.

Table 3 Sample of the market price in USD per BTC, compiled from Jan 2009 to Feb 2018

3 Result and analysis

3.1 Descriptive statistics

We computed summary statistics from our reconstructed data and extracted some important information. Table 1 shows sample data comprising the daily number of transactions and the total BTC volume transacted daily by regular users. Table 2 contains sample data for the daily number of mining transactions and the total volume mined daily by the miners. We went through these columns of data to extract some univariate quantitative information (Table 4).

Table 4 Summary statistics of the complete dataset containing daily BTC volume, daily transaction count, total BTC volume daily mined and the daily count of mining transactions

The sample size of the data is quite satisfactory: the number of daily observations in both tables is adequate to characterize the statistical distributions. For the miners’ transaction count per day, the mean and median are almost the same, which suggests that the mining transaction count is distributed roughly symmetrically about its center. On average, there are 153 mining transactions per day, close to the theoretical value of 144 (6 blocks per hour × 24 h = 144 coinbase transactions per day, one per block). The regular transaction data show quite the opposite behavior. Looking at the central tendency, the median of both the bitcoin volume and the transaction count is smaller than the corresponding mean, so the data appear to be right-skewed. The histograms show that the majority of the data lie on the low side of the graph. Skewness is often easiest to detect with a histogram or boxplot (Fig. 2).
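The mean-versus-median comparison used above can be illustrated with a short Python sketch. The toy data and the helper `moment_skewness` are hypothetical and serve only to show how a mean above the median goes together with positive skewness; they are not the statistics of Table 4.

```python
from statistics import mean, median, pstdev


def moment_skewness(values):
    """Moment coefficient of skewness: E[(x - mean)^3] / std^3."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


# Theoretical number of blocks (and hence coinbase transactions) per day,
# assuming one block every 10 minutes.
EXPECTED_BLOCKS_PER_DAY = 6 * 24

# Hypothetical right-skewed sample: one large value on the high side.
toy_counts = [1, 2, 2, 3, 3, 3, 4, 50]
toy_mean = mean(toy_counts)      # pulled upwards by the outlier
toy_median = median(toy_counts)  # robust to the outlier
toy_skew = moment_skewness(toy_counts)
```

A positive value of `toy_skew` together with `toy_mean > toy_median` is the signature of a right-skewed sample, while a symmetric sample gives a skewness of zero.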

Fig. 2 Histograms of the daily transaction count and the total BTC volume transacted daily by regular users

Clearly, there are some outliers on the high side that strongly affect the results of our analysis. The miners’ daily bitcoin volume and mining transaction count also show significant skewness and kurtosis, which led us to examine the data on a log scale. Moreover, the summary statistics indicate that this dataset is a typical example of a distribution with a sharper peak, more rapid decay, and heavier tails (Fig. 3).

Fig. 3 Histograms of the daily mining transaction count and the daily total volume of mined bitcoin

3.2 Auto-correlation function of BTC volume, price and number of transactions

In the previous section, we used descriptive statistics to view the distributions of the daily BTC volume and the daily number of transactions. Here, we use the auto-correlation function to see whether the direction of daily log returns can be predicted. The log return is defined as:

$$ x(t) = \log \frac{z(t)}{z(t - 1)}, $$
(1)

where x(t) denotes the log return of a variable z(t) on day t. We use the log return in order to make the series stationary for the empirical analysis. In our case, we calculated the log returns of the BTC volume v(t), the number of transactions TX(t), and the daily price P(t), the latter downloaded from the blockchaininfo website. Figure 4a–c shows the stationary time-series plots of the log returns of volume, the number of daily transactions, and price. The time series were restricted to the period from 2013-01-01 to 2018-02-08 in order to maintain consistency.
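As a minimal illustration of Eq. (1), the log return of a daily series can be computed as follows. The function name `log_returns` and the sample values are hypothetical, and every value is assumed to be strictly positive.

```python
import math


def log_returns(series):
    """Return x(t) = log(z(t) / z(t - 1)) for t = 1, ..., len(series) - 1."""
    return [math.log(curr / prev) for prev, curr in zip(series, series[1:])]


# Hypothetical daily values (e.g. a price in USD); not real market data.
prices = [100.0, 110.0, 99.0, 99.0]
returns = log_returns(prices)
```

The returned list is one element shorter than the input, and the log returns add up to the log of the ratio between the last and the first value.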

Fig. 4 Log returns of (a) daily BTC volume, (b) daily number of transactions, (c) daily price

We plotted the auto-correlation function (ACF) of the three daily returns against the lag. The dotted lines mark the 95% confidence interval. For the BTC volume and the number of transactions, the relaxation time-scale was found to be approximately one week, as shown in Fig. 5a, b. For the price data in Fig. 5c, the ACF vanishes at a lag of 1 day. This is reasonable; otherwise, one could linearly predict whether tomorrow’s price will go up or down from today’s.
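A sample auto-correlation function of this kind can be sketched as below. The estimator and the white-noise band ±1.96/√N are the common textbook choices and are assumed here, since the text does not state which formulas were used; the function names and the toy series are hypothetical.

```python
import math


def autocorrelation(series, max_lag):
    """Sample ACF for lags 0..max_lag, normalised so that ACF(0) = 1."""
    n = len(series)
    m = sum(series) / n
    dev = [v - m for v in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]


def confidence_bound(n, z=1.96):
    """Approximate 95% band for the ACF of white noise (an assumption)."""
    return z / math.sqrt(n)


# Toy series that flips sign every day: strongly anti-correlated at lag 1.
toy = [1.0, -1.0] * 50
acf = autocorrelation(toy, max_lag=3)
bound = confidence_bound(len(toy))
```

Lags at which the absolute value of the ACF stays below `bound` are statistically indistinguishable from zero at roughly the 95% level, which is the sense in which the price ACF vanishes after one day.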

Fig. 5 Auto-correlation function of (a) BTC volume return, (b) number-of-transactions return, (c) price return

3.3 The evolution of bitcoin transactions (a bird’s-eye view)

We plotted the transaction count and volume as they evolve over time. This provides an interesting set of observables for better understanding the underlying evolution of a unique financial system. We found impulses in the volume of bitcoin transactions in different time slots. Our study of this evolution focuses on the period after 2013, when bitcoin had become a full-fledged, mature currency used by people for trading goods and services.

Figure 6 shows that the daily volume of bitcoin exchanged increased substantially from 2013 onwards. In Fig. 7, we plot the same data on a log scale and show the average volume flowing through the daily transactions. The weekly pattern of volume flow observed in the graph suggests that bitcoin supports a real economic and financial system, a pattern that we derive statistically in the next section. In Fig. 8, we compare the BTC volume with the price and observe high price volatility.

Fig. 6 Time series of the BTC volume and the number of transactions per day

Fig. 7 Log-scale plot showing the weekly pattern in the exchange of bitcoin

Fig. 8 Evolution of the bitcoin price and of the bitcoin volume transacted per day

Figure 9 shows the time evolution of the price, the number of mining transactions, and the BTC volume. We observed that the number of mining (supply) transactions has remained quite stable throughout the time series.

Fig. 9 Evolution of the bitcoin price and of the bitcoin volume transacted and mined per day, in log scale

3.4 The weekly pattern of bitcoin volume and transactions

We can observe that the volume per transaction became relatively stable after 2013, whereas it was very volatile before that year (see Fig. 6, for example). It is also known that the bitcoin mining that generates blocks has been quite stable since 2013. We therefore use data from January 1, 2013 onwards in the following power-spectrum analysis.

Consider a time-series \( x_{n} \) with n = 0, 1, …, N − 1, where N is the length of the time-series. The discrete Fourier transform of \( x_{n} \) is given by

$$ X_{k} = \mathop \sum \limits_{n = 0}^{N - 1} x_{n} {\text{e}}^{ - 2\pi nik/N} , $$
(2)

where k = 0, 1, …, N − 1, and \( i = \sqrt{ - 1} \) is the imaginary unit. Obviously, (2) is periodic in k with period N, so one can adopt the convention that \( X_{ - k} = X_{N - k} \). Because \( x_{n} \) is real, it follows that \( X_{ - k} \, = \,X_{k}^{*} \), where * denotes the complex conjugate. The frequency \( f_{k} \) corresponding to k is defined by

$$ f_{k} = k/N. $$
(3)

The range of frequency can be regarded as \( - 0.5 \le f_{k} \le 0.5 \).

The power spectrum, or periodogram, is defined by

$$ P(f_{k} ) = \frac{ 1 }{N}\left| {X_{k} } \right|^{2} , $$
(4)

where \( |X_{k}| \) denotes the modulus (magnitude) of \( X_{k} \). Because \( X_{ - k} \, = \,X_{k}^{*} \), one can focus on the range \( 0 \le f_{k} \le 0.5 \). P(f) represents how much oscillatory or harmonic movement with frequency f or, equivalently, with period T = 1/f is contained in the original time-series \( x_{n} \). Therefore, f = 0.5 corresponds to T = 2, the most rapidly oscillating movement, while f → 0 corresponds to T → ∞. One often uses a smoothed periodogram, obtained by applying a filter to the raw periodogram (Bloomfield 2000).
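Equations (2)–(4) can be illustrated with a direct, unoptimised Python sketch of the raw periodogram. The function names and the toy series are hypothetical; a direct sum is used instead of a fast Fourier transform purely for readability.

```python
import cmath
import math


def dft(x):
    """Discrete Fourier transform X_k = sum_n x_n exp(-2*pi*i*n*k/N), Eq. (2)."""
    n_total = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_total) for n in range(n_total))
        for k in range(n_total)
    ]


def periodogram(x):
    """Raw periodogram P(f_k) = |X_k|^2 / N for 0 <= f_k <= 0.5, Eqs. (3)-(4)."""
    n_total = len(x)
    spectrum = dft(x)
    return [
        (k / n_total, abs(spectrum[k]) ** 2 / n_total)
        for k in range(n_total // 2 + 1)
    ]


# Toy series with a period of exactly 7 samples (10 full weeks).
toy = [math.cos(2 * math.pi * n / 7) for n in range(70)]
freq_power = periodogram(toy)
peak_frequency, peak_power = max(freq_power, key=lambda fp: fp[1])
```

For this toy series the periodogram peaks at f = 1/7, i.e. T = 7, which is the kind of weekly signature sought in the volume and transaction series.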

We apply the smoothed-periodogram method to the time-series of daily volume \( V_{n} \) and daily number of transactions \( T_{n} \) (where n denotes time in days) in order to find periodicities in them. Since Fig. 6 makes it obvious that both series follow a trend of exponential growth, it is natural to take their logarithms and consider the time-series \( x_{n} = \log V_{n} \) and \( x_{n} = \log T_{n} \).

First, we segmented the data by day of the week, namely n = Sun, Mon, …, Sat, and calculated the average and standard error (defined as the standard deviation divided by the square root of the number of data points in each group). The result is given in Fig. 10. One can see that the level of volume and transactions is higher on weekdays than on weekends; in other words, there exists a weekly pattern that is not obvious in Fig. 6.
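The day-of-week averaging can be sketched as follows. The function name `weekday_profile`, the dictionary layout and the toy values are hypothetical; the sketch assumes at least two observations per weekday, which the sample standard deviation requires.

```python
import math
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean, stdev

WEEKDAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_profile(daily_values):
    """Mean and standard error of log(value), grouped by day of the week.

    `daily_values` maps datetime.date -> positive daily value
    (an assumed layout for this sketch).
    """
    groups = defaultdict(list)
    for day, value in daily_values.items():
        groups[WEEKDAY_NAMES[day.weekday()]].append(math.log(value))
    profile = {}
    for name, logs in groups.items():
        # Standard error = standard deviation / sqrt(number of observations).
        standard_error = stdev(logs) / math.sqrt(len(logs))
        profile[name] = (mean(logs), standard_error)
    return profile


# Two hypothetical weeks: lower activity on Saturday and Sunday.
start = date(2016, 1, 4)  # a Monday
toy = {}
for offset in range(14):
    day = start + timedelta(days=offset)
    toy[day] = 50.0 if day.weekday() >= 5 else 100.0
profile = weekday_profile(toy)
```

Plotting the first element of each pair with the second as an error bar gives a chart of the same form as Fig. 10.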

Fig. 10 Average by day of the week of (a) the logarithm of the daily number of transactions, (b) the logarithm of the daily volume. Error bars show the standard error

Additionally, we applied the periodogram method described above to each of the time-series. We detrended each series by removing its mean and subtracting a linear trend, applied a 10% taper at the beginning and end of the series, and used a modified Daniell smoothing with successive simple moving averages of lengths 6 and 12 (Bloomfield 2000).
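The three preprocessing steps named above can be sketched in Python as follows. This is an illustration under assumptions rather than the exact procedure used: the split cosine bell taper, the modified Daniell weights with half-width m, and the truncation of the kernel at the edges are common textbook choices, and all function names are hypothetical.

```python
import math


def detrend_linear(x):
    """Subtract the least-squares straight line (this also removes the mean)."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    slope = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    return [v - (x_mean + slope * (t - t_mean)) for t, v in enumerate(x)]


def cosine_taper(x, proportion=0.1):
    """Apply a split cosine bell to `proportion` of the data at each end."""
    n = len(x)
    m = int(n * proportion)
    out = list(x)
    for i in range(m):
        w = 0.5 * (1 - math.cos(math.pi * (2 * i + 1) / (2 * m)))
        out[i] *= w
        out[n - 1 - i] *= w
    return out


def modified_daniell_weights(m):
    """Modified Daniell kernel of half-width m: end weights are halved."""
    return [1 / (4 * m)] + [1 / (2 * m)] * (2 * m - 1) + [1 / (4 * m)]


def smooth(values, weights):
    """Weighted moving average; the kernel is truncated and renormalised at the edges."""
    m = len(weights) // 2
    out = []
    for i in range(len(values)):
        acc = 0.0
        norm = 0.0
        for j, w in enumerate(weights):
            idx = i + j - m
            if 0 <= idx < len(values):
                acc += w * values[idx]
                norm += w
        out.append(acc / norm)
    return out


# Hypothetical checks on simple inputs.
residual = detrend_linear([2.0 * t + 3.0 for t in range(20)])
tapered = cosine_taper([1.0] * 20)
smoothed = smooth([5.0] * 10, modified_daniell_weights(2))
```

In use, the detrended and tapered series would be passed to a periodogram routine, and `smooth` would then be applied to the resulting power values, possibly twice with different half-widths to mimic successive moving averages.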

The result is given in Fig. 11. For both volume and number of transactions, one can observe a clear periodicity at f = 1/7 or, equivalently, T = 7 days, as marked by the dotted vertical line. Higher-order harmonics at f = 2/7 and so on are also present. On the other hand, there is an overall increase of the power spectrum towards f → 0, corresponding to the trend of exponential growth already seen in Figs. 6 and 7.

Fig. 11 Power spectrum as smoothed periodogram for the time-series of (a) the logarithm of daily volume, (b) the logarithm of the daily number of transactions

3.5 The outliers’ transaction patterns

In our analysis, we concentrated on two timeslots to find the outliers’ transaction pattern. Both show an unusually high spike of bitcoin volume within the 1-month period from January 2016 to February 2016 (Fig. 12); even though the price did not vary much, a very large volume of bitcoin circulated during the last week of January 2016.

Fig. 12 A closer look at the unusually high transaction volume in the last week of January 2016

Figure 13 reveals an interesting finding. The number of transactions per day in January stayed in the range of 0.18–0.24 million, while the spike in BTC volume appeared in the last week of January. Since the total number of daily transactions remained within the regular range, a large volume of BTC must have flowed through a small number of specific transactions. In other words, there exist outlier transactions that are responsible for the large BTC volume transacted during that last week. Selecting this 1-week timeslot (21–28 January 2016), we went back to our main reconstructed database to list the individual transactions in that timeframe. We then created a log–log plot of the volume rank distribution (Fig. 14) to understand more about the outliers’ activities. The plot clearly shows that a considerable number of transactions lie in the tail of the distribution. Moreover, the steep shape of the tail suggests that a number of transactions carried a large volume of bitcoin flow.
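A volume rank distribution of this kind can be built by sorting the per-transaction volumes in descending order. The sketch below is illustrative only; the function name `rank_distribution` and the sample volumes are hypothetical.

```python
import math


def rank_distribution(volumes):
    """Return (rank, volume) pairs, with rank 1 for the largest volume."""
    ordered = sorted(volumes, reverse=True)
    return list(enumerate(ordered, start=1))


# Hypothetical per-transaction volumes in BTC for one timeslot.
toy_volumes = [0.5, 40000.0, 2.0, 6000.0, 0.01]
ranked = rank_distribution(toy_volumes)

# Coordinates for a log-log plot; zero volumes are skipped because
# their logarithm is undefined.
log_points = [(math.log10(rank), math.log10(vol)) for rank, vol in ranked if vol > 0]
```

Outlier transactions appear at the smallest ranks, far above the bulk of the points on the log–log plot.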

Fig. 13 A closer look at the BTC volume and the number of transactions during January 2016

Fig. 14 Bitcoin volume rank distribution (log–log scale) from 21 Jan 2016 to 28 Jan 2016

But which outlier transactions were distinctly traceable among the rest? The most direct method is to use quantiles. Quantiles are values that divide the distribution such that a given proportion of observations lies below the quantile. We estimate the q quantile, the value such that a proportion q of the observations lies below it, as follows. We have n ordered observations, which divide the scale into n + 1 parts: below the lowest observation, above the highest, and between each adjacent pair. The proportion of the distribution that lies below the ith observation is estimated by i/(n + 1). We set this equal to q and get

$$ i = q (n + { 1}). $$
(5)

If i is an integer, the ith observation is the required quantile estimate. If not, let j be the integer part of i, the part before the decimal point. The quantile then lies between the jth and (j + 1)th observations, and we estimate it by \( x_{j} + (x_{j + 1} - x_{j})(i - j) \).
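The quantile estimate of Eq. (5) can be sketched in Python as follows. The function name and sample values are hypothetical, and the clamping to the smallest or largest observation when i falls outside 1..n is an assumption of this sketch, since the text does not specify that case.

```python
def quantile(values, q):
    """Estimate the q quantile using i = q * (n + 1) with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    i = q * (n + 1)
    j = int(i)  # integer part of i
    if j < 1:
        return ordered[0]    # assumption: clamp to the smallest observation
    if j >= n:
        return ordered[-1]   # assumption: clamp to the largest observation
    # Observations are 1-indexed in the text, hence the shift by one here.
    x_j, x_next = ordered[j - 1], ordered[j]
    return x_j + (x_next - x_j) * (i - j)


# Hypothetical sample of nine ordered observations.
toy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
median_estimate = quantile(toy, 0.5)    # i = 5, an integer
lower_quartile = quantile(toy, 0.25)    # i = 2.5, interpolated
top_estimate = quantile(toy, 0.99)      # i = 9.9, clamped to the maximum
```

With n = 9, the median falls exactly on the fifth observation, while the lower quartile is interpolated halfway between the second and third observations.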

Focusing now on the individual transactions of January–February 2016, we calculated the quantiles of BTC volume per transaction for each day, which tell us what percentage of daily transactions fall below a given BTC volume. As shown in Fig. 15, the 99% and 100% quantiles on 21–24 January are very interesting and show a statistical outlier pattern. For example, there is a transaction on 22 January that involves 40000 bitcoin. On 24 January, 1% of the total daily transactions involve 6000–10000 BTC each.

Fig. 15 Quantiles of BTC volume per transaction for January–February 2016

3.6 Directed transaction graph and degree correlation to visualize outliers’ activities

The directed transaction graph represents the flow of BTC between transactions (Fergal and Martin 2013). Each node represents a transaction, and each directed edge runs from a source, the previous transaction whose output is spent, to a target, the current transaction that takes it as an input, as shown in Fig. 16. Each directed edge also carries the value of the BTC flow (Michael et al. 2015). Thus, a transaction graph table can be constructed in which each transaction is a node with a number of incoming connections, called the in-degree, and a number of outgoing connections, called the out-degree.

Fig. 16 Transaction graph

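Mathematically, with the adjacency matrix element \( a_{ij} \) taken to be nonzero when there is an edge from node j to node i, the in-degree of node i is the total number of connections pointing to node i, i.e., the sum of the ith row of the adjacency matrix: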

$$ k_{i}^{\text{in}} = \mathop \sum \limits_{j} a_{ij} . $$
(6)

On the other hand, the out-degree of node i is the total number of connections leaving node i, i.e., the sum of the ith column of the adjacency matrix:

$$ k_{i}^{\text{out}} = \mathop \sum \limits_{j} a_{ji} . $$
(7)

The degree correlation is the relation between \( k_{i}^{\text{in}} \) and \( k_{i}^{\text{out}} \), and it can sometimes make a large difference to the effective properties of a complex network.
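Equations (6) and (7) can be evaluated directly from an edge list without building the full adjacency matrix. In the sketch below, the function name `degrees`, the node labels and the toy edges are hypothetical; each edge is a (source, target) pair pointing from a previous transaction to the transaction that spends its output.

```python
from collections import Counter


def degrees(edges):
    """Return {node: (in_degree, out_degree)} for a directed edge list."""
    in_deg = Counter()
    out_deg = Counter()
    for source, target in edges:
        out_deg[source] += 1  # edge leaves the source node
        in_deg[target] += 1   # edge arrives at the target node
    nodes = set(in_deg) | set(out_deg)
    return {node: (in_deg[node], out_deg[node]) for node in nodes}


# Hypothetical toy graph: tx3 merges two inputs and fans out to three outputs.
toy_edges = [
    ("tx1", "tx3"),
    ("tx2", "tx3"),
    ("tx3", "tx4"),
    ("tx3", "tx5"),
    ("tx3", "tx6"),
]
degree_table = degrees(toy_edges)

# Number of nodes for each (in-degree, out-degree) pair, the quantity
# a degree-correlation heat map would display.
pair_counts = Counter(degree_table.values())
```

Nodes with a very large out-degree and a small in-degree, or the reverse, stand out as isolated cells far from the origin of such a heat map.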

In our analysis, after considering the outcomes of the quantile plot, we constructed a transaction graph that includes all transactions that occurred between 21 and 24 January 2016. Transactions are represented as nodes, and links represent the input connections from previous transactions to the current one. We examined the degree correlation of the graph by plotting the in-degree against the out-degree of each transaction node, as shown in Fig. 17. The degree correlation plot visualizes important network properties, such as how many addresses are involved as in-degree from previous transactions and as out-degree to subsequent ones. An important finding is that one transaction had an out-degree of 3033 with an in-degree of only 1, and another had an in-degree of 633 with an out-degree of 1. A possible explanation lies in a historical event: the soft-fork upgrade of the bitcoin protocol under the Bitcoin Improvement Proposal BIP-144, which was under discussion during that period. These outlier transactions could be the result of experiments by the bitcoin developers.

Fig. 17 Degree correlation of in-degree and out-degree, shown as (a) a heat map and (b) a 3D plot, for the transaction graph constructed for 21–24 January 2016

4 Conclusion

We investigated unique patterns in transaction volume and, based on them, developed a methodology to extract interesting findings from a reconstructed database extracted from the blockchain. We found weekly patterns in the daily bitcoin volume and number of transactions, a clear sign of real economic and financial trading in the flow of bitcoin among transactions. The weekly trading pattern shown in our analysis helped us to investigate specific impulses of transactions in a more focused timeslot. We analyzed each transaction in that timeslot and the bitcoin volume involved. The volume rank distribution helped us to identify the outlier transactions with the largest bitcoin volume. The debate over SegWit (Segregated Witness) and over soft forks versus hard forks was heating up at the beginning of January 2016, which might be one of the causes of the large bitcoin flow in some outlier transactions.

The bitcoin system has historically gone through many upgrades, each specified as a BIP (Bitcoin Improvement Proposal). In the period we focused on, January 2016, there was a big buzz in the bitcoin community that the block size of the blockchain needed to be increased, which would raise the number of transactions processed per second. Accordingly, BIP-144, a proposal to increase the maximum possible block size starting at 8 MB, was put forward in January 2016. Increasing the block size would improve scalability, but it would also reduce transaction fees, which would be unprofitable for miners: a block could hold more data and transactions, and the additional space might result in cheaper fees. So, there was a big debate between the miners’ and developers’ communities. There was even a debate about whether to adopt a hard fork or a soft fork. Soft forks allow compatible changes: the old and new software can co-exist on the network. Hard forks break compatibility with all previous Bitcoin software and require every participant to upgrade to the same rules by a deadline or risk losing money. Such events can also harm network effects. After long debate and discussion, the proposal was never merged. One conjecture we can make from the outlier transaction patterns of January 2016 is therefore that the developer community may have been running many experiments to observe network performance, which might have caused this spike in the daily transacted bitcoin volume.

The goal of future research is to interconnect the flow of bitcoin with addresses and transactions, and to find further results that reveal new ways of understanding how the topological structure is linked to the evolutionary growth of cryptocurrency.