1 Introduction

The growth of network analysis in economics and finance has fostered rich theories and methodologies over time (Newman et al. 2006), which are pervasive and have a distinct effect on human decision-making. Thus, they are becoming the link between societies and economies. Methodologically, graph theory has matured into a powerful tool for tackling complex problems.

Although it was difficult to construct financial networks at the start of this decade, the rise of crypto-assets has created significant opportunities for network-related research and analysis. Earlier, information about transaction details was usually considered sensitive and was not available for research (Baumann et al. 2014). A crypto-asset system, which consists of a continually growing list of records stored in a chain, is publicly accessible and offers scope to analyze transaction networks in detail.

The global financial crisis of 2008 exposed financial inequalities throughout the world's economies. In January 2009, a mysterious figure named “Satoshi Nakamoto” introduced a virtual currency system called “Bitcoin”, which operates on a cryptographic framework called “Blockchain” with an incentive scheme called “Proof of work” (Nakamoto 2019). Bitcoin is a digital currency that records transactions and autonomously administers the generation of new units of currency within the blockchain framework. No centralized authority controls the operation and record-keeping of this distributed system, and users prove ownership of coins with private keys. A consensus algorithm and a public history of transactions provide security against duplication and double-spending (Narayanan et al. 2016).

Publicly available transaction data is the main motivation for analyzing financial networks. Several studies have examined the descriptive statistics, network expansion, network topology, and dynamics of the bitcoin blockchain network. Let us briefly review some of the research that is the key motivation for our own study. The creation and analysis of the “User Graph” were based on a well-known heuristic rule: all inputs of a multi-input transaction can be linked to a single user, because that user must hold the private keys of all the input public addresses. This was elaborated upon in Reid and Harrigan (2013), who also discussed unusually large flows and temporal analyses. Another group of researchers (Ron and Shamir 2013) performed an elaborate quantitative analysis of the large wallets of exchange markets, which have a large number of public addresses, by pinpointing chains whose incoming bitcoin value exceeds a high threshold. Finally, in a detailed investigation, a group of Hungarian researchers (Kondor et al. 2014, 2014) carried out an extensive analysis of transaction networks and examined linear preferential attachment. In their extended work, they proposed a model that shows how structural changes in the network accompany significant changes in the exchange price of bitcoins. This research group has uploaded blockchain data from 2009–2018 (Hungary research group 2020), which is the main data source of both our previous research and this study.

The above studies establish that the de-anonymization of the blockchain is very difficult. However, a possible breach of anonymity represents a significant perturbation in the bitcoin system. The purpose of our research is twofold. First, we propose a new approach to identifying specific important users inside the blockchain; our goal is to develop a methodology that can track those users and help us understand their activities inside the system. Second, we focus on the “weekly pattern” discussed in our previous research (Islam et al. 2019) and utilize this behavioral pattern of users in our de-anonymization methodology.

The structure of this paper is as follows. In Sect. 2, we focus on daily blockchain data between 2013 and 2018. We select this time period because network analyses have already been carried out for the first 4 years after bitcoin's launch (Lischke and Fabian 2016). In Sect. 3, we define the important terms and network graphs with mathematical notations and equations. We perform the topological analysis of the daily networks by dividing them into sub-periods to reduce computational difficulties. In Sect. 4, we explain the weekly pattern in terms of the change in topological properties. An edge flow threshold analysis allows us to create suitable sub-graphs that filter out small-volume flows. In Sect. 5, we define the “big players” and validate their existence in our data by matching some popular rich bitcoin exchanges, which were active around the world during our chosen sub-periods. Finally, we propose a methodology for identifying exchange markets or crypto-financial institutions in the network, based on the criteria of very high frequency, persistent daily trades, and a weekly pattern in their total daily flow.

2 Data

We collected the research data for this study from the web repository contributed by the Hungarian bitcoin research group (Hungary research group 2020). The research group created a reconstructed database after downloading the publicly available bitcoin blockchain, in which the transaction information (sending and receiving of bitcoins among traders) is stored in a secured and anonymous format, with timestamp information from January 2009 to February 2018. The research group used existing techniques (Reid and Harrigan 2013; Meiklejohn et al. 2013) to map addresses to users. They marked all these mapped users with randomly generated unique numbers for convenience of analysis. Other long string IDs in the blockchain were replaced with randomly generated numerical IDs for the same reason. The blockchain contained 501,418 blocks of transactions verified by the miners. Appendix A.1 contains some statistical findings on the Hungarian research group's existing database of January 2009 to February 2018.
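To make the address-to-user mapping concrete, the multi-input heuristic can be sketched with a disjoint-set (union-find) structure. This is only an illustrative sketch and not the Hungarian research group's actual code: the names `UnionFind` and `cluster_addresses`, the input format (one list of input addresses per transaction), and the integer labelling of users are all assumptions made for the example.

```python
from typing import Dict, Hashable, Iterable, List


class UnionFind:
    """Minimal disjoint-set structure with path compression."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}

    def find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_addresses(tx_inputs: Iterable[List[str]]) -> Dict[str, int]:
    """Map each input address to a user ID (hypothetical helper).

    All input addresses of one transaction are merged into one user,
    following the multi-input heuristic.
    """
    uf = UnionFind()
    for inputs in tx_inputs:
        if not inputs:
            continue
        first = inputs[0]
        uf.find(first)  # register addresses that only appear alone
        for addr in inputs[1:]:
            uf.union(first, addr)
    # Assign compact integer IDs to cluster roots (illustrative labelling).
    user_of_root: Dict[Hashable, int] = {}
    mapping: Dict[str, int] = {}
    for addr in list(uf.parent):
        root = uf.find(addr)
        mapping[addr] = user_of_root.setdefault(root, len(user_of_root))
    return mapping
```

For example, `cluster_addresses([["a", "b"], ["b", "c"], ["d"]])` places `a`, `b`, and `c` in one user and `d` in another, because `b` links the first two transactions.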

In our analysis, we created our own address-to-user hash dictionary, which includes the uncontracted addresses that were filtered out by the Hungarian research group. For these, we used each uncontracted public address itself as its user ID to distinguish between the two. The blockchain data contain some records of non-standard transactions (addresses that cannot be decoded by the system). We filtered out those transactions, as they might lead to ambiguous results. We then included the timestamp information to analyze temporal change. We used only the output satoshi values as edge attributes, which represent the flow of bitcoin from input nodes to output nodes.

Between 2011 and 2012, bitcoin gained commercial value and began to be used globally and acknowledged economically. Before this, it was a financial innovation that circulated only experimentally among its pioneer users. Furthermore, researchers have more frequently analyzed the data of the first 4–5 years after its launch. Considering all these points, we focused only on the data within the period from 1st January 2013 to 8th February 2018. We filtered out the nonstandard transactions that contained incomplete information to avoid ambiguous statistical results. After separating 113,492,656 self-loop records and summing up all the multi-edges, the total number of edges of our final graph for 2013–2018 was reduced to 432,853,828, and the number of nodes to 174,250,450.

The monthly node and edge counts for 2013–2018 are shown in Fig. 1.

Fig. 1 Monthly node–edge count from 2013 to 2018

As discussed in the previous sections, a primary focus of this research is to investigate the weekly pattern of network flow. Daily time-stamped data for a specific duration provide sufficient exploratory results and are computationally manageable. Based on Fig. 1, we divided our analysis into two periods. The first is the “Active period”, comprising the days from 1st July 2017 to 31st December 2017; bitcoin saw its largest price hikes during this period. The second is the “Quiet period”, comprising the days from 1st January 2015 to 30th June 2015.

3 Mathematical notations and methods

3.1 Definitions and notations

The set of all the transactions recorded in the blockchain data, as explained above, can be regarded as a giant graph or network, in which vertices or nodes are “users” mapped from addresses (see the preceding section), and links or edges are transactions among users. Let us denote by \(\text {Tx}: i\rightarrow j\) a transaction from user i to user j. Note that there can be multiple transactions for the same pair \(i\rightarrow j\). In addition, self-loops \(i\rightarrow i\) can be present, corresponding to various cases, including the change returned in a transaction. We call this giant graph a transaction graph, and construct it for the period 2013–2018 after deleting all self-loops. For the transaction graph, we define the frequency of individual users as follows:

$$\begin{aligned} f(i):=\text {number of Tx's such that }\text {Tx}: i\rightarrow j\text { or }\text {Tx}: j\rightarrow i . \end{aligned}$$
(1)

The frequency f(i) measures how frequently user i appeared in the transactions that took place during the whole period.
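As an illustration of Eq. (1), the frequency can be counted in a single pass over the transaction list. The sketch below assumes that transactions are available as (sender, receiver) pairs of user IDs; the function name `user_frequency` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def user_frequency(transactions: Iterable[Tuple[int, int]]) -> Counter:
    """Count f(i): the number of transactions with user i as sender or receiver."""
    freq: Counter = Counter()
    for sender, receiver in transactions:
        if sender == receiver:
            continue  # self-loops are deleted from the transaction graph
        freq[sender] += 1
        freq[receiver] += 1
    return freq
```

Repeated transactions between the same pair are counted each time, since f(i) is defined on transactions rather than on aggregated edges.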

Fig. 2 Complementary CDF for the frequency of users (acting as input or output) from 2013 to 2018

Figure 2 shows the complementary cumulative distribution function (complementary CDF) of the frequency. One can see that the distribution has a heavy tail that approximately follows a power law. We list the top 20 users in Table 1. We shall use the frequency to define big players.
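For readers who wish to reproduce a plot of this kind, an empirical complementary CDF can be computed as sketched below. The convention \(P(X\ge x)\) and the function name `complementary_cdf` are assumptions of this example; the text does not state which convention was used for Fig. 2.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def complementary_cdf(values: Iterable[int]) -> List[Tuple[int, float]]:
    """Return (x, P(X >= x)) for each distinct x, in increasing order of x."""
    counts = Counter(values)
    total = sum(counts.values())
    ccdf: List[Tuple[int, float]] = []
    remaining = total  # number of observations that are >= the current x
    for x in sorted(counts):
        ccdf.append((x, remaining / total))
        remaining -= counts[x]
    return ccdf
```

Plotting the returned pairs on doubly logarithmic axes makes an approximate power-law tail appear as a roughly straight line.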

Table 1 Top 20 users’ total frequency count as input/output during 2013–2018

To investigate a much shorter time-scale than years, let us construct what we call daily graphs from the transaction graph. A daily graph is an aggregation of all the transactions that took place in one day. One could define weekly or monthly graphs; however, we use daily graphs for our study, because we focus on the weekly pattern of the daily activities of users. Let us denote by \(G_t=(V_t, E_t)\) the daily graph at time t, where t is assumed to be a day (unless otherwise stated), \(V_t\) is the set of vertices or nodes, and \(E_t\) is the set of links or edges. An edge \(e_{ij}\) is an ordered pair (i, j), which represents all the transactions \(\text {Tx}: i\rightarrow j\) during the day t. The set of all the users appearing at either end of an edge \(e_{ij}\) is \(V_t\), so that \(i,j\in V_t\). Each edge \(e_{ij}\) carries the amount of money transferred from i to j in units of satoshi (= 1/100,000,000 BTC = \(10^{-8}\) BTC). Note that on day t, there can be more than one transaction \(\text {Tx}: i\rightarrow j\). We aggregated such multiple transactions, if present, into a single edge, and associated the sum of the money flow with that edge. Let us denote the amount of money flow for the edge \(e_{ij}\) by \(g_{ij}\). This completes the construction of daily graphs from the transaction graph. We remark that \(G_t\) includes neither multiple edges nor self-loops.
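The construction of daily graphs described above can be sketched as follows. The record layout (Unix timestamp, sender, receiver, amount in satoshi), the use of UTC to define day boundaries, and the function name `build_daily_graphs` are assumptions made for this illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple

Edge = Tuple[int, int]


def build_daily_graphs(
    records: Iterable[Tuple[int, int, int, int]]
) -> Dict[str, Dict[Edge, int]]:
    """Aggregate (timestamp, i, j, satoshi) records into daily edge flows g_ij."""
    daily: Dict[str, Dict[Edge, int]] = defaultdict(lambda: defaultdict(int))
    for ts, i, j, satoshi in records:
        if i == j:
            continue  # daily graphs contain no self-loops
        # Assumption: a "day" is a calendar day in UTC.
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        daily[day][(i, j)] += satoshi  # multi-edges collapse into one summed edge
    return {day: dict(edges) for day, edges in daily.items()}
```

Each value of the returned dictionary is the edge set \(E_t\) of one day, keyed by the ordered pair (i, j) and carrying the summed flow \(g_{ij}\).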

Denoting the number of elements of a set A by |A|, in general, we can define \(|V_t|\) for the number of nodes, and \(|E_t|\) for the number of edges. Regarding time t, we consider two periods, as explained in the preceding section:

$$\begin{aligned} t&\in T_\text {quiet}:=[\text {1 January 2015, 30 June 2015}]\ , \end{aligned}$$
(2)
$$\begin{aligned} t&\in T_\text {active}:=[\text {1 July 2017, 31 December 2017}]\ . \end{aligned}$$
(3)

Subscript t for variables to be defined in what follows may be omitted when the dependence on t is obvious. \(G_t\) is a directed network in the sense that each edge has a specific direction. It is sometimes useful to ignore the direction; in such a case, we shall use the same notation \(G_t=(V_t,E_t)\) for the undirected version of \(G_t\). In-degree \(d^{\text {in}}(i)\) and out-degree \(d^{\text {out}}(i)\) for a node \(i\in V_t\) are defined by

$$\begin{aligned} d_t^{\text {in}}(i)&:=\#\text {nodes }j'\text {s such that }e_{ji}\in E_t\ , \end{aligned}$$
(4)
$$\begin{aligned} d_t^{\text {out}}(i)&:=\#\text {nodes }j'\text {s such that }e_{ij}\in E_t\ , \end{aligned}$$
(5)

respectively. For the undirected version, one can define degree \(d_t(i)\) by

$$\begin{aligned} d_t(i):=d_t^{\text {in}}(i)+d_t^{\text {out}}(i)\ . \end{aligned}$$
(6)

Average degree is then defined by

$$\begin{aligned} \bar{d}_t:=\frac{1}{|V_t|}\sum _{i\in V_t} d_t(i)=\frac{2|E_t|}{|V_t|}\ , \end{aligned}$$
(7)

where the last equality follows from the fact that each undirected edge appears twice for the two nodes at the ends of the edge. The numbers of nodes and edges, and the average degree in the two periods are shown in Fig. 3 (active period) and Fig. 4 (quiet period). One can observe that the average degree is relatively stable, around 3.0, much smaller than the number of nodes, which means that the network is sparse and has a small number of nodes with large degrees, namely hubs.
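The degree definitions in Eqs. (4)–(7) can be checked on a small example. The sketch below assumes that a daily graph is given as a collection of directed (i, j) pairs without self-loops; the function name `degree_stats` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def degree_stats(edges: Iterable[Tuple[int, int]]) -> Tuple[Counter, Counter, float]:
    """Return in-degrees, out-degrees, and the average degree 2|E_t| / |V_t|."""
    edge_set = set(edges)  # one edge per ordered pair, as in a daily graph
    d_in: Counter = Counter()
    d_out: Counter = Counter()
    for i, j in edge_set:
        d_out[i] += 1
        d_in[j] += 1
    nodes = set(d_in) | set(d_out)
    avg = 2 * len(edge_set) / len(nodes) if nodes else 0.0
    return d_in, d_out, avg
```

Summing \(d^{\text{in}}(i)+d^{\text{out}}(i)\) over all nodes and dividing by the number of nodes gives the same value as the returned average, which is the equality stated in Eq. (7).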

Fig. 3 Node–edge statistics in active period; a daily node–edge count b average degree

Fig. 4 Node–edge statistics in quiet period

The network \(G_t\) is changing in time. For different times \(t_1\) and \(t_2\), even if they are successive in time, \(V_{t_1}\) is different from \(V_{t_2}\) as a set. However, examining the data, we found that there exist users i such that \(i\in V_t\) frequently at many temporal points \(t\in T\) for a given period of time T. In other words, there are persistent users.

3.2 Connected components

A daily graph \(G_t\) is not necessarily connected as an undirected graph. In general, a connected component \(C_a\) of an undirected graph \(G_t\) is defined by

$$\begin{aligned} \begin{aligned} C_a:={}&\{i\in V_t\text { such that for any }i,j\in C_a \\&\quad \text {there exists at least one path from }i\text { to }j\}\ , \end{aligned} \end{aligned}$$
(8)

where a path is a sequence of edges, \(e_{ik_1}, e_{k_1 k_2}, \ldots , e_{k_n j}\), connecting i and j. One can introduce an equivalence relation between any pair of nodes, namely, i is defined to be equivalent to j if and only if there exists a path between i and j. It is a mathematical consequence that the set of nodes \(V_t\) can be decomposed into mutually disjoint equivalence classes as follows:

$$\begin{aligned} V_t=C_1\sqcup C_2\sqcup \cdots \sqcup C_p\ , \end{aligned}$$
(9)

such that \(C_a\cap C_b=\emptyset \) for any \(a\ne b\). \(C_a\) is called a connected component, and is denoted by \(C_a(G_t)\) when we express the dependence on \(G_t\) explicitly. p is the number of connected components. It follows from the decomposition that

$$\begin{aligned} |V_t|=\sum _{a=1}^p |C_a|\ . \end{aligned}$$
(10)

Suppose that \(C_a\)s are ordered according to size, that is,

$$\begin{aligned} |C_1|\ge |C_2|\ge \cdots \ge |C_p|\ . \end{aligned}$$
(11)

\(C_1\) is called the largest (max) connected component. We denote \(|C_1|/|V_t|\) as the relative size of the largest connected component. We find that often \(|C_1|/|V_t|\) is relatively large, typically 0.5, or even larger.
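The decomposition into connected components and the relative size \(|C_1|/|V_t|\) can be illustrated with a breadth-first search over the undirected version of the graph. This is a generic sketch rather than the implementation used in this study; the function names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def connected_components(edges: Iterable[Tuple[Hashable, Hashable]]) -> List[Set]:
    """Return the connected components, ordered so that |C_1| >= |C_2| >= ..."""
    adj: Dict[Hashable, Set] = defaultdict(set)
    for i, j in edges:
        adj[i].add(j)  # edge directions are ignored
        adj[j].add(i)
    seen: Set = set()
    components: List[Set] = []
    for start in list(adj):
        if start in seen:
            continue
        comp = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        components.append(comp)
    components.sort(key=len, reverse=True)
    return components


def relative_lcc_size(edges: Iterable[Tuple[Hashable, Hashable]]) -> float:
    """Relative size |C_1| / |V_t| of the largest connected component."""
    comps = connected_components(edges)
    total = sum(len(c) for c in comps)
    return len(comps[0]) / total if comps else 0.0
```

Because the components are disjoint, their sizes sum to the number of nodes, in agreement with Eq. (10).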

3.3 Filtered daily graphs

To focus on large flows in the daily graphs, we filter \(G_t\) to obtain a subgraph \(H_t\subset G_t=(V_t,E_t)\) as follows. Each edge has a certain amount of flow \(g_{ij}\), as stated above. We define a threshold \(g_*\), which will be determined in the next section, and filter the edges by the condition \(g_{ij}\ge g_*\), that is, we delete all the edges that do not satisfy the condition. Let the set of remaining edges be \(F_t\subset E_t\). Collecting all the nodes that appear at either end of each remaining edge, one has the set of remaining nodes \(U_t\subset V_t\). This completes the construction of the filtered daily graph \(H_t=(U_t,F_t)\).

A filtered graph \(H_t\) can be decomposed into connected components, as described above. Let the largest (max) connected component be \(C_1(H_t)\). Let the set of nodes in \(C_1(H_t)\) be \(U'_t\), and the set of edges be \(F'_t\), that is,

$$\begin{aligned} C_1(H_t)=(U'_t,F'_t) \end{aligned}$$
(12)

In the next section, we will compare the total amount of flows on \(H_t\) with that on \(C_1(H_t)\). The former is denoted by

$$\begin{aligned} \phi _0(t)=\sum _{e_{ij}\in F_t} g_{ij}\ , \end{aligned}$$
(13)

while the latter is

$$\begin{aligned} \phi _1(t)=\sum _{e_{ij}\in F'_t} g_{ij}\ . \end{aligned}$$
(14)

We will also compare the size \(|H_t|\) with \(|C_1(H_t)|\).
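The filtering step and the flows of Eqs. (13) and (14) can be sketched as follows. The sketch assumes that a daily graph is stored as a dictionary from (i, j) to the flow \(g_{ij}\) in satoshi; the function names and the returned dictionary keys are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]
SATOSHI_PER_BTC = 100_000_000


def filter_edges(flows: Dict[Edge, int], g_star: int) -> Dict[Edge, int]:
    """Keep only the edges with g_ij >= g_star (the edge set F_t)."""
    return {e: g for e, g in flows.items() if g >= g_star}


def largest_component_nodes(flows: Dict[Edge, int]) -> Set[int]:
    """Node set U'_t of the largest connected component, ignoring direction."""
    adj = defaultdict(set)
    for i, j in flows:
        adj[i].add(j)
        adj[j].add(i)
    best: Set[int] = set()
    seen: Set[int] = set()
    for start in list(adj):
        if start in seen:
            continue
        comp: Set[int] = set()
        stack = [start]
        while stack:  # iterative depth-first search
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


def flow_summary(flows: Dict[Edge, int], g_star: int) -> Dict[str, int]:
    """Compute phi_0, phi_1 and the node counts of H_t and C_1(H_t)."""
    h = filter_edges(flows, g_star)
    u_prime = largest_component_nodes(h)
    phi0 = sum(h.values())
    # An edge lies in C_1(H_t) exactly when its source node does.
    phi1 = sum(g for (i, _j), g in h.items() if i in u_prime)
    u = {n for e in h for n in e}
    return {"phi0": phi0, "phi1": phi1, "size_H": len(u), "size_C1": len(u_prime)}
```

A threshold expressed in BTC is converted to satoshi before filtering, for example `flow_summary(flows, 20 * SATOSHI_PER_BTC)`.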

4 Significant difference between weekdays and weekend

We sub-divided the data into active and quiet periods. We split the networks of these two periods into daily snapshots and discussed the basic statistical properties of their nodes and edges. In this section, for both periods, we discuss how the network properties change with the transition from weekdays to weekends, specifically for large edge flows. This leads our analysis toward the definition of the “big players”. To examine the activities of the big players and their network, we determine a threshold that identifies the comparatively large edge flows.

4.1 Importance of weekly pattern

One of the main findings of our previous research (Islam et al. 2019) was a weekly pattern. In that study, we showed that the daily total BTC volume and the daily total number of transactions are correlated with the business operating days of the week: both quantities decrease significantly from weekdays to weekends. Although crypto-assets can be traded 24/7, blockchain transaction activity on Saturdays and Sundays is relatively low. Therefore, it follows the business operating hours of banks, firms, and so on.

In this study, our spotlight once again focuses on the utilization of this “weekly pattern”. However, this time, the motivation is different. It focuses on the daily big flow of BTC and on the users who are involved in a large amount of flow and presumably well connected with each other. In short, we want to find important users transacting big flows of money. Therefore, the weekly pattern has a close relationship to our research purpose.

To achieve our goal, we must first examine the big flows. We do that by setting a threshold parameter for the flow on each edge. Second, we need to identify the important users, whose importance should be defined so as to distinguish them from other users. We expect that the network is not completely disconnected, but has a vital core that captures the users whose transactions are linked with each other. This core is the largest connected component. In the next subsection, we discuss the selection of the threshold parameter and the largest connected component in more detail.

4.2 Connected components of sub-graph and the BTC flow inside

First, we construct filtered daily graphs by fixing the threshold \(g_*\) mentioned in Sect. 3.3. We fixed \(g_*\) at 20 BTC, as this eliminates all the small flows. Selecting a much higher threshold than 20 BTC would reduce the size of the graph so much that it would be difficult to perform the statistical analysis. Therefore, choosing 20 BTC is a trade-off between network size and threshold. For details, see Appendix A.2. We then calculated the connected components of the filtered daily graph using the method explained in Sect. 3.2 to calculate the weekly pattern.

We can describe the above process using mathematical expressions and equations. First, we took a threshold 20 BTC for the parameter value of \(g_*\). We created the daily filtered graph \(H_t=(U_t,F_t)\). Then, we decomposed those to connected components and listed the largest connected component nodes \(U'_t\) and edges \(F'_t\), as mentioned in Eq. (12). We calculate the relative size of the largest connected component \(|U'_t|/|U_t|\) for the daily graph as

$$\begin{aligned} S_r(t)=\frac{|U'_t|}{|U_t|} . \end{aligned}$$
(15)

In addition, we calculate the relative flow inside the largest connected component using Eqs.(13) and (14) as

$$\begin{aligned} \phi _r(t)=\frac{\phi _1(t)}{\phi _0(t)} . \end{aligned}$$
(16)

If we denote the days of week by dow = Sundays, Mondays, \(\ldots \), Saturdays and the numbers of days of week in the time period by \(n(\text {dow})\), then we can define the averages as follows:

$$\begin{aligned} \bar{S_r}(\text {dow})=\frac{1}{n(\text {dow})}\sum _{t\in \text {dow}}S_r(t) , \end{aligned}$$
(17)

and

$$\begin{aligned} \bar{\phi _r}(\text {dow})=\frac{1}{n(\text {dow})}\sum _{t\in \text {dow}}\phi _r(t) . \end{aligned}$$
(18)
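The day-of-week averages in Eqs. (17) and (18) amount to grouping a daily series by weekday and taking the mean of each group. A minimal sketch is given below, assuming that the daily values of \(S_r(t)\) or \(\phi_r(t)\) are stored in a dictionary keyed by calendar date; the function name `dow_average` is hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict

DOW_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def dow_average(series: Dict[date, float]) -> Dict[str, float]:
    """Average a daily series over each day of the week."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)  # n(dow)
    for day, value in series.items():
        name = DOW_NAMES[day.weekday()]  # date.weekday(): Monday is 0
        sums[name] += value
        counts[name] += 1
    return {name: sums[name] / counts[name] for name in DOW_NAMES if counts[name]}
```

Days of the week that do not occur in the series are omitted from the result rather than reported as zero.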

Fig. 5 Normalized max connected components size with threshold edge flow greater than 20 BTC

Fig. 5 shows that the day-of-week average of the normalized size of the maximum connected component exhibits a weekly pattern in both the active and quiet periods. We can explain this in terms of big edge flows and the connectivity of the largest component. On weekends, fewer nodes remain active in the network. In the real world, businesses take time off, particularly on weekends. The largest stock exchanges in the world maintain trading hours that follow banks' operating hours, which is why stock markets are closed on weekends. Technically, crypto-asset exchanges have the upper hand, as investors can trade on Saturdays and Sundays. However, doing so brings additional challenges and risks. Because fewer users trade relatively small volumes of BTC outside operational hours, prices are more strongly influenced by single trades, and volatility is always present. The mismatch between sellers' asking prices and buyers' bids leads to uncertainty in completing negotiations. By contrast, on weekdays the situation improves significantly: the maximum connected component grows because more nodes participate, and buyers and sellers have much more information and more options.

Fig. 6 Normalized max connected components flow with threshold edge flow greater than 20 BTC

In terms of flow inside the maximum connected component, we quantified the circulation by taking the ratio of the flow inside the maximum connected component of the daily 20 BTC sub-graph to the total flow over all connected components. We also calculated the daily total flow of the sub-graph to put the flow inside the maximum connected component in context; this is shown in Appendix A.4. The average daily total flow can be approximated despite larger fluctuations in the quiet period than in the active period. The active period has a daily total BTC flow that is twice as large as that of the quiet period.

Prices quoted during after-hours sessions are not “official” and carry less credibility in the minds of traders. As Fig. 6 shows, on every weekday, approximately 65% or more of the total flow of all connected components circulates inside the maximum connected component, whereas in the active period the share drops below approximately 55% on weekends. In the quiet period, some spikes in the daily weekend flow push the average flow on Saturdays a bit higher; however, the weekly pattern still holds. Therefore, both the size of and the flow inside the maximum connected component reveal the difference between activities on weekdays and weekends.

The sub-graph with a threshold of 20 BTC filters out all the small flows. The maximum connected component of this sub-graph includes all users who are persistently involved in the exchange market. Although other, less persistent users might also be involved in this sub-network, quantifying the currency stream inside the maximum connected component gives us good insight into the flow pattern. In that context, we measure the average daily total flow of some well-known individual crypto-exchanges and examine their weekly seasonality. This allows us to use similar behavior as a criterion for identifying other active anonymous financial institutions inside the blockchain.

Readers may wonder about the effect on the weekly pattern of choosing a different variable. For example, if one takes the average edge flow, defined as the total daily flow divided by the daily total number of edges, instead of the total edge flow, the result is nontrivial: although the weekly pattern appears in the main graph, it diminishes for higher thresholds. For details, please see Appendix A.3.

5 Examining some exchange’s activities: “big players” market scenario

We now finalize the definition of big players. To ground the definition, we use openly available online data on crypto-exchange markets and match it against our restructured blockchain data. At present, most trading on the bitcoin market takes place by maintaining a liquid pool of bitcoin so that people can dispose of their crypto-assets at any moment. Individuals who would like to trade on the market do so by depositing bitcoin through a transaction to the market's public address (the hash keys or public keys that are published on the market's website and available in the blockchain) or by making a bank transfer to the market's bank accounts. The market then credits the purchaser's account on its platform with that amount of money. Buyers can then submit limit or market orders, which are placed in the market's order book. This implies daily use of the market's public address with very high recurrence. In addition, there is likely to be a high volume of BTC flowing to and from the wallets of such markets.

While exploring open-source crypto-exchange data, we found a website (https://bitinfocharts.com/top-100-richest-bitcoin-addresses.html) from which we collected the public addresses of the 1000 richest bitcoin wallets. We merged these public addresses with the Hungary research group's address database and then with our restructured address-to-user database. In Table 2, we present the public addresses that were contracted with our user database.

Table 2 The contraction of some exchange with our restructured address to users’ database

We found that, up to the cutoff date of 9th February 2018, the Xapo and Bitstamp wallets have a very large number of edges. These two wallets also appear in Table 1 among the top 20 most frequent users. These two exchanges were therefore prime specimens for observing the money flow. We took the user IDs of both exchanges and calculated the average daily sum of inflows and outflows. In both cases, the results followed the weekly pattern, as shown in Figs. 7 and 8.

Fig. 7 The average daily sum of in and out flow of xapo.com

Fig. 8 The average daily sum of in and out flow of Bitstamp exchange

Finally, we define the big players by recalling Eq. (1), where each user's frequency f(i) is defined on the transaction graph for the period between 2013 and 2018. Based on Table 1 and a visual inspection of Fig. 2, we take 2,675,207 as the lowest frequency. We call users whose frequency in the transaction graph is equal to or higher than this value big players. Mathematically, the big players \(i_\text {bp}\) are defined by

$$\begin{aligned} f(i_\text {bp})\ge 2,675,207 , \end{aligned}$$
(19)

where we index the big players by bp = 1, 2, \(\ldots \), 20. We created daily graphs \(G_t(\text {bp})=(V_t(\text {bp}), E_t(\text {bp}))\) at time t, where t is a day in a time period T, \(V_t(\text {bp})\) is the set of all vertices that transacted with \(i_\text {bp}\), and \(E_t(\text {bp})\) is the set of links or edges for BTC transactions between the big player and other users. The financial institutions are then those that meet both of the following criteria:

  1. Persistency: having persistent daily activity. If the daily graphs are such that a big player \(i_\text {bp}\in V_t(\text {bp})\) at many temporal points \(t\in T\) for a given period of time T, that big player is called a persistent big player.

  2. Weekly pattern: showing a weekly pattern of daily total network flow. For daily temporal points \(t\in T\) in a given period of time T, if \(\phi _{\text {bp}}(t)\) represents the daily total (in+out) edge flow of big player \(i_\text {bp}\), then the day-of-week average of the total flow can be defined as:

    $$\begin{aligned} \bar{\phi _{\text {bp}}}(\text {dow})=\frac{1}{n(\text {dow})}\sum _{t\in \text {dow}}\phi _{\text {bp}}(t) , \end{aligned}$$
    (20)

    where, dow = Sundays, Mondays, \(\ldots \), Saturdays and the numbers of days of week in the time period T is \(n(\text {dow})\). The weekly pattern is followed if :

    $$\begin{aligned} \bar{\phi _{\text {bp}}}(\text {Fridays}) > \bar{\phi _{\text {bp}}}(\text {Saturdays}), \end{aligned}$$
    (21)

    and

    $$\begin{aligned} \bar{\phi _{\text {bp}}}(\text {Mondays}) > \bar{\phi _{\text {bp}}}(\text {Sundays}). \end{aligned}$$
    (22)

To authenticate the definition, we present the result of applying the criteria on the top 20 frequent users in Table 3.
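The two criteria above can be combined into a simple check, sketched below. The input is assumed to be the daily total (in+out) flow \(\phi_{\text{bp}}(t)\) of one big player, keyed by date. Because the text only requires activity at "many" temporal points, the cutoff `min_active_fraction` is a purely hypothetical parameter chosen for illustration, as are the function names.

```python
from collections import defaultdict
from datetime import date
from typing import Dict


def weekly_pattern_holds(daily_flow: Dict[date, float]) -> bool:
    """Check Eqs. (21) and (22): Friday > Saturday and Monday > Sunday averages."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for day, flow in daily_flow.items():
        sums[day.weekday()] += flow  # Monday is 0, ..., Sunday is 6
        counts[day.weekday()] += 1
    needed = (0, 4, 5, 6)  # Monday, Friday, Saturday, Sunday
    if any(counts[d] == 0 for d in needed):
        return False  # not enough data to evaluate the criterion
    mean = {d: sums[d] / counts[d] for d in needed}
    return mean[4] > mean[5] and mean[0] > mean[6]


def looks_like_financial_institution(
    daily_flow: Dict[date, float],
    period_days: int,
    min_active_fraction: float = 0.9,  # hypothetical persistency cutoff
) -> bool:
    """Apply both criteria: persistency and the weekly pattern."""
    active_days = sum(1 for flow in daily_flow.values() if flow > 0)
    persistent = active_days / period_days >= min_active_fraction
    return persistent and weekly_pattern_holds(daily_flow)
```

A user with equal average flow on all days fails the check, because the inequalities in Eqs. (21) and (22) are strict.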

Table 3 Revealing crypto-exchange market’s top 20 frequent users

Along with the Xapo and Bitstamp exchanges, we calculated the average daily inflows, outflows, and total flows (in+out) on weekdays and weekends for the other 18 big players. Out of the 20 users, 12 showed daily activity, very large daily in/out/total BTC volume, and weekly patterns of in/out/total BTC flow. We identify these as financial institutions or exchange markets of a similar kind. The rest did not show similar results because of insufficient daily persistency and no distinction between weekday and weekend activities. These big players may be, for example, crowdfunding or donation accounts, gambling or gaming sites, or other non-financial services.

6 Conclusion and final remarks

Crypto-asset trading continues around the clock across exchanges spread around the globe. While many see this as an advantage, with the potential for active traders to make profits at their convenience, it also brings the challenge of constantly monitoring prices and making timely trades to book profits and cut losses during odd hours.

In our previous study, we identified the weekly pattern of the daily total number of transactions and bitcoin volume. In the present study, we first checked whether the weekly pattern could be explained by dynamically changing network properties. To understand this, we performed a threshold analysis aimed at identifying the big flows.

The connected component analysis of the threshold sub-graph showed that the maximum connected component is larger on weekdays than on weekends. This result was as expected for both the active and quiet periods. A primary reason for the observed trend is the mismatch between the standard operating hours of banks and those of the crypto-asset markets. Over the weekend, not much new money comes in to support prices. In terms of flow, the normalized average edge flow inside the maximum connected component follows the weekly pattern. The two crypto-exchange institutions, Xapo.com and Bitstamp, showed the same weekly pattern of daily total circulation in their own networks.

We also found that both Xapo and Bitstamp were among the top 20 most frequent users in the network. Thus, we defined big players in terms of high frequency, persistent daily activity, and a weekly pattern of total daily average BTC flow. We applied the two criteria to the top 20 most frequent users. We identified the 12 users out of 20 that satisfied them as crypto-exchange companies or financial institutions. We excluded the remaining 8 users because of insufficient persistency and the absence of a weekly pattern; these are plausibly online gambling, crowdfunding, or donation institutions. Cold wallets, despite their occasional big flows, cannot be termed big players according to this definition.

The goal of this research was to reveal the identity of specific users who are persistently involved in big network flows in the blockchain. We proposed a methodology that focuses on the behavioral patterns of users involved in the daily big circulation of money. Applying this methodology, we divided the big players into two hypothesized categories: financial and non-financial. The weekly pattern helps uncover the identity of the users we term financial institutions, because they have more BTC trading activity on weekdays than on weekends. Most exchange markets belong to this category. The second category consists of big players with high frequency but lacking persistent daily activity and a weekly pattern. We conjecture that crowdfunding platforms, donor organizations, and gambling or betting sites are examples of such non-financial institutions.

This research contributes to the field of economics. Blockchain technology has attracted considerable interest from areas such as trade, finance, government, and policy. However, because of anonymity, quantifying this engagement and the adoption by financial institutions is a challenging task. In this work, we aimed to understand the main criteria for identifying financial institutions inside a fully digitized economy. To do this, we applied a new technique alongside existing ones to deanonymize financial institution users that have high frequency and appear persistently in the daily big flow of bitcoin. The crypto-asset financial market, with its flow of BTC funds, represents saving and investing in a special currency through intermediary agents such as savers and investors. Like traditional fiat currency, bitcoin has no intrinsic value. Unlike fiat currency, however, it serves as a store of value in the manner of nonmonetary assets, for example savings accounts, stocks, bonds, and real estate. In our work, the connected component analysis of the big players quantified the daily big flow of crypto-assets, and this circulation demonstrates how money moves through society. The big money flows involve fiat currency that users convert to crypto-assets and invest with the help of intermediary financial institutions, and that flows back to them as payment when they sell for profit. In short, a digital crypto-asset economy is an endless circular flow of money. We found that more than 50% of the total threshold flow (above 20 BTC) involves the circulation of big flows of economic activity among the big players. For crypto-assets to succeed in the future, the big players need to engage with and embrace reasonable and responsible regulation.
The growth of this industry depends, in part, on the establishment of safe, fair, and reliable market conditions. At present, the proper regulatory environment remains uncertain, and there is considerable scope for research on standardized regulatory policies.
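The share of threshold flow attributed to the big players can be expressed as a simple ratio. The sketch below is illustrative only: it assumes that a transfer counts as flow "among the big players" when both its sender and its receiver are big players, which is one possible reading of the statement above, and the function name and sample data are hypothetical.

```python
def big_player_share(transfers, big_players, threshold=20.0):
    """Fraction of above-threshold BTC flow exchanged between big players.

    transfers: iterable of (sender, receiver, amount_btc).
    Assumption: both endpoints must be big players for a transfer to count.
    """
    above = [(s, r, a) for s, r, a in transfers if a > threshold]
    total = sum(a for _, _, a in above)
    if total == 0:
        return 0.0
    among = sum(a for s, r, a in above if s in big_players and r in big_players)
    return among / total


# Hypothetical sample, not taken from the blockchain data.
sample = [
    ("X", "Y", 60.0),   # big player to big player
    ("Y", "Z", 30.0),   # only one endpoint is a big player
    ("Z", "W", 10.0),   # below the threshold, ignored
]
share = big_player_share(sample, big_players={"X", "Y"})
print(round(share, 3))
```

Under the alternative reading, in which a single big-player endpoint suffices, the `and` in the membership test would become `or`.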

In the future, strongly connected component analysis and community analysis of these financial institutions could reveal more about their communities and about the net BTC balance of each financial institution and its peers. Another possible direction is the study of exogenous factors such as price movements in the exchange market, which might correlate strongly with the flow of BTC inside the blockchain. The bitcoin protocol specifies that the reward for adding a block is halved every 210,000 blocks, which is roughly every four years; the current reward period runs from July 2016 to approximately May 2020. We strongly believe that the exchange-market price might be extremely volatile after the halving and that it will presumably be closely connected to the flow of BTC inside the blockchain.
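The halving rule mentioned above can be written out as a short calculation. This is a minimal sketch based on the publicly known protocol parameters (an initial reward of 50 BTC, a halving every 210,000 blocks, and a target of about ten minutes per block); the function name is chosen here for illustration.

```python
HALVING_INTERVAL = 210_000                 # blocks between reward halvings
INITIAL_SUBSIDY_SATS = 50 * 100_000_000    # 50 BTC expressed in satoshis
TARGET_BLOCK_MINUTES = 10                  # target spacing between blocks


def block_subsidy_sats(height):
    """Block reward (excluding fees) in satoshis at a given block height."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        return 0  # the reward has been shifted away entirely
    return INITIAL_SUBSIDY_SATS >> halvings


# 210,000 blocks at ten minutes each is close to four years.
years_per_halving = HALVING_INTERVAL * TARGET_BLOCK_MINUTES / (60 * 24 * 365.25)

for height in (0, 210_000, 420_000, 630_000):
    print(height, block_subsidy_sats(height) / 100_000_000, "BTC")
```

Block 420,000 corresponds to the July 2016 halving (reward 12.5 BTC) and block 630,000 to the May 2020 halving (reward 6.25 BTC), which bounds the reward period referred to above.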