Many researchers have addressed Big data and its issues [24, 31]. Exploring Big data offers many appealing opportunities, but specialists and experts also confront numerous difficulties when investigating such troves of data. Many data sets are too large and complicated to fit in available memory and are therefore distributed across clusters of computers. Because Big data keeps growing, its challenges are hard to avoid. The most immediate issues to address concern storage, management, and processing. Handling data connectivity, storage limitations, and processing capability in real time is particularly challenging. The exponential growth of structured and unstructured data demands an efficient and reliable storage approach, so the reliability of the underlying devices strongly affects the choice of storage approach. Processing, backing up, and archiving Big data raise further challenges, such as the choice of storage medium, data replication, and duplication. Moreover, Big data applications rely on high-speed servers and equipment, which increases cost.
A distributed paradigm is considered a suitable replacement for costly supercomputers. Distributed approaches are decentralized and aim to replace the conventional centralized ways of dealing with huge volumes of data; they also accommodate expanding client needs and application demands. Data storage, access, transfer, and visualization can all use distributed computing on low-cost machines, making Big data analysis and processing possible at reasonable cost and within reasonable time. Distributed computing systems focus on data representation and attempt to address the challenges of processing, interacting with, and representing data efficiently.
Compared with textual or numerical analysis, visualizations produce better perception and understanding of data. However, visualizing Big data runs into power and speed limitations, which leads to the scalability issue. Current processing technologies and systems cannot satisfy the needs of growing data volumes, their processing, and their visualization: the increase in processor speed and storage capacity lags far behind the growth of Big data, so faster and more capable processors are needed. Data visualization is a significant way to deal with Big data: it provides an overall perspective of the data, surfaces valuable information, and helps reveal hidden patterns. The challenge is to manage the parallelism that blends distributed and shared techniques over such massive data, which makes machine requirements a fundamental concern. Many researchers have shown that large-scale distributed data visualization is an input/output-bound problem. When interactivity is required, security and data access become significant problems, especially when the data are distributed over wide-area networks. Many user interfaces and web-based data visualization techniques are available; they provide efficient decentralization but at the additional cost of buying servers and equipment, which is their main limitation. Tables 1 and 2 list Big data visualization techniques; Table 2 also gives the websites from which these tools can be obtained. These tools offer visualizations tailored to the features of the data. A key challenge in modern distributed data visualization is providing users with an interactive experience. The techniques in Tables 1 and 2 are used in many distributed applications to achieve the intended visualization of data, and these distributed visualization frameworks aim to deliver interactive results and address scalability.
Distribution in these techniques is achieved with the help of substantial remote servers, and high-performance computing capability is needed to obtain interactive results.
Blockchain is emerging as a new distributed technology. The blockchain concept is based on a distributed ledger maintained by multiple parties. Using blockchain, we can build decentralized systems that effectively mitigate problems associated with high access and communication costs. A decentralized blockchain architecture is also more resistant to a single point of failure. Experiments show that the scalability issue can be handled effectively in blockchain systems.
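The distributed-ledger idea above can be sketched with a minimal hash-linked chain: each block commits to its predecessor's hash, so altering any past block invalidates everything after it. This is an illustrative toy, not the data model of any specific blockchain.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Deterministic SHA-256 hash of a block's contents."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain: list, transactions: list) -> list:
    """Append a new block that references the hash of the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev, "txs": transactions})
    return chain

def verify_chain(chain: list) -> bool:
    """Every block must reference the hash of the block before it."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )
```

Because every party can recompute `verify_chain` locally, no single node needs to be trusted with the ledger, which is what removes the single point of failure.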
Decentralized computing frameworks aim, in several ways, to disrupt the current cloud environment and its scalability problem.
Blockchain-based decentralized approaches have received great attention for obtaining interactive outputs from huge streams and sets of Big data. Traditional distributed peer-to-peer (P2P) networks have inevitable disadvantages, including insecurity and the lack of auditing and incentives. Filecoin and BigchainDB are scalable blockchain databases that combine the characteristics of blockchains and existing distributed databases. In all respects, blockchain is opening doors to solving numerous problems for many applications in a distributed manner. Blockchain-based frameworks offer many benefits for distributed storage, such as availability, no single point of failure, confidentiality, privacy, and integrity. Currently, Filecoin, Sia, Swarm, and Storj are among the mainstream distributed storage solutions built on blockchain.
Sia uses blockchain technology to provide an open market where users can buy and reserve unused storage space. Storage conditions, such as availability and active duration, are agreed upon by the participants under file contracts, which act as encrypted service-level agreements. These contracts are stored directly on the Sia blockchain.
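A file contract of this kind can be sketched as a record of agreed terms whose hash is what would be anchored on the chain; the field names below (`host`, `duration_days`, `min_availability`) are illustrative assumptions, not Sia's actual contract format.

```python
import hashlib
import json

def make_file_contract(file_id: str, host: str, duration_days: int,
                       min_availability: float) -> dict:
    """Record storage terms and derive an immutable contract digest."""
    terms = {
        "file_id": file_id,
        "host": host,
        "duration_days": duration_days,
        "min_availability": min_availability,
    }
    # The contract hash is what would be anchored on-chain; any later
    # change to the agreed terms yields a different hash.
    digest = hashlib.sha256(
        json.dumps(terms, sort_keys=True).encode()
    ).hexdigest()
    return {"terms": terms, "contract_hash": digest}
```

Anchoring only the digest keeps the on-chain footprint constant regardless of how large the stored file is.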
Filecoin is another decentralized, blockchain-native cloud service. It provides an incentive-based scheme that supplies additional storage. Filecoin uses IPFS, a P2P distributed protocol in which each file is identified by a cryptographic hash and carries indexing details. This enables large amounts of data to be searched, saved, and distributed with high productivity.
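The content addressing that IPFS relies on can be sketched as follows: content is stored and retrieved by the hash of the content itself rather than by location, which makes retrieval self-verifying and deduplicates identical data. The in-memory `store` dict stands in for the distributed node network.

```python
import hashlib

store = {}  # stands in for the distributed network of storage nodes

def put(content: bytes) -> str:
    """Store content; the returned key is derived from the content itself."""
    key = hashlib.sha256(content).hexdigest()
    store[key] = content
    return key

def get(key: str) -> bytes:
    """Retrieve content and verify it against its own address."""
    content = store[key]
    # Retrieval is self-verifying: recomputing the hash detects tampering.
    if hashlib.sha256(content).hexdigest() != key:
        raise ValueError("content does not match its address")
    return content
```

Because identical content always maps to the same key, any node can serve a request, and duplicates are stored only once.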
SWARM is another protocol for a decentralized storage network. It is a permissionless communication and storage infrastructure. The principal goal of Swarm is to provide infrastructure resources to the creators of decentralized apps. To implement its utilities, SWARM uses a smart-contract platform (Ethereum). Its strategy for decentralized storage is a peer-to-peer approach.
It is undeniable that blockchain has opened doors to solving many problems for many applications in a distributed manner, through the various kinds of distributed storage approaches mentioned above.
In blockchain-based distributed storage such as Filecoin, Sia, and Swarm, the set of transactions is stored within the blocks of immutable blockchains, generating a kind of decentralized database of structured data. However, due to scalability concerns, blocks cannot grow very large; the above-mentioned distributed storage blockchains are therefore not meant to store and handle large amounts of data, and doing so would take too long and consume too many resources. These problems increase the overall energy consumption of the network. Because of the energy consumed in nodes and the long communication distances, network lifespan and energy consumption have become critical issues. Hence, these generic schemes are not very energy efficient, nor do they address optimizing network and storage cost.
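One common workaround for the block-size limit described above, sketched here as an assumption rather than any specific system's design, is to keep the bulky data off-chain and record only its fixed-size hash on-chain; the digest still lets anyone verify the integrity of the off-chain copy.

```python
import hashlib

def anchor(off_chain_store: dict, data: bytes) -> str:
    """Store data off-chain; return the fixed-size digest to put on-chain."""
    digest = hashlib.sha256(data).hexdigest()
    off_chain_store[digest] = data
    return digest  # only these 64 hex characters go into a block

def verify(off_chain_store: dict, digest: str) -> bool:
    """Check the off-chain copy against its on-chain digest."""
    data = off_chain_store.get(digest)
    return data is not None and hashlib.sha256(data).hexdigest() == digest
```

A multi-megabyte payload thus costs the chain only 32 bytes of state, which is why pure on-chain storage of Big data is avoided.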
These projects are all similar in that they try to provide decentralized storage, but existing solutions are controlled by centralized third parties and are mainly commercial applications. The stored data are available to the intermediaries providing the service. The involvement of intermediaries and third parties increases operational and transactional costs and weakens security, which makes the system vulnerable.
Hyperledger Sawtooth is also a blockchain framework for developing and running applications smoothly in a distributed manner. Sawtooth addresses the challenges of permissioned (private) networks: Sawtooth clusters with different permission settings can be deployed quickly, and no central service can leak transaction patterns or other classified details. The Sawtooth blockchain can thus provide an efficient data storage and access framework for many private and small organizations and applications.
Despite these advantages, all blockchain systems face scalability issues. Because of scalability limitations and limited on-chain storage capacity, the feasibility of such a blockchain-based data network is restricted. A variety of methods have been suggested to increase scalability while maintaining decentralization and security; the sharding technique is one of them.
Sharding is one of the most practical approaches for achieving scalability: it partitions the network into several shards, reducing the scalability overhead. However, the issue is maintaining atomicity during cross-shard operations. Each shard is stored in a separate server instance.
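The partitioning step can be sketched with deterministic hash-based shard assignment, so each node handles only a fraction of the keys; the shard count below is an illustrative assumption. Note that this sketch covers only the partitioning itself, not the cross-shard coordination needed for atomicity.

```python
import hashlib

NUM_SHARDS = 4  # illustrative shard count

def shard_of(key: str) -> int:
    """Deterministic shard assignment via SHA-256 of the key."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % NUM_SHARDS

def partition(keys):
    """Group keys by the shard each is assigned to."""
    shards = {i: [] for i in range(NUM_SHARDS)}
    for k in keys:
        shards[shard_of(k)].append(k)
    return shards
```

Because assignment is a pure function of the key, any node can locate the responsible shard without a central directory; the cost is that a transaction touching keys in two shards needs a cross-shard commit protocol.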
Although this spreads the load, it increases overall cost and energy consumption. Sharding can be a good choice for public, permissionless blockchains, but for small private projects the technique may be costly, complex, and energy-hungry. Table 3 compares some blockchain-based storage services, giving a relative estimation over several parameters: closed-source decentralization, exposure of data to third parties, network cost and energy consumed, scalability, energy-efficiency objectives, and availability as a free solution.
Many blockchain-based frameworks offer benefits such as availability, no single point of failure, confidentiality, privacy, and integrity. Hyperledger Sawtooth is a blockchain framework for developing and running applications smoothly in a distributed manner. Blockchain can provide an efficient data storage and access framework for many organizations and applications. As the visual representation of Big data communicates ideas more clearly, the need for data visualization is increasing across domains; however, visualizing a massive volume of data is a complex and demanding task. Therefore, a distributed and green solution is needed to achieve interactive data visualization at lower cost and energy consumption. Using a blockchain-based mechanism to distribute and decentralize Big data and then visualize it could be a good solution. However, the aforementioned blockchain-based techniques also risk escalating overall network cost with high energy and resource consumption, and users may still depend to some extent on a third-party service, making the system partly centralized.
Therefore, a green solution is needed: one with minimal energy consumption and low network cost that provides optimal storage for both small and large projects while ensuring security, data availability and reliability, and visualization at the same time.
BGbV aims to support distributed data visualization. It ensures security and availability and, most importantly, provides a framework for better utilization of existing distributed resources. By effectively utilizing small units of storage, it addresses scalability challenges. We present a green solution that reduces cost by utilizing small resources that are already available, lessens energy consumption, and is thus environmentally friendly.