Introduction

Cloud computing’s appeal lies in its dynamic, flexible, Service Level Agreement (SLA)-based negotiable services, which give users access to virtually limitless computing resources [1]. According to the National Institute of Standards and Technology (NIST), cloud computing is a pay-per-use model enabling on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service-provider interaction [2]. Cloud deployment models include private, public, hybrid, and community clouds, with services categorized into Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). IaaS providers such as Google Compute Engine, Windows Azure Virtual Machines, and Amazon Elastic Compute Cloud offer network resources and computing storage, enhancing performance and reducing maintenance costs to meet specific customer demands [3, 4]. This evolution in cloud computing has transformed various sectors. Businesses and healthcare organizations benefit from services such as cost reduction through resource outsourcing [3, 4], performance monitoring [5, 6], resource management [7], and computing prediction [8]. Additionally, cloud computing facilitates tasks such as resource allocation [9], workload distribution [10,11,12], capacity planning [13], and job-based resource distribution [14, 15]. This transformative impact underscores the significance of cloud computing in modern digital landscapes, empowering organizations with unprecedented efficiency and scalability in resource utilization [3,4,5,6,7,8,9,10,11,12,13,14,15].

Despite the availability of various data services, data owners are apprehensive about entrusting their valuable data to cloud service providers (CSPs) for third-party cloud storage, owing to concerns about the integrity of the CSPs [13, 16, 17] and the shared nature of cloud storage environments. Cloud computing primarily encompasses data storage and computation, with Infrastructure as a Service (IaaS) closely linked to cloud storage. When using IaaS, cloud users often lack visibility into the precise location of their outsourced data within cloud storage and into the machines responsible for processing tasks. Consequently, data privacy within cloud storage is a significant security challenge, exacerbated by the presence of malicious users and resulting in data integrity and confidentiality issues. Trust in remote cloud data storage is therefore crucial for the success of cloud computing. Data integrity, encompassing completeness, correctness, and consistency, is vital in the context of Database Management Systems (DBMS) and the ACID (Atomicity, Consistency, Isolation, Durability) properties of transactions. The issue arises when CSPs cannot securely guarantee clients the accuracy and completeness of data in response to their queries [18].

Researchers are actively advancing the field of data integrity in cloud computing by refining data integrity verification techniques and bolstering data privacy-preserving methods. These verification techniques primarily encompass Proof of Ownership (PoW), Provable Data Possession (PDP), and Proof of Retrievability (PoR). Notably, the introduction of a Message Authentication Code (MAC) computed with a unique random key within the data integrity framework marked a deterministic approach to data integrity verification, mitigating the inefficiencies of remote data integrity schemes that employed RSA-based encryption, which suffered from significant computation time and long hash-value transfer times for large files [19]. To strengthen data integrity schemes, the Provable Data Possession (PDP) concept was introduced to establish the legitimacy of data possession by a cloud server. Subsequent research has continually refined these algorithms, introducing innovations such as the transparent PDP scheme [20], DHT-PDP [21], a certificateless PDP protocol for multiple copies [22,23,24], and dynamic multiple-replica PDP [25]. Concurrently, the Proof of Retrievability (PoR) concept was introduced in 2007 to address error localization and data recovery [26]. Additionally, Proof of Ownership (PoW) emerged in 2011 through the Merkle hash tree protocol to thwart malicious adversaries, prompting a wide range of subsequent research with improved algorithms aimed at the same goals [27,28,29].

Fully homomorphic encryption (FHE) was proposed to preserve the privacy of outsourced data: the original data are converted into ciphertext by an encryption technique that supports multiplication and addition operations over the ciphertext [30]. Drawbacks of [22], such as practical infeasibility due to complex operations, were later addressed by the Somewhat Homomorphic Encryption (SHE) scheme of [31]. Many further works have been established in recent years, such as a biometric face-recognition approach [32], a privacy-preserving auditing scheme for cloud storage using HLA [33], and an etiquette approach for preserving data [34].
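As a toy illustration of the homomorphic property these schemes rely on (not taken from [30] or [31]), textbook RSA without padding is multiplicatively homomorphic: multiplying two ciphertexts yields the encryption of the product of the plaintexts. FHE/SHE schemes generalize this so that both addition and multiplication can be evaluated over ciphertexts. A minimal Python sketch with illustrative, insecure parameters:

```python
# Toy demonstration of a homomorphic property (illustration only, NOT secure).
# Textbook RSA (no padding) satisfies: Enc(m1) * Enc(m2) mod n == Enc(m1 * m2).
# FHE/SHE schemes extend this idea to support both addition and multiplication
# over ciphertexts, which is what enables computation on outsourced encrypted data.

p, q = 61, 53                  # tiny primes, for illustration only
n = p * q                      # 3233
e, d = 17, 2753                # public / private exponents (e * d = 1 mod phi(n))

def enc(m: int) -> int:
    return pow(m, e, n)

def dec(c: int) -> int:
    return pow(c, d, n)

c1, c2 = enc(7), enc(3)
c_prod = (c1 * c2) % n         # operate on ciphertexts only
print(dec(c_prod))             # 21 == 7 * 3, obtained without decrypting c1 or c2
```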

Recently, Google Cloud’s Security Command Center (SCC), adopted by organizations such as Zebra Technologies within their security operations center (SOC), has been used to flag harmful threats such as cryptomining activity, data exfiltration, potential malware infections, and brute-force SSH attacks, helping to maintain the integrity of business organizations’ information [35].

In recent years, numerous cloud data integrity schemes have emerged, along with several survey papers, albeit with too few parameters to comprehensively address specific aspects of data integrity. These surveys include data auditing from single copies to multiple replicas [36], Proof of Retrievability [37], various data integrity techniques and verification types for cloud storage, and different data integrity protocols [38]. However, they often fall short of providing a comprehensive understanding of data integrity strategies and their classification. A concise taxonomy of data integrity schemes was presented in the survey paper [39], which offered a comparative analysis of existing data integrity schemes and their evolution from 2007 to 2015 but covered physical storage issues, security challenges, and design considerations only briefly. This survey paper aims to address this gap by offering an in-depth discussion of the security challenges within physical cloud storage, potential threats and attacks, and their mitigations. It also categorizes data integrity schemes, outlines their phases and characteristics, provides a comparative analysis, and projects future trends. This comprehensive approach underscores the significance of data integrity schemes in securing cloud storage.

Discussion

Although several articles have appeared on similar issues, our work differs from all of the above-mentioned research in the following ways. Unlike [36, 37, 39], our work focuses on different types of storage-based attacks and includes up-to-date methods to resist storage-based attacks, which routinely undermine data integrity schemes on physical cloud storage. Like [37], it covers storage-based security issues, threats, and their existing mitigation solutions. Unlike [36, 37, 39], our work focuses on the different types of data integrity verification proposals, which are broadly classified into file-level verification, entire-block verification, metadata verification, and randomly block-level verification.

Unlike [37], our survey is not restricted to proof of retrievability (PoR); it covers all verification types, namely proof of ownership (PoW), proof of retrievability (PoR), and provable data possession (PDP). It also includes different types of auditing verification techniques to elaborate the job roles on the TPA’s side and the DO’s side, and discusses the benefit of public auditing in reducing the computational and communication overhead of the DO. Unlike [36,37,38, 40,41,42,43], our survey reviews a wide range of quality features of data integrity schemes, each of which is individually of prime importance for cloud storage security. Unlike [36, 37, 41], we focus on different types of security challenges according to their symptoms, effects, and probable solutions in data integrity schemes. Like [42,43,44], we include a discussion of malicious insider attacks, forgery attacks, and dishonest TPAs and CSPs. Unlike [41, 43, 44], in the comparative analysis we introduce different performance-analysis parameters of existing works based on each work’s motivations and limitations, in addition to a discussion of public and private data auditing criteria. Like [32], we briefly include all existing data integrity methods in the Comparative analysis of data integrity strategies section.

Research gap

According to the above discussion, this research focuses on the following points to summarize the research gaps:

  • In contrast to [36, 37, 39], our research includes current strategies to fend off storage-based attacks, which consistently compromise data integrity techniques on physical cloud storage.

  • Our research, in contrast to [36, 37, 39], concentrates on the various approaches to data integrity verification, which we classify into four categories: file-level verification, full-block verification, metadata verification, and randomized block-level verification.

  • Our survey is not limited to proof of retrievability (PoR), in contrast to [37]. It includes all forms of verification, namely provable data possession (PDP), proof of retrievability (PoR), and proof of ownership (PoW). Different key management techniques used in cloud storage to improve security at the storage level are also covered.

  • In contrast to [36,37,38, 40,41,42,43], our survey examines a variety of quality features of data integrity schemes, each of which is crucial to the security of cloud storage.

  • In contrast to [36, 37, 41], we concentrated on various security issues based on the impacts, symptoms, and likely fixes of data integrity techniques.

  • In contrast to [41, 43, 44], we present various performance-analysis parameters of previous efforts, based on the goals and constraints of each work, together with a discussion of auditing criteria for both public and private data.

Contribution

To the best of our knowledge, this is the first attempt to examine all the related issues of cloud data storage, together with possible directions, within a single article. The key contributions of this research paper are summarized below:

  • Identification of possible attacks on storage-level services that may arise on physical cloud storage, along with explored mitigating solutions

  • A summary of the characteristics of data integrity strategies, examining data integrity auditing soundness, phases, classification, etc., to understand and analyse security loopholes

  • A comparative literature review based on all characteristics, motivations, limitations, accuracy, methods, and probable attacks

  • Discussion of design-goal issues along with security-level issues of data integrity strategies, covering dynamic performance efficiency, different key management techniques for achieving security features, analysis of server attacks, etc.

  • Identification of security issues in data integrity strategies and their mitigation solutions

  • Discussion about the future direction of new data integrity schemes of cloud computing.

The remainder of this review article is organized as follows. The Issues of physical cloud storage section discusses issues of physical cloud storage and attacks on storage-level services. The Key management techniques with regards to storage level in cloud section describes existing key management techniques for enhancing the security of cloud storage. The Potential attacks in storage level service section describes possible attacks on cloud storage. The Phases of data integrity technique section describes the phases of the data integrity scheme and summarizes all possible characteristics of the data integrity strategy. The Classification of data integrity strategy section describes a classification of data integrity strategies. The Characteristics of data integrity technique section describes the characteristics of data integrity techniques. The Challenges of data integrity technique in cloud environment section describes the challenges of data integrity techniques in the cloud environment. The Desired design challenges of data integrity strategy section describes the desired design challenges of data integrity strategies. The Comparative analysis of data integrity strategies section presents a comparative analysis of existing research works on data integrity strategies. Finally, design-goal issues and future trends of cloud storage, based on existing integrity schemes and illustrated with a timeline infographic from 2016 to 2022, are presented in the Future trends in data integrity approaches section.

Issues of physical cloud storage

Generally, physical cloud storage in the form of IaaS services gives cloud users the opportunity to use computing resources at minimum cost without taking any responsibility for infrastructure maintenance. In practice, however, neither the CSP nor other authorized users can be treated as trusted actors in cloud computing. Hence, cloud storage is an attack-prone area owing to the malicious intentions of CSPs and of insider and outsider attackers. We list below the main cloud storage issues along with possible attacks; Table 1 shows the corresponding mitigating solutions.

  • Incapability of the CSP: Managing large cloud storage may cause data loss for the CSP because of insufficient computational capacity, an occasional inability to meet user requirements, the absence of a user-friendly data serialization standard with easily readable and editable syntax, and life-cycle changes in the cloud environment [66].

  • Loss of control over cloud data: Losing control of data in a distributed cloud environment gives unauthorized users the opportunity to manipulate the valuable data of legitimate users [67].

  • Lack of scalability of physical cloud storage: Scalability means that hardware resources are merged to provide more resources to the distributed cloud system. However, this can also be exploited for illegitimate access to, and modification of, cloud storage and physical data centers [68].

  • Unfair resource allocation strategy: In a public cloud, resources and monitoring data are kept in a shared pool, which might not be acceptable to cloud users who do not want to leave any footprint of their work distribution or data transmission through a public-cloud-hosted software component; an unfair allocation strategy can later hamper retrieval of the original data [69].

  • Lack of performance monitoring of cloud storage: Generally, monitoring data is stored in a shared pool in a public cloud, which might not be acceptable to cloud users who do not want to leave any footprint of their work distribution or data transmission through a public-cloud-hosted software component [70].

  • Data threat: Cloud users store sensitive personal or business information in cloud environments. Owing to the lack of data threat prevention techniques on the part of cloud service providers, data may be lost or damaged [64, 71].

  • Malicious cloud storage provider: Lack of transparency and of access-control policies are the basic indicators of a malicious storage provider. When these two safeguards are missing, it is quite easy for a provider to disclose cloud users’ confidential data to others for business profit [72].

  • Data pooling: Resource pooling is an important aspect of cloud computing, but because of it, data recovery policies and data confidentiality schemes can be broken [73].

  • Data lock-in: Cloud storage providers do not share a standard format for storing data. Therefore, cloud users face a lock-in problem when switching data from one provider to another as resource requirements change dynamically [39].

  • Security against internal and external malicious attacks: Data might be lost or modified by insider or outsider attacks [49, 74,75,76].

Table 1 Potential Types of Vulnerable Attacks and Threats at Storage Level Data Integrity with Mitigating Solutions

Key management techniques with regards to storage level in cloud

Combining data distribution with data encryption can prevent data leakage and increase the difficulty of attacks, thereby improving data storage security. We list here some key management techniques used in cloud storage to enhance security and transparency between cloud storage and cloud users.

  • Hierarchical Key Technique: Some research articles [77] combine secret sharing and hierarchical key derivation with the user’s password to enhance key security, protecting the key and preventing an attacker from using it to recover the data (a minimal derivation sketch follows this list).

  • Private Key Update Technique: This identity-based encryption technique [78] updates the private keys of the non-revoked group users instead of the authenticators of the revoked user when those authenticators are not updated, and it does away with the complex certificate administration found in standard PKI systems.

  • Key Separation Technique: This cryptographic method aids in maintaining the privacy of shared sensitive data while offering consumers effective and efficient storage services [79].

  • Attribute-based Encryption Key Technique: Without disclosing decryption keys, this method achieves the conventional notion of semantic security for data confidentiality, whereas existing methods establish only a weaker security notion [80, 81]. It is used to share data with users in a confidential manner.

  • Multiple Key Technique: This k-NN query-based method improves security by having the data owner (DO) and each query user maintain separate keys that are never shared [82]. Meanwhile, the DO uses his own key to encrypt and decrypt the outsourced data.
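The following is a minimal sketch of hierarchical, password-based key derivation in the spirit of the first item above (illustrative only; it is not the construction of [77] and omits its secret-sharing component). A master key is derived from the user’s password with a salt, and per-file keys are derived from the master key, so compromising one file key does not expose the password or the other keys.

```python
import hashlib
import hmac
import os

def derive_master_key(password: str, salt: bytes) -> bytes:
    """Derive a master key from the user's password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)

def derive_file_key(master_key: bytes, file_id: str) -> bytes:
    """Derive a per-file key from the master key, using HMAC as a simple KDF."""
    return hmac.new(master_key, file_id.encode(), hashlib.sha256).digest()

salt = os.urandom(16)                                   # stored with the account metadata
master = derive_master_key("user-password", salt)
k_file = derive_file_key(master, "records/2023-Q1.db")  # hypothetical file identifier
print(k_file.hex())
```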

Potential attacks in storage level service

Storage-level services in cloud computing offer computing resources, virtual networks, and shared storage for lease over the internet. They provide more flexible and scalable benefits than on-premise physical hardware. Because of these two aspects of the cloud, storage-level services can fall victim to malicious attacks attempting to steal computing resources, publish original data, or exfiltrate data in data breaches. If attackers successfully enter the infrastructure services of an organization, they can leverage those footholds to obtain access to other important parts of the enterprise architecture, causing data integrity security issues. We list here the possible attacks on storage-level services.

  • DoS/DDoS: The ultimate purpose of this attack is to make the original services unavailable to users and to overload the system by flooding a single cloud server with spam requests. Due to the high workload, the performance of the cloud server slumps and users lose access to their cloud services.

  • Phishing: Attackers steal important information in the form of a user’s credentials, such as username and password, after redirecting the user to a fraudulent webpage disguised as the original page.

  • Brute force attack/online dictionary attack: This is a type of cryptographic attack. Using an exhaustive key search, malicious attackers can violate the privacy policy of the data integrity scheme in cloud storage.

  • MITC: A man-in-the-cloud attack gives attackers the ability to execute arbitrary code on a victim machine by installing their own synchronization token in place of the machine’s original one; once the victim machine synchronizes with the attacker’s token, the attacker gains control over the target machine.

  • Port scanning: Attackers perform port scanning to identify open ports or exposed server locations, analyze the security level of the storage, and break into the target system.

  • Identity theft: Using password-recovery methods, attackers can obtain the account information of legitimate users, causing the loss of the users’ credential information.

  • Risk spoofing: Resource workload balancing is a useful management feature of cloud storage, but attackers can abuse this aspect of cloud computing to steal users’ credential data, spread malware across host machines, and create internal security issues.

  • Data loss/leakage: Data can be lost or manipulated during transmission by external adversaries, through the incapability of cloud service providers, by unauthorized users in the same cloud environment, or by internal malicious attackers.

  • Shared technology issue: Cloud service providers run multiple operating systems concurrently as guests on a host machine through a hypervisor. By exploiting a weak hypervisor, attackers can take control of all virtual machines, creating vulnerabilities such as data loss, insider and outsider attacks, loss of control over machines, and service disruption.

Phases of data integrity technique

Data integrity promises the consistency and accuracy of data in cloud storage. Its verification capability and its resistance to unauthorized access to stored data help cloud users gain the trust needed to outsource their data to remote clouds. The scheme involves three main actors: the data owner (DO), the cloud storage/service provider (CSP), and an optional third-party auditor (TPA) [39], as depicted in Fig. 1. The data owner produces data and uploads it to cloud storage, for example to acquire financial benefit. The CSP is a third-party organization offering Infrastructure as a Service (IaaS) to cloud users. The TPA relieves the DO of the burden of data management by checking the correctness and intactness of outsourced data; it also reduces the DO’s communication and computational costs [83, 84]. Sometimes the DO itself takes responsibility for data integrity verification without TPA involvement. The three phases of a data integrity strategy are described below and in Table 2:

  • Data processing phase: In the data processing phase, the data file is processed in several ways: the file is divided into blocks [60], encryption is applied to the blocks [90], message digests are generated [87], random masking numbers are generated [88], keys are generated and signatures are applied to the encrypted blocks [93], etc.; finally, the encrypted or obfuscated data is outsourced to cloud storage.

  • Acknowledgement phase: This phase is optional but valuable, because situations may arise in which the CSP accidentally discards data or conceals a data loss to protect its image [88]. However, most research works skip this step to minimize the computational overhead of acknowledgement verification.

  • Integrity verification phase: In this phase, the DO/TPA sends a challenge message to the CSP, and the CSP responds with metadata or proof information to the TPA/DO for data integrity verification. If verification is performed by the TPA, the audit result is sent to the DO (a minimal sketch of this flow is given below).
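To make the three phases concrete, the following is a minimal, hedged sketch of a tag-based challenge-response flow (standard-library Python; all function names are illustrative). The DO splits the file into blocks and keeps per-block HMAC tags, the CSP stores the blocks and answers a challenge over randomly selected indices, and the verifier (DO or TPA) recomputes the tags. Real schemes replace the plain HMAC with homomorphic authenticators (e.g., BLS, HLA, ZSS) so that the proof can be aggregated instead of returning whole blocks.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 4096

def hmac_tag(key: bytes, index: int, block: bytes) -> bytes:
    """Per-block tag binding the block content to its index."""
    return hmac.new(key, index.to_bytes(8, "big") + block, hashlib.sha256).digest()

# --- Data processing phase (data owner, DO) ---
def process_file(data: bytes, key: bytes):
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    tags = [hmac_tag(key, i, b) for i, b in enumerate(blocks)]
    return blocks, tags          # blocks are outsourced to the CSP; tags stay with the verifier

# --- Acknowledgement phase (CSP) ---
def acknowledge(blocks) -> bytes:
    """CSP confirms receipt with a digest over all stored blocks."""
    h = hashlib.sha256()
    for b in blocks:
        h.update(b)
    return h.digest()

# --- Integrity verification phase (verifier = DO or TPA, and CSP) ---
def make_challenge(num_blocks: int, sample: int = 3):
    """Verifier challenges a random subset of block indices."""
    return secrets.SystemRandom().sample(range(num_blocks), sample)

def prove(blocks, challenge):
    """CSP returns the challenged blocks as its proof (real schemes aggregate instead)."""
    return {i: blocks[i] for i in challenge}

def verify(proof, tags, key) -> bool:
    return all(hmac.compare_digest(hmac_tag(key, i, b), tags[i]) for i, b in proof.items())

# Usage
key = secrets.token_bytes(32)
blocks, tags = process_file(b"example outsourced data " * 2000, key)
chal = make_challenge(len(blocks))
print(verify(prove(blocks, chal), tags, key))   # True if the CSP stored the blocks intact
```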

Table 2 Classified Phases of Data Integrity Schemes
Fig. 1
figure 1

Entire Cycle of Data Integrity Technique

Classification of data integrity strategy

The classification of data integrity schemes depends on a variety of conceptual parameters and sub-parameters. Table 3 shows all parameters and sub-parameters, with references, to give a clear idea of the data integrity strategy. The deployment setup of a data integrity strategy depends on the environment of the proposed system. Clients can store their data in a public cloud setup [98], a multi-cloud setup [99, 100], or a hybrid cloud setup [101]. If data are placed in a public cloud, clients lose access-control visibility over the data because of the CSP’s external data management policy. As a result, data integrity problems arise, because in practical scenarios neither the CSP nor public cloud storage is honest. Multi-cloud means more than one cloud service and more than one vendor in the same heterogeneous cloud architecture, while a hybrid cloud is a combination of private and public clouds. Hence, in the shared storage structure of multi- and hybrid-cloud environments, the security of data integrity is a genuine concern. The guarantee offered by a data integrity scheme can be of two types: deterministic or probabilistic. Probabilistic verification performs better than deterministic verification because it can check blocks with high accuracy without accessing the whole file and at low computational overhead [102]. However, the deterministic approach gives full assurance of data integrity, whereas the probabilistic approach provides a guarantee that is lower than the deterministic one [39].
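For intuition on why probabilistic (sampling-based) verification works without accessing the whole file, the standard analysis (illustrative, not tied to any single surveyed scheme) gives the probability of detecting corruption when c of the n blocks are challenged uniformly at random and t blocks are corrupted:

$$
P_{detect} \;=\; 1 - \frac{\binom{n-t}{c}}{\binom{n}{c}} \;=\; 1 - \prod_{i=0}^{c-1}\frac{n-t-i}{n-i} \;\ge\; 1 - \left(1 - \frac{t}{n}\right)^{c}.
$$

For example, if 1% of the blocks are corrupted, challenging roughly 460 randomly chosen blocks detects the corruption with probability above 99%, independent of the total file size.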

  1. a)

    Type of proposal

    • File level verification: This is a deterministic verification approach. Here, data integrity verification is generally done by either the TPA or the client. The client submits an encoded file to the storage server, and for data integrity verification a verifier checks the encoded file using a challenge key and a secret key chosen by the client [103].

    • Block level verification: This is a deterministic verification approach. First, a file is divided into blocks, the blocks are encrypted, message digests are generated, and the encrypted blocks are sent to the CSP. Later, the CSP sends a response message to the TPA for verification, and the TPA verifies all blocks by comparing newly generated message digests with the old digests generated by the client [87].

    • Randomly block level verification: This is a probabilistic verification approach. A file is divided into blocks; then one signature, or a combination of two, is generated for every block using hash [86], BLS [88], HLA [124], random masking [88], or ZSS [97], and the blocks together with their signatures are submitted to cloud storage. Later, the TPA generates a challenge message for randomly selected blocks to be checked for integrity and sends it to the CSP. Next, the CSP sends a proof message to the TPA for verification. The TPA verifies the proof for the randomly selected blocks by generating new signatures and comparing the old and new signatures of those blocks [61, 86].

    • Metadata verification: In this deterministic approach, cloud users first generate a secret key and use it to prepare metadata for the entire file through HMAC-MD5 authentication. The encrypted file is then sent to the CSP and the metadata to the TPA, which later uses the metadata for integrity verification [85].

  2. b)

    Category of data

    • Static data: Static data stored in cloud storage does not need to be modified. In [105], a basic RDPC scheme is proposed for the verification of static data integrity. In remote cloud data storage, static files receive the main attention, but in practical scenarios allowing the TPA to possess the original data file creates security problems. In [106], the RSASS scheme is introduced for static data verification by applying a secure hash signature (SHA-1) to file blocks.

    • Dynamic data: Data owners face no restriction on applying update, insertion, and deletion operations, an unlimited number of times, to outsourced data currently stored in remote cloud storage. In [111], a PDP scheme based on a ranked skip list is introduced to support fully dynamic operations on data, overcoming the limited number of insertion and query operations described in [118]. In [117], a dynamic data graph is used to handle conflicts arising from the dynamic nature of large graph-data applications.

  3. c)

    Verification type

    • Proof of ownership verification: The proof of ownership (PoW) scheme is introduced within data integrity frameworks to prove the actual ownership of the original data owner to the server and to restrict unauthorized access to the owner’s outsourced data by otherwise valid but malicious users in the same cloud environment. The PoW scheme is coupled with data deduplication to reduce security issues arising from illegal attempts by malicious users to access unauthorized data [27]. Three variants of the PoW scheme, s-PoW, s-PoW1, and s-PoW2, are defined in [29]; they offer satisfactory computation and I/O efficiency on the user side but significantly increase the I/O burden on the remote cloud, a problem overcome in [28] by establishing a balance between server-side and user-side efficiency.

    • Provable data possession: The provable data possession (PDP) scheme statistically guarantees the exactness of data integrity verification of cloud data on untrusted cloud servers without downloading the data, and it restricts data leakage attacks on cloud storage. In [104], the related work describes aspects of the PDP technique from a variety of system design perspectives, such as computation efficiency, robust verification, and lightweight, constant communication cost. In [112], a certificateless PDP scheme is proposed for public cloud storage to address the key escrow and key management problems of general public-key cryptography and to solve the security problem of [113, 120], in which verifiers were able to extract users’ original data during integrity verification.

    • Proof of retrievability verification: Proof of retrievability (PoR) ensures data intactness in remote cloud storage. PoR and PDP perform similar functions, with the difference that a PoR scheme can recover faulty outsourced data, whereas PDP only supports data integrity checking and data availability for clients [108]. In [109], the IPOR scheme is introduced, which ensures a 100% retrieval probability for corrupted blocks of the original data file. The DIPOR scheme also supports retrieval of partial health records along with data update operations [115].

    • Auditing verification: Verification of the cloud data outsourced by the data owner is known as the audit verification process. Data integrity schemes support two types of verification: private auditing verification (performed between the CSP and the data owner, i.e., the cloud user) and public auditing verification (the cloud user hires a TPA to reduce computational and communication overhead on the owner’s side, and verification is performed between the CSP and the TPA) [122]. Privacy-preserving public auditing [83, 122], certificateless public auditing [125], optimized public auditing [123], bitcoin-based public auditing [88], the S-audit public auditing scheme [108], shared data auditing [83], dynamic data public auditing [126], non-privacy-preserving public auditing [127], and digital-signature-based (BLS, hash table, RSA, etc.) public auditing [88, 119, 128] are some types of public auditing schemes. A private auditing scheme, known as the SW method, was first proposed in [110] and further examined in several research works [87, 116].

Table 3 Taxonomy of applicable Data Integrity Phases

Characteristics of data integrity technique

This review article focuses on several quality features of data integrity schemes, each of which is individually of prime importance for cloud storage security. These are:

  • Public Auditability: The auditability scheme allows a TPA to examine the accuracy of the data owner’s outsourced data stored in the cloud, at the data owner’s request [94, 95].

  • Audit correctness: The proof message from the CSP can pass the TPA’s validation test only if the CSP and TPA are honest and the CSP and data owner properly follow the predefined data-storing process [89, 78].

  • Auditing soundness: The one and only way to pass TPA’s verification test is that CSP has to store the data owner’s entire outsourced data at cloud storage [90].

  • Error localization at block level: It helps to find out error blocks of a file in cloud storage during verification time [89].

  • Data Correctness: It helps to rectify erroneous data blocks using the information in the available replica blocks in cloud storage [89].

  • Stateless Auditor: A stateless auditor does not need to maintain, store, or update previous verification results for future use [88, 95].

  • Storage Correctness: A CSP may prepare a report claiming that all data is fully stored in cloud storage even if the data is partially tampered with or lost. Therefore, the system needs to guarantee data owners that their outsourced data is the same as what was previously stored [129].

  • Robustness: In a probabilistic data integrity strategy, errors even in small amounts of data should be identified and rectified [39].

  • Unforgeability: Only authenticated users can generate a valid signature/metadata on shared data [129].

  • Data Dynamic Support: It allows data owners to insert, edit, and delete data in cloud storage while maintaining the same level of integrity verification support as before [89].

  • Dependability: Data should remain available while all the file blocks are being managed [89].

  • Replica Auditability: It allows the TPA to examine, on demand from the data owner, the replicas of the data file stored in cloud storage [89].

  • Light Weight: Because of the large number of blocks and the presence of multiple users in the system, the signature process time should be short to reduce the computational overhead of clients [88, 97].

  • Auditing Correctness: It ensures that the response message from the CSP can pass the TPA’s verification test only when the CSP has properly stored the outsourced data in cloud storage [97].

  • Privacy Protection: During verification, the auditing scheme should not expose a user’s identity information to an adversary [90, 97].

  • Efficient User Revocation: Revoked users are no longer able to upload any data to cloud storage and cease to be authorized users [78].

  • Batch Auditing: In public auditing schemes, batch auditing allows the TPA to perform multiple auditing tasks from different cloud users at once [95].

  • Data Confidentiality: The TPA cannot acquire the actual data during data integrity verification [90].

  • Boundless Verification: Data owners do not impose on the TPA any condition fixing the number of data integrity verification interactions [88].

  • Efficiency: The size of the test metadata and the test time on multi-owner outsourced data in cloud computing are both independent of the number of data owners [95].

  • Private Key Correctness: A private key passes the cloud user’s verification test only if the Private Key Generator (PKG) has sent the correct private key to the cloud user [90].

  • Blockless Verification: The TPA does not need to download entire blocks from cloud storage for verification [95].

Challenges of data integrity technique in cloud environment

Security challenges of data integrity technique in cloud computing always come with some fundamental questions:

  • How will outsourced data be kept safe on a remote server, and how will data be protected from any loss, damage, or alteration in cloud storage?

  • How will cloud data be secured if a malicious user is present inside the cloud?

  • In which location of the shared storage will outsourced data be stored?

  • Will access to the cloud data be limited to authorized users only, with complete audit verification available?

All of the above questions are associated with the privacy preservation of data integrity schemes, which is why data integrity in cloud computing remains a rapidly growing challenge. Refer to Table 4 for existing security challenges of data integrity techniques and their corresponding solutions.

  1. a)

    Risk to integrity of data: This security challenge is divided into three parts:

    • While cloud services are being accessed globally, they are hampered by many malicious attacks if the integrity of the database, network, etc. is not properly maintained.

    • Data availability and integrity problems occur if unauthorized changes are made to the data by the CSP.

    • Segregation of data among cloud users in shared cloud storage is another data integrity problem. Therefore, an SLA-based patch management policy, standard validation techniques against unauthorized use, and adequate security parameters need to be included in the data integrity technique [131].

  2. b)

    Dishonest TPA: A dishonest TPA has two prime intentions:

    • TPA can spoil the image of CSP by generating wrong integrity verification messages.

    • TPA can exploit confidential information with the help of malicious attackers through repeated verification interaction messages with cloud storage.

    Hence, an audit message verification method must be included in a data integrity verification scheme to continuously analyze the intentional behavior of the TPA.

  3. c)

    Dishonest CSP: An adversarial CSP has three motives: i) the CSP tries to retrieve either the original content of the entire data file or all block information of the data file, and this leaked information is used by the CSP for business profit; ii) the CSP can modify the actual content of a file and use it for its own purposes, and in both cases the data owner cannot detect the actual culprit; iii) the CSP always tries to maintain its business reputation even if the owner’s outsourced data is partially tampered with or lost. For these reasons, an acknowledgement verification method, an error-data detection method, and an error-data recovery method should be included in a data integrity scheme to maintain the intactness and confidentiality of data [89, 132].

  4. d)

    Forgery Attack at Cloud Storage: An outside attacker may forge the proof message that the CSP generates for the blocks indicated in the challenge message in order to respond to the TPA. A malicious auditor may likewise forge an audit proof that passes data integrity verification [88, 90].

  5. e)

    Data modification by an insider malicious user into cloud storage: An insider malicious user can subvert or modify a data block at will and can fool the auditor and data owner into trusting that the data blocks are well maintained in cloud storage, even while altering the interaction messages on the network channel. Hence, a data confidentiality scheme or data obfuscation technique should be included in the data integrity technique [92].

Table 4 Security Challenges of Cloud Storage with its Solutions

Desired design challenges of data integrity strategy

Below are the main design issues for data integrity schemes:

  1. a)

    Communication overhead: Communication overhead comprises the total outsourced data transferred from the client to the storage server, the transfer of the challenge message to the CSP, the transfer of the proof message to the TPA, and the transfer of the audit message to the client. Table 5 compares the communication overhead incurred during public auditing by the DO, CSP, and TPA. Since the DO always sends either the original file, an encrypted file, or an encrypted file with signatures to the cloud server, most articles consider only the communication overhead of creating challenge and challenge-response messages, which is not included in the DO’s communication overhead.

  2. b)

    Computational overhead: Data preprocessing, signature generation, and audit message verification on the data owner’s (or trusted agent’s) side; challenge message generation, data integrity verification, and audit message generation on the TPA’s side; and proof message generation on the CSP’s side all contribute to computational overhead. In [97], the computational overhead of the client, CSP, and TPA is lower than in [124] because the ZSS signature requires less exponentiation and hash calculation than the BLS signature. Table 6 compares the computational overhead incurred during public auditing by the DO, CSP, and TPA. Here, Pair denotes a bilinear pairing operation, Hash a hash function, Mul a multiplication, Add an addition, Exp an exponentiation, Inv an inverse operation, Encrypt an encryption, Decrypt a decryption, and Sub a subtraction. An illustrative cost breakdown is sketched after this list.

  3. c)

    Storage overhead: The entire file or its blocks, metadata, signatures, and replica blocks must be stored in cloud storage and on the client side, depending on the policy of the system model. The cloud user’s storage overhead during audit verification should be small in order to avoid extra storage cost [36].

  4. d)

    Cost overhead: It denotes the combined cost of communication overhead, computational overhead, and storage overhead.

  5. e)

    Data Dynamic Analysis: Data stored in cloud storage is not always static. Alteration of data, deletion of data, and addition of new data to old data are basic operations that arise in practice because of clients’ dynamically changing demands. Therefore, data integrity verification should be performed after every dynamic operation on stored data. In [93], the insertion, deletion, and update times for a growing number of data blocks are lower than in [123] owing to the smaller depth of the authenticated structure of the dynamic data integrity auditing scheme.
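As an illustration only (these expressions are not taken from the surveyed works), the per-audit computational overhead of a typical BLS-based public auditing scheme, with n file blocks and a challenge over c of them, can be summarized roughly as:

$$
\begin{aligned}
T_{DO} &\approx n\,(T_{Hash} + 2\,T_{Exp} + T_{Mul}) &&\text{(one authenticator per block)}\\
T_{CSP} &\approx c\,(T_{Exp} + 2\,T_{Mul} + T_{Add}) &&\text{(aggregating the proof } (\sigma,\mu)\text{)}\\
T_{TPA} &\approx 2\,T_{Pair} + (c+1)\,T_{Exp} + c\,(T_{Hash} + T_{Mul}) &&\text{(pairing-based verification)}
\end{aligned}
$$

The point of such a breakdown is that the owner’s cost grows with the file size n, while the auditing cost grows only with the (small, constant) challenge size c.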

Table 5 Comparison of Communication Overhead between DO, CSP and TPA During Auditing Phase
Table 6 Comparison of Computational Overhead between DO, CSP, and TPA During Auditing Phase

Comparative analysis of data integrity strategies


Table 7 Comparative Analysis of the data integrity strategy of cloud storage

This section presents a comparative study of data integrity strategies. Table 7 shows a comparative analysis of data integrity strategies for cloud storage, with their expected design methods and limitations. Zang et al. [88] introduced a random masking technique into a public auditability scheme, applied while computing the proof information. Owing to the linear relationship between the data blocks and the proof information, malicious adversaries are able to undermine the effectiveness of the SWP scheme. In the SWP scheme, the CSP generates proof information and sends it to the TPA for verification. An uncertain situation may arise when the CSP is intruded upon by an external malicious adversary that can alter every data block’s information. To deceive the TPA and pass the verification test, a malicious adversary can eavesdrop on the challenge message and intercept the proof message. The SWP scheme therefore assumes that the TPA is trustworthy, which is not realistic in practice. To defend against external malicious adversaries without a protected channel, the authors proposed a nonlinear disturbance code as a random masking technique, turning the linear relationship between data blocks and proof messages into a nonlinear one. The authors applied a BLS hash signature to each block to help the verifier perform random block verification. This public auditability verification technique assures boundless verification, efficiency, a stateless auditor, and soundness, with two limitations: because acknowledgement verification of data storing is missing, the reputation of the cloud service may be damaged, and the scheme is applicable only to static data.
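For intuition, an illustrative formulation in the spirit of the random masking idea (not the exact construction of [88]): without masking, the CSP returns the linear combination $\mu=\sum_{i\in C}\nu_i m_i$ of the challenged blocks $m_i$ with challenge coefficients $\nu_i$, so enough challenge-response pairs let a curious TPA or eavesdropper solve a linear system for the blocks. Masking blinds this value with a fresh random $r$ in each audit:

$$
\mu = \sum_{i\in C}\nu_i m_i, \qquad R = u^{r}, \qquad \gamma = h(R), \qquad \mu' = r + \gamma\,\mu \bmod p,
$$

and the CSP returns $(R,\mu',\sigma)$ instead of $\mu$; since $r$ is never reused, the responses no longer form a solvable linear system in the block values, while the verifier can still check the pairing equation using $R$ and $\mu'$.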

M. Thangavel et al. [89] proposed a novel auditing framework that protects cloud storage from malicious attacks. The technique is based on a ternary hash tree and a replica-based ternary hash tree, and it supports dynamic block updating, data correctness with error localization, data insertion, and data deletion. W. Shen et al. [90] introduced an identity-based data auditing scheme that hides sensitive information at the block level to secure cloud storage during data sharing; a sanitizer sanitizes the data blocks containing sensitive information. A chameleon hash and an unforgeable chameleon hash signature do not provide blockless auditing and require high computational overhead, so this PKG-based signature method is used instead to assure blockless verification and reduce computational overhead. This public auditability verification technique assures auditing soundness, private key correctness, and sensitive-information hiding; one limitation is that, because audit messages are missing, the TPA can deceive users about data verification. S. Mohanty et al. [85] introduced a confidentiality-preserving auditing scheme by which cloud users can easily assess the risk of the used service from the audit report maintained by the TPA. This scheme has two benefits: it helps to check the integrity of cloud users’ data, and it verifies the TPA’s authentication and non-repudiation. The authors proposed a system model that supports the basic criteria of cloud security auditing, confidentiality, and availability; the HMAC-MD5 technique is applied to the metadata to maintain data privacy on the TPA’s side. Chen et al. [61] proposed a MAC-oriented data integrity technique based on the metadata verification method, which reinforces auditing correctness and helps protect stored data in cloud storage from MitM and replay attacks. However, the scheme needs improvement because, after several repeated challenge-proof exchanges, the CSP can obtain the actual block elements of the user’s confidential data.

S. Hiremath et al. [87] established a public blockless data integrity scheme that takes a fixed time to check data files of variable size. The authors use the AES algorithm for data encryption and the SHA-2 algorithm for the data auditing scheme, and they employ random masking and Homomorphic Linear Authenticator (HLA) techniques to ensure the confidentiality of stored data during auditing. However, this scheme applies only to static data stored in cloud storage and needs to be extended to dynamic data operations. T. Subha et al. [86] introduced the idea of public auditability to check the correctness of stored data in cloud storage, assuming the TPA to be a trusted entity. Data privacy mechanisms such as Knox and Oruta are adopted to raise the security level of cloud storage and resist active adversary attacks, and a Merkle hash tree is used over the data block elements. B. Shao et al. [93] established a hierarchical Multiple Branches Tree (HMBT) that secures the correctness of users’ data auditing, fulfills the cryptographic criteria of data privacy, and protects against forgery and replay attacks. The scheme uses a special hash function to produce BLS signatures on block elements and supports public auditing. DCDV is a concept based on a hierarchical time tree and a Merkle hash tree; because simultaneous execution of access control and data auditing mechanisms rarely occurs in attribute-based cryptography, the Dual Control and Data Variable (DCDV) data integrity scheme is proposed in [132]. This scheme solves the private-data leakage problem via the user’s secret key and assures the correctness of the auditing scheme. A PDP technique is proposed for a data integrity verification scheme that supports dynamic data update operations, reduces the communication overhead of fine-grained dynamic updates of big data, increases the protection level of stored data in cloud storage, resists collusion attacks, and supports batch auditing [114]. Another novel public auditing scheme, based on identity-based cryptography, ensures low computational overhead when handling revoked users who have possessed all file blocks; it fulfills the cryptographic criteria of soundness, correctness, security, and efficient user revocation [78].
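Several of the constructions above (the Merkle hash tree in [86], the HMBT in [93], and the VMHT discussed later) authenticate blocks with a hash tree: the verifier keeps only the root, and the prover returns a block together with its authentication path. A minimal, illustrative sketch in standard-library Python (a plain binary Merkle tree, not any specific scheme from the surveyed papers):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(blocks):
    """Return all levels of a binary Merkle tree, leaves first; levels[-1][0] is the root."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                      # duplicate the last node when the level is odd
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def auth_path(levels, index):
    """Sibling hashes from leaf `index` up to the root."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        path.append((level[index ^ 1], index % 2 == 1))   # (sibling hash, node is right child?)
        index //= 2
    return path

def verify(root, block, path):
    node = h(block)
    for sibling, node_is_right in path:
        node = h(sibling + node) if node_is_right else h(node + sibling)
    return node == root

blocks = [f"block-{i}".encode() for i in range(5)]
levels = build_tree(blocks)
root = levels[-1][0]                             # only the root is kept by the verifier
proof = auth_path(levels, 3)                     # prover returns block 3 plus this path
print(verify(root, blocks[3], proof))            # True: block 3 is intact
print(verify(root, b"tampered", proof))          # False: any modification is detected
```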

Some research works use the BLS cryptographic signature, which has the shortest length among the available signatures [88]. This signature is based on a special hash function and is probabilistic rather than deterministic, but it incurs higher overhead from exponentiation and hash calculation. To improve signature efficiency and reduce computational overhead, the ZSS signature was proposed [97]; this integrity scheme supports cryptographic criteria such as privacy protection, public auditing correctness, and resistance to message forgery attacks. An attribute-based data auditing scheme is proposed in [137], which proves data correctness and soundness based on the discrete logarithm and Diffie-Hellman key exchange assumptions; it maintains the privacy of cloud users’ confidential data and resists collusion over blocks during audit verification. An ID-based remote data auditing scheme (ID-PUIC) is introduced in [98], which achieves efficiency, security, and flexibility with the help of the Diffie-Hellman problem. It also supports ID-based proxy data upload when users are restricted from accessing public cloud servers, and it shows lower computation costs for the server and TPA than [107]. Both research works [105, 126] address public checking of the intactness of outsourced data while reducing the verifier’s communication and computational costs; they also support dynamic data auditing, blockless verification, and privacy preservation.

Future trends in data integrity approaches

As directions for further research, we discuss here the future of data integrity schemes to enlarge the scope of cloud data security. Newly emerging trends in data integrity schemes are listed below. In [39], the authors have already discussed and shown the evolutionary trends of data integrity schemes through a timeline representation from 2007 to 2015, presenting possible scopes of the data integrity strategy. Hence, we show a visual representation of all probable trends of integrity schemes from 2016 to 2022 in the timeline infographic of Fig. 2.

  1. a)

    Blockchain-based data integrity: Blockchain is a decentralized, peer-to-peer technology. It supports scalable and distributed environments in which all data are treated as transparent blocks, each containing the cryptographic hash of the previous block and a timestamp, so that no single data block can be altered without modifying all subsequent linked blocks. These features improve the performance of cloud storage and maintain the trust of data owners by increasing data privacy through the Merkle tree concept (a minimal hash-chaining sketch follows this list). In [138], a distributed virtual agent model based on mobile agent technology is proposed to maintain the reliability of cloud data and to ensure trust verification of cloud data across multiple tenants. In [139], a blockchain-based generic framework is proposed to increase the security of provenance data in cloud storage, which is important for securely accessing the log information of cloud data. The works in [140,141,142] all share the intention of using blockchain technology to enhance data privacy and maintain data integrity in cloud storage. Table 8 shows how blockchain technology can be used to overcome some issues of cloud storage.

  2. b)

    Data integrity in fog computing: Privacy protection schemes are generally unable to completely resist insider attacks in cloud storage. In [147], a fog-computing-based TLS framework is proposed to maintain the privacy of data on fog servers. Fog computing, an extension of cloud computing first introduced in 2011 [148], offers three advantages: strong real-time behavior, low latency, and broader geographical distribution. Combined with cloud computing, it ensures the privacy of data on fog servers and is a powerful supplement for maintaining data privacy preservation in cloud storage.

  3. c)

    Distributed Machine Learning Oriented Data Integrity: In artificial intelligence, maintaining the integrity of training data in a distributed machine learning environment is a rapidly growing challenge owing to network forgery attacks. In [136], a distributed machine learning-oriented data integrity verification scheme (DML-DIV) is introduced to assure the intactness of training data and to secure the training data model. A PDP sampling auditing algorithm is adopted to resist tampering and forgery attacks, and the discrete logarithm problem (DLP) is used in the DML-DIV scheme to ensure privacy preservation of the training data during the TPA’s challenge verification. To reduce the key escrow problem and certificate costs, identity-based cryptography and key generation technology are employed.

  4. d)

    Data Integrity in Edge Computing: Edge computing is an extension of distributed computing. Cache data integrity is a new concept in edge computing, built on cloud computing, that optimizes data retrieval latency on edge servers and mitigates the centralization problems of cloud storage servers. The edge data integrity (EDI) concept was first proposed in [149] to handle the auditing of app vendors’ cache data on edge servers, a challenging issue in dynamic, distributed, and volatile edge environments; that work proposes the EDI-V model, which uses a variable Merkle hash tree (VMHT) structure to audit cache data on large-scale servers by generating integrity information for its replicas. In [150], the EDI-S model is introduced to verify the integrity of edge data and to localize corrupted data on edge servers by generating digital signatures for each edge replica.
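As a minimal illustration of the hash-chaining property described in the blockchain item above (illustrative only; this is not any specific framework from [138,139,140,141,142]), each block stores the hash of its predecessor and a digest of its own records, so altering one record invalidates every later block in the chain:

```python
import hashlib
import json
import time

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class Block:
    def __init__(self, records, prev_hash: str):
        self.records = records                 # e.g., integrity proofs or audit log entries
        self.prev_hash = prev_hash
        self.timestamp = time.time()
        self.records_digest = sha256(json.dumps(records, sort_keys=True).encode())
        self.hash = sha256(f"{self.prev_hash}{self.records_digest}{self.timestamp}".encode())

def valid_chain(chain) -> bool:
    """A chain is valid if every block links to the recomputed hash of its predecessor."""
    for prev, cur in zip(chain, chain[1:]):
        recomputed = sha256(f"{prev.prev_hash}{prev.records_digest}{prev.timestamp}".encode())
        if cur.prev_hash != recomputed:
            return False
    return True

genesis = Block(["genesis"], prev_hash="0" * 64)
b1 = Block(["audit: file-42 OK"], prev_hash=genesis.hash)
b2 = Block(["audit: file-43 OK"], prev_hash=b1.hash)
chain = [genesis, b1, b2]
print(valid_chain(chain))                      # True
b1.records_digest = sha256(b"tampered")        # tamper with a middle block
print(valid_chain(chain))                      # False: the following block no longer links
```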

Fig. 2
figure 2

Timeline Infographic of Data Integrity

Table 8 Cloud Computing with Blockchain Technology and its merit with regards to storage level data integrity strategies

Conclusion

With the continuously growing popularity of attractive, cost-optimized cloud services, assuring data owners of the intactness of their outsourced data in cloud storage environments has become a serious security challenge. We have tried to highlight several issues and the corresponding solution approaches for cloud data integrity, providing both an overview and clear directions for researchers. The current state of the art in this research field can provide further milestones in several areas, such as cloud-based sensitive healthcare, secure financial services, and the management of social media platforms. In this paper, we have discussed the phases of data integrity, the characteristics of data integrity schemes, the classification of data integrity strategies based on the type of proposal, the nature of the data, and the type of verification scheme, and the desired design challenges of data integrity strategies based on performance overhead. We have also identified issues in physical cloud storage and attacks on storage-level services, along with mitigating solutions. Lastly, we have presented a timeline infographic of a variety of data integrity schemes and the future aspects of data integrity strategy to explore all the security directions of cloud storage.