1 Introduction

The expansion of communication and technology in the current era has sparked demand for information security. Establishing secure communication and preventing data from unauthorized access cryptography is essential. Cryptography refers to a range of approaches derived from mathematical concepts that are aimed at concealing information. Cryptography provides numerous algorithms to protect data from digital threats. Cryptographic algorithms such as classical encryption and modern encryption techniques are often used. These algorithms encrypt the data at the transmission end and decrypt it at the reception end. By encrypting the data, the original text (plain text) is transformed into meaningless random text (ciphertext). These ciphertext are transmitted through a network received by the receiver. During decryption, ciphertext is transformed into a meaningful message. The data are encrypted and decrypted using a randomized set of bits called an encryption key. However, these algorithms differ in their level of efficiency, which is determined by factors such as the successful retrieval of records and strong resistance to cyber-attack.

The encryption processes for all of these algorithms use key values. Cryptography can be segmented into symmetric and asymmetric methods. Symmetric key cryptography uses the same key for both enciphering and deciphering, whereas in asymmetric key cryptography, the different keys are public and private keys for encryption and decryption. The remainder of this paper is organized as follows. Section 2 presents the related work based on a literature survey. Section 3 describes the proposed algorithm and its process. Section 4 presents experimental results. Finally, a summary of the study is presented.

2 Literature Review

The playfair cipher is a well-known symmetric encryption technique invented by Charles Wheatstone in 1854 [1]. This algorithm enhances the security of the transmitted information by encrypting letters in pairs. The conventional Playfair algorithm is composed of a 5 × 5 key matrix that can encrypt a maximum of 26 letters, with I and J placed in the same matrix square. This is a substitution cipher that encrypts pairs of letters, instead of a single letter. The Playfair cipher uses a 5 × 5 table containing a keyword, and the remaining spaces are filled with letters of the alphabet in order of I/J in the same location. To perform encryption, the plain text is broken into bigrams and substituted with an equivalent ciphertext from the key table. Because the plaintext is broken into a diagraph, an odd number of letters in the message require a filler. The choice of filler can be infrequent. A weakness of the traditional playfair algorithm is that it consists of only 26 uppercase letters. Standard filler of ‘X’ is used. Not all other numbers, symbols, or special characters can be included in the matrix. In general, I and J occupy the same space in the matrix; hence, one letter must be omitted from the reconstruction [2].

There are a few reasons for the non-preference of traditional play fair ciphers, as it can be easily cracked if there is adequate text and frequency analysis of the bigrams is possible. Binary 4 × 4 Playfair [3] proposed increases the run time with an increase in the input size. The proposed algorithm can encrypt letters, numbers, symbols, and any type of media files. The rotation of the key matrix creates randomness in the algorithm, which enhances the security.

The extended Playfair approach fails to encrypt special characters but has the advantage of not using I and J in the same block [4]. The 6 × 6 payload cipher proposed by considering numbers, space, and alphanumeric characters cannot encrypt special characters [5, 6]. Furthermore, the authors did not consider the avalanche effect in their study [7]. The authors utilized an 8 × 8 matrix with the concept of graph labeling to safeguard the data against unauthorized access. An enhanced square-key matrix with LFSR showed an improvement in the avalanche effect with a matrix size of 10 × 10 [8].

The authors used a 10 × 10 matrix with all possible keys exhibiting a minimum avalanche effect [9]. All possible characters on the keyboard that allow the repetition of characters in the keywords are considered. A modified 16 × 16 matrix [10] that considers all possible numbers, characters, and letters was proposed. The main limitation is the size of the matrix. They used symmetric encryption with a single key. In addition to the above limitations, Playfair is prone to frequency-analysis attacks [11]. Hence, to overcome these limitations and mention the Playfair cipher in relation to data authentication and copyright protection in 5G networks [12] [13], its applications in wireless sensor networks [14], its use in homomorphic encryption [15], and steganography [16, 17] have sparked our interest, prompting us to propose a new modified cipher algorithm following the rules of Playfair. Although Playfair is a classical encryption technique, it demonstrates a better performance when addressing time complexity [18]. In addition, the Playfair cipher offers effective file-text security measures [19, 20]. The main limitation of [19] is that the processing times for encryption and decryption are linearly proportional.

Data Communication, processing, and storage are essential in current information systems. Despite advancements in technology, the storage and transmission of massive volumes of data are serious tasks that must be considered. To overcome these issues, several compression techniques have been proposed to reduce the size of data by demanding a reduced area. Data Compression is the process of coding data with fewer bits than is usual. These techniques depend purely on the nature of data to be compressed. Several compression methods such as run length encoding, Huffman coding, Arithmetic Coding, LZW, JPEG, JPEG2000 have been proposed and implemented [21].

The proposed method uses the Lempel–Ziv–Markov chain algorithm (LZMA), which uses a constantly changing dictionary of strings to adaptively compress a stream of data by replacing common substrings by up to upto4Gb. It is essential to increase storage capacity, speed up file transfer, and reduce the cost incurred for hardware storage and bandwidth. The LZMA performs lossless data compression. Compared to other compression algorithms, LZMA has a higher compression ratio. A compressed stream is a stream of bits encoded using an adaptive binary-range coder. Although LZMA is primarily designed for compression, it can be adapted for use in encryption scenarios to enhance the data security. The fundamental idea is to utilize LZMA to compress plaintext data before encrypting them, resulting in an improved encryption efficiency.

Improved Encryption Efficiency: By applying LZMA compression to plaintext data before encryption, the resulting compressed data stream often contains a high frequency of recurring patterns and a reduced entropy [29]. Encryption algorithms operate more effectively when dealing with structured and predictable data rather than random data. The ability of LZMA to identify and eliminate redundancy helps create a more structured input for encryption, enhancing the overall efficiency.

1. Improved Energy Efficiency

By applying LZMA compression to plaintext data before encryption, the resulting compressed data stream often contains a high frequency of recurring patterns and a reduced entropy. Encryption algorithms operate more effectively when dealing with structured and predictable data rather than random data [30]. The ability of LZMA to identify and eliminate redundancy helps create a more structured input for encryption, enhancing the overall efficiency.

2. Enhanced Security

One of the critical advantages of incorporating LZMA [22] in encryption is increased security through obfuscation. The compressed output of LZMA does not reveal any identifiable patterns in the original plaintext. This feature makes it more difficult for potential attackers to gain insight into the nature of the data, thereby adding an additional security layer.

3. Bandwidth Optimization

In scenarios in which limited bandwidth is a concern, LZMA compression can significantly reduce the size of the data to be transmitted. A smaller data size directly translates to reduced bandwidth requirements and faster transmission times. By compressing the data before encryption, the LZMA aids in optimizing the utilization of network resources.

Trade-offs: It is important to consider trade-offs when integrating LZMA with encryption. The LZMA compression is a computationally intensive process that requires additional resources and time. Therefore, the computational overhead introduced by compression must be balanced by the potential benefits gained, in terms of enhanced encryption efficiency and security [31].

3. TRADE-OFFS

It is important to consider trade-offs when integrating LZMA with encryption. The LZMA compression is a computationally intensive process that requires additional resources and time. Therefore, the computational overhead introduced by compression must be balanced by the potential benefits gained, in terms of enhanced encryption efficiency and security.

3 Proposed modified play fair algorithm

The proposed algorithm generates 7 × 7 Playfair matrix to encrypt and decrypt “.txt” file. This matrix covers the upper- and lower-case letters, numbers, and special characters. A flow diagram of proposed model for encryption is divided into encryption input module and output module as shown in the Figs.1 and 2. P(x) is the original input and C(x) is the decrypted value. Similarly, the decryption processes also divided into decryption input and output module as shown in Figs. 3 and 4.

Fig. 1
figure 1

Proposed encryption input module

Fig. 2
figure 2

Proposed Encryption output module

Fig. 3
figure 3

Proposed Decryption Input Module

Fig. 4
figure 4

Proposed Decryption Output Module

The proposed Playfair matrix encryption and decryption algorithm follows the following steps:

  1. 1)

    By dividing the plaintext into sections, each section can have two similar or dissimilar characteristics.

  2. 2)

    In case of similar characteristics, a filler character was added between them. The choice of filler used in the proposed algorithm was a random variable.

  3. 3)

    Encryption was performed following the rules of the traditional Playfair algorithm, except that the Playfair matrix table was generated based on the modified Septenary Quadrate Pattern, as shown in Table 2.

  4. 4)

    The number of filler characters and location of filler characters used in the cipher text were embedded in the cipher text. This enables decryption of the cipher text to obtain information regarding the fillers.

  5. 5)

    The size of the encrypted file increased because of the number of fillers added. Hence, to reduce the size of the encrypted file for transmission, the Lempel–Ziv–Markov chain algorithm (LZMA) compression algorithm, which retains the original size of the text file, is used for encryption.

  6. 6)

    The encoding schemes used are ANSI, UTF-8, UTE-16BE, UTF-16LE, UTF-32BE, UTF-32LE, and ASCII. To enhance the security, a random encoding scheme was chosen for each run.

Here is a high-level mathematical representation of the proposed process

Let the variables be:

  • P: Plaintext

  • C: Ciphertext

  • F: Filler character

  • M: Playfair matrix table

  • E: Encryption function

  • D: Decryption function

  • LZMA: LZMA compression algorithm

  • Enc: Encoding scheme

1) Divide the plaintext into sections

This step can be represented by dividing plain text P into pairs (P1, P2) of similar or dissimilar characters.

2) Add filler characters

If P1 and P2 have similar characteristics, the filler character F is inserted between them.

3) Add filler characters

The encryption function E encrypts each pair of characters (P1, P2) using the modified Playfair algorithm with Playfair matrix table M as mentioned in Eq.1

$$C\, = \,E\left( {P,M} \right)$$
(1)

4) Embedding filler character information

The number of filler characters and their locations used in ciphertext.

5) Compression using lzma

The LZMA compression algorithm was applied to ciphertext C to reduce its size, while retaining the original text size as mentioned in Eq.2.

$$C\_compressed\, = \,LZMA,compress(C)$$
(2)

6) Coding scheme selection

A random encoding scheme, Enc, was chosen for each run to enhance security. The selected encoding scheme was used to encode the compressed ciphertext C _compressed as given in Eq.3.

$$C\_encoded\, = \,Enc.encode(C\_compressed)$$
(3)

The LZMA (Lempel-Ziv-Markov chain Algorithm) compression algorithm used in our algorithm involves several steps and equations. These equations and concepts provide a general understanding of LZMA compression algorithms. However, the LZMA is a complex algorithm with multiple variations and optimizations.

1) Probability computation

The LZMA algorithm utilizes a context modeling approach in which the probabilities of symbols are calculated based on previous symbols in the input data. The probability of a symbol 'S' occurring after context 'C' is calculated as given in the Eq.4.

$$P\_old(S|C)\, = \,\frac{{Count\left( {C,S} \right)}}{Count\left( C \right)}\,$$
(4)

2) Probability updating

As new symbols are processed, the probabilities are updated using adaptive models. One common method for updating probabilities is called the “mixing probabilities” or "adaptive probability update" mentioned in Eq.5.

$$P\_new(S|C)\, = \,\left( {1 - \alpha } \right)\, * \,P\_old\left( {S|C} \right) + \alpha * P\_uniform$$
(5)

Here, P_old(S|C) is the previous probability of symbol ‘S' occurring after context 'C, ’ P_new(S|C) is the updated probability, P_uniform is a uniform probability distribution, and α is a mixing parameter between 0 and 1.

3) Encoding and decoding equations

The LZMA algorithm uses variable length encoding to represent symbols. The exact equations for encoding and decoding depend on the specific implementation of the LZMA algorithm, and may involve concepts such as range coding, bit manipulation, and data structures such as binary trees or hash tables.

4) Dictionary and matching

The LZMA algorithm employs a dictionary to store previously observed sequences of symbols. The matching process involves determining the longest match of symbols in the dictionary for the current input. The LZMA algorithm uses various data structures and algorithms to search and update the dictionary efficiently.

5) Compression ratio

The compression ratio (CR) is a measure of the compression achieved by the LZMA algorithm. It is calculated as per Eq.6. Here, Compressed Size refers to the size of the compressed data, and Uncompressed Size refers to the size of the original uncompressed data.

$$CR\, = \,\frac{Compressed\,Size}{{Uncompressed\,Size}}$$
(6)

The most significant advantage of the proposed algorithm is that it can handle files of any size and size. The encryption and decryption times of a file depend on its size. Frequency analysis is typically used to break traditional play-fair algorithms. The proposed method was proven to withstand frequency analysis attacks because of its 7 × 7 matrix size, covering 98 characters with unpredictable random fillers.

The proposed algorithm is extremely sensitive to the keys. The randomness and predictability of filler generation results in good key sensitivity. This is because the fillers are generated randomly, and information regarding the position of the fillers is highly random in cipher text. In addition, the selection of fillers is random for the entire text file and is not the same as that in the Playfair algorithm. This ensures that the proposed algorithm is secure against brute-force attacks. The lookup table computation for the proposed algorithm appears to be more complex, significantly increasing its efficiency.

The main restriction of the play fair cipher is that it encrypts only 26 characters. Special characters and numbers could not be encrypted using a 5 × 5 key matrix. Furthermore, keywords can take only 26 letters without duplicates. We propose a novel method for communicating data securely by using a 7 × 7 key matrix. The advantage of the proposed seasonal quadrate model is that it is designed to hold all possible characters present on a normal keyboard that is used for commercial purposes. This created a large key size and combination. In the proposed model, in addition to previous studies and matrices, a new pattern is created using a biograph to fill the elements in each location. The nomenclature for the generation of a pattern for a defined key is similar to that of the traditional 5 × 5 matrix; however, other elements in the key matrix are positioned randomly, which creates complexity in defining the elements of the key matrix. Hence, the key size and the combination of key matrices are significantly improved. The default key matrices are listed in table 1.

Table 1 Default septenary quadrate pattern

Once a key table is created, plain text encryption can be performed. During this process, the size of the cipher text was large because fillers were included. The addition of filler indices to encrypted message content serves as a source of confusion for intruders, thereby improving confidentiality. To reduce the size of the enciphered text file and induce security, the cipher text was compressed. Prior to compression encoding, the encoding type was randomized. This increases the redundancy in the cipher text, making it difficult for the intruder to access the data.

The encoding schemes used were ANSI, UTF-8, UTE-16BE, UTF-16LE, UTF-32BE, UTF-32LE, and ASCII. The compression algorithm reduces the size of the encrypted document by providing lossless compression with a high compression ratio. Likewise, the fillers used in this study were highly randomized, with a reduced pattern size compared with previous studies. The digraph character and substitution character accession times were significantly reduced.

The proposed algorithm can also be applied to words, sentences, and paragraphs. The encrypted file was compressed using a random encoding scheme to ensure size reduction for easy transmission. This methodology uses the brute-force and avalanche effects. The avalanche effect is a desirable property of any cryptographic algorithm, that is, a small change in the plaintext (or key) should create a significant change in the ciphertext as mentioned in Eq.7.

$$Avalanche\,Effect\, = \,\frac{Number\,of\,flipped\,bits\,in\,cipher\,text}{{Total\,number\,of\,bits\,in\,cipher\,text}}$$
(7)

The deciphered text is authentic since the fillers used are removed at the decryption.

The recorded outputs of the samples are presented in Table 2. The compression ratio, as given in Eq. 8, was approximately 0.4 to 0.8, depending on the size of the ciphertext. Using the proposed algorithm, a good compression ratio is achieved for cipher text with a size greater than 10 KB. It was observed that when the size of the text file increased, the compression ratio improved proportionally. The compression ratio achieved here is similar to that of Ref.[24].

$$Compression\,ratio\, = \,\frac{Size\,of\,cipher\,text - size\,of\,encoded\,cipher\,text}{{Size\,of\,Cipher\,text}}$$
(8)
Table 2 Compression ratio of encrypted message

4 Results and discussion

All simulations of the modified Playfair algorithm were carried out on an Intel core i4, 64-bit processor with 4 GB RAM and a processor speed of 1.10 GHz. The algorithm was coded using Python3.6, using Spyder IDE. The efficiency and security of the proposed method are demonstrated using the compression ratio and avalanche percentage, as tabulated in Tables 3 and 4. The CIA triad emphasizes the policies to be followed for information security within an organization; hence, for a good avalanche effect, cipher text should always satisfy a high avalanche value [23].

Table 3 Results depicting the avalanche effect
Table 4 Comparative Analysis

The proposed algorithm encrypts plain text, and the encrypted plain text is compressed using the Lempel–Ziv–Markov chain compression algorithm (LZMA) for easy transfer of cipher text. The algorithm was evaluated for randomly generated 110 text files of varying sizes from 2 to 200 KB. All these files were encrypted with keys 1 and 2 with varying character lengths, and it was observed that the generated cipher text was not identical but was of the same size.

The proposed algorithm produced an avalanche effect of more than 65% [25]. The experiment produced a drastic change in the output, even for a single-bit change in the input. Even for the same key, a large mismatch in the cipher text was found when evaluated for 100 samples, as shown in Table 3.This can withstand any frequency attack.

The sample output is shown in Fig. 5. This is a comparative output of the same plain text using different keys. Table 4 shows a comparison of the proposed algorithm with existing playfair techniques. Compared to existing techniques, our proposed method exhibits a 10% higher avalanche effect. Furthermore, a larger file size is considered. Larger file sizes owing to complex algorithms are inevitable. Future research should focus on reducing the memory area in order to make it more suitable for IOT devices.

Fig. 5
figure 5

Sample output for two different keys for the same plain text

Tradational playfair algorithm works on a matrix size of 5 × 5 and with limited characters of 25, while this proposed model can handle characters in a 7 × 7 matrix of 98 characters. This enables text to be encrypted in variety of ways. Listed are the advantages of the proposed algorithm:

  1. 1.

    All the keyboard characters are included within the pattern.

  2. 2.

    The key size and key combinations are larger.

  3. 3.

    The fillers used are highly randomized when compared to traditional playfair algorithm.

  4. 4.

    The digraph character accession time and the substitution character accession time are highly reduced.

  5. 5.

    The Encryption and Decryption of a word, sentence and paragraphs are all possible.

  6. 6.

    The PlayFair cipher algorithm is enhanced using the proposed model with which text file encryption has been possible.

  7. 7.

    The addition of filler indices into the enciphered message content also serves as a source of confusion for the intruder.

  8. 8.

    The encrypted file is again compressed by using random encoding scheme to ensure size reduction for easy transmission.

  9. 9.

    The deciphered text is authentic since the fillers used are removed at the decryption.

5 Conclusion

In the modified Playfair algorithm, the traditional rules of the Playfair cipher are maintained, but the dimensions of the matrix are increased to 7 × 7. Unlike the traditional PlayFair cipher, there are no restrictions on the size of the keys used in this algorithm. This modification allows the algorithm to effectively handle text files of any size. During testing, the proposed algorithm demonstrated an avalanche effect ranging from 65 to 93.5%. The avalanche effect refers to the property of a cryptographic algorithm in which a small change in the input or key produces a significantly different output. A high avalanche effect indicates that the algorithm is robust and sensitive to changes in plaintext or key.

In addition, the algorithm achieved a compression ratio of 0.75 for a file size of 100 KB. This compression ratio signifies that the size of the cipher text is reduced, making it suitable for efficient file transfers. The LZMA compression algorithm is used in this process to further minimize the size of the cipher text. A notable advantage of the modified algorithm is its resistance to brute force attacks. Brute-force attacks involve trying all possible combinations of keys until the correct combination is obtained. Resistance of the algorithm to such attacks for the 49! Key combinations ensure the security of encrypted data. Moreover, the algorithm satisfies both the strict plaintext avalanche criterion and strict key avalanche criterion. These criteria evaluate the algorithm's ability to exhibit significant changes in the output cipher text when either the plaintext or the key is modified. By meeting these criteria, the algorithm ensures a high level of security and can be implemented as a security protocol in various applications including low-power embedded devices.