1 Introduction

Face age progression is becoming a widely used technique in the modern era, as it serves numerous applications: in law enforcement, where face age progression helps to find missing children or missing persons from a previous photo [32, 43, 61]; in face recognition [21, 38, 41]; and in facial analysis for e-commerce platforms [16, 41]. Further, a biometric system identifies a person based on particular characteristics, and the human face is currently the leading biometric modality [41], because it is unique to each individual and provides accuracy and security. However, face age progression remains a challenging task because of facial variations such as illumination effects and many other intrinsic and extrinsic factors affecting the face [38]. An abundance of research exists to resolve these practical issues. Face age synthesis, or face age progression, renders a human face at different ages while preserving its unique identity; aging is reflected in face geometry, skin texture, skin color, etc. The facial characteristics of a human change through life [40], as illustrated in Fig. 1. Figure 2 shows the synthesized super-resolution face age-progressed images.

Fig. 1 Changes in the face with age progression [40]

Fig. 2 The synthesized super-resolution face age-progressed images. The input images vary in age (child, youngster, adult), ethnicity, gender, and facial expression

First and foremost, the COVID-19 pandemic has pushed touchless face recognition systems to the forefront. They are accepted globally as a way to limit virus spread, ahead of biometric systems that rely on fingerprints or other touch-based services. The human face is used to provide strong authentication and security for the individual [42].

In law enforcement, to keep security high, a person's digital biometric passport is used for face matching at border checks. A person's appearance changes through life in both skin texture and geometric shape, so face age progression is unique to each human and can authenticate a specific person's image against a database of digital passports.

In the health sector, deep learning has made substantial progress. With the need for regular patient check-ups, remote consulting, and health insurance ID cards, face recognition can bring health benefits and comfort to humans [46].

Further, a touchless face age progression system using machine learning can be useful for several facial recognition applications such as banking [12]. In the future, face age progression within face recognition systems could also decrease the number of visits required to update a person's photo with many service providers, delivering convenience to customers.

1.1 Face age progression with GANs

In various fields, GANs are robust performers and have generated impressive results [13, 14, 55, 58]. They have found wide application in areas such as image-to-image translation [19, 30, 66], text-to-speech generation [20, 47], and many more. Their biggest disadvantage is that GANs are also used for generating fake media content and are the technology behind deepfakes [34, 54, 59]. Face age progression using GANs has attracted considerable attention in facial verification systems, and GANs have produced remarkable results in face age progression [2, 5, 37, 63]. The proposed work focuses on face aging combined with image super-resolution to determine its practical feasibility.

The main contributions of the proposed work are as follows:

  1. A combined approach for face age progression using AttentionGAN and SRGAN. To the best of the authors' knowledge, this is the first application of AttentionGAN to the face aging task.

  2. A regex filter that reduces computational complexity and training time by selecting only the synthesized face images.

  3. Evaluation of the proposed method on the publicly available UTKFace and CACD datasets, with cross-dataset evaluation on the IMDB-WIKI, CelebA, and FGNET datasets.

  4. Validation of the proposed work under variations such as pose, expression, make-up, and illumination; simulation outcomes are also compared with existing approaches.

2 Related work

Previous work on face age progression focused on facial attributes such as the geometric growth of the face [24, 44], wrinkles [4, 6, 35], and face sub-regions [48, 49], using various techniques [45, 53, 62]. Some existing face age progression techniques are given in Table 1. Early aging processes cascaded mean faces across age clusters using their eigenfaces, where eigenfaces are an appearance-based method that captures the variation in a collection of face images and uses this information to encode and compare individual faces. Deep learning then gained tremendous attention in the computer vision field and produced remarkable results in many areas [3, 9, 15, 31, 33]. Current GAN research explores two main aspects: improving the training process and positioning GANs in real-world applications [17, 60]. The major goal of a GAN is to bring the generator distribution pa close to the real data distribution pb, while the cycle consistency loss tries to recover the original input from the synthesized image, preserving the image identity. U and V are two domains, and u and v are images in the respective domains.

Table 1 Some existing face age progression techniques

For each image u from domain U, the mapping u → G(u) → F(G(u)) ≈ u signifies the forward cycle consistency. The adversarial objective and the cycle consistency loss are given in eqs. (1) and (2):

$$ \min_G \max_D \; E_{u\sim p_b}\left[\log D(u)\right]+E_{v\sim p_a}\left[\log \left(1-D(v)\right)\right] $$
(1)
$$ L_{cyc}(G,F)=E_{u\sim Q_{data}(u)}\left[\left\Vert F(G(u))-u\right\Vert \right]+E_{v\sim Q_{data}(v)}\left[\left\Vert G(F(v))-v\right\Vert \right] $$
(2)

where G and F are generators, pa is the generator's data distribution, pb is the real data distribution, u is a real face image, v is a synthesized image, Lcyc is the cyclic loss, and D is a discriminator.
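To make the two objectives concrete, the following is a minimal PyTorch sketch of eqs. (1) and (2), assuming D outputs probabilities in (0, 1) and taking the norm in eq. (2) as an L1 mean, as in CycleGAN [66]; the function names are illustrative.

```python
import torch

def gan_loss(D, u_real, v_fake):
    # eq. (1): D is trained to assign high probability to real images u
    # and low probability to synthesized images v; G tries the opposite.
    return torch.log(D(u_real)).mean() + torch.log(1 - D(v_fake)).mean()

def cycle_loss(G, F, u, v):
    # eq. (2): u -> G(u) -> F(G(u)) should recover u (forward cycle) and
    # v -> F(v) -> G(F(v)) should recover v (backward cycle).
    forward = torch.mean(torch.abs(F(G(u)) - u))
    backward = torch.mean(torch.abs(G(F(v)) - v))
    return forward + backward
```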

The image-to-image translation model pix2pix [19] used a paired dataset. This limitation was removed by two-domain image-to-image translation models trained on unpaired data, CycleGAN [66] and DiscoGAN [23], which showed remarkable results in various domains, such as horse-to-zebra conversion; a notable failure case, however, was their inability to change the shape of an object during the transition. In 2018, Spatial Fusion GAN [64] for image synthesis combined a geometry synthesizer and an appearance synthesizer to attain realism in both domains, and identity loss was introduced to preserve the features of the original face image. The MI-GAN framework worked on retinal images, generating realistic synthesized retinal images and their segmented masks; the model learned features from a small training dataset and outperformed other methods [18].

Moreover, a contactless biometric system provides many benefits: personal hygiene is maintained, it is more convenient, and it is free from contaminated surfaces. Thus, a multi-modal biometric system using face and palmprint can provide more security and authentication [7]. The palmprint has several unique features that can be used for person identification [39], and researchers have made novel contributions to palmprint recognition. In multi-instance contactless palmprint recognition (a fusion system), features of the left and right hands were extracted using the 2-dimensional DCT (Discrete Cosine Transform) to compose a dual-source space; experiments showed that the designed dual power analysis (DPA) outperformed single-source dual power analysis [29]. Besides this, StarGAN [8] performed multi-domain image-to-image translation with a single generator and discriminator, showing robustness in handling more than two domains. Progressively growing GAN [22] described a new training method in which the generator and discriminator grow progressively, producing astonishing results. A novel approach to face age progression and regression used a template face, taking the average face for an ethnicity and age; the template face helped generate the target face image for age progression and regression, and the method achieved both accuracy and efficiency [11]. Further, the Laplacian Pyramid of Adversarial Networks (LAPGAN) [10] introduced cascaded convolutional neural networks in a Laplacian pyramid framework that generated output images in a coarse-to-fine fashion.

From the literature survey, some important novel contributions point to future directions for enhancing security in the field of biometrics. Palmprint texture codes suffer from alignment issues during matching, which was an obstacle to adopting them directly in biometric cryptosystems; a 2DPHC (2D PalmHash Code) based Fuzzy Vault improved the key recovery rate and the robustness against various security attacks [26]. Further, to enhance privacy and security in palmprint biometric systems, a novel dual-key-binding scramble and 2D palmprint phasor algorithm was introduced, protecting both the palmprint and information security and overcoming the lack of cancelability in existing palmprint cryptosystems; the scheme could also be applied to other biometrics with some alterations and to further palmprint texture feature coding [27]. Palmprint authentication with a remote cancellable method based on the multi-directional two-dimensional palm phasor (MTDPP) was also proposed [28]; MTDPP serves as a cancellable palmprint template, providing biometric template protection. Thus, multi-modal biometrics with face and palmprint can be taken as future scope in contactless technology to provide more authenticity to an individual.

3 The proposed work

The suggested work translates an input face image into the required age-progressed face image and produces high-resolution images with less computation time and storage space through the use of a filtering process. Further, image sharpening with edge enhancement is used to provide better-quality input to SRGAN.

For this aim, a three-stage learning framework is shown in Fig. 3. In this work, the large-scale, publicly available UTKFace and CACD datasets are used for training the network. The input face images are first pre-processed to retain only RGB images from the UTKFace and CACD datasets, and are then manually separated into four age groups (0–20, 21–40, 41–60, 60+) keeping only good-quality face images. Training, test, and validation sets with input and target images are then prepared for the experiments. The images are resized and cropped to 100×100.
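As an illustration of this preprocessing step, the sketch below resizes a face image to 100×100 and assigns it to one of the four age groups; the function name and the assumption that an age label accompanies each image (e.g., from the dataset annotation) are hypothetical.

```python
import cv2

AGE_BINS = [(0, 20), (21, 40), (41, 60), (61, 200)]  # 0-20, 21-40, 41-60, 60+

def preprocess(path, age):
    """Load an RGB face image, resize/crop it to 100x100, and bin it by age."""
    img = cv2.imread(path, cv2.IMREAD_COLOR)  # returns None for unreadable files
    if img is None:
        return None, None                     # skip corrupt or non-RGB images
    img = cv2.resize(img, (100, 100))
    group = next(i for i, (lo, hi) in enumerate(AGE_BINS) if lo <= age <= hi)
    return img, group
```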

  • Stage 1: The pre-processed images are fed to the AttentionGAN generator G, which performs face age progression as an image-to-image translation using AttentionGAN (scheme II) [51]. The generator uses both background and foreground attention to generate a high-quality face image while preserving its identity. The unique property of AttentionGAN is that the generator focuses on the foreground of the target image while efficiently preserving the background of the input image with the help of attention and content masks. The input image passes through a generator composed of three sub-modules: a parameter-sharing encoder GE, a content mask generator GC, and an attention mask generator GA. GC generates the p−1 content masks, while GA simultaneously generates p−1 foreground attention masks \( {\left\{{A}_v^f\right\}}_{f=1}^{p-1} \) and one background attention mask \( {A}_v^b \). Each content mask is multiplied by its corresponding attention mask and combined with the input face image, as shown in eq. (3), to generate the target face aged image. High-intensity regions in the attention masks drive the changes in the facial attributes, and the use of the various attention and content masks helps in generating the face aged output image. Mathematically, the output is expressed in eq. (3):

$$ G(u)=\sum_{f=1}^{p-1}\left({C}_v^f\ast {A}_v^f\right)+u\ast {A}_v^b, $$
(3)

where \( {\left\{{A}_v^f\right\}}_{f=1}^{p-1} \) and \( {A}_v^b \) together form the p attention masks, u is the input image, \( {C}_v^f \) is a content mask, G(u) is the generated target face aged image, U and V are the two domains, and u and v are images in the respective domains.

For the cycle consistency loss, the generated aged image is passed to a second generator F, which similarly generates content and attention masks for the foreground and background and combines them to produce the recovered face image. The masks in F help to preserve the image information and recover the input image with minimal loss, thereby preserving the identity of the image. The reconstruction of the generated image G(u) back to the original input image u is expressed mathematically in eq. (4):

$$ F\left(G(u)\right)=\sum_{f=1}^{p-1}\left({C}_u^f\ast {A}_u^f\right)+G(u)\ast {A}_u^b, $$
(4)

where F(G(u)) is the reconstructed image, which should be very close to the original image u; \( {A}_u^f \), \( {A}_u^b \), \( {C}_u^f \), and G(u) are the foreground attention mask, background attention mask, content mask, and generated image, respectively. F is a generator similar to G and also consists of three subnets: a parameter-sharing encoder FE, an attention mask generator FA, and a content mask generator FC. FC generates p−1 content masks, and FA generates p attention masks: p−1 foreground masks \( {\left\{{A}_u^f\right\}}_{f=1}^{p-1} \) and one background mask \( {A}_u^b \). The attention and content masks are combined with the generated face image according to eq. (4) to produce the reconstructed image. Mathematically, the optimization objective of AttentionGAN scheme II is expressed in eq. (5):

$$ L={L}_{GAN}+{\lambda}_{cycle}\ast {L}_{cycle}+{\lambda}_{id}\ast {L}_{id}, $$
(5)

where LGAN is the GAN loss, Lcycle is the cyclic loss, and Lid is the identity-preserving loss; λcycle and λid are weighting parameters that control the contribution of each term.
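The mask composition of eqs. (3) and (4) can be sketched as follows, assuming the masks are already normalized so that the foreground and background attention sum to one at each pixel; the function name is illustrative.

```python
import numpy as np

def attention_compose(u, content_masks, fg_masks, bg_mask):
    """Eq. (3): blend the p-1 content masks through their foreground attention
    masks and keep the input image u where the background attention is high."""
    out = u * bg_mask                 # background is copied from the input
    for C, A in zip(content_masks, fg_masks):
        out += C * A                  # attended foreground content
    return out
```

Generator F applies the same composition to G(u), with its own masks, to produce the reconstructed image F(G(u)) of eq. (4).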

  • Stage 2: The output of AttentionGAN is fed to a conditional block that decides whether or not to apply the regex filter.

If the conditional block output is yes, a regex filter selects only the synthesized face aged images from the AttentionGAN output, which otherwise consists of synthesized face images together with attention mask and content mask images. The regex filtering thus reduces the computation time required for the subsequent SRGAN training. The filtered synthesized face images amount to approximately 3% of the total AttentionGAN output for each age group in the UTKFace and CACD datasets, as shown in Fig. 4(a), (b). Image sharpening with edge enhancement is then performed on the filtered face aged images, providing better input to SRGAN [25] training; SRGAN primarily learns the shape, texture, and color of objects and tends to produce output images with few sharp edges [50], so edge enhancement sharpens the edges of the face aged images while leaving the rest of each image unchanged. The filtering and image sharpening process is described in Algorithm 1. The resulting limited dataset of good-quality images is then fed to SRGAN; after filtering, SRGAN training time is reduced to approximately 2 h per age group.

If the conditional block output is no, the entire output of AttentionGAN is fed directly to SRGAN training. With this method, complete SRGAN training took approximately 26 h, owing to the many unwanted images in the training set: the content and attention masks of aged faces, which are not required because only the face aged images are needed for the final output.

  • Stage 3: SRGAN training is performed in stage 3 to obtain the final output image. When the image sharpening output is fed to SRGAN, training is performed on high-quality synthesized images, and testing then generates the super-resolution images at the output. This process reduces both computational complexity and training time. In contrast, when the face aged images together with the content and attention masks are fed directly to SRGAN training, complexity and training time increase.

In SRGAN, residual blocks build the base model and a perceptual loss drives the optimization, enhancing the overall visual quality of the face image. The generator network, with batch normalization (BN) layers and dense skip connections combining facial features at different levels, produces the super-resolution image at the output. The generator network in Fig. 3(c) shows the details, with the number of feature maps (n), kernel size (k), and stride (s) of each convolutional layer. During training, SRGAN downsamples the input high-resolution face image to a low-resolution face image and tries to synthesize a super-resolution image from it; a discriminator then attempts to distinguish the synthesized super-resolution images from real high-resolution images.
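As a sketch of the generator's building block, the residual unit below uses the n = 64, k = 3, s = 1 configuration shown in Fig. 3(c); this is a minimal PyTorch rendering under those assumptions, not the exact implementation.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """SRGAN-style residual block: conv -> BN -> PReLU -> conv -> BN,
    with a skip connection adding the block input to its output."""
    def __init__(self, n=64, k=3, s=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(n, n, kernel_size=k, stride=s, padding=k // 2),
            nn.BatchNorm2d(n),
            nn.PReLU(),
            nn.Conv2d(n, n, kernel_size=k, stride=s, padding=k // 2),
            nn.BatchNorm2d(n),
        )

    def forward(self, x):
        return x + self.body(x)  # skip connection preserves low-level features
```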

Further, a perceptual loss is a weighted sum of the content loss and adversarial loss as shown in eq. (6):

$$ {L}_p={l}_c+{10}^{-3}\,{l}_{adv}, $$
(6)

where Lp is the perceptual loss, lc is the content loss, and ladv is the adversarial loss weighted by 10−3. The content loss comprises a VGG loss and an MSE loss. The MSE loss is the pixel-wise error between the super-resolution generated image and the original image. The VGG loss is computed on the feature map produced by the nth convolution before the mth max-pooling layer of the VGG19 network, denoted φ(m, n).

The adversarial loss is based on the discriminator probabilities over all training samples. It is expressed as shown in eq. (7):

$$ {l}_{adv}=\sum_{q=1}^{Q}-\log {D}_{\alpha_D}\left({G}_{\alpha_G}\left({I}^{lr}\right)\right), $$
(7)

where \( {I}^{lr} \) is the low-resolution input image, q = 1, …, Q indexes the training samples, and \( {D}_{\alpha_D}\left({G}_{\alpha_G}\left({I}^{lr}\right)\right) \) is the probability that the reconstructed image is a natural super-resolution image.
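A minimal sketch of the SRGAN objective of eqs. (6) and (7) follows; combining the MSE and VGG terms into one content loss follows the description above, and `vgg` stands for a frozen VGG19 feature extractor truncated at φ(m, n). The function name and equal weighting of the two content terms are assumptions.

```python
import torch

def perceptual_loss(sr, hr, d_out, vgg):
    """Eq. (6): L_p = l_c + 1e-3 * l_adv.
    sr, hr: super-resolved and ground-truth high-resolution images,
    d_out:  D(G(I_lr)), the discriminator probability for each SR image,
    vgg:    frozen feature extractor giving the phi(m, n) feature maps."""
    mse = torch.mean((sr - hr) ** 2)                 # pixel-wise MSE loss
    vgg_term = torch.mean((vgg(sr) - vgg(hr)) ** 2)  # VGG feature-map loss
    l_adv = (-torch.log(d_out)).sum()                # eq. (7), over the batch
    return mse + vgg_term + 1e-3 * l_adv
```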

Algorithm 1 To give high-quality input to SRGAN from the output of AttentionGAN.

Input: G(u), synthesized image; F(G(u)), recovered image; \( {C}_u^f \), content mask in domain V; \( {A}_u^f \), foreground attention mask in domain V; \( {A}_u^b \), background attention mask in domain V; \( {C}_v^f \), content mask in domain U; \( {A}_v^f \), foreground attention mask in domain U; \( {A}_v^b \), background attention mask in domain U.

Output: Synthesized face images with image sharpening.

1. Extract the synthesized images from the source path (output of AttentionGAN) using a PowerShell regex: $filter = [regex] "fake_[A-Z]\.(jpg|png)".

2. Move the synthesized face images into the destination path (data file of SRGAN).

3. Perform image sharpening with edge enhancement on the synthesized images (data file) using the cv2 library to obtain sharpened images in the train file of SRGAN.

4. Split the data into train and dev/test sets in a 70 (train) : 30 (dev/test) ratio.

5. Begin training the SRGAN model using the dev file and training file.

6. Begin testing SRGAN.

7. Finally, obtain a super-resolution face aged image.
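As a sketch, steps 1–3 can be rendered in Python (the paper uses PowerShell for the regex step; the directory names and the sharpening kernel here are illustrative assumptions):

```python
import re
import shutil
from pathlib import Path

import cv2
import numpy as np

SRC = Path("attentiongan_output")            # hypothetical source path
DST = Path("srgan_data")                     # hypothetical destination path
FAKE = re.compile(r"fake_[A-Z]\.(jpg|png)")  # matches synthesized faces only

# Steps 1-2: keep only the synthesized face images, dropping the
# attention mask and content mask images from the AttentionGAN output.
DST.mkdir(exist_ok=True)
for f in SRC.iterdir():
    if FAKE.search(f.name):
        shutil.copy(f, DST / f.name)

# Step 3: sharpen edges with a common 3x3 edge-enhancement kernel (assumed).
kernel = np.array([[-1, -1, -1],
                   [-1,  9, -1],
                   [-1, -1, -1]])
for f in DST.iterdir():
    img = cv2.imread(str(f))
    cv2.imwrite(str(f), cv2.filter2D(img, -1, kernel))
```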

4 Simulation results

Extensive experimental evaluations have been performed to validate the proposed work for producing realistic, super-resolution face aged images. The qualitative and quantitative outcomes are described in the following subsections.

4.1 Face aging datasets

The experiment is conducted using two benchmark datasets: UTKFace and CACD (the cross-age celebrity dataset). The UTKFace dataset has an age range from 0 to 116 years; only UTKFace provides images for ages zero to five (babies), six to fifteen (children), and above seventy-five (elderly people). Images in CACD cover ages 16 to 62 years. A few images in CACD have wrong labels or mismatches between the face image and its annotation, which makes the dataset very challenging. Some images from the UTKFace and CACD datasets are shown in Fig. 5.

For cross-dataset evaluation, three datasets are used: FGNET (the Face and Gesture Recognition Research Network), CelebA, and IMDB-WIKI. FGNET contains a total of 1002 images and is widely used for testing purposes. CelebA provides a large-scale face image dataset with in-the-wild images similar to CACD. The IMDB-WIKI dataset has more than 500,000 face images with gender and age annotations, covering ages from 0 to 100 years.

4.2 Training and implementation scheme

The training process details are illustrated in Fig. 6. Each age group has been trained for 200 epochs with a batch size of 4 for the aging process, on a GTX 1660 Ti GPU with an i7 processor running 64-bit Windows 10. The CACD and UTKFace datasets run for 200 epochs for AttentionGAN training and for 500 epochs for SRGAN training. The input RGB face image, with a crop size of 100, generates the p attention masks and p−1 content masks; multiplying the corresponding masks with the input image produces the target face aged image. The least squares loss is used to stabilize network training, with a learning rate of 0.0002. For the cross-dataset evaluation, a total of 1000 images are randomly sampled from the CelebA, IMDB-WIKI, and FGNET datasets. Because the CACD age range is 16 to 62 years, it contains few images above 60 and no children; to avoid data imbalance, approximately 1024 training images are selected per age group.
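For reference, the training settings reported above can be collected into a configuration sketch (the dictionary structure is hypothetical; the values are those stated in this section):

```python
config = {
    "attentiongan": {
        "epochs": 200, "batch_size": 4, "lr": 2e-4,
        "crop_size": 100, "gan_loss": "least_squares",
    },
    "srgan": {"epochs": 500},
    "age_groups": ["0-20", "21-40", "41-60", "60+"],
    "images_per_group": 1024,       # approximate, to balance CACD age groups
    "cross_dataset_samples": 1000,  # random images from CelebA, IMDB-WIKI, FGNET
}
```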

Also, a comparison of training time with other GAN-based methods is shown in Table 2. Training time depends on various factors such as system architecture, the number of images used, and image quality.

Table 2 Analysis of training time for GAN-based methods

4.3 Face aging results

Qualitative and quantitative assessments are performed to show the effectiveness of the proposed work. Figure 2 in section 1 shows the super-resolution face aged images; it clearly shows that the proposed work achieves convincing results. Figures 7 and 8 show results from the UTKFace and CACD datasets with their corresponding attention masks, demonstrating that the proposed work generates realistic images. The proposed work also shows significant results under various face variations such as pose change, expression, make-up, illumination, and spectacles. Figures 9 and 10 show the continuous transition of an input face image to the age groups 21–40, 41–60, and 60+ for the UTKFace and CACD datasets. The results on UTKFace outperform those on CACD. Manual examination of input images from the two datasets shows that the lower performance on CACD arises because its images are taken in highly professional settings (make-up, lighting), while UTKFace images are taken in less professional settings; the more natural the input image, the better the synthesized result. The cross-dataset evaluation results are obtained using only the UTKFace pre-trained model with the age group 60+; the results for FGNET, CelebA, and IMDB-WIKI, along with their corresponding content masks, are shown in Fig. 11. In the cross-dataset evaluation, FGNET produces better results than IMDB-WIKI and CelebA. It has been observed that aging differs between females and males [1]: female faces tend to age faster than male faces, which is why some male images show poorer performance. The synthesized results also show that output images with high-contrast masks capture richer information than those with low-contrast masks. Further, Fig. 12 shows the face build of men and women to illustrate how aging details change with age progression: laugh lines become longer and deeper, lips thinner, and forehead wrinkles deeper, while the identity of the face is well preserved.

Fig. 3 (a) The proposed workflow comprising three stages, (b) stage 1 process, (c) stage 3 process

Fig. 4 Number of images before and after filtering for (a) UTKFace and (b) CACD datasets

Fig. 5 Ten images from the UTKFace dataset (first row) and the CACD dataset (second row) with their age annotations

Fig. 6 Training dataset partitioning for the proposed work

Fig. 7 Synthesized face image results for the UTKFace dataset

Fig. 8 Synthesized face image results for the CACD dataset

Fig. 9 Continuous face age progression images obtained from the UTKFace dataset

Fig. 10 Continuous face age progression images obtained from the CACD dataset

Fig. 11 Cross-dataset evaluation results from the UTKFace pre-trained model

Fig. 12 Visual comparison of face build in men and women

Fig. 13 Visual evaluation of generated super-resolution images

Fig. 14 Age estimation graphical representation for UTKFace: (a) synthesized face images, (b) real face images

Fig. 15 Age estimation graphical representation for CACD: (a) synthesized face images, (b) real face images

Fig. 16 Generalized confusion matrix with labeled blocks

Fig. 17 Confusion matrices for age estimation using the Face++ tool: (a) UTKFace, (b) CACD

Fig. 18 Face verification confidence score: (a) UTKFace, (b) CACD

Fig. 19 Comparison with existing approaches

4.3.1 Super-resolution visual assessment

Super-resolution face aged images are shown in Fig. 2. Stage 3 generates the super-resolution results on the face aged images, and the image details are well preserved during processing. Thus, the proposed work generates super-resolution face aged images that retain rich image information, as given in Fig. 13.

4.4 Quantitative evaluation

4.4.1 Age estimation evaluation

Age estimation on the synthesized face aged images and the real face images is performed using the Face++ [36] online tool, following the evaluation method of Yang et al. [63]. Synthesized face images from the CACD and UTKFace datasets are used for evaluation: 20 images are selected from the real face images and the aged face images, and their mean and standard deviation are computed for the three age groups 21–40, 41–60, and 60+. The values in Table 3 show that the estimated age of the generated face images is close to the estimated age of the real face images. The graphical representations for the UTKFace and CACD datasets are shown in Figs. 14(a), (b) and 15(a), (b), respectively. They clearly show that face age progression is unique to each individual: various internal and external factors affect a person's appearance, and make-up also plays an important role in the appearance of the human face.

Table 3 Age estimation results on UTKFace and CACD datasets

In addition, a confusion matrix is used to assess age estimation performance. The confusion matrix is a good technique for depicting the performance of a classification problem, giving a visual and quantitative view of the correct predictions and the types of errors present. Figure 16 shows the generalized confusion matrix with labeled blocks used for the evaluation.

Figure 17(a) shows that the confusion matrix for the UTKFace dataset has good results for the age groups 21–40 and 60+. Similarly, the CACD dataset in Fig. 17(b) shows good results for the 41–60 and 21–40 age groups.

4.4.2 Identity preservation

Identity preservation is the most important evaluation parameter for a face aging method, as it confirms that the aged image depicts the same person. The proposed work is therefore evaluated with a face verification score, which indicates the similarity between two face images: if the confidence score exceeds the threshold, the similarity is very high and the images are considered to be of the same person. Table 4 shows the confidence scores from Face++; all values are high and beyond the threshold. Figure 18(a), (b) shows the graphical representation of the face verification score for the UTKFace and CACD datasets.

Table 4 Face verification confidence score on UTKFace and CACD datasets

4.4.3 PSNR and SSIM evaluation for super-resolution

PSNR and SSIM are the most widely used evaluation metrics for super-resolution. However, PSNR (peak signal-to-noise ratio) tends to favor extreme smoothness, and its value can differ greatly between almost identical images. SSIM (structural similarity index) evaluates contrast, brightness, and structure; it takes values between 0 and 1, where 1 means the two images are identical and lower values indicate greater difference. The use of image sharpening before SRGAN training produces better super-resolution results at the final output. The quantitative evaluation of the super-resolution face aged images with PSNR and SSIM, after image sharpening and SRGAN training, is shown in Table 5.
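For reference, both metrics can be computed with scikit-image (assuming version ≥ 0.19 for the `channel_axis` argument; the function name and paths are illustrative):

```python
import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_sr(sr_path, hr_path):
    """PSNR (dB) and SSIM between a super-resolved image and its reference;
    an SSIM of 1 means the two images are identical."""
    sr = cv2.imread(sr_path)
    hr = cv2.imread(hr_path)
    psnr = peak_signal_noise_ratio(hr, sr)
    ssim = structural_similarity(hr, sr, channel_axis=2)
    return psnr, ssim
```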

Table 5 Quantitative analysis of super-resolution face images

4.5 Comparison with existing approaches

The proposed work generates super-resolution face aged images that help in precisely identifying the details of aging signs. The proposed work is compared with previous methods [11, 57]; the qualitative evaluation is performed using the FGNET dataset, as given in Fig. 19.

4.5.1 User study evaluation

For the state-of-the-art methods shown in Table 1, a user study was conducted with 10 observers, who were asked to perform pair-wise image comparisons against the existing methods [11, 57]. A total of 36 image pairs of 18 persons from the available work were used, and the viewers compared each pair to evaluate the super-resolution face age-progressed images. Of the 360 votes, 60% preferred the proposed work, 30% preferred the prior work, and 10% rated them even. The prior work [11, 57] used cropped faces for face aging and therefore lacks aging details on the faces. The proposed work generates plausible texture changes on faces in old age, such as wrinkles, forehead lines, and a receding front hairline, preserves identity well, and produces super-resolution face aged images, as shown in Figs. 12 and 13. Thus, the proposed work shows better performance.

5 Conclusion

In this paper, AttentionGAN is combined with a super-resolution GAN to obtain super-resolution face aged images. The proposed work produces plausible super-resolution face aged images, and its robustness and efficacy are demonstrated through qualitative comparison and quantitative evaluation using age estimation and identity preservation analysis. The generalization ability of the model is shown with cross-dataset evaluation on three datasets. The suggested work matches the age-progressed face images to the real images with an error rate of 0.001%. Future work remains open for implementing face age progression in real-time applications.