De-Identification Technique with Facial Deformation in Head CT Images

Head CT, which includes the facial region, can visualize faces using 3D reconstruction, raising concern that individuals may be identified. We developed a new de-identification technique that distorts the faces in head CT images. The head CT images to be distorted were labeled "original images," and the images supplying the target face shapes were labeled "reference images." Reconstructed face models of both were created, with 400 control points on the facial surfaces. All voxel positions in the original image were deformed according to the deformation vectors required to move its control points to the corresponding control points on the reference image. Three face detection and identification programs were used to determine face detection rates and match confidence scores. Intracranial volume equivalence tests were performed before and after deformation, and correlation coefficients between intracranial pixel value histograms were calculated. The output accuracy of a deep learning model for intracranial segmentation was determined using the Dice Similarity Coefficient before and after deformation. The face detection rate was 100%, and all match confidence scores were < 90. Equivalence testing revealed that the intracranial volume was statistically equivalent before and after deformation. The median correlation coefficient between intracranial pixel value histograms before and after deformation was 0.9965, indicating high similarity. The Dice Similarity Coefficient values of the original and deformed images were statistically equivalent. We thus developed a technique that de-identifies head CT images while maintaining the accuracy of deep learning models: the images are deformed to prevent face identification, with minimal changes to the original information.

QuickShear and Defacing aim to prevent facial identification by deleting voxels in the face region, thereby eliminating facial information. The loss of anatomical structures on the surface and inside the face considerably alters the information and can impede data utilization (de Sitter et al., 2020). Removing the entire face also makes it difficult to detect a face in the reconstructed face model, and images whose reconstructed face models contain no detectable face have been shown to reduce the accuracy of medical image alignment, such as affine transforms. Such effects on medical image processing may be problematic for use in surgical simulation and research. Facial Deformation deforms only the characteristic facial regions and preserves the internal anatomy; however, achieving sufficient de-identification with this technique requires deformations so large that the face shape cannot be preserved, making face detection from reconstructed models difficult. Face Masking smooths the face surface to prevent face identification; preventing identification in this way requires blurring so strong that the anatomical structure of the face surface cannot be preserved, again making face detection difficult. Reface removes the face and then replaces it with another face. With this technique, the change in information is so great that boundary areas arise between the original and added images, tissue continuity is not maintained, and the integrity of the internal anatomical structures is greatly disrupted.
A report revealed that images processed with QuickShear, Defacing, and Face Masking for segmenting brain tissue and tumors achieved reduced output accuracy in deep learning models (de Sitter et al., 2020).
We aimed to develop a new de-identification technique whereby post-processing reconstructed face models remain face-detectable, pre- and post-processing reconstructed face models are not identified as the same person, pre- and post-processing information changes are small, and post-processing images remain suitable for deep learning.

Acquisition of CT Images
Non-contrast head CT images of 140 Japanese patients and volunteers admitted to our hospital between January 2021 and May 2022 were obtained. CT was performed using a 320-row multi-slice CT scanner (Aquilion ONE; Toshiba Medical Systems, Tokyo, Japan) with the following parameters: collimation, 0.5 mm; tube voltage, 120 kV; tube current, 200 mA; rotation time, 0.6 s; reconstruction section width, 0.5 mm; reconstruction interval, 0.5 mm; and voxel size, 0.43 × 0.43 × 1.00 mm. The imaging range included the entire face (eyes, nose, mouth, and ears). Images of patients < 20 years old (11 cases), patients with prominent skin or bone lesions (five), patients with specific facial features due to congenital anomalies or disease (one), and patients wearing oxygen masks (four) were excluded; 119 head CT images were included (Fig. 1). All procedures involving human participants were in accordance with the 1964 Helsinki Declaration and its later amendments. The Institutional Review Board of our hospital approved the study protocol (approval number: #2021107NI). Written informed consent was obtained from all patients before participation.

Methods of Deformation Processing
Standard control points (10 points) were set on the feature parts of the face (Fig. 2a).
- Eye: six points in total, on the inner canthus, midpoint, and outer canthus of each eye
- Nose: two points, on the root and tip
- Mouth: two points, on both corners of the mouth

One-third of the distance between the midpoints of both eyes was used as the baseline distance (L), and the following 10 control points were added along the geodesic lines above and below the set control points (Fig. 2b).
- Eyebrow: four points, on the midpoints (above the midpoints of the eyes) and the medial sides (above the inner canthi)
- Forehead: two points, on the midpoint (above the midpoint of the eyebrows)
- Cheek: two points, on the lower part of each outer canthus
- Chin: two points, on the corner line of the mouth below the outer canthus

Additional control points were then added in a grid-like pattern to the area bounded by the 20 set control points. The range of added control points was extended by 0.5 L in the lateral direction of the outer canthi. The added control points were limited to the area above the mouth to minimize the influence of intraoral metal artifacts. Finally, 400 control points were set (Fig. 2c). All control points were set manually using Amira 3D®.
The 118 images other than the original image to be transformed were considered as the reference image candidate group, and the reference image was selected from among these images. The image with the greatest difference in shape around the eyes compared to the original image was selected as the reference image to deform the face and resemble a different face. Details of the reference image selection method are provided below.
Based on the assumption that the control points were set for the original image and all the reference images, registration between the original image and the reference image candidates was first performed. Scale alignment between the original image and reference image candidates was performed as a preprocessing step for registration. The scale of the reference image candidates was enlarged or reduced such that the distance between the midpoints of both eyes between the original image and the reference image candidates was equal.
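The scale-alignment step can be sketched as a uniform scaling of each candidate's control points about the inter-eye midpoint; the function below is a hypothetical illustration (the name and the (n, 3) array layout are assumptions, not the authors' Amira-based implementation):

```python
import numpy as np

def scale_to_eye_distance(pts, left_eye_mid, right_eye_mid, target_dist):
    """Uniformly scale a candidate's control points (pts, shape (n, 3))
    about the midpoint between the eyes so that the candidate's
    inter-eye-midpoint distance equals that of the original image."""
    pts = np.asarray(pts, float)
    left = np.asarray(left_eye_mid, float)
    right = np.asarray(right_eye_mid, float)
    centre = (left + right) / 2.0
    scale = target_dist / np.linalg.norm(right - left)
    return centre + scale * (pts - centre)
```

After scaling, the distance between the two eye midpoints equals `target_dist`, while the relative geometry of the control points is otherwise unchanged.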
After scale alignment, corresponding points (the midpoints of the eyes, eyebrows, and forehead, six points in total per image) were set for the original image and the reference image candidates, and registration was performed using the least-squares method (Abdi, 2007; Jiang, 1998), which minimizes the sum of the distances between the corresponding points. Because rigid-body registration with six degrees of freedom (three translational and three rotational) was used, no interpolation was performed for the surface points between the control points.
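This least-squares rigid registration over corresponding points can be sketched with the Kabsch algorithm (SVD of the cross-covariance matrix), a standard solution to this minimization; the function name and point layout are assumptions, not the study's exact implementation:

```python
import numpy as np

def rigid_register(src, dst):
    """Least-squares rigid transform (rotation R, translation t) mapping
    source landmarks onto destination landmarks (Kabsch algorithm).
    src, dst: (n, 3) arrays of corresponding points."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t                                # maps x -> R @ x + t
```

With six corresponding points, this yields the six-degree-of-freedom transform (three translations, three rotations) described above.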
Next, control points around the eyes were selected. The area around the eyes was defined as the range contained within a sphere of radius L (1/3 of the distance between the midpoints of both eyes) centered at the midpoint between the eyes and the eyebrows (Fig. 2d). Control points around the eyes were selected for each of the original and reference image candidates. The sum of the distances (D) between the corresponding control points in the two images was calculated.
(1) D = Σ_{k=1}^{n} ||p_k − q_k||
where p is the control point of the original image, q is the control point of the reference image candidate, k is the control point index, and n is the number of control points used in the evaluation. The image with the largest D is the reference image R, which is given by the following formula:
(2) R = arg max_i D(i)
where i is the index of the reference image candidate and D(i) is the sum of the distances between the control points when reference image candidate i is used as the reference image.
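Reference-image selection then reduces to computing D for every candidate and taking the arg max. A minimal sketch (hypothetical names, assuming the peri-ocular control points have already been registered and scale-aligned):

```python
import numpy as np

def select_reference(orig_pts, candidates):
    """orig_pts: (n, 3) peri-ocular control points of the original image.
    candidates: list of (n, 3) arrays, one per reference-image candidate.
    Returns the index of the candidate with the largest summed
    control-point distance D, and that D."""
    D = [np.linalg.norm(np.asarray(orig_pts) - np.asarray(q), axis=1).sum()
         for q in candidates]
    best = int(np.argmax(D))
    return best, D[best]
```

The candidate with the largest D is the one whose peri-ocular shape differs most from the original, matching the selection rule above.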
Based on the assumption that registration of the original and reference images was complete, voxels in the original image were moved according to the following procedure.
First, to deform the control points in the original image to the corresponding control points in the reference image, the deformation vector d of each control point is calculated using the following formula:
(3) d_k = q_k − p_k
where p is the control point in the original image, q is the control point in the reference image R, d is the deformation vector of the control point, and k is the index of the control point.
Next, the deformation vector C_j at the voxel coordinate I_j was calculated by a weighted addition of the deformation vectors of the surrounding control points:
(4) C_j = Σ_{k=1}^{n} w(||I_j − p_k||) d_k, with w(x) = G(x) / G(0)
where C is the deformation vector of each voxel in the original image, n is the number of control points, I is the voxel coordinates in the original image, j is the index of voxels in the original image, σ_1 and σ_2 are the standard deviations of the normal distribution G, and w is the normal distribution divided by its central value. This formula shows that the deformation vectors of nearby control points contribute more to C in the weighted addition.

Fig. 2 Procedure for setting control points and selecting a reference image. a Ten points were set as standard control points: inner canthus, outer canthus, midpoints, nasal root, nasal tip, and both corners of the mouth. b Four control points were added for the eyebrows, two for the forehead, two for both cheeks, and two for the chin, for 20 points. c Control points were added in a grid-like pattern to the area bounded by the set control points, and the area was expanded in the lateral direction of the outer canthus. The additional control points were limited to the area above the mouth, for 400 points. d The area around the eyes is defined as the range within a sphere of radius L centered at the midpoint between the eyes and eyebrows. The image with the largest summed distance between the corresponding control points around the eyes was used as the reference image. L: baseline distance (1/3 of the distance between the midpoints of both eyes); p: control point of the original image; q: control point of the reference image candidate; k: index of the control point
The deformation vectors of all voxels in the original image were calculated, and the deformed image was obtained by moving the voxel positions according to the deformation vectors (Fig. 3). The amount of voxel deformation was attenuated with distance from the control points. σ was set to 7.5 mm, and the effect on voxels at distances greater than 3σ was ignored.
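A minimal sketch of this Gaussian-weighted deformation field, assuming a single isotropic σ of 7.5 mm with the 3σ cutoff stated above (the paper's separate σ_1/σ_2 values are not reproduced here):

```python
import numpy as np

SIGMA = 7.5  # mm, as stated in the text

def deformation_field(voxels, ctrl_pts, d_vecs, sigma=SIGMA):
    """Deformation vector C_j for every voxel: a Gaussian-weighted sum of
    the control-point deformation vectors d_k = q_k - p_k, attenuated
    with distance and cut off beyond 3*sigma.
    voxels: (m, 3) voxel coordinates in mm; ctrl_pts: (n, 3); d_vecs: (n, 3)."""
    diff = voxels[:, None, :] - ctrl_pts[None, :, :]   # (m, n, 3)
    r = np.linalg.norm(diff, axis=2)                   # (m, n) distances
    w = np.exp(-r**2 / (2.0 * sigma**2))               # Gaussian / centre value
    w[r > 3.0 * sigma] = 0.0                           # ignore far control points
    return w @ d_vecs                                  # (m, 3) deformation vectors
```

A voxel at a control point receives that point's full deformation vector; voxels more than 3σ (22.5 mm) from every control point, such as intracranial voxels far from the face surface, are unaffected.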

Verification of Information Changes
Intracranial volume was measured using the image processing software Amira 3D®. A seed was set in the thalamus, and intracranial segmentation was performed with thresholds ranging from −50 to 120. Unnecessary soft tissue and the spinal cord below the inferior end of the frontal lobe were removed, and the volume of the segmented area was measured. In all cases, the intracranial volumes of the original and deformed images were measured and compared. The similarity between the intracranial pixel-value histograms of the original and deformed images was also verified. Intracranial regions in the two images were segmented using the method described above, and only the segmented intracranial regions were extracted from the volume data. The correlation coefficient (Guilford, 1956) between the two images was calculated using normalized correlation to evaluate the similarity of the intracranial pixel value histograms before and after deformation. Amira 3D® was used for the entire process, including calculation of the correlation coefficients. The correlation coefficient (r) is given by the following formula:
(5) r = Σ_i (x_i − x̄)(y_i − ȳ) / √( Σ_i (x_i − x̄)² Σ_i (y_i − ȳ)² )
where x_i and y_i are the frequencies in bin i of the intracranial pixel value histograms of the original and deformed images, and x̄ and ȳ are their respective means.
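The histogram correlation can be sketched as a Pearson correlation between binned intracranial pixel values; the 1-unit-wide bins spanning the −50 to 120 threshold range are an assumption for illustration, not the software's documented binning:

```python
import numpy as np

def histogram_correlation(vals_a, vals_b, bins=np.arange(-50, 122)):
    """Pearson correlation coefficient r between the pixel-value
    histograms of two segmented intracranial regions.
    vals_a, vals_b: 1-D arrays of intracranial pixel values."""
    h_a, _ = np.histogram(vals_a, bins=bins)
    h_b, _ = np.histogram(vals_b, bins=bins)
    h_a = h_a - h_a.mean()                     # centre the frequencies
    h_b = h_b - h_b.mean()
    return float(h_a @ h_b / np.sqrt((h_a @ h_a) * (h_b @ h_b)))
```

Identical regions yield r = 1; histograms shifted relative to each other yield progressively lower values.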

Verification of Utilization
A deep learning model for intracranial segmentation was created using the Dragonfly® software (version 2021.3; Object Research Systems, Montreal, Canada) (Object Research Systems, 2022), which can create deep learning models for medical image segmentation. Of the 119 eligible images, 100 were used as training data, and 19 were used as test data.
The intracranial region was segmented in the same way as for the intracranial volume measurement and output as binary data (1: intracranial, 0: other regions) to create the Ground Truth. The original images and Ground Truth were used as the training dataset (100 cases) to train a 2D U-Net (Ronneberger et al., 2015; Presotto et al., 2022). Data augmentation was performed by flipping horizontally and vertically, rotating, shearing, and scaling. The parameters were a batch size of 512, 100 epochs, and categorical cross-entropy as the loss function. For validation, 20% of the training data was used (Fig. 5).
The Dice Similarity Coefficient (DSC) (Dice, 1945) was used to evaluate the output accuracy of the deep learning model. In the original and deformed images, each of the 19 images not used for training was input into the deep learning model as test data. DSC was calculated and compared between the original and deformed image inputs (Fig. 5).
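The DSC itself is straightforward to compute from the binary masks; a minimal sketch (function name is illustrative):

```python
import numpy as np

def dice_coefficient(pred, gt):
    """Dice Similarity Coefficient between two binary masks
    (1 = intracranial, 0 = other), as used to score the model output
    against the Ground Truth."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0                     # both masks empty: perfect agreement
    return 2.0 * np.logical_and(pred, gt).sum() / denom
```

DSC ranges from 0 (no overlap) to 1 (identical masks); the medians near 0.995 reported below indicate near-perfect overlap.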

Statistical Analysis
Equivalence tests (Dunnett & Gent, 1977) were performed to verify whether the intracranial volumes of the original and deformed images were equivalent. The equivalence margin was set at 1% of the mean intracranial volume of the original images, with a significance level of 5%. Bland-Altman analysis (Bland & Altman, 1986) was performed to verify the equivalence of the DSCs for the original and deformed images. Statistical analyses were performed using JMP Pro® 16 (SAS Institute Inc., Cary, NC, USA).

Fig. 4 Face identification test flow. The reconstructed face models of the original images were registered with the three face identification programs as Ground Truth (STEP 1). The reconstructed face model of the deformed image was input into the programs as test data, and the match confidence score between the original and deformed images of the same person was obtained (STEP 2)
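The equivalence test can be sketched as a paired TOST (two one-sided t-tests); this is a generic illustration of the procedure, not the JMP Pro implementation used in the study:

```python
from math import sqrt
from statistics import mean, stdev

def tost_paired(x, y, margin):
    """Paired two-one-sided t-tests (TOST) for equivalence: the two
    means are declared equivalent when the mean difference lies within
    +/- margin.  Returns the two t statistics; equivalence at level
    alpha requires both to exceed the critical value t(1 - alpha, n - 1)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    se = stdev(d) / sqrt(n)             # standard error of the mean difference
    t_lower = (mean(d) + margin) / se   # tests H0: difference <= -margin
    t_upper = (margin - mean(d)) / se   # tests H0: difference >= +margin
    return t_lower, t_upper
```

In the study, `margin` corresponds to 1% of the mean original intracranial volume, with per-case original/deformed volumes as the paired samples.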

Deformation Processing
In all 119 cases, facial reconstruction models were created, and control points were set. Reference images could be selected in all cases, and deformation processing was performed on all original images using the selected reference images. The original, reference, and deformed images of one case are shown in Fig. 6.

Evaluation of Accuracy
The face detection rate was 100% in all face detection programs for both reconstructed face models of the original images and those of deformed images. The distribution of the match confidence scores is shown in Fig. 7. The median match confidence scores were 86.77 (interquartile range: 84.16-87.93) for Face API (Fig. 7a), 85.36 (interquartile range: 83.29-87.70) for Rekognition (Fig. 7b), and 84.59 (interquartile range: 81.07-88.04) for NeoFace KAOATO (Fig. 7c). In all cases, the match confidence scores were less than 90 for all face identification programs, and the distribution of match confidence scores did not differ among the three programs.
The mean ± standard deviation of the intracranial volume was 1412 ± 148 (×10³ mm³) in the original images and 1410 ± 152 (×10³ mm³) in the deformed images.
The relationship between the intracranial volumes of the original and deformed images for each case is shown in Fig. 8. An equivalence test was performed using 1% of the mean intracranial volume of the original images, 14 × 10³ mm³, as the equivalence margin. The difference between the means was above the lower limit (−14 × 10³ mm³; P = .0148) and below the upper limit (14 × 10³ mm³; P = .0012), indicating that the mean intracranial volumes of the original and deformed images were statistically equivalent; in other words, the two mean volumes fell within the predefined range. The variance of the intracranial volume in the deformed images was 105.5% of that in the original images, a small increase but not an important difference.
The median correlation coefficient was 0.9965 (interquartile range: 0.9951-0.9974), and the intracranial pixel value histograms of the two images showed high similarity in all cases (Table 2).
The original and deformed images were input into the deep learning model for intracranial segmentation, and the DSC was calculated for all 19 test cases (the images not used for training). The DSC distribution is shown in Fig. 9. The median DSC was 0.9967 (interquartile range: 0.9962-0.9973) for the original images and 0.9952 (interquartile range: 0.9945-0.9958) for the deformed images, both high. Bland-Altman analysis was performed to compare the DSCs of the original and deformed images. The mean differences in the DSCs between the two images were within the acceptable error margin (95% confidence interval). The DSCs of the original and deformed images were therefore nearly statistically equivalent.

Study Results Summary
Using control points, a new de-identification technique was developed for head CT images that deforms original images to resemble reference images. The reconstructed face models of the deformed images remained face-detectable and showed sufficient facial changes from the original images. The intracranial volume and pixel value histograms were equivalent before and after deformation, and the output accuracy of the deep learning model for intracranial segmentation was equivalent for the original and deformed images.
Importantly, faces could be detected in all the deformed reconstructed face models, likely because using another person's image as the reference guaranteed that the deformation target was a human face, even when the control points moved significantly.
According to guidelines provided by the European Data Protection Board (European Data Protection Board, 2022), the threshold for match confidence scores in critical security situations involving personal information, such as police and banks, is recommended to be ≥ 90 to consider two faces the same person. However, no appropriate thresholds for face identification tests between reconstructed face models have been reported. Here, a match confidence score < 90, generally judged to indicate a different person, was accepted as a change in the face. The match confidence scores were < 90 for all face identification programs, possibly because the facial features of the original images were sufficiently altered toward the facial features of the reference images, which the deformations were designed to resemble more closely.

Fig. 7 Distribution of match confidence scores for each face identification program: a Face API, b Rekognition, c NeoFace KAOATO. All face identification programs had match confidence scores below 90 in all cases, and there were no significant differences among the three programs
Because the accuracy of face identification by human visual assessment has been reported to be significantly less than that of face identification programs (Chen et al., 2014;Prior et al., 2009), a human visual assessment was not performed.
Intracranial volume has been reported to decrease by approximately 10% between 40 and 75 years old (Fillmore et al., 2015); therefore, the effect of a volume change < 1% in this study was considered sufficiently small. The correlation coefficients of the intracranial pixel value histograms between the original and deformed images were higher than 0.9, indicating a robust correlation. One reason for the small changes in intracranial information before and after deformation may be that the deformation was attenuated with distance from the face surface, so the deformation effect was smaller in the interior. The reasons for the equivalence of the DSCs could be the absence of high-impact processing, such as deletion and blurring, the suppression of internal deformations, and the absence of unnatural boundary areas.

Strengths and Novelties of this Technique
The problems with existing techniques are that face detection is no longer possible, substantial information changes occur during processing, and the output accuracy of deep learning models related to medical image segmentation is reduced. No technique has overcome all these problems. The de-identification technique proposed in this study has the potential to be more useful regarding face detection, information changes, and maintaining the accuracy of deep learning models. The main advantage of our technique over existing techniques is that no unnatural boundaries are generated, consistency is maintained, and the deformed image is indistinguishable from the original one when compared.
If de-identification techniques involve excessive processing, too much information is lost, and the processed images become impractical to use as research material. Conversely, if the degree of processing is too small, there is a greater risk that the face may be identified. There has always been a trade-off between information preservation and the prevention of face identification. The three main novelties of this technique are that control points were set for the deformation of the face surface, a reference image was used to move the control points, and the degree of deformation was attenuated with distance from the control points. Using reference images preserved the shape of the face regardless of how far the control points moved. Because the control points were located only on the face surface and the deformation was attenuated with distance from them, changes to intracranial information were minimized.

Limitations and Future Work
This study had several limitations. Only CT images were used, but in the future, the technique must also be applied to MRI to confirm its accuracy.
There is no established method to evaluate the success or failure of de-identification of medical images, and the degree of deformation at which a facial reconstruction model can be said to show "a different face" is unknown. We reported only the technical details of de-identification, and it cannot be strictly asserted that the results of this study make it possible to legally anonymize the data. The concept of "ELSI" is considered an essential non-technical issue when developing medical technology and sharing medical data (Fisher, 2005). ELSI stands for "Ethical, Legal, and Social Issues" and advocates discussing the ethical issues and the impact on individuals and society of new methods and technologies that are not yet addressed by current laws. The proposed de-identification technique could contribute to the socially and ethically sound treatment of personal information, even where legal interpretation is challenging.

Conclusion
A new de-identification technique was developed for head CT images that deforms original images toward reference images using control points. The reconstructed face models of the deformed images were face-detectable and exhibited sufficient facial changes from the originals in all cases; intracranial volume and intracranial pixel value histograms were equivalent before and after deformation. The output accuracy of the deep learning model for intracranial segmentation was equivalent for the original and deformed images.
Author Contributions Uchida, Kin, and Saito conceived and designed the study. Uchida performed data acquisition. Uchida wrote the main manuscript text and prepared all figures. Takashima and Kawahara performed the statistical analysis. Kin and Saito gave final approval of the submitted version. All authors reviewed the manuscript.
Funding Open access funding provided by The University of Tokyo. This research was supported by JST CREST, Japan (grant number JPM-JCR17A1) and JSPS KAKENHI, Japan (grant number JP21K09095).

Data Availability
Data generated or analyzed during the study are available from the corresponding author by request.

Ethics Approval and Consent
The Institutional Review Board of the University of Tokyo Hospital approved the study protocol (approval number #2021107NI), and written informed consent was obtained from all patients prior to participation.
Competing Interests All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.