Abdominal fat quantification using convolutional networks

Objectives: To present software for automated adipose tissue quantification of abdominal magnetic resonance imaging (MRI) data using fully convolutional networks (FCN) and to evaluate its overall performance (accuracy, reliability, processing effort, and time) in comparison with an interactive reference method.

Materials and methods: Single-center data of patients with obesity were analyzed retrospectively with institutional review board approval. Ground truth for subcutaneous (SAT) and visceral adipose tissue (VAT) segmentation was provided by semiautomated region-of-interest (ROI) histogram thresholding of 331 full abdominal image series. Automated analyses were implemented using UNet-based FCN architectures and data augmentation techniques. Cross-validation was performed on hold-out data using standard similarity and error measures.

Results: The FCN models reached Dice coefficients of up to 0.954 for SAT and 0.889 for VAT segmentation during cross-validation. Volumetric SAT (VAT) assessment resulted in a Pearson correlation coefficient of 0.999 (0.997), relative bias of 0.7% (0.8%), and standard deviation of 1.2% (3.1%). Intraclass correlation (coefficient of variation) within the same cohort was 0.999 (1.4%) for SAT and 0.996 (3.1%) for VAT.

Conclusion: The presented methods for automated adipose tissue quantification showed substantial improvements over common semiautomated approaches (no reader dependence, less effort) and thus provide a promising option for adipose tissue quantification.

Clinical relevance statement: Deep learning techniques will likely enable image-based body composition analyses on a routine basis. The presented fully convolutional network models are well suited for full abdominopelvic adipose tissue quantification in patients with obesity.

Key Points:
• This work compared the performance of different deep-learning approaches for adipose tissue quantification in patients with obesity.
• Supervised deep learning–based methods using fully convolutional networks were suited best.
• Measures of accuracy were equal to or better than those of the operator-driven approach.

Supplementary Information: The online version contains supplementary material available at 10.1007/s00330-023-09865-w.

Performance metrics

Let $y_i$ denote the ground-truth class and $\hat{y}_i$ the predicted class of pixel $i$ in an image series of $N$ pixels. Pixelwise agreement is quantified by the accuracy

$\mathrm{ACC} = \frac{1}{N} \sum_{i=1}^{N} \delta_{y_i \hat{y}_i}$

and the Dice similarity coefficient

$\mathrm{DSC}_c = \frac{2 \sum_i \delta_{y_i c}\, \delta_{\hat{y}_i c}}{\sum_i \delta_{y_i c} + \sum_i \delta_{\hat{y}_i c}}$,

using the Kronecker delta $\delta$ and a category identifier $c$. For the segmentation task, $y_i, \hat{y}_i, c \in \{\mathrm{SAT}, \mathrm{VAT}, \mathrm{other}\}$. Here, accuracy and Dice similarity coefficient are used to measure the pixelwise agreement between predicted and ground-truth adipose tissue.
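As a concrete illustration, both segmentation metrics can be computed with a few lines of NumPy. The following is a minimal sketch; the integer label encoding (0 = other, 1 = SAT, 2 = VAT) and the function names are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

# Hypothetical label encoding (not specified in the text): 0 = other, 1 = SAT, 2 = VAT.
OTHER, SAT, VAT = 0, 1, 2

def accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of pixels whose predicted class matches the ground truth,
    i.e., the Kronecker delta summed over all pixels and divided by N."""
    return float(np.mean(y_true == y_pred))

def dice(y_true: np.ndarray, y_pred: np.ndarray, c: int) -> float:
    """Dice similarity coefficient for class c:
    2 * |intersection| / (|true pixels of c| + |predicted pixels of c|)."""
    t = (y_true == c)
    p = (y_pred == c)
    denom = t.sum() + p.sum()
    return float(2.0 * np.logical_and(t, p).sum() / denom) if denom else 1.0

# Toy 2x3 label maps:
gt = np.array([[1, 1, 2], [0, 2, 2]])
pr = np.array([[1, 1, 2], [0, 1, 2]])
print(accuracy(gt, pr))   # 5/6 ~ 0.833 (one mismatched pixel)
print(dice(gt, pr, SAT))  # 2*2 / (2+3) = 0.8
```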
The following metrics may be used for predictors of continuous random variables $V$, e.g., quantification of adipose tissue volumes. The Pearson correlation coefficient

$r(V, \hat{V}) = \frac{\mathrm{cov}(V, \hat{V})}{\sigma_V \, \sigma_{\hat{V}}}$

measures the linear correlation between true and predicted values, assuming standard deviations $\sigma_V$ and $\sigma_{\hat{V}}$ and covariance $\mathrm{cov}(V, \hat{V})$. Consider the relative difference $D = 1 - \hat{V}/V$ with realizations $\{d_i\}$. The mean percentage error $\mathrm{MPE}(D) = \langle D \rangle$ is a measure of the systematic prediction error (bias), with $\langle D \rangle$ being the arithmetic mean of the relative difference $D$. The standard deviation of the relative differences, $\mathrm{SD}(D) = \langle (D - \langle D \rangle)^2 \rangle^{1/2}$, estimates the variation prediction error by eliminating the mean bias from the individual differences. The root-mean-square percentage error is a measure of the average total prediction error according to $\mathrm{RMSPE}(D) = \langle D^2 \rangle^{1/2}$, with chevron brackets $\langle \cdot \rangle$ indicating the arithmetic mean.

The second Wasserstein distance $PW_2$ measures the similarity between the predicted and ground-truth distributions. Here, the relative difference was used as cost function to facilitate the comparison with RMSPE. $PW_2$ is obtained by

$PW_2 = \langle \tilde{d}_i^{\,2} \rangle^{1/2}$,

with the set $\{\tilde{d}_i\} = \{1 - \hat{V}_{\pi(i)}/V_i\}$ representing the solution of the linear assignment problem $\min_\pi \sum_i (1 - \hat{V}_{\pi(i)}/V_i)^2$ over pairings $\pi$, which can be solved, for instance, with the Hungarian algorithm. $PW_2$ is bounded by $0 \le PW_2 \le \mathrm{RMSPE}(D)$, with 0 indicating similar distributions of predicted and true values. Finally, the excess kurtosis $k$ quantifies the excess contribution of the tails in the distribution of variation errors in comparison to the normal distribution and indicates the relevance of outliers. It is obtained via

$k = \frac{\mu^{(4)}}{\mathrm{SD}(D)^4} - \kappa$,

where $\mu^{(4)} = \langle (D - \langle D \rangle)^4 \rangle$ denotes the fourth central moment and $\kappa = 3$ is the kurtosis of the standard normal distribution. In this work, Pearson correlation coefficient, mean percentage error, standard deviation of the relative differences, root-mean-square percentage error, second Wasserstein distance, and excess kurtosis were used to validate the predictors of SAT and VAT volumes for a given patient.
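The volume metrics can be computed along the following lines. This is a sketch assuming per-patient true and predicted volumes as NumPy arrays, with scipy.optimize.linear_sum_assignment standing in for the Hungarian algorithm mentioned above; the function name error_metrics is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.stats import pearsonr

def error_metrics(v_true: np.ndarray, v_pred: np.ndarray) -> dict:
    """Volume-prediction error metrics based on the relative difference
    d_i = 1 - v_pred_i / v_true_i (multiply by 100 to report in percent)."""
    d = 1.0 - v_pred / v_true
    mpe = d.mean()                     # systematic bias (MPE)
    sd = d.std(ddof=0)                 # variation error (SD)
    rmspe = np.sqrt(np.mean(d ** 2))   # average total error (RMSPE)
    # Second Wasserstein distance with the relative difference as cost:
    # optimal pairing of true and predicted values (Hungarian algorithm).
    cost = (1.0 - v_pred[None, :] / v_true[:, None]) ** 2
    rows, cols = linear_sum_assignment(cost)
    pw2 = np.sqrt(cost[rows, cols].mean())
    # Excess kurtosis of the relative differences (0 for a normal distribution).
    mu4 = np.mean((d - d.mean()) ** 4)
    k = mu4 / sd ** 4 - 3.0
    return {"r": pearsonr(v_true, v_pred)[0], "MPE": mpe, "SD": sd,
            "RMSPE": rmspe, "PW2": pw2, "kurtosis": k}
```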

FCN architectures
The UNet architecture (Fig. S1) is composed of an encoding pathway hierarchically capturing spatial context over different length scales and a symmetric decoder for proper resolution output (1). The encoder contains a series of convolution and downsampling (here: max pooling) operations. The decoding path comprises repeated convolution and upsampling sequences. Transposed convolutions are used for trainable interpolation during upsampling. Additional paths for gradient flow are provided by skip connections. The last convolution operation maps the resulting feature maps to the class logits. The UNet used here had a total of 8.8 million trainable parameters.

The DenseUNet architecture (2, 3) (Fig. S2) is a slight variation of the UNet design. The number of encoder and decoder blocks was increased by one. All encoding and decoding sequences use three interconnected convolutional layers (4). A deterministic max-unpooling operation was chosen to compensate for the increased block complexity during upsampling. The number of feature channels was decreased to 64 and kept constant throughout the network. This lowered the number of learned parameters to 3.3 million.
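For orientation, the UNet pattern just described (encoder with max pooling, decoder with transposed convolutions, skip connections, and a final 1×1 convolution to class logits) can be sketched in PyTorch as follows. Depth, channel counts, and normalization choices are illustrative assumptions and do not reproduce the 8.8-million-parameter model of Fig. S1.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Two 3x3 convolutions with batch norm and ReLU, a typical UNet stage."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.block(x)

class MiniUNet(nn.Module):
    """Two-scale UNet: max-pooling encoder, transposed-convolution decoder,
    skip connections, and a 1x1 head mapping to 3 classes (SAT, VAT, other)."""
    def __init__(self, n_classes: int = 3, base: int = 32):
        super().__init__()
        self.enc1 = ConvBlock(1, base)
        self.enc2 = ConvBlock(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = ConvBlock(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = ConvBlock(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = ConvBlock(base * 2, base)
        self.head = nn.Conv2d(base, n_classes, 1)
    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)  # per-pixel class logits
```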
Lastly, the addition of competitive learning (8) turns the above DenseUNet into a CDFNet (Competitive Dense Fully Convolutional Network, Fig. S3). Instead of fully connecting the convolutions within encoding or decoding blocks, only the strongest activations in each encoder and decoder sequence were allowed to pass (maxout operation). The same rule is applied to the skip connections between encoding and decoding pathways. It is hypothesized that a higher selectivity in features will improve the accuracy of the model (9). The CDFNet has 64 feature channels and 2.5 million trainable parameters.
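The competitive (maxout) fusion can be sketched as below; the module name and its use in place of concatenation are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MaxoutMerge(nn.Module):
    """Competitive fusion: instead of concatenating two feature maps (as UNet
    skip connections do), keep only the elementwise-strongest activation per
    channel, so that features compete rather than accumulate."""
    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return torch.maximum(a, b)

# In a CDFNet-style decoder one would replace
#   x = torch.cat([skip, upsampled], dim=1)   # concatenating fusion (doubles channels)
# with
#   x = MaxoutMerge()(skip, upsampled)        # competitive fusion (channel count unchanged)
```

Keeping the channel count constant under fusion is also what allows the network to hold 64 feature channels throughout while reducing the parameter count.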

Model training

Data were composed of T1-weighted (in-phase) MR images and their corresponding ground-truth segmentation maps. All FCN were trained in a supervised learning scheme using stochastic gradient descent with learning rate lr = 0.001, momentum m = 0.9, and batch size Nb = 4, with cross-entropy (CE) loss as objective function. Several FCN were trained using data augmentation via random image transformations to improve generalization performance (6). This included affine transformations (rescaling, translation, rotation, and shear), piecewise affine transformations, changes in perspective, and pixel-intensity scaling, as well as Gaussian blur, crop-and-pad, and cutout (7). The corresponding hyperparameters of the transformations for data augmentation, together with the SGD parameters and CE-loss class weights, were tuned in ex-ante cross-validation.

A common five-fold cross-validation scheme was applied to evaluate segmentation and fat quantification performance. For each split, the dataset was divided into training, validation, and test subsets according to the ratio 3:1:1. In this way the entire dataset may be used for model evaluation, testing each of the five obtained models on the respective hold-out data and aggregating performance measures over all models. The training subset was shuffled prior to each epoch, and early stopping was used for regularization (5), with model performance on the validation subset after each training epoch serving as indicator. At last, the hold-out test subset was used for model validation.
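The training protocol above could be sketched as follows in PyTorch. Data loaders, the augmentation parameter values, and the early-stopping patience are assumptions, not the tuned values from the paper; the pipeline covers only a subset of the listed transformations and assumes tensor-valued images.

```python
import torch
from torch import nn
from torchvision import transforms

# Hypothetical augmentation pipeline (affine, perspective, blur, cutout);
# all parameter values are placeholders, not the ex-ante tuned settings.
augment = transforms.Compose([
    transforms.RandomAffine(degrees=10, translate=(0.1, 0.1),
                            scale=(0.9, 1.1), shear=5),
    transforms.RandomPerspective(distortion_scale=0.2),
    transforms.GaussianBlur(kernel_size=3),
    transforms.RandomErasing(p=0.3),  # cutout-style occlusion
])

def train_fold(model: nn.Module, train_loader, val_loader,
               max_epochs: int = 200, patience: int = 10) -> nn.Module:
    """Train one cross-validation fold with SGD (lr = 0.001, momentum = 0.9)
    and CE loss; batch size 4 is set in the loader. Early stopping monitors
    validation loss after each epoch and restores the best model."""
    opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()  # class weights tuned ex ante (not shown)
    best_val, best_state, stale = float("inf"), None, 0
    for _ in range(max_epochs):
        model.train()
        for images, labels in train_loader:  # loader shuffles each epoch
            opt.zero_grad()
            loss_fn(model(images), labels).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(x), y).item() for x, y in val_loader)
        if val < best_val:  # validation improved: keep a copy of the weights
            best_val, stale = val, 0
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
        else:
            stale += 1
            if stale >= patience:  # stop once validation stops improving
                break
    model.load_state_dict(best_state)
    return model
```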
Supplementary Figures

Figure S1. UNet architecture. Feature maps are shown in green with dimensional …
