CT-measured body composition radiomics predict lymph node metastasis in localized pancreatic ductal adenocarcinoma

Background To explore the value of CT-measured body composition radiomics in the preoperative evaluation of lymph node metastasis (LNM) in localized pancreatic ductal adenocarcinoma (LPDAC).
Methods We retrospectively collected patients with LPDAC who underwent surgical resection from January 2016 to June 2022. Male and female patients were each divided into an LNM group and a non-LNM group according to their postoperative lymph node status. Body composition was measured on preoperative CT images at the level of the L3 vertebral body, and radiomics features of adipose tissue and muscle were extracted. Multivariate logistic regression (forward LR) analyses were used to identify predictors of LNM separately in male and female patients. Sex-specific (sexually dimorphic) prediction signatures based on adipose tissue radiomics features, muscle radiomics features, and a combination of both were developed and compared. Model performance was evaluated in terms of discrimination and validated using leave-one-out cross-validation (LOOCV).
Results A total of 196 patients (mean age, 60 years ± 9 [SD]; 117 men) were enrolled, including 59 men and 36 women with LNM. Both the male and the female CT-measured body composition radiomics signatures showed some predictive power for LNM in LPDAC. Among the single-tissue signatures, the female adipose tissue signature showed the highest performance (area under the ROC curve [AUC], 0.895), and LOOCV indicated that it correctly classified 83.5% of cases. Adding the muscle radiomics features further improved its performance (AUC, 0.924; LOOCV accuracy, 87.3%). The male adipose tissue and muscle radiomics signatures performed similarly in predicting LNM of LPDAC, with AUCs of 0.735 and 0.773 and LOOCV accuracies of 62.4% and 68.4%, respectively.
Conclusions The CT-measured body composition radiomics strategy showed good performance for predicting LNM in LPDAC and exhibited sexual dimorphism. It may provide a reference for individualized treatment of LPDAC and for future research on body composition.
Supplementary Information The online version contains supplementary material available at 10.1007/s12672-023-00624-3.
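The validation scheme described in the Methods (a logistic regression signature, AUC for discrimination, and LOOCV accuracy) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the forward likelihood-ratio (forward LR) feature selection is not reproduced, the logistic model is fitted by plain gradient descent with a small L2 penalty, features are standardized within each training fold, and a probability threshold of 0.5 defines the LOOCV classification. All function names are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=2000, l2=1e-3):
    """Fit an L2-penalized logistic regression by batch gradient descent (assumed fitting method)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # current predicted probabilities
        w -= lr * (X.T @ (prob - y) / n + l2 * w)
        b -= lr * np.mean(prob - y)
    return w, b

def predict_proba(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def auc(y, scores):
    """AUC as the probability that a random positive case outscores a random negative one (ties count 1/2)."""
    y, scores = np.asarray(y), np.asarray(scores, dtype=float)
    pos, neg = scores[y == 1], scores[y == 0]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def loocv_accuracy(X, y, threshold=0.5):
    """Leave-one-out CV: refit on N - 1 cases, then classify the single held-out case."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n = len(y)
    correct = 0
    for k in range(n):
        train = np.arange(n) != k
        mu = X[train].mean(axis=0)
        sd = X[train].std(axis=0) + 1e-12  # standardize with training-fold statistics only
        w, b = fit_logistic((X[train] - mu) / sd, y[train])
        prob = predict_proba(((X[k] - mu) / sd)[None, :], w, b)[0]
        correct += int((prob >= threshold) == (y[k] == 1))
    return correct / n
```

Refitting the whole model, including the standardization, inside every leave-one-out fold keeps the held-out case from influencing the model that classifies it.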


First order statistics (19 features)
Notations: X is an image of N voxels included in the ROI; P(i) is the first-order histogram with $N_l$ discrete intensity levels, where $N_l$ is the number of non-zero bins; p(i) is the normalized first-order histogram, equal to $\frac{P(i)}{\sum_i P(i)} = \frac{P(i)}{N}$. (These definitions also apply to the following sections.)
• 10Percentile The 10th percentile of X.

• 90Percentile
The 90 th percentile of X.

• Energy
$$\text{energy} = \sum_{i=1}^{N}\left(X(i) + c\right)^2$$
Here, c is an optional value, defined by "voxelArrayShift", which shifts the intensities to prevent negative values in X. This ensures that voxels with the lowest gray values contribute the least to Energy, instead of voxels with gray level intensity closest to 0.
Energy is a measure of the magnitude of voxel values in an image. A larger value implies a greater sum of the squares of these values.
Note: This feature is volume-confounded; a larger value of c increases the effect of volume-confounding.
• Entropy
$$\text{entropy} = -\sum_{i=1}^{N_l} p(i)\log_2\left(p(i) + \epsilon\right)$$
Here, ε is an arbitrarily small positive number that avoids taking the logarithm of 0. Entropy specifies the uncertainty/randomness in the image values. It measures the average amount of information required to encode the image values.
• InterquartileRange $\text{interquartile range} = P_{75} - P_{25}$. Here, $P_{25}$ and $P_{75}$ are the 25th and 75th percentiles of the image array, respectively.
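• Kurtosis
$$\text{kurtosis} = \frac{\mu_4}{\sigma^4} = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(X(i) - \bar{X}\right)^4}{\left(\frac{1}{N}\sum_{i=1}^{N}\left(X(i) - \bar{X}\right)^2\right)^2}$$
Where $\mu_4$ is the 4th central moment and $\bar{X}$ is the mean of X.
Kurtosis is a measure of the 'peakedness' of the distribution of values in the image ROI. A higher kurtosis implies that the mass of the distribution is concentrated towards the tail(s) rather than towards the mean. A lower kurtosis implies the reverse: that the mass of the distribution is concentrated towards a spike near the mean value. Because 3 is not subtracted (this is not the excess kurtosis), a normal distribution has a kurtosis of 3 under this definition.
Related links: https://en.wikipedia.org/wiki/Kurtosis
• Maximum $\text{maximum} = \max(X)$. The maximum gray level intensity within the ROI.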
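• Mean $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X(i)$. The average gray level intensity within the ROI.
• MeanAbsoluteDeviation $\text{MAD} = \frac{1}{N}\sum_{i=1}^{N}\left|X(i) - \bar{X}\right|$. Mean Absolute Deviation is the mean distance of all intensity values from the mean value of the image array.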

• Median
The median gray level intensity within the ROI.
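• Minimum $\text{minimum} = \min(X)$. The minimum gray level intensity within the ROI.
• Range $\text{range} = \max(X) - \min(X)$. The range of gray values in the ROI.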
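• RobustMeanAbsoluteDeviation
$$\text{rMAD} = \frac{1}{N_{10-90}}\sum_{i=1}^{N_{10-90}}\left|X_{10-90}(i) - \bar{X}_{10-90}\right|$$
Robust Mean Absolute Deviation is the mean distance of all intensity values from the mean value, calculated on the subset $X_{10-90}$ of the image array with gray levels between, or equal to, the 10th and 90th percentiles; $N_{10-90}$ is the number of voxels in this subset.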
• RootMeanSquared
$$\text{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X(i) + c\right)^2}$$
Here, c is an optional value, defined by "voxelArrayShift", which shifts the intensities to prevent negative values in X. This ensures that voxels with the lowest gray values contribute the least to RMS, instead of voxels with gray level intensity closest to 0. RMS is the square root of the mean of all the squared intensity values. It is another measure of the magnitude of the image values. Note: This feature is volume-confounded; a larger value of c increases the effect of volume-confounding.
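• Skewness
$$\text{skewness} = \frac{\mu_3}{\sigma^3} = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(X(i) - \bar{X}\right)^3}{\left(\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X(i) - \bar{X}\right)^2}\right)^3}$$
Where $\mu_3$ is the 3rd central moment.
Skewness measures the asymmetry of the distribution of values about the mean value. Depending on where the tail is elongated and the mass of the distribution is concentrated, this value can be positive or negative.
Related links: https://en.wikipedia.org/wiki/Skewness
• StandardDeviation
$$\text{standard deviation} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X(i) - \bar{X}\right)^2}$$
Standard Deviation measures the amount of variation or dispersion from the mean value. By definition, standard deviation $= \sqrt{\text{variance}}$.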
• TotalEnergy
$$\text{total energy} = V_{voxel}\sum_{i=1}^{N}\left(X(i) + c\right)^2$$
Here, c is an optional value, defined by "voxelArrayShift", which shifts the intensities to prevent negative values in X. This ensures that voxels with the lowest gray values contribute the least to Energy, instead of voxels with gray level intensity closest to 0. Total Energy is the value of the Energy feature scaled by the volume of the voxel, $V_{voxel}$, in cubic mm. Note: This feature is volume-confounded; a larger value of c increases the effect of volume-confounding.
• Uniformity
$$\text{uniformity} = \sum_{i=1}^{N_l} p(i)^2$$
Uniformity is the sum of the squares of the normalized histogram values p(i) of each discrete intensity level. It is a measure of the homogeneity of the image array: a greater uniformity implies a greater homogeneity, i.e., a smaller range of discrete intensity values.
• Variance
$$\text{variance} = \frac{1}{N}\sum_{i=1}^{N}\left(X(i) - \bar{X}\right)^2$$
Variance is the mean of the squared distances of each intensity value from the mean value. This is a measure of the spread of the distribution about the mean. By definition, variance $= \sigma^2$.
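The first-order definitions above can be checked numerically with a short NumPy sketch. It assumes X is supplied as a 1-D array of ROI intensities and uses population moments (division by N). As further assumptions, the histogram p(i) used by Entropy and Uniformity is built with a fixed bin width (25 by default, in the style of PyRadiomics' fixed-bin-width discretization), and Skewness and Kurtosis are returned as 0 for a flat region. The function name `first_order_features` is hypothetical.

```python
import numpy as np

def first_order_features(x, c=0.0, bin_width=25.0, voxel_volume=1.0):
    """Sketch of first-order features for the ROI intensities x.

    c            -- optional intensity shift (the role of "voxelArrayShift")
    bin_width    -- fixed bin width, used only for the histogram-based features
    voxel_volume -- volume of one voxel in mm^3 (used by TotalEnergy)
    """
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    mean = x.mean()
    dev = x - mean
    variance = np.mean(dev ** 2)  # divides by N, not N - 1
    sd = np.sqrt(variance)

    # Discretize with a fixed bin width; np.unique keeps only the non-zero bins (N_l of them).
    levels = np.floor(x / bin_width) - np.floor(x.min() / bin_width) + 1
    _, counts = np.unique(levels, return_counts=True)
    p = counts / n                     # p(i) = P(i) / N
    eps = np.finfo(float).eps          # small constant that avoids log2(0)

    p10, p25, p75, p90 = np.percentile(x, [10, 25, 75, 90])
    robust = x[(x >= p10) & (x <= p90)]  # the subset X_{10-90}
    shifted_sq = (x + c) ** 2

    return {
        "10Percentile": p10,
        "90Percentile": p90,
        "Energy": shifted_sq.sum(),
        "TotalEnergy": voxel_volume * shifted_sq.sum(),
        "Entropy": -np.sum(p * np.log2(p + eps)),
        "InterquartileRange": p75 - p25,
        "Kurtosis": np.mean(dev ** 4) / variance ** 2 if variance > 0 else 0.0,
        "Maximum": x.max(),
        "Mean": mean,
        "MeanAbsoluteDeviation": np.mean(np.abs(dev)),
        "Median": np.median(x),
        "Minimum": x.min(),
        "Range": x.max() - x.min(),
        "RobustMeanAbsoluteDeviation": np.mean(np.abs(robust - robust.mean())),
        "RootMeanSquared": np.sqrt(shifted_sq.mean()),
        "Skewness": np.mean(dev ** 3) / sd ** 3 if sd > 0 else 0.0,
        "StandardDeviation": sd,
        "Uniformity": np.sum(p ** 2),
        "Variance": variance,
    }
```

For the intensities 1 to 5, for example, Energy is 55, Variance is 2, Skewness is 0 (a symmetric distribution), and Kurtosis is 1.7; with a bin width large enough to place every voxel in one bin, Uniformity reaches its maximum of 1.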
Shape features (13 features)
• Compactness1
$$\text{compactness 1} = \frac{V}{\sqrt{\pi A^3}}$$
Here, V is the volume and A the surface area of the ROI. Similar to Sphericity, Compactness 1 is a measure of how compact the shape of the tumor is relative to a sphere (most compact). It is therefore correlated with Sphericity and redundant. It is provided here for completeness. The value range is $0 < \text{compactness 1} \le \frac{1}{6\pi}$, where a value of $\frac{1}{6\pi}$ indicates a perfect sphere. By definition, $\text{compactness 1} = \frac{1}{6\pi}\sqrt{\text{compactness 2}} = \frac{1}{6\pi}\sqrt{\text{sphericity}^3}$.
Note: This feature is correlated with Compactness 2, Sphericity and Spherical Disproportion. In the default parameter file provided in the "pyradiomics\bin" folder, Compactness 1 and Compactness 2 are therefore disabled.
• Compactness2
$$\text{compactness 2} = 36\pi\frac{V^2}{A^3}$$
Similar to Sphericity and Compactness 1, Compactness 2 is a measure of how compact the shape of the tumor is relative to a sphere (most compact). It is a dimensionless measure, independent of scale and orientation. The value range is $0 < \text{compactness 2} \le 1$, where a value of 1 indicates a perfect sphere.
By definition, $\text{compactness 2} = \text{sphericity}^3$. Note: This feature is correlated with Compactness 1, Sphericity and Spherical Disproportion. In the default parameter file provided in the "pyradiomics\bin" folder, Compactness 1 and Compactness 2 are therefore disabled.

• Elongation
Elongation is calculated using its implementation in SimpleITK, and is defined as:
$$\text{elongation} = \sqrt{\frac{\lambda_{minor}}{\lambda_{major}}}$$
Here, $\lambda_{major}$ and $\lambda_{minor}$ are the lengths of the largest and second largest principal component axes. The values range between 1 (where the cross section through the first and second largest principal moments is circle-like (non-elongated)) and 0 (where the object is a single point or a 1-dimensional line).

• Flatness
Flatness is calculated using its implementation in SimpleITK, and is defined as:
$$\text{flatness} = \sqrt{\frac{\lambda_{least}}{\lambda_{major}}}$$
Here, $\lambda_{major}$ and $\lambda_{least}$ are the lengths of the largest and smallest principal component axes. The values range between 1 (non-flat, sphere-like) and 0 (a flat object).

• Maximum2DDiameterColumn
Maximum 2D diameter (Column) is defined as the largest pairwise Euclidean distance between tumor surface voxels in the row-slice (usually the coronal) plane.

• Maximum2DDiameterRow
Maximum 2D diameter (Row) is defined as the largest pairwise Euclidean distance between tumor surface voxels in the column-slice (usually the sagittal) plane.

• Maximum2DDiameterSlice
Maximum 2D diameter (Slice) is defined as the largest pairwise Euclidean distance between tumor surface voxels in the row-column (generally the axial) plane.

• Maximum3DDiameter
Maximum 3D diameter is defined as the largest pairwise Euclidean distance between surface voxels in the ROI.
Also known as Feret Diameter.
• SphericalDisproportion
$$\text{spherical disproportion} = \frac{A}{4\pi R^2} = \frac{A}{\sqrt[3]{36\pi V^2}}$$
Where R is the radius of a sphere with the same volume as the tumor, equal to $\sqrt[3]{\frac{3V}{4\pi}}$. Spherical Disproportion is the ratio of the surface area of the tumor region to the surface area of a sphere with the same volume as the tumor region, and by definition, the inverse of Sphericity. Therefore, the value range is $\text{spherical disproportion} \ge 1$, with a value of 1 indicating a perfect sphere.
Note: This feature is correlated with Compactness 1, Compactness 2 and Sphericity. In the default parameter file provided in the "pyradiomics\bin" folder, Compactness 1 and Compactness 2 are therefore disabled.
• Sphericity
$$\text{sphericity} = \frac{\sqrt[3]{36\pi V^2}}{A}$$
Sphericity is a measure of the roundness of the shape of the tumor region relative to a sphere. It is a dimensionless measure, independent of scale and orientation. The value range is $0 < \text{sphericity} \le 1$, where a value of 1 indicates a perfect sphere (a sphere has the smallest possible surface area for a given volume, compared to other solids).
Note: This feature is correlated with Compactness 1, Compactness 2 and Spherical Disproportion. In the default parameter file provided in the "pyradiomics\bin" folder, Compactness 1 and Compactness 2 are therefore disabled.

• SurfaceArea
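$$A = \sum_{i=1}^{N}\frac{1}{2}\left|\mathbf{a}_i\mathbf{b}_i \times \mathbf{a}_i\mathbf{c}_i\right|$$
Here, N is the number of triangles forming the surface mesh of the volume (ROI), and $\mathbf{a}_i\mathbf{b}_i$ and $\mathbf{a}_i\mathbf{c}_i$ are the edges of the ith triangle formed by the points $\mathbf{a}_i$, $\mathbf{b}_i$ and $\mathbf{c}_i$. Surface Area is an approximation of the surface of the ROI in mm², calculated using a marching cubes algorithm.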
• SurfaceVolumeRatio $\text{surface to volume ratio} = \frac{A}{V}$. Here, a lower value indicates a more compact (sphere-like) shape. This feature is not dimensionless and is therefore (partly) dependent on the volume of the ROI.

• Volume
The volume of the ROI is approximated by multiplying the number of voxels in the ROI by the volume of a single voxel.
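Several of the shape descriptors above depend only on V and A, so the stated identities (sphericity and spherical disproportion equal 1 for a sphere, compactness 2 equals sphericity cubed, compactness 1 equals 1/(6π) for a sphere) are easy to verify numerically. The sketch below assumes V and A are already known (no marching-cubes mesh is built), takes the λ values for Elongation and Flatness to be the eigenvalues of the covariance matrix of the voxel coordinates instead of calling SimpleITK, and computes the maximum 3D diameter by brute force over the supplied points. All function names are hypothetical.

```python
import numpy as np

def shape_features_from_V_A(V, A):
    """Shape descriptors that depend only on volume V (mm^3) and surface area A (mm^2)."""
    sphericity = np.cbrt(36 * np.pi * V ** 2) / A
    return {
        "Sphericity": sphericity,
        "SphericalDisproportion": A / np.cbrt(36 * np.pi * V ** 2),  # = 1 / sphericity
        "Compactness1": V / np.sqrt(np.pi * A ** 3),
        "Compactness2": 36 * np.pi * V ** 2 / A ** 3,               # = sphericity ** 3
        "SurfaceVolumeRatio": A / V,
    }

def elongation_flatness(coords):
    """Elongation and flatness from the principal components of (n_points, 3) coordinates.

    Assumption: the lambdas are eigenvalues of the coordinate covariance matrix.
    """
    cov = np.cov(np.asarray(coords, dtype=float), rowvar=False)
    lam_least, lam_minor, lam_major = np.linalg.eigvalsh(cov)  # returned in ascending order
    return np.sqrt(lam_minor / lam_major), np.sqrt(lam_least / lam_major)

def maximum_3d_diameter(points):
    """Largest pairwise Euclidean distance between points (brute force, O(n^2) memory)."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).max()
```

For a sphere of any radius, `shape_features_from_V_A(4/3*np.pi*r**3, 4*np.pi*r**2)` gives a sphericity of 1, while a cube (V = 8, A = 24) gives about 0.81, reflecting its larger surface area for the same volume.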

GLCM features (28 features)
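Notations: P(i, j) is the co-occurrence matrix for an arbitrary distance δ and angle θ; p(i, j) is the normalized co-occurrence matrix, equal to $\frac{P(i,j)}{\sum P(i,j)}$; $N_g$ is the number of discrete intensity levels in the image; $p_x(i) = \sum_{j=1}^{N_g} p(i,j)$ and $p_y(j) = \sum_{i=1}^{N_g} p(i,j)$ are the marginal row and column probabilities; $\mu_x$ and $\mu_y$ are the mean gray level intensities of $p_x$ and $p_y$, and $\sigma_x$ and $\sigma_y$ their standard deviations; $p_{x+y}(k) = \sum_{i,j} p(i,j)$ over all pairs with $i + j = k$, for $k = 2, 3, \dots, 2N_g$; $p_{x-y}(k) = \sum_{i,j} p(i,j)$ over all pairs with $|i - j| = k$, for $k = 0, 1, \dots, N_g - 1$; ε is an arbitrarily small positive number. In this section, $\sum_{i,j}$ denotes $\sum_{i=1}^{N_g}\sum_{j=1}^{N_g}$.
• Autocorrelation $\text{autocorrelation} = \sum_{i,j} p(i,j)\,ij$. Autocorrelation is a measure of the magnitude of the fineness and coarseness of texture.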
• AverageIntensity $\mu_x = \sum_{i,j} p(i,j)\,i$. Returns the mean gray level intensity of the i distribution.
Warning: As this formula represents the average of the distribution of i, it is independent of the distribution of j. Therefore, only use this formula if the GLCM is symmetrical, where $p_x(i) = p_y(j)$ for $i = j$.
• ClusterProminence $\text{cluster prominence} = \sum_{i,j}\left(i + j - \mu_x - \mu_y\right)^4 p(i,j)$. Cluster Prominence is a measure of the skewness and asymmetry of the GLCM.
A higher value implies more asymmetry about the mean, while a lower value indicates a peak near the mean value and less variation about the mean.
• ClusterShade $\text{cluster shade} = \sum_{i,j}\left(i + j - \mu_x - \mu_y\right)^3 p(i,j)$. Cluster Shade is a measure of the skewness and uniformity of the GLCM. A higher cluster shade implies greater asymmetry about the mean.
• ClusterTendency $\text{cluster tendency} = \sum_{i,j}\left(i + j - \mu_x - \mu_y\right)^2 p(i,j)$. Cluster Tendency is a measure of groupings of voxels with similar gray-level values.
• Contrast $\text{contrast} = \sum_{i,j}(i - j)^2 p(i,j)$. Contrast is a measure of the local intensity variation, favoring values away from the diagonal ($i = j$). A larger value correlates with a greater disparity in intensity values among neighboring voxels.
• Correlation $\text{correlation} = \frac{\sum_{i,j} p(i,j)\,ij - \mu_x\mu_y}{\sigma_x(i)\,\sigma_y(j)}$. Correlation is a value between −1 (perfectly anti-correlated) and 1 (perfectly correlated), with 0 indicating no correlation, showing the linear dependency of gray level values to their respective voxels in the GLCM. Note: When there is only 1 discrete gray value in the ROI (flat region), $\sigma_x$ and $\sigma_y$ will be 0. In this case, the value of correlation will be NaN.
• DifferenceEntropy $\text{difference entropy} = -\sum_{k=0}^{N_g-1} p_{x-y}(k)\log_2\left(p_{x-y}(k) + \epsilon\right)$. Difference Entropy is a measure of the randomness/variability in neighborhood intensity value differences.
• DifferenceVariance $\text{difference variance} = \sum_{k=0}^{N_g-1}(k - DA)^2 p_{x-y}(k)$, where $DA = \sum_{k=0}^{N_g-1} k\,p_{x-y}(k)$ is the difference average. Difference Variance is a measure of heterogeneity that places higher weights on differing intensity level pairs that deviate more from the mean.
• Dissimilarity $\text{dissimilarity} = \sum_{i,j}|i - j|\,p(i,j)$. Dissimilarity is a measure of local intensity variation defined as the mean absolute difference between the neighboring pairs. A larger value correlates with a greater disparity in intensity values among neighboring voxels.

• Energy
$$\text{energy} = \sum_{i,j}\left(p(i,j)\right)^2$$
Energy (or Angular Second Moment) is a measure of homogeneous patterns in the image. A greater Energy implies that there are more instances of intensity value pairs in the image that neighbor each other at higher frequencies.
• Entropy $\text{entropy} = -\sum_{i,j} p(i,j)\log_2\left(p(i,j) + \epsilon\right)$. Entropy is a measure of the randomness/variability in neighborhood intensity values.
• Homogeneity1 $\text{homogeneity 1} = \sum_{i,j}\frac{p(i,j)}{1 + |i - j|}$. Homogeneity 1 is a measure of the similarity in intensity values for neighboring voxels. It is a measure of local homogeneity that increases with less contrast in the window.
• Homogeneity2 $\text{homogeneity 2} = \sum_{i,j}\frac{p(i,j)}{1 + |i - j|^2}$. Homogeneity 2 is a measure of the similarity in intensity values for neighboring voxels.
• Id $\text{ID} = \sum_{i,j}\frac{p(i,j)}{1 + |i - j|}$. ID (inverse difference) is another measure of the local homogeneity of an image. With more uniform gray levels, the denominator will remain low, resulting in a higher overall value.
• Idm $\text{IDM} = \sum_{i,j}\frac{p(i,j)}{1 + |i - j|^2}$. IDM (inverse difference moment) is a measure of the local homogeneity of an image. IDM weights are the inverse of the Contrast weights (decreasing with increasing distance from the diagonal $i = j$ in the GLCM).
• Idmn $\text{IDMN} = \sum_{i,j}\frac{p(i,j)}{1 + \frac{|i - j|^2}{N_g^2}}$. IDMN (inverse difference moment normalized) is a measure of the local homogeneity of an image. IDMN weights are the inverse of the Contrast weights (decreasing with increasing distance from the diagonal $i = j$ in the GLCM). Unlike Homogeneity2, IDMN normalizes the square of the difference between neighboring intensity values by dividing it by the square of the total number of discrete intensity values.
• Idn $\text{IDN} = \sum_{i,j}\frac{p(i,j)}{1 + \frac{|i - j|}{N_g}}$. IDN (inverse difference normalized) is another measure of the local homogeneity of an image. Unlike Homogeneity1, IDN normalizes the difference between the neighboring intensity values by dividing it by the total number of discrete intensity values.
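• Imc1 (Informational Measure of Correlation 1)
• MaximumProbability $\text{maximum probability} = \max\left(p(i,j)\right)$. Maximum Probability is the occurrence of the most predominant pair of neighboring intensity values.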
• SumAverage $\text{sum average} = \sum_{k=2}^{2N_g} p_{x+y}(k)\,k$. Sum Average measures the relationship between occurrences of pairs with lower intensity values and occurrences of pairs with higher intensity values.
• SumEntropy $\text{sum entropy} = -\sum_{k=2}^{2N_g} p_{x+y}(k)\log_2\left(p_{x+y}(k) + \epsilon\right)$. Sum Entropy measures the randomness/variability in the sums of neighboring intensity values.
• SumSquares $\text{sum squares} = \sum_{i,j}(i - \mu_x)^2 p(i,j)$. Sum of Squares or Variance is a measure of the distribution of neighboring intensity level pairs about the mean intensity level in the GLCM. Warning: This formula represents the variance of the distribution of i and is independent of the distribution of j. Therefore, only use this formula if the GLCM is symmetrical, where $p_x(i) = p_y(j)$ for $i = j$.
• SumVariance $\text{sum variance} = \sum_{k=2}^{2N_g}(k - SE)^2 p_{x+y}(k)$, where SE is the Sum Entropy. Sum Variance is a measure of heterogeneity that places higher weights on neighboring intensity level pairs that deviate more from the mean.

• SumVariance2
$$\text{sum variance 2} = \sum_{k=2}^{2N_g}\left(k - SA\right)^2 p_{x+y}(k)$$
Sum Variance 2 is computed from the coefficients $p_{x+y}$ and the Sum Average (SA). It is a measure of heterogeneity that places higher weights on neighboring intensity level pairs that deviate more from the mean.
This formula differs from SumVariance in that, instead of subtracting the SumEntropy from the intensity, it subtracts the SumAverage, which is the mean of intensities and not its entropy.
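A minimal NumPy sketch of how a GLCM and a subset of the features above are obtained. It assumes a 2-D image whose gray levels are already discretized to 1..N_g, a single offset (one pixel to the right by default), and a symmetric matrix; a full implementation would typically compute several directions in 3-D and aggregate the features. Correlation returns NaN in a flat region, matching the note on Correlation. The helper names `glcm` and `glcm_features` are hypothetical.

```python
import numpy as np

def glcm(img, offset=(0, 1), n_levels=None, symmetric=True):
    """Co-occurrence matrix P(i, j) of a 2-D image with gray levels 1..Ng for one offset."""
    img = np.asarray(img, dtype=int)
    ng = n_levels or int(img.max())
    dr, dc = offset
    rows, cols = img.shape
    P = np.zeros((ng, ng))
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:  # count only neighbors inside the image
                P[img[r, c] - 1, img[r2, c2] - 1] += 1
    return P + P.T if symmetric else P

def glcm_features(P):
    p = P / P.sum()  # normalized co-occurrence matrix
    ng = p.shape[0]
    levels = np.arange(1, ng + 1)
    i, j = np.meshgrid(levels, levels, indexing="ij")
    eps = np.finfo(float).eps
    px, py = p.sum(axis=1), p.sum(axis=0)  # marginal probabilities
    mu_x, mu_y = np.sum(px * levels), np.sum(py * levels)
    sigma_x = np.sqrt(np.sum(px * (levels - mu_x) ** 2))
    sigma_y = np.sqrt(np.sum(py * (levels - mu_y) ** 2))
    # p_{x+y}(k) for k = 2..2Ng and p_{x-y}(k) for k = 0..Ng-1
    p_sum = np.bincount((i + j).ravel(), weights=p.ravel(), minlength=2 * ng + 1)[2:]
    p_diff = np.bincount(np.abs(i - j).ravel(), weights=p.ravel(), minlength=ng)
    k_sum = np.arange(2, 2 * ng + 1)
    denom = sigma_x * sigma_y
    return {
        "Autocorrelation": np.sum(p * i * j),
        "Contrast": np.sum((i - j) ** 2 * p),
        "Correlation": (np.sum(p * i * j) - mu_x * mu_y) / denom if denom > 0 else np.nan,
        "DifferenceEntropy": -np.sum(p_diff * np.log2(p_diff + eps)),
        "Dissimilarity": np.sum(np.abs(i - j) * p),
        "Energy": np.sum(p ** 2),
        "Entropy": -np.sum(p * np.log2(p + eps)),
        "Homogeneity1": np.sum(p / (1 + np.abs(i - j))),
        "Homogeneity2": np.sum(p / (1 + (i - j) ** 2)),
        "Idmn": np.sum(p / (1 + (i - j) ** 2 / ng ** 2)),
        "Idn": np.sum(p / (1 + np.abs(i - j) / ng)),
        "MaximumProbability": p.max(),
        "SumAverage": np.sum(p_sum * k_sum),
    }
```

For the checkerboard image `[[1, 2], [2, 1]]` every horizontal neighbor pair differs, so Contrast is 1 and Correlation is −1, which shows that GLCM correlation can be negative.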

GLRLM features (16 features)
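Notations: P(i, j | θ) is the run length matrix for direction θ; p(i, j | θ) is the normalized run length matrix, equal to $\frac{P(i,j|\theta)}{\sum_{i,j} P(i,j|\theta)}$; $N_g$ is the number of discrete intensity values in the image; $N_r$ is the number of discrete run lengths in the image; $N_p$ is the number of voxels in the image. In this section, $\sum_{i,j}$ denotes $\sum_{i=1}^{N_g}\sum_{j=1}^{N_r}$, and $\sum_{i,j} P(i,j|\theta)$ is the total number of runs.
• GrayLevelNonUniformity $\text{GLN} = \frac{\sum_{i=1}^{N_g}\left(\sum_{j=1}^{N_r} P(i,j|\theta)\right)^2}{\sum_{i,j} P(i,j|\theta)}$. GLN measures the similarity of gray-level intensity values in the image, where a lower GLN value correlates with a greater similarity in intensity values.
• GrayLevelNonUniformityNormalized $\text{GLNN} = \frac{\sum_{i=1}^{N_g}\left(\sum_{j=1}^{N_r} P(i,j|\theta)\right)^2}{\left(\sum_{i,j} P(i,j|\theta)\right)^2}$. GLNN measures the similarity of gray-level intensity values in the image, where a lower GLNN value correlates with a greater similarity in intensity values. This is the normalized version of the GLN formula.
• GrayLevelVariance $\text{GLV} = \sum_{i,j} p(i,j|\theta)(i - \mu)^2$, where $\mu = \sum_{i,j} p(i,j|\theta)\,i$. GLV measures the variance in gray level intensity for the runs.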
• HighGrayLevelRunEmphasis $\text{HGLRE} = \frac{\sum_{i,j} P(i,j|\theta)\,i^2}{\sum_{i,j} P(i,j|\theta)}$. HGLRE measures the distribution of the higher gray-level values, with a higher value indicating a greater concentration of high gray-level values in the image.
• LongRunEmphasis $\text{LRE} = \frac{\sum_{i,j} P(i,j|\theta)\,j^2}{\sum_{i,j} P(i,j|\theta)}$. LRE is a measure of the distribution of long run lengths, with a greater value indicative of longer run lengths and coarser structural textures.
• LongRunHighGrayLevelEmphasis $\text{LRHGLRE} = \frac{\sum_{i,j} P(i,j|\theta)\,i^2 j^2}{\sum_{i,j} P(i,j|\theta)}$. LRHGLRE measures the joint distribution of long run lengths with higher gray-level values.
• LongRunLowGrayLevelEmphasis $\text{LRLGLRE} = \frac{\sum_{i,j} \frac{P(i,j|\theta)\,j^2}{i^2}}{\sum_{i,j} P(i,j|\theta)}$. LRLGLRE measures the joint distribution of long run lengths with lower gray-level values.
• LowGrayLevelRunEmphasis $\text{LGLRE} = \frac{\sum_{i,j} \frac{P(i,j|\theta)}{i^2}}{\sum_{i,j} P(i,j|\theta)}$. LGLRE measures the distribution of low gray-level values, with a higher value indicating a greater concentration of low gray-level values in the image.
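• RunEntropy $\text{RE} = -\sum_{i,j} p(i,j|\theta)\log_2\left(p(i,j|\theta) + \epsilon\right)$. RE measures the uncertainty/randomness in the distribution of run lengths and gray levels. A higher value indicates more heterogeneity in the texture patterns.
• RunLengthNonUniformity $\text{RLN} = \frac{\sum_{j=1}^{N_r}\left(\sum_{i=1}^{N_g} P(i,j|\theta)\right)^2}{\sum_{i,j} P(i,j|\theta)}$. RLN measures the similarity of run lengths throughout the image, with a lower value indicating more homogeneity among run lengths in the image.
• RunLengthNonUniformityNormalized $\text{RLNN} = \frac{\sum_{j=1}^{N_r}\left(\sum_{i=1}^{N_g} P(i,j|\theta)\right)^2}{\left(\sum_{i,j} P(i,j|\theta)\right)^2}$. RLNN measures the similarity of run lengths throughout the image, with a lower value indicating more homogeneity among run lengths in the image. This is the normalized version of the RLN formula.
• RunVariance $\text{RV} = \sum_{i,j} p(i,j|\theta)(j - \mu)^2$, where $\mu = \sum_{i,j} p(i,j|\theta)\,j$. RV is a measure of the variance in runs for the run lengths.
• ShortRunEmphasis $\text{SRE} = \frac{\sum_{i,j} \frac{P(i,j|\theta)}{j^2}}{\sum_{i,j} P(i,j|\theta)}$. SRE is a measure of the distribution of short run lengths, with a greater value indicative of shorter run lengths and finer textures.
• ShortRunHighGrayLevelEmphasis $\text{SRHGLE} = \frac{\sum_{i,j} \frac{P(i,j|\theta)\,i^2}{j^2}}{\sum_{i,j} P(i,j|\theta)}$. SRHGLE measures the joint distribution of shorter run lengths with higher gray-level values.
• ShortRunLowGrayLevelEmphasis $\text{SRLGLE} = \frac{\sum_{i,j} \frac{P(i,j|\theta)}{i^2 j^2}}{\sum_{i,j} P(i,j|\theta)}$. SRLGLE measures the joint distribution of shorter run lengths with lower gray-level values.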
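The run length matrix and several of the emphasis, non-uniformity, variance and entropy features above can be sketched as follows. The sketch assumes a 2-D image with gray levels 1..N_g and only the horizontal direction θ = 0; the normalizations divide by the total number of runs, ΣP(i, j | θ). The helper names `glrlm_horizontal` and `glrlm_features` are hypothetical.

```python
import numpy as np

def glrlm_horizontal(img, n_levels=None):
    """Run length matrix P(i, j | theta = 0) of a 2-D image with gray levels 1..Ng."""
    img = np.asarray(img, dtype=int)
    ng = n_levels or int(img.max())
    P = np.zeros((ng, img.shape[1]))  # a horizontal run is at most one row long
    for row in img:
        run_level, run_length = row[0], 1
        for value in row[1:]:
            if value == run_level:
                run_length += 1
            else:  # the run ended: record it and start a new one
                P[run_level - 1, run_length - 1] += 1
                run_level, run_length = value, 1
        P[run_level - 1, run_length - 1] += 1  # close the last run of the row
    return P

def glrlm_features(P):
    ng, nr = P.shape
    i = np.arange(1, ng + 1)[:, None]  # gray levels (rows)
    j = np.arange(1, nr + 1)[None, :]  # run lengths (columns)
    n_runs = P.sum()
    p = P / n_runs
    eps = np.finfo(float).eps
    mu_i, mu_j = np.sum(p * i), np.sum(p * j)
    return {
        "GrayLevelNonUniformity": np.sum(P.sum(axis=1) ** 2) / n_runs,
        "GrayLevelNonUniformityNormalized": np.sum(P.sum(axis=1) ** 2) / n_runs ** 2,
        "GrayLevelVariance": np.sum(p * (i - mu_i) ** 2),
        "HighGrayLevelRunEmphasis": np.sum(P * i ** 2) / n_runs,
        "LongRunEmphasis": np.sum(P * j ** 2) / n_runs,
        "LowGrayLevelRunEmphasis": np.sum(P / i ** 2) / n_runs,
        "RunEntropy": -np.sum(p * np.log2(p + eps)),
        "RunLengthNonUniformity": np.sum(P.sum(axis=0) ** 2) / n_runs,
        "RunLengthNonUniformityNormalized": np.sum(P.sum(axis=0) ** 2) / n_runs ** 2,
        "RunVariance": np.sum(p * (j - mu_j) ** 2),
        "ShortRunEmphasis": np.sum(P / j ** 2) / n_runs,
    }
```

As a sanity check, summing run length times count over P recovers the number of pixels: the image `[[1, 1, 2], [2, 2, 2]]` has three runs (lengths 2, 1 and 3) covering its six pixels.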

GLSZM features (16 features)
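Notations: P(i, j) is the size zone matrix; p(i, j) is the normalized size zone matrix, equal to $\frac{P(i,j)}{\sum_{i,j} P(i,j)}$; $N_g$ is the number of discrete intensity values in the image; $N_s$ is the number of discrete zone sizes in the image; $N_p$ is the number of voxels in the image. In this section, $\sum_{i,j}$ denotes $\sum_{i=1}^{N_g}\sum_{j=1}^{N_s}$, and $\sum_{i,j} P(i,j)$ is the total number of zones.
• GrayLevelNonUniformity $\text{GLN} = \frac{\sum_{i=1}^{N_g}\left(\sum_{j=1}^{N_s} P(i,j)\right)^2}{\sum_{i,j} P(i,j)}$. GLN measures the variability of gray-level intensity values in the image, with a lower value indicating more homogeneity in intensity values.
• GrayLevelNonUniformityNormalized $\text{GLNN} = \frac{\sum_{i=1}^{N_g}\left(\sum_{j=1}^{N_s} P(i,j)\right)^2}{\left(\sum_{i,j} P(i,j)\right)^2}$. GLNN measures the variability of gray-level intensity values in the image, with a lower value indicating a greater similarity in intensity values. This is the normalized version of the GLN formula.
• GrayLevelVariance $\text{GLV} = \sum_{i,j} p(i,j)(i - \mu)^2$, where $\mu = \sum_{i,j} p(i,j)\,i$. GLV measures the variance in gray level intensities for the zones.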
• HighGrayLevelZoneEmphasis $\text{HGLZE} = \frac{\sum_{i,j} P(i,j)\,i^2}{\sum_{i,j} P(i,j)}$. HGLZE measures the distribution of the higher gray-level values, with a higher value indicating a greater proportion of higher gray-level values and size zones in the image.
• LargeAreaEmphasis $\text{LAE} = \frac{\sum_{i,j} P(i,j)\,j^2}{\sum_{i,j} P(i,j)}$. LAE is a measure of the distribution of large area size zones, with a greater value indicative of more large zones and coarser textures.
• LargeAreaHighGrayLevelEmphasis $\text{LAHGLE} = \frac{\sum_{i,j} P(i,j)\,i^2 j^2}{\sum_{i,j} P(i,j)}$. LAHGLE measures the proportion in the image of the joint distribution of larger size zones with higher gray-level values.
• LargeAreaLowGrayLevelEmphasis $\text{LALGLE} = \frac{\sum_{i,j} \frac{P(i,j)\,j^2}{i^2}}{\sum_{i,j} P(i,j)}$. LALGLE measures the proportion in the image of the joint distribution of larger size zones with lower gray-level values.
• LowGrayLevelZoneEmphasis $\text{LGLZE} = \frac{\sum_{i,j} \frac{P(i,j)}{i^2}}{\sum_{i,j} P(i,j)}$. LGLZE measures the distribution of lower gray-level size zones, with a higher value indicating a greater proportion of lower gray-level values and size zones in the image.
• SizeZoneNonUniformity $\text{SZN} = \frac{\sum_{j=1}^{N_s}\left(\sum_{i=1}^{N_g} P(i,j)\right)^2}{\sum_{i,j} P(i,j)}$. SZN measures the variability of size zone volumes in the image, with a lower value indicating more homogeneity in size zone volumes.
• SizeZoneNonUniformityNormalized $\text{SZNN} = \frac{\sum_{j=1}^{N_s}\left(\sum_{i=1}^{N_g} P(i,j)\right)^2}{\left(\sum_{i,j} P(i,j)\right)^2}$. SZNN measures the variability of size zone volumes throughout the image, with a lower value indicating more homogeneity among zone size volumes in the image. This is the normalized version of the SZN formula.
• SmallAreaEmphasis $\text{SAE} = \frac{\sum_{i,j} \frac{P(i,j)}{j^2}}{\sum_{i,j} P(i,j)}$. SAE is a measure of the distribution of small size zones, with a greater value indicative of more small zones and finer textures.
• SmallAreaHighGrayLevelEmphasis $\text{SAHGLE} = \frac{\sum_{i,j} \frac{P(i,j)\,i^2}{j^2}}{\sum_{i,j} P(i,j)}$. SAHGLE measures the proportion in the image of the joint distribution of smaller size zones with higher gray-level values.
• SmallAreaLowGrayLevelEmphasis $\text{SALGLE} = \frac{\sum_{i,j} \frac{P(i,j)}{i^2 j^2}}{\sum_{i,j} P(i,j)}$. SALGLE measures the proportion in the image of the joint distribution of smaller size zones with lower gray-level values.
• ZoneEntropy $\text{ZE} = -\sum_{i,j} p(i,j)\log_2\left(p(i,j) + \epsilon\right)$. ZE measures the uncertainty/randomness in the distribution of zone sizes and gray levels. A higher value indicates more heterogeneity in the texture patterns.
• ZonePercentage $\text{ZP} = \frac{\sum_{i,j} P(i,j)}{N_p}$. ZP measures the coarseness of the texture as the ratio of the number of zones to the number of voxels in the ROI. Values are in the range $\frac{1}{N_p} \le \text{ZP} \le 1$, with higher values indicating that a larger portion of the ROI consists of small zones (a finer texture).
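Unlike runs, size zones are connected groups of voxels that share a gray level, so building P(i, j) needs a connected-component search. The sketch below assumes a 2-D image with gray levels 1..N_g, treats the whole image as the ROI (so N_p is the image size), and uses 8-connectivity via a breadth-first flood fill; other connectivity choices would change the zones. The helper names `glszm` and `glszm_features` are hypothetical.

```python
import numpy as np
from collections import deque

def glszm(img, n_levels=None):
    """Size zone matrix P(i, j) of a 2-D image with gray levels 1..Ng (8-connected zones)."""
    img = np.asarray(img, dtype=int)
    ng = n_levels or int(img.max())
    rows, cols = img.shape
    seen = np.zeros(img.shape, dtype=bool)
    zones = []  # (gray level, zone size) for every connected zone
    for r in range(rows):
        for c in range(cols):
            if seen[r, c]:
                continue
            level, size = img[r, c], 0
            seen[r, c] = True
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over same-level neighbors
                y, x = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny, nx] and img[ny, nx] == level):
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            zones.append((level, size))
    P = np.zeros((ng, max(s for _, s in zones)))
    for level, size in zones:
        P[level - 1, size - 1] += 1
    return P

def glszm_features(P):
    ng, ns = P.shape
    i = np.arange(1, ng + 1)[:, None]  # gray levels (rows)
    j = np.arange(1, ns + 1)[None, :]  # zone sizes (columns)
    n_zones = P.sum()
    n_voxels = np.sum(P * j)  # each zone of size j covers j voxels
    p = P / n_zones
    eps = np.finfo(float).eps
    return {
        "GrayLevelNonUniformity": np.sum(P.sum(axis=1) ** 2) / n_zones,
        "SizeZoneNonUniformity": np.sum(P.sum(axis=0) ** 2) / n_zones,
        "SmallAreaEmphasis": np.sum(P / j ** 2) / n_zones,
        "LargeAreaEmphasis": np.sum(P * j ** 2) / n_zones,
        "LowGrayLevelZoneEmphasis": np.sum(P / i ** 2) / n_zones,
        "HighGrayLevelZoneEmphasis": np.sum(P * i ** 2) / n_zones,
        "ZoneEntropy": -np.sum(p * np.log2(p + eps)),
        "ZonePercentage": n_zones / n_voxels,
    }
```

The connectivity choice matters: with 8-connectivity the diagonal image `[[1, 2], [2, 1]]` contains two zones of size 2, whereas with 4-connectivity it would contain four zones of size 1.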