Introduction

Machine learning-based computer-aided diagnosis (ML-based CAD) is a field that analyzes large datasets of patient data, particularly medical images, to assist clinicians in decision-making. Numerous studies in this field have addressed different subjects and image modalities, such as the characterization of breast tumors with MRI scans [1,2,3], the detection of cerebral aneurysms with CT angiographies [4, 5], and the detection of lung nodules with chest X-rays [6, 7]. Within such studies, the dataset serves as the foundation upon which ML models are trained, directly influencing the performance of a CAD system. However, preparing the dataset is difficult because hard-to-automate processes such as image acquisition, anonymization, annotation, and preprocessing must be repeated, demanding additional time and effort from research participants. Several studies, including our earlier work, proposed comprehensive process flows to address this difficulty and prepare datasets efficiently [8, 9]. Nevertheless, our previously proposed process flow was still rigid, as it required a complete re-implementation when dealing with another image modality or diagnostic purpose. This study therefore improves our former work by reorganizing processes into common and modality/subject-dependent parts, making the system partly reusable across joint-research projects regardless of image modality and diagnostic purpose. To evaluate its effectiveness, we present two demonstrations related to breast cancer diagnosis.

Breast cancer diagnosis is a typical example of ML-based CAD applications. Advanced techniques allow early detection and treatment of the cancer with excellent survival expectations. Because MRI scans provide detailed images of breast tissue, precise information about tumor size, location, and proximity to surrounding structures can be obtained with popular ML algorithms. Furthermore, the extensive quantitative data extracted as radiomics features from the tumor region can aid diagnosis and prognosis, since such features are believed to reveal underlying mechanisms at the genetic and molecular levels [10]. Meanwhile, some researchers have shifted their focus to long-term esthetic outcomes following breast reconstruction surgery after cancer treatment. Various tools have been introduced, such as the Breast Cancer Conservative Treatment (BCCT.core) [11], the Breast Analyzing Tool (BAT) [12], and the kOBCS© [13], to overcome subjectivity and provide reliability in the evaluation of esthetic outcomes. Among these, BCCT.core was utilized in some recent studies [14, 15]; however, it is difficult to evaluate multiple cases collectively because the tool requires interactive operations at every step.

The rest of this paper is structured as follows. “Dataset Preparation in ML-Based Medical Image Diagnosis” section presents our improved procedure for dataset preparation in ML-based CAD. “Procedure of Image Acquisition, Anonymization and Annotation” section describes the implementation of the common process part for two demonstrative systems. The processing specific to each of the demonstrative systems is introduced in “Processing of 3D Breast Images for Geometric Feature Extraction” and “Processing of Breast MRI Images for Radiomics Feature Extraction” sections. Finally, “Conclusion” section offers a summary of the main contributions drawn from this study.

Dataset Preparation in ML-Based Medical Image Diagnosis

In practice, ML-based CAD systems require cooperation among different participant groups, including medical institutions, collaborative medical specialists, and research institutions, to prepare large training datasets. Nonetheless, such cooperation may lead to issues in data consistency, confidentiality, and interoperability unless a well-organized procedure is established. The expected procedure should not only address these issues but also be widely applicable to a variety of image modalities for different diagnostic purposes. Accordingly, we developed a novel dataset preparation process for ML-based CAD; its schematic representation is shown in Fig. 1. Briefly, it consists of a common part for generating anonymized, unprocessed exchange data, followed by modality/subject-dependent parts for processing these data into a final dataset targeting a specific diagnostic purpose.

Fig. 1 Process flow of dataset preparation dedicated to ML-based CAD

Common Part for Image Acquisition, Anonymization and Annotation

The common part aims to generate an anonymized version of unprocessed data for exchange among participant groups. Since the implementation of processes in this part is either similar across studies or falls under a few anticipated cases, it is expected to be highly reusable across different research projects.

Medical institutions perform image acquisition, incremental anonymization, and filtering. As part of routine tasks, it is favorable to adopt a common predefined configuration when acquiring clinical images from individual patients. These images and sensitive information are then securely stored on confidential media. Periodically, anonymization and filtering are carried out for multiple patients at once to generate anonymized images with less effort. Required objects and tools for sending annotation requests to medical specialists are also prepared in advance. As for the annotation form, employing an Optical Character Recognition (OCR)-ready design will accelerate subsequent data collection and error correction at the research institution.

Collaborative medical specialists, whether from the medical institution where the clinical images were acquired or from other institutions, are requested to perform the manual annotation. By observing the given anonymized images, specialists record their diagnoses and findings as ROI bounding boxes, ROI masks, or information filled into the annotation form. Each examination case is supposed to undergo multiple rounds of annotation by different specialists, ensuring a thorough evaluation.

Eventually, the outcome of the common part consists of anonymized images and annotation objects such as filled forms and ROI masks, which are stored on data transfer media. Since this media serves as a hub for interaction among participant groups, data consistency and interoperability are guaranteed, while confidentiality is maintained through incremental anonymization. By applying the workflow of this part widely across studies, unnecessary variation of procedures due to differences in image modality and diagnostic purpose can be alleviated.

Modality/Subject-Dependent Part for Preprocessing and Feature Extraction

This part aims to transform anonymized images into feature primitives, generate thumbnail images, and collect valid annotation data. The resulting final dataset is then utilized for ML model training and validation. Unlike the common part, processes in this part are tailored to a specific image modality and a particular diagnostic purpose.

Annotation objects prepared by medical specialists may not be immediately usable for calculation by the research institution because of their paper-based representation or their mismatch with the desired format. To quickly access the data in filled annotation forms, an OCR device is used; error correction is then performed on the recognized fields and data to confirm their validity. As for the received ROI markups in the form of masks and bounding boxes, further checks are required to compare their spatial dimensions with those of the corresponding anonymized image and to ensure mutual agreement on markup locations among different annotators, as sketched below. Next, transforms such as value-range standardization and format conversion can be executed on the qualified markups. Annotation data that pass data collection and error correction are then saved to the final dataset.
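For illustration, a minimal sketch of such markup checks is given below, assuming the masks are volumetric files readable by SimpleITK. The function name `check_markup`, the file-based interface, and the 0.7 agreement threshold are hypothetical choices for demonstration, not part of the original implementation.

```python
# Minimal sketch of the ROI-markup quality checks described above.
import SimpleITK as sitk
import numpy as np

def check_markup(image_path: str, mask_paths: list[str],
                 dice_threshold: float = 0.7) -> bool:
    image = sitk.ReadImage(image_path)
    masks = [sitk.ReadImage(p) for p in mask_paths]

    # 1) Spatial dimensions of every mask must match the anonymized image.
    for m in masks:
        if m.GetSize() != image.GetSize():
            return False

    # 2) Pairwise Dice coefficient as a measure of inter-annotator agreement.
    arrays = [sitk.GetArrayFromImage(m) > 0 for m in masks]
    for i in range(len(arrays)):
        for j in range(i + 1, len(arrays)):
            intersection = np.logical_and(arrays[i], arrays[j]).sum()
            dice = 2.0 * intersection / (arrays[i].sum() + arrays[j].sum())
            if dice < dice_threshold:
                return False
    return True
```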

Since clinical images are often stored in specialized formats such as DICOM or NIfTI, they may be incompatible with popular image viewers, making in-dataset previews impossible, while dedicated software comes at a high cost in terms of time and memory. For these reasons, thumbnail images are created as a compact representation of clinical images, enabling quick observation of multiple images in the dataset without loading them.

The preprocessing can be streamlined by arranging complementary automatic tasks after the hard-to-automate manual tasks. From the entire image or a local region of each processed image, feature primitives such as pixel-value statistics, geometric features, or transformed images are extracted and appended to the final dataset. Finally, a particular ML model is trained and validated using the prepared dataset.

The proposed process flow can be reused in other studies of ML-based medical image diagnosis with high clinical applicability, regardless of image modality or diagnostic purpose. Its common part facilitates participation in cross annotation by external collaborators while securing patient privacy through automated anonymization of clinical images. By breaking the system down into parts, new modality/subject-dependent parts can be added and customized without disrupting the existing procedure of the common part.

Procedure of Image Acquisition, Anonymization and Annotation

This section describes the implementation, derived from a common procedure for incremental anonymization and cross annotation, of two demonstrative applications utilizing 3D breast images and breast MRI images. Further processing specific to each application is presented in subsequent sections.

For the application of breast esthetic outcome evaluation, 3D images were prepared with a handheld depth camera (Intel RealSense L515 [16]) to capture the shape of the breast. For the application of breast tumor detection, T1-weighted MRI images were acquired by 1.5 T or 3 T scanners to capture the tissue structures inside the breast before and after contrast enhancement.

Regarding the anonymization, personal information that reveals a patient’s identity was completely removed from folder names, file names, file headers, and image content of the acquired images, while some non-personal information was retained for specific purposes. In particular, examination-related information such as patient ID, acquisition date, and acquisition parameters (e.g., MRI sequence and sequence order) can be kept in folder and file names for ease of study organization. In the file header, demographic information such as sex, age, and weight can be retained. In the case of MRI headers, information on image orientation and position, slice thickness, and pixel spacing was kept to ensure interoperability and compatibility across different imaging systems. As for image content, patient faces must be excluded from all images. The incremental anonymization was performed automatically on a regular basis and only on newly added images, eliminating repeated anonymization of previously processed images.
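The incremental step could be realized, for example, by keeping a record of already-processed files, as in the pydicom sketch below. The tag list, record-file mechanism, and flat output layout are our assumptions for illustration, not the authors’ exact design.

```python
# Hedged sketch of incremental anonymization: on each run, only DICOM files
# not yet listed in a processed-file record are anonymized.
from pathlib import Path
import pydicom

# Illustrative subset of identity-revealing tags to delete.
IDENTITY_TAGS = ["PatientName", "PatientBirthDate", "PatientAddress",
                 "ReferringPhysicianName", "InstitutionName"]

def anonymize_new_files(src_dir: Path, dst_dir: Path, record: Path) -> None:
    dst_dir.mkdir(parents=True, exist_ok=True)
    done = set(record.read_text().splitlines()) if record.exists() else set()
    with record.open("a") as log:
        for dcm_path in sorted(src_dir.rglob("*.dcm")):
            if str(dcm_path) in done:
                continue  # already anonymized in an earlier run
            ds = pydicom.dcmread(dcm_path)
            for tag in IDENTITY_TAGS:
                if hasattr(ds, tag):
                    delattr(ds, tag)
            # Demographics (sex, age, weight) and geometry tags such as
            # ImageOrientationPatient and PixelSpacing are left untouched.
            ds.save_as(dst_dir / dcm_path.name)  # simplification: flat output
            log.write(f"{dcm_path}\n")
```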

Regarding the cross annotation, anonymized 3D breast images and corresponding OCR-compatible annotation sheets were distributed to collaborative medical specialists, who filled in their diagnoses. In the case of breast MRI images, coordinates of bounding boxes encapsulating the breast tumor were provided by the dataset provider. Based on these boxes, which were drawn by eight radiologists, we also prepared tumor masks for selected images using the Segmentation module of the 3D Slicer software. These various types of annotation are expected to serve a wider spectrum of analyses.

Processing of 3D Breast Images for Geometric Feature Extraction

The processing of 3D breast images for esthetic outcome evaluation and its typical outputs are depicted in Fig. 2. Since they were thoroughly described in our earlier study [8], this section briefly reviews them from the perspective of the modality/subject-dependent processing part.

Fig. 2 a Process flow and b typical outputs in processing of 3D breast images

The evaluation of postoperative breasts was based on factors such as breast size, height of the inframammary fold, and nipple position. Our research dealt with these factors by considering the disparities between the left and right breasts from several viewpoints, including shape and appearance. Here, 3D breast meshes constructed from triangular cells and shared edges were adopted as a standard shape-based viewpoint. Figure 2a presents the process of handling these 3D breast meshes. It commenced by extracting the breast region, a primary requirement for further calculation, and continued with several hard-to-automate processes such as tracing the breast outline and identifying the cross marks and nipples. Automatic scaling and region extraction were then carried out to obtain a normalized breast mesh surface. In the next step, various geometric features including volume, surface area, and center of gravity were automatically extracted from each breast mesh closure. Eventually, L-R contrast features based on the difference and the ratio between the bilateral breasts were derived by comparing the geometric feature values, as sketched below. Besides the 3D breast images, corresponding 2D images were also available as JPG files, so the thumbnail creation process was not needed. The processing procedure for such 2D images is not covered in this study.
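As an illustration of the automatic feature-extraction step, the sketch below computes geometric primitives and L-R contrast features from a pair of watertight mesh closures using the trimesh library; the file-based interface and the exact features returned are assumptions for demonstration, not the full implementation.

```python
# Illustrative sketch: geometric primitives and L-R contrast features
# from two breast mesh closures (assumed watertight and trimesh-loadable).
import trimesh

def lr_contrast_features(left_path: str, right_path: str) -> dict:
    left = trimesh.load(left_path)
    right = trimesh.load(right_path)

    # Geometric feature primitives per breast mesh closure.
    v_l, v_r = left.volume, right.volume                 # enclosed volume
    s_l, s_r = left.area, right.area                     # surface area
    cog_l, cog_r = left.center_mass, right.center_mass   # center of gravity

    # L-R contrast features based on ratio and difference.
    return {
        "volume_ratio": v_l / v_r,
        "surface_ratio": s_l / s_r,
        "volume_diff": v_l - v_r,
        "cog_offset": float(((cog_l - cog_r) ** 2).sum() ** 0.5),
    }
```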

Figure 2b presents four typical examples of extracted breast mesh closures and their feature primitives. The mesh closures exposed a smooth polyhedral surface without any holes or splits. Based on these mesh closures, L-R contrast features such as the volume ratio VL/VR and surface ratio SL/SR were calculated. In addition, esthetic scores averaged over the evaluation scores of four specialists were also provided. The results revealed that as the L-R features approached 1.0, the esthetic scores became higher, indicating a strong correlation between the extracted features and the esthetic outcome.

Processing of Breast MRI Images for Radiomics Feature Extraction

Aside from conventional 2D and 3D images used for assessing the external appearance of post-surgery breasts, MRI images provide a non-invasive method to explore internal breast structures and identify pertinent abnormalities. This section covers the adoption of the proposed integrated system to efficiently prepare a dataset of breast MRI images dedicated to breast tumor detection.

Material

For demonstrative purposes, we utilized breast MRI images from the publicly available Duke-Breast-Cancer-MRI dataset [17], accessible online at the Cancer Imaging Archive website (www.cancerimagingarchive.net). This dataset covers 922 patients with biopsy-confirmed invasive breast cancer whose preoperative MRI images were acquired by 1.5 T or 3 T scanners in the prone position. The images were provided in DICOM format in the axial plane and encompass non-fat-saturated T1-weighted, fat-saturated gradient-echo T1-weighted pre-contrast, and post-contrast sequences.

Patient identity information was completely removed from folder names, file names, file headers, and image content of the acquired images. To annotate the tumor position in each image, coordinates of tumor bounding boxes were provided with the dataset. Additionally, other annotation data were given in a worksheet file, for example, the side of cancer (left or right) and the Nottingham grade.

Creation of Thumbnail Images

Since the MRI images were in DICOM format and the tumor masks in NRRD format, preparing thumbnail images in a commonly used file format such as JPEG facilitates the immediate preview of multiple annotated MRI images. Figure 3 describes how a thumbnail image was generated from an MRI image and its corresponding mask. The procedure begins by reading each slice of the mask to select a representative slice. The selection criterion can be a slice-based measure such as mask area or mask diameter, or simply the middle slice of the mask; in our study, the slice with the maximum mask area was chosen. With the chosen slice index, the representative MRI and mask slices were extracted from the MRI image and its mask. Before creating the thumbnail image, settings such as mask appearance (contour, filling, opacity), output image dimensions, and file format should be selected. Finally, the thumbnail image was created by overlaying three layers from background to foreground: the representative MRI slice, the representative mask slice, and other annotation data displayed as an image caption.
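A minimal sketch of this procedure is given below, assuming SimpleITK for reading the DICOM series and NRRD mask and matplotlib for the overlay; the styling choices (red contour, caption placement, output size) are illustrative assumptions, not the exact implementation.

```python
# Sketch of the thumbnail procedure in Fig. 3: pick the slice with the
# largest mask area, overlay the mask contour, and add a text caption.
import SimpleITK as sitk
import matplotlib.pyplot as plt

def make_thumbnail(dicom_dir: str, mask_path: str,
                   out_path: str, caption: str) -> None:
    reader = sitk.ImageSeriesReader()
    reader.SetFileNames(reader.GetGDCMSeriesFileNames(dicom_dir))
    mri = sitk.GetArrayFromImage(reader.Execute())        # (slices, rows, cols)
    mask = sitk.GetArrayFromImage(sitk.ReadImage(mask_path)) > 0

    # Representative slice: the one with maximum mask area.
    idx = int(mask.sum(axis=(1, 2)).argmax())

    fig, ax = plt.subplots(figsize=(4, 4))
    ax.imshow(mri[idx], cmap="gray")                      # background: MRI slice
    ax.contour(mask[idx], colors="r", linewidths=1)       # overlay: mask contour
    ax.set_title(caption, fontsize=8)                     # annotation as caption
    ax.axis("off")
    fig.savefig(out_path, dpi=72, bbox_inches="tight")    # compact output file
    plt.close(fig)
```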

Fig. 3 Process flow of preparing thumbnail images from MRI and mask images

Preprocessing and Feature Extraction

Anonymized MRI images were standardized by the resampling process, followed by repeated sliding of an equilateral kernel to extract local features. The process flow for these operations is illustrated in Fig. 4.

Fig. 4 Preprocessing and feature extraction flow in breast MRI images

Regarding the preprocessing, analysis or comparison across multiple images is not appropriate when different MRI images in the dataset do not share the same voxel dimensions or voxel aspect ratio. Consequently, resampling was carried out on the MRI images to standardize the voxel dimensions; in our system, MRI images were resampled to a common voxel dimension of 1 mm × 1 mm × 1 mm. The corresponding masks also need to be updated to stay aligned with the resampled MRI images.
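A sketch of this resampling step using SimpleITK is shown below; linear interpolation is assumed for the MRI intensities and nearest-neighbor for the mask so that labels remain binary. The 1 mm isotropic target follows the text.

```python
# Sketch: resample an image (or its mask) to 1 mm x 1 mm x 1 mm voxels.
import SimpleITK as sitk

def resample_isotropic(image: sitk.Image, is_mask: bool = False) -> sitk.Image:
    new_spacing = (1.0, 1.0, 1.0)
    old_size, old_spacing = image.GetSize(), image.GetSpacing()
    # New size preserves the physical extent of the volume.
    new_size = [int(round(sz * sp / ns))
                for sz, sp, ns in zip(old_size, old_spacing, new_spacing)]
    return sitk.Resample(
        image, new_size,
        sitk.Transform(),                  # identity transform
        sitk.sitkNearestNeighbor if is_mask else sitk.sitkLinear,
        image.GetOrigin(), new_spacing, image.GetDirection(),
        0, image.GetPixelID())
```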

In medical image processing, target structures are often confined to a small region of interest. Global features are not suitable in these scenarios because they capture insights from the entire image as a whole, overlooking regional structures. Local features are therefore advantageous owing to their ability to capture the texture within a specific region. To begin, an equilateral kernel slides over the entire image, or over selected positions within the resampled MRI image, and extracts the region encapsulated by the kernel. The kernel size decides the field of view of local structures; therefore, it should be neither significantly larger nor smaller than the typical size of the target structure. At each kernel position, the extracted local region was used to calculate two classes of features, radiomics features and likelihood features, described as follows.

  • Radiomics features: These are quantitative features resulting from the conversion of medical images into mineable high-dimensional data. This process is driven by the belief that biomedical images reflect underlying pathophysiology, which can be revealed by extracting maximal information from routine care images [18, 19]. For demonstrative purposes, we extracted 15 selected radiomics features representing the intensity distribution and texture from all voxels within the local region, regardless of the tumor mask. The list of these radiomics features is included in Fig. 5, and the meaning of each feature is described in [20].

  • Likelihood features: Based on empirical observation of various breast MRI images, the region associated with the tumor tends to have different average brightness and heterogeneity compared with other regions. For this reason, likelihood features use the mean and standard deviation of the voxel intensity, or combinations thereof such as their mutual sum or mutual product, to characterize the probability of a tumor being present within the region (see the sketch after this list).
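The sketch below illustrates how the four likelihood features could be computed at one kernel position; the feature names mirror Fig. 5, but the exact definitions are our reading of the text, not verified formulas.

```python
# Sketch: likelihood features from a cubic local region at one kernel
# position (assumes the kernel fits entirely inside the volume).
import numpy as np

def likelihood_features(volume: np.ndarray, center: tuple,
                        kernel: int = 32) -> dict:
    half = kernel // 2
    z, y, x = center
    region = volume[z - half:z + half, y - half:y + half, x - half:x + half]

    loc_mean = float(region.mean())   # average brightness of the region
    loc_std = float(region.std())     # heterogeneity of the region
    return {
        "LocMean": loc_mean,
        "LocSTD": loc_std,
        "LocMean+LocSTD": loc_mean + loc_std,   # mutual sum
        "LocMeanxLocSTD": loc_mean * loc_std,   # mutual product
    }
```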

Fig. 5 Typical output features of preprocessing flow in breast MRI images

The two classes of features mentioned above were adopted to validate our proposed system. The diagnosis is not limited to these features; other handcrafted features, or features from deep learning, are also applicable.

Experimental Result

Time Performance

For selected MRI images from the Duke-Breast-Cancer-MRI dataset, we referred to the corresponding bounding boxes and drew the tumor mask as an NRRD (Nearly Raw Raster Data) file using the Segmentation module of the 3D Slicer software. The approximate duration to draw the tumor mask within a bounding box of size 41.5 mm × 38.8 mm × 12 mm on an MRI image was 6 min. These additional tumor masks enable visualization of the real tumor shape, as well as analysis of the geometry, intensity, and texture inside the tumor.

Loading and rendering one typical breast MRI image together with its tumor mask in 3D Slicer required roughly 1 min and 30 MB of memory. Meanwhile, it took about 7 s to generate a thumbnail image of less than 100 KB, which can be loaded easily and quickly at any time. The availability of thumbnail images enabled previewing large collections of annotated MRI images.

As for the preprocessing time, approximately 55 s were needed to resample an MRI image and its tumor mask manually with the Resample Scalar Volume module. When the resampling was performed automatically, the elapsed time was only about 2.17 s.

Regarding the feature extraction, a kernel size of 32 mm × 32 mm × 32 mm was adopted to capture tumors of moderate and large sizes. The extraction time at each kernel position with our Slicer Graphical User Interface (GUI) module was about 0.035 s for the 15 radiomics features and about 0.001 s for the four likelihood features. Accordingly, if the features were calculated for all kernel positions within a breast MRI image of size 300 mm × 300 mm × 185 mm, it would take approximately 7 days. However, the feature extraction can take significantly less time if it is carried out without the GUI and only on selected kernel positions.
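As a rough check of this estimate, assume one kernel evaluation per 1 mm voxel position and the combined per-position time of about 0.036 s measured above: \(300 \times 300 \times 185 \approx 1.67 \times 10^{7}\) positions, and \(1.67 \times 10^{7} \times 0.036\ \mathrm{s} \approx 6.0 \times 10^{5}\ \mathrm{s} \approx 6.9\) days, consistent with the 7-day figure.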

With the time performance examined above, the total duration is acceptable for preparing a dataset of various feature types. In addition, the availability of tumor masks and thumbnail images is believed to facilitate future analyses while maintaining a short generation time. Consequently, the proposed system is deemed appropriate to support medical image diagnosis with high clinical applicability.

Feature Effectiveness

Figure 5 shows the radiomics and likelihood features extracted at three typical kernel positions within a typical pre-contrast breast MRI image: a position deep inside the tumor, a position where the kernel slightly overlaps the tumor, and a position outside the tumor (control position). The kernel size was 32 mm × 32 mm × 32 mm. The last two columns, which give the ratio of feature values between the tumor-involved positions and the control position, are useful for selecting the best features for characterizing the tumor. Based on the presented results, we selected five features for further investigation: InterquartileRange, Autocorrelation, and ClusterShade from the radiomics group, and LocMeanxLocSTD and LocSTD from the likelihood group.

After feature selection, feature data were extracted at various kernel positions from the MRI images of multiple patients and used to train several machine learning models to detect whether the kernel center lies inside a tumor. The purpose of these models is to confirm the feature effectiveness rather than to perform complete tumor segmentation, which is beyond the scope of this paper. The model training experiment is described in Table 1. In particular, 2514 data samples extracted from 10 pre-contrast MRI images were organized into three distinct feature sets: radiomics with 15 features, likelihood with four features, and their combination with 19 features. Six models were trained from these feature sets with two algorithms, Random Forest (RF) and K-Nearest Neighbors (KNN). RF models were trained with 100 trees (n_estimators = 100); each tree used 1000 samples randomly drawn from the dataset (max_samples = 1000), and the number of features considered for splitting at each node was set to the square root of the total number of features (max_features = \(\sqrt{N_f}\)). KNN models made predictions by evaluating the cosine distance to the 10 nearest neighbors; the cosine distance was chosen because it is less affected by the high-dimensional sparsity that degrades some other distance metrics, such as the Euclidean distance. A training sketch is given below.

The models were then assessed with tenfold cross validation, and the averages of precision, sensitivity, f1-score, accuracy, and speed (logarithmic inverse of training time) were treated as performance metrics. Precision assesses the correctness of positive predictions and is measured by the proportion of true positives among all predicted positives. Sensitivity (recall) evaluates the model’s ability to detect true positives, indicating the proportion of true positives among all actual positives. F1-score is a balanced measure taking the harmonic mean of precision and recall. Accuracy reflects the overall correctness of a model and is calculated as the proportion of correct predictions out of all predictions.
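The sketch below reconstructs this training experiment with scikit-learn under the stated hyperparameters; the feature matrix and labels are random placeholders standing in for the 2514 extracted samples, so the printed scores are not the paper’s results.

```python
# Hedged reconstruction of the training experiment: RF and cosine-distance
# KNN, each evaluated with tenfold cross validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_validate

# Placeholders for the 2514 samples of the 19-feature combined set.
rng = np.random.default_rng(0)
X = rng.random((2514, 19))
y = rng.integers(0, 2, size=2514)   # 1 if the kernel center lies in a tumor

models = {
    # 100 trees, 1000 bootstrap samples per tree, sqrt(N_f) features per split
    "RF": RandomForestClassifier(n_estimators=100, max_samples=1000,
                                 max_features="sqrt"),
    # cosine distance to the 10 nearest neighbors
    "KNN": KNeighborsClassifier(n_neighbors=10, metric="cosine"),
}

for name, model in models.items():
    scores = cross_validate(model, X, y, cv=10,
                            scoring=["precision", "recall", "f1", "accuracy"])
    print(name, {k: round(v.mean(), 3)
                 for k, v in scores.items() if k.startswith("test_")})
```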

Table 1 Training experiments of tumor detection models

The results of the model assessment are shown in Fig. 6. The RF model with the combined feature set achieved the best scores in terms of sensitivity (0.804), f1 (0.795), and accuracy (0.897), whereas the RF model with radiomics features alone achieved the best precision (0.821). The KNN models were superior to the RF models only in terms of speed, and the combined feature set again attained the best performance. These results confirmed the effectiveness of the radiomics and likelihood features and their potential for breast tumor diagnosis with MRI images.

Fig. 6 Performance comparison of tumor detection models

Conclusion

In this article, we have proposed an integrated dataset-preparation system dedicated to ML-based medical image diagnosis with high clinical applicability, targeting any modality and diagnostic purpose. Processes in the proposed system were arranged into a common part and modality/subject-dependent parts: the common part encompasses general functions such as incremental anonymization and cross annotation, while the modality/subject-dependent parts accommodate functions tailored to a specific modality and purpose, such as thumbnail creation, preprocessing, and feature extraction. The incremental anonymization and filtering streamlined batch processing of acquired images, thereby reducing the workload of medical specialists. Cross annotation on anonymized images then enables privacy-ensured and robust collaboration between different specialists. Depending on the format of the clinical images, thumbnail images can be generated to provide quick observation across the dataset. To accelerate the preprocessing and feature extraction, a process flow combining manual and complementary automatic operations was also designed. Our system offers advantages in terms of procedure reusability, scalability, and confidentiality for joint projects with various modalities and purposes.

Two demonstrative systems were successfully employed to prove the effectiveness and high applicability of the developed procedure. Although each was associated with a different image modality and diagnostic purpose, they shared a similar implementation of the common process part, while subsequent processing, especially the feature extraction, was customized to each system. The system dedicated to plastic surgery evaluation from 3D breast images competently prepared datasets of 3D breast-mesh closures and their corresponding L-R contrast features; preliminary results exhibited a strong correlation between the extracted features and the esthetic outcomes assessed by medical specialists. The system dedicated to tumor detection from breast T1-weighted MRI images successfully generated a dataset of local radiomics and likelihood features from resampled images.

The effectiveness of these features was also confirmed by training several machine learning models to predict the tumor region voxel by voxel. Trained with 2514 data samples, the RF model with combined radiomics and likelihood features achieved the best performance in terms of sensitivity, f1, and accuracy, while the RF model with radiomics features alone had the highest precision. To further validate the system’s capabilities and broaden its applicability, additional modalities and diagnostic purposes should be considered in future investigations.