Objective

Mammographic screening has led to an increase in early-stage breast cancer detection [1]. Ductal carcinoma in situ (DCIS) is characterised by the presence of abnormal cells in the milk duct of the breast and is considered the earliest form of breast cancer [2]. DCIS does not spread from its site of origin; therefore, it is non-invasive at the time of detection.

DCIS is a highly treatable Stage 0 breast cancer with a good prognosis. However, if DCIS tissues are left untreated or undetected, they can spread into the surrounding breast tissue. The standard treatment is surgical resection, which prevents the local recurrence and future invasion of DCIS. However, surgical resection of DCIS does not reduce the risk of death from breast cancer [3]. Although some clinical studies have evaluated non-surgical treatment of DCIS, they fail to provide evidence against resection as the standard of care [4]. Resection remains the standard treatment for two key reasons. First, DCIS is non-invasive at the time of detection, but it may progress to invasive carcinoma over time [5, 6]. Second, since a core-needle biopsy involves the collection of tissues from the site of a lesion only, cancer cells can be missed, resulting in a DCIS misdiagnosis [7].

In this study, surplus specimens of breast fine needle aspiration cytology were analysed. The DCIS specimens were obtained from patients who underwent surgery at Nippon Medical School Musashi Kosugi Hospital. Conventional ultrasound-guided biopsy was not performed in this study; instead, specimens were obtained by puncture from the resected tissues immediately after surgery. We performed genome-wide transcriptomic profiling using the Affymetrix Clariom D Assay (Thermo Fisher Scientific, Waltham, MA, USA), a next-generation microarray with more than 6 million probes, including probes for unidentified transcripts. Six samples (cancerous and adjacent non-cancerous samples from three patients with DCIS) were analysed.

Data description

Table 1 summarises our study data. The specimens analysed in this study were taken from three patients who underwent surgery at Nippon Medical School Musashi Kosugi Hospital (Kawasaki, Japan). Data file 7 summarises the patients’ clinical characteristics. Study protocols were conducted in accordance with the 1975 Declaration of Helsinki, and informed consent was obtained from each patient. The primary surgical specimens were evaluated following the World Health Organisation (WHO) 4th edition criteria. RNA preparation and microarray analysis in this study were performed as described previously [8]. Total RNA was extracted with a guanidinium thiocyanate/acid phenol–chloroform extraction method using RNAiso-Plus (Takara Bio, Kusatsu, Japan). The concentrations and A260/A280 ratios of RNA were determined using a NanoDrop 1000 spectrophotometer (Thermo Fisher Scientific) (Data file 8). The size distribution of total RNA was evaluated using an Agilent TapeStation (Data file 9).
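The spectrophotometric quantities reported in Data file 8 can be illustrated with a minimal sketch. It assumes the commonly used conversion of 40 ng/µL of RNA per A260 unit at a 1 cm path length; the function names and the 1.8 to 2.2 acceptance window are hypothetical choices for illustration and are not taken from the study.

```python
# Minimal sketch of RNA quantification from absorbance readings.
# Assumption: 1.0 A260 unit at a 1 cm path length corresponds to ~40 ng/uL RNA.
# Function names and the purity window are illustrative, not from the study.

RNA_NG_PER_UL_PER_A260 = 40.0


def rna_concentration_ng_per_ul(a260, dilution_factor=1.0, path_length_cm=1.0):
    """Estimate the RNA concentration (ng/uL) from the absorbance at 260 nm."""
    if path_length_cm <= 0:
        raise ValueError("path length must be positive")
    return a260 * RNA_NG_PER_UL_PER_A260 * dilution_factor / path_length_cm


def purity_ratio(a260, a280):
    """Return the A260/A280 ratio used as a rough indicator of protein carry-over."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def looks_pure(ratio, low=1.8, high=2.2):
    """Check the ratio against an assumed acceptance window (hypothetical limits)."""
    return low <= ratio <= high
```

For example, `rna_concentration_ng_per_ul(0.5)` gives 20.0 ng/µL under these assumptions, and a sample with A260 = 2.0 and A280 = 1.0 has a ratio of 2.0.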

Table 1 Overview of data files/data sets

The isolated RNA was subjected to microarray analysis using the human Affymetrix Clariom D platform (Thermo Fisher Scientific), a next-generation microarray device covering > 540,000 transcripts including long non-coding RNAs. The RNA samples were then labelled using the reagents and enzymes supplied in the GeneChip® WT Pico Reagent Kit (Thermo Fisher Scientific) according to the manufacturer’s instructions with slight modifications. Briefly, total RNA (100 ng) from each sample was subjected to reverse transcription and subsequent polymerase chain reaction to synthesise T7 promoter-tagged double-stranded cDNA. The cDNA was then subjected to in vitro transcription with T7 RNA polymerase to synthesise complementary RNA (cRNA). The cRNA was reverse-transcribed using random primers to synthesise sense-strand cDNA.

After removing the template RNA using RNase H, sense-strand cDNA (5.5 μg) was digested with uracil-DNA glycosylase into fragments with sizes ranging from 40 to 70 nt. The success of fragmentation was confirmed using an Agilent 2100 Bioanalyzer. The cDNA fragments were then labelled with biotin using terminal deoxynucleotidyl transferase and subjected to hybridisation according to the manual of the GeneChip® WT Pico Reagent Kit. The Clariom D microarray was processed through the automatic washing step using a GeneChip® Hybridisation, Wash, and Stain Kit (Thermo Fisher Scientific) and a GeneChip® Fluidics Station 450 (Thermo Fisher Scientific). Hybridised targets were stained with kit-provided streptavidin–phycoerythrin. The resulting fluorescent signals were detected using a Scanner 3000 7G (Thermo Fisher Scientific).

Raw data, i.e., CEL files, were produced using Affymetrix GeneChip Command Console Software and subjected to data processing using Affymetrix Expression Console Software. The CEL files were registered as datasets under Gene Expression Omnibus (GEO) accession no. GSE169393. A detection call algorithm was applied to filter and remove missing expression values based on absent/present calls. Using this algorithm, present, marginal, or absent calls were obtained for each probe set in each microarray. A scaling factor was applied to the normalised data from the CEL files to bring the average intensity for all probes on the microarray to 500, generating CHP files for use in Microarray Suite 5. For gene expression comparisons, data assigned to absent calls were omitted. The box plots of the microarray signals are available in Data file 10. The correlation of carcinoma and non-carcinoma signal values is available in Data file 11. Normalised signal values for individual genes are listed in Data file 12.
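The scaling and filtering steps described above can be sketched as follows. This is a simplified illustration, not the Expression Console implementation: it uses a plain arithmetic mean (the vendor software may use a trimmed mean), treats a single array at a time, and assumes detection calls are encoded as "P" (present), "M" (marginal), and "A" (absent). All function names are hypothetical.

```python
# Simplified sketch of global scaling to a target intensity and absent-call filtering.
# Assumptions: plain arithmetic mean, one array at a time, calls encoded as P/M/A.
from statistics import mean

TARGET_INTENSITY = 500.0  # average intensity stated in the text


def scaling_factor(signals, target=TARGET_INTENSITY):
    """Return the factor that brings the mean of `signals` to `target`."""
    average = mean(signals)
    if average <= 0:
        raise ValueError("mean signal must be positive")
    return target / average


def scale_signals(signals, target=TARGET_INTENSITY):
    """Multiply every signal by the array-wide scaling factor."""
    factor = scaling_factor(signals, target)
    return [value * factor for value in signals]


def drop_absent(signals_by_probe, calls_by_probe):
    """Keep only probe sets whose detection call is not absent ("A")."""
    return {
        probe: value
        for probe, value in signals_by_probe.items()
        if calls_by_probe[probe] != "A"
    }
```

With signals of 100, 200, and 300, the mean is 200 and the scaling factor is 2.5, so the scaled values are 250, 500, and 750. Whether a probe set must be non-absent in one or in both members of a carcinoma/non-carcinoma pair is not specified in the text, so the filter here is applied per array.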

Limitations

Here we describe DCIS transcriptomic profiling results, which may also provide insight regarding presurgical diagnostic biomarkers. One limitation of our study is that specimens were isolated by surgical resection. Therefore, the applicability of the results to specimens obtained by core-needle aspiration biopsy should be validated as described previously [9]. Another limitation is the small sample size. Furthermore, qRT-PCR analysis should be conducted to validate the differential gene expression patterns identified here.