Introduction

Cancer develops through an accumulation of genetic or epigenetic alterations that affect key genes with critical roles in the transformation of normal to tumor cells1. This occurs throughout a complex and multiple-step process of carcinogenesis across many years involving the inactivation of tumor suppressors or activation of oncogenes2. Breast cancer (BC) is the leading cause of death by cancer in women worldwide3. Despite early diagnosis and treatments, such as surgery and targeted therapies, the morbidity and mortality caused by BC persist in female patients, with metastasis being the main cause of death4. The triple-negative BC (TNBC) subtype, accounting for 15–20% of breast cancers, is characterized by the absence of estrogen and progesterone receptors and Her2 expression4. Therefore, there are no approved targeted therapies so far. TNBC is characterized by heterogeneous, aggressive, and highly metastatic tumors that lead to poor prognosis and shorter overall survival of patients when compared to other BC subtypes4,5.

Aberrant glycosylation is a hallmark of epithelial cancers, being the result of a series of alterations in the glycosylation pathways of proteins or lipids6,7. The expression of carbohydrate tumor-associated antigens favors tumor progression, spreading, and invasiveness of cancer cells8. Among these antigens, the Tn antigen (GalNAc-O-Thr/Ser) is highly and widely expressed in different adenocarcinomas while absent in normal tissues, and its expression correlates with decreased survival of cancer patients and high metastatic potential of tumor cells9,10. The core 1 beta 3-galactosyltransferase is the enzyme that elongates the Tn antigen11 and COSMC (core 1 β3Galactosyltransferase specific molecular chaperone) is its chaperone12. Previous studies showed that mutations in cosmc are associated with metastases and the progression of various types of cancer13,14. Therefore, the expression of Tn antigen and core 1 β-3Galactosyltransferase activity loss can result from cosmc mutation12,15,16,17,18,19. Interestingly, we have previously shown that the Tn antigen favors cancer growth, immunoregulation and metastasis both in lung cancer20 and TNBC21. We generated a Tn-expressing cell clone from the 4T1 TNBC mouse cell line by targeting cosmc with CRISPR/Cas9 gene editing, and demonstrated that the presence of the Tn antigen enhances breast tumor growth and lung metastasis development21 in the 4T1 orthotopic TNBC murine cancer model. We also obtained a Tn-negative cell clone (4T1/Tn) that was poorly metastatic and less aggressive than the 4T1 parental cell line21.

This work aims to study and to compare the gene expression profiles among the three cell lines to elucidate the regulatory changes that explain the observed invasive and metastatic behavior in the Tn+ cell clone. Transcriptomic analyses were used to identify significantly differentially expressed genes (DEGs) involved in biological processes associated with cancer development, metastasis and leukocyte recruitment. Furthermore, this study reports that different highly O-glycosylated protein-coding genes, such as mmp9, ecm1, and ankyrin-2 were upregulated in 4T1/Tn+ tumor cells. We discuss the obtained results in the context of cell malignancy and incomplete O-glycosylation.

Results and discussion

This work explored differential gene expression between three murine TNBC cell lines: 4T1/wt, 4T1/Tn+ and 4T1/Tn, previously characterized by our group21. Because the cosmc gene is genetically modified in the 4T1/Tn+ cell line, this model is highly relevant for studying the significance of truncated cancer-associated O-glycans, in particular the Tn antigen, in the development and malignant behavior of this type of cancer. We performed stranded mRNA sequencing in quadruplicate for each cell line. Reads and metadata can be accessed in the NCBI BioProject database under accession number PRJNA1021590. The 4T1/Tn+ cell line was shown to possess higher metastatic potential than the parental 4T1 cell line, while the 4T1/Tn cell line is not metastatic under the same tested conditions21.

Overall RNA-seq transcriptome results

A total of 752,024,976 (53.3G) raw reads were obtained for all samples. After adapter removal and cleaning by quality and length, 612,675,967 high-quality reads were retained for subsequent analyses. A summary of sequencing statistics and mapping rate is presented in Table 1. We mapped these reads to the Mus musculus reference genome GRCm38 and obtained an overall unique mapping rate above 95.3% in all samples (Table 1), that is, more than 32.8 × 10^6 concordantly mapped paired reads for each replicate, which is enough to accurately quantify gene expression levels in eukaryotes (see for instance22). We detected a total of 57,439 annotated genes and novel transcripts. After filtering by gene expression, 21,661 genes and novel transcripts were kept for subsequent analyses. A principal component analysis (PCA) was performed to study the overall variability between the samples under study. The scatter plot in Supplementary Fig. 1 shows the distribution of the 12 samples in the space formed by the first component (x axis) and the second component (y axis) generated by PCA. As can be seen, the replicates from the same cell line cluster together and apart from samples from other cell lines. Note that the first component, which explains 58% of the variability, separates 4T1/Tn+ from the other two cell lines, while the second component, which explains 32% of the variability, separates 4T1/wt from 4T1/Tn+ and 4T1/Tn. Taken together, these results support the robustness of the study and show clear expression differences between the analyzed cell lines.
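As a quick sanity check on the read totals reported above, the fraction of reads retained after trimming can be computed directly. This is only an arithmetic sketch that uses the two totals stated in the text; the variable names are illustrative.

```python
# Read totals as reported in the text.
raw_reads = 752_024_976
clean_reads = 612_675_967

# Fraction of reads retained after adapter removal and quality/length cleaning.
retained_fraction = clean_reads / raw_reads
removed_reads = raw_reads - clean_reads

print(f"retained: {retained_fraction:.1%}")
print(f"removed:  {removed_reads:,} reads")
```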

Table 1 Basic metrics of the RNA-seq transcriptomic analysis*.

Differential gene expression analysis between TNBC cell lines

We detected 1405 DEGs between 4T1/Tn+ and 4T1/wt and 838 DEGs between 4T1/Tn+ and 4T1/Tn (FDR ≤ 0.01 and |log2FC| ≥ 2). Among the DEGs identified between 4T1/Tn+ and 4T1/wt, 733 were up-regulated and 672 down-regulated in 4T1/Tn+ (Fig. 1 and Supplementary Fig. 2). On the other hand, the differential expression analysis between 4T1/Tn+ and 4T1/Tn showed 623 DEGs up-regulated and 215 down-regulated in 4T1/Tn+ (Supplementary Table 1). In addition, we detected a total of 1264 DEGs between 4T1/Tn and 4T1/wt, of which 498 were up-regulated and 766 were down-regulated in 4T1/wt (Supplementary Table 1).

Figure 1

“Volcano plots” of adjusted statistical significance (FDR) against log2 fold change for genes in the pairwise expression comparisons between the analyzed cell lines. DEGs are indicated in blue. The thresholds used were FDR < 0.01 for significance and |log2 fold change| > 2. The names of the most significant genes are indicated.

Firstly, we focused on the most significant up-regulated and down-regulated DEGs across all pairwise comparisons (Figs. 1 and 2). Some of these DEGs were up-regulated in 4T1/Tn+ when compared to both 4T1/wt and 4T1/Tn cells and have been shown to be relevant in cancer: sdc323, kif21b24, afp25, ank226, pip5k1b27, serpine228, ecm129,30,31 and hsd11b132. Interestingly, some of these genes encode O-glycosylated proteins, as predicted by the high number of potential O-glycosylation sites using the algorithm Net-O-Glyc 4.0. On the other hand, cldn7, rab25, jag2, mal2, arhgef17, pitx1, sox13, camsap3 and mmp15 were down-regulated in 4T1/Tn+ cells relative to 4T1/wt and 4T1/Tn cells (Fig. 2). It is likely that these DEGs are involved in the specific phenotypic expression and regulation of functional characteristics of 4T1/Tn+. Importantly, the transcriptomic data were validated by quantitative real-time PCR (qPCR) of 5 DEGs that were previously reported as relevant for cancer development (Supplementary Fig. 3): ecm1, serpine2, kif21b, camsap3 and mmp15.

Figure 2

Heatmaps of the top 30 differentially expressed genes (DEGs) identified in each pairwise comparison: (A) 4T1/wt versus 4T1/Tn+, (B) 4T1/wt versus 4T1/Tn, and (C) 4T1/Tn versus 4T1/Tn+. Top DEGs are those with the lowest FDR values. Genes and samples were grouped by hierarchical clustering.

Biological processes associated with DEGs in TNBC with truncated O-glycosylation

To highlight the physiological processes that could be related to malignant behavior in TNBC with incomplete O-glycosylation, we performed Gene Ontology (GO) enrichment analysis of all DEGs identified in the pairwise comparison of all cell lines. To this end, we focused on the specific patterns of 4T1/Tn+ by comparing 4T1/Tn+ versus 4T1/wt and 4T1/Tn+ versus 4T1/Tn. The GO enrichment analysis showed that functional terms such as extracellular matrix (ECM) structural constituent (GO:0005201), cell junction (GO:0030054), metallopeptidase activity (GO:0008237), basement membrane (GO:0005604), and proteolysis (GO:0006508) (Fig. 3 and Supplementary Table 2), all molecular functions linked to cell invasion and metastasis, were significantly enriched among DEGs. Of note, these biological processes were selected for having more associated DEGs. In addition, immune-related processes, such as chemokine production (GO:0032722), leukocyte migration (GO:0050900) and blood vessel remodeling (GO:0001974), were also selected for their possible connection with aggressiveness of the truncated O-glycosylated cell line (Fig. 3 and Supplementary Table 2).

Figure 3

Functional enrichment test results for the functional GO terms mentioned in the main text. The p-value of Fisher’s exact test, the number of DEGs, and the percentage of DEGs in the total number of genes are indicated for each GO term.

In cancer, changes in the ECM can contribute to tumor growth and metastasis33. Several of the obtained enriched GO terms were linked to this process (Supplementary Table 2 and Fig. 3). In addition, we found ecm1, an ECM O-glycoprotein that interacts with different proteins, maintaining the integrity and homeostasis of skin34 and connective tissues35, as one of the most up-regulated genes in the 4T1/Tn+ cell line. ECM1 is up-regulated in various types of malignant epithelial tumors, including invasive breast ductal carcinoma36. In addition, it is a diagnostic marker and correlates with poor prognosis in malignancy of BC37. ECM1 induces tumor proliferation, metastasis, epithelial-to-mesenchymal transition (EMT) and blood vessel generation in BC31,36,37,38. Interestingly, this protein is O-glycosylated, with five confirmed sites in the N-terminal domain39 and one in the C-terminal domain40. These sites are among the 27 O-glycosylation potential sites predicted by the Net-O-Glyc 4.0 algorithm, which are distributed all over the protein. Although the function and structure of these O-glycans in cancer progression are still unknown, and considering that ECM1 functions are mediated by interactions with other molecules, it is likely that some of these are affected by the presence of O-glycans. Thus, the expression of the Tn antigen in the 4T1/Tn+ cell line may regulate these interactions and modulate ECM1 protumor functions. In fact, there are previous works that report that specific O-glycan expression modulates protein interactions39,41.

Cancer cells can modify the ECM by secreting enzymes, including matrix metalloproteinases (MMPs) and a disintegrin and metalloproteinase with thrombospondin motifs (ADAMTS) proteins, which were identified in the GO terms metallopeptidase activity (GO:0008237) and proteolysis (GO:0006508). These enzymes degrade various proteins in the ECM, allowing the cells to invade surrounding tissues and form new blood vessels that support tumor growth42,43. In addition, MMPs can target receptors on the surface of tumor cells, activating pathways that foster proliferation, suppress apoptosis, stimulate metabolic changes or induce EMT44,45,46. Several DEGs that were functionally linked to the GO term metallopeptidase activity (GO:0008237) participate in the development of BC and promote tumor progression and angiogenesis, such as mmp1a47, mmp948, mmp1349 and mmp1550. Interestingly, overexpression of MMP-1 and MMP-9 is crucial in invasion, vascular intravasation, EMT and metastasis of TNBC51,52 and is associated with poorer BC patient survival49.

MMP-9 is upregulated in invasive BC and is associated with triple-negativity48, poor prognosis53,54 and metastases48,55 in BC patients. Interestingly, MMP-9 has been proposed as a therapeutic target for metastatic breast cancer using 4T1 cells56. In the context of this work, it is worth noting the importance of O-glycosylation in the regulation of MMPs. Several MMPs have a linker domain that is situated between the Zn2+-binding domain and the haemopexin domain57. In MMP-9 the linker domain is rich in the amino acids serine, threonine and proline, and was found to be highly O-glycosylated58. Around 85% of the MMP-9 sugars are O-linked and attached to 14 O-glycosylation sites59. In addition, the membrane anchors and the cytoplasmic domains of the membrane-type MMPs also have potential O-glycosylated sites59. It has been suggested that, in general, O-glycosylation might stabilize MMPs, improving secretion and increasing their protection against degradation. Protection against proteolysis might be important for MMP-9 because it is released at inflammatory sites, where other proteases are likely to be abundant59. Cancer-associated glycans expressed on MMP-9 regulate interactions with carbohydrate binding proteins, such as galectins60. In the tumor microenvironment, altered glycosylation of MMP-9 allows the detachment of tumor cells from the ECM61. The O-glycosylated domain also controls the bioavailability of active MMP-9, together with the haemopexin domain, and gives interdomain flexibility to the MMP-9 molecule58. This flexibility is important for finding cleavage sites on long substrates62. It has also been shown that O-glycosylation of MMPs can affect their recruitment to the cell surface63, their internalization64 or regulate their autolysis65, and these effects on one member of the family can even affect the activation of other MMPs63. Thus, it would be interesting to determine whether mmp9 ablation in our Tn+ cell line reverses the increased aggressiveness associated with Tn expression, and to evaluate whether the activity of MMPs carrying the Tn antigen differs from that of MMPs derived from 4T1/wt cells.

Unlike most MMPs, MMP-15 is a member of the membrane-type MMP subfamily, whose members are expressed at the cell surface rather than secreted in a soluble form66. MMP-15 has anti-apoptotic properties in cervical cancer67 and is associated with lung cancer aggressiveness68. Furthermore, MMP-15 expression is related to poor prognosis69, especially at the transcriptional level. Surprisingly, we found that MMP-15 is downregulated in our 4T1/Tn+ cell line relative to the other analyzed cell lines. In agreement with our observations, contrary data have also been reported70, suggesting the existence of different roles for MMP-15. For instance, it can mediate anti-tumor processes by cleaving the N-cadherin extracellular domain and therefore preventing cell adhesion71. However, its role in TNBC is still unknown. Thus, the analysis of MMP-15 protein expression and its prognostic significance in TNBC remains an important area for future investigation.

Cell junctions are cellular elements that establish links either between two cells or between a cell and the ECM. They are crucial for the maintenance of tissue architecture and proper cell behavior, and their dysfunction can contribute to the development and progression of cancer72. Indeed, dysregulation of junction genes has been widely reported in BC72. Among the 26 DEGs in 4T1/Tn+ cells belonging to the cell junction GO term (GO:0030054), we highlight two: ank2 and cldn7. Ankyrin-2, encoded by the ank2 gene, which is highly up-regulated in the malignant cell line, is a ubiquitous structural membrane protein73. It participates in cell motility, activation, proliferation, contact, and maintenance of specialized membrane domains. Ankyrin-2 promotes proliferation, migration and invasion of cancer cells and it is also involved in drug resistance of cancer cells73,74. Ankyrin-2 has 408 potential O-glycosylation sites in its 3898 amino acids, raising the question of whether Tn affects the expression or function of this protein in BC. Although Ankyrin-2 expression is increased in solid tumors32, no differences between breast tumors and normal tissues have been found. Thus, the significance of its upregulation in the 4T1/Tn+ cell line should be further investigated. Claudin-7, encoded by the down-regulated cldn7 gene, has been reported to be less expressed in BC specimens than in normal breast samples, a fact that was previously related to a more aggressive behavior of cancer cells75. Claudins are the principal sealing proteins of the tight junctions76. Loss of Claudin-7 expression is associated with the discohesive architecture typically observed in high-grade lesions, suggesting potential functional roles for Claudin-7 in BC progression77 favoring invasion and metastasis. Nevertheless, the Claudin-7 protein does not present any potential O-glycosylation sites, suggesting that the loss of this tumor suppressor75 might be involved in the aggressiveness of 4T1/Tn+ cells.

An increase in microtubule stability78 can drive BC metastasis. In differentiated epithelial cells, most microtubules are not anchored to the centrosome. Instead, their minus-ends are stabilized by binding to a family of proteins, including calmodulin-regulated spectrin-associated proteins (CAMSAPs)79, that were identified in the GO term ECM structural constituent (GO:0005201). Interestingly, the loss of camsap3 promotes AKT-dependent EMT by tubulin acetylation80. Since the protein fragment of CAMSAP3 that interacts with and stabilizes the microtubule minus-end has many potential O-glycosylation sites, we hypothesize that this post-translational modification affects or modulates this interaction, even if this gene is down-regulated.

The basement membrane GO term (GO:0005604) contained several significant genes, such as col17a1. This gene encodes a transmembrane protein that is involved in cellular adhesion to the underlying ECM. Both underexpression of col17a1, likely due to DNA methylation and inactivation of p53, and overexpression have been reported in BC samples81,82,83.

Chronic inflammation and immunosuppression are inducers of cancer progression84. Many DEGs related to the identified immune processes were upregulated in the 4T1/Tn+ cell line. In this study, nos2, nod1 and chil1 were identified in the chemokine production GO term (GO:0032722). The inflammation-associated enzyme inducible nitric oxide synthase (NOS2) promotes angiogenesis and carcinogenesis and predicts poor survival of ER-negative BC85. High levels of NOS2 enhance cell motility and invasion of ER-negative BC86. On the other hand, Nod1, a member of the NLR family, acts as a sensor for intracellular bacteria by recognizing specific glycopeptides derived from peptidoglycan. Nod1 activation mediates distinct cellular responses including IL-8 release87. Last, chitinase 3-like 1 (CHIL1) is highly secreted by stromal cells in TNBC88 and promotes tumor proliferation, invasion and angiogenesis in colorectal cancer89,90. Other DEGs that participate in the positive regulation of chemokine production and leukocyte migration, such as TNFα, were also upregulated in 4T1/Tn+. The inflammatory cytokine TNFα promotes survival91, metastasis92,93, EMT94, angiogenesis, aggressiveness95 and tumor-promoting macrophage infiltration.

Transcription factor analysis

Altered glycosylation patterns on cell surface molecules, transmembrane proteins, and growth factors lead to tumor cell proliferation, invasion, and metastasis through the activation of signaling pathways that can shape the expression of transcription factors (TFs). We identified 386 significantly differentially expressed TF genes (FDR < 0.01 and |log2FC| > 2) that showed the same trend of expression in 4T1/Tn+ compared to 4T1/Tn and 4T1/wt. When these genes were submitted to the ChEA3 server, 217 were mapped to orthologs in the human genome that were further considered for analysis (Supplementary Table 3). Based on the mean-rank method, we selected the top 20 TFs that regulated 186 out of the 217 submitted DEGs (Supplementary Table 1). Five TFs clustered in distant regions of the global regulatory network (Fig. 4A, Box I), three of which were enriched in the functional term stem cell fate specification (ebf3, sox18 and pparg) (Supplementary Fig. 4). Likewise, thirteen TFs were closely grouped in the generated global network, meaning that they show high co-expression similarity (Fig. 4A, Box II). The functional GO term enrichment analysis suggested that they are mainly linked to the regulation of genes associated with epidermis development (foxq1, irx4, sp6, casz1, grhl2, tp63, znf750) and positive regulation of wound healing (ovol1, barx2, sox7, zbtb7c, grhl1, grhl3) (Supplementary Fig. 4). Both functional processes have been linked to BC growth in a wide variety of functional genomics studies96,97,98,99,100. The local network (Fig. 4B), which included only the 20 identified TFs, indicated that the expression of all but three of the selected TFs was correlated in the GTEx database, with some TFs showing strong associations (e.g., grhl3, tp63, znf750, ovol1 and casz1).

The differential signaling due to altered O-glycosylation patterns might explain the presence of a different TF signature in 4T1/Tn+ cells, since human BC cells with distinct O-linked glycans respond differently to EGF binding. This variation is attributed to the differential nuclear translocation of EGFR101 and is regulated by the formation of an EGFR/galectin3/MUC1/β-catenin complex at the cell surface, which depends on the O-glycan signature of the cells101. Therefore, the identified TFs are likely regulated by glycosylation of different proteins involved in signal transduction cascades in the 4T1/Tn+ cell line.

Figure 4

Transcription factor enrichment analysis. (A) The scatter plot of human transcription factors (TFs) (each point) was built based on their co-expression similarity. The top 20 enriched TFs are indicated in sky blue. This plot was generated in the ChEA3 server with default parameters. The WGCNA was done using expression data from GTEx samples and visualized using Cytoscape. (B) The local network shows the co-expression similarity among the top 20 identified enriched TFs.

Limitations of this study

As with any research, this study has limitations worth mentioning. First, these observations are limited to a single murine TNBC cell line model. Transcriptomic studies on further human TNBC models will be needed to validate the results obtained in the present work. To this end, new TNBC cosmc knockout cell derivatives need to be obtained and characterized. Second, since the study of in vitro cultured cancer cells captures the transcriptome of tumor cells only, tumor samples are required to reproduce the heterogeneity and behavior of tumors. Indeed, other cell types in the tumor microenvironment, such as cancer-associated fibroblasts and immune cells, can affect Tn expression on the surface of cancer cells. In this sense, the study of human organoids is an interesting alternative, since they resemble the complexity of the tumor while allowing genetic engineering102.

Conclusions

In this work, we describe the gene expression profile associated with BC aggressiveness and metastasis in a TNBC preclinical model of truncated O-glycosylation. The altered biological processes and DEGs that promote tumor growth, invasion and immunomodulation might explain the aggressive properties of 4T1/Tn+ tumor cells21. Furthermore, different highly O-glycosylated protein-coding genes, such as mmp9, ecm1 and ank2, were upregulated in 4T1/Tn+ tumor cells. These results support the hypothesis that incomplete O-glycosylation leading to expression of the Tn antigen, which might regulate the activity or interactions of different molecules, promotes cancer development and immunoregulation. The role of O-glycans in the identified molecules remains to be elucidated.

Methods

Cell culture

The murine TNBC cell line 4T1 was obtained from ATCC and cultured in DMEM with glutamine (Capricorn, Germany or Gibco, USA) supplemented with 10% inactivated fetal bovine serum (Capricorn, Germany) and antibiotic–antimycotic (Thermo Fisher) at a final concentration of 100 units/mL of penicillin, 100 µg/mL of streptomycin, and 0.25 µg/mL of Gibco Amphotericin B (complete culture medium). The 4T1/Tn+ cell line was generated by CRISPR/Cas9 guide targeted to cosmc gene exon 2 170 s (GATATCTCGAAAATTTCAG) cloned in pBS-U6sg plasmid (Tacgene, France) and GFP-tagged Cas9-PBKS plasmid (Addgene), as previously described21. Cells were obtained by flow cytometry sorting (BD FACSAria™ Fusion). The expression of Tn antigen was verified by staining with the anti-Tn mAb 83D421 in the obtained Tn+ cell line. A Tn cell clone was also selected for further characterization according to Tn expression. Cells were maintained at 37 °C in a humidified atmosphere of 5% CO2 and harvested by washing with phosphate-buffered saline (PBS) pH 7.4 and incubation with trypsin (0.1%) and EDTA (0.04%) in PBS.

RNA isolation and sequencing

RNA from 4 replicates of each cell line, wild-type, Tn+ and Tn, was purified using the RNeasy kit from Qiagen according to the instructions of the manufacturer. Both RNA concentration and integrity (RIN values) were measured in an Agilent 2100 Bioanalyzer (Agilent, USA). A total of 4 μg of RNA per sample was used as the input for mRNA library preparations. Twelve libraries were generated using Illumina Truseq Stranded mRNA kit. Then, all libraries were sequenced using NextSeq Illumina platform at Macrogen Inc. (Korea), in one lane, using 150 bp paired-end reads.

Reads trimming and mapping

Quality control of raw reads was performed using FastQC103. Adapter sequences and low-quality reads were removed using scythe v0.991 (github.com/vsbuffalo/scythe) and sickle v1.33 (github.com/najoshi/sickle). High-quality reads were mapped to the Mus musculus reference genome (version GRCm38, downloaded from the Ensembl database) using the software Hisat2 (v2.1.0)104. Hisat2 was run with ‘--rna-strandness RF’ and ‘-k 1’, with all other parameters set as default.
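The mapping step described above can be sketched as a small command builder. Only the two non-default options named in the text (‘--rna-strandness RF’ and ‘-k 1’) come from the source; the index prefix, file names and the helper function itself are hypothetical placeholders, and the command is printed rather than executed.

```python
import shlex


def build_hisat2_command(index_prefix, reads_1, reads_2, out_sam):
    """Return a HISAT2 argument list for one paired-end sample (illustrative helper)."""
    return [
        "hisat2",
        "-x", index_prefix,        # genome index prefix (hypothetical path)
        "-1", reads_1,             # forward reads
        "-2", reads_2,             # reverse reads
        "--rna-strandness", "RF",  # strandness option stated in the text
        "-k", "1",                 # report at most one alignment per read
        "-S", out_sam,             # SAM output file
    ]


# Hypothetical file names, for illustration only.
cmd = build_hisat2_command(
    "GRCm38_index", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample.sam"
)
print(shlex.join(cmd))
```

Keeping the arguments as a list makes it straightforward to pass them to a process runner for each of the twelve libraries without shell quoting issues.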

Assembly of transcripts and estimation of abundance

Cufflinks (v2.2.1) was used to assemble transcript models from read alignments105. This software recreates a set of transcript models that best explain the sequencing alignments observed in the samples. Cufflinks was run with ‘--library-type fr-firststrand’, with all other parameters set as default, for each sample separately. Then, a single annotation file was generated using the tool Cuffmerge105. This software merges each of the sample assemblies with the reference annotation file in order to combine novel transcripts with known annotated transcripts. Finally, for each sample, the total number of reads that effectively mapped to each gene described in the final assembly was calculated using the featureCounts function of the Rsubread package106. In order to visualize the overall relationship between the samples under study, a PCA was performed using DESeq2 (version 1.18.1)107.

Gene expression and gene enrichment analysis

All pairwise comparisons between cell lines were performed, that is, 4T1/wt versus 4T1/Tn+, 4T1/wt versus 4T1/Tn, and 4T1/Tn versus 4T1/Tn+. DEGs were identified using the R package DESeq2 (version 1.18.1) with default parameters107. Only genes with at least one read count in at least ten samples were considered expressed and used for further analysis. The P-values were adjusted for multiple testing using the FDR procedure108. Genes with an FDR ≤ 0.01 and |log2FC| ≥ 2 were considered significantly differentially expressed.
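The filtering and thresholding rules above can be illustrated with a minimal sketch. It does not reproduce the DESeq2 statistical model; it only shows the expression filter, a Benjamini-Hochberg adjustment (assumed here to be the FDR procedure, since it is the DESeq2 default) and the DEG cut-offs. All gene names, counts, p-values and fold changes below are invented for illustration.

```python
def is_expressed(counts, min_count=1, min_samples=10):
    """Keep a gene if at least `min_samples` samples have >= `min_count` reads."""
    return sum(c >= min_count for c in counts) >= min_samples


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_deg(fdr, log2fc, fdr_max=0.01, lfc_min=2.0):
    """Apply the thresholds stated in the text: FDR <= 0.01 and |log2FC| >= 2."""
    return fdr <= fdr_max and abs(log2fc) >= lfc_min


# Toy data (hypothetical): one p-value and one log2 fold change per gene.
genes = ["geneA", "geneB", "geneC", "geneD"]
pvalues = [0.0001, 0.0004, 0.03, 0.5]
log2fc = [3.1, -2.5, 4.0, 0.2]

fdr = benjamini_hochberg(pvalues)
degs = [g for g, q, lfc in zip(genes, fdr, log2fc) if is_deg(q, lfc)]
print(degs)
```

Note that a gene with a large fold change (geneC in the toy data) is still excluded when its adjusted p-value exceeds the FDR cut-off, which is why both criteria are applied together.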

Gene Ontology (GO) enrichment analysis

The functional roles of the DEGs were studied using a GO enrichment analysis. GO terms were downloaded using the Bioconductor R package biomaRt (version 2.48.3)109. In detail, the identified DEGs were compared against all expressed genes, which were used as the background set. GO terms significantly enriched among DEGs were detected using Fisher's exact test in R.
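The enrichment test described above can be sketched as a one-sided Fisher's exact test, computed as the upper tail of the hypergeometric distribution. Whether the original analysis used a one-sided or two-sided alternative is not stated, so the one-sided form is an assumption, and the counts in the example are invented.

```python
from math import comb


def fisher_enrichment_pvalue(deg_in_term, deg_total, term_total, background_total):
    """One-sided Fisher's exact test: P(overlap >= deg_in_term) under the null.

    deg_in_term      -- DEGs annotated with the GO term
    deg_total        -- all DEGs
    term_total       -- background genes annotated with the GO term
    background_total -- all expressed genes (the background set)
    """
    max_overlap = min(deg_total, term_total)
    denominator = comb(background_total, deg_total)
    tail = sum(
        comb(term_total, k) * comb(background_total - term_total, deg_total - k)
        for k in range(deg_in_term, max_overlap + 1)
    )
    return tail / denominator


# Toy example (hypothetical counts): 20 expressed genes, 5 in the GO term,
# 6 DEGs of which 4 carry the term.
p_value = fisher_enrichment_pvalue(4, 6, 5, 20)
print(f"p = {p_value:.4f}")
```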

Transcription factor enrichment analysis (TFEA) and prediction of putative O-GalNAc sites

In order to identify putative transcription factors (TFs) involved in the regulatory changes observed in the 4T1/Tn+ cell line we used the ChIP-X Enrichment Analysis 3 (ChEA3) software110. Significant DEGs that showed the same expression trend (up-regulated or down-regulated) in the 4T1/Tn+ cell line when compared to both 4T1/Tn and 4T1/wt were submitted to the ChEA3 online server, available at maayanlab.cloud/chea3. A list of mouse gene symbols was submitted and automatically mapped to available human orthologs. The Mean-rank method (default) was used to identify the TFs whose putative transcriptional targets were most similar to the gene set. Co-expression networks, representing co-expression similarity in humans, were built using Weighted Gene Co-expression Network Analysis (WGCNA)111 from expression data from GTEx samples112. The Net-O-Glyc 4.0 program113 was used to predict mucin type O-GalNAc glycosylation sites in all expressed proteins, including identified TFs.
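The idea behind a mean-rank score can be illustrated with a small sketch: each TF is ranked within several libraries, and the ranks are averaged over the libraries in which the TF appears. This is a simplified illustration, not the ChEA3 implementation; the library names and ranks below are invented.

```python
def mean_rank(library_ranks):
    """Average each TF's rank over the libraries in which it appears.

    Returns (tf, score) pairs sorted from best (lowest) to worst mean rank.
    """
    collected = {}
    for ranks in library_ranks.values():
        for tf, rank in ranks.items():
            collected.setdefault(tf, []).append(rank)
    scores = {tf: sum(r) / len(r) for tf, r in collected.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical per-library ranks (1 = most enriched in that library).
library_ranks = {
    "library_A": {"TP63": 1, "GRHL2": 2, "SOX7": 3},
    "library_B": {"GRHL2": 1, "TP63": 3},
}

ranking = mean_rank(library_ranks)
for tf, score in ranking:
    print(tf, score)
```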

Validation by quantitative real-time RT-PCR (qRT-PCR)

The qPCR assay was performed to validate the reliability of the identified DEGs. Total RNA from the three cell lines was isolated using TRI-reagent (Sigma-Aldrich) according to the instructions of the manufacturer. Samples were analyzed in an Eco real-time PCR System (Illumina) using Fast SYBR® Green Master Mix (Applied Biosystems). The PCR conditions were as follows: 45 °C for 5 min, 94 °C for 30 s, 45 cycles of 94 °C for 5 s, and 60 °C for 34 s. The melt curve of each amplicon was set as 95 °C for 15 s, 60 °C for 1 min, 95 °C for 30 s, and 60 °C for 30 s. The primers used are shown in Supplementary Table 4. Results were expressed as the ratio between the expression of each evaluated gene and that of gapdh. Relative quantification of all selected genes was performed using the 2^−ΔΔCT method, normalized to gapdh. All reactions were performed with three biological replicates.
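The 2^−ΔΔCT calculation mentioned above can be sketched as follows, with gapdh as the reference gene. The Ct values are invented for illustration, and the choice of 4T1/wt as the calibrator sample is an assumption.

```python
def relative_expression(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method."""
    delta_sample = ct_target_sample - ct_ref_sample      # dCt in the tested cell line
    delta_control = ct_target_control - ct_ref_control   # dCt in the calibrator cell line
    delta_delta = delta_sample - delta_control           # ddCt
    return 2 ** (-delta_delta)


# Hypothetical Ct values: target gene and gapdh in 4T1/Tn+ (sample) and 4T1/wt (calibrator).
fold_change = relative_expression(
    ct_target_sample=22.0, ct_ref_sample=18.0,
    ct_target_control=25.0, ct_ref_control=18.0,
)
print(fold_change)
```

A lower Ct for the target in the tested sample, at equal gapdh Ct, gives a negative ΔΔCT and therefore a fold change above 1.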