Background

Induced pluripotent stem cells (iPSCs), which are derived from somatic cells through reprogramming via several methods [1], have enormous numbers of potential applications, particularly in regenerative medicine, disease modeling and drug screening [2]. However, safe and effective reprogramming methods remain to be described for producing high-quality iPSCs [3]. Before developing personalized stem cell therapies, genome integrity and other safety concerns of iPSC technology must be addressed [4], particularly as genome stability can have profound effects on pluripotency, differentiation and the tumorigenicity of resulting iPSCs [5]. Notably, it was recently shown that the process of reprogramming somatic cells to iPSCs could induce genome alterations such as copy number variation (CNV) [68]. Current evidence suggests that these reprogramming-associated CNVs could be either de novo mutations or enriched mosaic variations in donor cells [9, 10]. CNVs are one of the major genetic causes of human diseases [11]; therefore, it is imperative to carefully investigate possible CNV instability during somatic reprogramming before using iPSCs in a clinical or therapeutic setting.

Genome-wide CNV analysis is technically challenging and the CNV data need to be interpreted with caution [11, 12]. To date, several genome technologies have been utilized for genomic CNV analysis [12]; however, the following points need to be taken into consideration before confirming CNV instability in iPSCs. Firstly, technical limitations exist for identifying CNVs accurately, as the CNV calls of SNP microarrays (SNP for single nucleotide polymorphism) are highly dependent on the external reference set used for the analysis [12]. The lack of internal references on SNP microarrays can lead to low signal-to-noise ratios in the process of CNV calling whilst CNV data obtained by comparative genomic hybridization (CGH) technology are more reliable [12]. Secondly, previous CNV calls in iPSCs cannot readily distinguish between reprogramming-associated CNVs (either de novo CNV or selected mosaic CNV) [10, 13] and pre-existing germ-line CNVs in parental cells. Finally, it is still unknown whether the reported CNV instability is dependent on reprogramming methods or due to the genetic backgrounds of parental cells [6, 14], which can potentially cause method-specific or donor-specific genome instability.

Considering the above concerns, we addressed the issue of CNV instability in iPSCs by taking into account the following factors in our study design. Mouse iPSCs (miPSCs) were generated from the same parental donor to exclude the effect of the genetic background. Various combinations of reprogramming factors and dosages were used for CNV comparison between reprogramming methods. In addition, a high-density CGH microarray assay comparing miPSCs with their parental donor cells was used for genome-wide screening for the CNVs associated with cell reprogramming (Figure 1). Intriguingly, our observations revealed the dosage effect of pluripotent factors on genome integrity during somatic reprogramming.

Figure 1
figure 1

The experimental flow of genome-wide CNV analyses in miPSCs using high-density CGH microarrays. A total of 41 miPSC lines were derived from the same donor of MEF B2 using different reprogramming factors and/or dosages. Genomic DNAs of the parental MEF B2 and various miPSC lines were respectively extracted and fragmented. Cy5-dUTP was used for labeling miPSC DNAs and Cy3-dUTP for the donor DNAs. Each labeled miPSC DNA was hybridized together with a labeled donor DNA onto the mouse genome CGH microarrays. Microarray handling and data analysis were conducted following the Agilent oligonucleotide CGH protocol. Examples were shown for a copy number loss (deletion; depicted by green concave) and a copy number gain (duplication; depicted by red convex) at the bottom.

Results

Initially, we obtained 16 miPSC lines with the three “Yamanaka” factors (Oct4/Klf4/Sox2, OKS) [15, 16] and single Oct4 [17] (Additional file 1): eight miPSC lines obtained by O_0.5, the other eight lines obtained by OKS_1.5. The dosage of each factor in these two methods was equivalent. Intriguingly, we identified 24 CNVs in eight miPSC lines of O_0.5 (i.e. 3.0 CNV/miPSC) and seven CNVs in eight lines of OKS_1.5 (i.e. approximately 0.9 CNV/miPSC) (Additional file 2). The rates of miPSC CNVs between these two reprogramming methods are obviously different, suggesting that the strength of the iPSC reprogramming has an effect on genome integrity. Potentially this suggests that reduced diversity of reprogramming factors and/or reduced reprogramming dosages may induce more CNVs during somatic reprogramming.

To further investigate the possible roles of reprogramming factor diversity and/or dosage in CNV instability, we generated another three sets of miPSC lines from the same donor but using different factor combinations and low/high (i.e., 0.5 ml/1.5 ml) dosages (see Methods for details) (Additional file 1): five miPSC lines obtained by O_1.5, ten lines obtained by OKS_0.5, and ten lines obtained by engineered factors XYZK_0.5 [18]. Using CGH assay we compared the resulting miPSC genomes with their parental genomes and identified zero CNVs in five miPSC lines of O_1.5 (i.e. 0 CNV/miPSC), 25 CNVs in ten OKS_0.5 lines (i.e. 2.5 CNV/miPSC), and seven CNVs in ten XYZK_0.5 lines (i.e. 0.7 CNV/miPSC) (Additional file 2). In total, we screened the genomes of 41 miPSC lines and identified 63 CNVs across 24 genomic loci of the mouse genome (Figure 2 and Additional file 2).

Figure 2
figure 2

The genome distribution of 24 loci of the miPSC CNVs identified in this study was shown. The green bars indicate the deletion loci, and the red bars represent duplication loci.

To investigate the potential mechanism involved in genome instability of miPSCs, we compared the CNV rates between the various methods using the same reprogramming factor combinations with altered dosages. The Mann–Whitney U test with the exact significance was used for single-factor test and the ANOVA test was used for two-factor test. By comparing the CNV numbers between O_0.5 and O_1.5 miPSC lines, we observed more CNVs in O_0.5 miPSCs (24 CNVs/8 miPSCs) than in O_1.5 miPSCs (0 CNV/5 miPSCs). This difference is statistically significant (p-value = 0.030, Mann–Whitney U test) (Figure 3A). Similarly comparing the number of CNVs in OKS_0.5 (25 CNVs/10 miPSCs) and OKS _1.5 miPSC lines (7 CNVs/8 miPSCs), although not significant (p-value = 0.146, Mann–Whitney U test; Figure 3B) does still suggest that a low dosage of reprogramming factors may induce more CNVs than a high dosage. We also combined the CNV data in Figure 3A and 3B together based on their dosages. In the low dosage group (0.5 ml), 49 CNVs were detected in 18 miPSC lines (i.e. approximately 2.7 CNV/miPSC); while in the high dosage group (1.5 ml), eight CNVs were detected in 13 miPSC lines (i.e. approximately 0.6 CNV/miPSC). A significant difference in CNV rates was observed (p-value = 0.008, ANOVA test) (Figure 3C), which strongly supports that the dose of reprograming factors and consequently the reprogramming force can significantly affect the genome instability during reprogramming, with higher doses and stronger reprogramming providing a protective effect. Notably, recent studies have reported that reprogramming factor dosage can affect the epigenetic properties of iPSCs [3] and increased levels of Oct4 and Klf4 were observed to give rise to high-quality iPSCs [19, 20].

Figure 3
figure 3

Comparison of CNV rates between the miPSCs induced by different reprogramming methods. (A) The comparison of CNV numbers between O_0.5 and O_1.5 miPSC lines. This difference is statistically significant (p-value = 0.030, Mann–Whitney U test with the exact significance). (B) The comparison of CNV numbers between OKS_0.5 and OKS_1.5 miPSC lines (Mann–Whitney U test with the exact significance). (C) The combinational analysis of (A) and (B). A significant difference of CNV numbers between high and low reprogramming factor dosages was found (p-value = 0.008, ANOVA test for two-factor analysis). (D) The comparison of CNV numbers between the methods using diverse reprogramming factors with balanced dosages (Mann–Whitney U test with the exact significance). The CNVs are significantly less in the miPSCs of XYZK_0.5 than in those of O_0.5 (p-value = 0.043) and OKS_0.5 (p-value = 0.023).

To further explore the roles of reprogramming force in CNV instability of miPSCs, we compared CNV rates between diverse factor-combinations, while their total dosages remained the same. We introduced the engineered factors XYZK (Oct4-VP16, Sox2-VP16, Klf4 and Nanog-VP16) due to their strong promoting capability during reprogramming [18]. The CNV rates are: O_0.5 (24 CNVs/8 miPSCs), OKS_0.5 (25 CNVs/10 miPSCs) and XYZK_0.5 (7 CNVs/10 miPSCs) (Additional file 2). There is no significant difference between O_0.5 and OKS_0.5 (p-value = 0.633), however, the CNVs of XYZK_0.5 are significantly less than those in O_0.5 (p-value = 0.043) and those in OKS_0.5 (p-value = 0.023) (Figure 3D). Particularly important was the observation that all the seven CNVs detected in XYZK_0.5 came from just two of the ten iPSC lines and six of those seven CNVs were in a single iPSC line. The remaining eight iPSC lines of XYZK_0.5 ad zero CNVs (Additional file 2). This suggests that high-performance engineered factors XYZK are likely to help maintain the genome integrity by reducing reprogramming barriers. Consistently, these observations also support that sufficient reprogramming force has a positive role in iPSC genome integrity.

Discussion

Mouse iPSCs were first generated by retroviral transduction of four transcription factors: Oct3/4, Sox2, Klf4 and c-Myc [1]. However, reactivation of c-Myc increases tumorigenicity in the chimeras, hindering clinical applications [21]. It was observed that the mice derived from c-Myc-free iPSCs showed a significantly reduced incidence of tumorigenicity compared with those derived by the four classic factors [16]. For the sake of high-quality iPSCs generation, we excluded c-Myc in our study design. Considering the low-efficiency of iPSCs induction without c-Myc, we also utilized the optimized reprogramming culture conditions with ultra-high efficiency on iPSCs generation [15].

Based on the reliable CGH technology, we found that reprogramming factor dosage is negatively associated with CNV rate. This result showed the possibility that sufficient reprogramming force may help maintain genome integrity during somatic reprogramming.

Since the reprogramming process is an artificial process that reverses the somatic cell fate into a pluripotent state, reprogramming faces various epigenetic barriers that were set during normal differentiation [3]. Previous evidence showed that the reprogramming process can broadly be divided into two phases: a long stochastic phase of gene activation and a shorter, hierarchical, more deterministic phase of gene activation [3]. The stochastic nature of the reprogramming process suggests that not genetic but epigenetic barriers can be seen as roadblocks in the journey to pluripotency [22]. Reprogramming factors initiate transcriptional effect as well as epigenetic regulation to help re-establishing pluripotency [23, 24]. Moreover, some regulators or chemicals, such as Jhdm1b, valporic acid and vitamin C, can overcome these epigenetic barriers and so markedly enhance reprogramming [3, 25, 26]. These observations suggest that the strength of reprogramming targeting epigenetic barriers is important for successful reprogramming. On the other hand, the iPSCs derived from the stochastic reprogramming phase represent the cells experiencing greater epigenetic changes from the somatic state to a pluripotent one, which could be recognized as a kind of pressure. CNV instability investigated in this study may serve as pressure-induced factors that take part in overcoming epigenetic roadblocks. Therefore, we suggest that iPSCs might experience more genome instability during the reprogramming process if the strength of reprogramming is not enough. Conversely, sufficient reprogramming force will lead to much fewer CNVs. Nevertheless, this hypothesis should be investigated further.

In total we performed genome-wide CNV analyses on 41 miPSC lines derived by different reprogramming factors and/or dosages and detected 63 miPSC CNVs. The average CNV rate is approximately 1.5 per miPSC line, which suggests that the CNVs associated with cell reprogramming is not frequent. The choices of appropriate reprogramming methods with sufficient reprogramming force are likely to help maintain genome integrity of iPSCs.

Conclusions

In summary, we showed, using the CGH microarray assay to directly compare the CNV status of miPSCs to their parental cells is reliable to identify CNV alterations associated with cell reprogramming. Based on the genome-wide analyses of 41 miPSC lines derived by different methods, we suggest that increasing factor dosages, or using high-performance engineered factors [18], is beneficial for the genome integrity of the resulting miPSCs. Our observations highlight the importance of further investigations on the mechanisms and kinetics of cell reprogramming and their effects on iPSC genome integrity.

Methods

Mouse iPSCs generated from the same donor

An embryonic fibroblast cell line (MEF B2) derived from the OG2 mouse was used as the parental donor. The donor cells were infected with retroviruses carrying the indicated reprogramming factors for two days, and then were cultured in iCD1 medium for the generation of iPSCs [15]. We normalized the virus with equal titer (low dosage, MOI = 15 when 0.5 ml virus was used; high dosage, MOI = 45 when 1.5 ml virus was used) according to the titer detecting by Takara Retrovirus Titer Set. In total, we obtained eight miPSC lines using single-factor Oct4 (0.5 ml Oct4, i.e. O_0.5), five lines using single-factor Oct4 (1.5 ml Oct4, i.e. O_1.5), ten lines using three-factor combination (0.167 ml Oct4, 0.167 ml Klf4, and 0.167 ml Sox2, i.e. OKS_0.5), eight lines using three-factor combination (0.5 ml Oct4, 0.5 ml Klf4, and 0.5 ml Sox2, i.e. OKS_1.5), and ten lines obtained by previously reported engineered factors (0.125 ml Oct4-VP16 (X), 0.125 ml Sox2-VP16 (Y), 0.125 ml Nanog-VP16 (Z) and 0.125 ml Klf4, i.e. XYZK_0.5) [18]. The reprogramming efficiencies of different factor combinations were described in previous studies [15, 17, 18]. The iPSC colonies were picked based on Oct4-GFP expression and were validated with a normal karyotype. All of the 41 miPSC lines were harvested at passage 4 for further analysis. All the miPSCs were maintained in mES2i medium, i.e. DMEM supplemented with 15% (v/v) fetal bovine serum, glutamine, non-necessary amino acid, 1000U/ml LIF, 1 μM PD0325901 and 3 μM Chir99021. Our experiments performed with animals were approved by the relevant institutional animal care and use committee (IACUC) of Guangzhou Institutes of Biomedicine and Health (GIBH).

High-resolution assay of comparative genomic hybridization microarray

Genomic DNAs extracted from each miPSC line and the parental donor (MEF B2) were fragmented using Alu I and Rsa I enzyme digestion. DNA labeling was conducted using Agilent SureTag DNA Labeling Kit. Different fluorescence dyes were used for DNA labeling of miPSCs (Cy5-dUTP) and the donor parental cell line (Cy3-dUTP). Each labeled miPSC DNA was hybridized together with the labeled donor DNA onto Agilent SurePrint G3 mouse 1 × 1 M microarray for 40 hours at 65° C. DNA processing, microarray handling and scanning were conducted following the Agilent oligonucleotide CGH protocol (version 6.0).

Genome-wide CNV analyses

The microarray scanning profiles were processed by Agilent Feature Extraction 10.7.3.1. The extracted data was analyzed and plotted by Agilent Workbench 7.0. ADM-2 was selected as statistical algorithm with the threshold of 6.0 and the Fuzzy Zero turning on. Each CNV was called by at least four consecutive probes with log2Ratio (fluorescence value ratio of miPSC-associated Cy5 to donor-associated Cy3) consistent with deletion or duplication.

Statistical analysis

The Mann–Whitney U test with the exact significance was used to determine statistically significant differences in miPSC CNVs between different reprogramming methods. The ANOVA test was used in Figure 3C when a two-factor test is needed. Differences were considered statistically significant when p-value < 0.05.