Global mapping of binding sites for phic31 integrase in transgenic maden-darby bovine kidney cells using ChIP-seq
ΦC31 integrase, a site-specific recombinase, can efficiently target attB-bearing transgenes to endogenous pseudo attP sites within mammalian genomes. The sequence features of endogenous binding sites will help us to fully understand the site-specific recognition function by ΦC31 integrase. The present study was aimed to uncover the global map of ΦC31 integrase binding sites in bovine cells and analysis the features of these binding sites by comprehensive bioinformatics methods.
In this study, we constructed a ChIP-seq method that can be used to uncover the global binding sites by phiC31 integrase. 6740 potential ΦC31 integrase binding sites were identified. A sequence motif was found that contains inverted repeats and has similarities to wild-type attP site. Using REPEATMASKER, we identified a total of 20,183 repeat-regions distributed in 50 repeat types for the 6740 binding sites. These sites enriched in “regulation of GTPase activity” of in the GO category of biological process and KEGG pathway of signal transmembrane transporter activity.
This study is the first time to uncover the global map of binding sites for ΦC31 integrase using ChIP-sequencing method and analysis the features of these binding sites. This method will help us to fully understand the mechanism of the site-specific integration function by phiC31 integrase and will potentially boost its genetic manipulations in both gene therapy and generation of transgenic animals.
KeywordsΦC31 integrase Binding sites ChIP-seq Sequence motif
Chromatin immunoprecipitation-DNA sequencing
Kyoto Encyclopedia of Genes and Genomes
Maden-darby bovine kidney
Multiple Em for Motif Elicitation
The Streptomyces phage ΦC31 integrase is a site-specific recombinase that catalyzes recombination between two short DNA sequences, a bacterial attachment site (attB) and a phage attachment site (attP) . As prokaryotic attachment sites, attP and attB are unlikely to occur in the genomes of mammals and most plants, but pseudo attP sites are present in the eukaryotic genome [2, 3, 4, 5], ΦC31 integrase can efficiently target attB-bearing transgenes to endogenous pseudo attP sites within mammalian genomes [6, 7]. Because ΦC31 integrase-mediated recombination is stable, irreversible and does not require external chemical energy and cofactors, ΦC31 integrase has become a powerful tool for both gene therapy animals [8, 9, 10, 11, 12, 13] and generation of transgenic animals [14, 15, 16]. However, with the use of ΦC31 integrase system in genetic manipulations, an increasing number of problems is being encountered. On one hand, the presence of a large number of pseudo attP sites in the genome increases the risk of chromosomal rearrangements and DNA damage . On the other hand, due to the high activity of the integrase system in the genome, the specificity does not appear to be as high as expected. Therefore, it is necessary to further study the mechanism of ΦC31 integrase system in mammalian cells, in order to further optimize the system, improve its integration specificity and reduce the non-specific recombination.
Studies on sequence characteristics of pseudo attP sites in many species revealed that sequence characteristic is one of the most important factors for ΦC31 integrase to recognize the pseudo attP sites and mediate site-specific integration in eukaryotic cells [18, 19, 20, 21]. Meanwhile the pseudo attP sites in different cell types of the same species are cell-specific. For example, the BpsM1 site we found in bovine maden-darby bovine kidney (MDBK)cell line was not detected in bovine ear fibroblasts . The 19q13.31 site in the human genome is detectable in all cell types, and 3q26.31 appears only in HepG2 cells . These phenomena indicate that, in addition to sequence features, the recognition and binding of the integrase to the sites are also affected by other factors such as chromatin structure. However, this is still not fully studied and understood. The research on ΦC31 integrase function in mammalian cells is still in its infancy. Indepth studies will accelerate its application in biological engineering. Especially, the main problems that need to be investigated include several aspects. First of all, although about 102–103 pseudo attP sites have been predicted in mammalian cells , the actual number of potential sites we didn’t know. Thus the data on ΦC31 integrase is too small to analyze its function and mechanism. Secondly, at present, common methods for identification of integration sites are plasmid rescue, reverse nested PCR and other methods. These methods have limitations and provide only a small amount of binding sites information.
Chromatin immunoprecipitation assay (ChIP) is a powerful tool for studying the interaction between protein and DNA in vivo. The ChIP-Seq technology, which combines ChIP with second-generation sequencing technology, can efficiently detect genome-wide DNA segments that interact with functional proteins .
In this study we establish an expression platform for ΦC31 integrase in bovine cells and identify the whole set of ΦC31 integrase binding sites in these cells by using ChIP-Seq technology. We equally aimed to analyze the characteristics of these binding sites using bioinformatics.
Generation of MDKB cells expressing ΦC31 integrase
Global mapping of binding sites for ΦC31 integrase
Statistics of the ChIP-Seq data before and after quality filtering
Number of sequences
Base number (bp)
Following quality control, we mapped the reads to the Reference Genome Consortium database. From the 42,233,088 reads obtained from INTMDBK_input 38,407,502 were mapped to the Reference Genome Consortium database (mapping rate of 90.94%) whereas 30,675,258 were mapped from the 36,809,270 clean reads generated from the INTMDBK_treat sample (mapping rate of83.34%). These reads with unique alignment to a unique position on the genome was used for subsequent information analysis. The 66 captured sites (markers) information and location was reported in Additional file 2.
Based on a certain analysis model in the entire genome range, the peaks (ChIP-Seq enrichment regions) were scanned to obtain the data on peak location in the genome and the peak region sequence information. The peak search output file was saved into wiggle file format and uploaded to the UCSC Genome Browser for visualization. We identified a total of 6740 ChIP-Seq peaks with total peak length of 4,136,812 and peak mean length of 613.77. The total peak depth was 2,035,208 while the peak mean depth was 301.96 for the INTMDBK_treat sample.
Features of the binding sites
Using REPEATMASKER, we identified a total of 21,065 repeat-regions distributed in 50 repeat types for the 6740 binding sites (Additional file 4). The repeat-types included SINE/tRNA-Core-RTE, SINE/MIR, LINE/L1, LINE/L2, SINE/tRNA and DNA/hAT-Ac. We found that 6740 repeat-regions were located in the upstream regions of ΦC31 integrase binding sites, 6740 in the downstream and 7585 were overlapped with the binding sites.
Molecular pathways of ChIP-Seq-based ΦC31 binding sites
The development of genetic engineering techniques for the introduction of gene sequences into mammalian cells has many potential important applications, such as gene therapy, mammary gland bioreactor and mechanism study on gene function  The ΦC31 integrase-mediated gene delivery system has the ability to integrate transgenes into mammalian genomes at specific locations, and guarantees high expression levels of recombinant protein [24, 25, 26, 27, 28, 29, 30, 31, 32, 33]. Building non-viral expression vectors for ΦC31 integrase will provide an important and safe tool for gene therapy and the generation of transgenic organisms [34, 35, 36, 37]. Furthermore, though ΦC31 integrase has been applied for site-directed gene integration in the mammalian genome, the features of ΦC31 integrase binding sites are not well defined. Indeed, although about 102–103 pseudo attP sites have been predicted in mammalian cells , the number found in humans and mice is very small. In addition, though we have found 32 integration sites in the bovine genome using reverse nested PCR , the actual number of these sites in the genome is very different. Thus, the data on ΦC31 integrase has been found too small to analyze its function and mechanism. Therefore, in this study, we constructed a ΦC31 integrase eukaryotic expression vector and generated a stable MDBK cell line expressing ΦC31 integrase to characterize the binding sites of ΦC31 integrase in the whole bovine genome.
After having confirmed the effective expression of ΦC31 integrase, we applied the ChIP-Seq method to uncover the global map of ΦC31 integrase binding sites. A total of 6740 binding sites were identified to be significantly enriched ΦC31 integrase binding regions in the MDBK cell genome. This study is the first of its kind to uncover the global map of binding sites for ΦC31 integrase in the bovine genome. By analyzing genomic DNA sequences, we identified a highly significant sequence motif among the binding sites. This motif contains inverted repeats and has similarities to wild-type attP site. We also have found that these sites contain repetitive DNA. These results strongly suggest that there are sequence characteristics that define pseudo attP sites in the bovine genome, and that the recognition of these sequences is an important feature of site-specific ΦC31 integrase activity in MDBK cell. Repetitive elements are presumably recognized as a suitable substitute for attP sequences.
Through GO analysis, the most significant GO terms of these genes around the ΦC31 integrase binding sites were “regulation of GTPase activity” in the GO category of biological process, “metal ion transmembrane transporter activity” in the GO category of molecular function, and “membrane region” in the GO category of cellular component. Using KEGG pathway analysis with KOBAS, we found that the genes located in the core significant pathways were associated with “cGMP-PKG signaling pathway” and “Focal adhesion”. These results may explain why the ΦC31 integrase does not necessitate external energy and cofactors as it is able to catalyze the metabolism of energy and microelement pathways. Maybe this hypothesis is wrong because we didn’t have enough direct evidences. Also we don’t know what influence will happen to the location genes expression if a foreign gene is integrated nearby by ΦC31 integrase.The present study has some limitations since we analyzed only a single ChIP-Seq dataset without animal in vivo experimental validation following ChIP-Seq. However, due to the fact that the engineered cells were well characterized and the ChIP-Seq results were validated by real time PCR, it is probable that sequences obtained here are potential ΦC31 integrase target binding sites. This time, our purpose was to validate that the ChIP-seq method can be used to uncover the global binding sites by phiC31 integrase and next time we will take some strategies to optimize our selecting system, in order to get rid of the random binding sites of phiC31 integrase in the bovine genome.
Our study uncovered a large set of ΦC31 integrase binding sites in the bovine genome and analysis the features of these binding sites. This method will help us to fully understand the mechanism of the site-specific integration function by phiC31 integrase and will potentially boost its genetic manipulations in both gene therapy and generation of transgenic animals.
The pCMV-Int plasmid harboring ΦC31 integrase was a kind gift of Prof. M. P. Calos (Stanford University School of Medicine, Stanford, USA). The plasmid pCMV-Int was amplified by PCR using specific forward (5-AAGCTTGGTACCGGTCCGGAATTCCC-3) and reverse (5-CTCGCTGGATCCGGGTGTCGCTAC-3) primers. After digestion by HindIII and xhoI restriction enzymes, the ΦC31 gene fragment was isolated from the PCR product and cloned into the recipient plasmid pcDNA3.1-neo digested by the same restriction enzymes after overnight connection at 16 °C to generate ΦC31 integrase eukaryotic expression vector pcDNA3.1-int-neo.
Establishment of MDKB cells expressing ΦC31 and verification
MDKB cells were obtained from the Cell Bank of the Chinese Academy of Sciences. All reagents were obtained from Invitrogen unless otherwise indicated. The cells were maintained on dulbecco’s modified eagle medium (DMEM). Fresh medium was provided to the cells every day, and the cells were passaged every 4–5 days. One day prior to transfection with Lipofectamine 2000 (Invitrogen), cells were treated with Accutase (Sigma-Aldrich, St. Louis) and plated on Matrigel in conditioned medium. Lipofectamine 2000 transfection was carried out according to the manufacturer’s protocol. We typically used 4 μg of the ΦC31 integrase expression vector pcDNA3.1-int-neo to transfect 2 million cells. After 14–21 days of selection, individual colonies of MDKB cells transformed with the expression vector were manually picked and expanded for further analysis. DNA was extracted from the transgenic colonies and negative MDKB cells with the phenol chloroform procedure.
The ΦC31 integrase expression was validated by western blot analysis. MDBK cells were washed with 3 ml 4 °C precooled PBS and heated at 80 °C for 30 min in RIPA buffer(150 mM Nacl, 50 mM Tris-HCL pH 8, 0.1% SDS, 0.5% Deoxycholate, 1 mM EDTA, 1% NP-40), and then centrifuged for 5 min at 10000 g. Next, 20 μg total proteins were eledectrophoresed in a 10% SDS-PAGE gel, and then electrically transferred onto PVDF membranes using a semidry method. ß-actin was used as endogenous control. Detection was performed using the ECL-plus enhanced chemiluminescence system (ECL Amersham Biosciences) after auto radiographic exposure to HyperfilmTM.ECL (Amersham Biosciences). The program ImageMaster I-D (Pharmacia, Sweden) was used for quantitation of bands by densitometry.
Chromatin immunoprecipitation, ChIP-sequencing and ChIP-qPCR
MDBK or Int-MDKB cells (expressing ΦC31 integrase) (1 × 107 cells) were inoculated into 10 mL DMEM medium and cultured to reach 80–90% confluence. Thereafter, cells were fixed with 1% formaldehyde for 15 min cross-linking and quenched with 450 μL 2.5 M glycine. After washing cells three times with ice-cold PBS, chromatin was isolated by adding 1 ml SDS lysis buffer (150 Mm Nacl, 1% Triton X-100, 25 mM Tris pH 7.5, 0.1%SDS, 0.5% Deoxycholate) followed by disruption with a Dounce homogenizer. Lysates were sonicated (VCX750) to shear the DNA to an average length of 200–500 bp, followed by centrifugation at 15000 g for 10 min to remove cell debris. 300 μL of supernatant were divided into 3 aliquots of 100 μL: 100 μL for the experimental group into which was added 1–10 μg anti-ΦC31 antibodies, 100 μL for the control group without addition of antibodies and 100 μL to which was added 4 μL 5 M NaCl. The 3 aliquots were submitted to 3 h cross-linking at 65 °C followed by agarose gel electrophoresis for the evaluation of ultrasonic disruption results. Thereafter, 900 μL ChIP Dilution Buffer (0.01%SDS, 1% Triton X-100, 1% Triton X-100, 1.2 mM EDTA, 16.7 mM Tris-Hcl, 167 mM Nacl pH 8.1) and 20 μL 50 × PIC were added in the experimental and control samples, and cleared with 60 μlProtein AAgarose/Salmon Sperm DNA. 20 μL genomic DNA (input) was purified from both aliquots of chromatin and quantified on a Nanodrop spectrophotometer. Extrapolation to the original chromatin volume allowed quantitation of the total chromatin yield. ChIP assays were carried out as follows: An aliquot of chromatin (50 mg) was precleared with protein Aagarose beads (Invitrogen). ΦC31-bound genomic DNA fragments were isolated using an anti-bacteriophage ΦC31 integrase antibody (ab93248). After incubation at 4 °C overnight, protein Aagarose beads were used to isolate the immune complexes. Complexes were washed, eluted from the beads using SDS buffer (1%SDS, 0.1 M NaHCO3), and subjected to RNase and proteinase K treatment. Crosslinks were reversed by incubation overnight at 65 °C, and ChIP DNA was purified by PCR Purification Kit (QIAGEN).
We generated one DNA library using the ΦC31 integrase-specific antibodies incubated with crosslinked and sonicated DNA from Int-MDKB cells. A second control library containing the input DNA was prepared from MDBK cells. About 20–30 ng of the ChIP DNA from each library was prepared for sequencing using the Illumina TruSeq Nano DNA Sample Preparation Kit according to the kit manual. The DNA libraries were sequenced with HiSeq 2500 machine using TruSeq Rapid SBS Kit-HS reagents (Illumina).
To ensure the accuracy of subsequent bioinformatics analysis, the original sequencing data was filtered to obtain high-quality sequencing data (clean data). Quality control of the sequencing data was performed using Sickle (https://github.com/najoshi/sickle) and SeqPrep (https://github.com/jstjohn/SeqPrep). The sequencing output raw reads were trimmed by stripping the adaptor sequences and ambiguous nucleotides and reads with quality scores less than 20 and lengths below 20 bp were removed.
Equal amounts of DNA for each sample were combined with a SYBR Green PCR master mix (Qiagen) and specific primers. Amplification reactions were then performed with an Applied Biosystems 7900 real-time PCR system. The primers used for BF27 binding site were 5-GTACTTATTTTTGTATTTGAAACCACCT and TCTGGGTGTTTTTACATT-3; for BF4a, 5-CCATAAAAGGAATATACTTGAAA-3 and 5-ACCAACATGGCAATCGGGGACAT-3; for BPSF1, 5-GGACCTAGAAGGGGCTCATAACT-3 and 5-TGTATACTCACACCACTGTCCTA-3; for BPSM1, 5-GCCGTTTGTTGGGTTAGCTTTTCAGAA-3 and 5-GTCATCCTGCCTTGCAGGCTGTTTCTT-3; for GAPDH, 5-TTCACCACCATGGAGAAGG-3 and 5-GGCATGGACTGTGGTCATGA-3. Use AppliedBiosystems SDS2.4 and RQ manager software to analyze data.
ChIP-Seq library peak finding
To identify significant ΦC31 binding sites in the sequencing data, the whole genome peak scan was performed using the UCSC Genome Browser following the UCSC Genome Browser Instructions (http://genome.ucsc.edu). UCSC provides genomic reference sequences for a large number of species, including a large number of gene annotation information and comparative genomics information.
MEME motif discovery analyses
The MEME web service (http://meme-suite.org) was ran with the command line summary below: meme INTMDBK_treat.peaks.fa -dna -mod zoops -revcomp -nmotifs3 -maxw30 -minw 6 -maxsize 26,000,000 -oc. /INTMDBK_treat_meme.GO functional enrichment analysis of Peak related genes.
Gene Ontology (GO) is an internationally standardized gene function classification system that provides a set of dynamically updated standard vocabularies to comprehensively describe the properties of genes and gene products in organisms. GO has a total of three ontologies describing the molecular function of the gene, the cellular component in which it is located, and the biological process involved. GO is the basic unit (term, node), each term corresponds to a property.
Where N is the number of genes with GO annotation in the genome; n is the number of peak-related genes in N; M is the number of genes annotated for a particular GO term in the genome; m is the number of peak-associated genes annotated for a particular GO term. The GO term which satisfied the condition was defined as the GO term which was significantly enriched in the peak-related gene if the Bonferroni corrected p-value was < 0.05. The Directed Acyclic Graph (DAG), a graphical representation of the results of enrichment analysis of peak-related genes was also equally performed.
KEGG pathway enrichment analysis
This study used KOBAS (http://kobas.cbi.pku.edu.cn) for KEGG pathway enrichment analysis. The calculation principle was the same as GO enrichment analysis. Terms with corrected p-value of < 0.05 was considered significant.
We thank Mr. Noel Komi for valuable advice on the manuscript.
This work was supported by the National Natural Science Foundation of China (31301009, 81100463).
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
LQ and SW lead the experimental design, facilitated data analysis and wrote the manuscript. LW and XZ conceived and guided the execution of the study. QO and YZ were responsible for sequencing data analysis. AM and GL were in charge of cell culture and chromatin immunoprecipitation. FS, XW and YD revised the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 3.Hillman RT, Calos MP. Site-specific integration with bacteriophage PhiC31 integrase. Cold Spring Harb Protoc. 2012. https://doi.org/10.1101/pdb.prot069211.
- 5.Ohtsuka M, Miura H, Mochida K, Hirose M, Hasegawa A, Ogura A, Mizutani R, Kimura M, Isotani A, Ikawa M, Sato M, Gurumurthy CB. One-step generation of multiple transgenic mouse lines using an improved pronuclear injection-based targeted Transgenesis (i-PITT). BMC Genomics. 2015;16:274.CrossRefGoogle Scholar
- 35.Dormiani K, Mir Mohammad Sadeghi H, Sadeghi-Aliabadi H, Ghaedi K, Forouzanfar M, Baharvand H, Nasr-Esfahani MH. Long-term and efficient expression of human beta-globin gene in a hematopoietic cell line using a new site-specific integrating non-viral system. Gene Ther. 2015;22:663–74.CrossRefGoogle Scholar
- 37.Stilhano RS, Martin PK, de Melo SM, Samoto VY, Peres GB, da Silva Michelacci YM, da Silva FH, Pereira VG, D'Almeida V, da Cruz AT, Jasiulionis MG, Han SW. alpha- L-iduronidase gene-based therapy using the phiC31 system to treat mucopolysaccharidose type I mice. J Gene Med. 2015;17:1–13.CrossRefGoogle Scholar
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.