Introduction

Diabetic kidney disease (DKD) is a major complication of diabetes mellitus (DM) and the most common cause of chronic kidney disease (CKD) worldwide [1,2,3]. DKD is also associated with an increased risk of cardiovascular mortality [4, 5]. DKD has a complex aetiology, yet individual risk is greatly influenced by genetic predisposition [6].

Advances in next generation sequencing (NGS) technologies and analytical approaches have resulted in more cost-effective sequencing [7], accelerating the rate of genetic research [8]. However, NGS costs remain prohibitive for many laboratories, which limits its utility for large-scale studies of the methylome; such studies often rely on high-density arrays instead [9, 10].

Several genome-wide association studies (GWAS) and meta-analyses have been undertaken to detect common genetic variants associated with DKD. These investigations identified single nucleotide polymorphisms (SNPs) associated with DKD at loci including the FRMD3 [11], CARS [11], ACACB [12], AFF3 [13], CDCA7 [14], CUBN [15] and EPO [16] genes.

Epigenetic modifications influence both DNA and RNA regulation without altering the underlying sequence and may contribute to the inherited predisposition to DKD [17, 18]. DNA methylation is significantly altered in DM, with higher levels of methylation reported in individuals with DKD [19]. MicroRNAs (miRNAs) are small, highly conserved non-coding RNA molecules that act as epigenetic modifiers in the regulation of many protein-coding genes [20, 21] and gene expression [22]. MiRNAs play a vital role in many diseases [23].

Induction of miRNAs in renal cells is associated with accumulation of extracellular matrix proteins implicated in kidney fibrosis and glomerular dysfunction [21]. Several miRNAs have been reported previously in association with DKD including miR-135a [24], miR-200b [25] and miR-377 [26]. MiRNAs may represent biomarkers for this disease but further mechanistic studies are required to elucidate their effects.

This study compared sequencing approaches to investigate differentially methylated miRNAs associated with DKD identified from an epigenome-wide association study (EWAS). The aims were to determine genetic variants and epigenetic marks in the miRNAs associated with DKD and their surrounding sequences, and to perform a direct comparison of the results between blood-derived genomic DNA (gDNA) and DNA from Epstein-Barr virus transformed cell-lines derived from the same participants. This provided an opportunity to evaluate the more readily available transformed cell-line DNA samples as a proxy for the finite supply of gDNA.

Main text

Methods

Sample cohort

All participants were of Caucasian ancestry from the UK or Republic of Ireland (ROI) and provided written informed consent for research. DNA was extracted from whole blood using the salting-out method, normalised following PicoGreen quantitation, and frozen in multiple aliquots. Cell-line DNA was obtained following Epstein-Barr virus transformation of participants’ lymphocytes into cell lines, performed by the European Collection of Authenticated Cell Cultures (ECACC) [27].

Participants were part of the All Ireland-Warren 3-Genetics of Kidneys in Diabetes (GoKinD) UK Collection. Cases (n = 150) were defined as individuals with ≥ 10 years duration of type 1 diabetes (T1D) who had also been diagnosed with DKD, defined as hypertension (blood pressure ≥ 135/85 mmHg) and persistent macroalbuminuria (≥ 500 mg/24 h). Diabetic controls (DCs, n = 100) were individuals with ≥ 15 years duration of T1D and no evidence of renal disease on repeat testing. Control subjects all had an estimated glomerular filtration rate (eGFR) > 60 mL/min/1.73 m², whereas each case subject had CKD based on the presence of persistent macroalbuminuria and eGFR < 60 mL/min/1.73 m². Participant characteristics are included within Additional file 1: Table S1.

Discovery 450K methylation

Blood-derived gDNA for each individual was bisulphite treated (BST) using the EZ-96 DNA Methylation-Gold™ Kit (Zymo Research, USA).

To assess the methylation status of the cytosine-phosphate-guanine (CpG) sites, the Infinium Human Methylation 450K BeadChip array was used following the manufacturer’s instructions. Cases and controls were randomly distributed across each array. This high throughput platform evaluated individual methylation levels (β values) for each CpG site, ranging from 0 for unmethylated to 1 for complete methylation. Raw methylation data was adjusted for dye bias and quantile normalised as previously reported [28]. Quality control (QC) included evaluation of the bisulphite treatment conversion efficiency, dye specificity, hybridisation, staining and the inclusion of 600 integral negative controls for the EWAS.
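As a minimal sketch of how a β value relates to probe signal, the snippet below uses the commonly cited array formula β = M / (M + U + offset), where M and U are the methylated and unmethylated intensities. The function name and the offset of 100 are assumptions for illustration; the normalisation actually applied in this study is the one described in [28].

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Return a methylation beta value in the range [0, 1) for one CpG probe.

    methylated / unmethylated are background-corrected signal intensities.
    offset stabilises the ratio when both intensities are low; 100 is an
    assumed default here, not a value taken from the study.
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("intensities must be non-negative")
    return methylated / (methylated + unmethylated + offset)


# Illustrative intensities only (not study data).
low = beta_value(0.0, 5000.0)      # unmethylated probe
high = beta_value(9900.0, 0.0)     # almost fully methylated probe
```

Because of the offset, the value approaches but never reaches 1 for a fully methylated site.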

P values for differences in methylation between cases and controls, for all probes that passed QC, were adjusted for multiple testing using the Benjamini and Hochberg approach [29]. All miRNAs that demonstrated significantly altered levels of DNA methylation (p < ×10−5) were selected from our previous EWAS [28] for this validation and fine-mapping study.
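The Benjamini and Hochberg adjustment can be sketched in a few lines. This is an illustrative implementation of the standard step-up procedure, not the code used in the study; the function name and the example p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p values for four probes.
example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each raw p value is scaled by the number of tests divided by its rank, and the running minimum keeps adjusted values from decreasing as raw p values increase.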

NGS: targeted DNA sequencing

Targeted NGS analysis was performed for the sequences surrounding the CpG site of interest for each miRNA. Blood-derived gDNA was analysed in 23 DKD cases and 23 DCs. Participant characteristics are included within Additional file 1: Table S2. The gDNA samples were matched to the GoKinD cell-line DNA samples derived from the same participants; analysis was therefore conducted for 92 samples (46 gDNA and 46 cell-line DNA) for each genomic region.

Target sequences for the five miRNAs were amplified by polymerase chain reaction (PCR) using custom-designed primers. DNA fragments were pooled by sizes of approximately 800 base pairs (bp), 400 bp and 200 bp. Optimal primers were designed using Primer3Plus, Vector NTI Advance® (Invitrogen™, USA) and EpiDesigner software. Primers were selected for their ability to cover the CpG site of interest sufficiently and for compatible annealing temperatures to enable multiplex reactions. Primer sequences are provided in Additional file 1: Table S3 with optimised PCR conditions.

The library preparation was conducted using two Thermo Fisher Scientific protocols. The Ion Xpress™ Plus gDNA Fragment Library Preparation protocol (MAN0009847, revision B.0) was employed where the initial fragments were approximately 800 bp as they required fragmentation using the E-Gel™ SizeSelect™ 2% Agarose Gel to generate 200 bp libraries. For fragments originally of 200 bp and 400 bp, the Prepare Amplicon Libraries without Fragmentation Using the Ion Plus Fragment Library Kit protocol (MAN0006846, revision A.0) was followed.

Following library preparation, the DNA samples were diluted to 26 pM using DNA-free water. The 400 bp libraries were enriched using Thermo Fisher Scientific’s Ion OneTouch™ 2 (OT2) and Enrichment System (ES) (Ion Personal Genome Machine® (PGM™) Template OT2 400 Kit manual MAN0007219, revision 3.0). The 200 bp libraries were enriched and prepared using the Ion Chef™ (Ion PGM™ IC 200 Kit manual, MAN0007661, revision A.0).
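The dilution to 26 pM follows from a standard mass-to-molarity conversion for double-stranded DNA. The sketch below assumes an average mass of 660 g/mol per base pair and uses hypothetical function names and input values; the fragment length passed in should be the full library length, including any adapters.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (approximate average, assumed)


def library_pm(conc_ng_per_ul: float, fragment_bp: float) -> float:
    """Convert a dsDNA concentration in ng/uL to picomolar."""
    if conc_ng_per_ul < 0 or fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment length > 0")
    # ng/uL equals mg/L, so multiplying by 1e9 / (g/mol) yields pmol/L.
    return conc_ng_per_ul * 1e9 / (AVG_BP_MASS * fragment_bp)


def dilution_factor(conc_ng_per_ul: float, fragment_bp: float,
                    target_pm: float = 26.0) -> float:
    """Return the fold dilution needed to reach target_pm."""
    stock = library_pm(conc_ng_per_ul, fragment_bp)
    if stock < target_pm:
        raise ValueError("stock is already below the target concentration")
    return stock / target_pm


# Hypothetical library: 0.66 ng/uL at 200 bp.
stock_pm = library_pm(0.66, 200)
fold = dilution_factor(0.66, 200)
```

In practice, library molarity is usually measured directly (for example by qPCR or an electrophoresis-based assay), so a calculation like this serves mainly as a cross-check.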

Both the Ion 316™ Chip v2 and the Ion 318™ Chip v2 were used to sequence the DNA samples using the Ion PGM™ System (Thermo Fisher Scientific), before the raw data was analysed using Torrent Suite™ Software v4.0.4 and Partek® Genomics Suite® 6.6 software (Partek® Inc., USA). The sequencing reads were aligned to the hg19 reference sequence. SNPs were aligned to dbSNP version 141 and annotated using RefSeq version 2014-07-30. The chromosome viewer was used to visualise the overall sequencing coverage for the region of interest surrounding the top-ranked CpG site for each miRNA.

Sanger sequencing: fine mapping and methylation analysis

Forty-six gDNA samples, 23 DKD cases and 23 DCs, were bi-directionally Sanger sequenced using the ABI 3730 Genetic Analyser (Thermo Fisher Scientific). This was completed to enable direct comparisons to be drawn against the NGS variant calls.

Bisulphite treatment of the same samples was performed using the EZ-96 DNA Methylation™ Lightning Kit prior to Sanger sequencing. The resulting data provided the opportunity to assess the methylation status of each CpG site within the fragment.

ContigExpress, a component of Vector NTI Advance® 11.5.2, was used to analyse the Sanger sequencing data and determine accurate SNP calls. DNA sequences were aligned to the GRCh37 reference genome obtained from the online resource Ensembl.

An overview of the analysis workflow is illustrated in Fig. 1.

Fig. 1

Workflow of analysis methods undertaken in this study. bam binary alignment map, BST bisulphite-treated, CpG cytosine-phosphate-guanine, DKD diabetic kidney disease, gDNA genomic DNA, hg human genome, NGS next generation sequencing, SNP single nucleotide polymorphism, T1D type-1 diabetes mellitus

Results

Discovery 450K methylation analysis

Methylation status was quantitatively determined (DKD cases n = 150 and DCs n = 100). QC showed that > 99% concordance was observed between all included individuals; r² > 0.98 for each of the sample pairs assessed. In total, 74 CpG sites were determined from the EWAS, five of which were identified with significantly altered β levels from the original EWAS protocol [28]: miR-141, miR-329-2, miR-34A, miR-429 and miR-940 (Additional file 1: Table S4). This manuscript is focused on validation and fine-mapping of these top-ranked miRNAs in individuals with and without DKD.

NGS: targeted DNA sequencing

Targeted NGS was performed using the Ion PGM™ for DNA extracted from both whole blood and cell-line DNA. Both the Ion 316™ Chip v2 and the Ion 318™ Chip v2 were used in this analysis which typically generated 1.6 million to 3 million reads (Additional file 2: Figure S1).

Analysis was completed using Torrent Suite™ Software and Partek® Genomics Suite®. Four SNPs were identified which have not previously been associated with DKD, or identified as top-ranked results in GWAS (Fig. 2); two within miR-329-2 (rs141067872 and rs10132943) and two within miR-429 (rs7521584 and rs112695918). The frequency distributions for these SNPs are included in Additional file 1: Table S5.

Fig. 2

Comparison of matched genomic and cell-line transformed DNA for identified SNPs. Comparison of matched genomic and cell-line transformed DNA for rs141067872, rs10132943 (both miR-329-2), rs7521584 and rs112695918 (both miR-429) data generated by the Ion PGM™. The matching gDNA and cell-line transformed DNA show consistent results, indicated by the base colour patterns in each example. Chr chromosome, hg human genome

Figure 2 also shows the comparative results of the blood-derived gDNA samples and their complementary cell-line transformed DNA samples, both analysed using the Ion PGM™ (23 DKD cases and 23 DCs). This comparison of matched samples (gDNA compared with cell-line DNA) showed 100% concordance for SNP calls.
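A concordance figure of this kind can be computed by comparing genotype calls for each matched sample and SNP. The sketch below is illustrative only: the function name, the sample and SNP identifiers, and the genotypes are hypothetical and do not reproduce study data.

```python
def genotype_concordance(gdna_calls: dict, cell_line_calls: dict) -> float:
    """Return the fraction of shared (sample, SNP) keys with identical genotypes.

    Genotypes are unphased two-letter strings, so "AG" and "GA" are
    treated as the same call.
    """
    shared = [key for key in gdna_calls if key in cell_line_calls]
    if not shared:
        raise ValueError("no overlapping calls to compare")
    matches = sum(
        1 for key in shared
        if sorted(gdna_calls[key]) == sorted(cell_line_calls[key])
    )
    return matches / len(shared)


# Hypothetical calls keyed by (sample ID, SNP ID).
gdna = {("s01", "snpA"): "AG", ("s02", "snpA"): "GG", ("s01", "snpB"): "CT"}
cell = {("s01", "snpA"): "GA", ("s02", "snpA"): "GG", ("s01", "snpB"): "CT"}
rate = genotype_concordance(gdna, cell)
```

Only keys present in both sets are compared, so calls that failed on one platform are excluded from the denominator rather than counted as discordant.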

Sanger sequencing: fine mapping and methylation analysis

To confirm variants identified by NGS, the same primer pairs were used to bi-directionally Sanger sequence matched gDNA samples (23 DKD cases and 23 DCs). The variants identified by the NGS approach were confirmed by Sanger sequencing (Fig. 3). The genotype and minor allele frequencies (dbSNP, HapMap-CEU, low coverage panel) determined are detailed in Additional file 1: Table S5, though it is essential to note that not all fragments for all samples were Sanger sequenced successfully.

Fig. 3

Comparison of SNPs located within miR-329-2 and miR-429 identified by targeted NGS and Sanger sequencing. The data generated by both platforms showed consistent results for SNP calls. Ion PGM™ data analysed using Partek Genomics Suite is shown on the left, with the complementary Sanger sequence results shown on the right, for matching genomic DNA samples. Chr chromosome, DC diabetic control, DKD diabetic kidney disease, hg human genome, Ref reference, Seq sequence

The gDNA DKD and DC samples were also BST in order to assess the methylation status of each CpG site present within the fragment as reported by the 450K methylation array using ContigExpress software. In all, 35 methylation sites were identified for these miRNAs following bi-directional sequencing (Additional file 1: Table S6).

Discussion

This study reports the novel association of five miRNAs with DKD, validating and fine-mapping the miRNAs identified in the published EWAS. Additionally, different sequencing approaches were evaluated to define the genetic and epigenetic architecture of sequences surrounding miRNAs associated with DKD. Comparative analysis between Sanger sequencing and NGS technologies confirmed 100% concordance for all SNPs identified by both techniques. SNP calls were likewise 100% concordant between matched gDNA and cell-line DNA samples, providing reassurance that the original gDNA sequence for these miRNAs was unaltered by the cell-line transformation process. Several additional studies have also reported this positive comparison between Sanger and NGS approaches [30,31,32,33]. Notably, this is the first report to return results using semiconductor sequencing chemistry and Ion Torrent platforms for top-ranked miRNAs identified from an EWAS. Despite being the gold-standard method, Sanger sequencing is not faultless and has been shown to be inefficient in confirming NGS results for regions with high GC content and repetitive sequences [31]. NGS methods are reported to be more sensitive and scalable than Sanger sequencing [30, 33, 34].

Regarding DNA methylation, the BST DNA Sanger sequencing analysis mirrored the methylation sites identified through the Infinium Human Methylation 450K analysis. It is advisable to use at least two methods to detect and confirm differential methylation status [35]. Of the five main methods of detecting differential methylation, three were not employed in this study: (1) immunoprecipitation of methylated DNA, (2) methylated DNA capture by affinity purification and (3) reduced representation bisulphite sequencing [36]. The bisulphite-based methods, of which two were employed here, performed optimally in comparison to the others [36].

In conclusion, differential methylation in the five top-ranked miRNAs is associated with DKD and we have provided new details on the genetic architecture surrounding these loci. Targeted NGS compared favourably with Sanger sequencing. Sanger sequencing is costly and time-consuming when assessing many variants or samples. Targeted NGS provides a robust alternative method, offering a more cost-effective and often more sensitive approach.

Limitations

A potential limitation is that the sequencing data generated with the Ion PGM™ Template OT2 400 Kit were not of as high quality as those generated with the Ion PGM™ IC 200 Kit. Fragments of 400 bp in length had to be prepared and enriched using both the OT2 and ES rather than the Ion Chef™, owing to chemistry incompatibilities at the time this experiment was undertaken (2014–2015). Primers for both miRNAs with 400 bp fragments, miR-34A and miR-940, could be re-designed to generate 200 bp fragments covering the region of interest and so provide better coverage of these regions.