Background

Forensic biology often involves challenging tasks such as discriminating monozygotic twins, predicting a person's age from leftover body tissue, and distinguishing otherwise indiscernible body fluids/tissues, whose identification remains a major challenge in forensic laboratories. At the crime scene, human body fluids such as blood, semen, visceral fluid, vaginal fluid, saliva, or menstrual fluid have often lost their natural texture and are mixed or discolored through drying or bleaching during long exposure. Such obscure exhibits make it difficult for forensic scientists to determine the actual cause or severity of the crime. DNA investigation in forensic laboratories is currently carried out with Identifiler®/GlobalFiler® Express kits (Life Science Technology Ltd) (Hennessy et al., 2014; Mulero et al., 2008; Wang et al., 2012), the PowerPlex® kit (Promega) (Oostdik et al., 2014) or Investigator® 24Plex kits (Qiagen) (Chen et al., 2010), which type at least the 13 core short tandem repeat (STR) loci recommended by the Federal Bureau of Investigation (FBI), USA. The method is fast and resolves all the polymorphic STR alleles by multiplex amplification from picogram (pg) quantities of DNA (Green et al., 2013). However, when samples share the same DNA sequence, STR technology cannot differentiate them, because the STR polymorphism is identical in all cells/tissues of a single human body and in monozygotic twins. Profiling the DNA methylation pattern (Moore et al., 2013), especially 5-methylcytosine (5mC), has great potential to identify such samples.
5mC is the most studied methylated base. It adds an extra (epigenetic) layer of information on top of the genetic code and is an important signature of the obvious or subtle phenotypic differences observed among humans or human tissues (Heyn et al., 2013; Lister & Ecker, 2009). Advances in the detection and quantification of 5mC have allowed epigenetic modifications to be connected to a plethora of phenotypic alterations (Richards, 2006), which may form an independent part of a forensic study or corroborate existing STR results. The other, minor modifications of cytosine are 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxycytosine (5caC) (Szulik et al., 2015). All of these modifications, including 5mC, are modulated by cell-specific epigenetic expression patterns, environmental factors (Jaenisch & Bird, 2003), psychological factors (Kim et al., 2016; Lam et al., 2012) or physical stress acting in and around the cells (Meaney & Szyf, 2005; Unternaehrer et al., 2012).

A major part of the DNA methylation pattern is lost in passing from parents to offspring; nevertheless, substantial epigenetic similarity remains, in which a parent’s particular DNA methylation is recapitulated in the offspring, a phenomenon known as epigenetic inheritance (Richards, 2006). Methyl groups are covalently introduced into DNA by enzymes called DNA methyltransferases (DNA MTases), according to the specificities of the MTases, which are in turn modulated by the environment (Fig. 1). The MTase cofactor S-adenosyl-L-methionine (SAM) donates the methyl group that methylates DNA at a specific position, most commonly pictured as a palindromic site (Auclair & Weber, 2012; Jeltsch, 2002). Some of these methyl groups are removed by demethylases such as the ten-eleven translocation (TET) proteins, which makes the methylation marks reversible and dynamic, responsive to cellular requirements and the environment (Auclair & Weber, 2012) (Fig. 1). Molecular epigenetics is a swiftly developing field with great emphasis on gene regulation, developmental biology and therapeutic applications (Kalozoumi et al., 2012). The presence of DNA methylation in almost all kingdoms of life, and the distribution and diversity of DNA MTases across species, reflect its importance for vital cellular functions. In prokaryotes, 5-methylcytosine and N6-methyladenine are fundamental to the regulation of DNA replication and the cell cycle, to protecting DNA from restriction enzymes and to conferring virulence on certain strains (Casadesus & Low, 2006; Low et al., 2001; Ratel et al., 2006). Compared with invertebrates, whose genomes carry only a minor fraction of methylated DNA, vertebrate genomes are modified with a diversity of nucleotide modifications (Deaton & Bird, 2011; Koziol et al., 2016).
In humans, approximately 3% of the DNA is found to be 5mC methylated, mostly at CpG dinucleotides (Strachan & Read, 2011), by the action of three major DNA methyltransferases, viz. DNMT1, DNMT3a and DNMT3b (Moore et al., 2013) (Fig. 1a). In addition, N6-methyladenosine in DNA (m6dA) has recently been identified in the nuclear genome of a few vertebrates, including humans; it is mostly present in intronic regions and varies among cell types (Koziol et al., 2016).

Fig. 1

DNA methylation dynamics and human phenotypic variations. a In humans, three different DNA methyltransferases (DNMT1, DNMT3a, and DNMT3b) are responsible for methylating the genomic DNA (primary substrate) at the C5 position of the cytosine residue. The methyl group donor is S-adenosyl-L-methionine (SAM, also called cofactor or secondary substrate), which is converted to S-adenosyl-L-homocysteine (SAH) after donating the methyl group. In the genome, the methyl groups of cytosine residues are removed through the action of ten-eleven translocation proteins (TET1, TET2, TET3), which use Fe2+ and ascorbate as additional cofactors. In a separate reaction system inside the cells, SAH is hydrolyzed to homocysteine, which is remethylated to methionine using 5-methyl-tetrahydrofolate (MTHF) as the methyl donor, and the methionine is converted back to the universal methyl donor SAM using an ATP molecule. b The DNA methylation profile of human tissues/cells is affected by the immediate environment and by factors such as lifestyle, eating habits, alcoholism, smoking, disease, aggression, drug addiction, and repeated criminal activity. The combined effect of geographical location and the personal experiences/behaviors of individuals, which are usually repetitive in nature, leaves certain methylation marks on their DNA

In humans, differentially methylated regions (DMRs), i.e. regions of the same chromosomal DNA whose methylation pattern differs among tissues or among individuals, have been observed (Choufani et al., 2011). DMRs observed among multiple tissues (tDMRs) give a molecular view of the epigenetic changes that distinguish those tissues. Forensic cases involving monozygotic twins, tissue identification, age determination or behavior prediction cannot be resolved by STR profiling, which is based solely on variation in the DNA sequence repeats called short tandem repeats (Oostdik et al., 2014; Wang et al., 2012). Methylation marks on the DNA sequence, on the other hand, result from the cell type as well as its immediate environment (Marsit, 2015).

DMRs can be studied by many emerging next generation sequencing (NGS) methods. NGS supersedes the first generation sequencing method (the chain termination method) introduced by Frederick Sanger by adding two further dimensions, viz. massively parallel sequencing and high throughput analysis (Mardis, 2013). The Sanger method used ddNTPs as chain terminators in addition to the usual dNTPs, in a ratio of 1:100, to sequence the phage ϕX-174 DNA (5,386 nucleotides) with polyacrylamide gel analysis (Sanger et al., 1977). With NGS, a genome as large as the human genome (~3 billion nucleotides) can be sequenced in a single day by polymerase-driven synthesis of all the genome fragments in parallel, with each incorporated nucleotide labeled. The remaining difficulty lies in interpreting the large volume of data generated.

Bisulfite conversion of genomic DNA followed by next-generation sequencing (BS-seq) is now a technique of choice for studying the 5mC profile of genomic DNA. The advent of NGS platforms such as Illumina (Masser et al., 2015) has boosted the BS-seq method and led to the creation of many methylation databases (Grunau et al., 2001; Hackenberg et al., 2011). The detection of DNA methylation patterns is an advancing area of forensic research that promises to distinguish various cell/tissue types (Zhang et al., 2013) or monozygotic twins (Levesque et al., 2014; Li et al., 2013), accurately estimate human age (Bekaert et al., 2015; Horvath, 2013), identify bio-geographic origin/ethnicity (Heyn et al., 2013; Xia et al., 1846), and potentially predict human behaviour as a new part of investigative forensics (Kockler et al., 2006; Kovatsi et al., 2011).

Starting Forensic Samples, Collection Protocol and Storage

Starting Forensic Samples

Except in paternity disputes, where blood samples or buccal swabs are collected without contamination, damage or deterioration, most biological samples collected from a crime scene (legally known as exhibits) are highly challenging to process in the forensic laboratory. Frequently collected exhibits include blood, blood-stained objects, tissues, flesh, semen, vaginal swabs from rape victims, hair, teeth, bone, saliva, epithelial cells, objects that were forcibly touched and retain body fluids/epithelial cells (for touch DNA analysis), bite marks on the body, cigarette stubs, drinking bottles and toothbrushes.

Collection Protocol and Storage

In a criminal case, any body fluid found at the site other than on clothes is lifted onto a sterile gauze piece with sterile forceps. If the fluid has already dried at the crime scene, the gauze piece/stain is first wetted with phosphate buffered saline (PBS) and the stain is then lifted with forceps. If blood stains or non-blood biological stains are on the clothes of the victim/suspect or on small objects, the whole cloth/object is packed in a paper packet and then in a cardboard box. In a civil case such as a paternity dispute, blood samples on gauze pieces or buccal swabs on the cotton of toothpicks are collected. All wet exhibits are first dried, preferably in shade, and wrapped in brown paper, then sealed and labelled in a paper envelope on which the collector (usually a forensic expert) puts his/her signature with the date and time of collection. Bone pieces (flesh removed), teeth and hairs are also wrapped dry in brown paper and sealed in paper envelopes/carton boxes. If the exhibit is flesh, it is immediately put in a sterile metallic container, packed with ice/dry ice or in liquid nitrogen if available, and shipped at once to the forensic laboratory through the proper channel. The sealed and signed samples are handed by the forensic experts to the Investigating Officer (I/O) of the case (generally a police officer of the rank of sub-inspector (SI) or above), who, maintaining the chain of custody throughout, forwards the exhibits to the State Forensic Laboratory through the Judge of the Civil or Criminal Court as the case may be, with the judge's seal and signature on the court’s forwarding letter as well as on the exhibit packet(s).

Methods and Techniques of DNA methylation analysis

A sensitive and specific method for detecting the 5mC methylation pattern of DNA obtained from questioned sample(s) needs to be devised for routine use in forensic laboratories. One of the easiest ways to detect and identify DNA methylation patterns is the Illumina Infinium BeadChip array (the latest version being the MethylationEPIC BeadChip (Moran et al., 2016)), but it is too expensive for routine forensic analysis. Other molecular approaches and techniques for detecting and analysing 5mC methylation patterns are available, some known for decades and some developed recently (Table 1). All of these methods are based on bisulfite treatment of DNA, on methylation-sensitive restriction enzyme(s), on an antibody against the methylated base, or on a combination of these. They require different amounts of DNA, in the picogram (pg, i.e. 10⁻¹² g), nanogram (ng, i.e. 10⁻⁹ g) or microgram (μg, i.e. 10⁻⁶ g) range, differ in their resolving power for methylation status in terms of bp, and range from quite expensive to relatively inexpensive (Table 1).

Table 1 Methods/techniques for detecting/analysis of Differential DNA methylation

Bisulfite Sequencing and Methylation-Specific PCR (MSP)

Treatment of DNA with sodium bisulfite (NaHSO3) converts unmethylated cytosines to uracils, while methylated cytosines are not affected (Darst et al., 2010). The DNA is fragmented (~200-800 bp) by sonication and treated with bisulfite at low pH (pH 5), which adds a sulfite group to C6 of cytosine; the sulfonated cytosine is deaminated, and subsequent incubation at higher pH removes the sulfite group, generating uracil. Methylated cytosine does not undergo the bisulfite reaction owing to steric hindrance by the methyl group. After a simple PCR, in which uracil is amplified as thymine, and sequencing of the DNA, the methylated and unmethylated cytosines of the sequence can be identified (Fig. 2). An advanced method, bisulfite pyrosequencing, quantitatively determines the methylation status of individual CpG cytosines in PCR-amplified products using a sequencing-by-synthesis (SBS) approach (Tost & Gut, 2007) and can be used to distinguish various human body fluids (Madi et al., n.d.). The pyrophosphate (PPi) released during synthesis on the bisulfite-treated template is proportional to the number of dNTPs incorporated; it is converted into ATP, which drives the conversion of luciferin to oxyluciferin. The resulting quantitative methylation profile of the amplicon can distinguish many body fluids from as little as ~50 pg of DNA (Vidaki et al., 2016).
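
As a hedged illustration of the conversion logic described above, the following minimal Python sketch simulates an idealized bisulfite treatment (complete conversion, no sequencing errors, top strand only) and then calls the methylation state of each cytosine by comparing the converted read with the reference. The function names, the example sequence and the methylated positions are hypothetical and are not taken from any of the cited protocols.

```python
def bisulfite_convert(seq, methylated_positions):
    """Idealized bisulfite conversion followed by PCR (top strand only).

    Unmethylated C becomes U and is read as T after PCR;
    methylated C (0-based positions given by the caller) stays C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_methylation(reference, converted_read):
    """Report the inferred state of every reference cytosine."""
    calls = {}
    for i, (ref, obs) in enumerate(zip(reference.upper(), converted_read.upper())):
        if ref != "C":
            continue
        if obs == "C":
            calls[i] = "methylated"
        elif obs == "T":
            calls[i] = "unmethylated"
        else:
            calls[i] = "ambiguous"
    return calls


# Hypothetical 14-nt reference; cytosines sit at positions 1, 5, 8 and 9.
reference = "ACGTTCGACCGGTA"
converted = bisulfite_convert(reference, methylated_positions={1, 9})
calls = call_methylation(reference, converted)
print(converted)
print(calls)
```

Real bisulfite data also require handling of the complementary strand (where G-to-A changes are read), incomplete conversion and sequencing errors, none of which this sketch attempts.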

Fig. 2

Bisulfite sequencing, Methylation Specific PCR and Restriction Analysis. Methylation status of cytosines of two different samples/alleles (one methylated and another unmethylated) is detected and confirmed by bisulfite sequencing. For the genomic level of 5mC analysis, bisulfite treatment can be followed by next generation sequencing (BS-seq) method. The data obtained from bisulfite sequencing can be pursued further for Methylation Specific PCR (MSP), Combined BS Restriction Analysis (COBRA) and Methylation-sensitive Single Nucleotide Primer Extension (Ms-SNuPE, explained in Fig. 4) for routine analysis of specific cytosine residues

Methylation Specific PCR (MSP) is a post-bisulfite technique that selectively amplifies and detects a region of interest in its methylated state using methylation-specific primers (Herman et al., 1996). MSP may also employ primer sets for the unmethylated version of the same sequence (Fig. 2). The latter are useful as a control and, depending on the experiment, sometimes to collect positive data. A related method, MethyLight MSP, provides quantitative analysis by quantitative real-time PCR (Eads et al., 2000), in which methylation-specific primers carrying a fluorescent reporter anneal to the region of interest. An additional methodology that can resolve MSP-generated DNA with a low level of methylation is high resolution melting curve analysis (HRMA or Mc-MSP) (Kristensen et al., 2008; Wojdacz & Dobrovic, 2009), which estimates the ratio of methylated to unmethylated product from the distinct peaks they produce in the melting curve.

The COBRA Methylation Assay

The COBRA (Combined Bisulfite Restriction Analysis) methylation assay involves standard sodium bisulfite treatment of genomic DNA, followed by region-specific PCR amplification, purification of the amplification products, restriction enzyme digestion (Fig. 2) and finally analysis of the results (Xiong & Laird, 1997). It is highly specific and sequence-dependent, and reports the methylation status of a DNA sequence. Bio-COBRA is an advanced chip-based version in which the electrophoresis step (after restriction digestion) is run in microfluidic devices, where nanolitres of reagents are handled in miniaturized systems (Brena et al., 2006; Brena & Plass, 2009). Bio-COBRA thus allows rapid, quantitative, high-throughput assessment of DNA methylation in large sample sets.
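
The read-out of a COBRA experiment can be sketched in a few lines of Python. The example below assumes BstUI (recognition sequence CGCG, blunt cut between the two CG dinucleotides) as the diagnostic enzyme: after bisulfite conversion and PCR, the site survives only if both cytosines were methylated, so only the methylated allele is cut. The amplicon sequences, band intensities and function names are hypothetical, and the percentage formula (cut signal over total signal) is a common simplification rather than a procedure stated in the text above.

```python
SITE = "CGCG"    # BstUI recognition sequence (enzyme choice is an assumption)
CUT_OFFSET = 2   # BstUI cuts between the two CG dinucleotides


def digest(amplicon):
    """Return fragment lengths after complete digestion at every SITE."""
    fragments, start = [], 0
    pos = amplicon.find(SITE)
    while pos != -1:
        cut = pos + CUT_OFFSET
        fragments.append(cut - start)
        start = cut
        pos = amplicon.find(SITE, pos + 1)
    fragments.append(len(amplicon) - start)
    return fragments


def percent_methylated(cut_intensity, uncut_intensity):
    """Estimate methylation as the digested share of the total band signal."""
    total = cut_intensity + uncut_intensity
    if total <= 0:
        raise ValueError("no band signal")
    return 100.0 * cut_intensity / total


# Hypothetical bisulfite-converted amplicons of the same locus.
methylated_allele = "TTAGGCGCGTTTAGG"    # CGCG retained, so the enzyme cuts
unmethylated_allele = "TTAGGTGTGTTTAGG"  # C converted to T, site destroyed

print(digest(methylated_allele))
print(digest(unmethylated_allele))
print(percent_methylated(cut_intensity=30.0, uncut_intensity=70.0))
```

Incomplete digestion or incomplete bisulfite conversion would bias this estimate, which is why controls of known methylation status are normally run alongside.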

The HELP Methylation Assay

The HELP (HpaII tiny fragment Enrichment by Ligation-mediated PCR) assay is another technique for determining the dynamic methylation status of DNA from different cells/tissues, or from the same cells/tissues kept under different conditions (Oda & Greally, 2009). It can assess DNA methylation at the level of individual genes as well as the whole genome. The assay digests genomic DNA with two restriction enzymes, the methylation-sensitive HpaII and its methylation-insensitive isoschizomer MspI, contrasts the digestion products generated by the two enzymes, and amplifies them by ligation-mediated PCR (size range limited to 200-2000 bp) (Khulan et al., 2006). HpaII digests only unmethylated 5'-CCGG-3' sites and therefore enriches the methylation-deficient regions of the genome. The 5mC methylation state at each locus is determined by comparing the HpaII and MspI representations (Shaknovich et al., 2010).
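
The HpaII versus MspI comparison can be illustrated with a short Python sketch. It assumes that each locus has a signal value for both representations and summarizes them as a log2(HpaII/MspI) ratio: a strongly negative ratio means the locus is under-represented in the HpaII fraction and is therefore likely methylated. The pseudocount, the threshold of -1.0, the locus names and the signal values are all illustrative assumptions, not parameters from the cited HELP protocols.

```python
import math


def help_log_ratios(hpaii_signal, mspi_signal, pseudocount=1.0):
    """log2(HpaII / MspI) per locus; the pseudocount avoids division by zero."""
    return {
        locus: math.log2(
            (hpaii_signal[locus] + pseudocount) / (mspi_signal[locus] + pseudocount)
        )
        for locus in mspi_signal
    }


def classify(log_ratio, threshold=-1.0):
    """Call a locus methylated when HpaII signal is well below MspI signal."""
    return "methylated" if log_ratio < threshold else "unmethylated"


# Hypothetical signal intensities for two loci.
hpaii = {"locus_A": 99.0, "locus_B": 9.0}
mspi = {"locus_A": 99.0, "locus_B": 79.0}

ratios = help_log_ratios(hpaii, mspi)
status = {locus: classify(value) for locus, value in ratios.items()}
print(ratios)
print(status)
```

In practice the cut-off would be chosen from the distribution of ratios across the array or sequencing run rather than fixed in advance.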

Methylation-sensitive restriction enzyme based microarray (MRE- microarray)

DNA carrying the 5mC mark may or may not be protected from digestion, depending on the class of restriction enzyme (Wilson & Murray, 1991). This differential sensitivity can be used to approximate and map cytosine methyl groups in forensic DNA exhibits and in reference samples for comparison. The genomic DNA is fragmented (<200 bp) with the restriction enzyme MseI, which recognizes the 5’-TTAA-3’ sequence and leaves most CpG islands (CGIs) intact (Yan et al., 2002). The digestion products are ligated to synthetic linkers and digested again with BstUI and/or HpaII, both of which are CpG specific and together cover more than 90% of CGIs. The resulting DNA is amplified by linker-dependent PCR. The PCR products are then labeled with Cy3 (Em ~570 nm, greenish yellow) and Cy5 (Em ~670 nm, red) dyes. In practice, Cy3 can be used to label DNA from a reference sample and Cy5 to label DNA obtained from the crime scene. The labeled products from both groups are mixed in equal amounts and hybridized to a differential methylation hybridization (DMH) microarray slide printed with human CGI library probes (Schumacher et al., 2006). After stringent washing, the slide is subjected to fluorescence scanning. The Cy3/Cy5 intensity ratio at each locus reflects the methylation status in the reference DNA relative to that in the DNA from the crime scene.

Illumina Infinium Methylation Assay

Illumina’s Infinium Methylation Assay measures methylation at single-CpG-site resolution and can therefore offer the highest resolution for identifying tissue types and solving many different forensic cases (Table 2). Illumina platforms can perform both bead array and NGS analyses. The now-outdated Illumina Infinium HumanMethylation450 BeadChip (Illumina Inc.) covered > 480,000 CpG sites of the genome, including ~99% of RefSeq genes (Touleimat & Tost, 2012). It has been superseded by the Infinium MethylationEPIC BeadChip microarray, which covers > 850,000 methylation sites, including an extra 333,265 CpG sites in enhancer regions identified by the ENCODE and FANTOM5 projects (McCartney et al., 2016; Moran et al., 2016; Pidsley et al., 2016). Methylation data for these CpG sites can be obtained from ~250 ng of genomic DNA. The methylation information is then converted into genotypic differences that are analyzed quantitatively (Cheng et al., 2014). After bisulfite treatment of genomic DNA by the standard protocol, the assay differentiates methylated from unmethylated CpG sites using either Infinium I or Infinium II chemistry. In Infinium I chemistry, two site-specific probes of 50 nucleotides each, an M bead type and a U bead type, terminate directly at the methylated C and the unmethylated C respectively and are extended with a fluorophore-labeled nucleotide (G-Cy3 and A-Cy5 respectively). Infinium II chemistry uses a single probe of 49 nucleotides, which on hybridization is extended by a single nucleotide carrying either G-Cy3 for methylated or A-Cy5 for unmethylated sites. A quantitative value (β-value) of DNA methylation for each CpG site is generated with the Illumina GenomeStudio® software (Triche Jr et al., 2013).
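
To make the β-value concrete, the sketch below assumes the widely quoted form β = M / (M + U + offset), where M and U are the methylated and unmethylated signal intensities and the offset (commonly given as 100) stabilizes the ratio when both signals are low. The offset value, the clipping of negative intensities and the function name are assumptions for illustration and should be checked against the software documentation before use.

```python
def beta_value(meth_intensity, unmeth_intensity, offset=100.0):
    """Methylation level of one CpG site on a 0-1 scale.

    Negative (background-subtracted) intensities are clipped to zero,
    and the offset keeps the ratio finite when both signals are weak.
    """
    meth = max(meth_intensity, 0.0)
    unmeth = max(unmeth_intensity, 0.0)
    return meth / (meth + unmeth + offset)


# Hypothetical intensities for three CpG sites.
fully_methylated = beta_value(900.0, 0.0)
unmethylated = beta_value(0.0, 900.0)
half_methylated = beta_value(450.0, 450.0)
print(fully_methylated, unmethylated, half_methylated)
```

Because of the offset, β never quite reaches 1 even for a fully methylated site, which is one reason some analyses prefer a log-ratio scale instead.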

Table 2 Forensic exhibits and some epigenetic markers (CpG) information available using Illumina Infinium HumanMethylation450 BeadChip array

Methyl-CpG Binding Domain Protein Sequencing

Methyl-CpG binding domain protein sequencing (MBD-seq) is another technique which is used to determine genome-wide 5mC methylation patterns of humans (Aberg et al., 2012). In MBD-seq method, genomic DNA is fragmented and the methylated sequences are pulled down by a 5mC binding protein or simply using MethylMiner Methylated DNA Enrichment Kit (Invitrogen) (Harris et al., 2010). These methylated fragments are sequenced using high-throughput sequencing techniques (Lan et al., 2011; Rauch & Pfeifer, 2005), and the exact position of sequenced tags is determined by comparing them to the human genome. The technique is quite effective for measurement of methylation status of CpG islands containing high density of CpG sites (Fraga et al., 2003).

Methylated DNA Immunoprecipitation PCR/Sequencing (MeDIP-PCR/seq)

Methylated DNA Immunoprecipitation (MeDIP) involves pulling down methylated DNA regions of the genome using an antibody raised against 5mC (Borgel et al., 2012; Mohn et al., 2009). Conventionally, a dot blot (Clement & Benhattar, 2005) can be performed by adding the 5mC antibody directly to fragmented DNA immobilized on a nylon membrane; the intensity of a fluorescent secondary antibody then gives a rough indication of the presence, absence or amount of methyl groups in a given DNA sample (Fig. 3) (Koziol et al., 2016). MeDIP is an efficient technique for extracting methylated DNA from forensic human samples, including the blood, bone and hair samples commonly used as exhibits. DNA immunoprecipitation combined with next generation sequencing, termed MeDIP-seq, can be used to generate methylomes from tissues or cells using 160-300 ng of starting DNA (Taiwo et al., 2012). Briefly, the genomic DNA is sonicated to obtain fragments (200-800 bp) and immunoprecipitated with monoclonal antibodies raised against 5-methylcytidine (much like the methyl capture protein in the MBD-seq method). The antibody-methylated DNA complex is then purified using paramagnetic beads, which separate it from the remaining non-5mC-methylated/unspecific DNA. The methylated DNA so obtained can be used for MSP, methylation-sensitive restriction enzyme analysis or genome-wide 5mC analysis through sequencing/microarray (Borgel et al., 2012). In genomic analysis of methylated DNA, methylated reads of 36-50 bp or 400 bp are produced, depending on the method used. These reads are aligned and compared to the human genome using software such as MeQA (http://life.tongji.edu.cn/meqa/) (Huang et al., 2012) and can be inspected in a genome browser such as Ensembl.
MeDIP-PCR targets methylated genomic loci with as little as ~1 ng of starting genomic DNA (Zhao et al., 2014), which could be highly useful for methylation profiling of challenging DNA samples in forensics.

Fig. 3

DNA methylation detection through Immuno-precipitation (Dot-blot method). a. The methylated DNA (dots 2 and 3) develops a colored precipitate/fluorescence, while unmethylated DNA (dot 1) fails to develop color/fluorescence on the nylon blot. b. The DNA is spotted as a dot on the nylon membrane and the primary antibody is applied, followed by a secondary antibody conjugated with horseradish peroxidase. Addition of the enzyme’s substrate, such as 3,3’-diaminobenzidine (DAB), produces a pink precipitate. Alternatively, a fluorophore-tagged secondary antibody (Alexa Fluor Ab) can be used to visualize the presence of the methyl group

Methylation analysis through Single Base Extension

Single-base extension (SBE) is a method for determining the identity of the nucleotide base at a specific position along a nucleic acid, typically used to type a single-nucleotide polymorphism (SNP) (Hu & Zhang, 2012). An oligonucleotide probe is hybridized to a complementary region of the nucleic acid so that the probe’s 3’ end lies directly adjacent to the base to be identified. The probe is then enzymatically extended by a single base with a nucleotide terminator, a 2',3'-dideoxynucleotide triphosphate (ddNTP), that base-pairs with the nucleotide in question (Fig. 4). The terminator does not allow further extension of the sequence. The terminator base can be identified by fluorescence labeling, isotope labeling, mass labeling for mass spectrometry, or by measuring enzymatic activity using an enzyme conjugate.

Fig. 4

Schematic representation of Methyl-sensitive Single Nucleotide Primer Extension (Ms-SNuPE) method. a This method can be used to detect cytosine methylation by designing primers of different lengths corresponding to different cytosine markers that vary among body fluids/tissues of human (e.g. blood, saliva, and semen) or between monozygotic twins or with the ageing of human beings. The primers are extended a single base with fluorescently labeled terminators (ddNTPs-Fluorophore) using DNA polymerase b The extended primers are then detected by running these products through the genetic analyzer (capillary electrophoresis)

Single base extension of primers with fluorescently labeled ddNTP terminators, originally commercialized as SNaPshot® (ABI PRISM SNaPshot Multiplex kit) by Life Technologies, has been successfully applied to bisulfite-treated, PCR-amplified genomic DNA to analyze the cytosine methylation status of CpG-rich genomic DNA by capillary electrophoresis (CE) (Boyd & Zon, 2008). The results can be analyzed on 310, 3100, 3500 and 3730 Genetic Analyzers (Life Technologies) with GeneMapper software (version >3.1). The methylation SNaPshot electropherogram is displayed much like an STR profile, except that STR peaks show variation in repeat units whereas SNaPshot peaks show variation in methylation status (Wnuk et al., 2013). DNA from the forensic exhibits listed in Table 2 can be analyzed with the SNaPshot multiplexing system using primers designed against the CpG site/Illumina ID of the known DNA sequence.
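
A rough way to turn a methylation SNaPshot electropherogram into numbers is sketched below. It assumes a primer that anneals so that the extended base reads the top strand of bisulfite-converted DNA: a C peak then represents the methylated template and a T peak the unmethylated template (a primer on the opposite strand would give G and A instead). The marker names and peak heights are hypothetical, and the simple height ratio ignores dye-specific signal differences, which real assays usually correct for with controls.

```python
def methylation_fraction(c_peak_height, t_peak_height):
    """Fraction of methylated template at one CpG, from two peak heights."""
    total = c_peak_height + t_peak_height
    if total <= 0:
        raise ValueError("no signal at this CpG site")
    return c_peak_height / total


# Hypothetical multiplex result: marker -> (C peak, T peak) in RFU.
peaks = {
    "marker_1": (1500.0, 500.0),
    "marker_2": (200.0, 1800.0),
}

profile = {name: methylation_fraction(c, t) for name, (c, t) in peaks.items()}
print(profile)
```

The resulting per-marker fractions can then be compared with reference profiles of known body fluids or tissues.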

Applications of DNA methylation in Forensic Science

Forensic DNA samples Authentication

Scientists in Israel have demonstrated that DNA evidence may be fabricated using DNA generated/amplified by standard molecular biology protocols such as in vitro PCR, in vivo molecular cloning in bacteria/yeast, and whole genome amplification (WGA), a technique developed about a decade ago. These methods practically enable anyone to synthesize large amounts of DNA of any human being with any desired STR profile containing the 13/15/23 loci + Amelogenin recommended by the FBI, USA. The artificial DNA can be planted in and around crime scenes, applied to nearby objects or mixed with genuine human tissues (Frumkin et al., 2010; Melchior et al., 2008). To authenticate forensic DNA samples, sodium bisulfite treatment of the DNA followed by MSP, or by cloning and Sanger sequencing, can be carried out; this distinguishes in vitro generated DNA, which lacks the natural human methylation pattern, from genuine DNA (Frumkin et al., 2010).

Age prediction of a person

Age determination of victims and/or suspects is an important part of crime investigation and of vital interest to forensic scientists. The current approach, based on assessing the structure and composition of bones and teeth, is imprecise in practice, whereas methylation marks on DNA obtained from these hard tissues can be used to predict the chronological age of individuals more accurately (Bekaert et al., 2015; Giuliani et al., 2015). Age acceleration measured from DNA methylation is heritable, and DNA methylation age is close to zero at the embryonic stage of an individual (Guo et al., 2014; Horvath, 2013). Age-related changes in genomic DNA methylation marks are highly correlated across a broad range of human tissues and cells and are influenced by gender, genetic makeup (Hannum et al., 2012), race (Kader & Ghai, 2016; Park et al., 2016), and the clinical status and lifestyle of an individual (Weidner et al., 2014). Genomic regions with age-sensitive DNA methylation patterns have been identified (Yi et al., 2015), allowing biological age to be predicted to within 5 years of chronological age. The CpG methylation patterns in the promoter regions of three genes, viz. EDARADD, TOM1L1, and NPTX2, were shown to vary linearly with age (EDARADD and NPTX2 are hypomethylated while TOM1L1 is hypermethylated) among individuals aged 18-70 years (Bocklandt et al., 2011). In further studies, five genes (NPTX2, TRIM58, GRIA2, KCNQ1DN and BIRC4BP) were shown to lose methyl groups continuously (become hypomethylated) with increasing age in various human tissues (Koch & Wagner, 2011), while the gene ELOVL2 showed increasing methylation with aging (Garagnani et al., 2012).
ELOVL2 methylation marks in human blood samples are very stable: they do not change significantly over a month and are up to 70% preserved in decade-old samples, which makes ELOVL2 a powerful marker for age prediction from blood exhibits (Zbiec-Piekarska et al., 2014). Further research has developed epigenetic markers (cg06304190, cg06979108, and cg12837463) for age prediction from semen samples (Lee et al., 2015). In other work, several CpG methylation loci (ASPA, PDE4C, ELOVL2, FHL2, CCDC102B, C1orf132, chr16:85395429 and EDARADD) have been used to accurately predict the age of deceased or living individuals from their blood samples (Bekaert et al., 2015; Freire-Aradas et al., 2016). Besides age prediction, DNA methylation can also help measure the rate of aging (Horvath, 2013). In individuals with compromised physical health, deviation from the normal aging rate may be observed. Aging-associated differentially methylated regions (aDMRs) in the human genome occur in almost all tissues and cells (except spermatozoa) as hyper-aDMRs, preferentially at developmental gene promoters (Rakyan et al., 2010). DNA methylation signatures in the human prefrontal cortex can also be used to estimate the aging rate or age of a human sample (Numata et al., 2012). The age analysis can be performed at: https://dnamage.genetics.ucla.edu/

CpG methylation data obtained for a forensic sample with the Illumina Infinium assay can be uploaded to the above online software to estimate the age of the questioned sample. Alternatively, methylation-sensitive HRMA of the promoter regions of the ELOVL2 and FHL2 loci can be used to predict the age of an individual (Hamano et al., 2016).
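
The principle behind methylation-based age estimation can be shown with a deliberately simple Python sketch: a least-squares line is fitted to the methylation level of a single age-correlated CpG in samples of known age and then used to estimate the age of a questioned sample. The training values below are synthetic numbers invented for illustration only, not measurements from any cited study, and published predictors combine many CpG sites and are trained on large cohorts. The function names are likewise hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_age(methylation, slope, intercept):
    """Apply the fitted line to one methylation fraction (0-1 scale)."""
    return slope * methylation + intercept


# SYNTHETIC training data: methylation rising with age at one CpG site.
training_methylation = [0.30, 0.40, 0.50, 0.60]
training_age = [20.0, 35.0, 50.0, 65.0]

slope, intercept = fit_line(training_methylation, training_age)
estimated_age = predict_age(0.45, slope, intercept)
print(slope, intercept, estimated_age)
```

A forensic report would also need the prediction error of the model, estimated on independent samples, rather than a single point estimate.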

Sources of tissue/body fluid identification

Identifying and establishing the tissue source of a sample recovered from the site of a crime helps in recreating the crime scene (An et al., 2012). Some of the 5mC methylation marks in DNA are highly tissue specific (Table 2) and can be used to discriminate the origin of a DNA sample, whether it comes from a single source or from multiple sources of the individual (Frumkin et al., 2011). This implies that when DNA is obtained from a mixture of body fluids, the technique would be able to resolve it into semen, saliva, blood, or any other body tissue according to the methylation pattern (what can be termed Next Generation Serology) (Park et al., 2014). Frumkin et al. pioneered 5mC-based forensic tissue typing by incubating ~1 ng of DNA with a methylation-sensitive restriction enzyme (HhaI, recognition site –GCGC–) and screening 38 genomic loci (205 CpG islands); differential amplification patterns were obtained depending on the methylation level of the –GCGC– sites (Frumkin et al., 2010). In another seminal work, it was shown that the 5mC methylation patterns of the genes DACT1, USP49, HOXA4, PFN3, and PRMT2, determined by bisulfite sequencing, could be used to distinguish blood, saliva, semen, menstrual blood, and vaginal fluid (Lee et al., 2012). In semen samples, DACT1 and USP49 are specifically hypomethylated, while in the other body fluids HOXA4, PFN3, and PRMT2 show characteristic methylation profiles. Similarly, to distinguish vaginal epithelia from other body fluids of forensic interest (blood, saliva, and semen), a recent study using bisulfite-modified pyrosequencing has established PFN3A as a novel marker (Antunes et al., 2016). In parallel, a confirmatory test (multiplex methylation SNaPshot) for body fluid identification has been devised using 9 CpG menstrual blood markers which can specifically identify blood, saliva, semen, vaginal fluid and menstrual blood (Lee et al., 2016).
The stability of these methylation markers in trace amounts of various body fluids may be marred by both endogenous and exogenous factors. A study using a large number of samples (100 for each body fluid) was therefore performed on the Illumina BeadChip Array and showed sensitivity comparable to that of STR analysis (Forat et al., 2016). The methylation pattern differences in the DNA of these body fluids have recently been utilized to develop another novel forensic technique based on the combined use of combined bisulfite restriction analysis (COBRA) and methylation-specific PCR (MSP) (Lin et al., 2016).
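The logic of the methylation-sensitive restriction enzyme approach described above can be sketched as follows. The sketch assumes a simplified all-or-nothing model: HhaI cuts every unmethylated GCGC site, so a locus can be amplified only if all GCGC sites within the amplicon are methylated. The sequence and methylation states are hypothetical; an actual assay measures quantitative amplification ratios between loci rather than a yes/no outcome.

```python
import re

def hhai_sites(amplicon):
    """Return the start index of every GCGC site (overlapping sites included)."""
    return [m.start() for m in re.finditer("(?=GCGC)", amplicon.upper())]

def survives_digestion(amplicon, methylated_sites):
    """True if every GCGC site is methylated, so the template stays intact for PCR.

    methylated_sites: set of start indices of GCGC sites assumed to be methylated.
    """
    return all(pos in methylated_sites for pos in hhai_sites(amplicon))

# HYPOTHETICAL amplicon with two HhaI sites (at indices 2 and 9).
amplicon = "ATGCGCTTAGCGCAA"

print(hhai_sites(amplicon))
print(survives_digestion(amplicon, {2, 9}))  # fully methylated template
print(survives_digestion(amplicon, {2}))     # one unmethylated site is cut
```

Comparing which loci survive digestion in a questioned sample against the patterns expected for each tissue is what allows the source of the DNA to be inferred.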

Discerning monozygotic twins

Monozygotic (MZ) twins, whether their phenotypic differences are subtle or pronounced, carry exactly the same (concordant) DNA sequence throughout the genome they inherit, which presents a roadblock to their identification by conventional STR, SNP, or mtDNA profiling (Chatterjee & Morison, 2011). However, considerable variation in DNA methylation pattern has been observed between MZ twins, which largely results from genetic lineage/makeup and from pre- and postnatal environmental conditions (Segal et al., 2016). Because of this epigenetic (DNA methylation) difference, such pairs are known as discordant MZ twins, with striking differences in susceptibility to various diseases (Castillo-Fernandez et al., 2014; Elboudwarej et al., 2016; Ribel-Madsen et al., 2012; Roos et al., 2016). In addition, male MZ pairs and female MZ pairs show marked differences in CpG methylation level in autosomal or X-chromosomal DNA (Watanabe et al., 2016). These differences in 5mC pattern can be identified by ultra-deep NGS, as described by Weber-Lehmann et al., suggesting a solution to paternity and forensic cases involving MZ twins (Weber-Lehmann et al., 2014). Apart from sequencing, an HRMA experiment can be performed using bisulfite-converted, PCR-amplified DNA from both twins, in which the DNA of the twin carrying a greater number of methyl groups will show a different melting temperature (Stewart et al., 2015).
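Why bisulfite-converted DNA from two twins can melt differently may be illustrated with a small in-silico sketch. Bisulfite treatment followed by PCR reads unmethylated cytosine as thymine, whereas methylated cytosine remains cytosine, so the more methylated template retains a higher GC content. GC fraction is used here only as a crude proxy for melting temperature, and the sequence and methylated positions are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Read unmethylated C as T (as after bisulfite treatment and PCR); keep methylated C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq.upper())
    )

def gc_fraction(seq):
    """Fraction of G and C bases; higher values imply a higher melting temperature."""
    return (seq.count("G") + seq.count("C")) / len(seq)

# HYPOTHETICAL locus with identical sequence in both twins (C at indices 1, 4, 7).
locus = "ACGTCGACGT"
twin_a = bisulfite_convert(locus, {1, 4, 7})  # all three CpGs methylated
twin_b = bisulfite_convert(locus, {1})        # only one CpG methylated

print(twin_a, gc_fraction(twin_a))
print(twin_b, gc_fraction(twin_b))
```

Although the two starting sequences are identical, the converted amplicons differ in base composition, which is the difference that a melting curve would detect.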

Bio-geographic/Ethnic classification

A large number of bio-geographic/ethnic groups exist in the human population, each with a distinct biogeographic distribution. Cases relating to individuals of native or migrant origin have been widely reported in the criminal justice system, and their identity can be labeled using their unique methylation profile (Race & Genetics Working Group, 2005; Royal et al., 2010). Differences in DNA methylation patterns among human races are present from birth (Adkins et al., 2011), and these differences persist for generations as long as the population largely inhabits the same biogeographic region. Methylation differences play a role in imparting human phenotypic appearance (Heyn et al., 2013; Xia et al., 1846) as well as human behavior, and carry with them certain inherent human characters, including susceptibility to diseases such as diabetes and cancers (Mitchell & Grant, 2015; Xia et al., 1846) (Fig. 1b). A few DNA methylation studies of different human races have been carried out recently, which reveal distinct patterns of CpG methylation at certain loci in the autosomal DNA (Table 2). The AHRR locus is hypomethylated in Europeans and heavily methylated in South Asians, essentially because of differences in smoking trends (Europeans are heavy smokers) (Elliott et al., 2014). Tobacco smoking, in fact, contributes to global perturbation of the human DNA methylome (Zeilinger et al., 2013). In addition, African-Americans and Caucasian-Americans display a 13.7% difference in autosomal CpG patterns, even among individuals with cancer (Adkins et al., 2011). These differences in cytosine methylation profile offer strong support in forensic cases related to racial or ethnic disparities.

Anti-social and aggressive behavior identification

Criminal behavior, especially chronic or repetitive anti-social behavior, has long been regarded as genetic (Carey & Gottesman, 2006; Dinwiddie, 1996; Mednick & Finello, 1983), but recent research classifies it more properly as epigenetic and as predisposed by the early environment (Fig. 1b) (Day & Sweatt, 2010; Kovatsi et al., 2011; Provencal et al., 2014; Roth & Sweatt, 2010). Among epigenetic factors, DNA methylation is the most important in behavioral disorders, with marked alteration of the DNA methylation pattern in the brain, which leads to memory formation (Day & Sweatt, 2010). Early-life exposure to stress, especially during pregnancy, humiliation or misguided parenting can alter normal epigenetic programming in the brain, with overwhelming consequences for gene expression and behavior (Khulan et al., 2014; Szyf, 2011; Wu et al., 2014). Recent studies have demonstrated an association between maternal drug or alcohol addiction and DNA methylation changes, a phenomenon that can mediate long-term alterations in gene expression (Kovatsi et al., 2011; Marjonen et al., 2015; Masemola et al., 2015; Philibert et al., 2012; Zhang et al., 2012). Smoking addiction also has a sharp effect on the DNA methylation pattern among South Asians and Europeans, as explained in Section 3.5 with reference to the AHRR locus (Elliott et al., 2014; Monick et al., 2012). Another important marker associated with smoking is cg03636183 (F2RL3), which is likewise hypomethylated among smokers (Breitling et al., 2011). These smoking-associated DNA methylation marks carry a risk of various cancers, including blood, breast, colon and lung cancers (Monick et al., 2012; Shenker et al., 2013), and pass from one generation to the next as an epigenetic memory (Day & Sweatt, 2010); it may sometimes take generations to mitigate the effect through rehabilitation or a change in behavior/phenotype (Bouwland-Both et al., 2015).
It can be concluded that alcoholism, smoking, drug addiction or any other recurring long-term behavior imprints certain methylation marks on human genomic DNA (Table 2), and that these marks can be used to identify an individual with such an addiction or behavior in relevant civil or criminal cases in a court of law. Not all smokers or alcoholics are bound to commit a crime, nor are all non-smokers or non-alcoholics innocent; however, the likelihood of a smoker or alcoholic being an offender (a psychopathic phenotype (Tamatea, 2015)) has been positively correlated (> 90%) in a study (Shareck & Ellaway, 2011), based on records made by the police during criminal detention or, alternatively, on the perceptions of neighbouring residents.

Conclusions

The field of epigenetics has recently made a landmark entry into forensic science research, as it is highly significant for the critical identification and analysis of forensic samples that otherwise contain exactly the same DNA sequence. DNA methylation, particularly cytosine methylation at CpG sites, is the major epigenetic chemical change in DNA (Jin et al., 2011); without altering the DNA sequence, it leads to phenotypic variation among tissues, between monozygotic twins and among humans. It has thus provided a new vista for distinguishing cell types or tissues (Brunner et al., 2009; Lokk et al., 2014), monozygotic twins (Fraga et al., 2005; Heyn et al., 2013; Kaminsky et al., 2009), the age of humans (Bekaert et al., 2015; Horvath, 2013; Yi et al., 2015) and human races (Adkins et al., 2011; Heyn et al., 2013), and can even be used to predict certain criminal behaviours (Day & Sweatt, 2010; Szyf, 2011; Szyf, 2013).

The epigenetic markers present in the DNA of human cells or tissues can be used to distinguish the different tissues of the same human being, discern monozygotic twins, identify human races, predict human behaviour and accurately determine the age of a DNA donor (Bekaert et al., 2015) using whole-genome bisulfite sequencing, the Illumina Infinium BeadChip array (McCartney et al., 2016; Moran et al., 2016; Touleimat & Tost, 2012), or methylated DNA immunoprecipitation sequencing (Borgel et al., 2012; Cheung et al., 2011; Mohn et al., 2009; Pelizzola & Molinaro, 2011; Sorensen & Collas, 2009; Taiwo et al., 2012; Zhao et al., 2014), as discussed above. Interestingly, when a micro-volume blood stain, saliva-laden cigarette stubs or a few fallen hair strands are recovered from the crime scene, the extremely low amounts of DNA (≤ 100 pg), as in the forensic case reported from Jharkhand (Rana et al., 2016), can be treated with bisulfite and then subjected to genome-wide amplification (Bundo et al., 2012), followed by quantitative methylation detection using the above technologies. These methods have been found to produce comparable methylation results but differ in CpG coverage, resolution, quantitative accuracy, efficiency, and cost. They also differ in their capability to detect and discriminate CpG as well as non-CpG methylation sites, and the results vary from one laboratory to another (Jung et al., 2016). Currently, the most advanced array technology, the Infinium MethylationEPIC BeadChip array, offers a unique and holistic approach, allowing assessment of the methylation status of not only 5mC but also 5hmC, including 2880 CNG sites (where N is any nucleotide and C and G are cytosine and guanine) and 59 SNP sites (Moran et al., 2016).
While the methylation site coverage of the Infinium MethylationEPIC BeadChip Array is quite large, it is still eclipsed by NGS technology in genome coverage and flexibility (NGS will sequence virtually anything and is not restricted to a defined array, although NGS is still more expensive). The usefulness of NGS methods lies in the fact that the methylation reads of a sequence can be used to map epigenetic variation (among tissues, between twins or with human ageing), which can be routinely applied to solve forensic cases. NGS technology can be exploited in the forensic field by analyzing multiple loci of forensic samples simultaneously, whether from autosomal DNA, sex chromosomes or mitochondrial DNA.

As far as mitochondrial DNA (mtDNA) is concerned, SNP analysis is of immense importance in forensic investigation for delineating the maternity as well as the ethnicity of an individual in question (Coble et al., 2004; Just et al., 2009). Beyond sequence variability, there are regions with unique methylation patterns that tend to vary among different tissues of an individual (Liu et al., 2016), as well as certain DNA regions where methylation variation can be observed among distantly related individuals. mtDNA methylation was discovered in animals as early as 1977 (Vanyushin & Kirnos, 1977), and quantitative and qualitative analyses of human mtDNA 5-methylcytosine (5mC) from three different tissue sources, viz. blood, saliva, and lung, have recently been reported (Liu et al., 2016). In addition, N6-methyladenosine in DNA (m6dA), a modification previously thought to be present only in prokaryotic DNA, has recently been identified in the nuclear genome of a few vertebrates including humans, mostly in intronic regions, and shows variation among cell types (Koziol et al., 2016). We believe that mitochondria, being prokaryotic in origin, must have m6dA modifications preserved in their DNA. Human mtDNA contains both CpG and non-CpG methylation patterns (Barres et al., 2009; Bellizzi et al., 2013; Liu et al., 2016). No work has yet been done on mitochondrial m6dA analysis, which remains a completely unexplored and exciting field to embark on.
Human mitochondrial DNA is small (16569 bp) in comparison with nuclear DNA (~3 × 10^9 bp), is present in high copy number per cell (~10^3-10^4; nearly 10 copies are present in each of the hundreds to thousands of mitochondria per cell) (Lightowlers et al., 1997), and remains intact and preserved within a double-membrane structure even in dead and massively destroyed tissues (Barta et al., 2014; Higgins et al., 2015; Nesheva, 2015; Ozga et al., 2016; Schwarz et al., 2009). It therefore has clear advantages over nuclear DNA in being readily recovered and easily manipulated from the questioned samples referred to forensic laboratories.

The most remarkable outlook that this review presents here for the first time is the potential application of DNA methylation profiling to identify the human biogeographic type (racial), ancestry and phenotypic type (ethnicity), or certain criminal behaviour, which may aid in gathering unseen (abstract) indirect evidence for some critical forensic cases. The collection of behavioral evidence is today part of a well-known forensic discipline called ‘investigative forensics’, which involves direct questioning of suspects/criminals. Interestingly, the molecular epigenetic approach to the study of behavior and the investigative forensic/psychological study therefore arrive at the same end through entirely different fields of analysis. Current research in DNA phenotyping (physical appearance arising from the unique DNA) shows that 24 DNA markers (SNPs) can be successfully used to predict the hair and eye color of an individual from the DNA obtained from his/her body tissues (Walsh et al., 2013). Other genes responsible for skin color, hair structure, and baldness have been determined (Hart et al., 2013; Liu et al., 2015; Marcinska et al., 2015; Pospiech et al., 2015). In future, this technique can be combined with methods for predicting the age, bio-geographic origin/ethnicity and behaviour of an individual from DNA methylation data (Next Generation DNA Phenotyping), which can be used to construct an inclusive facial picture of a person who leaves behind a DNA mark but has no match for his/her DNA fingerprint.

The research field of behavioral epigenetics is still in its infancy today, similar to cancer genetics 50 years ago. The reproducibility of differential methylation patterns for many of these human conditions, especially neuropsychological disorders such as addiction and mental illness, and the relationship between DNA methylation and altered gene expression and physiology, need to be analyzed systematically. The epigenome associated with human behavior is dynamic in nature and is modulated by many internal as well as external factors, such as embryonic and later human development, health issues if any, and the external environment. This is why epigenome data interpretation is more challenging than genomic data interpretation. Not least, the challenge of eliminating library contamination, the error rates of the enzymes used for DNA amplification, and minimum insert size, adapter and hybridization artifacts have to be taken into account when considering epigenome sequencing. These are some of the major issues that may arise but have rarely been addressed by the various consortia under the umbrella of the International Human Epigenome Consortium (http://ihec-epigenomes.org/), which are decoding tissue- and cell-specific epigenetic maps from healthy as well as diseased donors.

It is important to note that many of the above methods/techniques use only sodium bisulfite treatment and not glucosylation, and thus cannot distinguish between 5-methylcytosine and the more abundant 5-hydroxymethylation marks (5-hmC marks are often protected by glucosylation in vivo). More research therefore needs to be carried out, and we are clearly not yet at the stage of scientific certainty required to use information about the human epigenome to solve a case in a court of law. The various methods documented here are not intended as exhaustive solutions for the identification of forensic tissues, the discrimination of monozygotic twins, the age determination of a person from cells/tissue, the authentication of DNA, etc.; rather, one or more of these methods can be selected based on the case profile, the suitability of the forensic exhibits/samples in hand, or the accessibility of the instruments/resources required by each experimental method. The use of DNA methylation in forensic analysis is still a risky task, considering the amount of genomic DNA obtained from the exhibit in some challenging cases, the sensitivity of the available antibody against 5mC, the suitability of the marker (DNA sequence) selected for PCR amplification, and the detection methods/instruments used in the analysis. Despite these obstacles, advances in methylation research and the provision of relevant training facilities for forensic personnel can overcome the above limitations, producing enough expertise to introduce these sophisticated molecular techniques in the near future. DNA methylation studies in forensic science are beginning as a support to conventional STR profiling, and it is evident that in the near future this technique may be utilized to produce independent as well as corroborative evidence to detect, map and solve an array of forensic cases.