Application of Wavelet Packet Transform to detect genetic polymorphisms by the analysis of inter-Alu PCR patterns
- Cite this article as:
- Cardelli, M., Nicoli, M., Bazzani, A. et al. BMC Bioinformatics (2010) 11: 593. doi:10.1186/1471-2105-11-593
The analysis of inter-Alu PCR patterns obtained from human genomic DNA samples is a promising technique for the simultaneous analysis of many genomic loci flanked by Alu repetitive sequences, in order to detect the presence of genetic polymorphisms. Inter-Alu PCR products may be separated and analyzed by capillary electrophoresis using an automatic sequencer that generates a complex pattern of peaks. We propose an algorithmic method based on the Haar-Walsh Wavelet Packet Transformation (WPT) for an efficient detection of fingerprint-type patterns generated by PCR-based methodologies. We have tested our algorithmic approach on inter-Alu patterns obtained from the genomic DNA of three couples of monozygotic twins. We expected the differences between the inter-Alu patterns within each twin couple to reflect only unavoidable experimental variability, whereas the differences among samples from different twin couples are supposed to originate from genetic variability. Our goal is to automatically detect regions in the inter-Alu pattern that are likely to be associated with the presence of genetic polymorphisms.
We show that the WPT algorithm provides a reliable tool to identify sample-to-sample differences in complex peak patterns, reducing the possible errors and limitations associated with a subjective evaluation. The redundant decomposition of the WPT algorithm allows for a best basis selection procedure that maximizes the pattern differences at the lowest possible scale. Our analysis identifies a few classifying signal regions that could indicate the presence of genetic polymorphisms.
The WPT algorithm based on the Haar-Walsh wavelet is an efficient tool for the unsupervised pattern classification of inter-Alu signals provided by a genetic analyzer, even though it was not possible to estimate the power and the false positive rate because a suitable database was lacking. The identification of non-reproducible peaks is usually accomplished by comparing different experimental replicates of each sample. Moreover, we remark that, although we developed and optimized an algorithm to analyze patterns obtained through inter-Alu PCR, the method is theoretically applicable to any fingerprint-type pattern obtained by analyzing anonymous DNA fragments through capillary electrophoresis, and it could be usefully applied to a wide range of fingerprint-type methodologies.
Many analytical methodologies in modern genetics and biochemistry are based on the analysis of complex mixtures of oligonucleotides or oligopeptides, which are resolved as complex patterns of peaks or bands often referred to as "fingerprint-type" patterns. When the analysis is performed at the DNA or RNA level, fingerprint-type patterns can be generated by gel or capillary electrophoresis of nucleic acid sequences produced by PCR (Polymerase Chain Reaction)-based techniques, such as Random Amplified Polymorphic DNA (RAPD), Arbitrarily Primed PCR (AP-PCR), Simple Sequence Repeat anchored Polymerase Chain Reaction amplification (SSR-PCR), Differential Display Reverse Transcription (DDRT) PCR, AFLP, and inter-Alu PCR. All these methodologies allow the screening of several (up to some hundreds) nucleic acid fragments that correspond to different loci, without any a priori assumption about their exact sequence and genomic localization. The comparative analysis of patterns obtained from different samples has proved useful in very different fields of biological research: examples include the identification of genes overexpressed in tumors, the identification of genetic variability at different levels (individuals, populations, species) [7, 8, 9] and the discovery of genomic loci associated with human longevity. Among DNA fingerprinting techniques, inter-Alu PCR [6, 11, 12] is of particular interest, being characterized by the highest information level. Alu repeat sequences are ubiquitously distributed in the human genome, with more than one million elements. A genomic DNA fragment can be amplified with a single Alu-specific primer when it is flanked by two Alu elements that have opposite orientations and lie within a few kilobases of each other. A PCR reaction conducted with one or more primers complementary to Alu sequences produces a multitude of anonymous DNA amplification products that can be revealed by electrophoretic separation.
A typical inter-Alu pattern often shows inter-individual variability, due to genetic polymorphisms of different types: length variation of intervening sequences, de novo insertion of flanking Alu elements, deletions, translocations, and mutation of priming sites [13, 15, 16]. In general, this approach can be used for the initial detection of polymorphic loci involved in quantitative, multigenic traits [10, 17], of germline and somatic mutations [18, 19], or of genetic alterations in cancer cells [20, 21, 22, 23]. In a previous study, we developed a variant of inter-Alu PCR that uses two different Alu-specific primers labeled with different fluorochromes in the same PCR reaction. The resulting PCR products can be analyzed by capillary electrophoresis and fluorescent detection on a PE/ABI Genetic Analyzer, and are reported by the instrument as distinct fluorescence peaks. Many of the peaks generated by this method are smaller than 1 kb and, given that the frequency peaks of Alu elements in the human genome are centered at 0.1 Alu/kb and 1 Alu/kb, are likely to be obtained from the regions with the highest density of Alu sequences [10, 17]. In inter-Alu PCR analysis, as in other fingerprint-type genomic analyses, the comparative evaluation of the analytical samples is usually done "by eye" by the operator, with the time consumption and the possible errors associated with a subjective evaluation. These limitations prevent the application of these techniques to large data sets, and there is a need to develop computer-based analytical approaches able to automate the comparative analysis of different samples and to provide better reliability and operative efficiency. In the present work we have developed and tested an algorithm based on the Wavelet Packet Transformation (WPT), aimed at detecting fingerprint-type patterns generated by inter-Alu PCR. The WPT is an overcomplete multiscale analysis of the initial signal based on wavelet functions.
Starting from a signal of length 2^N, the information is distributed over N × 2^N coefficients, so that it is possible to apply an optimization procedure for classification and pattern recognition problems. In recent years wavelet analysis has been widely applied to biological data sets, for very different purposes such as microarray data mining [26, 27] and analysis of the genomic sequence [28, 29, 30]. In this paper we use the Best Basis algorithm to define different classes of signals. This method was developed by Coifman and Wickerhauser for the classification of seismic signals and was subsequently applied to feature extraction problems by Saito, who proposed the Local Discriminant Basis algorithm. The classification is based on the hypothesis that the relevant signal information is well reproduced by a limited number of wavelet coefficients. To perform the WPT we have chosen the Haar basis, which generates the Walsh packets. We have tested the capability of the wavelet analysis to detect sample-to-sample differences in a fingerprint-type pattern produced by the electrophoretic analysis of inter-Alu PCR products. The positions of the electrophoretic peaks detected by the genetic analyzer were used to reconstruct the inter-Alu pattern using a standard Gaussian for each peak. We have applied the WPT algorithm to identify regions in the electrophoretic patterns where a significant difference is detected among the signals obtained from three couples of monozygotic twins. The comparison of the patterns of members of the same couple of twins allowed us to filter out the intrinsic variability of the experimental methodology, whilst those signals that varied only among different twin couples were possibly correlated with polymorphic loci. The characterization of the detected polymorphic loci requires further specific experiments.
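As an illustration of the coefficient count quoted above, the following is a minimal sketch of a full Haar wavelet packet decomposition, written in Python with NumPy. It is not the program used in this work: the function name, the orthonormal 1/sqrt(2) normalization and the natural (filter-tree) ordering of the blocks are assumptions made for the example. Each level splits every block into pairwise sums and differences, so a signal of length 2^N yields N levels of 2^N coefficients each.

```python
import numpy as np

def haar_walsh_packets(signal):
    """Full Haar wavelet packet table for a signal of length 2**N.

    Returns a list of N arrays. Entry j-1 holds the 2**N coefficients of
    level j, stored as 2**j consecutive blocks of length 2**(N-j).
    (Hypothetical helper: name, normalization and ordering are assumptions.)
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    if n < 2 or n & (n - 1):
        raise ValueError("signal length must be a power of two (>= 2)")
    levels = []
    current = [x]
    while current[0].size > 1:
        nxt = []
        for block in current:
            even, odd = block[0::2], block[1::2]
            nxt.append((even + odd) / np.sqrt(2.0))  # Haar low-pass: pairwise sums
            nxt.append((even - odd) / np.sqrt(2.0))  # Haar high-pass: pairwise differences
        levels.append(np.concatenate(nxt))
        current = nxt
    return levels

table = haar_walsh_packets(np.ones(8))          # N = 3
n_coefficients = sum(level.size for level in table)  # N * 2**N
```

Because every level is an orthonormal transform of the signal, the table is redundant, which is what leaves room for a best basis selection among the blocks.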
We have developed a program that performs the signal reconstruction using a mapping from data points (the unit of the instrument) to base pairs.
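The reconstruction step can be sketched as follows. This is a hedged Python example, not the program described here: the function names, the linear interpolation between size-standard points, the unit peak height and the Gaussian width `sigma` are all assumptions introduced for illustration.

```python
import numpy as np

def data_points_to_bp(points, ladder_points, ladder_bp):
    """Map instrument data points to base pairs.

    Assumption: piecewise-linear interpolation between the known
    size-standard peaks (ladder_points must be increasing).
    """
    return np.interp(points, ladder_points, ladder_bp)

def reconstruct_pattern(peaks_bp, bp_min, bp_max, step=1.0, sigma=1.0):
    """Sum one fixed-shape Gaussian per detected peak on a regular bp grid.

    Assumption: every peak gets unit height and the same width sigma (in bp).
    Returns the grid and the reconstructed signal.
    """
    n = int(round((bp_max - bp_min) / step)) + 1
    grid = bp_min + step * np.arange(n)
    peaks = np.asarray(peaks_bp, dtype=float)
    # Broadcast grid (rows) against peak centers (columns), then sum over peaks.
    diff = grid[:, None] - peaks[None, :]
    signal = np.exp(-0.5 * (diff / sigma) ** 2).sum(axis=1)
    return grid, signal

# Hypothetical calibration: data points 1000 and 2000 correspond to 100 and 200 bp.
peaks_bp = data_points_to_bp([1500.0, 1800.0], [1000.0, 2000.0], [100.0, 200.0])
grid, signal = reconstruct_pattern(peaks_bp, 100.0, 227.0)  # 128 samples = 2**7
```

Choosing the grid so that the number of samples is a power of two keeps the reconstructed signal directly usable as input to a wavelet packet decomposition.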
Results and Discussion
We have applied the WPT to the 6 union signals obtained from the three couples of twins, and we have looked for the coefficients that discriminate between a given couple of twins and the others. The analysis of sample replicates reduces the experimental variability, which is mainly due to unpredictable errors in the PCR reaction and in the electrophoretic separation. This reproduces the condition encountered in the routine biological use of inter-Alu PCR and other similar methodologies. In this case the variability between the twins of a given couple, who share the same genomic DNA sequence, can be explained by differences in DNA quality, purity, presence of contaminants and other unpredictable differences generated in the extraction and preparation of DNA samples (which could in principle partially depend on pre-existing biochemical/biological differences between the blood samples). The variability may appear as slightly different peak positions or as a different degree of amplification of an inter-Alu sequence, which could produce non-detectable signals (absence of a peak in one twin).
In order to relate the δ value to the actual differences in the inter-Alu patterns, we have to normalize the signals to the area of the support region of the wavelet function associated with the cji coefficient. If the union signals have a single peak in the considered region, the criterion is satisfied when the peak positions of different twin couples are shifted by at least 2 bp with respect to the measured difference between the peak positions within the same twin couple. Conversely, if we are analyzing regions where several peaks are present, the criterion (2) takes into account the correlation among the peak positions in the signal, and it is satisfied when the global difference between the patterns of different twin couples is more than 1/3 of the total signal area plus the experimental variability of the twin signals.
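The logic of comparing between-couple differences with within-couple variability can be sketched in Python as below. This is an illustrative stand-in, not the criterion (2) of this work, whose exact form is not reproduced here: the function name, the array layout, the use of absolute differences and the way the threshold `delta` enters are all assumptions.

```python
import numpy as np

def classifying_coefficients(coeffs, couple, delta):
    """Indices of coefficients that separate one twin couple from the others.

    coeffs: array of shape (n_couples, 2, n_coeffs), one row per twin.
    A coefficient is flagged when the smallest difference between the chosen
    couple and any twin of another couple exceeds the largest within-couple
    difference by more than delta (hypothetical rule for illustration).
    """
    c = np.asarray(coeffs, dtype=float)
    # Experimental variability: largest twin-to-twin difference in any couple.
    intra = np.abs(c[:, 0, :] - c[:, 1, :]).max(axis=0)
    # All twins that do not belong to the chosen couple.
    others = np.delete(c, couple, axis=0).reshape(-1, c.shape[2])
    # Smallest difference between a twin of the couple and any other twin.
    inter = np.abs(c[couple][:, None, :] - others[None, :, :]).min(axis=(0, 1))
    return np.flatnonzero(inter - intra > delta)

# Toy data: 3 couples, 2 coefficients; only coefficient 0 singles out couple 0.
toy = [[[5.0, 1.0], [5.1, 1.0]],
       [[1.0, 1.0], [1.1, 1.1]],
       [[0.9, 1.0], [1.0, 0.9]]]
selected = classifying_coefficients(toy, couple=0, delta=1.0)
```

Taking the minimum over the other twins and the maximum over the within-couple differences is a deliberately conservative choice: a coefficient is kept only if the chosen couple stands apart from every other sample by more than the worst observed experimental variability.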
Global classifying regions obtained using the first marker (Tet fluorochrome)
Global classifying regions obtained using the second marker (Fam fluorochrome)
a rapid, computer-assisted detection of variable peaks;
an automated comparison of different replicates of the same sample, and an automatic "extraction" of reproducible signals;
a better sensitivity, with the ability to detect a higher number of polymorphic regions.
Moreover, we remark that, although we developed an algorithm specifically optimized to analyze inter-Alu PCR patterns, the method is theoretically applicable to any fingerprint-type pattern obtained by analyzing anonymous DNA fragments through capillary electrophoresis, and could be usefully applied to a wide range of fingerprint-type methodologies. It is important to note that new high-throughput methods based on DNA sequencing and on TIP-chip microarray analysis [36, 37] have recently been presented, aimed at a locus-by-locus detection of Alu mutations/polymorphisms on the whole genome: the first results obtained with these methodologies [35, 37] have begun to clarify the importance of the mutagenesis mediated by Alu sequences and other retrotransposons in human genome variation and in various disease conditions. However, because of their inherent complexity and high cost, these high-throughput methodologies are not likely to become (at least in the next few years) a substitute for inter-Alu PCR in all those situations in which limited availability of time or budget could be a constraint (for example, for the diagnostic examination of disease states in which the importance of Alu-associated genetic variation has been found). The availability of a computer method capable of speeding up, simplifying and standardizing the analysis of inter-Alu PCR patterns will be a valuable aid for the routine use of inter-Alu analysis.
This work was partially supported by the European Union Grants GEHA (LSHM-CT-2004-503270).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.