Genome-wide search of the genes tagged with the consensus of 33.6 repeat loci in buffalo Bubalus bubalis employing minisatellite-associated sequence amplification

Abstract

Minisatellites have been implicated in chromatin organization and gene regulation, but mRNA transcripts tagged with these elements have not been systematically characterized. The aim of the present study was to gain insight into the transcribing genes associated with the consensus of 33.6 repeat loci across the tissues in water buffalo, Bubalus bubalis. Using cDNA from spermatozoa and eight different somatic tissues and an oligo primer based on two units of the consensus of 33.6 repeat loci (5′ CCTCCAGCCCTCCTCCAGCCCT 3′), we conducted minisatellite-associated sequence amplification (MASA) and identified 29 mRNA transcripts. These transcripts were cloned and sequenced. BLAST searches of the individual mRNA transcripts revealed sequence homologies with various transcribing genes and contigs in the database. Using real-time PCR, we detected the highest expression of nine mRNA transcripts in spermatozoa and one each in liver and lung. Further, 21 transcripts were found to be conserved across the species; seven were specific to bovids, whereas one was exclusive to the buffalo genome. The present work demonstrates the innate potential of MASA in accessing several functional genes simultaneously without screening a cDNA library. This approach may be exploited for the development of tissue-specific mRNA fingerprints in the context of genome analysis and functional and comparative genomics.

Introduction

Minisatellites, defined as 10–100-bp-long stretches of DNA (Charlesworth et al. 1994), encompass 0.5 to several kilobases in the eukaryotic genome (Vergnaud and Denoeud 2000). Variations in the number of their repeat units confer a high level of polymorphism on these regions (Ali and Wallace 1988; Jeffreys et al. 1990; Kim et al. 2008). Earlier, minisatellites were described as representing the “junk” part of the genome (Orgel and Crick 1980). However, recent studies have shown their involvement in a variety of functions in the eukaryotic genome (Fondon and Garner 2004; Legendre et al. 2007; Levdansky et al. 2007).

Transcriptionally active minisatellites and microsatellites are of particular importance since they regulate functions of other RNAs (Li et al. 2002, 2004). Minisatellites within the RNAs particularly in the protein coding regions may influence both structure and function of the proteins. Such repeat elements owing to “strand slippage” during DNA replication shrink and expand, causing mutations in the coding regions of the corresponding genes. Consequently, the altered proteins elicit pathological response at the molecular and cellular levels, leading to genetic diseases, e.g., Huntington disease, fragile X syndrome, myotonic dystrophy, epilepsy, etc. (Li et al. 2004; Lohi et al. 2005; Verstrepen et al. 2005; Usdin 2008). Association of minisatellite with promoter regions influences binding of transcription factors, thus regulating gene expression across the tissues (Li et al. 2004; Caburet et al. 2004, 2005; Dey and Rath 2005; Mahr and Müller-Hilke 2007; Akagi et al. 2009). Minisatellites have been used as valuable tools to analyze uncharacterized genomes (Georges and Andersson 1996). In earlier studies, we uncovered several mRNA transcripts representing known and novel genes tagged with minisatellites from the buffalo genome (Srivastava et al. 2006, 2008, 2009). Of these, we characterized and mapped secreted modular calcium binding protein 1 gene in the buffalo genome (Srivastava et al. 2007). The consensus sequence of minisatellite 33.6 is an 11-bp repeat, originating from the human myoglobin gene (Jeffreys et al. 1985). In the present study, employing minisatellite-associated sequence amplification (MASA) approach, we uncovered mRNA transcripts tagged with two units of consensus of 33.6 repeat loci in water buffalo, Bubalus bubalis, using cDNA from different somatic tissues, gonads, and spermatozoa (Srivastava et al. 2006, 2008, 2009). These mRNA transcripts were studied for their sequence, homology, differential expression, and evolutionary status. 
Furthermore, chromosome localization and copy number studies were conducted for the candidate genes. The significance of buffalo in agriculture and in dairy and meat industries prompted us to use this animal as a model. Generation of mRNA fingerprint(s) from different somatic tissues and spermatozoa is envisaged to provide deeper insight into the ubiquitous and singular expression of the genes in this species and their functional correlations among different tissues. This work demonstrates the innate potentials of MASA-mediated approach for accessing genes without screening the cDNA library and forms a rich basis for functional and comparative genomics on any type of cell, tissues, and even biopsied samples.

Materials and methods

Blood collection and isolation of genomic DNA

Heparinized vials were used for blood collection from buffalo Bubalus bubalis, cattle Bos taurus, goat Capra hircus, sheep Ovis aries, human Homo sapiens, tiger Panthera tigris, fish Heteropneustes fossilis, bird Columba livia, rat Rattus norvegicus, jungle cat Felis chaus, bonnet monkey Macaca radiata, Indian rhinoceros Rhinoceros unicornis, and leopard Panthera pardus, and DNA was isolated following a standard protocol (Srivastava et al. 2008). Blood samples from the endangered species were procured with due permission from the competent authorities of the state and union government of India, strictly following the guidelines of the Institute’s Ethical and Biosafety Committee.

Sperm processing and RNA isolation

Fresh ejaculated semen samples from buffalo bulls were obtained from the animal farm, Lucknow, UP, India, following strictly the guidelines of the Institute’s Ethical and Biosafety Committee. Sperm cells were isolated by centrifugation on percoll density gradient (Srivastava et al. 2009). After the centrifugation, cells were washed in sperm wash buffer (0.15 mM NaCl and 10 mM EDTA), and pellets were processed for RNA extraction using TRIzol reagent (Invitrogen, Carlsbad, CA, USA). After resuspension of the cells in TRIzol reagent, they were incubated at 60°C for 30 min, vortexed every 10 min for lysis as per standard protocols (Ostermeier et al. 2005; Lalancette et al. 2008a) and manufacturer’s instructions. DNase I (Ambion, USA) treatment was performed on every sample, and RNA was precipitated with ammonium acetate and ethanol. Quantification of RNA was done on UV spectrophotometer. In order to ensure that the RNA obtained was exclusively from the spermatozoa, it was reverse-transcribed to cDNA (described later), and using primers specific for CDH1 and CD45 genes, polymerase chain reaction (PCR) was performed. These genes express only in epithelial and leukocyte cells, respectively, and not in the spermatozoa (Lambard et al. 2004; Lalancette et al. 2008a). Further, presence of DNA was ruled out by PCR using 50-ng total RNA as template and β-actin primers, GenBank accession no. DQ661647 (forward 5′ CAGATCATGTTCGAGACCTTCAA 3′ and reverse 5′ GATGATCTTGATCTTCATTGTGCTG 3′), designed on different exons (Srivastava et al. 2009).

RNA isolation from the tissues and synthesis of cDNA

Total RNA was extracted from testis, kidney, liver, spleen, lung, heart, ovary, and brain using TRIzol following manufacturer’s instructions. These tissues were procured from the local slaughter house, Delhi, India. Following this, approximately 10 µg of RNA each from different tissues and spermatozoa was reverse-transcribed into cDNA using commercially available high-capacity cDNA RT kit (Applied Biosystems, USA). The success of cDNA synthesis was confirmed by PCR reaction of 35 cycles using buffalo-derived β-actin primers.

MASA with oligo based on the consensus of 33.6 minisatellite

Using the 22-base-long 33.6 oligo primer (Supplementary Table 1) and cDNA from different tissues and spermatozoa, PCR amplifications were carried out. The reaction conditions involved denaturation at 95°C for 5 min, followed by 35 cycles each consisting of 95°C for 1 min, 60°C for 1.5 min, and 72°C for 1 min, and a final extension at 72°C for 10 min. In order to amplify a larger number of 33.6 tagged transcripts, other annealing temperatures between 55°C and 60°C were also used. Approximately 25 μl of amplified product was resolved on a 20-cm-long, 3% (w/v) agarose gel in 1× TBE buffer at a constant voltage. The resolved bands were sliced from the gel, purified using the Gel extraction kit (QIAGEN), and cloned into pGEMT-easy vector (Promega, USA).
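The single-primer logic of MASA can be illustrated in silico. The following is a minimal sketch, not the procedure used in the study: it assumes a simplified model in which a product forms wherever the 22-base primer matches the top strand exactly and its reverse complement occurs downstream within a size limit. The function names, the `max_len` cutoff, and the exact-match requirement are illustrative assumptions; real annealing tolerates mismatches.

```python
UNIT = "CCTCCAGCCCT"   # one 11-base unit of the 33.6 consensus (from the text)
PRIMER = UNIT * 2      # 22-base primer made of two units (from the text)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(text: str, pattern: str) -> list:
    """Return start positions of all (possibly overlapping) exact matches."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def single_primer_amplicons(template: str, primer: str = PRIMER,
                            max_len: int = 1000) -> list:
    """Hypothetical helper: list products flanked by the primer on both ends.

    A product starts at a primer match on the top strand and ends at the
    end of a downstream match of the primer's reverse complement.
    """
    template = template.upper()
    rc = reverse_complement(primer)
    products = []
    for f in find_all(template, primer):
        for r in find_all(template, rc):
            end = r + len(rc)
            # The reverse site must lie fully downstream of the forward site.
            if r >= f + len(primer) and end - f <= max_len:
                products.append(template[f:end])
    return products
```

Under this model, every product begins with the primer and ends with its reverse complement, which is the property used later when only sequences carrying the 33.6 repeat on both ends are retained.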

Cloning of amplified fragments, slot-blot hybridization, and sequencing

PCR-amplified fragments were cloned and sequenced after confirmation by restriction digestion (EcoRI enzyme) and slot-blot hybridization using α-32P-dCTP-labeled buffalo genomic DNA. For slot-blot hybridization, PCR-amplified inserts were alkaline-denatured and spotted onto the nylon membrane along with buffalo genomic DNA as positive control and cloning vector as negative control. Hybridizations were carried out overnight at 60°C. After hybridization, membranes were washed three times in 2× saline sodium citrate (SSC) and 0.1% SDS, and signals were recorded by exposure of the blot to X-ray film. Following this, at least three clones from each set were subjected to sequencing. The sequences were deposited in the database, and accession numbers were obtained (Table 1).

Table 1 Details of the mRNA transcripts tagged with consensus of 33.6 repeat loci in water buffalo B. bubalis

Database search

The putative identity of sequences was determined using the Basic Local Alignment Search Tool (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi; Altschul et al. 1990) and the B. taurus build 4 genome database at the National Center for Biotechnology Information (NCBI). Search parameters were set for nucleotide collection (nr/nt) and reference mRNA sequence (RefSeq_rna), optimized for highly similar sequences (megablast). Only trimmed sequences containing the 33.6 minisatellite on both ends were used for similarity searches. Table 1 lists the “Transcript ID,” “Score,” “E-value,” “Accession no.,” and “Description” of the 29 transcripts. Multiple sequence alignment and phylogenetic tree construction were done using the ClustalW program (www.ebi.ac.uk/clustalw). Only sequences showing similarity with characterized genes in the database were taken for phylogenetic analysis. Repeats were calculated using Tandem Repeats Finder. ORF and amino acid sequence identification were done using Translation Tool (http://www.bioinformatics.org/sms/index.html).

Cross-hybridization of MASA-amplified fragments with different species

For cross-hybridization, approximately 500 ng of heat-denatured genomic DNA from 12 different species mentioned earlier were slot-blotted onto the nylon membrane along with cloned plasmid and buffalo genomic DNA as positive control and 2× SSC as negative control. Blots were hybridized with α-32P-dCTP-labeled recombinant plasmids at 60°C overnight, and autoradiography was done following standard procedure (John and Ali 1997).

RT–PCR and relative expression of MASA generated candidate transcripts with real-time PCR

Expression of MASA-amplified transcripts from the tissues and sperm was studied using clone-specific internal primers designed by Primer 3 software and 50 ng cDNA from different tissues. For each RT–PCR reaction, β-actin was used as positive control. Relative expression of cDNA from different tissues and spermatozoa corresponding to 13 mRNA transcripts was studied using real-time PCR (Sequence Detection system, 7000, ABI Prism, CA, USA). Primers were designed using Primer Express 2.0 (Applied Biosystems) software (Supplementary Table 1). Real-time PCR reaction was performed following standard protocol (Pathak et al. 2006; Srivastava et al. 2008). The specificity of each primer pair and efficiency of the amplification were tested by assaying serial dilutions of cDNA. A single melting temperature peak representing a single amplicon validated primer specificity, while the slope and R² values for serial dilutions affirmed the reaction efficiency. In order to reduce individual variations, normalization of quantitative real-time results was done using an endogenous control (GAPDH). To detect potential contamination during preparation of the plate, nuclease-free water was included in each run as a negative control. Quantitative real-time PCR reactions were performed in triplicate using a 96-well plate in a 20-µl reaction volume, employing conditions of 50°C for 2 min and 95°C for 10 min, followed by 40 cycles each of 95°C for 10 s and 60°C for 1 min. The data were analyzed using the threshold setup recommended by Applied Biosystems (http://www3.appliedbiosystems.com/cms/groups/mcb_support/documents/generaldocuments/cms_042176.pdf). The expression status was calculated using the formula \( (1 + E)^{-\Delta Ct} \), where E is the efficiency of PCR and ∆Ct is the difference in cycle threshold between the target gene and the internal control. To achieve the maximum efficiency (E = 1) of the real-time PCR, the amplicon size was kept small (70–150 bp) so that the expression level of the test gene/transcript could be expressed as \( 2^{-\Delta\Delta Ct} \). Based on this, expression levels of different mRNA transcripts in buffalo genome were ascertained.
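The relation between the two expressions quoted above can be written out step by step. This is a sketch of the standard comparative Ct reasoning rather than a derivation given in the source; the subscripts "target", "sample", and "cal" (a calibrator sample, for example the tissue with the lowest expression) are notation assumed here.

```latex
\begin{align*}
\Delta Ct &= Ct_{\text{target}} - Ct_{GAPDH} \\
\text{normalized expression} &= (1 + E)^{-\Delta Ct} \\
E = 1 \;\Rightarrow\; (1 + E)^{-\Delta Ct} &= 2^{-\Delta Ct} \\
\frac{2^{-\Delta Ct_{\text{sample}}}}{2^{-\Delta Ct_{\text{cal}}}}
  &= 2^{-\left(\Delta Ct_{\text{sample}} - \Delta Ct_{\text{cal}}\right)}
   = 2^{-\Delta\Delta Ct}
\end{align*}
```

The last line shows why the fold difference between a sample and the calibrator depends only on the difference of their ∆Ct values when the efficiency is taken as one.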

Amplification of full-length buffalo peroxisomal membrane protein 4 CDS using endpoint PCR

Peroxisomal membrane protein 4 (PXMP-4) gene in buffalo represented partial cDNA sequence amplified by 33.6 MASA (EU188605). Full-length buffalo PXMP-4 CDS was generated using primers designed from cattle PXMP-4 sequence (accession no. NM_001099163, XM_869359). Details of primer sequences (transcript ID Dp30) and product size are given in Supplementary Table 1. End point PCR was conducted to amplify 3′ UTR of PXMP-4 gene, and the amplicon so obtained was cloned into pGEMT-easy vector and sequenced. Finally, full-length buffalo PXMP-4 gene sequences were assembled.

Copy number calculation for 33.6 MASA-amplified transcripts

Copy number of two representative mRNA transcripts, Dp10 corresponding to SARS2 gene and Dp26 corresponding to PXMP-4 in buffalo genome, was calculated using SYBR Green chemistry on real-time PCR Sequence Detection System-7000 (ABI, USA). Serial dilutions (10-fold) of buffalo genomic DNA and recombinant plasmids in the range of 300,000,000 to three copies were prepared (haploid genome of buffalo = 3.36 pg, weight per base pair = 1.096 × 10⁻²¹ g), and reactions were conducted in triplicate according to standard protocol (Pathak et al. 2006). Primers (Supplementary Table 1) and assay conditions were kept similar to those used for the relative expression study.
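The arithmetic behind such a dilution series can be sketched as follows. The two constants are taken from the text; the function names and the 1,000-bp plasmid size in the usage note are hypothetical and serve only as an illustration.

```python
HAPLOID_GENOME_PG = 3.36     # mass of one haploid buffalo genome, pg (from the text)
GRAMS_PER_BP = 1.096e-21     # weight of one base pair, g (from the text)
PG_PER_GRAM = 1e12


def implied_genome_bp() -> float:
    """Genome size implied by the two constants above."""
    return (HAPLOID_GENOME_PG / PG_PER_GRAM) / GRAMS_PER_BP


def genomic_dna_pg(copies: int) -> float:
    """Mass of genomic DNA (pg) containing `copies` haploid genomes."""
    return copies * HAPLOID_GENOME_PG


def plasmid_pg(copies: int, plasmid_bp: int) -> float:
    """Mass (pg) of `copies` molecules of a plasmid of `plasmid_bp` base pairs."""
    return copies * plasmid_bp * GRAMS_PER_BP * PG_PER_GRAM


def dilution_series(top: int = 300_000_000, bottom: int = 3,
                    factor: int = 10) -> list:
    """Copy numbers of a serial dilution from `top` down to `bottom`."""
    series = []
    copies = top
    while copies >= bottom:
        series.append(copies)
        copies //= factor
    return series
```

For example, `dilution_series()` yields nine points from 300,000,000 down to 3 copies, `genomic_dna_pg(3)` gives the mass of genomic DNA holding three haploid genomes, and `plasmid_pg(3, 1000)` does the same for a hypothetical 1,000-bp plasmid.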

Chromosomal localization of SARS2 gene by fluorescence in situ hybridization

Buffalo metaphase chromosomes were prepared according to standard protocol (Pathak et al. 2006). Fluorescence in situ hybridization (FISH) was carried out using B. taurus SARS2 (BC140548, MGC: 151512 (IMAGE: 8080720)) bacterial artificial chromosome (BAC) probe (imaGenes GmbH, Germany). The BAC clone was labeled with biotin-16-dUTP using a nick translation kit (Vysis, IL, USA). Biotinylated antifluorescein antibody and FITC avidin DCS were obtained from Vector Labs. The reaction was carried out following standard protocol (Premi et al. 2007). Slides were counterstained with DAPI and examined under a fluorescence microscope (BX 51, Olympus), and images were captured with a CCD camera attached through a video camera mounting adaptor (Olympus U-CMAD-2). Chromosomal mapping was done following the ISCNDB 2000 (Cribiu et al. 2001).

Results

In silico analysis of 33.6 minisatellites across the species

To gain an insight into the transcribing genes associated with 33.6 minisatellites, oligonucleotides comprising one (5′CCTCCAGCCCT3′) and two (5′CCTCCAGCCCT CCTCCAGCCCT3′) units of this repeat were used independently to conduct BLAST search. Two units comprising 22 nucleotides did not reveal significant homology with the entries in GenBank whereas one unit comprising 11 nucleotides was found to be present in the flanking and intervening regions of several structural and functional genes across the species (Supplementary Table 2). Though most of the genes in the database were found to be tagged with one unit of 33.6 sequences, the presence of two units across the wider spectrum of genes in any species could not be ruled out. This is because such genes may not yet be part of the database. With this assumption, we used 22-base-long oligonucleotides of 33.6 repeat for genome-wide search of genes employing MASA.

The mRNA transcripts tagged with consensus of 33.6 repeat across different somatic tissues and spermatozoa of buffalo

MASA conducted with 22-base-long oligonucleotides and cDNA from different somatic tissues, gonads, and spermatozoa of buffalo amplified several mRNA transcripts ranging from 200 bp to 1.0 kb, as shown in a representative gel (Fig. 1). Prominent bands totaling 161 from all the tissues and spermatozoa were cloned, sequenced, and subjected to database search. Details of the 29 mRNA transcripts uncovered in the present study, ranging from 206 to 988 bp, are given in Table 1 and Supplementary Table 3. ClustalW alignment of the 29 mRNA transcripts showed no intertissue sequence polymorphism.

Fig. 1

A representative agarose gel showing minisatellite-associated sequence amplification (MASA) using 22-base-long oligo primer based on consensus of 33.6 repeat loci. cDNA from different somatic tissues and spermatozoa of buffalo amplified by 33.6 primer (a). β-actin primers were used as an internal control (b). Molecular marker 1-kb ladder is given on the left side

33.6 tagged mRNA transcripts associated with various transcribing genes

Of the 29 mRNA transcripts identified, BLAST search for 15 transcripts showed sequence homology with flanking or intervening region(s) of transcribing genes involved in signal transduction, cell differentiation, and cell proliferation (see Tables 1 and 2) across the species. The remaining 14 transcripts showed either negligible or nonsignificant homology with the functional genes in the database. However, they did show homology with still uncharacterized BAC clones and contigs (Table 1).

Table 2 Relative expression and copy number assessment of representative MASA-identified mRNA transcripts in different somatic tissues and spermatozoa of buffalo B. bubalis

Status of MASA-amplified sequences in different species

Buffalo-derived 33.6 tagged mRNA transcripts were cloned and used as probes for cross-hybridization with genomic DNA from 12 different species. A representative blot is shown in Fig. 2, and the remaining results have been given in Supplementary Table 4. Of the 29 uncovered fragments, 21 showed signals across the species, seven (Dp2, Dp5, Dp6, Dp11, Dp13, Dp24, and Dp28) were specific to bovid, and one (Dp25) was exclusive to buffalo (see Supplementary Table 4). Phylogenetic analyses of 14 corresponding gene fragments (Dp1, Dp2, Dp3, Dp4, Dp9, Dp10, Dp12, Dp17, Dp19, Dp20, Dp21, Dp22, Dp26, and Dp27) showed their close relationship with cattle, but surprisingly one, Dp16, was found to be close to humans. Twelve genes used for phylogenetic analysis are shown in a representative figure (Supplementary Figure 1). Since the remaining 14 mRNA transcripts had no entry in the database, they could not be used for such analysis.

Fig. 2

Zoo-blot hybridization to elucidate conservation of 33.6 tagged mRNA transcripts in different species. Cross-hybridization result of representative 13 recombinant clones corresponding to 33.6 tagged mRNA transcripts with genomic DNA of buffalo and 12 other species is mentioned on top of the panel. NC denotes negative control (2×SSC) and PC positive control (recombinant plasmids). Transcript IDs of the sequences used for hybridization are mentioned on the left. β-actin was used as an internal positive control

Differentially expressed mRNA transcripts in somatic tissues and spermatozoa

RT–PCR analysis of 29 mRNA transcripts with their internal primers showed varying levels of signals among somatic tissues, gonads, and spermatozoa with respect to 13 mRNA transcripts (Fig. 3) whereas the remaining 16 showed almost uniform signals in all the tissues examined (not shown). Following this, 13 transcripts were subjected to quantitative expression analysis using real-time PCR. Tissue or spermatozoal mRNA transcript(s) that showed lowest expression was used as an internal calibrator (cb). From these 13, nine (Dp1, Dp4, Dp8, Dp10, Dp17, Dp19, Dp20, Dp26, and Dp27) showed the highest expression in spermatozoa (Fig. 4, Table 2). Dp2 showed maximum expression in testis, Dp9 in liver, and Dp22 in lung, suggesting their specific roles in these organs (Fig. 4). Notably, Dp16 showed negligible expression in spermatozoa. Summary of the relative expression (in folds) derived from \( {2^{ - \Delta \Delta Ct}} \) values obtained for various transcripts based on real-time PCR is given in Table 2.
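The calibrator-based fold calculation described above can be sketched in a few lines. This is an illustration under the assumption of an efficiency of one, with GAPDH as the reference; the function names are hypothetical, and the Ct values in the usage note are invented for illustration, not data from the study.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Ct of the target transcript minus Ct of the endogenous control."""
    return ct_target - ct_reference


def fold_changes(ct_by_tissue: dict):
    """Return (calibrator, {tissue: fold}) from {tissue: (ct_target, ct_reference)}.

    The calibrator is the tissue with the lowest expression, i.e. the
    highest delta Ct. Folds are 2 ** -(delta-delta Ct), which assumes E = 1.
    """
    dct = {tissue: delta_ct(target, reference)
           for tissue, (target, reference) in ct_by_tissue.items()}
    calibrator = max(dct, key=dct.get)
    folds = {tissue: 2 ** -(value - dct[calibrator])
             for tissue, value in dct.items()}
    return calibrator, folds
```

With invented values such as `{"spermatozoa": (22.0, 20.0), "liver": (27.0, 20.0), "heart": (25.0, 20.0)}`, liver has the highest ∆Ct and becomes the calibrator with a fold of 1, and the other tissues are reported relative to it.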

Fig. 3

RT–PCR analysis of 33.6 tagged mRNA transcripts. Using internal primers and cDNA from different somatic tissues, gonads, and spermatozoa of buffalo, RT–PCR analysis of 13 representative mRNA transcripts was done. Transcript IDs are indicated on the left, and tissues are mentioned on top of the lanes. β-actin was used as a positive control

Fig. 4

Expression analysis of the representative 33.6 tagged mRNA transcripts. a–l represent different tissues, gonads, and spermatozoa. Note the maximum expression of some representative mRNA transcripts in the spermatozoa corresponding to Dp1, 4, 8, 10, 17, 19, 20, and 26 (shown in a, c, d, f, h, i, j, and l, respectively) and exclusive expression of Dp9 in liver (e). Bars represent relative expression of the transcript(s) in folds. Transcript IDs are mentioned in the top left corner and tissues below the panels. For details, see Table 2

Isolation and characterization of the buffalo PXMP-4 CDS

Of all the 33.6 tagged transcripts, a 605-bp transcript (Dp26) showed homology with the B. taurus PXMP-4 gene along its entire length, representing a partial cDNA sequence of PXMP-4 in buffalo. End point PCR with gene-specific cattle PXMP-4 primers amplified an 883-bp fragment (accession number EU714054) in buffalo (Supplementary Table 1). The complete assembled cDNA sequence of the PXMP-4 gene in buffalo was found to be 1,488 bp long (Supplementary Figure 2). In silico analysis (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) showed that the PXMP-4 sequence from nucleotide position 1–555 (negative frame) encodes an open reading frame of 184 amino acids with a putative conserved domain and an approximate molecular weight of 24 kDa. Multiple alignments of buffalo PXMP-4 sequences showed the highest homology with that of cattle, though this gene was present in other species as well (Supplementary Figure 3). Except for two units of the 33.6 repeat, no other repetition of the minisatellite was noticed in the entire sequence.
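The reported ORF length can be checked with simple codon arithmetic. The sketch below assumes that the 1–555 span includes the stop codon, which the text does not state explicitly; the function name is hypothetical.

```python
def orf_residues(start: int, end: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF spanning start..end (1-based, inclusive).

    Assumes the span is in frame. If the span includes the stop codon,
    that codon is not counted as a residue.
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = length // 3
    return codons - 1 if includes_stop else codons
```

Under that assumption, `orf_residues(1, 555)` corresponds to 185 codons, of which 184 encode amino acids, in agreement with the 184-residue ORF described above.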

Buffalo PXMP-4 and SARS2 are single-copy genes

Copy number calculation of two transcripts, Dp10 and Dp26, was done using real-time PCR and SYBR green. Standard curve with slope −3.5 to −3.6 substantiated maximum efficiency of the reaction. Extrapolating the standard curves gave single copy for both the genes per haploid genome (Fig. 5). Chromosomal mapping of Dp10, representing SARS2 gene, was done using FISH and bovine SARS2 BAC probe. As expected, two signals on buffalo metaphase chromosome 18 (Fig. 6a) and interphase nuclei (Fig. 6b) were detected.
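The link between a standard-curve slope and amplification efficiency, and the reading of copy number from a Ct value, can be sketched with the commonly used relation E = 10^(−1/slope) − 1. This relation is a standard qPCR convention assumed here, not a formula stated in the source; the function names and the intercept used in the usage note are hypothetical.

```python
import math


def efficiency_from_slope(slope: float) -> float:
    """Amplification efficiency from the slope of Ct versus log10(copies)."""
    return 10 ** (-1.0 / slope) - 1.0


def ideal_slope() -> float:
    """Slope expected when the product exactly doubles every cycle (E = 1)."""
    return -1.0 / math.log10(2.0)


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Copies read off a standard curve Ct = slope * log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)
```

Under this relation, slopes of −3.5 and −3.6 correspond to efficiencies of roughly 0.93 and 0.90, and a slope near −3.32 corresponds to exact doubling per cycle. With a hypothetical intercept of 38.0 and a slope of −3.5, a Ct of 31.0 would map to about 100 copies.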

Fig. 5

Copy number assessment of PXMP-4 and SARS2 genes by real-time PCR. Real-time PCR amplification plots based on 10-fold dilution series of plasmids carrying 33.6 tagged sequences for copy number calculation. a Dp10, showing slope of −3.5 and a single dissociation peak. b Dp26, showing slope of −3.6 and a single dissociation peak, substantiating maximum efficiency of the PCR reaction and high specificity of the primers, respectively, with target cDNA

Fig. 6

Localization of SARS2 gene on the representative interphase nuclei and metaphase chromosome 18 using FISH. SARS2 BAC probe showing signals on buffalo metaphase chromosome 18 (a) and interphase nuclei (a–e) (b). Note two signals in the interphase nuclei corresponding to those on the homologous chromosomes. Scale bars are not shown on the micrographs

Discussion

Debates whether minisatellites play roles in organism development and evolution have yet to take a decisive turn. However, their presence in the transcribing regions of the genomes has attracted much attention (Morgante et al. 2002; Lalancette et al. 2008b). Current work on buffalo is another stepping stone showing minisatellite tagged with exons, contradicting the earlier notion of their absence in the coding sequences. Consensus sequence of 33.6 repeat loci is distributed ubiquitously in the human (Jeffreys et al. 1985) and several nonhuman genomes (Wickings 1993; Karaca et al. 2002). In an earlier study, two units (5′CCTCCAGCCCT3′)2 of this repeat uncovered genome-specific band pattern in bovid (Azfer et al. 1999). However, Blast search of the same showed its poor association with coding or noncoding region of the gene(s). Since we detected a number of genes from somatic tissues and spermatozoa tagged with two units of 33.6 repeat, it is likely that repeat tagged genes across the species are still not part of the database. Among minisatellite tagged genes, 50–60% (Tables 1 and 2) encode cell wall proteins or proteins that are involved in signal transduction. Such genes encode mostly serine and threonine residues believed to be the site for posttranslational modifications, crucial for maintaining the proteins at cell wall surface (Verstrepen et al. 2005; Richard and Dujon 2006). Presence of these transcripts both in somatic tissues and spermatozoa suggests their more generalized functions. Since buffalo genome is not yet fully sequenced, transcript profile from the current work will add new vistas to the buffalo genomics.

Repeat sequences in a genome may cause reciprocal or nonreciprocal translocations, segmental duplications, gene amplification, and other kinds of chromosomal rearrangements (Richard et al. 2008). In addition, sequence insertion is known to cause genomic diversity (Richard et al. 2008). Exclusive presence of transcript Dp25 in buffalo observed in the present study may be representing an event of sequence insertion. If so, Dp25 mRNA transcript may prove to be an attractive candidate to be analyzed among different breeds of buffalo across the somatic tissues and spermatozoa to ascertain its possible breed-specific origin. Phylogenetic analysis of 33.6 tagged genes in buffalo showed close homology with those of cattle (Supplementary Figure 1). This may not be true for all the genes owing to differences in the genome evolution of the two species. However, such phylogenetic analysis of different breeds of buffalo would be relevant in the context of breed delineation.

Differential expression of most of the transcripts in somatic tissues, gonads, and spermatozoa may be a reflection of programmed sequence modulation required during different stages of development. The highest expression of nine mRNA transcripts in the spermatozoa suggests their possible biological significance in the fertilization events. It is likely that 33.6 repeat tagged with mRNA transcripts acts as transcriptional regulator, leading to qualitative changes in gene expression either by chromatin modification or sequence alterations.

Notwithstanding detection of 29 mRNA transcripts, we characterized two candidate single-copy genes, PXMP-4 and SARS2, showing the highest expression in the spermatozoa. Absence of peroxisomes in the cells of the liver, kidney, and brain has been associated with Zellweger syndrome, a rare, congenital disorder that affects children (Heymans et al. 1983). Analysis of full-length PXMP-4 gene (Supplementary Figure 3) showed that two units of 33.6 repeat are often not conserved, reflecting its polymorphic nature. Similar analysis of all the 33.6 tagged genes in buffalo and other species is envisaged to uncover levels of conservation of this minisatellite, highlighting their possible regulatory roles. Earlier studies have shown the presence of peroxisomes in spermatozoa (Reisse et al. 2001) which is corroborated by our present work on PXMP-4 gene expression. However, no such information is available on SARS2 gene in any species. This evoked our interest to localize this gene on buffalo chromosome using cattle-derived BAC clone.

In humans, several point mutations in SARS of mitochondria lead to inaccurate translation causing sensorineural deafness (Yokogawa et al. 2000; Shah et al. 2001). Mouse mitochondrial SARS shows ubiquitous expression but more in the tissues with high metabolic rate such as heart and liver (Gibbons et al. 2004). Therefore, the lowest expression of this gene in buffalo’s heart and liver was found to be startling. Even more surprising was the highest expression in the spermatozoa. Additional work on this line in buffalo and different species would resolve this issue and strengthen clinical significance of this gene. In addition, expression studies of these genes in genetically infertile animals would prove to be informative for ascertaining their involvement in the control and regulation of in/fertility, if any. This would enrich our understanding on the roles of un/common genes selectively expressing in various somatic tissues and germ line.

In conclusion, our data demonstrate that the 33.6 minisatellite is an integral part of various transcribing genes in the buffalo genome. The fact that a few genes detected in the present study have clinical significance adds strength to the MASA-mediated approach of genome analysis. Thus, detailed characterization of mRNA transcripts from different somatic tissues and spermatozoa is envisaged to be useful for (1) ascertaining their involvement in regulation of in/fertility, (2) molecular delineation of buffalo breeds, if any, and (3) identification of “superior” germplasm, enriching prospects of animal biotechnology. The novel part of the present approach is that several functional, structural, and regulatory genes have been accessed without screening a cDNA library.

Abbreviations

BAC: Bacterial artificial chromosome

CDH1: Cadherin 1

CDS: Complementary DNA sequence

Ct: Cycle threshold

DCS: D cell sorter

DAPI: 4′,6-diamidino-2-phenylindole

FISH: Fluorescence in situ hybridization

GAPDH: Glyceraldehyde 3-phosphate dehydrogenase

ISCNDB: International System for Chromosome Nomenclature of Domestic Bovids

MASA: Minisatellite-associated sequence amplification

mRNA: Messenger ribonucleic acid

ORF: Open reading frame

PXMP-4: Peroxisomal membrane protein 4

R² value: Regression coefficient

RAPD: Random amplification of polymorphic DNA

RT–PCR: Reverse transcriptase–polymerase chain reaction

SARS2: Seryl-tRNA synthetase 2

TBE: Tris/borate/EDTA

UTR: Untranslated region

References

  1. Akagi T, Yin D, Kawamata N et al (2009) Functional analysis of a novel DNA polymorphism of a tandem repeated sequence in the asparagine synthetase gene in acute lymphoblastic leukemia cells. Leuk Res 33:991–996

  2. Ali S, Wallace RB (1988) Intrinsic polymorphism of variable number tandem repeat loci in the human genome. Nucleic Acids Res 16:8487–8496

  3. Altschul SF, Gish W, Miller W et al (1990) A basic local alignment search tool. J Mol Biol 215:403–410

  4. Azfer MA, Bashamboo A, Ahmed N et al (1999) Random amplification of polymorphic DNA with conserved sequences reveals genome-specific monomorphic amplicons: implications in clad identification. J Biosci 24:35–41

  5. Caburet S, Vaiman D, Veitia RA (2004) A genomic basis for the evolution of vertebrate transcription factors containing amino acid runs. Genetics 167:1813–1820

  6. Caburet S, Cocquet J, Vaiman D et al (2005) Coding repeats and evolutionary “agility”. Bioessays 27:581–587

  7. Charlesworth B, Sniegowski P, Stephan W (1994) The evolutionary dynamics of repetitive DNA in eukaryotes. Nature 371:215–220

  8. Cribiu EP, Di Berardino D, Di Meo GP et al (2001) International system for chromosome nomenclature of domestic bovids (ISCNDB 2000). Cytogenet Cell Genet 92:283–299

  9. Dey I, Rath PC (2005) A novel rat genomic simple repeat DNA with RNA-homology shows triplex (H-DNA)-like structure and tissue-specific RNA expression. Biochem Biophys Res Commun 327:276–286

  10. Fondon JW, Garner HR (2004) Molecular origins of rapid and continuous morphological evolution. Proc Natl Acad Sci USA 101:18058–18063

  11. Georges M, Andersson L (1996) Livestock genomics comes of age. Genome Res 6:907–921

  12. Gibbons WJ Jr, Yan Q, Li R et al (2004) Genomic organization, expression, and subcellular localization of mouse mitochondrial seryl-tRNA synthetase. Biochem Biophys Res Commun 317:774–778

  13. Heymans HS, Schutgens RB, Tan R et al (1983) Severe plasmalogen deficiency in tissues of infants without peroxisomes (Zellweger syndrome). Nature 306:69–70

  14. Jeffreys AJ, Wilson V, Thein SL (1985) Hypervariable ‘minisatellite’ regions in human DNA. Nature 314:67–73

  15. Jeffreys AJ, Neumann R, Wilson V (1990) Repeat unit sequence variation in minisatellites: a novel source of DNA polymorphism for studying variation and mutation by single molecule analysis. Cell 60:473–485

  16. John MV, Ali S (1997) Synthetic DNA-based genetic markers reveal intra- and inter-species DNA sequence variability in the Bubalus bubalis and related genomes. DNA Cell Biol 16:369–378

  17. Karaca M, Sukumar S, Zipf A et al (2002) Genetic diversity among forage Bermuda grass (Cynodon spp.). Evidence from chloroplast and nuclear DNA fingerprinting. Crop Sci 42:2118–2127

  18. Kim TS, Booth JG, Gauch HG Jr et al (2008) Simple sequence repeats in Neurospora crassa: distribution, polymorphism and evolutionary inference. BMC Genomics 9:31

  19. Lalancette C, Thibault C, Bachand L et al (2008a) Transcriptome analysis of bull semen with extreme nonreturn rate: use of suppression-subtractive hybridization to identify functional markers for fertility. Biol Reprod 78:618–635

  20. Lalancette C, Miller D, Li Y et al (2008b) Paternal contributions: new functional insights for spermatozoal RNA. J Cell Biochem 104:1570–1579

  21. Lambard S, Galeraud-Denis I, Martin G et al (2004) Analysis and significance of mRNA in human ejaculated sperm from normozoospermic donors: relationship to sperm motility and capacitation. Mol Hum Reprod 10:535–541

  22. Legendre M, Pochet N, Pak T et al (2007) Sequence-based estimation of minisatellite and microsatellite repeat variability. Genome Res 17:1787–1796

  23. Levdansky E, Romano J, Shadkchan Y et al (2007) Coding tandem repeats generate diversity in Aspergillus fumigatus genes. Eukaryot Cell 6:1380–1391

  24. Li YC, Korol AB, Fahima T et al (2002) Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review. Mol Ecol 11:2453–2465

  25. Li YC, Korol AB, Fahima T et al (2004) Microsatellites within genes: structure, function and evolution. Mol Biol Evol 21:991–1007

  26. Lohi H, Young EJ, Fitzmaurice SN et al (2005) Expanded repeat in canine epilepsy. Science 307:81

  27. Mahr S, Müller-Hilke B (2007) Transcriptional activity of the RHOB gene is influenced by regulatory polymorphisms in its promoter region. Genomic Med 1:125–128

  28. Morgante M, Hanafey M, Powell W (2002) Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat Genet 30:194–200

  29. Orgel LE, Crick FH (1980) Selfish DNA: the ultimate parasite. Nature 284:604–607

  30. Ostermeier GC, Goodrich RJ, Moldenhauer JS et al (2005) A suite of novel human spermatozoal RNAs. J Androl 26:70–74

  31. Pathak D, Srivastava J, Premi S et al (2006) Chromosomal localization, copy number assessment and transcriptional status of BamH1 repeat fractions in water buffalo Bubalus bubalis. DNA Cell Biol 25:206–214

  32. Premi S, Srivastava J, Chandy SP et al (2007) AZFc somatic microdeletions and copy number polymorphism of the DAZ genes in human males exposed to natural background radiation. Hum Genet 121:337–346

  33. Reisse S, Rothardt G, Völkl A et al (2001) Peroxisomes and ether lipid biosynthesis in rat testis and epididymis. Biol Reprod 64:1689–1694

  34. Richard GF, Dujon B (2006) Molecular evolution of minisatellites in hemiascomycetous yeasts. Mol Biol Evol 23:189–202

  35. Richard GF, Kerrest A, Dujon B (2008) Comparative genomics and molecular dynamics of DNA repeats in eukaryotes. Microbiol Mol Biol Rev 72:686–727

  36. Shah ZH, Toompuu M, Hakkinen T et al (2001) Novel coding-region polymorphisms in mitochondrial seryl-tRNA synthetase (SARSM) and mitoribosomal protein S12 (RPMS12) genes in DFNA4 autosomal dominant deafness families. Hum Mutat 17:433–434

  37. Srivastava J, Premi S, Pathak D et al (2006) Transcriptional status of known and novel genes tagged with consensus of 33.15 repeat loci employing minisatellite-associated sequence amplification (MASA) and real-time PCR in water buffalo, Bubalus bubalis. DNA Cell Biol 25:31–48

  38. Srivastava J, Premi S, Kumar S et al (2007) Characterization of Smoc-1 uncovers two transcript variants showing differential tissue and age specific expression in Bubalus bubalis. BMC Genomics 8:436

  39. Srivastava J, Premi S, Kumar S et al (2008) Organization and differential expression of the GACA/GATA tagged somatic and spermatozoal transcriptomes in buffalo Bubalus bubalis. BMC Genomics 9:132

  40. Srivastava J, Premi S, Kumar S et al (2009) Expressional dynamics of minisatellite 33.15 tagged spermatozoal transcriptome in Bubalus bubalis. BMC Genomics 10:303

  41. Usdin K (2008) The biological effects of simple tandem repeats: lessons from the repeat expansion diseases. Genome Res 18:1011–1019

  42. Vergnaud G, Denoeud F (2000) Minisatellites: mutability and genome architecture. Genome Res 10:899–907

  43. Verstrepen KJ, Jansen A, Lewitter F et al (2005) Intragenic tandem repeats generate functional variability. Nat Genet 37:986–990

  44. Wickings EJ (1993) Hypervariable single and multi-locus DNA polymorphisms for genetic typing of non-human primates. Primates 34:323–331

  45. Yokogawa T, Shimada N, Takeuchi N et al (2000) Characterization and tRNA recognition of mammalian mitochondrial seryl-tRNA synthetase. J Biol Chem 275:19913–19920

Acknowledgements

This work was supported by DST Grant Nos. SR/WOSA/LS-92/2005 to DP and SR/FT/L-81/2005 to RS, DBT Grant No. BT/PR8476/AAQ/01/315/2006 to SA, and a core grant from the Department of Biotechnology, Govt. of India to the National Institute of Immunology, New Delhi. SA thanks the Alexander von Humboldt Foundation, Bonn, Germany for the equipment donation and Shri Khem Singh Negi for technical assistance.

Author information

Corresponding author

Correspondence to Sher Ali.

Additional information

Responsible Editor: Hans-Joachim Lipps.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Table 1

Details of the primers used for (a) RT–PCR analysis, (b) relative expression studies and copy number assessment by real-time PCR, and (c) amplification of the 3′ region of the PXMP-4 gene (DOC 90 kb)

Supplementary Table 2

Association of mRNA transcripts across the species with the 11-mer consensus of 33.6 repeat loci, based on in silico analysis. A single unit was found to be present in the flanking and intervening regions of several structural and functional genes across the species (DOC 166 kb)

Supplementary Table 3

Details of 29 mRNA transcripts tagged with consensus of 33.6 repeat loci uncovered from different somatic tissues and spermatozoa (DOC 57 kb)

Supplementary Table 4

Cross-hybridization of recombinant clones corresponding to MASA-amplified buffalo mRNA transcripts with genomic DNA of 12 species. Transcript ID and species name are given on top of the panel. Note the exclusive presence of eight transcripts in buffalo/bovids (DOC 80 kb)

Supplementary Figure 1

Phylogenetic tree based on BLAST search results of representative MASA-amplified mRNA transcripts. Transcript ID is mentioned on the left side of the panel (a–l). Only those sequences showing similarity with the characterized genes in the database were taken for phylogenetic analysis (PPT 1116 kb)

Supplementary Figure 2

Full-length 1,488-bp cDNA sequence of buffalo PXMP-4 lacking poly A tail. Nucleotide sequence in green shows region amplified by MASA with 33.6 primer, and red represents region amplified by bovine-derived PXMP-4 internal primers. Note the position of the 33.6 primers (underlined). The 1,488-bp cDNA sequences from different somatic tissues and spermatozoa were found to be identical (PPT 208 kb)

Supplementary Figure 3

Multiple sequence alignment of PXMP-4 nucleotide sequence(s) from different species. Note the close sequence homology between buffalo (red) and cattle (green). Box (yellow) indicates position of minisatellite 33.6 in different species (DOC 61 kb)

About this article

Cite this article

Pathak, D., Srivastava, J., Samad, R. et al. Genome-wide search of the genes tagged with the consensus of 33.6 repeat loci in buffalo Bubalus bubalis employing minisatellite-associated sequence amplification. Chromosome Res 18, 441–458 (2010). https://doi.org/10.1007/s10577-010-9132-0

Keywords

  • Buffalo genome
  • 33.6 minisatellites
  • Satellite-tagged transcripts
  • Relative expression
  • Fluorescence in situ hybridization