Large-scale transcriptome sequencing and gene analyses in the crab-eating macaque (Macaca fascicularis) for biomedical research
- First Online:
- Cite this article as:
- Huh, JW., Kim, YH., Park, SJ. et al. BMC Genomics (2012) 13: 163. doi:10.1186/1471-2164-13-163
- 5.3k Downloads
As a human replacement, the crab-eating macaque (Macaca fascicularis) is an invaluable non-human primate model for biomedical research, but the lack of genetic information on this primate has represented a significant obstacle for its broader use.
Here, we sequenced the transcriptome of 16 tissues originated from two individuals of crab-eating macaque (male and female), and identified genes to resolve the main obstacles for understanding the biological response of the crab-eating macaque. From 4 million reads with 1.4 billion base sequences, 31,786 isotigs containing genes similar to those of humans, 12,672 novel isotigs, and 348,160 singletons were identified using the GS FLX sequencing method. Approximately 86% of human genes were represented among the genes sequenced in this study. Additionally, 175 tissue-specific transcripts were identified, 81 of which were experimentally validated. In total, 4,314 alternative splicing (AS) events were identified and analyzed. Intriguingly, 10.4% of AS events were associated with transposable element (TE) insertions. Finally, investigation of TE exonization events and evolutionary analysis were conducted, revealing interesting phenomena of human-specific amplified trends in TE exonization events.
This report represents the first large-scale transcriptome sequencing and genetic analyses of M. fascicularis and could contribute to its utility for biomedical research and basic biology.
Differentially Expressed Gene
Human Endogenous Retroviruses
Kyoto Encyclopedia of Genes and Genomes
Long Interspersed Elements
National Institutes of Health
National Primate Research Center
Online Mendelian Inheritance in Man
Phosphate Buffered Saline
Reverse Transcriptase Polymerase Chain Reaction
Short Interspersed Elements
Specific Pathogen Free
Crab-eating macaques (Macaca fascicularis) are one of the most frequently used and studied species for biomedical research . Due to the broad range of habitats, they have various common names including crab-eating macaque, cynomolgus macaque, Philippine monkey, and long-tailed macaque. Numerous wild crab-eating macaques are distributed in Southeast Asia, including Indonesia, Philippines, Myanmar, Vietnam, and Thailand . They inhabit various habitats including primary, secondary, coastal, mangrove, and riverine forests and areas near villages. Diurnal and arboreal crab-eating macaques belong to the infraorder Catarrhini, superfamily Carecopithecoidea, family Cercopithecidae, and genus Macaca.
With the aid of fossil records and comparative DNA sequence analysis, genus macaques and humans have diverged from a common ancestor between 25 and 31 million years ago . This evolutionary relationship has made this primate as a more suitable experimental animal model than rodents, dogs, and pigs and may lead to its widespread use for the translational studies for drug testing . Among the genus Macaca, Rhesus and crab-eating macaque is representative species which were widely used as a non-human primate model for biomedical research. However, the rhesus macaque is the most frequently used primate as a non-human primate model . In the United States, more than 60% of monkeys housed in National Institutes of Health (NIH)-supported facilities are rhesus macaques . Furthermore, 65% of the monkeys used for experimental research each year are rhesus macaques. In 2007, first draft genome sequences of rhesus macaque genome was published . These worldwide trends in use and accumulated genome information data may lead to the assumption that the rhesus macaque is the ideal non-human primate model. However, the event of “export ban of rhesus monkey from India in 1977” had restricted the usage of Indian subspecies of the rhesus macaque and accelerate the building of self-sustaining breeding colonies in the US. Therefore, researchers who want to have a research with rhesus monkey in the outside of US have some problems, they have concerned the chinese-origin rhesus macaque and crab-eating macaque from south asia . Furthermore, the crab-eating macaque has important advantages, including (1) easy handling derived from a smaller body size (♂ 412–648 mm, ♀ 385–503 mm vs. ♂ 483–635 mm, ♀ 470–531 mm), weight (♂ 4.7–8.3 kg, ♀ 2.5–5.7 kg vs. ♂ 5.6–10.9 kg, ♀ 4.4–10.9 kg) and longer tails than rhesus macaques ; (2) low cost and easy availability for experimental use; and (3) lack of seasonal fertility, which may affect efficient experiments and scheduling in the large-scale housing of experimental monkeys . Finally, abundant gene information is available for the crab-eating macaque. Greater numbers of EST and full-length cDNA library sequences are available in the NCBI database for crab-eating macaque [9, 10, 11, 12, 13, 14, 15]. And recently their draft genome sequences also available in the EBI database [6, 16]. Therefore, crab-eating macaque could be a excellent experimental primate animal models for biomedical studies.
In an in-depth examination of the published papers from 2010 to 2011 indicated that pharmacology field for safety and toxicity testing of newly developed drugs was the most frequently encountered [17, 18, 19, 20]. In particular, the crab-eating macaque was used predominantly in brain research, the neurosciences, and clinical research [21, 22, 23, 24]. Furthermore, experimental primate model have been developed by four different ways of simple replacement, induced, infection, and surgical. The induced method involved treatment with specific chemicals (e.g., 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP) or streptozotocin (STZ)[25, 26, 27], whereas the surgical method (e.g., middle cerebral artery occlusion model for ischemia) were created through specific types of surgery . The infection method was simpler than previously described since humans and the crab-eating macaque have numerous “anthroponosis” (the opposite of “zoonosis”), including influenza, tuberculosis, and hepatitis . Lastly, simple replacement method was the usage of natural crab-eating monkey for specific purpose (e.g., drug safety or efficacy testing) .
From now, numerous disease models, including aging, alcohol abuse, Alzheimer’s disease, amenorrhea, asthma, diabetes, epilepsy, menopause, obesity, osteoporosis, Parkinson’s disease, plague, variola, vascular disease, and various infection disease models, have been developed and used [31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46] However, small amount of transcript sequences of crab-eating macaque could be a weak point to be a good experimental animals for biomedical application. If we have abundant transcript sequences for crab-eating macaque, we could design the whole gene probe sequences for microarray analyses. And also, due to the insufficient transcript sequences, we could not analyze the alternatively spliced transcripts in different tissues. Recent accumulated transcriptome information underlined that AS event is an important molecular mechanism since it can generate different functional units for transcriptome and proteome diversity using limited genetic sources[47, 48, 49]. And also human transcriptome studies with different human tissues show different AS patterns derived by tissue-specific alternative promoters and polyadenylation [50, 51, 52]. However, sometimes aberrant changes in alternative splicing could occur the human disease (e.g. retinitis pigmentosa or cystic fibrosis) [53, 54]. And A few number of papers have reviewed the association between alternative splicing and disease [55, 56, 57, 58]. Among the different AS mechanism, TE exonization is intriguing AS events . Specifically, small amount of TEs show the tissue specific and species specific characters . That means that TE exonization event could be a one of the important AS events. Therefore, AS is not a simple molecular aspect of RNA transcription, rather it represents a highly controlled and evolved molecular mechanism for generating genetic diversity using limited DNA resources. And also AS control mechanism are major growing topics in biomedical researches. Hence, the investigation of the AS events in specific genes is another means of novel gene or disease gene identification and characterization steps. However, these kinds of applications with crab-eating macaque for advanced biomedical research could be achieved by the massive amount of transcript sequences and information.
In this study, we carried out a whole-transcriptome sequencing analysis of 16 tissues from Macaca fascicularis using GS FLX sequencing to generate massive transcript information for the improvement of biomedical use. More than 4 million raw reads were created and assembled, resulting in 35,524 isogroups, 44,458 isotigs, 54,858 contigs, and 348,160 singletons. Additionally, we identified and experimentally validated differentially expressed gene (DEG) transcripts. Finally, using the numerous transcript sequences, we analyze the AS and TE events of crab-eating macaque.
Results and discussion
GS FLX sequencing and gene annotation
Application for OMIM database and KEGG pathway database
We then applied our results to the OMIM database (http://www.ncbi.nlm.nih.gov/omim/), which provides information on disorder-related genes that have been functionally well-characterized, and the KEGG pathway database (http://www.genome.jp/kegg/pathway.html), a representative molecular pathway database specifically for disease-related pathways. In the OMIM database, we collected all of the available gene sets for calculation of coverage. Of the 2,579 disorder-related genes in the OMIM database (Additional file 1: Table S6), 1,935 genes (75.02%) were covered by our results (Additional file 1: Table S7), indicating that the gene information from our sequencing could lead to an enhanced understanding of the genetic responses to specific experimental conditions in disease-related research on the crab-eating macaque.
MPTP treatment of the crab-eating macaque is one of the most well established models of Parkinson’s disease . Therefore, we applied our results to the investigation of Parkinson’s disease (map05012) in the KEGG pathway. In general, first step of disease mechanism research is the identification of full-length gene sequences, specifically coding sequences (CDS), using cDNA library or RACE experiments for the investigation of a specific disease. Then other following steps of in vitro or in vivo experiments are applied for the characterization of specific disease. Therefore, the identification of intact CDS in genes was our primary goal. In the KEGG pathway database, 129 Parkinson’s disease genes were registered. We manually tested the existence of open-reading frame sequences and compared the existence of full-length CDS with our sequencing data (data not shown). Our results indicated that total 115 genes (89%) harbor the intact full-length CDS (101 genes) or truncated CDS or UTR sequences (14 genes). These high rate of identification of intact full-length sequences are coincided the property of GS FLX sequencing platform (long-read sequencing) [61, 62]. Although, we did not validated the other disease-related genes in OMIM database, our results can clearly reduce the cost and experimental efforts for the identification of specific disorder-related genes for biomedical research.
Differentially expressed gene analysis and experimental validation
Identification and validation of tissue-specific transcripts.
Selected DEGs for experimental validation
SLC12A21, CA12, CLCA43
UMOD_T27, UMOD_T18, TINAG9, SLC34A110, SLC22A611, SLC22A1212, LRP213, CDH1614, C12orf5915, A2LD116
CYP2B617, C918, F919, TAT20, F13B21, CRP22, C8B23, FGG24, GC25, MBL226
SFTPD27, SFTPB28, SFTPA129, SFTPC30
CELA2A31, CPB132, PRSS333, CEL34, INS35, CTRB236, CELA137, CLPS38, PRSS239, CELA3A40, CPA241
CA644, C4orf4045, MUC746, CST247, CST548, AMY2A49, PRB150, CST451, PRB352, STATH53, HTN154
MYH455, AMPD156, TPM357, ATP2A158, MYOT59, MYBPC160, MYL161, TNNI262
CHIA65, LIPF66, GKN267, GKN168, PGA569
ADAM3270, SHCBP1L71, ACRBP72, CABS173, CRISP274, TCP1175, ALLC76, TUBA3D77, ANKRD778, LDHC79, CMTM280, FUNDC281
Alternative splicing (AS) analysis
A total of 6,931 manually corrected AS events were identified in the 24,884 clustered isogroups (Additional file 1: Table S21). Total 4314 isogroup harbored the more than one alternatively spliced transcripts. The average number of AS events was 1.60, and the highest number observed was 63 AS events in the AKR1B10 gene (Additional file 1: Table S22). Intriguingly, the human AKR1B10 gene shows only one reference mRNA sequence, while the EBI database of Alternative Splicing and Transcript Diversity 1.1 indicated only 5 alternative transcripts for this gene in humans (http://www.ebi.ac.uk/asd/index.html). A careful analysis indicated that AS events occurred more frequently in the 5′ and 3′ regions (2,270 and 2,313, respectively) than the internal regions (1,727) (Additional file 1: Table S22). Further, 274 AS events (10.4%) were TE related. As a result, ~17% of the crab-eating macaque isogroups were shown to have alternatively spliced transcripts. This lower rate of AS events in the crab-eating macaque may be explained by 2 alternative interpretations. One is the shortage of total amount of transcript sequences. In the case of human studies, earlier researches indicated that approximately 40%–70% of genes have alternative transcripts. However, advanced high-throughput sequencing and bioinformatic tools have shown that 92%–95% of human genes undergo AS [50, 51, 64, 65, 66]. In addition, different human tissues show different AS patterns because of tissue-specific alternative promoters and polyadenylation [50, 51, 52]. Therefore, larger amount of transcript sequences and more diverse tissues or cell types could enhance the AS information. Another is explained by simple lineage specific characters. Because, we already observed the differential alternative splicing between human and chimpanzees [63, 67]. And, as indicated in the genome project of chimpanzee and orangutan, different amplification rate and lineage specific of transposable elements could cause the different TE-derive alternative splicing [68, 69].
Transposable element (TE) analysis
Recent growing genomic evidence has indicated that TEs are a valuable genetic resource for transcriptome and proteome diversity [70, 71, 72, 73]. Exonization events are one of the AS mechanisms that can occur as a result of TEs, including human endogenous retroviruses (HERVs), short interspersed elements (SINEs), and long interspersed elements (LINEs). Alu (a primate-specific SINE) and LINEs have potential 5′ and 3′ splicing sites for exonization events. Moreover, HERVs and LINEs harbor internal promoters that can control the tissue-specific expression of a gene .
Broad range of utility of crab-eating macaque gene information
The results of our study have implications for various fields of research. First, the massive number of transcriptome sequences (approximately 4 million sequences in 16 tissues) could be used as a draft of the crab-eating macaque gene sequences. In addition, the modified and combined gene information could be used for the production of DNA probe sequences for microarray analysis. Specifically, the company Agilent provides a customized probe design service using industrial-scale inkjet technology (http://www.genomics.agilent.com/). Therefore, crab-eating macaque microarray chips could be designed for specific experiments and more rapid and accurate gene expression profiling is possible in a single experiment. For example, to investigate specific drugs for Parkinson’s disease, customized microarray chips harboring the 129 Parkinson’s disease-related crab-eating macaque genes from the KEGG pathway database could be prepared.
Second, crab-eating macaque gene information coupled with gene information from the rhesus macaque could be used to resolve the mystery of speciation events between closely related species. The average genetic divergence between crab-eating macaque and rhesus macaque is 0.4%–0.5%, and their evolutionary relationship is closer than that between human and chimpanzee . Therefore, large-scale transcript sequences could help to trace the evolutionary root of the speciation event. Third, gene sequencing of the crab-eating macaque could accelerate the completion of a genome project for this primate. Recently draft genome sequences also available (http://www.ebi.ac.uk/asd/index.html). Hence, reanalysis and diverse application could be possible for the analysis of genome and transcripme in crab-eating macaque. Fourth, the 175 DEGs, including the 81 experimentally validated DEGs, represent candidate genes with tissue-specific functions. Specifically, two of gene groups of functionally well characterized genes with tissue relatedness not investigated, and functionally not characterized genes with tissue relatedness not investigated, could be a valuable sources for tissue specific functional study and novel function analysis in specific tissue, respectively. Fifth, the AS and TE exonization analysis could be used for comparative analysis of crab-eating macaque with other species. Although, the data set are not sufficient for other application, our results are to be used as basic information to understand the transcriptome of crab-eating macaque. Finally, our open data base are very useful for numerous researchers who are interested in the gene information of crab-eating macaque, specifically unskilled researchers in genomics and bioinformatics technique.
We sequenced the transcriptome of 16 different tissues from M. fascicularis for the biomedical usage. We found that ~86% of human genes are represented in the ones sequenced in this study. Therefore our results of gene information could be used for understanding the biological response of the crab-eating macaque for safety and efficacy testing. Additionally, 175 tissue-specific genes were identified, with 81 of them experimentally validated. We identified and analyzed 4,314 alternative splicing (AS) events and positive selected genes. Intriguingly, 10.4% of the AS events were associated with transposable element (TE) insertions. And human-specific amplified trends of TE exonization event are also revealed during the primate evolution. Our research is the first large-scale transcriptome sequencing and gene analyses. Therefore, this result could be valuable genetic resources for biomedical research and improve our understanding of primate evolution.
Specific pathogen free (SPF) crab-eating macaques
Adult male (5 years old) and female (6 years old) crab-eating macaques (Macaca fascicularis) weighing between 4 kg and 7 kg were used. Their origin is vietnam. All animals were provided by the National Primate Research Center (NPRC) of Korea. In our experiments, specific pathogen free (SPF) animals were used. All animals underwent a complete physical, viral, bacterial, and parasite examination. On physical examination, SPF animals were examined for criteria, including coat condition, appearance, weight, sex, and date of birth. Enzyme immunoassay was performed to detect viruses such as BV; STLV-1 and −2; SIV; SRV-1, -2, and −5; and SVV. In addition, tests were performed to detect Mycobacterium tuberculosis (TB), Shigella spp., Salmonella spp., and Yersinia spp. For the TB skin test, all animals were tested by an intradermal injection in the eyelid, and the remaining bacterial examination items were checked by fecal culture tests. In our SPF animals, all items in the above tests were negative.
Sample preparation for GS FLX sequencing and gene annotation
The most important issue for transcriptome sequencing is the preparation of fresh and healthy tissue samples. Therefore, specific pathogen free (SPF) one male and one female adult crab-eating macaques were selected. Additionally, perfusion with diethylpyrocarbonate (DEPC)-treated phosphate buffered saline (PBS) was conducted via the common carotid artery with RNase inhibitors to inhibit blood contamination and promote recovery of intact RNA molecules from the tissue samples. Sixteen tissue samples were collected from one male and one female crab-eating monkeys (1. Cecum, 2. Cerebellum, 3. Heart, 4. Kidney, 5. Liver, 6. Lung, 7. Ovary, 8. Pancreas, 9. Prostate, 10. Salivary gland, 11. Skeletal muscle, 12. Small intestine, 13. Spleen, 14. Stomach, 15. Testis, and 16. Cerebrum).
All animal procedures and study design were conducted in accordance with the Guidelines of the Institutional Animal Care and Use Committee (KRIBB-AEC-11010) in Korea Research Institute of Bioscience and Biotechnology (KRIBB).
RNA isolation and mRNA subtraction
Total RNA was extracted from 16 different crab-eating monkey tissues using the Trizol reagent (Invitrogen), and total RNAs were validated by RNA electrophoresis in agarose gels containing formaldehyde. Two distinct ribosomal RNA bands (28 S and 18 S) were confirmed. Pure mRNA was subtracted using the PolyA Tract mRNA isolation system (Promega).
cDNA synthesis and poly(A) tail removal
First strand cDNA synthesis was conducted using the RevertAid H Minus First Strand cDNA Synthesis Kit (Fermentas) using oligo(dT) primers optimized for the 454 sequencing procedures (5′- GAGCTAGTTCTGGAG(T)16VN-3′). Second strand cDNA was synthesized by DNA pol I and RNase H (Fermentas), and the poly(A) tail was removed using a specific enzyme (Gsul).
Library preparation for GS FLX sequencing
The first step of library preparation involves the fragmentation of the high molecular weight DNA sample into smaller molecular species appropriate for sequencing using GS FLX Titanium chemistry. This fragmentation is performed by nebulization, which shears double-stranded DNA into fragments ranging from about 400 to 1000 base pairs. This population of smaller-sized DNA species, generated from a single DNA sample, is referred to as a “library.” Approximately 3–5 μg cDNA was used to generate the DNA library for Genome Sequencer FLX Titanium (Roche, Mannheim, GE). The fragment ends were polished (blunted), and 2 short adapters were ligated onto both ends. The adapters provide priming sequences for both amplification and sequencing of the sample library fragments, as well as the “sequencing key”, a short sequence of 4 nucleotides used by the system software for base calling and, following repair of any nicks in the double-stranded library, release of the unbound strand of each fragment (with 5′-Adaptor A). Finally the quality of the library of single-stranded template DNA fragments (sst DNA library) was assessed using a 2100 BioAnalyzer (Agilent, Waldbronn, GE), and the library was quantified, including a functional quantification to determine the optimal amount of the library to use as input for emulsion-based clonal amplification.
Single “effective” copies of template species from the DNA library to be sequenced were hybridized to DNA Capture Beads. The immobilized library was then resuspended in the amplification solution, and the mixture was emulsified, followed by PCR amplification. After amplification, the DNA-carrying beads were recovered from the emulsion and enriched. The second strands of the amplification products were melted away as part of the enrichment process, leaving the amplified single-stranded DNA library bound to the beads. The sequencing primer was then annealed to the immobilized amplified DNA templates.
After amplification, the DNA-carrying beads were set into the wells of five and a half PicoTiterPlate device (PTP) such that each well contained a single DNA bead. The loaded PTP was then inserted into the Genome Sequencer FLX instrument, and sequencing reagents were sequentially flowed over the plate. Information from all the wells of the PTP is captured simultaneously by a camera and can be processed in real time by the onboard computer. The sequencing procedure was conducted on a Genome Sequencer FLX Titanium instrument (Roche, Mannheim, GE) at Macrogen in Korea.
Sequence assembly and gene annotation
A total of 4,058,656 raw reads obtained from the 16 libraries were used for our analysis. For rapid assembly and exact gene annotation, all raw reads into were divided into 2 groups, clustered reads and unclustered reads, by the clustering method of the BLASTN program with human reference RNA ( Additional file 2: Figure S2). This method generated 3,240,337 reads clustered with human reference RNA and 818,319 unclustered reads. Each group was analyzed by GS de novo Assembler v.2.5.3 (Newbler, 454 Life Science). The clustered group generated 38,750 assembled contigs, 31,786 isotigs, and 24,884 isogroups and 99,283 unassembled singletons. However, 132,121 reads were discarded due to short, chimeric, or repetitive sequences. The unclustered group generated 16,108 assembled contigs 12,672 isotigs, and 10,640 isogroups and 248,877 unassembled singletons. In addition, 57,613 reads were also discarded.
Two different gene annotation strategies were conducted in the clustered and unclustered groups. In the clustered group, initial gene information obtained by clustering with human reference RNA was used for the gene annotation. However, in the case of the unclustered group and unassembled singleton sequences, the BLASTX program was used with the nr70 database. The CD-HIT program (http://www.bioinformatics.org/cd-hit/) was used to build the nr70 database. If gene annotation was conducted, Gene Ontology (GO) searching (http://www.geneontology.org/) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis (http://www.genome.jp/kegg/) were performed.
KEGG pathway analyses
By overlaying expression data onto biological pathways, established and novel relationships among genes can be explored. These pathways give key information about the functional and metabolic organization of cellular and biological systems within organisms. Therefore, putative crab-eating macaque genes incorporate KEGG pathway information. The pathway analysis pipeline extracts EC numbers from the descriptions of UniProt results, and these EC numbers are mapped with KEGG pathway information.
Using the annotated gene information, our sequences were compared with human unigene and reference sequences. Our sequences were analyzed using the BLASTN program with an expectation value of 1e-20. If one match occurred between human and crab-eating macaque sequences, the one match was interpreted as a covered result. Additionally, Online Mendelian Inheritance in Man (OMIM) gene sets were applied for disease-related gene research (http://www.ncbi.nlm.nih.gov/omim).
Differentially expressed gene (DEG) analysis
Sixteen different tissue samples were collected and sequenced. Thus, over 4 million reads harboring different tissue information were available for the DEG analysis. DEG information was extracted by counting the read information. Exclusively tissue-specific contigs (only allowed 100%) that contained a minimum of 100 reads were selected. For the experimental validation, 81 randomly selected DEGs were validated.
Reverse transcriptase polymerase chain reaction (RT-PCR) amplification and sequencing procedure
Locus-specific primer pairs were used for the RT-PCR amplification of 81 DEGs (Additional file 1: Table S23). If possible, 2 distant exons were used for constructing primer pairs to reduce non-specific PCR bands resulting from genomic contamination. In the validation steps, 15 tissues samples are used for the experimental efficiency (We removed the ovary samples). M-MLV reverse transcriptase with an annealing temperature of 42°C was used for the reverse transcription reaction with an RNase inhibitor (Promega). Control PCR amplification was also performed on pure mRNA samples that were not subjected to reverse transcription, indicating that the prepared mRNA samples did not contain genomic DNA. RT-PCRs were carried out for 30 cycles at specific annealing temperatures. To validate amplified products, RT-PCR products were separated on a 1.5% agarose gel, purified using a gel extraction kit (GeneAll), and cloned into the pGEM-T-easy vector (Promega). The cloned DNA was isolated using a plasmid DNA mini-prep kit (GeneAll). Sequencing was conducted by a commercial sequencing company (Macrogen).
Transposable element (TE) analysis
The TEs included in the human reference RNAs, chimpanzee reference RNAs, rhesus reference RNAs, marmoset reference RNAs, and clustered assembly contigs were analyzed for comparative TE analysis. The TEs were identified by the RepeatMasker program (http://repeatmasker.genome.washington.edu) with various repeat sequences from the Repbase Update.
Alternative splicing (AS) analysis
For the AS analysis, the Newbler2.5 assembly result files (54AllContig.fna, 454Isotigs.fna, and 454IsotigsLayout.txt) were modified. Among these result files, the 454IsotigsLayour.txt file demonstrated the relationships between isotigs and contigs in specific isogroups. Therefore, the alternatively spliced isogroup information was collected. Among the AS data, only clustered and annotated isogroups were analyzed for the comparative analysis with humans. However, in the case of crab-eating macaque, detailed phenomena could not be investigated because no crab-eating macaque genome sequences are available. For a detailed analysis, the AS data was analyzed manually. The 5′ and 3′ alternative exon and internal exon units that could occur by exon creation or loss ( Additional file 1: Table S22) and the TE-related AS were counted. In the manual analysis, specific exons harboring a TE in the marginal regions of exons were designated as TE-related AS (Additional file 2: Figure S3).
Accession numbers and database
The data have been deposited in the DDBJ under accession number DRA000436. The assembled sequences are also freely available from http://220.127.116.11/macaca/.
This research was supported by a grant from the KRIBB Research Initiative Program (KGM4241231).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.