Abstract
Narrowing candidate genomic segments down to a size compatible with contig assembly by shotgun sequencing is a limiting step in the identification of mammalian genes by positional cloning. A 6- to 8-fold sequencing redundancy (coverage) is usually required, and in most cases directed approaches are still needed to close the remaining gaps. Once a complete sequence is obtained, computer analyses are used to locate candidate exons, which usually span less than 10% of the genomic sequence. Here, I propose an alternative strategy that removes the need for contig assembly and the high coverage it imposes. This strategy takes advantage of two facts: i) mammalian coding exons are on average much smaller than individual sequencing runs, and ii) computer methods for identifying coding regions depend only on local sequence information. In this context of “exon hunting” (as opposed to genomic sequencing per se), I show that a 2- to 3-fold sequencing coverage is sufficient to locate most candidate exons within genomic fragments, the size of which is then limited only by the available sequencing power. This strategy can be fully automated, as it involves a single experimental technique and real-time computer analyses of the data.
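The coverage argument can be illustrated with a back-of-the-envelope calculation. The sketch below is not the chapter's own analysis: it assumes reads of fixed length that start uniformly at random along the fragment (a Poisson model in the spirit of Lander and Waterman), and the function names, the 400-base read length and the 140-base exon length are illustrative assumptions rather than values taken from the text. Under these assumptions, the number of reads that overlap a given exon by at least `min_overlap` bases is approximately Poisson with mean c(L + e - 2m)/L, where c is the coverage, L the read length, e the exon length and m the required overlap.

```python
import math


def base_coverage(c: float) -> float:
    """Expected fraction of bases covered by at least one read at coverage c."""
    return 1.0 - math.exp(-c)


def p_exon_hit(c: float, read_len: int, exon_len: int, min_overlap: int) -> float:
    """Probability that at least one read overlaps an exon by >= min_overlap bases.

    Assumes fixed-length reads with uniformly random start positions
    (Poisson approximation) and ignores fragment edge effects.
    Setting min_overlap == exon_len asks for the exon to lie entirely
    within a single read.
    """
    if min_overlap < 1 or min_overlap > min(read_len, exon_len):
        raise ValueError("min_overlap must be between 1 and min(read_len, exon_len)")
    # Length of the window of read start positions that give enough overlap.
    window = read_len + exon_len - 2 * min_overlap
    return 1.0 - math.exp(-c * window / read_len)


if __name__ == "__main__":
    READ_LEN = 400   # assumed read length (bases), illustrative only
    EXON_LEN = 140   # assumed exon length (bases), illustrative only
    for c in (2.0, 2.5, 3.0, 6.0, 8.0):
        whole = p_exon_hit(c, READ_LEN, EXON_LEN, EXON_LEN)
        half = p_exon_hit(c, READ_LEN, EXON_LEN, EXON_LEN // 2)
        print(f"coverage {c:>3}: bases covered {base_coverage(c):.3f}, "
              f"exon entirely in one read {whole:.3f}, "
              f"at least half of exon in one read {half:.3f}")
```

Because a coding-region detector that uses only local information can work on a single read, the relevant quantity is the chance that a read captures enough of the exon, not the chance that the whole fragment assembles into one contig. Which overlap threshold is "enough" depends on the detection method and is left as a parameter here.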
Copyright information
© 1994 Springer Science+Business Media New York
About this chapter
Cite this chapter
Claverie, JM. (1994). Shallow Shotgun Sequencing as a Strategy for Finding Coding Exons. In: Hochgeschwender, U., Gardiner, K. (eds) Identification of Transcribed Sequences. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-2562-2_20
DOI: https://doi.org/10.1007/978-1-4615-2562-2_20
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-6094-0
Online ISBN: 978-1-4615-2562-2
eBook Packages: Springer Book Archive