A new method to develop highly specific models for regulatory DNA regions
We present a new modular concept for constructing organizational models of transcriptional regulatory DNA units. The method requires a training set of at least 10 sequences and a simple initial model (e.g., two characteristic transcription factor binding sites). The final model is then generated by computer analysis directly from the sequences. Starting from 20 Lentivirus long terminal repeats (LTRs) and an initial model of only two elements (TATA box and polyA signal), the method produced a final model of 10 elements that recognized all of the more than 100 available Lentivirus LTRs while rejecting all other known LTR types. Database searches with this Lentivirus LTR model confirmed the very high specificity of the method.
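To make the idea of an "organizational model" concrete, the sketch below treats a model as an ordered list of elements, each with a sequence pattern and an allowed spacing to the previous element, and reports whether a sequence contains all elements in order. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Element` and `match_model`, the use of regular expressions instead of scored binding-site matrices, and the gap values in the example are all hypothetical. The abstract does not specify how elements are scored or spaced.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Element:
    """Hypothetical model element: a pattern plus spacing to the previous element."""
    name: str
    pattern: str            # regular expression over A/C/G/T (a simplification)
    min_gap: int = 0        # min bases between previous element's end and this start
    max_gap: int = 10_000   # max bases between them (ignored for the first element)


def find_all(pattern: str, seq: str) -> List[Tuple[int, int]]:
    """Return (start, end) of every match of pattern in seq, including overlaps."""
    # A zero-width lookahead around a capturing group finds overlapping matches.
    return [(m.start(1), m.end(1)) for m in re.finditer(f"(?=({pattern}))", seq)]


def match_model(model: List[Element], seq: str) -> Optional[List[Tuple[str, int]]]:
    """Return (element name, start) pairs if all elements occur in order with
    their spacing constraints satisfied; otherwise None. Depth-first search."""
    seq = seq.upper()
    hits = [find_all(el.pattern, seq) for el in model]

    def search(i: int, prev_end: Optional[int]) -> Optional[List[Tuple[str, int]]]:
        if i == len(model):
            return []  # every element has been placed
        el = model[i]
        for start, end in hits[i]:
            if prev_end is not None and not (el.min_gap <= start - prev_end <= el.max_gap):
                continue  # wrong order or spacing outside the allowed range
            rest = search(i + 1, end)
            if rest is not None:
                return [(el.name, start)] + rest
        return None

    return search(0, None)


# Hypothetical two-element starting model (TATA box, then polyA signal).
model = [
    Element("TATA box", "TATA[AT]A[AT]"),
    Element("polyA signal", "AATAAA", min_gap=5, max_gap=40),
]
print(match_model(model, "ggTATAAAAgcgcgcgcgcAATAAAtt"))
# → [('TATA box', 2), ('polyA signal', 19)]
```

In this framing, refining a two-element start into a larger model would amount to appending further elements with spacing ranges derived from the training sequences. Requiring both order and spacing is what lets such a model reject sequences that merely contain the individual elements.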