Abstract
We study several genetic algorithm operators for a permutation problem associated with the Human Genome Project: the assembly of DNA sequence fragments from a parent clone whose sequence is unknown into a consensus sequence corresponding to the parent sequence. The sorted-order representation, which does not require specialized operators, is compared with a more traditional permutation representation, which does. The two representations and their associated operators are compared on problems ranging from 2K to 34K base pairs (KB). Edge-recombination crossover used in conjunction with several specialized operators performs best in these experiments; these operators solved a 10KB sequence, consisting of 177 fragments, with no manual intervention. Natural building blocks in the problem are exploited at progressively higher levels through “macro-operators,” which significantly improves performance.
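The abstract contrasts two encodings of a fragment ordering. The sketch below is illustrative only and is not the authors' implementation. It assumes real-valued keys for the sorted-order representation (in the style of "random keys"), so any key vector decodes to a valid ordering and ordinary uniform crossover applies. For the permutation representation it shows a common form of edge-recombination crossover, treating each parent as a linear layout with two ends (the classic TSP version also links the last element back to the first). All function names are hypothetical.

```python
import random
from collections.abc import Sequence


def decode_sorted_order(keys: Sequence[float]) -> list[int]:
    """Sorted-order decoding (assumed real-valued keys): the layout lists
    fragment indices in increasing order of their keys, so every key
    vector yields a valid permutation."""
    return sorted(range(len(keys)), key=lambda i: keys[i])


def uniform_crossover(a: Sequence[float], b: Sequence[float],
                      rng: random.Random) -> list[float]:
    """Generic operator usable on key vectors: each key comes from
    either parent with probability 1/2."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def build_edge_map(p1: Sequence[int], p2: Sequence[int]) -> dict[int, set[int]]:
    """Union of the neighbours each fragment has in either parent,
    treating the parents as linear (non-cyclic) orderings."""
    edges: dict[int, set[int]] = {f: set() for f in p1}
    for parent in (p1, p2):
        for left, right in zip(parent, parent[1:]):
            edges[left].add(right)
            edges[right].add(left)
    return edges


def edge_recombination(p1: Sequence[int], p2: Sequence[int],
                       rng: random.Random) -> list[int]:
    """Edge-recombination crossover sketch: build the child by repeatedly
    moving to a parental neighbour of the current fragment, preferring the
    neighbour with the fewest remaining edges; fall back to a random
    unvisited fragment when the current one has no neighbours left."""
    edges = build_edge_map(p1, p2)
    current = rng.choice([p1[0], p2[0]])
    child = [current]
    remaining = set(p1) - {current}
    while remaining:
        # The fragment just placed can no longer be chosen by anyone.
        for neighbours in edges.values():
            neighbours.discard(current)
        candidates = edges[current]
        if candidates:
            fewest = min(len(edges[c]) for c in candidates)
            options = [c for c in candidates if len(edges[c]) == fewest]
        else:
            options = list(remaining)
        current = rng.choice(sorted(options))  # sorted for reproducibility
        child.append(current)
        remaining.discard(current)
    return child
```

For example, `decode_sorted_order([0.7, 0.1, 0.4, 0.9])` gives `[1, 2, 0, 3]`. The design trade-off mirrors the abstract: the key encoding lets standard operators such as `uniform_crossover` produce only valid layouts, whereas a direct permutation needs a specialized operator like `edge_recombination` to keep each child a permutation while inheriting parental adjacencies.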
Cite this article
Parsons, R.J., Forrest, S. & Burks, C. Genetic Algorithms, Operators, and DNA Fragment Assembly. Machine Learning 21, 11–33 (1995). https://doi.org/10.1023/A:1022613513712