Abstract
There has been much recent algorithmic work on reconstructing the evolutionary history of biological species. Computer virus specialists are likewise interested in the evolutionary history of computer viruses: a virus is often written using code fragments from one or more other viruses, which are its immediate ancestors. A phylogeny for a collection of computer viruses is a directed acyclic graph whose nodes are the viruses and whose edges map ancestors to descendants, subject to the property that each code fragment is “invented” only once. To provide a simple explanation of the data, we consider the problem of constructing such a phylogeny with a minimum number of edges. This optimization problem is NP-hard, and we present positive and negative results for associated approximation problems. When tree solutions exist, they can be constructed and randomly sampled in polynomial time.
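To make the definitions concrete, here is a small Python sketch. It rests on several assumptions that are not taken from the paper. Each virus is modeled as a set of fragment labels. A virus is taken to "invent" a fragment when it contains the fragment and none of its parents does, so a valid phylogeny is an acyclic parent map in which every fragment is invented exactly once. The names `inventions`, `is_phylogeny` and `tree_phylogeny` are hypothetical. The tree test uses a standard maximum-spanning-tree argument rather than necessarily the authors' algorithm. Weight each pair of viruses by the number of fragments they share. In any spanning tree, fragment c contributes one unit per tree edge inside the set S_c of viruses containing c, which is at most |S_c| - 1, with equality exactly when S_c is connected in the tree. S_c is connected exactly when c has a single topmost (inventing) node. Random sampling is not shown.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Optional

# Hypothetical representation: virus name -> set of code-fragment labels.
Viruses = dict[str, frozenset[str]]
# parents[v] lists the immediate ancestors of v; roots may be omitted.
Parents = dict[str, list[str]]


def inventions(viruses: Viruses, parents: Parents) -> dict[str, int]:
    """For each fragment, count the viruses that 'invent' it (assumed rule:
    the virus contains the fragment but none of its parents does)."""
    counts: dict[str, int] = {}
    for v, frags in viruses.items():
        inherited = set().union(*(viruses[p] for p in parents.get(v, [])))
        for c in frags - inherited:
            counts[c] = counts.get(c, 0) + 1
    return counts


def is_phylogeny(viruses: Viruses, parents: Parents) -> bool:
    """True if the parent map is acyclic and every fragment is invented once."""
    try:
        tuple(TopologicalSorter(parents).static_order())
    except CycleError:
        return False
    return all(n == 1 for n in inventions(viruses, parents).values())


def tree_phylogeny(viruses: Viruses) -> Optional[Parents]:
    """Prim's algorithm for a maximum-weight spanning tree, where the weight
    of a pair is the number of shared fragments. A tree phylogeny exists iff
    the optimum equals sum over fragments c of (|S_c| - 1)."""
    names = list(viruses)
    if not names:
        return {}
    root = names[0]
    # best[u] = (heaviest edge weight from u into the tree, tree endpoint)
    best = {u: (len(viruses[u] & viruses[root]), root) for u in names[1:]}
    parent: dict[str, str] = {}
    while best:
        v = max(best, key=lambda u: best[u][0])
        _, parent[v] = best.pop(v)
        for u in best:  # only values change, so iterating is safe
            w = len(viruses[u] & viruses[v])
            if w > best[u][0]:
                best[u] = (w, v)
    weight = sum(len(viruses[v] & viruses[p]) for v, p in parent.items())
    occurrences = sum(len(f) for f in viruses.values())  # sum_c |S_c|
    distinct = len(set().union(*viruses.values()))  # number of fragments
    if weight != occurrences - distinct:
        return None  # no spanning tree keeps every S_c connected
    return {v: [p] for v, p in parent.items()}  # rooted at names[0]
```

In the toy input `{"P": ab, "Q": bc, "R": ac}`, every pair of viruses shares a fragment. No tree can keep all three S_c connected, so `tree_phylogeny` returns `None`. The DAG in which `R` has both `P` and `Q` as parents is still a valid phylogeny, with three edges.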
Part of this work was performed at Sandia National Laboratories and was supported by the U.S. Department of Energy under contract DE-AC04-76AL85000. Part of this work was supported by the ESPRIT Basic Research Action Programme of the EC under contract 7141 (project ALCOM-IT).
Part of this work was performed at Sandia National Laboratories and was supported by the U.S. Department of Energy under contract DE-AC04-76AL85000.
This work was performed under U.S. Department of Energy contract DE-AC04-76AL85000.
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Goldberg, L.A., Goldberg, P.W., Phillips, C.A., Sorkin, G.B. (1996). Constructing computer virus phylogenies. In: Hirschberg, D., Myers, G. (eds) Combinatorial Pattern Matching. CPM 1996. Lecture Notes in Computer Science, vol 1075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61258-0_19
DOI: https://doi.org/10.1007/3-540-61258-0_19
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61258-2
Online ISBN: 978-3-540-68390-2
eBook Packages: Springer Book Archive