To those outside the field, and even to some focused on empirical applications, phylogenetics may appear to have little to do with algebra. Probability and statistics are clearly important ingredients, as modeling and inferring evolutionary relationships motivate the field. Combinatorics is also an obvious component, as the graph-theoretic notions of trees, and more recently networks, are used to describe the relationships. But where does the algebra arise?

The models used in phylogenetics are necessarily complex. At the simplest, they depend on a tree structure, as well as Markov matrices describing changes in nucleotide sequences along the edges. These two components result in probability distributions given by rather complicated polynomials in the parameters of the models, whose precise form reflects the structure of the tree. Even when following standard statistical paradigms for inference, efficient calculation, such as by the Felsenstein pruning algorithm (Felsenstein 1981) used in likelihood calculations, depends on understanding this algebraic structure.
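As a minimal sketch of how the tree structure organizes this calculation, the following code applies the pruning recursion to a single site pattern on a small rooted four-leaf tree. Everything specific here is an assumption made for illustration and is not taken from the works cited: the Jukes-Cantor model for the Markov matrices, a uniform root distribution, the tree shape and edge lengths, and the helper names `jc_matrix`, `partial`, and `site_likelihood`.

```python
import itertools
import math

STATES = "ACGT"

def jc_matrix(t):
    """Jukes-Cantor Markov matrix for an edge of length t
    (expected substitutions per site); an illustrative model choice."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def partial(node, site):
    """Conditional likelihoods of the data below `node`, one per state at `node`.

    A leaf is a taxon name (str); an internal node is a list of
    (child, edge_length) pairs."""
    if isinstance(node, str):
        return [1.0 if s == site[node] else 0.0 for s in STATES]
    vec = [1.0] * 4
    for child, t in node:
        P = jc_matrix(t)
        below = partial(child, site)
        # Multiply in the contribution of this child edge for each parent state.
        for i in range(4):
            vec[i] *= sum(P[i][j] * below[j] for j in range(4))
    return vec

def site_likelihood(tree, site):
    """Probability of one site pattern, assuming a uniform root distribution."""
    return sum(0.25 * x for x in partial(tree, site))

# Hypothetical rooted tree ((a:0.1, b:0.2):0.05, (c:0.1, d:0.3):0.05).
tree = [([("a", 0.1), ("b", 0.2)], 0.05), ([("c", 0.1), ("d", 0.3)], 0.05)]

# The pattern probabilities form a distribution, so they should sum to 1
# over all 4^4 patterns (up to floating-point error).
total = sum(
    site_likelihood(tree, dict(zip("abcd", pattern)))
    for pattern in itertools.product(STATES, repeat=4)
)
print(total)
```

Each pattern probability is a sum over internal states of products of Markov matrix entries, one factor per edge, so it is a polynomial in those entries; the recursion simply evaluates that polynomial by factoring it along the tree.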

But in the late 1980s, the algebraic structure also suggested alternative inference frameworks to some researchers. These included the phylogenetic invariants of Cavender and Felsenstein (1987) and of Lake (1987), and the Hadamard transform framework of Hendy and his colleagues (Hendy and Penny 1989; Hendy et al. 1994). While this early explicitly algebraic work resulted in a number of interesting mathematical explorations, perhaps culminating in the invariants work of Evans and Speed (1993), it had little impact on practical inference, as simulation studies seldom showed good performance (Huelsenbeck 1995).

In the early 2000s, the works of Allman and Rhodes (2003) and of Sturmfels and Sullivant (2005) revived interest in invariants. Interest in applying algebraic perspectives to statistical problems, especially in computational biology, was exemplified by the book of Pachter and Sturmfels (2005), which helped draw new researchers to the field. Of course, algebra has been present in statistics from the beginning, such as in Pearson’s work (Pearson 1894), but as the theoretical and computational tools of algebra developed, they remained largely outside of the inference toolbox.

In recent years, algebraic methods have been crucial to advances in the theory of phylogenetic inference (in particular, parameter identifiability of phylogenetic models; Allman and Rhodes 2009; Allman et al. 2018) and in new methods of tree reconstruction (Fernández-Sánchez and Casanellas 2016; Chifman and Kubatko 2014) that are competitive with traditional frameworks. The tools that have been used draw from algebraic geometry, commutative algebra, computational algebra, and algebraic statistics, as well as group representation theory and algebraic combinatorics.

The works in this volume showcase the varied directions in which algebra is playing a role in current phylogenetic research.

Algebraic varieties underlie the investigation of mixture models by Gross et al., as well as the study of maximum likelihood inference using recently developed numerical algebraic geometry tools by Kosta and Kubjas. Sumner and Woodhams focus more tightly on the modeling of sequence evolution, and the algebraic origin of nicely structured models.

A number of works move beyond simple evolution on a tree. The multispecies coalescent model, which describes the biological process by which gene trees may differ from species trees, is analyzed by Disanto and Rosenberg with tools of algebraic combinatorics. Long and Kubatko also consider this model, greatly weakening the assumptions necessary to justify the invariant-based SVDquartets method of species tree inference. Durden and Sullivant give an identifiability result for a k-mer based distance under the coalescent.

Moving from trees to networks, Kim et al. investigate the impact of admixture on phylogenetic distances and tree reconstruction. Considering both the coalescent and hybridization, Baños mixes algebraic and combinatorial approaches to show the identifiability of many network features from gene tree data.

Two works highlight other algebraic tools. Terauds and Sumner apply representation theory to the problem of improving distance estimates based on gene order through maximum likelihood. Yoshida et al. bring tropical geometry and algebra to bear on summarizing collections of trees, through a new form of principal component analysis.

Finally, Huber et al.’s work highlights the role of submodularity, a concept appearing widely in combinatorics and optimization, while Wicke and Fischer address open questions on the Shapley value of trees.