On the information content of discrete phylogenetic characters

Bordewich, Magnus; Deutschmann, Ina Maria; Fischer, Mareike; Kasbohm, Elisa; Semple, Charles; Steel, Mike

doi:10.1007/s00285-017-1198-2

Published: 16 December 2017

Volume 77, pages 527–544, (2018)

Journal of Mathematical Biology

Magnus Bordewich¹, Ina Maria Deutschmann², Mareike Fischer², Elisa Kasbohm², Charles Semple³ & Mike Steel³

Abstract

Phylogenetic inference aims to reconstruct the evolutionary relationships of different species based on genetic (or other) data. Discrete characters are a particular type of data that contain information on how the species should be grouped together. However, it has long been known that some characters contain more information than others. For instance, a character that assigns the same state to each species groups all of them together and so provides no insight into the relationships of the species considered. At the other extreme, a character that assigns a different state to each species also conveys no phylogenetic signal. In this manuscript, we study a natural combinatorial measure of the information content of an individual character and analyse properties of characters that provide the maximum phylogenetic information, in particular the number of states such a character uses and how the different states have to be distributed among the species or taxa of the phylogenetic tree.

Notes

This condition is weaker than the assumption that each state actually evolves only once, since the states at the leaves may have evolved with homoplasy (reversals or convergent evolution) yet still be homoplasy-free on the tree.

Acknowledgements

We thank the two anonymous reviewers for several helpful comments on an earlier version of this paper. I.D. and E.K. thank the International Office at the University of Greifswald and the German Academic Exchange Service (DAAD) for the support through the mobility program PROMOS (travel scholarship). We also thank the (former) Allan Wilson Centre for supporting this research.

Author information

Authors and Affiliations

Science Laboratories, School of Engineering and Computing Sciences, University of Durham, South Road, Durham, DH1 3LE, UK
Magnus Bordewich
Institute of Mathematics and Computer Science, Ernst-Moritz-Arndt-University Greifswald, Walther-Rathenau-Str. 47, 17487, Greifswald, Germany
Ina Maria Deutschmann, Mareike Fischer & Elisa Kasbohm
School of Mathematics and Statistics, University of Canterbury, Private Bag 4800, Christchurch, 8140, New Zealand
Charles Semple & Mike Steel

Corresponding author

Correspondence to Mike Steel.

Appendix

Proof of Lemma 1

We first consider the case $m=2$. Here $b(m)=b(2)=1$ and $b(m+1)=b(3)=1$, so the left-hand side is $b(m+s)\cdot b(m)=b(s+2)$ and the right-hand side is $b(m+1)\cdot b(m+s-1)=b(s+1)$. The claim therefore reduces to $b(s+2) > b(s+1)$, which holds for all $s\ge 2$: using $b(k)=(2k-5)!!$ for $k\ge 3$, we have $b(s+2)=(2s-1)!!=(2s-1)\cdot b(s+1)$ with $2s-1\ge 3$.

We now consider the case $m\ge 3$. As $s\ge 2$, we have:

$$\begin{aligned}&2m + 2s> 2m +2 \\&\quad \Rightarrow 2(m+s)-5> 2m-3 \\&\quad \Rightarrow (2(m+s)-5) \cdot (2(m+s)-7)!! \cdot (2m-5)!! \\&\qquad> (2m-3) \cdot (2(m+s)-7)!! \cdot (2m-5)!! \\&\quad \Rightarrow (2(m+s)-5)!! \cdot (2m-5)!!> (2m-3)!! \cdot (2(m+s)-7)!! \\&\quad \Rightarrow (2(m+s)-5)!! \cdot (2m-5)!!> (2(m+1)-5)!! \cdot (2(m+s-1)-5)!! \\&\quad \Rightarrow b(m+s) \cdot b(m) > b(m+1)\cdot b(m+s-1). \end{aligned}$$

The last line uses the fact that $b(m)=(2m-5)!!$ for all $m\ge 3$. This completes the proof. $\square $
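As an informal sanity check of the inequality just proved, the following Python sketch evaluates both sides for a small range of values. It assumes only what the appendix states, namely $b(1)=b(2)=1$ and $b(m)=(2m-5)!!$ for $m\ge 3$. The function names `double_factorial`, `b` and `lemma_holds`, and the range bound of 30, are our own illustrative choices and do not come from the paper. A finite check of this kind illustrates the lemma but is of course no substitute for the proof.

```python
def double_factorial(n: int) -> int:
    """Return n!! = n * (n - 2) * (n - 4) * ..., with the product empty for n <= 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def b(m: int) -> int:
    """Values used in the appendix: b(1) = b(2) = 1 and b(m) = (2m - 5)!! for m >= 3."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return 1 if m <= 2 else double_factorial(2 * m - 5)


def lemma_holds(m: int, s: int) -> bool:
    """Check the strict inequality b(m+s) * b(m) > b(m+1) * b(m+s-1)."""
    return b(m + s) * b(m) > b(m + 1) * b(m + s - 1)


# Check every pair (m, s) with 2 <= m, s < 30; the bound 30 is arbitrary.
violations = [
    (m, s)
    for m in range(2, 30)
    for s in range(2, 30)
    if not lemma_holds(m, s)
]
print(violations)  # an empty list means no counterexample was found in this range
```

Python integers have arbitrary precision, so the double factorials are computed exactly and the comparison is not affected by rounding.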

Note that Lemma 1 is only stated for $m \ge 2$. If $m=1$, the lemma only holds for $s\ge 3$. To see this, consider the case $m=1$ and $s=2$. Then, $b(m+s)\cdot b(m)= b(1+2)\cdot b(1) = b(1+1)\cdot b(1+2-1)= b(m+1)\cdot b(m+s-1)$, as $b(1)=b(2)=b(3)=1$. Therefore the strict inequality stated in the lemma no longer holds.
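The claim that the inequality does hold for $m=1$ and $s\ge 3$ can be verified directly. The following short calculation is our own check, not part of the original proof, and it uses only $b(1)=b(2)=1$ and $b(k)=(2k-5)!!$ for $k\ge 3$. For $m=1$ and $s\ge 3$:

$$\begin{aligned} b(m+s)\cdot b(m)&= b(s+1)\cdot b(1) = (2s-3)!! = (2s-3)\cdot (2s-5)!! = (2s-3)\cdot b(s), \\ b(m+1)\cdot b(m+s-1)&= b(2)\cdot b(s) = b(s). \end{aligned}$$

Since $2s-3\ge 3$ for $s\ge 3$, the first product is strictly larger than the second. For example, $s=3$ gives $b(4)\cdot b(1) = 3 > 1 = b(2)\cdot b(3)$.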

About this article

Cite this article

Bordewich, M., Deutschmann, I.M., Fischer, M. et al. On the information content of discrete phylogenetic characters. J. Math. Biol. 77, 527–544 (2018). https://doi.org/10.1007/s00285-017-1198-2

Received: 10 March 2017
Revised: 30 July 2017
Published: 16 December 2017
Issue Date: September 2018
DOI: https://doi.org/10.1007/s00285-017-1198-2
