Impact of missing data, gene choice, and taxon sampling on phylogenetic reconstruction: the Caryophyllales (angiosperms)

  • Original Article
  • Plant Systematics and Evolution

Abstract

Density of taxon sampling and the number and kind of characters are central to achieving the ultimate goals of phylogenetic reconstruction: tree robustness and improved accuracy. In molecular phylogenetics, DNA sequence repositories such as GenBank are potential sources for expanding datasets in two dimensions, taxa and characters, to the level of “supermatrices.” However, missing characters and genomic regions are generally considered a major impediment to this endeavor. Here we used the angiosperm order Caryophyllales to systematically address the impact of missing data when expanding taxon sampling and the number of characters in phylogenetic reconstruction. Our analyses show that expanding taxon sampling ~13-fold improved phylogenetic assessment of the Caryophyllales despite up to 38% missing data. Expanding the number of characters in the dataset, while allowing for an up to 100-fold increase in the amount of missing data and including entries with about 40% of genomic regions missing, did not negatively impact tree structure or robustness; on the contrary, it improved both. These results are timely given the ongoing efforts to achieve a detailed assessment of the tree of life.
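To make the quantities in the abstract concrete, the following is a minimal sketch of how a percentage of missing data and a fold increase in taxon sampling could be computed for an aligned matrix. The toy alignment, the function names, and the choice to count `?` and `N` as missing are assumptions made for illustration and are not taken from the article; the taxon counts 51 and 652 are those given for the matK/trnK intron datasets in the supplementary material descriptions.

```python
# Illustrative sketch only: not the authors' pipeline.
# Assumption: '?' and 'N' mark missing characters in the aligned matrix.
MISSING = {"?", "N"}


def missing_fraction(matrix):
    """Return the share of cells in a taxon-by-character matrix that are missing."""
    lengths = {len(seq) for seq in matrix.values()}
    if len(lengths) != 1:
        raise ValueError("matrix must be non-empty with sequences of equal aligned length")
    total = sum(len(seq) for seq in matrix.values())
    missing = sum(ch.upper() in MISSING for seq in matrix.values() for ch in seq)
    return missing / total


def fold_increase(before, after):
    """Return how many times larger `after` is than `before`."""
    return after / before


# Hypothetical toy alignment: three taxa, eight aligned positions each.
toy = {
    "taxon_A": "ACGTACGT",
    "taxon_B": "ACGT????",  # second region not sequenced
    "taxon_C": "AC??ACGT",
}

print(f"{missing_fraction(toy):.1%}")    # 6 of 24 cells are missing
print(round(fold_increase(51, 652), 1))  # taxon counts from the supplementary descriptions
```

In a supermatrix, whole genomic regions absent for a taxon contribute long runs of missing cells, which is how the overall percentage can rise sharply as taxa with partial sequence coverage are added.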

Acknowledgments

The authors thank J. Gordon Burleigh for his contributions to this manuscript, and D. and P. Soltis, S. Brockington, and M. Moore, as well as the Missouri Botanical Garden and the Royal Botanic Garden at Kew, for providing DNA samples for several taxa. We thank M. Barthet for help in designing a primer, A. Hinckle for helping with specimen collection, S. Newman for assistance in laboratory work, and A. Ferraioli for assistance with figures. We also thank two anonymous reviewers for their comments and suggestions. This work is part of the AToL-Angiosperm project supported by grants from the National Science Foundation, USA (EF-043105 and REU-477683 3) to K.W.H.

Author information

Corresponding author

Correspondence to S. S. Crawley.

Electronic supplementary material

Below are the links to the electronic supplementary material.

Online Resource 4 (ESM_4) MP strict consensus tree based on the matK/trnK intron dataset for 51 Caryophyllales taxa (0.3% missing data). Percent bootstrap values greater than 50% are noted on branches

Online Resource 5 (ESM_5) Summary of the MP strict consensus tree based on matK/trnK intron data with expanded taxon sampling (652 taxa with 38% missing data). Percent bootstrap values greater than 50% are noted on branches

Online Resource 6 (ESM_6) ML tree based on the five genomic regions (rbcL, atpB, ndhF, matK, and trnK intron) for 136 Caryophyllales taxa. Percent bootstrap values greater than 50% are noted on branches. (a) Expanded details for the “AAC” and “raphide” clades. (b) Expanded details for the “succulents” clade. (c) Expanded details for the “FTPP” and “carnivorous” clades

Online Resource 7 (ESM_7) MP strict consensus tree based on the dataset of five genomic regions (rbcL, atpB, ndhF, matK, and trnK intron) for 136 taxa (5GR-136; 46% missing data). Percent bootstrap values greater than 50% are noted on branches. (a) The FTPP and carnivorous clades have been collapsed. (b) FTPP and carnivorous clades are expanded

Online Resource 8 (ESM_8) ML tree based on the matK/trnK intron dataset for 51 Caryophyllales taxa. Branch lengths are noted on the branches

Supplementary material 1 (DOC 77 kb)

Supplementary material 2 (DOC 220 kb)

Supplementary material 3 (DOC 537 kb)

Supplementary material 4 (EPS 1050 kb)

Supplementary material 5 (EPS 487 kb)

Supplementary material 6 (EPS 1142 kb)

Supplementary material 7 (EPS 1166 kb)

Supplementary material 8 (EPS 1219 kb)

Supplementary material 9 (EPS 1869 kb)

Supplementary material 10 (EPS 1206 kb)

Supplementary material 11 (TIFF 2931 kb)

About this article

Cite this article

Crawley, S.S., Hilu, K.W. Impact of missing data, gene choice, and taxon sampling on phylogenetic reconstruction: the Caryophyllales (angiosperms). Plant Syst Evol 298, 297–312 (2012). https://doi.org/10.1007/s00606-011-0544-x

  • DOI: https://doi.org/10.1007/s00606-011-0544-x
