Tianjian Qin

Sub-Species Level Diversification and Phylogeny Reconstruction

Published on 20 Aug 2024 · 18 min read
Image Credit: leonardo.ai

Diversification is easiest to misunderstand when we jump straight to the final tree. By that stage, a lot of the process has already been compressed away: event order, lingering progenitor lineages, extinct side branches, and the mismatch between what happened and what our summaries can still recover. That is the motivation for the simulator below. It keeps time in the foreground and lets the same run be viewed as lineage history, coarse-grained taxa, and trait-based reconstruction.

Under the hood, each extant lineage can bud a clonal daughter, bud a mutating daughter, or die, with event times sampled in continuous time using a Gillespie-style stochastic simulation algorithm.[1] The simulator then coarse-grains those lineage-level events into “species” by requiring a user-defined number of mutational steps before promoting a daughter lineage to a new named taxon. In effect this is a model of protracted speciation, and it is closer to a budding picture than to a perfectly symmetric split, because the progenitor lineage can survive while throwing off daughters.[4][5][15]
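A minimal sketch of that lineage-level process follows. It assumes the simulator's three default rates (0.40 / 0.35 / 0.10) are the clonal-budding, mutating-budding, and extinction rates in that order; that mapping, and the names `simulate`, `Lineage`, `t_max`, and `max_lineages`, are my own illustrative assumptions rather than EvoLab's code. Because every extant lineage shares the same rates, the total event rate is the per-lineage rate times the number of extant lineages, the waiting time to the next event is exponential, and the focal lineage can be drawn uniformly.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Lineage:
    lid: int
    parent: Optional[int]
    birth_time: float
    mutating: bool                      # True if budded as a mutating daughter
    death_time: Optional[float] = None  # None while the lineage is extant


def simulate(lam_clonal=0.40, lam_mut=0.35, mu=0.10, t_max=10.0,
             max_lineages=200, seed=1):
    """Gillespie-style simulation of a budding process (hypothetical sketch).

    Each extant lineage buds a clonal daughter at rate lam_clonal, buds a
    mutating daughter at rate lam_mut, and goes extinct at rate mu. The
    progenitor survives every budding event.
    """
    rng = random.Random(seed)
    lineages: List[Lineage] = [Lineage(0, None, 0.0, False)]
    extant: List[int] = [0]
    ltt: List[Tuple[float, int]] = [(0.0, 1)]  # (time, extant count) after each event
    per_lineage = lam_clonal + lam_mut + mu
    if per_lineage <= 0:
        return lineages, ltt
    t = 0.0
    while extant and len(lineages) < max_lineages:
        # Waiting time to the next event anywhere in the clade.
        t += rng.expovariate(per_lineage * len(extant))
        if t > t_max:
            break
        # All lineages share the same rates, so the focal one is uniform.
        idx = rng.randrange(len(extant))
        focal = extant[idx]
        # Pick the event type with probability proportional to its rate.
        u = rng.random() * per_lineage
        if u < mu:
            lineages[focal].death_time = t
            extant[idx] = extant[-1]   # swap-remove the dead lineage
            extant.pop()
        else:
            child = Lineage(len(lineages), focal, t,
                            mutating=(u >= mu + lam_clonal))
            lineages.append(child)
            extant.append(child.lid)
        ltt.append((t, len(extant)))
    return lineages, ltt


lineages, ltt = simulate(seed=42)
print(len(lineages), ltt[-1])
```

The `ltt` list is a lineages-through-time record of the kind summarized in panel 5, and the parent links plus birth and death times are enough to draw a full lineage chronogram.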

TL;DR

  • Diversification is an event-by-event process first, and only later a tree summary.
  • Coarse-graining and extinction can make different fine-scale histories collapse into the same observed species-level pattern.
  • A tree reconstructed from traits can disagree with a tree reconstructed from the event history for very ordinary reasons.
  • The simulator makes those bottlenecks visible with time-resolved chronograms, a lineages-through-time panel, and a trait-space panel.

[Interactive simulator: EvoLab, an interactive diversification lab combining a continuous-time budding process, coarse-grained species labels, and linked reconstructions. Controls: lineage rates (default 0.40 / 0.35 / 0.10), observation layer (coarsen level, default 2; prune extinct taxa; show branch lengths), and trait model (default Brownian, 4D). Panels: 1. Full lineage chronogram; 2. Coarsened species chronogram; 3. History tree from the coarsened record; 4. Distance tree from observed traits; 5. Lineages through time; 6. Trait space (first two axes or PCA). Legend: clonal lineage, mutating lineage, extinct lineage, highlighted taxon.]

How to read the simulator

The six panels are arranged to move from process to summary. The first two show what is happening through time at the lineage and coarse-grained taxon levels; the next two show two different tree summaries of the same run; the last two summarize tempo and geometry.

  • The chronograms keep event order explicit, so budding, persistence, and extinction are visible instead of being absorbed into a generic node-link picture.
  • Scenario presets and seeded runs make comparisons cleaner, because they separate parameter changes from a completely new random draw.
  • The Brownian and OU-like options give two contrasting ways to think about continuous traits: wandering versus constrained spread.
  • JSON and Newick export let you pull a run into another workflow once an example becomes useful beyond the page itself.
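For readers curious about what a Newick export involves, here is a small recursive serializer. It is a hypothetical helper: `to_newick` and its dict-based tree encoding are assumptions, not EvoLab's actual export code, and it writes labels unquoted, so they must not contain Newick metacharacters.

```python
def to_newick(children, names, lengths, root):
    """Serialize a rooted tree to Newick (hypothetical helper).

    children: dict mapping node -> list of child nodes (tips absent or [])
    names:    dict mapping node -> label (tips need one; internal labels optional)
    lengths:  dict mapping node -> length of the branch above it (root may be omitted)
    Labels are written as-is, so they must not contain spaces, commas,
    colons, semicolons, or parentheses.
    """
    def rec(node):
        kids = children.get(node, [])
        s = "(" + ",".join(rec(k) for k in kids) + ")" if kids else ""
        s += names.get(node, "")
        if node in lengths:
            s += f":{lengths[node]:g}"  # %g keeps up to 6 significant digits
        return s

    return rec(root) + ";"


children = {"root": ["i", "C"], "i": ["A", "B"]}
names = {"A": "A", "B": "B", "C": "C"}
lengths = {"i": 1.0, "A": 0.5, "B": 0.5, "C": 1.5}
print(to_newick(children, names, lengths, "root"))  # ((A:0.5,B:0.5):1,C:1.5);
```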

Note: the Leaf-pair r card is just a rough agreement score based on pairwise tip distances in the two trees. It is not a formal topology test.
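One plausible way to compute such a score, assuming it is a Pearson correlation between patristic (path-length) distances for every pair of tips in the two trees. The function names and the `(parent, length)` tree encoding are hypothetical, and EvoLab's actual formula may differ.

```python
from itertools import combinations
from math import sqrt


def root_path(parent, length, tip):
    """Map `tip` and each of its ancestors to their path distance from tip."""
    path, d, node = {tip: 0.0}, 0.0, tip
    while node in parent:
        d += length[node]
        node = parent[node]
        path[node] = d
    return path


def patristic(parent, length, a, b):
    """Path length between tips a and b via their most recent common ancestor."""
    pa, pb = root_path(parent, length, a), root_path(parent, length, b)
    mrca = min((n for n in pa if n in pb), key=pa.get)
    return pa[mrca] + pb[mrca]


def leaf_pair_r(tree1, tree2, tips):
    """Pearson r between the pairwise tip distances of two trees (needs >= 2 tips).

    Each tree is a (parent, length) pair of dicts: node -> parent node and
    node -> length of the branch above that node.
    """
    pairs = list(combinations(sorted(tips), 2))
    x = [patristic(*tree1, a, b) for a, b in pairs]
    y = [patristic(*tree2, a, b) for a, b in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when all distances in a tree are equal
    return sxy / sqrt(sxx * syy)


# ((A,B),C) versus ((A,C),B)
t1 = ({"A": "i", "B": "i", "i": "r", "C": "r"}, {"A": 1, "B": 1, "i": 1, "C": 2})
t2 = ({"A": "j", "C": "j", "j": "r", "B": "r"}, {"A": 1, "C": 1, "j": 1, "B": 2})
print(leaf_pair_r(t1, t2, ["A", "B", "C"]))  # approximately -0.5
```

Because the score only looks at how distances co-vary, two trees can score fairly high while still disagreeing on some groupings, which is why it should not be read as a topology test.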

What each panel is trying to tell you

  1. Full lineage chronogram. The fine-grained event history, including extinct and still-incipient lineages.
  2. Coarsened species chronogram. The same history after you decide how many mutational steps count as “enough” to name a new species.
  3. History tree. A forced tree summary of the coarsened record. Useful, but already one step removed from the process.
  4. Trait tree. A simple reconstruction from observed trait distances only. This is the panel most like real life, because we often observe character data, not the true event log.
  5. Lineages through time. A quick sanity check on richness, extinction, and what survives into the observed layer.
  6. Trait space. The geometry behind the trait tree. When panel 4 looks odd, this panel usually explains why.
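The post does not say which distance method panel 4 uses, so as an assumption here is UPGMA, one of the simplest options, applied to Euclidean distances between tip trait vectors. UPGMA presumes roughly clock-like (ultrametric) distances; neighbor joining is a common alternative when that fails. The function name and the placeholder cluster labels (`_c0`, `_c1`, ...) are illustrative.

```python
from math import dist


def upgma(traits):
    """UPGMA tree from tip trait vectors, returned as a Newick string.

    traits: dict mapping tip name -> sequence of floats (all the same length).
    Internal clusters get placeholder names "_c0", "_c1", ..., which are
    assumed not to clash with tip names.
    """
    # cluster name -> (Newick fragment, number of tips, node height)
    clusters = {name: (name, 1, 0.0) for name in traits}
    names = list(traits)
    d = {frozenset((a, b)): dist(traits[a], traits[b])
         for i, a in enumerate(names) for b in names[i + 1:]}
    next_id = 0
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of clusters
        a, b = sorted(pair)
        (na, sa, ha), (nb, sb, hb) = clusters.pop(a), clusters.pop(b)
        h = d.pop(pair) / 2.0             # new node sits at half the distance
        new = f"_c{next_id}"
        next_id += 1
        # Size-weighted average distance from the merged cluster to the rest.
        for c in clusters:
            d[frozenset((new, c))] = (sa * d.pop(frozenset((a, c)))
                                      + sb * d.pop(frozenset((b, c)))) / (sa + sb)
        clusters[new] = (f"({na}:{h - ha:g},{nb}:{h - hb:g})", sa + sb, h)
    (newick, _, _), = clusters.values()
    return newick + ";"


print(upgma({"A": (0, 0), "B": (1, 0), "C": (10, 0)}))  # (C:4.75,(A:0.5,B:0.5):4.25);
```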

Coarsening is the real knob here

The Coarsen Level slider is not a decorative extra. It is the heart of the whole toy model. If it is set to 0, every new lineage is promoted immediately. If it is larger, several mutational steps must accumulate before we call the result a new species. In spirit that rhymes with protracted-speciation ideas, where there is a difference between an incipient lineage and a completed species.[4][5]
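To make the rule concrete, here is a sketch under one hypothetical reading of it: every lineage carries a count of mutational steps accumulated since its species was named, a daughter inherits that count (plus one if it is a mutating daughter), and it is promoted to a new species once the count reaches the threshold `k`, after which the count resets. With `k = 0` every new lineage is promoted, consistent with the description above. The `coarsen` function and its event format are my own illustration, not EvoLab's implementation.

```python
from itertools import count


def coarsen(births, k):
    """Assign species labels to lineages under a hypothetical coarsening rule.

    births: list of (child_id, parent_id, is_mutating) in event order;
            lineage 0 is the root and starts as species "S0".
    k:      mutational steps required before a daughter becomes a new species.
    """
    new_label = (f"S{i}" for i in count(1))
    species = {0: "S0"}
    steps = {0: 0}  # mutational steps accumulated since the species was named
    for child, parent, is_mutating in births:
        s = steps[parent] + (1 if is_mutating else 0)
        if s >= k:
            species[child] = next(new_label)  # promote to a new named taxon
            steps[child] = 0
        else:
            species[child] = species[parent]  # still an incipient lineage
            steps[child] = s
    return species


births = [(1, 0, True), (2, 1, True), (3, 2, True), (4, 0, False)]
print(coarsen(births, 2))  # {0: 'S0', 1: 'S0', 2: 'S1', 3: 'S1', 4: 'S0'}
```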

[Interactive figure: What coarsening actually does. A one-branch illustration in which red events are mutational steps and blue events are clonal births; a slider (shown at 2 mutational steps) controls which events get promoted to named species.]

Once you see it that way, the disagreement between “lineage history” and “species history” becomes less surprising. You are not merely reducing visual complexity. You are changing the object you are willing to observe.

Extinction is a memory problem

I think this is the easiest part to underestimate. Even if the generating process were strictly tree-like, the phylogeny you reconstruct from surviving taxa is a pruned object. It does not contain the extinct lineages that once helped define branching order, lineage durations, and the amount of hidden experimentation in the clade.[2][3] That is one of the reasons why extant-only trees can be consistent with more than one diversification history.[14]
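Pruning itself is mechanical once the complete tree is in hand: drop extinct tips, drop internal nodes with no surviving descendants, and splice out nodes left with a single surviving child, adding their branch length onto that child. A minimal sketch, assuming a dict-based tree encoding and a hypothetical `prune_extinct` helper; the stem above the most recent common ancestor of the survivors is dropped.

```python
def prune_extinct(children, length, root, extant):
    """Reconstructed (extant-only) tree from a complete tree (hypothetical helper).

    children: dict node -> list of child nodes in the complete tree
    length:   dict node -> length of the branch above that node
    extant:   set of tip nodes that survive to the present
    Returns (new_children, new_length, new_root), or None if nothing survives.
    """
    new_children, new_length = {}, {}

    def build(node):
        # Returns (top node of the pruned subtree, branch length above it), or None.
        kids = children.get(node, [])
        if not kids:
            return (node, length.get(node, 0.0)) if node in extant else None
        kept = [r for r in map(build, kids) if r is not None]
        if not kept:
            return None                      # every descendant went extinct
        if len(kept) == 1:
            top, bl = kept[0]                # unary node: splice it out
            return top, bl + length.get(node, 0.0)
        new_children[node] = [top for top, _ in kept]
        for top, bl in kept:
            new_length[top] = bl
        return node, length.get(node, 0.0)

    result = build(root)
    if result is None:
        return None
    return new_children, new_length, result[0]


children = {"r": ["i", "j"], "i": ["A", "E"], "j": ["B", "C"]}
length = {"i": 1.0, "A": 2.0, "E": 0.5, "j": 1.5, "B": 1.0, "C": 1.0}
print(prune_extinct(children, length, "r", {"A", "B", "C"}))
```

Here the extinct tip `E` disappears and its sister `A` inherits the merged branch (1.0 + 2.0), so root-to-tip depths of survivors are preserved while the evidence of the side branch is gone.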

[Interactive figure: Same extant pattern, different hidden baggage. Left: the complete history, which changes between scenarios because the extinct side branches differ. Right: the extant-only (pruned) view, which stays fixed.]

That is also why I left the Prune extinct taxa toggle in the simulator. Switching it on and off is a nice way to watch how much of the argument is really about biology, and how much is about observation.

Traits are proxies, not the event log

The trait tree is the panel that most closely resembles the real inferential situation: you do not get handed the event history, you get handed characters. For continuous traits, Brownian motion is the standard reference model, and OU-type models are a common way to represent pull toward an optimum or constrained evolution.[11][12][13] I included both, not because either one is “the truth,” but because the contrast between them is instructive.
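The contrast can be sketched with the exact Gaussian transitions of the two processes. Under Brownian motion each step of length Δt adds noise with variance σ²Δt, so spread grows without bound; under an OU process with pull strength α toward an optimum θ, the deviation from θ shrinks by a factor e^(−αΔt) and the added noise has variance σ²(1 − e^(−2αΔt))/(2α), so spread levels off near σ²/(2α). The names `sigma`, `alpha`, and `theta` are assumptions; I am not claiming to know which of the simulator's trait settings maps to which parameter.

```python
import math
import random


def simulate_trait(n_steps, dt, sigma, alpha=0.0, theta=0.0, x0=0.0, seed=0):
    """One trait trajectory: Brownian motion if alpha == 0, OU if alpha > 0.

    Both branches use the exact Gaussian transition for a step of length dt,
    so there is no discretization error.
    """
    rng = random.Random(seed)
    x, path = x0, [x0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        if alpha == 0.0:
            x += sigma * math.sqrt(dt) * z          # variance sigma^2 * dt per step
        else:
            decay = math.exp(-alpha * dt)           # pull back toward theta
            sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * alpha))
            x = theta + (x - theta) * decay + sd * z
        path.append(x)
    return path


def endpoint_variance(alpha, reps=2000, n_steps=100, dt=1.0, sigma=1.0):
    """Sample variance of trajectory endpoints across independent replicates."""
    ends = [simulate_trait(n_steps, dt, sigma, alpha=alpha, seed=s)[-1]
            for s in range(reps)]
    m = sum(ends) / reps
    return sum((e - m) ** 2 for e in ends) / (reps - 1)


print(endpoint_variance(alpha=0.0))  # theoretical value: sigma^2 * T = 100
print(endpoint_variance(alpha=1.0))  # theoretical value: sigma^2 / (2 * alpha) = 0.5
```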

[Interactive figure: Brownian spread versus OU-like pull. Toy 2D trajectories from a common starting point: Brownian paths wander, while OU-like paths keep getting pulled back toward the center (slider shown at 0.25).]

More generally, disagreement among trees is not just a failure mode of bad statistics. Gene trees need not equal species trees,[9][10] and sometimes the underlying history is reticulate enough that a network is the more honest presentation.[6][7][8] I am not simulating hybridization here, but I do want the post to keep that broader lesson in view: a clean bifurcating tree is a very useful summary, not a guarantee that the world was actually that clean.

A few things worth trying in the simulator

  1. High turnover. Switch to the turnover preset and compare the history tree to the trait tree. The extant summary gets much thinner while the hidden history stays messy.
  2. Radiation. Use the rapid radiation preset with coarsen level 1, then 3. Watch how the same event stream looks much more or much less tree-like depending on what you are willing to call a species.
  3. Trait constraint. Keep the diversification settings fixed, flip from Brownian to OU-like, and look at the trait cloud. The geometry in panel 6 usually explains the stability or instability of panel 4.
  4. Step mode. Press Step a dozen times instead of letting the animation run. The logic of the chronograms is easier to grasp when you watch events accumulate one by one.

EvoLab: the simulator project

The simulator in this post is EvoLab.


A diversification lab for continuous-time budding lineages, coarsened species records, history versus trait trees, and exportable JSON or Newick summaries.

Open source, hosted on GitHub.

References

Numbers in brackets link to the sources used for the conceptual framing of this post.

  1. Gillespie DT. 1977. Exact stochastic simulation of coupled chemical reactions. J Phys Chem. 81(25):2340–2361. doi:10.1021/j100540a008
  2. Nee S, May RM, Harvey PH. 1994. The reconstructed evolutionary process. Philos Trans R Soc Lond B Biol Sci. 344(1309):305–311. doi:10.1098/rstb.1994.0068
  3. Lambert A, Stadler T. 2013. Birth-death models and coalescent point processes: the shape and probability of reconstructed phylogenies. Theor Popul Biol. 90:113–128. doi:10.1016/j.tpb.2013.10.002
  4. Rosindell J, Cornell SJ, Hubbell SP, Etienne RS. 2010. Protracted speciation revitalizes the neutral theory of biodiversity. Ecol Lett. 13(6):716–727. doi:10.1111/j.1461-0248.2010.01463.x
  5. Etienne RS, Rosindell J. 2012. Prolonging the past counteracts the pull of the present: protracted speciation can explain observed slowdowns in diversification. Syst Biol. 61(2):204–213. doi:10.1093/sysbio/syr091
  6. Huson DH, Bryant D. 2006. Application of phylogenetic networks in evolutionary studies. Mol Biol Evol. 23(2):254–267. doi:10.1093/molbev/msj030
  7. Yu Y, Dong J, Liu KJ, Nakhleh L. 2014. Maximum likelihood inference of reticulate evolutionary histories. Proc Natl Acad Sci USA. 111(46):16448–16453. doi:10.1073/pnas.1407950111
  8. Wen D, Yu Y, Nakhleh L. 2016. Bayesian inference of reticulate phylogenies under the multispecies network coalescent. PLoS Genet. 12(5):e1006006. doi:10.1371/journal.pgen.1006006
  9. Maddison WP. 1997. Gene trees in species trees. Syst Biol. 46(3):523–536. doi:10.1093/sysbio/46.3.523
  10. Degnan JH, Rosenberg NA. 2009. Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol Evol. 24(6):332–340. doi:10.1016/j.tree.2009.01.009
  11. Felsenstein J. 1973. Maximum-likelihood estimation of evolutionary trees from continuous characters. Am J Hum Genet. 25(5):471–492. PubMed: 4741844
  12. Felsenstein J. 1985. Phylogenies and the comparative method. Am Nat. 125(1):1–15. doi:10.1086/284325
  13. Butler MA, King AA. 2004. Phylogenetic comparative analysis: a modeling approach for adaptive evolution. Am Nat. 164(6):683–695. doi:10.1086/426002
  14. Louca S, Pennell MW. 2020. Extant timetrees are consistent with a myriad of diversification histories. Nature. 580(7804):502–505. doi:10.1038/s41586-020-2176-1
  15. Caetano DS, Quental TB. 2023. How important is budding speciation for comparative studies? Syst Biol. 72(6):1443–1453. doi:10.1093/sysbio/syad050
  16. Felsenstein J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst Zool. 27(4):401–410. doi:10.1093/sysbio/27.4.401