Sub-Species Level Diversification and Phylogeny Reconstruction

Published on 20 Aug 2024 · 15 min read

TL;DR

Species diversification does not have to be visualized as a tree. We can instead show a graph of relationships between incipient species, in other words, the chain of evolutionary events.

To illustrate the idea, I developed an interactive simulator that lets you explore different evolutionary trajectories from four perspectives. The simulation is based on Gillespie's algorithm. The coarsen level determines how many consecutive (theoretically meaningful) mutations are needed to form a new species, with the associated traits evolving gradually with each mutation.

To begin, press the Grow button to start the simulation. If the simulation becomes unresponsive (e.g., all nodes have gone extinct), simply press the Reset button. Blue nodes are non-mutating sub-species and red nodes are mutating sub-species. Extinct sub-species turn grey. Feel free to experiment with the simulator first, and then dive into the technical details for a deeper understanding.

Hint: You can drag the nodes in the graph panels. When you hover over a node or tip in one panel, the corresponding nodes and tips are highlighted in all the other panels. You may see strange-looking trees; this happens when there is not enough information to reconstruct them.

[Interactive simulator controls. Defaults: Birth Rate 0.40, Mutation Rate 0.40, Death Rate 0.10, Simulation Steps 25, Coarsen Level 1, Trait Dimensions 5. Also: a Mutation Model selector and toggles for Show Branch Lengths and Prune Extinct Lineages.]

The simulator provides four panels for visualization:

Panel 1: Displays the complete network, illustrating the ancestral relationships between sub-species (nodes).
Panel 2: Shows a coarsened network, representing species formation after multiple mutations, depending on the coarsen level.
Panel 3: Shows a tree reconstructed from the coarsened network.
Panel 4: Shows a tree reconstructed from the trait values of the species.
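To make the step from Panel 1 to Panel 2 concrete, here is a minimal sketch of one possible coarsening rule. It is not the simulator's actual code: the function name `coarsen`, the `(node_id, parent_id, is_mutant)` input format, and the rule that non-mutating births inherit the parent's mutation count are all assumptions made for illustration.

```python
def coarsen(nodes, level):
    """Assign a species label to every sub-species.

    nodes: list of (node_id, parent_id, is_mutant) in creation order;
    the root has parent_id None.
    ASSUMPTION: a lineage founds a new species once it has accumulated
    `level` mutations since its current species was founded, and
    non-mutating births inherit the parent's count unchanged.
    """
    species, count = {}, {}
    next_label = 0
    for node_id, parent_id, is_mutant in nodes:
        if parent_id is None:
            # The root founds the first species.
            species[node_id], count[node_id] = next_label, 0
            next_label += 1
            continue
        c = count[parent_id] + (1 if is_mutant else 0)
        if c >= level:
            # Enough mutations accumulated: found a new species.
            species[node_id], count[node_id] = next_label, 0
            next_label += 1
        else:
            # Still the same species as the parent.
            species[node_id], count[node_id] = species[parent_id], c
    return species


# A tiny hypothetical network: 0 -> 1 -> 2 -> 4 are mutations, 0 -> 3 is a plain birth.
network = [(0, None, False), (1, 0, True), (2, 1, True), (3, 0, False), (4, 2, True)]
fine = coarsen(network, level=1)    # every mutation founds a species
coarse = coarsen(network, level=2)  # two mutations are needed per species
```

With a higher coarsen level, more sub-species share a label, so the coarsened network has fewer nodes than the full one.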

Traditional Era: Morphological Tree Construction

In the pre-phylogenetics era, researchers classified organisms based on observable traits, often using Linnaean taxonomy. Organisms with similar morphology, such as body structure or bone shapes, were grouped into hierarchical categories. Tools like homology studies helped identify shared ancestry based on physical features. Branch lengths in these trees were calibrated using the fossil record, estimating divergence times from known fossil dates. This approach provided a rough timeline of evolution, with branch lengths representing the time elapsed between ancestral and descendant species.

Modern Era: Molecular Phylogenetics

With the rise of molecular biology, phylogenetic trees are now built from DNA sequence data. A typical workflow aligns the sequences with a tool like MAFFT, picks the best evolutionary model with software like jModelTest, and then infers relationships with RAxML or IQ-TREE. Branch lengths are derived from the amount of genetic change, since mutations accumulate over time. The resulting tree therefore usually reflects genetic distances rather than explicit time.

Branch Length Interpretations

The branch lengths in phylogenetic trees can have different interpretations depending on the method used. In time-calibrated trees (fossil-based), they represent the actual time elapsed, while in molecular trees, branch lengths typically indicate the amount of genetic change between species.

Stochastic Diversification Models

In stochastic simulations of species diversification, branch lengths are often aligned with fossil-based timelines, albeit implicitly. In those models, a branch length represents how long a lineage survives.

Brownian Motion and Trait Change

Under the Brownian Motion model, there is a correlation between the amount of genetic change and the evolution of traits over time. Genetic changes are assumed to accumulate randomly, which mirrors gradual shifts in species traits.
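As a rough illustration of Brownian trait change, the sketch below adds independent Gaussian noise to every trait each time a mutation occurs. The function name `mutate_traits`, the step size `sigma`, and the starting point at the origin are assumptions for this example, not details taken from the simulator.

```python
import random


def mutate_traits(parent_traits, sigma=1.0, rng=random):
    """Return a child's trait vector: the parent's traits plus Gaussian noise.

    ASSUMPTION: each trait changes independently with standard deviation sigma.
    """
    return [x + rng.gauss(0.0, sigma) for x in parent_traits]


rng = random.Random(42)
traits = [0.0] * 5          # ancestor at the origin of a 5-dimensional trait space
lineage = [traits]
for _ in range(10):         # ten successive mutations along one lineage
    traits = mutate_traits(traits, rng=rng)
    lineage.append(traits)
```

Because each step is random and has zero mean, the lineage wanders away from the ancestor without any preferred direction, which is the defining feature of the model.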

Aligning Trait Trees with Phylogenetic Trees

Increasing the dimensionality of trait space can help better approximate genetic change. By accounting for a broader spectrum of characteristics, the resulting trait trees may align more closely with phylogenetic trees based on genetic data. Traits are often the phenotypic expression of underlying genetic variation, so capturing more traits narrows the gap between observable features and the genetic changes that drive evolution.
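One way to see why more dimensions might help is a small experiment, sketched here under simplifying assumptions: every mutation is a unit Brownian step in each trait, and we only look at the squared trait distance between a lineage and its ancestor after a fixed number of mutations. The function names and the replicate count are illustrative choices.

```python
import random
import statistics


def squared_distance_after(n_mutations, dims, rng):
    """Squared trait distance from the ancestor after n unit Brownian steps."""
    pos = [0.0] * dims
    for _ in range(n_mutations):
        pos = [x + rng.gauss(0.0, 1.0) for x in pos]
    return sum(x * x for x in pos)


def relative_spread(n_mutations, dims, replicates=500, seed=0):
    """Coefficient of variation of the squared distance across replicates."""
    rng = random.Random(seed)
    d2 = [squared_distance_after(n_mutations, dims, rng) for _ in range(replicates)]
    return statistics.stdev(d2) / statistics.mean(d2)


for dims in (1, 5, 50):
    print(dims, round(relative_spread(10, dims), 2))
```

Under these assumptions the squared distance follows a scaled chi-squared distribution, so its relative spread is about sqrt(2 / dims). With few traits, two lineages with the same number of mutations can sit at very different trait distances; with many traits, trait distance becomes a much more reliable proxy for the number of mutations.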

The Discrepancy between Different Trees

The duration of survival may not relate directly to the amount of genetic change, and therefore may not relate directly to the traits either. Can we expect a discrepancy between trees based on survival time and trees based on trait differences? Does the discrepancy shrink when the trait space has more dimensions?
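To make "a tree based on trait differences" concrete, here is a minimal sketch that clusters trait vectors with UPGMA (average linkage on Euclidean distances). The text does not say which method the simulator uses, so UPGMA is a stand-in chosen for simplicity, and the taxon names and trait values are made up.

```python
import math
from itertools import combinations


def upgma(traits):
    """Cluster named trait vectors into a nested-tuple tree by average linkage."""
    # Each cluster maps a key to (subtree, list of member trait vectors).
    clusters = {name: (name, [vec]) for name, vec in traits.items()}

    def avg_dist(a, b):
        # Mean Euclidean distance over all pairs of members.
        return sum(math.dist(x, y) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the two closest clusters and merge them.
        k1, k2 = min(
            combinations(clusters, 2),
            key=lambda p: avg_dist(clusters[p[0]][1], clusters[p[1]][1]),
        )
        t1, m1 = clusters.pop(k1)
        t2, m2 = clusters.pop(k2)
        clusters[f"({k1},{k2})"] = ((t1, t2), m1 + m2)

    tree, _ = next(iter(clusters.values()))
    return tree


# Hypothetical trait table: A and B are close, C is far away.
example = {"A": [0.0, 0.0], "B": [0.1, 0.0], "C": [5.0, 5.0]}
trait_tree = upgma(example)
```

The resulting tree groups A with B before joining C. Comparing such a tree with one built from survival times is one way to look for the discrepancy discussed above.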

The Impossible Reconstruction

Given a full network of incipient species, that is, the complete chain of historical events, there are various cases in which we cannot reconstruct a valid phylogenetic tree, even with all the information at hand. One example is when the initial ancestor has gone extinct. In theory, there may also be cases in which we cannot reconstruct a valid tree even from trait tables.

Seeing is Believing

Several serious papers have already discussed these issues mathematically. However, I wanted to see them with my own eyes. Under what circumstances should we expect no valid tree, or an erroneous one? Does increasing the number of trait dimensions really bring phylogenetic trees closer to trait trees, even if the phylogenetic tree is simulated? With the simulator, the exploration can be fun and intuitive.

There are 7 adjustable parameters in the simulator. Four of them can be changed during a simulation, while the other three must be set at the beginning of each simulation cycle. Internally, the simulator runs Gillespie's algorithm, but the displayed time intervals between events are stretched for better visualization. In the simulator, a mutation is treated as a theoretical, standardized unit mutation. The concept is admittedly ambiguous, but I think it is fine for a toy project.
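For readers unfamiliar with Gillespie's algorithm, the sketch below shows the core loop: draw an exponential waiting time from the total event rate, then pick the event type in proportion to its rate. It only tracks the number of living sub-species and assumes that every living sub-species shares the same per-lineage rates. The function names and the default values (taken from the simulator's default settings) are illustrative, not the simulator's actual code.

```python
import random


def gillespie_step(n_alive, birth, mutation, death, rng):
    """Return (waiting_time, event) for one step, or None if nothing can happen.

    ASSUMPTION: each living sub-species experiences every event at the same
    per-lineage rate, so the total rate is n_alive * (birth + mutation + death).
    """
    per_lineage = birth + mutation + death
    total = n_alive * per_lineage
    if total <= 0:
        return None
    wait = rng.expovariate(total)      # exponential waiting time to the next event
    u = rng.random() * per_lineage     # choose the event type by its share of the rate
    if u < birth:
        event = "birth"
    elif u < birth + mutation:
        event = "mutation"
    else:
        event = "death"
    return wait, event


def simulate(birth=0.4, mutation=0.4, death=0.1, steps=25, seed=1):
    """Run up to `steps` events and log (time, event, number alive)."""
    rng = random.Random(seed)
    t, alive, log = 0.0, 1, []
    for _ in range(steps):
        step = gillespie_step(alive, birth, mutation, death, rng)
        if step is None:               # everything is extinct: stop early
            break
        wait, event = step
        t += wait
        # Births and mutations both add a sub-species; a death removes one.
        alive += -1 if event == "death" else 1
        log.append((t, event, alive))
    return log


history = simulate()
```

Stretching the displayed intervals, as the simulator does, would only change how `t` is drawn on screen, not the order of events.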

Below is a list of the parameters you can play with:

Birth Rate: The rate at which new sub-species are born without a meaningful mutation.
Mutation Rate: The rate at which new sub-species are born with an accumulated meaningful mutation.
Death Rate: The rate at which an entire sub-species goes extinct.
Simulation Steps: The number of steps the simulation will run before stopping. Cannot be changed during a simulation.
Coarsen Level: The number of consecutive mutations needed to form a new species. Can be changed at any time.
Trait Dimensions: The number of traits each sub-species has. Cannot be changed once the simulation has started.
Mutation Model: The model used to simulate gradual trait evolution. Currently, only Brownian Motion is available.
Show Branch Lengths: Whether to show branch lengths in the trees. Can be toggled at any time.
Prune Extinct Lineages: Whether to remove extinct lineages from the trees. Can be toggled at any time.
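As an illustration of what pruning extinct lineages involves, the sketch below removes extinct tips from a tree written as nested tuples and collapses any internal node left with a single child. The nested-tuple representation and the function name `prune` are assumptions for this example; the simulator may represent and prune its trees differently.

```python
def prune(tree, extinct):
    """Drop extinct tips from a nested-tuple tree; return None if nothing survives."""
    if not isinstance(tree, tuple):
        # A tip: keep it only if it is still alive.
        return None if tree in extinct else tree
    kept = [p for p in (prune(child, extinct) for child in tree) if p is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]      # collapse a node left with a single child
    return tuple(kept)


# Hypothetical four-tip tree in which tip B has gone extinct.
full_tree = (("A", "B"), ("C", "D"))
pruned = prune(full_tree, extinct={"B"})
```

Here the clade that contained A and B collapses to the single tip A, while the clade of C and D is untouched.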

You can pause and resume the simulation at will. If all the sub-species go extinct, the simulation stops and you must reset the simulator manually.

Tianjian Qin