The gain and loss of genes during 600 million years of vertebrate evolution.

Blomme, T., Vandepoele, K., De Bodt, S., Simillion, C., Maere, S., Van de Peer, Y.

Corresponding author:


Gene duplication is assumed to have played a crucial role in the evolution of vertebrate organisms. Apart from a continuous mode of duplication, two or three whole genome duplication events have been proposed during the evolution of vertebrates, one or two at the dawn of vertebrate evolution, and an additional one in the fish lineage, not shared with land vertebrates. Here, we have studied gene gain and loss in seven different vertebrate genomes, spanning an evolutionary period of about 600 million years.

We show that: first, the majority of duplicated genes in extant vertebrate genomes are ancient and were created at times that coincide with proposed whole genome duplication events; second, there exist significant differences in gene retention for different functional categories of genes between fishes and land vertebrates; third, the retention of regulatory genes seems to depend considerably on the mode of gene duplication (whole genome duplication events compared to smaller-scale events), which is in accordance with the so-called gene balance hypothesis; and fourth, ancient duplicates that have survived for many hundreds of millions of years can still be lost.

Based on phylogenetic analyses, we show that both the mode of duplication and the functional class the duplicated genes belong to have been of major importance for the evolution of the vertebrates. In particular, we provide evidence that massive gene duplication (probably as a consequence of entire genome duplications) at the dawn of vertebrate evolution might have been particularly important for the evolution of complex vertebrates.

Supplementary Data

Vertebrate phylogenetic trees

Access the Vertebrate phylogenetic trees application here.

The search criterion can be chosen at the top left of the page:

  • tree number: in the analysis 10,086 tree families were studied, but unfortunately we were not able to build a phylogenetic tree for all of them (see manuscript). If the chosen tree number exists in the dataset, the number appears as a link in the section "Search results" after clicking "Start search". If the tree with the chosen number does not exist, the "Search results" section will say "no results found".
  • Ensembl protein ID: the protein dataset was created based on Ensembl data (see manuscript). An Ensembl protein ID may be filled in and if this protein is present in the trees, the corresponding tree number will be shown in the "Search results" section.
  • function (GO): all trees were functionally annotated with GOslim labels (see manuscript). This functional annotation can be searched by GOslim label (GO:x) or by the description of the GO label.
  • duplication time point: in all trees, duplication events were identified and evaluated by relative dating (see manuscript). The numbers of the time points with identified duplication events are displayed by clicking on "Show time points on vertebrate tree".
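The four search criteria above can be pictured as four lookups over the same set of annotated trees. The following is only a minimal sketch of that idea, not the application's actual code: the `Tree` record, the `search` function, the criterion names used as keys, and the protein IDs are all hypothetical, and the data are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Tree:
    """Hypothetical record for one phylogenetic tree in the dataset."""
    number: int
    protein_ids: Set[str]                    # Ensembl protein IDs in the tree
    go_labels: Dict[str, str]                # GOslim ID -> description
    time_points: Set[int] = field(default_factory=set)  # dated duplications


def search(trees: List[Tree], criterion: str, value: str) -> List[int]:
    """Return the sorted tree numbers that match one search criterion."""
    if criterion == "tree number":
        match = lambda t: t.number == int(value)
    elif criterion == "protein":
        match = lambda t: value in t.protein_ids
    elif criterion == "function":
        needle = value.lower()
        # Match either the exact GO ID or a substring of the description.
        match = lambda t: any(
            needle == go_id.lower() or needle in desc.lower()
            for go_id, desc in t.go_labels.items()
        )
    elif criterion == "time point":
        match = lambda t: int(value) in t.time_points
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return sorted(t.number for t in trees if match(t))


# Made-up example data; the protein IDs are placeholders, not real entries.
trees = [
    Tree(1, {"PROTEIN_A", "PROTEIN_B"}, {"GO:0003677": "DNA binding"}, {2}),
    Tree(3, {"PROTEIN_C"}, {"GO:0003824": "catalytic activity"}),
]

hits = search(trees, "function", "dna binding")
print(hits if hits else "no results found")
```

An empty list corresponds to the "no results found" message in the "Search results" section, for example when a tree number is requested for which no tree could be built.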

By clicking the links in the "Search results" section, the image of the tree and its functional annotation are shown in the center of the page. By clicking the protein IDs and the nodes (blue dots) in the tree image, additional information is shown on the right side of the page. For proteins, the gene description and the organism are available. For the nodes, the bootstrap value and the type of node ("duplication" or "speciation") are shown. In case of a duplication event, the relative dating result is also shown ("duplication time point"), if available. If loss was identified after the duplication event, the number of loss events is shown ("loss x"). Finally, if the node is the root of an orthologous group (the largest possible one in the shown tree topology), this is shown in the last line ("orthologous group").
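The rules for what appears when a node is clicked can be summarised in a short sketch. This is an illustration under assumptions only: the `Node` record, its field names, and the `describe` function are hypothetical and do not reflect how the application stores or renders its data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical annotation attached to one internal tree node."""
    bootstrap: int
    kind: str                                # "duplication" or "speciation"
    duplication_time_point: Optional[int] = None  # None if dating failed
    losses: int = 0                          # loss events after duplication
    starts_orthologous_group: bool = False


def describe(node: Node) -> List[str]:
    """Build the lines of information shown for a clicked node."""
    lines = [f"bootstrap: {node.bootstrap}", f"type: {node.kind}"]
    if node.kind == "duplication":
        # Dating and loss information only apply to duplication nodes.
        if node.duplication_time_point is not None:
            lines.append(
                f"duplication time point: {node.duplication_time_point}"
            )
        if node.losses > 0:
            lines.append(f"loss {node.losses}")
    if node.starts_orthologous_group:
        lines.append("orthologous group")
    return lines


# Made-up example node: a dated duplication followed by one loss event.
for line in describe(Node(95, "duplication", duplication_time_point=3, losses=1)):
    print(line)
```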

Protein descriptions

The list with descriptions (Ensembl) of the proteins can be downloaded here.

VIB / UGent
Bioinformatics & Evolutionary Genomics
Technologiepark 927
B-9052 Gent
+32 (0) 9 33 13807 (phone)
+32 (0) 9 33 13809 (fax)

Don't hesitate to contact us in case of problems with the website!