Dealing with saturation at the amino acid level: A case study involving anciently duplicated zebrafish genes.
The ray-finned fishes (Actinopterygii) seem to have two copies of many tetrapod (Sarcopterygii) genes. The origin of these duplicate fish genes is the subject of some controversy. One explanation for these extra fish genes could be an increased rate of independent gene duplication in fishes. Alternatively, the fish gene duplicates may have been formed during a complete genome duplication event in the ancestor of all or most Actinopterygii. A third possibility is that, after gene or genome duplication events in the common ancestor of both lineages, tetrapods lost more genes than fish did. These three hypotheses can be tested by phylogenetic reconstruction. Previously, we found that many anciently duplicated zebrafish genes are sister sequences in evolutionary trees, suggesting that they were produced in Actinopterygii after the divergence of Sarcopterygii [Phil. Trans. R. Soc. Lond. B 356 (2001) 119]. On the other hand, several well-supported trees showed one of the two fish genes as the sister sequence to a monophyletic clade that included the second fish gene and genes from frog, chicken, mouse and human. These so-called 'outgroup' topologies suggest that the origin of many fish duplicates predates the divergence of Sarcopterygii and Actinopterygii, and support the hypothesis that tetrapods have lost duplicates that have been retained in fish. Here we show that many of these 'outgroup' tree topologies are erroneous and can be corrected when mutational saturation is taken into account. To this end, we developed a Java-based application that visualizes the amount of saturation in amino acid sequences. The program plots the number of observed frequent and rare amino acid replacements between pairs of sequences against their overall evolutionary distance. Frequent and rare replacements are distinguished using substitution probability matrices (e.g. PAM and BLOSUM).
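The frequent/rare classification described above can be illustrated with a short Python sketch. This is not the authors' Java program: it assumes BLOSUM62 as the substitution matrix, treats a replacement as "frequent" when its BLOSUM62 score is above a hypothetical `cutoff` (0 by default) and "rare" otherwise, and uses the uncorrected p-distance as a stand-in for the overall evolutionary distance. The function names `parse_matrix` and `count_replacements` are illustrative only.

```python
# Minimal sketch: count "frequent" vs. "rare" amino acid replacements between
# two aligned protein sequences. ASSUMPTIONS (not from the paper): BLOSUM62 is
# the matrix, and a replacement is "frequent" if its score is > cutoff.

BLOSUM62_TEXT = """
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""


def parse_matrix(text):
    """Parse a whitespace-separated square matrix into {(aa1, aa2): score}."""
    rows = [line.split() for line in text.strip().splitlines()]
    header = rows[0]
    return {(row[0], aa): int(score)
            for row in rows[1:]
            for aa, score in zip(header, row[1:])}


BLOSUM62 = parse_matrix(BLOSUM62_TEXT)


def count_replacements(seq1, seq2, matrix=BLOSUM62, cutoff=0):
    """Return (p_distance, n_frequent, n_rare) for two aligned sequences.

    Columns with a gap or a residue missing from the matrix are skipped.
    A replacement scoring > cutoff counts as frequent, otherwise as rare.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    alphabet = {a for a, _ in matrix}
    compared = frequent = rare = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in alphabet or b not in alphabet:
            continue  # gap ('-') or ambiguous residue such as 'X'
        compared += 1
        if a == b:
            continue  # identical residues: no replacement at this site
        if matrix[(a, b)] > cutoff:
            frequent += 1
        else:
            rare += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return (frequent + rare) / compared, frequent, rare
```

For example, `count_replacements("ILKDWC", "VLRDGC")` counts I/V and K/R as frequent and W/G as rare, giving `(0.5, 2, 1)`. Repeating the call over every sequence pair in an alignment yields the points for a frequent/rare versus distance plot of the kind the program displays.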
Evolutionary distances between sequences can be computed from the unsaturated fraction of sites only, and evolutionary trees can then be inferred from these distances by pairwise distance methods. When trees are computed with the saturated fraction of sites omitted, most fish duplicates are sister sequences.
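The distance step described above can be sketched in Python under several stated assumptions. The mask of unsaturated sites (`KEEP`) is simply supplied by the caller, because the criterion for flagging saturated sites is not reproduced here; the Poisson correction d = -ln(1 - p) stands in for whatever amino acid distance the authors used; and plain neighbor joining represents "pairwise distance methods". The four sequences are toy data invented for illustration, not data from the paper, and all function names are hypothetical.

```python
import math


def poisson_distance(s1, s2, keep):
    """Poisson-corrected distance d = -ln(1 - p) over kept, gap-free sites."""
    if not (len(s1) == len(s2) == len(keep)):
        raise ValueError("sequences and mask must have the same length")
    sites = [i for i, k in enumerate(keep) if k and s1[i] != "-" and s2[i] != "-"]
    if not sites:
        raise ValueError("no usable sites left after masking")
    p = sum(s1[i] != s2[i] for i in sites) / len(sites)
    if p >= 1.0:
        raise ValueError("sequences differ at every kept site; distance undefined")
    return -math.log(1.0 - p)


def distance_matrix(seqs, keep):
    """Dict-of-dicts of pairwise distances computed on the kept sites only."""
    return {a: {b: poisson_distance(seqs[a], seqs[b], keep) for b in seqs if b != a}
            for a in seqs}


def neighbor_joining(dist):
    """Plain neighbor joining; returns an unrooted Newick topology (no lengths)."""
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    nodes = list(d)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        new = f"({a},{b})"
        d[new] = {}
        for c in nodes:
            if c not in (a, b):
                # Distance from the new node: (d(a, c) + d(b, c) - d(a, b)) / 2.
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    return "(" + ",".join(nodes) + ");"  # final three nodes meet at one center


# Toy data (illustration only): the last six columns are treated as saturated
# and happen to make fishA resemble mouse and fishB resemble human.
SEQS = {
    "fishA": "ACDEFG" + "WWWWWW",
    "mouse": "ACKLNP" + "WWWWWW",
    "fishB": "ACDEFH" + "YYYYYY",
    "human": "ACKLNQ" + "YYYYYY",
}
KEEP = [True] * 6 + [False] * 6  # hypothetical mask: True = unsaturated site
```

On this toy input, `neighbor_joining(distance_matrix(SEQS, [True] * 12))` groups fishA with mouse (and fishB with human), whereas `neighbor_joining(distance_matrix(SEQS, KEEP))` makes the two fish sequences a cherry, a miniature version of the effect the abstract describes.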
Van de Peer, Y., Frickey, T., Taylor, J.S., Meyer, A. (2002) Dealing with saturation at the amino acid level: A case study involving anciently duplicated zebrafish genes. Gene 295(2):205-11.
VIB / UGent
Bioinformatics & Evolutionary Genomics