The usual way to demonstrate saturation in nucleotide sequences is to plot the fraction of differences between sequences against the evolutionary distance separating them. When the number of observed differences (for example, at third codon positions) no longer increases with increasing evolutionary distance, the sequences are said to be saturated. The same technique can be applied to amino acid (aa) sequences. We have developed a Java application called ASaturA that discriminates between aa substitutions with high and low probabilities of occurrence. All aa replacements are classified as either 'frequent' or 'rare' depending on their substitution probabilities, which are inferred from substitution probability matrices such as the well-known PAM and BLOSUM matrices. These 20 by 20 matrices provide the empirically derived probabilities of one aa being replaced by another when sequences have diverged over a certain evolutionary distance. ASaturA sorts all substitutions according to these probabilities, and a probability 'cut-off' value can be chosen to separate frequent from rare substitutions. For each sequence pair, the program plots the number of observed frequent and rare aa replacements against the evolutionary distance between the two sequences. Modifying the 'cut-off' value changes the number of aa substitutions classified as frequent or rare. Ideally, careful selection of the 'cut-off' value splits the original data set into a saturated and an unsaturated one. In addition to the most widely used substitution probability matrices, such as PAM, BLOSUM, mtREV24 and JTT, user-defined matrices can also be used.
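The classification step described above can be illustrated with a minimal Python sketch. This is not the ASaturA source code: the score table `TOY_SCORES`, the function names and the symmetric treatment of residue pairs are all assumptions made for illustration, and the values are invented rather than taken from PAM or BLOSUM.

```python
# Hypothetical toy substitution scores for a few residue pairs.
# These numbers are made up for illustration; they are NOT real PAM/BLOSUM values.
TOY_SCORES = {
    frozenset("IL"): 0.9,
    frozenset("IV"): 0.8,
    frozenset("DE"): 0.7,
    frozenset("KR"): 0.6,
    frozenset("WC"): 0.05,
    frozenset("GW"): 0.02,
}


def classify(a, b, cutoff, scores=TOY_SCORES, default=0.0):
    """Label the replacement a<->b as 'frequent' or 'rare' for a given cut-off.

    Pairs missing from the table get `default` (an assumption of this sketch).
    """
    score = scores.get(frozenset((a, b)), default)
    return "frequent" if score >= cutoff else "rare"


def count_replacements(seq1, seq2, cutoff):
    """Count frequent and rare replacements between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"frequent": 0, "rare": 0}
    for a, b in zip(seq1, seq2):
        # Skip identical residues and alignment gaps.
        if a == b or "-" in (a, b):
            continue
        counts[classify(a, b, cutoff)] += 1
    return counts


if __name__ == "__main__":
    # Raising the cut-off moves replacements from 'frequent' to 'rare'.
    for cutoff in (0.5, 0.85):
        print(cutoff, count_replacements("ILDKW", "LVEKC", cutoff))
```

Plotting the two counts for every sequence pair against the pairwise evolutionary distance, and repeating this for several cut-off values, would give the kind of saturation plot described above.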
Once the fraction of aa replacements that is probably saturated has been
estimated, evolutionary distances between sequences can be computed from
the unsaturated fraction of sites (i.e., the 'rare' sites), and evolutionary
trees can be computed automatically from these distance values by neighbour
joining. ASaturA is available from the authors upon request. To run
ASaturA, the Java™ Runtime Environment (version 1.4 or later) must also be
installed; it can be downloaded separately.
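As a rough illustration of computing distances from the 'rare' replacements only, the following Python sketch builds a pairwise distance matrix. The source does not say which distance formula ASaturA applies, so the uncorrected proportion used here (rare replacements divided by ungapped compared sites) is an assumption, as are the names `FREQUENT_PAIRS`, `rare_distance` and `distance_matrix`.

```python
from itertools import combinations

# Hypothetical set of replacement pairs classified as 'frequent'.
# In practice this set would follow from the chosen matrix and cut-off value.
FREQUENT_PAIRS = {frozenset("IL"), frozenset("DE"), frozenset("KR")}


def rare_distance(seq1, seq2, frequent_pairs=FREQUENT_PAIRS):
    """Uncorrected distance counting only 'rare' replacements (an assumed formula)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    rare = 0
    for a, b in zip(seq1, seq2):
        if "-" in (a, b):
            continue  # ignore gapped positions
        compared += 1
        if a != b and frozenset((a, b)) not in frequent_pairs:
            rare += 1
    return rare / compared if compared else 0.0


def distance_matrix(seqs):
    """Build a symmetric distance matrix (dict of dicts) from named aligned sequences."""
    names = list(seqs)
    matrix = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        matrix[n][m] = matrix[m][n] = rare_distance(seqs[n], seqs[m])
    return matrix


if __name__ == "__main__":
    alignment = {"a": "ILDKW", "b": "LLEKC", "c": "ILDKW"}
    for name, row in distance_matrix(alignment).items():
        print(name, row)
```

A matrix of this form is the kind of input a neighbour-joining implementation expects; the tree-building step itself is left out of this sketch.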
A collection of substitution matrices (you need them!)
Van de Peer, Y., Frickey, T., Taylor, J.S., Meyer, A. (2002) Dealing with saturation at the amino acid level: A case study involving anciently duplicated zebrafish genes. Gene 295(2):205-11.
VIB / UGent
Bioinformatics & Evolutionary Genomics
+32 (0) 9 33 13807 (phone)
+32 (0) 9 33 13809 (fax)