A Greedy, Graph-Based Algorithm for the Alignment of Multiple Homologous Gene Lists.
Many comparative genomics studies rely on the correct identification of homologous genomic regions using accurate alignment tools. In this paper, the alignment of multiple homologous genomic segments, represented as ordered gene lists, is studied. The alphabet of the input sequences consists of complete genes, rather than nucleotides or amino acids. As optimal multiple sequence alignment is computationally impractical, a progressive alignment strategy is often employed. However, such an approach is susceptible to the propagation of alignment errors in early pairwise alignment steps, especially when dealing with strongly diverged sequences. The alignment of multiple gene lists can be cast into a cycle-canceling problem in a graph, allowing for an accurate and efficient algorithm.
Based on provable properties of the graph structure, several heuristics are developed to resolve local alignment conflicts that occur due to gene duplication and/or rearrangement events on the different segments. The performance of the algorithm is assessed by comparing the alignment results of homologous genomic segments in Arabidopsis thaliana to those obtained by using both a progressive alignment method and an earlier graph-based implementation. Especially for datasets that contain strongly diverged segments, the proposed method achieves a substantially higher alignment accuracy, and proves to be sufficiently fast for large datasets including a few dozens of eukaryotic genomes.
The algorithm is implemented as a part of the i-ADHoRe 3.0 package.
* Fostier, J., * Proost, S., Dhoedt, B., Saeys, Y., Demeester, P., Van de Peer, Y., Vandepoele, K. (2011) A Greedy, Graph-Based Algorithm for the Alignment of Multiple Homologous Gene Lists. Bioinformatics 27, 749-56. *contributed equally
VIB / UGent
Bioinformatics & Evolutionary Genomics
+32 (0) 9 33 13807 (phone)
+32 (0) 9 33 13809 (fax)