Modelling the ancestral sequence distribution and model frequencies in context-dependent models for primate non-coding sequences

Background
Recent approaches for context-dependent evolutionary modelling assume that the evolution of a given site depends upon its ancestor and that ancestor's immediate flanking sites. Because such a dependency pattern cannot be imposed on the root sequence, we consider the use of different orders of Markov chains to model dependence at the ancestral root sequence. Root distributions which are coupled to the context-dependent model across the underlying phylogenetic tree are deemed more realistic than decoupled Markov chain models, as the evolutionary process is responsible for shaping the composition of the ancestral root sequence.

Results
We find strong support, in terms of Bayes Factors, for using a second-order Markov chain at the ancestral root sequence along with a context-dependent model throughout the remainder of the phylogenetic tree in an ancestral repeats dataset, and for using a first-order Markov chain at the ancestral root sequence in a pseudogene dataset. Relaxing the assumption of a single context-independent set of independent model frequencies, as in Baele et al., yields a further drastic increase in model fit. We show that the substitution rates associated with the CpG-methylation-deamination process can be modelled through context-dependent model frequencies and that their accuracy depends on the (order of the) Markov chain imposed at the ancestral root sequence. In addition, we provide evidence that this approach (which assumes that root distribution and evolutionary model are decoupled) outperforms an approach inspired by the work of Arndt et al., where the root distribution is coupled to the evolutionary model. We show that the continuous-time approximation of Hwang and Green has stronger support in terms of Bayes Factors, but the parameter estimates show minimal differences.

Conclusions
We show that the combination of a dependency scheme at the ancestral root sequence and a context-dependent evolutionary model across the remainder of the tree allows for accurate estimation of the model's parameters. The different assumptions tested in this manuscript clearly show that designing accurate context-dependent models is a complex process, with many different assumptions that require validation. Further, these assumptions are shown to change across different datasets, making the search for an adequate model for a given dataset quite challenging.

Baele, G., Van de Peer, Y., Vansteelandt, S. (2010) Modelling the ancestral sequence distribution and model frequencies in context-dependent models for primate non-coding sequences. BMC Evol. Biol. 10:244.
