ATCOECIS FAQ

Frequently Asked Questions

Enrichment fold and p-value? Only GO categories/motifs with two or more genes in the input set and showing enrichment compared to the background frequency (in the complete genome) are reported. Enrichment fold is the ratio of the GO/motif frequency in the gene set (or cluster) over the background GO/motif frequency. P-values are calculated using the hypergeometric distribution and are not corrected for multiple hypothesis testing.
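
As an illustration of the two statistics described above, here is a minimal Python sketch. It assumes the p-value is the upper tail of the hypergeometric distribution (the probability of seeing at least the observed number of annotated genes), which is the usual choice for enrichment but is not stated explicitly in this FAQ. The function and parameter names (k, n, K, N) are hypothetical and not part of ATCOECIS.

```python
from math import comb

def enrichment_fold(k, n, K, N):
    """Frequency in the gene set (k/n) divided by the background frequency (K/N).

    k: genes in the input set carrying the GO label or motif
    n: size of the input set (or cluster)
    K: genes in the genome carrying the GO label or motif
    N: number of genes in the genome (background)
    """
    return (k / n) / (K / N)

def hypergeom_pvalue(k, n, K, N):
    """Upper-tail probability P(X >= k) for a draw of n genes without replacement.

    Assumption: enrichment is tested with the upper tail. No multiple-testing
    correction is applied, matching the description in the FAQ.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total

def is_reported(k, n, K, N):
    """Reporting rule from the FAQ: two or more genes and enrichment over background."""
    return k >= 2 and enrichment_fold(k, n, K, N) > 1.0
```

For a toy genome of 10 genes with 4 annotated genes, an input set of 3 genes containing 2 annotated genes has an enrichment fold of (2/3)/(4/10) and an upper-tail p-value of 40/120.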

Motif source and mappings? Known plant cis-regulatory elements were retrieved from PLACE (Higo et al., 1999) and AGRIS (Palaniswamy et al., 2006). A complementary set of elements was identified using the network-level conservation principle, which applies a systems-level constraint (Elemento and Tavazoie, 2005; see manuscript for details). Using motif conservation over orthologous genes between Arabidopsis and poplar (Populus trichocarpa), 866 nonredundant 8-mer motifs with significant Network-level Conservation Scores (NCS; P value < 0.05) were identified. Although the network-level conservation method provides an elegant way to uncover candidate cis-regulatory elements, identifying individual biologically functional motif instances on promoter sequences remains problematic. In particular, the short and sometimes degenerate nature of these 8-mers (or TF-binding sites in general) yields a large fraction of false-positive motif matches. Therefore, for NCS motifs, we only considered Arabidopsis instances showing evolutionary conservation in one or more orthologous poplar promoters. This filtering step yielded overall higher enrichment values when validating motif instances using GO (cf. manuscript Table II). In contrast, for known experimentally defined plant motifs from PLACE and AGRIS, all motif instances on Arabidopsis promoters were retained for further analysis. Motif mapping was done using dna-pattern (RSA tools; van Helden et al., 2000) and was restricted to the first 1,000 bp upstream of the translation start site, or to a shorter region if the adjacent upstream gene lies less than 1,000 bp away.
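
The promoter window rule and the conservation filter for NCS motifs can be sketched as follows. This is only an illustration: it assumes that "conservation" means the same motif also occurs in the promoter of at least one orthologous poplar gene (motif position is ignored), and all function names and gene identifiers are hypothetical. The actual mapping was done with dna-pattern, which is not reproduced here.

```python
def promoter_window_length(gap_to_upstream_gene, max_len=1000):
    """Length (bp) of the region scanned upstream of the translation start site.

    The region is capped at max_len and shortened when the adjacent upstream
    gene lies closer than max_len.
    """
    return max(0, min(max_len, gap_to_upstream_gene))

def keep_conserved(ath_genes_with_motif, poplar_genes_with_motif, orthologs):
    """Keep Arabidopsis motif instances supported by at least one poplar ortholog.

    ath_genes_with_motif:    set of Arabidopsis genes whose promoter has the motif
    poplar_genes_with_motif: set of poplar genes whose promoter has the motif
    orthologs:               dict mapping an Arabidopsis gene to its poplar orthologs
    """
    kept = set()
    for gene in ath_genes_with_motif:
        if any(p in poplar_genes_with_motif for p in orthologs.get(gene, ())):
            kept.add(gene)
    return kept
```

Under this sketch, the filter would be applied to NCS motifs only; for PLACE and AGRIS motifs every Arabidopsis instance is kept, as described above.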

Orthologous promoters? Orthologous groups were identified through protein clustering using OrthoMCL (Li et al., 2003). Starting from an all-against-all BLASTP sequence similarity search using the full proteomes of Arabidopsis (26,541 proteins) and P. trichocarpa (45,554 proteins), 11,707 orthologous clusters were defined, covering 18,088 Arabidopsis and 22,760 poplar genes. These orthologous groups contain inparalogous genes (i.e. genes duplicated after the divergence between Arabidopsis and P. trichocarpa) and thus offer a more realistic representation of orthology compared to, for example, reciprocal best hit approaches.
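
To show why a reciprocal best hit approach can miss inparalogs, here is a small Python sketch of reciprocal best hits on toy similarity scores. It is not OrthoMCL and does not reproduce its clustering; the gene identifiers, scores, and function names are hypothetical.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict mapping (query, subject) pairs to a similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}

# Toy case: one Arabidopsis gene and two poplar genes duplicated after speciation.
ath_vs_poplar = {("A1", "P1a"): 500, ("A1", "P1b"): 480}
poplar_vs_ath = {("P1a", "A1"): 500, ("P1b", "A1"): 480}
pairs = reciprocal_best_hits(ath_vs_poplar, poplar_vs_ath)
```

In this toy case only the pair (A1, P1a) survives, and the second poplar copy P1b is dropped even though it is equally a co-ortholog of A1. An orthologous group that includes inparalogs would keep both poplar genes.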

Search gene coexpression neighborhoods using your genes as input? The Simple option reports, for each returned gene, the number of matching genes (from the input set). The Advanced option calculates, for each returned gene, a score that combines the number of matching genes (from the input set) with the strength of the coexpression relationships.
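
The difference between the two options can be illustrated with a short Python sketch. The exact formula behind the Advanced score is not given in this FAQ, so the version below simply sums the coexpression strengths of the matching genes; this is an assumption made for illustration only, and the function names and gene identifiers are hypothetical.

```python
def simple_score(neighbors, input_genes):
    """Number of coexpression neighbors that belong to the input set.

    neighbors:   dict mapping a neighbor gene to its coexpression strength
    input_genes: set of genes supplied by the user
    """
    return sum(1 for gene in neighbors if gene in input_genes)

def advanced_score(neighbors, input_genes):
    """Assumed combination: sum of coexpression strengths over matching genes.

    This grows with both the number of matches and the strength of each
    relationship, but it is not the documented ATCOECIS formula.
    """
    return sum(w for gene, w in neighbors.items() if gene in input_genes)

# Hypothetical neighborhood of one returned gene.
neighborhood = {"G1": 0.9, "G2": 0.7, "G3": 0.8}
query = {"G1", "G2"}
```

With these toy values the simple count is 2, while the assumed advanced score also reflects how strongly G1 and G2 are coexpressed with the returned gene.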

Reference? Unraveling transcriptional control in Arabidopsis using cis-regulatory elements and coexpression networks. (2009) Vandepoele, K., Quimbaya, M., Casneuf, T., De Veylder, L. and Van de Peer, Y. Plant Physiology.

Questions or remarks? Mail Klaas Vandepoele
or visit our Bioinformatics & Evolutionary Genomics homepage