Identification of novel regulatory modules in dicot plants using expression data and comparative genomics.

Vandepoele, K., Casneuf, T., Van de Peer, Y.

Corresponding author:


Transcriptional regulation plays an important role in the control of many biological processes. Transcription factor binding sites (TFBS) are the functional elements that determine transcriptional activity and are organized into separate cis-regulatory modules, each defining the cooperation of several transcription factors required for a specific spatio-temporal expression pattern. Consequently, the discovery of novel TFBS in promoter sequences is an important step toward improving our understanding of gene regulation.

Here, we applied a detection strategy that combines features of classical motif overrepresentation approaches in co-regulated genes with general comparative footprinting principles to identify biologically relevant regulatory elements and modules in Arabidopsis thaliana, a model system for plant biology. In total, we identified 80 TFBS and 139 regulatory elements that could be linked to important biological processes such as protein biosynthesis, cell cycle control, photosynthesis and embryonic development. Moreover, studying the physical properties of some specific regulatory modules revealed that Arabidopsis promoters are compact, with cooperative TFBS located in close proximity to each other.

These results create a starting point to unravel regulatory networks in plants and to study the regulation of biological processes from a systems biology point of view.

Supplementary Data

  • Interactive search tool: click here
  • Raw files
    • Transcription Factor Binding Site (TFBS) matrices
      Position Weight Matrices (PWMs) compatible with the MotifSampler format can be found here.
    • TFBS target genes
      A GFF file with the matches of all TFBS on the Arabidopsis upstream regions can be found here (20MB). The same information is also available in a tab-delimited gene-TFBS matrix file (6MB), which lists for each gene its description and the PWMs matching its upstream region. The corresponding upstream regions are available here (multi-fasta file).
    • cis-regulatory modules (CRM)
      The TFBS content of the different modules, as well as their GO enrichment, can be found here. Note that this table corresponds to Supplemental Table 3.
    • CRM target genes
      An overview of the target genes for the different modules can be found here (6.2MB). Exact coordinates of cooperative TFBS can be found in the GFF file.
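
For readers who want to work with the GFF file directly, the following Python sketch groups TFBS matches by sequence and reports pairs of different motifs that lie close together. It is only an illustration under stated assumptions: it assumes the standard nine-column, tab-delimited GFF layout, that the first column identifies the gene whose upstream region was scanned, and that the third column names the matching PWM. The actual column usage in the supplementary file may differ and should be checked first. The function names, the 50 bp threshold and the example rows are invented for illustration and are not taken from the data.

```python
from collections import defaultdict


def tfbs_by_gene(gff_lines):
    """Group TFBS matches by sequence ID from GFF-style lines.

    Assumption: column 1 holds the gene/upstream-region ID and
    column 3 holds the motif (PWM) name.
    """
    matches = defaultdict(list)
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete GFF record
        seqid, _source, motif, start, end, _score, strand = fields[:7]
        matches[seqid].append((motif, int(start), int(end), strand))
    return dict(matches)


def close_pairs(hits, max_gap=50):
    """Return pairs of different motifs whose start positions are
    at most max_gap bp apart (hypothetical proximity criterion)."""
    ordered = sorted(hits, key=lambda h: h[1])
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second[1] - first[1] > max_gap:
                break  # later hits are even further away
            if first[0] != second[0]:
                pairs.append((first[0], second[0]))
    return pairs


# Invented example rows, not real matches from the supplementary file.
example = [
    "##gff-version 2",
    "gene_A\tscan\tmotif_1\t100\t107\t.\t+\t.\tnote",
    "gene_A\tscan\tmotif_2\t130\t138\t.\t-\t.\tnote",
    "gene_A\tscan\tmotif_3\t400\t406\t.\t+\t.\tnote",
    "gene_B\tscan\tmotif_1\t50\t57\t.\t+\t.\tnote",
]

by_gene = tfbs_by_gene(example)
pairs_gene_a = close_pairs(by_gene["gene_A"])
```

For the 20MB file, passing an open file handle instead of a list keeps memory use low, since the lines are consumed one at a time.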

