Klaas Vandepoele

Principal investigator

Department of Plant Biotechnology and Bioinformatics (WE09), Ghent University
VIB Center for Plant Systems Biology
Technologiepark 71, 9052 Ghent, Belgium
Phone +00 32 9 3313822
E-mail: firstname.lastnameATpsb.vib-ugent.be

Short curriculum

Birthdate: 21 May 1978

February 2018 – current
VIB Group leader, Comparative Network Biology, Center for Plant Systems Biology, VIB

February 2016 – current
Associate Professor Ghent University. Department of Plant Biotechnology and Bioinformatics (Ghent University) – VIB Center for Plant Systems Biology

February 2011 – January 2016
Assistant Professor Ghent University (BOF-Tenure Track), MRP project: From Nucleotides to Networks. Department of Plant Biotechnology and Bioinformatics (Ghent University) – VIB Department of Plant Systems Biology

April 2005 – January 2011
Postdoctoral fellow FWO Vlaanderen, “Evolutionary analysis of transcriptional regulation in plants and other eukaryotic organisms”. Bioinformatics and Systems Biology division, Department of Plant Biotechnology and Genetics (Ghent University) – VIB Department of Plant Systems Biology

Visiting scientist at Lab Observatoire Océanologique de Banyuls Laboratoire Arago, Banyuls-sur-Mer, France

  • Collaboration with G. Piganeau (June 2006 and October 2007). “Structure of regulatory regions in the genomes of unicellular eukaryotes”
  • Collaboration with H. Moreau and G. Piganeau (November 2009 and July 2010). “pico-PLAZA, a resource for algal comparative genomics”


Ghent University, Ph. D. student, Bioinformatics and Evolutionary Genomics division, Department of Molecular Genetics (Plant Systems Biology – VIB)
Specialization grant IWT (Flemish government institution)
Dissertation: “Mode and tempo of gene and genome evolution in plants” (March 17th 2005)


Ghent University, Bachelor of Chemistry
Ghent University, Master of Biotechnology

Service as Reviewer and Editor

  • Journals: Bioinformatics, BMC Bioinformatics, BMC Evolutionary Biology, BMC Genomics, BMC Plant Biology, BMC Research Notes, Current Opinion in Plant Biology, 'Development, Genes and Evolution', In Silico Biology, Journal of Experimental Botany, Journal of Molecular Evolution, Molecular Genetics and Genomics, Nucleic Acids Research, The Plant Cell, The Plant Journal, Plant Physiology, Plant Physiology and Biochemistry, PLoS Genetics, Tree Genetics and Genomes, Science Signaling, Trends in Plant Science
  • Reviewer proposals Freiburg Institute for Advanced Study (FRIAS) and the University of Strasbourg Institute for Advanced Study (USIAS)
  • Reviewer COST (an intergovernmental framework for European Cooperation in Science and Technology)
  • Reviewer The Netherlands Organisation for Scientific Research (NWO)
  • Member Editorial Board TheScientificWorldJOURNAL: 2011-2012
  • Academic Editor PLoS One: 2014- current
  • Member Editorial Board http://insilicoplants.org : 2018- current

Klaas Vandepoele is a highly cited researcher (2017-2018).

LinkOut: LinkedIn - ORCID iD: orcid.org/0000-0003-4790-2725 - Google Scholar


  1. Prince, S. J., Valliyodan, B., Ye, H., Yang, M., Tai, S., Hu, W., Murphy, M., et al. (2019). Understanding genetic control of root system architecture in soybean : insights into the genetic basis of lateral root number. PLANT CELL AND ENVIRONMENT, 42(1), 212–229.
    Developing crops with better root systems is a promising strategy to ensure productivity in both optimum and stress environments. Root system architectural traits in 397 soybean accessions were characterized and a high-density single nucleotide polymorphisms (SNPs)-based genome-wide association study was performed to identify the underlying genes associated with root structure. SNPs associated with root architectural traits specific to landraces and elite germplasm pools were detected. Four loci were detected in landraces for lateral root number (LRN) and distribution of root thickness in diameter Class I with a major locus on chromosome 16. This major locus was detected in the coding region of an unknown protein, and subsequent analyses demonstrated that root traits are affected with mutated haplotypes of the gene. In the elite germplasm pool, 3 significant SNPs in alanine-glyoxylate aminotransferase, Leucine-Rich Repeat receptor/No apical meristem, and unknown functional genes were found to govern multiple traits including root surface area and volume. However, no major loci were detected for LRN in elite germplasm. Nucleotide diversity analysis found evidence of selective sweeps around the landraces LRN gene. Soybean accessions with minor and mutated allelic variants of LRN gene were found to perform better in both water-limited and optimal field conditions.
  2. Vaneechoutte, D., & Vandepoele, K. (2019). Curse : building expression atlases and co-expression networks from public RNA-Seq data. BIOINFORMATICS.
    Public RNA-Sequencing (RNA-Seq) datasets are a valuable resource for transcriptome analyses, but their accessibility is hindered by the imperfect quality and presentation of their metadata and by the complexity of processing raw sequencing data. The Curse suite was created to alleviate these problems. It consists of an online curation tool named Curse to efficiently build compendia of experiments hosted on the Sequence Read Archive, and a lightweight pipeline named Prose to download and process the RNA-Seq data into expression atlases and co-expression networks. Curse networks showed improved linking of functionally related genes compared to the state-of-the-art. Availability and implementation: Curse, Prose, and their manuals are available at http://bioinformatics.psb.ugent.be/webtools/Curse/. Prose was implemented in Java. Supplementary information: Supplementary data are available at Bioinformatics online.
  3. Veeckman, E., Van Glabeke, S., Haegeman, A., Muylle, H., van Parijs, F. R., Byrne, S. L., Asp, T., et al. (2019). Overcoming challenges in variant calling : exploring sequence diversity in candidate genes for plant development in perennial ryegrass (Lolium perenne). DNA RESEARCH, 26(1), 1–12.
    Revealing DNA sequence variation within the Lolium perenne genepool is important for genetic analysis and development of breeding applications. We reviewed current literature on plant development to select candidate genes in pathways that control agronomic traits, and identified 503 orthologues in L. perenne. Using targeted resequencing, we constructed a comprehensive catalogue of genomic variation for a L. perenne germplasm collection of 736 genotypes derived from current cultivars, breeding material and wild accessions. To overcome challenges of variant calling in heterogeneous outbreeding species, we used two complementary strategies to explore sequence diversity. First, four variant calling pipelines were integrated with the VariantMetaCaller to reach maximal sensitivity. Additional multiplex amplicon sequencing was used to empirically estimate an appropriate precision threshold. Second, a de novo assembly strategy was used to reconstruct divergent alleles for each gene. The advantage of this approach was illustrated by discovery of 28 novel alleles of LpSDUF247, a polymorphic gene co-segregating with the S-locus of the grass self-incompatibility system. Our approach is applicable to other genetically diverse outbreeding species. The resulting collection of functionally annotated variants can be mined for variants causing phenotypic variation, either through genetic association studies, or by selecting carriers of rare defective alleles for physiological analyses.
  4. Lama, S., Broda, M., Abbas, Z., Vaneechoutte, D., Belt, K., Säll, T., Vandepoele, K., et al. (2019). Neofunctionalization of Mitochondrial Proteins and Incorporation into Signaling Networks in Plants. (M. Purugganan, Ed.) Molecular Biology and Evolution, 36(5), 974–989.
    Because of their symbiotic origin, many mitochondrial proteins are well conserved across eukaryotic kingdoms. It is however less obvious how specific lineages have obtained novel nuclear-encoded mitochondrial proteins. Here, we report a case of mitochondrial neofunctionalization in plants. Phylogenetic analysis of genes containing the Domain of Unknown Function 295 (DUF295) revealed that the domain likely originated in Angiosperms. The C-terminal DUF295 domain is usually accompanied by an N-terminal F-box domain, involved in ubiquitin ligation via binding with ASK1/SKP1-type proteins. Due to gene duplication, the gene family has expanded rapidly, with 94 DUF295-related genes in Arabidopsis thaliana alone. Two DUF295 family subgroups have uniquely evolved and quickly expanded within Brassicaceae. One of these subgroups has completely lost the F-box, but instead obtained strongly predicted mitochondrial targeting peptides. We show that several representatives of this DUF295 Organellar group are effectively targeted to plant mitochondria and chloroplasts. Furthermore, many DUF295 Organellar genes are induced by mitochondrial dysfunction, whereas F-Box DUF295 genes are not. In agreement, several Brassicaceae-specific DUF295 Organellar genes were incorporated in the evolutionary much older ANAC017-dependent mitochondrial retrograde signaling pathway. Finally, a representative set of DUF295 T-DNA insertion mutants was created. No obvious aberrant phenotypes during normal growth and mitochondrial dysfunction were observed, most likely due to the large extent of gene duplication and redundancy. Overall, this study provides insight into how novel mitochondrial proteins can be created via “intercompartmental” gene duplication events. Moreover, our analysis shows that these newly evolved genes can then be specifically integrated into relevant, pre-existing coexpression networks.
  5. Van Bel, M., Bucchini, F., & Vandepoele, K. (2019). Gene space completeness in complex plant genomes. (S. Kelly, Ed.) CURRENT OPINION IN PLANT BIOLOGY, 48, 9–17.
    Genome annotations offer ample opportunities to study gene functions, biochemical and regulatory pathways, or quantitative trait loci in plants. Determining the quality and completeness of a genome annotation, and maintaining the balance between them, are major challenges, even for genomes of well-studied model organisms. In this review, we present a historical overview of the complexity in different plant genomes and discuss the hurdles and possible solutions in obtaining a complete and high-quality genome annotation. We illustrate there is no clear-cut answer to solve these challenges for different gene types, but provide tips on guiding the iterative process of generating a superior genome annotation, which is a moving target as our knowledge about plant genomics increases and additional data sources become available.
  6. Van Leene, J., Han, C., Gadeyne, A., Eeckhout, D., Matthijs, C., Cannoot, B., De Winne, N., et al. (2019). Capturing the phosphorylation and protein interaction landscape of the plant TOR kinase. NATURE PLANTS, 5, 316–327.
    The target of rapamycin (TOR) kinase is a conserved regulatory hub that translates environmental and nutritional information into permissive or restrictive growth decisions. Despite the increased appreciation of the essential role of the TOR complex in plants, no large-scale phosphoproteomics or interactomics studies have been performed to map TOR signalling events in plants. To fill this gap, we combined a systematic phosphoproteomics screen with a targeted protein complex analysis in the model plant Arabidopsis thaliana. Integration of the phosphoproteome and protein complex data on the one hand shows that both methods reveal complementary subspaces of the plant TOR signalling network, enabling proteome-wide discovery of both upstream and downstream network components. On the other hand, the overlap between both data sets reveals a set of candidate direct TOR substrates. The integrated network embeds both evolutionarily-conserved and plant-specific TOR signalling components, uncovering an intriguing complex interplay with protein synthesis. Overall, the network provides a rich data set to start addressing fundamental questions about how TOR controls key processes in plants, such as autophagy, auxin signalling, chloroplast development, lipid metabolism, nucleotide biosynthesis, protein translation or senescence.
  7. Pollier, J., Vancaester, E., Kuzhiumparambil, U., Vickers, C. E., Vandepoele, K., Goossens, A., & Fabris, M. (2019). A widespread alternative squalene epoxidase participates in eukaryote steroid biosynthesis. NATURE MICROBIOLOGY, 4(2), 226–233.
    Steroids are essential triterpenoid molecules that are present in all eukaryotes and modulate the fluidity and flexibility of cell membranes. Steroids also serve as signalling molecules that are crucial for growth, development and differentiation of multicellular organisms. The steroid biosynthetic pathway is highly conserved and is key in eukaryote evolution. The flavoprotein squalene epoxidase (SQE) catalyses the first oxygenation reaction in this pathway and is rate limiting. However, despite its conservation in animals, plants and fungi, several phylogenetically widely distributed eukaryote genomes lack an SQE-encoding gene. Here, we discovered and characterized an alternative SQE (AltSQE) belonging to the fatty acid hydroxylase superfamily. AltSQE was identified through screening of a gene library of the diatom Phaeodactylum tricornutum in a SQE-deficient yeast. In accordance with its divergent protein structure and need for cofactors, we found that AltSQE is insensitive to the conventional SQE inhibitor terbinafine. AltSQE is present in many eukaryotic lineages but is mutually exclusive with SQE and shows a patchy distribution within monophyletic clades. Our discovery provides an alternative element for the conserved steroid biosynthesis pathway, raises questions about eukaryote metabolic evolution and opens routes to develop selective SQE inhibitors to control hazardous organisms.
  8. Kulkarni, S. R., Vaneechoutte, D., Van de Velde, J., & Vandepoele, K. (2018). TF2Network : predicting transcription factor regulators and gene regulatory networks in Arabidopsis using publicly available binding site information. NUCLEIC ACIDS RESEARCH, 46(6).
    A gene regulatory network (GRN) is a collection of regulatory interactions between transcription factors (TFs) and their target genes. GRNs control different biological processes and have been instrumental to understand the organization and complexity of gene regulation. Although various experimental methods have been used to map GRNs in Arabidopsis thaliana, their limited throughput combined with the large number of TFs makes that for many genes our knowledge about regulating TFs is incomplete. We introduce TF2Network, a tool that exploits the vast amount of TF binding site information and enables the delineation of GRNs by detecting potential regulators for a set of co-expressed or functionally related genes. Validation using two experimental benchmarks reveals that TF2Network predicts the correct regulator in 75-92% of the test sets. Furthermore, our tool is robust to noise in the input gene sets, has a low false discovery rate, and shows a better performance to recover correct regulators compared to other plant tools. TF2Network is accessible through a web interface where GRNs are interactively visualized and annotated with various types of experimental functional information. TF2Network was used to perform systematic functional and regulatory gene annotations, identifying new TFs involved in circadian rhythm and stress response.
  9. Krasovec, M., Vancaester, E., Rombauts, S., Bucchini, F., Yau, S., Hemon, C., Lebredonchel, H., et al. (2018). Genome analyses of the microalga Picochlorum provide insights into the evolution of thermotolerance in the green lineage. GENOME BIOLOGY AND EVOLUTION, 10(9), 2347–2365.
    While the molecular events involved in cell responses to heat stress have been extensively studied, our understanding of the genetic basis of basal thermotolerance, and particularly its evolution within the green lineage, remains limited. Here, we present the 13.3-Mb haploid genome and transcriptomes of a halotolerant and thermotolerant unicellular green alga, Picochlorum costavermella (Trebouxiophyceae) to investigate the evolution of the genomic basis of thermotolerance. Differential gene expression at high and standard temperatures revealed that more of the gene families containing up-regulated genes at high temperature were recently evolved, and fewer originated at the ancestor of green plants. Inversely, there was an excess of ancient gene families containing transcriptionally repressed genes. Interestingly, there is a striking overlap between the thermotolerance and halotolerance transcriptional rewiring, as more than one-third of the gene families up-regulated at 35 °C were also up-regulated under variable salt concentrations in Picochlorum SE3. Moreover, phylogenetic analysis of the 9,304 protein coding genes revealed 26 genes of horizontally transferred origin in P. costavermella, of which five were differentially expressed at higher temperature. Altogether, these results provide new insights about how the genomic basis of adaptation to halo- and thermotolerance evolved in the green lineage.
  10. Gao, Z., Daneva, A., Salanenka, Y., Van Durme, M., Huysmans, M., Lin, Z., De Winter, F., et al. (2018). KIRA1 and ORESARA1 terminate flower receptivity by promoting cell death in the stigma of Arabidopsis. NATURE PLANTS, 4(6), 365–+.
    Flowers have a species-specific functional life span that determines the time window in which pollination, fertilization and seed set can occur. The stigma tissue plays a key role in flower receptivity by intercepting pollen and initiating pollen tube growth toward the ovary. In this article, we show that a developmentally controlled cell death programme terminates the functional life span of stigma cells in Arabidopsis. We identified the leaf senescence regulator ORESARA1 (also known as ANAC092) and the previously uncharacterized KIRA1 (also known as ANAC074) as partially redundant transcription factors that modulate stigma longevity by controlling the expression of programmed cell death-associated genes. KIRA1 expression is sufficient to induce cell death and terminate floral receptivity, whereas lack of both KIRA1 and ORESARA1 substantially increases stigma life span. Surprisingly, the extension of stigma longevity is accompanied by only a moderate extension of flower receptivity, suggesting that additional processes participate in the control of the flower's receptive life span.
  11. De Clerck, O., Kao, S.-M., Bogaert, K., Blomme, J., Foflonker, F., Kwantes, M., Vancaester, E., et al. (2018). Insights into the evolution of multicellularity from the sea lettuce genome. CURRENT BIOLOGY, 28(18), 2921–2933.
    We report here the 98.5 Mbp haploid genome (12,924 protein coding genes) of Ulva mutabilis, a ubiquitous and iconic representative of the Ulvophyceae or green seaweeds. Ulva's rapid and abundant growth makes it a key contributor to coastal biogeochemical cycles; its role in marine sulfur cycles is particularly important because it produces high levels of dimethylsulfoniopropionate (DMSP), the main precursor of volatile dimethyl sulfide (DMS). Rapid growth makes Ulva attractive biomass feedstock but also increasingly a driver of nuisance "green tides." Ulvophytes are key to understanding the evolution of multicellularity in the green lineage, and Ulva morphogenesis is dependent on bacterial signals, making it an important species with which to study cross-kingdom communication. Our sequenced genome informs these aspects of ulvophyte cell biology, physiology, and ecology. Gene family expansions associated with multicellularity are distinct from those of freshwater algae. Candidate genes, including some that arose following horizontal gene transfer from chromalveolates, are present for the transport and metabolism of DMSP. The Ulva genome offers, therefore, new opportunities to understand coastal and marine ecosystems and the fundamental evolution of the green lineage.
  12. Forslund, K., Pereira, C., Capella-Gutierrez, S., Sousa da Silva, A., Altenhoff, A., Huerta-Cepas, J., Muffato, M., et al. (2018). Gearing up to handle the mosaic nature of life in the quest for orthologs. BIOINFORMATICS, 34(2), 323–329.
    The Quest for Orthologs (QfO) is an open collaboration framework for experts in comparative phylogenomics and related research areas who have an interest in highly accurate orthology predictions and their applications. We here report highlights and discussion points from the QfO meeting 2015 held in Barcelona. Achievements in recent years have established a basis to support developments for improved orthology prediction and to explore new approaches. Central to the QfO effort is proper benchmarking of methods and services, as well as design of standardized datasets and standardized formats to allow sharing and comparison of results. Simultaneously, analysis pipelines have been improved, evaluated and adapted to handle large datasets. All this would not have occurred without the long-term collaboration of Consortium members. Meeting regularly to review and coordinate complementary activities from a broad spectrum of innovative researchers clearly benefits the community. Highlights of the meeting include addressing sources of and legitimacy of disagreements between orthology calls, the context dependency of orthology definitions, special challenges encountered when analyzing very anciently rooted orthologies, orthology in the light of whole-genome duplications, and the concept of orthologous versus paralogous relationships at different levels, including domain-level orthology. Furthermore, particular needs for different applications (e.g. plant genomics, ancient gene families and others) and the infrastructure for making orthology inferences available (e.g. interfaces with model organism databases) were discussed, with several ongoing efforts that are expected to be reported on during the upcoming 2017 QfO meeting.
  13. Hansen, B. O., Meyer, E. H., Ferrari, C., Vaid, N., Movahedi, S., Vandepoele, K., Nikoloski, Z., et al. (2018). Ensemble gene function prediction database reveals genes important for complex I formation in Arabidopsis thaliana. NEW PHYTOLOGIST, 217(4), 1521–1534.
    Recent advances in gene function prediction rely on ensemble approaches that integrate results from multiple inference methods to produce superior predictions. Yet, these developments remain largely unexplored in plants. We have explored and compared two methods to integrate 10 gene co-function networks for Arabidopsis thaliana and demonstrate how the integration of these networks produces more accurate gene function predictions for a larger fraction of genes with unknown function. These predictions were used to identify genes involved in mitochondrial complex I formation, and for five of them, we confirmed the predictions experimentally. The ensemble predictions are provided as a user-friendly online database, EnsembleNet. The methods presented here demonstrate that ensemble gene function prediction is a powerful method to boost prediction performance, whereas the EnsembleNet database provides a cutting-edge community tool to guide experimentalists.
  14. Lang, D., Ullrich, K. K., Murat, F., Fuchs, J., Jenkins, J., Haas, F. B., Piednoel, M., et al. (2018). The Physcomitrella patens chromosome-scale assembly reveals moss genome structure and evolution. PLANT JOURNAL, 93(3), 515–533.
    The draft genome of the moss model, Physcomitrella patens, comprised approximately 2000 unordered scaffolds. In order to enable analyses of genome structure and evolution we generated a chromosome-scale genome assembly using genetic linkage as well as (end) sequencing of long DNA fragments. We find that 57% of the genome comprises transposable elements (TEs), some of which may be actively transposing during the life cycle. Unlike in flowering plant genomes, gene- and TE-rich regions show an overall even distribution along the chromosomes. However, the chromosomes are mono-centric with peaks of a class of Copia elements potentially coinciding with centromeres. Gene body methylation is evident in 5.7% of the protein-coding genes, typically coinciding with low GC and low expression. Some giant virus insertions are transcriptionally active and might protect gametes from viral infection via siRNA mediated silencing. Structure-based detection methods show that the genome evolved via two rounds of whole genome duplications (WGDs), apparently common in mosses but not in liverworts and hornworts. Several hundred genes are present in colinear regions conserved since the last common ancestor of plants. These syntenic regions are enriched for functions related to plant-specific cell growth and tissue organization. The P. patens genome lacks the TE-rich pericentromeric and gene-rich distal regions typical for most flowering plant genomes. More non-seed plant genomes are needed to unravel how plant genomes evolve, and to understand whether the P. patens genome structure is typical for mosses or bryophytes.
  15. Van Bel, M., Diels, T., Vancaester, E., Kreft, L., Botzki, A., Van de Peer, Y., Coppens, F., et al. (2018). PLAZA 4.0 : an integrative resource for functional, evolutionary and comparative plant genomics. NUCLEIC ACIDS RESEARCH, 46(D1), D1190–D1196.
    PLAZA (https://bioinformatics.psb.ugent.be/plaza) is a plant-oriented online resource for comparative, evolutionary and functional genomics. The PLAZA platform consists of multiple independent instances focusing on different plant clades, while also providing access to a consistent set of reference species. Each PLAZA instance contains structural and functional gene annotations, gene family data and phylogenetic trees and detailed gene colinearity information. A user-friendly web interface makes the necessary tools and visualizations accessible, specific for each data type. Here we present PLAZA 4.0, the latest iteration of the PLAZA framework. This version consists of two new instances (Dicots 4.0 and Monocots 4.0) providing a large increase in newly available species, and offers access to updated and newly implemented tools and visualizations, helping users with the ever-increasing demands for complex and in-depth analyses. The total number of species across both instances nearly doubles from 37 species in PLAZA 3.0 to 71 species in PLAZA 4.0, with a much broader coverage of crop species (e.g. wheat, palm oil) and species of evolutionary interest (e.g. spruce, Marchantia). The new PLAZA instances can also be accessed by a programming interface through a RESTful web service, thus allowing bioinformaticians to optimally leverage the power of the PLAZA platform.
  16. Besbrugge, N., Van Leene, J., Eeckhout, D., Cannoot, B., Kulkarni, S. R., De Winne, N., Persiau, G., et al. (2018). GSyellow, a multifaceted tag for functional protein analysis in monocot and dicot plants. PLANT PHYSIOLOGY, 177(2), 447–464.
    The ability to tag proteins has boosted the emergence of generic molecular methods for protein functional analysis. Fluorescent protein tags are used to visualize protein localization, and affinity tags enable the mapping of molecular interactions by, for example, tandem affinity purification or chromatin immunoprecipitation. To apply these widely used molecular techniques on a single transgenic plant line, we developed a multifunctional tandem affinity purification tag, named GS(yellow), which combines the streptavidin-binding peptide tag with citrine yellow fluorescent protein. We demonstrated the versatility of the GS(yellow) tag in the dicot Arabidopsis (Arabidopsis thaliana) using a set of benchmark proteins. For proof of concept in monocots, we assessed the localization and dynamic interaction profile of the leaf growth regulator ANGUSTIFOLIA3 (AN3), fused to the GS(yellow) tag, along the growth zone of the maize (Zea mays) leaf. To further explore the function of ZmAN3, we mapped its DNA-binding landscape in the growth zone of the maize leaf through chromatin immunoprecipitation sequencing. Comparison with AN3 target genes mapped in the developing maize tassel or in Arabidopsis cell cultures revealed strong conservation of AN3 target genes between different maize tissues and across monocots and dicots, respectively. In conclusion, the GS(yellow) tag offers a powerful molecular tool for distinct types of protein functional analyses in dicots and monocots. As this approach involves transforming a single construct, it is likely to accelerate both basic and translational plant research.
  17. Khan, A., Fornes, O., Stigliani, A., Gheorghe, M., Castro-Mondragon, J. A., van der Lee, R., Bessy, A., et al. (2018). JASPAR 2018 : update of the open-access database of transcription factor binding profiles and its web framework. NUCLEIC ACIDS RESEARCH, 46(D1), D260–D266.
    JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) and TF flexible models (TFFMs) for TFs across multiple species in six taxonomic groups. In the 2018 release of JASPAR, the CORE collection has been expanded with 322 new PFMs (60 for vertebrates and 262 for plants) and 33 PFMs were updated (24 for vertebrates, 8 for plants and 1 for insects). These new profiles represent a 30% expansion compared to the 2016 release. In addition, we have introduced 316 TFFMs (95 for vertebrates, 218 for plants and 3 for insects). This release incorporates clusters of similar PFMs in each taxon and each TF class per taxon. The JASPAR 2018 CORE vertebrate collection of PFMs was used to predict TF-binding sites in the human genome. The predictions are made available to the scientific community through a UCSC Genome Browser track data hub. Finally, this update comes with a new web framework with an interactive and responsive user-interface, along with new features. All the underlying data can be retrieved programmatically using a RESTful API and through the JASPAR 2018 R/Bioconductor package.
  18. De Schutter, K., Tsaneva, M., Kulkarni, S. R., Rougé, P., Vandepoele, K., & Van Damme, E. (2017). Evolutionary relationships and expression analysis of EUL domain proteins in rice (Oryza sativa). RICE, 10.
    Background: Lectins, defined as 'Proteins that can recognize and bind specific carbohydrate structures', are widespread among all kingdoms of life and play an important role in various biological processes in the cell. Most plant lectins are involved in stress signaling and/or defense. The family of Euonymus-related lectins (EULs) represents a group of stress-related lectins composed of one or two EUL domains. The latter protein domain is unique in that it is ubiquitous in land plants, suggesting an important role for these proteins. Results: Despite the availability of multiple completely sequenced rice genomes, little is known on the occurrence of lectins in rice. We identified 329 putative lectin genes in the genome of Oryza sativa subsp. japonica belonging to nine out of 12 plant lectin families. In this paper, an in-depth molecular characterization of the EUL family of rice was performed. In addition, analyses of the promoter sequences and investigation of the transcript levels for these EUL genes enabled retrieval of important information related to the function and stress responsiveness of these lectins. Finally, a comparative analysis between rice cultivars and several monocot and dicot species revealed a high degree of sequence conservation within the EUL domain as well as in the domain organization of these lectins. Conclusions: The presence of EULs throughout the plant kingdom and the high degree of sequence conservation in the EUL domain suggest that these proteins serve an important function in the plant cell. Analysis of the promoter region of the rice EUL genes revealed a diversity of stress responsive elements. Furthermore analysis of the expression profiles of the EUL genes confirmed that they are differentially regulated in response to several types of stress. These data suggest a potential role for the EULs in plant stress signaling and defense.
  19. Vandepoele, Klaas. (2017). A guide to the PLAZA 3.0 plant comparative genomic database. In A. D. van Dijk (Ed.), Plant genomics databases : methods and protocols (Vol. 1533, pp. 183–200). New York, NY, USA: Springer.
    PLAZA 3.0 is an online resource for comparative genomics and offers a versatile platform to study gene functions and gene families or to analyze genome organization and evolution in the green plant lineage. Starting from genome sequence information for over 35 plant species, precomputed comparative genomic data sets cover homologous gene families, multiple sequence alignments, phylogenetic trees, and genomic colinearity information within and between species. Complementary functional data sets, a Workbench, and interactive visualization tools are available through a user-friendly web interface, making PLAZA an excellent starting point to translate sequence or omics data sets into biological knowledge. PLAZA is available at http://bioinformatics.psb.ugent.be/plaza/ .
  20. Babiychuk, E., Trinh, H. K., Vandepoele, K., Van De Slijke, E., Geelen, D., De Jaeger, G., Obokata, J., et al. (2017). The mutation nrpb1-A325V in the largest subunit of RNA polymerase II suppresses compromised growth of Arabidopsis plants deficient in a function of the general transcription factor IIF. PLANT JOURNAL, 89(4), 730–745.
    The evolutionarily conserved 12-subunit RNA polymerase II (Pol II) is a central catalytic component that drives RNA synthesis during the transcription cycle that consists of transcription initiation, elongation, and termination. A diverse set of general transcription factors, including a multifunctional TFIIF, govern Pol II selectivity, kinetic properties, and transcription coupling with posttranscriptional processes. Here, we show that TFIIF of Arabidopsis (Arabidopsis thaliana) resembles the metazoan complex that is composed of the TFIIFα and TFIIFβ polypeptides. Arabidopsis has two TFIIFβ subunits, of which TFIIFβ1/MAN1 is essential and TFIIFβ2/MAN2 is not. In the partial loss-of-function mutant allele man1-1, the winged helix domain of Arabidopsis TFIIFβ1/MAN1 was dispensable for plant viability, whereas the cellular organization of the shoot and root apical meristems was abnormal. Forward genetic screening identified an epistatic interaction between the largest Pol II subunit nrpb1-A325V variant and the man1-1 mutation. The suppression of the man1-1 mutant developmental defects by a mutation in Pol II suggests a link between TFIIF functions in Arabidopsis transcription cycle and the maintenance of cellular organization in the shoot and root apical meristems.
  21. Del Cortona, A., Leliaert, F., Bogaert, K., Turmel, M., Boedeker, C., Janouškovec, J., Lopez-Bautista, J. M., et al. (2017). The plastid genome in Cladophorales green algae is encoded by hairpin chromosomes. CURRENT BIOLOGY, 27(24), 3771–3782.
    Virtually all plastid (chloroplast) genomes are circular double-stranded DNA molecules, typically between 100 and 200 kb in size and encoding circa 80-250 genes. Exceptions to this universal plastid genome architecture are very few and include the dinoflagellates, where genes are located on DNA minicircles. Here we report on the highly deviant chloroplast genome of Cladophorales green algae, which is entirely fragmented into hairpin chromosomes. Short- and long-read high-throughput sequencing of DNA and RNA demonstrated that the chloroplast genes of Boodlea composita are encoded on 1- to 7-kb DNA contigs with an exceptionally high GC content, each containing a long inverted repeat with one or two protein-coding genes and conserved non-coding regions putatively involved in replication and/or expression. We propose that these contigs correspond to linear single-stranded DNA molecules that fold onto themselves to form hairpin chromosomes. The Boodlea chloroplast genes are highly divergent from their corresponding orthologs, and display an alternative genetic code. The origin of this highly deviant chloroplast genome most likely occurred before the emergence of the Cladophorales, and coincided with an elevated transfer of chloroplast genes to the nucleus. A chloroplast genome that is composed only of linear DNA molecules is unprecedented among eukaryotes, and highlights unexpected variation in plastid genome architecture.
  22. Zhang, Xinhua, Ivanova, A., Vandepoele, K., Radomiljac, J., Van de Velde, J., Berkowitz, O., Willems, P., et al. (2017). The transcription factor MYB29 is a regulator of ALTERNATIVE OXIDASE1a. PLANT PHYSIOLOGY, 173(3), 1824–1843.
    Plants sense and integrate a variety of signals from the environment through different interacting signal transduction pathways that involve hormones and signaling molecules. Using ALTERNATIVE OXIDASE1a (AOX1a) gene expression as a model system of retrograde or stress signaling between mitochondria and the nucleus, MYB DOMAIN PROTEIN29 (MYB29) was identified as a negative regulator (regulator of alternative oxidase1a 7 [rao7] mutant) in a genetic screen of Arabidopsis (Arabidopsis thaliana). rao7/myb29 mutants have increased levels of AOX1a transcript and protein compared to wild type after induction with antimycin A. A variety of genes previously associated with the mitochondrial stress response also display enhanced transcript abundance, indicating that RAO7/MYB29 negatively regulates mitochondrial stress responses in general. Meta-analysis of hormone-responsive marker genes and identification of downstream transcription factor networks revealed that MYB29 functions in the complex interplay of ethylene, jasmonic acid, salicylic acid, and reactive oxygen species signaling by regulating the expression of various ETHYLENE RESPONSE FACTOR and WRKY transcription factors. Despite an enhanced induction of mitochondrial stress response genes, rao7/myb29 mutants displayed an increased sensitivity to combined moderate light and drought stress. These results uncover interactions between mitochondrial retrograde signaling and the regulation of glucosinolate biosynthesis, both regulated by RAO7/MYB29. This common regulator can explain why perturbation of the mitochondrial function leads to transcriptomic responses overlapping with responses to biotic stress.
  23. De Decker, S., Vanormelingen, P., Sefbom, J., Lembke, C., Van den Berghe, K., Vandepoele, K., Sabbe, K., et al. (2017). Identifying drivers of sympatric speciation in the marine benthic diatom Seminavis robusta using metabolic analysis and whole-genome resequencing. PHYCOLOGIA (Vol. 56, pp. 41–42).
  24. Ruprecht, C., Proost, S., Hernandez-Coronado, M., Ortiz-Ramirez, C., Lang, D., Rensing, S. A., Becker, J. D., et al. (2017). Phylogenomic analysis of gene co-expression networks reveals the evolution of functional modules. PLANT JOURNAL, 90(3), 447–465.
    Molecular evolutionary studies correlate genomic and phylogenetic information with the emergence of new traits of organisms. These traits are, however, the consequence of dynamic gene networks composed of functional modules, which might not be captured by genomic analyses. Here, we established a method that combines large-scale genomic and phylogenetic data with gene co-expression networks to extensively study the evolutionary make-up of modules in the moss Physcomitrella patens, and in the angiosperms Arabidopsis thaliana and Oryza sativa (rice). We first show that younger genes are less annotated than older genes. By mapping genomic data onto the co-expression networks, we found that genes from the same evolutionary period tend to be connected, whereas old and young genes tend to be disconnected. Consequently, the analysis revealed modules that emerged at a specific time in plant evolution. To uncover the evolutionary relationships of the modules that are conserved across the plant kingdom, we added phylogenetic information that revealed duplication and speciation events on the module level. This combined analysis revealed an independent duplication of cell wall modules in bryophytes and angiosperms, suggesting a parallel evolution of cell wall pathways in land plants.
  25. Kreft, L., Botzki, A., Coppens, F., Vandepoele, K., & Van Bel, M. (2017). PhyD3 : a phylogenetic tree viewer with extended phyloXML support for functional genomics data visualization. BIOINFORMATICS, 33(18), 2946–2947.
    Motivation: Comparative and evolutionary studies utilize phylogenetic trees to analyze and visualize biological data. Recently, several web-based tools for the display, manipulation and annotation of phylogenetic trees, such as iTOL and Evolview, have released updates to be compatible with the latest web technologies. While those web tools operate an open server access model with a multitude of registered users, a feature-rich open source solution using current web technologies is not available. Results: Here, we present an extension of the widely used PhyloXML standard with several new options to accommodate functional genomics or annotation datasets for advanced visualization. Furthermore, PhyD3 has been developed as a lightweight tool using the JavaScript library D3.js to achieve a state-of-the-art phylogenetic tree visualization in the web browser, with support for advanced annotations. The current implementation is open source, easily adaptable and easy to implement in third parties' web sites. Availability and implementation: More information about PhyD3 itself, installation procedures and implementation links are available at http://phyd3.bits.vib.be and at http://github.com/vibbits/phyd3/. Supplementary information: Supplementary data are available at Bioinformatics online.
  26. Ritter Traub, A., Iñigo, S., Fernandez Calvo, P., Heyndrickx, K., Dhondt, S., Shi, H., De Milde, L., et al. (2017). The transcriptional repressor complex FRS7-FRS12 regulates flowering time and growth in Arabidopsis. NATURE COMMUNICATIONS, 8.
    Most living organisms developed systems to efficiently time environmental changes. The plant-clock acts in coordination with external signals to generate output responses determining seasonal growth and flowering time. Here, we show that two Arabidopsis thaliana transcription factors, FAR1 RELATED SEQUENCE 7 (FRS7) and FRS12, act as negative regulators of these processes. These proteins accumulate particularly in short-day conditions and interact to form a complex. Loss-of-function of FRS7 and FRS12 results in early flowering plants with overly elongated hypocotyls mainly in short days. We demonstrate by molecular analysis that FRS7 and FRS12 affect these developmental processes in part by binding to the promoters and repressing the expression of GIGANTEA and PHYTOCHROME INTERACTING FACTOR 4 as well as several of their downstream signalling targets. Our data reveal a molecular machinery that controls the photoperiodic regulation of flowering and growth and offer insight into how plants adapt to seasonal changes.
  27. Vaneechoutte, D., Estrada, A. R., Lin, Y.-C., Loraine, A. E., & Vandepoele, K. (2017). Genome-wide characterization of differential transcript usage in Arabidopsis thaliana. PLANT JOURNAL, 92(6), 1218–1231.
    Alternative splicing and the usage of alternate transcription start or stop sites allow a single gene to produce multiple transcript isoforms. Most plant genes express certain isoforms at a significantly higher level than others, but under specific conditions this expression dominance can change, resulting in a different set of dominant isoforms. These events of differential transcript usage (DTU) have been observed for thousands of Arabidopsis thaliana, Zea mays and Vitis vinifera genes, and have been linked to development and stress response. However, neither the characteristics of these genes, nor the implications of DTU on their protein coding sequences or functions, are currently well understood. Here we present a dataset of isoform dominance and DTU for all genes in the AtRTD2 reference transcriptome based on a protocol that was benchmarked on simulated data and validated through comparison with a published reverse transcriptase-polymerase chain reaction panel. We report DTU events for 8148 genes across 206 public RNA-Seq samples, and find that protein sequences are affected in 22% of the cases. The observed DTU events show high consistency across replicates, and reveal reproducible patterns in response to treatment and development. We also demonstrate that genes with different evolutionary ages, expression breadths and functions show large differences in the frequency at which they undergo DTU, and in the effect that these events have on their protein sequences. Finally, we showcase how the generated dataset can be used to explore DTU events for genes of interest or to find genes with specific DTU in samples of interest.
  28. Veeckman, E., Vandepoele, K., Asp, T., Roldàn-Ruiz, I., & Ruttink, T. (2016). Genomic variation in the FT gene family of perennial ryegrass (Lolium perenne). In I. Roldàn-Ruiz, J. Baert, & D. Reheul (Eds.), Breeding in a world of scarcity : proceedings of the 2015 meeting of the section “Forage Crops and Amenity Grasses” of Eucarpia (pp. 121–126). Presented at the 31st Symposium of Eucarpia’s “Forage Crops and Amenity Grasses” Section, Cham, Switzerland: Springer.
    The timing of flowering is of prime importance for several agronomic traits, and its genetic control is therefore of great interest to breeders. Several signaling pathways converge on FLOWERING LOCUS T (FT) gene family members, which act as central regulators of flowering, branching and seed dormancy. We identified the complete FT gene family in the Lolium perenne genome and performed phylogenetic analysis to delineate functional clades and to identify putative functionally redundant paralogs. Five FT genes of L. perenne were selected for targeted resequencing in a genepool of 746 accessions to describe genetic diversity in wild accessions, commercial cultivars and breeding material.
  29. Van de Velde, Jan, Van Bel, M., Vaneechoutte, D., & Vandepoele, K. (2016). A collection of conserved noncoding sequences to study gene regulation in flowering plants. PLANT PHYSIOLOGY, 171(4), 2586–2598.
    Transcription factors (TFs) regulate gene expression by binding cis-regulatory elements, of which the identification remains an ongoing challenge owing to the prevalence of large numbers of nonfunctional TF binding sites. Powerful comparative genomics methods, such as phylogenetic footprinting, can be used for the detection of conserved noncoding sequences (CNSs), which are functionally constrained and can greatly help in reducing the number of false-positive elements. In this study, we applied a phylogenetic footprinting approach for the identification of CNSs in 10 dicot plants, yielding 1,032,291 CNSs associated with 243,187 genes. To annotate CNSs with TF binding sites, we made use of binding site information for 642 TFs originating from 35 TF families in Arabidopsis (Arabidopsis thaliana). In three species, the identified CNSs were evaluated using TF chromatin immunoprecipitation sequencing data, resulting in significant overlap for the majority of data sets. To identify ultraconserved CNSs, we included genomes of additional plant families and identified 715 binding sites for 501 genes conserved in dicots, monocots, mosses, and green algae. Additionally, we found that genes that are part of conserved mini-regulons have a higher coherence in their expression profile than other divergent gene pairs. All identified CNSs were integrated in the PLAZA 3.0 Dicots comparative genomics platform (http://bioinformatics.psb.ugent.be/plaza/versions/plaza_v3_dicots/) together with new functionalities facilitating the exploration of conserved cis-regulatory elements and their associated genes. The availability of this data set in a user-friendly platform enables the exploration of functional noncoding DNA to study gene regulation in a variety of plant species, including crops.
  30. Van Leene, J., Blomme, J., Kulkarni, S. R., Cannoot, B., De Winne, N., Eeckhout, D., Persiau, G., et al. (2016). Functional characterization of the Arabidopsis transcription factor bZIP29 reveals its role in leaf and root development. JOURNAL OF EXPERIMENTAL BOTANY, 67(19), 5825–5840.
    Plant bZIP group I transcription factors have been reported mainly for their role during vascular development and osmosensory responses. Interestingly, bZIP29 has been identified in a cell cycle interactome, indicating additional functions of bZIP29 in plant development. Here, bZIP29 was functionally characterized to study its role during plant development. It is not present in vascular tissue but is specifically expressed in proliferative tissues. Genome-wide mapping of bZIP29 target genes confirmed its role in stress and osmosensory responses, but also identified specific binding to several core cell cycle genes and to genes involved in cell wall organization. bZIP29 protein complex analyses validated interaction with other bZIP group I members and provided insight into regulatory mechanisms acting on bZIP dimers. In agreement with bZIP29 expression in proliferative tissues and with its binding to promoters of cell cycle regulators, dominant-negative repression of bZIP29 altered the cell number in leaves and in the root meristem. A transcriptome analysis on the root meristem, however, indicated that bZIP29 might regulate cell number through control of cell wall organization. Finally, ectopic dominant-negative repression of bZIP29 and redundant factors led to a seedling-lethal phenotype, pointing to essential roles for bZIP group I factors early in plant development.
  31. Veeckman, E., Ruttink, T., & Vandepoele, K. (2016). Are we there yet? : reliably estimating the completeness of plant genome sequences. PLANT CELL, 28(8), 1759–1768.
    Genome sequencing is becoming cheaper and faster thanks to the introduction of next-generation sequencing techniques. Dozens of new plant genome sequences have been released in recent years, ranging from small to gigantic repeat-rich or polyploid genomes. Most genome projects have a dual purpose: delivering a contiguous, complete genome assembly and creating a full catalog of correctly predicted genes. Frequently, the completeness of a species' gene catalog is measured using a set of marker genes that are expected to be present. This expectation can be defined along an evolutionary gradient, ranging from highly conserved genes to species-specific genes. Large-scale population resequencing studies have revealed that gene space is fairly variable even between closely related individuals, which limits the definition of the expected gene space, and, consequently, the accuracy of estimates used to assess genome and gene space completeness. We argue that, based on the desired applications of a genome sequencing project, different completeness scores for the genome assembly and/or gene space should be determined. Using examples from several dicot and monocot genomes, we outline some pitfalls and recommendations regarding methods to estimate completeness during different steps of genome assembly and annotation.
  32. Tzfadia, O., Diels, T., De Meyer, S., Vandepoele, K., Aharoni, A., & Van de Peer, Y. (2016). CoExpNetViz: comparative co-expression networks construction and visualization tool. FRONTIERS IN PLANT SCIENCE, 6.
    Motivation: Comparative transcriptomics is a common approach in functional gene discovery efforts. It allows for finding conserved co-expression patterns between orthologous genes in closely related plant species, suggesting that these genes potentially share similar function and regulation. Several efficient co-expression-based tools have been commonly used in plant research but most of these pipelines are limited to data from model systems, which greatly limits their utility. Moreover, none of the existing pipelines allow plant researchers to make use of their own unpublished gene expression data for performing a comparative co-expression analysis and generating multi-species co-expression networks. Results: We introduce CoExpNetViz, a computational tool that uses a set of query or "bait" genes as an input (chosen by the user) and a minimum of one pre-processed gene expression dataset. The CoExpNetViz algorithm proceeds in three main steps: (i) for every bait gene submitted, co-expression values are calculated using mutual information and Pearson correlation coefficients, (ii) non-bait (or target) genes are grouped based on cross-species orthology, and (iii) output files are generated and results can be visualized as network graphs in Cytoscape. Availability: The CoExpNetViz tool is freely available both as a PHP web server (link: http://bioinformatics.psb.ugent.be/webtools/coexpr/) (implemented in C++) and as a Cytoscape plugin (implemented in Java). Both versions of the CoExpNetViz tool support LINUX and Windows platforms.
  33. Nelissen, H., Eeckhout, D., Demuynck, K., Persiau, G., Walton, A., Van Bel, M., Vervoort, M., et al. (2015). Dynamic changes in ANGUSTIFOLIA3 complex composition reveal a growth regulatory mechanism in the maize leaf. PLANT CELL, 27(6), 1605–1619.
    Most molecular processes during plant development occur with a particular spatio-temporal specificity. Thus far, it has remained technically challenging to capture dynamic protein-protein interactions within a growing organ, where the interplay between cell division and cell expansion is instrumental. Here, we combined high-resolution sampling of the growing maize (Zea mays) leaf with tandem affinity purification followed by mass spectrometry. Our results indicate that the growth-regulating SWI/SNF chromatin remodeling complex associated with ANGUSTIFOLIA3 (AN3) was conserved within growing organs and between dicots and monocots. Moreover, we were able to demonstrate the dynamics of the AN3-interacting proteins within the growing leaf, since copurified GROWTH-REGULATING FACTORs (GRFs) varied throughout the growing leaf. Indeed, GRF1, GRF6, GRF7, GRF12, GRF15, and GRF17 were significantly enriched in the division zone of the growing leaf, while GRF4 and GRF10 levels were comparable between division zone and expansion zone in the growing leaf. These dynamics were also reflected at the mRNA and protein levels, indicating tight developmental regulation of the AN3-associated chromatin remodeling complex. In addition, the phenotypes of maize plants overexpressing miRNA396a-resistant GRF1 support a model proposing that distinct associations of the chromatin remodeling complex with specific GRFs tightly regulate the transition between cell division and cell expansion. Together, our data demonstrate that advancing from static to dynamic protein-protein interaction analysis in a growing organ adds insights in how developmental switches are regulated.
  34. Van Leene, J., Eeckhout, D., Cannoot, B., De Winne, N., Persiau, G., Van De Slijke, E., Vercruysse, L., et al. (2015). An improved toolbox to unravel the plant cellular machinery by tandem affinity purification of Arabidopsis protein complexes. NATURE PROTOCOLS, 10(1), 169–187.
    Tandem affinity purification coupled to mass spectrometry (TAP-MS) is one of the most advanced methods to characterize protein complexes in plants, giving a comprehensive view on the protein-protein interactions (PPIs) of a certain protein of interest (bait). The bait protein is fused to a double affinity tag, which consists of a protein G tag and a streptavidin-binding peptide separated by a very specific protease cleavage site, allowing highly specific protein complex isolation under near-physiological conditions. Implementation of this optimized TAP tag, combined with ultrasensitive MS, means that these experiments can be performed on small amounts (25 mg of total protein) of protein extracts from Arabidopsis cell suspension cultures. It is also possible to use this approach to isolate low abundant protein complexes from Arabidopsis seedlings, thus opening perspectives for the exploration of protein complexes in a plant developmental context. Next to protocols for efficient biomass generation of seedlings (~7.5 months), we provide detailed protocols for TAP (1 d), and for sample preparation and liquid chromatography-tandem MS (LC-MS/MS; ~5 d), either from Arabidopsis seedlings or from cell cultures. For the identification of specific co-purifying proteins, we use an extended protein database and filter against a list of nonspecific proteins on the basis of the occurrence of a co-purified protein among 543 TAP experiments. The value of the provided protocols is illustrated through numerous applications described in recent literature.
  35. Verkest, A., Byzova, M., Martens, C., Willems, P., Verwulgen, T., Slabbinck, B., Rombaut, D., et al. (2015). Selection for improved energy use efficiency and drought tolerance in canola results in distinct transcriptome and epigenome changes. PLANT PHYSIOLOGY, 168(4), 1338–1350.
    To increase both the yield potential and stability of crops, integrated breeding strategies are used that have mostly a direct genetic basis, but the utility of epigenetics to improve complex traits is unclear. A better understanding of the status of the epigenome and its contribution to agronomic performance would help in developing approaches to incorporate the epigenetic component of complex traits into breeding programs. Starting from isogenic canola (Brassica napus) lines, epilines were generated by selecting, repeatedly for three generations, for increased energy use efficiency and drought tolerance. These epilines had an enhanced energy use efficiency, drought tolerance, and nitrogen use efficiency. Transcriptome analysis of the epilines and a line selected for its energy use efficiency solely revealed common differentially expressed genes related to the onset of stress tolerance-regulating signaling events. Genes related to responses to salt, osmotic, abscisic acid, and drought treatments were specifically differentially expressed in the drought-tolerant epilines. The status of the epigenome, scored as differential trimethylation of lysine-4 of histone 3, further supported the phenotype by targeting drought-responsive genes and facilitating the transcription of the differentially expressed genes. From these results, we conclude that the canola epigenome can be shaped by selection to increase energy use efficiency and stress tolerance. Hence, these findings warrant the further development of strategies to incorporate epigenetics into breeding.
  36. Proost, S., Van Bel, M., Vaneechoutte, D., Van de Peer, Y., Inzé, D., Mueller-Roeber, B., & Vandepoele, K. (2015). PLAZA 3.0 : an access point for plant comparative genomics. NUCLEIC ACIDS RESEARCH, 43(D1), D974–D981.
    Comparative sequence analysis has significantly altered our view on the complexity of genome organization and gene functions in different kingdoms. PLAZA 3.0 is designed to make comparative genomics data for plants available through a user-friendly web interface. Structural and functional annotation, gene families, protein domains, phylogenetic trees and detailed information about genome organization can easily be queried and visualized. Compared with the first version released in 2009, which featured nine organisms, the number of integrated genomes is more than four times higher, and now covers 37 plant species. The new species provide a wider phylogenetic range as well as a more in-depth sampling of specific clades, and genomes of additional crop species are present. The functional annotation has been expanded and now comprises data from Gene Ontology, MapMan, UniProtKB/Swiss-Prot, PlnTFDB and PlantTFDB. Furthermore, we improved the algorithms to transfer functional annotation from well-characterized plant genomes to other species. The additional data and new features make PLAZA 3.0 (http://bioinformatics.psb.ugent.be/plaza/) a versatile and comprehensible resource for users wanting to explore genome information to study different aspects of plant biology, both in model and non-model organisms.
  37. Wang, F., Muto, A., Van de Velde, J., Neyt, P., Himanen, K., Vandepoele, K., & Van Lijsebettens, M. (2015). Functional analysis of the Arabidopsis TETRASPANIN gene family in plant growth and development. PLANT PHYSIOLOGY, 169(3), 2200–2214.
    TETRASPANIN (TET) genes encode conserved integral membrane proteins that are known in animals to function in cellular communication during gamete fusion, immunity reaction and pathogen recognition. In plants, functional information is limited to one of the 17 members of the Arabidopsis TET gene family and to expression data in reproductive stages. Here, the promoter activity of all 17 Arabidopsis TET genes was investigated by pAtTET::NLS-GFP/GUS reporter lines throughout the life cycle, which predicted functional divergence in the paralogous genes per clade. However, partial overlap was observed for many TET genes across the clades, correlating with few phenotypes in single mutants and therefore requiring double mutant combinations for functional investigation. Mutational analysis showed a role for TET13 in primary root growth and lateral root development, and redundant roles for TET5 and TET6 in leaf and root growth through negative regulation of cell proliferation. Strikingly, a number of TET genes were expressed in embryonic and seedling progenitor cells and remained expressed until the differentiation state in the mature plant, suggesting a dynamic function over developmental stages. cis-regulatory elements together with transcription factor binding data provided molecular insight into the site, conditions and perturbations that affect TET gene expression, and positioned the TET genes in different molecular pathways; the data represent a hypothesis-generating resource for further functional analyses.
  38. Goeminne, L., Vandepoele, K., Gevaert, K., & Clement, L. (2015). Robust peptide-based models in quantitative proteomics. Proteomic Forum, Abstracts. Presented at the Proteomic Forum 2015.
    Peptide level models for assessing differential proteomics outperform summarization-based methods in terms of sensitivity, specificity, accuracy and precision (Goeminne et al., 2015, submitted). However, the ordinary least squares (OLS) parameter estimator is prone to overfitting and suffers from missing peptides and outliers that are omnipresent in proteomics data. We propose a robust ridge estimator and adopt empirical Bayes to stabilize the variance. With the CPTAC spike-in study, we demonstrate that our robust peptide-based estimator further improves the sensitivity and specificity.
  39. Goeminne, L., Vandepoele, K., Gevaert, K., & Clement, L. (2015). Peptide-level robust ridge regression modeling improves both sensitivity and specificity in quantitative proteomics. Presented at the 7th MaxQuant Summer School 2015.
  40. Vriet, C., Lemmens, K., Vandepoele, K., Reuzeau, C., & Russinova, E. (2015). Evolutionary trails of plant steroid genes. TRENDS IN PLANT SCIENCE, 20(5), 301–308.
    Plant steroids, comprising brassinosteroids (BRs) and their precursors, phytosterols, play a major role in plant growth, development, and stress tolerance, and have high potential for agricultural applications. Currently, this prospect is limited by a lack of information about their evolution and expression dynamics (spatial and temporal) across plant species. The increasing number of sequenced genomes offers an opportunity for evolutionary studies that might help to prioritize functional analyses with the aim to improve crop yield and stress tolerance. In this review we provide a glimpse of the origin, evolution, and functional conservation of phytosterol and BR genes in the green plant lineage using comparative sequence and expression analyses of publicly available datasets.
  41. Glover, N. M., Daron, J., Pingault, L., Vandepoele, K., Paux, E., Feuillet, C., & Choulet, F. (2015). Small-scale gene duplications played a major role in the recent evolution of wheat chromosome 3B. GENOME BIOLOGY, 16.
    Background: Bread wheat is not only an important crop, but its large (17 Gb), highly repetitive, and hexaploid genome makes it a good model to study the organization and evolution of complex genomes. Recently, we produced a high quality reference sequence of wheat chromosome 3B (774 Mb), which provides an excellent opportunity to study the evolutionary dynamics of a large and polyploid genome, specifically the impact of single gene duplications. Results: We find that 27 % of the 3B predicted genes are non-syntenic with the orthologous chromosomes of Brachypodium distachyon, Oryza sativa, and Sorghum bicolor, whereas, by applying the same criteria, non-syntenic genes represent on average only 10 % of the predicted genes in these three model grasses. These non-syntenic genes on 3B have high sequence similarity to at least one other gene in the wheat genome, indicating that hexaploid wheat has undergone massive small-scale interchromosomal gene duplications compared to other grasses. Insertions of non-syntenic genes occurred at a similar rate along the chromosome, but these genes tend to be retained at a higher frequency in the distal, recombinogenic regions. The ratio of non-synonymous to synonymous substitution rates showed a more relaxed selection pressure for non-syntenic genes compared to syntenic genes, and gene ontology analysis indicated that non-syntenic genes may be enriched in functions involved in disease resistance. Conclusion: Our results highlight the major impact of single gene duplications on the wheat gene complement and confirm the accelerated evolution of the Triticeae lineage among grasses.
  42. Gonzalez Sanchez, N., Pauwels, L., Baekelandt, A., De Milde, L., Van Leene, J., Besbrugge, N., Heyndrickx, K., et al. (2015). A repressor protein complex regulates leaf growth in Arabidopsis. PLANT CELL, 27(8), 2273–2287.
    Cell number is an important determinant of final organ size. In the leaf, a large proportion of cells are derived from the stomatal lineage. Meristemoids, which are stem cell-like precursor cells, undergo asymmetric divisions, generating several pavement cells adjacent to the two guard cells. However, the mechanism controlling the asymmetric divisions of these stem cells prior to differentiation is not well understood. Here, we characterized PEAPOD (PPD) proteins, the only transcriptional regulators known to negatively regulate meristemoid division. PPD proteins interact with KIX8 and KIX9, which act as adaptor proteins for the corepressor TOPLESS. D3-type cyclin encoding genes were identified among direct targets of PPD2, being negatively regulated by PPDs and KIX8/9. Accordingly, kix8 kix9 mutants phenocopied PPD loss-of-function producing larger leaves resulting from increased meristemoid amplifying divisions. The identified conserved complex might be specific for leaf growth in the second dimension, since it is not present in Poaceae (grasses), which also lack the developmental program it controls.
  43. De Witte, D., Van de Velde, J., Decap, D., Van Bel, M., Audenaert, P., Demeester, P., Dhoedt, B., et al. (2015). BLSSpeller : exhaustive comparative discovery of conserved cis-regulatory elements. BIOINFORMATICS, 31(23), 3758–3766.
    Motivation: The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected. Results: We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O. sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z. mays.
  44. Del Cortona, A., Leliaert, F., Verbruggen, H., Lopez-Bautista, J. M., Vandepoele, K., & De Clerck, O. (2015). Towards an understanding of the cytological diversity of green seaweeds (Ulvophyceae). EUROPEAN JOURNAL OF PHYCOLOGY (Vol. 50, pp. 217–217).
  45. Volders, P.-J., Verheggen, K., Menschaert, G., Vandepoele, K., Martens, L., Vandesompele, J., & Mestdagh, P. (2015). An update on LNCipedia : a database for annotated human lncRNA sequences. NUCLEIC ACIDS RESEARCH, 43(D1), D174–D180.
    The human genome is pervasively transcribed, producing thousands of non-coding RNA transcripts. The majority of these transcripts are long non-coding RNAs (lncRNAs) and novel lncRNA genes are being identified at rapid pace. To streamline these efforts, we created LNCipedia, an online repository of lncRNA transcripts and annotation. Here, we present LNCipedia 3.0 (http://www.lncipedia.org), the latest version of the publicly available human lncRNA database. Compared to the previous version of LNCipedia, the database grew over five times in size, gaining over 90 000 new lncRNA transcripts. Assessment of the protein-coding potential of LNCipedia entries is improved with state-of-the art methods that include large-scale reprocessing of publicly available proteomics data. As a result, a high-confidence set of lncRNA transcripts with low coding potential is defined and made available for download. In addition, a tool to assess lncRNA gene conservation between human, mouse and zebrafish has been implemented.
  46. Zamariola, L., De Storme, N., Vannerum, K., Vandepoele, K., Armstrong, S. J., Franklin, F. C. H., & Geelen, D. (2014). SHUGOSHINs and PATRONUS protect meiotic centromere cohesion in Arabidopsis thaliana. PLANT JOURNAL, 77(5), 782–794.
    In meiosis, chromosome cohesion is maintained by the cohesin complex, which is released in a two-step manner. At meiosis I, the meiosis-specific cohesin subunit Rec8 is cleaved by the protease Separase along chromosome arms, allowing homologous chromosome segregation. Next, in meiosis II, cleavage of the remaining centromere cohesin results in separation of the sister chromatids. In eukaryotes, protection of centromeric cohesion in meiosis I is mediated by SHUGOSHINs (SGOs). The Arabidopsis genome contains two SGO homologs. Here we demonstrate that Atsgo1 mutants show a premature loss of cohesion of sister chromatid centromeres at anaphase I and that AtSGO2 partially rescues this loss of cohesion. In addition to SGOs, we characterize PATRONUS which is specifically required for the maintenance of cohesion of sister chromatid centromeres in meiosis II. In contrast to the Atsgo1 Atsgo2 double mutant, patronus T-DNA insertion mutants only display loss of sister chromatid cohesion after meiosis I, and additionally show disorganized spindles, resulting in defects in chromosome segregation in meiosis. This leads to reduced fertility and aneuploid offspring. Furthermore, we detect aneuploidy in sporophytic tissue, indicating a role for PATRONUS in chromosome segregation in somatic cells. Thus, ploidy stability is preserved in Arabidopsis by PATRONUS during both meiosis and mitosis.
  47. Vargas, L., Santa Brigida, A. B., Mota Filho, J. P., de Carvalho, T. G., Rojas, C. A., Vaneechoutte, D., Van Bel, M., et al. (2014). Drought tolerance conferred to sugarcane by association with Gluconacetobacter diazotrophicus: a transcriptomic view of hormone pathways. PLOS ONE, 9(12).
    Sugarcane interacts with particular types of beneficial nitrogen-fixing bacteria that provide fixed nitrogen and plant growth hormones to host plants, promoting an increase in plant biomass. Other benefits, such as enhanced tolerance to abiotic stresses, have been reported for some diazotrophs. Here we aim to study the effects of the association between the diazotroph Gluconacetobacter diazotrophicus PAL5 and sugarcane cv. SP70-1143 during water depletion by characterizing differential transcriptome profiles of sugarcane. RNA-seq libraries were generated from roots and shoots of sugarcane plants free of endophytes that were inoculated with G. diazotrophicus and subjected to water depletion for 3 days. A sugarcane reference transcriptome was constructed and used for the identification of differentially expressed transcripts. The differential profile of non-inoculated SP70-1143 suggests that it responds to water deficit stress by the activation of drought-responsive markers and hormone pathways, such as ABA and ethylene. qRT-PCR revealed that root samples had higher levels of G. diazotrophicus 3 days after water deficit, compared to roots of inoculated plants watered normally. With prolonged drought only inoculated plants survived, indicating that SP70-1143 plants colonized with G. diazotrophicus become more tolerant to drought stress than non-inoculated plants. Strengthening this hypothesis, several gene expression responses to drought were inactivated or regulated in an opposite manner, especially in roots, when plants were colonized by the bacteria. The data suggest that colonized roots would not be suffering from stress in the same way as non-inoculated plants. On the other hand, shoots specifically activate ABA-dependent signaling genes, which could act as key elements in the drought resistance conferred by G. diazotrophicus to SP70-1143. This work reports for the first time the involvement of G. diazotrophicus in the promotion of drought tolerance in sugarcane cv. SP70-1143, and it describes the initial molecular events that may trigger the increased drought tolerance in the host plant.
  48. Fu, Q., Fierro Gutierrez, A. C. E., Meysman, P., Sanchez Rodriguez, A., Vandepoele, K., Marchal, K., & Engelen, K. (2014). MAGIC: access portal to a cross-platform gene expression compendium for maize. BIOINFORMATICS, 30(9), 1316–1318.
    To facilitate the exploration of publicly available Zea mays expression data, we constructed a maize expression compendium, making use of an integration methodology and a consistent probe to gene mapping based on the 5b.60 sequence release of Z. mays. The compendium is made available through a web portal MAGIC that hosts a variety of analysis tools to easily browse and analyze the data. Our compendium is different from previous initiatives in combining expression values across different experiments by providing a consistent gene annotation across different platforms.
  49. Lindemose, S., Jensen, M. K., Van de Velde, J., O’Shea, C., Heyndrickx, K., Workman, C. T., Vandepoele, K., et al. (2014). A DNA-binding-site landscape and regulatory network analysis for NAC transcription factors in Arabidopsis thaliana. NUCLEIC ACIDS RESEARCH, 42(12), 7681–7693.
    Target gene identification for transcription factors is a prerequisite for the systems wide understanding of organismal behaviour. NAM-ATAF1/2-CUC2 (NAC) transcription factors are amongst the largest transcription factor families in plants, yet limited data exist from unbiased approaches to resolve the DNA-binding preferences of individual members. Here, we present a TF-target gene identification workflow based on the integration of novel protein binding microarray data with gene expression and multi-species promoter sequence conservation to identify the DNA-binding specificities and the gene regulatory networks of 12 NAC transcription factors. Our data offer specific single-base resolution fingerprints for most TFs studied and indicate that NAC DNA-binding specificities might be predicted from their DNA-binding domain's sequence. The developed methodology, including the application of complementary functional genomics filters, makes it possible to translate, for each TF, protein binding microarray data into a set of high-quality target genes. With this approach, we confirm NAC target genes reported from independent in vivo analyses. We emphasize that candidate target gene sets together with the workflow associated with functional modules offer a strong resource to unravel the regulatory potential of NAC genes and that this workflow could be used to study other families of transcription factors.
  50. Van de Velde, J., Heyndrickx, K., & Vandepoele, K. (2014). Inference of transcriptional networks in Arabidopsis through conserved noncoding sequence analysis. PLANT CELL, 26(7), 2729–2745.
    Transcriptional regulation plays an important role in establishing gene expression profiles during development or in response to (a)biotic stimuli. Transcription factor binding sites (TFBSs) are the functional elements that determine transcriptional activity, and the identification of individual TFBSs in genome sequences is a major goal in inferring regulatory networks. We have developed a phylogenetic footprinting approach for the identification of conserved noncoding sequences (CNSs) across 12 dicot plants. Whereas both alignment and non-alignment-based techniques were applied to identify functional motifs in a multispecies context, our method accounts for incomplete motif conservation as well as high sequence divergence between related species. We identified 69,361 footprints associated with 17,895 genes. Through the integration of known TFBS obtained from the literature and experimental studies, we used the CNSs to compile a gene regulatory network in Arabidopsis thaliana containing 40,758 interactions, of which two-thirds act through binding events located in DNase I hypersensitive sites. This network shows significant enrichment toward in vivo targets of known regulators, and its overall quality was confirmed using five different biological validation metrics. Finally, through the integration of detailed expression and function information, we demonstrate how static CNSs can be converted into condition-dependent regulatory networks, offering opportunities for regulatory gene annotation.
  51. Sonnhammer, E. L., Gabaldón, T., da Silva, A. W. S., Martin, M., Robinson-Rechavi, M., Boeckmann, B., Thomas, P. D., et al. (2014). Big data and other challenges in the quest for orthologs. BIOINFORMATICS.
    Given the rapid increase of species with a sequenced genome, the need to identify orthologous genes between them has emerged as a central bioinformatics task. Many different methods exist for orthology detection, which makes it difficult to decide which one to choose for a particular application. Here, we review the latest developments and issues in the orthology field, and summarize the most recent results reported at the third 'Quest for Orthologs' meeting. We focus on community efforts such as the adoption of reference proteomes, standard file formats and benchmarking. Progress in these areas is good, and they are already beneficial to both orthology consumers and providers. However, a major current issue is that the massive increase in complete proteomes poses computational challenges to many of the ortholog database providers, as most orthology inference algorithms scale at least quadratically with the number of proteomes. The Quest for Orthologs consortium is an open community with a number of working groups that join efforts to enhance various aspects of orthology analysis, such as defining standard formats and datasets, documenting community resources and benchmarking.
  52. De Witte, D., Van Bel, M., Audenaert, P., Demeester, P., Dhoedt, B., Vandepoele, K., & Fostier, J. (2014). A parallel, distributed-memory framework for comparative motif discovery. In R. Wyrzykowski, J. Dongarra, K. Karczewski, & J. Wasniewski (Eds.), Lecture Notes in Computer Science (Vol. 8385, pp. 268–277). Presented at the 10th International Conference on Parallel Processing and Applied Mathematics (PPAM), Springer.
    The increasing number of sequenced organisms has opened new possibilities for the computational discovery of cis-regulatory elements ('motifs') based on phylogenetic footprinting. Word-based, exhaustive approaches are among the best performing algorithms, however, they pose significant computational challenges as the number of candidate motifs to evaluate is very high. In this contribution, we describe a parallel, distributed-memory framework for de novo comparative motif discovery. Within this framework, two approaches for phylogenetic footprinting are implemented: an alignment-based and an alignment-free method. The framework is able to statistically evaluate the conservation of motifs in a search space containing over 160 million candidate motifs using a distributed-memory cluster with 200 CPU cores in a few hours. Software available from http://bioinformatics.intec.ugent.be/blsspeller/
  53. Verkest, A., Abeel, T., Heyndrickx, K., Van Leene, J., Lanz, C., Van De Slijke, E., De Winne, N., et al. (2014). A generic tool for transcription factor target gene discovery in Arabidopsis cell suspension cultures based on tandem chromatin affinity purification. PLANT PHYSIOLOGY, 164(3), 1122–1133.
    Genome-wide identification of transcription factor (TF) binding sites is pivotal to our understanding of gene expression regulation. Although much progress has been made in the determination of potential binding regions of proteins by chromatin immunoprecipitation, this method has some inherent limitations regarding DNA enrichment efficiency and antibody necessity. Here, we report an alternative strategy for assaying in vivo TF-DNA binding in Arabidopsis (Arabidopsis thaliana) cells by tandem chromatin affinity purification (TChAP). Evaluation of TChAP using the E2Fa TF and comparison with traditional chromatin immunoprecipitation and single chromatin affinity purification illustrates the suitability of TChAP and provides a resource for exploring the E2Fa transcriptional network. Integration with transcriptome, cis-regulatory element, functional enrichment, and coexpression network analyses demonstrates the quality of the E2Fa TChAP sequencing data and validates the identification of new direct E2Fa targets. TChAP enhances both TF target mapping throughput, by circumventing issues related to antibody availability, and output, by improving DNA enrichment efficiency.
  54. Heyndrickx, K., Van de Velde, J., Wang, C., Weigel, D., & Vandepoele, K. (2014). A functional and evolutionary perspective on transcription factor binding in Arabidopsis thaliana. PLANT CELL, 26(10), 3894–3910.
    Understanding the mechanisms underlying gene regulation is paramount to comprehend the translation from genotype to phenotype. The two are connected by gene expression, and it is generally thought that variation in transcription factor (TF) function is an important determinant of phenotypic evolution. We analyzed publicly available genome-wide chromatin immunoprecipitation experiments for 27 TFs in Arabidopsis thaliana and constructed an experimental network containing 46,619 regulatory interactions and 15,188 target genes. We identified hub targets and highly occupied target (HOT) regions, which are enriched for genes involved in development, stimulus responses, signaling, and gene regulatory processes in the currently profiled network. We provide several lines of evidence that TF binding at plant HOT regions is functional, in contrast to that in animals, and not merely the result of accessible chromatin. HOT regions harbor specific DNA motifs, are enriched for differentially expressed genes, and are often conserved across crucifers and dicots, even though they are not under higher levels of purifying selection than non-HOT regions. Distal bound regions are under purifying selection as well and are enriched for a chromatin state showing regulation by the Polycomb repressive complex. Gene expression complexity is positively correlated with the total number of bound TFs, revealing insights in the regulatory code for genes with different expression breadths. The integration of noncanonical and canonical DNA motif information yields new hypotheses on cobinding and tethering between specific TFs involved in flowering and light regulation.
  55. Choulet, F., Alberti, A., Theil, S., Glover, N., Barbe, V., Daron, J., Pingault, L., et al. (2014). Structural and functional partitioning of bread wheat chromosome 3B. SCIENCE, 345(6194).
    We produced a reference sequence of the 1-gigabase chromosome 3B of hexaploid bread wheat. By sequencing 8452 bacterial artificial chromosomes in pools, we assembled a sequence of 774 megabases carrying 5326 protein-coding genes, 1938 pseudogenes, and 85% of transposable elements. The distribution of structural and functional features along the chromosome revealed partitioning correlated with meiotic recombination. Comparative analyses indicated high wheat-specific inter-and intrachromosomal gene duplication activities that are potential sources of variability for adaption. In addition to providing a better understanding of the organization, function, and evolution of a large and polyploid genome, the availability of a high-quality sequence anchored to genetic maps will accelerate the identification of genes underlying important agronomic traits.
  56. Vercruyssen, L., Verkest, A., Gonzalez Sanchez, N., Heyndrickx, K., Eeckhout, D., Han, S.-K., Jégu, T., et al. (2014). ANGUSTIFOLIA3 binds to SWI/SNF chromatin remodeling complexes to regulate transcription during Arabidopsis leaf development. PLANT CELL, 26(1), 210–229.
    The transcriptional coactivator ANGUSTIFOLIA3 (AN3) stimulates cell proliferation during Arabidopsis thaliana leaf development, but the molecular mechanism is largely unknown. Here, we show that inducible nuclear localization of AN3 during initial leaf growth results in differential expression of important transcriptional regulators, including GROWTH REGULATING FACTORs (GRFs). Chromatin purification further revealed the presence of AN3 at the loci of GRF5, GRF6, CYTOKININ RESPONSE FACTOR2, CONSTANS-LIKE5 (COL5), HECATE1 (HEC1), and ARABIDOPSIS RESPONSE REGULATOR4 (ARR4). Tandem affinity purification of protein complexes using AN3 as bait identified plant SWITCH/SUCROSE NONFERMENTING (SWI/SNF) chromatin remodeling complexes formed around the ATPases BRAHMA (BRM) or SPLAYED. Moreover, SWI/SNF ASSOCIATED PROTEIN 73B (SWP73B) is recruited by AN3 to the promoters of GRF5, GRF3, COL5, and ARR4, and both SWP73B and BRM occupy the HEC1 promoter. Furthermore, we show that AN3 and BRM genetically interact. The data indicate that AN3 associates with chromatin remodelers to regulate transcription. In addition, modification of SWI3C expression levels increases leaf size, underlining the importance of chromatin dynamics for growth regulation. Our results place the SWI/SNF-AN3 module as a major player at the transition from cell proliferation to cell differentiation in a developing leaf.
  57. Heyman, J., Cools, T., Vandenbussche, F., Heyndrickx, K., Van Leene, J., Vercauteren, I., Vanderauwera, S., et al. (2013). ERF115 controls root quiescent center cell division and stem cell replenishment. SCIENCE, 342(6160), 860–863.
    The quiescent center (QC) plays an essential role during root development by creating a microenvironment that preserves the stem cell fate of its surrounding cells. Despite being surrounded by highly mitotic active cells, QC cells self-renew at a low proliferation rate. Here, we identified the ERF115 transcription factor as a rate-limiting factor of QC cell division, acting as a transcriptional activator of the phytosulfokine PSK5 peptide hormone. ERF115 marks QC cell division but is restrained through proteolysis by the APC/C-CCS52A2 ubiquitin ligase, whereas QC proliferation is driven by brassinosteroid-dependent ERF115 expression. Together, these two antagonistic mechanisms delimit ERF115 activity, which is called upon when surrounding stem cells are damaged, revealing a cell cycle regulatory mechanism accounting for stem cell niche longevity.
  58. Van Bel, M., Proost, S., Van Neste, C., Deforce, D., Van de Peer, Y., & Vandepoele, K. (2013). TRAPID : an efficient online tool for the functional and comparative analysis of de novo RNA-Seq transcriptomes. GENOME BIOLOGY, 14(12).
    Transcriptome analysis through next-generation sequencing technologies allows the generation of detailed gene catalogs for non-model species, at the cost of new challenges with regards to computational requirements and bioinformatics expertise. Here, we present TRAPID, an online tool for the fast and efficient processing of assembled RNA-Seq transcriptome data, developed to mitigate these challenges. TRAPID offers high-throughput open reading frame detection, frameshift correction and includes a functional, comparative and phylogenetic toolbox, making use of 175 reference proteomes. Benchmarking and comparison against state-of-the-art transcript analysis tools reveals the efficiency and unique features of the TRAPID system.
  59. Verelst, W., Bertolini, E., De Bodt, S., Vandepoele, K., Demeulenaere, M., Pé, M. E., & Inzé, D. (2013). Molecular and physiological analysis of growth-limiting drought stress in Brachypodium distachyon leaves. MOLECULAR PLANT, 6(2), 311–322.
    The drought-tolerant grass Brachypodium distachyon is an emerging model species for temperate grasses and cereal crops. To explore the usefulness of this species for drought studies, a reproducible in vivo drought assay was developed. Spontaneous soil drying led to a 45% reduction in leaf size, and this was mostly due to a decrease in cell expansion, whereas cell division remained largely unaffected by drought. To investigate the molecular basis of the observed leaf growth reduction, the third Brachypodium leaf was dissected in three zones, namely proliferation, expansion, and mature zones, and subjected to transcriptome analysis, based on a whole-genome tiling array. This approach allowed us to highlight that transcriptome profiles of different developmental leaf zones respond differently to drought. Several genes and functional processes involved in drought tolerance were identified. The transcriptome data suggest an increased energy availability in the proliferation zones, along with an up-regulation of sterol synthesis that may influence membrane fluidity. This information may be used to improve the tolerance of temperate cereals to drought, which is undoubtedly one of the major environmental challenges faced by agriculture today and in the near future.
  60. Vandepoele, K., Van Bel, M., Richard, G., Van Landeghem, S., Verhelst, B., Moreau, H., Van de Peer, Y., et al. (2013). pico-PLAZA, a genome database of microbial photosynthetic eukaryotes. ENVIRONMENTAL MICROBIOLOGY, 15(8), 2147–2153.
    With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichment analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains.
  61. De Clercq, I., Vermeirssen, V., Van Aken, O., Vandepoele, K., Murcha, M. W., Law, S. R., Inzé, A., et al. (2013). The membrane-bound NAC transcription factor ANAC013 functions in mitochondrial retrograde regulation of the oxidative stress response in Arabidopsis. PLANT CELL, 25(9), 3472–3490.
    Upon disturbance of their function by stress, mitochondria can signal to the nucleus to steer the expression of responsive genes. This mitochondria-to-nucleus communication is often referred to as mitochondrial retrograde regulation (MRR). Although reactive oxygen species and calcium are likely candidate signaling molecules for MRR, the protein signaling components in plants remain largely unknown. Through meta-analysis of transcriptome data, we detected a set of genes that are common and robust targets of MRR and used them as a bait to identify its transcriptional regulators. In the upstream regions of these mitochondrial dysfunction stimulon (MDS) genes, we found a cis-regulatory element, the mitochondrial dysfunction motif (MDM), which is necessary and sufficient for gene expression under various mitochondrial perturbation conditions. Yeast one-hybrid analysis and electrophoretic mobility shift assays revealed that the transmembrane domain-containing NO APICAL MERISTEM/ARABIDOPSIS TRANSCRIPTION ACTIVATION FACTOR/CUP-SHAPED COTYLEDON transcription factors (ANAC013, ANAC016, ANAC017, ANAC053, and ANAC078) bound to the MDM cis-regulatory element. We demonstrate that ANAC013 mediates MRR-induced expression of the MDS genes by direct interaction with the MDM cis-regulatory element and triggers increased oxidative stress tolerance. In conclusion, we characterized ANAC013 as a regulator of MRR upon stress in Arabidopsis thaliana.
  62. De Smet, R., Adams, K. L., Vandepoele, K., Van Montagu, M., Maere, S., & Van de Peer, Y. (2013). Convergent gene loss following gene and genome duplications creates single-copy families in flowering plants. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 110(8), 2898–2903.
    The importance of gene gain through duplication has long been appreciated. In contrast, the importance of gene loss has only recently attracted attention. Indeed, studies in organisms ranging from plants to worms and humans suggest that duplication of some genes might be better tolerated than that of others. Here we have undertaken a large-scale study to investigate the existence of duplication-resistant genes in the sequenced genomes of 20 flowering plants. We demonstrate that there is a large set of genes that is convergently restored to single-copy status following multiple genome-wide and smaller scale duplication events. We rule out the possibility that such a pattern could be explained by random gene loss only and therefore propose that there is selection pressure to preserve such genes as singletons. This is further substantiated by the observation that angiosperm single-copy genes do not comprise a random fraction of the genome, but instead are often involved in essential housekeeping functions that are highly conserved across all eukaryotes. Furthermore, single-copy genes are generally expressed more highly and in more tissues than non-single-copy genes, and they exhibit higher sequence conservation. Finally, we propose different hypotheses to explain their resistance against duplication.
  63. De Witte, D., Van de Velde, J., Van Bel, M., Audenaert, P., Demeester, P., Dhoedt, B., Vandepoele, K., et al. (2013). Comparative motif discovery in the cloud. Benelux Bioinformatics Conference 2013, Abstracts. Presented at the Benelux Bioinformatics Conference 2013.
  64. Wang, F., Vandepoele, K., & Van Lijsebettens, M. (2012). Tetraspanin genes in plants. PLANT SCIENCE, 190, 9–15.
  65. Dessimoz, C., Gabaldón, T., Roos, D. S., Sonnhammer, E. L., Herrero, J., Quest Orthologs Consortium, the, Vandepoele, K., et al. (2012). Toward community standards in the quest for orthologs. BIOINFORMATICS, 28(6), 900–904.
  66. Van Bel, M., Proost, S., Wischnitzki, E., Movahedi, S., Scheerlinck, C., Van de Peer, Y., & Vandepoele, K. (2012). Dissecting plant genomes with the PLAZA comparative genomics platform. PLANT PHYSIOLOGY, 158(2), 590–600.
    With the arrival of low-cost, next-generation sequencing, a multitude of new plant genomes are being publicly released, providing unseen opportunities and challenges for comparative genomics studies. Here, we present PLAZA 2.5, a user-friendly online research environment to explore genomic information from different plants. This new release features updates to previous genome annotations and a substantial number of newly available plant genomes as well as various new interactive tools and visualizations. Currently, PLAZA hosts 25 organisms covering a broad taxonomic range, including 13 eudicots, five monocots, one lycopod, one moss, and five algae. The available data consist of structural and functional gene annotations, homologous gene families, multiple sequence alignments, phylogenetic trees, and colinear regions within and between species. A new Integrative Orthology Viewer, combining information from different orthology prediction methodologies, was developed to efficiently investigate complex orthology relationships. Cross-species expression analysis revealed that the integration of complementary data types extended the scope of complex orthology relationships, especially between more distantly related species. Finally, based on phylogenetic profiling, we propose a set of core gene families within the green plant lineage that will be instrumental to assess the gene space of draft or newly sequenced plant genomes during the assembly or annotation phase.
  67. Quimbaya Gomez, M. A., Vandepoele, K., Raspé, E., Matthijs, M., Dhondt, S., Beemster, G., Berx, G., et al. (2012). Identification of putative cancer genes through data integration and comparative genomics between plants and humans. CELLULAR AND MOLECULAR LIFE SCIENCES, 69(12), 2041–2055.
    Coordination of cell division with growth and development is essential for the survival of organisms. Mistakes made during replication of genetic material can result in cell death, growth defects, or cancer. Because of the essential role of the molecular machinery that controls DNA replication and mitosis during development, its high degree of conservation among organisms is not surprising. Mammalian cell cycle genes have orthologues in plants, and vice versa. However, besides the many known and characterized proliferation genes, still undiscovered regulatory genes are expected to exist with conserved functions in plants and humans. Starting from genome-wide Arabidopsis thaliana microarray data, an integrative strategy based on coexpression, functional enrichment analysis, and cis-regulatory element annotation was combined with a comparative genomics approach between plants and humans to detect conserved cell cycle genes involved in DNA replication and/or DNA repair. With this systemic strategy, a set of 339 genes was identified as potentially conserved proliferation genes. Experimental analysis confirmed that 20 out of 40 selected genes had an impact on plant cell proliferation; likewise, an evolutionarily conserved role in cell division was corroborated for two human orthologues. Moreover, association analysis integrating Homo sapiens gene expression data with clinical information revealed that, for 45 genes, altered transcript levels and relapse risk clearly correlated. Our results illustrate how a systematic exploration of the A. thaliana genome can contribute to the experimental identification of new cell cycle regulators that might represent novel oncogenes or/and tumor suppressors.
  68. Movahedi, S., Van Bel, M., Heyndrickx, K., & Vandepoele, K. (2012). Comparative co-expression analysis in plant biology. PLANT CELL AND ENVIRONMENT, 35(10), 1787–1798.
    The analysis of gene expression data generated by high-throughput microarray transcript profiling experiments has shown that transcriptionally coordinated genes are often functionally related. Based on large-scale expression compendia grouping multiple experiments, this guilt-by-association principle has been applied to study modular gene programmes, identify cis-regulatory elements or predict functions for unknown genes in different model plants. Recently, several studies have demonstrated how, through the integration of gene homology and expression information, correlated gene expression patterns can be compared between species. The incorporation of detailed functional annotations as well as experimental data describing protein-protein interactions, phenotypes or tissue-specific expression provides an invaluable source of information to identify conserved gene modules and translate biological knowledge from model organisms to crops. In this review, we describe the different steps required to systematically compare expression data across species. Apart from the technical challenges to compute and display expression networks from multiple species, some future applications of plant comparative transcriptomics are highlighted.
  69. Moreau, H., Verhelst, B., Couloux, A., Derelle, E., Rombauts, S., Grimsley, N., Van Bel, M., et al. (2012). Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage. GENOME BIOLOGY, 13(8).
    Background: Bathycoccus prasinos is an extremely small cosmopolitan marine green alga whose cells are covered with intricate spider's web patterned scales that develop within the Golgi cisternae before their transport to the cell surface. The objective of this work is to sequence and analyze its genome, and to present a comparative analysis with other known genomes of the green lineage. Results: Its small genome of 15 Mb consists of 19 chromosomes and lacks transposons. Although 70% of all B. prasinos genes share similarities with other Viridiplantae genes, up to 428 genes were probably acquired by horizontal gene transfer, mainly from other eukaryotes. Two chromosomes, one big and one small, are atypical, an unusual synapomorphic feature within the Mamiellales. Genes on these atypical outlier chromosomes show lower GC content and a significant fraction of putative horizontal gene transfer genes. Whereas the small outlier chromosome lacks colinearity with other Mamiellales and contains many unknown genes without homologs in other species, the big outlier shows a higher intron content, increased expression levels and a unique clustering pattern of housekeeping functionalities. Four gene families are highly expanded in B. prasinos, including sialyltransferases, sialidases, ankyrin repeats and zinc ion-binding genes, and we hypothesize that these genes are associated with the process of scale biogenesis. Conclusion: The minimal genomes of the Mamiellophyceae provide a baseline for evolutionary and functional analyses of metabolic processes in green plants.
  70. De Witte, D., Van Bel, M., Demeester, P., Dhoedt, B., Vandepoele, K., & Fostier, J. (2012). A high performance computing approach to the discovery of conserved motifs. 20th Annual Conference on Intelligent Systems for Molecular Biology, Abstracts (pp. 1–1). Presented at the 20th Annual Conference on Intelligent Systems for Molecular Biology (ISMB - 2012).
  71. De Witte, D., Van Bel, M., Demeester, P., Dhoedt, B., Vandepoele, K., & Fostier, J. (2012). Alignment-free genome-wide comparative motif discovery in 4 Monocot species. 11th European Conference on Computational Biology, Abstracts (pp. 1–1). Presented at the 11th European Conference on Computational Biology (ECCB - 2012).
  72. Heyndrickx, K., & Vandepoele, K. (2012). Systematic identification of functional plant modules through the integration of complementary data sources. PLANT PHYSIOLOGY, 159(3), 884–901.
    A major challenge is to unravel how genes interact and are regulated to exert specific biological functions. The integration of genome-wide functional genomics data, followed by the construction of gene networks, provides a powerful approach to identify functional gene modules. Large-scale expression data, functional gene annotations, experimental protein-protein interactions, and transcription factor-target interactions were integrated to delineate modules in Arabidopsis (Arabidopsis thaliana). The different experimental input data sets showed little overlap, demonstrating the advantage of combining multiple data types to study gene function and regulation. In the set of 1,563 modules covering 13,142 genes, most modules displayed strong coexpression, but functional and cis-regulatory coherence was less prevalent. Highly connected hub genes showed a significant enrichment toward embryo lethality and evidence for cross talk between different biological processes. Comparative analysis revealed that 58% of the modules showed conserved coexpression across multiple plants. Using module-based functional predictions, 5,562 genes were annotated, and an evaluation experiment disclosed that, based on 197 recently experimentally characterized genes, 38.1% of these functions could be inferred through the module context. Examples of confirmed genes of unknown function related to cell wall biogenesis, xylem and phloem pattern formation, cell cycle, hormone stimulus, and circadian rhythm highlight the potential to identify new gene functions. The module-based predictions offer new biological hypotheses for functionally unknown genes in Arabidopsis (1,701 genes) and six other plant species (43,621 genes). Furthermore, the inferred modules provide new insights into the conservation of coexpression and coregulation as well as a starting point for comparative functional annotation.
  73. Petrov, Veselin, Vermeirssen, V., De Clercq, I., Van Breusegem, F., Minkov, I., Vandepoele, K., & Gechev, T. S. (2012). Identification of cis-regulatory elements specific for different types of reactive oxygen species in Arabidopsis thaliana. GENE, 499(1), 52–60.
  74. Vaulot, D., Lepere, C., Toulza, E., De la Iglesia, R., Poulain, J., Gaboyer, F., Moreau, H., et al. (2012). Metagenomes of the picoalga Bathycoccus from the Chile coastal upwelling. PLOS ONE, 7(6).
    Among small photosynthetic eukaryotes that play a key role in oceanic food webs, picoplanktonic Mamiellophyceae such as Bathycoccus, Micromonas, and Ostreococcus are particularly important in coastal regions. By using a combination of cell sorting by flow cytometry, whole genome amplification (WGA), and 454 pyrosequencing, we obtained metagenomic data for two natural picophytoplankton populations from the coastal upwelling waters off central Chile. About 60% of the reads of each sample could be mapped to the genome of Bathycoccus strain from the Mediterranean Sea (RCC1105), representing a total of 9 Mbp (sample T142) and 13 Mbp (sample T149) of non-redundant Bathycoccus genome sequences. WGA did not amplify all regions uniformly, resulting in unequal coverage along a given chromosome and between chromosomes. The identity at the DNA level between the metagenomes and the cultured genome was very high (96.3% identical bases for the three larger chromosomes over a 360 kbp alignment). At least two to three different genotypes seemed to be present in each natural sample based on read mapping to Bathycoccus RCC1105 genome.
  75. Proost, Sebastian, Fostier, J., De Witte, D., Dhoedt, B., Demeester, P., Van de Peer, Y., & Vandepoele, K. (2012). i-ADHoRe 3.0 : fast and sensitive detection of genomic homology in extremely large data sets. NUCLEIC ACIDS RESEARCH, 40(2).
  76. Fostier, J., Proost, S., Dhoedt, B., Saeys, Y., Demeester, P., Van de Peer, Y., & Vandepoele, K. (2011). A greedy, graph-based algorithm for the alignment of multiple homologous gene lists. BIOINFORMATICS, 27(6), 749–756.
  77. Babiychuk, E., Vandepoele, K., Wissing, J., Garcia-Diaz, M., De Rycke, R., Akbari, H., Joubès, J., et al. (2011). Plastid gene expression and plant development require a plastidic protein of the mitochondrial transcription termination factor family. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 108(16), 6674–6679.
  78. Mittler, R., Vanderauwera, S., Suzuki, N., Miller, G., Tognetti, V., Vandepoele, K., Gollery, M., et al. (2011). ROS signaling: the new wave? TRENDS IN PLANT SCIENCE, 16(6), 300–309.
    Reactive oxygen species (ROS) play a multitude of signaling roles in different organisms from bacteria to mammalian cells. They were initially thought to be toxic byproducts of aerobic metabolism, but have now been acknowledged as central players in the complex signaling network of cells. In this review, we will attempt to address several key questions related to the use of ROS as signaling molecules in cells, including the dynamics and specificity of ROS signaling, networking of ROS with other signaling pathways, ROS signaling within and across different cells, ROS waves and the evolution of the ROS gene network.
  79. Movahedi, S., Van de Peer, Y., & Vandepoele, K. (2011). Comparative network analysis reveals that tissue specificity and gene function are important factors influencing the mode of expression evolution in Arabidopsis and rice. PLANT PHYSIOLOGY, 156(3), 1316–1330.
    Microarray experiments have yielded massive amounts of expression information measured under various conditions for the model species Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa). Expression compendia grouping multiple experiments make it possible to define correlated gene expression patterns within one species and to study how expression has evolved between species. We developed a robust framework to measure expression context conservation (ECC) and found, by analyzing 4,630 pairs of orthologous Arabidopsis and rice genes, that 77% showed conserved coexpression. Examples of nonconserved ECC categories suggested a link between regulatory evolution and environmental adaptations and included genes involved in signal transduction, response to different abiotic stresses, and hormone stimuli. To identify genomic features that influence expression evolution, we analyzed the relationship between ECC, tissue specificity, and protein evolution. Tissue-specific genes showed higher expression conservation compared with broadly expressed genes but were fast evolving at the protein level. No significant correlation was found between protein and expression evolution, implying that both modes of gene evolution are not strongly coupled in plants. By integration of cis-regulatory elements, many ECC conserved genes were significantly enriched for shared DNA motifs, hinting at the conservation of ancestral regulatory interactions in both model species. Surprisingly, for several tissue-specific genes, patterns of concerted network evolution were observed, unveiling conserved coexpression in the absence of conservation of tissue specificity. These findings demonstrate that orthologs inferred through sequence similarity in many cases do not share similar biological functions and highlight the importance of incorporating expression information when comparing genes across species.
  80. Huysman, M., Martens, C., Vandepoele, K., Gillard, J., Rayko, E., Heijde, M., Bowler, C., et al. (2010). Genome-wide analysis of the diatom cell cycle unveils a novel type of cyclins involved in environmental signaling. GENOME BIOLOGY, 11(2).
    Background: Despite the enormous importance of diatoms in aquatic ecosystems and their broad industrial potential, little is known about their life cycle control. Diatoms typically inhabit rapidly changing and unstable environments, suggesting that cell cycle regulation in diatoms must have evolved to adequately integrate various environmental signals. The recent genome sequencing of Thalassiosira pseudonana and Phaeodactylum tricornutum allows us to explore the molecular conservation of cell cycle regulation in diatoms. Results: By profile-based annotation of cell cycle genes, counterparts of conserved as well as new regulators were identified in T. pseudonana and P. tricornutum. In particular, the cyclin gene family was found to be expanded extensively compared to that of other eukaryotes and a novel type of cyclins was discovered, the diatom-specific cyclins. We established a synchronization method for P. tricornutum that enabled assignment of the different annotated genes to specific cell cycle phase transitions. The diatom-specific cyclins are predominantly expressed at the G1-to-S transition and some respond to phosphate availability, hinting at a role in connecting cell division to environmental stimuli. Conclusion: The discovery of highly conserved and new cell cycle regulators suggests the evolution of unique control mechanisms for diatom cell division, probably contributing to their ability to adapt and survive under highly fluctuating environmental conditions.
  81. Takahashi, N., Quimbaya Gomez, M. A., Schubert, V., Lammens, T., Vandepoele, K., Schubert, I., Matsui, M., et al. (2010). The MCM-Binding Protein ETG1 Aids Sister Chromatid Cohesion Required for Postreplicative Homologous Recombination Repair. PLOS GENETICS, 6(1).
    The DNA replication process represents a source of DNA stress that causes potentially spontaneous genome damage. This effect might be strengthened by mutations in crucial replication factors, requiring the activation of DNA damage checkpoints to enable DNA repair before anaphase onset. Here, we demonstrate that depletion of the evolutionarily conserved minichromosome maintenance helicase-binding protein ETG1 of Arabidopsis thaliana resulted in a stringent late G2 cell cycle arrest. This arrest correlated with a partial loss of sister chromatid cohesion. The lack-of-cohesion phenotype was intensified in plants without functional CTF18, a replication fork factor needed for cohesion establishment. The synergistic effect of the etg1 and ctf18 mutants on sister chromatid cohesion strengthened the impact on plant growth of the replication stress caused by ETG1 deficiency because of inefficient DNA repair. We conclude that the ETG1 replication factor is required for efficient cohesion and that cohesion establishment is essential for proper development of plants suffering from endogenous DNA stress. Cohesion defects observed upon knockdown of its human counterpart suggest an equally important developmental role for the orthologous mammalian ETG1 protein.
  82. Piganeau, G., Vandepoele, K., Gourbière, S., Van de Peer, Y., & Moreau, H. (2009). Unravelling cis-Regulatory Elements in the Genome of the Smallest Photosynthetic Eukaryote: Phylogenetic Footprinting in Ostreococcus. Journal of Molecular Evolution, 69(3), 249–259.
    We used a phylogenetic footprinting approach, adapted to high levels of divergence, to estimate the level of constraint in intergenic regions of the extremely gene dense Ostreococcus algae genomes (Chlorophyta, Prasinophyceae). We first benchmarked our method against the Saccharomyces sensu stricto genome data and found that the proportion of conserved non-coding sites was consistent with those obtained with methods using calibration by the neutral substitution rate. We then applied our method to the complete genomes of Ostreococcus tauri and O. lucimarinus, which are the most divergent species from the same genus sequenced so far. We found that 77% of intergenic regions in Ostreococcus still contain some phylogenetic footprints, as compared to 88% for Saccharomyces, corresponding to an average rate of constraint on intergenic region of 17% and 30%, respectively. A comparison with some known functional cis-regulatory elements enabled us to investigate whether some transcriptional regulatory pathways were conserved throughout the green lineage. Strikingly, the size of the phylogenetic footprints depends on gene orientation of neighboring genes, and appears to be genus-specific. In Ostreococcus, 5' intergenic regions contain four times more conserved sites than 3' intergenic regions, whereas in yeast a higher frequency of constrained sites in intergenic regions between genes on the same DNA strand suggests a higher frequency of bidirectional regulatory elements. The phylogenetic footprinting approach can be used despite high levels of divergence in the ultrasmall Ostreococcus algae, to decipher structure of constrained regulatory motifs, and identify putative regulatory pathways conserved within the green lineage.
  83. De Bodt, S., Proost, S., Vandepoele, K., Rouzé, P., & Van de Peer, Y. (2009). Predicting protein-protein interactions in Arabidopsis thaliana through integration of orthology, gene ontology and co-expression. BMC Genomics, 10(288), 1–15.
    Background: Large-scale identification of the interrelationships between different components of the cell, such as the interactions between proteins, has recently gained great interest. However, unraveling large-scale protein-protein interaction maps is laborious and expensive. Moreover, assessing the reliability of the interactions can be cumbersome. Results: In this study, we have developed a computational method that exploits the existing knowledge on protein-protein interactions in diverse species through orthologous relations on the one hand, and functional association data on the other hand to predict and filter protein-protein interactions in Arabidopsis thaliana. A highly reliable set of protein-protein interactions is predicted through this integrative approach making use of existing protein-protein interaction data from yeast, human, C. elegans and D. melanogaster. Localization, biological process, and co-expression data are used as powerful indicators for protein-protein interactions. The functional repertoire of the identified interactome reveals interactions between proteins functioning in well-conserved as well as plant-specific biological processes. We observe that although common mechanisms (e.g. actin polymerization) and components (e.g. ARPs, actin-related proteins) exist between different lineages, they are active in specific processes such as growth, cancer metastasis and trichome development in yeast, human and Arabidopsis, respectively. Conclusion: We conclude that the integration of orthology with functional association data is adequate to predict protein-protein interactions. Through this approach, a high number of novel protein-protein interactions with diverse biological roles is discovered. Overall, we have predicted a reliable set of protein-protein interactions suitable for further computational as well as experimental analyses.
  84. Van de Peer, Y., Fawcett, J., Proost, S., Sterck, L., & Vandepoele, K. (2009). The flowering world: a tale of duplications. TRENDS IN PLANT SCIENCE, 14(12), 680–688.
    Flowering plants contain many genes, most of which were created during the past 200 or so million years through small- and large-scale duplications. Paleo-polyploidy events, in particular, have been the subject of much recent research. There is a growing consensus that one or more genome doubling or merging events occurred early during the evolution of the flowering plants, and that many lineages have since undergone additional, independent and more recent duplication events. Here, we review the difficulties in determining the number of genome duplications and discuss how the completion of some additional genome sequences of species occupying key phylogenetic positions has led to a better understanding of the timing of certain duplication events. This is important if we want to demonstrate the significance of genome duplications for the evolution and radiation of (different groups of) flowering plants.
  85. Naouar, N., Vandepoele, K., Lammens, T., Casneuf, T., Zeller, G., Van Hummelen, P., Weigel, D., et al. (2009). Quantitative RNA expression analysis with Affymetrix Tiling 1.0R arrays identifies new E2F target genes. Plant Journal, 57(1), 184–194.
    The Affymetrix ATH1 array provides a robust standard tool for transcriptome analysis, but unfortunately does not represent all of the transcribed genes in Arabidopsis thaliana. Recently, Affymetrix has introduced its Arabidopsis Tiling 1.0R array, which offers whole-genome coverage of the sequenced Col-0 reference strain. Here, we present an approach to exploit this platform for quantitative mRNA expression analysis, and compare the results with those obtained using ATH1 arrays. We also propose a method for selecting unique tiling probes for each annotated gene or transcript in the most current genome annotation, TAIR7, generating Chip Definition Files for the Tiling 1.0R array. As a test case, we compared the transcriptome of wild-type plants with that of transgenic plants overproducing the heterodimeric E2Fa-DPa transcription factor. We show that with the appropriate data pre-processing, the estimated changes per gene for those with significantly different expression levels is very similar for the two array types. With the tiling arrays we could identify 368 new E2F-regulated genes, with a large fraction including an E2F motif in the promoter. The latter groups increase the number of excellent candidates for new, direct E2F targets by almost twofold, from 181 to 334.
  86. Vandepoele, Klaas, Quimbaya Gomez, M. A., Casneuf, T., De Veylder, L., & Van de Peer, Y. (2009). Unraveling Transcriptional Control in Arabidopsis Using cis-Regulatory Elements and Coexpression Networks. Plant Physiology, 150(2), 535–546.
    Analysis of gene expression data generated by high-throughput microarray transcript profiling experiments has demonstrated that genes with an overall similar expression pattern are often enriched for similar functions. This guilt-by-association principle can be applied to define modular gene programs, identify cis-regulatory elements, or predict gene functions for unknown genes based on their coexpression neighborhood. We evaluated the potential to use Gene Ontology (GO) enrichment of a gene's coexpression neighborhood as a tool to predict its function but found overall low sensitivity scores (13%-34%). This indicates that for many functional categories, coexpression alone performs poorly to infer known biological gene functions. However, integration of cis-regulatory elements shows that 46% of the gene coexpression neighborhoods are enriched for one or more motifs, providing a valuable complementary source to functionally annotate genes. Through the integration of coexpression data, GO annotations, and a set of known cis-regulatory elements combined with a novel set of evolutionarily conserved plant motifs, we could link many genes and motifs to specific biological functions. Application of our coexpression framework extended with cis-regulatory element analysis on transcriptome data from the cell cycle-related transcription factor OBP1 yielded several coexpressed modules associated with specific cis-regulatory elements. Moreover, our analysis strongly suggests a feed-forward regulatory interaction between OBP1 and the E2F pathway. The ATCOECIS resource (http://bioinformatics.psb.ugent.be/ATCOECIS/) makes it possible to query coexpression data and GO and cis-regulatory element annotations and to submit user-defined gene sets for motif analysis, providing an access point to unravel the regulatory code underlying transcriptional control in Arabidopsis (Arabidopsis thaliana).
  87. Proost, S., Van Bel, M., Sterck, L., Billiau, K., Van Parys, T., Van de Peer, Y., & Vandepoele, K. (2009). PLAZA : a comparative genomics resource to study gene and genome evolution in plants. PLANT CELL, 21(12), 3718–3731.
    The number of sequenced genomes of representatives within the green lineage is rapidly increasing. Consequently, comparative sequence analysis has significantly altered our view on the complexity of genome organization, gene function, and regulatory pathways. To explore all this genome information, a centralized infrastructure is required where all data generated by different sequencing initiatives is integrated and combined with advanced methods for data mining. Here, we describe PLAZA, an online platform for plant comparative genomics (http://bioinformatics.psb.ugent.be/plaza/). This resource integrates structural and functional annotation of published plant genomes together with a large set of interactive tools to study gene function and gene and genome evolution. Precomputed data sets cover homologous gene families, multiple sequence alignments, phylogenetic trees, intraspecies whole-genome dot plots, and genomic colinearity between species. Through the integration of high confidence Gene Ontology annotations and tree-based orthology between related species, thousands of genes lacking any functional description are functionally annotated. Advanced query systems, as well as multiple interactive visualization tools, are available through a user-friendly and intuitive Web interface. In addition, detailed documentation and tutorials introduce the different tools, while the workbench provides an efficient means to analyze user-defined gene sets through PLAZA's interface. In conclusion, PLAZA provides a comprehensible and up-to-date research environment to aid researchers in the exploration of genome information within the green plant lineage.
  88. Dhaese, Stien, Vandepoele, K., Waterschoot, D., Vanloo, B., Vandekerckhove, J., Ampe, C., & Van Troys, M. (2009). The Mouse Thymosin Beta15 Gene Family Displays Unique Complexity and Encodes A Functional Thymosin Repeat. JOURNAL OF MOLECULAR BIOLOGY, 387(4), 809–825.
    We showed earlier that human beta-thymosin 15 (Tb15) is up-regulated in prostate cancer, confirming studies from others that propagated Tb15 as a prostate cancer biomarker. In this first report on mouse Tb15, we show that, unlike in humans, four Tb15-like isoforms are present in mouse. We used phylogenetic analysis of deuterostome beta-thymosins to show that these four new isoforms cluster within the vertebrate Tb15-clade. Intriguingly, one of these mouse beta-thymosins, Tb15r, consists of two beta-thymosin domains. The existence of such a repeat beta-thymosin is so far unique in vertebrates, though common in lower eukaryotes. Biochemical data indicate that Tb15r potently sequesters actin. In a cellular context, Tb15r behaves as a bona fide beta-thymosin, lowering central stress fibre content. We reveal that a complex genomic organization underlies Tb15r expression: Tb15r results from read-through transcription and alternative splicing of two tandem duplicated mouse Tb15 genes. Transcript profiling of all mouse beta-thymosin isoforms (Tb15s, Tb4 and Tb10) reveals that two isoform switches occur between embryonic and adult tissues, and indicates Tb15r as the major mouse Tb15 isoform in adult cells. Tb15r is present also in mouse prostate cancer cell lines. This insight into the mouse Tb15 family is fundamental for future studies on Tb15 in mouse (prostate) cancer models.
  89. Vandenbroucke, Korneel, Robbens, S., Vandepoele, K., Inzé, D., Van de Peer, Y., & Van Breusegem, F. (2008). Hydrogen peroxide-induced gene expression across kingdoms: a comparative analysis. MOLECULAR BIOLOGY AND EVOLUTION, 25(3), 507–516.
    Cells react to oxidative stress conditions by launching a defense response through the induction of nuclear gene expression. The advent of microarray technologies allowed monitoring of oxidative stress-dependent changes of transcript levels at a comprehensive and genome-wide scale, resulting in a series of inventories of differentially expressed genes in different organisms. We performed a meta-analysis on hydrogen peroxide (H2O2)-induced gene expression in the cyanobacterium Synechocystis PCC 6803, the yeast Saccharomyces cerevisiae and Schizosaccharomyces pombe, the land plant Arabidopsis thaliana, and the human HeLa cell line. The H2O2-induced gene expression in both yeast species was highly conserved and more similar to the A. thaliana response than that of the human cell line. Based on the expression characteristics of genuine antioxidant genes, we show that the antioxidant capacity of microorganisms and higher eukaryotes is differentially regulated. Four families of evolutionarily conserved eukaryotic proteins could be identified that were H2O2 responsive across kingdoms: DNAJ domain-containing heat shock proteins, small guanine triphosphate-binding proteins, Ca2+-dependent protein kinases, and ubiquitin-conjugating enzymes.
  90. Martens, Cindy, Vandepoele, K., & Van de Peer, Y. (2008). Whole-genome analysis reveals molecular innovations and evolutionary transitions in chromalveolate species. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 105(9), 3427–3432.
    The chromalveolates form a highly diverse and fascinating assemblage of organisms, ranging from obligatory parasites such as Plasmodium to free-living ciliates and algae such as kelps, diatoms, and dinoflagellates. Many of the species in this monophyletic grouping are of major medical, ecological, and economical importance. Nevertheless, their genome evolution is much less well studied than that of higher plants, animals, or fungi. In the current study, we have analyzed and compared 12 chromalveolate species for which whole-sequence information is available and provide a detailed picture on gene loss and gene gain in the different lineages. As expected, many gene loss and gain events can be directly correlated with the lifestyle and specific adaptations of the organisms studied. For instance, in the obligate intracellular Apicomplexa we observed massive loss of genes that play a role in general basic processes such as amino acid, carbohydrate, and lipid metabolism, reflecting the transition of a free-living to an obligate intracellular lifestyle. In contrast, many gene families show species-specific expansions, such as those in the plant pathogen oomycete Phytophthora that are involved in degrading the plant cell wall polysaccharides to facilitate the pathogen invasion process. In general, chromalveolates show a tremendous difference in genome structure and evolution and in the number of genes they have lost or gained either through duplication or horizontal gene transfer.
  91. Bowler, Chris, Allen, A. E., Badger, J. H., Grimwood, J., Jabbari, K., Kuo, A., Maheswari, U., et al. (2008). The Phaeodactylum genome reveals the evolutionary history of diatom genomes. NATURE, 456(7219), 239–244.
    Diatoms are photosynthetic secondary endosymbionts found throughout marine and freshwater environments, and are believed to be responsible for around one-fifth of the primary productivity on Earth(1,2). The genome sequence of the marine centric diatom Thalassiosira pseudonana was recently reported, revealing a wealth of information about diatom biology(3-5). Here we report the complete genome sequence of the pennate diatom Phaeodactylum tricornutum and compare it with that of T. pseudonana to clarify evolutionary origins, functional significance and ubiquity of these features throughout diatoms. In spite of the fact that the pennate and centric lineages have only been diverging for 90 million years, their genome structures are dramatically different and a substantial fraction of genes (~40%) are not shared by these representatives of the two lineages. Analysis of molecular divergence compared with yeasts and metazoans reveals rapid rates of gene diversification in diatoms. Contributing factors include selective gene family expansions, differential losses and gains of genes and introns, and differential mobilization of transposable elements. Most significantly, we document the presence of hundreds of genes from bacteria. More than 300 of these gene transfers are found in both diatoms, attesting to their ancient origins, and many are likely to provide novel possibilities for metabolite management and for perception of environmental signals. These findings go a long way towards explaining the incredible diversity and success of the diatoms in contemporary oceans.
  92. Lessa Alvim Kamei, C., Boruc, J., Vandepoele, K., Van Den Daele, H., Maes, S., Russinova, E., Inzé, D., et al. (2008). The PRA1 gene family in Arabidopsis. PLANT PHYSIOLOGY, 147(4), 1735–1749.
    Prenylated Rab acceptor 1 (PRA1) domain proteins are small transmembrane proteins that regulate vesicle trafficking as receptors of Rab GTPases and the vacuolar soluble N-ethylmaleimide-sensitive factor attachment receptor protein VAMP2. However, little is known about PRA1 family members in plants. Sequence analysis revealed that higher plants, compared with animals and primitive plants, possess an expanded family of PRA1 domain-containing proteins. The Arabidopsis (Arabidopsis thaliana) PRA1 (AtPRA1) proteins were found to homodimerize and heterodimerize in a manner corresponding to their phylogenetic distribution. Different AtPRA1 family members displayed distinct expression patterns, with a preference for vascular cells and expanding or developing tissues. AtPRA1 genes were significantly coexpressed with Rab GTPases and genes encoding vesicle transport proteins, suggesting an involvement in the vesicle trafficking process similar to that of their animal counterparts. Correspondingly, AtPRA1 proteins were localized in the endoplasmic reticulum, Golgi apparatus, and endosomes/prevacuolar compartments, hinting at a function in both secretory and endocytic intracellular trafficking pathways. Taken together, our data reveal a high functional diversity of AtPRA1 proteins, probably dealing with the various demands of the complex trafficking system.
  93. Van Roy, F., Vandepoele, K., Van Roy, N., Andries, V., Staes, K., Vandesompele, J., Laureys, G., et al. (2008). A constitutional translocation t(1;17)(p36.2;q11.2) in a neuroblastoma patient disrupts the human NBPF1 and ACCN1 genes. EJC SUPPLEMENTS (Vol. 6, pp. 14–14). Presented at the 20th Meeting of the European Association for Cancer Research.
  94. Sterck, L., Rombauts, S., Vandepoele, K., Rouzé, P., & Van de Peer, Y. (2007). How many genes are there in plants (... and why are they there)? CURRENT OPINION IN PLANT BIOLOGY, 10(2), 199–203.
    Annotation of the first few complete plant genomes has revealed that plants have many genes. For Arabidopsis, over 26 500 gene loci have been predicted, whereas for rice, the number adds up to 41 000. Recent analysis of the poplar genome suggests more than 45 000 genes, and partial sequence data from Medicago and Lotus also suggest that these plants contain more than 40 000 genes. Nevertheless, estimations suggest that ancestral angiosperms had no more than 12 000-14 000 genes. One explanation for the large increase in gene number during angiosperm evolution is gene duplication. It has been shown previously that the retention of duplicates following small- and large-scale duplication events in plants is substantial. Taking into account the function of genes that have been duplicated, we are now beginning to understand why many plant genes might have been retained, and how their retention might be linked to the typical lifestyle of plants.
  95. Velasco, R., Zharkikh, A., Troggio, M., Cartwright, D. A., Cestaro, A., Pruss, D., Pindo, M., et al. (2007). A high quality draft consensus sequence of the genome of a heterozygous grapevine variety. PLOS ONE, 2(12).
    Background. Worldwide, grapes and their derived products have a large market. The cultivated grape species Vitis vinifera has potential to become a model for fruit trees genetics. Like many plant species, it is highly heterozygous, which is an additional challenge to modern whole genome shotgun sequencing. In this paper a high quality draft genome sequence of a cultivated clone of V. vinifera Pinot Noir is presented. Principal Findings. We estimate the genome size of V. vinifera to be 504.6 Mb. Genomic sequences corresponding to 477.1 Mb were assembled in 2,093 metacontigs and 435.1 Mb were anchored to the 19 linkage groups (LGs). The number of predicted genes is 29,585, of which 96.1% were assigned to LGs. This assembly of the grape genome provides candidate genes implicated in traits relevant to grapevine cultivation, such as those influencing wine quality, via secondary metabolites, and those connected with the extreme susceptibility of grape to pathogens. Single nucleotide polymorphism (SNP) distribution was consistent with a diffuse haplotype structure across the genome. Of around 2,000,000 SNPs, 1,751,176 were mapped to chromosomes and one or more of them were identified in 86.7% of anchored genes. The relative age of grape duplicated genes was estimated and this made possible to reveal a relatively recent Vitis-specific large scale duplication event concerning at least 10 chromosomes (duplication not reported before). Conclusions. Sanger shotgun sequencing and highly efficient sequencing by synthesis (SBS), together with dedicated assembly programs, resolved a complex heterozygous genome. A consensus sequence of the genome and a set of mapped marker loci were generated. Homologous chromosomes of Pinot Noir differ by 11.2% of their DNA (hemizygous DNA plus chromosomal gaps). SNP markers are offered as a tool with the potential of introducing a new era in the molecular breeding of grape.
  96. Rymen, B., Fiorani, F., Kartal, F., Vandepoele, K., Inzé, D., & Beemster, G. (2007). Cold nights impair leaf growth and cell cycle progression in maize through transcriptional changes of cell cycle genes. PLANT PHYSIOLOGY, 143(3), 1429–1438.
    Low temperature inhibits the growth of maize (Zea mays) seedlings and limits yield under field conditions. To study the mechanism of cold-induced growth retardation, we exposed maize B73 seedlings to low night temperature (25°C/4°C, day/night) from germination until the completion of leaf 4 expansion. This treatment resulted in a 20% reduction in final leaf size compared to control conditions (25°C/18°C, day/night). A kinematic analysis of leaf growth rates in control and cold-treated leaves during daytime showed that cold nights affected both cell cycle time (+65%) and cell production (−22%). In contrast, the size of mature epidermal cells was unaffected. To analyze the effect on cell cycle progression at the molecular level, we identified through a bioinformatics approach a set of 43 cell cycle genes and analyzed their expression in proliferating, expanding, and mature cells of leaves exposed to either control or cold nights. This analysis showed that: (1) the majority of cell cycle genes had a consistent proliferation-specific expression pattern; and (2) the increased cell cycle time in the basal meristem of leaves exposed to cold nights was associated with differential expression of cell cycle inhibitors and with the concomitant down-regulation of positive regulators of cell division.
  97. Polet, D., Lambrechts, A., Vandepoele, K., Vandekerckhove, J., & Ampe, C. (2007). On the origin and evolution of vertebrate and viral profilins. FEBS LETTERS, 581(2), 211–217.
  98. Peres, A., Churchman, M. L., Hariharan, S., Himanen, K., Verkest, A., Vandepoele, K., Magyar, Z., et al. (2007). Novel plant-specific cyclin-dependent kinase inhibitors induced by biotic and abiotic stresses. JOURNAL OF BIOLOGICAL CHEMISTRY, 282(35), 25588–25596.
    The EL2 gene of rice (Oryza sativa), previously classified as early response gene against the potent biotic elicitor N-acetylchitoheptaose and encoding a short polypeptide with unknown function, was identified as a novel cell cycle regulatory gene related to the recently reported SIAMESE (SIM) gene of Arabidopsis thaliana. Iterative two-hybrid screens, in vitro pull-down assays, and fluorescence resonance energy transfer analyses showed that Orysa;EL2 binds the cyclin-dependent kinase (CDK) CDKA1;1 and D-type cyclins. No interaction was observed with the plant-specific B-type CDKs. The amino acid motif ELERFL was identified to be essential for cyclin, but not for CDK binding. Orysa;EL2 impaired the ability of Orysa;CYCD5;3 to complement a budding yeast (Saccharomyces cerevisiae) triple CLN mutant, whereas recombinant protein inhibited CDK activity in vitro. Moreover, Orysa;EL2 was able to rescue the multicellular trichome phenotype of sim mutants of Arabidopsis, unequivocally demonstrating that Orysa;EL2 operates as a cell cycle inhibitor. Orysa;EL2 mRNA levels were induced by cold, drought, and propionic acid. Our data suggest that Orysa;EL2 encodes a new type of plant CDK inhibitor that links cell cycle progression with biotic and abiotic stress responses.
  99. Blomme, T., Vandepoele, K., De Bodt, S., Simillion, C., Maere, S., & Van de Peer, Y. (2006). The gain and loss of genes during 600 million years of vertebrate evolution. GENOME BIOLOGY, 7(5).
    Background: Gene duplication is assumed to have played a crucial role in the evolution of vertebrate organisms. Apart from a continuous mode of duplication, two or three whole genome duplication events have been proposed during the evolution of vertebrates, one or two at the dawn of vertebrate evolution, and an additional one in the fish lineage, not shared with land vertebrates. Here, we have studied gene gain and loss in seven different vertebrate genomes, spanning an evolutionary period of about 600 million years. Results: We show that: first, the majority of duplicated genes in extant vertebrate genomes are ancient and were created at times that coincide with proposed whole genome duplication events; second, there exist significant differences in gene retention for different functional categories of genes between fishes and land vertebrates; third, there seems to be a considerable bias in gene retention of regulatory genes towards the mode of gene duplication (whole genome duplication events compared to smaller-scale events), which is in accordance with the so-called gene balance hypothesis; and fourth, that ancient duplicates that have survived for many hundreds of millions of years can still be lost. Conclusion: Based on phylogenetic analyses, we show that both the mode of duplication and the functional class the duplicated genes belong to have been of major importance for the evolution of the vertebrates. In particular, we provide evidence that massive gene duplication (probably as a consequence of entire genome duplications) at the dawn of vertebrate evolution might have been particularly important for the evolution of complex vertebrates.
  100. Vandepoele, Klaas, Casneuf, T., & Van de Peer, Y. (2006). Identification of novel regulatory modules in dicotyledonous plants using expression data and comparative genomics. GENOME BIOLOGY, 7(11).
    Background: Transcriptional regulation plays an important role in the control of many biological processes. Transcription factor binding sites (TFBSs) are the functional elements that determine transcriptional activity and are organized into separable cis-regulatory modules, each defining the cooperation of several transcription factors required for a specific spatio-temporal expression pattern. Consequently, the discovery of novel TFBSs in promoter sequences is an important step to improve our understanding of gene regulation. Results: Here, we applied a detection strategy that combines features of classic motif overrepresentation approaches in co-regulated genes with general comparative footprinting principles for the identification of biologically relevant regulatory elements and modules in Arabidopsis thaliana, a model system for plant biology. In total, we identified 80 TFBSs and 139 regulatory modules, most of which are novel, and primarily consist of two or three regulatory elements that could be linked to different important biological processes, such as protein biosynthesis, cell cycle control, photosynthesis and embryonic development. Moreover, studying the physical properties of some specific regulatory modules revealed that Arabidopsis promoters have a compact nature, with cooperative TFBSs located in close proximity of each other. Conclusion: These results create a starting point to unravel regulatory networks in plants and to study the regulation of biological processes from a systems biology point of view.
  101. Vandepoele, Klaas. (2005). Mode and tempo of gene and genome evolution in plants. Ghent University. Faculty of Sciences, Ghent, Belgium.
  102. Paterson, A. H., Bowers, J. E., Van de Peer, Y., & Vandepoele, K. (2005). Ancient duplication of cereal genomes. NEW PHYTOLOGIST.
  103. Vandepoele, Klaas, Vlieghe, K., Florquin, K., Hennig, L., Beemster, G., Gruissem, W., Van de Peer, Y., et al. (2005). Genome-wide identification of potential plant E2F target genes. PLANT PHYSIOLOGY, 139(1), 316–328.
    Entry into the S phase of the cell cycle is controlled by E2F transcription factors that induce the transcription of genes required for cell cycle progression and DNA replication. Although the E2F pathway is highly conserved in higher eukaryotes, only a few E2F target genes have been experimentally validated in plants. We have combined microarray analysis and bioinformatics tools to identify plant E2F-responsive genes. Promoter regions of genes that were induced at the transcriptional level in Arabidopsis (Arabidopsis thaliana) seedlings ectopically expressing genes for the E2Fa and DPa transcription factors were searched for the presence of E2F-binding sites, resulting in the identification of 181 putative E2F target genes. In most cases, the E2F-binding element was located close to the transcription start site, but occasionally could also be localized in the 5' untranslated region. Comparison of our results with available microarray data sets from synchronized cell suspensions revealed that the E2F target genes were expressed almost exclusively during G1 and S phases and activated upon reentry of quiescent cells into the cell cycle. To test the robustness of the data for the Arabidopsis E2F target genes, we also searched for the presence of E2F-cis-acting elements in the promoters of the putative orthologous rice (Oryza sativa) genes. Using this approach, we identified 70 potential conserved plant E2F target genes. These genes encode proteins involved in cell cycle regulation, DNA replication, and chromatin dynamics. In addition, we identified several genes for potentially novel S phase regulatory proteins.
  104. Vandepoele, Klaas, & Van de Peer, Y. (2005). Exploring the plant transcriptome through phylogenetic profiling. PLANT PHYSIOLOGY, 137(1), 31–42.
    Publicly available protein sequences represent only a small fraction of the full catalog of genes encoded by the genomes of different plants, such as green algae, mosses, gymnosperms, and angiosperms. By contrast, an enormous amount of expressed sequence tags (ESTs) exists for a wide variety of plant species, representing a substantial part of all transcribed plant genes. Integrating protein and EST sequences in comparative and evolutionary analyses is not straightforward because of the heterogeneous nature of both types of sequence data. By combining information from publicly available EST and protein sequences for 32 different plant species, we identified more than 250,000 plant proteins organized in more than 12,000 gene families. Approximately 60% of the proteins are absent from current sequence databases but provide important new information about plant gene families. Analysis of the distribution of gene families over different plant species through phylogenetic profiling reveals interesting insights into plant gene evolution, and identifies species- and lineage-specific gene families, orphan genes, and conserved core genes across the green plant lineage. We counted a similar number of approximately 9,500 gene families in monocotyledonous and eudicotyledonous plants and found strong evidence for the existence of at least 33,700 genes in rice (Oryza sativa). Interestingly, the larger number of genes in rice compared to Arabidopsis (Arabidopsis thaliana) can partially be explained by a larger amount of species-specific single-copy genes and species-specific gene families. In addition, a majority of large gene families, typically containing more than 50 genes, are bigger in rice than Arabidopsis, whereas the opposite seems true for small gene families.
  105. Simillion, C., Vandepoele, K., & Van de Peer, Y. (2004). Recent developments in computational approaches for uncovering genomic homology. BIOESSAYS, 26(11), 1225–1235.
  106. Simillion, C., Vandepoele, K., Saeys, Y., & Van de Peer, Y. (2004). Building genomic profiles for uncovering segmental homology in the twilight zone. Belgian Bioinformatics Conference, 4th, Abstracts. Presented at the 4th Belgian Bioinformatics Conference (BBC 2004).
  107. Landrieu, I., da Costa, M., De Veylder, L., Dewitte, F., Vandepoele, K., Hassan, S., Wieruszeski, J.-M., et al. (2004). A small CDC25 dual-specificity tyrosine-phosphatase isoform in Arabidopsis thaliana. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 101(36), 13380–13385.
    The dual-specificity CDC25 phosphatases are critical positive regulators of cyclin-dependent kinases (CDKs). Even though an antagonistic Arabidopsis thaliana WEE1 kinase has been cloned and tyrosine phosphorylation of its CDKs has been demonstrated, no valid candidate for a CDC25 protein has been reported in higher plants. We identify a CDC25-related protein (Arath;CDC25) of A. thaliana, constituted by a sole catalytic domain. The protein has a tyrosine-phosphatase activity and stimulates the kinase activity of Arabidopsis CDKs. Its tertiary structure was obtained by NMR spectroscopy and confirms that Arath;CDC25 belongs structurally to the classical CDC25 superfamily with a central five-stranded beta-sheet surrounded by helices. A particular feature of the protein, however, is the presence of an additional zinc-binding loop in the C-terminal part. NMR mapping studies revealed the interaction with phosphorylated peptidic models derived from the conserved CDK loop containing the phosphothreonine-14 and phosphotyrosine-15. We conclude that despite sequence divergence, Arath;CDC25 is structurally and functionally an isoform of the CDC25 superfamily, which is conserved in yeast and in plants, including Arabidopsis and rice.
  108. Simillion, C., Vandepoele, K., Saeys, Y., & Van de Peer, Y. (2004). Building genomic profiles for uncovering segmental homology in the twilight zone. GENOME RESEARCH, 14(6), 1095–1106.
    The identification of homologous regions within and between genomes is an essential prerequisite for studying genome structure and evolution. Different methods already exist that allow detecting homologous regions in an automated manner. These methods are based either on finding sequence similarities at the DNA level or on identifying chromosomal regions showing conservation of gene order and content. Especially the latter approach has proven useful for detecting homology between highly divergent chromosomal regions. However, until now, such map-based approaches required that candidate homologous regions show significant collinearity with other segments to be considered as being homologous. Here, we present a novel method that creates profiles combining the gene order and content information of multiple mutually homologous genomic segments. These profiles can be used to scan one or more genomes to detect segments that show significant collinearity with the entire profile but not necessarily with individual segments. When applying this new method to the combined genomes of Arabidopsis and rice, we find additional evidence for ancient duplication events in the rice genome.
  109. Vandepoele, Klaas, Simillion, C., & Van de Peer, Y. (2004). The quest for genomic homology. CURRENT GENOMICS, 5(4), 299–308.
  110. Gevers, D., Vandepoele, K., Simillion, C., & Van de Peer, Y. (2004). Gene duplication and biased functional retention of paralogs in bacterial genomes. TRENDS IN MICROBIOLOGY, 12(4), 148–154.
    Gene duplication is considered an important prerequisite for gene innovation that can facilitate adaptation to changing environments. The analysis of 106 bacterial genome sequences has revealed the existence of a significant number of paralogs. Analysis of the functional classification of these paralogs reveals a preferential enrichment in functional classes that are involved in transcription, metabolism and defense mechanisms. From the organization of paralogs in the genome we can conclude that duplicated genes in bacteria appear to have been mainly created by small-scale duplication events, such as tandem and operon duplications.
  111. Vercammen, Dominique, Van De Cotte, B., De Jaeger, G., Eeckhout, D., Casteels, P., Vandepoele, K., Vandenberghe, I., et al. (2004). Type II metacaspases Atmc4 and Atmc9 of Arabidopsis thaliana cleave substrates after arginine and lysine. JOURNAL OF BIOLOGICAL CHEMISTRY, 279(44), 45329–45336.
    Nine potential caspase counterparts, designated metacaspases, were identified in the Arabidopsis thaliana genome. Sequence analysis revealed two types of metacaspases, one with (type I) and one without (type II) a proline- or glutamine-rich N-terminal extension, possibly representing a prodomain. Production of recombinant Arabidopsis type II metacaspases in Escherichia coli resulted in cysteine-dependent autocatalytic processing of the proform into large and small subunits, in analogy to animal caspases. A detailed biochemical characterization with a broad range of synthetic oligopeptides and several protease inhibitors of purified recombinant proteins of both metacaspase 4 and 9 showed that both metacaspases are arginine/lysine-specific cysteine proteases and did not cleave caspase-specific synthetic substrates. These findings suggest that type II metacaspases are not directly responsible for earlier reported caspase-like activities in plants.
  112. Vandepoele, K., De Vos, W., Taylor, J. S., Meyer, A., & Van de Peer, Y. (2004). Major events in the genome evolution of vertebrates: paranome age and size differ considerably between ray-finned fishes and land vertebrates. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 101(6), 1638–1643.
    It has been suggested that fish have more genes than humans. Whether most of these additional genes originated through a complete (fish-specific) genome duplication or through many lineage-specific tandem gene or smaller block duplications and family expansions continues to be debated. We analyzed the complete genome of the pufferfish Takifugu rubripes (Fugu) and compared it with the paranome of humans. We show that most paralogous genes of Fugu are the result of three complete genome duplications. Both relative and absolute dating of the complete predicted set of protein-coding genes suggest that initial genome duplications, estimated to have occurred at least 600 million years ago, shaped the genome of all vertebrates. In addition, analysis of >150 block duplications in the Fugu genome clearly supports a fish-specific genome duplication (~320 million years ago) that coincided with the vast radiation of most modern ray-finned fishes. Unlike the human genome, Fugu contains very few recently duplicated genes; hence, many human genes are much younger than fish genes. This lack of recent gene duplication, or, alternatively, the accelerated rate of gene loss, is possibly one reason for the drastic reduction of the genome size of Fugu observed during the past 100 million years or so, subsequent to the additional genome duplication that ray-finned fishes but not land vertebrates experienced.
  113. Breyne, Peter, Dreesen, R., Cannoot, B., Rombaut, D., Vandepoele, K., Rombauts, S., Vanderhaeghen, R., et al. (2003). Quantitative cDNA-AFLP analysis for genome-wide expression studies. MOLECULAR GENETICS AND GENOMICS, 269(2), 173–179.
    An improved cDNA-AFLP method for genome-wide expression analysis has been developed. We demonstrate that this method is an efficient tool for quantitative transcript profiling and a valid alternative to microarrays. Unique transcript tags, generated from reverse-transcribed messenger RNA by restriction enzymes, were screened through a series of selective PCR amplifications. Based on in silico analysis, an enzyme combination was chosen that ensures that at least 60% of all the mRNAs were represented by an informative sequence tag. The sensitivity and specificity of the method allows one to detect poorly expressed genes and distinguish between homologous sequences. Accurate gene expression profiles were determined by quantitative analysis of band intensities, and subtle differences in transcriptional activity were revealed. A detailed screen for cell cycle-modulated genes in tobacco demonstrates the usefulness of the technology for genome-wide expression analysis.
  114. Raes, Jeroen, Vandepoele, K., Simillion, C., Saeys, Y., & Van de Peer, Y. (2003). Investigating ancient duplication events in the Arabidopsis genome. JOURNAL OF STRUCTURAL AND FUNCTIONAL GENOMICS, 3(1-4), 117–129.
  115. Raes, Jeroen, Vandepoele, K., Simillion, C., Saeys, Y., & Van de Peer, Y. (2003). Investigating ancient duplication events in the Arabidopsis genome. In Axel Meyer & Y. Van de Peer (Eds.), Genome evolution : gene and genome duplications and the origin of novel gene functions (pp. 117–129). Dordrecht, The Netherlands: Kluwer Academic.
  116. Vandepoele, K., Simillion, C., & Van de Peer, Y. (2003). Evidence that rice and other cereals are ancient aneuploids. PLANT CELL, 15(9), 2192–2202.
    Detailed analyses of the genomes of several model organisms revealed that large-scale gene or even entire-genome duplications have played prominent roles in the evolutionary history of many eukaryotes. Recently, strong evidence has been presented that the genomic structure of the dicotyledonous model plant species Arabidopsis is the result of multiple rounds of entire-genome duplications. Here, we analyze the genome of the monocotyledonous model plant species rice, for which a draft of the genomic sequence was published recently. We show that a substantial fraction of all rice genes (~15%) are found in duplicated segments. Dating of these block duplications, their nonuniform distribution over the different rice chromosomes, and comparison with the duplication history of Arabidopsis suggest that rice is not an ancient polyploid, as suggested previously, but an ancient aneuploid that has experienced the duplication of one (or a large part of one) chromosome in its evolutionary past, ~70 million years ago. This date predates the divergence of most of the cereals, and relative dating by phylogenetic analysis shows that this duplication event is shared by most if not all of them.
  117. Breyne, Peter, Dreesen, R., Vandepoele, K., De Veylder, L., Van Breusegem, F., Callewaert, L., Rombauts, S., et al. (2002). Transcriptome analysis during cell division in plants. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 99(23), 14825–14830.
    Using synchronized tobacco Bright Yellow-2 cells and cDNA-amplified fragment length polymorphism-based genomewide expression analysis, we built a comprehensive collection of plant cell cycle-modulated genes. Approximately 1,340 periodically expressed genes were identified, including known cell cycle control genes as well as numerous unique candidate regulatory genes. A number of plant-specific genes were found to be cell cycle modulated. Other transcript tags were derived from unknown plant genes showing homology to cell cycle-regulatory genes of other organisms. Many of the genes encode novel or uncharacterized proteins, indicating that several processes underlying cell division are still largely unknown.
  118. Vandepoele, K., Saeys, Y., Simillion, C., Raes, J., & Van de Peer, Y. (2002). The automatic detection of homologous regions (ADHoRe) and its application to microcolinearity between Arabidopsis and rice. GENOME RESEARCH, 12(11), 1792–1801.
    It is expected that one of the merits of comparative genomics lies in the transfer of structural and functional information from one genome to another. This is based on the observation that, although the number of chromosomal rearrangements that occur in genomes is extensive, different species still exhibit a certain degree of conservation regarding gene content and gene order. It is in this respect that we have developed a new software tool for the Automatic Detection of Homologous Regions (ADHoRe). ADHoRe was primarily developed to find large regions of microcolinearity, taking into account different types of microrearrangements such as tandem duplications, gene loss and translocations, and inversions. Such rearrangements often complicate the detection of colinearity, in particular when comparing more anciently diverged species. Application of ADHoRe to the complete genome of Arabidopsis and a large collection of concatenated rice BACs yields more than 20 regions showing statistically significant microcolinearity between both plant species. These regions comprise from 4 up to 11 conserved homologous gene pairs. We predict the number of homologous regions and the extent of microcolinearity to increase significantly once better annotations of the rice genome become available.
  119. Vandepoele, Klaas, Raes, J., De Veylder, L., Rouzé, P., Rombauts, S., & Inzé, D. (2002). Genome-wide analysis of core cell cycle genes in Arabidopsis. PLANT CELL, 14(4), 903–916.
    Cyclin-dependent kinases and cyclins, with the help of different interacting proteins, regulate progression through the eukaryotic cell cycle. A high-quality, homology-based annotation protocol was applied to determine the core cell cycle genes in the recently completed Arabidopsis genome sequence. In total, 61 genes were identified belonging to seven selected families of cell cycle regulators, of which 30 are new or corrections of the existing annotation. A new class of putative cell cycle regulators was found that are probably competitors of E2F/DP transcription factors, which mediate the G1-to-S progression. In addition, the existing nomenclature for cell cycle genes of Arabidopsis was updated, and the physical positions of all genes were compared with segmentally duplicated blocks in the genome, showing that 22 core cell cycle genes emerged through block duplications. This genome-wide analysis illustrates the complexity of the plant cell cycle machinery and provides a tool for elucidating the function of new family members in the future.
  120. Simillion, C., Vandepoele, K., Van Montagu, M., Zabeau, M., & Van de Peer, Y. (2002). The hidden duplication past of Arabidopsis thaliana. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 99(21), 13627–13632.
    Analysis of the genome sequence of Arabidopsis thaliana shows that this genome, like that of many other eukaryotic organisms, has undergone large-scale gene duplications or even duplications of the entire genome. However, the high frequency of gene loss after duplication events reduces colinearity and therefore the chance of finding duplicated regions that, at the extreme, no longer share homologous genes. In this study we show that heavily degenerated block duplications that can no longer be recognized by directly comparing two segments because of differential gene loss, can still be detected through indirect comparison with other segments. When these so-called hidden duplications in Arabidopsis are taken into account, many homologous genomic regions can be found in five to eight copies. This finding strongly implies that Arabidopsis has undergone three, but probably no more, rounds of genome duplications. Therefore, adding such hidden blocks to the duplication landscape of Arabidopsis sheds light on the number of polyploidy events that this model plant genome has undergone in its evolutionary past.
  121. Vandepoele, Klaas, Simillion, C., & Van de Peer, Y. (2002). Detecting the undetectable: uncovering duplicated segments in Arabidopsis by comparison with rice. TRENDS IN GENETICS, 18(12), 606–608.
    Genome analysis shows that large-scale gene duplications have occurred in fungi, animals and plants, creating genomic regions that show similarity in gene content and order. However, the high frequency of gene loss reduces colinearity, resulting in duplicated regions that, in the extreme, no longer share homologous genes. Here, we show that by comparison with an appropriate second genome, such paralogous regions can still be identified.
  122. Vandepoele, Klaas, Saeys, Y., Simillion, C., Raes, J., & Van de Peer, Y. (2002). Detecting microcolinearity between Arabidopsis and Rice. Proceedings of the 6th Gatersleben Research Conference (2002), “Plant Genetic Resources in the Genomic Era: Genetic Diversity, Genome Evolution and New Applications”.