Publications

We publish new findings in general life science journals as well as in journals focusing on plant biology, bioinformatics, and tool and resource development. A more detailed overview, including citations, is available via Google Scholar, ORCID, and Web of Science.

Manosalva Pérez, N., Ferrari, C., Engelhorn, J., Depuydt, T., Nelissen, H., Hartwig, T., & Vandepoele, K. (2024). MINI‐AC : inference of plant gene regulatory networks using bulk or single‐cell accessible chromatin profiles. PLANT JOURNAL, 117, 280–301. https://doi.org/10.1111/tpj.16483

Gene regulatory networks (GRNs) represent the interactions between transcription factors (TF) and their target genes. Plant GRNs control transcriptional programs involved in growth, development, and stress responses, ultimately affecting diverse agricultural traits. While recent developments in accessible chromatin (AC) profiling technologies make it possible to identify context-specific regulatory DNA, learning the underlying GRNs remains a major challenge. We developed MINI-AC (Motif-Informed Network Inference based on Accessible Chromatin), a method that combines AC data from bulk or single-cell experiments with TF binding site (TFBS) information to learn GRNs in plants. We benchmarked MINI-AC using bulk AC datasets from different Arabidopsis thaliana tissues and showed that it outperforms other methods to identify correct TFBS. In maize, a crop with a complex genome and abundant distal AC regions, MINI-AC successfully inferred leaf GRNs with experimentally confirmed, both proximal and distal, TF-target gene interactions. Furthermore, we showed that both AC regions and footprints are valid alternatives to infer AC-based GRNs with MINI-AC. Finally, we combined MINI-AC predictions from bulk and single-cell AC datasets to identify general and cell-type specific maize leaf regulators. Focusing on C4 metabolism, we identified diverse regulatory interactions in specialized cell types for this photosynthetic pathway. MINI-AC represents a powerful tool for inferring accurate AC-derived GRNs in plants and identifying known and novel candidate regulators, improving our understanding of gene regulation in plants.
Liu, L., Heidecker, M., Depuydt, T., Manosalva Pérez, N., Crespi, M., Blein, T., & Vandepoele, K. (2023). Transcription factors KANADI 1, MYB DOMAIN PROTEIN 44, and PHYTOCHROME INTERACTING FACTOR 4 regulate long intergenic noncoding RNAs expressed in Arabidopsis roots. PLANT PHYSIOLOGY, 193(3), 1933–1953. https://doi.org/10.1093/plphys/kiad360

Thousands of long intergenic noncoding RNAs (lincRNAs) have been identified in plant genomes. While some lincRNAs have been characterized as important regulators in different biological processes, little is known about the transcriptional regulation for most plant lincRNAs. Through the integration of 8 annotation resources, we defined 6,599 high-confidence lincRNA loci in Arabidopsis (Arabidopsis thaliana). For lincRNAs belonging to different evolutionary age categories, we identified major differences in sequence and chromatin features, as well as in the level of conservation and purifying selection acting during evolution. Spatiotemporal gene expression profiles combined with transcription factor (TF) chromatin immunoprecipitation (ChIP) data were used to construct a TF-lincRNA regulatory network containing 2,659 lincRNAs and 15,686 interactions. We found that properties characterizing lincRNA expression, conservation, and regulation differ between plants and animals. Experimental validation confirmed the role of 3 TFs, KANADI 1, MYB DOMAIN PROTEIN 44, and PHYTOCHROME INTERACTING FACTOR 4, as key regulators controlling root-specific lincRNA expression, demonstrating the predictive power of our network. Furthermore, we identified 58 lincRNAs, regulated by these TFs, showing strong root cell type-specific expression or chromatin accessibility, which are linked with genome-wide association studies genetic associations related to root system development and growth. The multilevel genome-wide characterization covering chromatin state information, promoter conservation, and chromatin immunoprecipitation-based TF binding, for all detectable lincRNAs across 769 expression samples, permits rapidly defining the biological context and relevance of Arabidopsis lincRNAs through regulatory networks.
Meijer, A., Atighi Quchan Atigh, M., Demeestere, K., De Meyer, T., Vandepoele, K., & Kyndt, T. (2023). Dicer-like 3a mediates intergenerational resistance against root-knot nematodes in rice via hormone responses. PLANT PHYSIOLOGY, 193(3), 2071–2085. https://doi.org/10.1093/plphys/kiad215

In a continuously changing and challenging environment, passing down the memory of encountered stress factors to offspring could provide an evolutionary advantage. In this study, we demonstrate the existence of 'intergenerational acquired resistance' in the progeny of rice (Oryza sativa) plants attacked by the belowground parasitic nematode Meloidogyne graminicola. Transcriptome analyses revealed that genes involved in defense pathways are generally downregulated in progeny of nematode-infected plants under uninfected conditions but show a stronger induction upon nematode infection. This phenomenon was termed "spring loading" and depends on initial downregulation by the 24nt siRNA biogenesis gene dicer-like 3a (dcl3a) involved in the RNA-directed DNA methylation pathway. Knock-down of dcl3a led to increased nematode susceptibility and abolished intergenerational acquired resistance, as well as jasmonic acid/ethylene spring loading in the offspring of infected plants. The importance of ethylene signaling in intergenerational resistance was confirmed by experiments on a knock-down line of ethylene insensitive 2 (ein2b), which lacks intergenerational acquired resistance. Taken together, these data indicate a role for DCL3a in regulating plant defense pathways during both within-generation and intergenerational resistance against nematodes in rice.
Smet, D., Opdebeeck, H., & Vandepoele, K. (2023). Predicting transcriptional responses to heat and drought stress from genomic features using a machine learning approach in rice. FRONTIERS IN PLANT SCIENCE, 14. https://doi.org/10.3389/fpls.2023.1212073

Plants have evolved various mechanisms to adapt to adverse environmental stresses, such as the modulation of gene expression. Expression of stress-responsive genes is controlled by specific regulators, including transcription factors (TFs), that bind to sequence-specific binding sites, representing key components of cis-regulatory elements and regulatory networks. Our understanding of the underlying regulatory code remains, however, incomplete. Recent studies have shown that, by training machine learning (ML) algorithms on genomic sequence features, it is possible to predict which genes will transcriptionally respond to a specific stress. By identifying the most important features for gene expression prediction, these trained ML models allow, in theory, to further elucidate the regulatory code underlying the transcriptional response to abiotic stress. Here, we trained random forest ML models to predict gene expression in rice (Oryza sativa) in response to heat or drought stress. Apart from thoroughly assessing model performance and robustness across various input training data, the importance of promoter and gene body sequence features to train ML models was evaluated. The use of enriched promoter oligomers, complementing known TF binding sites, allowed us to gain novel insights in DNA motifs contributing to the stress regulatory code. By comparing genomic feature importance scores for drought and heat stress over time, general and stress-specific genomic features contributing to the performance of the learned models and their temporal variation were identified. This study provides a solid foundation to build and interpret ML models accurately predicting transcriptional responses and enables novel insights in biological sequence features that are important for abiotic stress responses.
Zackova Suchanova, J., Bilcke, G., Romanowska, B., Fatlawi, A., Pippel, M., Skeffington, A., … Poulsen, N. (2023). Diatom adhesive trail proteins acquired by horizontal gene transfer from bacteria serve as primers for marine biofilm formation. NEW PHYTOLOGIST, 240(2), 770–783. https://doi.org/10.1111/nph.19145

Biofilm-forming benthic diatoms are key primary producers in coastal habitats, where they frequently dominate sunlit intertidal substrata. The development of gliding motility in raphid diatoms was a key molecular adaptation that contributed to their evolutionary success. However, the structure-function correlation between diatom adhesives utilized for gliding and their relationship to the extracellular matrix that constitutes the diatom biofilm is unknown. Here, we have used proteomics, immunolocalization, comparative genomics, phylogenetics and structural homology analysis to investigate the evolutionary history and function of diatom adhesive proteins. Our study identified eight proteins from the adhesive trails of Craspedostauros australis, of which four form a new protein family called Trailins that contain an enigmatic Choice-of-Anchor A (CAA) domain, which was acquired through horizontal gene transfer from bacteria. Notably, the CAA-domain shares a striking structural similarity with one of the most widespread domains found in ice-binding proteins (IPR021884). Our work offers new insights into the molecular basis for diatom biofilm formation, shedding light on the function and evolution of diatom adhesive proteins. This discovery suggests that there is a transition in the composition of biomolecules required for initial surface colonization and those utilized for 3D biofilm matrix formation.
Kaufmann, K., & Vandepoele, K. (Eds.). (2023). Plant gene regulatory networks : methods and protocols. https://doi.org/10.1007/978-1-0716-3354-0

Plant development and environmental responses require the coordinated, orchestrated, and often cell-type-specific activities of many gene products. The complex regulatory relationships between genes and gene products are abstractly described as gene regulatory networks (GRNs). The characterization of GRNs requires typically quantitative experimental data on gene expression dynamics, e.g., in response to an environmental or developmental cue, together with information on the “hard-wiring” of the system, reporting the molecular players and interactions that determine regulatory activity and specificity. Natural genetic variation and evolutionary conservation of genes and regulatory elements can further enrich the diversity in data types used to delineate and analyze GRNs. Together, different types of experimental input serve as foundation for computational data integration and modelling of gene-regulatory interactions that underlie phenotypic traits and plant behavior in response to environmental cues. Advanced experimental technologies to interrogate plant GRNs at unprecedented resolution are rapidly evolving, accompanied by the introduction of innovative computational approaches to integrate multiscale data and predict gene-regulatory interactions. The second edition of Plant Gene Regulatory Networks aims to introduce different experimental techniques and computational approaches to elucidate GRN structure and functions in plants. The experimental technologies include multi-scale analyses of gene activities, including, e.g., cell-type and single cell transcriptome analyses, approaches for targeted perturbation, and untargeted analyses of proteomes and metabolomes. Different innovative techniques to identify molecular interactions in vitro and in vivo and to elucidate mechanisms of specificity in gene regulation are presented. 
It is becoming increasingly clear that mechanisms of gene regulation, and thereby mechanistic regulatory relationships forming the basis of plant GRNs, can only be understood by elucidating the higher-order architecture of genetic interactions and 3D molecular topology in gene promoters, and in general in the plant nucleus. Besides providing protocols for selected experimental approaches, Plant Gene Regulatory Networks highlights bioinformatics and data resources and pipelines for primary data analysis, integration, and network analysis. Selected innovative computational approaches for network modelling, including, e.g., machine learning approaches and agent-based modelling, are introduced. In sum, the chapters presented in the second edition of Plant Gene Regulatory Networks aim to expand the toolbox for GRN analysis in different plant species, at different experimental and computational levels, by providing detailed practical and technical insights.
Willems, A., Liang, Y., Heyman, J., Depuydt, T., Eekhout, T., Canher, B., … De Veylder, L. (2023). Plant lineage-specific PIKMIN1 drives APC/C^CCS52A2 E3-ligase activity-dependent cell division. PLANT PHYSIOLOGY, 191(3), 1574–1595. https://doi.org/10.1093/plphys/kiac528

The anaphase-promoting complex/cyclosome (APC/C) marks key cell cycle proteins for proteasomal breakdown, thereby ensuring unidirectional progression through the cell cycle. Its target recognition is temporally regulated by activating subunits, one of which is called CELL CYCLE SWITCH 52 A2 (CCS52A2). We sought to expand the knowledge on the APC/C by using the severe growth phenotypes of CCS52A2-deficient Arabidopsis (Arabidopsis thaliana) plants as a readout in a suppressor mutagenesis screen, resulting in the identification of the previously undescribed gene called PIKMIN1 (PKN1). PKN1 deficiency rescues the disorganized root stem cell phenotype of the ccs52a2-1 mutant, whereas an excess of PKN1 inhibits growth of ccs52a2-1 plants, indicating the need for control of PKN1 abundance for proper development. Accordingly, the lack of PKN1 in a wild-type background negatively impacts cell division, while its systemic overexpression promotes proliferation. PKN1 shows a cell cycle phase-dependent accumulation pattern, localizing to microtubular structures, including the preprophase band, the mitotic spindle, and phragmoplast. PKN1 is conserved throughout the plant kingdom, with its function in cell division being evolutionary conserved in the liverwort Marchantia polymorpha. Our data thus demonstrate that PKN1 represents a novel, plant-specific gene with a role in cell division that is likely proteolytically controlled by the CCS52A2-activated APC/C.
Gryffroy, L., Ceulemans, E., Manosalva Pérez, N., Venegas Molina, J. J., Jaramillo, A., Rodrigues, S. D., … Goossens, A. (2023). Rhizogenic agrobacterium protein RolB interacts with the TOPLESS repressor proteins to reprogram plant immunity and development. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 120(3). https://doi.org/10.1073/pnas.2210300120

Rhizogenic Agrobacterium strains comprise biotrophic pathogens that cause hairy root disease (HRD) on hydroponically grown Solanaceae and Cucurbitaceae crops, besides being widely explored agents for the creation of hairy root cultures for the sustainable production of plant-specialized metabolites. Hairy root formation is mediated through the expression of genes encoded on the T-DNA of the root-inducing (Ri) plasmid, of which several, including root oncogenic locus B (rolB), play a major role in hairy root development. Despite decades of research, the exact molecular function of the proteins encoded by the rol genes remains enigmatic. Here, by means of TurboID-mediated proximity labeling in tomato (Solanum lycopersicum) hairy roots, we identified the repressor proteins TOPLESS (TPL) and Novel Interactor of JAZ (NINJA) as direct interactors of RolB. Although these interactions allow RolB to act as a transcriptional repressor, our data hint at another in planta function of the RolB oncoprotein. Hence, by a series of plant bioassays, transcriptomic and DNA-binding site enrichment analyses, we conclude that RolB can mitigate the TPL functioning so that it leads to a specific and partial reprogramming of phytohormone signaling, immunity, growth, and developmental processes. Our data support a model in which RolB manipulates host transcription, at least in part, through interaction with TPL, to facilitate hairy root development. Thereby, we provide important mechanistic insights into this renowned oncoprotein in HRD.
Nguyen, T. H., Thiers, L., Van Moerkercke, A., Bai, Y., Fernandez Calvo, P., Minne, M., … Goossens, A. (2023). A redundant transcription factor network steers spatiotemporal Arabidopsis triterpene synthesis. NATURE PLANTS, 9(6), 926–937. https://doi.org/10.1038/s41477-023-01419-8

Plant specialized metabolite biosynthesis is strictly regulated in time and space. The authors describe a robust and redundant transcriptional network that steers cell-specific and jasmonate-inducible triterpene biosynthesis in Arabidopsis root tips. Plant specialized metabolites modulate developmental and ecological functions and comprise many therapeutic and other high-value compounds. However, the mechanisms determining their cell-specific expression remain unknown. Here we describe the transcriptional regulatory network that underlies cell-specific biosynthesis of triterpenes in Arabidopsis thaliana root tips. Expression of thalianol and marneral biosynthesis pathway genes depends on the phytohormone jasmonate and is limited to outer tissues. We show that this is promoted by the activity of redundant bHLH-type transcription factors from two distinct clades and coactivated by homeodomain factors. Conversely, the DOF-type transcription factor DAG1 and other regulators prevent expression of the triterpene pathway genes in inner tissues. We thus show how precise expression of triterpene biosynthesis genes is determined by a robust network of transactivators, coactivators and counteracting repressors.
Nisa, M., Eekhout, T., Bergis, C., Pedroza-Garcia, J. A., He, X., Mazubert, C., … Raynaud, C. (2023). Distinctive and complementary roles of E2F transcription factors during plant replication stress responses. MOLECULAR PLANT, 16(8), 1269–1282. https://doi.org/10.1016/j.molp.2023.07.002

Survival of living organisms is fully dependent on their maintenance of genome integrity, being permanently threatened by replication stress in proliferating cells. Although the plant DNA damage response (DDR) regulator SOG1 has been demonstrated to cope with replicative defects, accumulating evidence points to other pathways functioning independently of SOG1. Here, we have studied the role of the Arabidopsis E2FA and E2FB transcription factors, two well-characterized regulators of DNA replication, in the response to replication stress. Through a combination of reverse genetics and chromatin-immunoprecipitation approaches, we show that E2FA and E2FB share many target genes with SOG1, providing evidence for their involvement in the DDR. Analysis of double and triple mutant combinations revealed that E2FB, rather than E2FA, plays the most prominent role in sustaining growth in the presence of replicative defects, either operating antagonistically or synergistically with SOG1. Reversely, SOG1 aids in overcoming the replication defects of E2FA/E2FB-deficient plants. Our data reveal a complex transcriptional network controlling the replication stress response, in which both E2Fs and SOG1 act as key regulatory factors.
Vandepoele, K., & Kaufmann, K. (2023). Characterization of gene regulatory networks in plants using new methods and data types. In K. Kaufmann & K. Vandepoele (Eds.), Plant gene regulatory networks : methods and protocols (2nd ed., Vol. 2698, pp. 1–11). https://doi.org/10.1007/978-1-0716-3354-0_1

A major question in plant biology is to understand how plant growth, development, and environmental responses are controlled and coordinated by the activities of regulatory factors. Gene regulatory network (GRN) analyses require integrated approaches that combine experimental approaches with computational analyses. A wide range of experimental approaches and tools are now available, such as targeted perturbation of gene activities, quantitative and cell-type specific measurements of dynamic gene activities, and systematic analysis of the molecular 'hard-wiring' of the systems. At the computational level, different tools and databases are available to study regulatory sequences, including intuitive visualizations to explore data-driven gene regulatory networks in different plant species. Furthermore, advanced data integration approaches have recently been developed to efficiently leverage complementary regulatory data types and learn context-specific networks.
Manosalva Pérez, N., & Vandepoele, K. (2023). Prediction of transcription factor regulators and gene regulatory networks in tomato using binding site information. In K. Kaufmann & K. Vandepoele (Eds.), Plant gene regulatory networks : methods and protocols (2nd ed., Vol. 2698, pp. 323–349). https://doi.org/10.1007/978-1-0716-3354-0_19

Gene regulatory networks (GRNs) represent the regulatory links between transcription factors (TF) and their target genes. In plants, they are essential to understand transcriptional programs that control important agricultural traits such as yield or (a)biotic stress response. Although several high- and low-throughput experimental methods have been developed to map GRNs in plants, these are sometimes expensive, come with laborious protocols, and are not always optimized for tomato, one of the most important horticultural crops worldwide. In this chapter, we present a computational method that covers two protocols: one protocol to map gene identifiers between two different tomato genome assemblies, and another protocol to predict putative regulators and delineate GRNs given a set of functionally related or coregulated genes by exploiting publicly available TF-binding information. As an example, we applied the motif enrichment protocol on tomato using upregulated genes in response to jasmonate, as well as upregulated and downregulated genes in plants with genotypes OENAM1 and nam1, respectively. We found that our protocol accurately infers the expected TFs as top enriched regulators and identifies GRNs functionally enriched in biological processes related with the experimental context under study.
Depuydt, T., De Rybel, B., & Vandepoele, K. (2022). Charting plant gene functions in the multi-omics and single-cell era. TRENDS IN PLANT SCIENCE, 28(3), 283–296. https://doi.org/10.1016/j.tplants.2022.09.008

Despite the increased access to high-quality plant genome sequences, the set of genes with a known function remains far from complete. With the advent of novel bulk and single-cell omics profiling methods, we are entering a new era where advanced and highly integrative functional annotation strategies are being developed to elucidate the functions of all plant genes. Here, we review different multi-omics approaches to improve functional and regulatory gene characterization and highlight the power of machine learning and network biology to fully exploit the complementary information embedded in different omics layers. Finally, we discuss the potential of emerging single-cell methods and algorithms to further increase the resolution, allowing generation of functional insights about plant biology.
Van Bel, M., Silvestri, F., Weitz, E. M., Kreft, L., Botzki, A., Coppens, F., & Vandepoele, K. (2022). PLAZA 5.0 : extending the scope and power of comparative and functional genomics in plants. NUCLEIC ACIDS RESEARCH, 50(D1), D1468–D1474. https://doi.org/10.1093/nar/gkab1024

PLAZA is a platform for comparative, evolutionary, and functional plant genomics. It makes a broad set of genomes, data types and analysis tools available to researchers through a user-friendly website, an API, and bulk downloads. In this latest release of the PLAZA platform, we are integrating a record number of 134 high-quality plant genomes, split up over two instances: PLAZA Dicots 5.0 and PLAZA Monocots 5.0. This number of genomes corresponds with a massive expansion in the number of available species when compared to PLAZA 4.0, which offered access to 71 species, an 89% overall increase. The PLAZA 5.0 release contains information for 5 882 730 genes, and offers pre-computed gene families and phylogenetic trees for 5 274 684 protein-coding genes. This latest release also comes with a set of new and updated features: a new BED import functionality for the workbench, improved interactive visualizations for functional enrichments and genome-wide mapping of gene sets, and a fully redesigned and extended API. Taken together, this new version offers extended support for plant biologists working on different families within the green plant lineage and provides an efficient and versatile toolbox for plant genomics. All PLAZA releases are accessible from the portal website: https://bioinformatics.psb.ugent.be/plaza/.
Meijer, A., De Meyer, T., Vandepoele, K., & Kyndt, T. (2022). Spatiotemporal expression profile of novel and known small RNAs throughout rice plant development focussing on seed tissues. BMC GENOMICS, 23(1). https://doi.org/10.1186/s12864-021-08264-z

Background: Small RNAs (sRNAs) regulate numerous plant processes directly related to yield, such as disease resistance and plant growth. To exploit this yield-regulating potential of sRNAs, the sRNA profile of one of the world's most important staple crops - rice - was investigated throughout plant development using next-generation sequencing. Results: Root and leaves were investigated at both the vegetative and generative phase, and early-life sRNA expression was characterized in the embryo and endosperm. This led to the identification of 49,505 novel sRNAs and 5581 tRNA-derived sRNAs (tsRNAs). In all tissues, 24 nt small interfering RNAs (siRNAs) were highly expressed and associated with euchromatic, but not heterochromatic transposable elements. Twenty-one nt siRNAs deriving from genic regions in the endosperm were exceptionally highly expressed, mimicking previously reported expression levels of 24 nt siRNAs in younger endosperm samples. In rice embryos, sRNA content was highly diverse while tsRNAs were underrepresented, possibly due to snoRNA activity. Publicly available mRNA expression and DNA methylation profiles were used to identify putative siRNA targets in embryo and endosperm. These include multiple genes related to the plant hormones gibberellic acid and ethylene, and to seed phytoalexin and iron content. Conclusions: This work introduces multiple sRNAs as potential regulators of rice yield and quality, identifying them as possible targets for the continuous search to optimize rice production.
Kuiper, M., Bonello, J., Fernández-Breis, J. T., Bucher, P., Futschik, M. E., Gaudet, P., … on behalf of the GRECO consortium. (2022). The gene regulation knowledge commons : the action area of GREEKC. BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS, 1865(1). https://doi.org/10.1016/j.bbagrm.2021.194768

As computational modeling becomes more essential to analyze and understand biological regulatory mechanisms, governance of the many databases and knowledge bases that support this domain is crucial to guarantee reliability and interoperability of resources. To address this, the COST Action Gene Regulation Ensemble Effort for the Knowledge Commons (GREEKC, CA15205, www.greekc.org) organized nine workshops in a four-year period, starting September 2016. The workshops brought together a wide range of experts from all over the world working on various steps in the knowledge management process that focuses on understanding gene regulatory mechanisms. The discussions between ontologists, curators, text miners, biologists, bioinformaticians, philosophers and computational scientists spawned a host of activities aimed to standardize and update existing knowledge management workflows and involve end-users in the process of designing the Gene Regulation Knowledge Commons (GRKC). Here the GREEKC consortium describes its main achievements in improving this GRKC.
Ferrari, C., Manosalva Pérez, N., & Vandepoele, K. (2022). MINI-EX : integrative inference of single-cell gene regulatory networks in plants. MOLECULAR PLANT, 15(11), 1807–1824. https://doi.org/10.1016/j.molp.2022.10.016

Multicellular organisms, such as plants, are characterized by highly specialized and tightly regulated cell populations, establishing specific morphological structures and executing distinct functions. Gene regulatory networks (GRNs) describe condition-specific interactions of transcription factors (TFs) regulating the expression of target genes, underpinning these specific functions. As efficient and validated methods to identify cell-type-specific GRNs from single-cell data in plants are lacking, limiting our understanding of the organization of specific cell types in both model species and crops, we developed MINI-EX (Motif-Informed Network Inference based on single-cell EXpression data), an integrative approach to infer cell-type-specific networks in plants. MINI-EX uses single-cell transcriptomic data to define expression-based networks and integrates TF motif information to filter the inferred regulons, resulting in networks with increased accuracy. Next, regulons are assigned to different cell types, leveraging cell-specific expression, and candidate regulators are prioritized using network centrality measures, functional annotations, and expression specificity. This embedded prioritization strategy offers a unique and efficient means to unravel signaling cascades in specific cell types controlling a biological process of interest. We demonstrate the stability of MINI-EX toward input data sets with low number of cells and its robustness toward missing data, and show that it infers state-of-the-art networks with a better performance compared with other related single-cell network tools. MINI-EX successfully identifies key regulators controlling root development in Arabidopsis and rice, leaf development in Arabidopsis, and ear development in maize, enhancing our understanding of cell-type-specific regulation and unraveling the roles of different regulators controlling the development of specific cell types in plants.
Curci, P. L., Zhang, J., Mähler, N., Seyfferth, C., Mannapperuma, C., Diels, T., … Vandepoele, K. (2022). Identification of growth regulators using cross-species network analysis in plants. PLANT PHYSIOLOGY, 190(4), 2350–2365. https://doi.org/10.1093/plphys/kiac374

Cross-species network analysis enables identification and validation of growth regulators in Arabidopsis. With the need to increase plant productivity, one of the challenges plant scientists are facing is to identify genes that play a role in beneficial plant traits. Moreover, even when such genes are found, it is generally not trivial to transfer this knowledge about gene function across species to identify functional orthologs. Here, we focused on the leaf to study plant growth. First, we built leaf growth transcriptional networks in Arabidopsis (Arabidopsis thaliana), maize (Zea mays), and aspen (Populus tremula). Next, known growth regulators, here defined as genes that when mutated or ectopically expressed alter plant growth, together with cross-species conserved networks, were used as guides to predict novel Arabidopsis growth regulators. Using an in-depth literature screening, 34 out of 100 top predicted growth regulators were confirmed to affect leaf phenotype when mutated or overexpressed and thus represent novel potential growth regulators. Globally, these growth regulators were involved in cell cycle, plant defense responses, gibberellin, auxin, and brassinosteroid signaling. Phenotypic characterization of loss-of-function lines confirmed two predicted growth regulators to be involved in leaf growth (NPF6.4 and LATE MERISTEM IDENTITY2). In conclusion, the presented network approach offers an integrative cross-species strategy to identify genes involved in plant growth and development.
Jaramillo-Botero, A., Colorado, J., Quimbaya, M., Rebolledo, M. C., Lorieux, M., Ghneim-Herrera, T., … Goddard, W. A. (2022). The ÓMICAS alliance, an international research program on multi-omics for crop breeding optimization. FRONTIERS IN PLANT SCIENCE, 13. https://doi.org/10.3389/fpls.2022.992663

The OMICAS alliance is part of the Colombian government's Scientific Ecosystem, established between 2017 and 2018 to promote world-class research, technological advancement and improved competency of higher education across the nation. Since the program's kick-off, OMICAS has focused on consolidating and validating a multi-scale, multi-institutional, multi-disciplinary strategy and infrastructure to advance discoveries in plant science and the development of new technological solutions for improving agricultural productivity and sustainability. The strategy and methods described in this article involve the characterization of different crop models, using high-throughput, real-time phenotyping technologies as well as experimental tissue characterization at different levels of the omics hierarchy and under contrasting conditions, to elucidate epigenome-, genome-, proteome- and metabolome-phenome relationships. The massive data sets are used to derive in-silico models, methods and tools to discover complex underlying structure-function associations, which are then carried over to the production of new germplasm with improved agricultural traits. Here, we describe OMICAS' R&D trans-disciplinary multi-project architecture, explain the overall strategy and methods for crop-breeding, recent progress and results, and the overarching challenges that lie ahead in the field.
Castro-Mondragon, J. A., Riudavets-Puig, R., Rauluseviciute, I., Berhanu Lemma, R., Turchi, L., Blanc-Mathieu, R., … Mathelier, A. (2022). JASPAR 2022 : the 9th release of the open-access database of transcription factor binding profiles. NUCLEIC ACIDS RESEARCH, 50(D1), D165–D173. https://doi.org/10.1093/nar/gkab1113

JASPAR (http://jaspar.genereg.net/) is an open-access database containing manually curated, non-redundant transcription factor (TF) binding profiles for TFs across six taxonomic groups. In this 9th release, we expanded the CORE collection with 341 new profiles (148 for plants, 101 for vertebrates, 85 for urochordates, and 7 for insects), which corresponds to a 19% expansion over the previous release. We added 298 new profiles to the Unvalidated collection when no orthogonal evidence was found in the literature. All the profiles were clustered to provide familial binding profiles for each taxonomic group. Moreover, we revised the structural classification of DNA binding domains to consider plant-specific TFs. This release introduces word clouds to represent the scientific knowledge associated with each TF. We updated the genome tracks of TFBSs predicted with JASPAR profiles in eight organisms; the human and mouse TFBS predictions can be visualized as native tracks in the UCSC Genome Browser. Finally, we provide a new tool to perform JASPAR TFBS enrichment analysis in user-provided genomic regions. All the data is accessible through the JASPAR website, its associated RESTful API, the R/Bioconductor data package, and a new Python package, pyJASPAR, that facilitates serverless access to the data.
Nagy, I., Veeckman, E., Liu, C., Van Bel, M., Vandepoele, K., Jensen, C. S., … Asp, T. (2022). Chromosome-scale assembly and annotation of the perennial ryegrass genome. BMC GENOMICS, 23(1). https://doi.org/10.1186/s12864-022-08697-0

Background: The availability of chromosome-scale genome assemblies is fundamentally important to advance genetics and breeding in crops, as well as for evolutionary and comparative genomics. The improvement of long-read sequencing technologies and the advent of optical mapping and chromosome conformation capture technologies in the last few years significantly promoted the development of chromosome-scale genome assemblies of model plants and crop species. In grasses, chromosome-scale genome assemblies recently became available for cultivated and wild species of the Triticeae subfamily. Development of state-of-the-art genomic resources in species of the Poeae subfamily, which includes important crops like fescues and ryegrasses, is lagging behind the progress in the cereal species. Results: Here, we report a new chromosome-scale genome sequence assembly for perennial ryegrass, obtained by combining PacBio long-read sequencing, Illumina short-read polishing, BioNano optical mapping and Hi-C scaffolding. More than 90% of the total genome size of perennial ryegrass (approximately 2.55 Gb) is covered by seven pseudo-chromosomes that show high levels of collinearity to the orthologous chromosomes of Triticeae species. The transposon fraction of perennial ryegrass was found to be relatively low, approximately 35% of the total genome content, which is less than half of the genome repeat content of cultivated cereal species. We predicted 54,629 high-confidence gene models, 10,287 long non-coding RNAs and a total of 8,393 short non-coding RNAs in the perennial ryegrass genome. Conclusions: The new reference genome sequence and annotation presented here are valuable resources for comparative genomic studies in grasses, as well as for breeding applications, and will expedite the development of productive varieties in perennial ryegrass and related species.
De Clercq, I., Van de Velde, J., Luo, X., Liu, L., Storme, V., Van Bel, M., … Vandepoele, K. (2022). Integrative inference of transcriptional networks in Arabidopsis yields novel ROS signalling regulators. FREE RADICAL BIOLOGY AND MEDICINE, 189(S1), 48–49. https://doi.org/10.1016/j.freeradbiomed.2022.06.202
Nevers, Y., Jones, T. E. M., Jyothi, D., Yates, B., Ferret, M., Portell-Silva, L., … the OpenEBench team, & the Quest for Orthologs Consortium. (2022). The Quest for Orthologs Orthology Benchmark Service in 2022. NUCLEIC ACIDS RESEARCH, 50(W1), W623–W632. https://doi.org/10.1093/nar/gkac330

The Orthology Benchmark Service (https://orthology.benchmarkservice.org) is the gold standard for orthology inference evaluation, supported and maintained by the Quest for Orthologs consortium. It is an essential resource to compare existing and new methods of orthology inference (the bedrock for many comparative genomics and phylogenetic analysis) over a standard dataset and through common procedures. The Quest for Orthologs Consortium is dedicated to maintaining the resource up to date, through regular updates of the Reference Proteomes and increasingly accessible data through the OpenEBench platform. For this update, we have added a new benchmark based on curated orthology assertion from the Vertebrate Gene Nomenclature Committee, and provided an example meta-analysis of the public predictions present on the platform.
Labarre, A., López-Escardó, D., Latorre, F., Leonard, G., Bucchini, F., Obiol, A., … Massana, R. (2021). Comparative genomics reveals new functional insights in uncultured MAST species. ISME JOURNAL, 15(6), 1767–1781. https://doi.org/10.1038/s41396-020-00885-8

Heterotrophic lineages of stramenopiles exhibit enormous diversity in morphology, lifestyle, and habitat. Among them, the marine stramenopiles (MASTs) represent numerous independent lineages that are only known from environmental sequences retrieved from marine samples. The core energy metabolism characterizing these unicellular eukaryotes is poorly understood. Here, we used single-cell genomics to retrieve, annotate, and compare the genomes of 15 MAST species, obtained by coassembling sequences from 140 individual cells sampled from the marine surface plankton. Functional annotations from their gene repertoires are compatible with all of them being phagocytotic. The unique presence of rhodopsin genes in MAST species, together with their widespread expression in oceanic waters, supports the idea that MASTs may be capable of using sunlight to thrive in the photic ocean. Additional subsets of genes used in phagocytosis, such as proton pumps for vacuole acidification and peptidases for prey digestion, did not reveal particular trends in MAST genomes as compared with nonphagocytotic stramenopiles, except a larger presence and diversity of V-PPase genes. Our analysis reflects the complexity of phagocytosis machinery in microbial eukaryotes, which contrasts with the well-defined set of genes for photosynthesis. These new genomic data provide the essential framework to study ecophysiology of uncultured species and to gain better understanding of the function of rhodopsins and related carotenoids in stramenopiles.
Van Bel, M., & Vandepoele, K. (2021). Comment on ’Hayai-annotation plants : an ultrafast and comprehensive functional gene annotation system in plants’ : the importance of taking the GO graph structure into account. BIOINFORMATICS, 36(22–23), 5558–5560. https://doi.org/10.1093/bioinformatics/btaa1052
Shin, J., Marx, H., Richards, A., Vaneechoutte, D., Jayaraman, D., Maeda, J., … Roy, S. (2021). A network-based comparative framework to study conservation and divergence of proteomes in plant phylogenies. NUCLEIC ACIDS RESEARCH, 49(1). https://doi.org/10.1093/nar/gkaa1041

Comparative functional genomics offers a powerful approach to study species evolution. To date, the majority of these studies have focused on the transcriptome in mammalian and yeast phylogenies. Here, we present a novel multi-species proteomic dataset and a computational pipeline to systematically compare the protein levels across multiple plant species. Globally, we find that protein levels diverge according to phylogenetic distance but are more constrained than mRNA levels. Module-level comparative analysis of groups of proteins shows that proteins that are more highly expressed tend to be more conserved. To interpret the evolutionary patterns of conservation and divergence, we develop a novel network-based integrative analysis pipeline that combines publicly available transcriptomic datasets to define co-expression modules. Our analysis pipeline can be used to relate the changes in protein levels to different species-specific phenotypic traits. We present a case study with the rhizobia-legume symbiosis process that supports the role of autophagy in this symbiotic association.
Massana, R., Labarre, A., López-Escardó, D., Obiol, A., Bucchini, F., Hackl, T., … Keeling, P. J. (2021). Gene expression during bacterivorous growth of a widespread marine heterotrophic flagellate. ISME JOURNAL, 15(1), 154–167. https://doi.org/10.1038/s41396-020-00770-4

Phagocytosis is a fundamental process in marine ecosystems by which prey organisms are consumed and their biomass incorporated in food webs or remineralized. However, studies searching for the genes underlying this key ecological process in free-living phagocytizing protists are still scarce, in part due to the lack of appropriate ecological models. Our reanalysis of recent molecular datasets revealed that the cultured heterotrophic flagellate Cafeteria burkhardae is widespread in the global oceans, which prompted us to design a transcriptomics study with this species, grown with the cultured flavobacterium Dokdonia sp. We compared the gene expression between exponential and stationary phases, which were complemented with three starvation by dilution phases that appeared as intermediate states. We found distinct expression profiles in each condition and identified 2056 differentially expressed genes between exponential and stationary samples. Upregulated genes at the exponential phase were related to DNA duplication, transcription and translational machinery, protein remodeling, respiration and phagocytosis, whereas upregulated genes in the stationary phase were involved in signal transduction, cell adhesion, and lipid metabolism. We identified a few highly expressed phagocytosis genes, like peptidases and proton pumps, which could be used to target this ecologically relevant process in marine ecosystems.
Benites, L. F., Bucchini, F., Sanchez-Brosseau, S., Grimsley, N., Vandepoele, K., & Piganeau, G. (2021). Evolutionary genomics of sex-related chromosomes at the base of the green lineage. GENOME BIOLOGY AND EVOLUTION, 13(10). https://doi.org/10.1093/gbe/evab216

Although sex is now accepted as a ubiquitous and ancestral feature of eukaryotes, direct observation of sex is still lacking in most unicellular eukaryotic lineages. Evidence of sex is frequently indirect and inferred from the identification of genes involved in meiosis from whole genome data and/or the detection of recombination signatures from genetic diversity in natural populations. In haploid unicellular eukaryotes, sex-related chromosomes are named mating-type (MTs) chromosomes and generally carry large genomic regions where recombination is suppressed. These regions have been characterized in Fungi and Chlorophyta and determine gamete compatibility and fusion. Two candidate MT+ and MT- alleles, spanning 450-650 kb, have recently been described in Ostreococcus tauri, a marine phytoplanktonic alga from the Mamiellophyceae class, an early diverging branch in the green lineage. Here, we investigate the architecture and evolution of these candidate MT+ and MT- alleles. We analyzed the phylogenetic profile and GC content of MT gene families in eight different genomes whose divergence has been previously estimated at up to 640 Myr, and found evidence that the divergence of the two MT alleles predates speciation in the Ostreococcus genus. Phylogenetic profiles of MT trans-specific polymorphisms in gametologs disclosed candidate MTs in two additional species, and possibly a third. These Mamiellales MT candidates are likely to be the oldest mating-type loci described to date, which makes them fascinating models to investigate the evolutionary mechanisms of haploid sex determination in eukaryotes.
De Clercq, I., Van de Velde, J., Luo, X., Liu, L., Storme, V., Van Bel, M., … Vandepoele, K. (2021). Integrative inference of transcriptional networks in Arabidopsis yields novel ROS signalling regulators. NATURE PLANTS, 7(4), 500–513. https://doi.org/10.1038/s41477-021-00894-1

Gene regulation is a dynamic process in which transcription factors (TFs) play an important role in controlling spatiotemporal gene expression. To enhance our global understanding of regulatory interactions in Arabidopsis thaliana, different regulatory input networks capturing complementary information about DNA motifs, open chromatin, TF-binding and expression-based regulatory interactions were combined using a supervised learning approach, resulting in an integrated gene regulatory network (iGRN) covering 1,491 TFs and 31,393 target genes (1.7 million interactions). This iGRN outperforms the different input networks to predict known regulatory interactions and has a similar performance to recover functional interactions compared to state-of-the-art experimental methods. The iGRN correctly inferred known functions for 681 TFs and predicted new gene functions for hundreds of unknown TFs. For regulators predicted to be involved in reactive oxygen species (ROS) stress regulation, we confirmed in total 75% of TFs with a function in ROS and/or physiological stress responses. This includes 13 ROS regulators, previously not connected to any ROS or stress function, that were experimentally validated in our ROS-specific phenotypic assays of loss- or gain-of-function lines. In conclusion, the presented iGRN offers a high-quality starting point to enhance our understanding of gene regulation in plants by integrating different experimental data types.
Bucchini, F., Del Cortona, A., Kreft, Ł., Botzki, A., Van Bel, M., & Vandepoele, K. (2021). TRAPID 2.0 : a web application for taxonomic and functional analysis of de novo transcriptomes. NUCLEIC ACIDS RESEARCH, 49(17). https://doi.org/10.1093/nar/gkab565

Advances in high-throughput sequencing have resulted in a massive increase of RNA-Seq transcriptome data. However, the promise of rapid gene expression profiling in a specific tissue, condition, unicellular organism or microbial community comes with new computational challenges. Owing to the limited availability of well-resolved reference genomes, de novo assembled (meta)transcriptomes have emerged as popular tools for investigating the gene repertoire of previously uncharacterized organisms. Yet, despite their potential, these datasets often contain fragmented or contaminant sequences, and their analysis remains difficult. To alleviate some of these challenges, we developed TRAPID 2.0, a web application for the fast and efficient processing of assembled transcriptome data. The initial processing phase performs a global characterization of the input data, providing each transcript with several layers of annotation, comprising structural, functional, and taxonomic information. The exploratory phase enables downstream analyses from the web application. Available analyses include the assessment of gene space completeness, the functional analysis and comparison of transcript subsets, and the study of transcripts in an evolutionary context. A comparison with similar tools highlights TRAPID's unique features. Finally, analyses performed within TRAPID 2.0 are complemented by interactive data visualizations, facilitating the extraction of new biological insights, as demonstrated with diatom community metatranscriptomes.
Ding, P., Sakai, T., Shrestha, R. K., Manosalva Pérez, N., Guo, W., Ngou, B. P. M., … Jones, J. D. G. (2021). Chromatin accessibility landscapes activated by cell-surface and intracellular immune receptors. JOURNAL OF EXPERIMENTAL BOTANY, 72(22), 7927–7941. https://doi.org/10.1093/jxb/erab373

Activation of cell-surface and intracellular receptor-mediated immunity results in rapid transcriptional reprogramming that underpins disease resistance. However, the mechanisms by which co-activation of both immune systems leads to transcriptional changes are not clear. Here, we combine RNA-seq and ATAC-seq to define changes in gene expression and chromatin accessibility. Activation of cell-surface or intracellular receptor-mediated immunity, or both, increases chromatin accessibility at induced defence genes. Analysis of ATAC-seq and RNA-seq data combined with publicly available information on transcription factor DNA-binding motifs enabled comparison of individual gene regulatory networks activated by cell-surface or intracellular receptor-mediated immunity, or by both. These results and analyses reveal overlapping and conserved transcriptional regulatory mechanisms between the two immune systems.
Depuydt, T., & Vandepoele, K. (2021). Multi‐omics network‐based functional annotation of unknown Arabidopsis genes. PLANT JOURNAL, 108, 1193–1212. https://doi.org/10.1111/tpj.15507

Unraveling gene function is pivotal to understanding the signaling cascades that control plant development and stress responses. Since experimental profiling is costly and labor intensive, there is a clear need for high-confidence computational annotation. In contrast to detailed gene-specific functional information, transcriptomics data is widely available for both model and crop species. Here, we describe a novel automated function prediction (AFP) method, which leverages complementary information from multiple expression datasets by analyzing study-specific gene co-expression networks. First, we benchmarked the prediction performance on recently characterized Arabidopsis thaliana genes, and showed that our method outperforms state-of-the-art expression-based approaches. Next, we predicted biological process annotations for known (n=15,790) and unknown (n=11,865) genes in A. thaliana and validated our predictions using experimental protein-DNA and protein-protein interaction data (covering >220 thousand interactions in total), obtaining a set of high-confidence functional annotations. Our method assigned at least one validated annotation to 5,054 (42.6%) unknown genes, and at least one novel validated function to 3,408 (53.0%) genes with computational annotations only. These omics-supported functional annotations shed light on a variety of developmental processes and molecular responses, such as flower and root development, defense responses to fungi and bacteria, and phytohormone signaling, and help fill the information gap on biological process annotations in Arabidopsis. An in-depth analysis of two context-specific networks, modeling seed development and response to water deprivation, shows how previously uncharacterized genes function within the respective networks. Moreover, our AFP approach can be applied in future studies to facilitate gene discovery for crop improvement.
Bulánková, P., Sekulić, M., Jallet, D., Nef, C., van Oosterhout, C., Delmont, T. O., … De Veylder, L. (2021). Mitotic recombination between homologous chromosomes drives genomic diversity in diatoms. CURRENT BIOLOGY, 31(15), 3221-3232.e9. https://doi.org/10.1016/j.cub.2021.05.013

Diatoms, an evolutionarily successful group of microalgae, display high levels of intraspecific genetic variability in natural populations. However, the contribution of various mechanisms generating such diversity is unknown. Here we estimated the genetic micro-diversity within a natural diatom population and mapped the genomic changes arising within clonally propagated diatom cell cultures. Through quantification of haplotype diversity by next-generation sequencing and amplicon re-sequencing of selected loci, we documented a rapid accumulation of multiple haplotypes accompanied by the appearance of novel protein variants in cell cultures initiated from a single founder cell. Comparison of the genomic changes between mother and daughter cells revealed copy number variation and copy-neutral loss of heterozygosity leading to the fixation of alleles within individual daughter cells. The loss of heterozygosity can be accomplished by recombination between homologous chromosomes. To test this hypothesis, we established an endogenous readout system and estimated that the frequency of interhomolog mitotic recombination was under standard growth conditions 4.2 events per 100 cell divisions. This frequency is increased under environmental stress conditions, including treatment with hydrogen peroxide and cadmium. These data demonstrate that copy number variation and mitotic recombination between homologous chromosomes underlie clonal variability in diatom populations. We discuss the potential adaptive evolutionary benefits of the plastic response in the interhomolog mitotic recombination rate, and we propose that this may have contributed to the ecological success of diatoms.
Colinas Martinez, M., Pollier, J., Vaneechoutte, D., Malat, D., Schweizer, F., De Milde, L., … Goossens, A. (2021). Subfunctionalization of paralog transcription factors contributes to regulation of alkaloid pathway branch choice in Catharanthus roseus. FRONTIERS IN PLANT SCIENCE, 12. https://doi.org/10.3389/fpls.2021.687406

Catharanthus roseus produces a diverse range of specialized metabolites of the monoterpenoid indole alkaloid (MIA) class in a heavily branched pathway. Recent great progress in identification of MIA biosynthesis genes revealed that the different pathway branch genes are expressed in a highly cell type- and organ-specific and stress-dependent manner. This implies a complex control by specific transcription factors (TFs), only partly revealed today. We generated and mined a comprehensive compendium of publicly available C. roseus transcriptome data for MIA pathway branch-specific TFs. Functional analysis was performed through extensive comparative gene expression analysis and profiling of over 40 MIA metabolites in the C. roseus flower petal expression system. We identified additional members of the known BIS and ORCA regulators. Further detailed study of the ORCA TFs suggests subfunctionalization of ORCA paralogs in terms of target gene-specific regulation and synergistic activity with the central jasmonate response regulator MYC2. Moreover, we identified specific amino acid residues within the ORCA DNA-binding domains that contribute to the differential regulation of some MIA pathway branches. Our results advance our understanding of TF paralog specificity for which, despite the common occurrence of closely related paralogs in many species, comparative studies are scarce.
Bilcke, G., Van Craenenbroeck, L., Castagna Mourão e Lima, A., Osuna, C., Vandepoele, K., Sabbe, K., … Vyverman, W. (2021). Light intensity and spectral composition drive reproductive success in the marine benthic diatom Seminavis robusta. SCIENTIFIC REPORTS, 11(1). https://doi.org/10.1038/s41598-021-92838-0

The properties of incident light play a crucial role in the mating process of diatoms, a group of ecologically important microalgae. While species-specific requirements for light intensity and photoperiod have been observed in several diatom species, little is known about the light spectrum that allows sexual reproduction. Here, we study the effects of spectral properties and light intensity on the initiation and progression of sexual reproduction in the model benthic diatom Seminavis robusta. We found that distinct stages of the mating process have different requirements for light. Vigorous mating pair formation occurred under a broad range of light intensities, ranging from 10 to 81 µE m⁻² s⁻¹, while gametogenesis and subsequent stages were strongly affected by moderate light intensities of 27 µE m⁻² s⁻¹ and up. In addition, light of blue or blue-green wavelengths was required for the formation of mating pairs. Combining flow cytometric analysis with expression profiling of the diatom-specific cyclin dsCyc2 suggests that progression through a blue light-dependent checkpoint in the G1 cell cycle phase is essential for induction of sexual reproduction. Taken together, we expand the current model of mating in benthic pennate diatoms, which relies on the interplay between light, cell cycle and sex pheromone signaling.
Bilcke, G., Van den Berge, K., De Decker, S., Bonneure, E., Poulsen, N., Bulánková, P., … Vyverman, W. (2021). Mating type specific transcriptomic response to sex inducing pheromone in the pennate diatom Seminavis robusta. ISME JOURNAL, 15, 562–576. https://doi.org/10.1038/s41396-020-00797-7

Sexual reproduction is a fundamental phase in the life cycle of most diatoms. Despite its role as a source of genetic variation, it is rarely reported in natural circumstances and its molecular foundations remain largely unknown. Here, we integrate independent transcriptomic datasets to prioritize genes responding to sex inducing pheromones (SIPs) in the pennate diatom Seminavis robusta. We observe marked gene expression changes associated with SIP treatment in both mating types, including an inhibition of S phase progression, chloroplast division, mitosis, and cell wall formation. Meanwhile, meiotic genes are upregulated in response to SIP, including a sexually induced diatom specific cyclin. Our data further suggest an important role for reactive oxygen species, energy metabolism, and cGMP signaling during the early stages of sexual reproduction. In addition, we identify several genes with a mating type specific response to SIP, and link their expression pattern with physiological specialization, such as the production of the attraction pheromone diproline in mating type - (MT-) and mate-searching behavior in mating type + (MT+). Combined, our results provide a model for early sexual reproduction in pennate diatoms and significantly expand the suite of target genes to detect sexual reproduction events in natural diatom populations.
Bilcke, G., Osuna, C., Santana Silva, M., Poulsen, N., D’hondt, S., Bulánková, P., … Vandepoele, K. (2021). Diurnal transcript profiling of the diatom Seminavis robusta reveals adaptations to a benthic lifestyle. PLANT JOURNAL, 107(1), 315–336. https://doi.org/10.1111/tpj.15291

Coastal regions contribute an estimated 20% of annual gross primary production in the oceans, despite occupying only 0.03% of their surface area. Diatoms frequently dominate coastal sediments, where they experience large variations in light regime resulting from the interplay of diurnal and tidal cycles. Here, we report on an extensive diurnal transcript profiling experiment of the motile benthic diatom Seminavis robusta. Nearly 90% (23 328) of expressed protein-coding genes and 66.9% (1124) of expressed long intergenic non-coding RNAs showed significant expression oscillations and are predominantly phasing at night with a periodicity of 24 h. Phylostratigraphic analysis found that rhythmic genes are enriched in highly conserved genes, while diatom-specific genes are predominantly associated with midnight expression. Integration of genetic and physiological cell cycle markers with silica depletion data revealed potential new silica cell wall-associated gene families specific to diatoms. Additionally, we observed 1752 genes with a remarkable semidiurnal (12-h) periodicity, while the expansion of putative circadian transcription factors may reflect adaptations to cope with highly unpredictable external conditions. Taken together, our results provide new insights into the adaptations of diatoms to the benthic environment and serve as a valuable resource for the study of diurnal regulation in photosynthetic eukaryotes.
Blanco‐Pastor, J. L., Barre, P., Keep, T., Ledauphin, T., Escobar‐Gutiérrez, A., Roschanski, A. M., … Sampoux, J. (2021). Canonical correlations reveal adaptive loci and phenotypic responses to climate in perennial ryegrass. MOLECULAR ECOLOGY RESOURCES, 21(3), 849–870. https://doi.org/10.1111/1755-0998.13289

Germplasm from perennial ryegrass (Lolium perenne L.) natural populations is useful for breeding because of its adaptation to a wide range of climates. Climate‐adaptive genes can be detected from associations between genotype, phenotype and climate but an integrated framework for the analysis of these three sources of information is lacking. We used two approaches to identify adaptive loci in perennial ryegrass and their effect on phenotypic traits. First, we combined Genome‐Environment Association (GEA) and GWAS analyses. Then, we implemented a new test based on a Canonical Correlation Analysis (CANCOR) to detect adaptive loci. Furthermore, we improved the previous perennial ryegrass gene set by de novo gene prediction and functional annotation of 39,967 genes. GEA‐GWAS revealed eight outlier loci associated with both environmental variables and phenotypic traits. CANCOR retrieved 633 outlier loci associated with two climatic gradients, characterized by cold‐dry winter versus mild‐wet winter and long rainy season versus long summer, and pointed out traits putatively conferring adaptation at the extremes of these gradients. Our CANCOR test also revealed the presence of both polygenic and oligogenic climatic adaptations. Our gene annotation revealed that 374 of the CANCOR outlier loci were positioned within or close to a gene. Co‐association networks of outlier loci revealed a potential utility of CANCOR for investigating the interaction of genes involved in polygenic adaptations. The CANCOR test provides an integrated framework to analyse adaptive genomic diversity and phenotypic responses to environmental selection pressures that could be used to facilitate the adaptation of plant species to climate change.
Vancaester, E., Depuydt, T., Osuna, C., & Vandepoele, K. (2020). Comprehensive and functional analysis of horizontal gene transfer events in diatoms. MOLECULAR BIOLOGY AND EVOLUTION, 37(11), 3243–3257. https://doi.org/10.1093/molbev/msaa182

Diatoms are a diverse group of mainly photosynthetic algae, responsible for 20% of worldwide oxygen production, which can rapidly respond to favourable conditions and often outcompete other phytoplankton. We investigated the contribution of horizontal gene transfer (HGT) to their ecological success. A large-scale phylogeny-based prokaryotic HGT detection procedure across nine sequenced diatoms showed that 3-5% of their proteome has a horizontal origin and a large influx occurred at the ancestor of diatoms. More than 90% of HGT genes are expressed, and species-specific HGT genes in Phaeodactylum tricornutum undergo strong purifying selection. Genes derived from HGT are implicated in several processes including environmental sensing, and expand the metabolic toolbox. Cobalamin (vitamin B12) is an essential cofactor for roughly half of the diatoms and is only produced by bacteria. Five consecutive genes involved in the final synthesis of the cobalamin biosynthetic pathway, which could function as scavenging and repair genes, were detected as HGT. The full suite of these genes was detected in the cold-adapted diatom Fragilariopsis cylindrus. This might give diatoms originating from the Southern Ocean, a region typically depleted in cobalamin, a competitive advantage. Overall, we show that HGT is a prevalent mechanism that is actively used in diatoms to expand their adaptive capabilities.
Osuna Cruz, C. M., Bilcke, G., Vancaester, E., De Decker, S., Bones, A. M., Winge, P., … Vandepoele, K. (2020). The Seminavis robusta genome provides insights into the evolutionary adaptations of benthic diatoms (vol 11, 3320, 2020). https://doi.org/10.1038/s41467-020-19222-w
Vercruysse, J., Van Bel, M., Osuna, C., Kulkarni, S. R., Van den Storme, V., Nelissen, H., … Vandepoele, K. (2020). Comparative transcriptomics enables the identification of functional orthologous genes involved in early leaf growth. PLANT BIOTECHNOLOGY JOURNAL, 18(2), 553–567. https://doi.org/10.1111/pbi.13223

Leaf growth is a complex trait for which many similarities exist in different plant species, suggesting functional conservation of the underlying pathways. However, a global view of orthologous genes involved in leaf growth showing conserved expression in dicots and monocots is currently missing. Here, we present a genome-wide comparative transcriptome analysis between Arabidopsis and maize, identifying conserved biological processes and gene functions active during leaf growth. Despite the orthology complexity between these distantly related plants, 926 orthologous gene groups including 2829 Arabidopsis and 2974 maize genes with similar expression during leaf growth were found, indicating conservation of the underlying molecular networks. We found 65% of these genes to be involved in one-to-one orthology, whereas only 28.7% of the groups with divergent expression had one-to-one orthology. Within the pool of genes with conserved expression, 19 transcription factor families were identified, demonstrating expression conservation of regulators active during leaf growth. Additionally, 25 Arabidopsis and 25 maize putative targets of the TCP transcription factors with conserved expression were determined based on the presence of enriched transcription factor binding sites. Based on large-scale phenotypic data, we observed that genes with conserved expression have a higher probability to be involved in leaf growth and that leaf-related phenotypes are more frequently present for genes having orthologues between dicots and monocots than clade-specific genes. This study shows the power of integrating transcriptomic with orthology data to identify or select candidates for functional studies during leaf development in flowering plants.
Yau, S., Krasovec, M., Benites, L. F., Rombauts, S., Groussin, M., Vancaester, E., … Piganeau, G. (2020). Virus-host coexistence in phytoplankton through the genomic lens. SCIENCE ADVANCES, 6(14). https://doi.org/10.1126/sciadv.aay2587

Virus-microbe interactions in the ocean are commonly described by "boom and bust" dynamics, whereby a numerically dominant microorganism is lysed and replaced by a virus-resistant one. Here, we isolated a microalga strain and its infective dsDNA virus whose dynamics are characterized instead by parallel growth of both the microalga and the virus. Experimental evolution of clonal lines revealed that this viral production originates from the lysis of a minority of virus-susceptible cells, which are regenerated from resistant cells. Whole-genome sequencing demonstrated that this resistant-susceptible switch involved a large deletion on one chromosome. Mathematical modeling explained how the switch maintains stable microalga-virus population dynamics consistent with their observed growth pattern. Comparative genomics confirmed an ancient origin of this "accordion" chromosome despite a lack of sequence conservation. Together, our results show how dynamic genomic rearrangements may account for a previously overlooked coexistence mechanism in microalgae-virus interactions.
Blommaert, L., Vancaester, E., Huysman, M., Osuna, C., D’hondt, S., Lavaud, J., … Sabbe, K. (2020). Light regulation of LHCX genes in the benthic diatom Seminavis robusta. FRONTIERS IN MARINE SCIENCE, 7. https://doi.org/10.3389/fmars.2020.00192

Intertidal benthic diatoms experience a highly variable light regime, which especially challenges these organisms to cope with excess light energy during low tide. Non-photochemical quenching of chlorophyll fluorescence (NPQ) is one of the most rapid mechanisms diatoms possess to dissipate excess energy. Its capacity is mainly defined by the xanthophyll cycle (XC) and Light-Harvesting Complex X (LHCX) proteins. Whereas the XC and its relation to NPQ have been relatively well-studied in both planktonic and benthic diatoms, our current knowledge about LHCX proteins and their potential involvement in NPQ regulation is largely restricted to planktonic diatoms. While recent studies using immuno-blotting have revealed the presence of light regulated LHCX proteins in benthic diatom communities and isolates, nothing is as yet known about the diversity, identity and transcriptional regulation or function of these proteins. We identified LHCX genes in the draft genome of the model benthic diatom Seminavis robusta and followed their transcriptional regulation during a day/night cycle and during exposure to high light conditions. The S. robusta genome contains 17 LHCX sequences, which is much more than in the sequenced planktonic model diatoms (4-5), but similar to the number of LHCX genes in the sea ice associated diatom Fragilariopsis cylindrus. LHCX diversification in both species, however, appears to have occurred independently. Interestingly, the S. robusta genome contains LHCX genes that are related to the LHCX6 of the model centric diatom Thalassiosira pseudonana, which are lacking in the well-studied pennate model diatom Phaeodactylum tricornutum. All investigated LHCX genes, with exception of SrLHCX6, were upregulated during the daily dark-light transition. Exposure to 2,000 μmol photons m⁻² s⁻¹, furthermore, increased transcription of all investigated LHCX genes. Our data suggest that the diversification and involvement of several light regulated LHCX genes in the photophysiology of S. robusta may represent an adaptation to the complex and highly variable light environment this benthic diatom species can be exposed to.
Del Cortona, A., Jackson, C. J., Bucchini, F., Van Bel, M., D’hondt, S., Škaloud, P., … Leliaert, F. (2020). Neoproterozoic origin and multiple transitions to macroscopic growth in green seaweeds. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 117(5), 2551–2559. https://doi.org/10.1073/pnas.1910060117

The Neoproterozoic Era records the transition from a largely bacterial to a predominantly eukaryotic phototrophic world, creating the foundation for the complex benthic ecosystems that have sustained Metazoa from the Ediacaran Period onward. This study focuses on the evolutionary origins of green seaweeds, which play an important ecological role in the benthos of modern sunlit oceans and likely played a crucial part in the evolution of early animals by structuring benthic habitats and providing novel niches. By applying a phylogenomic approach, we resolve deep relationships of the core Chlorophyta (Ulvophyceae or green seaweeds, and freshwater or terrestrial Chlorophyceae and Trebouxiophyceae) and unveil a rapid radiation of Chlorophyceae and the principal lineages of the Ulvophyceae late in the Neoproterozoic Era. Our time-calibrated tree points to an origin and early diversification of green seaweeds in the late Tonian and Cryogenian periods, an interval marked by two global glaciations with strong consequent changes in the amount of available marine benthic habitat. We hypothesize that unicellular and simple multicellular ancestors of green seaweeds survived these extreme climate events in isolated refugia, and diversified in benthic environments that became increasingly available as ice retreated. An increased supply of nutrients and biotic interactions, such as grazing pressure, likely triggered the independent evolution of macroscopic growth via different strategies, including true multicellularity, and multiple types of giant-celled forms.
Lu, K.-J., van ’t Wout Hofland, N., Mor, E., Mutte, S., Abrahams, P., Kato, H., … De Rybel, B. (2020). Evolution of vascular plants through redeployment of ancient developmental regulators. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 117(1), 733–740. https://doi.org/10.1073/pnas.1912470117

Vascular plants provide most of the biomass, food, and feed on earth, yet the molecular innovations that led to the evolution of their conductive tissues are unknown. Here, we reveal the evolutionary trajectory for the heterodimeric TMO5/LHW transcription factor complex, which is rate-limiting for vascular cell proliferation in Arabidopsis thaliana. Both regulators have origins predating vascular tissue emergence, and even terrestrialization. We further show that TMO5 evolved its modern function, including dimerization with LHW, at the origin of land plants. A second innovation in LHW, coinciding with vascular plant emergence, conditioned obligate heterodimerization and generated the critical function in vascular development in Arabidopsis. In summary, our results suggest that the division potential of vascular cells may have been an important factor contributing to the evolution of vascular plants.
Jones, M., & Vandepoele, K. (2020). Identification and evolution of gene regulatory networks : insights from comparative studies in plants. CURRENT OPINION IN PLANT BIOLOGY, 54, 42–48. https://doi.org/10.1016/j.pbi.2019.12.008

The availability of genome sequences, genome-wide assays of transcription factor binding, and accessible chromatin maps have unveiled gene regulatory landscapes in plants. This understanding has ushered in comparative gene regulatory network studies that assess network rewiring between species, across time, and between biological tissues. Comparisons of cis-regulatory elements across the plant kingdom have uncovered examples of conserved sequences, but also of divergence, indicating that selective pressures can vary in different plant families. Transcription factor duplication, followed by spatiotemporal expression divergence of the duplicates, also appears to be a key mechanism of network evolution. Here, we review recent literature describing the regulation of gene expression in plants, and how comparative studies provide insights into how these regulatory interactions change and lead to gene regulatory network rewiring.
Kulkarni, S. R., & Vandepoele, K. (2020). Inference of plant gene regulatory networks using data-driven methods : a practical overview. BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS, 1863(6). https://doi.org/10.1016/j.bbagrm.2019.194447

Transcriptional regulation is a complex and dynamic process that plays a vital role in plant growth and development. A key component in the regulation of genes is transcription factors (TFs), which coordinate the transcriptional control of gene activity. A gene regulatory network (GRN) is a collection of regulatory interactions between TFs and their target genes. The accurate delineation of GRNs offers a significant contribution to our understanding about how plant cells are organized and function, and how individual genes are regulated in various conditions, organs or cell types. During the past decade, important progress has been made in the identification of GRNs using experimental and computational approaches. However, a detailed overview of available platforms supporting the analysis of GRNs in plants is missing. Here, we review current databases, platforms and tools that perform data-driven analyses of gene regulation in Arabidopsis. The platforms are categorized into two sections, 1) promoter motif analysis tools that use motif mapping approaches to find TF motifs in the regulatory sequences of genes of interest and 2) network analysis tools that identify potential regulators for a set of input genes using a range of data types in order to generate GRNs. We discuss the diverse datasets integrated and highlight the strengths and caveats of different platforms. Finally, we shed light on the limitations of the above approaches and discuss future perspectives, including the need for integrative approaches to unravel complex GRNs in plants.
Stock genannt Schroer, F., Bilcke, G., De Decker, S., Osuna, C., Van den Berge, K., Vancaester, E., … Vyverman, W. (2020). Distinctive growth and transcriptional changes of the diatom Seminavis robusta in response to quorum sensing related compounds. FRONTIERS IN MICROBIOLOGY, 11. https://doi.org/10.3389/fmicb.2020.01240

In aquatic habitats, diatoms are frequently found in association with Proteobacteria, many members of which employ cell-to-cell communication via N-acyl homoserine lactones (AHLs). It has been suggested that diatoms could distinguish between beneficial and algicidal bacteria in their surroundings by sensing AHLs. Although some microalgae can interfere with AHL signaling, e.g., by releasing AHL mimics or degrading them, molecular responses to AHLs in microalgae are still unclear. Therefore, we tested the effects of short-chained AHLs, i.e., N-hexanoyl homoserine lactone (C6-HSL), N-3-hydroxyhexanoyl homoserine lactone (OH-C6-HSL), and N-3-oxohexanoyl homoserine lactone (oxo-C6-HSL) and long-chained AHLs, i.e., N-tetradecanoyl homoserine lactone (C14-HSL), N-3-hydroxytetradecanoyl homoserine lactone (OH-C14-HSL), and N-3-oxotetradecanoyl homoserine lactone (oxo-C14-HSL), on growth of the benthic diatom Seminavis robusta. None of the tested short-chained AHLs affected diatom growth, while long-chained AHLs promoted (C14-HSL) or inhibited (OH-C14-HSL and oxo-C14-HSL) growth. To investigate the physiological effects of these long-chained AHLs in more detail, an RNA-seq experiment was performed during which S. robusta was treated with the growth-promoting C14-HSL and the growth-inhibiting oxo-C14-HSL. One tetramic acid was also tested (TA14), a structural rearrangement product of oxo-C14-HSL, which also induced growth inhibition in S. robusta. After 3 days of treatment, analysis revealed that 3,410 genes were differentially expressed in response to at least one of the compounds. In the treatment with the growth-promoting C14-HSL many genes involved in intracellular signaling were upregulated. On the other hand, exposure to growth-inhibiting oxo-C14-HSL and TA14 triggered a switch in lipid metabolism towards increased fatty acid degradation. In addition, oxo-C14-HSL led to downregulation of cell cycle genes, which is in agreement with the stagnation of cell growth in this treatment. Combined, our results indicate that bacterial signaling molecules with high structural similarity induce contrasting physiological responses in S. robusta.
Osuna, C., Bilcke, G., Vancaester, E., De Decker, S., Bones, A. M., Winge, P., … Vandepoele, K. (2020). The Seminavis robusta genome provides insights into the evolutionary adaptations of benthic diatoms. NATURE COMMUNICATIONS, 11(1). https://doi.org/10.1038/s41467-020-17191-8

Benthic diatoms are the main primary producers in shallow freshwater and coastal environments, fulfilling important ecological functions such as nutrient cycling and sediment stabilization. However, little is known about their evolutionary adaptations to these highly structured but heterogeneous environments. Here, we report a reference genome for the marine biofilm-forming diatom Seminavis robusta, showing that gene family expansions are responsible for a quarter of all 36,254 protein-coding genes. Tandem duplications play a key role in extending the repertoire of specific gene functions, including light and oxygen sensing, which are probably central for its adaptation to benthic habitats. Genes differentially expressed during interactions with bacteria are strongly conserved in other benthic diatoms while many species-specific genes are strongly upregulated during sexual reproduction. Combined with re-sequencing data from 48 strains, our results offer insights into the genetic diversity and gene functions in benthic diatoms. Available genomics studies have mostly focused on planktonic centric diatoms. Here, the authors report the genome assembly of the marine biofilm-forming diatom Seminavis robusta and the resequencing data of a panel of accessions to reveal their evolutionary adaptations.
Bousquet, L., Hemon, C., Malburet, P., Bucchini, F., Vandepoele, K., Grimsley, N., … Echeverria, M. (2020). The medium-size noncoding RNA transcriptome of Ostreococcus tauri, the smallest living eukaryote, reveals a large family of small nucleolar RNAs displaying multiple genomic expression strategies. NAR GENOMICS AND BIOINFORMATICS, 2(4). https://doi.org/10.1093/nargab/lqaa080

The small nucleolar RNAs (snoRNAs), essential for ribosome biogenesis, constitute a major family of medium-size noncoding RNAs (mncRNAs) in all eukaryotes. We present here, for the first time in a marine unicellular alga, the characterization of the snoRNA family in Ostreococcus tauri, the smallest photosynthetic eukaryote. Using a transcriptomic approach, we identified 131 O. tauri snoRNAs (Ot-snoRNA) distributed in three classes: the C/D snoRNAs, the H/ACA snoRNAs and the MRP RNA. Their genomic organization revealed a unique combination of both the intronic organization of animals and the polycistronic organization of plants. Remarkably, clustered genes produced Ot-snoRNAs with unusual structures never previously described in plants. Their abundances, based on quantification of reads and northern blots, showed extreme differences in Ot-snoRNA accumulation, mainly determined by their differential stability. Most of these Ot-snoRNAs were predicted to target rRNAs or snRNAs. Seventeen others were orphan Ot-snoRNAs that would not target rRNA. These were specific to O. tauri or Mamiellophyceae and could have functions unrelated to ribosome biogenesis. Overall, these data reveal an 'evolutionary response' adapted to the extreme compactness of the O. tauri genome that accommodates the essential Ot-snoRNAs, developing multiple strategies to optimize their coordinated expression with a minimal cost on regulatory circuits.
Witjes, L., Van Troys, M., Vandekerckhove, J., Vandepoele, K., & Ampe, C. (2019). A new evolutionary model for the vertebrate actin family including two novel groups. MOLECULAR PHYLOGENETICS AND EVOLUTION, 141. https://doi.org/10.1016/j.ympev.2019.106632

Database surveys in the vertebrate model organisms: chicken (Gallus gallus), western clawed frog (Xenopus tropicalis), anole lizard (Anolis carolinensis) and zebrafish (Danio rerio) indicate that in some of these species the number of actin paralogues differs from the well-established six paralogues in mouse (Mus musculus). To investigate differential functions of actins and for establishing disease models it is important to know how actins in the different model organisms relate to each other and whether the vertebrate actin family is truly limited to six groups. Primarily through synteny analyses we discovered that the vertebrate actin family consists of eight instead of six orthologous actin groups for which we propose improved gene nomenclature. We also established that α-skeletal muscle, γ-enteric smooth muscle and γ-cytoplasmic actin genes originated prior to tetrapods contradicting an earlier and widely accepted model of actin evolution. Our findings allow a more reliable predictive classification of actin paralogues in (non-mammalian) vertebrates and contribute to a better understanding of actin evolution as basis for biomedical research on actin-related diseases.
Kulkarni, S. R., Jones, M., & Vandepoele, K. (2019). Enhanced maps of transcription factor binding sites improve regulatory networks learned from accessible chromatin data. PLANT PHYSIOLOGY, 181(2), 412–425. https://doi.org/10.1104/pp.19.00605

Determining where transcription factors (TFs) bind in genomes provides insight into which transcriptional programs are active across organs, tissue types, and environmental conditions. Recent advances in high-throughput profiling of regulatory DNA have yielded large amounts of information about chromatin accessibility. Interpreting the functional significance of these data sets requires knowledge of which regulators are likely to bind these regions. This can be achieved by using information about TF-binding preferences, or motifs, to identify TF-binding events that are likely to be functional. Although different approaches exist to map motifs to DNA sequences, a systematic evaluation of these tools in plants is missing. Here, we compare four motif-mapping tools widely used in the Arabidopsis (Arabidopsis thaliana) research community and evaluate their performance using chromatin immunoprecipitation data sets for 40 TFs. Downstream gene regulatory network (GRN) reconstruction was found to be sensitive to the motif mapper used. We further show that the low recall of Find Individual Motif Occurrences, one of the most frequently used motif-mapping tools, can be overcome by using an Ensemble approach, which combines results from different mapping tools. Several examples are provided demonstrating how the Ensemble approach extends our view on transcriptional control for TFs active in different biological processes. Finally, a protocol is presented to effectively derive more complete cell type-specific GRNs through the integrative analysis of open chromatin regions, known binding site information, and expression data sets. This approach will pave the way to increase our understanding of GRNs in different cellular conditions.
Van Bel, M., Bucchini, F., & Vandepoele, K. (2019). Gene space completeness in complex plant genomes. CURRENT OPINION IN PLANT BIOLOGY, 48, 9–17. https://doi.org/10.1016/j.pbi.2019.01.001

Genome annotations offer ample opportunities to study gene functions, biochemical and regulatory pathways, or quantitative trait loci in plants. Determining the quality and completeness of a genome annotation, and maintaining the balance between them, are major challenges, even for genomes of well-studied model organisms. In this review, we present a historical overview of the complexity in different plant genomes and discuss the hurdles and possible solutions in obtaining a complete and high-quality genome annotation. We illustrate there is no clear-cut answer to solve these challenges for different gene types, but provide tips on guiding the iterative process of generating a superior genome annotation, which is a moving target as our knowledge about plant genomics increases and additional data sources become available.
Van Leene, J., Han, C., Gadeyne, A., Eeckhout, D., Matthijs, C., Cannoot, B., … De Jaeger, G. (2019). Capturing the phosphorylation and protein interaction landscape of the plant TOR kinase. NATURE PLANTS, 5(3), 316–327. https://doi.org/10.1038/s41477-019-0378-z

The target of rapamycin (TOR) kinase is a conserved regulatory hub that translates environmental and nutritional information into permissive or restrictive growth decisions. Despite the increased appreciation of the essential role of the TOR complex in plants, no large-scale phosphoproteomics or interactomics studies have been performed to map TOR signalling events in plants. To fill this gap, we combined a systematic phosphoproteomics screen with a targeted protein complex analysis in the model plant Arabidopsis thaliana. Integration of the phosphoproteome and protein complex data on the one hand shows that both methods reveal complementary subspaces of the plant TOR signalling network, enabling proteome-wide discovery of both upstream and downstream network components. On the other hand, the overlap between both data sets reveals a set of candidate direct TOR substrates. The integrated network embeds both evolutionarily-conserved and plant-specific TOR signalling components, uncovering an intriguing complex interplay with protein synthesis. Overall, the network provides a rich data set to start addressing fundamental questions about how TOR controls key processes in plants, such as autophagy, auxin signalling, chloroplast development, lipid metabolism, nucleotide biosynthesis, protein translation or senescence.
Lama, S., Broda, M., Abbas, Z., Vaneechoutte, D., Belt, K., Säll, T., … Van Aken, O. (2019). Neofunctionalization of mitochondrial proteins and incorporation into signaling networks in plants. MOLECULAR BIOLOGY AND EVOLUTION, 36(5), 974–989. https://doi.org/10.1093/molbev/msz031

Because of their symbiotic origin, many mitochondrial proteins are well conserved across eukaryotic kingdoms. It is however less obvious how specific lineages have obtained novel nuclear-encoded mitochondrial proteins. Here, we report a case of mitochondrial neofunctionalization in plants. Phylogenetic analysis of genes containing the Domain of Unknown Function 295 (DUF295) revealed that the domain likely originated in Angiosperms. The C-terminal DUF295 domain is usually accompanied by an N-terminal F-box domain, involved in ubiquitin ligation via binding with ASK1/SKP1-type proteins. Due to gene duplication, the gene family has expanded rapidly, with 94 DUF295-related genes in Arabidopsis thaliana alone. Two DUF295 family subgroups have uniquely evolved and quickly expanded within Brassicaceae. One of these subgroups has completely lost the F-box, but instead obtained strongly predicted mitochondrial targeting peptides. We show that several representatives of this DUF295 Organellar group are effectively targeted to plant mitochondria and chloroplasts. Furthermore, many DUF295 Organellar genes are induced by mitochondrial dysfunction, whereas F-Box DUF295 genes are not. In agreement, several Brassicaceae-specific DUF295 Organellar genes were incorporated in the evolutionary much older ANAC017-dependent mitochondrial retrograde signaling pathway. Finally, a representative set of DUF295 T-DNA insertion mutants was created. No obvious aberrant phenotypes during normal growth and mitochondrial dysfunction were observed, most likely due to the large extent of gene duplication and redundancy. Overall, this study provides insight into how novel mitochondrial proteins can be created via “intercompartmental” gene duplication events. Moreover, our analysis shows that these newly evolved genes can then be specifically integrated into relevant, pre-existing coexpression networks.
Vaneechoutte, D., & Vandepoele, K. (2019). Curse : building expression atlases and co-expression networks from public RNA-Seq data. BIOINFORMATICS, 35(16), 2880–2881. https://doi.org/10.1093/bioinformatics/bty1052

Public RNA-Sequencing (RNA-Seq) datasets are a valuable resource for transcriptome analyses, but their accessibility is hindered by the imperfect quality and presentation of their metadata and by the complexity of processing raw sequencing data. The Curse suite was created to alleviate these problems. It consists of an online curation tool named Curse to efficiently build compendia of experiments hosted on the Sequence Read Archive, and a lightweight pipeline named Prose to download and process the RNA-Seq data into expression atlases and co-expression networks. Curse networks showed improved linking of functionally related genes compared to the state-of-the-art. Availability and implementation: Curse, Prose, and their manuals are available at http://bioinformatics.psb.ugent.be/webtools/Curse/. Prose was implemented in Java. Supplementary information: Supplementary data are available at Bioinformatics online.
Pollier, J., Vancaester, E., Kuzhiumparambil, U., Vickers, C. E., Vandepoele, K., Goossens, A., & Fabris, M. (2019). A widespread alternative squalene epoxidase participates in eukaryote steroid biosynthesis. NATURE MICROBIOLOGY, 4(2), 226–233. https://doi.org/10.1038/s41564-018-0305-5

Steroids are essential triterpenoid molecules that are present in all eukaryotes and modulate the fluidity and flexibility of cell membranes. Steroids also serve as signalling molecules that are crucial for growth, development and differentiation of multicellular organisms. The steroid biosynthetic pathway is highly conserved and is key in eukaryote evolution. The flavoprotein squalene epoxidase (SQE) catalyses the first oxygenation reaction in this pathway and is rate limiting. However, despite its conservation in animals, plants and fungi, several phylogenetically widely distributed eukaryote genomes lack an SQE-encoding gene. Here, we discovered and characterized an alternative SQE (AltSQE) belonging to the fatty acid hydroxylase superfamily. AltSQE was identified through screening of a gene library of the diatom Phaeodactylum tricornutum in a SQE-deficient yeast. In accordance with its divergent protein structure and need for cofactors, we found that AltSQE is insensitive to the conventional SQE inhibitor terbinafine. AltSQE is present in many eukaryotic lineages but is mutually exclusive with SQE and shows a patchy distribution within monophyletic clades. Our discovery provides an alternative element for the conserved steroid biosynthesis pathway, raises questions about eukaryote metabolic evolution and opens routes to develop selective SQE inhibitors to control hazardous organisms.
Veeckman, E., Van Glabeke, S., Haegeman, A., Muylle, H., van Parijs, F. R., Byrne, S. L., … Ruttink, T. (2019). Overcoming challenges in variant calling : exploring sequence diversity in candidate genes for plant development in perennial ryegrass (Lolium perenne). DNA RESEARCH, 26(1), 1–12. https://doi.org/10.1093/dnares/dsy033

Revealing DNA sequence variation within the Lolium perenne genepool is important for genetic analysis and development of breeding applications. We reviewed current literature on plant development to select candidate genes in pathways that control agronomic traits, and identified 503 orthologues in L. perenne. Using targeted resequencing, we constructed a comprehensive catalogue of genomic variation for a L. perenne germplasm collection of 736 genotypes derived from current cultivars, breeding material and wild accessions. To overcome challenges of variant calling in heterogeneous outbreeding species, we used two complementary strategies to explore sequence diversity. First, four variant calling pipelines were integrated with the VariantMetaCaller to reach maximal sensitivity. Additional multiplex amplicon sequencing was used to empirically estimate an appropriate precision threshold. Second, a de novo assembly strategy was used to reconstruct divergent alleles for each gene. The advantage of this approach was illustrated by discovery of 28 novel alleles of LpSDUF247, a polymorphic gene co-segregating with the S-locus of the grass self-incompatibility system. Our approach is applicable to other genetically diverse outbreeding species. The resulting collection of functionally annotated variants can be mined for variants causing phenotypic variation, either through genetic association studies, or by selecting carriers of rare defective alleles for physiological analyses.
Prince, S. J., Valliyodan, B., Ye, H., Yang, M., Tai, S., Hu, W., … Nguyen, H. T. (2019). Understanding genetic control of root system architecture in soybean : insights into the genetic basis of lateral root number. PLANT CELL AND ENVIRONMENT, 42(1), 212–229. https://doi.org/10.1111/pce.13333

Developing crops with better root systems is a promising strategy to ensure productivity in both optimum and stress environments. Root system architectural traits in 397 soybean accessions were characterized and a high-density single nucleotide polymorphism (SNP)-based genome-wide association study was performed to identify the underlying genes associated with root structure. SNPs associated with root architectural traits specific to landraces and elite germplasm pools were detected. Four loci were detected in landraces for lateral root number (LRN) and distribution of root thickness in diameter Class I with a major locus on chromosome 16. This major locus was detected in the coding region of an unknown protein, and subsequent analyses demonstrated that root traits are affected by mutated haplotypes of the gene. In the elite germplasm pool, 3 significant SNPs in alanine-glyoxylate aminotransferase, Leucine-Rich Repeat receptor/No apical meristem, and unknown functional genes were found to govern multiple traits including root surface area and volume. However, no major loci were detected for LRN in elite germplasm. Nucleotide diversity analysis found evidence of selective sweeps around the landrace LRN gene. Soybean accessions with minor and mutated allelic variants of the LRN gene were found to perform better in both water-limited and optimal field conditions.
Cirri, E., De Decker, S., Bilcke, G., Werner, M., Osuna, C., De Veylder, L., … Pohnert, G. (2019). Associated bacteria affect sexual reproduction by altering gene expression and metabolic processes in a biofilm inhabiting diatom. FRONTIERS IN MICROBIOLOGY, 10. https://doi.org/10.3389/fmicb.2019.01790

Diatoms are unicellular algae with a fundamental role in global biogeochemical cycles as major primary producers at the base of aquatic food webs. In recent years, chemical communication between diatoms and associated bacteria has emerged as a key factor in diatom ecology, spurred by conceptual and technological advancements to study the mechanisms underlying these interactions. Here, we use a combination of physiological, transcriptomic, and metabolomic approaches to study the influence of naturally coexisting bacteria, Maribacter sp. and Roseovarius sp., on the sexual reproduction of the biofilm inhabiting marine pennate diatom Seminavis robusta. While Maribacter sp. severely reduces the reproductive success of S. robusta cultures, Roseovarius sp. slightly enhances it. Contrary to our expectation, we demonstrate that the effect of the bacterial exudates is not caused by altered cell-cycle regulation prior to the switch to meiosis. Instead, Maribacter sp. exudates cause a reduced production of diproline, the sexual attraction pheromone of S. robusta. Transcriptomic analyses show that this is likely an indirect consequence of altered intracellular metabolic fluxes in the diatom, especially those related to amino acid biosynthesis, oxidative stress response, and biosynthesis of defense molecules. This study provides the first insights into the influence of bacteria on diatom sexual reproduction and adds a new dimension to the complexity of a still understudied phenomenon in natural diatom populations.
Forslund, K., Pereira, C., Capella-Gutierrez, S., Sousa da Silva, A., Altenhoff, A., Huerta-Cepas, J., … Lewis, S. (2018). Gearing up to handle the mosaic nature of life in the quest for orthologs. BIOINFORMATICS, 34(2), 323–329. https://doi.org/10.1093/bioinformatics/btx542

The Quest for Orthologs (QfO) is an open collaboration framework for experts in comparative phylogenomics and related research areas who have an interest in highly accurate orthology predictions and their applications. We here report highlights and discussion points from the QfO meeting 2015 held in Barcelona. Achievements in recent years have established a basis to support developments for improved orthology prediction and to explore new approaches. Central to the QfO effort is proper benchmarking of methods and services, as well as design of standardized datasets and standardized formats to allow sharing and comparison of results. Simultaneously, analysis pipelines have been improved, evaluated and adapted to handle large datasets. All this would not have occurred without the long-term collaboration of Consortium members. Meeting regularly to review and coordinate complementary activities from a broad spectrum of innovative researchers clearly benefits the community. Highlights of the meeting include addressing sources of and legitimacy of disagreements between orthology calls, the context dependency of orthology definitions, special challenges encountered when analyzing very anciently rooted orthologies, orthology in the light of whole-genome duplications, and the concept of orthologous versus paralogous relationships at different levels, including domain-level orthology. Furthermore, particular needs for different applications (e.g. plant genomics, ancient gene families and others) and the infrastructure for making orthology inferences available (e.g. interfaces with model organism databases) were discussed, with several ongoing efforts that are expected to be reported on during the upcoming 2017 QfO meeting.
Khan, A., Fornes, O., Stigliani, A., Gheorghe, M., Castro-Mondragon, J. A., van der Lee, R., … Mathelier, A. (2018). JASPAR 2018 : update of the open-access database of transcription factor binding profiles and its web framework. NUCLEIC ACIDS RESEARCH, 46(D1), D260–D266. https://doi.org/10.1093/nar/gkx1126

JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) and TF flexible models (TFFMs) for TFs across multiple species in six taxonomic groups. In the 2018 release of JASPAR, the CORE collection has been expanded with 322 new PFMs (60 for vertebrates and 262 for plants) and 33 PFMs were updated (24 for vertebrates, 8 for plants and 1 for insects). These new profiles represent a 30% expansion compared to the 2016 release. In addition, we have introduced 316 TFFMs (95 for vertebrates, 218 for plants and 3 for insects). This release incorporates clusters of similar PFMs in each taxon and each TF class per taxon. The JASPAR 2018 CORE vertebrate collection of PFMs was used to predict TF-binding sites in the human genome. The predictions are made available to the scientific community through a UCSC Genome Browser track data hub. Finally, this update comes with a new web framework with an interactive and responsive user-interface, along with new features. All the underlying data can be retrieved programmatically using a RESTful API and through the JASPAR 2018 R/Bioconductor package.
Lang, D., Ullrich, K. K., Murat, F., Fuchs, J., Jenkins, J., Haas, F. B., … Rensing, S. A. (2018). The Physcomitrella patens chromosome-scale assembly reveals moss genome structure and evolution. PLANT JOURNAL, 93(3), 515–533. https://doi.org/10.1111/tpj.13801

The draft genome of the moss model, Physcomitrella patens, comprised approximately 2000 unordered scaffolds. In order to enable analyses of genome structure and evolution we generated a chromosome-scale genome assembly using genetic linkage as well as (end) sequencing of long DNA fragments. We find that 57% of the genome comprises transposable elements (TEs), some of which may be actively transposing during the life cycle. Unlike in flowering plant genomes, gene- and TE-rich regions show an overall even distribution along the chromosomes. However, the chromosomes are mono-centric with peaks of a class of Copia elements potentially coinciding with centromeres. Gene body methylation is evident in 5.7% of the protein-coding genes, typically coinciding with low GC and low expression. Some giant virus insertions are transcriptionally active and might protect gametes from viral infection via siRNA mediated silencing. Structure-based detection methods show that the genome evolved via two rounds of whole genome duplications (WGDs), apparently common in mosses but not in liverworts and hornworts. Several hundred genes are present in colinear regions conserved since the last common ancestor of plants. These syntenic regions are enriched for functions related to plant-specific cell growth and tissue organization. The P. patens genome lacks the TE-rich pericentromeric and gene-rich distal regions typical for most flowering plant genomes. More non-seed plant genomes are needed to unravel how plant genomes evolve, and to understand whether the P. patens genome structure is typical for mosses or bryophytes.
Van Bel, M., Diels, T., Vancaester, E., Kreft, L., Botzki, A., Van de Peer, Y., … Vandepoele, K. (2018). PLAZA 4.0 : an integrative resource for functional, evolutionary and comparative plant genomics. NUCLEIC ACIDS RESEARCH, 46(D1), D1190–D1196. https://doi.org/10.1093/nar/gkx1002

PLAZA (https://bioinformatics.psb.ugent.be/plaza) is a plant-oriented online resource for comparative, evolutionary and functional genomics. The PLAZA platform consists of multiple independent instances focusing on different plant clades, while also providing access to a consistent set of reference species. Each PLAZA instance contains structural and functional gene annotations, gene family data and phylogenetic trees and detailed gene colinearity information. A user-friendly web interface makes the necessary tools and visualizations accessible, specific for each data type. Here we present PLAZA 4.0, the latest iteration of the PLAZA framework. This version consists of two new instances (Dicots 4.0 and Monocots 4.0) providing a large increase in newly available species, and offers access to updated and newly implemented tools and visualizations, helping users with the ever-increasing demands for complex and in-depth analyses. The total number of species across both instances nearly doubles from 37 species in PLAZA 3.0 to 71 species in PLAZA 4.0, with a much broader coverage of crop species (e.g. wheat, palm oil) and species of evolutionary interest (e.g. spruce, Marchantia). The new PLAZA instances can also be accessed by a programming interface through a RESTful web service, thus allowing bioinformaticians to optimally leverage the power of the PLAZA platform.
Krasovec, M., Vancaester, E., Rombauts, S., Bucchini, F., Yau, S., Hemon, C., … Piganeau, G. (2018). Genome analyses of the microalga Picochlorum provide insights into the evolution of thermotolerance in the green lineage. GENOME BIOLOGY AND EVOLUTION, 10(9), 2347–2365. https://doi.org/10.1093/gbe/evy167

While the molecular events involved in cell responses to heat stress have been extensively studied, our understanding of the genetic basis of basal thermotolerance, and particularly its evolution within the green lineage, remains limited. Here, we present the 13.3-Mb haploid genome and transcriptomes of a halotolerant and thermotolerant unicellular green alga, Picochlorum costavermella (Trebouxiophyceae) to investigate the evolution of the genomic basis of thermotolerance. Differential gene expression at high and standard temperatures revealed that more of the gene families containing up-regulated genes at high temperature were recently evolved, and less originated at the ancestor of green plants. Inversely, there was an excess of ancient gene families containing transcriptionally repressed genes. Interestingly, there is a striking overlap between the thermotolerance and halotolerance transcriptional rewiring, as more than one-third of the gene families up-regulated at 35 °C were also up-regulated under variable salt concentrations in Picochlorum SE3. Moreover, phylogenetic analysis of the 9,304 protein coding genes revealed 26 genes of horizontally transferred origin in P. costavermella, of which five were differentially expressed at higher temperature. Altogether, these results provide new insights about how the genomic basis of adaptation to halo- and thermotolerance evolved in the green lineage.
De Clerck, O., Kao, S.-M., Bogaert, K., Blomme, J., Foflonker, F., Kwantes, M., … Bothwell, J. H. (2018). Insights into the evolution of multicellularity from the sea lettuce genome. CURRENT BIOLOGY, 28(18), 2921–2933. https://doi.org/10.1016/j.cub.2018.08.015

We report here the 98.5 Mbp haploid genome (12,924 protein coding genes) of Ulva mutabilis, a ubiquitous and iconic representative of the Ulvophyceae or green seaweeds. Ulva's rapid and abundant growth makes it a key contributor to coastal biogeochemical cycles; its role in marine sulfur cycles is particularly important because it produces high levels of dimethylsulfoniopropionate (DMSP), the main precursor of volatile dimethyl sulfide (DMS). Rapid growth makes Ulva attractive biomass feedstock but also increasingly a driver of nuisance "green tides." Ulvophytes are key to understanding the evolution of multicellularity in the green lineage, and Ulva morphogenesis is dependent on bacterial signals, making it an important species with which to study cross-kingdom communication. Our sequenced genome informs these aspects of ulvophyte cell biology, physiology, and ecology. Gene family expansions associated with multicellularity are distinct from those of freshwater algae. Candidate genes, including some that arose following horizontal gene transfer from chromalveolates, are present for the transport and metabolism of DMSP. The Ulva genome offers, therefore, new opportunities to understand coastal and marine ecosystems and the fundamental evolution of the green lineage.
Besbrugge, N., Van Leene, J., Eeckhout, D., Cannoot, B., Kulkarni, S. R., De Winne, N., … De Jaeger, G. (2018). GSyellow, a multifaceted tag for functional protein analysis in monocot and dicot plants. PLANT PHYSIOLOGY, 177(2), 447–464. https://doi.org/10.1104/pp.18.00175

The ability to tag proteins has boosted the emergence of generic molecular methods for protein functional analysis. Fluorescent protein tags are used to visualize protein localization, and affinity tags enable the mapping of molecular interactions by, for example, tandem affinity purification or chromatin immunoprecipitation. To apply these widely used molecular techniques on a single transgenic plant line, we developed a multifunctional tandem affinity purification tag, named GS(yellow), which combines the streptavidin-binding peptide tag with citrine yellow fluorescent protein. We demonstrated the versatility of the GS(yellow) tag in the dicot Arabidopsis (Arabidopsis thaliana) using a set of benchmark proteins. For proof of concept in monocots, we assessed the localization and dynamic interaction profile of the leaf growth regulator ANGUSTIFOLIA3 (AN3), fused to the GS(yellow) tag, along the growth zone of the maize (Zea mays) leaf. To further explore the function of ZmAN3, we mapped its DNA-binding landscape in the growth zone of the maize leaf through chromatin immunoprecipitation sequencing. Comparison with AN3 target genes mapped in the developing maize tassel or in Arabidopsis cell cultures revealed strong conservation of AN3 target genes between different maize tissues and across monocots and dicots, respectively. In conclusion, the GS(yellow) tag offers a powerful molecular tool for distinct types of protein functional analyses in dicots and monocots. As this approach involves transforming a single construct, it is likely to accelerate both basic and translational plant research.
Gao, Z., Daneva, A., Salanenka, Y., Van Durme, M., Huysmans, M., Lin, Z., … Nowack, M. (2018). KIRA1 and ORESARA1 terminate flower receptivity by promoting cell death in the stigma of Arabidopsis. NATURE PLANTS, 4(6), 365–375. https://doi.org/10.1038/s41477-018-0160-7

Flowers have a species-specific functional life span that determines the time window in which pollination, fertilization and seed set can occur. The stigma tissue plays a key role in flower receptivity by intercepting pollen and initiating pollen tube growth toward the ovary. In this article, we show that a developmentally controlled cell death programme terminates the functional life span of stigma cells in Arabidopsis. We identified the leaf senescence regulator ORESARA1 (also known as ANAC092) and the previously uncharacterized KIRA1 (also known as ANAC074) as partially redundant transcription factors that modulate stigma longevity by controlling the expression of programmed cell death-associated genes. KIRA1 expression is sufficient to induce cell death and terminate floral receptivity, whereas lack of both KIRA1 and ORESARA1 substantially increases stigma life span. Surprisingly, the extension of stigma longevity is accompanied by only a moderate extension of flower receptivity, suggesting that additional processes participate in the control of the flower's receptive life span.
Kulkarni, S. R., Vaneechoutte, D., Van de Velde, J., & Vandepoele, K. (2018). TF2Network : predicting transcription factor regulators and gene regulatory networks in Arabidopsis using publicly available binding site information. NUCLEIC ACIDS RESEARCH, 46(6). https://doi.org/10.1093/nar/gkx1279

A gene regulatory network (GRN) is a collection of regulatory interactions between transcription factors (TFs) and their target genes. GRNs control different biological processes and have been instrumental to understand the organization and complexity of gene regulation. Although various experimental methods have been used to map GRNs in Arabidopsis thaliana, their limited throughput combined with the large number of TFs makes that for many genes our knowledge about regulating TFs is incomplete. We introduce TF2Network, a tool that exploits the vast amount of TF binding site information and enables the delineation of GRNs by detecting potential regulators for a set of co-expressed or functionally related genes. Validation using two experimental benchmarks reveals that TF2Network predicts the correct regulator in 75-92% of the test sets. Furthermore, our tool is robust to noise in the input gene sets, has a low false discovery rate, and shows a better performance to recover correct regulators compared to other plant tools. TF2Network is accessible through a web interface where GRNs are interactively visualized and annotated with various types of experimental functional information. TF2Network was used to perform systematic functional and regulatory gene annotations, identifying new TFs involved in circadian rhythm and stress response.
Hansen, B. O., Meyer, E. H., Ferrari, C., Vaid, N., Movahedi, S., Vandepoele, K., … Mutwil, M. (2018). Ensemble gene function prediction database reveals genes important for complex I formation in Arabidopsis thaliana. NEW PHYTOLOGIST, 217(4), 1521–1534. https://doi.org/10.1111/nph.14921

Recent advances in gene function prediction rely on ensemble approaches that integrate results from multiple inference methods to produce superior predictions. Yet, these developments remain largely unexplored in plants. We have explored and compared two methods to integrate 10 gene co-function networks for Arabidopsis thaliana and demonstrate how the integration of these networks produces more accurate gene function predictions for a larger fraction of genes with unknown function. These predictions were used to identify genes involved in mitochondrial complex I formation, and for five of them, we confirmed the predictions experimentally. The ensemble predictions are provided as a user-friendly online database, EnsembleNet. The methods presented here demonstrate that ensemble gene function prediction is a powerful method to boost prediction performance, whereas the EnsembleNet database provides a cutting-edge community tool to guide experimentalists.
De Decker, S., Vanormelingen, P., Sefbom, J., Lembke, C., Van den Berghe, K., Vandepoele, K., … Vyverman, W. (2017). Identifying drivers of sympatric speciation in the marine benthic diatom Seminavis robusta using metabolic analysis and whole-genome resequencing. PHYCOLOGIA, 56(4, suppl.), 41–42.
Vaneechoutte, D., Estrada, A. R., Lin, Y.-C., Loraine, A. E., & Vandepoele, K. (2017). Genome-wide characterization of differential transcript usage in Arabidopsis thaliana. PLANT JOURNAL, 92(6), 1218–1231. https://doi.org/10.1111/tpj.13746

Alternative splicing and the usage of alternate transcription start- or stop sites allows a single gene to produce multiple transcript isoforms. Most plant genes express certain isoforms at a significantly higher level than others, but under specific conditions this expression dominance can change, resulting in a different set of dominant isoforms. These events of differential transcript usage (DTU) have been observed for thousands of Arabidopsis thaliana, Zea mays and Vitis vinifera genes, and have been linked to development and stress response. However, neither the characteristics of these genes, nor the implications of DTU on their protein coding sequences or functions, are currently well understood. Here we present a dataset of isoform dominance and DTU for all genes in the AtRTD2 reference transcriptome based on a protocol that was benchmarked on simulated data and validated through comparison with a published reverse transcriptase-polymerase chain reaction panel. We report DTU events for 8148 genes across 206 public RNA-Seq samples, and find that protein sequences are affected in 22% of the cases. The observed DTU events show high consistency across replicates, and reveal reproducible patterns in response to treatment and development. We also demonstrate that genes with different evolutionary ages, expression breadths and functions show large differences in the frequency at which they undergo DTU, and in the effect that these events have on their protein sequences. Finally, we showcase how the generated dataset can be used to explore DTU events for genes of interest or to find genes with specific DTU in samples of interest.
Zhang, X., Ivanova, A., Vandepoele, K., Radomiljac, J., Van de Velde, J., Berkowitz, O., … De Clercq, I. (2017). The transcription factor MYB29 is a regulator of ALTERNATIVE OXIDASE1a. PLANT PHYSIOLOGY, 173(3), 1824–1843. https://doi.org/10.1104/pp.16.01494

Plants sense and integrate a variety of signals from the environment through different interacting signal transduction pathways that involve hormones and signaling molecules. Using ALTERNATIVE OXIDASE1a (AOX1a) gene expression as a model system of retrograde or stress signaling between mitochondria and the nucleus, MYB DOMAIN PROTEIN29 (MYB29) was identified as a negative regulator (regulator of alternative oxidase1a 7 [rao7] mutant) in a genetic screen of Arabidopsis (Arabidopsis thaliana). rao7/myb29 mutants have increased levels of AOX1a transcript and protein compared to wild type after induction with antimycin A. A variety of genes previously associated with the mitochondrial stress response also display enhanced transcript abundance, indicating that RAO7/MYB29 negatively regulates mitochondrial stress responses in general. Meta-analysis of hormone-responsive marker genes and identification of downstream transcription factor networks revealed that MYB29 functions in the complex interplay of ethylene, jasmonic acid, salicylic acid, and reactive oxygen species signaling by regulating the expression of various ETHYLENE RESPONSE FACTOR and WRKY transcription factors. Despite an enhanced induction of mitochondrial stress response genes, rao7/myb29 mutants displayed an increased sensitivity to combined moderate light and drought stress. These results uncover interactions between mitochondrial retrograde signaling and the regulation of glucosinolate biosynthesis, both regulated by RAO7/MYB29. This common regulator can explain why perturbation of the mitochondrial function leads to transcriptomic responses overlapping with responses to biotic stress.
Ritter Traub, A., Iñigo, S., Fernandez Calvo, P., Heyndrickx, K., Dhondt, S., Shi, H., … Goossens, A. (2017). The transcriptional repressor complex FRS7-FRS12 regulates flowering time and growth in Arabidopsis. NATURE COMMUNICATIONS, 8. https://doi.org/10.1038/ncomms15235

Most living organisms developed systems to efficiently time environmental changes. The plant-clock acts in coordination with external signals to generate output responses determining seasonal growth and flowering time. Here, we show that two Arabidopsis thaliana transcription factors, FAR1 RELATED SEQUENCE 7 (FRS7) and FRS12, act as negative regulators of these processes. These proteins accumulate particularly in short-day conditions and interact to form a complex. Loss-of-function of FRS7 and FRS12 results in early flowering plants with overly elongated hypocotyls mainly in short days. We demonstrate by molecular analysis that FRS7 and FRS12 affect these developmental processes in part by binding to the promoters and repressing the expression of GIGANTEA and PHYTOCHROME INTERACTING FACTOR 4 as well as several of their downstream signalling targets. Our data reveal a molecular machinery that controls the photoperiodic regulation of flowering and growth and offer insight into how plants adapt to seasonal changes.
Ruprecht, C., Proost, S., Hernandez-Coronado, M., Ortiz-Ramirez, C., Lang, D., Rensing, S. A., … Mutwil, M. (2017). Phylogenomic analysis of gene co-expression networks reveals the evolution of functional modules. PLANT JOURNAL, 90(3), 447–465. https://doi.org/10.1111/tpj.13502

Molecular evolutionary studies correlate genomic and phylogenetic information with the emergence of new traits of organisms. These traits are, however, the consequence of dynamic gene networks composed of functional modules, which might not be captured by genomic analyses. Here, we established a method that combines large-scale genomic and phylogenetic data with gene co-expression networks to extensively study the evolutionary make-up of modules in the moss Physcomitrella patens, and in the angiosperms Arabidopsis thaliana and Oryza sativa (rice). We first show that younger genes are less annotated than older genes. By mapping genomic data onto the co-expression networks, we found that genes from the same evolutionary period tend to be connected, whereas old and young genes tend to be disconnected. Consequently, the analysis revealed modules that emerged at a specific time in plant evolution. To uncover the evolutionary relationships of the modules that are conserved across the plant kingdom, we added phylogenetic information that revealed duplication and speciation events on the module level. This combined analysis revealed an independent duplication of cell wall modules in bryophytes and angiosperms, suggesting a parallel evolution of cell wall pathways in land plants.
De Schutter, K., Tsaneva, M. T., Kulkarni, S. R., Rougé, P., Vandepoele, K., & Van Damme, E. (2017). Evolutionary relationships and expression analysis of EUL domain proteins in rice (Oryza sativa). RICE, 10. https://doi.org/10.1186/s12284-017-0164-3

Background: Lectins, defined as 'Proteins that can recognize and bind specific carbohydrate structures', are widespread among all kingdoms of life and play an important role in various biological processes in the cell. Most plant lectins are involved in stress signaling and/or defense. The family of Euonymus-related lectins (EULs) represents a group of stress-related lectins composed of one or two EUL domains. The latter protein domain is unique in that it is ubiquitous in land plants, suggesting an important role for these proteins. Results: Despite the availability of multiple completely sequenced rice genomes, little is known on the occurrence of lectins in rice. We identified 329 putative lectin genes in the genome of Oryza sativa subsp. japonica belonging to nine out of 12 plant lectin families. In this paper, an in-depth molecular characterization of the EUL family of rice was performed. In addition, analyses of the promoter sequences and investigation of the transcript levels for these EUL genes enabled retrieval of important information related to the function and stress responsiveness of these lectins. Finally, a comparative analysis between rice cultivars and several monocot and dicot species revealed a high degree of sequence conservation within the EUL domain as well as in the domain organization of these lectins. Conclusions: The presence of EULs throughout the plant kingdom and the high degree of sequence conservation in the EUL domain suggest that these proteins serve an important function in the plant cell. Analysis of the promoter region of the rice EUL genes revealed a diversity of stress responsive elements. Furthermore analysis of the expression profiles of the EUL genes confirmed that they are differentially regulated in response to several types of stress. These data suggest a potential role for the EULs in plant stress signaling and defense.
Babiychuk, E., Trinh, H. K., Vandepoele, K., Van De Slijke, E., Geelen, D., De Jaeger, G., … Kushnir, S. (2017). The mutation nrpb1-A325V in the largest subunit of RNA polymerase II suppresses compromised growth of Arabidopsis plants deficient in a function of the general transcription factor IIF. PLANT JOURNAL, 89(4), 730–745. https://doi.org/10.1111/tpj.13417

The evolutionarily conserved 12-subunit RNA polymerase II (Pol II) is a central catalytic component that drives RNA synthesis during the transcription cycle that consists of transcription initiation, elongation, and termination. A diverse set of general transcription factors, including a multifunctional TFIIF, govern Pol II selectivity, kinetic properties, and transcription coupling with posttranscriptional processes. Here, we show that TFIIF of Arabidopsis (Arabidopsis thaliana) resembles the metazoan complex that is composed of the TFIIFα and TFIIFβ polypeptides. Arabidopsis has two TFIIFβ subunits, of which TFIIFβ1/MAN1 is essential and TFIIFβ2/MAN2 is not. In the partial loss-of-function mutant allele man1-1, the winged helix domain of Arabidopsis TFIIFβ1/MAN1 was dispensable for plant viability, whereas the cellular organization of the shoot and root apical meristems were abnormal. Forward genetic screening identified an epistatic interaction between the largest Pol II subunit nrpb1-A325V variant and the man1-1 mutation. The suppression of the man1-1 mutant developmental defects by a mutation in Pol II suggests a link between TFIIF functions in Arabidopsis transcription cycle and the maintenance of cellular organization in the shoot and root apical meristems.
Kreft, L., Botzki, A., Coppens, F., Vandepoele, K., & Van Bel, M. (2017). PhyD3 : a phylogenetic tree viewer with extended phyloXML support for functional genomics data visualization. BIOINFORMATICS, 33(18), 2946–2947. https://doi.org/10.1093/bioinformatics/btx324

Motivation: Comparative and evolutionary studies utilize phylogenetic trees to analyze and visualize biological data. Recently, several web-based tools for the display, manipulation and annotation of phylogenetic trees, such as iTOL and Evolview, have released updates to be compatible with the latest web technologies. While those web tools operate an open server access model with a multitude of registered users, a feature-rich open source solution using current web technologies is not available. Results: Here, we present an extension of the widely used PhyloXML standard with several new options to accommodate functional genomics or annotation datasets for advanced visualization. Furthermore, PhyD3 has been developed as a lightweight tool using the JavaScript library D3.js to achieve a state-of-the-art phylogenetic tree visualization in the web browser, with support for advanced annotations. The current implementation is open source, easily adaptable and easy to implement in third parties' web sites. Availability and implementation: More information about PhyD3 itself, installation procedures and implementation links are available at http://phyd3.bits.vib.be and at http://github.com/vibbits/phyd3/. Supplementary information: Supplementary data are available at Bioinformatics online.
Del Cortona, A., Leliaert, F., Bogaert, K., Turmel, M., Boedeker, C., Janouškovec, J., … De Clerck, O. (2017). The plastid genome in Cladophorales green algae is encoded by hairpin chromosomes. CURRENT BIOLOGY, 27(24), 3771–3782. https://doi.org/10.1016/j.cub.2017.11.004

Virtually all plastid (chloroplast) genomes are circular double-stranded DNA molecules, typically between 100 and 200 kb in size and encoding circa 80-250 genes. Exceptions to this universal plastid genome architecture are very few and include the dinoflagellates, where genes are located on DNA minicircles. Here we report on the highly deviant chloroplast genome of Cladophorales green algae, which is entirely fragmented into hairpin chromosomes. Short- and long-read high-throughput sequencing of DNA and RNA demonstrated that the chloroplast genes of Boodlea composita are encoded on 1- to 7-kb DNA contigs with an exceptionally high GC content, each containing a long inverted repeat with one or two protein-coding genes and conserved non-coding regions putatively involved in replication and/or expression. We propose that these contigs correspond to linear single-stranded DNA molecules that fold onto themselves to form hairpin chromosomes. The Boodlea chloroplast genes are highly divergent from their corresponding orthologs, and display an alternative genetic code. The origin of this highly deviant chloroplast genome most likely occurred before the emergence of the Cladophorales, and coincided with an elevated transfer of chloroplast genes to the nucleus. A chloroplast genome that is composed only of linear DNA molecules is unprecedented among eukaryotes, and highlights unexpected variation in plastid genome architecture.
Vandepoele, K. (2017). A guide to the PLAZA 3.0 plant comparative genomic database. In A. D. van Dijk (Ed.), Plant genomics databases : methods and protocols (Vol. 1533, pp. 183–200). https://doi.org/10.1007/978-1-4939-6658-5_10

PLAZA 3.0 is an online resource for comparative genomics and offers a versatile platform to study gene functions and gene families or to analyze genome organization and evolution in the green plant lineage. Starting from genome sequence information for over 35 plant species, precomputed comparative genomic data sets cover homologous gene families, multiple sequence alignments, phylogenetic trees, and genomic colinearity information within and between species. Complementary functional data sets, a Workbench, and interactive visualization tools are available through a user-friendly web interface, making PLAZA an excellent starting point to translate sequence or omics data sets into biological knowledge. PLAZA is available at http://bioinformatics.psb.ugent.be/plaza/.
Van de Velde, J., Van Bel, M., Vaneechoutte, D., & Vandepoele, K. (2016). A collection of conserved noncoding sequences to study gene regulation in flowering plants. PLANT PHYSIOLOGY, 171(4), 2586–2598. https://doi.org/10.1104/pp.16.00821

Transcription factors (TFs) regulate gene expression by binding cis-regulatory elements, of which the identification remains an ongoing challenge owing to the prevalence of large numbers of nonfunctional TF binding sites. Powerful comparative genomics methods, such as phylogenetic footprinting, can be used for the detection of conserved noncoding sequences (CNSs), which are functionally constrained and can greatly help in reducing the number of false-positive elements. In this study, we applied a phylogenetic footprinting approach for the identification of CNSs in 10 dicot plants, yielding 1,032,291 CNSs associated with 243,187 genes. To annotate CNSs with TF binding sites, we made use of binding site information for 642 TFs originating from 35 TF families in Arabidopsis (Arabidopsis thaliana). In three species, the identified CNSs were evaluated using TF chromatin immunoprecipitation sequencing data, resulting in significant overlap for the majority of data sets. To identify ultraconserved CNSs, we included genomes of additional plant families and identified 715 binding sites for 501 genes conserved in dicots, monocots, mosses, and green algae. Additionally, we found that genes that are part of conserved mini-regulons have a higher coherence in their expression profile than other divergent gene pairs. All identified CNSs were integrated in the PLAZA 3.0 Dicots comparative genomics platform (http://bioinformatics.psb.ugent.be/plaza/versions/plaza_v3_dicots/) together with new functionalities facilitating the exploration of conserved cis-regulatory elements and their associated genes. The availability of this data set in a user-friendly platform enables the exploration of functional noncoding DNA to study gene regulation in a variety of plant species, including crops.
Veeckman, E., Ruttink, T., & Vandepoele, K. (2016). Are we there yet? : reliably estimating the completeness of plant genome sequences. PLANT CELL, 28(8), 1759–1768. https://doi.org/10.1105/tpc.16.00349

Genome sequencing is becoming cheaper and faster thanks to the introduction of next-generation sequencing techniques. Dozens of new plant genome sequences have been released in recent years, ranging from small to gigantic repeat-rich or polyploid genomes. Most genome projects have a dual purpose: delivering a contiguous, complete genome assembly and creating a full catalog of correctly predicted genes. Frequently, the completeness of a species' gene catalog is measured using a set of marker genes that are expected to be present. This expectation can be defined along an evolutionary gradient, ranging from highly conserved genes to species-specific genes. Large-scale population resequencing studies have revealed that gene space is fairly variable even between closely related individuals, which limits the definition of the expected gene space, and, consequently, the accuracy of estimates used to assess genome and gene space completeness. We argue that, based on the desired applications of a genome sequencing project, different completeness scores for the genome assembly and/or gene space should be determined. Using examples from several dicot and monocot genomes, we outline some pitfalls and recommendations regarding methods to estimate completeness during different steps of genome assembly and annotation.
Tzfadia, O., Diels, T., De Meyer, S., Vandepoele, K., Aharoni, A., & Van de Peer, Y. (2016). CoExpNetViz: comparative co-expression networks construction and visualization tool. FRONTIERS IN PLANT SCIENCE, 6. https://doi.org/10.3389/fpls.2015.01194

Motivation: Comparative transcriptomics is a common approach in functional gene discovery efforts. It allows for finding conserved co-expression patterns between orthologous genes in closely related plant species, suggesting that these genes potentially share similar function and regulation. Several efficient co-expression-based tools have been commonly used in plant research, but most of these pipelines are limited to data from model systems, which greatly limits their utility. Moreover, none of the existing pipelines allow plant researchers to make use of their own unpublished gene expression data for performing a comparative co-expression analysis and generating multi-species co-expression networks. Results: We introduce CoExpNetViz, a computational tool that uses a set of query or "bait" genes as an input (chosen by the user) and a minimum of one pre-processed gene expression dataset. The CoExpNetViz algorithm proceeds in three main steps: (i) for every bait gene submitted, co-expression values are calculated using mutual information and Pearson correlation coefficients, (ii) non-bait (or target) genes are grouped based on cross-species orthology, and (iii) output files are generated and results can be visualized as network graphs in Cytoscape. Availability: The CoExpNetViz tool is freely available both as a PHP web server (link: http://bioinformatics.psb.ugent.be/webtools/coexpr/) (implemented in C++) and as a Cytoscape plugin (implemented in Java). Both versions of the CoExpNetViz tool support LINUX and Windows platforms.
Van Leene, J., Blomme, J., Kulkarni, S. R., Cannoot, B., De Winne, N., Eeckhout, D., … De Jaeger, G. (2016). Functional characterization of the Arabidopsis transcription factor bZIP29 reveals its role in leaf and root development. JOURNAL OF EXPERIMENTAL BOTANY, 67(19), 5825–5840. https://doi.org/10.1093/jxb/erw347

Plant bZIP group I transcription factors have been reported mainly for their role during vascular development and osmosensory responses. Interestingly, bZIP29 has been identified in a cell cycle interactome, indicating additional functions of bZIP29 in plant development. Here, bZIP29 was functionally characterized to study its role during plant development. It is not present in vascular tissue but is specifically expressed in proliferative tissues. Genome-wide mapping of bZIP29 target genes confirmed its role in stress and osmosensory responses, but also identified specific binding to several core cell cycle genes and to genes involved in cell wall organization. bZIP29 protein complex analyses validated interaction with other bZIP group I members and provided insight into regulatory mechanisms acting on bZIP dimers. In agreement with bZIP29 expression in proliferative tissues and with its binding to promoters of cell cycle regulators, dominant-negative repression of bZIP29 altered the cell number in leaves and in the root meristem. A transcriptome analysis on the root meristem, however, indicated that bZIP29 might regulate cell number through control of cell wall organization. Finally, ectopic dominant-negative repression of bZIP29 and redundant factors led to a seedling-lethal phenotype, pointing to essential roles for bZIP group I factors early in plant development.
Veeckman, E., Vandepoele, K., Asp, T., Roldán-Ruiz, I., & Ruttink, T. (2016). Genomic variation in the FT gene family of perennial ryegrass (Lolium perenne). In I. Roldán-Ruiz, J. Baert, & D. Reheul (Eds.), Breeding in a world of scarcity : proceedings of the 2015 meeting of the section “Forage Crops and Amenity Grasses” of Eucarpia (pp. 121–126). https://doi.org/10.1007/978-3-319-28932-8_18

The timing of flowering is of prime importance for several agronomic traits, and its genetic control is therefore of great interest to breeders. Several signaling pathways converge on FLOWERING LOCUS T (FT) gene family members, which act as central regulators of flowering, branching and seed dormancy. We identified the complete FT gene family in the Lolium perenne genome and performed phylogenetic analysis to delineate functional clades and to identify putative functionally redundant paralogs. Five FT genes of L. perenne were selected for targeted resequencing in a genepool of 746 accessions to describe genetic diversity in wild accessions, commercial cultivars and breeding material.
Goeminne, L., Vandepoele, K., Gevaert, K., & Clement, L. (2015). Robust peptide-based models in quantitative proteomics. Proteomic Forum, Abstracts. Presented at the Proteomic Forum 2015, Berlin, Germany.

Peptide level models for assessing differential proteomics outperform summarization-based methods in terms of sensitivity, specificity, accuracy and precision (Goeminne et al., 2015, submitted). However, the ordinary least squares (OLS) parameter estimator is prone to overfitting and suffers from missing peptides and outliers that are omnipresent in proteomics data. We propose a robust ridge estimator and adopt empirical Bayes to stabilize the variance. With the CPTAC spike-in study, we demonstrate that our robust peptide-based estimator further improves the sensitivity and specificity.
Goeminne, L., Clement, L., Gevaert, K., & Vandepoele, K. (2015). Peptide-level robust ridge regression modeling improves both sensitivity and specificity in quantitative proteomics. MaxQuant Summer School, 7th, Abstracts. Presented at the 7th MaxQuant summer school on Computational mass spectrometry-based proteomics, Munich, Germany.
Van Leene, J., Eeckhout, D., Cannoot, B., De Winne, N., Persiau, G., Van De Slijke, E., … De Jaeger, G. (2015). An improved toolbox to unravel the plant cellular machinery by tandem affinity purification of Arabidopsis protein complexes. NATURE PROTOCOLS, 10(1), 169–187. https://doi.org/10.1038/nprot.2014.199

Tandem affinity purification coupled to mass spectrometry (TAP-MS) is one of the most advanced methods to characterize protein complexes in plants, giving a comprehensive view on the protein-protein interactions (PPIs) of a certain protein of interest (bait). The bait protein is fused to a double affinity tag, which consists of a protein G tag and a streptavidin-binding peptide separated by a very specific protease cleavage site, allowing highly specific protein complex isolation under near-physiological conditions. Implementation of this optimized TAP tag, combined with ultrasensitive MS, means that these experiments can be performed on small amounts (25 mg of total protein) of protein extracts from Arabidopsis cell suspension cultures. It is also possible to use this approach to isolate low abundant protein complexes from Arabidopsis seedlings, thus opening perspectives for the exploration of protein complexes in a plant developmental context. Next to protocols for efficient biomass generation of seedlings (~7.5 months), we provide detailed protocols for TAP (1 d), and for sample preparation and liquid chromatography-tandem MS (LC-MS/MS; ~5 d), either from Arabidopsis seedlings or from cell cultures. For the identification of specific co-purifying proteins, we use an extended protein database and filter against a list of nonspecific proteins on the basis of the occurrence of a co-purified protein among 543 TAP experiments. The value of the provided protocols is illustrated through numerous applications described in recent literature.
Proost, S., Van Bel, M., Vaneechoutte, D., Van de Peer, Y., Inzé, D., Mueller-Roeber, B., & Vandepoele, K. (2015). PLAZA 3.0 : an access point for plant comparative genomics. NUCLEIC ACIDS RESEARCH, 43(D1), D974–D981. https://doi.org/10.1093/nar/gku986

Comparative sequence analysis has significantly altered our view on the complexity of genome organization and gene functions in different kingdoms. PLAZA 3.0 is designed to make comparative genomics data for plants available through a user-friendly web interface. Structural and functional annotation, gene families, protein domains, phylogenetic trees and detailed information about genome organization can easily be queried and visualized. Compared with the first version released in 2009, which featured nine organisms, the number of integrated genomes is more than four times higher, and now covers 37 plant species. The new species provide a wider phylogenetic range as well as a more in-depth sampling of specific clades, and genomes of additional crop species are present. The functional annotation has been expanded and now comprises data from Gene Ontology, MapMan, UniProtKB/Swiss-Prot, PlnTFDB and PlantTFDB. Furthermore, we improved the algorithms to transfer functional annotation from well-characterized plant genomes to other species. The additional data and new features make PLAZA 3.0 (http://bioinformatics.psb.ugent.be/plaza/) a versatile and comprehensible resource for users wanting to explore genome information to study different aspects of plant biology, both in model and non-model organisms.
Vriet, C., Lemmens, K., Vandepoele, K., Reuzeau, C., & Russinova, E. (2015). Evolutionary trails of plant steroid genes. TRENDS IN PLANT SCIENCE, 20(5), 301–308. https://doi.org/10.1016/j.tplants.2015.03.006

Plant steroids - brassinosteroids (BRs) and their precursors, phytosterols - play a major role in plant growth, development, and stress tolerance, and have high potential for agricultural applications. Currently, this prospect is limited by a lack of information about their evolution and expression dynamics (spatial and temporal) across plant species. The increasing number of sequenced genomes offers an opportunity for evolutionary studies that might help to prioritize functional analyses with the aim to improve crop yield and stress tolerance. In this review we provide a glimpse of the origin, evolution, and functional conservation of phytosterol and BR genes in the green plant lineage using comparative sequence and expression analyses of publicly available datasets.
Wang, F., Muto, A., Van de Velde, J., Neyt, P., Himanen, K., Vandepoele, K., & Van Lijsebettens, M. (2015). Functional analysis of the Arabidopsis TETRASPANIN gene family in plant growth and development. PLANT PHYSIOLOGY, 169(3), 2200–2214. https://doi.org/10.1104/pp.15.01310

TETRASPANIN (TET) genes encode conserved integral membrane proteins that are known in animals to function in cellular communication during gamete fusion, immunity reaction and pathogen recognition. In plants, functional information is limited to one of the 17 members of the Arabidopsis TET gene family and to expression data in reproductive stages. Here, the promoter activity of all 17 Arabidopsis TET genes was investigated by pAtTET::NLS-GFP/GUS reporter lines throughout the life cycle, which predicted functional divergence in the paralogous genes per clade. However, partial overlap was observed for many TET genes across the clades, correlating with few phenotypes in single mutants and therefore requiring double mutant combinations for functional investigation. Mutational analysis showed a role for TET13 in primary root growth and lateral root development, and redundant roles for TET5 and TET6 in leaf and root growth through negative regulation of cell proliferation. Strikingly, a number of TET genes were expressed in embryonic and seedling progenitor cells and remained expressed until the differentiation state in the mature plant, suggesting a dynamic function over developmental stages. cis-regulatory elements together with transcription factor binding data provided molecular insight into the site, conditions and perturbations that affect TET gene expression, and positioned the TET genes in different molecular pathways; the data represent a hypothesis-generating resource for further functional analyses.
Nelissen, H., Eeckhout, D., Demuynck, K., Persiau, G., Walton, A., Van Bel, M., … De Jaeger, G. (2015). Dynamic changes in ANGUSTIFOLIA3 complex composition reveal a growth regulatory mechanism in the maize leaf. PLANT CELL, 27(6), 1605–1619. https://doi.org/10.1105/tpc.15.00269

Most molecular processes during plant development occur with a particular spatio-temporal specificity. Thus far, it has remained technically challenging to capture dynamic protein-protein interactions within a growing organ, where the interplay between cell division and cell expansion is instrumental. Here, we combined high-resolution sampling of the growing maize (Zea mays) leaf with tandem affinity purification followed by mass spectrometry. Our results indicate that the growth-regulating SWI/SNF chromatin remodeling complex associated with ANGUSTIFOLIA3 (AN3) was conserved within growing organs and between dicots and monocots. Moreover, we were able to demonstrate the dynamics of the AN3-interacting proteins within the growing leaf, since copurified GROWTH-REGULATING FACTORs (GRFs) varied throughout the growing leaf. Indeed, GRF1, GRF6, GRF7, GRF12, GRF15, and GRF17 were significantly enriched in the division zone of the growing leaf, while GRF4 and GRF10 levels were comparable between division zone and expansion zone in the growing leaf. These dynamics were also reflected at the mRNA and protein levels, indicating tight developmental regulation of the AN3-associated chromatin remodeling complex. In addition, the phenotypes of maize plants overexpressing miRNA396a-resistant GRF1 support a model proposing that distinct associations of the chromatin remodeling complex with specific GRFs tightly regulate the transition between cell division and cell expansion. Together, our data demonstrate that advancing from static to dynamic protein-protein interaction analysis in a growing organ adds insights in how developmental switches are regulated.
Volders, P.-J., Verheggen, K., Menschaert, G., Vandepoele, K., Martens, L., Vandesompele, J., & Mestdagh, P. (2015). An update on LNCipedia : a database for annotated human lncRNA sequences. NUCLEIC ACIDS RESEARCH, 43(D1), D174–D180. https://doi.org/10.1093/nar/gku1060

The human genome is pervasively transcribed, producing thousands of non-coding RNA transcripts. The majority of these transcripts are long non-coding RNAs (lncRNAs) and novel lncRNA genes are being identified at a rapid pace. To streamline these efforts, we created LNCipedia, an online repository of lncRNA transcripts and annotation. Here, we present LNCipedia 3.0 (http://www.lncipedia.org), the latest version of the publicly available human lncRNA database. Compared to the previous version of LNCipedia, the database grew over five times in size, gaining over 90 000 new lncRNA transcripts. Assessment of the protein-coding potential of LNCipedia entries is improved with state-of-the-art methods that include large-scale reprocessing of publicly available proteomics data. As a result, a high-confidence set of lncRNA transcripts with low coding potential is defined and made available for download. In addition, a tool to assess lncRNA gene conservation between human, mouse and zebrafish has been implemented.
Del Cortona, A., Leliaert, F., Verbruggen, H., Lopez-Bautista, J. M., Vandepoele, K., & De Clerck, O. (2015). Towards an understanding of the cytological diversity of green seaweeds (Ulvophyceae). EUROPEAN JOURNAL OF PHYCOLOGY, 50(suppl. 1), 217–217.
De Witte, D., Van de Velde, J., Decap, D., Van Bel, M., Audenaert, P., Demeester, P., … Fostier, J. (2015). BLSSpeller : exhaustive comparative discovery of conserved cis-regulatory elements. BIOINFORMATICS, 31(23), 3758–3766. https://doi.org/10.1093/bioinformatics/btv466

Motivation: The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected. Results: We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O. sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z. mays.
Gonzalez Sanchez, N., Pauwels, L., Baekelandt, A., De Milde, L., Van Leene, J., Besbrugge, N., … Inzé, D. (2015). A repressor protein complex regulates leaf growth in Arabidopsis. PLANT CELL, 27(8), 2273–2287. https://doi.org/10.1105/tpc.15.00006

Cell number is an important determinant of final organ size. In the leaf, a large proportion of cells are derived from the stomatal lineage. Meristemoids, which are stem cell-like precursor cells, undergo asymmetric divisions, generating several pavement cells adjacent to the two guard cells. However, the mechanism controlling the asymmetric divisions of these stem cells prior to differentiation is not well understood. Here, we characterized PEAPOD (PPD) proteins, the only transcriptional regulators known to negatively regulate meristemoid division. PPD proteins interact with KIX8 and KIX9, which act as adaptor proteins for the corepressor TOPLESS. D3-type cyclin encoding genes were identified among direct targets of PPD2, being negatively regulated by PPDs and KIX8/9. Accordingly, kix8 kix9 mutants phenocopied PPD loss-of-function producing larger leaves resulting from increased meristemoid amplifying divisions. The identified conserved complex might be specific for leaf growth in the second dimension, since it is not present in Poaceae (grasses), which also lack the developmental program it controls.
Verkest, A., Byzova, M., Martens, C., Willems, P., Verwulgen, T., Slabbinck, B., … De Block, M. (2015). Selection for improved energy use efficiency and drought tolerance in canola results in distinct transcriptome and epigenome changes. PLANT PHYSIOLOGY, 168(4), 1338–1350. https://doi.org/10.1104/pp.15.00155

To increase both the yield potential and stability of crops, integrated breeding strategies are used that have mostly a direct genetic basis, but the utility of epigenetics to improve complex traits is unclear. A better understanding of the status of the epigenome and its contribution to agronomic performance would help in developing approaches to incorporate the epigenetic component of complex traits into breeding programs. Starting from isogenic canola (Brassica napus) lines, epilines were generated by selecting, repeatedly for three generations, for increased energy use efficiency and drought tolerance. These epilines had an enhanced energy use efficiency, drought tolerance, and nitrogen use efficiency. Transcriptome analysis of the epilines and a line selected for its energy use efficiency solely revealed common differentially expressed genes related to the onset of stress tolerance-regulating signaling events. Genes related to responses to salt, osmotic, abscisic acid, and drought treatments were specifically differentially expressed in the drought-tolerant epilines. The status of the epigenome, scored as differential trimethylation of lysine-4 of histone 3, further supported the phenotype by targeting drought-responsive genes and facilitating the transcription of the differentially expressed genes. From these results, we conclude that the canola epigenome can be shaped by selection to increase energy use efficiency and stress tolerance. Hence, these findings warrant the further development of strategies to incorporate epigenetics into breeding.
Glover, N. M., Daron, J., Pingault, L., Vandepoele, K., Paux, E., Feuillet, C., & Choulet, F. (2015). Small-scale gene duplications played a major role in the recent evolution of wheat chromosome 3B. GENOME BIOLOGY, 16. https://doi.org/10.1186/s13059-015-0754-6

Background: Bread wheat is not only an important crop, but its large (17 Gb), highly repetitive, and hexaploid genome makes it a good model to study the organization and evolution of complex genomes. Recently, we produced a high quality reference sequence of wheat chromosome 3B (774 Mb), which provides an excellent opportunity to study the evolutionary dynamics of a large and polyploid genome, specifically the impact of single gene duplications. Results: We find that 27 % of the 3B predicted genes are non-syntenic with the orthologous chromosomes of Brachypodium distachyon, Oryza sativa, and Sorghum bicolor, whereas, by applying the same criteria, non-syntenic genes represent on average only 10 % of the predicted genes in these three model grasses. These non-syntenic genes on 3B have high sequence similarity to at least one other gene in the wheat genome, indicating that hexaploid wheat has undergone massive small-scale interchromosomal gene duplications compared to other grasses. Insertions of non-syntenic genes occurred at a similar rate along the chromosome, but these genes tend to be retained at a higher frequency in the distal, recombinogenic regions. The ratio of non-synonymous to synonymous substitution rates showed a more relaxed selection pressure for non-syntenic genes compared to syntenic genes, and gene ontology analysis indicated that non-syntenic genes may be enriched in functions involved in disease resistance. Conclusion: Our results highlight the major impact of single gene duplications on the wheat gene complement and confirm the accelerated evolution of the Triticeae lineage among grasses.
Lindemose, S., Jensen, M. K., Van de Velde, J., O’Shea, C., Heyndrickx, K., Workman, C. T., … De Masi, F. (2014). A DNA-binding-site landscape and regulatory network analysis for NAC transcription factors in Arabidopsis thaliana. NUCLEIC ACIDS RESEARCH, 42(12), 7681–7693. https://doi.org/10.1093/nar/gku502

Target gene identification for transcription factors is a prerequisite for the systems wide understanding of organismal behaviour. NAM-ATAF1/2-CUC2 (NAC) transcription factors are amongst the largest transcription factor families in plants, yet limited data exist from unbiased approaches to resolve the DNA-binding preferences of individual members. Here, we present a TF-target gene identification workflow based on the integration of novel protein binding microarray data with gene expression and multi-species promoter sequence conservation to identify the DNA-binding specificities and the gene regulatory networks of 12 NAC transcription factors. Our data offer specific single-base resolution fingerprints for most TFs studied and indicate that NAC DNA-binding specificities might be predicted from their DNA-binding domain's sequence. The developed methodology, including the application of complementary functional genomics filters, makes it possible to translate, for each TF, protein binding microarray data into a set of high-quality target genes. With this approach, we confirm NAC target genes reported from independent in vivo analyses. We emphasize that candidate target gene sets together with the workflow associated with functional modules offer a strong resource to unravel the regulatory potential of NAC genes and that this workflow could be used to study other families of transcription factors.
Vargas, L., Santa Brigida, A. B., Mota Filho, J. P., de Carvalho, T. G., Rojas, C. A., Vaneechoutte, D., … Hemerly, A. S. (2014). Drought tolerance conferred to sugarcane by association with Gluconacetobacter diazotrophicus: a transcriptomic view of hormone pathways. PLOS ONE, 9(12). https://doi.org/10.1371/journal.pone.0114744

Sugarcane interacts with particular types of beneficial nitrogen-fixing bacteria that provide fixed nitrogen and plant growth hormones to host plants, promoting an increase in plant biomass. Other benefits, such as enhanced tolerance to abiotic stresses, have been reported for some diazotrophs. Here we aim to study the effects of the association between the diazotroph Gluconacetobacter diazotrophicus PAL5 and sugarcane cv. SP70-1143 during water depletion by characterizing differential transcriptome profiles of sugarcane. RNA-seq libraries were generated from roots and shoots of sugarcane plants free of endophytes that were inoculated with G. diazotrophicus and subjected to water depletion for 3 days. A sugarcane reference transcriptome was constructed and used for the identification of differentially expressed transcripts. The differential profile of non-inoculated SP70-1143 suggests that it responds to water deficit stress by the activation of drought-responsive markers and hormone pathways, such as ABA and ethylene. qRT-PCR revealed that root samples had higher levels of G. diazotrophicus 3 days after water deficit, compared to roots of inoculated plants watered normally. With prolonged drought only inoculated plants survived, indicating that SP70-1143 plants colonized with G. diazotrophicus become more tolerant to drought stress than non-inoculated plants. Strengthening this hypothesis, several gene expression responses to drought were inactivated or regulated in an opposite manner, especially in roots, when plants were colonized by the bacteria. The data suggest that colonized roots would not be suffering from stress in the same way as non-inoculated plants. On the other hand, shoots specifically activate ABA-dependent signaling genes, which could act as key elements in the drought resistance conferred by G. diazotrophicus to SP70-1143. This work reports for the first time the involvement of G. diazotrophicus in the promotion of drought tolerance in sugarcane cv. SP70-1143, and it describes the initial molecular events that may trigger the increased drought tolerance in the host plant.
Van de Velde, J., Heyndrickx, K., & Vandepoele, K. (2014). Inference of transcriptional networks in Arabidopsis through conserved noncoding sequence analysis. PLANT CELL, 26(7), 2729–2745. https://doi.org/10.1105/tpc.114.127001

Transcriptional regulation plays an important role in establishing gene expression profiles during development or in response to (a)biotic stimuli. Transcription factor binding sites (TFBSs) are the functional elements that determine transcriptional activity, and the identification of individual TFBS in genome sequences is a major goal to inferring regulatory networks. We have developed a phylogenetic footprinting approach for the identification of conserved noncoding sequences (CNSs) across 12 dicot plants. Whereas both alignment and non-alignment-based techniques were applied to identify functional motifs in a multispecies context, our method accounts for incomplete motif conservation as well as high sequence divergence between related species. We identified 69,361 footprints associated with 17,895 genes. Through the integration of known TFBS obtained from the literature and experimental studies, we used the CNSs to compile a gene regulatory network in Arabidopsis thaliana containing 40,758 interactions, of which two-thirds act through binding events located in DNase I hypersensitive sites. This network shows significant enrichment toward in vivo targets of known regulators, and its overall quality was confirmed using five different biological validation metrics. Finally, through the integration of detailed expression and function information, we demonstrate how static CNSs can be converted into condition-dependent regulatory networks, offering opportunities for regulatory gene annotation.
Heyndrickx, K., Van de Velde, J., Wang, C., Weigel, D., & Vandepoele, K. (2014). A functional and evolutionary perspective on transcription factor binding in Arabidopsis thaliana. PLANT CELL, 26(10), 3894–3910. https://doi.org/10.1105/tpc.114.130591

Understanding the mechanisms underlying gene regulation is paramount to comprehend the translation from genotype to phenotype. The two are connected by gene expression, and it is generally thought that variation in transcription factor (TF) function is an important determinant of phenotypic evolution. We analyzed publicly available genome-wide chromatin immunoprecipitation experiments for 27 TFs in Arabidopsis thaliana and constructed an experimental network containing 46,619 regulatory interactions and 15,188 target genes. We identified hub targets and highly occupied target (HOT) regions, which are enriched for genes involved in development, stimulus responses, signaling, and gene regulatory processes in the currently profiled network. We provide several lines of evidence that TF binding at plant HOT regions is functional, in contrast to that in animals, and not merely the result of accessible chromatin. HOT regions harbor specific DNA motifs, are enriched for differentially expressed genes, and are often conserved across crucifers and dicots, even though they are not under higher levels of purifying selection than non-HOT regions. Distal bound regions are under purifying selection as well and are enriched for a chromatin state showing regulation by the Polycomb repressive complex. Gene expression complexity is positively correlated with the total number of bound TFs, revealing insights in the regulatory code for genes with different expression breadths. The integration of noncanonical and canonical DNA motif information yields new hypotheses on cobinding and tethering between specific TFs involved in flowering and light regulation.
Fu, Q., Fierro Gutierrez, A. C. E., Meysman, P., Sanchez Rodriguez, A., Vandepoele, K., Marchal, K., & Engelen, K. (2014). MAGIC: access portal to a cross-platform gene expression compendium for maize. BIOINFORMATICS, 30(9), 1316–1318. https://doi.org/10.1093/bioinformatics/btt739

To facilitate the exploration of publicly available Zea mays expression data, we constructed a maize expression compendium, making use of an integration methodology and a consistent probe to gene mapping based on the 5b.60 sequence release of Z. mays. The compendium is made available through a web portal MAGIC that hosts a variety of analysis tools to easily browse and analyze the data. Our compendium is different from previous initiatives in combining expression values across different experiments by providing a consistent gene annotation across different platforms.
Sonnhammer, E. L., Gabaldón, T., da Silva, A. W. S., Martin, M., Robinson-Rechavi, M., Boeckmann, B., … Vandepoele, K. (2014). Big data and other challenges in the quest for orthologs. https://doi.org/10.1093/bioinformatics/btu492

Given the rapid increase of species with a sequenced genome, the need to identify orthologous genes between them has emerged as a central bioinformatics task. Many different methods exist for orthology detection, which makes it difficult to decide which one to choose for a particular application. Here, we review the latest developments and issues in the orthology field, and summarize the most recent results reported at the third 'Quest for Orthologs' meeting. We focus on community efforts such as the adoption of reference proteomes, standard file formats and benchmarking. Progress in these areas is good, and they are already beneficial to both orthology consumers and providers. However, a major current issue is that the massive increase in complete proteomes poses computational challenges to many of the ortholog database providers, as most orthology inference algorithms scale at least quadratically with the number of proteomes. The Quest for Orthologs consortium is an open community with a number of working groups that join efforts to enhance various aspects of orthology analysis, such as defining standard formats and datasets, documenting community resources and benchmarking.
Vercruyssen, L., Verkest, A., Gonzalez Sanchez, N., Heyndrickx, K., Eeckhout, D., Han, S.-K., … Inzé, D. (2014). ANGUSTIFOLIA3 binds to SWI/SNF chromatin remodeling complexes to regulate transcription during Arabidopsis leaf development. PLANT CELL, 26(1), 210–229. https://doi.org/10.1105/tpc.113.115907

The transcriptional coactivator ANGUSTIFOLIA3 (AN3) stimulates cell proliferation during Arabidopsis thaliana leaf development, but the molecular mechanism is largely unknown. Here, we show that inducible nuclear localization of AN3 during initial leaf growth results in differential expression of important transcriptional regulators, including GROWTH REGULATING FACTORs (GRFs). Chromatin purification further revealed the presence of AN3 at the loci of GRF5, GRF6, CYTOKININ RESPONSE FACTOR2, CONSTANS-LIKE5 (COL5), HECATE1 (HEC1), and ARABIDOPSIS RESPONSE REGULATOR4 (ARR4). Tandem affinity purification of protein complexes using AN3 as bait identified plant SWITCH/SUCROSE NONFERMENTING (SWI/SNF) chromatin remodeling complexes formed around the ATPases BRAHMA (BRM) or SPLAYED. Moreover, SWI/SNF ASSOCIATED PROTEIN 73B (SWP73B) is recruited by AN3 to the promoters of GRF5, GRF3, COL5, and ARR4, and both SWP73B and BRM occupy the HEC1 promoter. Furthermore, we show that AN3 and BRM genetically interact. The data indicate that AN3 associates with chromatin remodelers to regulate transcription. In addition, modification of SWI3C expression levels increases leaf size, underlining the importance of chromatin dynamics for growth regulation. Our results place the SWI/SNF-AN3 module as a major player at the transition from cell proliferation to cell differentiation in a developing leaf.
Verkest, A., Abeel, T., Heyndrickx, K., Van Leene, J., Lanz, C., Van De Slijke, E., … De Jaeger, G. (2014). A generic tool for transcription factor target gene discovery in Arabidopsis cell suspension cultures based on tandem chromatin affinity purification. PLANT PHYSIOLOGY, 164(3), 1122–1133. https://doi.org/10.1104/pp.113.229617

Genome-wide identification of transcription factor (TF) binding sites is pivotal to our understanding of gene expression regulation. Although much progress has been made in the determination of potential binding regions of proteins by chromatin immunoprecipitation, this method has some inherent limitations regarding DNA enrichment efficiency and antibody necessity. Here, we report an alternative strategy for assaying in vivo TF-DNA binding in Arabidopsis (Arabidopsis thaliana) cells by tandem chromatin affinity purification (TChAP). Evaluation of TChAP using the E2Fa TF and comparison with traditional chromatin immunoprecipitation and single chromatin affinity purification illustrates the suitability of TChAP and provides a resource for exploring the E2Fa transcriptional network. Integration with transcriptome, cis-regulatory element, functional enrichment, and coexpression network analyses demonstrates the quality of the E2Fa TChAP sequencing data and validates the identification of new direct E2Fa targets. TChAP enhances both TF target mapping throughput, by circumventing issues related to antibody availability, and output, by improving DNA enrichment efficiency.
Zamariola, L., De Storme, N., Vannerum, K., Vandepoele, K., Armstrong, S. J., Franklin, F. C. H., & Geelen, D. (2014). SHUGOSHINs and PATRONUS protect meiotic centromere cohesion in Arabidopsis thaliana. PLANT JOURNAL, 77(5), 782–794. https://doi.org/10.1111/tpj.12432

In meiosis, chromosome cohesion is maintained by the cohesin complex, which is released in a two-step manner. At meiosis I, the meiosis-specific cohesin subunit Rec8 is cleaved by the protease Separase along chromosome arms, allowing homologous chromosome segregation. Next, in meiosis II, cleavage of the remaining centromere cohesin results in separation of the sister chromatids. In eukaryotes, protection of centromeric cohesion in meiosis I is mediated by SHUGOSHINs (SGOs). The Arabidopsis genome contains two SGO homologs. Here we demonstrate that Atsgo1 mutants show a premature loss of cohesion of sister chromatid centromeres at anaphase I and that AtSGO2 partially rescues this loss of cohesion. In addition to SGOs, we characterize PATRONUS which is specifically required for the maintenance of cohesion of sister chromatid centromeres in meiosis II. In contrast to the Atsgo1 Atsgo2 double mutant, patronus T-DNA insertion mutants only display loss of sister chromatid cohesion after meiosis I, and additionally show disorganized spindles, resulting in defects in chromosome segregation in meiosis. This leads to reduced fertility and aneuploid offspring. Furthermore, we detect aneuploidy in sporophytic tissue, indicating a role for PATRONUS in chromosome segregation in somatic cells. Thus, ploidy stability is preserved in Arabidopsis by PATRONUS during both meiosis and mitosis.
De Witte, D., Van Bel, M., Audenaert, P., Demeester, P., Dhoedt, B., Vandepoele, K., & Fostier, J. (2014). A parallel, distributed-memory framework for comparative motif discovery. In R. Wyrzykowski, J. Dongarra, K. Karczewski, & J. Wasniewski (Eds.), Lecture Notes in Computer Science (Vol. 8385, pp. 268–277). https://doi.org/10.1007/978-3-642-55195-6_25

The increasing number of sequenced organisms has opened new possibilities for the computational discovery of cis-regulatory elements ('motifs') based on phylogenetic footprinting. Word-based, exhaustive approaches are among the best performing algorithms, however, they pose significant computational challenges as the number of candidate motifs to evaluate is very high. In this contribution, we describe a parallel, distributed-memory framework for de novo comparative motif discovery. Within this framework, two approaches for phylogenetic footprinting are implemented: an alignment-based and an alignment-free method. The framework is able to statistically evaluate the conservation of motifs in a search space containing over 160 million candidate motifs using a distributed-memory cluster with 200 CPU cores in a few hours. Software available from http://bioinformatics.intec.ugent.be/blsspeller/
Choulet, F., Alberti, A., Theil, S., Glover, N., Barbe, V., Daron, J., … Feuillet, C. (2014). Structural and functional partitioning of bread wheat chromosome 3B. SCIENCE, 345(6194). https://doi.org/10.1126/science.1249721

We produced a reference sequence of the 1-gigabase chromosome 3B of hexaploid bread wheat. By sequencing 8452 bacterial artificial chromosomes in pools, we assembled a sequence of 774 megabases carrying 5326 protein-coding genes, 1938 pseudogenes, and 85% of transposable elements. The distribution of structural and functional features along the chromosome revealed partitioning correlated with meiotic recombination. Comparative analyses indicated high wheat-specific inter- and intrachromosomal gene duplication activities that are potential sources of variability for adaption. In addition to providing a better understanding of the organization, function, and evolution of a large and polyploid genome, the availability of a high-quality sequence anchored to genetic maps will accelerate the identification of genes underlying important agronomic traits.
Verelst, W., Bertolini, E., De Bodt, S., Vandepoele, K., Demeulenaere, M., Pé, M. E., & Inzé, D. (2013). Molecular and physiological analysis of growth-limiting drought stress in Brachypodium distachyon leaves. MOLECULAR PLANT, 6(2), 311–322. https://doi.org/10.1093/mp/sss098

The drought-tolerant grass Brachypodium distachyon is an emerging model species for temperate grasses and cereal crops. To explore the usefulness of this species for drought studies, a reproducible in vivo drought assay was developed. Spontaneous soil drying led to a 45% reduction in leaf size, and this was mostly due to a decrease in cell expansion, whereas cell division remained largely unaffected by drought. To investigate the molecular basis of the observed leaf growth reduction, the third Brachypodium leaf was dissected in three zones, namely proliferation, expansion, and mature zones, and subjected to transcriptome analysis, based on a whole-genome tiling array. This approach allowed us to highlight that transcriptome profiles of different developmental leaf zones respond differently to drought. Several genes and functional processes involved in drought tolerance were identified. The transcriptome data suggest an increased energy availability in the proliferation zones, along with an up-regulation of sterol synthesis that may influence membrane fluidity. This information may be used to improve the tolerance of temperate cereals to drought, which is undoubtedly one of the major environmental challenges faced by agriculture today and in the near future.
De Smet, R., Adams, K. L., Vandepoele, K., Van Montagu, M., Maere, S., & Van de Peer, Y. (2013). Convergent gene loss following gene and genome duplications creates single-copy families in flowering plants. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 110(8), 2898–2903. https://doi.org/10.1073/pnas.1300127110

The importance of gene gain through duplication has long been appreciated. In contrast, the importance of gene loss has only recently attracted attention. Indeed, studies in organisms ranging from plants to worms and humans suggest that duplication of some genes might be better tolerated than that of others. Here we have undertaken a large-scale study to investigate the existence of duplication-resistant genes in the sequenced genomes of 20 flowering plants. We demonstrate that there is a large set of genes that is convergently restored to single-copy status following multiple genome-wide and smaller scale duplication events. We rule out the possibility that such a pattern could be explained by random gene loss only and therefore propose that there is selection pressure to preserve such genes as singletons. This is further substantiated by the observation that angiosperm single-copy genes do not comprise a random fraction of the genome, but instead are often involved in essential housekeeping functions that are highly conserved across all eukaryotes. Furthermore, single-copy genes are generally expressed more highly and in more tissues than non-single-copy genes, and they exhibit higher sequence conservation. Finally, we propose different hypotheses to explain their resistance against duplication.
Vandepoele, K., Van Bel, M., Richard, G., Van Landeghem, S., Verhelst, B., Moreau, H., … Piganeau, G. (2013). pico-PLAZA, a genome database of microbial photosynthetic eukaryotes. ENVIRONMENTAL MICROBIOLOGY, 15(8), 2147–2153. https://doi.org/10.1111/1462-2920.12174

With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichments analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains.
De Clercq, I., Vermeirssen, V., Van Aken, O., Vandepoele, K., Murcha, M. W., Law, S. R., … Van Breusegem, F. (2013). The membrane-bound NAC transcription factor ANAC013 functions in mitochondrial retrograde regulation of the oxidative stress response in Arabidopsis. PLANT CELL, 25(9), 3472–3490. https://doi.org/10.1105/tpc.113.117168

Upon disturbance of their function by stress, mitochondria can signal to the nucleus to steer the expression of responsive genes. This mitochondria-to-nucleus communication is often referred to as mitochondrial retrograde regulation (MRR). Although reactive oxygen species and calcium are likely candidate signaling molecules for MRR, the protein signaling components in plants remain largely unknown. Through meta-analysis of transcriptome data, we detected a set of genes that are common and robust targets of MRR and used them as a bait to identify its transcriptional regulators. In the upstream regions of these mitochondrial dysfunction stimulon (MDS) genes, we found a cis-regulatory element, the mitochondrial dysfunction motif (MDM), which is necessary and sufficient for gene expression under various mitochondrial perturbation conditions. Yeast one-hybrid analysis and electrophoretic mobility shift assays revealed that the transmembrane domain-containing NO APICAL MERISTEM/ARABIDOPSIS TRANSCRIPTION ACTIVATION FACTOR/CUP-SHAPED COTYLEDON transcription factors (ANAC013, ANAC016, ANAC017, ANAC053, and ANAC078) bound to the MDM cis-regulatory element. We demonstrate that ANAC013 mediates MRR-induced expression of the MDS genes by direct interaction with the MDM cis-regulatory element and triggers increased oxidative stress tolerance. In conclusion, we characterized ANAC013 as a regulator of MRR upon stress in Arabidopsis thaliana.
De Witte, D., Van de Velde, J., Van Bel, M., Audenaert, P., Demeester, P., Dhoedt, B., … Fostier, J. (2013). Comparative motif discovery in the cloud. Benelux Bioinformatics Conference 2013, Abstracts. Presented at the Benelux Bioinformatics Conference 2013, Brussels, Belgium.
Heyman, J., Cools, T., Vandenbussche, F., Heyndrickx, K., Van Leene, J., Vercauteren, I., … De Veylder, L. (2013). ERF115 controls root quiescent center cell division and stem cell replenishment. SCIENCE, 342(6160), 860–863. https://doi.org/10.1126/science.1240667

The quiescent center (QC) plays an essential role during root development by creating a microenvironment that preserves the stem cell fate of its surrounding cells. Despite being surrounded by highly mitotic active cells, QC cells self-renew at a low proliferation rate. Here, we identified the ERF115 transcription factor as a rate-limiting factor of QC cell division, acting as a transcriptional activator of the phytosulfokine PSK5 peptide hormone. ERF115 marks QC cell division but is restrained through proteolysis by the APC/C-CCS52A2 ubiquitin ligase, whereas QC proliferation is driven by brassinosteroid-dependent ERF115 expression. Together, these two antagonistic mechanisms delimit ERF115 activity, which is called upon when surrounding stem cells are damaged, revealing a cell cycle regulatory mechanism accounting for stem cell niche longevity.
Van Bel, M., Proost, S., Van Neste, C., Deforce, D., Van de Peer, Y., & Vandepoele, K. (2013). TRAPID : an efficient online tool for the functional and comparative analysis of de novo RNA-Seq transcriptomes. GENOME BIOLOGY, 14(12). https://doi.org/10.1186/gb-2013-14-12-r134

Transcriptome analysis through next-generation sequencing technologies allows the generation of detailed gene catalogs for non-model species, at the cost of new challenges with regards to computational requirements and bioinformatics expertise. Here, we present TRAPID, an online tool for the fast and efficient processing of assembled RNA-Seq transcriptome data, developed to mitigate these challenges. TRAPID offers high-throughput open reading frame detection, frameshift correction and includes a functional, comparative and phylogenetic toolbox, making use of 175 reference proteomes. Benchmarking and comparison against state-of-the-art transcript analysis tools reveals the efficiency and unique features of the TRAPID system.
Dessimoz, C., Gabaldón, T., Roos, D. S., Sonnhammer, E. L., Herrero, J., Quest Orthologs Consortium, the, … Van Bel, M. (2012). Toward community standards in the quest for orthologs. BIOINFORMATICS, 28(6), 900–904. https://doi.org/10.1093/bioinformatics/bts050
Van Bel, M., Proost, S., Wischnitzki, E., Movahedi, S., Scheerlinck, C., Van de Peer, Y., & Vandepoele, K. (2012). Dissecting plant genomes with the PLAZA comparative genomics platform. PLANT PHYSIOLOGY, 158(2), 590–600. https://doi.org/10.1104/pp.111.189514

With the arrival of low-cost, next-generation sequencing, a multitude of new plant genomes are being publicly released, providing unseen opportunities and challenges for comparative genomics studies. Here, we present PLAZA 2.5, a user-friendly online research environment to explore genomic information from different plants. This new release features updates to previous genome annotations and a substantial number of newly available plant genomes as well as various new interactive tools and visualizations. Currently, PLAZA hosts 25 organisms covering a broad taxonomic range, including 13 eudicots, five monocots, one lycopod, one moss, and five algae. The available data consist of structural and functional gene annotations, homologous gene families, multiple sequence alignments, phylogenetic trees, and colinear regions within and between species. A new Integrative Orthology Viewer, combining information from different orthology prediction methodologies, was developed to efficiently investigate complex orthology relationships. Cross-species expression analysis revealed that the integration of complementary data types extended the scope of complex orthology relationships, especially between more distantly related species. Finally, based on phylogenetic profiling, we propose a set of core gene families within the green plant lineage that will be instrumental to assess the gene space of draft or newly sequenced plant genomes during the assembly or annotation phase.
Movahedi, S., Van Bel, M., Heyndrickx, K., & Vandepoele, K. (2012). Comparative co-expression analysis in plant biology. PLANT CELL AND ENVIRONMENT, 35(10), 1787–1798. https://doi.org/10.1111/j.1365-3040.2012.02517.x

The analysis of gene expression data generated by high-throughput microarray transcript profiling experiments has shown that transcriptionally coordinated genes are often functionally related. Based on large-scale expression compendia grouping multiple experiments, this guilt-by-association principle has been applied to study modular gene programmes, identify cis-regulatory elements or predict functions for unknown genes in different model plants. Recently, several studies have demonstrated how, through the integration of gene homology and expression information, correlated gene expression patterns can be compared between species. The incorporation of detailed functional annotations as well as experimental data describing protein-protein interactions, phenotypes or tissue specific expression, provides an invaluable source of information to identify conserved gene modules and translate biological knowledge from model organisms to crops. In this review, we describe the different steps required to systematically compare expression data across species. Apart from the technical challenges to compute and display expression networks from multiple species, some future applications of plant comparative transcriptomics are highlighted.
Moreau, H., Verhelst, B., Couloux, A., Derelle, E., Rombauts, S., Grimsley, N., … Vandepoele, K. (2012). Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage. GENOME BIOLOGY, 13(8). https://doi.org/10.1186/gb-2012-13-8-r74

Background: Bathycoccus prasinos is an extremely small cosmopolitan marine green alga whose cells are covered with intricate spider's web patterned scales that develop within the Golgi cisternae before their transport to the cell surface. The objective of this work is to sequence and analyze its genome, and to present a comparative analysis with other known genomes of the green lineage. Results: Its small genome of 15 Mb consists of 19 chromosomes and lacks transposons. Although 70% of all B. prasinos genes share similarities with other Viridiplantae genes, up to 428 genes were probably acquired by horizontal gene transfer, mainly from other eukaryotes. Two chromosomes, one big and one small, are atypical, an unusual synapomorphic feature within the Mamiellales. Genes on these atypical outlier chromosomes show lower GC content and a significant fraction of putative horizontal gene transfer genes. Whereas the small outlier chromosome lacks colinearity with other Mamiellales and contains many unknown genes without homologs in other species, the big outlier shows a higher intron content, increased expression levels and a unique clustering pattern of housekeeping functionalities. Four gene families are highly expanded in B. prasinos, including sialyltransferases, sialidases, ankyrin repeats and zinc ion-binding genes, and we hypothesize that these genes are associated with the process of scale biogenesis. Conclusion: The minimal genomes of the Mamiellophyceae provide a baseline for evolutionary and functional analyses of metabolic processes in green plants.
Wang, F., Vandepoele, K., & Van Lijsebettens, M. (2012). Tetraspanin genes in plants. PLANT SCIENCE, 190, 9–15. https://doi.org/10.1016/j.plantsci.2012.03.005
Heyndrickx, K., & Vandepoele, K. (2012). Systematic identification of functional plant modules through the integration of complementary data sources. PLANT PHYSIOLOGY, 159(3), 884–901. https://doi.org/10.1104/pp.112.196725

A major challenge is to unravel how genes interact and are regulated to exert specific biological functions. The integration of genome-wide functional genomics data, followed by the construction of gene networks, provides a powerful approach to identify functional gene modules. Large-scale expression data, functional gene annotations, experimental protein-protein interactions, and transcription factor-target interactions were integrated to delineate modules in Arabidopsis (Arabidopsis thaliana). The different experimental input data sets showed little overlap, demonstrating the advantage of combining multiple data types to study gene function and regulation. In the set of 1,563 modules covering 13,142 genes, most modules displayed strong coexpression, but functional and cis-regulatory coherence was less prevalent. Highly connected hub genes showed a significant enrichment toward embryo lethality and evidence for cross talk between different biological processes. Comparative analysis revealed that 58% of the modules showed conserved coexpression across multiple plants. Using module-based functional predictions, 5,562 genes were annotated, and an evaluation experiment disclosed that, based on 197 recently experimentally characterized genes, 38.1% of these functions could be inferred through the module context. Examples of confirmed genes of unknown function related to cell wall biogenesis, xylem and phloem pattern formation, cell cycle, hormone stimulus, and circadian rhythm highlight the potential to identify new gene functions. The module-based predictions offer new biological hypotheses for functionally unknown genes in Arabidopsis (1,701 genes) and six other plant species (43,621 genes). Furthermore, the inferred modules provide new insights into the conservation of coexpression and coregulation as well as a starting point for comparative functional annotation.
Petrov, V., Vermeirssen, V., De Clercq, I., Van Breusegem, F., Minkov, I., Vandepoele, K., & Gechev, T. S. (2012). Identification of cis-regulatory elements specific for different types of reactive oxygen species in Arabidopsis thaliana. GENE, 499(1), 52–60. https://doi.org/10.1016/j.gene.2012.02.035
Quimbaya Gomez, M., Vandepoele, K., Raspé, E., Matthijs, M., Dhondt, S., Beemster, G., … De Veylder, L. (2012). Identification of putative cancer genes through data integration and comparative genomics between plants and humans. CELLULAR AND MOLECULAR LIFE SCIENCES, 69(12), 2041–2055. https://doi.org/10.1007/s00018-011-0909-x

Coordination of cell division with growth and development is essential for the survival of organisms. Mistakes made during replication of genetic material can result in cell death, growth defects, or cancer. Because of the essential role of the molecular machinery that controls DNA replication and mitosis during development, its high degree of conservation among organisms is not surprising. Mammalian cell cycle genes have orthologues in plants, and vice versa. However, besides the many known and characterized proliferation genes, still undiscovered regulatory genes are expected to exist with conserved functions in plants and humans. Starting from genome-wide Arabidopsis thaliana microarray data, an integrative strategy based on coexpression, functional enrichment analysis, and cis-regulatory element annotation was combined with a comparative genomics approach between plants and humans to detect conserved cell cycle genes involved in DNA replication and/or DNA repair. With this systemic strategy, a set of 339 genes was identified as potentially conserved proliferation genes. Experimental analysis confirmed that 20 out of 40 selected genes had an impact on plant cell proliferation; likewise, an evolutionarily conserved role in cell division was corroborated for two human orthologues. Moreover, association analysis integrating Homo sapiens gene expression data with clinical information revealed that, for 45 genes, altered transcript levels and relapse risk clearly correlated. Our results illustrate how a systematic exploration of the A. thaliana genome can contribute to the experimental identification of new cell cycle regulators that might represent novel oncogenes or/and tumor suppressors.
Vaulot, D., Lepere, C., Toulza, E., De la Iglesia, R., Poulain, J., Gaboyer, F., … Piganeau, G. (2012). Metagenomes of the picoalga Bathycoccus from the Chile coastal upwelling. PLOS ONE, 7(6). https://doi.org/10.1371/journal.pone.0039648

Among small photosynthetic eukaryotes that play a key role in oceanic food webs, picoplanktonic Mamiellophyceae such as Bathycoccus, Micromonas, and Ostreococcus are particularly important in coastal regions. By using a combination of cell sorting by flow cytometry, whole genome amplification (WGA), and 454 pyrosequencing, we obtained metagenomic data for two natural picophytoplankton populations from the coastal upwelling waters off central Chile. About 60% of the reads of each sample could be mapped to the genome of Bathycoccus strain from the Mediterranean Sea (RCC1105), representing a total of 9 Mbp (sample T142) and 13 Mbp (sample T149) of non-redundant Bathycoccus genome sequences. WGA did not amplify all regions uniformly, resulting in unequal coverage along a given chromosome and between chromosomes. The identity at the DNA level between the metagenomes and the cultured genome was very high (96.3% identical bases for the three larger chromosomes over a 360 kbp alignment). At least two to three different genotypes seemed to be present in each natural sample based on read mapping to Bathycoccus RCC1105 genome.
De Witte, D., Van Bel, M., Demeester, P., Dhoedt, B., Vandepoele, K., & Fostier, J. (2012). A high performance computing approach to the discovery of conserved motifs. 20th Annual Conference on Intelligent Systems for Molecular Biology, Abstracts, 1–1.
De Witte, D., Van Bel, M., Demeester, P., Dhoedt, B., Vandepoele, K., & Fostier, J. (2012). Alignment-free genome-wide comparative motif discovery in 4 Monocot species. 11th European Conference on Computational Biology, Abstracts, 1–1.
Proost, S., Fostier, J., De Witte, D., Dhoedt, B., Demeester, P., Van de Peer, Y., & Vandepoele, K. (2012). i-ADHoRe 3.0 : fast and sensitive detection of genomic homology in extremely large data sets. NUCLEIC ACIDS RESEARCH, 40(2). https://doi.org/10.1093/nar/gkr955
Fostier, J., Proost, S., Dhoedt, B., Saeys, Y., Demeester, P., Van de Peer, Y., & Vandepoele, K. (2011). A greedy, graph-based algorithm for the alignment of multiple homologous gene lists. BIOINFORMATICS, 27(6), 749–756. https://doi.org/10.1093/bioinformatics/btr008
Babiychuk, E., Vandepoele, K., Wissing, J., Garcia-Diaz, M., De Rycke, R., Akbari, H., … Kushnir, S. (2011). Plastid gene expression and plant development require a plastidic protein of the mitochondrial transcription termination factor family. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 108(16), 6674–6679. https://doi.org/10.1073/pnas.1103442108
Mittler, R., Vanderauwera, S., Suzuki, N., Miller, G., Tognetti, V., Vandepoele, K., … Van Breusegem, F. (2011). ROS signaling: the new wave? TRENDS IN PLANT SCIENCE, 16(6), 300–309. https://doi.org/10.1016/j.tplants.2011.03.007

Reactive oxygen species (ROS) play a multitude of signaling roles in different organisms from bacteria to mammalian cells. They were initially thought to be toxic byproducts of aerobic metabolism, but have now been acknowledged as central players in the complex signaling network of cells. In this review, we will attempt to address several key questions related to the use of ROS as signaling molecules in cells, including the dynamics and specificity of ROS signaling, networking of ROS with other signaling pathways, ROS signaling within and across different cells, ROS waves and the evolution of the ROS gene network.
Movahedi, S., Van de Peer, Y., & Vandepoele, K. (2011). Comparative network analysis reveals that tissue specificity and gene function are important factors influencing the mode of expression evolution in Arabidopsis and rice. PLANT PHYSIOLOGY, 156(3), 1316–1330. https://doi.org/10.1104/pp.111.177865

Microarray experiments have yielded massive amounts of expression information measured under various conditions for the model species Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa). Expression compendia grouping multiple experiments make it possible to define correlated gene expression patterns within one species and to study how expression has evolved between species. We developed a robust framework to measure expression context conservation (ECC) and found, by analyzing 4,630 pairs of orthologous Arabidopsis and rice genes, that 77% showed conserved coexpression. Examples of nonconserved ECC categories suggested a link between regulatory evolution and environmental adaptations and included genes involved in signal transduction, response to different abiotic stresses, and hormone stimuli. To identify genomic features that influence expression evolution, we analyzed the relationship between ECC, tissue specificity, and protein evolution. Tissue-specific genes showed higher expression conservation compared with broadly expressed genes but were fast evolving at the protein level. No significant correlation was found between protein and expression evolution, implying that both modes of gene evolution are not strongly coupled in plants. By integration of cis-regulatory elements, many ECC conserved genes were significantly enriched for shared DNA motifs, hinting at the conservation of ancestral regulatory interactions in both model species. Surprisingly, for several tissue-specific genes, patterns of concerted network evolution were observed, unveiling conserved coexpression in the absence of conservation of tissue specificity. These findings demonstrate that orthologs inferred through sequence similarity in many cases do not share similar biological functions and highlight the importance of incorporating expression information when comparing genes across species.
Huysman, M., Martens, C., Vandepoele, K., Gillard, J., Rayko, E., Heijde, M., … Vyverman, W. (2010). Genome-wide analysis of the diatom cell cycle unveils a novel type of cyclins involved in environmental signaling. GENOME BIOLOGY, 11(2). https://doi.org/10.1186/gb-2010-11-2-r17

Background: Despite the enormous importance of diatoms in aquatic ecosystems and their broad industrial potential, little is known about their life cycle control. Diatoms typically inhabit rapidly changing and unstable environments, suggesting that cell cycle regulation in diatoms must have evolved to adequately integrate various environmental signals. The recent genome sequencing of Thalassiosira pseudonana and Phaeodactylum tricornutum allows us to explore the molecular conservation of cell cycle regulation in diatoms. Results: By profile-based annotation of cell cycle genes, counterparts of conserved as well as new regulators were identified in T. pseudonana and P. tricornutum. In particular, the cyclin gene family was found to be expanded extensively compared to that of other eukaryotes and a novel type of cyclins was discovered, the diatom-specific cyclins. We established a synchronization method for P. tricornutum that enabled assignment of the different annotated genes to specific cell cycle phase transitions. The diatom-specific cyclins are predominantly expressed at the G1-to-S transition and some respond to phosphate availability, hinting at a role in connecting cell division to environmental stimuli. Conclusion: The discovery of highly conserved and new cell cycle regulators suggests the evolution of unique control mechanisms for diatom cell division, probably contributing to their ability to adapt and survive under highly fluctuating environmental conditions.
Takahashi, N., Quimbaya Gomez, M., Schubert, V., Lammens, T., Vandepoele, K., Schubert, I., … De Veylder, L. (2010). The MCM-Binding Protein ETG1 Aids Sister Chromatid Cohesion Required for Postreplicative Homologous Recombination Repair. PLOS GENETICS, 6(1). https://doi.org/10.1371/journal.pgen.1000817

The DNA replication process represents a source of DNA stress that causes potentially spontaneous genome damage. This effect might be strengthened by mutations in crucial replication factors, requiring the activation of DNA damage checkpoints to enable DNA repair before anaphase onset. Here, we demonstrate that depletion of the evolutionarily conserved minichromosome maintenance helicase-binding protein ETG1 of Arabidopsis thaliana resulted in a stringent late G2 cell cycle arrest. This arrest correlated with a partial loss of sister chromatid cohesion. The lack-of-cohesion phenotype was intensified in plants without functional CTF18, a replication fork factor needed for cohesion establishment. The synergistic effect of the etg1 and ctf18 mutants on sister chromatid cohesion strengthened the impact on plant growth of the replication stress caused by ETG1 deficiency because of inefficient DNA repair. We conclude that the ETG1 replication factor is required for efficient cohesion and that cohesion establishment is essential for proper development of plants suffering from endogenous DNA stress. Cohesion defects observed upon knockdown of its human counterpart suggest an equally important developmental role for the orthologous mammalian ETG1 protein.
Proost, S., Van Bel, M., Sterck, L., Billiau, K., Van Parys, T., Van de Peer, Y., & Vandepoele, K. (2009). PLAZA : a comparative genomics resource to study gene and genome evolution in plants. PLANT CELL, 21(12), 3718–3731. https://doi.org/10.1105/tpc.109.071506

The number of sequenced genomes of representatives within the green lineage is rapidly increasing. Consequently, comparative sequence analysis has significantly altered our view on the complexity of genome organization, gene function, and regulatory pathways. To explore all this genome information, a centralized infrastructure is required where all data generated by different sequencing initiatives is integrated and combined with advanced methods for data mining. Here, we describe PLAZA, an online platform for plant comparative genomics (http://bioinformatics.psb.ugent.be/plaza/). This resource integrates structural and functional annotation of published plant genomes together with a large set of interactive tools to study gene function and gene and genome evolution. Precomputed data sets cover homologous gene families, multiple sequence alignments, phylogenetic trees, intraspecies whole-genome dot plots, and genomic colinearity between species. Through the integration of high confidence Gene Ontology annotations and tree-based orthology between related species, thousands of genes lacking any functional description are functionally annotated. Advanced query systems, as well as multiple interactive visualization tools, are available through a user-friendly and intuitive Web interface. In addition, detailed documentation and tutorials introduce the different tools, while the workbench provides an efficient means to analyze user-defined gene sets through PLAZA's interface. In conclusion, PLAZA provides a comprehensible and up-to-date research environment to aid researchers in the exploration of genome information within the green plant lineage.
Dhaese, S., Vandepoele, K., Waterschoot, D., Vanloo, B., Vandekerckhove, J., Ampe, C., & Van Troys, M. (2009). The Mouse Thymosin Beta15 Gene Family Displays Unique Complexity and Encodes A Functional Thymosin Repeat. JOURNAL OF MOLECULAR BIOLOGY, 387(4), 809–825. https://doi.org/10.1016/j.jmb.2009.02.026

We showed earlier that human beta-thymosin 15 (Tb15) is up-regulated in prostate cancer, confirming studies from others that propagated Tb15 as a prostate cancer biomarker. In this first report on mouse Tb15, we show that, unlike in humans, four Tb15-like isoforms are present in mouse. We used phylogenetic analysis of deuterostome beta-thymosins to show that these four new isoforms cluster within the vertebrate Tb15-clade. Intriguingly, one of these mouse beta-thymosins, Tb15r, consists of two beta-thymosin domains. The existence of such a repeat beta-thymosin is so far unique in vertebrates, though common in lower eukaryotes. Biochemical data indicate that Tb15r potently sequesters actin. In a cellular context, Tb15r behaves as a bona fide beta-thymosin, lowering central stress fibre content. We reveal that a complex genomic organization underlies Tb15r expression: Tb15r results from read-through transcription and alternative splicing of two tandem duplicated mouse Tb15 genes. Transcript profiling of all mouse beta-thymosin isoforms (Tb15s, Tb4 and Tb10) reveals that two isoform switches occur between embryonic and adult tissues, and indicates Tb15r as the major mouse Tb15 isoform in adult cells. Tb15r is present also in mouse prostate cancer cell lines. This insight into the mouse Tb15 family is fundamental for future studies on Tb15 in mouse (prostate) cancer models.
Vandepoele, K., Quimbaya Gomez, M., Casneuf, T., De Veylder, L., & Van de Peer, Y. (2009). Unraveling Transcriptional Control in Arabidopsis Using cis-Regulatory Elements and Coexpression Networks. Plant Physiology, 150(2), 535–546. https://doi.org/10.1104/pp.109.136028

Analysis of gene expression data generated by high-throughput microarray transcript profiling experiments has demonstrated that genes with an overall similar expression pattern are often enriched for similar functions. This guilt-by-association principle can be applied to define modular gene programs, identify cis-regulatory elements, or predict gene functions for unknown genes based on their coexpression neighborhood. We evaluated the potential to use Gene Ontology (GO) enrichment of a gene's coexpression neighborhood as a tool to predict its function but found overall low sensitivity scores (13%-34%). This indicates that for many functional categories, coexpression alone performs poorly to infer known biological gene functions. However, integration of cis-regulatory elements shows that 46% of the gene coexpression neighborhoods are enriched for one or more motifs, providing a valuable complementary source to functionally annotate genes. Through the integration of coexpression data, GO annotations, and a set of known cis-regulatory elements combined with a novel set of evolutionarily conserved plant motifs, we could link many genes and motifs to specific biological functions. Application of our coexpression framework extended with cis-regulatory element analysis on transcriptome data from the cell cycle-related transcription factor OBP1 yielded several coexpressed modules associated with specific cis-regulatory elements. Moreover, our analysis strongly suggests a feed-forward regulatory interaction between OBP1 and the E2F pathway. The ATCOECIS resource (http://bioinformatics.psb.ugent.be/ATCOECIS/) makes it possible to query coexpression data and GO and cis-regulatory element annotations and to submit user-defined gene sets for motif analysis, providing an access point to unravel the regulatory code underlying transcriptional control in Arabidopsis (Arabidopsis thaliana).
De Bodt, S., Proost, S., Vandepoele, K., Rouzé, P., & Van de Peer, Y. (2009). Predicting protein-protein interactions in Arabidopsis thaliana through integration of orthology, gene ontology and co-expression. BMC Genomics, 10(288), 1–15. https://doi.org/10.1186/1471-2164-10-288

Background: Large-scale identification of the interrelationships between different components of the cell, such as the interactions between proteins, has recently gained great interest. However, unraveling large-scale protein-protein interaction maps is laborious and expensive. Moreover, assessing the reliability of the interactions can be cumbersome. Results: In this study, we have developed a computational method that exploits the existing knowledge on protein-protein interactions in diverse species through orthologous relations on the one hand, and functional association data on the other hand to predict and filter protein-protein interactions in Arabidopsis thaliana. A highly reliable set of protein-protein interactions is predicted through this integrative approach making use of existing protein-protein interaction data from yeast, human, C. elegans and D. melanogaster. Localization, biological process, and co-expression data are used as powerful indicators for protein-protein interactions. The functional repertoire of the identified interactome reveals interactions between proteins functioning in well-conserved as well as plant-specific biological processes. We observe that although common mechanisms (e.g. actin polymerization) and components (e.g. ARPs, actin-related proteins) exist between different lineages, they are active in specific processes such as growth, cancer metastasis and trichome development in yeast, human and Arabidopsis, respectively. Conclusion: We conclude that the integration of orthology with functional association data is adequate to predict protein-protein interactions. Through this approach, a high number of novel protein-protein interactions with diverse biological roles is discovered. Overall, we have predicted a reliable set of protein-protein interactions suitable for further computational as well as experimental analyses.
Naouar, N., Vandepoele, K., Lammens, T., Casneuf, T., Zeller, G., Van Hummelen, P., … Vuylsteke, M. (2009). Quantitative RNA expression analysis with Affymetrix Tiling 1.0R arrays identifies new E2F target genes. Plant Journal, 57(1), 184–194. https://doi.org/10.1111/j.1365-313X.2008.03662.x

The Affymetrix ATH1 array provides a robust standard tool for transcriptome analysis, but unfortunately does not represent all of the transcribed genes in Arabidopsis thaliana. Recently, Affymetrix has introduced its Arabidopsis Tiling 1.0R array, which offers whole-genome coverage of the sequenced Col-0 reference strain. Here, we present an approach to exploit this platform for quantitative mRNA expression analysis, and compare the results with those obtained using ATH1 arrays. We also propose a method for selecting unique tiling probes for each annotated gene or transcript in the most current genome annotation, TAIR7, generating Chip Definition Files for the Tiling 1.0R array. As a test case, we compared the transcriptome of wild-type plants with that of transgenic plants overproducing the heterodimeric E2Fa-DPa transcription factor. We show that with the appropriate data pre-processing, the estimated changes per gene for those with significantly different expression levels is very similar for the two array types. With the tiling arrays we could identify 368 new E2F-regulated genes, with a large fraction including an E2F motif in the promoter. The latter groups increase the number of excellent candidates for new, direct E2F targets by almost twofold, from 181 to 334.
Van de Peer, Y., Fawcett, J., Proost, S., Sterck, L., & Vandepoele, K. (2009). The flowering world: a tale of duplications. TRENDS IN PLANT SCIENCE, 14(12), 680–688. https://doi.org/10.1016/j.tplants.2009.09.001

Flowering plants contain many genes, most of which were created during the past 200 or so million years through small- and large-scale duplications. Paleo-polyploidy events, in particular, have been the subject of much recent research. There is a growing consensus that one or more genome doubling or merging events occurred early during the evolution of the flowering plants, and that many lineages have since undergone additional, independent and more recent duplication events. Here, we review the difficulties in determining the number of genome duplications and discuss how the completion of some additional genome sequences of species occupying key phylogenetic positions has led to a better understanding of the timing of certain duplication events. This is important if we want to demonstrate the significance of genome duplications for the evolution and radiation of (different groups of) flowering plants.
Piganeau, G., Vandepoele, K., Gourbière, S., Van de Peer, Y., & Moreau, H. (2009). Unravelling cis-Regulatory Elements in the Genome of the Smallest Photosynthetic Eukaryote: Phylogenetic Footprinting in Ostreococcus. Journal of Molecular Evolution, 69(3), 249–259. https://doi.org/10.1007/s00239-009-9271-0

We used a phylogenetic footprinting approach, adapted to high levels of divergence, to estimate the level of constraint in intergenic regions of the extremely gene dense Ostreococcus algae genomes (Chlorophyta, Prasinophyceae). We first benchmarked our method against the Saccharomyces sensu stricto genome data and found that the proportion of conserved non-coding sites was consistent with those obtained with methods using calibration by the neutral substitution rate. We then applied our method to the complete genomes of Ostreococcus tauri and O. lucimarinus, which are the most divergent species from the same genus sequenced so far. We found that 77% of intergenic regions in Ostreococcus still contain some phylogenetic footprints, as compared to 88% for Saccharomyces, corresponding to an average rate of constraint on intergenic region of 17% and 30%, respectively. A comparison with some known functional cis-regulatory elements enabled us to investigate whether some transcriptional regulatory pathways were conserved throughout the green lineage. Strikingly, the size of the phylogenetic footprints depends on gene orientation of neighboring genes, and appears to be genus-specific. In Ostreococcus, 5' intergenic regions contain four times more conserved sites than 3' intergenic regions, whereas in yeast a higher frequency of constrained sites in intergenic regions between genes on the same DNA strand suggests a higher frequency of bidirectional regulatory elements. The phylogenetic footprinting approach can be used despite high levels of divergence in the ultrasmall Ostreococcus algae, to decipher structure of constrained regulatory motifs, and identify putative regulatory pathways conserved within the green lineage.
Vandenbroucke, K., Robbens, S., Vandepoele, K., Inzé, D., Van de Peer, Y., & Van Breusegem, F. (2008). Hydrogen peroxide-induced gene expression across kingdoms: a comparative analysis. MOLECULAR BIOLOGY AND EVOLUTION, 25(3), 507–516. https://doi.org/10.1093/molbev/msm276

Cells react to oxidative stress conditions by launching a defense response through the induction of nuclear gene expression. The advent of microarray technologies allowed monitoring of oxidative stress-dependent changes of transcript levels at a comprehensive and genome-wide scale, resulting in a series of inventories of differentially expressed genes in different organisms. We performed a meta-analysis on hydrogen peroxide (H2O2)-induced gene expression in the cyanobacterium Synechocystis PCC 6803, the yeast Saccharomyces cerevisiae and Schizosaccharomyces pombe, the land plant Arabidopsis thaliana, and the human HeLa cell line. The H2O2-induced gene expression in both yeast species was highly conserved and more similar to the A. thaliana response than that of the human cell line. Based on the expression characteristics of genuine antioxidant genes, we show that the antioxidant capacity of microorganisms and higher eukaryotes is differentially regulated. Four families of evolutionarily conserved eukaryotic proteins could be identified that were H2O2 responsive across kingdoms: DNAJ domain-containing heat shock proteins, small guanine triphosphate-binding proteins, Ca2+-dependent protein kinases, and ubiquitin-conjugating enzymes.
Van Roy, F., Vandepoele, K., Van Roy, N., Andries, V., Staes, K., Vandesompele, J., … Speleman, F. (2008). A constitutional translocation t(1;17)(p36.2;q11.2) in a neuroblastoma patient disrupts the human NBPF1 and ACCN1 genes. EJC SUPPLEMENTS, 6(9), 14–14. https://doi.org/10.1016/S1359-6349(08)71228-X
Lessa Alvim Kamei, C., Boruc, J., Vandepoele, K., Van Den Daele, H., Maes, S., Russinova, E., … De Veylder, L. (2008). The PRA1 gene family in Arabidopsis. PLANT PHYSIOLOGY, 147(4), 1735–1749. https://doi.org/10.1104/pp.108.122226

Prenylated Rab acceptor 1 (PRA1) domain proteins are small transmembrane proteins that regulate vesicle trafficking as receptors of Rab GTPases and the vacuolar soluble N-ethylmaleimide-sensitive factor attachment receptor protein VAMP2. However, little is known about PRA1 family members in plants. Sequence analysis revealed that higher plants, compared with animals and primitive plants, possess an expanded family of PRA1 domain-containing proteins. The Arabidopsis (Arabidopsis thaliana) PRA1 (AtPRA1) proteins were found to homodimerize and heterodimerize in a manner corresponding to their phylogenetic distribution. Different AtPRA1 family members displayed distinct expression patterns, with a preference for vascular cells and expanding or developing tissues. AtPRA1 genes were significantly coexpressed with Rab GTPases and genes encoding vesicle transport proteins, suggesting an involvement in the vesicle trafficking process similar to that of their animal counterparts. Correspondingly, AtPRA1 proteins were localized in the endoplasmic reticulum, Golgi apparatus, and endosomes/prevacuolar compartments, hinting at a function in both secretory and endocytic intracellular trafficking pathways. Taken together, our data reveal a high functional diversity of AtPRA1 proteins, probably dealing with the various demands of the complex trafficking system.
Martens, C., Vandepoele, K., & Van de Peer, Y. (2008). Whole-genome analysis reveals molecular innovations and evolutionary transitions in chromalveolate species. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 105(9), 3427–3432. https://doi.org/10.1073/pnas.0712248105

The chromalveolates form a highly diverse and fascinating assemblage of organisms, ranging from obligatory parasites such as Plasmodium to free-living ciliates and algae such as kelps, diatoms, and dinoflagellates. Many of the species in this monophyletic grouping are of major medical, ecological, and economical importance. Nevertheless, their genome evolution is much less well studied than that of higher plants, animals, or fungi. In the current study, we have analyzed and compared 12 chromalveolate species for which whole-sequence information is available and provide a detailed picture on gene loss and gene gain in the different lineages. As expected, many gene loss and gain events can be directly correlated with the lifestyle and specific adaptations of the organisms studied. For instance, in the obligate intracellular Apicomplexa we observed massive loss of genes that play a role in general basic processes such as amino acid, carbohydrate, and lipid metabolism, reflecting the transition of a free-living to an obligate intracellular lifestyle. In contrast, many gene families show species-specific expansions, such as those in the plant pathogen oomycete Phytophthora that are involved in degrading the plant cell wall polysaccharides to facilitate the pathogen invasion process. In general, chromalveolates show a tremendous difference in genome structure and evolution and in the number of genes they have lost or gained either through duplication or horizontal gene transfer.
Bowler, C., Allen, A. E., Badger, J. H., Grimwood, J., Jabbari, K., Kuo, A., … Grigoriev, I. V. (2008). The Phaeodactylum genome reveals the evolutionary history of diatom genomes. NATURE, 456(7219), 239–244. https://doi.org/10.1038/nature07410

Diatoms are photosynthetic secondary endosymbionts found throughout marine and freshwater environments, and are believed to be responsible for around one-fifth of the primary productivity on Earth(1,2). The genome sequence of the marine centric diatom Thalassiosira pseudonana was recently reported, revealing a wealth of information about diatom biology(3-5). Here we report the complete genome sequence of the pennate diatom Phaeodactylum tricornutum and compare it with that of T. pseudonana to clarify evolutionary origins, functional significance and ubiquity of these features throughout diatoms. In spite of the fact that the pennate and centric lineages have only been diverging for 90 million years, their genome structures are dramatically different and a substantial fraction of genes (~40%) are not shared by these representatives of the two lineages. Analysis of molecular divergence compared with yeasts and metazoans reveals rapid rates of gene diversification in diatoms. Contributing factors include selective gene family expansions, differential losses and gains of genes and introns, and differential mobilization of transposable elements. Most significantly, we document the presence of hundreds of genes from bacteria. More than 300 of these gene transfers are found in both diatoms, attesting to their ancient origins, and many are likely to provide novel possibilities for metabolite management and for perception of environmental signals. These findings go a long way towards explaining the incredible diversity and success of the diatoms in contemporary oceans.
Polet, D., Lambrechts, A., Vandepoele, K., Vandekerckhove, J., & Ampe, C. (2007). On the origin and evolution of vertebrate and viral profilins. FEBS LETTERS, 581(2), 211–217. https://doi.org/10.1016/j.febslet.2006.12.013
Sterck, L., Rombauts, S., Vandepoele, K., Rouzé, P., & Van de Peer, Y. (2007). How many genes are there in plants (... and why are they there)? CURRENT OPINION IN PLANT BIOLOGY, 10(2), 199–203. https://doi.org/10.1016/j.pbi.2007.01.004

Annotation of the first few complete plant genomes has revealed that plants have many genes. For Arabidopsis, over 26 500 gene loci have been predicted, whereas for rice, the number adds up to 41 000. Recent analysis of the poplar genome suggests more than 45 000 genes, and partial sequence data from Medicago and Lotus also suggest that these plants contain more than 40 000 genes. Nevertheless, estimations suggest that ancestral angiosperms had no more than 12 000-14 000 genes. One explanation for the large increase in gene number during angiosperm evolution is gene duplication. It has been shown previously that the retention of duplicates following small- and large-scale duplication events in plants is substantial. Taking into account the function of genes that have been duplicated, we are now beginning to understand why many plant genes might have been retained, and how their retention might be linked to the typical lifestyle of plants.
Peres, A., Churchman, M. L., Hariharan, S., Himanen, K., Verkest, A., Vandepoele, K., … De Veylder, L. (2007). Novel plant-specific cyclin-dependent kinase inhibitors induced by biotic and abiotic stresses. JOURNAL OF BIOLOGICAL CHEMISTRY, 282(35), 25588–25596. https://doi.org/10.1074/jbc.M703326200

The EL2 gene of rice (Oryza sativa), previously classified as early response gene against the potent biotic elicitor N-acetylchitoheptaose and encoding a short polypeptide with unknown function, was identified as a novel cell cycle regulatory gene related to the recently reported SIAMESE (SIM) gene of Arabidopsis thaliana. Iterative two-hybrid screens, in vitro pull-down assays, and fluorescence resonance energy transfer analyses showed that Orysa;EL2 binds the cyclin-dependent kinase (CDK) CDKA1;1 and D-type cyclins. No interaction was observed with the plant-specific B-type CDKs. The amino acid motif ELERFL was identified to be essential for cyclin, but not for CDK binding. Orysa;EL2 impaired the ability of Orysa;CYCD5;3 to complement a budding yeast (Saccharomyces cerevisiae) triple CLN mutant, whereas recombinant protein inhibited CDK activity in vitro. Moreover, Orysa;EL2 was able to rescue the multicellular trichome phenotype of sim mutants of Arabidopsis, unequivocally demonstrating that Orysa;EL2 operates as a cell cycle inhibitor. Orysa;EL2 mRNA levels were induced by cold, drought, and propionic acid. Our data suggest that Orysa;EL2 encodes a new type of plant CDK inhibitor that links cell cycle progression with biotic and abiotic stress responses.
Rymen, B., Fiorani, F., Kartal, F., Vandepoele, K., Inzé, D., & Beemster, G. (2007). Cold nights impair leaf growth and cell cycle progression in maize through transcriptional changes of cell cycle genes. PLANT PHYSIOLOGY, 143(3), 1429–1438. https://doi.org/10.1104/pp.106.093948

Low temperature inhibits the growth of maize (Zea mays) seedlings and limits yield under field conditions. To study the mechanism of cold-induced growth retardation, we exposed maize B73 seedlings to low night temperature (25°C/4°C, day/night) from germination until the completion of leaf 4 expansion. This treatment resulted in a 20% reduction in final leaf size compared to control conditions (25°C/18°C, day/night). A kinematic analysis of leaf growth rates in control and cold-treated leaves during daytime showed that cold nights affected both cell cycle time (+65%) and cell production (−22%). In contrast, the size of mature epidermal cells was unaffected. To analyze the effect on cell cycle progression at the molecular level, we identified through a bioinformatics approach a set of 43 cell cycle genes and analyzed their expression in proliferating, expanding, and mature cells of leaves exposed to either control or cold nights. This analysis showed that: (1) the majority of cell cycle genes had a consistent proliferation-specific expression pattern; and (2) the increased cell cycle time in the basal meristem of leaves exposed to cold nights was associated with differential expression of cell cycle inhibitors and with the concomitant down-regulation of positive regulators of cell division.
Velasco, R., Zharkikh, A., Troggio, M., Cartwright, D. A., Cestaro, A., Pruss, D., … Viola, R. (2007). A high quality draft consensus sequence of the genome of a heterozygous grapevine variety. PLOS ONE, 2(12). https://doi.org/10.1371/journal.pone.0001326

Background. Worldwide, grapes and their derived products have a large market. The cultivated grape species Vitis vinifera has potential to become a model for fruit tree genetics. Like many plant species, it is highly heterozygous, which is an additional challenge to modern whole genome shotgun sequencing. In this paper a high quality draft genome sequence of a cultivated clone of V. vinifera Pinot Noir is presented. Principal Findings. We estimate the genome size of V. vinifera to be 504.6 Mb. Genomic sequences corresponding to 477.1 Mb were assembled in 2,093 metacontigs and 435.1 Mb were anchored to the 19 linkage groups (LGs). The number of predicted genes is 29,585, of which 96.1% were assigned to LGs. This assembly of the grape genome provides candidate genes implicated in traits relevant to grapevine cultivation, such as those influencing wine quality, via secondary metabolites, and those connected with the extreme susceptibility of grape to pathogens. Single nucleotide polymorphism (SNP) distribution was consistent with a diffuse haplotype structure across the genome. Of around 2,000,000 SNPs, 1,751,176 were mapped to chromosomes and one or more of them were identified in 86.7% of anchored genes. The relative age of grape duplicated genes was estimated, which made it possible to reveal a relatively recent Vitis-specific large scale duplication event concerning at least 10 chromosomes (duplication not reported before). Conclusions. Sanger shotgun sequencing and highly efficient sequencing by synthesis (SBS), together with dedicated assembly programs, resolved a complex heterozygous genome. A consensus sequence of the genome and a set of mapped marker loci were generated. Homologous chromosomes of Pinot Noir differ by 11.2% of their DNA (hemizygous DNA plus chromosomal gaps). SNP markers are offered as a tool with the potential of introducing a new era in the molecular breeding of grape.
Blomme, T., Vandepoele, K., De Bodt, S., Simillion, C., Maere, S., & Van de Peer, Y. (2006). The gain and loss of genes during 600 million years of vertebrate evolution. GENOME BIOLOGY, 7(5). https://doi.org/10.1186/gb-2006-7-5-r43

Background: Gene duplication is assumed to have played a crucial role in the evolution of vertebrate organisms. Apart from a continuous mode of duplication, two or three whole genome duplication events have been proposed during the evolution of vertebrates, one or two at the dawn of vertebrate evolution, and an additional one in the fish lineage, not shared with land vertebrates. Here, we have studied gene gain and loss in seven different vertebrate genomes, spanning an evolutionary period of about 600 million years. Results: We show that: first, the majority of duplicated genes in extant vertebrate genomes are ancient and were created at times that coincide with proposed whole genome duplication events; second, there exist significant differences in gene retention for different functional categories of genes between fishes and land vertebrates; third, there seems to be a considerable bias in gene retention of regulatory genes towards the mode of gene duplication (whole genome duplication events compared to smaller-scale events), which is in accordance with the so-called gene balance hypothesis; and fourth, that ancient duplicates that have survived for many hundreds of millions of years can still be lost. Conclusion: Based on phylogenetic analyses, we show that both the mode of duplication and the functional class the duplicated genes belong to have been of major importance for the evolution of the vertebrates. In particular, we provide evidence that massive gene duplication (probably as a consequence of entire genome duplications) at the dawn of vertebrate evolution might have been particularly important for the evolution of complex vertebrates.
Vandepoele, K., Casneuf, T., & Van de Peer, Y. (2006). Identification of novel regulatory modules in dicotyledonous plants using expression data and comparative genomics. GENOME BIOLOGY, 7(11). https://doi.org/10.1186/gb-2006-7-11-r103

Background: Transcriptional regulation plays an important role in the control of many biological processes. Transcription factor binding sites (TFBSs) are the functional elements that determine transcriptional activity and are organized into separable cis-regulatory modules, each defining the cooperation of several transcription factors required for a specific spatio-temporal expression pattern. Consequently, the discovery of novel TFBSs in promoter sequences is an important step to improve our understanding of gene regulation. Results: Here, we applied a detection strategy that combines features of classic motif overrepresentation approaches in co-regulated genes with general comparative footprinting principles for the identification of biologically relevant regulatory elements and modules in Arabidopsis thaliana, a model system for plant biology. In total, we identified 80 TFBSs and 139 regulatory modules, most of which are novel, and primarily consist of two or three regulatory elements that could be linked to different important biological processes, such as protein biosynthesis, cell cycle control, photosynthesis and embryonic development. Moreover, studying the physical properties of some specific regulatory modules revealed that Arabidopsis promoters have a compact nature, with cooperative TFBSs located in close proximity of each other. Conclusion: These results create a starting point to unravel regulatory networks in plants and to study the regulation of biological processes from a systems biology point of view.
Vandepoele, K., Vlieghe, K., Florquin, K., Hennig, L., Beemster, G., Gruissem, W., … De Veylder, L. (2005). Genome-wide identification of potential plant E2F target genes. PLANT PHYSIOLOGY, 139(1), 316–328. https://doi.org/10.1104/pp.105.066290

Entry into the S phase of the cell cycle is controlled by E2F transcription factors that induce the transcription of genes required for cell cycle progression and DNA replication. Although the E2F pathway is highly conserved in higher eukaryotes, only a few E2F target genes have been experimentally validated in plants. We have combined microarray analysis and bioinformatics tools to identify plant E2F-responsive genes. Promoter regions of genes that were induced at the transcriptional level in Arabidopsis (Arabidopsis thaliana) seedlings ectopically expressing genes for the E2Fa and DPa transcription factors were searched for the presence of E2F-binding sites, resulting in the identification of 181 putative E2F target genes. In most cases, the E2F-binding element was located close to the transcription start site, but occasionally could also be localized in the 5' untranslated region. Comparison of our results with available microarray data sets from synchronized cell suspensions revealed that the E2F target genes were expressed almost exclusively during G1 and S phases and activated upon reentry of quiescent cells into the cell cycle. To test the robustness of the data for the Arabidopsis E2F target genes, we also searched for the presence of E2F-cis-acting elements in the promoters of the putative orthologous rice (Oryza sativa) genes. Using this approach, we identified 70 potential conserved plant E2F target genes. These genes encode proteins involved in cell cycle regulation, DNA replication, and chromatin dynamics. In addition, we identified several genes for potentially novel S phase regulatory proteins.
Vandepoele, K., & Van de Peer, Y. (2005). Exploring the plant transcriptome through phylogenetic profiling. PLANT PHYSIOLOGY, 137(1), 31–42. https://doi.org/10.1104/pp.104.054700

Publicly available protein sequences represent only a small fraction of the full catalog of genes encoded by the genomes of different plants, such as green algae, mosses, gymnosperms, and angiosperms. By contrast, an enormous amount of expressed sequence tags (ESTs) exists for a wide variety of plant species, representing a substantial part of all transcribed plant genes. Integrating protein and EST sequences in comparative and evolutionary analyses is not straightforward because of the heterogeneous nature of both types of sequence data. By combining information from publicly available EST and protein sequences for 32 different plant species, we identified more than 250,000 plant proteins organized in more than 12,000 gene families. Approximately 60% of the proteins are absent from current sequence databases but provide important new information about plant gene families. Analysis of the distribution of gene families over different plant species through phylogenetic profiling reveals interesting insights into plant gene evolution, and identifies species- and lineage-specific gene families, orphan genes, and conserved core genes across the green plant lineage. We counted a similar number of approximately 9,500 gene families in monocotyledonous and eudicotyledonous plants and found strong evidence for the existence of at least 33,700 genes in rice (Oryza sativa). Interestingly, the larger number of genes in rice compared to Arabidopsis (Arabidopsis thaliana) can partially be explained by a larger amount of species-specific single-copy genes and species-specific gene families. In addition, a majority of large gene families, typically containing more than 50 genes, are bigger in rice than Arabidopsis, whereas the opposite seems true for small gene families.
Paterson, A. H., Bowers, J. E., Van de Peer, Y., & Vandepoele, K. (2005). Ancient duplication of cereal genomes. https://doi.org/10.1111/j.1469-8137.2005.01347.x
Vandepoele, K. (2005). Mode and tempo of gene and genome evolution in plants. Ghent University. Faculty of Sciences, Ghent, Belgium.
Landrieu, I., da Costa, M., De Veylder, L., Dewitte, F., Vandepoele, K., Hassan, S., … Lippens, G. (2004). A small CDC25 dual-specificity tyrosine-phosphatase isoform in Arabidopsis thaliana. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 101(36), 13380–13385. https://doi.org/10.1073/pnas.0405248101

The dual-specificity CDC25 phosphatases are critical positive regulators of cyclin-dependent kinases (CDKs). Even though an antagonistic Arabidopsis thaliana WEE1 kinase has been cloned and tyrosine phosphorylation of its CDKs has been demonstrated, no valid candidate for a CDC25 protein has been reported in higher plants. We identify a CDC25-related protein (Arath;CDC25) of A. thaliana, constituted by a sole catalytic domain. The protein has a tyrosine-phosphatase activity and stimulates the kinase activity of Arabidopsis CDKs. Its tertiary structure was obtained by NMR spectroscopy and confirms that Arath;CDC25 belongs structurally to the classical CDC25 superfamily with a central five-stranded beta-sheet surrounded by helices. A particular feature of the protein, however, is the presence of an additional zinc-binding loop in the C-terminal part. NMR mapping studies revealed the interaction with phosphorylated peptidic models derived from the conserved CDK loop containing the phosphothreonine-14 and phosphotyrosine-15. We conclude that despite sequence divergence, Arath;CDC25 is structurally and functionally an isoform of the CDC25 superfamily, which is conserved in yeast and in plants, including Arabidopsis and rice.
Vercammen, D., Van De Cotte, B., De Jaeger, G., Eeckhout, D., Casteels, P., Vandepoele, K., … Van Breusegem, F. (2004). Type II metacaspases Atmc4 and Atmc9 of Arabidopsis thaliana cleave substrates after arginine and lysine. JOURNAL OF BIOLOGICAL CHEMISTRY, 279(44), 45329–45336. https://doi.org/10.1074/jbc.M406329200

Nine potential caspase counterparts, designated metacaspases, were identified in the Arabidopsis thaliana genome. Sequence analysis revealed two types of metacaspases, one with (type I) and one without (type II) a proline- or glutamine-rich N-terminal extension, possibly representing a prodomain. Production of recombinant Arabidopsis type II metacaspases in Escherichia coli resulted in cysteine-dependent autocatalytic processing of the proform into large and small subunits, in analogy to animal caspases. A detailed biochemical characterization with a broad range of synthetic oligopeptides and several protease inhibitors of purified recombinant proteins of both metacaspase 4 and 9 showed that both metacaspases are arginine/lysine-specific cysteine proteases and did not cleave caspase-specific synthetic substrates. These findings suggest that type II metacaspases are not directly responsible for earlier reported caspase-like activities in plants.
Simillion, C., Vandepoele, K., & Van de Peer, Y. (2004). Recent developments in computational approaches for uncovering genomic homology. BIOESSAYS, 26(11), 1225–1235. https://doi.org/10.1002/bies.20127
Gevers, D., Vandepoele, K., Simillion, C., & Van de Peer, Y. (2004). Gene duplication and biased functional retention of paralogs in bacterial genomes. TRENDS IN MICROBIOLOGY, 12(4), 148–154. https://doi.org/10.1016/j.tim.2004.02.007

Gene duplication is considered an important prerequisite for gene innovation that can facilitate adaptation to changing environments. The analysis of 106 bacterial genome sequences has revealed the existence of a significant number of paralogs. Analysis of the functional classification of these paralogs reveals a preferential enrichment in functional classes that are involved in transcription, metabolism and defense mechanisms. From the organization of paralogs in the genome we can conclude that duplicated genes in bacteria appear to have been mainly created by small-scale duplication events, such as tandem and operon duplications.
Vandepoele, K., De Vos, W., Taylor, J. S., Meyer, A., & Van de Peer, Y. (2004). Major events in the genome evolution of vertebrates: paranome age and size differ considerably between ray-finned fishes and land vertebrates. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 101(6), 1638–1643. https://doi.org/10.1073/pnas.0307968100

It has been suggested that fish have more genes than humans. Whether most of these additional genes originated through a complete (fish-specific) genome duplication or through many lineage-specific tandem gene or smaller block duplications and family expansions continues to be debated. We analyzed the complete genome of the pufferfish Takifugu rubripes (Fugu) and compared it with the paranome of humans. We show that most paralogous genes of Fugu are the result of three complete genome duplications. Both relative and absolute dating of the complete predicted set of protein-coding genes suggest that initial genome duplications, estimated to have occurred at least 600 million years ago, shaped the genome of all vertebrates. In addition, analysis of >150 block duplications in the Fugu genome clearly supports a fish-specific genome duplication (≈320 million years ago) that coincided with the vast radiation of most modern ray-finned fishes. Unlike the human genome, Fugu contains very few recently duplicated genes; hence, many human genes are much younger than fish genes. This lack of recent gene duplication, or, alternatively, the accelerated rate of gene loss, is possibly one reason for the drastic reduction of the genome size of Fugu observed during the past 100 million years or so, subsequent to the additional genome duplication that ray-finned fishes but not land vertebrates experienced.
Simillion, C., Vandepoele, K., Saeys, Y., & Van de Peer, Y. (2004). Building genomic profiles for uncovering segmental homology in the twilight zone. GENOME RESEARCH, 14(6), 1095–1106. https://doi.org/10.1101/gr.2179004

The identification of homologous regions within and between genomes is an essential prerequisite for studying genome structure and evolution. Different methods already exist that allow detecting homologous regions in an automated manner. These methods are based either on finding sequence similarities at the DNA level or on identifying chromosomal regions showing conservation of gene order and content. Especially the latter approach has proven useful for detecting homology between highly divergent chromosomal regions. However, until now, such map-based approaches required that candidate homologous regions show significant collinearity with other segments to be considered as being homologous. Here, we present a novel method that creates profiles combining the gene order and content information of multiple mutually homologous genomic segments. These profiles can be used to scan one or more genomes to detect segments that show significant collinearity with the entire profile but not necessarily with individual segments. When applying this new method to the combined genomes of Arabidopsis and rice, we find additional evidence for ancient duplication events in the rice genome.
Vandepoele, K., Simillion, C., & Van de Peer, Y. (2004). The quest for genomic homology. CURRENT GENOMICS, 5(4), 299–308. https://doi.org/10.2174/1389202043349237
Simillion, C., Vandepoele, K., Saeys, Y., & Van de Peer, Y. (2004). Building genomic profiles for uncovering segmental homology in the twilight zone. Belgian Bioinformatics Conference, 4th, Abstracts. Presented at the 4th Belgian Bioinformatics Conference (BBC 2004), Brussels, Belgium.
Breyne, P., Dreesen, R., Cannoot, B., Rombaut, D., Vandepoele, K., Rombauts, S., … Zabeau, M. (2003). Quantitative cDNA-AFLP analysis for genome-wide expression studies. MOLECULAR GENETICS AND GENOMICS, 269(2), 173–179. https://doi.org/10.1007/s00438-003-0830-6

An improved cDNA-AFLP method for genome-wide expression analysis has been developed. We demonstrate that this method is an efficient tool for quantitative transcript profiling and a valid alternative to microarrays. Unique transcript tags, generated from reverse-transcribed messenger RNA by restriction enzymes, were screened through a series of selective PCR amplifications. Based on in silico analysis, an enzyme combination was chosen that ensures that at least 60% of all the mRNAs were represented by an informative sequence tag. The sensitivity and specificity of the method allow one to detect poorly expressed genes and distinguish between homologous sequences. Accurate gene expression profiles were determined by quantitative analysis of band intensities, and subtle differences in transcriptional activity were revealed. A detailed screen for cell cycle-modulated genes in tobacco demonstrates the usefulness of the technology for genome-wide expression analysis.
Raes, J., Vandepoele, K., Simillion, C., Saeys, Y., & Van de Peer, Y. (2003). Investigating ancient duplication events in the Arabidopsis genome. JOURNAL OF STRUCTURAL AND FUNCTIONAL GENOMICS, 3(1–4), 117–129. https://doi.org/10.1023/A:1022666020026
Raes, J., Vandepoele, K., Simillion, C., Saeys, Y., & Van de Peer, Y. (2003). Investigating ancient duplication events in the Arabidopsis genome. In A. Meyer & Y. Van de Peer (Eds.), Genome evolution : gene and genome duplications and the origin of novel gene functions (pp. 117–129). https://doi.org/10.1007/978-94-010-0263-9_12
Vandepoele, K., Simillion, C., & Van de Peer, Y. (2003). Evidence that rice and other cereals are ancient aneuploids. PLANT CELL, 15(9), 2192–2202. https://doi.org/10.1105/tpc.014019

Detailed analyses of the genomes of several model organisms revealed that large-scale gene or even entire-genome duplications have played prominent roles in the evolutionary history of many eukaryotes. Recently, strong evidence has been presented that the genomic structure of the dicotyledonous model plant species Arabidopsis is the result of multiple rounds of entire-genome duplications. Here, we analyze the genome of the monocotyledonous model plant species rice, for which a draft of the genomic sequence was published recently. We show that a substantial fraction of all rice genes (~15%) are found in duplicated segments. Dating of these block duplications, their nonuniform distribution over the different rice chromosomes, and comparison with the duplication history of Arabidopsis suggest that rice is not an ancient polyploid, as suggested previously, but an ancient aneuploid that has experienced the duplication of one, or a large part of one, chromosome in its evolutionary past, ~70 million years ago. This date predates the divergence of most of the cereals, and relative dating by phylogenetic analysis shows that this duplication event is shared by most if not all of them.
Vandepoele, K., Raes, J., De Veylder, L., Rouzé, P., Rombauts, S., & Inzé, D. (2002). Genome-wide analysis of core cell cycle genes in Arabidopsis. PLANT CELL, 14(4), 903–916. https://doi.org/10.1105/tpc.010445

Cyclin-dependent kinases and cyclins regulate with the help of different interacting proteins the progression through the eukaryotic cell cycle. A high-quality, homology-based annotation protocol was applied to determine the core cell cycle genes in the recently completed Arabidopsis genome sequence. In total, 61 genes were identified belonging to seven selected families of cell cycle regulators, for which 30 are new or corrections of the existing annotation. A new class of putative cell cycle regulators was found that probably are competitors of E2F/DP transcription factors, which mediate the G1-to-S progression. In addition, the existing nomenclature for cell cycle genes of Arabidopsis was updated, and the physical positions of all genes were compared with segmentally duplicated blocks in the genome, showing that 22 core cell cycle genes emerged through block duplications. This genome-wide analysis illustrates the complexity of the plant cell cycle machinery and provides a tool for elucidating the function of new family members in the future.
Vandepoele, K., Saeys, Y., Simillion, C., Raes, J., & Van de Peer, Y. (2002). Detecting microcolinearity between Arabidopsis and Rice. Proceedings of the 6th Gatersleben Research Conference (2002), “Plant Genetic Resources in the Genomic Era: Genetic Diversity, Genome Evolution and New Applications”.
Breyne, P., Dreesen, R., Vandepoele, K., De Veylder, L., Van Breusegem, F., Callewaert, L., … Zabeau, M. (2002). Transcriptome analysis during cell division in plants. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 99(23), 14825–14830. https://doi.org/10.1073/pnas.222561199

Using synchronized tobacco Bright Yellow-2 cells and cDNA-amplified fragment length polymorphism-based genomewide expression analysis, we built a comprehensive collection of plant cell cycle-modulated genes. Approximately 1,340 periodically expressed genes were identified, including known cell cycle control genes as well as numerous unique candidate regulatory genes. A number of plant-specific genes were found to be cell cycle modulated. Other transcript tags were derived from unknown plant genes showing homology to cell cycle-regulatory genes of other organisms. Many of the genes encode novel or uncharacterized proteins, indicating that several processes underlying cell division are still largely unknown.
Simillion, C., Vandepoele, K., Van Montagu, M., Zabeau, M., & Van de Peer, Y. (2002). The hidden duplication past of Arabidopsis thaliana. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 99(21), 13627–13632. https://doi.org/10.1073/pnas.212522399

Analysis of the genome sequence of Arabidopsis thaliana shows that this genome, like that of many other eukaryotic organisms, has undergone large-scale gene duplications or even duplications of the entire genome. However, the high frequency of gene loss after duplication events reduces colinearity and therefore the chance of finding duplicated regions that, at the extreme, no longer share homologous genes. In this study we show that heavily degenerated block duplications that can no longer be recognized by directly comparing two segments because of differential gene loss, can still be detected through indirect comparison with other segments. When these so-called hidden duplications in Arabidopsis are taken into account, many homologous genomic regions can be found in five to eight copies. This finding strongly implies that Arabidopsis has undergone three, but probably no more, rounds of genome duplications. Therefore, adding such hidden blocks to the duplication landscape of Arabidopsis sheds light on the number of polyploidy events that this model plant genome has undergone in its evolutionary past.
Vandepoele, K., Saeys, Y., Simillion, C., Raes, J., & Van de Peer, Y. (2002). The automatic detection of homologous regions (ADHoRe) and its application to microcolinearity between Arabidopsis and rice. GENOME RESEARCH, 12(11), 1792–1801. https://doi.org/10.1101/gr.400202

It is expected that one of the merits of comparative genomics lies in the transfer of structural and functional information from one genome to another. This is based on the observation that, although the number of chromosomal rearrangements that occur in genomes is extensive, different species still exhibit a certain degree of conservation regarding gene content and gene order. It is in this respect that we have developed a new software tool for the Automatic Detection of Homologous Regions (ADHoRe). ADHoRe was primarily developed to find large regions of microcolinearity, taking into account different types of microrearrangements such as tandem duplications, gene loss and translocations, and inversions. Such rearrangements often complicate the detection of colinearity, in particular when comparing more anciently diverged species. Application of ADHoRe to the complete genome of Arabidopsis and a large collection of concatenated rice BACs yields more than 20 regions showing statistically significant microcolinearity between both plant species. These regions comprise from 4 up to 11 conserved homologous gene pairs. We predict the number of homologous regions and the extent of microcolinearity to increase significantly once better annotations of the rice genome become available.
Vandepoele, K., Simillion, C., & Van de Peer, Y. (2002). Detecting the undetectable : uncovering duplicated segments in Arabidopsis by comparison with rice. TRENDS IN GENETICS, 18(12), 606–608.

Genome analysis shows that large-scale gene duplications have occurred in fungi, animals and plants, creating genomic regions that show similarity in gene content and order. However, the high frequency of gene loss reduces colinearity, resulting in duplicated regions that, in the extreme, no longer share homologous genes. Here, we show that by comparison with an appropriate second genome, such paralogous regions can still be identified.