Michiel Van Bel

Title: Postdoc
Project: PLAZA comparative genomics platform

Shared with Applied Bioinformatics and Biostatistics core service group

Publications

  1. Van Bel, M., Bucchini, F., & Vandepoele, K. (2019). Gene space completeness in complex plant genomes. (S. Kelly, Ed.) CURRENT OPINION IN PLANT BIOLOGY, 48, 9–17.
    Genome annotations offer ample opportunities to study gene functions, biochemical and regulatory pathways, or quantitative trait loci in plants. Determining the quality and completeness of a genome annotation, and maintaining the balance between them, are major challenges, even for genomes of well-studied model organisms. In this review, we present a historical overview of the complexity in different plant genomes and discuss the hurdles and possible solutions in obtaining a complete and high-quality genome annotation. We illustrate there is no clear-cut answer to solve these challenges for different gene types, but provide tips on guiding the iterative process of generating a superior genome annotation, which is a moving target as our knowledge about plant genomics increases and additional data sources become available.
  2. Van Leene, J., Han, C., Gadeyne, A., Eeckhout, D., Matthijs, C., Cannoot, B., De Winne, N., et al. (2019). Capturing the phosphorylation and protein interaction landscape of the plant TOR kinase. NATURE PLANTS, 5, 316–327.
    The target of rapamycin (TOR) kinase is a conserved regulatory hub that translates environmental and nutritional information into permissive or restrictive growth decisions. Despite the increased appreciation of the essential role of the TOR complex in plants, no large-scale phosphoproteomics or interactomics studies have been performed to map TOR signalling events in plants. To fill this gap, we combined a systematic phosphoproteomics screen with a targeted protein complex analysis in the model plant Arabidopsis thaliana. Integration of the phosphoproteome and protein complex data on the one hand shows that both methods reveal complementary subspaces of the plant TOR signalling network, enabling proteome-wide discovery of both upstream and downstream network components. On the other hand, the overlap between both data sets reveals a set of candidate direct TOR substrates. The integrated network embeds both evolutionarily-conserved and plant-specific TOR signalling components, uncovering an intriguing complex interplay with protein synthesis. Overall, the network provides a rich data set to start addressing fundamental questions about how TOR controls key processes in plants, such as autophagy, auxin signalling, chloroplast development, lipid metabolism, nucleotide biosynthesis, protein translation or senescence.
  3. De Clerck, Olivier, Kao, S.-M., Bogaert, K., Blomme, J., Foflonker, F., Kwantes, M., Vancaester, E., et al. (2018). Insights into the evolution of multicellularity from the sea lettuce genome. CURRENT BIOLOGY, 28(18), 2921–2933.
    We report here the 98.5 Mbp haploid genome (12,924 protein coding genes) of Ulva mutabilis, a ubiquitous and iconic representative of the Ulvophyceae or green seaweeds. Ulva's rapid and abundant growth makes it a key contributor to coastal biogeochemical cycles; its role in marine sulfur cycles is particularly important because it produces high levels of dimethylsulfoniopropionate (DMSP), the main precursor of volatile dimethyl sulfide (DMS). Rapid growth makes Ulva attractive biomass feedstock but also increasingly a driver of nuisance "green tides." Ulvophytes are key to understanding the evolution of multicellularity in the green lineage, and Ulva morphogenesis is dependent on bacterial signals, making it an important species with which to study cross-kingdom communication. Our sequenced genome informs these aspects of ulvophyte cell biology, physiology, and ecology. Gene family expansions associated with multicellularity are distinct from those of freshwater algae. Candidate genes, including some that arose following horizontal gene transfer from chromalveolates, are present for the transport and metabolism of DMSP. The Ulva genome offers, therefore, new opportunities to understand coastal and marine ecosystems and the fundamental evolution of the green lineage.
  4. Lang, D., Ullrich, K. K., Murat, F., Fuchs, J., Jenkins, J., Haas, F. B., Piednoel, M., et al. (2018). The Physcomitrella patens chromosome-scale assembly reveals moss genome structure and evolution. PLANT JOURNAL, 93(3), 515–533.
    The draft genome of the moss model, Physcomitrella patens, comprised approximately 2000 unordered scaffolds. In order to enable analyses of genome structure and evolution we generated a chromosome-scale genome assembly using genetic linkage as well as (end) sequencing of long DNA fragments. We find that 57% of the genome comprises transposable elements (TEs), some of which may be actively transposing during the life cycle. Unlike in flowering plant genomes, gene-and TE-rich regions show an overall even distribution along the chromosomes. However, the chromosomes are mono-centric with peaks of a class of Copia elements potentially coinciding with centromeres. Gene body methylation is evident in 5.7% of the protein-coding genes, typically coinciding with low GC and low expression. Some giant virus insertions are transcriptionally active and might protect gametes from viral infection via siRNA mediated silencing. Structure-based detection methods show that the genome evolved via two rounds of whole genome duplications (WGDs), apparently common in mosses but not in liverworts and hornworts. Several hundred genes are present in colinear regions conserved since the last common ancestor of plants. These syntenic regions are enriched for functions related to plant-specific cell growth and tissue organization. The P. patens genome lacks the TE-rich pericentromeric and gene-rich distal regions typical for most flowering plant genomes. More non-seed plant genomes are needed to unravel how plant genomes evolve, and to understand whether the P. patens genome structure is typical for mosses or bryophytes.
  5. Van Bel, M., Diels, T., Vancaester, E., Kreft, L., Botzki, A., Van de Peer, Y., Coppens, F., et al. (2018). PLAZA 4.0 : an integrative resource for functional, evolutionary and comparative plant genomics. NUCLEIC ACIDS RESEARCH, 46(D1), D1190–D1196.
    PLAZA (https://bioinformatics.psb.ugent.be/plaza) is a plant-oriented online resource for comparative, evolutionary and functional genomics. The PLAZA platform consists of multiple independent instances focusing on different plant clades, while also providing access to a consistent set of reference species. Each PLAZA instance contains structural and functional gene annotations, gene family data and phylogenetic trees and detailed gene colinearity information. A user-friendly web interface makes the necessary tools and visualizations accessible, specific for each data type. Here we present PLAZA 4.0, the latest iteration of the PLAZA framework. This version consists of two new instances (Dicots 4.0 and Monocots 4.0) providing a large increase in newly available species, and offers access to updated and newly implemented tools and visualizations, helping users with the ever-increasing demands for complex and in-depth analyses. The total number of species across both instances nearly doubles from 37 species in PLAZA 3.0 to 71 species in PLAZA 4.0, with a much broader coverage of crop species (e.g. wheat, palm oil) and species of evolutionary interest (e.g. spruce, Marchantia). The new PLAZA instances can also be accessed by a programming interface through a RESTful web service, thus allowing bioinformaticians to optimally leverage the power of the PLAZA platform.
  6. Tasdighian, S., Van Bel, M., Li, Z., Van de Peer, Y., Carretero-Paulet, L., & Maere, S. (2017). Reciprocally retained genes in the angiosperm lineage show the hallmarks of dosage balance sensitivity. PLANT CELL, 29(11), 2766–2785.
    In several organisms, particular functional categories of genes, such as regulatory and complex-forming genes, are preferentially retained after whole-genome multiplications but rarely duplicate through small-scale duplication, a pattern referred to as reciprocal retention. This peculiar duplication behavior is hypothesized to stem from constraints on the dosage balance between the genes concerned and their interaction context. However, the evidence for a relationship between reciprocal retention and dosage balance sensitivity remains fragmentary. Here, we identified which gene families are most strongly reciprocally retained in the angiosperm lineage and studied their functional and evolutionary characteristics. Reciprocally retained gene families exhibit stronger sequence divergence constraints and lower rates of functional and expression divergence than other gene families, suggesting that dosage balance sensitivity is a general characteristic of reciprocally retained genes. Gene families functioning in regulatory and signaling processes are much more strongly represented at the top of the reciprocal retention ranking than those functioning in multiprotein complexes, suggesting that regulatory imbalances may lead to stronger fitness effects than classical stoichiometric protein complex imbalances. Finally, reciprocally retained duplicates are often subject to dosage balance constraints for prolonged evolutionary times, which may have repercussions for the ease with which genome multiplications can engender evolutionary innovation.
  7. Schmidt, Martin, Van Bel, M., Woloszynska, M., Slabbinck, B., Martens, C., De Block, M., Coppens, F., et al. (2017). Plant-RRBS, a bisulfite and next-generation sequencing-based methylome profiling method enriching for coverage of cytosine positions. BMC PLANT BIOLOGY, 17.
    Background: Cytosine methylation in plant genomes is important for the regulation of gene transcription and transposon activity. Genome-wide methylomes are studied upon mutation of the DNA methyltransferases, adaptation to environmental stresses or during development. However, from basic biology to breeding programs, there is a need to monitor multiple samples to determine transgenerational methylation inheritance or differential cytosine methylation. Methylome data obtained by sodium hydrogen sulfite (bisulfite)-conversion and next-generation sequencing (NGS) provide genome-wide information on cytosine methylation. However, a profiling method that detects cytosine methylation state dispersed over the genome would allow high-throughput analysis of multiple plant samples with distinct epigenetic signatures. We use specific restriction endonucleases to enrich for cytosine coverage in a bisulfite and NGS-based profiling method, which was compared to whole-genome bisulfite sequencing of the same plant material. Methods: We established an effective methylome profiling method in plants, termed plant-reduced representation bisulfite sequencing (plant-RRBS), using optimized double restriction endonuclease digestion, fragment end repair, adapter ligation, followed by bisulfite conversion, PCR amplification and NGS. We report a performant laboratory protocol and a straightforward bioinformatics data analysis pipeline for plant-RRBS, applicable for any reference-sequenced plant species. Results: As a proof of concept, methylome profiling was performed using an Oryza sativa ssp. indica pure breeding line and a derived epigenetically altered line (epiline). Plant-RRBS detects methylation levels at tens of millions of cytosine positions deduced from bisulfite conversion in multiple samples. To evaluate the method, the coverage of cytosine positions, the intra-line similarity and the differential cytosine methylation levels between the pure breeding line and the epiline were determined. Plant-RRBS reproducibly covers commonly up to one fourth of the cytosine positions in the rice genome when using MspI-DpnII within a group of five biological replicates of a line. The method predominantly detects cytosine methylation in putative promoter regions and not-annotated regions in rice. Conclusions: Plant-RRBS offers high-throughput and broad, genome-dispersed methylation detection by effective read number generation obtained from reproducibly covered genome fractions using optimized endonuclease combinations, facilitating comparative analyses of multi-sample studies for cytosine methylation and transgenerational stability in experimental material and plant breeding populations.
  8. Vu, L. D., Verstraeten, I., Stes, E., Van Bel, M., Coppens, F., Gevaert, K., & De Smet, I. (2017). Proteome profiling of wheat shoots from different cultivars. FRONTIERS IN PLANT SCIENCE, 8.
    Wheat is a cereal grain and one of the world's major food crops. Recent advances in wheat genome sequencing are by now facilitating its genomic and proteomic analyses. However, little is known about possible differences in total protein levels of hexaploid versus tetraploid wheat cultivars, and also knowledge of phosphorylated wheat proteins is still limited. Here, we performed a detailed analysis of the proteome of seedling leaves from two hexaploid wheat cultivars (Triticum aestivum L. Pavon 76 and USU-Apogee) and one tetraploid wheat (T. turgidum ssp. durum cv. Senatore Cappelli). Our shotgun proteomics data revealed that, whereas we observed some significant differences, overall a high similarity between hexaploid and tetraploid varieties with respect to protein abundance was observed. In addition, already at the seedling stage, a small set of proteins was differential between the small (USU-Apogee) and larger hexaploid wheat cultivars (Pavon 76), which could potentially act as growth predictors. Finally, the phosphosites identified in this study can be retrieved from the in-house developed plant PTM-Viewer (bioinformatics.psb.ugent.be/webtools/ptm_viewer/), making this the first searchable repository for phosphorylated wheat proteins. This paves the way for further in depth, quantitative (phospho) proteome-wide differential analyses upon a specific trigger or environmental change.
  9. Van Bel, M., & Coppens, F. (2017). Exploring plant co-expression and gene-gene Interactions with CORNET 3.0. In A. D. van Dijk (Ed.), Plant genomics databases : methods and protocols (Vol. 1533, pp. 201–212). New York, NY, USA: Springer.
    Selecting and filtering a reference expression and interaction dataset when studying specific pathways and regulatory interactions can be a very time-consuming and error-prone task. In order to reduce the duplicated efforts required to amass such datasets, we have created the CORNET (CORrelation NETworks) platform which allows for easy access to a wide variety of data types: coexpression data, protein-protein interactions, regulatory interactions, and functional annotations. The CORNET platform outputs its results in either text format or through the Cytoscape framework, which is automatically launched by the CORNET website. CORNET 3.0 is the third iteration of the web platform designed for the user exploration of the coexpression space of plant genomes, with a focus on the model species Arabidopsis thaliana. Here we describe the platform: the tools, data, and best practices when using the platform. We indicate how the platform can be used to infer networks from a set of input genes, such as upregulated genes from an expression experiment. By exploring the network, new target and regulator genes can be discovered, allowing for follow-up experiments and more in-depth study. We also indicate how to avoid common pitfalls when evaluating the networks and how to avoid overinterpretation of the results. All CORNET versions are available at http://bioinformatics.psb.ugent.be/cornet/ .
  10. Kreft, L., Botzki, A., Coppens, F., Vandepoele, K., & Van Bel, M. (2017). PhyD3 : a phylogenetic tree viewer with extended phyloXML support for functional genomics data visualization. BIOINFORMATICS, 33(18), 2946–2947.
    Motivation: Comparative and evolutionary studies utilize phylogenetic trees to analyze and visualize biological data. Recently, several web-based tools for the display, manipulation and annotation of phylogenetic trees, such as iTOL and Evolview, have released updates to be compatible with the latest web technologies. While those web tools operate an open server access model with a multitude of registered users, a feature-rich open source solution using current web technologies is not available. Results: Here, we present an extension of the widely used PhyloXML standard with several new options to accommodate functional genomics or annotation datasets for advanced visualization. Furthermore, PhyD3 has been developed as a lightweight tool using the JavaScript library D3.js to achieve a state-of-the-art phylogenetic tree visualization in the web browser, with support for advanced annotations. The current implementation is open source, easily adaptable and easy to implement in third parties' web sites. Availability and implementation: More information about PhyD3 itself, installation procedures and implementation links are available at http://phyd3.bits.vib.be and at http://github.com/vibbits/phyd3/. Supplementary information: Supplementary data are available at Bioinformatics online.
  11. Vu, L. D., Stes, E., Van Bel, M., Nelissen, H., Maddelein, D., Inzé, D., Coppens, F., et al. (2016). Up-to-date workflow for plant (phospho)proteomics identifies differential drought-responsive phosphorylation events in maize leaves. JOURNAL OF PROTEOME RESEARCH, 15(12), 4304–4317.
    Protein phosphorylation is one of the most common post-translational modifications (PTMs), which can regulate protein activity and localization as well as protein-protein interactions in numerous cellular processes. Phosphopeptide enrichment techniques enable plant researchers to acquire insight into phosphorylation-controlled signaling networks in various plant species. Most phosphoproteome analyses of plant samples still involve stable isotope labeling, peptide fractionation, and demand a lot of mass spectrometry (MS) time. Here, we present a simple workflow to probe, map, and catalogue plant phosphoproteomes, requiring relatively low amounts of starting material, no labeling, no fractionation, and no excessive analysis time. Following optimization of the different experimental steps on Arabidopsis thaliana samples, we transferred our workflow to maize, a major monocot crop, to study signaling upon drought stress. In addition, we included normalization to protein abundance to identify true phosphorylation changes. Overall, we identified a set of new phosphosites in both Arabidopsis thaliana and maize, some of which are differentially phosphorylated upon drought. All data are available via ProteomeXchange with identifier PXD003634, but to provide easy access to our model plant and crop data sets, we created an online database, Plant PTM Viewer (bioinformatics.psb.ugent.be/webtools/ptm_viewer/), where all phosphosites identified in our study can be consulted.
  12. Van de Velde, Jan, Van Bel, M., Vaneechoutte, D., & Vandepoele, K. (2016). A collection of conserved noncoding sequences to study gene regulation in flowering plants. PLANT PHYSIOLOGY, 171(4), 2586–2598.
    Transcription factors (TFs) regulate gene expression by binding cis-regulatory elements, of which the identification remains an ongoing challenge owing to the prevalence of large numbers of nonfunctional TF binding sites. Powerful comparative genomics methods, such as phylogenetic footprinting, can be used for the detection of conserved noncoding sequences (CNSs), which are functionally constrained and can greatly help in reducing the number of false-positive elements. In this study, we applied a phylogenetic footprinting approach for the identification of CNSs in 10 dicot plants, yielding 1,032,291 CNSs associated with 243,187 genes. To annotate CNSs with TF binding sites, we made use of binding site information for 642 TFs originating from 35 TF families in Arabidopsis (Arabidopsis thaliana). In three species, the identified CNSs were evaluated using TF chromatin immunoprecipitation sequencing data, resulting in significant overlap for the majority of data sets. To identify ultraconserved CNSs, we included genomes of additional plant families and identified 715 binding sites for 501 genes conserved in dicots, monocots, mosses, and green algae. Additionally, we found that genes that are part of conserved mini-regulons have a higher coherence in their expression profile than other divergent gene pairs. All identified CNSs were integrated in the PLAZA 3.0 Dicots comparative genomics platform (http://bioinformatics.psb.ugent.be/plaza/versions/plaza_v3_dicots/) together with new functionalities facilitating the exploration of conserved cis-regulatory elements and their associated genes. The availability of this data set in a user-friendly platform enables the exploration of functional noncoding DNA to study gene regulation in a variety of plant species, including crops.
  13. Walton, A., Stes, E., Cybulski, N., Van Bel, M., Iñigo, S., Nagels Durand, A., Timmerman, E., et al. (2016). It’s time for some “site”-seeing: novel tools to monitor the ubiquitin landscape in Arabidopsis thaliana. PLANT CELL, 28(1), 6–16.
    Ubiquitination, the covalent binding of the small protein modifier ubiquitin to a target protein, is an important and frequently studied posttranslational protein modification. Multiple reports provide useful insights into the plant ubiquitinome, but mostly at the protein level without comprehensive site identification. Here, we implemented ubiquitin combined fractional diagonal chromatography (COFRADIC) for proteome-wide ubiquitination site mapping on Arabidopsis thaliana cell cultures. We identified 3009 sites on 1607 proteins, thereby greatly increasing the number of known ubiquitination sites in this model plant. Finally, The Ubiquitination Site tool (http://bioinformatics.psb.ugent.be/webtools/ubiquitin_viewer/) gives access to the obtained ubiquitination sites, not only to consult the ubiquitination status of a given protein, but also to conduct intricate experiments aiming to study the roles of specific ubiquitination events. Together with the antibodies recognizing the ubiquitin remnant motif, ubiquitin COFRADIC represents a powerful tool to resolve the ubiquitination maps of numerous cellular processes in plants.
  14. Yue, K., Sandal, P., Williams, E. L., Murphy, E., Stes, E., Nikonorova, N., Ramakrishna, P., et al. (2016). PP2A-3 interacts with ACR4 and regulates formative cell division in the Arabidopsis root. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 113(5), 1447–1452.
    In plants, the generation of new cell types and tissues depends on coordinated and oriented formative cell divisions. The plasma membrane-localized receptor kinase ARABIDOPSIS CRINKLY 4 (ACR4) is part of a mechanism controlling formative cell divisions in the Arabidopsis root. Despite its important role in plant development, very little is known about the molecular mechanism with which ACR4 is affiliated and its network of interactions. Here, we used various complementary proteomic approaches to identify ACR4-interacting protein candidates that are likely regulators of formative cell divisions and that could pave the way to unraveling the molecular basis behind ACR4-mediated signaling. We identified PROTEIN PHOSPHATASE 2A-3 (PP2A-3), a catalytic subunit of PP2A holoenzymes, as a previously unidentified regulator of formative cell divisions and as one of the first described substrates of ACR4. Our in vitro data argue for the existence of a tight posttranslational regulation in the associated biochemical network through reciprocal regulation between ACR4 and PP2A-3 at the phosphorylation level.
  15. Nelissen, H., Eeckhout, D., Demuynck, K., Persiau, G., Walton, A., Van Bel, M., Vervoort, M., et al. (2015). Dynamic changes in ANGUSTIFOLIA3 complex composition reveal a growth regulatory mechanism in the maize leaf. PLANT CELL, 27(6), 1605–1619.
    Most molecular processes during plant development occur with a particular spatio-temporal specificity. Thus far, it has remained technically challenging to capture dynamic protein-protein interactions within a growing organ, where the interplay between cell division and cell expansion is instrumental. Here, we combined high-resolution sampling of the growing maize (Zea mays) leaf with tandem affinity purification followed by mass spectrometry. Our results indicate that the growth-regulating SWI/SNF chromatin remodeling complex associated with ANGUSTIFOLIA3 (AN3) was conserved within growing organs and between dicots and monocots. Moreover, we were able to demonstrate the dynamics of the AN3-interacting proteins within the growing leaf, since copurified GROWTH-REGULATING FACTORs (GRFs) varied throughout the growing leaf. Indeed, GRF1, GRF6, GRF7, GRF12, GRF15, and GRF17 were significantly enriched in the division zone of the growing leaf, while GRF4 and GRF10 levels were comparable between division zone and expansion zone in the growing leaf. These dynamics were also reflected at the mRNA and protein levels, indicating tight developmental regulation of the AN3-associated chromatin remodeling complex. In addition, the phenotypes of maize plants overexpressing miRNA396a-resistant GRF1 support a model proposing that distinct associations of the chromatin remodeling complex with specific GRFs tightly regulate the transition between cell division and cell expansion. Together, our data demonstrate that advancing from static to dynamic protein-protein interaction analysis in a growing organ adds insights in how developmental switches are regulated.
  16. Proost, S., Van Bel, M., Vaneechoutte, D., Van de Peer, Y., Inzé, D., Mueller-Roeber, B., & Vandepoele, K. (2015). PLAZA 3.0 : an access point for plant comparative genomics. NUCLEIC ACIDS RESEARCH, 43(D1), D974–D981.
    Comparative sequence analysis has significantly altered our view on the complexity of genome organization and gene functions in different kingdoms. PLAZA 3.0 is designed to make comparative genomics data for plants available through a user-friendly web interface. Structural and functional annotation, gene families, protein domains, phylogenetic trees and detailed information about genome organization can easily be queried and visualized. Compared with the first version released in 2009, which featured nine organisms, the number of integrated genomes is more than four times higher, and now covers 37 plant species. The new species provide a wider phylogenetic range as well as a more in-depth sampling of specific clades, and genomes of additional crop species are present. The functional annotation has been expanded and now comprises data from Gene Ontology, MapMan, UniProtKB/Swiss-Prot, PlnTFDB and PlantTFDB. Furthermore, we improved the algorithms to transfer functional annotation from well-characterized plant genomes to other species. The additional data and new features make PLAZA 3.0 (http://bioinformatics.psb.ugent.be/plaza/) a versatile and comprehensible resource for users wanting to explore genome information to study different aspects of plant biology, both in model and non-model organisms.
  17. Olvera Carrillo, Y., Van Bel, M., Van Hautegem, T., Fendrych, M., Huysmans, M., Braun Šimášková, M., Van Durme, M., et al. (2015). A conserved core of programmed cell death indicator genes discriminates developmentally and environmentally induced programmed cell death in plants. PLANT PHYSIOLOGY, 169(4), 2684–2699.
    A plethora of diverse programmed cell death (PCD) processes has been described in living organisms. In animals and plants, different forms of PCD play crucial roles in development, immunity, and responses to the environment. While the molecular control of some animal PCD forms such as apoptosis is known in great detail, we still know comparatively little about the regulation of the diverse types of plant PCD. In part, this deficiency in molecular understanding is caused by the lack of reliable reporters to detect PCD processes. Here, we addressed this issue by using a combination of bioinformatics approaches to identify commonly regulated genes during diverse plant PCD processes in Arabidopsis (Arabidopsis thaliana). Our results indicate that the transcriptional signatures of developmentally controlled cell death are largely distinct from the ones associated with environmentally induced cell death. Moreover, different cases of developmental PCD share a set of cell death-associated genes. Most of these genes are evolutionary conserved within the green plant lineage, arguing for an evolutionary conserved core machinery of developmental PCD. Based on this information, we established an array of specific promoter-reporter lines for developmental PCD in Arabidopsis. These PCD indicators represent a powerful resource that can be used in addition to established morphological and biochemical methods to detect and analyze PCD processes in vivo and in planta.
  18. De Witte, D., Van de Velde, J., Decap, D., Van Bel, M., Audenaert, P., Demeester, P., Dhoedt, B., et al. (2015). BLSSpeller : exhaustive comparative discovery of conserved cis-regulatory elements. BIOINFORMATICS, 31(23), 3758–3766.
    Motivation: The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected. Results: We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O. sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z. mays.
  19. Vargas, L., Santa Brigida, A. B., Mota Filho, J. P., de Carvalho, T. G., Rojas, C. A., Vaneechoutte, D., Van Bel, M., et al. (2014). Drought tolerance conferred to sugarcane by association with Gluconacetobacter diazotrophicus: a transcriptomic view of hormone pathways. PLOS ONE, 9(12).
    Sugarcane interacts with particular types of beneficial nitrogen-fixing bacteria that provide fixed-nitrogen and plant growth hormones to host plants, promoting an increase in plant biomass. Other benefits, as enhanced tolerance to abiotic stresses have been reported to some diazotrophs. Here we aim to study the effects of the association between the diazotroph Gluconacetobacter diazotrophicus PAL5 and sugarcane cv. SP70-1143 during water depletion by characterizing differential transcriptome profiles of sugarcane. RNA-seq libraries were generated from roots and shoots of sugarcane plants free of endophytes that were inoculated with G. diazotrophicus and subjected to water depletion for 3 days. A sugarcane reference transcriptome was constructed and used for the identification of differentially expressed transcripts. The differential profile of non-inoculated SP70-1143 suggests that it responds to water deficit stress by the activation of drought-responsive markers and hormone pathways, as ABA and Ethylene. qRT-PCR revealed that root samples had higher levels of G. diazotrophicus 3 days after water deficit, compared to roots of inoculated plants watered normally. With prolonged drought only inoculated plants survived, indicating that SP70-1143 plants colonized with G. diazotrophicus become more tolerant to drought stress than non-inoculated plants. Strengthening this hypothesis, several gene expression responses to drought were inactivated or regulated in an opposite manner, especially in roots, when plants were colonized by the bacteria. The data suggests that colonized roots would not be suffering from stress in the same way as non-inoculated plants. On the other hand, shoots specifically activate ABA-dependent signaling genes, which could act as key elements in the drought resistance conferred by G. diazotrophicus to SP70-1143. This work reports for the first time the involvement of G. diazotrophicus in the promotion of drought-tolerance to sugarcane cv. SP70-1143, and it describes the initial molecular events that may trigger the increased drought tolerance in the host plant.
  20. Sonnhammer, E. L., Gabaldón, T., da Silva, A. W. S., Martin, M., Robinson-Rechavi, M., Boeckmann, B., Thomas, P. D., et al. (2014). Big data and other challenges in the quest for orthologs. BIOINFORMATICS.
    Given the rapid increase of species with a sequenced genome, the need to identify orthologous genes between them has emerged as a central bioinformatics task. Many different methods exist for orthology detection, which makes it difficult to decide which one to choose for a particular application. Here, we review the latest developments and issues in the orthology field, and summarize the most recent results reported at the third 'Quest for Orthologs' meeting. We focus on community efforts such as the adoption of reference proteomes, standard file formats and benchmarking. Progress in these areas is good, and they are already beneficial to both orthology consumers and providers. However, a major current issue is that the massive increase in complete proteomes poses computational challenges to many of the ortholog database providers, as most orthology inference algorithms scale at least quadratically with the number of proteomes. The Quest for Orthologs consortium is an open community with a number of working groups that join efforts to enhance various aspects of orthology analysis, such as defining standard formats and datasets, documenting community resources and benchmarking.
  21. De Witte, D., Van Bel, M., Audenaert, P., Demeester, P., Dhoedt, B., Vandepoele, K., & Fostier, J. (2014). A parallel, distributed-memory framework for comparative motif discovery. In R. Wyrzykowski, J. Dongarra, K. Karczewski, & J. Wasniewski (Eds.), Lecture Notes in Computer Science (Vol. 8385, pp. 268–277). Presented at the 10th International Conference on Parallel Processing and Applied Mathematics (PPAM), Springer.
    The increasing number of sequenced organisms has opened new possibilities for the computational discovery of cis-regulatory elements ('motifs') based on phylogenetic footprinting. Word-based, exhaustive approaches are among the best performing algorithms, however, they pose significant computational challenges as the number of candidate motifs to evaluate is very high. In this contribution, we describe a parallel, distributed-memory framework for de novo comparative motif discovery. Within this framework, two approaches for phylogenetic footprinting are implemented: an alignment-based and an alignment-free method. The framework is able to statistically evaluate the conservation of motifs in a search space containing over 160 million candidate motifs using a distributed-memory cluster with 200 CPU cores in a few hours. Software available from http://bioinformatics.intec.ugent.be/blsspeller/
  22. Van Bel, M., Proost, S., Van Neste, C., Deforce, D., Van de Peer, Y., & Vandepoele, K. (2013). TRAPID : an efficient online tool for the functional and comparative analysis of de novo RNA-Seq transcriptomes. GENOME BIOLOGY, 14(12).
    Transcriptome analysis through next-generation sequencing technologies allows the generation of detailed gene catalogs for non-model species, at the cost of new challenges with regards to computational requirements and bioinformatics expertise. Here, we present TRAPID, an online tool for the fast and efficient processing of assembled RNA-Seq transcriptome data, developed to mitigate these challenges. TRAPID offers high-throughput open reading frame detection, frameshift correction and includes a functional, comparative and phylogenetic toolbox, making use of 175 reference proteomes. Benchmarking and comparison against state-of-the-art transcript analysis tools reveals the efficiency and unique features of the TRAPID system.
  23. Vandepoele, Klaas, Van Bel, M., Richard, G., Van Landeghem, S., Verhelst, B., Moreau, H., Van de Peer, Y., et al. (2013). pico-PLAZA, a genome database of microbial photosynthetic eukaryotes. ENVIRONMENTAL MICROBIOLOGY, 15(8), 2147–2153.
    With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichment analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains.
  24. De Witte, D., Van de Velde, J., Van Bel, M., Audenaert, P., Demeester, P., Dhoedt, B., Vandepoele, K., et al. (2013). Comparative motif discovery in the cloud. Benelux Bioinformatics Conference 2013, Abstracts. Presented at the Benelux Bioinformatics Conference 2013.
  25. Dessimoz, C., Gabaldón, T., Roos, D. S., Sonnhammer, E. L., Herrero, J., Quest Orthologs Consortium, the, Vandepoele, K., et al. (2012). Toward community standards in the quest for orthologs. BIOINFORMATICS, 28(6), 900–904.
  26. Van Bel, M., Proost, S., Wischnitzki, E., Movahedi, S., Scheerlinck, C., Van de Peer, Y., & Vandepoele, K. (2012). Dissecting plant genomes with the PLAZA comparative genomics platform. PLANT PHYSIOLOGY, 158(2), 590–600.
    With the arrival of low-cost, next-generation sequencing, a multitude of new plant genomes are being publicly released, providing unseen opportunities and challenges for comparative genomics studies. Here, we present PLAZA 2.5, a user-friendly online research environment to explore genomic information from different plants. This new release features updates to previous genome annotations and a substantial number of newly available plant genomes as well as various new interactive tools and visualizations. Currently, PLAZA hosts 25 organisms covering a broad taxonomic range, including 13 eudicots, five monocots, one lycopod, one moss, and five algae. The available data consist of structural and functional gene annotations, homologous gene families, multiple sequence alignments, phylogenetic trees, and colinear regions within and between species. A new Integrative Orthology Viewer, combining information from different orthology prediction methodologies, was developed to efficiently investigate complex orthology relationships. Cross-species expression analysis revealed that the integration of complementary data types extended the scope of complex orthology relationships, especially between more distantly related species. Finally, based on phylogenetic profiling, we propose a set of core gene families within the green plant lineage that will be instrumental to assess the gene space of draft or newly sequenced plant genomes during the assembly or annotation phase.
  27. Movahedi, S., Van Bel, M., Heyndrickx, K., & Vandepoele, K. (2012). Comparative co-expression analysis in plant biology. PLANT CELL AND ENVIRONMENT, 35(10), 1787–1798.
    The analysis of gene expression data generated by high-throughput microarray transcript profiling experiments has shown that transcriptionally coordinated genes are often functionally related. Based on large-scale expression compendia grouping multiple experiments, this guilt-by-association principle has been applied to study modular gene programmes, identify cis-regulatory elements or predict functions for unknown genes in different model plants. Recently, several studies have demonstrated how, through the integration of gene homology and expression information, correlated gene expression patterns can be compared between species. The incorporation of detailed functional annotations as well as experimental data describing protein-protein interactions, phenotypes or tissue specific expression, provides an invaluable source of information to identify conserved gene modules and translate biological knowledge from model organisms to crops. In this review, we describe the different steps required to systematically compare expression data across species. Apart from the technical challenges to compute and display expression networks from multiple species, some future applications of plant comparative transcriptomics are highlighted.
  28. Moreau, H., Verhelst, B., Couloux, A., Derelle, E., Rombauts, S., Grimsley, N., Van Bel, M., et al. (2012). Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage. GENOME BIOLOGY, 13(8).
    Background: Bathycoccus prasinos is an extremely small cosmopolitan marine green alga whose cells are covered with intricate spider's web patterned scales that develop within the Golgi cisternae before their transport to the cell surface. The objective of this work is to sequence and analyze its genome, and to present a comparative analysis with other known genomes of the green lineage. Research: Its small genome of 15 Mb consists of 19 chromosomes and lacks transposons. Although 70% of all B. prasinos genes share similarities with other Viridiplantae genes, up to 428 genes were probably acquired by horizontal gene transfer, mainly from other eukaryotes. Two chromosomes, one big and one small, are atypical, an unusual synapomorphic feature within the Mamiellales. Genes on these atypical outlier chromosomes show lower GC content and a significant fraction of putative horizontal gene transfer genes. Whereas the small outlier chromosome lacks colinearity with other Mamiellales and contains many unknown genes without homologs in other species, the big outlier shows a higher intron content, increased expression levels and a unique clustering pattern of housekeeping functionalities. Four gene families are highly expanded in B. prasinos, including sialyltransferases, sialidases, ankyrin repeats and zinc ion-binding genes, and we hypothesize that these genes are associated with the process of scale biogenesis. Conclusion: The minimal genomes of the Mamiellophyceae provide a baseline for evolutionary and functional analyses of metabolic processes in green plants.
  29. De Witte, D., Van Bel, M., Demeester, P., Dhoedt, B., Vandepoele, K., & Fostier, J. (2012). A high performance computing approach to the discovery of conserved motifs. 20th Annual Conference on Intelligent Systems for Molecular Biology, Abstracts (pp. 1–1). Presented at the 20th Annual Conference on Intelligent Systems for Molecular Biology (ISMB - 2012).
  30. De Witte, D., Van Bel, M., Demeester, P., Dhoedt, B., Vandepoele, K., & Fostier, J. (2012). Alignment-free genome-wide comparative motif discovery in 4 Monocot species. 11th European Conference on Computational Biology, Abstracts (pp. 1–1). Presented at the 11th European Conference on Computational Biology (ECCB - 2012).
  31. Van Bel, M. (2012). Comparative analysis of plant genomes through data integration. Ghent University. Faculty of Sciences, Ghent, Belgium.
    When we started our research in 2008, several online resources for genomics existed, each with a different focus. TAIR (The Arabidopsis Information Resource) has a focus on the plant model species Arabidopsis thaliana, with (at that time) little or no support for evolutionary or comparative genomics. Ensembl provided some basic tools and functions as a data warehouse, but it would only start incorporating plant genomes in 2010. There was no online resource at that time however, that provided the necessary data content and tools for plant comparative and evolutionary genomics that we required. As such, the plant community was missing an essential component to get their research at the same level as the biomedicine oriented research communities. We started to work on PLAZA in order to provide such a data resource that could be accessed by the plant community, and which also contained the necessary data content to help our research group’s focus on evolutionary genomics. The platform for comparative and evolutionary genomics, which we named PLAZA, was developed from scratch (i.e. not based on an existing database scheme, such as Ensembl). Gathering the data for all species, parsing this data into a common format and then uploading it into the database was the next step. We developed a processing pipeline, based on sequence similarity measurements, to group genes into gene families and sub families. Functional annotation was gathered through both the original data providers and through InterPro scans, combined with InterPro2GO. This primary data information was then ready to be used in every subsequent analysis. Building such a database was good enough for research within our bioinformatics group, but the target goal was to provide a comprehensive resource for all plant biologists with an interest in comparative and evolutionary genomics. Designing and creating a user-friendly, visually appealing web interface, connected to our database, was the next step. 
While the most detailed information is commonly presented in data tables, aesthetically pleasing graphics, images and charts are often used to visualize trends, general statistics and also used in specific tools. Design and development of these tools and visualizations is thus one of the core elements within my PhD. The PLAZA platform was designed as a gene-centric data resource, which is easily navigated when a biologist wants to study a relatively small number of genes. However, using the default PLAZA website to retrieve information for dozens of genes quickly becomes very tedious. Therefore a ’gene set’-centric extra layer was developed where user-defined gene sets could be quickly analyzed. This extra layer, called the PLAZA workbench, functions on top of the normal PLAZA website, implying that only gene sets from species present within the PLAZA database can be directly analyzed. The PLAZA resource for comparative and evolutionary genomics was a major success, but it still had several issues. We tried to solve at least two of these problems at the same time by creating a new platform. The first issue was the building procedure of PLAZA: adding a single species, or updating the structural annotation of an existing one, requires the total re-computation of the database content. The second issue was the restrictiveness of the PLAZA workbench: through a mapping procedure gene sets could be entered for species not present in the PLAZA database, but for species without a phylogenetically close relative this approach did not always yield satisfying results. Furthermore, the research in question might just focus on the difference between a species present in PLAZA and a close relative not present in PLAZA (e.g. to study adaptation to a different ecological niche). In such a case, the mapping procedure is in itself useless. With the advent of NGS transcriptome data sets for a growing number of species, it was clear that a next challenge had presented itself. 
We designed and developed a new platform, named TRAPID, which could automatically process entire transcriptome data sets, using a reference database. The target goal was to have the processing done quickly with the results containing both gene family oriented data (such as multiple sequence alignments and phylogenetic trees) and functional characterization of the transcripts. Major efforts went into designing the processing pipeline so it could be reliable, fast and accurate.
  32. Proost, S., Van Bel, M., Sterck, L., Billiau, K., Van Parys, T., Van de Peer, Y., & Vandepoele, K. (2009). PLAZA : a comparative genomics resource to study gene and genome evolution in plants. PLANT CELL, 21(12), 3718–3731.
    The number of sequenced genomes of representatives within the green lineage is rapidly increasing. Consequently, comparative sequence analysis has significantly altered our view on the complexity of genome organization, gene function, and regulatory pathways. To explore all this genome information, a centralized infrastructure is required where all data generated by different sequencing initiatives is integrated and combined with advanced methods for data mining. Here, we describe PLAZA, an online platform for plant comparative genomics (http://bioinformatics.psb.ugent.be/plaza/). This resource integrates structural and functional annotation of published plant genomes together with a large set of interactive tools to study gene function and gene and genome evolution. Precomputed data sets cover homologous gene families, multiple sequence alignments, phylogenetic trees, intraspecies whole-genome dot plots, and genomic colinearity between species. Through the integration of high confidence Gene Ontology annotations and tree-based orthology between related species, thousands of genes lacking any functional description are functionally annotated. Advanced query systems, as well as multiple interactive visualization tools, are available through a user-friendly and intuitive Web interface. In addition, detailed documentation and tutorials introduce the different tools, while the workbench provides an efficient means to analyze user-defined gene sets through PLAZA's interface. In conclusion, PLAZA provides a comprehensible and up-to-date research environment to aid researchers in the exploration of genome information within the green plant lineage.
  33. Van Bel, M., Saeys, Y., & Van de Peer, Y. (2008). FunSiP: a modular and extensible classifier for the prediction of functional sites in DNA. BIOINFORMATICS, 24(13), 1532–1533.
    Motivation: Many problems in genome annotation are tackled by using a classification model to predict functional sites such as splice sites, translation start sites or stop codons. Locating the correct position of these sites remains one of the most important but also one of the most difficult issues in the structural annotation of genomes. Most of the software currently in use is written for a very specific problem, thereby limiting the possibilities for reuse. Summary: We developed a software platform that uses a very general approach towards the classification of functional sites in DNA sequences. The program uses an ab initio approach towards the identification of these sites, and extends SpliceMachine, a previously developed splice site predictor that shows state-of-the-art performance for both donor and acceptor splice site recognition in the human and Arabidopsis thaliana genome.