Software

Motif Detection

  • MotifSuite: Probabilistic motif detection requires a multi-step approach, going from the actual de novo regulatory motif finding to a tedious assessment of the predicted motifs. MotifSuite, a user-friendly web interface, streamlines this analysis flow. Its core consists of two post-processing procedures that help prioritize the motif detection output. The tools offered by MotifSuite are built around the well-established motif detection tool MotifSampler but can also be used in combination with any other probabilistic motif detection tool.
  • CPModule is a CRM detection method whose performance is competitive with that of other state-of-the-art tools, while being able to handle larger data sets (such as 100 sequences in combination with a library of 517 PWMs).
  • ModuleDigger is an itemset-mining-based strategy for computationally detecting cis-regulatory modules (CRMs) in a set of genes.
  • CRoSSeD: motif screening based on structural features. CRoSSeD (Conditional Random fields of Smoothed Structural Data) is based on conditional random fields and uses structural scales, representing structural and physico-chemical characteristics of the DNA molecule, to model and predict regulator binding sites.
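
Several of the tools above work with position weight matrices (PWMs). As a rough illustration of what screening a sequence with a PWM involves, the following Python sketch converts a count matrix into log-odds scores and scans a sequence for the best-scoring window. It is not the implementation of any tool listed here: the count matrix, the uniform background, the pseudocount and all function names are hypothetical and chosen only for the example.

```python
import math

# Hypothetical 3-position count matrix (one dict per motif position).
COUNTS = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
]
# Assumed uniform background model; real tools typically estimate this from data.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_matrix(counts, background, pseudocount=1.0):
    """Convert per-position counts into log2 odds scores against the background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(column)
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background[base])
            for base in column
        })
    return matrix


def scan(sequence, matrix):
    """Return (offset, score) for every window of motif width in the sequence.

    Assumes an upper-case sequence containing only A, C, G and T.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


def best_hit(sequence, matrix):
    """Return the (offset, score) pair with the highest log-odds score."""
    return max(scan(sequence, matrix), key=lambda hit: hit[1])


PWM = log_odds_matrix(COUNTS, BACKGROUND)
offset, score = best_hit("TTACGTT", PWM)
print(offset, round(score, 2))
```

In this toy input the window "ACG" matches the most frequent base at every position, so it receives the highest score. Probabilistic motif finders go further by estimating the matrix itself from the data rather than taking it as given.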

Next-generation sequence analysis

  • EXPLoRA (EXtraction of over-rePresented aLleles in BSA) is a tool for the analysis of data obtained from bulk segregant analysis (BSA) coupled to high-throughput sequencing. BSA is a powerful method to map genomic regions associated with phenotypes of interest. It relies on crossing two parents, one inferior and one superior for a trait of interest. Segregants displaying the trait of the superior parent are pooled, and their DNA is extracted and sequenced. Genomic regions linked to the trait of interest are identified by searching the pool for overrepresented alleles that originate from the superior parent. BSA data analysis is non-trivial due to sequencing, alignment and screening errors. EXPLoRA increases the power of the BSA technology and better distinguishes spuriously linked regions from truly linked ones by exploiting the properties of linkage disequilibrium in its analysis model.
  • SomInaClust is a method that accurately identifies driver genes based on their mutation pattern across tumour samples and then classifies them as oncogenes or tumour suppressor genes.
  • EVORhA (Evolutionary Reconstruction of Haplotypes) is a quasi-species assembly method that reconstructs, from short-read sequence data of a bacterial population, the haplotypes occurring in that population and the frequencies at which they occur. EVORhA can be used, among other applications, to reconstruct genome-wide haplotypes from a mixed bacterial infection or to reconstruct the population dynamics of an evolving bacterial population from pooled sequence data.
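
To make the BSA idea concrete, the sketch below computes, for each SNP, the fraction of pooled reads carrying the superior parent's allele and flags positions where that allele is overrepresented. This is a deliberately naive per-SNP threshold, not EXPLoRA's analysis model, which additionally exploits linkage disequilibrium between neighbouring markers. The read counts, the thresholds and the function names are all hypothetical.

```python
def superior_allele_frequency(superior_reads, inferior_reads):
    """Fraction of reads at one SNP that carry the superior parent's allele."""
    total = superior_reads + inferior_reads
    if total == 0:
        return None  # no coverage, frequency undefined
    return superior_reads / total


def linked_candidates(snps, min_frequency=0.9, min_depth=10):
    """Return (chromosome, position, frequency) for overrepresented SNPs.

    `snps` is a list of (chromosome, position, superior_reads, inferior_reads).
    Both thresholds are illustrative assumptions, not recommended values.
    """
    candidates = []
    for chromosome, position, superior_reads, inferior_reads in snps:
        depth = superior_reads + inferior_reads
        if depth < min_depth:
            continue  # too few reads to trust the estimate
        frequency = superior_allele_frequency(superior_reads, inferior_reads)
        if frequency >= min_frequency:
            candidates.append((chromosome, position, frequency))
    return candidates


# Hypothetical read counts from a pool of segregants with the superior trait.
SNPS = [
    ("chrI", 1000, 48, 52),   # unlinked: both alleles at roughly 50%
    ("chrI", 2000, 95, 5),    # strongly overrepresented
    ("chrI", 3000, 9, 0),     # overrepresented but depth too low
    ("chrII", 500, 30, 1),    # strongly overrepresented
]

for chromosome, position, frequency in linked_candidates(SNPS):
    print(chromosome, position, round(frequency, 3))
```

A per-SNP cut-off like this is sensitive to exactly the sequencing and alignment errors mentioned above, which is the motivation for models that pool evidence across linked markers.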

Expression compendia

  • COLOMBOS stands for 'COLlection Of Microarrays for Bacterial OrganismS'. It is a web-based interface for exploring and analyzing comprehensive organism-specific cross-platform expression compendia of bacterial organisms. The major features of COLOMBOS include: 1. Access to expression compendia that combine information from different platforms and experiments (constructed by collecting, homogenizing and formally annotating publicly available microarrays from Gene Expression Omnibus (GEO) and ArrayExpress). 2. Integration of information from the main curated microbial databases, allowing the user to interactively browse and query the compendia for specific genes, pathways, transcriptional regulation mechanisms, and more. 3. An extensive formal condition contrast annotation and higher-level condition ontology, allowing the user to interactively browse and query the compendia not only for specific arrays or experiments, but also for specific experimental conditions and biological processes. 4. A suite of flexible yet intuitive tools to explore, analyze and visualize the expression data in the compendia. The compendia can also be downloaded in their entirety.
  • MAGIC stands for 'access portal to a cross-platform gene expression compendium for maize'. It is a web-based interface for exploring and analyzing a comprehensive cross-platform expression compendium of maize, with functionalities similar to those of COLOMBOS.

Gene expression analysis

  • SIBR (Spatial Intensity Bias Removal) is an easy-to-use tool for normalization of DIGE data containing region-based intensity biases. It also includes a correction step for non-linear dye effects.
  • CALIB is a Bioconductor package for estimating absolute expression levels from two-color microarray data.
  • Adaptive quality-based clustering: clustering of microarray data, with an emphasis on identifying tightly coexpressed genes.

Biclustering and coclustering

  • COMODO (COnserved MODules across Organisms) is a coclustering procedure to identify conserved expression modules between two species. The method takes microarray data and a gene homology map as input and outputs pairs of conserved modules. It searches for the pair of modules for which the number of shared homologs is statistically most significant relative to the size of the linked modules.
  • Query-driven biclustering: the query-driven biclustering framework (QDB) allows identifying clusters of genes with similar expression profiles (coordinated changes) in a significant subset of the measured experimental conditions.
  • Ensemble query-based biclustering: Query-based biclustering techniques allow interrogating a gene expression compendium with a given gene or gene list. They do so by searching for genes in the compendium that have a profile close to the average expression profile of the genes in the query list. As it often cannot be guaranteed that the genes in a long query list will all be mutually coexpressed, it is advisable to use each gene separately as a query. This approach, however, leaves the user with a tedious post-processing of partially redundant biclustering results. The fact that multiple parameter settings need to be tested for each query gene in order to detect the "most optimal bicluster size" adds to the redundancy problem. To aid with this post-processing, we developed an ensemble approach to be used in combination with query-based biclustering.
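
The core idea of querying a compendium with a gene list, searching for genes whose profile is close to the average profile of the query genes, can be sketched in a few lines of Python. The example below ranks genes by Pearson correlation with the mean query profile over all conditions. It is a simplified illustration only: it performs no condition selection, so it is not a biclustering method, and it is not the algorithm of any tool listed above. The toy compendium, the correlation threshold and the function names are hypothetical.

```python
import math


def mean_profile(profiles):
    """Average several equal-length expression profiles condition by condition."""
    count = len(profiles)
    return [sum(values) / count for values in zip(*profiles)]


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return covariance / (spread_x * spread_y)


def query_neighbours(compendium, query_genes, min_correlation=0.9):
    """Rank non-query genes by correlation with the average query profile."""
    centroid = mean_profile([compendium[gene] for gene in query_genes])
    hits = []
    for gene, profile in compendium.items():
        if gene in query_genes:
            continue
        correlation = pearson(profile, centroid)
        if correlation >= min_correlation:
            hits.append((gene, correlation))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical compendium: gene name -> expression values over four conditions.
COMPENDIUM = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [2.0, 4.0, 6.0, 8.0],
    "geneD": [4.0, 3.0, 2.0, 1.0],
    "geneE": [1.0, 3.0, 1.0, 3.0],
}

for gene, correlation in query_neighbours(COMPENDIUM, ["geneA", "geneB"]):
    print(gene, round(correlation, 3))
```

Running one such query per gene, and once per parameter setting, quickly produces many overlapping result lists, which is the redundancy an ensemble post-processing step is meant to resolve.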

Network inference

  • ViTraM is a flexible software platform for visualizing overlapping transcriptional modules in an intuitive way. By visualizing not only the genes and the experiments in which the genes are co-expressed, but also additional properties of the modules such as the regulators and regulatory motifs that are responsible for the observed co-expression, ViTraM can assist in the biological analysis and interpretation of the output of module detection tools.
  • DISTILLER ("Data Integration System To Identify Links in Expression Regulation") is a data integration framework that searches for transcriptional modules by combining expression data with information on the direct interaction between a regulator and its corresponding target genes. The framework builds upon advanced itemset mining approaches that have been designed to have good scalability, efficient memory use, and a small number of user parameters. It includes a condition selection or bicluster strategy in which co-expression of genes is required in only a significant subset of the complete condition set. By including this condition selection we can apply the algorithm to large expression compendia where interesting genes are not necessarily co-expressed in all measured conditions. Our approach also makes it straightforward to include any number of data sources related to transcriptional interactions, such as additional microarrays, ChIP-chip or motif data. A web service version of DISTILLER is also available.
  • ReMoDiscovery is an intuitive approach based on itemset mining that links regulatory programs, consisting of regulators and their corresponding motifs, to a set of co-expressed genes. It exploits three independent data sources concurrently: ChIP-chip data, motif information and gene expression profiles.
  • Syntren is a network generator that creates synthetic transcriptional regulatory networks and produces simulated gene expression data that approximate experimental data. Network topologies are generated by selecting subnetworks from previously described regulatory networks. Interaction kinetics are modeled by equations based on Michaelis-Menten and Hill kinetics. Our results show that the statistical properties of these topologies more closely approximate those of genuine biological networks than do those of different types of random graph models. Several user-definable parameters adjust the complexity of the resulting data set with respect to the structure learning algorithms.
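
For readers unfamiliar with the kinetics mentioned for Syntren, the sketch below implements the generic textbook forms of the Michaelis-Menten and Hill rate laws in Python. These are not Syntren's actual equations, which may combine several regulators and include additional terms; the function and parameter names (v_max, k_m, k, n) are the usual textbook symbols and are assumptions of this example.

```python
def michaelis_menten(substrate, v_max, k_m):
    """Michaelis-Menten rate: saturating response, half-maximal at substrate == k_m."""
    return v_max * substrate / (k_m + substrate)


def hill_activation(regulator, v_max, k, n):
    """Hill rate for an activator: sigmoidal in the regulator level when n > 1."""
    return v_max * regulator ** n / (k ** n + regulator ** n)


def hill_repression(regulator, v_max, k, n):
    """Hill rate for a repressor: decreases from v_max as the regulator level rises."""
    return v_max * k ** n / (k ** n + regulator ** n)


# Transcription rate of a hypothetical target gene over increasing activator levels.
for level in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(level, round(hill_activation(level, v_max=1.0, k=1.0, n=2), 3))
```

With a Hill coefficient of n = 1 the activation form reduces to the Michaelis-Menten expression; larger values of n give the steeper, switch-like response often used to model cooperative binding.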

Network-based data interpretation

  • PheNetic: Omics experiments often result in unstructured gene lists, the interpretation of which in terms of pathways or mode of action is challenging. To aid in the interpretation of such gene lists, we developed PheNetic, a network-based data interpretation method that exploits publicly available information, captured in a comprehensive interaction network, to obtain a mechanistic view of the listed genes. PheNetic selects from this interaction network the sub-networks highlighted by the gene lists. A user-friendly web service for analysing lists of differentially expressed genes is also available.
  • EPSILON: When genomic data are associated with gene expression data, the resulting expression quantitative trait loci (eQTL) will likely span multiple genes. Here we present EPSILON, a network-based eQTL prioritization technique that exploits the organism-dependent interaction network to prioritize and select the most likely causal gene affecting the expression of the target genes.