Glossary

A

  • Alignment editing: Procedure removing non-homologous positions and partial/diverged sequences from a multiple sequence alignment prior to tree construction.
  • Anchor point: Colinear gene pair (i.e. genes from the same gene family located in a colinear segment; see also Colinearity).

B

  • BED: Abbreviation for Browser Extensible Data; a text file format used to store genomic regions as coordinates and associated annotations.
  • Bonferroni correction: Multiple-testing correction that multiplies each p-value by the number of tests performed; used as part of the GO enrichment tool.
  • Bootstrap analysis: A type of statistical analysis used to test the reliability of specific branches in an evolutionary tree. The non-parametric bootstrap proceeds by re-sampling the original data, with replacement, to create a series of bootstrap samples of the same size as the original data. The bootstrap percentage of a node is the proportion of times that node is present in the set of trees that is constructed from the new data sets.
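
The resampling and counting steps described under Bootstrap analysis can be sketched as follows. This is a minimal illustration, not the procedure of any particular tree-building program: the function names, the toy alignment, and the hand-written replicate clades are all hypothetical, and the tree construction step itself is left out.

```python
import random

def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement (same width as the original)."""
    n_cols = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}

def bootstrap_support(clade, replicate_clades):
    """Percentage of replicate trees in which the given clade (node) is present."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)

# Toy alignment (hypothetical sequences).
alignment = {"geneA": "ACGT", "geneB": "ACGA", "geneC": "TCGA"}
sample = bootstrap_columns(alignment, random.Random(0))

# Clades found in four replicate trees (hand-written here; in practice each set
# would come from a tree built on one bootstrap sample).
ab = frozenset({"geneA", "geneB"})
bc = frozenset({"geneB", "geneC"})
replicates = [{ab}, {ab}, {bc}, {ab}]
support = bootstrap_support(ab, replicates)  # 75.0
```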

C

  • Circle Plot: Reports all colinear regions within a single species using a circular representation.
  • Cis-Regulatory element: DNA sequences located within or next to transcribed genes that either increase (enhancers) or decrease (repressors or silencers, depending on their mechanism of action) gene transcription. Cis-regulatory elements act by recruiting trans-acting transcriptional activator or repressor proteins.
  • Clustering: Genes are clustered into gene families using MCL algorithms.
  • Colinearity: Two genomic segments can be considered colinear if they share the same gene content (homologs) in the same order.
  • Comparing: Refers to the species included in an i-ADHoRe experiment to delineate genomic homology.
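
As a rough illustration of the Colinearity definition, the sketch below counts how many homologs two segments share in the same order, using a longest-common-subsequence calculation over gene family identifiers. This is a simplification chosen for illustration and is not the i-ADHoRe algorithm; the function names and family identifiers are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def shared_in_order(seg_a, seg_b):
    """Number of shared families conserved in order, allowing an inverted segment."""
    return max(lcs_length(seg_a, seg_b), lcs_length(seg_a, seg_b[::-1]))

# Each segment is a list of gene family identifiers in chromosomal order.
seg_a = ["HOM1", "HOM2", "HOM3", "HOM4"]
seg_b = ["HOM1", "HOM9", "HOM2", "HOM4"]   # same order, one unrelated gene inserted
seg_c = ["HOM3", "HOM1", "HOM4", "HOM2"]   # same content, scrambled order

print(shared_in_order(seg_a, seg_b))  # 3
print(shared_in_order(seg_a, seg_c))  # 2
```

A segment pair would then be called colinear when this count reaches a chosen minimum number of anchor points; that threshold is a parameter of the analysis and is not fixed here.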

D

  • Duplication consistency score: Measure to find dubious duplication nodes in a reconciled tree that are artifacts from the tree construction procedure. Duplication nodes with a low consistency score can be considered speciation nodes.
  • Duplication type: Indicates whether a gene originated through a tandem or a block duplication event.

E

  • Expansion Plot: Tool to explore gene family copy-number variation between two groups of species.

F

  • Functional clustering: Physical clustering of functionally related genes.

G

  • GO: A controlled vocabulary to describe gene and gene product attributes in any organism.
  • GO depth: Indicates for a GO term the shortest distance (through parent-child relationships) to the root in the GO hierarchy.
  • GO enrichment: The over-representation of a certain GO term in a gene set compared to the genome-wide background frequency. The statistical significance is determined using the hypergeometric distribution with Bonferroni correction.
  • GO projection: Methodology that uses orthology relations to transfer functional annotation between genes and/or species.
  • GO projection source gene: Refers to the orthologous source gene that was used to transfer the functional annotation. The tree icon links to the phylogenetic tree that was used to delineate the orthologous group.
  • GO type: Refers to the three organizing principles of GO being Cellular Component (CC), Biological Process (BP) and Molecular Function (MF).
  • Gene Family Finder: This tool identifies (expanded) gene families specific to one or more species.
  • Gene family: A set of homologous/orthologous genes grouped by sequence similarity algorithms (e.g. using Markov clustering).
  • Gene family expansion: This tool reports the expansion/depletion of a species/lineage (through gene copy number variation compared to total proteome sizes of the species) within a gene family.
  • Gene type: Indicates whether a locus encodes a protein-coding gene, RNA, pseudogene or TE (Transposable Element).
  • Genome Browser: Tool that allows users to visually inspect the genomes, with various features mapped onto the raw sequence.
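
The statistic described under GO enrichment can be sketched as a hypergeometric upper-tail probability followed by a Bonferroni adjustment. This is an illustrative calculation under stated assumptions, not the code of the enrichment tool itself; the function and parameter names are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): probability of at least k annotated genes in a set of n genes,
    when K of the N genes in the genome carry the GO term."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def bonferroni(p, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p * n_tests)

# Toy numbers: genome of 10 genes, 3 carry the term, and the gene set of 3
# consists of exactly those 3 genes. Five GO terms were tested in total.
p = hypergeom_pvalue(k=3, n=3, K=3, N=10)   # 1/120
p_corrected = bonferroni(p, n_tests=5)      # 5/120
```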

H

  • Homologs: Also known as homologous genes: genes sharing similarity due to common ancestry.

I

  • I-ADHoRe: Iterative Automated Detection of Homologous Regions, an algorithm to find genomic homology based on gene colinearity.
  • I-ADHoRe experiments: Refers to the species included in an i-ADHoRe experiment to delineate genomic homology.
  • Inparalogs: Duplicated genes (or paralogs) that originated after a speciation event.
  • Integrative Orthology Viewer: Tool to explore, for a query gene, the orthologous genes in other species using different types of evidence.
  • InterPro: A database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to new protein sequences.

K

  • Keywords: Most frequent functional annotation terms associated with a gene family.
  • Ks: The synonymous substitution rate, i.e. the number of synonymous substitutions per synonymous site.
  • Ks-dating tool: Method to explore several Ks graphs of colinear gene pairs simultaneously.
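
To make the Ks definition concrete, the sketch below starts from two counts that are assumed to be given, the number of synonymous differences and the number of synonymous sites, and applies the Jukes-Cantor correction for multiple substitutions at the same site (as in the Nei-Gojobori approach). It is only one of several estimation methods and is not necessarily the one used to produce the Ks values discussed here; the function name is hypothetical.

```python
from math import log

def ks_jukes_cantor(syn_differences, syn_sites):
    """Estimate Ks from observed synonymous differences and synonymous sites."""
    p = syn_differences / syn_sites          # observed proportion of differences
    if p >= 0.75:
        raise ValueError("proportion too high: correction is undefined (saturation)")
    return -0.75 * log(1.0 - (4.0 / 3.0) * p)

# Toy numbers: 3 synonymous differences over 30 synonymous sites.
ks = ks_jukes_cantor(3, 30)   # slightly above the raw proportion of 0.1
```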

M

  • Markov clustering: A graph-based clustering method that delineates clusters in a protein-protein similarity graph in a process that is sensitive to the density and the strength of the connections.
  • Multiple sequence alignment: Also known as MSA: an alignment of two or more biological sequences, generally protein, DNA, or RNA. In general, the input set of query sequences is assumed to have an evolutionary relationship by which the sequences share a lineage and are descended from a common ancestor.
  • Multiplicon: A set of homologous genomic segments (detected with i-ADHoRe).
  • Multiplicon View: Displays the aligned gene strings of a set of homologous segments.
  • Multiplicon level: The number of homologous genomic segments in a multiplicon.
  • Multiplicon navigation: Use the arrows to scroll through the multiplicon.
  • Multiplicon segments: The genomic regions grouped within a multiplicon.
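
The idea behind Markov clustering can be sketched in a few lines: a column-stochastic matrix built from the similarity graph is alternately expanded (squared) and inflated (entries raised to a power and re-normalized) until the flow concentrates inside clusters. The code below is a minimal teaching sketch, not the MCL program itself; the function names, the fixed iteration count, and the default inflation value are assumptions made for illustration.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]

def expand(m):
    """Expansion step: matrix squaring (flow along two-step walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def inflate(m, r):
    """Inflation step: element-wise power, then column re-normalization."""
    return normalize_columns([[v ** r for v in row] for row in m])

def mcl(adjacency, inflation=2.0, iterations=50, eps=1e-6):
    n = len(adjacency)
    # Add self-loops, then make the matrix column-stochastic.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Each row that still holds flow defines a cluster: the columns it attracts.
    clusters = {frozenset(j for j in range(n) if m[i][j] > eps) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)

# Two groups of three mutually similar proteins, with no similarity between groups.
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(adjacency))  # [[0, 1, 2], [3, 4, 5]]
```

Higher inflation values generally yield smaller, tighter clusters. In real use the matrix entries would be similarity scores rather than 0/1 values.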

N

  • Normalization: Transformation of the original data by reporting relative frequencies (percentages) for the different series being displayed.
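
As a small illustration of Normalization, the sketch below converts the raw counts of one series into percentages. The function name and the example categories are hypothetical.

```python
def to_percentages(counts):
    """Convert raw counts for one series into relative frequencies (percentages)."""
    total = sum(counts.values())
    if total == 0:
        return {key: 0.0 for key in counts}
    return {key: 100.0 * value / total for key, value in counts.items()}

print(to_percentages({"tandem": 1, "block": 3}))  # {'tandem': 25.0, 'block': 75.0}
```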

O

  • OrthoMCL: Graph-based clustering algorithm to detect orthologous groups; here used to subdivide homologous gene families into sub-families (see also Markov clustering).
  • Orthologs: Also known as orthologous genes: homologs created through a speciation event.
  • Orthology type: Indicates for a Source and Target species the number of orthologous genes per species (S-to-T).
  • Outlier: A gene initially included in a gene family but only showing sequence similarity to a limited number of family members. Note that these genes are NOT included in MSAs and phylogenetic trees.

P

  • Paralogs: Also known as paralogous genes: homologs created through a duplication event.
  • Pathway: Reactome pathway indicating biological processes and reactions.
  • Phylogenetic profile: Indicates the presence or absence of a gene family in a set of species.
  • Phylogenetic tree: An evolutionary tree that shows the relationships among various biological taxa believed to have a common ancestor.
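
A Phylogenetic profile can be represented as a presence/absence vector over a fixed list of species. The sketch below assumes a gene family is given as (species, gene) pairs; the function name, species codes, and gene identifiers are hypothetical.

```python
def phylogenetic_profile(family_members, species_order):
    """Return 1 for each species with at least one gene in the family, else 0."""
    present = {species for species, _gene in family_members}
    return [1 if species in present else 0 for species in species_order]

family = [("ath", "gene1"), ("ath", "gene2"), ("osa", "gene3")]
print(phylogenetic_profile(family, ["ath", "osa", "ppa"]))  # [1, 1, 0]
```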

R

  • Reactome: Reactome is a free, online, open-source, curated resource of core pathways and reactions in human biology (available for other species through projection).
  • Reconciliation: Reconciliation extracts information from the topological incongruence between gene and species trees to infer duplications and losses in the history of a gene family.

S

  • Similarity heatmap: Visualizes the similarities (based on BLAST bitscores) between all genes within a family.
  • Similarity heatmap type: Normalized values show sequence similarities relative to the highest score for the reference gene.
  • Skyline plot: Provides for a locus of interest an overview of the colinear regions within and between species.
  • Species: One of the basic units to classify biological entities. Normally, members of different species cannot interbreed.
  • Strand orientation: Indicates whether or not genes on the reverse strand are mapped to the forward strand.
  • Sub-family: Subset of a gene family delineated using the OrthoMCL algorithm.
  • Synteny plot: Reports the local gene organization for homologous genes within a family.
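
The normalization described under Similarity heatmap type can be sketched as dividing every score in a reference gene's row by the highest score in that row. The nested-dictionary layout, function name, and gene identifiers below are assumptions made for illustration.

```python
def normalize_rows(bitscores):
    """Scale each reference gene's scores relative to its highest score."""
    normalized = {}
    for ref, row in bitscores.items():
        top = max(row.values(), default=0)
        normalized[ref] = {hit: (score / top if top else 0.0)
                           for hit, score in row.items()}
    return normalized

# Bit scores of reference gene g1 against the members of its family.
scores = {"g1": {"g1": 200, "g2": 100, "g3": 50}}
print(normalize_rows(scores))  # {'g1': {'g1': 1.0, 'g2': 0.5, 'g3': 0.25}}
```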

T

  • Tandem representative: All genes in a tandem cluster are remapped to a tandem representative by i-ADHoRe (as tandem genes would negatively influence the program's statistics to identify genomic homology).
  • Tree explorer: View the phylogenetic tree of a homologous gene family.
  • Tribe-MCL: Graph-based clustering algorithm that was used to group all homologous genes into gene families (see also Markov clustering).
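
The remapping described under Tandem representative can be illustrated as collapsing each tandem cluster to a single gene before colinearity detection. In this sketch the first gene of each cluster is taken as the representative; that choice, the function name, and the gene identifiers are assumptions for illustration and may differ from what i-ADHoRe does internally.

```python
def remap_to_representatives(gene_order, tandem_clusters):
    """Collapse every tandem cluster to one representative, keeping gene order."""
    rep_of = {}
    for cluster in tandem_clusters:
        representative = cluster[0]          # assumption: first gene represents the cluster
        for gene in cluster:
            rep_of[gene] = representative
    collapsed, seen = [], set()
    for gene in gene_order:
        rep = rep_of.get(gene, gene)         # genes outside clusters represent themselves
        if rep not in seen:
            seen.add(rep)
            collapsed.append(rep)
    return collapsed

print(remap_to_representatives(["g1", "g2", "g3", "g4"], [["g2", "g3"]]))
# ['g1', 'g2', 'g4']
```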

W

  • WGD: Abbreviation for Whole Genome Duplication.
  • WGDotplot: Tool that reports in a pairwise manner all colinear regions between and/or within species.
  • WGMapping: Displays the organization of a set of genes on all chromosomes (for a selected species).
  • Whole genome duplication: Duplication event in which every chromosome is duplicated. For example, a diploid species becomes tetraploid.
  • Window size: The size of the region being analyzed (measured in genes).
  • Workbench: Toolkit included in PLAZA that allows researchers to perform analyses on user-defined gene sets.