Download section

All downloads are also available from the FTP directory.

Warning:

The following species carry a disclaimer stating that their data is currently not available for bulk download. As a result, no per-species files are available for these species, and in mixed files (e.g. gene families) the information for these species has been removed.

These files contain an overview of all species in this PLAZA instance.
Additionally, extra information such as naming conventions, source information, version information, publication information, etc. is also provided.
These files contain, per species, the structural annotation for each gene in this PLAZA instance in CSV format.
This includes gene identifier information, coordinate information, strand information, etc.
These files contain, per species, the structural annotation for each gene in this PLAZA instance in GFF format.
This includes gene identifier information, coordinate information, strand information, etc.
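
As a rough illustration of how the GFF files could be read, the sketch below parses the nine tab-separated columns defined by the GFF specification. It assumes GFF3-style `key=value` attributes; the exact attribute keys used in the PLAZA files are not stated here, so the names `parse_gff` and `GffFeature`, and the sample record, are hypothetical.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class GffFeature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int          # 1-based, inclusive
    end: int            # 1-based, inclusive
    strand: str         # "+", "-" or "."
    attributes: Dict[str, str]


def parse_gff(lines: Iterable[str]) -> Iterator[GffFeature]:
    """Yield one GffFeature per data line, skipping comments and blanks."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF record
        # Assumption: attributes are GFF3-style "key=value" pairs joined by ";".
        attrs = {}
        for field in cols[8].split(";"):
            if "=" in field:
                key, value = field.split("=", 1)
                attrs[key.strip()] = value.strip()
        yield GffFeature(cols[0], cols[1], cols[2],
                         int(cols[3]), int(cols[4]), cols[6], attrs)


# Made-up record, for illustration only.
sample = ["##gff-version 3\n",
          "chr1\texample\tgene\t100\t900\t.\t+\t.\tID=GENE0001;Name=demo\n"]
for feature in parse_gff(sample):
    print(feature.attributes.get("ID"), feature.start, feature.end, feature.strand)
```

For a real download, the same function can be fed an open file handle instead of the sample list.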
These files contain the mapping between gene identifiers and transcript identifiers, as well as information about which isoform was selected per locus.
These files contain, per species, the gene identifiers and descriptions available for each gene in this PLAZA instance.
Many genes are known by their aliases (e.g. E2F3 for AT2G36010). These files contain such identifier mappings.
These files contain, per species, the functional annotation of the genes within this PLAZA instance.

This information is limited to the annotation of the genes.
For the associated ontologies (such as parent-child relationships between the various terms), you can find the necessary information here:
These files contain the genomic sequences for each of the organisms present in this PLAZA instance.
These files contain sequences for each locus/transcript for each of the organisms in this PLAZA instance.

  • CDS: Coding Sequences. The concatenated exonic DNA sequence per locus/transcript (coding genes only), from start codon to stop codon.
  • Transcript: Transcript sequences. The spliced transcribed DNA sequence per locus/transcript (all gene types). Contains UTR regions.
  • Protein: Amino Acid Sequences. The translated coding sequence per gene, using the appropriate translation table.
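
The relationship between the CDS and protein files can be sketched as follows. This is a minimal example that assumes the standard genetic code (NCBI translation table 1); genes that require a different table, such as organellar genes, would need another codon table. The function name `translate_cds` is illustrative and not part of PLAZA.

```python
# Standard genetic code (NCBI translation table 1), bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(cds: str) -> str:
    """Translate a CDS (start codon to stop codon) into an amino acid string.

    Translation stops at the first stop codon, which is not included in the
    result. Codons with ambiguous bases are translated as "X".
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    protein = []
    for pos in range(0, len(cds), 3):
        amino_acid = CODON_TABLE.get(cds[pos:pos + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_cds("ATGGCCTAA"))  # MA
```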
These files contain the information about genes that are detected as being duplicates due to a segmental duplication event or a whole genome duplication event, for all the species within this PLAZA instance.
The files contain gene pairs (i.e. the duplicates), as well as the associated Ks value (the number of synonymous substitutions per synonymous site).

NOTE: Ks values might be default values if the default i-ADHoRe run was done without Ks dating.
These files contain the gene family information for the (coding) genes within this PLAZA instance, separated per gene family type.

The following data files are available:
  • Gene family - gene mapping: mapping between the gene families and their constituent genes. Each gene can be part of only 1 gene family (per family type).
  • Gene family functional annotation: the functional annotation per gene family (GO/InterPro), which is based on enrichment analysis of the constituent genes in the family.
    Also included is a global score which reflects both the enrichment score and the associated p-value. See the documentation for more information.
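
The one-family-per-gene property of the gene family mapping can be checked while loading a file. The sketch below assumes a tab-separated layout with the family identifier in the first column and the gene identifier in the second; the actual column order and header lines of the PLAZA files may differ, and `load_family_mapping` and the identifiers in the sample are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def load_family_mapping(
    lines: Iterable[str],
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Return (gene -> family, family -> genes) for one gene family type.

    Assumption: each data line is "<family_id>\t<gene_id>"; lines starting
    with "#" are treated as comments.
    """
    gene_to_family: Dict[str, str] = {}
    family_to_genes: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 2:
            continue  # skip malformed lines
        family_id, gene_id = cols[0], cols[1]
        if gene_id in gene_to_family:
            if gene_to_family[gene_id] != family_id:
                # A gene may belong to only one family per family type.
                raise ValueError(f"{gene_id} is assigned to more than one family")
            continue  # exact duplicate line, already recorded
        gene_to_family[gene_id] = family_id
        family_to_genes[family_id].append(gene_id)
    return gene_to_family, dict(family_to_genes)


# Made-up identifiers, for illustration only.
sample = ["#family\tgene", "FAM1\tGENE_A", "FAM1\tGENE_B", "FAM2\tGENE_C"]
gene_to_family, family_to_genes = load_family_mapping(sample)
print(family_to_genes["FAM1"])
```

Files for different family types should be loaded separately, since the uniqueness constraint applies per family type.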
These files contain the orthologous relationships between genes, as detected by the various methods within the Integrative Orthology approach.

For more information about the Integrative Orthology approach, see the documentation.