Download ORCAE guidelines



The overview represents the genomic context of the locus you are currently viewing. It shows the gene models of the selected locus and of the two loci upstream and downstream of it. The view is always centred on the selected locus (indicated by a green box around its gene models) and is automatically rescaled to fit five loci, so the absolute length of the genomic region shown differs from locus to locus; this length is reported below the picture. Genes on the forward strand are drawn above the black line (the genomic sequence), genes on the reverse strand below it. The black line can be interrupted by dashed grey regions, which denote gaps in the sequence.
Clicking on a gene model in the picture will redirect you to the page of that locus. The arrow-shaped icons at the bottom of the figure are also for browsing: click a single arrow to move the view one locus in the direction of the arrow. Clicking a 'double' arrow shifts the view so that the last (or first, depending on the arrow direction) gene model in the current view becomes the first (or last) in the next view (the view moves in steps of 5 loci).
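
The centring of the view and the single-arrow navigation can be pictured as simple index arithmetic. The following is a minimal sketch, not ORCAE's actual implementation. It assumes the loci of a scaffold are held in an ordered list; the names `overview_window` and `step_single` are hypothetical, and the clamping at the ends of the scaffold is an assumption.

```python
FLANK = 2  # loci shown on each side of the selected locus (5 in total)

def overview_window(loci, centre):
    """Return the loci displayed when the view is centred on index `centre`."""
    start = max(0, centre - FLANK)            # assumed: clamp at scaffold start
    end = min(len(loci), centre + FLANK + 1)  # assumed: clamp at scaffold end
    return loci[start:end]

def step_single(centre, direction, n_loci):
    """Move the view one locus; direction is +1 (right) or -1 (left)."""
    return min(max(centre + direction, 0), n_loci - 1)
```

For a scaffold with loci A to H and the view centred on D, `overview_window` returns B, C, D, E and F.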



The name of the annotator who last edited the locus.


The email address of the annotator.
In case of a conflict, you are encouraged to contact the annotator yourself and resolve the dispute.


The institute the annotator belongs to.


This field shows the current status of the locus. Four different states are possible:
the locus is editable
the locus has been deleted or has become obsolete (e.g. by fusing it to another gene)
the locus is being edited by someone else
the locus is temporarily unavailable: the gene structure of this locus has recently been altered and the similarity information is being updated.

Gene Function

Short Name

When the gene acronym is known, it can be entered in this field. The gene acronym is a short code of three letters, followed by a number or letter, that indicates the gene function. For a list of some of the frequently used codes, click the link.

Alternative names

Enter possible synonymous names for the gene in this field.


The one-line FASTA header description.

Additional functional description

A more elaborate description of the gene function.

Pubmed ID

PubMed IDs of the most important publications that report on this particular gene (i.e. in the species being annotated). Alternatively, PubMed IDs are provided for publications about orthologous genes from other species that helped to specify this gene's function.

EC Number

In this field you can find the EC number specifying the biochemical activity of the encoded protein (if available).


When available, this field shows the KOG (or COG) ID for the protein of interest. If an ID is present, the class and the description of the KOG are also shown.

Gene Ontology

If available, the GO assignments are provided in this section, separately for each GO tree. Most of the GO assignments are automatically derived from the InterProScan result (InterPro2GO).
Clicking on the GO ID will redirect you to the AmiGO page of that GO ID.

Protein Domain

This section shows all the protein domains that were found in the protein. Domains are mapped using InterProScan, making use of all the public databases. For each domain, the page provides the domain name, the database used to find this domain and a description of the domain.


A schematic representation of the mapped domains is provided at the top of this block. It depicts the relative positions (in amino acids) of the different domain hits within the protein sequence. Each domain has a different colour (note that the colours do not reflect the score of the hit).

Protein Homologs

This section shows proteins that share similarity with the protein being viewed. The proteins shown are retrieved by BLAST with an E-value lower than 1e-5 (note that only the best 10 hits are shown if more than 10 matches with an E-value below 1e-5 are available). For each hit, the gene name and description as well as the scores are presented.
Clicking on the 'View Blast' button will redirect you to a new page showing the BLAST output, allowing visualisation of the BLAST alignment.
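
The selection rule described here (E-value cutoff of 1e-5, at most 10 hits) can be sketched as follows. This is an illustrative sketch only: the hit records, the `select_hits` name and the choice to rank "best" hits by lowest E-value are assumptions, not a description of how ORCAE does it.

```python
E_VALUE_CUTOFF = 1e-5  # hits must have an E-value below this threshold
MAX_HITS = 10          # at most this many hits are displayed

def select_hits(hits):
    """Keep hits below the cutoff, lowest E-value first, at most MAX_HITS.

    `hits` is a list of dicts with at least an "evalue" key (assumed layout).
    Ranking by E-value is an assumption; a bit score could be used instead.
    """
    passing = [h for h in hits if h["evalue"] < E_VALUE_CUTOFF]
    passing.sort(key=lambda h: h["evalue"])
    return passing[:MAX_HITS]
```

A hit with an E-value of 1e-3 would be dropped, while hits at 1e-8 and 1e-50 would be kept, with the 1e-50 hit listed first.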


The schema at the beginning of this block is an overview of a multiple alignment of the query protein and the reported hits. Alignments are constructed with the MUSCLE program.
The gene of interest is coloured blue, the protein hits green. Boxes represent aligned protein sequence and dotted lines are gaps introduced to optimise the alignment. Vertical grey lines indicate the splice junctions relative to the protein sequence of the selected gene.
If you want a more detailed view of the alignment, click on the 'View in JalView' link. This will open a new window showing the aligned protein sequences in the JalView editor, which also supports simple tree construction. You cannot edit this alignment.

Gene Structure

This section provides the positions of the exons and (when available) the 3' and 5' untranslated regions with respect to the sequence scaffold (supercontig). These positions are given as coordinates (EMBL feature format). Untranslated regions occur at the beginning or end of the gene and are separated from coding regions by a semicolon. In addition to the coordinates, the sequence type (e.g. mRNA, SeCys, ..) and the strand are reported. The quality tag indicates, on a scale from 1 to 5, the confidence that the structure is the correct one.
For a detailed view of the gene structure in either GenomeView or Artemini, click the 'View in GenomeView' or 'View in Artemini' link, respectively.


In the schema, exons are represented by thick, dark blue blocks, untranslated regions by thin, light blue blocks and introns by arched lines. The orientation of the gene is also indicated (5' - 3'). Note that the schemas are always drawn in the 5' to 3' direction, so genes on the minus strand are inverted. Hovering over the exons/introns will show their length in nucleotides.
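
The exon and intron lengths shown on hovering follow directly from the exon coordinates. A minimal sketch, assuming exons are given as (start, end) pairs in 1-based, inclusive coordinates as used in EMBL feature locations; the function names are hypothetical.

```python
def exon_lengths(exons):
    """Length in nucleotides of each exon; both end positions are included."""
    return [end - start + 1 for start, end in exons]

def intron_lengths(exons):
    """Length of the intron between each pair of consecutive exons."""
    ordered = sorted(exons)  # order by start position on the scaffold
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(ordered, ordered[1:])]
```

For exons at 101..200 and 301..450, the exon lengths are 100 and 150 nucleotides and the single intron (201..300) is 100 nucleotides long.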

Tiling array


Graphical representation of the tiling array data mapped onto the gene model.
A context of 100 bases around the gene model is included: the tiling array data starts 100 bases before the model and ends 100 bases after it.
The bars denote the expression level. A bar is coloured green when the experiment value is higher than the control; otherwise it is coloured red.
The direction in which the gene is drawn reflects the strand it is on: 5'-3' if it is located on the forward strand, 3'-5' if it is on the reverse strand.
Click in the grey area to follow the link to the systemix site for a more detailed view of the expression data for the same region.
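
The displayed region and the bar colouring can be summarised in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, coordinates are taken to be 1-based, and clamping the window at position 1 is an assumption about how the start of a scaffold is handled.

```python
CONTEXT = 100  # bases shown before and after the gene model

def tiling_window(gene_start, gene_end):
    """Region covered by the tiling array plot for a gene model."""
    return max(1, gene_start - CONTEXT), gene_end + CONTEXT

def bar_colour(experiment, control):
    """Green when the experiment value exceeds the control, red otherwise."""
    return "green" if experiment > control else "red"
```

A gene model at 500..1500 would be plotted over 400..1600. Equal experiment and control values give a red bar, since green requires the experiment value to be strictly higher.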


Here you can find the coding sequence (CDS) of the proposed model.
Clicking on the 'Blast' button will launch a BLAST search against the nr_dna database of NCBI.

Protein Section

This section shows the translated CDS, i.e. the predicted protein sequence. The protein length (in amino acids) is also reported.
Click on the 'Blast' button to launch a BLAST search of this sequence against the GenBank non-redundant protein database.
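
The relation between the CDS and the protein sequence can be illustrated with a small translation routine. This is a sketch only: it assumes the standard genetic code (NCBI translation table 1), which may not apply to every annotated species or organelle, and the `translate` function is hypothetical rather than the code used by the site.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(cds):
    """Translate a CDS into a protein sequence, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

With this sketch, `translate("ATGGCCTAA")` yields a protein of length 2, since the stop codon is not counted as an amino acid.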

Signal Peptide

If available, the sequence of the signal peptide is reported.

Subcellular localisation

This field shows the predicted cellular location of the protein. Click on the link to view the scoring of the predicted cellular localisation.
The prediction is done with HECTAR (for more info please contact Bernhard Gschloessl in Roscoff).


All transcripts whose genomic position overlaps the given gene model by at least 1 base are used in this comparison.
When the transcript sequence is completely consistent with the gene model, it is coloured green. If the transcript is coloured red, it is in some way inconsistent with the given gene model. In this case a brief explanation of why the transcript conflicts with the model is provided.
If 2 EST sequences are connected by a dashed grey line, they are part of an EST couple (i.e. derived from opposite ends of the same clone).
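
The "at least 1 base" overlap criterion corresponds to a standard interval test. A minimal sketch, assuming 1-based inclusive coordinates on the same scaffold; the function names and the (name, start, end) tuple layout for transcripts are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end

def transcripts_for_gene(gene_start, gene_end, transcripts):
    """Select the transcripts that overlap the gene model by at least 1 base."""
    return [
        t for t in transcripts
        if overlaps(gene_start, gene_end, t[1], t[2])
    ]
```

A transcript ending exactly at the first base of the gene model is included, while one ending a single base earlier is not.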