Evaluation of gene prediction software using a genomic data set: application to Arabidopsis thaliana sequences.

Pavy, N., Rombauts, S., Déhais, P., Mathé, C., Ramana, D.V., Leroy, P., Rouzé, P.

Abstract



Motivation
The annotation of the Arabidopsis thaliana genome remains a problem in terms of time and quality. To improve the annotation process, we want to choose the most appropriate tools to use inside a computer-assisted annotation platform. We therefore need to evaluate prediction programs on Arabidopsis sequences containing multiple genes.

Results
We have developed AraSet, a data set of contigs of validated genes, enabling the evaluation of multi-gene models for the Arabidopsis genome. Besides conventional metrics to evaluate gene prediction at the site and the exon levels, new measures were introduced for the prediction at the protein sequence level as well as for the evaluation of gene models. This evaluation method is of general interest and could apply to any new gene prediction software and to any eukaryotic genome. The GeneMark.hmm program appears to be the most accurate software at all three levels for the Arabidopsis genomic sequences. Gene modeling could be further improved by combination of prediction software.
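The abstract does not spell out the conventional exon-level metrics. As a rough illustration only, the sketch below assumes the definitions commonly used in the gene prediction literature: an exon is correct when both boundaries match the annotation exactly, sensitivity is the fraction of annotated exons predicted correctly, and specificity is the fraction of predicted exons that are correct. The function name, coordinate convention and example coordinates are hypothetical and are not taken from AraSet or from the authors' Perl programs.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: (start, end), inclusive, on a single strand.
Exon = Tuple[int, int]


def exon_level_accuracy(
    annotated: Iterable[Exon], predicted: Iterable[Exon]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exon level.

    Assumed definitions (not confirmed by the abstract):
      sensitivity = correct exons / annotated exons
      specificity = correct exons / predicted exons
    An exon is correct only if both of its boundaries match exactly.
    """
    ann: Set[Exon] = set(annotated)
    pred: Set[Exon] = set(predicted)
    correct = len(ann & pred)
    sensitivity = correct / len(ann) if ann else 0.0
    specificity = correct / len(pred) if pred else 0.0
    return sensitivity, specificity


# Made-up coordinates for illustration only.
annotated = [(100, 250), (400, 520), (700, 910)]
predicted = [(100, 250), (410, 520), (700, 910), (1200, 1300)]

sn, sp = exon_level_accuracy(annotated, predicted)
print(f"exon sensitivity = {sn:.2f}, exon specificity = {sp:.2f}")
```

In this made-up case two of the three annotated exons are recovered exactly, and two of the four predicted exons are correct. The same counting scheme could be applied to splice sites or start and stop codons for a site-level measure by swapping exon tuples for site positions.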

Availability
The AraSet sequence set, the Perl programs and complementary results and notes are available at http://sphinx.rug.ac.be:8080/biocomp/napav/

Contact
pierre.rouze@psb.ugent.be.

Contact Info

Department of Plant Genetics
Flanders Interuniversity Institute of Biotechnology
and
Laboratoire associé de l'INRA (France)
Bioinformatics & Evolutionary Genomics
Technologiepark 927
B-9052 Gent
BELGIUM

1 On leave from Avesthagen Graine Technologies,
Plant Genome Biology Laboratory,
P.O. Box 5091, Cubbon Park GPO,
Bangalore-560001, India.
Present address: CSHL, 1 Bungtown Road,
Cold Spring Harbor, NY 11724, USA

2 Station INRA d'Amélioration des Plantes - Domaine de Crouelle
234 avenue du Brézet,
63039 Clermont-Ferrand cedex 2, France
