Generic eukaryotic core promoter prediction using structural features of DNA
Despite many recent efforts, in silico identification of promoter regions is still in its infancy. However, accurate identification and delineation of promoter regions matters for several reasons, such as improving genome annotation and designing experiments to study and understand transcriptional regulation. Current methods for identifying the core region of promoters require large amounts of high-quality training data and often behave like black-box models whose predictions are difficult to interpret. Here, we present a novel approach for predicting promoters in whole-genome sequences using large-scale structural properties of DNA. Our technique requires no training, is applicable to many eukaryotic genomes, and performs very well compared with the best available promoter prediction programs. Moreover, it is fast and simple in design, places no constraints on sequence size, and produces easily interpretable results. We compared our approach with fourteen current state-of-the-art implementations using human gene and transcription start site data, and analyzed the ENCODE region in more detail. We also validated our method on twelve additional eukaryotic genomes, including vertebrates, invertebrates, plants, fungi, and protists.
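The abstract does not spell out the scoring procedure, so the following is only a minimal, hypothetical sketch of the general idea of training-free prediction from a structural profile. It converts a sequence into a numeric profile via a dinucleotide lookup table, smooths it with a sliding-window average, and flags windows whose average exceeds the sequence-wide mean by a fixed margin. The dinucleotide scale, window size, threshold, and the "above the mean" rule are all assumptions for illustration, not the published method or its parameters. The toy scale in the usage example scores only CG/GC steps and has no physical meaning.

```python
from typing import Dict, List, Tuple


def structural_profile(seq: str, scale: Dict[str, float]) -> List[float]:
    """Map each overlapping dinucleotide to a value on a (hypothetical) structural scale.

    Dinucleotides missing from the scale (e.g. containing N) score 0.0.
    """
    seq = seq.upper()
    return [scale.get(seq[i:i + 2], 0.0) for i in range(len(seq) - 1)]


def window_means(profile: List[float], window: int) -> List[float]:
    """Sliding-window average using a running sum, so the cost is O(n)."""
    if window <= 0 or window > len(profile):
        return []
    total = sum(profile[:window])
    means = [total / window]
    for i in range(window, len(profile)):
        # Add the value entering the window, drop the one leaving it.
        total += profile[i] - profile[i - window]
        means.append(total / window)
    return means


def predict_regions(
    seq: str, scale: Dict[str, float], window: int, threshold: float
) -> List[Tuple[int, int]]:
    """Return merged half-open base intervals [start, end) of flagged windows.

    Assumption: a window is flagged when its mean exceeds the sequence-wide
    mean by more than `threshold`; no training data is involved.
    """
    profile = structural_profile(seq, scale)
    if not profile:
        return []
    background = sum(profile) / len(profile)
    regions: List[Tuple[int, int]] = []
    for start, mean in enumerate(window_means(profile, window)):
        if mean > background + threshold:
            # A window of `window` dinucleotides spans `window + 1` bases.
            end = start + window + 1
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the current region
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    toy_scale = {"CG": 1.0, "GC": 1.0}  # hypothetical placeholder scale
    seq = "AT" * 20 + "CG" * 10 + "AT" * 20
    print(predict_regions(seq, toy_scale, window=10, threshold=0.5))
```

On the toy sequence, the single flagged region brackets the CG-rich block at bases 40 to 59. Because the threshold is set relative to the sequence's own mean rather than learned from labeled promoters, the sketch needs no training step, which mirrors the "no training" property described above.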
Abeel, T., Saeys, Y., Bonnet, E., Rouzé, P., Van de Peer, Y. (2008) Generic eukaryotic core promoter prediction using structural features of DNA. Genome Res. 18(2):310-23.
VIB / UGent
Bioinformatics & Evolutionary Genomics