ProSOM: Core promoter prediction based on unsupervised clustering of DNA physical profiles
More and more genomes are being sequenced, and to keep up with the pace of sequencing projects, automated annotation techniques are required. One of the most challenging problems in genome annotation is the identification of the core promoter. Because the identification of the transcription initiation region is such a challenging problem, it is not yet common practice to integrate transcription start site prediction in genome annotation projects. Nevertheless, better core promoter prediction can improve genome annotation and can be used to guide experimental work.
Comparing the average structural profile of transcribed, promoter and intergenic sequences demonstrates that the core promoter has unique features that cannot be found in other sequences. We show that unsupervised clustering using self-organizing maps can clearly distinguish between the structural profiles of promoter sequences and other genomic sequences. An implementation of this promoter prediction program, called ProSOM, is available and has been compared with the state of the art. We propose an objective, accurate and biologically sound validation scheme for core promoter predictors. ProSOM performs at least as well as current software, but our technique is more balanced in terms of the number of predicted sites and the number of false predictions, thus providing a better all-round performance. Additional tests on the ENCODE regions of the human genome show that 98% of all predictions made by ProSOM can be associated with transcriptionally active regions, thus demonstrating high precision.
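The clustering idea described above can be illustrated with a small self-organizing map written in NumPy. This is a minimal sketch, not the ProSOM implementation: the function names (train_som, map_to_units, promoter_units, toy_profiles), the 3x3 grid, the learning-rate and neighbourhood schedules, and the majority-vote labelling of map units are all assumptions made for illustration. The input profiles are synthetic stand-ins (a bump on a noisy baseline versus noise alone), not real DNA physical profiles.

```python
import numpy as np


def train_som(profiles, grid=(3, 3), epochs=30, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small self-organizing map; returns weights of shape (units, length)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    n_units = rows * cols
    # Grid coordinates of each unit, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the weights from randomly chosen training profiles.
    weights = profiles[rng.integers(0, len(profiles), n_units)].astype(float)
    n_steps = epochs * len(profiles)
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(len(profiles)):
            x = profiles[idx]
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighbourhood radius
            # Best-matching unit: the unit whose weight vector is closest to x.
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            # Gaussian neighbourhood around the best-matching unit on the grid.
            grid_d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def map_to_units(weights, profiles):
    """Index of the best-matching unit for every profile."""
    d2 = ((profiles[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def promoter_units(weights, profiles, is_promoter):
    """Mark a unit as 'promoter' if more promoter than other profiles map to it.

    Assumption: labels are used only after unsupervised training, to name the
    clusters. Units that attract no profiles are marked as non-promoter.
    """
    units = map_to_units(weights, profiles)
    pos = np.bincount(units[is_promoter], minlength=len(weights))
    neg = np.bincount(units[~is_promoter], minlength=len(weights))
    return pos > neg


def toy_profiles(n, rng, length=20):
    """Synthetic stand-in data: n 'promoter' and n 'other' profiles."""
    pos = np.arange(length)
    bump = 2.0 * np.exp(-((pos - length / 2) ** 2) / 8.0)
    promoters = bump + rng.normal(0.0, 0.2, size=(n, length))
    background = rng.normal(0.0, 0.2, size=(n, length))
    labels = np.array([True] * n + [False] * n)
    return np.vstack([promoters, background]), labels


rng = np.random.default_rng(1)
train_x, train_y = toy_profiles(100, rng)
test_x, test_y = toy_profiles(50, rng)

som = train_som(train_x)                              # unsupervised step
unit_is_promoter = promoter_units(som, train_x, train_y)
predicted = unit_is_promoter[map_to_units(som, test_x)]
accuracy = float((predicted == test_y).mean())
print(f"toy accuracy: {accuracy:.2f}")
```

In this sketch the map is trained without labels, and the labels are consulted only afterwards to decide which units represent promoter-like profiles. A new sequence would then be scored by converting it to a profile and looking up its best-matching unit.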
Predictions for the human genome and the program (ProSOM) are available upon request.
Abeel, T., Saeys, Y., Rouzé, P., Van de Peer, Y. (2008) ProSOM: Core promoter prediction based on unsupervised clustering of DNA physical profiles. Bioinformatics 24(13):i24-31.
VIB / UGent
Bioinformatics & Evolutionary Genomics
+32 (0) 9 33 13807 (phone)
+32 (0) 9 33 13809 (fax)