Figure 1
Representation of the three different configurations of the intergenic region in DNA. Each gene is represented by an arrow and the core promoter of each gene is indicated by an oval box. (1) The two genes point in the same direction and the intergenic region contains one core promoter. (2) The two genes point in opposite directions and the intergenic region contains both core promoters. (3) The two genes point towards each other and the intergenic region contains no core promoter.
 


 

Figure 2
Comparison of the third-order transition matrices computed from the intergenic sequences and from the input sequences. Each point corresponds to one entry, compared between the two transition matrices. The x-axis gives the value of the entry in the intergenic matrix and the y-axis gives its value in the matrix based on the input sequences. Some of the outliers are labeled with the corresponding matrix entry.
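The comparison shown in this figure can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `transition_matrix` and `paired_entries`, the uniform fallback for unobserved contexts, and the optional pseudocount are all assumptions introduced here.

```python
from itertools import product

ALPHABET = "ACGT"


def transition_matrix(sequences, order=3, pseudocount=0.0):
    """Estimate P(next base | preceding `order` bases) from DNA sequences.

    Returns a dict {context: {base: probability}} with 4**order rows.
    """
    counts = {
        "".join(ctx): {base: pseudocount for base in ALPHABET}
        for ctx in product(ALPHABET, repeat=order)
    }
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - order):
            ctx, nxt = seq[i:i + order], seq[i + order]
            # Skip windows containing ambiguous symbols such as N.
            if ctx in counts and nxt in counts[ctx]:
                counts[ctx][nxt] += 1

    matrix = {}
    for ctx, row in counts.items():
        total = sum(row.values())
        # Assumption: a context that was never observed gets a uniform row.
        matrix[ctx] = {
            base: (row[base] / total if total else 1.0 / len(ALPHABET))
            for base in ALPHABET
        }
    return matrix


def paired_entries(matrix_x, matrix_y):
    """Yield (context, base, x, y): one scatter-plot point per matrix entry."""
    for ctx in sorted(matrix_x):
        for base in ALPHABET:
            yield ctx, base, matrix_x[ctx][base], matrix_y[ctx][base]
```

With `matrix_x` estimated from the intergenic sequences and `matrix_y` from the input sequences, entries for which `x` and `y` differ strongly are the outliers of the plot.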


 

Figure 3

Logo representation of the four sites inserted in the simulated sequences. The logos are created from all the inserted instances of the motifs in the sequences.

 

[Logo panels 1, 2, 3 and 4]
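As an illustration of how a logo can be derived from a set of aligned motif instances, the sketch below computes the letter heights per position using the standard information content for DNA (2 bits minus the Shannon entropy of the column). The function name `logo_heights` is hypothetical, no small-sample correction is applied, and the source does not state which logo tool or formula was actually used.

```python
import math


def logo_heights(instances, alphabet="ACGT"):
    """Return, per position, the height in bits of each letter in a DNA logo.

    `instances` is a list of equal-length, aligned motif instances.
    """
    length = len(instances[0])
    heights = []
    for j in range(length):
        column = [site[j] for site in instances]
        freqs = {base: column.count(base) / len(column) for base in alphabet}
        entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
        information = math.log2(len(alphabet)) - entropy  # 2 bits max for DNA
        # Each letter is scaled by its frequency times the column information.
        heights.append({base: f * information for base, f in freqs.items()})
    return heights
```

A fully conserved position yields a single letter of height 2 bits, while a position where all four bases are equally frequent has height 0.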
 
 

Figure 4
(a) Total number of times the G-box consensus is found in 10 runs. The horizontal axis shows the number of noisy sequences added to the G-box data set. (b) Average number of correctly predicted G-box positions. This number is based on a comparison of the documented G-box positions with the predicted positions of the G-box motif in all the runs where a G-box consensus was found. (c) Average percentage of wrongly classified motifs. This number is based on the number of sequences that are indicated as not having a G-box although a G-box was documented (including the runs where no G-box consensus is found).
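The three quantities plotted in panels (a) to (c) can be computed along the following lines. This is a sketch under stated assumptions rather than the procedure used for the figure: the record layout (`consensus_found`, `predicted`), the function name `summarize_runs`, the `tolerance` parameter, and the choice to count every documented sequence as missed in a run without a consensus are all hypothetical.

```python
def summarize_runs(runs, documented, tolerance=0):
    """Summarize motif-finding runs against documented G-box positions.

    runs:       list of dicts with keys
                  "consensus_found": bool
                  "predicted": {sequence_id: position or None}
    documented: {sequence_id: documented position}
    tolerance:  allowed offset (assumed) between predicted and documented position
    """
    found = [run for run in runs if run["consensus_found"]]

    # (b) correctly predicted positions, only over runs with a consensus
    correct_counts = []
    for run in found:
        correct = 0
        for seq_id, position in documented.items():
            predicted = run["predicted"].get(seq_id)
            if predicted is not None and abs(predicted - position) <= tolerance:
                correct += 1
        correct_counts.append(correct)

    # (c) documented sequences reported as having no G-box, over all runs
    missed_percentages = []
    for run in runs:
        if run["consensus_found"]:
            missed = sum(
                1 for seq_id in documented if run["predicted"].get(seq_id) is None
            )
        else:
            # Assumption: no consensus means every documented site is missed.
            missed = len(documented)
        missed_percentages.append(100.0 * missed / len(documented))

    return {
        "runs_with_consensus": len(found),  # (a)
        "avg_correct_positions": (
            sum(correct_counts) / len(correct_counts) if correct_counts else 0.0
        ),
        "avg_pct_missed": (
            sum(missed_percentages) / len(missed_percentages)
            if missed_percentages else 0.0
        ),
    }
```

Calling this once per level of added noise gives one point per panel for that level.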