Comparative analysis of module-based versus direct methods for reverse-engineering transcriptional regulatory networks

Tom Michoel¹,²,*, Riet De Smet³, Anagha Joshi¹,², Yves Van de Peer¹,² and Kathleen Marchal³

¹ Department of Plant Systems Biology, VIB, Technologiepark 927, B-9052 Gent, Belgium

² Department of Molecular Genetics, Ghent University, Technologiepark 927, B-9052 Gent, Belgium

³ CMPG, Department of Microbial and Molecular Systems, KULeuven, Kasteelpark Arenberg 20, B-3001 Leuven, Belgium

*Corresponding author, E-mail: tom.michoel@psb.vib-ugent.be

Abstract

Background: A myriad of methods to reverse-engineer transcriptional regulatory networks have been developed in recent years. Direct methods reconstruct a network of pairwise regulatory interactions, whereas module-based methods predict a set of regulators for modules of coexpressed genes treated as a single unit. To date, there has been no systematic comparison of the relative strengths and weaknesses of the two types of methods.

Results: We have compared LeMoNe, a recently developed module-based algorithm, to CLR, a mutual-information-based direct algorithm, using benchmark expression data and databases of known transcriptional regulatory interactions for Escherichia coli and Saccharomyces cerevisiae. A global comparison using recall versus precision curves hides the topologically distinct nature of the inferred networks and is not informative about the specific subtasks for which each method is most suited. Analysis of the degree distributions and a regulator-specific comparison show that CLR is regulator-centric, making true predictions for a higher number of regulators, while LeMoNe is target-centric, recovering a higher number of known targets for fewer regulators, with limited overlap in the interactions predicted by the two methods. Detailed biological examples in E. coli and S. cerevisiae are used to illustrate these differences and to show that each method is able to infer parts of the network where the other fails. Biological validation of the inferred networks cautions against over-interpreting recall and precision values computed using incomplete reference networks.

Conclusions: Our results indicate that module-based and direct methods retrieve largely distinct parts of the underlying transcriptional regulatory networks. The choice of algorithm should therefore be based on the particular biological problem of interest and not on global metrics which cannot be transferred between organisms. The development of sound statistical methods for integrating the predictions of different reverse-engineering strategies emerges as an important challenge for future research.
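
As an illustration of the two kinds of comparison mentioned in the Results, the following minimal Python sketch computes a recall versus precision curve for a ranked list of predicted interactions and counts the number of predicted targets per regulator. It is not the evaluation code used in the paper. The function names, the toy regulator and gene labels (R1, g1, ...), and the rule that only predictions whose regulator and target both occur in the reference network are scored are all assumptions made for this example.

```python
from collections import Counter


def recall_precision_curve(ranked_edges, reference):
    """Return (recall, precision) after each scored prediction.

    ranked_edges: (regulator, target) pairs, best-scoring first.
    reference:    set of known (regulator, target) pairs.

    Assumption for this sketch: a prediction is scored only if both its
    regulator and its target occur somewhere in the reference network,
    because an incomplete reference cannot confirm or refute the rest.
    """
    ref_regulators = {reg for reg, _ in reference}
    ref_targets = {tgt for _, tgt in reference}
    curve = []
    true_pos = 0
    scored = 0
    for reg, tgt in ranked_edges:
        if reg not in ref_regulators or tgt not in ref_targets:
            continue  # outside the reference network, so not scored
        scored += 1
        if (reg, tgt) in reference:
            true_pos += 1
        curve.append((true_pos / len(reference), true_pos / scored))
    return curve


def out_degrees(edges):
    """Number of predicted targets per regulator."""
    return Counter(reg for reg, _ in edges)


# Toy data (hypothetical labels, not taken from the paper's files).
reference = {("R1", "g1"), ("R1", "g2"), ("R2", "g3"), ("R3", "g4")}
predicted = [
    ("R1", "g1"),  # known interaction
    ("R2", "g4"),  # both nodes known, interaction not in reference
    ("R3", "g4"),  # known interaction
    ("R1", "g3"),  # not in reference
    ("R9", "g1"),  # regulator absent from reference: skipped
    ("R1", "g2"),  # known interaction
]

curve = recall_precision_curve(predicted, reference)
degrees = out_degrees(predicted)

for recall, precision in curve:
    print(f"recall={recall:.2f}  precision={precision:.2f}")
print(dict(degrees))
```

A network "at 30% precision" could then be read off such a curve by keeping the longest prefix of the ranked list whose precision is still at least 0.3. The out-degree counts give a first impression of whether a method spreads its predictions over many regulators or concentrates many targets on a few.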

Supplementary Information

E. coli

Links to external data sources:
List of candidate regulators: [TXT]
List of differentially expressed genes: [TXT]
Reference network: [TXT]
Modules in LeMoNe 30% precision network: [TXT]
Regulator to module assignments in LeMoNe 30% precision network: [TXT]
CLR 30% precision network: [TXT]

S. cerevisiae

Links to external data sources:
List of candidate regulators: [TXT]
List of differentially expressed genes: [TXT]
Reference network: [TXT]
Modules in LeMoNe 1070 network: [TXT]
Regulator to module assignments in LeMoNe 1070 network: [TXT]
CLR 1070 network: [TXT]

Figures & tables

Figures:
- Figure 1 [PDF]
- Figure 2 [PDF]
- Figure 3 [PDF]
- Figure 4 [PDF]
- Figure 5 [PDF]
- Figure 6 [PDF]
Table:
- Table 1 [PDF]
Supplementary figures:
- Figure S1 [PDF]
- Figure S2 [PDF]
- Figure S3 [PDF]
- Figure S4 [PDF]
- Figure S5 [PDF]
Supplementary table:
- Table S1 [PDF]

Software

LeMoNe: Java package for learning module networks.
MatrixClust: MATLAB toolbox for fuzzy clustering of adjacency matrices.

Contact:
VIB / UGent
Bioinformatics & Evolutionary Genomics
Technologiepark 927
B-9052 Gent
BELGIUM
+32 (0) 9 33 13807 (phone)
+32 (0) 9 33 13809 (fax)