The MORPH algorithm is described in Tzfadia et al. (2012). MORPH (MOdule-guided Ranking of PatHway genes) ranks candidate genes for biological processes based on gene expression data and a set of bait genes. In contrast to other guilt-by-association methods, MORPH uses clusterings to partition the data into modules before calculating co-expression measures. MORPH uses a machine learning approach known as data selection to find the dataset-clustering combination in which the bait genes have the highest co-expression. This selection is based on the area under the self-rank curve (AUSR). These features make MORPH a powerful tool for unraveling gene functions. A computationally efficient version of the MORPH algorithm can be used with your own genes of interest online here.
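To make the data selection step more concrete, the following is a minimal sketch and not the MORPH implementation itself. It assumes that each bait gene has already been given a self-rank per dataset-clustering combination (taken here to be the rank the bait obtains when it is left out of the bait set and scored like any other candidate), that the self-rank curve gives the fraction of baits with self-rank at most k, and that the curve is cut off at a maximum rank. The function names, the default cutoff of 1000 and the example ranks are all assumptions made for illustration.

```python
def ausr(self_ranks, max_rank=1000):
    """Normalised area under the self-rank curve, in [0, 1].

    self_ranks: one rank per bait gene (1 = best).
    max_rank:   rank cutoff for the curve (an assumed default).
    """
    if not self_ranks:
        raise ValueError("at least one bait self-rank is required")
    if max_rank < 1:
        raise ValueError("max_rank must be at least 1")
    n = len(self_ranks)
    area = 0.0
    for k in range(1, max_rank + 1):
        # Height of the curve at k: fraction of baits with self-rank <= k.
        area += sum(1 for r in self_ranks if r <= k) / n
    return area / max_rank


def select_best(self_ranks_by_combination, max_rank=1000):
    """Return the (dataset, clustering) key with the highest AUSR."""
    return max(
        self_ranks_by_combination,
        key=lambda combo: ausr(self_ranks_by_combination[combo], max_rank),
    )


# Hypothetical self-ranks of three bait genes in two combinations.
example = {
    ("ds1Data.txt", "CLICK"): [3, 12, 40],
    ("TissuesData.txt", "Matisse"): [150, 400, 900],
}
best = select_best(example)
```

In this toy example the first combination is selected, because its bait genes rank each other much higher and therefore give a larger area under the curve.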
Species currently included are depicted in the cladogram below (branch lengths are meaningless). If you would like to submit data for a new species, please refer to adding a new species below.
This section summarizes the data used by the MORPH prediction algorithm for each included species. For more information, contact us here.
We note that we have used only relatively limited data sets. As more data sets become available, we advise using the MORPH bulk tool with your own data set of interest instead of relying on MorphDB, which was built as a proof of concept and a user-friendly tool for exploring the MORPH bulk predictions described in Zwaenepoel et al. (2018). To apply MORPH bulk to your own data sets, please find information at https://github.com/arzwa/morph-bulk/wiki.
The MorphDB database is RDF-based, and tools for generating the RDF graph are included in the MORPH bulk distribution. In other words, you can easily generate a MorphDB instance yourself!
Clusterings: Co-expression (CLICK), protein-protein interaction (Matisse), metabolic network (Matisse) and enzyme based.
Dataset | # Conditions | # Genes |
---|---|---|
ds1Data.txt | 160 | 12459 |
Seed_GH_DataSet.txt | 42 | 22225 |
SeedlingsDataSet.txt | 64 | 12459 |
SeedsDataSet.txt | 51 | 12459 |
TissuesData.txt | 96 | 12564 |
Clusterings: Co-expression (CLICK) and metabolic network (Matisse) based. Note: all data except the JAMER, MAERF and JAMER_MAERF data sets were retrieved from the Noble Foundation.
Dataset | # Conditions | # Genes |
---|---|---|
Balzergue.expression_matrix | 41 | 20461 |
Benedito.expression_matrix | 20 | 19363 |
Breakspear.expression_matrix | 10 | 15137 |
Carvalho.expression_matrix | 11 | 17446 |
Czaja.expression_matrix | 20 | 16082 |
Niebel.expression_matrix | 18 | 17388 |
noble_all.expression_matrix | 151 | 20914 |
Ruffel.expression_matrix | 13 | 18294 |
Zhang.expression_matrix | 18 | 17902 |
JAMER.expression_matrix | 21 | 12790 |
JAMER_MAERF.expression_matrix | 32 | 13962 |
MAERF.expression_matrix | 11 | 12637 |
Clusterings: Co-expression (CLICK), protein-protein interaction (Matisse), metabolic network (Matisse), orthology (MCL) and enzyme based.
Dataset | # Conditions | # Genes |
---|---|---|
FruitDataSet.txt | 32 | 9217 |
RootAndLeafDataSet.txt | 21 | 9217 |
Clusterings: Co-expression (CLICK), protein-protein interaction (Matisse) and metabolic network (Matisse).
Dataset | # Conditions | # Genes |
---|---|---|
All_TissuesDataSet_ITAG.txt | 326 | 9012 |
Leafs_ITAG.txt | 242 | 9012 |
Root_ITAG.txt | 24 | 9012 |
Tuber_ITAG.txt | 60 | 9012 |
Clusterings: Co-expression (CLICK), protein-protein interaction (Matisse) and enzyme based.
Dataset | # Conditions | # Genes |
---|---|---|
E-GEOD-14275.expression_matrix | 6 | 11019 |
E-GEOD-25073.expression_matrix | 6 | 15930 |
E-GEOD-31077.expression_matrix | 16 | 5137 |
E-GEOD-35984.expression_matrix | 10 | 10116 |
E-GEOD-39298.expression_matrix | 6 | 12118 |
E-GEOD-5167.expression_matrix | 0 | 11852 |
E-GEOD-8216.expression_matrix | 6 | 15964 |
E-MEXP-2267.expression_matrix | 36 | 8025 |
RiceGenomeDataSet.expression_matrix | 16 | 25744 |
Clusterings: Co-expression (CLICK) based.
Dataset | # Conditions | # Genes |
---|---|---|
all | 24 | 32777 |
fiber | 3 | 21720 |
leaf | 3 | 27087 |
phloem | 3 | 25795 |
root | 3 | 25780 |
shoot | 3 | 26404 |
three_cell_type | 3 | 25326 |
vessel | 3 | 22826 |
xylem | 3 | 23888 |
Clusterings: Co-expression (CLICK), ortholog and enzyme based.
Dataset | # Conditions | # Genes |
---|---|---|
caros_all.expression_matrix | 30 | 25329 |
caros_hairy_roots.expression_matrix | 7 | 15105 |
caros_organs.expression_matrix | 8 | 19933 |
caros_smartcell.expression_matrix | 7 | 18678 |
caros_suspension_culture.expression_matrix | 8 | 15121 |
Clusterings: Co-expression (CLICK) based.
Dataset | # Conditions | # Genes |
---|---|---|
all.count_table.txt | 23 | 16674 |
all.fpkm_table.txt | 23 | 15679 |
female_flower.count_table.txt | 6 | 13077 |
female_flower.fpkm_table.txt | 6 | 10587 |
male_flower.count_table.txt | 7 | 14260 |
male_flower.fpkm_table.txt | 7 | 11809 |
root.count_table.txt | 4 | 13380 |
root.fpkm_table.txt | 4 | 10708 |
vegetative.count_table.txt | 6 | 13946 |
vegetative.fpkm_table.txt | 6 | 11416 |
MorphDB is an RDF-based graph database stored in an Apache Jena triple store (TDB). Currently the database is not served directly; queries are performed with SPARQL on the server side. The triple store is highly scalable and can be served directly if interest grows.
Currently, the following predicates are included in MorphDB.
Predicate |
---|
http://www.w3.org/2000/01/rdf-schema#label |
http://www.w3.org/1999/02/22-rdf-syntax-ns#type |
http://morph.org/has_score |
http://morph.org/is_candidate_of |
http://morph.org/member_of |
http://morph.org/species |
http://morph.org/has_member |
http://morph.org/has_species_member |
http://morph.org/is_missing_bait_of |
http://morph.org/has_ausr |
http://morph.org/has_bait_in_dataset |
http://morph.org/has_candidate |
http://morph.org/no_genes_in_dataset |
http://morph.org/no_genes_missing |
http://morph.org/has_bait_missing |
http://morph.org/rank |
http://morph.org/score_for_gene_set |
http://morph.org/score_for_sp_gene_set |
http://morph.org/score_value |
http://morph.org/is_bait_of |
http://morph.org/gene_set_type |
http://morph.org/has_species |
Because a gene has a different score for each gene set, a score cannot be stored as a single <gene> has_score <score> triple in the triple store. Therefore gene scores are reified. We can access a score object of the gene by using the predicate has_score. To get the actual score for the pathway of interest, we have to use the predicates score_for_sp_gene_set and score_value with the score object as subject.
For a similar reason, GO/MapMan terms (gene sets) exist at two levels, as a GO term has different AUSR values in different species.
Object (subject) | example URI |
---|---|
Gene | http://morph.org/gene#MT0001S0570 |
Gene set (general level) | http://morph.org/gene_set#GO_0010105 |
Gene set (species level) | http://morph.org/gene_set_sp#sly_GO_0010105 |
Gene family | http://morph.org/gene_family#HOM0000018 |
Score | http://morph.org/score#MT0001S0570_GO_0010224 |
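The reified score lookup can be illustrated with a small, self-contained sketch. It is not the MorphDB server code: the triples are kept in a plain Python set instead of a Jena triple store, the score value 0.42 is made up, the "mtr_" prefix of the species-level gene set URI is a guess, and it is assumed that the object of score_for_sp_gene_set is the species-level gene set. The helper names objects, score_for and build_query are hypothetical. The SPARQL string shows how the same lookup might be phrased as a query.

```python
MORPH = "http://morph.org/"

# Example URIs following the patterns in the table above.
gene = MORPH + "gene#MT0001S0570"
score = MORPH + "score#MT0001S0570_GO_0010224"
# Hypothetical species-level gene set URI (the "mtr_" prefix is an assumption).
gene_set_sp = MORPH + "gene_set_sp#mtr_GO_0010224"

# A toy graph: the gene points to a score object, which in turn carries
# the gene set it belongs to and the (made-up) score value.
triples = {
    (gene, MORPH + "has_score", score),
    (score, MORPH + "score_for_sp_gene_set", gene_set_sp),
    (score, MORPH + "score_value", 0.42),
}


def objects(subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def score_for(gene_uri, gene_set_sp_uri):
    """Follow gene -> score object -> score value for one gene set."""
    for score_node in objects(gene_uri, MORPH + "has_score"):
        if gene_set_sp_uri in objects(score_node, MORPH + "score_for_sp_gene_set"):
            values = objects(score_node, MORPH + "score_value")
            if values:
                return values[0]
    return None  # no score stored for this gene and gene set


def build_query(gene_uri, gene_set_sp_uri):
    """SPARQL text for the same lookup (a sketch, not the server's query)."""
    return f"""PREFIX morph: <{MORPH}>
SELECT ?value WHERE {{
  <{gene_uri}> morph:has_score ?score .
  ?score morph:score_for_sp_gene_set <{gene_set_sp_uri}> ;
         morph:score_value ?value .
}}"""


value = score_for(gene, gene_set_sp)
query = build_query(gene, gene_set_sp)
```

The intermediate score object is what allows one gene to carry a separate score for every gene set, which a direct gene-to-value triple could not express.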
If you want to run MORPH bulk for your own data and species of interest, please refer to the MORPH bulk repository at https://github.com/arzwa/morph_bulk. On the wiki page of that repository you can find documentation and a tutorial on how to run MORPH bulk for your case of interest. The repository also includes tools for building a MorphDB RDF database graph, which can be used to generate a MorphDB instance.
MorphDB was developed by Arthur Zwaenepoel (2017).
If you use MorphDB or MORPH bulk, please cite:
Zwaenepoel, A., Diels, T., Amar, D., Van Parys, T., Shamir, R., Van de Peer, Y., & Tzfadia, O. (2018). MorphDB: Prioritizing Genes for Specialized Metabolism Pathways and Gene Ontology Categories in Plants. Frontiers in Plant Science, 9(March), 1–13. https://doi.org/10.3389/fpls.2018.00352