The SPARQL endpoint to MorphDB is a powerful tool for users with specific interests who want to perform complex queries. You can express almost any query you can think of, for example: "give me all genes that are predicted as a candidate with a score above 2 for GO_0010105 in Arabidopsis that have orthologs in tomato that are bait genes for a GO term with an AUSR score above 0.7 and that have an unknown function". Below are some example queries you might want to run if you are interested in particular genes, gene families or pathways in the context of the Morph tool for candidate gene prediction.
An RDF database contains triples, i.e. 'sentences' with the structure subject predicate object. Examples of subjects and objects in MorphDB are genes, GO categories, Mapman pathways, scores, AUSR values and descriptions (among others). Examples of predicates are is_candidate_of, is_bait_of and others. The example query below lists all predicates present in MorphDB, which may be useful for exploring the database further using SPARQL.
SELECT DISTINCT ?p WHERE {?s ?p ?o}
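As a possible next step when exploring the database, the predicate listing can be extended to show how often each predicate is used. This is only a sketch: it assumes the endpoint supports SPARQL 1.1 aggregates, and on a large store it may be slow because it touches every triple.

```sparql
# Count how many triples use each predicate, most frequent first.
# Assumes SPARQL 1.1 aggregate support on the endpoint.
SELECT ?p (COUNT(*) AS ?n)
WHERE { ?s ?p ?o }
GROUP BY ?p
ORDER BY DESC(?n)
```

Predicates with very high counts are typically the generic ones (labels, scores), while rarer ones often point to more specific relations worth inspecting.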
An example of a gene-centric query could be: "Give me all pathways for which the gene AT1G43850 is a candidate, together with the respective AUSR and score." The following query gives the desired output for this gene (AT1G43850, a SEUSS transcriptional co-regulator).
SELECT ?pathway ?pathway_description ?ausr ?score
WHERE {
  m:gene\#AT1G43850 m:is_candidate_of ?pathway ;
                    m:has_score ?s .
  ?pathway rdfs:label ?pathway_description ;
           m:has_ausr ?ausr .
  ?s m:score_for_sp_gene_set ?pathway ;
     m:score_value ?score .
}
As a gene can have multiple scores (i.e. for different pathways and GO terms), gene scores are reified: each score is stored as an object of its own rather than as a single value attached to the gene. We can access a score object of the gene by using the predicate has_score. To get the actual score for the pathway of interest, we use the predicates score_for_sp_gene_set and score_value with the score object as subject.
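The reified score pattern can also be used on its own, for instance to keep only the gene sets for which a gene scores above a threshold. The sketch below reuses the predicates from the query above and the threshold of 2 mentioned in the introduction. It assumes that the m: prefix is predeclared by the endpoint (as in the other examples) and that score_value holds a numeric literal; neither is confirmed here.

```sparql
# Gene sets for which AT1G43850 has a score above 2, best score first.
# Assumption: m:score_value is stored as a numeric literal.
SELECT ?pathway ?score
WHERE {
  m:gene\#AT1G43850 m:has_score ?s .
  ?s m:score_for_sp_gene_set ?pathway ;
     m:score_value ?score .
  FILTER (?score > 2)
}
ORDER BY DESC(?score)
```

If the scores turn out to be stored as plain strings, the comparison would need an explicit cast such as xsd:double(?score), which in turn assumes the xsd: prefix is available.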
Let's say we want both the candidate genes predicted by MORPH and the bait genes that were present or missing in the dataset used by MORPH to infer candidate genes. We would also like a pathway or GO description and the AUSR score.
SELECT ?go ?ausr ?relation ?gene ?gene_description
WHERE {
  m:gene_set_sp\#sly_GO_0010105 ?relation ?gene ;
                                rdfs:label ?go ;
                                m:has_ausr ?ausr .
  ?gene rdfs:label ?gene_description .
  OPTIONAL {
    ?gene m:has_score ?s .
    ?s m:score_for_sp_gene_set m:gene_set_sp\#sly_GO_0010105 ;
       m:score_value ?score .
  }
}
ORDER BY (?relation)
Using the ORDER BY (?relation) clause, the output is sorted alphabetically by the relation predicate.
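The query for sly_GO_0010105 binds ?score inside its OPTIONAL block but does not list it in the SELECT clause, so the score never appears in the output. A variant that also returns the score might look like the sketch below. It uses only predicates from that query; the secondary sort on the score is an addition for illustration and assumes numeric score values.

```sparql
# Same pattern as the gene-set query, but with ?score projected.
# Genes without a score for this gene set keep an unbound ?score.
SELECT ?go ?ausr ?relation ?gene ?gene_description ?score
WHERE {
  m:gene_set_sp\#sly_GO_0010105 ?relation ?gene ;
                                rdfs:label ?go ;
                                m:has_ausr ?ausr .
  ?gene rdfs:label ?gene_description .
  OPTIONAL {
    ?gene m:has_score ?s .
    ?s m:score_for_sp_gene_set m:gene_set_sp\#sly_GO_0010105 ;
       m:score_value ?score .
  }
}
ORDER BY ?relation DESC(?score)
```

Because the score lookup is wrapped in OPTIONAL, genes that have no score for this gene set are still returned, just with an empty score column.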
Here's where SPARQL becomes really handy for querying RDF data. Let's say we want to get all candidates for Mapman pathway 1.1.1.1 (PS.lightreaction.photosystem II.LHC-II) in Arabidopsis thaliana that have an ortholog in Medicago truncatula that is also a candidate for pathway 1.1.1.1. The query then looks like this:
SELECT ?gene ?gene_description ?ortholog ?ortholog_description ?gf
WHERE {
  m:gene_set_sp\#ath_1.1.1.1 m:has_candidate ?gene .
  ?gene m:member_of ?gf ;
        rdfs:label ?gene_description .
  ?gf m:has_member ?ortholog .
  ?ortholog m:is_candidate_of m:gene_set_sp\#mtr_1.1.1.1 ;
            rdfs:label ?ortholog_description .
}
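The same gene-family join can be adapted to other relations. As a sketch, the query below looks for Arabidopsis candidates for GO_0010105 whose gene family contains a tomato gene that is a bait gene for the same GO term, which covers part of the example from the introduction. Two things are assumptions here: the gene set identifier ath_GO_0010105 is inferred from the naming pattern of sly_GO_0010105 and ath_1.1.1.1, and is_bait_of is assumed to take a gene set as its object in the same way as is_candidate_of.

```sparql
# Arabidopsis candidates for GO_0010105 with a tomato ortholog
# that is a bait gene for the same GO term.
# Assumptions: the identifier ath_GO_0010105 exists, and
# m:is_bait_of links a gene to a gene set like m:is_candidate_of.
SELECT ?gene ?gene_description ?ortholog ?ortholog_description ?gf
WHERE {
  m:gene_set_sp\#ath_GO_0010105 m:has_candidate ?gene .
  ?gene m:member_of ?gf ;
        rdfs:label ?gene_description .
  ?gf m:has_member ?ortholog .
  ?ortholog m:is_bait_of m:gene_set_sp\#sly_GO_0010105 ;
            rdfs:label ?ortholog_description .
}
```

If the query returns nothing, checking the available predicates and gene set identifiers first is a sensible way to confirm these assumptions.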