Examples of SPARQL queries

The SPARQL endpoint of MorphDB is a powerful tool for users who want to perform complex queries. You can express virtually any question the data supports, for example "give me all genes that are predicted as a candidate with a score above 2 for GO_0010105 in Arabidopsis, that have orthologs in tomato that are bait genes for a GO term with an AUSR score above 0.7, and that have an unknown function", which is pretty neat. Below you can find some example queries you might want to run if you are interested in particular genes, gene families or pathways in the context of the Morph tool for candidate gene prediction.

Note: these queries only work if you include the prefix declarations from the endpoint's default query; the examples below rely on them (for instance on the m: and rdfs: prefixes).

Example 1: get all predicates in MorphDB

An RDF database contains triples, i.e. 'sentences' with the structure subject-predicate-object. Examples of subjects and objects in MorphDB are genes, GO categories, MapMan pathways, scores, AUSR values and descriptions (among others). Examples of predicates are is_candidate_of and is_bait_of. The query below lists all predicates present in MorphDB, which is a useful starting point for exploring the database further with SPARQL.

SELECT DISTINCT ?p WHERE {?s ?p ?o}
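A small variation, sketched here with standard SPARQL 1.1 aggregates, also counts how many triples use each predicate, which gives a rough idea of which relations dominate the database. The variable name ?n_triples is illustrative. On a large store a full scan like this may be slow or run into the endpoint's timeout.

```sparql
# Count the triples per predicate, most frequent first
SELECT ?p (COUNT(*) AS ?n_triples)
WHERE { ?s ?p ?o }
GROUP BY ?p
ORDER BY DESC(?n_triples)
```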

Example 2: a gene-centric view

An example of a gene-centric question is: "Give me all pathways for which the gene AT1G43850 is a candidate, together with each pathway's AUSR and the gene's score." The following query answers this for AT1G43850 (a SEUSS transcriptional co-regulator).

SELECT ?pathway ?pathway_description ?ausr ?score
WHERE {
    m:gene\#AT1G43850 m:is_candidate_of ?pathway ;
                      m:has_score ?s .
    ?pathway rdfs:label ?pathway_description ;
             m:has_ausr ?ausr .
    ?s m:score_for_sp_gene_set ?pathway ;
       m:score_value ?score .
}

Because a gene can have multiple scores (one for each pathway or GO term it was scored against), gene scores are reified: each score is stored as a separate node. The predicate has_score links the gene to its score nodes. To get the score for the pathway of interest, we use the predicates score_for_sp_gene_set and score_value with the score node as subject.
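To see the reification on its own, the following sketch lists every score node attached to AT1G43850, whichever pathway or GO gene set it refers to. It reuses only the predicates from the query above. The label lookup is made OPTIONAL in case a gene set has no rdfs:label, and sorting by ?score assumes that score_value is stored as a numeric literal (string values would sort alphabetically instead).

```sparql
SELECT ?gene_set ?gene_set_description ?score
WHERE {
    # every score node of the gene ...
    m:gene\#AT1G43850 m:has_score ?s .
    # ... points to one gene set and carries one score value
    ?s m:score_for_sp_gene_set ?gene_set ;
       m:score_value ?score .
    OPTIONAL { ?gene_set rdfs:label ?gene_set_description . }
}
ORDER BY DESC(?score)
```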

Example 3: get all information on a pathway or GO term

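Let's say we want both the candidate genes predicted by MORPH and the bait genes that were present in, or missing from, the dataset MORPH used to infer candidate genes. We would also like the pathway or GO description, its AUSR and, where available, each gene's score for the gene set. The query below does this for the tomato (sly) gene set for GO_0010105.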

SELECT ?go ?ausr ?relation ?gene ?gene_description ?score
WHERE {
    m:gene_set_sp\#sly_GO_0010105 ?relation ?gene ;
                                  rdfs:label ?go ;
                                  m:has_ausr ?ausr .
    ?gene rdfs:label ?gene_description .
    OPTIONAL { ?gene m:has_score ?s .
               ?s m:score_for_sp_gene_set m:gene_set_sp\#sly_GO_0010105 ;
                                          m:score_value ?score . }
}
ORDER BY (?relation)

The ORDER BY (?relation) clause sorts the output alphabetically by the relation predicate.
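For a quick overview of the same gene set, the results can be aggregated per relation instead of listed gene by gene. This is a sketch using standard GROUP BY and COUNT. Like the query above, it keeps only objects that themselves carry an rdfs:label, so literal objects such as the description and the AUSR value drop out.

```sparql
# Number of distinct labelled genes per relation for sly_GO_0010105
SELECT ?relation (COUNT(DISTINCT ?gene) AS ?n_genes)
WHERE {
    m:gene_set_sp\#sly_GO_0010105 ?relation ?gene .
    ?gene rdfs:label ?gene_description .
}
GROUP BY ?relation
ORDER BY (?relation)
```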

Example 4: incorporating orthology information

This is where SPARQL becomes really handy for querying RDF data. Let's say we want all candidates for MapMan pathway 1.1.1.1 (PS.lightreaction.photosystem II.LHC-II) in Arabidopsis thaliana that have an ortholog in Medicago truncatula that is also a candidate for pathway 1.1.1.1. Orthology is resolved through gene families: a gene and its ortholog are both members of the same gene family (?gf). The query then looks like this:

SELECT ?gene ?gene_description ?ortholog ?ortholog_description ?gf
WHERE {
    m:gene_set_sp\#ath_1.1.1.1 m:has_candidate ?gene .
    ?gene m:member_of ?gf ;
          rdfs:label ?gene_description .
    ?gf m:has_member ?ortholog .
    ?ortholog m:is_candidate_of m:gene_set_sp\#mtr_1.1.1.1 ;
              rdfs:label ?ortholog_description .
}
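The same pattern extends to the question from the introduction. The sketch below is an approximation under several assumptions that the examples here do not confirm: the Arabidopsis GO gene set is named m:gene_set_sp\#ath_GO_0010105 by analogy with sly_GO_0010105 and ath_1.1.1.1, tomato GO gene sets are recognised by the sly_GO_ prefix of their IRI, score_value and has_ausr hold numeric literals, and orthologs are genes sharing a gene family as in the query above. The "unknown function" criterion is left out because these examples do not show how function annotations are stored.

```sparql
SELECT DISTINCT ?gene ?score ?ortholog ?bait_set ?ausr
WHERE {
    # Arabidopsis candidates for GO_0010105 (gene set name is assumed)
    ?gene m:is_candidate_of m:gene_set_sp\#ath_GO_0010105 ;
          m:has_score ?s ;
          m:member_of ?gf .
    # reified score of the gene for that gene set, above 2
    ?s m:score_for_sp_gene_set m:gene_set_sp\#ath_GO_0010105 ;
       m:score_value ?score .
    FILTER (?score > 2)
    # orthologs: members of the same gene family that are bait genes
    ?gf m:has_member ?ortholog .
    ?ortholog m:is_bait_of ?bait_set .
    ?bait_set m:has_ausr ?ausr .
    FILTER (?ausr > 0.7)
    # keep only tomato GO gene sets (assumes the sly_GO_ naming convention)
    FILTER (STRSTARTS(STR(?bait_set), STR(m:gene_set_sp\#sly_GO_)))
}
ORDER BY DESC(?score)
```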
