• Data is downloaded or converted to RDF
  • We use public identifiers and vocabularies wherever possible
  • All datasets have extensive provenance information using the VoID specification
  • Any data containing chemicals (SD Files) is processed through our chemistry registration system to assign unique OPS compound identifiers
  • Data is loaded into the triple store (data cache)
  • Mappings between databases (such as DrugBank ID X = ChEMBL ID Y) are loaded into the IMS (top left)
  • At query time, a SPARQL query is passed to the Semantic Workflow Engine, which recognises the input identifiers and asks the IMS for any matching identifiers in other databases. The IMS web service passes these IDs back, and the SPARQL query is rewritten and then executed
  • We serve our data using pre-written, heavily optimised SPARQL queries that are packaged up and made available through our API
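
The query-time step above (look up equivalent identifiers, then rewrite the query) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration only: the `MAPPINGS` dictionary is a hypothetical in-memory stand-in for the IMS web service, the `example.org` URIs and the `#VALUES#` placeholder are invented, and injecting a SPARQL 1.1 `VALUES` clause is just one plausible way to rewrite a query, not necessarily what the Semantic Workflow Engine does.

```python
# Hypothetical stand-in for the IMS: one identifier mapped to its equivalents.
# The real IMS is a web service; these URIs are placeholders.
MAPPINGS = {
    "http://example.org/drugbank/X": ["http://example.org/chembl/Y"],
}

# Hypothetical pre-written query with a placeholder for the identifier set.
QUERY_TEMPLATE = """SELECT ?p ?o WHERE {
  #VALUES#
  ?compound ?p ?o .
}"""


def lookup_equivalents(uri, mappings):
    """Return the input URI plus every URI mapped to it, in sorted order."""
    found = {uri}
    for source, targets in mappings.items():
        if uri == source:
            found.update(targets)
        elif uri in targets:
            # Mappings are treated as symmetric: a target also matches its source.
            found.add(source)
            found.update(targets)
    return sorted(found)


def expand_query(template, variable, uri, mappings):
    """Rewrite the template so the variable ranges over all equivalent URIs."""
    values = " ".join(f"<{u}>" for u in lookup_equivalents(uri, mappings))
    clause = f"VALUES ?{variable} {{ {values} }}"
    return template.replace("#VALUES#", clause)


if __name__ == "__main__":
    print(expand_query(QUERY_TEMPLATE, "compound",
                       "http://example.org/drugbank/X", MAPPINGS))
```

Because the lookup is symmetric in this sketch, starting from either the DrugBank-style or the ChEMBL-style placeholder URI produces the same rewritten query, so the triple store is asked about both identifiers regardless of which one the caller supplied.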