New in the 1.5 API release

Updated versions of the data sets and new features


1) Updated data sources

The updated data sets, with download dates and version numbers, are:

UniProt (manually curated entries only), from 28 Jan 2015, release 2015_1

ENZYME, from 02 Feb 2015, release 2015_1

DrugBank, from 19 Feb 2015, version 4.1

ChEMBL, from 18 Feb 2015, ChEMBL 20

ChEBI, from 04 Mar 2015, release 125

FDA Adverse Events (FAERS) data, from 09 Jul 2012

Gene Ontology, from 04 Mar 2015

Gene Ontology Annotations, from 17 Feb 2015

WikiPathways, from 20 Mar 2015, v20150312

DisGeNET, from 31 Mar 2015, v2.1.0


Data sources not updated:

ConceptWiki

Open PHACTS Chemical Registration Service (OCRS)

neXtProt


Various SPARQL optimizations have been made to improve the API calls and their results.


The Identifier Mapping Service (IMS) has been enriched with additional URI patterns for various datasets.

Quality assurance was carried out by comparing the API results against the native data sources to ensure that the content is the same.


2) New features


API features:

· New filter for the Tissue API calls. A new filter has been added to the Tissues for Protein and Protein for Tissues calls.

· New filter for the Disease API calls. A new filter, assoc_type, has been added to Associations for Disease: Count, Associations for Disease: List, Associations for Target: Count, and Associations for Target: List. This parameter filters these associations by their SIO association type, given as one of the following identifiers:

o sio:SIO_001119 rdfs:label "gene-disease association linked with causal mutation"

o sio:SIO_001120 rdfs:label "therapeutic gene-disease association"

o sio:SIO_001121 rdfs:label "gene-disease biomarker association"

o sio:SIO_001122 rdfs:label "gene-disease association linked with genetic variation"

o sio:SIO_001123 rdfs:label "gene-disease association linked with altered gene expression"

o sio:SIO_001124 rdfs:label "gene-disease association linked with post-translational modification"
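A minimal client-side sketch of how a caller might build a filtered Associations for Disease request follows. The base URL is hypothetical, the input parameter name `uri` is an assumption, and so is the choice to send assoc_type as a full SIO URI (built from the standard SIO namespace) rather than the prefixed `sio:` form; only the parameter name assoc_type and the six identifiers come from these notes.

```python
from urllib.parse import urlencode

# Association types accepted by the assoc_type filter, as listed in the release notes.
ASSOCIATION_TYPES = {
    "SIO_001119": "gene-disease association linked with causal mutation",
    "SIO_001120": "therapeutic gene-disease association",
    "SIO_001121": "gene-disease biomarker association",
    "SIO_001122": "gene-disease association linked with genetic variation",
    "SIO_001123": "gene-disease association linked with altered gene expression",
    "SIO_001124": "gene-disease association linked with post-translational modification",
}

# Namespace of the Semanticscience Integrated Ontology (SIO).
SIO_NAMESPACE = "http://semanticscience.org/resource/"

# HYPOTHETICAL endpoint: replace with the real Associations for Disease: List URL.
BASE_URL = "https://api.example.org/disease/assoc/byDisease"


def build_association_query(disease_uri: str, assoc_type: str) -> str:
    """Build a query URL filtered by SIO association type.

    assoc_type may be given as "sio:SIO_001121" or "SIO_001121". It is sent
    as a full SIO URI, which is an assumption about the accepted value format.
    """
    prefix = "sio:"
    local_id = assoc_type[len(prefix):] if assoc_type.startswith(prefix) else assoc_type
    if local_id not in ASSOCIATION_TYPES:
        raise ValueError(f"unknown SIO association type: {assoc_type}")
    params = {
        "uri": disease_uri,  # ASSUMED name of the disease input parameter
        "assoc_type": SIO_NAMESPACE + local_id,
    }
    return BASE_URL + "?" + urlencode(params)


url = build_association_query("http://example.org/disease/C0000000", "sio:SIO_001121")
print(url)
```

Validating the identifier locally means a typo in the association type fails fast instead of silently returning an unfiltered or empty result.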

 

New results:

Adverse events from the FAERS data and interacting drugs from DrugBank have been added to the results of the Compound Information calls.


3) Response format changes

The color-coded 3-Scale documentation distinguishes optional from required parameters. The documentation also pre-loads the example query URIs in the query box and provides dropdown menus for parameter filters that have fewer than 100 options.


OCRS matches for compounds are now optional. Compounds can therefore be returned without an OCRS identifier, which allows the new ChEMBL 20 URIs to be used to retrieve results from Compound Information and Compound Pharmacology.

This modification is temporary. The requirement for an OCRS identifier will be reinstated in the next release of the platform.


Batch calls: The API exposes only skos:exactMatch relationships between instances, regardless of the underlying mapping in the IMS. For example, the itemized list now has the same structure as the Target Information results.


Target Class members: ENZYME proteins are no longer required to be present in ChEMBL, and UniProt is now the main data source for the proteins displayed in the result. Therefore, all proteins from UniProt are returned in the result, with ChEMBL data included as a sub-block when present.

 

DrugBank results: DrugBank has added language tags (_en) to its data set, and these are now reflected in the results returned by the API.


GO data set: The “primary topic” URL for GO terms now has the form http://purl.obolibrary.org/obo/GO_0000000; previously it had the form http://purl.org/obo/owl/GO#GO_0000000. If a URL in the old http://purl.org/obo/owl/GO# form is supplied, the result contains a primary topic block with the http://purl.obolibrary.org/obo/ form and an exact match block with the http://purl.org/obo/owl/GO# input.
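Clients that store GO URIs in the old form may want to normalize them to the new primary topic form before comparing results. The following is a minimal client-side sketch of that rewrite, not the API's own implementation; the function name is hypothetical, and it assumes GO local identifiers are "GO_" followed by seven digits in both forms.

```python
import re

OLD_GO_PREFIX = "http://purl.org/obo/owl/GO#"
NEW_GO_PREFIX = "http://purl.obolibrary.org/obo/"
# Assumed shape of a GO local identifier in both URI forms: "GO_" plus seven digits.
GO_LOCAL_ID = re.compile(r"GO_\d{7}")


def to_obolibrary_go_uri(uri: str) -> str:
    """Rewrite an old-style GO URI to the purl.obolibrary.org form.

    URIs already in the new form, or that are not old-style GO URIs,
    are returned unchanged.
    """
    if not uri.startswith(OLD_GO_PREFIX):
        return uri
    local_id = uri[len(OLD_GO_PREFIX):]
    if not GO_LOCAL_ID.fullmatch(local_id):
        raise ValueError(f"not a GO term URI: {uri}")
    return NEW_GO_PREFIX + local_id


print(to_obolibrary_go_uri("http://purl.org/obo/owl/GO#GO_0005634"))
# prints: http://purl.obolibrary.org/obo/GO_0005634
```

Leaving new-form URIs untouched makes the function safe to apply to a mixed list of old and new identifiers.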


4) Fixes for previous issues


Target Information: All targets now include amino acid sequences.


Associations for Disease: The requirement that results include the disease class has been removed. Get targets for disease and Associations for Disease now return the same data as the DisGeNET source.


Pathway for Target: Previously missing pathways for the specified target are now returned.

 

Statistics for the 1.5 Release

 

Linked data cache: 

Dataset                                         Triples
http://purl.uniprot.org                     979,928,936
http://www.openphacts.org/goa               879,448,347
http://www.ebi.ac.uk/chembl                 445,732,880
http://www.nextprot.org                     249,403,405
http://rdf.imim.es                           15,011,136
http://aers.data2semantics.org/              13,557,070
http://www.wikipathways.org                   6,110,015
http://www.conceptwiki.org                    4,331,760
http://www.openphacts.org/bio2rdf/drugbank    4,028,767
http://www.geneontology.org                   1,366,494
http://www.ebi.ac.uk/chebi                    1,012,056
http://purl.uniprot.org/enzyme                   61,467
http://www.nextprot.org/caloha                   14,552
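The per-dataset counts in the linked data cache table can be summed to gauge the overall size of the cache. This sketch assumes that each triple is counted under exactly one dataset; if the named graphs overlapped, the sum would overstate the true total.

```python
# Triple counts per dataset in the 1.5 linked data cache (from the table above).
TRIPLES = {
    "http://purl.uniprot.org": 979_928_936,
    "http://www.openphacts.org/goa": 879_448_347,
    "http://www.ebi.ac.uk/chembl": 445_732_880,
    "http://www.nextprot.org": 249_403_405,
    "http://rdf.imim.es": 15_011_136,
    "http://aers.data2semantics.org/": 13_557_070,
    "http://www.wikipathways.org": 6_110_015,
    "http://www.conceptwiki.org": 4_331_760,
    "http://www.openphacts.org/bio2rdf/drugbank": 4_028_767,
    "http://www.geneontology.org": 1_366_494,
    "http://www.ebi.ac.uk/chebi": 1_012_056,
    "http://purl.uniprot.org/enzyme": 61_467,
    "http://www.nextprot.org/caloha": 14_552,
}

# Assumes no triple is counted under more than one dataset.
total = sum(TRIPLES.values())
print(f"Total triples: {total:,}")  # prints: Total triples: 2,600,006,885

# Share of the cache held by each dataset, largest first.
for dataset, count in sorted(TRIPLES.items(), key=lambda item: item[1], reverse=True):
    print(f"{dataset:<44} {count / total:6.2%}")
```

Under that assumption, UniProt and the GO annotations together account for roughly 71.5% of the cache.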

        

Entities                   Number
Compound                1,565,746
Target                    547,357
Pathway                     1,448
Disease                     8,162
Tissue                        817



Identifier mapping service:

31,382,458 Mappings

186 Mapping Sets

50 Source Data Sources

4 Predicates

50 Target Data Sources