Summary 

This release of Open PHACTS opens up access to SureChEMBL patents and enhanced Pathways data.

The main updates are:

  • SureChEMBL patent data up to March 2015
  • New API calls covering Patent data and their links to Compounds, Disease and Targets
  • Structure search extended to cover SureChEMBL plus existing data sources
  • Bioannotation of life-science-relevant full-text patents, with tagging of Disease and Target terms using Scibite Termite
  • Enhanced Pathways data from WikiPathways, allowing queries on interactions between entities in the pathways

We look forward to your comments on this release and will be updating the KNIME and Pipeline workflows accordingly.

There will be further updates to the SureChEMBL patent data and other data sources over the coming months.


The version-independent URL (api.openphacts.org/latest) has also been updated. We welcome feedback on this.


SureChEMBL Details

SureChEMBL (https://www.surechembl.org/) is a patent chemistry resource made freely available by the ChEMBL group at EMBL-EBI. SureChEMBL uses a live, automated, cloud-based pipeline that combines full-text and image mining to extract chemical annotations from patent documents, convert them to compounds, and make them searchable and publicly available within 1-2 days of publication.

To complement the chemistry annotations, approximately 3.6 million full-text life-science-relevant patents published between 1975 and 2015 by the EPO, WIPO and USPTO authorities were annotated with biological entities such as genes and diseases, using the Termite text-mining tool. Furthermore, since patent documents are inherently obfuscated and thus noisy, we developed and validated algorithms to assess the relevance of each entity type (compound, gene, disease) within a particular document, thus enabling ranking and filtering of entities in order to reduce noise and spurious matches.
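
The ranking and filtering described above can be illustrated with a minimal sketch. It is not the validated scoring algorithm: the `Annotation` fields, the numeric score scale and the threshold are all hypothetical, chosen only to show how a per-entity relevance score lets a consumer discard noisy matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    entity: str       # label of the annotated entity (illustrative)
    entity_type: str  # "compound", "gene" or "disease"
    frequency: int    # number of occurrences within the document
    relevance: float  # hypothetical score; higher means more relevant


def rank_and_filter(annotations, min_relevance):
    """Drop annotations below the threshold, then sort by relevance, highest first."""
    kept = [a for a in annotations if a.relevance >= min_relevance]
    return sorted(kept, key=lambda a: a.relevance, reverse=True)


# Made-up example data, not taken from any real patent.
annotations = [
    Annotation("PTGS2", "gene", 5, 2.0),
    Annotation("water", "compound", 40, 0.0),
    Annotation("aspirin", "compound", 12, 3.0),
]

top = rank_and_filter(annotations, min_relevance=1.0)
print([a.entity for a in top])
```

Note that a frequently mentioned but uninformative entity (here "water") is removed by its low score even though its raw frequency is the highest.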

As a result, the relationships between patent documents and annotated entities for 3.6 million life-science-relevant patents are now available via the Open PHACTS API. A series of web service calls has been developed to allow users to query the data and to integrate it with the other data resources included in the Open PHACTS Discovery Platform (https://dev.openphacts.org/docs/2.1). These calls include:

• Patent Information - Retrieves bibliographic information for a patent document, e.g. title, publication date and classification codes.

• Patent Entities - Retrieves all annotations (compounds, genes and diseases) found in a patent document, along with their frequency of occurrence within the document, section and relevance score.

• Patent Entities: Count - Retrieves the number of entities mentioned in the patent specified.

• Patents for Compound: Count - Retrieves the number of patents a compound entity occurs in.

• Patents for Compound: List - Retrieves a list of patents a compound entity occurs in.

• Patents for Target: Count - Retrieves the number of patents a gene entity occurs in.

• Patents for Target: List - Retrieves a list of patents a gene entity occurs in.

• Patents for Disease: Count - Retrieves the number of patents a disease entity occurs in.

• Patents for Disease: List - Retrieves a list of patents a disease entity occurs in.
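
As a rough sketch of how one of the calls listed above might be addressed, the snippet below only builds a request URL and does not contact the service. The endpoint path, the `uri`, `app_id`, `app_key` and `_format` parameter names, the `https` scheme and the example compound URI are all assumptions made for illustration; the API documentation linked above is the authority for the real paths and parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Version-independent base URL mentioned in this release note; the scheme is assumed.
BASE_URL = "https://api.openphacts.org/latest"


def build_call_url(path, entity_uri, app_id, app_key):
    """Build a GET URL for an API call that takes an entity URI (parameter names assumed)."""
    query = urlencode({
        "uri": entity_uri,
        "app_id": app_id,
        "app_key": app_key,
        "_format": "json",
    })
    return f"{BASE_URL}{path}?{query}"


# Hypothetical path standing in for "Patents for Compound: Count".
url = build_call_url(
    "/patent/byCompound/count",
    "http://example.org/compound/123",  # placeholder compound URI
    "my_app_id",
    "my_app_key",
)
print(url)

# Round-trip check: the entity URI survives URL encoding intact.
print(parse_qs(urlsplit(url).query)["uri"])
```

Passing the entity as a URL-encoded query parameter keeps the same helper usable for compound, target and disease calls alike.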


Further information on the annotation process, identifier mapping, relevance scoring and RDF generation is available in the Support portal.


Pathways & WikiPathways


There continue to be huge improvements in ‘omics platforms, for example RNA-Seq, ChIP-Seq and proteomics, as well as increased use of phenotypic screening, resulting in vast amounts of data that need interpreting. Biological pathways are a key resource for understanding output from ‘omics platforms; even the simple knowledge of which pathways are enriched helps. WikiPathways now increasingly tracks interactions semantically by connecting datanodes in the pathway diagram to one another. The interactions can be either directed or undirected. For a directed interaction, the RDF now captures a source and a target: the source is the datanode at the beginning of the line and the target is the datanode at the end that has the arrow. Both SBGN and MIM notations are supported in the RDF and in the queries of the data. Also as part of the RDF update, a new RDF for Reactome pathways was incorporated in WikiPathways.
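
To make the source/target distinction concrete, here is a small sketch that treats interactions as graph edges and walks them upstream or downstream. The `Interaction` class and `reachable` function are illustrative names, not part of the WikiPathways RDF or the Open PHACTS API, and skipping undirected interactions during traversal is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    first: str      # datanode at the beginning of the line
    second: str     # datanode at the other end (the arrow end, if directed)
    directed: bool  # True: first is the source and second is the target


def reachable(start, interactions, direction):
    """Collect datanodes reachable from `start` by following directed interactions."""
    if direction not in ("upstream", "downstream"):
        raise ValueError("direction must be 'upstream' or 'downstream'")
    found = set()
    stack = [start]
    while stack:
        current = stack.pop()
        for i in interactions:
            if not i.directed:
                continue  # undirected interactions carry no up/down information
            # Downstream follows source -> target; upstream follows target -> source.
            src, dst = (i.first, i.second) if direction == "downstream" else (i.second, i.first)
            if src == current and dst not in found:
                found.add(dst)
                stack.append(dst)
    return found


# Toy pathway: A -> B -> C, plus an undirected link between B and D.
edges = [
    Interaction("A", "B", directed=True),
    Interaction("B", "C", directed=True),
    Interaction("B", "D", directed=False),
]

print(sorted(reachable("A", edges, "downstream")))  # ['B', 'C']
print(sorted(reachable("C", edges, "upstream")))    # ['A', 'B']
```

If the pathway contains a cycle, the start node is reported as reachable from itself, which may or may not be the desired behaviour.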

Being able to compute programmatically over biological pathways as a graph allows for greater understanding, e.g. seeing which genes/proteins are upstream or downstream of each other. In the latest update of the Open PHACTS API, three calls were added for pathways, specifically for interactions:


  • GET /pathway/getInteractions which lets the user query for all the interactions involved in a pathway. To use this feature, the user specifies a WikiPathways URI for the pathway, and the query returns information about the interactions, including their direction and information about the connected nodes.
  • GET /pathways/interactions/byEntity which allows the user to specify a node URI (a metabolite, protein, gene product or RNA URI) found in a pathway and find the interactions originating from this point. The user can choose whether to follow interactions from the specified node in the “upstream” or the “downstream” direction. The query also returns the type of the interaction (e.g. directed, undirected, inhibition, stimulation, conversion) as annotated in the pathway resource.
  • GET /pathways/interactions/byEntity/count which counts the interactions from the user-defined starting datanode. It counts all the interactions in which the datanode participates across the whole pathway dataset, so if an identifier is found in multiple pathways, all of the interactions for that identifier are counted.
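
The three calls above could be addressed along the following lines. The paths are taken from the list; everything else is an assumption for illustration: the `https` scheme, the `uri` and `direction` query parameter names, the credential parameters and the placeholder URIs. The sketch only builds URLs and does not send any request.

```python
from urllib.parse import urlencode

# Version-independent base URL mentioned in this release note; the scheme is assumed.
BASE_URL = "https://api.openphacts.org/latest"

# Paths as listed in this release note.
PATHWAY_INTERACTIONS = "/pathway/getInteractions"
ENTITY_INTERACTIONS = "/pathways/interactions/byEntity"
ENTITY_INTERACTIONS_COUNT = "/pathways/interactions/byEntity/count"


def interactions_url(path, uri, direction=None, **credentials):
    """Build a GET URL for a pathway interaction call (parameter names are assumed)."""
    if direction not in (None, "upstream", "downstream"):
        raise ValueError("direction must be 'upstream', 'downstream' or None")
    params = {"uri": uri, **credentials}
    if direction is not None:
        params["direction"] = direction
    return f"{BASE_URL}{path}?{urlencode(params)}"


# Placeholder URIs; real calls would use WikiPathways and datanode URIs.
pathway_url = interactions_url(
    PATHWAY_INTERACTIONS, "http://example.org/pathway/WP0000",
    app_id="my_app_id", app_key="my_app_key",
)
entity_url = interactions_url(
    ENTITY_INTERACTIONS, "http://example.org/node/geneX",
    direction="downstream", app_id="my_app_id", app_key="my_app_key",
)
count_url = interactions_url(
    ENTITY_INTERACTIONS_COUNT, "http://example.org/node/geneX",
    app_id="my_app_id", app_key="my_app_key",
)

for url in (pathway_url, entity_url, count_url):
    print(url)
```

Validating the direction before building the URL catches a typo locally rather than relying on the service to reject it.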


In the course of this work, as anticipated, we uncovered some areas for improvement in the underlying data. These can be rectified in future data releases, resulting in more interaction data.


Bug Fixes & other changes

Deprecated calls removed

Three calls were deprecated in previous API versions and have now been removed in 2.1. These are:

GET /pharmacology/filters/units

GET /pharmacology/filters/units/{act_type}