There will be further updates to the SureChEMBL patent data and other data sources over the coming months.
The version-independent URL (api.openphacts.org/latest) has also been updated. We welcome feedback on this.
As a result, the relationships between patent documents and annotated entities for 3.6 million life-science-relevant patents are now available via the Open PHACTS API. A series of web service calls has been developed to allow users to query the data and integrate it with the other data resources in the Open PHACTS Discovery Platform (https://dev.openphacts.org/docs/2.1). These calls include:
• Patent Information - Retrieves bibliographic information for a patent document, e.g. title, publication date and classification codes.
• Patent Entities - Retrieves all annotations (compounds, genes and diseases) found in a patent document, along with each annotation's frequency of occurrence within the document, the document section and a relevance score.
• Patent Entities: Count - Retrieves the number of entities mentioned in the specified patent.
• Patents for Compound: Count - Retrieves the number of patents a compound entity occurs in.
• Patents for Compound: List - Retrieves a list of patents a compound entity occurs in.
• Patents for Target: Count - Retrieves the number of patents a gene entity occurs in.
• Patents for Target: List - Retrieves a list of patents a gene entity occurs in.
• Patents for Disease: Count - Retrieves the number of patents a disease entity occurs in.
• Patents for Disease: List - Retrieves a list of patents a disease entity occurs in.
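The Count and List calls above are two views of the same patent-to-entity annotations. As a rough local illustration (not the API's implementation), the Python sketch below inverts hypothetical patent → entity annotation pairs into an entity → patents index. The patent IDs, the `compound:`/`gene:`/`disease:` identifiers and all function names are made up for this example, and counting distinct entities per patent is an assumption about what "Patent Entities: Count" reports.

```python
from collections import defaultdict

# Hypothetical (patent_id, entity) annotation pairs, standing in for the
# output of "Patent Entities" calls. IDs and identifiers are made up.
ANNOTATIONS = [
    ("US-0000001-A1", "compound:aspirin"),
    ("US-0000001-A1", "gene:PTGS2"),
    ("US-0000001-A1", "compound:aspirin"),  # repeated mention in the same patent
    ("US-0000002-A1", "compound:aspirin"),
    ("US-0000002-A1", "disease:inflammation"),
    ("US-0000003-A1", "gene:PTGS2"),
]


def build_entity_index(annotations):
    """Invert patent -> entity annotations into entity -> set of patent IDs."""
    index = defaultdict(set)
    for patent_id, entity in annotations:
        index[entity].add(patent_id)
    return index


def patents_for_entity(index, entity):
    """Rough analogue of the 'Patents for ...: List' calls (sorted patent IDs)."""
    return sorted(index.get(entity, set()))


def patent_count_for_entity(index, entity):
    """Rough analogue of the 'Patents for ...: Count' calls."""
    return len(index.get(entity, set()))


def entity_count_for_patent(annotations, patent_id):
    """Rough analogue of 'Patent Entities: Count', assuming distinct entities are counted."""
    return len({entity for pid, entity in annotations if pid == patent_id})
```

Storing patent IDs in a set means that repeated mentions of a compound within one patent count once, which matches "the number of patents a compound entity occurs in".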
Further information on the annotation process, identifier mapping, relevance scoring and RDF generation is available in the Support portal.
Pathways & WikiPathways
There continue to be huge improvements in ‘omics platforms, for example RNA-seq, ChIP-seq and proteomics, along with increased use of phenotypic screening, resulting in vast amounts of data that need interpreting. Biological pathways are a key resource for understanding the output of ‘omics platforms; even simply knowing which pathways are enriched helps.
WikiPathways now increasingly captures interactions semantically by connecting the datanodes in a pathway diagram to one another. Interactions can be either directed or undirected. For a directed interaction, the RDF now captures a source and a target: the source is the datanode at the start of the line, and the target is the datanode at the end that carries the arrow. Both SBGN and MIM notations are supported in the RDF and in queries over the data. Also as part of the RDF update, new RDF for Reactome pathways was incorporated into WikiPathways.
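A minimal sketch of the source/target rule described above, assuming a diagram line is given as its start node, its end node and the position of its arrowhead (if any). The `Interaction` class, `interaction_from_line` and the `arrow_at` values are hypothetical names chosen for illustration; they are not part of the WikiPathways RDF or the Open PHACTS API. An arrowhead at the start of a line is handled symmetrically, as an assumption, for completeness.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interaction:
    # Both datanodes, in the order they appear on the diagram line.
    participants: tuple
    directed: bool
    # Only set for directed interactions; None for undirected ones.
    source: Optional[str] = None
    target: Optional[str] = None


def interaction_from_line(start_node: str, end_node: str, arrow_at: Optional[str]) -> Interaction:
    """Map a pathway-diagram line to an Interaction.

    arrow_at is "end", "start" or None (hypothetical encoding). No arrowhead
    means undirected; otherwise the datanode at the arrowhead is the target
    and the datanode at the other end is the source.
    """
    if arrow_at is None:
        return Interaction(participants=(start_node, end_node), directed=False)
    if arrow_at == "end":
        return Interaction((start_node, end_node), True, source=start_node, target=end_node)
    if arrow_at == "start":
        return Interaction((start_node, end_node), True, source=end_node, target=start_node)
    raise ValueError(f"unexpected arrow position: {arrow_at!r}")
```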
Being able to compute over biological pathways programmatically as a graph allows for greater understanding, e.g. seeing which genes/proteins are upstream or downstream of each other. In the latest update to the Open PHACTS API, three calls were added for pathways, specifically for interactions:
GET /pathway/getInteractions lets users query for all the interactions involved in a pathway. The user specifies a WikiPathways URI for the pathway, and the query returns information about the interactions, including their direction and the connected nodes.
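The sketch below shows how a client might build a request URL for this call. The base URL is the version-independent endpoint mentioned earlier; the `https` scheme and the query-parameter names (`uri`, `app_id`, `app_key`, `_format`) are assumptions for illustration rather than details confirmed by this post, so check the API documentation before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the version-independent endpoint from this post, served over HTTPS.
BASE_URL = "https://api.openphacts.org/latest"


def get_interactions_url(pathway_uri, app_id, app_key):
    """Build a GET /pathway/getInteractions URL for one WikiPathways pathway URI.

    The parameter names below are assumptions, not confirmed by this post.
    """
    params = {
        "uri": pathway_uri,  # WikiPathways URI of the pathway
        "app_id": app_id,    # assumed credential parameter
        "app_key": app_key,  # assumed credential parameter
        "_format": "json",   # assumed response-format parameter
    }
    # urlencode percent-escapes the URI so it can be passed as a query value.
    return f"{BASE_URL}/pathway/getInteractions?{urlencode(params)}"


def fetch_json(url, timeout=30):
    """Perform the request and decode the JSON body (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `get_interactions_url(pathway_uri, "YOUR_APP_ID", "YOUR_APP_KEY")` returns a URL that `fetch_json` can request; the structure of the JSON response is not described in this post, so the sketch stops at decoding it.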
GET /pathways/interactions/byEntity lets the user specify a node URI found in a pathway (a metabolite, protein, gene product or RNA URI) and retrieve the interactions originating from that node, together with their direction. The user can choose whether to follow interactions in the “upstream” or the “downstream” direction from the specified node. The query also returns the type of each interaction (e.g. directed, undirected, inhibition, stimulation, conversion), as annotated in the pathway resource.
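To illustrate what “upstream” and “downstream” mean over a pathway graph, here is a local Python sketch (not how the API computes its results) that walks hypothetical directed (source, target) interactions with a breadth-first search. The edge list and the `reachable` function are invented for illustration. `max_depth=1` limits the walk to interactions touching the start node directly; this post does not say whether the API call returns only direct interactions or follows them further, so the transitive mode is an illustrative extension.

```python
from collections import deque

# Hypothetical directed interactions (source, target) from one pathway.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "B")]


def reachable(edges, start, direction="downstream", max_depth=None):
    """Return the nodes reachable from start by following directed interactions.

    downstream follows source -> target; upstream follows target -> source.
    max_depth=None walks the whole graph; max_depth=1 keeps direct neighbours only.
    """
    if direction not in ("upstream", "downstream"):
        raise ValueError("direction must be 'upstream' or 'downstream'")
    # Adjacency list oriented according to the requested direction.
    neighbours = {}
    for source, target in edges:
        a, b = (source, target) if direction == "downstream" else (target, source)
        neighbours.setdefault(a, []).append(b)
    seen = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for nxt in neighbours.get(node, []):
            # Skip the start node and anything already visited (handles cycles).
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen
```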
GET /pathways/interactions/byEntity/count counts the interactions for a user-defined starting datanode. It counts every interaction in which the datanode participates across the whole pathway dataset, so if an identifier is found in multiple pathways, the interactions from all of those pathways are included.
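A small sketch of the counting behaviour described above, using hypothetical pathway IDs and node names: an identifier's interactions are summed over every pathway in which it appears. Treating an interaction that appears in two pathways as two separate interactions is an assumption of this sketch, as are the function names.

```python
# Hypothetical interactions grouped by pathway: pathway_id -> list of (source, target).
PATHWAYS = {
    "WP-X": [("A", "B"), ("B", "C")],
    "WP-Y": [("B", "D")],
    "WP-Z": [("E", "F")],
}


def count_interactions(pathways, node):
    """Count interactions the node takes part in (at either end), across all pathways."""
    return sum(
        1
        for interactions in pathways.values()
        for source, target in interactions
        if node in (source, target)
    )


def count_by_pathway(pathways, node):
    """Per-pathway breakdown, keeping only pathways in which the node participates."""
    counts = {}
    for pathway_id, interactions in pathways.items():
        n = sum(1 for source, target in interactions if node in (source, target))
        if n:
            counts[pathway_id] = n
    return counts
```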
As anticipated, in the course of this work we uncovered some areas for improvement in the underlying data. These can be rectified in future data releases, resulting in more interaction data.