While querying the system, I have stumbled on strange URIs for drugs. I am copying one here as an example:
They come from the adverse events dataset (aers.data2semantics.org). The identity resolution service does not return any mappings for them.
Is this an error, or is there some rationale behind these URIs?
The loaded AERS dataset (2012-07-09) unfortunately contains many strange and duplicated identifiers, e.g.
The URIs seem to correspond 1-1 to a URI %-escape of the first 45 characters of the rdfs:label with spaces replaced with _, e.g. "ATARAX-P /00058402/"
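If that reading of the pattern is right, the URI construction could be sketched roughly as below. This is only an illustration: the helper name `aers_drug_uri` is made up, and the order of the steps (replace spaces, cut to 45 characters, then %-escape) is my assumption from the observed URIs, not something confirmed by the AERS conversion code.

```python
from urllib.parse import quote

# Namespace observed in the AERS drug URIs quoted in this thread.
BASE = "http://aers.data2semantics.org/resource/drug/"


def aers_drug_uri(label: str) -> str:
    """Guess the AERS drug URI for an rdfs:label (hypothetical helper).

    Assumed steps: spaces become underscores, the result is cut to
    45 characters, and everything outside the unreserved set
    (including "/" and parentheses) is %-escaped.
    """
    local = label.replace(" ", "_")[:45]
    # safe="" makes quote() escape "/" as %2F instead of leaving it alone.
    return BASE + quote(local, safe="")
```

With this sketch, a label containing "/00058402/" ends up with "%2F00058402%2F" in the URI, and padding spaces in the label turn into the long runs of underscores.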
In the corresponding IRS linkset from AERS to Drugbank you will find even more of these duplicates, e.g.
<http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB00557> skos:exactMatch <http://aers.data2semantics.org/resource/drug/ATARAX>, <http://aers.data2semantics.org/resource/drug/ATARAXOID>, <http://aers.data2semantics.org/resource/drug/ATARAX_%2F00595201%2F>, <http://aers.data2semantics.org/resource/drug/ATARAX_____________________________%2F00058402%2F>, <http://aers.data2semantics.org/resource/drug/ATARAX_____________________________%2F00058403%2F> .
The upstream http://aers.data2semantics.org/ is unfortunately currently not available.
When querying the IRS mapUri method for one of these URIs, take care to double-escape those %s to %25, e.g. https://beta.openphacts.org/1.5/mapUri?app_id=161aeb7d&app_key=bbcba81896020f0b95e3dd35b55e3345&Uri=http%3A%2F%2Faers.data2semantics.org%2Fresource%2Fdrug%2FZOPICLONE__________%2528ZOPICLONE%2529
or if you have your own IRS instance, for example http://heater.cs.man.ac.uk:3004/QueryExpander/mapUri?Uri=http%3A%2F%2Faers.data2semantics.org%2Fresource%2Fdrug%2FATARAX_____________________________%252F00058402%252F&lensUri=http%3A%2F%2Fopenphacts.org%2Fspecs%2F%2FLens%2FDefault&Pattern+Filter=&overridePredicateURI=&format=text%2Fhtml
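The double escaping does not have to be done by hand: encoding the whole URI as a query parameter value turns every existing % into %25 automatically. A minimal Python sketch, where `map_uri_query` is a hypothetical helper and the base URL and any extra parameters (such as app_id and app_key) are whatever your own IRS instance expects:

```python
from urllib.parse import urlencode


def map_uri_query(base: str, uri: str, **params: str) -> str:
    """Build a mapUri request URL (hypothetical helper, not an official client).

    urlencode() percent-encodes the URI as a parameter value, so a "%2F"
    already present in the AERS URI becomes "%252F" in the request.
    """
    query = urlencode({"Uri": uri, **params})
    return f"{base}?{query}"


# Example with a placeholder base URL; substitute your own endpoint and keys.
print(map_uri_query("https://example.org/mapUri", "http://a/b%2Fc"))
```

The point of the design is that the AERS URI is passed in exactly as it appears in the data, with its single level of escaping, and the second level is added only when the request URL is built.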
For the example you found, I can confirm that "ATARAX-P" is not part of the linkset, and is therefore not recognized by the IMS. It seems the AERS linkset is newer than the AERS data dump, which could explain the mismatch. I have tracked this in Open PHACTS' issue tracker so that we can update the AERS data to make these identifiers match.
It seems the "real" AERS drug identifier is the number between the slashes, e.g. /00058402/ (when present), which also appears in http://aers.data2semantics.org/resource/drug/VISTARIL___________________________%2F00058402%2F. That URI IS in the IMS as:
<http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB00557> skos:exactMatch <http://aers.data2semantics.org/resource/drug/VISTARIL___________________________%2F00058402%2F>.
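If the number between the slashes really is the stable identifier, one way to spot URIs that share it is to pull it out of the last path segment and group on it. A rough Python sketch; `aers_code` and `group_by_code` are made-up names, and the assumption that the code is a run of digits at the very end of the label comes only from the examples in this thread:

```python
import re
from collections import defaultdict
from urllib.parse import unquote

# Assumed shape: "/<digits>/" at the very end of the unescaped local name.
_CODE = re.compile(r"/(\d+)/$")


def aers_code(uri: str):
    """Return the trailing numeric code of an AERS drug URI, or None."""
    # The slashes around the code are escaped as %2F, so the last literal
    # "/" in the URI still separates the namespace from the local name.
    local = unquote(uri.rsplit("/", 1)[-1])
    match = _CODE.search(local)
    return match.group(1) if match else None


def group_by_code(uris):
    """Group URIs by their trailing code; URIs without one go under None."""
    groups = defaultdict(list)
    for uri in uris:
        groups[aers_code(uri)].append(uri)
    return dict(groups)
```

Under these assumptions the ATARAX and VISTARIL URIs ending in %2F00058402%2F fall into the same group, while plain names such as ATARAX end up under None.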
It is one of the datasets in 1.5 (aers.data2semantics.org).
I retrieved it directly via a SPARQL query (the query asks for all compounds with adverse events in a tissue, and it is federated with MedDRA to resolve the adverse event terms).