
Chemical Similarity in Open PHACTS

I have come across some strange behavior in the similarity search.

I was looking at compounds similar to necrostatin and found that some of the results with quite high similarity scores (0.8) do not look structurally similar to it at all.

I compared the results with ChEMBL, and these compounds did not appear in its results at all.

I took some of the compounds for which Open PHACTS reported a Tanimoto similarity of 0.8 and compared them with necrostatin using a couple of different fingerprints. In all cases, the first three compounds had values similar to each other, whereas the fourth ("strange") compound had a much lower value.

See image.
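
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how a Tanimoto coefficient is computed from two fingerprints. It assumes each fingerprint is represented as a set of on-bit indices. The bit sets below are made-up toy data, not the real fingerprints of the compounds in this post, and this is not necessarily how Open PHACTS or the RSC search computes its score. It only shows why a small query that is largely contained in a much larger molecule would normally get a low Tanimoto value.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient for two fingerprints given as sets of on-bit indices.

    T = c / (a + b - c), where a and b are the numbers of on bits in each
    fingerprint and c is the number of on bits they share.
    """
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0  # convention chosen for this sketch; two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


# Toy fingerprints (hypothetical bit indices, for illustration only).
query = set(range(10))                               # small query, 10 on bits
close_analog = set(range(8)) | {100, 101}            # same size, shares 8 bits
large_molecule = set(range(8)) | set(range(200, 240))  # shares 8 bits, 40 extra

print(round(tanimoto(query, close_analog), 3))    # 8 / (10 + 10 - 8)
print(round(tanimoto(query, large_molecule), 3))  # 8 / (10 + 48 - 8)
```

With these toy inputs the close analog scores about 0.67, while the large molecule scores 0.16 even though it shares just as many bits with the query, because the extra bits inflate the denominator.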


SMILES:

Necrostatin: CN1C(=S)NC(Cc2c[nH]c3ccccc23)C1=O

1st "strange" compound on slide:

CC[C@@H](C)[C@@H]1C(=O)N2CCCC[C@@H]2C(=O)N3CCN(CC3)C(=O)N([C@@H](C(=O)N4CCC[C@H]4C(=O)N[C@@H](C(=O)N1)CC5=CNC6=CC=CC=C65)CC7=CC=CC=C7)C

2nd "strange" compound on slide:

C[C@H]1C(=O)N[C@@H](C(=O)NCCCCCCC(=O)N[C@H](C(=O)N[C@@H](C(=O)N[C@@H](C(=O)N[C@@H](C(=O)N1C)CCCCN)CC2=CNC3=CC=CC=C32)CC4=CC=CC=C4)CC5=CC=CC=C5)CC6=CC=CC=C6

Compounds in the table, top to bottom:

COC1=CC2=C(NC=C2CC2NC(=O)N(C)C2=O)C=C1

CN1C(=O)NC(CC2=C(C)NC3=CC=CC=C23)C1=O

CN1C(=O)NC(CC2=CNC3=C2C=CC=C3F)C1=O

C[C@@H]1N(C)C(=O)[C@@H](CCCCN)NC(=O)[C@@H](CC2=CNC3=CC=CC=C23)NC(=O)[C@@H](CC2=CC=CC=C2)NC(=O)[C@H](CC2=CC=CC=C2)NC(=O)CCCCCCNC(=O)[C@@H](CC2=CC=CC=C2)NC1=O



The structure search API calls are provided by RSC, which uses the Bingo search cartridge. It is the same search that is used on the ChemSpider page, and you can find these compounds there as well if you set the similarity threshold to >70%.

 

I understand that the similarity search is done by the RSC engine; however, I am not sure this explains why these compounds should count as similar. They have very little in common with necrostatin, and when you compare them with the other compounds at 0.8 similarity, the scores raise questions.
