2. 3-D Molecular Similarity Assessment for European Orphan Drugs

Members : TBD

Mentor : Sunghwan Kim (PubChem/NCBI)

 

Project Description

If a drug receives marketing authorization in Europe with orphan designation (meaning that it is approved for a rare disease), it is granted market exclusivity for 10 years (meaning that no "similar" drug for the same indication can enter the market during that period). Please see this document for more details:

http://www.ema.europa.eu/ema/index.jsp?curl=pages/regulation/general/general_content_000392.jsp&mid=WC0b01ac058061f019

Therefore, the European Medicines Agency (EMA), which is responsible for marketing authorization of medicinal products, requires the applicant for a new drug to submit a "similarity report", which compares the new drug with existing drugs in terms of molecular structure, mechanism of action, and indication. For more details, see Section 2.1 of this document:

http://ec.europa.eu/health//sites/health/files/files/orphanmp/doc/c_2008_4077_en.pdf

While a comparison of molecular structure similarity is required for drug approval, molecular similarity is a very subjective concept, and there is no standard way to evaluate it.

For this reason, some papers have analyzed molecular similarity among approved drugs using several 2-D similarity methods:

http://jcheminf.springeropen.com/articles/10.1186/1758-2946-6-5
http://www.sciencedirect.com/science/article/pii/S1359644616304718

However, these studies evaluated molecular similarity using only 2-D similarity methods. Using 3-D similarity methods may give different insights into similarity assessment for EMA's orphan designations.

 

Methods

This study will take the following steps:

  1. Get all approved orphan drugs from the European Medicines Agency
  2. Retrieve all known drugs from a public database (e.g., PubChem, DrugBank)
  3. Generate 3-D conformers for the drugs in (1) and (2)
  4. Compute 3-D similarity scores between the drugs, using the 3-D conformers generated in (3) and several 3-D similarity methods.
  5. Compute 2-D similarity scores between the drugs, using commonly used 2-D fingerprint methods.
  6. Identify drug-drug pairs with a low 2-D score but with a high 3-D score (meaning that the two drugs are similar in 3-D but not in 2-D).
  7. Identify drug-drug pairs with a high 2-D score but with a low 3-D score [that is, opposite to (6)].
  8. Discuss the difference between 2-D and 3-D similarity in recognizing molecular similarity.
  9. Discuss potential impacts of using 3-D similarity methods for EMA's similarity assessment for marketing authorization.
  10. Discuss how EMA's and FDA's regulations are different in terms of orphan drug marketing approval.

This project is quite straightforward, but it will take more time than other projects, because 3-D similarity comparison takes longer than 2-D similarity comparison.
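Steps (5) to (7) above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: fingerprints are represented as plain sets of "on" bit positions, the 3-D scores are assumed to have been computed already by whatever 3-D method is chosen in step (4), and the cutoffs `low` and `high` are placeholder values, not thresholds defined by this project or by EMA. All function names are hypothetical.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_2d(fingerprints):
    """Return {(id_a, id_b): score} for every unordered pair of drugs."""
    return {
        (x, y): tanimoto(fingerprints[x], fingerprints[y])
        for x, y in combinations(sorted(fingerprints), 2)
    }


def discordant_pairs(scores_2d, scores_3d, low=0.4, high=0.8):
    """Split pairs into step (6) and step (7) candidates.

    low and high are placeholder cutoffs (assumptions for this sketch).
    """
    similar_3d_only = []  # step (6): low 2-D score, high 3-D score
    similar_2d_only = []  # step (7): high 2-D score, low 3-D score
    for pair, s2 in scores_2d.items():
        s3 = scores_3d.get(pair)
        if s3 is None:
            continue  # no 3-D score available for this pair
        if s2 < low and s3 >= high:
            similar_3d_only.append(pair)
        elif s2 >= high and s3 < low:
            similar_2d_only.append(pair)
    return similar_3d_only, similar_2d_only


# Toy data only: made-up bit sets and made-up 3-D scores.
fps = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 4, 5}, "C": {7, 8, 9}}
scores_3d = {("A", "B"): 0.30, ("A", "C"): 0.85, ("B", "C"): 0.50}

scores_2d = pairwise_2d(fps)
step6, step7 = discordant_pairs(scores_2d, scores_3d)
print("3-D similar only:", step6)
print("2-D similar only:", step7)
```

In a real run the bit sets would come from a 2-D fingerprint method and the 3-D scores from the conformer-based comparison, so the toy dictionaries would be replaced by those results.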


Comments 11

olcc s16 | Mon, 03/13/2017 - 13:06
Hi Dr. Kim, while waiting for more participants in this project, do you have any suggested papers on 3-D similarity? I want to study the background before the project starts. Thanks - Phuc

Sunghwan Kim | Wed, 03/15/2017 - 23:10

I recommend these two papers:

Sunghwan Kim | Fri, 03/31/2017 - 13:32

Yes, I believe the link contains the information that we should start with. Please map the drug names to PubChem CIDs (along with any other information needed for the next step). I see a couple of issues here:

(1) The list contains biologics (e.g., monoclonal antibodies and protein drugs) as well as small-molecule drugs. So, it's likely that you will fail to map these drugs to CIDs, because PubChem primarily contains small molecules. Do not worry even if you get many failed searches. We will focus on small molecules.

(2) The list contains thousands of drugs, and Google Sheets cannot make that many web service requests, so you will need to use MS Excel for this task. Once you have a shorter list of drugs (after removing biologics and duplicates), it may be possible to use Google Sheets (depending on the size of the data set).
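As an alternative to a spreadsheet, the name-to-CID lookup can also be scripted. The sketch below assumes PubChem's PUG REST interface is used for the lookup, which is not necessarily what was done in this project; the helper names `name_to_cids_url` and `fetch_cids` are hypothetical. A name that PubChem does not know (for example, a biologic) comes back as an HTTP 404, which is treated here as "no CID".

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name_to_cids_url(name):
    """Build the PUG REST URL that returns the CIDs for a chemical name as plain text."""
    # Names containing characters such as '/' may not work in the URL path;
    # handling those cases is outside the scope of this sketch.
    return f"{PUG_REST}/compound/name/{quote(name.strip(), safe='')}/cids/TXT"


def fetch_cids(name, timeout=30):
    """Return the list of CIDs for a name, or an empty list if PubChem finds none."""
    try:
        with urlopen(name_to_cids_url(name), timeout=timeout) as resp:
            text = resp.read().decode("utf-8")
    except HTTPError as err:
        if err.code == 404:  # name not found in PubChem
            return []
        raise
    return [int(token) for token in text.split()]


# Building the URL needs no network access:
print(name_to_cids_url(" imatinib mesylate "))
```

For a list of thousands of names, the requests should be spaced out (for example with `time.sleep` between calls), since PubChem limits how many requests per second a client may send.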

olcc s16 | Thu, 04/06/2017 - 12:12
Dr. Kim, I have cleaned up the Europa orphan drug list as you instructed. However, I might have skipped biological substances while processing the original list (which I think is OK, since PubChem won't return any results for them), and I think I also caught some mistakes in the drug nomenclature (such as parentheses). So, I tried to upload the list into GoogleSheet and linked to CACTUS resolver to retrieve the Standard Inchi Key (but it took forever, and some cells are still showing "loading"). Then, I upload the list to Pubchem as synonyms to find and I got 673 results. I uploaded to OLCC the original list, the updated list, and the list from PubChem.

olcc s16 | Thu, 04/06/2017 - 12:32
And also, for later work, what would be the smart and faster way to do clean up just like the last list? I had to do it manually, and it would take a while. Thanks

Sunghwan Kim | Thu, 04/06/2017 - 17:51
By the way, how did you map the names with the CIDs? Please explain it to me briefly. (one or two sentences should be fine).

Sunghwan Kim | Thu, 04/06/2017 - 17:48

>> I tried to upload the list into GoogleSheet and linked to CACTUS resolver to retrieve the Standard Inchi Key.

I've already mentioned that GoogleSheet cannot handle a large number of web service requests.  Excel is the one you should use.

>> I upload the list to Pubchem as synonyms to find and I got 673 results

I don't see the one-to-one mapping of the synonyms and the CIDs (which name was mapped to which CID).  I recommend keeping this information in one file.  Ideally, you would add an additional column for the mapped CID to the original Excel file downloaded from EMA.  (Of course, you need to keep a copy of the unmodified original file separately.)
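One way to keep the name-to-CID mapping in a single file is sketched below, assuming the EMA list has been exported to CSV. The column names ("Active substance", "Status", "PubChem_CID"), the substance names, and the CID values are all made-up placeholders for illustration, and `add_cid_column` is a hypothetical helper. The original rows are copied rather than modified, so the unmodified data stays available.

```python
import csv
import io


def add_cid_column(rows, name_field, name_to_cid, cid_field="PubChem_CID"):
    """Return copies of the rows with one extra column holding the mapped CID.

    Names without a mapping get an empty string, so failures stay visible.
    """
    out = []
    for row in rows:
        new_row = dict(row)  # copy, so the original row is left untouched
        new_row[cid_field] = name_to_cid.get(row[name_field].strip(), "")
        out.append(new_row)
    return out


# Placeholder data standing in for the exported EMA list.
original = io.StringIO(
    "Active substance,Status\n"
    "substance_one,Authorised\n"
    "antibody_x,Authorised\n"
)
rows = list(csv.DictReader(original))

# Placeholder mapping; in practice this comes from the PubChem search results.
mapped = add_cid_column(rows, "Active substance", {"substance_one": 111})

# Write the result to a new CSV (an in-memory buffer here instead of a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(mapped[0].keys()))
writer.writeheader()
writer.writerows(mapped)
print(buffer.getvalue())
```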

>> for later work, what would be the smart and faster way to do clean up just like the last list?

Are you talking about getting InChIKeys or mapping synonyms with CIDs, or something else?  Please tell me more specifically.

 

olcc s16 | Thu, 04/06/2017 - 21:31

Dr. Kim
>>Mapping names with CID- After cleaning up the original list, I sorted it alphabetically. Then I converted the list from Excel to a CSV file and cleaned up the CSV file, because it contained some quotation marks that PubChem couldn't read (which gave me an error report). Then I uploaded it through the Identifier Exchange Service and chose synonyms as the input ID list.

>>Google Sheet- Yes, you mentioned it previously, but I don't know how to do it in Excel yet, and since the list had been cleaned up a bit, I thought it would be OK. I'll figure out how to redo it in Excel.

>>I don't see the one-to-one mapping of the synonyms and the CIDs (which name was mapped to which CID).
I'm sorry. I forgot to upload that list

>> for later work, what would be the smart and faster way to do clean up just like the last list?
What I mean is removing the biologics from the list. I have been cleaning up the list manually, and I'm afraid that I might have skipped some, or that I might encounter a larger list in the future.

Thanks

Sunghwan Kim | Sat, 04/08/2017 - 16:54

>> for later work, what would be the smart and faster way to do clean up just like the last list?

Well, it would be case-by-case.  In my case, I would do the mapping first with the raw list.  If your query does not exist in PubChem (because it's not a small molecule or for whatever reason), the search will not return any CID.  Then you have a list of the queries that failed, so you can look into only those failures during the subsequent cleanup.
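The "map first, then inspect only the failures" workflow described above might look like the following sketch. The function `split_by_mapping` is hypothetical, and the lookup is passed in as a parameter so that any search method can be plugged in; a dictionary with made-up names and CIDs stands in for the real PubChem search here.

```python
def split_by_mapping(names, lookup):
    """Run the lookup on every raw name and separate successes from failures.

    lookup(name) is expected to return a list of CIDs (empty if nothing is found).
    """
    mapped = {}
    failed = []
    for name in names:
        cids = lookup(name)
        if cids:
            mapped[name] = cids
        else:
            failed.append(name)  # only these need manual review later
    return mapped, failed


# Stand-in for a real search: made-up names and CIDs for illustration only.
fake_results = {"substance_one": [111], "substance_two": [222, 333]}
raw_names = ["substance_one", "antibody_x", "substance_two"]

mapped, failed = split_by_mapping(raw_names, lambda n: fake_results.get(n, []))
print("mapped:", mapped)
print("to review by hand:", failed)
```

The failed list then contains the biologics together with any misspelled or oddly formatted names, so the manual cleanup is limited to those entries.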

 
