Members: TBD
Mentor: Sunghwan Kim (PubChem/NCBI)
Project Description
If a drug receives marketing authorization in Europe with an orphan designation (meaning that it is approved for a rare disease), it is granted 10 years of market exclusivity (meaning that no "similar" drug for the same indication can enter the market). Please see this document for more details:
Therefore, the European Medicines Agency (EMA), which is responsible for the marketing authorization of medicinal products, requires the applicant for a new drug to submit a "similarity report" that compares the new drug with existing drugs in terms of molecular structure, mechanism of action, and indication. For more details, see Section 2.1 of this document:
http://ec.europa.eu/health//sites/health/files/files/orphanmp/doc/c_2008_4077_en.pdf
Although a molecular structure similarity comparison is required for drug approval, molecular similarity is a highly subjective concept, and there is no standard way to evaluate it.
For this reason, some papers have analyzed molecular similarity among approved drugs using several 2-D similarity methods:
http://jcheminf.springeropen.com/articles/10.1186/1758-2946-6-5
http://www.sciencedirect.com/science/article/pii/S1359644616304718
However, these studies evaluated molecular similarity only with 2-D methods; using 3-D similarity methods may provide different insights into similarity assessment for EMA's orphan designations.
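A common 2-D approach compares binary structural fingerprints with the Tanimoto coefficient, T = c / (a + b - c), where a and b are the numbers of "on" bits in each fingerprint and c is the number they share. A minimal sketch, assuming the fingerprints have already been computed by some cheminformatics toolkit and stored as sets of on-bit indices (the function name and toy bit sets are illustrative, not taken from the cited papers):

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints.

    Each fingerprint is given as the set of indices of its "on" bits, so
    T = c / (a + b - c), with a and b the on-bit counts and c the shared count.
    """
    if not bits_a and not bits_b:
        return 0.0  # convention chosen for this sketch: two empty fingerprints score 0
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


# Toy fingerprints (hypothetical bit positions, not real PubChem fingerprints).
print(tanimoto({1, 2, 3, 4}, {3, 4, 5}))  # 0.4
```

Real fingerprints (for example, PubChem's 881-bit substructure fingerprint) are much longer, but the score is computed the same way.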
Methods
This study will take the following steps:
(1) Obtain the list of all approved orphan drugs from the European Medicines Agency.
(2) Retrieve all known drugs from a public database (e.g., PubChem, DrugBank).
(3) Generate 3-D conformers for the drugs in (1) and (2).
(4) Compute 3-D similarity scores between the drugs, using the 3-D conformers generated in (3) and several 3-D similarity methods.
(5) Compute 2-D similarity scores between the drugs, using commonly used 2-D fingerprint methods.
(6) Identify drug-drug pairs with a low 2-D score but a high 3-D score (meaning that the two drugs are similar in 3-D but not in 2-D).
(7) Identify drug-drug pairs with a high 2-D score but a low 3-D score (i.e., the opposite of (6)).
(8) Discuss the differences between 2-D and 3-D similarity in recognizing molecular similarity.
(9) Discuss the potential impact of using 3-D similarity methods in EMA's similarity assessment for marketing authorization.
(10) Discuss how EMA's and FDA's regulations differ with respect to orphan drug marketing approval.
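Steps (6) and (7) amount to filtering drug pairs on two scores at once. A minimal sketch, assuming the 2-D and 3-D scores have already been computed and stored in dictionaries keyed by drug-name pairs; the function name, drug names, scores, and cutoffs are all hypothetical. Separate cutoffs are used for each method because 3-D scores may be on a different scale (for example, a combined shape-plus-feature score may range up to 2 rather than 1).

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def discordant_pairs(
    scores_2d: Dict[Pair, float],
    scores_3d: Dict[Pair, float],
    cutoffs_2d: Tuple[float, float],
    cutoffs_3d: Tuple[float, float],
) -> Tuple[List[Pair], List[Pair]]:
    """Return (similar in 3-D only, similar in 2-D only) drug pairs.

    Each cutoffs tuple is (low, high): a score <= low counts as "dissimilar",
    a score >= high as "similar". Only pairs scored by both methods are used.
    """
    low_2d, high_2d = cutoffs_2d
    low_3d, high_3d = cutoffs_3d
    only_3d: List[Pair] = []
    only_2d: List[Pair] = []
    for pair in sorted(scores_2d.keys() & scores_3d.keys()):
        s2, s3 = scores_2d[pair], scores_3d[pair]
        if s2 <= low_2d and s3 >= high_3d:
            only_3d.append(pair)  # step (6): low 2-D, high 3-D
        elif s2 >= high_2d and s3 <= low_3d:
            only_2d.append(pair)  # step (7): high 2-D, low 3-D
    return only_3d, only_2d


# Hypothetical scores and cutoffs, for illustration only.
s2 = {("A", "B"): 0.2, ("A", "C"): 0.9, ("B", "C"): 0.5}
s3 = {("A", "B"): 1.6, ("A", "C"): 0.5, ("B", "C"): 1.0}
print(discordant_pairs(s2, s3, cutoffs_2d=(0.3, 0.7), cutoffs_3d=(0.6, 1.4)))
# ([('A', 'B')], [('A', 'C')])
```

How to choose the cutoffs is itself part of the analysis; fixed thresholds are only the simplest option.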
This project is conceptually straightforward but will likely take more time than other projects, because 3-D similarity comparison takes longer than 2-D similarity comparison.
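Much of the extra time comes from the number of 3-D comparisons. As a back-of-the-envelope sketch, assuming that each orphan drug is compared against every known drug and that a 3-D method aligns every conformer of one drug against every conformer of the other (both are assumptions; actual tools differ), the counts scale as follows. A 2-D method needs only one fingerprint comparison per drug pair, and each 3-D alignment is itself an optimization rather than a bit count.

```python
def comparison_counts(n_orphan: int, n_known: int, conformers_per_drug: int) -> tuple[int, int]:
    """Return (number of drug pairs, number of 3-D conformer alignments).

    Assumes every orphan drug is compared with every known drug and that all
    conformer pairs of a drug pair are aligned.
    """
    drug_pairs = n_orphan * n_known
    alignments = drug_pairs * conformers_per_drug ** 2
    return drug_pairs, alignments


# Hypothetical sizes, for illustration only.
pairs, alignments = comparison_counts(n_orphan=100, n_known=1_000, conformers_per_drug=10)
print(pairs, alignments)  # 100000 10000000
```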
Comments
Extra Reading
Please read these two papers
I recommend these two papers:
How Similar Are Similarity Searching Methods? A Principal Component Analysis of Molecular Descriptor Space
J. Chem. Inf. Model., 2009, 49 (1), pp 108–119 (http://pubs.acs.org/doi/abs/10.1021/ci800249s)
How Diverse Are Diversity Assessment Methods? A Comparative Analysis and Benchmarking of Molecular Descriptor Space
J. Chem. Inf. Model., 2014, 54 (1), pp 230–242 (http://pubs.acs.org/doi/abs/10.1021/ci400469u)
List of Orphan Drugs
http://www.ema.europa.eu/ema/index.jsp?curl=pages%2Fmedicines%2Flanding%...
This is a link to the European Medicines Agency's list of orphan drugs. The full list is sorted alphabetically, and it can also be downloaded as a spreadsheet.
Phuc
Map drugs with PubChem CIDs
Yes, I believe the link contains the information we should start with. Please map the drug names to PubChem CIDs (and collect any other information needed for the next step). I see a couple of issues here:
(1) The list contains biologics (e.g., monoclonal antibodies and protein drugs) as well as small-molecule drugs. You will likely fail to map the biologics to CIDs, because PubChem primarily contains small molecules. Do not worry if many searches fail; we will focus on small molecules.
(2) The list contains thousands of drugs, and Google Sheets cannot make that many web service requests, so you will need to use MS Excel for this task. Once you have a shorter list of drugs (after removing biologics and duplicates), it may be possible to use Google Sheets (depending on the size of the data set).
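The same name-to-CID lookup can also be scripted against PubChem's PUG REST interface, which returns the CIDs matching a name as plain text. The sketch below is only an illustration (the helper names are hypothetical): it sends one request per name, pauses between requests to stay within PubChem's published limit of five requests per second, and records an empty list when PubChem answers with an HTTP error, such as 404 for an unknown name.

```python
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Dict, Iterable, List

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_url(name: str) -> str:
    """Build a PUG REST URL that returns the CIDs for a compound name, one per line."""
    return f"{PUG_REST}/compound/name/{urllib.parse.quote(name, safe='')}/cids/TXT"


def map_names_to_cids(names: Iterable[str], delay: float = 0.25) -> Dict[str, List[int]]:
    """Query PubChem once per name; an empty list means no match was returned."""
    mapping: Dict[str, List[int]] = {}
    for name in names:
        try:
            with urllib.request.urlopen(cid_url(name), timeout=30) as resp:
                text = resp.read().decode("utf-8")
            mapping[name] = [int(token) for token in text.split()]
        except urllib.error.HTTPError:
            # 404 (name not found, e.g. a biologic) and other HTTP errors
            # are recorded as failures for later review.
            mapping[name] = []
        time.sleep(delay)  # at most ~4 requests per second, under PubChem's limit of 5
    return mapping
```

Because the result is a dictionary keyed by the original names, it keeps a one-to-one record of which name matched which CID(s).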
Clean up List
More Efficient Way
How did you do the mapping?
>> I tried to upload the list into Google Sheets and link it to the CACTUS resolver to retrieve the Standard InChIKeys.
I've already mentioned that Google Sheets cannot handle a large number of web service requests. You should use Excel instead.
>> I uploaded the list to PubChem as synonyms to search for, and I got 673 results.
I don't see the one-to-one mapping of the synonyms and the CIDs (which name was mapped to which CID). I recommend keeping this information in one file. Ideally, you would add a column for the mapped CID to the original Excel file downloaded from EMA. (Of course, you need to keep a separate, unmodified copy of the original file.)
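Adding the CID column can also be done outside Excel. A minimal sketch, assuming the EMA spreadsheet has been exported to CSV and that the name-to-CID results are already in a dictionary; `add_cid_column` is a hypothetical helper, and the header "Medicine name" is a placeholder for whatever the real name column is called.

```python
import csv
import io
from typing import Dict, Iterable, List


def add_cid_column(
    src_lines: Iterable[str],
    mapping: Dict[str, List[int]],
    name_column: str = "Medicine name",  # hypothetical header; use the real one
) -> str:
    """Return a copy of a CSV table with an extra 'PubChem CID' column.

    Several CIDs for one name are joined with ';'; unmapped names get ''.
    """
    reader = csv.DictReader(src_lines)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(reader.fieldnames or []) + ["PubChem CID"])
    writer.writeheader()
    for row in reader:
        cids = mapping.get(row[name_column], [])
        row["PubChem CID"] = ";".join(str(cid) for cid in cids)
        writer.writerow(row)
    return out.getvalue()
```

A file opened with `open("ema_orphans.csv", newline="", encoding="utf-8")` can be passed as `src_lines`, and the returned text written to a new file such as `ema_orphans_with_cids.csv`, so the original stays untouched (both file names are hypothetical).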
>> For later work, what would be a smarter, faster way to do the cleanup, like with the last list?
Are you talking about getting InChIKeys or mapping synonyms with CIDs, or something else? Please tell me more specifically.
Dr. Kim
Dr. Kim
>> Mapping names with CIDs: After cleaning up the original list, I sorted it alphabetically. Then I converted the list from Excel to a CSV file and cleaned up the CSV file, because it contained quotation marks that PubChem couldn't read (which produced an error report). Finally, I uploaded it through the Identifier Exchange Service and chose synonyms as the input ID list.
>> Google Sheets: Yes, you mentioned that previously, but I don't know how to do it in Excel yet, and since the list had already been cleaned up a bit, I thought it would be OK. I'll figure out how to redo it in Excel.
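The quotation-mark cleanup can also be scripted before upload. A small sketch, assuming the names have already been read into a list of strings; `clean_names` is a hypothetical helper, and it only removes straight double quotes (curly quotes would need to be added). It also collapses extra whitespace and drops blank and duplicate names (case-insensitively), keeping the first occurrence.

```python
from typing import Iterable, List


def clean_names(raw_names: Iterable[str]) -> List[str]:
    """Strip double quotes and extra whitespace; drop blanks and duplicates."""
    seen = set()
    cleaned: List[str] = []
    for raw in raw_names:
        name = " ".join(raw.replace('"', "").split())  # remove quotes, collapse spaces
        key = name.casefold()  # case-insensitive duplicate check
        if name and key not in seen:
            seen.add(key)
            cleaned.append(name)
    return cleaned


print(clean_names(['"Imatinib"', "imatinib ", "", "  Ivacaftor  "]))  # ['Imatinib', 'Ivacaftor']
```

Writing `"\n".join(clean_names(names))` to a plain-text file then gives one name per line, which can be sorted alphabetically if desired.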
>>I don't see the one-to-one mapping of the synonyms and the CIDs (which name was mapped to which CID).
I'm sorry, I forgot to upload that list.
>> For later work, what would be a smarter, faster way to do the cleanup, like with the last list?
What I mean is removing biologics from the list. I have been cleaning up the list manually, and I'm afraid that I might have skipped some, or that I will encounter a larger list in the future.
Thanks
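For a first pass over a large list, one optional heuristic (a swapped-in technique, not the procedure used in this project, and no substitute for checking the results) is to flag names containing INN stems that normally denote biologics, such as -mab for monoclonal antibodies and -cept for receptor fusion proteins. A sketch with a deliberately short, hypothetical stem list; anything it misses would still surface later as a failed PubChem search.

```python
from typing import Iterable, List, Tuple

# Hypothetical, deliberately short list of INN stems that usually indicate biologics:
# -mab (monoclonal antibodies) and -cept (receptor fusion proteins).
BIOLOGIC_STEMS = ("mab", "cept")


def split_likely_biologics(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (names with a biologic stem in any word, all other names)."""
    likely_biologic: List[str] = []
    other: List[str] = []
    for name in names:
        words = name.casefold().split()
        if any(word.endswith(BIOLOGIC_STEMS) for word in words):
            likely_biologic.append(name)
        else:
            other.append(name)
    return likely_biologic, other


print(split_likely_biologics(["Dinutuximab beta", "Ivacaftor", "Aflibercept"]))
# (['Dinutuximab beta', 'Aflibercept'], ['Ivacaftor'])
```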
cleaning up after mapping
>> For later work, what would be a smarter, faster way to do the cleanup, like with the last list?
Well, it depends on the case. In my case, I would do the mapping first with the raw list. If a query does not exist in PubChem (because it is not a small molecule or for whatever other reason), the search will return nothing. You then have a list of the failed queries, so during the subsequent cleanup you only need to look into those failures.