Using RDKit: Obtaining Canonical SMILES Molecular Representations

Author: 张群峰 | Published: 2021-12-19

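The SMILES notation for chemical compounds is increasingly familiar to, and accepted by, researchers. SMILES stands for Simplified Molecular Input Line Entry System. It is a line notation made of ASCII characters that represents the structure of a compound as a one-dimensional text string. SMILES was devised in the 1980s by the American cheminformatician David Weininger and was subsequently developed and maintained by his company, Daylight Chemical Information Systems.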

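SMILES notation rules

SMILES converts a chemical structure into a string according to a few rules:

Atoms are written as their element symbols, adjacent atoms in the string are bonded to each other, and hydrogens attached in the ordinary way can be omitted.
Single, double, triple, and aromatic bonds are written as "-", "=", "#", and ":" respectively (single and aromatic bonds are usually omitted).
A branch is enclosed in "()" and written immediately to the right of the atom it is attached to.
A ring is opened at one bond to give a chain, and the two atoms at the opening are marked with the same digit.
Disconnected parts of a compound are written as separate structures joined by "."; their order is arbitrary.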
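The rules above can be checked on a few small molecules. The sketch below assumes RDKit (the Python toolkit used later in this post) is installed; the example molecules are illustrative choices, not taken from the original text.

```python
from rdkit import Chem

# Illustrative molecules, one per rule (chosen for this sketch)
examples = {
    "ethanol (implicit H, implicit single bonds)": "CCO",
    "ethene (double bond)": "C=C",
    "hydrogen cyanide (triple bond)": "C#N",
    "isobutane (branch in parentheses)": "CC(C)C",
    "cyclohexane (ring-closure digits)": "C1CCCCC1",
    "benzene (aromatic atoms in lowercase)": "c1ccccc1",
    "sodium chloride (disconnected parts)": "[Na+].[Cl-]",
}

summary = {}
for name, smi in examples.items():
    mol = Chem.MolFromSmiles(smi)
    summary[smi] = {
        "heavy_atoms": mol.GetNumAtoms(),          # hydrogens are implicit
        "rings": mol.GetRingInfo().NumRings(),
        "fragments": len(Chem.GetMolFrags(mol)),   # parts separated by "."
    }
    print(name, smi, summary[smi])

# Writing the single bond explicitly does not change the molecule
same_molecule = (
    Chem.MolToSmiles(Chem.MolFromSmiles("C-C"))
    == Chem.MolToSmiles(Chem.MolFromSmiles("CC"))
)
print(same_molecule)
```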

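Canonical SMILES notation

SMILES strings for a molecule can be obtained from many different programs and methods. Careful readers may notice, however, that even when the structure is identical, SMILES strings from different sources differ. For example, the SMILES exported from a structure file downloaded from PubChem looks quite different from the SMILES exported from the same structure drawn in ChemDraw. The SMILES strings for 3-Trifluoromethyl-L-phenylalanine are: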

PubChem: FC(F)(F)c1cc(C[C@H](N)C(=O)O)ccc1

ChemDraw: O=C(O)[C@@H](N)CC1=CC=CC(C(F)(F)F)=C1

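This is mainly because the SMILES rules do not prescribe which atom the string should start from. As a result, different databases represent the same molecular structure with different SMILES strings, which causes considerable trouble for anyone who works directly with SMILES. Canonical SMILES was introduced to address this: a single SMILES string that corresponds uniquely to a given compound.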

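Converting an arbitrary SMILES string into its canonical SMILES form is usually called canonicalization. Canonical SMILES have traditionally been generated with the CANGEN algorithm as implemented in Daylight's commercial software, which limits who can use it. Note also that each toolkit has its own canonicalization scheme, so canonical SMILES are only comparable when they were produced by the same toolkit.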
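To make the starting-atom problem concrete, here is a minimal sketch using RDKit, the toolkit this post relies on. It writes ethanol (an illustrative molecule, not from the original text) starting from each atom in turn via the `rootedAtAtom` argument of `Chem.MolToSmiles`, then shows that re-reading and canonicalizing the variants collapses them to a single string.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("CCO")  # ethanol, chosen for illustration

# Write the same molecule starting from each atom in turn
rooted = [Chem.MolToSmiles(mol, rootedAtAtom=i) for i in range(mol.GetNumAtoms())]
print(rooted)  # several different but equally valid SMILES strings

# Re-read every variant and write it without forcing a start atom
canonical = {Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in rooted}
print(canonical)  # a single canonical string remains
```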

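Converting SMILES to canonical SMILES with RDKit

This section shows how the Python toolkit RDKit can be used to unify SMILES strings by converting them to canonical SMILES. (Installation instructions are given below.)

Approach: parse the SMILES from each source into an RDKit Mol object, then write the Mol object back out as canonical SMILES.

Test code: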

from rdkit import Chem

# SMILES exported from the PubChem structure file
smi_PUBCHEM = 'FC(F)(F)c1cc(C[C@H](N)C(=O)O)ccc1'
mol = Chem.MolFromSmiles(smi_PUBCHEM)          # parse into an RDKit Mol object
canonical_smi_PUBCHEM = Chem.MolToSmiles(mol)  # MolToSmiles writes canonical SMILES by default
print(canonical_smi_PUBCHEM)

# Output: N[C@@H](Cc1cccc(C(F)(F)F)c1)C(=O)O

# SMILES exported from ChemDraw for the same structure
smi_Chemdraw = 'O=C(O)[C@@H](N)CC1=CC=CC(C(F)(F)F)=C1'
mol = Chem.MolFromSmiles(smi_Chemdraw)
canonical_smi_Chemdraw = Chem.MolToSmiles(mol)
print(canonical_smi_Chemdraw)

# Output: N[C@@H](Cc1cccc(C(F)(F)F)c1)C(=O)O
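When many SMILES strings from mixed sources need to be processed, it can help to wrap the two steps in a small function that also handles unparsable input. The following is a sketch under that assumption; the function name `to_canonical_smiles` and the invalid test string are hypothetical, introduced only for illustration.

```python
from rdkit import Chem

def to_canonical_smiles(smiles):
    """Return RDKit's canonical SMILES, or None if the input cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # MolFromSmiles returns None for invalid input
        return None
    return Chem.MolToSmiles(mol)

# The two SMILES strings from the text above
sources = {
    "PubChem": "FC(F)(F)c1cc(C[C@H](N)C(=O)O)ccc1",
    "ChemDraw": "O=C(O)[C@@H](N)CC1=CC=CC(C(F)(F)F)=C1",
}
canonical = {name: to_canonical_smiles(smi) for name, smi in sources.items()}

# Identical canonical strings mean both sources describe the same structure
same_structure = canonical["PubChem"] == canonical["ChemDraw"]
print(same_structure)

# An unparsable string yields None instead of raising an exception
invalid = to_canonical_smiles("not_a_smiles")
print(invalid)
```

Comparing canonical strings this way is only meaningful when both were generated by the same toolkit.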

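Installing RDKit

RDKit is an open-source cheminformatics toolkit with a Python interface. Called from Python, it can generate molecular descriptors and fingerprints, compute structural similarity between compounds, optimize molecular conformations, and more.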
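As a brief illustration of two of these capabilities, the sketch below computes one descriptor and one fingerprint-based similarity. The molecules (ethanol and 1-propanol) and the choice of `Descriptors.MolWt` and the RDKit topological fingerprint are assumptions made for this example; RDKit offers many other descriptors and fingerprint types.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import Descriptors

ethanol = Chem.MolFromSmiles("CCO")
propanol = Chem.MolFromSmiles("CCCO")

# A simple descriptor: average molecular weight
mol_wt = Descriptors.MolWt(ethanol)
print(mol_wt)

# Topological fingerprints and their Tanimoto similarity
fp_ethanol = Chem.RDKFingerprint(ethanol)
fp_propanol = Chem.RDKFingerprint(propanol)
self_sim = DataStructs.TanimotoSimilarity(fp_ethanol, fp_ethanol)
cross_sim = DataStructs.TanimotoSimilarity(fp_ethanol, fp_propanol)
print(self_sim, cross_sim)
```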

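Installation

First set up a Python environment on your computer; installing Anaconda is recommended.

Once it is installed, use Windows search to find Anaconda Prompt, enter the following command, press Enter, and wait for the installation to finish.

conda install -c conda-forge rdkit

Note: if the first attempt fails, rerunning the command usually resolves it. In a regular Python environment, pip install rdkit also works.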

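Verifying the installation

Still in Anaconda Prompt, enter the following commands. If the RDKit version number is displayed, the installation succeeded.

python
import rdkit
rdkit.__version__

Note: the __ here is two consecutive ASCII underscores (typed with the input method in half-width mode). The commands can be copied, pasted, and run directly.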
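The same check can be run as a short script instead of typing into the interpreter. This is a minimal sketch; parsing ethanol ("CCO") is an extra sanity test added here to confirm the chemistry module loads, not part of the original instructions.

```python
import rdkit
from rdkit import Chem

# Print the installed version
version = rdkit.__version__
print("RDKit version:", version)

# Sanity check: parse a trivial molecule (ethanol)
mol = Chem.MolFromSmiles("CCO")
parsed_ok = mol is not None
print("Parsing works:", parsed_ok)
```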

For a basic tutorial and further details, see the following link:

RDKit Chinese tutorial

http://rdkit.chenzhaoqiang.com/basicManual.html
