Searching for High-Value Molecules Using Reinforcement Learning and Transformers

¹University of Montreal, Mila; ²Intel Labs

Abstract

Reinforcement learning (RL) over text representations can be effective for finding high-value policies that can search over graphs. However, RL is effective on this problem only when the search space is carefully structured and the algorithm is carefully designed. Through extensive experiments, we explore how the choice of text grammar and of training algorithm affects an RL policy's ability to generate molecules with desired properties. We arrive at a new RL-based molecular design algorithm (ChemRLFormer) and analyze it thoroughly on 25 molecule design tasks, including computationally complex protein docking simulations. From this analysis, we gain new insights into this problem space and show that ChemRLFormer achieves state-of-the-art performance while being simpler than prior work, because the analysis clarifies which design choices actually help text-based molecule design.

Overview of how an RL agent can generate molecules
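To make the idea of an RL agent that writes molecules as text more concrete, here is a minimal sketch of the generate, score, update loop. It is not ChemRLFormer: it assumes a toy four-token alphabet instead of a real molecular grammar, a per-position table of logits instead of a transformer, plain REINFORCE with a moving-average baseline as the update rule, and a hypothetical `toy_reward` standing in for a property or docking oracle. All names in the block are illustrative.

```python
import math
import random

VOCAB = ["C", "N", "O", "<eos>"]  # toy alphabet (assumption), not a real SMILES grammar
MAX_LEN = 8


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_string(logits, rng):
    """Sample one token id per position until <eos> or MAX_LEN is reached."""
    actions = []
    for t in range(MAX_LEN):
        probs = softmax(logits[t])
        a = rng.choices(range(len(VOCAB)), weights=probs)[0]
        actions.append(a)
        if VOCAB[a] == "<eos>":
            break
    return actions


def toy_reward(actions):
    """Hypothetical stand-in for a property oracle: fraction of positions that are 'N'."""
    return sum(VOCAB[a] == "N" for a in actions) / MAX_LEN


def train(steps=500, batch=16, lr=0.5, seed=0):
    """Plain REINFORCE with a moving-average baseline over a tabular policy."""
    rng = random.Random(seed)
    logits = [[0.0] * len(VOCAB) for _ in range(MAX_LEN)]
    baseline = 0.0
    for _ in range(steps):
        episodes = [sample_string(logits, rng) for _ in range(batch)]
        rewards = [toy_reward(ep) for ep in episodes]
        grads = [[0.0] * len(VOCAB) for _ in range(MAX_LEN)]
        for ep, r in zip(episodes, rewards):
            advantage = r - baseline
            for t, a in enumerate(ep):
                probs = softmax(logits[t])
                for k in range(len(VOCAB)):
                    # Gradient of log softmax: one-hot(action) minus probabilities.
                    indicator = 1.0 if k == a else 0.0
                    grads[t][k] += advantage * (indicator - probs[k]) / batch
        for t in range(MAX_LEN):
            for k in range(len(VOCAB)):
                logits[t][k] += lr * grads[t][k]
        baseline = 0.9 * baseline + 0.1 * (sum(rewards) / batch)
    return logits


def mean_reward(logits, n=200, seed=1):
    """Average toy reward over n strings sampled from the policy."""
    rng = random.Random(seed)
    return sum(toy_reward(sample_string(logits, rng)) for _ in range(n)) / n
```

Comparing `mean_reward` for an all-zero logit table against `mean_reward(train())` shows the policy shifting probability toward strings the toy oracle scores highly. In a real setting the oracle would be a chemistry scoring function and the sampled text would have to decode to a valid molecule.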

Molecules generated by ChemRLFormer by reward hacking the docking functions.


High value molecules generated by ChemRLFormer for various reward functions.

albuterol similarity

amlodipine mpo

celecoxib rediscovery

deco hop

drd2

fexofenadine mpo

gsk3b

isomers c7h8n2o2

isomers c9h10n2o2pf2cl

jnk3

median1

median2

mestranol similarity

osimertinib mpo

perindopril mpo

qed

ranolazine mpo

scaffold hop

sitagliptin mpo

thiothixene rediscovery

troglitazone rediscovery

valsartan smarts

zaleplon mpo