Abstract:
Recent advances in convolutional neural networks have inspired the application of
deep learning to other disciplines. Even though image processing and natural
language processing have turned out to be the most successful, there are many other
areas that have benefited, like computational chemistry in general and drug design in
particular. Since 2018, the scientific community has seen a surge of methodologies
related to the generation of diverse molecular libraries using machine learning. The
first goal of this work is to give chemists without technical knowledge an accessible
way of using machine learning algorithms. To this end, cheML.io, a web database that
contains virtual molecules generated by 10 recent ML algorithms, is proposed. It allows users to
browse the data in a user-friendly and convenient manner. ML-generated molecules
with desired structures and properties can be retrieved with the help of a drawing
widget. If a specific search returns insufficient results, users can create new
molecules on demand. The second goal is to develop an algorithm that
generates diverse focused libraries using one or two seed molecules
that guide the generation of de novo molecules. Here, a variant of the transformer,
an architecture recently developed for natural language processing, was employed for
this purpose. The results indicate that this model is indeed applicable to the task of
generating focused molecular libraries and leads to statistically significant increases
in some of the core metrics of the MOSES benchmark, which provides
baselines and metrics that characterize the main attributes of the algorithms by
examining the generated molecules. In addition, a novel way of generating libraries
where two seed molecules can be fused is introduced.