DSpace Repository

TRANSMOL: REPURPOSING A LANGUAGE MODEL FOR MOLECULAR GENERATION

dc.contributor.author Zhumagambetov, Rustam
dc.contributor.author Molnár, Ferdinand
dc.contributor.author Peshkov, Vsevolod A.
dc.contributor.author Fazli, Siamac
dc.date.accessioned 2022-03-09T17:20:09Z
dc.date.available 2022-03-09T17:20:09Z
dc.date.issued 2021
dc.identifier.citation Zhumagambetov, R., Molnár, F., Peshkov, V. A., & Fazli, S. (2021). Transmol: repurposing a language model for molecular generation. RSC Advances, 11(42), 25921–25932. https://doi.org/10.1039/d1ra03086h en_US
dc.identifier.uri http://nur.nu.edu.kz/handle/123456789/6088
dc.description.abstract Recent advances in convolutional neural networks have inspired the application of deep learning to other disciplines. Even though image processing and natural language processing have turned out to be the most successful, there are many other domains that have also benefited; among them, life sciences in general and chemistry and drug design in particular. In concordance with this observation, from 2018 the scientific community has seen a surge of methodologies related to the generation of diverse molecular libraries using machine learning. However, to date, attention mechanisms have not been employed for the problem of de novo molecular generation. Here we employ a variant of transformers, an architecture recently developed for natural language processing, for this purpose. Our results indicate that the adapted Transmol model is indeed applicable for the task of generating molecular libraries and leads to statistically significant increases in some of the core metrics of the MOSES benchmark. The presented model can be tuned to either input-guided or diversity-driven generation modes by applying a standard one-seed and a novel two-seed approach, respectively. Accordingly, the one-seed approach is best suited for the targeted generation of focused libraries composed of close analogues of the seed structure, while the two-seed approach allows us to dive deeper into under-explored regions of the chemical space by attempting to generate molecules that resemble both seeds. To gain more insight into the scope of the one-seed approach, we devised a new validation workflow that involves the recreation of known ligands for an important biological target, the vitamin D receptor. To further benefit the chemical community, the Transmol algorithm has been incorporated into our cheML.io web database of ML-generated molecules as a second-generation on-demand methodology. en_US
dc.language.iso en en_US
dc.publisher RSC Advances en_US
dc.rights Attribution-NonCommercial-ShareAlike 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/us/ *
dc.subject Type of access: Open Access en_US
dc.subject Transmol en_US
dc.subject molecular generation en_US
dc.title TRANSMOL: REPURPOSING A LANGUAGE MODEL FOR MOLECULAR GENERATION en_US
dc.type Article en_US
workflow.import.source science


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 United States.