Optimising biological activity and ADME properties while minimising toxicity is a central objective in the development of new compounds. Advanced machine learning methods are indispensable to this process. The project will develop and benchmark representation learning approaches, addressing their accuracy and explainability, using public and in-house data for endpoints ranging from chemical reactions to toxicity. The programme will be carried out together with its target users: large companies, regulatory agencies and SMEs.
Traditional machine learning (ML) methods provide reliable predictions only for compounds similar to those in the training set, which defines their applicability domain (AD). Emerging representation learning approaches can efficiently approximate the physical interactions of molecules with an accuracy comparable to physics-based methods in a fraction of the time. Models based on these representations should have a much larger AD because they are pre-trained on large chemical sets of theoretical values. Here we will develop and benchmark representation learning approaches, addressing their accuracy and ADs, using public and in-house data for endpoints ranging from chemical reactions to toxicity. While explainable AI (XAI) methods are being actively developed in the ML community, a gap remains in their use in chemistry: their results need to be translated for the end users, namely chemists and regulatory bodies. Since the research programme is tightly coupled with the target users - large companies, regulatory agencies and SMEs - it provides a clear path for technology transfer from academia to industry. AiChemist will provide structured training to its fellows through a combination of online courses and schools, strengthening European innovation capacity in the education of specialists in AI methods. The fellows will also receive comprehensive training in transferable skills. The complementary expertise and strong commitment of the partners make this ambitious, innovative research programme realistic through the proper allocation of individual tasks and resources, as described below.
This project is funded by the European Union’s Horizon Europe research and innovation programme under grant agreement No 101120466, within the Horizon Europe (HORIZON) Marie Skłodowska-Curie Actions Doctoral Networks (MSCA-DN).