Fellows | AiChemist MSCA-DN

DC1 - Fabian Krüger

Nationality: German

Personal introduction:

My academic background includes a Bachelor of Science in Biochemistry from the Technical University of Munich, followed by a Master of Science in Biochemistry at the same institution, along with an exchange program at the University of Wollongong in Australia.

Throughout my career, I've gained diverse research experience, including positions at the Fraunhofer Institute for Cognitive Systems, the Max Planck Institute of Biochemistry, and the Federal Institute for Materials Research and Testing. My work has spanned topics such as trust in AI decision-making, cryo-electron tomography of neurotoxic aggregates, and the ecotoxicity of building materials. These experiences have provided me with a solid foundation in both experimental biochemistry and computational modeling.

Research Topic: Improving accuracy and applicability domain of models using representation learning

As an early-stage researcher in the AiChemist consortium, my work focuses on advancing the accuracy and applicability domain of models through cutting-edge representation learning techniques. I am particularly interested in the potential of graph neural networks and transformers to enhance predictive modeling in the field of drug discovery. I will begin at AstraZeneca under the supervision of Professor Ola Engkvist for the first 18 months, before moving to Helmholtz Munich to work with Dr. Igor Tetko and Professor Fabian Theis at Technical University of Munich for the latter half of my studies. My research aims to benchmark modern representation learning methods against traditional machine learning approaches, with a particular focus on their performance in predicting biological activity, toxicity, and ADME properties.

Contact: LinkedIn ORCID

Organizations:

AstraZeneca, March 1st 2024 - August 31st 2025

Helmholtz Zentrum München, September 1st 2025 - March 31st 2027

Secondments: AstraZeneca, October - December 2025; Bayer, December 2025

Presentations and Posters

"Publishing Neural Networks in Drug Discovery Might Compromise Training Data Privacy" at the AIChemist-CECAM Flagship School, Lausanne, Switzerland, April 28, 2025.
"Can Publishing Neural Networks Expose Confidential Training Data? Risks in Drug Discovery" at the AiChemist Spring School in Lausanne, Switzerland, April 25, 2025.

DC2 - Dina Khasanova

Nationality: Russian

Personal introduction:

My background is in chemistry and chemoinformatics. I received my BSc in Chemistry from Kazan Federal University and a double MSc in Chemoinformatics from Bar-Ilan University and the University of Strasbourg. While I was still an undergraduate, I joined the Laboratory of Chemoinformatics and Molecular Modeling, where I worked on standardizing chemical reaction conditions and predicting reaction yields, which subsequently became the topic of my bachelor’s thesis. During my master's degree, I applied machine learning to drug discovery. While working on my master's thesis, I became interested in explainable AI to interpret my models for GPCRs. With this background, I was excited to find a PhD position within the AiChemist Doctoral Network, where I will use XAI for multi-task molecular representations to generate new hybrid chemotypes that we hope will go beyond what a human expert might initially consider.

Research topic: Using XAI to develop hybrid chemotypes

In this project, artificial intelligence (AI) approaches will be used to further develop novel hybrid chemotype rules, e.g., reactions and/or alerts which are transparently applied in regulatory settings. Traditionally, the identification of substructural fragments associated with specific chemical modes of action (e.g., molecular initiating events) has relied on human expertise in chemistry, biology, pharmacology, and toxicology. Chemotypes based on Chemical Structure & Reaction Mark-Up Language (CSRML) define hybrid rules representing structural motifs as well as atomic and molecular properties, chemical reactivities, and metabolic transformations. Although the CSRML methodology has yielded important results (e.g., ToxPrint chemotypes), development of new knowledge is resource intensive. Adoption of a machine learning approach within the hybrid chemotype definitions increases the predictive power; however, the approach still requires human expertise. This project will use XAI (existing and to be developed in the project) for chemically explainable multi-task molecular representations to generate new hybrid chemotypes that will go beyond what a human expert might initially consider but will still be interpretable and consistent with human knowledge. Due to the relatively large amount of available training data for genetic toxicity, skin sensitization and cardiotoxicity, this project will focus on developing novel hybrid chemotypes for these endpoints. The hybrid chemotypes can be tested and refined in collaboration with other AiChemist fellows.

Contact: LinkedIn ORCID

Organizations:

Helmholtz Zentrum München, September 16th 2024 - August 31st 2025

Molecular Networks, September 1st 2025 - August 31st 2026

Pfizer, September 1st 2026 - September 15th 2027

Planned secondments: AstraZeneca, February 2026

Presentations

"Explainable AI for Functional Groups" at the AiChemist Spring School in Lausanne, Switzerland, April 25, 2025.

DC3 - Karoline Schjelde

Nationality: Danish

Personal introduction:

I hold both a Bachelor's and a Master's degree in Nanoscience from the University of Copenhagen, where I pursued an interdisciplinary education that integrated elements of physics, chemistry, and molecular biology. The program provided extensive flexibility to explore various scientific disciplines, and I developed a particular interest in computational and quantum chemistry. For my Master’s thesis, I performed computational investigations of azobenzene-based Molecular Solar Thermal Energy Storage (MOST) systems. My work involved developing a framework for generating diverse azobenzene variations, whose properties were calculated using semi-empirical quantum mechanical methods. These results were further analyzed using machine learning techniques to screen for the most promising candidates for energy storage applications. My current research interests are primarily centered on computational chemistry, with a focus on the practical application of computational methods, including molecular mechanics, ab initio methods, density functional theory, and hybrid quantum-classical approaches.

Research topic: Predicting Side Reactions Using a Combined Meta-dynamics-MD and Machine Learning Approach

The prediction of side reactions in organic synthesis is critical for enhancing the efficiency and reliability of chemical processes. While machine learning (ML) models have shown significant progress in predicting reaction outcomes, challenges persist in accurately predicting the interactions of functional groups or scaffolds with catalytic systems. Transition metal catalysts are particularly prone to side reactions, where catalytic intermediates may be deactivated by other reaction components, leading to unwanted transformations or catalyst inhibition.

My research focuses on predicting these undesired chemical reactions involving catalysts and reactive intermediates. I plan to integrate Meta-dynamics-MD simulations at the semi-empirical level with machine learning techniques to predict side reactions. The Meta-dynamics-MD simulations are used to estimate the rates of low-barrier transformations that are characteristic of catalyst deactivation or other side reactions. Data generated from these simulations are subsequently used to train ML models capable of predicting potential decomposition or inhibition pathways based on 2D molecular representations. The ultimate aim is to develop an automated workflow that not only predicts these side reactions but also offers insights into optimizing reaction conditions, thereby enhancing the accuracy and robustness of synthetic predictions.

Contact: LinkedIn ORCID

Organizations:

AstraZeneca, August 1st 2024 - January 31st 2026

UCPH, February 1st 2026 - July 31st 2027

Planned secondments: UNISTRA, July 2026

Presentations

"Predicting side reactions using a combined meta-dynamics-MD and ML approach" at the AiChemist Spring School in Lausanne, Switzerland, April 25, 2025.

DC4 - Matthew Ball

Nationality: British

Personal introduction:

Throughout my Bachelor's and Master's degrees in Natural Sciences at the University of Cambridge, I was always interested in Chemistry, specifically in applying mathematical and computational methods to solve chemical problems. Towards the end of my degree, I began to specialise in Theoretical and Physical Chemistry, and I was especially interested in their applications to understanding chemical synthesis (which I was continuing to learn about). Wanting to expand my biological knowledge, I chose a master’s research project with a different focus: protein design and macromolecular simulation. This introduced me to the fields of bioinformatics and computational antibody design, and gave me exposure to state-of-the-art deep-learning techniques. I loved the experience of researching my project, which led me to apply for the AiChemist program, where I can combine my chemical understanding with the skills I learnt from my Master’s research in a single project.

Research topic: Prediction of optimal reaction conditions using Artificial Intelligence tools

My research will focus on reaction optimisation, particularly predicting optimal reaction conditions for a diverse set of chemical transformations highly relevant to medicinal organic chemistry. Currently, reaction condition data is sparse and suffers from two issues: a lack of negative data (conditions which don’t lead to a reaction are not typically published) and a many-to-many relationship, in which a product can be obtained in high yield under several different sets of conditions. Utilising AstraZeneca's in-house data and simulations, this project aims to use physical and quantum mechanical descriptors to augment and improve existing methods for condition prediction, which we hope to validate using experimentation campaigns.

Contact: LinkedIn ORCID Personal Blog

Organizations:

UNISTRA, September 1st 2024 - February 28th 2026

AstraZeneca, March 1st 2026 - August 31st 2027

Planned secondments: UCPH, February 2026

Presentations

"Predicting Reaction Conditions: A Data-Driven Perspective" at the AIChemist-CECAM Flagship School, Lausanne, Switzerland, April 28, 2025.
"Predicting Organic Reaction Conditions: A Data-Driven Perspective" at the AiChemist Spring School in Lausanne, Switzerland, April 25, 2025.

DC5 - Bob van Schendel

Nationality: Dutch

Personal introduction: I am a PhD researcher in the AiChemist consortium, working on integrating experimental data and in silico simulations, further combined with multi-task deep learning, to improve reactivity prediction models.

After a Bachelor's in Computer Science and a Master's in Bioinformatics, I was enthusiastic about drug discovery and continued my work in that field as a PhD student in this program. I am very happy to be part of the AiChemist consortium and to be starting together with so many other PhD researchers. This allows us to learn from each other, share knowledge and experiences, and motivate each other. I am also in a position to make the best use of my background, applying my Computer Science experience to build workflows that combine the many programs, data sources and models we need to use. Additionally, I get to dive deeper into (quantum) chemistry with the guidance of experts in this field, including my supervisors, Mikhail Kabeshov and Mike Preuss.

Research topic: Multi-task Neural Network reactivity prediction using in-silico simulations and synthesis experimental data

Today, chemical synthesis prediction remains a challenging and data-intensive process, where researchers rely heavily on experimental data to forecast the outcomes of complex chemical reactions. Despite significant advancements, predictive models still require vast amounts of data across every region of chemical space to be accurate enough for practical applications. However, experimental data alone is insufficient to cover the entire chemical landscape, leaving gaps that hinder predictive accuracy. To address this, scientists are turning to in silico simulations, such as quantum chemistry and Density Functional Theory (DFT), to fill in these gaps. These simulations can model chemical mechanisms with precision but are often too slow for large-scale synthesis predictions.

To overcome this bottleneck, a novel approach combining multi-task learning and transfer learning is being proposed. By integrating real-world experimental data with in silico simulations, researchers aim to create machine learning models capable of automatically predicting chemical reactivity in low-data regions of chemical space. This approach not only enhances the models' predictive power but also addresses the critical challenge of efficiently integrating diverse data sources. The ultimate goal is to develop a robust synthesis prediction framework that leverages the strengths of both experimental and simulated data, revolutionizing the way chemical reactions are predicted and ultimately accelerating the discovery of new compounds.

Contact: LinkedIn ORCID

Organizations:

AstraZeneca, March 1st 2024 - August 31st 2025

ULEI, September 1st 2025 - March 31st 2027

Planned secondments: UCPH, February 2026

Presentations

"Combining QM simulations and ML models for reactivity prediction" at the AiChemist Spring School in Lausanne, Switzerland, April 25, 2025.

DC6 - Mateusz Iwan

Nationality: Polish

Personal introduction: Early-stage researcher with the AiChemist consortium, focusing on predicting adverse drug reactions using machine learning and artificial intelligence.

During my studies at Jagiellonian University, I explored various branches of chemistry and eventually discovered a passion for computational and quantum chemistry. This interest led to my Master’s thesis, which investigated the covalent binding modes of ligands to proteins from the NF-κB family. During this time, I also developed a strong interest in programming and the application of machine learning and artificial intelligence in drug design and discovery. After completing my studies, I interned with the ML/AI team at Selvita S.A., where I focused on benchmarking various active learning strategies, detecting protein-ligand interactions, and developing QSPR models. Following this internship, I joined the AiChemist program to pursue a PhD at the intersection of chemistry and artificial intelligence.

Currently, as an early-stage researcher, my work centers on predicting adverse drug reactions, with a particular focus on cardiotoxicity, nephrotoxicity, and hepatotoxicity. My aim is to enhance drug discovery and development by identifying toxic molecules at early stages.

Research topic: Advanced ML methods to predict and understand toxicity of drugs

Predicting and understanding the toxicological liabilities of small molecules is of utmost importance. The thesis will apply Machine Learning (ML) methods to develop predictive models for different toxicity endpoints. Initially, the project will focus on the cardiotoxicity, kidney toxicity and liver toxicity of drugs. State-of-the-art techniques such as Graph Neural Networks (GNNs), multitask learning and transfer learning will be used to predict the results of biological assays, with the aim of developing models that can estimate not only the toxic effects but also the Modes of Action (MoAs) of the molecules. Explainable AI will reveal the most likely MoAs and the chemical substructures that contribute most to the identified risks. The doctoral candidate will build models of cardiotoxicity, liver toxicity and kidney toxicity that take into account known and possible mechanisms, explain the predictions of all models, and correlate these explanations with known issues for compounds with a known mode of action. The models will be tested on internal data at Bayer, and successful models will be deployed and shared within a multi-objective system.

Contact: LinkedIn ORCID GitHub X

Organizations:

IRFMN, February 1st 2024 – July 31st, 2025

Bayer, August 1st 2025 - January 31st 2027

Secondments: TU/e, June 2025

Presentations

"Pharmacovigilance meets demographics: Towards Personalized Cardiotoxicity Prediction" at the Club delle 2 Seminar at IRFMN, May 14, 2025.
"Pharmacovigilance meets demographics: Towards Personalized Cardiotoxicity Prediction" at QSAR 2025, June 4, 2025.
"Demographic-Sensitive Cardiotoxicity Prediction" at the AiChemist Spring School in Lausanne, Switzerland, April 25, 2025.

DC7 - Andi Hunklinger

Nationality: German

Personal introduction: Since my school days, I have been fascinated by the intricate mechanisms of nature and driven by the desire to understand ever more fundamental principles of the world. In 2016, I started studying Chemistry and Biochemistry at the Ludwig-Maximilians-Universität in Munich, and after two years I also began studying Computer Science. The two distinct programs enabled me to develop a comprehensive skillset, rich in both theoretical and practical experience. In chemistry, I was particularly fond of organic synthesis and structural biology lab work. On the computer science side, I acquired knowledge of the basic principles and was inspired by the exciting advancements in machine learning models, which motivated me to apply these tools to research questions relevant to the life sciences. Over the years, I laid the groundwork that helped me succeed in interdisciplinary internships during my MSc in Chemistry, culminating in a master's thesis on the optimization of generative chemical language models and the synthesis and testing of the designed small-molecule hits. I am excited to continue working on generative language models, with a greater emphasis on model interpretability. Frequently, lab researchers are required to trust machine learning models based on their evaluation metrics rather than their logic. With my background, I grasp the technical aspects of these models while understanding the biological context of the topic. By integrating these perspectives, my work aims to assist the scientific community in critically evaluating the generation of novel enzymes from protein language models by analyzing the model's reasoning in a biologically understandable way.

Research topic: Generative language models for the design of tailored chemical transformations

Enzymes are attractive nanoscopic materials capable of accelerating chemical transformations by several orders of magnitude while working under sustainable, mild conditions. Understandably, enormous research efforts have been put into engineering enzymes that catalyze chemical reactions in a greener and cheaper fashion. Especially from an industry perspective, the improvement of enzymatic activity and the ability to catalyze new-to-nature reactions could offer substantial benefits. Generative protein language models treat protein amino-acid sequences as a language and generate new sequences in an auto-regressive, token-wise manner. By inputting a starting query, such as a reaction that should be catalyzed, the enzyme generation can be controlled. These models typically employ the Transformer decoder-only architecture, which is not inherently interpretable and requires external explainability techniques to understand how they learn and design sequences. This analysis will be a key component of the thesis, potentially leading to improvements in the model, the development of downstream prediction tasks, and a deeper understanding of the protein language.

Contact: LinkedIn ORCID GitHub Google Scholar

Pre-prints and articles:

Hunklinger, A.; Hartog, P.; Šícho, M.; Godin, G.; Tetko, I. The openOCHEM consensus model is the best-performing open-source predictive model in the First EUOS/SLAS Joint Compound Solubility Challenge. SLAS Discovery. 2024. https://doi.org/10.1016/j.slasd.2024.01.005

Presentations at conferences and meetings:

Hunklinger, A.; Ferruz, N. Protein Design with Explainable Artificial Intelligence. European RosettaCon 2024 (https://www.europeanrosettacon.org/) (November 2024).

Organizations:

CRG, March 1st 2024 – August 31st 2025

Bayer, September 1st 2025 - February 28th 2027

Secondments: TU/e, June 2025

Presentations

"Explainability of Transformer-based models for the design of protein sequences" at the AiChemist Spring School in Lausanne, Switzerland, April 25, 2025.

DC8 - Ghaith Mqawass

Nationality: Syrian

Personal introduction: I am Ghaith [nickname: Gito]. I hold a degree in Electrical Engineering from Tishreen University and a Master’s in Data Science from Skoltech in Moscow. My professional journey includes research experience at the University of Vienna, where I worked as a research assistant. I am currently on the Helmholtz Munich ICB team, working under the supervision of Fabian Theis. In my current role, I focus on perturbation modeling using drug and genetic screens in partnership with Bayer and Pfizer, contributing to innovative approaches in this cutting-edge field. My interdisciplinary background allows me to merge data science and engineering to solve complex biomedical challenges. I am an interdisciplinary researcher and a problem solver working at the intersection of AI, engineering, and biology.

Research topic: Modeling drug response in image-based screens as a function of chemical space

Existing phenotypic screening techniques like Cell Painting enable the generation of immense datasets of images showing the response of cell lines to chemical perturbations. This project aims to combine learned representations from the screening images (using modern ML techniques like self-supervised learning) and from the known chemical structure of tested compounds (using graph neural networks, for example) in an actionable way. It is based on pre-existing work in the Theis lab, with the key novelty being the use of advanced chemical representations, which will be developed by other DCs. I am working on existing public and private datasets of hundreds of thousands of perturbations, with the objective of learning an image-based “morphometry latent space” by adapting and extending existing methods. Feature attribution methods will be used to make the latent space accessible and to gain insights about the learned features from the morphological embedding. In the longer term, after integrating assay data coming from different laboratories and batches by creating a condition-invariant phenotype representation, we aim to provide meaningful and explainable encodings for drugs. From these encodings, we will build a conditional generative model capable of sampling from the chemical space to obtain a desired phenotype.

Contact: LinkedIn ORCID Personal Blog

Organizations:

TUM, June 1st 2024 – March 31st 2025

Pfizer, April 1st 2025 - November 30th 2025

Bayer, December 1st 2025 - May 31st 2027

DC9 - Marcel Hiltscher

Nationality: German

Personal introduction: I am an early-stage researcher in the AiChemist consortium. I focus on developing explainable methods for multi-objective de novo drug design.

My academic background is in computer science, specializing in machine learning and applied mathematics. For my master’s thesis, I developed an approach that combines machine-learned potentials with active learning to explore the potential energy surfaces of molecules, aiming to identify transition states. This project sparked a sincere interest in the intersection of machine learning and natural sciences, leading me to pursue a position as a doctoral researcher. My research topic, which combines generative models, active learning, and XAI, offers a unique and comprehensive platform for impactful research. In addition to my academic background, I have gained industry experience as a machine learning engineer. During this time, I developed models for various applications, including speaker diarization and 2D/3D object detection, and was responsible for deploying these models. This industry experience has sharpened my development skills and enhanced my ability to collaborate effectively within large codebases.

Research topic: Explainable active learning for multi-objective de novo drug design

Exploring the vast chemical space to discover bioactive molecules is a significant challenge. In recent years, generative deep learning models have emerged as a popular and promising tool for navigating this space and generating compounds with desired properties. A critical aspect of these generative models is the consideration of multiple properties, such as ADME and toxicity, in the design of new compounds. One possible approach for generating novel compounds is active learning, which iteratively selects and optimizes the most promising candidates with respect to the multiple properties. However, generative models can produce many compounds on demand, which makes selecting candidates for subsequent iterations complex, especially as each decision influences the iterations that follow.

To address this challenge, we leverage explainable AI (xAI) to provide human-interpretable explanations of multi-objective molecular characteristics. These explanations can be used to make more informed decisions within the active learning framework.

Contact: LinkedIn ORCID

Organizations:

TU/e, August 1st 2024 – November 30th 2025

Sanofi, February 1st 2026 - July 31st 2027

Planned secondments: Bayer, January 2026

Presentations

"Evaluating the Faithfulness of Fingerprint-based Explanations" at the AiChemist Spring School in Lausanne, Switzerland, April 25, 2025.

DC10 - Subashini Kennedy

Nationality: Indian

Personal introduction: Early-stage researcher in the Explainable AI for Molecules (AiChemist) project, focusing on developing quantum-based QSAR models to predict ADMET-related properties. My curiosity about pharmaceuticals led me to pursue an undergraduate degree in Pharmaceutical Technology. During this program, I developed an interest in Computer-Aided Drug Design, which directed me toward a master's in Pharmaceutical Modelling at Uppsala University. My journey into the intersection of biology, chemistry, and AI began with participation in the Uppsala team for the 2021 iGEM competition. My interest in the application of in silico methods culminated in a thesis project where I employed AlphaFold to predict protein structures in intestinal epithelia, aiming to understand drug absorption mechanisms. This experience significantly enhanced my interest in AI applications within pharmaceutical sciences. Following my master’s, I joined the Molecular Systems Physiology Group at the University of Galway to further explore and understand drug metabolism. I contributed to their ‘BugTheDrug’ project by curating drug metabolism profiles of over 100 FDA-approved drugs. My combined interests in drug metabolism and the application of computational methods have naturally led me to my current PhD project.

Research topic: Simple quantum descriptors for actionable insights on ADMET-related properties

During my doctoral research, I will develop advanced predictive models for drug metabolism and toxicity by leveraging quantum chemistry-based descriptors. This approach will integrate quantum descriptors into machine learning frameworks to predict metabolism and toxicity properties, thereby enhancing the interpretability and reliability of these models. The project’s ultimate goal is to provide actionable insights that will support the development of safer and more effective drug candidates.

Contact: LinkedIn ORCID

Organizations:

ENS-PSL, June 1st 2024 – November 30th 2025

Sanofi, December 1st 2026 - May 31st 2027

Secondments: IRFMN, September - October 2025

Presentations

"Topology-Aware GNNs: Teaching Molecules Their Quantum Features" at the AIChemist-CECAM Flagship School, Lausanne, Switzerland, April 28, 2025.
"Can a quantum approach to QSAR improve model performance?" at the AiChemist Spring School in Lausanne, Switzerland, April 25, 2025.

DC11 - Vasilii Fastovskii

Nationality: Russian

Personal introduction: I am an early-stage researcher within the AiChemist consortium, focusing on multi-instance explainable learning. I recently completed the ChEMoinformaticsPlus Erasmus Mundus Joint Master's Degree program, where I conducted my Master's thesis at the University of Strasbourg in the Laboratory of Chemoinformatics. During my thesis, I focused on validating the Multi-Instance Learning (MIL) approach by benchmarking the Bag-Attention Net (B-AN), a multi-instance learning algorithm, to model the activity of Cdk2/CyclinA2 protein-protein complex inhibitors and retrieve the most active conformation of inhibitors containing an atropisomeric axis. I am particularly enthusiastic about the AiChemist Project, as it offers a robust platform to further my research career in chemoinformatics. My passion for Europe's diverse cultures has driven me to seek an environment that fosters intercultural exchange, and I am deeply committed to promoting collaboration between academia and industry through technology transfer, expertise sharing, and constructive dialogue.

Research topic: Multi-instance explainable learning for decoding stereo-dependent biological effects

Stereoisomerism is widely recognized as a key feature in the rationalization of interactions between chemical entities in biological processes. In this regard, the thalidomide incident is often cited to illustrate the dramatic consequences of the presence of a wrong stereoisomer. However, stereoisomerism is a typical 3D feature that is largely missing in 2D QSAR approaches. 3D QSAR is more appropriate here but suffers from the uncertainty arising from the somewhat arbitrary choice of the precise geometry of a molecule. To address this, we propose to use the multi-instance learning approach. This machine-learning paradigm takes advantage of ensembles of equivalent versions of each data point, here the multiple computed conformers of molecules, generated for instance by the MD studies carried out by other DCs. These methods work by assigning weights to the conformers, and the weights are interpretable: for a given molecule, it is possible to retrieve the specific conformations that explain the prediction. It is then possible to compare the results with structural data to better validate models, or to cross-check the 3D QSAR predictions against other 3D methods (docking, pharmacophore) to make better decisions. The method will be applied to topoisomers datasets and publicly available datasets of specific angles of rotation.

Contact: LinkedIn ORCID GitHub

Organizations:

UNISTRA, August 1st 2024 – January 31st 2026

Sanofi, February 1st 2026 - July 31st 2027

Planned secondments: ENS-PSL, January 2026

Presentations

"Multi-instance explainable learning for decoding stereo-dependent biological effects" at the AiChemist Spring School in Lausanne, Switzerland, April 25, 2025.

DC12 - Eric Alcaide Aldeano

Nationality: Spanish

Personal introduction: My background is in Medicine and Physics, although I've had a passion for AI for Science since my high school days, when I published my first preprint on activation functions. My degree thesis focused on the immune profiling of brain metastases from lung cancer as a driver for precision medicine. During my time in industry, I worked on Protein-Ligand structure prediction, pioneering the "CoFolding" paradigm with neural networks operating on multimodal data and 3D objects. Currently, I am most interested in scaling deep learning models to leverage large amounts of data efficiently and unlock a variety of problems, such as understanding and interacting with sequential data, predicting interpretable chemical reactions, and finding novel molecules that bind to targets previously thought undruggable.

Research topic: Learning chemically explainable multi-task molecular representations

The application of artificial intelligence techniques to molecular property and reactivity prediction is still hindered by the fact that their predictions are rarely aligned with a chemist’s intuition. Current state-of-the-art models for a number of chemical modelling tasks are mostly adapted from natural language processing. Their inference procedure and intrinsic lack of chemical interpretability are a source of suspicion among chemists. Improvements in this direction will help professional chemists in the pharma industry leverage the full potential of AI models. Recent work has shown that it is possible to learn molecular representations that are more interpretable by design. In this project, I will advance this research direction by building neural architectures that learn molecular representations which are chemically interpretable, scalable and suitable for a broad range of modelling tasks, using graph neural networks and group-equivariant neural networks and incorporating physical and chemical priors into the architectures.

Contact: LinkedIn ORCID

Organizations:

USI, July 1st, 2024 – December 31st, 2025

Pfizer, January 1st 2026 - June 30th 2027

Secondments: EPFL, September 2025

DC13 - Xuan Vu Nguyen

Nationality: Vietnamese

Personal Introduction: I hold a Master’s degree in Cheminformatics - a joint program between the University of Strasbourg, the University of Milan, and Université Paris Cité - and have one year of industry experience as a Large Language Model (LLM) engineer. I enjoy viewing chemistry as a series of interconnected puzzles, whether in retrosynthesis or mechanism elucidation. The challenge is not only to solve these puzzles, but to solve them elegantly. My PhD aims to explore how these chemical puzzles can be represented for AI models, and how to teach those models to “play” them with the same elegance valued by chemists.

Research topic: Reliable synthesis planning - bridging the gap between algorithms and wet-lab / automation

A significant amount of effort has gone into automating molecular synthesis, a crucial component of the design–make–test–analyze cycle. Yet a truly reliable and transformative Computer-Aided Synthesis Planning (CASP) tool remains elusive. Current algorithms rely heavily on explorative search guided by single-step policy models, which often struggle to capture the strategic aspects of complete synthetic routes or to properly consider chemical feasibility.

The next generation of synthesis-planning tools will require strategy-aware policy models and improved reward systems for search. Among the promising directions, LLMs offer a compelling path forward - provided we can keep their hallucination tendencies in check. Beyond that, reaction condition prediction, route optimization, and robotic protocol translation are also among my research interests.

Contact: LinkedIn ORCID

Organizations:

LIAC - EPFL: 1st October - 30th September 2026

Pfizer: 1st October 2026 - 30th September 2028

Pre-prints and articles:

Xuan-Vu, Nguyen, et al. “Synthelite: Chemist-aligned and feasibility-aware synthesis planning with LLMs” (link coming soon)
Xuan-Vu, Nguyen, et al. "TempRe: Template generation for single and direct multi-step retrosynthesis." arXiv preprint arXiv:2507.21762 (2025).

Posters and presentations:

“Synthelite: Chemist-aligned and feasibility-aware synthesis planning with LLMs”. Accepted for spotlight talk and poster at ML4Molecule workshop, ELLIS UnConference, Copenhagen, Denmark. https://moleculediscovery.github.io/workshop2025/

"TempRe: Template generation for single and direct multi-step retrosynthesis." Poster at the AiChemist Spring School in Lausanne, Switzerland, April 25, 2025.

DC14 - Jaehyeon Park

Nationality: Korean

Personal introduction: I majored in Physics with a minor in Chemical Engineering Applications. My prior research involved developing an enhanced machine learning model using deep learning for noise reduction. Addressing the significant issue of noise interference for individuals with hearing impairments, my work focused on denoising everyday sounds and speech enhancement using deep learning techniques. This involved distinguishing between noise and voice in ambient sounds via classification processes and improving voice clarity through amplification methods.

Furthermore, this approach facilitated the creation of an affordable AI-powered hearing aid, designed for universal accessibility regardless of socioeconomic status. This device utilizes AI for noise reduction and speech enhancement, implemented on a Raspberry Pi for a simplified hearing aid format.

Research topic: Development of XAI models for beyond rule of 5 chemical space molecules

Artificial intelligence (AI) models have been widely applied to the design of small-molecule drugs. Although more opportunities await discovery outside the small-molecule-centered chemical space, a lack of cheminformatics tools hinders the application of AI to molecules beyond the traditional druggable chemical space, such as natural products and their semi-synthetic derivatives, covalent inhibitors, and metal complexes, i.e., beyond the traditional rule of 5 chemical space (bRo5).

Explainable AI (XAI) machine learning (ML) models developed for traditional chemical space will need to be extended to be applicable to bRo5 chemical space. ML methods based on Natural Language Processing (NLP), which analyze representations of molecules as text (SMILES), are gaining popularity and success in modelling traditional chemical spaces. In this project, in addition to SMILES, I will explore novel line notations for bRo5 molecules based on SMILES extensions, such as BILN, BigSMILES and SELFIES, which were specifically developed to handle large molecules, with respect to their efficiency for developing ML models for bRo5 using representation learning. Their accuracy and expandability will be tested for the prediction of physicochemical properties, biological activities, and toxicities of molecules. The most accurate models will be used within reinforcement learning to design new active molecules for drug discovery. This research will make it possible to exploit these novel chemical spaces, which are gaining popularity in the pharma industry.

Contact: LinkedIn ORCID GitHub

Organizations:

KIT, UST, September 1st, 2024 – August 31st, 2027

Planned secondments: HMGU, December 1st 2026 - March 30th 2027

Presentations

"Web-based multi-target cytotoxicity prediction for multi-component nanoparticles: nano-QSAR model with extended applicability domain" at the AIChemist-CECAM Flagship School, Lausanne, Switzerland, April 28, 2025.
"Introduction to NanoToxRadar: Drug composition by gut-microbiome" at the AiChemist Spring School in Lausanne, Switzerland, April 25, 2025.