# publications

publications in reverse chronological order

## in review

- Transporting Densities Across Dimensions
  *Michael Plainer*, Felix Dietrich, and Ioannis G. Kevrekidis
  2023

  Even the best scientific equipment can only partially observe reality. Recorded data is often lower-dimensional, e.g., two-dimensional pictures of the three-dimensional world. Combining data from multiple experiments then results in a marginal density. This work shows how to transport such lower-dimensional marginal densities into a more informative, higher-dimensional joint space by leveraging time-delayed measurements from an observation process. This can augment the information from scientific equipment to construct a more coherent view. Classical transportation algorithms can be used when the source and target dimensions match. Our approach allows the transport of samples between spaces of different dimensions by exploiting information from the sample collection process. We reconstruct the surface of an implant from partial recordings of bacteria moving on it and construct a joint space for satellites orbiting the Earth by combining one-dimensional, time-delayed altitude measurements.

## 2024

- Doob’s Lagrangian: A Sample-Efficient Variational Approach to Transition Path Sampling
  Yuanqi Du\*, *Michael Plainer*\*, Rob Brekelmans\*, Chenru Duan, Frank Noé, Carla P. Gomes, Alán Aspuru-Guzik, and Kirill Neklyudov
  *In Advances in Neural Information Processing Systems*, 2024

  Rare event sampling in dynamical systems is a fundamental problem arising in the natural sciences, which poses significant computational challenges due to an exponentially large space of trajectories. For settings where the dynamical system of interest follows a Brownian motion with known drift, the question of conditioning the process to reach a given endpoint or desired rare event is definitively answered by Doob’s h-transform. However, the naive estimation of this transform is infeasible, as it requires simulating sufficiently many forward trajectories to estimate rare event probabilities. In this work, we propose a variational formulation of Doob’s h-transform as an optimization problem over trajectories between a given initial point and the desired ending point. To solve this optimization, we propose a simulation-free training objective with a model parameterization that imposes the desired boundary conditions by design. Our approach significantly reduces the search space over trajectories and avoids expensive trajectory simulation and inefficient importance sampling estimators which are required in existing methods. We demonstrate the ability of our method to find feasible transition paths on real-world molecular simulation and protein folding tasks.

## 2023

- Transition Path Sampling with Boltzmann Generator-based MCMC Moves
  *Michael Plainer*\*, Hannes Stärk\*, Charlotte Bunne, and Stephan Günnemann
  *In Generative AI and Biology Workshop*, 2023

  Sampling all possible transition paths between two 3D states of a molecular system has various applications ranging from catalyst design to drug discovery. Current approaches to sample transition paths use Markov chain Monte Carlo and rely on time-intensive molecular dynamics simulations to find new paths. Our approach operates in the latent space of a normalizing flow that maps from the molecule’s Boltzmann distribution to a Gaussian, where we propose new paths without requiring molecular simulations. Using alanine dipeptide, we explore Metropolis-Hastings acceptance criteria in the latent space for exact sampling and investigate different latent proposal mechanisms.

- DiffDock-Pocket: Diffusion for Pocket-Level Docking with Side Chain Flexibility
  *Michael Plainer*, Marcella Toth, Simon Dobers, Hannes Stärk, Gabriele Corso, Céline Marquet, and Regina Barzilay
  *In Machine Learning in Structural Biology*, 2023

  When a small molecule binds to a protein, the 3D structure and function of the protein can significantly change. Understanding this process, called molecular docking, is crucial in areas such as drug design. Recent learning-based attempts have shown promising results at this task, yet lack the necessary features that traditional approaches support. In this work, we close this gap by proposing DiffDock-Pocket: a diffusion-based all-atom docking algorithm conditioned on a binding target. Our model supports receptor flexibility by extending the generative diffusion process to the manifold describing the main degrees of freedom of the protein’s side chains. Empirically, we improve the state-of-the-art in site-specific-docking on the PDBBind benchmark. In particular, in the realistic scenario that no bound protein structure is available, we double the accuracy of current methods while being 20 times faster than other flexible approaches.

## theses

- MSc. Machine Learning Techniques for Improved Transition Path Sampling
  *Michael Plainer*
  *Technical University of Munich*, Nov 2023, Master’s Thesis

  The ability to efficiently, and most importantly accurately, simulate the atoms of molecules has opened many opportunities in various disciplines. In areas such as drug discovery or material science, we are interested not only in the constant small fluctuations captured by molecular simulations but also in finding rare transitions to different states. Transition path sampling (TPS) offers a powerful approach to exploring the pathways of rare events in complex systems, providing a comprehensive landscape of the transitional trajectories that traditional methods often miss. Current algorithms are based on Markov chain Monte Carlo and rely on computationally expensive molecular dynamics simulations. In this thesis, we propose two new methods to overcome the issues of current approaches. As for the first approach, we demonstrate how we can sample transition paths in a learned latent space of a Boltzmann generator without the need for molecular dynamics simulations. For this, we reformulate the acceptance criterion of Metropolis-Hastings in the latent space to ensure that paths can be sampled with the correct probability. Additionally, we investigate how we can improve the current state of traditional TPS methodology. For this, we introduce a self-attention-based neural network architecture that uses the entire transition path to determine the optimal point to start molecular simulations from. We demonstrate the capabilities of our approaches on the molecule alanine dipeptide and introduce metrics and evaluation techniques to compare them with existing work. While the introduced latent TPS approach is mathematically correct, the produced results are not convincing and often exhibit unfavorable performance due to low acceptance of paths. Our ideas to improve point selection with context-aware neural networks, on the other hand, seem promising and can improve on the state-of-the-art.

- BSc. Transport of Discontinuous Densities with Artificial Neural Networks
  *Michael Plainer*
  *Technical University of Munich*, Sep 2021, Bachelor’s Thesis

  Nearly all real-world measurements can only record a part of the underlying truth due to technical limitations. In many fields, full comprehension of the system requires an understanding of how the unmeasurable inputs or states map to the measurable outputs. In cases where many individual measurements are performed, the density of the observation can be approximated with histograms. They count the frequency at which measurements fall in a given range. Each observed sample corresponds to exactly one unknown point in the input space that has been mapped by a function to produce exactly this recorded output. When the distribution of these points in the original input space is known (e.g., uniformly distributed), a transport function describing this mapping can be found. Identifying this transport function is the main objective of this thesis. The field of transportation theory is dedicated to finding these transportation maps between two (probability) measures that are optimal according to a metric. Those approaches can fail to identify the true underlying transport map, for example, if it is not bijective or when the recorded density is discontinuous. Reconstructing this true underlying transport map can be done by employing an observation process that measures consecutive outputs of moving points. This reconstruction procedure is implemented with artificial neural networks and demonstrated by examples. Separately from the transport of measures, another network is implemented that learns the underlying dynamical system based on the observation process, allowing extrapolation of the movement of the points. Apart from fictitious examples, the procedure is also applied to reconstruct the shape of a simulated cell by synthesizing image data (e.g., produced by a microscope) and observing moving bacteria on the cell’s surface.

\* Equal contribution