Molecular design has important applications in healthcare, in particular, to accelerate the development of new drugs and vaccines. Typical approaches to study molecules need time-consuming Monte Carlo (MC) and Molecular Dynamics (MD) simulations to obtain approximate samples.

We discuss a recent alternative to expensive MC and MD simulations is to use deep generative learning to acceleratemolecular simulations.

** What is the probability that a drug molecule binds a target protein?** This is one of the challenges scientists face in developing drugs and vaccines. The properties of molecules is often studied by the equilibrium probability of their 3D atomic configurations. In a molecule, this is given by the well-known Boltzmann distribution with the 3D coordinates x of the atoms forming the molecule.

Typical approach to study Boltzmann distribution can be expensive and time-consuming. We discuss a recent trend of AI-based approach to reduce the cost of time-consuming MC and MD simulations by leveraging state-of-the-art deep generative learning in molecular design.

Under a wide range of conditions, the equilibrium probability of the 3D atomic configurations in a molecule is given by the well-known Boltzmann distribution:

\(\displaystyle p(x) \propto \exp\left\{\frac{-E(x)}{kT}\right\}\),

where \(k\) is the Boltzmann constant, \(T\) is the temperature and \(E(x)\) is the energy associated with the 3D coordinates x of the atoms forming the molecule. The Boltzmann distribution allows us to answer questions such as what is the probability that a drug molecule binds a target protein? This probability could be obtained by sampling from \(p(x)\) and counting the fraction of times that the obtained \(x\) correspond to a bound state. Obtaining a reliable answer to this question is highly important in drug design since this binding probability can be used to select promising drug candidates among massive libraries of drug-like molecules.

However, working directly with the Boltzmann distribution, that is, directly drawing samples from \(p(x)\), is intractable and approximations have to be made. The typical approach for this is to run expensive Monte Carlo (MC) and Molecular Dynamics (MD) simulations to obtain approximate samples. For example, by using the Hybrid Monte Carlo (HMC) algorithm, which works by making tiny changes to x according to approximate Hamiltonian dynamics and rejecting the changes whenever the resulting configuration is physically unrealistic.

In practice, the time increment used in the HMC simulations must be kept small to avoid the resulting trajectories becoming rapidly unphysical. Furthermore, such simulations are often likely to get stuck into temporary intermediate states with very low transition probabilities. This makes exploring slow molecular events, such as the binding of a drug to a target protein, a very computationally expensive process. The result is that MC and MD methods often consume much of the worldwide available supercomputing resources.

A recent alternative to expensive MC and MD simulations is to use machine learning to learn a tractable distribution \(q(x)\) that is an accurate approximation to \(p(x)\). The approximation \(q(x)\) can be obtained by using a deep neural network (DNN) to transform samples from a simpler distribution \(p_0(z)\) over latent variables \(z\) into a more complex distribution over \(x\). The DNN transformation is given by a conditional distribution \(p(x \vert z)\), where denotes the weights of the DNN. The resulting approximate distribution \(q(x)\) is obtained by marginalizing out the latent variables:

\(\displaystyle q(x) = \int p(x \vert z, \theta) p_0(z) dz.\)

Many deep generative models such as normalizing flows, variational autoencoders or neural auto-regresive models can be described by the previous expression. Their parameters are typically tuned by minimizing with respect to an objective function that quantifies the fit of the resulting \(q(x)\) to the target Boltzman distribution. This objective could be, for example, the Kullback-Leibler (KL) divergence from \(q(x)\) to \(p(x)\), denoted by \(\text{KL}[q(x) \vert \vert p(x)]\). Other objective functions are possible as well. Once an objective function has been chosen, its optimization is often done using stochastic gradient descent, which requires the evaluation of the energy function \(E(x)\) and its gradients. This is typically done using specialized software. For example, the OpenMM toolkit.

Using deep generative models for approximating the Boltzman distribution has multiple advantages. Unlike MC and MD simulations, they generate independent samples and can avoid getting stuck into temporary intermediate states with low transition probabilities. Deep generative models are very flexible models and, by increasing the size of their DNNs, they can produce better and better approximations. While there will always be some amount of bias in the generated samples, this bias can be reduced by, for example, running a short Hybrid Monte Carlo simulation that is initialized with a sample from \(q(x)\) and then taking the last state of the simulation, which represents an improved version of the original sample. Another option is to use importance sampling, which is possible when the density of \(q(x)\) can be computed in a tractable way. For example, when the deep generative model is a normalizing flow or an autoregressive model.

Despite the aforementioned advantages, there are still some important points to consider when using deep generative models to approximate the Boltzman distribution p(x). First, \(p(x)\) naturally captures physical invariances regarding the location of atoms forming a molecule. For example, translation and rotational invariances. For efficient learning, this type of invariances need to be manually encoded into the deep generative model. While some initial work has been done in this area, there is still no general solution for this problem. Second, the optimization of the deep generative model may get stuck into fitting only a few of the modes of \(p(x)\) and ignoring the others. For example, minimizing \(\text{KL}[q(x) \vert \vert p(x)]\) is well known to have a tendency to favour solutions that have this problem. To avoid this, the objective function that is used for tuning needs to be carefully chosen to favour instead a more global fit to \(p(x)\). Finally, while importance sampling can be used to remove the bias in the samples from \(q(x)\), the performance of this method is poor when the dimensionality of x is large, which is the case in most practical applications. To avoid this, a possibility is to use Hamiltonian annealed importance sampling, which behaves better in higher dimensions.

At Boltzbit, we exploit deep generative models to accelerate molecuar simulations using deep generative models. The application of deep generative models to approximate the probability of the 3D atomic configurations in a molecule is currently a very active and promising research area. It is expected that these methods will allow in the future the cheap and practical simulation of very large molecular compounds. The result will be a significant reduction in the cost of the drug design process by enabling the accurate estimation of drug efficacy via simulations and not through expensive wet-lab experiments. This type of change has happened before. For example, in aerodynamics, where computational fluid dynamics has replaced wind tunnel experiments in the design of large aeroplane fuselages and wings.