To develop an effective treatment for a disease, we need to find molecules that are relevant to that disease’s mechanism and understand their structure, so that we can design a drug that attaches to them and modifies their behavior. For example, in the case of a bacterial or viral infection, we would need to characterize molecules specific to those pathogens that the drug can attach to, and work out how the drug might prevent their replication. This can be done in silico (by computer simulation), but it most often needs to be confirmed in vitro, that is, by testing the drug on real biological material, such as cultured cells, in the laboratory.
So how can AI help speed up this stage of the development pipeline? The very first step is target identification and validation, which means that we need to find, and then confirm, targets in a biological system that could be disrupted by certain molecules. For now this is a human task that relies on a mechanistic understanding of the biology of the system, but our knowledge of biology is still very rudimentary. Unlike in other physical systems, it is difficult to isolate the specific components of a regulatory pathway involved in a single process, because natural selection tends to produce systems that are highly entangled and reuse components heavily, which makes them very complex and hard to characterize.
To improve our understanding of this process, AI systems can be used to reverse engineer the regulatory networks in cells from experimental data. Using high-throughput assays to measure many perturbed cell cultures, we can collect data with enough variation and volume to fit the regulatory network model that is most likely to have generated it. Using that model, we can then predict which perturbations of the network would change the progression of the disease.
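As a rough illustration of the idea, the sketch below infers a signed influence graph from toy knockout data and then lists the genes whose perturbation could affect a chosen disease marker. It is a deliberately naive thresholding approach, not the probabilistic model fitting described above, and it cannot tell direct regulation from indirect effects. The gene names, expression values, threshold, and the function names `infer_edges` and `upstream_of` are all hypothetical.

```python
from collections import deque


def infer_edges(control, knockouts, threshold=0.5):
    """Infer signed influences from knockout expression profiles.

    control:   dict gene -> baseline expression level
    knockouts: dict knocked-out gene -> dict gene -> expression after knockout
    Returns {regulator: {target: sign}} with sign +1 (activates) or -1 (represses).
    """
    edges = {}
    for regulator, profile in knockouts.items():
        for target, level in profile.items():
            if target == regulator:
                continue  # the knocked-out gene itself carries no information
            delta = level - control[target]
            if abs(delta) > threshold:
                # Target drops without the regulator -> regulator activates it.
                sign = 1 if delta < 0 else -1
                edges.setdefault(regulator, {})[target] = sign
    return edges


def upstream_of(edges, marker):
    """Return every gene that can reach `marker` through the inferred edges."""
    reverse = {}
    for regulator, targets in edges.items():
        for target in targets:
            reverse.setdefault(target, set()).add(regulator)
    seen = set()
    queue = deque([marker])
    while queue:
        node = queue.popleft()
        for regulator in reverse.get(node, ()):
            if regulator not in seen and regulator != marker:
                seen.add(regulator)
                queue.append(regulator)
    return seen


# Toy data (hypothetical): A activates B, B represses the disease marker C,
# and D has no measurable effect on the other genes.
control = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
knockouts = {
    "A": {"A": 0.0, "B": 0.2, "C": 1.8, "D": 1.0},
    "B": {"A": 1.0, "B": 0.0, "C": 2.0, "D": 1.0},
    "D": {"A": 1.0, "B": 1.0, "C": 1.1, "D": 0.0},
}

edges = infer_edges(control, knockouts)
candidates = upstream_of(edges, "C")
print(edges)
print(sorted(candidates))
```

On this toy data the A knockout also shifts C, so the sketch records an A to C influence even though the effect passes through B. Separating direct from indirect effects is one reason real methods fit a full network model to many perturbations at once.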
Once we know where in the molecular pathways we want to intervene with a drug, it is useful to know the molecular structure of the proteins involved. In many cases we know the genetic code for a protein, but we don’t know the final three-dimensional structure it adopts inside a cell. Obtaining this structure can be quite difficult, as it requires a complicated experiment in which the molecules must be crystallized and then imaged with X-rays. In some cases this process damages the proteins, and it can take years for researchers to develop the right protocol to capture the structure of a single protein correctly.
AI techniques such as AlphaFold 2 can greatly accelerate this process by predicting a protein’s structure from its amino acid sequence, which follows directly from the genetic code. The model was trained on known protein sequences and their corresponding experimentally determined structures, and can generalize to novel, unseen sequences. Even if they are not 100% accurate, these predictions can speed up the discovery process considerably, as experimental teams no longer need to wait years for each new protein to be characterized via crystallography experiments.
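The step from genetic code to model input can be made concrete. Structure predictors of this kind take the protein's amino acid sequence, which is obtained by translating the coding DNA three bases at a time using the standard genetic code. The following is a minimal sketch of that translation step only. It does not call AlphaFold, the function name `translate` and the example sequence are illustrative, and it assumes an intron-free coding sequence that starts in frame.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna):
    """Translate an in-frame coding DNA sequence into one-letter amino acids.

    Translation stops at the first stop codon ('*'). Trailing bases that do
    not form a full codon are ignored. Bases other than A, C, G, T raise
    a KeyError.
    """
    dna = coding_dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical short sequence: ATG GCC AAG TAA -> Met, Ala, Lys, stop.
print(translate("ATGGCCAAGTAA"))
```

The resulting string of one-letter amino acid codes is the kind of sequence a structure prediction model takes as input.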