This thesis demonstrates how to measure allosteric communication in proteins from long-timescale molecular dynamics (MD) simulations and illustrates how insight from allosteric networks can explain fundamental protein behaviors and open new therapeutic opportunities. This work also highlights the utility of protein conformational ensembles and the scientific value encoded within protein energy landscapes. The data presented here further support the hypothesis that a protein’s thermal fluctuations contain all of its functionally relevant states.
In Chapter 2, we describe the Correlation of All Rotameric and Dynamical States (CARDS) methodology as a means to integrate concerted structural and disorder-mediated correlations into a holistic view of allosteric communication. We then apply CARDS to the Catabolite Activator Protein (CAP), which experiments have shown undergoes allosteric coupling via conformational entropy. Specifically, we showed that examining the coupling of every residue to a known cAMP-binding site naturally highlights the regions of the protein known to be affected by cAMP binding. Decomposing correlations into disorder-mediated and purely structural components reveals an important role for disorder-mediated coupling in the absence of concerted structural changes. Our global communication metric also identifies important functional sites without prior knowledge of their existence or locations. I expect CARDS to be of great utility for understanding allostery in systems where it is already known to occur, as well as for predicting allostery in systems where it has yet to be observed.
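To make the decomposition concrete, the following is a minimal Python sketch, not the published CARDS implementation. It assumes each dihedral has already been reduced to two per-frame discrete trajectories (a rotameric state and an ordered/disordered dynamical state), treats the purely structural coupling as the mutual information between rotameric states, and sums the three terms that involve dynamical states as the disorder-mediated coupling. The function names are hypothetical, and any normalization the published method applies is omitted. As a stand-in for the global communication metric, it simply sums each residue's coupling to every other residue in a coupling matrix.

```python
import numpy as np


def mutual_information(x, y):
    """Mutual information (bits) between two discrete per-frame state trajectories."""
    # Map arbitrary state labels to 0..k-1 so they can index a joint histogram.
    _, xi = np.unique(np.asarray(x), return_inverse=True)
    _, yi = np.unique(np.asarray(y), return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)  # count co-occurring state pairs
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (n, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, m)
    nz = pxy > 0  # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))


def holistic_coupling(rot_a, dyn_a, rot_b, dyn_b):
    """Split the coupling of two dihedrals into structural and disorder-mediated parts.

    rot_*: rotameric state per frame; dyn_*: 0 = ordered, 1 = disordered per frame.
    """
    structural = mutual_information(rot_a, rot_b)
    disorder = (mutual_information(dyn_a, dyn_b)
                + mutual_information(rot_a, dyn_b)
                + mutual_information(dyn_a, rot_b))
    return {"structural": structural,
            "disorder_mediated": disorder,
            "holistic": structural + disorder}


def global_communication(coupling):
    """Sum each residue's coupling to all other residues (self-coupling ignored)."""
    c = np.array(coupling, dtype=float)
    np.fill_diagonal(c, 0.0)
    return c.sum(axis=1)
```

Ranking residues by `global_communication` then flags candidate functional sites without specifying a binding site in advance, which mirrors the role the global metric plays above.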
In Chapter 3, we describe a complete model of G protein activation and GDP release, focusing on Gq, which has the slowest GDP dissociation rate. These results reveal a previously unobserved intermediate that defines the rate-limiting step for GDP release and, ultimately, G protein activation. The model synthesizes a wealth of experimental data and previous analyses, building on decades of literature to create a single, unified picture. We highlight the roles that regions of known importance play in the allosteric network and identify new regions that contribute to the allosteric network coupling the receptor- and nucleotide-binding sites. The consistency of our model with a wide variety of structural and biochemical data suggests that it is a promising foundation for future efforts to understand the determinants of GPCR-G protein interaction specificity, how mutations cause aberrant signaling and disease, and how small-molecule inhibitors modulate G protein activation. The model also adds weight to the growing appreciation that a protein’s spontaneous fluctuations encode considerable information about its functional dynamics. I hope this approach and combination of analyses will prove valuable for understanding other slow conformational changes and unbinding processes.
Chapter 4 describes a cryptic allosteric site in the interferon inhibitory domain (IID) of the Ebola VP35 protein that provides a new therapeutic opportunity against this essential viral protein. We used adaptive sampling simulations to access more of the conformational ensemble that VP35 adopts, uncovering an unanticipated cryptic pocket. Applying CARDS to these simulations suggested that the cryptic pocket is allosterically coupled to the dsRNA blunt end-binding interface and, therefore, could modulate biologically important interactions. Subsequent experiments showed that fluctuations within the folded state of the IID expose to solvent two buried cysteines that line the proposed cryptic pocket. Moreover, covalently modifying these cysteines to stabilize the open form of the cryptic pocket allosterically disrupts binding to dsRNA blunt ends by at least 5-fold. Therefore, it may be possible to attenuate the impact of viral replication and restrict pathogenicity by designing small molecules that target the cryptic allosteric site we report here. More generally, these results speak to the power of simulations to provide simultaneous access to hidden conformations and their dynamics with atomic resolution. I hope this work demonstrates the potential of simulations to uncover unanticipated features of proteins’ conformational ensembles, such as cryptic pockets and allostery, providing a foundation for the design of further experiments. We anticipate that such simulations will enable the discovery of cryptic pockets and cryptic allosteric sites in other proteins, particularly those currently considered undruggable.
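As a rough illustration of scale, and assuming the 5-fold effect is read as a ratio of dissociation constants measured near 298 K, the corresponding minimum loss of binding free energy is:

```latex
\Delta\Delta G_{\text{bind}}
  = RT \ln\frac{K_d^{\text{modified}}}{K_d^{\text{unmodified}}}
  \geq RT \ln 5
  \approx \left(0.593\ \text{kcal mol}^{-1}\right)(1.609)
  \approx 0.95\ \text{kcal mol}^{-1}
```

This is a back-of-the-envelope conversion under the stated assumptions, not a value reported in Chapter 4; it shows that a modest free energy shift at a distal site suffices to produce the observed change in affinity.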
Chapters 5 and 6 highlight the power of Folding@home to tackle a wide range of biological problems, both fundamental and with potential relevance to therapeutic design. The work in Chapter 5 revealed that the SARS-CoV-2 nucleocapsid (N) protein undergoes phase separation with RNA when reconstituted in vitro. We propose a model in which a single-genome condensate forms through interactions between N protein and genomic RNA (gRNA), driven by a small number of high-affinity sites. This (meta)stable single-genome condensate then undergoes maturation, leading to virion assembly. In this model, condensate-associated N proteins exchange with a bulk pool of soluble N protein, such that the interactions that drive compaction are heterogeneous and dynamic. The model provides a physical mechanism in good agreement with empirical data on N protein oligomerization and assembly.
In Chapter 6, we describe how the exascale power of Folding@home allowed us to hunt for druggable opportunities throughout the entire SARS-CoV-2 proteome. The pandemic caused by SARS-CoV-2 prompted a call to arms, which over a million citizen scientists answered, generating 0.1 seconds of simulation data. We find that the spike protein faces a strong trade-off between exposing its ACE2-binding interface to infiltrate cells and conformationally masking epitopes to evade immune responses. These simulations also provide an atomically detailed roadmap for targeting SARS-CoV-2 proteins with vaccines and antivirals. We describe a number of cryptic pockets identified throughout the SARS-CoV-2 proteome, with more to be described as they are discovered. For each protein system in Table 6.1, extensive sampling has yielded a quantitative map of its conformational landscape.
Finally, in Chapter 7, we demonstrate how molecular simulations, integrated with experiments, can be used to study the biophysical impact of mutations, such as the substitutions P167S and D240G that confer ceftazidime resistance on the CTX-M β-lactamase. Although each substitution individually confers ceftazidime resistance, the two display negative epistasis: the double mutant P167S/D240G displays wild-type behavior. The results presented here suggest that conformational heterogeneity, particularly in active-site loops, plays an important role in the substrate specificity and evolutionary capacity of β-lactamases. The P167S and D240G mutants readily sample multiple conformations and yet are more stable than P167S/D240G, which samples fewer conformations. Crystallography and molecular dynamics show that the CTX-M acyl-enzyme complex exists in equilibrium between inactive and active conformations and that the P167S and D240G variants have a higher probability of adopting active conformations. Taken together, our data suggest that the P167S and D240G substitutions promote an open conformation of the Ω-loop that creates access for ceftazidime and allows Glu166 to sample conformations consistent with deacylation, whereas the wild type (WT) and the P167S/D240G mutant adopt a closed Ω-loop conformation that restricts access for ceftazidime and prevents Glu166 from efficiently coordinating water for deacylation.
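The quantities compared between variants, the probability of adopting active conformations and the free energy difference it implies, can be sketched as follows. This is a hypothetical helper, not the analysis used in Chapter 7: it assumes each frame has already been labeled active or inactive by some structural criterion, and it accepts optional per-frame weights because frames from adaptive sampling are not equilibrium-weighted unless reweighted, for example with an MSM.

```python
import numpy as np

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def active_population(is_active, weights=None):
    """Fraction of (optionally weighted) frames labeled active.

    is_active: per-frame 1/0 or True/False labels from a hypothetical
    structural criterion; weights: optional per-frame equilibrium weights.
    """
    return float(np.average(np.asarray(is_active, dtype=float), weights=weights))


def conformational_free_energy(p_active, temperature=298.15):
    """Free energy of the active relative to the inactive conformation (kcal/mol).

    dG = -RT ln(p_active / p_inactive); negative values favor the active state.
    """
    return float(-R_KCAL * temperature * np.log(p_active / (1.0 - p_active)))
```

For example, a variant with 75% of its equilibrium-weighted frames in the active conformation yields a negative free energy difference (about -0.65 kcal/mol at 298 K), meaning the active state is favored.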
Altogether, this thesis highlights the importance of understanding allostery in biological systems. In both scale and approach, we are entering a new phase of using molecular dynamics simulations to understand biophysics, in which the complex task of simulating an organism’s entire proteome could become commonplace. It is worth speculating about what the future holds for studies of protein dynamics, allosteric communication, and the role of conformational entropy in biological function and disease.
Methodologically, CARDS can already measure allosteric communication in a holistic manner, but several methodological enhancements could deepen our understanding of allosteric communication. As previously discussed, using a rotamer library might improve our ability to distinguish significant allosteric communication from noise, and using a multi-exponential distribution to compute the probabilities of ordered and disordered regimes might improve the detection of dynamical states.
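The two discretization steps discussed here, rotameric states and dynamical (ordered/disordered) states, can be sketched in Python. This is a simplified stand-in under stated assumptions, not the CARDS code: rotameric states use three fixed 120° bins, and a frame is labeled disordered when it belongs to a dwell in one rotameric state that is shorter than the trajectory's mean dwell time. A multi-exponential fit to the dwell-time distribution, as proposed in the paragraph above, would replace this single threshold.

```python
import numpy as np


def rotameric_states(chi_deg):
    """Assign each frame of a side-chain dihedral to one of three 120-degree bins.

    Assumed bins: 0 = [0, 120), 1 = [120, 240), 2 = [240, 360) degrees.
    """
    angles = np.mod(np.asarray(chi_deg, dtype=float), 360.0)
    # clip guards against float round-off that maps tiny negatives to 360.0
    return np.clip((angles // 120.0).astype(int), 0, 2)


def dynamical_states(rot_states):
    """Label each frame ordered (0) or disordered (1) from rotameric dwell times.

    Assumed rule: frames in a dwell shorter than the mean dwell are disordered.
    Dwells truncated by the start or end of the trajectory are treated as complete.
    """
    s = np.asarray(rot_states)
    change = np.flatnonzero(np.diff(s) != 0) + 1  # first frame of each new dwell
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [len(s)]))
    lengths = ends - starts
    threshold = lengths.mean()
    labels = np.empty(len(s), dtype=int)
    for start, end, length in zip(starts, ends, lengths):
        labels[start:end] = 1 if length < threshold else 0
    return labels
```

Under these assumptions, a dihedral that briefly hops out of a long-lived rotamer is marked disordered only during the short excursion.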
Beyond the previously discussed avenues of improvement, construction of our allosteric network in Chapter 3 was aided by the knowledge that the GPCR-binding site sends a signal to the GDP-binding site [168, 169, 170]. While ample evidence indicates that coupling occurs in both directions, CARDS does not explicitly describe directionality in pairwise communication. Such directional information would be immensely valuable in less well-studied systems, where the direction of signaling is not known in advance. Hence, there is value in identifying an alternative to mutual information that specifically measures a directional transfer of information. Additionally, CARDS extracts data directly from simulation datasets rather than from a Markov state model (MSM). As such, CARDS is inherently limited by any sampling bias in a dataset, measuring stronger correlations between residues in states that are closer to a trajectory’s starting configuration. While structural correlations can already be computed from MSMs, it is difficult to assign dynamical states within MSMs. It would therefore be valuable to develop frameworks that extract kinetic signatures of disorder from MSMs and use them to assign dynamical states. Correlated motions could then be extracted directly from MSMs.
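One widely used directional alternative to mutual information is transfer entropy, which asks how much the current state of a source reduces uncertainty about the next state of a target beyond what the target's own history already provides. The sketch below is an assumption about how such a metric might be applied to the discrete state trajectories that CARDS already assigns; it is not part of CARDS, uses a history length of one, and applies no finite-sampling bias correction.

```python
from collections import Counter
import math


def transfer_entropy(source, target, lag=1):
    """Transfer entropy T(source -> target) in bits, history length 1.

    T = sum p(x_next, x_now, y_now) * log2[p(x_next | x_now, y_now) / p(x_next | x_now)],
    where x is the target and y is the source, both discrete state sequences.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    source, target = list(source), list(target)
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    x_next = target[lag:]
    x_now = target[:-lag]
    y_now = source[:-lag]
    n = len(x_next)
    triples = Counter(zip(x_next, x_now, y_now))
    now_pairs = Counter(zip(x_now, y_now))
    self_pairs = Counter(zip(x_next, x_now))
    x_counts = Counter(x_now)
    te = 0.0
    for (xn, xc, yc), count in triples.items():
        p_full = count / now_pairs[(xc, yc)]          # p(x_next | x_now, y_now)
        p_self = self_pairs[(xn, xc)] / x_counts[xc]  # p(x_next | x_now)
        te += (count / n) * math.log2(p_full / p_self)
    return te
```

Because the estimate is asymmetric, comparing T(a→b) with T(b→a) for a residue pair gives a directional signal; in practice, its significance would need to be assessed, for example against shuffled surrogate trajectories.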
Along with developments to CARDS, a number of questions surrounding allostery in G protein signaling remain unexplored. While we highlight the potential utility of our allosteric network in Chapter 3, it is worth noting that an allosteric inhibitor of Gq already exists: YM-254890 (YM) [203]. This depsipeptide binds at the hinge between the two domains and prevents G protein activation by trapping the protein in a GDP-bound state, blocking GDP dissociation. However, the mechanism of inhibition, which is efficacious even in cells [492], remains unclear. Structural models suggest that YM inhibits the domain-opening motion of G proteins, but simulating large peptide-protein complexes, particularly at the scale of heterotrimeric G proteins, remains a challenge. Learning the allosteric mechanism by which YM inhibits G proteins might inform chemical strategies to design a simplified peptide analog, as well as general design principles for highly selective G protein inhibitors.
Building on the allosteric networks described in Chapters 3 and 4, it is worth exploring whether protein homologs have conserved allosteric networks. For example, many G protein isoforms have similar structures [171] but different behaviors, ranging from their effector targets to their GDP dissociation kinetics. Given previous success studying different isoforms of a protein family [10], differences in conformational landscapes may encode both the similarities and the differences between isoforms’ allosteric networks. If a “core” allosteric network exists across all G protein isoforms, the differences between isoforms may be all the more interesting, as they could point to regions that govern isoform-specific behaviors.
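One simple way to frame this comparison is sketched below, under assumptions that are ours rather than the thesis’s: each isoform’s network is a dictionary mapping residue pairs to coupling scores, residues are indexed in a shared alignment numbering, and the k strongest pairs stand in for each network. Pairs shared by every isoform form a candidate core, and the remainder are candidate isoform-specific pairs. The function names are hypothetical.

```python
def top_pairs(coupling, k):
    """Return the k most strongly coupled residue pairs from a {(i, j): score} dict."""
    ranked = sorted(coupling.items(), key=lambda item: item[1], reverse=True)
    return {pair for pair, _ in ranked[:k]}


def compare_networks(networks, k):
    """Split per-isoform networks into a shared core and isoform-specific pairs.

    networks: {isoform_name: {(i, j): coupling_score}}, with residues i, j in an
    assumed common alignment numbering across isoforms.
    """
    tops = {name: top_pairs(coupling, k) for name, coupling in networks.items()}
    core = set.intersection(*tops.values())
    specific = {name: pairs - core for name, pairs in tops.items()}
    return core, specific
```

A fixed top-k cutoff is only the simplest choice; a real comparison would also need a robust structural alignment and a significance threshold for each coupling.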
Going beyond primarily simulation-based studies, it is exciting to enter an era in which microsecond-to-millisecond dynamics can be explored and integrated with experiments. While simulations already provide predictive models for experimental measurements (see Chapters 3, 4, 5 and 7), there are opportunities for further integration with experiments that directly measure conformational dynamics, such as single-molecule studies or Nuclear Magnetic Resonance (NMR) spectroscopy. NMR can observe major and minor conformational states and obtain thermodynamic and kinetic information at residue-level resolution, making it a powerful method for identifying and characterizing excited states in a protein’s conformational landscape. However, it remains difficult for NMR to determine the structures of these excited states or to identify the motions that underlie the equilibrium fluctuations (called “exchange processes”) observed in NMR spectra. MD can address these limitations, with the potential to characterize both the motions underlying conformational equilibria and the residues that participate in them. Together, NMR and MD may provide a powerful means to characterize protein conformational landscapes more completely. As we enter an era of exascale computing, it is exciting to consider the ways in which such integrated biophysical approaches may help us understand complex biological phenomena.
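As a sketch of how MD could be connected to NMR exchange measurements, the helper below estimates two-state exchange parameters, the exchange rate constant k_ex = k_AB + k_BA and the minor-state population p_B = k_AB / k_ex, from a simulated trajectory that has already been split into a major state A and a minor state B. It assumes Markovian two-state dynamics sampled at equilibrium and requires both states to be visited; the state definitions are hypothetical, and relating these numbers to measured spectra would require additional modeling.

```python
import numpy as np


def two_state_exchange(states, dt):
    """Estimate two-state exchange parameters from a binary state trajectory.

    states: per-frame labels, 0 = major state A, 1 = minor state B.
    dt: time between frames; rates are returned in inverse units of dt.
    """
    s = np.asarray(states, dtype=int)
    prev, nxt = s[:-1], s[1:]
    n_ab = np.sum((prev == 0) & (nxt == 1))  # observed A -> B transitions
    n_ba = np.sum((prev == 1) & (nxt == 0))  # observed B -> A transitions
    time_a = np.sum(prev == 0) * dt          # total time spent in A
    time_b = np.sum(prev == 1) * dt          # total time spent in B
    k_ab = float(n_ab / time_a)
    k_ba = float(n_ba / time_b)
    k_ex = k_ab + k_ba
    # At equilibrium, detailed balance gives p_B = k_AB / (k_AB + k_BA).
    return {"k_AB": k_ab, "k_BA": k_ba, "k_ex": k_ex, "p_B": k_ab / k_ex}
```

Comparing the simulated k_ex and p_B with experimental values is one way the residues and motions identified by MD could be tested against the exchange processes observed by NMR.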