Kalaivani, R. "Dynamics of Protein Kinases : Its Relationship to Functional Sites and States". Thesis, 2017. http://etd.iisc.ernet.in/2005/3579.
Abstract:
A cell is a highly complex, ordered and, above all, robust system. It copes with internal and external uncertainties such as heterogeneous stimuli, errors in processing and execution, and changes within and outside the cell. Maintenance of such a system critically depends on a large body of signalling networks and associated regulatory mechanisms. Of the recurrent manoeuvres in cell signalling, protein phosphorylation is the most prominent, and is used as a switch to transmit information and effectuate various outcomes. It is estimated that 30% of the entire proteome of a typical eukaryotic cell is phosphorylated at one time or another, almost exclusively at the hydroxyl groups of one or more Ser(S)/Thr(T)/Tyr(Y) residues. This phosphorylation is accomplished through the transfer of the γ-phosphate of ATP in the presence of cations by a superfamily of enzymes called protein kinases, or STY kinases.
In accordance with widespread phosphorylation events, STY kinases form a large and diverse superfamily, constituting 2% of the proteins encoded in a eukaryotic genome and about 500 proteins in the human proteome. Distantly related STY kinases share less than 20% sequence identity, phosphorylate specific substrates, bind to distinct interaction partners, localise in different cellular compartments and are regulated by different mechanisms. Despite flexibly accommodating these specific attributes, all STY kinases share a conserved 3-dimensional fold and retain the catalytic function. Moreover, all STY kinases can be manipulated by the signalling machinery to be in the “on” (functionally active) or “off” (inactive) state, thereby adding another layer of regulatory control. Such versatility of the STY kinase domain in harbouring specific substrate recognition motifs, binding interfaces, domain architectures and functional states makes it one of the most influential players in cell signalling and a desirable drug target.
Despite decades of study, a comprehensive understanding of the kinase domain, and of the features that dictate its catalytic activity and specificity, is lacking. This is reflected in the fact that whereas kinases specifically bind and phosphorylate their cognate substrates, most drugs targeted at them are non-specific and cause cross-reactivity. This gap in understanding potentially arises from viewing STY kinases through sequences and structures alone. It is now well established that the function and regulation of a protein molecule, along with its stability and evolution, are closely related to its dynamics. On this premise, this thesis explores the mechanistic and dynamic underpinnings of STY kinases, and interprets them in the light of their multitude of functional responsibilities and specificities.
In Chapter 1 of the thesis, we broadly discuss the complexity of cell signalling and the pivotal role of STY kinases in it. After a brief introduction to cell signalling in eukaryotes, several signal cascades mediated by different secondary messengers (cAMP, cGMP, DAG, IP3) are described. In these signal pathways, modularity is identified as a recurrent theme at all levels of hierarchy: within domain, within protein, within signalling pathway and across signalling pathways. One such modular regulation, protein phosphorylation, is discussed in detail and its catalytic enzyme, the STY kinase, is introduced. An overview and historical perspective of the STY kinase superfamily is presented along with a review of the literature pertaining to their sequence, structure and catalytic function characteristics. We note that in the active state, all STY kinases adopt a specific spatial conformation characterised by precise positioning of crucial structural motifs, while the inactive state is usually a case of some deviation from these structural constraints.
Chapter 2 addresses a fundamental question in the protein dynamics and function paradigm. If the mobility and dynamics of a protein are intimately coupled to its function, how does this manifest in STY kinases? Is there a discernible inter-relationship between the mobility of an STY kinase and its functional competence? To answer these questions, we collated 55 crystal structures of 14 STY kinases from diverse groups and families, and subjected their kinase catalytic domains to Gaussian network model (GNM) based normal mode analysis (NMA). GNM models the kinase structure as a 3-dimensional mass-spring system in a coarse-grained fashion, with masses/nodes at Cα atom positions. Proximate Cα nodes, within a 7 Å distance cut-off, are connected by identical virtual springs, resulting in a simplified network of Cα-Cα bonded and non-bonded interactions modelled as harmonic potentials. Based purely on the topology of mechanical constraints imposed by the springs, GNM analytically determines the isotropic vibrational normal modes available to the kinase structure. This method approximates the energy of the protein structure harmonically, and thus any micro-motion of the kinase can be theoretically described by a linear combination of the calculated normal modes. It is known from previous studies that the modes of low frequency correspond to biologically feasible and meaningful motions like hinge movements, protein folding and catalysis.
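The GNM construction described above can be sketched in a few lines. This is a minimal illustration, not the thesis's implementation: the function name `gnm_slowest_mode`, the toy coordinates and the use of NumPy are assumptions, and the sketch presumes the Cα nodes form a single connected network (so exactly one zero eigenvalue exists).

```python
import numpy as np

def gnm_slowest_mode(coords, cutoff=7.0):
    """Return (eigenvalue, squared mode shape) of the slowest non-trivial GNM mode."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise distances between C-alpha nodes
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Kirchhoff (connectivity) matrix: -1 for each contact, node degree on the diagonal
    contact = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    kirchhoff = -contact.astype(float)
    np.fill_diagonal(kirchhoff, contact.sum(axis=1))
    # eigh returns eigenvalues in ascending order; index 0 is the trivial zero mode
    eigvals, eigvecs = np.linalg.eigh(kirchhoff)
    slowest = eigvecs[:, 1]
    # Squared components give the relative fluctuation of each node in this mode
    return eigvals[1], slowest ** 2

# Toy usage: a hypothetical straight chain of 10 nodes spaced 3.8 Å apart
toy = np.array([[3.8 * i, 0.0, 0.0] for i in range(10)])
eigval, fluct = gnm_slowest_mode(toy)
```

In this toy chain only sequential neighbours fall within the cut-off, so the chain ends carry the largest fluctuations in the slowest mode. For a real kinase domain the coordinates would come from the Cα atoms of a crystal structure.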
We note that the multiple crystal structures analysed for each of the 14 STY kinases are identical in sequence and gross structural fold, and vary only in local backbone conformations corresponding to the functional state of the kinase (active/inactive). Upon examining the fluctuations of kinases in the normal mode of lowest frequency (the global mode), we found systematically higher structural fluctuations in the inactive states when compared to the corresponding active states. This observation held true within individual kinases and across all 14 kinases. Taken together, more residues have higher fluctuations in the inactive states (n = 1095) than in the active states (n = 525; Chi-square test, p value < 0.05). This skewed fluctuation distribution is consistent with the higher B-factors and conformational energies of the inactive state crystal structures. Moreover, high fluctuation is observed across the different inactive forms, except for a small fraction of DFG-“in” inactive conformations. Interestingly, the regions of differential fluctuation localised to the activation loop, catalytic loop, αC-helix and αG-helix, which are implicated in kinase function and regulation. Further investigation of 476 crystal structures of kinase complexes with other proteins revealed a remarkable correspondence of these regions of differential fluctuation to contact interfaces. We further verified that this differential fluctuation is not a trivial consequence of bound small molecules or mutations, but an inherent attribute of the kinase catalytic domains.
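As a worked illustration of the residue counts quoted above, the following sketch runs a one-degree-of-freedom chi-square goodness-of-fit test against an equal split. The abstract does not state the exact design of the test, so the equal-split null hypothesis and the function name `chi_square_equal_split` are assumptions.

```python
import math

def chi_square_equal_split(count_a, count_b):
    """Chi-square goodness-of-fit test of two counts against a 50:50 expectation."""
    expected = (count_a + count_b) / 2.0
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Counts from the abstract: residues more mobile in inactive vs. active states
chi2, p = chi_square_equal_split(1095, 525)
```

Under this assumed null, the statistic is about 200.6, far beyond the threshold for p < 0.05.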
In Chapter 3, we verified the accuracy of the differential fluctuation observed between active and inactive STY kinases, as perceived from GNM based NMA, using the more rigorous method of molecular dynamics (MD) simulations. GNM is minimalistic in that the STY kinase catalytic domain is coarse-grained and reduced to a 3-dimensional mechanical network of Cα atom nodes. Thus, the roles of side chains and their biophysical character, intra-protein interactions, mutations and bound factors are grossly overlooked. On this premise, we conducted all-atom MD simulations, using the AMBER ff14SB force field, of 6 structural variants of cAMP-dependent protein kinase (PKA) for 1 μs each. We chose 2 crystal structures of active and inactive PKA (PDB IDs 3FJQ and 1SYK respectively) whose kinase domains shared high structural similarity (gRMSD = 2.6 Å). They were modified in silico to obtain 6 starting structures for MD simulations: phosphorylated kinase domain in active and inactive states, kinase domain along with its C-terminal tail in active and inactive states, active kinase domain bound to ATP/2Mg2+, and unphosphorylated inactive kinase domain.
In the absence of external domains, the inactive kinase domain conformation exhibits higher mobility in terms of Cα RMSD and Cα RMSF than the active kinase domain. Of the 255 residues in PKA, a remarkable 198 have higher Cα RMSF in the inactive state, with predominant contributions from the ATP binding loop, catalytic loop and αG-helix. In the presence of the C-terminal tail, the differential mobility of the kinase domain is exaggerated, with 241 out of 255 residues showing higher Cα RMSF in the inactive state. Upon close investigation, we found that in the presence of the C-terminal tail, although the mobility of residues is generally suppressed in both functional states, a few functional regions like the activation loop and hinge residues experience higher Cα RMSF in the inactive state. This sheds light on the role of the C-terminal tail in the dynamics of the activation loop, potentially operating through the hinge residues. Absence of phosphorylation in the inactive kinase domain increases the mobility of residues in general, except for those in the αG-helix. When bound to ATP/2Mg2+, the active kinase domain (active-holo) showed higher mobility than the active-apo and inactive structures, contrary to the previous results and studies. Intrigued, we examined the simulation closely and found a transition of the active-holo structure to another conformation, named s2, at 450 ns. Upon analysis of the trajectory before the transition, the active-holo form was indeed found to be more stable and less mobile than the inactive state(s). Thus, all the inactive variants are found to be consistently more agile and mobile than their active counterparts, in agreement with the results obtained using NMA.
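The per-residue comparison above rests on the Cα root-mean-square fluctuation (RMSF). A minimal sketch of that calculation follows, assuming the trajectory frames have already been superposed onto a common reference; the function names and the two-atom toy trajectories are hypothetical and are not data from the thesis.

```python
import numpy as np

def ca_rmsf(traj):
    """RMSF per atom for a trajectory of shape (n_frames, n_atoms, 3)."""
    traj = np.asarray(traj, dtype=float)
    mean_pos = traj.mean(axis=0)                     # time-averaged position per atom
    sq_dev = ((traj - mean_pos) ** 2).sum(axis=-1)   # squared deviation per frame and atom
    return np.sqrt(sq_dev.mean(axis=0))

def count_more_mobile(rmsf_a, rmsf_b):
    """Number of positions where the first state fluctuates more than the second."""
    return int((np.asarray(rmsf_a) > np.asarray(rmsf_b)).sum())

# Toy usage: atom 0 oscillates along x, atom 1 is static (hypothetical data)
inactive = [[[1.0, 0, 0], [0, 0, 0]], [[-1.0, 0, 0], [0, 0, 0]]]
active = [[[0.5, 0, 0], [0, 0, 0]], [[-0.5, 0, 0], [0, 0, 0]]]
rmsf_inactive = ca_rmsf(inactive)
rmsf_active = ca_rmsf(active)
n_higher = count_more_mobile(rmsf_inactive, rmsf_active)
```

Counts such as "198 of 255 residues" correspond to `count_more_mobile` applied to the RMSF profiles of two simulated states with equivalent residue numbering.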
Chapter 4 discusses the transition of the active-holo simulation to a new state, named s2, characterises its structural features and explores the possibility of its functional relevance. In the previous chapter, while attempting to verify the presence of differential mobility between various active and inactive forms of PKA through MD simulations, we chanced upon the transition of an active PKA state bound to ATP/2Mg2+ (active-holo) to the s2 conformation. The s2 state has a Cα RMSD of up to 4.1 Å from the starting conformation, mainly contributed by the ATP binding loop, αB-helix, αC-helix and αG-helix, which are implicated in catalysis and substrate recognition. Once formed, s2 was stable and did not revert to the active-holo starting structure or any other conformation. We calculated all-vs.-all Cα RMSDs of the conformations sampled during the simulation and identified 3 time periods: 0-200 ns of initial conformations similar to the starting structure, 201-500 ns of transition, and 501-1000 ns of s2 conformations. Principal component analysis (PCA) of the Cα spatial positions during the entire trajectory also categorically exposed two energy wells corresponding to the initial and s2 conformations in the first and second PCs (variance = 56%). Upon systematically comparing the conformers sampled in MD with every known kinase structure, no structure hit within a Cα RMSD of 2 Å was found for conformers sampled after 500 ns, deeming s2 a novel and hitherto unknown conformation.
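The PCA step mentioned above can be illustrated with a small sketch that projects aligned Cα coordinates onto their leading principal components. It is an assumption-laden toy, not the thesis's workflow: the function name `pca_projection`, the synthetic two-state trajectory and the noise level are all hypothetical, and frames are assumed to be superposed beforehand.

```python
import numpy as np

def pca_projection(traj, n_components=2):
    """Project a (n_frames, n_atoms, 3) trajectory onto its top principal components."""
    traj = np.asarray(traj, dtype=float)
    flat = traj.reshape(traj.shape[0], -1)       # one row of 3N coordinates per frame
    centred = flat - flat.mean(axis=0)
    # SVD of the centred data; rows of vt are the principal axes
    _, svals, vt = np.linalg.svd(centred, full_matrices=False)
    variance = svals ** 2
    ratio = variance / variance.sum()            # fraction of variance per component
    proj = centred @ vt[:n_components].T
    return proj, ratio[:n_components]

# Toy usage: 20 frames near a starting state, then 20 frames of a shifted state
rng = np.random.default_rng(0)
base = rng.normal(size=(5, 3))
shift = np.zeros((5, 3))
shift[:2, 0] = 2.0                               # two atoms displaced along x
frames = [base + rng.normal(scale=0.05, size=(5, 3)) for _ in range(20)]
frames += [base + shift + rng.normal(scale=0.05, size=(5, 3)) for _ in range(20)]
proj, ratio = pca_projection(np.array(frames))
```

In this synthetic case the two states separate along the first component, which is the kind of two-cluster picture the paragraph describes for the initial and s2 conformations.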
Investigation of persistent intra-protein interactions unique to the s2 state revealed two stabilising interactions: a salt bridge between K73 and E106 in the β-sheet behind the ATP binding cleft, and a network of hydrophobic interactions anchoring the αC-helix to the αG-helix. Aside from these defining interactions, s2 is also characterised by a denser intra-protein hydrogen bond network, which stabilises it further. PCA of the MD trajectory indicates that the transition from active-holo to s2 is a process with at least 2 steps, the first being the salt bridge formation. Evolutionary conservation analysis shows that the crucial residues involved in the s2-specific interactions are not reliably conserved across PKAs of other organisms. However, convergence to s2 may still be feasible through other courses of stabilising interactions. From a functional perspective, the s2 conformation opens up the αG-helix away from the kinase core and mildly rearranges the catalytic cleft, thereby unmasking a hotspot for substrate binding that was absent in the initial structure. In an attempt to replicate the s2 conformation, we performed 4 repeat simulations of the same active-holo starting structure for 1 μs each. Although two of these independent simulations achieved the K73-E106 salt bridge, none of them reproduced the complete extent of transition and convergence to s2. Instead, we sampled another set of novel conformations, s3, in one of the repeat simulations, indicating a disposition for ATP-bound PKA to sample different conformations. Comparative analysis suggests a potential role of the C-terminal tail in stabilising the active-holo conformation in physiological conditions.
Chapter 5 characterises the extent of conservation of structural fluctuations in homologous STY kinases and interprets the observations in the light of their regulatory diversity. Having established that structural fluctuations of STY kinases carry activity-specific information (Chapter 2) and affirmed the same using MD simulations (Chapter 3), we hypothesised that the mobility of STY kinases is an important consideration in understanding the basis of their regulatory features as well. In that case, one would expect the structural fluctuations to be better conserved in closely related STY kinases than in distantly related ones. To test our hypothesis, we collated 73 crystal structures containing an STY kinase domain in the active conformation and subjected them to GNM based NMA as described above. The global mode structural fluctuations of pairs of STY kinases of varying evolutionary divergence (same-protein, within-subfamily, within-family, within-group and across-groups) were analysed. We found that the closely related STY kinase pairs (of same-protein and within-subfamily categories) have more conserved and better correlated structural fluctuations than those that were distantly related (of within-group and across-group categories). This conservation of flexibility did not trivially follow from sequence/structure conservation, since a substantial 65.4% of the variation in fluctuations was not accounted for by variations in sequences and/or structures.
Across the 73 active STY kinases belonging to different groups, we identified a conserved flexibility signature defined by low magnitude fluctuations of residues in and around the catalytic loop. Interestingly, we also identified sub-structural residue-specific fluctuation profiles characteristic of kinases of different categories. Specifically, fluctuation patterns that are statistically unique to kinase groups (AGC, TK) and families (PKA, CDK) were recognised. These fluctuation signatures localise in sites known to participate in protein-protein interactions typical of the kinase group and family concerned. Thus, we report for the first time that residues characterised by fluctuations that are differentially conserved within a group/family are involved in interactions specific to that group/family. Having validated structural fluctuation as an indicative tool for understanding kinase-specific interactions, we elucidate an application of this understanding. In Src kinase, we identified local regions around the αG-helix that exhibit conserved differential fluctuations in comparison to its close relatives EGFR and Abl. Since specific fluctuations are correlated with specific binding, we propose this region as an attractive target for drug design with minimal cross-reactivity. Overall, this chapter demonstrates the conservation of fluctuation in STY kinases and underscores the importance of considering fluctuations, over and above sequence and structural features, in understanding the roles of sites characteristic of kinases.
Chapter 6 documents the frequency of substitution of structural fluctuations in STY kinases over the course of divergent evolution. So far, we had established that structural fluctuations are evidently distinct in the varied functional states assumed by a single STY kinase (Chapters 2-3). In addition, fluctuations are differentially conserved within closely related kinases, but vary systematically across families (Chapter 5). In this chapter, we quantify the structural fluctuation variations in all residues of STY kinases put together. In a sense, this is the fluctuation space available to STY kinases across their functional states, binding modes and regulatory mechanisms. To accomplish this, we systematically compiled all known eukaryotic kinase domain structures solved at resolutions better than 3 Å. These structures were then divided into four categories: wild-type (harbouring no mutations and having typical amino acids at critical functional sites), pseudo-kinase (harbouring no mutations, but having unconventional amino acids at critical functional sites), disease mutant (harbouring mutations that have implications in diseases) and mutant of unknown effect (harbouring mutations whose physiological manifestation is unknown). Global mode structural fluctuations were determined for every kinase catalytic domain structure in each of the 4 categories.
Similar to Henikoff and Henikoff’s BLOSUM, which summarised the probability of all possible amino acid substitutions in homologous proteins, we documented a matrix of fluctuation substitution frequency in the conserved regions of wild-type kinases (named FLOSUM). We observe a positive correlation between the mean and variance of structural fluctuations at equivalent residue positions in wild-type kinase structures (Spearman rank order correlation, r = 0.69, p value < 1e-139). This implies that residues with low flexibility, like those of the catalytic loop, do not adopt diverse fluctuations in different functional states or across kinases. Substitution with any other fluctuation is more heavily disfavoured at the lower range of flexibility than at the higher range. While we did not detect apparent differences between the FLOSUMs of wild-type, disease mutant and mutant of unknown effect structures, there is a remarkable distinction in the FLOSUM of pseudo-kinases. Fluctuation substitutions that are traditionally unfavourable in wild-type kinases are freely allowed in pseudo-kinases, which thus exhibit poor conservative substitution. Over and above the lack of conventional amino acids, poor conservation of structural fluctuations and favourable substitution of deviant fluctuations could render auxiliary functional character to the kinase domain in pseudo-kinases, despite their structural similarity to STY kinases. Taken together, this study summarises the structural fluctuation landscape of STY kinases in the form of a substitution matrix, which can serve as a model of flexibility substitution during protein evolution.
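The abstract does not spell out how FLOSUM is built, so the following is only a BLOSUM-style sketch under stated assumptions: fluctuations at each aligned position are presumed to have been discretised into labelled bins beforehand, and the function name `log_odds_matrix`, the bin labels and the toy columns are hypothetical. It computes log-odds scores from observed versus expected pair frequencies, as in the BLOSUM procedure.

```python
import math
from collections import Counter
from itertools import combinations

def log_odds_matrix(columns):
    """BLOSUM-style log-odds scores for pairs of bin labels seen in aligned columns."""
    pair_counts = Counter()
    for col in columns:
        for a, b in combinations(col, 2):
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    # Marginal frequency of each bin, derived from the observed pair frequencies
    marginal = Counter()
    for (a, b), n in pair_counts.items():
        q = n / total
        if a == b:
            marginal[a] += q
        else:
            marginal[a] += q / 2
            marginal[b] += q / 2
    scores = {}
    for (a, b), n in pair_counts.items():
        q = n / total
        expected = marginal[a] ** 2 if a == b else 2 * marginal[a] * marginal[b]
        scores[(a, b)] = 2 * math.log2(q / expected)   # half-bit units, as in BLOSUM
    return scores

# Toy usage: two aligned positions observed in three hypothetical structures
columns = [["low", "low", "low"], ["high", "high", "low"]]
scores = log_odds_matrix(columns)
```

A positive score marks a pairing seen more often than chance would predict, and a negative score marks a disfavoured substitution, which is the sense in which the paragraph speaks of favourable and unfavourable fluctuation substitutions.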
Encouraged by structural fluctuations being differentially conserved in closely related kinases (Chapter 5) and conservatively substituted across kinases (Chapter 6), we extended this principle to the sequences of STY kinases in Chapter 7. This chapter reports the development of a method to predict the sites of functional specialisation in kinases, which differentiate one kinase from another, and applies it to all known STY kinase families. These sites are correlates of kinase-specific functional and regulatory attributes like specific protein-protein interactions, cognate substrate recognition and response to specific signals. Two cardinal properties of family-specific functional sites, viz., differential conservation and discriminatory ability, were used to identify them. We systematically compiled a data set of 5488 kinase catalytic domain sequences belonging to 107 families. After aligning them into a single multiple sequence alignment, we comparatively analysed the amino acid distributions in topologically equivalent positions of different families. Based on 3 different analytical measures (physicochemical property, Shannon’s entropy and random probability), we scored the differential conservation of every alignment position in each family. By maximising the discriminability between the kinase families, we integrated the results of the three measures and devised a single unified scoring scheme called the ID score. This integrated scoring method could distinguish the 107 families from one another with an accuracy of 99.2%.
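Of the three measures named above, Shannon's entropy is the simplest to illustrate. The sketch below computes the entropy of one alignment column within a family; it is not the ID score itself, whose formulation the abstract does not give. The function name `column_entropy`, the gap handling and the toy columns are assumptions.

```python
import math
from collections import Counter

def column_entropy(residues):
    """Shannon entropy (in bits) of the residues in one alignment column, ignoring gaps."""
    observed = [r for r in residues if r != "-"]
    counts = Counter(observed)
    total = len(observed)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

# Toy usage: the same alignment position in two hypothetical families
family_a = "KKKKKKKK"   # fully conserved lysine
family_b = "RRRRRRRR"   # fully conserved arginine
pooled = family_a + family_b
entropy_a = column_entropy(family_a)
entropy_b = column_entropy(family_b)
entropy_pooled = column_entropy(pooled)
```

A position with low entropy inside each family but higher entropy when families are pooled is the kind of differentially conserved site the paragraph describes.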
Each site in every STY kinase family was given an ID score in the range 0 to 1, with 0 indicating no functional specialisation and 1 indicating maximum functional specialisation. Several validations of the method were carried out to assess its competence. First, we selected those residue positions which have consistently high ID scores across most families. Using these hotspot alignment positions that render specificity to the kinase, we clustered the kinase sequences into groups and families. We found that the sites predicted by the ID score clustered the kinases better than traditional clustering using the entire alignment. Despite the reduction in information, the increase in clustering accuracy is feasible because the ID score efficiently filters out non-discriminatory sites. Second, a linear discriminant classifier based on the sites predicted by the ID score was observed to predict the kinase family better than traditional methods. Third, family-specific protein-protein interaction sites in CDK and substrate-recognising distal sites in MAPK were scored significantly higher than other residues by the ID score (two-tailed unpaired t-test, p value < 0.05). Fourth, family-specific oncogenic driver mutation sites in 8 different kinase families were identified confidently by the ID score. Finally, we demonstrate one feasible application of the ID score method in the prediction of specific protein-protein interaction sites. In summary, we developed an integrated discriminatory method to identify regions of functional specialisation in all known kinases, validated the results for known cases and elucidated a potential application of the method.
The learning from the entire thesis work is summarised in Chapter 8, which positions the work in the larger context of the functioning of the kinase domain and the use of dynamics to interpret protein functions. The validity of the simple, yet useful, NMA of proteins and complementary MD simulations in understanding basic mechanistic and dynamic properties of proteins is highlighted. Similar to sequence and structure, dynamics is now recognised as a crucial feature holding information about protein function. The main learning of the thesis, that the flexibility and mobility of STY kinases are conserved and conservatively substituted at different levels of hierarchy (different functional forms within a kinase, across kinase families and across the entire STY kinase superfamily), is discussed. The contributions of the work in furthering the knowledge of specificity determinants in kinases, which dictate precise regulatory and control mechanisms, are presented.
Supplementary information that is helpful in understanding the results of individual chapters, but could not be printed in the thesis due to its length, is provided on an optical disk attached to the thesis. The material on the optical disk is referred to in appropriate places in the individual chapters.