Abstract
Intrinsically disordered proteins (IDPs) are ubiquitous and play key roles in transcriptional regulations and other cellular processes. To characterize diverse structural ensembles of IDPs, combinations of NMR and computational modeling showed some promise, but they need further improvements. Here, for accurate and efficient modeling of IDPs, we propose a systematic multiscale computational method. We first perform all-atom replica-exchange molecular dynamics (MD) simulations of a few fragments selected from a target IDP. These results together with generic knowledge-based local potentials are fed into the iterative Boltzmann inversion method to obtain an accurate coarse-grained potential. Then coarse-grained MD simulations provide the IDP ensemble. We tested the new method for the disordered N-terminal domain of p53 showing that the method reproduced the residual dipolar coupling and x-ray scattering profile very accurately. Further local structure analyses revealed that, guided by all-atom MD ensemble of fragments, the p53 N-terminal domain ensemble was biased to kinked structures in the AD1 region and biased to extended conformers in a proline-rich region and these biases contributed to improvement of the reproduction of the experiments.
Introduction
In eukaryotes, many proteins lack well-structured three-dimensional folds entirely or partly in their native condition, but have some important functions, which are called “intrinsically disordered proteins (IDPs)” or “intrinsically disordered regions” (1,2). For example, many transcription factors contain IDPs (2). Numerous IDPs are associated with human diseases, including cancer, cardiovascular disease, amyloidosis, neurodegenerative diseases, and diabetes (3). Revealing functions of IDPs is of crucial importance and thus IDPs have been intensively studied in the last decade (4–11).
To decipher IDP functions in detail, structural characterization is indispensible, where high degrees of conformational flexibility call for methods that can probe ensembles of structures. Small angle x-ray scattering (SAXS) (12) and NMR techniques, such as residual dipolar coupling (RDC) (12) and paramagnetic relaxation enhancement (5,13), have been successfully applied to characterize the IDP structural ensemble. Because these methods, on their own, do not directly give us structures, they are often accompanied by computational modeling (5,8). Among the various methods, two similar computational methods (12,14) that explicitly build ensembles of IDP structures based on knowledge-based dihedral angle statistics have proven to be useful. In one such method called “flexible-meccano” (12), first ϕ/ψ propensities are sampled from a coil library, and then with use of the propensities a large number of conformers that do not self-overlap are generated. Averages over thus-generated ensembles often reproduce experimental data such as RDC and SAXS quite well. Accuracy of these methods, however, depends on the proteins; they are not always good enough, especially for IDPs with some residual and nontrivial orders (15).
When the direct use of the flexible-meccano method does not give good agreement in RDC values, one argues that some nontrivial residual order is present (15). One calibrates the ensemble so that the reweighted ensemble gives better agreement with experiments, by which we can get some insights into the nontrivial orders (8,9,16). A major concern here is that this approach suffers from the degeneracy problem; i.e., there is more than one way to reproduce a given set of experimental observations within experimental error. Another and more fundamental approach for IDP ensemble modeling may be to directly use molecular dynamics (MD) simulations (5,17,18). MD simulations are expected to provide much more accurate structural ensembles. Yet, given that biologically relevant IDPs are often not so small, sufficient sampling by standard MD simulations is not feasible at this moment.
Recently, Mukrasch et al. (19) proposed a hybrid method which combines accelerated MD simulations of a few fragments selected from a target IDP and a flexible-meccano approach (15). They showed that incorporation of the MD-sampled dihedral angles into the statistical ϕ/ψ propensities of the flexible-meccano method improves the accuracy of the computed RDC values. Their method is very promising, but obviously has some room for improvement. Indeed, the purpose of our work here is to extend their method.
In this study, we put forward the approach of Mukrasch et al. (15), employing a more systematic multiscale theory and a more general MD simulation approach. We combine high-accuracy all-atom MD simulations of a few fragments selected from a target IDP and coarse-grained (CG) MD simulations of the entire target IDP. First, we select short fragments where previous RDC data (20) show large-deviation from the baseline, which is indicative of residual orders. For the selected fragments, we perform all-atom MD simulations with the variant replica exchange solute tempering (REST) method (21–23), which is a state-of-the-art efficient sampling method. The use of REST is one improvement over the previous work (15).
By this method, we obtain probability distributions of local structures. For the rest of the target IDP, we use generic statistics obtained from a set of loop structures stored in the protein data bank. The information is fed into the multiscale methodology to obtain a tuned CG potential. Namely, we employ the iterative Boltzmann inversion technique (24), which takes into account the coupling among different degrees of freedom. The systematic multiscale theory used here is the second advantage of this work. With the converged CG potential, we perform CG MD simulations to obtain an IDP structural ensemble. MD simulations are more general than the chain-growth approach used in the flexible-meccano method. By employing MD simulations, we can address the dynamics. In addition, we can easily simulate large protein dynamics that contain both folded domains and IDP regions by combining our method with other CG methods. We show that our new fragment-based multiscale MD (fMD) simulation gives a significantly more-accurate structure ensemble than the pure-CG model, in that it does not rely on all-atom MD simulations for a target IDP, p53 N-terminal domain.
The tumor suppressor p53 is a multifunctional transcription factor that plays vital roles in maintaining the integrity of the human genome, controlling apoptosis, cell-cycle arrest, and DNA repair (25). p53 forms a homo-tetramer of 393-residue subunits. Each chain consists of four major domains (i.e., the N-terminal domain, NTD (residues 1–93); the DNA binding domain, DBD; the tetramerization domain, TET; and the C-terminal domain, CTD) (Fig. 1 A) (26).
Figure 1.
Sequences and structures of simulated systems. (A) Schematic of p53 domain organization showing N-terminal domain (NTD), DNA-binding domain (DBD), tetramerization domain (TET), and C-terminal domain (CTD). The sequence of human p53 NTD is shown and known sites of phosphorylation are indicated by black dots. The AD1, AD2 and PRR motifs are indicated. (B) A random sequence of 93 residues obtained by shuffling the p53 NTD sequence. (C) Ten representative structures from the structural ensemble of p53 NTD. These structures are superimposed by using C-terminal 30 residues (roughly corresponding to the PRR region). Each chain is represented by a different color, and AD1 and AD2 are represented by bead models.
The NTD is the IDP characterized by far-ultraviolet circular dichroism spectra and by NMR spectroscopy (27), and plays crucial roles in transcription activation. Two subregions in the NTD, activation domain 1 (AD1) (residues 19–26) and activation domain 2 (AD2) (Fig. 1 A) bind to regulator proteins (28–30). A major negative-regulator of p53—Mdm2—binds to AD1, making this region an important anti-cancer drug target (31). The NMR spectroscopy revealed that intrinsically disordered AD1 forms a stable helix in complex with Mdm2 (32). Binding-coupled folding is a key feature in this region. Our previous all-atom MD simulations revealed that the intrinsic structural ensemble of AD1 regions is highly diverse, containing small but nonnegligible (∼0.5%) contents of the structure, close to that in the complex (23).
Due to its importance, p53 NTD has been characterized by various experiments such as RDC (20), SAXS (20), and paramagnetic relaxation enhancement (33), as well as by computational modeling (33). In particular, Wells et al. (20) used a flexible-meccano method to analyze the RDC data, finding the most significant deviation of the RDC profile predicted by flexible-meccano from the experimental result at residues 21–25 (correspond to AD1 in Fig. 1 A) and over the range 61–93 (corresponding to PRR in Fig. 1 A). By calibrating the ensemble based on accelerated MD simulations and knowledge of the complex structure, they improved the structural ensemble of p53 NTD. Our approach here is to move their approach forward, making a systematic procedure for an accurate and feasible construction of the IDP ensemble by MD simulations.
Materials and Methods
p53 N-terminal domain
As a target for multiscale modeling of IDP, we chose p53 N-terminal domain (NTD), which contains 93 residues (Fig. 1 A). The RDC experiments suggested some residual structures primarily in two regions of NTD: AD1 (residues 19–26) and PRR (residues 61–93).
To test the significance of our IDP modeling, we used a random sequence as well. Using the amino-acid composition of p53 NTD, we shuffled it randomly, obtaining a random sequence of 93 residues with the same composition as p53 NTD (sequence shown in Fig. 1 B).
Coarse-grained protein model
In this work, we used a one-bead-per-residue, coarse-grained (CG) model of a protein. We represented an amino acid by a single interaction center (bead) located at the position of the Cα atom. The CG model consists of five terms,
where Vb is the potential energy function for virtual bond stretching, Va is that for virtual bond angle, Vd is that for virtual dihedral angle, Vex is a standard excluded volume term, and Vele is the electrostatic interaction potential.
The potential energy function for the bond stretching is defined as
where rij is the distance between residues i and j, kb = 100.0 kcal/mol/Å2, and b = 3.8 Å. The potential energy function for the bond angle (Va) and for the dihedral angle (Vd) is described in detail in the subsequent subsections. The excluded volume potential is represented as
where the summation is over nonlocal pairs defined by i < j−3. The values ε = 0.2 kcal/mol and σ = 4.0 Å. The potential energy function for the electrostatic interaction takes the Debye-Hückel form
where qi is the charge of residue i. Here, we assumed that all Glu and Asp have −1 charge and all Arg and Lys have +1 charge. The target sequence does not contain any His residues. The values ε0 and εr are the dielectric constant and the relative dielectric constant for water (set as 80), respectively. The value λD is Debye length represented as
where kB is the Boltzmann constant, T is the temperature, and e is the elementary electric charge. I is ion strength and thus is dependent on the salt concentrations. We assumed a physiological salt concentration, i.e., [Na] = [Cl] = 150 mM, which corresponds to the ion strength I = 0.15.
All CG simulations employed a standard underdamped Langevin dynamics (34) with T = 300 K. For each system studied, one production run was performed for 5 × 107 time steps, which was enough to obtain the converged results and evenly spaced 104 frames were used for the analysis.
All-atom simulations for selected fragments
Because the RDC experiments suggested some residual structural order in AD1 and PRR regions (20), we decided to use an accurate modeling of these regions. For these two regions, using a standard atomistic force field, we performed variant replica exchange solute tempering (REST) MD simulations (23). The AD1 is short enough to obtain a well-converged sampling by the REST method, which was described before (23). The simulated region for AD1 (residues 17–29) contains and is slightly larger than actual AD1. The PRR is longer and is not easy to be completely sampled by a single REST simulation. Thus, we fragmentized it into three parts (PRR1, residues 61–72; PRR2, residues 71–80; and PRR3, residues 79–93). These parts have overlaps, which are necessary to calculate probability distributions of all the dihedral and bond angles in PRR. For each fragment, N- and C-termini were capped by Ace and NH2, respectively. For each of the four segments, we constructed initial models by PyMOL (http://www.pymol.org).
The REST simulations were described in detail in Terakawa et al. (23), and here we briefly summarize it. The fragments were solvated with water molecules and ions for neutralization. The force field for the peptide was ffOPLSAA/L (35,36) and that for water was TIP4P (37). We simulated the system with dodecahedron periodic boundary conditions and used the particle-mesh Ewald method (38) for electrostatics. The bond lengths that include hydrogen atoms in the peptide (in water molecules) were constrained by the p-LINCS (39) (by SETTLE (40)). In each simulation, 10 replicas were used. The production simulations were performed for 100 ns using NVT ensemble and 10,000 frames were used for the potential construction. These simulations were conducted using GROMACS 4.0 (41).
Based on the converged sampling by REST simulations, we constructed the probability distribution functions used for the subsequence CG simulations. For the virtual bond angle θi defined by Cαi−1-Cαi-Cαi+1 and the dihedral angle ηi representing a rotation around the Cαi-Cαi+1, we obtained the probability distributions P(θi) and P(ηi), respectively. In the bond angle and dihedral angle calculation, we treated CH3 atom of N-terminus capping residue (Ace) as Cα atom. The bond angles and the dihedral angles between residues 17 and 28 were constructed from the REST variant simulation of AD1. The dihedral angles between residues 61 and 71, 71 and 79, and 79 and 92 were constructed from the simulations of PRR1, PRR2, and PRR3, respectively. The bond angles between residues 61 and 70, 71 and 79, and 80 and 93 were constructed from those.
Probability distribution for generic loop structures derived from structure database
In order to construct the probability distributions for the regions other than AD1 and PRR, we constructed a generic set of probability distributions from the dataset of 13,598 protein structures in PDB which have mutual sequence identity <30%. For each of the proteins, using the DSSP software (42) for assigning the secondary structure, we extracted four consecutive loop residues (residues which are not assigned to helix or strand). We call them the “loop-segment library” hereafter. We illustrated the distributions by plotting a Ramachandran plot for alanine from this library (see Fig. S1 A in the Supporting Material), which shows that although the major population is in the poly-proline II and β-regions, some populations were also found in α-region.
The virtual bond angles θ were classified by the amino-acid type of the central residue, and for every central residue type, we obtained histograms with the bin size of 10°. Thus, totally, 20 probability distributions P(θ) were constructed. The dihedral angles η were classified by the central pair of amino-acid types, and for every pair we obtained histograms with the bin size of 10°, thereby obtaining 400 probability distributions P(η). These generic distributions were used for regions other than AD1 or PRR fragments.
The first-generation bond and dihedral angle potentials by Boltzmann inversion method
To derive the first-generation CG potentials for bond angles and dihedral angles, we combined the probability distributions obtained by all-atom REST simulations for AD1 and PRR and those from the structural database. We call this the “fragment-based multiscale MD” (fMD) method. For comparison, we also generated CG potentials purely from a generic structural database without using any information from all-atom REST simulations. The CG potential thus obtained is called the “pure-CG model”.
Using the probability distribution for bond angles (either by an all-atom REST simulation or by a structural database), we define the first-generation CG potential energy function as
where θ is the bond angle and P(θ) is the probability distribution. The denominator in the logarithm is from the Jacobian of the polar coordinates. The P(θ) values derived are represented as numerical tables. When we calculated a force in simulation, we interpolated these tabulated values with the spline interpolation to obtain a continuous potential energy function.
Similarly, the potential energy function for the dihedral angle is represented as
where η is the dihedral angle. When we calculated a force in simulation, to satisfy the periodic boundary condition of η, we fit the tabulated data by the truncated Fourier series as
where km, kn, and C are Fourier coefficients.
Iterative Boltzmann inversion procedure
The Boltzmann inversion method is based on an assumption of independence of all bond and dihedral angles, but this assumption is, in reality, an approximation. To take into account the mutual coupling, we employed a standard iterative Boltzmann inversion procedure for tuning Va and Vd (24).
First, we performed CG simulation with the first-generation potentials Va and Vd described above without electrostatic interactions, from which we obtained the simulated probability distributions PCGMD(a) of bond angles and dihedral angles (a is either θ or η). Using them, we updated the local potential terms by the equation
where Pref(a) is the reference probability distributions derived either from the atomistic REST simulations for AD1 and PRR or from generic statistics from PDB. The parameter c is an empirical constant (c < 1) often introduced for stable convergence. We empirically set c as 0.5. With the updated local potentials, we repeated CG simulations. This procedure was iterated three cycles, and, after that, the calculated potentials were converged.
RDC and SAXS profiles
For each of conformations sampled by the CG (fMD or pure-CG) simulations, we first rebuilt all-atom models using BBQ (43) for the backbone atoms and then SCWRL 4 (44) for the side-chain atoms. The default setups were used for both software packages. For each of all-atom structures, we used PALES (45) to obtain a RDC profile and used CRYSOL (46) to calculate a SAXS profile. Then, RDC and SAXS profiles from every structure in an ensemble were averaged-over.
Results
Fragment conformations revealed by all-atom simulations
We performed the conformational sampling for AD1 (residues 17–29), PRR1 (residues 61–72), PRR2 (residues 71–80), and PRR3 (residues 79–93) regions by all-atom REST simulations (see Movie S1 and Movie S2 in the Supporting Material). Characteristics of these fragment structures and convergence of the sampling were assessed by monitoring some probability distributions.
Here, we exemplify them with the distributions of radius of gyration for the AD1 and PRR2 ensembles (Fig. 2). In this figure, AD1 represents the 10-amino-acid residue segment (residues 18–27) extracted from the simulated segment (residues 17–29) so that the number of amino acids for this analysis is identical between AD1 and PRR2. We see that the peak of the distribution is ∼5.5 Å for AD1, whereas it is shifted to 8.5 Å for PRR2. Thus, AD1 (PRR2) has preference for compact (extended) structures. Although the AD1 region is predicted to form a helical structure by the secondary structure prediction server PSI-PRED (47), the typical structure (left compact structure in Fig. 2) chosen by the clustering analysis of the atomistic simulation is not a helical structure. Probably, a helical structure is stably formed only after binding to regulatory proteins. Further analysis of the ensemble will be discussed below when we compare it with the ensemble by fMD.
Figure 2.
The probability of radius of gyration for the AD1 and PRR2 ensembles obtained by REST variant simulations. In the case of the AD1 ensemble, results obtained from the samples of 0–100 ns, 100–200 ns, and 0–200 ns were depicted by plus-symbols, cross-symbols, and dashed curve, respectively. In the case of the PRR2 ensemble, results obtained from the samples of 0–50 ns, 50–100 ns, and 0–100 ns were depicted by plus-symbols, cross-symbols, and dashed curve, respectively. The left compact structure (Rg, 5.4 Å) is a typical structure from the AD1 ensemble and, the right extended structure (Rg, 8.3 Å) is that from the PRR2 ensemble.
We also see that the REST simulations for AD1 and PRR2 show reasonable convergence by assessing differences in the distributions obtained by different time windows (Fig. 2, plus and cross symbols). In more detail, the error in the probability distribution of the AD1 region was larger than PRR2 despite the longer simulation time. It was perhaps due to the smaller conformational diversity for PRR2 that is localized in extended conformers. In contrast, the AD1 region has a more diverse structural ensemble including some compact conformers. Though the convergence of AD1 was not perfect, we concluded that the convergence is sufficient. We used the samples of 0–100 ns for both AD1 and PRR2 to derive the probability distributions for bond angles and dihedral angles.
CG model tuned by the iterative Boltzmann inversion
Given the probability distributions for bond angles and dihedral angles either by all-atom REST simulations or from structural database, we performed the iterative Boltzmann inversion procedure to obtain the converged CG potentials, Va and Vd. The iterative Boltzmann inversion ensures that the resulting CG simulations reproduce the given probability distributions for bond angles and dihedral angles. In Fig. 3, some examples of potentials before (dashed curve) and after (solid curve) the iteration are shown together with the distribution obtained from the CG simulations (dotted curve). Although many potentials were not severely altered in the iterative Boltzmann inversion process (Fig. 3, C and D), some potentials were significantly deformed (Fig. 3, A and B). In Fig. 3 D, the difference between the potential constructed from library statistics and that actually used in CG simulation is negligible. This shows that the fitting error by a truncated Fourier series is quite small. Therefore, the deformation observed in Fig. 3 B is primarily not due to such a fitting error, but due to the iterative potential update.
Figure 3.
CG potentials for bond angles at Leu35 (A) and at Thr55 (C) and CG potentials for dihedral angles around the virtual bond Gln38-Ala39 (B) and Val10-Glu11 (D). (Dashed curves) Generic knowledge-based potentials −kBTlnP obtained by the Boltzmann inversion method. (Solid curve) Converged CG potentials obtained by the iterative Boltzmann inversion method. (Dotted curves) −kBTlnPsim, where Psim is the probability distribution of a bond angle/dihedral angle obtained from the CG simulations using the converged CG potentials.
We then addressed the difference between the CG potentials constructed from the atomistic REST simulation result VfMD(η) and those solely constructed from the structural database Vpure-CG(η) (Fig. 4 A). To quantify the difference in the two CG potentials, we define a score as . Fig. 4 A shows the score along the sequence. We see that the PRR region includes many dihedral angles with high scores, which suggest large differences between the potentials from the atomistic simulation result and from the structural database. All the dihedral angles around the virtual bond X-Pro (X represents an arbitrary amino acid) have scores higher than 4.0, and so have large impact when we use the potentials from the all-atom simulation result instead of those from the structural database.
Figure 4.
Comparison of potentials obtained by fMD and by pure-CG. (A) The difference between the dihedral angle potentials constructed from the all-atom MD simulation result (fMD) and those constructed from the structural database (pure-CG). The definition of score is described in the text. (Solid bar) Score of X-Pro described in the text. (B and C) Examples of potentials for dihedral angles; for the dihedral angles around the virtual bonds Leu22-Trp23 in panel B and Ala63-Pro64 in panel C. (Solid curves) CG dihedral angle potentials constructed from the all-atom MD simulations and (dashed lines) those from the database.
The CG potentials obtained by all-atom REST simulations and by structural database are compared in Fig. 4, B and C, where in the former (latter), the two CG potentials are similar (dissimilar). Particularly, Fig. 4 C represents potentials of the dihedral angle around the virtual bond X-Pro, and we see that the potential from the simulation result has a deeper basin at ∼−100° than that from the library statistics. This strong bias to a particular dihedral angle is a behavior specific to this region in solution.
To further assess the conformational property of the resulting CG simulations, we rebuilt all-atom backbones and drew the so-called Ramachandran plot for an alanine (see Fig. S1 B). We used BBQ (43) to rebuild the backbone atoms from Cα models. The Ramachandran plot of backbone dihedral angles of Ala39 obtained from the CG simulation ensemble is shown in Fig. S1 B. We see that the overall distributions are fairly similar to each other. In particular, the tendency that most samples populate in the poly-proline II region is well reproduced.
SAXS profile
To test the simulated p53 NTD ensembles, we first compared the SAXS profiles from fMD and from pure-CG with the experimental SAXS profile (Fig. 5). We note that the experimental data were taken directly from the figure in Wells et al. (20) by image processing, and thus may contain some unavoidable error. Overall, both the fMD simulation (solid line) and pure-CG simulation (dashed line) gave quite close agreement with the experiment, demonstrating that the overall shape (Fig. 1 C) of the calculated and measured ensembles were consistent. In more detail, the detailed inspection suggests that the structural ensemble obtained by fMD (solid line) reproduces the SAXS experiment slightly better than that obtained by pure-CG model (dashed line), although the difference is rather small and is statistically insignificant.
Figure 5.
Experimental SAXS data from p53 NTD (cross points) (20) compared with the average simulated data obtained by the fMD simulation (solid curve) and the pure-CG simulation (dashed line). (Inset) Guinier plot for these SAXS data.
The average radius of gyration of the ensemble with only Cα atoms from fMD and pure-CG is 25.5 Å and 21.4 Å, respectively, and thus they are indeed significantly different. However, the calculated spectrum is not sensitive to the differences between them. This is probably because the SAXS intensities in the small angle limit region (s < 0.02 in Fig. 5)—which is important for the accurate estimate of radius of gyration—usually display large error. (Indeed, the experimental data in the small-angle-limit region was reported with large errors in Wells et al. (20).) Note that, in Guinier plot which is usually used for estimation of radius of gyration from SAXS, the fMD simulation result shows agreement to experimental data except in the small-angle limit, which is slightly better than the pure-CG simulation. (Fig. 5, inset) The radii of gyration calculated from the Guinier plot were 29.484 Å (fMD) and 26.431 Å (pure-CG), respectively.
As a control, we also calculated the SAXS profile for the randomly shuffled sequence (Fig. 5, dotted line), which fits equally well with the experimental profile of p53 NTD. Assuming that the random sequence has generic disordered structures, we see that the SAXS profile is too insensitive to clearly detect the target-specific structural ensemble from generic disordered ensembles.
RDC
Next, we compare the RDC profiles of p53 NTD. Fig. 6 A shows NH RDCs of p53 NTD in phage measured at 300 MHz (dashed line) together with the results from the fMD simulation (solid line) (Fig. 6 A). We see that the RDC calculated by fMD agrees very well with the experimental RDC. The correlation coefficient between the two was R = 0.848. We note that, especially, the RDC values in AD1 and PRR regions that were based on the all-atom MD samplings agree with the experimental ones almost perfectly (R in the PRR region was 0.948).
Figure 6.
NH residual dipolar couplings of p53 NTD in phage measured at 300 MHz (dashed line) together with the results calculated from CG simulations (solid curves) (A and B) for p53 NTD and (C) for a random sequence. The results in panels A and B are obtained from the fMD simulation and from the pure-CG simulation, respectively. All calculated values are scaled by the same prefactor to best reproduce the experimental data. (Solid bar in A) AD1 and PRR are modeled by the potential derived from atomistic simulations. Each value of R represents the Pearson correlation coefficient.
To see the impact of the multiscale approach on the RDC profile, we also calculated the RDC profile by pure-CG simulations. Fig. 6 B shows the RDC profile obtained from the pure-CG together with the experimental data. We see that the pure-CG simulation agrees reasonably well with the experiments with R = 0.697 (p < 10−4). As expected, the major differences between the RDCs by fMD and those by pure-CG are found in the AD1 and PRR regions (R = 0.699), where the fMD method outperformed the pure-CG method. The difference was statistically significant with p = 0.002.
As a control, we calculated the RDC profile for a randomly shuffled sequence in Fig. 6 C. Apparently, the RDC for the random sequence does not resemble with the experimental RDCs for p53 NTD (R = 0.007), which clearly showed that the RDC is sensitive enough to detect the target-specific weak residual structures in IDPs.
Characterizing ensembles of p53 NTD
Because the RDC profiles suggested some residual orders in AD1 and PRR regions, we next address structural characteristics in these regions in detail. We computed distributions of radius of gyration of AD1 (residues 17–29; see Fig. S2 C), PRR1 (residues 61–72; see Fig. S2 A), PRR2 (residues 71–80; see Fig. 7 A), and PRR3 (residues 79–93; see Fig. S2 B) in the ensembles by atomistic, fMD, and pure-CG simulations. Here, the radius of gyration was calculated with only Cα atoms. We see that the distributions of PRR2 in the ensemble from atomistic simulation and that from fMD simulation are consistent (Fig. 7 A). In contrast, the distributions in the ensemble by atomistic and pure-CG simulations are not consistent. The peak of distribution shifted from 5.0 Å of pure-CG simulation to 8.0 Å of fMD simulation.
Figure 7.
(A) Distribution of radius of gyration in the ensemble by all-atom (AA) (solid line), fMD (dashed line), and pure-CG (dotted line) simulations of PRR2 segment (residues 71–80). (B) The difference in the probability of each residue in AD1 (residues 17–29) assigned to turn conformation by DSSP software among AA (solid box), fMD (dashed box), and CG (dotted box) simulation.
This shows the CG potential in fMD simulation prefers extended conformation in comparison with the model used in pure-CG simulation. This shift is probably due to the large alteration of the potentials of the dihedral angle around the virtual bond X-Pro (X represents an arbitrary amino acid) described above (Fig. 4). Note that PRR2 includes four prolines in 10 residues. The same tendency of peak position shift was observed in the distribution of PRR1 and PRR3. These extended conformation preferences probably made the RDC values by the fMD simulation better agree with the experimental RDCs (Fig. 6). Thus, the PRR-segment can briefly be characterized by an extended ensemble, comparing it with featureless disordered peptides. The extended conformers in the PRR-segment may be effective for its binding to regulatory proteins via the fly-casting mechanism (4,10).
Next, the distributions of radius of gyration of AD1 in the ensemble by all-atom, fMD, and pure-CG simulations are similar (see Fig. S2 C). Thus, better agreement of the RDC values with the experiment of this region (Fig. 6) cannot be explained solely from such a conformational property. Then, we performed a secondary structure analysis of this region by using DSSP software (43) in order to obtain more local structural information. To analyze the ensembles by fMD and by pure-CG simulations, we rebuilt full atomistic information by BBQ and SCWRL 4. The result shows the difference in the probability of turns in AD1 among atomistic, fMD, and pure-CG simulations (Fig. 7 B). From Fig. 7 B, we see that residues 21 and 22 have high probability of being in turn in the ensemble as shown by atomistic and fMD simulations. In contrast, this tendency was not observed in the ensemble by pure-CG simulation. Thus, we consider that the model used in fMD gained this tendency through the potential construction process in which the atomistic simulation result was used. In addition, the tendency that residues 21 and 22 have a high probability of being in turn conformation probably made the RDC values by fMD simulation better agree with the experimental RDCs (Fig. 6).
To gain further insights into the structural property of the ensembles of AD1, we performed the clustering analyses with the k-means method. Before the clustering analysis, each sampled structure was superimposed to the initial structure of each simulation. The number of clusters was set to 30. The result of this analysis is depicted in Fig. 8, which shows representative structures in the clusters of each ensemble. We see that, in the ensemble of atomistic simulations (Fig. 8, left), the residues 21 and 22 (see bead representation in Fig. 8) in most structures (clusters 1 and 3) have turn conformation. This result is consistent with the result of secondary structure analysis (Fig. 7 B) described above. The clusters of the ensemble by fMD simulation (Fig. 8, center) have the same tendency as those of atomistic simulation in that the residues 21 and 22 are (clusters 1 and 3) turn-conformation with high probability. In contrast, clusters from pure-CG simulation (right column in Fig. 8) do not have such a tendency, and the kink position moves over a little. The CG potential in fMD is directly based on the local property of atomistic simulations, which led to similar clustering of the two ensembles.
Figure 8.
The result of clustering analysis of the ensemble from AA (left column), fMD (center column), and CG (right column) simulation. (Top to bottom) Representative structures in the cluster of each ensemble are arranged in decreasing order of the fraction which value is written just below the structure. The beads in each structure represent the position of Cα atom of residues 21 and 22 which have the high probability of turn conformation in the ensemble of atomistic and fMD simulation (Fig. 8).
In the above secondary structure analysis (Fig. 7 B), the agreement between atomic and fMD simulations was qualitative, but not quantitative. Besides, in the clustering analysis described above (Fig. 8), the agreement of kink position between atomistic and fMD simulation was observed but the structures themselves were not identical. These discrepancies are at least partly due to the lack of some nonbonded interactions such as hydrogen bonds, hydrophobic interactions, and van der Waals attractions. Such nonbonded terms can also be derived systematically by some recently developed multiscale methods (48–50), and inclusion of such terms may make the ensemble more accurate, in general. Yet, the close agreement between experimental and fMD-based RDC profiles here suggests that the contribution of these nonbonded terms to overall structural property is relatively minor for the case of p53 NTD.
Discussion and Conclusions
We presented a new computational and multiscale method that integrates all-atom replica exchange MD simulations for some selected fragments of the target sequence with CG MD simulations, and showed that the new method can produce a structural ensemble of p53 NTD, which accurately reproduces the RDC and SAXS data. This method can be viewed as an extension of the method developed by Mukrasch et al. (15).
Our approach uses MD simulations of a CG model to generate the IDP structural ensemble, which has more generality and thus has broader applicability than Monte Carlo or some chain-growth algorithms such as flexible-meccano. Clearly, dynamics can only be addressed by CG MD simulations. In addition, for large proteins that contain folded domains and disordered linkers, CG MD is advantageous because we can straightforwardly combine the current CG MD simulations for IDPs with other CG MD methods developed for folded domains. For the latter, we have been developing a generic CG MD software CafeMol (51) (http://www.cafemol.org), and the method presented in this article can easily be integrated into the software.
Acknowledgments
This work was partly supported by Grant-in-Aid for Scientific Research, partly by Research and Development of the Next-Generation Integrated Simulation of Living Matter, a part of the Development and Use of the Next-Generation Supercomputer Project of the Ministry of Education Culture, Sports, Science and Technology, and partly by the Global Centers of Excellence Program “Formation of a Strategic Base for Biodiversity and Evolutionary Research: from Genome to Ecosystem” of the Ministry of Education, Culture, Sports, Science, and Technology.
Supporting Material
References
- 1.Dunker A., Brown C., Obradovi Z. Intrinsic disorder and protein function. Biochemistry. 2002;41:6573–6582. doi: 10.1021/bi012159+. [DOI] [PubMed] [Google Scholar]
- 2.Dyson H.J., Wright P.E. Intrinsically unstructured proteins and their functions. Nat. Rev. Mol. Cell Biol. 2005;6:197–208. doi: 10.1038/nrm1589. [DOI] [PubMed] [Google Scholar]
- 3.Cheng Y., LeGall T., Dunker A.K. Rational drug design via intrinsically disordered protein. Trends Biotechnol. 2006;24:435–442. doi: 10.1016/j.tibtech.2006.07.005. [DOI] [PubMed] [Google Scholar]
- 4.Shoemaker B.A., Portman J.J., Wolynes P.G. Speeding molecular recognition by using the folding funnel: the fly-casting mechanism. Proc. Natl. Acad. Sci. USA. 2000;97:8868–8873. doi: 10.1073/pnas.160259697. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Dedmon M.M., Lindorff-Larsen K., Dobson C.M. Mapping long-range interactions in α-synuclein using spin-label NMR and ensemble molecular dynamics simulations. J. Am. Chem. Soc. 2005;127:476–477. doi: 10.1021/ja044834j. [DOI] [PubMed] [Google Scholar]
- 6.Sugase K., Dyson H.J., Wright P.E. Mechanism of coupled folding and binding of an intrinsically disordered protein. Nature. 2007;447:1021–1025. doi: 10.1038/nature05858. [DOI] [PubMed] [Google Scholar]
- 7.Turjanski A.G., Gutkind J.S., Hummer G. Binding-induced folding of a natively unstructured transcription factor. PLOS Comput. Biol. 2008;4:e1000060. doi: 10.1371/journal.pcbi.1000060. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Marsh J.A., Forman-Kay J.D. Structure and disorder in an unfolded state under nondenaturing conditions from ensemble models consistent with a large number of experimental restraints. J. Mol. Biol. 2009;391:359–374. doi: 10.1016/j.jmb.2009.06.001. [DOI] [PubMed] [Google Scholar]
- 9.De Simone A., Richter B., Vendruscolo M. Toward an accurate determination of free energy landscapes in solution states of proteins. J. Am. Chem. Soc. 2009;131:3810–3811. doi: 10.1021/ja8087295. [DOI] [PubMed] [Google Scholar]
- 10.Trizac E., Levy Y., Wolynes P.G. Capillarity theory for the fly-casting mechanism. Proc. Natl. Acad. Sci. USA. 2010;107:2746–2750. doi: 10.1073/pnas.0914727107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Vuzman D., Levy Y. DNA search efficiency is modulated by charge composition and distribution in the intrinsically disordered tail. Proc. Natl. Acad. Sci. USA. 2010;107:21004–21009. doi: 10.1073/pnas.1011775107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Bernadó P., Blanchard L., Blackledge M. A structural model for unfolded proteins from residual dipolar couplings and small-angle x-ray scattering. Proc. Natl. Acad. Sci. USA. 2005;102:17002–17007. doi: 10.1073/pnas.0506202102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Gillespie J.R., Shortle D. Characterization of long-range structure in the denatured state of staphylococcal nuclease. I. Paramagnetic relaxation enhancement by nitroxide spin labels. J. Mol. Biol. 1997;268:158–169. doi: 10.1006/jmbi.1997.0954. [DOI] [PubMed] [Google Scholar]
- 14.Jha A.K., Colubri A., Sosnick T.R. Statistical coil model of the unfolded state: resolving the reconciliation problem. Proc. Natl. Acad. Sci. USA. 2005;102:13099–13104. doi: 10.1073/pnas.0506078102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Mukrasch M.D., Markwick P., Blackledge M. Highly populated turn conformations in natively unfolded tau protein identified from residual dipolar couplings and molecular simulation. J. Am. Chem. Soc. 2007;129:5235–5243. doi: 10.1021/ja0690159. [DOI] [PubMed] [Google Scholar]
- 16.Fisher C.K., Huang A., Stultz C.M. Modeling intrinsically disordered proteins with Bayesian statistics. J. Am. Chem. Soc. 2010;132:14919–14927. doi: 10.1021/ja105832g. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Wong K.B., Clarke J., Daggett V. Towards a complete description of the structural and dynamic properties of the denatured state of barnase and the role of residual structure in folding. J. Mol. Biol. 2000;296:1257–1282. doi: 10.1006/jmbi.2000.3523. [DOI] [PubMed] [Google Scholar]
- 18.Makowska J., Rodziewicz-Motowidło S., Scheraga H.A. Polyproline II conformation is one of many local conformational states and is not an overall conformation of unfolded peptides and proteins. Proc. Natl. Acad. Sci. USA. 2006;103:1744–1749. doi: 10.1073/pnas.0510549103. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Hamelberg D., Mongan J., McCammon J.A. Accelerated molecular dynamics: a promising and efficient simulation method for biomolecules. J. Chem. Phys. 2004;120:11919–11929. doi: 10.1063/1.1755656. [DOI] [PubMed] [Google Scholar]
- 20.Wells M., Tidow H., Fersht A.R. Structure of tumor suppressor p53 and its intrinsically disordered N-terminal transactivation domain. Proc. Natl. Acad. Sci. USA. 2008;105:5762–5767. doi: 10.1073/pnas.0801353105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Liu P., Kim B., Berne B.J. Replica exchange with solute tempering: a method for sampling biological systems in explicit water. Proc. Natl. Acad. Sci. USA. 2005;102:13749–13754. doi: 10.1073/pnas.0506346102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Huang X., Hagen M., Berne B.J. Replica exchange with solute tempering: efficiency in large scale systems. J. Phys. Chem. B. 2007;111:5405–5410. doi: 10.1021/jp068826w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Terakawa T., Kameda T., Takada S. On easy implementation of a variant of the replica exchange with solute tempering in GROMACS. J. Comput. Chem. 2011;32:1228–1234. doi: 10.1002/jcc.21703. [DOI] [PubMed] [Google Scholar]
- 24.Reith D., Pütz M., Müller-Plathe F. Deriving effective mesoscale potentials from atomistic simulations. J. Comput. Chem. 2003;24:1624–1636. doi: 10.1002/jcc.10307. [DOI] [PubMed] [Google Scholar]
- 25.Vogelstein B., Lane D., Levine A.J. Surfing the p53 network. Nature. 2000;408:307–310. doi: 10.1038/35042675. [DOI] [PubMed] [Google Scholar]
- 26.Joerger A.C., Fersht A.R. Structural biology of the tumor suppressor p53. Annu. Rev. Biochem. 2008;77:557–582. doi: 10.1146/annurev.biochem.77.060806.091238. [DOI] [PubMed] [Google Scholar]
- 27.Dawson R., Müller L., Buchner J. The N-terminal domain of p53 is natively unfolded. J. Mol. Biol. 2003;332:1131–1141. doi: 10.1016/j.jmb.2003.08.008. [DOI] [PubMed] [Google Scholar]
- 28.Kussie P.H., Gorina S., Pavletich N.P. Structure of the MDM2 oncoprotein bound to the p53 tumor suppressor transactivation domain. Science. 1996;274:948–953. doi: 10.1126/science.274.5289.948. [DOI] [PubMed] [Google Scholar]
- 29.Bochkareva E., Kaustov L., Bochkarev A. Single-stranded DNA mimicry in the p53 transactivation domain interaction with replication protein A. Proc. Natl. Acad. Sci. USA. 2005;102:15412–15417. doi: 10.1073/pnas.0504614102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Di Lello P., Jenkins L.M., Omichinski J.G. Structure of the Tfb1/p53 complex: insights into the interaction between the p62/Tfb1 subunit of TFIIH and the activation domain of p53. Mol. Cell. 2006;22:731–740. doi: 10.1016/j.molcel.2006.05.007. [DOI] [PubMed] [Google Scholar]
- 31.Vassilev L.T., Vu B.T., Liu E.A. In vivo activation of the p53 pathway by small-molecule antagonists of MDM2. Science. 2004;303:844–848. doi: 10.1126/science.1092472. [DOI] [PubMed] [Google Scholar]
- 32.Lee H., Mok K.H., Han K.H. Local structural elements in the mostly unstructured transcriptional activation domain of human p53. J. Biol. Chem. 2000;275:29426–29432. doi: 10.1074/jbc.M003107200. [DOI] [PubMed] [Google Scholar]
- 33.Lowry D.F., Stancik A., Daughdrill G.W. Modeling the accessible conformations of the intrinsically unstructured transactivation domain of p53. Proteins. 2008;71:587–598. doi: 10.1002/prot.21721. [DOI] [PubMed] [Google Scholar]
- 34.Honeycutt J.D., Thirumalai D. The nature of folded states of globular proteins. Biopolymers. 1992;32:695–709. doi: 10.1002/bip.360320610. [DOI] [PubMed] [Google Scholar]
- 35.Jorgensen W.L., Maxwell D.S., Tirado-Rives J. Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. J. Am. Chem. Soc. 1996;118:11225–11236. [Google Scholar]
- 36.Kaminski G.A., Friesner R.A., Jorgensen W.L. Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J. Phys. Chem. B. 2001;105:6474–6487. [Google Scholar]
- 37.Jorgensen W.L., Chandrasekhar J., Klein M.L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983;79:926–935. [Google Scholar]
- 38.Essmann U., Perera L., Pedersen L. A smooth particle mesh Ewald method. J. Chem. Phys. 1995;103:8577–8593. [Google Scholar]
- 39.Hess B. P-LINCS: a parallel linear constraint solver for molecular simulation. J. Chem. Theory Comput. 2008;4:116–122. doi: 10.1021/ct700200b. [DOI] [PubMed] [Google Scholar]
- 40.Miyamoto S., Kollman P.A. SETTLE: an analytical version of the SHAKE and RATTLE algorithm for rigid water models. J. Comput. Chem. 1992;13:952–962. [Google Scholar]
- 41.Hess B., Kutzner C., Lindahl E. GROMACS 4: algorithms for highly efficient, load-balanced, and scalable molecular simulation. J. Chem. Theory Comput. 2008;4:435–447. doi: 10.1021/ct700301q. [DOI] [PubMed] [Google Scholar]
- 42.Kabsch W., Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22:2577–2637. doi: 10.1002/bip.360221211. [DOI] [PubMed] [Google Scholar]
- 43.Gront D., Kmiecik S., Kolinski A. Backbone building from quadrilaterals: a fast and accurate algorithm for protein backbone reconstruction from α-carbon coordinates. J. Comput. Chem. 2007;28:1593–1597. doi: 10.1002/jcc.20624. [DOI] [PubMed] [Google Scholar]
- 44.Krivov G.G., Shapovalov M.V., Dunbrack R.L., Jr. Improved prediction of protein side-chain conformations with SCWRL4. Proteins. 2009;77:778–795. doi: 10.1002/prot.22488. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Zweckstetter M., Bax A. Prediction of sterically induced alignment in a dilute liquid crystalline phase: aid to protein structure determination by NMR. J. Am. Chem. Soc. 2000;122:3791–3792. [Google Scholar]
- 46.Svergun D., Barberato C., Koch M.H.J. CRYSOL—a program to evaluate x-ray solution scattering of biological macromolecules from atomic coordinates. J. Appl. Cryst. 1995;28:768–773. [Google Scholar]
- 47.Bryson K., McGuffin L.J., Jones D.T. Protein structure prediction servers at University College London. Nucleic Acids Res. 2005;33:W36–W38. doi: 10.1093/nar/gki410. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Hills R.D., Jr., Lu L., Voth G.A. Multiscale coarse-graining of the protein energy landscape. PLOS Comput. Biol. 2010;6:e1000827. doi: 10.1371/journal.pcbi.1000827. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.DeVane R., Klein M.L., Moore P.B. Coarse-grained potential models for phenyl-based molecules: I. Parametrization using experimental data. J. Phys. Chem. B. 2010;114:6386–6393. doi: 10.1021/jp9117369. [DOI] [PubMed] [Google Scholar]
- 50.Betancourt M.R. Comparison between molecular dynamic based and knowledge based potentials for protein side chains. J. Comput. Biol. 2010;17:943–952. doi: 10.1089/cmb.2009.0152. [DOI] [PubMed] [Google Scholar]
- 51.Kenzaki H., Koga N., Takada S. CafeMol: a coarse-grained biomolecular simulator for simulating proteins at work. J. Chem. Theory Comput. 2011;7:1979–1989. doi: 10.1021/ct2001045. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.