Calculating the pairwise RMSD of a trajectory¶
Last updated: June 26, 2020 with MDAnalysis 1.0.0
Minimum version of MDAnalysis: 0.17.0
import MDAnalysis as mda from MDAnalysis.tests.datafiles import PSF, DCD, CRD, DCD2 from MDAnalysis.analysis import diffusionmap, align, rms import numpy as np import matplotlib.pyplot as plt %matplotlib inline
The test files we will be working with here feature adenylate kinase (AdK), a phosophotransferase enzyme. ([BDPW09]) The trajectories sample a transition from a closed to an open conformation.
adk_open = mda.Universe(CRD, DCD2) adk_closed = mda.Universe(PSF, DCD)
While 1-dimensional RMSD is a quick way to estimate how much a structure changes over time, it can be a misleading measure. It is easy to think that two structures with the same RMSD from a reference frame are also similar; but in fact, they can be very different. Instead, calculating the RMSD of each frame in the trajectory to all other frames in the other trajectory can contain much more information. This measure is often called the pairwise, all-to-all, or 2D RMSD.
The other, or reference, trajectory in pairwise RMSD can either be the first trajectory, or another one. If the pairwise RMSD of a trajectory is calculated to itself, it can be used to gain insight into the conformational convergence of the simulation. The diagonal of the plot will be zero in this case (as this represents the RMSD of a structure to itself). Blocks of low RMSD values along the diagonal indicate similar structures, suggesting the occupation of a given state. Blocks of low RMSD values off the diagonal indicate that the trajectory is revisiting an earlier state. Please see the living guide Best Practices for Quantification of Uncertainty and Sampling Quality in Molecular Simulations by Grossfield et al. for more on using 2D RMSD as a measure of convergence. MDAnalysis provides a DistanceMatrix class for easy calculation of the pairwise RMSD of a trajectory to itself.
When the other trajectory in pairwise RMSD is a different trajectory, the pairwise RMSD can be used to compare the similarity of the two conformational ensembles. There is no requirement that the two trajectories be the same length. In this case, the diagonal is no longer necessarily zero. Blocks of low RMSD values anywhere indicate that the two trajectories are sampling similar states. Pairwise RMSDs with different trajectories must be manually calculated in MDAnalysis.
Pairwise RMSD of a trajectory to itself¶
In order to calculate the pairwise RMSD of a trajectory to itself, you should begin by aligning the trajectory in order to minimise the resulting RMSD. You may not have enough memory to do this
in_memory, in which case you can write out the aligned trajectory to a file (please see the aligning tutorials for more).
aligner = align.AlignTraj(adk_open, adk_open, select='name CA', in_memory=True).run()
matrix = diffusionmap.DistanceMatrix(adk_open, select='name CA').run()
The results array is in
matrix.dist_matrix as a square array with the shape (#n_frames, #n_frame).
We can use the common plotting package matplotlib to create a heatmap from this array. For other ways to plot heat maps, you can look at the seaborn, plotly (for interactive images), or holoviews (also interactive) packages.
plt.imshow(matrix.dist_matrix, cmap='viridis') plt.xlabel('Frame') plt.ylabel('Frame') plt.colorbar(label=r'RMSD ($\AA$)')
<matplotlib.colorbar.Colorbar at 0x1130bf710>
Pairwise RMSD between two trajectories¶
Calculating the 2D RMSD between two trajectories is a bit more finicky;
DistanceMatrix can only calculate the RMSD of a trajectory to itself. Instead, we do it the long way by simply calculating the RMSD of each frame in the second trajectory, to each frame in the first trajectory.
First we set up a 2D numpy array a shape corresponding to the length of each of our trajectories to store our results. To start off, it is populated with zeros.
prmsd = np.zeros((len(adk_open.trajectory), # y-axis len(adk_closed.trajectory))) # x-axis
Then we iterate through each frame of the
adk_open trajectory (our y-axis), and calculate the RMSD of
adk_closed to each frame, storing it in the
for i, frame_open in enumerate(adk_open.trajectory): r = rms.RMSD(adk_closed, adk_open, select='name CA', ref_frame=i).run() prmsd[i] = r.rmsd[:, -1] # select 3rd column with RMSD values
We plot it below. As you can see, there is no longer a line of zero values across the diagonal. Here, the frames of
adk_open are similar but not identical.
plt.imshow(prmsd, cmap='viridis') plt.xlabel('Frame (adk_closed)') plt.ylabel('Frame (adk_open)') plt.colorbar(label=r'RMSD ($\AA$)')
<matplotlib.colorbar.Colorbar at 0x113053cd0>
 Oliver Beckstein, Elizabeth J. Denning, Juan R. Perilla, and Thomas B. Woolf. Zipping and Unzipping of Adenylate Kinase: Atomistic Insights into the Ensemble of Open↔Closed Transitions. Journal of Molecular Biology, 394(1):160–176, November 2009. 00107. URL: https://linkinghub.elsevier.com/retrieve/pii/S0022283609011164, doi:10.1016/j.jmb.2009.09.009.
 Richard J. Gowers, Max Linke, Jonathan Barnoud, Tyler J. E. Reddy, Manuel N. Melo, Sean L. Seyler, Jan Domański, David L. Dotson, Sébastien Buchoux, Ian M. Kenney, and Oliver Beckstein. MDAnalysis: A Python Package for the Rapid Analysis of Molecular Dynamics Simulations. Proceedings of the 15th Python in Science Conference, pages 98–105, 2016. 00152. URL: https://conference.scipy.org/proceedings/scipy2016/oliver_beckstein.html, doi:10.25080/Majora-629e541a-00e.
 Naveen Michaud-Agrawal, Elizabeth J. Denning, Thomas B. Woolf, and Oliver Beckstein. MDAnalysis: A toolkit for the analysis of molecular dynamics simulations. Journal of Computational Chemistry, 32(10):2319–2327, July 2011. 00778. URL: http://doi.wiley.com/10.1002/jcc.21787, doi:10.1002/jcc.21787.
 Douglas L. Theobald. Rapid calculation of RMSDs using a quaternion-based characteristic polynomial. Acta Crystallographica Section A Foundations of Crystallography, 61(4):478–480, July 2005. 00127. URL: http://scripts.iucr.org/cgi-bin/paper?S0108767305015266, doi:10.1107/S0108767305015266.