# Calculating the root mean square fluctuation over a trajectory¶

We calculate the RMSF of the alpha-carbons in adenylate kinase (AdK) as it transitions from an open to closed structure, with reference to the average conformation of AdK.

Last executed: May 18, 2021 with MDAnalysis 1.1.1

Last updated: July 2, 2020 with MDAnalysis 1.0.0

Minimum version of MDAnalysis: 1.0.0

Packages required:

Note

MDAnalysis implements RMSD calculation using the fast QCP algorithm ([The05]) and a rotation matrix R that minimises the RMSD ([LAT09]). Please cite ([The05]) and ([LAT09]) when using the MDAnalysis.analysis.align and MDAnalysis.analysis.rms modules in published work.

:

import MDAnalysis as mda
from MDAnalysisData import datasets
from MDAnalysis.analysis import rms, align
import nglview as nv


The test files we will be working with here are an equilibrium trajectory of adenylate kinase (AdK), a phosophotransferase enzyme. ([SB17]) AdK has three domains:

• CORE

• LID: an ATP-binding domain

• NMP: an AMP-binding domain

The LID and NMP domains move around the stable CORE as the enzyme transitions between the opened and closed conformations. We therefore might wonder whether the LID and NMP residues are more mobile than the CORE residues. One way to quantify this flexibility is by calculating the root mean square fluctuation (RMSF) of atomic positions.

:

adk = datasets.fetch_adk_equilibrium()


## Background¶

The root-mean-square-fluctuation (RMSF) of a structure is the time average of the RMSD. It is calculated according to the below equation, where $$\mathbf{x}_i$$ is the coordinates of particle $$i$$, and $$\langle\mathbf{x}_i\rangle$$ is the ensemble average position of $$i$$.

$\rho_i = \sqrt{\left\langle (\mathbf{x}_i - \langle\mathbf{x}_i\rangle)^2 \right\rangle}$

Where the RMSD quantifies how much a structure diverges from a reference over time, the RSMF can reveal which areas of the system are the most mobile. While RMSD is frequently calculated to an initial state, the RMSF should be calculated to an average structure of the simulation. An area of the structure with high RMSF values frequently diverges from the average, indicating high mobility. When RMSF analysis is carried out on proteins, it is typically restricted to backbone or alpha-carbon atoms; these are more characteristic of conformational changes than the more flexible side-chains.

## Creating an average structure¶

We can generate an average structure to align to with the align.AverageStructure class. Here we first align to the first frame (ref_frame=0), and then average the coordinates.

:

average = align.AverageStructure(u, u, select='protein and name CA',
ref_frame=0).run()
ref = average.universe


## Aligning the trajectory to a reference¶

rms.RMSF does not allow on-the-fly alignment to a reference, and presumes that you have already aligned the trajectory. Therefore we need to first align our trajectory to the average conformation.

:

aligner = align.AlignTraj(u, ref,
select='protein and name CA',
in_memory=True).run()


You may not have enough memory to load the trajectory into memory. In that case, save this aligned trajectory to a file and re-load it into MDAnalysis by uncommenting the code below.

:

# aligner = align.AlignTraj(u, ref,
#                           select='protein and name CA',
#                           filename='aligned_traj.dcd',
#                           in_memory=False).run()
# u = mda.Universe(PSF, 'aligned_traj.dcd')


## Calculating RMSF¶

The trajectory is now fitted to the reference, and the RMSF (API docs) can be calculated.

Note

MDAnalysis implements an algorithm that computes sums of squares and avoids underflows or overflows. Please cite ([Wel62]) when using the MDAnalysis.analysis.rms.RMSF class in published work.

:

c_alphas = u.select_atoms('protein and name CA')
R = rms.RMSF(c_alphas).run()


## Plotting RMSF¶

We can now plot the RMSF using the common plotting package matplotlib.

:

import matplotlib.pyplot as plt
%matplotlib inline


As we can see, the LID and NMP residues indeed move much more compared to the rest of the enzyme.

:

plt.plot(c_alphas.resids, R.rmsf)
plt.xlabel('Residue number')
plt.ylabel('RMSF ($\AA$)')
plt.axvspan(122, 159, zorder=0, alpha=0.2, color='orange', label='LID')
plt.axvspan(30, 59, zorder=0, alpha=0.2, color='green', label='NMP')
plt.legend(); ## Visualising RMSF as B-factors¶

Colouring a protein by RMSF allows you to visually identify regions of high fluctuation. This is commonly done by setting temperature factor (also known as b-factor) values, writing out to a format with B-factor specification (e.g. PDB), and visualising the file in a program such as VMD or nglview.

MDAnalysis uses the tempfactor topology attribute for this information. Below, we iterate through each residue of the protein and set the tempfactor attribute for every atom in the residue to the alpha-carbon RMSF value; this is necessary so every atom in the residue is coloured with the alpha-carbon RMSF.

:

u.add_TopologyAttr('tempfactors') # add empty attribute for all atoms
protein = u.select_atoms('protein') # select protein atoms
for residue, r_value in zip(protein.residues, R.rmsf):
residue.atoms.tempfactors = r_value


Below we visualise these values with a rainbow colour scheme. Purple values correspond to low RMSF values, whereas red values correspond to high RMSFs.

:

view = nv.show_mdanalysis(u)
view.update_representation(color_scheme='bfactor')
view


You can also write the atoms to a file and visualise it in another program. As the original Universe did not contain altLocs, icodes or occupancies for each atom, some warnings will be printed (which are not visible here).

:

u.atoms.write('rmsf_tempfactors.pdb')

/Users/lily/anaconda3/envs/mda-user-guide/lib/python3.7/site-packages/MDAnalysis/coordinates/PDB.py:864: DeprecationWarning: Using the last letter of the segid for the chainID is now deprecated and will be changed in 2.0. In 2.0, the chainID attribute will be used if it exists, or a placeholder value.
"exists, or a placeholder value.", DeprecationWarning)