Non-linear dimension reduction to diffusion maps¶
Here we reduce the dimensions of a trajectory into a diffusion map.
Last executed: Dec 27, 2020 with MDAnalysis 1.0.0
Last updated: Dec 2020
Minimum version of MDAnalysis: 0.17.0
Packages required:
Note
Please cite [CL06] if you use the MDAnalysis.analysis.diffusionmap.DiffusionMap
in published work.
[1]:
import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PSF, DCD
from MDAnalysis.analysis import diffusionmap
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
Loading files¶
The test files we will be working with here feature adenylate kinase (AdK), a phosophotransferase enzyme ([BDPW09]). The trajectory DCD
samples a transition from a closed to an open conformation.
[2]:
u = mda.Universe(PSF, DCD)
Diffusion maps¶
Diffusion maps are a non-linear dimensionality reduction technique that embeds the coordinates of each frame onto a lower-dimensional space, such that the distance between each frame in the lower-dimensional space represents their “diffusion distance”, or similarity. It integrates local information about the similarity of each point to its neighours, into a global geometry of the intrinsic manifold. This means that this technique is not suitable for trajectories where the transitions between conformational states are not well-sampled (e.g. replica exchange simulations), as the regions may become disconnected and a meaningful global geometry cannot be approximated. Unlike principal component analysis, there is no explicit mapping between the components of the lower-dimensional space and the original atomic coordinates; no physical interpretation of the eigenvectors is immediately available. Please see [CL06], [dlPHHvdW08], [RZMC11] and [FPKD11] for more information.
The default distance metric implemented in MDAnalysis’ DiffusionMap class is RMSD.
Note
MDAnalysis implements RMSD calculation using the fast QCP algorithm ([The05]). Please cite [The05] if you use the default distance metric in published work.
[3]:
dmap = diffusionmap.DiffusionMap(u, select='backbone', epsilon=2)
dmap.run()
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
<ipython-input-3-21344cb8a5bd> in <module>
1 dmap = diffusionmap.DiffusionMap(u, select='backbone', epsilon=2)
----> 2 dmap.run()
~/anaconda3/envs/mda-user-guide/lib/python3.8/site-packages/MDAnalysis/analysis/diffusionmap.py in run(self, start, stop, step)
334 # run only if distance matrix not already calculated
335 if not self._dist_matrix._calculated:
--> 336 self._dist_matrix.run(start=start, stop=stop, step=step)
337 # important for transform function and length of .run() method
338 self._n_frames = self._dist_matrix.n_frames
~/anaconda3/envs/mda-user-guide/lib/python3.8/site-packages/MDAnalysis/analysis/base.py in run(self, start, stop, step, verbose)
195 self.times[i] = ts.time
196 # logger.info("--> Doing frame {} of {}".format(i+1, self.n_frames))
--> 197 self._single_frame()
198 logger.info("Finishing up")
199 self._conclude()
~/anaconda3/envs/mda-user-guide/lib/python3.8/site-packages/MDAnalysis/analysis/diffusionmap.py in _single_frame(self)
257 for j, ts in enumerate(self._u.trajectory[iframe:self.stop:self.step]):
258 self._ts = ts
--> 259 j_ref = self.atoms.positions
260 dist = self._metric(i_ref, j_ref, weights=self._weights)
261 self.dist_matrix[self._frame_index, j+self._frame_index] = (
~/anaconda3/envs/mda-user-guide/lib/python3.8/site-packages/MDAnalysis/core/groups.py in positions(self)
2480 :attr:`~MDAnalysis.coordinates.base.Timestep.positions`.
2481 """
-> 2482 return self.universe.trajectory.ts.positions[self.ix]
2483
2484 @positions.setter
KeyboardInterrupt:
The first eigenvector in a diffusion map is always essentially all ones (when divided by a constant):
[ ]:
dmap._eigenvectors[:, 0]
Therefore, when we embed the trajectory onto the dominant eigenvectors, we ignore the first eigenvector. In order to determine which vectors are dominant, we can examine the eigenvalues for a spectral gap: where the eigenvalues stop decreasing constantly in value.
[ ]:
fig, ax = plt.subplots()
ax.plot(dmap.eigenvalues[1:16])
From this plot, we take the first k dominant eigenvectors to be the first five. Below, we transform the trajectory onto these eigenvectors. The time
argument is the exponent that the eigenvalues are raised to for embedding. As values increase for time
, more dominant eigenvectors (with lower eigenvalues) dominate the diffusion distance more. The transform
method returns an array of shape (# frames, # eigenvectors).
[ ]:
transformed = dmap.transform(5, # number of eigenvectors
time=1)
transformed.shape
For easier analysis and plotting we can turn the array into a DataFrame.
[ ]:
df = pd.DataFrame(transformed,
columns=['Mode{}'.format(i+2) for i in range(5)])
df['Time (ps)'] = df.index * u.trajectory.dt
df.head()
There are several ways we can visualise the data. Using the Seaborn’s PairGrid
tool is the quickest and easiest way, if you have seaborn already installed. Each of the subplots below illustrates axes of the lower-dimensional embedding of the higher-dimensional data, such that dots (frames) that are close are kinetically close (connected by a large number of short pathways), whereas greater distance indicates states that are connected by a smaller number of long pathways. Please see
[FPKD11] for more information.
Note
You will need to install the data visualisation library Seaborn for this function.
[ ]:
import seaborn as sns
g = sns.PairGrid(df, hue='Time (ps)',
palette=sns.color_palette('Oranges_d',
n_colors=len(df)))
g.map(plt.scatter, marker='.')
References¶
[1] Oliver Beckstein, Elizabeth J. Denning, Juan R. Perilla, and Thomas B. Woolf. Zipping and Unzipping of Adenylate Kinase: Atomistic Insights into the Ensemble of Open↔Closed Transitions. Journal of Molecular Biology, 394(1):160–176, November 2009. 00107. URL: https://linkinghub.elsevier.com/retrieve/pii/S0022283609011164, doi:10.1016/j.jmb.2009.09.009.
[2] Ronald R. Coifman and Stéphane Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30, July 2006. 02271. doi:10.1016/j.acha.2006.04.006.
[3] J. de la Porte, B. M. Herbst, W. Hereman, and S. J. van der Walt. An introduction to diffusion maps. In In The 19th Symposium of the Pattern Recognition Association of South Africa. 2008. 00038.
[4] Andrew Ferguson, Athanassios Z. Panagiotopoulos, Ioannis G. Kevrekidis, and Pablo G. Debenedetti. Nonlinear dimensionality reduction in molecular simulation: The diffusion map approach. Chemical Physics Letters, 509(1-3):1–11, June 2011. 00085. doi:10.1016/j.cplett.2011.04.066.
[5] Richard J. Gowers, Max Linke, Jonathan Barnoud, Tyler J. E. Reddy, Manuel N. Melo, Sean L. Seyler, Jan Domański, David L. Dotson, Sébastien Buchoux, Ian M. Kenney, and Oliver Beckstein. MDAnalysis: A Python Package for the Rapid Analysis of Molecular Dynamics Simulations. Proceedings of the 15th Python in Science Conference, pages 98–105, 2016. 00152. URL: https://conference.scipy.org/proceedings/scipy2016/oliver_beckstein.html, doi:10.25080/Majora-629e541a-00e.
[6] Naveen Michaud-Agrawal, Elizabeth J. Denning, Thomas B. Woolf, and Oliver Beckstein. MDAnalysis: A toolkit for the analysis of molecular dynamics simulations. Journal of Computational Chemistry, 32(10):2319–2327, July 2011. 00778. URL: http://doi.wiley.com/10.1002/jcc.21787, doi:10.1002/jcc.21787.
[7] Mary A. Rohrdanz, Wenwei Zheng, Mauro Maggioni, and Cecilia Clementi. Determination of reaction coordinates via locally scaled diffusion map. The Journal of Chemical Physics, 134(12):124116, March 2011. 00220. doi:10.1063/1.3569857.
[8] Douglas L. Theobald. Rapid calculation of RMSDs using a quaternion-based characteristic polynomial. Acta Crystallographica Section A Foundations of Crystallography, 61(4):478–480, July 2005. 00127. URL: http://scripts.iucr.org/cgi-bin/paper?S0108767305015266, doi:10.1107/S0108767305015266.