{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Non-linear dimension reduction to diffusion maps\n", "\n", "Here we reduce the dimensions of a trajectory into a diffusion map.\n", "\n", "**Last executed:** Dec 27, 2020 with MDAnalysis 1.0.0\n", "\n", "**Last updated:** Dec 2020\n", "\n", "**Minimum version of MDAnalysis:** 0.17.0\n", "\n", "**Packages required:**\n", " \n", "* MDAnalysis (Michaud-Agrawal *et al.*, 2011, Gowers *et al.*, 2016)\n", "* MDAnalysisTests\n", "\n", "
\n", " \n", "**Note**\n", "\n", "Please cite Coifman and Lafon, 2006 if you use the ``MDAnalysis.analysis.diffusionmap.DiffusionMap`` in published work.\n", "\n", "
\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2021-01-05T02:45:42.555038Z", "start_time": "2021-01-05T02:45:41.747445Z" } }, "outputs": [], "source": [ "import MDAnalysis as mda\n", "from MDAnalysis.tests.datafiles import PSF, DCD\n", "from MDAnalysis.analysis import diffusionmap\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading files\n", "\n", "The test files we will be working with here feature adenylate kinase (AdK), a phosophotransferase enzyme (Beckstein *et al.*, 2009). The trajectory ``DCD`` samples a transition from a closed to an open conformation." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2021-01-05T02:45:42.762521Z", "start_time": "2021-01-05T02:45:42.556396Z" } }, "outputs": [], "source": [ "u = mda.Universe(PSF, DCD)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Diffusion maps\n", "\n", "Diffusion maps are a non-linear dimensionality reduction technique that embeds the coordinates of each frame onto a lower-dimensional space, such that the distance between each frame in the lower-dimensional space represents their “diffusion distance”, or similarity. It integrates local information about the similarity of each point to its neighours, into a global geometry of the intrinsic manifold. This means that this technique is not suitable for trajectories where the transitions between conformational states are not well-sampled (e.g. replica exchange simulations), as the regions may become disconnected and a meaningful global geometry cannot be approximated. Unlike [principal component analysis](pca.ipynb), there is no explicit mapping between the components of the lower-dimensional space and the original atomic coordinates; no physical interpretation of the eigenvectors is immediately available. \n", "Please see Coifman and Lafon, 2006, Porte *et al.*, 2008, Rohrdanz *et al.*, 2011 and Ferguson *et al.*, 2011 for more information.\n", "\n", "The default distance metric implemented in MDAnalysis' DiffusionMap class is RMSD.\n", "\n", "
\n", " \n", "**Note**\n", "\n", "MDAnalysis implements RMSD calculation using the fast QCP algorithm (Theobald, 2005). Please cite Theobald, 2005 if you use the default distance metric in published work.\n", "\n", "
\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2021-01-05T02:45:52.118650Z", "start_time": "2021-01-05T02:45:42.765100Z" } }, "outputs": [ { "ename": "KeyboardInterrupt", "evalue": "", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mdmap\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdiffusionmap\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mDiffusionMap\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mu\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mselect\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'backbone'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mepsilon\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mdmap\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m~/anaconda3/envs/mda-user-guide/lib/python3.8/site-packages/MDAnalysis/analysis/diffusionmap.py\u001b[0m in \u001b[0;36mrun\u001b[0;34m(self, start, stop, step)\u001b[0m\n\u001b[1;32m 334\u001b[0m \u001b[0;31m# run only if distance matrix not already calculated\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 335\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_dist_matrix\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_calculated\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 336\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_dist_matrix\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstart\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mstart\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstop\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mstop\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstep\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mstep\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 337\u001b[0m \u001b[0;31m# important for transform function and length of .run() method\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 338\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_n_frames\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_dist_matrix\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mn_frames\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/anaconda3/envs/mda-user-guide/lib/python3.8/site-packages/MDAnalysis/analysis/base.py\u001b[0m in \u001b[0;36mrun\u001b[0;34m(self, start, stop, step, verbose)\u001b[0m\n\u001b[1;32m 195\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtimes\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mts\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtime\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 196\u001b[0m \u001b[0;31m# logger.info(\"--> Doing frame {} of {}\".format(i+1, self.n_frames))\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 197\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_single_frame\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 198\u001b[0m \u001b[0mlogger\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0minfo\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Finishing up\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 199\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_conclude\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/anaconda3/envs/mda-user-guide/lib/python3.8/site-packages/MDAnalysis/analysis/diffusionmap.py\u001b[0m in \u001b[0;36m_single_frame\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 257\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mj\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mts\u001b[0m \u001b[0;32min\u001b[0m \u001b[0menumerate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_u\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrajectory\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0miframe\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstop\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstep\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 258\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_ts\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mts\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 259\u001b[0;31m \u001b[0mj_ref\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0matoms\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpositions\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 260\u001b[0m \u001b[0mdist\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_metric\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mi_ref\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mj_ref\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mweights\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_weights\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 261\u001b[0m self.dist_matrix[self._frame_index, j+self._frame_index] = (\n", "\u001b[0;32m~/anaconda3/envs/mda-user-guide/lib/python3.8/site-packages/MDAnalysis/core/groups.py\u001b[0m in \u001b[0;36mpositions\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 2480\u001b[0m \u001b[0;34m:\u001b[0m\u001b[0mattr\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;31m`\u001b[0m\u001b[0;34m~\u001b[0m\u001b[0mMDAnalysis\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcoordinates\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbase\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mTimestep\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpositions\u001b[0m\u001b[0;31m`\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2481\u001b[0m \"\"\"\n\u001b[0;32m-> 2482\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0muniverse\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrajectory\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mts\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpositions\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mix\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2483\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2484\u001b[0m \u001b[0;34m@\u001b[0m\u001b[0mpositions\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msetter\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mKeyboardInterrupt\u001b[0m: " ] } ], "source": [ "dmap = diffusionmap.DiffusionMap(u, select='backbone', epsilon=2)\n", "dmap.run()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first eigenvector in a diffusion map is always essentially all ones (when divided by a constant):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2021-01-05T02:45:52.121585Z", "start_time": "2021-01-05T02:45:41.752Z" } }, "outputs": [], "source": [ "dmap._eigenvectors[:, 0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Therefore, when we embed the trajectory onto the dominant eigenvectors, we ignore the first eigenvector. In order to determine which vectors are dominant, we can examine the eigenvalues for a **spectral gap**: where the eigenvalues stop decreasing constantly in value." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2021-01-05T02:45:52.122785Z", "start_time": "2021-01-05T02:45:41.753Z" }, "scrolled": true }, "outputs": [], "source": [ "fig, ax = plt.subplots()\n", "ax.plot(dmap.eigenvalues[1:16])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From this plot, we take the first *k* dominant eigenvectors to be the first five. Below, we transform the trajectory onto these eigenvectors. The ``time`` argument is the exponent that the eigenvalues are raised to for embedding. As values increase for ``time``, more dominant eigenvectors (with lower eigenvalues) dominate the diffusion distance more. The ``transform`` method returns an array of shape (# frames, # eigenvectors)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2021-01-05T02:45:52.124022Z", "start_time": "2021-01-05T02:45:41.755Z" } }, "outputs": [], "source": [ "transformed = dmap.transform(5, # number of eigenvectors\n", " time=1)\n", "transformed.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For easier analysis and plotting we can turn the array into a DataFrame." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2021-01-05T02:45:52.125160Z", "start_time": "2021-01-05T02:45:41.758Z" } }, "outputs": [], "source": [ "df = pd.DataFrame(transformed, \n", " columns=['Mode{}'.format(i+2) for i in range(5)])\n", "df['Time (ps)'] = df.index * u.trajectory.dt\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are several ways we can visualise the data. Using the Seaborn's `PairGrid` tool is the quickest and easiest way, if you have seaborn already installed. Each of the subplots below illustrates axes of the lower-dimensional embedding of the higher-dimensional data, such that dots (frames) that are close are kinetically close (connected by a large number of short pathways), whereas greater distance indicates states that are connected by a smaller number of long pathways. Please see Ferguson *et al.*, 2011 for more information.\n", "\n", "
\n", " \n", "**Note**\n", "\n", "You will need to install the data visualisation library [Seaborn](https://seaborn.pydata.org/installing.html) for this function.\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2021-01-05T02:45:52.126187Z", "start_time": "2021-01-05T02:45:41.759Z" } }, "outputs": [], "source": [ "import seaborn as sns\n", "\n", "g = sns.PairGrid(df, hue='Time (ps)', \n", " palette=sns.color_palette('Oranges_d',\n", " n_colors=len(df)))\n", "g.map(plt.scatter, marker='.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "\n", "[1] Oliver Beckstein, Elizabeth J. Denning, Juan R. Perilla, and Thomas B. Woolf.\n", "Zipping and Unzipping of Adenylate Kinase: Atomistic Insights into the Ensemble of OpenClosed Transitions.\n", "Journal of Molecular Biology, 394(1):160–176, November 2009.\n", "00107.\n", "URL: https://linkinghub.elsevier.com/retrieve/pii/S0022283609011164, doi:10.1016/j.jmb.2009.09.009.\n", "\n", "[2] Ronald R. Coifman and Stéphane Lafon.\n", "Diffusion maps.\n", "Applied and Computational Harmonic Analysis, 21(1):5–30, July 2006.\n", "02271.\n", "doi:10.1016/j.acha.2006.04.006.\n", "\n", "[3] J. de la Porte, B. M. Herbst, W. Hereman, and S. J. van der Walt.\n", "An introduction to diffusion maps.\n", "In In The 19th Symposium of the Pattern Recognition Association of South Africa. 2008.\n", "00038.\n", "\n", "[4] Andrew Ferguson, Athanassios Z. Panagiotopoulos, Ioannis G. Kevrekidis, and Pablo G. Debenedetti.\n", "Nonlinear dimensionality reduction in molecular simulation: The diffusion map approach.\n", "Chemical Physics Letters, 509(1-3):1–11, June 2011.\n", "00085.\n", "doi:10.1016/j.cplett.2011.04.066.\n", "\n", "[5] Richard J. Gowers, Max Linke, Jonathan Barnoud, Tyler J. E. Reddy, Manuel N. Melo, Sean L. Seyler, Jan Domański, David L. Dotson, Sébastien Buchoux, Ian M. Kenney, and Oliver Beckstein.\n", "MDAnalysis: A Python Package for the Rapid Analysis of Molecular Dynamics Simulations.\n", "Proceedings of the 15th Python in Science Conference, pages 98–105, 2016.\n", "00152.\n", "URL: https://conference.scipy.org/proceedings/scipy2016/oliver_beckstein.html, doi:10.25080/Majora-629e541a-00e.\n", "\n", "[6] Naveen Michaud-Agrawal, Elizabeth J. Denning, Thomas B. Woolf, and Oliver Beckstein.\n", "MDAnalysis: A toolkit for the analysis of molecular dynamics simulations.\n", "Journal of Computational Chemistry, 32(10):2319–2327, July 2011.\n", "00778.\n", "URL: http://doi.wiley.com/10.1002/jcc.21787, doi:10.1002/jcc.21787.\n", "\n", "[7] Mary A. Rohrdanz, Wenwei Zheng, Mauro Maggioni, and Cecilia Clementi.\n", "Determination of reaction coordinates via locally scaled diffusion map.\n", "The Journal of Chemical Physics, 134(12):124116, March 2011.\n", "00220.\n", "doi:10.1063/1.3569857.\n", "\n", "[8] Douglas L. Theobald.\n", "Rapid calculation of RMSDs using a quaternion-based characteristic polynomial.\n", "Acta Crystallographica Section A Foundations of Crystallography, 61(4):478–480, July 2005.\n", "00127.\n", "URL: http://scripts.iucr.org/cgi-bin/paper?S0108767305015266, doi:10.1107/S0108767305015266." ] } ], "metadata": { "kernelspec": { "display_name": "Python (mda-user-guide)", "language": "python", "name": "mda-user-guide" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }