{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Calculating the Harmonic Ensemble Similarity between ensembles\n", "\n", "Here we compare the conformational ensembles of proteins in four trajectories, using the harmonic ensemble similarity method.\n", "\n", "**Last updated:** December 2022 with MDAnalysis 2.4.0-dev0\n", "\n", "**Minimum version of MDAnalysis:** 1.0.0\n", "\n", "**Packages required:**\n", " \n", "* MDAnalysis (Michaud-Agrawal *et al.*, 2011, Gowers *et al.*, 2016)\n", "* MDAnalysisTests\n", "\n", "\n", "**Optional packages for visualisation:**\n", "\n", "* [matplotlib](https://matplotlib.org)\n", "\n", "\n", "
\n", " \n", "**Note**\n", "\n", "The metrics and methods in the `encore` module are described in Tiberti *et al.* (2015). Please cite that paper when using the `MDAnalysis.analysis.encore` module in published work.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2020-09-25T05:47:24.628167Z", "start_time": "2020-09-25T05:47:22.580836Z" }, "execution": { "iopub.execute_input": "2021-05-19T06:01:11.019242Z", "iopub.status.busy": "2021-05-19T06:01:11.018634Z", "iopub.status.idle": "2021-05-19T06:01:12.662416Z", "shell.execute_reply": "2021-05-19T06:01:12.662787Z" } }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "import MDAnalysis as mda\n", "from MDAnalysis.tests.datafiles import (PSF, DCD, DCD2, GRO, XTC,\n", "                                        PSF_NAMD_GBIS, DCD_NAMD_GBIS,\n", "                                        PDB_small, CRD)\n", "from MDAnalysis.analysis import encore" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The test files we will be working with here feature adenylate kinase (AdK), a phosphotransferase enzyme (Beckstein *et al.*, 2009)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2020-09-25T05:47:34.271010Z", "start_time": "2020-09-25T05:47:33.722128Z" }, "execution": { "iopub.execute_input": "2021-05-19T06:01:12.666701Z", "iopub.status.busy": "2021-05-19T06:01:12.666185Z", "iopub.status.idle": "2021-05-19T06:01:13.384417Z", "shell.execute_reply": "2021-05-19T06:01:13.384865Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/pbarletta/mambaforge/envs/mda-user-guide/lib/python3.9/site-packages/MDAnalysis/coordinates/DCD.py:165: DeprecationWarning: DCDReader currently makes independent timesteps by copying self.ts while other readers update self.ts inplace. 
This behaviour will be changed in 3.0 to be the same as other readers\n", " warnings.warn(\"DCDReader currently makes independent timesteps\"\n" ] } ], "source": [ "u1 = mda.Universe(PSF, DCD)\n", "u2 = mda.Universe(PSF, DCD2)\n", "u3 = mda.Universe(GRO, XTC)\n", "u4 = mda.Universe(PSF_NAMD_GBIS, DCD_NAMD_GBIS)\n", "\n", "labels = ['DCD', 'DCD2', 'XTC', 'NAMD']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The trajectories can have different lengths, as seen below." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2020-09-25T05:47:35.055174Z", "start_time": "2020-09-25T05:47:35.049296Z" }, "execution": { "iopub.execute_input": "2021-05-19T06:01:13.388558Z", "iopub.status.busy": "2021-05-19T06:01:13.388056Z", "iopub.status.idle": "2021-05-19T06:01:13.390062Z", "shell.execute_reply": "2021-05-19T06:01:13.390462Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "98 102 10\n" ] } ], "source": [ "print(len(u1.trajectory), len(u2.trajectory), len(u3.trajectory))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculating harmonic similarity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The harmonic ensemble similarity method treats the conformational ensemble within each trajectory as a high-dimensional Gaussian distribution $N(\\mu, \\Sigma)$. The mean $\\mu$ is estimated as the average over the ensemble. The covariance matrix $\\Sigma$ is calculated either using a shrinkage estimator (`cov_estimator='shrinkage'`) or a maximum-likelihood method (`cov_estimator='ml'`).\n", "\n", "The harmonic ensemble similarity is then calculated using the symmetrised version of the Kullback-Leibler divergence. 
This has no upper bound, so very different ensembles can give very large values.\n", "\n", "The function we will use is `encore.hes` ([API docs here](https://docs.mdanalysis.org/stable/documentation_pages/analysis/encore/similarity.html#MDAnalysis.analysis.encore.similarity.hes)). It is recommended that you align your trajectories before computing the harmonic similarity. You can either do this yourself with `align.AlignTraj`, or pass `align=True` into `encore.hes`. The latter option aligns each of your Universes to the current timestep of the first Universe. Note that `encore.hes` loads your trajectories into memory, so this alignment changes the positions stored in your Universes." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2020-09-25T05:48:57.293061Z", "start_time": "2020-09-25T05:47:36.853297Z" }, "execution": { "iopub.execute_input": "2021-05-19T06:01:13.393984Z", "iopub.status.busy": "2021-05-19T06:01:13.393436Z", "iopub.status.idle": "2021-05-19T06:03:04.423398Z", "shell.execute_reply": "2021-05-19T06:03:04.424122Z" } }, "outputs": [], "source": [ "hes, details = encore.hes([u1, u2, u3, u4],\n", "                          select='backbone',\n", "                          align=True,\n", "                          cov_estimator='shrinkage',\n", "                          weights='mass')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2020-09-25T05:49:00.760513Z", "start_time": "2020-09-25T05:49:00.753313Z" }, "execution": { "iopub.execute_input": "2021-05-19T06:03:04.433767Z", "iopub.status.busy": "2021-05-19T06:03:04.432850Z", "iopub.status.idle": "2021-05-19T06:03:04.436211Z", "shell.execute_reply": "2021-05-19T06:03:04.436792Z" }, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "       0.0    24955.7  1879874.5   145622.3 \n", "   24955.7        0.0  1659867.5   161102.3 \n", " 1879874.5  1659867.5        0.0  9900092.7 \n", "  145622.3   161102.3  9900092.7        0.0 \n" ] } ], "source": [ "for row in hes:\n", "    for h in row:\n", "        print(\"{:>10.1f}\".format(h), end=' ')\n", "    print(\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The mean and covariance matrix for each Universe are saved in `details`." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2020-09-25T05:49:19.580520Z", "start_time": "2020-09-25T05:49:19.575275Z" }, "execution": { "iopub.execute_input": "2021-05-19T06:03:04.443752Z", "iopub.status.busy": "2021-05-19T06:03:04.442721Z", "iopub.status.idle": "2021-05-19T06:03:04.446973Z", "shell.execute_reply": "2021-05-19T06:03:04.447746Z" } }, "outputs": [ { "data": { "text/plain": [ "(2565,)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "details[\"ensemble1_mean\"].shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2020-09-25T05:49:23.911436Z", "start_time": "2020-09-25T05:49:23.743708Z" }, "execution": { "iopub.execute_input": "2021-05-19T06:03:04.462349Z", "iopub.status.busy": "2021-05-19T06:03:04.459404Z", "iopub.status.idle": "2021-05-19T06:03:04.726436Z", "shell.execute_reply": "2021-05-19T06:03:04.728025Z" } }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots()\n", "im = plt.imshow(hes)\n", "plt.xticks(np.arange(4), labels)\n", "plt.yticks(np.arange(4), labels)\n", "plt.title('Harmonic ensemble similarity')\n", "cbar = fig.colorbar(im)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[1] R. J. Gowers, M. Linke, J. Barnoud, T. J. E. Reddy, M. N. Melo, S. L. Seyler, D. L. Dotson, J. Domanski, S. Buchoux, I. M. Kenney, and O. Beckstein. [MDAnalysis: A Python package for the rapid analysis of molecular dynamics simulations](http://conference.scipy.org/proceedings/scipy2016/oliver_beckstein.html). In S. Benthall and S. Rostrup, editors, *Proceedings of the 15th Python in Science Conference*, pages 98-105, Austin, TX, 2016. SciPy, doi: [10.25080/majora-629e541a-00e](https://doi.org/10.25080/majora-629e541a-00e).\n", "\n", "[2] N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein. MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations. *J. Comput. Chem*. 32 (2011), 2319-2327, [doi:10.1002/jcc.21787](https://dx.doi.org/10.1002/jcc.21787). PMCID:[PMC3144279](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3144279/)\n", "\n", "[3] ENCORE: Software for Quantitative Ensemble Comparison. Matteo Tiberti, Elena Papaleo, Tone Bengtsen, Wouter Boomsma, Kresten Lindorff-Larsen. *PLoS Comput Biol.* 2015, 11, e1004415. [doi:10.1371/journal.pcbi.1004415](https://doi.org/10.1371/journal.pcbi.1004415)\n", "\n", "[4] Beckstein O, Denning EJ, Perilla JR, Woolf TB. Zipping and unzipping of adenylate kinase: atomistic insights into the ensemble of open<-->closed transitions. *J Mol Biol*. 2009;394(1):160–176. [doi:10.1016/j.jmb.2009.09.009](https://dx.doi.org/10.1016%2Fj.jmb.2009.09.009)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.9.15 ('mda-user-guide')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.15" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false }, "vscode": { "interpreter": { "hash": "7b52aa17ef4e9358c0e91ff3f0bf50d10667bb8b55636d4b9fb967a2ff94bd8c" } } }, "nbformat": 4, "nbformat_minor": 2 }