{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Principal component analysis of a trajectory\n", "\n", "Here we perform a principal component analysis of a trajectory.\n", "\n", "**Last updated:** February 2020\n", "\n", "**Minimum version of MDAnalysis:** 0.17.0\n", "\n", "**Packages required:**\n", " \n", "* MDAnalysis (Michaud-Agrawal *et al.*, 2011, Gowers *et al.*, 2016)\n", "* MDAnalysisTests\n", "\n", "**Optional packages for visualisation:**\n", "\n", "* [nglview](http://nglviewer.org/nglview/latest/api.html)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "ddaddaff9ef74f7984332974bbe6d79a", "version_major": 2, "version_minor": 0 }, "text/plain": [ "_ColormakerRegistry()" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import MDAnalysis as mda\n", "from MDAnalysis.tests.datafiles import PSF, DCD\n", "from MDAnalysis.analysis import pca, align\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "import nglview as nv\n", "import warnings\n", "# suppress all warnings, including MDAnalysis warnings about writing PDB files\n", "warnings.filterwarnings('ignore')\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading files\n", "\n", "The test files we will be working with here feature adenylate kinase (AdK), a phosphotransferase enzyme (Beckstein *et al.*, 2009). The trajectory ``DCD`` samples a transition from a closed to an open conformation." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "u = mda.Universe(PSF, DCD)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Principal component analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Principal component analysis (PCA) is a statistical technique that decomposes a system of observations into linearly uncorrelated variables called **principal components**. These components are ordered so that the first principal component accounts for the largest variance in the data, and each subsequent component accounts for progressively less variance. PCA is often applied to molecular dynamics trajectories to extract the large-scale conformational motions or \"essential dynamics\" of a protein. The frame-by-frame conformational fluctuation can be considered a linear combination of the essential dynamics yielded by the PCA.\n", "\n", "In MDAnalysis, the method implemented in the PCA class ([API docs](https://docs.mdanalysis.org/stable/documentation_pages/analysis/pca.html)) is as follows:\n", "\n", "1. Optionally align each frame in your trajectory to the first frame.\n", "2. Construct a 3N x 3N covariance matrix for the N atoms in your trajectory. Optionally, you can provide a mean structure; otherwise the covariance is computed relative to the average structure over the trajectory.\n", "3. Diagonalise the covariance matrix. The eigenvectors are the principal components, and their eigenvalues are the associated variances.\n", "4. Sort the eigenvalues in descending order so that the principal components are ordered by decreasing variance.\n", "\n", "
\n", "|  | PC1 | PC2 | PC3 | PC4 | PC5 | Time (ps) |\n",
"|---|---|---|---|---|---|---|\n",
"| 0 | 118.408403 | 29.088239 | 15.746624 | -8.344273 | -1.778052 | 0.0 |\n",
"| 1 | 115.561875 | 26.786799 | 14.652497 | -6.621619 | -0.629777 | 1.0 |\n",
"| 2 | 112.675611 | 25.038767 | 12.920275 | -4.324424 | -0.160540 | 2.0 |\n",
"| 3 | 110.341464 | 24.306985 | 11.427097 | -3.891525 | -0.173275 | 3.0 |\n",
"| 4 | 107.584299 | 23.464155 | 11.612103 | -2.293161 | -1.500821 | 4.0 |\n", "