{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Principal component analysis of a trajectory\n", "\n", "Here we compute the principal component analysis of a trajectory.\n", "\n", "**Last updated:** December 2022 with MDAnalysis 2.4.0-dev0\n", "\n", "**Minimum version of MDAnalysis:** 1.0.0\n", "\n", "**Packages required:**\n", " \n", "* MDAnalysis (Michaud-Agrawal *et al.*, 2011, Gowers *et al.*, 2016)\n", "* MDAnalysisTests\n", "\n", "**Optional packages for visualisation:**\n", "\n", "* [nglview](http://nglviewer.org/nglview/latest/api.html)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2020-12-27T02:19:58.481105Z", "start_time": "2020-12-27T02:19:57.039201Z" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "d8483e7f3a8b404ca65c640dc0c71b68", "version_major": 2, "version_minor": 0 }, "text/plain": [] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "import MDAnalysis as mda\n", "from MDAnalysis.tests.datafiles import PSF, DCD\n", "from MDAnalysis.analysis import pca, align\n", "import nglview as nv;\n", "\n", "import warnings\n", "# suppress some MDAnalysis warnings about writing PDB files\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading files\n", "\n", "The test files we will be working with here feature adenylate kinase (AdK), a phosphotransferase enzyme (Beckstein *et al.*, 2009). The trajectory ``DCD`` samples a transition from a closed to an open conformation."
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2020-12-27T02:19:58.572962Z", "start_time": "2020-12-27T02:19:58.483786Z" } }, "outputs": [], "source": [ "u = mda.Universe(PSF, DCD)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Principal component analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Principal component analysis is a common linear dimensionality reduction technique that maps the coordinates in each frame of your trajectory to a linear combination of orthogonal vectors. The vectors are called **principal components**, and they are ordered such that the first principal component accounts for the most variance in the original data (i.e. the largest uncorrelated motion in your trajectory), and each successive component accounts for less and less variance. The frame-by-frame conformational fluctuation can be considered a linear combination of the essential dynamics yielded by the PCA. Please see Amadei *et al.*, 1993, Jolliffe, 2002, Sittel *et al.*, 2014, or Sittel and Stock, 2018 for a more in-depth introduction to PCA.\n", "\n", "Trajectory coordinates can be transformed onto a lower-dimensional space (*essential subspace*) constructed from these principal components in order to compare conformations. Projecting the trajectory onto a single principal component lets you visualise the motion described by that component.\n", "\n", "In MDAnalysis, the method implemented in the PCA class ([API docs](https://docs.mdanalysis.org/stable/documentation_pages/analysis/pca.html)) is as follows:\n", "\n", "1. Optionally align each frame in your trajectory to the first frame.\n", "2. Construct a 3N x 3N covariance matrix for the N atoms in your trajectory. Optionally, you can provide a mean structure; otherwise the covariance is computed relative to the structure averaged over the trajectory.\n", "3. Diagonalise the covariance matrix. The eigenvectors are the principal components, and their eigenvalues are the associated variances.\n", "4. 
Sort the eigenvalues so that the principal components are ordered by variance.\n", "\n", "
\n", "| | PC1 | PC2 | PC3 | Time (ps) |\n", "|---|---|---|---|---|\n",
"| 0 | 118.408413 | 29.088241 | 15.746624 | 0.0 |\n",
"| 1 | 115.561879 | 26.786797 | 14.652498 | 1.0 |\n",
"| 2 | 112.675616 | 25.038766 | 12.920274 | 2.0 |\n",
"| 3 | 110.341467 | 24.306984 | 11.427098 | 3.0 |\n",
"| 4 | 107.584302 | 23.464154 | 11.612104 | 4.0 |\n", "