Analysis

The analysis module of MDAnalysis provides the tools needed to analyse your data. Several analyses are included with the package. These range from standard algorithms (e.g. calculating root mean squared quantities) to unique algorithms such as the path similarity analysis.

Generally these bundled analyses are contributed by various researchers who use the code for their own work. Please refer to the individual module documentation or relevant user guide tutorials for additional references and citation information.

If you need functionality that is not already provided in MDAnalysis, there are several ways to write your own analysis.

If you want to run your own script in parallel in MDAnalysis, here is a tutorial on make your code parallelizable.

Imports and dependencies

Analysis modules are not imported by default. In order to use them, you will need to import them separately, e.g.:

from MDAnalysis.analysis import align

Note

Several modules in MDAnalysis.analysis require additional Python packages. For example, encore makes use of scikit-learn. The Python packages are not automatically installed with pip, although they are with conda.

Other modules require external programs. For example, hole requires the HOLE programs. You will need to install these yourself.

Alignments and RMS fitting

The MDAnalysis.analysis.align and MDAnalysis.analysis.rms modules contain the functions used for aligning structures, aligning trajectories, and calculating root mean squared quantities.

Demonstrations of alignment are in align_structure, align_trajectory_first, and align_trajectory. Another example of generating an average structure from an alignment is demonstrated in rmsf. Typically, trajectories need to be aligned for RMSD and RMSF values to make sense.

Note

These modules use the fast QCP algorithm to calculate the root mean square distance (RMSD) between two coordinate sets [The05] and the rotation matrix R that minimizes the RMSD [LAT09]. Please cite these references when using these modules.

Distances and contacts

The MDAnalysis.analysis.distances module provides functions to rapidly compute distances. These largely take in coordinate arrays.

Residues can be determined to be in contact if atoms from the two residues are within a certain distance. Analysing the fraction of contacts retained by a protein over at trajectory, as compared to the number of contacts in a reference frame or structure, can give insight into folding processes and domain movements.

MDAnalysis.analysis.contacts contains functions and a class to analyse the fraction of native contacts over a trajectory.

Trajectory similarity

A molecular dynamics trajectory with \(N\) atoms can be considered through a path through \(3N\)-dimensional molecular configuration space. MDAnalysis contains a number of algorithms to compare the conformational ensembles of different trajectories. Most of these are in the MDAnalysis.analysis.encore module ([TPB+15]) and compare estimated probability distributions to measure similarity. The path similarity analysis compares the RMSD between pairs of structures in conformation transition paths. MDAnalysis.analysis.encore also contains functions for evaluating the conformational convergence of a trajectory using the similarity over conformation clusters or similarity in a reduced dimensional space.

Dimension reduction

A molecular dynamics trajectory with \(N\) atoms can be considered a path through \(3N\)-dimensional molecular configuration space. It remains difficult to extract important dynamics or compare trajectory similarity from such a high-dimensional space. However, collective motions and physically relevant states can often be effectively described with low-dimensional representations of the conformational space explored over the trajectory. MDAnalysis implements two methods for dimensionality reduction.

Principal component analysis is a common linear dimensionality reduction technique that maps the coordinates in each frame of your trajectory to a linear combination of orthogonal vectors. The vectors are called principal components, and they are ordered such that the first principal component accounts for the most variance in the original data (i.e. the largest uncorrelated motion in your trajectory), and each successive component accounts for less and less variance. Trajectory coordinates can be transformed onto a lower-dimensional space (essential subspace) constructed from these principal components in order to compare conformations. Your trajectory can also be projected onto each principal component in order to visualise the motion described by that component.

Diffusion maps are a non-linear dimensionality reduction technique that embeds the coordinates of each frame onto a lower-dimensional space, such that the distance between each frame in the lower-dimensional space represents their “diffusion distance”, or similarity. It integrates local information about the similarity of each point to its neighours, into a global geometry of the intrinsic manifold. This means that this technique is not suitable for trajectories where the transitions between conformational states is not well-sampled (e.g. replica exchange simulations), as the regions may become disconnected and a meaningful global geometry cannot be approximated. Unlike PCA, there is no explicit mapping between the components of the lower-dimensional space and the original atomic coordinates; no physical interpretation of the eigenvectors is immediately available.

For computing similarity, see the tutorials in Trajectory similarity.

Polymers and membranes

MDAnalysis has several analyses specifically for polymers, membranes, and membrane proteins.

Hydrogen Bond Analysis

The MDAnalysis.analysis.hydrogen_bonds module provides methods to find and analyse hydrogen bonds in a Universe.