Analysis

The analysis module of MDAnalysis provides the tools needed to analyse your data. Several analyses are included with the package. These range from standard algorithms (e.g. calculating root mean squared quantities) to unique algorithms such as the path similarity analysis.

Generally these bundled analyses are contributed by various researchers who use the code for their own work. Please refer to the individual module documentation or relevant user guide tutorials for additional references and citation information.

If you need functionality that is not already provided in MDAnalysis, there are several ways to write your own analysis.

If you want to run your own script in parallel with MDAnalysis, see the tutorial on making your code parallelizable.

Imports and dependencies

Analysis modules are not imported by default. In order to use them, you will need to import them separately, e.g.:

from MDAnalysis.analysis import align

Note

Several modules in MDAnalysis.analysis require additional Python packages. For example, encore makes use of scikit-learn. These packages are not installed automatically when MDAnalysis is installed with pip, although they are when it is installed with conda.

Other modules require external programs. For example, hole requires the HOLE programs. You will need to install these yourself.
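Before importing a module with extra requirements, it can help to check what is present in the current environment. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, `sklearn` is the import name of scikit-learn, and the executable name `hole` is an assumption that may differ on your system.

```python
import importlib.util
import shutil


def python_package_available(import_name):
    """Return True if a top-level Python package can be found for import."""
    return importlib.util.find_spec(import_name) is not None


def external_program_available(executable):
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(executable) is not None


# scikit-learn is imported as "sklearn"; "hole" is an assumed executable name.
print("scikit-learn installed:", python_package_available("sklearn"))
print("HOLE on PATH:", external_program_available("hole"))
```

Pass only top-level package names to `python_package_available`; for dotted names, `find_spec` imports the parent package first and raises an error if the parent is missing.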

Alignments and RMS fitting

The MDAnalysis.analysis.align and MDAnalysis.analysis.rms modules contain the functions used for aligning structures, aligning trajectories, and calculating root mean squared quantities.

Alignment is demonstrated in align_structure, align_trajectory_first, and align_trajectory. Generating an average structure from an alignment is demonstrated in rmsf. Typically, trajectories need to be aligned for RMSD and RMSF values to make sense.
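To see why alignment matters, consider a structure that has only drifted through space without changing shape. The sketch below is plain Python, not the MDAnalysis implementation: it removes translation by centering both coordinate sets and does not perform the rotational fit that the align and rms modules also carry out. All function names and coordinates are made up for illustration.

```python
import math


def centered(coords):
    """Return coordinates shifted so that their centroid is at the origin."""
    n = len(coords)
    cx, cy, cz = (sum(c[i] for c in coords) / n for i in range(3))
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def rmsd(a, b):
    """Root mean square deviation between two equally sized coordinate sets."""
    if len(a) != len(b):
        raise ValueError("coordinate sets must contain the same number of atoms")
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


# Hypothetical three-atom structure and a rigidly translated copy of it.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
mobile = [(x + 3.0, y + 4.0, z) for x, y, z in reference]

raw_rmsd = rmsd(mobile, reference)                          # dominated by the drift
fitted_rmsd = rmsd(centered(mobile), centered(reference))   # drift removed

print(f"without centering: {raw_rmsd:.3f}")
print(f"after centering:   {fitted_rmsd:.3f}")
```

The unaligned value reflects only the rigid displacement, while the centered value shows that the two structures are identical in shape.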

Note

These modules use the fast QCP algorithm to calculate the root mean square distance (RMSD) between two coordinate sets [The05] and the rotation matrix R that minimizes the RMSD [LAT09]. Please cite these references when using these modules.

Distances and contacts

The MDAnalysis.analysis.distances module provides functions to rapidly compute distances. Most of them take coordinate arrays as input.

Two residues can be considered to be in contact if atoms from each are within a certain distance of one another. Analysing the fraction of contacts retained by a protein over a trajectory, as compared to the number of contacts in a reference frame or structure, can give insight into folding processes and domain movements.

MDAnalysis.analysis.contacts contains functions and a class to analyse the fraction of native contacts over a trajectory.
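The idea of a fraction of native contacts can be sketched in a few lines of plain Python. This is an illustration under simple assumptions (a hard distance cutoff, one point per residue, made-up coordinates and an arbitrary cutoff of 4.5), not the code used by MDAnalysis.analysis.contacts; the function names are hypothetical.

```python
import math


def contact_pairs(group_a, group_b, cutoff):
    """Return index pairs (i, j) whose points lie within the cutoff distance."""
    return {
        (i, j)
        for i, a in enumerate(group_a)
        for j, b in enumerate(group_b)
        if math.dist(a, b) <= cutoff
    }


def fraction_native(group_a, group_b, native, cutoff):
    """Fraction of the reference (native) contacts still present in a frame."""
    if not native:
        raise ValueError("the reference structure has no contacts")
    current = contact_pairs(group_a, group_b, cutoff)
    return len(current & native) / len(native)


cutoff = 4.5  # assumed cutoff, in the same length unit as the coordinates

# Reference structure: two points in each group, forming two contacts.
ref_a = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
ref_b = [(0.0, 0.0, 3.0), (5.0, 0.0, 3.0)]
native = contact_pairs(ref_a, ref_b, cutoff)

# Later frame: the second point of group b has moved away.
frame_b = [(0.0, 0.0, 3.0), (5.0, 0.0, 8.0)]
q = fraction_native(ref_a, frame_b, native, cutoff)

print(f"native contacts: {sorted(native)}")
print(f"fraction retained: {q}")
```

Only contacts that exist in the reference are counted, so new contacts formed later in the trajectory do not raise the fraction.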

Trajectory similarity

A molecular dynamics trajectory with \(N\) atoms can be considered a path through \(3N\)-dimensional molecular configuration space. MDAnalysis contains a number of algorithms to compare the conformational ensembles of different trajectories. Most of these are in the MDAnalysis.analysis.encore module ([TPB+15]) and compare estimated probability distributions to measure similarity. The path similarity analysis compares the RMSD between pairs of structures in conformation transition paths. MDAnalysis.analysis.encore also contains functions for evaluating the conformational convergence of a trajectory using the similarity over conformation clusters or similarity in a reduced dimensional space.
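One way to turn pairwise RMSD values between the structures of two paths into a single number is the Hausdorff distance: the largest RMSD from any frame on one path to its closest frame on the other. The sketch below is a plain Python illustration of that idea with made-up two-atom paths; it performs no alignment and is not the MDAnalysis path similarity implementation.

```python
import math


def rmsd(frame_a, frame_b):
    """Root mean square deviation between two frames with the same atoms."""
    squared = sum(
        (p - q) ** 2 for u, v in zip(frame_a, frame_b) for p, q in zip(u, v)
    )
    return math.sqrt(squared / len(frame_a))


def directed_distance(path_p, path_q):
    """Largest RMSD from a frame in path_p to its nearest frame in path_q."""
    return max(min(rmsd(a, b) for b in path_q) for a in path_p)


def hausdorff(path_p, path_q):
    """Symmetric Hausdorff distance between two paths, using RMSD as metric."""
    return max(directed_distance(path_p, path_q), directed_distance(path_q, path_p))


# Hypothetical paths: two atoms sliding along x, three frames each.
path_p = [[(float(t), 0.0, 0.0), (float(t), 0.0, 1.0)] for t in range(3)]
# Same motion, but every atom is offset by 1.0 along y.
path_q = [[(x, y + 1.0, z) for x, y, z in frame] for frame in path_p]

d_pq = hausdorff(path_p, path_q)
print(f"Hausdorff distance between the paths: {d_pq:.3f}")
```

Because the measure compares each frame with its nearest neighbour on the other path, it ignores the order and timing with which the frames are visited.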

Dimension reduction

A molecular dynamics trajectory with \(N\) atoms can be considered a path through \(3N\)-dimensional molecular configuration space. It remains difficult to extract important dynamics or compare trajectory similarity from such a high-dimensional space. However, collective motions and physically relevant states can often be effectively described with low-dimensional representations of the conformational space explored over the trajectory. MDAnalysis implements two methods for dimensionality reduction.
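Principal component analysis is one widely used linear technique of this kind: it finds the directions in configuration space along which the coordinates vary most. The following is a minimal sketch in plain Python that estimates the first principal component by power iteration on the covariance matrix. The data are made up (a single atom, so \(3N = 3\)), the function names are hypothetical, and production code would use an optimised eigensolver on aligned coordinates.

```python
import math


def covariance(samples):
    """Sample covariance matrix of flattened frames (one row per frame)."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[k] for s in samples) / n for k in range(d)]
    dev = [[s[k] - mean[k] for k in range(d)] for s in samples]
    return [
        [sum(r[i] * r[j] for r in dev) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(matrix, iterations=200):
    """Approximate the dominant eigenvector of a matrix by power iteration."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d  # arbitrary normalised starting vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical frames: large motion along x, small jitter along y, none in z.
frames = [
    (-2.0, 0.1, 0.0),
    (-1.0, -0.1, 0.0),
    (0.0, 0.1, 0.0),
    (1.0, -0.1, 0.0),
    (2.0, 0.1, 0.0),
]

pc1 = first_component(covariance(frames))
print("first principal component:", [round(x, 3) for x in pc1])
```

For these frames the first component points almost entirely along x, the direction of largest variance, so a single coordinate captures most of the motion.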

For computing similarity, see the tutorials in Trajectory similarity.

Polymers and membranes

MDAnalysis has several analyses specifically for polymers, membranes, and membrane proteins.