Analysing pore dimensions with HOLE2

Here we use HOLE to analyse pore dimensions in a membrane.

Last updated: December 2022 with MDAnalysis 2.4.0-dev0

Minimum version of MDAnalysis: 1.0.0

Packages required:

Note

The classes in MDAnalysis.analysis.hole2 are wrappers around the HOLE program. Please cite ([SGW93], [SNW+96]) when using this module in published work.

[1]:
import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PDB_HOLE
from MDAnalysis.analysis import hole2
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

import warnings
# suppress some MDAnalysis warnings when writing PDB files
warnings.filterwarnings('ignore')

Background

The MDAnalysis.analysis.hole2 module (API docs) provides wrapper classes and functions that call the HOLE program. This means you must have installed the program yourself before you can use the class. You then have 2 options, you either pass the path to your hole executable to the class; in this example, hole is installed at ~/hole2/exe/hole. Or, set your binary path variable ($PATH in Unix systems) to point to the executable’s folder so you don’t have to point to the binary explicitly every time you call hole or any of its helper tools. This is what we have done here, so we don’t have to set the executable argument.

HOLE defines a series of points throughout the pore from which a sphere can be generated that does not overlap any atom (as defined by its van der Waals radius). (Please see ([SGW93], [SNW+96]) for a complete explanation). By default, it ignores residues with the following names: “SOL”, “WAT”, “TIP”, “HOH”, “K “, “NA “, “CL “. You can change these with the ignore_residues keyword. Note that the residue names must have 3 characters. Wildcards do not work.

This tutorial first demonstrates how to use the MDAnalysis.analysis.hole2.hole function similarly to the HOLE binary on a PDB file. We then demonstrate how to use the MDAnalysis.analysis.hole2.HoleAnalysis class on a trajectory of data. You may prefer to use the more fully-featured HoleAnalysis class for the extra functionality we provide, such as creating an animation in VMD of the pore.

Using HOLE with a PDB file

The hole function allows you to specify points to begin searching at (cpoint) and a search direction (cvect), the sampling resolution (sample), and more. Please see the documentation for full details.

The PDB file here is the experimental structure of the Gramicidin A channel. Note that we pass HOLE a PDB file directly, without creating a MDAnalysis.Universe.

We are setting a random_seed here so that the results in the tutorial can be reproducible. This is normally not advised.

[2]:
profiles = hole2.hole(PDB_HOLE,
                      outfile='hole1.out',
                      sphpdb_file='hole1.sph',
                      vdwradii_file=None,
                      random_seed=31415,
                      #   executable='~/hole2/exe/hole',
                      )

outfile and sphpdb_file are the names of the files that HOLE will write out. vdwradii_file is a file of necessary van der Waals’ radii in a HOLE-readable format. If set to None, MDAnalysis will create a simple2.rad file with the built-in radii from the HOLE distribution.

This will create several outputs in your directory:

  • hole1.out: the log file for HOLE.

  • hole1.sph: a PDB-like file containing the coordinates of the pore centers.

  • simple2.rad: file of Van der Waals’ radii

  • tmp/pdb_name.pdb: the short name of a PDB file with your structure. As hole is a FORTRAN77 program, it is limited in how long of a filename that it can read. Symlinking the file to the current directory can shorten the path.

If you do not want to keep the files, set keep_files=False. Keep in mind that you will not be able to create a VMD surface without the sphpdb file.

The pore profile itself is in the profiles1 dictionary, indexed by frame. There is only one frame in this PDB file, so it is at profiles1[0].

[3]:
profiles[0].shape
[3]:
(425,)

Each profile is a numpy.recarray with the fields below as an entry for each rxncoord:

  • rxn_coord: the distance along the pore axis in angstrom

  • radius: the pore radius in angstrom

  • cen_line_D: distance measured along the pore centre line - the first point found is set to zero.

[4]:
profiles[0].dtype.names
[4]:
('rxn_coord', 'radius', 'cen_line_D')

You can then proceed with your own analysis of the profiles.

[5]:
rxn_coords = profiles[0].rxn_coord
pore_length = rxn_coords[-1] - rxn_coords[0]
print('The pore is {} angstroms long'.format(pore_length))
The pore is 42.4 angstroms long

You can create a VMD surface from the hole1.sph output file, using the create_vmd_surface function.

[6]:
hole2.create_vmd_surface(filename='hole1.vmd',
                         sphpdb='hole1.sph',
                        #  sph_process='~/hole2/exe/sph_process',
                         )
[6]:
'hole1.vmd'

To view this, open your PDB file in VMD.

vmd tmp*/*.pdb

Load the output file in Extensions > Tk Console:

source hole1.vmd

Your pore surface will be drawn as below.

sphpdb.png

MDAnalysis supports many of the options that can be customised in HOLE. For example, you can specify a starting point for the pore search within the pore with cpoint, and a sample distance (default: 0.2 angstrom) for the distance between the planes used in HOLE. Please see the MDAnalysis.analysis.hole2 for more information.

Using HOLE with a trajectory

One of the limitations of the hole program is that it can only accept PDB files. In order to use other formats with hole, or to run hole on trajectories, we can use the hole2.HoleAnalysis class with an MDAnalysis.Universe. While the example file below is a PDB, you can use any files to create your Universe. You can also specify that the HOLE analysis is only run on a particular group of atoms with the select keyword (default value: ‘protein’).

As with hole(), HoleAnalysis allows you to select a starting point for the search (cpoint). You can pass in a coordinate array; alternatively, you can use the center-of-geometry of your atom selection in each frame as the start.

[7]:
from MDAnalysis.tests.datafiles import MULTIPDB_HOLE

u = mda.Universe(MULTIPDB_HOLE)

ha = hole2.HoleAnalysis(u, select='protein',
                        cpoint='center_of_geometry',
                        # executable='~/hole2/exe/hole',
                        )
ha.run(random_seed=31415)
[7]:
<mdahole2.analysis.hole.HoleAnalysis at 0x7f1a146af610>

Working with the data

Again, the data is stored in ha.results.profiles as a dictionary of numpy.recarrays. The dictionary is indexed by frame; we can see the HOLE profile for the fourth frame below (truncated to the first 10 values).

[8]:
for rxn_coord, radius, cen_line_D in (ha.results.profiles[3][:10]):
    print(f"{rxn_coord:.2f}, {radius:.2f}, {cen_line_D:.2f}")
-21.01, 15.35, -39.24
-20.91, 12.63, -34.65
-20.81, 10.64, -30.05
-20.71, 9.58, -27.73
-20.61, 8.87, -25.40
-20.51, 8.57, -23.62
-20.41, 8.56, -21.84
-20.31, 8.48, -21.74
-20.21, 8.39, -21.64
-20.11, 8.30, -21.54

If you want to collect each individual property, use gather(). Setting flat=True flattens the lists of rxn_coord, radius, and cen_line_D, in order. You can select which frames you want by passing an iterable of frame indices to frames. frames=None returns all frames.

[9]:
gathered = ha.gather()
print(gathered.keys())
dict_keys(['rxn_coord', 'radius', 'cen_line_D'])
[10]:
print(len(gathered['rxn_coord']))
11
[11]:
flat = ha.gather(flat=True)
print(len(flat['rxn_coord']))
3967

You may also want to collect the radii in bins of rxn_coord for the entire trajectory with the bin_radii() function. range should be a tuple of the lower and upper edges of the first and last bins, respectively. If range=None, the minimum and maximum values of rxn_coord are used.

bins can be either an iterable of (lower, upper) edges (in which case range is ignored), or a number specifying how many bins to create with range.

[12]:
radii, edges = ha.bin_radii(bins=100, range=None)

The closely related histogram_radii() function takes the same arguments as bin_radii() to group the pore radii, but allows you to specify an aggregating function with aggregator (default: np.mean) that will be applied to each array of radii. The arguments for this function, and returned values, are analogous to those for np.histogram.

[13]:
means, edges = ha.histogram_radii(bins=100, range=None,
                                  aggregator=np.mean)

We can use this to plot the mean radii of the pore over the trajectory. (You can also accomplish this with the plot_mean_profile() function shown below, by setting n_std=0.)

[14]:
midpoints = 0.5*(edges[1:]+edges[:-1])
plt.plot(midpoints, means)
plt.ylabel(r"Mean HOLE radius $R$ ($\AA$)")
plt.xlabel(r"Pore coordinate $\zeta$ ($\AA$)")
[14]:
Text(0.5, 0, 'Pore coordinate $\\zeta$ ($\\AA$)')
../../../_images/examples_analysis_polymers_and_membranes_hole2_29_1.png

HoleAnalysis also has the min_radius() function, which will return the minimum radius in angstrom for each frame. The resulting array has the shape (#n_frames, 2).

[15]:
min_radii = ha.min_radius()
for frame, min_radius in min_radii:
    print(f"Frame {int(frame)}: {min_radius:.3f}")
Frame 0: -0.237
Frame 1: 1.567
Frame 2: 1.533
Frame 3: 1.425
Frame 4: 1.243
Frame 5: 1.198
Frame 6: 1.296
Frame 7: 1.438
Frame 8: 1.511
Frame 9: 0.879
Frame 10: 0.997
[16]:
plt.plot(min_radii[:, 0], min_radii[:, 1])
plt.ylabel('Minimum HOLE radius $R$ ($\AA$)')
plt.xlabel('Frame')
[16]:
Text(0.5, 0, 'Frame')
../../../_images/examples_analysis_polymers_and_membranes_hole2_32_1.png

Visualising the VMD surface

The create_vmd_surface() method is built into the HoleAnalysis class. It writes a VMD file that changes the pore surface for each frame in VMD. Again, load your file and source the file in the Tk Console:

source holeanalysis.vmd

[17]:
ha.create_vmd_surface(filename='holeanalysis.vmd')
[17]:
'holeanalysis.vmd'

hole.gif

Plotting

HoleAnalysis has several convenience methods for plotting. plot() plots the HOLE radius over each pore coordinate, differentiating each frame with colour.

[18]:
ha.plot()
[18]:
<AxesSubplot: xlabel='Pore coordinate $\\zeta$ ($\\AA$)', ylabel='HOLE radius $R$ ($\\AA$)'>
../../../_images/examples_analysis_polymers_and_membranes_hole2_39_1.png

You can choose to plot specific frames, or specify colours or a colour map. Please see the documentation for a full description of arguments.

[19]:
ha.plot(frames=[0, 2, 5, 9])
[19]:
<AxesSubplot: xlabel='Pore coordinate $\\zeta$ ($\\AA$)', ylabel='HOLE radius $R$ ($\\AA$)'>
../../../_images/examples_analysis_polymers_and_membranes_hole2_41_1.png

The plot3D() function separates each frame onto its own axis in a 3D plot.

[20]:
ha.plot3D()
[20]:
<Axes3DSubplot: xlabel='Pore coordinate $\\zeta$ ($\\AA$)', ylabel='Frames', zlabel='HOLE radius $R$ ($\\AA$)'>
../../../_images/examples_analysis_polymers_and_membranes_hole2_43_1.png

You can choose to plot only the part of each pore lower than a certain radius by setting r_max.

[21]:
ha.plot3D(r_max=2.5)
[21]:
<Axes3DSubplot: xlabel='Pore coordinate $\\zeta$ ($\\AA$)', ylabel='Frames', zlabel='HOLE radius $R$ ($\\AA$)'>
../../../_images/examples_analysis_polymers_and_membranes_hole2_45_1.png

You can also plot the mean and standard deviation of the pore radius over the pore coordinate.

[22]:
ha.plot_mean_profile(bins=100,  # how much to chunk rxn_coord
                     n_std=1,  # how many standard deviations from mean
                     color='blue',  # color of plot
                     fill_alpha=0.2,  # opacity of standard deviation
                     legend=True)
[22]:
<AxesSubplot: xlabel='Pore coordinate $\\zeta$ ($\\AA$)', ylabel='HOLE radius $R$ ($\\AA$)'>
../../../_images/examples_analysis_polymers_and_membranes_hole2_47_1.png

Ordering HOLE profiles with an order parameter

If you are interested in the HOLE profiles over an order parameter, you can directly pass that into the analysis after it is run. Below, we use an order parameter of RMSD from a reference structure.

Note

Please cite ([SFSB14]) when using the over_order_parameters functionality.

[23]:
from MDAnalysis.analysis import rms

ref = mda.Universe(PDB_HOLE)
rmsd = rms.RMSD(u, ref, select='protein', weights='mass').run()
rmsd_values = rmsd.rmsd[:, 2]
for i, rmsd in enumerate(rmsd_values):
    print(f"Frame {i}:  {rmsd:.2f}")
Frame 0:  6.11
Frame 1:  4.88
Frame 2:  3.66
Frame 3:  2.44
Frame 4:  1.22
Frame 5:  0.00
Frame 6:  1.22
Frame 7:  2.44
Frame 8:  3.66
Frame 9:  4.88
Frame 10:  6.11

You can pass this in as order_parameter. The resulting profiles dictionary will have your order parameters as keys. You should be careful with this if your order parameter has repeated values, as duplicate keys are not possible; each duplicate key just overwrites the previous value.

[24]:
op_profiles = ha.over_order_parameters(rmsd_values)

You can see here that the dictionary does not order the entries by the order parameter. If you iterate over the dictionary, it will return each (key, value) pair in sorted key order.

[25]:
for order_parameter, profile in op_profiles.items():
    print(f"{order_parameter:.3f}, {len(profile)}")
0.000, 543
1.221, 389
1.221, 511
2.442, 419
2.442, 399
3.663, 379
3.663, 443
4.884, 391
4.884, 149
6.105, 205
6.105, 139

You can also select specific frames for the new profiles.

[26]:
op_profiles = ha.over_order_parameters(rmsd_values, frames=[0, 4, 9])
for order_parameter, profile in op_profiles.items():
    print(f"{order_parameter:.3f}, {len(profile)}")
1.221, 511
4.884, 149
6.105, 205

Plotting

HoleAnalysis also provides convenience functions for plotting over order parameters. Unlike plot(), plot_order_parameters() requires an aggregator function that reduces an array of radii to a singular value. The default function is min(). You can also pass in functions such as max() or np.mean(), or define your own function to operate on an array and return a vlue.

[27]:
ha.plot_order_parameters(rmsd_values,
                         aggregator=min,
                         xlabel='RMSD to reference ($\AA$)',
                         ylabel='Minimum HOLE pore radius $r$ ($\AA$)')
[27]:
<AxesSubplot: xlabel='RMSD to reference ($\\AA$)', ylabel='Minimum HOLE pore radius $r$ ($\\AA$)'>
../../../_images/examples_analysis_polymers_and_membranes_hole2_58_1.png

plot3D_order_parameters() functions in a similar way to plot3D(), although you need to pass in the order parameters.

[28]:
ha.plot3D_order_parameters(rmsd_values,
                           ylabel='RMSD to reference ($\AA$)')
[28]:
<Axes3DSubplot: xlabel='Pore coordinate $\\zeta$ ($\\AA$)', ylabel='RMSD to reference ($\\AA$)', zlabel='HOLE radius $R$ ($\\AA$)'>
../../../_images/examples_analysis_polymers_and_membranes_hole2_60_1.png

Deleting HOLE files

The HOLE program and related MDAnalysis code write a number of files out. Both the hole() function and HoleAnalysis class contain ways to easily remove these files.

For hole(), pass in keep_files=False to delete HOLE files as soon as the analysis is done. However, this will also remove the sphpdb file required to create a VMD surface from the analysis. If you need to write a VMD surface file, use the HoleAnalysis class instead.

You can track the created files at the tmp_files attribute.

[29]:
ha.tmp_files
[29]:
['simple2.rad',
 'hole000.out',
 'hole000.sph',
 'hole001.out',
 'hole001.sph',
 'hole002.out',
 'hole002.sph',
 'hole003.out',
 'hole003.sph',
 'hole004.out',
 'hole004.sph',
 'hole005.out',
 'hole005.sph',
 'hole006.out',
 'hole006.sph',
 'hole007.out',
 'hole007.sph',
 'hole008.out',
 'hole008.sph',
 'hole009.out',
 'hole009.sph',
 'hole010.out',
 'hole010.sph']

The built-in method delete_temporary_files() will remove these.

[30]:
ha.delete_temporary_files()
ha.tmp_files
[30]:
[]

Alternatively, you can use HoleAnalysis as a context manager. When you exit the block, the temporary files will be deleted automatically.

[31]:
with hole2.HoleAnalysis(u,
    # executable='~/hole2/exe/hole',
    ) as ha2:
    ha2.run()
    ha2.create_vmd_surface(filename='holeanalysis.vmd')
    ha2.plot()
../../../_images/examples_analysis_polymers_and_membranes_hole2_66_0.png
[32]:
ha.profiles[0][0].radius
[32]:
20.0962

References

[1] Richard J. Gowers, Max Linke, Jonathan Barnoud, Tyler J. E. Reddy, Manuel N. Melo, Sean L. Seyler, Jan Domański, David L. Dotson, Sébastien Buchoux, Ian M. Kenney, and Oliver Beckstein. MDAnalysis: A Python Package for the Rapid Analysis of Molecular Dynamics Simulations. Proceedings of the 15th Python in Science Conference, pages 98–105, 2016. 00152. URL: https://conference.scipy.org/proceedings/scipy2016/oliver_beckstein.html, doi:10.25080/Majora-629e541a-00e.

[2] Naveen Michaud-Agrawal, Elizabeth J. Denning, Thomas B. Woolf, and Oliver Beckstein. MDAnalysis: A toolkit for the analysis of molecular dynamics simulations. Journal of Computational Chemistry, 32(10):2319–2327, July 2011. 00778. URL: http://doi.wiley.com/10.1002/jcc.21787, doi:10.1002/jcc.21787.

[3] O S Smart, J M Goodfellow, and B A Wallace. The pore dimensions of gramicidin A. Biophysical Journal, 65(6):2455–2460, December 1993. 00522. URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1225986/, doi:10.1016/S0006-3495(93)81293-1.

[4] O. S. Smart, J. G. Neduvelil, X. Wang, B. A. Wallace, and M. S. Sansom. HOLE: a program for the analysis of the pore dimensions of ion channel structural models. Journal of Molecular Graphics, 14(6):354–360, 376, December 1996. 00935. doi:10.1016/s0263-7855(97)00009-x.

[5] Lukas S. Stelzl, Philip W. Fowler, Mark S. P. Sansom, and Oliver Beckstein. Flexible gates generate occluded intermediates in the transport cycle of LacY. Journal of Molecular Biology, 426(3):735–751, February 2014. 00000. URL: https://asu.pure.elsevier.com/en/publications/flexible-gates-generate-occluded-intermediates-in-the-transport-c, doi:10.1016/j.jmb.2013.10.024.