Constructing, modifying, and adding to a Universe

MDAnalysis version: ≥ 0.20.1

Last updated: December 2022 with MDAnalysis 2.4.0-dev0

Sometimes you may want to construct a Universe from scratch, or add attributes that are not read from a file. For example, you may want to group a Universe into chains, or create custom segments for protein domains.

In this tutorial we:

create a Universe consisting of water molecules
merge this with a protein Universe loaded from a file
create custom segments labeling protein domains

Throughout this tutorial we will include cells for visualising Universes with the NGLView library. However, these will be commented out, and we will show the expected images generated instead of the interactive widgets.

[1]:

import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PDB_small
import numpy as np
from IPython.core.display import Image

import warnings
# suppress some MDAnalysis warnings when writing PDB files
warnings.filterwarnings('ignore')

print("Using MDAnalysis version", mda.__version__)

# Optionally, use NGLView to interactively view your trajectory
import nglview as nv
print("Using NGLView version", nv.__version__)

Using MDAnalysis version 2.9.0-dev0

Using NGLView version 3.0.8

Creating and populating a Universe with water

Creating a blank Universe

The Universe.empty() method creates a blank Universe. The natoms (int) argument must be included. Optional arguments are:

n_residues (int): number of residues
n_segments (int): number of segments
atom_resindex (list): list of resindices for each atom
residue_segindex (list): list of segindices for each residue
trajectory (bool): whether to attach a MemoryReader trajectory (default False)
velocities (bool): whether to include velocities in the trajectory (default False)
forces (bool): whether to include forces in the trajectory (default False)

We will create a Universe with 1000 water molecules.

[2]:

n_residues = 1000
n_atoms = n_residues * 3

# create resindex list
resindices = np.repeat(range(n_residues), 3)
assert len(resindices) == n_atoms
print("resindices:", resindices[:10])

# all water molecules belong to 1 segment
segindices = [0] * n_residues
print("segindices:", segindices[:10])

resindices: [0 0 0 1 1 1 2 2 2 3]
segindices: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

[3]:

# create the Universe
sol = mda.Universe.empty(n_atoms,
                         n_residues=n_residues,
                         atom_resindex=resindices,
                         residue_segindex=segindices,
                         trajectory=True) # necessary for adding coordinates
sol

[3]:

<Universe with 3000 atoms>

Adding topology attributes

There isn’t much we can do with our current Universe because MDAnalysis has no information on the particle elements, positions, etc. We can add relevant information manually using TopologyAttrs.

names

[4]:

sol.add_TopologyAttr('name', ['O', 'H1', 'H2']*n_residues)
sol.atoms.names

[4]:

array(['O', 'H1', 'H2', ..., 'O', 'H1', 'H2'], shape=(3000,), dtype=object)

elements (“types”)

Elements are typically contained in the type topology attribute.

[5]:

sol.add_TopologyAttr('type', ['O', 'H', 'H']*n_residues)
sol.atoms.types

[5]:

array(['O', 'H', 'H', ..., 'O', 'H', 'H'], shape=(3000,), dtype=object)

residue names (“resnames”)

[6]:

sol.add_TopologyAttr('resname', ['SOL']*n_residues)
sol.atoms.resnames

[6]:

array(['SOL', 'SOL', 'SOL', ..., 'SOL', 'SOL', 'SOL'],
      shape=(3000,), dtype=object)

residue counter (“resids”)

[7]:

sol.add_TopologyAttr('resid', list(range(1, n_residues+1)))
sol.atoms.resids

[7]:

array([   1,    1,    1, ..., 1000, 1000, 1000], shape=(3000,))

segment/chain names (“segids”)

[8]:

sol.add_TopologyAttr('segid', ['SOL'])
sol.atoms.segids

[8]:

array(['SOL', 'SOL', 'SOL', ..., 'SOL', 'SOL', 'SOL'],
      shape=(3000,), dtype=object)

Adding positions

Positions can simply be assigned, without adding a topology attribute.

The O-H bond length in water is around 0.96 Angstrom, and the bond angle is 104.45°. We can first obtain a set of coordinates for one molecule, and then translate it for every water molecule.

[9]:

# coordinates obtained by building a molecule in the program IQMol
h2o = np.array([[ 0,        0,       0      ],  # oxygen
                [ 0.95908, -0.02691, 0.03231],  # hydrogen
                [-0.28004, -0.58767, 0.70556]]) # hydrogen

[10]:

grid_size = 10
spacing = 8

coordinates = []

# translating h2o coordinates around a grid
for i in range(n_residues):
    x = spacing * (i % grid_size)
    y = spacing * ((i // grid_size) % grid_size)
    z = spacing * (i // (grid_size * grid_size))

    xyz = np.array([x, y, z])

    coordinates.extend(h2o + xyz.T)

print(coordinates[:10])

[array([0., 0., 0.]), array([ 0.95908, -0.02691,  0.03231]), array([-0.28004, -0.58767,  0.70556]), array([8., 0., 0.]), array([ 8.95908, -0.02691,  0.03231]), array([ 7.71996, -0.58767,  0.70556]), array([16.,  0.,  0.]), array([16.95908, -0.02691,  0.03231]), array([15.71996, -0.58767,  0.70556]), array([24.,  0.,  0.])]

[11]:

coord_array = np.array(coordinates)
assert coord_array.shape == (n_atoms, 3)
sol.atoms.positions = coord_array

We can view the atoms with NGLView, a library for visualising molecules. It guesses bonds based on distance.

[12]:

# sol_view = nv.show_mdanalysis(sol)
# sol_view.add_representation('ball+stick', selection='all')
# sol_view.center()
# sol_view

sol_view

Adding bonds

Currently, the sol universe doesn’t contain any bonds.

[13]:

assert not hasattr(sol, 'bonds')

They can be important for defining ‘fragments’, which are groups of atoms where every atom is connected by a bond to another atom in the group (i.e. what is commonly called a molecule). You can pass a list of tuples of atom indices to add bonds as a topology attribute.

[14]:

bonds = []
for o in range(0, n_atoms, 3):
    bonds.extend([(o, o+1), (o, o+2)])

bonds[:10]

[14]:

[(0, 1),
 (0, 2),
 (3, 4),
 (3, 5),
 (6, 7),
 (6, 8),
 (9, 10),
 (9, 11),
 (12, 13),
 (12, 14)]

[15]:

sol.add_TopologyAttr('bonds', bonds)
sol.bonds

[15]:

<TopologyGroup containing 2000 bonds>

The bonds associated with each atom or the bonds within an AtomGroup can be accessed with the bonds attribute:

[16]:

print(sol.atoms[0].bonds)
print(sol.atoms[-10:].bonds)

<TopologyGroup containing 2 bonds>
<TopologyGroup containing 7 bonds>

Merging with a protein

Now we can merge the water with a protein to create a combined system by using MDAnalysis.Merge to combine AtomGroup instances.

The protein is adenylate kinase (AdK), a phosphotransferase enzyme. [1]

[17]:

protein = mda.Universe(PDB_small)

[18]:

# protein_view = nv.show_mdanalysis(protein)
# protein_view

protein_view

We will translate the centers of both systems to the origin, so they can overlap in space.

[19]:

cog = sol.atoms.center_of_geometry()
print('Original solvent center of geometry: ', cog)
sol.atoms.positions -= cog
cog2 = sol.atoms.center_of_geometry()
print('New solvent center of geometry: ', cog2)

Original solvent center of geometry:  [36.22634681 35.79514029 36.24595657]
New solvent center of geometry:  [ 2.78155009e-07 -1.27156576e-07  3.97364299e-08]

[20]:

cog = protein.atoms.center_of_geometry()
print('Original solvent center of geometry: ', cog)
protein.atoms.positions -= cog
cog2 = protein.atoms.center_of_geometry()
print('New solvent center of geometry: ', cog2)

Original solvent center of geometry:  [-3.66508082  9.60502842 14.33355791]
New solvent center of geometry:  [8.30580288e-08 3.49225059e-08 2.51332265e-08]

[21]:

combined = mda.Merge(protein.atoms, sol.atoms)

[22]:

# combined_view = nv.show_mdanalysis(combined)
# combined_view.add_representation("ball+stick", selection="not protein")
# combined_view

combined_view

Unfortunately, some water molecules overlap with the protein. We can create a new AtomGroup containing only the molecules where every atom is further away than 6 angstroms from the protein.

[23]:

no_overlap = combined.select_atoms("same resid as (not around 6 protein)")

With this AtomGroup, we can then construct a new Universe.

[24]:

u = mda.Merge(no_overlap)

[25]:

# no_overlap_view = nv.show_mdanalysis(u)
# no_overlap_view.add_representation("ball+stick", selection="not protein")
# no_overlap_view

no_overlap_view

Adding a new segment

Often you may want to assign atoms to a segment or chain – for example, adding segment IDs to a PDB file. This requires adding a new Segment with Universe.add_Segment.

Adenylate kinase has three domains: CORE, NMP, and LID. As shown in the picture below,[1] these have the residues:

CORE: residues 1-29, 60-121, 160-214 (gray)
NMP: residues 30-59 (blue)
LID: residues 122-159 (yellow)

f116422c65334e9282767dc0be0a8c3d

[26]:

u.segments.segids

[26]:

array(['4AKE', 'SOL'], dtype=object)

On examining the Universe, we can see that the protein and solvent are already divided into two segments: protein (‘4AKE’) and solvent (‘SOL’). We will add three more segments (CORE, NMP, and LID) and assign atoms to them.

First, add a Segment to the Universe with a segid. It will be empty:

[27]:

core_segment = u.add_Segment(segid='CORE')
core_segment.atoms

[27]:

<AtomGroup with 0 atoms>

Residues can’t be broken across segments. To put atoms in a segment, assign the segments attribute of their residues:

[28]:

core_atoms = u.select_atoms('resid 1:29 or resid 60:121 or resid 160-214')
core_atoms.residues.segments = core_segment
core_segment.atoms

[28]:

<AtomGroup with 2744 atoms>

[29]:

nmp_segment = u.add_Segment(segid='NMP')
lid_segment = u.add_Segment(segid='LID')

nmp_atoms = u.select_atoms('resid 30:59')
nmp_atoms.residues.segments = nmp_segment

lid_atoms = u.select_atoms('resid 122:159')
lid_atoms.residues.segments = lid_segment

As of MDAnalysis 2.1.0, PDBs use the chainID TopologyAttr for the chainID column. If it is missing, it uses a placeholder “X” value instead of the segid. We therefore must manually set that ourselves to visualise the protein in NGLView.

[30]:

# add the topologyattr to the universe
u.add_TopologyAttr("chainID")
core_segment.atoms.chainIDs = "C"
nmp_segment.atoms.chainIDs = "N"
lid_segment.atoms.chainIDs = "L"

We can check that we have the correct domains by visualising the protein.

[31]:

# domain_view = nv.show_mdanalysis(u)
# domain_view.add_representation("protein", color_scheme="chainID")
# domain_view

domain-view

Tiling into a larger Universe

We can use MDAnalysis to tile out a smaller Universe into a bigger one, similarly to editconf in GROMACS. To start off, we need to figure out the box size. The default in MDAnalysis is a zero vector. The first three numbers represent the length of each axis, and the last three represent the alpha, beta, and gamma angles of a triclinic box.

[32]:

print(u.dimensions)

None

We know that our system is cubic in shape, so we can assume angles of 90°. The difference between the lowest and highest x-axis positions is roughly 73 Angstroms.

[33]:

max(u.atoms.positions[:, 0]) - min(u.atoms.positions[:, 0])

[33]:

np.float32(73.23912)

So we can set our dimensions.

[34]:

u.dimensions = [73, 73, 73, 90, 90, 90]

To tile out a Universe, we need to copy it and translate the atoms by the box dimensions. We can then merge the cells into one large Universe and assign new dimensions.

[35]:

def tile_universe(universe, n_x, n_y, n_z):
    box = universe.dimensions[:3]
    copied = []
    for x in range(n_x):
        for y in range(n_y):
            for z in range(n_z):
                u_ = universe.copy()
                move_by = box*(x, y, z)
                u_.atoms.translate(move_by)
                copied.append(u_.atoms)

    new_universe = mda.Merge(*copied)
    new_box = box*(n_x, n_y, n_z)
    new_universe.dimensions = list(new_box) + [90]*3
    return new_universe

Here is a 2 x 2 x 2 version of our original unit cell:

[36]:

tiled = tile_universe(u, 2, 2, 2)

[37]:

# tiled_view = nv.show_mdanalysis(tiled)
# tiled_view

tiled_view

References

[1]: Beckstein O, Denning EJ, Perilla JR, Woolf TB. Zipping and unzipping of adenylate kinase: atomistic insights into the ensemble of open<–>closed transitions. J Mol Biol. 2009;394(1):160–176. doi:10.1016/j.jmb.2009.09.009

Acknowledgments

The Universe tiling code was modified from @richardjgowers’s gist on the issue in 2016.