Universe

If you wish to make an apple pie from scratch, you must first invent the universe.

—Carl Sagan, Cosmos

MDAnalysis is structured around two fundamental classes: the Universe and the AtomGroup. Almost all code in MDAnalysis begins with Universe, which contains all the information describing a molecular dynamics system.

It has two key properties:

  • atoms: an AtomGroup of the system’s atoms, providing access to important analysis methods (described below)

  • trajectory: the currently loaded trajectory reader

A Universe ties the static information from the “topology” (e.g. atom identities) to dynamically updating information from the “trajectory” (e.g. coordinates). A key feature of MDAnalysis is that an entire trajectory is not loaded into memory (unless the user explicitly does so with MemoryReader). Instead, the trajectory attribute provides a view on a specific frame of the trajectory. This allows the analysis of arbitrarily long trajectories without a significant impact on memory.

Creating a Universe

Loading from files

A Universe is typically created from a “topology” file, with optional “trajectory” file/s. Trajectory files must have the coordinates in the same order as atoms in the topology. See Formats for the topology and trajectory formats supported by MDAnalysis, and how to load each specific format.

u = Universe(topology, trajectory)
u = Universe(pdbfile)                       # read atoms and coordinates from PDB or GRO
u = Universe(topology, [traj1, traj2, ...]) # read from a list of trajectories
u = Universe(topology, traj1, traj2, ...)   # read from multiple trajectories

The line between topology and trajectory files is quite blurry. For example, a PDB or GRO file is considered both a topology and a trajectory file. The difference is that a topology file provides static information, such as atom identities (name, mass, etc.), charges, and bond connectivity. A trajectory file provides dynamic information, such as coordinates, velocities, forces, and box dimensions.

If only a single file is provided, MDAnalysis tries to read both topology and trajectory information from it. When multiple trajectory files are provided, coordinates are loaded in the order given.

The default arguments should create a Universe suited for most analysis applications. However, the Universe constructor also takes optional arguments.

The following options specify how to treat the input:

  • format: the file format of the trajectory file/s. (default: None, formats are guessed)

  • topology_format: the file format of the topology file. (default: None, formats are guessed)

  • all_coordinates: whether to read coordinate information from the first file (default: False. Ignored when only one file is provided)

  • continuous: whether to give multiple trajectory files continuous time steps. This is currently only supported for XTC/TRR trajectories with a GRO/TPR topology, following the behaviour of gmx trjcat (default: False.)

In [1]: import MDAnalysis as mda

In [2]: from MDAnalysis.tests.datafiles import PDB, GRO, XTC

In [3]: u1 = mda.Universe(GRO, XTC, XTC, all_coordinates=True)

In [4]: u1.trajectory
Out[4]: <ChainReader containing adk_oplsaa.gro, adk_oplsaa.xtc, adk_oplsaa.xtc with 21 frames of 47681 atoms>

In [5]: u2 = mda.Universe(GRO, XTC, XTC, all_coordinates=False, continuous=False)

In [6]: print([int(ts.time) for ts in u2.trajectory])
[0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 0, 100, 200, 300, 400, 500, 600, 700, 800, 900]

The following options modify the created Universe:

  • guess_bonds: whether to guess connectivity between atoms. (default: False)

  • vdwradii: a dictionary of {element: radius} of van der Waals’ radii for use in guessing bonds.

  • transformations: a function or list of functions for on-the-fly trajectory transformation.

  • in_memory: whether to load coordinates into memory (default: False)

  • in_memory_step: only read every nth frame into an in-memory representation. (default: 1)

  • is_anchor: whether to consider this Universe when unpickling AtomGroups (default: True)

  • anchor_name: the name of this Universe when unpickling AtomGroups (default: None, automatically generated)

You can also pass in keywords for parsing the topology or coordinates. For example, many file formats do not specify the timestep for their trajectory. In these cases, MDAnalysis assumes that the default timestep is 1 ps. If this is incorrect, you can pass in a dt argument to modify the timestep. This does not modify timesteps for formats that include time information.

In [7]: from MDAnalysis.tests.datafiles import PRM, TRJ

In [8]: default_timestep = mda.Universe(PRM, TRJ)

In [9]: default_timestep.trajectory.dt
Out[9]: 1.0

In [10]: user_timestep = mda.Universe(PRM, TRJ, dt=5)  # ps

In [11]: user_timestep.trajectory.dt
Out[11]: 5

Constructing from AtomGroups

A new Universe can be created from one or more AtomGroup instances with Merge(). The AtomGroup instances can come from different Universes, meaning that this is one way to concatenate selections from different datasets.

For example, to combine a protein, ligand, and solvent from separate PDB files:

u1 = mda.Universe("protein.pdb")
u2 = mda.Universe("ligand.pdb")
u3 = mda.Universe("solvent.pdb")
u = Merge(u1.select_atoms("protein"), u2.atoms, u3.atoms)
u.atoms.write("system.pdb")

Constructing from scratch

A Universe can be constructed from scratch with Universe.empty. There are three stages to this process:

  1. Create the blank Universe with specified number of atoms. If coordinates, set trajectory=True.

  2. Add topology attributes such as atom names.

  3. (Optional) Load coordinates.

For example, to construct a universe with 6 atoms in 2 residues:

In [12]: u = mda.Universe.empty(6, 2, atom_resindex=[0, 0, 0, 1, 1, 1], trajectory=True)

In [13]: u.add_TopologyAttr('masses')

In [14]: coordinates = np.empty((1000,  # number of frames
   ....:                         u.atoms.n_atoms,
   ....:                         3))
   ....: 

In [15]: u.load_new(coordinates, order='fac')
Out[15]: <Universe with 6 atoms>

See this notebook tutorial for more information.

Guessing topology attributes

MDAnalysis can guess two kinds of information. Sometimes MDAnalysis guesses information instead of reading it from certain file formats, which can lead to mistakes such as assigning atoms the wrong element or charge. See the available topology parsers for a case-by-case breakdown of which atom properties MDAnalysis guesses for each format. See Guessing for how attributes are guessed, and Default values and attribute levels for which attributes have default values.

Universe properties and methods

A Universe holds master groups of atoms and topology objects:

  • atoms: all Atoms in the system, in an AtomGroup.

  • residues: all Residues in the system

  • segments: all Segments in the system

  • bonds: all bond TopologyObjects in the system

  • angles: all angle TopologyObjects in the system

  • dihedrals: all dihedral TopologyObjects in the system

  • impropers: all improper TopologyObjects in the system

Residues and Segments are chemically meaningful groups of Atoms.

Modifying a topology is typically done through the Universe, which contains several methods for adding properties:

See Topology attributes for more information on which topology attributes can be added, and examples/constructing_universe.ipynb for examples on adding attributes and Segments.