Reading and writing files¶
Read information from topology and coordinate files to create a Universe:
import MDAnalysis as mda u = mda.Universe('topology.gro', 'trajectory.xtc')
A topology file is always required for a Universe, whereas coordinate files are optional. Some file formats provide both topology and coordinate information. MDAnalysis supports a number of formats, which are automatically detected based on the file extension. For example, the above loads a GROMACS XTC trajectory. Multiple coordinate files can be loaded, as described below; the following code loads two CHARMM/NAMD DCD files and concatenates them:
u = mda.Universe('topology.psf', 'trajectory1.dcd', 'trajectory2.dcd')
Some formats can be loaded with format-specific keyword arguments, such as the LAMMPS DATA
See Loading from files for more information on loading data into a Universe from files.
Reading multiple trajectories¶
A Universe can load multiple trajectories, which are concatenated in the order given. One exception to this is with XTC and TRR files. If the
continuous=True flag is passed to Universe, MDAnalysis will try to stitch them together so that the trajectory is as time-continuous as possible. This means that there will be no duplicate time-frames, or jumps back in time.
As an example, the following depicted trajectory is split into four parts. The column represents the time. As you can see, some segments overlap. With
continuous=True, only the frames marked with a + will be read.
part01: ++++-- part02: ++++++- part03: ++++++++ part04: ++++
However, there can be gaps in time (i.e. frames appear to be missing). Ultimately it is the user’s responsiblity to ensure that the trajectories can be stitched together meaningfully.
While you can set
continuous=True for either XTC or TRR files, you cannot mix different formats.
More information can be found at the API reference for
format keyword is provided,
ChainReader will try to guess the format for each file from its extension. You can force
ChainReader to use the same format for every file by using the
format keyword. You can also specify which format to use by file, by passing in a sequence of
(filename, format) tuples.
In : import MDAnalysis as mda In : from MDAnalysis.tests.datafiles import PDB, GRO In : u = mda.Universe(PDB, [(GRO, 'gro'), (PDB, 'pdb'), (GRO, 'gro')]) In : u.trajectory Out: <ChainReader containing adk_oplsaa.gro, adk_oplsaa.pdb, adk_oplsaa.gro with 3 frames of 47681 atoms>
Reading trajectories into memory¶
If your device has sufficient memory to load an entire trajectory into memory, then analysis can be sped up substantially by transferring the
trajectory to memory. This makes it possible to
operate on raw coordinates using existing MDAnalysis tools. In
addition, it allows the user to make changes to the coordinates in a
trajectory (e.g. through
AtomGroup.positions) without having
to write the entire state to file.
The most straightforward way to do this is to pass
automatically transfers a trajectory to memory:
In : from MDAnalysis.tests.datafiles import TPR, XTC In : universe = mda.Universe(TPR, XTC, in_memory=True)
MDAnalysis uses the
MemoryReader class to load this data in.
Transferring trajectories into memory¶
The decision to transfer the trajectory to memory can be made at any
time with the
In : universe = mda.Universe(TPR, XTC) In : universe.transfer_to_memory()
This operation may take a while (passing
transfer_to_memory() will display a progress bar). However, subsequent operations on the trajectory will be very fast.
Building trajectories in memory¶
MemoryReader can also be used to directly generate a trajectory as a numpy array.
In : from MDAnalysisTests.datafiles import PDB In : from MDAnalysis.coordinates.memory import MemoryReader In : import numpy as np In : universe = mda.Universe(PDB) In : universe.atoms.positions Out: array([[ 52.017, 43.56 , 31.555], [ 51.188, 44.112, 31.722], [ 51.551, 42.828, 31.039], ..., [105.342, 74.072, 40.988], [ 57.684, 35.324, 14.804], [ 62.961, 47.239, 3.753]], dtype=float32)
load_new() method can be used to load coordinates into a Universe, replacing the old coordinates:
In : coordinates = np.random.rand(len(universe.atoms), 3) In : universe.load_new(coordinates, format=MemoryReader); In : universe.atoms.positions Out: array([[0.6523205 , 0.47571093, 0.71093994], [0.8886824 , 0.4973507 , 0.275455 ], [0.29257664, 0.5104733 , 0.493781 ], ..., [0.1561049 , 0.9776475 , 0.36574146], [0.17438903, 0.43279898, 0.5096959 ], [0.67912287, 0.25593033, 0.9732715 ]], dtype=float32)
or they can be directly passed in when creating a Universe.
In : universe2 = mda.Universe(PDB, coordinates, format=MemoryReader) In : universe2.atoms.positions Out: array([[0.6523205 , 0.47571093, 0.71093994], [0.8886824 , 0.4973507 , 0.275455 ], [0.29257664, 0.5104733 , 0.493781 ], ..., [0.1561049 , 0.9776475 , 0.36574146], [0.17438903, 0.43279898, 0.5096959 ], [0.67912287, 0.25593033, 0.9732715 ]], dtype=float32)
In-memory trajectories of an atom selection¶
Creating a trajectory of an atom selection requires transferring the appropriate units. This is often needed when using
Merge() to create a new Universe, as coordinates are not automatically loaded in.
Frames and trajectories¶
Universes can be written out to a number of formats with
write(). For example, to write the current frame as a PDB:
from MDAnalysis.tests.datafiles import PDB, TRR u = mda.Universe(PDB, TRR) ag = u.select_atoms("name CA") ag.write("c-alpha.pdb")
Pass in the
frames keyword to write out trajectories.
Slice or index the trajectory to choose which frames to write:
ag.write('c-alpha_skip2.trr', frames=u.trajectory[::2]) ag.write('c-alpha_some.dcd', frames=u.trajectory[[0,2,3]])
Alternatively, iterate over the trajectory frame-by-frame with
Writer(). This requires you to pass in the number of atoms to write.
with mda.Writer('c-alpha.xyz', ag.n_atoms) as w: for ts in u.trajectory: w.write(ag)
You can pass keyword arguments to some format writers. For example, the LAMMPS DATA format allows the
timeunit keywords to specify the output units.
MDAnalysis supports pickling of most of its data structures and trajectory formats. Unsupported attributes can be found in PR #2887.
In : import pickle In : from MDAnalysis.tests.datafiles import PSF, DCD In : psf = mda.Universe(PSF, DCD) In : pickle.loads(pickle.dumps(psf)) Out: <Universe with 3341 atoms>
MDAnalysis.core.groups.AtomGroup, during serialization, it will be pickled with its bound
MDAnalysis.core.universe.Universe. This means that after unpickling,
MDAnalysis.core.universe.Universe will be created and
be attached to the new
MDAnalysis.core.groups.AtomGroup. If the Universe is serialized
MDAnalysis.core.groups.AtomGroup, they will still be bound together afterwards:
In : import pickle In : from MDAnalysis.tests.datafiles import PSF, DCD In : u = mda.Universe(PSF, DCD) In : g = u.atoms In : g_pickled = pickle.loads(pickle.dumps(g)) In : print("g_pickled.universe is u: ", u is g_pickled.universe) g_pickled.universe is u: False In : g_pickled, u_pickled = pickle.loads(pickle.dumps((g, u))) In : print("g_pickled.universe is u_pickled: ", ....: u_pickled is g_pickled.universe) ....: g_pickled.universe is u_pickled: True
MDAnalysis.core.groups.AtomGroups are bound to the same
MDAnalysis.core.universe.Universe, they will also be bound to the same one
In : u = mda.Universe(PSF, DCD) In : g = u.atoms[:2] In : h = u.atoms[2:4] In : g_pickled = pickle.loads(pickle.dumps(g)) In : h_pickled = pickle.loads(pickle.dumps(h)) In : print("g_pickled.universe is h_pickled.universe : ", ...: g_pickled.universe is h_pickled.universe) ...: g_pickled.universe is h_pickled.universe : False In : g_pickled, h_pickled = pickle.loads(pickle.dumps((g, h))) In : print("g_pickled.universe is h_pickled.universe: ", ...: g_pickled.universe is h_pickled.universe) ...: g_pickled.universe is h_pickled.universe: True