Reading and writing files
Input
Read information from topology and coordinate files to create a Universe:
import MDAnalysis as mda
u = mda.Universe('topology.gro', 'trajectory.xtc')
A topology file is always required for a Universe, whereas coordinate files are optional. Some file formats provide both topology and coordinate information. MDAnalysis supports a number of formats, which are automatically detected based on the file extension. For example, the above loads a GROMACS XTC trajectory. Multiple coordinate files can be loaded, as described below; the following code loads two CHARMM/NAMD DCD files and concatenates them:
u = mda.Universe('topology.psf', 'trajectory1.dcd', 'trajectory2.dcd')
Some formats can be loaded with format-specific keyword arguments, such as the LAMMPS DATA atom_style
specification.
See also
See Loading from files for more information on loading data into a Universe from files.
Reading multiple trajectories
A Universe can load multiple trajectories, which are concatenated in the order given. One exception to this is with XTC and TRR files. If the continuous=True
flag is passed to Universe, MDAnalysis will try to stitch them together so that the trajectory is as time-continuous as possible. This means that there will be no duplicate time-frames, or jumps back in time.
As an example, the following depicted trajectory is split into four parts. The column represents the time. As you can see, some segments overlap. With continuous=True
, only the frames marked with a + will be read.
part01: ++++--
part02: ++++++-
part03: ++++++++
part04: ++++
However, there can be gaps in time (i.e. frames appear to be missing). Ultimately it is the user’s responsiblity to ensure that the trajectories can be stitched together meaningfully.
Note
While you can set continuous=True
for either XTC or TRR files, you cannot mix different formats.
More information can be found at the API reference for ChainReader
.
Trajectory formats
If no format
keyword is provided, ChainReader
will try to guess the format for each file from its extension. You can force ChainReader
to use the same format for every file by using the format
keyword. You can also specify which format to use by file, by passing in a sequence of (filename, format)
tuples.
In [1]: import MDAnalysis as mda
In [2]: from MDAnalysis.tests.datafiles import PDB, GRO
In [3]: u = mda.Universe(PDB, [(GRO, 'gro'), (PDB, 'pdb'), (GRO, 'gro')])
In [4]: u.trajectory
Out[4]: <ChainReader containing adk_oplsaa.gro, adk_oplsaa.pdb, adk_oplsaa.gro with 3 frames of 47681 atoms>
In-memory trajectories
Reading trajectories into memory
If your device has sufficient memory to load an entire trajectory into memory, then analysis can be sped up substantially by transferring the
trajectory to memory. This makes it possible to
operate on raw coordinates using existing MDAnalysis tools. In
addition, it allows the user to make changes to the coordinates in a
trajectory (e.g. through
AtomGroup.positions
) without having
to write the entire state to file.
The most straightforward way to do this is to pass in_memory=True
to Universe
, which
automatically transfers a trajectory to memory:
In [5]: from MDAnalysis.tests.datafiles import TPR, XTC
In [6]: universe = mda.Universe(TPR, XTC, in_memory=True)
MDAnalysis uses the MemoryReader
class to load this data in.
Transferring trajectories into memory
The decision to transfer the trajectory to memory can be made at any
time with the
transfer_to_memory()
method
of a Universe
:
In [7]: universe = mda.Universe(TPR, XTC)
In [8]: universe.transfer_to_memory()
This operation may take a while (passing verbose=True
to transfer_to_memory()
will display a progress bar). However, subsequent operations on the trajectory will be very fast.
Building trajectories in memory
MemoryReader
can also be used to directly generate a trajectory as a numpy array.
In [9]: from MDAnalysisTests.datafiles import PDB
In [10]: from MDAnalysis.coordinates.memory import MemoryReader
In [11]: import numpy as np
In [12]: universe = mda.Universe(PDB)
In [13]: universe.atoms.positions
Out[13]:
array([[ 52.017, 43.56 , 31.555],
[ 51.188, 44.112, 31.722],
[ 51.551, 42.828, 31.039],
...,
[105.342, 74.072, 40.988],
[ 57.684, 35.324, 14.804],
[ 62.961, 47.239, 3.753]], dtype=float32)
The load_new()
method can be used to load coordinates into a Universe, replacing the old coordinates:
In [14]: coordinates = np.random.rand(len(universe.atoms), 3)
In [15]: universe.load_new(coordinates, format=MemoryReader);
In [16]: universe.atoms.positions
Out[16]:
array([[0.22084172, 0.69380623, 0.6204634 ],
[0.32873678, 0.3500816 , 0.8036403 ],
[0.8358787 , 0.2957448 , 0.07027348],
...,
[0.16618489, 0.35975078, 0.5305598 ],
[0.47995117, 0.5262336 , 0.41377455],
[0.38143802, 0.57584035, 0.6608682 ]], dtype=float32)
or they can be directly passed in when creating a Universe.
In [17]: universe2 = mda.Universe(PDB, coordinates, format=MemoryReader)
In [18]: universe2.atoms.positions
Out[18]:
array([[0.22084172, 0.69380623, 0.6204634 ],
[0.32873678, 0.3500816 , 0.8036403 ],
[0.8358787 , 0.2957448 , 0.07027348],
...,
[0.16618489, 0.35975078, 0.5305598 ],
[0.47995117, 0.5262336 , 0.41377455],
[0.38143802, 0.57584035, 0.6608682 ]], dtype=float32)
In-memory trajectories of an atom selection
Creating a trajectory of an atom selection requires transferring the appropriate units. This is often needed when using Merge()
to create a new Universe, as coordinates are not automatically loaded in.
Output
Frames and trajectories
MDAnalysis Universe
s can be written out to a number of formats with write()
. For example, to write the current frame as a PDB:
from MDAnalysis.tests.datafiles import PDB, TRR
u = mda.Universe(PDB, TRR)
ag = u.select_atoms("name CA")
ag.write("c-alpha.pdb")
Pass in the frames
keyword to write out trajectories.
ag.write('c-alpha_all.xtc', frames='all')
Slice or index the trajectory to choose which frames to write:
ag.write('c-alpha_skip2.trr', frames=u.trajectory[::2])
ag.write('c-alpha_some.dcd', frames=u.trajectory[[0,2,3]])
Alternatively, iterate over the trajectory frame-by-frame with Writer()
. This requires you to pass in the number of atoms to write.
with mda.Writer('c-alpha.xyz', ag.n_atoms) as w:
for ts in u.trajectory:
w.write(ag)
You can pass keyword arguments to some format writers. For example, the LAMMPS DATA format allows the lengthunit
and timeunit
keywords to specify the output units.
Pickling
MDAnalysis supports pickling of most of its data structures and trajectory formats. Unsupported attributes can be found in PR #2887.
In [19]: import pickle
In [20]: from MDAnalysis.tests.datafiles import PSF, DCD
In [21]: psf = mda.Universe(PSF, DCD)
In [22]: pickle.loads(pickle.dumps(psf))
Out[22]: <Universe with 3341 atoms>
As for MDAnalysis.core.groups.AtomGroup
, during serialization, it will be pickled with its bound
MDAnalysis.core.universe.Universe
. This means that after unpickling,
a new MDAnalysis.core.universe.Universe
will be created and
be attached to the new MDAnalysis.core.groups.AtomGroup
. If the Universe is serialized
with its MDAnalysis.core.groups.AtomGroup
, they will still be bound together afterwards:
In [23]: import pickle
In [24]: from MDAnalysis.tests.datafiles import PSF, DCD
In [25]: u = mda.Universe(PSF, DCD)
In [26]: g = u.atoms
In [27]: g_pickled = pickle.loads(pickle.dumps(g))
In [28]: print("g_pickled.universe is u: ", u is g_pickled.universe)
g_pickled.universe is u: False
In [29]: g_pickled, u_pickled = pickle.loads(pickle.dumps((g, u)))
In [30]: print("g_pickled.universe is u_pickled: ",
....: u_pickled is g_pickled.universe)
....:
g_pickled.universe is u_pickled: True
If multiple MDAnalysis.core.groups.AtomGroup
s are bound to the same
MDAnalysis.core.universe.Universe
, they will also be bound to the same one
after serialization:
In [1]: u = mda.Universe(PSF, DCD)
In [2]: g = u.atoms[:2]
In [3]: h = u.atoms[2:4]
In [4]: g_pickled = pickle.loads(pickle.dumps(g))
In [5]: h_pickled = pickle.loads(pickle.dumps(h))
In [6]: print("g_pickled.universe is h_pickled.universe : ",
...: g_pickled.universe is h_pickled.universe)
...:
g_pickled.universe is h_pickled.universe : False
In [7]: g_pickled, h_pickled = pickle.loads(pickle.dumps((g, h)))
In [8]: print("g_pickled.universe is h_pickled.universe: ",
...: g_pickled.universe is h_pickled.universe)
...:
g_pickled.universe is h_pickled.universe: True