Reading and writing files

Input

Read information from topology and coordinate files to create a Universe:

import MDAnalysis as mda
u = mda.Universe('topology.gro', 'trajectory.xtc')

A topology file is always required for a Universe, whereas coordinate files are optional. Some file formats provide both topology and coordinate information. MDAnalysis supports a number of formats, which are automatically detected based on the file extension. For example, the above loads a GROMACS XTC trajectory. Multiple coordinate files can be loaded, as described below; the following code loads two CHARMM/NAMD DCD files and concatenates them:

u = mda.Universe('topology.psf', 'trajectory1.dcd', 'trajectory2.dcd')

Some formats can be loaded with format-specific keyword arguments, such as the LAMMPS DATA atom_style specification.

Reading multiple trajectories

A Universe can load multiple trajectories, which are concatenated in the order given. One exception to this is with XTC and TRR files. If the continuous=True flag is passed to Universe, MDAnalysis will try to stitch them together so that the trajectory is as time-continuous as possible. This means that there will be no duplicate time-frames, or jumps back in time.

As an example, the following depicted trajectory is split into four parts. The column represents the time. As you can see, some segments overlap. With continuous=True, only the frames marked with a + will be read.

part01:  ++++--
part02:      ++++++-
part03:            ++++++++
part04:                         ++++

However, there can be gaps in time (i.e. frames appear to be missing). Ultimately it is the user’s responsiblity to ensure that the trajectories can be stitched together meaningfully.

Note

While you can set continuous=True for either XTC or TRR files, you cannot mix different formats.

More information can be found at the API reference for ChainReader.

Trajectory formats

If no format keyword is provided, ChainReader will try to guess the format for each file from its extension. You can force ChainReader to use the same format for every file by using the format keyword. You can also specify which format to use by file, by passing in a sequence of (filename, format) tuples.

In [1]: import MDAnalysis as mda

In [2]: from MDAnalysis.tests.datafiles import PDB, GRO

In [3]: u = mda.Universe(PDB, [(GRO, 'gro'), (PDB, 'pdb'), (GRO, 'gro')])

In [4]: u.trajectory
Out[4]: <ChainReader containing adk_oplsaa.gro, adk_oplsaa.pdb, adk_oplsaa.gro with 3 frames of 47681 atoms>

In-memory trajectories

Reading trajectories into memory

If your device has sufficient memory to load an entire trajectory into memory, then analysis can be sped up substantially by transferring the trajectory to memory. This makes it possible to operate on raw coordinates using existing MDAnalysis tools. In addition, it allows the user to make changes to the coordinates in a trajectory (e.g. through AtomGroup.positions) without having to write the entire state to file.

The most straightforward way to do this is to pass in_memory=True to Universe, which automatically transfers a trajectory to memory:

In [5]: from MDAnalysis.tests.datafiles import TPR, XTC

In [6]: universe = mda.Universe(TPR, XTC, in_memory=True)

MDAnalysis uses the MemoryReader class to load this data in.

Transferring trajectories into memory

The decision to transfer the trajectory to memory can be made at any time with the transfer_to_memory() method of a Universe:

In [7]: universe = mda.Universe(TPR, XTC)

In [8]: universe.transfer_to_memory()

This operation may take a while (passing verbose=True to transfer_to_memory() will display a progress bar). However, subsequent operations on the trajectory will be very fast.

Building trajectories in memory

MemoryReader can also be used to directly generate a trajectory as a numpy array.

In [9]: from MDAnalysisTests.datafiles import PDB

In [10]: from MDAnalysis.coordinates.memory import MemoryReader

In [11]: import numpy as np

In [12]: universe = mda.Universe(PDB)

In [13]: universe.atoms.positions
Out[13]: 
array([[ 52.017,  43.56 ,  31.555],
       [ 51.188,  44.112,  31.722],
       [ 51.551,  42.828,  31.039],
       ...,
       [105.342,  74.072,  40.988],
       [ 57.684,  35.324,  14.804],
       [ 62.961,  47.239,   3.753]], shape=(47681, 3), dtype=float32)

The load_new() method can be used to load coordinates into a Universe, replacing the old coordinates:

In [14]: coordinates = np.random.rand(len(universe.atoms), 3)

In [15]: universe.load_new(coordinates, format=MemoryReader);

In [16]: universe.atoms.positions
Out[16]: 
array([[0.17314637, 0.09900497, 0.12876734],
       [0.22613522, 0.71016735, 0.06491213],
       [0.62965196, 0.4768967 , 0.13714159],
       ...,
       [0.26517144, 0.81196684, 0.9158056 ],
       [0.09677985, 0.2001779 , 0.17556828],
       [0.9661267 , 0.93336505, 0.7824625 ]],
      shape=(47681, 3), dtype=float32)

or they can be directly passed in when creating a Universe.

In [17]: universe2 = mda.Universe(PDB, coordinates, format=MemoryReader)

In [18]: universe2.atoms.positions
Out[18]: 
array([[0.17314637, 0.09900497, 0.12876734],
       [0.22613522, 0.71016735, 0.06491213],
       [0.62965196, 0.4768967 , 0.13714159],
       ...,
       [0.26517144, 0.81196684, 0.9158056 ],
       [0.09677985, 0.2001779 , 0.17556828],
       [0.9661267 , 0.93336505, 0.7824625 ]],
      shape=(47681, 3), dtype=float32)

In-memory trajectories of an atom selection

Creating a trajectory of an atom selection requires transferring the appropriate units. This is often needed when using Merge() to create a new Universe, as coordinates are not automatically loaded in.

Output

Frames and trajectories

MDAnalysis Universes can be written out to a number of formats with write(). For example, to write the current frame as a PDB:

from MDAnalysis.tests.datafiles import PDB, TRR
u = mda.Universe(PDB, TRR)
ag = u.select_atoms("name CA")
ag.write("c-alpha.pdb")

Pass in the frames keyword to write out trajectories.

ag.write('c-alpha_all.xtc', frames='all')

Slice or index the trajectory to choose which frames to write:

ag.write('c-alpha_skip2.trr', frames=u.trajectory[::2])
ag.write('c-alpha_some.dcd', frames=u.trajectory[[0,2,3]])

Alternatively, iterate over the trajectory frame-by-frame with Writer(). This requires you to pass in the number of atoms to write.

with mda.Writer('c-alpha.xyz', ag.n_atoms) as w:
    for ts in u.trajectory:
        w.write(ag)

You can pass keyword arguments to some format writers. For example, the LAMMPS DATA format allows the lengthunit and timeunit keywords to specify the output units.

Pickling

MDAnalysis supports pickling of most of its data structures and trajectory formats. Unsupported attributes can be found in PR #2887.

In [19]: import pickle

In [20]: from MDAnalysis.tests.datafiles import PSF, DCD

In [21]: psf = mda.Universe(PSF, DCD)

In [22]: pickle.loads(pickle.dumps(psf))
Out[22]: <Universe with 3341 atoms>

As for MDAnalysis.core.groups.AtomGroup, during serialization, it will be pickled with its bound MDAnalysis.core.universe.Universe. This means that after unpickling, a new MDAnalysis.core.universe.Universe will be created and be attached to the new MDAnalysis.core.groups.AtomGroup. If the Universe is serialized with its MDAnalysis.core.groups.AtomGroup, they will still be bound together afterwards:

In [23]: import pickle

In [24]: from MDAnalysis.tests.datafiles import PSF, DCD

In [25]: u = mda.Universe(PSF, DCD)

In [26]: g = u.atoms

In [27]: g_pickled = pickle.loads(pickle.dumps(g))

In [28]: print("g_pickled.universe is u: ", u is g_pickled.universe)
g_pickled.universe is u:  False

In [29]: g_pickled, u_pickled = pickle.loads(pickle.dumps((g, u)))

In [30]: print("g_pickled.universe is u_pickled: ",
   ....:        u_pickled is g_pickled.universe)
   ....: 
g_pickled.universe is u_pickled:  True

If multiple MDAnalysis.core.groups.AtomGroups are bound to the same MDAnalysis.core.universe.Universe, they will also be bound to the same one after serialization:

In [31]: u = mda.Universe(PSF, DCD)

In [32]: g = u.atoms[:2]

In [33]: h = u.atoms[2:4]

In [34]: g_pickled = pickle.loads(pickle.dumps(g))

In [35]: h_pickled = pickle.loads(pickle.dumps(h))

In [36]: print("g_pickled.universe is h_pickled.universe : ",
   ....:        g_pickled.universe is h_pickled.universe)
   ....: 
g_pickled.universe is h_pickled.universe :  False

In [37]: g_pickled, h_pickled = pickle.loads(pickle.dumps((g, h)))

In [38]: print("g_pickled.universe is h_pickled.universe: ",
   ....:        g_pickled.universe is h_pickled.universe)
   ....: 
g_pickled.universe is h_pickled.universe:  True