Atom selection language
AtomGroups can be created by selecting atoms using the MDAnalysis atom selection language:
In [1]: import MDAnalysis as mda
In [2]: from MDAnalysis.tests.datafiles import PSF, DCD
In [3]: u = mda.Universe(PSF, DCD)
In [4]: ala = u.select_atoms('resname ALA')
In [5]: ala
Out[5]: <AtomGroup with 190 atoms>
The select_atoms()
method of a
AtomGroup
or a
Universe
returns an
AtomGroup
. These two methods have different behaviour: while Universe.select_atoms
operates on all the atoms in the universe, AtomGroup.select_atoms
only operates on the atoms within the original AtomGroup. A single selection phrase always returns an
AtomGroup
with atoms sorted according to their
index in the topology. This is to ensure that there are not any duplicates,
which can happen with complicated selections. When order matters, you can pass in multiple phrases.
This page documents selection keywords and their arguments. select_atoms()
also accepts keywords that modify the behaviour of the selection string and the resulting AtomGroup
(documented further down this page). For example, you can:
Pass in named AtomGroups as arguments:
In [6]: sph_6 = u.select_atoms("sphzone 6 protein")
In [7]: u.select_atoms("around 3 group sph_6", sph_6=sph_6)
Out[7]: <AtomGroup with 81 atoms>
Turn off periodic boundary conditions for geometric keywords with
periodic=False
:
In [8]: u.select_atoms("around 6 protein", periodic=False)
Out[8]: <AtomGroup with 0 atoms>
Create dynamic UpdatingAtomGroups with
updating=True
:
In [9]: u.select_atoms("prop x < 5 and prop y < 5 and prop z < 5", updating=True)
Out[9]: <AtomGroup with 917 atoms, with selection 'prop x < 5 and prop y < 5 and prop z < 5' on the entire Universe.>
It is possible to export selections for external software packages with the help of Selection exporters.
Selection Keywords
The following describes all selection keywords currently understood by the selection parser. The following applies to all selections:
Keywords are case sensitive.
Atoms are automatically sequentially ordered in a resulting selection (see notes below on Ordered selections for how to circumvent this if necessary).
Selections are parsed left to right and parentheses can be used for grouping. For example:
In [10]: u.select_atoms("segid DMPC and not (name H* or type OW)")
Out[10]: <AtomGroup with 0 atoms>
String selections such as names and residue names can be matched with Unix shell-style wildcards. These rules include:
Using
*
in a string matches any number of any characters?
matches any single character[seq]
matches any character in seq;[!seq]
matches any character not in seq[!?]
selects empty strings
For example, the string
GL*
selects all strings that start with “GL”, such as “GLU”, “GLY”, “GLX29”, “GLN”.GL[YN]
will select all “GLY” and “GLN” strings. Any number of patterns can be included in the search. For more information on pattern matching, see thefnmatch
documentation.
Simple selections
- protein
Selects atoms that belong to a hard-coded set of standard protein residue names.
- backbone
Selects the backbone atoms of a hard-coded set of protein residues. These atoms have the names: CA, C, O, N.
- nucleic
Selects atoms that belong to a hard-coded set of standard nucleic residue names.
- nucleicbackbone
Selects the backbone atoms of a hard-coded set of nucleic residues. These atoms have the names: P, O5’, C5’, C3’, O3’
- nucleicbase
Selects the atoms in nucleobases.
- nucleicsugar
Selects the atoms in nucleic sugars. These have the names: C1’, C2’, C3’, C4’, O2’, O4’, O3’
- segid seg-name
select by segid (as given in the topology), e.g.
segid 4AKE
orsegid DMPC
- resid residue-number-range
resid
can take a single residue number or a range of numbers, followed by insertion codes. A range consists of two selections separated by a colon (inclusive) such asresid 1A:1C
. This selects all residues withresid==1
andicode in ('A', 'B', 'C')
. A residue number (“resid”) and icode is taken directly from the topology. Unlikeresnum
,resid
is sensitive to insertion codes.- resnum residue-number-range
resnum
can take a single residue number or a range of numbers. A range consists of two numbers separated by a colon (inclusive) such asresnum 1:5
. A residue number (“resnum”) is taken directly from the topology. Unlikeresid
,resnum
is insensitive to insertion codes.- resname residue-name
select by residue name, e.g.
resname LYS
- name atom-name
select by atom name (as given in the topology). Often, this is force field dependent. Example:
name CA
(for C-alpha atoms) orname OW
(for SPC water oxygen)- type atom-type
select by atom type; this is either a string or a number and depends on the force field; it is read from the topology file (e.g. the CHARMM PSF file contains numeric atom types). This uses the
Atom.type
topology attribute.- atom seg-name residue-number atom-name
a selector for a single atom consisting of segid resid atomname, e.g.
DMPC 1 C2
selects the C2 carbon of the first residue of the DMPC segment- altloc alternative-location
a selection for atoms where alternative locations are available, which is often the case with high-resolution crystal structures e.g.
resid 4 and resname ALA and altloc B
selects only the atoms of ALA-4 that have an altloc B record.- icode icode
a selector for atoms where insertion codes are available. This can be combined with residue numbers using the
resid
selector above. e.g.icode [!?]
selects atoms without insertion codes.- moltype molecule-type
select by the
moltype
topology attribute, e.g.moltype Protein_A
. At the moment, only the TPR format defines themoltype
.
Boolean
- not
all atoms not in the selection, e.g.
not protein
selects all atoms that aren’t part of a protein- and
the intersection of two selections, i.e. the boolean and. e.g.
protein and not resname ALA
selects all atoms that belong to a protein but are not in an alanine residue- or
the union of two selections, i.e. the boolean or. e.g.
protein and not (resname ALA or resname LYS)
selects all atoms that belong to a protein, but are not in a lysine or alanine residue
Geometric
The geometric keywords below all implement periodic boundary conditions by default when valid cell dimensions are accessible from the Universe. This can be turned off by passing in the keyword periodic=False
:
In [11]: u.select_atoms("around 6 protein", periodic=False)
Out[11]: <AtomGroup with 0 atoms>
- around distance selection
selects all atoms a certain cutoff away from another selection, e.g.
around 3.5 protein
selects all atoms not belonging to protein that are within 3.5 Angstroms from the protein- sphzone externalRadius selection
selects all atoms within a spherical zone centered in the center of geometry (COG) of a given selection, e.g.
sphzone 6.0 ( protein and ( resid 130 or resid 80 ) )
selects the center of geometry of protein, resid 130, resid 80 and creates a sphere of radius 6.0 around the COG.- sphlayer innerRadius externalRadius selection
selects all atoms within a spherical layer centered in the center of geometry (COG) of a given selection, e.g.,
sphlayer 2.4 6.0 ( protein and ( resid 130 or resid 80 ) )
selects the center of geometry of protein, resid 130, resid 80 and creates a spherical layer of inner radius 2.4 and external radius 6.0 around the COG.- cyzone externalRadius zMax zMin selection
selects all atoms within a cylindric zone centered in the center of geometry (COG) of a given selection, e.g.
cyzone 15 4 -8 protein and resid 42
selects the center of geometry of protein and resid 42, and creates a cylinder of external radius 15 centered on the COG. In z, the cylinder extends from 4 above the COG to 8 below. Positive values for zMin, or negative ones for zMax, are allowed.- cylayer innerRadius externalRadius zMax zMin selection
selects all atoms within a cylindric layer centered in the center of geometry (COG) of a given selection, e.g.
cylayer 5 10 10 -8 protein
selects the center of geometry of protein, and creates a cylindrical layer of inner radius 5, external radius 10 centered on the COG. In z, the cylinder extends from 10 above the COG to 8 below. Positive values for zMin, or negative ones for zMax, are allowed.- point x y z distance
selects all atoms within a cutoff of a point in space, make sure coordinate is separated by spaces, e.g.
point 5.0 5.0 5.0 3.5
selects all atoms within 3.5 Angstroms of the coordinate (5.0, 5.0, 5.0)- prop [abs] property operator value
selects atoms based on position, using property x, y, or z coordinate. Supports the abs keyword (for absolute value) and the following operators: <, >, <=, >=, ==, !=. For example,
prop z >= 5.0
selects all atoms with z coordinate greater than 5.0;prop abs z <= 5.0
selects all atoms within -5.0 <= z <= 5.0.
Similarity and connectivity
- same subkeyword as selection
selects all atoms that have the same subkeyword value as any atom in selection. Allowed subkeyword values are the atom properties:
name, type, resname, resid, resnum, segid, mass, charge, radius, bfactor
, the groups an atom belong to:residue, segment, fragment
, and the atom coordinatesx, y, z
. (Note thatbfactor
currently only works for MMTF formats.) e.g.same charge as protein
selects all atoms that have the same charge as any atom in protein.- byres selection
selects all atoms that are in the same segment and residue as selection, e.g. specify the subselection after the byres keyword.
byres
is a shortcut tosame residue as
- bonded selection
selects all atoms that are bonded to selection e.g.:
name H and bonded name N
selects only hydrogens bonded to nitrogens
Index
- index index-range
selects all atoms within a range of (0-based) inclusive indices, e.g.
index 0
selects the first atom in the universe;index 5:10
selects the 6th through 11th atoms, inclusive. This uses theAtom.index
topology attribute.- bynum number-range
selects all atoms within a range of (1-based) inclusive indices, e.g.
bynum 1
selects the first atom in the universe;bynum 5:10
selects 5th through 10th atoms, inclusive.Note
These are not the same as the 1-indexed
Atom.id
topology attribute.bynum
simply adds 1 to the 0-indexedAtom.index
.
Preexisting selections and modifiers
- group group-name
selects the atoms in the
AtomGroup
passed to the function as an argument named group-name. Only the atoms common to group-name and the instanceselect_atoms()
was called from will be considered, unlessgroup
is preceded by theglobal
keyword. group-name will be included in the parsing just by comparison of atom indices. This means that it is up to the user to make sure the group-name group was defined in an appropriateUniverse
.- global selection
by default, when issuing
select_atoms()
from anAtomGroup
, selections and subselections are returned intersected with the atoms of that instance. Prefixing a selection term withglobal
causes its selection to be returned in its entirety. As an example, theglobal
keyword allows forlipids.select_atoms("around 10 global protein")
— wherelipids
is a group that does not contain any proteins. Wereglobal
absent, the result would be an empty selection since theprotein
subselection would itself be empty. When callingselect_atoms()
from aUniverse
,global
is ignored.
Dynamic selections
By default select_atoms()
returns an
AtomGroup
, in which the list of atoms is
constant across trajectory frame changes. If
select_atoms()
is invoked with named
argument updating
set to True
, an
UpdatingAtomGroup
instance will be returned
instead.
# A dynamic selection of corner atoms:
In [12]: ag_updating = u.select_atoms("prop x < 5 and prop y < 5 and prop z < 5", updating=True)
In [13]: ag_updating
Out[13]: <AtomGroup with 917 atoms, with selection 'prop x < 5 and prop y < 5 and prop z < 5' on the entire Universe.>
It behaves just like an AtomGroup
object, with the difference that the selection expressions are re-evaluated
every time the trajectory frame changes (this happens lazily, only when the
UpdatingAtomGroup
object is accessed so that
there is no redundant updating going on):
In [14]: u.trajectory.next()
Out[14]: < Timestep 1 with unit cell dimensions [ 0. 0. 0. 90. 90. 90.] >
In [15]: ag_updating
Out[15]: <AtomGroup with 923 atoms, with selection 'prop x < 5 and prop y < 5 and prop z < 5' on the entire Universe.>
Using the group
selection keyword for
Preexisting selections and modifiers, one can
make updating selections depend on
AtomGroup
, or even other
UpdatingAtomGroup
, instances.
Likewise, making an updating selection from an already updating group will
cause later updates to also reflect the updating of the base group:
In [16]: chained_ag_updating = ag_updating.select_atoms("resid 1:1000", updating=True)
In [17]: chained_ag_updating
Out[17]: <AtomGroup with 923 atoms, with selection 'resid 1:1000' on another AtomGroup.>
In [18]: u.trajectory.next()
Out[18]: < Timestep 2 with unit cell dimensions [ 0. 0. 0. 90. 90. 90.] >
In [19]: chained_ag_updating
Out[19]: <AtomGroup with 921 atoms, with selection 'resid 1:1000' on another AtomGroup.>
Finally, a non-updating selection or a slicing/addition operation made on an
UpdatingAtomGroup
will return a static
AtomGroup
, which will no longer update
across frames:
In [20]: static_ag = ag_updating.select_atoms("resid 1:1000")
In [21]: static_ag
Out[21]: <AtomGroup with 921 atoms>
In [22]: u.trajectory.next()
Out[22]: < Timestep 3 with unit cell dimensions [ 0. 0. 0. 90. 90. 90.] >
In [23]: static_ag
Out[23]: <AtomGroup with 921 atoms>
Ordered selections
select_atoms()
sorts the atoms
in the AtomGroup
by atom index before
returning them (this is to eliminate possible duplicates in the
selection). If the ordering of atoms is crucial (for instance when
describing angles or dihedrals) or if duplicate atoms are required
then one has to concatenate multiple AtomGroups, which does not sort
them.
The most straightforward way to concatenate two AtomGroups is by using the
+
operator:
In [14]: ordered = u.select_atoms("resid 3 and name CA") + u.select_atoms("resid 2 and name CA")
In [15]: list(ordered)
Out[15]:
[<Atom 46: CA of type 22 of resname ILE, resid 3 and segid 4AKE>,
<Atom 22: CA of type 22 of resname ARG, resid 2 and segid 4AKE>]
A shortcut is to provide two or more selections to
select_atoms()
, which then
does the concatenation automatically:
In [16]: list(u.select_atoms("resid 3 and name CA", "resid 2 and name CA"))
Out[16]:
[<Atom 46: CA of type 22 of resname ILE, resid 3 and segid 4AKE>,
<Atom 22: CA of type 22 of resname ARG, resid 2 and segid 4AKE>]
Just for comparison to show that a single selection string does not work as one might expect:
In [17]: list(u.select_atoms("(resid 3 or resid 2) and name CA"))
Out[17]:
[<Atom 22: CA of type 22 of resname ARG, resid 2 and segid 4AKE>,
<Atom 46: CA of type 22 of resname ILE, resid 3 and segid 4AKE>]