Guessing Topology Attributes

Since version 2.8.0 MDAnalysis has introduced a new context-dependent guessing API to guess topology attributes that are not read from the file. This allows topology attributes such as masses, charges, and atom types to be guessed from existing information in a context-dependent manner (e.g. biological naming conventions) rather than file formats, as was previously done.

Supported guesser contexts

Guesser

Context name

Topology attributes guessed

DefaultGuesser

“default”

elements, types, masses, bonds, angles, dihedrals, impropers, aromaticities

Guessing at Universe creation

Topology attributes can be guessed at Universe creation by passing in topology attributes to guess to the to_guess keyword. By default, as of version 2.8.0, the default guesser is used to guess types and masses.

In [1]: import MDAnalysis as mda

In [2]: from MDAnalysis.tests.datafiles import PRM12

In [3]: u = mda.Universe(PRM12, context="default", to_guess=["types", "masses", "bonds"])

In [4]: u.atoms.bonds
Out[4]: <TopologyGroup containing 8947 bonds>

In general, guessing at Universe creation works very similarly to guessing using the guess_TopologyAttrs method interface documented below. The main difference is that passing guesser-specific keyword arguments such as fudge_factor and vdwradii into Universe creation is now deprecated and will be removed in version 3.0. Instead, we recommend specifying these arguments through an explicit call to the guess_TopologyAttrs().

Guessing using the guess_TopologyAttrs() interface

Topology attributes can also be guessed after Universe creation using the guess_TopologyAttrs() method. The to_guess, force_guess, and context keywords are used to specify which attributes to guess, which attributes to forcibly re-guess, and which guesser to use, respectively. These three keywords perform the same way here as they do in Universe creation.

As with Universe creation, the DefaultGuesser is used as the default context. The following example demonstrates how to guess atom types and masses after Universe creation.

In [5]: u = mda.Universe(PRM12, to_guess=[]) # in v2.8.0 masses and types are guessed by default

In [6]: u.guess_TopologyAttrs(to_guess=["types", "masses"])

In [7]: u.atoms.types
Out[7]: 
array(['HO', 'OH', 'CI', ..., 'OW', 'HW', 'HW'],
      shape=(8923,), dtype=object)

The context can be specified either using a string (e.g., "default") or an already created Guesser object (which will have been derived from the base class GuesserBase). It may be convenient to pass in an already-created Guesser object (such as the DefaultGuesser) if there are particular keywords you want to use in guessing methods, such as the fudge_factor, vdwradii or lower_bound keywords for controlling bond guessing. However, if additional keyword arguments are passed into guess_TopologyAttrs(), they will replace any existing arguments inside the guesser.

In [8]: from MDAnalysis.guesser import DefaultGuesser

In [9]: from MDAnalysis.tests.datafiles import CONECT    # example data file

In [10]: u = mda.Universe(CONECT)

In [11]: guesser = DefaultGuesser(u, fudge_factor=1.2)

In [12]: u.guess_TopologyAttrs(to_guess=["bonds"], context=guesser, fudge_factor=0.5)

In [13]: guesser._kwargs["fudge_factor"]
Out[13]: 0.5

Forcibly re-guessing

MDAnalysis will preferentially read topology attributes from file instead of re-guessing them, even if the attribute is passed into to_guess. For example, below, the types attributes reflects the actual atom types in the file.

In [14]: u = mda.Universe(PRM12, to_guess=["types", "masses"])

In [15]: u.atoms.types
Out[15]: 
array(['HO', 'OH', 'CI', ..., 'OW', 'HW', 'HW'],
      shape=(8923,), dtype=object)

Note

In cases where the attribute is only present for some atoms in the file (e.g. a patchy element column in a PDB), MDAnalysis will only guess the attribute for atoms where it is not present in the file.

To force MDAnalysis to re-guess a TopologyAttr, pass in the attribute to the force_guess keyword. This will force MDAnalysis to guess the attribute even if it is present in the file.

In [16]: u.guess_TopologyAttrs(to_guess=["types"], force_guess=["types"])

In [17]: u.atoms.types
Out[17]: array(['H', 'O', 'C', ..., 'O', 'H', 'H'], shape=(8923,), dtype=object)

Guessing bonds, angles, and torsions

Whereas most attributes are guessed at the atom, residue, or segment level, guessing topology objects such as bonds, angles, dihedrals and impropers behaves somewhat differently, and interacts with the force_guess keyword specially.

Specifically, if these connectivity attributes are guessed, they are by default guessed additively. Therefore, if bonds and other objects are guessed twice, the bonds of the second guess are added on. Below, we see the number of bonds increase when guessed again with a looser criteria.

In [18]: from MDAnalysis.tests.datafiles import CONECT

In [19]: u = mda.Universe(CONECT, to_guess=["bonds"])

In [20]: print(len(u.bonds))
1922

In [21]: u.guess_TopologyAttrs(to_guess=["bonds"], fudge_factor=1.2) # looser

In [22]: print(len(u.bonds))
9385

However, the number of bonds doesn’t change when the bonds are guessed again with stricter criteria – no new bonds are found (and also no bonds are removed either, even if they do not match the new criteria):

In [23]: u.guess_TopologyAttrs(to_guess=["bonds"], fudge_factor=0.5) # stricter

In [24]: print(len(u.bonds))
9385

Moreover, bonds are unique, so if the bonds are guessed again with the same criteria, the guessed bonds don’t change:

In [25]: u.guess_TopologyAttrs(to_guess=["bonds"], fudge_factor=0.5) # same

In [26]: print(len(u.bonds))
9385

However, if you want to forcibly overwrite all existing bonds, angles, dihedrals or impropers, you can pass the object to the force_guess keyword. This will remove all existing objects of that type before guessing. Below, we see the number of bonds has shrunk when guessed with stricter criteria:

In [27]: u.guess_TopologyAttrs(to_guess=["bonds"], force_guess=["bonds"], fudge_factor=0.5)

In [28]: print(len(u.bonds))
1912

Order of guessing

The order of the attributes guessed can matter in some cases. For example, bond guessing with the DefaultGuesser relies on looking up the vdW radii of the atoms involved by their atom type. That means that for file formats where the atom type is not a valid element, the atom type must be forcefully re-guessed for bond-guessing to work.

Note

The behaviour of looking up radii by type will likely change to looking up by element in version 3.0.

Therefore the following will not work (in MDAnalysis < 3.0) due to the types encoded in the PSF file:

In [29]: from MDAnalysis.tests.datafiles import PSF, DCD

In [30]: u = mda.Universe(PSF, DCD)

In [31]: print(u.atoms.types)
['56' '2' '2' ... '32' '72' '72']

In [32]: u.guess_TopologyAttrs(to_guess=["bonds"])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[32], line 1
----> 1 u.guess_TopologyAttrs(to_guess=["bonds"])

File ~/micromamba/envs/mda-user-guide/lib/python3.10/site-packages/MDAnalysis/core/universe.py:1658, in Universe.guess_TopologyAttrs(self, context, to_guess, force_guess, error_if_missing, **kwargs)
   1656 fg =  attr in force_guess
   1657 try:
-> 1658     values = guesser.guess_attr(attr, fg)
   1659 except NoDataError as e:
   1660     if error_if_missing or fg:

File ~/micromamba/envs/mda-user-guide/lib/python3.10/site-packages/MDAnalysis/guesser/base.py:159, in GuesserBase.guess_attr(self, attr_to_guess, force_guess)
    155 # Connection attributes should be just returned as they are always
    156 # appended to the Universe. ``force_guess`` handling should happen
    157 # at Universe level.
    158 if issubclass(top_attr, _Connection):
--> 159     return guesser_method()
    161 # check if the topology already has the attribute to partially guess it
    162 if hasattr(self._universe.atoms, attr_to_guess) and not force_guess:

File ~/micromamba/envs/mda-user-guide/lib/python3.10/site-packages/MDAnalysis/guesser/default_guesser.py:430, in DefaultGuesser.guess_bonds(self, atoms, coords)
    428 # check that all types have a defined vdw
    429 if not all(val in vdwradii for val in set(atomtypes)):
--> 430     raise ValueError(("vdw radii for types: " +
    431                       ", ".join([t for t in set(atomtypes) if
    432                                  t not in vdwradii]) +
    433                       ". These can be defined manually using the" +
    434                       f" keyword 'vdwradii'"))
    436 lower_bound = self._kwargs.get('lower_bound', 0.1)
    438 box = self._kwargs.get('box', None)

ValueError: vdw radii for types: 3, 52, 56, 70, 7, 25, 54, 81, 22, 51, 21, 9, 6, 31, 55, 1, 72, 32, 5, 30, 26, 29, 20, 23, 24, 73, 10, 2, 57, 50. These can be defined manually using the keyword 'vdwradii'

However, the snippet below will re-guess the types, and now bond-guessing can work as the elements have vdW radii defined:

In [1]: u.guess_TopologyAttrs(to_guess=["types", "bonds"], force_guess=["types"])

In [2]: print(u.atoms.types)
['N' 'H' 'H' ... 'C' 'O' 'O']