density module¶

Electron density calculated from individual atoms.

Todo

Near line 2440 want to investigate compatibility of the bfgs and l-bfgs parameters, and print which is being called. Currently getting some unexpected results with image_refine not terminating on convergence and then sometimes an exception (seen when running w/ bsoft). (01/30/21)

Module author: Michael S. Chapman <chapmanms@missouri.edu>

Authors:

Michael S. Chapman <chapmanms@missouri.edu>

Oregon Health & Science University & University of Missouri

Version:

1, Nov 26, 2024

Changed in version 02/14/2010.

Changed in version 11/4/12: - split from atomic_density.py

Changed in version 0.4.2: (10/24/13)

Changed in version 0.5.0: (5/1/15) ReStructured Text documentation

Changed in version 0.5.1: (5/17/18) calculate() offers (atoms) selection option needed for Andrew’s rotamer evaluation.

Changed in version 1.0.0: 11/17/20 Python 2.7 –> 3.6

Changed in version 1.0.2: 02/02/21 scipy.optimize.fmin_* standardized into scipy.optimize.minimize for more uniform convergence criteria.

Todo

partials calculation - exclude parameter types that are not needed. Make xyz the default (as per CNS), else read from “refine”.

class density.Arguments(imports=[], main=None, *args, **kwargs)¶

Bases: Arguments

Manager for command-line arguments from main and imported modules.

This is a template to be copied into modules, for the handling of command- line options.

methods export() and domestic() should be overridden by the module subclass with declarations of the command-line arguments needed by the module.

Parameters:

imports ((list of) ArgumentParser(s)) – (list of) module(s) from which to obtain “parent” ArgumentParser objects for inclusion.
main (bool|NoneType) – Add information and arguments appropriate for a main program (or not); if None, will determine by whether this class is defined in __main__.

domestic()¶: Defines options used only when main program and not when imported.

export()¶: Defines options used in both stand-alone and imported modes.

exception density.ArrayBoundsError(*args)¶

Bases: Exception

Wrap pending exception (from numpy), if any, in addition to other info.

class density.AtomicDensity(method='dstar-shells', form_factors=None, d_min=None, d_max=None, shells=15, filter_resolution=None, use_cache=True, cut_off=7.5, verbose=False)¶

Bases: object

Electron density surrounding isolated atom by various methods.

Density calculation can be accelerated through the use of previously cached densities from atoms of the same type and same (similar!) B-factor. See “cache_*” methods below. If enabled, method density will take advantage of the cache. The cache stores tables of density (and partial derivatives) with respect to distance (from center of atom) and rasterized B-factor. The default caching has some limitations:

Caching requires that the atomic density cut-off distance (beyond which density assumed zero) remain constant, as it should for a run of RSRef (some checks provided).

Chaching ignores differences less than b_precision (see self.cache_init()). So, there are limitations on its use during B-factor/envelope refinement:

For refinement of overall B-factors or EM envelope or the resolution of a low-pass Butterworth filter, previous densities need to be purged with cache_reset(). Caching will still help in calculating densities for atoms with identical individual B-factors.

For refinement of individual B-factors, caching should be disabled with cache_off() as caching will ignore the effects of small changes between cycles and risks one atom coming within b_precision of another. With freely changing B-factors, there is little to be gained by caching anyway.

With group B-factor refinement, it is safest to disable (cache_off()). Caching may speed computation if there are many atoms of the same B-factor in each group. If you want to risk it, use cache_reset() between cycles and use a small cache_init(b_precision=0.000001) to minimize the chance of atoms being accidentally considered the same.

Caching should also be disabled (cache_off()) with: (1) finite difference estimates of partial derivatives (unit-testing); and (2) direct (vs. interpolative) density calculation (method = ‘_calculate_all_pts’, used mostly for unit-testing).

Variables:

dst_limit – limits of the resolution bins, going from low to high resolution.
dst – (geometric) mean resolution of each bin.
dst_limit_cubed – limits (above) cubed.
butterworth_order (float) – for low-pass filter on atomic density. If reset, re-bind self.filter_resolution to force re-calculation of the filter values.

Parameters:

method (str) – to be used in atomic density calculation. Currently only dstar-shells (as implemented in the original 1995 RSRef, which requires parameters form_factors, shells, d_min, d_max. In the future, might also support spherical Gaussian and other possibilities.
form_factors (form_factor_table.FormFactorTable) – tabulated standard scattering factors for all atom types.
d_min (float) – high resolution limit (minimal d-spacing; Å).
d_max (float) – low resolution limit (maximal d-spacing; Å).
shells (int) – number of shells to use between d_min & d_max. Accuracy increases linearly & speed decreases with larger number. For a single C-atom (B=0), 5 shells gives 0.9% precision, 20 shells (overkill) 0.6%, compared to direct transform.
filter_resolution (float) – Soft high resolution limit (Å) for a Butterworth 5th order low pass filter to be applied to the model density, or None
use_cache (bool) – speed calculation by saving density (/partials) vs. distance, initial setting. See notes on AtomicDensity class documentation, and methods cache_on and cache_off etc..
cut_off (float) – atomic density considered zero beyond this distance in Å

cache_init(distance=None, b_precision=0.003, verbose=False)¶

Prepares cache for tables of atomic density vs. distance.

Parameters:

distance (float or ndarray-like) – distances from atom for density calculation.
b_precision (float) – B-factors considered same if w/in b_precision. Default is the formatting precision of a PDB file.

cache_off(reset=False, verbose=False)¶

Switches off cache storage and use of atomic densities / derivatives.

Parameters:

reset (bool) – force clearing of the cache - use if don’t trust current values for future use.
verbose (bool) – provide documentation.

cache_on(distance=None, b_precision=None, reset=False, verbose=False)¶

Switches on cache storage and use of atomic densities / derivatives.

Parameters:

distance (float or ndarray-like) – distances from atom for density calculation, checked if provided and cache reinitialized if necessary.
b_precision (float) – B-factors considered same if w/in b_precision. Default is the formatting precision of a PDB file, checked if provided and cache reinitialized if necessary.
reset (bool) – force clearing of the cache - use if don’t trust prior calculations.
verbose (bool) – provide documentation.

cache_reset(verbose=False)¶: Reset the density cache, called for example when added_b changed.

density(distance=None, atom_type='C', type_num=None, b=0.0, b_refining=True, check_caching=False)¶

Electron density (& derivatives) @ distance(s) from atom, using caching.

Pass through routine for _density, adding support for caching.

Caching saves up to ~1.8 ms/atom, which can be 50% for small atom boxes (1/2-width of ~d_min). This depends on the variety of atom types and B. No measurable overhead for 10,000 atoms w/ different B-factors, but may be switched off. (Of greatest use when densities are cached with unit occupancies.)

Parameters:

distance (float or ndarray-like or NoneType) – distance(s) from atom center (self.cache_distance if None).
atom_type (str) – as in PDB file.
type_num (int) – (optional) numerical atom type designator that has precedence over atom_type. (Minor speed increase.) Must correspond to atom_type in form_factors.
b (float) – atomic temperature factor (Å²).
b_refining (bool) – cache will store whether this atom has a B-factor that is being refined for a more atomic control of the use of cached data that is not yet implemented.
check_caching (bool) – will perform a (computer intensive) check asserting that cached densities are the same as calculated de novo (for unit testing).

Returns:

electron density at distance(s) from atom center.

Return type:

float or ndarray

density_array = <numpy.vectorize object>¶

density_wo_cache(distance, atom_type='C', type_num=None, b=0.0)¶

Electron density and partial derivatives at a distance from atom center.

Parameters:

distance (float or ndarray-like) – distance(s) from atom center.
atom_type (string) – as in PDB file.
type_num (int) – (optional) numerical atom type designator that has precedence over atom_type. (Minor speed increase.) Must correspond to atom_type in form_factors.
b (float, Å²) – atomic temperature factor (already incorporating overall B and /or envelope function).

Returns:

electron density at distance(s) from atom center and its its derivatives with respect to distance, B-factor and resolution. (The resolution partial derivative is with respect to a low-pass filter to be applied to the atomic density.)

Return type:

4 * (float or ndarray)

Note

Vectorized: if distance is array-like, the density is calculated at all distances in little more than the time for a single distance.

Todo

Consider wrapping in another layer of vectorization for calculation over multiple atoms.

Todo

Consider class method that pre-calculates attributes that depend on distance and not on atom type or B-factor.

distances(spacing=0.05)¶

Distances for density calculation for interpolation and caching.

Parameters:: spacing (float) – spacing between points (to be multipled by high resolution limit). With linear interpolation, preliminary tests indicate no loss of precision with spacing = 1/20. With cubic interpolation, 1/2 may suffice. Computation may depend inversely on spacing.
Returns:: distances (Å) from atom center where density will be calculated.
Return type:: ndarray(dtype=float)

property filter_resolution¶

profile(b=10.0, atom_type='C', max_dist=4.0, points=17)¶

Print calculated density profile of an isolated atom.

Incorporates resolution limits and filter resolution that have been specified, but not (automatically) any overall B or envelope function.

Parameters:

b (float) – thermal factor (add in any overall B or envelope function)
atom_type (str)
max_dist (flost) – tabulate up to this distance from atom center
points (int) – number of columns in table

three_d_transform_weights(distance)¶

Pre-calculations for 3D Fourier transform of centro-symmetric function.

For example, weights to calculate atomic density from scattering factors do not depend on the atomic parameters, and may be pre-calculated.

The returned weights are the 3D Fourier transforms of spherical shells, calculated as the difference between the FTs of solid spheres. The function to be multiplied by the returned array must be sampled at the same self.dst points. These are calculated from self.d_min, self.d_max and self.shells on class instantiation. Take care as there is no checking.

For further efficiency, the results could be tabulated for a range of distances for subsequent interpolation.

Todo

Add calculation of partial derivatives of the weights with respect to distance (see rres_local.c).

Parameters:: distance (float or ndarray) – distance(s) (from the center) at which Fourier transform is to be evaluated.
Returns:: weights, derivatives-by-distance which when multiplied by some centro-symmetric function, sampled at self.dst, and summed, yield the 3D Fourier transform and its derivative at distance from the center (from a 1D summation). weights is 1-D if distance is scalar, or a 2-D array with a row for each distance.
Return type:: 2 ndarray(shape=(distance.size, self.shells))

Attention

distance > 0.0001 is imposed to avoid instability in the numerical calculation of the G-function weight 2πrd*cos{2πrd*}/(2πr)³.

class density.ModelMap(comparison_map=None, calculator=None, near=3.6, cut_off=7.5, require_pts=2.0, symmetry=None)¶

Bases: object

Calculation of density near model atoms and comparison to observed density.

Calculations are on the grid of the (input) experimental map. Several methods are provided for calculating the model density in decreasing order of likely interest:

calculate_box_mask: Atom-masked distance interpolation <~ 3ms/atom with bounds-checking.

calculate_spherical_mask: Atom-masked distance interpolation ~3ms/atom with bounds-checking. (This might be the most erudite, but it is slower to determine which voxels are w/in the sphere than vectorized calculation for all voxels within the enclosing rhomboid.)

calculate_interpolate: No masking (all map) calculation ~12ms/atom.

calculate_all_pts: Direct independent calculation each pt, ~300ms/atom

Calculate_box_mask is the default, but the simpler, more accurate methods are maintained for regression testing.

Variables:

scale_model_to_observed (bool) – default = True. False can be used to obtain statistics for models so far out of density that scale nears zero. However, scaling to the model (False) can lead to a trivial solution in refinement where Bs are refined higher, occupancies lower to reduce the scale & residual. False is a fragile option, because I have not figured the effects on partial calculation, which I have programmed to fail in this case.

Parameters:

comparison_map (Map) – experminental map.
calculator (AtomicDensity) – definition of how the density of atoms is calculated.
near (float, Å) – use grid points within this distance of any atom for residuals etc.
cut_off (float, Å) – atomic density considered zero beyond this distance.
require_pts (float) – check that all map grid points within this distance of any atom evaluated are available (if > 0.0)
symmetry (Symmetry) – to apply to the atomic model.

Todo

Consider whether better to initialize instance of AtomicDensity within ModelMap.

Todo

Methods to pre-processes the comparison map: resample, expand symmetry etc..

class AtomMask(modelmap, cut_off=5.0, checkbounds=False)¶

Bases: object

Supports operations on the map that are masked to the locale of atoms.

Parameters:

modelmap (ModelMap) – the current ModelMap instance, usually “self”
cut_off (float) – distance from atom (A) beyond which density is set zero or otherwise ignored.
checkbounds (bool) – throw ArrayBoundError exception if grid points within cut_off are outside the map. Set to False if using a larger cut_off (to include contributions of neighboring atoms) than will be used for model/map comparisons. Instead, use ModelMap.inbounds to check.

Variables:

mask (ndarray(shape=map.shape,type=bool)) – True for map pixels w/in cut_off of any atom being evaluated.
stray_atoms (int) – number of atoms with out-of-bounds grid points; could be used for warnings / errors if not self.checkbounds.

Warning

checkbounds is not implemented for all density calculation methods.

Note

map density assumed to start at array index [0,0,0].

distances(xyz, evaluate=True)¶

Distances of neighboring map grid points to atom. Molecular mask.

Parameters:

xyz (nd.matrix, shape=(3,1)) – atomic coordinate, orthogonal A
evaluate (bool) – True if the that is being evaluated. Exceptions will only be thrown for out-of-bounds atoms that are being evaluated.

Returns:

(map_slice, distances), box surrounding the atom & distances to grid points within the box from the atom.

Return type:

tuple of 3 slice objects, float ndarray.shape(map_slice)

Todo

faster bounds-checking on limits rather than array assingment?

atom_box(distance)¶

Range of map that could be within distance of a neighboring atom.

Sphere is expanded such that if centered on the nearest grid point to an atom, the box encloses all grid points within distance of the atom.

Parameters:: distance (float) – Å.
Returns:: Half-width of enclosing box in map grid steps.
Return type:: int ndarray, shape=3

property atomic_partials_current¶: Whether atomic partial derivatives are current with latest model.

calculate(atoms, added_b=0.0, lazy=True, tolerance=None, method=None, scale_constants=None, local_scale=True, selection=None, verbose=False)¶

Calculate density from model atoms by one of several methods.

Parameters:

atoms (Atoms) – list of coordinates & atomic properties
added_b (float, Å²) – overall isotropic B-factor and/or EM envelope exponential to be added to atomic B-factors
lazy (bool) – Perform the calculation only if the structure or map has changed since the last calculation (change > tolerance, see below).
tolerance (float) – for paramter p & p’, deemed changed if abs(p-p’) > tolerance*max(abs(p),abs(p’)) and max(abs(p),abs(p’)) > tolerance. Should not be less than machine precision.
method (str) – one of _select_method.methods.keys() that controls whether density values are interpolated and grid points masked. Set by ModelMap.__init__, and should only be reset here for unit testing.
scale_constants (tuple(float, float)) – if prior scaling coefficients are to be used, this tuple should contain the multiplier and constant to be added subsequently. scale_constants=None will force recalculation.
local_scale (bool) – for scaling use only grid points close to atoms Local scaling must be used with masked density calculation, but may need to be turned off for good overall R-factor statistics.
selection (atoms.Selection) – Evaluate over selected atoms only.

Returns:

density, nearest_atom, multipler, constant (that were used to scale calculated map)

Return type:

float ndarray, float ndarray, float, float

changed(atoms=None, added_b=0.0, tolerance=None, verbose=False)¶

Test if changes in atoms / parameters require density recalculation.

Parameters:

atoms (Atoms or NoneType) – atomic coordinates, B-factors, occupancies If None, will not check for changes in atoms.
added_b (float) – combined overall B-factor / EM envelope correction.
tolerance (float) – for paramter p & p’, deemed changed if abs(p-p’) > tolerance*max(abs(p),abs(p’)) and max(abs(p),abs(p’)) > tolerance. Should not be less than machine precision.

Returns:

True if any parameter changed

Return type:

bool

correlation(verbose=True)¶

Pearson correlation coefficient model vs experimental map, local & overall.

Returns:: correlation coefficients - local & overall
Return type:: float, float

Todo

support masking of map.

distances(atom, map_index)¶

distances_assume_orthonormal(atom, map_index)¶

dmagnification(function='sq_diff', atoms=None, dxyz=None)¶

Partial of the objective function with respect to magnification.

This method calculates the partial derivative needed to optimize the magnification of the input map for best agreement with the atomic model.

Magnification is used in the microscopic sense. An increase means that the field of view (the map) contains less object, i.e. the Å per pixel are reduced and the unit cell dimensions are decreased. This is the inverse of the scale parameter in EMAN. Magnification is defined as a scaling from the Cartesian-space origin.

Conceptually, changed magnification can be thought of as either:

Division of the map dimensions by M.

Magnification of the coordinates by M.

In calculating the partial derivative, the 2nd approach can take advantage of atomic partial derivatives (if already calculated), while the 1st approach might be quicker if atomic partials are not available. Note that we do not actually magnify the coordinates as this would distort stereochemistry.

Approach 2: We can take advantage of the partial derivatives of the objective function, Γ with respect to the atomic coordinates x_i, ∂Γ/∂x_i

∂Γ/∂M = Σ_i ∂Γ/∂x_i . ∂x_i/∂M = Σ_i ∂Γ/∂x_i . x_i

Approach 1: This requires no prior calculation of coordinate partials. It should be quicker when independent of coordinate refinement, but the current implementation is significantly less accurate for reasons that are not clear.

Consider a least-squares residual:

Σ = Σ_grid (ρ_o - ρ_c)²

Adjusting (only) the observed map:

∂Γ/∂M = 2(ρ_o - ρ_c) . ∂ρ_o/∂M

By increasing the magnification, we change the map index coordinates m at any point in Carthesian space, m, where D,t is the deorthogonalization operator such that m = Dx + t.

∂ρ_o/∂M = ∂ρ_o /∂m . ∂m/∂M

The first term is just the gradient of the observed map, calculated numerically by central differences. Now the 2nd term. For a fixed set of atomic coordinates, the map coordinates will change w/ magnification:

m = Dx + t; ∂m/∂M = Dx = (m - t)/M

Here, we will restrict ourselves to evaluating the derivative at M=1, because we need matching ρ_o, ρ_c grid points to avoid interpolation. i.e. this approach will work when the maps are updated between every cycle, so that (locally) M=1. In summary, in approach 2, we evaluate:

∂Γ/∂M = 2(ρ_o - ρ_c) . ∂ρ_o/∂m . (m - t)

Accuracy has been assessed by comparison to finite difference calculations. Accuracy depends on the magnification error, resolution, distance of atoms from density, etc.. Test cases so far, give a convergence radius of approach 1 at about 2.5% and approach 2 at about 4%. Within this, errors range from 10-30% for approach 1 and 1-15% for approach 2. It is not clear whether the larger errors of approach 1 result from a undetected bug, or intrinsic limitations of the algorithm. Neither are very impressive, but both are robust in the sign of the derivative changing as you go through the optimum. Thus, both should allow refinement, even if conergence is slow. (It is suspected that convergence is poor, because small changes in magnification result in large displacements which can place atoms close to mismatched peaks in the density.)

Parameters:

function (str) – objective function, one of the supported strings below.
atoms (Atoms) – coordinates etc., required if to use partials.
dxyz (ndarray.shape(3,N)) – partial derivatives of the objective function re: atoms.xyz. If not None, approach 2 (see above) will be used.

Warning

map_calc.calculate() must have been run w/ latest atoms (approach 1) - no checks!

Warning

dxyz must be current with atoms (approach 2) - no checks!

Returns:: ∂Γ/∂M
Return type:: float

Warning

Possible bug in approach 1 with derivatives up to 30% in error. It might also be a limitation of the algorithm, see comments in “Accuracy” above.

getVolume(threshold=None, ignoreMask=False)¶

image_refine(atoms, magnification=1.0, added_b=0.0, resolution=0.0, optimize_magnification=False, optimize_overall_b=False, optimize_filter=False, finite_difference=False, bounds={'magnification': (0.9, 1.1), 'resolution': (0.6, 1.5)}, max_cycles=10, min_improvement=0.0, min_grad=0.0, verbosity=0)¶

Returns image (map) parameters fit to model (w/o applying them).

(Optionally constrained to improve stability.)

Differs from RSRef.refine in that all refining atoms will be used without reference to atom or group selections.

Warning

Convergence can be poor, due to:

Covariance of parameters: envelope and resolution have similar effects upon the density. This becomes a greater problem at lower resolution and with smaller structures (fewer data points). The solution may be to refine only one of the paramters, in turn, starting with the worst.
Inappropriate (internal) scaling of different parameter types, so that shifts are concentrated in a sub-set of components. As appropriate values depend on the system and stage of refinement, default values may need to be adjusted. See command-line options.
Low pass resolution << (hard) resolution limit, so that the filter resolution has little effect.
Analytical partial derivatives are approximated by sum of partials from individual atoms. If map_use is small relative to the actual width of an atom’s density, then the impact of changing its density parameters on neighboring atoms may be underestimated. This can be mitigated by using a larger map_use (just takes longer), or setting atom_extent = map_use, or using finite difference partials - but only if it looks like there is a problem.
Poor initial guesses for parameters: A large error in magnification (off by 10%) can yeild erroneous partials for the B-factors (atoms are too far from correct locations).

Warning

Constrained refinement (default, see below) may be needed to avoid an excessive trial shift on the first cycle resulting in out-of-bounds atoms. The problem has been lessened, but not completely eliminated by the rescaling of internal refinement units (Sept 2012).

Parameters:

atoms (Atoms) – Atomic coordinates etc.
magnification (float) – starting magnification.
added_b (float) – Overall B-factor and/or EM envelope exponential (Å²)
resolution (float) – for low-pass filtering of atomic density.
optimize_magnification (bool) – Refine magnification (else fix).
optimize_overall_b (bool) – Refine overall b-factor or EM envelope function (else fix).
optimize_filter (bool) – Refine low-pass filter parameters (else fix). (May not be possible to refine filter and B-factor/envelope simultaneously.)
finite_difference (bool) – Calculate derivatives by finite differences instead of analytically. Takes ~2 x longer, but may be more stable, particularly at low resolution, because it is not subject to the same approximations as the analytical derivatives.
bounds (dict(tuple) or NoneType) – fractional limits on refined parameter values. keywords: ‘magnification’, ‘added_b’, ‘resolution’. Thus ‘magnification’:(0.9, 1.1) would correspond to +/-10%. If either bounds=None, or all individual limits are None, then the default optimizer will be bfgs. Otherwise the default will be the bound-constrained optimizer, l_bfgs_b. Constrained is recommended as it usually converges faster, and avoids a common exception of a first optimization step using a magnification that puts atoms outside the map.
max_cycles (int) – maximum number of max_cycles (hard limit for bfgs; approx. for constrained l-bfgs, as converted to a corresponding limit on function / gradient calculations).
min_improvement (float) – refinement (bounded l-bfgs-b) is stopped when relative change in function < min_improvement, i.e. 1.e-2 would stop when change < 1% of current residual value.
min_grad (float) – refinement stopped when the largest derivative magnitude falls below this value (internal parameter units).
verbosity (int) – -1 for silent, 0 for terse, 1 for verbose

Returns:

refined incremental magnification factor, added overall B, resolution for low-pass filtering of atomic density.

Return type:

float x 3

Todo

Consider combined atomic / image refinement, but suspect that would not converge well.

Todo

Consider refinement of map origin, but this would be equivalent to overall rigid model (translational) refinement.

density module¶

Table of Contents

Table of Contents