CNS-RSRef package documentation (Real-space refinement embedded in CNS).

Section author: Michael S. Chapman <chapmami@ohsu.edu>

Module author: Michael S. Chapman <chapmami@ohsu.edu>

Authors:

Michael S. Chapman <chapmami@ohsu.edu>,

Oregon Health & Science University

Version:

0.5, March 23, 2016

Usage:

cns_rsref [-RSRef options] < cns.inp...

Changed in version 0.1: 10/05/11

Changed in version 0.4.2: 11/08/13

Changed in version 0.5: 05/20/2015 ReStructuredText docs

Synopsis

The RSRef package compares/refines the agreement between an atomic model and an electron density map.

Embedded within CNS, stereochemical restraints are available, and a number of optimizers are supported, including Cartesian or torsion angle dynamics.

Usage

Refinement is run from CNS, substituting the RRES energy term calculated by RSRef. From coordinates supplied by CNS, RSRef returns the density-fit “energy” and the partial derivatives (gradients) needed for refinement.

Division of Labor / Control input

Broadly, CNS is responsible for:

  • Coordinate input / output, selection and grouping.
  • Stereochemical restraints.
  • Optimizations.

Embedded RSRef is responsible for:

  • Map input, comparison with coordinates, the “experimental” energy term and its partial derivatives re: atomic positions.

Stand-alone RSRef would be used for complementary optimizations of (as applicable):

  • Parameters that affect the map: magnification, resolution, overall B / envelope function; most useful with electron microscopy.
  • Atomic B-factors (fitting vs. density not yet supported as CNS-embed).

The division of labor dictates which information is provided to CNS through the .inp files that it reads, and which is passed to RSRef through command-line arguments.

Sorry, but there is a little redundancy. In such cases, the input to one of the programs is simply ignored, as documented below. CNS and RSRef do not check the compatibility of each other’s input.

Input files

Command files (.inp) will be only slightly changed from those used in conventional reciprocal space refinement. Examples are available in /example_input:

  • min_1.inp: Gradient descent optimization in Cartesian coordinates.
  • sa5k_1.inp: Simulated annealing, torsion-angle or Cartesian.

rsref.csh illustrates how the program is run.

The most important points of the .inp files are:

  • Include rres within the flags command; it substitutes for the usual x-ray term.
  • Some of the usual statements in the “not normally changed” section can be removed: reading reflections, R-factors, etc. Others continue to be needed: stereochemistry, non-crystallographic symmetry, etc. Still others look like they should not be necessary, but satisfy dependencies: the scatter & spacegroup libs in XTALlib & XRAYlib. (The examples are not optimized; other efficiencies may be possible.)
  • Unlike conventional refinement, there is no need to consider a lattice of full unit cells. Within CNS, space group information would be needed for calculation of inter-molecular van der Waals / electrostatics terms, but the CNS default is that, for speed, they are omitted. The user may choose a large and arbitrary unit cell. (Electron microscopists would want to do this anyway.)
  • Experience indicates that in real-space, higher annealing temperatures (~10,000 K) may be tolerated / advantageous. Explore.
  • Particularly important parameters include:
    • Coordinates (pdb). Note that with CNS 1.3 onwards, the structure (mtf) file (prepared using CNS’s generate) is usually optional. Most refinements can take advantage of CNS’s autogenerate support.
    • Coordinate selection, rigid group designation.
    • Optimization protocol, annealing schedule, number of cycles, etc.
  • The following CNS input parameters are ignored as they are handled by RSRef: resolution limits, weight on the experimental term, overall B-factor, restraints for atomic B refinement.

Command-line parameters:

Native CNS does not use command-line arguments. In the modified CNS, they are passed through and used to control RSRef. Critically important arguments include:

  • Map file name.
  • Resolution limits.
  • Weight for the experimental (map) fitting energy (RSRef) vs. stereochemical energy (CNS).
  • Symmetry (crystallographic & molecular) from which to estimate contributions to the calculated electron density.
  • Radii around atoms that define which map pixels are considered.

This is not a complete list. Users need to consult the RSRef documentation. Note one important RSRef parameter that will be ignored:

  • Coordinates - they will be read & passed from CNS.

Note also that although resolution and symmetry information are passed from the command line through CNS to RSRef, they are not automatically coordinated with input given in the CNS .inp file. It is the user’s responsibility to ensure that the two specifications of symmetry are compatible.

The several means of RSRef input (command-line; argument file; dot-file) are all supported in the CNS-embedded version.
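As an illustration of the calling convention shown under Usage (RSRef options on the command line, the CNS .inp file on standard input), the following is a minimal Python sketch. The helper names build_command and run_refinement are hypothetical and are not part of RSRef or CNS. No RSRef option names are assumed: the caller supplies them as a list, taken from the RSRef documentation.

```python
import subprocess


def build_command(rsref_args, executable="cns_rsref"):
    """Return the argument list: the executable followed by RSRef options.

    rsref_args is supplied by the caller; this sketch does not know or
    check any RSRef option names.
    """
    return [executable] + [str(arg) for arg in rsref_args]


def run_refinement(inp_path, rsref_args, log_path, executable="cns_rsref"):
    """Run the embedded program with the CNS command file on stdin.

    Equivalent to the shell form:
        cns_rsref <rsref_args> < inp_path > log_path 2>&1
    Returns the exit status of the process.
    """
    cmd = build_command(rsref_args, executable)
    with open(inp_path, "rb") as inp, open(log_path, "wb") as log:
        return subprocess.call(cmd, stdin=inp, stdout=log,
                               stderr=subprocess.STDOUT)
```

A call might look like run_refinement("min_1.inp", rsref_args, "min_1.log"), where rsref_args holds the map file, resolution, weight and symmetry options in whatever form RSRef expects.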

Weighting (without test sets / cross-validation) is subjective. Choose an appropriate balance between fit and reasonable stereochemistry.
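To make the effect of the weight concrete, here is a toy Python sketch. It assumes that the total energy is a simple weighted sum, stereochemical energy plus weight times map-fitting energy; that form is an assumption for illustration and is not taken from the CNS or RSRef sources. The two "energy terms" are stand-in harmonic functions, and all function names are hypothetical.

```python
def combined_energy(coords, stereo_term, map_term, weight):
    """Return (energy, gradient) for stereo + weight * map.

    stereo_term and map_term are callables that take a flat list of
    coordinates and return (energy, gradient). The weighted-sum form
    is an assumption made for this illustration.
    """
    e_stereo, g_stereo = stereo_term(coords)
    e_map, g_map = map_term(coords)
    energy = e_stereo + weight * e_map
    gradient = [gs + weight * gm for gs, gm in zip(g_stereo, g_map)]
    return energy, gradient


def harmonic(target):
    """Toy term 0.5 * sum((x - target)^2), standing in for a real one."""
    def term(coords):
        diffs = [x - t for x, t in zip(coords, target)]
        return 0.5 * sum(d * d for d in diffs), diffs
    return term


def descend(coords, stereo_term, map_term, weight, step=0.1, cycles=200):
    """Plain fixed-step gradient descent on the combined energy."""
    coords = list(coords)
    for _ in range(cycles):
        _, grad = combined_energy(coords, stereo_term, map_term, weight)
        coords = [x - step * g for x, g in zip(coords, grad)]
    return coords
```

With a "stereochemistry" term that prefers 0 and a "map" term that prefers 1, the minimum of this toy energy lies at weight / (1 + weight): raising the weight pulls the result toward the map at the expense of the other term, which is the balance the user has to judge.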

Installation

Background on Implementation

RSRef is made available to a near-normal CNS through:

  • A C wrapper (wrap.c) that exposes the most useful Python classes and methods of RSRef to potentially any C or Fortran-compiled program.
  • A set of Fortran subroutines (interface.f) that is specific for CNS, sharing COMMON block storage which it exchanges with the arrays/arguments exposed by wrap.c. Almost all of the necessary Fortran functionality is contained within this addition, thereby quarantined from changes in the CNS code base.
  • Minor changes to ener.inc, energy.f & cns.f that are patched in with patch.

cns_rsref is therefore built on top of an existing CNS installation.

Installation Script: patch.py

This should guide the administrator / user through the 2-minute process. It should be able to figure out sensible defaults if run within a CNS-compatible environment (after sourcing cns_solve_env), provided that patch.py is invoked from the RSRef installation directory with the same Python interpreter as will be used when running cns_rsref. On many systems, there are multiple versions of Python, and your default may depend on your shell / environment variables. You may need to invoke explicitly the Python interpreter that you want. It needs to be one with a full installation (including libraries & site-packages), and not just the binary.
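When several Python installations are present, it can help to confirm which interpreter is actually running and where its library and include files live before answering the prompts. The short script below is one way to check, using only the standard sys and sysconfig modules. It is not part of patch.py and does not reproduce how patch.py searches; the function name interpreter_report is hypothetical. Run it with the same interpreter that you intend to use for patch.py.

```python
import sys
import sysconfig


def interpreter_report():
    """Collect facts about the running interpreter relevant to linking."""
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        # Directory and file name of the Python library; either may be
        # None on platforms where the build does not record them.
        "library_dir": sysconfig.get_config_var("LIBDIR"),
        "library_name": sysconfig.get_config_var("LIBRARY"),
        # Directory holding Python.h and the other include files.
        "include_dir": sysconfig.get_paths()["include"],
    }


for key, value in sorted(interpreter_report().items()):
    print("%-12s %s" % (key, value))
```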

In overview, patch.py makes a copy of $CNS_SOURCE, into which it adds interface.f, wrap.c and three modified CNS files as it patches them. Finally, it builds an executable with make cns_solve using a modified Makefile.

The script is lazy. Use the “-f” option to force a rebuild even with files that it thinks are up to date.
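The documentation does not say how patch.py decides that a file is up to date. A common, make-like approach compares modification times, and the sketch below shows that idea only as an assumption about what "lazy" could mean here. The function name is_out_of_date and its force argument (mirroring the “-f” option) are hypothetical.

```python
import os


def is_out_of_date(target, sources, force=False):
    """Return True if target should be rebuilt.

    Rebuild when forced, when the target does not exist, or when any
    source file has a newer modification time than the target.
    """
    if force or not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```

A check of this kind can be fooled by copied files or clock differences between machines, which is one reason a force option is useful.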

Requested input is explained below. In most cases the defaults will be fine:

  • Python directory w/ libpython2.7.a [/progs/build/Python-2.7.1]:
    • Path to the Python library to be linked on loading.
    • patch.py will use unix ‘find’ & list those compatible with the interpreter in use, and select one as the default.
  • Python directory (w/ include-files, /Modules) [/progs/build/Python-2.7.1]:
    • This directory should contain include files and the Modules directory containing needed objects, such as getpath.o.
    • patch.py will find & list those compatible with the interpreter in use, and select one as the default.
  • Build directory [/tmp/tmpIyFx8z]:
    • Used to gather source/object files and to build the modified CNS.
    • Default is a temporary directory that will disappear completely on completion of a successful (or failed) build. Before it vanishes, an opportunity will be given to save the executable. The default is appropriate for routine builds of stable code, but repeated builds will take longer, because objects are not saved.
    • An alternative directory path can be provided, where the build will be left (user responsible for clean-up). Repeated builds will be quick, as patch.py will only patch / build components that are out-of-date.
  • CNS source:
    • defaults to $CNS_SOURCE, from where the CNS distribution sources should be copied.
  • Directory for executable (cns_solve-1110051839.exe) [/progs/cns_solve_1.21/intel-x86_64bit-linux/bin]:
    • The default may require administrator privileges and would add the newly built executable to the CNS installation directory.
    • There should be nothing wrong with this, as the modifications should be innocuous to those not using RSRef. CNS should run normally.
    • Note, however, that unless the links below overwrite the existing ones, cns & cns_solve would still point to the prior version, which would be maintained in the installation directory until purged.
    • A different location can certainly be provided.
    • None will skip the save.
  • Directory for link "cns" (or None) [last_directory]:
    • Optionally makes a simpler symbolic link to the time-stamped executable file-name.
    • If allowed, and you choose the bin directory of the CNS installation, this will make the new version the default for all users.
  • Directory for link "cns_solve" (or None) [last_directory]: - see above.
  • Directory for link "cns_rsref" (or None) [last_directory]: - Finally a unique link, if you want to keep distinct.
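The relationship between the time-stamped executable and the simpler links can be sketched in Python as follows. This is not the code in patch.py. The timestamp layout (two-digit year, month, day, hour, minute) is inferred from the example name cns_solve-1110051839.exe and is an assumption, as are the function names stamped_name and relink.

```python
import os
import time


def stamped_name(base="cns_solve", when=None):
    """Build a time-stamped executable name such as base-YYMMDDHHMM.exe.

    The YYMMDDHHMM layout is an assumption inferred from the example
    file name in the text.
    """
    when = time.localtime() if when is None else when
    return "%s-%s.exe" % (base, time.strftime("%y%m%d%H%M", when))


def relink(directory, link_name, target_name):
    """Point directory/link_name at target_name, replacing an old link.

    Only an existing symbolic link is replaced; if a regular file of
    that name exists, os.symlink raises an error rather than clobbering it.
    """
    link_path = os.path.join(directory, link_name)
    if os.path.islink(link_path):
        os.remove(link_path)
    os.symlink(target_name, link_path)
    return link_path
```

Because each build gets a new time-stamped name, earlier executables stay in place until purged, and moving a link such as cns from one to another is what changes the default that users run.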

Potential Problems

Python is interpreted at run-time, using dynamically-loaded C-compiled libraries. Warnings abound in Python documentation about the need to maintain compatibility between Python modules in terms of both Python versions and the compilers used in preparing libraries. Here, with cross-compilation, we extend the challenge to maintaining C/Fortran compiler / option compatibility across the Python installation; Python packages and their C/Fortran components, such as NumPy, SciPy; and the CNS package. patch.py will build interface.f and wrap.c with the same compiler / options as CNS. We have several compilers available here, and have not paid close attention to exact compatibility, with no disasters so far.

Performance guidelines

For a ~5000 atom structure, surrounded by ~5000 NCS neighbors, on a 4-cpu 2007 computer, gradient descent takes 20-30s / cycle. Slow-cooling from 10,000K followed by 50 cycles of gradient descent takes 90-120 min. Small substructures can be refined very quickly.

Credits

This is a completely new implementation by Michael Chapman, but it builds upon the experience of others in the Chapman lab in writing interfaces to X-PLOR and earlier versions of CNS. These include Richard Bertram, Zhi (James) Chen, Andrew Korostelev & Felcy Fabiola. Supported by funding from NIH.