CNS-RSRef package (Real-space refinement within CNS).

Section author: Michael S. Chapman <chapmanms@missouri.edu>

Module author: Michael S. Chapman <chapmanms@missouri.edu>

Authors:

Michael S. Chapman <chapmanms@missouri.edu>,

Oregon Health & Science University and University of Missouri

Version:

1, Nov 26, 2024

Usage:

cns_rsref [RSRef-options] < cns.inp...

Synopsis

The RSRef package compares/refines the agreement between an atomic model and an electron density map.

When RSRef is embedded within CNS, stereochemical restraints are available and a number of optimizers are supported, including Cartesian and torsion-angle dynamics.

Note

Most users will currently require the CNS extension to do restrained all-atom xyz-position optimization. Other tasks, such as rigid-group, B-factor, image (resolution) as well as internal coordinate (backbone phi/psi) optimizations can be done without. Work is underway on a python implementation of stereochemical restraints that might further limit the need for the CNS extension to a small subset of users.

Usage

Refinement is run from CNS, with the RRES energy term calculated by RSRef substituting for the usual x-ray term. From coordinates supplied by CNS, RSRef returns the density-fit “energy” and the partial derivatives (gradients) needed for refinement.

Division of Labor / Control input

Broadly, CNS is responsible for:

  • Coordinate input / output, selection and grouping.

  • Stereochemical restraints.

  • Optimizations.

Embedded RSRef is responsible for:

  • Map input, comparison with coordinates, the “experimental” energy term and its partial derivatives with respect to atomic positions.

Stand-alone RSRef would be used for complementary optimizations of (as applicable):

  • Parameters that affect the map: magnification, resolution, overall B / envelope function; most useful with electron microscopy.

  • Atomic B-factors (fitting of B-factors to the density is not yet supported in the CNS-embedded version).

The division of labor dictates which information is provided to CNS through the .inp files that it reads, and which is passed to RSRef through command-line arguments.

There is, unfortunately, a little redundancy between the two sets of input. In such cases, the input to one of the programs is simply ignored, as documented below. CNS and RSRef do not check the compatibility of each other’s input.

Input files

Command files (.inp) will be only slightly changed from those used in conventional reciprocal space refinement. Examples are available in /example_input:

  • min_1.inp: Gradient descent optimization in Cartesian coordinates.

  • sa5k_1.inp: Simulated annealing, torsion-angle or Cartesian.

rsref.csh illustrates how the program is run.

The most important points of the .inp files are:

  • Include rres within the flags command; it substitutes for the usual x-ray term.

  • Some of the usual statements in the “not normally changed” section can be removed (reading reflections, R-factors, etc.). Others continue to be needed (stereochemistry, non-crystallographic symmetry, etc.). Still others look as if they should not be necessary, but satisfy dependencies: the scatter and spacegroup libraries in XTALlib and XRAYlib. (The examples are not optimized; other efficiencies may be possible.)

  • Unlike conventional refinement, there is no need to consider a lattice of full unit cells. Within CNS, space group information would be needed for calculation of inter-molecular van der Waals / electrostatics terms, but the CNS default is that, for speed, they are omitted. The user may choose a large and arbitrary unit cell. (Electron microscopists would want to do this anyway.)

  • Experience indicates that in real-space, higher annealing temperatures (~10,000 K) may be tolerated / advantageous. Explore.

  • Particularly important parameters include:

    • Coordinates (pdb). Note that with CNS 1.3 onwards, the structure (mtf) file (prepared using CNS’s generate) is usually optional. Most refinements can take advantage of CNS’s autogenerate support.

    • Coordinate selection, rigid group designation.

    • Optimization protocol, annealing schedule, number of cycles, etc.

  • The following CNS input parameters are ignored as they are handled by RSRef: resolution limits, weight on the experimental term, overall B-factor, restraints for atomic B refinement.

Command-line parameters:

Native CNS does not use command-line arguments. In the modified CNS, they are passed through and used to control RSRef. Critically important arguments include:

  • Map file name.

  • Resolution limits.

  • Weight for the experimental (map) fitting energy (RSRef) vs. stereochemical energy (CNS).

  • Symmetry (crystallographic & molecular) from which to estimate contributions to the calculated electron density.

  • Radii around atoms that define which map pixels are considered.

This is not a complete list. Users need to consult the RSRef documentation. Note one important RSRef parameter that will be ignored:

  • Coordinates - they will be read & passed from CNS.

Note also that although resolution and symmetry information are passed from the command line through CNS to RSRef, they are not automatically coordinated with input given in the CNS .inp file. It is the user’s responsibility to ensure that the two specifications of symmetry are compatible.

The several means of RSRef input (command-line; argument file) are all supported in the CNS-embedded version.
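The calling convention shown under Usage (RSRef options on the command line, the CNS command file on standard input) can be driven from Python as in the minimal sketch below. The function name run_cns_rsref and the log-file handling are illustrative assumptions, not part of the package. The RSRef options are passed through untouched, so consult the RSRef documentation for the actual option names.

```python
import subprocess


def run_cns_rsref(inp_path, rsref_args, log_path, executable="cns_rsref"):
    """Run `executable [rsref_args] < inp_path`, capturing output in log_path.

    Hypothetical helper: rsref_args is a list of RSRef command-line options
    supplied by the caller; none are invented or interpreted here.
    """
    with open(inp_path, "rb") as inp, open(log_path, "wb") as log:
        result = subprocess.run(
            [executable, *rsref_args],
            stdin=inp,                  # CNS reads its .inp commands from stdin
            stdout=log,
            stderr=subprocess.STDOUT,   # keep diagnostics in the same log
        )
    return result.returncode
```

A non-zero return code signals that the run should be inspected in the log before the output coordinates are used.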

Weighting (without test sets / cross-validation) is subjective. Choose an appropriate balance between fit and reasonable stereochemistry.
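To illustrate how the weight shifts the balance, the toy sketch below assumes that the total energy is the stereochemical energy plus the weight times the map-fitting energy, with gradients combined in the same way. That linear form, the one-dimensional harmonic terms and all function names are assumptions made for illustration only. They do not reproduce the CNS or RSRef energy functions.

```python
def harmonic(target, k=1.0):
    """Return a toy energy term: E = k * sum((x - target)^2), with its gradient."""
    def term(x):
        energy = sum(k * (xi - ti) ** 2 for xi, ti in zip(x, target))
        gradient = [2.0 * k * (xi - ti) for xi, ti in zip(x, target)]
        return energy, gradient
    return term


def combined(x, stereo_term, map_term, weight):
    """Assumed form: E = E_stereo + weight * E_map (gradients likewise)."""
    e_s, g_s = stereo_term(x)
    e_m, g_m = map_term(x)
    energy = e_s + weight * e_m
    gradient = [gs + weight * gm for gs, gm in zip(g_s, g_m)]
    return energy, gradient


def descend(x, stereo_term, map_term, weight, step=0.1, cycles=200):
    """Plain gradient descent; with these toy terms it is stable for weight < 9."""
    for _ in range(cycles):
        _, gradient = combined(x, stereo_term, map_term, weight)
        x = [xi - step * gi for xi, gi in zip(x, gradient)]
    return x


# The "stereochemistry" prefers x = 0, the "map" prefers x = 1.
stereo = harmonic([0.0])
density = harmonic([1.0])
balanced = descend([0.0], stereo, density, weight=1.0)
map_heavy = descend([0.0], stereo, density, weight=3.0)
```

In this toy case the minimum lies at weight / (1 + weight), so a larger weight pulls the coordinate toward the map's preferred position at the expense of the stereochemical target.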

Installation

Background on Implementation

RSRef is made available to a near-normal CNS through:

  • A C wrapper (wrap.c) that exposes the most useful Python classes and methods of RSRef to potentially any C or Fortran-compiled program.

  • A set of Fortran subroutines (interface.f) that is specific for CNS, sharing COMMON block storage which it exchanges with the arrays/arguments exposed by wrap.c. Almost all of the necessary Fortran functionality is contained within this addition, thereby quarantined from changes in the CNS code base.

  • Minor changes to ener.inc, energy.f & cns.f that are patched in with patch.py.

cns_rsref is therefore built on top of an existing CNS installation.

Choice of Python Interpreter

Warning

Compatibility issues can arise unless the same version of Python is used for building the libpasto_wrap libraries, for extending CNS with patch.py (below) and for running the modified CNS.

patch.py (below) links libpythonX.Y into the new CNS build. As further highlighted below, dynamic (run-time) linking of shared object libraries (the usual preference now) can be problematic, because CNS uses legacy versions of IMSL routines, while backwardly incompatible modern versions are also loaded through python-numpy. The solution is to lock dependencies when the executable is linked, using static archives (.a).

Several widely used pythons distribute the .so, but no longer the .a, by default. For anaconda, libpython-static can be installed separately as an add-on for a subset of python versions, but we have encountered fatal gcc errors reporting unexpected incompatibilities. Preferred is the use of a python virtual environment created with a distribution that includes the .a static archive.
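Whether a given interpreter ships the static archive can be checked before building, as in the sketch below. It relies on the standard sysconfig variables LIBRARY, LIBPL and LIBDIR, which are defined on typical POSIX builds of CPython. The function name and the search order are assumptions for illustration; patch.py may locate the library differently.

```python
import os
import sysconfig


def find_static_libpython():
    """Return the path of libpythonX.Y.a for the running interpreter, or None.

    LIBRARY names the static archive (e.g. libpython3.10.a); LIBPL and LIBDIR
    are the directories where distributions usually place it, if at all.
    """
    name = sysconfig.get_config_var("LIBRARY")
    if not name or not name.endswith(".a"):
        return None
    for var in ("LIBPL", "LIBDIR"):
        directory = sysconfig.get_config_var(var)
        if directory:
            path = os.path.join(directory, name)
            if os.path.isfile(path):
                return path
    return None


static_lib = find_static_libpython()
```

A result of None suggests that the environment provides only the shared object, in which case a different Python distribution is probably needed for the cns_rsref build.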

Glossary

The following are defined in the CNS environment:

$CNS_SOLVE

Absolute path to the root of the CNS installation to be extended (…/cns_solve_1.3).

$CNS_ARCH

CNS auto-generated string specific for the machine architecture (eg. intel-x86_64bit-linux).

$CNS_INST

$CNS_SOLVE/$CNS_ARCH

$CNS_SOURCE

$CNS_INST/source - the machine specific build directory.

The following are used for convenience below:

PASTOHOME

Path to the directory where the pasto package is installed, containing directories: pasto, lib, etc..

DATE

Timestamp string, encoded YYMMDDHHMM.

rsref_build

A new directory, $CNS_INST/rsref_build, where rsref-specific files are collected for the rebuild.
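The relationships among these names can be written out as a short sketch. It derives the paths from $CNS_SOLVE and $CNS_ARCH exactly as defined above and formats DATE as YYMMDDHHMM. The function name and the returned dictionary are illustrative assumptions, not part of patch.py.

```python
import os
from datetime import datetime


def cns_paths(env=None, now=None):
    """Derive the glossary locations from the CNS environment variables."""
    env = os.environ if env is None else env
    now = datetime.now() if now is None else now
    inst = os.path.join(env["CNS_SOLVE"], env["CNS_ARCH"])   # $CNS_INST
    source = os.path.join(inst, "source")                    # $CNS_SOURCE
    date = now.strftime("%y%m%d%H%M")                        # DATE, YYMMDDHHMM
    return {
        "CNS_INST": inst,
        "CNS_SOURCE": source,
        "rsref_build": os.path.join(inst, "rsref_build"),
        "DATE": date,
        "executable": os.path.join(source, "cns_solve-" + date + ".exe"),
    }
```

Calling cns_paths() with no arguments reads the live environment, so it raises KeyError if the CNS environment file has not been sourced.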

Quickstart

If the RSRef user has write access to a full working installation of CNS, then the following will likely be sufficient:

  • Work within the python environment that will be used for running cns_rsref.

  • Ensure that PASTOHOME/lib contains a compatible libpasto_wrap.a. Compatibility depends on machine architecture and python version.

    • The library distributed at the time of writing is compatible with python3.10 and x86_64-linux.

    • If a more suitable version has been saved in PASTOHOME/lib/precompiled/…, then a symbolic link should be created in PASTOHOME/lib.

      • For example, from PASTOHOME/lib: ln -s precompiled/x86_64-linux-gnu/libpasto_wrap.a .

    • Otherwise, build a new library with lib/compile_wrap.sh, after first customizing PYHOME and VER within the script.

  • Define the CNS environment for the CNS installation to be modified by:

    • source CNSPATH/.cns_solve_env_sh (bash users)

    • source CNSPATH/cns_solve_env (csh users)

    Where CNSPATH is the explicitly entered $CNS_SOLVE.

  • python3 patch.py, using the patch.py in PASTOHOME/pasto/cns. This will patch cns.f and energy.f, compile an interface to pasto/rsref and build a new cns_solve executable using a modified Makefile. The locations of needed libraries are determined from the environment.

The above will update $CNS_SOURCE/cns_solve-DATE.exe. A third link is added in $CNS_INST/bin such that cns, cns_solve, and cns_rsref all point to this executable. (Unless the RRES term is included in the energy, the extended CNS should behave just like the unmodified.)
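Before running patch.py, the Quickstart prerequisites can be checked with a short pre-flight sketch such as the one below. The function name and messages are hypothetical. The checks mirror only what is stated above: a defined CNS environment, write access to the CNS installation, a libpasto_wrap.a in PASTOHOME/lib and a patch.py in PASTOHOME/pasto/cns. Architecture and Python-version compatibility of the library are not tested.

```python
import os


def quickstart_problems(pasto_home, env=None):
    """Return a list of problems found; an empty list means the checks passed."""
    env = os.environ if env is None else env
    problems = []
    for name in ("CNS_SOLVE", "CNS_SOURCE"):
        if not env.get(name):
            problems.append("$" + name + " is not set; source the CNS environment file")
    solve = env.get("CNS_SOLVE")
    if solve and not os.access(solve, os.W_OK):
        problems.append("no write access to " + solve)
    wrap = os.path.join(pasto_home, "lib", "libpasto_wrap.a")
    if not os.path.exists(wrap):        # also False for a dangling symbolic link
        problems.append("missing " + wrap)
    patch = os.path.join(pasto_home, "pasto", "cns", "patch.py")
    if not os.path.isfile(patch):
        problems.append("missing " + patch)
    return problems
```

Each entry in the returned list corresponds to one Quickstart step that still needs attention.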

Complications

If you do not have write access to a working CNS, then you will be starting from a cns_solve tar download. This will be incompatible with most modern 64-bit environments, so it will need to be modified as discussed in the next section. You should then be able to resume with the Quickstart.

Successful extension of CNS is sensitive to incompatibilities in Python versions, in the libraries needed and in how they have been compiled. The sections below give further details of how the parts are brought together.

Underlying CNS installation

An existing CNS installation will be modified to embed RSRef extensions that add map-fitting terms to the energy being minimized. Requirements are:

  • The installer of the RSRef extensions must have write permission to the CNS installation. A new installation, if needed, can be started with the downloaded cns_solve_1.3_all.tar:

    • Within your chosen location for $CNS_SOLVE: tar xvf cns_solve_1.3_all.tar

    • Edit the tops of files .cns_solve_env_sh and cns_solve_env to reflect the chosen $CNS_SOLVE.

  • The CNS installation step, make install (don’t run it yet), creates an architecture-specific build directory (e.g. intel-x86_64bit-linux). The source subdirectory must retain the Fortran sources and Makefile that would be deleted by make clean.

  • Unless the CNS installation is already functional, before extending with RSRef, compatibilities with modern 64-bit environments will first have to be addressed.

    • CNS has not been maintained since 2010 and there are now source and compiler incompatibilities.

    • An update to Fortran source files can be avoided by compiling machvar.f using different gfortran options.

    • The Makefile to be modified is auto-generated according to machine architecture on make install. If a recent gcc is used, this first make fails due to argument mis-matches.

    • However, the auto-generated Makefile will have been written to $CNS_SOURCE and can now be patched to address 64-bit and compiler compatibilities.

    • Execution of PASTOHOME/pasto/cns/recompile_cns.sh from within $CNS_SOURCE will accomplish all of the steps above, following extraction from a cns_solve_1.3_all.tar file.

libpasto_wrap

Key python objects and methods are exposed to compiled programs, such as CNS, by linking to a library that wraps pasto/rsref. PaStO installation no longer builds the library automatically. This section provides additional rationale, but no directions beyond those in Quickstart. After reading, one might be persuaded to rebuild the library only if a suitable one is not already distributed. Remember to look in both PASTOHOME/lib and in its precompiled/… subdirectories. The following comments pertain only to building the library from source:

  • Use the distributed script lib/compile_wrap.sh to build both shared object (.so) and static archive (.a) libraries.

    • Compatibility is important between versions of include files and python libraries, libpasto_wrap and the CNS compilation.

      • It can be frustrating to satisfy all constraints.

  • Dynamic loading of .so may be preferred for non-CNS extensions.

  • The static library (.a) will be used for CNS. CNS implements legacy versions of IMSL routines (1978-82) whose signatures differ from versions linked into recent distributions of python numpy. It is very difficult to control which versions of symbol tables will be used on dynamic (run-time) loading of .so libraries, particularly with a python interpreter that loads modules as needed. References into a static archive (.a) are resolved when the executable is linked, avoiding this problem.

Libraries can be pretested before linking to CNS. pasto/test/test_libraries.sh builds executables from tst_wrap.c and performs a regression test.

Installation Script: pasto/cns/patch.py

Firstly, make sure that you are using the python environment to be used when running cns_rsref. Secondly, (re-)set the CNS environment once the CNS build directory contains the architecture-specific Fortran source and Make files. This will define the locations of CNS files to be modified.

Then PASTOHOME/pasto/cns/patch.py will create patched versions of cns.f, energy.f, ener.inc and Makefile (now including rsref_make.inc). These are written to a new directory, rsref_build, and the contents of $CNS_SOURCE are overwritten with links pointing to the rsref_build versions. patch.py then uses rsref_build/Makefile to recompile the modified sources. libpasto_wrap.a is found by relative path from the patch.py invoked. Other libraries to be linked are located using sysconfig variables that are specific to the Python interpreter used to run patch.py.

The result will be an updated cns_solve-DATE.exe in the architecture-specific $CNS_SOURCE directory. This is linked to from the $CNS_INST/bin directory with cns, cns_solve and the new cns_rsref. They all point to the same executable. Note that the modified CNS should operate like the original unless the rres energy term is included.
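After the build, it may be worth confirming that the three names in $CNS_INST/bin really resolve to one executable. The sketch below does this by comparing resolved paths. The function names are illustrative assumptions, and the check covers only the link structure, not whether the executable runs.

```python
import os


def link_targets(bin_dir, names=("cns", "cns_solve", "cns_rsref")):
    """Map each command name to the fully resolved path it points to."""
    return {name: os.path.realpath(os.path.join(bin_dir, name)) for name in names}


def links_agree(bin_dir):
    """True if all three links exist and resolve to the same existing file."""
    targets = link_targets(bin_dir)
    all_exist = all(os.path.isfile(path) for path in targets.values())
    return all_exist and len(set(targets.values())) == 1
```

A typical call would be links_agree(os.path.join(os.environ["CNS_SOLVE"], os.environ["CNS_ARCH"], "bin")), run from a shell in which the CNS environment has been sourced.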

Performance guidelines

For a ~9,000 atom structure, surrounded by ~14,000 NCS neighbors, on a 2018 workstation, gradient descent takes 11s CPU / cycle. Slow-cooling from 3,000K followed by 50 cycles of gradient descent takes 80 min. Small substructures can be refined very quickly.

Troubleshooting

Coordinates

PDB output

TER records inappropriately inserted; residues and connectivity wrong in Coot

This can occur with multiconformer structures from CNS or X-plor that designate the conformer in the segid columns rather than in the standard alt conf column. If such a file is then processed by a PaStO program, including translate.py, information from the segid and alt conf attributes is merged and new chain TERs can be inserted spuriously. CNS and X-plor PDB files must first be converted to the standard designation.

The following script can be used to re-standardize a CNS multi-conformer PDB file:

#! /bin/bash
# usage: segid2alt.sh <segid> <alt_designator> <pdb_input> [> <pdb_output>]
# e.g.   ./segid2alt.sh AC2 B my.pdb > new.pdb
# Adds a standard alternative conformation designator (column 17) to
# ATOM and HETATM records on lines that contain a particular segid.
# Needed because CNS drops the alt designator.
# Run once per conformer, feeding the output of one pass into the next.
sed -e "/${1}/s/\(^ATOM  ..........\)./\1${2}/" -e "/${1}/s/\(^HETATM..........\)./\1${2}/" "${3}"
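The same conversion can be sketched in Python with a stricter match. The sed script selects any line containing the segid text, whereas this version compares only the segid field, assumed here to occupy columns 73-76 as in X-plor and CNS output, and writes the designator into column 17. The function name is hypothetical and the sketch is not part of PaStO.

```python
def segid_to_altloc(lines, segid, alt):
    """Yield PDB lines with `alt` written to column 17 where the segid matches.

    Assumes the segid is in columns 73-76 (CNS / X-plor convention).
    Records other than ATOM and HETATM are passed through unchanged.
    """
    if len(alt) != 1:
        raise ValueError("the alternative conformation designator is one character")
    for line in lines:
        is_atom = line.startswith(("ATOM  ", "HETATM"))
        if is_atom and line[72:76].strip() == segid:
            # Column 17 is index 16; everything else on the line is preserved.
            line = line[:16] + alt + line[17:]
        yield line
```

For a file, the generator can be applied to an open handle and the results written with writelines, one pass per conformer.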

History

Changed in version 0.1: 10/05/11

Changed in version 0.4.2: 11/08/13

Changed in version 0.5: 05/20/2015 ReStructuredText docs

Changed in version 1.0.0: python3

Changed in version 1.0.4: (04/10/21) 64-bit CNS compatibility by source patches.

Changed in version 1.0.5: (07/15/24) deprecation of distutils to setuptools

Changed in version 1.0.6: (11/12/24) 64-bit compatibility redone by compile flags. Dependency satisfaction by “best available approximation” is deprecated. Hitherto, the system was searched for most recent python dependents if any were missing from the python environment invoked. (Convenience is outweighed by confusion debugging subtle version incompatibilities.) Also deprecated is generation of a cns_rsref executable from a CNS installation for which there is not write-access. Creation of the pasto_wrap libraries is now by a provided script rather than automatic, facilitating pure python installation by PIP.

Credits

This is a completely new implementation by Michael Chapman, but it builds upon the experience of others in the Chapman lab in writing interfaces to X-plor and earlier versions of CNS. These include Richard Bertram, Zhi (James) Chen, Andrew Korostelev, Felcy Fabiola & Andrew Trzynka. Matt Stanley provided key help in installing CNS in modern 64-bit environments. Supported by funding from NIH.