Installation & Getting Started¶
Synopsis¶
PaStO is a small suite of programs containing RSRef for Real-space atomic refinement, Superpose for flexible or rigid group structure comparison, and some supporting programs:
The core functionality of RSRef is comparison of an atomic model to a 3D map:
Crystallographic electron density
Electron microscopic (EM) Coulombic potential
Least-squares residuals are calculated and minimized, using partial derivatives of the fit with respect to model and imaging parameters. Thus the consistency of atomic model and map can be optimized, improving the correlation coefficient. Model parameters include coordinates and B-factors. EM imaging parameters include magnification, resolution and envelope correction.
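As a rough illustration of the two quantities mentioned above, the following sketch computes a least-squares residual and a correlation coefficient between map values and model-calculated densities sampled at the same grid points. It is a minimal sketch only: the function names, the `scale` and `offset` parameters and the sample numbers are assumptions made for illustration, and RSRef's own residual definition, weighting and scaling may differ.

```python
import math


def least_squares_residual(observed, calculated, scale=1.0, offset=0.0):
    """Sum of squared differences between map values and scaled model density."""
    return sum((o - (scale * c + offset)) ** 2
               for o, c in zip(observed, calculated))


def correlation_coefficient(observed, calculated):
    """Pearson correlation between map values and model-calculated density."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_c = sum(calculated) / n
    covariance = sum((o - mean_o) * (c - mean_c)
                     for o, c in zip(observed, calculated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    return covariance / math.sqrt(var_o * var_c)


# Hypothetical densities at five grid points (illustrative values only).
observed = [0.1, 0.8, 1.9, 0.7, 0.2]
calculated = [0.0, 1.0, 2.0, 1.0, 0.0]

print("residual:", least_squares_residual(observed, calculated))
print("correlation:", correlation_coefficient(observed, calculated))
```

A refinement lowers the residual by adjusting the parameters that determine the calculated values; the correlation coefficient is a scale-independent way of reporting the resulting agreement.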
Patches are provided to embed RSRef in the CNS package. This provides a real-space target for stereochemically restrained refinement. Several optimizers are then supported, including Powell or L-BFGS gradient descent or simulated annealing molecular dynamics. Atomic models can be parameterized as Cartesian all-atom or torsion angle.
RSRef and SuperPose have growing capabilities for their own (stand-alone) atomic refinements that complement (and increasingly supersede) traditional packages like CNS. A starting emphasis within the whole PaStO package has been parsimonious optimization, minimizing changes to a structure when fitting to a map or an overlaid structure. Rigid-group and (backbone) torsion angle optimizations are supported with optional van der Waals repulsion and distance restraints. Stereochemistry within molecules (or fragments) is fixed / constrained. Such reduced-parameter approaches mitigate over-fitting when datasets are lower resolution. Further stereochemical restraints are under development to support additional modes of refinement.
Dependencies¶
CNS and Bsoft (if desired) need to be available for RSRef extension / embedding upon setup of RSRef, so must be pre-installed. Python dependencies can be satisfied at run-time.
CNS¶
High-resolution, all atom flexible real-space refinement requires RSRef to be embedded within CNS. Other modes of refinement do not require CNS, so co-installation is optional. Installation requires recompilation of CNS. RSRef should be compatible with multiple versions of CNS and is tested with v1.3.
mrcfile¶
Natively RSRef supports only X-plor/CNS format maps.
Formats .ccp4
and .mrc
are supported by calls to the mrcfile
python package which can be installed in one of the following standard ways:
pip install mrcfile
conda install -c conda-forge mrcfile
This is optional, but recommended. Jobs are often nearly twice the speed with binary map formats.
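For readers who want to check a binary map before passing it to RSRef, the sketch below reads a .mrc or .ccp4 file directly with the mrcfile package and reports a few basic properties. It is independent of PaStO and is not how RSRef itself calls mrcfile; the function name `summarize_map` is hypothetical.

```python
def summarize_map(path):
    """Return basic facts about a .mrc / .ccp4 map read with mrcfile."""
    import mrcfile  # third-party package: pip install mrcfile

    with mrcfile.open(path, mode="r") as mrc:
        data = mrc.data          # NumPy array holding the map values
        voxel = mrc.voxel_size   # record with fields x, y, z (in Angstrom)
        return {
            "shape": tuple(data.shape),
            "voxel_size": (float(voxel.x), float(voxel.y), float(voxel.z)),
            "min": float(data.min()),
            "max": float(data.max()),
            "mean": float(data.mean()),
        }
```

For example, `summarize_map("my_map.mrc")` (a placeholder file name) returns a dictionary that can be printed or compared with the values expected for the reconstruction.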
Bsoft¶
Support for .ccp4
, .mrc
and many other legacy map formats can
also be added by linking in the
Bsoft library.
This is no longer recommended, as bsoft silently ignores alternative map
orientations specified in the header, because our interface is
limited to now-legacy Bsoft versions <= 1.9.2, and because we suspect bugs
in our extension interface that are kluged-over non-generally.
Python & modules¶
Links are given for reference, but installation will likely be easier with pip, conda, etc.
- Version 3 interpreter
PaStO v1.0 has been refactored for Python 3 (tested on v3.6 to 3.12). Support for Python v2.7 has been dropped. Setup with CNS (and Bsoft) requires a development-capable installation of python that provides static archive libraries (.a), and not a stripped-down version such as is sometimes shipped with the operating system. (At the time of writing, gcc finds fatal incompatibilities with the .a distributed in the Anaconda libpython-static package (v3.12).)
- NumPy
Required throughout; tested with v2.0
- SciPy
Widespread use; requires version >= 0.12; tested with v1.13.
- cmd2
Required for command control in all but CNS-embedded RSRef; Version >= 2.4. cmd2 is an extension of the python cmd module.
- mrcfile
Replaces Bsoft for map input. Only XPLOR format is supported natively. Version >= 1.5.
- Possibly gnureadline
Only for python installations (unlike Anaconda) that do not include the “standard” readline extension module, and give a startup warning. It is used only for tab-completion by cmd2 and is not absolutely required. Tested with v8.0.0.
- matplotlib
Graphing for several commands in superpose; tested with v3.5
- Sphinx
Is needed if you want to rebuild the supplied documentation or build it in a different format (pdf, man pages, etc.). Requires version >= 3.2; tested with v7.3.7.
- libpython-static <https://anaconda.org/conda-forge/libpython-static/>
Required only if embedding PaStO within CNS and using a recent Anaconda python. This module provides
libpythonX.Y.a
that is no longer contained in some distributions of python. (At the time of writing, gcc finds fatal incompatibilities with the .a distributed in the Anaconda libpython-static package (v3.12). For now, a different distribution of python will be needed (with CNS) and this package will not be required.)
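The minimum versions listed above can be checked against the active python environment with a short script such as the following. This is a sketch using only the standard library; the `MINIMUMS` table is transcribed from the list above (`None` where no minimum is stated), and the helper names are hypothetical rather than part of PaStO.

```python
import re
from importlib import metadata

# Minimum versions as stated in the list above; None means none is given.
MINIMUMS = {
    "numpy": None,
    "scipy": "0.12",
    "cmd2": "2.4",
    "mrcfile": "1.5",
    "matplotlib": None,
    "sphinx": "3.2",
}


def version_tuple(text):
    """Leading numeric components of a version string, as a tuple of ints."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum (or none is set)."""
    if minimum is None:
        return True
    have, need = version_tuple(installed), version_tuple(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))  # pad so that 2.4 compares equal to 2.4.0
    need += (0,) * (width - len(need))
    return have >= need


def check_dependencies(minimums=MINIMUMS):
    """Map each package name to 'ok', 'too old' or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report


for package, status in check_dependencies().items():
    print(f"{package:12s} {status}")
```

Packages reported as missing are not necessarily a problem: mrcfile, matplotlib and Sphinx are needed only for the functions described above.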
RSRef Installation¶
In the following “X.Y.Z” is used to denote version, e.g. 1.0.6
Python environment¶
It is recommended that PaStO be installed within an isolated python environment. To create a new environment and activate it:
python -m venv <envpath>
source <envpath>/bin/activate
for example:
python3.10 -m venv ~/.virtualenvs/pastoenv
source ~/.virtualenvs/pastoenv/bin/activate
At the time of writing, incompatibilities in libpython3.12.a prevent the use of an Anaconda environment, specifically for building an extended CNS with PaStO embedded, but it would be created as:
conda create -n <envname>
conda activate <envname>
(or source activate <envname> on some systems)
The activate command will have to be repeated for each session using PaStO and so is a good candidate for inclusion in a login script.
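Because activation is easy to forget, a script can check whether it is running inside an activated environment before importing anything else. The sketch below relies on two general python / conda conventions rather than on anything specific to PaStO: inside a venv, `sys.prefix` differs from `sys.base_prefix`, and conda sets `CONDA_DEFAULT_ENV` on activation. The function name is hypothetical.

```python
import os
import sys


def active_environment(prefix=None, base_prefix=None, environ=None):
    """Return 'venv', 'conda:<name>' or None for the running interpreter."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    environ = os.environ if environ is None else environ
    if prefix != base_prefix:
        return "venv"  # interpreter was started from a virtual environment
    conda_name = environ.get("CONDA_DEFAULT_ENV")
    if conda_name:
        return f"conda:{conda_name}"
    return None


print("active environment:", active_environment())
```

Note that conda also reports its default environment (typically named base), so a returned conda name does not by itself confirm that the intended environment is the active one.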
pip install¶
This is recommended for all installations, but strictly it is optional if you also plan to extract from a tar distribution. pip install is also the preferred route to importing unmodified pasto into other packages.
It is hoped that soon it will be possible to distribute from the PyPI
repository, but currently licensing restrictions require the wheel (.whl) file
to be obtained from the developers and installed explicitly:
pip install path/PaStO-X.Y.Z-py3-none-any.whl
tar extraction¶
The full tar-extracted distribution will be needed either to rebuild CNS with PaStO embedded or for further development of PaStO. Even if a pip-installation is to be used, it is recommended that the tar also be extracted for reference. It provides additional tests, examples and local documentation that are not distributed with the wheel.
(Unix is described first. Adaptations for Windows are described in a subsequent section.)
Validation:¶
(After setting the environment appropriately, see below…)
A quick minimal test refinement can be run with run_rsref.sh.
The script comments on the results expected.
More extensive examples are noted in “Getting Started” below.
Additionally, a panel of module unit tests is available in $<PARENT>/pasto/test.
In this directory, unit-test modules can be run individually, or python can be asked to
find/run all test modules with python -m unittest discover -v.
Some of the tests use large data sets that are distributed with the tar file,
but not the wheel (pip).
Windows¶
Warning
The Windows implementation has not been tested since major refactoring.
Todo
Test and rewrite Windows instructions.
- Stand-alone functionality is supported.
Input of .ccp4 and .mrc maps should be possible with the mrcfile package, but has not been tested. BSoft is not available for Windows. CNS/Xplor maps are supported natively in the PaStO package.
- CNS is available.
If appropriate compilers are available, the CNS extension could, in principle, be supported, but this has not been tested.
Warning
Recent versions need to be retested in a Windows environment.
The distribution file, pasto-X.Y.Z.tar.zip
has been prepared by python -m build.
Extract the files into the same folder containing the .zip file (likely
one folder above the default). In a Command prompt window, descend into
pasto-X.Y.Z
and enjoy.
Validation: A command-line equivalent to run_rsref.sh would be:
set DATA=%<PARENT>%\pasto\test
rsref.py --map=%DATA%\alanine-2.0.xplor --high_resolution=2.0 ^
--pdb_in=%DATA%\alanine.pdb ^
--by_resolution --map_use=0.9 --atom_extent=2.5
evaluate
stop
(Which could be put in a batch .bat file…)
Distribution - developers only¶
The build package will be needed: pip install build.
Files dist/PaStO-X.Y.Z-py3-none-any.whl and dist/pasto-X.Y.Z.tar.gz are created with python -m build in the parent directory of pasto.
The wheel file contents are specified within pyproject.toml.
The tar file contents are specified within MANIFEST.in.
Contents may be checked respectively with unzip -l dist/*.whl and
tar tvzf dist/*.tar.gz.
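Where unzip or tar is not to hand (for example on Windows), the contents of either distribution file can be listed from python. This is a sketch using the standard zipfile and tarfile modules; it relies only on the fact that a wheel is a zip archive, and the function name is hypothetical.

```python
import tarfile
import zipfile


def list_distribution(path):
    """Return the sorted member names of a wheel (zip) or tar archive."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return sorted(archive.namelist())
    if tarfile.is_tarfile(path):
        # Mode "r" detects gzip (or other) compression automatically.
        with tarfile.open(path, "r") as archive:
            return sorted(archive.getnames())
    raise ValueError(f"{path} is neither a zip/wheel nor a tar archive")
```

Comparing the two listings is a quick way to confirm that tests, examples and documentation appear in the tar file but not in the wheel.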
Environment¶
Note
This section is optional if you are using a pip wheel installation, because command-line entry points are provided. Environmental variables are set automatically in cns_rsref.sh examples. So, this section is required only for direct running of the source code (as it comes from a tar extraction).
Unix:¶
Something like the following will be needed (in bash), where <PARENT> is the
path of the directory containing the pasto package,
(.../pasto-X.Y.Z
if extracted from the tar file):
export PATH=<PARENT>:<PARENT>/pasto:${PATH}
export PYTHONPATH=<PARENT>:<PARENT>/pasto:${PYTHONPATH}
The use of cns input files may depend on CNS environmental parameters that can be set individually as needed or in csh with source cns_solve_env.
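When PaStO programs are launched from another python script rather than from a login shell, the same two variables can be assembled in python. The sketch below prepends the two directories described above to PATH and PYTHONPATH using `os.pathsep`, so the separator is right on both Unix (:) and Windows (;). The function name and the example path are hypothetical.

```python
import os


def pasto_environment(parent, base_env=None):
    """Copy an environment with <PARENT> and <PARENT>/pasto prepended."""
    env = dict(os.environ if base_env is None else base_env)
    extra = [parent, os.path.join(parent, "pasto")]
    for variable in ("PATH", "PYTHONPATH"):
        current = env.get(variable, "")
        entries = extra + ([current] if current else [])
        env[variable] = os.pathsep.join(entries)
    return env


# Hypothetical location of a tar-extracted distribution.
env = pasto_environment("/opt/pasto-1.0.6", {"PATH": "/usr/bin"})
print(env["PATH"])
print(env["PYTHONPATH"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting a program such as rsref.py.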
Windows¶
Windows environmental variables are set in a similar way (command prompt window):
set PATH=%<PARENT>%;%<PARENT>%\pasto;%PATH%
set PYTHONPATH=%<PARENT>%;%<PARENT>%\pasto;%PYTHONPATH%
or they can be set permanently in the autoexec.bat
file using msconfig.
Documentation¶
HTML documentation is provided with an entry point of
${<PARENT>}/doc/index.html.
This has been prepared for distribution using
documentation.py
, invoked by
doc/documentation.sh
, which can be used to rebuild the
documentation or make it available in different formats, see Colophon.
Options and command summaries are documented separately, for example:
rsref.py -h
(command-line), or interactively: run rsref.py, then enter “help” at the prompt.
Getting started¶
Running from a pip installation¶
The following commands are defined in ${VIRTUAL_ENV}/bin and hence within the ${PATH} of the python environment. They constitute entry points to the package, running the associated python module as a main program:
- RSRef [options]
Compare and optimize the fit of an atomic model to a map.
- SuperPose [options]
Superimposition and morphing of one structure onto another.
- SymExp [options]
Generate neighbors from crystallographic and local symmetry.
- Translate [options]
Switch between atom naming conventions in PDB coordinate files.
- PaStO <module> [options]
An alternative way of invoking any pasto module with main() (e.g. PaStO rsref -h)
Modules can also be imported within a python script: import pasto.<module>.
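Before importing, a script can confirm which modules are actually visible in the current environment. The following sketch uses the standard importlib machinery; the module names are taken from the program names in this guide, and it is an assumption that each is importable as `pasto.<name>` in a given installation.

```python
import importlib.util

# Assumed module names, taken from the programs described in this guide.
MODULES = ("rsref", "superpose", "symexp", "translate")


def available_modules(package="pasto", modules=MODULES):
    """Map each module name to True if <package>.<name> can be located."""
    if importlib.util.find_spec(package) is None:
        return {name: False for name in modules}
    found = {}
    for name in modules:
        try:
            spec = importlib.util.find_spec(f"{package}.{name}")
        except ModuleNotFoundError:
            spec = None
        found[name] = spec is not None
    return found


print(available_modules())
```

Locating a sub-module imports its parent package, so this check also exercises the package's own initialization.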
Running from source¶
If the environment is defined as previously, the above commands can also be run, using tar-extracted sources (or those within a pip install), as:
rsref.py [options]
python3 -m rsref [options]
superpose.py [options]
python3 -m superpose [options]
symexp.py [options]
python3 -m symexp [options]
translate.py [options]
python3 -m translate [options]
Examples¶
A suite of examples needs to be assembled.
<PARENT>/examples
contains sub-directories with data, input,
output structures and log files.
See README.txt
in each sub-directory for details:
- virus-antibody:
exemplifies rigid-body domain-level refinement of a complex using an EM map at 10 Å resolution.
- virus-5A:
exemplifies flexible fitting to a 4.5 Å EM reconstruction; also Superpose comparison of structures.
- pkd_rotamers:
automatic selection of library rotamers that best fit a map.
Version History¶
Added in version 1.0.6: pip install added (11/26/24).
Added in version 1.0.5: Compatible with breaking change in cmd2 2.0.1: py command no longer available, py instead opens python shell; consider also run_pyscript file.py. Scripted examples within the distributed package have been updated (7/11/24).
Added in version 1.0.2: Expose alternative optimization algorithms from scipy.optimize.minimize for rsref & superpose. This also improves the consistency of convergence criteria between different methods (2/5/21).
Deprecated since version 1.0.2: End full support of bsoft map reading, retaining code for future input of legacy formats, but should be debugged (1/17/21).
Added in version 1.0.2: Switch to mrcfile support for .mrc and .ccp4 map input (1/17/21).
Changed in version 1.0.1: Updates to documentation.py and doc-strings: html & pdf doc sets complete (1/10/21).
Changed in version 1.0.0a: Refactored for module options to be passed Arguments sub-classes instead of ArgumentParser sub-classes (11/21/20).
Added in version 1.0.0a: Support for Fortran-like operator synonyms to work around the cmd2
limitation that >, |
are captured as output redirects and not passed
through in program commands with atom selections or python expressions
(11/21/20).
For example, .GT., .ge., .OR. can be used to replace >, >= & |
.
Changed in version 1.0.0: Breaking change: upgrade to cmd2 v1.3 with argparse was required as python
optparse has been deprecated (November 2020).
This changes syntax of program control commands.
It introduced a new limitation of cmd2 that > and |
are parsed as
redirects (or removed) and are not passed forward as part of a command.
Added in version 1.0.0: python 3.6 support (November 2020)
Deprecated since version 1.0.0: python 2.7 support (November 2020)
Added in version 0.6.2: Connectivity determined from covalent radii rather than single absolute distance. Needed to support hydrogens (08/24/20).
Added in version 0.6.1: Outputs (CNS/X-plor) segid if input & not using symmetry extensions of PDB format (8/24/20).
Added in version 0.6.0: Support of different atom-name nomenclatures (6/20/20 - 07/18/20).
Added in version 0.5.9: Added atoms.angle() from Davis Catolico (11/21/17) in preparation for bond angle restraints that have not yet been implemented as of 12/22/20. This will need testing.
Added in version 0.5.8: Distance (x-linking) restraints (07/11/17)
Added in version 0.5.8: Rotamer libraries (developed from branch at 0.3.7) (07/03/17)
Added in version 0.5.8: Segmented torsion refinement, but still need rigid group (04/24/17). As of 12/22/20, needs work on user interface and restraints between fragments.
Added in version 0.5.7: Covalent geometry restraints: bond lengths (05/01/17); needs checking as of 12/22/20.
Changed in version 0.5.6: In retrospect, final public distribution before upgrade to 1.0.0.
Added in version 0.5.6: Comparison of anisotropic Bs; adding vector.py (01/11/16).
Added in version 0.5.6: van der Waals restraint (10/22/15) - needs more experience / use as of 12/22/20.
Added in version 0.5.5: Anisotropic Us ride with atom rotations in rigid / torsion motions (10/13/15).
Added in version 0.5.3: torsion.get_all_psi() & other locations now robust to missing OXT (09/27/15).
Changed in version 0.5.2: torsion.py now direction-neutral, so robust to different atom ordering and arbitrary directions of edge determination in the bond graph (08/26/15).
Added in version 0.5.1: calculate() offers (atoms) selection option needed for rotamer evaluation (5/17/20).
Added in version 0.5.1: Tolerate missing Scale/Offset w/ warning (8/11/15).
Added in version 0.5.0: ReStructuredText / Sphinx documentation (05/20/15).
Deprecated since version 0.5.0: EpyDoc doc-strings.
Added in version 0.5.0: Support RAve & RSigma (aka Scale & Offset) in a final record of an X-plor map (2/19/15).
Added in version 0.5.0: Output maps written in Xplor/CNS format (2/17/15).
Added in version 0.4.2: Torsion angle refinement (pre-alpha) in RSRef (October 2013), later extended to Superpose.
Added in version 0.4.2: Limiting calculation of partials according to AtomicParameterization properties for improved efficiency (October 2013).
Changed in version 0.4.2: Breaking change: User interface for model parameterization is redone, requiring changes in program input (October 2013)
Changed in version 0.4.1: Just-in-time Tasks() manager (July 2013), widely deployed, satisfying pre-requisites only as needed, improving efficiency, and reducing run-time errors.
Added in version 0.4.0: symexp.py is a main-program wrap, so that symmetry can be refactored within
Tasks(), see above, to be used by multiple programs.
run_symmetry.sh
is replaced by run_symexp.sh
.
Deprecated since version 0.4.0: Bsoft versions <= 1.7.
Added in version 0.4.0: Bsoft supported, version 1.8.
Added in version 0.4.0: Internals refactored as class option, providing access to attributes through python user-commands (July 2013).
Changed in version 0.3.9: A refine command without arguments will continue additional cycles using previously defined parameters, rather than using defaults (July 2013).
Changed in version 0.3.9: --normalize
(-N
) now needed for normalization of the
input map (used to be the default, July 2013).
Added in version 0.3.9: cmd2nest, an extension of cmd2 - cascading sub-menus of commands (06/05/13).
Changed in version 0.3.9: Upgrade from cmd to cmd2 (05/04/13).
Changed in version 0.3.9: converted from optparse to argparse (5/4/13).
Changed in version 0.3.9: Under-the-hood reprogramming to pull functionalities out of rsref-refine so that they can also be used in superpose (Apr 2013). Analyze becomes an independent command.
Changed in version 0.3.8: Reimplemented superpose to use Pairing
, supporting selections
on both target and moving structures and matching by flexible criteria
(February 2013).
Changed in version 0.3.7: torsion.py is Michael’s consolidation of Brynmor’s TAProtein.py & TAProtein_copy.py (from August 2011) which are now deprecated. Some parts are moved to transform.py. Improving documentation, consistency with other parts of RSRef (11/1/12).
Added in version 0.3.6: Compound expressions within Selections (Oct 2012).
Added in version 0.3.6: Foundations for torsion angle refinement, still under development.
Added in version 0.3.5: First examples included (Sept 2012)
Added in version 0.3.5: Command-line options to condition / scale parameters for improved refinement convergence (Sept 2012)
Added in version 0.3.5: Rigid-group refinement (Sept 2012).
Changed in version 0.3.5: Breaking change: input for commands refine and image_refine change.
Added in version 0.3.4: Distribution de-bugging, documentation overhaul (8/23/12).
Added in version 0.3.3: Patch embedding RSRef into CNS 1.3 is now stable (03/09/12).
Changed in version 0.3.3: Caching of atomic density debugged (3/5/12). Previously had re-used density if B-factor matched without checking atom type. Caching is now disabled when B-factors, Envelope & Resolution refined; improves robustness of image and atom refinement.
Added in version 0.3.2: Restrained refinement of individual B-factors (12/15/11).
Added in version 0.3.1: Crystallographic (lattice) symmetry (December 2011)
Added in version 0.3.0: Contributions from non-crystallographic (molecular) symmetry equivalent atoms (Fall 2011).
Added in version 0.3.0: Through linking to Bsoft library, add support for CCP4, MRC, SPI, DSN6 and other map formats to CNS and X-plor (Fall 2011).
Added in version 0.3.0: Expand form factor sets from which maps are calculated from atoms (Fall 2011), expanding from X-ray into neutron diffraction and widening support for Electron Microscopy.
Added in version 0.3.0: Low-pass filtering and resolution limit (replaces explicit modeling of CTF) that can now be refined (Fall 2011).
Changed in version 0.3.0: Removed bug in which stereochemical terms were omitted from CNS-embedded refinement (Fall 2011).
Changed in version 0.3.0: Map and other attributes retained in RAM, even when communicating with CNS, improving performance (Fall 2011).
Added in version 0.3.0: Compatibility with latest CNS (version 1.3; Fall 2011).
Changed in version 0.3.0: Completely new Python implementation (11/18/11).
Changed in version 0.2: Phipsi restraints: direct use of target PDB file instead of MoleMan output (11/25/12).
Added in version 0.0.1: - 0.1 02/14/10 - 10/28/11: Initial python programming continues
Added in version 0.0.0: EMScale - now deprecated first draft of scaling model and map (December 2009). Restarting version numbers.
Deprecated since version RSRef2000: The forerunner implemented in C, used through 2011. Drop support (for now): GUIs / macro extensions for O molecular graphics and explicit use of EM CTF parameters.
Ongoing issues¶
SciPy bug in some Windows versions:¶
Enthought (Windows 64 bit) bug in SciPy 0.8.0 through 0.10.0 (at least, still a problem 3/4/12) causes optimize.fmin_l_bfgs (refinement) to hang. This likely started with EPD version 7.0. Googling indicates that it can be circumvented by installing SciPy independently (with Python, NumPy) or using an older version of EPD. This bug is known to others, and is not a problem for Linux EPD distributions. Status of Windows 32 bit is unknown.
OpenMP Multi-Processing on unix platforms:¶
Python’s global interpreter lock (GIL) precludes parallel calculations through multi-threading, except within C extensions. A user’s NumPy or SciPy may have been compiled with OpenMP, in which case the array processing will grab half of the total available threads, but with the overhead of short-lived threads, there is almost no improvement in time. Not all NumPy / SciPy installations are compiled with OpenMP, so first check whether RSRef is using more than 100% of a CPU (top or pidstat). If so, RSRef will be friendlier to other processes if the number of threads per process is limited by setting the environment variable OMP_NUM_THREADS=1. When embedded, the CNS part can make effective use of multi-threading, so you might want to allow a larger number of threads, balancing the gain in CNS with the waste in RSRef.
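For scripts that import pasto modules directly, the thread limit can also be applied from within python, provided it is done before NumPy or SciPy is first imported, because OpenMP runtimes generally read the variable when they initialize. This is a minimal sketch with a hypothetical helper name; `OMP_NUM_THREADS` is the standard OpenMP variable, and whether it has any effect depends on how the local NumPy / SciPy were built.

```python
import os


def limit_openmp_threads(count=1, environ=None):
    """Set OMP_NUM_THREADS unless a value has already been chosen."""
    environ = os.environ if environ is None else environ
    # setdefault leaves an existing user setting untouched.
    return environ.setdefault("OMP_NUM_THREADS", str(count))


effective = limit_openmp_threads(1)
print("OMP_NUM_THREADS =", effective)

# Import NumPy, SciPy or pasto modules only after this point.
```

Using `setdefault` respects a value already exported in the shell, which matters for the CNS-embedded case where a larger number of threads may be wanted.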
Crystallographic symmetry:¶
Support of crystallographic symmetry is temporarily suspended. The (pickled) dictionary of operators is incompatible with python 3, and its rebuilding depends on availability of a python 3 compatible cctbx (phenix). PaStO programs are usually run with at most local / molecular / non-crystallographic symmetry, so it is hoped that this is not a problem.
Colophon¶
Documentation is available:
online at: https://chapman.missouri.edu/software/
within local installations, starting at:
file:///$<PARENT>/doc/build/index.html
This section documents how documentation is rebuilt following an edit, or how to add additional formats, such as pdf or man pages.
documentation.py
is used to assemble documentation under
the user’s chosen DOC-PATH with a command like:
python -u -m documentation [--clobber] [--developer] [--warnings] --nonstop
--builder=html --builder=latexpdf DOC-PATH
(Take care, because the default DOC-PATH will update / overwrite the
installation documentation in $RSREF_HOME/doc
.)
There are many options, to be listed with documentation.py --help,
but --clobber
is used to overwrite existing files (without warning),
and --warnings
is to print documentation diagnostics and should be
omitted for a clean set.
documentation.sh
runs documentation.py
with
appropriate parameters to generate a html documentation set and a pdf manual.
Run twice, and it should complete the cross references.
Further details of documentation.sh
are given below.
documentation.py
gathers in DOC-PATH/source the
reStructuredText
(.rst
& .txt
) files needed from:
- A handful of overview files in the $<PARENT> and $<PARENT>/pasto folders.
- Auto-building and extraction of doc-strings from .py source files in the $<PARENT>/pasto tree, using Sphinx.
- Auto-generation of <program>_cmd_list.rst and <program>_cmd_help.rst files, by capturing the output for each <program> of:
  - program.py --help (command-line arguments).
  - (command-control help).
  which are then included in the documentation for each program.
documentation.py
then builds a documentation set in html
format (by default) in DOC-PATH/build/html using
Sphinx.
Additional formats are optional when running
documentation.py
or can be added using a command such as:
make latexpdf
while in the folder DOC-PATH.
This uses the Makefile
generated by
Sphinx.
Credits¶
Michael S. Chapman with help from Andrew Trzynka & Brynmor K. Chapman
Last updated: 07/11/24; chapmanms@missouri.edu