******************************
Installation & Getting Started
******************************

.. include::

.. include::

Synopsis
========

PaStO is a small suite of programs containing RSRef for real-space atomic
refinement, Superpose for flexible or rigid-group structure comparison, and
some supporting programs:

* The core functionality of RSRef is comparison of an atomic model to a 3D
  map:

  - Crystallographic electron density
  - Electron microscopic (EM) Coulombic potential

  Least-squares residuals are calculated and minimized using partial
  derivatives of the fit with respect to model and imaging parameters. Thus
  the consistency of atomic model and map can be optimized, improving the
  correlation coefficient. Model parameters include coordinates and
  B-factors. EM imaging parameters include magnification, resolution and
  envelope correction.

* Patches are provided to embed RSRef in the `CNS package `_. This provides a
  real-space target for stereochemically restrained refinement. Several
  optimizers are then supported, including Powell or L-BFGS gradient descent
  or simulated annealing molecular dynamics. Atomic models can be
  parameterized as Cartesian all-atom or torsion angle.

* RSRef and SuperPose have growing capabilities for their own (stand-alone)
  atomic refinements that complement (and increasingly supersede) traditional
  packages like CNS. A starting emphasis within the whole PaStO package has
  been parsimonious optimization, minimizing changes to a structure when
  fitting to a map or an overlaid structure. Rigid-group and (backbone)
  torsion angle optimizations are supported with optional van der Waals
  repulsion and distance restraints. Stereochemistry within molecules (or
  fragments) is fixed / constrained. Such reduced-parameter approaches
  mitigate over-fitting when datasets are lower resolution. Further
  stereochemical restraints are under development to support additional
  modes of refinement.

.. _Dependencies:

Dependencies
============

CNS and Bsoft (if desired) need to be available for RSRef extension /
embedding upon setup of RSRef, so must be pre-installed. Python dependencies
can be satisfied at run-time.

CNS
---

High-resolution, all-atom flexible real-space refinement requires RSRef to be
embedded within `CNS `_. Other modes of refinement do not require CNS, so
co-installation is optional. :ref:`Installation ` requires recompilation of
CNS. RSRef should be compatible with multiple versions of CNS and is tested
with v1.3.

mrcfile
-------

Natively, RSRef supports only X-plor/CNS format maps. Formats :file:`.ccp4`
and :file:`.mrc` are supported by calls to the mrcfile python package, which
can be installed in one of the following standard ways:

* ``pip install mrcfile``
* ``conda install -c conda-forge mrcfile``

This is optional, but recommended. Jobs often run nearly twice as fast with
binary map formats.

Bsoft
-----

Support for :file:`.ccp4`, :file:`.mrc` and many other legacy map formats can
also be added by linking in the `Bsoft library `_. This is no longer
recommended, for three reasons: Bsoft silently ignores alternative map
orientations specified in the header; our interface is limited to now-legacy
Bsoft versions <= 1.9.2; and we suspect bugs in our extension interface that
are currently worked around only for specific cases.

Python & modules
----------------

Links are given for reference, but installation will likely be easier with
pip, conda *etc.*

Version 3 interpreter
   PaStO v1.0 has been refactored for Python 3 (tested on v3.6 to 3.12).
   Support for Python v2.7 has been dropped. For setup with CNS (and Bsoft),
   it will have to be a development-capable installation of python that
   provides static archive libraries (:file:`.a`), and not a stripped-down
   version, as is sometimes shipped with the operating system. (At the time of
   writing, gcc finds fatal incompatibilities with the .a distributed in the
   Anaconda libpython-static package (v3.12).)
`NumPy `_
   Required throughout; tested with v2.0.

`SciPy `_
   Widespread use; requires version >= 0.12; tested with v1.13.

`cmd2 `_
   Required for command control in all but CNS-embedded RSRef; version
   >= 2.4. cmd2 is an extension of the python cmd module.

`mrcfile `_
   Replaces Bsoft for map input. Only XPLOR format is supported natively.
   Version >= 1.5.

Possibly `gnureadline `_
   Only for python installations (unlike Anaconda) that do not include the
   "standard" readline extension module, and give a startup warning. It is
   used only for tab-completion by cmd2 and is not absolutely required.
   Tested with v8.0.0.

`matplotlib `_
   Graphing for several commands in superpose; tested with v3.5.

`Sphinx `_
   Needed if you want to rebuild the supplied documentation or build it in a
   different format (pdf, man pages, *etc.*). Requires version >= 3.2; tested
   with v7.3.7.

`libpython-static `_
   Required only if embedding PaStO within CNS and using a recent Anaconda
   python. This module provides :file:`libpython{X}.{Y}.a`, which is no
   longer contained in some distributions of python. (At the time of writing,
   gcc finds fatal incompatibilities with the .a distributed in the Anaconda
   libpython-static package (v3.12). For now, a different distribution of
   python will be needed (with CNS) and this package will not be required.)

.. _RSref-Installation:

RSRef Installation
==================

In the following, "X.Y.Z" is used to denote the version, e.g. 1.0.6.

Python environment
------------------

It is recommended that PaStO be installed within an isolated python
environment. To create a new environment and activate it:

.. code-block:: bash

   python -m venv
   source /bin/activate

for example:

.. code-block:: bash

   python3.10 -m venv ~/.virtualenvs/pastoenv
   source ~/.virtualenvs/pastoenv/bin/activate

At the time of writing, incompatibilities in libpython3.12.a prevent the use
of an Anaconda environment, specifically for building an extended CNS with
PaStO embedded, but it would be created as:

.. code-block:: bash

   conda create -n
   conda activate

(or :command:`source activate ` on some systems)

The activate command will have to be repeated for each session using PaStO
and so is a good candidate for inclusion in a login script.

pip install
-----------

This is recommended for all installations, but strictly it is optional if you
also plan to extract from a tar distribution. pip install is also the
preferred route to importing unmodified pasto into other packages. It is
hoped that soon it will be possible to distribute from the PyPI repository,
but currently licensing restrictions require the wheel (.whl) file to be
obtained from the developers and installed explicitly:
:command:`pip install` :file:`{path}/PaStO-{X}.{Y}.{Z}-py3-none-any.whl`.

tar extraction
--------------

The full tar-extracted distribution will be needed either to rebuild CNS with
PaStO embedded or for further development of PaStO. Even if a
pip-installation is to be used, it is recommended that the tar also be
extracted for reference. It provides additional tests, examples and local
documentation that are not distributed with the wheel.

(Unix is described first. Adaptations for Windows are described in a
subsequent section.)

Embedding in CNS:
-----------------

Directions are provided in the `documentation `_. (From the home page, click
on Embedded RSRef, then Installation in the left frame.)

Validation:
-----------

(After setting the environment appropriately, see below...)

A quick minimal test refinement can be run with :file:`run_rsref.sh`. The
script comments on the results expected. More extensive examples are noted in
"Getting Started" below.

Additionally, a panel of module unit tests is available in
:file:`${}/pasto/test`. In this directory, unit-test modules can be run
individually, or python can be asked to find and run all test modules with
:command:`python -m unittest discover -v`. Some of the tests use large data
sets that are distributed with the tar file, but not the wheel (pip).
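The same discovery run can also be driven from Python, for instance to print
a compact summary or to call it from another script. The following is a
minimal sketch that uses only the standard-library ``unittest`` API; the
helper name ``run_discovered_tests`` and its defaults are illustrative
assumptions and are not part of PaStO:

.. code-block:: python

   import unittest


   def run_discovered_tests(start_dir=".", pattern="test*.py", verbosity=2):
       """Discover unittest modules under start_dir, run them, return the result."""
       # TestLoader.discover() is what "python -m unittest discover" uses.
       suite = unittest.TestLoader().discover(start_dir, pattern=pattern)
       runner = unittest.TextTestRunner(verbosity=verbosity)
       return runner.run(suite)


   def summarize(result):
       """Return a one-line summary of a unittest result object."""
       return (
           f"ran={result.testsRun} failures={len(result.failures)} "
           f"errors={len(result.errors)} skipped={len(result.skipped)}"
       )

Called with the path of the test directory, ``run_discovered_tests()``
returns a result object whose ``wasSuccessful()`` method gives the overall
outcome, and ``summarize()`` condenses it to one line. Tests that need the
large data sets would presumably fail or be skipped in a wheel-only
installation.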
Windows
-------

.. warning:: The Windows implementation has not been tested since major
   refactoring.

.. todo:: Test and rewrite Windows instructions.

Stand-alone functionality is supported. Input of .ccp4 and .mrc maps should
be possible with the mrcfile package, but has not been tested. Bsoft is not
available for Windows. CNS/Xplor maps are supported natively in the PaStO
package. CNS is available. If appropriate compilers are available, the CNS
extension could, in principle, be supported, but this has not been tested.

.. warning:: Recent versions need to be retested in a Windows environment.

The distribution file, :file:`pasto-{X}.{Y}.{Z}.tar.zip`, has been prepared
by :command:`python -m build`. Extract the files into the same folder
containing the .zip file (likely one folder above the default). In a Command
prompt window, descend into :file:`pasto-{X}.{Y}.{Z}` and enjoy.

Validation: A command-line equivalent to run_rsref.sh would be:

.. code-block:: winbatch

   set DATA=%%\pasto\test
   rsref.py --map=%DATA%\alanine-2.0.xplor --high_resolution=2.0 ^
      --pdb_in=%DATA%\alanine.pdb ^
      --by_resolution --map_use=0.9 --atom_extent=2.5
   evaluate
   stop

(Which could be put in a batch .bat file...)

.. _Distribution:

Distribution - developers only
==============================

The build package will be needed: :command:`pip install build`.

Files :file:`dist/PaStO-{X}.{Y}.{Z}-py3-none-any.whl` and
:file:`dist/pasto-{X}.{Y}.{Z}.tar.gz` are created with
:command:`python -m build` in the parent directory of :file:`pasto`. The
wheel file contents are specified within pyproject.toml. The tar file
contents are specified within MANIFEST.in. Contents may be checked
respectively with :command:`unzip -l dist/*.whl` and
:command:`tar tvzf dist/*.tar.gz`.

.. _Environment:

Environment
===========

.. note:: This section is optional if you are using a pip wheel
   installation, because command-line entry points are provided.
   Environmental variables are set automatically in cns_rsref.sh examples.
   So, this section is required only for direct running of the source code
   (as it comes from a tar extraction).

Unix:
-----

Something like the following will be needed (in bash), where is the path of
the directory containing the pasto package,
(:file:`.../pasto-{X}.{Y}.{Z}` if extracted from the tar file):

.. code-block:: bash

   export PATH=:/pasto:${PATH}
   export PYTHONPATH=:/pasto:${PYTHONPATH}

The use of cns input files may depend on CNS environmental parameters that
can be set individually as needed or, in csh, with ``source cns_solve_env``.

Windows
-------

Windows environmental variables are set in a similar way (command prompt
window):

.. code-block:: winbatch

   set PATH=%%:%/pasto%:%PATH%
   set PYTHONPATH=%%:%/pasto%:%PYTHONPATH%

or they can be set permanently in the :file:`autoexec.bat` file using
msconfig.

Documentation
=============

HTML documentation is provided with an entry point of
`${}/doc/index.html `_. This has been prepared for distribution using
:mod:`documentation.py `, invoked by :file:`doc/documentation.sh`, which can
be used to rebuild the documentation or make it available in different
formats, see `Colophon`_.

Options and command summaries are documented separately, for example:

.. code-block:: bash

   rsref.py -h

(command-line) or :menuselection:`RSRef --> help [command]` *i.e.* run
rsref.py, then "help" at the prompt.

Getting started
===============

.. _Running:

Running from a pip installation
-------------------------------

The following commands are defined in ${VIRTUAL_ENV}/bin and hence within the
${PATH} of the python environment. They constitute entry points to the
package, running the associated python module as a main program:

RSRef [options]
   Compare and optimize the fit of an atomic model to a map.

SuperPose [options]
   Superimposition and morphing of one structure onto another.

SymExp [options]
   Generate neighbors from crystallographic and local symmetry.

Translate [options]
   Switch between atom naming conventions in PDB coordinate files.
PaStO [options]
   An alternative way of invoking any pasto module with main()
   (e.g. :command:`PaStO rsref -h`)

Modules can also be imported within a python script:
:command:`import pasto.`.

Running from source
-------------------

If the environment is defined as previously, the above commands can also be
run, using tar-extracted sources (or those within a pip install), as:

.. code-block:: bash

   rsref.py [options]
   python3 -m rsref [options]
   superpose.py [options]
   python3 -m superpose [options]
   symexp.py [options]
   python3 -m symexp [options]
   translate.py [options]
   python3 -m translate [options]

.. _Extended_CNS:

Running RSRef-extended CNS:
---------------------------

First, see the `documentation `_. (From the home page, click on Embedded
RSRef.) Then use the Examples (described below) such as in the virus-5A
folder. After editing for your local environment, jobs are run by
:command:`./cns_rsref.sh` :file:`{cns-commands}.inp`.

.. _Examples:

Examples
--------

A suite of examples is being expanded. :file:`{}/examples` contains
sub-directories with data, input, output structures and log files. See
:file:`README.txt` in each sub-directory for details:

virus-antibody:
   exemplifies rigid-body domain-level refinement of a complex using an EM
   map at 10 |Aring| resolution.

virus-5A:
   exemplifies flexible fitting to a 4.5 |Aring| EM reconstruction; also
   :program:`Superpose` comparison of structures.

pkd_rotamers:
   automatic selection of library rotamers that best fit a map.

.. _Change-log:

Version History
===============

.. versionadded:: 1.0.6
   pip install added (11/26/24).

.. versionadded:: 1.0.5
   Compatible with breaking change in cmd2 2.0.1: py command no longer
   available, py instead opens python shell; consider also run_pyscript
   file.py. Scripted examples within the distributed package have been
   updated (7/11/24).

.. versionadded:: 1.0.2
   Expose alternative optimization algorithms from scipy.optimize.minimize
   for rsref & superpose. This also improves the consistency of convergence
   criteria between different methods (2/5/21).

.. deprecated:: 1.0.2
   End full support of bsoft map reading, retaining code for future input of
   legacy formats, but should be debugged (1/17/21).

.. versionadded:: 1.0.2
   Switch to mrcfile support for .mrc and .ccp4 map input (1/17/21).

.. versionchanged:: 1.0.1
   Updates to documentation.py and doc-strings: html & pdf doc sets complete
   (1/10/21).

.. versionchanged:: 1.0.0a
   Refactored for module options to be passed Arguments sub-classes instead
   of ArgumentParser sub-classes (11/21/20).

.. versionadded:: 1.0.0a
   Support for Fortran-like operator synonyms to work around the cmd2
   limitation that ``>, |`` are captured as output redirects and not passed
   through in program commands with atom selections or python expressions
   (11/21/20). For example, .GT., .ge., .OR. can be used to replace
   ``>, >= & |``.

.. versionchanged:: 1.0.0
   Breaking change: upgrade to cmd2 v1.3 with argparse was required as python
   optparse has been deprecated (November 2020). This changes syntax of
   program control commands. It introduced a new limitation of cmd2 that
   ``> and |`` are parsed as redirects (or removed) and are not passed
   forward as part of a command.

.. versionadded:: 1.0.0
   python 3.6 support (November 2020)

.. deprecated:: 1.0.0
   python 2.7 support (November 2020)

.. versionadded:: 0.6.2
   Connectivity determined from covalent radii rather than single absolute
   distance. Needed to support hydrogens (08/24/20).

.. versionadded:: 0.6.1
   Outputs (CNS/X-plor) segid if input & not using symmetry extensions of PDB
   format (8/24/20).

.. versionadded:: 0.6.0
   Support of different atom-name nomenclatures (6/20/20 - 07/18/20).

.. versionadded:: 0.5.9
   Added atoms.angle() from Davis Catolico (11/21/17) in preparation for bond
   angle restraints that have not yet been implemented as of 12/22/20. This
   will need testing.

.. versionadded:: 0.5.8
   Distance (x-linking) restraints (07/11/17)

.. versionadded:: 0.5.8
   Rotamer libraries (developed from branch at 0.3.7) (07/03/17)

.. versionadded:: 0.5.8
   Segmented torsion refinement, but still need rigid group (04/24/17). As of
   12/22/20, needs work on user interface and restraints between fragments.

.. versionadded:: 0.5.7
   Covalent geometry restraints: bond lengths (05/01/17); needs checking as
   of 12/22/20.

.. versionchanged:: 0.5.6
   In retrospect, final public distribution before upgrade to 1.0.0.

.. versionadded:: 0.5.6
   Comparison of anisotropic Bs; adding vector.py (01/11/16).

.. versionadded:: 0.5.6
   van der Waals restraint (10/22/15) - needs more experience / use as of
   12/22/20.

.. versionadded:: 0.5.5
   Anisotropic Us ride with atom rotations in rigid / torsion motions
   (10/13/15).

.. versionadded:: 0.5.3
   torsion.get_all_psi() & other locations now robust to missing OXT
   (09/27/15).

.. versionchanged:: 0.5.2
   torsion.py now direction-neutral, so robust to different atom ordering and
   arbitrary directions of edge determination in the bond graph (08/26/15).

.. versionadded:: 0.5.1
   calculate() offers (atoms) selection option needed for rotamer evaluation
   (5/17/20).

.. versionadded:: 0.5.1
   Tolerate missing Scale/Offset w/ warning (8/11/15).

.. versionadded:: 0.5.0
   ReStructuredText / Sphinx documentation (05/20/15).

.. deprecated:: 0.5.0
   EpyDoc doc-strings.

.. versionadded:: 0.5.0
   Support RAve & RSigma (aka Scale & Offset) in a final record of an X-plor
   map (2/19/15).

.. versionadded:: 0.5.0
   Output maps written in Xplor/CNS format (2/17/15).

.. versionadded:: 0.4.2
   Torsion angle refinement (pre-alpha) in RSRef (October 2013), later
   extended to Superpose.

.. versionadded:: 0.4.2
   Limiting calculation of partials according to AtomicParameterization
   properties for improved efficiency (October 2013).

.. versionchanged:: 0.4.2
   Breaking change: User interface for model parameterization is redone,
   requiring changes in program input (October 2013)

.. versionchanged:: 0.4.1
   Just-in-time Tasks() manager (July 2013), widely deployed, satisfying
   pre-requisites only as needed, improving efficiency, and reducing run-time
   errors.

.. versionadded:: 0.4.0
   symexp.py is a main-program wrap, so that symmetry can be refactored
   within Tasks(), see above, to be used by multiple programs.
   :file:`run_symmetry.sh` is replaced by :file:`run_symexp.sh`.

.. deprecated:: 0.4.0
   Bsoft versions <= 1.7.

.. versionadded:: 0.4.0
   Bsoft supported, version 1.8.

.. versionadded:: 0.4.0
   Internals refactored as class option, providing access to attributes
   through python user-commands (July 2013).

.. versionchanged:: 0.4.0
   python cmd is replaced by cmd2nest, an extension of cmd2, written to
   support cascading command sub-menus. Introduced here as alpha with partial
   implementation (July 2013), later implemented throughout the package.
   Changes input with deprecation of commands exec, EXEC, echo, value,
   replaced with the py python interpreter of cmd2 and other commands.

.. versionchanged:: 0.4.0
   Breaking change: switching from python optparse (deprecated) to argparse,
   changing syntax of command-line arguments (July 2013). Option
   :option:`--by_resolution` is replaced by :option:`--relative_extent`,
   :option:`--relative_use` and :option:`--require_relative`.

.. versionchanged:: 0.3.9
   A refine command without arguments will continue additional cycles using
   previously defined parameters, rather than using defaults (July 2013).

.. versionchanged:: 0.3.9
   :option:`!--normalize` (:option:`!-N`) now needed for normalization of the
   input map (used to be the default, July 2013).

.. versionadded:: 0.3.9
   cmd2nest, an extension of cmd2 - cascading sub-menus of commands
   (06/05/13).

.. versionchanged:: 0.3.9
   Upgrade from cmd to cmd2 (05/04/13).

.. versionchanged:: 0.3.9
   converted from optparse to argparse (5/4/13).

.. versionchanged:: 0.3.9
   Under-the-hood reprogramming to pull functionalities out of rsref-refine
   so that they can also be used in superpose (Apr 2013). Analyze becomes an
   independent command.

.. versionchanged:: 0.3.8
   Reimplemented superpose to use :py:class:`Pairing`, supporting selections
   on both target and moving structures and matching by flexible criteria
   (February 2013).

.. versionchanged:: 0.3.7
   torsion.py is Michael's consolidation of Brynmor's TAProtein.py &
   TAProtein_copy.py (from August 2011) which are now deprecated. Some parts
   are moved to transform.py. Improving documentation, consistency with other
   parts of RSRef (11/1/12).

.. versionadded:: 0.3.6
   Compound expressions within Selections (Oct 2012).

.. versionadded:: 0.3.6
   Foundations for torsion angle refinement, still under development.

.. versionadded:: 0.3.5
   First examples included (Sept 2012)

.. versionadded:: 0.3.5
   Command-line options to condition / scale parameters for improved
   refinement convergence (Sept 2012)

.. versionadded:: 0.3.5
   Rigid-group refinement (Sept 2012).

.. versionchanged:: 0.3.5
   Breaking change: input for commands refine and image_refine change.

.. versionadded:: 0.3.4
   Distribution de-bugging, documentation overhaul (8/23/12).

.. versionadded:: 0.3.3
   Patch embedding RSRef into CNS 1.3 is now stable (03/09/12).

.. versionchanged:: 0.3.3
   Caching of atomic density debugged (3/5/12). Previously had re-used
   density if B-factor matched without checking atom type. Caching is now
   disabled when B-factors, Envelope & Resolution refined; improves
   robustness of image and atom refinement.

.. versionadded:: 0.3.2
   Restrained refinement of individual B-factors (12/15/11).

.. versionadded:: 0.3.1
   Crystallographic (lattice) symmetry (December 2011)

.. versionadded:: 0.3.0
   Contributions from non-crystallographic (molecular) symmetry equivalent
   atoms (Fall 2011).

.. versionadded:: 0.3.0
   Through linking to Bsoft library, add support for CCP4, MRC, SPI, DSN6 and
   other map formats to CNS and X-plor (Fall 2011).

.. versionadded:: 0.3.0
   Expand form factor sets from which maps are calculated from atoms (Fall
   2011), expanding from X-ray into neutron diffraction and widening support
   for Electron Microscopy.

.. versionadded:: 0.3.0
   Low-pass filtering and resolution limit (replaces explicit modeling of
   CTF) that can now be refined (Fall 2011).

.. versionchanged:: 0.3.0
   Removed bug in which stereochemical terms were omitted from CNS-embedded
   refinement (Fall 2011).

.. versionchanged:: 0.3.0
   Map and other attributes retained in RAM, even when communicating with
   CNS, improving performance (Fall 2011).

.. versionadded:: 0.3.0
   Compatibility with latest CNS (version 1.3; Fall 2011).

.. versionchanged:: 0.3.0
   Completely new Python implementation (11/18/11).

.. versionchanged:: 0.2
   Phipsi restraints: direct use of target PDB file instead of MoleMan output
   (11/25/12).

.. versionadded:: 0.0.1 - 0.1
   02/14/10 - 10/28/11: Initial python programming continues

.. versionadded:: 0.0.0
   EMScale (now deprecated): first draft of scaling model and map (December
   2009). Restarting version numbers.

.. deprecated:: RSRef2000
   The forerunner implemented in C, used through 2011. Drop support (for
   now): GUIs / macro extensions for O molecular graphics and explicit use of
   EM CTF parameters.

Ongoing issues
==============

SciPy bug in some Windows versions:
-----------------------------------

Enthought (Windows 64 bit) bug in SciPy 0.8.0 through 0.10.0 (at least, still
a problem 3/4/12) causes optimize.fmin_l_bfgs (refinement) to hang. This
likely started with EPD version 7.0. Googling indicates that it can be
circumvented by installing SciPy independently (with Python, NumPy) or using
an older version of EPD. This bug is known to others, and is not a problem
for Linux EPD distributions. Status of Windows 32 bit is unknown.
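One hedged way to check whether a particular installation is affected is to
run a tiny L-BFGS minimization in a child process under a time limit, so that
a hang is reported rather than blocking the session. The sketch below is an
illustration only: the helper name ``probe_status``, the 30 second default
and the probe problem are assumptions, and ``scipy.optimize.fmin_l_bfgs_b``
is assumed to be the SciPy routine corresponding to the optimizer named
above.

.. code-block:: python

   import subprocess
   import sys

   # Assumed probe: one-variable L-BFGS-B minimization with a numerical gradient.
   PROBE = (
       "from scipy.optimize import fmin_l_bfgs_b\n"
       "fmin_l_bfgs_b(lambda x: (x[0] - 3.0) ** 2, [0.0], approx_grad=True)\n"
   )


   def probe_status(code=PROBE, timeout=30.0):
       """Run code in a child interpreter; return 'ok', 'error' or 'hang'."""
       try:
           done = subprocess.run(
               [sys.executable, "-c", code],
               stdout=subprocess.DEVNULL,
               stderr=subprocess.DEVNULL,
               timeout=timeout,
           )
       except subprocess.TimeoutExpired:
           # subprocess.run() kills the child before raising this exception.
           return "hang"
       return "ok" if done.returncode == 0 else "error"

``probe_status()`` returns ``"hang"`` if the minimization does not finish in
time, and ``"error"`` if the child exits abnormally, for example because
SciPy cannot be imported.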
OpenMP Multi-Processing on unix platforms:
------------------------------------------

Python's global interpreter lock (GIL) precludes parallel calculations through
multi-threading, except within C extensions. A user's NumPy or SciPy may have
been compiled with OpenMP, in which case the array processing will grab half
of the total available threads, but with the overhead of short-lived threads,
there is almost no improvement in time. Not all NumPy / SciPy installs are
compiled with OpenMP, so first check whether RSRef is using more than 100% of
a CPU (top or pidstat). If so, RSRef will be friendlier to other processes if
the number of threads per process is limited by setting the environmental
variable OMP_NUM_THREADS=1. When embedded, the CNS part can make effective use
of multi-threading, so you might want to allow a larger number of threads,
balancing the gain in CNS with the waste in RSRef.

Crystallographic symmetry:
--------------------------

Support of crystallographic symmetry is temporarily suspended. The (pickled)
dictionary of operators is incompatible with python 3, and its rebuilding
depends on availability of a python 3 compatible cctbx (phenix). PaStO
programs are usually run with at most local / molecular /
non-crystallographic symmetry, so it is hoped that this is not a problem.

.. _Colophon:

Colophon
========

Documentation is available:

* online at: ``_
* within local installations, starting at:
  :file:`file:///${}/doc/build/index.html`

This section documents how documentation is rebuilt following an edit, or how
to add additional formats, such as pdf or man pages.

:mod:`documentation.py ` is used to assemble documentation under the user's
chosen DOC-PATH with a command like:

.. code-block:: bash

   python -u -m documentation [--clobber] [--developer] [--warnings] --nonstop --builder=html --builder=latexpdf DOC-PATH

(Take care, because the default DOC-PATH will update / overwrite the
installation documentation in :file:`${RSREF_HOME}/doc`.)
There are many options, to be listed with
:program:`documentation.py --help`, but :option:`!--clobber` is used to
overwrite existing files (without warning), and :option:`!--warnings` prints
documentation diagnostics and should be omitted for a clean set.

:file:`documentation.sh` runs :mod:`documentation.py ` with appropriate
parameters to generate an HTML documentation set and a pdf manual. Run it
twice, and it should complete the cross references. Further details of
:file:`documentation.sh` are given below.

:mod:`documentation.py ` gathers in DOC-PATH/source the
`reStructuredText `_ (:file:`.rst` & :file:`.txt`) files needed from:

* A handful of overview files in the :file:`${}` and :file:`${}/pasto`
  folders.
* Auto-building and extraction of doc-strings from :file:`.py` source files
  in the :file:`${}/pasto` tree, using `Sphinx `_.
* Auto-generation of :file:`_cmd_list.rst` and :file:`_cmd_help.rst` files,
  by capturing the output for each of:

  - ``program.py --help`` (command-line arguments).
  - :menuselection:`program --> help` (command-control help).

  which are then included in the documentation for each program.

:mod:`documentation.py ` then builds a documentation set in html format (by
default) in DOC-PATH/build/html using `Sphinx `_. Additional formats are
optional when running :mod:`documentation.py ` or can be added using a
command such as:

.. code-block:: bash

   make latexpdf

while in the folder DOC-PATH. This uses the :file:`Makefile` generated by
`Sphinx `_.

Credits
=======

Michael S. Chapman with help from Andrew Trzynka & Brynmor K. Chapman

Last updated: 07/11/24; chapmanms@missouri.edu