translate module

Support different atom name nomenclatures.

usage:
  • Translate [options] <input-coordinate-file> [translated-file]

  • translate.py [options] <input-coordinate-file> [translated-file]

Module author: Michael S. Chapman <chapmanms@missouri.edu>

Authors:

Michael S. Chapman <chapmanms@missouri.edu>

University of Missouri

Version:

1, Nov 26, 2024

Changed in version 0.6.0: 4/13/20 Started

Changed in version 0.6.1: 07/09/20 Working version installed

Changed in version 0.6.2: 07/18/20 Options -d, -D to -q, -Q for package compatibility

Changed in version 1.0.2: 01/19/21 Dictionary attributes go private to remove long variable values from documentation.

Synopsis

Variants of the PDB-format coordinate file use different atom nomenclatures. This module manages the translation between these dialects. It is normally called by other PaStO programs, so users rarely need to invoke it explicitly, but it is also available as a stand-alone program.

Command-line options
====================

The most up-to-date documentation is generated from :command:`translate.py -h`:

(Command: /trihome/chapmanms/Devel/RSRef/FTatom/pasto/translate.py -h)

(Sources: /trihome/chapmanms/Devel/RSRef/FTatom/pasto v1.0.6)

usage: translate.py [-h] [--version] [--infile FILE] [--outfile FILE] [--in_nomenclature STR] [--out_nomenclature STR]
                    ... input output

Change atom-name convention in a coordinate file (c) University of Missouri 2020, Michael S. Chapman

positional arguments:
  ...                   Program commands may follow any required positional arguments, or optional positional arguments terminated
                        by '--'. Commands are space-separated, quoted if containing white space.
  input                 input coordinate file
  output                output coordinate file

options:
  -h, --help            show this help message and exit
  --version, -v         show program's version number and exit
  --infile FILE         Redirected standard input, like "<". (default: <_io.TextIOWrapper name='<stdin>' mode='r'
                        encoding='utf-8'>)
  --outfile FILE        Redirected standard output, like ">". (default: <_io.TextIOWrapper name='<stdout>' mode='w'
                        encoding='utf-8'>)

Nomenclature translation:
  --in_nomenclature STR, -q STR
                        From input: bmrb, cif, cns, diana, midas, msi, pdb92, sc, sybyl, ucsf, xplor, lax; lax for no translation,
                        None for auto-determine (default: None)
  --out_nomenclature STR, -Q STR
                        To output: bmrb, cif, cns, diana, midas, msi, pdb92, sc, sybyl, ucsf, xplor; default: same as
                        --in_nomenclature (-q) if defined, else internal convention if None (default: None)

+<file> inserts options from <file>, one per line.
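
The `+<file>` behaviour resembles the `fromfile_prefix_chars` feature of Python's standard `argparse` module. The sketch below is a minimal illustration under that assumption; it is not the PaStO parser. The function names, the optional second positional argument, and the file names `in.pdb` and `out.pdb` are hypothetical.

```python
import argparse
import os
import tempfile


def build_parser():
    """Hypothetical stand-in for the parser described above (not the PaStO code)."""
    parser = argparse.ArgumentParser(
        prog="translate.py",
        fromfile_prefix_chars="+",  # "+<file>" expands to the arguments listed in <file>
    )
    parser.add_argument("--in_nomenclature", "-q", metavar="STR", default=None)
    parser.add_argument("--out_nomenclature", "-Q", metavar="STR", default=None)
    parser.add_argument("input", help="input coordinate file")
    parser.add_argument("output", nargs="?", help="output coordinate file")
    return parser


def parse_with_options_file(option_lines, extra_args):
    """Write option_lines to a temporary file, then parse '+<file>' plus extra_args."""
    with tempfile.NamedTemporaryFile("w", suffix=".opt", delete=False) as handle:
        handle.write("\n".join(option_lines) + "\n")
        path = handle.name
    try:
        return build_parser().parse_args(["+" + path] + list(extra_args))
    finally:
        os.remove(path)


# argparse reads one argument per line, so a flag and its value sit on separate lines.
args = parse_with_options_file(["-q", "xplor", "-Q", "cif"], ["in.pdb", "out.pdb"])
print(args.in_nomenclature, args.out_nomenclature, args.input, args.output)
```

With standard `argparse`, each line of the options file is one argument, so `-q` and its value occupy separate lines unless the `--in_nomenclature=xplor` form is used.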

class translate.Arguments(imports=[], main=None, *args, **kwargs)

Bases: Arguments

Manager for command-line arguments from main and imported modules.

This is a template to be copied into modules for the handling of command-line options.

Methods export() and domestic() should be overridden by the module subclass with declarations of the command-line arguments needed by the module.

Parameters:
  • imports ((list of) ArgumentParser(s)) – (list of) module(s) from which to obtain “parent” ArgumentParser objects for inclusion.

  • main (bool|NoneType) – Add information and arguments appropriate for a main program (or not); if None, will determine by whether this class is defined in __main__.

domestic()

Defines options used only when main program and not when imported.

export()

Defines options used in both stand-alone and imported modes.
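
The split between export() and domestic() can be pictured with a simplified stand-in. This is a sketch under assumptions, not the PaStO base class: `ArgumentsSketch` is a hypothetical name, the imports mechanism is omitted, and which options the real module places in each method is a guess.

```python
import argparse


class ArgumentsSketch:
    """Hypothetical, simplified stand-in for the Arguments template."""

    def __init__(self, main=None):
        # If main is None, decide by whether this class is defined in __main__.
        self.main = (__name__ == "__main__") if main is None else main
        self.parser = argparse.ArgumentParser(add_help=self.main)
        self.export()          # options shared by stand-alone and imported use
        if self.main:
            self.domestic()    # options only for the stand-alone program

    def export(self):
        """Options used in both stand-alone and imported modes (assumed set)."""
        self.parser.add_argument("--in_nomenclature", "-q", default=None)
        self.parser.add_argument("--out_nomenclature", "-Q", default=None)

    def domestic(self):
        """Options used only when run as the main program (assumed set)."""
        self.parser.add_argument("input")
        self.parser.add_argument("output", nargs="?")


# Imported mode: only the exported options are defined.
imported = ArgumentsSketch(main=False).parser.parse_args(["-q", "cns"])
# Stand-alone mode: positional file arguments are added as well.
standalone = ArgumentsSketch(main=True).parser.parse_args(["-Q", "cif", "model.pdb"])
```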

class translate.Dictionary

Bases: object

Multi-convention dictionary of coordinate atom names.

Variables:
  • _original (tuple(str)) – joined elements comprise a literal copy of atom_nom.tbl.txt from http://www.bmrb.wisc.edu/ref_info/atom_nom.tbl.

  • _dictionary (dict(list(dict))) – equivalent of _original. Outer keys are residue types (‘ALA’, ‘ASN’, …). Values are lists of dictionaries with keys of dialect and values of atom name, e.g. {‘cif’: ‘N’, ‘xplor’: ‘N’, …}.

  • _names (dict) – names of the atoms stored as dict(dict(set)) where the outer dict is keyed by convention name, the inner by residue name, and then sets of atom names (None omitted), so the set of cif atom names for valine would be accessible as _names[‘cif’][‘VAL’].

  • convention (list) – supported nomenclatures

  • to3 (dict) – conversion of one-letter to three-letter amino acid names.

  • to1 (dict) – conversion of three-letter to one-letter amino acid names.
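
The relationship between the _dictionary and _names shapes described above can be sketched as follows. This is an illustration only: `build_names` is a hypothetical helper, and the two-row excerpt is made up to show the layout rather than copied from the BMRB table.

```python
from collections import defaultdict

# Hypothetical excerpt in the shape described for _dictionary:
# residue type -> list of rows, each row mapping dialect -> atom name (or None).
dictionary = {
    "GLY": [
        {"cif": "N", "xplor": "N", "ucsf": None},
        {"cif": "H", "xplor": "HN", "ucsf": "HN"},
    ],
}


def build_names(dictionary):
    """Derive dialect -> residue -> set of atom names, omitting None entries."""
    names = defaultdict(dict)
    for residue, rows in dictionary.items():
        for row in rows:
            for dialect, atom in row.items():
                atom_set = names[dialect].setdefault(residue, set())
                if atom is not None:
                    atom_set.add(atom)
    return dict(names)


names = build_names(dictionary)
print(sorted(names["cif"]["GLY"]))
```

Keeping per-dialect sets alongside the row-oriented table makes membership tests cheap, which is the kind of lookup that dialect detection and unknown-atom reporting need.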

Warning

The data are derived from a table at the BMRB database that is mostly concerned with hydrogen/proton names. In this table, ‘ucsf’ and ‘midas’ are each incomplete but nearly complementary. I am guessing that ‘midas’ covers the non-hydrogen atoms, with ‘ucsf’ supplying the hydrogen additions. ‘sc’ is merely a stereochemical designation. Thus, you probably do not want to use ‘ucsf’ or ‘sc’.

Warning

Nomenclatures ‘cns’ and ‘xplor’ might ignore differences in topology files.

Todo

Check whether atom names differ between the protein and allhdg series of cns and xplor topology files. If the polar and allhdg versions are supersets, then a decision should be made on whether to use the largest set for all, or be specific for each topology file.

alphabetical = ['ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'GLN', 'GLU', 'GLY', 'HIS', 'ILE', 'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER', 'THR', 'TRP', 'TYR', 'VAL']
convention = ['bmrb', 'cif', 'cns', 'diana', 'midas', 'msi', 'pdb92', 'sc', 'sybyl', 'ucsf', 'xplor']
d = {'bmrb': 'O', 'cif': 'O', 'cns': 'O', 'diana': 'O', 'midas': 'O', 'msi': 'O', 'pdb92': 'O', 'sc': None, 'sybyl': 'O', 'ucsf': None, 'xplor': 'O'}
dialect(atoms, shortcut=True, verbosity=2)

Determine the nomenclature from the atom names.

Parameters:
  • atoms (Atoms) – coordinates.

  • shortcut (bool) – stop looking on first exact match

  • verbosity (int) – higher number gives more diagnostic output.

Returns:

nomenclature dialect, like ‘cif’ or ‘xplor’ or ‘pdb92’ - see Dictionary.convention.

Return type:

str
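
One plausible way to determine a dialect is to score each convention by how many observed atom names it recognises. The sketch below assumes that approach; the real scoring rule is not documented here, and `guess_dialect` and its plain-dict inputs are hypothetical simplifications of the Atoms object.

```python
def guess_dialect(observed, names, shortcut=True):
    """Pick the dialect whose name sets match the most observed atom names.

    observed: dict mapping residue name -> set of atom names seen in the file.
    names:    dict mapping dialect -> residue name -> set of known atom names.
    shortcut: return immediately on the first dialect that matches every atom.
    """
    total = sum(len(atoms) for atoms in observed.values())
    best, best_score = None, -1
    for dialect in sorted(names):
        per_residue = names[dialect]
        matched = sum(
            len(atoms & per_residue.get(residue, set()))
            for residue, atoms in observed.items()
        )
        if shortcut and total > 0 and matched == total:
            return dialect  # exact match: stop looking
        if matched > best_score:
            best, best_score = dialect, matched
    return best


# Illustrative name sets (assumed, not taken from the module's dictionary).
names = {
    "cif": {"GLY": {"N", "CA", "H"}},
    "xplor": {"GLY": {"N", "CA", "HN"}},
}
result = guess_dialect({"GLY": {"N", "CA", "HN"}}, names)
```

Heavy-atom names are often shared between conventions, so a file without hydrogens can match several dialects exactly; with shortcut enabled, this sketch simply returns the first one in alphabetical order.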

lang = 'xplor'
static repr(residue, *nomenclature, **kwarg)

Tabular text representation of equivalent atom names.

Parameters:
  • residue (str) – ‘all’ or usually-3-letter residue name.

  • *nomenclature (str) – zero or more format types like ‘cif’ or ‘xplor’ or ‘pdb92’ - see Dictionary.convention. Use ‘cif’ for Coot and for PDB files with maximum compatibility with the PDBx/mmCIF format. ‘pdb92’ was the standard before PDB adopted the cif format.

  • header (str) – preface with column headers of format type(s).

  • regression (str) – table returned will be formatted for comparison to the original BMRB data file, superseding other options.

Returns:

translation table

Return type:

str
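
A translation table of this kind can be sketched as one column per requested nomenclature and one row per atom. The layout below (tab-separated, `-` for missing names, a boolean header switch) is an assumption for illustration; the module's actual output format, including the regression option, is not reproduced, and `table` is a hypothetical name.

```python
def table(rows, *nomenclature, header=True):
    """Format rows of {dialect: atom name} as tab-separated text.

    rows:         list of dicts, one per atom, keyed by dialect.
    nomenclature: dialects to show as columns; all dialects if none are given.
    header:       preface the table with a line of column headers.
    """
    columns = list(nomenclature) or sorted({key for row in rows for key in row})
    lines = []
    if header:
        lines.append("\t".join(columns))
    for row in rows:
        # Missing or None entries are shown as "-" (an arbitrary choice here).
        lines.append("\t".join(row.get(column) or "-" for column in columns))
    return "\n".join(lines)


rows = [
    {"cif": "N", "xplor": "N", "ucsf": None},
    {"cif": "H", "xplor": "HN", "ucsf": "HN"},
]
text = table(rows, "cif", "xplor")
print(text)
```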

res = 'VAL'
to1 = {'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C', 'GLN': 'Q', 'GLU': 'E', 'GLY': 'G', 'HIS': 'H', 'ILE': 'I', 'LEU': 'L', 'LYS': 'K', 'MET': 'M', 'PHE': 'F', 'PRO': 'P', 'SER': 'S', 'THR': 'T', 'TRP': 'W', 'TYR': 'Y', 'VAL': 'V'}
to3 = {'A': 'ALA', 'C': 'CYS', 'D': 'ASP', 'E': 'GLU', 'F': 'PHE', 'G': 'GLY', 'H': 'HIS', 'I': 'ILE', 'K': 'LYS', 'L': 'LEU', 'M': 'MET', 'N': 'ASN', 'P': 'PRO', 'Q': 'GLN', 'R': 'ARG', 'S': 'SER', 'T': 'THR', 'V': 'VAL', 'W': 'TRP', 'Y': 'TYR'}
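
The two mappings are inverses of each other, so either can be derived from the other. A minimal sketch, using the to1 values listed above; the helper functions are hypothetical and not part of the module.

```python
to1 = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}
# Invert the mapping: one-letter code -> three-letter code.
to3 = {one: three for three, one in to1.items()}


def to_three_letter(sequence):
    """Convert a one-letter sequence string to a list of three-letter names."""
    return [to3[letter] for letter in sequence.upper()]


def to_one_letter(residues):
    """Convert an iterable of three-letter names to a one-letter string."""
    return "".join(to1[name.upper()] for name in residues)
```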
translate(atoms, fr_dialect, to_dialect, in_place=True)

Translate atom names between nomenclatures (dialects)

Parameters:
  • atoms (Atoms) – coordinate set

  • fr_dialect (str|NoneType) – starting nomenclature if known, e.g. ‘cns’, ‘cif’, ‘pdb92’. If None, the closest match between the names in atoms and _dictionary is used.

  • to_dialect (str) – desired nomenclature, eg. ‘cns’, ‘cif’, ‘pdb92’.

  • in_place (bool) – replace the names in atoms as well as returning the translated copy (True is usually wanted).

Returns:

translated coordinates, built from a deep copy of atoms

Return type:

atoms.Atoms
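
The lookup and the in_place behaviour can be sketched with plain dictionaries standing in for the Atoms object. This is a simplified illustration under assumptions: `translate_names` is a hypothetical function, the atom records are invented, and leaving untranslatable names unchanged is a choice made for this sketch, not documented behaviour.

```python
import copy


def translate_names(atoms, rows_by_residue, fr_dialect, to_dialect, in_place=True):
    """Translate atom names on a deep copy; optionally write them back.

    atoms:           list of dicts with 'resname' and 'name' keys.
    rows_by_residue: residue name -> list of {dialect: atom name} rows.
    """
    translated = copy.deepcopy(atoms)
    for atom in translated:
        for row in rows_by_residue.get(atom["resname"], []):
            if row.get(fr_dialect) == atom["name"] and row.get(to_dialect) is not None:
                atom["name"] = row[to_dialect]
                break  # names with no equivalent are left unchanged
    if in_place:
        for original, new in zip(atoms, translated):
            original["name"] = new["name"]
    return translated


# Illustrative data (assumed, not taken from the module's dictionary).
rows_by_residue = {"GLY": [{"cif": "N", "xplor": "N"}, {"cif": "H", "xplor": "HN"}]}
atoms = [{"resname": "GLY", "name": "HN"}, {"resname": "HOH", "name": "O"}]
copied = translate_names(atoms, rows_by_residue, "xplor", "cif", in_place=False)
```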

unknown(atoms, nomenclature='cif', onlyKnownResidues=True, verbosity=99)

Report atoms whose names are not defined.

Parameters:
  • atoms (Atoms) – coordinates.

  • verbosity (int) – higher number gives more diagnostic output.

  • nomenclature (str) – format type like ‘cif’ or ‘xplor’ or ‘pdb92’. See Dictionary.convention. Use ‘cif’ for Coot and for maximum compatibility with the PDBx/mmCIF format. ‘pdb92’ was the standard before PDB adopted the cif format.

  • onlyKnownResidues (bool) – count atoms only from residues listed in the dictionary.

Returns:

number of undefined atoms

Return type:

int
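
Counting undefined atoms amounts to a membership test against the name sets of the chosen nomenclature. The sketch below assumes that reading of onlyKnownResidues: when it is False, atoms in residues absent from the dictionary are counted as undefined. `count_unknown` and the tuple-based atom list are hypothetical simplifications.

```python
def count_unknown(atoms, names, nomenclature="cif", only_known_residues=True):
    """Count atoms whose names are not defined in the given nomenclature.

    atoms: iterable of (residue name, atom name) pairs.
    names: dict mapping dialect -> residue name -> set of known atom names.
    """
    known = names.get(nomenclature, {})
    count = 0
    for resname, atom_name in atoms:
        if resname not in known:
            # Residue is not in the dictionary: skip it or count it, as requested.
            if not only_known_residues:
                count += 1
            continue
        if atom_name not in known[resname]:
            count += 1
    return count


# Illustrative data (assumed, not taken from the module's dictionary).
names = {"cif": {"GLY": {"N", "CA", "C", "O", "H"}}}
atoms = [("GLY", "N"), ("GLY", "HN"), ("HOH", "O")]
n_unknown = count_unknown(atoms, names)
```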

translate.startup()

Program initialization, reading options, etc.

Returns:

parser

Return type:

Arguments.ArgumentParser