stereochemistry module¶
Stereochemistry for connectivity and restraints.
Module author: Michael S. Chapman <chapmanms@missouri.edu>
- Authors:
Michael S. Chapman <chapmami@missouri.edu>,
University of Missouri
- Version:
1, Nov 26, 2024
Changed in version 08/10/20.
Changed in version 0.6.0: 08/19/20 Completed calculation of radii for functional groups.
Changed in version 1.0.0: 09/26/20 Python 2.7 –> 3.6
- class stereochemistry.CovalentRadii¶
Bases:
object
Approximation of Bond Lengths
For a covalent bond, the distance is approximated by the sum of radii for the two elements. The class supports look-up for single, double and triple bonds which are stored as lists of length 3 in picometers. Zero is used if undefined. Errors are estimated to be about 3 pm.
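The sum-of-radii approximation can be illustrated with a minimal sketch. The table below copies the H, C, N and O entries from the radius attribute of this class; the name bond_length_pm and the choice to raise an error on an undefined (zero) radius are assumptions made for illustration, not part of the module.

```python
# Covalent radii in picometers as [single, double, triple]; values for
# H, C, N and O are copied from the `radius` attribute of this class.
RADII_PM = {
    "H": [32, 0, 0],
    "C": [75, 67, 60],
    "N": [71, 60, 54],
    "O": [63, 57, 53],
}


def bond_length_pm(elem_a, elem_b, order=1):
    """Approximate a bond length (pm) as the sum of two covalent radii.

    `order` is 1, 2 or 3. A stored zero means "undefined", so it is
    reported as an error here rather than silently summed (an
    illustrative choice, not necessarily the module's behavior).
    """
    r_a = RADII_PM[elem_a][order - 1]
    r_b = RADII_PM[elem_b][order - 1]
    if r_a == 0 or r_b == 0:
        raise ValueError(f"no order-{order} radius for {elem_a}-{elem_b}")
    return r_a + r_b
```

With these values a C-C single bond comes out at 150 pm and a C=O double bond at 124 pm, each subject to the error of about 3 pm quoted above.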
The following is lightly edited from the Wiki page from which the data were obtained: https://en.wikipedia.org/wiki/Covalent_radius
Radii form a self-consistent fit for all elements in a small set of molecules. This was done separately for single,[5] double,[6] and triple bonds[7] up to superheavy elements. Both experimental and computational data were used. The single-bond results are often similar to those of Cordero et al.[4] When they are different, the coordination numbers used can be different. This is notably the case for most (d and f) transition metals. Normally one expects that r1 > r2 > r3. Deviations may occur for weak multiple bonds, if the differences of the ligand are larger than the differences of R in the data used.
Note that elements up to atomic number 118 (oganesson) have now been experimentally produced and that there are chemical studies on an increasing number of them. The same, self-consistent approach was used to fit tetrahedral covalent radii for 30 elements in 48 crystals with subpicometer accuracy.[8]
Single-,[5] double-,[6] and triple-bond[7] covalent radii, determined using typically 400 experimental or calculated primary distances, R, per set.
[5] P. Pyykko; M. Atsumi (2009). “Molecular Single-Bond Covalent Radii for Elements 1-118”. Chemistry: A European Journal. 15 (1): 186-197. doi:10.1002/chem.200800987. PMID 19058281.
[6] P. Pyykko; M. Atsumi (2009). “Molecular Double-Bond Covalent Radii for Elements Li-E112”. Chemistry: A European Journal. 15 (46): 12770-12779. doi:10.1002/chem.200901472. PMID 19856342. Figure 3 of this paper contains all radii of refs. [5-7]. The mean-square deviation of each set is 3 pm.
[7] P. Pyykko; S. Riedel; M. Patzschke (2005). “Triple-Bond Covalent Radii”. Chemistry: A European Journal. 11 (12): 3511-3520. doi:10.1002/chem.200401299. PMID 15832398.
[8] P. Pyykko (2012). “Refitted tetrahedral covalent radii for solids”. Physical Review B. 85 (2): 024115, 7 p. Bibcode:2012PhRvB..85b4115P. doi:10.1103/PhysRevB.85.024115.
- bonds = ('single', 'double', 'triple')¶
- checkRadii(verbose, exception, *bonds, **bonding)¶
Check that covalent radii are optimal with respect to target bonds.
- Parameters:
verbose (bool) – if True, document the results of the check
exception (bool) – raise ValueError if any covalent radius is non-optimal
bonds (tuples) – (atom(str), atom(str), target-distance (float)) This can be the same as the argument to fitBondTypes.
bonding (dict) – keyed by every atom in bonds, 3-list of (str-element, float-single-bonding, float-double-bonding, bool-optimize). This will usually be the dict returned from fitBondTypes.
- Returns:
True if all covalent radii are optimal
- Return type:
bool
- Raises:
ValueError if exception=True and any radius non-optimal
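One way to picture such a check is sketched below. It is an assumption about the approach, not the module's code: each radius is nudged up and down by a small amount, and the set is called optimal only if no nudge lowers the RMS error against the target bonds. The names rms_error and radii_are_optimal, the flat atom-to-radius dict, and the eps tolerance are all hypothetical simplifications of the bonding structure described above.

```python
import math


def rms_error(bonds, radius):
    """RMS difference between summed radii and target bond lengths."""
    sq = [(radius[a] + radius[b] - target) ** 2 for a, b, target in bonds]
    return math.sqrt(sum(sq) / len(sq))


def radii_are_optimal(bonds, radius, eps=1e-3, exception=False):
    """Return True if no single-radius change of +/- eps lowers the error.

    bonds  : list of (atom, atom, target_distance)
    radius : dict atom -> radius (hypothetical flat form of `bonding`)
    """
    base = rms_error(bonds, radius)
    for atom in radius:
        for delta in (eps, -eps):
            trial = dict(radius)
            trial[atom] += delta
            if rms_error(bonds, trial) < base - 1e-12:
                if exception:
                    raise ValueError(f"radius for {atom} is not optimal")
                return False
    return True
```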
- curated(element, residue=None, atom=None)¶
Single-bonded radius optionally superseded by residue/atom type.
- Parameters:
element (str) – 1 or 2-character periodic table symbol
residue (str) – 3-character residue type, e.g. “ALA”, “GLY”
atom (str) – atom name, e.g. “CA”, “NH1”, required if residue!=None
- Returns:
radius
- Return type:
float
Unless residue and atom match one of the combinations below, radius will be set by element as single-bonded. The exceptions below are for protein atoms with mixed single/double bonding, resonance or tautomers. The hard-wired parameters come from prior fitting in self.protein() to the .cif geometry files of Coot.
Caution
Limitations based on nomenclature:
- Protein is recognized by residue matching one of the standard 20 as defined in translate.Dictionary.to1.keys().
- Atom names use .cif nomenclature.
Generalization should be possible but would be of limited benefit. Problems are mostly avoided because identification uses the standard names “N”, “C”, “O”, “CA”, “CB”, “NE” (Lys), and otherwise infers from the element.
For the C-terminus, an “O” will be treated as double-bonded, while “OXT” (most nomenclatures), “OT1” or “OT2” will be considered single-bonded. (Ideally, we would want resonance between the 2 oxygens, but no attempt is made to identify the “O” of the terminal residue.)
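The look-up order described above (a residue/atom override first, the element's single-bond radius otherwise) might be sketched as follows. Everything here is hypothetical: the function name curated_radius, the OVERRIDES table and its placeholder values are not the module's fitted parameters, and the use of Angstrom units is an assumption.

```python
# Hypothetical override table: (residue, atom) -> radius in Angstrom.
# The numbers are placeholders, NOT the module's fitted values.
OVERRIDES = {
    ("ASP", "OD1"): 0.60,
    ("ASP", "OD2"): 0.60,
}

# Single-bond radii in Angstrom for a few elements (pm values from the
# `radius` attribute divided by 100).
SINGLE = {"C": 0.75, "N": 0.71, "O": 0.63}


def curated_radius(element, residue=None, atom=None):
    """Return an override for (residue, atom) if one exists, otherwise
    fall back to the element's single-bond radius."""
    if residue is not None:
        if atom is None:
            raise ValueError("atom is required when residue is given")
        key = (residue.upper(), atom.upper())
        if key in OVERRIDES:
            return OVERRIDES[key]
    return SINGLE[element]
```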
- element = ['H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne', 'Na', 'Mg', 'Al', 'Si', 'P', 'S', 'Cl', 'Ar', 'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Kr', 'Rb', 'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Tc', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Te', 'I', 'Xe', 'Cs', 'Ba', 'La', 'Ce', 'Pr', 'Nd', 'Pm', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu', 'Hf', 'Ta', 'W', 'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'Po', 'At', 'Rn', 'Fr', 'Ra', 'Ac', 'Th', 'Pa', 'U', 'Np', 'Pu', 'Am', 'Cm', 'Bk', 'Cf', 'Es', 'Fm', 'Md', 'No', 'Lr', 'Rf', 'Db', 'Sg', 'Bh', 'Hs', 'Mt', 'Ds', 'Rg', 'Cn', 'Nh', 'Fl', 'Mc', 'Lv', 'Ts', 'Og']¶
- Variables:
element – list of element abbreviations by atomic number
- fitBondTypes(*bonds, **bonding)¶
Fit the single/double bond combinations of atoms to target bonds.
If connectivity or bond lengths are to be determined from atom-based attributes, what designation of single- or double-bonded for an atom best replicates its several bonds? This might be further complicated in the presence of resonance or tautomerism.
This routine optimizes the covalent radii to match optimally a set of bond lengths.
The algorithm is simplistic: 1-D searches through each atom-type, repeated with progressively smaller step size.
Note that this can be unstable / ill-posed, particularly when the number of fitted atom-types is close to or exceeds the number of target bond lengths. See method protein() for examples that mitigate these problems.
- Parameters:
bonds (tuples) – (atom(str), atom(str), target-distance (float))
bonding (dict) – keyed by every atom in bonds, 3-tuple of (str-element, float-single-bonding, float-double-bonding, bool-optimize).
- Returns:
optimized bonding
- Return type:
same as bonding, except the 3-tuples are 3-lists.
Nothing special is inferred from the atom names as the elements are explicitly defined in bonding. Thus, to find the best common parameters for several atom types of the same element, choose a common atom pseudonym in both bonds and bonding. For example, one might use ‘CY’ as a common pseudonym for ‘CD1’, ‘CD2’, ‘CG1’, … to derive a single parameter set for the aromatic ring.
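The search strategy described for this method (1-D searches per atom-type, repeated with a shrinking step) can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the module's implementation: it fits one radius per atom name directly, rather than a single/double-bonding mix, and the names fit_radii and rms_error, the starting step and the stopping threshold are all hypothetical.

```python
import math


def rms_error(bonds, radius):
    """RMS difference between summed radii and target bond lengths."""
    sq = [(radius[a] + radius[b] - target) ** 2 for a, b, target in bonds]
    return math.sqrt(sum(sq) / len(sq))


def fit_radii(bonds, radius, step=0.05, min_step=1e-4):
    """Coordinate-wise 1-D search with a progressively smaller step.

    bonds  : list of (atom, atom, target_distance)
    radius : dict atom -> starting radius (same units as the targets)
    Returns (fitted radii, final RMS error).
    """
    radius = dict(radius)              # leave the caller's dict untouched
    best = rms_error(bonds, radius)
    while step >= min_step:
        improved = False
        for atom in radius:
            for delta in (step, -step):
                radius[atom] += delta
                trial = rms_error(bonds, radius)
                if trial < best - 1e-12:
                    best, improved = trial, True
                    break              # keep this move, go to next atom
                radius[atom] -= delta  # undo an unhelpful move
        if not improved:
            step /= 2.0                # refine once no move helps
    return radius, best
```

Using a shared pseudonym such as ‘CY’ for several atoms in bonds simply means those bonds all pull on the same dictionary entry, which is how a common parameter set emerges.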
- halfLength(atom, bond=1)¶
- Parameters:
atom (int|str) – atomic number or element abbreviation
bond (int|3-tuple|NoneType) – 1, 2 or 3 for single, double or triple bond. A tuple (i, j, k) gives an average of bond types, (i * single + j * double + k * triple)/(i+j+k). This might be slightly more appropriate for some atom types when searching for possible bond connectivity before the bonding type has been established.
- Returns:
half bond length (Angstrom)
- Return type:
float
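The weighted average over bond types can be sketched as follows, assuming the tuple weights combine as a weighted mean and that the stored picometer values are converted to Angstrom by dividing by 100. The name half_length and the three-element table are illustrative only, and the handling of bond=None is not shown because the source does not describe it.

```python
# [single, double, triple] radii in picometers, copied from the
# `radius` attribute for three elements.
RADII_PM = {"C": [75, 67, 60], "N": [71, 60, 54], "O": [63, 57, 53]}


def half_length(element, bond=1):
    """Half bond length in Angstrom for one element.

    bond : 1, 2 or 3 for a pure bond type, or a tuple (i, j, k) of
           weights for a single/double/triple average.
    """
    radii = RADII_PM[element]
    if isinstance(bond, int):
        pm = radii[bond - 1]
    else:
        i, j, k = bond
        pm = (i * radii[0] + j * radii[1] + k * radii[2]) / (i + j + k)
    return pm / 100.0                  # picometers -> Angstrom
```

For example, equal single and double weighting for carbon, half_length("C", (1, 1, 0)), averages 75 and 67 pm.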
- number = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118]¶
- Variables:
number – list of atomic numbers, for checking only
- optimizeProteinAmide(bonding=None)¶
Find the single/double bond combinations that fit amide side chain.
- Parameters:
bonding (dict|NoneType) – keyed by every atom in bonds, 3-tuple of (str-element, float-single-bonding, float-double-bonding, bool-optimize). If None will start with single-bonded defaults.
- Returns:
updated bonding
- Return type:
dict
Target values have come from Coot library .cif files.
Common pseudonyms are used for amide atoms so that a single set of parameters is derived for both Asn & Gln. See comments for explanation.
- optimizeProteinAromatic(bonding=None)¶
Find single/double bond combinations that best fit all aromatic side chains.
- Parameters:
bonding (dict|NoneType) – keyed by every atom in bonds, 3-tuple of (str-element, float-single-bonding, float-double-bonding, bool-optimize). If None will start with single-bonded defaults.
- Returns:
updated bonding
- Return type:
dict
Target values have come from Coot library .cif files.
- optimizeProteinBackbone(bonding=None)¶
Find the single/double bond combinations that best fit peptide geom.
- Parameters:
bonding (dict|NoneType) – keyed by every atom in bonds, 3-tuple of (str-element, float-single-bonding, float-double-bonding, bool-optimize). If None will start with single-bonded defaults.
- Returns:
updated bonding
- Return type:
dict
Omission of the oxygen adds an error of 0.015 Å to the carbonyl in favor of improving the backbone.
Target values have come from Coot library .cif files.
- optimizeProteinCarboxylate(bonding=None)¶
Find the single/double bond combinations that fit carboxylate side chain.
- Parameters:
bonding (dict|NoneType) – keyed by every atom in bonds, 3-tuple of (str-element, float-single-bonding, float-double-bonding, bool-optimize). If None will start with single-bonded defaults.
- Returns:
updated bonding
- Return type:
dict
Target values have come from Coot library .cif files.
Common pseudonyms are used for carboxylate atoms so that a single set of parameters is derived for both Asp & Glu. See comments for explanation.
- optimizeProteinGuanidinium(bonding=None)¶
Find the single/double bond combinations that best fit Arg guanidinium.
- Parameters:
bonding (dict|NoneType) – keyed by every atom in bonds, 3-tuple of (str-element, float-single-bonding, float-double-bonding, bool-optimize). If None will start with single-bonded defaults.
- Returns:
updated bonding
- Return type:
dict
Target values have come from Coot library .cif files.
- optimizeProteinHis(bonding=None)¶
Find the single/double bond combinations that best fit His ring.
- Parameters:
bonding (dict|NoneType) – keyed by every atom in bonds, 3-tuple of (str-element, float-single-bonding, float-double-bonding, bool-optimize). If None will start with single-bonded defaults.
- Returns:
updated bonding
- Return type:
dict
Target values have come from Coot library .cif files.
- optimizeProteinHisAve(bonding=None)¶
Find the single/double bond combinations that best fit His ring.
- Parameters:
bonding (dict|NoneType) – keyed by every atom in bonds, 3-tuple of (str-element, float-single-bonding, float-double-bonding, bool-optimize). If None will start with single-bonded defaults.
- Returns:
updated bonding
- Return type:
dict
Target values have come from Coot library .cif files.
Although there are fewer degrees of freedom, single radii for N & C give a lower RMSD.
- optimizeProteinPhe(bonding=None)¶
Find the single/double bond combinations that best fit Phe ring.
- Parameters:
bonding (dict|NoneType) – keyed by every atom in bonds, 3-tuple of (str-element, float-single-bonding, float-double-bonding, bool-optimize). If None will start with single-bonded defaults.
- Returns:
updated bonding
- Return type:
dict
Target values have come from Coot library .cif files.
- optimizeProteinPheAve(bonding=None)¶
Find the single/double bond combinations that best fit Phe ring.
- Parameters:
bonding (dict|NoneType) – keyed by every atom in bonds, 3-tuple of (str-element, float-single-bonding, float-double-bonding, bool-optimize). If None will start with single-bonded defaults.
- Returns:
updated bonding
- Return type:
dict
Target values have come from Coot library .cif files.
- optimizeProteinPheH(bonding=None)¶
Find the single/double bond combinations that best fit Phe ring.
- Parameters:
bonding (dict|NoneType) – keyed by every atom in bonds, 3-tuple of (str-element, float-single-bonding, float-double-bonding, bool-optimize). If None will start with single-bonded defaults.
- Returns:
updated bonding
- Return type:
dict
- Deprecated:
This was a modified version to test the value of including hydrogens. It has a marginal worsening effect, decreasing the average double-bonding from 0.30 to 0.28 and increasing its standard deviation from 0.18 to 0.19. The idea had been that by requiring each ring-C to be sp2, CD, CE, CZ would be more like CG, and that the double-bonded character would be more evenly distributed, but this did not pan out.
- See:
optimizeProteinPhe for the standard version.
Target values have come from Coot library .cif files.
- optimizeProteinTrp(bonding=None)¶
Find the single/double bond combinations that best fit Trp side chain.
- Parameters:
bonding (dict|NoneType) – keyed by every atom in bonds, 3-tuple of (str-element, float-single-bonding, float-double-bonding, bool-optimize). If None will start with single-bonded defaults.
- Returns:
updated bonding
- Return type:
dict
Target values have come from Coot library .cif files.
- optimizeProteinTrpAve(bonding=None)¶
Find the single/double bond combinations that best fit Trp side chain.
- Parameters:
bonding (dict|NoneType) – keyed by every atom in bonds, 3-tuple of (str-element, float-single-bonding, float-double-bonding, bool-optimize). If None will start with single-bonded defaults.
- Returns:
updated bonding
- Return type:
dict
Target values have come from Coot library .cif files.
- optimizeProteinTyr(bonding=None)¶
Find the single/double bond combinations that best fit Tyr ring.
- Parameters:
bonding (dict|NoneType) – keyed by every atom in bonds, 3-tuple of (str-element, float-single-bonding, float-double-bonding, bool-optimize). If None will start with single-bonded defaults.
- Returns:
updated bonding
- Return type:
dict
Target values have come from Coot library .cif files.
- optimizeProteinTyrAve(bonding=None)¶
Find the single/double bond combinations that best fit Tyr ring.
- Parameters:
bonding (dict|NoneType) – keyed by every atom in bonds, 3-tuple of (str-element, float-single-bonding, float-double-bonding, bool-optimize). If None will start with single-bonded defaults.
- Returns:
updated bonding
- Return type:
dict
Target values have come from Coot library .cif files.
- protein()¶
Suggest covalent radii for functional groups in proteins.
When an atom has both single and double bonds or is part of a tautomeric or resonant system, there will likely be no single covalent radius that accurately predicts all distances to bonded partners.
Aromatic groups are calculated iteratively because (generally) the optimization is ill-posed unless the number of independent target bonds exceeds the number of parameters. The strategy here is to start by optimizing a generic carbon and nitrogen that best fit all bonds of aromatic side chains, then to further optimize these for each side chain. The side chains are then optimized with “average” carbons and nitrogens for the entire ring, before attempting optimization for each atom location (e.g. CD1/2, CE1/2, …). These optimizations become less well-posed, and even though residual error statistics get lower, variation in the single- and double-bonded character within the rings suggests that one should stop with generic “CR” and “NR” ring atoms and not try to distinguish gamma, delta, epsilon (etc.) positions.
Should one want to use the position-specific values, it is important to start the optimization from the better-posed generic values. For Trp this reduces the RMSD bond length from 0.046 to 0.013 Å. (For the generic (and over-determined) CR & NR, it makes only a 1% difference in the double-bonded character, giving the same residual.)
There is not much that could be done to improve calculations for the peptide bond. It is ill-posed (3 atoms with 3 target bonds) and the RMSD is among the worst in proteins at 0.059 Å. The single/double-bonding characteristics of peptide, amide and carboxylates seem vaguely sensible.
- radius = [[32, 0, 0], [46, 0, 0], [133, 124, 0], [102, 90, 85], [85, 78, 73], [75, 67, 60], [71, 60, 54], [63, 57, 53], [64, 59, 53], [67, 96, 0], [155, 160, 0], [139, 132, 127], [126, 113, 111], [116, 107, 102], [111, 102, 94], [103, 94, 95], [99, 95, 93], [96, 107, 96], [196, 193, 0], [171, 147, 133], [148, 116, 114], [136, 117, 108], [134, 112, 106], [122, 111, 103], [119, 105, 103], [116, 109, 102], [111, 103, 96], [110, 101, 101], [112, 115, 120], [118, 120, 0], [124, 117, 121], [121, 111, 114], [121, 114, 106], [116, 107, 107], [114, 109, 110], [117, 121, 108], [210, 202, 0], [185, 157, 139], [163, 130, 124], [154, 127, 121], [147, 125, 116], [138, 121, 113], [128, 120, 110], [125, 114, 103], [125, 110, 106], [120, 117, 112], [128, 139, 137], [136, 144, 0], [142, 136, 146], [140, 130, 132], [140, 133, 127], [136, 128, 121], [133, 129, 125], [131, 135, 122], [232, 209, 0], [196, 161, 149], [180, 139, 139], [163, 137, 131], [176, 138, 128], [174, 137, 0], [173, 135, 0], [172, 134, 0], [168, 134, 0], [169, 135, 132], [168, 135, 0], [167, 133, 0], [166, 133, 0], [165, 133, 0], [164, 131, 0], [170, 129, 0], [162, 131, 131], [152, 128, 122], [146, 126, 119], [137, 120, 115], [131, 119, 110], [129, 116, 109], [122, 115, 107], [123, 112, 110], [124, 121, 123], [133, 142, 0], [144, 142, 150], [144, 135, 137], [151, 141, 135], [145, 135, 129], [147, 138, 138], [142, 145, 133], [223, 218, 0], [201, 173, 159], [186, 153, 140], [175, 143, 136], [169, 138, 129], [170, 134, 118], [171, 136, 116], [172, 135, 0], [166, 135, 0], [166, 136, 0], [168, 139, 0], [168, 140, 0], [165, 140, 0], [167, 0, 0], [173, 139, 0], [176, 0, 0], [161, 141, 0], [157, 140, 131], [149, 136, 126], [143, 128, 121], [141, 128, 119], [134, 125, 118], [129, 125, 113], [128, 116, 112], [121, 116, 118], [122, 137, 130], [136, 0, 0], [143, 0, 0], [162, 0, 0], [175, 0, 0], [165, 0, 0], [157, 0, 0]]¶
list, by atomic number, of lists of 3 covalent radii for single, double & triple bonds.