schrodinger.application.jaguar.csmiles.SmilesAtom module

class schrodinger.application.jaguar.csmiles.SmilesAtom.SmilesAtom(atom: Atom, precision: int, mol: rdkit.Chem.rdchem.Mol | None = None, parent: Union[None, Self] = None, dummy: bool = False, error_on_arbitrary_branch: bool = False)

Bases: object

__init__(atom: Atom, precision: int, mol: rdkit.Chem.rdchem.Mol | None = None, parent: Union[None, Self] = None, dummy: bool = False, error_on_arbitrary_branch: bool = False)
atom: Atom
idx: int
mol: rdkit.Chem.rdchem.Mol | None
conf: rdkit.Chem.rdchem.Conformer | None
has_template: bool
tag: str
rank: int | None
parent: Optional[Self]
implicit_atoms: list[Self]
bond_to: list[SmilesBond]
ring_close_to: list[SmilesBond]
ring_open_to: list[SmilesBond]
arbitrary_branch: bool
error_on_arbitrary_branch: bool
label: str
set_precision(precision: int)

Recursively set a new precision level for this atom and all downstream in the graph.

Parameters:

precision – The new value for the precision.
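
The recursive propagation described above can be pictured with a small stand-in. This is only a sketch: the `Node` class and its `children` attribute are hypothetical and are not part of the SmilesAtom API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a graph node; not the real SmilesAtom."""

    name: str
    precision: int
    children: list[Node] = field(default_factory=list)

    def set_precision(self, precision: int) -> None:
        # Update this node first, then push the same value to every child.
        self.precision = precision
        for child in self.children:
            child.set_precision(precision)


leaf = Node("C3", precision=1)
middle = Node("C2", precision=1, children=[leaf])
root = Node("C1", precision=1, children=[middle])

# Calling on the root updates the whole downstream graph.
root.set_precision(3)
print(root.precision, middle.precision, leaf.precision)
```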

get_bond_to(other: Self) SmilesBond | None

Find and return the SmilesBond object that joins this atom to the other atom passed, if it exists. This only searches the current atom.

Parameters:

other – The SmilesAtom to look for a bond to.

Returns:

The SmilesBond object joining this atom to the passed atom, or None if no such bond is found.

get_full_bonds() list[SmilesBond]

Return every downstream bond from this node, including the special ring-close and ring-open bonds. Because those are included, this should not be used to traverse the graph, as it will result in cycles.

Returns:

Every downstream bond from this node including ring bonds.

set_implicit_atoms()

Record any implicit atoms that are bonded to the RDKit atom associated with this SmilesAtom into the implicit_atoms list as dummy SmilesAtom objects. These atoms are usually hydrogens, which are typically implicit in SMILES strings.

Then recursively call this for all child atoms such that the downstream graph has its implicit atoms set, thus it should be called on the root.

is_ring_close_to(other: Self) bool

Determine if we have a ring close bond to the passed atom or not.

Parameters:

other – The atom we are searching for a ring close bond to.

Returns:

True if this atom has a ring close bond to argument other.

set_ring_tags(_open_tags: dict[int, Self] | None = None)

Recursively set the ring open and close tags for ring open and close bonds. Should be called on the root SmilesAtom without arguments.

Parameters:

_open_tags – The dictionary containing the open set of ring tags and the atom that opened this ring. Once the opening atom is encountered in a ring close it is removed from the open set for re-use. This should be None when not called recursively.
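
The open-set bookkeeping can be sketched as follows. The helper names `allocate_tag` and `release_tag` are hypothetical, atoms are represented by plain strings, and the choice of the smallest free tag is an assumption; the description above only says that a tag becomes available for re-use once its ring is closed.

```python
def allocate_tag(open_tags: dict[int, str], opener: str) -> int:
    """Reserve a tag for the atom that opens a ring.

    Assumption: the smallest positive tag not currently open is chosen.
    """
    tag = 1
    while tag in open_tags:
        tag += 1
    open_tags[tag] = opener
    return tag


def release_tag(open_tags: dict[int, str], opener: str) -> int:
    """Free the tag held by `opener` once its ring has been closed."""
    for tag, atom in list(open_tags.items()):
        if atom == opener:
            del open_tags[tag]
            return tag
    raise KeyError(f"{opener!r} has no open ring")


open_tags: dict[int, str] = {}
first = allocate_tag(open_tags, "C1")   # first ring opened
second = allocate_tag(open_tags, "C4")  # second ring opened while the first is open
released = release_tag(open_tags, "C1") # first ring closed, its tag is free again
reused = allocate_tag(open_tags, "C7")  # a later ring picks up the freed tag
print(first, second, released, reused)
```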

get_smiles_symbol() str

Return the SMILES symbol for the atom. Respects RDKit’s aromaticity.

get_formatted_smiles() str

Insert the ring tags before the first bond symbol, or at the end if no bond is found.

Returns:

The SMILES string for the atom with ring tag inserted.

get_smiles(debug=False) str

Recursively build the SMILES substring for this atom and its downstream atoms including, if requested, their debugging labels.

Parameters:

debug – Flag that we should include atom labels for debugging.

Returns:

The SMILES string for this atom and everything downstream of it, with debug labels if requested.

get_csmiles(minimal: bool, cxsmiles: bool = False, _idx: dict[int, Self] | None = None) str

Recursively create the conformer SMILES string from this atom. Also set the CSMILES_ORDER_KEY property, which records the order in which the atoms appear in the CSMILES string.

Parameters:
  • minimal – Flag that we should use the minimal set of delimiters for greater compression sacrificing readability.

  • cxsmiles – Flag that we want the CXSMILES notation.

  • _idx – A dictionary mapping index in CSMILES string to the SmilesAtom object for that atom. This is set by the root atom and passed on to child atoms in recursive calls. Then the next index can be found as the largest key in the dictionary + 1.

Returns:

The conformer SMILES string from this atom.

get_difference_csmiles(diff_graph: Self, tolerance: float, minimal: bool = False) str

Compute the difference CSMILES between this graph and that passed as diff_graph. This should be called with diff_graph as the root SmilesAtom of the graph to be diffed.

Parameters:
  • diff_graph – The graph to take the difference from.

  • tolerance – The tolerance below which differences will be truncated to zero.

  • minimal – Flag that the minimal representation should be used.

Raises:

ConnectivityMismatchException – If graph topologies are not identical.

Returns:

The difference CSMILES string.
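
The effect of tolerance can be sketched on plain dictionaries of dihedral angles. Everything here is illustrative: `truncated_differences` is a hypothetical helper, ValueError stands in for ConnectivityMismatchException, and periodic wrapping of angles is deliberately ignored.

```python
Quad = tuple[int, int, int, int]


def truncated_differences(current: dict[Quad, float],
                          reference: dict[Quad, float],
                          tolerance: float) -> dict[Quad, float]:
    """Per-quad angle differences, with small values truncated to zero."""
    if current.keys() != reference.keys():
        # Stand-in for the topology check described above.
        raise ValueError("graphs do not describe the same dihedrals")
    result = {}
    for quad, angle in current.items():
        delta = angle - reference[quad]
        # Differences below the tolerance are recorded as exactly zero.
        result[quad] = 0.0 if abs(delta) < tolerance else delta
    return result


current = {(0, 1, 2, 3): 60.4, (1, 2, 3, 4): 175.0}
reference = {(0, 1, 2, 3): 60.0, (1, 2, 3, 4): 120.0}
diff = truncated_differences(current, reference, tolerance=1.0)
print(diff)
```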

get_ordered_bonds() list[SmilesBond]

Get a list of bonds (edges) sorted in canonical order for graph downstream of this atom. Recursively calls the same method in child atoms.

Returns:

A list of SmilesBond objects ordered in canonical order.

get_csmiles_dihedral_dict(atom_idxs: bool) dict[tuple[int | str, int, int, int | str], float]

For this and all child atoms, return a dictionary mapping the indices of the four atoms in each dihedral to the dihedral angle between them. When atom_idxs is True these are indices into the Mol object, otherwise they are indices into the CSMILES string.

When we request string indices (atom_idxs=False), if an atom in the dihedral is implicit then it does not appear in the CSMILES string so the index is given as ‘H’.

If atom_idxs is False then we require get_csmiles to have been called on the root atom to set const.CSMILES_ORDER_KEY on the atoms before CSMILES indices can be known. This is not required for atom indices.

Parameters:

atom_idxs – If True indices are for atom in the Mol object, otherwise we return indices as position in the CSMILES string.

Returns:

The dictionary mapping index quad to dihedral angle.
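
For orientation, a dictionary of this shape can be built from raw coordinates with the standard four-point dihedral formula. This is a standalone sketch using only the standard library; the helper names and the coordinates are made up, and the sign convention shown is one common choice that may differ from the one used by this module.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral about the p1-p2 bond, in degrees."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    length = math.sqrt(_dot(b2, b2))
    b2_hat = (b2[0] / length, b2[1] / length, b2[2] / length)
    m1 = _cross(n1, b2_hat)
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))


# Made-up coordinates keyed by atom index.
coords = {0: (1, 0, 0), 1: (0, 0, 0), 2: (0, 0, 1), 3: (0, 1, 1), 4: (-1, 0, 1)}
quads = [(0, 1, 2, 3), (0, 1, 2, 4)]
dihedrals = {quad: dihedral_deg(*(coords[i] for i in quad)) for quad in quads}
print(dihedrals)
```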

get_csmiles_idx() int | str

Return the index of this atom in the CSMILES string. Requires get_csmiles to have been called on the root atom to set const.CSMILES_ORDER_KEY on the atoms before CSMILES indices can be known.

Note this index is the order the atoms appear in the string not the index of the character in the string.

Returns:

The integer index of the atom as it appears in the CSMILES string. If this is an implicit hydrogen then ‘H’ is returned.

get_ring_close_indices(atom_idxs: bool) list[tuple[int, int, rdkit.Chem.rdchem.BondType]]

Recursively collect a list of all ring-close bonds in the graph as (end, start, type) tuples.

If atom_idxs is False then we require get_csmiles to have been called on the root atom to set const.CSMILES_ORDER_KEY on the atoms before CSMILES indices can be known. This is not required for atom indices.

Parameters:

atom_idxs – If True indices are for atom in the Mol object, otherwise we return indices as position in the CSMILES string.

Returns:

A list of ring-close bonds as (end, start, type) tuples.

calculate_backwards_dihedrals() bool

Find the first forwards branch from the canonical order and determine the back-dihedral into every other branch from this one. This is necessary when the root atom has multiple branches, or if the root atom has only a single bond forwards then the second atom in the graph may have back-dihedrals. In the case of collinear root atoms it can be even more convoluted, as we may have to take N collinear steps down the graph until we find a branch with a non-collinear angle.

Returns:

True if we were successful in setting the back-dihedral. Primarily to support recursive descent into the graph.

get_canonical_dihedral_branch() tuple[list[Self], Self, Self, SmilesBond] | None

Locate which branch we should follow as the canonical branch to measure the back-dihedrals against, and also determine which atoms could be the first in the dihedral (e.g. if there are multiple terminal hydrogens then return them all).

Normally this is the canonically first branch, but if that branch is a single atom then we still don’t have a dihedral, so continue to find the first branch with a fourth atom that can make a dihedral. If none exists then return None.

Returns:

List of possible first atoms in the quad, the third atom in the quad, the final atom in the quad, and the graph bond around which the dihedral is measured (i.e. between self and third). Returns None if no dihedral is possible.

get_canonical_preceding(get_all_possible=False) Optional[Union[list[Self], Self]]

Get the atom that precedes this atom in the canonical ordering. If no such atom exists then attempt to find the canonical terminal hydrogen. If no such hydrogen exists then there is no preceding atom so return None.

In the case where both self and the first child need to have their terminal hydrogen found this function will set the _terminal property on that first child as well.

Parameters:

get_all_possible – Flag that we should return the list of all possible preceding atoms rather than just the canonical preceding atom.

Returns:

The preceding atom, or None if no such atom exists.

get_canonical_next() tuple[list[Self], NextFrom]

Get the atom that follows this atom in the canonical ordering. If no such atom exists then attempt to find the canonical terminal hydrogen. If no such hydrogen exists then this is the terminal point in the (sub-)graph so return an empty list.

Returns:

A list of the next atom(s) in the canonical ordering and where they came from. If none exist then an empty list is returned.

get_best_h(a: Self, b: Self, c: Self, h_first: bool) Self

Find the best terminating hydrogen atom for this atom, if it has any, defined as that which minimizes the dihedral around bond a-b (h_first=True) or b-c (h_first=False). Atoms a, b, c must be in the order they appear in the dihedral.

The hydrogen to be found could be at the start or end of the quad. This is set by the h_first flag, which is True when the hydrogen to find occupies the first position in the quad.

We don’t use SmilesBond objects here because the terminating hydrogen are not considered a conventional bond to be traversed.

Parameters:
  • a – The first atom in the known trio.

  • b – The second atom in the known trio.

  • c – The third atom in the known trio.

  • h_first – Flag if the hydrogen to be found is the first in the quad, if False then the hydrogen is last in the quad.

Returns:

The dummy hydrogen SmilesAtom object which minimizes the dihedral angle.

simultaneous_terminal_determination(firsts: list[Self], third: Self) Self

Here the self instance is always the second atom in the quad, where we need to determine the terminal hydrogen simultaneously. This is set as third._terminal here, and the best from firsts is returned.

That is, we are determining self.get_canonical_preceding() and third.get_canonical_next().

Parameters:
  • firsts – The set of possible first atoms we might have, usually terminal hydrogen.

  • third – The third atom of the quad.

Returns:

The canonical terminal atom for this atom from the list of possible that were passed.

apply_dihedrals(constraints: dict[tuple[int, int, int, int], float])

Apply the dihedrals in the graph bonds to the conformer by traversing the graph in canonical order and rotating bonds in self.conf, which is conformer 0 of self.mol.

Parameters:

constraints – A dictionary recording the dihedral set for each quad of atom indices. These should be imposed on any subsequent optimizations of the molecule if the dihedrals are to be maintained. Modified in place.

set_back_collinear_dihedral(first: Self, third: Self, final: Self, dihedral: Dihedral, constraints: dict[tuple[int, int, int, int], float], managed_atoms: set[int], set_unplaced: bool)

Set a backwards collinear dihedral to the desired angle skipping over the dihedral group as instructed by dihedral.collin_steps. We also enforce that first-self-third is indeed collinear as it is encoded to be. We also enforce that the eventual quad we adjust are not collinear by setting a standard angle if required.

For example, if we have HA-B-C-D/E, where A-B-C-D are collinear then we are setting the dihedral HA-D/E.

Parameters:
  • first – The first atom in the back-collinear quad.

  • third – The third atom in the back-collinear quad.

  • final – The final atom in the back-collinear quad.

  • dihedral – The dihedral object recording the angle and collinear steps.

  • constraints – A dictionary recording the dihedral set for each quad of atom indices.

  • managed_atoms – The set of atom indices for atoms we have explicitly placed and should not move. Used here to adjust implicit atoms.

  • set_unplaced – Flag that we should set the dihedral for all atoms which have not been placed (are not in managed_atoms).

make_atoms_collinear(a1: Self, a2: Self, a3: Self)

Make atoms a1-a2-a3 collinear, i.e. set their bond angle to 180 degrees. All atoms connected to a3 are moved. Temporary bonds are made for a1-a2 and a2-a3 as required to keep RDKit happy, and are cleaned up afterwards. Requires that ring bonds have already been broken.

Parameters:
  • a1 – The first atom.

  • a2 – The second atom.

  • a3 – The third atom.

get_nth_parent(n: int) list[Self]

Ascend the graph n steps and return the path of atoms taken to reach that ancestor. Used to find the required atoms for collinear skips.

Parameters:

n – The number of steps to make up the graph. Zero will return self, 1 returns direct parent etc.

Returns:

The path taken to get to that parent, such that the last element is the parent in question.
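
The climb can be sketched with a toy node class. The `Node` class is hypothetical; that the path starts with the calling atom, and that running out of parents raises an error, are both assumptions not stated above.

```python
from __future__ import annotations


class Node:
    """Hypothetical stand-in with only a name and a parent link."""

    def __init__(self, name: str, parent: Node | None = None):
        self.name = name
        self.parent = parent

    def get_nth_parent(self, n: int) -> list[Node]:
        # Assumption: the path begins at self, so n = 0 gives [self].
        path = [self]
        for _ in range(n):
            if path[-1].parent is None:
                # Assumption: climbing past the root is treated as an error.
                raise ValueError("ran out of parents before taking n steps")
            path.append(path[-1].parent)
        return path


a = Node("A")
b = Node("B", parent=a)
c = Node("C", parent=b)

names = [atom.name for atom in c.get_nth_parent(2)]
print(names)
```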

has_pyramidal_lone_pair() bool

Helper function to determine if this atom should be considered as having a pyramidal lone pair and so requiring implicit hydrogen dihedrals to be stored.

Returns:

True if atom can have pyramidal lone pair.

set_dihedral(constraints: dict[tuple[int, int, int, int], float], first: Self, third: Self, final: Self, dihedral: float)

Set the dihedral angle defined by (first, self, third, final) to the angle passed and record the constraint into the constraint dict.

It is not possible to call rdMolTransforms.SetDihedralDeg directly as this function moves all atoms bonded to third excluding self (this is the opposite of what the documentation claims…). Therefore we would end up overwriting previously set dihedrals. Instead we rely on the fact that we have broken all the bonds to third (excluding that to self). So we temporarily re-make the bond to final, do the rotation, then remove this bond. If rdMolTransforms.SetDihedralDeg only moved the branch from final then this extra bond breaking and making would not be necessary.

Parameters:
  • constraints – A dictionary to which the dihedral constraint will be recorded for the quad of atom indices.

  • first – The first atom in the quad.

  • third – The third atom in the quad. All bonds except that to self must have been removed before calling.

  • final – The final atom in the quad. All atoms onwards on this branch will be moved.

  • dihedral – The angle to be set in degrees.

break_ring_close_bonds()

Recursively traverse the graph and break the ring close bonds in the associated structure.

reform_ring_close_bonds() set[tuple[int, int]]

Recursively traverse the graph and reform the ring close bonds in the associated structure.

Parameters:

ring_close_idxs – A set of atom index pairs which are the ring bonds to cut if imposing the dihedral constraints on the molecule in outside programs. Initially this is None and is filled as the graph is traversed.

Returns:

The set of atom index tuples that were remade.

make_all_rdkit_bonds()

Remake the RDKit bonds that are recorded as broken in the _broken_bonds property, then update the RDKit ring information.

build_graph(rank_dict: dict[int, Self])

Recursively build the molecular graph in SMILES canonical order, given an RDKit Mol object with the required const.SYM_GROUP_KEY property set. The graph is ordered canonically by topological symmetry and dihedral angles.

Parameters:

rank_dict – The working dictionary mapping atom rank to SmilesAtom object.

add_children(explicit: list[Dihedral], rank_dict: dict[int, Self], bond_into: SmilesBond)

Iterate over the next explicit dihedrals adding them to the graph and adding respective entries to rank_dict. Call build_graph on each child so the graph is recursively built. self.bond_to is populated.

Parameters:
  • explicit – A list of dihedral objects that go onwards from this atom (self).

  • rank_dict – The working dictionary mapping atom rank to SmilesAtom object. Added to in-place.

  • bond_into – The bond leading from the parent into this atom.

next_from_root_with_multiple_terminal_implicit(explicit: list[Dihedral], preceding: list[Self])

Update the list of explicit atoms appearing in the CSMILES string after the current atom, which is the root atom that has implicit hydrogen. Thus we must also identify which implicit hydrogen to take as the root’s terminal.

That is, we are looking at {H}-R-{T}-{Fin; t}, where {H} is the set of possible implicit terminals, R is the root atom, {T} is the set of possible third atoms, and {Fin; t} is the set of final atoms for each third t in {T}.

The Dihedral objects in explicit are incomplete, so here we fill them out with the optimal from {H}, {T} and {Fin; t} for each t in {T}.

These are not returned in canonical order.

Parameters:
  • explicit – The working Dihedral objects for the atoms that follow explicitly in the SMILES string (as opposed to implicit hydrogen). Modified in-place.

  • preceding – The list of possible preceding atoms. As this is the root atom these must be terminal implicit hydrogen.

next_from_single_parent(explicit: list[Dihedral], implicit: list[Dihedral], preceding: Self) tuple[list[Dihedral], list[Dihedral]]

Self has a single preceding atom and we want to determine what the next atoms could be from here. This can happen if we have an atom with a parent, or if we are the root atom and we have only a single implicit terminal atom.

There are then two cases: we make the quad of (Parent-of-parent, Parent, self, possible_next_atoms) or our parent doesn’t have a preceding atom (we are root, or our parent is root with no implicit terminal). Then we must instead look forwards to find the ordering.

Parameters:
  • explicit – The list of Dihedral objects for the following explicit atoms. These will have the canonical first and final atoms set.

  • implicit – The list of Dihedral objects for any implicit atoms attached to this atom.

  • preceding – The preceding atom. Either our parent or a terminal hydrogen.

Returns:

The list of Dihedrals with first and final properties set, and the list of implicit atoms for which we must explicitly record dihedrals (usually because self is a nitrogen atom).

resolve_collinearity_and_get_recorded(finals: list[Dihedral], preceding: Self, firsts: list[Self]) list[Dihedral]

Update the entries in finals so any collinearity in the dihedrals is resolved, then sort finals into canonical order and return any implicit atoms from it for which we need to record angles.

Parameters:
  • finals – A list of dihedrals into downstream atoms in the graph. Updated in-place with non-collinear atoms if collinearity is found.

  • preceding – The parent atom immediately preceding this.

  • firsts – A list of possible first atoms in the dihedral quad.

Returns:

A list of dihedral objects for any implicit atoms that need angles recorded.

peek_forwards_finals(explicit: list[Dihedral], preceding: Self)

Look forwards to resolve the outgoing dihedrals, modifying explicit in place with angles for sorting purposes.

Parameters:
  • explicit – The list of outgoing dihedrals that will be modified in-place.

  • preceding – The atom directly preceding this one.

get_first_non_collinear(second: Self, third: Self, finals: list[Self], backwards: bool = False) tuple[list[TipPair] | None, int]

Given (second, third), find the canonically next atom that is not collinear with the bond vector. This may be many steps into the graph, e.g. if we had C#CC#CC#CC#CCF and called this for the first two C#C then we would return the F (which will be non-collinear) as everything else is collinear with the first C#C.

This can either operate “forwards” where finals is taken as the list of possible starting atoms and the search proceeds following canonical order, or “backwards” where finals is the list of possible final groups (e.g. multiple terminating hydrogen) and we locate the first by traversing up the graph by parents.

Note that we must keep second moving along as the atom preceding the tip as the second->third vector will be changed by this and can be important when there are small bends in the collinear segment.

Parameters:
  • second – The second atom of the quad, which is the first atom of the bond.

  • third – The third atom of the quad, which is the second atom of the bond.

  • finals – The list of possible terminating atoms. The first in the quad if backwards is False, else they are the last in the quad.

  • backwards – Flag if we are searching forwards in the graph if False, or backwards up parents if True.

Returns:

The list of possible non-collinear atoms (there can be several if multiple symmetric groups are found), and the number of steps that had to be taken to find a non-collinear atom.

update_tip(pair: TipPair, pos_third: Point3D, backwards: bool) tuple[float, rdkit.Chem.rdchem.Atom]

Update the tip to the new position in the collinear chain. Return the new tip and the dot-product to the known branch for later checking to see if collinearity has been resolved.

Parameters:
  • pair – The TipPair object for the current position.

  • pos_third – The position of the third atom, used if backwards is True.

  • backwards – If True then we are hunting backwards for a non-collinear atom, otherwise we are hunting forwards.

Returns:

The dot-product for determining collinearity and the new tip.
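
A dot-product collinearity test of the kind referred to here can be sketched with the standard library alone. The function `is_collinear` and the 0.999 threshold are assumptions for illustration; the cutoff actually used by this module is not documented here.

```python
import math


def _unit(a, b):
    """Unit vector pointing from point a to point b."""
    d = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    length = math.sqrt(sum(x * x for x in d))
    return tuple(x / length for x in d)


def is_collinear(a, b, c, threshold=0.999):
    """True if the bond vectors a->b and b->c are (anti)parallel.

    The threshold is an assumed value: a dot product of unit vectors
    close to +/-1 means the three points lie nearly on one line.
    """
    u, v = _unit(a, b), _unit(b, c)
    return abs(sum(x * y for x, y in zip(u, v))) > threshold


straight = is_collinear((0, 0, 0), (1, 0, 0), (2, 0, 0))
slightly_bent = is_collinear((0, 0, 0), (1, 0, 0), (2, 0.01, 0))
bent = is_collinear((0, 0, 0), (1, 0, 0), (1.5, 0.9, 0))
print(straight, slightly_bent, bent)
```

A loose threshold tolerates the small bends that can occur along a nominally linear segment, at the cost of occasionally treating a shallow angle as straight.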

fix_misplaced_implicit_hydrogen()

We can have the problem where the initial SMILES parsing resulted in a chiral center being set incorrectly for the final conformer data. If every group from the chiral center is explicit in the CSMILES then we have no problem, as the explicit dihedrals will correct the chirality. If the center contains one implicit hydrogen (by definition it cannot have more than one) then this hydrogen may be on the wrong side of the tetrahedron. The cleanup FF minimization can’t fix this as the hydrogen is trapped in a local minimum. So we detect these cases and resolve them by flipping the hydrogen across the chiral center. This is sufficient to allow the cleanup minimization to fix their position.

This isn’t a common case as implicit atoms are deliberately set to minimize sterics, but it can happen in fused ring systems where a ring-open bond is present. We don’t know the angle for that downstream dihedral, so the implicit placement could be wrong. This at least makes it less wrong. A full solution would be to do another full pass spacing out implicit atoms once all explicit angles have been set.

get_longest_path() int

Performs a depth-first search to find the longest path from this node to a leaf. Works by recursively calling this function on all children.

Returns:

The longest path from this node to a leaf.
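
The depth-first recursion can be sketched on a plain adjacency dictionary. The function `longest_path` is hypothetical, and measuring the path in bonds (edges) rather than atoms is an assumption.

```python
def longest_path(children: dict[str, list[str]], node: str) -> int:
    """Length, in edges, of the longest path from node down to a leaf."""
    # A leaf has no children, so max() falls back to the default of 0.
    return max(
        (1 + longest_path(children, child) for child in children.get(node, [])),
        default=0,
    )


tree = {"C1": ["C2", "O1"], "C2": ["C3"], "C3": ["C4"]}
depth_from_root = longest_path(tree, "C1")
depth_from_leaf = longest_path(tree, "O1")
print(depth_from_root, depth_from_leaf)
```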