schrodinger.application.jaguar.csmiles.encoding module

schrodinger.application.jaguar.csmiles.encoding.generate_csmiles(st_original: Mol, precision: int = 1, representation: Representation = Representation.INSET, minimal_inset: bool = False, arbitrary_branch_warning: Literal['IGNORE', 'WARNING', 'ERROR'] = 'IGNORE', skip_root_canonicalization: bool = False, non_canonical: bool = False, root: int | None = None) str | tuple[str, bool]

Return the conformer SMILES (CSMILES) of the input structure. The format is a canonical SMILES string with a dihedral angle inserted for each window of four consecutive SMILES atoms. For example, CCCCC could be encoded as C{120}C{60}C{130}C{0}C, where “{60}” denotes that the dihedral angle for carbons (1, 2, 3, 4), numbered as they appear in the SMILES string, is 60. The following “{130}” is the dihedral for carbons (2, 3, 4, 5).
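
To make the quantity being encoded concrete, the following is a minimal sketch of the standard signed dihedral angle for four 3D points. It is not taken from this module: dihedral_degrees is a hypothetical helper, and the sign convention and angle range used by generate_csmiles are assumptions that this sketch does not confirm.

```python
import math


def dihedral_degrees(p1, p2, p3, p4):
    """Signed dihedral angle in degrees defined by four 3D points.

    Hypothetical helper for illustration only; collinear points are
    not handled specially.
    """

    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def cross(a, b):
        return (
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0],
        )

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    # Bond vectors along the chain 1 -> 2 -> 3 -> 4.
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    b3 = sub(p4, p3)

    # Normals of the planes (1, 2, 3) and (2, 3, 4).
    n1 = cross(b1, b2)
    n2 = cross(b2, b3)

    # atan2 form of the dihedral; numerically stable near 0 and 180.
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# A planar cis arrangement and a planar trans arrangement.
cis = dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
```

In a CSMILES string, one such angle would be computed per window of four consecutive SMILES atoms and then rounded according to precision.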

Parameters:
  • st_original – The structure to generate the CSMILES for.

  • precision – The size of the bin to round the angle to, with the first bin centered on zero.

  • representation – Which representation of the string we should use.

  • minimal_inset – Flag that we should produce the most compressed CSMILES we can by using minimal delimiters around dihedral blocks.

  • arbitrary_branch_warning – What to do if an arbitrary branch point is encountered. If ‘WARNING’, a bool is also returned indicating whether an arbitrary choice was made.

  • skip_root_canonicalization – Flag if we should skip iterating over all possible symmetric root atoms. The resulting string will be canonical for the arbitrary root chosen (i.e. the smallest atom index with symmetry class zero).

  • non_canonical – Flag that we should skip all canonicalization. Atoms will be ranked by their position in the structure.

  • root – The index of the explicit root atom that should be used. Setting this skips root canonicalization.
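
The precision parameter above describes bins with the first bin centered on zero. One way to read that is snapping each angle to the nearest multiple of precision, sketched below. bin_angle is a hypothetical helper, and the tie-breaking rule at bin edges and any wrap-around of angles are assumptions, not documented behavior.

```python
import math


def bin_angle(angle_deg: float, precision: int) -> int:
    """Snap an angle to the nearest multiple of a positive bin size.

    Bins are centered on 0, +/-precision, +/-2*precision, and so on.
    Hypothetical helper; ties at a bin edge round upward here, which
    is an assumption.
    """
    if precision <= 0:
        raise ValueError("this sketch only handles positive bin sizes")
    # floor(x + 0.5) rounds half up and avoids the banker's rounding
    # that the built-in round() applies to exact halves.
    return precision * math.floor(angle_deg / precision + 0.5)


# With 30-degree bins, the zero-centered bin covers [-15, 15).
binned = [bin_angle(a, 30) for a in (14.0, 16.0, -14.0, -16.0)]
```

With precision = 1, the default, this reduces to rounding each angle to the nearest whole degree.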

Returns:

The conformer SMILES string and, if arbitrary_branch_warning is ‘WARNING’, a bool indicating whether an arbitrary branch point was encountered.

Raises:

ArbitraryBranchException if an arbitrary branch choice was encountered and arbitrary_branch_warning is ‘ERROR’.
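
Because the return annotation is str | tuple[str, bool], calling code may want to normalize the result into one shape. The sketch below relies only on that annotation. split_csmiles_result is a hypothetical helper and not part of this module.

```python
from typing import Optional, Tuple, Union


def split_csmiles_result(
    result: Union[str, Tuple[str, bool]],
) -> Tuple[str, Optional[bool]]:
    """Return (csmiles, arbitrary_flag) for either documented return shape.

    The flag is None when the call returned only a string, i.e. when no
    arbitrary-branch information was reported.
    """
    if isinstance(result, tuple):
        csmiles, arbitrary = result
        return csmiles, bool(arbitrary)
    return result, None


# Both shapes, using the example string from the description above.
plain = split_csmiles_result("C{120}C{60}C{130}C{0}C")
flagged = split_csmiles_result(("C{120}C{60}C{130}C{0}C", True))
```

Keeping None distinct from False avoids claiming that no arbitrary choice was made when the flag was simply not requested.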

schrodinger.application.jaguar.csmiles.encoding.generate_csmiles_debug(st_original: Mol, precision: int = 1, representation: Representation = Representation.INSET, minimal_inset: bool = False, error_on_arbitrary_branch: bool = False, skip_root_canonicalization: bool = False, non_canonical: bool = False, root: int | None = None) tuple[str, bool, rdkit.Chem.rdchem.Mol]

Debug wrapper function to CSMILES generation that prints debugging information and returns an additional decorated Mol object with debugging atom properties set.

Parameters:
  • st_original – The structure to generate the CSMILES for.

  • precision – The size of the bin to round the angle to, with the first bin centered on zero.

  • representation – Which representation of the string we should use.

  • minimal_inset – Flag that we should produce the most compressed CSMILES we can by using minimal delimiters around dihedral blocks.

  • error_on_arbitrary_branch – Flag if we should raise an ArbitraryBranchException when an arbitrary branch point is encountered or continue anyway.

  • skip_root_canonicalization – Flag if we should skip iterating over all possible symmetric root atoms. The resulting string will be canonical for the arbitrary root chosen (i.e. the smallest atom index with symmetry class zero).

  • non_canonical – Flag that we should skip all canonicalization. Atoms will be ranked by their position in the structure.

  • root – The index of the explicit root atom that should be used. Setting this skips root canonicalization.

Returns:

The CSMILES string, a bool indicating if an arbitrary branch choice was encountered or not, and a decorated Mol object with debugging atom properties set.

Raises:

ArbitraryBranchException – If error_on_arbitrary_branch is True and we encounter an arbitrary branch choice.

schrodinger.application.jaguar.csmiles.encoding.generate_csmiles_driver(st_original: Mol, precision: int, representation: Representation = Representation.INSET, minimal_inset: bool = False, error_on_arbitrary_branch: bool = False, skip_root_canonicalization: bool = False, root: int | None = None, non_canonical: bool = False, debug: bool = False) tuple[str, bool, SmilesAtom] | tuple[str, bool, SmilesAtom, rdkit.Chem.rdchem.Mol]

Return the conformer SMILES (CSMILES) of the input structure. The format is a canonical SMILES string with a dihedral angle inserted for each window of four consecutive SMILES atoms. For example, CCCCC could be encoded as C{120}C{60}C{130}C{0}C, where “{60}” denotes that the dihedral angle for carbons (1, 2, 3, 4), numbered as they appear in the SMILES string, is 60. The following “{130}” is the dihedral for carbons (2, 3, 4, 5).

Parameters:
  • st_original – The structure to generate the CSMILES for.

  • precision – The size of the bin to round the angle to, with the first bin centered on zero.

  • representation – Which representation of the string we should use.

  • minimal_inset – Flag that we should produce the most compressed CSMILES we can by using minimal delimiters around dihedral blocks.

  • error_on_arbitrary_branch – Flag if we should raise an ArbitraryBranchException when an arbitrary branch point is encountered or continue anyway.

  • skip_root_canonicalization – Flag if we should skip iterating over all possible symmetric root atoms. The resulting string will be canonical for the arbitrary root chosen (i.e. the smallest atom index with symmetry class zero).

  • non_canonical – Flag that we should skip all canonicalization. Atoms will be ranked by their position in the structure.

  • root – The index of the explicit root atom that should be used. Setting this skips root canonicalization.

  • debug – Flag that we should print additional debugging information.

Returns:

The conformer SMILES string, a bool indicating if an arbitrary branch point was encountered, and the root node of the canonical graph. If debug is True, a decorated Mol object for debugging is also returned.

Raises:

ArbitraryBranchException if an arbitrary branch choice was encountered and error_on_arbitrary_branch is True.

schrodinger.application.jaguar.csmiles.encoding.resolve_back_dihedrals(smiles_root: SmilesAtom)

Calculate back-dihedrals from the appropriate atom, depending on root atom branches and first dihedrals. Information is updated in the graph rooted at smiles_root in-place.

Parameters:

smiles_root – The root atom of the CSMILES molecular graph.

schrodinger.application.jaguar.csmiles.encoding.construct_traditional_cxsmiles_string(raw_smiles: str, dihedral_dict: dict[tuple[int | str, int, int, int | str], float], precision: int) str

Assemble the CXSMILES string from its components. Note that the atom indices in dihedral_dict are indices into the SMILES string, not into the Mol object.

Parameters:
  • raw_smiles – The regular SMILES string for the molecule.

  • dihedral_dict – The dictionary mapping each quadruple of atom indices in the SMILES string to the dihedral angle between them.

  • precision – The size of the bin to round the angle to, with the first bin centered on zero. If this is negative then it is the number of decimal digits recorded.

  • debug – Flag that we should print additional debug information.

Returns:

The CXSMILES string encoding the conformer.
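
The sign convention of precision described above (positive means a bin size, negative means a number of decimal digits) can be sketched as follows. format_angle is a hypothetical helper. How the real function lays out angles inside the CXSMILES string, and how it treats a precision of zero, are not documented here and are left out.

```python
import math


def format_angle(angle_deg: float, precision: int) -> str:
    """Format one dihedral angle under the signed-precision convention.

    Hypothetical helper: positive precision bins the angle, negative
    precision keeps abs(precision) decimal digits.
    """
    if precision > 0:
        # Nearest multiple of the bin size, bins centered on zero.
        return str(precision * math.floor(angle_deg / precision + 0.5))
    if precision < 0:
        # Fixed number of decimal digits.
        return f"{angle_deg:.{-precision}f}"
    raise ValueError("a precision of 0 is not covered by this sketch")


binned_text = format_angle(61.237, 5)
decimal_text = format_angle(61.237, -2)
```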

schrodinger.application.jaguar.csmiles.encoding.build_smiles_graph(mol: Mol, root_idx: int, precision: int, error_on_arbitrary_branch: bool) SmilesAtom

Driver function to build the SmilesAtom graph for the Mol object with the required const.SYM_GROUP_KEY set on the atoms to define topological canonical order. The graph is built starting from atom root_idx, and the root SmilesAtom is returned, from which the graph can be traversed.

Parameters:
  • mol – The RDKit Mol object with const.SYM_GROUP_KEY set on the atoms.

  • root_idx – The index of the root atom in the graph.

  • precision – The precision requested for rounding dihedrals to.

  • error_on_arbitrary_branch – Flag that we should raise an ArbitraryBranchException if we encounter an arbitrary branch choice.

Returns:

The root SmilesAtom for the graph.

Raises:
  • ArbitraryBranchException – if error_on_arbitrary_branch is True and an arbitrary branch choice is encountered.

  • RingTagLimitException – if more rings are opened than const.MAX_TAG_NUMBER allows.
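
Since the returned root is the entry point for traversing the graph, a generic pre-order walk is sketched below. The attributes of SmilesAtom are not documented in this section, so the Node class with its symbol and children fields is a hypothetical stand-in, not the real class.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical stand-in for SmilesAtom (field names are assumed)."""

    symbol: str
    children: List["Node"] = field(default_factory=list)


def walk_preorder(root: Node) -> Iterator[Node]:
    """Yield the root, then each subtree in child order, without recursion."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(node.children))


# A small branched tree: C with children O (which has child S) and N.
tree = Node("C", [Node("O", [Node("S")]), Node("N")])
order = [node.symbol for node in walk_preorder(tree)]
```

An explicit stack avoids hitting Python's recursion limit on long chains.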

schrodinger.application.jaguar.csmiles.encoding.set_symmetry_groups(mol: Mol, non_canonical: bool) list[tuple[int, rdkit.Chem.rdchem.Mol]]

Generate a canonical SMILES for the structure and decorate the atoms with SYM_GROUP_KEY properties so the canonical order can be found by traversing the molecule. Also return a list of topologically identical root atoms for the graph.

Parameters:
  • mol – Structure to generate a SMILES string for and decorate.

  • non_canonical – Flag that symmetry rank should be set to atomic index, skipping all canonicalization.

Returns:

A tuple containing the canonical SMILES string and a list of topologically equivalent root atoms.
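
The idea of topologically equivalent root atoms can be illustrated with plain lists. The sketch assumes that each atom carries an integer symmetry class and that root candidates are the atoms in class zero, as the skip_root_canonicalization description suggests. candidate_roots and default_root are hypothetical helpers, and the class values are made up for the example.

```python
from typing import List, Sequence


def candidate_roots(sym_classes: Sequence[int]) -> List[int]:
    """Atom indices whose symmetry class is zero (assumed root candidates)."""
    return [idx for idx, cls in enumerate(sym_classes) if cls == 0]


def default_root(sym_classes: Sequence[int]) -> int:
    """Smallest atom index with symmetry class zero."""
    roots = candidate_roots(sym_classes)
    if not roots:
        raise ValueError("no atom has symmetry class zero")
    return min(roots)


# Illustrative classes for a symmetric five-atom chain: both ends tie.
chain_classes = [0, 1, 2, 1, 0]
roots = candidate_roots(chain_classes)
first_root = default_root(chain_classes)

# With ranks equal to atom indices (the non_canonical case), only atom 0
# falls in class zero.
index_ranks = list(range(5))
only_root = candidate_roots(index_ranks)
```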

schrodinger.application.jaguar.csmiles.encoding.get_longest_path_length(mol: Mol) int

Given a molecule, return the length of the longest path in the canonical CSMILES string for it.

Parameters:

mol – The molecule to find the longest path of.

Returns:

The number of atoms in the longest path of the canonical CSMILES string for input mol.
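
One possible reading of "longest path" is the longest root-to-leaf chain of atoms in the tree that the SMILES string spells out. The sketch below counts atoms along that chain for a tree given as a child-list mapping. This interpretation, the mapping layout, and the name longest_root_to_leaf are assumptions and may not match what get_longest_path_length actually measures.

```python
from typing import Dict, List


def longest_root_to_leaf(children: Dict[int, List[int]], root: int) -> int:
    """Number of nodes on the longest root-to-leaf path of a tree."""
    best = 0
    stack = [(root, 1)]  # (node, number of nodes on the path so far)
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        for child in children.get(node, []):
            stack.append((child, depth + 1))
    return best


# Tree: 0 has children 1 and 2; 1 -> 3 -> 4 forms the longest chain.
tree_children = {0: [1, 2], 1: [3], 3: [4]}
longest = longest_root_to_leaf(tree_children, 0)
```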