schrodinger.application.scaffold_enumeration.cxsmiles module

Functions to parse “repeating units” and “position variant bonds” from CX SMILES “features” text. The parsers are simple and not robust against arbitrary input, but are probably good enough for machine-generated CX SMILES.

class schrodinger.application.scaffold_enumeration.cxsmiles.MCG(atoms, center)

Bases: tuple

atoms

List of atom indices ([int]).

center

Central atom index (int).

class schrodinger.application.scaffold_enumeration.cxsmiles.SRU(atoms, subscript, superscript)

Bases: tuple

atoms

List of atom indices ([int]).

subscript

SRU’s subscript (str).

superscript

SRU’s superscript (str).
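Both classes are documented as tuple subclasses with named fields, which is the shape a collections.namedtuple produces. The sketch below uses locally defined stand-ins (an assumption, not the module's actual definitions) to show how such records could be built and read. The field values are borrowed from the examples quoted in the parse_mcg and parse_sru entries on this page.

```python
from collections import namedtuple

# Hypothetical stand-ins that mirror the documented field names; the real
# classes live in schrodinger.application.scaffold_enumeration.cxsmiles.
MCG = namedtuple("MCG", ["atoms", "center"])
SRU = namedtuple("SRU", ["atoms", "subscript", "superscript"])

mcg = MCG(atoms=[7, 6, 5, 4, 3], center=0)
sru = SRU(atoms=[0, 1, 2], subscript="3-6", superscript="eu")

# Fields are reachable by name or by position, as with any tuple.
center = mcg.center
atoms, subscript, superscript = sru
```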

schrodinger.application.scaffold_enumeration.cxsmiles.parse_mcg(text, pos, accum)

Parses “multi-center SGroup” data from CX SMILES “features”.

<quote>

The multicenter atom indexes written after “m:” followed by a colon character and the indexes of the atoms which forms the given SGroup separated by “.”. The SGroups are separated by commas.

Example: “m:0:7.6.5.4.3,2:12.11.10.9.8,C:0.0,2.1”

</quote>

Parameters
  • text (str) – CX SMILES “features” string.

  • pos (int) – Index of the character in text right after “m:”.

  • accum (list) – List to which the “SGroups” are to be appended.

Returns

Index of the first unconsumed character in text.

Return type

int
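To make the (text, pos, accum) calling convention concrete, here is a minimal sketch of a parser with the same signature. It is an independent illustration, not the module's implementation: the name parse_mcg_sketch, the regular expression, and the choice to leave a trailing comma unconsumed are all assumptions.

```python
import re
from collections import namedtuple

# Stand-in for the documented MCG tuple (assumed to be namedtuple-like).
MCG = namedtuple("MCG", ["atoms", "center"])

# One group: "<center>:<atom>.<atom>..."
_GROUP = re.compile(r"(\d+):(\d+(?:\.\d+)*)")


def parse_mcg_sketch(text, pos, accum):
    """Append MCG records found at text[pos:] to accum; return the new position."""
    while True:
        match = _GROUP.match(text, pos)
        if match is None:
            return pos
        center = int(match.group(1))
        atoms = [int(i) for i in match.group(2).split(".")]
        accum.append(MCG(atoms=atoms, center=center))
        pos = match.end()
        # Continue only when a comma is followed by another well-formed group.
        if pos < len(text) and text[pos] == "," and _GROUP.match(text, pos + 1):
            pos += 1
        else:
            return pos


features = "m:0:7.6.5.4.3,2:12.11.10.9.8,C:0.0,2.1"
groups = []
end = parse_mcg_sketch(features, 2, groups)  # 2 is the index right after "m:"
```

With the quoted example, the sketch collects two groups and stops in front of the ",C:0.0,2.1" section, which belongs to a different feature.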

schrodinger.application.scaffold_enumeration.cxsmiles.parse_sru(text, pos, accum)

Parses “SRU” data from CX SMILES “features”.

<quote>

Polymer Sgroups

Each Sgroup exported after “Sg:” in fields separated by a colon. Fields are:

  1. Sgroup type keyword. Valid keywords are:

Keyword | Sgroup Type
n | SRU
… | …

  2. Atom indexes separated with commas.

  3. Subscript of the Sgroup. If the subscript equals the keyword of the Sgroup this field can be empty. Escaped field.

  4. Superscript of the Sgroup. In the superscript only connectivity and flip information is allowed. This field can be empty. Escaped field.

  5. Head crossing bond indexes. The indexes of bonds that share a common bracket in case of ladder-type polymers. This field can be empty.

  6. Tail crossing bond indexes. The indexes of bonds that share a common bracket in case of ladder-type polymers. This field can be empty.

  7. If the c export option is present then bracket orientation, bracket type followed by the coordinates (4 pair, separated with commas). Bracket orientation can be s or d (single or double), bracket type can be b,c,r,s for braces, chevrons, round and square, respectively. The brackets are written between parentheses and separated with semicolons.

A colon is needed after the last non-empty field.

If one needs to retain not only the chemically relevant information, but the whole structure (as drawn), then the c export option should be used.

Examples:

CCCC |Sg:gen:0,1,2:|
CCCC |Sg:n:0,1,2:3-6:eu|
*CC(*)C(*)N* |Sg:n:6,1,2,4::hh&#44;f:6,0,:4,2,|

</quote>

In addition:

<quote>

Escaping

In some places special characters are escaped to ‘&#code’ where code is the ASCII code of the special character.

Not escaped characters in fields of Sgroups and DataSgroups: ‘a’-‘z’, ‘A’-‘Z’, ‘0’-‘9’ and ‘><”!@#$%()[]./?-+*^_~=’ and the space character.

Not escaped characters in atom property keys and values: ‘a’-‘z’, ‘A’-‘Z’, ‘0’-‘9’ and ‘><”!@#$%()[]./?-+*^_~=’ and the space character.

Not escaped characters in atom labels and atom values: ‘a’-‘z’, ‘A’-‘Z’, ‘0’-‘9’ and ‘><”!@#%()[]./?-+*^_~=,:’ and the space character.

</quote>

This subroutine recognizes only three of the fields listed above: atoms (2), subscript (3), and superscript (4).

Parameters
  • text (str) – CX SMILES “features” string.

  • pos (int) – Index of the character in text right after “Sg:n:”.

  • accum (list) – List to which the “SGroups” are to be appended.

Returns

Index of the first unconsumed character in text.

Return type

int
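The field layout and the escaping rule quoted above can be illustrated with a short sketch. It is not the module's code: unescape and split_sru_fields are hypothetical helpers, the payload is assumed to be already isolated from the rest of the “features” string, and the trailing “;” of an escape is inferred from the “&#44;” in the quoted example.

```python
import re


def unescape(field):
    """Replace each '&#code;' escape with the character it encodes."""
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), field)


def split_sru_fields(payload):
    """Split the text following 'Sg:' into keyword, atoms, subscript, superscript."""
    fields = payload.split(":")
    # Pad so that trailing empty fields can be omitted by the writer.
    fields += [""] * (4 - len(fields))
    keyword, atoms, subscript, superscript = fields[:4]
    atom_indices = [int(i) for i in atoms.split(",") if i]
    return keyword, atom_indices, unescape(subscript), unescape(superscript)


# Payload taken from the third quoted example, without the leading "Sg:".
result = split_sru_fields("n:6,1,2,4::hh&#44;f:6,0,:4,2,")
```

Splitting on “:” is safe here only because the quoted rules list the colon among the characters that are escaped inside Sgroup fields. The crossing-bond fields (5 and 6) are ignored, in line with what the subroutine is documented to recognize.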

schrodinger.application.scaffold_enumeration.cxsmiles.parse_cx_extensions(text)

Parses: (a) multi-center groups and (b) SRUs.

Parameters

text (str) – CX extensions to be parsed.

Returns

Tuple of lists that hold the MCGs and SRUs.

Return type

(list(MCG), list(SRU))

schrodinger.application.scaffold_enumeration.cxsmiles.mol_from_cxsmiles(text, parseName=True)

Attempts to instantiate rdkit.Chem.Mol from text, assuming that text is a CX SMILES string.

Parameters
  • text (str) – CX SMILES string.

  • parseName (bool) – Parse molecule title?

Returns

Molecule or None

Return type

rdkit.Chem.Mol or NoneType
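For orientation, a CX SMILES line consists of a SMILES string, an optional extension block between “|” characters, and an optional title. The sketch below separates those three parts with plain string operations. split_cxsmiles is a hypothetical helper and the title “polymer” is made up for illustration; how mol_from_cxsmiles actually tokenizes its input is not stated here.

```python
def split_cxsmiles(text):
    """Return (smiles, extensions, title) from a single CX SMILES line."""
    smiles, _, rest = text.strip().partition(" ")
    rest = rest.lstrip()
    extensions = ""
    if rest.startswith("|"):
        end = rest.find("|", 1)
        if end != -1:
            # Text between the two pipes is the extension block.
            extensions = rest[1:end]
            rest = rest[end + 1:].lstrip()
    return smiles, extensions, rest


parts = split_cxsmiles("CCCC |Sg:n:0,1,2:3-6:eu| polymer")
```

The middle element is the kind of string that parse_cx_extensions is documented to accept, and the last element corresponds to the title that parseName controls.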