schrodinger.application.transforms.chemio module

class schrodinger.application.transforms.chemio.WriteBatchesOfStructures(file_prefix: Union[str, Path])

Bases: _LocalOnlyPTransform, WriteBatchesOfStructures

A PTransform that writes each element (which is a list of structures) to a separate file with a unique filename. Note that this transform is an identity step that outputs the input PCollection.
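The reference does not say how the unique filenames are derived from file_prefix. As a rough illustration only, the sketch below appends a random token to the prefix. The helper name unique_batch_path, the underscore separator, and the ".maegz" suffix are all assumptions and are not part of the documented API.

```python
import uuid
from pathlib import Path
from typing import Union


def unique_batch_path(file_prefix: Union[str, Path], suffix: str = ".maegz") -> Path:
    """Return a path built from file_prefix plus a random token.

    Hypothetical helper: the naming scheme and the default suffix are
    assumptions made for illustration, not the library's behavior.
    """
    prefix = Path(file_prefix)
    token = uuid.uuid4().hex  # 32 hex characters, random per call
    # Keep the parent directory of the prefix and extend only the final name.
    return prefix.with_name(f"{prefix.name}_{token}{suffix}")


first = unique_batch_path("out/batch")
second = unique_batch_path("out/batch")
print(first.name != second.name)
```

With a scheme like this, each batch written by a worker gets its own file in the same directory, so parallel writers do not overwrite one another.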

class schrodinger.application.transforms.chemio.ReadFileToStructures(input_file: str)

Bases: PTransform

Read Structure objects from a file.

This is a source transform that reads structures from a file.

Example usage in YAML:

pipeline:
  - type: ReadFileToStructures
    config:
      input_file: "ligands.maegz"

Parameters:

input_file – Path to the structure file to read.

__init__(input_file: str)
expand(pbegin: PBegin)

class schrodinger.application.transforms.chemio.WriteStructures(output_file: str)

Bases: PTransform

Write Structure objects to a file.

This is a sink transform that writes structures to a file. It passes through the input structures unchanged for chaining.

Example usage in YAML:

pipeline:
  - type: ReadFileToStructures
    config:
      input_file: "ligands.maegz"
  - type: LigFilter
    config:
      criteria: ["Num_heavy_atoms > 5"]
  - type: WriteStructures
    config:
      output_file: "filtered.maegz"

Parameters:

output_file – Path to the output structure file.

__init__(output_file: str)
expand(pcoll: PCollection[Structure])

class schrodinger.application.transforms.chemio.CreateStructuresFromSmiles(elements: Iterable[str])

Bases: PTransform

Create Structure objects from a list of SMILES strings.

This is a source transform that creates structures from SMILES.

Example usage in YAML:

pipeline:
  - type: CreateStructuresFromSmiles
    config:
      elements:
        - "c1ccccc1"
        - "CCO"

Parameters:

elements – List of SMILES strings to convert to structures.

__init__(elements: Iterable[str])
expand(pbegin: PBegin)

class schrodinger.application.transforms.chemio.StructureToRow(label: Optional[str] = None)

Bases: PTransform

Convert Structure objects to Beam Rows.

This transform wraps each Structure in a Row with a ‘structure’ field, enabling use of Row-based transforms like MapToFields.

Example usage in YAML:

pipeline:
  - type: CreateStructuresFromSmiles
    config:
      elements:
        - "c1ccccc1"
  - type: LigFilter
    config:
      criteria: ["Num_heavy_atoms > 3"]
  - type: StructureToRow
  - type: MapToFields
    config:
      fields:
        title:
          callable: "lambda row: row.structure.title"
expand(pcoll: PCollection[Structure])
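To make the wrapping step concrete, the following sketch mimics it with plain Python objects. FakeStructure and the NamedTuple-based Row are stand-ins invented for this illustration (the real transform works with Structure objects and Beam Rows inside a pipeline). Only the field name 'structure' and the lambda are taken from the description and YAML example above.

```python
from dataclasses import dataclass
from typing import NamedTuple


@dataclass
class FakeStructure:
    """Stand-in for a Structure; only the title attribute is modeled."""
    title: str


class Row(NamedTuple):
    """Stand-in for a Beam Row that has a single 'structure' field."""
    structure: FakeStructure


def structure_to_row(structure: FakeStructure) -> Row:
    """Wrap one structure in a row, as StructureToRow is described to do."""
    return Row(structure=structure)


# Same callable as in the MapToFields step of the YAML example.
get_title = lambda row: row.structure.title

rows = [structure_to_row(FakeStructure(title=name)) for name in ("benzene", "ethanol")]
titles = [get_title(row) for row in rows]
print(titles)
```

Once each element exposes a named field, Row-based steps can address the structure as row.structure instead of receiving the bare object.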
class schrodinger.application.transforms.chemio.ReadMolsFromFile(input_file: str, silent: bool = False)

Bases: PTransform

Read RDKit Mol objects from a SMILES file.

This is a source transform that reads molecules from a file containing newline-separated SMILES strings.

Example usage in YAML:

pipeline:
  - type: ReadMolsFromFile
    config:
      input_file: "ligands.smi"

Parameters:
  • input_file – Path to the SMILES file to read.

  • silent – If True, suppress warnings for invalid SMILES.
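The sketch below illustrates one plausible reading of the input_file and silent parameters using only the standard library. It is not the library's implementation: toy_parse is a hypothetical stand-in for a real SMILES parser (it only checks that parentheses balance), and skipping unparseable lines is an assumption about how invalid entries are handled.

```python
import tempfile
import warnings
from pathlib import Path
from typing import Callable, Iterator, Optional, Union


def toy_parse(smiles: str) -> Optional[str]:
    """Stand-in parser: accept a line only if its parentheses balance."""
    depth = 0
    for char in smiles:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                return None
    return smiles if depth == 0 else None


def read_smiles_lines(
    input_file: Union[str, Path],
    silent: bool = False,
    parse: Callable[[str], Optional[str]] = toy_parse,
) -> Iterator[str]:
    """Yield one parsed entry per non-empty line; warn on failures unless silent."""
    with open(input_file, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            text = line.strip()
            if not text:
                continue  # ignore blank lines
            parsed = parse(text)
            if parsed is None:
                if not silent:
                    warnings.warn(f"line {line_number}: could not parse {text!r}")
                continue  # assumption: invalid entries are skipped
            yield parsed


with tempfile.TemporaryDirectory() as tmp:
    smi_path = Path(tmp) / "ligands.smi"
    smi_path.write_text("c1ccccc1\nCC(=O\n\nCCO\n", encoding="utf-8")
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        loud = list(read_smiles_lines(smi_path))
        quiet = list(read_smiles_lines(smi_path, silent=True))
```

In this sketch both calls return the same valid entries; the only difference is that the first call records a warning for the malformed second line.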

__init__(input_file: str, silent: bool = False)
expand(pbegin: PBegin)

class schrodinger.application.transforms.chemio.WriteMolsToSmilesFile(output_file: str)

Bases: PTransform

Write RDKit Mol objects to a SMILES file.

This is a sink transform that writes molecules to a file as newline-separated SMILES strings. It passes through the input molecules unchanged for chaining.

Example usage in YAML:

pipeline:
  - type: CreateMolsFromSmiles
    config:
      elements:
        - "c1ccccc1"
  - type: WriteMolsToSmilesFile
    config:
      output_file: "output.smi"

Parameters:

output_file – Path to the output SMILES file.
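The "write, then pass through" behavior described for this sink can be pictured with a small generator. This is a simplified sketch under stated assumptions: it handles plain SMILES strings rather than RDKit Mol objects, runs outside of Beam, and the function name write_and_pass_through is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Iterator


def write_and_pass_through(smiles: Iterable[str], output_file: str) -> Iterator[str]:
    """Write each SMILES string on its own line, then yield it unchanged."""
    with open(output_file, "w", encoding="utf-8") as handle:
        for item in smiles:
            handle.write(item + "\n")
            yield item  # downstream consumers see the original element


with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "output.smi"
    # Consuming the generator drives the writes and closes the file.
    passed = list(write_and_pass_through(["c1ccccc1", "CCO"], str(out_path)))
    written = out_path.read_text(encoding="utf-8").splitlines()
```

Because the elements come out unchanged, a further step can be chained after the write, which is the reason the description gives for passing the input through.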

__init__(output_file: str)
expand(pcoll: PCollection[Mol])

class schrodinger.application.transforms.chemio.CreateMolsFromSmiles(elements: Iterable[str])

Bases: PTransform

Create RDKit Mol objects from a list of SMILES strings.

This is a source transform that creates molecules from SMILES.

Example usage in YAML:

pipeline:
  - type: CreateMolsFromSmiles
    config:
      elements:
        - "c1ccccc1"
        - "CCO"

Parameters:

elements – List of SMILES strings to convert to molecules.

__init__(elements: Iterable[str])
expand(pbegin: PBegin)

class schrodinger.application.transforms.chemio.ReadFromPDB(pdb_codes: list[str], preserve_caps: bool = False)

Bases: PTransform

Read structures from the PDB (Protein Data Bank) by PDB codes.

This is a source transform that downloads PDB files and yields Structure objects with the PDB code stored as property s_seam_pdb_code.

Example usage in YAML:

pipeline:
  - type: ReadFromPDB
    config:
      pdb_codes:
        - "1A2B"
        - "3C4D"
      preserve_caps: false

Parameters:
  • pdb_codes – List of PDB codes to download.

  • preserve_caps – If True, preserve the original capitalization of the supplied PDB codes.

PDB_ID_PROPERTY = 's_seam_pdb_code'
__init__(pdb_codes: list[str], preserve_caps: bool = False)
expand(pbegin: PBegin)

class schrodinger.application.transforms.chemio.ReadFromChembl(sample_size: Optional[int] = None)

Bases: PTransform

Read molecules from the ChEMBL database.

This is a source transform that fetches molecules from the ChEMBL API and yields Structure objects with the ChEMBL ID stored as property s_seam_chembl_id.

Example usage in YAML:

pipeline:
  - type: ReadFromChembl
    config:
      sample_size: 100

Parameters:

sample_size – Maximum number of molecules to fetch. If not provided, all available molecules will be fetched.

CHEMBL_URL = 'https://www.ebi.ac.uk/chembl/api/data/molecule'
DEFAULT_PAGE_SIZE = 1000
CHEMBL_REQUEST_PARAMS = {'format': 'json', 'only': 'molecule_chembl_id,molecule_structures'}
CHEMBL_ID_PROPERTY = 's_seam_chembl_id'
__init__(sample_size: Optional[int] = None)
static create_request(offset: int, limit: int) -> dict

Construct a ChEMBL API request with pagination parameters.

Parameters:
  • offset – The starting index from which to fetch molecules.

  • limit – The maximum number of molecules to fetch in this request.

expand(pbegin: PBegin)
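The class constants and the create_request signature suggest offset/limit pagination against the ChEMBL endpoint. The sketch below shows how such paging might be planned for a given sample_size. The constants are copied from the attributes listed above, but the body of create_request and the helper plan_pages are assumptions, not the library's code. The case sample_size=None (fetch everything) is not covered, because it depends on a total count reported by the API.

```python
from typing import Iterator, Tuple
from urllib.parse import urlencode

# Values copied from the class attributes documented above.
CHEMBL_URL = "https://www.ebi.ac.uk/chembl/api/data/molecule"
DEFAULT_PAGE_SIZE = 1000
CHEMBL_REQUEST_PARAMS = {
    "format": "json",
    "only": "molecule_chembl_id,molecule_structures",
}


def create_request(offset: int, limit: int) -> dict:
    """Assumed shape: the fixed parameters plus the pagination parameters."""
    return {**CHEMBL_REQUEST_PARAMS, "offset": offset, "limit": limit}


def plan_pages(sample_size: int, page_size: int = DEFAULT_PAGE_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield (offset, limit) pairs that together cover sample_size molecules."""
    for offset in range(0, sample_size, page_size):
        # The last page may be shorter than page_size.
        yield offset, min(page_size, sample_size - offset)


pages = list(plan_pages(2500))
first_url = CHEMBL_URL + "?" + urlencode(create_request(*pages[0]))
print(pages)
```

Splitting the fetch into independent (offset, limit) requests is a common way to let a pipeline issue the pages in parallel. Whether ReadFromChembl does this is not stated in the reference.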