schrodinger.seam.io.chemio module

Transforms for reading and writing structures and molecules.

class schrodinger.seam.io.chemio.ReadStructuresFromFile(file_pattern: Union[str, Path], **kwargs)

Bases: _LocalOnlyPTransform, ReadStructuresFromFile

Read a file (or files) containing a structure or a list of structures and return a PCollection of schrodinger.structure.Structure objects.

Example:

>>> from schrodinger.test import mmshare_data_file
>>> with beam.Pipeline() as p:
...     _ = (p
...     | ReadStructuresFromFile(mmshare_data_file('cookbook/stereoisomers-form-1.maegz'))
...     | beam.Map(lambda st: st.title)
...     | textio.WriteToText('titles.txt'))
>>> with open('titles.txt') as f:
...     titles = sorted(set(line.strip() for line in f))
>>> titles
['stereoisomers-1-form-1', 'stereoisomers-2-form-1', ...]

Parameters:

file_pattern – A file name or glob pattern.
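
How a glob pattern can expand to several input files is sketched below with the standard library. This is only an illustration: Python's glob module is used as a stand-in, and it is an assumption (not confirmed on this page) that the transform's own matching behaves the same way for simple `*` wildcards. The helper expand_pattern and the file names are hypothetical.

```python
import glob
import tempfile
from pathlib import Path


def expand_pattern(file_pattern):
    """Return the sorted list of paths matching a file name or glob pattern.

    Stand-in for the transform's matching step; not the library's code.
    """
    return sorted(glob.glob(str(file_pattern)))


# Hypothetical inputs, created only so the sketch is runnable.
workdir = Path(tempfile.mkdtemp())
for name in ("ligands-1.maegz", "ligands-2.maegz", "notes.txt"):
    (workdir / name).touch()

# A pattern selects every matching file; a plain file name selects one file.
matched = expand_pattern(workdir / "ligands-*.maegz")
matched_names = [Path(p).name for p in matched]
print(matched_names)
```

A plain file name is the degenerate case of a pattern that matches exactly one path, which is why a single parameter can serve both uses.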

class schrodinger.seam.io.chemio.ReadAllStructuresFromFile(label: Optional[str] = None)

Bases: _LocalOnlyPTransform, ReadAllStructuresFromFile

A PTransform for reading a PCollection of structure files.

Example:

>>> from schrodinger.test import mmshare_data_file
>>> with beam.Pipeline() as p:
...     _ = (p
...     | beam.Create([mmshare_data_file('cookbook/stereoisomers-form-1.maegz')])
...     | ReadAllStructuresFromFile()
...     | beam.Map(lambda st: st.title)
...     | textio.WriteToText('titles.txt'))
>>> with open('titles.txt') as f:
...     titles = sorted(set(line.strip() for line in f))
>>> titles
['stereoisomers-1-form-1', 'stereoisomers-2-form-1', ...]

class schrodinger.seam.io.chemio.WriteStructuresToFile(file_name: Union[str, Path], sort_key: Optional[Callable[[Structure], Any]] = <function WriteStructuresToFile.<lambda>>, title_prefix: Optional[str] = None, overwrite: bool = True)

Bases: PTransform

Write a PCollection of schrodinger.structure.Structure objects to a file, and return a PCollection of the output file name.

The file format is determined by the file extension. See schrodinger.structure.StructureWriter for more details.

Example:

>>> from schrodinger import structure
>>> from pathlib import Path
>>> outfile = Path('out.maegz')
>>> with beam.Pipeline() as p:
...     sts = [structure.create_new_structure(num_atoms=i) for i in range(1, 11)]
...     _ = (p | beam.Create(sts) | WriteStructuresToFile(outfile))
>>> sts = list(structure.StructureReader(outfile))
>>> len(sts)
10

Parameters:
  • file_name – The output file name.

  • sort_key – A function of one argument that extracts a comparison key from each structure. The keys are used to sort the structures before they are written.

  • title_prefix – An optional prefix to add to the title of each structure. If provided, the title of each structure will be set to “{title_prefix}{index}” where index is the index of the structure in the sorted list. If not provided, the title of each structure will be unchanged.

  • overwrite – If True, the file will be overwritten if it already exists.

Raises:

ValueError – if the file already exists and overwrite is False
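
The interplay of sort_key and title_prefix described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: FakeStructure is a hypothetical stand-in for schrodinger.structure.Structure, sort_and_retitle is a hypothetical helper, and the zero-based start_index is an assumption, since this page does not say whether the index starts at 0 or 1.

```python
from dataclasses import dataclass


@dataclass
class FakeStructure:
    """Hypothetical stand-in for schrodinger.structure.Structure."""
    title: str
    atom_total: int


def sort_and_retitle(structures, sort_key, title_prefix=None, start_index=0):
    """Sort by sort_key, then optionally rename to '{title_prefix}{index}'.

    start_index is an assumption: the docs do not state whether the
    index is zero- or one-based.
    """
    ordered = sorted(structures, key=sort_key)
    if title_prefix is not None:
        for index, st in enumerate(ordered, start=start_index):
            st.title = f"{title_prefix}{index}"
    return ordered


sts = [FakeStructure("c", 3), FakeStructure("a", 1), FakeStructure("b", 2)]

# With a prefix, titles follow the sorted order.
renamed = sort_and_retitle(sts, sort_key=lambda st: st.atom_total,
                           title_prefix="lig_")
renamed_titles = [st.title for st in renamed]

# Without a prefix, structures are sorted but titles are left unchanged.
unchanged = sort_and_retitle(
    [FakeStructure("z", 2), FakeStructure("y", 1)],
    sort_key=lambda st: st.atom_total)
unchanged_titles = [st.title for st in unchanged]
```

Sorting before titling makes the generated titles depend only on the sort keys, not on the order in which elements happen to arrive.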

__init__(file_name: Union[str, Path], sort_key: Optional[Callable[[Structure], Any]] = <function WriteStructuresToFile.<lambda>>, title_prefix: Optional[str] = None, overwrite: bool = True)

display_data() → dict

Returns the display data associated with a pipeline component.

It should be reimplemented in pipeline components that wish to have static display data.

Returns:

Dict[str, Any]: A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:

{
  'key1': 'string_value',
  'key2': 1234,
  'key3': 3.14159265,
  'key4': DisplayDataItem('apache.org', url='http://apache.org'),
  'key5': subComponent
}

class schrodinger.seam.io.chemio.ReadMolsFromFile(file_pattern: Union[str, Path], silent=False, **kwargs)

Bases: _LocalOnlyPTransform, ReadMolsFromFile

Read a file containing a newline-separated list of SMILES strings and return a PCollection of RDKit molecules.

Invalid SMILES strings are skipped, and a warning is printed unless silent is True.

Example:

>>> from pathlib import Path
>>> infile = Path('test.smi')
>>> _ = infile.write_text("C\nCC\nCCC")
>>> Path('num_atoms.txt').unlink(missing_ok=True)
>>> with beam.Pipeline() as p:
...     _ = (p
...     | ReadMolsFromFile(infile)
...     | beam.Map(lambda m: m.GetNumHeavyAtoms())
...     | textio.WriteToText('num_atoms.txt'))
>>> with open('num_atoms.txt') as f:
...     num_atoms = sorted(int(line.strip()) for line in f)
>>> num_atoms
[1, 2, 3]

Parameters:
  • file_pattern – A file name or glob pattern.

  • silent – If True, do not print a warning when an invalid SMILES string is skipped.
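
The skip-and-warn behavior for invalid SMILES strings can be sketched as follows. This is a hedged illustration, not the transform's code: parse_smiles_lines and toy_parse are hypothetical names, using warnings.warn to report skipped lines is an assumption, and so is ignoring blank lines. The parser is passed in as a callable so the sketch runs without RDKit; RDKit's Chem.MolFromSmiles returns None for invalid input and could be passed as parse.

```python
import warnings


def parse_smiles_lines(lines, parse, silent=False):
    """Yield parsed molecules, skipping lines that `parse` rejects.

    `parse` is any callable that returns None for invalid input, which is
    how RDKit's Chem.MolFromSmiles behaves. Not the library's own code.
    """
    for line in lines:
        smiles = line.strip()
        if not smiles:
            continue  # assumption: blank lines are ignored
        mol = parse(smiles)
        if mol is None:
            if not silent:
                warnings.warn(f"Skipping invalid SMILES: {smiles!r}")
            continue
        yield mol


def toy_parse(smiles):
    """Toy parser for this sketch: accepts strings made only of 'C'."""
    return smiles if set(smiles) == {"C"} else None


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mols = list(parse_smiles_lines(["C\n", "not-a-smiles\n", "CC\n"],
                                   toy_parse))
```

Here the invalid middle line is dropped and one warning is recorded; passing silent=True drops it without the warning.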

class schrodinger.seam.io.chemio.WriteMolsToFile(file_name: Union[str, Path], overwrite: bool = True, **kwargs)

Bases: PTransform

Write a PCollection of RDKit molecules to a file as a newline-separated list of SMILES strings.

Example:

>>> from rdkit import Chem
>>> from pathlib import Path
>>> outfile = Path('test.smi')
>>> outfile.unlink(missing_ok=True)
>>> with beam.Pipeline() as p:
...     mols = [Chem.MolFromSmiles('C' * i) for i in range(1, 4)]
...     _ = (p | beam.Create(mols) | WriteMolsToFile(outfile))
>>> with open(outfile) as f:
...     smiles = sorted(line.strip() for line in f)
>>> smiles
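['C', 'CC', 'CCC']

Parameters:
  • file_name – The output file name.

  • overwrite – If True, the file will be overwritten if it already exists.

Raises:

ValueError – if the file already exists and overwrite is False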
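
A minimal sketch of how the overwrite flag and the ValueError could interact, assuming the error is raised only when the file exists and overwrite is False. The helper name check_output_path is hypothetical, and where the real transform performs this check is not stated here.

```python
import tempfile
from pathlib import Path


def check_output_path(file_name, overwrite=True):
    """Raise ValueError if the file exists and overwriting is not allowed."""
    path = Path(file_name)
    if path.exists() and not overwrite:
        raise ValueError(f"Output file already exists: {path}")
    return path


# Hypothetical pre-existing output file.
existing = Path(tempfile.mkdtemp()) / "test.smi"
existing.write_text("C\n")

# Default behavior: an existing file is accepted and would be replaced.
check_output_path(existing, overwrite=True)

# With overwrite=False, the existing file triggers the error.
try:
    check_output_path(existing, overwrite=False)
    raised = False
except ValueError:
    raised = True
```

Under this reading, the doctest above calls outfile.unlink(missing_ok=True) only to start from a clean state, since the default overwrite=True would replace the file anyway.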

__init__(file_name: Union[str, Path], overwrite: bool = True, **kwargs) → None

display_data() → dict

Returns the display data associated with a pipeline component.

It should be reimplemented in pipeline components that wish to have static display data.

Returns:

Dict[str, Any]: A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:

{
  'key1': 'string_value',
  'key2': 1234,
  'key3': 3.14159265,
  'key4': DisplayDataItem('apache.org', url='http://apache.org'),
  'key5': subComponent
}