schrodinger.seam.io.chemio module
Transforms for reading and writing structures and molecules.
- class schrodinger.seam.io.chemio.ReadStructuresFromFile(file_pattern: Union[str, Path], **kwargs)
Bases:
_LocalOnlyPTransform, ReadStructuresFromFile
Read a file (or files) containing a structure or a list of structures and return a PCollection of schrodinger.structure.Structure objects.
Example:
>>> from schrodinger.test import mmshare_data_file
>>> with beam.Pipeline() as p:
...     _ = (p
...          | ReadStructuresFromFile(mmshare_data_file('cookbook/stereoisomers-form-1.maegz'))
...          | beam.Map(lambda st: st.title)
...          | textio.WriteToText('titles.txt'))
>>> with open('titles.txt') as f:
...     titles = sorted(set(line.strip() for line in f))
>>> titles
['stereoisomers-1-form-1', 'stereoisomers-2-form-1', ...]
- Parameters:
file_pattern – A file name or glob pattern.
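As a rough illustration of what "a file name or glob pattern" means, the sketch below expands a pattern with Python's standard glob module. expand_file_pattern is a hypothetical helper, not part of schrodinger.seam; the transform itself may resolve patterns through a different mechanism, so treat this as a description of pattern semantics only.

```python
import glob
from pathlib import Path
from typing import List, Union


def expand_file_pattern(file_pattern: Union[str, Path]) -> List[str]:
    """Return the sorted list of files matching a file name or glob pattern.

    Hypothetical helper: illustrates pattern semantics only, not the
    transform's actual matching code.
    """
    pattern = str(file_pattern)
    # glob.glob treats a plain file name (no wildcards) as a pattern that
    # matches itself if the file exists, so both cases share one call.
    # An unmatched pattern yields an empty list here (an assumption; the
    # real transform's behaviour for no matches is not documented above).
    return sorted(glob.glob(pattern))
```

For example, expand_file_pattern('ligands/*.maegz') would list every .maegz file in a ligands directory, while expand_file_pattern('ligands/one.maegz') returns just that file if it exists.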
- class schrodinger.seam.io.chemio.ReadAllStructuresFromFile(label: Optional[str] = None)
Bases:
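_LocalOnlyPTransform, ReadAllStructuresFromFile
A PTransform that reads each structure file named in an input PCollection of file names and returns a PCollection of schrodinger.structure.Structure objects.
Example: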
>>> from schrodinger.test import mmshare_data_file
>>> with beam.Pipeline() as p:
...     _ = (p
...          | beam.Create([mmshare_data_file('cookbook/stereoisomers-form-1.maegz')])
...          | ReadAllStructuresFromFile()
...          | beam.Map(lambda st: st.title)
...          | textio.WriteToText('titles.txt'))
>>> with open('titles.txt') as f:
...     titles = sorted(set(line.strip() for line in f))
>>> titles
['stereoisomers-1-form-1', 'stereoisomers-2-form-1', ...]
- class schrodinger.seam.io.chemio.WriteStructuresToFile(file_name: str | pathlib.Path, sort_key: Optional[Callable[[Structure], Any]] = None, title_prefix: Optional[str] = None, overwrite: bool = True)
Bases:
PTransform
Write a PCollection of schrodinger.structure.Structure objects to a file and return a PCollection containing the output file name.
The file format is determined by the file extension. See schrodinger.structure.StructureWriter for more details.
Example:
>>> from schrodinger import structure
>>> from pathlib import Path
>>> outfile = Path('out.maegz')
>>> with beam.Pipeline() as p:
...     sts = [structure.create_new_structure(num_atoms=i) for i in range(1, 11)]
...     _ = (p | beam.Create(sts) | WriteStructuresToFile(outfile))
>>> sts = list(structure.StructureReader(outfile))
>>> len(sts)
10
- Parameters:
file_name – the output file name
sort_key – A function of one argument that extracts a comparison key from each structure. If provided, the structures are sorted by this key before they are written.
title_prefix – An optional prefix to add to the title of each structure. If provided, the title of each structure will be set to “{title_prefix}{index}” where index is the index of the structure in the sorted list. If not provided, the title of each structure will be unchanged.
overwrite – If True, the file will be overwritten if it already exists.
- Raises:
ValueError – If the file already exists and overwrite is False.
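The interaction between sort_key and title_prefix can be sketched in plain Python. FakeStructure and order_and_title are hypothetical stand-ins (a real schrodinger.structure.Structure is not needed to show the logic), and the sketch assumes index is 0-based, i.e. the structure's position in the sorted list; the documentation above does not say whether numbering starts at 0 or 1.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class FakeStructure:
    """Hypothetical stand-in for schrodinger.structure.Structure."""
    title: str
    num_atoms: int


def order_and_title(
    structures: List[FakeStructure],
    sort_key: Optional[Callable[[FakeStructure], Any]] = None,
    title_prefix: Optional[str] = None,
) -> List[FakeStructure]:
    """Sketch of the documented sort_key / title_prefix behaviour."""
    # Sort only when a key function is supplied; otherwise keep input order.
    if sort_key is not None:
        ordered = sorted(structures, key=sort_key)
    else:
        ordered = list(structures)
    if title_prefix is not None:
        # Assumption: index is 0-based (position in the sorted list).
        for index, st in enumerate(ordered):
            st.title = f"{title_prefix}{index}"
    return ordered
```

The sketch keeps input order when no sort_key is given, but elements of a Beam PCollection have no guaranteed order, so in a real pipeline a sort_key is the way to get a deterministic file order and deterministic titles.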
- __init__(file_name: str | pathlib.Path, sort_key: Optional[Callable[[Structure], Any]] = None, title_prefix: Optional[str] = None, overwrite: bool = True)
- display_data() → dict
Returns the display data associated with a pipeline component.
It should be reimplemented in pipeline components that wish to have static display data.
- Returns:
Dict[str, Any]: A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:
{
    'key1': 'string_value',
    'key2': 1234,
    'key3': 3.14159265,
    'key4': DisplayDataItem('apache.org', url='http://apache.org'),
    'key5': subComponent
}
- class schrodinger.seam.io.chemio.WriteStructuresToFiles(path: str | pathlib.Path, destination: Callable[[Structure], str], file_extension: str = '.maegz', overwrite: bool = True)
Bases:
_LocalOnlyPTransform, WriteStructuresToFiles
Write a PCollection of structures to multiple files based on a destination function.
Each structure is routed to a file determined by the destination callable. Structures with the same destination are written to the same file.
Example:
>>> from schrodinger import structure
>>> from pathlib import Path
>>> import shutil
>>> outdir = Path('output_by_title')
>>> shutil.rmtree(outdir, ignore_errors=True)
>>> with beam.Pipeline() as p:
...     sts = [structure.create_new_structure(num_atoms=i) for i in range(1, 4)]
...     for st in sts:
...         st.title = f'mol_{st.atom_total}'
...     _ = (p | beam.Create(sts)
...          | WriteStructuresToFiles(
...              path=outdir,
...              destination=lambda st: st.title))
>>> sorted(f.name for f in outdir.glob('*.maegz'))
['mol_1.maegz', 'mol_2.maegz', 'mol_3.maegz']
- Parameters:
path – Directory where output files are written.
destination – A callable that takes a Structure and returns a base filename (without extension).
file_extension – Extension for output files (default: “.maegz”).
overwrite – If True, overwrite existing files (default: True).
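The routing rule (destination returns a base name, file_extension is appended, and the file is placed under path) can be sketched without a pipeline. group_by_destination is a hypothetical helper that only computes the grouping and does no file I/O; strings stand in for Structure objects, and how the real transform groups elements internally is not described here.

```python
from collections import defaultdict
from pathlib import Path
from typing import Callable, Dict, Iterable, List, TypeVar, Union

T = TypeVar("T")


def group_by_destination(
    items: Iterable[T],
    path: Union[str, Path],
    destination: Callable[[T], str],
    file_extension: str = ".maegz",
) -> Dict[Path, List[T]]:
    """Group items by the output file they would be written to."""
    groups: Dict[Path, List[T]] = defaultdict(list)
    for item in items:
        # destination() returns a base name without extension, so the
        # extension is appended and the file is placed under `path`.
        out_file = Path(path) / f"{destination(item)}{file_extension}"
        groups[out_file].append(item)
    return dict(groups)
```

Items whose destination values are equal end up in the same list, which mirrors the documented rule that structures with the same destination are written to the same file.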
- class schrodinger.seam.io.chemio.ReadMolsFromFile(file_pattern: Union[str, Path], silent=False, **kwargs)
Bases:
_LocalOnlyPTransform, ReadMolsFromFile
Read a file containing newline-separated SMILES strings and return a PCollection of RDKit molecules.
Invalid SMILES strings are skipped, and a warning is printed unless silent is True.
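The skip-and-warn behaviour for invalid SMILES can be sketched as a generator over the file's lines. parse_smiles_lines is a hypothetical helper; the parser is passed in so the sketch runs without RDKit. Two details are assumptions, not documented behaviour: blank lines are ignored, and the warning is issued through Python's warnings module rather than printed directly.

```python
import warnings
from typing import Callable, Iterable, Iterator, Optional, TypeVar

M = TypeVar("M")


def parse_smiles_lines(
    lines: Iterable[str],
    parse: Callable[[str], Optional[M]],
    silent: bool = False,
) -> Iterator[M]:
    """Yield parsed molecules, skipping lines that fail to parse."""
    for line in lines:
        smiles = line.strip()
        if not smiles:
            continue  # assumption: blank lines are ignored
        mol = parse(smiles)
        if mol is None:
            # Invalid SMILES: skip it, and warn unless silent is True.
            if not silent:
                warnings.warn(f"Skipping invalid SMILES: {smiles!r}")
            continue
        yield mol
```

With RDKit installed, Chem.MolFromSmiles fits the parse contract, because it returns None for SMILES strings it cannot parse.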
Example:
>>> from pathlib import Path
>>> infile = Path('test.smi')
>>> _ = infile.write_text("C\nCC\nCCC")
>>> Path('num_atoms.txt').unlink(missing_ok=True)
>>> with beam.Pipeline() as p:
...     _ = (p
...          | ReadMolsFromFile(infile)
...          | beam.Map(lambda m: m.GetNumHeavyAtoms())
...          | textio.WriteToText('num_atoms.txt'))
>>> with open('num_atoms.txt') as f:
...     num_atoms = sorted(int(line.strip()) for line in f)
>>> num_atoms
[1, 2, 3]
- Parameters:
file_pattern – A file name or glob pattern.
silent – If True, do not print a warning for invalid SMILES strings.
- class schrodinger.seam.io.chemio.WriteMolsToFile(file_name: str | pathlib.Path, overwrite: bool = True, **kwargs)
Bases:
PTransform
Write a PCollection of RDKit molecules to a file as newline-separated SMILES strings.
Example:
>>> from rdkit import Chem
>>> from pathlib import Path
>>> outfile = Path('test.smi')
>>> outfile.unlink(missing_ok=True)
>>> with beam.Pipeline() as p:
...     mols = [Chem.MolFromSmiles('C' * i) for i in range(1, 4)]
...     _ = (p | beam.Create(mols) | WriteMolsToFile(outfile))
>>> with open(outfile) as f:
...     smiles = sorted(line.strip() for line in f)
>>> smiles
['C', 'CC', 'CCC']
- Parameters:
file_name – the output file name
overwrite – If True, the file will be overwritten if it already exists.
- Raises:
ValueError – If the file already exists and overwrite is False.
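The output format and the overwrite check can be sketched as a small file-writing function. write_smiles_file is hypothetical: it takes SMILES strings rather than RDKit molecules so it runs without RDKit, and it checks for an existing file at write time, whereas the real transform may perform the check when it is constructed; the documentation above does not say which.

```python
from pathlib import Path
from typing import Iterable, Union


def write_smiles_file(
    smiles: Iterable[str],
    file_name: Union[str, Path],
    overwrite: bool = True,
) -> Path:
    """Write one SMILES string per line; refuse to replace unless allowed."""
    path = Path(file_name)
    if path.exists() and not overwrite:
        # Mirrors the documented ValueError for an existing output file.
        raise ValueError(f"{path} already exists")
    # newline="\n" writes "\n" line endings on every platform.
    with open(path, "w", newline="\n") as fh:
        for smi in smiles:
            fh.write(f"{smi}\n")
    return path
```

To feed it RDKit molecules, the strings could be produced with Chem.MolToSmiles(mol) for each molecule.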
- __init__(file_name: str | pathlib.Path, overwrite: bool = True, **kwargs) → None
- display_data() → dict
Returns the display data associated with a pipeline component.
It should be reimplemented in pipeline components that wish to have static display data.
- Returns:
Dict[str, Any]: A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:
{
    'key1': 'string_value',
    'key2': 1234,
    'key3': 3.14159265,
    'key4': DisplayDataItem('apache.org', url='http://apache.org'),
    'key5': subComponent
}