schrodinger.seam.transforms.core module

schrodinger.seam.transforms.core.yields_elements(finish_bundle_method: Callable)

A decorator that wraps a DoFn’s finish_bundle method to allow it to yield elements.

Example usage:

>>> import apache_beam as beam
>>> from schrodinger.seam.transforms import core
>>>
>>> class Batcher(beam.DoFn):
...
...     def __init__(self):
...         self._batch = []
...
...     def start_bundle(self):
...         self._batch.clear()
...
...     def process(self, element):
...         self._batch.append(element)
...
...     @yields_elements
...     def finish_bundle(self):
...         yield self._batch
...         self._batch.clear()
...
class schrodinger.seam.transforms.core.GroupIntoBatches(batch_size: int)

Bases: PTransform

PTransform that batches the input into desired batch size after grouping by key.

Identical behavior to Beam’s GroupIntoBatches but doesn’t use Beam timer’s (which are unsupported by the SeamRunner).

__init__(batch_size: int)
expand(pcoll)
class schrodinger.seam.transforms.core.TeeIf(condition: bool, *consumers: Union[PTransform[PCollection[T], Any], Callable[[PCollection[T]], Any]])

Bases: PTransform

A PTransform which performs beam.Tee() if the condition is True, and does nothing if the condition is False.

Example usage:

>>> import apache_beam as beam
>>> from schrodinger import structure
>>> from schrodinger.seam.io import chemio
>>> from schrodinger.seam.transforms import core
>>>
>>> sts = [structure.create_new_structure(num_atoms=i) for i in range(1, 11)]
>>> debug_flag = True
>>>
>>> with beam.Pipeline() as p:
...     st_count = (p
...                 | beam.Create(sts)
...                         | core.TeeIf(debug_flag, chemio.WriteStructuresToFile("output_file.maegz"))
...                 | beam.combiners.Count.Globally()
...                )
>>> # st_count will be 10, and output_file.maegz will be created because debug_flag was set to True
__init__(condition: bool, *consumers: Union[PTransform[PCollection[T], Any], Callable[[PCollection[T]], Any]])
expand(pcoll)