schrodinger.seam.examples.haystack module

A pedagogical example workflow that demonstrates how logging works with seam.

The workflow creates a collection of strings, most of which are “haystalk” and a few of which are “needle”. It then processes each item in the collection, sleeping for a fixed amount of time per “haystalk” or “needle” and aggregating the items into lists.
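The data flow described above can be sketched in plain Python without seam. This is only an illustration of the idea, not the module's implementation: the names `make_haystack` and `process_all`, the collection size, and the needle count are all assumptions made for the example.

```python
import random
import time
from typing import List


def make_haystack(size: int, needle_count: int, seed: int = 0) -> List[str]:
    """Build a shuffled list with `needle_count` needles among haystalks.

    Hypothetical helper; the real workflow may build its collection differently.
    """
    items = ["haystalk"] * (size - needle_count) + ["needle"] * needle_count
    random.Random(seed).shuffle(items)
    return items


def process_all(items: List[str], sleep_per_item: float) -> List[str]:
    """Sleep a fixed time for every item and collect the items in a list."""
    processed = []
    for item in items:
        time.sleep(sleep_per_item)
        processed.append(item)
    return processed


haystack = make_haystack(size=20, needle_count=2)
result = process_all(haystack, sleep_per_item=0.001)
```

In the actual workflow the per-item work is distributed across workers, which is why the sleep time per item dominates the walltime.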

To get a sense of how logging works, run this example and read the README found in the generated “seam/logs” directory.

Basic usage (est. walltime: 1m)

$SCHRODINGER/run seam_example.py haystack

Parallelized usage with jobserver (est. walltime: 5m)

$SCHRODINGER/run seam_example.py haystack --per-stalk-sleep-time 0.001 -HOST localhost:8

schrodinger.seam.examples.haystack.parse_args(args)
class schrodinger.seam.examples.haystack.Bale(sleep_per_item: float, error_on_needle: bool)

Bases: DoFn

A DoFn that aggregates elements into lists and sleeps for a fixed amount of time per element.

__init__(sleep_per_item: float, error_on_needle: bool)
start_bundle()

Called before a bundle of elements is processed on a worker.

Elements to be processed are split into bundles and distributed to workers. Before a worker calls process() on the first element of its bundle, it calls this method.

process(element: str)

Method to use for processing elements.

This is invoked by DoFnRunner for each element of an input PCollection.

The following parameters can be used as default values on process arguments to indicate that a DoFn accepts the corresponding parameters. For example, a DoFn might accept the element and its timestamp with the following signature:

def process(element=DoFn.ElementParam, timestamp=DoFn.TimestampParam):
  ...

The full set of parameters is:

  • DoFn.ElementParam: element to be processed, should not be mutated.

  • DoFn.SideInputParam: a side input that may be used when processing.

  • DoFn.TimestampParam: timestamp of the input element.

  • DoFn.WindowParam: Window the input element belongs to.

  • DoFn.TimerParam: a userstate.RuntimeTimer object defined by the spec of the parameter.

  • DoFn.StateParam: a userstate.RuntimeState object defined by the spec of the parameter.

  • DoFn.KeyParam: key associated with the element.

  • DoFn.RestrictionParam: an iobase.RestrictionTracker will be provided here to allow treatment as a Splittable DoFn. The restriction tracker will be derived from the restriction provider in the parameter.

  • DoFn.WatermarkEstimatorParam: a function that can be used to track output watermark of Splittable DoFn implementations.

Args:

element: The element to be processed.
*args: side inputs.
**kwargs: other keyword arguments.

Returns:

An Iterable of output elements or None.
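The return contract above ("an Iterable of output elements or None") can be illustrated with two stand-alone functions and a small driver loop. This is a sketch under assumptions: `process_yielding`, `process_filtering`, and `collect_outputs` are hypothetical names, and the driver only approximates how a runner might consume the return value. It is not the DoFnRunner.

```python
from typing import Callable, Iterable, Iterator, List, Optional


def process_yielding(element: str) -> Iterator[str]:
    """Generator style: every yielded value becomes an output element."""
    yield element.upper()


def process_filtering(element: str) -> Optional[Iterable[str]]:
    """Return style: a list of outputs, or None to emit nothing."""
    if element == "needle":
        return [element]
    return None


def collect_outputs(
    process: Callable[[str], Optional[Iterable[str]]],
    elements: Iterable[str],
) -> List[str]:
    """Illustrative driver: flatten each non-None result into one list."""
    outputs: List[str] = []
    for element in elements:
        result = process(element)
        if result is not None:
            outputs.extend(result)
    return outputs


items = ["haystalk", "needle", "haystalk"]
upper = collect_outputs(process_yielding, items)
needles = collect_outputs(process_filtering, items)
```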

finish_bundle() → Iterable[List[str]]

Called after a bundle of elements is processed on a worker.
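The bundle lifecycle (start_bundle, then process per element, then finish_bundle) can be sketched with a plain Python class. This does not reproduce the real Bale class or depend on seam: `BaleSketch` and `run_bundle` are hypothetical, and the behavior of `error_on_needle` (raising when a “needle” is seen) is an assumption inferred from the parameter name.

```python
import time
from typing import Iterable, List


class BaleSketch:
    """Hypothetical stand-in for a bundle-aware DoFn; not the seam class."""

    def __init__(self, sleep_per_item: float, error_on_needle: bool):
        self.sleep_per_item = sleep_per_item
        self.error_on_needle = error_on_needle
        self._bale: List[str] = []

    def start_bundle(self) -> None:
        # Reset per-bundle state before the first element arrives.
        self._bale = []

    def process(self, element: str) -> None:
        # Assumed behavior: optionally fail on a needle, else sleep and buffer.
        if self.error_on_needle and element == "needle":
            raise ValueError("found a needle")
        time.sleep(self.sleep_per_item)
        self._bale.append(element)

    def finish_bundle(self) -> Iterable[List[str]]:
        # Emit everything buffered during this bundle as a single list.
        yield self._bale


def run_bundle(fn: BaleSketch, bundle: List[str]) -> List[List[str]]:
    """Illustrative driver that calls the lifecycle hooks in order."""
    fn.start_bundle()
    for element in bundle:
        fn.process(element)
    return list(fn.finish_bundle())


fn = BaleSketch(sleep_per_item=0.001, error_on_needle=False)
first = run_bundle(fn, ["haystalk", "needle", "haystalk"])
second = run_bundle(fn, ["haystalk"])
```

Because `start_bundle` rebinds the buffer to a fresh list, the list emitted for one bundle is not affected when the same instance processes the next bundle.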

schrodinger.seam.examples.haystack.main(args=None)