schrodinger.seam.examples.haystack module¶
A pedagogical example workflow to demonstrate how logging and works with seam.
The workflow creates a collection of strings, most of which are “haystalk” and a few of which are “needle”. The workflow then processes each item in the collection, sleeping for a fixed amount of time for each “haystalk”/”needle” and aggregating them into lists.
To get a sense of how logging works, run this example and read the README found in the generated “seam/logs” directory.
Basic usage (est. walltime: 1m)
$SCHRODINGER/run seam_example.py haystack
Parallelized usage with jobserver (est. walltime: 5m)
$SCHRODINGER/run seam_example.py haystack –per-stalk-sleep-time 0.001 -HOST localhost:8
- schrodinger.seam.examples.haystack.parse_args(args)¶
- class schrodinger.seam.examples.haystack.Bale(sleep_per_item: float, error_on_needle: bool)¶
Bases:
DoFnA DoFn that aggregates elements into lists and sleeps for a fixed amount of time per element.
- __init__(sleep_per_item: float, error_on_needle: bool)¶
- start_bundle()¶
Called before a bundle of elements is processed on a worker.
Elements to be processed are split into bundles and distributed to workers. Before a worker calls process() on the first element of its bundle, it calls this method.
- process(element: str)¶
Method to use for processing elements.
This is invoked by
DoFnRunnerfor each element of a inputPCollection.The following parameters can be used as default values on
processarguments to indicate that a DoFn accepts the corresponding parameters. For example, a DoFn might accept the element and its timestamp with the following signature:def process(element=DoFn.ElementParam, timestamp=DoFn.TimestampParam): ...
The full set of parameters is:
DoFn.ElementParam: element to be processed, should not be mutated.DoFn.SideInputParam: a side input that may be used when processing.DoFn.TimestampParam: timestamp of the input element.DoFn.WindowParam:Windowthe input element belongs to.DoFn.TimerParam: auserstate.RuntimeTimerobject defined by the spec of the parameter.DoFn.StateParam: auserstate.RuntimeStateobject defined by the spec of the parameter.DoFn.KeyParam: key associated with the element.DoFn.RestrictionParam: aniobase.RestrictionTrackerwill be provided here to allow treatment as a SplittableDoFn. The restriction tracker will be derived from the restriction provider in the parameter.DoFn.WatermarkEstimatorParam: a function that can be used to track output watermark of SplittableDoFnimplementations.DoFn.BundleContextParam: allows a shared context manager to be used per bundleDoFn.SetupContextParam: allows a shared context manager to be used per DoFn
- finish_bundle() Iterable[List[str]]¶
Called after a bundle of elements is processed on a worker.
- schrodinger.seam.examples.haystack.main(args=None)¶