schrodinger.seam.examples.haystack module
A pedagogical example workflow that demonstrates how logging works with seam.
The workflow creates a collection of strings, most of which are “haystalk” and a few of which are “needle”. The workflow then processes each item in the collection, sleeping for a fixed amount of time for each “haystalk”/“needle” and aggregating them into lists.
To get a sense of how logging works, run this example and read the README found in the generated “seam/logs” directory.
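The workflow described above can be approximated in plain Python. This is a minimal sketch, not the module's implementation: it does not use seam, and the names `make_haystack` and `process_all`, the collection size, and the needle count are all hypothetical.

```python
import random
import time
from typing import List


def make_haystack(num_items: int, num_needles: int, seed: int = 0) -> List[str]:
    """Return a shuffled list that is mostly "haystalk" with a few "needle".

    Hypothetical helper; the sizes used by the real workflow are not documented here.
    """
    items = ["haystalk"] * (num_items - num_needles) + ["needle"] * num_needles
    random.Random(seed).shuffle(items)
    return items


def process_all(items: List[str], sleep_per_item: float) -> List[str]:
    """Sleep a fixed amount of time for each item, then collect it into a list."""
    collected = []
    for item in items:
        time.sleep(sleep_per_item)
        collected.append(item)
    return collected


haystack = make_haystack(num_items=100, num_needles=3)
result = process_all(haystack, sleep_per_item=0.001)
print(len(result), result.count("needle"))  # prints: 100 3
```

Because every item costs the same fixed sleep, the sequential run time grows linearly with the size of the collection, which is what makes the example useful for comparing serial and parallel runs.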
Basic usage (est. walltime: 1m)
$SCHRODINGER/run seam_example.py haystack
Parallelized usage with jobserver (est. walltime: 5m)
$SCHRODINGER/run seam_example.py haystack --per-stalk-sleep-time 0.001 -HOST localhost:8
- schrodinger.seam.examples.haystack.parse_args(args)
- class schrodinger.seam.examples.haystack.Bale(sleep_per_item: float, error_on_needle: bool)
Bases: DoFn
A DoFn that aggregates elements into lists and sleeps for a fixed amount of time per element.
- __init__(sleep_per_item: float, error_on_needle: bool)
- start_bundle()
Called before a bundle of elements is processed on a worker.
Elements to be processed are split into bundles and distributed to workers. Before a worker calls process() on the first element of its bundle, it calls this method.
- process(element: str)
Method to use for processing elements.
This is invoked by DoFnRunner for each element of an input PCollection.
The following parameters can be used as default values on process arguments to indicate that a DoFn accepts the corresponding parameters. For example, a DoFn might accept the element and its timestamp with the following signature:
def process(element=DoFn.ElementParam, timestamp=DoFn.TimestampParam): ...
The full set of parameters is:
DoFn.ElementParam: element to be processed, should not be mutated.
DoFn.SideInputParam: a side input that may be used when processing.
DoFn.TimestampParam: timestamp of the input element.
DoFn.WindowParam: Window the input element belongs to.
DoFn.TimerParam: a userstate.RuntimeTimer object defined by the spec of the parameter.
DoFn.StateParam: a userstate.RuntimeState object defined by the spec of the parameter.
DoFn.KeyParam: key associated with the element.
DoFn.RestrictionParam: an iobase.RestrictionTracker will be provided here to allow treatment as a SplittableDoFn. The restriction tracker will be derived from the restriction provider in the parameter.
DoFn.WatermarkEstimatorParam: a function that can be used to track output watermark of SplittableDoFn implementations.
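The idea of using default values as markers can be illustrated without any framework. The sketch below is a simplified illustration of the mechanism, not how DoFnRunner is actually implemented: the marker classes and `invoke_process` are hypothetical stand-ins, and only two of the parameters listed above are modeled.

```python
import inspect


class ElementParam:
    """Hypothetical marker: the caller should pass the element itself."""


class TimestampParam:
    """Hypothetical marker: the caller should pass the element's timestamp."""


def invoke_process(process, element, timestamp):
    """Call `process`, filling every argument whose default is a known marker."""
    available = {ElementParam: element, TimestampParam: timestamp}
    kwargs = {}
    for name, param in inspect.signature(process).parameters.items():
        for marker, value in available.items():
            if param.default is marker:
                kwargs[name] = value
    return process(**kwargs)


def process(element=ElementParam, timestamp=TimestampParam):
    # Asks for both the element and its timestamp.
    return f"{element}@{timestamp}"


def process_element_only(item=ElementParam):
    # Asks for the element only; the timestamp is never passed.
    return item.upper()


print(invoke_process(process, "needle", 42))               # prints: needle@42
print(invoke_process(process_element_only, "needle", 42))  # prints: NEEDLE
```

The argument name is irrelevant in this scheme; only the default value decides what gets injected, which is why a function can rename `element` to `item` and still receive the element.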
- finish_bundle() → Iterable[List[str]]
Called after a bundle of elements is processed on a worker.
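The order of the start_bundle(), process() and finish_bundle() calls can be sketched in plain Python. This is a sketch under stated assumptions, not the Bale source: `BaleSketch` does not subclass the real DoFn, `run_bundles` is a hypothetical single-worker driver, and raising an error when `error_on_needle` is set is a guess at what that flag does.

```python
import time
from typing import Iterable, List


class BaleSketch:
    """Hypothetical stand-in for Bale; it does not subclass the real DoFn."""

    def __init__(self, sleep_per_item: float, error_on_needle: bool):
        self.sleep_per_item = sleep_per_item
        self.error_on_needle = error_on_needle
        self._items: List[str] = []

    def start_bundle(self) -> None:
        # Start each bundle with a fresh, empty buffer.
        self._items = []

    def process(self, element: str) -> None:
        # Assumption: the flag turns a needle into an error.
        if self.error_on_needle and element == "needle":
            raise ValueError("found a needle")
        time.sleep(self.sleep_per_item)
        self._items.append(element)

    def finish_bundle(self) -> Iterable[List[str]]:
        # Emit everything collected in this bundle as a single list.
        yield self._items


def run_bundles(fn: BaleSketch, elements: List[str], bundle_size: int) -> List[List[str]]:
    """Hypothetical driver: split elements into bundles and run the lifecycle."""
    outputs = []
    for start in range(0, len(elements), bundle_size):
        fn.start_bundle()
        for element in elements[start:start + bundle_size]:
            fn.process(element)
        outputs.extend(fn.finish_bundle())
    return outputs


elements = ["haystalk", "haystalk", "needle", "haystalk", "haystalk"]
bales = run_bundles(BaleSketch(0.0, False), elements, bundle_size=2)
print(bales)  # prints: [['haystalk', 'haystalk'], ['needle', 'haystalk'], ['haystalk']]
```

In this sketch each bundle yields exactly one list, so the number of output lists equals the number of bundles rather than the number of elements.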
- schrodinger.seam.examples.haystack.main(args=None)