schrodinger.tasks.beamtasks module¶
- class schrodinger.tasks.beamtasks.BeamTask(*args, _param_type=<object object>, **kwargs)¶
Bases:
schrodinger.tasks.jobtasks.ComboJobTask
A base class for tasks that use Apache Beam to process input items in a pipeline. To create a BeamTask:
Specify the ItemInputClass that defines the type of input items. Define a Settings CompoundParam (if the pipeline requires any settings). Override the definePipeline method to define the Beam pipeline.
For an example, see scripts/examples/tasks/hello_beamtask.py.
- InputItemClass = NotImplemented¶
- class Settings(*args, _param_type=<object object>, **kwargs)¶
- __init__(*args, **kwargs)¶
- setRunner(runner)¶
Set the runner to use for the Beam pipeline. By default, a SeamRunner will be used.
- checkForOneInputSource()¶
- mainFunction()¶
- static definePipeline(InputTransform, OutputTransform, settings, options)¶
Override this static method to define the Beam pipeline. This method should contain the following block:
with beam.Pipeline(runner=SeamRunner(), options=options) as p: (p | InputTransform() # Additional transforms to define the workflow. Use settings # as needed to configure transforms | OutputTransform() )
- Parameters
InputTransform – A PTransform to create the input PCollection.
OutputTransform – A PTransform to save the output PCollection.
settings – The workflow settings, defined by the Settings class.
options – The pipeline options to use for the Beam pipeline. Pass these options to the beam.Pipeline constructor with options=options.
- processOutputs(output_list: list)¶
Override this method to do output processing in the launch process. The output_list will contain the items from the PCollection returned by the definePipeline method.
This is a convenience API to allow for basic output processing in the launch process. However, using this API causes all output to be collected in memory.
This feature also relies on a global output_list variable. Running multiple BeamTasks in the same process may cause unexpected behavior.
For improved robustness and memory usage, move output processing/writing into the beam pipeline itself.
- calling_contextChanged¶
A
pyqtSignal
emitted by instances of the class.
- calling_contextReplaced¶
A
pyqtSignal
emitted by instances of the class.
- failure_infoChanged¶
A
pyqtSignal
emitted by instances of the class.
- failure_infoReplaced¶
A
pyqtSignal
emitted by instances of the class.
- inputChanged¶
A
pyqtSignal
emitted by instances of the class.
- inputReplaced¶
A
pyqtSignal
emitted by instances of the class.
- job_configChanged¶
A
pyqtSignal
emitted by instances of the class.
- job_configReplaced¶
A
pyqtSignal
emitted by instances of the class.
- max_progressChanged¶
A
pyqtSignal
emitted by instances of the class.
- max_progressReplaced¶
A
pyqtSignal
emitted by instances of the class.
- nameChanged¶
A
pyqtSignal
emitted by instances of the class.
- nameReplaced¶
A
pyqtSignal
emitted by instances of the class.
- outputChanged¶
A
pyqtSignal
emitted by instances of the class.
- outputReplaced¶
A
pyqtSignal
emitted by instances of the class.
- progressChanged¶
A
pyqtSignal
emitted by instances of the class.
- progressReplaced¶
A
pyqtSignal
emitted by instances of the class.
- progress_stringChanged¶
A
pyqtSignal
emitted by instances of the class.
- progress_stringReplaced¶
A
pyqtSignal
emitted by instances of the class.
- statusChanged¶
A
pyqtSignal
emitted by instances of the class.
- statusReplaced¶
A
pyqtSignal
emitted by instances of the class.