schrodinger.tasks.beamtasks module

class schrodinger.tasks.beamtasks.BeamTask(*args, _param_type=<object object>, **kwargs)

Bases: schrodinger.tasks.jobtasks.ComboJobTask

A base class for tasks that use Apache Beam to process input items in a pipeline. To create a BeamTask:

  • Specify the InputItemClass that defines the type of input items.

  • Define a Settings CompoundParam (if the pipeline requires any settings).

  • Override the definePipeline method to define the Beam pipeline.

For an example, see scripts/examples/tasks/hello_beamtask.py.

InputItemClass = NotImplemented
class Settings(*args, _param_type=<object object>, **kwargs)

Bases: schrodinger.models.parameters.CompoundParam

__init__(*args, **kwargs)
setRunner(runner)

Set the runner to use for the Beam pipeline. By default, a SeamRunner will be used.

checkForOneInputSource()
mainFunction()
static definePipeline(InputTransform, OutputTransform, settings, options)

Override this static method to define the Beam pipeline. This method should contain the following block:

with beam.Pipeline(runner=SeamRunner(), options=options) as p:
    (p
     | InputTransform()
     # Additional transforms to define the workflow. Use settings
     # as needed to configure transforms
     | OutputTransform()
     )

Parameters
  • InputTransform – A PTransform to create the input PCollection.

  • OutputTransform – A PTransform to save the output PCollection.

  • settings – The workflow settings, defined by the Settings class.

  • options – The pipeline options to use for the Beam pipeline. Pass these options to the beam.Pipeline constructor with options=options.

processOutputs(output_list: list)

Override this method to do output processing in the launch process. The output_list will contain the items from the PCollection returned by the definePipeline method.

This is a convenience API to allow for basic output processing in the launch process. However, using this API causes all output to be collected in memory.

This feature also relies on a global output_list variable. Running multiple BeamTasks in the same process may cause unexpected behavior.

For improved robustness and memory usage, move output processing/writing into the Beam pipeline itself.
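
The two caveats above can be illustrated with a minimal, library-free sketch. It does not show how BeamTask is implemented, which this page does not document. Every name in it (_shared_outputs, run_with_shared_global, run_with_local_sink) is hypothetical and exists only for the illustration. The first function accumulates results in a module-level list, so a second run in the same process also sees the first run's results. The second function hands each result to a caller-supplied sink as it is produced, which is the general shape of moving output writing into the pipeline.

```python
import io

# Hypothetical stand-in for a process-wide output list. This is NOT the
# actual variable used by BeamTask; it only models the sharing hazard.
_shared_outputs = []


def run_with_shared_global(items):
    """Collect every result in the module-level list and return that list."""
    for item in items:
        _shared_outputs.append(item.upper())
    return _shared_outputs


def run_with_local_sink(items, sink):
    """Pass each result to `sink` as soon as it is produced.

    Nothing is accumulated here, so memory use does not grow with the
    number of items and separate runs cannot see each other's results.
    """
    count = 0
    for item in items:
        sink(item.upper())
        count += 1
    return count


# Two runs in the same process: the second run also sees the first run's
# results because both append to the same module-level list.
first = list(run_with_shared_global(["a", "b"]))
second = list(run_with_shared_global(["c"]))

# The sink-based variant writes each result out immediately instead.
buffer = io.StringIO()
written = run_with_local_sink(["c"], lambda s: buffer.write(s + "\n"))
```

In this sketch, second holds the results of both runs, while buffer holds only the single item written by the sink-based run.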

calling_contextChanged

A pyqtSignal emitted by instances of the class.

calling_contextReplaced

A pyqtSignal emitted by instances of the class.

failure_infoChanged

A pyqtSignal emitted by instances of the class.

failure_infoReplaced

A pyqtSignal emitted by instances of the class.

inputChanged

A pyqtSignal emitted by instances of the class.

inputReplaced

A pyqtSignal emitted by instances of the class.

job_configChanged

A pyqtSignal emitted by instances of the class.

job_configReplaced

A pyqtSignal emitted by instances of the class.

max_progressChanged

A pyqtSignal emitted by instances of the class.

max_progressReplaced

A pyqtSignal emitted by instances of the class.

nameChanged

A pyqtSignal emitted by instances of the class.

nameReplaced

A pyqtSignal emitted by instances of the class.

outputChanged

A pyqtSignal emitted by instances of the class.

outputReplaced

A pyqtSignal emitted by instances of the class.

progressChanged

A pyqtSignal emitted by instances of the class.

progressReplaced

A pyqtSignal emitted by instances of the class.

progress_stringChanged

A pyqtSignal emitted by instances of the class.

progress_stringReplaced

A pyqtSignal emitted by instances of the class.

statusChanged

A pyqtSignal emitted by instances of the class.

statusReplaced

A pyqtSignal emitted by instances of the class.