schrodinger.application.glide_ws.nodes module¶
Node Classes for WScore. A “node” is a stage in the WScore job pipeline. Nodes have the property that each has a single JobDJ loop.
Copyright Schrodinger, LLC. All rights reserved.
- schrodinger.application.glide_ws.nodes.get_csvwriter(model, mode, csv_filename)¶
- class schrodinger.application.glide_ws.nodes.Subjob(cmd, cmd_dir, jobname, write_args)¶
Bases:
tuple
- cmd¶
Alias for field number 0
- cmd_dir¶
Alias for field number 1
- jobname¶
Alias for field number 2
- write_args¶
Alias for field number 3
- class schrodinger.application.glide_ws.nodes.ArrayJob(cmd, cmd_dir, jobname, subjobname_template, write_args)¶
Bases:
tuple
- cmd¶
Alias for field number 0
- cmd_dir¶
Alias for field number 1
- jobname¶
Alias for field number 2
- subjobname_template¶
Alias for field number 3
- write_args¶
Alias for field number 4
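The "Alias for field number N" entries are the docstrings that the standard library's namedtuple generates, so both classes can be used as plain tuples with named fields. A minimal sketch, assuming equivalent definitions (the module itself may define them differently) and using made-up field values:

```python
from collections import namedtuple

# Field names and order are copied from the signatures documented above.
Subjob = namedtuple("Subjob", ["cmd", "cmd_dir", "jobname", "write_args"])
ArrayJob = namedtuple(
    "ArrayJob",
    ["cmd", "cmd_dir", "jobname", "subjobname_template", "write_args"],
)

# Hypothetical values, for illustration only.
job = Subjob(
    cmd=["glide", "native_1abc.in"],
    cmd_dir="job_1abc",
    jobname="native_1abc",
    write_args=("1abc",),
)

# Fields can be read by name, by position, or by unpacking.
cmd_by_name = job.cmd
cmd_by_index = job[0]
cmd, cmd_dir, jobname, write_args = job
```

Access by name keeps calling code readable, while unpacking still works wherever a plain tuple is expected.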
- class schrodinger.application.glide_ws.nodes.Node(wscore_job, name=None, jobdj=None, job_class=None)¶
Bases:
object
Base class for WScore driver nodes.
A “node” is a stage in the WScore job pipeline. Nodes have the property that each has a single JobDJ loop.
This class provides a framework where many methods do nothing by default, and subclasses are expected to override them as needed. All subclasses must override the ‘command’ method, and almost certainly will want to override ‘input_filename’ and ‘input_string’ as well.
Since the WScore workflow is centered around the concept of complexes, the default Node workflow has implicit loops over the complexes array, and passes the Complex object when calling many methods, such as the three mentioned above. These methods can also take extra arguments at the end, denoted by *a, because some subclasses also need to loop over, say, parameter sets or subjob numbers in addition to looping over complexes.
From the point of view of the user of object instances derived from this class, the main methods of interest are ‘configure’, ‘run’, ‘inputs’, and ‘outputs’. Properties of interest include ‘name’ and ‘status’.
It is important to know which parts of the lifecycle of a node run during startup and which run only during the backend run, because these two stages often happen on different hosts.
Startup: init, configure, inputs, outputs.
Run: init, configure, run (which in turn calls begin_node, prepare_inputs, add_jobs, run_subjobs, and end_node).
Note: the methods called by run (begin_node, prepare_inputs, etc.) may optionally return the next status value. For example, begin_node can send the message that run() should skip directly to end_node by returning SUBJOBS_DONE.
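The run-time lifecycle and the "optionally return the next status" convention can be sketched with a toy class. This is an illustration only, not the real Node implementation: the pairing of each method with a particular status transition is an assumption, and ToyNode and SkippingNode are hypothetical names.

```python
class ToyNode:
    """Stand-in used for illustration; this is not the real Node class."""

    def __init__(self, status="NEW"):
        self.status = status
        self.calls = []

    def begin_node(self):
        self.calls.append("begin_node")

    def prepare_inputs(self):
        self.calls.append("prepare_inputs")

    def add_jobs(self):
        self.calls.append("add_jobs")

    def run_subjobs(self):
        self.calls.append("run_subjobs")

    def end_node(self):
        self.calls.append("end_node")

    def run(self):
        # (status required to run the step, step, status reached by default).
        # The mapping of steps to statuses is assumed, not documented.
        steps = [
            ("NEW", self.begin_node, "BEGUN"),
            ("BEGUN", self.prepare_inputs, "INPUTS_WRITTEN"),
            ("INPUTS_WRITTEN", self.add_jobs, "RUNNING"),
            ("RUNNING", self.run_subjobs, "SUBJOBS_DONE"),
            ("SUBJOBS_DONE", self.end_node, "DONE"),
        ]
        for start, method, default_next in steps:
            if self.status == start:
                # A step may return the next status; None means "default".
                self.status = method() or default_next


class SkippingNode(ToyNode):
    def begin_node(self):
        self.calls.append("begin_node")
        return "SUBJOBS_DONE"  # ask run() to jump straight to end_node


full = ToyNode()
full.run()

skipping = SkippingNode()
skipping.run()

# A node restored with a later status skips the steps already completed.
restarted = ToyNode(status="INPUTS_WRITTEN")
restarted.run()
```

Keying each step on the current status gives both behaviours with one loop: a returned status skips ahead, and a restored status resumes partway through.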
- default_name = 'NODE'¶
- __init__(wscore_job, name=None, jobdj=None, job_class=None)¶
Constructor. The only required argument, ‘wscore_job’, is the “parent”, a WscoreJob object. It is used because it contains the information needed to set up the jobs, including complexes, offsets, models, and hosts, as well as being responsible for saving state.
The other, optional arguments are useful for testing: ‘name’ overrides the default node name for the class, ‘jobdj’ lets callers provide their own, pre-existing JobDJ-like object for running the subjobs, and ‘job_class’ is used for creating the subjobs that are added to the JobDJ object.
- set_subnode_start(subnode)¶
Set the starting subnode for this node. Possible values are SETUP, RUN, and POSTPROCESS.
- set_subnode_stop(subnode)¶
Set the stopping subnode for this node. Possible values are SETUP, RUN, and POSTPROCESS.
- init()¶
Called from the constructor to perform optional initialization. Does nothing by default.
- inputs()¶
Return a set of input files needed by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- outputs()¶
Return a set of output files produced by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- begin_node()¶
Called when the node begins running. Does nothing by default.
- configure(params)¶
Configure the node using the ‘params’ dictionary. Does nothing by default.
- print_header()¶
Print a header when the node begins running.
Print a footer when the node has completed running.
- print_continue_header()¶
Print a continuation header when the node continues running because the job was launched with -RESTART.
- print_skip_message()¶
Print a message when the node is skipped because we are trying to restart from a node that is already done.
- print_stop_at()¶
Print a message when the node stops early because the ‘stop_at’ property was set (this happens when the user specifies a subnode for ‘end_node’ in the input file; for example, NATIVE.SETUP).
- run()¶
Run the node, normally including preprocessing, input setup, jobdj loop, and postprocessing. Steps that were already run because we are using -RESTART are skipped as appropriate. Also, the range of steps to run can be narrowed by calling set_subnode_start and set_subnode_stop before calling run.
- update_total_cpu_times()¶
Update the total subjob CPU time for the current WScore job. Does nothing by default.
- property status¶
Status of the node: NEW, BEGUN, INPUTS_WRITTEN, RUNNING, SUBJOBS_DONE, or DONE.
- save_state()¶
Tell our parent WscoreJob to save its state. This method is called whenever something meaningful such as a status change occurs or whenever a subjob is complete.
- prepare_inputs()¶
Create all the input files for this node.
- logfiles()¶
Generator yielding the logfiles for the subjobs for the current node.
- subjob_outputs(job)¶
Return the output file(s) for a given subjob.
- Parameters
job (Subjob) – subjob
- Return type
iterable of str
- register_logs()¶
Register subjob logfiles as output files with job control, so if the driver dies, the user will get something back to try to figure out what happened. (For a successful job, the logfiles are archived and deleted at the end of each node).
- register_outputs()¶
- archive_logs()¶
Archive and delete the subjob logfiles at the end of the node.
- add_jobs()¶
Add the jobs to the JobDJ object.
- need_job(complex, *a)¶
Return True if we need to run one or more jobs for this particular complex. The base implementation always returns True.
- complex_input_prereqs(complex, *a)¶
Check whether all the requirements to run the job(s) for this complex are met, or raise a fatal error if not. Called by complex_job_generator; does nothing by default.
- input_string(complex, *a)¶
Return the body of the input file for a single job for a given complex. Returns None by default, which is a valid option for jobs that don’t require input files.
Nodes that run more than one job per complex can use additional arguments designated here as *a.
- Return type
str or bytes or NoneType
- write_input_file(cmd_dir, complex, *a)¶
Write the input file for a single job for a given complex. Calls ‘input_filename’ for the filename, ‘job_directory’ for the directory, and ‘input_string’ for the body of the input file. Does nothing if either the filename (‘fname’) or the return value of ‘input_string’ is None.
Nodes that run more than one job per complex can use additional arguments designated here as *a.
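The interplay between ‘input_filename’, ‘input_string’, and ‘write_input_file’ can be sketched as follows. This is a minimal stand-in, not the real implementation: ToyNode and ToyGlideNode are hypothetical, the complex is represented by a plain string, the directory is taken from the cmd_dir argument, and returning the written path is added only so the example can be checked.

```python
import os
import tempfile


class ToyNode:
    """Stand-in used for illustration; this is not the real Node class."""

    def input_filename(self, complex, *a):
        return None  # default: the job needs no input file

    def input_string(self, complex, *a):
        return None  # default: no input file body

    def write_input_file(self, cmd_dir, complex, *a):
        fname = self.input_filename(complex, *a)
        body = self.input_string(complex, *a)
        if fname is None or body is None:
            return None  # nothing to write
        path = os.path.join(cmd_dir, fname)
        # input_string may return str or bytes, so pick the mode to match.
        mode = "wb" if isinstance(body, bytes) else "w"
        with open(path, mode) as fh:
            fh.write(body)
        return path


class ToyGlideNode(ToyNode):
    """Hypothetical subclass that overrides both hooks."""

    def input_filename(self, complex, *a):
        return f"toy_{complex}.in"

    def input_string(self, complex, *a):
        return f"# hypothetical input for {complex}\n"


with tempfile.TemporaryDirectory() as tmp:
    skipped = ToyNode().write_input_file(tmp, "1abc")
    written = ToyGlideNode().write_input_file(tmp, "1abc")
    written_name = os.path.basename(written)
    with open(written) as fh:
        contents = fh.read()
```

With the defaults left in place, the base class writes nothing, which matches the documented behaviour for jobs that do not require input files.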
- input_filename(complex, *a)¶
Return the input filename for a single job for a given complex. Returns None by default, which is a valid option for jobs that don’t require input files.
Nodes that run more than one job per complex can use additional arguments designated here as *a.
- jobname(complex, *a)¶
Return the job name for a single job for a given complex. The default is “<node>_<complex>”, where <node> is the beautified version of the node name (lowercase and stripped of _GEN suffixes).
Nodes that run more than one job per complex can use additional arguments designated here as *a.
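The default naming rule can be sketched as below. The helper names beautify and default_jobname are hypothetical, the complex is represented by a plain string, and the sketch assumes that "stripped of _GEN suffixes" means removing a single trailing _GEN.

```python
def beautify(node_name):
    """Lowercase the node name and drop a trailing _GEN, if present."""
    name = node_name.lower()
    suffix = "_gen"
    if name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


def default_jobname(node_name, complex_name):
    """Build a '<node>_<complex>' job name from the beautified node name."""
    return f"{beautify(node_name)}_{complex_name}"


grid_name = default_jobname("GRID_GEN", "1abc")
native_name = default_jobname("NATIVE", "1abc")
```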
- command(complex, *a)¶
Return the command line for a subjob as a list. Subclasses MUST override this method.
- job_directory(complex, *a)¶
Return the path to the directory where the job(s) for a single complex will be run. The default is to return complex.directory.
Nodes that run more than one job per complex can use additional arguments designated here as *a.
- end_node()¶
Do postprocessing after all the subjobs are done. The default is to loop over complexes and call end_node_per_complex on each.
- end_node_per_complex(complex)¶
Do postprocessing of a given complex after all the subjobs are done. Does nothing by default.
- subjob_done(job)¶
Called whenever a job is done. It is passed the JobControlJob object used by JobDJ. Does nothing by default.
- run_subjobs()¶
Run the subjobs for this node using JobDJ.
- subjob_hosts()¶
Return the subjob hosts as a list of (host, ncpus) tuples. The base implementation returns the host list from the parent WScore job, but it may be overridden if a specific subclass needs to use different hosts.
- new_jobdj()¶
Create and return a new JobDJ object to be used for running the subjobs.
- property jobdj¶
The JobDJ used for running the subjobs. If not set, a new JobDJ is created the first time this property is accessed.
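The "created the first time this property is accessed" behaviour is the usual lazy-initialization pattern. A minimal sketch, assuming a private _jobdj attribute and using a bare object() as a placeholder for a JobDJ-like object (both are illustrative choices, not taken from the module):

```python
class ToyNode:
    """Stand-in used for illustration; this is not the real Node class."""

    def __init__(self, jobdj=None):
        self._jobdj = jobdj  # callers may inject their own JobDJ-like object
        self.created = 0

    def new_jobdj(self):
        self.created += 1
        return object()  # placeholder for a real JobDJ

    @property
    def jobdj(self):
        # Create the object on first access only, then reuse it.
        if self._jobdj is None:
            self._jobdj = self.new_jobdj()
        return self._jobdj


node = ToyNode()
first = node.jobdj
second = node.jobdj

injected = object()
test_node = ToyNode(jobdj=injected)
```

Injecting a pre-existing object through the constructor bypasses new_jobdj entirely, which is what makes the optional ‘jobdj’ argument convenient for testing.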
- complex_ligfiles()¶
Return the set of ligand files for all complexes.
- complex_recepfiles()¶
Return the set of receptor files for all complexes.
- complex_gridfiles()¶
Return the set of grid files for all complexes.
- complex_wmapfiles()¶
Return the set of watermap files for all complexes. This includes the _wm.maegz or _wm.zip files for each complex, as well as the -continuous.maegz files for complexes for which the WaterMap file is not a zip file (meaning that they were generated by the WScore job, and therefore we want to make sure they get registered with job control).
- property state¶
The NodeState object for the current node. It can be used for storing arbitrary properties that will persist after the node is done and when a job is restarted.
- property jobarray¶
Return the value of the -ARRAY command-line flag.
- class schrodinger.application.glide_ws.nodes.GlideBaseNode(wscore_job, name=None, jobdj=None, job_class=None)¶
Bases:
schrodinger.application.glide_ws.nodes.Node
Base class for nodes that run Glide jobs.
Provides a default command line and filename functions for the input, lib, and raw files, as well as a function for checking whether all the libfiles exist.
The command line is:
$SCHRODINGER/glide <jobname>.in
- METAL_LIGAND_CUTOFF = 4.0¶
- already_printed_metal_ligand_warning = False¶
- command(complex, *a)¶
Return the command line for a subjob as a list. Subclasses MUST override this method.
- input_filename(complex, *a)¶
Return the input filename for a single job for a given complex. Returns None by default, which is a valid option for jobs that don’t require input files.
Nodes that run more than one job per complex can use additional arguments designated here as *a.
- check_libfiles()¶
Check if all the libfiles for this node exist. If some don’t exist, either abort or remove the complexes that didn’t have a libfile, depending on the value of the -skipbad argument.
- lib_filename(complex, *a)¶
- raw_filename(complex, *a)¶
- pv_filename(complex, *a)¶
- check_libfile(complex, *a)¶
Check if the libfile for this complex exists, and print a warning if not. Returns True if the libfile exists, False if not. Additionally, check if any of the poses contained in the libfile feature an unsupported metal-ligand interaction.
- check_for_metal_ligand_interaction(complex, posefile)¶
Check if any of the docked ligand poses are close to a metal atom, and print a warning message if so.
- class schrodinger.application.glide_ws.nodes.ArrayJobMixin¶
Bases:
object
Mixin for nodes that support Job Array for running subjobs.
- monitor_jobarrays(jobs, sum_tasks, delay=10)¶
Waits until all the jobs represented by Job objects in ‘jobs’ are complete. Prints a progress message every time there is a change in the number of done/pending subjobs. Checks the status every ‘delay’ seconds. The total number of tasks should be passed in as ‘sum_tasks’.
- launch_arrays(host)¶
Launch the subjobs using job arrays. Returns a list of Job objects (one per array, not one per subjob).
- create_taskmaps()¶
Create the “taskmaps”: the lists of job indices that need to be run for each array job. They are stored in the self.taskmaps dict, using the jobname as the key.
- class schrodinger.application.glide_ws.nodes.DistributedGlideBaseNode(wscore_job, name=None, jobdj=None, job_class=None)¶
Bases:
schrodinger.application.glide_ws.nodes.ArrayJobMixin, schrodinger.application.glide_ws.nodes.GlideBaseNode
Base class for distributed Glide jobs. This class takes care of the splitting and merging.
Subclasses (or users of instances of subclasses) are responsible for setting self.ligfile, which is the ligand file used for all docking jobs (after splitting).
- glide_name = None¶
- init()¶
Called from the constructor to perform optional initialization. Does nothing by default.
- property ligfile¶
The ligand file for this node. This is a magic property that automatically accounts for the backend runtime path.
- property subjob_ligdir¶
The directory where we write the ligand files for the subjobs.
- begin_node()¶
Called when the node begins running. Does nothing by default.
- prepare_sip_inputs()¶
- process_ligfile()¶
Determine the number of ligands to dock, the number of subjobs to run, and other parameters related to subjob splitting.
- njobs(*a)¶
- property nstructs¶
The maximum number of structures per subjob. If not set explicitly, this property is either obtained from the -NSTRUCTS command-line argument, or a default value is chosen based on the number of ligands to dock.
- static nligs_to_dock(startlig, endlig, nligs)¶
Return the number of ligands to dock, based on the startlig and endlig properties, as well as on the number of ligands actually found in the ligand file.
- static auto_nstructs(nligs)¶
Return the number of ligands per subjob, based on the number of ligands in the input file.
- format_ijob(ijob)¶
Return the formatted job number: if it’s a number, pad it with an appropriate number of zeroes.
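Zero-padding of this kind keeps subjob names sorting in numeric order. A sketch under stated assumptions: the documented method takes only ‘ijob’, so the njobs parameter used here to choose the width is hypothetical, as is the treatment of non-numeric values.

```python
def format_ijob(ijob, njobs):
    """Pad integer job numbers to the width of the largest job number.

    The njobs parameter is an assumption made for this sketch; the real
    method presumably obtains the width from the node itself.
    """
    if isinstance(ijob, int):
        width = len(str(njobs))
        return str(ijob).zfill(width)
    return str(ijob)  # non-numeric values are passed through unchanged


padded = format_ijob(7, 120)
passthrough = format_ijob("merged", 120)
```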
- subjob_sip_input_ligfile(complex, ijob, *a)¶
- subjobname(complex, ijob, *a)¶
The name of a subjob for a specific complex.
- need_subjob(complex, ijob, *a)¶
Return True if the subjob hasn’t been run to completion, as determined by inspecting its log file.
- add_jobs()¶
Add the jobs to the JobDJ object.
- find_failed_logfiles()¶
Return the list of logfiles for subjobs that didn’t run to completion. This may include logfiles that don’t exist.
- subjob_directory(complex, ijob, *a)¶
The directory used for running a given subjob. The default implementation returns the same as the main job directory.
- subjob_rawfile(complex, ijob, *a)¶
- subjob_outputs(job)¶
Return the output file(s) for a given subjob.
- Parameters
job (Subjob) – subjob
- Return type
iterable of str
- array_command(complex, *a)¶
Like command(), but used when running on job arrays.
- split_files()¶
Split the ligand input file self.ligfile into subjob input files with up to self.nstructs ligands per file. Returns the number of files written.
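The splitting rule (up to nstructs ligands per file, returning the number of files) can be illustrated on an in-memory list. This sketch does not read or write structure files; split_into_chunks is a hypothetical helper and the ligand names are made up.

```python
from itertools import islice


def split_into_chunks(items, nstructs):
    """Yield consecutive lists of up to nstructs items."""
    if nstructs < 1:
        raise ValueError("nstructs must be at least 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, nstructs))
        if not chunk:
            return
        yield chunk


ligands = [f"lig{i}" for i in range(1, 8)]  # seven hypothetical ligands
chunks = list(split_into_chunks(ligands, 3))
nfiles = len(chunks)  # analogous to the documented return value
```

Only the last chunk can be shorter than nstructs, so the number of files is the ligand count divided by nstructs, rounded up.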
- get_writer(ijob)¶
Get a StructureWriter object for subjob ‘ijob’.
- input_filename(complex, ijob, *a)¶
Return the input filename for a single job for a given complex. Returns None by default, which is a valid option for jobs that don’t require input files.
Nodes that run more than one job per complex can use additional arguments designated here as *a.
- subjob_ligfile(complex, ijob, *a)¶
- subjob_split_ligfile(ijob)¶
- ligoffset(ijob)¶
Return the ligand offset of subjob ‘ijob’ based on the startlig and nstructs properties of the object.
- end_node()¶
Do postprocessing after all the subjobs are done. The default is to loop over complexes and call end_node_per_complex on each.
- filtered_filename(complex, *a)¶
A merged raw file containing only poses with raw scores that passed the cutoff supplied to the merge() method.
- merge_all(pv=False)¶
Merge the subjobs for each complex.
- merge(complex, rawfiles, pv, wscore_raw_cutoff, *a)¶
Merge the subjob file for a single complex combination.
- input_string(complex, ijob, *a)¶
Return the body of the input file for a single job for a given complex. Returns None by default, which is a valid option for jobs that don’t require input files.
Nodes that run more than one job per complex can use additional arguments designated here as *a.
- Return type
str or bytes or NoneType
- extract_gridfiles()¶
Extract the contents of each grid archive to a directory with the same basename as the grid archive. Modify the job’s Complex object to point to the extracted .grd files.
- class schrodinger.application.glide_ws.nodes.DistributedJobMixin¶
Bases:
object
Mixin for nodes that distribute per-complex tasks over one or more subjobs.
- complex_job_generator(complex, *a)¶
- complex_array_job_generator(complex, *a)¶
Return a generator that produces the ArrayJob object to run the array jobs for a single complex.
- class schrodinger.application.glide_ws.nodes.SerialJobMixin¶
Bases:
object
Mixin for nodes that do not need to distribute per-complex tasks over one or more subjobs.
- complex_job_generator(complex, *a)¶
Return a generator that produces Subjob objects for a single complex.
- class schrodinger.application.glide_ws.nodes.ComplexLoopMixin¶
Bases:
object
Mixin for nodes that need to run one job per complex parameter set.
- job_generator()¶
Return a generator that produces Subjob objects. The base implementation just loops over all complexes and calls complex_job_generator.
- complex_offset_iterator()¶
Iterator for looping over complexes.
- array_job_generator()¶
Like job_generator, but yields once per array, instead of once per subjob. Yields ArrayJob objects instead of Subjob objects.
- complexes_to_run()¶
Return the list of complexes for which need_job() is True.
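The delegation from job_generator to complex_job_generator can be sketched with stand-in classes. Everything here is illustrative: the classes are hypothetical, complexes are plain strings, and the yielded values are strings rather than Subjob objects.

```python
class ToyComplexLoop:
    """Stand-in used for illustration; not the real ComplexLoopMixin."""

    def __init__(self, complexes):
        self.complexes = complexes

    def need_job(self, complex, *a):
        return True

    def complex_job_generator(self, complex, *a):
        yield f"job_{complex}"

    def job_generator(self):
        # Loop over all complexes and delegate to the per-complex generator.
        for complex in self.complexes:
            yield from self.complex_job_generator(complex)

    def complexes_to_run(self):
        return [c for c in self.complexes if self.need_job(c)]


class ToyTwoJobLoop(ToyComplexLoop):
    """Hypothetical node that runs two jobs per complex."""

    def need_job(self, complex, *a):
        return complex != "done"

    def complex_job_generator(self, complex, *a):
        yield f"job_{complex}"
        yield f"job_{complex}_sp"


single = list(ToyComplexLoop(["a", "b"]).job_generator())
double = list(ToyTwoJobLoop(["a", "b"]).job_generator())
pending = ToyTwoJobLoop(["a", "done", "b"]).complexes_to_run()
```

Because the outer loop only delegates, a subclass changes how many jobs a complex produces by overriding complex_job_generator alone.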
- class schrodinger.application.glide_ws.nodes.ModelLoopMixin¶
Bases:
object
Mixin for nodes that need to loop over models, such as the MODEL and TESTSET nodes. Since models involve offsets, this mixin must be combined with OffsetMixin or DistributedOffsetMixin.
Here the main job loop is over models instead of over complexes. We then loop over the complexes of each model while making sure that we don’t run the same complex/offset combination twice if it is used by more than one model.
- job_generator()¶
- complex_offset_iterator()¶
Return complex objects for all complexes in all models in ws.models, but only return each tuple once (it’s possible to find the same complex/offset combination in more than one model!)
- array_job_generator()¶
- complexes_to_run()¶
Return the list of complexes for which need_job() is True.
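Yielding each complex/offset combination only once, even when several models share it, can be done with a set of already-seen pairs. A sketch assuming each model is simply a list of (complex, offset) tuples; the real Model objects and the names used below are not taken from the module.

```python
def unique_complex_offsets(models):
    """Yield each (complex, offset) pair once, in first-seen order."""
    seen = set()
    for model in models:
        for pair in model:
            if pair not in seen:
                seen.add(pair)
                yield pair


# Hypothetical models; the two share the ("1abc", "off1") combination.
models = [
    [("1abc", "off1"), ("2xyz", "off1")],
    [("1abc", "off1"), ("1abc", "off2")],
]
pairs = list(unique_complex_offsets(models))
```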
- class schrodinger.application.glide_ws.nodes.WmapGenNode(wscore_job, name=None, jobdj=None, job_class=None)¶
Bases:
schrodinger.application.glide_ws.nodes.SerialJobMixin, schrodinger.application.glide_ws.nodes.ComplexLoopMixin, schrodinger.application.glide_ws.nodes.Node
“WMAP_GEN” node: set up and run WaterMap calculations.
Inputs: ligand and receptor files (specified in COMPLEX block in the input file).
Outputs: watermap file <jobname>_<complex>/wmap_<complex>_wm.maegz
- default_name = 'WMAP_GEN'¶
- subjob_hosts()¶
Return the subjob hosts as a list of (host, ncpus) tuples. The base implementation returns the host list from the parent WScore job, but it may be overridden if a specific subclass needs to use different hosts.
- new_jobdj()¶
Create and return a new JobDJ object to be used for running the subjobs.
- write_input_file(cmd_dir, complex, *a)¶
Write the input file for a single job for a given complex. Calls ‘input_filename’ for the filename, ‘job_directory’ for the directory, and ‘input_string’ for the body of the input file. Does nothing if either the filename (‘fname’) or the return value of ‘input_string’ is None.
Nodes that run more than one job per complex can use additional arguments designated here as *a.
- need_job(complex, *a)¶
Return True if we need to run one or more jobs for this particular complex. The base implementation always returns True.
- command(complex, *a)¶
Return the command line for a subjob as a list. Subclasses MUST override this method.
- configure(config)¶
Configure the node using the ‘params’ dictionary. Does nothing by default.
- inputs()¶
Return a set of input files needed by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- outputs()¶
Return a set of output files produced by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- class schrodinger.application.glide_ws.nodes.GridGenNode(wscore_job, name=None, jobdj=None, job_class=None)¶
Bases:
schrodinger.application.glide_ws.nodes.ComplexLoopMixin, schrodinger.application.glide_ws.nodes.GlideBaseNode
GRID_GEN: Generate grid archives for all training-set complexes. (Optional)
Inputs: receptor and ligand structure files for each complex. Outputs: grid files (.zip archive) for each complex.
- default_name = 'GRID_GEN'¶
- init()¶
Called from the constructor to perform optional initialization. Does nothing by default.
- begin_node()¶
For grids that the driver will generate, strip explicit waters from the receptor before proceeding to construct the grid. For complexes with preexisting grids, throw a fatal error if explicit waters are detected.
- complex_job_generator(complex, *a)¶
A generator that produces Subjob objects for a single complex. For this node, there are two subjobs per complex: one for the WScore grid, and one for the SP grid.
- jobname(complex, sp=False)¶
Return the job name for a single job for a given complex. The default is “<node>_<complex>”, where <node> is the beautified version of the node name (lowercase and stripped of _GEN suffixes).
Nodes that run more than one job per complex can use additional arguments designated here as *a.
- check_grid_for_unsupported_waters(complex)¶
Check whether the recepfile in the grid archive for this complex contains waters, and throw a fatal error if any are found.
- get_explicit_waters(recep_ct)¶
Return a list of explicit water atoms in the given receptor ct.
- strip_waters_from_recepfile(complex)¶
If explicit waters are detected in receptor_ct, make a copy of receptor_ct, delete the water atoms from the copy, and write the modified ct to another file, leaving the original recepfile unmodified.
- configure(config)¶
Configure the node using the ‘params’ dictionary. Does nothing by default.
- inputs()¶
Return a set of input files needed by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- outputs()¶
Return a set of output files produced by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- get_box()¶
Return the boxsize and average center of mass over all the complexes as a tuple (boxsize, [cm_x, cm_y, cm_z]). Repeated calls always return the same results because we want to use the same box spec for all complexes.
- need_wscore_gridgen_job(complex)¶
- complex_input_prereqs(complex)¶
Check whether all the requirements to run the job(s) for this complex are met, or raise a fatal error if not. Called by complex_job_generator; does nothing by default.
- input_string(complex, sp=False)¶
Return the body of the input file for a single job for a given complex. Returns None by default, which is a valid option for jobs that don’t require input files.
Nodes that run more than one job per complex can use additional arguments designated here as *a.
- Return type
str or bytes or NoneType
- class schrodinger.application.glide_ws.nodes.NativeNode(wscore_job, name=None, jobdj=None, job_class=None)¶
Bases:
schrodinger.application.glide_ws.nodes.SerialJobMixin, schrodinger.application.glide_ws.nodes.ComplexLoopMixin, schrodinger.application.glide_ws.nodes.GlideBaseNode
“NATIVE” node: run native docking jobs to check for triggers.
- default_name = 'NATIVE'¶
- input_string(complex)¶
Return the body of the input file for a single job for a given complex. Returns None by default, which is a valid option for jobs that don’t require input files.
Nodes that run more than one job per complex can use additional arguments designated here as *a.
- Return type
str or bytes or NoneType
- end_node()¶
Do postprocessing after all the subjobs are done. The default is to loop over complexes and call end_node_per_complex on each.
- property trigger_lib_file¶
- property trigger_epv_file¶
- subjob_outputs(job)¶
Return the output file(s) for a given subjob.
- Parameters
job (Subjob) – subjob
- Return type
iterable of str
- check_triggers()¶
Check that all triggers for the current node pass. If any fail, write the trigger epv file for the node. If the value of the trigger WScore keyword is true, also stop the job.
- inputs()¶
Return a set of input files needed by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- need_job(complex, *a)¶
Return True if we need to run one or more jobs for this particular complex. The base implementation always returns True.
- class schrodinger.application.glide_ws.nodes.SPDecoysNode(wscore_job, name=None, jobdj=None, job_class=None)¶
Bases:
schrodinger.application.glide_ws.nodes.DistributedJobMixin, schrodinger.application.glide_ws.nodes.ComplexLoopMixin, schrodinger.application.glide_ws.nodes.DistributedGlideBaseNode
SP_DECOYS node. Preliminary screening of the decoys set to make the DECOYS node faster.
- default_name = 'SP_DECOYS'¶
- init()¶
Called from the constructor to perform optional initialization. Does nothing by default.
- configure(wscore_jobparams)¶
Configure the node using the ‘params’ dictionary. Does nothing by default.
- input_string(complex, ijob)¶
Return the body of the input file for a single job for a given complex. Returns None by default, which is a valid option for jobs that don’t require input files.
Nodes that run more than one job per complex can use additional arguments designated here as *a.
- Return type
str or bytes or NoneType
- property nstructs¶
The maximum number of structures per subjob. If not set explicitly, this property is either obtained from the -NSTRUCTS command-line argument, or a default value is chosen based on the number of ligands to dock.
- get_reader()¶
An iterator that yields on each cycle a tuple with all the poses for each ligand (one per receptor, with Nones for missing poses).
- end_node()¶
Do postprocessing after all the subjobs are done. The default is to loop over complexes and call end_node_per_complex on each.
- inputs()¶
Return a set of input files needed by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- outputs()¶
Return a set of output files produced by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- class schrodinger.application.glide_ws.nodes.DecoysNode(wscore_job, name=None, jobdj=None, job_class=None)¶
Bases:
schrodinger.application.glide_ws.nodes.DistributedJobMixin, schrodinger.application.glide_ws.nodes.ComplexLoopMixin, schrodinger.application.glide_ws.nodes.DistributedGlideBaseNode
DECOYS node: Run docking jobs for all decoys using all complexes and all offset parameter sets.
- default_name = 'DECOYS'¶
- input_string(complex, ijob, *a)¶
Return the body of the input file for a single job for a given complex. Returns None by default, which is a valid option for jobs that don’t require input files.
Nodes that run more than one job per complex can use additional arguments designated here as *a.
- Return type
str or bytes or NoneType
- configure(wscore_jobparams)¶
Configure the node using the ‘params’ dictionary. Does nothing by default.
- end_node()¶
Do postprocessing after all the subjobs are done. The default is to loop over complexes and call end_node_per_complex on each.
- inputs()¶
Return a set of input files needed by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- outputs()¶
Return a set of output files produced by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- class schrodinger.application.glide_ws.nodes.CrossNode(wscore_job, name=None, jobdj=None, job_class=None)¶
Bases:
schrodinger.application.glide_ws.nodes.DistributedJobMixin, schrodinger.application.glide_ws.nodes.ComplexLoopMixin, schrodinger.application.glide_ws.nodes.DistributedGlideBaseNode
“CROSS” Node: set up and select ensembles, and write one ligand pose file per “model” (ensemble combined with offset parameters), each file containing one pose per training-set ligand docked into its best-scoring receptor for that ensemble and parameter set.
Outputs: <jobname>_<model_str>/cross_<model_str>_lib.maegz, Model objects.
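Selecting "one pose per ligand docked into its best-scoring receptor" amounts to a per-ligand minimum (or maximum) over receptors. A sketch under stated assumptions: poses are given as (ligand, receptor, score) tuples, a lower score is treated as better, and all names and numbers are invented for illustration. The real node works on pose files rather than tuples.

```python
def best_pose_per_ligand(poses):
    """Map each ligand to its best (receptor, score) pair.

    Assumes a lower score is better; ties keep the first pose seen.
    """
    best = {}
    for ligand, receptor, score in poses:
        if ligand not in best or score < best[ligand][1]:
            best[ligand] = (receptor, score)
    return best


# Hypothetical docking results for two ligands in a two-receptor ensemble.
poses = [
    ("ligA", "rec1", -7.5),
    ("ligA", "rec2", -9.0),
    ("ligB", "rec1", -6.0),
    ("ligB", "rec2", -5.0),
]
selected = best_pose_per_ligand(poses)
```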
- default_name = 'CROSS'¶
- property training_actives¶
The training_actives file for this node. This is a magic property that automatically accounts for the backend runtime path.
- configure(pars)¶
Configure the node using the ‘params’ dictionary. Does nothing by default.
- check_ensemble_size(pars)¶
Make sure that the requested ensemble sizes make sense, and adjust them to the number of complexes if they are too big and the values were not specified explicitly by the user.
- begin_node()¶
Called when the node begins running. Does nothing by default.
- end_node()¶
Select the ensembles and create the models.
- inputs()¶
Return a set of input files needed by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- outputs()¶
Return a set of output files produced by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- class schrodinger.application.glide_ws.nodes.TestSetNode(wscore_job, name=None, jobdj=None, job_class=None)¶
Bases:
schrodinger.application.glide_ws.nodes.DistributedJobMixin, schrodinger.application.glide_ws.nodes.ModelLoopMixin, schrodinger.application.glide_ws.nodes.DistributedGlideBaseNode
“TESTSET” node:
1. Dock test-set ligands into receptor grids from each complex.
2. Sort into one pose file per model, each file containing one pose per test-set ligand docked into its best-scoring receptor for that model.
3. Merge each test-set pose file with the decoy pose file for that model, and compute enrichment metrics.
Inputs: user-supplied test-set ligand file; grids from bound complexes; decoy pose files <jobname>_<model_str>/decoys_<model_str>_lib.maegz.
Outputs: test-set and merged pose files <jobname>_<model_str>/testset_<model_str>_lib.maegz, <jobname>_<model_str>/testset_<model_str>_merged_lib.maegz; Enrichment reports <jobname>_<model_str>/testset_<model_str>.enrich.
- default_name = 'TESTSET'¶
- configure(wscore_jobparams)¶
Configure the node using the ‘params’ dictionary. Does nothing by default.
- end_node()¶
Do postprocessing after all the subjobs are done. The default is to loop over complexes and call end_node_per_complex on each.
- inputs()¶
Return a set of input files needed by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- outputs()¶
Return a set of output files produced by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- input_string(complex, ijob, *a)¶
Return the body of the input file for a single job for a given complex. Returns None by default, which is a valid option for jobs that don’t require input files.
Nodes that run more than one job per complex can use additional arguments designated here as *a.
- Return type
str or bytes or NoneType
- class schrodinger.application.glide_ws.nodes.ModelNode(wscore_job, name=None, jobdj=None, job_class=None)¶
Bases:
schrodinger.application.glide_ws.nodes.DistributedJobMixin, schrodinger.application.glide_ws.nodes.ModelLoopMixin, schrodinger.application.glide_ws.nodes.DistributedGlideBaseNode
“MODEL” node:
1. Dock an arbitrary ligand set into receptor grids from a single model.
2. Sort into one pose file, containing one pose for each ligand docked into its best-scoring receptor from the model.
Inputs: user-supplied ligand file; “model file” (ZIP archive) <oldjob>_models.zip, containing a dictionary (InputConfig) with model parameters plus a line indicating the “active” model (<oldjob>.models), and the grid and watermap files for all the models. Here <oldjob> is the name of the “model generation” job (through the “TESTSET” node) that created the models. (The WscoreJob method “process_config_modelgen()” handles reading the model file.)
Outputs: <complex>/model_<complex>_<offset_str>_raw.maegz.
- default_name = 'MODEL'¶
- configure(wscore_jobparams)¶
Configure the node using the ‘params’ dictionary. Does nothing by default.
- begin_node()¶
Called when the node begins running. Does nothing by default.
- auto_shuffle_ligs()¶
Shuffle the input ligands before docking when sp_filter mode is active.
- end_node()¶
Do postprocessing after all the subjobs are done. The default is to loop over complexes and call end_node_per_complex on each.
- run_subjobs()¶
Run the subjobs for this node using JobDJ.
- cutoff_server_logfile(complex)¶
- cutoff_config_filename(complex)¶
Return the filter config filename for a given complex if the SP filtering feature is enabled; otherwise, return None.
- start_cutoff_servers()¶
Spawn one SP cutoff server per complex.
- get_cutoff_server_config(complex)¶
Return a dict with the cutoff server configuration for a given complex. Keys include ‘host’, ‘port’, and ‘server_id’. See schrodinger.application.glide.packages.sp_filter for more details.
- stop_cutoff_servers()¶
Nicely ask the cutoff servers to stop, via POST.
- inputs()¶
Return a set of input files needed by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- outputs()¶
Return a set of output files produced by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- input_string(complex, ijob, offset, zroff, *a)¶
Return the body of the input file for a single job for a given complex. Returns None by default, which is a valid option for jobs that don’t require input files.
Nodes that run more than one job per complex can use additional arguments, designated here as *a.
- Return type
str or bytes or NoneType
- update_total_cpu_times()¶
Update the total subjob CPU time for the current WScore job. Does nothing by default.
- class schrodinger.application.glide_ws.nodes.DistributedMMGBSABaseNode(wscore_job, name=None, jobdj=None, job_class=None)¶
Bases:
schrodinger.application.glide_ws.nodes.ArrayJobMixin, schrodinger.application.glide_ws.nodes.Node
Base class for distributed MMGBSA nodes. Run Prime MMGBSA calculation on the poses and extract the scores for use in the combined WScore/MMGBSA scoring function.
Subclasses differ in which type of loop they use (e.g., a loop over all complexes vs. a loop over models), in which poses are used as input, and in the details of the post-processing (for example, whether an epv file, a model file, or an enrichment report should be generated).
Subclasses must define the ‘glide_name’ property, which is the name of the Glide node that produced the poses to be fed into the MMGBSA calculation. Subclasses must also define ‘mmgbsa_receptor_name’, which is the name of the node whose output file (the optimized free receptor) is used as input for the MMGBSA pose scoring.
- glide_name = None¶
- mmgbsa_receptor_name = None¶
- init()¶
Called from the constructor to perform optional initialization. Does nothing by default.
- inputs()¶
Return a set of input files needed by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- begin_node()¶
Called when the node begins running. Does nothing by default.
- add_jobs()¶
Add the jobs to the JobDJ object.
- find_failed_logfiles()¶
Return the list of logfiles for subjobs that didn’t run to completion. This may include logfiles that don’t exist.
- array_command(complex, *a)¶
- need_subjob(complex, ijob, *a)¶
- max_subjobs_over_complexes()¶
- mmgbsa_in_filename(complex, *a)¶
This is the filtered raw output file from the previous Glide docking node which is associated with the current node. This raw file serves as input for the mmgbsa calculation.
- auto_nstructs(nligs)¶
Determine a reasonable value for -NSTRUCTS based on the number of poses.
- nposes(complex)¶
- nstructs(complex)¶
- njobs(complex, *a)¶
Return the -NJOBS value used to determine how many mmgbsa subjobs to launch.
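How -NSTRUCTS and -NJOBS relate is not spelled out here. One plausible reading, shown purely as an assumption, is that the number of subjobs is the number of poses divided by the structures per subjob, rounded up. The function name njobs_for is hypothetical:

```python
import math


def njobs_for(nposes, nstructs):
    """Number of subjobs needed to cover `nposes` poses when each subjob
    handles at most `nstructs` structures (assumed relationship)."""
    if nstructs <= 0:
        raise ValueError("nstructs must be positive")
    if nposes <= 0:
        return 0
    # Ceiling division: a partially filled last subjob still counts.
    return math.ceil(nposes / nstructs)
```

Under this assumption, 10 poses with 4 structures per subjob would need 3 subjobs.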
- mmgbsa_obj_options(complex)¶
- get_mmgbsa_parent_obj(complex)¶
- init_mmgbsa_job_obj_attributes(mmgbsa_job)¶
Clear filenames from the parent object, so that they aren’t inherited by the subjob namespaces.
- split_files(*a)¶
Create the input structure files for the mmgbsa subjobs.
- subjobname(complex, ijob, *a)¶
The name of a subjob for a specific complex.
- format_ijob(ijob)¶
Return the formatted job number: if it’s a number, pad it with an appropriate number of zeroes.
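A minimal sketch of the padding behavior described above, written as a standalone function. The fixed default width of 3 is an assumption; the real method chooses an "appropriate" width that is not documented here:

```python
def format_ijob(ijob, width=3):
    """Zero-pad integer job numbers; return other values as plain strings.

    `width` is a hypothetical parameter used only for this illustration.
    """
    if isinstance(ijob, int):
        return f"{ijob:0{width}d}"
    return str(ijob)
```

For example, format_ijob(7) gives "007", while a non-numeric label such as "all" is returned unchanged.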
- subjob_directory(complex, *a)¶
The directory used for running a given subjob. The default implementation returns the same as the main job directory.
- input_filename(complex, ijob, *a)¶
Return the input filename for a single job for a given complex. Returns None by default, which is a valid option for jobs that don’t require input files.
Nodes that run more than one job per complex can use additional arguments, designated here as *a.
- command(complex, ijob, *a)¶
Return the command line for a subjob as a list. Subclasses MUST override this method.
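The override requirement can be pictured with a stand-in base class. This sketch does not use the real Node class; BaseNode, EchoNode, and the echo command are hypothetical, and whether the real base method raises NotImplementedError is an assumption:

```python
class BaseNode:
    """Stand-in for the real Node class, for illustration only."""

    def command(self, complex, ijob, *a):
        # Assumed behavior: the base class refuses to build a command.
        raise NotImplementedError("subclasses must override command()")


class EchoNode(BaseNode):
    """Hypothetical subclass that builds a trivial command line."""

    def command(self, complex, ijob, *a):
        # Return the command as a list, one element per argument.
        return ["echo", f"{complex}-{ijob}", *map(str, a)]


cmd = EchoNode().command("1abc", 2, "extra")
```

Returning a list rather than a single string avoids shell-quoting issues when the command is eventually launched.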
- subjob_rawfile(complex, ijob, *a)¶
- subjob_ligfile(complex, ijob, *a)¶
- input_string(complex, ijob, *a)¶
Return the body of the input file for a single job for a given complex. Returns None by default, which is a valid option for jobs that don’t require input files.
Nodes that run more than one job per complex can use additional arguments, designated here as *a.
- Return type
str or bytes or NoneType
- optimized_free_receptor(complex, *a)¶
- get_subjob_configuration(complex, ijob, *a)¶
- merge_all()¶
Merge the subjobs for each complex.
- need_job(complex, *a)¶
Return True if we need to run one or more jobs for this particular complex. The base implementation always returns True.
- merge(complex, rawfiles, *a)¶
Merge the subjob file for a single complex combination.
- end_node()¶
Do postprocessing after all the subjobs are done. The default is to loop over complexes and call end_node_per_complex on each.
- raw_filename(complex, *a)¶
- property glide_node¶
- property mmgbsa_receptor_node¶
- mmgbsa_out_filename(complex, *a)¶
- subjob_outputs(job)¶
Return the output file(s) for a given subjob.
- Parameters
job (Subjob) – subjob
- Return type
iterable of str
- glide_raw_filename(*a)¶
This is the unfiltered raw output file from the previous Glide docking node which is associated with the current node. This raw file is used for entropy scoring/merging, even for poses that didn’t go through the mmgbsa calculation.
- logfiles()¶
Generator yielding the logfiles for the subjobs for the current node.
- class schrodinger.application.glide_ws.nodes.MMGBSAModelBaseNode(wscore_job, name=None, jobdj=None, job_class=None)¶
Bases:
schrodinger.application.glide_ws.nodes.DistributedMMGBSABaseNode
Base class for MMGBSA nodes that need to loop over models.
- end_node()¶
Do postprocessing after all the subjobs are done. The default is to loop over complexes and call end_node_per_complex on each.
- end_node_models(apply_offsets=False)¶
Run the scoring/merging on each model and call end_node_per_model on each model.
- end_node_per_model(model)¶
Additional per-model postprocessing. Does nothing, but it can be overridden by derived classes as needed.
- class schrodinger.application.glide_ws.nodes.MMGBSAReceptorBaseNode(wscore_job, name=None, jobdj=None, job_class=None)¶
Bases:
schrodinger.application.glide_ws.nodes.Node
Base class for nodes that need to optimize the “free” receptor associated with a particular complex. Subclasses should also inherit from a “loop” mixin to get the generators that provide the complexes to operate on.
- end_node()¶
Does nothing, but it can be overridden by derived classes as needed.
- input_string(complex, *a)¶
Return the body of the input file for a single job for a given complex. Returns None by default, which is a valid option for jobs that don’t require input files.
Nodes that run more than one job per complex can use additional arguments, designated here as *a.
- Return type
str or bytes or NoneType
- input_filename(complex, *a)¶
Return the input filename for a single job for a given complex. Returns None by default, which is a valid option for jobs that don’t require input files.
Nodes that run more than one job per complex can use additional arguments, designated here as *a.
- command(complex, *a)¶
Return the command line for a subjob as a list. Subclasses MUST override this method.
- mmgbsa_out_filename(complex, *a)¶
- class schrodinger.application.glide_ws.nodes.MMGBSAReceptorModelNode(wscore_job, name=None, jobdj=None, job_class=None)¶
Bases:
schrodinger.application.glide_ws.nodes.SerialJobMixin, schrodinger.application.glide_ws.nodes.ModelLoopMixin, schrodinger.application.glide_ws.nodes.MMGBSAReceptorBaseNode
Node for optimizing the free receptors of the complexes over a set of models.
- default_name = 'MMGBSA_RECEPTOR_MODEL'¶
- begin_node()¶
Called when the node begins running. Does nothing by default.
- inputs()¶
Return a set of input files needed by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- class schrodinger.application.glide_ws.nodes.MMGBSAReceptorComplexNode(wscore_job, name=None, jobdj=None, job_class=None)¶
Bases:
schrodinger.application.glide_ws.nodes.SerialJobMixin, schrodinger.application.glide_ws.nodes.ComplexLoopMixin, schrodinger.application.glide_ws.nodes.MMGBSAReceptorBaseNode
Node for optimizing the free receptors over all complexes.
- default_name = 'MMGBSA_RECEPTOR_COMPLEX'¶
- class schrodinger.application.glide_ws.nodes.MMGBSANode(wscore_job, name=None, jobdj=None, job_class=None)¶
Bases:
schrodinger.application.glide_ws.nodes.DistributedJobMixin, schrodinger.application.glide_ws.nodes.ModelLoopMixin, schrodinger.application.glide_ws.nodes.MMGBSAModelBaseNode
“MMGBSA” Node for WScore docking jobs. Generates an epv file as the final output.
- default_name = 'MMGBSA'¶
- receptor_mmgbsa_name = 'MMGBSA_RECEPTOR_MODEL'¶
- glide_name = 'MODEL'¶
- csv_writer_mode = 'w'¶
- end_node_per_model(model)¶
Additional per-model postprocessing. Does nothing, but it can be overridden by derived classes as needed.
- inputs()¶
Return a set of input files needed by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- outputs()¶
Return a set of output files produced by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- pose_outfile(model)¶
- class schrodinger.application.glide_ws.nodes.MMGBSADecoysNode(wscore_job, name=None, jobdj=None, job_class=None)¶
Bases:
schrodinger.application.glide_ws.nodes.DistributedJobMixin, schrodinger.application.glide_ws.nodes.ComplexLoopMixin, schrodinger.application.glide_ws.nodes.DistributedMMGBSABaseNode
“MMGBSA_DECOYS” Node: generates lib files for all the decoys based on poses from the DECOYS node. These lib files are later used for enrichment calculations as part of the model-generation job.
- default_name = 'MMGBSA_DECOYS'¶
- glide_name = 'DECOYS'¶
- receptor_mmgbsa_name = 'MMGBSA_RECEPTOR_COMPLEX'¶
- class schrodinger.application.glide_ws.nodes.MMGBSACrossNode(wscore_job, name=None, jobdj=None, job_class=None)¶
Bases:
schrodinger.application.glide_ws.nodes.DistributedJobMixin, schrodinger.application.glide_ws.nodes.ComplexLoopMixin, schrodinger.application.glide_ws.nodes.DistributedMMGBSABaseNode
“MMGBSA_CROSS” Node: run MMGBSA jobs on the exhaustive cross-docking of the training-set actives.
- default_name = 'MMGBSA_CROSS'¶
- glide_name = 'CROSS'¶
- receptor_mmgbsa_name = 'MMGBSA_RECEPTOR_COMPLEX'¶
- class schrodinger.application.glide_ws.nodes.MMGBSATestsetNode(wscore_job, name=None, jobdj=None, job_class=None)¶
Bases:
schrodinger.application.glide_ws.nodes.DistributedJobMixin, schrodinger.application.glide_ws.nodes.ModelLoopMixin, schrodinger.application.glide_ws.nodes.MMGBSAModelBaseNode
“MMGBSA_TESTSET” Node: generates lib files for the decoy poses from the TESTSET node. Generates a test-set enrichment report and updates the model file.
- default_name = 'MMGBSA_TESTSET'¶
- glide_name = 'TESTSET'¶
- csv_writer_mode = 'a'¶
- receptor_mmgbsa_name = 'MMGBSA_RECEPTOR_COMPLEX'¶
- end_node()¶
Do postprocessing after all the subjobs are done. The default is to loop over complexes and call end_node_per_complex on each.
- end_node_per_model(model)¶
Additional per-model postprocessing. Does nothing, but it can be overridden by derived classes as needed.
- property testset_data_file¶
- inputs()¶
Return a set of input files needed by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.
- class schrodinger.application.glide_ws.nodes.OptimizeNode(wscore_job, name=None, jobdj=None, job_class=None)¶
Bases:
schrodinger.application.glide_ws.nodes.Node
Run the ensemble optimization.
- Uses data generated by the following nodes:
CROSS, MMGBSA_CROSS for actives
DECOYS, MMGBSA_DECOYS for decoys
- default_name = 'OPTIMIZE'¶
- prepare_inputs()¶
Create all the input files for this node.
- partition_decoy_set()¶
Partition the decoy set into a training set and a test set. The split is random, based on the value of the decoy_training_fraction and random_seed keywords.
As a side effect, training_decoys and test_decoys are set in the global state object. Both are sets of strings (decoy titles).
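A minimal sketch of a seeded random split into two sets of decoy titles. The function name partition_decoys, the rounding rule, and the use of Python's random module are assumptions; the actual implementation may partition differently:

```python
import random


def partition_decoys(titles, training_fraction, seed):
    """Split decoy titles into (training, test) sets, reproducibly.

    `training_fraction` and `seed` stand in for the decoy_training_fraction
    and random_seed keywords mentioned above.
    """
    # Sort first so the result depends only on the seed, not on input order.
    ordered = sorted(set(titles))
    rng = random.Random(seed)
    rng.shuffle(ordered)
    n_train = round(len(ordered) * training_fraction)
    return set(ordered[:n_train]), set(ordered[n_train:])


titles = [f"decoy{i}" for i in range(10)]
training_decoys, test_decoys = partition_decoys(titles, 0.7, seed=42)
```

Using a dedicated random.Random instance keeps the split independent of any other code that touches the global random state.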
- property actives_data_file¶
- property decoys_data_file¶
- property ensemble_data_file¶
- job_directory()¶
Return the path to the directory where the job(s) for a single complex will be run. The default is to return complex.directory.
Nodes that run more than one job per complex can use additional arguments, designated here as *a.
- optimizer_options()¶
- jobname()¶
Return the job name for a single job for a given complex. The default is “<node>_<complex>”, where <node> is the beautified version of the node name (lowercase and stripped of _GEN suffixes).
Nodes that run more than one job per complex can use additional arguments, designated here as *a.
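The default naming rule described above can be sketched as follows. The helper names and the example node name GRID_GEN are hypothetical, and stripping only a trailing _GEN is an assumption about what "stripped of _GEN suffixes" means:

```python
def beautify_node_name(node_name):
    """Lowercase a node name after removing a trailing '_GEN' suffix."""
    suffix = "_GEN"
    if node_name.endswith(suffix):
        node_name = node_name[: -len(suffix)]
    return node_name.lower()


def default_jobname(node_name, complex_name):
    """Build '<node>_<complex>' from a node name and a complex name."""
    return f"{beautify_node_name(node_name)}_{complex_name}"
```

With these helpers, default_jobname("GRID_GEN", "1abc") gives "grid_1abc", and a node name without the suffix is simply lowercased.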
- command()¶
Return the command line for a subjob as a list. Subclasses MUST override this method.
- job_generator()¶
- merge_poses(docking_node, mmgbsa_node, name, csv_writer_mode)¶
For each model, merge the docking and mmgbsa poses using the combined scoring function given a pair of nodes. The name is only used for logging.
- end_node()¶
Do postprocessing after all the subjobs are done. The default is to loop over complexes and call end_node_per_complex on each.
- subjob_outputs(job)¶
Return the output file(s) for a given subjob.
- Parameters
job (Subjob) – subjob
- Return type
iterable of str
- inputs()¶
Return a set of input files needed by this node, for the purpose of registering them with job control during startup. Returns an empty set by default.