schrodinger.active_learning.al_driver module

Implementation of screening a large library with an active learning scheme.

Active learning scheme:
  1. Select N ligands from the library.

  2. Dock the selected portion of the library.

  3. Train a ligand_ml model with the scores.

  4. Evaluate the whole library with the generated ligand_ml model.

  5. Pick N from the top M best ligands predicted by the ligand_ml model.

  6. Dock the ligands picked in step 5, then repeat from step 3 until num_iter iterations have been completed.
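The scheme above can be illustrated with a toy, standard-library-only sketch. It is not the module's implementation: ligands are stand-in numbers, a caller-supplied dock function plays the role of docking (lower scores are better), and a k-nearest-neighbour average (the hypothetical knn_predict) stands in for the ligand_ml model. Counting num_iter as the number of train/evaluate/pick/dock rounds after the initial batch is also an assumption.

```python
import random
from typing import Callable, Dict, List


def knn_predict(x: float, scored: Dict[float, float], k: int = 3) -> float:
    """Toy surrogate model: mean score of the k scored ligands nearest to x."""
    nearest = sorted(scored, key=lambda s: abs(s - x))[:k]
    return sum(scored[s] for s in nearest) / len(nearest)


def active_learning(library: List[float], dock: Callable[[float], float],
                    n: int, m: int, num_iter: int,
                    seed: int = 0) -> Dict[float, float]:
    """Toy active learning loop; returns {ligand: docking score}."""
    rng = random.Random(seed)
    picked = rng.sample(library, n)            # step 1: select N ligands
    scores: Dict[float, float] = {}
    for round_ in range(num_iter + 1):
        for lig in picked:                     # steps 2 and 6: "dock" them
            scores[lig] = dock(lig)
        if round_ == num_iter:
            break
        # Steps 3 and 4: "train" on all scores so far (the k-NN model just
        # reuses them), then rank the not-yet-docked ligands by prediction.
        unscored = [lig for lig in library if lig not in scores]
        ranked = sorted(unscored, key=lambda lig: knn_predict(lig, scores))
        picked = rng.sample(ranked[:m], n)     # step 5: N from the top M
    return scores
```

For example, with library = [i / 1000 for i in range(1000)] and dock = lambda x: (x - 0.7) ** 2, calling active_learning(library, dock, n=10, m=50, num_iter=3) docks 40 distinct ligands: the initial 10 plus 10 per round.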

Copyright Schrodinger Inc, All Rights Reserved.

class schrodinger.active_learning.al_driver.StopAfterPattern

Bases: StrEnum

Enum for supported stop_after patterns in Active Learning workflows.

FINISH_ALL = 'FinishAll'
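Because StopAfterPattern derives from StrEnum, its members behave as plain strings, so a user-supplied -stop_after value can be compared with them directly. A minimal sketch of an equivalent enum, using the standard library's enum.StrEnum on Python 3.11+ and a small fallback otherwise; the fallback is an assumption, and the module's own StrEnum base may come from elsewhere.

```python
import enum

try:
    from enum import StrEnum  # standard library, Python 3.11+
except ImportError:  # minimal stand-in for older Python versions
    class StrEnum(str, enum.Enum):
        def __str__(self) -> str:
            return str(self.value)


class StopAfterPattern(StrEnum):
    """Sketch mirroring the documented enum."""
    FINISH_ALL = 'FinishAll'
```

For example, "FinishAll" == StopAfterPattern.FINISH_ALL evaluates to True.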
class schrodinger.active_learning.al_driver.StopAfterRequest(original_value: str, num_iter: int, al_node_supplier: ActiveLearningNodeSupplier)

Bases: object

Encapsulates a stop_after request with conversion and validation logic.

This centralizes all string manipulation and conversion logic in one place, eliminating code duplication and providing a clean interface.

original_value: str
num_iter: int
al_node_supplier: ActiveLearningNodeSupplier
property is_iteration_pattern: bool

Check if this is an iter_X pattern that was converted.

property iteration_number: Optional[int]

Extract iteration number from iter_X pattern.

__init__(original_value: str, num_iter: int, al_node_supplier: ActiveLearningNodeSupplier) → None
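The iter_X handling described for validate_stop_after (iter_N is converted to the LigandMLEvalNode stage of that iteration) can be sketched as a small dataclass with the same property names. The regular expression, the hypothetical converted_value property, and the omission of num_iter and al_node_supplier are assumptions for illustration; this is not the module's implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional

_ITER_PATTERN = re.compile(r"iter_(\d+)")


@dataclass
class StopAfterRequestSketch:
    """Hypothetical, simplified analogue of StopAfterRequest."""
    original_value: str

    @property
    def iteration_number(self) -> Optional[int]:
        # Only a bare "iter_<N>" matches; "LigandMLEvalNode_iter_2" does not.
        match = _ITER_PATTERN.fullmatch(self.original_value)
        return int(match.group(1)) if match else None

    @property
    def is_iteration_pattern(self) -> bool:
        return self.iteration_number is not None

    @property
    def converted_value(self) -> str:
        # Hypothetical: iter_N maps to the ML evaluation stage of iteration N.
        if self.is_iteration_pattern:
            return f"LigandMLEvalNode_iter_{self.iteration_number}"
        return self.original_value
```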
class schrodinger.active_learning.al_driver.StopAfterValidator(task: str, num_iter: int, use_known_score: bool, run_rescore_ligand: bool, restart_file: str, al_node_supplier: ActiveLearningNodeSupplier)

Bases: object

Validator for stop_after requests.

__init__(task: str, num_iter: int, use_known_score: bool, run_rescore_ligand: bool, restart_file: str, al_node_supplier: ActiveLearningNodeSupplier)
property workflow_nodes
property finished_nodes

Lazy-loaded list of finished node names.

validate_and_convert(stop_after: str) → Tuple[Optional[str], str]

Validate and convert a stop_after value.
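A hypothetical sketch of how such a validator might combine these pieces: finished_nodes is loaded lazily (on first access) with functools.cached_property, FinishAll is accepted as-is, iter_N is converted, and unknown or already finished stages produce an error. The (error_message, converted_value) return order is assumed to match validate_stop_after; the already-finished rule and the error texts are assumptions.

```python
import re
from functools import cached_property
from typing import Callable, List, Optional, Tuple

FINISH_ALL = "FinishAll"
_ITER_RE = re.compile(r"iter_(\d+)")


class StopAfterValidatorSketch:
    """Hypothetical stand-in for StopAfterValidator (not the real class)."""

    def __init__(self, workflow_nodes: List[str],
                 load_finished: Callable[[], List[str]]):
        self.workflow_nodes = workflow_nodes
        self._load_finished = load_finished  # e.g. reads a restart file

    @cached_property
    def finished_nodes(self) -> List[str]:
        # Loaded on first access only, then cached on the instance.
        return self._load_finished()

    def validate_and_convert(self, stop_after: str) -> Tuple[Optional[str], str]:
        if stop_after == FINISH_ALL:
            return None, stop_after
        match = _ITER_RE.fullmatch(stop_after)
        if match:
            stop_after = f"LigandMLEvalNode_iter_{match.group(1)}"
        if stop_after not in self.workflow_nodes:
            return f"Unknown stage: {stop_after}", stop_after
        if stop_after in self.finished_nodes:
            return f"Stage already finished: {stop_after}", stop_after
        return None, stop_after
```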

class schrodinger.active_learning.al_driver.Option(*names, dest=None, help=None, type=<class 'str'>, metavar=None, default=None, action=None, nargs=None, choices=None, required=False)

Bases: object

A class to represent “options” which may be translated into argparse command-line arguments or an InputConfig spec for parsing input files. This is used to support the behavior of the legacy SiteMap driver, where every option could be specified in an input file or on the command line, with the latter taking precedence.

__init__(*names, dest=None, help=None, type=<class 'str'>, metavar=None, default=None, action=None, nargs=None, choices=None, required=False)

The arguments all have the same meaning as for argparse.ArgumentParser.add_argument(), except min and max, which are used only by ConfigObj to limit the range of allowed values for numeric types.

toArgparse(parser)

Add this option to an argument parser.

Parameters:

parser (argparse.ArgumentParser) – argument parser

toConfigObj()

Return a ConfigObj validator spec for self.

Returns:

validation spec

Return type:

str

toSubparser(subparsers)

Create a subparser for a specific task.

Returns:

argument parser

Return type:

argparse.ArgumentParser
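The precedence rule described for Option (input-file values, overridden by command-line values) can be sketched with argparse alone. OptionSketch and resolve are hypothetical names, the sketch supports only a few of Option's keyword arguments, and the ConfigObj spec strings follow the validate module's check syntax as an assumption; none of this is the module's actual code.

```python
import argparse
from typing import Any, Dict, List, Optional, Sequence

# Assumed mapping from Python types to ConfigObj/validate check names.
_CHECKS = {int: "integer", float: "float", str: "string"}


class OptionSketch:
    """Hypothetical, simplified analogue of al_driver.Option."""

    def __init__(self, *names: str, dest: str, type=str, default=None,
                 help: Optional[str] = None):
        self.names, self.dest, self.type = names, dest, type
        self.default, self.help = default, help

    def toArgparse(self, parser: argparse.ArgumentParser) -> None:
        # default=None marks "not given on the command line", so a value
        # from the input file is not overwritten by an argparse default.
        parser.add_argument(*self.names, dest=self.dest, type=self.type,
                            default=None, help=self.help)

    def toConfigObj(self) -> str:
        return f"{_CHECKS[self.type]}(default={self.default!r})"


def resolve(options: List[OptionSketch], argv: Sequence[str],
            file_values: Dict[str, str]) -> Dict[str, Any]:
    """Merge input-file values with command-line values (command line wins)."""
    parser = argparse.ArgumentParser()
    for opt in options:
        opt.toArgparse(parser)
    cli = vars(parser.parse_args(list(argv)))
    resolved: Dict[str, Any] = {}
    for opt in options:
        if cli[opt.dest] is not None:
            resolved[opt.dest] = cli[opt.dest]
        elif opt.dest in file_values:
            resolved[opt.dest] = opt.type(file_values[opt.dest])
        else:
            resolved[opt.dest] = opt.default
    return resolved
```

Using None as the argparse default is what lets the merge tell an omitted flag apart from one the user set explicitly.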

schrodinger.active_learning.al_driver.get_workflow_node_names(task, num_iter, use_known_score, run_rescore_ligand, al_node_supplier)

Return a list of stages needed to complete the workflow, based on the task type, the number of iterations, whether scores are known, and whether to run the rescore stage.

Parameters:
  • task (str; one of SCREEN_TASK, PILOT_TASK, or EVAL_TASK) – workflow task type

  • num_iter (int) – number of iterations

  • use_known_score (bool) – Use known scores in score_file to obtain the score.

  • run_rescore_ligand (bool) – whether to run the rescore stage for ligands.

  • al_node_supplier (ActiveLearningNodeSupplier) – Supplier of active learning nodes

Returns:

list of names of stages needed to complete the workflow

Return type:

list(str)
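A hypothetical sketch of how per-iteration stage names and a -stop_after cut-off might fit together. Only the LigandMLTrainNode_iter_N and LigandMLEvalNode_iter_N naming is taken from this page; the ScoreProviderNode_iter_N and RescoreNode names, the stage order, and ignoring task and use_known_score are assumptions.

```python
from typing import List, Sequence

# Hypothetical per-iteration stages, in an assumed order. Only the
# LigandMLTrainNode/LigandMLEvalNode naming appears on this page.
PER_ITERATION_STAGES = ["ScoreProviderNode", "LigandMLTrainNode",
                        "LigandMLEvalNode"]


def workflow_node_names_sketch(num_iter: int,
                               run_rescore_ligand: bool) -> List[str]:
    names = [f"{stage}_iter_{i}"
             for i in range(1, num_iter + 1)
             for stage in PER_ITERATION_STAGES]
    if run_rescore_ligand:
        names.append("RescoreNode")  # hypothetical rescore stage name
    return names


def nodes_until(names: Sequence[str], stop_after: str) -> List[str]:
    """Stages to run when the workflow should exit after `stop_after`."""
    if stop_after == "FinishAll":
        return list(names)
    return list(names[: names.index(stop_after) + 1])
```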

schrodinger.active_learning.al_driver.validate_stop_after(stop_after_node, task, num_iter, use_known_score, run_rescore_ligand, restart_file, al_node_supplier: ActiveLearningNodeSupplier)

Check whether the node name the user specified with -stop_after is valid.

Supports explicit stage names, iteration-based patterns, and one special value:
  • Explicit: ‘LigandMLEvalNode_iter_1’, ‘LigandMLTrainNode_iter_2’, etc.

  • Iteration-based: ‘iter_1’, ‘iter_2’, etc. (automatically converted to the LigandMLEvalNode stage of that iteration)

  • Special: ‘FinishAll’ to run all remaining stages

Parameters:
  • stop_after_node – Name of the node where workflow will exit when finished

  • task – Workflow task type

  • num_iter – Number of iterations

  • use_known_score – Use known scores to obtain the score

  • run_rescore_ligand – Run rescore stage for ligands

  • restart_file – Restart file path for loading previous progress

  • al_node_supplier – Supplier of active learning nodes

Returns:

error_message, converted_stop_after_node

schrodinger.active_learning.al_driver.validate_and_update_stop_after_args(args: Namespace, task: str, num_iter: int, use_known_score: bool, run_rescore_ligand: bool, restart_file: str, al_node_supplier: ActiveLearningNodeSupplier) → Tuple[bool, Optional[str]]

Convenience function for AL drivers to validate and update args.stop_after.

This provides a clean interface that handles the common pattern used across all AL drivers while maintaining backward compatibility.

Parameters:
  • args – Argument namespace (args.stop_after will be updated in place)

  • task – Workflow task type

  • num_iter – Number of iterations

  • use_known_score – Use known scores to obtain the score

  • run_rescore_ligand – Run rescore stage for ligands

  • restart_file – Restart file path for loading previous progress

  • al_node_supplier – Supplier of active learning nodes

Returns:

validation_success, error_message
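A hypothetical sketch of the in-place update pattern: validate args.stop_after, replace it with the converted value on success, and report (validation_success, error_message). The pass-through when stop_after is unset and the toy_validate rule are assumptions for the demo; the real function delegates to the validation described above.

```python
import argparse
import re
from typing import Callable, Optional, Tuple

# (error_message, converted_value), as documented for validate_stop_after.
ValidateFn = Callable[[str], Tuple[Optional[str], str]]


def update_stop_after_sketch(args: argparse.Namespace,
                             validate: ValidateFn) -> Tuple[bool, Optional[str]]:
    """Hypothetical analogue of validate_and_update_stop_after_args."""
    if getattr(args, "stop_after", None) is None:
        return True, None  # assumption: nothing to validate
    error, converted = validate(args.stop_after)
    if error:
        return False, error
    args.stop_after = converted  # updated in place
    return True, None


def toy_validate(value: str) -> Tuple[Optional[str], str]:
    # Toy rule for the demo: accept FinishAll and iter_1..iter_3 only.
    if value == "FinishAll":
        return None, value
    match = re.fullmatch(r"iter_([1-3])", value)
    if match is None:
        return f"Invalid stop_after value: {value}", value
    return None, f"LigandMLEvalNode_iter_{match.group(1)}"
```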

class schrodinger.active_learning.al_driver.ActiveLearningJob(args, al_node_supplier)

Bases: object

__init__(args, al_node_supplier)

Initialize the ActiveLearningJob from the command-line arguments.

Parameters:

args (argparse.Namespace) – argument namespace with command line options

static LoadPreviousNodes(restart_file)

Load nodes that were finished in the previous job.

Parameters:

restart_file (str) – filename of the AL .pkl restart file

Returns:

Nodes that were finished in the previous job.

Return type:

OrderedDict that maps node name to node instance.

static getNodeClasses(use_known_score, al_node_supplier)

Return a list of node classes to run based on the job type.

Parameters:
  • use_known_score (bool) – Use known scores in score_file to obtain the score.

  • al_node_supplier (ActiveLearningNodeSupplier) – Supplier of active learning nodes

Returns:

a list of ActiveLearningNode subclasses

Return type:

list

property most_recent_pred_file: Optional[str]

Get the most recent prediction file.

Returns:

most recent prediction file

LoadOptionalRestartFiles()

Load the restart files that may be needed to restart the running node.

Returns:

list of filenames

Return type:

list(str) or None

LoadOptionalRestartFilesDict()

Load a dict of the optional restart files. The dictionary maps the jobname of the subjob to a list of optional files for restarting the subjob.

Returns:

dict of optional files

Return type:

{str: list}

addOptionalRestartFilesDict()

Add optional_restart_files_dict to the restart dictionary and dump the restart dictionary to a pickle file.

SetupLigandsFromZippedDir()

Extracts the ligands from the expected .zip archive transferred from the launch directory. The archive is extracted to a directory named after the archive, without the extension. The ligands are then shuffled and validated.
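A minimal sketch of the extraction and shuffling steps using zipfile and random. The function name is hypothetical, validation is omitted, and archives from untrusted sources would need additional path checks before extractall().

```python
import os
import random
import zipfile
from typing import List, Optional


def extract_and_shuffle(zip_path: str, seed: Optional[int] = None) -> List[str]:
    """Extract zip_path next to itself and return its files in random order."""
    # ligands.zip is extracted into a sibling directory named ligands/
    out_dir = os.path.splitext(zip_path)[0]
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
    files = sorted(
        os.path.join(root, name)
        for root, _dirs, names in os.walk(out_dir)
        for name in names
    )
    random.Random(seed).shuffle(files)  # sorted first so a seed is reproducible
    return files
```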

DelayedInputValidation()

Validation of “big” data inputs, performed once the driver is running. Only the first 5 entries of each file are evaluated, to minimize validation overhead.
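Checking only the head of each file can be sketched with itertools.islice. This assumes one entry per line (as in a SMILES file) and a caller-supplied is_valid predicate; validate_head is a hypothetical name, and the real entry format and checks are not described here.

```python
import itertools
from typing import Callable, Dict, List


def validate_head(paths: List[str], is_valid: Callable[[str], bool],
                  n: int = 5) -> Dict[str, List[int]]:
    """Map each file to the 1-based line numbers among its first n that fail."""
    bad: Dict[str, List[int]] = {}
    for path in paths:
        with open(path) as fh:
            head = itertools.islice(fh, n)  # never reads past line n
            failures = [i for i, line in enumerate(head, start=1)
                        if not is_valid(line)]
        if failures:
            bad[path] = failures
    return bad
```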

configure()

Prepare the active learning job.

property scored_csv_file_list

Get all the .csv files that contain scored ligands from ScoreProviderNode.

Returns:

list of .csv files containing scored ligands.

Return type:

list(str)

property restart_files

Get all the necessary files for restarting the workflow from finished nodes.

Returns:

a set of files for restarting.

Return type:

set(str)

getPilotScoreFile()

Reorder the columns in the pilot ligand score file for use as machine learning model training input.

Returns:

name of the reordered .csv file.

Return type:

str
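A sketch of the column reordering with the csv module. The function name and the column names used in the test are hypothetical; this version moves the requested columns to the front and keeps the remaining columns after them, which may differ from the real method.

```python
import csv
from typing import List


def reorder_csv_columns(in_path: str, out_path: str, first: List[str]) -> str:
    """Copy in_path to out_path with the `first` columns moved to the front."""
    with open(in_path, newline="") as src, \
            open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        rest = [col for col in (reader.fieldnames or []) if col not in first]
        writer = csv.DictWriter(dst, fieldnames=list(first) + rest)
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
    return out_path
```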

checkOSFileLimit()

Check the system limit on open file descriptors.
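On POSIX systems the limit can be read with the standard resource module. A minimal sketch; the required threshold is a hypothetical parameter, since this page does not say how many descriptors the workflow needs.

```python
import resource  # POSIX only
from typing import Tuple


def check_file_limit(required: int) -> Tuple[bool, int]:
    """Return (ok, soft_limit) for the open-file-descriptor limit."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return True, soft  # no limit imposed
    return soft >= required, soft
```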

getNodesToRun()

Return the nodes to run and the finished nodes for the current active learning job.

splitInputfiles()

Randomly split the input files into small blocks.
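A sketch of random splitting, operating on an in-memory list of entries rather than files; split_into_blocks and block_size are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def split_into_blocks(entries: Sequence[str], block_size: int,
                      seed: Optional[int] = None) -> List[List[str]]:
    """Shuffle entries and cut them into blocks of at most block_size."""
    if block_size < 1:
        raise ValueError("block_size must be positive")
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + block_size]
            for i in range(0, len(shuffled), block_size)]
```

Shuffling before cutting means every block is a random sample of the input rather than a contiguous slice of it.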

getRestartNode()

Get the node for restarting the workflow.

Returns:

last finished node

Return type:

ActiveLearningNode

getInitialInputs()

Get the inputs for the runNode() method of the first node in the workflow.

Returns:

dict that contains the keyword arguments and values for the runNode() of the first node.

Return type:

dict{keyword argument: value}

getLocalArgs()
Returns:

arguments on the local machine.

Return type:

argparse.Namespace

runNodes()

Run all the ActiveLearningNode instances in self.nodes_to_run.

postprocessing()

Combine the log files of all subjobs.

getFromRestartFile(key: str) → dict

Retrieves data from a previous run’s restart file using the specified key.

Parameters:

key – The key to look up in the restart file dictionary

Returns:

The data associated with the key, or an empty dictionary if the key doesn’t exist or the restart file is not found

addToRestartFile(key: str, data: dict)

Store data under key in the restart dictionary and write the updated dictionary to the restart file.

Parameters:
  • key – Tag for storing information

  • data – Dictionary containing information to be saved

Side effects:
  • Updates self.restart_dict with new information

  • Writes updated restart_dict to restart file

  • Adds restart file to output files list
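A minimal pickle-based sketch of the documented behavior: reads return an empty dict when the file or key is missing, and writes update the in-memory dictionary, rewrite the file, and record it as an output file. RestartStore is hypothetical, and for simplicity it reads and writes the same file, whereas getFromRestartFile reads a previous run's file.

```python
import os
import pickle
from typing import Any, Dict, List


class RestartStore:
    """Hypothetical sketch of pickle-backed restart data keyed by tag."""

    def __init__(self, restart_file: str):
        self.restart_file = restart_file
        self.restart_dict: Dict[str, Any] = {}
        self.output_files: List[str] = []

    def getFromRestartFile(self, key: str) -> dict:
        # Empty dict when the file or the key is missing.
        if not os.path.exists(self.restart_file):
            return {}
        with open(self.restart_file, "rb") as fh:
            return pickle.load(fh).get(key, {})

    def addToRestartFile(self, key: str, data: dict) -> None:
        self.restart_dict[key] = data
        with open(self.restart_file, "wb") as fh:
            pickle.dump(self.restart_dict, fh)
        if self.restart_file not in self.output_files:
            self.output_files.append(self.restart_file)
```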

schrodinger.active_learning.al_driver.read_paths_listed_in_file(old_paths, paths_list_file)

Add the paths specified in the paths_list_file to old_paths.

Parameters:
  • old_paths (list) – None or list of original paths

  • paths_list_file (string) – path of the file that contains paths to be added

Returns:

list of paths

Return type:

list(str)
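A sketch of the documented behavior, assuming one path per line with surrounding whitespace stripped and blank lines skipped. Whether the real function mutates old_paths or returns a new list is not stated; this hypothetical version returns a new list.

```python
from typing import List, Optional


def read_paths_listed_in_file_sketch(old_paths: Optional[List[str]],
                                     paths_list_file: str) -> List[str]:
    """Return old_paths (or []) extended with one path per non-blank line."""
    paths = list(old_paths) if old_paths else []
    with open(paths_list_file) as fh:
        paths.extend(line.strip() for line in fh if line.strip())
    return paths
```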

schrodinger.active_learning.al_driver.restart_args_handler(args, script_name=None)

Load the previous arguments stored in args.restart_file.

Parameters:

args (argparse.Namespace) – argument namespace with command line options

Returns:

updated argument namespace, argument namespace of previous job or None

Return type:

argparse.Namespace, argparse.Namespace or None

schrodinger.active_learning.al_driver.common_parse_args(args, script_name=None)

Parses command-line arguments.

Parameters:

args (argparse.Namespace) – argument namespace with command line options

Returns:

argument namespace with command line options

Return type:

argparse.Namespace

schrodinger.active_learning.al_driver.common_validate_args(args, skip_validate_infile: bool = False) Tuple[bool, str]

Validate command-line arguments.

Parameters:
  • args – argument namespace with command line options

  • skip_validate_infile – Whether to skip validation of the infile list.

Returns:

A (bool, str) tuple: whether the args are valid and, if they are not, the corresponding error message.

schrodinger.active_learning.al_driver.common_get_job_spec_from_args(args, jsb)