schrodinger.application.desmond.task module

exception schrodinger.application.desmond.task.SubtaskExecutionError

Bases: RuntimeError

class schrodinger.application.desmond.task.Task(name: str, subtasks: Optional[List] = None)

Bases: object

This is a base class. An instance of this class defines a concrete task to be executed. All subclasses are expected to implement the __init__ and execute methods. execute should be either a public callable attribute or a public method. See ParchTrajectoryForFepLambda below for an example.

A task can be composed of one or more subtasks. The relationship among the premises of this task and its subtasks is the following:

  • If this task’s premises are not met, no subtasks will be executed.

  • Failure of one subtask will NOT affect other subtasks being executed.
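The two rules above can be illustrated with a minimal sketch. It does not use the real Task class: the database is a plain dict, and run_with_subtasks, ok, and broken are hypothetical names introduced only for this illustration.

```python
def run_with_subtasks(db, premises, subtasks):
    """Hypothetical helper, NOT part of schrodinger.application.desmond.task.

    db        -- plain dict standing in for the database
    premises  -- keys that the parent task requires
    subtasks  -- list of (name, callable) pairs; each callable takes db
    Returns (names_of_executed_subtasks, log_messages).
    """
    log = [f"missing premise: {key}" for key in premises if key not in db]
    if log:
        # The parent's premises are not met: no subtask is executed.
        return [], log
    executed = []
    for name, func in subtasks:
        try:
            func(db)
            executed.append(name)
        except Exception as err:
            # A failing subtask is logged; the remaining subtasks still run.
            log.append(f"{name}: {err}")
    return executed, log


def ok(db):
    return db["jobname"]


def broken(db):
    raise RuntimeError("boom")


subtasks = [("first", broken), ("second", ok)]
print(run_with_subtasks({"jobname": "fep1"}, ["jobname"], subtasks))
print(run_with_subtasks({}, ["jobname"], subtasks))
```

In the first call the failing subtask is logged and the second one still runs. In the second call the parent premise is missing, so nothing runs.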

Six public attributes/properties:

  • name - An arbitrary name for the task. Useful for error logging.

  • is_completed - A boolean value indicating if the particular task has been completed successfully.

  • results - A list of Datum objects as the results of the execution of the task. The data will be automatically put into the database.

  • log - A list of strings recording the error messages (if any) during the last execution of the task. The list is empty if there were no errors at all.

  • premises - A list of lists of Premise objects. The first list holds the premises of this Task object, followed by those of the first subtask, then those of the second subtask, and so on. Each element list can be empty.

  • options - Similar to premises except that the object type is Option.

__init__(name: str, subtasks: Optional[List] = None)
Parameters

name – An arbitrary name. Useful for error logging.

property premises
property options
clear()

Cleans the state of this object for a new execution.

execute(db: schrodinger.application.desmond.arkdb.ArkDb)

Executes this task. This should only be called after all premises of this task are met. The premises of the subtasks are ignored until the subtask is executed. Subclasses should implement execute, either as an instance method or as a public callable attribute of the instance. After execution, all results to be put into the database should be saved in the results attribute.

The first argument of execute should always be for the database.

class schrodinger.application.desmond.task.ParchTrajectoryForSolubilityFep(name, cms_fname_pattern: str, trj_fname_pattern: str, out_bname_pattern: str, num_solvent: int = 200)

Bases: schrodinger.application.desmond.task.Task

Task to parch the trajectory for the given FEP lambda state. The lambda state is represented by 0 and 1.

Results are all Datum objects:

  • key = “ResultLambda{fep_lambda}.ParchedTrajectoryFileName”, where {fep_lambda} is the value of the lambda state.

  • val = Name of the parched trajectory file

__init__(name, cms_fname_pattern: str, trj_fname_pattern: str, out_bname_pattern: str, num_solvent: int = 200)

The arguments cms_fname_pattern, trj_fname_pattern, and out_bname_pattern are plain strings that specify f-string patterns, which are evaluated later to obtain the corresponding file names. For example, "{jobname}_replica_{index}-out.cms" is a plain string that uses two f-string variables, {jobname} and {index}. The values of the f-string variables will be obtained on the fly when the task is executed. Currently, the following f-string variables are available for this task:

{jobname}    - The FEP job's name
{index}      - The index number of the replica corresponding to either
               the first lambda window or the last one, depending on
               the value of the `fep_lambda` argument.
class schrodinger.application.desmond.task.ParchTrajectoryForFepLambda(name, fep_lambda: int, result_lambda: int, cms_fname_pattern: str, trj_fname_pattern: str, out_bname_pattern: str, num_solvent: int = 200)

Bases: schrodinger.application.desmond.task.Task

Task to parch the trajectory for the given FEP lambda state. The lambda state is represented by 0 and 1.

Results are all Datum objects:

  • key = “ResultLambda{fep_lambda}.ParchedTrajectoryFileName”, where {fep_lambda} is the value of the lambda state.

  • val = Name of the parched trajectory file

We leave this class here (1) to explain how the framework basically works and (2) to demonstrate how to create a concrete Task subclass.

  • Introduction

    From the architectural point of view, one of the common and difficult issues in computation is perhaps data coupling: Current computation needs data produced by previous ones. It’s difficult because the coupling is implicit and across multiple programming units/modules/files, which often results in bugs when a code change in one place implicitly breaks code somewhere else.

    Taking this class as an example, the task is trivial when explained at the conceptual level: Call the trj_parch.py script with properly set options to generate a “parched” trajectory. But when we get into the details of incorporating this task in a workflow, it becomes very complicated, mostly because of the data coupling issue (which is the devil here): From the viewpoint of this task, we have to check the following data dependencies:

    1. The input files (the output CMS file and the trajectory file) exist.

    2. We identify the input files by file name patterns that depend on the current jobname, which is supposed to be stored in a (.sid) data file. So we have to ensure the jobname exists in the database. (Alternatively, we can pass the jobname through a series of function calls, but we won’t discuss the general issues of that approach.)

    3. To call trj_parch.py, we must set the -dew-asl and -fep-lambda options correctly. The values for these options are either stored in the .sid data file or passed into this class via an argument of the __init__ method.

    Furthermore, when any of these conditions is not met, informative error messages must be logged. All of this used to force the developer to write a LOT of boilerplate code to get/put data from the database, to check these conditions, and to log all errors, even for the most conceptually trivial task. More often than not, such boring (and repeated) code is either incomplete or not in place at all. And we take the risk of doing computations without verifying the data dependencies, until some code change breaks one of the conditions.

  • Four types of data

    We must realize where the coupling comes into the architecture of our software. For this, it helps to categorize data into the following types in terms of the source of the data:

    1. Hard coded data:

      • This type of data is hard coded and rarely needs to be modified or customized. Example: num_solvent=200.

    2. Arguments:

      • Data passed into the function by the caller code. Example: fep_lambda.

    3. From the database:

      • Examples: jobname, ligand ASL, number of lambda windows.

    4. Assumptions:

      • Assumptions are data generated by previous stages in a workflow but are out of the control of the task of interest. For example, we have to assume the CMS and trajectory files following certain naming patterns exist in the file system. In theory, the fewer the assumptions, the more robust the code. But in practice, it is very difficult (if not impossible) to totally avoid assumptions.

    Implicit data coupling happens for data of types (3) and (4).

  • The task framework

    The basic idea of this framework is to make data of types (3) and (4) more explicitly and easily defined in our code, which then makes it possible to automatically check their availability and log errors. For the type (3) data, we provide Premise and Option classes for getting the data. For the type (4) data, we have to rely on a convention to verify the assumptions, but utility functions are provided to make that easier and idiomatic. In both cases, when the data are unavailable, informative error messages will be automatically logged. The goal of this framework is to relieve the developer from writing a lot of boilerplate code and shift their attention to writing reusable tasks.

__init__(name, fep_lambda: int, result_lambda: int, cms_fname_pattern: str, trj_fname_pattern: str, out_bname_pattern: str, num_solvent: int = 200)

The arguments cms_fname_pattern, trj_fname_pattern, and out_bname_pattern are plain strings that specify f-string patterns, which are evaluated later to obtain the corresponding file names. For example, "{jobname}_replica_{index}-out.cms" is a plain string that uses two f-string variables, {jobname} and {index}. The values of the f-string variables will be obtained on the fly when the task is executed. Currently, the following f-string variables are available for this task:

{jobname}    - The FEP job's name
{fep_lambda} - Same value as that of the argument `fep_lambda`. It's
               either 0 or 1.
{result_lambda} - Same value as that of the argument `result_lambda`.
               It's either 0 or 1.
{index}      - The index number of the replica corresponding to either
               the first lambda window or the last one, depending on
               the value of the `fep_lambda` argument.
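As a rough illustration of how such a pattern can be resolved at execution time, the sketch below fills the placeholders with str.format. Whether the task itself uses str.format or another mechanism is an assumption here. The helper resolve and all the values (job name, index) are made up for the example.

```python
# A plain string, not an f-string literal: the placeholders stay unevaluated
# until values are supplied.
cms_fname_pattern = "{jobname}_replica_{index}-out.cms"


def resolve(pattern, **variables):
    """Hypothetical helper: fill the placeholders of `pattern` by name."""
    return pattern.format(**variables)


# Illustrative values only; in the real task they come from the database
# and from the constructor arguments.
values = {"jobname": "myfep", "fep_lambda": 1, "result_lambda": 1, "index": 11}

# Variables the pattern does not mention (fep_lambda, result_lambda) are
# simply ignored by str.format.
print(resolve(cms_fname_pattern, **values))
```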
class schrodinger.application.desmond.task.ParchTrajectoryForFep(name, num_solvent=200)

Bases: schrodinger.application.desmond.task.Task

Task to generate parched trajectories for both FEP lambda states. The lambda state is represented by 0 and 1.

Results are all Datum objects:

  • key = “ResultLambda0.ParchedCmsFname”

  • val = Name of the parched CMS file for lambda state 0: “lambda0-out.cms”

  • key = “ResultLambda1.ParchedCmsFname”

  • val = Name of the parched CMS file for lambda state 1: “lambda1-out.cms”

  • key = “ResultLambda0.ParchedTrjFname”

  • val = Name of the parched trajectory file for lambda state 0:

    "lambda0{ext}", where "{ext}" is the same extension of the input
    trajectory file name.

  • key = “ResultLambda1.ParchedTrjFname”

  • val = Name of the parched trajectory file for lambda state 1:

    "lambda0{ext}", where "{ext}" is the same extension of the input
    trajectory file name.
    

We leave this class here to demonstrate how to define a concrete Task subclass by composition.

__init__(name, num_solvent=200)
Parameters

name – An arbitrary name. Useful for error logging.

class schrodinger.application.desmond.task.ParchTrajectoryForAbsoluteFep(name, num_solvent=200)

Bases: schrodinger.application.desmond.task.Task

Task to generate the parched trajectory for the lambda state with the fully-interacting ligand.

Results are all Datum objects:

  • key = “ResultLambda0.ParchedCmsFname”

  • val = Name of the parched CMS file: “lambda0-out.cms”

  • key = “ResultLambda0.ParchedTrjFname”

  • val = Name of the parched trajectory file:

    "lambda0{ext}", where "{ext}" is the same extension of the input
    trajectory file name.
    
__init__(name, num_solvent=200)
Parameters

name – An arbitrary name. Useful for error logging.

class schrodinger.application.desmond.task.TrajectoryForSolubilityFep(name, num_solvent=200)

Bases: schrodinger.application.desmond.task.Task

Task to generate the parched trajectory for the lambda state with the fully-interacting molecule.

Results are all Datum objects:

  • key = “ResultLambda1.ParchedCmsFname”

  • val = Name of the parched CMS file: “lambda0-out.cms”

  • key = “ResultLambda1.ParchedTrjFname”

  • val = Name of the parched trajectory file:

    "lambda0{ext}", where "{ext}" is the same extension of the input
    trajectory file name.
    
__init__(name, num_solvent=200)
Parameters

name – An arbitrary name. Useful for error logging.

schrodinger.application.desmond.task.execute(arkdb: schrodinger.application.desmond.arkdb.ArkDb, tasks: Iterable[schrodinger.application.desmond.task.Task]) -> bool

Executes one or more tasks against the given database arkdb.

This function is guaranteed to do the following:

  1. This function will examine each task’s premises against the database.

  2. If the premises are NOT met, it skips the task; otherwise, it will proceed to check the task’s options against the database.

  3. After getting the premises and options data, it will call the task’s execute callable object. If the execution of the task is completed without errors, it will set the task’s is_completed attribute to True.

  4. During the above steps, errors (if any) will be logged in the task’s log list.

  5. After doing the above for all tasks, this function will return True if all tasks are completed without errors, or False otherwise.
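The five steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual implementation. SketchTask and run_tasks are hypothetical stand-ins, the database is a plain dict, premises are required keys, and options are optional keys with default values.

```python
class SketchTask:
    """Hypothetical stand-in for a task; NOT the real Task class."""

    def __init__(self, name, premises=(), options=None, func=None):
        self.name = name
        self.premises = list(premises)      # keys that must be in the database
        self.options = dict(options or {})  # optional key -> default value
        self.func = func                    # callable(required, optional) -> results
        self.is_completed = False
        self.results = []
        self.log = []


def run_tasks(db, tasks):
    """Run each task against `db` (a dict) following steps 1-5."""
    for task in tasks:
        task.is_completed = False
        task.log = []
        # Steps 1-2: check premises; skip the task if any is missing.
        missing = [key for key in task.premises if key not in db]
        if missing:
            task.log.extend(f"premise not found: {key}" for key in missing)
            continue
        required = {key: db[key] for key in task.premises}
        # Step 2 (continued): options fall back to their defaults.
        optional = {key: db.get(key, default)
                    for key, default in task.options.items()}
        # Steps 3-4: execute, record success or log the error.
        try:
            task.results = task.func(required, optional)
            task.is_completed = True
        except Exception as err:
            task.log.append(str(err))
    # Step 5: True only if every task completed without errors.
    return all(task.is_completed for task in tasks)


db = {"jobname": "fep1"}
good = SketchTask("good", ["jobname"], {"num_solvent": 200},
                  lambda req, opt: [(req["jobname"], opt["num_solvent"])])
bad = SketchTask("bad", ["ligand_asl"], func=lambda req, opt: [])
all_ok = run_tasks(db, [good, bad])
print(all_ok, good.results, bad.log)
```

Here the task named "bad" is skipped because its premise is absent from the dict, so the overall return value is False while "good" still completes.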

schrodinger.application.desmond.task.collect_logs(tasks: Iterable[schrodinger.application.desmond.task.Task]) -> List[str]

Iterates over the given Task objects, and aggregates the logs of uncompleted tasks into a list to return. The returned strings can be joined and printed out:

print("\n".join(collect_logs(...)))

and the text will look like the following:

task0: Task
  message
  another message
  another multiword message
task1: ConcreteTaskForTesting
  message
  another arbitrary message
  another completely arbitrary message

Note that the above is just an example to demonstrate the format as explained further below. Do NOT take the error messages literally. All the error messages here are unrelated to each other, and any patterns you might see are unintended!

So for each uncompleted task, the name and the class’ name of the task will be printed out, and following that are the error messages of the task, each in a separate line indented by 2 spaces.

Note that the purpose of returning a list of strings instead of a single string is to make it slightly easier to further indent the text. For example, if you want to indent the whole text by two spaces, you can do this:

print("  %s" % "\n  ".join(collect_logs(...)))

which will look like the following:

  task0: Task
    message
    another message
    another multiword message
  task1: ConcreteTaskForTesting
    message
    another arbitrary message
    another completely arbitrary message
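The format described for collect_logs can be reproduced with a small stand-alone sketch. The function format_logs and its tuple-based input are hypothetical and exist only to illustrate the layout. The real function takes Task objects.

```python
def format_logs(entries):
    """Hypothetical illustration of the log layout.

    entries -- iterable of (task_name, class_name, messages, is_completed)
    Returns a list of lines covering only the uncompleted tasks.
    """
    lines = []
    for name, cls_name, messages, is_completed in entries:
        if is_completed:
            continue  # completed tasks contribute nothing
        lines.append(f"{name}: {cls_name}")
        # Each message goes on its own line, indented by 2 spaces.
        lines.extend(f"  {msg}" for msg in messages)
    return lines


entries = [
    ("task0", "Task", ["message", "another message"], False),
    ("task1", "ConcreteTaskForTesting", ["message"], False),
    ("task2", "Task", [], True),
]
lines = format_logs(entries)
plain = "\n".join(lines)
# Because the result is a list, indenting the whole text is a one-liner.
indented = "  %s" % "\n  ".join(lines)
print(plain)
print(indented)
```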
class schrodinger.application.desmond.task.Premise(key)

Bases: schrodinger.application.desmond.arkdb.Datum

A premise here is a datum that must be available for a task (see the definition below) to be successfully executed.

__init__(key)

Creates a Datum object with the given key and the default value val. key’s value can be None, and in this case the get_from method will always return the default value val.

class schrodinger.application.desmond.task.Option(key: Optional[str], val=None)

Bases: schrodinger.application.desmond.arkdb.Datum

An option here is a datum that does NOT have to be available for a task (see the definition below) to be successfully executed.