schrodinger.application.desmond.queue module¶
- class schrodinger.application.desmond.queue.Queue(hosts: str, max_job: int, max_retries: int, periodic_callback=None)¶
Bases:
object
- __init__(hosts: str, max_job: int, max_retries: int, periodic_callback=None)¶
- Parameters
hosts – string passed to -HOST.
max_job – Maximum number of jobs to run simultaneously.
max_retries – Maximum number of times to retry a failed job.
periodic_callback – Function to call periodically as the jobs run. This can be used to handle the halt message for stopping a running workflow.
- run()¶
Run jobs for all multisim stages.
Starts a separate JobDJ for each multisim stage:
    queue.push(jobs)
    queue.run()
    while jobs: <---------------|
        jobdj.run()             |
        multisim_jobs.finish()  |
        stage.capture()         |
        next_stage.push()       |
        next_stage.release()    |
        queue.push(next_jobs) --
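The loop in the diagram can be pictured with a small standalone sketch. It does not use the Schrodinger API: `run_stages`, `next_jobs_for`, and the string job names are hypothetical stand-ins, and the real `Queue.run()` may be organized differently. It shows only the general shape, where each finished batch may push a follow-up batch, with `periodic_callback` called once per pass.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


def run_stages(
    first_jobs: List[str],
    next_jobs_for: Callable[[List[str]], List[str]],
    periodic_callback: Optional[Callable[[], None]] = None,
) -> List[List[str]]:
    """Process batches of jobs until no stage produces further jobs.

    Hypothetical illustration only; jobs are plain strings here.
    """
    pending: Deque[List[str]] = deque([first_jobs])  # queue.push(jobs)
    history: List[List[str]] = []
    while pending:  # while jobs:
        batch = pending.popleft()
        if periodic_callback is not None:
            periodic_callback()  # e.g. a place to check for a halt message
        history.append(batch)  # stands in for jobdj.run() .. stage.capture()
        following = next_jobs_for(batch)  # next_stage.push() / release()
        if following:
            pending.append(following)  # queue.push(next_jobs)
    return history
```

The loop ends when a batch produces no follow-up jobs, which corresponds to the `while jobs:` condition in the diagram.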
- stop() int ¶
Attempt to stop the subjobs, but kill them if they do not stop in time.
- Returns
Number of subjobs killed due to a failure to stop.
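As an illustration of the "stop, then kill on timeout" behavior described for `stop()`, here is a minimal sketch built on toy objects. `ToySubjob`, `request_stop`, and the `timeout`/`poll` parameters are assumptions made for this example; they are not part of this module.

```python
import time
from typing import Iterable


class ToySubjob:
    """Hypothetical stand-in for a running subjob."""

    def __init__(self, stops_cleanly: bool):
        self.stops_cleanly = stops_cleanly
        self.running = True
        self.killed = False

    def request_stop(self) -> None:
        # A cooperative job exits when asked; an unresponsive one ignores it.
        if self.stops_cleanly:
            self.running = False

    def kill(self) -> None:
        self.running = False
        self.killed = True


def stop_or_kill(subjobs: Iterable[ToySubjob],
                 timeout: float = 0.05,
                 poll: float = 0.01) -> int:
    """Ask every subjob to stop, wait up to `timeout`, kill the rest.

    Returns the number of subjobs that had to be killed.
    """
    jobs = list(subjobs)
    for job in jobs:
        job.request_stop()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline and any(j.running for j in jobs):
        time.sleep(poll)
    killed = 0
    for job in jobs:
        if job.running:
            job.kill()
            killed += 1
    return killed
```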
- push(jobs: List[cmj.Job])¶
- property running_jobs: List[schrodinger.application.desmond.queue.JobAdapter]¶
- class schrodinger.application.desmond.queue.JobAdapter(*args, multisim_job=None, **kwargs)¶
Bases:
schrodinger.job.queue.JobControlJob
- __init__(*args, multisim_job=None, **kwargs)¶
Job constructor.
- Parameters
command – The command that runs the job.
command_dir – The directory from which to run the command.
name – The name of the job.
max_retries – Number of allowed retries for this job. If this is set, it is never overridden by the SCHRODINGER_MAX_RETRIES environment variable. If it is not set, the value of max_retries defined in JobDJ is used, and SCHRODINGER_MAX_RETRIES can be used to override this value at runtime. To prevent this job from being restarted altogether, set max_retries to zero.
timeout – Timeout (in seconds) after which the job will be killed. If None, the job is allowed to run indefinitely.
launch_timeout – Timeout (in seconds) for the job launch process to complete. If None, a default timeout is used for jobserver and old jobcontrol jobs (see get_default_timeout()), unless a value for the job timeout parameter is passed and is not greater than the default timeout.
launch_env_variables – A dictionary of environment variables to add when the jobcontrol job is launched. Each key is the name of a variable to set, and each value is the value to assign to it. These are added to any environment variables already present, but removed after the job has been launched.
kwargs – Additional keyword arguments. Provided for consistency of interface in subclasses.
resource_requirement – Whether the job will require special compute resources, such as GPU.
license_requirement – List of license tokens required for the job to be used for license checking when SMART_LICENSE_CHECK feature flag is turned on. This is useful for license checking the first job of the smart distribution launched directly to the localhost without canceling from the queue. The license requirements are not known until the job is launched. Each license token is in the form ‘TOKEN’ or ‘TOKEN:n’ where TOKEN is the name of the license, and n is the number of tokens.
smart_dist_eligible – Whether this job can be submitted via smart distribution (True) or not (False). This setting only comes into play if all other requirements (such as the resource_requirement, license requirement, number of processors, and smart distribution being turned on) are met. In other words, setting it to True will not force the job to run via smart distribution, but setting it to False will ensure that it does not.
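The license token format described for license_requirement ('TOKEN' or 'TOKEN:n') can be parsed with a few lines of Python. This is only a sketch: `parse_license_token` is a hypothetical helper, not a function of this module, and treating a bare 'TOKEN' as a count of one is an assumption.

```python
from typing import Tuple


def parse_license_token(spec: str) -> Tuple[str, int]:
    """Split 'TOKEN' or 'TOKEN:n' into (name, count).

    Assumption: a bare 'TOKEN' means a single token.
    """
    name, sep, count = spec.partition(":")
    if not name:
        raise ValueError(f"missing license name in {spec!r}")
    if not sep:
        return name, 1
    # int() raises ValueError for an empty or non-numeric count.
    return name, int(count)
```

For example, `parse_license_token("EXAMPLE_TOKEN:4")` gives the name `EXAMPLE_TOKEN` with a count of 4; the token name here is made up for illustration.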
- getCommand() List[str] ¶
Return the command used to run this job.
- maxFailuresReached(**kwargs)¶
Print an error summary, including the last 20 lines from each log file in the LogFiles list of the job record.
- acquireLicenseForSmartDistribution() bool ¶
Acquire and hold licenses for a smart distribution job. This makes sure the job won’t fail due to unavailable licenses.
Returns True if the licenses registered for the job are acquired, and False if they are not. If no licenses are registered, it always returns True to avoid preventing jobs from using the smart distribution feature. For legacy jobcontrol, the license check is not performed and the method always returns True. We want to use this feature as a pitch to move users to JOB_SERVER.
- addFinalizer(function: Callable[[schrodinger.job.queue.BaseJob], None], run_dir: Optional[str] = None)¶
Add a function to be invoked when the job completes successfully.
See also the add_multi_job_finalizer function.
- addGroupPrereq(job: schrodinger.job.queue.BaseJob)¶
Make all jobs connected to job prerequisites of all jobs connected to this Job.
- addLaunchEnv(key: str, val: str)¶
Adds the given environment key and value to the launch environment.
- Parameters
key – environment key to add to the launch environment.
val – environment value associated with the key to add to the launch environment.
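The launch_env_variables description in the constructor says that such variables are added for the launch and removed afterwards. One common way to get that behavior in plain Python is a context manager, sketched here. `temporary_env` is a hypothetical helper, and restoring a previously set value (rather than simply deleting the key) is an assumption; the library's own mechanism may differ.

```python
import os
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def temporary_env(extra: Dict[str, str]) -> Iterator[None]:
    """Set the given environment variables for the duration of the block."""
    # Remember the previous values (None means the key was not set).
    saved = {key: os.environ.get(key) for key in extra}
    os.environ.update(extra)
    try:
        yield
    finally:
        for key, old in saved.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old
```

A launch step would then run inside `with temporary_env({...}):`, so the variables never leak into later launches.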
- addPrereq(job: schrodinger.job.queue.BaseJob)¶
Add a job that is an immediate prerequisite for this one.
- cancel()¶
Send a kill request to the jobcontrol-managed job. This method will eventually deprecate JobControlJob.kill.
- cancelSubmitted(do_license_check: bool = False) schrodinger.job.queue.CancelSubmittedStatus ¶
If the job is still in the ‘submitted’ state, cancel it, purge the jobrecord and set the job handle to None. This tries to acquire licenses for the job before canceling from the queue if do_license_check is turned on.
- Parameters
do_license_check – Acquire licenses for the job before canceling from the queue.
Returns one of the CancelSubmittedStatus values.
- doCommand(host: str, local: bool = False)¶
Launch job on specified host using jobcontrol.launch_job().
- Parameters
host – Host on which the job will be executed.
local – Removed in JOB_SERVER.
- finalize()¶
Clean up after a job successfully runs.
- genAllJobs(seen: Optional[Set[schrodinger.job.queue.BaseJob]] = None) Generator[schrodinger.job.queue.BaseJob, None, None] ¶
A generator that yields all jobs connected to this one.
- genAllPrereqs(seen=None) Generator[schrodinger.job.queue.BaseJob, None, None] ¶
A generator that yields all jobs that are prerequisites of this one.
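To illustrate the role of the `seen` argument, here is a toy traversal of a prerequisite graph. `ToyJob` is a hypothetical class that merely mirrors the method names documented here; it is not the library's implementation, and the real traversal order may differ.

```python
from typing import Generator, Optional, Set


class ToyJob:
    """Hypothetical job that only tracks its immediate prerequisites."""

    def __init__(self, name: str):
        self.name = name
        self._prereqs: Set["ToyJob"] = set()

    def addPrereq(self, job: "ToyJob") -> None:
        self._prereqs.add(job)

    def getPrereqs(self) -> Set["ToyJob"]:
        return self._prereqs

    def genAllPrereqs(
        self, seen: Optional[Set["ToyJob"]] = None
    ) -> Generator["ToyJob", None, None]:
        # `seen` is shared across the recursion so that a job reachable
        # through several paths is yielded only once.
        if seen is None:
            seen = set()
        for job in self._prereqs:
            if job in seen:
                continue
            seen.add(job)
            yield job
            yield from job.genAllPrereqs(seen)
```

With `a` depending on `b` and `c`, and `b` depending on `c`, iterating `a.genAllPrereqs()` yields `b` and `c` exactly once each.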
- getCommandDir() str ¶
Return the launch/command directory name. If None is returned, the job will be launched in the current directory.
- getDuration() Optional[int] ¶
Return the duration of the Job as recorded by job server. The duration does not include queue wait time.
If the job is running or has not launched, returns None.
Note that this method makes a blocking call to the job server.
- getJob() Optional[schrodinger.job.jobcontrol.Job] ¶
Return the job record as a schrodinger.job.jobcontrol.Job instance.
Returns None if the job hasn’t been launched.
- getJobDJ() schrodinger.job.queue.JobDJ ¶
Return the JobDJ instance that this job has been added to.
- getPrereqs()¶
Return a set of all immediate prerequisites for this job.
- getStatusStrings() Tuple[str, str, str] ¶
Return a tuple of status strings for printing by JobDJ. The strings returned are (status, jobid, host).
- hasExited() bool ¶
Returns True if this job finished, successfully or not.
- hasStarted() bool ¶
Returns True if this job has started (is no longer waiting).
- init_count = 0¶
- isComplete() bool ¶
Returns True if this job finished successfully.
- kill()¶
Send a kill request to the jobcontrol-managed job.
- postCommand()¶
A method to restore things to the pre-command state.
- preCommand()¶
A method to make pre-command changes, like cd’ing to the correct directory to run the command in.
- retryFailure(max_retries: int = 0) bool ¶
This method will be called when the job has failed, and JobDJ needs to know whether the job should be retried or not.
JobDJ’s value for the max_retries parameter is passed in, to be used when the job doesn’t have its own max_retries value.
Return True if this job should be retried, otherwise False.
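The precedence rules stated here and in the constructor's max_retries description can be written out as a small sketch. Both functions are hypothetical helpers, not library code, and counting retries as "retries already used" is an assumption about how the limit is applied.

```python
from typing import Mapping, Optional


def effective_max_retries(job_max_retries: Optional[int],
                          jobdj_max_retries: int,
                          environ: Mapping[str, str]) -> int:
    """Resolve the retry limit for one job."""
    # A per-job value always wins, even over the environment variable.
    if job_max_retries is not None:
        return job_max_retries
    # Otherwise the environment variable may override JobDJ's value.
    override = environ.get("SCHRODINGER_MAX_RETRIES")
    if override is not None:
        return int(override)
    return jobdj_max_retries


def should_retry(retries_used: int,
                 job_max_retries: Optional[int],
                 jobdj_max_retries: int,
                 environ: Mapping[str, str]) -> bool:
    """Return True while the job still has retries left (assumed counting)."""
    limit = effective_max_retries(job_max_retries, jobdj_max_retries, environ)
    return retries_used < limit
```

Under these rules a job created with max_retries set to zero is never retried, whatever JobDJ or the environment says.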
- run(*args, **kwargs)¶
Run the job.
- The steps taken are as follows:
Execute the preCommand method for things like changing the working directory.
Call the doCommand to do the actual work of computation or job launching.
Call the postCommand method to undo the changes from the preCommand that need to be undone.
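The sequence above can be sketched with a toy class that records the order of its hooks. `ToyCommandJob` is hypothetical and only borrows the documented method names. Placing `setup()` between `preCommand()` and `doCommand()` follows the setup() entry on this page, while running `postCommand()` in a `finally` clause is an assumption.

```python
import os
from typing import List, Optional


class ToyCommandJob:
    """Hypothetical job that runs its 'command' inside command_dir."""

    def __init__(self, command_dir: str):
        self.command_dir = command_dir
        self.calls: List[str] = []
        self.ran_in: Optional[str] = None
        self._orig_dir: Optional[str] = None

    def preCommand(self) -> None:
        self.calls.append("preCommand")
        self._orig_dir = os.getcwd()
        os.chdir(self.command_dir)  # change into the launch directory

    def setup(self) -> None:
        self.calls.append("setup")

    def doCommand(self) -> None:
        self.calls.append("doCommand")
        self.ran_in = os.getcwd()  # stands in for the real work

    def postCommand(self) -> None:
        self.calls.append("postCommand")
        os.chdir(self._orig_dir)  # undo the change made in preCommand

    def run(self) -> None:
        self.preCommand()
        try:
            self.setup()
            self.doCommand()
        finally:
            # Assumption: restore the pre-command state even on failure.
            self.postCommand()
```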
- runsLocally() bool ¶
Return True if the job runs on the JobDJ control host, False if not. Jobs that run locally don’t need hosts. There is no limit on the number of locally run jobs.
- setup()¶
A method to do initial setup; executed after preCommand, just before doCommand.
- property state: schrodinger.job.queue.JobState¶
Return the current state of the job.
Note that this method can be overridden by subclasses that wish to provide for restartability at a higher level than unpickling BaseJob instances. For example, by examining some external condition (e.g. presence of output files) the state JobState.DONE could be returned immediately and the job would not run.
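The output-file example can be sketched as follows. `ToyJobState` and `ToyRestartableJob` are hypothetical stand-ins for `JobState` and a `BaseJob` subclass, so the names and the set of states are assumptions; only the idea of returning a "done" state when the output already exists comes from the text above.

```python
import enum
import os


class ToyJobState(enum.Enum):
    """Hypothetical, reduced stand-in for the library's JobState."""

    WAITING = "waiting"
    DONE = "done"


class ToyRestartableJob:
    """Reports DONE as soon as its expected output file exists."""

    def __init__(self, output_file: str):
        self.output_file = output_file
        self._state = ToyJobState.WAITING

    @property
    def state(self) -> ToyJobState:
        # External condition: an existing output file means the work was
        # already done in an earlier run, so the job need not run again.
        if os.path.exists(self.output_file):
            return ToyJobState.DONE
        return self._state
```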
- update()¶
Checks for changes in job status, and updates the object appropriately (marks for restart, etc).
- Raises
RuntimeError – if an unknown Job Status or ExitStatus is encountered.
- usesJobServer() bool ¶
Detect, by looking at the jobId, whether this job uses a job server.