schrodinger.job.jobcontrol module
Core job control for Python.
This module currently has four major sections: “Job database,” “Job launching,” “Job backend,” and “Job hosts.” The job database section deals with getting information about existing jobs, the job launching section deals with starting a subjob, the job backend section provides utilities for a Python script running as a job, and the job hosts section deals with host entries from the schrodinger.hosts file.
Copyright Schrodinger, LLC. All rights reserved.
- class schrodinger.job.jobcontrol.DisplayStatus(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
enum.Enum
- WAITING = 'Waiting'¶
- RUNNING = 'Running'¶
- CANCELED = 'Canceled'¶
- STOPPED = 'Stopped'¶
- FAILED = 'Failed'¶
- COMPLETED = 'Completed'¶
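As a rough illustration of how these values might be consumed, the sketch below defines a stand-in enum with the same members (so it runs without a Schrodinger installation) and a hypothetical helper, `is_finished`. Treating CANCELED, STOPPED, FAILED and COMPLETED as final states is an assumption; this page does not state it.

```python
import enum


class DisplayStatus(enum.Enum):
    """Stand-in that mirrors the members documented above."""
    WAITING = 'Waiting'
    RUNNING = 'Running'
    CANCELED = 'Canceled'
    STOPPED = 'Stopped'
    FAILED = 'Failed'
    COMPLETED = 'Completed'


# Assumption: these four states are final; the page does not say so.
FINAL_STATES = frozenset({
    DisplayStatus.CANCELED,
    DisplayStatus.STOPPED,
    DisplayStatus.FAILED,
    DisplayStatus.COMPLETED,
})


def is_finished(status):
    """Hypothetical helper: True if status is set and is a final state."""
    return status is not None and status in FINAL_STATES
```

The `None` check matters because a job's DisplayStatus may be None for jobs that are not JOB_SERVER jobs.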
- schrodinger.job.jobcontrol.timestamp(msg)¶
- exception schrodinger.job.jobcontrol.JobcontrolException¶
Bases:
Exception
- exception schrodinger.job.jobcontrol.JobLaunchFailure¶
Bases:
schrodinger.job.jobcontrol.JobcontrolException, RuntimeError
- exception schrodinger.job.jobcontrol.MissingHostsFileException¶
- exception schrodinger.job.jobcontrol.UnreadableHostsFileException¶
- class schrodinger.job.jobcontrol.Job(job_id: str, cpp_job: mmjob.Job = None)¶
Bases:
object
A Job instance is always a snapshot of the job state at a specific point in time. It is only updated when the readAgain method is explicitly invoked.
- __init__(job_id: str, cpp_job: mmjob.Job = None)¶
Initialize a read-only Job object.
- Parameters
job_id – Unique identifier for a job
cpp_job – A C++ job object already in memory. Used when wrapping job objects returned by C++ APIs, rather than for direct construction.
- readAgain()¶
Reread the database. Calling this routine is necessary to get fresh values.
- isComplete(wait_for_exited=False) bool ¶
Returns True if the job is complete.
Note that this does not necessarily mean the output files have been downloaded.
- Parameters
wait_for_exited – If set, wait for the job to be completed, as long as the job’s ExitStatus is set. (This only makes sense for legacy jobcontrol jobs.)
- isQueued() bool ¶
Returns True if the job runs on an HPC queueing system.
- succeeded() bool ¶
Returns False if the job was killed, died or fizzled. Returns True if ExitStatus is finished.
- Raises
RuntimeError – if the job isn’t completed, so use isComplete() before calling.
- wait_before_kill()¶
- stop()¶
Kill the job while collecting output files.
- kill()¶
Kill the job if it is running. This cancels a running job and does not return output files.
- cancel()¶
Cancel a running job and do not return output files. This method is intended to eventually replace job.kill.
- kill_for_smart_distribution() bool ¶
Kill the job for smart distribution. This method cancels a job successfully when waiting or sitting in the queue.
Return True if canceled, False otherwise.
- wait(max_interval: int = 60, throw_on_failure: bool = False)¶
Wait for the job to complete, sleeping up to ‘max_interval’ seconds between database checks. (The interval increases gradually from 2 seconds up to the maximum.)
- NOTE: Do not use if your program is running in Maestro, as this will make Maestro unresponsive while the job is running.
- Parameters
throw_on_failure (bool) – Whether to raise an exception if the job did not succeed.
- Raises
RuntimeError – if the job did not succeed. The error message will contain the last 20 lines of the job’s logfile (if available).
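The polling behaviour described for wait() can be pictured as a capped, growing sequence of sleep intervals. The generator below is only a sketch: the text says the interval increases gradually from 2 seconds up to the maximum, and doubling is an assumption made here, not the documented growth rule. The name `polling_intervals` is hypothetical.

```python
import itertools


def polling_intervals(max_interval=60, start=2):
    """Yield sleep intervals that grow from `start` up to `max_interval`.

    Assumption: the interval doubles each time; the documentation only
    says it increases gradually from 2 seconds to the maximum.
    """
    interval = min(start, max_interval)
    while True:
        yield interval
        interval = min(interval * 2, max_interval)


# First seven intervals for the default maximum of 60 seconds.
first_seven = list(itertools.islice(polling_intervals(60), 7))
```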
- download()¶
Download the output of the job into the job’s launch directory. No-op in legacy jobcontrol.
- get(attr, default=None)¶
This function will always raise an error, but is provided to guide users to a new syntax.
- summary() str ¶
Return a string summarizing all current Job attributes.
- getDuration() Optional[int] ¶
Returns the wall-clock running time of the job, in seconds, if it is complete. This does not include time spent in the submission state. If the job is not complete, returns None.
- isDownloaded()¶
Check if output files were downloaded. For legacy job control, identical to isComplete().
- Returns
Whether the job files were downloaded.
- Return type
bool
- property BatchId: Optional[str]¶
Return the batch id, if running on an HPC queueing system. Otherwise return None.
- property Dir: str¶
Return the absolute path of the launch directory.
- property ExitCode: Union[int, str]¶
Returns the exit code of a process. If the job is still running, or it was killed without collecting the exit code, return a string indicating unknown status.
- property Host: str¶
Return the hostname of the host which launched this job.
- property HostEntry: str¶
Return the name of the host entry this job was launched to.
- property LaunchTime: str¶
Return a string timestamp for the time that the job was launched. This will be before the job starts running, as soon as it is registered with jobcontrol as a job to be run.
- property JobId: str¶
Return an identifier for a job.
- property Name: str¶
Returns a string representing -JOBNAME that was specified on launch. This may be an empty string.
- property ParentJobId: Optional[str]¶
Return the jobid of a parent job. If the job does not have a parent, return None.
- property Processors: int¶
For a batch job, returns the number of queue slots attached to this job. For a local job, return the number of CPU cores allowed to be used.
- property Program: str¶
Return descriptive text for the name of the program running this job, e.g. Jaguar. This field is optional and may return an empty string.
- property Version: str¶
Return the build number.
- property Project: str¶
Return the job’s project name field. This will be an empty string if no project is set.
- property QueueHost: str¶
Return the hostname of the submission node of an HPC queueing system. If not an HPC host, this will be an empty string.
- property StructureOutputFile: str¶
Return the name of the file returned by the job that will get incorporated into a project of Maestro. Returns an empty string if no file is specified.
- property DisplayStatus: Optional[schrodinger.job.jobcontrol.DisplayStatus]¶
Return a user-focused status that indicates the current state of the job.
Returns None in the case of non JOB_SERVER jobs.
- property StatusChangeReason: str¶
Returns a human-readable reason that a job entered its current state, such as “job canceled by the user.” If the reason was not recorded or is not particularly interesting (e.g. normal transition from waiting to running) it may be the empty string.
- property Status: str¶
Get the Status of the job. This is used by legacy jobcontrol API, but is superseded by DisplayStatus for JOB_SERVER jobs.
- property StartTime: str¶
Return a string for the starting time of the job. Returns an empty string if the job is not yet started, for example, enqueued in an HPC environment.
- property StopTime: str¶
Return a string for the completion time of the job. Returns an empty string if the job is not yet completed.
- property StatusTime: str¶
Return a string for the time when the job was last updated.
- property Viewname: str¶
Return a representation of the name used to filter jobs in Maestro. May be empty.
- property ExitStatus: str¶
Get the ExitStatus of the job. This is a string representation of a job. Consider using DisplayStatus instead.
- Raises
RuntimeError – if the job is not yet complete.
- property JobDir: str¶
Return the directory where the job is run. This will be an empty string if the job has not yet started.
- property JobHost: str¶
Return the hostname where the job is run. This will be an empty string if the job has not yet started.
- property JobSchrodinger: str¶
Return the directory of Schrodinger installation where the job is running.
- property Envs: List[str]¶
Return a list of environment variables that are set by the job, in addition to the default environment on a machine. The format is [“SPECIAL_VAR=0”, “SPECIAL_VAR2=yes”]
- property Licenses: List[str]¶
Return a list of licenses needed for the job in the format ‘license_name:tokens’.
- property Errors: List[str]¶
Return possible error messages associated with a job. This will only return values in legacy jobcontrol.
- property Commandline: str¶
Return the command used to launch the job.
Note that this may not be accurate when the job is called directly from a jobspec. In that case it will instead return the commandline of the parent process.
- property User: str¶
Return the username of user who launched the job.
- getApplicationHeaderFields(default=None) Dict[str, str] ¶
Returns a dictionary of commonly used jobcontrol keyword:value pairs used to standardize application log files.
- Parameters
default (any) – Value assigned to a keyword if the corresponding attribute is not defined.
- getApplicationHeaderString(field_sep: str = ' : ') str ¶
Returns a formatted string, suitable for printing at the top of a log file, with helpful information about the state of the job.
- Parameters
field_sep – String that delimits the keyword and value.
Example:
    import schrodinger.job.jobcontrol

    # get_backend() returns None when not running under job control
    backend = schrodinger.job.jobcontrol.get_backend()
    if backend:
        print(backend.getJob().getApplicationHeaderString())
- getInputFiles() List[str] ¶
- property InputFiles: List[str]¶
Return list of files that will be transferred to the job directory on launch.
- property JobDB: str¶
Path to the Job Database in legacy jobcontrol. This is an empty str for JOB_SERVER jobs.
- property OrigLaunchDir: str¶
Return the launch directory of the oldest ancestor of this job.
- property OrigLaunchHost: str¶
Return the hostname of the oldest ancestor of this job.
- getOutputFiles() List[str] ¶
- property OutputFiles: List[str]¶
Return a list of output filenames which will be copied back, if existing, at the end of a job.
Note that this list can grow while the backend is running, since output files can be registered by the backend.
- getProgressAsPercentage() float ¶
Get the value of backend job progress in terms of percentage (values from 0.0 - 100.0)
Return 0.0 when a job is not yet in running state.
- getProgressAsSteps() Tuple[int, int] ¶
Get the value of backend job progress in terms of steps and totalsteps. Return (0,1) when a job is not yet in ‘running’ state.
- getProgressAsString() str ¶
Get the value of backend job progress in terms of descriptive text. Return “The job has not yet started.” when a job is not yet in running state.
- purgeRecord()¶
Purge the job record for the job from the database.
- schrodinger.job.jobcontrol.launch_job(cmd: List[str], print_output: bool = False, expandvars: bool = True, launch_dir: Optional[str] = None, timeout: Optional[int] = None, env: Optional[Dict[str, str]] = None, show_failure_dialog: bool = True, _debug_delay=None) schrodinger.job.jobcontrol.Job ¶
Run a process under job control and return a Job object. For a process to be under job control, it must print a valid JobId: line to stdout. If such a line isn’t printed, a RuntimeError will be raised.
The cmd argument should be a list of command arguments (including the executable) as expected by the subprocess module.
If the executable is present in $SCHRODINGER or $SCHRODINGER/utilities, an absolute path does not need to be specified.
NOTE: UI events will be processed while the job is launching.
- Parameters
print_output – Determines if the output from job launch is printed to the terminal or not. Output will be logged (to stderr by default) if Python or JobControl debugging is turned on or if there is a launch failure, even if ‘print_output’ is False.
expandvars – If True, any environment variables of the form $var or ${var} will be expanded with their values by the os.path.expandvars function.
launch_dir – Launch the job from the specified directory. If unspecified use current working directory.
timeout – Timeout (in seconds) to be applied while waiting for the job control launch process to start or finish. The launch process will be terminated after this time. If None, the launch process will run with a default timeout of 300s under jobcontrol, or 40000s under job server.
env – This dictionary will replace the environment for the launch process. If env is None, use the current environment for the launch process.
show_failure_dialog – If True, show failure dialog if we detect we are using a graphical application and the job launch fails.
- Raises
RuntimeError – If there is a problem launching the job (e.g., no JobId gets printed). If running within Maestro, an error dialog will first be shown to the user.
FileNotFoundError – If launch_dir doesn’t exist.
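A typical caller launches a command and then blocks until the job finishes. The sketch below shows one possible shape for that, assuming it is not run inside Maestro (where wait() would make the UI unresponsive). The `launch` parameter, the `_DemoJob` stand-in, and the command `my_script.py` are hypothetical and exist only so the example can run without a Schrodinger installation; with `launch=None` the helper falls back to the documented launch_job.

```python
def launch_and_wait(cmd, launch=None, max_interval=60):
    """Launch `cmd` under job control and block until it finishes."""
    if launch is None:
        # Imported lazily; requires a Schrodinger Python environment.
        from schrodinger.job import jobcontrol
        launch = jobcontrol.launch_job
    job = launch(cmd)  # documented to raise RuntimeError on launch failure
    job.wait(max_interval=max_interval)
    return job.succeeded()


class _DemoJob:
    """Hypothetical stand-in Job that finishes immediately."""

    def wait(self, max_interval=60, throw_on_failure=False):
        pass

    def succeeded(self):
        return True


launched = []


def _demo_launch(cmd):
    """Hypothetical launcher that records the command it was given."""
    launched.append(list(cmd))
    return _DemoJob()


ok = launch_and_wait(['my_script.py', '-JOBNAME', 'demo'], launch=_demo_launch)
```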
- schrodinger.job.jobcontrol.prepend_schrodinger_run(cmd: List[str]) List[str] ¶
Check if a command executes a Python script and prepend $SCHRODINGER/run to the command if it does not already begin with it.
- Parameters
cmd – Command to prepend $SCHRODINGER/run to.
- schrodinger.job.jobcontrol.fix_cmd(cmd: List[str], expandvars: bool = True) List[str] ¶
A function to clean up the command passed to launch_job.
- Parameters
cmd – A list of strings for command line launching.
expandvars – If True, any environment variables of the form $var or ${var} will be expanded with their values by the os.path.expandvars function.
- Returns
The command to be launched
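The effect of `expandvars=True` can be approximated with the standard library alone. This is a sketch of the expansion step only; fix_cmd may perform other clean-up that is not described here. The helper name and the DEMO_DIR variable are hypothetical.

```python
import os


def expand_command_vars(cmd):
    """Expand $var and ${var} in every argument of a command list."""
    return [os.path.expandvars(arg) for arg in cmd]


os.environ['DEMO_DIR'] = '/scratch/demo'  # hypothetical variable
os.environ.pop('DEMO_UNSET', None)        # make sure this one is undefined

expanded = expand_command_vars(
    ['prog', '$DEMO_DIR/in.mae', '${DEMO_DIR}/out', '$DEMO_UNSET'])
```

Note that os.path.expandvars leaves references to undefined variables unchanged rather than replacing them with an empty string.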
- schrodinger.job.jobcontrol.list2jmonitorcmdline(cmdlist: List[str]) str ¶
Turn a command in list form to a single string that can be executed by jmonitor.
- schrodinger.job.jobcontrol.get_launch_command_without_toplevel()¶
Returns the command which can be used for launching the job without going through the toplevel script (i.e., without $SCHRODINGER/run). Launch arguments have to be appended to this command.
- schrodinger.job.jobcontrol.input_file_arguments(job_spec, launch_parameters, write_output)¶
Return a set of file arguments (a list of (option, value) tuples) corresponding to the input files of a given job. If any of the input files are missing, raises an error.
- schrodinger.job.jobcontrol.file_arguments_for_launch_command(file_args)¶
Given a set of “raw” file arguments, return the set of those to be used on an actual command line. If the given set is too long, the arguments will be written to an argfile. (It is the responsibility of the caller to remove that file after use.)
- schrodinger.job.jobcontrol.total_file_arguments_length(args)¶
Determine the total length of the given set of file arguments (which is a list of 2-tuples) as they would be represented on the command line.
- schrodinger.job.jobcontrol.write_argfile(file_args)¶
Write a set of file arguments to a temporary “argfile” (one option-value pair per line) and return the name of that file. (The caller is responsible for removing it.)
- Parameters
file_args – A list of (option, value) tuples
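The argfile helpers above can be pictured with a small standard-library sketch. Everything format-related here is an assumption: the option and value are joined by a single space, and the length is measured on the space-joined command line. The function names and the `-in`/`-out` options are hypothetical.

```python
import os
import tempfile


def total_length(file_args):
    """Length of the (option, value) pairs as one space-joined string."""
    flat = [item for pair in file_args for item in pair]
    return len(' '.join(flat))


def write_argfile_sketch(file_args):
    """Write one 'option value' pair per line; return the file name."""
    fd, path = tempfile.mkstemp(suffix='.args', text=True)
    with os.fdopen(fd, 'w') as handle:
        for option, value in file_args:
            handle.write(f"{option} {value}\n")
    return path


args = [('-in', 'a.mae'), ('-out', 'b.mae')]  # hypothetical options
path = write_argfile_sketch(args)
try:
    with open(path) as handle:
        lines = handle.read().splitlines()
finally:
    os.remove(path)  # the caller is responsible for removing the argfile
```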
- schrodinger.job.jobcontrol.launch_from_job_spec(job_spec, launch_parameters, display_commandline: Optional[str] = None, wait: bool = False) schrodinger.job.jobcontrol.Job ¶
Launch a job based on its specification.
- Parameters
job_spec (schrodinger.job.launchapi.JobSpecification) – Data defining the job.
launch_parameters (schrodinger.job.launchparams.LaunchParameters) – Data defining how the job is run
display_commandline – The commandline attribute of the resulting job. Most cases require this value to be specified; it is optional only to make it easier to refactor out in the future.
wait – Indicates that the job was launched with the option to wait, which helps decide whether downloaderd has to start for a jobserver job.
- Returns
A schrodinger.job.jobcontrol.Job object.
- schrodinger.job.jobcontrol.get_backend() Optional[schrodinger.job.jobcontrol._Backend] ¶
A convenience function to see if we’re running under job control. If so, return a _Backend object. Otherwise, return None.
- schrodinger.job.jobcontrol.get_runtime_path(pathname: str) str ¶
Return the runtime path for the input file ‘pathname’.
If the pathname is of a type that job control will not copy to the job directory or no runtime file can be found, returns the original path name.
- schrodinger.job.jobcontrol.under_job_control() bool ¶
Returns True if this process is running under job control; False otherwise.
- class schrodinger.job.jobcontrol.Host(name: str)¶
Bases:
object
A class to encapsulate host info from the schrodinger.hosts file.
Use the module level functions get_host or get_hosts to create Host instances.
- Variables
name – Label for the Host.
user – Username by which to run jobs.
processors – Number of processors for the host/cluster.
processors_per_node – Number of processors per node on host/cluster.
tmpdir – Temporary/scratch directory to use for jobs. List.
schrodinger – $SCHRODINGER installation to use for jobs.
env – Variables to set in the job environment. List.
gpgpu – GPGPU entries. List.
queue – Queue entries only. Queue type (e.g., SGE, PBS).
qargs – Queue entries only. Optional arguments passed to the queue submission command.
- __init__(name: str)¶
Create a named Host object. The various host attributes must be set after object instantiation.
Only host-entry fields can be public attributes of a Host object. Attributes introduced to capture other information about the entry must be private (named with a leading underscore.)
- Parameters
name – name of the host entry.
- to_hostentry() str ¶
Return a string representation of the Host object suitable for including in a hosts file.
- getHost() str ¶
Return the name of the host, which defaults to ‘name’ if a separate ‘host’ attribute wasn’t specified.
- setHost(host: str)¶
Store host as _host to allow us to use a property for the ‘host’ attr.
- property host: str¶
Return the name of the host, which defaults to ‘name’ if a separate ‘host’ attribute wasn’t specified.
- isQueue() bool ¶
Check to see whether the host represents a batch queue. Returns True if the host is an HPC queueing system.
- schrodinger.job.jobcontrol.get_hostfile() str ¶
Return the name of the schrodinger.hosts file last used by get_hosts(). The file is found using the standard search path ($SCHRODINGER_HOSTS, local dir, $HOME/.schrodinger, $SCHRODINGER).
- schrodinger.job.jobcontrol.hostfile_is_empty(host_filepath: str) bool ¶
Return whether the given hosts file is empty, meaning it contains only the localhost entry. If host_filepath is empty or invalid, this function raises IOError.
- Parameters
host_filepath (str) – schrodinger.hosts file to use.
- schrodinger.job.jobcontrol.get_installed_hostfiles(root_dir='') List[str] ¶
Return the pathname for the schrodinger.hosts file installed in the most recent previous installation directory we can find.
If a root pathname is passed in, previous installations are searched for there. Otherwise, we look in the standard install locations.
- schrodinger.job.jobcontrol.get_hosts() List[schrodinger.job.jobcontrol.Host] ¶
Return a list of all Hosts in the schrodinger.hosts file. After this is called, get_hostfile() will return the pathname for the schrodinger.hosts file that was used. Raises UnreadableHostsFileException or MissingHostsFileException on error.
- schrodinger.job.jobcontrol.hostfile_is_valid(fname: str) Tuple[bool, str] ¶
- Parameters
fname – The full path of the host file to validate
- Returns
a (bool, str) tuple indicating whether the host file is valid
- schrodinger.job.jobcontrol.is_hostname_known(hostname: str) bool ¶
Check whether hostname is defined in the host file. This function is used to distinguish known hosts from the automatically created localhost-equivalent Hosts provided by the get_host function.
- Parameters
hostname – the hostname to check against the host file.
- Returns
whether the hostname is in the host file.
- schrodinger.job.jobcontrol.get_host(name: str) schrodinger.job.jobcontrol.Host ¶
Return a Host object for the named host. If the host is not found, we return a Host object with the provided name and details that match localhost. This matches behavior that jobcontrol uses. Raises UnreadableHostsFileException or MissingHostsFileException on error.
- schrodinger.job.jobcontrol.get_gpgpu_params(gpgpu_str: str) Tuple[str, str] ¶
Convert a gpgpu string (ex. “0,V100”) to a tuple (index, description). Raise an exception if the string is invalid.
- Parameters
gpgpu_str – gpgpu line from schrodinger.hosts (ex. “0,V100”)
- Return type
tuple(str, str)
- Raises
ValueError – if the input is invalid
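The conversion can be sketched in a few lines of plain Python. What counts as invalid is not spelled out above, so the rules here are assumptions: the index must be a decimal integer, the description must be non-empty, and only the first comma separates them. The name `parse_gpgpu` is hypothetical.

```python
def parse_gpgpu(gpgpu_str):
    """Split a gpgpu entry such as "0,V100" into (index, description)."""
    index, sep, description = gpgpu_str.partition(',')
    index, description = index.strip(), description.strip()
    # Assumed validity rules; the real function may accept other forms.
    if not sep or not index.isdigit() or not description:
        raise ValueError(f"invalid gpgpu entry: {gpgpu_str!r}")
    return index, description


parsed = parse_gpgpu("0,V100")

try:
    parse_gpgpu("V100")  # no index, so this is rejected
    rejected = False
except ValueError:
    rejected = True
```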
- schrodinger.job.jobcontrol.host_str_to_list(hosts_str: str) List[Tuple[str, int]] ¶
Convert a hosts string (Ex: “galina:1 monica:4”) to a list of tuples. The first value of each tuple is the host; the second is the number of CPUs.
- schrodinger.job.jobcontrol.host_list_to_str(host_list: List[Tuple[str, int]]) str ¶
Converts a hosts list [(‘host1’,1), (‘host2’, 10)] to a string. Output example: “host1:1,host2:10”
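The two host conversions are inverses of each other, which a short pure-Python sketch makes concrete. The input and output examples come from the descriptions above; accepting both whitespace and commas as separators, and defaulting to one CPU when ":N" is missing, are assumptions. The names `parse_hosts` and `format_hosts` are hypothetical.

```python
import re


def parse_hosts(hosts_str):
    """Parse "galina:1 monica:4" into [("galina", 1), ("monica", 4)]."""
    hosts = []
    for entry in re.split(r'[\s,]+', hosts_str.strip()):
        if not entry:
            continue
        name, _, ncpu = entry.partition(':')
        # Assumption: a bare host name counts as one CPU.
        hosts.append((name, int(ncpu) if ncpu else 1))
    return hosts


def format_hosts(host_list):
    """Format [("host1", 1), ("host2", 10)] as "host1:1,host2:10"."""
    return ','.join(f"{name}:{ncpu}" for name, ncpu in host_list)


parsed = parse_hosts("galina:1 monica:4")
formatted = format_hosts([("host1", 1), ("host2", 10)])
```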
- schrodinger.job.jobcontrol.get_command_line_host_list() Optional[List[Tuple[str, int]]] ¶
Return a list of (host, ncpu) tuples corresponding to the host list that is specified on the command line.
This function is meant to be called by scripts that are running under a toplevel job control script but are not running under jlaunch.
- The host list is determined from the following sources:
SCHRODINGER_NODELIST
JOBHOST (if only a single host is specified)
“localhost” (if no host is specified)
If no SCHRODINGER_NODELIST is present in the environment, None is returned.
- schrodinger.job.jobcontrol.get_backend_host_list() Optional[List[Tuple[str, int]]] ¶
Return a list of (host, ncpu) tuples corresponding to the host list as determined from the SCHRODINGER_NODEFILE.
This function is meant to be called from scripts that are running under jlaunch (i.e. backend scripts).
Returns None if SCHRODINGER_NODEFILE is not present in the environment.
- schrodinger.job.jobcontrol.get_host_list() List[Tuple[str, int]] ¶
Return the host list for the current process. If running under jobcontrol, returns the backend host list; otherwise, returns a host list derived from parsing the commandline -HOST argument.
- Returns
The job hosts from the backend or the command line. If the job hosts are undefined, the default return value is [(“localhost”, 1)].
- schrodinger.job.jobcontrol.calculate_njobs(host_list: Union[str, List[Tuple[str, int]]] = None) int ¶
Derive the number of jobs from the specified host list. This function is useful for determining the number of subjobs if the user did not specify the ‘-NJOBS’ option.
- Parameters
host_list – Either a string of hosts with an optional number of subjobs, as passed to -HOST (e.g. my_cluster:20), or a list of tuples of hosts, typically with one element, e.g. [(my_cluster, 20)].
If the host list is not specified, get_command_line_host_list() is used to determine njobs; otherwise the user-provided host list is used.
- schrodinger.job.jobcontrol.is_valid_hostname(hostname: str) bool ¶
Checks if the hostname is valid.
- Parameters
hostname – host name
- schrodinger.job.jobcontrol.get_jobname(filename: Optional[str] = None) Optional[str] ¶
Figure out the jobname from the first available source: 1) the SCHRODINGER_JOBNAME environment variable (comes from -JOBNAME during startup); 2) the job control backend; 3) the basename of a given filename.
- Parameters
filename – if provided, and the jobname can’t otherwise be determined, (e.g., running outside job control with no -FILENAME argument), construct a jobname from its basename.
- Returns
jobname (may be None if filename was not provided)
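The lookup order described above (environment variable, then backend, then file basename) can be sketched as a simple fallback chain. The `backend_jobname` parameter stands in for whatever the job control backend would report, and stripping the extension from the basename is an assumption. The name `resolve_jobname` and the `environ` parameter are hypothetical.

```python
import os


def resolve_jobname(filename=None, backend_jobname=None, environ=None):
    """Return the first available jobname, or None."""
    environ = os.environ if environ is None else environ
    from_env = environ.get('SCHRODINGER_JOBNAME')
    if from_env:
        return from_env          # 1) set from -JOBNAME during startup
    if backend_jobname:
        return backend_jobname   # 2) reported by the backend (stand-in)
    if filename:
        # 3) assumption: drop the directory and the extension
        return os.path.splitext(os.path.basename(filename))[0]
    return None


from_file = resolve_jobname('/tmp/dock_run.mae', environ={})
```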
- schrodinger.job.jobcontrol.register_job_output(job: schrodinger.job.jobcontrol.Job)¶
Registers the output and log files associated with the given job to the backend if running under jobcontrol.
- Parameters
job – job from which to extract output/log files