schrodinger.job.jobcontrol module

Core job control for Python.

There are currently four major sections of this module: “Job database,” “Job launching,” “Job backend,” and “Job hosts.” The job database section deals with getting information about existing jobs, the job launching section deals with starting a subjob, the job backend section provides utilities for a Python script running as a job, and the job hosts section deals with the host entries defined in the schrodinger.hosts file.

Copyright Schrodinger, LLC. All rights reserved.

class schrodinger.job.jobcontrol.DisplayStatus(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: enum.Enum

WAITING = 'Waiting'
RUNNING = 'Running'
CANCELED = 'Canceled'
STOPPED = 'Stopped'
FAILED = 'Failed'
COMPLETED = 'Completed'
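As a minimal sketch of how these values might be consumed, the snippet below mirrors the enum locally so that it runs without a Schrodinger installation. The `TERMINAL_STATUSES` grouping and the `is_terminal` helper are assumptions made for illustration and are not part of the module.

```python
import enum


class DisplayStatus(enum.Enum):
    """Local mirror of the documented enum, for illustration only."""
    WAITING = 'Waiting'
    RUNNING = 'Running'
    CANCELED = 'Canceled'
    STOPPED = 'Stopped'
    FAILED = 'Failed'
    COMPLETED = 'Completed'


# Assumption: a job in one of these states will not change state again.
TERMINAL_STATUSES = frozenset({
    DisplayStatus.CANCELED,
    DisplayStatus.STOPPED,
    DisplayStatus.FAILED,
    DisplayStatus.COMPLETED,
})


def is_terminal(status):
    """Return True if the status is one of the assumed final states."""
    return status in TERMINAL_STATUSES


# Enum members can be looked up by their string value.
status = DisplayStatus('Running')
```

Comparing against enum members rather than raw strings avoids typos such as 'Cancelled' going unnoticed.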
schrodinger.job.jobcontrol.timestamp(msg)
exception schrodinger.job.jobcontrol.JobcontrolException

Bases: Exception

exception schrodinger.job.jobcontrol.JobLaunchFailure

Bases: schrodinger.job.jobcontrol.JobcontrolException, RuntimeError

exception schrodinger.job.jobcontrol.MissingHostsFileException

Bases: schrodinger.job.jobcontrol.JobcontrolException

exception schrodinger.job.jobcontrol.UnreadableHostsFileException

Bases: schrodinger.job.jobcontrol.JobcontrolException

class schrodinger.job.jobcontrol.Job(job_id: str, cpp_job: mmjob.Job = None)

Bases: object

A Job instance is always a snapshot of the job state at a specific point in time. It is only updated when the readAgain method is explicitly invoked.

__init__(job_id: str, cpp_job: mmjob.Job = None)

Initialize a read-only Job object.

Parameters
  • job_id – Unique identifier for a job

  • cpp_job – An in-memory C++ job object. Used when a Job is constructed inside wrapper objects returned from C++ APIs rather than constructed directly.

readAgain()

Reread the database. Calling this routine is necessary to get fresh values.

isComplete(wait_for_exited=False) bool

Returns True if the job is complete.

Note that this does not necessarily mean the output files have been downloaded.

Parameters

wait_for_exited – If set, wait for the job to be completed, as long as the job’s ExitStatus is set. (This only makes sense for legacy jobcontrol jobs.)
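Because a Job is a snapshot, any polling loop has to call readAgain() before each check. The sketch below illustrates that pattern using only the documented readAgain() and isComplete() methods. The helper name, its `interval`, `max_polls`, and `sleep` parameters, and the `FakeJob` stand-in are all hypothetical and exist only so the example runs without a Schrodinger installation.

```python
import time


def poll_until_complete(job, interval=5.0, max_polls=100, sleep=time.sleep):
    """Refresh a Job-like object until isComplete() is True.

    Returns True if the job completed within max_polls refreshes,
    False otherwise.
    """
    for _ in range(max_polls):
        job.readAgain()  # without this, the snapshot never changes
        if job.isComplete():
            return True
        sleep(interval)
    return False


class FakeJob:
    """Stand-in for a Job that completes after a few refreshes."""

    def __init__(self, refreshes_needed):
        self._remaining = refreshes_needed
        self._complete = False

    def readAgain(self):
        self._remaining -= 1
        self._complete = self._remaining <= 0

    def isComplete(self, wait_for_exited=False):
        return self._complete
```

Like wait(), a loop of this kind blocks the calling thread, so the same caution about running inside Maestro would apply.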

isQueued() bool

Returns True if the job runs on an HPC queueing system.

succeeded() bool

Returns False if the job was killed, died or fizzled. Returns True if ExitStatus is finished.

Raises

RuntimeError – if the job isn’t completed, so use isComplete() before calling.

wait_before_kill()
stop()

Kill the job while collecting output files.

kill()

Kill the job if it is running. This cancels a running job and does not return output files.

cancel()

Cancel a running job and do not return output files. This method is intended to eventually supersede job.kill.

kill_for_smart_distribution() bool

Kill the job for smart distribution. The cancellation succeeds only while the job is waiting or sitting in the queue.

Return True if canceled, False otherwise.

wait(max_interval: int = 60, throw_on_failure: bool = False)

Wait for the job to complete, sleeping up to ‘max_interval’ seconds between each database check. (The interval increases gradually from 2 seconds up to the maximum.)

NOTE: Do not use if your program is running in Maestro, as this will make Maestro unresponsive while the job is running.

Parameters

throw_on_failure (bool) – whether to raise an exception if not succeeded

Raises

RuntimeError – if the job did not succeed. The error message will contain the last 20 lines of the job’s logfile (if available).
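The gradually increasing sleep interval described for wait() can be pictured with a small generator. This is only a sketch: the documentation states that the interval starts at 2 seconds and is capped at max_interval, while the doubling `factor` used here is an assumption, and `poll_intervals` is a hypothetical name.

```python
import itertools


def poll_intervals(max_interval=60, start=2, factor=2):
    """Yield sleep intervals that grow from `start` up to `max_interval`.

    The growth factor is an assumption of this sketch; only the starting
    value and the cap come from the documentation.
    """
    interval = start
    while True:
        yield min(interval, max_interval)
        interval = min(interval * factor, max_interval)


# First eight sleep intervals with the default 60-second cap.
first_eight = list(itertools.islice(poll_intervals(60), 8))
```

Under these assumptions, `first_eight` is `[2, 4, 8, 16, 32, 60, 60, 60]`: short jobs are noticed quickly, while long jobs are checked at most once per max_interval.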

download()

Download the output of the job into the job’s launch directory. No-op in legacy jobcontrol.

get(attr, default=None)

This function will always raise an error, but is provided to guide users to a new syntax.

summary() str

Return a string summarizing all current Job attributes.

getDuration() Optional[int]

Returns the wallclock running time of the job, in seconds, if it is complete. This does not include time spent in submission status. If the job is not complete, returns None.

isDownloaded()

Check if output files were downloaded. For legacy job control, identical to isComplete().

Returns

Whether the job files were downloaded.

Return type

bool

property BatchId: Optional[str]

Return the batch id, if running on an HPC queueing system. Otherwise return None.

property Dir: str

Return the absolute path of the launch directory.

property ExitCode: Union[int, str]

Returns the exit code of a process. If the job is still running, or it was killed without collecting the exit code, return a string indicating unknown status.

property Host: str

Return the hostname of the host which launched this job.

property HostEntry: str

Return the name of the host entry this job was launched to.

property LaunchTime: str

Return a string timestamp for the time that the job was launched. This will be before the job starts running, as soon as it is registered with jobcontrol as a job to be run.

property JobId: str

Return an identifier for a job.

property Name: str

Returns a string representing -JOBNAME that was specified on launch. This may be an empty string.

property ParentJobId: Optional[str]

Return the jobid of a parent job. If the job does not have a parent, return None.

property Processors: int

For a batch job, returns the number of queue slots attached to this job. For a local job, return the number of CPU cores allowed to be used.

property Program: str

Return descriptive text for the name of the program running this job, e.g. Jaguar. This field is optional and may return an empty string.

property Version: str

Return the build number.

property Project: str

Return the job’s project name field. This will be an empty string if no project is set.

property QueueHost: str

Return the hostname of the submission node of an HPC queueing system. If not an HPC host, this will be an empty string.

property StructureOutputFile: str

Return the name of the file returned by the job that will be incorporated into a Maestro project. Returns an empty string if no file is specified.

property DisplayStatus: Optional[schrodinger.job.jobcontrol.DisplayStatus]

Return a user-focused status that indicates the current state of the job.

Returns None in the case of non JOB_SERVER jobs.

property StatusChangeReason: str

Returns a human-readable reason that a job entered its current state, such as “job canceled by the user.” If the reason was not recorded or is not particularly interesting (e.g. normal transition from waiting to running) it may be the empty string.

property Status: str

Get the Status of the job. This is used by the legacy jobcontrol API, but is superseded by DisplayStatus for JOB_SERVER jobs.

property StartTime: str

Return a string for the starting time of the job. Returns an empty string if the job is not yet started, for example, enqueued in an HPC environment.

property StopTime: str

Return a string for the completion time of the job. Returns an empty string if the job is not yet completed.

property StatusTime: str

Return a string for the time when the job was last updated.

property Viewname: str

Return a representation of the name used to filter jobs in Maestro. May be empty.

property ExitStatus: str

Get the ExitStatus of the job. This is a string representation of a job. Consider using DisplayStatus instead.

Raises

RuntimeError if the job is not yet complete.

property JobDir: str

Return the directory where the job is run. This will be an empty string if the job has not yet started.

property JobHost: str

Return the hostname where the job is run. This will be an empty string if the job has not yet started.

property JobSchrodinger: str

Return the directory of the Schrodinger installation where the job is running.

property Envs: List[str]

Return a list of environment variables that are set by the job, in addition to the default environment on a machine. The format is [“SPECIAL_VAR=0”, “SPECIAL_VAR2=yes”].
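A list in this “NAME=value” format is often easier to work with as a dictionary. The helper below is a minimal sketch of that conversion; `envs_to_dict` is a hypothetical name, and skipping entries that lack an “=” is an assumption rather than documented behavior.

```python
def envs_to_dict(envs):
    """Convert ['NAME=value', ...] into {'NAME': 'value', ...}.

    Splits on the first '=' only, so values may themselves contain '='.
    Entries without '=' are skipped (an assumption of this sketch).
    """
    result = {}
    for entry in envs:
        name, sep, value = entry.partition('=')
        if sep:
            result[name] = value
    return result


env_map = envs_to_dict(["SPECIAL_VAR=0", "SPECIAL_VAR2=yes", "OPTS=a=b"])
```

With a real Job object, the input would be the value of its Envs property.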

property Licenses: List[str]

Return a list of licenses needed for the job in the format ‘license_name:tokens’.
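The ‘license_name:tokens’ entries can be split into a name-to-count mapping. The sketch below assumes that the token count is always an integer; the function name and the license names in the example are hypothetical placeholders.

```python
def licenses_to_dict(licenses):
    """Convert ['name:tokens', ...] into {'name': tokens_as_int, ...}.

    Raises ValueError if the token count is not an integer.
    """
    result = {}
    for entry in licenses:
        # rpartition splits on the last ':' in case a name contains one.
        name, _, tokens = entry.rpartition(':')
        result[name] = int(tokens)
    return result


# Placeholder license names, not real product licenses.
needed = licenses_to_dict(["LICENSE_A:1", "LICENSE_B:8"])
```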

property Errors: List[str]

Return possible error messages associated with a job. This will only return values in legacy jobcontrol.

property LogFiles: List[str]

Get the list of log files associated with a job. May be an empty list.

property SubJobs: List[str]

Return list of subjob job ids.

property Commandline: str

Return the command used to launch the job.

Note that this may not be accurate when the job is called directly from a jobspec. In that case it will instead return the commandline of the parent process.

property User: str

Return the username of the user who launched the job.

getApplicationHeaderFields(default=None) Dict[str, str]

Returns a dictionary of commonly used jobcontrol keyword:value pairs used to standardize application log files.

Parameters

default (any) – Value assigned to a keyword if the corresponding attribute is not defined.

getApplicationHeaderString(field_sep: str = ' : ') str

Returns a formatted string of helpful information about the state of the job, suitable for printing at the top of a log file.

Parameters

field_sep – String that delimits the keyword and value.

Example:

import schrodinger.job.jobcontrol

backend = schrodinger.job.jobcontrol.get_backend()
if backend:
    # Print the standard job header at the top of the log.
    print(backend.getJob().getApplicationHeaderString())
getInputFiles() List[str]
property InputFiles: List[str]

Return list of files that will be transferred to the job directory on launch.

property JobDB: str

Path to the Job Database in legacy jobcontrol. This is an empty str for JOB_SERVER jobs.

property OrigLaunchDir: str

Return the launch directory of the oldest ancestor of this job.

property OrigLaunchHost: str

Return the hostname of the oldest ancestor of this job.

getOutputFiles() List[str]
property OutputFiles: List[str]

Return a list of output filenames which will be copied back, if existing, at the end of a job.

Note that this list can grow while the backend is running, since output files can be registered by the backend.

getProgressAsPercentage() float

Get the value of backend job progress in terms of percentage (values from 0.0 - 100.0)

Return 0.0 when a job is not yet in running state.

getProgressAsSteps() Tuple[int, int]

Get the value of backend job progress in terms of steps and totalsteps. Return (0,1) when a job is not yet in ‘running’ state.
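The documentation does not state how the step pair relates to the percentage value. As a sketch only, one plausible conversion from a (steps, totalsteps) pair to a percentage in the 0.0 - 100.0 range is shown below; `steps_to_percentage` is a hypothetical helper, and the clamping behavior is an assumption.

```python
def steps_to_percentage(steps, total_steps):
    """Convert (steps, total_steps) into a percentage in [0.0, 100.0]."""
    if total_steps <= 0:
        # Guard against division by zero for malformed progress data.
        return 0.0
    fraction = steps / total_steps
    return max(0.0, min(100.0, 100.0 * fraction))
```

Note that the documented not-yet-running value (0, 1) maps to 0.0 here, which matches what getProgressAsPercentage() is documented to return in that state.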

getProgressAsString() str

Get the value of backend job progress in terms of descriptive text. Return “The job has not yet started.” when a job is not yet in running state.

purgeRecord()

Purge the job record for the job from the database.

schrodinger.job.jobcontrol.launch_job(cmd: List[str], print_output: bool = False, expandvars: bool = True, launch_dir: Optional[str] = None, timeout: Optional[int] = None, env: Optional[Dict[str, str]] = None, show_failure_dialog: bool = True, _debug_delay=None) schrodinger.job.jobcontrol.Job

Run a process under job control and return a Job object. For a process to be under job control, it must print a valid JobId: line to stdout. If such a line isn’t printed, a RuntimeError will be raised.

The cmd argument should be a list of command arguments (including the executable) as expected by the subprocess module.

If the executable is present in $SCHRODINGER or $SCHRODINGER/utilities, an absolute path does not need to be specified.

NOTE: UI events will be processed while the job is launching.

Parameters
  • print_output – Determines if the output from job launch is printed to the terminal or not. Output will be logged (to stderr by default) if Python or JobControl debugging is turned on or if there is a launch failure, even if ‘print_output’ is False.

  • expandvars – If True, any environment variables of the form $var or ${var} will be expanded with their values by the os.path.expandvars function.

  • launch_dir – Launch the job from the specified directory. If unspecified use current working directory.

  • timeout – Timeout (in seconds) to be applied while waiting for the job control launch process to start or finish. The launch process will be terminated after this time. If None, the launch process will run with a default timeout of 300s under jobcontrol, or 40000s under job server.

  • env – This dictionary will replace the environment for the launch process. If env is None, use the current environment for the launch process.

  • show_failure_dialog – If True, show failure dialog if we detect we are using a graphical application and the job launch fails.

Raises
  • RuntimeError – If there is a problem launching the job (e.g., no JobId gets printed). If running within Maestro, an error dialog will first be shown to the user.

  • FileNotFoundError – If launch_dir doesn’t exist.
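A typical flow is to launch a command, wait for it, and then check the outcome. The sketch below shows that sequence using the documented wait(), succeeded(), and JobId members. To keep it runnable without a Schrodinger installation, the launcher is passed in as the `launch` argument and a `FakeJob` stand-in is used; both, along with the `run_to_completion` name, are hypothetical.

```python
def run_to_completion(cmd, launch, max_interval=60):
    """Launch `cmd`, block until the job finishes, and return the Job.

    `launch` is expected to behave like launch_job: it takes a list of
    command arguments and returns a Job-like object.
    """
    job = launch(cmd)
    # wait() sleeps between database checks; avoid it inside Maestro.
    job.wait(max_interval=max_interval)
    if not job.succeeded():
        raise RuntimeError(f"Job {job.JobId} did not succeed")
    return job


class FakeJob:
    """Stand-in so the sketch runs without a Schrodinger installation."""
    JobId = "fake-job-1"

    def __init__(self, ok):
        self._ok = ok

    def wait(self, max_interval=60, throw_on_failure=False):
        pass

    def succeeded(self):
        return self._ok


# The command is a list of arguments, executable first.
job = run_to_completion(["testapp", "-t", "30"],
                        launch=lambda cmd: FakeJob(True))
```

With an installation available, `launch` would be schrodinger.job.jobcontrol.launch_job. Calling wait() with throw_on_failure=True is an alternative to the explicit succeeded() check.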

schrodinger.job.jobcontrol.prepend_schrodinger_run(cmd: List[str]) List[str]

Check if a command executes a Python script and prepend $SCHRODINGER/run to the command if it does not already begin with it.

Parameters

cmd – Command to prepend $SCHRODINGER/run to.

schrodinger.job.jobcontrol.fix_cmd(cmd: List[str], expandvars: bool = True) List[str]

A function to clean up the command passed to launch_job.

Parameters
  • cmd – A list of strings for command line launching.

  • expandvars – If True, any environment variables of the form $var or ${var} will be expanded with their values by the os.path.expandvars function.

Returns

The command to be launched
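The expandvars option is described in terms of os.path.expandvars. The sketch below shows only that expansion step applied to a command list, not the full cleanup that fix_cmd performs. `expand_command` and the `DEMO_INSTALL` variable are hypothetical names chosen for the example.

```python
import os


def expand_command(cmd, expandvars=True):
    """Return a copy of `cmd` with $var and ${var} references expanded."""
    if not expandvars:
        return list(cmd)
    return [os.path.expandvars(arg) for arg in cmd]


os.environ["DEMO_INSTALL"] = "/opt/demo"  # hypothetical variable
expanded = expand_command(["$DEMO_INSTALL/run", "-o", "${DEMO_INSTALL}/out"])
untouched = expand_command(["$DEMO_INSTALL/run"], expandvars=False)
```

os.path.expandvars leaves references to undefined variables unchanged, so a misspelled variable name reaches the launched command as literal text.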

schrodinger.job.jobcontrol.list2jmonitorcmdline(cmdlist: List[str]) str

Turn a command in list form to a single string that can be executed by jmonitor.

schrodinger.job.jobcontrol.get_launch_command_without_toplevel()

Returns the command which can be used for launching the job without going through the toplevel script (i.e., without $SCHRODINGER/run). Launch arguments have to be appended to this command.

schrodinger.job.jobcontrol.input_file_arguments(job_spec, launch_parameters, write_output)

Return a set of file arguments (a list of (option, value) tuples) corresponding to the input files of a given job. If any of the input files are missing, raises an error.

schrodinger.job.jobcontrol.file_arguments_for_launch_command(file_args)

Given a set of “raw” file arguments, return the set of those to be used on an actual command line. If the given set is too long, the arguments will be written to an argfile. (It is the responsibility of the caller to remove that file after use.)

schrodinger.job.jobcontrol.total_file_arguments_length(args)

Determine the total length of the given set of file arguments (which is a list of 2-tuples) as they would be represented on the command line.

schrodinger.job.jobcontrol.write_argfile(file_args)

Write a set of file arguments to a temporary “argfile” (one option-value pair per line) and return the name of that file. (The caller is responsible for removing it.)

Parameters

file_args – A list of (option, value) tuples
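The caller-owns-the-file contract can be illustrated with a small stand-in. This is a sketch under stated assumptions: the space separator between option and value, the “.args” suffix, the function name, and the option names in the example are all hypothetical, since the actual argfile format is not specified here.

```python
import os
import tempfile


def write_argfile_sketch(file_args):
    """Write (option, value) pairs to a temp file, one pair per line.

    Returns the file name; the caller must remove the file after use.
    """
    with tempfile.NamedTemporaryFile(
            mode="w", suffix=".args", delete=False) as handle:
        for option, value in file_args:
            handle.write(f"{option} {value}\n")
    return handle.name


# Hypothetical option names and file names.
argfile = write_argfile_sketch([("-in", "a.mae"), ("-in", "b.mae")])
try:
    with open(argfile) as fh:
        lines = fh.read().splitlines()
finally:
    os.remove(argfile)  # cleanup is the caller's responsibility
```

Wrapping the use of the file in try/finally ensures it is removed even if launching fails.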

schrodinger.job.jobcontrol.get_launch_command_from_args(command_line_args: List[str], get_job_spec_from_args: Callable[[List[str]], schrodinger.job.launchapi.JobSpecification]) List[str]

Get a command-line that will launch a job based on a list of user-level command line arguments such as [“$SCHRODINGER/testapp”, “-t”, “30”, “-HOST”, “bolt_personal”]. This is preferable to shelling out to run the command because it avoids starting a new toplevel process stack. The overhead of the launch process stack can be significant because it is a shell script that starts a python process. The python process may import many modules, especially if the get_job_spec_from_args function is used.

Parameters
  • command_line_args – The arguments that would be used to launch the job on the command line, including job arguments such as -HOST and -NPROC.

  • get_job_spec_from_args – A function that takes the command line arguments and returns a JobSpecification object.

Returns

A list of arguments that can be executed to launch the job.

schrodinger.job.jobcontrol.launch_from_job_spec(job_spec, launch_parameters, display_commandline: Optional[str] = None, wait: bool = False) schrodinger.job.jobcontrol.Job

Launch a job based on its specification.

Parameters
  • job_spec (schrodinger.job.launchapi.JobSpecification) – Data defining the job.

  • launch_parameters (schrodinger.job.launchparams.LaunchParameters) – Data defining how the job is run

  • display_commandline – Value for the commandline attribute of the resulting job. Most callers will need to specify it; it is optional only to make it easier to refactor out in the future.

  • wait – Indicates that the job was launched with the option to wait, which helps decide whether downloaderd has to be started for a job server job.

Returns

A schrodinger.job.jobcontrol.Job object.

schrodinger.job.jobcontrol.get_backend() Optional[schrodinger.job.jobcontrol._Backend]

A convenience function to see if we’re running under job control. If so, return a _Backend object. Otherwise, return None.

schrodinger.job.jobcontrol.get_runtime_path(pathname: str) str

Return the runtime path for the input file ‘pathname’.

If the pathname is of a type that job control will not copy to the job directory or no runtime file can be found, returns the original path name.

schrodinger.job.jobcontrol.under_job_control() bool

Returns True if this process is running under job control; False otherwise.

class schrodinger.job.jobcontrol.Host(name: str)

Bases: object

A class to encapsulate host info from the schrodinger.hosts file.

Use the module level functions get_host or get_hosts to create Host instances.

Variables
  • name – Label for the Host.

  • user – Username by which to run jobs.

  • processors – Number of processors for the host/cluster.

  • processors_per_node – Number of processors per node on host/cluster.

  • tmpdir – Temporary/scratch directory to use for jobs. List.

  • schrodinger – $SCHRODINGER installation to use for jobs.

  • env – Variables to set in the job environment. List.

  • gpgpu – GPGPU entries. List.

  • queue – Queue entries only. Queue type (e.g., SGE, PBS).

  • qargs – Queue entries only. Optional arguments passed to the queue submission command.

__init__(name: str)

Create a named Host object. The various host attributes must be set after object instantiation.

Only host-entry fields can be public attributes of a Host object. Attributes introduced to capture other information about the entry must be private (named with a leading underscore.)

Parameters

name – name of the host entry.

to_hostentry() str

Return a string representation of the Host object suitable for including in a hosts file.

getHost() str

Return the name of the host, which defaults to ‘name’ if a separate ‘host’ attribute wasn’t specified.

setHost(host: str)

Store host as _host to allow us to use a property for the ‘host’ attr.

property host: str

Return the name of the host, which defaults to ‘name’ if a separate ‘host’ attribute wasn’t specified.

isQueue() bool

Check to see whether the host represents a batch queue. Returns True if the host is an HPC queueing system.

schrodinger.job.jobcontrol.get_hostfile() str

Return the name of the schrodinger.hosts file last used by get_hosts(). The file is found using the standard search path ($SCHRODINGER_HOSTS, local dir, $HOME/.schrodinger, $SCHRODINGER).

schrodinger.job.jobcontrol.hostfile_is_empty(host_filepath: str) bool

Return whether the given hosts file is empty, meaning it contains only the localhost entry. If host_filepath is an empty string or an invalid path, this function raises IOError.

Parameters

host_filepath (str) – schrodinger.hosts file to use.

schrodinger.job.jobcontrol.get_installed_hostfiles(root_dir='') List[str]

Return the pathname for the schrodinger.hosts file installed in the most recent previous installation directory we can find.

If a root pathname is passed in, previous installations are searched for there. Otherwise, we look in the standard install locations.

schrodinger.job.jobcontrol.get_hosts() List[schrodinger.job.jobcontrol.Host]

Return a list of all Hosts in the schrodinger.hosts file. After this is called, get_hostfile() will return the pathname for the schrodinger.hosts file that was used. Raises UnreadableHostsFileException or MissingHostsFileException on error.

schrodinger.job.jobcontrol.hostfile_is_valid(fname: str) Tuple[bool, str]
Parameters

fname – The full path of the host file to validate

Returns

a (bool, str) tuple indicating whether the host file is valid

schrodinger.job.jobcontrol.is_hostname_known(hostname: str) bool

Check whether hostname is defined in the host file. This function is used to distinguish known hosts from the automatically created localhost-equivalent Hosts provided by the get_host function.

Parameters

hostname – the hostname to check against the host file.

Returns

whether the hostname is in the host file.

schrodinger.job.jobcontrol.get_host(name: str) schrodinger.job.jobcontrol.Host

Return a Host object for the named host. If the host is not found, we return a Host object with the provided name and details that match localhost. This matches behavior that jobcontrol uses. Raises UnreadableHostsFileException or MissingHostsFileException on error.

schrodinger.job.jobcontrol.get_gpgpu_params(gpgpu_str: str) Tuple[str, str]

Convert a gpgpu string (ex. “0,V100”) to a tuple (index, description). Raise an exception if the string is invalid.

Parameters

gpgpu_str – gpgpu line from schrodinger.hosts (ex. “0,V100”)

Return type

tuple(str, str)

Raises

ValueError if the input is invalid
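The conversion can be approximated with a short parser. This is a sketch, not the module's implementation: the validation rules (a numeric index and a non-empty description) are assumptions, and `parse_gpgpu` is a hypothetical name.

```python
def parse_gpgpu(gpgpu_str):
    """Split 'index,description' into an (index, description) tuple."""
    index, sep, description = gpgpu_str.partition(',')
    index = index.strip()
    description = description.strip()
    # Assumption: the index must be a non-negative integer and the
    # description must be non-empty.
    if not sep or not index.isdigit() or not description:
        raise ValueError(f"Invalid gpgpu string: {gpgpu_str!r}")
    return index, description
```

The index is kept as a string to match the documented Tuple[str, str] return type.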

schrodinger.job.jobcontrol.host_str_to_list(hosts_str: str) List[Tuple[str, int]]

Convert a hosts string (Ex: “galina:1 monica:4”) to a list of tuples. First value of each tuple is the host, second value is # of cpus.

schrodinger.job.jobcontrol.host_list_to_str(host_list: List[Tuple[str, int]]) str

Converts a hosts list [(‘host1’,1), (‘host2’, 10)] to a string. Output example: “host1:1,host2:10”
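The two conversions are inverses of each other, which the following sketch illustrates with stand-in functions. Accepting both spaces and commas as separators, and defaulting a missing count to 1, are assumptions of this sketch. `parse_hosts` and `format_hosts` are hypothetical names, not the module's functions.

```python
import re


def parse_hosts(hosts_str, default_ncpu=1):
    """Parse 'host1:1 host2:4' into [('host1', 1), ('host2', 4)]."""
    hosts = []
    for token in re.split(r"[,\s]+", hosts_str.strip()):
        if not token:
            continue
        name, sep, count = token.partition(':')
        # A bare host name with no ':count' gets the default count.
        hosts.append((name, int(count) if sep else default_ncpu))
    return hosts


def format_hosts(host_list):
    """Format [('host1', 1), ('host2', 10)] as 'host1:1,host2:10'."""
    return ",".join(f"{name}:{ncpu}" for name, ncpu in host_list)
```

A round trip such as `parse_hosts(format_hosts(hosts))` returns the original list, which is a convenient property to check when passing host lists between processes.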

schrodinger.job.jobcontrol.get_command_line_host_list() Optional[List[Tuple[str, int]]]

Return a list of (host, ncpu) tuples corresponding to the host list that is specified on the command line.

This function is meant to be called by scripts that are running under a toplevel job control script but are not running under jlaunch.

The host list is determined from the following sources:
  1. SCHRODINGER_NODELIST

  2. JOBHOST (if only a single host is specified)

  3. “localhost” (if no host is specified)

If no SCHRODINGER_NODELIST is present in the environment, None is returned.

schrodinger.job.jobcontrol.get_backend_host_list() Optional[List[Tuple[str, int]]]

Return a list of (host, ncpu) tuples corresponding to the host list as determined from the SCHRODINGER_NODEFILE.

This function is meant to be called from scripts that are running under jlaunch (i.e. backend scripts).

Returns None if SCHRODINGER_NODEFILE is not present in the environment.

schrodinger.job.jobcontrol.get_host_list() List[Tuple[str, int]]

Return the host list for the current process. If running under jobcontrol, returns the backend host list; otherwise, returns a host list derived from parsing the commandline -HOST argument.

Returns

The job hosts from the backend or the command line. If the job hosts are undefined, the default return value is [(“localhost”, 1)].

schrodinger.job.jobcontrol.calculate_njobs(host_list: Union[str, List[Tuple[str, int]]] = None) int

Derive the number of jobs from the specified host list. This function is useful for determining the number of subjobs if the user didn’t specify the ‘-NJOBS’ option.

Parameters

host_list – Either a string of hosts with optional subjob counts, as passed to -HOST (e.g. “my_cluster:20”), or a list of (host, count) tuples, typically with one element (e.g. [(“my_cluster”, 20)]).

If host list is not specified then it uses get_command_line_host_list() to determine njobs, else uses the user provided host list.
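How the count is derived is not spelled out here. One plausible reading, shown as a sketch only, is that the per-host counts are summed. Both the summing rule and the treatment of a missing count as 1 are assumptions, and `count_jobs` is a hypothetical name.

```python
def count_jobs(host_list):
    """Sum the per-host counts in a list of (host, ncpu) tuples.

    A missing (None) or zero count is treated as 1.
    """
    return sum(ncpu or 1 for _host, ncpu in host_list)
```

Under these assumptions, a host list of [("my_cluster", 20)] would give 20 subjobs.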

schrodinger.job.jobcontrol.is_valid_hostname(hostname: str) bool

Checks if the hostname is valid.

Parameters

hostname – host name

schrodinger.job.jobcontrol.get_jobname(filename: Optional[str] = None) Optional[str]

Figure out the jobname from the first available source: 1) the SCHRODINGER_JOBNAME environment variable (comes from -JOBNAME during startup); 2) the job control backend; 3) the basename of a given filename.

Parameters

filename – if provided, and the jobname can’t otherwise be determined, (e.g., running outside job control with no -FILENAME argument), construct a jobname from its basename.

Returns

jobname (may be None if filename was not provided)
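The lookup order can be sketched as a chain of fallbacks. In this illustration, `backend_jobname` stands in for whatever the job control backend reports, `environ` allows a custom mapping to be supplied, and stripping the file extension from the basename is an assumption. None of these names are part of the module.

```python
import os


def resolve_jobname(filename=None, backend_jobname=None, environ=None):
    """Pick a job name from the first source that provides one."""
    environ = os.environ if environ is None else environ
    # 1) Environment variable set from -JOBNAME during startup.
    env_name = environ.get("SCHRODINGER_JOBNAME")
    if env_name:
        return env_name
    # 2) Name reported by the job control backend, if any.
    if backend_jobname:
        return backend_jobname
    # 3) Basename of the given file, without its extension.
    if filename:
        return os.path.splitext(os.path.basename(filename))[0]
    return None
```

Passing `environ` explicitly keeps the function easy to test without modifying the real process environment.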

schrodinger.job.jobcontrol.register_job_output(job: schrodinger.job.jobcontrol.Job)

Registers the output and log files associated with the given job to the backend if running under jobcontrol.

Parameters

job – job from which to extract output/log files