schrodinger.application.matsci.genetic_optimization module

Classes and functions for the genetic optimization module.

Copyright Schrodinger, LLC. All rights reserved.

class schrodinger.application.matsci.genetic_optimization.PropertyInfo(key, units, is_positive, class_evaluator, class_kwargs)

Bases: tuple

class_evaluator

Alias for field number 3

class_kwargs

Alias for field number 4

is_positive

Alias for field number 2

key

Alias for field number 0

units

Alias for field number 1
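The field order listed above can be mirrored with a standard named tuple. This is a minimal sketch, assuming PropertyInfo behaves like a `collections.namedtuple`; the stand-in name `PropertyInfoSketch` and the sample values are illustrative and are not taken from the module.

```python
from collections import namedtuple

# Stand-in with the same field order as the documented PropertyInfo
# (hypothetical name, not the class from the module).
PropertyInfoSketch = namedtuple(
    'PropertyInfoSketch',
    ['key', 'units', 'is_positive', 'class_evaluator', 'class_kwargs'])

info = PropertyInfoSketch(
    key='r_m_Molecular_weight',
    units='g/mol',
    is_positive=True,
    class_evaluator=None,  # placeholder for an evaluator class
    class_kwargs={})

# Fields can be read by name or by the positional alias listed above.
print(info.units, info[1])
print(info.is_positive, info[2])
```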

class schrodinger.application.matsci.genetic_optimization.ClassEvaluator(structs, properties)

Bases: object

Manage a class evaluator.

SEPARATOR = '_'
HOST_STR = 'host_str'
__init__(structs, properties)

Create an instance.

Parameters
  • structs (list of schrodinger.structure.Structure) – contains input structures

  • properties (list) – contains Property

getBaseName(struct, aproperty)

Get the base name.

Parameters
  • struct (schrodinger.structure.Structure) – the structure

  • aproperty (Property) – the property

Return type

str

Returns

the base name

runIt()

Run it.

Raises

RuntimeError – for any issue

setQueue(job_dj)

Set a JobDJ to run jobs.

Parameters

job_dj (schrodinger.job.queue.JobDJ) – the queue

getQueue(properties, tpp=1, disable_smart_distribution=True)

Return a JobDJ to run jobs.

Parameters
  • properties (list) – contains Property

  • tpp (int) – the threads per process

  • disable_smart_distribution (bool) – if True, disable smart distribution regardless of the input tpp value

Return type

JobDJ

Returns

the queue

static getHostStr(host_str=None)

Return the host string.

Parameters

host_str (str) – the host string, for example ‘localhost:4’

Return type

str

Returns

the host string

class schrodinger.application.matsci.genetic_optimization.StructureEvaluator(structs, properties)

Bases: schrodinger.application.matsci.genetic_optimization.ClassEvaluator

Manage structure evaluation.

SMARTS_PATTERN_SEPARATOR = '_'
SMARTS_PROP = 'smarts'
MOL_WEIGHT_PROP = 'molecular_weight'
NATOMS_PROP = 'natoms'
NELEMENTS_PROP = 'nelements'
PROPERTIES = {'molecular_weight', 'natoms', 'nelements', 'smarts'}
SMARTS_KEY = 'i_matsci_SMARTS_property_%s'
MOL_WEIGHT_KEY = 'r_m_Molecular_weight'
NATOMS_KEY = 'i_m_Number_of_atoms'
NELEMENTS_KEY = 'i_m_Number_of_elements'
NO_UNITS = 'None'
MOL_WEIGHT_UNITS = 'g/mol'
PATTERNS = 'patterns'
__init__(structs, properties)

See parent class for documentation.

static getInfo(key, units, patterns=None, host_str=None)

Return a PropertyInfo.

Parameters
  • key (str) – the property key

  • units (str) – the property units

  • patterns (list) – the SMARTS patterns

  • host_str (str) – the host string, for example ‘localhost:4’

Return type

PropertyInfo

Returns

the property information

runIt()

Run it.

class schrodinger.application.matsci.genetic_optimization.ChemInformatics(structs, properties)

Bases: schrodinger.application.matsci.genetic_optimization.ClassEvaluator

Manage cheminformatics jobs.

UNITS = 'unknown'
PROP_BASE_NAME = None
CHECK_MSG = ''
OUT_EXT = '.mae'
__init__(structs, properties)

See parent class for documentation.

classmethod getInfo(name, units, model_file, tpp=1, host_str=None)

Return a PropertyInfo.

Parameters
  • cls (type) – the calling class

  • name (str) – the property name

  • units (str) – the property units

  • model_file (str) – the model file

  • tpp (int) – the threads per process

  • host_str (str) – the host string, for example ‘localhost:4’

Return type

PropertyInfo

Returns

the property information

getModelFile(aproperty)

Return the model file from the given property.

Parameters

aproperty (Property) – the property

Return type

str

Returns

the model file

copyModelFiles()

Copy the model files to the CWD.

makeMaestroInfile(struct, aproperty)

Make Maestro infile.

Parameters
  • struct (schrodinger.structure.Structure) – the structure

  • aproperty (Property) – the property

Return type

str

Returns

the Maestro input file name

getPropertyValue(property_outfile, prop_key=None)

Get the property value.

Parameters
  • property_outfile (str) – the property output file

  • prop_key (str or None) – if the property output file is a Maestro file then this specifies the structure property key

Raises

RuntimeError – if property output file doesn’t exist or doesn’t contain the property value

Return type

float

Returns

the property value

generateExtraInput(struct)

Generate any extra input files needed for the actual property evaluation.

Parameters

struct (schrodinger.structure.Structure) – the structure

getCmd(mae_infile, model_infile, prop_name, xtra_infile, property_outfile, job_name)

Get the command line.

Parameters
  • mae_infile (str) – Maestro file containing the structure

  • model_infile (str) – the cheminformatics model file

  • prop_name (str) – a property name possibly used to identify the property value

  • xtra_infile (str) – an additional input file

  • property_outfile (str) – name of the output file containing the property value

  • job_name (str) – the job name

Return type

list

Returns

the command line

runPrediction(struct)

Run the prediction.

Parameters

struct (schrodinger.structure.Structure) – the structure

Raises

RuntimeError – if it fails

setUp()

Do any necessary set up prior to the actual property prediction.

runIt()

Run it.

classmethod getModelFiles(property_lists)

Return file names of any model files.

Parameters

property_lists (list) – contains lists of property specifications

Return type

list

Returns

file names of the model files

classmethod checkModelFiles(model_files, host_str=None)

Check the given model files.

Parameters
  • model_files (list[str]) – the names of the model files

  • host_str (str) – the host string, for example ‘localhost:4’

Raises

RuntimeError – if there is an issue

static getAllModelFiles(property_lists, check=False, host_str=None)

Return file names of any model files.

Parameters
  • property_lists (list) – contains lists of property specifications

  • check (bool) – whether to check the model files

  • host_str (str) – the host string, for example ‘localhost:4’

Return type

list

Returns

file names of the model files

class schrodinger.application.matsci.genetic_optimization.CanvasKPLS(structs, properties)

Bases: schrodinger.application.matsci.genetic_optimization.ChemInformatics

Manage Canvas KPLS jobs.

EXT = 'kpls.tar.gz'
FP_TEXT_FILE = 'fpInfo.txt'
MODEL_OPTION = 'kpls_model'
KEY = 'r_matsci_KPLS_%s/%s'
PATH = '/scr/buildbot/savedbuilds/2024-3/NB/build-133/mmshare-v6.7/data/genetic_optimization/canvas_kpls_models'
FP_EXT = '.fp'
VALUE_PATTERN = re.compile('\\s*1\\s+unknown.*\\s+(-?\\d+\\.?\\d*)$')
ALLOWED_FP_TYPES = ['linear', 'maccs', 'radial', 'molprint2D', 'torsion', 'pairwise', 'triplet', 'quartet', 'dendritic']
SINGLE_PRECISION = 32
DOUBLE_PRECISION = 64
BIT_EXT = '-bit'
LABEL = 'canvasKPLS'
CHECK_MSG = "  Model files must correspond to a single property type and as input take fingerprint files featuring one of the following fingerprint types, ['linear', 'maccs', 'radial', 'molprint2D', 'torsion', 'pairwise', 'triplet', 'quartet', 'dendritic'], as well as either no atom type or one of the 12 known atom types (integers in [1, 12]).  See $SCHRODINGER/utilities/canvasFPGen -h for more details."
makeFingerPrintInfile(mae_infile, name, job_dj=None)

Make fingerprint infile or add the job to do so to the given queue.

Parameters
  • mae_infile (str) – the Maestro input file name

  • name (str) – the property name

  • job_dj (None or schrodinger.job.queue.JobDJ) – if given then add the fingerprint job to this queue and return

Raises

RuntimeError – if canvasFPGen fails

Return type

str

Returns

the Canvas fingerprint input file name

getPropertyValue(property_outfile, prop_key=None)

See parent class.
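The VALUE_PATTERN constant listed above is an ordinary regular expression whose single capture group holds a trailing number. The following is a minimal sketch of how such a pattern can pull a value out of a line of text. The sample lines are invented for illustration and are not real Canvas KPLS output, and the helper name `extract_value` is hypothetical.

```python
import re

# Same pattern as the documented VALUE_PATTERN constant.
VALUE_PATTERN = re.compile(r'\s*1\s+unknown.*\s+(-?\d+\.?\d*)$')


def extract_value(line):
    """Return the trailing number captured by the pattern, or None."""
    match = VALUE_PATTERN.match(line)
    return float(match.group(1)) if match else None


# Invented sample lines, only meant to exercise the pattern.
print(extract_value('  1  unknown  0.5  123.45'))
print(extract_value('1 unknown -3.2'))
print(extract_value('2 unknown 1.0'))
```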

generateExtraInput(struct)

See parent class.

makeFpOptionsDict()

Make the fingerprint options dictionary.

getCmd(mae_infile, model_infile, prop_name, xtra_infile, property_outfile, job_name)

See parent class.

setUp()

See parent class.

static getFpOptions(model_file)

Return fingerprint options obtained from the given Canvas KPLS model file.

Parameters

model_file (str) – the name of the Canvas KPLS model file

Return type

int, str, int or None

Returns

contains (1) precision, (2) fingerprint type, and (3) atom type if present

Raises

RuntimeError – if there is anything wrong with the Canvas KPLS model file
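The constraints spelled out in CHECK_MSG can be expressed as a small validation routine. This is a minimal sketch, assuming the three values returned by getFpOptions are checked against SINGLE_PRECISION and DOUBLE_PRECISION, ALLOWED_FP_TYPES, and the atom type range [1, 12]. The function `check_fp_options` is hypothetical, and the module's actual checks may differ.

```python
ALLOWED_FP_TYPES = [
    'linear', 'maccs', 'radial', 'molprint2D', 'torsion', 'pairwise',
    'triplet', 'quartet', 'dendritic'
]
SINGLE_PRECISION = 32
DOUBLE_PRECISION = 64


def check_fp_options(precision, fp_type, atom_type=None):
    """Hypothetical check; raise RuntimeError on an unsupported option."""
    if precision not in (SINGLE_PRECISION, DOUBLE_PRECISION):
        raise RuntimeError(f'Unsupported precision: {precision}')
    if fp_type not in ALLOWED_FP_TYPES:
        raise RuntimeError(f'Unsupported fingerprint type: {fp_type}')
    # An absent atom type is allowed; otherwise it must lie in [1, 12].
    if atom_type is not None and not 1 <= atom_type <= 12:
        raise RuntimeError(f'Unsupported atom type: {atom_type}')
    return precision, fp_type, atom_type


print(check_fp_options(32, 'radial', 4))
```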

class schrodinger.application.matsci.genetic_optimization.AutoQSAR(structs, properties)

Bases: schrodinger.application.matsci.genetic_optimization.ChemInformatics

Manage AutoQSAR jobs.

EXT = 'qzip'
MODEL_OPTION = 'auto_qsar_model'
KEY = 'r_matsci_Auto_QSAR_%s/%s'
PROP_BASE_NAME = 'r_autoqsar_Pred_{prop_name}'
IN_EXT = '.inp'
LABEL = 'autoqsar'
writeAutoQSARInFile(mae_infile, model_infile, base_name, prop_name)

Write the AutoQSAR input file.

Parameters
  • mae_infile (str) – the input Maestro file name

  • model_infile (str) – the input AutoQSAR model file name

  • base_name (str) – the base name to use for the AutoQSAR input file

  • prop_name (str) – the property name

Return type

str

Returns

the name of the AutoQSAR input file

generateExtraInput(struct)

See parent class.

getCmd(mae_infile, model_infile, prop_name, xtra_infile, property_outfile, job_name)

See parent class.

class schrodinger.application.matsci.genetic_optimization.DeepChem(structs, properties)

Bases: schrodinger.application.matsci.genetic_optimization.ChemInformatics

Manage DeepChem jobs.

EXT = 'qzip'
MODEL_OPTION = 'deep_autoqsar_model'
KEY = 'r_matsci_DeepAutoQSAR_%s/%s'
PROP_BASE_NAME = 'r_m_{prop_name}_score'
LABEL = 'deepautoqsar'
getCmd(mae_infile, model_infile, prop_name, xtra_infile, property_outfile, job_name)

See parent class.

class schrodinger.application.matsci.genetic_optimization.Jaguar(structs, properties)

Bases: schrodinger.application.matsci.genetic_optimization.ClassEvaluator

Manage Jaguar jobs.

JAGUAR_OPTIONS = 'jaguar_options'
TPP = 'tpp'
JAGUAR_OUTPUT_ATTR = 'jaguar_output_attr'
IN_EXT = '.in'
OUT_EXT = '.out'
__init__(structs, properties)

See parent class for documentation.

loadQueue(job_dj)

Load the JobDJ with jobs.

Parameters

job_dj (JobDJ) – the queue

postProcess(silent=False)

Post process the jobs and set the final results.

Parameters

silent (bool) – if True, don’t raise.

Raises

RuntimeError – if there is a problem

runIt(silent=False)

Run it.

Parameters

silent (bool) – if True, don’t raise.

Raises

RuntimeError – if there is a problem

class schrodinger.application.matsci.genetic_optimization.GlassTransitionTemperature(structs, properties)

Bases: schrodinger.application.matsci.genetic_optimization.CanvasKPLS

Manage glass transition temperature jobs.

UNITS = 'C'
KEY = 'r_matsci_KPLS_Tg/C'
PROP = 'kpls_tg'
FILE = 'Tg250.kpls.tar.gz'
static getInfo(host_str=None)

Return a PropertyInfo.

Parameters

host_str (str) – the host string, for example ‘localhost:4’

Return type

PropertyInfo

Returns

the property information
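The KEY constants in this module are templates. The KEY of GlassTransitionTemperature, 'r_matsci_KPLS_Tg/C', is consistent with filling the CanvasKPLS template with a property name and its units. Below is a minimal sketch of that substitution, assuming the first placeholder is the name and the second the units. The helper `make_kpls_key` is hypothetical.

```python
KPLS_KEY = 'r_matsci_KPLS_%s/%s'        # value of CanvasKPLS.KEY
CUSTOM_KEY = 'r_matsci_{name}/{units}'  # value of CustomClassEvaluator.KEY


def make_kpls_key(name, units):
    """Fill the %-style template (assumed order: name, then units)."""
    return KPLS_KEY % (name, units)


print(make_kpls_key('Tg', 'C'))
print(CUSTOM_KEY.format(name='stub', units='unknown'))
```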

class schrodinger.application.matsci.genetic_optimization.RefractiveIndex(structs, properties)

Bases: schrodinger.application.matsci.genetic_optimization.CanvasKPLS, schrodinger.application.matsci.genetic_optimization.Jaguar

Manage refractive index jobs.

UNITS = 'none'
KEY = 'r_matsci_Refractive_Index_298K/none'
PROP = 'refractive_index'
FILE = '01a_kpls5_amorphous_density_HT.kpls.tar.gz'
MASS_DENSITY_KEY = 'r_matsci_Mass_Density/g/cm^3'
ISOTROPIC_POLARIZABILITY_KEY = 'r_matsci_Isotropic_Polarizability/bohr^3'
__init__(structs, properties)

See parent classes for documentation.

getQueue(*args, **kwargs)

See parent class for documentation.

static getInfo(jaguar_options=None, tpp=1, host_str=None)

Return a PropertyInfo.

Parameters
  • jaguar_options (dict) – contains Jaguar options

  • tpp (int) – the threads per process

  • host_str (str) – the host string, for example ‘localhost:4’

Return type

PropertyInfo

Returns

the property information

classmethod setRefractiveIndexProperty(struct, mass_density, polarizability)

Set the refractive index property on the given structure.

Parameters
  • struct (schrodinger.structure.Structure) – the structure

  • mass_density (float) – the mass density in g/cm^3

  • polarizability (float) – the isotropically averaged polarizability in atomic units of bohr^3

Raises

ValueError – if total_weight is abnormally small or the polarizability is abnormally large
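One common way to relate a mass density and an isotropic polarizability to a refractive index is the Lorentz-Lorenz relation, (n^2 - 1)/(n^2 + 2) = (4π/3) N α, where N is the number density. Whether setRefractiveIndexProperty uses exactly this relation is not stated here, so the following is only a sketch under that assumption. The function name, the explicit molecular weight argument, and the error condition are all illustrative.

```python
import math

AVOGADRO = 6.02214076e23        # 1/mol
BOHR_IN_CM = 0.529177210903e-8  # cm


def lorentz_lorenz_index(mass_density, polarizability, molecular_weight):
    """Hypothetical helper based on the Lorentz-Lorenz relation.

    mass_density is in g/cm^3, polarizability in bohr^3, and
    molecular_weight in g/mol.
    """
    if molecular_weight <= 0:
        raise ValueError('molecular weight must be positive')
    alpha_cm3 = polarizability * BOHR_IN_CM**3
    number_density = mass_density * AVOGADRO / molecular_weight  # 1/cm^3
    x = 4.0 * math.pi / 3.0 * number_density * alpha_cm3
    # x >= 1 has no real solution, e.g. for a tiny weight or a huge alpha.
    if x >= 1.0:
        raise ValueError('weight too small or polarizability too large')
    return math.sqrt((1.0 + 2.0 * x) / (1.0 - x))


# Rough water-like inputs, for illustration only.
print(round(lorentz_lorenz_index(1.0, 9.8, 18.015), 2))
```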

runIt()

Run it.

class schrodinger.application.matsci.genetic_optimization.CustomClassEvaluator(structs, properties)

Bases: schrodinger.application.matsci.genetic_optimization.ClassEvaluator

Manage a custom class evaluator.

UNITS = 'unknown'
KEY = 'r_matsci_{name}/{units}'
CUSTOM_CLASS_FILE_OPTION = 'custom_class_file'
UNITS_DICT = {'stub': 'unknown'}
RUN_DICT = {'stub': 'runStub'}
EXTRA_INPUT_FILES_DICT = {'stub': []}
__init__(structs, properties)

See parent class for documentation.

static getInfo(name, units, custom_class_file, tpp=1, host_str=None)

Return a PropertyInfo.

Parameters
  • name (str) – the property name

  • units (str) – the property units

  • custom_class_file (str) – the Python file containing the custom class

  • tpp (int) – the threads per process

  • host_str (str) – the host string, for example ‘localhost:4’

Return type

PropertyInfo

Returns

the property information

static getModule(path)

Return the module from the given path.

Parameters

path (str) – the path to the module python file

Raises

RuntimeError – if there is a problem

Return type

object

Returns

the module
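Loading a module from a file path is commonly done with importlib. This is a minimal sketch using only the standard library, assuming failures are reported as RuntimeError as documented above. `load_module_from_path` is a hypothetical stand-in, not the implementation of getModule.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(path):
    """Hypothetical loader; raise RuntimeError if the file cannot load."""
    name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise RuntimeError(f'Cannot create a module spec for {path}')
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
    except Exception as err:
        raise RuntimeError(f'Cannot load {path}: {err}') from err
    return module


# Write a throwaway module file and load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    module_path = os.path.join(tmp_dir, 'my_custom_class.py')
    with open(module_path, 'w') as handle:
        handle.write("UNITS_DICT = {'stub': 'unknown'}\n")
    loaded = load_module_from_path(module_path)

print(loaded.UNITS_DICT)
```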

static getExtraInputFiles(path, name)

Return the extra input files for the given property name.

Parameters
  • path (str) – the path to the module python file

  • name (str) – the name of the property

Raises

RuntimeError – if there is a problem

Return type

list

Returns

the extra input files

static checkDict(path, dict_name)

Check dictionary.

Parameters
  • path (str) – the path to the module python file

  • dict_name (str) – the name of the dictionary to check

Raises

RuntimeError – if there is a problem

static checkModule(path, name)

Check the module.

Parameters
  • path (str) – the path to the module python file

  • name (str or None) – if given the name of the property using this module will be checked against the module

Raises

RuntimeError – if there is a problem

static addInputFiles(job_builder, property_lists)

Add input files from the given properties to the job builder.

Parameters
  • job_builder (launchapi.JobSpecificationArgsBuilder) – Job specification builder object

  • property_lists (list) – contains lists of property specifications

Raises

RuntimeError – if there is a problem

runStub(st, job_dj, tpp)

Run stub.

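Parameters
  • st (schrodinger.structure.Structure) – the structure

  • job_dj (schrodinger.job.queue.JobDJ) – the queue

  • tpp (int) – the threads per process

Raises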

RuntimeError – if there is a problem

Return type

float

Returns

the property value

runIt()

Run it.

Raises

RuntimeError – if there is a problem
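The three class dictionaries are keyed by property name, and RUN_DICT maps each name to the name of a method such as runStub. The following is a minimal sketch of that dispatch pattern, written as a standalone class so that it runs without the Schrodinger suite. The class `CustomSketch`, its `run` method, and the returned value are hypothetical, and a real custom class file presumably builds on CustomClassEvaluator instead.

```python
class CustomSketch:
    """Standalone stand-in mimicking the documented class constants."""

    UNITS_DICT = {'stub': 'unknown'}
    RUN_DICT = {'stub': 'runStub'}
    EXTRA_INPUT_FILES_DICT = {'stub': []}

    def runStub(self, st, job_dj, tpp):
        """Return a placeholder property value as a float."""
        return 0.0

    def run(self, name, st, job_dj=None, tpp=1):
        """Look up the method registered for a property name and call it."""
        try:
            method = getattr(self, self.RUN_DICT[name])
        except (KeyError, AttributeError) as err:
            raise RuntimeError(f'No run method for {name!r}') from err
        return method(st, job_dj, tpp)


print(CustomSketch().run('stub', st=None))
```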

schrodinger.application.matsci.genetic_optimization.get_script_property_info_dict(host_str=None)

Return a (name, PropertyInfo) dict for script based properties.

Parameters

host_str (str) – the host string, for example ‘localhost:4’

Return type

dict

Returns

contains (name, PropertyInfo)

schrodinger.application.matsci.genetic_optimization.get_property_info(name, jaguar_options=None, tpp=None, patterns=None, host_str=None)

Return a PropertyInfo for the given name and properties.

Parameters
  • name (str) – the property name

  • jaguar_options (dict) – contains Jaguar options

  • tpp (int) – the threads per process

  • patterns (list) – the SMARTS patterns

  • host_str (str) – the host string, for example ‘localhost:4’

Return type

PropertyInfo or None

Returns

the PropertyInfo

schrodinger.application.matsci.genetic_optimization.get_random_csearch_seed(this_random=None)

Return a random csearch seed.

Parameters

this_random (numpy.random.RandomState or None) – random state, if None use the module constant

Return type

int

Returns

the seed

exception schrodinger.application.matsci.genetic_optimization.PropertySyntaxError

Bases: Exception

exception schrodinger.application.matsci.genetic_optimization.UnknownPropertySuboptionError

Bases: Exception

class schrodinger.application.matsci.genetic_optimization.Property(index=1, key=None, name=None, units=None, minimax=None, target=None, comparator=None, error=None, weight=1.0, positive=None, summarize=None, class_kwargs=None)

Bases: object

Manage a property to be used in a genetic optimization.

MAX = 'max'
MIN = 'min'
EQUALS = 'eq'
GREATER_THAN = 'gt'
LESS_THAN = 'lt'
SUB_OPTIONS = ['index', 'key', 'name', 'units', 'minimax', 'target', 'comparator', 'error', 'weight', 'positive', 'patterns', 'summarize', 'kpls_model', 'custom_class_file', 'class_kwargs']
__init__(index=1, key=None, name=None, units=None, minimax=None, target=None, comparator=None, error=None, weight=1.0, positive=None, summarize=None, class_kwargs=None)

Create an instance.

Parameters
  • index (int) – a numeric index used to refer to this Property instance; the default is 1

  • key (str) – the schrodinger.structure.Structure property key to be optimized

  • name (str) – a name for the property; it is used, for example, in any *log files

  • units (str) – the units of the property, for example eV or nm

  • minimax (str) – to minimize or maximize this property, set this option to the class constant MIN or MAX

  • target (float) – if the genetic optimization should aim for a specific value instead of minimizing or maximizing the property, give that value here

  • comparator (str) – how the target value and computed values are compared: one of the class constants EQUALS for =, GREATER_THAN for >, or LESS_THAN for <

  • error (float) – if equality to a target value has been specified, this option controls the error bounds of the target value; if not specified, a default of 10% of the target value is used

  • weight (float) – the weight to use for this property. When the genetic optimization runs on several properties, the weight lets the user bias the solution. It can also compensate for properties quantified in different physical units whose numbers may be orders of magnitude apart, for example eV and nm. The default is 1.0.

  • positive (bool) – True if this property can only take on positive values, for example the area of a surface; False otherwise, for example temperature in Celsius. The default is False.

  • summarize (bool) – if True then print a summary of this property, False otherwise

  • class_kwargs (dict or None) – contains kwargs for class based evaluation of this property
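The interplay of target, comparator, and error can be illustrated as follows. This is a minimal sketch, assuming that the error is a symmetric absolute tolerance around the target and that the 10% default is taken from the magnitude of the target. The function `matches_target` is hypothetical and does not reproduce the module's scoring.

```python
EQUALS = 'eq'
GREATER_THAN = 'gt'
LESS_THAN = 'lt'


def matches_target(value, target, comparator, error=None):
    """Hypothetical check of a computed value against a target."""
    if comparator == GREATER_THAN:
        return value > target
    if comparator == LESS_THAN:
        return value < target
    if comparator == EQUALS:
        # Assumed default: 10% of the target when no error is given.
        if error is None:
            error = 0.1 * abs(target)
        return abs(value - target) <= error
    raise ValueError(f'Unknown comparator: {comparator}')


print(matches_target(1.30, 1.28, EQUALS, error=0.05))
print(matches_target(1.50, 1.28, EQUALS))
print(matches_target(2.0, 1.28, GREATER_THAN))
```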

setAttributes(index=1, key=None, name=None, units=None, minimax=None, target=None, comparator=None, error=None, weight=1.0, positive=None, summarize=None, class_kwargs=None)

Set some attributes for this class.

Parameters
  • index (int) – a numeric index used to refer to this Property instance; the default is 1

  • key (str) – the schrodinger.structure.Structure property key to be optimized

  • name (str) – a name for the property; it is used, for example, in any *log files

  • units (str) – the units of the property, for example eV or nm

  • minimax (str) – to minimize or maximize this property, set this option to the class constant MIN or MAX

  • target (float) – if the genetic optimization should aim for a specific value instead of minimizing or maximizing the property, give that value here

  • comparator (str) – how the target value and computed values are compared: one of the class constants EQUALS for =, GREATER_THAN for >, or LESS_THAN for <

  • error (float) – if equality to a target value has been specified, this option controls the error bounds of the target value; if not specified, a default of 10% of the target value is used

  • weight (float) – the weight to use for this property. When the genetic optimization runs on several properties, the weight lets the user bias the solution. It can also compensate for properties quantified in different physical units whose numbers may be orders of magnitude apart, for example eV and nm. The default is 1.0.

  • positive (bool) – True if this property can only take on positive values, for example the area of a surface; False otherwise, for example temperature in Celsius. The default is False.

  • summarize (bool) – if True then print a summary of this property, False otherwise

  • class_kwargs (dict or None) – contains kwargs for class based evaluation of this property

setClassKwargs(class_kwargs)

Set the class kwargs.

Parameters

class_kwargs (dict or None) – contains kwargs for class based evaluation of this property

parsePropertyString(property_string)

Parse the attributes of this class from a string representation of the property specifications. For example, ‘index=1 key=r_matsci_Reduction_Potential_(eV) name=reduction units=eV target=1.28 comparator=eq error=0.05 weight=0.5’ or ‘index=2 key=r_matsci_Oxidation_Potential_(eV) name=oxidation units=eV minimax=max weight=2.5’

Parameters

property_string (str) – the string representation of the property specifications

Raises
checkProperty()

Check this property instance.

isScriptProperty()

Return True if this property is a script property, False otherwise.

Return type

bool

Returns

True if this property is a script property, False otherwise

static getPropertyStrings(property_lists)

Return property strings from the given property lists.

Parameters

property_lists (list) – contains lists of property specifications

Return type

list

Returns

contains string representations of the property specifications

static getRelPath(file_name)

Return the relative path to the given file name.

Parameters

file_name (str) – the file name

Return type

str

Returns

the relative path to the file

static getKwargs(property_string, option_substrings, add_relative_paths=None)

Return kwargs of the given property options from the given property string.

Parameters
  • property_string (str) – the string representation of the property specifications, containing options as ‘<option_substring>=<value>’

  • option_substrings (list or str) – contains the option substrings for the needed values; a single occurrence or a list of occurrences may be passed

  • add_relative_paths (list) – contains options for which relative paths should be added; such relative paths may be needed to parallelize the evaluation stage of the genetic optimization correctly, because they are used to copy otherwise shared files into local subdirectories

Return type

dict, str, or None

Returns

the extracted dictionary of kwargs or single kwarg depending on the input option_substrings or None if nothing is found
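Property strings are whitespace-separated '<option_substring>=<value>' tokens, as in the parsePropertyString examples. Below is a minimal sketch of extracting options from such a string, assuming values contain no whitespace and that a single requested option yields a single value. `get_kwargs_sketch` is a hypothetical simplification that keeps values as strings and ignores add_relative_paths.

```python
def get_kwargs_sketch(property_string, option_substrings):
    """Hypothetical, simplified analogue of the option lookup."""
    single = isinstance(option_substrings, str)
    wanted = [option_substrings] if single else list(option_substrings)
    found = {}
    for token in property_string.split():
        option, sep, value = token.partition('=')
        if sep and option in wanted:
            found[option] = value
    if not found:
        return None
    return found[wanted[0]] if single else found


prop = ('index=1 key=r_matsci_Reduction_Potential_(eV) name=reduction '
        'units=eV target=1.28 comparator=eq error=0.05 weight=0.5')
print(get_kwargs_sketch(prop, 'name'))
print(get_kwargs_sketch(prop, ['target', 'error']))
print(get_kwargs_sketch(prop, 'minimax'))
```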

static rmKwargs(property_string, option_substrings)

Return a copy of the given property string with all of the given property option substrings removed.

Parameters
  • property_string (str) – the string representation of the property specifications, containing options as ‘<option_substring>=<value>’

  • option_substrings (list) – contains the option substrings to be removed

Return type

str

Returns

the string representation of the property specifications less the options substrings that were to be removed

static addKwargs(property_string, kwargs)

Add the given options to the given property string.

Parameters
  • property_string (str) – the string representation of the property specifications, containing options as ‘<option_substring>=<value>’

  • kwargs (dict) – key-value option pairs to add to the property string

Return type

str

Returns

the string representation of the property specifications containing the new options

static getCustomInfo(property_string, name, tpp=1, host_str=None)

Return a PropertyInfo.

Parameters
  • property_string (str) – the string representation of the property specifications, containing options as ‘<option_substring>=<value>’

  • name (str) – the property name

  • tpp (int) – the threads per process

  • host_str (str) – the host string, for example ‘localhost:4’

Raises

RuntimeError – if there is a problem

Return type

PropertyInfo

Returns

the property information

schrodinger.application.matsci.genetic_optimization.set_title_to_stoichiometry(astructure, toappend=None, separation='.')

Set the structure title to be the stoichiometry of the structure.

Parameters
  • astructure (schrodinger.structure.Structure) – the structure

  • toappend (str) – a string to append to the stoichiometry

  • separation (str) – used to separate the stoichiometry and the toappend str

class schrodinger.application.matsci.genetic_optimization.StructureGenome

Bases: pyevolve.GenomeBase.GenomeBase

Manage a genome. The genome, also known as a chromosome, is a candidate solution to the problem being solved by genetic optimization. It is composed of genes that are manipulated by the crossover and mutation operators. In this genetic optimization module the genome is essentially a schrodinger.structure.Structure object.

__init__()

Create an instance.

copy(genome)

Copy the current genome to the provided genome.

Parameters

genome (StructureGenome) – a new genome instance to which to copy the current genome

clone()

Clone the current genome.

Return type

StructureGenome

Returns

genome

updateStructureProperties(index, generation)

Update some structure properties.

Parameters
  • index (int) – the index of this individual

  • generation (int) – this generation

resetParentProperties()

Reset the crossover and mutation parent structure properties.

removeProperties()

Remove some structure properties.

optimizeGeometry()

Optimize the geometry of this genome’s structure using OPLS.

addPreviousFreezerFile(freezer_file)

Add the given file to the list of previous freezer files.

Parameters

freezer_file (str) – the name of the file to be added

evaluate(**args)

Evaluate the score of this individual.

Parameters

args (dict) – dictionary of genetic optimization parameters created and used by pyevolve

evaluator
initializator
mutator
crossover
internalParams
score
fitness
schrodinger.application.matsci.genetic_optimization.from_initial_population(genome, **args)

Draw a unique genome from the initial population.

Parameters
  • genome (StructureGenome) – a genome

  • args (dict) – dictionary of genetic optimization parameters created and used by pyevolve

schrodinger.application.matsci.genetic_optimization.get_num_simple_bonds(astructure)

Return the number of simple bonds in the provided structure. The definition of a simple bond follows the one used in the reaction channel module: an acyclic bond of order one, which may involve a hydrogen atom.

Parameters

astructure (schrodinger.structure.Structure) – the structure for which to get the number of simple bonds

Return type

int

Returns

the number of simple bonds
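The definition can be made concrete on a plain bond list. This is a minimal sketch, assuming a bond is acyclic exactly when its two atoms are not connected by any other path. The bond-list representation and the function `count_simple_bonds` are hypothetical and do not use schrodinger.structure.Structure.

```python
from collections import defaultdict


def count_simple_bonds(bonds):
    """Count acyclic bonds of order one.

    bonds is a list of (atom_i, atom_j, order) tuples (hypothetical format).
    """
    neighbors = defaultdict(set)
    for atom_i, atom_j, _ in bonds:
        neighbors[atom_i].add(atom_j)
        neighbors[atom_j].add(atom_i)

    def connected_without(start, end):
        """True if end is reachable from start without the direct bond."""
        seen, stack = {start}, [start]
        while stack:
            atom = stack.pop()
            for other in neighbors[atom]:
                if atom == start and other == end:
                    continue  # skip the bond being tested
                if other == end:
                    return True
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        return False

    return sum(1 for atom_i, atom_j, order in bonds
               if order == 1 and not connected_without(atom_i, atom_j))


# Ethane: one C-C bond and six C-H bonds, none of them in a ring.
ethane = [(1, 2, 1), (1, 3, 1), (1, 4, 1), (1, 5, 1),
          (2, 6, 1), (2, 7, 1), (2, 8, 1)]
# Cyclopropane: three ring bonds (cyclic) and six C-H bonds.
cyclopropane = [(1, 2, 1), (2, 3, 1), (1, 3, 1), (1, 4, 1), (1, 5, 1),
                (2, 6, 1), (2, 7, 1), (3, 8, 1), (3, 9, 1)]
print(count_simple_bonds(ethane), count_simple_bonds(cyclopropane))
```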

schrodinger.application.matsci.genetic_optimization.combine_two_structures(astructure, bstructure, offset=10.0)

Combine two structure objects into a single structure object using somewhat arbitrary placement.

Parameters
  • astructure (schrodinger.structure.Structure) – the first of the structures to be combined

  • bstructure (schrodinger.structure.Structure) – the second of the structures to be combined

  • offset (float) – the final distance between the structures will be the sum of the molecular VDW radii plus this offset in Angstrom

Return type

schrodinger.structure.Structure

Returns

the combined structure object

schrodinger.application.matsci.genetic_optimization.bond_crossover(genome, **args)

Perform a crossover operation by swapping molecular fragments at two randomly chosen bonds, i.e. a double displacement reaction channel.

Parameters
  • genome (StructureGenome) – a genome

  • args (dict) – dictionary of genetic optimization parameters created and used by pyevolve

Return type

tuple

Returns

tuple containing the sister and brother StructureGenome

schrodinger.application.matsci.genetic_optimization.get_element_mutator_dict(astructure)

Return a dictionary whose keys are the indices of the mutatable atoms and whose values are the elements that the keyed atom may be mutated to.

Parameters

astructure (schrodinger.structure.Structure) – the structure to be mutated

Return type

dict

Returns

keys are atom indices of those atoms that are mutatable and values are those elements that the atom can be mutated to

schrodinger.application.matsci.genetic_optimization.get_isoelectronic_mutator_indicies(astructure)

Return a list of atom indices that can be mutated by the isoelectronic mutator.

Parameters

astructure (schrodinger.structure.Structure) – the structure to be mutated

Return type

list

Returns

mutatable indices

schrodinger.application.matsci.genetic_optimization.get_child_like_parent(parent_st, children_sts, definition)

Return the child structure that is most like the provided parent.

Parameters
  • parent_st (schrodinger.structure.Structure) – the parent structure

  • children_sts (list of schrodinger.structure.Structure) – the children structures

  • definition (two-element list) – each sublist contains two atom indices describing the reactive bonds in the parent and fragment structures that created the children

Return type

schrodinger.structure.Structure

Returns

the sought child structure

schrodinger.application.matsci.genetic_optimization.elemental_mutator(genome, **args)

Perform a random elemental mutation to an element in the same column (also known as a group) of the periodic table. Note that hydrogen and the halogens are considered to belong to the same column.

Parameters
  • genome (StructureGenome) – a genome

  • args (dict) – dictionary of genetic optimization parameters created and used by pyevolve

Return type

int

Returns

the number of mutations applied, appears to never be used in PyEvolve
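A same-column mutation can be sketched with a small lookup table. The sketch below assumes a truncated set of columns, a plain list of element symbols in place of a structure, and 1-based atom indices. The table `COLUMNS` and both helper functions are hypothetical; as stated above, hydrogen is grouped with the halogens.

```python
import random

# Truncated, illustrative columns; hydrogen is grouped with the halogens.
COLUMNS = [
    ['H', 'F', 'Cl', 'Br', 'I'],
    ['C', 'Si', 'Ge'],
    ['N', 'P', 'As'],
    ['O', 'S', 'Se'],
]


def element_mutator_dict(elements):
    """Map 1-based atom indices to the elements each atom may become."""
    mutations = {}
    for index, element in enumerate(elements, start=1):
        for column in COLUMNS:
            if element in column:
                mutations[index] = [e for e in column if e != element]
    return mutations


def mutate_one(elements, rng):
    """Return a copy of elements with one randomly chosen atom mutated."""
    choices = element_mutator_dict(elements)
    index = rng.choice(sorted(choices))
    mutated = list(elements)
    mutated[index - 1] = rng.choice(choices[index])
    return mutated


print(mutate_one(['C', 'H', 'H', 'H', 'Cl'], random.Random(0)))
```

Atoms whose element is not in the table, such as sodium here, simply do not appear in the dictionary and are never mutated.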

schrodinger.application.matsci.genetic_optimization.fragment_mutator(genome, **args)

Randomly mutate the genome by swapping a molecular fragment on one side of a bond with a similar fragment from a library.

Parameters
  • genome (StructureGenome) – a genome

  • args (dict) – dictionary of genetic optimization parameters created and used by pyevolve

Return type

int

Returns

the number of mutations applied, appears to never be used in PyEvolve

schrodinger.application.matsci.genetic_optimization.isoelectronic_mutator(genome, **args)

Perform a random isoelectronic mutation within one of the following series: (CH3X, NH2X, OHX, FX), (CH2XY, NHXY, OXY), and (CHXYZ, NXYZ), where X, Y, and Z are non-H-bonds.

Parameters
  • genome (StructureGenome) – a genome

  • args (dict) – dictionary of genetic optimization parameters created and used by pyevolve

Return type

int

Returns

the number of mutations applied, appears to never be used in PyEvolve
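The series can be read as groups of hydride centers that share the same number of non-hydrogen bonds: moving along a series changes the central element and its hydrogen count together. Below is a minimal sketch of that lookup, assuming the central atom is identified by its element and its number of attached hydrogens. The table `SERIES` and the function `isoelectronic_alternatives` are hypothetical.

```python
# (element, number of attached hydrogens), grouped by the number of
# non-hydrogen bonds on the central atom.
SERIES = {
    1: [('C', 3), ('N', 2), ('O', 1), ('F', 0)],  # CH3X, NH2X, OHX, FX
    2: [('C', 2), ('N', 1), ('O', 0)],            # CH2XY, NHXY, OXY
    3: [('C', 1), ('N', 0)],                      # CHXYZ, NXYZ
}


def isoelectronic_alternatives(element, num_hydrogens):
    """Return the other members of the series containing this center."""
    center = (element, num_hydrogens)
    for members in SERIES.values():
        if center in members:
            return [member for member in members if member != center]
    return []


print(isoelectronic_alternatives('N', 2))
print(isoelectronic_alternatives('O', 0))
```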

schrodinger.application.matsci.genetic_optimization.get_loggable_float(afloat, num_decimal='%.2f', field_width=10)

Return a float as a string with the specified format.

Parameters
  • afloat (float) – a float to convert to a string

  • num_decimal (str) – the format of the string representation

  • field_width (int) – the field width of the final string

Return type

str

Returns

the float as a string
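With the documented defaults, the formatting can be pictured as a %-style format followed by padding to the field width. This is a minimal sketch, assuming right-justification with spaces, since the padding direction is not stated above. `loggable_float_sketch` is a hypothetical stand-in.

```python
def loggable_float_sketch(afloat, num_decimal='%.2f', field_width=10):
    """Format a float and pad it on the left to the field width."""
    return (num_decimal % afloat).rjust(field_width)


text = loggable_float_sketch(3.14159)
print(repr(text), len(text))
```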

schrodinger.application.matsci.genetic_optimization.uniquify_titles_callback(ga_obj)

Callback to uniquify titles of the individuals.

Parameters

ga_obj (GSimpleGA.GSimpleGA) – the entire current state of the genetic optimization

schrodinger.application.matsci.genetic_optimization.prepare_next_generation_dirs_callback(ga_obj)

Callback to update the generation property of the genomes and to create a subdirectory to hold the next series of evaluations.

Parameters

ga_obj (GSimpleGA.GSimpleGA) – the entire current state of the genetic optimization

schrodinger.application.matsci.genetic_optimization.manage_skips_callback(ga_obj)

Callback to manage skips in the evaluation.

Parameters

ga_obj (GSimpleGA.GSimpleGA) – the entire current state of the genetic optimization

schrodinger.application.matsci.genetic_optimization.manage_failures_callback(ga_obj)

Callback to manage failures in the evaluation.

Parameters

ga_obj (GSimpleGA.GSimpleGA) – the entire current state of the genetic optimization

schrodinger.application.matsci.genetic_optimization.logging_summary_callback(ga_obj)

Callback to log progress.

Parameters

ga_obj (GSimpleGA.GSimpleGA) – the entire current state of the genetic optimization

schrodinger.application.matsci.genetic_optimization.molecule_history_callback(ga_obj)

Callback to append all structures from all generations to individual log files.

Parameters

ga_obj (GSimpleGA.GSimpleGA) – the entire current state of the genetic optimization

schrodinger.application.matsci.genetic_optimization.first_property(ga_obj)

Terminate when the first property has been matched.

Parameters

ga_obj (GSimpleGA.GSimpleGA) – the entire current state of the genetic optimization

Return type

bool

Returns

True to terminate, False otherwise

schrodinger.application.matsci.genetic_optimization.all_properties(ga_obj)

Terminate when all properties have been matched.

Parameters

ga_obj (GSimpleGA.GSimpleGA) – the entire current state of the genetic optimization

Return type

bool

Returns

True to terminate, False otherwise

schrodinger.application.matsci.genetic_optimization.unproductive(ga_obj)

Terminate if the maximum number of unproductive generations has been reached.

Parameters

ga_obj (GSimpleGA.GSimpleGA) – the entire current state of the genetic optimization

Return type

bool

Returns

True to terminate, False otherwise
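The idea behind an unproductive-generation terminator can be sketched without PyEvolve. The function below is hypothetical and works on a plain list of best scores per generation rather than on a GSimpleGA object; it also assumes that a higher score is better, which this reference does not confirm.

```python
def is_unproductive(best_scores, num_unproductive):
    """Return True if the last num_unproductive generations brought no improvement.

    best_scores holds the best score of each generation, oldest first.
    Assumption: larger scores are better.
    """
    if num_unproductive < 1:
        raise ValueError("num_unproductive must be at least 1")
    if len(best_scores) <= num_unproductive:
        return False  # not enough history yet
    reference = max(best_scores[:-num_unproductive])
    recent_best = max(best_scores[-num_unproductive:])
    return recent_best <= reference


print(is_unproductive([1.0, 2.0, 2.0, 2.0], 2))  # True
print(is_unproductive([1.0, 2.0, 3.0, 4.0], 2))  # False
```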

class schrodinger.application.matsci.genetic_optimization.CheckInput

Bases: object

Manage checking user input.

checkMaeFile(input_file, logger=None)

Check that the file exists and has a *mae extension.

Parameters
  • input_file (str) – the name of the input file

  • logger (logging.Logger) – output logger

checkOperators(operators, logger=None)

Check the operators.

Parameters
  • operators (list) – contains tuples of the operator functions and their weights

  • logger (logging.Logger) – output logger

checkRates(crossover_rate, mutation_rate, logger=None)

Check the specified rates of crossover and mutation.

Parameters
  • crossover_rate (float) – the rate of crossover as a percentage

  • mutation_rate (float) – the rate of mutation as a percentage

  • logger (logging.Logger) – output logger
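Because both rates are given as percentages, a check of this kind plausibly validates the range and converts to a fraction. The sketch below is only that: the helper percent_to_fraction is hypothetical, and both the accepted range of 0 to 100 and the conversion to a fraction are assumptions rather than documented behavior of checkRates.

```python
def percent_to_fraction(rate, name="rate"):
    """Validate a percentage and return it as a fraction in [0, 1]."""
    if not 0.0 <= rate <= 100.0:
        raise ValueError(f"{name} must be between 0 and 100, got {rate}")
    return rate / 100.0


# The documented defaults are 90.0 for both rates.
crossover = percent_to_fraction(90.0, "crossover_rate")
mutation = percent_to_fraction(90.0, "mutation_rate")
print(crossover, mutation)  # 0.9 0.9
```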

checkInitialPopulation(initial_population, crossover_names, mutator_names, crossover_rate, mutation_rate, no_open_shell, logger=None)

Check the initial population.

Parameters
  • initial_population (list) – the initial population of schrodinger.structure.Structure

  • crossover_names (list) – contains the function names of the crossover operators to be used

  • mutator_names (list) – contains the function names of the mutation operators to be used

  • crossover_rate (float) – the rate of crossover

  • mutation_rate (float) – the rate of mutation

  • no_open_shell (bool) – if True then check for open shell structures, otherwise do not

  • logger (logging.Logger) – output logger

checkPopulationParam(population, num_structures_given, logger=None)

Check the population parameter.

Parameters
  • population (int) – the size of the population to use in the genetic optimization

  • num_structures_given (int) – the number of structures provided to the genetic optimization

  • logger (logging.Logger) – output logger

checkFragmentLibs(fragment_libs, logger=None)

Check the specified fragment libraries.

Parameters
  • fragment_libs (list) – strings specifying fragment libraries to be used

  • logger (logging.Logger) – output logger

Return type

list

Returns

valid user provided fragment files

checkProperties(properties, logger=None)

Check the list of properties.

Parameters
  • properties (list) – contains Property instances

  • logger (logging.Logger) – output logger

checkGenerations(generations, logger=None)

Check the specified number of generations.

Parameters
  • generations (int) – the number of generations

  • logger (logging.Logger) – output logger

checkSelection(selection, logger=None)

Check the specified selection protocol.

Parameters
  • selection (str) – the selection protocol to use.

  • logger (logging.Logger) – output logger

checkTournamentSize(tournament_size, population, logger=None)

Check the specified tournament size.

Parameters
  • tournament_size (int) – the size of tournament to use in tournament based selection

  • population (int) – the size of population to use

  • logger (logging.Logger) – output logger

checkTerminationParams(terminators, num_unproductive, logger=None)

Check the termination parameters.

Parameters
  • terminators (list) – the list of terminators to use

  • num_unproductive (int) – used when the unproductive termination option is active; it is the generation number on which to exit if the score hasn’t improved

  • logger (logging.Logger) – output logger

Return type

list and int

Returns

valid terminators and valid num_unproductive

checkScaling(scaling, properties, logger=None)

Check the scaling.

Parameters
  • scaling (str) – the scaling protocol to use in the genetic optimization

  • properties (list) – the properties to be optimized

  • logger (logging.Logger) – output logger
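The default scaling protocol in this module is named 'sigma_truncation'. For orientation, the textbook form of sigma truncation shifts each raw score by the population mean minus a multiple of the standard deviation and clips negative results to zero. The sketch below implements that textbook form; whether the module uses exactly this formula, the multiplier c=2.0, or the population standard deviation is an assumption.

```python
import statistics


def sigma_truncation(raw_scores, c=2.0):
    """Scale raw scores as max(0, score - (mean - c * sigma)).

    Assumptions: c defaults to 2.0 and sigma is the population
    standard deviation of the raw scores.
    """
    mean = statistics.mean(raw_scores)
    sigma = statistics.pstdev(raw_scores)
    return [max(0.0, score - (mean - c * sigma)) for score in raw_scores]


# One poor individual among nine equal ones: mean is 9 and sigma is 3,
# so every score is shifted down by 3 and the outlier is clipped to 0.
fitness = sigma_truncation([0] + [10] * 9)
print(fitness[0], fitness[1])
```

The shift keeps differences between individuals intact while removing individuals far below the mean from consideration during selection.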

checkElitism(elitism, population, logger=None)

Check the elitism.

Parameters
  • elitism (int) – the number of elite individuals to use

  • population (int) – the size of population to use

  • logger (logging.Logger) – output logger

checkConformationalSearch(conformational_search, logger=None)

Check the conformational search.

Parameters
  • conformational_search (bool or str) – specifies whether a conformational search is to be performed, if a string is given specifies a file used to set options

  • logger (logging.Logger) – output logger

checkFreezers(freezers, pop_size, input_size, logger=None)

Check the freezers.

Parameters
  • freezers (list) – collection of freezers to use

  • pop_size (int) – the size of the population

  • input_size (int) – the number of structures given

  • logger (logging.Logger) – output logger

Return type

list

Returns

collection of freezers to use

checkInoculate(inoculate, logger=None)

Check the inoculate.

Parameters
  • inoculate (list) – circumstances in which to inoculate

  • logger (logging.Logger) – output logger

schrodinger.application.matsci.genetic_optimization.print_bad_jobs(all_bad_jobs, logger, bad_type='skip')

Log bad jobs, i.e. skips and failures.

Parameters
  • all_bad_jobs (dict) – a collection of bad subjobs, keys are genetic optimization generation and values are a list of Skip or Failure objects for bad subjobs

  • logger (logging.Logger) – output logger

  • bad_type (str) – specifies either ‘skip’ or ‘fail’ type

class schrodinger.application.matsci.genetic_optimization.GeneticOptimization(initial_population, properties, structure_score_threshold=-50.0, eval_kwargs={}, crossovers=None, mutators=None, fragment_libs=['optoelectronics'], script_evaluator=None, generations=10, population=8, crossover_rate=90.0, mutation_rate=90.0, selection='roulette_wheel', tournament_size=2, terminators=['unproductive', 'all_properties'], num_unproductive=6, scaling='sigma_truncation', elitism=1, random_seed=None, no_minimize=False, file_base_name='genopt', no_open_shell=False, props_to_remove=None, jobbe=None, conformational_search=False, freezers=['remainder', 'previous'], inoculate=['no_child', 'bad_structure'], class_evaluators=None, logger=None)

Bases: object

Manage the genetic optimization.

MSGWIDTH = 80
__init__(initial_population, properties, structure_score_threshold=-50.0, eval_kwargs={}, crossovers=None, mutators=None, fragment_libs=['optoelectronics'], script_evaluator=None, generations=10, population=8, crossover_rate=90.0, mutation_rate=90.0, selection='roulette_wheel', tournament_size=2, terminators=['unproductive', 'all_properties'], num_unproductive=6, scaling='sigma_truncation', elitism=1, random_seed=None, no_minimize=False, file_base_name='genopt', no_open_shell=False, props_to_remove=None, jobbe=None, conformational_search=False, freezers=['remainder', 'previous'], inoculate=['no_child', 'bad_structure'], class_evaluators=None, logger=None)

Create an instance.

Parameters
  • initial_population (list) – the initial population of schrodinger.structure.Structure

  • properties (list of Property) – the properties to be optimized, including structural properties as well as more physical calculable observables

  • structure_score_threshold (float) – if structure-based properties are being sought and if the base evaluator will be used then subjobs on structures with structure scores below this value will not be launched but rather such structures treated as skips

  • eval_kwargs (dict) – a dictionary that will be available in all evaluator functions

  • crossovers (list) – contains two-element tuples each of which holds a crossover operator to be used in the optimization along with a weight

  • mutators (list) – contains two-element tuples each of which holds a mutation operator to be used in the optimization along with a weight

  • fragment_libs (list) – strings specifying fragment libraries to be used, can be either module constants from FRAGMENT_LIBS.keys() (or ALL if all of those are desired) or the names of Maestro files (including the file extensions) containing fragments collected by the user

  • script_evaluator (method) – the evaluator function to be called to score individuals during the optimization, takes a StructureGenome and returns a JobDJ

  • generations (int) – the number of generations for which to run the optimization

  • population (int) – the population size to use in the optimization, can be less-than-or-equal-to the length of initial_population

  • crossover_rate (float) – the rate of crossover as a percentage

  • mutation_rate (float) – the rate of mutation as a percentage

  • selection (str) – the selection protocol used to select individuals to the gene pool for the upcoming generation

  • tournament_size (int) – the size of tournament to use if using tournament based selection, unused if a tournament based selection is not being used

  • terminators (list) – list of strings that specify the termination protocols to be used to terminate the optimization, typically more than one is specified only if the unproductive protocol is being used

  • num_unproductive (int) – if the unproductive protocol is being used to terminate the optimization then this integer specifies how many unproductive cycles are allowed before terminating, unused if a different termination protocol is used

  • scaling (str) – specifies the scaling protocol to use, scaling scales the raw scores of the individuals to produce fitness scores to ease selection in cases where raw scores are nearly equal

  • elitism (int) – specify the number of elite individuals guaranteed to be added to the gene pool for the upcoming generation, zero disables elitism

  • random_seed (None or int) – the random seed, if None then system time will be used

  • no_minimize (bool) – specify that the offspring structures generated by the crossover and mutation operators not be geometry optimized prior to selection

  • file_base_name (str) – base name to use for output and generation log files

  • no_open_shell (bool) – if True then do not allow the processing of open shell molecules, False otherwise

  • props_to_remove (list) – a list of structure property keys to be removed prior to the evaluation stage

  • jobbe (schrodinger.job.jobcontrol._Backend) – the jobcontrol backend of the driver job

  • conformational_search (bool or str) – specifies whether a Macromodel conformational search will be performed prior to evaluation, when a string it specifies a simplified Macromodel input file containing extra options

  • freezers (list) – a collection of freezers containing structures that are used to swap out individuals from the population

  • inoculate (list) – the list of circumstances under which to use the structure freezers

  • class_evaluators (dict) – keys are the evaluator classes to be called to score individuals during the optimization, each must inherit ClassEvaluator, values are lists of Property to be passed to the class evaluator

  • logger (logging.Logger) – output logger
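With this many keyword arguments, it can help to collect the options in one place before constructing the optimizer. The sketch below runs without a Schrodinger installation: the default values are copied from the signature above, while the names DEFAULT_OPTIONS and build_options are hypothetical helpers and not part of the module.

```python
import copy

# A subset of the defaults, copied from the documented __init__ signature.
DEFAULT_OPTIONS = {
    "generations": 10,
    "population": 8,
    "crossover_rate": 90.0,
    "mutation_rate": 90.0,
    "selection": "roulette_wheel",
    "tournament_size": 2,
    "terminators": ["unproductive", "all_properties"],
    "num_unproductive": 6,
    "scaling": "sigma_truncation",
    "elitism": 1,
    "random_seed": None,
    "file_base_name": "genopt",
}


def build_options(**overrides):
    """Return a copy of the defaults with overrides applied.

    Unknown keys are rejected so that typos surface early.
    """
    unknown = set(overrides) - set(DEFAULT_OPTIONS)
    if unknown:
        raise KeyError(f"unknown option(s): {sorted(unknown)}")
    options = copy.deepcopy(DEFAULT_OPTIONS)
    options.update(overrides)
    return options


options = build_options(generations=25, random_seed=1234)
# With a Schrodinger installation, these would be passed as keyword arguments:
# GeneticOptimization(initial_population, properties, **options)
```

Fixing random_seed in this way is one means of making a run repeatable, since a value of None falls back to the system time.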

setRootLoggerForPyEvolve()

Set up the root logger for PyEvolve.

setOperatorNames()

Set the operator names.

checkInputParams()

Check the input parameters.

printProperties()

Log the set of sought properties and their details.

printParams()

Log the parameters.

initializeGenome()

Initialize a genome.

Return type

StructureGenome

Returns

a genome

initializeGA(genome)

Initialize the genetic optimization.

Parameters

genome (StructureGenome) – a genome

setMonomerGrowAtoms()

Set the monomer grow atoms using the mark monomer module convention rather than the polymer builder module convention.

runIt()

Run the components of the genetic optimization.

schrodinger.application.matsci.genetic_optimization.get_output_file_name(basename)

Get the output file name from the basename.

Parameters

basename (str) – base name to use

Return type

str

Returns

output_file_name, name of output file

schrodinger.application.matsci.genetic_optimization.get_generation_log_file_name(basename, generation)

Get the generation log file name.

Parameters
  • basename (str) – base name to use

  • generation (int) – the generation

Return type

str

Returns

generation_log_file_name, name of generation log file

schrodinger.application.matsci.genetic_optimization.get_structure_score(astructure, properties, conformational_search, seed=None, this_random=None)

Return the structure score for the provided structure.

Parameters
  • astructure (schrodinger.structure.Structure) – the structure to score

  • properties (list of Property) – the properties used in scoring

  • conformational_search (bool or str) – specifies whether a Macromodel conformational search will be performed prior to evaluation, when a string it specifies a simplified Macromodel input file containing extra options

  • seed (int or None) – random seed used in conformational search or None if conformational search is not being done

  • this_random (numpy.random.RandomState or None) – random state, if None use the module constant

Return type

float

Returns

the structure score

schrodinger.application.matsci.genetic_optimization.structure_evaluator(genome, this_random=None)

This is the structure evaluator.

Parameters
  • genome (StructureGenome) – a genome

  • this_random (numpy.random.RandomState or None) – random state, if None use the module constant

Return type

float

Returns

the score for this individual

schrodinger.application.matsci.genetic_optimization.base_evaluator(genome)

This is the base evaluator used to wrap all other evaluators.

Parameters

genome (StructureGenome) – a genome

Return type

float

Returns

the score for this individual

schrodinger.application.matsci.genetic_optimization.optoelectronics_evaluator(genome)

Run an optoelectronics job.

Parameters

genome (StructureGenome) – a genome

Return type

JobDJ

Returns

the JobDJ object for this individual, it is run in the base evaluator

schrodinger.application.matsci.genetic_optimization.apply_uniform_operator_weights(operators)

Set the operator weights uniformly.

Parameters

operators (list) – a list of two-element tuples, each tuple contains first an operator function and second a weight

Return type

list

Returns

list of two-element tuples of operators and uniform weights
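A minimal sketch of uniform weighting over (operator, weight) tuples follows. The function name uniform_weights is hypothetical, and normalizing so that the weights sum to one is an assumption; the module may use a different constant.

```python
def uniform_weights(operators):
    """Replace every weight with 1/N, where N is the number of operators."""
    if not operators:
        return []
    weight = 1.0 / len(operators)
    return [(operator, weight) for operator, _ in operators]


# Strings stand in for the operator functions here.
print(uniform_weights([("crossover_a", 5.0), ("crossover_b", 1.0)]))
# [('crossover_a', 0.5), ('crossover_b', 0.5)]
```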

schrodinger.application.matsci.genetic_optimization.structure_is_open_shell(astructure, ignore_charge=True)

Return True if the provided structure is open shell, i.e. has an odd number of electrons.

Parameters
  • astructure (schrodinger.structure.Structure) – the structure in question

  • ignore_charge (bool) – if True then ignore any structure.formal_charge settings

Return type

bool

Returns

True if the provided structure is open shell, False otherwise
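The odd-electron test can be sketched on plain data. The function below is a hypothetical stand-in that takes atomic numbers and a net formal charge instead of a schrodinger.structure.Structure; it illustrates the parity rule stated above and is not the module's implementation.

```python
def is_open_shell(atomic_numbers, formal_charge=0, ignore_charge=True):
    """Return True if the total electron count is odd.

    atomic_numbers: one atomic number per atom in the structure.
    formal_charge: net charge, only applied when ignore_charge is False.
    """
    n_electrons = sum(atomic_numbers)
    if not ignore_charge:
        n_electrons -= formal_charge
    return n_electrons % 2 == 1


print(is_open_shell([6, 1, 1, 1]))  # methyl radical, 9 electrons: True
print(is_open_shell([8, 1, 1]))     # water, 10 electrons: False
# Ammonium: 11 electrons when neutral, 10 once the +1 charge is counted.
print(is_open_shell([7, 1, 1, 1, 1], formal_charge=1, ignore_charge=False))  # False
```

The ammonium case shows why ignore_charge matters: with the charge ignored, the same atom list would be reported as open shell.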

schrodinger.application.matsci.genetic_optimization.get_element_histogram(astructure)

Return a dictionary where keys are elements and values are the numbers of atoms of a given element.

Parameters

astructure (schrodinger.structure.Structure) – the structure in question

Return type

dict

Returns

dictionary with element histogram, keys are elements (strs) and values are numbers (ints)
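The returned mapping has the shape of a standard counter. As a sketch on plain data (the function element_histogram and its list-of-symbols input are hypothetical; the real function takes a Structure):

```python
from collections import Counter


def element_histogram(element_symbols):
    """Count how many atoms of each element appear in the list."""
    return dict(Counter(element_symbols))


# Methane: one carbon and four hydrogens.
print(element_histogram(["C", "H", "H", "H", "H"]))  # {'C': 1, 'H': 4}
```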

schrodinger.application.matsci.genetic_optimization.remove_basename_ext(stoich_ext)

Remove the basename extension from the given string and return the remainder which is the stoichiometry. Do this instead of having to recompute the stoichiometry which can be expensive.

Parameters

stoich_ext (str) – contains the stoichiometry and basename extension

Return type

str

Returns

stoichiometry

schrodinger.application.matsci.genetic_optimization.get_low_energy_conformers(astructure_in, macromodel_options_file=None, remove_files=False, overwrite=False, seed=None, this_random=None, host_str=None)

Return the lowest energy conformers from a Macromodel conformational search.

Parameters
  • astructure_in (schrodinger.structure.Structure) – the structure to search for conformations

  • macromodel_options_file (str or None) – the name of a simplified Macromodel input file that contains any options to use in addition to those used by default in a conformational search or None if there are none and you just want to use the defaults

  • remove_files (bool) – if the job is successful, specifies whether to remove all files created for it after it finishes

  • overwrite (bool) – if True then the coordinates of the input structure will be overwritten by those of the lowest energy conformer and that structure alone returned by this function

  • seed (int or None) – used to seed the random number generator used in the Macromodel conformational search, should be in CONF_SEARCH_SEED_RANGE, if None then if a CONFSEARCH_SEED has been specified in macromodel_options_file it will be used, otherwise a random int in CONF_SEARCH_SEED_RANGE will be used

  • this_random (numpy.random.RandomState or None) – random state, if None use the module constant

  • host_str (str) – the host string, for example ‘localhost:4’

Return type

list of schrodinger.structure.Structure, int

Returns

the structures of the lowest energy conformers sorted by increasing energy and the seed used in the conformational search (same as input if input was given either as seed or in macromodel_options_file)

schrodinger.application.matsci.genetic_optimization.get_random_structure(structure_libs, tries_from_libs=3, structure_score_threshold=None, properties=None, conformational_search=False, seed=None, this_random=None)

From the given dictionary of libraries return a random structure.

Parameters
  • structure_libs (dict) – keys are strings specifying the types of libraries to be used and can be module constants from FREEZER_CHOICES.keys(), values are lists of libraries by type and can be either module constants from FRAGMENT_LIBS.keys(), ALL, or the names of Maestro files (including the file extensions)

  • tries_from_libs (int) – the number of times to try before giving up

  • structure_score_threshold (float or None) – specifies that a structure with a structure score greater-than-or-equal-to this threshold is sought, the best of the considered structures will be returned and will contain several structure properties related to the scoring

  • properties (list of Property or None) – the properties used in structure scoring

  • conformational_search (bool or str) – specifies whether a Macromodel conformational search will be performed prior to evaluation, when a string it specifies a simplified Macromodel input file containing extra options

  • seed (int or None) – if not None specifies that random should be reseeded with the given value

  • this_random (numpy.random.RandomState or None) – random state, if None use the module constant

Return type

schrodinger.structure.Structure or None

Returns

the random structure or None if one couldn’t be found
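The interplay of tries_from_libs and structure_score_threshold can be sketched as a bounded retry loop that remembers the best candidate seen. Everything below is hypothetical: pick_structure, its flat library list, and the score_fn callback are illustrative stand-ins, and stopping early once the threshold is met is an assumption.

```python
import random


def pick_structure(library, score_fn, tries=3, threshold=None, rng=None):
    """Draw up to `tries` random candidates and return the best one seen.

    Returns None for an empty library. Without a threshold, the first
    draw is returned unscored. With a threshold, drawing stops early
    once a candidate scores at or above it.
    """
    rng = rng or random.Random()
    if not library:
        return None
    best, best_score = None, None
    for _ in range(tries):
        candidate = rng.choice(library)
        if threshold is None:
            return candidate
        score = score_fn(candidate)
        if best_score is None or score > best_score:
            best, best_score = candidate, score
        if score >= threshold:
            break
    return best


# Strings stand in for structures and len() for the structure score.
print(pick_structure(["a", "bb", "ccc"], len, tries=5, threshold=3,
                     rng=random.Random(0)))
```

Note that the best candidate is returned even when no draw reaches the threshold, matching the description that the best of the considered structures is returned.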

schrodinger.application.matsci.genetic_optimization.get_freezer_structure(structure_libs, tries_from_libs=3, structure_score_threshold=None, properties=None, conformational_search=False, inoculate='no_child', crossover_applied=None, mutation_applied=None, basename_ext=None, seed=None, this_random=None)

Return a random structure from the freezer and update that structure’s properties.

Parameters
  • structure_libs (dict) – keys are strings specifying the types of libraries to be used and can be module constants from FREEZER_CHOICES.keys(), values are lists of libraries by type and can be either module constants from FRAGMENT_LIBS.keys(), ALL, or the names of Maestro files (including the file extensions)

  • tries_from_libs (int) – the number of times to try before giving up

  • structure_score_threshold (float or None) – specifies that a structure with a structure score greater-than-or-equal-to this threshold is sought, the best of the considered structures will be returned and will contain several structure properties related to the scoring

  • properties (list of Property or None) – the properties used in structure scoring

  • conformational_search (bool or str) – specifies whether a Macromodel conformational search will be performed prior to evaluation, when a string it specifies a simplified Macromodel input file containing extra options

  • inoculate (str) – specify the reason for drawing from the freezer, which is an inoculate option from INOCULATE_CHOICES

  • crossover_applied (str or None) – specify the intended crossover operator or None if there isn’t to be one

  • mutation_applied (str or None) – specify the intended mutation operator or None if there isn’t to be one

  • basename_ext (str or None) – specify an extension to append to the stoichiometry which is used to set the title of the returned structure

  • seed (int or None) – if not None specifies that random should be reseeded with the given value

  • this_random (numpy.random.RandomState or None) – random state, if None use the module constant

Return type

schrodinger.structure.Structure or None

Returns

the random structure or None if one couldn’t be found