schrodinger.application.matsci.genetic_optimization module¶
Classes and functions for the genetic optimization module.
Copyright Schrodinger, LLC. All rights reserved.
- class schrodinger.application.matsci.genetic_optimization.PropertyInfo(key, units, is_positive, class_evaluator, class_kwargs)¶
Bases:
tuple
- class_evaluator¶
Alias for field number 3
- class_kwargs¶
Alias for field number 4
- is_positive¶
Alias for field number 2
- key¶
Alias for field number 0
- units¶
Alias for field number 1
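The field aliases above describe a five-field named tuple. A minimal sketch, assuming a plain `collections.namedtuple` reproduces the documented layout; the values below are hypothetical placeholders rather than real evaluator classes:

```python
from collections import namedtuple

# Stand-in with the same field order as the documented PropertyInfo.
PropertyInfo = namedtuple(
    'PropertyInfo',
    ['key', 'units', 'is_positive', 'class_evaluator', 'class_kwargs'])

# Placeholder values for illustration only.
info = PropertyInfo(
    key='r_matsci_KPLS_Tg/C',
    units='C',
    is_positive=False,
    class_evaluator=object,
    class_kwargs={'host_str': 'localhost:4'})

# Fields can be read by name or by the positions listed above.
print(info.key, info[0])
print(info.class_kwargs is info[4])
```

Reading by name keeps calling code independent of the field order.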
- class schrodinger.application.matsci.genetic_optimization.ClassEvaluator(structs, properties)¶
Bases:
object
Manage a class evaluator.
- SEPARATOR = '_'¶
- HOST_STR = 'host_str'¶
- __init__(structs, properties)¶
Create an instance.
- Parameters
structs (list of schrodinger.structure.Structure) – contains input structures
properties (list) – contains Property
- getBaseName(struct, aproperty)¶
Get the base name.
- Parameters
struct (schrodinger.structure.Structure) – the structure
aproperty (Property) – the property
- Return type
str
- Returns
the base name
- runIt()¶
Run it.
- Raises
RuntimeError – for any issue
- setQueue(job_dj)¶
Set a JobDJ to run jobs.
- Parameters
job_dj (schrodinger.job.queue.JobDJ) – the queue
- getQueue(properties, tpp=1, disable_smart_distribution=True)¶
Return a JobDJ to run jobs.
- Parameters
properties (list) – contains Property
tpp (int) – the threads per process
disable_smart_distribution (bool) – if True then force disable smart distribution regardless of the input tpp value
- Return type
schrodinger.job.queue.JobDJ
- Returns
the queue
- static getHostStr(host_str=None)¶
Return the host string.
- Parameters
host_str (str) – the host string, for example ‘localhost:4’
- Return type
str
- Returns
the host string
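Host strings such as 'localhost:4' pair a host name with a processor count. The helper below is a hypothetical illustration of splitting such a string; `split_host_str` is not part of this module, and treating a missing count as 1 is an assumption.

```python
def split_host_str(host_str):
    """Split 'host:ncpus' into (host, ncpus); hypothetical helper."""
    host, sep, ncpus = host_str.partition(':')
    # Assumption: a host given without a count means one processor.
    return host, int(ncpus) if sep else 1


print(split_host_str('localhost:4'))
```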
- class schrodinger.application.matsci.genetic_optimization.StructureEvaluator(structs, properties)¶
Bases:
schrodinger.application.matsci.genetic_optimization.ClassEvaluator
Manage structure evaluation.
- SMARTS_PATTERN_SEPARATOR = '_'¶
- SMARTS_PROP = 'smarts'¶
- MOL_WEIGHT_PROP = 'molecular_weight'¶
- NATOMS_PROP = 'natoms'¶
- NELEMENTS_PROP = 'nelements'¶
- PROPERTIES = {'molecular_weight', 'natoms', 'nelements', 'smarts'}¶
- SMARTS_KEY = 'i_matsci_SMARTS_property_%s'¶
- MOL_WEIGHT_KEY = 'r_m_Molecular_weight'¶
- NATOMS_KEY = 'i_m_Number_of_atoms'¶
- NELEMENTS_KEY = 'i_m_Number_of_elements'¶
- NO_UNITS = 'None'¶
- MOL_WEIGHT_UNITS = 'g/mol'¶
- PATTERNS = 'patterns'¶
- __init__(structs, properties)¶
See parent class for documentation.
- static getInfo(key, units, patterns=None, host_str=None)¶
Return a PropertyInfo.
- Parameters
key (str) – the property key
units (str) – the property units
patterns (list) – the SMARTS patterns
host_str (str) – the host string, for example ‘localhost:4’
- Return type
PropertyInfo
- Returns
the property information
- runIt()¶
Run it.
- class schrodinger.application.matsci.genetic_optimization.ChemInformatics(structs, properties)¶
Bases:
schrodinger.application.matsci.genetic_optimization.ClassEvaluator
Manage cheminformatics jobs.
- UNITS = 'unknown'¶
- PROP_BASE_NAME = None¶
- CHECK_MSG = ''¶
- OUT_EXT = '.mae'¶
- __init__(structs, properties)¶
See parent class for documentation.
- classmethod getInfo(name, units, model_file, tpp=1, host_str=None)¶
Return a PropertyInfo.
- Parameters
cls (type) – the calling class
name (str) – the property name
units (str) – the property units
model_file (str) – the model file
tpp (int) – the threads per process
host_str (str) – the host string, for example ‘localhost:4’
- Return type
PropertyInfo
- Returns
the property information
- getModelFile(aproperty)¶
Return the model file from the given property.
- Parameters
aproperty (Property) – the property
- Return type
str
- Returns
the model file
- copyModelFiles()¶
Copy the model files to the CWD.
- makeMaestroInfile(struct, aproperty)¶
Make Maestro infile.
- Parameters
struct (schrodinger.structure.Structure) – the structure
aproperty (Property) – the property
- Return type
str
- Returns
the Maestro input file name
- getPropertyValue(property_outfile, prop_key=None)¶
Get the property value.
- Parameters
property_outfile (str) – the property output file
prop_key (str or None) – if the property output file is a Maestro file then this specifies the structure property key
- Raises
RuntimeError – if property output file doesn’t exist or doesn’t contain the property value
- Return type
float
- Returns
the property value
- generateExtraInput(struct)¶
Generate any extra input files needed for the actual property evaluation.
- Parameters
struct (schrodinger.structure.Structure) – the structure
- getCmd(mae_infile, model_infile, prop_name, xtra_infile, property_outfile, job_name)¶
Get the command line.
- Parameters
mae_infile (str) – Maestro file containing the structure
model_infile (str) – the cheminformatics model file
prop_name (str) – a property name possibly used to identify the property value
xtra_infile (str) – an additional input file
property_outfile (str) – name of the output file containing the property value
job_name (str) – the job name
- Return type
list
- Returns
the command line
- runPrediction(struct)¶
Run the prediction.
- Parameters
struct (schrodinger.structure.Structure) – the structure
- Raises
RuntimeError – if it fails
- setUp()¶
Do any necessary set up prior to the actual property prediction.
- runIt()¶
Run it.
- classmethod getModelFiles(property_lists)¶
Return file names of any model files.
- Parameters
property_lists (list) – contains lists of property specifications
- Return type
list
- Returns
file names of the model files
- classmethod checkModelFiles(model_files, host_str=None)¶
Check the given model files.
- Parameters
model_files (list[str]) – the names of the model files
host_str (str) – the host string, for example ‘localhost:4’
- Raises
RuntimeError – if there is an issue
- static getAllModelFiles(property_lists, check=False, host_str=None)¶
Return file names of any model files.
- Parameters
property_lists (list) – contains lists of property specifications
check (bool) – whether to check the model files
host_str (str) – the host string, for example ‘localhost:4’
- Return type
list
- Returns
file names of the model files
- class schrodinger.application.matsci.genetic_optimization.CanvasKPLS(structs, properties)¶
Bases:
schrodinger.application.matsci.genetic_optimization.ChemInformatics
Manage Canvas KPLS jobs.
- EXT = 'kpls.tar.gz'¶
- FP_TEXT_FILE = 'fpInfo.txt'¶
- MODEL_OPTION = 'kpls_model'¶
- KEY = 'r_matsci_KPLS_%s/%s'¶
- PATH = '/scr/buildbot/savedbuilds/2024-4/NB/build-117/mmshare-v6.8/data/genetic_optimization/canvas_kpls_models'¶
- FP_EXT = '.fp'¶
- VALUE_PATTERN = re.compile('\\s*1\\s+unknown.*\\s+(-?\\d+\\.?\\d*)$')¶
- ALLOWED_FP_TYPES = ['linear', 'maccs', 'radial', 'molprint2D', 'torsion', 'pairwise', 'triplet', 'quartet', 'dendritic']¶
- SINGLE_PRECISION = 32¶
- DOUBLE_PRECISION = 64¶
- BIT_EXT = '-bit'¶
- LABEL = 'canvasKPLS'¶
- CHECK_MSG = " Model files must correspond to a single property type and as input take fingerprint files featuring one of the following fingerprint types, ['linear', 'maccs', 'radial', 'molprint2D', 'torsion', 'pairwise', 'triplet', 'quartet', 'dendritic'], as well as either no atom type or one of the 12 known atom types (integers in [1, 12]). See $SCHRODINGER/utilities/canvasFPGen -h for more details."¶
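VALUE_PATTERN captures the last numeric field of a line whose first two fields are `1` and `unknown`. A minimal sketch, assuming a prediction output line shaped like the hypothetical sample below; the real Canvas output layout is not shown in this documentation:

```python
import re

# Same pattern as the documented VALUE_PATTERN class constant.
VALUE_PATTERN = re.compile(r'\s*1\s+unknown.*\s+(-?\d+\.?\d*)$')

# Hypothetical output line; only its general shape matters here.
sample_line = '  1  unknown  0  -12.75'

match = VALUE_PATTERN.match(sample_line)
value = float(match.group(1)) if match else None
print(value)
```

Because `.*` is greedy, the capture group binds to the final whitespace-separated number on the line.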
- makeFingerPrintInfile(mae_infile, name, job_dj=None)¶
Make fingerprint infile or add the job to do so to the given queue.
- Parameters
mae_infile (str) – the Maestro input file name
name (str) – the property name
job_dj (None or schrodinger.job.queue.JobDJ) – if given then add the fingerprint job to this queue and return
- Raises
RuntimeError – if canvasFPGen fails
- Return type
str
- Returns
the Canvas fingerprint input file name
- getPropertyValue(property_outfile, prop_key=None)¶
See parent class.
- generateExtraInput(struct)¶
See parent class.
- makeFpOptionsDict()¶
Make the fingerprint options dictionary.
- getCmd(mae_infile, model_infile, prop_name, xtra_infile, property_outfile, job_name)¶
See parent class.
- setUp()¶
See parent class.
- static getFpOptions(model_file)¶
Return fingerprint options obtained from the given Canvas KPLS model file.
- Parameters
model_file (str) – the name of the Canvas KPLS model file
- Return type
int, str, int or None
- Returns
contains (1) precision, (2) fingerprint type, and (3) atom type if present
- Raises
RuntimeError – if there is anything wrong with the Canvas KPLS model file
- class schrodinger.application.matsci.genetic_optimization.AutoQSAR(structs, properties)¶
Bases:
schrodinger.application.matsci.genetic_optimization.ChemInformatics
Manage AutoQSAR jobs.
- EXT = 'qzip'¶
- MODEL_OPTION = 'auto_qsar_model'¶
- KEY = 'r_matsci_Auto_QSAR_%s/%s'¶
- PROP_BASE_NAME = 'r_autoqsar_Pred_{prop_name}'¶
- IN_EXT = '.inp'¶
- LABEL = 'autoqsar'¶
- writeAutoQSARInFile(mae_infile, model_infile, base_name, prop_name)¶
Write the AutoQSAR input file.
- Parameters
mae_infile (str) – the input Maestro file name
model_infile (str) – the input AutoQSAR model file name
base_name (str) – the base name to use for the AutoQSAR input file
prop_name (str) – the property name
- Return type
str
- Returns
the name of the AutoQSAR input file
- generateExtraInput(struct)¶
See parent class.
- getCmd(mae_infile, model_infile, prop_name, xtra_infile, property_outfile, job_name)¶
See parent class.
- class schrodinger.application.matsci.genetic_optimization.DeepChem(structs, properties)¶
Bases:
schrodinger.application.matsci.genetic_optimization.ChemInformatics
Manage DeepChem jobs.
- EXT = 'qzip'¶
- MODEL_OPTION = 'deep_autoqsar_model'¶
- KEY = 'r_matsci_DeepAutoQSAR_%s/%s'¶
- PROP_BASE_NAME = 'r_m_{prop_name}_score'¶
- LABEL = 'deepautoqsar'¶
- getCmd(mae_infile, model_infile, prop_name, xtra_infile, property_outfile, job_name)¶
See parent class.
- class schrodinger.application.matsci.genetic_optimization.Jaguar(structs, properties)¶
Bases:
schrodinger.application.matsci.genetic_optimization.ClassEvaluator
Manage Jaguar jobs.
- JAGUAR_OPTIONS = 'jaguar_options'¶
- TPP = 'tpp'¶
- JAGUAR_OUTPUT_ATTR = 'jaguar_output_attr'¶
- IN_EXT = '.in'¶
- OUT_EXT = '.out'¶
- __init__(structs, properties)¶
See parent class for documentation.
- postProcess(silent=False)¶
Post process the jobs and set the final results.
- Parameters
silent (bool) – if True, don’t raise.
- Raises
RuntimeError – if there is a problem
- runIt(silent=False)¶
Run it.
- Parameters
silent (bool) – if True, don’t raise.
- Raises
RuntimeError – if there is a problem
- class schrodinger.application.matsci.genetic_optimization.GlassTransitionTemperature(structs, properties)¶
Bases:
schrodinger.application.matsci.genetic_optimization.CanvasKPLS
Manage glass transition temperature jobs.
- UNITS = 'C'¶
- KEY = 'r_matsci_KPLS_Tg/C'¶
- PROP = 'kpls_tg'¶
- FILE = 'Tg250.kpls.tar.gz'¶
- static getInfo(host_str=None)¶
Return a PropertyInfo.
- Parameters
host_str (str) – the host string, for example ‘localhost:4’
- Return type
PropertyInfo
- Returns
the property information
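The KEY constant of this class is consistent with filling the parent `CanvasKPLS.KEY` template with a name and units. A small sketch of that relationship, assuming the template is filled with ordinary `%` formatting:

```python
# Template documented as CanvasKPLS.KEY.
KPLS_KEY_TEMPLATE = 'r_matsci_KPLS_%s/%s'

# Name and units taken from the GlassTransitionTemperature constants.
key = KPLS_KEY_TEMPLATE % ('Tg', 'C')
print(key)
```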
- class schrodinger.application.matsci.genetic_optimization.RefractiveIndex(structs, properties)¶
Bases:
schrodinger.application.matsci.genetic_optimization.CanvasKPLS, schrodinger.application.matsci.genetic_optimization.Jaguar
Manage refractive index jobs.
- UNITS = 'none'¶
- KEY = 'r_matsci_Refractive_Index_298K/none'¶
- PROP = 'refractive_index'¶
- FILE = '01a_kpls5_amorphous_density_HT.kpls.tar.gz'¶
- MASS_DENSITY_KEY = 'r_matsci_Mass_Density/g/cm^3'¶
- ISOTROPIC_POLARIZABILITY_KEY = 'r_matsci_Isotropic_Polarizability/bohr^3'¶
- __init__(structs, properties)¶
See parent classes for documentation.
- getQueue(*args, **kwargs)¶
See parent class for documentation.
- static getInfo(jaguar_options=None, tpp=1, host_str=None)¶
Return a PropertyInfo.
- Parameters
jaguar_options (dict) – contain Jaguar options
tpp (int) – the threads per process
host_str (str) – the host string, for example ‘localhost:4’
- Return type
PropertyInfo
- Returns
the property information
- classmethod setRefractiveIndexProperty(struct, mass_density, polarizability)¶
Set the refractive index property on the given structure.
- Parameters
struct (schrodinger.structure.Structure) – the structure
mass_density (float) – the mass density in g/cm^3
polarizability (float) – the isotropically averaged polarizability in atomic units of bohr^3
- Raises
ValueError – if total_weight is abnormally small or the polarizability is abnormally large
- runIt()¶
Run it.
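This documentation does not state which relation converts mass density and polarizability into a refractive index. One common choice is the Lorentz-Lorenz relation, sketched below purely as an assumption; `lorentz_lorenz_index` and its `molecular_weight` argument are hypothetical and not part of this module.

```python
import math

AVOGADRO = 6.02214076e23          # 1/mol
BOHR_TO_CM = 0.529177210903e-8    # cm per bohr


def lorentz_lorenz_index(mass_density, molecular_weight, polarizability):
    """Refractive index from density (g/cm^3), molecular weight (g/mol)
    and isotropic polarizability (bohr^3) via Lorentz-Lorenz."""
    number_density = mass_density * AVOGADRO / molecular_weight  # 1/cm^3
    alpha_cm3 = polarizability * BOHR_TO_CM ** 3
    x = 4.0 * math.pi / 3.0 * number_density * alpha_cm3
    if not 0.0 <= x < 1.0:
        # No physical solution, e.g. tiny weight or huge polarizability.
        raise ValueError('no real refractive index for these inputs')
    return math.sqrt((1.0 + 2.0 * x) / (1.0 - x))


# Approximate inputs for liquid water.
print(round(lorentz_lorenz_index(1.0, 18.015, 9.8), 2))
```

Under this relation, a very small molecular weight or a very large polarizability pushes `x` to 1 or beyond, where no real solution exists, which is one way a ValueError like the one documented above could arise.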
- class schrodinger.application.matsci.genetic_optimization.CustomClassEvaluator(structs, properties)¶
Bases:
schrodinger.application.matsci.genetic_optimization.ClassEvaluator
Manage a custom class evaluator.
- UNITS = 'unknown'¶
- KEY = 'r_matsci_{name}/{units}'¶
- CUSTOM_CLASS_FILE_OPTION = 'custom_class_file'¶
- UNITS_DICT = {'stub': 'unknown'}¶
- RUN_DICT = {'stub': 'runStub'}¶
- EXTRA_INPUT_FILES_DICT = {'stub': []}¶
- __init__(structs, properties)¶
See parent class for documentation.
- static getInfo(name, units, custom_class_file, tpp=1, host_str=None)¶
Return a PropertyInfo.
- Parameters
name (str) – the property name
units (str) – the property units
custom_class_file (str) – the Python file containing the custom class
tpp (int) – the threads per process
host_str (str) – the host string, for example ‘localhost:4’
- Return type
PropertyInfo
- Returns
the property information
- static getModule(path)¶
Return the module from the given path.
- Parameters
path (str) – the path to the module python file
- Raises
RuntimeError – if there is a problem
- Return type
object
- Returns
the module
- static getExtraInputFiles(path, name)¶
Return the extra input files for the given property name.
- Parameters
path (str) – the path to the module python file
name (str) – the name of the property
- Raises
RuntimeError – if there is a problem
- Return type
list
- Returns
the extra input files
- static checkDict(path, dict_name)¶
Check dictionary.
- Parameters
path (str) – the path to the module python file
dict_name (str) – the name of the dictionary to check
- Raises
RuntimeError – if there is a problem
- static checkModule(path, name)¶
Check the module.
- Parameters
path (str) – the path to the module python file
name (str or None) – if given the name of the property using this module will be checked against the module
- Raises
RuntimeError – if there is a problem
- static addInputFiles(job_builder, property_lists)¶
Add input files from the given properties to the job builder.
- Parameters
job_builder (launchapi.JobSpecificationArgsBuilder) – Job specification builder object
property_lists (list) – contains lists of property specifications
- Raises
RuntimeError – if there is a problem
- runStub(st, job_dj, tpp)¶
Run stub.
- Parameters
st (schrodinger.structure.Structure) – the structure
job_dj (JobDJ) – the queue
tpp (int) – the threads per process
- Raises
RuntimeError – if there is a problem
- Return type
float
- Returns
the property value
- runIt()¶
Run it.
- Raises
RuntimeError – if there is a problem
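The UNITS_DICT, RUN_DICT and EXTRA_INPUT_FILES_DICT constants are keyed by property name, with RUN_DICT naming the method that evaluates each property. The sketch below illustrates that lookup pattern with a stand-alone class; `ToyEvaluator`, `run_property` and the returned value are hypothetical, and the real base class and job handling are omitted.

```python
class ToyEvaluator:
    """Stand-alone imitation of the documented dictionary layout."""

    UNITS_DICT = {'stub': 'unknown'}
    RUN_DICT = {'stub': 'runStub'}
    EXTRA_INPUT_FILES_DICT = {'stub': []}

    def runStub(self, st, job_dj, tpp):
        # Placeholder evaluation; a real method would compute a value
        # from the structure, possibly via jobs on the queue.
        return 0.0

    def run_property(self, name, st, job_dj=None, tpp=1):
        """Look up and call the run method registered for `name`."""
        try:
            method_name = self.RUN_DICT[name]
        except KeyError:
            raise RuntimeError(f'no run method registered for {name!r}')
        return float(getattr(self, method_name)(st, job_dj, tpp))


evaluator = ToyEvaluator()
print(evaluator.run_property('stub', st=None))
```

Keeping the three dictionaries keyed by the same property names lets one file describe several properties.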
- schrodinger.application.matsci.genetic_optimization.get_script_property_info_dict(host_str=None)¶
Return a (name, PropertyInfo) dict for script based properties.
- Parameters
host_str (str) – the host string, for example ‘localhost:4’
- Return type
dict
- Returns
contains (name, PropertyInfo)
- schrodinger.application.matsci.genetic_optimization.get_property_info(name, jaguar_options=None, tpp=None, patterns=None, host_str=None)¶
Return a PropertyInfo for the given name and properties.
- Parameters
name (str) – the property name
jaguar_options (dict) – contains Jaguar options
tpp (int) – the threads per process
patterns (list) – the SMARTS patterns
host_str (str) – the host string, for example ‘localhost:4’
- Return type
PropertyInfo or None
- Returns
the PropertyInfo
- schrodinger.application.matsci.genetic_optimization.get_random_csearch_seed(this_random=None)¶
Return a random csearch seed.
- Parameters
this_random (numpy.random.RandomState or None) – random state, if None use the module constant
- Return type
int
- Returns
the seed
- exception schrodinger.application.matsci.genetic_optimization.PropertySyntaxError¶
Bases:
Exception
- exception schrodinger.application.matsci.genetic_optimization.UnknownPropertySuboptionError¶
Bases:
Exception
- class schrodinger.application.matsci.genetic_optimization.Property(index=1, key=None, name=None, units=None, minimax=None, target=None, comparator=None, error=None, weight=1.0, positive=None, summarize=None, class_kwargs=None)¶
Bases:
object
Manage a property to be used in a genetic optimization.
- MAX = 'max'¶
- MIN = 'min'¶
- EQUALS = 'eq'¶
- GREATER_THAN = 'gt'¶
- LESS_THAN = 'lt'¶
- SUB_OPTIONS = ['index', 'key', 'name', 'units', 'minimax', 'target', 'comparator', 'error', 'weight', 'positive', 'patterns', 'summarize', 'kpls_model', 'custom_class_file', 'class_kwargs']¶
- __init__(index=1, key=None, name=None, units=None, minimax=None, target=None, comparator=None, error=None, weight=1.0, positive=None, summarize=None, class_kwargs=None)¶
Create an instance.
- Parameters
index (int) – a numeric index used to refer to this Property instance, a default of 1 is used
key (str) – the schrodinger.structure.Structure property key to be optimized
name (str) – specify a name for the property, this name will be, for example used in any *log files, etc.
units (str) – enter the units that the property is in, for example eV, nm, etc.
minimax (str) – to minimize or maximize this property then set this option to the class constants MIN or MAX
target (float) – if instead of maximizing or minimizing the property, the genetic optimization is supposed to handle a specific value then enter that value using this option.
comparator (str) – specify here how the target value and computed values are to be compared, i.e. either the class constants EQUALS for =, GREATER_THAN for >, or LESS_THAN for <.
error (float) – if equality to a target value has been specified then this option allows the user to control the error bounds of the target value, if not specified then a default of 10% of the specified target value will be used.
weight (float) – specify the weight to use for this property, if the genetic optimization is to be run on several properties then the weight allows the user to bias the solution. This option can also be used to control a situation where more than a single property is desired and where those properties are quantified using different physical units such that the numbers might be orders of magnitude apart from one another, for example comparing eV and nm. A default of 1.0 is used.
positive (bool) – True if this property can only take on positive values, for example as in the area of a surface, False otherwise, for example as in temperature in Celsius. The default is False.
summarize (bool) – if True then print a summary of this property, False otherwise
class_kwargs (dict or None) – contains kwargs for class based evaluation of this property
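How target, comparator and error might combine can be sketched as follows. `is_matched` is a hypothetical helper, and reading EQUALS as "within error of the target" is an assumption; only the 10% default comes from the parameter description above.

```python
EQUALS, GREATER_THAN, LESS_THAN = 'eq', 'gt', 'lt'


def is_matched(value, target, comparator, error=None):
    """Return True if value satisfies the target under the comparator."""
    if comparator == GREATER_THAN:
        return value > target
    if comparator == LESS_THAN:
        return value < target
    if comparator == EQUALS:
        if error is None:
            # Documented default: 10% of the specified target value
            # (taking the absolute value is an assumption).
            error = 0.1 * abs(target)
        return abs(value - target) <= error
    raise ValueError(f'unknown comparator: {comparator}')


print(is_matched(1.30, 1.28, EQUALS, error=0.05))
print(is_matched(1.50, 1.28, EQUALS))
```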
- setAttributes(index=1, key=None, name=None, units=None, minimax=None, target=None, comparator=None, error=None, weight=1.0, positive=None, summarize=None, class_kwargs=None)¶
Set some attributes for this class.
- Parameters
index (int) – a numeric index used to refer to this Property instance, a default of 1 is used
key (str) – the schrodinger.structure.Structure property key to be optimized
name (str) – specify a name for the property, this name will be, for example used in any *log files, etc.
units (str) – enter the units that the property is in, for example eV, nm, etc.
minimax (str) – to minimize or maximize this property then set this option to the class constants MIN or MAX
target (float) – if instead of maximizing or minimizing the property, the genetic optimization is supposed to handle a specific value then enter that value using this option.
comparator (str) – specify here how the target value and computed values are to be compared, i.e. either the class constants EQUALS for =, GREATER_THAN for >, or LESS_THAN for <.
error (float) – if equality to a target value has been specified then this option allows the user to control the error bounds of the target value, if not specified then a default of 10% of the specified target value will be used.
weight (float) – specify the weight to use for this property, if the genetic optimization is to be run on several properties then the weight allows the user to bias the solution. This option can also be used to control a situation where more than a single property is desired and where those properties are quantified using different physical units such that the numbers might be orders of magnitude apart from one another, for example comparing eV and nm. A default of 1.0 is used.
positive (bool) – True if this property can only take on positive values, for example as in the area of a surface, False otherwise, for example as in temperature in Celsius. The default is False.
summarize (bool) – if True then print a summary of this property, False otherwise
class_kwargs (dict or None) – contains kwargs for class based evaluation of this property
- setClassKwargs(class_kwargs)¶
Set the class kwargs.
- Parameters
class_kwargs (dict or None) – contains kwargs for class based evaluation of this property
- parsePropertyString(property_string)¶
Parse the attributes of this class from a string representation of the property specifications. For example, ‘index=1 key=r_matsci_Reduction_Potential_(eV) name=reduction units=eV target=1.28 comparator=eq error=0.05 weight=0.5’ or ‘index=2 key=r_matsci_Oxidation_Potential_(eV) name=oxidation units=eV minimax=max weight=2.5’
- Parameters
property_string (str) – the string representation of the property specifications
- Raises
PropertySyntaxError – if there is something wrong with the property syntax
UnknownPropertySuboptionError – if an unknown property suboption is found
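The property-string format can be illustrated with a small stand-alone parser. This is a sketch under stated assumptions, not the module's implementation: `parse_property_string` is hypothetical, values are kept as strings, values containing whitespace are not handled, and the built-in ValueError and KeyError stand in for PropertySyntaxError and UnknownPropertySuboptionError.

```python
# Copied from the documented Property.SUB_OPTIONS constant.
SUB_OPTIONS = ['index', 'key', 'name', 'units', 'minimax', 'target',
               'comparator', 'error', 'weight', 'positive', 'patterns',
               'summarize', 'kpls_model', 'custom_class_file',
               'class_kwargs']


def parse_property_string(property_string):
    """Split whitespace-separated 'option=value' tokens into a dict."""
    parsed = {}
    for token in property_string.split():
        option, sep, value = token.partition('=')
        if not sep or not value:
            raise ValueError(f'bad property syntax: {token!r}')
        if option not in SUB_OPTIONS:
            raise KeyError(f'unknown property suboption: {option!r}')
        parsed[option] = value
    return parsed


example = ('index=1 key=r_matsci_Reduction_Potential_(eV) name=reduction '
           'units=eV target=1.28 comparator=eq error=0.05 weight=0.5')
print(parse_property_string(example))
```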
- checkProperty()¶
Check this property instance.
- isScriptProperty()¶
Return True if this property is a script property, False otherwise.
- Return type
bool
- Returns
return True if this property is a script property, False otherwise
- static getPropertyStrings(property_lists)¶
Return property strings from the given property lists.
- Parameters
property_lists (list) – contains lists of property specifications
- Return type
list
- Returns
contains string representations of the property specifications
- static getRelPath(file_name)¶
Return the relative path to the given file name.
- Parameters
file_name (str) – the file name
- Return type
str
- Returns
the relative path to the file
- static getKwargs(property_string, option_substrings, add_relative_paths=None)¶
Return kwargs of the given property options from the given property string.
- Parameters
property_string (str) – the string representation of the property specifications, containing options as ‘<option_substring>=<value>’
option_substrings (list or str) – contains the option substrings for the needed values, a single occurrence or list of occurrences may be passed
add_relative_paths (list) – contains options for which relative paths should be added, such relative paths might be needed for correctly parallelizing the evaluation stage of the genetic optimization as they will be needed to copy otherwise shared files into local subdirectories
- Return type
dict, str, or None
- Returns
the extracted dictionary of kwargs or single kwarg depending on the input option_substrings or None if nothing is found
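The three possible return shapes (dict, str, or None) can be sketched with a hypothetical stand-alone function. `get_kwargs` below assumes exact option-name matching and ignores `add_relative_paths`; neither choice is confirmed by this documentation.

```python
def get_kwargs(property_string, option_substrings):
    """Extract 'option=value' pairs from a property string."""
    single = isinstance(option_substrings, str)
    wanted = [option_substrings] if single else list(option_substrings)
    found = {}
    for token in property_string.split():
        option, sep, value = token.partition('=')
        if sep and option in wanted:
            found[option] = value
    if not found:
        return None
    # A single requested option yields its value, a list yields a dict.
    return found[option_substrings] if single else found


spec = 'index=1 name=tg kpls_model=my.kpls.tar.gz'
print(get_kwargs(spec, 'kpls_model'))
print(get_kwargs(spec, ['index', 'name']))
print(get_kwargs(spec, 'custom_class_file'))
```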
- static rmKwargs(property_string, option_substrings)¶
Return a copy of the given property string with all of the given property option substrings removed.
- Parameters
property_string (str) – the string representation of the property specifications, containing options as ‘<option_substring>=<value>’
option_substrings (list) – contains the option substrings to be removed
- Return type
str
- Returns
the string representation of the property specifications less the options substrings that were to be removed
- static addKwargs(property_string, kwargs)¶
Add the given options to the given property string.
- Parameters
property_string (str) – the string representation of the property specifications, containing options as ‘<option_substring>=<value>’
kwargs (dict) – key-value option pairs to add to the property string
- Return type
str
- Returns
the string representation of the property specifications containing the new options
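Removing and adding options amounts to editing the whitespace-separated 'option=value' tokens. A minimal sketch, assuming exact option-name matching and that new options are appended at the end; `rm_kwargs` and `add_kwargs` are hypothetical stand-ins for the static methods documented above.

```python
def rm_kwargs(property_string, option_substrings):
    """Return the property string without the given options."""
    kept = [token for token in property_string.split()
            if token.partition('=')[0] not in option_substrings]
    return ' '.join(kept)


def add_kwargs(property_string, kwargs):
    """Return the property string with 'option=value' pairs appended."""
    extra = [f'{option}={value}' for option, value in kwargs.items()]
    return ' '.join(property_string.split() + extra)


spec = 'index=1 name=tg weight=2.0'
stripped = rm_kwargs(spec, ['weight'])
print(stripped)
print(add_kwargs(stripped, {'weight': 2.0}))
```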
- static getCustomInfo(property_string, name, tpp=1, host_str=None)¶
Return a PropertyInfo.
- Parameters
property_string (str) – the string representation of the property specifications, containing options as ‘<option_substring>=<value>’
name (str) – the property name
tpp (int) – the threads per process
host_str (str) – the host string, for example ‘localhost:4’
- Raises
RuntimeError – if there is a problem
- Return type
PropertyInfo
- Returns
the property information
- schrodinger.application.matsci.genetic_optimization.set_title_to_stoichiometry(astructure, toappend=None, separation='.')¶
Set the structure title to be the stoichiometry of the structure.
- Parameters
astructure (schrodinger.structure.Structure) – the structure
toappend (str) – a string to append to the stoichiometry
separation (str) – used to separate the stoichiometry and the toappend str
- class schrodinger.application.matsci.genetic_optimization.StructureGenome¶
Bases:
pyevolve.GenomeBase.GenomeBase
Manage a genome. The genome, also known as a chromosome, is a candidate solution to the problem being solved via genetic optimization. It is composed of genes that are manipulated by the crossover and mutation operators. In this genetic optimization module the genome is essentially a schrodinger.structure.Structure object.
- __init__()¶
Create an instance.
- copy(genome)¶
Copy the current genome to the provided genome.
- Parameters
genome (StructureGenome) – a new genome instance to which to copy the current genome
- clone()¶
Clone the current genome.
- Return type
StructureGenome
- Returns
genome
- updateStructureProperties(index, generation)¶
Update some structure properties.
- Parameters
index (int) – the index of this individual
generation (int) – this generation
- resetParentProperties()¶
Reset the crossover and mutation parent structure properties.
- removeProperties()¶
Remove some structure properties.
- optimizeGeometry()¶
Optimize the geometry of this genome’s structure using OPLS.
- addPreviousFreezerFile(freezer_file)¶
Add the given file to the list of previous freezer files.
- Parameters
freezer_file (str) – the name of the file to be added
- evaluate(**args)¶
Evaluate the score of this individual.
- Parameters
args (dict) – dictionary of genetic optimization parameters created and used by pyevolve
- evaluator¶
- initializator¶
- mutator¶
- crossover¶
- internalParams¶
- score¶
- fitness¶
- schrodinger.application.matsci.genetic_optimization.from_initial_population(genome, **args)¶
Draw a unique genome from the initial population.
- Parameters
genome (StructureGenome) – a genome
args (dict) – dictionary of genetic optimization parameters created and used by pyevolve
- schrodinger.application.matsci.genetic_optimization.get_num_simple_bonds(astructure)¶
Return the number of simple bonds in the provided structure. The definition of a simple bond follows from that used in the reaction channel module and is an acyclic single order bond that may involve a hydrogen atom.
- Parameters
astructure (schrodinger.structure.Structure) – the structure for which to get the number of simple bonds
- Return type
int
- Returns
the number of simple bonds
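The definition above (acyclic, single order, hydrogens allowed) can be illustrated on a toy bond list. This is a sketch only: `count_simple_bonds` is hypothetical, bonds are given as `(atom_a, atom_b, order)` tuples instead of a Structure, and ring membership is tested by checking whether the two atoms stay connected when the bond is ignored.

```python
def count_simple_bonds(bonds):
    """Count acyclic single bonds in a list of (atom_a, atom_b, order)."""
    neighbors = {}
    for a, b, _ in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    def in_ring(a, b):
        # The bond a-b is cyclic if b is reachable from a without it.
        stack, seen = [a], {a}
        while stack:
            atom = stack.pop()
            for nxt in neighbors[atom]:
                if atom == a and nxt == b:
                    continue  # skip the bond under test
                if nxt == b:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return sum(1 for a, b, order in bonds
               if order == 1 and not in_ring(a, b))


# Cyclopropane: carbons 1-3 form the ring, hydrogens are atoms 4-9.
cyclopropane = [(1, 2, 1), (2, 3, 1), (1, 3, 1),
                (1, 4, 1), (1, 5, 1), (2, 6, 1),
                (2, 7, 1), (3, 8, 1), (3, 9, 1)]
print(count_simple_bonds(cyclopropane))
```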
- schrodinger.application.matsci.genetic_optimization.combine_two_structures(astructure, bstructure, offset=10.0)¶
Combine two structure objects into a single structure object using somewhat arbitrary placement.
- Parameters
astructure (schrodinger.structure.Structure) – the first of the structures to be combined
bstructure (schrodinger.structure.Structure) – the second of the structures to be combined
offset (float) – the final distance between the structures will be the sum of the molecular VDW radii plus this offset in Angstrom
- Return type
schrodinger.structure.Structure
- Returns
the combined structure object
- schrodinger.application.matsci.genetic_optimization.bond_crossover(genome, **args)¶
Perform a crossover operation by swapping molecular fragments at two randomly chosen bonds, i.e. a double displacement reaction channel.
- Parameters
genome (StructureGenome) – a genome
args (dict) – dictionary of genetic optimization parameters created and used by pyevolve
- Return type
tuple
- Returns
tuple containing the sister and brother StructureGenome
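The double displacement idea (A-B + C-D giving A-D + C-B) can be sketched on toy data. Here each parent is a linear chain of fragment labels rather than a Structure, one bond is chosen at random in each parent, and the tails are swapped; `toy_bond_crossover` and this representation are illustrative assumptions only.

```python
import random


def toy_bond_crossover(mom, dad, rng):
    """Swap the tails of two fragment chains at randomly chosen bonds.

    Each chain needs at least two fragments so that it has a bond.
    """
    # Bond i sits between fragment i - 1 and fragment i.
    i = rng.randint(1, len(mom) - 1)
    j = rng.randint(1, len(dad) - 1)
    sister = mom[:i] + dad[j:]
    brother = dad[:j] + mom[i:]
    return sister, brother


rng = random.Random(0)
sister, brother = toy_bond_crossover(['A', 'B', 'C'], ['X', 'Y'], rng)
print(sister, brother)
```

Every fragment of the parents ends up in exactly one child, so the operation conserves the combined fragment pool.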
- schrodinger.application.matsci.genetic_optimization.get_element_mutator_dict(astructure)¶
Return a dictionary where the keys contain the indices of the mutatable atoms and the values contain those elements that the keyed atom may be mutated to.
- Parameters
astructure (schrodinger.structure.Structure) – the structure to be mutated
- Return type
dict
- Returns
keys are atom indices of those atoms that are mutatable and values are those elements that the atom can be mutated to
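The shape of the returned dictionary can be sketched with a toy element list. `toy_element_mutator_dict` and the small `COLUMNS` table are hypothetical; the real function works on a Structure and may apply further restrictions that are not documented here. Atom indices are taken to be 1-based, which is an assumption.

```python
# Small illustrative subset of periodic-table columns; hydrogen is
# grouped with the halogens as described for elemental_mutator.
COLUMNS = [
    ['H', 'F', 'Cl', 'Br', 'I'],
    ['C', 'Si', 'Ge'],
    ['N', 'P', 'As'],
    ['O', 'S', 'Se'],
]


def toy_element_mutator_dict(elements):
    """Map 1-based atom indices to same-column replacement elements."""
    mutator_dict = {}
    for index, element in enumerate(elements, start=1):
        for column in COLUMNS:
            if element in column:
                mutator_dict[index] = [e for e in column if e != element]
    return mutator_dict


# Sodium is absent from the toy table, so atom 3 is not mutatable here.
print(toy_element_mutator_dict(['C', 'O', 'Na']))
```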
- schrodinger.application.matsci.genetic_optimization.get_isoelectronic_mutator_indicies(astructure)¶
Return a list of atom indices that can be mutated by the isoelectronic mutator.
- Parameters
astructure (schrodinger.structure.Structure) – the structure to be mutated
- Return type
list
- Returns
mutatable indices
- schrodinger.application.matsci.genetic_optimization.get_child_like_parent(parent_st, children_sts, definition)¶
Return the child structure that is most like the provided parent.
- Parameters
parent_st (schrodinger.structure.Structure) – the parent structure
children_sts (list of schrodinger.structure.Structure) – the children structures
definition (two-element list) – each sublist contains two atom indices describing the reactive bonds in parent and fragment structures which created the children
- Return type
schrodinger.structure.Structure
- Returns
the sought child structure
- schrodinger.application.matsci.genetic_optimization.elemental_mutator(genome, **args)¶
Perform a random elemental mutation to an element in the same column (also known as a group) of the periodic table. Note that hydrogen and the halogens are considered to belong to the same column.
- Parameters
genome (StructureGenome) – a genome
args (dict) – dictionary of genetic optimization parameters created and used by pyevolve
- Return type
int
- Returns
the number of mutations applied, appears to never be used in PyEvolve
- schrodinger.application.matsci.genetic_optimization.fragment_mutator(genome, **args)¶
Randomly mutate the genome by swapping a molecular fragment on one side of a bond with a similar fragment from a library.
- Parameters
genome (StructureGenome) – a genome
args (dict) – dictionary of genetic optimization parameters created and used by pyevolve
- Return type
int
- Returns
the number of mutations applied, appears to never be used in PyEvolve
- schrodinger.application.matsci.genetic_optimization.isoelectronic_mutator(genome, **args)¶
Perform a random isoelectronic mutation within one of the following series: CH3X, NH2X, OHX, and FX; CH2XY, NHXY, and OXY; CHXYZ and NXYZ, where X, Y, and Z are non-H-bonds.
- Parameters
genome (StructureGenome) – a genome
args (dict) – dictionary of genetic optimization parameters created and used by pyevolve
- Return type
int
- Returns
the number of mutations applied, appears to never be used in PyEvolve
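The isoelectronic series listed in the description can be written as a lookup keyed by the number of non-H bonds. The sketch below uses that table to list the possible replacements for a given center; `ISOELECTRONIC_SERIES` and `isoelectronic_alternatives` are hypothetical names, and the real mutator operates on a Structure.

```python
# Series from the description, keyed by the number of non-H bonds;
# each entry is (central element, number of attached hydrogens).
ISOELECTRONIC_SERIES = {
    1: [('C', 3), ('N', 2), ('O', 1), ('F', 0)],   # CH3X, NH2X, OHX, FX
    2: [('C', 2), ('N', 1), ('O', 0)],             # CH2XY, NHXY, OXY
    3: [('C', 1), ('N', 0)],                       # CHXYZ, NXYZ
}


def isoelectronic_alternatives(element, n_non_h_bonds):
    """List (element, n_hydrogens) replacements for the given center."""
    series = ISOELECTRONIC_SERIES.get(n_non_h_bonds, [])
    if element not in [e for e, _ in series]:
        return []
    return [(e, n_h) for e, n_h in series if e != element]


# A hydroxyl oxygen (OHX) can become CH3X, NH2X or FX.
print(isoelectronic_alternatives('O', 1))
```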
- schrodinger.application.matsci.genetic_optimization.get_loggable_float(afloat, num_decimal='%.2f', field_width=10)¶
Return a float as a string with the specified format.
- Parameters
afloat (float) – a float to convert to a string
num_decimal (str) – the format of the string representation
field_width (int) – the field width of the final string
- Return type
str
- Returns
the float as a string
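A minimal sketch of the formatting described above, assuming the formatted number is right-justified within the field; the justification is a guess, and `loggable_float` is a hypothetical stand-in rather than the module's function.

```python
def loggable_float(afloat, num_decimal='%.2f', field_width=10):
    """Format a float and pad it to a fixed field width."""
    # Assumption: the formatted number is right-justified in the field.
    return (num_decimal % afloat).rjust(field_width)


print(repr(loggable_float(3.14159)))
```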
- schrodinger.application.matsci.genetic_optimization.uniquify_titles_callback(ga_obj)¶
Callback to uniquify titles of the individuals.
- Parameters
ga_obj (GSimpleGA.GSimpleGA) – the entire current state of the genetic optimization
- schrodinger.application.matsci.genetic_optimization.prepare_next_generation_dirs_callback(ga_obj)¶
Callback to update the generation property of the genomes and to create a subdirectory to hold the next series of evaluations.
- Parameters
ga_obj (GSimpleGA.GSimpleGA) – the entire current state of the genetic optimization
- schrodinger.application.matsci.genetic_optimization.manage_skips_callback(ga_obj)¶
Callback to manage skips in the evaluation.
- Parameters
ga_obj (GSimpleGA.GSimpleGA) – the entire current state of the genetic optimization
- schrodinger.application.matsci.genetic_optimization.manage_failures_callback(ga_obj)¶
Callback to manage failures in the evaluation.
- Parameters
ga_obj (GSimpleGA.GSimpleGA) – the entire current state of the genetic optimization
- schrodinger.application.matsci.genetic_optimization.logging_summary_callback(ga_obj)¶
Callback to log progress.
- Parameters
ga_obj (GSimpleGA.GSimpleGA) – the entire current state of the genetic optimization
- schrodinger.application.matsci.genetic_optimization.molecule_history_callback(ga_obj)¶
Callback to append all structures from all generations to individual log files.
- Parameters
ga_obj (GSimpleGA.GSimpleGA) – the entire current state of the genetic optimization
- schrodinger.application.matsci.genetic_optimization.first_property(ga_obj)¶
Terminate when the first property has been matched.
- Parameters
ga_obj (GSimpleGA.GSimpleGA) – the entire current state of the genetic optimization
- Return type
bool
- Returns
True to terminate, False otherwise
- schrodinger.application.matsci.genetic_optimization.all_properties(ga_obj)¶
Terminate when all properties have been matched.
- Parameters
ga_obj (GSimpleGA.GSimpleGA) – the entire current state of the genetic optimization
- Return type
bool
- Returns
True to terminate, False otherwise
- schrodinger.application.matsci.genetic_optimization.unproductive(ga_obj)¶
Terminate if the maximum number of unproductive generations has been reached.
- Parameters
ga_obj (GSimpleGA.GSimpleGA) – the entire current state of the genetic optimization
- Return type
bool
- Returns
True to terminate, False otherwise
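The termination criteria above all share one shape: inspect the optimization state and return a bool. The "unproductive" test can be sketched without PyEvolve as follows. This is an illustration under stated assumptions: `is_unproductive` is a hypothetical name, the per-generation best scores are passed in as a plain list rather than read from `ga_obj`, and higher scores are assumed to be better.

```python
def is_unproductive(best_scores, num_unproductive):
    """Return True if the best score has not improved recently.

    best_scores: best raw score of each generation, oldest first
        (higher is better; an assumption of this sketch).
    num_unproductive: number of trailing generations allowed without
        improvement before terminating.
    """
    if num_unproductive < 1:
        raise ValueError("num_unproductive must be at least 1")
    if len(best_scores) <= num_unproductive:
        return False  # not enough history to judge yet
    earlier_best = max(best_scores[:-num_unproductive])
    recent_best = max(best_scores[-num_unproductive:])
    return recent_best <= earlier_best


print(is_unproductive([1.0, 2.0, 3.0, 3.0, 3.0, 3.0], num_unproductive=3))
```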
- class schrodinger.application.matsci.genetic_optimization.CheckInput¶
Bases:
object
Manage checking user input.
- checkMaeFile(input_file, logger=None)¶
Check that a file exists and is *mae.
- Parameters
input_file (str) – the name of the input file
logger (logging.Logger) – output logger
- checkOperators(operators, logger=None)¶
Check the operators.
- Parameters
operators (list) – contains tuples of the operator functions and their weights
logger (logging.Logger) – output logger
- checkRates(crossover_rate, mutation_rate, logger=None)¶
Check the specified rates of crossover and mutation.
- Parameters
crossover_rate (float) – the rate of crossover as a percentage
mutation_rate (float) – the rate of mutation as a percentage
logger (logging.Logger) – output logger
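Because both rates are given as percentages, a range check is the natural validation. The following is a sketch only: `check_rates` is a hypothetical stand-in, the accepted range of 0 to 100 is inferred from the word "percentage", and raising `ValueError` is an assumption, since how checkRates reports invalid input is not documented here.

```python
def check_rates(crossover_rate, mutation_rate):
    """Raise ValueError unless both rates are percentages in [0, 100]."""
    rates = (("crossover_rate", crossover_rate), ("mutation_rate", mutation_rate))
    for name, rate in rates:
        if not 0.0 <= rate <= 100.0:
            raise ValueError(f"{name} must be in [0, 100], got {rate}")


check_rates(90.0, 90.0)  # the documented defaults pass silently
```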
- checkInitialPopulation(initial_population, crossover_names, mutator_names, crossover_rate, mutation_rate, no_open_shell, logger=None)¶
Check the initial population.
- Parameters
initial_population (list) – the initial population of schrodinger.structure.Structure
crossover_names (list) – contains the function names of the crossover operators to be used
mutator_names (list) – contains the function names of the mutation operators to be used
crossover_rate (float) – the rate of crossover
mutation_rate (float) – the rate of mutation
no_open_shell (bool) – if True then check for open shell structures otherwise do not
logger (logging.Logger) – output logger
- checkPopulationParam(population, num_structures_given, logger=None)¶
Check the population parameter.
- Parameters
population (int) – the size of the population to use in the genetic optimization
num_structures_given (int) – the number of structures provided to the genetic optimization
logger (logging.Logger) – output logger
- checkFragmentLibs(fragment_libs, logger=None)¶
Check the specified fragment libraries.
- Parameters
fragment_libs (list) – strings specifying fragment libraries to be used
logger (logging.Logger) – output logger
- Return type
list
- Returns
valid user provided fragment files
- checkProperties(properties, logger=None)¶
Check the list of properties.
- Parameters
properties (list) – contains Property instances
logger (logging.Logger) – output logger
- checkGenerations(generations, logger=None)¶
Check the specified number of generations.
- Parameters
generations (int) – the number of generations
logger (logging.Logger) – output logger
- checkSelection(selection, logger=None)¶
Check the specified selection protocol.
- Parameters
selection (str) – the selection protocol to use.
logger (logging.Logger) – output logger
- checkTournamentSize(tournament_size, population, logger=None)¶
Check the specified tournament size.
- Parameters
tournament_size (int) – the size of tournament to use in tournament based selection
population (int) – the size of population to use
logger (logging.Logger) – output logger
- checkTerminationParams(terminators, num_unproductive, logger=None)¶
Check the termination parameters.
- Parameters
terminators (list) – the list of terminators to use
num_unproductive (int) – used when the unproductive termination option is active, it is the generation number on which to exit if the score hasn’t improved
logger (logging.Logger) – output logger
- Return type
list and int
- Returns
valid terminators and valid num_unproductive
- checkScaling(scaling, properties, logger=None)¶
Check the scaling.
- Parameters
scaling (str) – the scaling protocol to use in the genetic optimization
properties (list) – the properties to be optimized
logger (logging.Logger) – output logger
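The default scaling protocol named in this module is 'sigma_truncation'. As background, the textbook form of sigma truncation shifts each raw score by the population mean minus a multiple of the standard deviation and clips negatives to zero. The sketch below shows that textbook form; the function name and the multiplier `c=2.0` are illustrative choices, and whether this module uses exactly this form or multiplier is an assumption.

```python
import statistics


def sigma_truncation(raw_scores, c=2.0):
    """Scale raw scores as max(0, score - (mean - c * sigma))."""
    mean = statistics.mean(raw_scores)
    sigma = statistics.pstdev(raw_scores)  # population standard deviation
    floor = mean - c * sigma
    return [max(0.0, score - floor) for score in raw_scores]


# One poor individual among nine identical ones is clipped to zero fitness.
print(sigma_truncation([0, 10, 10, 10, 10, 10, 10, 10, 10, 10]))
```

Differences between scores that survive the clipping are preserved, so the ranking of the population is unchanged.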
- checkElitism(elitism, population, logger=None)¶
Check the elitism.
- Parameters
elitism (int) – the number of elite individuals to use
population (int) – the size of population to use
logger (logging.Logger) – output logger
- checkConformationalSearch(conformational_search, logger=None)¶
Check the conformational search.
- Parameters
conformational_search (bool or str) – specifies whether a conformational search is to be performed; if a string is given, it specifies a file used to set options
logger (logging.Logger) – output logger
- checkFreezers(freezers, pop_size, input_size, logger=None)¶
Check the freezers.
- Parameters
freezers (list) – collection of freezers to use
pop_size (int) – the size of the population
input_size (int) – the number of structures given
logger (logging.Logger) – output logger
- Return type
list
- Returns
collection of freezers to use
- checkInoculate(inoculate, logger=None)¶
Check the inoculate.
- Parameters
inoculate (list) – circumstances in which to inoculate
logger (logging.Logger) – output logger
- schrodinger.application.matsci.genetic_optimization.print_bad_jobs(all_bad_jobs, logger, bad_type='skip')¶
Log bad jobs, i.e. skips and failures.
- Parameters
all_bad_jobs (dict) – a collection of bad subjobs, keys are genetic optimization generations and values are lists of Skip or Failure objects for bad subjobs
logger (logging.Logger) – output logger
bad_type (str) – specifies either ‘skip’ or ‘fail’ type
- class schrodinger.application.matsci.genetic_optimization.GeneticOptimization(initial_population, properties, structure_score_threshold=-50.0, eval_kwargs={}, crossovers=None, mutators=None, fragment_libs=['optoelectronics'], script_evaluator=None, generations=10, population=8, crossover_rate=90.0, mutation_rate=90.0, selection='roulette_wheel', tournament_size=2, terminators=['unproductive', 'all_properties'], num_unproductive=6, scaling='sigma_truncation', elitism=1, random_seed=None, no_minimize=False, file_base_name='genopt', no_open_shell=False, props_to_remove=None, jobbe=None, conformational_search=False, freezers=['remainder', 'previous'], inoculate=['no_child', 'bad_structure'], class_evaluators=None, logger=None)¶
Bases:
object
Manage the genetic optimization.
- MSGWIDTH = 80¶
- __init__(initial_population, properties, structure_score_threshold=-50.0, eval_kwargs={}, crossovers=None, mutators=None, fragment_libs=['optoelectronics'], script_evaluator=None, generations=10, population=8, crossover_rate=90.0, mutation_rate=90.0, selection='roulette_wheel', tournament_size=2, terminators=['unproductive', 'all_properties'], num_unproductive=6, scaling='sigma_truncation', elitism=1, random_seed=None, no_minimize=False, file_base_name='genopt', no_open_shell=False, props_to_remove=None, jobbe=None, conformational_search=False, freezers=['remainder', 'previous'], inoculate=['no_child', 'bad_structure'], class_evaluators=None, logger=None)¶
Create an instance.
- Parameters
initial_population (list) – the initial population of schrodinger.structure.Structure
properties (list of Property) – the properties to be optimized, including structural properties as well as more physical calculable observables
structure_score_threshold (float) – if structure-based properties are being sought and if the base evaluator will be used then subjobs on structures with structure scores below this value will not be launched but rather such structures treated as skips
eval_kwargs (dict) – a dictionary that will be available in all evaluator functions
crossovers (list) – contains two-element tuples each of which holds a crossover operator to be used in the optimization along with a weight
mutators (list) – contains two-element tuples each of which holds a mutation operator to be used in the optimization along with a weight
fragment_libs (list) – strings specifying fragment libraries to be used, can be either module constants from FRAGMENT_LIBS.keys() (or ALL if all of those are desired) or the names of Maestro files (including the file extensions) containing fragments collected by the user
script_evaluator (method) – the evaluator function to be called to score individuals during the optimization, takes a StructureGenome and returns a JobDJ
generations (int) – the number of generations for which to run the optimization
population (int) – the population size to use in the optimization, can be less-than-or-equal-to the length of initial_population
crossover_rate (float) – the rate of crossover as a percentage
mutation_rate (float) – the rate of mutation as a percentage
selection (str) – the selection protocol used to select individuals to the gene pool for the upcoming generation
tournament_size (int) – the size of tournament to use if using tournament based selection, unused if a tournament based selection is not being used
terminators (list) – list of strings that specify the termination protocols to be used to terminate the optimization, typically more than one is specified only if the unproductive protocol is being used
num_unproductive (int) – if the unproductive protocol is being used to terminate the optimization then this integer specifies how many unproductive cycles are allowed before terminating, unused if a different termination protocol is used
scaling (str) – specifies the scaling protocol to use, scaling scales the raw scores of the individuals to produce fitness scores to ease selection in cases where raw scores are nearly equal
elitism (int) – specify the number of elite individuals guaranteed to be added to the gene pool for the upcoming generation, zero disables elitism
random_seed (None or int) – the random seed, if None then system time will be used
no_minimize (bool) – specify that the offspring structures generated by the crossover and mutation operators not be geometry optimized prior to selection
file_base_name (str) – base name to use for output and generation log files
no_open_shell (bool) – if True then do not allow the processing of open shell molecules, False otherwise
props_to_remove (list) – a list of structure property keys to be removed prior to the evaluation stage
jobbe (schrodinger.job.jobcontrol._Backend) – the jobcontrol backend of the driver job
conformational_search (bool or str) – specifies whether a Macromodel conformational search will be performed prior to evaluation, when a string it specifies a simplified Macromodel input file containing extra options
freezers (list) – a collection of freezers containing structures that are used to swap out individuals from the population
inoculate (list) – the list of circumstances under which to use the structure freezers
class_evaluators (dict) – keys are the evaluator classes to be called to score individuals during the optimization, each must inherit ClassEvaluator, values are lists of Property to be passed to the class evaluator
logger (logging.Logger) – output logger
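With this many keyword arguments, collecting the non-default choices in one dictionary can keep a driver script readable. The sketch below only builds such a dictionary from the documented defaults; the seed value and the names `structs` and `props` in the comment are hypothetical, and the commented call requires a Schrodinger installation and is not executed here.

```python
# Keyword arguments mirroring the documented defaults of GeneticOptimization.
ga_kwargs = {
    "generations": 10,
    "population": 8,
    "crossover_rate": 90.0,  # a percentage, not a fraction
    "mutation_rate": 90.0,   # a percentage, not a fraction
    "selection": "roulette_wheel",
    "terminators": ["unproductive", "all_properties"],
    "num_unproductive": 6,
    "scaling": "sigma_truncation",
    "elitism": 1,
    "random_seed": 1234,     # illustrative value; None means system time
    "file_base_name": "genopt",
}

# With `structs` (list of Structure) and `props` (list of Property)
# prepared elsewhere, the call would presumably look like:
#
#   from schrodinger.application.matsci import genetic_optimization as go
#   optimizer = go.GeneticOptimization(structs, props, **ga_kwargs)
#   optimizer.runIt()

print(sorted(ga_kwargs))
```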
- setRootLoggerForPyEvolve()¶
Set up the root logger for PyEvolve.
- setOperatorNames()¶
Set the operator names.
- checkInputParams()¶
Check the input parameters.
- printProperties()¶
Log the set of sought properties and their details.
- printParams()¶
Log the parameters.
- initializeGenome()¶
Initialize a genome.
- Return type
- Returns
a genome
- initializeGA(genome)¶
Initialize the genetic optimization.
- Parameters
genome (StructureGenome) – a genome
- setMonomerGrowAtoms()¶
Set the monomer grow atoms using the mark monomer module convention rather than the polymer builder module convention.
- runIt()¶
Run the components of the genetic optimization.
- schrodinger.application.matsci.genetic_optimization.get_output_file_name(basename)¶
Get the output file name from the basename.
- Parameters
basename (str) – base name to use
- Return type
str
- Returns
output_file_name, name of output file
- schrodinger.application.matsci.genetic_optimization.get_generation_log_file_name(basename, generation)¶
Get the generation log file name.
- Parameters
basename (str) – base name to use
generation (int) – the generation
- Return type
str
- Returns
generation_log_file_name, name of generation log file
- schrodinger.application.matsci.genetic_optimization.get_structure_score(astructure, properties, conformational_search, seed=None, this_random=None)¶
Return the structure score for the provided structure.
- Parameters
astructure (schrodinger.structure.Structure) – the structure to score
properties (list of Property) – the properties used in scoring
conformational_search (bool or str) – specifies whether a Macromodel conformational search will be performed prior to evaluation, when a string it specifies a simplified Macromodel input file containing extra options
seed (int or None) – random seed used in conformational search or None if conformational search is not being done
this_random (numpy.random.RandomState or None) – random state, if None use the module constant
- Return type
float
- Returns
the structure score
- schrodinger.application.matsci.genetic_optimization.structure_evaluator(genome, this_random=None)¶
This is the structure evaluator.
- Parameters
genome (StructureGenome) – a genome
this_random (numpy.random.RandomState or None) – random state, if None use the module constant
- Return type
float
- Returns
the score for this individual
- schrodinger.application.matsci.genetic_optimization.base_evaluator(genome)¶
This is the base evaluator used to wrap all other evaluators.
- Parameters
genome (StructureGenome) – a genome
- Return type
float
- Returns
the score for this individual
- schrodinger.application.matsci.genetic_optimization.optoelectronics_evaluator(genome)¶
Run an optoelectronics job.
- Parameters
genome (StructureGenome) – a genome
- Return type
- Returns
the JobDJ object for this individual, it is run in the base evaluator
- schrodinger.application.matsci.genetic_optimization.apply_uniform_operator_weights(operators)¶
Set the operator weights uniformly.
- Parameters
operators (list) – a list of two-element tuples, each tuple contains first an operator function and second a weight
- Return type
list
- Returns
list of two-element tuples of operators and uniform weights
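A sketch of uniform reweighting on the documented data shape, a list of (operator, weight) tuples. The name `uniform_weights` is hypothetical, and normalizing so the weights sum to 1 is an assumption; the real function may assign a different constant to every operator.

```python
def uniform_weights(operators):
    """Return the operators paired with equal weights that sum to 1."""
    if not operators:
        return []
    weight = 1.0 / len(operators)
    return [(operator, weight) for operator, _ in operators]


# Stand-in operators; any callables work for the illustration.
weighted = uniform_weights([(min, 0.7), (max, 0.2), (sum, 0.1), (len, 0.0)])
print([weight for _, weight in weighted])
```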
- schrodinger.application.matsci.genetic_optimization.structure_is_open_shell(astructure, ignore_charge=True)¶
Return True if the provided structure is open shell, i.e. has an odd number of electrons.
- Parameters
astructure (schrodinger.structure.Structure) – the structure in question
ignore_charge (bool) – if True then ignore any structure.formal_charge settings
- Return type
bool
- Returns
True if the provided structure is open shell, False otherwise
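The odd-electron test can be illustrated without a Structure object by counting electrons from element symbols. This is a sketch: `is_open_shell` and the abbreviated `ATOMIC_NUMBERS` table are hypothetical, and subtracting the formal charge when `ignore_charge` is False is an assumption about how the charge enters the count.

```python
# Abbreviated table for illustration only.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9}


def is_open_shell(elements, formal_charge=0, ignore_charge=True):
    """Return True if the electron count is odd."""
    electrons = sum(ATOMIC_NUMBERS[element] for element in elements)
    if not ignore_charge:
        electrons -= formal_charge  # a +1 charge removes one electron
    return electrons % 2 == 1


methyl_radical = ["C", "H", "H", "H"]  # 9 electrons
methane = ["C", "H", "H", "H", "H"]    # 10 electrons
print(is_open_shell(methyl_radical), is_open_shell(methane))
```

Note the effect of the default: with `ignore_charge=True`, a closed-shell ion such as ammonium (11 electrons when counted as neutral) is reported as open shell.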
- schrodinger.application.matsci.genetic_optimization.get_element_histogram(astructure)¶
Return a dictionary where keys are elements and values are the numbers of atoms of a given element.
- Parameters
astructure (schrodinger.structure.Structure) – the structure in question
- Return type
dict
- Returns
dictionary with element histogram, keys are elements (strs) and values are numbers (ints)
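The returned shape, element symbols mapped to atom counts, is what `collections.Counter` produces. A minimal sketch operating on a plain list of element symbols rather than a Structure; `element_histogram` is a hypothetical name.

```python
from collections import Counter


def element_histogram(elements):
    """Return a dict mapping each element symbol to its atom count."""
    return dict(Counter(elements))


# Ethanol, C2H6O, as a flat list of element symbols.
ethanol = ["C", "C", "O"] + ["H"] * 6
print(element_histogram(ethanol))
```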
- schrodinger.application.matsci.genetic_optimization.remove_basename_ext(stoich_ext)¶
Remove the basename extension from the given string and return the remainder which is the stoichiometry. Do this instead of having to recompute the stoichiometry which can be expensive.
- Parameters
stoich_ext (str) – contains the stoichiometry and basename extension
- Return type
str
- Returns
stoichiometry
- schrodinger.application.matsci.genetic_optimization.get_low_energy_conformers(astructure_in, macromodel_options_file=None, remove_files=False, overwrite=False, seed=None, this_random=None, host_str=None)¶
Return the lowest energy conformers from a Macromodel conformational search.
- Parameters
astructure_in (schrodinger.structure.Structure) – the structure to search for conformations
macromodel_options_file (str or None) – the name of a simplified Macromodel input file that contains any options to use in addition to those used by default in a conformational search or None if there are none and you just want to use the defaults
remove_files (bool) – if the job is successful, specifies whether to remove all files created for it after it finishes
overwrite (bool) – if True then the coordinates of the input structure will be overwritten by those of the lowest energy conformer and that structure alone returned by this function
seed (int or None) – used to seed the random number generator used in the Macromodel conformational search, should be in CONF_SEARCH_SEED_RANGE, if None then if a CONFSEARCH_SEED has been specified in macromodel_options_file it will be used, otherwise a random int in CONF_SEARCH_SEED_RANGE will be used
this_random (numpy.random.RandomState or None) – random state, if None use the module constant
host_str (str) – the host string, for example ‘localhost:4’
- Return type
list of schrodinger.structure.Structure, int
- Returns
the structures of the lowest energy conformers sorted by increasing energy and the seed used in the conformational search (same as input if input was given either as seed or in macromodel_options_file)
- schrodinger.application.matsci.genetic_optimization.get_random_structure(structure_libs, tries_from_libs=3, structure_score_threshold=None, properties=None, conformational_search=False, seed=None, this_random=None)¶
From the given dictionary of libraries return a random structure.
- Parameters
structure_libs (dict) – keys are strings specifying the types of libraries to be used and can be module constants from FREEZER_CHOICES.keys(), values are lists of libraries by type and can be either module constants from FRAGMENT_LIBS.keys(), ALL, or the names of Maestro files (including the file extensions)
tries_from_libs (int) – the number of times to try before giving up
structure_score_threshold (float or None) – specifies that a structure with a structure score greater-than-or-equal-to this threshold is sought, the best of the considered structures will be returned and will contain several structure properties related to the scoring
properties (list of Property or None) – the properties used in structure scoring
conformational_search (bool or str) – specifies whether a Macromodel conformational search will be performed prior to evaluation, when a string it specifies a simplified Macromodel input file containing extra options
seed (int or None) – if not None specifies that random should be reseeded with the given value
this_random (numpy.random.RandomState or None) – random state, if None use the module constant
- Return type
schrodinger.structure.Structure or None
- Returns
the random structure or None if one couldn’t be found
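The interplay of `tries_from_libs` and `structure_score_threshold` can be sketched as a bounded retry loop that remembers the best candidate seen. This is an illustration under stated assumptions: `pick_random` and its `score` callable are hypothetical, candidates are plain Python objects instead of structures, the standard-library `random.Random` replaces `numpy.random.RandomState`, and returning the best candidate even when no try reaches the threshold is one reading of the parameter description.

```python
import random


def pick_random(candidates, score, tries=3, threshold=None, rng=None):
    """Draw up to `tries` random candidates and return the best one seen.

    Stops early once a candidate scores at or above `threshold`.
    Returns None if there is nothing to draw from.
    """
    rng = rng if rng is not None else random.Random()
    if not candidates:
        return None
    best, best_score = None, None
    for _ in range(tries):
        candidate = rng.choice(candidates)
        if threshold is None:
            return candidate  # no scoring requested
        candidate_score = score(candidate)
        if best is None or candidate_score > best_score:
            best, best_score = candidate, candidate_score
        if best_score >= threshold:
            break
    return best


# Toy example: score SMILES-like strings by length.
print(pick_random(["C", "CC", "CCC"], len, threshold=2, rng=random.Random(0)))
```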
- schrodinger.application.matsci.genetic_optimization.get_freezer_structure(structure_libs, tries_from_libs=3, structure_score_threshold=None, properties=None, conformational_search=False, inoculate='no_child', crossover_applied=None, mutation_applied=None, basename_ext=None, seed=None, this_random=None)¶
Return a random structure from the freezer and update that structure’s properties.
- Parameters
structure_libs (dict) – keys are strings specifying the types of libraries to be used and can be module constants from FREEZER_CHOICES.keys(), values are lists of libraries by type and can be either module constants from FRAGMENT_LIBS.keys(), ALL, or the names of Maestro files (including the file extensions)
tries_from_libs (int) – the number of times to try before giving up
structure_score_threshold (float or None) – specifies that a structure with a structure score greater-than-or-equal-to this threshold is sought, the best of the considered structures will be returned and will contain several structure properties related to the scoring
properties (list of Property or None) – the properties used in structure scoring
conformational_search (bool or str) – specifies whether a Macromodel conformational search will be performed prior to evaluation, when a string it specifies a simplified Macromodel input file containing extra options
inoculate (str) – specify the reason for drawing from the freezer, which is an inoculate option from INOCULATE_CHOICES
crossover_applied (str or None) – specify the intended crossover operator or None if there isn’t to be one
mutation_applied (str or None) – specify the intended mutation operator or None if there isn’t to be one
basename_ext (str or None) – specify an extension to append to the stoichiometry which is used to set the title of the returned structure
seed (int or None) – if not None specifies that random should be reseeded with the given value
this_random (numpy.random.RandomState or None) – random state, if None use the module constant
- Return type
schrodinger.structure.Structure or None
- Returns
the random structure or None if one couldn’t be found