schrodinger.application.matsci.mlearn.features module¶
Classes and functions to deal with ML features.
Copyright Schrodinger, LLC. All rights reserved.
- class schrodinger.application.matsci.mlearn.features.MomentData(flag, components, header, units)¶
Bases:
tuple
- __contains__(key, /)¶
Return key in self.
- __len__()¶
Return len(self).
- components¶
Alias for field number 1
- count(value, /)¶
Return number of occurrences of value.
- flag¶
Alias for field number 0
- header¶
Alias for field number 2
- index(value, start=0, stop=9223372036854775807, /)¶
Return first index of value.
Raises ValueError if the value is not present.
- units¶
Alias for field number 3
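The field aliases above follow the standard named-tuple layout. A minimal sketch, assuming only the field order shown in the signature; the stand-in class and all values below are invented for illustration and are not taken from the module:

```python
from collections import namedtuple

# Stand-in with the same field order as the documented signature.
# The real class lives in schrodinger.application.matsci.mlearn.features.
MomentData = namedtuple("MomentData", ["flag", "components", "header", "units"])

# Hypothetical values, for illustration only.
moment = MomentData(flag="dipole", components=("x", "y", "z"),
                    header="Dipole", units="debye")

print(moment.flag)        # field number 0
print(moment[1])          # same object as moment.components
print(len(moment))        # number of fields
```

Access by name and access by position are interchangeable, which is why each attribute is documented as an alias for a field number.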
- schrodinger.application.matsci.mlearn.features.DescriptorUtility¶
alias of
schrodinger.application.matsci.mlearn.features.DescriptorUtilitity
- schrodinger.application.matsci.mlearn.features.get_distance_cell(struct, cutoff)[source]¶
Create an infrastructure Distance Cell. Struct MUST have the Chorus box properties.
- Parameters
struct (schrodinger.structure.Structure) – Input structure
cutoff (float) – The cutoff for finding nearest neighbor atoms
- Return type
schrodinger.structure.Structure, schrodinger.infra.structure.DistanceCell, schrodinger.infra.structure.PBC
- Returns
Supercell, an infrastructure Distance Cell that accounts for the PBC, and the PBC used to create it.
- Raises
ValueError if struct is missing PBCs
- schrodinger.application.matsci.mlearn.features.elemental_generator(struct, element, is_equal=True)[source]¶
- schrodinger.application.matsci.mlearn.features.get_anion(struct)[source]¶
Get the most electronegative element in the structure (anion).
- Parameters
struct (schrodinger.structure.Structure) – Input structure
- Return type
str, float, int
- Returns
Element, its electronegativity, number of anions in the cell
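The selection described above can be sketched without the Schrodinger API. This is an illustration only: the helper name, the input format (one element symbol per atom), and the use of the Pauling scale are assumptions, since the source does not say which electronegativity scale the module uses.

```python
from collections import Counter

# Pauling electronegativities for a few elements (standard literature values).
PAULING_ENEG = {"Li": 0.98, "P": 2.19, "S": 2.58, "O": 3.44, "F": 3.98}


def most_electronegative(elements):
    """Return (element, electronegativity, count) for the most
    electronegative element in a list of element symbols."""
    counts = Counter(elements)
    anion = max(counts, key=lambda el: PAULING_ENEG[el])
    return anion, PAULING_ENEG[anion], counts[anion]


# Li3PS4-like composition, written out as one symbol per atom.
cell = ["Li"] * 3 + ["P"] + ["S"] * 4
print(most_electronegative(cell))
```

The three returned values mirror the documented return type of str, float, int.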
- class schrodinger.application.matsci.mlearn.features.LatticeFeatures(features, element='Li', cutoff=4.0)[source]¶
Bases:
schrodinger.application.matsci.mlearn.base.BaseFeaturizer
Class to generate lattice-based features.
- FEATURES = {'anionFrameCoordination': 'Anion frame coordination', 'avgAnionAnionShortDistance': 'Average anion anion shortest distance', 'avgAtomicVol': 'Average atomic volume', 'avgElementAnionShortDistance': 'Average cation anion shortest distance', 'avgElementNeighborCount': 'Average cation count', 'avgNeighborCount': 'Average neighbor count', 'avgNeighborIon': 'Average neighbor ionicity', 'avgShortDistance': 'Average cation cation shortest distance', 'avgSublatticeEneg': 'Average sublattice electronegativity', 'avgSublatticeNeighborCount': 'Average sublattice neighbor count', 'avgSublatticeNeighborIon': 'Average sublattice neighbor ionicity', 'packingFraction': 'Crystal packing fraction', 'pathWidth': 'Average straight-line path width', 'pathWidthEneg': 'Average straight-line path electronegativity', 'ratioCount': 'Ratio of average cation to sublattice count', 'ratioIonicity': 'Ratio of average cation to sublattice electronegativity', 'stdNeighborCount': 'Standard deviation of neighbor count', 'stdNeighborIon': 'Standard deviation of neighbor ionicity', 'sublatticePackingFraction': 'Sublattice packing fraction', 'volPerAnion': 'Volume per anion'}¶
- runFeature(feature)[source]¶
Get result from a feature.
- Parameters
feature – One of the features listed in FEATURES.
- Return type
int or float
- Returns
Feature value
- transform(structs)[source]¶
Get numerical features from structures. Also sets features names in self.labels. See parent class for more documentation.
- Parameters
structs (list(schrodinger.structure.Structure)) – List of structures to be featurized
- Return type
numpy array of shape [n_samples, n_features]
- Returns
Transformed array
- avgAtomicVol()[source]¶
Get average atomic volume.
- Parameters
struct (schrodinger.structure.Structure) – Structure to be used for feature calculation
- Return type
float
- Returns
Average atomic volume (A^3)
- avgNeighborCount()[source]¶
Get average neighbor count.
- Return type
float
- Returns
Average neighbor count
- stdNeighborCount()[source]¶
Get standard deviation of neighbor count.
- Return type
float
- Returns
Standard deviation of neighbor count
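The neighbor-count statistics can be illustrated with a cutoff-based count. This is a rough sketch, not the module's implementation: it ignores periodic boundary conditions (which the real featurizer handles through a distance cell), the helper name is hypothetical, and the choice of the population standard deviation is an assumption.

```python
import math
import statistics


def neighbor_counts(coords, cutoff):
    """Count, for each point, the other points closer than cutoff."""
    counts = []
    for i, a in enumerate(coords):
        n = sum(1 for j, b in enumerate(coords)
                if i != j and math.dist(a, b) < cutoff)
        counts.append(n)
    return counts


# Three atoms in a row with 1 A spacing, plus one isolated atom.
coords = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0)]
counts = neighbor_counts(coords, cutoff=1.5)

avg = statistics.mean(counts)     # average neighbor count
std = statistics.pstdev(counts)   # population standard deviation
print(counts, avg, std)
```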
- avgSublatticeEneg()[source]¶
Get average sublattice electronegativity.
- Return type
float
- Returns
Average sublattice electronegativity
- avgSublatticeNeighborCount()[source]¶
Get average sublattice neighbor count.
- Return type
float
- Returns
Average sublattice neighbor count
- avgNeighborIon()[source]¶
Get average neighbor ionicity.
- Return type
float
- Returns
Average neighbor ionicity
- stdNeighborIon()[source]¶
Get standard deviation of neighbor ionicity.
- Return type
float
- Returns
Standard deviation of neighbor ionicity
- avgSublatticeNeighborIon()[source]¶
Get average sublattice neighbor ionicity.
- Return type
float
- Returns
Average sublattice neighbor ionicity
- packingFraction(skip_element=None)[source]¶
Get packing fraction of the crystal.
- Parameters
skip_element (str) – Element to skip
- Return type
float
- Returns
Packing fraction
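A packing fraction is conventionally the volume occupied by atomic spheres divided by the cell volume. The sketch below shows that textbook definition; which radii the module uses and how it treats skip_element are not stated in the source, so the function and its inputs are assumptions for illustration.

```python
import math


def packing_fraction(radii, cell_volume):
    """Fraction of the cell volume filled by spheres of the given radii."""
    sphere_volume = sum(4.0 / 3.0 * math.pi * r ** 3 for r in radii)
    return sphere_volume / cell_volume


# Simple cubic lattice: one atom of radius 1 in a cube of edge 2.
fraction = packing_fraction([1.0], cell_volume=2.0 ** 3)
print(round(fraction, 4))
```

For the simple cubic case the result equals pi / 6, the known packing fraction of that lattice.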
- effectiveRadius(atom)[source]¶
Get atom effective radius.
- Parameters
atom (schrodinger.structure._StructureAtom) – Atom
- Return type
float
- Returns
Effective radius
- sublatticePackingFraction()[source]¶
Get packing fraction of the sublattice crystal.
- Return type
float
- Returns
Packing fraction
- avgElementNeighborCount()[source]¶
Get average element neighbor count.
- Return type
float
- Returns
Average number of bonds per element
- avgAnionAnionShortDistance()[source]¶
Get average anion anion shortest distance.
- Return type
float
- Returns
Average anion anion shortest distance
- avgElementAnionShortDistance()[source]¶
Get average element anion shortest distance.
- Return type
float
- Returns
Average element anion shortest distance
- avgShortDistance()[source]¶
Get average element element shortest distance.
- Return type
float
- Returns
Average element element shortest distance
- anionFrameCoordination()[source]¶
Get anion framework coordination.
- Return type
float
- Returns
Anion framework coordination
- pathWidth(eval_eneg=False)[source]¶
Evaluate average straight line path width. See the reference in the constructor for more info.
- Parameters
eval_eneg (bool) – If True, return average over electronegativity, instead of distance
- Return type
float
- Returns
Average path or electronegativity
- pathWidthEneg()[source]¶
Evaluate average straight line path electronegativity.
- Return type
float
- Returns
Average electronegativity along the path
- fit(data, data_y=None)¶
Fit and return self. Anything that evaluates properties related to the passed data should go here. For example, compute physical properties of a structure and save them as a class property, to be used in the transform method.
- Parameters
data (numpy array of shape [n_samples, n_features]) – Training set
data_y (numpy array of shape [n_samples]) – Target values
- Return type
- Returns
self object with fitted data
- fit_transform(X, y=None, **fit_params)¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
y : ndarray of shape (n_samples,), default=None
Target values.
**fit_params : dict
Additional fit parameters.
X_new : ndarray array of shape (n_samples, n_features_new)
Transformed array.
- get_params(deep=True)¶
Get parameters for this estimator.
deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
params : mapping of string to any
Parameter names mapped to their values.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
**params : dict
Estimator parameters.
self : object
Estimator instance.
- class schrodinger.application.matsci.mlearn.features.Ligand(st, metal_atom, new_to_old, coordination_idxs)[source]¶
Bases:
object
Manage a ligand.
- __init__(st, metal_atom, new_to_old, coordination_idxs)[source]¶
Create an instance.
- Parameters
st (schrodinger.structure.Structure) – the structure
metal_atom (schrodinger.structure._StructureAtom) – the metal atom
new_to_old (dict) – the map of new indices (extracted ligand) to old indices (original structure)
coordination_idxs (list) – contains groups of indices (new indices) of coordinating atoms
- getVec(point)[source]¶
Return a vector pointing from the metal atom to the given point.
- Parameters
point (numpy.array) – the point in Ang.
- Return type
numpy.array
- Returns
the vector in Ang.
- getCentroid(st, idxs)[source]¶
Return the centroid vector of the given coordination atom indices.
- Parameters
st (schrodinger.structure.Structure) – the structure
idxs (list) – the coordination indices
- Return type
numpy.array
- Returns
the centroid vector in Ang.
- getCoordinationVec(st, idxs)[source]¶
Return a coordination vector pointing from the metal atom to the centroid of the given coordination atom indices.
- Parameters
st (schrodinger.structure.Structure) – the structure
idxs (list) – the coordination indices
- Return type
numpy.array
- Returns
the coordination vector in Ang.
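The geometry behind getVec, getCentroid and getCoordinationVec can be sketched with plain tuples. The helper names and coordinates below are hypothetical, and tuples are used instead of numpy arrays only to keep the example free of dependencies.

```python
def centroid(points):
    """Arithmetic mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def vector_from(origin, point):
    """Vector pointing from origin to point."""
    return tuple(p - o for o, p in zip(origin, point))


# Hypothetical metal position and four coordinating atoms (Angstrom).
metal = (0.0, 0.0, 0.0)
ring = [(1.0, 0.0, 2.0), (-1.0, 0.0, 2.0), (0.0, 1.0, 2.0), (0.0, -1.0, 2.0)]

# Coordination vector: from the metal to the centroid of the group.
coord_vec = vector_from(metal, centroid(ring))
print(coord_vec)
```

For a group of coordinating atoms arranged symmetrically above the metal, the coordination vector points straight at the center of the group.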
- getHapticCharacter()[source]¶
Return the haptic character.
- Return type
int
- Returns
the haptic character
- getBiteAngle()[source]¶
Return the bite angle in degrees.
- Return type
float or None
- Returns
the bite angle in degrees
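A bite angle is commonly defined as the donor-metal-donor angle of a bidentate ligand. The sketch below computes it as the angle between two coordination vectors. The function names are hypothetical, and returning None when there are not exactly two vectors is an assumption made here to match the documented "float or None" return type, not a statement about the module's logic.

```python
import math


def angle_deg(u, v):
    """Angle between two 3D vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def bite_angle(coordination_vecs):
    """Angle between exactly two coordination vectors, else None.

    Returning None for other counts is an assumption of this sketch.
    """
    if len(coordination_vecs) != 2:
        return None
    return angle_deg(*coordination_vecs)


vecs = [(2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
print(bite_angle(vecs))
```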
- getAtomConeAngle(atom)[source]¶
Return the cone angle for the given atom in degrees.
- Parameters
atom (schrodinger.structure._StructureAtom) – the atom
- Return type
float
- Returns
the cone angle for the given atom in degrees
- getConeAngle()[source]¶
Return the cone angle in degrees.
- Return type
float
- Returns
the cone angle in degrees
- class schrodinger.application.matsci.mlearn.features.Complex(st, logger=None, nonmetallic_centers=())[source]¶
Bases:
object
Manage a complex.
- BURIED_VOLUME_VDW_SCALE = 1.17¶
- CONTOURS_DIR = 'contours'¶
- __init__(st, logger=None, nonmetallic_centers=())[source]¶
Create an instance.
- Parameters
st (schrodinger.structure.Structure) – the structure
logger (logging.Logger or None) – output logger or None if there isn’t one
nonmetallic_centers (tuple) – Tuple of nonmetallic elements to also consider when looking for center atom
- getBondAngle()[source]¶
Return the bond angle in degrees.
- Return type
float
- Returns
the bond angle in degrees
- getVDWSurfaceArea()[source]¶
Return the VDW surface area in Angstrom^2.
- Return type
float
- Returns
the VDW surface area in Angstrom^2
- getVDWVolume(vdw_scale=1, buffer_len=2)[source]¶
Return the VDW volume in Angstrom^3.
- Parameters
vdw_scale (float) – the VDW scale
buffer_len (float) – the shape buffer length in Angstrom
- Return type
float
- Returns
the VDW volume in Angstrom^3
- getBuriedVolumeStructure(only_largest_ligands=False)[source]¶
Return a copy of the structure without the metal atom. If only_largest_ligands is True, it will only contain the largest ligand or multiple copies thereof if it is symmetric.
- Parameters
only_largest_ligands (bool) – Whether small ligands should be deleted
- Return type
- Returns
the structure containing some or all ligands
- getBuriedVDWVolumePct(struct, vdw_scale=1.17)[source]¶
Return the buried VDW volume percent.
- Parameters
struct (structure.Structure) – The structure to get buried volume for
vdw_scale (float) – the VDW scale
- Return type
float
- Returns
the buried VDW volume percent
- getFreeVolumeVector()[source]¶
Return a unit vector pointing from the metal atom of the complex in the direction of free volume.
- Return type
numpy.array
- Returns
the free volume unit vector
- getRotatedComplex()[source]¶
Return a copy of the complex that is rotated so that the free volume vector points along the positive z-axis.
- Return type
structure.Structure
- Returns
A rotated copy of the input structure
- exportBuriedVolumeContour(sphere_radius=3.5, vdw_scale=1.17, num_bins=30, seed=1234)[source]¶
Export the buried volume contour for the complex
- Parameters
sphere_radius (float) – The radius for the sphere to sample points in
vdw_scale (float) – The VdW scale factor to apply to VdW radii when checking to see if a point is “inside” an atom
num_bins (int) – The number of bins in x and y direction to put the points in
seed (int) – Seed for random number generation
- Return type
str, str
- Returns
The paths to contour png and csv files
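The parameters above (a sampling sphere, scaled VdW radii, and a random seed) suggest a Monte Carlo estimate of buried volume. The sketch below shows that general technique only. It is not the module's implementation: the function name, the atom input format, and the n_points parameter are all hypothetical, and the binning and contour export steps are omitted.

```python
import math
import random


def buried_volume_pct(center, atoms, sphere_radius=3.5, vdw_scale=1.17,
                      n_points=20000, seed=1234):
    """Percent of a sphere around center that lies inside scaled VdW spheres.

    atoms is a list of ((x, y, z), vdw_radius) pairs.
    """
    rng = random.Random(seed)
    buried = 0
    sampled = 0
    while sampled < n_points:
        # Rejection sampling: draw from the bounding cube, keep sphere points.
        offset = [rng.uniform(-sphere_radius, sphere_radius) for _ in range(3)]
        if math.hypot(*offset) > sphere_radius:
            continue
        sampled += 1
        point = [c + o for c, o in zip(center, offset)]
        # A point is "inside" an atom if it is within the scaled VdW radius.
        if any(math.dist(point, xyz) <= vdw_scale * r for xyz, r in atoms):
            buried += 1
    return 100.0 * buried / sampled


# One hypothetical atom at the center whose scaled radius is half the
# sphere radius, so it fills about (1/2)^3 = 12.5% of the sphere.
atoms = [((0.0, 0.0, 0.0), 1.75 / 1.17)]
print(buried_volume_pct((0.0, 0.0, 0.0), atoms))
```

Fixing the seed makes the estimate reproducible between runs, which is presumably why the documented method exposes one.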
- plotContour(points)[source]¶
Plot a contour for the passed points. matplotlib uses triangulation to create a grid for the contour.
- Parameters
points (numpy.array) – The x, y, z values of points
- getVectorizedDescriptors(jaguar_out_file)[source]¶
Return vectorized descriptors, that is, instance-specific descriptors that depend on, for example, the molecule’s geometric orientation or atom indexing.
- Parameters
jaguar_out_file (str or None) – the name of a Jaguar *.out file from which descriptors will be extracted or None if there isn’t one
- Return type
dict
- Returns
(label, data) pairs
- schrodinger.application.matsci.mlearn.features.get_unique_titles(sts)[source]¶
Return a list of unique titles for the given structures.
- Parameters
sts (list) – contains schrodinger.structure.Structure objects
- Return type
list
- Returns
the unique titles
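The source does not say how duplicate titles are made unique. One possible scheme, shown purely as an assumption, appends a numeric suffix to repeated titles. The function below works on plain strings rather than structures and is hypothetical.

```python
def unique_titles(titles):
    """Return titles with numeric suffixes added so that none repeat."""
    used = set()
    result = []
    for title in titles:
        candidate, suffix = title, 1
        # Keep incrementing the suffix until the candidate is unused.
        while candidate in used:
            suffix += 1
            candidate = f"{title}_{suffix}"
        used.add(candidate)
        result.append(candidate)
    return result


print(unique_titles(["mol", "mol", "other", "mol"]))
```

Unique titles matter here because several ComplexFeatures methods return dictionaries keyed by structure title.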
- class schrodinger.application.matsci.mlearn.features.ComplexFeatures(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP', 'iuhf': '1'}, tpp=1, ligfilter=False, no_organometallic=False, nonmetallic_centers=(), canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)[source]¶
Bases:
schrodinger.application.matsci.mlearn.base.BaseFeaturizer
Class to generate features for metal complexes.
- __init__(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP', 'iuhf': '1'}, tpp=1, ligfilter=False, no_organometallic=False, nonmetallic_centers=(), canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)[source]¶
Create an instance.
- Parameters
jaguar (bool) – specify whether to calculate Jaguar features
jaguar_keywords (OrderedDict) – if Jaguar jobs must be run to calculate the Jaguar features then specify the Jaguar keywords here
tpp (int) – the number of threads for any Jaguar jobs
ligfilter (bool) – specify whether to calculate Ligfilter features
no_organometallic (bool) – Whether organometallic descriptors should be skipped
canvas (bool) – specify whether to calculate Canvas features
moldescriptors (bool or list) – specify whether to calculate Molecular Descriptors features. If it’s a list, it contains command line arguments for moldescriptors
include_vectorized (bool) – whether to include instance specific features that are vectorized, for example depending on the molecule’s geometric orientation, atom indexing, etc.
save_files (bool) – Whether to save subjob files or not
logger (logging.Logger or None) – output logger or None if there isn’t one
- runJaguar()[source]¶
Run Jaguar on the given structures.
- Return type
list
- Returns
contains Jaguar *.out file names
- getFeatures(structs, jaguar_out_files=None)[source]¶
Return features dictionary for the given structures
- Parameters
structs (list(schrodinger.structure.Structure)) – list of structures to be featurized
jaguar_out_files (list or None) – if Jaguar features should be calculated using existing Jaguar *.out files then specify the files here using the same ordering as used for any given structures
- verifyJaguarOutfiles()[source]¶
Run Jaguar and get the *.out files if they have not been provided
- getComplexDescriptors()[source]¶
Create a Complex object for each structure and get their descriptors
- Return type
dict
- Returns
The descriptors from Complex for each structure
- getJaguarDescriptors()[source]¶
Return Jaguar descriptors for all structures. Sets Jaguar atom descriptors on structures.
- Return type
dict
- Returns
The jaguar descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors
- getUtilityDescriptors()[source]¶
Get the requested utility descriptors for all structures
- Return type
dict
- Returns
The descriptor utility descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors
- getDescriptorUtilityJob(descriptor_utility)[source]¶
Get the job to run to generate the descriptors using the passed descriptor_utility for all structures
- Parameters
descriptor_utility (DescriptorUtility) – The descriptor utility to run to get the descriptors
- Return type
jobutils.RobustSubmissionJob
- Returns
The job to run to generate the descriptors
- processUtilityDescriptorOutputs(jobs_dict)[source]¶
Read the descriptors for all descriptor utilities that were run, and return them
- Parameters
jobs_dict (dict) – Dictionary with DescriptorUtility as keys and jobs as values
- Return type
dict
- Returns
The descriptor utility descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors
- getMolecularDescriptorsJob()[source]¶
Get the job to run to generate molecular descriptors for all structures
- Return type
jobutils.RobustSubmissionJob
- Returns
The job to run to generate the descriptors
- static writeFingerprintFiles(structs)[source]¶
Write fingerprint files for the given structures.
- Parameters
structs (list(schrodinger.structure.Structure)) – list of structures to be fingerprinted
- Return type
list
- Returns
the fingerprint file names
- log(msg, **kwargs)[source]¶
Add a message to the log file
- Parameters
msg (str) – The message to log
Additional keyword arguments are passed to the textlogger.log_msg function
- fit(data, data_y=None)¶
Fit and return self. Anything that evaluates properties related to the passed data should go here. For example, compute physical properties of a structure and save them as a class property, to be used in the transform method.
- Parameters
data (numpy array of shape [n_samples, n_features]) – Training set
data_y (numpy array of shape [n_samples]) – Target values
- Return type
- Returns
self object with fitted data
- fit_transform(X, y=None, **fit_params)¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
y : ndarray of shape (n_samples,), default=None
Target values.
**fit_params : dict
Additional fit parameters.
X_new : ndarray array of shape (n_samples, n_features_new)
Transformed array.
- get_params(deep=True)¶
Get parameters for this estimator.
deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
params : mapping of string to any
Parameter names mapped to their values.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
**params : dict
Estimator parameters.
self : object
Estimator instance.
- transform(data)¶
Get numerical features. Must be implemented by a child class.
- Parameters
data (numpy array of shape [n_samples, n_features]) – Training set
- Return type
numpy array of shape [n_samples, n_features_new]
- Returns
Transformed array
- class schrodinger.application.matsci.mlearn.features.CrystalNNFeatures(preset='ops')[source]¶
Bases:
object
Calculates CrystalNN structure fingerprints as implemented in pymatgen
- OPS_PRESET = 'ops'¶
- CN_PRESET = 'cn'¶
- __init__(preset='ops')[source]¶
Create a structure featurizer
- Parameters
preset (str) – One of OPS_PRESET or CN_PRESET class constants
class constants