schrodinger.application.matsci.mlearn.features module¶
Classes and functions to deal with ML features.
Copyright Schrodinger, LLC. All rights reserved.
- class schrodinger.application.matsci.mlearn.features.MomentData(flag, components, header, units)¶
Bases: tuple
- __contains__(key, /)¶
Return key in self.
- __len__()¶
Return len(self).
- components¶
Alias for field number 1
- count(value, /)¶
Return number of occurrences of value.
- flag¶
Alias for field number 0
- header¶
Alias for field number 2
- index(value, start=0, stop=9223372036854775807, /)¶
Return first index of value.
Raises ValueError if the value is not present.
- units¶
Alias for field number 3
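The field aliases above follow the standard named-tuple pattern. A minimal sketch, assuming a stand-in built with `collections.namedtuple` (the real class is not imported here, and all field values are hypothetical placeholders):

```python
from collections import namedtuple

# Stand-in with the documented field order; the real class is provided by
# schrodinger.application.matsci.mlearn.features and is not imported here.
MomentData = namedtuple("MomentData", ["flag", "components", "header", "units"])

# Hypothetical values chosen only to show field access.
moment = MomentData(
    flag="dipole",
    components=("x", "y", "z"),
    header="Dipole moment",
    units="debye",
)

# Named fields are aliases for positional indices 0..3.
same_field = moment.units is moment[3]
position = moment.index("debye")

# Ordinary tuple unpacking works as well.
flag, components, header, units = moment
```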
- schrodinger.application.matsci.mlearn.features.DescriptorUtility¶
alias of schrodinger.application.matsci.mlearn.features.DescriptorUtilitity
- schrodinger.application.matsci.mlearn.features.get_distance_cell(struct, cutoff)¶
Create an infrastructure Distance Cell. Struct MUST have the Chorus box properties.
- Parameters
struct (schrodinger.structure.Structure) – Input structure
cutoff (float) – The cutoff for finding nearest neighbor atoms
- Return type
schrodinger.structure.Structure, schrodinger.infra.structure.DistanceCell, schrodinger.infra.structure.PBC
- Returns
Supercell, an infrastructure Distance Cell that accounts for the PBC, and the PBC used to create it.
- Raises
ValueError if struct is missing PBCs
- schrodinger.application.matsci.mlearn.features.elemental_generator(struct, element, is_equal=True)¶
- schrodinger.application.matsci.mlearn.features.get_anion(struct)¶
Get the most electronegative element in the structure (anion).
- Parameters
struct (schrodinger.structure.Structure) – Input structure
- Return type
str, float, int
- Returns
Element, its electronegativity, number of anions in the cell
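The selection described above can be sketched without the Schrodinger API. In this illustration the helper name `most_electronegative` and the small Pauling table are assumptions; the real function takes a Structure and its electronegativity source is not documented here.

```python
from collections import Counter

# Pauling electronegativities for a few elements (illustrative subset).
PAULING = {"Li": 0.98, "P": 2.19, "S": 2.58, "Cl": 3.16, "O": 3.44, "F": 3.98}


def most_electronegative(symbols):
    """Return (element, electronegativity, count) for the most
    electronegative element in a list of element symbols.

    Hypothetical helper mirroring the documented return triple.
    """
    counts = Counter(symbols)
    anion = max(counts, key=lambda sym: PAULING[sym])
    return anion, PAULING[anion], counts[anion]


# Hypothetical Li3PS4 formula unit given as a flat list of symbols.
result = most_electronegative(["Li"] * 3 + ["P"] + ["S"] * 4)
```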
- class schrodinger.application.matsci.mlearn.features.LatticeFeatures(features, element='Li', cutoff=4.0)¶
Bases: schrodinger.application.matsci.mlearn.base.BaseFeaturizer
Class to generate lattice-based features.
- FEATURES = {'anionFrameCoordination': 'Anion frame coordination', 'avgAnionAnionShortDistance': 'Average anion anion shortest distance', 'avgAtomicVol': 'Average atomic volume', 'avgElementAnionShortDistance': 'Average cation anion shortest distance', 'avgElementNeighborCount': 'Average cation count', 'avgNeighborCount': 'Average neighbor count', 'avgNeighborIon': 'Average neighbor ionicity', 'avgShortDistance': 'Average cation cation shortest distance', 'avgSublatticeEneg': 'Average sublattice electronegativity', 'avgSublatticeNeighborCount': 'Average sublattice neighbor count', 'avgSublatticeNeighborIon': 'Average sublattice neighbor ionicity', 'packingFraction': 'Crystal packing fraction', 'pathWidth': 'Average straight-line path width', 'pathWidthEneg': 'Average straight-line path electronegativity', 'ratioCount': 'Ratio of average cation to sublattice count', 'ratioIonicity': 'Ratio of average cation to sublattice electronegativity', 'stdNeighborCount': 'Standard deviation of neighbor count', 'stdNeighborIon': 'Standard deviation of neighbor ionicity', 'sublatticePackingFraction': 'Sublattice packing fraction', 'volPerAnion': 'Volume per anion'}¶
- __init__(features, element='Li', cutoff=4.0)¶
Initialize the object.
- runFeature(feature)¶
Get result from a feature.
- Parameters
feature (str) – One of the features listed in FEATURES.
- Return type
int or float
- Returns
Feature value
- transform(structs)¶
Get numerical features from structures. Also sets feature names in self.labels. See parent class for more documentation.
- Parameters
structs (list(schrodinger.structure.Structure)) – List of structures to be featurized
- Return type
numpy array of shape [n_samples, n_features]
- Returns
Transformed array
- avgAtomicVol()¶
Get average atomic volume.
- Parameters
struct (schrodinger.structure.Structure) – Structure to be used for feature calculation
- Return type
float
- Returns
Average atomic volume (Angstrom^3)
- avgNeighborCount()¶
Get average neighbor count.
- Return type
float
- Returns
Average neighbor count
- stdNeighborCount()¶
Get standard deviation of neighbor count.
- Return type
float
- Returns
Standard deviation of neighbor count
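The two neighbor-count statistics can be illustrated with a cutoff-based count over plain coordinates. This is only a sketch: it ignores periodic boundary conditions (the module works with a distance cell), the helper `neighbor_counts` is hypothetical, and the use of the population standard deviation is an assumption.

```python
import math
import statistics


def neighbor_counts(coords, cutoff):
    """Count, for each point, the other points within the cutoff.

    Hypothetical helper; no periodic images are considered.
    """
    counts = []
    for i, a in enumerate(coords):
        n = sum(
            1
            for j, b in enumerate(coords)
            if i != j and math.dist(a, b) <= cutoff
        )
        counts.append(n)
    return counts


# Four corners of a unit square plus one isolated point.
coords = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (10, 10, 10)]
counts = neighbor_counts(coords, cutoff=1.1)

avg_count = statistics.mean(counts)
# Population standard deviation; the module's convention is not documented.
std_count = statistics.pstdev(counts)
```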
- avgSublatticeEneg()¶
Get average sublattice electronegativity.
- Return type
float
- Returns
Average sublattice electronegativity
- avgSublatticeNeighborCount()¶
Get average sublattice neighbor count.
- Return type
float
- Returns
Average sublattice neighbor count
- avgNeighborIon()¶
Get average neighbor ionicity.
- Return type
float
- Returns
Average neighbor ionicity
- stdNeighborIon()¶
Get standard deviation of neighbor ionicity.
- Return type
float
- Returns
Standard deviation of neighbor ionicity
- avgSublatticeNeighborIon()¶
Get average sublattice neighbor ionicity.
- Return type
float
- Returns
Average sublattice neighbor ionicity
- volPerAnion()¶
Get volume per anion.
- Return type
float
- Returns
Volume per anion
- packingFraction(skip_element=None)¶
Get packing fraction of the crystal.
- Parameters
skip_element (str) – Element to skip
- Return type
float
- Returns
Packing fraction
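A packing fraction is conventionally the volume occupied by atomic spheres divided by the cell volume. The sketch below assumes that definition; the helper name and the radii are hypothetical, and the module's own choice of radii (see effectiveRadius) is not reproduced.

```python
import math


def packing_fraction(radii_by_atom, cell_volume, skip_element=None):
    """Sphere volume over cell volume.

    radii_by_atom: list of (element, radius in Angstrom) pairs.
    skip_element: element symbol left out of the occupied volume.
    Hypothetical helper, not the module's implementation.
    """
    occupied = sum(
        4.0 / 3.0 * math.pi * radius ** 3
        for element, radius in radii_by_atom
        if element != skip_element
    )
    return occupied / cell_volume


# Simple cubic cell with touching spheres: edge = 2 * r, one atom per cell.
r = 1.0
fraction = packing_fraction([("X", r)], cell_volume=(2 * r) ** 3)
skipped = packing_fraction([("X", r)], cell_volume=(2 * r) ** 3, skip_element="X")
```

For the simple cubic case the result is pi / 6, the textbook value, which makes it a convenient sanity check.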
- effectiveRadius(atom)¶
Get atom effective radius.
- Parameters
atom (schrodinger.structure._StructureAtom) – Atom
- Return type
float
- Returns
Effective radius
- sublatticePackingFraction()¶
Get packing fraction of the sublattice crystal.
- Return type
float
- Returns
Packing fraction
- avgElementNeighborCount()¶
Get average element neighbor count.
- Return type
float
- Returns
Average number of bonds per element
- avgAnionAnionShortDistance()¶
Get average anion anion shortest distance.
- Return type
float
- Returns
Average anion anion shortest distance
- avgElementAnionShortDistance()¶
Get average element anion shortest distance.
- Return type
float
- Returns
Average element anion shortest distance
- avgShortDistance()¶
Get average element element shortest distance.
- Return type
float
- Returns
Average element element shortest distance
- anionFrameCoordination()¶
Get anion framework coordination.
- Return type
float
- Returns
Anion framework coordination
- pathWidth(eval_eneg=False)¶
Evaluate average straight line path width. See the reference in the constructor for more info.
- Parameters
eval_eneg (bool) – If True, return average over electronegativity, instead of distance
- Return type
float
- Returns
Average path width or, if eval_eneg is True, average electronegativity
- pathWidthEneg()¶
Evaluate average straight line path electronegativity.
- Return type
float
- Returns
Average electronegativity along the path
- ratioIonicity()¶
Get ratio ionicity.
- Return type
float
- Returns
Ratio ionicity
- ratioCount()¶
Get ratio neighbor count.
- Return type
float
- Returns
Ratio neighbor count
- fit(data, data_y=None)¶
Fit and return self. Anything that evaluates properties related to the passed data should go here. For example, compute physical properties of a structure and save them as class properties, to be used in the transform method.
- Parameters
data (numpy array of shape [n_samples, n_features]) – Training set
data_y (numpy array of shape [n_samples]) – Target values
- Return type
- Returns
self object with fitted data
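The fit/transform contract described above can be sketched with a toy class. `MeanCenterer` is hypothetical and unrelated to BaseFeaturizer; it only shows state being computed in fit, stored on the instance, and reused in transform.

```python
class MeanCenterer:
    """Toy featurizer illustrating the fit/transform pattern.

    Not part of the Schrodinger API.
    """

    def fit(self, data, data_y=None):
        # Compute per-column means and keep them on the instance.
        n_samples = len(data)
        n_features = len(data[0])
        self.means_ = [
            sum(row[j] for row in data) / n_samples for j in range(n_features)
        ]
        return self  # returning self allows chaining

    def transform(self, data):
        # Reuse the state saved by fit.
        return [
            [value - mean for value, mean in zip(row, self.means_)]
            for row in data
        ]

    def fit_transform(self, X, y=None):
        # Same as calling fit and then transform on the same data.
        return self.fit(X, y).transform(X)


train = [[1.0, 10.0], [3.0, 30.0]]
centered = MeanCenterer().fit_transform(train)
```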
- fit_transform(X, y=None, **fit_params)¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Return type
ndarray of shape (n_samples, n_features_new)
- Returns
Transformed array.
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Return type
dict
- Returns
Parameter names mapped to their values.
- set_output(*, transform=None)¶
Set output container.
See the scikit-learn example plot_set_output.py for an example on how to use the API.
- Parameters
transform ({"default", "pandas"}, default=None) – Configure output of transform and fit_transform. "default": default output format of a transformer; "pandas": DataFrame output; None: transform configuration is unchanged.
- Return type
estimator instance
- Returns
Estimator instance.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Return type
estimator instance
- Returns
Estimator instance.
- class schrodinger.application.matsci.mlearn.features.Ligand(st, metal_atom, new_to_old, coordination_idxs)¶
Bases: object
Manage a ligand.
- __init__(st, metal_atom, new_to_old, coordination_idxs)¶
Create an instance.
- Parameters
st (schrodinger.structure.Structure) – the structure
metal_atom (schrodinger.structure._StructureAtom) – the metal atom
new_to_old (dict) – the map of new indices (extracted ligand) to old indices (original structure)
coordination_idxs (list) – contains groups of indices (new indices) of coordinating atoms
- getVec(point)¶
Return a vector pointing from the metal atom to the given point.
- Parameters
point (numpy.array) – the point in Ang.
- Return type
numpy.array
- Returns
the vector in Ang.
- getCentroid(st, idxs)¶
Return the centroid vector of the given coordination atom indices.
- Parameters
st (schrodinger.structure.Structure) – the structure
idxs (list) – the coordination indices
- Return type
numpy.array
- Returns
the centroid vector in Ang.
- getCoordinationVec(st, idxs)¶
Return a coordination vector pointing from the metal atom to the centroid of the given coordination atom indices.
- Parameters
st (schrodinger.structure.Structure) – the structure
idxs (list) – the coordination indices
- Return type
numpy.array
- Returns
the coordination vector in Ang.
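The relationship between getVec, getCentroid and getCoordinationVec can be sketched on plain coordinate tuples. The helper names and coordinates below are hypothetical; the real methods operate on Structure objects and atom indices.

```python
def centroid(points):
    """Arithmetic mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def coordination_vector(metal_xyz, donor_xyzs):
    """Vector from the metal position to the centroid of the donors.

    Hypothetical helper; coordinates are in Angstrom.
    """
    center = centroid(donor_xyzs)
    return tuple(c - m for c, m in zip(center, metal_xyz))


# Hypothetical metal with four coordinating atoms in a plane 2 Ang. above it.
metal = (1.0, 1.0, 1.0)
ring = [(2.0, 1.0, 3.0), (0.0, 1.0, 3.0), (1.0, 2.0, 3.0), (1.0, 0.0, 3.0)]
vec = coordination_vector(metal, ring)
```

Using the centroid gives a single direction for a group of coordinating atoms, such as the atoms of a haptic ligand.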
- getStoichiometry()¶
Return the stoichiometry.
- Return type
str
- Returns
the stoichiometry
- getDenticity()¶
Return the denticity.
- Return type
int
- Returns
the denticity
- getHapticity()¶
Return the hapticity.
- Return type
int
- Returns
the hapticity
- getHapticCharacter()¶
Return the haptic character.
- Return type
int
- Returns
the haptic character
- getBiteAngle()¶
Return the bite angle in degrees.
- Return type
float or None
- Returns
the bite angle in degrees
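A bite angle is commonly defined as the donor-metal-donor angle of a bidentate ligand. The sketch below assumes that definition and assumes that None corresponds to ligands without exactly two coordination sites; neither is confirmed by this page, and the helper names are hypothetical.

```python
import math


def angle_deg(center, p1, p2):
    """Angle p1-center-p2 in degrees."""
    v1 = [a - c for a, c in zip(p1, center)]
    v2 = [a - c for a, c in zip(p2, center)]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_t = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_t))


def bite_angle(metal, donors):
    """Donor-metal-donor angle, or None if not exactly two donors.

    Hypothetical helper; the None rule is an assumption.
    """
    if len(donors) != 2:
        return None
    return angle_deg(metal, donors[0], donors[1])


right_angle = bite_angle((0.0, 0.0, 0.0), [(2.0, 0.0, 0.0), (0.0, 2.0, 0.0)])
monodentate = bite_angle((0.0, 0.0, 0.0), [(2.0, 0.0, 0.0)])
```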
- getAtomConeAngle(atom)¶
Return the cone angle for the given atom in degrees.
- Parameters
atom (schrodinger.structure._StructureAtom) – the atom
- Return type
float
- Returns
the cone angle for the given atom in degrees
- getConeAngle()¶
Return the cone angle in degrees.
- Return type
float
- Returns
the cone angle in degrees
- getBondLength()¶
Return the bond length in Ang.
- Return type
float
- Returns
the bond length in Ang.
- getDescriptors()¶
Return descriptors.
- Return type
dict
- Returns
(label, data) pairs
- class schrodinger.application.matsci.mlearn.features.Complex(st, logger=None, nonmetallic_centers=())¶
Bases: object
Manage a complex.
- BURIED_VOLUME_VDW_SCALE = 1.17¶
- CONTOURS_DIR = 'contours'¶
- __init__(st, logger=None, nonmetallic_centers=())¶
Create an instance.
- Parameters
st (schrodinger.structure.Structure) – the structure
logger (logging.Logger or None) – output logger or None if there isn’t one
nonmetallic_centers (tuple) – Tuple of nonmetallic elements to also consider when looking for center atom
- setMetalAtom()¶
Set the metal atom.
- setLigands()¶
Set the ligands.
- getBondAngle()¶
Return the bond angle in degrees.
- Return type
float
- Returns
the bond angle in degrees
- getVDWSurfaceArea()¶
Return the VDW surface area in Angstrom^2.
- Return type
float
- Returns
the VDW surface area in Angstrom^2
- getVDWVolume(vdw_scale=1, buffer_len=2)¶
Return the VDW volume in Angstrom^3.
- Parameters
vdw_scale (float) – the VDW scale
buffer_len (float) – the shape buffer length in Angstrom
- Return type
float
- Returns
the VDW volume in Angstrom^3
- getBuriedVolumeStructure(only_largest_ligands=False)¶
Return a copy of the structure without the metal atom. If only_largest_ligands is True, it will only contain the largest ligand or multiple copies thereof if it is symmetric.
- Parameters
only_largest_ligands (bool) – Whether small ligands should be deleted
- Return type
- Returns
the structure containing some or all ligands
- getBuriedVDWVolumePct(struct, vdw_scale=1.17, sphere_quadrant=None, free_volume=False)¶
Return the buried VDW volume percent.
- Parameters
struct (structure.Structure) – The structure to get buried volume for
vdw_scale (float) – the VDW scale
sphere_quadrant (None or str) – restrict sphere sampling to a quadrant specified as a key of amorphous.ORDINAL_DIRECTIONS
free_volume (bool) – use this option to return the free volume
- Return type
float
- Returns
the buried VDW volume percent
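Percent buried volume is commonly estimated by sampling points in a sphere around the metal and counting those that fall inside a scaled VDW sphere of any ligand atom. The sketch below is a Monte Carlo version under that assumption; the helper name, the `n_points` parameter and the atom list format are hypothetical, while the defaults 3.5, 1.17 and 1234 are taken from values that appear elsewhere on this page. The module's actual sampling scheme is not documented here.

```python
import math
import random


def buried_volume_pct(center, atoms, sphere_radius=3.5, vdw_scale=1.17,
                      n_points=20000, seed=1234):
    """Estimate the percent of a sphere covered by scaled VDW spheres.

    atoms: list of ((x, y, z), vdw_radius) pairs, metal atom excluded.
    Hypothetical helper, not the module's implementation.
    """
    rng = random.Random(seed)
    inside = 0
    sampled = 0
    while sampled < n_points:
        # Rejection sampling: draw in the bounding cube, keep sphere points.
        offset = [rng.uniform(-sphere_radius, sphere_radius) for _ in range(3)]
        if math.hypot(*offset) > sphere_radius:
            continue
        sampled += 1
        point = [c + o for c, o in zip(center, offset)]
        if any(math.dist(point, xyz) <= vdw_scale * radius
               for xyz, radius in atoms):
            inside += 1
    return 100.0 * inside / n_points


origin = (0.0, 0.0, 0.0)
empty_pct = buried_volume_pct(origin, [])
full_pct = buried_volume_pct(origin, [(origin, 4.0)])
```

The free volume percent is then simply 100 minus the buried percent, which is one plausible reading of the free_volume option.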
- getFreeVolumeVector()¶
Return a unit vector pointing from the metal atom of the complex in the direction of free volume.
- Return type
numpy.array
- Returns
the free volume unit vector
- getRotatedComplex()¶
Return a copy of the complex that is rotated so that the free volume vector points along the positive z-axis.
- Return type
structure.Structure
- Returns
A rotated copy of the input structure
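Aligning a vector with the positive z-axis can be done with Rodrigues' rotation formula. The sketch below builds such a rotation matrix for an arbitrary vector; the function names are hypothetical and the module's own rotation routine is not shown on this page.

```python
import math


def rotation_to_z(vec):
    """Return a 3x3 rotation matrix (nested lists) mapping vec onto +z."""
    norm = math.sqrt(sum(c * c for c in vec))
    x, y, z = (c / norm for c in vec)
    # The rotation axis is vec x z_hat = (y, -x, 0); its length is sin(angle).
    sin_a = math.hypot(x, y)
    cos_a = z
    if sin_a < 1e-12:
        if cos_a > 0:
            # Already along +z: identity.
            return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
        # Along -z: rotate 180 degrees about the x-axis.
        return [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
    kx, ky = y / sin_a, -x / sin_a  # unit rotation axis, kz = 0
    one_c = 1.0 - cos_a
    # Rodrigues: R = cos*I + sin*[k]x + (1 - cos)*k k^T, with kz = 0.
    return [
        [cos_a + kx * kx * one_c, kx * ky * one_c, ky * sin_a],
        [kx * ky * one_c, cos_a + ky * ky * one_c, -kx * sin_a],
        [-ky * sin_a, kx * sin_a, cos_a],
    ]


def apply(matrix, vec):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


free_volume_vec = (1.0, 2.0, 2.0)  # hypothetical direction, length 3
rotated = apply(rotation_to_z(free_volume_vec), free_volume_vec)
```

Applying the same matrix to every atom position (relative to the metal atom) would rotate the whole complex rigidly.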
- exportBuriedVolumeContour(sphere_radius=3.5, vdw_scale=1.17, num_bins=30, seed=1234)¶
Export the buried volume contour for the complex
- Parameters
sphere_radius (float) – The radius for the sphere to sample points in
vdw_scale (float) – The VdW scale factor to apply to VdW radii when checking to see if a point is “inside” an atom
num_bins (int) – The number of bins in x and y direction to put the points in
seed (int) – Seed for random number generation
- Return type
str, str
- Returns
The paths to contour png and csv files
- plotContour(points)¶
Plot a contour for the passed points. matplotlib uses triangulation to create a grid for the contour.
- Parameters
points (numpy.array) – The x, y, z values of points
- getVectorizedDescriptors(jaguar_out_file)¶
Return vectorized descriptors which are instance specific descriptors that are vectorized, for example depending on the molecule’s geometric orientation, atom indexing, etc.
- Parameters
jaguar_out_file (str or None) – the name of a Jaguar *.out file from which descriptors will be extracted or None if there isn’t one
- Return type
dict
- Returns
(label, data) pairs
- getDescriptors(no_organometallic=False)¶
Return descriptors.
- Parameters
no_organometallic (bool) – Whether organometallic descriptors should be skipped
- Return type
dict
- Returns
(label, data) pairs
- schrodinger.application.matsci.mlearn.features.get_unique_titles(sts)¶
Return a list of unique titles for the given structures.
- Parameters
sts (list) – contains schrodinger.structure.Structure
- Return type
list
- Returns
the unique titles
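One way to make a list of titles unique is to append a numeric suffix to repeats. The sketch below shows that approach on plain strings; the suffix scheme and the helper name are assumptions, since this page does not say how the function resolves duplicate titles.

```python
def unique_titles(titles):
    """Return titles with repeats disambiguated by a numeric suffix.

    Hypothetical scheme: the second "x" becomes "x_2", the third "x_3".
    """
    used = set()
    result = []
    for title in titles:
        candidate = title
        suffix = 1
        # Keep bumping the suffix until the candidate is unused.
        while candidate in used:
            suffix += 1
            candidate = f"{title}_{suffix}"
        used.add(candidate)
        result.append(candidate)
    return result


titles = unique_titles(["complex", "complex", "other", "complex"])
```

Unique titles matter here because several methods on this page return dictionaries keyed by structure title.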
- class schrodinger.application.matsci.mlearn.features.ComplexFeatures(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, tpp=1, ligfilter=False, no_organometallic=False, nonmetallic_centers=(), canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)¶
Bases: schrodinger.application.matsci.mlearn.base.BaseFeaturizer
Class to generate features for metal complexes.
- __init__(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, tpp=1, ligfilter=False, no_organometallic=False, nonmetallic_centers=(), canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)¶
Create an instance.
- Parameters
jaguar (bool) – specify whether to calculate Jaguar features
jaguar_keywords (OrderedDict) – if Jaguar jobs must be run to calculate the Jaguar features then specify the Jaguar keywords here
tpp (int) – the number of threads for any Jaguar jobs
ligfilter (bool) – specify whether to calculate Ligfilter features
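no_organometallic (bool) – Whether organometallic descriptors should be skipped
nonmetallic_centers (tuple) – Tuple of nonmetallic elements to also consider when looking for center atom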
canvas (bool) – specify whether to calculate Canvas features
moldescriptors (bool or list) – specify whether to calculate Molecular Descriptors features. If it’s a list, it contains command line arguments for moldescriptors
include_vectorized (bool) – whether to include instance specific features that are vectorized, for example depending on the molecule’s geometric orientation, atom indexing, etc.
save_files (bool) – Whether to save subjob files or not
logger (logging.Logger or None) – output logger or None if there isn’t one
- runJaguar()¶
Run Jaguar on the given structures.
- Return type
list
- Returns
contains Jaguar *.out file names
- getFeatures(structs, jaguar_out_files=None)¶
Return features dictionary for the given structures
- Parameters
structs (list(schrodinger.structure.Structure)) – list of structures to be featurized
jaguar_out_files (list or None) – if Jaguar features should be calculated using existing Jaguar *.out files then specify the files here using the same ordering as used for any given structures
- verifyJaguarOutfiles()¶
Run jaguar and get the out-files if the out-files have not been provided
- getComplexDescriptors()¶
Create a Complex object for each structure and get their descriptors
- Return type
dict
- Returns
The descriptors from Complex for each structure
- getJaguarDescriptors()¶
Return Jaguar descriptors for all structures. Sets Jaguar atom descriptors on structures.
- Return type
dict
- Returns
The jaguar descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors
- getUtilityDescriptors()¶
Get the requested utility descriptors for all structures
- Return type
dict
- Returns
The descriptor utility descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors
- getDescriptorUtilityJob(descriptor_utility)¶
Get the job to run to generate the descriptors using the passed descriptor_utility for all structures
- Parameters
descriptor_utility (DescriptorUtility) – The descriptor utility to run to get the descriptors
- Return type
jobutils.RobustSubmissionJob
- Returns
The job to run to generate the descriptors
- getExtraMolecularDescriptorsProps(st, descriptor_utility)¶
Return any extra structure properties computed using the output from molecular descriptors.
- Parameters
st (schrodinger.structure.Structure) – the structure output from molecular descriptors which has all output properties defined
descriptor_utility (DescriptorUtility) – the molecular descriptor utility containing the original job parameters
- Return type
dict
- Returns
(property name, value) pairs
- processUtilityDescriptorOutputs(jobs_dict)¶
Read the descriptors for all descriptor utilities that were run, and return them
- Parameters
jobs_dict (dict) – Dictionary with DescriptorUtility as keys and jobs as values
- Return type
dict
- Returns
The descriptor utility descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors
- getMolecularDescriptorsJob()¶
Get the job to run to generate molecular descriptors for all structures
- Return type
jobutils.RobustSubmissionJob
- Returns
The job to run to generate the descriptors
- static writeFingerprintFiles(structs)¶
Write fingerprint files for the given structures.
- Parameters
structs (list(schrodinger.structure.Structure)) – list of structures to be fingerprinted
- Return type
list
- Returns
the fingerprint file names
- log(msg, **kwargs)¶
Add a message to the log file
- Parameters
msg (str) – The message to log
Additional keyword arguments are passed to the textlogger.log_msg function
- fit(data, data_y=None)¶
Fit and return self. Anything that evaluates properties related to the passed data should go here. For example, compute physical properties of a structure and save them as class properties, to be used in the transform method.
- Parameters
data (numpy array of shape [n_samples, n_features]) – Training set
data_y (numpy array of shape [n_samples]) – Target values
- Return type
- Returns
self object with fitted data
- fit_transform(X, y=None, **fit_params)¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Return type
ndarray of shape (n_samples, n_features_new)
- Returns
Transformed array.
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Return type
dict
- Returns
Parameter names mapped to their values.
- set_output(*, transform=None)¶
Set output container.
See the scikit-learn example plot_set_output.py for an example on how to use the API.
- Parameters
transform ({"default", "pandas"}, default=None) – Configure output of transform and fit_transform. "default": default output format of a transformer; "pandas": DataFrame output; None: transform configuration is unchanged.
- Return type
estimator instance
- Returns
Estimator instance.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Return type
estimator instance
- Returns
Estimator instance.
- transform(data)¶
Get numerical features. Must be implemented by a child class.
- Parameters
data (numpy array of shape [n_samples, n_features]) – Training set
- Return type
numpy array of shape [n_samples, n_features_new]
- Returns
Transformed array
- class schrodinger.application.matsci.mlearn.features.CrystalNNFeatures(preset='ops')¶
Bases: object
Calculates CrystalNN structure fingerprints as implemented in pymatgen
- OPS_PRESET = 'ops'¶
- CN_PRESET = 'cn'¶
- __init__(preset='ops')¶
Create a structure featurizer
- Parameters
preset (str) – One of the OPS_PRESET or CN_PRESET class constants
- featurize(struct)¶
Get CrystalNN fingerprints for the passed structure
- Parameters
struct (structure.Structure) – The structure to get features for
- Return type
list
- Returns
List of CrystalNN fingerprints for the structure