schrodinger.application.matsci.mlearn.features module¶
Classes and functions to deal with ML features.
Copyright Schrodinger, LLC. All rights reserved.
- class schrodinger.application.matsci.mlearn.features.MomentData(flag, components, header, units)¶
Bases:
tuple
- components¶
Alias for field number 1
- flag¶
Alias for field number 0
- header¶
Alias for field number 2
- units¶
Alias for field number 3
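Because MomentData derives from tuple and exposes its fields as numbered aliases, it behaves like a collections.namedtuple. The sketch below builds a stand-in with the same field order to show that positional and attribute access are interchangeable. The stand-in class and every field value are hypothetical placeholders, not data taken from the module.

```python
from collections import namedtuple

# Stand-in with the documented field order; this is not the module's own class.
MomentData = namedtuple("MomentData", ["flag", "components", "header", "units"])

# Hypothetical values chosen only for illustration.
moment = MomentData(flag="dipole", components=("x", "y", "z"),
                    header="Dipole moment", units="debye")

# Field number 1 is `components`, so both access styles agree.
same_components = moment[1] == moment.components

# Tuple unpacking follows the documented field order.
flag, components, header, units = moment
```

The same pattern applies to ProjBondData, whose fields are atom_1, atom_2, xy_1 and xy_2.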
- schrodinger.application.matsci.mlearn.features.DescriptorUtility¶
alias of
schrodinger.application.matsci.mlearn.features.DescriptorUtilitity
- class schrodinger.application.matsci.mlearn.features.ProjBondData(atom_1, atom_2, xy_1, xy_2)¶
Bases:
tuple
- atom_1¶
Alias for field number 0
- atom_2¶
Alias for field number 1
- xy_1¶
Alias for field number 2
- xy_2¶
Alias for field number 3
- schrodinger.application.matsci.mlearn.features.get_distance_cell(struct, cutoff)¶
Create an infrastructure Distance Cell. Struct MUST have the Chorus box properties.
- Parameters
struct (schrodinger.structure.Structure) – Input structure
cutoff (float) – The cutoff for finding nearest neighbor atoms
- Return type
schrodinger.structure.Structure, schrodinger.infra.structure.DistanceCell, schrodinger.infra.structure.PBC
- Returns
Supercell, an infrastructure Distance Cell that accounts for the PBC, and the pbc used to create it.
- Raises
ValueError if struct is missing PBCs
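To illustrate what a neighbor search that accounts for the PBC involves, here is a NumPy sketch of a minimum-image search in an orthorhombic box. It is not the DistanceCell implementation: the helper name neighbors_within_cutoff and its arguments are hypothetical, and the sketch assumes the cutoff is smaller than half the shortest box edge.

```python
import numpy as np

def neighbors_within_cutoff(coords, box, index, cutoff):
    """Return indices of atoms within `cutoff` of atom `index`, using the
    minimum-image convention for an orthorhombic box (hypothetical helper)."""
    coords = np.asarray(coords, dtype=float)
    box = np.asarray(box, dtype=float)
    delta = coords - coords[index]
    # Wrap each displacement component into [-box/2, box/2].
    delta -= box * np.round(delta / box)
    dist = np.linalg.norm(delta, axis=1)
    mask = dist <= cutoff
    mask[index] = False  # an atom is not its own neighbor
    return np.flatnonzero(mask)

# Atom 1 sits across the periodic boundary from atom 0, only 1 A away.
coords = [[0.5, 0.5, 0.5], [9.5, 0.5, 0.5], [5.0, 5.0, 5.0]]
box = [10.0, 10.0, 10.0]
near = neighbors_within_cutoff(coords, box, index=0, cutoff=2.0)
```

Without the wrapping step, atom 1 would appear to be 9 A away and would be missed.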
- schrodinger.application.matsci.mlearn.features.elemental_generator(struct, element, is_equal=True)¶
- schrodinger.application.matsci.mlearn.features.get_anion(struct)¶
Get the most electronegative element in the structure (anion).
- Parameters
struct (schrodinger.structure.Structure) – Input structure
- Return type
str, float, int
- Returns
Element, its electronegativity, number of anions in the cell
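The selection described above can be sketched on a plain list of element symbols. This is an illustration only: the helper most_electronegative and the small Pauling electronegativity table are assumptions made for the example, and the module works on Structure objects with its own data source.

```python
from collections import Counter

# Pauling electronegativities for a few elements (illustrative subset).
PAULING_ENEG = {"Li": 0.98, "P": 2.19, "S": 2.58, "Cl": 3.16, "O": 3.44, "F": 3.98}

def most_electronegative(elements):
    """Return (element, electronegativity, count) for the most
    electronegative element in a list of element symbols."""
    counts = Counter(elements)
    anion = max(counts, key=lambda el: PAULING_ENEG[el])
    return anion, PAULING_ENEG[anion], counts[anion]

# One Li3PO4 formula unit: oxygen is the anion and appears four times.
result = most_electronegative(["Li"] * 3 + ["P"] + ["O"] * 4)
```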
- class schrodinger.application.matsci.mlearn.features.LatticeFeatures(features, element='Li', cutoff=4.0)¶
Bases:
schrodinger.application.matsci.mlearn.base.BaseFeaturizer
Class to generate lattice-based features.
- FEATURES = {'anionFrameCoordination': 'Anion frame coordination', 'avgAnionAnionShortDistance': 'Average anion anion shortest distance', 'avgAtomicVol': 'Average atomic volume', 'avgElementAnionShortDistance': 'Average cation anion shortest distance', 'avgElementNeighborCount': 'Average cation count', 'avgNeighborCount': 'Average neighbor count', 'avgNeighborIon': 'Average neighbor ionicity', 'avgShortDistance': 'Average cation cation shortest distance', 'avgSublatticeEneg': 'Average sublattice electronegativity', 'avgSublatticeNeighborCount': 'Average sublattice neighbor count', 'avgSublatticeNeighborIon': 'Average sublattice neighbor ionicity', 'packingFraction': 'Crystal packing fraction', 'pathWidth': 'Average straight-line path width', 'pathWidthEneg': 'Average straight-line path electronegativity', 'ratioCount': 'Ratio of average cation to sublattice count', 'ratioIonicity': 'Ratio of average cation to sublattice electronegativity', 'stdNeighborCount': 'Standard deviation of neighbor count', 'stdNeighborIon': 'Standard deviation of neighbor ionicity', 'sublatticePackingFraction': 'Sublattice packing fraction', 'volPerAnion': 'Volume per anion'}¶
- __init__(features, element='Li', cutoff=4.0)¶
Initialize the object.
- runFeature(feature)¶
Get result from a feature.
- Parameters
feature – One of the features listed in FEATURES.
- Return type
int or float
- Returns
Feature value
- transform(structs)¶
Get numerical features from structures. Also sets features names in self.labels. See parent class for more documentation.
- Parameters
structs (list(schrodinger.structure.Structure)) – List of structures to be featurized
- Return type
numpy array of shape [n_samples, n_features]
- Returns
Transformed array
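The relationship between FEATURES, runFeature and transform can be pictured with a toy featurizer. Everything in this sketch is hypothetical: the class ToyFeaturizer, its two features, the use of getattr for dispatch, and the choice to store the FEATURES descriptions in labels are assumptions, and a "structure" here is just a list of atomic numbers.

```python
import numpy as np

class ToyFeaturizer:
    """Hypothetical stand-in that mimics the FEATURES/runFeature/transform layout."""

    FEATURES = {"atomCount": "Number of atoms", "meanZ": "Mean atomic number"}

    def __init__(self, features):
        unknown = set(features) - set(self.FEATURES)
        if unknown:
            raise ValueError(f"Unknown features: {sorted(unknown)}")
        self.features = list(features)
        self.labels = []

    def runFeature(self, feature):
        # Dispatch by name: each FEATURES key is also a method name.
        return getattr(self, feature)()

    def atomCount(self):
        return len(self._current)

    def meanZ(self):
        return float(np.mean(self._current))

    def transform(self, structs):
        rows = []
        for struct in structs:
            self._current = struct  # the structure currently being featurized
            rows.append([self.runFeature(name) for name in self.features])
        self.labels = [self.FEATURES[name] for name in self.features]
        # One row per structure, one column per requested feature.
        return np.array(rows, dtype=float)

featurizer = ToyFeaturizer(["atomCount", "meanZ"])
X = featurizer.transform([[3, 8], [3, 3, 16]])
```

The result has shape [n_samples, n_features], here (2, 2), with columns in the order the features were requested.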
- avgAtomicVol()¶
Get average atomic volume.
- Parameters
struct (schrodinger.structure.Structure) – Structure to be used for feature calculation
- Return type
float
- Returns
Average atomic volume (Angstrom^3)
- avgNeighborCount()¶
Get average neighbor count.
- Return type
float
- Returns
Average neighbor count
- stdNeighborCount()¶
Get standard deviation of neighbor count.
- Return type
float
- Returns
Standard deviation of neighbor count
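As a rough picture of the average and standard deviation of the neighbor count, the sketch below counts neighbors within a cutoff for every atom and summarizes the counts. It is a non-periodic toy version with a hypothetical helper name; the module evaluates these features on periodic cells, and whether it uses the population or sample standard deviation is not stated here (NumPy's default, the population form, is assumed).

```python
import numpy as np

def neighbor_count_stats(coords, cutoff):
    """Return (mean, std) of per-atom neighbor counts within `cutoff`.
    Non-periodic toy version (hypothetical helper)."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise distance matrix via broadcasting.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    within = dist <= cutoff
    np.fill_diagonal(within, False)  # exclude self-pairs
    counts = within.sum(axis=1)
    return counts.mean(), counts.std()

# Three collinear atoms 1 A apart: the middle atom has 2 neighbors, the ends 1.
mean_count, std_count = neighbor_count_stats(
    [[0, 0, 0], [1, 0, 0], [2, 0, 0]], cutoff=1.5)
```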
- avgSublatticeEneg()¶
Get average sublattice electronegativity.
- Return type
float
- Returns
Average sublattice electronegativity
- avgSublatticeNeighborCount()¶
Get average sublattice neighbor count.
- Return type
float
- Returns
Average sublattice neighbor count
- avgNeighborIon()¶
Get average neighbor ionicity.
- Return type
float
- Returns
Average neighbor ionicity
- stdNeighborIon()¶
Get standard deviation of neighbor ionicity.
- Return type
float
- Returns
Standard deviation of neighbor ionicity
- avgSublatticeNeighborIon()¶
Get average sublattice neighbor ionicity.
- Return type
float
- Returns
Average sublattice neighbor ionicity
- volPerAnion()¶
Get volume per anion.
- Return type
float
- Returns
Volume per anion
- packingFraction(skip_element=None)¶
Get packing fraction of the crystal.
- Parameters
skip_element (str) – Element to skip
- Return type
float
- Returns
Packing fraction
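A packing fraction is conventionally the volume occupied by atomic spheres divided by the cell volume. The sketch below applies that definition to a list of radii. It assumes non-overlapping hard spheres and uses a hypothetical helper name; the module derives radii through effectiveRadius and can skip an element, neither of which is modeled here.

```python
import math

def packing_fraction(radii, cell_volume):
    """Fraction of the cell volume occupied by hard spheres with the given radii."""
    occupied = sum(4.0 / 3.0 * math.pi * r ** 3 for r in radii)
    return occupied / cell_volume

# Simple cubic cell with edge a and one sphere of radius a / 2 per cell.
a = 4.0
simple_cubic = packing_fraction([a / 2], a ** 3)
```

For the simple cubic case the value is pi / 6, about 0.524, independent of the edge length.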
- effectiveRadius(atom)¶
Get atom effective radius.
- Parameters
atom (schrodinger.structure._StructureAtom) – Atom
- Return type
float
- Returns
Effective radius
- sublatticePackingFraction()¶
Get packing fraction of the sublattice crystal.
- Return type
float
- Returns
Packing fraction
- avgElementNeighborCount()¶
Get average element neighbor count.
- Return type
float
- Returns
Average number of bonds per element
- avgAnionAnionShortDistance()¶
Get average anion anion shortest distance.
- Return type
float
- Returns
Average anion anion shortest distance
- avgElementAnionShortDistance()¶
Get average element anion shortest distance.
- Return type
float
- Returns
Average element anion shortest distance
- avgShortDistance()¶
Get average element element shortest distance.
- Return type
float
- Returns
Average element element shortest distance
- anionFrameCoordination()¶
Get anion framework coordination.
- Return type
float
- Returns
Anion framework coordination
- pathWidth(eval_eneg=False)¶
Evaluate average straight line path width. See the reference in the constructor for more info.
- Parameters
eval_eneg (bool) – If True, return average over electronegativity, instead of distance
- Return type
float
- Returns
Average path width or electronegativity
- pathWidthEneg()¶
Evaluate average straight line path electronegativity.
- Return type
float
- Returns
Average electronegativity along the path
- ratioIonicity()¶
Get ratio ionicity.
- Return type
float
- Returns
Ratio ionicity
- ratioCount()¶
Get ratio neighbor count.
- Return type
float
- Returns
Ratio neighbor count
- set_fit_request(*, data: Union[bool, None, str] = '$UNCHANGED$', data_y: Union[bool, None, str] = '$UNCHANGED$') schrodinger.application.matsci.mlearn.features.LatticeFeatures ¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters
data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the data parameter in fit.
data_y (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the data_y parameter in fit.
- Returns
self (object) – The updated object.
- set_transform_request(*, structs: Union[bool, None, str] = '$UNCHANGED$') schrodinger.application.matsci.mlearn.features.LatticeFeatures ¶
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters
structs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the structs parameter in transform.
- Returns
self (object) – The updated object.
- class schrodinger.application.matsci.mlearn.features.Ligand(st, metal_atom, new_to_old, coordination_idxs)¶
Bases:
object
Manage a ligand.
- __init__(st, metal_atom, new_to_old, coordination_idxs)¶
Create an instance.
- Parameters
st (schrodinger.structure.Structure) – the structure
metal_atom (schrodinger.structure._StructureAtom) – the metal atom
new_to_old (dict) – the map of new indices (extracted ligand) to old indices (original structure)
coordination_idxs (list) – contains groups of indices (new indices) of coordinating atoms
- getVec(point)¶
Return a vector pointing from the metal atom to the given point.
- Parameters
point (numpy.array) – the point in Ang.
- Return type
numpy.array
- Returns
the vector in Ang.
- getCentroid(st, idxs)¶
Return the centroid vector of the given coordination atom indices.
- Parameters
st (schrodinger.structure.Structure) – the structure
idxs (list) – the coordination indices
- Return type
numpy.array
- Returns
the centroid vector in Ang.
- getCoordinationVec(st, idxs)¶
Return a coordination vector pointing from the metal atom to the centroid of the given coordination atom indices.
- Parameters
st (schrodinger.structure.Structure) – the structure
idxs (list) – the coordination indices
- Return type
numpy.array
- Returns
the coordination vector in Ang.
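The geometry behind getVec, getCentroid and getCoordinationVec reduces to a few NumPy operations: average the coordinating atoms' positions, then subtract the metal position. The sketch below works on raw coordinate arrays with 0-based indices and a hypothetical helper name, whereas the methods above take a Structure and its atom indices.

```python
import numpy as np

def coordination_vector(metal_xyz, coords, idxs):
    """Vector from the metal to the centroid of coords[idxs] (0-based indices)."""
    coords = np.asarray(coords, dtype=float)
    center = coords[list(idxs)].mean(axis=0)  # centroid of the coordinating atoms
    return center - np.asarray(metal_xyz, dtype=float)

metal = [0.0, 0.0, 0.0]
ligand_coords = [[2.0, 1.0, 0.0], [2.0, -1.0, 0.0], [4.0, 0.0, 0.0]]
# Atoms 0 and 1 coordinate the metal; their centroid lies on the x-axis.
vec = coordination_vector(metal, ligand_coords, idxs=[0, 1])
```

For a single coordinating atom the centroid is the atom itself, so the coordination vector is simply the metal-to-atom vector.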
- getStoichiometry()¶
Return the stoichiometry.
- Return type
str
- Returns
the stoichiometry
- getDenticity()¶
Return the denticity.
- Return type
int
- Returns
the denticity
- getHapticity()¶
Return the hapticity.
- Return type
int
- Returns
the hapticity
- getHapticCharacter()¶
Return the haptic character.
- Return type
int
- Returns
the haptic character
- getBiteAngle()¶
Return the bite angle in degrees.
- Return type
float or None
- Returns
the bite angle in degrees
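In coordination chemistry the bite angle of a bidentate ligand is the donor-metal-donor angle. The sketch below computes that angle from three positions. The helper name is hypothetical and the module's own procedure may differ; the None return value presumably covers ligands for which a bite angle is undefined, which is an assumption.

```python
import numpy as np

def angle_at_metal(metal_xyz, donor_a_xyz, donor_b_xyz):
    """Donor-metal-donor angle in degrees (hypothetical helper)."""
    metal = np.asarray(metal_xyz, dtype=float)
    v1 = np.asarray(donor_a_xyz, dtype=float) - metal
    v2 = np.asarray(donor_b_xyz, dtype=float) - metal
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clipping guards against round-off pushing the cosine outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

# Two donors on perpendicular axes around a metal at the origin.
bite = angle_at_metal([0, 0, 0], [2, 0, 0], [0, 2, 0])
```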
- getAtomConeAngle(atom)¶
Return the cone angle for the given atom in degrees.
- Parameters
atom (schrodinger.structure._StructureAtom) – the atom
- Return type
float
- Returns
the cone angle for the given atom in degrees
- getConeAngle()¶
Return the cone angle in degrees.
- Return type
float
- Returns
the cone angle in degrees
- getBondLength()¶
Return the bond length in Ang.
- Return type
float
- Returns
the bond length in Ang.
- getDescriptors()¶
Return descriptors.
- Return type
dict
- Returns
(label, data) pairs
- class schrodinger.application.matsci.mlearn.features.Complex(st, logger=None, nonmetallic_centers=())¶
Bases:
object
Manage a complex.
- BURIED_VOLUME_VDW_SCALE = 1.17¶
- CONTOURS_DIR = 'contours'¶
- __init__(st, logger=None, nonmetallic_centers=())¶
Create an instance.
- Parameters
st (schrodinger.structure.Structure) – the structure
logger (logging.Logger or None) – output logger or None if there isn’t one
nonmetallic_centers (tuple) – Tuple of nonmetallic elements to also consider when looking for center atom
- setMetalAtom()¶
Set the metal atom.
- setLigands()¶
Set the ligands.
- getBondAngle()¶
Return the bond angle in degrees.
- Return type
float
- Returns
the bond angle in degrees
- getVDWSurfaceArea()¶
Return the VDW surface area in Angstrom^2.
- Return type
float
- Returns
the VDW surface area in Angstrom^2
- getVDWVolume(vdw_scale=1, buffer_len=2)¶
Return the VDW volume in Angstrom^3.
- Parameters
vdw_scale (float) – the VDW scale
buffer_len (float) – the shape buffer length in Angstrom
- Return type
float
- Returns
the VDW volume in Angstrom^3
- getBuriedVolumeStructure(only_largest_ligands=False)¶
Return a copy of the structure without the metal atom. If only_largest_ligands is True, it will only contain the largest ligand or multiple copies thereof if it is symmetric.
- Parameters
only_largest_ligands (bool) – Whether small ligands should be deleted
- Return type
schrodinger.structure.Structure
- Returns
the structure containing some or all ligands
- getBuriedVDWVolumePct(struct, vdw_scale=1.17, sphere_quadrant=None, free_volume=False)¶
Return the buried VDW volume percent.
- Parameters
struct (structure.Structure) – The structure to get buried volume for
vdw_scale (float) – the VDW scale
sphere_quadrant (None or str) – restrict sphere sampling to a quadrant specified as a key of amorphous.ORDINAL_DIRECTIONS
free_volume (bool) – use this option to return the free volume
- Return type
float
- Returns
the buried VDW volume percent
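A percent buried volume is commonly estimated by sampling random points in a sphere around the metal and counting how many fall inside any scaled VDW sphere. The sketch below shows that Monte Carlo idea under stated assumptions: the helper name, the sphere centered at the origin, and the sampling scheme are all hypothetical, and the radii passed in are taken to be already scaled (for example by vdw_scale). Quadrant restriction and the free-volume option are not modeled.

```python
import numpy as np

def buried_volume_pct(centers, radii, sphere_radius, num_points=200_000, seed=1234):
    """Monte Carlo estimate of the percent of a sphere at the origin that lies
    inside any of the given atomic spheres (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Uniform points in a ball: random unit direction times R * u**(1/3).
    directions = rng.normal(size=(num_points, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    r = sphere_radius * rng.random(num_points) ** (1.0 / 3.0)
    points = directions * r[:, None]

    buried = np.zeros(num_points, dtype=bool)
    for center, radius in zip(np.asarray(centers, dtype=float), radii):
        # A point is buried if it lies inside at least one atomic sphere.
        buried |= np.linalg.norm(points - center, axis=1) <= radius
    return 100.0 * buried.mean()

# One atom of radius 1.75 A centered in a 3.5 A sphere buries (1/2)**3 of it.
pct = buried_volume_pct([[0.0, 0.0, 0.0]], [1.75], sphere_radius=3.5)
```

The estimate should land close to the analytic 12.5 percent, with a statistical error that shrinks as num_points grows.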
- getAlignmentVectors()¶
Return two vectors for the structure that will be used to rotate it, the first is the vector to be aligned along the +Y-axis and the second is the vector to be aligned along the +Z-axis.
- Return type
numpy.array, numpy.array
- Returns
vectors to be aligned along the +Y-axis and +Z-axis
- getRotatedComplex()¶
Return a copy of the complex that is rotated so that the smallest ligands are along the +Z-axis.
- Return type
schrodinger.structure.Structure
- Returns
copy of the structure rotated so that the smallest ligands are along the +Z-axis
- exportBuriedVolumeContour(sphere_radius=10, vdw_scale=1, num_bins=100, seed=1234, num_points=2000000)¶
Export the buried volume contour for the complex
- Parameters
sphere_radius (float) – The radius for the sphere to sample points in
vdw_scale (float) – The VdW scale factor to apply to VdW radii when checking to see if a point is “inside” an atom
num_bins (int) – The number of bins in x and y direction to put the points in
seed (int) – Seed for random number generation
num_points (int) – the sample size of random points in the sphere
- Return type
str, str
- Returns
The paths to contour png and csv files
- getProjectedBonds(struct)¶
Return the projections of each bond in the given structure onto the xy-plane.
- Parameters
struct (schrodinger.structure.Structure) – the structure
- Return type
list[ProjBondData]
- Returns
projections of bonds onto the xy-plane
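Projecting a bond onto the xy-plane amounts to dropping the z component of both end points. The sketch below packs the result into a stand-in ProjBondData tuple with the documented field order. It uses plain coordinate arrays and 0-based integer indices for atom_1 and atom_2; the module may store atom objects instead, and the helper name is hypothetical.

```python
from collections import namedtuple

import numpy as np

# Stand-in with the documented field order; not the module's own class.
ProjBondData = namedtuple("ProjBondData", ["atom_1", "atom_2", "xy_1", "xy_2"])

def project_bonds(coords, bonds):
    """Project each bond (a pair of 0-based atom indices) onto the xy-plane."""
    coords = np.asarray(coords, dtype=float)
    # Dropping the z component is the orthogonal projection onto z = 0.
    return [
        ProjBondData(i, j, tuple(coords[i, :2]), tuple(coords[j, :2]))
        for i, j in bonds
    ]

projected = project_bonds([[0, 0, 1], [1, 2, 3]], bonds=[(0, 1)])
```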
- addShadow(axes, struct)¶
Add the given structure’s shadow to the given contour axes.
- Parameters
axes (matplotlib.axes.Axes) – the contour axes
struct (schrodinger.structure.Structure) – the structure
- plotContour(points, struct)¶
Plot a contour for the passed points. matplotlib uses triangulation to create a grid for the contour.
- Parameters
points (numpy.array) – The x, y, z values of points
struct (schrodinger.structure.Structure) – the structure
- getVectorizedDescriptors(jaguar_out_file)¶
Return vectorized descriptors which are instance specific descriptors that are vectorized, for example depending on the molecule’s geometric orientation, atom indexing, etc.
- Parameters
jaguar_out_file (str or None) – the name of a Jaguar *.out file from which descriptors will be extracted or None if there isn’t one
- Return type
dict
- Returns
(label, data) pairs
- getDescriptors(no_organometallic=False)¶
Return descriptors.
- Parameters
no_organometallic (bool) – Whether organometallic descriptors should be skipped
- Return type
dict
- Returns
(label, data) pairs
- schrodinger.application.matsci.mlearn.features.get_unique_titles(sts)¶
Return a list of unique titles for the given structures.
- Parameters
sts (list) – contains schrodinger.structure.Structure
- Return type
list
- Returns
the unique titles
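Unique titles matter because several descriptor dictionaries in this module are keyed by structure title. How get_unique_titles resolves clashes is not documented here, so the sketch below shows only one possible scheme, appending a numeric suffix to repeats. The helper name and the suffix format are assumptions, and the input is a list of title strings rather than structures.

```python
def unique_titles(titles):
    """Return titles made unique by appending _2, _3, ... to repeats
    (one possible scheme, not necessarily the module's)."""
    used = set()
    result = []
    for title in titles:
        candidate, suffix = title, 1
        # Bump the suffix until the candidate no longer clashes.
        while candidate in used:
            suffix += 1
            candidate = f"{title}_{suffix}"
        used.add(candidate)
        result.append(candidate)
    return result

titles = unique_titles(["ferrocene", "ferrocene", "cisplatin", "ferrocene"])
```

The output keeps the input order, so each unique title can still be matched to its structure by position.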
- class schrodinger.application.matsci.mlearn.features.ComplexFeatures(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, tpp=1, ligfilter=False, no_organometallic=False, nonmetallic_centers=(), canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)¶
Bases:
schrodinger.application.matsci.mlearn.base.BaseFeaturizer
Class to generate features for metal complexes.
- __init__(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, tpp=1, ligfilter=False, no_organometallic=False, nonmetallic_centers=(), canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)¶
Create an instance.
- Parameters
jaguar (bool) – specify whether to calculate Jaguar features
jaguar_keywords (OrderedDict) – if Jaguar jobs must be run to calculate the Jaguar features then specify the Jaguar keywords here
tpp (int) – the number of threads for any Jaguar jobs
ligfilter (bool) – specify whether to calculate Ligfilter features
no_organometallic (bool) – Whether organometallic descriptors should be skipped
canvas (bool) – specify whether to calculate Canvas features
moldescriptors (bool or list) – specify whether to calculate Molecular Descriptors features. If it’s a list, it contains command line arguments for moldescriptors
include_vectorized (bool) – whether to include instance specific features that are vectorized, for example depending on the molecule’s geometric orientation, atom indexing, etc.
save_files (bool) – Whether to save subjob files or not
logger (logging.Logger or None) – output logger or None if there isn’t one
- runJaguar()¶
Run Jaguar on the given structures.
- Return type
list
- Returns
contains Jaguar *.out file names
- getFeatures(structs, jaguar_out_files=None)¶
Return features dictionary for the given structures
- Parameters
structs (list(schrodinger.structure.Structure)) – list of structures to be featurized
jaguar_out_files (list or None) – if Jaguar features should be calculated using existing Jaguar *.out files then specify the files here using the same ordering as used for any given structures
- verifyJaguarOutfiles()¶
Run jaguar and get the out-files if the out-files have not been provided
- getComplexDescriptors()¶
Create a Complex object for each structure and get their descriptors
- Return type
dict
- Returns
The descriptors from Complex for each structure
- getJaguarDescriptors()¶
Return Jaguar descriptors for all structures. Sets Jaguar atom descriptors on structures.
- Return type
dict
- Returns
The jaguar descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors
- getUtilityDescriptors()¶
Get the requested utility descriptors for all structures
- Return type
dict
- Returns
The descriptor utility descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors
- getDescriptorUtilityJob(descriptor_utility)¶
Get the job to run to generate the descriptors using the passed descriptor_utility for all structures
- Parameters
descriptor_utility (DescriptorUtility) – The descriptor utility to run to get the descriptors
- Return type
jobutils.RobustSubmissionJob
- Returns
The job to run to generate the descriptors
- getExtraMolecularDescriptorsProps(st, descriptor_utility)¶
Return any extra structure properties computed using the output from molecular descriptors.
- Parameters
st (schrodinger.structure.Structure) – the structure output from molecular descriptors which has all output properties defined
descriptor_utility (DescriptorUtility) – the molecular descriptor utility containing the original job parameters
- Return type
dict
- Returns
pairs are property names and values
- processUtilityDescriptorOutputs(jobs_dict)¶
Read the descriptors for all descriptor utilities that were run, and return them
- Parameters
jobs_dict (dict) – Dictionary with DescriptorUtility as keys and jobs as values
- Return type
dict
- Returns
The descriptor utility descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors
- getMolecularDescriptorsJob()¶
Get the job to run to generate molecular descriptors for all structures
- Return type
jobutils.RobustSubmissionJob
- Returns
The job to run to generate the descriptors
- static writeFingerprintFiles(structs)¶
Write fingerprint files for the given structures.
- Parameters
structs (list(schrodinger.structure.Structure)) – list of structures to be fingerprinted
- Return type
list
- Returns
the fingerprint file names
- log(msg, **kwargs)¶
Add a message to the log file
- Parameters
msg (str) – The message to log
Additional keyword arguments are passed to the textlogger.log_msg function
- set_fit_request(*, data: Union[bool, None, str] = '$UNCHANGED$', data_y: Union[bool, None, str] = '$UNCHANGED$') schrodinger.application.matsci.mlearn.features.ComplexFeatures ¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters
data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the data parameter in fit.
data_y (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the data_y parameter in fit.
- Returns
self (object) – The updated object.
- set_transform_request(*, data: Union[bool, None, str] = '$UNCHANGED$') schrodinger.application.matsci.mlearn.features.ComplexFeatures ¶
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters
data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the data parameter in transform.
- Returns
self (object) – The updated object.
- class schrodinger.application.matsci.mlearn.features.CrystalNNFeatures(preset='ops')¶
Bases:
object
Calculates CrystalNN structure fingerprints as implemented in pymatgen
- OPS_PRESET = 'ops'¶
- CN_PRESET = 'cn'¶
- __init__(preset='ops')¶
Create a structure featurizer
- Parameters
preset (str) – One of the OPS_PRESET or CN_PRESET class constants
- featurize(struct)¶
Get CrystalNN fingerprints for the passed structure
- Parameters
struct (structure.Structure) – The structure to get features for
- Return type
list
- Returns
List of CrystalNN fingerprints for the structure