schrodinger.application.matsci.mlearn.features module¶
Classes and functions to deal with ML features.
Copyright Schrodinger, LLC. All rights reserved.
- class schrodinger.application.matsci.mlearn.features.MomentData(flag, components, header, units)¶
Bases:
tuple
- components¶
Alias for field number 1
- flag¶
Alias for field number 0
- header¶
Alias for field number 2
- units¶
Alias for field number 3
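The "Alias for field number N" entries follow the usual named-tuple convention: each attribute is another name for a tuple position. A minimal sketch using a stand-in built with `collections.namedtuple`; the stand-in definition and all field values are assumptions for illustration, not the module's own code or data.

```python
from collections import namedtuple

# Hypothetical stand-in with the documented field order; the real class is
# schrodinger.application.matsci.mlearn.features.MomentData.
MomentData = namedtuple("MomentData", ["flag", "components", "header", "units"])

# All four values below are made up for illustration.
moment = MomentData(flag="dipole", components=("x", "y", "z"),
                    header="Dipole", units="debye")

# "Alias for field number 1" means moment.components is moment[1].
assert moment.components is moment[1]

# Tuple unpacking follows the same order as the constructor signature.
flag, components, header, units = moment
```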
- schrodinger.application.matsci.mlearn.features.DescriptorUtility¶
alias of
DescriptorUtilitity
- class schrodinger.application.matsci.mlearn.features.ProjBondData(atom_1, atom_2, xy_1, xy_2)¶
Bases:
tuple
- atom_1¶
Alias for field number 0
- atom_2¶
Alias for field number 1
- xy_1¶
Alias for field number 2
- xy_2¶
Alias for field number 3
- schrodinger.application.matsci.mlearn.features.get_distance_cell(struct, cutoff)¶
Create an infrastructure Distance Cell. Struct MUST have the Chorus box properties.
- Parameters:
struct (schrodinger.structure.Structure) – Input structure
cutoff (float) – The cutoff for finding nearest neighbor atoms
- Return type:
schrodinger.structure.Structure, schrodinger.infra.structure.DistanceCell, schrodinger.infra.structure.PBC
- Returns:
Supercell, an infrastructure Distance Cell that accounts for the PBC, and the pbc used to create it.
- Raise:
ValueError if struct is missing PBCs
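To illustrate what a cutoff-based neighbor search under periodic boundary conditions involves, here is a minimal sketch in plain Python. It uses the minimum-image convention on an orthorhombic box rather than the supercell and DistanceCell that this function returns, and the function name `neighbors_within_cutoff` and its arguments are hypothetical.

```python
import math


def neighbors_within_cutoff(coords, box, cutoff):
    """Return {atom index: [neighbor indices]} using the minimum-image convention.

    coords: list of (x, y, z) in Angstrom; box: (a, b, c) edge lengths of an
    orthorhombic cell. The cutoff must not exceed half the shortest edge.
    """
    if cutoff > min(box) / 2:
        raise ValueError("cutoff must not exceed half the shortest box edge")
    neighbors = {i: [] for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            dist_sq = 0.0
            for k in range(3):
                delta = coords[j][k] - coords[i][k]
                # Wrap the separation onto the nearest periodic image.
                delta -= box[k] * round(delta / box[k])
                dist_sq += delta * delta
            if math.sqrt(dist_sq) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


# Atoms 0 and 1 sit on opposite faces of the box, so they are 1 Angstrom
# apart through the periodic boundary.
coords = [(0.5, 0.5, 0.5), (9.5, 0.5, 0.5), (5.0, 5.0, 5.0)]
result = neighbors_within_cutoff(coords, box=(10.0, 10.0, 10.0), cutoff=4.0)
```

A supercell approach, as the return value above suggests, replicates the cell instead so that cutoffs larger than half the box edge can also be handled.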
- schrodinger.application.matsci.mlearn.features.elemental_generator(struct, element, is_equal=True)¶
- schrodinger.application.matsci.mlearn.features.get_anion(struct)¶
Get the most electronegative element in the structure (anion).
- Parameters:
struct (schrodinger.structure.Structure) – Input structure
- Return type:
str, float, int
- Returns:
Element, its electronegativity, number of anions in the cell
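The selection described above can be sketched without the Schrodinger API. This is an assumption-laden illustration: it works on a plain list of element symbols, uses a small hand-entered table of Pauling electronegativities, and the helper name `most_electronegative` is hypothetical.

```python
from collections import Counter

# Pauling electronegativities for a few elements (illustrative subset).
PAULING_ENEG = {"Li": 0.98, "P": 2.19, "S": 2.58, "Cl": 3.16, "O": 3.44, "F": 3.98}


def most_electronegative(elements):
    """Return (element, electronegativity, count) for the most electronegative element."""
    counts = Counter(elements)
    anion = max(counts, key=lambda symbol: PAULING_ENEG[symbol])
    return anion, PAULING_ENEG[anion], counts[anion]


# One Li3PS4 formula unit written out as a list of element symbols.
cell = ["Li"] * 3 + ["P"] + ["S"] * 4
anion, eneg, count = most_electronegative(cell)
```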
- class schrodinger.application.matsci.mlearn.features.LatticeFeatures(features, element='Li', cutoff=4.0)¶
Bases:
BaseFeaturizer
Class to generate lattice-based features.
- FEATURES = {'anionFrameCoordination': 'Anion frame coordination', 'avgAnionAnionShortDistance': 'Average anion anion shortest distance', 'avgAtomicVol': 'Average atomic volume', 'avgElementAnionShortDistance': 'Average cation anion shortest distance', 'avgElementNeighborCount': 'Average cation count', 'avgNeighborCount': 'Average neighbor count', 'avgNeighborIon': 'Average neighbor ionicity', 'avgShortDistance': 'Average cation cation shortest distance', 'avgSublatticeEneg': 'Average sublattice electronegativity', 'avgSublatticeNeighborCount': 'Average sublattice neighbor count', 'avgSublatticeNeighborIon': 'Average sublattice neighbor ionicity', 'packingFraction': 'Crystal packing fraction', 'pathWidth': 'Average straight-line path width', 'pathWidthEneg': 'Average straight-line path electronegativity', 'ratioCount': 'Ratio of average cation to sublattice count', 'ratioIonicity': 'Ratio of average cation to sublattice electronegativity', 'stdNeighborCount': 'Standard deviation of neighbor count', 'stdNeighborIon': 'Standard deviation of neighbor ionicity', 'sublatticePackingFraction': 'Sublattice packing fraction', 'volPerAnion': 'Volume per anion'}¶
- __init__(features, element='Li', cutoff=4.0)¶
Initialize the object.
- runFeature(feature)¶
Get result from a feature.
- Parameters:
feature – One of the features listed in FEATURES.
- Return type:
int or float
- Returns:
Feature value
- transform(structs)¶
Get numerical features from structures. Also sets feature names in self.labels. See parent class for more documentation.
- Parameters:
structs (list(schrodinger.structure.Structure)) – List of structures to be featurized
- Return type:
numpy array of shape [n_samples, n_features]
- Returns:
Transformed array
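The keys of FEATURES coincide with the method names documented below, which suggests name-based dispatch. The following toy class sketches that pattern on lists of numbers. It is not LatticeFeatures itself: the class, its two features, the use of `getattr`, and the choice to store descriptions in `labels` are all assumptions, and it returns nested lists instead of a numpy array.

```python
class ToyFeaturizer:
    """Hypothetical featurizer over lists of numbers, not LatticeFeatures itself."""

    FEATURES = {"count": "Number of values", "mean": "Mean value"}

    def __init__(self, features):
        unknown = [name for name in features if name not in self.FEATURES]
        if unknown:
            raise ValueError(f"Unknown features: {unknown}")
        self.features = list(features)
        self.labels = []
        self._sample = None

    def runFeature(self, feature):
        # Each key of FEATURES is also the name of a zero-argument method.
        return getattr(self, feature)()

    def count(self):
        return len(self._sample)

    def mean(self):
        return sum(self._sample) / len(self._sample)

    def transform(self, samples):
        rows = []
        for sample in samples:
            self._sample = sample
            rows.append([self.runFeature(name) for name in self.features])
        self.labels = [self.FEATURES[name] for name in self.features]
        return rows  # n_samples rows, n_features columns


featurizer = ToyFeaturizer(["mean", "count"])
matrix = featurizer.transform([[1.0, 3.0], [2.0, 4.0, 6.0]])
```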
- avgAtomicVol()¶
Get average atomic volume.
- Parameters:
struct (schrodinger.structure.Structure) – Structure to be used for feature calculation
- Return type:
float
- Returns:
Average atomic volume (Angstrom^3)
- avgNeighborCount()¶
Get average neighbor count.
- Return type:
float
- Returns:
Average neighbor count
- stdNeighborCount()¶
Get standard deviation of neighbor count.
- Return type:
float
- Returns:
Standard deviation of neighbor count
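As a small worked example of the two neighbor-count statistics, assuming a made-up list of per-atom neighbor counts. Whether the library uses the population or the sample standard deviation is not stated here; the population form is an assumption.

```python
import statistics

# Hypothetical per-atom neighbor counts within the cutoff.
neighbor_counts = [4, 4, 6, 6]

avg_count = statistics.fmean(neighbor_counts)
# Population standard deviation (assumption; the sample form would differ).
std_count = statistics.pstdev(neighbor_counts)
```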
- avgSublatticeEneg()¶
Get average sublattice electronegativity.
- Return type:
float
- Returns:
Average sublattice electronegativity
- avgSublatticeNeighborCount()¶
Get average sublattice neighbor count.
- Return type:
float
- Returns:
Average sublattice neighbor count
- avgNeighborIon()¶
Get average neighbor ionicity.
- Return type:
float
- Returns:
Average neighbor ionicity
- stdNeighborIon()¶
Get standard deviation of neighbor ionicity.
- Return type:
float
- Returns:
Standard deviation of neighbor ionicity
- avgSublatticeNeighborIon()¶
Get average sublattice neighbor ionicity.
- Return type:
float
- Returns:
Average sublattice neighbor ionicity
- volPerAnion()¶
Get volume per anion.
- Return type:
float
- Returns:
Volume per anion
- packingFraction(skip_element=None)¶
Get packing fraction of the crystal.
- Parameters:
skip_element (str) – Element to skip
- Return type:
float
- Returns:
Packing fraction
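A packing fraction is commonly defined as the volume occupied by atomic spheres divided by the cell volume. The sketch below assumes that definition, ignores sphere overlap, and takes radii as plain inputs; which radii the library uses and how it treats overlap are not stated here. The function name and arguments are hypothetical.

```python
import math


def packing_fraction(elements, radii, cell_volume, skip_element=None):
    """Summed sphere volume over cell volume, optionally skipping one element."""
    occupied = 0.0
    for element, radius in zip(elements, radii):
        if element == skip_element:
            continue
        occupied += 4.0 / 3.0 * math.pi * radius**3
    return occupied / cell_volume


# Simple cubic cell with touching spheres: edge = 2 * radius, one atom per cell.
fraction = packing_fraction(["Po"], [1.0], cell_volume=8.0)
```

For this simple cubic case the result is pi / 6, the textbook value of about 0.52.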
- effectiveRadius(atom)¶
Get atom effective radius.
- Parameters:
atom (schrodinger.structure._StructureAtom) – Atom
- Return type:
float
- Returns:
Effective radius
- sublatticePackingFraction()¶
Get packing fraction of the sublattice crystal.
- Return type:
float
- Returns:
Packing fraction
- avgElementNeighborCount()¶
Get average element neighbor count.
- Return type:
float
- Returns:
Average number of bonds per element
- avgAnionAnionShortDistance()¶
Get average anion anion shortest distance.
- Return type:
float
- Returns:
Average anion anion shortest distance
- avgElementAnionShortDistance()¶
Get average element anion shortest distance.
- Return type:
float
- Returns:
Average element anion shortest distance
- avgShortDistance()¶
Get average element element shortest distance.
- Return type:
float
- Returns:
Average element element shortest distance
- anionFrameCoordination()¶
Get anion framework coordination.
- Return type:
float
- Returns:
Anion framework coordination
- pathWidth(eval_eneg=False)¶
Evaluate average straight line path width. See the reference in the constructor for more info.
- Parameters:
eval_eneg (bool) – If True, return average over electronegativity, instead of distance
- Return type:
float
- Returns:
Average path width or electronegativity
- pathWidthEneg()¶
Evaluate average straight line path electronegativity.
- Return type:
float
- Returns:
Average electronegativity along the path
- ratioIonicity()¶
Get ratio ionicity.
- Return type:
float
- Returns:
Ratio ionicity
- ratioCount()¶
Get ratio neighbor count.
- Return type:
float
- Returns:
Ratio neighbor count
- set_fit_request(*, data: Union[bool, None, str] = '$UNCHANGED$', data_y: Union[bool, None, str] = '$UNCHANGED$') LatticeFeatures ¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
data_y (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data_y parameter in fit.
- Returns:
self (object) – The updated object.
- set_transform_request(*, structs: Union[bool, None, str] = '$UNCHANGED$') LatticeFeatures ¶
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
structs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for structs parameter in transform.
- Returns:
self (object) – The updated object.
- class schrodinger.application.matsci.mlearn.features.Ligand(st, metal_atom, new_to_old, coordination_idxs)¶
Bases:
object
Manage a ligand.
- __init__(st, metal_atom, new_to_old, coordination_idxs)¶
Create an instance.
- Parameters:
st (schrodinger.structure.Structure) – the structure
metal_atom (schrodinger.structure._StructureAtom) – the metal atom
new_to_old (dict) – the map of new indices (extracted ligand) to old indices (original structure)
coordination_idxs (list) – contains groups of indices (new indices) of coordinating atoms
- getVec(point)¶
Return a vector pointing from the metal atom to the given point.
- Parameters:
point (numpy.array) – the point in Ang.
- Return type:
numpy.array
- Returns:
the vector in Ang.
- getCentroid(st, idxs)¶
Return the centroid vector of the given coordination atom indices.
- Parameters:
st (schrodinger.structure.Structure) – the structure
idxs (list) – the coordination indices
- Return type:
numpy.array
- Returns:
the centroid vector in Ang.
- getCoordinationVec(st, idxs)¶
Return a coordination vector pointing from the metal atom to the centroid of the given coordination atom indices.
- Parameters:
st (schrodinger.structure.Structure) – the structure
idxs (list) – the coordination indices
- Return type:
numpy.array
- Returns:
the coordination vector in Ang.
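The geometry behind getVec, getCentroid and getCoordinationVec can be sketched with plain tuples. This is an illustration under assumptions: the centroid is taken as the unweighted mean of the coordinating atom positions, coordinates are passed in directly instead of being read from a structure, and the helper names are hypothetical.

```python
def centroid(points):
    """Unweighted mean position of a group of (x, y, z) points."""
    count = len(points)
    return tuple(sum(point[k] for point in points) / count for k in range(3))


def vec_from_metal(metal_xyz, point):
    """Vector pointing from the metal position to the given point."""
    return tuple(point[k] - metal_xyz[k] for k in range(3))


metal = (0.0, 0.0, 0.0)
# Two coordinating atoms that form a single coordination group.
group = [(2.0, 0.5, 0.0), (2.0, -0.5, 0.0)]
coordination_vec = vec_from_metal(metal, centroid(group))
```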
- getStoichiometry()¶
Return the stoichiometry.
- Return type:
str
- Returns:
the stoichiometry
- getDenticity()¶
Return the denticity.
- Return type:
int
- Returns:
the denticity
- getHapticity()¶
Return the hapticity.
- Return type:
int
- Returns:
the hapticity
- getHapticCharacter()¶
Return the haptic character.
- Return type:
int
- Returns:
the haptic character
- getBiteAngle()¶
Return the bite angle in degrees.
- Return type:
float or None
- Returns:
the bite angle in degrees
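A bite angle is conventionally the donor-metal-donor angle of a chelating ligand. A minimal sketch, assuming it can be computed as the angle between two vectors that point from the metal to each coordination site; the None return documented above is not modelled, and `angle_deg` is a hypothetical helper.

```python
import math


def angle_deg(vec_a, vec_b):
    """Angle between two 3D vectors in degrees."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm = math.hypot(*vec_a) * math.hypot(*vec_b)
    # Clamp to guard against round-off just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


# Two donor atoms at right angles around a metal at the origin.
bite = angle_deg((2.0, 0.0, 0.0), (0.0, 2.0, 0.0))
```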
- getAtomConeAngle(atom)¶
Return the cone angle for the given atom in degrees.
- Parameters:
atom (schrodinger.structure._StructureAtom) – the atom
- Return type:
float
- Returns:
the cone angle for the given atom in degrees
- getConeAngle()¶
Return the cone angle in degrees.
- Return type:
float
- Returns:
the cone angle in degrees
- getBondLength()¶
Return the bond length in Ang.
- Return type:
float
- Returns:
the bond length in Ang.
- getDescriptors()¶
Return descriptors.
- Return type:
dict
- Returns:
(label, data) pairs
- class schrodinger.application.matsci.mlearn.features.Complex(st, logger=None, nonmetallic_centers=())¶
Bases:
object
Manage a complex.
- BURIED_VOLUME_VDW_SCALE = 1.17¶
- CONTOURS_DIR = 'contours'¶
- __init__(st, logger=None, nonmetallic_centers=())¶
Create an instance.
- Parameters:
st (schrodinger.structure.Structure) – the structure
logger (logging.Logger or None) – output logger or None if there isn’t one
nonmetallic_centers (tuple) – Tuple of nonmetallic elements to also consider when looking for center atom
- setMetalAtom()¶
Set the metal atom.
- setLigands()¶
Set the ligands.
- getBondAngle()¶
Return the bond angle in degrees.
- Return type:
float
- Returns:
the bond angle in degrees
- getVDWSurfaceArea()¶
Return the VDW surface area in Angstrom^2.
- Return type:
float
- Returns:
the VDW surface area in Angstrom^2
- getVDWVolume(vdw_scale=1, buffer_len=2)¶
Return the VDW volume in Angstrom^3.
- Parameters:
vdw_scale (float) – the VDW scale
buffer_len (float) – the shape buffer length in Angstrom
- Return type:
float
- Returns:
the VDW volume in Angstrom^3
- getBuriedVolumeStructure(only_largest_ligands=False)¶
Return a copy of the structure without the metal atom. If only_largest_ligands is True, it will only contain the largest ligand or multiple copies thereof if it is symmetric.
- Parameters:
only_largest_ligands (bool) – Whether small ligands should be deleted
- Return type:
schrodinger.structure.Structure
- Returns:
the structure containing some or all ligands
- getBuriedVDWVolumePct(struct, vdw_scale=1.17, sphere_quadrant=None, free_volume=False)¶
Return the buried VDW volume percent.
- Parameters:
struct (structure.Structure) – The structure to get buried volume for
vdw_scale (float) – the VDW scale
sphere_quadrant (None or str) – restrict sphere sampling to a quadrant specified as a key of amorphous.ORDINAL_DIRECTIONS
free_volume (bool) – use this option to return the free volume
- Return type:
float
- Returns:
the buried VDW volume percent
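A percent buried volume is typically estimated by sampling points in a sphere around the center atom and counting those that fall inside scaled van der Waals spheres of the ligand atoms. The sketch below follows that idea with the standard library only. It is not the library's implementation: the function name, the plain-tuple atom input, the 3.5 Angstrom sphere radius, and the sample size are assumptions; only the 1.17 scale mirrors BURIED_VOLUME_VDW_SCALE.

```python
import random


def buried_volume_pct(atoms, sphere_radius=3.5, vdw_scale=1.17,
                      num_points=20000, seed=1234):
    """Monte Carlo estimate of the percent of a sphere covered by atoms.

    atoms: list of ((x, y, z), vdw_radius) with coordinates relative to the
    sphere center, in Angstrom.
    """
    rng = random.Random(seed)
    buried = 0
    sampled = 0
    while sampled < num_points:
        # Rejection sampling: draw in the bounding cube, keep points in the sphere.
        point = [rng.uniform(-sphere_radius, sphere_radius) for _ in range(3)]
        if sum(c * c for c in point) > sphere_radius**2:
            continue
        sampled += 1
        for xyz, radius in atoms:
            scaled = vdw_scale * radius
            if sum((p - a) ** 2 for p, a in zip(point, xyz)) <= scaled**2:
                buried += 1
                break
    return 100.0 * buried / sampled


# One hypothetical ligand atom 2 Angstrom from the center.
pct = buried_volume_pct([((0.0, 0.0, 2.0), 1.7)])
```

The free volume mentioned in the parameters would then be 100 minus this value, under the same assumptions.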
- getAlignmentVectors()¶
Return two vectors for the structure that will be used to rotate it, the first is the vector to be aligned along the +Y-axis and the second is the vector to be aligned along the +Z-axis.
- Return type:
numpy.array, numpy.array
- Returns:
vectors to be aligned along the +Y-axis and +Z-axis
- getRotatedComplex()¶
Return a copy of the complex that is rotated so that the smallest ligands are along the +Z-axis.
- Return type:
schrodinger.structure.Structure
- Returns:
copy of the structure rotated so that the smallest ligands are along the +Z-axis
- exportBuriedVolumeContour(sphere_radius=10, vdw_scale=1, num_bins=100, seed=1234, num_points=2000000)¶
Export the buried volume contour for the complex
- Parameters:
sphere_radius (float) – The radius for the sphere to sample points in
vdw_scale (float) – The VdW scale factor to apply to VdW radii when checking to see if a point is “inside” an atom
num_bins (int) – The number of bins in x and y direction to put the points in
seed (int) – Seed for random number generation
num_points (int) – the sample size of random points in the sphere
- Return type:
str, str
- Returns:
The paths to contour png and csv files
- getProjectedBonds(struct)¶
Return the projections of each bond in the given structure onto the xy-plane.
- Parameters:
struct (schrodinger.structure.Structure) – the structure
- Return type:
list[ProjBondData]
- Returns:
projections of bonds onto the xy-plane
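Projecting a bond onto the xy-plane amounts to keeping the x and y coordinates of its two atoms and discarding z. A minimal sketch with a stand-in ProjBondData; the stand-in definition, the use of integer atom indices for atom_1 and atom_2, and the `project_bonds` helper are assumptions for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in with the documented field order.
ProjBondData = namedtuple("ProjBondData", ["atom_1", "atom_2", "xy_1", "xy_2"])


def project_bonds(coords, bonds):
    """coords: {atom index: (x, y, z)}; bonds: iterable of (index_1, index_2)."""
    projected = []
    for idx_1, idx_2 in bonds:
        # Keep x and y, drop z.
        projected.append(
            ProjBondData(idx_1, idx_2, coords[idx_1][:2], coords[idx_2][:2]))
    return projected


coords = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.5, 2.0)}
shadow = project_bonds(coords, [(1, 2)])
```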
- addShadow(axes, struct)¶
Add the given structure’s shadow to the given contour axes.
- Parameters:
axes (matplotlib.axes.Axes) – the contour axes
struct (schrodinger.structure.Structure) – the structure
- plotContour(points, struct)¶
Plot a contour for the passed points. matplotlib uses triangulation to create a grid for the contour.
- Parameters:
points (numpy.array) – The x, y, z values of points
struct (schrodinger.structure.Structure) – the structure
- getVectorizedDescriptors(jaguar_out_file)¶
Return vectorized descriptors. These are instance-specific descriptors that depend on, for example, the molecule’s geometric orientation or atom indexing.
- Parameters:
jaguar_out_file (str or None) – the name of a Jaguar *.out file from which descriptors will be extracted or None if there isn’t one
- Return type:
dict
- Returns:
(label, data) pairs
- getDescriptors(no_organometallic=False)¶
Return descriptors.
- Parameters:
no_organometallic (bool) – Whether organometallic descriptors should be skipped
- Return type:
dict
- Returns:
(label, data) pairs
- schrodinger.application.matsci.mlearn.features.get_unique_titles(sts)¶
Return a list of unique titles for the given structures.
- Parameters:
sts (list) – contains schrodinger.structure.Structure
- Return type:
list
- Returns:
the unique titles
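How duplicates are resolved is not described above. One common scheme, shown here purely as an assumption, appends a numeric suffix to repeated titles while making sure the new name does not collide with an existing one. The function works on plain strings, and `unique_titles` is a hypothetical name.

```python
def unique_titles(titles):
    """Return titles in order, renaming repeats as title_2, title_3, ..."""
    used = set()
    unique = []
    for title in titles:
        candidate = title
        suffix = 1
        while candidate in used:
            suffix += 1
            candidate = f"{title}_{suffix}"
        used.add(candidate)
        unique.append(candidate)
    return unique


titles = unique_titles(["complex", "ligand", "complex"])
```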
- class schrodinger.application.matsci.mlearn.features.ComplexFeatures(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, tpp=1, ligfilter=False, no_organometallic=False, nonmetallic_centers=(), canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)¶
Bases:
BaseFeaturizer
Class to generate features for metal complexes.
- __init__(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, tpp=1, ligfilter=False, no_organometallic=False, nonmetallic_centers=(), canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)¶
Create an instance.
- Parameters:
jaguar (bool) – specify whether to calculate Jaguar features
jaguar_keywords (OrderedDict) – if Jaguar jobs must be run to calculate the Jaguar features then specify the Jaguar keywords here
tpp (int) – the number of threads for any Jaguar jobs
ligfilter (bool) – specify whether to calculate Ligfilter features
no_organometallic (bool) – Whether organometallic descriptors should be skipped
canvas (bool) – specify whether to calculate Canvas features
moldescriptors (bool or list) – specify whether to calculate Molecular Descriptors features. If it’s a list, it contains command line arguments for moldescriptors
include_vectorized (bool) – whether to include vectorized features, i.e. instance-specific features that depend on, for example, the molecule’s geometric orientation or atom indexing
save_files (bool) – Whether to save subjob files or not
logger (logging.Logger or None) – output logger or None if there isn’t one
- runJaguar()¶
Run Jaguar on the given structures.
- Return type:
list
- Returns:
contains Jaguar *.out file names
- getFeatures(structs, jaguar_out_files=None)¶
Return features dictionary for the given structures
- Parameters:
structs (list(schrodinger.structure.Structure)) – list of structures to be featurized
jaguar_out_files (list or None) – if Jaguar features should be calculated using existing Jaguar *.out files then specify the files here using the same ordering as used for any given structures
- verifyJaguarOutfiles()¶
Run Jaguar and collect the *.out files if they have not been provided
- getComplexDescriptors()¶
Create a Complex object for each structure and get their descriptors
- Return type:
dict
- Returns:
The descriptors from Complex for each structure
- getJaguarDescriptors()¶
Return Jaguar descriptors for all structures. Sets Jaguar atom descriptors on structures.
- Return type:
dict
- Returns:
The jaguar descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors
- getUtilityDescriptors()¶
Get the requested utility descriptors for all structures
- Return type:
dict
- Returns:
The descriptor utility descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors
- getDescriptorUtilityJob(descriptor_utility)¶
Get the job to run to generate the descriptors using the passed descriptor_utility for all structures
- Parameters:
descriptor_utility (DescriptorUtility) – The descriptor utility to run to get the descriptors
- Return type:
jobutils.RobustSubmissionJob
- Returns:
The job to run to generate the descriptors
- getExtraMolecularDescriptorsProps(st, descriptor_utility)¶
Return any extra structure properties computed using the output from molecular descriptors.
- Parameters:
st (schrodinger.structure.Structure) – the structure output from molecular descriptors which has all output properties defined
descriptor_utility (DescriptorUtility) – the molecular descriptor utility containing the original job parameters
- Return type:
dict
- Returns:
pairs are property names and values
- processUtilityDescriptorOutputs(jobs_dict)¶
Read the descriptors for all descriptor utilities that were run, and return them
- Parameters:
jobs_dict (dict) – Dictionary with DescriptorUtility as keys and jobs as values
- Return type:
dict
- Returns:
The descriptor utility descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors
- getMolecularDescriptorsJob()¶
Get the job to run to generate molecular descriptors for all structures
- Return type:
jobutils.RobustSubmissionJob
- Returns:
The job to run to generate the descriptors
- static writeFingerprintFiles(structs)¶
Write fingerprint files for the given structures.
- Parameters:
structs (list(schrodinger.structure.Structure)) – list of structures to be fingerprinted
- Return type:
list
- Returns:
the fingerprint file names
- log(msg, **kwargs)¶
Add a message to the log file
- Parameters:
msg (str) – The message to log
Additional keyword arguments are passed to the textlogger.log_msg function
- set_fit_request(*, data: Union[bool, None, str] = '$UNCHANGED$', data_y: Union[bool, None, str] = '$UNCHANGED$') ComplexFeatures ¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
data_y (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data_y parameter in fit.
- Returns:
self (object) – The updated object.
- set_transform_request(*, data: Union[bool, None, str] = '$UNCHANGED$') ComplexFeatures ¶
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in transform.
- Returns:
self (object) – The updated object.
- class schrodinger.application.matsci.mlearn.features.CrystalNNFeatures(preset='ops')¶
Bases:
object
Calculates CrystalNN structure fingerprints as implemented in pymatgen
- OPS_PRESET = 'ops'¶
- CN_PRESET = 'cn'¶
- __init__(preset='ops')¶
Create a structure featurizer
- Parameters:
preset (str) – One of the OPS_PRESET or CN_PRESET class constants
- featurize(struct)¶
Get CrystalNN fingerprints for the passed structure
- Parameters:
struct (structure.Structure) – The structure to get features for
- Return type:
list
- Returns:
List of CrystalNN fingerprints for the structure