schrodinger.application.matsci.mlearn.features module

Classes and functions to deal with ML features.

Copyright Schrodinger, LLC. All rights reserved.

class schrodinger.application.matsci.mlearn.features.MomentData(flag, components, header, units)

Bases: tuple

components

Alias for field number 1

flag

Alias for field number 0

header

Alias for field number 2

units

Alias for field number 3
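MomentData is a named tuple, so each field can be read by name or by the index listed above. The sketch below rebuilds a stand-in with collections.namedtuple so it runs without the Schrodinger suite. The field values are hypothetical, not ones the module is known to produce.

```python
from collections import namedtuple

# Stand-in mirroring the documented field order of features.MomentData:
# flag=0, components=1, header=2, units=3 (not the Schrodinger class itself)
MomentData = namedtuple('MomentData', ['flag', 'components', 'header', 'units'])

# Hypothetical values, for illustration only
moment = MomentData(flag='dipole', components=('x', 'y', 'z'),
                    header='Dipole moment', units='Debye')

# Named access and index access refer to the same field
assert moment.header == moment[2]

# Like any tuple, it unpacks positionally
flag, components, header, units = moment
```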

schrodinger.application.matsci.mlearn.features.DescriptorUtility

alias of DescriptorUtilitity

class schrodinger.application.matsci.mlearn.features.ProjBondData(atom_1, atom_2, xy_1, xy_2)

Bases: tuple

atom_1

Alias for field number 0

atom_2

Alias for field number 1

xy_1

Alias for field number 2

xy_2

Alias for field number 3

schrodinger.application.matsci.mlearn.features.get_distance_cell(struct, cutoff)

Create an infrastructure Distance Cell. struct MUST have the Chorus box properties.

Parameters:
  • struct (schrodinger.structure.Structure) – the structure; must have the Chorus box properties

  • cutoff – the distance cutoff

Return type:

schrodinger.structure.Structure, schrodinger.infra.structure.DistanceCell, schrodinger.infra.structure.PBC

Returns:

The supercell, an infrastructure Distance Cell that accounts for the PBC, and the PBC used to create it.

Raises:

ValueError – if struct is missing PBCs
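get_distance_cell returns a Distance Cell that accounts for periodic boundary conditions. As an illustration of the idea only, the pure-Python sketch below finds neighbor pairs with the minimum-image convention in an orthorhombic box. It does not use or imitate the schrodinger.infra.structure API, and minimum_image_distance and neighbor_pairs are hypothetical helpers.

```python
import itertools
import math

def minimum_image_distance(a, b, box):
    """Distance between points a and b in an orthorhombic periodic box.

    box holds the (Lx, Ly, Lz) edge lengths. Triclinic cells would need the
    full lattice vectors and are not handled by this sketch.
    """
    squared = 0.0
    for ai, bi, length in zip(a, b, box):
        delta = bi - ai
        delta -= length * round(delta / length)  # shift to the nearest periodic image
        squared += delta * delta
    return math.sqrt(squared)

def neighbor_pairs(coords, box, cutoff):
    """Index pairs (i, j) with i < j whose minimum-image distance is below cutoff."""
    return [(i, j) for i, j in itertools.combinations(range(len(coords)), 2)
            if minimum_image_distance(coords[i], coords[j], box) < cutoff]

# Atoms near opposite faces of a 10 Angstrom cube are 1.0 apart through the boundary
pairs = neighbor_pairs([(0.5, 0, 0), (9.5, 0, 0), (5, 5, 5)], (10, 10, 10), 2.0)
```

The minimum-image convention is only valid when the cutoff is at most half the shortest box edge; longer cutoffs require considering several periodic images, for example by working with a larger supercell.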

schrodinger.application.matsci.mlearn.features.elemental_generator(struct, element, is_equal=True)
schrodinger.application.matsci.mlearn.features.get_anion(struct)

Get the most electronegative element in the structure (anion).

Parameters:

struct (schrodinger.structure.Structure) – Input structure

Return type:

str, float, int

Returns:

The element, its electronegativity, and the number of anions in the cell
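A minimal sketch of the selection described above, assuming Pauling electronegativities and a structure reduced to a list of element symbols. The PAULING_ENEG table covers only a few elements and is not the module's data source; get_anion may use a different scale.

```python
from collections import Counter

# Pauling electronegativities for a small, hypothetical subset of elements;
# features.get_anion may use a different table or scale.
PAULING_ENEG = {'Li': 0.98, 'Na': 0.93, 'P': 2.19, 'S': 2.58,
                'Cl': 3.16, 'O': 3.44, 'F': 3.98}

def most_electronegative(elements):
    """Return (element, electronegativity, count) for a list of element symbols."""
    counts = Counter(elements)
    anion = max(counts, key=PAULING_ENEG.__getitem__)
    return anion, PAULING_ENEG[anion], counts[anion]

# Li3PS4: sulfur is the most electronegative element, with four per formula unit
result = most_electronegative(['Li', 'Li', 'Li', 'P', 'S', 'S', 'S', 'S'])
```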

class schrodinger.application.matsci.mlearn.features.LatticeFeatures(features, element='Li', cutoff=4.0)

Bases: BaseFeaturizer

Class to generate lattice-based features.

FEATURES = {'anionFrameCoordination': 'Anion frame coordination', 'avgAnionAnionShortDistance': 'Average anion anion shortest distance', 'avgAtomicVol': 'Average atomic volume', 'avgElementAnionShortDistance': 'Average cation anion shortest distance', 'avgElementNeighborCount': 'Average cation count', 'avgNeighborCount': 'Average neighbor count', 'avgNeighborIon': 'Average neighbor ionicity', 'avgShortDistance': 'Average cation cation shortest distance', 'avgSublatticeEneg': 'Average sublattice electronegativity', 'avgSublatticeNeighborCount': 'Average sublattice neighbor count', 'avgSublatticeNeighborIon': 'Average sublattice neighbor ionicity', 'packingFraction': 'Crystal packing fraction', 'pathWidth': 'Average straight-line path width', 'pathWidthEneg': 'Average straight-line path electronegativity', 'ratioCount': 'Ratio of average cation to sublattice count', 'ratioIonicity': 'Ratio of average cation to sublattice electronegativity', 'stdNeighborCount': 'Standard deviation of neighbor count', 'stdNeighborIon': 'Standard deviation of neighbor ionicity', 'sublatticePackingFraction': 'Sublattice packing fraction', 'volPerAnion': 'Volume per anion'}
__init__(features, element='Li', cutoff=4.0)

Initialize the object.

runFeature(feature)

Get result from a feature.

Parameters:

feature (str) – one of the keys of FEATURES

Return type:

int or float

Returns:

Feature value

transform(structs)

Get numerical features from structures. Also sets the feature names in self.labels. See the parent class for more documentation.

Parameters:

structs (list(schrodinger.structure.Structure)) – List of structures to be featurized

Return type:

numpy array of shape [n_samples, n_features]

Returns:

Transformed array
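transform takes only the structures, and methods such as avgAtomicVol() take no structure argument, which suggests the current structure is held on the object while each feature is computed. The toy class below sketches that FEATURES/runFeature/transform pattern under this assumption. ToyLatticeFeatures, its two features, and the list-of-coordinates "structures" are hypothetical, and it returns a nested list where the real method returns a numpy array.

```python
class ToyLatticeFeatures:
    """Hypothetical featurizer mimicking the FEATURES/runFeature/transform pattern."""

    # Keys name methods; values are human-readable labels
    FEATURES = {'atomCount': 'Number of atoms',
                'avgCoord': 'Average coordinate value'}

    def __init__(self, features):
        unknown = set(features) - set(self.FEATURES)
        if unknown:
            raise ValueError(f'Unknown features: {sorted(unknown)}')
        self.features = list(features)
        self.labels = []
        self.struct = None  # structure currently being featurized

    def atomCount(self):
        return len(self.struct)

    def avgCoord(self):
        values = [c for xyz in self.struct for c in xyz]
        return sum(values) / len(values)

    def runFeature(self, feature):
        # Dispatch to the method whose name matches the feature key
        return getattr(self, feature)()

    def transform(self, structs):
        self.labels = [self.FEATURES[f] for f in self.features]
        rows = []
        for struct in structs:
            self.struct = struct  # feature methods read the current structure
            rows.append([self.runFeature(f) for f in self.features])
        return rows  # one row per structure: [n_samples][n_features]

feat = ToyLatticeFeatures(['atomCount', 'avgCoord'])
rows = feat.transform([[(0, 0, 0), (1, 1, 1)], [(2, 2, 2)]])
```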

avgAtomicVol()

Get average atomic volume.

Return type:

float

Returns:

Average atomic volume (Angstrom^3)

avgNeighborCount()

Get average neighbor count.

Return type:

float

Returns:

Average neighbor count

stdNeighborCount()

Get standard deviation of neighbor count.

Return type:

float

Returns:

Standard deviation of neighbor count
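The neighbor-count features can be sketched as follows, assuming a neighbor is any other atom within the featurizer's cutoff and ignoring periodic images for brevity. The helpers are hypothetical, and the source does not state whether the population or the sample standard deviation is used; the sketch uses the population form (statistics.pstdev).

```python
import math
import statistics

def neighbor_counts(coords, cutoff):
    """For each atom, count the other atoms within cutoff (no periodic images)."""
    return [sum(1 for j, b in enumerate(coords)
                if j != i and math.dist(a, b) <= cutoff)
            for i, a in enumerate(coords)]

def neighbor_count_stats(coords, cutoff):
    """Return (mean, population standard deviation) of the neighbor counts."""
    counts = neighbor_counts(coords, cutoff)
    # Population form (pstdev) is an assumption; the library may differ
    return statistics.mean(counts), statistics.pstdev(counts)

# Two atoms 1.0 apart plus one isolated atom, with a 1.5 cutoff
mean, std = neighbor_count_stats([(0, 0, 0), (1, 0, 0), (5, 0, 0)], 1.5)
```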

avgSublatticeEneg()

Get average sublattice electronegativity.

Return type:

float

Returns:

Average sublattice electronegativity

avgSublatticeNeighborCount()

Get average sublattice neighbor count.

Return type:

float

Returns:

Average sublattice neighbor count

avgNeighborIon()

Get average neighbor ionicity.

Return type:

float

Returns:

Average neighbor ionicity

stdNeighborIon()

Get standard deviation of neighbor ionicity.

Return type:

float

Returns:

Standard deviation of neighbor ionicity

avgSublatticeNeighborIon()

Get average sublattice neighbor ionicity.

Return type:

float

Returns:

Average sublattice neighbor ionicity

volPerAnion()

Get volume per anion.

Return type:

float

Returns:

Volume per anion

packingFraction(skip_element=None)

Get packing fraction of the crystal.

Parameters:

skip_element (str or None) – element whose atoms are excluded from the calculation

Return type:

float

Returns:

Packing fraction
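A packing fraction is commonly defined as the total volume of the atomic spheres divided by the cell volume. The sketch below assumes that definition, non-overlapping spheres, and radii supplied by the caller (for example from something like effectiveRadius). packing_fraction and its (element, radius) input format are hypothetical.

```python
import math

def packing_fraction(atoms, cell_volume, skip_element=None):
    """Fraction of the cell volume occupied by atomic spheres.

    atoms is a list of (element, radius) pairs; atoms whose element equals
    skip_element are left out, mirroring the skip_element option above.
    Overlap between spheres is ignored.
    """
    occupied = sum(4.0 / 3.0 * math.pi * radius ** 3
                   for element, radius in atoms if element != skip_element)
    return occupied / cell_volume

# One sphere of radius 1 in a cube of edge 2 fills pi/6 (about 52%) of it
fraction = packing_fraction([('Li', 1.0)], 8.0)
```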

effectiveRadius(atom)

Get atom effective radius.

Parameters:

atom (schrodinger.structure._StructureAtom) – Atom

Return type:

float

Returns:

Effective radius

sublatticePackingFraction()

Get packing fraction of the sublattice crystal.

Return type:

float

Returns:

Packing fraction

avgElementNeighborCount()

Get average element neighbor count.

Return type:

float

Returns:

Average number of bonds per element

avgAnionAnionShortDistance()

Get the average shortest anion-anion distance.

Return type:

float

Returns:

Average shortest anion-anion distance

avgElementAnionShortDistance()

Get the average shortest element-anion distance.

Return type:

float

Returns:

Average shortest element-anion distance

avgShortDistance()

Get the average shortest element-element distance.

Return type:

float

Returns:

Average shortest element-element distance

anionFrameCoordination()

Get anion framework coordination.

Return type:

float

Returns:

Anion framework coordination

pathWidth(eval_eneg=False)

Evaluate the average straight-line path width. See the reference in the constructor for more info.

Parameters:

eval_eneg (bool) – If True, return average over electronegativity, instead of distance

Return type:

float

Returns:

Average path width or electronegativity

pathWidthEneg()

Evaluate the average straight-line path electronegativity.

Return type:

float

Returns:

Average electronegativity along the path

ratioIonicity()

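Get the ratio of the average cation electronegativity to the average sublattice electronegativity.

Return type:

float

Returns:

Ratio of average cation to sublattice electronegativity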

ratioCount()

Get the ratio of the average cation neighbor count to the average sublattice neighbor count.

Return type:

float

Returns:

Ratio of average cation to sublattice neighbor count

set_fit_request(*, data: Union[bool, None, str] = '$UNCHANGED$', data_y: Union[bool, None, str] = '$UNCHANGED$') → LatticeFeatures

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.

  • data_y (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data_y parameter in fit.

Returns:

self (object) – The updated object.

set_transform_request(*, structs: Union[bool, None, str] = '$UNCHANGED$') → LatticeFeatures

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

structs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for structs parameter in transform.

Returns:

self (object) – The updated object.

class schrodinger.application.matsci.mlearn.features.Ligand(st, metal_atom, new_to_old, coordination_idxs)

Bases: object

Manage a ligand.

__init__(st, metal_atom, new_to_old, coordination_idxs)

Create an instance.

Parameters:
  • st (schrodinger.structure.Structure) – the structure

  • metal_atom (schrodinger.structure._StructureAtom) – the metal atom

  • new_to_old (dict) – the map of new indices (extracted ligand) to old indices (original structure)

  • coordination_idxs (list) – contains groups of indices (new indices) of coordinating atoms

getVec(point)

Return a vector pointing from the metal atom to the given point.

Parameters:

point (numpy.array) – the point in Ang.

Return type:

numpy.array

Returns:

the vector in Ang.

getCentroid(st, idxs)

Return the centroid vector of the given coordination atom indices.

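Parameters:
  • st (schrodinger.structure.Structure) – the structure

  • idxs (list) – the coordination atom indices

Return type: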

numpy.array

Returns:

the centroid vector in Ang.

getCoordinationVec(st, idxs)

Return a coordination vector pointing from the metal atom to the centroid of the given coordination atom indices.

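Parameters:
  • st (schrodinger.structure.Structure) – the structure

  • idxs (list) – the coordination atom indices

Return type: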

numpy.array

Returns:

the coordination vector in Ang.
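The centroid and coordination vector reduce to simple vector arithmetic. In this hypothetical sketch, positions are plain (x, y, z) tuples in Angstrom rather than Structure objects, and each coordinating group is passed as a list of positions. For an η2 alkene, for instance, the vector points at the midpoint of the C=C bond.

```python
def centroid(points):
    """Mean position of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def coordination_vector(metal_xyz, donor_points):
    """Vector from the metal to the centroid of one group of coordinating atoms."""
    c = centroid(donor_points)
    return tuple(ci - mi for ci, mi in zip(c, metal_xyz))

# Hypothetical eta-2 alkene carbons placed symmetrically about the x-axis
vec = coordination_vector((0, 0, 0), [(1, 1, 0), (1, -1, 0)])
```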

getStoichiometry()

Return the stoichiometry.

Return type:

str

Returns:

the stoichiometry

getDenticity()

Return the denticity.

Return type:

int

Returns:

the denticity

getHapticity()

Return the hapticity.

Return type:

int

Returns:

the hapticity
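The constructor stores coordination_idxs as groups of coordinating atom indices. One plausible reading, assumed here rather than taken from the source, is that each group is one binding site: the denticity is then the number of groups and the hapticity the size of the largest group. The helper functions are hypothetical.

```python
def denticity(coordination_groups):
    """Number of separate binding sites, assuming one group per site."""
    return len(coordination_groups)

def hapticity(coordination_groups):
    """Largest number of contiguous atoms bound through a single site."""
    return max((len(group) for group in coordination_groups), default=0)

# Ethylenediamine binds through two separate N atoms (bidentate, eta-1 each)
en_groups = [[1], [4]]
# Cyclopentadienyl binds through five contiguous C atoms (monodentate, eta-5)
cp_groups = [[0, 1, 2, 3, 4]]
```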

getHapticCharacter()

Return the haptic character.

Return type:

int

Returns:

the haptic character

getBiteAngle()

Return the bite angle in degrees.

Return type:

float or None

Returns:

the bite angle in degrees
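The bite angle is the donor-metal-donor angle of a chelating ligand. The sketch below computes it from plain coordinates. Returning None unless exactly two donor positions are given is an assumption about when getBiteAngle returns None, not documented behavior, and angle_deg and bite_angle are hypothetical helpers.

```python
import math

def angle_deg(vertex, a, b):
    """Angle a-vertex-b in degrees."""
    u = [ai - vi for ai, vi in zip(a, vertex)]
    v = [bi - vi for bi, vi in zip(b, vertex)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    cos_theta = dot / (math.hypot(*u) * math.hypot(*v))
    # Clamp to guard against rounding slightly outside [-1, 1]
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

def bite_angle(metal_xyz, donor_points):
    """Donor-metal-donor angle, or None unless exactly two donors are given."""
    if len(donor_points) != 2:
        return None  # assumption: not a bidentate ligand
    return angle_deg(metal_xyz, *donor_points)

# Two donors on perpendicular axes give a 90 degree bite angle
right_angle = bite_angle((0, 0, 0), [(1, 0, 0), (0, 2, 0)])
```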

getAtomConeAngle(atom)

Return the cone angle for the given atom in degrees.

Parameters:

atom (schrodinger.structure._StructureAtom) – the atom

Return type:

float

Returns:

the cone angle for the given atom in degrees

getConeAngle()

Return the cone angle in degrees.

Return type:

float

Returns:

the cone angle in degrees
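Cone angles can be defined in several ways. One common geometric construction, assumed here and not necessarily the one getAtomConeAngle and getConeAngle use, takes the angle between the ligand axis and the metal-to-atom vector, adds asin(r/d) for an atom sphere of radius r at distance d from the metal, and reports twice the largest such half-angle over the ligand's atoms. The helper names and the choice of axis are hypothetical.

```python
import math

def _angle_between(u, v):
    """Angle between vectors u and v in radians."""
    dot = sum(a * b for a, b in zip(u, v))
    cos_theta = dot / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos_theta)))

def atom_half_cone_angle(metal_xyz, axis, atom_xyz, radius):
    """Half-angle (radians) about axis of the cone from the metal enclosing one atom sphere."""
    to_atom = [a - m for a, m in zip(atom_xyz, metal_xyz)]
    distance = math.hypot(*to_atom)
    if radius >= distance:
        return math.pi  # the sphere contains the metal, so the cone is unbounded
    return _angle_between(axis, to_atom) + math.asin(radius / distance)

def cone_angle_deg(metal_xyz, axis, atoms):
    """Cone angle in degrees; atoms is a list of ((x, y, z), radius) pairs."""
    half = max(atom_half_cone_angle(metal_xyz, axis, xyz, r) for xyz, r in atoms)
    return math.degrees(2.0 * min(half, math.pi))  # cap at a full 360 degrees

# One atom of radius 1 on the axis at distance 2 subtends a 60 degree cone
on_axis = cone_angle_deg((0, 0, 0), (0, 0, 1), [((0, 0, 2), 1.0)])
```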

getBondLength()

Return the bond length in Ang.

Return type:

float

Returns:

the bond length in Ang.

getDescriptors()

Return descriptors.

Return type:

dict

Returns:

(label, data) pairs

class schrodinger.application.matsci.mlearn.features.Complex(st, logger=None, nonmetallic_centers=())

Bases: object

Manage a complex.

BURIED_VOLUME_VDW_SCALE = 1.17
CONTOURS_DIR = 'contours'
__init__(st, logger=None, nonmetallic_centers=())

Create an instance.

Parameters:
  • st (schrodinger.structure.Structure) – the structure

  • logger (logging.Logger or None) – output logger or None if there isn’t one

  • nonmetallic_centers (tuple) – Tuple of nonmetallic elements to also consider when looking for the center atom

setMetalAtom()

Set the metal atom.

setLigands()

Set the ligands.

getBondAngle()

Return the bond angle in degrees.

Return type:

float

Returns:

the bond angle in degrees

getVDWSurfaceArea()

Return the VDW surface area in Angstrom^2.

Return type:

float

Returns:

the VDW surface area in Angstrom^2

getVDWVolume(vdw_scale=1, buffer_len=2)

Return the VDW volume in Angstrom^3.

Parameters:
  • vdw_scale (float) – the VDW scale

  • buffer_len (float) – the shape buffer length in Angstrom

Return type:

float

Returns:

the VDW volume in Angstrom^3

getBuriedVolumeStructure(only_largest_ligands=False)

Return a copy of the structure without the metal atom. If only_largest_ligands is True, it will only contain the largest ligand or multiple copies thereof if it is symmetric.

Parameters:

only_largest_ligands (bool) – Whether small ligands should be deleted

Return type:

schrodinger.structure.Structure

Returns:

the structure containing some or all ligands

getBuriedVDWVolumePct(struct, vdw_scale=1.17, sphere_quadrant=None, free_volume=False)

Return the buried VDW volume percent.

Parameters:
  • struct (schrodinger.structure.Structure) – The structure to get the buried volume for

  • vdw_scale (float) – the VDW scale

  • sphere_quadrant (None or str) – restrict sphere sampling to a quadrant specified as a key of amorphous.ORDINAL_DIRECTIONS

  • free_volume (bool) – if True, return the free volume instead of the buried volume

Return type:

float

Returns:

the buried VDW volume percent
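Buried volume percent is typically estimated by sampling points in a sphere around the metal and counting those that fall inside any atom's scaled van der Waals sphere. The Monte Carlo sketch below follows that approach with rejection sampling, using the 1.17 scale from BURIED_VOLUME_VDW_SCALE as the default and borrowing the seed and num_points parameter names from exportBuriedVolumeContour. The function, its arguments, and its sampling strategy are assumptions, not the method's actual implementation; num_points defaults to far fewer samples than exportBuriedVolumeContour's 2000000 to keep the example fast.

```python
import random

def buried_volume_pct(atoms, center, sphere_radius, vdw_scale=1.17,
                      num_points=20000, seed=1234, free_volume=False):
    """Monte Carlo estimate of the percent of a sphere buried by atom spheres.

    atoms is a list of ((x, y, z), vdw_radius) pairs; each radius is
    multiplied by vdw_scale before testing whether a sample point is inside.
    """
    rng = random.Random(seed)
    scaled = [(xyz, (r * vdw_scale) ** 2) for xyz, r in atoms]
    inside = accepted = 0
    while accepted < num_points:
        # Rejection sampling: draw in the bounding cube, keep points in the sphere
        offset = [rng.uniform(-sphere_radius, sphere_radius) for _ in range(3)]
        if sum(o * o for o in offset) > sphere_radius ** 2:
            continue
        accepted += 1
        point = [c + o for c, o in zip(center, offset)]
        if any(sum((p - a) ** 2 for p, a in zip(point, xyz)) <= r2
               for xyz, r2 in scaled):
            inside += 1
    pct = 100.0 * inside / num_points
    return 100.0 - pct if free_volume else pct

# A concentric sphere of half the sampling radius buries 1/8 (12.5%) of the volume
estimate = buried_volume_pct([((0, 0, 0), 1.5)], (0, 0, 0), 3.0, vdw_scale=1.0)
```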

getAlignmentVectors()

Return two vectors used to rotate the structure: the first is to be aligned along the +Y-axis and the second along the +Z-axis.

Return type:

numpy.array, numpy.array

Returns:

vectors to be aligned along the +Y-axis and +Z-axis

getRotatedComplex()

Return a copy of the complex that is rotated so that the smallest ligands are along the +Z-axis.

Return type:

schrodinger.structure.Structure

Returns:

copy of the structure rotated so that the smallest ligands are along the +Z-axis
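Given two vectors like those from getAlignmentVectors, one standard way to build the rotation is Gram-Schmidt orthonormalization: the first vector becomes the new Y axis, the part of the second vector perpendicular to it becomes the new Z axis, and X completes a right-handed frame. This is a sketch of that general technique with hypothetical helper names, not necessarily how getRotatedComplex rotates the structure.

```python
import math

def _normalize(v):
    norm = math.hypot(*v)
    return [c / norm for c in v]

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def alignment_matrix(y_target, z_target):
    """Rotation matrix whose rows are the new x, y, z axes.

    y_target maps onto +Y, and the component of z_target perpendicular to
    y_target maps onto +Z.
    """
    y_axis = _normalize(y_target)
    # Gram-Schmidt: remove the part of z_target along the new Y axis
    dot = sum(a * b for a, b in zip(z_target, y_axis))
    z_axis = _normalize([z - dot * y for z, y in zip(z_target, y_axis)])
    x_axis = _cross(y_axis, z_axis)  # x = y cross z keeps the frame right-handed
    return [x_axis, y_axis, z_axis]

def rotate(matrix, point):
    """Apply a 3x3 rotation matrix to a point."""
    return [sum(m * p for m, p in zip(row, point)) for row in matrix]

R = alignment_matrix((1, 0, 0), (0, 0, 2))
```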

exportBuriedVolumeContour(sphere_radius=10, vdw_scale=1, num_bins=100, seed=1234, num_points=2000000)

Export the buried volume contour for the complex.

Parameters:
  • sphere_radius (float) – The radius for the sphere to sample points in

  • vdw_scale (float) – The VdW scale factor to apply to VdW radii when checking to see if a point is “inside” an atom

  • num_bins (int) – The number of bins in each of the x and y directions to put the points in

  • seed (int) – Seed for random number generation

  • num_points (int) – the sample size of random points in the sphere

Return type:

str, str

Returns:

The paths to the contour PNG and CSV files

getProjectedBonds(struct)

Return the projections of each bond in the given structure onto the xy-plane.

Parameters:

struct (schrodinger.structure.Structure) – the structure

Return type:

list[ProjBondData]

Returns:

projections of bonds onto the xy-plane
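Projecting a bond onto the xy-plane amounts to dropping the z coordinate of each end. The sketch below assumes an orthographic projection, uses a namedtuple stand-in with the documented ProjBondData field order, and takes hypothetical plain-Python inputs instead of a Structure.

```python
from collections import namedtuple

# Stand-in with the documented field order of features.ProjBondData
ProjBondData = namedtuple('ProjBondData', ['atom_1', 'atom_2', 'xy_1', 'xy_2'])

def project_bonds(coords, bonds):
    """Project bonds onto the xy-plane by dropping each atom's z coordinate.

    coords maps atom index -> (x, y, z); bonds is a list of (index_1, index_2).
    """
    return [ProjBondData(i, j, coords[i][:2], coords[j][:2]) for i, j in bonds]

coords = {1: (0.0, 0.0, 1.0), 2: (1.5, 0.0, -0.5)}
projected = project_bonds(coords, [(1, 2)])
```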

addShadow(axes, struct)

Add the given structure’s shadow to the given contour axes.

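Parameters:
  • axes – the contour axes

  • struct (schrodinger.structure.Structure) – the structure whose shadow is added

plotContour(points, struct)

Plot a contour for the passed points. matplotlib uses triangulation to create a grid for the contour.

Parameters:
  • points – the points to plot

  • struct (schrodinger.structure.Structure) – the structure

getVectorizedDescriptors(jaguar_out_file)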

Return vectorized descriptors, that is, instance-specific descriptors that depend on, for example, the molecule’s geometric orientation or atom indexing.

Parameters:

jaguar_out_file (str or None) – the name of a Jaguar *.out file from which descriptors will be extracted or None if there isn’t one

Return type:

dict

Returns:

(label, data) pairs

getDescriptors(no_organometallic=False)

Return descriptors.

Parameters:

no_organometallic (bool) – Whether organometallic descriptors should be skipped

Return type:

dict

Returns:

(label, data) pairs

schrodinger.application.matsci.mlearn.features.get_unique_titles(sts)

Return a list of unique titles for the given structures.

Parameters:

sts (list) – contains schrodinger.structure.Structure

Return type:

list

Returns:

the unique titles
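Unique titles matter because several ComplexFeatures methods return dictionaries keyed by structure title, so duplicate titles would overwrite each other. The sketch below makes repeats unique by appending _2, _3, and so on; that suffix scheme is an assumption, and get_unique_titles may use a different one.

```python
def unique_titles(titles):
    """Return titles made unique by appending _2, _3, ... to repeats.

    The suffix scheme is hypothetical; get_unique_titles may differ.
    """
    used = set()
    result = []
    for title in titles:
        candidate, suffix = title, 1
        # Keep incrementing until the candidate clashes with nothing seen so far
        while candidate in used:
            suffix += 1
            candidate = f'{title}_{suffix}'
        used.add(candidate)
        result.append(candidate)
    return result

titles = unique_titles(['a', 'a', 'b', 'a'])
```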

class schrodinger.application.matsci.mlearn.features.ComplexFeatures(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, tpp=1, ligfilter=False, no_organometallic=False, nonmetallic_centers=(), canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)

Bases: BaseFeaturizer

Class to generate features for metal complexes.

__init__(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, tpp=1, ligfilter=False, no_organometallic=False, nonmetallic_centers=(), canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)

Create an instance.

Parameters:
  • jaguar (bool) – specify whether to calculate Jaguar features

  • jaguar_keywords (OrderedDict) – if Jaguar jobs must be run to calculate the Jaguar features then specify the Jaguar keywords here

  • tpp (int) – the number of threads for any Jaguar jobs

  • ligfilter (bool) – specify whether to calculate Ligfilter features

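  • no_organometallic (bool) – Whether organometallic descriptors should be skipped

  • nonmetallic_centers (tuple) – Tuple of nonmetallic elements to also consider when looking for the center atom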

  • canvas (bool) – specify whether to calculate Canvas features

  • moldescriptors (bool or list) – specify whether to calculate Molecular Descriptors features. If it’s a list, it contains command line arguments for moldescriptors

  • include_vectorized (bool) – whether to include instance-specific features that are vectorized, for example those depending on the molecule’s geometric orientation or atom indexing

  • save_files (bool) – Whether to save subjob files or not

  • logger (logging.Logger or None) – output logger or None if there isn’t one

runJaguar()

Run Jaguar on the given structures.

Return type:

list

Returns:

contains Jaguar *.out file names

getFeatures(structs, jaguar_out_files=None)

Return a features dictionary for the given structures.

Parameters:
  • structs (list(schrodinger.structure.Structure)) – list of structures to be featurized

  • jaguar_out_files (list or None) – if Jaguar features should be calculated using existing Jaguar *.out files then specify the files here using the same ordering as used for any given structures

verifyJaguarOutfiles()

Run Jaguar to obtain the *.out files if they have not been provided.

getComplexDescriptors()

Create a Complex object for each structure and get its descriptors.

Return type:

dict

Returns:

The descriptors from Complex for each structure

getJaguarDescriptors()

Return Jaguar descriptors for all structures. Sets Jaguar atom descriptors on structures.

Return type:

dict

Returns:

The Jaguar descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors

getUtilityDescriptors()

Get the requested utility descriptors for all structures

Return type:

dict

Returns:

The descriptor utility descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors

getDescriptorUtilityJob(descriptor_utility)

Get the job to run to generate the descriptors using the passed descriptor_utility for all structures

Parameters:

descriptor_utility (DescriptorUtility) – The descriptor utility to run to get the descriptors

Return type:

jobutils.RobustSubmissionJob

Returns:

The job to run to generate the descriptors

getExtraMolecularDescriptorsProps(st, descriptor_utility)

Return any extra structure properties computed using the output from molecular descriptors.

Parameters:
  • st (schrodinger.structure.Structure) – the structure output from molecular descriptors which has all output properties defined

  • descriptor_utility (DescriptorUtility) – the molecular descriptor utility containing the original job parameters

Return type:

dict

Returns:

(property name, value) pairs

processUtilityDescriptorOutputs(jobs_dict)

Read the descriptors for all descriptor utilities that were run, and return them

Parameters:

jobs_dict (dict) – Dictionary with DescriptorUtility as keys and jobs as values

Return type:

dict

Returns:

The descriptor utility descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors

getMolecularDescriptorsJob()

Get the job to run to generate molecular descriptors for all structures

Return type:

jobutils.RobustSubmissionJob

Returns:

The job to run to generate the descriptors

static writeFingerprintFiles(structs)

Write fingerprint files for the given structures.

Parameters:

structs (list(schrodinger.structure.Structure)) – list of structures to be fingerprinted

Return type:

list

Returns:

the fingerprint file names

log(msg, **kwargs)

Add a message to the log file

Parameters:

msg (str) – The message to log

Additional keyword arguments are passed to the textlogger.log_msg function

set_fit_request(*, data: Union[bool, None, str] = '$UNCHANGED$', data_y: Union[bool, None, str] = '$UNCHANGED$') → ComplexFeatures

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.

  • data_y (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data_y parameter in fit.

Returns:

self (object) – The updated object.

set_transform_request(*, data: Union[bool, None, str] = '$UNCHANGED$') → ComplexFeatures

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in transform.

Returns:

self (object) – The updated object.

class schrodinger.application.matsci.mlearn.features.CrystalNNFeatures(preset='ops')

Bases: object

Calculates CrystalNN structure fingerprints as implemented in pymatgen

OPS_PRESET = 'ops'
CN_PRESET = 'cn'
__init__(preset='ops')

Create a structure featurizer

Parameters:

preset (str) – One of OPS_PRESET or CN_PRESET class constants

featurize(struct)

Get CrystalNN fingerprints for the passed structure

Parameters:

struct (structure.Structure) – The structure to get features for

Return type:

list

Returns:

List of CrystalNN fingerprints for the structure