schrodinger.application.matsci.mlearn.features module

Classes and functions to deal with ML features.

Copyright Schrodinger, LLC. All rights reserved.

class schrodinger.application.matsci.mlearn.features.MomentData(flag, components, header, units)

Bases: tuple

components

Alias for field number 1

flag

Alias for field number 0

header

Alias for field number 2

units

Alias for field number 3
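
MomentData derives from tuple and documents each attribute as an alias for a field number, so it behaves like a collections.namedtuple. The sketch below rebuilds a stand-in with the documented field order to show how attribute and index access line up. The sample values are invented for illustration; real code should import the class from this module.

```python
from collections import namedtuple

# Stand-in with the documented field order; not the class shipped in the module.
MomentData = namedtuple('MomentData', ['flag', 'components', 'header', 'units'])

# Hypothetical values chosen only for illustration.
moment = MomentData(flag='dipole', components=('x', 'y', 'z'),
                    header='Dipole moment', units='debye')

# Attribute access and positional access refer to the same fields.
assert moment.flag == moment[0]
assert moment.components == moment[1]
assert moment.header == moment[2]
assert moment.units == moment[3]
```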

schrodinger.application.matsci.mlearn.features.DescriptorUtility

alias of schrodinger.application.matsci.mlearn.features.DescriptorUtilitity

class schrodinger.application.matsci.mlearn.features.ProjBondData(atom_1, atom_2, xy_1, xy_2)

Bases: tuple

atom_1

Alias for field number 0

atom_2

Alias for field number 1

xy_1

Alias for field number 2

xy_2

Alias for field number 3

schrodinger.application.matsci.mlearn.features.get_distance_cell(struct, cutoff)

Create an infrastructure Distance Cell. Struct MUST have the Chorus box properties.

Return type

schrodinger.structure.Structure, schrodinger.infra.structure.DistanceCell, schrodinger.infra.structure.PBC

Returns

Supercell, an infrastructure Distance Cell that accounts for the PBC, and the pbc used to create it.

Raises

ValueError if struct is missing PBCs
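
To illustrate what a distance query that accounts for the PBC has to do, here is a minimal pure-Python sketch of a neighbor search with the minimum-image convention. It assumes an orthorhombic box given as three edge lengths, which is a simplification: the real function works from the Chorus box properties through schrodinger.infra objects, and the helper names below are hypothetical.

```python
import math


def minimum_image_distance(p, q, box):
    """Distance between p and q in an orthorhombic periodic box with edge
    lengths `box`, using the minimum-image convention."""
    total = 0.0
    for a, b, length in zip(p, q, box):
        delta = (a - b) % length            # wrapped into [0, length)
        delta = min(delta, length - delta)  # nearest periodic image
        total += delta * delta
    return math.sqrt(total)


def neighbors_within(points, index, cutoff, box):
    """Indices of points within `cutoff` of points[index], excluding itself.
    Only valid when cutoff is at most half the shortest box edge."""
    origin = points[index]
    return [j for j, q in enumerate(points)
            if j != index and minimum_image_distance(origin, q, box) <= cutoff]


box = (10.0, 10.0, 10.0)
points = [(0.5, 0.5, 0.5), (9.5, 0.5, 0.5), (5.0, 5.0, 5.0)]
# The second point is a neighbor of the first only through the periodic boundary.
near = neighbors_within(points, 0, 4.0, box)
```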

schrodinger.application.matsci.mlearn.features.elemental_generator(struct, element, is_equal=True)

schrodinger.application.matsci.mlearn.features.get_anion(struct)

Get the most electronegative element in the structure (anion).

Parameters

struct (schrodinger.structure.Structure) – Input structure

Return type

str, float, int

Returns

Element, its electronegativity, number of anions in the cell
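
The selection rule can be sketched without the Schrodinger API. The example below assumes the structure is reduced to a list of element symbols and uses a small table of Pauling electronegativities; the helper name pick_anion and the table are illustrative, not part of this module.

```python
from collections import Counter

# Pauling electronegativities for a few elements (small illustrative table).
PAULING_ENEG = {'Li': 0.98, 'P': 2.19, 'S': 2.58, 'Cl': 3.16, 'O': 3.44, 'F': 3.98}


def pick_anion(elements):
    """Return (element, electronegativity, count) for the most
    electronegative element in a list of element symbols."""
    counts = Counter(elements)
    anion = max(counts, key=lambda symbol: PAULING_ENEG[symbol])
    return anion, PAULING_ENEG[anion], counts[anion]


# One Li3PS4 formula unit: sulfur is the most electronegative species present.
result = pick_anion(['Li'] * 3 + ['P'] + ['S'] * 4)
print(result)  # ('S', 2.58, 4)
```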

class schrodinger.application.matsci.mlearn.features.LatticeFeatures(features, element='Li', cutoff=4.0)

Bases: schrodinger.application.matsci.mlearn.base.BaseFeaturizer

Class to generate lattice-based features.

FEATURES = {'anionFrameCoordination': 'Anion frame coordination', 'avgAnionAnionShortDistance': 'Average anion anion shortest distance', 'avgAtomicVol': 'Average atomic volume', 'avgElementAnionShortDistance': 'Average cation anion shortest distance', 'avgElementNeighborCount': 'Average cation count', 'avgNeighborCount': 'Average neighbor count', 'avgNeighborIon': 'Average neighbor ionicity', 'avgShortDistance': 'Average cation cation shortest distance', 'avgSublatticeEneg': 'Average sublattice electronegativity', 'avgSublatticeNeighborCount': 'Average sublattice neighbor count', 'avgSublatticeNeighborIon': 'Average sublattice neighbor ionicity', 'packingFraction': 'Crystal packing fraction', 'pathWidth': 'Average straight-line path width', 'pathWidthEneg': 'Average straight-line path electronegativity', 'ratioCount': 'Ratio of average cation to sublattice count', 'ratioIonicity': 'Ratio of average cation to sublattice electronegativity', 'stdNeighborCount': 'Standard deviation of neighbor count', 'stdNeighborIon': 'Standard deviation of neighbor ionicity', 'sublatticePackingFraction': 'Sublattice packing fraction', 'volPerAnion': 'Volume per anion'}
__init__(features, element='Li', cutoff=4.0)

Initialize the object.

runFeature(feature)

Get result from a feature.

Parameters

feature (str) – One of the features listed in FEATURES.

Return type

int or float

Returns

Feature value

transform(structs)

Get numerical features from structures. Also sets feature names in self.labels. See parent class for more documentation.

Parameters

structs (list(schrodinger.structure.Structure)) – List of structures to be featurized

Return type

numpy array of shape [n_samples, n_features]

Returns

Transformed array
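
The FEATURES keys match the names of the feature methods documented below, which suggests a name-based dispatch. The toy class here sketches that pattern under stated assumptions: structures are plain lists of (x, y, z) tuples, the result is a list of rows rather than a numpy array, labels are taken from the FEATURES descriptions, and both feature methods are invented. It is not the LatticeFeatures implementation.

```python
class ToyFeaturizer:
    """Hypothetical stand-in that mimics the runFeature/transform pattern."""

    FEATURES = {'atomCount': 'Number of atoms',
                'avgCoordZ': 'Average z coordinate'}

    def __init__(self, features):
        unknown = [name for name in features if name not in self.FEATURES]
        if unknown:
            raise ValueError(f'Unknown features: {unknown}')
        self.features = list(features)
        self.labels = []
        self.struct = None

    def atomCount(self):
        return len(self.struct)

    def avgCoordZ(self):
        return sum(z for _, _, z in self.struct) / len(self.struct)

    def runFeature(self, feature):
        # Dispatch by name to the method that computes the feature.
        return getattr(self, feature)()

    def transform(self, structs):
        # One row per structure, one column per requested feature.
        self.labels = [self.FEATURES[name] for name in self.features]
        rows = []
        for struct in structs:
            self.struct = struct
            rows.append([self.runFeature(name) for name in self.features])
        return rows


featurizer = ToyFeaturizer(['atomCount', 'avgCoordZ'])
matrix = featurizer.transform([
    [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)],
    [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0), (0.0, 0.0, 3.0)],
])
```

Each row of matrix corresponds to one input structure and each column to one requested feature, matching the [n_samples, n_features] shape described above.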

avgAtomicVol()

Get average atomic volume.

Parameters

struct (schrodinger.structure.Structure) – Structure to be used for feature calculation

Return type

float

Returns

Average atomic volume (Angstrom^3)

avgNeighborCount()

Get average neighbor count.

Return type

float

Returns

Average neighbor count

stdNeighborCount()

Get standard deviation of neighbor count.

Return type

float

Returns

Standard deviation of neighbor count
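
As a rough illustration of the two neighbor-count statistics, the sketch below counts neighbors within a cutoff for plain Cartesian points and then takes the mean and standard deviation. It ignores periodic boundaries and uses the population standard deviation; both are assumptions, since the documentation does not state how the class computes them.

```python
import math
import statistics


def neighbor_counts(points, cutoff):
    """Number of other points within `cutoff` of each point (no PBC)."""
    return [sum(1 for j, q in enumerate(points)
                if i != j and math.dist(p, q) <= cutoff)
            for i, p in enumerate(points)]


# Three atoms on a line, 3 Angstrom apart, plus one isolated atom.
points = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
counts = neighbor_counts(points, cutoff=4.0)
avg_count = statistics.mean(counts)
std_count = statistics.pstdev(counts)  # population form; an assumption
```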

avgSublatticeEneg()

Get average sublattice electronegativity.

Return type

float

Returns

Average sublattice electronegativity

avgSublatticeNeighborCount()

Get average sublattice neighbor count.

Return type

float

Returns

Average sublattice neighbor count

avgNeighborIon()

Get average neighbor ionicity.

Return type

float

Returns

Average neighbor ionicity

stdNeighborIon()

Get standard deviation of neighbor ionicity.

Return type

float

Returns

Standard deviation of neighbor ionicity

avgSublatticeNeighborIon()

Get average sublattice neighbor ionicity.

Return type

float

Returns

Average sublattice neighbor ionicity

volPerAnion()

Get volume per anion.

Return type

float

Returns

Volume per anion

packingFraction(skip_element=None)

Get packing fraction of the crystal.

Parameters

skip_element (str) – Element to skip

Return type

float

Returns

Packing fraction
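
A packing fraction is the volume occupied by atomic spheres divided by the cell volume. The sketch below shows that arithmetic and one plausible reading of skip_element (atoms of that element are left out of the sum). The radii and cell volume are made-up numbers, and the radius definition used by the class (see effectiveRadius) is not reproduced here.

```python
import math


def packing_fraction(atoms, cell_volume, skip_element=None):
    """Fraction of `cell_volume` filled by spheres.

    `atoms` is a list of (element, radius) pairs; atoms whose element
    equals `skip_element` are ignored."""
    sphere_volume = sum(4.0 / 3.0 * math.pi * radius ** 3
                        for element, radius in atoms
                        if element != skip_element)
    return sphere_volume / cell_volume


# Hypothetical radii (Angstrom) and cell volume (Angstrom^3).
atoms = [('Li', 1.0), ('O', 2.0)]
full = packing_fraction(atoms, 100.0)
without_li = packing_fraction(atoms, 100.0, skip_element='Li')
```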

effectiveRadius(atom)

Get atom effective radius.

Parameters

atom (schrodinger.structure._StructureAtom) – Atom

Return type

float

Returns

Effective radius

sublatticePackingFraction()

Get packing fraction of the sublattice crystal.

Return type

float

Returns

Packing fraction

avgElementNeighborCount()

Get average element neighbor count.

Return type

float

Returns

Average number of bonds per element

avgAnionAnionShortDistance()

Get average anion anion shortest distance.

Return type

float

Returns

Average anion anion shortest distance

avgElementAnionShortDistance()

Get average element anion shortest distance.

Return type

float

Returns

Average element anion shortest distance

avgShortDistance()

Get average element element shortest distance.

Return type

float

Returns

Average element element shortest distance
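
The three shortest-distance features share one idea: for every atom in one set, find the nearest atom in another set, then average. A minimal sketch follows, assuming plain Cartesian points without periodic images; the function name and coordinates are hypothetical.

```python
import math


def avg_shortest_distance(sources, targets):
    """Mean over `sources` of the distance to the nearest point in `targets`.
    A point is never matched with itself when both lists share objects."""
    shortest = []
    for p in sources:
        distances = [math.dist(p, q) for q in targets if q is not p]
        shortest.append(min(distances))
    return sum(shortest) / len(shortest)


# Hypothetical coordinates in Angstrom.
cations = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (7.0, 0.0, 0.0)]
anions = [(0.0, 0.0, 2.0), (7.0, 0.0, 1.0)]

cation_cation = avg_shortest_distance(cations, cations)
cation_anion = avg_shortest_distance(cations, anions)
```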

anionFrameCoordination()

Get anion framework coordination.

Return type

float

Returns

Anion framework coordination

pathWidth(eval_eneg=False)

Evaluate average straight line path width. See the reference in the constructor for more info.

Parameters

eval_eneg (bool) – If True, return average over electronegativity, instead of distance

Return type

float

Returns

Average path width or electronegativity

pathWidthEneg()

Evaluate average straight line path electronegativity.

Return type

float

Returns

Average electronegativity along the path

ratioIonicity()

Get ratio ionicity.

Return type

float

Returns

Ratio ionicity

ratioCount()

Get ratio neighbor count.

Return type

float

Returns

Ratio neighbor count

set_fit_request(*, data: Union[bool, None, str] = '$UNCHANGED$', data_y: Union[bool, None, str] = '$UNCHANGED$') → schrodinger.application.matsci.mlearn.features.LatticeFeatures

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters
  • data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the data parameter in fit.

  • data_y (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the data_y parameter in fit.

Return type

object

Returns

The updated object.

set_transform_request(*, structs: Union[bool, None, str] = '$UNCHANGED$') → schrodinger.application.matsci.mlearn.features.LatticeFeatures

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

structs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the structs parameter in transform.

Return type

object

Returns

The updated object.

class schrodinger.application.matsci.mlearn.features.Ligand(st, metal_atom, new_to_old, coordination_idxs)

Bases: object

Manage a ligand.

__init__(st, metal_atom, new_to_old, coordination_idxs)

Create an instance.

Parameters
  • st (schrodinger.structure.Structure) – the structure

  • metal_atom (schrodinger.structure._StructureAtom) – the metal atom

  • new_to_old (dict) – the map of new indices (extracted ligand) to old indices (original structure)

  • coordination_idxs (list) – contains groups of indices (new indices) of coordinating atoms

getVec(point)

Return a vector pointing from the metal atom to the given point.

Parameters

point (numpy.array) – the point in Ang.

Return type

numpy.array

Returns

the vector in Ang.

getCentroid(st, idxs)

Return the centroid vector of the given coordination atom indices.

Return type

numpy.array

Returns

the centroid vector in Ang.

getCoordinationVec(st, idxs)

Return a coordination vector pointing from the metal atom to the centroid of the given coordination atom indices.

Return type

numpy.array

Returns

the coordination vector in Ang.
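
The three vector helpers above reduce to simple arithmetic: a centroid is the mean of the coordinating-atom positions, and a coordination vector is that centroid minus the metal position. The sketch below shows this with plain tuples; the function names and coordinates are hypothetical, whereas the real methods take a structure and atom indices and return numpy arrays.

```python
def centroid(points):
    """Arithmetic mean of a list of (x, y, z) points."""
    count = len(points)
    return tuple(sum(point[axis] for point in points) / count
                 for axis in range(3))


def vector_from(origin, point):
    """Vector pointing from `origin` to `point`."""
    return tuple(p - o for o, p in zip(origin, point))


# Hypothetical coordinates in Angstrom: a metal atom and two coordinating atoms.
metal_xyz = (1.0, 1.0, 1.0)
coordinating_xyz = [(2.0, 1.0, 1.0), (1.0, 3.0, 1.0)]

center = centroid(coordinating_xyz)
coordination_vec = vector_from(metal_xyz, center)
```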

getStoichiometry()

Return the stoichiometry.

Return type

str

Returns

the stoichiometry

getDenticity()

Return the denticity.

Return type

int

Returns

the denticity

getHapticity()

Return the hapticity.

Return type

int

Returns

the hapticity

getHapticCharacter()

Return the haptic character.

Return type

int

Returns

the haptic character

getBiteAngle()

Return the bite angle in degrees.

Return type

float or None

Returns

the bite angle in degrees
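
A bite angle is commonly defined as the donor-metal-donor angle of a chelating ligand. The sketch below computes that angle from three positions. It is an assumption that the method follows this definition, and returning None for ligands without exactly two coordination sites is only one plausible reading of the "float or None" return type.

```python
import math


def angle_deg(metal, donor_1, donor_2):
    """Angle donor_1-metal-donor_2 in degrees."""
    v1 = [d - m for d, m in zip(donor_1, metal)]
    v2 = [d - m for d, m in zip(donor_2, metal)]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def bite_angle(metal, donors):
    """Bite angle for exactly two donor positions, otherwise None."""
    if len(donors) != 2:
        return None
    return angle_deg(metal, donors[0], donors[1])


# Hypothetical positions in Angstrom: two donors at right angles to each other.
right_angle = bite_angle((0.0, 0.0, 0.0), [(2.0, 0.0, 0.0), (0.0, 2.0, 0.0)])
monodentate = bite_angle((0.0, 0.0, 0.0), [(2.0, 0.0, 0.0)])
```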

getAtomConeAngle(atom)

Return the cone angle for the given atom in degrees.

Parameters

atom (schrodinger.structure._StructureAtom) – the atom

Return type

float

Returns

the cone angle for the given atom in degrees

getConeAngle()

Return the cone angle in degrees.

Return type

float

Returns

the cone angle in degrees

getBondLength()

Return the bond length in Ang.

Return type

float

Returns

the bond length in Ang.

getDescriptors()

Return descriptors.

Return type

dict

Returns

(label, data) pairs

class schrodinger.application.matsci.mlearn.features.Complex(st, logger=None, nonmetallic_centers=())

Bases: object

Manage a complex.

BURIED_VOLUME_VDW_SCALE = 1.17
CONTOURS_DIR = 'contours'
__init__(st, logger=None, nonmetallic_centers=())

Create an instance.

Parameters
  • st (schrodinger.structure.Structure) – the structure

  • logger (logging.Logger or None) – output logger or None if there isn’t one

  • nonmetallic_centers (tuple) – Tuple of nonmetallic elements to also consider when looking for center atom

setMetalAtom()

Set the metal atom.

setLigands()

Set the ligands.

getBondAngle()

Return the bond angle in degrees.

Return type

float

Returns

the bond angle in degrees

getVDWSurfaceArea()

Return the VDW surface area in Angstrom^2.

Return type

float

Returns

the VDW surface area in Angstrom^2

getVDWVolume(vdw_scale=1, buffer_len=2)

Return the VDW volume in Angstrom^3.

Parameters
  • vdw_scale (float) – the VDW scale

  • buffer_len (float) – the shape buffer length in Angstrom

Return type

float

Returns

the VDW volume in Angstrom^3

getBuriedVolumeStructure(only_largest_ligands=False)

Return a copy of the structure without the metal atom. If only_largest_ligands is True, it will only contain the largest ligand or multiple copies thereof if it is symmetric.

Parameters

only_largest_ligands (bool) – Whether small ligands should be deleted

Return type

schrodinger.structure.Structure

Returns

the structure containing some or all ligands

getBuriedVDWVolumePct(struct, vdw_scale=1.17, sphere_quadrant=None, free_volume=False)

Return the buried VDW volume percent.

Parameters
  • struct (structure.Structure) – The structure to get buried volume for

  • vdw_scale (float) – the VDW scale

  • sphere_quadrant (None or str) – restrict sphere sampling to a quadrant specified as a key of amorphous.ORDINAL_DIRECTIONS

  • free_volume (bool) – use this option to return the free volume

Return type

float

Returns

the buried VDW volume percent
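
A buried volume percent can be estimated by Monte Carlo sampling: draw random points in a sphere around the metal position and count the fraction that falls inside any scaled VDW sphere. The sketch below shows that idea in pure Python. The function name, the atom representation and the sampling scheme are assumptions for illustration and may differ from what the method does internally.

```python
import math
import random


def buried_volume_pct(center, atoms, sphere_radius, vdw_scale=1.17,
                      num_points=20000, seed=1234, free_volume=False):
    """Percent of a sphere around `center` covered by scaled VDW spheres.

    `atoms` is a list of ((x, y, z), vdw_radius) pairs."""
    rng = random.Random(seed)
    inside = 0
    sampled = 0
    while sampled < num_points:
        # Rejection sampling: draw in the bounding cube, keep points in the sphere.
        offset = [rng.uniform(-sphere_radius, sphere_radius) for _ in range(3)]
        if math.sqrt(sum(o * o for o in offset)) > sphere_radius:
            continue
        sampled += 1
        point = [c + o for c, o in zip(center, offset)]
        if any(math.dist(point, xyz) <= vdw_scale * radius
               for xyz, radius in atoms):
            inside += 1
    buried = 100.0 * inside / sampled
    return 100.0 - buried if free_volume else buried


# One atom at the center whose radius is half the sampling radius:
# the exact answer is (1/2)**3 = 12.5 percent.
estimate = buried_volume_pct((0.0, 0.0, 0.0), [((0.0, 0.0, 0.0), 1.75)],
                             sphere_radius=3.5, vdw_scale=1.0)
```

The estimate converges to the exact value only statistically, so a fixed seed is used to keep the result reproducible.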

getAlignmentVectors()

Return two vectors that will be used to rotate the structure: the first is the vector to be aligned along the +Y-axis and the second is the vector to be aligned along the +Z-axis.

Return type

numpy.array, numpy.array

Returns

vectors to be aligned along the +Y-axis and +Z-axis

getRotatedComplex()

Return a copy of the complex that is rotated so that the smallest ligands are along the +Z-axis.

Return type

schrodinger.structure.Structure

Returns

copy of the structure rotated so that the smallest ligands are along the +Z-axis

exportBuriedVolumeContour(sphere_radius=10, vdw_scale=1, num_bins=100, seed=1234, num_points=2000000)

Export the buried volume contour for the complex

Parameters
  • sphere_radius (float) – The radius for the sphere to sample points in

  • vdw_scale (float) – The VdW scale factor to apply to VdW radii when checking to see if a point is “inside” an atom

  • num_bins (int) – The number of bins in x and y direction to put the points in

  • seed (int) – Seed for random number generation

  • num_points (int) – the sample size of random points in the sphere

Return type

str, str

Returns

The paths to contour png and csv files

getProjectedBonds(struct)

Return the projections of each bond in the given structure onto the xy-plane.

Parameters

struct (schrodinger.structure.Structure) – the structure

Return type

list[ProjBondData]

Returns

projections of bonds onto the xy-plane
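
Projecting a bond onto the xy-plane amounts to dropping the z coordinate of both end points. The sketch below pairs that with a stand-in for ProjBondData built from the documented field order. It assumes atoms are identified by integer indices and coordinates come from a plain dict, which may not match what the method stores in atom_1 and atom_2.

```python
from collections import namedtuple

# Stand-in with the documented field order; not the class shipped in the module.
ProjBondData = namedtuple('ProjBondData', ['atom_1', 'atom_2', 'xy_1', 'xy_2'])


def project_bonds(coords, bonds):
    """Project bonds onto the xy-plane.

    `coords` maps atom index -> (x, y, z); `bonds` is a list of index pairs."""
    return [ProjBondData(i, j, coords[i][:2], coords[j][:2]) for i, j in bonds]


# Hypothetical two-atom fragment.
coords = {1: (0.0, 0.0, 1.0), 2: (1.0, 2.0, 3.0)}
projected = project_bonds(coords, [(1, 2)])
```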

addShadow(axes, struct)

Add the given structure’s shadow to the given contour axes.

plotContour(points, struct)

Plot a contour for the passed points. matplotlib uses triangulation to create a grid for the contour.

getVectorizedDescriptors(jaguar_out_file)

Return vectorized descriptors, i.e. instance-specific descriptors that depend on, for example, the molecule’s geometric orientation or atom indexing.

Parameters

jaguar_out_file (str or None) – the name of a Jaguar *.out file from which descriptors will be extracted or None if there isn’t one

Return type

dict

Returns

(label, data) pairs

getDescriptors(no_organometallic=False)

Return descriptors.

Parameters

no_organometallic (bool) – Whether organometallic descriptors should be skipped

Return type

dict

Returns

(label, data) pairs

schrodinger.application.matsci.mlearn.features.get_unique_titles(sts)

Return a list of unique titles for the given structures.

Parameters

sts (list) – contains schrodinger.structure.Structure

Return type

list

Returns

the unique titles
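
The documentation does not say how duplicate titles are disambiguated. One common approach, sketched here as an assumption, is to append a numeric suffix to repeated titles; the suffix format and the helper name are hypothetical, and the sketch takes title strings rather than structures.

```python
def unique_titles(titles):
    """Return titles in input order, adding _2, _3, ... to repeats."""
    used = set()
    unique = []
    for title in titles:
        candidate, suffix = title, 1
        # Keep increasing the suffix until the candidate is unused.
        while candidate in used:
            suffix += 1
            candidate = f'{title}_{suffix}'
        used.add(candidate)
        unique.append(candidate)
    return unique


names = unique_titles(['mol', 'mol', 'lig', 'mol'])
print(names)  # ['mol', 'mol_2', 'lig', 'mol_3']
```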

class schrodinger.application.matsci.mlearn.features.ComplexFeatures(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, tpp=1, ligfilter=False, no_organometallic=False, nonmetallic_centers=(), canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)

Bases: schrodinger.application.matsci.mlearn.base.BaseFeaturizer

Class to generate features for metal complexes.

__init__(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, tpp=1, ligfilter=False, no_organometallic=False, nonmetallic_centers=(), canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)

Create an instance.

Parameters
  • jaguar (bool) – specify whether to calculate Jaguar features

  • jaguar_keywords (OrderedDict) – if Jaguar jobs must be run to calculate the Jaguar features then specify the Jaguar keywords here

  • tpp (int) – the number of threads for any Jaguar jobs

  • ligfilter (bool) – specify whether to calculate Ligfilter features

  • no_organometallic (bool) – Whether organometallic descriptors should be skipped

  • canvas (bool) – specify whether to calculate Canvas features

  • moldescriptors (bool or list) – specify whether to calculate Molecular Descriptors features. If it’s a list, it contains command line arguments for moldescriptors

  • include_vectorized (bool) – whether to include vectorized features, i.e. instance-specific features that depend on, for example, the molecule’s geometric orientation or atom indexing

  • save_files (bool) – Whether to save subjob files or not

  • logger (logging.Logger or None) – output logger or None if there isn’t one

runJaguar()

Run Jaguar on the given structures.

Return type

list

Returns

contains Jaguar *.out file names

getFeatures(structs, jaguar_out_files=None)

Return features dictionary for the given structures

Parameters
  • structs (list(schrodinger.structure.Structure)) – list of structures to be featurized

  • jaguar_out_files (list or None) – if Jaguar features should be calculated using existing Jaguar *.out files then specify the files here using the same ordering as used for any given structures

verifyJaguarOutfiles()

Run Jaguar and collect the *.out files if they have not been provided

getComplexDescriptors()

Create a Complex object for each structure and get their descriptors

Return type

dict

Returns

The descriptors from Complex for each structure

getJaguarDescriptors()

Return Jaguar descriptors for all structures. Sets Jaguar atom descriptors on structures.

Return type

dict

Returns

The Jaguar descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors

getUtilityDescriptors()

Get the requested utility descriptors for all structures

Return type

dict

Returns

The descriptor utility descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors

getDescriptorUtilityJob(descriptor_utility)

Get the job to run to generate the descriptors using the passed descriptor_utility for all structures

Parameters

descriptor_utility (DescriptorUtility) – The descriptor utility to run to get the descriptors

Return type

jobutils.RobustSubmissionJob

Returns

The job to run to generate the descriptors

getExtraMolecularDescriptorsProps(st, descriptor_utility)

Return any extra structure properties computed using the output from molecular descriptors.

Parameters
  • st (schrodinger.structure.Structure) – the structure output from molecular descriptors which has all output properties defined

  • descriptor_utility (DescriptorUtility) – the molecular descriptor utility containing the original job parameters

Return type

dict

Returns

(property name, value) pairs

processUtilityDescriptorOutputs(jobs_dict)

Read the descriptors for all descriptor utilities that were run, and return them

Parameters

jobs_dict (dict) – Dictionary with DescriptorUtility as keys and jobs as values

Return type

dict

Returns

The descriptor utility descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors

getMolecularDescriptorsJob()

Get the job to run to generate molecular descriptors for all structures

Return type

jobutils.RobustSubmissionJob

Returns

The job to run to generate the descriptors

static writeFingerprintFiles(structs)

Write fingerprint files for the given structures.

Parameters

structs (list(schrodinger.structure.Structure)) – list of structures to be fingerprinted

Return type

list

Returns

the fingerprint file names

log(msg, **kwargs)

Add a message to the log file

Parameters

msg (str) – The message to log

Additional keyword arguments are passed to the textlogger.log_msg function

set_fit_request(*, data: Union[bool, None, str] = '$UNCHANGED$', data_y: Union[bool, None, str] = '$UNCHANGED$') → schrodinger.application.matsci.mlearn.features.ComplexFeatures

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters
  • data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the data parameter in fit.

  • data_y (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the data_y parameter in fit.

Return type

object

Returns

The updated object.

set_transform_request(*, data: Union[bool, None, str] = '$UNCHANGED$') → schrodinger.application.matsci.mlearn.features.ComplexFeatures

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the data parameter in transform.

Return type

object

Returns

The updated object.

class schrodinger.application.matsci.mlearn.features.CrystalNNFeatures(preset='ops')

Bases: object

Calculates CrystalNN structure fingerprints as implemented in pymatgen

OPS_PRESET = 'ops'
CN_PRESET = 'cn'
__init__(preset='ops')

Create a structure featurizer

Parameters

preset (str) – One of OPS_PRESET or CN_PRESET class constants

featurize(struct)

Get CrystalNN fingerprints for the passed structure

Parameters

struct (structure.Structure) – The structure to get features for

Return type

list

Returns

List of CrystalNN fingerprints for the structure