schrodinger.application.matsci.mlearn.features module
Classes and functions to deal with ML features.
Copyright Schrodinger, LLC. All rights reserved.
- class schrodinger.application.matsci.mlearn.features.MomentData(flag, components, header, units)¶
- Bases: tuple
 - components
- Alias for field number 1 
 - flag¶
- Alias for field number 0 
 - header¶
- Alias for field number 2 
 - units¶
- Alias for field number 3 
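The field numbers above follow the order of the constructor arguments. A minimal sketch, using a stand-in namedtuple built locally with the same field names (it is not imported from the Schrodinger package, and the field values are made up), shows how index and attribute access line up:

```python
from collections import namedtuple

# Stand-in with the same field order as the documented MomentData tuple.
MomentData = namedtuple("MomentData", ["flag", "components", "header", "units"])

# Hypothetical values, chosen only to illustrate field access.
moment = MomentData(
    flag="dipole",
    components=("x", "y", "z"),
    header="Dipole moment",
    units="debye",
)

# Index access and attribute access refer to the same fields.
assert moment[0] == moment.flag
assert moment[1] == moment.components
assert moment[2] == moment.header
assert moment[3] == moment.units
```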
 
- schrodinger.application.matsci.mlearn.features.DescriptorUtility¶
- alias of schrodinger.application.matsci.mlearn.features.DescriptorUtilitity
- schrodinger.application.matsci.mlearn.features.get_distance_cell(struct, cutoff)¶
- Create an infrastructure distance cell. The structure must have the Chorus box properties set. - Parameters
- struct (schrodinger.structure.Structure) – Input structure
- cutoff (float) – The cutoff for finding nearest neighbor atoms 
 
- Return type
- schrodinger.structure.Structure, schrodinger.infra.structure.DistanceCell, schrodinger.infra.structure.PBC
- Returns
- The supercell, an infrastructure distance cell that accounts for the PBC, and the PBC used to create it. 
- Raises
- ValueError if struct is missing the PBC properties 
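To illustrate what a cutoff-based neighbor search under periodic boundary conditions involves, here is a minimal sketch in plain Python. It is not the module's implementation: the documented function returns a supercell together with a DistanceCell, whereas this sketch swaps in the minimum image convention for an orthorhombic box. The function names and coordinates are hypothetical, and the approach is only valid when the cutoff does not exceed half the shortest box length.

```python
import math

def minimum_image_distance(a, b, box):
    """Distance between points a and b in an orthorhombic periodic box."""
    total = 0.0
    for ai, bi, length in zip(a, b, box):
        delta = (bi - ai) % length          # wrap the separation into [0, length)
        delta = min(delta, length - delta)  # pick the nearer periodic image
        total += delta * delta
    return math.sqrt(total)

def neighbors_within(index, coords, box, cutoff):
    """Indices of atoms within cutoff of atom `index`, honoring the PBC."""
    return [
        j for j, xyz in enumerate(coords)
        if j != index
        and minimum_image_distance(coords[index], xyz, box) <= cutoff
    ]

# Hypothetical coordinates in a 10 x 10 x 10 Angstrom box.
coords = [(0.5, 0.5, 0.5), (9.5, 0.5, 0.5), (5.0, 5.0, 5.0)]
box = (10.0, 10.0, 10.0)
print(neighbors_within(0, coords, box, cutoff=4.0))  # [1]
```

Atoms 0 and 1 are 9 Angstrom apart inside the box but only 1 Angstrom apart across the periodic boundary, which is why atom 1 is reported as a neighbor.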
 
- schrodinger.application.matsci.mlearn.features.elemental_generator(struct, element, is_equal=True)¶
- schrodinger.application.matsci.mlearn.features.get_anion(struct)¶
- Get the most electronegative element in the structure (anion). - Parameters
- struct (schrodinger.structure.Structure) – Input structure
- Return type
- str, float, int 
- Returns
- Element, its electronegativity, and the number of anions in the cell 
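The selection logic can be sketched without the Schrodinger API. The helper below is hypothetical and works on a plain list of element symbols rather than a Structure; it also assumes the Pauling scale, which the documentation does not confirm.

```python
from collections import Counter

# Pauling electronegativities for a few elements (standard tabulated values).
PAULING = {"Li": 0.98, "P": 2.19, "S": 2.58, "O": 3.44, "F": 3.98}

def most_electronegative(elements):
    """Return (element, electronegativity, count) for the anion candidate."""
    counts = Counter(elements)
    anion = max(counts, key=lambda element: PAULING[element])
    return anion, PAULING[anion], counts[anion]

# Element symbols for one formula unit of Li3PS4.
print(most_electronegative(["Li", "Li", "Li", "P", "S", "S", "S", "S"]))
# ('S', 2.58, 4)
```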
 
- class schrodinger.application.matsci.mlearn.features.LatticeFeatures(features, element='Li', cutoff=4.0)¶
- Bases: schrodinger.application.matsci.mlearn.base.BaseFeaturizer
- Class to generate lattice-based features.
 - FEATURES = {'anionFrameCoordination': 'Anion frame coordination', 'avgAnionAnionShortDistance': 'Average anion anion shortest distance', 'avgAtomicVol': 'Average atomic volume', 'avgElementAnionShortDistance': 'Average cation anion shortest distance', 'avgElementNeighborCount': 'Average cation count', 'avgNeighborCount': 'Average neighbor count', 'avgNeighborIon': 'Average neighbor ionicity', 'avgShortDistance': 'Average cation cation shortest distance', 'avgSublatticeEneg': 'Average sublattice electronegativity', 'avgSublatticeNeighborCount': 'Average sublattice neighbor count', 'avgSublatticeNeighborIon': 'Average sublattice neighbor ionicity', 'packingFraction': 'Crystal packing fraction', 'pathWidth': 'Average straight-line path width', 'pathWidthEneg': 'Average straight-line path electronegativity', 'ratioCount': 'Ratio of average cation to sublattice count', 'ratioIonicity': 'Ratio of average cation to sublattice electronegativity', 'stdNeighborCount': 'Standard deviation of neighbor count', 'stdNeighborIon': 'Standard deviation of neighbor ionicity', 'sublatticePackingFraction': 'Sublattice packing fraction', 'volPerAnion': 'Volume per anion'}
 - __init__(features, element='Li', cutoff=4.0)¶
- Initialize the object. 
 - runFeature(feature)¶
- Get the result for a feature. - Parameters
- feature (str) – One of the features listed in FEATURES. 
- Return type
- int or float 
- Returns
- Feature value 
 
 - transform(structs)¶
- Get numerical features from structures. Also sets feature names in self.labels. See the parent class for more documentation. - Parameters
- structs (list(schrodinger.structure.Structure)) – List of structures to be featurized
- Return type
- numpy array of shape [n_samples, n_features] 
- Returns
- Transformed array 
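The keys of FEATURES match the names of the feature methods, which suggests that runFeature dispatches by name and that transform collects one row per structure. The toy class below sketches that pattern under those assumptions. It is a stand-in, not the Schrodinger BaseFeaturizer: the two features, the use of plain lists as "structures", and the choice of display strings for the labels are all hypothetical.

```python
class ToyFeaturizer:
    """Stand-in featurizer that dispatches feature names to methods."""

    FEATURES = {"atomCount": "Number of atoms", "meanCharge": "Mean charge"}

    def __init__(self, features):
        unknown = set(features) - set(self.FEATURES)
        if unknown:
            raise ValueError(f"Unknown features: {sorted(unknown)}")
        self.features = list(features)
        self.labels = []
        self.struct = None

    def runFeature(self, feature):
        # Look up the method that shares the feature's name and call it.
        return getattr(self, feature)()

    def transform(self, structs):
        rows = []
        for struct in structs:
            self.struct = struct  # feature methods read the current structure
            rows.append([self.runFeature(name) for name in self.features])
        self.labels = [self.FEATURES[name] for name in self.features]
        return rows  # n_samples rows, n_features columns

    def atomCount(self):
        return len(self.struct)

    def meanCharge(self):
        return sum(self.struct) / len(self.struct)

# Each toy "structure" is just a list of atomic charges.
featurizer = ToyFeaturizer(["atomCount", "meanCharge"])
matrix = featurizer.transform([[1.0, -1.0], [2.0, 1.0, 0.0]])
print(matrix)  # [[2, 0.0], [3, 1.0]]
```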
 
 - avgAtomicVol()¶
- Get average atomic volume. - Parameters
- struct (schrodinger.structure.Structure) – Structure to be used for feature calculation
- Return type
- float 
- Returns
- Average atomic volume (A^3) 
 
 - avgNeighborCount()¶
- Get average neighbor count. - Return type
- float 
- Returns
- Average neighbor count 
 
 - stdNeighborCount()¶
- Get standard deviation of neighbor count. - Return type
- float 
- Returns
- Standard deviation of neighbor count 
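As a small worked example of how the average and the standard deviation of neighbor counts differ, the sketch below uses made-up per-atom counts. Whether the module uses the population or the sample standard deviation is not stated; the population form is assumed here.

```python
import statistics

# Hypothetical neighbor counts, one entry per atom.
neighbor_counts = [4, 4, 6, 6]

average = statistics.mean(neighbor_counts)
spread = statistics.pstdev(neighbor_counts)  # population standard deviation

assert average == 5
assert abs(spread - 1.0) < 1e-12
```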
 
 - avgSublatticeEneg()¶
- Get average sublattice electronegativity. - Return type
- float 
- Returns
- Average sublattice electronegativity 
 
 - avgSublatticeNeighborCount()¶
- Get average sublattice neighbor count. - Return type
- float 
- Returns
- Average sublattice neighbor count 
 
 - avgNeighborIon()¶
- Get average neighbor ionicity. - Return type
- float 
- Returns
- Average neighbor ionicity 
 
 - stdNeighborIon()¶
- Get standard deviation of neighbor ionicity. - Return type
- float 
- Returns
- Standard deviation of neighbor ionicity 
 
 - avgSublatticeNeighborIon()¶
- Get average sublattice neighbor ionicity. - Return type
- float 
- Returns
- Average sublattice neighbor ionicity 
 
 - volPerAnion()¶
- Get volume per anion. - Return type
- float 
- Returns
- Volume per anion 
 
 - packingFraction(skip_element=None)¶
- Get packing fraction of the crystal. - Parameters
- skip_element (str) – Element to skip 
- Return type
- float 
- Returns
- Packing fraction 
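A packing fraction is conventionally the volume occupied by atomic spheres divided by the cell volume. The sketch below assumes that definition and takes radii as plain inputs; how the module derives its radii (see effectiveRadius) is not documented here, so the function and its arguments are hypothetical.

```python
import math

def packing_fraction(atoms, cell_volume, skip_element=None):
    """Fraction of the cell volume filled by atomic spheres.

    atoms: iterable of (element, radius in Angstrom) pairs.
    skip_element: element symbol to leave out of the occupied volume.
    """
    occupied = sum(
        4.0 / 3.0 * math.pi * radius ** 3
        for element, radius in atoms
        if element != skip_element
    )
    return occupied / cell_volume

# One touching sphere of radius 1 in a cubic cell of edge 2 (simple cubic).
fraction = packing_fraction([("X", 1.0)], cell_volume=8.0)
assert abs(fraction - math.pi / 6) < 1e-12
```

The simple cubic value of pi/6 (about 0.52) is a convenient sanity check for any implementation of this ratio.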
 
 - effectiveRadius(atom)¶
- Get atom effective radius. - Parameters
- atom (schrodinger.structure._StructureAtom) – Atom 
- Return type
- float 
- Returns
- Effective radius 
 
 - sublatticePackingFraction()¶
- Get packing fraction of the sublattice crystal. - Return type
- float 
- Returns
- Packing fraction 
 
 - avgElementNeighborCount()¶
- Get average element neighbor count. - Return type
- float 
- Returns
- Average number of bonds per element 
 
 - avgAnionAnionShortDistance()¶
- Get average anion anion shortest distance. - Return type
- float 
- Returns
- Average anion anion shortest distance 
 
 - avgElementAnionShortDistance()¶
- Get average element anion shortest distance. - Return type
- float 
- Returns
- Average element anion shortest distance 
 
 - avgShortDistance()¶
- Get average element element shortest distance. - Return type
- float 
- Returns
- Average element element shortest distance 
 
 - anionFrameCoordination()¶
- Get anion framework coordination. - Return type
- float 
- Returns
- Anion framework coordination 
 
 - pathWidth(eval_eneg=False)¶
- Evaluate average straight line path width. See the reference in the constructor for more info. - Parameters
- eval_eneg (bool) – If True, return average over electronegativity, instead of distance 
- Return type
- float 
- Returns
- Average path or electronegativity 
 
 - pathWidthEneg()¶
- Evaluate average straight line path electronegativity. - Return type
- float 
- Returns
- Average electronegativity along the path 
 
 - ratioIonicity()¶
- Get ratio ionicity. - Return type
- float 
- Returns
- Ratio ionicity 
 
 - ratioCount()¶
- Get ratio neighbor count. - Return type
- float 
- Returns
- Ratio neighbor count 
 
 
- class schrodinger.application.matsci.mlearn.features.Ligand(st, metal_atom, new_to_old, coordination_idxs)¶
- Bases: object
- Manage a ligand.
 - __init__(st, metal_atom, new_to_old, coordination_idxs)
- Create an instance. - Parameters
- st (schrodinger.structure.Structure) – the structure
- metal_atom (schrodinger.structure._StructureAtom) – the metal atom
- new_to_old (dict) – the map of new indices (extracted ligand) to old indices (original structure) 
- coordination_idxs (list) – contains groups of indices (new indices) of coordinating atoms 
 
 
 - getVec(point)¶
- Return a vector pointing from the metal atom to the given point. - Parameters
- point (numpy.array) – the point in Ang.
- Return type
- numpy.array
- Returns
- the vector in Ang. 
 
 - getCentroid(st, idxs)¶
- Return the centroid vector of the given coordination atom indices. - Parameters
- st (schrodinger.structure.Structure) – the structure
- idxs (list) – the coordination indices 
 
- Return type
- numpy.array
- Returns
- the centroid vector in Ang. 
 
 - getCoordinationVec(st, idxs)¶
- Return a coordination vector pointing from the metal atom to the centroid of the given coordination atom indices. - Parameters
- st (schrodinger.structure.Structure) – the structure
- idxs (list) – the coordination indices 
 
- Return type
- numpy.array
- Returns
- the coordination vector in Ang. 
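The geometry behind getCentroid and getCoordinationVec can be sketched with plain tuples. The helper names and coordinates below are hypothetical; the actual methods work on a Structure and atom indices and return numpy arrays.

```python
def centroid(points):
    """Arithmetic mean of a list of (x, y, z) points."""
    count = len(points)
    return tuple(sum(point[axis] for point in points) / count for axis in range(3))

def vector_from(origin, point):
    """Vector pointing from origin to point."""
    return tuple(p - o for o, p in zip(origin, point))

# Hypothetical metal position and four coordinating atoms, in Angstrom.
metal = (0.0, 0.0, 0.0)
ring = [(1.0, 0.0, 2.0), (-1.0, 0.0, 2.0), (0.0, 1.0, 2.0), (0.0, -1.0, 2.0)]

coordination_vec = vector_from(metal, centroid(ring))
print(coordination_vec)  # (0.0, 0.0, 2.0)
```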
 
 - getStoichiometry()¶
- Return the stoichiometry. - Return type
- str 
- Returns
- the stoichiometry 
 
 - getDenticity()¶
- Return the denticity. - Return type
- int 
- Returns
- the denticity 
 
 - getHapticity()¶
- Return the hapticity. - Return type
- int 
- Returns
- the hapticity 
 
 - getHapticCharacter()¶
- Return the haptic character. - Return type
- int 
- Returns
- the haptic character 
 
 - getBiteAngle()¶
- Return the bite angle in degrees. - Return type
- float or None 
- Returns
- the bite angle in degrees 
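A bite angle is commonly defined as the donor-metal-donor angle of a bidentate ligand. The sketch below assumes that definition and returns None when the ligand does not have exactly two donor positions, which is one plausible reading of the "float or None" return type; the function names are hypothetical.

```python
import math

def angle_deg(u, v):
    """Angle between vectors u and v in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / norms))  # guard against rounding drift
    return math.degrees(math.acos(cosine))

def bite_angle(metal, donors):
    """Donor-metal-donor angle, or None unless there are exactly two donors."""
    if len(donors) != 2:
        return None
    first = [d - m for d, m in zip(donors[0], metal)]
    second = [d - m for d, m in zip(donors[1], metal)]
    return angle_deg(first, second)

# Two donors at right angles around the metal give a 90 degree bite angle.
angle = bite_angle((0.0, 0.0, 0.0), [(2.0, 0.0, 0.0), (0.0, 2.0, 0.0)])
assert abs(angle - 90.0) < 1e-9
```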
 
 - getAtomConeAngle(atom)¶
- Return the cone angle for the given atom in degrees. - Parameters
- atom (schrodinger.structure._StructureAtom) – the atom
- Return type
- float 
- Returns
- the cone angle for the given atom in degrees 
 
 - getConeAngle()¶
- Return the cone angle in degrees. - Return type
- float 
- Returns
- the cone angle in degrees 
 
 - getBondLength()¶
- Return the bond length in Ang. - Return type
- float 
- Returns
- the bond length in Ang. 
 
 - getDescriptors()¶
- Return descriptors. - Return type
- dict 
- Returns
- (label, data) pairs 
 
 
- class schrodinger.application.matsci.mlearn.features.Complex(st, logger=None, nonmetallic_centers=())¶
- Bases: object
- Manage a complex.
 - BURIED_VOLUME_VDW_SCALE = 1.17
 - CONTOURS_DIR = 'contours'¶
 - __init__(st, logger=None, nonmetallic_centers=())¶
- Create an instance. - Parameters
- st (schrodinger.structure.Structure) – the structure
- logger (logging.Logger or None) – output logger or None if there isn’t one 
- nonmetallic_centers (tuple) – Tuple of nonmetallic elements to also consider when looking for center atom 
 
 
 - setMetalAtom()¶
- Set the metal atom. 
 - setLigands()¶
- Set the ligands. 
 - getBondAngle()¶
- Return the bond angle in degrees. - Return type
- float 
- Returns
- the bond angle in degrees 
 
 - getVDWSurfaceArea()¶
- Return the VDW surface area in Angstrom^2. - Return type
- float 
- Returns
- the VDW surface area in Angstrom^2 
 
 - getVDWVolume(vdw_scale=1, buffer_len=2)¶
- Return the VDW volume in Angstrom^3. - Parameters
- vdw_scale (float) – the VDW scale 
- buffer_len (float) – the shape buffer length in Angstrom 
 
- Return type
- float 
- Returns
- the VDW volume in Angstrom^3 
 
 - getBuriedVolumeStructure(only_largest_ligands=False)¶
- Return a copy of the structure without the metal atom. If only_largest_ligands is True, it will only contain the largest ligand or multiple copies thereof if it is symmetric. - Parameters
- only_largest_ligands (bool) – Whether small ligands should be deleted 
- Return type
- schrodinger.structure.Structure
- Returns
- the structure containing some or all ligands 
 
 - getBuriedVDWVolumePct(struct, vdw_scale=1.17, sphere_quadrant=None, free_volume=False)¶
- Return the buried VDW volume percent. - Parameters
- struct (structure.Structure) – The structure to get buried volume for 
- vdw_scale (float) – the VDW scale 
- sphere_quadrant (None or str) – restrict sphere sampling to a quadrant specified as a key of amorphous.ORDINAL_DIRECTIONS 
- free_volume (bool) – use this option to return the free volume 
 
- Return type
- float 
- Returns
- the buried VDW volume percent 
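Percent buried volume is usually estimated by sampling points in a sphere around the metal and counting those that fall inside a scaled VDW sphere of any ligand atom. The sketch below is a Monte Carlo version of that idea and is not the module's implementation. The defaults of 3.5 and 1.17 are taken from the signatures on this page; the function name, the n_points argument, and the sampling scheme are assumptions.

```python
import math
import random

def buried_volume_pct(center, atoms, sphere_radius=3.5, vdw_scale=1.17,
                      n_points=20000, seed=1234):
    """Estimate the percent of a sphere around `center` covered by atoms.

    atoms: iterable of ((x, y, z), vdw_radius) pairs, in Angstrom.
    """
    atoms = list(atoms)
    rng = random.Random(seed)
    buried = 0
    sampled = 0
    while sampled < n_points:
        # Rejection sampling: draw from the bounding cube, keep sphere points.
        offset = [rng.uniform(-sphere_radius, sphere_radius) for _ in range(3)]
        if sum(c * c for c in offset) > sphere_radius ** 2:
            continue
        sampled += 1
        point = [c + o for c, o in zip(center, offset)]
        if any(math.dist(point, xyz) <= vdw_scale * radius
               for xyz, radius in atoms):
            buried += 1
    return 100.0 * buried / n_points
```

Under this definition the free volume percent is simply 100 minus the buried percent, which may be what the free_volume option returns.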
 
 - getFreeVolumeVector()¶
- Return a unit vector pointing from the metal atom of the complex in the direction of free volume. - Return type
- numpy.array 
- Returns
- the free volume unit vector 
 
 - getRotatedComplex()¶
- Return a copy of the complex that is rotated so that the free volume vector points along the positive z-axis. - Return type
- structure.Structure
- Returns
- A rotated copy of the input structure 
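Aligning a vector with the positive z-axis can be done with a Rodrigues-style rotation matrix. The sketch below builds such a matrix in plain Python; it illustrates the geometry only and makes no claim about how the module performs the rotation. Both function names are hypothetical.

```python
import math

def rotation_to_z(vec):
    """Return a 3x3 rotation matrix that maps `vec` onto the +z direction."""
    norm = math.sqrt(sum(c * c for c in vec))
    ax, ay, az = (c / norm for c in vec)
    if az < -1.0 + 1e-12:
        # Antiparallel to +z: a half turn about the x-axis does the job.
        return [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
    # Unnormalized rotation axis is a x z = (ay, -ax, 0); cos(angle) = az.
    vx, vy = ay, -ax
    k = [[0.0, 0.0, vy], [0.0, 0.0, -vx], [-vy, vx, 0.0]]  # cross-product matrix
    k2 = [[sum(k[i][m] * k[m][j] for m in range(3)) for j in range(3)]
          for i in range(3)]
    factor = 1.0 / (1.0 + az)
    return [[(1.0 if i == j else 0.0) + k[i][j] + factor * k2[i][j]
             for j in range(3)] for i in range(3)]

def apply(matrix, vec):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(matrix[i][j] * vec[j] for j in range(3)) for i in range(3)]

# A vector of length 3 ends up along +z with its length preserved.
rotated = apply(rotation_to_z((1.0, 2.0, 2.0)), (1.0, 2.0, 2.0))
assert all(abs(a - b) < 1e-12 for a, b in zip(rotated, (0.0, 0.0, 3.0)))
```

Applying the same matrix to every atom position (relative to the metal atom) would rotate the whole complex rigidly.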
 
 - exportBuriedVolumeContour(sphere_radius=3.5, vdw_scale=1.17, num_bins=30, seed=1234)¶
- Export the buried volume contour for the complex - Parameters
- sphere_radius (float) – The radius for the sphere to sample points in 
- vdw_scale (float) – The VdW scale factor to apply to VdW radii when checking to see if a point is “inside” an atom 
- num_bins (int) – The number of bins in x and y direction to put the points in 
- seed (int) – Seed for random number generation 
 
- Return type
- str, str 
- Returns
- The paths to contour png and csv files 
 
 - plotContour(points)¶
- Plot a contour for the passed points. matplotlib uses triangulation to create a grid for the contour. - Parameters
- points (numpy.array) – The x, y, z values of points 
 
 - getVectorizedDescriptors(jaguar_out_file)¶
- Return vectorized descriptors which are instance specific descriptors that are vectorized, for example depending on the molecule’s geometric orientation, atom indexing, etc. - Parameters
- jaguar_out_file (str or None) – the name of a Jaguar *.out file from which descriptors will be extracted, or None if there isn’t one
- Return type
- dict 
- Returns
- (label, data) pairs 
 
 - getDescriptors(no_organometallic=False)¶
- Return descriptors. - Parameters
- no_organometallic (bool) – Whether organometallic descriptors should be skipped 
- Return type
- dict 
- Returns
- (label, data) pairs 
 
 
- schrodinger.application.matsci.mlearn.features.get_unique_titles(sts)¶
- Return a list of unique titles for the given structures. - Parameters
- sts (list) – contains - schrodinger.structure.Structure
- Return type
- list 
- Returns
- the unique titles 
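Several methods in this module return dictionaries keyed by structure title, so duplicate titles would collide. The documentation does not say how uniqueness is achieved; the sketch below assumes a numeric suffix is appended to repeated titles. The function name and suffix scheme are hypothetical, and the input is a list of title strings rather than Structure objects.

```python
def unique_titles(titles):
    """Return titles in order, adding a numeric suffix to repeats."""
    used = set()
    result = []
    for title in titles:
        candidate = title
        suffix = 1
        while candidate in used:
            suffix += 1
            candidate = f"{title}_{suffix}"
        used.add(candidate)
        result.append(candidate)
    return result

print(unique_titles(["mol", "mol", "lig"]))  # ['mol', 'mol_2', 'lig']
```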
 
- class schrodinger.application.matsci.mlearn.features.ComplexFeatures(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, tpp=1, ligfilter=False, no_organometallic=False, nonmetallic_centers=(), canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)¶
- Bases: schrodinger.application.matsci.mlearn.base.BaseFeaturizer
- Class to generate features for metal complexes.
 - __init__(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, tpp=1, ligfilter=False, no_organometallic=False, nonmetallic_centers=(), canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)
- Create an instance. - Parameters
- jaguar (bool) – specify whether to calculate Jaguar features 
- jaguar_keywords (OrderedDict) – if Jaguar jobs must be run to calculate the Jaguar features then specify the Jaguar keywords here 
- tpp (int) – the number of threads for any Jaguar jobs 
- ligfilter (bool) – specify whether to calculate Ligfilter features 
- no_organometallic (bool) – Whether organometallic descriptors should be skipped 
- canvas (bool) – specify whether to calculate Canvas features 
- moldescriptors (bool or list) – specify whether to calculate Molecular Descriptors features. If it’s a list, it contains command line arguments for moldescriptors 
- include_vectorized (bool) – whether to include instance specific features that are vectorized, for example depending on the molecule’s geometric orientation, atom indexing, etc. 
- save_files (bool) – Whether to save subjob files or not 
- logger (logging.Logger or None) – output logger or None if there isn’t one 
 
 
 - runJaguar()¶
- Run Jaguar on the given structures. - Return type
- list 
- Returns
- contains Jaguar *.out file names
 
 - getFeatures(structs, jaguar_out_files=None)¶
- Return features dictionary for the given structures - Parameters
- structs (list(schrodinger.structure.Structure)) – list of structures to be featurized
- jaguar_out_files (list or None) – if Jaguar features should be calculated using existing Jaguar *.out files, then specify the files here using the same ordering as used for the given structures
 
 
 - verifyJaguarOutfiles()¶
- Run Jaguar and collect the *.out files if none have been provided 
 - getComplexDescriptors()¶
- Create a Complex object for each structure and get their descriptors. - Return type
- dict 
- Returns
- The descriptors from Complex for each structure
 
 - getJaguarDescriptors()¶
- Return Jaguar descriptors for all structures. Sets Jaguar atom descriptors on structures. - Return type
- dict 
- Returns
- The jaguar descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors 
 
 - getUtilityDescriptors()¶
- Get the requested utility descriptors for all structures - Return type
- dict 
- Returns
- The descriptor utility descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors 
 
 - getDescriptorUtilityJob(descriptor_utility)¶
- Get the job to run to generate the descriptors using the passed descriptor_utility for all structures - Parameters
- descriptor_utility (DescriptorUtility) – The descriptor utility to run to get the descriptors 
- Return type
- jobutils.RobustSubmissionJob
- Returns
- The job to run to generate the descriptors 
 
 - getExtraMolecularDescriptorsProps(st, descriptor_utility)¶
- Return any extra structure properties computed using the output from molecular descriptors. - Parameters
- st (schrodinger.structure.Structure) – the structure output from molecular descriptors which has all output properties defined
- descriptor_utility (DescriptorUtility) – the molecular descriptor utility containing the original job parameters 
 
- Return type
- dict 
- Returns
- (property name, value) pairs 
 
 - processUtilityDescriptorOutputs(jobs_dict)¶
- Read the descriptors for all descriptor utilities that were run, and return them - Parameters
- jobs_dict (dict) – Dictionary with DescriptorUtility as keys and jobs as values
- Return type
- dict 
- Returns
- The descriptor utility descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors 
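The return shape described above (structure titles mapped to descriptor dictionaries) makes it straightforward to combine results from several sources. The sketch below shows one way to merge such dictionaries; the function, the titles, and the descriptor names are hypothetical and are not part of the module.

```python
def merge_descriptors(*sources):
    """Merge {title: {descriptor: value}} dictionaries, later sources winning."""
    merged = {}
    for source in sources:
        for title, descriptors in source.items():
            merged.setdefault(title, {}).update(descriptors)
    return merged

# Hypothetical outputs from two descriptor sources for the same structures.
complex_descriptors = {"cat_1": {"bond_angle": 92.5}, "cat_2": {"bond_angle": 88.1}}
utility_descriptors = {"cat_1": {"mol_weight": 410.2}}

print(merge_descriptors(complex_descriptors, utility_descriptors))
# {'cat_1': {'bond_angle': 92.5, 'mol_weight': 410.2}, 'cat_2': {'bond_angle': 88.1}}
```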
 
 - getMolecularDescriptorsJob()¶
- Get the job to run to generate molecular descriptors for all structures - Return type
- jobutils.RobustSubmissionJob
- Returns
- The job to run to generate the descriptors 
 
 - static writeFingerprintFiles(structs)¶
- Write fingerprint files for the given structures. - Parameters
- structs (list(schrodinger.structure.Structure)) – list of structures to be fingerprinted
- Return type
- list 
- Returns
- the fingerprint file names 
 
 - log(msg, **kwargs)¶
- Add a message to the log file - Parameters
- msg (str) – The message to log 
 - Additional keyword arguments are passed to the textlogger.log_msg function 
 
- class schrodinger.application.matsci.mlearn.features.CrystalNNFeatures(preset='ops')¶
- Bases: object
- Calculates CrystalNN structure fingerprints as implemented in pymatgen.
 - OPS_PRESET = 'ops'
 - CN_PRESET = 'cn'¶
 - __init__(preset='ops')¶
- Create a structure featurizer - Parameters
- preset (str) – One of the OPS_PRESET or CN_PRESET class constants
 
 - featurize(struct)¶
- Get CrystalNN fingerprints for the passed structure. - Parameters
- struct (structure.Structure) – The structure to get features for
- Return type
- list 
- Returns
- List of CrystalNN fingerprints for the structure