schrodinger.comparison.dedup_utils module

class schrodinger.comparison.dedup_utils.StructureClusterPair(structure, cluster, asu_pg_sym_ops)

Bases: tuple

asu_pg_sym_ops

Alias for field number 2

cluster

Alias for field number 1

structure

Alias for field number 0

class schrodinger.comparison.dedup_utils.CrystalData(st0, e, rdv, centroids)

Bases: tuple

centroids

Alias for field number 3

e

Alias for field number 1

rdv

Alias for field number 2

st0

Alias for field number 0
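Both classes above derive from tuple, and the "Alias for field number N" entries are the docstrings that collections.namedtuple generates, so each field can be read by name or by position. The following is a minimal sketch of that behaviour using a stand-in type with the same field order as CrystalData. The stand-in and its placeholder values are illustrative assumptions, not the module's actual objects.

```python
from collections import namedtuple

# Stand-in mirroring the documented field order (st0, e, rdv, centroids).
# The real class lives in schrodinger.comparison.dedup_utils.
CrystalData = namedtuple("CrystalData", ["st0", "e", "rdv", "centroids"])

# Placeholder values; the real fields hold Schrodinger objects and arrays.
record = CrystalData(st0="structure-placeholder", e=-12.5, rdv=None, centroids=[])

# Access by name and access by position read the same slot.
energy_by_name = record.e
energy_by_index = record[1]

# Tuples are immutable, so _replace returns a modified copy.
shifted = record._replace(e=record.e + 1.0)

# Unpacking follows the documented field order.
st0, e, rdv, centroids = record
```

Positional unpacking depends on the field order shown in the class signature, so access by name is the more robust choice if the order ever changes.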

schrodinger.comparison.dedup_utils.load_yaml(config_fname: str) Namespace

Return the configuration read from the YAML file config_fname. Any missing key is set to ‘’ (an empty string).
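The "missing key" behaviour can be illustrated without a YAML parser. The sketch below assumes the returned Namespace is an argparse.Namespace and that the set of expected keys is known in advance. The helper name namespace_with_defaults and the key names in the usage lines are hypothetical and do not come from the module.

```python
from argparse import Namespace


def namespace_with_defaults(config, expected_keys):
    """Build a Namespace in which every expected key missing from config is ''.

    Hypothetical helper: `config` stands for the dict a YAML parser would
    return, and `expected_keys` for the keys the caller relies on.
    """
    values = {key: "" for key in expected_keys}
    # Values present in the parsed configuration take precedence.
    values.update(config)
    return Namespace(**values)


# Usage: 'energy_key' is absent from the parsed config, so it becomes ''.
cfg = namespace_with_defaults(
    {"rmsd_thresh": 0.3},
    expected_keys=["rmsd_thresh", "energy_key"],
)
```

Because a missing key yields an empty string rather than raising AttributeError, callers can test `if cfg.energy_key:` to detect unset options.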

schrodinger.comparison.dedup_utils.crystals_are_duplicates(optres1, optres2, renumber_rmsd_thresh, n_thresh, rmsd_thresh, allow_reflection, matching_cutoff, skip_centroids_matching, n_nearby, zprime2: bool = False) bool

Evaluate if these two crystals are duplicates

Parameters:

n_nearby – max number of nearby centroids for alignment check

schrodinger.comparison.dedup_utils.compare_property_wrapper(r1: CrystalData, r2: CrystalData, prop_key: str, max_diff: float, dup_func: Callable)

Compares a property of the two structures before calling the more expensive duplicate-checking function.

Parameters:
  • prop_key – Key of the property to compare, stored in Structure.property

  • max_diff – Maximum allowed difference in the property for RMSDn to be computed

  • dup_func – Function of type Step.isClose

Returns:

True if the Structures are duplicates, False if they are unique.
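The pre-filter idea can be sketched as follows: compare a cheap scalar property first, and call the expensive check only when the values are close. This is an illustration under stated assumptions, not the module's implementation. It assumes the property is read from the st0 field of each CrystalData and that a difference above max_diff means "not duplicates". FakeStructure, compare_property_sketch, and expensive_check are hypothetical names.

```python
from collections import namedtuple

# Stand-ins for the real types (assumptions for illustration only).
CrystalData = namedtuple("CrystalData", ["st0", "e", "rdv", "centroids"])


class FakeStructure:
    """Minimal object exposing a dict-like `property` attribute."""

    def __init__(self, props):
        self.property = dict(props)


def compare_property_sketch(r1, r2, prop_key, max_diff, dup_func):
    """Return False early if the property differs by more than max_diff."""
    diff = abs(r1.st0.property[prop_key] - r2.st0.property[prop_key])
    if diff > max_diff:
        # Cheap rejection: the expensive comparison is never called.
        return False
    return dup_func(r1, r2)


calls = []


def expensive_check(a, b):
    """Placeholder for the costly RMSDn-based comparison; records each call."""
    calls.append((a, b))
    return True


VOLUME_KEY = "r_m_Unit_Cell_Volume/Ang.^3"
small = CrystalData(FakeStructure({VOLUME_KEY: 100.0}), 0.0, None, [])
similar = CrystalData(FakeStructure({VOLUME_KEY: 100.5}), 0.0, None, [])
large = CrystalData(FakeStructure({VOLUME_KEY: 140.0}), 0.0, None, [])

close_result = compare_property_sketch(small, similar, VOLUME_KEY, 1.0, expensive_check)
far_result = compare_property_sketch(small, large, VOLUME_KEY, 1.0, expensive_check)
```

In this sketch only the first comparison reaches expensive_check. The second pair is rejected on the volume difference alone.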

schrodinger.comparison.dedup_utils.deduplicate_crystals(sts: List[Structure], renumber_rmsd_thresh: float = 0.5, rmsd_thresh: float = 0.3, n_thresh: int = 20, energy_key: str = 'r_sani_energy', per_molecule_energy_key: Optional[str] = None, energy_units: str = 'hartree', bin_width: float = 1.0, energy_window: float = 1.7976931348623157e+308, compare_property_key: str = 'r_m_Unit_Cell_Volume/Ang.^3', compare_property_max_diff: Optional[float] = None, allow_reflection: bool = True, point_group_symmetry: bool = False, matching_cutoff: float = 2.0, skip_centroids_matching: bool = False, spherical_cluster_as_input: bool = False, n_nearby: int = 5, first_come_first_serve: bool = False, zprime=1, regenerate_unitcell=False) List[Structure]

Deduplicate a list of crystal structures.

Parameters:
  • sts – List of structures to deduplicate

  • renumber_rmsd_thresh – RMSD threshold to trigger ASU renumbering

  • rmsd_thresh – RMSD value to consider as duplicates

  • n_thresh – Number of matches required to consider as duplicate

  • energy_key – Property key that holds the energy of the supercell

  • per_molecule_energy_key – Property key that holds the energy per molecule. Note that specifying this will override energy_key; without it, the energy_key value is converted to per-molecule energy.

  • energy_units – Units for energy. Must be either ‘hartree’ or ‘kcal’

  • bin_width – Width of energy bins in bucket sort (kcal/mol/molecule)

  • energy_window – Energy window to keep structures (kcal/mol/molecule)

  • compare_property_key – Property for fast comparison before computing RMSDn

  • compare_property_max_diff – Maximum allowed absolute difference for property comparison. Default value of None does not perform the comparison.

  • allow_reflection – If True, allow both rotation and reflection when aligning structures. If False, only allow rotation.

  • point_group_symmetry – If True, explore point group symmetry operations to find the best RMSDn during deduplication. Suggested for crystals composed of molecules with point group symmetry.

  • matching_cutoff – Cutoff used to identify matched molecules. The centroids of matched molecules must be within this distance.

  • skip_centroids_matching – If True, perform RMSDn calculations without centroids matching

  • spherical_cluster_as_input – If True, use input files as spherical clusters directly

  • n_nearby – Max number of nearby centroids for alignment check. Must be between 4 and 9, inclusive.

  • first_come_first_serve – If True, when two duplicate crystals are found, the first one is preferred. If False, the lower energy one is saved.

  • zprime – The Z’ value of the crystal

  • regenerate_unitcell – If True, regenerate the unit cells from the ASU of the input structures
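The energy-related parameters (energy_units, bin_width, energy_window) suggest that structures are grouped into energy bins before pairwise comparison. The sketch below shows one plausible reading of that step. It is an assumption for illustration rather than the module's algorithm. The function bin_by_energy is hypothetical. It takes per-molecule energies, converts hartree to kcal/mol, discards entries outside the window above the lowest energy, and buckets the rest by bin_width.

```python
import sys

# Standard conversion factor: 1 hartree is about 627.509474 kcal/mol.
HARTREE_TO_KCAL = 627.509474


def bin_by_energy(energies, energy_units="hartree", bin_width=1.0,
                  energy_window=sys.float_info.max):
    """Map bin number -> indices of the energies that fall in that bin.

    Hypothetical helper: energies are per-molecule values; bins are counted
    from the lowest energy in steps of bin_width (kcal/mol/molecule).
    """
    if energy_units not in ("hartree", "kcal"):
        raise ValueError("energy_units must be 'hartree' or 'kcal'")
    if not energies:
        return {}
    scale = HARTREE_TO_KCAL if energy_units == "hartree" else 1.0
    kcal = [energy * scale for energy in energies]
    lowest = min(kcal)
    bins = {}
    for index, energy in enumerate(kcal):
        relative = energy - lowest
        if relative > energy_window:
            # Outside the energy window: dropped before any comparison.
            continue
        bins.setdefault(int(relative // bin_width), []).append(index)
    return bins


# Usage: four energies in kcal/mol, 1.0-wide bins, 3.0 kcal/mol window.
example_bins = bin_by_energy([0.0, 0.4, 1.2, 5.0], energy_units="kcal",
                             bin_width=1.0, energy_window=3.0)
```

The default window in the sketch is sys.float_info.max, which is the value 1.7976931348623157e+308 shown in the signature, so by default no structure is discarded on energy.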