schrodinger.comparison.dedup_utils module
- class schrodinger.comparison.dedup_utils.StructureClusterPair(structure, cluster, asu_pg_sym_ops)
Bases: tuple
- asu_pg_sym_ops
Alias for field number 2
- cluster
Alias for field number 1
- structure
Alias for field number 0
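Because the class is a tuple with named fields, an instance can be read by attribute, by index, or by unpacking. The sketch below uses a stand-in built with `collections.namedtuple` that mirrors the documented field order; it is not the library's own class, and the string placeholder values are hypothetical.

```python
from collections import namedtuple

# Stand-in mirroring the documented field order (not the library class itself).
StructureClusterPair = namedtuple(
    "StructureClusterPair", ["structure", "cluster", "asu_pg_sym_ops"])

# Placeholder values; real instances would hold structure objects
# and symmetry operations.
pair = StructureClusterPair("asu_structure", "spherical_cluster", [])

# Attribute access and positional access refer to the same fields.
assert pair.structure is pair[0]
assert pair.cluster is pair[1]
assert pair.asu_pg_sym_ops is pair[2]

# Tuple unpacking follows the field order.
structure, cluster, sym_ops = pair
```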
- class schrodinger.comparison.dedup_utils.CrystalData(st0, e, rdv, centroids)
Bases: tuple
- centroids
Alias for field number 3
- e
Alias for field number 1
- rdv
Alias for field number 2
- st0
Alias for field number 0
- schrodinger.comparison.dedup_utils.load_yaml(config_fname: str) → Namespace
Return the configuration loaded from config_fname. Any missing key is set to an empty string ('').
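The "missing key becomes an empty string" behaviour can be pictured with the sketch below. It is an illustration under assumptions, not the library's code: `to_namespace` and its `expected_keys` argument are hypothetical, `argparse.Namespace` is assumed as the return type, and a plain dict stands in for the parsed YAML file.

```python
from argparse import Namespace

def to_namespace(parsed, expected_keys):
    """Hypothetical helper: fill keys absent from a parsed config with ''."""
    values = {key: parsed.get(key, "") for key in expected_keys}
    return Namespace(**values)

# `parsed` stands in for the dict a YAML parser would return.
parsed = {"energy_key": "r_sani_energy"}
config = to_namespace(parsed, ["energy_key", "per_molecule_energy_key"])

# A missing key reads back as an empty string, which is falsy.
if not config.per_molecule_energy_key:
    print("per_molecule_energy_key not set; using", config.energy_key)
```

An empty string is falsy, so callers can test `if not config.some_key` without first checking whether the attribute exists.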
- schrodinger.comparison.dedup_utils.crystals_are_duplicates(optres1, optres2, renumber_rmsd_thresh, n_thresh, rmsd_thresh, allow_reflection, matching_cutoff, skip_centroids_matching, n_nearby, zprime2: bool = False) → bool
Evaluate whether the two crystals are duplicates.
- Parameters:
n_nearby – Max number of nearby centroids for the alignment check
- schrodinger.comparison.dedup_utils.compare_property_wrapper(r1: CrystalData, r2: CrystalData, prop_key: str, max_diff: float, dup_func: Callable)
Compare a property of the two structures before calling the more expensive duplicate-checking function.
- Parameters:
prop_key – Key of the property to compare, stored in Structure.property
max_diff – Maximum allowed difference for RMSDn to be computed
dup_func – Function of type Step.isClose
- Returns:
True if the structures are duplicates, False if they are unique
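The pre-filter pattern described here can be sketched as follows. This is a minimal illustration, not the library's implementation: `Record`, `expensive_check`, and `property_prefilter` are hypothetical stand-ins, and how the real wrapper reads the property from a CrystalData is not documented here.

```python
from collections import namedtuple
from functools import partial

# Stand-ins: a record with a property mapping, and a costly comparison.
Record = namedtuple("Record", ["properties"])
calls = []

def expensive_check(r1, r2):
    calls.append((r1, r2))  # record that the costly path ran
    return True

def property_prefilter(r1, r2, prop_key, max_diff, dup_func):
    """Return False early if the property differs by more than max_diff."""
    diff = abs(r1.properties[prop_key] - r2.properties[prop_key])
    if diff > max_diff:
        return False  # too different to be duplicates; dup_func is skipped
    return dup_func(r1, r2)

key = "r_m_Unit_Cell_Volume/Ang.^3"
a = Record({key: 1000.0})
b = Record({key: 1002.0})
c = Record({key: 1100.0})

# Bind the filter settings once so the result is a two-argument comparator.
is_close = partial(property_prefilter, prop_key=key, max_diff=5.0,
                   dup_func=expensive_check)
```

Binding the settings with `functools.partial` yields a comparator that takes only the two records, so it can be passed wherever a plain pairwise duplicate check is expected.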
- schrodinger.comparison.dedup_utils.deduplicate_crystals(sts: List[Structure], renumber_rmsd_thresh: float = 0.5, rmsd_thresh: float = 0.3, n_thresh: int = 20, energy_key: str = 'r_sani_energy', per_molecule_energy_key: Optional[str] = None, energy_units: str = 'hartree', bin_width: float = 1.0, energy_window: float = 1.7976931348623157e+308, compare_property_key: str = 'r_m_Unit_Cell_Volume/Ang.^3', compare_property_max_diff: Optional[float] = None, allow_reflection: bool = True, point_group_symmetry: bool = False, matching_cutoff: float = 2.0, skip_centroids_matching: bool = False, spherical_cluster_as_input: bool = False, n_nearby: int = 5, first_come_first_serve: bool = False, zprime=1, regenerate_unitcell=False) → List[Structure]
Deduplicate a list of crystal structures.
- Parameters:
sts – List of structures to deduplicate
renumber_rmsd_thresh – RMSD threshold to trigger ASU renumbering
rmsd_thresh – RMSD value to consider as duplicates
n_thresh – Number of matches required to consider as duplicate
energy_key – Property key that holds the energy of the supercell
per_molecule_energy_key – Property key that holds the energy per molecule. Note that specifying this will override energy_key; without it, the energy_key value is converted to per-molecule energy.
energy_units – Units for energy. Must be either ‘hartree’ or ‘kcal’
bin_width – Width of energy bins in bucket sort (kcal/mol/molecule)
energy_window – Energy window to keep structures (kcal/mol/molecule)
compare_property_key – Property for fast comparison before computing RMSDn
compare_property_max_diff – Maximum allowed absolute difference for property comparison. Default value of None does not perform the comparison.
allow_reflection – If True, allow both rotation and reflection when aligning structures. If False, only allow rotation.
point_group_symmetry – If True, explore point group symmetry operations to obtain the best RMSDn during deduplication. Suggested for crystals composed of molecules with point group symmetry.
matching_cutoff – Cutoff used to identify matched molecules. The centroids of matched molecules must be within this distance.
skip_centroids_matching – If True, perform RMSDn calculations without centroids matching
spherical_cluster_as_input – If True, use input files as spherical clusters directly
n_nearby – Max number of nearby centroids for alignment check. Must be between 4 and 9, inclusive.
first_come_first_serve – If True, when two duplicate crystals are found, the first one is preferred. If False, the lower energy one is saved.
zprime – The crystal Z′ value
regenerate_unitcell – If True, regenerate the unit cells from the ASU of the input structures
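The relationship between energy_units, bin_width, and energy_window can be illustrated with the sketch below. It is not the library's implementation: `bin_energies` is a hypothetical helper, and measuring the window from the lowest energy in the list is an assumption. The only outside fact used is the conversion factor of about 627.509 kcal/mol per hartree.

```python
import math

HARTREE_TO_KCAL = 627.509  # kcal/mol per hartree

def bin_energies(energies, units="hartree", bin_width=1.0,
                 energy_window=float("inf")):
    """Group per-molecule energies into bins of bin_width kcal/mol.

    Assumption: the window is measured from the lowest energy in the list.
    Returns {bin_index: [positions in the input list]}.
    """
    if units not in ("hartree", "kcal"):
        raise ValueError("units must be 'hartree' or 'kcal'")
    scale = HARTREE_TO_KCAL if units == "hartree" else 1.0
    kcal = [e * scale for e in energies]
    lowest = min(kcal)
    bins = {}
    for i, e in enumerate(kcal):
        rel = e - lowest
        if rel > energy_window:
            continue  # outside the window: dropped
        bins.setdefault(math.floor(rel / bin_width), []).append(i)
    return bins

# Relative energies are roughly 0.0, 0.4, 1.8 and 7.0 kcal/mol;
# the last one falls outside the 5.0 kcal/mol window.
bins = bin_energies([-10.0, -9.6, -8.2, -3.0], units="kcal",
                    bin_width=1.0, energy_window=5.0)
```

Binning by energy before the pairwise checks means only structures of similar energy need to go through the expensive RMSDn comparison, which is presumably why bin_width is exposed as a parameter.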