schrodinger.comparison.dedup_utils module
- class schrodinger.comparison.dedup_utils.SphericalClusterData(cluster: Structure, rdv: ndarray, centroids: LabeledCentroidCoords)
Bases: NamedTuple
A spherical cluster and cached information about the centroids of molecules in the cluster.
- rdv: ndarray
Alias for field number 1
- centroids: LabeledCentroidCoords
Alias for field number 2
- class schrodinger.comparison.dedup_utils.CrystalData(st0, e, pg_symmetry_ops, clusters)
Bases: NamedTuple
- e: float
Alias for field number 1
- pg_symmetry_ops: tuple[numpy.ndarray]
Alias for field number 2
- clusters: list[SphericalClusterData]
Alias for field number 3
- schrodinger.comparison.dedup_utils.load_yaml(config_fname: str) → Namespace
Return the configurations read from config_fname. Any missing key is set to an empty string ('').
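The "missing key is set to an empty string" behaviour can be illustrated with a minimal sketch. It does not reproduce load_yaml itself: the helper name namespace_with_defaults, the explicit list of expected keys, and the use of argparse.Namespace are all assumptions made for illustration, and the YAML parsing step is replaced by an already-loaded dictionary.

```python
from argparse import Namespace


def namespace_with_defaults(loaded, expected_keys):
    """Hypothetical helper: build a Namespace in which every expected key
    exists, using '' for any key that the loaded mapping does not provide."""
    values = {key: "" for key in expected_keys}
    values.update(loaded or {})  # values from the file win over the '' default
    return Namespace(**values)


# Stand-in for the dictionary a YAML parser would return.
loaded = {"rmsd_thresh": 0.3}
config = namespace_with_defaults(loaded, ["rmsd_thresh", "energy_key"])
```

Here config.rmsd_thresh holds the value from the file, while config.energy_key falls back to the empty string.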
- schrodinger.comparison.dedup_utils.get_potential_alignment(cluster_data_1: SphericalClusterData, cluster_data_2: SphericalClusterData, match_thresh: float, n_max: int = 5) → Iterator[ndarray]
Yield potential alignments between the two clusters.
- Parameters:
n_max – Maximum number of centroids used for the alignment check, in addition to the origin
- schrodinger.comparison.dedup_utils.crystals_are_duplicates(spherical_data_1: CrystalData, spherical_data_2: CrystalData, zprime2: bool, **kwargs) → bool
Wrapper for _crystals_are_duplicates to handle Z’>1.
The loop tries aligning spherical_data_1 clusters A, B, … (centered on the different repeated subunits) with spherical_data_2 cluster A’ only. These are all the matchings that need to be examined to test whether spherical_data_1 and spherical_data_2 are duplicates.
- schrodinger.comparison.dedup_utils.compare_property_wrapper(r1: CrystalData, r2: CrystalData, prop_key: str, max_diff: float, dup_func: Callable)
Compare a property of the two structures before calling the more expensive duplicate-checking function.
- Parameters:
prop_key – Key of the property to compare, stored in Structure.property
max_diff – Maximum allowed difference in the property for RMSDn to be computed
dup_func – Function of type Step.isClose
- Returns:
True if the structures are duplicates, False if they are unique
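The pre-filter pattern described here can be sketched as follows. This is an illustration under stated assumptions, not the module's code: Record is a hypothetical stand-in for CrystalData that carries only a property dictionary, and compare_property_first and expensive_check are illustrative names.

```python
from typing import Callable, NamedTuple


class Record(NamedTuple):
    """Hypothetical stand-in for CrystalData: only a property dict."""
    properties: dict


def compare_property_first(r1: Record, r2: Record, prop_key: str,
                           max_diff: float, dup_func: Callable) -> bool:
    """Return False early if the cheap property differs by more than
    max_diff; otherwise defer to the expensive duplicate check."""
    if abs(r1.properties[prop_key] - r2.properties[prop_key]) > max_diff:
        return False  # too different to be duplicates, skip dup_func
    return dup_func(r1, r2)


calls = []  # records every invocation of the expensive check


def expensive_check(a: Record, b: Record) -> bool:
    """Stand-in for an RMSDn-based comparison; always reports a match."""
    calls.append((a, b))
    return True


r1 = Record({"volume": 100.0})
r2 = Record({"volume": 100.4})
r3 = Record({"volume": 130.0})
near = compare_property_first(r1, r2, "volume", 1.0, expensive_check)
far = compare_property_first(r1, r3, "volume", 1.0, expensive_check)
```

Only the first pair reaches expensive_check; the second is rejected on the property difference alone.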
- schrodinger.comparison.dedup_utils.deduplicate_crystals(sts: List[Structure], renumber_rmsd_thresh: float = 0.5, rmsd_thresh: float = 0.3, n_thresh: int = 20, energy_key: str = 'r_sani_energy', per_molecule_energy_key: Optional[str] = None, energy_units: str = 'hartree', bin_width: float = 1.0, energy_window: float = 1.7976931348623157e+308, compare_property_key: str = 'r_m_Unit_Cell_Volume/Ang.^3', compare_property_max_diff: Optional[float] = None, allow_reflection: bool = True, point_group_symmetry: bool = False, matching_cutoff: float = 2.0, skip_centroids_matching: bool = False, spherical_cluster_as_input: bool = False, n_nearby: int = 5, first_come_first_serve: bool = False, zprime: int = 1, regenerate_unitcell=False) → List[Structure]
Deduplicate a list of crystal structures.
- Parameters:
sts – List of structures to deduplicate
renumber_rmsd_thresh – RMSD threshold to trigger ASU renumbering
rmsd_thresh – RMSD value to consider as duplicates
n_thresh – Number of matches required to consider as duplicate
energy_key – Property key that holds the energy of the supercell
per_molecule_energy_key – Property key that holds the energy per molecule. Note that specifying this will override energy_key; without it, the energy_key value is converted to per-molecule energy.
energy_units – Units for energy. Must be either ‘hartree’ or ‘kcal’
bin_width – Width of energy bins in bucket sort (kcal/mol/molecule)
energy_window – Energy window to keep structures (kcal/mol/molecule)
compare_property_key – Property for fast comparison before computing RMSDn
compare_property_max_diff – Maximum allowed absolute difference for property comparison. Default value of None does not perform the comparison.
allow_reflection – If True, allow both rotation and reflection when aligning structures. If False, only allow rotation.
point_group_symmetry – If True, explore point group symmetry operations to find the best RMSDn during deduplication. Suggested for crystals composed of molecules with point group symmetry.
matching_cutoff – Cutoff used to identify matched molecules. The centroids of matched molecules must be within this distance.
skip_centroids_matching – If True, perform RMSDn calculations without centroids matching
spherical_cluster_as_input – If True, use input files as spherical clusters directly
n_nearby – Max number of nearby centroids for alignment check. Must be between 4 and 9, inclusive.
first_come_first_serve – If True, when two duplicate crystals are found, the first one is preferred. If False, the lower energy one is saved.
zprime – The crystal Z’ value
regenerate_unitcell – If True, regenerate the unit cells from the ASU of the input
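How bin_width, energy_window and first_come_first_serve might interact can be shown with a small, self-contained sketch. It is not the implementation of deduplicate_crystals: the function dedup_by_energy, the (label, energy) tuples, the comparison against the same and adjacent energy bins, and the toy is_duplicate predicate are all assumptions chosen for illustration. Energies are taken to be per-molecule values already in kcal/mol.

```python
import math
from collections import defaultdict


def dedup_by_energy(items, is_duplicate, bin_width=1.0,
                    energy_window=math.inf, first_come_first_serve=False):
    """Illustrative dedup loop over (label, energy) pairs.

    Items further than energy_window above the lowest energy are dropped.
    Each remaining item is compared only with kept items in its own or an
    adjacent energy bin (an assumption of this sketch)."""
    if not items:
        return []
    e_min = min(energy for _, energy in items)
    candidates = [it for it in items if it[1] - e_min <= energy_window]
    if not first_come_first_serve:
        # Visiting in energy order means the lower-energy duplicate is kept.
        candidates = sorted(candidates, key=lambda it: it[1])
    bins = defaultdict(list)
    kept = []
    for item in candidates:
        idx = math.floor((item[1] - e_min) / bin_width)
        neighbours = bins[idx - 1] + bins[idx] + bins[idx + 1]
        if any(is_duplicate(item, other) for other in neighbours):
            continue  # a representative of this item is already kept
        bins[idx].append(item)
        kept.append(item)
    return kept


def same_family(x, y):
    """Toy predicate: labels sharing a first letter count as duplicates."""
    return x[0][0] == y[0][0]


items = [("a", 0.4), ("a_copy", 0.1), ("b", 0.2), ("far", 50.0)]
by_energy = dedup_by_energy(items, same_family)
by_order = dedup_by_energy(items, same_family, first_come_first_serve=True)
windowed = dedup_by_energy(items, same_family, energy_window=10.0)
```

With the default settings the lower-energy "a_copy" survives in place of "a"; with first_come_first_serve=True the earlier "a" is kept instead, and a finite energy_window removes "far".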