schrodinger.seam.io.sourceid module
Structure source identification for tracking where structures came from.
This module enables traceability by attaching source metadata to structures. File readers automatically set source IDs, so most users only need to retrieve them.
Usage
Get the source ID from a structure:
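from schrodinger.seam.io.sourceid import get_source_id
# st is a Structure obtained elsewhere, e.g. from a file reader.
# get_source_id() always returns a SourceID (never None), so no None check is needed.
source_id = get_source_id(st)
print(f"Structure came from: {source_id}")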
Source IDs are sortable and comparable:
source_ids = [get_source_id(st) for st in structures]
for sid in sorted(source_ids):
    print(sid)
Source ID Types
FileSourceID: Structures read from files (filename, index, file hash)
StructureSourceID: Fallback based on structure content hash
DerivedSourceID: Structures derived from other structures (tracks root and derivation path)
Extending
Custom source ID types can be created by subclassing SourceID and setting
the source_type class variable. Subclasses auto-register for decoding
via get_source_id().
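This page does not describe how registration is implemented. One common way for a base class to register its subclasses automatically is `__init_subclass__`. The sketch below is a stand-alone illustration under that assumption; it does not import the schrodinger package, and `SketchSourceID`, `VendorSourceID`, `_registry`, and `decode` are hypothetical names.

```python
from __future__ import annotations

from typing import Any, ClassVar


class SketchSourceID:
    """Hypothetical stand-in for SourceID; not the real base class."""

    source_type: ClassVar[str] = ""
    _registry: ClassVar[dict[str, type[SketchSourceID]]] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Register only subclasses that declare a non-empty source_type.
        if cls.source_type:
            SketchSourceID._registry[cls.source_type] = cls

    def toDict(self) -> dict[str, Any]:
        raise NotImplementedError

    @classmethod
    def fromDict(cls, data: dict[str, Any]) -> SketchSourceID:
        raise NotImplementedError


class VendorSourceID(SketchSourceID):
    """Hypothetical custom type for structures taken from a vendor catalog."""

    source_type = "vendor"

    def __init__(self, catalog_id: str) -> None:
        self.catalog_id = catalog_id

    def toDict(self) -> dict[str, Any]:
        return {"catalog_id": self.catalog_id}

    @classmethod
    def fromDict(cls, data: dict[str, Any]) -> VendorSourceID:
        return cls(data["catalog_id"])


def decode(source_type: str, data: dict[str, Any]) -> SketchSourceID:
    """Look up the class registered for source_type and rebuild the instance."""
    return SketchSourceID._registry[source_type].fromDict(data)


restored = decode("vendor", {"catalog_id": "ZINC-0001"})
print(type(restored).__name__, restored.catalog_id)
```

In this sketch, defining the subclass is enough to make it decodable; no separate registration call is needed.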
- class schrodinger.seam.io.sourceid.DerivationStep(method: str, variant_id: str, metadata: dict[str, Any] | None = None, tags: tuple[str, ...] = ())
Bases: NamedTuple
A single step in a derivation path.
- Parameters:
method – What transformation was applied (e.g., “LigPrep”, “GlideDock”)
variant_id – Unique identifier for this output variant
metadata – Optional extra context (JSON-serializable, not used for identity)
tags – Tags that were on the SourceID at this step when the next derivation was made
- method: str
Alias for field number 0
- variant_id: str
Alias for field number 1
- metadata: dict[str, Any] | None
Alias for field number 2
- tags: tuple[str, ...]
Alias for field number 3
Alias for field number 3
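Because DerivationStep is a NamedTuple, each field can be read by name or by position, which is what "Alias for field number N" refers to. The sketch below rebuilds a tuple with the same field names and defaults as the signature above, purely for illustration; `DerivationStepSketch` is a hypothetical stand-in, not the class from this module.

```python
from __future__ import annotations

from typing import Any, NamedTuple


class DerivationStepSketch(NamedTuple):
    """Hypothetical stand-in mirroring the documented field order."""

    method: str
    variant_id: str
    metadata: dict[str, Any] | None = None
    tags: tuple[str, ...] = ()


step = DerivationStepSketch("LigPrep", "variant-1", metadata={"pH": 7.0})

# Fields are reachable by name and by index.
print(step.method, step[0])

# Named tuples are immutable; _replace returns a modified copy.
tagged = step._replace(tags=("prepared",))
print(tagged.tags)
```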
- class schrodinger.seam.io.sourceid.SourceID
Bases: ABC
Abstract base class for structure source identifiers.
Each source ID represents where a structure came from and provides:
- Methods to encode/decode to/from structure properties
- Comparison and sorting support
- Human-readable string representation
Subclasses must define a source_type class variable to register themselves for automatic decoding via get_source_id().
- source_type: ClassVar[str] = ''
- abstract property sort_key: tuple
Return tuple for sorting within this source type.
The full comparison key is (source_type, sort_key).
- Returns:
Tuple of comparable values
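Placing source_type first in the comparison key means that sort_key tuples from different source types are never compared with each other, so each type is free to use its own tuple layout. A minimal stand-alone sketch of that ordering follows; `SketchID` is a hypothetical class, and using `functools.total_ordering` is an assumption about one way to implement it, not a statement about this module.

```python
from functools import total_ordering


@total_ordering
class SketchID:
    """Hypothetical ID ordered by (source_type, sort_key)."""

    def __init__(self, source_type: str, sort_key: tuple) -> None:
        self.source_type = source_type
        self.sort_key = sort_key

    def _full_key(self) -> tuple:
        return (self.source_type, self.sort_key)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, SketchID):
            return NotImplemented
        return self._full_key() == other._full_key()

    def __lt__(self, other: object) -> bool:
        if not isinstance(other, SketchID):
            return NotImplemented
        return self._full_key() < other._full_key()

    def __hash__(self) -> int:
        return hash(self._full_key())


ids = [
    SketchID("file", ("b.mae", 1)),
    SketchID("content", ("9f2c",)),
    SketchID("file", ("a.mae", 2)),
]

# IDs group by source_type first, then order by their type-specific key.
ordered = [(s.source_type, s.sort_key) for s in sorted(ids)]
print(ordered)
```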
- abstract toDict() → dict[str, Any]
Return dict representation for JSON serialization.
- Returns:
Dict with type-specific fields (not including source_type)
- abstract classmethod fromDict(data: dict[str, Any]) → SourceID
Create instance from dict representation.
- Parameters:
data – Dict with type-specific fields
- Returns:
SourceID instance
- addTag(tag: str) → None
Add a tag to this SourceID.
Tags are workflow metadata excluded from equality and hashing.
- Parameters:
tag – Tag string to add
- getTags() → frozenset[str]
Return the tags on this SourceID.
- Returns:
Frozenset of tag strings
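Excluding tags from equality and hashing means that tagging a source ID does not change which ID it is: a tagged and an untagged copy still compare equal and collapse to one entry in a set or dict. The stand-alone sketch below illustrates that behavior with a hypothetical `TaggedID` class; it is not the implementation used by SourceID.

```python
from __future__ import annotations


class TaggedID:
    """Hypothetical ID whose identity ignores its tags."""

    def __init__(self, value: str) -> None:
        self.value = value
        self._tags: set[str] = set()

    def addTag(self, tag: str) -> None:
        self._tags.add(tag)

    def getTags(self) -> frozenset[str]:
        return frozenset(self._tags)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, TaggedID):
            return NotImplemented
        # Only the identifying value takes part in equality.
        return self.value == other.value

    def __hash__(self) -> int:
        # The hash must agree with __eq__, so tags are left out here too.
        return hash(self.value)


plain = TaggedID("ligands.sdf:1")
tagged = TaggedID("ligands.sdf:1")
tagged.addTag("docked")

print(plain == tagged)
print(len({plain, tagged}))
print(tagged.getTags())
```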
- class schrodinger.seam.io.sourceid.FileSourceID(filename: str, index: int, file_hash: str, title: Optional[str] = None)
Bases: SourceID
Source ID for structures read from files.
Encodes file information including filename, index within file, a hash of the file metadata (name, size, mtime), and optionally the structure title.
- source_type: ClassVar[str] = 'file'
- __init__(filename: str, index: int, file_hash: str, title: Optional[str] = None)
Initialize FileSourceID.
- Parameters:
filename – Filename (basename only, not full path)
index – Index within file (1-based)
file_hash – Hash of file metadata
title – Structure title (optional)
- property sort_key: tuple
Return tuple for sorting within this source type.
The full comparison key is (source_type, sort_key).
- Returns:
Tuple of comparable values
- toDict() → dict[str, Any]
Return dict representation for JSON serialization.
- Returns:
Dict with type-specific fields (not including source_type)
- classmethod fromDict(data: dict[str, Any]) → FileSourceID
Create instance from dict representation.
- Parameters:
data – Dict with type-specific fields
- Returns:
SourceID instance
- classmethod from_file(filepath: str, index: int, title: Optional[str] = None) → FileSourceID
Create FileSourceID from file path and index.
Computes the file hash from the actual file.
- Parameters:
filepath – Full path to file
index – Index within file (1-based)
title – Structure title (optional)
- Returns:
FileSourceID instance
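The class description says the file hash covers file metadata (name, size, mtime) rather than file contents. The sketch below shows one way such a hash could be computed with the standard library. The hash algorithm (SHA-256), the field separator, and the helper name `sketch_file_hash` are assumptions made for illustration; the actual scheme used by from_file() is not documented here.

```python
import hashlib
import os
import tempfile


def sketch_file_hash(filepath: str) -> str:
    """Hash the basename, size, and modification time of a file (assumed scheme)."""
    stat = os.stat(filepath)
    payload = f"{os.path.basename(filepath)}|{stat.st_size}|{stat.st_mtime_ns}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "ligands.sdf")
    with open(path, "w") as fh:
        fh.write("placeholder contents\n")

    # An unchanged file hashes to the same value every time.
    first = sketch_file_hash(path)
    second = sketch_file_hash(path)

    # Appending data changes the size, so the metadata hash changes too.
    with open(path, "a") as fh:
        fh.write("more\n")
    changed = sketch_file_hash(path)

print(first == second, first == changed)
```

A metadata hash is cheap because the file is never read, at the cost of not noticing an edit that preserves name, size, and mtime.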
- class schrodinger.seam.io.sourceid.StructureSourceID(structure_hash: str)
Bases: SourceID
Source ID based on structure content itself.
Used as a fallback when no external source information is available. The hash is based on coordinates, elements, and connectivity.
- source_type: ClassVar[str] = 'content'
- __init__(structure_hash: str)
Initialize StructureSourceID.
- Parameters:
structure_hash – Hash of structure content
- property sort_key: tuple
Return tuple for sorting within this source type.
The full comparison key is (source_type, sort_key).
- Returns:
Tuple of comparable values
- toDict() → dict[str, Any]
Return dict representation for JSON serialization.
- Returns:
Dict with type-specific fields (not including source_type)
- classmethod fromDict(data: dict[str, Any]) → StructureSourceID
Create instance from dict representation.
- Parameters:
data – Dict with type-specific fields
- Returns:
SourceID instance
- classmethod from_structure(st: Structure) → StructureSourceID
Create StructureSourceID by computing hash from structure.
- Parameters:
st – Structure to hash
- Returns:
StructureSourceID instance
- class schrodinger.seam.io.sourceid.DerivedSourceID(root: SourceID, derivation_method: str, variant_id: str | None = None, metadata: dict[str, Any] | None = None)
Bases: SourceID
Source ID for structures derived from other structures.
Tracks the original source (root) and the derivation path. The root is always a non-DerivedSourceID (e.g., FileSourceID or StructureSourceID), so the ancestry is stored as one flat list of steps rather than as nested source IDs.
The derivation_path is a list of DerivationStep namedtuples, each with:
method: what transformation was applied (e.g., “LigPrep”, “enumeration”)
variant_id: unique identifier for this output (auto-generated if not provided)
metadata: optional flat JSON-serializable dict for extra context
- source_type: ClassVar[str] = 'derived'
- __init__(root: SourceID, derivation_method: str, variant_id: str | None = None, metadata: dict[str, Any] | None = None)
Initialize DerivedSourceID with a single derivation step.
- Parameters:
root – Original source (must not be a DerivedSourceID)
derivation_method – Name of the derivation (e.g., “LigPrep”)
variant_id – Identifier for this output (auto-generated UUID if None)
metadata – Optional metadata dict (JSON-serializable, max 1KB)
- Raises:
ValueError – If root is a DerivedSourceID or metadata validation fails
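The metadata constraints (JSON-serializable, at most 1KB) can be checked by serializing the dict and measuring the result. The sketch below is a stand-alone illustration: `validate_metadata` and `MAX_METADATA_BYTES` are hypothetical names, and reading "1KB" as 1024 bytes of UTF-8 encoded JSON is an assumption, since this page does not say how the size is measured.

```python
from __future__ import annotations

import json
from typing import Any

MAX_METADATA_BYTES = 1024  # assumed interpretation of "max 1KB"


def validate_metadata(metadata: dict[str, Any] | None) -> None:
    """Raise ValueError if metadata cannot be stored (assumed rules)."""
    if metadata is None:
        return
    try:
        encoded = json.dumps(metadata).encode("utf-8")
    except (TypeError, ValueError) as exc:
        # json.dumps raises TypeError for unsupported types such as set,
        # and ValueError for circular references.
        raise ValueError(f"metadata is not JSON-serializable: {exc}") from exc
    if len(encoded) > MAX_METADATA_BYTES:
        raise ValueError(
            f"metadata is {len(encoded)} bytes; limit is {MAX_METADATA_BYTES}"
        )


validate_metadata({"pH": 7.4, "force_field": "example"})  # passes silently

for bad in ({"atoms": {1, 2, 3}}, {"notes": "x" * 2000}):
    try:
        validate_metadata(bad)
    except ValueError as exc:
        print("rejected:", exc)
```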
- property sort_key: tuple
Return tuple for sorting within this source type.
The full comparison key is (source_type, sort_key).
- Returns:
Tuple of comparable values
- derive(derivation_method: str, variant_id: str | None = None, metadata: dict[str, Any] | None = None) → Self
Create a child DerivedSourceID with an additional derivation step.
- Parameters:
derivation_method – Name of the derivation (e.g., “LigPrep”, “GlideDock”)
variant_id – Identifier for this specific output (auto-generated UUID if None)
metadata – Optional metadata dict (must be JSON-serializable, max 1KB)
- Returns:
New DerivedSourceID with extended path
- Raises:
ValueError – If metadata validation fails
- classmethod deriveFrom(source: Structure | SourceID, derivation_method: str, variant_id: str | None = None, metadata: dict[str, Any] | None = None) → DerivedSourceID
Create a DerivedSourceID from a Structure or SourceID.
This is a convenience method that handles all source types uniformly:
If source is a Structure, extracts its SourceID (or creates a StructureSourceID if none exists)
If source is a DerivedSourceID, extends its derivation path
If source is any other SourceID, uses it as the root
- Parameters:
source – Structure or SourceID to derive from
derivation_method – Name of the derivation (e.g., “LigPrep”)
variant_id – Identifier for this output (auto-generated UUID if None)
metadata – Optional metadata dict (must be JSON-serializable, max 1KB)
- Returns:
New DerivedSourceID
- Raises:
ValueError – If metadata validation fails
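The dispatch described for deriveFrom() keeps the result flat: deriving from an already derived ID extends its path, while deriving from any other ID starts a new one-step path on that root. The stand-alone sketch below illustrates the two SourceID cases with hypothetical classes (`RootID`, `DerivedID`, `Step`) and a hypothetical `derive_from` function; the Structure case and metadata handling are left out, and none of this is the module's actual code.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from typing import NamedTuple


class Step(NamedTuple):
    """Hypothetical, reduced derivation step."""

    method: str
    variant_id: str


@dataclass(frozen=True)
class RootID:
    """Hypothetical non-derived source ID."""

    name: str


@dataclass(frozen=True)
class DerivedID:
    """Hypothetical derived ID: a non-derived root plus a flat path."""

    root: RootID
    path: tuple[Step, ...]


def derive_from(
    source: RootID | DerivedID, method: str, variant_id: str | None = None
) -> DerivedID:
    # Generate an identifier when the caller does not supply one.
    step = Step(method, variant_id if variant_id is not None else uuid.uuid4().hex)
    if isinstance(source, DerivedID):
        # Extend the existing path; the root stays the original source.
        return DerivedID(source.root, source.path + (step,))
    # Any other source becomes the root of a new one-step path.
    return DerivedID(source, (step,))


root = RootID("ligands.sdf:1")
prepared = derive_from(root, "LigPrep", "v1")
docked = derive_from(prepared, "GlideDock", "pose3")
print([s.method for s in docked.path], docked.root.name)
```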
- parent() → SourceID
Return the source ID of the parent structure.
- Returns:
Parent’s source ID (root if only one derivation step)
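With a flat path, the parent can be obtained by dropping the last step, and a one-step path has the root as its parent. A minimal stand-alone sketch of that rule follows, using hypothetical `RootID` and `DerivedID` classes with the path reduced to method names; it is an illustration of the documented return value, not the module's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RootID:
    """Hypothetical non-derived source ID."""

    name: str


@dataclass(frozen=True)
class DerivedID:
    """Hypothetical derived ID: a root plus a flat tuple of method names."""

    root: RootID
    path: tuple[str, ...]

    def parent(self) -> RootID | DerivedID:
        # With a single step, the parent is the root itself.
        if len(self.path) == 1:
            return self.root
        # Otherwise drop the last step to get the previous derived ID.
        return DerivedID(self.root, self.path[:-1])


root = RootID("ligands.sdf:1")
docked = DerivedID(root, ("LigPrep", "GlideDock"))
prepared = docked.parent()
print(prepared.path, prepared.parent() is root)
```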
- getParentWithTag(tag: str) → SourceID
Find an ancestor with the given tag.
Walks the derivation path in reverse looking for a step whose tags contain the given tag. Returns the reconstructed ancestor SourceID at that position with its tags restored.
- Parameters:
tag – Tag to search for
- Returns:
Ancestor SourceID with the tag
- Raises:
ValueError – If no ancestor with the tag is found
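The reverse walk can be sketched on a bare path of steps: scan from the newest step to the oldest, and on the first match keep the path prefix ending at that step, which is what an ancestor at that position would carry. The code below is a stand-alone illustration with hypothetical names (`Step`, `ancestor_path_with_tag`); it returns the path prefix rather than a reconstructed SourceID, and whether tags on the root are also searched is not stated on this page.

```python
from __future__ import annotations

from typing import NamedTuple


class Step(NamedTuple):
    """Hypothetical, reduced derivation step with recorded tags."""

    method: str
    variant_id: str
    tags: tuple[str, ...] = ()


def ancestor_path_with_tag(path: tuple[Step, ...], tag: str) -> tuple[Step, ...]:
    """Return the prefix of path ending at the most recent step carrying tag."""
    for position in range(len(path) - 1, -1, -1):
        if tag in path[position].tags:
            return path[: position + 1]
    raise ValueError(f"no ancestor with tag {tag!r}")


path = (
    Step("LigPrep", "v1", tags=("prepared",)),
    Step("GlideDock", "pose3", tags=("docked",)),
    Step("Refine", "r1"),
)

prepared = ancestor_path_with_tag(path, "prepared")
docked = ancestor_path_with_tag(path, "docked")
print([s.method for s in prepared], [s.method for s in docked])
```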
- prettyPrint() → str
Return a multi-line formatted ancestry path.
- Returns:
Human-readable ancestry chain
- toDict() → dict[str, Any]
Return dict representation for JSON serialization.
- Returns:
Dict with type-specific fields (not including source_type)
- classmethod fromDict(data: dict[str, Any]) → Self
Create instance from dict representation.
- Parameters:
data – Dict with type-specific fields
- Returns:
SourceID instance
- schrodinger.seam.io.sourceid.get_source_id(st: Structure) → SourceID
Get source ID from a structure.
Reads the PROP_SOURCE_TYPE property to determine which SourceID subclass to use for decoding. If no source ID is present, computes and returns a StructureSourceID based on the structure’s content.
- Parameters:
st – Structure to get source ID from
- Returns:
SourceID instance (always returns a value; never None)
- schrodinger.seam.io.sourceid.get_root_source_id(source: Structure | SourceID) → SourceID
Get the root source ID from a Structure or SourceID.
For derived sources, returns the ultimate root (the original non-derived source). For non-derived sources, returns the source itself.
- Parameters:
source – Structure or SourceID to get root from
- Returns:
Root SourceID (never a DerivedSourceID)