schrodinger.seam.io.sourceid module

Structure source identification for tracking where structures came from.

This module enables traceability by attaching source metadata to structures. File readers automatically set source IDs, so most users only need to retrieve them.

Usage

Get the source ID from a structure:

from schrodinger.seam.io.sourceid import get_source_id

source_id = get_source_id(st)
# get_source_id() never returns None; it falls back to a content-based StructureSourceID
print(f"Structure came from: {source_id}")

Source IDs are comparable and sortable. They are ordered first by source_type and then by each type's sort_key:

source_ids = [get_source_id(st) for st in structures]
for sid in sorted(source_ids):
    print(sid)

Source ID Types

  • FileSourceID: Structures read from files (filename, index, file hash)

  • StructureSourceID: Fallback based on structure content hash

  • DerivedSourceID: Structures derived from other structures (tracks root and derivation path)

Extending

Custom source ID types can be created by subclassing SourceID, setting the source_type class variable, and implementing the abstract sort_key property and the toDict() and fromDict() methods. Subclasses auto-register for decoding via get_source_id().

class schrodinger.seam.io.sourceid.DerivationStep(method: str, variant_id: str, metadata: dict[str, Any] | None = None, tags: tuple[str, ...] = ())

Bases: NamedTuple

A single step in a derivation path.

Parameters:
  • method – What transformation was applied (e.g., “LigPrep”, “GlideDock”)

  • variant_id – Unique identifier for this output variant

  • metadata – Optional extra context (JSON-serializable, not used for identity)

  • tags – Tags on the SourceID at this point when the next derivation happened

method: str

Alias for field number 0

variant_id: str

Alias for field number 1

metadata: dict[str, Any] | None

Alias for field number 2

tags: tuple[str, ...]

Alias for field number 3

class schrodinger.seam.io.sourceid.SourceID

Bases: ABC

Abstract base class for structure source identifiers.

Each source ID represents where a structure came from and provides:

  • Methods to encode/decode to/from structure properties

  • Comparison and sorting support

  • Human-readable string representation

Subclasses must define a source_type class variable to register themselves for automatic decoding via get_source_id().

source_type: ClassVar[str] = ''

abstract property sort_key: tuple

Return tuple for sorting within this source type.

The full comparison key is (source_type, sort_key).

Returns:

Tuple of comparable values
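The following minimal sketch shows what a full comparison key of `(source_type, sort_key)` implies. It uses hypothetical `FileID` and `ContentID` classes. The per-type sort keys here are assumptions, not the library's actual keys: filename then index for files, and a hash string for content. Putting the type name first means IDs of different types are never compared field by field. Within one type, an integer index compares numerically, so entry 2 sorts before entry 10.

```python
from functools import total_ordering


@total_ordering
class SortableID:
    """Hypothetical base class ordered by (source_type, sort_key)."""

    source_type = ""

    def __init__(self, *key):
        self._key = tuple(key)

    @property
    def sort_key(self) -> tuple:
        return self._key

    def _full_key(self) -> tuple:
        # Type name first, so IDs of different types never compare field by field.
        return (self.source_type, self.sort_key)

    def __eq__(self, other):
        if not isinstance(other, SortableID):
            return NotImplemented
        return self._full_key() == other._full_key()

    def __lt__(self, other):
        if not isinstance(other, SortableID):
            return NotImplemented
        return self._full_key() < other._full_key()

    def __hash__(self):
        return hash(self._full_key())


class FileID(SortableID):
    source_type = "file"  # assumed sort_key: (filename, index)


class ContentID(SortableID):
    source_type = "content"  # assumed sort_key: (structure_hash,)


ids = [FileID("b.sdf", 2), ContentID("9f3a"), FileID("a.sdf", 10), FileID("a.sdf", 2)]
ordered = sorted(ids)
```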

abstract toDict() → dict[str, Any]

Return dict representation for JSON serialization.

Returns:

Dict with type-specific fields (not including source_type)

abstract classmethod fromDict(data: dict[str, Any]) → SourceID

Create instance from dict representation.

Parameters:

data – Dict with type-specific fields

Returns:

SourceID instance

addTag(tag: str) → None

Add a tag to this SourceID.

Tags are workflow metadata excluded from equality and hashing.

Parameters:

tag – Tag string to add

getTags() → frozenset[str]

Return the tags on this SourceID.

Returns:

Frozenset of tag strings
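The following minimal sketch models the documented tag semantics with a hypothetical `TaggedID` class. Tags are mutable workflow annotations. They are left out of `__eq__` and `__hash__`, so adding one changes neither an ID's identity nor its position in a set or dict.

```python
class TaggedID:
    """Hypothetical ID whose identity ignores tags, as documented for SourceID."""

    def __init__(self, key: str):
        self.key = key
        self._tags: set[str] = set()

    def addTag(self, tag: str) -> None:
        self._tags.add(tag)

    def getTags(self) -> frozenset[str]:
        # Return an immutable copy so callers cannot edit the internal set.
        return frozenset(self._tags)

    def __eq__(self, other):
        if not isinstance(other, TaggedID):
            return NotImplemented
        return self.key == other.key  # tags deliberately ignored

    def __hash__(self):
        return hash(self.key)  # tags deliberately ignored


a = TaggedID("ligands.sdf#1")
b = TaggedID("ligands.sdf#1")
seen = {a}
a.addTag("docked")  # mutating tags leaves the hash, and set membership, intact
```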

class schrodinger.seam.io.sourceid.FileSourceID(filename: str, index: int, file_hash: str, title: Optional[str] = None)

Bases: SourceID

Source ID for structures read from files.

Encodes file information including filename, index within file, a hash of the file metadata (name, size, mtime), and optionally the structure title.

source_type: ClassVar[str] = 'file'

__init__(filename: str, index: int, file_hash: str, title: Optional[str] = None)

Initialize FileSourceID.

Parameters:
  • filename – Filename (basename only, not full path)

  • index – Index within file (1-based)

  • file_hash – Hash of file metadata

  • title – Structure title (optional)

property sort_key: tuple

Return tuple for sorting within this source type.

The full comparison key is (source_type, sort_key).

Returns:

Tuple of comparable values

toDict() → dict[str, Any]

Return dict representation for JSON serialization.

Returns:

Dict with type-specific fields (not including source_type)

classmethod fromDict(data: dict[str, Any]) → FileSourceID

Create instance from dict representation.

Parameters:

data – Dict with type-specific fields

Returns:

FileSourceID instance

classmethod from_file(filepath: str, index: int, title: Optional[str] = None) → FileSourceID

Create FileSourceID from file path and index.

Computes the file hash from the actual file.

Parameters:
  • filepath – Full path to file

  • index – Index within file (1-based)

  • title – Structure title (optional)

Returns:

FileSourceID instance
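According to the description, `file_hash` covers file metadata (name, size, mtime) rather than file contents. Below is a self-contained sketch of such a hash. The helper names, the SHA-256 digest, and the exact field encoding are assumptions. Hashing metadata makes large files cheap to identify. The trade-off is that rewriting or touching a file changes its hash even if the structures inside are unchanged.

```python
import hashlib
import os
import tempfile


def file_metadata_hash(filepath: str) -> str:
    """Hash a file's basename, size and modification time, not its contents.

    The digest (SHA-256) and the field encoding are assumptions.
    """
    stat = os.stat(filepath)
    payload = f"{os.path.basename(filepath)}|{stat.st_size}|{stat.st_mtime_ns}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def file_source_fields(filepath: str, index: int) -> tuple[str, int, str]:
    """Hypothetical helper: the (filename, index, file_hash) triple for one entry."""
    if index < 1:
        raise ValueError("index is 1-based")
    return os.path.basename(filepath), index, file_metadata_hash(filepath)


# Demo on a throwaway file: appending data changes the size, so the hash changes.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ligands.sdf")
    with open(path, "w") as fh:
        fh.write("$$$$\n")
    first = file_source_fields(path, 1)
    with open(path, "a") as fh:
        fh.write("$$$$\n")
    second = file_source_fields(path, 1)
```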

class schrodinger.seam.io.sourceid.StructureSourceID(structure_hash: str)

Bases: SourceID

Source ID based on structure content itself.

Used as a fallback when no external source information is available. The hash is based on coordinates, elements, and connectivity.

source_type: ClassVar[str] = 'content'

__init__(structure_hash: str)

Initialize StructureSourceID.

Parameters:

structure_hash – Hash of structure content

property sort_key: tuple

Return tuple for sorting within this source type.

The full comparison key is (source_type, sort_key).

Returns:

Tuple of comparable values

toDict() → dict[str, Any]

Return dict representation for JSON serialization.

Returns:

Dict with type-specific fields (not including source_type)

classmethod fromDict(data: dict[str, Any]) → StructureSourceID

Create instance from dict representation.

Parameters:

data – Dict with type-specific fields

Returns:

StructureSourceID instance

classmethod from_structure(st: Structure) → StructureSourceID

Create StructureSourceID by computing hash from structure.

Parameters:

st – Structure to hash

Returns:

StructureSourceID instance
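The following sketch computes a content hash over the three documented inputs: elements, coordinates, and connectivity. Plain lists stand in for a `Structure`. The coordinate precision, bond normalization, and digest are assumptions. The sketch shows the intended property: the same content hashes identically regardless of bond listing order, while moving an atom changes the hash.

```python
import hashlib

Coord = tuple[float, float, float]


def content_hash(
    elements: list[str],
    coords: list[Coord],
    bonds: list[tuple[int, int, int]],
    decimals: int = 4,
) -> str:
    """Hash elements, coordinates and bonds given as (atom1, atom2, order).

    Plain lists stand in for a Structure. The precision, bond normalization
    and SHA-256 digest are assumptions, not the library's documented scheme.
    """
    if len(elements) != len(coords):
        raise ValueError("need one coordinate triple per element")
    digest = hashlib.sha256()
    for element, (x, y, z) in zip(elements, coords):
        # Fixed precision usually keeps tiny floating-point noise from mattering.
        digest.update(f"{element}:{x:.{decimals}f},{y:.{decimals}f},{z:.{decimals}f};".encode())
    digest.update(b"|")  # separate the atom section from the bond section
    # Put the lower atom index first and sort, so bond listing order is irrelevant.
    for a, b, order in sorted((min(i, j), max(i, j), o) for i, j, o in bonds):
        digest.update(f"{a}-{b}:{order};".encode())
    return digest.hexdigest()


water = ["O", "H", "H"]
xyz = [(0.0, 0.0, 0.0), (0.9572, 0.0, 0.0), (-0.24, 0.9266, 0.0)]
h_forward = content_hash(water, xyz, [(1, 2, 1), (1, 3, 1)])
h_reordered = content_hash(water, xyz, [(3, 1, 1), (2, 1, 1)])
h_moved = content_hash(water, [(0.1, 0.0, 0.0)] + xyz[1:], [(1, 2, 1), (1, 3, 1)])
```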

class schrodinger.seam.io.sourceid.DerivedSourceID(root: SourceID, derivation_method: str, variant_id: str | None = None, metadata: dict[str, Any] | None = None)

Bases: SourceID

Source ID for structures derived from other structures.

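Tracks the original source (root) and the derivation path. The root is always a non-derived source ID (e.g., FileSourceID or StructureSourceID), so lineage never nests: deriving from a DerivedSourceID extends its derivation path instead of wrapping it in another layer.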

The derivation_path is a list of DerivationStep namedtuples, each with:

  • method: what transformation was applied (e.g., “LigPrep”, “enumeration”)

  • variant_id: unique identifier for this output (auto-generated if not provided)

  • metadata: optional flat JSON-serializable dict for extra context

  • tags: tags that were on the source ID at that point when the next derivation happened

source_type: ClassVar[str] = 'derived'

__init__(root: SourceID, derivation_method: str, variant_id: str | None = None, metadata: dict[str, Any] | None = None)

Initialize DerivedSourceID with a single derivation step.

Parameters:
  • root – Original source (must not be a DerivedSourceID)

  • derivation_method – Name of the derivation (e.g., “LigPrep”)

  • variant_id – Identifier for this output (auto-generated UUID if None)

  • metadata – Optional metadata dict (JSON-serializable, max 1KB)

Raises:

ValueError – If root is a DerivedSourceID or metadata validation fails

property sort_key: tuple

Return tuple for sorting within this source type.

The full comparison key is (source_type, sort_key).

Returns:

Tuple of comparable values

derive(derivation_method: str, variant_id: str | None = None, metadata: dict[str, Any] | None = None) → Self

Create a child DerivedSourceID with an additional derivation step.

Parameters:
  • derivation_method – Name of the derivation (e.g., “LigPrep”, “GlideDock”)

  • variant_id – Identifier for this specific output (auto-generated UUID if None)

  • metadata – Optional metadata dict (must be JSON-serializable, max 1KB)

Returns:

New DerivedSourceID with extended path

Raises:

ValueError – If metadata validation fails

classmethod deriveFrom(source: Structure | SourceID, derivation_method: str, variant_id: str | None = None, metadata: dict[str, Any] | None = None) → DerivedSourceID

Create a DerivedSourceID from a Structure or SourceID.

This is a convenience method that handles all source types uniformly:

  • If source is a Structure, extracts its SourceID (or creates a StructureSourceID if none exists)

  • If source is a DerivedSourceID, extends its derivation path

  • If source is any other SourceID, uses it as the root

Parameters:
  • source – Structure or SourceID to derive from

  • derivation_method – Name of the derivation (e.g., “LigPrep”)

  • variant_id – Identifier for this output (auto-generated UUID if None)

  • metadata – Optional metadata dict (must be JSON-serializable, max 1KB)

Returns:

New DerivedSourceID

Raises:

ValueError – If metadata validation fails

parent() → SourceID

Return the source ID of the parent structure.

Returns:

Parent’s source ID (root if only one derivation step)

getParentWithTag(tag: str) → SourceID

Find an ancestor with the given tag.

Walks the derivation path in reverse looking for a step whose tags contain the given tag. Returns the reconstructed ancestor SourceID at that position with its tags restored.

Parameters:

tag – Tag to search for

Returns:

Ancestor SourceID with the tag

Raises:

ValueError – If no ancestor with the tag is found
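The following sketch shows how `parent()` and `getParentWithTag()` could reconstruct ancestors. It relies on one reading of the `DerivationStep.tags` field: when an ID is derived further, its current tags are recorded on its own last step. `Lineage`, `Step`, and this snapshot rule are assumptions for illustration. Tags on the root are not modeled.

```python
from typing import NamedTuple


class Step(NamedTuple):
    method: str
    tags: tuple[str, ...] = ()


class Lineage:
    """Hypothetical model: a root label plus steps carrying tag snapshots."""

    def __init__(self, root: str, steps: tuple[Step, ...], tags=()):
        if not steps:
            raise ValueError("a derived ID needs at least one step")
        self.root = root
        self.steps = steps
        self._tags = set(tags)

    def addTag(self, tag: str) -> None:
        self._tags.add(tag)

    def getTags(self) -> frozenset[str]:
        return frozenset(self._tags)

    def derive(self, method: str) -> "Lineage":
        # Snapshot this ID's live tags onto its own last step, then append a step.
        snapshot = self.steps[-1]._replace(tags=tuple(sorted(self._tags)))
        return Lineage(self.root, self.steps[:-1] + (snapshot, Step(method)))

    def _ancestor(self, i: int) -> "Lineage":
        # Rebuild the ID whose path ends at step i, with its recorded tags.
        return Lineage(self.root, self.steps[: i + 1], self.steps[i].tags)

    def parent(self):
        # With a single step, the parent is the root itself.
        if len(self.steps) == 1:
            return self.root
        return self._ancestor(len(self.steps) - 2)

    def getParentWithTag(self, tag: str) -> "Lineage":
        # Newest ancestor first; the last step belongs to self, so skip it.
        for i in reversed(range(len(self.steps) - 1)):
            if tag in self.steps[i].tags:
                return self._ancestor(i)
        raise ValueError(f"no ancestor tagged {tag!r}")


prepped = Lineage("ligands.sdf#1", (Step("LigPrep"),))
prepped.addTag("prepared")
docked = prepped.derive("GlideDock")
refined = docked.derive("Prime")
found = refined.getParentWithTag("prepared")

try:
    refined.getParentWithTag("unknown")
    missing_raises = False
except ValueError:
    missing_raises = True
```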

prettyPrint() → str

Return a multi-line formatted ancestry path.

Returns:

Human-readable ancestry chain

toDict() → dict[str, Any]

Return dict representation for JSON serialization.

Returns:

Dict with type-specific fields (not including source_type)

classmethod fromDict(data: dict[str, Any]) → Self

Create instance from dict representation.

Parameters:

data – Dict with type-specific fields

Returns:

DerivedSourceID instance

schrodinger.seam.io.sourceid.get_source_id(st: Structure) → SourceID

Get source ID from a structure.

Reads the PROP_SOURCE_TYPE property to determine which SourceID subclass to use for decoding. If no source ID is present, computes and returns a StructureSourceID based on the structure’s content.

Parameters:

st – Structure to get source ID from

Returns:

SourceID instance (always returns a value; never None)

schrodinger.seam.io.sourceid.get_root_source_id(source: Structure | SourceID) → SourceID

Get the root source ID from a Structure or SourceID.

For derived sources, returns the ultimate root (the original non-derived source). For non-derived sources, returns the source itself.

Parameters:

source – Structure or SourceID to get root from

Returns:

Root SourceID (never a DerivedSourceID)

schrodinger.seam.io.sourceid.set_source_id(st: Structure, source_id: SourceID) → None

Set source ID on a structure.

Sets PROP_SOURCE_TYPE and delegates to the source ID’s encoding method.

Parameters:
  • st – Structure to annotate

  • source_id – SourceID to set
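The following sketch shows a round trip through `set_source_id()` and `get_source_id()`. A plain dict stands in for structure properties, and a content string stands in for the structure. The property names, the JSON encoding, and the two toy ID classes are hypothetical. The sketch only mirrors the documented flow:

  • Store the type tag.

  • Decode through the class registered for that tag.

  • Fall back to a content hash when nothing is stored, so the getter never returns None.

```python
import hashlib
import json
from typing import Any

# Hypothetical property names; the real PROP_SOURCE_TYPE value is not shown here.
PROP_SOURCE_TYPE = "s_sketch_source_type"
PROP_SOURCE_DATA = "s_sketch_source_data"


class FileID:
    source_type = "file"

    def __init__(self, filename: str, index: int):
        self.filename, self.index = filename, index

    def toDict(self) -> dict[str, Any]:
        return {"filename": self.filename, "index": self.index}

    @classmethod
    def fromDict(cls, data: dict[str, Any]) -> "FileID":
        return cls(data["filename"], data["index"])


class ContentID:
    source_type = "content"

    def __init__(self, structure_hash: str):
        self.structure_hash = structure_hash

    def toDict(self) -> dict[str, Any]:
        return {"structure_hash": self.structure_hash}

    @classmethod
    def fromDict(cls, data: dict[str, Any]) -> "ContentID":
        return cls(data["structure_hash"])


DECODERS = {cls.source_type: cls for cls in (FileID, ContentID)}


def set_source_id(props: dict[str, Any], source_id) -> None:
    # Record which class encoded the data, then store its fields as JSON.
    props[PROP_SOURCE_TYPE] = source_id.source_type
    props[PROP_SOURCE_DATA] = json.dumps(source_id.toDict())


def get_source_id(props: dict[str, Any], content: str):
    source_type = props.get(PROP_SOURCE_TYPE)
    if source_type is None:
        # Nothing stored: fall back to a content hash, so None is never returned.
        return ContentID(hashlib.sha256(content.encode()).hexdigest())
    return DECODERS[source_type].fromDict(json.loads(props[PROP_SOURCE_DATA]))


props: dict[str, Any] = {}
set_source_id(props, FileID("ligands.sdf", 3))
restored = get_source_id(props, "CCO")
fallback = get_source_id({}, "CCO")
```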