schrodinger.seam.coders module

exception schrodinger.seam.coders.UnserializableMolError

Bases: Exception

class schrodinger.seam.coders.MolCoder

Bases: Coder

encode(mol: Mol) bytes

Encodes the given object into a byte string.

decode(mol_bytes: bytes) Mol

Decodes the given byte string into the corresponding object.

is_deterministic()

Whether this coder is guaranteed to encode values deterministically.

A deterministic coder is required for key coders in GroupByKey operations to produce consistent results.

For example, the default coder, PickleCoder, is not deterministic: the ordering of pickled entries in maps may vary across executions since there is no defined order. Such a coder is generally unsuitable as a key coder in GroupByKey operations, since each instance of the same key may be encoded differently.

Returns:

Whether coder is deterministic.
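
The following is a minimal sketch of why determinism matters for keys, using only the standard library rather than Beam itself. It illustrates a related case, not PickleCoder's exact behavior: two equal dicts built in a different insertion order pickle to different bytes, while a sorted JSON encoding (the hypothetical helper `encode_sorted`) gives identical bytes for both.

```python
import json
import pickle

# Two dicts that compare equal but were built in a different insertion order.
key_a = {"name": "ethanol", "charge": 0}
key_b = {"charge": 0, "name": "ethanol"}
assert key_a == key_b

# pickle writes dict entries in insertion order, so equal keys can
# serialize to different byte strings.
pickled_a = pickle.dumps(key_a)
pickled_b = pickle.dumps(key_b)
print(pickled_a == pickled_b)  # False


def encode_sorted(value: dict) -> bytes:
    """Hypothetical deterministic encoding: JSON with sorted keys."""
    return json.dumps(value, sort_keys=True).encode("utf-8")


print(encode_sorted(key_a) == encode_sorted(key_b))  # True
```

A grouping step that compares encoded bytes would treat the two pickled values as different keys, which is the inconsistency a deterministic coder avoids.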

estimate_size(mol: Mol) int

Estimates the encoded size of the given value, in bytes.

Dataflow estimates the encoded size of a PCollection processed in a pipeline step by using the estimated size of a random sample of elements in that PCollection.

The default implementation encodes the given value and returns its byte size. If a coder can provide a fast estimate of the encoded size of a value (e.g., if the encoding has a fixed size), it can provide its estimate here to improve performance.

Arguments:

mol: the value whose encoded size is to be estimated.

Returns:

The estimated encoded size of the given value.
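
As an illustration of the fast-estimate case described above, here is a minimal sketch of a fixed-width coder. `FixedWidthIntCoder` is a hypothetical stand-in that only mirrors the Coder method names; in a real pipeline it would presumably subclass the Beam Coder base class, which is not done here so that the example runs with the standard library alone.

```python
import struct


class FixedWidthIntCoder:
    """Hypothetical stand-in that mirrors the Coder method names."""

    _FORMAT = ">q"  # big-endian signed 64-bit integer
    _SIZE = struct.calcsize(_FORMAT)  # always 8 bytes

    def encode(self, value: int) -> bytes:
        return struct.pack(self._FORMAT, value)

    def decode(self, encoded: bytes) -> int:
        return struct.unpack(self._FORMAT, encoded)[0]

    def estimate_size(self, value: int) -> int:
        # Every value encodes to the same width, so the size is known
        # without actually encoding the value.
        return self._SIZE


coder = FixedWidthIntCoder()
print(coder.estimate_size(12345) == len(coder.encode(12345)))  # True
```

For variable-length encodings such as serialized molecules, no such shortcut exists in general, and encoding the value to measure it is the straightforward fallback.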

to_type_hint() type

class schrodinger.seam.coders.MolToSmilesCoder

Bases: MolCoder

Encodes and decodes Mols to and from SMILES strings, and raises an exception if the molecule is not sanitizable.

“Not sanitizable” in this context means a SMILES string for which Chem.MolFromSmiles (which by default attempts to sanitize the molecule) returns None.

NOTE: This coder is slower than MolCoder but produces a smaller encoded size.

encode(mol: Mol) bytes

Encodes the given object into a byte string.

decode(smiles_bytes: bytes) Mol

Decodes the given byte string into the corresponding object.

class schrodinger.seam.coders.RGroupToSmilesCoder

Bases: Coder

Encodes and decodes RGroups to and from SMILES strings.

encode(rgroup: RGroup) bytes

Encodes the given object into a byte string.

decode(smiles_bytes: bytes) RGroup

Decodes the given byte string into the corresponding object.

is_deterministic() bool

Whether this coder is guaranteed to encode values deterministically.

A deterministic coder is required for key coders in GroupByKey operations to produce consistent results.

For example, the default coder, PickleCoder, is not deterministic: the ordering of pickled entries in maps may vary across executions since there is no defined order. Such a coder is generally unsuitable as a key coder in GroupByKey operations, since each instance of the same key may be encoded differently.

Returns:

Whether coder is deterministic.

to_type_hint() type

class schrodinger.seam.coders.StructureCoder

Bases: Coder

encode(st: Structure) bytes

Encodes the given object into a byte string.

decode(mae_bytes: bytes) Structure

Decodes the given byte string into the corresponding object.

is_deterministic()

Whether this coder is guaranteed to encode values deterministically.

A deterministic coder is required for key coders in GroupByKey operations to produce consistent results.

For example, the default coder, PickleCoder, is not deterministic: the ordering of pickled entries in maps may vary across executions since there is no defined order. Such a coder is generally unsuitable as a key coder in GroupByKey operations, since each instance of the same key may be encoded differently.

Returns:

Whether coder is deterministic.

to_type_hint() type

class schrodinger.seam.coders.RouteNodeCoder

Bases: Coder

encode(route_node: RouteNode) bytes

Encodes the given object into a byte string.

decode(route_node_bytes: bytes) RouteNode

Decodes the given byte string into the corresponding object.

is_deterministic()

Whether this coder is guaranteed to encode values deterministically.

A deterministic coder is required for key coders in GroupByKey operations to produce consistent results.

For example, the default coder, PickleCoder, is not deterministic: the ordering of pickled entries in maps may vary across executions since there is no defined order. Such a coder is generally unsuitable as a key coder in GroupByKey operations, since each instance of the same key may be encoded differently.

Returns:

Whether coder is deterministic.

to_type_hint() type

class schrodinger.seam.coders.MaybeCompressedCoder(coder: Coder)

Bases: FastCoder

A wrapper coder that may compress the encoded elements if they exceed a certain size threshold.
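
The general idea of size-conditional compression can be sketched as follows. This is an assumption-laden illustration, not the actual implementation: the threshold value, the use of zlib, and the one-byte marker are all hypothetical, since the real threshold and wire format are not documented here.

```python
import zlib

# Hypothetical cutoff; the real threshold is not documented here.
THRESHOLD_BYTES = 1024
# Hypothetical one-byte markers that record whether the body was compressed.
RAW, COMPRESSED = b"\x00", b"\x01"


def maybe_compress(payload: bytes) -> bytes:
    """Compress only payloads larger than the threshold."""
    if len(payload) > THRESHOLD_BYTES:
        return COMPRESSED + zlib.compress(payload)
    return RAW + payload


def maybe_decompress(data: bytes) -> bytes:
    """Invert maybe_compress by inspecting the leading marker byte."""
    marker, body = data[:1], data[1:]
    if marker == COMPRESSED:
        return zlib.decompress(body)
    if marker == RAW:
        return body
    raise ValueError(f"unknown marker byte: {marker!r}")


small = b"CCO"
large = b"C" * 10_000
print(maybe_decompress(maybe_compress(small)) == small)  # True
print(len(maybe_compress(large)) < len(large))  # True
```

Skipping compression for small elements avoids paying compression overhead where it would save little or nothing.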

URN = 'seam:coders:MaybeCompressedCoder'

__init__(coder: Coder)

is_deterministic()

Whether this coder is guaranteed to encode values deterministically.

A deterministic coder is required for key coders in GroupByKey operations to produce consistent results.

For example, the default coder, PickleCoder, is not deterministic: the ordering of pickled entries in maps may vary across executions since there is no defined order. Such a coder is generally unsuitable as a key coder in GroupByKey operations, since each instance of the same key may be encoded differently.

Returns:

Whether coder is deterministic.

to_type_hint() type

estimate_size(element: bytes) int

Estimates the encoded size of the given value, in bytes.

Dataflow estimates the encoded size of a PCollection processed in a pipeline step by using the estimated size of a random sample of elements in that PCollection.

The default implementation encodes the given value and returns its byte size. If a coder can provide a fast estimate of the encoded size of a value (e.g., if the encoding has a fixed size), it can provide its estimate here to improve performance.

Arguments:

element: the value whose encoded size is to be estimated.

Returns:

The estimated encoded size of the given value.

to_runner_api_parameter(unused_context)

static from_runner_api_parameter(unused_payload, components, unused_context)

class schrodinger.seam.coders.SourceIDCoder

Bases: Coder

Coder for SourceID objects that enables deterministic serialization.

This coder serializes SourceID objects to JSON format using the existing toDict() and _fromDict() methods. The encoding is deterministic because it uses sort_keys=True for JSON serialization, which is required for SourceID objects to be used as keys in Apache Beam GroupByKey operations.

The coder handles all SourceID subclasses (FileSourceID, StructureSourceID, DerivedSourceID) polymorphically by including the source_type discriminator in the encoded data.

encode(source_id: SourceID) bytes

Encode SourceID to bytes.

Parameters:

source_id – SourceID instance to encode

Returns:

JSON-encoded bytes with source_type and data

decode(encoded_bytes: bytes) SourceID

Decode bytes to SourceID.

Parameters:

encoded_bytes – JSON-encoded bytes

Returns:

SourceID instance

Raises:

ValueError – If source_type is unknown

is_deterministic() bool

Return True because encoding is deterministic.

The encoding uses sort_keys=True for JSON serialization, which ensures that equal SourceID objects always encode to identical bytes.
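
The scheme described for this coder (sorted-key JSON plus a source_type discriminator) can be sketched roughly as below. The classes `FileSource` and `DerivedSource`, the registry, and the payload layout are hypothetical stand-ins written with dataclasses; they are not the real SourceID classes and do not use toDict() or _fromDict().

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FileSource:  # hypothetical stand-in, not the real FileSourceID
    path: str


@dataclass(frozen=True)
class DerivedSource:  # hypothetical stand-in, not the real DerivedSourceID
    parent: str
    step: str


# Hypothetical registry mapping discriminator strings to classes.
_TYPES = {"file": FileSource, "derived": DerivedSource}
_NAMES = {cls: name for name, cls in _TYPES.items()}


def encode(source) -> bytes:
    payload = {"source_type": _NAMES[type(source)], "data": asdict(source)}
    # sort_keys=True makes equal objects encode to identical bytes.
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def decode(encoded: bytes):
    payload = json.loads(encoded.decode("utf-8"))
    cls = _TYPES.get(payload["source_type"])
    if cls is None:
        raise ValueError(f"Unknown source_type: {payload['source_type']}")
    return cls(**payload["data"])


src = DerivedSource(parent="a.mae", step="prep")
print(decode(encode(src)) == src)  # True
```

Because the discriminator travels with the data, a single coder can round-trip every registered subclass, and an unregistered discriminator fails loudly with ValueError instead of producing a wrong object.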

to_type_hint() type