schrodinger.seam.coders module¶
- exception schrodinger.seam.coders.UnserializableMolError¶
Bases:
Exception
- class schrodinger.seam.coders.MolCoder¶
Bases:
Coder
- encode(mol: Mol) bytes¶
Encodes the given object into a byte string.
- decode(mol_bytes: bytes) Mol¶
Decodes the given byte string into the corresponding object.
- is_deterministic()¶
Whether this coder is guaranteed to encode values deterministically.
A deterministic coder is required for key coders in GroupByKey operations to produce consistent results.
For example, the default coder, PickleCoder, is not deterministic: the ordering of pickled entries in maps may vary across executions because there is no defined order. Such a coder is not in general suitable as a key coder in GroupByKey operations, since each instance of the same key may be encoded differently.
- Returns:
Whether coder is deterministic.
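To illustrate the determinism concern described above, the following standard-library sketch shows two equal dictionaries that pickle to different bytes, alongside a canonical encoding that sorts keys first. It does not use PickleCoder or any schrodinger.seam class; the names key_a, key_b, and canonical_bytes are hypothetical and exist only for this illustration.

```python
import json
import pickle

# Two equal dicts built in different insertion orders.
key_a = {"name": "ligand", "id": 7}
key_b = {"id": 7, "name": "ligand"}
assert key_a == key_b

# pickle writes entries in insertion order, so equal values can
# produce different byte strings.
pickle_is_stable = pickle.dumps(key_a) == pickle.dumps(key_b)


def canonical_bytes(value: dict) -> bytes:
    """Hypothetical helper: serialize with sorted keys for a stable encoding."""
    return json.dumps(value, sort_keys=True).encode("utf-8")


# Sorting keys removes the dependence on insertion order.
canonical_is_stable = canonical_bytes(key_a) == canonical_bytes(key_b)
```

If byte strings like these were used as grouping keys, the pickled variants of the same logical key could end up in different groups, while the canonical variants would not.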
- estimate_size(mol: Mol) int¶
Estimates the encoded size of the given value, in bytes.
Dataflow estimates the encoded size of a PCollection processed in a pipeline step by using the estimated size of a random sample of elements in that PCollection.
The default implementation encodes the given value and returns its byte size. If a coder can provide a fast estimate of the encoded size of a value (e.g., if the encoding has a fixed size), it can provide its estimate here to improve performance.
- Arguments:
value: the value whose encoded size is to be estimated.
- Returns:
The estimated encoded size of the given value.
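As a rough sketch of the fixed-size case mentioned above, a coder whose encoding always has the same width can answer estimate_size without encoding anything. FixedWidthIntCoder below is a hypothetical stand-in written with the standard library only; it is not MolCoder and does not subclass the real Coder base class.

```python
import struct


class FixedWidthIntCoder:
    """Hypothetical stand-in that mimics the encode/decode/estimate_size shape."""

    _FORMAT = ">q"  # big-endian signed 64-bit integer
    _SIZE = struct.calcsize(_FORMAT)

    def encode(self, value: int) -> bytes:
        return struct.pack(self._FORMAT, value)

    def decode(self, data: bytes) -> int:
        return struct.unpack(self._FORMAT, data)[0]

    def estimate_size(self, value: int) -> int:
        # Every value encodes to the same width, so skip the encoding step.
        return self._SIZE


coder = FixedWidthIntCoder()
```

The estimate here is exact and costs no encoding work, which is the performance benefit the description refers to.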
- to_type_hint() type¶
- class schrodinger.seam.coders.MolToSmilesCoder¶
Bases:
MolCoder
Encodes and decodes Mols to and from SMILES strings, and raises an exception if the molecule is not sanitizable.
“Sanitizable” in this context means a SMILES that does not return None when passed to Chem.MolFromSmiles (which by default attempts to sanitize the molecule).
NOTE: This is not as fast as MolCoder but will have a smaller binary size.
- encode(mol: Mol) bytes¶
Encodes the given object into a byte string.
- decode(smiles_bytes: bytes) Mol¶
Decodes the given byte string into the corresponding object.
- class schrodinger.seam.coders.RGroupToSmilesCoder¶
Bases:
Coder
Encodes and decodes RGroups to and from SMILES strings.
- is_deterministic() bool¶
Whether this coder is guaranteed to encode values deterministically.
A deterministic coder is required for key coders in GroupByKey operations to produce consistent results.
For example, the default coder, PickleCoder, is not deterministic: the ordering of pickled entries in maps may vary across executions because there is no defined order. Such a coder is not in general suitable as a key coder in GroupByKey operations, since each instance of the same key may be encoded differently.
- Returns:
Whether coder is deterministic.
- to_type_hint() type¶
- class schrodinger.seam.coders.StructureCoder¶
Bases:
Coder
- is_deterministic()¶
Whether this coder is guaranteed to encode values deterministically.
A deterministic coder is required for key coders in GroupByKey operations to produce consistent results.
For example, the default coder, PickleCoder, is not deterministic: the ordering of pickled entries in maps may vary across executions because there is no defined order. Such a coder is not in general suitable as a key coder in GroupByKey operations, since each instance of the same key may be encoded differently.
- Returns:
Whether coder is deterministic.
- to_type_hint() type¶
- class schrodinger.seam.coders.RouteNodeCoder¶
Bases:
Coder
- decode(route_node_bytes: bytes) RouteNode¶
Decodes the given byte string into the corresponding object.
- is_deterministic()¶
Whether this coder is guaranteed to encode values deterministically.
A deterministic coder is required for key coders in GroupByKey operations to produce consistent results.
For example, the default coder, PickleCoder, is not deterministic: the ordering of pickled entries in maps may vary across executions because there is no defined order. Such a coder is not in general suitable as a key coder in GroupByKey operations, since each instance of the same key may be encoded differently.
- Returns:
Whether coder is deterministic.
- to_type_hint() type¶
- class schrodinger.seam.coders.MaybeCompressedCoder(coder: Coder)¶
Bases:
FastCoder
A wrapper coder that may compress the encoded elements if they exceed a certain size threshold.
- URN = 'seam:coders:MaybeCompressedCoder'¶
- __init__(coder: Coder)¶
- is_deterministic()¶
Whether this coder is guaranteed to encode values deterministically.
A deterministic coder is required for key coders in GroupByKey operations to produce consistent results.
For example, the default coder, PickleCoder, is not deterministic: the ordering of pickled entries in maps may vary across executions because there is no defined order. Such a coder is not in general suitable as a key coder in GroupByKey operations, since each instance of the same key may be encoded differently.
- Returns:
Whether coder is deterministic.
- to_type_hint() type¶
- estimate_size(element: bytes) int¶
Estimates the encoded size of the given value, in bytes.
Dataflow estimates the encoded size of a PCollection processed in a pipeline step by using the estimated size of a random sample of elements in that PCollection.
The default implementation encodes the given value and returns its byte size. If a coder can provide a fast estimate of the encoded size of a value (e.g., if the encoding has a fixed size), it can provide its estimate here to improve performance.
- Arguments:
value: the value whose encoded size is to be estimated.
- Returns:
The estimated encoded size of the given value.
- to_runner_api_parameter(unused_context)¶
- static from_runner_api_parameter(unused_payload, components, unused_context)¶
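The threshold-based compression that MaybeCompressedCoder is described as performing could look roughly like the sketch below. Everything here is an assumption made for illustration: the 1024-byte threshold, the one-byte flag prefix, the use of zlib, and the function names are not taken from the schrodinger.seam implementation, whose actual wire format is not documented on this page.

```python
import zlib

THRESHOLD_BYTES = 1024  # assumed value; the real threshold is not documented here
RAW = b"\x00"           # assumed flag: payload stored as-is
COMPRESSED = b"\x01"    # assumed flag: payload stored zlib-compressed


def maybe_compress(payload: bytes) -> bytes:
    """Prefix a flag byte and compress only payloads above the threshold."""
    if len(payload) > THRESHOLD_BYTES:
        return COMPRESSED + zlib.compress(payload)
    return RAW + payload


def maybe_decompress(data: bytes) -> bytes:
    """Invert maybe_compress by inspecting the flag byte."""
    flag, body = data[:1], data[1:]
    if flag == COMPRESSED:
        return zlib.decompress(body)
    if flag == RAW:
        return body
    raise ValueError(f"Unknown flag byte: {flag!r}")


small = b"CCO"
large = b"C" * 10_000
```

Small payloads skip compression because the overhead can outweigh the savings; the flag byte lets the decoder tell the two cases apart.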
- class schrodinger.seam.coders.SourceIDCoder¶
Bases:
Coder
Coder for SourceID objects that enables deterministic serialization.
This coder serializes SourceID objects to JSON format using the existing toDict() and _fromDict() methods. The encoding is deterministic because it uses sort_keys=True for JSON serialization, which is required for SourceID objects to be used as keys in Apache Beam GroupByKey operations.
The coder handles all SourceID subclasses (FileSourceID, StructureSourceID, DerivedSourceID) polymorphically by including the source_type discriminator in the encoded data.
- encode(source_id: SourceID) bytes¶
Encode SourceID to bytes.
- Parameters:
source_id – SourceID instance to encode
- Returns:
JSON-encoded bytes with source_type and data
- decode(encoded_bytes: bytes) SourceID¶
Decode bytes to SourceID.
- Parameters:
encoded_bytes – JSON-encoded bytes
- Returns:
SourceID instance
- Raises:
ValueError – If source_type is unknown
- is_deterministic() bool¶
Return True because encoding is deterministic.
The encoding uses sort_keys=True for JSON serialization, which ensures that equal SourceID objects always encode to identical bytes.
- to_type_hint() type¶
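The encoding scheme described for SourceIDCoder (a JSON envelope with a source_type discriminator, serialized with sort_keys=True) can be sketched with the standard library as follows. FileID, DerivedID, their fields, and the module-level encode and decode functions are hypothetical stand-ins; the real coder works with SourceID subclasses and their toDict() and _fromDict() methods, which are not reproduced here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FileID:
    """Hypothetical stand-in for a SourceID subclass."""
    path: str


@dataclass(frozen=True)
class DerivedID:
    """Hypothetical stand-in for another SourceID subclass."""
    parent: str
    step: int


# Maps the discriminator stored in the envelope back to a class.
_TYPES = {"FileID": FileID, "DerivedID": DerivedID}


def encode(source_id) -> bytes:
    envelope = {"source_type": type(source_id).__name__, "data": asdict(source_id)}
    # sort_keys=True makes equal objects encode to identical bytes.
    return json.dumps(envelope, sort_keys=True).encode("utf-8")


def decode(encoded: bytes):
    envelope = json.loads(encoded.decode("utf-8"))
    source_type = envelope["source_type"]
    if source_type not in _TYPES:
        raise ValueError(f"Unknown source_type: {source_type}")
    return _TYPES[source_type](**envelope["data"])
```

Storing the discriminator next to the data is what lets a single coder round-trip several subclasses, and the unknown-type check mirrors the ValueError documented for decode.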