schrodinger.seam.transforms.samplers module
- class schrodinger.seam.transforms.samplers.RandomSample(n: int, seed: Optional[int] = None, distinct=False)
Bases: PTransform
A PTransform that returns approximately n random elements.
On average, the number of elements sampled will be at most 0.3% off from n. For small values of n (less than or equal to 100,000), it will be exactly n. The seed value is only used if n is larger than 100,000.
Example usage:
>>> with beam.Pipeline() as p:
...     sample = (p | beam.Create(range(10))
...                 | RandomSample(3))
...     # sample will contain three randomly selected elements
If distinct is True, then the input pcollection is first deduplicated before sampling.
- N_CUTOFF = 100000
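The two regimes described above (exactly n elements up to the cutoff, approximately n elements and a seed-dependent result beyond it) can be illustrated with a minimal pure-Python sketch. This is not the library's implementation, which the documentation does not describe. The function name `sketch_random_sample`, its `cutoff` parameter, and the choice of per-element Bernoulli sampling for the large-n case are all assumptions made for illustration. Bernoulli sampling is used here because its standard deviation is at most sqrt(n), which is about 0.3% of n at n = 100,000 and so is consistent with the figure quoted above.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")

N_CUTOFF = 100_000  # same value as the documented RandomSample.N_CUTOFF


def sketch_random_sample(items: Iterable[T], n: int,
                         seed: Optional[int] = None,
                         cutoff: int = N_CUTOFF) -> List[T]:
    """Hypothetical stand-in for the documented behavior (assumption only)."""
    pool = list(items)
    if n >= len(pool):
        # Nothing to discard: every element is kept.
        return pool
    if n <= cutoff:
        # Small n: draw exactly n elements; the seed is not consulted.
        return random.sample(pool, n)
    # Large n (assumed technique): keep each element independently with
    # probability n / len(pool), so the result size is only approximately n.
    rng = random.Random(seed)
    keep_probability = n / len(pool)
    return [item for item in pool if rng.random() < keep_probability]
```

Passing a small `cutoff` makes it easy to exercise the approximate branch on small inputs, for example `sketch_random_sample(range(1000), 500, seed=7, cutoff=100)`, which returns the same list on every call because the seed fixes the random draws.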
- __init__(n: int, seed: Optional[int] = None, distinct=False)
- display_data() -> dict
Returns the display data associated with a pipeline component.
It should be reimplemented in pipeline components that wish to have static display data.
- Returns:
Dict[str, Any]: A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:
{
    'key1': 'string_value',
    'key2': 1234,
    'key3': 3.14159265,
    'key4': DisplayDataItem('apache.org', url='http://apache.org'),
    'key5': subComponent
}
- WithCount()
Returns a tuple of the sampled pcollection and a pcollection containing the number of inputs that were sampled from.
Example usage:
>>> with beam.Pipeline() as p:
...     sample, count = (p | beam.Create([1, 1, 2, 2])
...                        | RandomSample(3, distinct=True).WithCount())
...     # sample will contain up to three randomly selected elements
...     # count will contain the number of elements sampled from
...     # (2 here, as the input is deduplicated first)
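The relationship between `distinct` and the reported count can be mirrored in plain Python. The sketch below is an illustration only: `sketch_sample_with_count` is a hypothetical helper, not part of the library, and it assumes that the count is taken after deduplication, which is inferred from the value of 2 given in the example above.

```python
import random
from typing import Hashable, Iterable, List, Tuple


def sketch_sample_with_count(items: Iterable[Hashable], n: int,
                             distinct: bool = False
                             ) -> Tuple[List[Hashable], int]:
    """Hypothetical helper mirroring the documented WithCount() outputs."""
    pool = list(items)
    if distinct:
        # dict.fromkeys removes duplicates while keeping first-seen order.
        pool = list(dict.fromkeys(pool))
    # Assumption: the count reflects the pool actually sampled from.
    count = len(pool)
    # A sample cannot contain more elements than the pool provides.
    sample = random.sample(pool, min(n, count))
    return sample, count
```

Under these assumptions, `sketch_sample_with_count([1, 1, 2, 2], 3, distinct=True)` yields a count of 2 and a sample holding both distinct values, while the same call without `distinct` yields a count of 4 and a three-element sample.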