schrodinger.seam.transforms.samplers module
- class schrodinger.seam.transforms.samplers.RandomSample(n: int, seed: Optional[int] = None, distinct=False)
Bases: apache_beam.transforms.ptransform.PTransform
A PTransform that returns approximately n random elements.
On average, the number of elements sampled will be at most 0.3% off from n. For small values of n (less than or equal to 100,000), it will be exactly n. The seed value is only used if n is larger than 100,000.
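The size guarantee above can be modeled in plain Python. This is only a minimal in-memory sketch of the documented contract, not the Beam implementation: the function name `sample_sketch`, its `cutoff` parameter, and the per-element keep probability used for large n are all assumptions made for illustration.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")

N_CUTOFF = 100_000  # mirrors the documented RandomSample.N_CUTOFF value


def sample_sketch(elements: Iterable[T], n: int, seed: Optional[int] = None,
                  cutoff: int = N_CUTOFF) -> List[T]:
    """Hypothetical model of the documented behavior (not the real code)."""
    items = list(elements)
    if not items:
        return []
    if n <= cutoff:
        # Small n: the result has exactly n elements (or fewer if the input
        # is shorter). The seed is ignored here, as the docs describe.
        return random.sample(items, min(n, len(items)))
    # Large n (assumed strategy): keep each element independently with
    # probability n / total, so the result size is only approximately n.
    rng = random.Random(seed)
    keep_probability = min(1.0, n / len(items))
    return [item for item in items if rng.random() < keep_probability]
```

Passing a small `cutoff` makes it easy to exercise the approximate branch on small inputs. With a fixed seed, that branch returns the same elements on every call.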
Example usage:
>>> with beam.Pipeline() as p:
...     sample = (p | beam.Create(range(10))
...                 | RandomSample(3))
...     # sample will contain three randomly selected elements
If distinct is True, then the input pcollection is first deduplicated before sampling.
- N_CUTOFF = 100000
- __init__(n: int, seed: Optional[int] = None, distinct=False)
- expand(inputs)
- WithCount()
Returns a tuple of the sampled pcollection and a pcollection containing the number of inputs that were sampled from.
Example usage:
>>> with beam.Pipeline() as p:
...     sample, count = (p | beam.Create([1, 1, 2, 2])
...                        | RandomSample(3, distinct=True).WithCount())
...     # sample will contain three randomly selected elements
...     # count will contain the number of elements in the input pcollection (2)
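The count of 2 in the example suggests that, with distinct=True, the count is taken after deduplication. The plain-Python sketch below illustrates that reading. It is a hypothetical in-memory model, not the Beam code: the name `sample_with_count_sketch` is invented for illustration, and clamping the sample size to the number of available elements is an assumption of this sketch.

```python
import random
from typing import Hashable, Iterable, List, Tuple


def sample_with_count_sketch(
    elements: Iterable[Hashable], n: int, distinct: bool = False
) -> Tuple[List[Hashable], int]:
    """Hypothetical model: return (sample, number of elements sampled from)."""
    items = list(elements)
    if distinct:
        # dict.fromkeys drops duplicates while keeping first-seen order.
        items = list(dict.fromkeys(items))
    # Count what the sample is drawn from, i.e. after any deduplication.
    count = len(items)
    # Assumption: never ask for more elements than are available.
    sample = random.sample(items, min(n, count))
    return sample, count
```

With the input `[1, 1, 2, 2]` and `distinct=True`, this model reports a count of 2, matching the count in the example above. Without `distinct`, it reports 4.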
- default_label()