schrodinger.seam.transforms.samplers module
- class schrodinger.seam.transforms.samplers.RandomSample(n: int, seed: Optional[int] = None, distinct=False)
Bases: apache_beam.transforms.ptransform.PTransform
A PTransform that returns approximately n random elements.
On average, the number of elements sampled will be at most 0.3% off from n. For small values of n (less than or equal to 100,000), it will be exactly n. The seed value is only used if n is larger than 100,000.
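Example usage:
>>> import apache_beam as beam
>>> from schrodinger.seam.transforms.samplers import RandomSample
>>> with beam.Pipeline() as p:
...     sample = (p | beam.Create(range(10))
...                 | RandomSample(3))
...     # sample will contain three randomly selected elements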
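The size guarantees described above (exactly n at or below the cutoff of 100,000, approximately n above it, with the seed only mattering in the second case) can be illustrated with a plain in-memory sketch. This is not the library's implementation: `random_sample_sketch` is a hypothetical helper, the per-element coin flip used for large n is an assumed technique that merely produces the documented "approximately n" behavior, and returning the whole input when n exceeds its size is also an assumption.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")

N_CUTOFF = 100_000  # mirrors the documented RandomSample.N_CUTOFF value


def random_sample_sketch(items: Iterable[T], n: int,
                         seed: Optional[int] = None) -> List[T]:
    """Hypothetical in-memory stand-in for the documented sampling contract."""
    pool = list(items)
    if n >= len(pool):
        # Assumption: asking for at least as many elements as exist
        # returns everything.
        return pool
    if n <= N_CUTOFF:
        # Small n: exactly n elements; the seed is not consulted here.
        return random.sample(pool, n)
    # Large n (assumed technique): keep each element independently with
    # probability n / len(pool), so the result size is only close to n.
    rng = random.Random(seed)
    keep_probability = n / len(pool)
    return [item for item in pool if rng.random() < keep_probability]


small = random_sample_sketch(range(10), 3)
large = random_sample_sketch(range(400_000), 200_000, seed=7)
```

Here `small` always has exactly three elements, while the length of `large` varies around 200,000 and is reproducible only because a seed was supplied.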
If distinct is True, then the input PCollection is first deduplicated before sampling.
- N_CUTOFF = 100000
- __init__(n: int, seed: Optional[int] = None, distinct=False)
- expand(inputs)
- WithCount()
Returns a tuple of the sampled PCollection and a PCollection containing the number of elements the sample was drawn from (counted after deduplication when distinct is True).
Example usage:
>>> with beam.Pipeline() as p:
...     sample, count = (p | beam.Create([1, 1, 2, 2])
...                        | RandomSample(3, distinct=True).WithCount())
...     # sample will contain at most three randomly selected distinct elements
...     # count will contain the number of elements sampled from (2 after deduplication)
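The relationship between the two returned values can be mimicked in memory as follows. This is only a sketch of the documented behavior, not the Beam transform itself: `sample_with_count_sketch` is a hypothetical helper, it assumes hashable elements, and capping the sample at the number of available elements is an assumption.

```python
import random
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def sample_with_count_sketch(items: Iterable[T], n: int,
                             distinct: bool = False) -> Tuple[List[T], int]:
    """Hypothetical in-memory analogue of RandomSample(...).WithCount()."""
    pool = list(items)
    if distinct:
        # dict.fromkeys drops duplicates while keeping first-seen order.
        pool = list(dict.fromkeys(pool))
    # The count describes the pool that was sampled from, i.e. after
    # deduplication when distinct is True.
    count = len(pool)
    # Assumption: never return more elements than the pool holds.
    sample = random.sample(pool, min(n, count))
    return sample, count


sample, count = sample_with_count_sketch([1, 1, 2, 2], 3, distinct=True)
```

With the input `[1, 1, 2, 2]` and `distinct=True`, `count` is 2, matching the example above, and `sample` holds both distinct values in random order.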
- default_label()