schrodinger.seam.transforms.samplers module

class schrodinger.seam.transforms.samplers.RandomSample(n: int, seed: Optional[int] = None, distinct=False)

Bases: apache_beam.transforms.ptransform.PTransform

A PTransform that returns approximately n random elements.

For n less than or equal to 100,000, exactly n elements are returned. For larger n, the number of elements sampled will, on average, be at most 0.3% off from n.

The seed value is only used if n is larger than 100,000.
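The two regimes described above can be illustrated with a small in-memory sketch. This is not the Beam implementation, which this page does not describe. It assumes that the approximate regime keeps each element independently with probability n / total. The function name random_sample_sketch, its cutoff parameter, and the choice to return everything when n exceeds the input size are all hypothetical. Under that assumption the relative standard deviation of the output size is at most 1/sqrt(n), which is about 0.3% at n = 100,000, so the documented figure is at least consistent with this kind of sampling.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")

N_CUTOFF = 100_000  # same value as the documented RandomSample.N_CUTOFF


def random_sample_sketch(items: Iterable[T], n: int,
                         seed: Optional[int] = None,
                         cutoff: int = N_CUTOFF) -> List[T]:
    """Hypothetical in-memory model of the two documented regimes."""
    pool = list(items)
    if n >= len(pool):
        # Assumption of this sketch: nothing to sample, return everything.
        return pool
    if n <= cutoff:
        # Exact regime: always exactly n elements. The seed is not
        # consulted here, mirroring the documented behavior.
        return random.sample(pool, n)
    # Approximate regime (assumed): independent coin flips, so the
    # output size is close to n but not guaranteed to equal it.
    rng = random.Random(seed)
    keep_probability = n / len(pool)
    return [x for x in pool if rng.random() < keep_probability]
```

Passing a small cutoff, for example random_sample_sketch(range(10000), 5000, seed=1, cutoff=100), exercises the approximate branch without needing more than 100,000 elements.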

Example usage:

>>> with beam.Pipeline() as p:
...     sample = (p | beam.Create(range(10))
...                 | RandomSample(3))
...     # sample will contain three randomly selected elements

If distinct is True, the input PCollection is deduplicated before sampling.

N_CUTOFF = 100000
__init__(n: int, seed: Optional[int] = None, distinct=False)
expand(inputs)
WithCount()

Returns a tuple of the sampled PCollection and a PCollection containing the number of elements that the sample was drawn from.

Example usage:

>>> with beam.Pipeline() as p:
...     sample, count = (p | beam.Create([1, 1, 2, 2])
...                        | RandomSample(3, distinct=True).WithCount())
...     # sample will contain at most three randomly selected distinct elements
...     # count will contain the number of elements sampled from, after deduplication (2)
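The relationship between distinct and the reported count can be modeled in plain Python. This is a sketch of the semantics as documented, not the library's code: sample_with_count_sketch is a hypothetical name, and capping the sample at the population size is an assumption of the sketch. It shows why the example above reports a count of 2 for the input [1, 1, 2, 2].

```python
import random
from typing import Hashable, Iterable, List, Tuple


def sample_with_count_sketch(items: Iterable[Hashable], n: int,
                             distinct: bool = False
                             ) -> Tuple[List[Hashable], int]:
    """Hypothetical model: return (sample, size of sampled population)."""
    pool = list(items)
    if distinct:
        # dict.fromkeys drops duplicates while keeping first-seen order.
        pool = list(dict.fromkeys(pool))
    # The count is taken after deduplication, as in the documented example.
    count = len(pool)
    # Assumption: a sample cannot be larger than the population.
    sample = random.sample(pool, min(n, count))
    return sample, count


sample, count = sample_with_count_sketch([1, 1, 2, 2], 3, distinct=True)
```

In this model, only two distinct values exist, so the sample holds both of them and count is 2. Without distinct, the same input yields a count of 4 and a sample of three elements.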

default_label()