schrodinger.seam.testing.benchmarks module¶

Benchmarks for different parts of the SeamRunner or Seam transforms.

Usage:

$SCHRODINGER/run seam_benchmarks.py

class schrodinger.seam.testing.benchmarks.BenchmarkResult(value: float, units: Optional[str] = None)¶

Bases: object

Variables:

value – Current value of the metric
units – (optional) units of the value.

value: float¶

units: Optional[str] = None¶

__init__(value: float, units: Optional[str] = None) → None¶

class schrodinger.seam.testing.benchmarks.Benchmark¶

Bases: object

UNIT_TEST_N = 1¶

PERFORMANCE_TEST_N = 500¶

setup()¶

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process
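The page lists the attributes and methods of Benchmark but not how a subclass uses them. The following is a minimal sketch, assuming that a driver picks UNIT_TEST_N or PERFORMANCE_TEST_N, calls setup(), and passes the chosen value to run(). The Benchmark and BenchmarkResult classes here are stand-ins that only mirror the documented signatures, and SumSquaresRuntime and run_benchmark are hypothetical names, not part of schrodinger.seam.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class BenchmarkResult:
    """Stand-in mirroring the documented fields; not the Schrodinger class."""
    value: float
    units: Optional[str] = None


class Benchmark:
    """Stand-in base class with the documented attributes and methods."""
    UNIT_TEST_N = 1
    PERFORMANCE_TEST_N = 500

    def setup(self):
        pass

    def run(self, n: int) -> BenchmarkResult:
        raise NotImplementedError


class SumSquaresRuntime(Benchmark):
    """Hypothetical benchmark: time how long summing n squares takes."""
    UNIT_TEST_N = 3  # override the default, as several benchmarks on this page do

    def setup(self):
        self.calls = 0

    def run(self, n: int) -> BenchmarkResult:
        self.calls += 1
        start = time.perf_counter()
        sum(i * i for i in range(n))
        return BenchmarkResult(time.perf_counter() - start, units="s")


def run_benchmark(benchmark: Benchmark, performance: bool = False) -> BenchmarkResult:
    """Hypothetical driver: choose N by mode, then call setup() and run()."""
    n = benchmark.PERFORMANCE_TEST_N if performance else benchmark.UNIT_TEST_N
    benchmark.setup()
    return benchmark.run(n)


result = run_benchmark(SumSquaresRuntime())
```

Keeping the two N values as class attributes lets one run() implementation serve both a quick unit-test pass and a longer performance pass.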

schrodinger.seam.testing.benchmarks.AlkaneStructure(n: int) → Structure¶

schrodinger.seam.testing.benchmarks.tmp_cwd()¶

class schrodinger.seam.testing.benchmarks.WriteStructuresToFile_MaxSizeOfDirectory¶

Bases: Benchmark

Measures WriteStructuresToFile’s maximum shard directory size.

UNIT_TEST_N = 5¶

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process
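The page does not say how the directory size is obtained. One plausible approach, sketched here with only the standard library, is to sum the sizes of all files under a directory. The directory_size helper is a hypothetical name used for illustration and is not taken from the module.

```python
import os
import tempfile


def directory_size(path: str) -> int:
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


with tempfile.TemporaryDirectory() as tmp:
    # Write two small files so the helper has something to measure.
    for name, payload in (("a.bin", b"x" * 100), ("b.bin", b"y" * 50)):
        with open(os.path.join(tmp, name), "wb") as fh:
            fh.write(payload)
    size = directory_size(tmp)
```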

class schrodinger.seam.testing.benchmarks.WriteStructuresToFile_Runtime¶

Bases: Benchmark

Measures WriteStructuresToFile’s runtime.

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process

class schrodinger.seam.testing.benchmarks.GroupByKeyWithSlowCoder¶

Bases: Benchmark

Tests the performance of GroupByKey when using keys and values that are slow to deserialize.

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process

class schrodinger.seam.testing.benchmarks.ExecStagesWithSlowCoder¶

Bases: Benchmark

Tests the performance of ExecutableStages whose outputs are slow to deserialize.

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process

class schrodinger.seam.testing.benchmarks.FlattenWithSlowCoder¶

Bases: Benchmark

Tests the performance of Flatten when using elements that are slow to deserialize.

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process

class schrodinger.seam.testing.benchmarks.FlattenBenchmark¶

Bases: Benchmark

UNIT_TEST_N = 3¶

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process

class schrodinger.seam.testing.benchmarks.GroupByKey¶

Bases: Benchmark

property gbk_executor¶

Set up a GroupByKeyExecutor with an input and an output EmbeddedPCollManager.

Groups input elements of type (int, int) into output elements of type (int, Iterable[int]).

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process
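The type mapping described for gbk_executor, (int, int) in and (int, Iterable[int]) out, can be illustrated in plain Python. This sketch only shows the grouping semantics. It does not use GroupByKeyExecutor or EmbeddedPCollManager, and the group_by_key function is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_key(pairs: Iterable[Tuple[int, int]]) -> List[Tuple[int, List[int]]]:
    """Collect all values that share a key into one (key, values) pair."""
    grouped: Dict[int, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Sorting by key is for a deterministic result in this sketch only;
    # a real GroupByKey need not guarantee any output order.
    return sorted(grouped.items())


grouped = group_by_key([(1, 10), (2, 20), (1, 11)])
# grouped == [(1, [10, 11]), (2, [20])]
```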

class schrodinger.seam.testing.benchmarks.MaxBundleMemUsage¶

Bases: Benchmark

Test that bundles loaded directly into memory for processing never exceed the max bundle memory limit.

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process
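The page does not describe how memory use is measured against the limit. As a rough sketch, assuming it is enough to track Python-level allocations, the standard library tracemalloc module can report the peak while a function runs. The names peak_memory_bytes, process_bundle, and MAX_BUNDLE_BYTES are hypothetical, and the actual benchmark may measure something else, such as process-wide memory.

```python
import tracemalloc


def peak_memory_bytes(func, *args) -> int:
    """Run func(*args) and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        func(*args)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def process_bundle(n: int) -> int:
    """Hypothetical workload: hold n one-kilobyte elements in memory at once."""
    bundle = [bytes(1024) for _ in range(n)]
    return len(bundle)


MAX_BUNDLE_BYTES = 1_000_000  # hypothetical limit, for illustration only
peak = peak_memory_bytes(process_bundle, 100)
within_limit = peak <= MAX_BUNDLE_BYTES
```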

class schrodinger.seam.testing.benchmarks.WriteStructuresToFileMemoryBenchmark¶

Bases: Benchmark

Test that bundles loaded directly into memory for processing never exceed the max bundle memory limit.

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process

class schrodinger.seam.testing.benchmarks.LargeOutputPerBundleWithInProcessWorkerBenchmark¶

Bases: Benchmark

Test that running a pipeline where the size of outputs generated per bundle is large doesn’t result in large memory consumption.

UNIT_TEST_N = 5¶

PERFORMANCE_TEST_N = 2000¶

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process

class schrodinger.seam.testing.benchmarks.LargeOutputPerBundleWithSubprocessWorkerBenchmark¶

Bases: Benchmark

Test that running a pipeline where the size of outputs generated per bundle is large doesn’t result in large memory consumption.

UNIT_TEST_N = 5¶

PERFORMANCE_TEST_N = 8000¶

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process

class schrodinger.seam.testing.benchmarks.LargeOutputPerBundleWithLocalWorkerBenchmark¶

Bases: Benchmark

Test that running a pipeline where the size of outputs generated per bundle is large doesn’t result in large memory consumption even when using a DoFn marked with @requires_local_execution.

UNIT_TEST_N = 5¶

PERFORMANCE_TEST_N = 8000¶

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process

class schrodinger.seam.testing.benchmarks.WorkerServerWithSortingBundleProcessingTime¶

Bases: Benchmark

Test that running a pipeline where the size of outputs generated per bundle is large doesn’t result in large memory consumption even when using a DoFn marked with @requires_local_execution.

UNIT_TEST_N = 1000¶

PERFORMANCE_TEST_N = 1000000¶

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process

class schrodinger.seam.testing.benchmarks.WorkerServerBundleProcessingTime¶

Bases: Benchmark

Test that running a pipeline where the size of outputs generated per bundle is large doesn’t result in large memory consumption even when using a DoFn marked with @requires_local_execution.

UNIT_TEST_N = 1000¶

PERFORMANCE_TEST_N = 100000¶

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process

class schrodinger.seam.testing.benchmarks.PCollFileManager_CreateReadersWithManyDataFiles¶

Bases: Benchmark

static pcoll_mngr()¶

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process

class schrodinger.seam.testing.benchmarks.GroupByKeyLargeValues_runtime¶

Bases: Benchmark

UNIT_TEST_N = 5¶

static generate_1M_random_string()¶

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process

class schrodinger.seam.testing.benchmarks.GroupByKeyDiskSpaceWithLargeValues¶

Bases: Benchmark

UNIT_TEST_N = 100¶

static generate_1M_random_string()¶

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process

class schrodinger.seam.testing.benchmarks.GroupByKeyLargeElements_memory¶

Bases: Benchmark

UNIT_TEST_N = 5¶

PERFORMANCE_TEST_N = 1000¶

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process

class schrodinger.seam.testing.benchmarks.GroupByKeyWithManyDataChunkFiles¶

Bases: Benchmark

Test that running GroupByKey with more data chunk files than the _MAX_OPEN_FILES limit allows does not lead to excessive memory usage.

UNIT_TEST_N = 5¶

PERFORMANCE_TEST_N = 1000¶

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process

class schrodinger.seam.testing.benchmarks.RedistributingWithLargeElements¶

Bases: Benchmark

Test that running a pipeline where the size of outputs generated per bundle is large doesn’t result in large memory consumption even when using a DoFn marked with @requires_local_execution.

UNIT_TEST_N = 100¶

PERFORMANCE_TEST_N = 1000¶

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process

class schrodinger.seam.testing.benchmarks.SeamRunnerOverheadBenchmark¶

Bases: Benchmark

Run a basic Map transform with the SeamRunner to measure the overhead of the SeamRunner itself.

UNIT_TEST_N = 1¶

PERFORMANCE_TEST_N = 1¶

run(n: int) → BenchmarkResult¶

Parameters: n – The number of elements to process

schrodinger.seam.testing.benchmarks.get_benchmarks()¶