schrodinger.seam.testing.benchmarks module

Benchmarks for different parts of the SeamRunner or Seam transforms.

Usage:

$SCHRODINGER/run seam_benchmarks.py

class schrodinger.seam.testing.benchmarks.BenchmarkResult(value: float, units: Optional[str] = None)

Bases: object

Variables:
  • value – Current value of the metric

  • units – (optional) units of the value.

value: float
units: Optional[str] = None
__init__(value: float, units: Optional[str] = None) → None
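The signature and defaults above are consistent with a plain dataclass. The following is a minimal stand-in written under that assumption, to show how a result with and without units might be constructed. It does not import the real class, and the sample values are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BenchmarkResult:
    """Stand-in mirroring the documented fields; not the real class."""
    value: float                 # current value of the metric
    units: Optional[str] = None  # optional units of the value


# A timed result in seconds, and a unitless result (e.g. a count).
timed = BenchmarkResult(value=1.25, units="s")
unitless = BenchmarkResult(value=42.0)
```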
class schrodinger.seam.testing.benchmarks.Benchmark

Bases: object

UNIT_TEST_N = 1
PERFORMANCE_TEST_N = 500
setup()
run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process
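The sketch below shows how a subclass might override UNIT_TEST_N, setup() and run(). It uses stand-in base classes rather than the real module. SumOfSquaresRuntime and run_once are hypothetical names, and the way run_once chooses between UNIT_TEST_N and PERFORMANCE_TEST_N is an assumption about how a driver could use these attributes, not documented behavior.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class BenchmarkResult:
    # Stand-in for the documented BenchmarkResult.
    value: float
    units: Optional[str] = None


class Benchmark:
    # Stand-in for the documented base class; the two defaults are taken
    # from the documentation, the method bodies are assumptions.
    UNIT_TEST_N = 1
    PERFORMANCE_TEST_N = 500

    def setup(self):
        pass

    def run(self, n: int) -> BenchmarkResult:
        raise NotImplementedError


class SumOfSquaresRuntime(Benchmark):
    # Hypothetical benchmark, not part of the module.
    UNIT_TEST_N = 3

    def setup(self):
        self.total = None

    def run(self, n: int) -> BenchmarkResult:
        # Time the work for n elements and report elapsed seconds.
        start = time.perf_counter()
        self.total = sum(i * i for i in range(n))
        return BenchmarkResult(value=time.perf_counter() - start, units="s")


def run_once(benchmark_cls, performance: bool = False) -> BenchmarkResult:
    # Assumed driver: small n for unit tests, large n for performance runs.
    bench = benchmark_cls()
    bench.setup()
    n = bench.PERFORMANCE_TEST_N if performance else bench.UNIT_TEST_N
    return bench.run(n)


result = run_once(SumOfSquaresRuntime)
```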

schrodinger.seam.testing.benchmarks.AlkaneStructure(n: int) → Structure
schrodinger.seam.testing.benchmarks.tmp_cwd()
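tmp_cwd() is undocumented. Its name suggests a context manager that runs a block inside a temporary working directory. The sketch below illustrates that pattern using only the standard library. tmp_cwd_sketch is a hypothetical name, and the real helper may behave differently.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def tmp_cwd_sketch():
    """Hypothetical: switch into a fresh temporary directory, then restore."""
    original = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp_dir:
        os.chdir(tmp_dir)
        try:
            yield tmp_dir
        finally:
            # Leave the directory before TemporaryDirectory removes it.
            os.chdir(original)
```

Files written with relative paths inside the `with` block are then discarded when the block exits, which keeps benchmark output out of the caller's directory.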
class schrodinger.seam.testing.benchmarks.WriteStructuresToFile_MaxSizeOfDirectory

Bases: Benchmark

Measures the maximum shard directory size produced by WriteStructuresToFile.

UNIT_TEST_N = 5
run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process

class schrodinger.seam.testing.benchmarks.WriteStructuresToFile_Runtime

Bases: Benchmark

Measures the runtime of WriteStructuresToFile.

run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process

class schrodinger.seam.testing.benchmarks.GroupByKeyWithSlowCoder

Bases: Benchmark

Tests the performance of GroupByKey when its keys and values are slow to deserialize.

run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process

class schrodinger.seam.testing.benchmarks.ExecStagesWithSlowCoder

Bases: Benchmark

Tests the performance of ExecutableStages whose outputs are slow to deserialize.

run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process

class schrodinger.seam.testing.benchmarks.FlattenWithSlowCoder

Bases: Benchmark

Tests the performance of Flatten when its elements are slow to deserialize.

run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process

class schrodinger.seam.testing.benchmarks.FlattenBenchmark

Bases: Benchmark

UNIT_TEST_N = 3
run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process

class schrodinger.seam.testing.benchmarks.GroupByKey

Bases: Benchmark

property gbk_executor

Set up a GroupByKeyExecutor with an input and an output EmbeddedPCollManager.

Groups inputs of type (int, int) into outputs of type (int, Iterable[int]).

run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process
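The (int, int) to (int, Iterable[int]) grouping described for gbk_executor can be illustrated in plain Python. This is only a sketch of the grouping semantics. It does not use GroupByKeyExecutor or EmbeddedPCollManager, and group_by_key is a hypothetical helper.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def group_by_key(pairs: Iterable[Tuple[int, int]]) -> List[Tuple[int, List[int]]]:
    """Collect every value under its key, returning (key, values) pairs."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Sort by key so the output order is deterministic for this example.
    return sorted(grouped.items())


grouped = group_by_key([(1, 10), (2, 20), (1, 11)])
# grouped == [(1, [10, 11]), (2, [20])]
```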

class schrodinger.seam.testing.benchmarks.MaxBundleMemUsage

Bases: Benchmark

Test that bundles loaded directly into memory for processing never exceed the max bundle memory limit.

run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process
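The documentation does not say how memory is measured. As one possible approach, the sketch below records the peak Python-level allocation of a callable with the standard tracemalloc module. peak_memory_bytes is a hypothetical helper, and the real benchmark may well measure process memory in another way.

```python
import tracemalloc


def peak_memory_bytes(func, *args):
    """Return the peak number of bytes allocated by Python while func runs."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def build_list(n):
    # Allocate a list of n references to simulate loading a bundle.
    return [0] * n


peak = peak_memory_bytes(build_list, 100_000)
```

A memory benchmark along these lines could wrap the peak in a BenchmarkResult with units such as "bytes" and compare it against the configured limit.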

class schrodinger.seam.testing.benchmarks.WriteStructuresToFileMemoryBenchmark

Bases: Benchmark

Test that bundles loaded directly into memory for processing never exceed the max bundle memory limit.

run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process

class schrodinger.seam.testing.benchmarks.LargeOutputPerBundleWithInProcessWorkerBenchmark

Bases: Benchmark

Test that running a pipeline where the size of outputs generated per bundle is large doesn’t result in large memory consumption.

UNIT_TEST_N = 5
PERFORMANCE_TEST_N = 2000
run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process

class schrodinger.seam.testing.benchmarks.LargeOutputPerBundleWithSubprocessWorkerBenchmark

Bases: Benchmark

Test that running a pipeline where the size of outputs generated per bundle is large doesn’t result in large memory consumption.

UNIT_TEST_N = 5
PERFORMANCE_TEST_N = 8000
run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process

class schrodinger.seam.testing.benchmarks.LargeOutputPerBundleWithLocalWorkerBenchmark

Bases: Benchmark

Test that running a pipeline where the size of outputs generated per bundle is large doesn’t result in large memory consumption even when using a DoFn marked with @requires_local_execution.

UNIT_TEST_N = 5
PERFORMANCE_TEST_N = 8000
run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process

class schrodinger.seam.testing.benchmarks.WorkerServerWithSortingBundleProcessingTime

Bases: Benchmark

Test that running a pipeline where the size of outputs generated per bundle is large doesn’t result in large memory consumption even when using a DoFn marked with @requires_local_execution.

UNIT_TEST_N = 1000
PERFORMANCE_TEST_N = 1000000
run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process

class schrodinger.seam.testing.benchmarks.WorkerServerBundleProcessingTime

Bases: Benchmark

Test that running a pipeline where the size of outputs generated per bundle is large doesn’t result in large memory consumption even when using a DoFn marked with @requires_local_execution.

UNIT_TEST_N = 1000
PERFORMANCE_TEST_N = 100000
run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process

class schrodinger.seam.testing.benchmarks.PCollFileManager_CreateReadersWithManyDataFiles

Bases: Benchmark

static pcoll_mngr()
run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process

class schrodinger.seam.testing.benchmarks.GroupByKeyLargeValues_runtime

Bases: Benchmark

UNIT_TEST_N = 5
static generate_1M_random_string()
run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process
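generate_1M_random_string() is undocumented. Its name suggests a helper that produces a large random string to use as a GroupByKey value. The sketch below assumes "1M" means one million characters and uses a hypothetical generate_random_string function with an optional seed. The real helper's length, alphabet and signature are not documented.

```python
import random
import string
from typing import Optional


def generate_random_string(length: int = 1_000_000,
                           seed: Optional[int] = None) -> str:
    """Hypothetical: build a random ASCII-letter string of the given length."""
    rng = random.Random(seed)
    return "".join(rng.choices(string.ascii_letters, k=length))


large_value = generate_random_string()
```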

class schrodinger.seam.testing.benchmarks.GroupByKeyDiskSpaceWithLargeValues

Bases: Benchmark

UNIT_TEST_N = 100
static generate_1M_random_string()
run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process

class schrodinger.seam.testing.benchmarks.GroupByKeyLargeElements_memory

Bases: Benchmark

UNIT_TEST_N = 5
PERFORMANCE_TEST_N = 1000
run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process

class schrodinger.seam.testing.benchmarks.GroupByKeyWithManyDataChunkFiles

Bases: Benchmark

Test that running GroupByKey with more data chunk files than the _MAX_OPEN_FILES limit does not lead to excessive memory usage.

UNIT_TEST_N = 5
PERFORMANCE_TEST_N = 1000
run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process

class schrodinger.seam.testing.benchmarks.RedistributingWithLargeElements

Bases: Benchmark

Test that running a pipeline where the size of outputs generated per bundle is large doesn’t result in large memory consumption even when using a DoFn marked with @requires_local_execution.

UNIT_TEST_N = 100
PERFORMANCE_TEST_N = 1000
run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process

class schrodinger.seam.testing.benchmarks.SeamRunnerOverheadBenchmark

Bases: Benchmark

Run a basic Map transform with the SeamRunner to measure the overhead of the SeamRunner itself.

UNIT_TEST_N = 1
PERFORMANCE_TEST_N = 1
run(n: int) → BenchmarkResult
Parameters:

n – The number of elements to process

schrodinger.seam.testing.benchmarks.get_benchmarks()