schrodinger.application.building_block_exploration.bb_explorer_utils module¶

schrodinger.application.building_block_exploration.bb_explorer_utils.existing_file(file_path: str) → str¶: Validate that file exists and is a regular file.

schrodinger.application.building_block_exploration.bb_explorer_utils.get_input_files(input_dir: str, exts: list = ['.csv', '.csv.gz', '.pfx']) → list[str]¶

Get files of given extensions from the input directory. Raises an exception if no files with the given extensions are present.

Parameters:

input_dir – input directory
exts – list of allowed file extensions

Returns:: List of files with the given extensions.
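
The behaviour described above (collect matching files, fail when none are found) can be sketched with the standard library alone. This is an illustration, not the module's implementation: `list_input_files` is a hypothetical name, and the exception type is an assumption, since the documentation only says that an exception is raised.

```python
import os
import tempfile


def list_input_files(input_dir, exts=(".csv", ".csv.gz", ".pfx")):
    """Hypothetical stand-in for get_input_files (not the library code)."""
    matches = sorted(
        os.path.join(input_dir, name)
        for name in os.listdir(input_dir)
        if name.endswith(tuple(exts))
        and os.path.isfile(os.path.join(input_dir, name))
    )
    if not matches:
        # Assumed exception type; the source only says "raises an exception".
        raise FileNotFoundError(
            f"No files with extensions {list(exts)} in {input_dir}")
    return matches


def demo():
    """Create a scratch directory and list only the allowed files."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.csv", "b.csv.gz", "notes.txt"):
            with open(os.path.join(tmp, name), "w") as handle:
                handle.write("")
        return [os.path.basename(path) for path in list_input_files(tmp)]
```

Here `demo()` would list `a.csv` and `b.csv.gz` and skip `notes.txt`.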

schrodinger.application.building_block_exploration.bb_explorer_utils.set_input_files(jsb, args)¶: Set input files for the job specification builder based on the provided arguments.

schrodinger.application.building_block_exploration.bb_explorer_utils.set_immutable_input_files(jsb, args)¶: Set input files that are supposed to be immutable, for the job specification builder.

schrodinger.application.building_block_exploration.bb_explorer_utils.set_mutable_input_files(jsb, args)¶: Set input files that are supposed to be mutable, for the job specification builder.

schrodinger.application.building_block_exploration.bb_explorer_utils.perform_synthesis(route_name: str, route_object: RouteNode, reagent_files: list[str], max_products: int, product_file: str, num_synthesis_jobs: int, systematic: bool = False, sort_by_index_sum: bool = False, start: int = 0, stop: Optional[int] = None, product_property_filter: Optional[str] = None, product_smarts_filter: Optional[str] = None, logfile_archive: Optional[str] = None, random_seed: Optional[int] = None)¶

Performs combinatorial synthesis for the given synthetic route using the specified reagent files as sources. We do not deduplicate the products here because we do it later in the workflow and there is no need to do it twice.

Parameters:

route_name – Name of the route to run synthesis for.
route_object – Route node to use for the synthesis.
reagent_files – list of reagent files associated with the route which contain building blocks to use in the synthesis.
max_products – Maximum number of products to synthesize.
product_file – Path to the file where the synthesized products will be written.
num_synthesis_jobs – Number of synthesis subjobs to run in parallel.
systematic – Whether to do systematic enumeration or pick reagents randomly.
sort_by_index_sum – perform systematic enumeration in the order of the sum of reagent indices within the source files rather than in the order of Cartesian product of the sources. Only applicable if systematic is True.
product_property_filter – A json file that contains a list of property filters. Only products that match the filters will be included in the output.
product_smarts_filter – A json file that contains a list of SMARTS based filters. Only products that match the filters will be included in the output.
logfile_archive – A zip file containing all the subjob logs. Logs from this function are added to this zip file.
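
To make the difference between the two systematic orderings concrete, the sketch below enumerates reagent combinations by the sum of their indices instead of in plain Cartesian-product order. It is only an illustration of the ordering described for sort_by_index_sum: `enumerate_by_index_sum` is a hypothetical helper, the tie-breaking rule is an assumption, and the sketch materializes the full product, which a real enumeration over large sources would presumably avoid.

```python
from itertools import product


def enumerate_by_index_sum(sources):
    """Yield reagent combinations ordered by the sum of their indices."""
    index_ranges = [range(len(source)) for source in sources]
    # sorted() is stable, so combinations with equal index sums keep
    # their Cartesian-product order (an assumed tie-breaking rule).
    for indices in sorted(product(*index_ranges), key=sum):
        yield tuple(source[i] for source, i in zip(sources, indices))


SOURCES = [["A0", "A1"], ["B0", "B1", "B2"]]
CARTESIAN_ORDER = list(product(*SOURCES))
INDEX_SUM_ORDER = list(enumerate_by_index_sum(SOURCES))
```

In Cartesian order the pair (A0, B2) comes before (A1, B0); ordered by index sum, (A1, B0) comes first, so reagents near the top of every source file are combined early.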

schrodinger.application.building_block_exploration.bb_explorer_utils.process_raw_building_blocks(building_blocks_file: str, route_dict: dict, reagent_classes_file: Optional[str] = None, logfile_archive: Optional[str] = None) → str¶

Process the raw building blocks by 1) running the deprotection reaction to enhance the building blocks and 2) creating reagent libraries for all the reagent classes required by the routes we are running.

Parameters:

building_blocks_file – Path to the raw input building blocks file.
route_dict – Dictionary of route names to route objects.
logfile_archive – A zip file containing all the subjob logs. Logs from this function are added to this zip file.

Returns:

Path to the directory containing the processed building blocks.

schrodinger.application.building_block_exploration.bb_explorer_utils.add_deprotected_reagents(building_blocks_file: str, logfile_archive: Optional[str] = None) → str¶

Add deprotected reagents to the raw input building blocks file. Currently we only run BOC deprotection because we observed that it has a large impact on the number of compounds we can synthesize for Enamine. The output file has only SMILES and Name columns, so any other columns in the input file are ignored.

Parameters:: building_blocks_file – Path to the raw input building blocks file.
Returns:: Path to the combined file which includes deprotected reagents in addition to the original reagents.

schrodinger.application.building_block_exploration.bb_explorer_utils.run_boc_deprotection(input_file: str, output_file: str, logfile_archive: Optional[str] = None)¶: Runs BOC deprotection on the input file and writes the results to the output file.

schrodinger.application.building_block_exploration.bb_explorer_utils.run_create_reagent_library(bb_file, route_dict, reagent_classes_file=None, logfile_archive: Optional[str] = None) → str¶

Creates reagent libraries for all the reagent classes required by the provided routes.

Parameters:

bb_file – Path to the building blocks file.
route_dict – Dictionary of route names to route objects.
reagent_classes_file – Path to the reagent classes file. If not provided, the default one from mmshare will be used.

Returns:

Path to the directory containing the reagent libraries.

schrodinger.application.building_block_exploration.bb_explorer_utils.filter_reagent_classes(reagent_classes: dict, route_dict: dict) → dict¶

Filter the reagent classes and return only the classes that are used in the routes in the provided routes dictionary.

Parameters:

reagent_classes – Dictionary of reagent classes.
route_dict – Dictionary of routes.

Returns:

Dictionary of filtered reagent classes.

schrodinger.application.building_block_exploration.bb_explorer_utils.update_route_dict(route_dict: dict, bb_dir: str) → dict¶

Update the route dictionary so that the workflow runs only the routes whose required reagents are all present in the building blocks directory.

Parameters:

route_dict – Dictionary of route names to route objects.
bb_dir – Directory containing the building blocks.

Returns:

Updated routes dictionary.

schrodinger.application.building_block_exploration.bb_explorer_utils.is_reagent_file_valid(reagent_name: str, reagent_file: str, bb_dir: str) → bool¶: Check if the reagent file is valid and is present in the building blocks directory. A valid reagent file is either a .csv or .pfx file with at least one reagent.

schrodinger.application.building_block_exploration.bb_explorer_utils.validate_input_bb_file(input_bb_file: str)¶

Validate the input building blocks file to ensure it has the required columns.

Parameters:: input_bb_file – Path to the input building blocks file.

schrodinger.application.building_block_exploration.bb_explorer_utils.get_smi_name_index(input_file: str)¶

Get the index of the SMILES and Name columns in the input file.

Parameters:: input_file – Path to the input file.
Returns:: Tuple of (SMILES index, Name index, msg), where msg is None if both the SMILES and Name columns are found and an appropriate error message otherwise.
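
The three-element return value is easiest to see in a small sketch. The version below works on csv text rather than a file path and assumes exact, case-sensitive `SMILES` and `Name` headers; both choices, the function name `smi_name_index`, and the wording of the message are assumptions made for illustration.

```python
import csv
import io


def smi_name_index(csv_text):
    """Return (smiles_index, name_index, msg) for the csv header."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    missing = [col for col in ("SMILES", "Name") if col not in header]
    if missing:
        # No indices are reported when a required column is absent.
        return None, None, "Missing required column(s): " + ", ".join(missing)
    return header.index("SMILES"), header.index("Name"), None


GOOD = smi_name_index("Name,SMILES,MW\nethanol,CCO,46.07\n")
BAD = smi_name_index("SMILES,MW\nCCO,46.07\n")
```

A caller can then test `msg` once instead of handling an exception.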

schrodinger.application.building_block_exploration.bb_explorer_utils.validate_routes_file(routes_file)¶

schrodinger.application.building_block_exploration.bb_explorer_utils.filter_through_bloom_filter(input_file: str, bloom_filter_path: Path, num_to_keep: int, chunk_size: int = 10000, csv_column_name: str = 's_rdkit_InchiKey') → str¶

Filter the input file through a bloom filter and return a file containing ligands that are present in the bloom filter. Maximum number of ligands it keeps is specified by the num_to_keep parameter and the function stops going through the input file once it finds the maximum number of filtered ligands.

Parameters:

input_file – Path to the input file. This file is removed after filtering is done and we only keep the filtered file.
bloom_filter_path – Path to the bloom filter file or directory.
num_to_keep – Maximum number of compounds to keep after filtering.
chunk_size – Number of elements to load in memory and process in each chunk through the bloom filter.
csv_column_name – Name of the column in the input csv that contains the entries that are stored in the Bloom filter

Returns:

Path to the filtered output file.
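
The chunked filtering with early stopping described above can be sketched as follows. A plain Python set stands in for the Bloom filter (any object supporting `in` would do), rows are dictionaries rather than lines of a file, and `filter_in_chunks` is a hypothetical name; none of this is taken from the module itself.

```python
from itertools import islice


def filter_in_chunks(rows, membership, num_to_keep, chunk_size=10000,
                     column="s_rdkit_InchiKey"):
    """Keep rows whose key is in `membership`, stopping at num_to_keep."""
    kept = []
    rows = iter(rows)
    while len(kept) < num_to_keep:
        # Load at most chunk_size rows into memory at a time.
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            break  # input exhausted before the quota was reached
        for row in chunk:
            if row[column] in membership:
                kept.append(row)
                if len(kept) == num_to_keep:
                    break
    return kept


ROWS = [{"s_rdkit_InchiKey": key} for key in "ABCDEF"]
KEPT = filter_in_chunks(ROWS, {"B", "D", "F"}, num_to_keep=2, chunk_size=2)
```

Because the loop stops as soon as the quota is met, the rows for `E` and `F` are never read in this example.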

schrodinger.application.building_block_exploration.bb_explorer_utils.get_csv_header_and_column_index(input_file: str, column_name: str)¶

Get the header and index of the specified column in the input csv file.

Parameters:

input_file – Path to the input csv file.
column_name – Name of the column to get the index for.

Returns:

Tuple of (header, column_index) where header is a list of column names and column_index is the index of the specified column.

schrodinger.application.building_block_exploration.bb_explorer_utils.create_state_dict(args) → dict¶: Creates a state dictionary to keep track of the workflow state. It determines the details of every cycle based on the num_to_dock parameter within the args. The protocol for determining these details is manually implemented and was found by trial and error to most efficiently get to the best scoring building blocks.

schrodinger.application.building_block_exploration.bb_explorer_utils.add_cycle_block(state_dict, cycle_params, num_cycles, num_left, curr_cycle_num, max_dock_cycle)¶

Helper function to add a block of cycles to the state dictionary.

Parameters:

state_dict (dict) – The state dictionary to update.
cycle_params (dict) – The cycle parameters to use for this block of cycles.
num_cycles (int) – The number of cycles to add.
num_left (int) – Total number of compounds left to be added to cycles.
curr_cycle_num (int) – The current cycle number.
max_dock_cycle (int) – The maximum number of compounds to add to a single cycle.

schrodinger.application.building_block_exploration.bb_explorer_utils.update_state_file(state_file_name, state_dict)¶

class schrodinger.application.building_block_exploration.bb_explorer_utils.Product(InchiKey: str, reagent_smis: ~typing.List[str] = <factory>, SMILES: str = '', docking_score: float = 100.0, route_name: str = '', cycle_name: str = '', pre_ligprep_SMILES: str = '')¶

Bases: object

Class to represent a product compound synthesized using a route. Contains the following information about the product:

InchiKey – InchiKey of the product.
reagent_smis – List of SMILES of the reagents used to synthesize the product.
SMILES – SMILES of the product.
docking_score – Docking score of the product.
route_name – Name of the synthetic route used to synthesize the product.
cycle_name – Name of the cycle in which the product was synthesized during the current job.
pre_ligprep_SMILES – SMILES of the product compound before running it through ligprep; same as SMILES if ligprep was not run.

InchiKey: str¶

reagent_smis: List[str]¶

SMILES: str = ''¶

docking_score: float = 100.0¶

route_name: str = ''¶

cycle_name: str = ''¶

pre_ligprep_SMILES: str = ''¶

classmethod fromCsvRow(row: dict, column_names: dict = {'InchiKey': 'InchiKey', 'SMILES': 'SMILES', 'cycle_name': 'cycle_name', 'docking_score': 'docking_score', 'reagent1': 'reagent1', 'reagent2': 'reagent2', 'reagent3': 'reagent3', 'route_name': 'route_name'})¶

Create a Product object from a csv row.

Parameters:

row – A dictionary representing a row from a csv file.
column_names – A dictionary mapping the expected column names required for the Product object to the actual column names in the csv file.
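
The role of column_names is clearest with a csv file whose headers differ from the defaults. The sketch below uses a trimmed-down, hypothetical `ProductSketch` dataclass; how the real class maps the `reagent1` to `reagent3` columns onto `reagent_smis` is not documented here, so collecting the non-empty ones in order is an assumption, and the row values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProductSketch:
    """Hypothetical, trimmed-down stand-in for Product."""
    InchiKey: str
    reagent_smis: List[str] = field(default_factory=list)
    SMILES: str = ""
    docking_score: float = 100.0

    @classmethod
    def from_csv_row(cls, row: Dict[str, str], column_names: Dict[str, str]):
        # Assumed rule: gather the non-empty reagent columns in order.
        reagents = [
            row[column_names[key]]
            for key in ("reagent1", "reagent2", "reagent3")
            if row.get(column_names[key])
        ]
        return cls(
            InchiKey=row[column_names["InchiKey"]],
            reagent_smis=reagents,
            SMILES=row.get(column_names["SMILES"], ""),
            docking_score=float(row.get(column_names["docking_score"], 100.0)),
        )


# Keys are the names the class expects; values are this file's headers.
COLUMN_NAMES = {
    "InchiKey": "key", "SMILES": "smi", "docking_score": "score",
    "reagent1": "r1", "reagent2": "r2", "reagent3": "r3",
}
ROW = {"key": "ABC", "smi": "CCO", "score": "-7.5",
       "r1": "CC", "r2": "O", "r3": ""}
PRODUCT = ProductSketch.from_csv_row(ROW, COLUMN_NAMES)
```

With the default mapping, where every key maps to itself, the same code reads a file written with the workflow's own headers.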

toCsvRow(column_names: dict = {'InchiKey': 'InchiKey', 'SMILES': 'SMILES', 'cycle_name': 'cycle_name', 'docking_score': 'docking_score', 'reagent1': 'reagent1', 'reagent2': 'reagent2', 'reagent3': 'reagent3', 'route_name': 'route_name'})¶

Get a csv row as a dictionary to be used with csv.DictWriter from a Product object.

Parameters:: column_names – A dictionary mapping the expected column names required for the Product object to the actual column names in the csv file.

Returns:: A dictionary representing a row for the csv file corresponding to the Product object.

__init__(InchiKey: str, reagent_smis: ~typing.List[str] = <factory>, SMILES: str = '', docking_score: float = 100.0, route_name: str = '', cycle_name: str = '', pre_ligprep_SMILES: str = '') → None¶

class schrodinger.application.building_block_exploration.bb_explorer_utils.ReagentDistribution(reagent_smiles: ~typing.List[str] = <factory>, smi_index_map: ~typing.Dict[str, int] = <factory>, means: ~numpy.ndarray = <factory>, stds: ~numpy.ndarray = <factory>, prior_variance: float = 1.0, rng: ~numpy.random._generator.Generator = <factory>)¶

Bases: object

Class to keep track of the distribution of docking scores for all the reagents in a given reagent class for a given synthetic route. Contains the following information about the reagent class:

reagent_smiles – List of SMILES of the reagents in the reagent class for a given route.
smi_index_map – A dictionary mapping reagent SMILES to their index in the corresponding reagent_smiles list.
means – A numpy array containing the mean docking score for each reagent in the corresponding reagent_smiles list.
stds – A numpy array containing the standard deviations of docking scores for each reagent in the corresponding reagent_smiles list.
prior_variance – Prior variance determined by the warmup phase; it is the same for all the reagents in the reagent class.
rng – Random number generator to be used for sampling scores.

reagent_smiles: List[str]¶

smi_index_map: Dict[str, int]¶

means: ndarray¶

stds: ndarray¶

prior_variance: float = 1.0¶

rng: Generator¶

initializeDataFromSmiList()¶: Initialize the means and stds arrays based on the reagent_smiles list.

setPrior(prior_mean: float, prior_std: float)¶: Set the prior mean and standard deviation for the reagent class.

updateBayesian(sum_of_scores: ndarray, observed_counts: ndarray)¶

Update the mean and standard deviation array using Bayesian updating assuming that the scores for each reagent are independent and normally distributed.

Parameters:

sum_of_scores – Array containing the sum of docking scores for each reagent in the current batch. The size of the array is equal to the number of reagents in the reagent class.
observed_counts – Array containing the number of times each reagent was used to synthesize a compound in the current batch. The size of the array is equal to the number of reagents in the reagent class.
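
For independent, normally distributed scores, one standard way to perform such an update is the conjugate normal model with a known observation variance. The sketch below shows that textbook update on plain Python lists; whether updateBayesian uses exactly these formulas is not stated in this documentation, and `noise_variance` is a hypothetical parameter introduced for the illustration.

```python
def bayesian_update(means, variances, sum_of_scores, observed_counts,
                    noise_variance=1.0):
    """Conjugate normal update per reagent (illustrative, stdlib only)."""
    new_means, new_variances = [], []
    for mean, var, total, count in zip(
            means, variances, sum_of_scores, observed_counts):
        # Precisions (inverse variances) add across prior and observations.
        precision = 1.0 / var + count / noise_variance
        new_variances.append(1.0 / precision)
        # Precision-weighted blend of the prior mean and the observed sum.
        new_means.append((mean / var + total / noise_variance) / precision)
    return new_means, new_variances


# Reagent 0 was observed once with score -8.0; reagent 1 was not observed.
MEANS, VARIANCES = bayesian_update(
    means=[0.0, 0.0], variances=[1.0, 1.0],
    sum_of_scores=[-8.0, 0.0], observed_counts=[1, 0])
```

A reagent with no observations in the batch keeps its prior, while an observed reagent moves toward its data and its variance shrinks.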

getReagentIndex(reagent_smi: str) → int¶

Get the index of the reagent in the reagent_smiles list.

Parameters:: reagent_smi – SMILES of the reagent.
Returns:: Index of the reagent in the reagent_smiles list.

sampleScores() → ndarray¶: Samples scores for all the reagents in the reagent class assuming a Gaussian distribution with mean and variance equal to the reagent means and variances.

getTopReagents(num_to_get: int) → List[str]¶

Get the top reagents based on the sampled scores. This function assumes that a lower score is better, for instance, docking scores.

Parameters:: num_to_get – Number of top reagents to get.
Returns:: List of SMILES of the top reagents.
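
Ranking reagents by scores drawn from their current distributions, rather than by their means, is the idea behind Thompson sampling. The sketch below combines the sampling and ranking steps using the standard library's `random.Random` in place of a numpy Generator; the name `top_reagents` and the one-letter reagent labels are placeholders, not part of the module.

```python
import random


def top_reagents(reagent_smiles, means, stds, num_to_get, rng):
    """Draw one score per reagent and return the lowest-scoring ones."""
    sampled = [rng.gauss(mean, std) for mean, std in zip(means, stds)]
    # Lower is better (e.g. docking scores), so sort ascending.
    order = sorted(range(len(reagent_smiles)), key=lambda i: sampled[i])
    return [reagent_smiles[i] for i in order[:num_to_get]]


RNG = random.Random(0)
# With zero spread the draw equals the mean, so the ranking is fixed.
CERTAIN = top_reagents(["a", "b", "c"], [-5.0, -9.0, -7.0],
                       [0.0, 0.0, 0.0], 2, RNG)
# With non-zero spread an uncertain reagent can occasionally rank first.
UNCERTAIN = top_reagents(["a", "b", "c"], [-5.0, -9.0, -7.0],
                         [3.0, 3.0, 3.0], 2, RNG)
```

Reagents with wide distributions are therefore still explored from time to time even when their mean score is mediocre.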

sampleReagentsRoulette(num_to_sample: int, temperature: float) → List[str]¶

Samples reagents for the reagent class using roulette wheel sampling. The sampling is done with replacement, so the same reagent can be sampled multiple times and num_to_sample may be greater than the number of reagents in the reagent class.

Parameters:

num_to_sample – Number of reagents to sample.
temperature – Temperature parameter to control the exploration vs exploitation trade-off. A higher temperature will result in more exploration, while a lower temperature will result in more exploitation. Note that if a lower score is better, then the temperature should be negative.

Returns:

List of SMILES of the sampled reagents.
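
One common form of temperature-controlled roulette wheel sampling weights each item by exp(score / temperature), which is consistent with the note that the temperature should be negative when lower scores are better. The sketch below assumes that weighting; the actual weight function used by sampleReagentsRoulette is not given in this documentation, and `roulette_sample` is a hypothetical name.

```python
import math
import random


def roulette_sample(reagent_smiles, scores, num_to_sample, temperature, rng):
    """Sample with replacement, weighting by exp(score / temperature)."""
    exponents = [score / temperature for score in scores]
    # Subtract the largest exponent so math.exp cannot overflow.
    shift = max(exponents)
    weights = [math.exp(exponent - shift) for exponent in exponents]
    return rng.choices(reagent_smiles, weights=weights, k=num_to_sample)


RNG = random.Random(42)
# A negative temperature makes the lower (better) score the likelier pick.
PICKS = roulette_sample(["good", "bad"], [-10.0, 0.0], 1000, -1.0, RNG)
```

Because sampling is with replacement, asking for 1000 draws from two reagents is valid; a temperature of larger magnitude flattens the weights and spreads the draws more evenly.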

__init__(reagent_smiles: ~typing.List[str] = <factory>, smi_index_map: ~typing.Dict[str, int] = <factory>, means: ~numpy.ndarray = <factory>, stds: ~numpy.ndarray = <factory>, prior_variance: float = 1.0, rng: ~numpy.random._generator.Generator = <factory>) → None¶

class schrodinger.application.building_block_exploration.bb_explorer_utils.RouteData(random_seed: Optional[int] = None)¶

Bases: object

Class to keep track of the data for each route. Contains the following information:

route_object – RouteNode object representing the synthetic route. This is used to run combinatorial synthesis for the route.
mean_score – Mean docking score of the compounds synthesized using this route.
mean_squared_score – Mean of the squared docking scores of the compounds synthesized using this route.
num_compounds_docked – Number of compounds synthesized and docked using this route so far.
fraction_of_total_compounds – Fraction of the total compounds in the job that were synthesized using this route. This is used to filter out routes that have very low yields.
estimated_in_library_tries – Estimated number of tries it would take to successfully synthesize an in-library compound using this route. This is only used when a bloom filter is used to filter out out-of-library compounds. At the start, it is set to 10 for 1-step routes and 100 for 2-step routes. It is then updated, up to a user-specified maximum, if these numbers are found to be too low for the user-provided bloom filter.
reagent_class_distributions – List of ReagentDistribution objects representing the distribution of docking scores for all the reagents in each reagent class for this route. The order of the reagent classes is the same as the order of the starting nodes in the route object.
max_potential_products – Product of the sizes of the reagent classes; this represents the maximum number of potential products that can be synthesized using this route.

__init__(random_seed: Optional[int] = None)¶

property std_score¶

updateRouteDistribution(total_score, total_squared_score, count)¶

Update the mean and standard deviation of the docking scores for the current route.

Parameters:

total_score – Sum of docking scores of the compounds synthesized using this route in the current batch.
total_squared_score – Sum of the squared docking scores of the compounds synthesized in the current batch.
count – Number of compounds synthesized using this route in the current batch.
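
Keeping a running mean and a running mean of squares is enough to recover the standard deviation without storing individual scores. The sketch below shows that bookkeeping with a hypothetical `RouteStats` class; it mirrors the attribute names documented for RouteData, but whether the real class uses the population formula shown here is an assumption.

```python
import math


class RouteStats:
    """Hypothetical running statistics for one route."""

    def __init__(self):
        self.mean_score = 0.0
        self.mean_squared_score = 0.0
        self.num_compounds_docked = 0

    def update(self, total_score, total_squared_score, count):
        n_old = self.num_compounds_docked
        n_new = n_old + count
        if n_new == 0:
            return  # nothing docked yet, keep the initial values
        # Fold the batch sums into the running averages.
        self.mean_score = (self.mean_score * n_old + total_score) / n_new
        self.mean_squared_score = (
            self.mean_squared_score * n_old + total_squared_score) / n_new
        self.num_compounds_docked = n_new

    @property
    def std_score(self):
        # Var[x] = E[x^2] - E[x]^2, clamped against round-off.
        variance = self.mean_squared_score - self.mean_score ** 2
        return math.sqrt(max(variance, 0.0))


STATS = RouteStats()
# Batch of two compounds with docking scores -6 and -8.
STATS.update(total_score=-14.0, total_squared_score=100.0, count=2)
```

After this batch the running mean is -7 and the standard deviation is 1.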

updateReagentClassDistributions(reagent_classes_scores: List[ndarray], observed_counts: List[ndarray])¶

Update the docking score distribution for all reagents for all reagent classes used for this route. The updates are performed using Bayesian update formulae.

Parameters:

reagent_classes_scores – List of arrays containing the sum of docking scores for each reagent in the route in the current batch. Each array in the list corresponds to a different reagent class for the route, and the size of each array is equal to the number of reagents in the corresponding reagent class.
observed_counts – List of arrays containing the number of times each reagent in the route was used to synthesize a compound in the current batch. Each array in the list corresponds to a different reagent class for the route, and the size of each array is equal to the number of reagents in the corresponding reagent class.

sampleScore()¶

Samples a score for the route assuming a Gaussian distribution with mean and variance equal to the route mean and variance.

schrodinger.application.building_block_exploration.bb_explorer_utils.setup_routes_data(seed_file: str, reactions_dict: dict, routes_file: Optional[str] = None, random_seed: Optional[int] = None) → dict¶

Create a dictionary of RouteData objects for each route present in the seed file.

Parameters:

seed_file – Path to the seed file.
reactions_dict – Dictionary of reaction names to Reaction objects.
routes_file – Path to the routes file. If not provided, the default one from mmshare will be used.

Returns:

Dictionary mapping route names to RouteData objects.

schrodinger.application.building_block_exploration.bb_explorer_utils.parse_seed_file(seed_file_reader: Iterable[dict], route_dict: dict, random_seed: Optional[int] = None) → dict¶: Helper function to read the seed file and create a dictionary of RouteData objects for each route present in the seed file.

schrodinger.application.building_block_exploration.bb_explorer_utils.yield_compounds_from_seed_file(seed_file: str, num_compounds: int, done_compounds: set, random_seed=None)¶

Randomly selects num_compounds compounds from the seed_file and yields Product objects for the selected compounds.

Parameters:

seed_file – Path to the seed file.
num_compounds – Number of compounds to select.
done_compounds – Set of InchiKeys of compounds that have already been processed in this job. The compounds that this function yields are also added to this set.

Yields:

Product objects for the selected compounds.

schrodinger.application.building_block_exploration.bb_explorer_utils.pick_routes(route_data_dict: dict, num_routes: int) → list[str]¶: Picks top num_routes routes based on their sample scores. It also filters out routes for which fraction of total compounds is less than a tenth of minimum assigned weight amongst all routes. This is because those routes have low yields and will take longer to synthesize.
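
The two steps described for pick_routes, a yield filter followed by a ranking on sampled scores, can be sketched as below. The inputs are plain dictionaries instead of RouteData objects, `pick_routes_sketch` is a hypothetical name, and it is an assumption that the "assigned weight" is the per-route fraction produced by get_route_weights and that lower sampled scores rank first.

```python
def pick_routes_sketch(sampled_scores, fractions, weights, num_routes):
    """Drop low-yield routes, then keep the best sampled scores."""
    # Routes below a tenth of the smallest assigned weight are skipped.
    threshold = 0.1 * min(weights.values())
    eligible = [name for name in sampled_scores
                if fractions[name] >= threshold]
    # Lower (more negative) sampled scores are treated as better.
    eligible.sort(key=lambda name: sampled_scores[name])
    return eligible[:num_routes]


PICKED = pick_routes_sketch(
    sampled_scores={"r1": -8.0, "r2": -9.0, "r3": -7.0},
    fractions={"r1": 0.50, "r2": 0.01, "r3": 0.49},
    weights={"r1": 0.4, "r2": 0.3, "r3": 0.3},
    num_routes=2,
)
```

Route `r2` has the best sampled score but is excluded because its share of compounds (0.01) is below the threshold (0.03).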

schrodinger.application.building_block_exploration.bb_explorer_utils.pick_reagents_for_route(route_data: RouteData, num_samples: list[int], output_reagent_files: list[str], rescore: bool = False, temperature: float = -1.0)¶

Picks a set of reagents for a given route and stores them in the specified reagent files. Starts by sampling scores for all the reagents assuming a normal distribution. If rescore is set to True, it returns the top scoring reagents, up to num_samples. Otherwise it performs a roulette wheel selection using the specified temperature. Note that for non-rescore cycles, the sampling is done with replacement, so num_samples may be more than the number of elements in the associated reagent list.

Parameters:

route_data – The route data object for which to pick reagents.
num_samples – The number of samples to pick for each reagent class.
output_reagent_files – The files to which the selected reagents will be written.
rescore – Whether to select top scoring reagents or do a roulette wheel selection.
temperature – The temperature to use for sampling.

schrodinger.application.building_block_exploration.bb_explorer_utils.create_reagent_source_file(reagent_smi_list: list[str], reagent_file: str) → None¶: Creates a reagent source file containing the SMILES of the reagents in the reagent_smi_list.

schrodinger.application.building_block_exploration.bb_explorer_utils.write_ligand_file_for_docking(compounds_generator: Iterable[Product]) → str¶: Writes the compounds that are to be docked in the current cycle to an input csv file.

schrodinger.application.building_block_exploration.bb_explorer_utils.run_glide(ligand_file: str, glide_grid: str, jobname: str, max_glide_cpu: int = 50, glide_mq: bool = True, extra_docking_config: dict = None, ligprep_args: str = '-pht 1.0 -epik -s16', logfile_archive: str = None) → Tuple[str, str]¶

Runs glide docking on the input ligand file.

Parameters:

ligand_file – Path to the input ligand file.
glide_grid – Path to the glide grid file.
jobname – Job name to use for the glide run.
max_glide_cpu – Maximum number of CPUs to use simultaneously for the glide run.
glide_mq – Whether to run glide docking using ZeroMQ.
extra_docking_config – Extra docking configuration parameters to be added to the glide input file.
ligprep_args – Additional arguments to pass to LigPrep.
logfile_archive – Path to a zip file to which the glide log file will be appended. If None, the log file is not archived.

Returns:

Tuple of (docking_csv_file, libfile) which are the output csv file and pose library structure file from the glide docking run.

schrodinger.application.building_block_exploration.bb_explorer_utils.write_glide_input_file(ligand_file, glide_grid, jobname, extra_docking_config, ligprep_args)¶

Writes a .inp file to be used as an input to glide docking.

Parameters:

ligand_file – Path to the input ligand file.
glide_grid – Path to the glide grid file.
jobname – Job name to use for the glide run.
extra_docking_config – A dictionary that contains extra docking keywords to specify more docking options.
ligprep_args – Additional arguments to pass to LigPrep.

Returns:

Path to the glide .inp input file.

schrodinger.application.building_block_exploration.bb_explorer_utils.process_docking_results(docking_results_csv: str, output_scores_csv: str, docking_results_structure_file: str, cycle_name: str, output_fieldnames: list, curr_top_ligands_pool: str, num_ligands_in_pool: int = 50000) → dict¶

Reads the docking results file and appends the results to output files. Additionally it also returns the dictionary of compounds in the docking results arranged by routes and InchiKeys, i.e. a dictionary for every route which maps InchiKeys of compounds for that route to Product objects.

Parameters:

docking_results_csv – The CSV output file from glide for the current cycle.
output_scores_csv – The output csv file for the job to which the docking results from the current cycle will be appended. Assumes that the file is sorted by docking scores and this function maintains the order.
docking_results_structure_file – The glide pose file for the current cycle.
cycle_name – The name of the current cycle.
output_fieldnames – The field names for the output csv file.
curr_top_ligands_pool – The glide pose library file which contains the poses of the pool of top ligands across all cycles. We merge the current cycle results into this file to maintain the pool of num_ligands_in_pool ligands.
num_ligands_in_pool – The number of ligands in the top ligands pool.

schrodinger.application.building_block_exploration.bb_explorer_utils.read_glide_output_file(docking_results_file: str) → dict¶

Reads the glide output file and stores the products in a dictionary along with their docking scores and building block information.

Parameters:: docking_results_file – The CSV output file from glide.
Returns:: Results from the glide output file organized by routes. The output dictionary contains a dictionary for every route which maps InchiKeys of compounds for that route to Product objects.

schrodinger.application.building_block_exploration.bb_explorer_utils.add_products_to_sorted_output_file(sorted_products: Iterable[Product], sorted_output_file: str, output_fieldnames: List[str], cycle_name: str) → None¶

Appends the sorted products to the sorted output file maintaining the order by docking score.

Parameters:

sorted_products – An iterable of Product objects sorted by docking score.
sorted_output_file – Output file to which the products will be added.
output_fieldnames – List of fieldnames in the output file header.
cycle_name – Cycle name for the current cycle.
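
Since both the existing output and the new products are already sorted by docking score, the order can be maintained with a single linear merge instead of a full re-sort. The sketch below shows that idea on in-memory lists of (name, score) pairs using `heapq.merge`; the real function works on a csv file, and its actual strategy is not described here, so this is only an illustration.

```python
import heapq


def merge_sorted_scores(existing, new):
    """Merge two score-sorted lists of (name, docking_score) pairs."""
    # heapq.merge walks both inputs once and preserves sorted order.
    return list(heapq.merge(existing, new, key=lambda item: item[1]))


EXISTING = [("a", -9.0), ("b", -7.0)]
NEW_CYCLE = [("c", -8.0), ("d", -6.0)]
MERGED = merge_sorted_scores(EXISTING, NEW_CYCLE)
```

The merged list keeps the best (lowest) scores first: a, c, b, d.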

schrodinger.application.building_block_exploration.bb_explorer_utils.add_poses_to_output_lib_file(docking_results_structure_file: str, curr_top_ligands_pool: str, num_ligands_in_pool: int = 50000) → None¶

Sort and merge the new docking results with the pool of top ligands. It retains up to num_ligands_in_pool ligands from current and previous cycles combined.

Parameters:

docking_results_structure_file – Output structure file with new docking results from the current cycle.
curr_top_ligands_pool – The structure file containing the pool of top ligands across all cycles. This file is updated by this function to include the new docking results.
num_ligands_in_pool – Number of top ligands to retain in the pool.

schrodinger.application.building_block_exploration.bb_explorer_utils.write_output_structure_files(route_data_dict: Dict[str, RouteData], curr_top_ligands_pool: str, output_top_hits_structure_file: str, output_top_hits_by_route_file: str, num_hits_to_report: int = 5000, num_top_routes_to_report: int = 5, num_ligands_each_top_route: int = 1000)¶

Takes in the structure file containing a pool of top ligands across all cycles, extracts the top hits (by default up to 5k) from the pool, and writes them to a file. It also extracts the top hits (by default up to 1k) for each of the top routes (by default up to 5), based on route mean docking scores, and writes them to a separate file.

Parameters:

route_data_dict – Dictionary mapping route names to RouteData objects.
curr_top_ligands_pool – The structure file containing the pool of top ligands across all cycles.
output_top_hits_structure_file – The output structure file with top hits extracted from the top ligands pool.
output_top_hits_by_route_file – The output structure file with top hits for each one of the top routes extracted from the top ligands pool.
num_hits_to_report – Number of top hits to extract from the top ligands pool.
num_top_routes_to_report – Number of top routes to consider for extracting top hits by route.
num_ligands_each_top_route – Number of top hits to extract for each of the top routes.

schrodinger.application.building_block_exploration.bb_explorer_utils.write_top_hits_structure_file(curr_top_ligands_pool: str, output_top_hits_structure_file: str, num_hits_to_report: int = 5000)¶: Writes the top hits (by default up to 5k) from the current top ligands pool.

schrodinger.application.building_block_exploration.bb_explorer_utils.write_top_hits_by_route(route_data_dict: Dict[str, RouteData], curr_top_ligands_pool: str, output_top_hits_by_route_file: str, num_top_routes_to_report: int = 5, num_ligands_each_top_route: int = 1000)¶

Writes the top hits drawn from the current top ligands pool for each of the top routes into the output file. Top routes are selected based on their mean docking scores.

Parameters:

route_data_dict – Dictionary mapping route names to RouteData objects.
curr_top_ligands_pool – The structure file containing the pool of top ligands across all cycles.
output_top_hits_by_route_file – The output structure file with top hits for each one of the top routes.
num_top_routes_to_report – Number of top routes to report the hits for.
num_ligands_each_top_route – Number of top hits to extract for each of the top routes.

schrodinger.application.building_block_exploration.bb_explorer_utils.update_route_fractions(route_data_dict, num_docked)¶: Updates the fraction of total compounds for each route in the route_data_dict.

schrodinger.application.building_block_exploration.bb_explorer_utils.get_args_from_state_file(state_file: str)¶

Read job arguments from a state file.

Parameters:: state_file – Path to the state file.
Returns:: Parsed arguments.

schrodinger.application.building_block_exploration.bb_explorer_utils.get_previous_output_file_names(restart_state_file: str) → Tuple[str, str]¶

Returns the previous output file names from the restart state file. These include the docking scores file and the glide lib file containing top ligands from the previous run.

Parameters:: restart_state_file – Path to the restart state file.
Returns:: Previous output file names.

schrodinger.application.building_block_exploration.bb_explorer_utils.get_state_dict_for_restart(args) → dict¶

Reads the state file and returns the state dictionary and the previous docking scores file name.

Parameters:: args – Arguments passed to the workflow.
Returns:: Tuple of state dictionary and restart docking file name.

schrodinger.application.building_block_exploration.bb_explorer_utils.update_state_dict_for_restart(state_dict, num_to_dock, previous_num_to_dock)¶: Updates the state dictionary for a restart with a different num_to_dock value.

schrodinger.application.building_block_exploration.bb_explorer_utils.yield_products_from_csv_file(products_file: str, column_names: dict = {'InchiKey': 'InchiKey', 'SMILES': 'SMILES', 'cycle_name': 'cycle_name', 'docking_score': 'docking_score', 'reagent1': 'reagent1', 'reagent2': 'reagent2', 'reagent3': 'reagent3', 'route_name': 'route_name'})¶

Yields Product objects from a csv file. The csv file must have at least an InchiKey column.

Parameters:

products_file – Path to the previous docking scores file; must be generated by this workflow.
column_names – Dictionary mapping the column names used to specify a Product object to the column names in the docking scores csv file.

Yields:

Product object for each row in the file.

schrodinger.application.building_block_exploration.bb_explorer_utils.check_available_memory(num_to_dock: int, num_seed_compounds: int)¶: Checks if there is enough available memory to run the workflow. Raises RuntimeError if not enough memory is available.

schrodinger.application.building_block_exploration.bb_explorer_utils.calculate_product_space(route_dict: dict, bb_dir: str) → dict¶

Calculates and returns maximum possible products for all the synthetic routes based on provided building blocks.

Parameters:

route_dict – Dictionary mapping route names to Route objects.
bb_dir – Path to the building block directory.

Returns:

Dictionary mapping route names to the maximum potential number of compounds we can synthesize from the given routes and building blocks.
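
The maximum product count of a route is the product of the sizes of its reagent classes, as noted for max_potential_products. The sketch below computes that from per-class reagent counts supplied directly; the real function derives the counts from route objects and the building blocks directory, and the route names and numbers here are made up for illustration.

```python
import math


def product_space(reagent_counts_by_route):
    """Map each route to the product of its reagent class sizes."""
    return {
        route: math.prod(counts)
        for route, counts in reagent_counts_by_route.items()
    }


# Hypothetical routes: a two-component and a three-component reaction.
SPACE = product_space({
    "route_a": [100, 250],
    "route_b": [40, 30, 2],
})
```

Here `route_a` could yield at most 25000 products and `route_b` at most 2400.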

schrodinger.application.building_block_exploration.bb_explorer_utils.get_route_counts(route_data_dict: dict, routes_to_include: list) → dict¶: Returns a dictionary mapping route names to maximum potential number of compounds from the given route data.

schrodinger.application.building_block_exploration.bb_explorer_utils.get_route_weights(route_counts: dict) → dict¶

Returns fraction of compounds that should be assigned to each route based on maximum possible products for the route. It balances the fractions by averaging the fraction based on counts and a uniform fraction to avoid routes with very low counts being assigned zero fraction. This also helps in keeping the results diverse across routes.

Parameters:: route_counts – Dictionary mapping route names to counts of total possible products.
Returns:: Dictionary mapping route names to assigned fractions.
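
The balancing described above can be sketched as an equal-weight average of the count-based fraction and a uniform fraction. The equal weighting is an assumption (the documentation only says the two are averaged), and `route_weights_sketch` is a hypothetical name.

```python
def route_weights_sketch(route_counts):
    """Average each route's count-based share with a uniform share."""
    total = sum(route_counts.values())
    uniform = 1.0 / len(route_counts)
    return {
        name: 0.5 * (count / total + uniform)
        for name, count in route_counts.items()
    }


WEIGHTS = route_weights_sketch({"big": 900, "small": 100, "tiny": 0})
```

Even the route with no countable products receives a non-zero share (half of the uniform fraction), and the weights still sum to 1.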