schrodinger.application.building_block_exploration.bb_explorer_utils module

schrodinger.application.building_block_exploration.bb_explorer_utils.get_input_files(input_dir: str, exts: list = ['.csv', '.csv.gz', '.pfx']) list[str]

Get files of given extensions from the input directory. Raises an exception if no files with the given extensions are present.

Parameters:
  • input_dir – input directory

  • exts – list of allowed file extensions

Returns:

List of files with the given extensions.
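The behavior described above can be approximated in plain Python. This is a minimal sketch, not the module's implementation: list_files_with_extensions is a hypothetical stand-in, and both the suffix-based matching and the FileNotFoundError type are assumptions, since the documentation only states that an exception is raised.

```python
import os
import tempfile


def list_files_with_extensions(input_dir, exts=(".csv", ".csv.gz", ".pfx")):
    """Hypothetical stand-in: return sorted paths in input_dir ending with one of exts."""
    matches = sorted(
        os.path.join(input_dir, name)
        for name in os.listdir(input_dir)
        if name.endswith(tuple(exts))
    )
    if not matches:
        # Assumed exception type; the real function may raise something else.
        raise FileNotFoundError(
            f"No files with extensions {list(exts)} found in {input_dir}"
        )
    return matches


# Demonstration on a throwaway directory with two matching files and one other.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("amines.csv", "acids.csv.gz", "notes.txt"):
        open(os.path.join(tmp_dir, name), "w").close()
    found = [os.path.basename(p) for p in list_files_with_extensions(tmp_dir)]

print(found)
```

The sketch uses a tuple for the default extensions to avoid sharing a mutable default between calls.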

schrodinger.application.building_block_exploration.bb_explorer_utils.set_input_files(jsb, args)

Set input files for the job specification builder based on the provided arguments.

schrodinger.application.building_block_exploration.bb_explorer_utils.perform_synthesis(route_name: str, route_object: RouteNode, reagent_files: list[str], max_products: int, product_file: str, num_synthesis_jobs: int, systematic: bool = False, sort_by_index_sum: bool = False, product_property_filter: str = None, product_smarts_filter: str = None, logfile_list: list[str] = None)

Performs combinatorial synthesis for the given synthetic route using the specified reagent files as sources. Products are not deduplicated here; deduplication happens later in the workflow, so doing it at this stage would be redundant.

Parameters:
  • route_name – Name of the route to run synthesis for.

  • route_object – Route node to use for the synthesis.

  • reagent_files – List of reagent files associated with the route; these contain the building blocks to use in the synthesis.

  • max_products – Maximum number of products to synthesize.

  • product_file – Path to the file where the synthesized products will be written.

  • num_synthesis_jobs – Number of synthesis subjobs to run in parallel.

  • systematic – Whether to do systematic enumeration or pick reagents randomly.

  • sort_by_index_sum – Perform systematic enumeration in the order of the sum of the reagent indices within the source files, rather than in the order of the Cartesian product of the sources. Only applicable if systematic is True.

  • product_property_filter – A JSON file that contains a list of property filters. Only products that match the filters will be included in the output.

  • product_smarts_filter – A JSON file that contains a list of SMARTS-based filters. Only products that match the filters will be included in the output.

  • logfile_list – A list to which the name of the synthesis log file will be added.
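To illustrate the difference between the two systematic enumeration orders mentioned for sort_by_index_sum, here is a small sketch in plain Python. It is not the module's implementation: cartesian_order and index_sum_order are hypothetical helpers, ascending index sum is assumed, and ties are broken by Cartesian-product order only because Python's sort is stable.

```python
from itertools import product


def cartesian_order(sizes):
    """All reagent index combinations in plain Cartesian-product order."""
    return list(product(*(range(n) for n in sizes)))


def index_sum_order(sizes):
    """The same combinations, ordered by the sum of their reagent indices.

    sorted() is stable, so combinations with equal sums keep their
    Cartesian-product order (an assumption about tie-breaking).
    """
    return sorted(cartesian_order(sizes), key=sum)


# Two reagent sources with three reagents each.
sizes = [3, 3]
print(cartesian_order(sizes)[:4])  # [(0, 0), (0, 1), (0, 2), (1, 0)]
print(index_sum_order(sizes)[:4])  # [(0, 0), (0, 1), (1, 0), (0, 2)]
```

When a cap such as max_products truncates the enumeration, the two orders select different subsets: Cartesian-product order exhausts the last source before advancing the first, whereas index-sum order visits low-index reagents from every source first.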

schrodinger.application.building_block_exploration.bb_explorer_utils.process_raw_building_blocks(building_blocks_file: str, route_dict: dict, reagent_classes_file: str = None, logfile_list: list[str] = None) str

Process the raw building blocks by 1) running the deprotection reaction to enhance the building blocks and 2) creating reagent libraries for all the reagent classes required by the routes we are running.

Parameters:
  • building_blocks_file – Path to the raw input building blocks file.

  • route_dict – Dictionary of route names to route objects.

Returns:

Path to the directory containing the processed building blocks.

schrodinger.application.building_block_exploration.bb_explorer_utils.add_deprotected_reagents(building_blocks_file: str, logfile_list: list[str] = None) str

Add deprotected reagents to the raw input building blocks file. Currently only BOC deprotection is run, because it was observed to have a large impact on the number of compounds that can be synthesized for Enamine. The output file contains only the SMILES and Name columns, so any other columns in the input file are ignored.

Parameters:

building_blocks_file – Path to the raw input building blocks file.

Returns:

Path to the combined file which includes deprotected reagents in addition to the original reagents.

schrodinger.application.building_block_exploration.bb_explorer_utils.run_boc_deprotection(input_file: str, output_file: str, logfile_list: list[str] = None)

Runs BOC deprotection on the input file and writes the results to the output file.

schrodinger.application.building_block_exploration.bb_explorer_utils.run_create_reagent_library(bb_file, route_dict, reagent_classes_file=None, logfile_list: list[str] = None) str

Creates reagent libraries for all the reagent classes required by the provided routes.

Parameters:
  • bb_file – Path to the building blocks file.

  • route_dict – Dictionary of route names to route objects.

  • reagent_classes_file – Path to the reagent classes file. If not provided, the default one from mmshare will be used.

Returns:

Path to the directory containing the reagent libraries.

schrodinger.application.building_block_exploration.bb_explorer_utils.filter_reagent_classes(reagent_classes: dict, route_dict: dict) dict

Filter the reagent classes and return only the classes that are used in the routes in the provided routes dictionary.

Parameters:
  • reagent_classes – Dictionary of reagent classes.

  • route_dict – Dictionary of routes.

Returns:

Dictionary of filtered reagent classes.

schrodinger.application.building_block_exploration.bb_explorer_utils.update_route_dict(route_dict: dict, bb_dir: str) dict

Update the route dictionary so that the workflow runs only those routes whose required reagents are all present in the building blocks directory.

Parameters:
  • route_dict – Dictionary of route names to route objects.

  • bb_dir – Directory containing the building blocks.

Returns:

Updated routes dictionary.

schrodinger.application.building_block_exploration.bb_explorer_utils.is_reagent_file_valid(reagent_name: str, reagent_file: str, bb_dir: str) bool

Check if the reagent file is valid and is present in the building blocks directory. A valid reagent file is either a .csv or .pfx file with at least one reagent.

schrodinger.application.building_block_exploration.bb_explorer_utils.validate_input_bb_file(input_bb_file: str)

Validate the input building blocks file to ensure it has the required columns.

Parameters:

input_bb_file – Path to the input building blocks file.

schrodinger.application.building_block_exploration.bb_explorer_utils.get_smi_name_index(input_file: str)

Get the indices of the SMILES and Name columns in the input file.

Parameters:

input_file – Path to the input file.

Returns:

Tuple of (SMILES index, Name index, msg), where msg is None if both the SMILES and Name columns are found and an appropriate error message otherwise.
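The return convention described above (two indices plus an optional error message) can be sketched as follows. This is an illustration under stated assumptions, not the module's code: find_smiles_and_name_columns is a hypothetical helper that works on an already-parsed header row, the case-insensitive column matching is assumed, and the wording of the error message is invented for the example.

```python
import csv
import io


def find_smiles_and_name_columns(header):
    """Hypothetical helper: return (smiles_index, name_index, msg) for a header row."""
    # Assumption: column names are matched case-insensitively.
    lowered = [column.strip().lower() for column in header]
    smiles_index = lowered.index("smiles") if "smiles" in lowered else None
    name_index = lowered.index("name") if "name" in lowered else None

    missing = []
    if smiles_index is None:
        missing.append("SMILES")
    if name_index is None:
        missing.append("Name")
    msg = f"Missing required column(s): {', '.join(missing)}" if missing else None
    return smiles_index, name_index, msg


# Parse only the header row of an in-memory CSV.
header_ok = next(csv.reader(io.StringIO("SMILES,Name,MW\nCCO,ethanol,46.07\n")))
result_ok = find_smiles_and_name_columns(header_ok)
result_bad = find_smiles_and_name_columns(["Name", "MW"])

print(result_ok)   # (0, 1, None)
print(result_bad)  # (None, 0, 'Missing required column(s): SMILES')
```

Returning the message instead of raising lets a caller such as a validation routine decide how to report the problem.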

schrodinger.application.building_block_exploration.bb_explorer_utils.validate_routes_file(routes_file)

schrodinger.application.building_block_exploration.bb_explorer_utils.filter_through_bloom_filter(input_file: str, bloom_filter_file: str, num_to_keep: int, chunk_size: int = 10000, csv_column_name: str = 's_rdkit_InchiKey') str

Filter the input file through a Bloom filter and return a file containing the ligands that are present in the Bloom filter. At most num_to_keep ligands are kept, and the function stops reading the input file as soon as that many ligands have passed the filter.

Parameters:
  • input_file – Path to the input file. This file is removed once filtering is done; only the filtered file is kept.

  • bloom_filter_file – Path to the Bloom filter file.

  • num_to_keep – Maximum number of compounds to keep after filtering.

  • chunk_size – Number of elements to load in memory and process in each chunk through the bloom filter.

  • csv_column_name – Name of the column in the input CSV that contains the entries stored in the Bloom filter.

Returns:

Path to the filtered output file.
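The chunked filtering with early stopping described for this function can be sketched in plain Python. This is a minimal illustration, not the module's implementation: filter_rows is a hypothetical helper that works on in-memory rows instead of files, and a Python set stands in for the Bloom filter because both support membership tests. A real Bloom filter would also let through occasional false positives, though never false negatives.

```python
from itertools import islice


def filter_rows(rows, membership, key_index, num_to_keep, chunk_size):
    """Hypothetical helper: keep up to num_to_keep rows whose key is in membership."""
    kept = []
    rows = iter(rows)
    while len(kept) < num_to_keep:
        # Load at most chunk_size rows into memory at a time.
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            break  # input exhausted before reaching num_to_keep
        for row in chunk:
            if row[key_index] in membership:
                kept.append(row)
                if len(kept) == num_to_keep:
                    break  # stop early; remaining rows are never read
    return kept


# Made-up rows: (ligand name, key); "KEY-*" values are placeholders.
rows = [
    ["lig1", "KEY-A"],
    ["lig2", "KEY-B"],
    ["lig3", "KEY-C"],
    ["lig4", "KEY-D"],
    ["lig5", "KEY-E"],
]
known_keys = {"KEY-A", "KEY-C", "KEY-D"}  # set used as a Bloom filter stand-in

rows_read = []


def row_source():
    """Yield rows one by one, recording which ones were actually read."""
    for row in rows:
        rows_read.append(row[0])
        yield row


kept = filter_rows(row_source(), known_keys, key_index=1, num_to_keep=2, chunk_size=2)
print(kept)       # [['lig1', 'KEY-A'], ['lig3', 'KEY-C']]
print(rows_read)  # ['lig1', 'lig2', 'lig3', 'lig4']
```

In this run the second chunk satisfies num_to_keep, so the fifth row is never read. Note that rows are consumed a whole chunk at a time, which is why lig4 is read even though it is not needed.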

schrodinger.application.building_block_exploration.bb_explorer_utils.get_csv_header_and_column_index(input_file: str, column_name: str)

Get the header and the index of the specified column in the input CSV file.

Parameters:
  • input_file – Path to the input CSV file.

  • column_name – Name of the column to get the index for.

Returns:

Tuple of (header, column_index) where header is a list of column names and column_index is the index of the specified column.