schrodinger.application.bb_database.bb_task_utils module¶
Performs build, describe, qsplit and qrun tasks for bb_database_driver.py.
The following functions may be called in lieu of running a bb_database job when the database is locally accessible:
Function            bb_database Arguments
--------            ---------------------
build_database      build [options] <bbkeys> <dbname>.bbdb
rebuild_chunk       build -rebuild <chunkdb> [options] <bbkeys> <dbname>.bbdb
describe_database   describe [options] <dbname>.bbdb
split_query         qsplit [options] <bbkeys> <dbname>.bbdb
run_query           qrun [options] <query>.bbq <dbname>.bbdb
Copyright Schrodinger LLC, All Rights Reserved.
- schrodinger.application.bb_database.bb_task_utils.add_chunk_row(dbpath: str, chunk_index: int, collector: phase.BBCollector) None ¶
Adds a row to <dbpath>/chunks.csv.
- schrodinger.application.bb_database.bb_task_utils.build_database(dbpath: str, key_files: list[str], newdb: bool, chunk_size: Optional[int] = 10000000, max_chunks: Optional[int] = None, commit_size: Optional[int] = 1000000, key_column: Optional[str] = 'InChIKey', key_substr: Optional[str] = ':', logger: Optional[logging.Logger] = None) None ¶
Builds a new database or adds chunks to an existing database. May be called in lieu of running a bb_database build job.
- Parameters
dbpath – Absolute path to database (.bbdb)
key_files – CSV files (.csv, .csv.gz, .csvgz) with building block keys
newdb – Whether to create a new database
chunk_size – Number of building blocks per chunk database. Ignored if newdb is False.
max_chunks – Maximum number of database chunks to add. The default is to add chunks until no more building blocks remain.
commit_size – Number of rows added to a chunk database per commit
key_column – Name of the column that holds building block keys
key_substr – <min>:<max> slice of building block key field. If the key field looks like ‘InChIKey=VXDCVOCMBGPXPL-UHFFFAOYSA-N’, key_substr should be ‘9:’.
logger – Logger for informative messages
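The key_substr convention can be illustrated with a minimal sketch. It assumes the `<min>:<max>` string follows ordinary Python slice semantics (0-based, end-exclusive, empty bound means open), which is consistent with the '9:' example above but is not confirmed by this page. The helper names `parse_key_substr` and `extract_key` are hypothetical and are not part of bb_task_utils.

```python
def parse_key_substr(key_substr: str) -> slice:
    """Convert a '<min>:<max>' string into a slice; an empty bound is open."""
    low, sep, high = key_substr.partition(":")
    if not sep:
        raise ValueError(f"expected '<min>:<max>', got {key_substr!r}")
    return slice(int(low) if low else None, int(high) if high else None)


def extract_key(field: str, key_substr: str = ":") -> str:
    """Apply the slice described by key_substr to a raw key field."""
    return field[parse_key_substr(key_substr)]


field = "InChIKey=VXDCVOCMBGPXPL-UHFFFAOYSA-N"
print(extract_key(field, "9:"))  # VXDCVOCMBGPXPL-UHFFFAOYSA-N
print(extract_key(field))        # default ':' keeps the whole field
```

With the default ':' the field is used unchanged, so key_substr only needs to be set when the key column carries a prefix or suffix.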
- schrodinger.application.bb_database.bb_task_utils.create_new_database(dbpath: str, chunk_size: Optional[int] = 10000000, logger: Optional[logging.Logger] = None) None ¶
Creates a new, empty database.
- Parameters
dbpath – Absolute path to database (.bbdb)
chunk_size – Number of building blocks per chunk database
logger – Logger for informative messages
- schrodinger.application.bb_database.bb_task_utils.collect_bbkeys(key_files: list[str], chunk_size: Optional[int] = 10000000, lower_bound: Optional[str] = '', key_column: Optional[str] = 'InChIKey', key_substr: Optional[str] = ':', logger: Optional[logging.Logger] = None) phase.BBCollector ¶
Makes a pass through the key files and returns the sorted keys, capped at chunk_size, in a BBCollector object.
- Parameters
key_files – CSV files (.csv, .csv.gz, .csvgz) with building block keys
chunk_size – Cap on the number of sorted building blocks
lower_bound – Only keys > lower_bound are added
key_column – Name of the column that holds building block keys
key_substr – <min>:<max> slice of building block key field
logger – Logger for informative messages
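The combination of a cap (chunk_size) and an exclusive lower bound suggests how successive chunks could be filled one pass at a time. The following is a sketch of that idea in plain Python, not the BBCollector implementation: `smallest_keys_above` is a hypothetical helper, and de-duplicating keys is an assumption made here for illustration.

```python
import heapq
from typing import Iterable


def smallest_keys_above(
    keys: Iterable[str], chunk_size: int, lower_bound: str = ""
) -> list[str]:
    """Return up to chunk_size distinct keys > lower_bound, sorted ascending."""
    # De-duplication is an assumption of this sketch, not documented behavior.
    candidates = {key for key in keys if key > lower_bound}
    return heapq.nsmallest(chunk_size, candidates)


keys = ["D", "A", "C", "B", "E", "A"]
lower_bound = ""
chunks = []
while True:
    chunk = smallest_keys_above(keys, chunk_size=2, lower_bound=lower_bound)
    if not chunk:
        break
    chunks.append(chunk)
    lower_bound = chunk[-1]  # the next pass only accepts larger keys

print(chunks)  # [['A', 'B'], ['C', 'D'], ['E']]
```

Each pass yields the next block of sorted keys, so the last key of one chunk serves as the lower bound for the following one.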
- schrodinger.application.bb_database.bb_task_utils.describe_database(dbpath: str, verbose: Optional[bool] = False) str ¶
Returns a string that describes the database contents. May be called in lieu of running a bb_database describe job.
- Parameters
dbpath – Absolute path to database (.bbdb)
verbose – Whether to include information about each chunk
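As a usage sketch, a locally accessible database could be built and then summarized with the two functions documented above. The wrapper name `build_and_describe` is hypothetical, and the call requires a Schrodinger Python environment; only the documented signatures of build_database and describe_database are used.

```python
import logging
from typing import Optional


def build_and_describe(
    dbpath: str, key_files: list[str], logger: Optional[logging.Logger] = None
) -> str:
    """Create a new database from key_files and return its verbose description."""
    # Imported lazily so this module can be loaded without a Schrodinger install.
    from schrodinger.application.bb_database import bb_task_utils

    bb_task_utils.build_database(dbpath, key_files, newdb=True, logger=logger)
    return bb_task_utils.describe_database(dbpath, verbose=True)
```

A call might look like `build_and_describe("/abs/path/blocks.bbdb", ["keys.csv.gz"])`, where both paths are placeholders; dbpath must be absolute and end in .bbdb.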
- schrodinger.application.bb_database.bb_task_utils.get_chunk_file_name(dbpath: str, collector: phase.BBCollector) str ¶
Returns the name of the chunk database file to which the building block keys in the provided BBCollector should be written.
- schrodinger.application.bb_database.bb_task_utils.get_chunk_info(dbpath: str) tuple[int, str, int] ¶
Given the path to an existing database, this function returns a tuple of the chunk size, the lower bound on new building block keys to add, and the 1-based index for the next chunk to add.
- schrodinger.application.bb_database.bb_task_utils.get_chunk_row(dbpath: str, low_key: str, high_key: str) list[str] ¶
Returns a row from chunks.csv based on the low and high key values.
- schrodinger.application.bb_database.bb_task_utils.get_chunk_rows(dbpath: str, want_header_row: Optional[bool] = False) list[list[str]] ¶
Returns the rows in chunks.csv.
- schrodinger.application.bb_database.bb_task_utils.get_key_limits(chunkdb: str) list[str, str] ¶
Determines the low and high building block key values from the name of a chunk database file. The basename of chunkdb should be of the form <lowkey>_<highkey>.chkdb, where <lowkey> and <highkey> are the first and last building block keys in the chunk database. It is assumed that building block keys do not contain underscores.
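The naming convention can be sketched as a simple parse of the file's base name. This is an illustration of the documented `<lowkey>_<highkey>.chkdb` form, not the module's code; `key_limits_from_name` is a hypothetical name, the example path is made up, and raising ValueError on malformed names is a choice made here.

```python
import os


def key_limits_from_name(chunkdb: str) -> list[str]:
    """Split '<lowkey>_<highkey>.chkdb' into [lowkey, highkey]."""
    stem, ext = os.path.splitext(os.path.basename(chunkdb))
    if ext != ".chkdb":
        raise ValueError(f"not a chunk database file: {chunkdb}")
    parts = stem.split("_")
    # Keys are assumed to contain no underscores, so exactly two parts remain.
    if len(parts) != 2:
        raise ValueError(f"expected <lowkey>_<highkey>.chkdb, got {chunkdb}")
    return parts


print(key_limits_from_name("/data/blocks.bbdb/AAAA-N_ZZZZ-N.chkdb"))
# ['AAAA-N', 'ZZZZ-N']
```

The no-underscore assumption matters: a key containing an underscore would make the split ambiguous.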
- schrodinger.application.bb_database.bb_task_utils.get_settings(dbpath: str) dict[str, str] ¶
Reads settings.json file and returns the settings as a dict.
- schrodinger.application.bb_database.bb_task_utils.log_msg(msg: str, logger: logging.Logger) None ¶
Writes a message to the logger if it exists.
- schrodinger.application.bb_database.bb_task_utils.read_query_catalog(query_catalog: str) dict[str, list[str]] ¶
Reads a query catalog file created by split_query() and returns a dictionary that maps each host name to a list of chunkdb files to be queried.
- schrodinger.application.bb_database.bb_task_utils.read_split_query(query_file: str) dict[str, list[str]] ¶
Reads a .bbq query file created by split_query() and returns a dictionary that maps chunk database name to [<key_column>, <key1>, <key2>, etc.].
- schrodinger.application.bb_database.bb_task_utils.rebuild_chunk(dbpath: str, chunkdb: str, key_files: list[str], commit_size: Optional[int] = 1000000, key_column: Optional[str] = 'InChIKey', key_substr: Optional[str] = ':', logger: Optional[logging.Logger] = None) None ¶
Rebuilds a specific chunk database that’s damaged or incomplete. May be called in lieu of running a bb_database build -rebuild job.
- Parameters
dbpath – Absolute path to database (.bbdb)
chunkdb – Chunk database file (.chkdb) to rebuild. Only the base name is used.
key_files – CSV files (.csv, .csv.gz, .csvgz) with building block keys
commit_size – Number of rows added to chunk database per commit
key_column – Name of the column that holds building block keys
key_substr – <min>:<max> slice of building block key field.
logger – Logger for informative messages
- schrodinger.application.bb_database.bb_task_utils.run_query(dbpath: str, query_file: str, matches_file: str, select_size: Optional[int] = 1000000, logger: Optional[logging.Logger] = None) None ¶
Runs a query created by split_query(). May be called in lieu of running a bb_database qrun job.
- Parameters
dbpath – Absolute path to database (.bbdb)
query_file – Query file (.bbq) created by split_query()
matches_file – Output CSV file (.csv, .csv.gz, .csvgz) for matching keys
select_size – Maximum number of building block keys per database SELECT statement
logger – Logger for informative messages
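The select_size parameter caps how many keys go into one SELECT statement. How run_query batches keys internally is not described here; the sketch below only shows the general idea of splitting a key list into groups of at most select_size, using a hypothetical helper `batched_keys`.

```python
from typing import Iterator, Sequence


def batched_keys(keys: Sequence[str], select_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of keys, each holding at most select_size items."""
    if select_size < 1:
        raise ValueError("select_size must be positive")
    for start in range(0, len(keys), select_size):
        yield keys[start:start + select_size]


for batch in batched_keys(["k1", "k2", "k3", "k4", "k5"], select_size=2):
    print(batch)  # one group of keys per SELECT statement
```

A smaller select_size means more statements with fewer keys each; the default of 1000000 keeps the number of statements low.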
- schrodinger.application.bb_database.bb_task_utils.split_query(dbpath: str, key_file: str, prefix: str, key_column: Optional[str] = 'InChIKey', key_substr: Optional[str] = ':', logger: Optional[logging.Logger] = None) None ¶
Splits a query according to the database chunks it covers. May be called in lieu of running a bb_database qsplit job.
- Parameters
dbpath – Absolute path to database (.bbdb)
key_file – CSV file (.csv, .csv.gz, .csvgz) with building block keys
prefix – Prefix for query files to create.
key_column – Name of the column that holds building block keys
key_substr – <min>:<max> slice of building block key field.
logger – Logger for informative messages
- schrodinger.application.bb_database.bb_task_utils.unzip_query(prefix: str, dest_dir: Optional[str] = None) dict[str, list[str]] ¶
Unzips an archive <prefix>_queries.zip created by zip_query() to the specified directory, which is CWD by default. Returns a dictionary that maps each extracted query file [<dest_dir>/]<prefix>_<host>.bbq to the list [<host>, <dbpath>], where <host> is the name of the host on which the query should be run, and <dbpath> is the location of the building block database on that host.
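The returned dictionary can be filtered to find the work assigned to a particular host. The sketch below operates on a hand-written mapping in the documented shape; the file names, host names, and paths are invented for illustration, and `queries_for_host` is a hypothetical helper.

```python
def queries_for_host(
    extracted: dict[str, list[str]], host: str
) -> list[tuple[str, str]]:
    """Return (query_file, dbpath) pairs whose assigned host matches host."""
    return [
        (query_file, dbpath)
        for query_file, (query_host, dbpath) in extracted.items()
        if query_host == host
    ]


# Hypothetical mapping in the shape documented for unzip_query().
extracted = {
    "lib_node1.bbq": ["node1", "/data/blocks.bbdb"],
    "lib_node2.bbq": ["node2", "/scratch/blocks.bbdb"],
}
print(queries_for_host(extracted, "node2"))
# [('lib_node2.bbq', '/scratch/blocks.bbdb')]
```

Each resulting pair supplies the query_file and dbpath arguments of run_query(); the matches_file argument still has to be chosen by the caller.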
- schrodinger.application.bb_database.bb_task_utils.validate_dbpath(dbpath: str, expected_to_exist: Optional[bool] = True) None ¶
Raises a RuntimeError if dbpath has the wrong extension or is not an absolute path. Raises a FileExistsError if expected_to_exist is False and dbpath exists; raises a FileNotFoundError if expected_to_exist is True and dbpath doesn’t exist.
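The documented checks can be summarized in a short sketch. It mirrors the exceptions described above but is not the module's code: `check_dbpath` is a hypothetical name, and the order of the checks and the wording of the messages are assumptions.

```python
import os
import tempfile


def check_dbpath(dbpath: str, expected_to_exist: bool = True) -> None:
    """Raise if dbpath is malformed or its existence contradicts expectations."""
    if not dbpath.endswith(".bbdb"):
        raise RuntimeError(f"database path must end in .bbdb: {dbpath}")
    if not os.path.isabs(dbpath):
        raise RuntimeError(f"database path must be absolute: {dbpath}")
    exists = os.path.exists(dbpath)
    if expected_to_exist and not exists:
        raise FileNotFoundError(dbpath)
    if not expected_to_exist and exists:
        raise FileExistsError(dbpath)


with tempfile.TemporaryDirectory() as tmp:
    dbpath = os.path.join(tmp, "blocks.bbdb")
    check_dbpath(dbpath, expected_to_exist=False)  # passes: nothing there yet
    os.mkdir(dbpath)
    check_dbpath(dbpath)  # passes: the database directory now exists
```

Passing expected_to_exist=False is the natural check before creating a database, and the default suits every operation on an existing one.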
- schrodinger.application.bb_database.bb_task_utils.validate_global_database(dbpath: str) None ¶
Raises a RuntimeError if the database does not have global scope, if the rows in chunks.csv are not sequentially numbered starting at 1, or if any row has the wrong number of fields.
- schrodinger.application.bb_database.bb_task_utils.validate_key_files(key_files: list[str]) None ¶
Raises a RuntimeError if any of the provided key files does not have a recognized CSV extension. Raises a FileNotFoundError if a file is missing.
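A sketch of these checks, together with one way to read keys from a validated file, is given below. It assumes the recognized extensions are exactly the three listed for key_files (.csv, .csv.gz, .csvgz), that the compressed variants are gzip files, and that the CSV has a header row; `check_key_files` and `iter_keys` are hypothetical names, not functions of this module.

```python
import csv
import gzip
import os
from typing import Iterator

# Assumed set of recognized extensions, taken from the key_files descriptions.
CSV_EXTENSIONS = (".csv", ".csv.gz", ".csvgz")


def check_key_files(key_files: list[str]) -> None:
    """Raise RuntimeError for a bad extension, FileNotFoundError if missing."""
    for path in key_files:
        if not path.endswith(CSV_EXTENSIONS):
            raise RuntimeError(f"unrecognized CSV extension: {path}")
        if not os.path.isfile(path):
            raise FileNotFoundError(path)


def iter_keys(path: str, key_column: str = "InChIKey") -> Iterator[str]:
    """Yield the key_column value of each row in a plain or gzipped CSV file."""
    opener = gzip.open if path.endswith("gz") else open
    with opener(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle):
            yield row[key_column]
```

Validating all files before reading any of them means a typo in the last file name is reported before a long pass over the earlier files begins.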
- schrodinger.application.bb_database.bb_task_utils.write_readme_file(dbpath: str) None ¶
Creates a README.txt file with important information about building block databases.
- schrodinger.application.bb_database.bb_task_utils.zip_query(prefix: str) str ¶
Given the prefix that was supplied to split_query(), this function creates the Zip archive <prefix>_queries.zip and adds <prefix>_catalog.json, along with all of its associated .bbq files, to the archive. Returns the name of the archive.