schrodinger.rdkit.molio module¶
PathFinder helper functions for reading and writing files using RDKit Mol objects.
- class schrodinger.rdkit.molio.MolWriter(filename, generate_coordinates=True, require_stereo=False)¶
Bases:
schrodinger.structure._io.StructureWriter
Write Mol objects to a file using a StructureWriter-like API, optionally generating 3D coordinates.
- __init__(filename, generate_coordinates=True, require_stereo=False)¶
Create a structure writer class based on the format.
- Parameters
filename (str or pathlib.Path) – The filename to write to.
overwrite (bool) – If False, append to an existing file instead of overwriting it.
format (str) – The format of the file. Values should be specified by one of the module-level constants MAESTRO, MOL2, SD, SMILES, or SMILESCSV. If the format is not explicitly specified it will be determined from the suffix of the filename. Multi-structure PDB files are not supported.
stereo (enum) – Use of the stereo option in the constructor is pending deprecation. Please use the setOption method instead. See the class docstring for documentation on the stereo options.
allow_empty_file (bool) – whether we should create a file with no structures if we don’t append any structures. Only a valid option for Maestro files.
- append(mol)¶
Append the provided structure to the open file.
- class schrodinger.rdkit.molio.StructureReaderAdapter(reader, implicitH=True)¶
Bases:
object
A wrapper for a Structure reader, which, when iterated through, yields RDKit Mol objects, and can also be used as a context manager that closes the reader on exit.
- __init__(reader, implicitH=True)¶
- Parameters
reader (iterable of Structure) – source of structures to convert
implicitH (bool) – use implicit hydrogens
- class schrodinger.rdkit.molio.BaseCsvMolReader(file, name_field: str = None)¶
Bases:
object
Parent class for CsvMolReader and CsvMolIterator.
- __init__(file, name_field: str = None)¶
- Parameters
file (str or file-like object) – CSV filename (file may be compressed) or file-like object.
name_field (str) – name of the field to use as the molecule name, will take precedence over the default NAME_FIELDS
- NAME_FIELDS = ('NAME', 's_m_title', 'Name')¶
- close()¶
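The precedence of name_field over NAME_FIELDS can be pictured with a small standard-library sketch. The helper pick_name is hypothetical and not part of this module. Falling back to the default fields when name_field is absent from a row, and skipping empty values, are both assumptions of the sketch.

```python
NAME_FIELDS = ("NAME", "s_m_title", "Name")


def pick_name(row, name_field=None):
    """Return the molecule name from a CSV row (a dict), or None.

    Hypothetical helper: a user-supplied name_field is tried first,
    then the default NAME_FIELDS in order.
    """
    candidates = NAME_FIELDS if name_field is None else (name_field,) + NAME_FIELDS
    for field in candidates:
        value = row.get(field)
        if value:  # assumption: empty values are treated as missing
            return value
    return None


row = {"SMILES": "CCO", "Name": "ethanol", "compound_id": "CMPD-1"}
default_name = pick_name(row)                            # uses "Name"
custom_name = pick_name(row, name_field="compound_id")   # overrides defaults
```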
- class schrodinger.rdkit.molio.CsvMolReader(file, *args, **kwargs)¶
Bases:
schrodinger.rdkit.molio.BaseCsvMolReader
Read a SMILES CSV file, returning Mol objects.
This is similar to RDKit’s SmilesMolSupplier with delimiter=’,’, except that it uses the csv module instead of naively splitting on commas. This makes it possible to have field values containing commas, as long as they are quoted following the CSV convention. Note, however, that multi-line records are still not supported for efficiency reasons.
Also, gzip-compressed files (identified by the filename ending in “gz”) are supported.
A CsvMolReader supports random access, like a list. Upon instantiation, the file is read in full and kept in memory. For a CSV file having only SMILES and an ID, this takes about 100 MB per million entries.
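As an illustration of why the csv module matters here, the following sketch compares naive comma splitting with csv.reader on a record whose name contains a quoted comma. It uses only the standard library and made-up data; it does not call CsvMolReader itself.

```python
import csv
import io

# Made-up SMILES CSV content; the last record has a quoted name with a comma.
data = 'SMILES,NAME\nCCO,ethanol\nc1ccccc1,"benzene, aromatic"\n'

# Naive splitting breaks the quoted field into two pieces.
naive_rows = [line.split(",") for line in data.splitlines()]

# csv.reader honours the CSV quoting convention.
csv_rows = list(csv.reader(io.StringIO(data)))

print(naive_rows[2])  # ['c1ccccc1', '"benzene', ' aromatic"']
print(csv_rows[2])    # ['c1ccccc1', 'benzene, aromatic']
```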
- __init__(file, *args, **kwargs)¶
- Parameters
file (str or file-like object) – CSV filename (file may be compressed) or file-like object.
name_field (str) – name of the field to use as the molecule name, will take precedence over the default NAME_FIELDS
- __len__()¶
- class schrodinger.rdkit.molio.CsvMolIterator(file, name_field: str = None)¶
Bases:
schrodinger.rdkit.molio.BaseCsvMolReader
Read a SMILES CSV file, returning Mol objects.
Unlike CsvMolReader, CsvMolIterator does not support random access, but since it only keeps one line in memory at a time, memory use is minimal.
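The trade-off between the two access styles can be sketched with the standard library alone. The functions read_all and iter_rows are hypothetical stand-ins that return plain dict rows rather than Mol objects; they are not the module's implementation.

```python
import csv
import io

CSV_TEXT = "SMILES,NAME\nCCO,ethanol\nCC(=O)O,acetic acid\nc1ccccc1,benzene\n"


def read_all(fh):
    """Load every record up front: supports len() and indexing."""
    return list(csv.DictReader(fh))


def iter_rows(fh):
    """Yield one record at a time: forward-only, minimal memory."""
    yield from csv.DictReader(fh)


rows = read_all(io.StringIO(CSV_TEXT))       # random access, whole file in memory
streamed = iter_rows(io.StringIO(CSV_TEXT))  # iterator only
first = next(streamed)
```

The list form pays for random access with memory proportional to the file size, while the generator can only be consumed once, front to back.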
- class schrodinger.rdkit.molio.CsvMolWriter(filename, properties=None, cxsmiles=False)¶
Bases:
object
Write a CSV file given Mol objects, using a StructureWriter-like API. The first two columns are the SMILES and title, and the rest are the properties of the molecule.
We don’t use structure.SmilesCsvWriter because it is too slow due to all the conversions (the overall job takes 4 times as long, so the bottleneck clearly becomes the writing of the output file!).
We don’t use Chem.SmilesWriter because even though it can use comma as a delimiter, it doesn’t write proper CSV files because it doesn’t know how to escape the delimiter.
Also, gzip-compressed files (identified by the filename ending in “gz”) are supported.
- __init__(filename, properties=None, cxsmiles=False)¶
- Parameters
filename (str or file-like object) – file to write
properties (list of str or None) – optional, list of names of properties to write to output file. If None, all the properties are written. (CAVEAT: if filename is a file object rather than an actual filename, only the properties present in the first molecule are written.)
cxsmiles (bool) – when writing SMILES, use CXSMILES extensions
- append(mol)¶
Write a molecule to the file. The first time this is called, the header row is written based on mol’s properties or the properties passed to __init__, if any.
- Parameters
mol (rdkit.Chem.rdchem.Mol) – molecule
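The header-on-first-append behaviour can be sketched as follows. LazyHeaderCsvWriter is a hypothetical stand-in that takes plain strings and a dict of properties instead of a Mol, and the column names SMILES and NAME are assumptions. It also shows the caveat for file objects: properties that first appear on a later molecule are not written.

```python
import csv
import io


class LazyHeaderCsvWriter:
    """Hypothetical sketch: the header row is written on the first append()."""

    def __init__(self, fh, properties=None):
        self._writer = csv.writer(fh, lineterminator="\n")
        self._properties = properties
        self._header_written = False

    def append(self, smiles, title, props):
        if not self._header_written:
            if self._properties is None:
                # No explicit list: take the properties of the first molecule.
                self._properties = list(props)
            self._writer.writerow(["SMILES", "NAME"] + self._properties)
            self._header_written = True
        values = [props.get(name, "") for name in self._properties]
        self._writer.writerow([smiles, title] + values)


buf = io.StringIO()
writer = LazyHeaderCsvWriter(buf)
writer.append("CCO", "ethanol", {"mw": "46.07"})
# "rings" is absent from the first molecule, so it is dropped here.
writer.append("c1ccccc1", "benzene, aromatic", {"mw": "78.11", "rings": "1"})
lines = buf.getvalue().splitlines()
```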
- toSmiles(mol)¶
- close()¶
- class schrodinger.rdkit.molio.BasePfxMolReader(filename)¶
Bases:
object
Parent class for PfxMolReader and PfxMolIterator.
- __init__(filename)¶
- close()¶
- class schrodinger.rdkit.molio.PfxMolReader(filename)¶
Bases:
schrodinger.rdkit.molio.BasePfxMolReader
Reader for PFX (PathFinder reactants) files. These are really zip archives containing a CSV file and a metadata JSON file.
Like CsvMolReader, PfxMolReader supports random access, like a list. Upon instantiation, the file is read in full and kept in memory. For a file having only SMILES and an ID, this takes about 100 MB per million entries.
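The container layout described above (a zip archive holding a CSV file and a JSON metadata file) can be sketched with the standard library. The member names structures.csv and metadata.json and the "size" key are hypothetical; the actual names used inside a PFX file are not documented here.

```python
import csv
import io
import json
import zipfile

# Hypothetical member names; the real PFX layout may differ.
CSV_MEMBER = "structures.csv"
META_MEMBER = "metadata.json"

# Build a small archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr(CSV_MEMBER, "SMILES,NAME\nCCO,ethanol\nc1ccccc1,benzene\n")
    zf.writestr(META_MEMBER, json.dumps({"size": 2}))

# Read the metadata and the CSV records back.
buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    meta = json.loads(zf.read(META_MEMBER))
    with zf.open(CSV_MEMBER) as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        rows = list(csv.DictReader(text))
```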
- csv_mol_reader_class¶
alias of
schrodinger.rdkit.molio.CsvMolReader
- __len__()¶
- class schrodinger.rdkit.molio.PfxMolIterator(filename)¶
Bases:
schrodinger.rdkit.molio.BasePfxMolReader
Reader for PFX (PathFinder reactants) files. These are really zip archives containing a CSV file and a metadata JSON file.
Unlike PfxMolReader, PfxMolIterator does not support random access, but since it only keeps one line in memory at a time, memory use is minimal.
- csv_mol_reader_class¶
- class schrodinger.rdkit.molio.PfxMolWriter(filename, properties=None)¶
Bases:
object
Writer for PFX (PathFinder reactants) files. These are really zip archives containing a CSV file and a metadata JSON file.
- __init__(filename, properties=None)¶
- Parameters
filename (str) – file to write
properties (list of str or None) – optional, list of names of properties to write to output file. If None, all the properties present on the first structure will be written (the assumption is that all molecules will have the same properties, or at least that the first molecule has all the properties that we care about).
- append(mol)¶
Write a molecule to the file.
- Parameters
mol (rdkit.Chem.rdchem.Mol) – molecule
- property written_count¶
- close()¶
- class schrodinger.rdkit.molio.RdkitMolWriter(filename, v3000=False)¶
Bases:
object
Write Mol objects to a file using the RDKit file-writing classes, but with a StructureWriter-like API. Supports SMILES and SDF.
- __init__(filename, v3000=False)¶
- Parameters
filename (str) – filename to write
v3000 (bool) – when writing SD, force the use of the V3000 format
- property written_count¶
- append(mol)¶
- close()¶
- class schrodinger.rdkit.molio.NoneSkipper(supplier)¶
Bases:
object
A wrapper for a mol supplier, which, when iterated through, skips the None mols, and can also be used as a context manager.
- __init__(supplier)¶
- Parameters
supplier (iterable of Mol) – supplier of molecules
- __len__()¶
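A minimal sketch of this wrapper pattern, using a plain list in place of an RDKit supplier. SkipNone is a hypothetical class, and closing the underlying supplier on exit (when it has a close method) is an assumption about what the context manager does.

```python
class SkipNone:
    """Hypothetical sketch: iterate over a supplier, dropping None entries."""

    def __init__(self, supplier):
        self._supplier = supplier

    def __iter__(self):
        return (item for item in self._supplier if item is not None)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Assumption: close the wrapped supplier if it supports closing.
        close = getattr(self._supplier, "close", None)
        if callable(close):
            close()
        return False  # never swallow exceptions


# A list stands in for a supplier that yields None for unparsable records.
with SkipNone(["CCO", None, "c1ccccc1", None]) as mols:
    kept = list(mols)
```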
- class schrodinger.rdkit.molio.GzippedSDMolSupplier(filename, *a, **kw)¶
Bases:
rdkit.Chem.rdmolfiles.ForwardSDMolSupplier
Subclass of ForwardSDMolSupplier to read gzip-compressed files. Use as a context manager to ensure that the file gets closed.
- __init__(filename, *a, **kw)¶
- Parameters
filename (str) – gzip-compressed file
a – positional arguments to pass through to parent
kw – keyword arguments to pass through to parent
- schrodinger.rdkit.molio.get_mol_writer(filename, generate_coordinates=True, require_stereo=False, v3000=False, cxsmiles=False)¶
Return a StructureWriter-like object based on the command-line arguments. RDKit is used for non-Maestro formats.
- Parameters
filename (str) – filename to write
generate_coordinates (bool) – generate 3D coordinates (non-SMILES formats)
require_stereo (bool) – when generating coordinates, fail when there’s unspecified stereochemistry, instead of producing an arbitrary isomer
v3000 (bool) – when writing SD, force the use of the V3000 format
cxsmiles (bool) – when writing SMILES, use CXSMILES extensions
- schrodinger.rdkit.molio.supported_output_format(filename)¶
Check whether we know how to write a file with a given name, but without actually opening a file. Used for argument validation.
- Return type
bool
- schrodinger.rdkit.molio.get_mol_reader(filename, skip_bad=True, implicitH=True, random_access=True)¶
Return a Mol reader given a filename or a SMILES string. For .smi and .csv files, use the RDKit SmilesMolSupplier; for other formats, use StructureReader but convert Structure to Mol before yielding each molecule.
Whenever possible, the reader will be a Sequence. This is currently the case for .smi and .csv files when skip_bad is False, and for a SMILES string, which returns a list of size 1.
- Parameters
skip_bad (bool) – if True, bad structures are skipped implicitly, instead of being yielded as None (only applies to SMILES and CSV formats.)
implicitH (bool) – use implicit hydrogens (only has an effect when reading Maestro files)
random_access (bool) – if False, the reader object can only be used as an iterator, and the file is not read in memory all at once. (Only applies to CSV and PFX and is ignored for other formats, which provide no random access except for uncompressed SD.)
- Return type
Generator or Sequence of Mol
- schrodinger.rdkit.molio.get_mol(target, implicitH=True)¶
Read a Mol from a file or a SMILES string.
- Parameters
target (str) – filename or SMILES
implicitH (bool) – use implicit hydrogens (only has an effect when reading Maestro files)
- Return type
rdkit.Chem.Mol
- schrodinger.rdkit.molio.combine_output_files(outfiles, out, dedup=True, sort=False, union_csv_columns=False, rdkit=False, v3000=False)¶
Write the final output file.
- Parameters
outfiles (list[str]) – subjob output filenames
out (str) – output filename
dedup (bool) – skip duplicate products
sort (bool) – sort output (implies the subjob output is sorted)
union_csv_columns (bool) – if the output is CSV, write the union of the infile columns.
rdkit (bool) – Use an RDKit writer for SD files.
v3000 (bool) – If using an RDKit writer and writing an SD file, force V3000 format.
- schrodinger.rdkit.molio.get_format_handler(infiles, outfile, union_csv_columns=False, rdkit=False, v3000=False)¶
Return the appropriate format handler for a specified output file type.
- Parameters
infiles (list[str]) – subjob output filenames, used as input for merging
outfile (str) – output filename
union_csv_columns (bool) – flag to write out the union of infile csv columns (if infile columns differ)
rdkit (bool) – Use an RDKit writer for SD files.
v3000 (bool) – If using an RDKit writer and writing an SD file, force V3000 format.
- Returns
instance of a subclass of BaseMergeHandler
- Return type
- schrodinger.rdkit.molio.merge_files_as_streams(infiles, outfile, file_handler, dedup)¶
Copies structures from infiles into outfile. Rejects duplicates using file_handler.getCompareKey. Assumes infiles are sorted.
- Parameters
infiles (iterable over str) – names of the structure files to be joined
outfile (str) – output file name
file_handler (instance of subclass of BaseMergeHandler) – object to handle open, read and write operations for the file.
dedup (bool) – flag to indicate if duplicate products should be removed from merged output file
- Returns
number of structures written
- Return type
int
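One way to merge already-sorted inputs while rejecting duplicates is a k-way merge that compares each key with the previous one. The sketch below uses heapq.merge on in-memory lists; merge_sorted_streams is a hypothetical function, and whether the module uses this exact technique is an assumption.

```python
import heapq


def merge_sorted_streams(streams, key, dedup=True):
    """Yield items from already-sorted iterables in sorted order.

    With dedup=True, an item is skipped when its key equals the key of
    the item yielded just before it.
    """
    previous = object()  # sentinel that equals no real key
    for item in heapq.merge(*streams, key=key):
        current = key(item)
        if dedup and current == previous:
            continue
        previous = current
        yield item


def smiles_of(line):
    # Stand-in for getCompareKey: the first tab-separated column.
    return line.split("\t")[0]


a = ["C\tmethane", "CCO\tethanol"]
b = ["CC\tethane", "CCO\tethyl alcohol", "c1ccccc1\tbenzene"]
merged = list(merge_sorted_streams([a, b], key=smiles_of))
```

Because the inputs are sorted by key, duplicates are always adjacent in the merged stream, so only the previous key needs to be remembered.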
- schrodinger.rdkit.molio.merge_files_in_memory(infiles, outfile, filetype_handler, dedup)¶
Copies structures from infiles into outfile. Rejects duplicates using filetype_handler.getCompareKey.
- Parameters
infiles (iterable over str) – names of the structure files to be joined
outfile (str) – output file name
- Returns
number of structures written
- Return type
int
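When the inputs are not sorted, duplicates can be rejected by remembering every key seen so far. The following is a sketch of that idea with plain lists; merge_in_memory is a hypothetical function and the module's actual bookkeeping may differ.

```python
def merge_in_memory(batches, key, dedup=True):
    """Concatenate batches, keeping only the first item for each key."""
    seen = set()
    merged = []
    for batch in batches:
        for item in batch:
            current = key(item)
            if dedup:
                if current in seen:
                    continue
                seen.add(current)
            merged.append(item)
    return merged


def smiles_of(line):
    # Stand-in for getCompareKey: the first whitespace-separated token.
    return line.split()[0]


batches = [["CCO ethanol", "C methane"], ["CCO ethyl-alcohol", "CC ethane"]]
result = merge_in_memory(batches, key=smiles_of)
written = len(result)
```

Memory use grows with the number of distinct keys, which is the price of not requiring sorted input.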
- class schrodinger.rdkit.molio.BaseMergeHandler¶
Bases:
object
Base class for filetype handlers for subjob output deduplication and merging.
- getProductReader(file)¶
Given a file name, create and return an iterable file handle to iterate over all products.
- Parameters
file (str) – file name
- Returns
iterable context manager over filetype-specific product format
- Return type
iterable
- getProductAppender(file)¶
Given a file name, create and return a file-writing object that writes to the file when its “append” method is called.
- Parameters
file (str) – file name
- Returns
a file handle with context management that supports the append() call used in merge_files_in_memory and merge_files_as_streams.
- Return type
file-like object
- getCompareKey(product)¶
Given a product (formatted according to the filetype), return the computed comparison key (SMILES string) for the product.
- Parameters
product (filetype-specific product (type varies)) – filetype-specific product
- class schrodinger.rdkit.molio.CsvMergeHandler(infiles, outfile, union_columns=True, dedup_field=None)¶
Bases:
schrodinger.rdkit.molio.BaseMergeHandler
Class to bundle csv read/write operations
- __init__(infiles, outfile, union_columns=True, dedup_field=None)¶
- Parameters
infiles (list(str)) – list of output files whose columns are joined, if necessary.
outfile (str) – output file
union_columns (bool) – flag to write out the union of infile csv columns (if infile columns differ)
dedup_field (str) – csv column to use to check for duplicates during deduplication
- getProductReader(file)¶
Open a csv file, skip the first (header) line if necessary, and return a context-managing iterable over all remaining lines.
- Parameters
file (str) – file name
- Returns
iterable context manager over csv lines
- Return type
_CsvReadWrapper (iter(str) or iter(dict))
- getProductAppender(file)¶
Open a csv file, write the first (header) line, and return a line writer that supports the getProductAppender.append calls.
- Parameters
file (str) – file name
- Returns
a file handle that supports the append() call used in merge_files_in_memory and merge_files_as_streams.
- Return type
file-like object
- getHeader()¶
Returns the header for ProductAppenders to reference.
- Returns
Header line for the input csv files.
- Return type
str
- getCompareKey(prod)¶
Compute SMILES from a given csv-formatted product.
- Parameters
prod (dict or list) – product in question
- Returns
SMILES string
- Return type
str
- class schrodinger.rdkit.molio.StructureMergeHandler¶
Bases:
schrodinger.rdkit.molio.BaseMergeHandler
Helper class to bundle structure.Structure IO operations.
- __init__()¶
- getProductReader(file)¶
Create and return a structure reader
- Parameters
file (str) – structure file name
- Returns
structure reader for file
- Return type
- getProductAppender(file)¶
Create and return a structure writer
- Parameters
file (str) – structure file name
- Returns
structure writer for file
- Return type
- getCompareKey(prod)¶
Compute SMILES from a given Schrodinger structure to compare against other structures.
- Parameters
prod (structure.Structure) – product in question
- Returns
SMILES string
- Return type
str
- class schrodinger.rdkit.molio.RdkitMergeHandler(v3000=False)¶
Bases:
schrodinger.rdkit.molio.BaseMergeHandler
- __init__(v3000=False)¶
- Parameters
v3000 (bool) – If using an RDKit writer and writing an SD file, force V3000 format.
- getProductReader(file)¶
Given a file name, create and return an iterable file handle to iterate over all products.
- Parameters
file (str) – file name
- Returns
iterable context manager over filetype-specific product format
- Return type
iterable
- getProductAppender(file)¶
Given a file name, create and return a file-writing object that writes to the file when its “append” method is called.
- Parameters
file (str) – file name
- Returns
a file handle with context management that supports the append() call used in merge_files_in_memory and merge_files_as_streams.
- Return type
file-like object
- getCompareKey(prod)¶
- Parameters
prod (rdkit.Chem.Mol) – product in question
- Returns
SMILES string
- Return type
str
- class schrodinger.rdkit.molio.SmiMergeHandler¶
Bases:
schrodinger.rdkit.molio.BaseMergeHandler
Helper class to bundle SMILES (.smi) IO operations.
- getProductReader(file)¶
Create and return a SMILES line reader
- Parameters
file (str) – SMILES file name
- Returns
SMILES line reader for file
- Return type
file-like object (__enter__, __exit__, __iter__)
- getProductAppender(file)¶
Create and return a SMILES line writer
- Parameters
file (str) – SMILES file name
- Returns
SMILES line writer for file
- Return type
_SmilesAppender
- getCompareKey(prod)¶
Compute SMILES from a given SMILES line for comparison to other SMILES lines.
- Parameters
prod (str) – product in question
- Returns
SMILES string
- Return type
str
- schrodinger.rdkit.molio.get_fieldnames(filenames)¶
Return a list with the union of the field names from all the given CSV files. The field names are listed in the order in which they were first seen. (First all the fields from file #1, then the “new” field names from file #2, etc.)
- Parameters
filenames ([str]) – list of CSV files
- Returns
list of field names
- Return type
[str]
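The first-seen ordering can be sketched as follows. union_fieldnames is a hypothetical function that takes open file objects rather than filenames, so that the example runs without touching the disk.

```python
import csv
import io


def union_fieldnames(csv_files):
    """Return the union of header fields, in first-seen order."""
    fieldnames = []
    seen = set()
    for fh in csv_files:
        header = next(csv.reader(fh), [])  # an empty file contributes nothing
        for name in header:
            if name not in seen:
                seen.add(name)
                fieldnames.append(name)
    return fieldnames


files = [
    io.StringIO("SMILES,NAME,mw\nCCO,ethanol,46\n"),
    io.StringIO("SMILES,NAME,logp\nC,methane,1.1\n"),
]
fields = union_fieldnames(files)
print(fields)  # ['SMILES', 'NAME', 'mw', 'logp']
```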
- schrodinger.rdkit.molio.is_csvgz(filename)¶
- schrodinger.rdkit.molio.is_pfx(filename)¶
- schrodinger.rdkit.molio.get_pfx_size(filename)¶
Return the size from the metadata header of a .pfx file.
- schrodinger.rdkit.molio.extract_structures(filename, dest_file)¶
Extract structures from .pfx file into a given file.
- schrodinger.rdkit.molio.remove_react_atom_props(mol)¶
Return a copy of mol where atom properties added by the RDKit reaction module have been stripped out.
- Parameters
mol (rdkit.Chem.Mol) – input molecule; not modified
- Returns
modified molecule
- Return type
rdkit.Chem.Mol
- schrodinger.rdkit.molio.cat_csv_files(source_filenames, dest_filename)¶
Quick and dirty csv concatenation strategy. Assumes all csv files have the same columns and does not deduplicate.
- Parameters
source_filenames – input files
dest_filename – destination file
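A sketch of this kind of concatenation, assuming that the header of the first file is kept and the headers of the remaining files are dropped. cat_csv is a hypothetical function that works on open file objects; it does not check that the columns match and does not deduplicate.

```python
import io


def cat_csv(sources, dest):
    """Append the data rows of each source to dest, keeping one header."""
    for index, src in enumerate(sources):
        header = src.readline()
        if index == 0:
            dest.write(header)  # keep only the first file's header
        for line in src:
            dest.write(line)


first = io.StringIO("SMILES,NAME\nCCO,ethanol\n")
second = io.StringIO("SMILES,NAME\nC,methane\nCC,ethane\n")
combined = io.StringIO()
cat_csv([first, second], combined)
```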
- schrodinger.rdkit.molio.copy_csv_file(input_file, output_file)¶
Copy compressed or uncompressed input .csv file to another .csv file. Output file can also be compressed or uncompressed.
- Parameters
input_file (str) – input file name
output_file (str) – output file name
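Copying between compressed and uncompressed CSV files can be sketched by choosing the opener from the filename suffix. The helpers open_maybe_gz and copy_csv are hypothetical; treating any name ending in "gz" as gzip-compressed follows the convention described for the readers and writers in this module.

```python
import gzip
import os
import shutil
import tempfile


def open_maybe_gz(path, mode):
    """Open path in text mode, through gzip when the name ends in 'gz'."""
    opener = gzip.open if path.endswith("gz") else open
    return opener(path, mode + "t", newline="")


def copy_csv(src, dst):
    """Copy src to dst; either side may be gzip-compressed."""
    with open_maybe_gz(src, "r") as fin, open_maybe_gz(dst, "w") as fout:
        shutil.copyfileobj(fin, fout)


with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "in.csv")
    packed = os.path.join(tmp, "out.csv.gz")
    back = os.path.join(tmp, "back.csv")
    with open(plain, "w", newline="") as fh:
        fh.write("SMILES,NAME\nCCO,ethanol\n")
    copy_csv(plain, packed)   # uncompressed -> compressed
    copy_csv(packed, back)    # compressed -> uncompressed
    with open(packed, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"  # gzip magic number
    with open(back, newline="") as fh:
        round_trip = fh.read()
```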