schrodinger.utils.fileutils module
A module of file utilities to deal with common file issues.
NOTE: This module is used in scripts that need to be able to run without a Schrodinger license, and therefore can’t depend on the pymmlibs.
The force_remove and force_rename functions deal with the fact that os.remove() and os.rename() don’t work on Windows if write permissions are not enabled.
Copyright Schrodinger LLC, All Rights Reserved.
- exception schrodinger.utils.fileutils.SharingViolationError
Bases: PermissionError
- schrodinger.utils.fileutils.force_remove(*args)
Remove each file in ‘args’ in a platform-independent way, without raising an exception if a file does not exist or lacks write permission.
- Parameters
args (str) – the pathnames of the files to remove
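A minimal sketch of the behaviour described above, assuming the approach is to clear the read-only flag and retry the removal once. `force_remove_sketch` is a hypothetical name; this is not the module's implementation.

```python
import os
import stat


def force_remove_sketch(*paths):
    """Remove each path without raising, even if missing or read-only (sketch)."""
    for path in paths:
        try:
            os.remove(path)
        except FileNotFoundError:
            continue  # a missing file is not an error
        except PermissionError:
            # On Windows os.remove() refuses read-only files: clear the flag
            # and retry once, still swallowing any failure.
            try:
                os.chmod(path, stat.S_IREAD | stat.S_IWRITE)
                os.remove(path)
            except OSError:
                pass
        except OSError:
            pass  # any other failure is also ignored, per the contract above
```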
- schrodinger.utils.fileutils.force_rmtree(dirname: Union[str, pathlib.Path], ignore_errors: bool = False)
Remove the directory ‘dirname’, using force_remove to delete any difficult-to-remove files or sub-directories.
- Parameters
dirname – the directory to remove
ignore_errors – If True, silently ignore errors; otherwise, raise OSError
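A sketch of one way to get this behaviour, assuming a bottom-up walk that makes each file writable before deleting it. `force_rmtree_sketch` and `_remove_entry` are hypothetical helpers, not the library's code.

```python
import os
import stat


def _remove_entry(path, ignore_errors):
    """Delete one file, symlink, or empty directory, clearing read-only bits first."""
    try:
        if os.path.isdir(path) and not os.path.islink(path):
            os.rmdir(path)
        else:
            if not os.path.islink(path):
                os.chmod(path, stat.S_IREAD | stat.S_IWRITE)
            os.remove(path)
    except OSError:
        if not ignore_errors:
            raise


def force_rmtree_sketch(dirname, ignore_errors=False):
    """Remove a directory tree, tolerating read-only files (sketch)."""
    # Bottom-up walk: every sub-directory is already empty when it is removed.
    for root, dirs, files in os.walk(dirname, topdown=False):
        for name in files + dirs:
            _remove_entry(os.path.join(root, name), ignore_errors)
    _remove_entry(dirname, ignore_errors)
```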
- schrodinger.utils.fileutils.force_rename(old: Union[pathlib.Path, str], new: Union[pathlib.Path, str])
Rename a file, even if a file already exists at the new name, even if that file lacks write permission, and even if old and new are on different devices.
- Parameters
old – Path to the source file.
new – Path to the destination file.
- Note
Renaming may not be an atomic operation. If the ‘new’ file exists, it is first removed and the file is then renamed, in two separate operations. Similarly, if old and new are not on the same device, the file is first copied to ‘new’ and the ‘old’ file is then removed.
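The two non-atomic steps in the note can be sketched as follows. This assumes a cross-device rename is detected through `errno.EXDEV`; `force_rename_sketch` is a hypothetical name.

```python
import errno
import os
import shutil
import stat


def force_rename_sketch(old, new):
    """Rename old to new, replacing a read-only target and crossing devices (sketch)."""
    if os.path.exists(new):
        # First step of the non-atomic replace: remove the existing target.
        os.chmod(new, stat.S_IREAD | stat.S_IWRITE)
        os.remove(new)
    try:
        os.rename(old, new)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        # old and new are on different devices: copy, then delete the source.
        shutil.copy2(old, new)
        os.remove(old)
```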
- schrodinger.utils.fileutils.force_copy2(*args)
Same as shutil.copy2, but does not raise shutil.SameFileError.
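A minimal sketch of the documented difference, wrapping `shutil.copy2` and swallowing only `shutil.SameFileError`. The return value on a same-file copy (`None` here) is a hypothetical choice, not something the documentation specifies.

```python
import shutil


def force_copy2_sketch(*args):
    """shutil.copy2(*args), treating a copy onto the same file as a no-op (sketch)."""
    try:
        return shutil.copy2(*args)
    except shutil.SameFileError:
        return None  # hypothetical choice: report that nothing was copied
```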
- schrodinger.utils.fileutils.atomic_copyfile(src_file: Union[pathlib.Path, str], dst_file: Union[pathlib.Path, str], *args)
Copy src_file to dst_file atomically, so that dst_file is not corrupted if the copy fails:
1. Copy src_file to a temporary file in the same directory as dst_file.
2. If the copy succeeded, rename the temporary file to dst_file.
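The two steps above can be sketched with `tempfile.mkstemp` and `os.replace`. This hypothetical version ignores the extra `*args` of the real function and does not copy file permissions.

```python
import os
import shutil
import tempfile


def atomic_copyfile_sketch(src_file, dst_file):
    """Copy src_file to dst_file so dst_file is never left half-written (sketch)."""
    dst_dir = os.path.dirname(os.path.abspath(dst_file))
    # Step 1: copy into a temporary file next to dst_file, so the final
    # rename stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, prefix=".tmp_copy_")
    os.close(fd)
    try:
        shutil.copyfile(src_file, tmp_path)
        # Step 2: only after a successful copy, move it over dst_file.
        os.replace(tmp_path, dst_file)
    except BaseException:
        os.remove(tmp_path)  # leave dst_file untouched on failure
        raise
```

`os.replace` is used rather than `os.rename` because it overwrites an existing destination on Windows as well as on POSIX systems.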
- schrodinger.utils.fileutils.patient_file_deletions(max_time: Union[int, float] = 10, interval: Union[int, float] = 0.2)
Context manager that allows force_remove and force_rmtree to wait longer for SharingViolationError exceptions to resolve themselves. This is only relevant on Windows; the context has no effect on Mac or Linux.
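A sketch of the kind of retry loop this context could enable, assuming `max_time` is the total wait in seconds and `interval` the pause between attempts. Both readings are inferred from the parameter names. The local `SharingViolationError` is a stand-in, and `retry_while_sharing_violation` is hypothetical.

```python
import time


class SharingViolationError(PermissionError):
    """Stand-in for the module's exception of the same name."""


def retry_while_sharing_violation(func, *args, max_time=10, interval=0.2):
    """Call func(*args), retrying on SharingViolationError for up to max_time s (sketch)."""
    deadline = time.monotonic() + max_time
    while True:
        try:
            return func(*args)
        except SharingViolationError:
            if time.monotonic() >= deadline:
                raise  # give up: the file is still locked by another process
            time.sleep(interval)
```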
- schrodinger.utils.fileutils.splitext(p: str) -> Tuple[str, str]
Split the extension from a pathname. Returns “(root, ext)”. Equivalent to os.path.splitext(), except that for gzip-compressed files such as *.mae.gz, “.mae.gz” is split off instead of just “.gz”. The same applies to other compressed files such as *.sdf.gz, *.sd.gz, and *.mol.gz.
- Parameters
p – a pathname
- Returns
The root filename and the file extension.
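A minimal sketch of the gzip-aware split. Unlike the real function, which may only recognize specific inner extensions, this hypothetical version keeps whatever extension precedes “.gz”.

```python
import os


def splitext_sketch(p):
    """os.path.splitext, but keep the inner extension of gzip files (sketch)."""
    root, ext = os.path.splitext(p)
    if ext.lower() == ".gz":
        # 'foo.mae.gz' -> ('foo.mae', '.gz') -> ('foo', '.mae' + '.gz')
        root, inner_ext = os.path.splitext(root)
        ext = inner_ext + ext
    return root, ext
```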
- class schrodinger.utils.fileutils.SeqFormat(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: enum.Enum
- fasta = 1
- swissprot = 2
- gcg = 3
- embl = 4
- pir = 5
- clustal = 6
- csv = 7
- schrodinger.utils.fileutils.get_file_extension(filename)
Return the file extension of the given file, including any suffix that precedes a “.gz” extension.
For example:
assert get_file_extension('myfile.txt') == '.txt'
assert get_file_extension('test.mae.gz') == '.mae.gz'
- Parameters
filename (str) – Name of the file whose extension is returned
- Returns
The file extension.
- Return type
str
- schrodinger.utils.fileutils.get_file_format(filename)
- schrodinger.utils.fileutils.get_structure_file_format(filename: str) -> Optional[str]
Return the format of a structure file, based on the filename extension. None is returned if the file extension is not recognized.
- Parameters
filename – Filename to detect format
- Returns
File format or None if not recognized
- schrodinger.utils.fileutils.get_sequence_file_format(filename: str) -> Optional[str]
Return the format of a sequence file, based on the filename extension. None is returned if the file extension is not recognized.
- Parameters
filename – Filename to detect format
- Returns
File format or None if not recognized
- schrodinger.utils.fileutils.get_name_filter(name_mapping: Dict[str, List[str]]) -> List[str]
Create filename filters for QFileDialog.
- Parameters
name_mapping – Mapping between category name and list of file types (must be keys of EXTENSIONS)
- Returns
List of filename filters
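A sketch of how such filters can be assembled in the QFileDialog format “Category (*.ext1 *.ext2)”. The `EXTENSIONS` table below is a hypothetical stand-in, since the module's real table is not documented here, and `get_name_filter_sketch` is likewise hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for the module's EXTENSIONS table: file type -> extensions.
EXTENSIONS = {
    "maestro": [".mae", ".maegz", ".mae.gz"],
    "sd": [".sdf", ".sd", ".sdf.gz"],
    "pdb": [".pdb", ".ent"],
}


def get_name_filter_sketch(name_mapping: Dict[str, List[str]]) -> List[str]:
    """Build QFileDialog filter strings such as 'Maestro (*.mae *.maegz)' (sketch)."""
    filters = []
    for category, file_types in name_mapping.items():
        patterns = [f"*{ext}" for ftype in file_types for ext in EXTENSIONS[ftype]]
        filters.append(f"{category} ({' '.join(patterns)})")
    return filters
```

When a single filter string is needed, Qt expects the individual filters to be joined with ";;".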
- schrodinger.utils.fileutils.is_pdb_file(filename: str) -> bool
Returns whether the specified filename represents a PDB file.
- Parameters
filename – a filename
- Returns
Whether the file is a PDB file.
- schrodinger.utils.fileutils.is_maestro_file(filename: str) -> bool
Returns True if the specified filename represents a Maestro file.
- Parameters
filename – a filename
- Returns
Whether this filename is a Maestro file.
- schrodinger.utils.fileutils.is_sd_file(filename: str) -> bool
Returns True if the specified filename represents an SD file.
- Parameters
filename – a filename
- Returns
Whether this filename is an SD file.
- schrodinger.utils.fileutils.is_csv_file(filename: str) -> bool
Returns True if the specified filename represents a CSV file.
- Parameters
filename – a filename
- Returns
Whether this filename is a CSV file.
- schrodinger.utils.fileutils.is_smiles_file(filename: str) -> bool
Returns True if the specified filename represents a SMILES file.
- Parameters
filename – a filename
- Returns
Whether this filename is a SMILES file.
- schrodinger.utils.fileutils.is_poseviewer_file(filename: str) -> bool
Determines whether the filename follows Pose Viewer file naming conventions.
In effect, this checks whether the file name ends with ‘_pv’ or ‘_epv’ followed by a Maestro file extension (‘.mae’, ‘.mae.gz’, or ‘.maegz’). Roughly equivalent to the regular expression r'_e?pv\.mae(\.?gz)?$'.
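The regular expression quoted above can be applied directly. This is only an approximation of `is_poseviewer_file()`. `looks_like_pv_file` is a hypothetical helper, and the match is case-sensitive, which the real function may not be.

```python
import re

# The pattern quoted in the documentation, with the dots escaped.
_PV_PATTERN = re.compile(r"_e?pv\.mae(\.?gz)?$")


def looks_like_pv_file(filename: str) -> bool:
    """Approximate is_poseviewer_file() with the documented regular expression."""
    return _PV_PATTERN.search(filename) is not None
```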
- schrodinger.utils.fileutils.is_epv_file(filename: str) -> bool
Determines whether a filename follows Extended Pose Viewer (EPV) file naming conventions.
- schrodinger.utils.fileutils.split_ext_pv(filename: str)
Return stem and extension, while accounting for compression and ‘_pv’ or ‘_epv’ as part of the extension.
For example:
split_ext_pv('/path/to/foo_pv.mae.gz') # -> ('foo', '_pv.mae.gz')
- Returns
A tuple with the stem and extension. The extension portion will include ‘_pv’ or ‘_epv’, if present.
- Return type
tuple[str, str]
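A sketch that reproduces the documented example, including dropping the leading directory from the stem. It assumes only a trailing “.gz” and a ‘_pv’/‘_epv’ marker need special handling. `split_ext_pv_sketch` is a hypothetical name.

```python
import os


def split_ext_pv_sketch(filename: str):
    """Split off an extension that includes '_pv'/'_epv' and any '.gz' (sketch)."""
    root, ext = os.path.splitext(os.path.basename(filename))
    if ext == ".gz":
        # 'foo_pv.mae.gz' -> ('foo_pv.mae', '.gz') -> ('foo_pv', '.mae.gz')
        root, inner_ext = os.path.splitext(root)
        ext = inner_ext + ext
    for marker in ("_epv", "_pv"):
        if root.endswith(marker):
            root = root[: -len(marker)]
            ext = marker + ext
            break
    return root, ext
```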
- schrodinger.utils.fileutils.is_cms_file(filename: str) -> bool
Returns True if the specified filename represents a CMS file.
- Parameters
filename – a filename
- Returns
Whether this filename is a CMS file.
- schrodinger.utils.fileutils.is_hypothesis_file(filename: str) -> bool
Returns True if the specified filename represents a Phase hypothesis file. The .phypo extension corresponds to a gzipped Maestro file containing a single ct which is a Phase hypothesis.
- Parameters
filename – a filename
- Returns
Whether this filename is a Phase hypothesis file.
- schrodinger.utils.fileutils.strip_extension(filename: str) -> str
Return a new file path without extension. Suffixes such as “_pv” and “_epv” are also removed.
- schrodinger.utils.fileutils.get_basename(filename: str) -> str
Returns the final component of specified path name minus the extension. Suffixes such as “_pv” and “_epv” are also stripped.
- schrodinger.utils.fileutils.is_gzipped_structure_file(filename: str) -> bool
Returns True if the filename represents a gzipped file with a recognized structure file extension.
- Parameters
filename – a filename
- Returns
Whether this filename is a gzipped structure file.
- schrodinger.utils.fileutils.is_valid_jobname(jobname: str) -> bool
Returns True if the specified job name is valid, i.e. it contains no illegal characters and does not start with “.”.
- schrodinger.utils.fileutils.get_jobname(filename: str) -> str
Returns a job name derived from the specified filename. Same as get_basename(), except that illegal characters are removed.
- schrodinger.utils.fileutils.get_next_filename_prefix(path: str, midfix: str, zfill_width: int = 0) -> str
Return the next filename prefix in the series <root><midfix><number>.
Given a path (absolute or relative) to a filename or filename prefix, return the next prefix in the sequence implied by path and midfix. For example, with a path of /full/path/to/foo.mae, path/to/foo.mae, foo.mae, /full/path/to/foo, path/to/foo, or foo, and a midfix of ‘-’, this function returns “foo-3” if any file with the prefix foo-2 (and no higher-numbered foo-*) is present, and “foo-1” if no file with a prefix of the form foo-<number> is present. In effect, any file-name extension in the path argument is ignored.
This function differs from get_next_filename() in that all files sharing the prefix contained in the path are searched, regardless of extension, and the next filename prefix is returned.
The search is case sensitive or not depending on the semantics of the file system. The leading directory of the path, if any, is included in the return value.
Usage note: you might use this when the filename prefix could be exhibited by many files and you don’t want to overwrite any of them. For example, you are starting up a job which will create many files with the same prefix.
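A sketch of the search described above. It scans the directory for `<root><midfix><number>` followed by an extension or the end of the name, then returns the next number. `next_filename_prefix_sketch` is hypothetical and makes some simplifying assumptions:

- it ignores `zfill_width`, whose semantics are not documented here;
- the match is always case-sensitive;
- it strips only the last extension of the path.

```python
import os
import re


def next_filename_prefix_sketch(path: str, midfix: str) -> str:
    """Return <dir>/<root><midfix><n+1>, where n is the highest number in use (sketch)."""
    dirname, basename = os.path.split(path)
    root = os.path.splitext(basename)[0]  # any extension in path is ignored
    # '<root><midfix><digits>' followed by an extension or the end of the name.
    pattern = re.compile(re.escape(root + midfix) + r"(\d+)(?:\.|$)")
    highest = 0
    for name in os.listdir(dirname or "."):
        match = pattern.match(name)
        if match:
            highest = max(highest, int(match.group(1)))
    return os.path.join(dirname, f"{root}{midfix}{highest + 1}")
```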
- schrodinger.utils.fileutils.get_next_filename(path: str, midfix: str, zfill_width: int = 0)
Return the next filename in the series <root><midfix><number>.<ext>.
Given a path (absolute or relative) to a filename, return the next filename in the sequence implied by path and midfix. For example, with a path of /full/path/to/foo.mae, path/to/foo.mae, or foo.mae and a midfix of ‘-’, this function returns “foo-3.mae” if the file foo-2.mae (and no higher-numbered foo-*.mae) is present, and “foo-1.mae” if no file named foo-<number>.mae is present.
This function differs from get_next_filename_prefix() in that only files with the specified extension are searched, and the next full filename is returned.
The search is case sensitive or not depending on the semantics of the file system. The leading directory of the path, if any, is included in the return value.
Usage note: You might use this when you are expecting to update only a single file: the one whose filename is given in the path. For example, you are exporting structures to a .mae file and you want to pick a non-conflicting name based on a user’s filename specification.
Return the path to the local $SCHRODINGER/mmshare-*/ directory
- Returns
Path to the “mmshare” directory.
Return the path of the local $SCHRODINGER/mmshare-*/data/ directory.
- Returns
Path to the “data” directory.
Return the path of the $SCHRODINGER/mmshare-*/python/scripts/ directory.
- Returns
Path to the “scripts” directory.
Return the path of the $SCHRODINGER/mmshare-*/python/common/ directory.
- Returns
Path to the “common” directory.
- schrodinger.utils.fileutils.get_docs_dir() -> str
Return the path to the local $SCHRODINGER/docs/ directory.
- Returns
Path to the “docs” directory.
- schrodinger.utils.fileutils.get_directory_path(which_directory) -> str
Return the path of the specified Schrodinger-specific directory.
If an invalid which_directory is specified, a TypeError is raised.
Valid directories are:
HOME : To get user’s home dir
APPDATA : To get the Schrodinger application shared data dir
LOCAL_APPDATA : To get the Schrodinger application local data dir
USERDATA : To get user’s data dir
TEMP : To get default temporary data dir
DESKTOP : To get user’s desktop dir
DOCUMENTS : To get user’s ‘My Documents’ dir
NETWORK : To get user’s ‘My Network places’ dir (only for Windows)
- Return type
str
- Returns
Directory path
- schrodinger.utils.fileutils.get_directory(which_directory) -> (int, str)
- Deprecated
Because this function behaves in a non-standard way by returning an mmlib status, get_directory_path is preferred.
- schrodinger.utils.fileutils.get_home_dir() -> str
- Deprecated
get_directory_path should be used instead.
- schrodinger.utils.fileutils.get_appdata_dir() -> str
- Deprecated
get_directory_path should be used instead.
- schrodinger.utils.fileutils.get_local_appdata_dir() -> str
- Deprecated
get_directory_path should be used instead.
- schrodinger.utils.fileutils.get_desktop_dir() -> str
- Deprecated
get_directory_path should be used instead.
- schrodinger.utils.fileutils.get_mydocuments_dir() -> str
- Deprecated
get_directory_path should be used instead.
- schrodinger.utils.fileutils.get_mynetworkplaces_dir() -> str
- Deprecated
get_directory_path should be used instead.
- schrodinger.utils.fileutils.get_userdata_dir() -> str
- Deprecated
get_directory_path should be used instead.
- schrodinger.utils.fileutils.get_schrodinger_temp_dir() -> str
- Deprecated
get_directory_path should be used instead.
- schrodinger.utils.fileutils.locate_darwin_pymol() -> Optional[str]
Return the path to PyMOL on a macOS system, or None if no PyMOL installation is found.
- schrodinger.utils.fileutils.locate_pymol() -> Optional[str]
Find the executable or script we use to launch PyMOL.
- Returns
The pymol launch command or None if PyMOL was not found
- schrodinger.utils.fileutils.get_pymol_cmd(use_x11: bool = False) -> List[str]
Get a command list for launching PyMOL. This may include extra platform-specific arguments.
- Parameters
use_x11 – if True, -m is added to the launch command on Mac
- Returns
a cmd list with the executable as first element and any other options following it.
- class schrodinger.utils.fileutils.chdir(dirname: Union[pathlib.Path, str])
Bases: object
A context manager that runs the enclosed block inside the specified directory and restores the previous working directory when done.
- __init__(dirname: Union[pathlib.Path, str])
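A minimal class-based sketch of such a context manager. `chdir_sketch` is hypothetical. Python 3.11 and later also provide `contextlib.chdir` with the same save-and-restore behaviour.

```python
import os


class chdir_sketch:
    """Run a block in dirname, then restore the previous working directory (sketch)."""

    def __init__(self, dirname):
        self.dirname = dirname
        self._previous = None

    def __enter__(self):
        self._previous = os.getcwd()
        os.chdir(self.dirname)
        return self.dirname

    def __exit__(self, exc_type, exc_value, traceback):
        # Restore even if the block raised; returning False re-raises it.
        os.chdir(self._previous)
        return False
```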
- schrodinger.utils.fileutils.create_hard_link(source: str, link_name: str)
Create a hard link named link_name that points to source.
On Windows, this uses CreateHardLinkA() and raises RuntimeError on failure.
On other operating systems, it uses os.link() and raises OSError on failure.
- schrodinger.utils.fileutils.mkdir_p(path: str, *mode)
- Deprecated
Use os.makedirs(path, exist_ok=True) instead.
- class schrodinger.utils.fileutils.tempfilename(prefix='tmp', suffix='', temp_dir=None)
Bases: str
- remove()
- class schrodinger.utils.fileutils.TempStructureFile(sts)
- schrodinger.utils.fileutils.cat(source_filenames: List[str], dest_filename: str)
Concatenate the contents of the source files, writing them to a destination file. All files are specified by name. If source_filenames is an empty list, an empty file is produced.
- Parameters
source_filenames – input files
dest_filename – destination file
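A minimal sketch of the concatenation using `shutil.copyfileobj`, which streams each file in chunks instead of reading it whole. `cat_sketch` is a hypothetical name.

```python
import shutil


def cat_sketch(source_filenames, dest_filename):
    """Concatenate source files into dest_filename; [] yields an empty file (sketch)."""
    with open(dest_filename, "wb") as dest:
        for name in source_filenames:
            with open(name, "rb") as src:
                shutil.copyfileobj(src, dest)  # byte-exact, chunked copy
```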
- schrodinger.utils.fileutils.cat_flat_files(source_filenames: Iterable[str], dest_filename: str)
Combine multiple flat files (such as CSVs) into one large file.
Each source file is expected to start with a header line; only the header from the first source file is written to the destination file. Any of the files may be compressed.
- Parameters
source_filenames – A list of paths to source files.
dest_filename – The name of the destination file.
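A sketch of header-aware concatenation, assuming that “compressed” means gzip files recognized by a “.gz” suffix. `cat_flat_files_sketch`, `_open_text` and `_terminated` are hypothetical helpers.

```python
import gzip


def _open_text(name, mode):
    """Hypothetical helper: open '.gz' files through gzip, others directly (text mode)."""
    if name.endswith(".gz"):
        return gzip.open(name, mode + "t")
    return open(name, mode)


def _terminated(line):
    return line if line.endswith("\n") else line + "\n"


def cat_flat_files_sketch(source_filenames, dest_filename):
    """Concatenate flat files, writing only the first file's header line (sketch)."""
    with _open_text(dest_filename, "w") as dest:
        for index, name in enumerate(source_filenames):
            with _open_text(name, "r") as src:
                header = src.readline()  # skipped for every file but the first
                if index == 0 and header:
                    dest.write(_terminated(header))
                for line in src:
                    # Guard against a last line with no trailing newline.
                    dest.write(_terminated(line))
```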
- schrodinger.utils.fileutils.tar_files(tarname: str, mode: str, files: List[str])
Write files to a tar archive.
- Parameters
tarname – Tar file name.
mode – File open mode.
files – Iterable over file names to be added to the archive.
- schrodinger.utils.fileutils.zip_files(zipname: str, mode: str, files: List[str])
Write files to a zip archive.
- Parameters
zipname – Zip file name.
mode – File open mode.
files – Iterable over file names to be added to the archive.
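A sketch of both archive writers, assuming `mode` is passed straight to `tarfile.open` (e.g. "w:gz") or `zipfile.ZipFile` (e.g. "w", "a"). Storing each member under its base name is a hypothetical choice, not documented behaviour.

```python
import os
import tarfile
import zipfile


def tar_files_sketch(tarname, mode, files):
    """Add each file to a tar archive opened with tarfile.open(tarname, mode) (sketch)."""
    with tarfile.open(tarname, mode) as tar:
        for name in files:
            tar.add(name, arcname=os.path.basename(name))


def zip_files_sketch(zipname, mode, files):
    """Add each file to a zip archive opened with zipfile.ZipFile(zipname, mode) (sketch)."""
    with zipfile.ZipFile(zipname, mode) as zf:
        for name in files:
            zf.write(name, arcname=os.path.basename(name))
```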
- schrodinger.utils.fileutils.is_within_directory(directory, afile)
- schrodinger.utils.fileutils.safe_extractall_tar(tar, path='.', *args, **kwargs)
Extract all files from a tar file safely. See Python vulnerability CVE-2007-4559 for details on the path-traversal issue with the TarFile.extractall() method, and see the TarFile.extractall() documentation for details on args and kwargs.
- Parameters
tar (tarfile.TarFile) – TarFile object
path (str) – path of the directory where the tar file will be extracted
- schrodinger.utils.fileutils.safe_extractall_zip(zip_file, path='.', *args, **kwargs)
Extract all files from a zip file safely. See Python vulnerability CVE-2007-4559 (reported against tarfile) for details on the path-traversal issue with extractall(), and see the ZipFile.extractall() documentation for details on args and kwargs.
- Parameters
zip_file (zipfile.ZipFile) – ZipFile object
path (str) – path of the directory where the zip file will be extracted
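A sketch of the usual mitigation for CVE-2007-4559. Before extracting, it checks that every member path resolves inside the target directory, and it uses the standard "data" extraction filter where the running Python provides one. Both functions are hypothetical stand-ins for `is_within_directory` and `safe_extractall_tar`, whose bodies are not documented here. The check does not inspect link targets, which the "data" filter covers on newer Pythons.

```python
import io
import os
import tarfile


def is_within_directory_sketch(directory, target):
    """True if target resolves to directory itself or a path below it."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extractall_tar_sketch(tar, path=".", *args, **kwargs):
    """Refuse to extract any member that would land outside path (sketch)."""
    for member in tar.getmembers():
        if not is_within_directory_sketch(path, os.path.join(path, member.name)):
            raise ValueError(f"Attempted path traversal in tar file: {member.name}")
    if hasattr(tarfile, "data_filter"):
        # Python 3.12+ (and some patched older releases) ship extraction filters.
        kwargs.setdefault("filter", "data")
    tar.extractall(path, *args, **kwargs)
```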
- schrodinger.utils.fileutils.on_same_drive_letter(path_a: str, path_b: str) -> bool
Return True if path_a and path_b are on the same drive letter. On systems without drive letters, always return True.
- schrodinger.utils.fileutils.get_files_from_folder(folder_abs_path: str) -> List[Tuple[str, str]]
Walk through a folder and find all files inside it.
- Parameters
folder_abs_path – folder path
- Returns
A list of tuples, each containing the absolute path of a file and the relative path to which the file will be transferred.
- schrodinger.utils.fileutils.change_working_directory(folder: Union[pathlib.Path, str])
A context manager to temporarily change the working directory to folder.
- Parameters
folder – the folder that becomes the working directory
- schrodinger.utils.fileutils.in_temporary_directory()
A context manager for executing a block of code in a temporary directory.
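A generator-based sketch, assuming the temporary directory is created on entry and deleted on exit. The documentation above does not say whether the directory is deleted. `in_temporary_directory_sketch` is hypothetical.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def in_temporary_directory_sketch():
    """Run the block inside a fresh temporary directory, deleted afterwards (sketch)."""
    previous = os.getcwd()
    with tempfile.TemporaryDirectory() as tmpdir:
        os.chdir(tmpdir)
        try:
            yield tmpdir
        finally:
            # Leave the directory before TemporaryDirectory deletes it.
            os.chdir(previous)
```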
- schrodinger.utils.fileutils.mmfile_path(path: Optional[str] = None)
Context manager and decorator that resets the mmfile search path on exit. If the optional path is supplied, it is set on entry.
- Parameters
path – mmfile path to set while in the context
- schrodinger.utils.fileutils.count_lines(filename: str) -> int
Count the number of newlines in a file, in a way similar to “wc -l”.
- Parameters
filename – input filename
- Returns
number of newlines in file
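Counting newline bytes in fixed-size chunks gives the “wc -l” semantics without loading the whole file. In particular, a final line with no trailing newline is not counted. `count_lines_sketch` and its `chunk_size` parameter are hypothetical.

```python
def count_lines_sketch(filename: str, chunk_size: int = 1 << 20) -> int:
    """Count newline bytes like `wc -l` (sketch)."""
    count = 0
    with open(filename, "rb") as fh:
        while chunk := fh.read(chunk_size):
            count += chunk.count(b"\n")
    return count
```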
- schrodinger.utils.fileutils.get_directory_size(dirpath)
Get the size of the given directory in MB (where 1 MB = 1e6 bytes).
- Parameters
dirpath (str) – The path to the directory
- Return type
float
- Returns
The size of the directory in MB
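A sketch that sums the sizes of all regular files below the directory and converts to decimal megabytes. Skipping symbolic links is an assumption of this hypothetical `get_directory_size_sketch`.

```python
import os


def get_directory_size_sketch(dirpath):
    """Total size of regular files under dirpath, in MB (1 MB = 1e6 bytes) (sketch)."""
    total_bytes = 0
    for root, _dirs, files in os.walk(dirpath):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):  # assumption: symlinks are not counted
                total_bytes += os.path.getsize(path)
    return total_bytes / 1e6
```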
- schrodinger.utils.fileutils.get_existing_filepath(path_file: str) -> Optional[str]
Look for the file at the given path, in the current working directory, and in the original launch directory, and return the first path found.
This can be useful when the file has been copied from path_file to the CWD, such as when launchapi copies a file from an absolute path on the local machine into the job directory on a remote machine.
It can also be useful when large files (e.g. trajectory files) are not copied from path_file to the job launch directory for localhost jobs: the job running in the current launch directory can then access the files in the original launch directory.
- Returns
The path found, or None if the file cannot be located
- schrodinger.utils.fileutils.xyz_to_sdf(xyz_filepath: str, out_sdf: Optional[str] = None, save_file: bool = True) -> str
Convert an XYZ-format file to an SDF file.
- Parameters
xyz_filepath – input filename, including its path
out_sdf – the output SDF filename. If None, the output filename is derived from the input filename.
save_file – If False, the output is written to stdout instead of a file.
- Returns
the output SDF filename
- Raises
ValueError – if the input file has the wrong extension
RuntimeError – if the XYZ file could not be converted
- schrodinger.utils.fileutils.open_maybe_compressed(filename: str, *a, **d) -> io.IOBase
Open a file using the gzip module if the filename ends in “gz”, or the builtin open() otherwise. All other arguments are passed through.
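A minimal sketch of the dispatch. `gzip.open` defaults to binary mode, while the builtin `open` defaults to text mode, so callers should pass an explicit mode such as "rt" to get the same kind of object from both. `open_maybe_compressed_sketch` is hypothetical.

```python
import gzip


def open_maybe_compressed_sketch(filename, *args, **kwargs):
    """Open with gzip.open() if filename ends in 'gz', else the builtin open() (sketch)."""
    opener = gzip.open if filename.endswith("gz") else open
    return opener(filename, *args, **kwargs)
```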
- schrodinger.utils.fileutils.get_csv_file_column_count(csv_file: str) -> int
Return the number of columns in the CSV file.
- Parameters
csv_file – CSV file path.
- Returns
Number of columns in the CSV file.
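A sketch that counts the fields of the first row with the standard `csv` module, so quoted commas are handled correctly. It assumes the first row is representative of the whole file. `get_csv_file_column_count_sketch` is hypothetical.

```python
import csv


def get_csv_file_column_count_sketch(csv_file: str) -> int:
    """Count the fields in the first row of a CSV file; 0 for an empty file (sketch)."""
    with open(csv_file, newline="") as fh:
        first_row = next(csv.reader(fh), [])
    return len(first_row)
```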
- schrodinger.utils.fileutils.hash_for_file(path, algorithm=hashlib.md5, buff_size=8388608)
Get the hash of a file.
- Parameters
path (str) – File path
algorithm (method) – Hash algorithm to use (the default is hashlib.md5)
buff_size (int) – Read buffer size, in bytes (the default is 8 MiB)
- Return type
str
- Returns
File hash
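A sketch of chunked hashing that keeps memory use bounded by `buff_size`. It assumes `algorithm` is a `hashlib` constructor and that the returned string is the hex digest. `hash_for_file_sketch` is hypothetical.

```python
import hashlib


def hash_for_file_sketch(path, algorithm=hashlib.md5, buff_size=8 * 1024 * 1024):
    """Hash a file in buff_size chunks and return the hex digest (sketch)."""
    hasher = algorithm()
    with open(path, "rb") as fh:
        while chunk := fh.read(buff_size):
            hasher.update(chunk)
    return hasher.hexdigest()
```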
- schrodinger.utils.fileutils.extended_windows_path(dos_path, only_if_required=True)
Convert a path to an absolute path and, on Windows, prepend the extended path tag.
- Parameters
dos_path (str) – a Windows file path, which may be longer than 256 characters and therefore invalid
only_if_required (bool) – If True, prepend the Windows extended path tag only to file paths that exceed WINDOWS_MAX_PATH in length.
- Return type
str
- Returns
A Windows extended file path, which can accommodate 30000+ characters
- schrodinger.utils.fileutils.slugify(text)
Slugify text for use in a URL or file name.
Based on the Django implementation. (https://github.com/django/django/blob/dcebc5da4831d2982b26d00a9480ad538b5c5acf/django/utils/text.py#L400)
- Parameters
text (str) – Text to slugify
- Returns
Slugified text
- Return type
str
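A sketch that mirrors the linked Django implementation with its default ASCII-only behaviour. The module's own version may differ, and `slugify_sketch` is hypothetical.

```python
import re
import unicodedata


def slugify_sketch(text):
    """Django-style slug: ASCII, lowercase, runs of spaces/hyphens become '-' (sketch)."""
    value = unicodedata.normalize("NFKD", str(text))
    value = value.encode("ascii", "ignore").decode("ascii")  # drop accents etc.
    value = re.sub(r"[^\w\s-]", "", value.lower())  # keep word chars, spaces, hyphens
    return re.sub(r"[-\s]+", "-", value).strip("-_")
```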
- schrodinger.utils.fileutils.is_subpath(path, parent_dir, strict=False)
Returns whether the specified path is a subpath of the specified parent directory.
- Parameters
strict (bool) – if False, parent_dir is considered a subpath of itself. Set to True so that only proper subpaths qualify.
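A `pathlib` sketch of the `strict` semantics. It resolves both paths first, so ".." components and symlinks cannot escape the parent. Resolving is an assumption of this hypothetical `is_subpath_sketch`.

```python
from pathlib import Path


def is_subpath_sketch(path, parent_dir, strict=False):
    """True if path is parent_dir or lies below it; strict excludes parent_dir (sketch)."""
    child = Path(path).resolve()
    parent = Path(parent_dir).resolve()
    if child == parent:
        return not strict
    return parent in child.parents
```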
- schrodinger.utils.fileutils.split_file_round_robin(infile, outfiles, has_header)
Split a large file into smaller files round-robin: with k output files, every k-th input line goes to the same output file, so consecutive lines are dealt out to the output files in turn. To be used with flat data files such as CSV or SMI files.
- Parameters
infile (str) – The input file to be split.
outfiles (list[str]) – The files to be written to. Any existing file is overwritten.
has_header (bool) – Whether the input file has a header line.
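A sketch of the round-robin split using `contextlib.ExitStack`, so every output file is closed even if one fails to open. Copying the header into each output file is an assumption, not documented behaviour. `split_file_round_robin_sketch` is hypothetical.

```python
import contextlib


def split_file_round_robin_sketch(infile, outfiles, has_header):
    """Deal data lines to outfiles in turn: line i goes to outfiles[i % k] (sketch)."""
    with contextlib.ExitStack() as stack:
        handles = [stack.enter_context(open(name, "w")) for name in outfiles]
        src = stack.enter_context(open(infile))
        if has_header:
            header = src.readline()
            for out in handles:
                out.write(header)  # assumption: every chunk gets the header
        for i, line in enumerate(src):
            handles[i % len(handles)].write(line)
```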
- class schrodinger.utils.fileutils.MultiFileReader(files, *, have_header=True)
Bases: object
An iterator context manager that reads a collection of flat files. The files are logically concatenated, so all records are treated as one large file. A mixture of compressed and uncompressed files is supported.
- Variables
header (Optional[str]) – If ‘have_header’ is set, this contains the header line, i.e. the first line of the last file that was read.
- __init__(files, *, have_header=True)
- Parameters
files (Iterable[str]) – The files to be read.
have_header (bool) – Whether the files have a header. If so, the header is stored in the member variable ‘header’.
- close()
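A simplified sketch of such a reader. It assumes that compressed files are gzip files named “*.gz” and that each file's header line is skipped and kept in `header`. `MultiFileReaderSketch` is hypothetical, and it is an iterable rather than an iterator, so iterate it with `for` or `list()` instead of `next()`.

```python
import gzip


class MultiFileReaderSketch:
    """Iterate over the lines of several flat files as one stream (sketch)."""

    def __init__(self, files, *, have_header=True):
        self._files = list(files)
        self._have_header = have_header
        self._current = None
        self.header = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False

    def __iter__(self):
        for name in self._files:
            opener = gzip.open if name.endswith(".gz") else open
            with opener(name, "rt") as self._current:
                if self._have_header:
                    # Each file's first line is its header; keep the latest one.
                    self.header = self._current.readline()
                yield from self._current
        self._current = None

    def close(self):
        """Close the file currently being read, if any."""
        if self._current is not None:
            self._current.close()
            self._current = None
```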
- schrodinger.utils.fileutils.touch(path)
Touch a path.
- schrodinger.utils.fileutils.gzip_file(infile, outfile, remove_original=False)
Create a new gzip-compressed file from the input file.
- Parameters
infile (Path | str) – The input file to be compressed.
outfile (Path | str) – The destination file.
remove_original (bool) – Whether to delete the original file after compression is completed.
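A minimal sketch with `gzip.open` and `shutil.copyfileobj`. The original is deleted only after the compressed copy has been fully written and closed. `gzip_file_sketch` is hypothetical.

```python
import gzip
import os
import shutil


def gzip_file_sketch(infile, outfile, remove_original=False):
    """Write a gzip-compressed copy of infile to outfile (sketch)."""
    with open(infile, "rb") as src, gzip.open(outfile, "wb") as dst:
        shutil.copyfileobj(src, dst)
    if remove_original:
        os.remove(infile)  # both files are closed by the time we get here
```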