schrodinger.utils.fileutils module

A module of file utilities to deal with common file issues.

NOTE: This module is used in scripts that need to be able to run without a Schrodinger license, and therefore can’t depend on the pymmlibs.

The force_remove and force_rename functions deal with the fact that os.remove() and os.rename() don’t work on Windows if write permissions are not enabled.

Copyright Schrodinger LLC, All Rights Reserved.

exception schrodinger.utils.fileutils.SharingViolationError

Bases: PermissionError

schrodinger.utils.fileutils.force_remove(*args)

Remove each file in ‘args’ in a platform-independent way without raising an exception, regardless of whether the file exists or lacks write permission.

Parameters

args (str) – the pathnames of the files to remove
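
The behavior described for force_remove can be approximated with the standard library alone. The following is a minimal sketch, not the module's implementation: the name force_remove_sketch is hypothetical, and the sketch assumes that restoring write permission before deletion is enough to remove a read-only file.

```python
import os
import stat


def force_remove_sketch(*paths):
    """Remove each path, ignoring missing files and read-only flags.

    Hypothetical illustration only; not the Schrodinger implementation.
    """
    for path in paths:
        try:
            # On Windows, os.remove() fails on read-only files, so make the
            # file writable first.
            os.chmod(path, stat.S_IWRITE | stat.S_IREAD)
            os.remove(path)
        except FileNotFoundError:
            # A missing file is not treated as an error.
            continue
```

The sketch does not retry on sharing violations, which the real function may do on Windows.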

schrodinger.utils.fileutils.force_rmtree(dirname: Union[str, pathlib.Path], ignore_errors: bool = False)

Remove the directory ‘dirname’, using force_remove to remove any difficult to remove files or sub-directories.

Parameters
  • dirname – the directory to remove

  • ignore_errors – If True, silently ignore errors, otherwise raise OSError

schrodinger.utils.fileutils.force_rename(old: Union[pathlib.Path, str], new: Union[pathlib.Path, str])

Rename a file, even if a file at the new name exists, and even if that file doesn’t have write permission, and even if old and new are on different devices.

Parameters
  • old – Path to the file source.

  • new – Path to the file destination.

Note

Renaming may not be an atomic operation. If the ‘new’ file exists then it is first removed then renamed in two operations. Similarly, if old and new are not on the same device then the file is copied to ‘new’ then the ‘old’ file is removed.
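
The non-atomic fallback described in the note might look roughly like the sketch below. It is an assumption-laden illustration using only the standard library: force_rename_sketch is a hypothetical name, and the real function may order or guard these steps differently.

```python
import os
import shutil
import stat


def force_rename_sketch(old, new):
    """Move 'old' to 'new', replacing 'new' even if it is read-only.

    Hypothetical illustration only; not the Schrodinger implementation.
    """
    if not os.path.exists(old):
        # Fail early so a missing source never deletes the destination.
        raise FileNotFoundError(old)
    try:
        # Fast path: a single-step rename that also overwrites 'new'.
        os.replace(old, new)
        return
    except OSError:
        pass  # e.g. a read-only target on Windows, or a different device
    if os.path.exists(new):
        # First operation: remove the existing destination.
        os.chmod(new, stat.S_IWRITE | stat.S_IREAD)
        os.remove(new)
    # Second operation: shutil.move copies then deletes across devices.
    shutil.move(old, new)
```

Between the removal and the move there is a window in which ‘new’ does not exist, which is why the operation cannot be considered atomic.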

schrodinger.utils.fileutils.force_copy2(*args)

Same as shutil.copy2 but don’t raise shutil.SameFileError.

schrodinger.utils.fileutils.atomic_copyfile(src_file: Union[pathlib.Path, str], dst_file: Union[pathlib.Path, str], *args)

Prevents corruption of dst_file if the copy is unsuccessful. Atomic copyfile action:

  1. Copy src_file to a temporary file in the same directory as dst_file.

  2. Rename the temporary file to dst_file if the copy was successful.
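
The copy-then-rename pattern can be sketched with the standard library as follows. The function name atomic_copy_sketch and the temporary-file prefix are hypothetical; this is an illustration of the technique, not the module's code.

```python
import os
import shutil
import tempfile


def atomic_copy_sketch(src_file, dst_file):
    """Copy src_file to dst_file via a temporary file next to dst_file."""
    dst_dir = os.path.dirname(os.path.abspath(dst_file))
    # Keep the temporary file in the destination directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, prefix=".tmp_copy_")
    os.close(fd)
    try:
        shutil.copyfile(src_file, tmp_path)
        # os.replace overwrites an existing destination in a single step.
        os.replace(tmp_path, dst_file)
    except BaseException:
        # The copy failed: discard the partial file, leave dst_file as it was.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the copy fails partway, any existing dst_file is left untouched because only the temporary file was written.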

schrodinger.utils.fileutils.patient_file_deletions(max_time: Union[int, float] = 10, interval: Union[int, float] = 0.2)

Allow force_remove and force_rmtree to wait longer for SharingViolationError exceptions to resolve themselves. This is only relevant on Windows and this context has no effect on Mac or Linux.

schrodinger.utils.fileutils.splitext(p: str) Tuple[str, str]

Split the extension from a pathname. Returns “(root, ext)”. Equivalent to os.path.splitext(), except that for gzip compressed files, such as *.mae.gz files, “.mae.gz” is split off instead of “.gz”. The same applies to *.sdf.gz, *.sd.gz, and *.mol.gz files.

Parameters

p – a pathname

Returns

The root filename and the file extension.
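
A gzip-aware split of this kind can be sketched as below. The name splitext_sketch and the list of inner extensions are assumptions drawn from the examples named in the description; the real function may recognize a different set.

```python
import os

# Assumed inner extensions, taken from the examples in the description.
COMPRESSED_INNER_EXTS = (".mae", ".sdf", ".sd", ".mol")


def splitext_sketch(p):
    """Like os.path.splitext, but keep e.g. '.mae.gz' together."""
    root, ext = os.path.splitext(p)
    if ext == ".gz":
        inner_root, inner_ext = os.path.splitext(root)
        if inner_ext in COMPRESSED_INNER_EXTS:
            return inner_root, inner_ext + ext
    return root, ext
```

With this sketch, 'foo.mae.gz' splits into ('foo', '.mae.gz'), while an unrecognized pair such as 'foo.txt.gz' still splits into ('foo.txt', '.gz').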

class schrodinger.utils.fileutils.SeqFormat(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: enum.Enum

fasta = 1
swissprot = 2
gcg = 3
embl = 4
pir = 5
clustal = 6
csv = 7

schrodinger.utils.fileutils.get_file_extension(filename)

Return the file extension of the given file, including any suffixes prior to “.gz” extension.

For example:

assert get_file_extension('myfile.txt') == '.txt'
assert get_file_extension('test.mae.gz') == '.mae.gz'

Parameters

filename – File name to detect the format

Type

str

Returns

Extension of the file.

Return type

str

schrodinger.utils.fileutils.get_file_format(filename)

schrodinger.utils.fileutils.get_structure_file_format(filename: str) Optional[str]

Return the format of a structure file, based on the filename extension. None is returned if the file extension is not recognized.

Parameters

filename – Filename to detect format

Returns

File format or None if not recognized

schrodinger.utils.fileutils.get_sequence_file_format(filename: str) Optional[str]

Return the format of a sequence file, based on the filename extension. None is returned if the file extension is not recognized.

Parameters

filename – Filename to detect format

Returns

File format or None if not recognized

schrodinger.utils.fileutils.get_name_filter(name_mapping: Dict[str, List[str]]) List[str]

Create filename filters for QFileDialog

Parameters

name_mapping – Mapping between category name and list of file types (must be keys of EXTENSIONS)

Returns

List of filename filters

schrodinger.utils.fileutils.is_pdb_file(filename: str) bool

Returns whether the specified filename represents a PDB file.

Parameters

filename – a filename

Returns

Whether the file is a pdb file.

schrodinger.utils.fileutils.is_maestro_file(filename: str) bool

Returns True if specified filename represents a Maestro file.

Parameters

filename – a filename

Returns

Is this filename a maestro file?

schrodinger.utils.fileutils.is_sd_file(filename: str) bool

Returns True if specified filename represents an SD file.

Parameters

filename – a filename

Returns

Is this filename an SD file?

schrodinger.utils.fileutils.is_csv_file(filename: str) bool

Returns True if specified filename represents a CSV file.

Parameters

filename – a filename

Returns

Is this filename a csv file?

schrodinger.utils.fileutils.is_smiles_file(filename: str) bool

Returns True if specified filename represents a Smiles file.

Parameters

filename – a filename

Returns

Is this filename a smiles file?

schrodinger.utils.fileutils.is_poseviewer_file(filename: str) bool

Determines whether the filename follows Pose Viewer file naming conventions.

Effectively, this checks whether the file name ends with ‘_pv’ or ‘_epv’ followed by a Maestro file extension (‘.mae’, ‘.mae.gz’, or ‘.maegz’). Roughly equivalent to the regular expression r'_e?pv\.mae(\.?gz)?$'.
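
The naming check can be illustrated with the regular expression quoted above, written here with the dots escaped so that they match literal periods (an assumption about the intended pattern). The name looks_like_poseviewer_file is hypothetical.

```python
import re

# Pattern from the description, with literal dots escaped (assumption).
POSEVIEWER_RE = re.compile(r"_e?pv\.mae(\.?gz)?$")


def looks_like_poseviewer_file(filename):
    """Return True if filename ends in _pv/_epv plus a Maestro extension."""
    return POSEVIEWER_RE.search(filename) is not None
```

Since the description says the function is only "roughly equivalent" to this expression, edge cases may differ from the real implementation.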

schrodinger.utils.fileutils.is_epv_file(filename: str) bool

Determines whether a filename follows Extended Pose Viewer (EPV) file naming conventions.

schrodinger.utils.fileutils.split_ext_pv(filename: str)

Return stem and extension, while accounting for compression and ‘_pv’ or ‘_epv’ as part of the extension.

For example:

split_ext_pv('/path/to/foo_pv.mae.gz')  # -> ('foo', '_pv.mae.gz')

Returns

A tuple with the stem and extension. The extension portion will include ‘_pv’ or ‘_epv’, if present.

Return type

tuple[str, str]

schrodinger.utils.fileutils.is_cms_file(filename: str) bool

Returns True if specified filename represents a CMS file.

Parameters

filename – a filename

Returns

Is this filename a CMS file?

schrodinger.utils.fileutils.is_hypothesis_file(filename: str) bool

Returns True if specified filename represents a Phase hypothesis file. The .phypo extension corresponds to a gzipped Maestro file containing a single ct which is a Phase hypothesis.

Parameters

filename – a filename

Returns

Is this filename a Phase hypothesis file?

schrodinger.utils.fileutils.strip_extension(filename: str) str

Return a new file path without extension. Suffixes such as “_pv” and “_epv” are also removed.

schrodinger.utils.fileutils.get_basename(filename: str) str

Returns the final component of specified path name minus the extension. Suffixes such as “_pv” and “_epv” are also stripped.

schrodinger.utils.fileutils.is_gzipped_structure_file(filename: str) bool

Returns True if the filename represents a file that is gzipped and has a recognized structure extension.

Parameters

filename – a filename

Returns

Is this filename a gzipped structure file?

schrodinger.utils.fileutils.is_valid_jobname(jobname: str) bool

Returns True if specified job name is valid, does not contain any illegal characters, and does not start with “.”.

schrodinger.utils.fileutils.get_jobname(filename: str) str

Returns a job name derived from the specified filename. Same as get_basename(), except that illegal characters are removed.

schrodinger.utils.fileutils.get_next_filename_prefix(path: str, midfix: str, zfill_width: int = 0) str

Return next filename prefix in series <root><midfix><number>.

Given a path (absolute or relative) to a filename or filename prefix, return the next prefix in the sequence implied by path and midfix. For example, with a path of /full/path/to/foo.mae, path/to/foo.mae or foo.mae, or /full/path/to/foo, path/to/foo or foo, and a midfix of ‘-’, this function will return “foo-3” if any file whose prefix is foo-2 (and no higher-numbered foo-*) is present. It will return foo-1 if no file whose prefix is foo-<number> is present. The net effect is that any file-name extension in the path argument is ignored.

This function differs from get_next_filename() in that here, all files sharing the prefix contained in the path are searched, regardless of extension, and the next filename prefix is returned.

The search is case sensitive or not depending on the semantics of the file system. The leading directory of the path, if any, is included in the return value.

Usage note: you might use this when the filename prefix could be exhibited by many files and you don’t want to overwrite any of them. For example, you are starting up a job which will create many files with the same prefix.

schrodinger.utils.fileutils.get_next_filename(path: str, midfix: str, zfill_width: int = 0)

Return next filename in series <root><midfix><number>.<ext>.

Given a path (absolute or relative) to a filename, return the next filename in the sequence implied by path and midfix. For example, with a path of /full/path/to/foo.mae, path/to/foo.mae or foo.mae and a midfix of ‘-’, this function will return “foo-3.mae” if file foo-2.mae (and no higher-numbered foo-*.mae) is present. It will return foo-1.mae if no file named foo-<number>.mae is present.

This function differs from get_next_filename_prefix() in that here, only files with the specified extension are searched, and the next full filename is returned.

The search is case sensitive or not depending on the semantics of the file system. The leading directory of the path, if any, is included in the return value.

Usage note: You might use this when you are expecting to update only a single file: the one whose filename is given in the path. For example, you are exporting structures to a .mae file and you want to pick a non-conflicting name based on a user’s filename specification.
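
The numbering scheme for get_next_filename can be sketched with the standard library as follows. The name next_filename_sketch is hypothetical, and the sketch makes two simplifying assumptions: it always matches case-sensitively, and it uses os.path.splitext, so compound extensions such as ‘.mae.gz’ are not treated specially.

```python
import os
import re


def next_filename_sketch(path, midfix, zfill_width=0):
    """Return <root><midfix><number><ext> with the next unused number."""
    directory, name = os.path.split(path)
    stem, ext = os.path.splitext(name)
    pattern = re.compile(
        re.escape(stem + midfix) + r"(\d+)" + re.escape(ext) + r"$")
    highest = 0
    for entry in os.listdir(directory or "."):
        match = pattern.match(entry)
        if match:
            highest = max(highest, int(match.group(1)))
    number = str(highest + 1).zfill(zfill_width)
    # The leading directory, if any, is kept in the return value.
    return os.path.join(directory, f"{stem}{midfix}{number}{ext}")
```

Only files with the same extension are considered, so a foo-5.txt in the same directory does not affect the number chosen for foo.mae.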

schrodinger.utils.fileutils.get_mmshare_dir() str

Return the path to the local $SCHRODINGER/mmshare-*/ directory.

Returns

Path to the “mmshare” directory.

schrodinger.utils.fileutils.get_mmshare_data_dir() str

Return the path of the local $SCHRODINGER/mmshare-*/data/ directory.

Returns

Path to the “data” directory.

schrodinger.utils.fileutils.get_mmshare_scripts_dir() str

Return the path of the $SCHRODINGER/mmshare-*/python/scripts/ directory.

Returns

Path to the “scripts” directory.

schrodinger.utils.fileutils.get_mmshare_common_dir() str

Return the path of the $SCHRODINGER/mmshare-*/python/common/ directory.

Returns

Path to the “common” directory.

schrodinger.utils.fileutils.get_docs_dir() str

Return the path to the local $SCHRODINGER/docs/ directory.

Returns

Path to the “docs” directory.

schrodinger.utils.fileutils.get_directory_path(which_directory) str

This function returns the Schrodinger-specific directory.

If an invalid which_directory is specified, then a TypeError is raised.

Valid directories are:

  • HOME : To get user’s home dir

  • APPDATA : To get the Schrodinger application shared data dir

  • LOCAL_APPDATA : To get the Schrodinger application local data dir

  • USERDATA : To get user’s data dir

  • TEMP : To get default temporary data dir

  • DESKTOP : To get user’s desktop dir

  • DOCUMENTS : To get user’s ‘My Documents’ dir

  • NETWORK : To get user’s ‘My Network places’ dir (only for Windows)

Return type

str

Returns

Directory path

schrodinger.utils.fileutils.get_directory(which_directory) Tuple[int, str]
Deprecated

Because this function behaves in a non-standard way by returning an mmlib status, get_directory_path is preferred.

schrodinger.utils.fileutils.get_home_dir() str
Deprecated

get_directory_path should be used instead.

schrodinger.utils.fileutils.get_appdata_dir() str
Deprecated

get_directory_path should be used instead.

schrodinger.utils.fileutils.get_local_appdata_dir() str
Deprecated

get_directory_path should be used instead.

schrodinger.utils.fileutils.get_desktop_dir() str
Deprecated

get_directory_path should be used instead.

schrodinger.utils.fileutils.get_mydocuments_dir() str
Deprecated

get_directory_path should be used instead.

schrodinger.utils.fileutils.get_mynetworkplaces_dir() str
Deprecated

get_directory_path should be used instead.

schrodinger.utils.fileutils.get_userdata_dir() str
Deprecated

get_directory_path should be used instead.

schrodinger.utils.fileutils.get_schrodinger_temp_dir() str
Deprecated

get_directory_path should be used instead.

schrodinger.utils.fileutils.locate_darwin_pymol() Optional[str]

Return the path to PyMOL on a macOS system. Return None if no PyMOL installations are found.

schrodinger.utils.fileutils.locate_pymol() Optional[str]

Find the executable or script we use to launch PyMOL.

Returns

The pymol launch command or None if PyMOL was not found

schrodinger.utils.fileutils.get_pymol_cmd(use_x11: bool = False) List[str]

Get a cmd list for launching PyMOL. This may include extra platform-specific arguments.

Parameters

use_x11 – if True causes -m to be added to the launch command on Mac

Returns

a cmd list with the executable as first element and any other options following it.

class schrodinger.utils.fileutils.chdir(dirname: Union[pathlib.Path, str])

Bases: object

A context manager that carries out commands inside of the specified directory and restores the current directory when done.

__init__(dirname: Union[pathlib.Path, str])

Parameters

dirname – the directory to change into while the context is active

schrodinger.utils.fileutils.mkdir_p(path: str, *mode)
Deprecated

use os.makedirs(path, exist_ok=True)

class schrodinger.utils.fileutils.tempfilename(prefix='tmp', suffix='', temp_dir=None)

Bases: str

remove()

class schrodinger.utils.fileutils.TempStructureFile(sts)

Bases: schrodinger.utils.fileutils.tempfilename

schrodinger.utils.fileutils.cat(source_filenames: List[str], dest_filename: str, allow_missing_source: Optional[bool] = False, allow_empty_dest: Optional[bool] = True)

Concatenate the contents of the source files, writing them to a destination file. All files are specified by name. If allow_missing_source is True, only existing source files are concatenated and no failure occurs if a source file is missing. If allow_empty_dest is False and the list of files to concatenate is empty, no destination file is created.

Parameters
  • source_filenames – input files

  • dest_filename – destination file

  • allow_missing_source – Whether to allow missing source files

  • allow_empty_dest – Whether to create an empty destination file if the list of source files to concatenate is empty

schrodinger.utils.fileutils.cat_flat_files(source_filenames: Iterable[str], dest_filename: str)

Combine multiple flat files (such as CSVs) into one large file.

Expects each source file to contain a header line and will write the header from the first source file into the destination file. Any file specified may be compressed.

Parameters
  • source_filenames – A list of paths to source files.

  • dest_filename – The name of the destination file.
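
The header handling described for cat_flat_files can be sketched as below. The names are hypothetical, and the sketch assumes that compressed files are gzip files identified by a name ending in ‘gz’.

```python
import gzip


def _open_text(filename, mode):
    """Open a plain or gzip-compressed file in text mode (assumed rule)."""
    opener = gzip.open if filename.endswith("gz") else open
    return opener(filename, mode + "t")


def cat_flat_files_sketch(source_filenames, dest_filename):
    """Concatenate flat files, keeping only the first file's header."""
    with _open_text(dest_filename, "w") as dest:
        for index, source in enumerate(source_filenames):
            with _open_text(source, "r") as handle:
                header = handle.readline()
                if index == 0:
                    dest.write(header)
                # Copy the remaining records line by line.
                for line in handle:
                    dest.write(line)
```

The sketch does not verify that the headers of later files match the first one.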

schrodinger.utils.fileutils.tar_files(tarname: str, mode: str, files: List[str])

Writes files to tar archive.

Parameters
  • tarname – Tar file name.

  • mode – File open mode.

  • files – Iterable over file names to be added to the archive.

schrodinger.utils.fileutils.zip_files(zipname: str, mode: str, files: List[str])

Writes files to zip archive.

Parameters
  • zipname – Zip file name.

  • mode – File open mode.

  • files – Iterable over file names to be added to the archive.

schrodinger.utils.fileutils.is_within_directory(directory, afile)

schrodinger.utils.fileutils.safe_extractall_tar(tar, path='.', *args, **kwargs)

Extract all files from a tar file. Please see Python Vulnerability: CVE-2007-4559 for details on the issue with the tar.extractall() method. See the tar.extractall method description for details on args and kwargs.

Parameters
  • tar (tarfile.TarFile) – TarFile object

  • path (str) – path of directory where tarfile will be extracted
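
A common mitigation for the path traversal issue is to check every member's target path before extracting. The sketch below shows that idea with hypothetical names; it checks member names only (not symlink targets) and is not the module's implementation.

```python
import os
import tarfile


def is_within_directory_sketch(directory, target):
    """Return True if target resolves to a location inside directory."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extractall_sketch(tar, path="."):
    """Extract a tarfile.TarFile, refusing members that escape 'path'."""
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory_sketch(path, member_path):
            raise ValueError(
                f"Blocked path traversal in tar member: {member.name}")
    tar.extractall(path)
```

Recent Python versions also offer extraction filters on tarfile.TarFile.extractall, which may be preferable where available.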

schrodinger.utils.fileutils.safe_extractall_zip(zip_file, path='.', *args, **kwargs)

Extract all files from a zip file. Please see Python Vulnerability: CVE-2007-4559 for details on the issue with the zip.extractall() method. See the zip.extractall method description for details on args and kwargs.

Parameters
  • zip_file (zipfile.ZipFile) – ZipFile object

  • path (str) – path of directory where the zip file will be extracted

schrodinger.utils.fileutils.on_same_drive_letter(path_a: str, path_b: str) bool

Returns True if path_a and path_b are on the same drive letter. On systems without drive letters, always returns True.

schrodinger.utils.fileutils.get_files_from_folder(folder_abs_path: str) List[Tuple[str, str]]

Walk through a folder, find all files inside it.

Parameters

folder_abs_path – folder path

Returns

each tuple contains: absolute path of a file, and a relative path that the file will be transferred to.

schrodinger.utils.fileutils.change_working_directory(folder: Union[pathlib.Path, str])

A context manager to temporarily change the working directory to folder.

Parameters

folder – the folder that becomes the working directory
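
A context manager of this kind is typically built as in the sketch below. The name change_working_directory_sketch is hypothetical and the code uses only the standard library.

```python
import contextlib
import os


@contextlib.contextmanager
def change_working_directory_sketch(folder):
    """Run the with-block inside 'folder', then restore the previous cwd."""
    previous = os.getcwd()
    os.chdir(folder)
    try:
        yield
    finally:
        # Restore the original directory even if the block raised.
        os.chdir(previous)
```

The try/finally ensures that the original working directory is restored when the block exits through an exception.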

schrodinger.utils.fileutils.in_temporary_directory()

A context manager for executing a block of code in a temporary directory.

schrodinger.utils.fileutils.mmfile_path(path: Optional[str] = None)

Context manager and decorator that resets the mmfile search path on exit. If the optional path is supplied, it is set on entry.

Parameters

path – mmfile path to set while in the context

schrodinger.utils.fileutils.count_lines(filename: str) int

Count the number of newlines in a file, in a way similar to “wc -l”.

Parameters

filename – input filename

Returns

number of newlines in file
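
Counting newlines without loading the whole file into memory can be sketched as follows. The name count_newlines_sketch and the chunk size are hypothetical choices.

```python
def count_newlines_sketch(filename, chunk_size=1 << 20):
    """Count newline bytes in a file, reading it in fixed-size chunks."""
    count = 0
    with open(filename, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
    return count
```

As with “wc -l”, a final line that lacks a trailing newline is not counted.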

schrodinger.utils.fileutils.get_directory_size(dirpath)

Get the size of the given directory in MB

(Note: MB => 1e6 bytes)

Parameters

dirpath (str) – The path to the directory

Return type

float

Returns

The size of the directory in MB
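
A directory size in MB (1e6 bytes) can be computed along these lines. The name directory_size_mb_sketch is hypothetical, and skipping symbolic links is an assumption of the sketch rather than documented behavior.

```python
import os


def directory_size_mb_sketch(dirpath):
    """Sum the sizes of all regular files under dirpath, in units of 1e6 bytes."""
    total_bytes = 0
    for root, _dirs, files in os.walk(dirpath):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted (assumption).
            if not os.path.islink(file_path):
                total_bytes += os.path.getsize(file_path)
    return total_bytes / 1e6
```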

schrodinger.utils.fileutils.get_existing_filepath(path_file: str) Optional[str]

Check for the path/file at the given path, in the current working directory, and in the original launch directory. The first found path is returned.

This can be useful when the file has been copied from path_file to the CWD, such as when launchapi copies a file from an absolute path on the local machine into the job directory on a remote machine.

This can also be useful when large files (e.g. trajectory files) are not copied from path_file to the job launch dir for localhost jobs. The job in the current launch dir can access the files in the original launch dir.

Returns

None if the file cannot be located

schrodinger.utils.fileutils.xyz_to_sdf(xyz_filepath: str, out_sdf: Optional[str] = None, save_file: bool = True) str

Convert an XYZ format file to an SDF file.

Parameters
  • xyz_filepath – filename with path

  • out_sdf – the output sdf filename. If None, the name is set automatically based on the input filename

  • save_file – If False, the output information is written to stdout instead of a file.

Returns

the output sdf filename

Raises
  • ValueError – input file is of wrong extension

  • RuntimeError – failed to convert the xyz file

schrodinger.utils.fileutils.open_maybe_compressed(filename: str, *a, **d) io.IOBase

Open a file, using the gzip module if the filename ends in gz, or the builtin open otherwise. All arguments are passed through.
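
The dispatch described here amounts to only a few lines. The sketch below uses the hypothetical name open_maybe_compressed_sketch and forwards all arguments unchanged.

```python
import gzip


def open_maybe_compressed_sketch(filename, *args, **kwargs):
    """Open with gzip.open if the name ends in 'gz', else with open()."""
    opener = gzip.open if filename.endswith("gz") else open
    return opener(filename, *args, **kwargs)
```

Because arguments are passed through, callers should give the mode explicitly: gzip.open defaults to binary mode ('rb'), whereas the builtin open defaults to text mode.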

schrodinger.utils.fileutils.get_csv_file_column_count(csv_file: str) int

Return the number of columns in the csv file.

Parameters

csv_file – CSV file path.

Returns

Number of columns in the csv file.

schrodinger.utils.fileutils.hash_for_file(path, algorithm=<built-in function openssl_md5>, buff_size=8388608)

Get file hash.

Parameters
  • path (str) – File path

  • algorithm (method) – Algorithm to use

  • buff_size (int) – Buffer size

Return type

str

Returns

File hash
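
Hashing a file in buffered chunks can be sketched as follows. The name hash_for_file_sketch is hypothetical, and returning the hexadecimal digest is an assumption based on the documented str return type.

```python
import hashlib


def hash_for_file_sketch(path, algorithm=hashlib.md5,
                         buff_size=8 * 1024 * 1024):
    """Return the hex digest of a file, read buff_size bytes at a time."""
    digest = algorithm()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(buff_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use bounded for large files; passing hashlib.sha256 as the algorithm selects a different hash.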

schrodinger.utils.fileutils.extended_windows_path(dos_path, only_if_required=True)

Convert path to absolute path and prepend extended path tag to paths on Windows

Parameters
  • dos_path (str) – a Windows file path, which may be longer than 256 characters and therefore invalid

  • only_if_required (bool) – If True, prepend the Windows extended path tag only to file paths that exceed WINDOWS_MAX_PATH in length.

Return type

str

Returns

A Windows extended file path which can accommodate 30000+ characters

schrodinger.utils.fileutils.slugify(text)

Slugifies a filename for use in a URL or file name.

Based on the Django implementation. (https://github.com/django/django/blob/dcebc5da4831d2982b26d00a9480ad538b5c5acf/django/utils/text.py#L400)

Parameters

text (str) – Text to slugify

Returns

Slugified text

Return type

str

schrodinger.utils.fileutils.is_subpath(path, parent_dir, strict=False)

Returns whether the specified path is a subdir of the specified parent directory.

Parameters

strict (bool) – if False, the parent_dir is considered a subpath of itself. Set to True so that only actual subpaths qualify.
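
The effect of the strict flag can be illustrated with pathlib. The name is_subpath_sketch is hypothetical, and resolving both paths before comparison is an assumption of the sketch.

```python
from pathlib import Path


def is_subpath_sketch(path, parent_dir, strict=False):
    """Return True if path lies under parent_dir.

    With strict=False, parent_dir counts as a subpath of itself.
    """
    path = Path(path).resolve()
    parent = Path(parent_dir).resolve()
    if path == parent:
        return not strict
    return parent in path.parents
```

Comparing path components rather than string prefixes avoids treating a sibling such as /data/jobs2 as being inside /data/jobs.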

schrodinger.utils.fileutils.split_file_round_robin(infile, outfiles, has_header)

Chunks a larger file into smaller files by systematically sampling every k-th input line into the k-th output file. To be used with flat data files such as CSV or SMI files.

Parameters
  • infile (str) – The input file to be split.

  • outfiles (list[str]) – The files to be written to. Will overwrite any file that already exists.

  • has_header (bool) – Whether the input file has a header line.
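
Round-robin splitting can be sketched as follows. The name split_round_robin_sketch is hypothetical, and copying the header line into every output file is an assumption; the source does not say how the header is handled.

```python
from contextlib import ExitStack


def split_round_robin_sketch(infile, outfiles, has_header):
    """Deal the lines of infile out to outfiles in rotation."""
    with ExitStack() as stack:
        source = stack.enter_context(open(infile))
        targets = [stack.enter_context(open(name, "w")) for name in outfiles]
        if has_header:
            # Assumption: every output file receives a copy of the header.
            header = source.readline()
            for target in targets:
                target.write(header)
        for index, line in enumerate(source):
            targets[index % len(targets)].write(line)
```

With two output files, data lines 1, 3, 5, ... go to the first file and lines 2, 4, 6, ... go to the second.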

schrodinger.utils.fileutils.convert_path_to_unix(path: pathlib.Path) str

Convert a path to a Unix-style path. Return a string representation of the path with forward slashes ‘/’.

class schrodinger.utils.fileutils.MultiFileReader(files, *, have_header=True)

Bases: object

An iterator context manager to read in a collection of flat files. Files are logically concatenated so that all records are treated as one large file. Supports a mixture of compressed and uncompressed files.

Variables

header (Optional[str]) – If ‘have_header’ is set, this will contain the first line of the last file that was read in. I.e., the header line.

__init__(files, *, have_header=True)
Parameters
  • files (Iterable[str]) – The files to be sampled.

  • have_header (bool) – Whether the files have a header. If so, the header will be stored as a member variable ‘header’.

close()

schrodinger.utils.fileutils.touch(path)

Touch a path.

schrodinger.utils.fileutils.gzip_file(infile, outfile, remove_original=False)

Creates a new gzip compressed file from the input file.

Parameters
  • infile (Path | str) – The input file to be compressed.

  • outfile (Path | str) – The destination file.

  • remove_original (bool) – Whether to delete the original file after compression is completed.