schrodinger.protein.getpdb module
Module for downloading PDB files from the web.
The data is retrieved from the RCSB. Current download URLs are documented at https://www.rcsb.org/docs/programmatic-access/file-download-services
Running this module is no different from using a web browser to access the site; it is just a different type of web client. It should therefore cause no problems for the maintainers of that site and should fall within the site's terms and conditions of use.
Note that certain assumptions are made about the layout of the web site - changes there in future may make this script stop working.
Copyright Schrodinger, LLC. All rights reserved.
- schrodinger.protein.getpdb.download_file(filename)
Download the given file from RCSB and save it under the same name in either the current working directory (CWD) or a temporary directory. The path to the written file is returned.
- Parameters
filename (str) – File to download from the RCSB web site.
- Raises
requests.HTTPError – if there is an error connecting to RCSB.
- schrodinger.protein.getpdb.download_sf(pdb_code)
Downloads the ENT file for the given PDB ID, converts it to CNS format, and returns the CNS file name. Raises a RuntimeError if either the download or the conversion fails.
Not every PDB entry has structure factor files deposited, and not every structure factor file will convert perfectly.
- schrodinger.protein.getpdb.download_fasta(pdb_code)
Attempts to download the FASTA file for the given PDB ID and chain.
- Parameters
pdb_code (str) – PDB ID of the file to download
- schrodinger.protein.getpdb.download_em_map(emdb_code)
Attempts to download the EM map file for the given EMDB ID.
- Parameters
emdb_code (str) – EMDB ID of the map file to download
- schrodinger.protein.getpdb.get_pdb(pdbid, source=0, caps_asis=False)
Attempts to get the specified PDB file from either the database or the web, depending on the source option. Default is AUTO, which attempts the database first, and then the web.
pdbid - string of 4 characters
source - one of: AUTO, DATABASE, WEB.
- Parameters
caps_asis (bool) – True if the capitalization of pdbid should be preserved, False (default) if it should be converted to lowercase.
- Returns
Path to the PDB file that was written (*.pdb or *.cif)
- Return type
str
- Raises
requests.HTTPError – if there is an error connecting to RCSB
RuntimeError – for any other error retrieving the file
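The AUTO behaviour described for get_pdb (database first, then the web) can be illustrated with a small stand-alone sketch. Everything in it is hypothetical: the Source enum, the get_with_fallback helper and the two fetcher callables are not part of the module, and only AUTO = 0 is taken from the documented default; the other two values are assumptions.

```python
from enum import IntEnum
from typing import Callable, Optional


class Source(IntEnum):
    """Hypothetical stand-in for the module's source options."""
    AUTO = 0       # matches the documented default, source=0
    DATABASE = 1   # assumed value, not taken from the module
    WEB = 2        # assumed value, not taken from the module


def get_with_fallback(
    pdbid: str,
    from_database: Callable[[str], Optional[str]],
    from_web: Callable[[str], str],
    source: int = Source.AUTO,
) -> str:
    """Return a file path for pdbid, trying the sources the option allows."""
    if source in (Source.AUTO, Source.DATABASE):
        # The local lookup signals a miss by returning None.
        path = from_database(pdbid)
        if path is not None:
            return path
        if source == Source.DATABASE:
            raise RuntimeError(f"{pdbid} not found in the local database")
    # Reached for WEB, or for AUTO after a database miss.
    return from_web(pdbid)
```

With this shape, a caller that passes a local lookup returning None and the default source ends up with whatever the web fetcher returns, which mirrors the documented order.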
- schrodinger.protein.getpdb.retrieve_pdb(pdbid, local_repos=None, verbose=False, caps_asis=False)
Attempt to retrieve the PDB from the local repository.
First we look for current files ending in .gz or .Z, then obsolete files with the same endings. The file name we search for is:
pdbXXXX.ent.Y where XXXX is the PDB code and Y is either gz or Z
- Parameters
pdbid (str) – the PDB code of the desired file
local_repos (list of str) – the paths to the parent directories of each local repository.
caps_asis (bool) – True if the capitalization of pdbid should be preserved, False (default) if it should be converted to lowercase.
- Return type
str
- Returns
the name of the pdb file or None if a failure occurs
- schrodinger.protein.getpdb.find_local_repository(verbose=False)
Determine a directory list for local repositories.
Note: the location of the PDB directory can be specified via environment variables; the order of precedence is:
* SCHRODINGER_PDB
* SCHRODINGER_THIRDPARTY/database/pdb
* SCHRODINGER/thirdparty/database/pdb (the default)
- Parameters
verbose (bool) – True if debugging messages should be printed to the screen
- Return type
list of str
- Returns
the paths to the parent directories of each local repository. Returns an empty list if the local repository cannot be determined.
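The precedence order noted for find_local_repository can be sketched as follows. This is an illustration only: pdb_directory is a hypothetical helper, it returns a single directory rather than the list of parent directories the real function returns, and it does not check that the directory exists.

```python
import os
from typing import Mapping, Optional


def pdb_directory(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Pick the PDB directory from environment variables, highest precedence first."""
    env = os.environ if env is None else env
    # 1. An explicit PDB location wins.
    if env.get("SCHRODINGER_PDB"):
        return env["SCHRODINGER_PDB"]
    # 2. Otherwise look under the third-party directory.
    if env.get("SCHRODINGER_THIRDPARTY"):
        return os.path.join(env["SCHRODINGER_THIRDPARTY"], "database", "pdb")
    # 3. Otherwise fall back to the default under the installation directory.
    if env.get("SCHRODINGER"):
        return os.path.join(env["SCHRODINGER"], "thirdparty", "database", "pdb")
    return None
```

Passing the environment as a mapping keeps the sketch easy to exercise without modifying os.environ.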
- schrodinger.protein.getpdb.find_local_pdb(pdbid, local_repos=None, verbose=False, caps_asis=False)
Check a series of local directories and filenames for the PDB files.
First we look for current files ending in .gz or .Z, then obsolete files with the same endings. The file name we search for is:
pdbXXXX.ent.Y where XXXX is the PDB code and Y is either gz or Z
Note: the location of the PDB directory can be specified via environment variables; the order of precedence is:
* SCHRODINGER_PDB
* SCHRODINGER_THIRDPARTY
* SCHRODINGER/thirdparty (the default)
- Parameters
pdbid (str) – the PDB code of the desired file
local_repos (list of str) – the paths to the parent directories of each local repository.
verbose (bool) – True if debug messages should be printed out
caps_asis (bool) – True if the capitalization of pdbid should be preserved, False (default) if it should be converted to lowercase.
- Return type
str
- Returns
the path to an existing file with the desired PDB code
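The file-name pattern pdbXXXX.ent.Y and the caps_asis option can be illustrated with a short sketch. The helpers candidate_names and first_existing are hypothetical, and the sketch assumes a flat list of directories searched in order; how the real function lays out current and obsolete entries inside a repository is not described here.

```python
import os
from typing import Iterable, List, Optional


def candidate_names(pdbid: str, caps_asis: bool = False) -> List[str]:
    """Build the compressed ENT file names for a PDB code, .gz before .Z."""
    # The code is lowercased unless the caller asks to keep its capitalization.
    code = pdbid if caps_asis else pdbid.lower()
    return [f"pdb{code}.ent.{ext}" for ext in ("gz", "Z")]


def first_existing(
    pdbid: str, directories: Iterable[str], caps_asis: bool = False
) -> Optional[str]:
    """Return the first matching file found in the given directories, or None."""
    for directory in directories:
        for name in candidate_names(pdbid, caps_asis):
            path = os.path.join(directory, name)
            if os.path.isfile(path):
                return path
    return None
```

Listing the directories for current entries before those for obsolete entries would reproduce the search order described above.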
- schrodinger.protein.getpdb.download_pdb(pdb_code, biological_unit=False, try_as_cif=True)
Download the PDB record from www.rcsb.org into the CWD. If the PDB is too large to be downloaded as a *.pdb file, it will be saved as *.cif.
- Parameters
pdb_code (str) – Four character alphanumeric string for the PDB id.
biological_unit (bool) – If True, and the file needs to be downloaded, then download the file at the biological unit URL; otherwise use the typical record URL. Default is False, which gets the typical record. Note: this option is no longer used by PrepWizard, but is still used by getpdb_utility.py ($SCHRODINGER/utilities/getpdb).
try_as_cif (bool) – Whether to try downloading the file as CIF format if the structure is too large to be represented in PDB format.
- Returns
Path to the downloaded file.
- Return type
str
- Raises
requests.HTTPError – if there is an error connecting to RCSB or the PDB ID does not exist
RuntimeError – for any other error retrieving the file
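A caller will usually want to handle both documented exception types. The sketch below shows one way to do that; fetch_structure is a hypothetical wrapper, and it relies only on the download_pdb signature and exceptions listed above. The imports sit inside the function so the snippet can be loaded without a Schrodinger installation, but calling it requires both packages to be available.

```python
from typing import Optional


def fetch_structure(pdb_code: str) -> Optional[str]:
    """Download a structure and return its path, or None if the download fails."""
    # Lazy imports: needed only when the function is actually called.
    import requests
    from schrodinger.protein import getpdb

    try:
        path = getpdb.download_pdb(pdb_code, try_as_cif=True)
    except requests.HTTPError as err:
        # Connection problem, or the PDB ID does not exist.
        print(f"Could not download {pdb_code}: {err}")
        return None
    except RuntimeError as err:
        # Any other failure while retrieving the file.
        print(f"Failed to retrieve {pdb_code}: {err}")
        return None

    # Large structures may arrive as CIF rather than PDB.
    if path.endswith(".cif"):
        print(f"{pdb_code} was saved in CIF format: {path}")
    return path
```

Checking the extension of the returned path is a simple way to find out which format was written, since the function may fall back to CIF when try_as_cif is True.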
- schrodinger.protein.getpdb.download_cif(pdb_code)
Download the *.cif file from the web for a given PDB code.
- Parameters
pdb_code (str) – Four character alphanumeric string for the PDB id.
- Returns
Path to the downloaded file.
- Return type
str
- Raises
requests.HTTPError – if there is an error connecting to RCSB or the PDB ID does not exist
- schrodinger.protein.getpdb.requests_retry_session(max_retries=3, backoff_factor=0.3, status_forcelist=(500, 502, 503, 504), session=None)
Return a session to connect to a web URL. In case of network failures the session will retry connecting to the URL; the number of re-attempts allowed is specified by max_retries.
- Parameters
max_retries (int) – Total number of retries allowed
backoff_factor (float) – Backoff factor to apply between attempts after the second try. urllib3 will sleep for: {backoff factor} * (2 ** ({number of total retries} - 1)) seconds before making the next attempt.
status_forcelist (iterable of int) – HTTP error status codes for which a retry will happen
session (requests.Session) – A session object
- Returns
A session object
- Return type
requests.Session
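To make the backoff formula concrete, the sketch below evaluates {backoff factor} * (2 ** ({number of total retries} - 1)) for each retry. backoff_delays is a hypothetical helper written for illustration; it only evaluates the documented expression and does not call urllib3. Since the description says the factor applies after the second try, the first value may not be used in practice.

```python
from typing import List


def backoff_delays(max_retries: int = 3, backoff_factor: float = 0.3) -> List[float]:
    """Evaluate the documented sleep formula for retry numbers 1..max_retries."""
    return [backoff_factor * (2 ** (n - 1)) for n in range(1, max_retries + 1)]


if __name__ == "__main__":
    # With the default arguments the delays double each time, starting at 0.3 s.
    for attempt, delay in enumerate(backoff_delays(), start=1):
        print(f"retry {attempt}: sleep about {delay:.1f} s")
```

With the defaults, the total time spent sleeping stays under a few seconds, which is why a small backoff_factor suits interactive use.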
- schrodinger.protein.getpdb.retrieve_ent(pdbid)
Retrieves the ENT file for the specified PDB ID from the third-party database and copies it to the CWD. File path is returned.
Raises RuntimeError on error.
- schrodinger.protein.getpdb.download_ent(pdbid)
Downloads the ENT file for the specified PDB ID from the RCSB web site, and saves it to the CWD. File path is returned.
- Raises
requests.HTTPError – if there is an error connecting to RCSB
RuntimeError – for any other error retrieving the file
- schrodinger.protein.getpdb.get_ent(pdbid, source=0)
Attempts to get the specified ENT file from either the database or the web, depending on the source option. Default is AUTO, which attempts the database first, and then the web.
pdbid - string of 4 characters
source - one of: AUTO, DATABASE, WEB.
- Raises
requests.HTTPError – if there is an error connecting to RCSB
RuntimeError – for any other error retrieving the file
- schrodinger.protein.getpdb.open_filename(filename, mode, encoding=None)
Opens a filename, or a temporary filename if filename is not writable. The name may therefore change; the actual name is accessible via the name attribute of the returned file object.
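The fallback described for open_filename can be approximated with the standard library. This is a sketch under assumptions, not the module's implementation: open_or_temp is a hypothetical name, and treating any OSError from open() as "not writable" is a choice made for the example.

```python
import os
import tempfile
from typing import IO, Optional


def open_or_temp(filename: str, mode: str = "w", encoding: Optional[str] = None) -> IO:
    """Open filename for writing, or a temporary file if that is not possible."""
    try:
        return open(filename, mode, encoding=encoding)
    except OSError:
        # Keep the original extension so downstream tools still recognize the format.
        suffix = os.path.splitext(filename)[1]
        # delete=False keeps the file on disk after it is closed.
        return tempfile.NamedTemporaryFile(
            mode=mode, encoding=encoding, suffix=suffix, delete=False
        )
```

Because the temporary file is created with delete=False, the caller is responsible for removing it, and should read the final location from the name attribute rather than assume the requested path was used.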
- schrodinger.protein.getpdb.download_reflection_data(pdbid)
Attempt to download reflection data.
- Parameters
pdbid (str) – PDB ID