schrodinger.application.msv.seqio module
- class schrodinger.application.msv.seqio.FetchIDs(pdb, entrez, uniprot)
Bases:
tuple
- __contains__(key, /)
Return key in self.
- __len__()
Return len(self).
- count(value, /)
Return number of occurrences of value.
- entrez
Alias for field number 1
- index(value, start=0, stop=9223372036854775807, /)
Return first index of value.
Raises ValueError if the value is not present.
- pdb
Alias for field number 0
- uniprot
Alias for field number 2
- exception schrodinger.application.msv.seqio.SequenceWarning
Bases:
UserWarning
Custom warning for problems loading sequences.
- __init__(*args, **kwargs)
- args
- with_traceback()
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- class schrodinger.application.msv.seqio.catch_sequence_warnings(*args, **kwargs)
Bases:
contextlib.ExitStack
Context manager that filters SequenceWarnings raised inside its block and stores them on the instance.
- callback(callback, /, *args, **kwds)
Registers an arbitrary callback and arguments.
Cannot suppress exceptions.
- close()
Immediately unwind the context stack.
- enter_context(cm)
Enters the supplied context manager.
If successful, also pushes its __exit__ method as a callback and returns the result of the __enter__ method.
- pop_all()
Preserve the context stack by transferring it to a new instance.
- push(exit)
Registers a callback with the standard __exit__ method signature.
Can suppress exceptions the same way __exit__ method can. Also accepts any object with an __exit__ method (registering a call to the method instead of the object itself).
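One way such a context manager could be built on ExitStack is sketched below, using the standard library's warnings.catch_warnings(record=True). This is a hypothetical reimplementation, not the module's code: the SequenceWarning class is a local stand-in, the sequence_warnings attribute name is an assumption, and re-issuing other warnings on exit is a design choice made here so that unrelated warnings are not silently swallowed.

```python
import contextlib
import warnings


class SequenceWarning(UserWarning):
    """Local stand-in for seqio.SequenceWarning."""


class catch_sequence_warnings(contextlib.ExitStack):
    """Hypothetical sketch: collect SequenceWarnings raised inside the block.

    The ``sequence_warnings`` attribute name is an assumption; the real
    class may store the warnings differently.
    """

    def __enter__(self):
        super().__enter__()
        # Registered first so it runs last: ExitStack unwinds LIFO, so the
        # previous warning state is restored before other warnings are re-issued.
        self.callback(self._reissue_other_warnings)
        # record=True captures every warning that passes the filters in a list.
        self._records = self.enter_context(warnings.catch_warnings(record=True))
        # Never deduplicate SequenceWarnings, so each one is captured.
        warnings.simplefilter("always", SequenceWarning)
        return self

    @property
    def sequence_warnings(self):
        return [r.message for r in self._records
                if issubclass(r.category, SequenceWarning)]

    def _reissue_other_warnings(self):
        # Hand non-sequence warnings to whatever handling was active before.
        for r in self._records:
            if not issubclass(r.category, SequenceWarning):
                warnings.warn_explicit(r.message, r.category, r.filename, r.lineno)
```

Usage: wrap a sequence-loading call in `with catch_sequence_warnings() as cm:` and inspect `cm.sequence_warnings` afterwards, for example to show them in a single dialog.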
- exception schrodinger.application.msv.seqio.GetSequencesException
Bases:
OSError
Custom exception for problems retrieving sequences.
- __init__(*args, **kwargs)
- args
- characters_written
- errno
POSIX exception code
- filename
exception filename
- filename2
second exception filename
- strerror
exception strerror
- with_traceback()
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- class schrodinger.application.msv.seqio.PdbParts(pdbcode, pdbchain)
Bases:
tuple
- __contains__(key, /)
Return key in self.
- __len__()
Return len(self).
- count(value, /)
Return number of occurrences of value.
- index(value, start=0, stop=9223372036854775807, /)
Return first index of value.
Raises ValueError if the value is not present.
- pdbchain
Alias for field number 1
- pdbcode
Alias for field number 0
- class schrodinger.application.msv.seqio.FastaParts(name, long_name, chain, anno_type)
Bases:
tuple
- __contains__(key, /)
Return key in self.
- __len__()
Return len(self).
- anno_type
Alias for field number 3
- chain
Alias for field number 2
- count(value, /)
Return number of occurrences of value.
- index(value, start=0, stop=9223372036854775807, /)
Return first index of value.
Raises ValueError if the value is not present.
- long_name
Alias for field number 1
- name
Alias for field number 0
- schrodinger.application.msv.seqio.make_maestro_pdb_id(pdb_id)
Convert a PDB ID to Maestro's form, with the PDB code and PDB chain separated by “:” (e.g. 4hhb if the chain is blank, or 4hhb:A).
- Parameters
pdb_id (str) – PDB ID with optional chain, e.g. 4hhb, 4hhbA, 4hhb:A, 4hhb_A
- Returns
PDB ID with “:” between PDB code and PDB chain
- Return type
str
- schrodinger.application.msv.seqio.parse_pdb_id(pdb_id, permissive=False)
Parse a PDB ID into a (pdbcode, pdbchain) named tuple.
- Parameters
pdb_id (str) – PDB ID with optional chain, e.g. 4hhb, 4hhbA, 4hhb:A, 4hhb_A
permissive (bool) – Whether to use permissive parsing. In strict mode, the PDB code must be 4 characters starting with a digit, and a single-letter chain is optional. In permissive mode, the PDB code can contain any non-whitespace characters, but a chain separator and a single-letter chain are required.
- Returns
Named tuple of (pdbcode, pdbchain)
- Return type
PdbParts
- Raises
GetSequencesException – if pdb_id can’t be parsed
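The two parsing modes, read literally from the description above, can be sketched with regular expressions. This is an illustration, not the module's implementation: the patterns, the assumption that ":" and "_" are the only separators, the `*_sketch`/`maestro_form` names, and raising ValueError (the real function raises GetSequencesException) are all assumptions. `maestro_form` shows the “:”-joined form that make_maestro_pdb_id is documented to produce.

```python
import re
from collections import namedtuple

# Local stand-in for seqio.PdbParts (same field names and order).
PdbParts = namedtuple("PdbParts", ["pdbcode", "pdbchain"])

# Strict: 4-character code starting with a digit, optional ":" or "_"
# separator, optional single-letter chain.
_STRICT = re.compile(r"(\d[A-Za-z0-9]{3})[:_]?([A-Za-z]?)")
# Permissive (separators assumed to be ":" or "_"): any non-whitespace
# code, but the separator and a single-letter chain are required.
_PERMISSIVE = re.compile(r"(\S+)[:_]([A-Za-z])")


def parse_pdb_id_sketch(pdb_id, permissive=False):
    pattern = _PERMISSIVE if permissive else _STRICT
    match = pattern.fullmatch(pdb_id)
    if match is None:
        # The real function raises GetSequencesException (an OSError subclass).
        raise ValueError(f"Cannot parse PDB ID: {pdb_id!r}")
    return PdbParts(*match.groups())


def maestro_form(parts):
    """Join code and chain with ':' (bare code when the chain is blank)."""
    return f"{parts.pdbcode}:{parts.pdbchain}" if parts.pdbchain else parts.pdbcode
```

Whether the real permissive mode also falls back to the strict rules is not stated above; this sketch does not.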
- schrodinger.application.msv.seqio.get_valid_pdb_id_map_for_seqs(seqs, structureless_only=True)
For a list of sequences, return a map of valid PDB IDs to sequences.
- Parameters
seqs (list(sequence.Sequence)) – List of sequences to get the map for
structureless_only (bool) – Whether to only include sequences that have no structure
- Returns
Map of valid PDB IDs to their source sequence
- Return type
dict(str: sequence.Sequence)
- schrodinger.application.msv.seqio.valid_pdb_id(pdb_id: str) → bool
- Returns
Whether the ID appears to be a valid PDB ID
- schrodinger.application.msv.seqio.valid_entrez_id(entrez_id: str) → bool
An Entrez ID may be:
1) NCBI Accession number: 9 or 12 characters starting with any letter, followed by "P_", ending with 6 or 9 digits and an optional number following a period (ex. NP_123456, XP_123456789.1)
2) NCBI GenInfo identifier: a single 9-digit number (ex. 123456789).
- Returns
Whether the ID appears to be a valid Entrez ID
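The two rules above translate directly into regular expressions. The sketch below is one such translation (the pattern details and the `looks_like_entrez_id` name are assumptions, not the module's code); `fullmatch` is used so that trailing characters, including a newline, are rejected.

```python
import re

# Accession: one letter, "P_", then 6 or 9 digits, with an optional ".<number>".
_ACCESSION_RE = re.compile(r"[A-Za-z]P_(\d{6}|\d{9})(\.\d+)?")
# GenInfo identifier: exactly nine digits.
_GI_RE = re.compile(r"\d{9}")


def looks_like_entrez_id(entrez_id: str) -> bool:
    return bool(_ACCESSION_RE.fullmatch(entrez_id) or _GI_RE.fullmatch(entrez_id))
```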
- schrodinger.application.msv.seqio.valid_uniprot_id(uniprot_id: str) → bool
A UniProt ID must be 6 or 10 characters long, starting with a letter.
- Returns
Whether the ID appears to be a valid UniProt ID
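A deliberately loose check matching the rule above is sketched below. The restriction to ASCII alphanumeric characters and the `looks_like_uniprot_id` name are assumptions; UniProt's published accession format is stricter than this, and the real function may implement it more precisely.

```python
def looks_like_uniprot_id(uniprot_id: str) -> bool:
    # Loose reading of the rule: 6 or 10 ASCII alphanumeric characters,
    # the first of which is a letter.
    return (
        len(uniprot_id) in (6, 10)
        and uniprot_id.isascii()
        and uniprot_id.isalnum()
        and uniprot_id[0].isalpha()
    )
```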
- schrodinger.application.msv.seqio.valid_swiss_prot_name(swiss_prot_name: str) → bool
Swiss-Prot entry name must be of the form X_Y, where X and Y are at most 5 alphanumeric characters and the underscore serves as a separator.
We also require Y to be a minimum of 2 characters to avoid confusion with a PDB ID.
- Returns
Whether the name appears to be a valid Swiss-Prot entry name
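The X_Y rule, including the 2-character minimum for Y, fits in one regular expression. This is a sketch under the assumption that X needs at least one character; the `looks_like_swiss_prot_name` name is hypothetical. Note how the minimum rejects a PDB-style ID such as 4hhb_A.

```python
import re

# X_Y where X has 1-5 and Y has 2-5 alphanumeric characters.
_SWISS_PROT_RE = re.compile(r"[A-Za-z0-9]{1,5}_[A-Za-z0-9]{2,5}")


def looks_like_swiss_prot_name(name: str) -> bool:
    return _SWISS_PROT_RE.fullmatch(name) is not None
```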
- schrodinger.application.msv.seqio.process_fetch_ids(ids, *, dialog_parent, allow_pdb=True)
Convenience function to parse a list or a comma-separated string into valid sequence and/or structure identifiers. If any IDs can’t be identified, prompt the user whether to continue.
- Parameters
ids (str or list) – Database ID or IDs (comma-separated str or list)
dialog_parent (QtWidgets.QWidget) – Parent widget for the dialog box
allow_pdb (bool) – Whether to allow structure identifiers. If False, they will be treated as unidentified.
- Returns
Named tuple of IDs identified as PDB, Entrez, and UniProt; or None if there are unidentified IDs and the user cancels.
- Return type
FetchIDs or NoneType
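The non-GUI part of this process (splitting the input and bucketing each ID) might look like the sketch below. Everything here is hypothetical: the function names, the check order (PDB, then Entrez, then UniProt), and passing the validators in as predicates, which in practice could be valid_pdb_id, valid_entrez_id and valid_uniprot_id. The dialog step is omitted; a caller would prompt the user whenever the returned unidentified list is non-empty.

```python
from collections import namedtuple

# Local stand-in for seqio.FetchIDs.
FetchIDs = namedtuple("FetchIDs", ["pdb", "entrez", "uniprot"])


def split_ids(ids):
    """Accept a comma-separated string or a list of strings."""
    if isinstance(ids, str):
        ids = ids.split(",")
    return [i.strip() for i in ids if i.strip()]


def classify_ids(ids, is_pdb, is_entrez, is_uniprot, allow_pdb=True):
    """Return (FetchIDs, unidentified); the predicates are caller-supplied."""
    pdb, entrez, uniprot, unknown = [], [], [], []
    for id_ in split_ids(ids):
        if allow_pdb and is_pdb(id_):
            pdb.append(id_)
        elif is_entrez(id_):
            entrez.append(id_)
        elif is_uniprot(id_):
            uniprot.append(id_)
        else:
            # With allow_pdb=False, structure IDs end up here.
            unknown.append(id_)
    return FetchIDs(pdb, entrez, uniprot), unknown
```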
- schrodinger.application.msv.seqio.maestro_get_pdb(maestro_pdb_id, pdb_dir=None, remote_ok=False)
Download a PDB file. If a chain is specified, it will be split out into a separate file.
- Parameters
maestro_pdb_id (str) – 4-character PDB code or code:chain (e.g. 4hhb or 4hhb:A)
pdb_dir (str) – Directory to check for existing files and destination to download new files
remote_ok (bool) – Whether it’s okay to make a remote query.
- Returns
Path to the downloaded PDB file
- Return type
str
- Raises
GetSequencesException – if the PDB file can’t be downloaded
- class schrodinger.application.msv.seqio.SeqDownloader
Bases:
object
- ENTREZ_FORMAT_STR = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id={ID}'
- UNIPROT_FORMAT_STR = 'https://www.uniprot.org/uniprot/{ID}.{EXT}'
- classmethod downloadPDB(pdb_id, pdb_dir=None, remote_ok=False)
Parse a PDB ID string and download the PDB file.
- Parameters
pdb_id (str) – PDB ID with optional chain (e.g. 4hhb, 4hhbA, 4hhb:A)
pdb_dir (str) – Directory to check for existing files and destination to download new files
remote_ok (bool) – Whether it’s okay to make a remote query.
- Returns
Full path to the downloaded PDB file
- Return type
str
- Raises
GetSequencesException – if the PDB file can’t be downloaded
- classmethod downloadEntrezSeq(sequence_id, remote_ok)
Download a sequence from the Entrez database.
- Parameters
sequence_id (str) – Sequence ID in Entrez format.
remote_ok (bool) – Whether it’s okay to make a remote query.
- Returns
Full path to the downloaded FASTA file
- Return type
str
- classmethod downloadUniprotSeq(sequence_id, remote_ok, *, use_xml=False)
Download a sequence from the UniProt database.
- Parameters
sequence_id (str) – Sequence ID in UniProt format.
remote_ok (bool) – Whether it’s okay to make a remote query.
use_xml (bool) – Whether to get the XML file with the full UniProt annotation information (e.g. domains). Setting this to True will download the XML file instead of the FASTA file.
- Returns
Full path to the downloaded FASTA or XML file
- Return type
str
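The two class constants are plain str.format templates. The sketch below fills them without touching the network; the helper names are hypothetical, and the "xml"/"fasta" values for {EXT} are an assumption based on the use_xml description. The real methods also honor remote_ok and save the result to disk.

```python
# Templates copied from SeqDownloader above.
ENTREZ_FORMAT_STR = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
                     "?db=protein&rettype=fasta&id={ID}")
UNIPROT_FORMAT_STR = "https://www.uniprot.org/uniprot/{ID}.{EXT}"


def entrez_url(sequence_id):
    return ENTREZ_FORMAT_STR.format(ID=sequence_id)


def uniprot_url(sequence_id, use_xml=False):
    # Assumption: EXT is "xml" when use_xml is set, otherwise "fasta".
    return UNIPROT_FORMAT_STR.format(ID=sequence_id, EXT="xml" if use_xml else "fasta")
```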
- schrodinger.application.msv.seqio.read_sequences(filename)
Read sequences from a file. The format is detected from the file extension.
Note that this function is only used for non-structure file types. For structure file types, see the StructureConverter class.
- Parameters
filename (str) – Path to sequence file
- Return type
list
- Returns
A list of sequences in the file
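Extension-based format detection usually reduces to a lookup table keyed on the lowercased suffix. The table below is hypothetical (the module's actual list of supported extensions is not documented here); it only illustrates the dispatch step that would precede choosing a reader.

```python
import os

# Hypothetical extension-to-format table; the real mapping is not shown above.
_FORMAT_BY_EXT = {
    ".fasta": "fasta",
    ".fa": "fasta",
    ".aln": "clustal",
}


def detect_format(filename):
    ext = os.path.splitext(filename)[1].lower()
    try:
        return _FORMAT_BY_EXT[ext]
    except KeyError:
        raise ValueError(f"Unrecognized sequence file extension: {ext!r}") from None
```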
- schrodinger.application.msv.seqio.from_biopython(biopy_seq)
Convert a Biopython sequence to a ProteinSequence.
- Parameters
biopy_seq (Bio.SeqRecord.SeqRecord) – A Biopython sequence to convert to a ProteinSequence
- Returns
The converted sequence
- Return type
ProteinSequence
- class schrodinger.application.msv.seqio.StructureConverter(ct, eid=None)
Bases:
object
Reads a structure and converts it to a list of sequences.
Note that this class produces sequences that are ordered by residue number and insertion code, not by connectivity. If that ever changes, structure_model.MaestroStructureModel._extractChains must also be updated.
- __init__(ct, eid=None)
- Parameters
ct (schrodinger.structure.Structure) – A structure to convert to sequences.
eid (str) – The entry ID to assign to the created sequences. If not given, the entry ID from the structure will be used.
- classmethod convert(ct, eid=None)
Convert the provided structure into a list of sequences.
- Parameters
ct (schrodinger.structure.Structure) – A structure to convert to sequences.
eid (str) – The entry ID to assign to the created sequences. If not given, the entry ID from the structure will be used.
- Returns
A list of sequences, one per chain.
- Return type
list[sequence.Sequence]
- makeSequences()
Note that disulfide bonds may span chains, so they need to be calculated at the ct level.
- Returns
A list of sequences, one per chain.
- Return type
list[sequence.Sequence]
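The ordering rule above (residue number, then insertion code, ignoring connectivity) can be illustrated with plain Python. The `Res` record and `chains_to_sequences` function are hypothetical stand-ins, not Schrodinger types; a blank insertion code (" " or "") is assumed to sort before any lettered one.

```python
from collections import defaultdict
from typing import NamedTuple


class Res(NamedTuple):
    """Hypothetical minimal residue record (not a Schrodinger type)."""
    chain: str
    resnum: int
    inscode: str  # " " or "" when absent
    code: str     # one-letter residue code


def chains_to_sequences(residues):
    """Return {chain: one-letter sequence}, ordered by (resnum, inscode)."""
    by_chain = defaultdict(list)
    for res in residues:
        by_chain[res.chain].append(res)
    return {
        chain: "".join(
            r.code for r in sorted(items, key=lambda r: (r.resnum, r.inscode.strip()))
        )
        for chain, items in by_chain.items()
    }
```

Because only numbering is used, a residue listed out of order in the input (or one separated by a chain break) still lands at the position its number and insertion code dictate.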
- classmethod convertStructResidue(struct_res, make_res)
Convert a structure._Residue into a residue.Residue.
- Parameters
struct_res (structure._Residue or residue.Residue) – A structure residue to convert. If this is a residue.Residue object, it will be returned unchanged.
make_res (callable) – A method to convert a string into a residue.Residue
- Returns
A newly created residue
- Return type
residue.Residue
- class schrodinger.application.msv.seqio.MMSequenceConverter
Bases:
object
Converts sequences between mmseq and MSV sequence formats.
- Note
This class is intended to be used as a context manager (in a with statement).
- classmethod readSequences(file_name, file_format=0)
Reads all sequences from the file specified by file_name.
- Parameters
file_name (str) – Name of input file.
file_format (int) – Format of the input file. By default, the format is MMSEQIO_ANY, meaning the file type is recognized automatically.
- Return type
list
- Returns
List of sequences read from the file.
- Raises
GetSequencesException – If the file could not be read.
- classmethod writeSequences(sequences, file_name, file_format=1)
Writes sequences to the file specified by file_name.
- Raises
mmcheck.MmException – If the file could not be opened for writing.
- Parameters
sequences – List of sequences to be written to file.
file_name (str) – Name of output file.
file_format (int) – Format of the output file. By default, the format is MMSEQIO_NATIVE.
- class schrodinger.application.msv.seqio.BaseProteinAlignmentReader
Bases:
object
Base class for reading protein sequence alignments from files.
- classmethod read(file_name, AlnCls=<class 'schrodinger.protein.alignment.ProteinAlignment'>)
Return the alignment read from a file.
- Note
The alignment can be empty if no sequences were present in the input file.
- Parameters
file_name (str) – Source file name
AlnCls (type) – The type of the Alignment to return
- Returns
An alignment of the specified type
- Raises
IOError – If the file cannot be read
- class schrodinger.application.msv.seqio.ClustalAlignmentReader
Bases:
schrodinger.application.msv.seqio.BaseProteinAlignmentReader
Class for reading Clustal *.aln files.
- class schrodinger.application.msv.seqio.FastaAlignmentReader
Bases:
object
- classmethod parseSSA(seq)
Parse an SSA sequence into a list of SSA values that can be assigned to the residues’ secondary_structure property.
- Parameters
seq (str) – The “sequence” from the FASTA file that encodes the SSA values
- Returns
A list of the SSA values. The SSA values come from schrodinger.structure. Returns None if any of the elements is invalid.
- Return type
list(int) or NoneType
- classmethod read(file_name, AlnClass=<class 'schrodinger.protein.alignment.ProteinAlignment'>)
Load a sequence file in FASTA format, create sequences, and append them to an alignment. Sequence names are split from the FASTA header.
- Parameters
file_name (str) – Name of input FASTA file
AlnClass (type) – The class of the alignment object to return
- Returns
The alignment read from the file
- Return type
AlnClass
- classmethod readFromText(lines, AlnClass=<class 'schrodinger.protein.alignment.ProteinAlignment'>)
Read sequences from FASTA-formatted text, create sequences, and append them to an alignment. Sequence names are split from the FASTA header.
- Parameters
lines (list of str) – List of strings representing a FASTA file
AlnClass (type) – The class of the alignment object to return
- Returns
The alignment
- Return type
AlnClass
- classmethod readFromStringList(strings, AlnClass=<class 'schrodinger.protein.alignment.ProteinAlignment'>)
Return an alignment object created from an iterable of sequence strings.
- Parameters
strings (Iterable of str) – Sequences as an iterable of strings (one-letter codes)
AlnClass (type) – The class of the alignment object to return
- Returns
The alignment
- Return type
AlnClass
- schrodinger.application.msv.seqio.to_biopython(seq)
Converts a sequence to a Biopython sequence
- Parameters
seq (schrodinger.protein.sequence.ProteinSequence) – A sequence to convert to a Biopython sequence
- Returns
The sequence converted to a Biopython SeqRecord
- Return type
Bio.SeqRecord.SeqRecord
- class schrodinger.application.msv.seqio.BaseProteinAlignmentWriter
Bases:
object
Class for writing protein alignments to files.
- class schrodinger.application.msv.seqio.FastaAlignmentWriter
Bases:
schrodinger.application.msv.seqio.BaseProteinAlignmentWriter
Class for writing FASTA .fasta files.
The format is described here: https://en.wikipedia.org/wiki/FASTA_format
- HEADER_START = '>'
- HEADER_END = ''
- classmethod toStringAndNames(aln, use_unique_names=True, maxl=50, export_annotations=False, sim_ref_seq=None)
Convert aln to a FASTA string.
- Parameters
aln (ProteinAlignment) – Structured sequences
use_unique_names (bool) – If True, write a unique name for each sequence.
maxl (int) – Maximum length of a line
export_annotations (bool) – Whether annotations should be exported along with sequence information. If True, annotations listed in EXPORT_ANNOTATIONS will be exported.
sim_ref_seq (sequence.Sequence or None) – Reference sequence to calculate similarities for the sequences to be exported. If None, similarity will not be exported.
- Returns
FASTA string
- Return type
str
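The effect of maxl on the output can be sketched for a single record: the header uses HEADER_START and HEADER_END as listed above, and the sequence is split into lines of at most maxl characters. The `format_fasta_record` helper is hypothetical and omits unique naming, annotation export, and similarity values.

```python
HEADER_START = ">"
HEADER_END = ""


def format_fasta_record(name, seq, maxl=50):
    """Return one FASTA record with the sequence wrapped at maxl characters."""
    lines = [HEADER_START + name + HEADER_END]
    lines.extend(seq[i:i + maxl] for i in range(0, len(seq), maxl))
    return "\n".join(lines)
```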
- classmethod toStringList(aln)
Convert a ProteinAlignment object to a list of sequence strings.
- Parameters
aln (ProteinAlignment) – Alignment data
- Return type
list of str
- Returns
A list of sequence strings representing the alignment
- classmethod write(aln, file_name, use_unique_names=True, maxl=50, export_annotations=False, sim_ref_seq=None, **kwargs)
Write aln to a FASTA file.
- Raises
IOError – If the output file cannot be written.
- Parameters
aln (ProteinAlignment) – Structured sequences
use_unique_names (bool) – If True, write a unique name for each sequence.
maxl (int) – Maximum length of a line
file_name (str) – Destination file name.
export_annotations (bool) – Whether annotations should be exported along with sequence information. If True, annotations listed in EXPORT_ANNOTATIONS will be exported.
sim_ref_seq (sequence.Sequence or None) – Reference sequence to calculate similarities for the sequences to be exported. If None, similarity will not be exported.
- Returns
Output names of each sequence
- Return type
list of str
- class schrodinger.application.msv.seqio.ClustalAlignmentWriter
Bases:
schrodinger.application.msv.seqio.BaseProteinAlignmentWriter
Class for writing Clustal *.aln files.
The format is described here: http://meme-suite.org/doc/clustalw-format.html
- classmethod write(aln, file_name, use_unique_names=True, **kwargs)
Write aln to a Clustal alignment file.
Note: **kwargs are ignored; they are accepted only to preserve the signature of BaseProteinAlignmentWriter.
- Raises
IOError – If the output file cannot be written.
- Parameters
aln (BaseAlignment) – Alignment to be written to a file.
file_name (str) – Destination file name.
use_unique_names (bool) – If True, write a unique name for each sequence.
- Return type
dict
- Returns
A mapping between the names written to the Clustal file and the sequences
- class schrodinger.application.msv.seqio.CSVAlignmentWriter
Bases:
schrodinger.application.msv.seqio.BaseProteinAlignmentWriter
- class schrodinger.application.msv.seqio.SeqDAlignmentWriter
Bases:
schrodinger.application.msv.seqio.BaseProteinAlignmentWriter
Class to write sequences and descriptors to seqd files. Each sequence is exported to its own seqd file named “<seq_name>_<chain_name>.seqd”.
- schrodinger.application.msv.seqio.is_inhouse_header(fasta_header)
Test whether the given FASTA header is in the in-house format. The in-house format is ">NAME:<long_name>|CHAIN:<chain>", with an optional "|<anno_type>" flag at the end.
- Examples
>NAME:ABC|CHAIN:X|SSA
>NAME:A|B|C|CHAIN:X x
- Parameters
fasta_header (str) – The FASTA header to check
- Returns
Whether the header is in the in-house format
- Return type
bool
- schrodinger.application.msv.seqio.parse_in_house_header(fasta_header)
Parse a FASTA header in the in-house format. The in-house format is ">NAME:<long_name>|CHAIN:<chain>", with an optional "|<anno_type>" flag at the end.
- Examples
>NAME:ABC LONG|CHAIN:X|SSA → ABC LONG, X, secondary_structure
>NAME:A|B|C|CHAIN:X x → A|B|C, X, None
- Parameters
fasta_header (str) – The FASTA header to parse
- Returns
The long_name, chain, and annotation type corresponding to the header
- Return type
tuple(str, str, PSAnno.ANNOTATION_TYPES) or NoneType
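A regular expression reproduces both documented examples. This is a sketch, not the module's parser: the pattern and the `*_sketch` names are assumptions, and it returns the raw flag (e.g. "SSA") rather than mapping it to a PSAnno annotation type. The long name is matched greedily so it may itself contain "|", while the chain stops at the next "|" or whitespace.

```python
import re

_INHOUSE_RE = re.compile(
    r">NAME:(?P<long_name>.*)\|CHAIN:(?P<chain>[^|\s]*)(?:\|(?P<anno>\S+))?"
)


def parse_in_house_header_sketch(fasta_header):
    """Return (long_name, chain, anno) or None if not in the in-house format."""
    match = _INHOUSE_RE.match(fasta_header)
    if match is None:
        return None
    # A missing optional flag yields None for "anno".
    return match.group("long_name", "chain", "anno")


def is_inhouse_header_sketch(fasta_header):
    return parse_in_house_header_sketch(fasta_header) is not None
```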
- schrodinger.application.msv.seqio.parse_fasta_header(header, permissive=True)
Parse a FASTA header into a (name, long_name, chain, anno_type) named tuple.
- Parameters
header (str) – The header for a single entry in a FASTA file (including the leading “>” character)
permissive (bool) – Whether to use permissive parsing. See parse_pdb_id for documentation.
- Returns
Named tuple of (name, long_name, chain, anno_type)
- Return type
FastaParts
- schrodinger.application.msv.seqio.parse_long_name(long_name, permissive=True)
Attempt to parse a long_name into a short name and a chain.
- Examples
1FSK:A → 1FSK, A
2BJM.H VH CDR_LENGTH: 5 17 11 → 2BJM, H
sp|accession|entry name → accession, “”
- Parameters
long_name (str) – The long name to attempt to parse
permissive (bool) – Whether to use permissive parsing. See parse_pdb_id for documentation.
- Returns
A short name and a chain ID
- Return type
tuple(str, str)
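A small heuristic that reproduces the three examples above is sketched here. The real rules, including how the permissive flag changes them, are not shown, so the pipe-count test, the separator set (":", ".", "_"), and the `parse_long_name_sketch` name are assumptions.

```python
import re

# A code followed by ":", "." or "_" and a single-character chain.
_CODE_CHAIN_RE = re.compile(r"([^\s:._|]+)[:._]([A-Za-z0-9])")


def parse_long_name_sketch(long_name):
    """Heuristic split of a FASTA long name into (short_name, chain)."""
    # UniProt-style "db|accession|entry name": keep the accession, no chain.
    if long_name.count("|") >= 2:
        return long_name.split("|")[1], ""
    # Otherwise only the first whitespace-delimited token is examined.
    tokens = long_name.split()
    first = tokens[0] if tokens else ""
    match = _CODE_CHAIN_RE.fullmatch(first)
    if match:
        return match.group(1), match.group(2)
    return first, ""
```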
- schrodinger.application.msv.seqio.reorder_fasta_alignment(aln, orig_names)
Reorder a FASTA alignment to match the order of names written to FASTA.
Intended for use after alignment methods that reorder the output.
Example usage:
orig_names = seqio.FastaAlignmentWriter.write(orig_aln, input_filename)
# run the alignment method, which writes out_filename
aln = seqio.FastaAlignmentReader.read(out_filename)
seqio.reorder_fasta_alignment(aln, orig_names)
- Parameters
aln (alignment.BaseAlignment) – Alignment to reorder. Will be modified in place.
orig_names (list[str]) – Original order of sequence names as written to FASTA.
- Raises
ValueError – If the alignments have different lengths or mismatched names
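The reordering and its two error conditions can be sketched on a plain list of (name, sequence) pairs standing in for an alignment. The list representation and the `reorder_records` name are assumptions; like the documented function, the sketch modifies its input in place, and it validates before changing anything.

```python
def reorder_records(records, orig_names):
    """Reorder (name, seq) pairs in place to follow orig_names.

    Raises ValueError if the counts differ or the names don't match.
    """
    if len(records) != len(orig_names):
        raise ValueError("Alignments have different lengths")
    by_name = dict(records)
    # Duplicate names collapse in the dict, so compare sizes as well as sets.
    if len(by_name) != len(records) or set(by_name) != set(orig_names):
        raise ValueError("Mismatched sequence names")
    records[:] = [(name, by_name[name]) for name in orig_names]
```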