Core Concepts
Structures
The Structure class is the fundamental class in our modules, and will probably be used in all of the code you write. Structure objects can be single molecules or groups of molecules. They provide access to atoms, bonds, properties, and a number of substructure elements.
Like any other Python object, Structure objects can be stored in lists or dictionaries, assigned to variables, and passed between functions. (However, they cannot be pickled because they wrap an underlying C library.)
In principle, Structure objects can be created programmatically, by creating a zero-atom structure, adding the desired atoms and connecting them with bonds. However, this usage pattern is atypical. In most cases a structure will be loaded from a file or retrieved from the Maestro Workspace or the Maestro Project Table.
Most Schrödinger calculations produce a Maestro-format output file (with a .mae or .maegz extension). Creating a Structure object from one of these files lets you investigate the properties and structure of the resulting molecule or molecules.
Structure Class Organization
Structure objects expose many attributes as iterators, including atoms, bonds, and substructure elements. In addition to the attributes defined by their classes, structures, atoms, and bonds each have a dynamic, dictionary-like property attribute that can store properties associated with that specific object.
See the API documentation for more details on the properties and methods of the Structure class.
First, we’ll set up a structure for the examples that follow. By convention, “st” (for “structure”) is used as the variable name for Structure instances.
>>> from schrodinger import structure
>>> from schrodinger.test import mmshare_data_file
>>> st = structure.StructureReader.read(mmshare_data_file('r_group_enumeration_library/Diverse_R-groups.maegz'))
Note
The >>> prefix in the examples that follow is the interactive prompt. Examples without the prompt are snippets of scripts.
Atoms
All Structure objects have a list-like atom attribute that can be used to iterate over all atoms or to access them by index. For example:
>>> for atom in st.atom:
...     name = atom.name
...     atomic_number = atom.atomic_number
...     # do something with these attributes
It is also possible to index into the atom container (we do not currently support slicing). Indexing starts at 1.
>>> # Print the atom type name and atomic number of the fourth atom.
>>> atom = st.atom[4]
>>> name = atom.atom_type_name
>>> atomic_number = atom.atomic_number
>>> print(f"{name}: {atomic_number}")
H1: 1
Each atom is represented by an instance of the _StructureAtom class. Some attributes (actually Python properties) of _StructureAtom objects include name, atomic_number, formal_charge, and the Cartesian coordinates x, y, and z. See the _StructureAtom properties for a full list.
Note that atom indices can change if the structure is modified and so can’t be safely relied on in many contexts. If you need to reidentify atoms after performing an operation that modifies the structure, you can use the _StructureAtom instance to ensure that you continue to refer to the correct atom. The _StructureAtom instance has an index attribute that will remain up-to-date through any such changes.
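A toy pure-Python model shows why the atom object is the safer handle. The `ToyStructure` and `ToyAtom` classes below are hypothetical stand-ins, not Schrödinger's implementation. Each atom handle recomputes its 1-based index from its current position, so the index stays correct after an earlier atom is deleted. A saved integer, by contrast, silently ends up pointing at a different atom.

```python
class ToyAtom:
    """Hypothetical stand-in for an atom handle (not the real _StructureAtom)."""

    def __init__(self, owner, name):
        self._owner = owner
        self.name = name

    @property
    def index(self):
        # Recomputed on every access, so it always reflects this atom's
        # current 1-based position in its structure.
        return self._owner.atoms.index(self) + 1


class ToyStructure:
    """Hypothetical container with 1-based atom access, for illustration only."""

    def __init__(self, names):
        self.atoms = [ToyAtom(self, name) for name in names]

    def atom(self, index):
        return self.atoms[index - 1]

    def delete_atom(self, index):
        del self.atoms[index - 1]


toy_st = ToyStructure(["C1", "H1", "H2", "O1", "H3"])
oxygen = toy_st.atom(4)
saved_index = oxygen.index  # 4
toy_st.delete_atom(2)       # remove H1; later atoms shift down by one
print(saved_index, oxygen.index, toy_st.atom(saved_index).name)
```

Under these assumptions the script prints `4 3 H3`. The handle now reports index 3, while the stale index 4 refers to a different atom.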
Bonds
Each atom also has a list-like bond attribute:
for atom in st.atom:
    print(f"Atom {atom.index} is bonded to:")
    for bond in atom.bond:
        print(f"  atom {bond.atom2.index}")
Bonds are represented by the _StructureBond class. Important attributes of the bond class include order, atom1, and atom2. See the _StructureBond properties for full documentation.
Bonds within the structure are also accessible from a list-like attribute of the Structure object called bond. This access is useful when you want to iterate over all bonds in a structure exactly once.
# Iterate over every bond in the structure exactly once:
for bond in st.bond:
    print(f"Bonded atoms: {bond.atom1.index} and {bond.atom2.index}")
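Iterating over every atom's bond list visits each bond twice, once from each end. That is why the structure-level bond attribute is the right tool for a single pass. The plain-Python sketch below makes the counting concrete. It uses a hypothetical adjacency dictionary (1-based atom indices mapped to bonded partners) rather than the real Structure data model.

```python
# Hypothetical adjacency data: 1-based atom index -> bonded atom indices.
# This mimics per-atom bond lists; it is not the Schrödinger data model.
adjacency = {
    1: [2, 3, 4],
    2: [1],
    3: [1],
    4: [1, 5],
    5: [4],
}

# Walking every atom's bond list sees each bond once from each end.
per_atom_visits = [(a, b) for a, partners in adjacency.items() for b in partners]

# Keeping only pairs with a < b visits every bond exactly once.
unique_bonds = [(a, b) for a, b in per_atom_visits if a < b]

print(len(per_atom_visits), len(unique_bonds))  # 8 4
```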
Substructures
A number of “substructure iterators” are available from each Structure object. Each of these iterators returns instances of a non-public class that is a view on the substructure contained within the Structure object. Each substructure class has an extractStructure method that creates a new, independent Structure object containing the atoms in the substructure. The classes also have a getAtomList method, which returns a list of the atom indices in the substructure, and an atom iterator.
- molecule: A Structure may contain multiple unconnected molecules, which can be iterated over using the molecule attribute. The iterator yields _Molecule objects.
- chain: Iterates over protein chains in the Structure object, yielding _Chain instances.
- residue: Iterates over protein residues in the Structure object, yielding _Residue instances.
- ring: Iterates over all rings in the Structure object, as found by SSSR (smallest set of smallest rings), yielding _Ring instances.
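A molecule is a connected set of atoms, so the grouping that the molecule iterator provides corresponds to the connected components of the bond graph. The hypothetical `connected_components` function below is an illustration only, not the toolkit's implementation. It finds those components with a breadth-first search over 1-based atom indices and a list of bonded pairs.

```python
from collections import deque


def connected_components(num_atoms, bonds):
    """Group 1-based atom indices into connected components.

    `bonds` is an iterable of (atom1, atom2) index pairs. This is a generic
    breadth-first search, a sketch of the idea rather than Schrödinger code.
    """
    neighbors = {i: [] for i in range(1, num_atoms + 1)}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    seen = set()
    components = []
    for start in range(1, num_atoms + 1):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            atom = queue.popleft()
            component.append(atom)
            for other in neighbors[atom]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(sorted(component))
    return components


# Hypothetical example: a three-atom fragment, a two-atom fragment, and a
# lone atom (for example, a counterion), giving three "molecules".
mols = connected_components(6, [(1, 2), (2, 3), (4, 5)])
print(mols)  # [[1, 2, 3], [4, 5], [6]]
```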
Some example usages:
for res in st.residue:
    resname = res.pdbres.strip()
    print(f"{res.chain}:{resname}{res.resnum}")
# A molecule is just a connected graph of atoms
for mol in st.molecule:
    num_atoms = len(mol.getAtomIndices())
    print(f"Mol {mol.number} has {num_atoms} atoms")
for chain in st.chain:
    print(f"Chain: {chain.name}")
print(f"The structure has {len(st.ring)} rings")
for ring in st.ring:
    if ring.isAromatic():
        print(list(ring.atom))
print(f"The structure has {len(st.molecule)} molecules.")
for mol in st.molecule:
    print(f"Molecule {mol.number} has {len(mol.atom)} atoms.")
The _Molecule and _Chain instances also support their own residue iterators. For example:
for chain in st.chain:
    residues = "".join([res.getCode() for res in chain.residue])
    print(f"chain {chain.name}: {residues}")
A few things are worth noting. First, you can’t index into a residue container the way you can index into an atom or molecule container. If you need to, convert the residues to a Python list and index into that list, remembering that Python lists are 0-based:
first_res = list(st.residue)[0]
Second, you should not add or delete atoms or bonds while iterating over a structure.
Interface
The Structure class has a rich interface for performing common tasks, such as getting and setting atomic coordinates, searching for substructures, and measuring distances and angles. Many of these are covered in the Cookbook section.
Properties
Structures and atoms can store properties in a dictionary-like attribute named property. Structure properties can be viewed in the Maestro Project Table, and are used by product backends to store results and intermediate data.
The property names in this property object must follow a pattern that is required for storage in Maestro-format files. The required naming scheme is type_author_property_name, where type is a data type prefix, author is a source specification, and property_name is the actual name of the data. The type prefix must be b for boolean, i for integer, r for real, or s for string. The source specification is typically a Schrödinger program abbreviation (e.g. m for Maestro and j for Jaguar); the appropriate user-level source specification is user. (In Maestro-format files, the Structure object property names correspond to the properties listed under the f_m_ct { line.)
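The naming rule can be checked mechanically. The `parse_property_name` helper below is hypothetical and not part of the schrodinger package. It assumes that the author field contains no underscores, while the property name may contain them (as in `r_j_Gas_Phase_Energy`).

```python
import re

# Assumption: the author field contains no underscores, while the
# property name itself may (e.g. "Gas_Phase_Energy").
_PROPERTY_NAME = re.compile(r"^(?P<type>[birs])_(?P<author>[^_]+)_(?P<name>.+)$")

_TYPE_NAMES = {"b": "boolean", "i": "integer", "r": "real", "s": "string"}


def parse_property_name(prop_name):
    """Split a Maestro-style property name into (type, author, name).

    Hypothetical helper for illustration; raises ValueError if the name
    does not follow the type_author_property_name pattern.
    """
    match = _PROPERTY_NAME.match(prop_name)
    if match is None:
        raise ValueError(f"not a valid property name: {prop_name!r}")
    return _TYPE_NAMES[match["type"]], match["author"], match["name"]


print(parse_property_name("r_j_Gas_Phase_Energy"))  # ('real', 'j', 'Gas_Phase_Energy')
print(parse_property_name("b_user_is_carbon"))      # ('boolean', 'user', 'is_carbon')
```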
This example shows how to access, set, and delete Structure object properties:
# 'r_j_Gas_Phase_Energy' is a real property set by Jaguar.
gas_phase_energy = st.property['r_j_Gas_Phase_Energy']
# Properties stored by the user should use an "author" of 'user'.
st.property['r_user_Energy_Plus_Two'] = gas_phase_energy + 2.0
# Delete the new 'r_user_Energy_Plus_Two' property.
del st.property['r_user_Energy_Plus_Two']
Because the property objects are dictionary subclasses, the standard dictionary methods like keys and items also work.
Properties of atoms work the same way. For example, you could assign a property to all carbon atoms:
for atom in st.atom:
    if atom.atomic_number == 6:
        atom.property['b_user_is_carbon'] = True
Structure I/O
Reading a Structure from a File
The schrodinger.structure.StructureReader class creates Structure objects from molecular data stored in a number of standard file formats. Supported file types are Maestro, MDL SD, PDB, and Sybyl Mol2. Because these files may contain multiple molecules, the StructureReader is an iterator, and molecule files are presented as a sequence of Structure objects.
from schrodinger import structure
# Input can be a .mae, .sdf, .sd, .pdb, or .mol2 file.
input_file = "input.mae"
for st in structure.StructureReader(input_file):
    # Do something with the Structure; process_structure is a placeholder
    # for your own function.
    result = process_structure(st)
# To read only the first structure from a file, pass the reader to next().
reader = structure.StructureReader(input_file)
st = next(reader)
If you’re interested in a specific structure in the file and know its index, the StructureReader class also has a read classmethod for convenience:
# Select the first structure.
st = structure.StructureReader.read(input_file)
# Select the nth structure, counting from 1 (here, the third).
st = structure.StructureReader.read(input_file, index=3)
SMILES-format files and CSV files with SMILES data are also supported. Because these files contain no atomic coordinates, the resulting structures are SmilesStructures, which have less functionality than standard Structures. See the SmilesReader and SmilesCsvReader documentation.
Saving a Structure to a File
The StructureWriter class is the counterpart to schrodinger.structure.StructureReader. It can write the same file formats as the StructureReader, but Maestro (.mae) format is recommended because it is the least lossy.
This is an example of a typical read, process, and write script:
from schrodinger import structure
# Use the reader and writer as context managers to ensure that the files
# are closed when we're done with them.
with structure.StructureReader("input.mae") as reader:
    with structure.StructureWriter("output.mae") as writer:
        for st in reader:
            # Do the required processing; do_processing is a placeholder
            # for your own function.
            result_structure = do_processing(st)
            # Save the result to the output file.
            writer.append(result_structure)
Alternatively, if only a single structure is being written to a file, you can use the write staticmethod of StructureWriter:
from schrodinger import structure
input_file = "input.mae"
output_file = "output.mae"
# Select the first structure.
st = structure.StructureReader.read(input_file)
# Do something here...
# ...and then write it to a separate file using the staticmethod.
structure.StructureWriter.write(st, output_file)
Structure Operations
In addition to the functionality provided in the schrodinger.structure module itself, much is provided in the schrodinger.structutils package.
This section lists some additional Structure features and a few highlights of the structutils package.
Structure Minimization
Structures can be minimized using one of the OPLS_2005 or OPLS3e force fields by using the minimize_structure function. This operation requires a valid product license from MacroModel, GLIDE, Impact, or PLOP. Note that minimization will not hold on to a license; a license is checked out to ensure that one is available, then immediately checked back in.
For example, to compare the energy of a molecule before and after minimization:
from schrodinger.structutils.minimize import minimize_structure
# Do a 0-step "minimization" to get the initial energy.
min_res = minimize_structure(st, max_steps=0)
original_energy = min_res.potential_energy
min_res = minimize_structure(st)
minimized_energy = min_res.potential_energy
energy_diff = original_energy - minimized_energy
print(f"The minimized energy is {energy_diff} kcal/mol lower than the original.")
Substructure Searching or Specification
Generate SMILES, SMARTS, or ASL strings based on a set of atom indices via the generate_smiles, generate_smarts, and generate_asl functions. Documentation on ASL can be found in the Maestro Command Reference Manual.
Evaluate SMARTS or ASL strings and return a list of matching atom indices via the evaluate_smarts and evaluate_asl functions.
This example finds the set of unique SMILES strings in a structure file:
from schrodinger import structure
from schrodinger.structutils.analyze import generate_smiles
unique_smiles = set()
with structure.StructureReader("input.mae") as reader:
    for st in reader:
        pattern = generate_smiles(st)
        unique_smiles.add(pattern)
Structure Measurement
The schrodinger.structutils.measure module provides functions for measuring distances, angles, dihedral angles, and plane angles. It also offers the get_close_atoms function to find all pairs of atoms within a specified distance in less than O(N²) time.
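The text doesn't say how get_close_atoms achieves its speedup. One standard technique is a cell list, shown here only as a sketch and not as the module's implementation. Points are binned into cubic cells whose edge equals the cutoff, so any pair within the cutoff must lie in the same cell or in adjacent cells. Each point is therefore compared only with the points in the 27 cells around it. The hypothetical `close_pairs` function uses plain coordinate tuples and 0-based indices. The brute-force version is included only as a reference for checking the result.

```python
import itertools
import math
from collections import defaultdict


def close_pairs(coords, cutoff):
    """Return sorted (i, j) pairs, i < j, of 0-based indices within cutoff.

    Cell-list sketch: each point is compared only with points in its own
    cell and the 26 surrounding cells.
    """
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(coords):
        key = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        cells[key].append(idx)

    pairs = []
    offsets = list(itertools.product((-1, 0, 1), repeat=3))
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in offsets:
            neighbors = cells.get((cx + dx, cy + dy, cz + dz), [])
            for i in members:
                for j in neighbors:
                    # i < j avoids self-pairs and counting a pair twice.
                    if i < j and math.dist(coords[i], coords[j]) <= cutoff:
                        pairs.append((i, j))
    return sorted(pairs)


def close_pairs_brute_force(coords, cutoff):
    """All-pairs O(N^2) reference version for checking the result."""
    return [
        (i, j)
        for i, j in itertools.combinations(range(len(coords)), 2)
        if math.dist(coords[i], coords[j]) <= cutoff
    ]


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0), (5.5, 5.0, 5.0), (-1.2, 0.0, 0.0)]
print(close_pairs(points, 1.5))  # [(0, 1), (0, 4), (2, 3)]
```

At roughly uniform density, each cell holds a bounded number of points, so the cost grows about linearly with the number of atoms. In the worst case, with every point in one cell, it falls back to O(N²).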
Structure Superimposition or Comparison
The in-place RMSD of two structures can be determined via the calculate_in_place_rmsd function. The ConformerRmsd class offers more complete RMSD comparison tools for conformers.
Two structures can be superimposed based on all atoms or a subset of atoms with the superimpose function.
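An in-place RMSD compares coordinates exactly as they are, atom by atom, without first superimposing the structures, so even a pure translation gives a nonzero value. The hypothetical `in_place_rmsd` function below computes this quantity from two matched lists of (x, y, z) tuples. It illustrates the definition and does not reproduce the calculate_in_place_rmsd signature.

```python
import math


def in_place_rmsd(coords_a, coords_b):
    """RMSD between two equally sized coordinate lists, matched by position.

    No superposition is performed: the coordinates are compared exactly
    as they are.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Hypothetical three-atom example: the second copy is shifted 1 Å along x.
ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
moved = [(x + 1.0, y, z) for x, y, z in ref]
print(in_place_rmsd(ref, moved))  # 1.0
```

Here every atom moves exactly 1 Å, so the result is 1.0. Superimposing the two copies first would bring the RMSD to zero.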
Conversion Between 1D/2D and 3D Structures
To convert a 3D structure to a 1D structure (SMILES or SMARTS), use the appropriate function from schrodinger.structutils.analyze:
from schrodinger import structure
from schrodinger.structutils import analyze
smiles_list = []
smarts_list = []
with structure.StructureReader("input.mae") as reader:
    for st in reader:
        smiles_list.append(analyze.generate_smiles(st))
        smarts_list.append(analyze.generate_smarts(st))
It is possible to convert 1D SMILES strings to 3D structures:
from schrodinger import structure
sts_3d = []
# 'smiles_input' is a placeholder for your SMILES data.
with structure.StructureReader.fromString('smiles_input') as reader:
    for st_1d in reader:
        sts_3d.append(st_1d.generate3dConformation())
To convert a 3D structure to a 2D structure, use the canvasConvert utility from the command line:
$SCHRODINGER/utilities/canvasConvert -imae input.mae -2D -osd output.sd
The resulting SD file can then be read back in with the StructureReader class.
Modifying a Structure
Atoms can be added via the Structure.addAtoms method.
Individual atoms can be deleted with standard Python list syntax:
>>> st_orig = st.copy()  # Keep an unmodified copy.
>>> len(st.atom)
5
>>> del st.atom[2]
>>> len(st.atom)
4
Note
Deleting atoms changes the indices of the atoms remaining in the Structure object.
Because deleting atoms renumbers the remaining atoms, multiple atoms should be deleted via the Structure.deleteAtoms method.
>>> len(st.atom)
4
>>> st.deleteAtoms([1, 2])
>>> len(st.atom)
2
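Because of this renumbering, deleting several atoms one at a time by their original indices removes the wrong atoms. The plain-Python sketch below uses a list of labels as a hypothetical stand-in for atoms to show the pitfall. The hypothetical `index_map_after_deletion` helper then computes how the surviving atoms are renumbered, assuming they keep their relative order and are renumbered contiguously from 1.

```python
def index_map_after_deletion(num_atoms, deleted):
    """Map old 1-based indices to new ones after deleting the given atoms.

    Hypothetical helper; assumes survivors keep their relative order and
    are renumbered contiguously from 1.
    """
    deleted = set(deleted)
    mapping = {}
    new_index = 0
    for old_index in range(1, num_atoms + 1):
        if old_index not in deleted:
            new_index += 1
            mapping[old_index] = new_index
    return mapping


atoms = ["a1", "a2", "a3", "a4"]

# Pitfall: deleting by original indices one at a time, in ascending order.
wrong = list(atoms)
for idx in [1, 2]:
    # After the first deletion, position 2 holds what was atom 3.
    del wrong[idx - 1]
print(wrong)  # ['a2', 'a4'] instead of the intended ['a3', 'a4']

# Deleting in descending order avoids the shift for a plain list.
right = list(atoms)
for idx in sorted([1, 2], reverse=True):
    del right[idx - 1]
print(right)  # ['a3', 'a4']

print(index_map_after_deletion(4, [1, 2]))  # {3: 1, 4: 2}
print(index_map_after_deletion(5, [2]))     # {1: 1, 3: 2, 4: 3, 5: 4}
```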
Charges and atom identity can be modified by making assignments to the proper _StructureAtom attributes:
>>> st = structure.StructureReader.read(mmshare_data_file('r_group_enumeration_library/Diverse_R-groups.maegz'))
>>> at = st.atom[1]
>>> at.element
'C'
>>> at.atomic_number
6
>>> at.formal_charge
0
>>> at.element = 'N'
>>> at.formal_charge = 1
>>> at.formal_charge
1
>>> at.atomic_number
7
>>> at.atomic_number = 6
>>> at.element
'C'
>>> other_atom = st.atom[5]
>>> other_atom.index
5
>>> del st.atom[2]
>>> other_atom.index # This index is updated
4
As the above examples show, changing the atomic_number or element attribute automatically updates the other.
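One way to get this coupled behavior is to store a single value and derive the other through a Python property. The toy class below is a hypothetical illustration, not how _StructureAtom is implemented, and its lookup table covers only four elements.

```python
# Small, hypothetical lookup table covering only the elements used below.
_SYMBOL_TO_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}
_NUMBER_TO_SYMBOL = {num: sym for sym, num in _SYMBOL_TO_NUMBER.items()}


class ToyElementAtom:
    """Hypothetical atom that stores only the atomic number.

    `element` is derived from it, so setting either attribute keeps the
    two consistent.
    """

    def __init__(self, atomic_number):
        self.atomic_number = atomic_number

    @property
    def element(self):
        return _NUMBER_TO_SYMBOL[self.atomic_number]

    @element.setter
    def element(self, symbol):
        self.atomic_number = _SYMBOL_TO_NUMBER[symbol]


at = ToyElementAtom(6)
at.element = "N"
print(at.atomic_number)  # 7
at.atomic_number = 6
print(at.element)  # C
```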
Bonds can be broken or created. For example:
# To avoid modifying the original structure, make a copy.
st = st_orig.copy()
# Break and re-join the first bond on the first atom.
bond = st.atom[1].bond[1]
atom1 = bond.atom1.index
atom2 = bond.atom2.index
order = bond.order
st.deleteBond(atom1, atom2) # Delete the bond.
st.addBond(atom1, atom2, order) # Recreate bond with same bond order.
Hydrogens can be added via the schrodinger.structutils.build.add_hydrogens function, or deleted via the delete_hydrogens function.