schrodinger.application.canvas.cluster module¶
Canvas clustering functionality.
There are classes to perform clustering and to support command-line and graphical interfaces to the clustering options.
Copyright Schrodinger, LLC. All rights reserved.
- schrodinger.application.canvas.cluster.mktemp()¶
A simple wrapper around tempfile.mkstemp which closes the file and returns just the name of the file.
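The behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the module's actual source; the name `mktemp_sketch` is hypothetical.

```python
import os
import tempfile


def mktemp_sketch():
    """Create a temporary file, close it, and return only its path."""
    # mkstemp returns an open OS-level file descriptor plus the path.
    fd, name = tempfile.mkstemp()
    # Close the descriptor so the caller does not leak it.
    os.close(fd)
    return name


path = mktemp_sketch()
print(os.path.isfile(path))  # the empty file stays on disk until removed
os.remove(path)
```

The caller remains responsible for deleting the file, since only the name is returned.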
- class schrodinger.application.canvas.cluster.CanvasFingerprintCluster(logger)¶
Bases:
object
A class which handles clustering of Canvas fingerprints. It maintains a list of the possible linkage types and keeps track of the currently specified linkage type.
- LINKAGE_TYPES = ['Single', 'Complete', 'Average', 'Centroid', 'McQuitty', 'Ward', 'Weighted Centroid', 'Flexible Beta', 'Schrodinger']¶
- __init__(logger)¶
Initialize the instance of the cluster class
- getDescription()¶
Returns a string representing a summary of the current linkage settings
- debug(output)¶
Wrapper for debug logging, just to simplify logging
- setLinkage(linkage)¶
Set the current linkage based on the linkage name
- getCurrentLinkage()¶
Returns the current linkage definition
- clusterDM(dm_file_name)¶
Cluster the distance matrix file given in dm_file_name, which should point to a CSV file containing the matrix. The value returned is the cluster strain.
- generateDM(dm_file_name, fp_file, fp_gen, fp_sim)¶
Generate a distance matrix with the specified filename from the fingerprint file fp_file. The fp_gen and fp_sim objects encapsulate the current fingerprint and similarity settings.
- clusterFP(fp_file, fp_gen, fp_sim)¶
Cluster the fingerprints contained in fp_file. The bit size will be taken from the CanvasFingerprintGenerator object fp_gen. The similarity metric will be taken from the CanvasFingerprintSimilarity object fp_sim. This function returns the ‘strain’ reported by the clustering.
- group(num_clusters)¶
Perform a grouping operation based on an existing clustering run. If the clustering has not actually been performed yet then an exception will be raised.
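The call order implied by the methods above (set a linkage, cluster, then group) can be sketched as follows. `StubCluster` is a hypothetical stand-in that only mimics the documented method names so the example runs without a Schrodinger installation. Its return values, the exception type, and the way `fp_gen` and `fp_sim` would be constructed are assumptions, not the library's behaviour.

```python
class StubCluster:
    """Hypothetical stand-in mimicking the documented method names only."""

    def __init__(self):
        self._clustered = False
        self._contents = {}

    def setLinkage(self, linkage):
        self.linkage = linkage

    def clusterFP(self, fp_file, fp_gen, fp_sim):
        self._clustered = True
        return 0.0  # placeholder strain value

    def group(self, num_clusters):
        if not self._clustered:
            # The real exception type is not documented; this is a placeholder.
            raise RuntimeError("clustering must be run before grouping")
        # Placeholder assignment: deal six fake IDs round-robin.
        self._contents = {c: [] for c in range(1, num_clusters + 1)}
        for item in range(1, 7):
            self._contents[(item - 1) % num_clusters + 1].append(item)

    def getClusterContents(self):
        return self._contents


def cluster_and_group(cluster, fp_file, fp_gen, fp_sim, num_clusters,
                      linkage="Average"):
    """Run the documented sequence: set linkage, cluster, then group."""
    cluster.setLinkage(linkage)
    strain = cluster.clusterFP(fp_file, fp_gen, fp_sim)
    cluster.group(num_clusters)
    return strain, cluster.getClusterContents()


strain, contents = cluster_and_group(StubCluster(), "example.fp", None, None, 3)
print(contents)  # {1: [1, 4], 2: [2, 5], 3: [3, 6]}
```

With a real `CanvasFingerprintCluster` instance, the same helper would be passed the actual fingerprint file and settings objects in place of the placeholders.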
- getMatrixTime()¶
Returns the time required for distance matrix generation
- getClusterTime()¶
Returns the time required for clustering
- getGroupTime()¶
Returns the time required for group creation
- getClusteringMap()¶
Once grouping has been done, this method may be called to return a dictionary where the keys represent the original fingerprint IDs (usually the position of the structure in the file or the entry ID) and the values are the cluster each structure belongs to.
- getClusterContents()¶
Once grouping has been done, this method may be called to return a dictionary where the keys represent the cluster number and the values are lists of IDs (usually positions in the file or entry IDs).
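The two dictionaries describe the same grouping from opposite directions. A minimal sketch of that relationship, using made-up IDs and a hypothetical helper name, assuming the map has the item-to-cluster shape described for getClusteringMap():

```python
from collections import defaultdict


def contents_from_map(clustering_map):
    """Invert an {item_id: cluster_number} map into {cluster_number: [item_ids]}."""
    contents = defaultdict(list)
    for item_id, cluster_number in clustering_map.items():
        contents[cluster_number].append(item_id)
    return dict(contents)


# Made-up example: four structures assigned to three clusters.
example_map = {1: 1, 2: 2, 3: 1, 4: 3}
print(contents_from_map(example_map))  # {1: [1, 3], 2: [2], 3: [4]}
```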
- getDistanceToCentroid(item)¶
For a given item in the most recent cluster grouping return the distance to the centroid of the cluster which contains this item
- getIsNearestToCentroid(item)¶
For a given item in the most recent cluster grouping return a boolean value which indicates whether the item is nearest the centroid
- getIsFarthestFromCentroid(item)¶
For a given item in the most recent cluster grouping return a boolean value which indicates whether the item is farthest from the centroid
- getMaxDistanceFromCentroid(item)¶
For a given item in the most recent cluster grouping return the maximum distance to the centroid for any item in the cluster
- getAverageDistanceFromCentroid(item)¶
For a given item in the most recent cluster grouping return the average distance to the centroid over all items in the cluster
- getClusterVariance(item)¶
For a given item return the variance of the cluster which that item belongs to.
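One use of the per-item centroid queries is picking a representative for each cluster. The sketch below relies only on the documented method names getClusterContents() and getIsNearestToCentroid(). `StubCluster` and its distances are hypothetical and exist only so the example runs on its own.

```python
def cluster_representatives(cluster):
    """Map each cluster number to the item(s) flagged nearest its centroid."""
    reps = {}
    for cluster_number, items in cluster.getClusterContents().items():
        reps[cluster_number] = [
            item for item in items if cluster.getIsNearestToCentroid(item)
        ]
    return reps


class StubCluster:
    """Hypothetical stand-in with made-up distances, for illustration only."""

    _contents = {1: ["a", "b"], 2: ["c"]}
    _distance = {"a": 0.10, "b": 0.25, "c": 0.00}

    def getClusterContents(self):
        return self._contents

    def getDistanceToCentroid(self, item):
        return self._distance[item]

    def getIsNearestToCentroid(self, item):
        # Find the cluster holding the item, then compare with its minimum.
        members = next(m for m in self._contents.values() if item in m)
        return self._distance[item] == min(self._distance[m] for m in members)


print(cluster_representatives(StubCluster()))  # {1: ['a'], 2: ['c']}
```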
- getBestNumberOfClusters()¶
The cluster statistics file contains information about each clustering level. This function returns the number of clusters at which the Kelley function has a minimum
- getNumberOfClustersList()¶
Returns the number of clusters at each level
- getRSquaredList()¶
Returns the R-squared value at each clustering level
- getSemiPartialRSquaredList()¶
Returns the semi-partial R-squared value at each clustering level
- getKelleyPenaltyList()¶
Returns the Kelley penalty value at each clustering level
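The selection described for getBestNumberOfClusters() amounts to finding the level with the smallest Kelley penalty. A minimal sketch, assuming the lists from getNumberOfClustersList() and getKelleyPenaltyList() are index-aligned (an assumption, not something the documentation states); the helper name and the values are made up.

```python
def best_number_of_clusters(num_clusters_list, kelley_penalty_list):
    """Return the cluster count whose Kelley penalty is smallest."""
    if len(num_clusters_list) != len(kelley_penalty_list):
        raise ValueError("lists must have one entry per clustering level")
    # Index of the minimum penalty; ties resolve to the first occurrence.
    best_index = min(range(len(kelley_penalty_list)),
                     key=kelley_penalty_list.__getitem__)
    return num_clusters_list[best_index]


levels = [1, 2, 3, 4, 5]
penalties = [9.0, 6.5, 4.2, 4.8, 7.1]  # made-up values
print(best_number_of_clusters(levels, penalties))  # 3
```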
- getMergeDistanceList()¶
Returns the merge distance value at each clustering level
- getSeparationRatioList()¶
Returns the separation ratio - calculated from the merge distance of
- getDendrogramData()¶
Returns a tuple with 1) a list of line positions, each in the form [x1,x2][y1,y2], where each entry defines a line segment to be plotted in a dendrogram; 2) a list of x-axis tick positions; 3) a list of x-axis tick labels
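The [x1,x2][y1,y2] layout groups coordinates by axis rather than by point. The sketch below converts such entries to endpoint pairs, assuming each entry is a pair of two-element sequences; that exact container layout is an assumption, and the data is made up.

```python
def segment_endpoints(lines):
    """Convert ([x1, x2], [y1, y2]) entries to ((x1, y1), (x2, y2)) pairs."""
    endpoints = []
    for xs, ys in lines:
        (x1, x2), (y1, y2) = xs, ys
        endpoints.append(((x1, y1), (x2, y2)))
    return endpoints


# Made-up data: two vertical stems joined by one horizontal bar.
lines = [
    ([1.0, 1.0], [0.0, 0.4]),
    ([1.0, 2.0], [0.4, 0.4]),
    ([2.0, 2.0], [0.4, 0.0]),
]
for start, end in segment_endpoints(lines):
    print(start, "->", end)
```

Plotting libraries that take separate x and y sequences per line can typically consume the axis-grouped form directly, so a conversion like this is only needed when endpoint pairs are more convenient.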
- getDistanceMatrixFile()¶
Returns the name of the distance matrix file used in the most recent clustering
- getClusterOrderMap(num_clusters)¶
Returns a dictionary where the keys are the item labels and the values represent the index each item would have in a grouping which places the items in cluster order
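To illustrate what "cluster order" means here, the sketch below derives such a map from a cluster-contents dictionary. It assumes zero-based indices, clusters visited in ascending number, and items kept in their listed order within a cluster. None of these details are confirmed by the documentation, and the helper name is hypothetical.

```python
def cluster_order_map(cluster_contents):
    """Assign each item label its position when items are listed cluster by cluster."""
    order = {}
    index = 0
    for cluster_number in sorted(cluster_contents):
        for label in cluster_contents[cluster_number]:
            order[label] = index
            index += 1
    return order


contents = {2: ["c"], 1: ["a", "d"], 3: ["b"]}  # made-up grouping
print(cluster_order_map(contents))  # {'a': 0, 'd': 1, 'c': 2, 'b': 3}
```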
- class schrodinger.application.canvas.cluster.CanvasFingerprintClusterCLI(logger)¶
Bases:
schrodinger.application.canvas.cluster.CanvasFingerprintCluster
A subclass of the Canvas fingerprint cluster manager intended for use from a program with a command-line interface. This class has methods for defining options in an option parser and for applying those options once they’ve been parsed. The idea is to provide a standard command-line interface for setting the clustering options.
- __init__(logger)¶
Initialize the instance of the cluster class
- addOptions(parser)¶
Add options for cluster linkage
- parseOptions(options)¶
Examine the options and set the internal state to reflect them
- getLinkageDescription()¶
Return a string which contains a description of the linkage methods available for cluster linkage
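The add-then-parse pattern described for this class can be sketched as follows. `StubClusterCLI` is a hypothetical stand-in, the `--linkage` option name and its default are invented for illustration, and the use of `argparse` is an assumption: the documentation does not say which parser library or option names the real class uses.

```python
import argparse

# Copied from the documented LINKAGE_TYPES attribute.
LINKAGE_TYPES = ['Single', 'Complete', 'Average', 'Centroid', 'McQuitty',
                 'Ward', 'Weighted Centroid', 'Flexible Beta', 'Schrodinger']


class StubClusterCLI:
    """Hypothetical stand-in mimicking the documented method names only."""

    def __init__(self):
        self.linkage = LINKAGE_TYPES[0]  # placeholder default

    def addOptions(self, parser):
        # '--linkage' is an invented option name, not the real one.
        parser.add_argument("--linkage", choices=LINKAGE_TYPES,
                            default=self.linkage, help="cluster linkage method")

    def parseOptions(self, options):
        self.linkage = options.linkage

    def getLinkageDescription(self):
        return "Available linkage methods: " + ", ".join(LINKAGE_TYPES)


def configure_from_argv(cli, argv):
    """Register the cluster options, parse argv, and apply the result."""
    parser = argparse.ArgumentParser(epilog=cli.getLinkageDescription())
    cli.addOptions(parser)
    options = parser.parse_args(argv)
    cli.parseOptions(options)
    return options


cli = StubClusterCLI()
configure_from_argv(cli, ["--linkage", "Ward"])
print(cli.linkage)  # Ward
```

Keeping option registration and option application on the cluster object lets several command-line tools share one set of clustering flags, which matches the stated purpose of the class.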