schrodinger.application.desmond.antlr3.tree module¶
@package antlr3.tree @brief ANTLR3 runtime package, tree module
This module contains all support classes for AST construction and tree parsers.
- exception schrodinger.application.desmond.antlr3.tree.RewriteCardinalityException(elementDescription)¶
Bases:
RuntimeError
Base class for all exceptions thrown during AST rewrite construction.
This signifies a case where the cardinality of two or more elements in a subrule are different:
(ID INT)+
where|ID|!=|INT|
- __init__(elementDescription)¶
- getMessage()¶
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception schrodinger.application.desmond.antlr3.tree.RewriteEarlyExitException(elementDescription=None)¶
Bases:
schrodinger.application.desmond.antlr3.tree.RewriteCardinalityException
@brief No elements within a (…)+ in a rewrite rule
- __init__(elementDescription=None)¶
- args¶
- getMessage()¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception schrodinger.application.desmond.antlr3.tree.RewriteEmptyStreamException(elementDescription)¶
Bases:
schrodinger.application.desmond.antlr3.tree.RewriteCardinalityException
@brief Ref to ID or expr but no tokens in ID stream or subtrees in expr stream
- __init__(elementDescription)¶
- args¶
- getMessage()¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- class schrodinger.application.desmond.antlr3.tree.Tree¶
Bases:
object
@brief Abstract baseclass for tree nodes.
What does a tree look like? ANTLR has a number of support classes such as CommonTreeNodeStream that work on these kinds of trees. You don’t have to make your trees implement this interface, but if you do, you’ll be able to use more support code.
NOTE: When constructing trees, ANTLR can build any kind of tree; it can even use Token objects as trees if you add a child list to your tokens.
This is a tree node without any payload; just navigation and factory stuff.
- getChild(i)¶
- getChildCount()¶
- getParent()¶
Tree tracks parent and child index now > 3.0
- setParent(t)¶
Tree tracks parent and child index now > 3.0
- hasAncestor(ttype)¶
Walk upwards looking for ancestor with this token type.
- getAncestor(ttype)¶
Walk upwards and get first ancestor with this token type.
- getAncestors()¶
Return a list of all ancestors of this node.
The first node of list is the root and the last is the parent of this node.
- getChildIndex()¶
This node is what child index? 0..n-1
- setChildIndex(index)¶
This node is what child index? 0..n-1
- freshenParentAndChildIndexes()¶
Set the parent and child index values for all children
- addChild(t)¶
Add t as a child to this node. If t is null, do nothing. If t is nil, add all children of t to this’ children.
- setChild(i, t)¶
Set ith child (0..n-1) to t; t must be non-null and non-nil node
- deleteChild(i)¶
- replaceChildren(startChildIndex, stopChildIndex, t)¶
Delete children from start to stop and replace with t even if t is a list (nil-root tree). num of children can increase or decrease. For huge child lists, inserting children can force walking rest of children to set their childindex; could be slow.
- isNil()¶
Indicates the node is a nil node but may still have children, meaning the tree is a flat list.
- getTokenStartIndex()¶
- What is the smallest token index (indexing from 0) for this node
and its children?
- setTokenStartIndex(index)¶
- getTokenStopIndex()¶
What is the largest token index (indexing from 0) for this node and its children?
- setTokenStopIndex(index)¶
- dupNode()¶
- getType()¶
Return a token type; needed for tree parsing.
- getText()¶
- getLine()¶
In case we don’t have a token payload, what is the line for errors?
- getCharPositionInLine()¶
- toStringTree()¶
- toString()¶
- class schrodinger.application.desmond.antlr3.tree.TreeAdaptor¶
Bases:
object
@brief Abstract baseclass for tree adaptors.
How to create and navigate trees. Rather than have a separate factory and adaptor, I’ve merged them. Makes sense to encapsulate.
This takes the place of the tree construction code generated in the generated code in 2.x and the ASTFactory.
I do not need to know the type of a tree at all so they are all generic Objects. This may increase the amount of typecasting needed. :(
- createWithPayload(payload)¶
Create a tree node from Token object; for CommonTree type trees, then the token just becomes the payload. This is the most common create call.
Override if you want another kind of node to be built.
- dupNode(treeNode)¶
Duplicate a single tree node.
Override if you want another kind of node to be built.
- dupTree(tree)¶
Duplicate tree recursively, using dupNode() for each node
- nil()¶
Return a nil node (an empty but non-null node) that can hold a list of element as the children. If you want a flat tree (a list) use “t=adaptor.nil(); t.addChild(x); t.addChild(y);”
- errorNode(input, start, stop, exc)¶
Return a tree node representing an error. This node records the tokens consumed during error recovery. The start token indicates the input symbol at which the error was detected. The stop token indicates the last symbol consumed during recovery.
You must specify the input stream so that the erroneous text can be packaged up in the error node. The exception could be useful to some applications; default implementation stores ptr to it in the CommonErrorNode.
This only makes sense during token parsing, not tree parsing. Tree parsing should happen only when parsing and tree construction succeed.
- isNil(tree)¶
Is tree considered a nil node used to make lists of child nodes?
- addChild(t, child)¶
Add a child to the tree t. If child is a flat tree (a list), make all in list children of t. Warning: if t has no children, but child does and child isNil then you can decide it is ok to move children to t via t.children = child.children; i.e., without copying the array. Just make sure that this is consistent with have the user will build ASTs. Do nothing if t or child is null.
- becomeRoot(newRoot, oldRoot)¶
If oldRoot is a nil root, just copy or move the children to newRoot. If not a nil root, make oldRoot a child of newRoot.
old=^(nil a b c), new=r yields ^(r a b c) old=^(a b c), new=r yields ^(r ^(a b c))
If newRoot is a nil-rooted single child tree, use the single child as the new root node.
old=^(nil a b c), new=^(nil r) yields ^(r a b c) old=^(a b c), new=^(nil r) yields ^(r ^(a b c))
If oldRoot was null, it’s ok, just return newRoot (even if isNil).
old=null, new=r yields r old=null, new=^(nil r) yields ^(nil r)
Return newRoot. Throw an exception if newRoot is not a simple node or nil root with a single child node–it must be a root node. If newRoot is ^(nil x) return x as newRoot.
Be advised that it’s ok for newRoot to point at oldRoot’s children; i.e., you don’t have to copy the list. We are constructing these nodes so we should have this control for efficiency.
- rulePostProcessing(root)¶
Given the root of the subtree created for this rule, post process it to do any simplifications or whatever you want. A required behavior is to convert ^(nil singleSubtree) to singleSubtree as the setting of start/stop indexes relies on a single non-nil root for non-flat trees.
Flat trees such as for lists like “idlist : ID+ ;” are left alone unless there is only one ID. For a list, the start/stop indexes are set in the nil node.
This method is executed after all rule tree construction and right before setTokenBoundaries().
- getUniqueID(node)¶
For identifying trees.
How to identify nodes so we can say “add node to a prior node”? Even becomeRoot is an issue. Use System.identityHashCode(node) usually.
- createFromToken(tokenType, fromToken, text=None)¶
Create a new node derived from a token, with a new token type and (optionally) new text.
This is invoked from an imaginary node ref on right side of a rewrite rule as IMAG[$tokenLabel] or IMAG[$tokenLabel “IMAG”].
This should invoke createToken(Token).
- createFromType(tokenType, text)¶
Create a new node derived from a token, with a new token type.
This is invoked from an imaginary node ref on right side of a rewrite rule as IMAG[“IMAG”].
This should invoke createToken(int,String).
- getType(t)¶
For tree parsing, I need to know the token type of a node
- setType(t, type)¶
Node constructors can set the type of a node
- getText(t)¶
- setText(t, text)¶
Node constructors can set the text of a node
- getToken(t)¶
Return the token object from which this node was created.
Currently used only for printing an error message. The error display routine in BaseRecognizer needs to display where the input the error occurred. If your tree of limitation does not store information that can lead you to the token, you can create a token filled with the appropriate information and pass that back. See BaseRecognizer.getErrorMessage().
- setTokenBoundaries(t, startToken, stopToken)¶
Where are the bounds in the input token stream for this node and all children? Each rule that creates AST nodes will call this method right before returning. Flat trees (i.e., lists) will still usually have a nil root node just to hold the children list. That node would contain the start/stop indexes then.
- getTokenStartIndex(t)¶
Get the token start index for this subtree; return -1 if no such index
- getTokenStopIndex(t)¶
Get the token stop index for this subtree; return -1 if no such index
- getChild(t, i)¶
Get a child 0..n-1 node
- setChild(t, i, child)¶
Set ith child (0..n-1) to t; t must be non-null and non-nil node
- deleteChild(t, i)¶
Remove ith child and shift children down from right.
- getChildCount(t)¶
How many children? If 0, then this is a leaf node
- getParent(t)¶
Who is the parent node of this node; if null, implies node is root. If your node type doesn’t handle this, it’s ok but the tree rewrites in tree parsers need this functionality.
- setParent(t, parent)¶
Who is the parent node of this node; if null, implies node is root. If your node type doesn’t handle this, it’s ok but the tree rewrites in tree parsers need this functionality.
- getChildIndex(t)¶
What index is this node in the child list? Range: 0..n-1 If your node type doesn’t handle this, it’s ok but the tree rewrites in tree parsers need this functionality.
- setChildIndex(t, index)¶
What index is this node in the child list? Range: 0..n-1 If your node type doesn’t handle this, it’s ok but the tree rewrites in tree parsers need this functionality.
- replaceChildren(parent, startChildIndex, stopChildIndex, t)¶
Replace from start to stop child index of parent with t, which might be a list. Number of children may be different after this call.
If parent is null, don’t do anything; must be at root of overall tree. Can’t replace whatever points to the parent externally. Do nothing.
- create(*args)¶
Deprecated, use createWithPayload, createFromToken or createFromType.
This method only exists to mimic the Java interface of TreeAdaptor.
- class schrodinger.application.desmond.antlr3.tree.BaseTree(node=None)¶
Bases:
schrodinger.application.desmond.antlr3.tree.Tree
@brief A generic tree implementation with no payload.
You must subclass to actually have any user data. ANTLR v3 uses a list of children approach instead of the child-sibling approach in v2. A flat tree (a list) is an empty node whose children represent the list. An empty, but non-null node is called “nil”.
- __init__(node=None)¶
Create a new node from an existing node does nothing for BaseTree as there are no fields other than the children list, which cannot be copied as the children are not considered part of this node.
- getChild(i)¶
- getChildren()¶
@brief Get the children internal List
Note that if you directly mess with the list, do so at your own risk.
- getFirstChildWithType(treeType)¶
- getChildCount()¶
- addChild(childTree)¶
Add t as child of this node.
Warning: if t has no children, but child does and child isNil then this routine moves children to t via t.children = child.children; i.e., without copying the array.
- addChildren(children)¶
Add all elements of kids list as children of this node
- setChild(i, t)¶
Set ith child (0..n-1) to t; t must be non-null and non-nil node
- deleteChild(i)¶
- replaceChildren(startChildIndex, stopChildIndex, newTree)¶
Delete children from start to stop and replace with t even if t is a list (nil-root tree). num of children can increase or decrease. For huge child lists, inserting children can force walking rest of children to set their childindex; could be slow.
- isNil()¶
Indicates the node is a nil node but may still have children, meaning the tree is a flat list.
- freshenParentAndChildIndexes(offset=0)¶
Set the parent and child index values for all children
- sanityCheckParentAndChildIndexes(parent=None, i=- 1)¶
- getChildIndex()¶
BaseTree doesn’t track child indexes.
- setChildIndex(index)¶
BaseTree doesn’t track child indexes.
- getParent()¶
BaseTree doesn’t track parent pointers.
- setParent(t)¶
BaseTree doesn’t track parent pointers.
- hasAncestor(ttype)¶
Walk upwards looking for ancestor with this token type.
- getAncestor(ttype)¶
Walk upwards and get first ancestor with this token type.
- getAncestors()¶
Return a list of all ancestors of this node.
The first node of list is the root and the last is the parent of this node.
- toStringTree()¶
Print out a whole tree not just a node
- getLine()¶
In case we don’t have a token payload, what is the line for errors?
- getCharPositionInLine()¶
- toString()¶
Override to say how a node (not a tree) should look as text
- dupNode()¶
- getText()¶
- getTokenStartIndex()¶
- What is the smallest token index (indexing from 0) for this node
and its children?
- getTokenStopIndex()¶
What is the largest token index (indexing from 0) for this node and its children?
- getType()¶
Return a token type; needed for tree parsing.
- setTokenStartIndex(index)¶
- setTokenStopIndex(index)¶
- class schrodinger.application.desmond.antlr3.tree.BaseTreeAdaptor¶
Bases:
schrodinger.application.desmond.antlr3.tree.TreeAdaptor
@brief A TreeAdaptor that works with any Tree implementation.
- nil()¶
Return a nil node (an empty but non-null node) that can hold a list of element as the children. If you want a flat tree (a list) use “t=adaptor.nil(); t.addChild(x); t.addChild(y);”
- errorNode(input, start, stop, exc)¶
create tree node that holds the start and stop tokens associated with an error.
If you specify your own kind of tree nodes, you will likely have to override this method. CommonTree returns Token.INVALID_TOKEN_TYPE if no token payload but you might have to set token type for diff node type.
You don’t have to subclass CommonErrorNode; you will likely need to subclass your own tree node class to avoid class cast exception.
- isNil(tree)¶
Is tree considered a nil node used to make lists of child nodes?
- dupTree(t, parent=None)¶
This is generic in the sense that it will work with any kind of tree (not just Tree interface). It invokes the adaptor routines not the tree node routines to do the construction.
- addChild(tree, child)¶
Add a child to the tree t. If child is a flat tree (a list), make all in list children of t. Warning: if t has no children, but child does and child isNil then you can decide it is ok to move children to t via t.children = child.children; i.e., without copying the array. Just make sure that this is consistent with have the user will build ASTs.
- becomeRoot(newRoot, oldRoot)¶
If oldRoot is a nil root, just copy or move the children to newRoot. If not a nil root, make oldRoot a child of newRoot.
old=^(nil a b c), new=r yields ^(r a b c) old=^(a b c), new=r yields ^(r ^(a b c))
If newRoot is a nil-rooted single child tree, use the single child as the new root node.
old=^(nil a b c), new=^(nil r) yields ^(r a b c) old=^(a b c), new=^(nil r) yields ^(r ^(a b c))
If oldRoot was null, it’s ok, just return newRoot (even if isNil).
old=null, new=r yields r old=null, new=^(nil r) yields ^(nil r)
Return newRoot. Throw an exception if newRoot is not a simple node or nil root with a single child node–it must be a root node. If newRoot is ^(nil x) return x as newRoot.
Be advised that it’s ok for newRoot to point at oldRoot’s children; i.e., you don’t have to copy the list. We are constructing these nodes so we should have this control for efficiency.
- rulePostProcessing(root)¶
Transform ^(nil x) to x and nil to null
- createFromToken(tokenType, fromToken, text=None)¶
Create a new node derived from a token, with a new token type and (optionally) new text.
This is invoked from an imaginary node ref on right side of a rewrite rule as IMAG[$tokenLabel] or IMAG[$tokenLabel “IMAG”].
This should invoke createToken(Token).
- createFromType(tokenType, text)¶
Create a new node derived from a token, with a new token type.
This is invoked from an imaginary node ref on right side of a rewrite rule as IMAG[“IMAG”].
This should invoke createToken(int,String).
- getType(t)¶
For tree parsing, I need to know the token type of a node
- setType(t, type)¶
Node constructors can set the type of a node
- getText(t)¶
- setText(t, text)¶
Node constructors can set the text of a node
- getChild(t, i)¶
Get a child 0..n-1 node
- setChild(t, i, child)¶
Set ith child (0..n-1) to t; t must be non-null and non-nil node
- deleteChild(t, i)¶
Remove ith child and shift children down from right.
- getChildCount(t)¶
How many children? If 0, then this is a leaf node
- getUniqueID(node)¶
For identifying trees.
How to identify nodes so we can say “add node to a prior node”? Even becomeRoot is an issue. Use System.identityHashCode(node) usually.
- createToken(fromToken=None, tokenType=None, text=None)¶
Tell me how to create a token for use with imaginary token nodes. For example, there is probably no input symbol associated with imaginary token DECL, but you need to create it as a payload or whatever for the DECL node as in ^(DECL type ID).
If you care what the token payload objects’ type is, you should override this method and any other createToken variant.
- create(*args)¶
Deprecated, use createWithPayload, createFromToken or createFromType.
This method only exists to mimic the Java interface of TreeAdaptor.
- createWithPayload(payload)¶
Create a tree node from Token object; for CommonTree type trees, then the token just becomes the payload. This is the most common create call.
Override if you want another kind of node to be built.
- dupNode(treeNode)¶
Duplicate a single tree node.
Override if you want another kind of node to be built.
- getChildIndex(t)¶
What index is this node in the child list? Range: 0..n-1 If your node type doesn’t handle this, it’s ok but the tree rewrites in tree parsers need this functionality.
- getParent(t)¶
Who is the parent node of this node; if null, implies node is root. If your node type doesn’t handle this, it’s ok but the tree rewrites in tree parsers need this functionality.
- getToken(t)¶
Return the token object from which this node was created.
Currently used only for printing an error message. The error display routine in BaseRecognizer needs to display where the input the error occurred. If your tree of limitation does not store information that can lead you to the token, you can create a token filled with the appropriate information and pass that back. See BaseRecognizer.getErrorMessage().
- getTokenStartIndex(t)¶
Get the token start index for this subtree; return -1 if no such index
- getTokenStopIndex(t)¶
Get the token stop index for this subtree; return -1 if no such index
- replaceChildren(parent, startChildIndex, stopChildIndex, t)¶
Replace from start to stop child index of parent with t, which might be a list. Number of children may be different after this call.
If parent is null, don’t do anything; must be at root of overall tree. Can’t replace whatever points to the parent externally. Do nothing.
- setChildIndex(t, index)¶
What index is this node in the child list? Range: 0..n-1 If your node type doesn’t handle this, it’s ok but the tree rewrites in tree parsers need this functionality.
- setParent(t, parent)¶
Who is the parent node of this node; if null, implies node is root. If your node type doesn’t handle this, it’s ok but the tree rewrites in tree parsers need this functionality.
- setTokenBoundaries(t, startToken, stopToken)¶
Where are the bounds in the input token stream for this node and all children? Each rule that creates AST nodes will call this method right before returning. Flat trees (i.e., lists) will still usually have a nil root node just to hold the children list. That node would contain the start/stop indexes then.
- class schrodinger.application.desmond.antlr3.tree.CommonTree(payload)¶
Bases:
schrodinger.application.desmond.antlr3.tree.BaseTree
@brief A tree node that is wrapper for a Token object.
After 3.0 release while building tree rewrite stuff, it became clear that computing parent and child index is very difficult and cumbersome. Better to spend the space in every tree node. If you don’t want these extra fields, it’s easy to cut them out in your own BaseTree subclass.
- __init__(payload)¶
Create a new node from an existing node does nothing for BaseTree as there are no fields other than the children list, which cannot be copied as the children are not considered part of this node.
- getToken()¶
- dupNode()¶
- isNil()¶
Indicates the node is a nil node but may still have children, meaning the tree is a flat list.
- getType()¶
Return a token type; needed for tree parsing.
- property type¶
- getText()¶
- property text¶
- getLine()¶
In case we don’t have a token payload, what is the line for errors?
- property line¶
- getCharPositionInLine()¶
- property charPositionInLine¶
- getTokenStartIndex()¶
- What is the smallest token index (indexing from 0) for this node
and its children?
- setTokenStartIndex(index)¶
- property tokenStartIndex¶
- getTokenStopIndex()¶
What is the largest token index (indexing from 0) for this node and its children?
- setTokenStopIndex(index)¶
- property tokenStopIndex¶
- setUnknownTokenBoundaries()¶
For every node in this subtree, make sure it’s start/stop token’s are set. Walk depth first, visit bottom up. Only updates nodes with at least one token index < 0.
- getChildIndex()¶
BaseTree doesn’t track child indexes.
- setChildIndex(idx)¶
BaseTree doesn’t track child indexes.
- getParent()¶
BaseTree doesn’t track parent pointers.
- setParent(t)¶
BaseTree doesn’t track parent pointers.
- toString()¶
Override to say how a node (not a tree) should look as text
- toStringTree()¶
Print out a whole tree not just a node
- addChild(childTree)¶
Add t as child of this node.
Warning: if t has no children, but child does and child isNil then this routine moves children to t via t.children = child.children; i.e., without copying the array.
- addChildren(children)¶
Add all elements of kids list as children of this node
- deleteChild(i)¶
- freshenParentAndChildIndexes(offset=0)¶
Set the parent and child index values for all children
- getAncestor(ttype)¶
Walk upwards and get first ancestor with this token type.
- getAncestors()¶
Return a list of all ancestors of this node.
The first node of list is the root and the last is the parent of this node.
- getChild(i)¶
- getChildCount()¶
- getChildren()¶
@brief Get the children internal List
Note that if you directly mess with the list, do so at your own risk.
- getFirstChildWithType(treeType)¶
- hasAncestor(ttype)¶
Walk upwards looking for ancestor with this token type.
- replaceChildren(startChildIndex, stopChildIndex, newTree)¶
Delete children from start to stop and replace with t even if t is a list (nil-root tree). num of children can increase or decrease. For huge child lists, inserting children can force walking rest of children to set their childindex; could be slow.
- sanityCheckParentAndChildIndexes(parent=None, i=- 1)¶
- setChild(i, t)¶
Set ith child (0..n-1) to t; t must be non-null and non-nil node
- class schrodinger.application.desmond.antlr3.tree.CommonErrorNode(input, start, stop, exc)¶
Bases:
schrodinger.application.desmond.antlr3.tree.CommonTree
A node representing erroneous token range in token stream
- __init__(input, start, stop, exc)¶
Create a new node from an existing node does nothing for BaseTree as there are no fields other than the children list, which cannot be copied as the children are not considered part of this node.
- isNil()¶
Indicates the node is a nil node but may still have children, meaning the tree is a flat list.
- getType()¶
Return a token type; needed for tree parsing.
- getText()¶
- toString()¶
Override to say how a node (not a tree) should look as text
- addChild(childTree)¶
Add t as child of this node.
Warning: if t has no children, but child does and child isNil then this routine moves children to t via t.children = child.children; i.e., without copying the array.
- addChildren(children)¶
Add all elements of kids list as children of this node
- property charPositionInLine¶
- deleteChild(i)¶
- dupNode()¶
- freshenParentAndChildIndexes(offset=0)¶
Set the parent and child index values for all children
- getAncestor(ttype)¶
Walk upwards and get first ancestor with this token type.
- getAncestors()¶
Return a list of all ancestors of this node.
The first node of list is the root and the last is the parent of this node.
- getCharPositionInLine()¶
- getChild(i)¶
- getChildCount()¶
- getChildIndex()¶
BaseTree doesn’t track child indexes.
- getChildren()¶
@brief Get the children internal List
Note that if you directly mess with the list, do so at your own risk.
- getFirstChildWithType(treeType)¶
- getLine()¶
In case we don’t have a token payload, what is the line for errors?
- getParent()¶
BaseTree doesn’t track parent pointers.
- getToken()¶
- getTokenStartIndex()¶
- What is the smallest token index (indexing from 0) for this node
and its children?
- getTokenStopIndex()¶
What is the largest token index (indexing from 0) for this node and its children?
- hasAncestor(ttype)¶
Walk upwards looking for ancestor with this token type.
- property line¶
- replaceChildren(startChildIndex, stopChildIndex, newTree)¶
Delete children from start to stop and replace with t even if t is a list (nil-root tree). num of children can increase or decrease. For huge child lists, inserting children can force walking rest of children to set their childindex; could be slow.
- sanityCheckParentAndChildIndexes(parent=None, i=- 1)¶
- setChild(i, t)¶
Set ith child (0..n-1) to t; t must be non-null and non-nil node
- setChildIndex(idx)¶
BaseTree doesn’t track child indexes.
- setParent(t)¶
BaseTree doesn’t track parent pointers.
- setTokenStartIndex(index)¶
- setTokenStopIndex(index)¶
- setUnknownTokenBoundaries()¶
For every node in this subtree, make sure it’s start/stop token’s are set. Walk depth first, visit bottom up. Only updates nodes with at least one token index < 0.
- property text¶
- toStringTree()¶
Print out a whole tree not just a node
- property tokenStartIndex¶
- property tokenStopIndex¶
- property type¶
- class schrodinger.application.desmond.antlr3.tree.CommonTreeAdaptor¶
Bases:
schrodinger.application.desmond.antlr3.tree.BaseTreeAdaptor
@brief A TreeAdaptor that works with any Tree implementation.
It provides really just factory methods; all the work is done by BaseTreeAdaptor. If you would like to have different tokens created than ClassicToken objects, you need to override this and then set the parser tree adaptor to use your subclass.
To get your parser to build nodes of a different type, override create(Token), errorNode(), and to be safe, YourTreeClass.dupNode(). dupNode is called to duplicate nodes during rewrite operations.
- dupNode(treeNode)¶
Duplicate a node. This is part of the factory; override if you want another kind of node to be built.
I could use reflection to prevent having to override this but reflection is slow.
- createWithPayload(payload)¶
Create a tree node from Token object; for CommonTree type trees, then the token just becomes the payload. This is the most common create call.
Override if you want another kind of node to be built.
- createToken(fromToken=None, tokenType=None, text=None)¶
Tell me how to create a token for use with imaginary token nodes. For example, there is probably no input symbol associated with imaginary token DECL, but you need to create it as a payload or whatever for the DECL node as in ^(DECL type ID).
If you care what the token payload objects’ type is, you should override this method and any other createToken variant.
- setTokenBoundaries(t, startToken, stopToken)¶
Track start/stop token for subtree root created for a rule. Only works with Tree nodes. For rules that match nothing, seems like this will yield start=i and stop=i-1 in a nil node. Might be useful info so I’ll not force to be i..i.
- getTokenStartIndex(t)¶
Get the token start index for this subtree; return -1 if no such index
- getTokenStopIndex(t)¶
Get the token stop index for this subtree; return -1 if no such index
- getText(t)¶
- getType(t)¶
For tree parsing, I need to know the token type of a node
- getToken(t)¶
What is the Token associated with this node? If you are not using CommonTree, then you must override this in your own adaptor.
- getChild(t, i)¶
Get a child 0..n-1 node
- getChildCount(t)¶
How many children? If 0, then this is a leaf node
- getParent(t)¶
Who is the parent node of this node; if null, implies node is root. If your node type doesn’t handle this, it’s ok but the tree rewrites in tree parsers need this functionality.
- setParent(t, parent)¶
Who is the parent node of this node; if null, implies node is root. If your node type doesn’t handle this, it’s ok but the tree rewrites in tree parsers need this functionality.
- getChildIndex(t)¶
What index is this node in the child list? Range: 0..n-1 If your node type doesn’t handle this, it’s ok but the tree rewrites in tree parsers need this functionality.
- setChildIndex(t, index)¶
What index is this node in the child list? Range: 0..n-1 If your node type doesn’t handle this, it’s ok but the tree rewrites in tree parsers need this functionality.
- replaceChildren(parent, startChildIndex, stopChildIndex, t)¶
Replace from start to stop child index of parent with t, which might be a list. Number of children may be different after this call.
If parent is null, don’t do anything; must be at root of overall tree. Can’t replace whatever points to the parent externally. Do nothing.
- addChild(tree, child)¶
Add a child to the tree t. If child is a flat tree (a list), make all in list children of t. Warning: if t has no children, but child does and child isNil then you can decide it is ok to move children to t via t.children = child.children; i.e., without copying the array. Just make sure that this is consistent with have the user will build ASTs.
- becomeRoot(newRoot, oldRoot)¶
If oldRoot is a nil root, just copy or move the children to newRoot. If not a nil root, make oldRoot a child of newRoot.
old=^(nil a b c), new=r yields ^(r a b c) old=^(a b c), new=r yields ^(r ^(a b c))
If newRoot is a nil-rooted single child tree, use the single child as the new root node.
old=^(nil a b c), new=^(nil r) yields ^(r a b c) old=^(a b c), new=^(nil r) yields ^(r ^(a b c))
If oldRoot was null, it’s ok, just return newRoot (even if isNil).
old=null, new=r yields r old=null, new=^(nil r) yields ^(nil r)
Return newRoot. Throw an exception if newRoot is not a simple node or nil root with a single child node–it must be a root node. If newRoot is ^(nil x) return x as newRoot.
Be advised that it’s ok for newRoot to point at oldRoot’s children; i.e., you don’t have to copy the list. We are constructing these nodes so we should have this control for efficiency.
- create(*args)¶
Deprecated, use createWithPayload, createFromToken or createFromType.
This method only exists to mimic the Java interface of TreeAdaptor.
- createFromToken(tokenType, fromToken, text=None)¶
Create a new node derived from a token, with a new token type and (optionally) new text.
This is invoked from an imaginary node ref on right side of a rewrite rule as IMAG[$tokenLabel] or IMAG[$tokenLabel “IMAG”].
This should invoke createToken(Token).
- createFromType(tokenType, text)¶
Create a new node derived from a token, with a new token type.
This is invoked from an imaginary node ref on right side of a rewrite rule as IMAG[“IMAG”].
This should invoke createToken(int,String).
- deleteChild(t, i)¶
Remove ith child and shift children down from right.
- dupTree(t, parent=None)¶
This is generic in the sense that it will work with any kind of tree (not just Tree interface). It invokes the adaptor routines not the tree node routines to do the construction.
- errorNode(input, start, stop, exc)¶
create tree node that holds the start and stop tokens associated with an error.
If you specify your own kind of tree nodes, you will likely have to override this method. CommonTree returns Token.INVALID_TOKEN_TYPE if no token payload but you might have to set token type for diff node type.
You don’t have to subclass CommonErrorNode; you will likely need to subclass your own tree node class to avoid class cast exception.
- getUniqueID(node)¶
For identifying trees.
How to identify nodes so we can say “add node to a prior node”? Even becomeRoot is an issue. Use System.identityHashCode(node) usually.
- isNil(tree)¶
Is tree considered a nil node used to make lists of child nodes?
- nil()¶
Return a nil node (an empty but non-null node) that can hold a list of element as the children. If you want a flat tree (a list) use “t=adaptor.nil(); t.addChild(x); t.addChild(y);”
- rulePostProcessing(root)¶
Transform ^(nil x) to x and nil to null
- setChild(t, i, child)¶
Set ith child (0..n-1) to t; t must be non-null and non-nil node
- setText(t, text)¶
Node constructors can set the text of a node
- setType(t, type)¶
Node constructors can set the type of a node
- class schrodinger.application.desmond.antlr3.tree.TreeNodeStream¶
Bases:
schrodinger.application.desmond.antlr3.streams.IntStream
@brief A stream of tree nodes
It accessing nodes from a tree of some kind.
- get(i)¶
Get a tree node at an absolute index i; 0..n-1. If you don’t want to buffer up nodes, then this method makes no sense for you.
- LT(k)¶
Get tree node at current input pointer + i ahead where i=1 is next node. i<0 indicates nodes in the past. So LT(-1) is previous node, but implementations are not required to provide results for k < -1. LT(0) is undefined. For i>=n, return null. Return null for LT(0) and any index that results in an absolute address that is negative.
This is analogus to the LT() method of the TokenStream, but this returns a tree node instead of a token. Makes code gen identical for both parser and tree grammars. :)
- getTreeSource()¶
Where is this stream pulling nodes from? This is not the name, but the object that provides node objects.
- getTokenStream()¶
If the tree associated with this stream was created from a TokenStream, you can specify it here. Used to do rule $text attribute in tree parser. Optional unless you use tree parser rule text attribute or output=template and rewrite=true options.
- getTreeAdaptor()¶
What adaptor can tell me how to interpret/navigate nodes and trees. E.g., get text of a node.
As we flatten the tree, we use UP, DOWN nodes to represent the tree structure. When debugging we need unique nodes so we have to instantiate new ones. When doing normal tree parsing, it’s slow and a waste of memory to create unique navigation nodes. Default should be false;
- toString(start, stop)¶
Return the text of all nodes from start to stop, inclusive. If the stream does not buffer all the nodes then it can still walk recursively from start until stop. You can always return null or “” too, but users should not access $ruleLabel.text in an action of course in that case.
- replaceChildren(parent, startChildIndex, stopChildIndex, t)¶
Replace from start to stop child index of parent with t, which might be a list. Number of children may be different after this call. The stream is notified because it is walking the tree and might need to know you are monkeying with the underlying tree. Also, it might be able to modify the node stream to avoid restreaming for future phases.
If parent is null, don’t do anything; must be at root of overall tree. Can’t replace whatever points to the parent externally. Do nothing.
- LA(i)¶
Get int at current input pointer + i ahead where i=1 is next int.
Negative indexes are allowed. LA(-1) is previous token (token just matched). LA(-i) where i is before first token should yield -1, invalid char / EOF.
- consume()¶
- getSourceName()¶
Where are you getting symbols from? Normally, implementations will pass the buck all the way to the lexer who can ask its input stream for the file name or whatever.
- index()¶
Return the current input symbol index 0..n where n indicates the last symbol has been read. The index is the symbol about to be read not the most recently read symbol.
- mark()¶
Tell the stream to start buffering if it hasn’t already. Return current input position, index(), or some other marker so that when passed to rewind() you get back to the same spot. rewind(mark()) should not affect the input cursor. The Lexer track line/col info as well as input index so its markers are not pure input indexes. Same for tree node streams.
- release(marker=None)¶
You may want to commit to a backtrack but don’t want to force the stream to keep bookkeeping objects around for a marker that is no longer necessary. This will have the same behavior as rewind() except it releases resources without the backward seek. This must throw away resources for all markers back to the marker argument. So if you’re nested 5 levels of mark(), and then release(2) you have to release resources for depths 2..5.
- rewind(marker=None)¶
Reset the stream so that next call to index would return marker. The marker will usually be index() but it doesn’t have to be. It’s just a marker to indicate what state the stream was in. This is essentially calling release() and seek(). If there are markers created after this marker argument, this routine must unroll them like a stack. Assume the state the stream was in when this marker was created.
If marker is None: Rewind to the input position of the last marker. Used currently only after a cyclic DFA and just before starting a sem/syn predicate to get the input position back to the start of the decision. Do not “pop” the marker off the state. mark(i) and rewind(i) should balance still. It is like invoking rewind(last marker) but it should not “pop” the marker off. It’s like seek(last marker’s input position).
- seek(index)¶
Set the input cursor to the position indicated by index. This is normally used to seek ahead in the input stream. No buffering is required to do this unless you know your stream will use seek to move backwards such as when backtracking.
This is different from rewind in its multi-directional requirement and in that its argument is strictly an input cursor (index).
For char streams, seeking forward must update the stream state such as line number. For seeking backwards, you will be presumably backtracking using the mark/rewind mechanism that restores state and so this method does not need to update state when seeking backwards.
Currently, this method is only used for efficient backtracking using memoization, but in the future it may be used for incremental parsing.
The index is 0..n-1. A seek to position i means that LA(1) will return the ith symbol. So, seeking to 0 means LA(1) will return the first element in the stream.
- size()¶
Only makes sense for streams that buffer everything up probably, but might be useful to display the entire stream or for testing. This value includes a single EOF.
- class schrodinger.application.desmond.antlr3.tree.CommonTreeNodeStream(*args)¶
Bases:
schrodinger.application.desmond.antlr3.tree.TreeNodeStream
@brief A buffered stream of tree nodes.
Nodes can be from a tree of ANY kind.
This node stream sucks all nodes out of the tree specified in the constructor during construction and makes pointers into the tree using an array of Object pointers. The stream necessarily includes pointers to DOWN and UP and EOF nodes.
This stream knows how to mark/release for backtracking.
This stream is most suitable for tree interpreters that need to jump around a lot or for tree parsers requiring speed (at cost of memory). There is some duplicated functionality here with UnBufferedTreeNodeStream but just in bookkeeping, not tree walking etc…
:see UnBufferedTreeNodeStream
- __init__(*args)¶
- fillBuffer()¶
Walk tree with depth-first-search and fill nodes buffer. Don’t do DOWN, UP nodes if its a list (t is isNil).
- getNodeIndex(node)¶
What is the stream index for node? 0..n-1 Return -1 if node not found.
As we flatten the tree, we use UP, DOWN nodes to represent the tree structure. When debugging we need unique nodes so instantiate new ones when uniqueNavigationNodes is true.
- get(i)¶
Get a tree node at an absolute index i; 0..n-1. If you don’t want to buffer up nodes, then this method makes no sense for you.
- LT(k)¶
Get tree node at current input pointer + i ahead where i=1 is next node. i<0 indicates nodes in the past. So LT(-1) is previous node, but implementations are not required to provide results for k < -1. LT(0) is undefined. For i>=n, return null. Return null for LT(0) and any index that results in an absolute address that is negative.
This is analogus to the LT() method of the TokenStream, but this returns a tree node instead of a token. Makes code gen identical for both parser and tree grammars. :)
- getCurrentSymbol()¶
- LB(k)¶
Look backwards k nodes
- getTreeSource()¶
Where is this stream pulling nodes from? This is not the name, but the object that provides node objects.
- getSourceName()¶
Where are you getting symbols from? Normally, implementations will pass the buck all the way to the lexer who can ask its input stream for the file name or whatever.
- getTokenStream()¶
If the tree associated with this stream was created from a TokenStream, you can specify it here. Used to do rule $text attribute in tree parser. Optional unless you use tree parser rule text attribute or output=template and rewrite=true options.
- setTokenStream(tokens)¶
- getTreeAdaptor()¶
What adaptor can tell me how to interpret/navigate nodes and trees. E.g., get text of a node.
As we flatten the tree, we use UP, DOWN nodes to represent the tree structure. When debugging we need unique nodes so we have to instantiate new ones. When doing normal tree parsing, it’s slow and a waste of memory to create unique navigation nodes. Default should be false;
- consume()¶
- LA(i)¶
Get int at current input pointer + i ahead where i=1 is next int.
Negative indexes are allowed. LA(-1) is previous token (token just matched). LA(-i) where i is before first token should yield -1, invalid char / EOF.
- mark()¶
Tell the stream to start buffering if it hasn’t already. Return current input position, index(), or some other marker so that when passed to rewind() you get back to the same spot. rewind(mark()) should not affect the input cursor. The Lexer track line/col info as well as input index so its markers are not pure input indexes. Same for tree node streams.
- release(marker=None)¶
You may want to commit to a backtrack but don’t want to force the stream to keep bookkeeping objects around for a marker that is no longer necessary. This will have the same behavior as rewind() except it releases resources without the backward seek. This must throw away resources for all markers back to the marker argument. So if you’re nested 5 levels of mark(), and then release(2) you have to release resources for depths 2..5.
- index()¶
Return the current input symbol index 0..n where n indicates the last symbol has been read. The index is the symbol about to be read not the most recently read symbol.
- rewind(marker=None)¶
Reset the stream so that next call to index would return marker. The marker will usually be index() but it doesn’t have to be. It’s just a marker to indicate what state the stream was in. This is essentially calling release() and seek(). If there are markers created after this marker argument, this routine must unroll them like a stack. Assume the state the stream was in when this marker was created.
If marker is None: Rewind to the input position of the last marker. Used currently only after a cyclic DFA and just before starting a sem/syn predicate to get the input position back to the start of the decision. Do not “pop” the marker off the state. mark(i) and rewind(i) should balance still. It is like invoking rewind(last marker) but it should not “pop” the marker off. It’s like seek(last marker’s input position).
- seek(index)¶
Set the input cursor to the position indicated by index. This is normally used to seek ahead in the input stream. No buffering is required to do this unless you know your stream will use seek to move backwards such as when backtracking.
This is different from rewind in its multi-directional requirement and in that its argument is strictly an input cursor (index).
For char streams, seeking forward must update the stream state such as line number. For seeking backwards, you will be presumably backtracking using the mark/rewind mechanism that restores state and so this method does not need to update state when seeking backwards.
Currently, this method is only used for efficient backtracking using memoization, but in the future it may be used for incremental parsing.
The index is 0..n-1. A seek to position i means that LA(1) will return the ith symbol. So, seeking to 0 means LA(1) will return the first element in the stream.
- push(index)¶
Make stream jump to a new location, saving old location. Switch back with pop().
- pop()¶
Seek back to previous index saved during last push() call. Return top of stack (return index).
- reset()¶
- size()¶
Only makes sense for streams that buffer everything up probably, but might be useful to display the entire stream or for testing. This value includes a single EOF.
- replaceChildren(parent, startChildIndex, stopChildIndex, t)¶
Replace from start to stop child index of parent with t, which might be a list. Number of children may be different after this call. The stream is notified because it is walking the tree and might need to know you are monkeying with the underlying tree. Also, it might be able to modify the node stream to avoid restreaming for future phases.
If parent is null, don’t do anything; must be at root of overall tree. Can’t replace whatever points to the parent externally. Do nothing.
- toString(start, stop)¶
Return the text of all nodes from start to stop, inclusive. If the stream does not buffer all the nodes then it can still walk recursively from start until stop. You can always return null or “” too, but users should not access $ruleLabel.text in an action of course in that case.
- class schrodinger.application.desmond.antlr3.tree.TreeParser(input, state=None)¶
Bases:
schrodinger.application.desmond.antlr3.recognizers.BaseRecognizer
@brief Baseclass for generated tree parsers.
A parser for a stream of tree nodes. “tree grammars” result in a subclass of this. All the error reporting and recovery is shared with Parser via the BaseRecognizer superclass.
- __init__(input, state=None)¶
- reset()¶
reset the parser’s state; subclasses must rewinds the input stream
- setTreeNodeStream(input)¶
Set the input stream
- getTreeNodeStream()¶
- getSourceName()¶
- getCurrentInputSymbol(input)¶
Match needs to return the current input symbol, which gets put into the label for the associated token ref; e.g., x=ID. Token and tree parsers need to return different objects. Rather than test for input stream type or change the IntStream interface, I use a simple method to ask the recognizer to tell me what the current input symbol is.
This is ignored for lexers.
- getMissingSymbol(input, e, expectedTokenType, follow)¶
Conjure up a missing token during error recovery.
The recognizer attempts to recover from single missing symbols. But, actions might refer to that missing symbol. For example, x=ID {f($x);}. The action clearly assumes that there has been an identifier matched previously and that $x points at that token. If that token is missing, but the next token in the stream is what we want we assume that this token is missing and we keep going. Because we have to return some token to replace the missing token, we have to conjure one up. This method gives the user control over the tokens returned for missing tokens. Mostly, you will want to create something special for identifier tokens. For literals such as ‘{’ and ‘,’, the default action in the parser or tree parser works. It simply creates a CommonToken of the appropriate type. The text will be the token. If you change what tokens must be created by the lexer, override this method to create the appropriate tokens.
- dotdot = '.*[^.]\\.\\.[^.].*'¶
- doubleEtc = '.*\\.\\.\\.\\s+\\.\\.\\..*'¶
- dotdotPattern = re.compile('.*[^.]\\.\\.[^.].*')¶
- doubleEtcPattern = re.compile('.*\\.\\.\\.\\s+\\.\\.\\..*')¶
- inContext(context, adaptor=None, tokenName=None, t=None)¶
Check if current node in input has a context.
Context means sequence of nodes towards root of tree. For example, you might say context is “MULT” which means my parent must be MULT. “CLASS VARDEF” says current node must be child of a VARDEF and whose parent is a CLASS node. You can use “…” to mean zero-or-more nodes. “METHOD … VARDEF” means my parent is VARDEF and somewhere above that is a METHOD node. The first node in the context is not necessarily the root. The context matcher stops matching and returns true when it runs out of context. There is no way to force the first node to be the root.
- matchAny(ignore)¶
Match ‘.’ in tree parser has special meaning. Skip node or entire tree if node has children. If children, scan until corresponding UP node.
- mismatch(input, ttype, follow)¶
We have DOWN/UP nodes in the stream that have no line info; override. plus we want to alter the exception type. Don’t try to recover from tree parser errors inline…
- getErrorHeader(e)¶
Prefix error message with the grammar name because message is always intended for the programmer because the parser built the input tree not the user.
- getErrorMessage(e, tokenNames)¶
Tree parsers parse nodes they usually have a token object as payload. Set the exception token and do the default behavior.
- traceIn(ruleName, ruleIndex)¶
- traceOut(ruleName, ruleIndex)¶
- DEFAULT_TOKEN_CHANNEL = 0¶
- HIDDEN = 99¶
- MEMO_RULE_FAILED = -2¶
- MEMO_RULE_UNKNOWN = -1¶
- alreadyParsedRule(input, ruleIndex)¶
Has this rule already parsed input at the current index in the input stream? Return the stop token index or MEMO_RULE_UNKNOWN. If we attempted but failed to parse properly before, return MEMO_RULE_FAILED.
This method has a side-effect: if we have seen this input for this rule and successfully parsed before, then seek ahead to 1 past the stop token matched for this rule last time.
- antlr_version = (3, 0, 1, 0)¶
- antlr_version_str = '3.0.1'¶
- beginResync()¶
A hook to listen in on the token consumption during error recovery. The DebugParser subclasses this to fire events to the listenter.
- combineFollows(exact)¶
- computeContextSensitiveRuleFOLLOW()¶
Compute the context-sensitive FOLLOW set for current rule. This is set of token types that can follow a specific rule reference given a specific call chain. You get the set of viable tokens that can possibly come next (lookahead depth 1) given the current call chain. Contrast this with the definition of plain FOLLOW for rule r:
FOLLOW(r)={x | S=>*alpha r beta in G and x in FIRST(beta)}
where x in T* and alpha, beta in V*; T is set of terminals and V is the set of terminals and nonterminals. In other words, FOLLOW(r) is the set of all tokens that can possibly follow references to r in any sentential form (context). At runtime, however, we know precisely which context applies as we have the call chain. We may compute the exact (rather than covering superset) set of following tokens.
For example, consider grammar:
stat : ID '=' expr ';' // FOLLOW(stat)=={EOF} | "return" expr '.' ; expr : atom ('+' atom)* ; // FOLLOW(expr)=={';','.',')'} atom : INT // FOLLOW(atom)=={'+',')',';','.'} | '(' expr ')' ;
The FOLLOW sets are all inclusive whereas context-sensitive FOLLOW sets are precisely what could follow a rule reference. For input input “i=(3);”, here is the derivation:
- stat => ID ‘=’ expr ‘;’
=> ID ‘=’ atom (‘+’ atom)* ‘;’ => ID ‘=’ ‘(’ expr ‘)’ (‘+’ atom)* ‘;’ => ID ‘=’ ‘(’ atom ‘)’ (‘+’ atom)* ‘;’ => ID ‘=’ ‘(’ INT ‘)’ (‘+’ atom)* ‘;’ => ID ‘=’ ‘(’ INT ‘)’ ‘;’
At the “3” token, you’d have a call chain of
stat -> expr -> atom -> expr -> atom
What can follow that specific nested ref to atom? Exactly ‘)’ as you can see by looking at the derivation of this specific input. Contrast this with the FOLLOW(atom)={‘+’,’)’,’;’,’.’}.
You want the exact viable token set when recovering from a token mismatch. Upon token mismatch, if LA(1) is member of the viable next token set, then you know there is most likely a missing token in the input stream. “Insert” one by just not throwing an exception.
- computeErrorRecoverySet()¶
Compute the error recovery set for the current rule. During rule invocation, the parser pushes the set of tokens that can follow that rule reference on the stack; this amounts to computing FIRST of what follows the rule reference in the enclosing rule. This local follow set only includes tokens from within the rule; i.e., the FIRST computation done by ANTLR stops at the end of a rule.
EXAMPLE
When you find a “no viable alt exception”, the input is not consistent with any of the alternatives for rule r. The best thing to do is to consume tokens until you see something that can legally follow a call to r or any rule that called r. You don’t want the exact set of viable next tokens because the input might just be missing a token–you might consume the rest of the input looking for one of the missing tokens.
Consider grammar:
a : '[' b ']' | '(' b ')' ; b : c '^' INT ; c : ID | INT ;
At each rule invocation, the set of tokens that could follow that rule is pushed on a stack. Here are the various “local” follow sets:
FOLLOW(b1_in_a) = FIRST(‘]’) = ‘]’ FOLLOW(b2_in_a) = FIRST(‘)’) = ‘)’ FOLLOW(c_in_b) = FIRST(‘^’) = ‘^’
Upon erroneous input “[]”, the call chain is
a -> b -> c
and, hence, the follow context stack is:
- depth local follow set after call to rule
0 <EOF> a (from main()) 1 ‘]’ b 3 ‘^’ c
Notice that ‘)’ is not included, because b would have to have been called from a different context in rule a for ‘)’ to be included.
For error recovery, we cannot consider FOLLOW(c) (context-sensitive or otherwise). We need the combined set of all context-sensitive FOLLOW sets–the set of all tokens that could follow any reference in the call chain. We need to resync to one of those tokens. Note that FOLLOW(c)=’^’ and if we resync’d to that token, we’d consume until EOF. We need to sync to context-sensitive FOLLOWs for a, b, and c: {‘]’,’^’}. In this case, for input “[]”, LA(1) is in this set so we would not consume anything and after printing an error rule c would return normally. It would not find the required ‘^’ though. At this point, it gets a mismatched token error and throws an exception (since LA(1) is not in the viable following token set). The rule exception handler tries to recover, but finds the same recovery set and doesn’t consume anything. Rule b exits normally returning to rule a. Now it finds the ‘]’ (and with the successful match exits errorRecovery mode).
So, you cna see that the parser walks up call chain looking for the token that was a member of the recovery set.
Errors are not generated in errorRecovery mode.
ANTLR’s error recovery mechanism is based upon original ideas:
“Algorithms + Data Structures = Programs” by Niklaus Wirth
and
“A note on error recovery in recursive descent parsers”: http://portal.acm.org/citation.cfm?id=947902.947905
Later, Josef Grosch had some good ideas:
“Efficient and Comfortable Error Recovery in Recursive Descent Parsers”: ftp://www.cocolab.com/products/cocktail/doca4.ps/ell.ps.zip
Like Grosch I implemented local FOLLOW sets that are combined at run-time upon error to avoid overhead during parsing.
- consumeUntil(input, tokenTypes)¶
Consume tokens until one matches the given token or token set
tokenTypes can be a single token type or a set of token types
- displayRecognitionError(tokenNames, e)¶
- emitErrorMessage(msg)¶
Override this method to change where error messages go
- endResync()¶
A hook to listen in on the token consumption during error recovery. The DebugParser subclasses this to fire events to the listenter.
- failed()¶
Return whether or not a backtracking attempt failed.
- getBacktrackingLevel()¶
- getGrammarFileName()¶
For debugging and other purposes, might want the grammar name.
Have ANTLR generate an implementation for this method.
- getNumberOfSyntaxErrors()¶
Get number of recognition errors (lexer, parser, tree parser). Each recognizer tracks its own number. So parser and lexer each have separate count. Does not count the spurious errors found between an error and next valid token match
See also reportError()
- getRuleInvocationStack()¶
Return List<String> of the rules in your parser instance leading up to a call to this method. You could override if you want more details such as the file/line info of where in the parser java code a rule is invoked.
This is very useful for error messages and for context-sensitive error recovery.
You must be careful, if you subclass a generated recognizers. The default implementation will only search the module of self for rules, but the subclass will not contain any rules. You probably want to override this method to look like
- def getRuleInvocationStack(self):
return self._getRuleInvocationStack(<class>.__module__)
where <class> is the class of the generated recognizer, e.g. the superclass of self.
- getRuleMemoization(ruleIndex, ruleStartIndex)¶
Given a rule number and a start token index number, return MEMO_RULE_UNKNOWN if the rule has not parsed input starting from start index. If this rule has parsed input starting from the start index before, then return where the rule stopped parsing. It returns the index of the last token matched by the rule.
- getTokenErrorDisplay(t)¶
How should a token be displayed in an error message? The default is to display just the text, but during development you might want to have a lot of information spit out. Override in that case to use t.toString() (which, for CommonToken, dumps everything about the token). This is better than forcing you to override a method in your token objects because you don’t have to go modify your lexer so that it creates a new Java type.
- match(input, ttype, follow)¶
Match current input symbol against ttype. Attempt single token insertion or deletion error recovery. If that fails, throw MismatchedTokenException.
To turn off single token insertion or deletion error recovery, override recoverFromMismatchedToken() and have it throw an exception. See TreeParser.recoverFromMismatchedToken(). This way any error in a rule will cause an exception and immediate exit from rule. Rule would recover by resynchronizing to the set of symbols that can follow rule ref.
- memoize(input, ruleIndex, ruleStartIndex, success)¶
Record whether or not this rule parsed the input at this position successfully.
- mismatchIsMissingToken(input, follow)¶
- mismatchIsUnwantedToken(input, ttype)¶
- recover(input, re)¶
Recover from an error found on the input stream. This is for NoViableAlt and mismatched symbol exceptions. If you enable single token insertion and deletion, this will usually not handle mismatched symbol exceptions but there could be a mismatched token that the match() routine could not recover from.
- recoverFromMismatchedSet(input, e, follow)¶
Not currently used
- recoverFromMismatchedToken(input, ttype, follow)¶
Attempt to recover from a single missing or extra token.
EXTRA TOKEN
LA(1) is not what we are looking for. If LA(2) has the right token, however, then assume LA(1) is some extra spurious token. Delete it and LA(2) as if we were doing a normal match(), which advances the input.
MISSING TOKEN
If current token is consistent with what could come after ttype then it is ok to ‘insert’ the missing token, else throw exception For example, Input ‘i=(3;’ is clearly missing the ‘)’. When the parser returns from the nested call to expr, it will have call chain:
stat -> expr -> atom
and it will be trying to match the ‘)’ at this point in the derivation:
=> ID '=' '(' INT ')' ('+' atom)* ';' ^
match() will see that
';'
doesn’t match')'
and report a mismatched token error. To recover, it sees thatLA(1)==';'
is in the set of tokens that can follow the')'
token reference in rule atom. It can assume that you forgot the')'.
- reportError(e)¶
Report a recognition problem.
This method sets errorRecovery to indicate the parser is recovering not parsing. Once in recovery mode, no errors are generated. To get out of recovery mode, the parser must successfully match a token (after a resync). So it will go:
error occurs
enter recovery mode, report error
consume until token found in resynch set
try to resume parsing
next match() will reset errorRecovery mode
If you override, make sure to update syntaxErrors if you care about that.
- setBacktrackingLevel(n)¶
- setInput(input)¶
- toStrings(tokens)¶
A convenience method for use most often with template rewrites.
Convert a List<Token> to List<String>
- tokenNames = None¶
- class schrodinger.application.desmond.antlr3.tree.TreeVisitor(adaptor=None)¶
Bases:
object
Do a depth first walk of a tree, applying pre() and post() actions we go.
- __init__(adaptor=None)¶
- visit(t, pre_action=None, post_action=None)¶
Visit every node in tree t and trigger an action for each node before/after having visited all of its children. Bottom up walk. Execute both actions even if t has no children. Ignore return results from transforming children since they will have altered the child list of this node (their parent). Return result of applying post action to this node.
The Python version differs from the Java version by taking two callables ‘pre_action’ and ‘post_action’ instead of a class instance that wraps those methods. Those callables must accept a TreeNode as their single argument and return the (potentially transformed or replaced) TreeNode.
- class schrodinger.application.desmond.antlr3.tree.RewriteRuleElementStream(adaptor, elementDescription, elements=None)¶
Bases:
object
@brief Internal helper class.
A generic list of elements tracked in an alternative to be used in a -> rewrite rule. We need to subclass to fill in the next() method, which returns either an AST node wrapped around a token payload or an existing subtree.
Once you start next()ing, do not try to add more elements. It will break the cursor tracking I believe.
:see org.antlr.runtime.tree.RewriteRuleSubtreeStream :see org.antlr.runtime.tree.RewriteRuleTokenStream
TODO: add mechanism to detect/puke on modification after reading from stream
- __init__(adaptor, elementDescription, elements=None)¶
- reset()¶
Reset the condition of this stream so that it appears we have not consumed any of its elements. Elements themselves are untouched. Once we reset the stream, any future use will need duplicates. Set the dirty bit.
- add(el)¶
- nextTree()¶
Return the next element in the stream. If out of elements, throw an exception unless size()==1. If size is 1, then return elements[0].
Return a duplicate node/subtree if stream is out of elements and size==1. If we’ve already used the element, dup (dirty bit set).
- dup(el)¶
When constructing trees, sometimes we need to dup a token or AST subtree. Dup’ing a token means just creating another AST node around it. For trees, you must call the adaptor.dupTree() unless the element is for a tree root; then it must be a node dup.
- toTree(el)¶
Ensure stream emits trees; tokens must be converted to AST nodes. AST nodes can be passed through unmolested.
- hasNext()¶
- size()¶
- __len__()¶
- getDescription()¶
Deprecated. Directly access elementDescription attribute
- class schrodinger.application.desmond.antlr3.tree.RewriteRuleTokenStream(adaptor, elementDescription, elements=None)¶
Bases:
schrodinger.application.desmond.antlr3.tree.RewriteRuleElementStream
@brief Internal helper class.
- toTree(el)¶
Ensure stream emits trees; tokens must be converted to AST nodes. AST nodes can be passed through unmolested.
- nextNode()¶
- nextToken()¶
- dup(el)¶
When constructing trees, sometimes we need to dup a token or AST subtree. Dup’ing a token means just creating another AST node around it. For trees, you must call the adaptor.dupTree() unless the element is for a tree root; then it must be a node dup.
- __init__(adaptor, elementDescription, elements=None)¶
- __len__()¶
- add(el)¶
- getDescription()¶
Deprecated. Directly access elementDescription attribute
- hasNext()¶
- nextTree()¶
Return the next element in the stream. If out of elements, throw an exception unless size()==1. If size is 1, then return elements[0].
Return a duplicate node/subtree if stream is out of elements and size==1. If we’ve already used the element, dup (dirty bit set).
- reset()¶
Reset the condition of this stream so that it appears we have not consumed any of its elements. Elements themselves are untouched. Once we reset the stream, any future use will need duplicates. Set the dirty bit.
- size()¶
- class schrodinger.application.desmond.antlr3.tree.RewriteRuleSubtreeStream(adaptor, elementDescription, elements=None)¶
Bases:
schrodinger.application.desmond.antlr3.tree.RewriteRuleElementStream
@brief Internal helper class.
- nextNode()¶
Treat next element as a single node even if it’s a subtree. This is used instead of next() when the result has to be a tree root node. Also prevents us from duplicating recently-added children; e.g., ^(type ID)+ adds ID to type and then 2nd iteration must dup the type node, but ID has been added.
Referencing a rule result twice is ok; dup entire tree as we can’t be adding trees as root; e.g., expr expr.
Hideous code duplication here with super.next(). Can’t think of a proper way to refactor. This needs to always call dup node and super.next() doesn’t know which to call: dup node or dup tree.
- __init__(adaptor, elementDescription, elements=None)¶
- __len__()¶
- add(el)¶
- dup(el)¶
When constructing trees, sometimes we need to dup a token or AST subtree. Dup’ing a token means just creating another AST node around it. For trees, you must call the adaptor.dupTree() unless the element is for a tree root; then it must be a node dup.
- getDescription()¶
Deprecated. Directly access elementDescription attribute
- hasNext()¶
- nextTree()¶
Return the next element in the stream. If out of elements, throw an exception unless size()==1. If size is 1, then return elements[0].
Return a duplicate node/subtree if stream is out of elements and size==1. If we’ve already used the element, dup (dirty bit set).
- reset()¶
Reset the condition of this stream so that it appears we have not consumed any of its elements. Elements themselves are untouched. Once we reset the stream, any future use will need duplicates. Set the dirty bit.
- size()¶
- toTree(el)¶
Ensure stream emits trees; tokens must be converted to AST nodes. AST nodes can be passed through unmolested.
- class schrodinger.application.desmond.antlr3.tree.RewriteRuleNodeStream(adaptor, elementDescription, elements=None)¶
Bases:
schrodinger.application.desmond.antlr3.tree.RewriteRuleElementStream
Queues up nodes matched on left side of -> in a tree parser. This is the analog of RewriteRuleTokenStream for normal parsers.
- __init__(adaptor, elementDescription, elements=None)¶
- __len__()¶
- add(el)¶
- getDescription()¶
Deprecated. Directly access elementDescription attribute
- hasNext()¶
- nextTree()¶
Return the next element in the stream. If out of elements, throw an exception unless size()==1. If size is 1, then return elements[0].
Return a duplicate node/subtree if stream is out of elements and size==1. If we’ve already used the element, dup (dirty bit set).
- reset()¶
Reset the condition of this stream so that it appears we have not consumed any of its elements. Elements themselves are untouched. Once we reset the stream, any future use will need duplicates. Set the dirty bit.
- size()¶
- nextNode()¶
- toTree(el)¶
Ensure stream emits trees; tokens must be converted to AST nodes. AST nodes can be passed through unmolested.
- dup(el)¶
When constructing trees, sometimes we need to dup a token or AST subtree. Dup’ing a token means just creating another AST node around it. For trees, you must call the adaptor.dupTree() unless the element is for a tree root; then it must be a node dup.
- class schrodinger.application.desmond.antlr3.tree.TreeRuleReturnScope¶
Bases:
schrodinger.application.desmond.antlr3.recognizers.RuleReturnScope
This is identical to the ParserRuleReturnScope except that the start property is a tree nodes not Token object when you are parsing trees. To be generic the tree node types have to be Object.
- getStop()¶
Return the stop token or tree.
- getTemplate()¶
Has a value potentially if output=template.
- __init__()¶
- getStart()¶
Return the start token or tree.
- getTree()¶
Has a value potentially if output=AST.