discodop.tree

Various Tree objects for representing syntax or morphological trees.

Functions

brackettree(treestr) Parse a single tree presented in (disc)bracket format (autodetected).
discbrackettree(treestr) Parse a single tree presented in discbracket format.
escape(text) Escape all occurrences of parentheses and replace None with ‘’.
frontier(tree, sent[, nodecolor, leafcolor]) Return a representation of the frontier of a tree with ANSI colors.
isdisc(node) Test whether a particular node has a discontinuous yield.
latexlabel(label) Quote/format label for latex.
ptbescape(token) Escape brackets according to PTB convention in a single token.
ptbunescape(token) Unescape brackets in a single token, including PTB notation.
ranges(seq) Partition seq into a list of contiguous ranges.
unescape(text) Reverse escaping of parentheses, frontier spans.
writebrackettree(tree, sent) Return a tree in bracket notation with words as leaves.
writediscbrackettree(tree, sent[, pretty]) Return tree in bracket notation with leaves as index=word.

Classes

DiscTree(tree, sent) Wrap an immutable tree with indices as leaves and a sentence.
DrawDependencies(sent, pos, deps) Visualize labeled, directed dependency structures.
DrawTree(tree[, sent, abbr, highlight, …]) Visualize a discontinuous tree in various formats.
ImmutableParentedTree(label_or_str[, children]) Combination of an Immutable and Parented Tree.
ImmutableTree(label_or_str[, children]) A tree which may not be modified.; has a hash() value.
ParentedTree(label_or_str[, children]) A Tree that maintains parent pointers for single-parented trees.
Tree(label_or_str[, children]) A mutable, labeled, n-ary tree structure.
class discodop.tree.Tree(label_or_str, children=None)[source]

A mutable, labeled, n-ary tree structure.

Each Tree represents a single hierarchical grouping of leaves and subtrees. A tree’s children are encoded as a list of leaves and subtrees, where a leaf is a basic (non-tree) value; and a subtree is a nested Tree. Any other properties that a Tree defines are known as node properties, and are used to add information about individual hierarchical groupings. For example, syntax trees use a label property to label syntactic constituents with phrase labels, such as “NP” and “VP”. Several Tree methods use tree positions to specify children or descendants of a tree. Tree positions are defined as follows:

  • The tree position i specifies a Tree’s ith child.
  • The tree position () specifies the Tree itself.
  • If p is the tree position of descendant d, then
    p + (i,) specifies the ith child of d.

i.e., every tree position is either a single index i, specifying self[i]; or a sequence (i1, i2, ..., iN), specifying self[i1][i2]...[iN].

The constructor can be called in two ways:

  • Tree(label, children) constructs a new tree with the specified label
    and list of children.
  • Tree(s) constructs a new tree by parsing the string s. Equivalent to
    calling the class method Tree.parse(s). NB: expects integers as leaves by default; use Tree.parse(s, parse_leaf=None) to interpret leaves as strings.
append(child)[source]

Append child to this node.

extend(children)[source]

Extend this node’s children with an iterable.

insert(index, child)[source]

Insert child at integer index.

pop(index=-1)[source]

Remove child at specified integer index (or default to last).

remove(child)[source]

Remove child, based on equality.

index(child)[source]

Return index of child, based on equality.

leaves()[source]
Returns:list containing this tree’s leaves.

The order reflects the order of the tree’s hierarchical structure.

height()[source]
Returns:The longest distance from this node to a leaf node.
  • The height of a tree containing no children is 1;
  • the height of a tree containing only leaves is 2;
  • the height of any other tree is one plus the maximum of its
    children’s heights.
subtrees(condition=None)[source]

Yield subtrees of this tree in depth-first, pre-order traversal.

Parameters:condition – a function Tree -> bool to filter which nodes are yielded (does not affect whether children are visited).

NB: store traversal as list before any structural modifications.

postorder(condition=None)[source]

A generator that does a post-order traversal of this tree.

Similar to Tree.subtrees() which does a pre-order traversal. NB: store traversal as list before any structural modifications.

Yields:Tree objects.
find(condition)[source]

Return first child s.t. condition(child) is True, or None.

pos(nodes=False)[source]

Collect preterminals (part-of-speech nodes).

Parameters:nodes – if True, return a sequence of preterminal Tree objects instead of tuples. NB: a preterminal that dominates multiple terminals is returned once.
Returns:a list of tuples containing leaves and pre-terminals (part-of-speech tags). The order reflects the order of the tree’s hierarchical structure. A preterminal that dominates multiple terminals generates a tuple for each terminal.
treepositions(order='preorder')[source]
Parameters:order – One of preorder, postorder, bothorder, leaves.
leaf_treeposition(index)[source]
Returns:The tree position of the index-th leaf in this tree; i.e., if tp=self.leaf_treeposition(i), then self[tp]==self.leaves()[i].
Raises:IndexError – if this tree contains fewer than index + 1 leaves, or if index < 0.
treeposition_spanning_leaves(start, end)[source]
Returns:The tree position of the lowest descendant of this tree that dominates self.leaves()[start:end].
Raises:ValueError – if end <= start.
classmethod convert(val)[source]

Convert a tree between different subtypes of Tree.

Parameters:
  • cls – the class that will be used for the new tree.
  • val – The tree that should be converted.
copy(deep=False)[source]

Create a copy of this tree.

freeze(leaf_freezer=None)[source]
Returns:an immutable version of this tree.
classmethod parse(s, parse_label=None, parse_leaf=<class 'int'>, label_pattern=None, leaf_pattern=None)[source]

Parse a bracketed tree string and return the resulting tree. Trees are represented as nested bracketings, such as: (S (NP (NNP John)) (VP (V runs)))

Parameters:
  • s – The string to parse
  • parse_leaf (parse_label,) – If specified, these functions are applied to the substrings of s corresponding to labels and leaves (respectively) to obtain the values for those labels and leaves. They should have the following signature: parse_label(str) -> value
  • leaf_pattern (label_pattern,) – Regular expression patterns used to find label and leaf substrings in s. By default, both label and leaf patterns are defined to match any sequence of non-whitespace non-bracket characters.
Returns:

A tree corresponding to the string representation s. If this class method is called using a subclass of Tree, then it will return a tree of that type.

pprint(margin=70, indent=0, brackets='()')[source]
Returns:

A pretty-printed string representation of this tree.

Parameters:
  • margin – The right margin at which to do line-wrapping.
  • indent – The indentation level at which printing begins. This number is used to decide how far to indent subsequent lines.
draw()[source]
Returns:an ASCII art visualization of tree.
class discodop.tree.ImmutableTree(label_or_str, children=None)[source]

A tree which may not be modified.; has a hash() value.

NB: the label and children attributes should not be modified, but this is not enforced.

This class has the following optimizations compared to Tree objects:
  • precomputed hash() value
  • precomputed leaves() value of each node
  • a bitset attribute recording the leaf indices dominated by each node
leaves()[source]
Returns:list containing this tree’s leaves.

The order reflects the order of the tree’s hierarchical structure.

append(_)[source]

Append child to this node.

extend(_)[source]

Extend this node’s children with an iterable.

pop(_=None)[source]

Remove child at specified integer index (or default to last).

remove(_)[source]

Remove child, based on equality.

class discodop.tree.ParentedTree(label_or_str, children=None)[source]

A Tree that maintains parent pointers for single-parented trees.

The parent pointers are updated whenever any change is made to a tree’s structure.

The following read-only property values are automatically updated whenever the structure of a parented tree is modified: parent, parent_index, left_sibling, right_sibling, root, treeposition. Each ParentedTree may have at most one parent; i.e., subtrees may not be shared. Any attempt to reuse a single ParentedTree as a child of more than one parent (or as multiple children of the same parent) will cause a ValueError exception to be raised. ParentedTrees should never be used in the same tree as Trees or MultiParentedTrees. Mixing tree implementations may result in incorrect parent pointers and in TypeError exceptions.

The ParentedTree class redefines all operations that modify a tree’s structure to call two methods, which are used by subclasses to update parent information:

  • _setparent() is called whenever a new child is added.
  • _delparent() is called whenever a child is removed.
parent

The parent of this tree, or None if it has no parent.

parent_index

The index of this tree in its parent.

i.e., ptree.parent[ptree.parent_index] is ptree. Note that ptree.parent_index is not necessarily equal to ptree.parent.index(ptree), since the index() method returns the first child that is _equal_ to its argument.

left_sibling

The left sibling of this tree, or None if it has none.

right_sibling

The right sibling of this tree, or None if it has none.

root
Returns:the root of this tree.
treeposition

The tree position of this tree, relative to the root of the tree.

i.e., ptree.root[ptree.treeposition] is ptree.

append(child)[source]

Append child to this node.

extend(children)[source]

Extend this node’s children with an iterable.

insert(index, child)[source]

Insert child at integer index.

pop(index=-1)[source]

Remove child at specified integer index (or default to last).

remove(child)[source]

Remove child, based on equality.

detach()[source]

Remove this node from its parent node, if it has one.

Returns:self, s.t. self.parent is None.
disown()[source]

Detach children of this node and return as list.

spliceabove(label)[source]

Insert a new node between this node and its parent.

splicebelow(label)[source]

Insert a new node between this node and its children.

prune()[source]

Replace this node with its children.

class discodop.tree.ImmutableParentedTree(label_or_str, children=None)[source]

Combination of an Immutable and Parented Tree.

class discodop.tree.DiscTree(tree, sent)[source]

Wrap an immutable tree with indices as leaves and a sentence.

Provides hash & equality.

class discodop.tree.DrawTree(tree, sent=None, abbr=False, highlight=(), highlightfunc=(), secedge=False)[source]

Visualize a discontinuous tree in various formats.

DrawTree(tree, sent=None, highlight=(), abbr=False) creates an object from which different visualizations can be created.

Parameters:
  • tree – a Tree object or a string.
  • sent – a list of words (strings). If sent is not given, and the tree contains non-integers as leaves, a continuous phrase-structure tree is assumed. If sent is not given and the tree contains only indices as leaves, the indices are displayed as placeholder terminals.
  • abbr – when True, abbreviate labels longer than 5 characters.
  • highlight – Optionally, a sequence of Tree objects in tree which should be highlighted. Has the effect of only applying colors to nodes in this sequence (nodes should be given as Tree objects, terminals as indices).
  • highlightfunc – Similar to highlight, but affects function tags.
>>> print(DrawTree('(S (NP Mary) (VP walks))').text())
... 
      S
  ____|____
 NP        VP
 |         |
Mary     walks

Interesting reference (not used for this code): T. Eschbach et al., Orth. Hypergraph Drawing, Journal of Graph Algorithms and Applications, 10(2) 141–157 (2006)149. http://jgaa.info/accepted/2006/EschbachGuentherBecker2006.10.2.pdf

nodecoords(tree, sent, highlight, highlightfunc, secedge)[source]

Produce coordinates of nodes on a grid.

Objective:

  • Produce coordinates for a non-overlapping placement of nodes and
    horizontal lines.
  • Order edges so that crossing edges cross a minimal number of previous
    horizontal lines (never vertical lines).

Approach:

  • bottom up level order traversal (start at terminals)
  • at each level, identify nodes which cannot be on the same row
  • identify nodes which cannot be in the same column
  • place nodes into a grid at (row, column)
  • order child-parent edges with crossing edges last

Coordinates are (row, column); the origin (0, 0) is at the top left; the root node is on row 0. Coordinates do not consider the size of a node (which depends on font, &c), so the width of a column of the grid should be automatically determined by the element with the greatest width in that column. Alternatively, the integer coordinates could be converted to coordinates in which the distances between adjacent nodes are non-uniform.

Produces tuple (nodes, coords, edges) where:

  • nodes[id]: Tree object for the node with this integer id
  • coords[id]: (n, m) coordinate where to draw node with id in the grid
  • edges[id]: parent id of node with this id (ordered dictionary)
svg(hscale=40, hmult=3, nodecolor='blue', leafcolor='red', funccolor='green', funcsep=None)[source]
Returns:SVG representation of a discontinuous tree.
text(unicodelines=False, html=False, ansi=False, nodecolor='blue', leafcolor='red', funccolor='green', funcsep=None, morphsep=None, maxwidth=16, nodeprops=None)[source]
Returns:

ASCII art for a discontinuous tree.

Parameters:
  • unicodelines – whether to use Unicode line drawing characters instead of plain (7-bit) ASCII.
  • html – whether to wrap output in HTML code (default plain text).
  • ansi – whether to produce colors with ANSI escape sequences (only effective when html==False).
  • nodecolor (leafcolor,) – specify colors of leaves and phrasal nodes; effective when either html or ansi is True.
  • funcsep (funccolor,) – if funcsep is a string, it is taken as a separator for function tags; when it occurs, the rest of the label is drawn with funccolor.
  • secedge – if True, add coindexation to show secondary edges and labels; e.g., an NP with a object primary edge: NP-OBJ; the same NP with a secondary edge: NP-OBJ[SBJ:1] and S=1.
  • maxwidth – maximum number of characters before a label starts to wrap across multiple lines; pass None to disable.
  • nodeprops – if not None, pass an ID; adds several javascript properties to each node.
tikzmatrix(nodecolor='blue', leafcolor='red', funccolor='green', funcsep=None)[source]

Produce TiKZ code for use with LaTeX.

PDF can be produced with pdflatex. Uses TiKZ matrices meaning that nodes are put into a fixed grid. Where the cells of each column all have the same width.

tikznode(nodecolor='blue', leafcolor='red', funccolor='green', funcsep=None)[source]

Produce TiKZ code to draw a tree.

Nodes are drawn with the node command so they can have arbitrary coordinates.

tikzqtree(nodecolor='blue', leafcolor='red')[source]

Produce TiKZ-qtree code to draw a continuous tree.

To get trees with straight edges, add this in the preamble:

\tikzset{edge from parent/.style={draw, edge from parent path={
        (\tikzparentnode.south) -- +(0,-3pt) -| (\tikzchildnode)}}}
discodop.tree.latexlabel(label)[source]

Quote/format label for latex.

discodop.tree.frontier(tree, sent, nodecolor='blue', leafcolor='red')[source]

Return a representation of the frontier of a tree with ANSI colors.

discodop.tree.brackettree(treestr)[source]

Parse a single tree presented in (disc)bracket format (autodetected).

Returns:(tree, sent) tuple of ParentedTree and list of str.
discodop.tree.writebrackettree(tree, sent)[source]

Return a tree in bracket notation with words as leaves.

discodop.tree.writediscbrackettree(tree, sent, pretty=False)[source]

Return tree in bracket notation with leaves as index=word.

discodop.tree.isdisc(node)[source]

Test whether a particular node has a discontinuous yield.

i.e., test whether its yield contains two or more non-adjacent strings. Nodes can be continuous even if some of their children are discontinuous.

discodop.tree.escape(text)[source]

Escape all occurrences of parentheses and replace None with ‘’.

discodop.tree.unescape(text)[source]

Reverse escaping of parentheses, frontier spans.

discodop.tree.ptbescape(token)[source]

Escape brackets according to PTB convention in a single token.

discodop.tree.ptbunescape(token)[source]

Unescape brackets in a single token, including PTB notation.