discodop.treebanktransforms

Treebank transformations.

  • Transforms (primarily state splits) listed by name
  • Relational-realizational transform

Functions

ancestors(node) Yield ancestors of node from direct parent to root node.
base(node, match) Test whether node.label equals match after stripping features.
bracketings(tree) Labeled bracketings of a tree.
collapselabels(trees[, _sents, tbmapping]) Collapse non-root phrasal labels with specified mapping.
dlevel(tree[, lang]) Return the D-level measure of syntactic complexity.
expandpresets(transformations) Expand aliases for presets.
ftbtransforms(name, tree, sent) Port of manual FTB enrichments specified in Stanford parser.
function(node) Return the first function tag for node, or the empty string.
functions(node) Return a list of function tags for node, or an empty list.
getmaxid(tree) Return highest export non-terminal ID in tree.
hassecedge(node, func, parentid) Test whether this node has a secondary edge (func, parentid).
labels(tree) Return the labels of the children of this node.
lassytransforms(name, tree, _sent) Transformations for the Dutch Lassy & Alpino treebanks.
morphfeats(node) Return the morphological features of a preterminal node.
negratransforms(name, tree, sent) Negra / Tiger transforms.
pop(node) Remove this node from its parent node, if it has one.
ptbtransforms(name, tree, sent) Transforms for WSJ section of Penn treebank.
reversetransform(tree, transformations) Undo specified transformations and remove state splits marked by ^.
rindex(l, v) Like list.index(), but go from right to left.
rrbacktransform(tree[, adjunctionlabel, func]) Reverse relational-realizational transformation.
rrtransform(tree[, morphlevels, ...]) Relational-realizational tree transformation.
strip(label) Equivalent to the effect of the @ operator in tregex.
transform(tree, sent, transformations) Perform specified sequence of transformations on a tree.
unifymorphfeat(feats[, percolatefeatures]) Get the sorted union of features for a sequence of feature vectors.

discodop.treebanktransforms.expandpresets(transformations)[source]

Expand aliases for presets.

discodop.treebanktransforms.transform(tree, sent, transformations)[source]

Perform specified sequence of transformations on a tree.

State-splits are preceded by ‘^’. transformations is a sequence of transformation names (order matters) that will be performed on the given tree (in-place). There are presets for particular treebanks. The name of a preset can be used as an alias that expands to a sequence of transformations; see the variable PRESETS.
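The alias mechanism can be pictured with a small standalone sketch. It does not import discodop: DEMO_PRESETS, expand_aliases, and the transformation names are hypothetical stand-ins for PRESETS and expandpresets, chosen only to show how a preset name expands in place while the order of the sequence is preserved.

```python
# Hypothetical preset table; the real one is the variable PRESETS.
DEMO_PRESETS = {
    'demo-preset': ('SPLIT-A', 'SPLIT-B'),
}


def expand_aliases(transformations, presets=DEMO_PRESETS):
    """Replace each preset name by its sequence of transformation names."""
    expanded = []
    for name in transformations:
        if name in presets:
            # An alias contributes all of its members, in order.
            expanded.extend(presets[name])
        else:
            expanded.append(name)
    return expanded


print(expand_aliases(['FIRST', 'demo-preset', 'LAST']))
```

Because order matters, the expansion keeps each alias at the position where it appeared rather than appending its members at the end.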

discodop.treebanktransforms.reversetransform(tree, transformations)[source]

Undo specified transformations and remove state splits marked by ^.

Do not apply twice: a second application might remove VPs that should be kept.

discodop.treebanktransforms.collapselabels(trees, _sents=None, tbmapping=None)[source]

Collapse non-root phrasal labels with specified mapping.

Trees are modified in-place.

Parameters:
  • tbmapping – a mapping of treebank labels of the form:

    {coarselabel1: {finelabel1, finelabel2, ...}, ...}

    Cf. treebanktransforms.MAPPINGS
Returns:

a tuple (trees, mapping) with the transformed trees and a mapping of their original labels to the collapsed labels.
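
The mapping is keyed by coarse label, while collapsing a tree needs the opposite lookup, from fine label to coarse label. The following standalone sketch shows that inversion; it does not call discodop, the labels are hypothetical, and leaving unmapped labels unchanged is an assumption of the sketch rather than documented behavior.

```python
def invert_mapping(tbmapping):
    """Turn {coarse: {fine, ...}} into {fine: coarse} for direct lookup."""
    fine_to_coarse = {}
    for coarse, fines in tbmapping.items():
        for fine in fines:
            fine_to_coarse[fine] = coarse
    return fine_to_coarse


# Hypothetical labels, only to show the shape of the mapping.
demo_mapping = {'N': {'NP', 'NN'}, 'V': {'VP', 'VB'}}
lookup = invert_mapping(demo_mapping)
labels = ['NP', 'VP', 'PP']
# Assumption: labels absent from the mapping are kept as they are.
collapsed = [lookup.get(label, label) for label in labels]
print(collapsed)
```
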
discodop.treebanktransforms.rrtransform(tree, morphlevels=0, percolatefeatures=None, adjunctionlabel=None, ignorefunctions=None, ignorecategories=None, adjleft=True, adjright=True)[source]

Relational-realizational tree transformation.

Every constituent node is expanded to three levels:

  1. syntactic category, e.g., S
  2. unordered functional argument structure of children, e.g., S/<SBJ,HD,OBJ>
  3. for each child:
    grammatical function + parent syntactic category, e.g., OBJ/S

Example:

(NP-SBJ (NN-HD ...)) => (NP (<HD>/NP (HD/NP (NN ...))))

Parameters:
  • adjunctionlabel – a grammatical function label identifying adjunctions. They will not be part of argument structures, and their grammatical function will be replaced with their neighboring non-adjunctive functions.
  • adjleft, adjright – whether to include the left and right sibling, respectively, when replacing the function label for adjunctionlabel.
  • ignorefunctions – function labels that do not go into argument structure, but keep their function in their realization to make backtransform possible.
  • morphlevels – if nonzero, percolate morphological features this many levels upwards. For a given node, the union of the features of its children is collected, and the result is appended to its syntactic category.
  • percolatefeatures – if a sequence is given, percolate only these morphological features; by default all features are used.
Returns:

a new, transformed tree.
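
The three-level expansion can be sketched on plain (label, children) tuples. This is a simplified illustration, not the discodop implementation: it assumes labels of the form CATEGORY-FUNCTION, writes the argument-structure label in the form used by the example above (<HD>/NP), sorts the functions to get one canonical spelling of the unordered structure, and ignores adjunctions, ignored functions, and morphological features. The names split_label and rr_sketch are hypothetical.

```python
def split_label(label):
    """Split 'NP-SBJ' into ('NP', 'SBJ'); without a tag, function is ''."""
    category, _, function = label.partition('-')
    return category, function


def rr_sketch(tree):
    """Expand a (label, children) tuple into the three levels.

    Preterminals, whose children are words (str), only lose their
    function tag.
    """
    label, children = tree
    category, _ = split_label(label)
    if all(isinstance(child, str) for child in children):
        return (category, children)
    functions = [split_label(child[0])[1] for child in children]
    # Level 3: one node per child, function + parent category.
    realizations = [
        ('%s/%s' % (func, category), [rr_sketch(child)])
        for func, child in zip(functions, children)]
    # Level 2: argument structure of the children.
    configuration = '<%s>/%s' % (','.join(sorted(functions)), category)
    # Level 1: the bare syntactic category.
    return (category, [(configuration, realizations)])


demo = ('NP-SBJ', [('NN-HD', ['word'])])
print(rr_sketch(demo))
```

Like rrtransform, the sketch returns a new structure and leaves its input untouched.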

discodop.treebanktransforms.rrbacktransform(tree, adjunctionlabel=None, func=None)[source]

Reverse relational-realizational transformation.

Parameters:
  • adjunctionlabel – used to assign a grammatical function to adjunctions that have been converted to contextual labels ‘next:prev’.
  • func – used internally to percolate functional labels.
Returns:

a new tree.

discodop.treebanktransforms.dlevel(tree, lang='nl')[source]

Return the D-level measure of syntactic complexity.

  • Original version: Rosenberg & Abbeduto (1987), https://doi.org/10.1017/S0142716400000047
  • Covington et al. (2006), http://ai1.ai.uga.edu/caspr/2006-01-Covington.pdf
  • Dutch version implemented here: Appendix A of T-Scan manual, https://github.com/proycon/tscan/raw/master/docs/tscanhandleiding.pdf

Parameters:
  • tree – a tree from the Alpino parser (i.e., not binarized, with function and morphological tags).
Returns:

an integer from 0 to 7; 7 is the most complex.

discodop.treebanktransforms.rindex(l, v)[source]

Like list.index(), but go from right to left.
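
A minimal sketch of a right-to-left search, for illustration only. The name rindex_sketch is hypothetical, and raising ValueError when the value is absent is an assumption that mirrors list.index().

```python
def rindex_sketch(items, value):
    """Return the index of the last occurrence of value in items."""
    for index in range(len(items) - 1, -1, -1):
        if items[index] == value:
            return index
    # Assumption: behave like list.index() when nothing is found.
    raise ValueError('%r is not in list' % (value,))


print(rindex_sketch(['a', 'b', 'a', 'c'], 'a'))
```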

discodop.treebanktransforms.labels(tree)[source]

Returns:

the labels of the children of this node.

discodop.treebanktransforms.pop(node)[source]

Remove this node from its parent node, if it has one.

Convenience function for ParentedTrees.

discodop.treebanktransforms.strip(label)[source]

Equivalent to the effect of the @ operator in tregex.

discodop.treebanktransforms.ancestors(node)[source]

Yield ancestors of node from direct parent to root node.
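
The traversal order can be shown with a standalone sketch. DemoNode is a hypothetical stand-in for a tree node that keeps a reference to its parent, and ancestors_sketch is a hypothetical name; neither is part of discodop.

```python
class DemoNode:
    """Hypothetical tree node with a parent pointer."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent


def ancestors_sketch(node):
    """Yield ancestors from the direct parent up to the root."""
    while node.parent is not None:
        node = node.parent
        yield node


root = DemoNode('S')
vp = DemoNode('VP', parent=root)
vb = DemoNode('VB', parent=vp)
print([n.label for n in ancestors_sketch(vb)])
```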

discodop.treebanktransforms.bracketings(tree)[source]

Labeled bracketings of a tree.

discodop.treebanktransforms.morphfeats(node)[source]

Return the morphological features of a preterminal node.

Features may be separated by dots or commas.

discodop.treebanktransforms.unifymorphfeat(feats, percolatefeatures=None)[source]

Get the sorted union of features for a sequence of feature vectors.

Parameters:
  • feats – a sequence of strings of comma/dot separated feature vectors.
  • percolatefeatures – if a set is given, select only these features; by default all features are used.

>>> print(unifymorphfeat({'Def.*.*', '*.Sg.*', '*.*.Akk'}))
Akk.Def.Sg
>>> print(unifymorphfeat({'LID[bep,stan,rest]', 'N[soort,ev,zijd,stan]'}))
bep,ev,rest,soort,stan,zijd
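
One way to arrive at the two results above is sketched below. It is not the discodop source: the name unify_sketch is hypothetical, and the sketch assumes that the dot is the separator whenever any vector contains one, that only the bracketed part of a tag such as N[...] carries features, and that '*' marks an unspecified slot.

```python
def unify_sketch(feats, percolatefeatures=None):
    """Sorted union of the features in a collection of feature vectors."""
    feats = list(feats)
    # Assumption: dots are the separator if any vector uses them.
    sep = '.' if any('.' in vector for vector in feats) else ','
    result = set()
    for vector in feats:
        if '[' in vector:
            # Keep only the bracketed part: 'N[ev,zijd]' -> 'ev,zijd'.
            vector = vector[vector.index('[') + 1:vector.index(']')]
        result.update(vector.split(sep))
    # Assumption: '*' stands for an unspecified slot and is dropped.
    result.discard('*')
    if percolatefeatures:
        result &= set(percolatefeatures)
    return sep.join(sorted(result))


print(unify_sketch({'Def.*.*', '*.Sg.*', '*.*.Akk'}))
print(unify_sketch({'LID[bep,stan,rest]', 'N[soort,ev,zijd,stan]'}))
```
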
discodop.treebanktransforms.function(node)[source]

Returns:

the first function tag for node, or the empty string.

discodop.treebanktransforms.functions(node)[source]

Returns:

a list of function tags for node, or an empty list.

discodop.treebanktransforms.hassecedge(node, func, parentid)[source]

Test whether this node has a secondary edge (func, parentid).