discodop.treebanktransforms

Treebank transformations.

  • Transforms (primarily state splits) listed by name
  • Relational-realizational transform

Functions

ancestors(node) Yield ancestors of node from direct parent to root node.
base(node, match) Test whether node.label equals match after stripping features.
bracketings(tree) Labeled bracketings of a tree.
collapselabels(trees[, _sents, tbmapping]) Collapse non-root phrasal labels with specified mapping.
dlevel(tree[, lang]) Return the D-level measure of syntactic complexity.
expandpresets(transformations) Expand aliases for presets.
ftbtransforms(name, tree, sent) Port of manual FTB enrichments specified in the Stanford parser.
function(node) Return the first function tag for node, or the empty string.
functions(node) Return a list of function tags for node, or an empty list.
getftbcompounds(trees, sents, cachedfile) Collect multi-word expressions in FTB, or read from cached file.
getmaxid(tree) Return highest export non-terminal ID in tree.
hassecedge(node, func, parentid) Test whether this node has a secondary edge (func, parentid).
labels(tree) Return the labels of the children of this node.
lassytransforms(name, tree, _sent) Transformations for the Dutch Lassy & Alpino treebanks.
morphfeats(node) Return the morphological features of a preterminal node.
negratransforms(name, tree, sent) Negra / Tiger transforms.
ptbtransforms(name, tree, sent) Transforms for the WSJ section of the Penn Treebank.
reversetransform(tree, sent, transformations) Undo specified transformations and remove state splits marked by ^.
rindex(l, v) Like list.index(), but go from right to left.
rrbacktransform(tree[, adjunctionlabel, func]) Reverse relational-realizational transformation.
rrtransform(tree[, morphlevels, …]) Relational-realizational tree transformation.
strip(label) Equivalent to the effect of the @ operator in tregex.
transform(tree, sent, transformations) Perform specified sequence of transformations on a tree.
unifymorphfeat(feats[, percolatefeatures]) Get the sorted union of features for a sequence of feature vectors.
discodop.treebanktransforms.expandpresets(transformations)

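Expand aliases for presets: each name in transformations that refers to a preset (see the variable PRESETS) is replaced by the sequence of transformations it stands for.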
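A minimal sketch of the alias expansion, assuming PRESETS maps a preset name to a sequence of transformation names. The preset and transformation names below are hypothetical placeholders rather than discodop's actual PRESETS, and expand_presets is an illustrative stand-in, not the library function.

```python
# Hypothetical preset table; discodop's real PRESETS variable differs.
PRESETS = {
    'toy-preset': ('split-a', 'split-b'),
}


def expand_presets(transformations):
    """Replace each preset name by the transformation names it stands for."""
    result = []
    for name in transformations:
        # A preset name expands in place; any other name is kept as-is.
        result.extend(PRESETS.get(name, (name,)))
    return result


print(expand_presets(['toy-preset', 'split-c']))
# ['split-a', 'split-b', 'split-c']
```

Because the expansion happens in place, the relative order of presets and individual transformations is preserved, which matters since transformations are applied in sequence.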

discodop.treebanktransforms.transform(tree, sent, transformations)

Perform specified sequence of transformations on a tree.

State splits are marked with ‘^’. transformations is a sequence of transformation names that are applied to the given tree in place, in the given order (order matters). There are presets for particular treebanks: the name of a preset can be used as an alias that expands to a sequence of transformations; see the variable PRESETS.
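The following sketch illustrates the calling convention under stated assumptions: a toy tree of nested lists, a hypothetical registry TRANSFORMS, and a single hypothetical transformation 'parent' that performs parent annotation, a common state split that appends ‘^’ plus the parent category. It is not discodop's implementation and uses none of its tree classes.

```python
# Toy tree format (an assumption for this sketch): [label, child, ...],
# where a leaf is an int indexing into sent.


def annotate_parent(tree, sent, parent=None):
    """Hypothetical state split: append '^' + parent category to phrasal labels."""
    # Preterminals (whose only child is a leaf index) are left unsplit.
    if parent is not None and not isinstance(tree[1], int):
        tree[0] += '^' + parent
    base = tree[0].split('^')[0]  # category without any state splits
    for child in tree[1:]:
        if not isinstance(child, int):
            annotate_parent(child, sent, base)


# Hypothetical registry of named transformations.
TRANSFORMS = {'parent': annotate_parent}


def apply_transforms(tree, sent, transformations):
    """Apply the named transformations in the given order, modifying tree in place."""
    for name in transformations:
        # sent is unused by the toy transformation but mirrors transform(tree, sent, ...).
        TRANSFORMS[name](tree, sent)
    return tree


tree = ['S', ['NP', ['PRP', 0]], ['VP', ['VBZ', 1]]]
sent = ['it', 'works']
apply_transforms(tree, sent, ['parent'])
print(tree)  # ['S', ['NP^S', ['PRP', 0]], ['VP^S', ['VBZ', 1]]]
```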

discodop.treebanktransforms.reversetransform(tree, sent, transformations)

Undo specified transformations and remove state splits marked by ^.

Do not apply this function twice to the same tree: doing so might remove VPs that should be kept.
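Removing the ‘^’ splits themselves can be sketched as below, again on a toy list-based tree (an assumption; remove_state_splits is a hypothetical name). This covers only state splits, not the structural transformations that reversetransform also undoes. Stripping splits alone is idempotent, so the warning above presumably concerns those structural steps.

```python
def remove_state_splits(tree):
    """Strip every '^'-marked state split from the labels of tree, in place."""
    tree[0] = tree[0].split('^', 1)[0]
    for child in tree[1:]:
        if not isinstance(child, int):  # ints are leaf indices
            remove_state_splits(child)
    return tree


tree = ['S', ['NP^S', ['PRP', 0]], ['VP^S', ['VBZ', 1]]]
print(remove_state_splits(tree))  # ['S', ['NP', ['PRP', 0]], ['VP', ['VBZ', 1]]]
```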

discodop.treebanktransforms.collapselabels(trees, _sents=None, tbmapping=None)

Collapse non-root phrasal labels with specified mapping.

Trees are modified in-place.

Parameters: tbmapping – a mapping of treebank labels of the form:

    {coarselabel1: {finelabel1, finelabel2, ...}, ...}

  Cf. treebanktransforms.MAPPINGS
Returns:

a tuple (trees, mapping) with the transformed trees and a mapping of their original labels to the collapsed labels.
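A sketch of the collapsing step under simplifying assumptions: trees are toy nested lists with integer leaves, labels carry no function tags or state splits, and labels missing from tbmapping are kept unchanged. collapse_labels and the example mapping are hypothetical; the real MAPPINGS are not reproduced here.

```python
def collapse_labels(trees, tbmapping):
    """Relabel non-root phrasal nodes in place; return (trees, original->collapsed)."""
    # Invert {coarse: {fine, ...}} into {fine: coarse}.
    fine2coarse = {fine: coarse
                   for coarse, fines in tbmapping.items() for fine in fines}
    mapping = {}

    def visit(node, isroot):
        # Preterminals (a single int leaf) and the root keep their labels.
        if not isroot and not isinstance(node[1], int):
            orig = node[0]
            node[0] = fine2coarse.get(orig, orig)  # unmapped labels are kept
            mapping[orig] = node[0]
        for child in node[1:]:
            if not isinstance(child, int):
                visit(child, False)

    for tree in trees:
        visit(tree, True)
    return trees, mapping


# Hypothetical mapping for illustration only.
tbmapping = {'NP': {'NP', 'NX', 'WHNP'}}
trees = [['S', ['WHNP', ['WP', 0]], ['VP', ['VBZ', 1]]]]
trees, mapping = collapse_labels(trees, tbmapping)
print(trees)    # [['S', ['NP', ['WP', 0]], ['VP', ['VBZ', 1]]]]
print(mapping)  # {'WHNP': 'NP', 'VP': 'VP'}
```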
discodop.treebanktransforms.rrtransform(tree, morphlevels=0, percolatefeatures=None, adjunctionlabel=None, ignorefunctions=None, ignorecategories=None, adjleft=True, adjright=True)

Relational-realizational tree transformation.

Every constituent node is expanded to three levels:

  1. syntactic category, e.g., S
  2. unordered functional argument structure of children, e.g., S/<SBJ,HD,OBJ>
  3. for each child:
    grammatical function + parent syntactic category, e.g., OBJ/S

Example:

(NP-SBJ (NN-HD ...)) => (NP (<HD>/NP (HD/NP (NN ...))))
Parameters:
  • adjunctionlabel – a grammatical function label identifying adjunctions. They will not be part of argument structures, and their grammatical function will be replaced with their neighboring non-adjunctive functions.
  • adjleft, adjright – whether to include the left and right sibling, respectively, when replacing the function label for adjunctionlabel.
  • ignorefunctions – function labels that do not go into argument structure, but keep their function in their realization to make backtransform possible.
  • morphlevels – if nonzero, percolate morphological features this many levels upwards. For a given node, the union of the features of its children is collected, and the result is appended to its syntactic category.
  • percolatefeatures – if a sequence is given, percolate only these morphological features; by default all features are used.
Returns:

a new, transformed tree.
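The three-level expansion can be sketched for the basic case as follows. The sketch assumes labels of the form CAT-FUNC, uses the label format of the Example line above, and sorts the function labels so that the argument structure does not depend on child order (one possible reading of “unordered”). It ignores morphlevels, adjunctionlabel, ignorefunctions and ignorecategories; rr and to_str are hypothetical names, not discodop functions.

```python
def split_label(label):
    """'NP-SBJ' -> ('NP', 'SBJ'); a label without '-' has an empty function."""
    cat, _, func = label.partition('-')
    return cat, func


def rr(node):
    """Expand each constituent into category, argument structure, realizations."""
    cat, _ = split_label(node[0])
    children = node[1:]
    if all(isinstance(c, str) for c in children):  # preterminal: keep the words
        return [cat] + children
    funcs = [split_label(child[0])[1] for child in children]
    # Level 2: argument structure; sorting makes it independent of child order.
    argstruct = '<%s>/%s' % (','.join(sorted(funcs)), cat)
    # Level 3: one 'FUNC/CAT' node per child, wrapping the transformed child.
    realizations = [['%s/%s' % (func, cat), rr(child)]
                    for func, child in zip(funcs, children)]
    return [cat, [argstruct] + realizations]


def to_str(node):
    """Render a nested-list tree in bracket notation."""
    if isinstance(node, str):
        return node
    return '(%s %s)' % (node[0], ' '.join(to_str(c) for c in node[1:]))


print(to_str(rr(['NP-SBJ', ['NN-HD', '...']])))
# (NP (<HD>/NP (HD/NP (NN ...))))
```

Note that the root's own function (SBJ) does not appear in the output, matching the example: a node's function is realized one level up, in its parent's FUNC/CAT nodes.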

discodop.treebanktransforms.rrbacktransform(tree, adjunctionlabel=None, func=None)

Reverse relational-realizational transformation.

Parameters:
  • adjunctionlabel – used to assign a grammatical function to adjunctions that have been converted to contextual labels ‘next:prev’.
  • func – used internally to percolate functional labels.
Returns:

a new tree.

discodop.treebanktransforms.dlevel(tree, lang='nl')

Return the D-level measure of syntactic complexity.

Original version: Rosenberg & Abbeduto (1987), https://doi.org/10.1017/S0142716400000047; Covington et al. (2006), http://ai1.ai.uga.edu/caspr/2006-01-Covington.pdf

Dutch version implemented here: Appendix A of the T-Scan manual, https://github.com/proycon/tscan/raw/master/docs/tscanhandleiding.pdf

Parameters: tree – a tree from the Alpino parser (i.e., not binarized, with function and morphological tags).
Returns: an integer from 0 to 7, where 7 is the most complex.
discodop.treebanktransforms.rindex(l, v)

Like list.index(), but search from right to left, so that the index of the last occurrence of v in l is returned.
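A one-line equivalent, assuming l is a list; this is a sketch and not necessarily discodop's code. Like list.index(), it raises ValueError when v does not occur in l.

```python
def rindex(l, v):
    """Return the index of the last occurrence of v in l (ValueError if absent)."""
    # Search the reversed list, then convert the position back to a left-based index.
    return len(l) - 1 - l[::-1].index(v)


print(rindex(['a', 'b', 'a', 'c'], 'a'))  # 2
```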

discodop.treebanktransforms.labels(tree)
Returns: the labels of the children of this node.
discodop.treebanktransforms.strip(label)

Return the basic category of label, i.e., the label without function tags and similar annotations; equivalent to the effect of the @ operator in tregex.
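An approximation of this behavior with a regular expression. The delimiter set ('-' for function tags, '=' for co-indexation, '^' for state splits) is an assumption, not taken from discodop or tregex, and basic_category is a hypothetical name.

```python
import re

# Assumed delimiters that start an annotation: '-' (function tag),
# '=' (co-indexation) and '^' (state split).
BASIC = re.compile(r'[^-=^]+')


def basic_category(label):
    """Approximate basic category of a label, in the spirit of tregex's @."""
    match = BASIC.match(label)
    # Labels that start with a delimiter, e.g. '-NONE-', are returned as-is.
    return match.group(0) if match else label


for label in ['NP-SBJ-1', 'NP=2', 'VP^S', '-NONE-']:
    print(label, '->', basic_category(label))
```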

discodop.treebanktransforms.ancestors(node)

Yield ancestors of node from direct parent to root node.
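A sketch of the traversal over parent pointers. The Node class is a hypothetical minimal stand-in for a parented tree node, not a discodop class.

```python
class Node:
    """Minimal node with a parent pointer (hypothetical stand-in)."""

    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def ancestors(node):
    """Yield ancestors of node from the direct parent up to the root."""
    node = node.parent
    while node is not None:
        yield node
        node = node.parent


obj = Node('NP')
vp = Node('VP', [obj])
root = Node('S', [vp])
print([a.label for a in ancestors(obj)])  # ['VP', 'S']
```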

discodop.treebanktransforms.bracketings(tree)

Labeled bracketings of a tree.
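One way to represent labeled bracketings, sketched under the assumption that a bracket is a (label, set of leaf indices) pair collected in a multiset. A set of indices rather than a (start, end) span also accommodates discontinuous constituents. The return type of the real function may differ, and labeled_bracketings is a hypothetical name.

```python
from collections import Counter


def leaves(node):
    """Yield the leaf indices under a [label, child, ...] node."""
    for child in node[1:]:
        if isinstance(child, int):
            yield child
        else:
            yield from leaves(child)


def labeled_bracketings(tree):
    """Multiset of (label, leaf indices) for every phrasal node (preterminals excluded)."""
    result = Counter()
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node[1], int):  # preterminal
            continue
        result[node[0], frozenset(leaves(node))] += 1
        stack.extend(c for c in node[1:] if isinstance(c, list))
    return result


# A discontinuous VP covering words 0 and 2, interrupted by word 1.
tree = ['S', ['VP', ['VB', 0], ['VBN', 2]], ['VAFIN', 1]]
print(sorted((label, sorted(span)) for label, span in labeled_bracketings(tree)))
# [('S', [0, 1, 2]), ('VP', [0, 2])]
```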

discodop.treebanktransforms.morphfeats(node)

Return the morphological features of a preterminal node.

Features may be separated by dots or commas.

discodop.treebanktransforms.unifymorphfeat(feats, percolatefeatures=None)

Get the sorted union of features for a sequence of feature vectors.

Parameters:
  • feats – a sequence of feature vectors, each a string of comma- or dot-separated features.
  • percolatefeatures – if a set is given, select only these features; by default all features are used.
>>> print(unifymorphfeat({'Def.*.*', '*.Sg.*', '*.*.Akk'}))
Akk.Def.Sg
>>> print(unifymorphfeat({'LID[bep,stan,rest]', 'N[soort,ev,zijd,stan]'}))
bep,ev,rest,soort,stan,zijd
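The doctest outputs above can be reproduced by the sketch below. The handling of bracketed tags such as N[...], the dropping of '*' wildcards, and the choice of separator (commas if any input uses them, otherwise dots) are assumptions inferred from these two examples; unify_feats is a hypothetical name, not the library function.

```python
def unify_feats(feats, percolatefeatures=None):
    """Sorted union of features; '*' wildcards are dropped (sketch)."""
    sep = ',' if any(',' in f for f in feats) else '.'
    union = set()
    for f in feats:
        # Keep only the part inside brackets, e.g. 'N[soort,ev]' -> 'soort,ev'.
        if '[' in f:
            f = f.split('[', 1)[1].rstrip(']')
        union.update(x for x in f.replace(',', '.').split('.')
                     if x and x != '*')
    if percolatefeatures is not None:
        union &= set(percolatefeatures)  # keep only the selected features
    return sep.join(sorted(union))


print(unify_feats({'Def.*.*', '*.Sg.*', '*.*.Akk'}))  # Akk.Def.Sg
print(unify_feats({'LID[bep,stan,rest]', 'N[soort,ev,zijd,stan]'}))
# bep,ev,rest,soort,stan,zijd
```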
discodop.treebanktransforms.function(node)
Returns: the first function tag for node, or the empty string.
discodop.treebanktransforms.functions(node)
Returns: a list of function tags for node, or an empty list.
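A sketch for PTB-style labels in which function tags follow the category after hyphens, as in NP-SBJ-TPC. This representation is an assumption: discodop may store function tags separately from the label. Numeric co-indices such as the 1 in NP-SBJ-1 are skipped, and function_of and functions_of are hypothetical names.

```python
def functions_of(label):
    """Function tags of a PTB-style label, e.g. 'NP-SBJ-TPC' -> ['SBJ', 'TPC']."""
    if label.startswith('-'):  # e.g. '-NONE-' has no function tags
        return []
    return [tag for tag in label.split('-')[1:] if tag and not tag.isdigit()]


def function_of(label):
    """First function tag of label, or the empty string."""
    tags = functions_of(label)
    return tags[0] if tags else ''


print(functions_of('NP-SBJ-TPC'))  # ['SBJ', 'TPC']
print(repr(function_of('NP')))     # ''
```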
discodop.treebanktransforms.hassecedge(node, func, parentid)

Test whether this node has a secondary edge (func, parentid).
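In the Negra/Tiger export format, a node may carry secondary edges, each pairing a function label with the ID of a secondary parent. A sketch of the test, assuming a hypothetical ExportNode that stores these as a list of (func, parentid) tuples; has_secedge is likewise a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class ExportNode:
    """Hypothetical node; secedges holds (function, parent ID) pairs."""
    label: str
    secedges: list = field(default_factory=list)


def has_secedge(node, func, parentid):
    """True if node has a secondary edge labeled func to the node parentid."""
    return (func, parentid) in node.secedges


n = ExportNode('NP', secedges=[('SB', 502)])
print(has_secedge(n, 'SB', 502), has_secedge(n, 'OA', 502))  # True False
```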