discodop.treebanktransforms
Treebank transformations.
- Transforms (primarily state splits) listed by name
- Relational-realizational transform
Functions
ancestors(node) | Yield ancestors of node from direct parent to root node.
base(node, match) | Test whether node.label equals match after stripping features.
bracketings(tree) | Labeled bracketings of a tree.
collapselabels(trees[, _sents, tbmapping]) | Collapse non-root phrasal labels with specified mapping.
dlevel(tree[, lang]) | Return the D-level measure of syntactic complexity.
expandpresets(transformations) | Expand aliases for presets.
ftbtransforms(name, tree, sent) | Port of manual FTB enrichments specified in Stanford parser.
function(node) |
functions(node) |
getmaxid(tree) | Return highest export non-terminal ID in tree.
hassecedge(node, func, parentid) | Test whether this node has a secondary edge (func, parentid).
labels(tree) |
lassytransforms(name, tree, _sent) | Transformations for the Dutch Lassy & Alpino treebanks.
morphfeats(node) | Return the morphological features of a preterminal node.
negratransforms(name, tree, sent) | Negra / Tiger transforms.
pop(node) | Remove this node from its parent node, if it has one.
ptbtransforms(name, tree, sent) | Transforms for WSJ section of Penn treebank.
reversetransform(tree, transformations) | Undo specified transformations and remove state splits marked by ^.
rindex(l, v) | Like list.index(), but go from right to left.
rrbacktransform(tree[, adjunctionlabel, func]) | Reverse relational-realizational transformation.
rrtransform(tree[, morphlevels, ...]) | Relational-realizational tree transformation.
strip(label) | Equivalent to the effect of the @ operator in tregex.
transform(tree, sent, transformations) | Perform specified sequence of transformations on a tree.
unifymorphfeat(feats[, percolatefeatures]) | Get the sorted union of features for a sequence of feature vectors.
-
discodop.treebanktransforms.transform(tree, sent, transformations)
Perform specified sequence of transformations on a tree.
State-splits are preceded by ‘^’.
transformations is a sequence of transformation names (order matters) that will be performed on the given tree (in-place). There are presets for particular treebanks. The name of a preset can be used as an alias that expands to a sequence of transformations; see the variable PRESETS.
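The alias expansion described above can be sketched in a few lines. This is only an illustration, not discodop's code: the table `PRESETS_SKETCH`, the transformation names in it, and the helper `expand_presets_sketch` are hypothetical; the real presets are in the variable PRESETS.

```python
# Hypothetical preset table; the real one is the module variable PRESETS.
PRESETS_SKETCH = {
    "demo-preset": ("SPLIT-A", "SPLIT-B"),
}


def expand_presets_sketch(transformations, presets=PRESETS_SKETCH):
    """Replace each preset alias by its sequence of names, preserving order."""
    expanded = []
    for name in transformations:
        # An alias expands in place; any other name is kept unchanged.
        expanded.extend(presets.get(name, (name,)))
    return expanded


print(expand_presets_sketch(["FIRST", "demo-preset", "LAST"]))
# prints ['FIRST', 'SPLIT-A', 'SPLIT-B', 'LAST']
```

Because order matters, the sketch expands an alias at the position where it occurs rather than appending its members at the end.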
-
discodop.treebanktransforms.reversetransform(tree, transformations)
Undo specified transformations and remove state splits marked by ^.
Do not apply twice: a second pass might remove VPs that should be kept.
-
discodop.treebanktransforms.collapselabels(trees, _sents=None, tbmapping=None)
Collapse non-root phrasal labels with specified mapping.
Trees are modified in-place.
Parameters: tbmapping – a mapping of treebank labels of the form:
{coarselabel1: {finelabel1, finelabel2, ...}, ...}
Cf. treebanktransforms.MAPPINGS
Returns: a tuple (trees, mapping) with the transformed trees and a mapping of their original labels to the collapsed labels.
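To make the shape of tbmapping concrete, here is a minimal sketch that inverts a coarse-to-fine mapping and relabels the non-root phrasal nodes of a tree in place. It does not use discodop: trees are plain nested lists, the labels and both helper names are hypothetical, and keeping labels that are missing from the mapping is an assumption of the sketch, not documented behavior.

```python
def invert_mapping(tbmapping):
    """Turn {coarse: {fine, ...}} into a flat {fine: coarse} lookup."""
    return {fine: coarse
            for coarse, fines in tbmapping.items()
            for fine in fines}


def collapse_sketch(tree, fine2coarse, isroot=True):
    """Relabel non-root phrasal nodes of a nested-list tree, in place.

    A tree is [label, child, ...]; a preterminal is [tag, word] with a
    string as its only child.
    """
    if len(tree) == 2 and isinstance(tree[1], str):
        return tree  # preterminal: POS tags are not phrasal labels
    if not isroot:
        # Assumption: labels absent from the mapping stay as they are.
        tree[0] = fine2coarse.get(tree[0], tree[0])
    for child in tree[1:]:
        collapse_sketch(child, fine2coarse, isroot=False)
    return tree


mapping = {"XP": {"NP", "PP"}}  # hypothetical coarse and fine labels
tree = ["ROOT",
        ["NP", ["DT", "the"], ["NN", "cat"]],
        ["VP", ["VBD", "sat"],
         ["PP", ["IN", "on"], ["NP", ["NN", "mats"]]]]]
collapse_sketch(tree, invert_mapping(mapping))
print(tree)
```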
-
discodop.treebanktransforms.rrtransform(tree, morphlevels=0, percolatefeatures=None, adjunctionlabel=None, ignorefunctions=None, ignorecategories=None, adjleft=True, adjright=True)
Relational-realizational tree transformation.
Every constituent node is expanded to three levels:
- syntactic category, e.g., S
- unordered functional argument structure of children, e.g., S/<SBJ,HD,OBJ>
- for each child:
- grammatical function + parent syntactic category, e.g., OBJ/S
Example:
(NP-SBJ (NN-HD ...)) => (NP (<HD>/NP (HD/NP (NN ...))))
Parameters:
- adjunctionlabel – a grammatical function label identifying adjunctions. They will not be part of argument structures, and their grammatical function will be replaced with their neighboring non-adjunctive functions.
- adjleft, adjright – whether to include the left and right sibling, respectively, when replacing the function label for adjunctionlabel.
- ignorefunctions – function labels that do not go into argument structure, but keep their function in their realization to make backtransform possible.
- morphlevels – if nonzero, percolate morphological features this many levels upwards. For a given node, the union of the features of its children is collected, and the result is appended to its syntactic category.
- percolatefeatures – if a sequence is given, percolate only these morphological features; by default all features are used.
Returns: a new, transformed tree.
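The labels of the two levels inserted below a constituent can be sketched as plain string construction. This is an illustration only: the helper `rr_labels_sketch` is hypothetical, it follows the label format of the Example line above, and sorting the functions to represent an unordered argument structure is an assumption. Adjunctions, ignored functions and morphological features are left out.

```python
def rr_labels_sketch(category, child_functions):
    """Return (argument-structure label, realization labels) for one node.

    category: syntactic category of the parent, e.g. 'NP'.
    child_functions: grammatical functions of its children, in tree order.
    """
    # Assumption: sorting makes the argument structure order-independent.
    argstruct = "<%s>/%s" % (",".join(sorted(child_functions)), category)
    # One realization label per child: function + parent category.
    realizations = ["%s/%s" % (func, category) for func in child_functions]
    return argstruct, realizations


print(rr_labels_sketch("NP", ["HD"]))
# prints ('<HD>/NP', ['HD/NP'])
```

For the NP in the example this yields the labels `<HD>/NP` and `HD/NP`, matching the levels shown between NP and NN.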
-
discodop.treebanktransforms.rrbacktransform(tree, adjunctionlabel=None, func=None)
Reverse relational-realizational transformation.
Parameters:
- adjunctionlabel – used to assign a grammatical function to adjunctions that have been converted to contextual labels ‘next:prev’.
- func – used internally to percolate functional labels.
Returns: a new tree.
-
discodop.treebanktransforms.dlevel(tree, lang='nl')
Return the D-level measure of syntactic complexity.
Original version:
Rosenberg & Abbeduto (1987), https://doi.org/10.1017/S0142716400000047
Covington et al. (2006), http://ai1.ai.uga.edu/caspr/2006-01-Covington.pdf
Dutch version implemented here:
Appendix A of T-Scan manual, https://github.com/proycon/tscan/raw/master/docs/tscanhandleiding.pdf
Parameters: tree – A tree from the Alpino parser (i.e., not binarized, with function and morphological tags).
Returns: integer 0-7; 7 is most complex.
-
discodop.treebanktransforms.pop(node)
Remove this node from its parent node, if it has one.
Convenience function for ParentedTrees.
-
discodop.treebanktransforms.strip(label)
Equivalent to the effect of the @ operator in tregex.
-
discodop.treebanktransforms.ancestors(node)
Yield ancestors of node from direct parent to root node.
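The traversal order (direct parent first, root last) can be sketched with a generator over parent pointers. The class `NodeSketch` and the function `ancestors_sketch` are hypothetical stand-ins written for this example; they assume only that each node has a `parent` attribute that is None at the root.

```python
class NodeSketch:
    """Hypothetical stand-in for a tree node with a parent pointer."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent


def ancestors_sketch(node):
    """Yield each ancestor of node, nearest first, ending at the root."""
    current = node.parent
    while current is not None:
        yield current
        current = current.parent


root = NodeSketch("S")
vp = NodeSketch("VP", parent=root)
obj = NodeSketch("NP", parent=vp)
print([a.label for a in ancestors_sketch(obj)])
# prints ['VP', 'S']
```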
-
discodop.treebanktransforms.morphfeats(node)
Return the morphological features of a preterminal node.
Features may be separated by dots or commas.
-
discodop.treebanktransforms.unifymorphfeat(feats, percolatefeatures=None)
Get the sorted union of features for a sequence of feature vectors.
Parameters:
- feats – a sequence of strings of comma/dot separated feature vectors.
- percolatefeatures – if a set is given, select only these features; by default all features are used.
>>> print(unifymorphfeat({'Def.*.*', '*.Sg.*', '*.*.Akk'}))
Akk.Def.Sg
>>> print(unifymorphfeat({'LID[bep,stan,rest]', 'N[soort,ev,zijd,stan]'}))
bep,ev,rest,soort,stan,zijd
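A standalone re-implementation that reproduces the two examples above may help show what "sorted union" means here. It is a sketch and not the library's source: the name `unify_sketch` is hypothetical, and three details are assumptions inferred from the examples (a dot anywhere selects '.' as separator, text outside square brackets is ignored, and '*' is a placeholder that is dropped).

```python
def unify_sketch(feats, percolatefeatures=None):
    """Return the sorted union of features, joined by their separator."""
    # Assumption: a dot anywhere means dot-separated vectors, else commas.
    sep = "." if any("." in vector for vector in feats) else ","
    union = set()
    for vector in feats:
        if "[" in vector:
            # Keep only the features inside 'TAG[...]'.
            vector = vector[vector.index("[") + 1:vector.index("]")]
        union.update(vector.split(sep))
    union.discard("*")  # assumption: '*' marks an unspecified slot
    if percolatefeatures is not None:
        union &= set(percolatefeatures)
    return sep.join(sorted(union))


print(unify_sketch({"Def.*.*", "*.Sg.*", "*.*.Akk"}))
# prints Akk.Def.Sg
print(unify_sketch({"LID[bep,stan,rest]", "N[soort,ev,zijd,stan]"}))
# prints bep,ev,rest,soort,stan,zijd
```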
-
discodop.treebanktransforms.function(node)
Returns: The first function tag for node, or the empty string.