discodop.eval
Evaluation of (discontinuous) parse trees.
Designed to behave like the reference implementation EVALB [1] for regular parse trees, with a natural extension to the discontinuous case. Also provides additional, alternative parse tree evaluation metrics (leaf ancestor, tree-edit distance, unlabeled dependencies), as well as facilities for error analysis.
[1] http://nlp.cs.nyu.edu/evalb/
Functions
accuracy (reference, candidate) 
Compute fraction of equivalent pairs in two sequences. 
alignsent (csent, gsent, gpos) 
Map tokens of csent onto those of gsent , and translate indices. 
bracketing (node[, labeled]) 
Generate bracketing (label, indices) for a given node. 
bracketings (tree[, labeled, dellabel, disconly]) 
Return the labeled set of bracketings for a tree. 
editdistance (seq1, seq2) 
Calculate the Levenshtein edit distance between two strings. 
f_measure (reference, candidate[, alpha]) 
Get F-measure of precision and recall for two multisets. 
harmean (seq) 
Compute harmonic mean of a sequence of numbers. 
intervals (bitset) 
Return a sequence of intervals corresponding to contiguous ranges. 
leafancestor (goldtree, candtree, dellabel) 
Sampson, Babarczy (2002): A test of the leaf-ancestor metric […]. 
leafancestorpaths (tree, dellabel) 
Generate a list of ancestors for each leaf node in a tree. 
main () 
Command line interface for evaluation. 
mean (seq) 
Compute arithmetic mean of a sequence. 
nozerodiv (func) 
Return func() as 6-character string but catch zero division. 
parentedbracketings (tree[, labeled, …]) 
Return the labeled bracketings with parents for a tree. 
pathscore (gold, cand) 
Get edit distance for two leaf-ancestor paths. 
precision (reference, candidate) 
Get precision score for two multisets. 
pyintbitcount (a) 
Return number of set bits (1s) in a Python integer. 
readparam (filename) 
Read an EVALB-style parameter file and return a dictionary. 
recall (reference, candidate) 
Get recall score for two multisets. 
strbracketings (brackets) 
Return a string with a concise representation of a bracketing. 
transform (tree, sent, pos, gpos, param, grootpos) 
Apply the transformations according to the parameter file. 
transitiveclosure (eqpairs) 
Transitive closure of (undirected) EQ relations with DFS. 
treedisteval (a, b[, includeroot, debug]) 
Get tree-distance for two trees and compute the Dice normalization. 
Classes
EvalAccumulator ([disconly]) 
Collect scores of evaluation. 
Evaluator (param[, keylen]) 
Incremental evaluator for syntactic trees. 
TreePairResult (n, gtree, gsent, ctree, …) 
Holds the evaluation result of a pair of trees. 

class discodop.eval.Evaluator(param, keylen=8)
Incremental evaluator for syntactic trees.
Initialize evaluator object with given parameters.
Parameters:
 param – a dictionary of parameters, as read by readparam().
 keylen – the length of the longest sentence ID, for padding purposes.

class discodop.eval.TreePairResult(n, gtree, gsent, ctree, csent, param)
Holds the evaluation result of a pair of trees.
Construct a pair of gold and candidate trees for evaluation.

class discodop.eval.EvalAccumulator(disconly=False)
Collect scores of evaluation.
Parameters: disconly – if True, only collect discontinuous bracketings.

discodop.eval.readparam(filename)
Read an EVALB-style parameter file and return a dictionary.

discodop.eval.transitiveclosure(eqpairs)
Transitive closure of (undirected) EQ relations with DFS.
Given a sequence of pairs denoting an equivalence relation, produce a dictionary with equivalence classes as values and arbitrary members of those classes as keys.
>>> result = transitiveclosure({('A', 'B'), ('B', 'C')})
>>> len(result)
1
>>> k, v = result.popitem()
>>> k in ('A', 'B', 'C') and v == {'A', 'B', 'C'}
True
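The grouping described above can be sketched without the library. The following is a minimal illustration, not the module's actual code: the name transitive_closure_sketch is hypothetical, and it assumes the pairs are hashable and that the key of each class may be any member.

```python
from collections import defaultdict


def transitive_closure_sketch(eqpairs):
    """Group items linked by undirected pairs into equivalence classes."""
    # Build an undirected adjacency map from the pairs.
    neighbors = defaultdict(set)
    for a, b in eqpairs:
        neighbors[a].add(b)
        neighbors[b].add(a)
    result = {}
    seen = set()
    for start in neighbors:
        if start in seen:
            continue
        # Iterative depth-first search collects one connected component.
        component = set()
        stack = [start]
        while stack:
            item = stack.pop()
            if item in component:
                continue
            component.add(item)
            stack.extend(neighbors[item] - component)
        seen |= component
        # The first member encountered serves as the (arbitrary) key.
        result[start] = component
    return result


classes = transitive_closure_sketch([('A', 'B'), ('B', 'C'), ('X', 'Y')])
print(sorted(sorted(members) for members in classes.values()))
```

Each connected component becomes one dictionary value, so two labels end up in the same class whenever a chain of pairs links them.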

discodop.eval.alignsent(csent, gsent, gpos)
Map tokens of csent onto those of gsent, and translate indices.
Returns: a copy of gpos with indices of csent as keys, but tags from gpos.
>>> gpos = {0: "``", 1: 'RB', 2: '.', 3: "''"}
>>> alignsent(['No'], ['``', 'No', '.', "''"], gpos) == {0: 'RB'}
True

discodop.eval.transform(tree, sent, pos, gpos, param, grootpos)
Apply the transformations according to the parameter file.
Does not delete the root node, which is a special case because if there is more than one child it cannot be deleted.
Parameters:
 pos – a list with the contents of tree.pos(); modified in-place.
 gpos – a dictionary of the POS tags of the original gold tree, before any tags/words have been deleted.
 param – the parameters specifying which labels / words to delete.
 grootpos – the set of indices with preterminals directly under the root node of the gold tree.
Returns: an immutable, transformed copy of tree.

discodop.eval.parentedbracketings(tree, labeled=True, dellabel=(), disconly=False)
Return the labeled bracketings with parents for a tree.
Returns: multiset with items of the form ((label, indices), parentlabel)

discodop.eval.bracketings(tree, labeled=True, dellabel=(), disconly=False)
Return the labeled set of bracketings for a tree.
For each nonterminal node, the set will contain a tuple with the label and the set of terminals which it dominates. tree must have been processed by transform(). The argument dellabel is only used to exclude the ROOT node from the results (because it cannot be deleted by transform() when non-unary).
>>> tree = Tree('(S (NP 1) (VP (VB 0) (JJ 2)))')
>>> params = {'DELETE_LABEL': set(), 'DELETE_WORD': set(),
...     'EQ_LABEL': {}, 'EQ_WORD': {},
...     'DELETE_ROOT_PRETERMS': 0}
>>> tree = transform(tree, tree.leaves(), tree.pos(), dict(tree.pos()),
...     params, set())
>>> for (label, span), cnt in sorted(bracketings(tree).items()):
...     print(label, bin(span), cnt)
S 0b111 1
VP 0b101 1
>>> tree = Tree('(S (NP 1) (VP (VB 0) (JJ 2)))')
>>> params['DELETE_LABEL'] = {'VP'}
>>> tree = transform(tree, tree.leaves(), tree.pos(), dict(tree.pos()),
...     params, set())
>>> for (label, span), cnt in sorted(bracketings(tree).items()):
...     print(label, bin(span), cnt)
S 0b111 1
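To make the (label, bitmask) representation concrete, here is a standalone sketch that derives the same kind of multiset from a plain nested tuple. The tuple encoding and the name bracketings_sketch are assumptions made for illustration; the real function operates on the library's tree objects and supports the labeled, dellabel and disconly options, which are omitted here.

```python
from collections import Counter


def bracketings_sketch(tree):
    """Return a Counter of (label, bitmask) for every non-preterminal node.

    `tree` is a hypothetical nested tuple (label, child, ...), where a
    child is either another tuple or an integer terminal index.
    """
    result = Counter()

    def visit(node):
        label, children = node[0], node[1:]
        mask = 0
        for child in children:
            if isinstance(child, int):
                mask |= 1 << child  # terminal: set the bit for its index
            else:
                mask |= visit(child)
        # Preterminals (nodes directly dominating a terminal) are POS
        # tags rather than constituents, so they are not counted.
        if not any(isinstance(child, int) for child in children):
            result[label, mask] += 1
        return mask

    visit(tree)
    return result


tree = ('S', ('NP', 1), ('VP', ('VB', 0), ('JJ', 2)))
brackets = bracketings_sketch(tree)
for (label, mask), count in sorted(brackets.items()):
    print(label, bin(mask), count)
```

The VP node covers terminals 0 and 2 but not 1, so its mask 0b101 has a gap; that gap is what marks the constituent as discontinuous.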

discodop.eval.bracketing(node, labeled=True)
Generate bracketing (label, indices) for a given node.

discodop.eval.strbracketings(brackets)
Return a string with a concise representation of a bracketing.
>>> print(strbracketings({('S', 0b111), ('VP', 0b101)}))
S[0-2], VP[0,2]

discodop.eval.leafancestorpaths(tree, dellabel)
Generate a list of ancestors for each leaf node in a tree.

discodop.eval.leafancestor(goldtree, candtree, dellabel)
Sampson, Babarczy (2002): A test of the leaf-ancestor metric […].
http://www.lrec-conf.org/proceedings/lrec2002/pdf/ws20.pdf p. 27; 2003 journal paper: https://doi.org/10.1017/S1351324903003243

discodop.eval.treedisteval(a, b, includeroot=False, debug=False)
Get tree-distance for two trees and compute the Dice normalization.

discodop.eval.f_measure(reference, candidate, alpha=Decimal('0.5'))
Get F-measure of precision and recall for two multisets.
The default weight alpha=0.5 corresponds to the F_1-measure.
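As an illustration of how precision, recall and the weighted F-measure relate for multisets, the sketch below uses collections.Counter. It assumes the usual weighted harmonic mean 1 / (alpha / P + (1 - alpha) / R), uses floats rather than Decimal, and returns NaN or 0.0 in edge cases; the *_sketch names and that edge-case handling are assumptions, not the module's documented behavior.

```python
from collections import Counter


def precision_sketch(reference, candidate):
    """Fraction of candidate items that also occur in the reference."""
    if not candidate:
        return float('nan')
    # Counter intersection keeps the minimum count of each shared item.
    return sum((reference & candidate).values()) / sum(candidate.values())


def recall_sketch(reference, candidate):
    """Fraction of reference items that also occur in the candidate."""
    if not reference:
        return float('nan')
    return sum((reference & candidate).values()) / sum(reference.values())


def f_measure_sketch(reference, candidate, alpha=0.5):
    """Weighted harmonic mean of precision and recall."""
    p = precision_sketch(reference, candidate)
    r = recall_sketch(reference, candidate)
    if p == 0 or r == 0:
        return 0.0
    return 1.0 / (alpha / p + (1 - alpha) / r)


gold = Counter({('S', 0b111): 1, ('VP', 0b101): 1, ('NP', 0b010): 1})
cand = Counter({('S', 0b111): 1, ('VP', 0b110): 1})
print(precision_sketch(gold, cand), recall_sketch(gold, cand))
print(round(f_measure_sketch(gold, cand), 4))
```

Only the S bracketing matches here, giving precision 1/2 and recall 1/3. With alpha=0.5 the formula reduces to 2PR / (P + R), which is 0.4 for this pair.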

discodop.eval.accuracy(reference, candidate)
Compute fraction of equivalent pairs in two sequences.
In particular, return the fraction of indices 0 <= i < len(candidate) such that candidate[i] == reference[i].
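A minimal sketch of this computation, for example over two POS tag sequences, could look as follows. The name accuracy_sketch, the ValueError on unequal lengths, and the NaN result for empty input are assumptions for illustration rather than confirmed behavior of the module.

```python
def accuracy_sketch(reference, candidate):
    """Fraction of positions at which both sequences hold equal items."""
    if len(reference) != len(candidate):
        raise ValueError('sequences must have the same length')
    if not reference:
        return float('nan')
    matches = sum(ref == cand for ref, cand in zip(reference, candidate))
    return matches / len(reference)


gold_tags = ['DT', 'NN', 'VBZ', 'JJ']
cand_tags = ['DT', 'NN', 'VBD', 'JJ']
score = accuracy_sketch(gold_tags, cand_tags)
print(score)
```

Three of the four tags agree, so the score is 0.75.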

discodop.eval.harmean(seq)
Compute harmonic mean of a sequence of numbers.
Returns NaN when seq contains zero.

discodop.eval.mean(seq)
Compute arithmetic mean of a sequence.
Returns NaN when seq is empty.
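The NaN conventions of the arithmetic and harmonic means can be illustrated with a short sketch. The *_sketch names are hypothetical, and returning NaN for an empty sequence in the harmonic case is an assumption added here; the documentation only states the zero case for the harmonic mean and the empty case for the arithmetic mean.

```python
import math


def mean_sketch(seq):
    """Arithmetic mean; NaN for an empty sequence."""
    seq = list(seq)
    return sum(seq) / len(seq) if seq else float('nan')


def harmean_sketch(seq):
    """Harmonic mean n / sum(1 / x); NaN if any element is zero."""
    seq = list(seq)
    try:
        return len(seq) / sum(1 / x for x in seq)
    except ZeroDivisionError:
        # Raised for a zero element (1 / 0) or an empty sequence (0 / 0).
        return float('nan')


print(mean_sketch([1, 2, 3]))
print(harmean_sketch([0.5, 1.0]))
print(math.isnan(harmean_sketch([1, 0])))
```

Returning NaN instead of raising lets a report be printed even when some category has no data.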

discodop.eval.intervals(bitset)
Return a sequence of intervals corresponding to contiguous ranges.
bitset is an integer representing a bitvector. An interval is a pair (a, b), with a <= b, denoting a contiguous range of one bits x in bitset such that a <= x <= b.
>>> list(intervals(0b111011011))  # NB: read from right to left
[(0, 1), (3, 4), (6, 8)]
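One way to extract such intervals from a Python integer is to scan the bits from least to most significant, as in the sketch below. The name intervals_sketch is hypothetical and the loop is an illustration of the idea, not the module's implementation.

```python
def intervals_sketch(bitset):
    """Yield (start, end) pairs for each run of set bits, lowest bit first."""
    start = None  # index where the current run of 1-bits began
    pos = 0
    while bitset:
        if bitset & 1:
            if start is None:
                start = pos
        elif start is not None:
            # A 0-bit ends the current run.
            yield (start, pos - 1)
            start = None
        bitset >>= 1
        pos += 1
    if start is not None:
        # The highest set bit closes the last run.
        yield (start, pos - 1)


runs = list(intervals_sketch(0b111011011))
print(runs)
```

A bitset with more than one interval corresponds to a discontinuous constituent, which is why this decomposition is useful when printing spans.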

discodop.eval.editdistance(seq1, seq2)
Calculate the Levenshtein edit distance between two strings.
The edit distance is the number of characters that need to be substituted, inserted, or deleted, to transform seq1 into seq2. For example, transforming ‘rain’ to ‘shine’ requires three steps, consisting of two substitutions and one insertion: ‘rain’ -> ‘sain’ -> ‘shin’ -> ‘shine’. These operations could have been done in other orders, but at least three steps are needed.
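The definition above corresponds to the standard dynamic-programming recurrence for Levenshtein distance. The following sketch keeps only two rows of the table; the name editdistance_sketch is hypothetical and unit costs for all three operations are assumed.

```python
def editdistance_sketch(seq1, seq2):
    """Levenshtein distance between two sequences, using two table rows."""
    # previous[j] is the distance between the first i - 1 items of seq1
    # and the first j items of seq2.
    previous = list(range(len(seq2) + 1))
    for i, a in enumerate(seq1, 1):
        current = [i]  # turning i items into an empty prefix: i deletions
        for j, b in enumerate(seq2, 1):
            current.append(min(
                previous[j] + 1,               # delete a
                current[j - 1] + 1,            # insert b
                previous[j - 1] + (a != b)))   # substitute, or match at no cost
        previous = current
    return previous[-1]


distance = editdistance_sketch('rain', 'shine')
print(distance)
```

Because the function only compares items for equality, it works on any pair of sequences, for example lists of node labels, and not only on strings.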