discodop.disambiguation

Disambiguate parse forests with various methods for parse selection.

Use as follows:

>>> getderivations(chart, 1000)  
>>> parses, msg = marginalize('mpp', chart)  

Functions

dopparseprob(tree, sent, Grammar coarse, …) Compute the exact DOP parse probability of a Tree in a DOP reduction.
doprerank(parsetrees, sent, k, …) Rerank k-best coarse trees w/parse probabilities of DOP reduction.
fragmentsinderiv_str(unicode deriv, chart, …) Extract the list of fragments that were used in a given derivation.
frontiernt(node) Test whether node from a DOP derivation is a frontier nonterminal.
getderivations(Chart chart, int k[, …]) Get k-best derivations from chart.
gettree(cells, span) Extract parse tree from most constituents correct table.
marginalize(method, Chart chart, …) Take a list of derivations and optimizes a given objective function.
mcrerank(parsetrees, sent, k, trees, vocab) Rerank k-best trees using tree fragments from training treebank.
ostagderivation(derivtreestr, sent) Extract the list of fragments that were used in a given derivation.
ostagfrontiernt(node) Test if osTAG derivation node is a substitution/adjunction site.
removeadjunaries(tree) Remove artificial unary adjunction nodes from osTAG derivation.
splitfrag(node) Return a copy of a tree with subtrees labeled without ‘@’ removed.
splitostagfrag(node, sent) Return copy of tree after pruning subtrees that are subst/adj sites.
test()
testconstraints(treestr, require, block) Test if tree satisfies constraints of required/blocked labeled spans.
treeparsing(trees, sent, Grammar grammar, …) Assign probabilities to a sequence of trees with a DOP grammar.
viterbiderivation(Chart chart) Wrapper to get Viterbi derivation from chart.
discodop.disambiguation.getderivations(Chart chart, int k, derivstrings=True)

Get k-best derivations from chart.

Parameters:
  • k – number of derivations to extract from chart
  • derivstrings – whether to create derivations as strings
Returns:

None. Modifies chart.derivations and chart.rankededges in-place.

chart.derivations:
 list of tuples (deriv, logprob) where deriv is a string; or list of None if derivstrings==False
chart.rankededges[chart.root()]:
 corresponding list of RankedEdge objects for the derivations in chart.derivations.

discodop.disambiguation.marginalize(method, Chart chart, list sent=None, list tags=None, int k=1000, int sldop_n=7, double mcplambda=1.0, set mcplabels=None, bool ostag=False, set require=None, set block=None)

Take a list of derivations and optimizes a given objective function.

  1. Rewrites derivations into the intended parse trees.
  2. Marginalizes (sum / combine) weights of derivations for same parse tree.
  3. Applies objective function (scoring criterion) for ranking parse trees.

Dependending on the value of chart.grammar.backtransform, one of two ways is employed to turn derivations into parse trees:

None:assume derivations are from DOP reduction and strip annotations of the form @123 from labels. This option is also applicable when the grammar is not a TSG and derivations already coincide with parse trees.
list of strings:
 assume Double-DOP and recover derivations by substituting fragments in this list for flattened rules in derivations.
Parameters:
  • method

    Available objective functions:

    ’mpp’:Most Probable Parse
    ’mpd’:Most Probable Derivation
    ’shortest’:Most Probable Shortest Derivation
    ’sl-dop’:Simplicity-Likelihood DOP; select most likely parse from the sldop_n parse trees with the shortest derivations.
    ’sl-dop-simple’:
     Approximation of Simplicity-Likelihood DOP
  • k – when method='sl-dop, number of derivations to consider.
  • require – optionally, a list of tuples (label, indices); only parse trees containing these labeled spans will be kept. For example. ('NP', [0, 1, 2]).
  • block – optionally, a list of tuples (label, indices); parse trees with these labeled spans will be pruned.
Returns:

(parses, msg).

parses:a list of tuples (parsetree, probability, fragments) where parsetree is a string, probability is a float (0 < p <= 1), or a tuple (numfrags, p) for shortest derivation parsing, and fragments is a list of fragments in the most probable derivation for this parse. NB: the list is in an arbitrary order.
msg:a message reporting the number of derivations / parses.

discodop.disambiguation.gettree(cells, span)

Extract parse tree from most constituents correct table.

discodop.disambiguation.treeparsing(trees, sent, Grammar grammar, int m, backtransform, tags=None, maskrules=True)

Assign probabilities to a sequence of trees with a DOP grammar.

Given a sequence of trees (as strings), parse them with a DOP grammar to get parse tree probabilities; will consider multiple derivations.

Parameters:maskrules – If True, prune any grammar rule not in the trees. If DOP reduction is used, requires selfrulemapping of grammar which should map to itself; e.g., ‘NP@2 => DT@3 NN’ should be mapped to ‘NP => DT NN’ in the same grammar.
Returns:a tuple (derivations, msg, chart).
discodop.disambiguation.viterbiderivation(Chart chart)

Wrapper to get Viterbi derivation from chart.

discodop.disambiguation.doprerank(parsetrees, sent, k, Grammar coarse, Grammar fine)

Rerank k-best coarse trees w/parse probabilities of DOP reduction.

cf. dopparseprob().

discodop.disambiguation.dopparseprob(tree, sent, Grammar coarse, Grammar fine)

Compute the exact DOP parse probability of a Tree in a DOP reduction.

This follows up on a suggestion made by Goodman (2003, p. 143) of calculating DOP probabilities of given parse trees, although I’m not sure it has complexity O(nP) as he suggests (with n as number of nodes in input, and P as max number of rules consistent with a node in the input). Furthermore, the idea of sampling trees “long enough” until we have the MPP is no faster than sampling without applying this procedure, because to determine that some probability p is the maximal probability, we need to collect the probability mass p_seen of enough parse trees such that we have some parsetree with probability p > (1 - p_seen), which requires first seeing almost all parse trees, unless p is exceptionally high. Hence, this method is mostly useful in a reranking framework where it is known in advance that a small set of trees is of interest.

Expects a mapping which gives a list of consistent rules from the reduction as produced by fine.getrulemapping(coarse, re.compile('@[-0-9]+$')).

NB: this algorithm could also be used to determine the probability of derivations, but then the input would have to distinguish whether nodes are internal nodes of fragments, or whether they join two fragments.

discodop.disambiguation.frontiernt(node)

Test whether node from a DOP derivation is a frontier nonterminal.

discodop.disambiguation.splitfrag(node)

Return a copy of a tree with subtrees labeled without ‘@’ removed.

discodop.disambiguation.testconstraints(treestr, require, block)

Test if tree satisfies constraints of required/blocked labeled spans.